iPAS Study Notes - AI Application Planner

TLDR

AI Fundamentals: AI includes Machine Learning (ML) and Deep Learning (DL). Currently, the mainstream is "Narrow AI," while Artificial General Intelligence (AGI) has not yet been achieved.
Data Engineering: Data Lakehouse combines the flexibility of a data lake with the governance capabilities of a data warehouse, serving as the foundation for modern AI projects.
Data Processing: The Medallion Architecture (Bronze/Silver/Gold) is a widely adopted practice for layered data management; ELT has gradually replaced traditional ETL because cloud platforms now have enough computing power to transform data after loading.
Feature Engineering: Categorical features require encoding based on cardinality and order (One-Hot, Target, WoE, etc.); numerical features require normalization (Z-score, Robust Scaling, etc.) to eliminate scale differences.
Model Evaluation: When classes are imbalanced, Accuracy becomes misleading; priority should be given to F1-score, AUC-ROC, or MCC. For regression problems, choose MAE or RMSE based on sensitivity to outliers.
Model Training: Overfitting can be mitigated through Dropout, regularization (L1/L2), Early Stopping, and cross-validation.
Generative AI: RAG is suitable for knowledge base Q&A, while Fine-tuning is suitable for reinforcing specific styles or domain terminology; both can be used in combination.
AI Governance: The EU AI Act adopts risk-based management; AI systems must focus on Bias, Robustness, and Explainability (XAI).

AI Fundamental Concepts

AI Capability Levels and Classification

Current commercial AI (such as the GPT series) belongs to "Narrow AI," characterized by a lack of autonomous goal setting, no persistent memory (unless supplemented by external mechanisms such as RAG), and limited cross-domain transferability. AI functions can be categorized into four types: analytical, predictive, generative, and prescriptive.

Relationship between AI, Machine Learning, and Deep Learning

The three have a nested relationship (AI ⊃ ML ⊃ DL):

AI: Refers generally to technologies that enable machines to exhibit intelligent behavior.
ML: Learns patterns automatically from data; traditional (non-deep) ML typically relies on manually designed features.
DL: Uses multi-layer neural networks to automatically extract features, requiring large amounts of data.

Data Engineering

Data Storage Architecture

Data Warehouse: Suitable for structured data; uses Schema-on-Write.
Data Lake: Suitable for raw data storage; uses Schema-on-Read.
Data Lakehouse: Combines the advantages of both, supporting ACID transactions and version control, suitable for reporting, ML, and RAG integration.
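
The difference between Schema-on-Write and Schema-on-Read can be illustrated with a small sketch. The `SCHEMA` dictionary, the function names, and the sample records are all hypothetical and exist only for illustration; real warehouses and lakes enforce schemas through their own engines.

```python
# Hypothetical schema used only for this illustration.
SCHEMA = {"user_id": int, "amount": float}

def write_with_schema(table, record):
    """Schema-on-Write: validate before storing and reject bad records."""
    for field, expected_type in SCHEMA.items():
        if not isinstance(record.get(field), expected_type):
            raise ValueError(f"field {field!r} must be {expected_type.__name__}")
    table.append(record)

def read_with_schema(raw_records):
    """Schema-on-Read: store anything, apply the schema only when querying."""
    rows = []
    for record in raw_records:
        try:
            rows.append({"user_id": int(record["user_id"]),
                         "amount": float(record["amount"])})
        except (KeyError, TypeError, ValueError):
            continue  # skip records that cannot be interpreted under this schema
    return rows

# Warehouse-style table: only validated records get in.
warehouse = []
write_with_schema(warehouse, {"user_id": 1, "amount": 9.5})

# Lake-style storage: raw records of any shape are kept as-is.
lake = [{"user_id": "2", "amount": "3.0"}, {"note": "free-form log line"}]
parsed = read_with_schema(lake)
```

In this sketch the warehouse rejects malformed records at write time, while the lake keeps everything and leaves interpretation to the reader.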

Data Processing: ETL vs ELT

Modern cloud platforms mostly adopt ELT, loading raw data into the lakehouse (Bronze layer) first and then transforming it as needed. This approach preserves raw details, which is beneficial for subsequent feature extraction by machine learning models.
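
As a rough sketch of the ELT flow through the Medallion layers, the example below loads raw strings first (Bronze), then cleans them (Silver), then aggregates them (Gold). The order records and the function names `to_silver` and `to_gold` are made up for illustration; a real lakehouse would run these steps in its own query engine rather than in plain Python.

```python
import json

# Bronze: raw payloads loaded as-is (the "L" happens before the "T").
bronze = [
    '{"order_id": 1, "amount": "120.5", "country": "tw"}',
    '{"order_id": 2, "amount": "80", "country": "TW"}',
    '{"order_id": 2, "amount": "80", "country": "TW"}',  # duplicate row
    'not valid json',                                     # corrupt row
]

def to_silver(raw_rows):
    """Silver: parse, cast types, standardize values, and deduplicate."""
    seen, cleaned = set(), []
    for raw in raw_rows:
        try:
            row = json.loads(raw)
            record = {
                "order_id": int(row["order_id"]),
                "amount": float(row["amount"]),
                "country": str(row["country"]).upper(),
            }
        except (ValueError, KeyError, TypeError):
            continue  # the raw row stays in Bronze and can be reprocessed later
        if record["order_id"] not in seen:
            seen.add(record["order_id"])
            cleaned.append(record)
    return cleaned

def to_gold(silver_rows):
    """Gold: business-level aggregate (total amount per country)."""
    totals = {}
    for row in silver_rows:
        totals[row["country"]] = totals.get(row["country"], 0.0) + row["amount"]
    return totals

silver = to_silver(bronze)
gold = to_gold(silver)
```

Because Bronze still holds the untouched payloads, the Silver and Gold logic can be changed and rerun later without re-extracting from the source system.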

Data Governance: Data Mesh

Data Mesh decentralizes data ownership to individual business domains, solving the scaling bottlenecks of centralized platforms through "Data as a Product" and "Federated Governance."

Feature Engineering

Categorical Feature Encoding

One-Hot Encoding: Suitable for features with few categories and no inherent order.
Ordinal Encoding: Suitable for categories with a clear order (e.g., education level).
Target Encoding: Suitable for high-cardinality features, but must guard against Data Leakage (should use Leave-One-Out or Smoothing).
WoE Encoding: Designed specifically for binary classification and fits naturally with Logistic Regression.
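
The encoders above can be sketched in plain Python as follows. This is a minimal illustration, not a library implementation: the function names and the smoothing strength `m` are assumptions, and the smoothing shown is simple additive smoothing toward the global mean.

```python
from collections import defaultdict

def one_hot(values):
    """One column per category; categories are ordered alphabetically."""
    categories = sorted(set(values))
    return [[1 if v == c else 0 for c in categories] for v in values]

def ordinal(values, order):
    """Map each category to its rank in the caller-supplied order."""
    rank = {category: i for i, category in enumerate(order)}
    return [rank[v] for v in values]

def smoothed_target(values, targets, m=2.0):
    """Target encoding shrunk toward the global mean.

    m is a hypothetical smoothing strength: rare categories are pulled
    more strongly toward the global mean than frequent ones.
    """
    global_mean = sum(targets) / len(targets)
    sums, counts = defaultdict(float), defaultdict(int)
    for v, t in zip(values, targets):
        sums[v] += t
        counts[v] += 1
    return [(sums[v] + m * global_mean) / (counts[v] + m) for v in values]

colors = one_hot(["red", "blue", "red"])
levels = ordinal(["high school", "phd", "bachelor"],
                 order=["high school", "bachelor", "phd"])
city_codes = smoothed_target(["A", "A", "B"], [1, 0, 1])
```

To guard against leakage, the category statistics in `smoothed_target` should be computed on training data only (or out-of-fold) and then applied to validation and test data.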

Numerical Feature Scaling

Z-score: Suitable for scenarios where the distribution is stable and the algorithm is sensitive to scale.
Robust Scaling: Uses the median and interquartile range, providing robustness against outliers.
MaxAbs Scaling: Preserves the zero-value structure of sparse matrices, suitable for text features.
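
The three scalers can be written in a few lines using only the standard library. This is a sketch under simple assumptions: the function names are illustrative, quartiles use the "inclusive" method of `statistics.quantiles`, and constant columns (zero spread) are not handled.

```python
import statistics

def z_score(xs):
    """Center on the mean and divide by the population standard deviation."""
    mean = statistics.fmean(xs)
    std = statistics.pstdev(xs)
    return [(x - mean) / std for x in xs]

def robust_scale(xs):
    """Center on the median and divide by the interquartile range (IQR)."""
    q1, median, q3 = statistics.quantiles(xs, n=4, method="inclusive")
    return [(x - median) / (q3 - q1) for x in xs]

def max_abs_scale(xs):
    """Divide by the largest absolute value; zeros stay exactly zero."""
    peak = max(abs(x) for x in xs)
    return [x / peak for x in xs]

data_with_outlier = [1, 2, 3, 4, 100]
z = z_score(data_with_outlier)
r = robust_scale(data_with_outlier)
sparse_like = max_abs_scale([0, -2, 4])
```

With the outlier 100 in the data, the Z-score of the ordinary values is compressed into a narrow band because the mean and standard deviation are both inflated, whereas robust scaling keeps them spread between -1 and 0.5.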

Machine Learning Model Evaluation

Selection of Evaluation Metrics

Classification Problems: When classes are imbalanced, Accuracy is misleading; use F1-score or MCC instead. If the cost of false positives is high (e.g., spam), prioritize Precision; if the cost of false negatives is high (e.g., cancer screening), prioritize Recall.
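Regression Problems: RMSE squares each error, so a few large errors dominate it; if the data contains outliers, MAE gives a more representative picture of typical error. Prefer RMSE when large errors are disproportionately costly and should be penalized more heavily.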
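
The metric choices above can be checked with a small worked example. The sketch below computes the metrics directly from their standard definitions; the function names and the toy labels are illustrative assumptions, and undefined ratios are reported as 0.0 by convention here.

```python
import math

def classification_report(y_true, y_pred):
    """Accuracy, precision, recall, F1, and MCC for binary labels (1 = positive)."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / denom if denom else 0.0
    return {"accuracy": (tp + tn) / len(pairs), "precision": precision,
            "recall": recall, "f1": f1, "mcc": mcc}

def mae(y_true, y_pred):
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def rmse(y_true, y_pred):
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

# Imbalanced toy data: 95 negatives, 5 positives, model always predicts negative.
labels = [0] * 95 + [1] * 5
always_negative = [0] * 100
report = classification_report(labels, always_negative)

# Regression toy data: three small errors and one outlier error of 10.
actual = [0, 0, 0, 0]
predicted = [1, 1, 1, 10]
errors = {"mae": mae(actual, predicted), "rmse": rmse(actual, predicted)}
```

Here the always-negative model reaches 0.95 accuracy while recall, F1, and MCC are all 0, and the single outlier pushes RMSE (about 5.07) well above MAE (3.25).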

Overfitting Prevention

Early Stopping: Stop training when the loss on the validation set no longer decreases.
Regularization: L1 (Lasso) can produce sparse weights, suitable for feature selection; L2 (Ridge) shrinks weights, improving model stability.
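
Both techniques can be sketched without any ML framework. The example below assumes a list of per-epoch validation losses is already available; the `patience` and `min_delta` parameters and the function names are illustrative, though most training libraries expose similar options.

```python
def early_stopping_epoch(val_losses, patience=2, min_delta=0.0):
    """Return (stop_epoch, best_epoch) for a sequence of validation losses.

    Training stops once the loss has failed to improve by more than
    min_delta for `patience` consecutive epochs.
    """
    best, best_epoch, waited = float("inf"), -1, 0
    for epoch, loss in enumerate(val_losses):
        if loss < best - min_delta:
            best, best_epoch, waited = loss, epoch, 0
        else:
            waited += 1
            if waited >= patience:
                return epoch, best_epoch
    return len(val_losses) - 1, best_epoch

def l1_penalty(weights, lam):
    """Lasso term: lam * sum(|w|); encourages exact zeros."""
    return lam * sum(abs(w) for w in weights)

def l2_penalty(weights, lam):
    """Ridge term: lam * sum(w^2); shrinks large weights most."""
    return lam * sum(w * w for w in weights)

losses = [0.9, 0.7, 0.6, 0.62, 0.65, 0.5]
stop_epoch, best_epoch = early_stopping_epoch(losses, patience=2)
```

With these losses and `patience=2`, training stops at epoch 4 and the checkpoint from epoch 2 would be restored. The later improvement at epoch 5 is never seen, which is the trade-off a small patience value makes.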

Generative AI and LLM

RAG vs Fine-tuning Decision

RAG: Addresses the model's knowledge cutoff and lets answers draw on and cite internal data; suitable for knowledge base Q&A.
Fine-tuning: Adapts the model to specific tones, formats, or domain terminology; suitable for deep customization of behavioral patterns.
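
The retrieval step of RAG can be sketched with a toy example. Everything here is a stand-in: `embed` uses word counts instead of a real embedding model, the two documents are made-up sample text, and `build_prompt` only assembles the text that would be sent to an LLM.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: bag-of-words counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[token] * b[token] for token in a)
    norm = (math.sqrt(sum(v * v for v in a.values()))
            * math.sqrt(sum(v * v for v in b.values())))
    return dot / norm if norm else 0.0

def retrieve(question, documents, k=1):
    """Return the k documents most similar to the question."""
    q = embed(question)
    ranked = sorted(documents, key=lambda d: cosine(q, embed(d)), reverse=True)
    return ranked[:k]

def build_prompt(question, documents, k=1):
    """Augment the question with retrieved context before generation."""
    context = "\n".join(retrieve(question, documents, k))
    return ("Answer using only the context below.\n\n"
            f"Context:\n{context}\n\nQuestion: {question}")

# Made-up knowledge base entries for illustration.
knowledge_base = [
    "Annual leave requests must be submitted five days in advance.",
    "The cafeteria opens at 8 am on weekdays.",
]
prompt = build_prompt("When does the cafeteria open?", knowledge_base)
```

Updating the knowledge base changes the answers immediately with no retraining, which is why RAG fits knowledge that changes often, while fine-tuning changes how the model writes rather than what it can look up.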

Prompt Engineering Techniques

Chain-of-Thought (CoT): Guides the model to break down reasoning steps.
Tree-of-Thoughts (ToT): Models reasoning as a tree search, suitable for complex planning tasks.
Prompt Chaining: Breaks large tasks into multiple steps to improve the stability of complex tasks.
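
Prompt Chaining in particular lends itself to a short sketch. The `run_chain` helper and the `echo_llm` stub below are hypothetical; in practice `llm` would be a call to whichever model API is in use, and the stub exists only so the control flow can run offline.

```python
def run_chain(llm, steps, initial_input):
    """Feed each step's output into the next prompt template.

    `llm` is any callable that maps a prompt string to a response string.
    """
    output = initial_input
    trace = []
    for template in steps:
        prompt = template.format(input=output)
        output = llm(prompt)
        trace.append((prompt, output))
    return output, trace

def echo_llm(prompt):
    """Hypothetical stand-in for a model call; it just wraps the prompt."""
    return f"<{prompt}>"

steps = [
    "Extract the key facts from: {input}",
    "Summarize in one sentence: {input}",
]
final, trace = run_chain(echo_llm, steps, "raw meeting notes")
```

Keeping the `trace` of intermediate prompts and outputs makes it possible to inspect or retry a single failing step instead of rerunning the whole task.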

AI Governance and Security

AI Risk Management

EU AI Act: Adopts risk-based classification; high-risk applications (e.g., healthcare, recruitment) must establish risk management systems and human oversight mechanisms.
OWASP Top 10 for LLM: Focuses on preventing Prompt Injection, sensitive information leakage, and Excessive Agency.
Privacy Protection: Differential Privacy protects personal data by injecting calibrated noise into query results or training, and is widely regarded as the gold standard for privacy guarantees; Federated Learning keeps raw data in the local environment and shares only model updates.
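
The noise injection behind Differential Privacy can be sketched with the Laplace mechanism applied to a counting query. This is a minimal illustration rather than a production mechanism: the function names and the sample ages are assumptions, and real deployments also need privacy budget accounting.

```python
import random

def laplace_noise(scale, rng):
    """Sample Laplace(0, scale) as the difference of two exponential draws."""
    return rng.expovariate(1 / scale) - rng.expovariate(1 / scale)

def private_count(values, predicate, epsilon, rng):
    """Noisy count of values matching predicate.

    Adding or removing one person changes a count by at most 1
    (sensitivity 1), so the noise scale is 1 / epsilon.
    """
    true_count = sum(1 for v in values if predicate(v))
    return true_count + laplace_noise(1.0 / epsilon, rng)

rng = random.Random(0)           # fixed seed only to make the sketch repeatable
ages = [23, 35, 41, 52, 67, 29]  # made-up sample data
noisy = private_count(ages, lambda age: age >= 40, epsilon=1.0, rng=rng)
```

A smaller `epsilon` means a larger noise scale, so privacy is stronger and the released count is less accurate.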

Explainability (XAI)

SHAP: Attributes a prediction to individual features using Shapley values from cooperative game theory; suitable for explaining individual predictions, and the attributions can be aggregated for a global view.
LIME: Performs local approximation near samples, suitable for local explanation of black-box models.
Grad-CAM: Visualizes the image regions the model focuses on, suitable for computer vision tasks.
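
The game-theoretic idea behind SHAP can be shown by computing exact Shapley values for a tiny model. This is not the SHAP library, which uses approximations to scale to many features; the `toy_value` function and its feature names are hypothetical.

```python
from itertools import permutations

def shapley_values(value_fn, features):
    """Exact Shapley values: average marginal contribution over all orderings."""
    contrib = {f: 0.0 for f in features}
    orderings = list(permutations(features))
    for order in orderings:
        coalition = frozenset()
        for f in order:
            with_f = coalition | {f}
            contrib[f] += value_fn(with_f) - value_fn(coalition)
            coalition = with_f
    return {f: total / len(orderings) for f, total in contrib.items()}

def toy_value(coalition):
    """Hypothetical model output when only the given features are known."""
    score = 0.0
    if "income" in coalition:
        score += 10
    if "age" in coalition:
        score += 4
    if "income" in coalition and "age" in coalition:
        score += 2  # interaction effect, split evenly by Shapley
    return score

attributions = shapley_values(toy_value, ["income", "age"])
```

The attributions come out as 11 for `income` and 5 for `age`, which sum to the full model output of 16. The number of orderings grows factorially with the feature count, which is why practical tools rely on sampling or model-specific shortcuts.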

Change Log: 2026-05-20 Initial document created.
