This_is_the_Ensemble_Learning_Deck.pptx

shivangisingh564490 · 24 slides · Aug 30, 2025


Slide Content

Ensemble Learning: Bagging, Boosting & Stacking
A practical, teaching-ready deck (with code snippets)

Why Ensemble Learning?
- Combine multiple models to improve accuracy and robustness
- Reduce variance (averaging), sometimes bias (boosting)
- Stronger generalization on tabular data; a competitive baseline in practice
- Natural parallelization (bagging) and strong off-the-shelf performance (RF)

Bias–Variance Perspective (High-Level)
- Prediction error = Bias² + Variance + Irreducible noise
- Bagging reduces variance by averaging unstable learners (e.g., trees); see the simulation sketch below
- Boosting can reduce bias by sequentially correcting residuals
- Diversity among base learners is key for gains
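A minimal simulation sketch (added for illustration, not from the deck): averaging B predictors with independent, equal-variance errors shrinks the ensemble variance roughly as 1/B. The "predictors" here are just noisy guesses of a constant, a stand-in for unstable base learners.

import numpy as np

rng = np.random.default_rng(42)
true_value = 1.0
n_trials = 10_000
for B in [1, 5, 25, 100]:
    # each trial: average B independent noisy predictions (error std = 1)
    preds = true_value + rng.normal(0.0, 1.0, size=(n_trials, B))
    ensemble = preds.mean(axis=1)
    print(f"B={B:3d}  ensemble variance ~ {ensemble.var():.4f}  (theory 1/B = {1.0 / B:.4f})")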

Ensemble Taxonomy
- Homogeneous vs. heterogeneous (same vs. different base models)
- Parallel (bagging/random forest) vs. sequential (boosting)
- Voting/averaging (hard vs. soft)
- Stacking/blending with a meta-learner

Bagging: Bootstrap Aggregating
- Train B models on bootstrap samples (sampling with replacement)
- Each model is high-variance (e.g., a deep tree); averaging stabilizes the ensemble
- Out-of-bag (OOB) estimation: roughly 37% of the data is unseen by a given tree (see the check below)
- Key knobs: n_estimators (B), base-learner complexity, max_samples
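A quick numerical check of the ~37% figure (added for illustration): the probability that a given sample never appears in a bootstrap of size n is (1 − 1/n)^n, which tends to e⁻¹ ≈ 0.368.

import numpy as np

rng = np.random.default_rng(0)
n = 1000
n_boot = 2000
# fraction of original indices missing from each bootstrap sample
missing = [
    1 - len(np.unique(rng.integers(0, n, size=n))) / n
    for _ in range(n_boot)
]
print(f"empirical OOB fraction: {np.mean(missing):.4f}")
print(f"theory (1 - 1/n)^n:     {(1 - 1 / n) ** n:.4f}")
print(f"limit e^-1:             {np.exp(-1):.4f}")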

Random Forests (RF)
- Bagging + random subspace: each split considers a random subset of features
- De-correlates trees; often the best default for tabular data
- Classification: majority vote; regression: average
- Tune: n_estimators, max_features, max_depth, min_samples_leaf, class_weight

Extremely Randomized Trees (ExtraTrees)
- Randomized thresholds plus random feature subsets at each split
- Even more de-correlation; often faster to train
- May increase bias slightly but reduces variance further (see the comparison sketch below)
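A minimal side-by-side sketch on synthetic data (dataset and model sizes are assumptions, not from the deck):

from sklearn.datasets import make_classification
from sklearn.ensemble import ExtraTreesClassifier, RandomForestClassifier
from sklearn.model_selection import cross_val_score

# synthetic data purely for illustration
X, y = make_classification(n_samples=2000, n_features=20, n_informative=10, random_state=42)
for model in [
    RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1),
    ExtraTreesClassifier(n_estimators=300, random_state=42, n_jobs=-1),
]:
    scores = cross_val_score(model, X, y, cv=5, n_jobs=-1)
    print(f"{model.__class__.__name__}: {scores.mean():.3f} +/- {scores.std():.3f}")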

Boosting: Core Idea
- Train weak learners sequentially; each focuses on the previous errors
- Stage-wise additive modeling: f_t(x) = f_{t-1}(x) + η · h_t(x)
- Requires careful regularization (learning rate, depth, subsampling)
- A from-scratch regression sketch follows below
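A from-scratch sketch of the stage-wise update for squared loss (synthetic data and all settings assumed): each stump h_t fits the current residuals, and η shrinks its contribution.

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(42)
X = rng.uniform(-3, 3, size=(500, 1))
y = np.sin(X[:, 0]) + rng.normal(0, 0.1, size=500)

eta, n_rounds = 0.1, 100
f = np.full_like(y, y.mean())          # f_0: constant prediction
stumps = []
for _ in range(n_rounds):
    residuals = y - f                  # negative gradient of squared loss
    h = DecisionTreeRegressor(max_depth=1).fit(X, residuals)
    f += eta * h.predict(X)            # f_t = f_{t-1} + eta * h_t
    stumps.append(h)
print(f"train MSE after {n_rounds} rounds: {np.mean((y - f) ** 2):.4f}")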

AdaBoost (Binary Classification)
- Re-weights samples; harder examples get higher weight
- Weak learner: typically shallow decision trees (stumps)
- Final prediction: weighted vote of weak learners
- Sensitive to noise/outliers; strong on clean, small-to-medium data
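For reference, the standard discrete-AdaBoost update (a textbook formulation, not quoted from the slide), with labels y_i and predictions h_t(x_i) in {−1, +1}:

\varepsilon_t = \sum_i w_i \,\mathbb{1}[h_t(x_i) \neq y_i], \qquad
\alpha_t = \tfrac{1}{2}\ln\frac{1-\varepsilon_t}{\varepsilon_t}

w_i \leftarrow \frac{w_i \exp\!\big(-\alpha_t\, y_i\, h_t(x_i)\big)}{Z_t}, \qquad
F(x) = \operatorname{sign}\Big(\sum_t \alpha_t\, h_t(x)\Big)

where Z_t renormalizes the weights to sum to 1; misclassified points gain weight, correctly classified points lose it.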

Gradient Boosting (GBDT/GBM)
- Fit each new tree to the negative gradient of the loss (the residuals, for squared error)
- Key hyperparameters: n_estimators, learning_rate, max_depth (or max_leaves), subsample
- Use early stopping with a validation set to prevent overfitting (see the sketch below)
- Variants: XGBoost, LightGBM, CatBoost
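A minimal early-stopping sketch using scikit-learn's built-in validation split (validation_fraction and n_iter_no_change are real GradientBoostingClassifier parameters; the data is synthetic):

from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier

X, y = make_classification(n_samples=5000, n_features=20, random_state=42)
gb = GradientBoostingClassifier(
    n_estimators=2000,          # generous cap; early stopping decides the real count
    learning_rate=0.05,
    max_depth=3,
    subsample=0.8,
    validation_fraction=0.1,    # internal holdout used for early stopping
    n_iter_no_change=20,        # stop after 20 rounds without improvement
    random_state=42,
)
gb.fit(X, y)
print("trees actually fit:", gb.n_estimators_)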

XGBoost vs. LightGBM vs. CatBoost (At a Glance)
- XGBoost: robust regularization, shrinkage, column subsampling, wide ecosystem
- LightGBM: leaf-wise growth with depth limits; fast on large, sparse datasets (see the sketch below)
- CatBoost: native categorical handling; ordered boosting (reduces target leakage)
- Pick based on data size, sparsity, categorical richness, and latency needs
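A minimal LightGBM sketch (added for illustration; assumes a recent LightGBM version with callback-style early stopping, and synthetic data):

import lightgbm as lgb
from sklearn.datasets import make_classification
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, n_features=30, random_state=42)
X_tr, X_val, y_tr, y_val = train_test_split(X, y, test_size=0.2, random_state=42)
clf = lgb.LGBMClassifier(n_estimators=2000, learning_rate=0.05, num_leaves=31)
clf.fit(
    X_tr, y_tr,
    eval_set=[(X_val, y_val)],
    callbacks=[lgb.early_stopping(stopping_rounds=50)],  # stop when validation loss plateaus
)
print("best iteration:", clf.best_iteration_)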

Stacking (Meta-Learning)
- Level-0: diverse base models; Level-1: meta-learner trained on out-of-fold predictions
- Cross-validation is crucial to avoid leakage
- Use simple meta-learners first (logistic/linear) to avoid overfitting
- Blending: a holdout set for meta-features (simpler, but less data-efficient)

Voting & Averaging
- Hard voting: majority class label
- Soft voting: average predicted probabilities (requires calibrated models)
- Weighted voting: weight by validation performance or domain knowledge

Imbalanced Data Strategies
- Use class_weight='balanced' where supported (e.g., RF), sample weights, or resampling strategies
- Optimize decision thresholds using PR curves; use AUPRC for evaluation (see the sketch below)
- Consider Balanced Random Forest, EasyEnsemble, or focal loss (boosting variants)
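A minimal AUPRC-and-threshold sketch (synthetic 5%-positive data; all settings assumed, not from the deck):

import numpy as np
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import average_precision_score, precision_recall_curve
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=10_000, weights=[0.95, 0.05], random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, stratify=y, random_state=42)
rf = RandomForestClassifier(n_estimators=300, class_weight='balanced', n_jobs=-1, random_state=42)
rf.fit(X_tr, y_tr)
proba = rf.predict_proba(X_te)[:, 1]
print("AUPRC:", average_precision_score(y_te, proba))

# pick the threshold maximizing F1 along the PR curve
prec, rec, thr = precision_recall_curve(y_te, proba)
f1 = 2 * prec * rec / np.clip(prec + rec, 1e-12, None)
print("best threshold:", thr[np.argmax(f1[:-1])])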

Interpretability & Diagnostics
- Global: permutation importance, minimal depth, gain statistics
- Local: SHAP values, tree-path analysis, counterfactuals
- Check calibration (reliability curves) for probability outputs (see the sketch below)
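A minimal diagnostics sketch (synthetic data and settings assumed) combining permutation importance with a reliability curve:

from sklearn.calibration import calibration_curve
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, n_features=10, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)
rf = RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1).fit(X_tr, y_tr)

# global importance: shuffle one feature at a time, measure the score drop
imp = permutation_importance(rf, X_te, y_te, n_repeats=10, random_state=42, n_jobs=-1)
print("top feature index:", imp.importances_mean.argmax())

# reliability curve: mean predicted probability vs. observed positive frequency
frac_pos, mean_pred = calibration_curve(y_te, rf.predict_proba(X_te)[:, 1], n_bins=10)
for p, f in zip(mean_pred, frac_pos):
    print(f"predicted {p:.2f} -> observed {f:.2f}")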

Practical Tips
- Start with RF as a baseline for tabular problems
- For boosting: tune learning_rate and the number of trees with early stopping
- Use OOB scores (bagging/RF) for quick model iteration
- Cross-validate with time-aware splits for temporal data to avoid leakage (see the sketch below)
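A minimal time-aware CV sketch (ordered synthetic data standing in for a time series; all settings assumed): TimeSeriesSplit trains each fold on the past and tests on the future.

import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import TimeSeriesSplit, cross_val_score

rng = np.random.default_rng(0)
X = rng.normal(size=(1000, 5))
y = (X[:, 0] + rng.normal(scale=0.5, size=1000) > 0).astype(int)

tscv = TimeSeriesSplit(n_splits=5)   # expanding window: past -> future, never the reverse
rf = RandomForestClassifier(n_estimators=200, random_state=0, n_jobs=-1)
print(cross_val_score(rf, X, y, cv=tscv, n_jobs=-1))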

Key Hyperparameters (Cheat Sheet)
- RF: n_estimators ↑, max_features (sqrt/log2), min_samples_leaf (1–10); see the search sketch below
- GBM: learning_rate (0.01–0.1), n_estimators (100–1000+), max_depth (3–10), subsample (0.6–0.9)
- AdaBoost: n_estimators (50–500), learning_rate (0.01–1.0)
- Stacking: base diversity ↑, meta-learner regularization (ridge/logistic)
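A minimal randomized-search sketch over the RF knobs above (ranges taken from the cheat sheet; data and everything else assumed):

from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

X, y = make_classification(n_samples=3000, n_features=20, random_state=42)
param_dist = {
    "n_estimators": [200, 400, 800],
    "max_features": ["sqrt", "log2"],
    "min_samples_leaf": [1, 2, 5, 10],
}
search = RandomizedSearchCV(
    RandomForestClassifier(random_state=42, n_jobs=-1),
    param_distributions=param_dist,
    n_iter=10, cv=5, random_state=42, n_jobs=-1,
)
search.fit(X, y)
print(search.best_params_, round(search.best_score_, 3))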

Code: Bagging & Random Forest (scikit-learn)

from sklearn.datasets import make_classification
from sklearn.ensemble import BaggingClassifier, RandomForestClassifier
from sklearn.model_selection import train_test_split, cross_val_score

# X, y were not defined on the slide; synthetic data added so the snippet runs
X, y = make_classification(n_samples=2000, n_features=20, random_state=42)
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)

bag = BaggingClassifier(n_estimators=200, max_samples=0.8, n_jobs=-1, random_state=42)
rf = RandomForestClassifier(n_estimators=400, max_features='sqrt', oob_score=True,
                            n_jobs=-1, random_state=42)
for model in [bag, rf]:
    scores = cross_val_score(model, X_train, y_train, cv=5, n_jobs=-1)
    print(model.__class__.__name__, scores.mean(), scores.std())

rf.fit(X_train, y_train)
print('OOB:', rf.oob_score_)

Code: AdaBoost & GradientBoosting (scikit-learn)

from sklearn.ensemble import AdaBoostClassifier, GradientBoostingClassifier

# continues from the previous slide (reuses X_train/X_test/y_train/y_test)
ada = AdaBoostClassifier(n_estimators=300, learning_rate=0.1, random_state=42)
gb = GradientBoostingClassifier(n_estimators=500, learning_rate=0.05, max_depth=3,
                                subsample=0.8, random_state=42)
ada.fit(X_train, y_train)
gb.fit(X_train, y_train)
print('AdaBoost test acc:', ada.score(X_test, y_test))
print('GB test acc:', gb.score(X_test, y_test))

Code: Stacking & Voting (scikit-learn)

from sklearn.ensemble import StackingClassifier, VotingClassifier, RandomForestClassifier
from sklearn.linear_model import LogisticRegression
from sklearn.svm import SVC

# continues from the previous slides (reuses X_train/X_test/y_train/y_test)
base_learners = [
    ('rf', RandomForestClassifier(n_estimators=300, random_state=42)),
    ('svc', SVC(probability=True, kernel='rbf', C=2.0, gamma='scale', random_state=42)),
]
meta = LogisticRegression(max_iter=1000)
stack = StackingClassifier(estimators=base_learners, final_estimator=meta, cv=5, n_jobs=-1)
vote = VotingClassifier(estimators=base_learners, voting='soft', n_jobs=-1)
stack.fit(X_train, y_train)
vote.fit(X_train, y_train)
print('Stack acc:', stack.score(X_test, y_test))
print('Vote acc:', vote.score(X_test, y_test))

When to Use What
- RF: strong baseline for mixed/tabular data; low tuning cost
- GBM (XGBoost/LightGBM/CatBoost): when you need top accuracy and can tune carefully
- Bagging (generic): unstable base learner and small data; variance reduction
- Stacking: when diverse models each capture different structure; ensure robust CV

Common Pitfalls & How to Avoid Them
- Data leakage in stacking/blending: use out-of-fold predictions
- Overfitting with too-deep trees in boosting: use a small max_depth plus regularization
- Poor probability calibration in RF/boosting: calibrate on a validation set (see the sketch below)
- Distribution shift: evaluate with time-aware or group-aware splits
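A minimal calibration sketch (synthetic data; CalibratedClassifierCV is scikit-learn's standard tool for this, and everything else here is assumed):

from sklearn.calibration import CalibratedClassifierCV
from sklearn.datasets import make_classification
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import brier_score_loss
from sklearn.model_selection import train_test_split

X, y = make_classification(n_samples=5000, random_state=42)
X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=42)

rf = RandomForestClassifier(n_estimators=300, random_state=42, n_jobs=-1)
cal = CalibratedClassifierCV(rf, method='isotonic', cv=5)  # refits with internal CV
rf.fit(X_tr, y_tr)
cal.fit(X_tr, y_tr)
print('raw Brier:       ', brier_score_loss(y_te, rf.predict_proba(X_te)[:, 1]))
print('calibrated Brier:', brier_score_loss(y_te, cal.predict_proba(X_te)[:, 1]))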

Mini Case Sketch (Credit Risk)
- Goal: predict default; imbalanced (5% positive)
- Baseline: RF with class_weight='balanced'; tune max_features
- Compare with LightGBM plus early stopping; evaluate with AUPRC
- Stack RF + SVC with a logistic-regression meta-learner for the final model; check calibration

References & Further Reading
- Breiman, L. (1996). Bagging Predictors.
- Breiman, L. (2001). Random Forests.
- Freund, Y. & Schapire, R. (1997). A Decision-Theoretic Generalization of On-Line Learning and an Application to Boosting (AdaBoost).
- Friedman, J. (2001). Greedy Function Approximation: A Gradient Boosting Machine.
- Chen, T. & Guestrin, C. (2016). XGBoost: A Scalable Tree Boosting System.
- Ke, G. et al. (2017). LightGBM: A Highly Efficient Gradient Boosting Decision Tree.
- Dorogush, A. V. et al. (2018). CatBoost: Gradient Boosting with Categorical Features Support.