Random Forests: Bagging + Random Subspaces
Ensembles of Decision Trees for Robust, High-Performance Models
Core Intuition
• Train many de-correlated decision trees on bootstrap samples
• At inference: classification uses majority vote; regression uses the average
• Reduces variance while keeping the low bias of deep trees
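As a minimal sketch of the two aggregation rules, assuming per-tree predictions are already collected in NumPy arrays (the names tree_class_preds and tree_reg_preds are placeholders):

import numpy as np

# Hypothetical per-tree predictions for one sample, from 5 trees
tree_class_preds = np.array([1, 0, 1, 1, 0])            # class labels
tree_reg_preds = np.array([2.3, 1.9, 2.5, 2.1, 2.0])    # regression outputs

# Classification: majority vote across trees
values, counts = np.unique(tree_class_preds, return_counts=True)
majority_class = values[np.argmax(counts)]

# Regression: average across trees
avg_prediction = tree_reg_preds.mean()

print(majority_class, avg_prediction)   # -> 1  2.16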
Bagging (Bootstrap Aggregating)
• Sample with replacement from the training set to create diverse datasets
• Each tree sees a different subset with duplicates; ~63% unique examples per bootstrap
• Averaging votes smooths out individual tree noise
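The ~63% figure can be checked directly: draw n indices with replacement and measure the unique fraction (pure NumPy, no forest needed):

import numpy as np

rng = np.random.default_rng(0)
n = 10_000

# One bootstrap sample: n draws with replacement from n training indices
bootstrap_idx = rng.integers(0, n, size=n)
unique_fraction = np.unique(bootstrap_idx).size / n

# Expected value is 1 - (1 - 1/n)^n, which approaches 1 - 1/e ≈ 0.632
print(f"unique fraction: {unique_fraction:.3f}")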
Random Feature Selection at Splits
• At each split, consider only a random subset of features (mtry)
• Prevents dominant features from making trees too similar
• Typical defaults: sqrt(p) for classification, p/3 for regression (library-dependent)
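In scikit-learn the subset size is controlled by max_features; a short sketch of the common settings (the models are only constructed here, not fitted):

from sklearn.ensemble import RandomForestClassifier, RandomForestRegressor

# Classification default: mtry = sqrt(p)
clf = RandomForestClassifier(max_features='sqrt', random_state=0)

# Regression: roughly p/3 features per split (a float is read as a fraction of p)
reg = RandomForestRegressor(max_features=1/3, random_state=0)

# An explicit integer also works, e.g. consider 5 features at each split
clf_fixed = RandomForestClassifier(max_features=5, random_state=0)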
Training Algorithm (High-level)
For b = 1..B (n_estimators):
• Draw a bootstrap sample; grow a deep CART tree without pruning
• At each node: choose the best split among mtry randomly selected features
Aggregate predictions across all trees
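A compact from-scratch sketch of that loop, using scikit-learn's DecisionTreeClassifier as the base CART learner; this is illustrative only (it assumes NumPy arrays and non-negative integer class labels), not a replacement for RandomForestClassifier:

import numpy as np
from sklearn.tree import DecisionTreeClassifier

def fit_forest(X, y, B=100, mtry='sqrt', random_state=0):
    rng = np.random.default_rng(random_state)
    trees = []
    n = X.shape[0]
    for b in range(B):
        # Bootstrap sample: n draws with replacement
        idx = rng.integers(0, n, size=n)
        # Deep, unpruned CART tree; mtry random features considered per split
        tree = DecisionTreeClassifier(max_features=mtry,
                                      random_state=int(rng.integers(1 << 31)))
        tree.fit(X[idx], y[idx])
        trees.append(tree)
    return trees

def predict_forest(trees, X):
    # Majority vote across trees (assumes non-negative integer labels)
    all_preds = np.stack([t.predict(X) for t in trees]).astype(int)  # (B, n_samples)
    vote = lambda col: np.bincount(col).argmax()
    return np.apply_along_axis(vote, 0, all_preds)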
Out-of-Bag (OOB) Error
• Each tree leaves out ~37% of samples → the 'out-of-bag' set
• Use OOB samples to estimate generalization error without a separate validation set
• Enable oob_score=True in scikit-learn
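A minimal sketch of OOB scoring in scikit-learn, using a toy dataset from sklearn.datasets so the snippet is self-contained:

from sklearn.datasets import load_breast_cancer
from sklearn.ensemble import RandomForestClassifier

X, y = load_breast_cancer(return_X_y=True)

# oob_score=True requires bootstrap=True (the default)
rf = RandomForestClassifier(n_estimators=300, oob_score=True,
                            n_jobs=-1, random_state=42)
rf.fit(X, y)

# Generalization estimate from the out-of-bag samples, no extra split needed
print('OOB accuracy:', rf.oob_score_)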
Feature Importance
• Impurity-based: average decrease in Gini/MSE across splits (fast, but biased)
• Permutation importance: drop in performance after shuffling a feature (slower, more reliable)
• Consider correlated features and interactions when interpreting
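A short sketch contrasting the two estimates on an already fitted forest (rf, X_valid, and y_valid are assumed to come from a prior train/validation split):

import numpy as np
from sklearn.inspection import permutation_importance

# Impurity-based importances come for free after fitting
impurity_imp = rf.feature_importances_

# Permutation importance: measured on held-out data, more reliable
perm = permutation_importance(rf, X_valid, y_valid,
                              n_repeats=10, random_state=0, n_jobs=-1)
perm_imp = perm.importances_mean

# Rank features by both methods and compare the top of each list
print('Impurity ranking:   ', np.argsort(impurity_imp)[::-1][:5])
print('Permutation ranking:', np.argsort(perm_imp)[::-1][:5])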
Hyperparameters to Tune
• n_estimators (B): more trees → better stability (diminishing returns)
• max_depth, min_samples_leaf: control overfitting and leaf purity
• max_features (mtry): controls tree de-correlation
• bootstrap, class_weight (for imbalance), ccp_alpha (post-pruning, if supported)
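One way to tune these knobs is RandomizedSearchCV; the grid below is a starting point, not a recommendation, and X_train, y_train are assumed to exist:

from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import RandomizedSearchCV

param_dist = {
    'n_estimators': [200, 300, 500],
    'max_depth': [None, 10, 20, 40],
    'min_samples_leaf': [1, 2, 5, 10],
    'max_features': ['sqrt', 'log2', 0.3],
}

search = RandomizedSearchCV(RandomForestClassifier(random_state=0),
                            param_distributions=param_dist,
                            n_iter=20, cv=5, n_jobs=-1, random_state=0)
search.fit(X_train, y_train)
print(search.best_params_)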
Bias–Variance Characteristics
• Forests keep the low bias of deep trees but strongly reduce variance via averaging
• Performance improves with tree diversity; tune max_features to trade accuracy vs. diversity
• Too small mtry → underfit individual trees; too large mtry → highly correlated trees
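One simple way to see this trade-off is to sweep max_features and watch the OOB score (a rough sketch; X and y are assumed to be loaded already):

from sklearn.ensemble import RandomForestClassifier

# 1 feature per split → weak trees; all features (1.0) → highly correlated trees
for mtry in [1, 'sqrt', 0.5, 1.0]:
    rf = RandomForestClassifier(n_estimators=300, max_features=mtry,
                                oob_score=True, n_jobs=-1, random_state=0)
    rf.fit(X, y)
    print(f'max_features={mtry!r}  OOB accuracy={rf.oob_score_:.3f}')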
Handling Imbalanced Data
• Use class_weight='balanced' or custom weights
• Adjust the decision threshold using validation PR/ROC curves
• Try balanced subsampling per tree (if supported)
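A brief sketch of class weighting plus a threshold moved off the default 0.5; the 0.3 value is a placeholder to be chosen from validation PR/ROC curves, and X_train, y_train, X_valid are assumed available:

from sklearn.ensemble import RandomForestClassifier

# Reweight classes inversely to their frequency
rf = RandomForestClassifier(n_estimators=300, class_weight='balanced',
                            n_jobs=-1, random_state=0)
rf.fit(X_train, y_train)

# Tune the decision threshold on validation scores instead of using 0.5
proba = rf.predict_proba(X_valid)[:, 1]
threshold = 0.3            # placeholder: pick from the PR/ROC curve
y_pred = (proba >= threshold).astype(int)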
Practical Tips
• Use OOB error for quick model selection; still confirm with cross-validation
• Calibrate probabilities (Platt/isotonic) if well-calibrated confidence is needed
• For large p, consider feature selection or dimensionality reduction
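Calibration in scikit-learn is a thin wrapper; a sketch using CalibratedClassifierCV with isotonic regression (method='sigmoid' corresponds to Platt scaling), assuming X_train, y_train, X_valid exist:

from sklearn.calibration import CalibratedClassifierCV
from sklearn.ensemble import RandomForestClassifier

rf = RandomForestClassifier(n_estimators=300, n_jobs=-1, random_state=0)

# Wrap the forest; internal CV fits the calibrator on held-out folds
calibrated = CalibratedClassifierCV(rf, method='isotonic', cv=5)
calibrated.fit(X_train, y_train)

# Calibrated probabilities for the positive class
proba = calibrated.predict_proba(X_valid)[:, 1]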
scikit-learn Example (Classification)

from sklearn.ensemble import RandomForestClassifier
from sklearn.inspection import permutation_importance

# Train a forest with OOB scoring enabled
rf = RandomForestClassifier(n_estimators=300, max_features='sqrt',
                            oob_score=True, n_jobs=-1, random_state=42)
rf.fit(X_train, y_train)
print('OOB accuracy:', rf.oob_score_)

# Permutation importance on a held-out validation set
result = permutation_importance(rf, X_valid, y_valid, n_repeats=5, random_state=42)
imp = result.importances_mean
print('Top features:', [feature_names[i] for i in imp.argsort()[::-1][:10]])
Interpretability in Practice
• Global: feature importance (impurity/permutation), minimal depth analysis
• Local: tree path inspection, surrogate models, or SHAP for per-instance contributions
• Remember: ensembles are less interpretable than single trees
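As one concrete example, a global surrogate: fit a shallow single tree on the forest's own predictions and read off its rules. This is a rough approximation of the ensemble, not an exact explanation, and it assumes a fitted rf plus X_train and feature_names from earlier:

from sklearn.tree import DecisionTreeClassifier, export_text

# Shallow tree trained to mimic the forest's predictions on the training data
surrogate = DecisionTreeClassifier(max_depth=3, random_state=0)
surrogate.fit(X_train, rf.predict(X_train))

# Human-readable rules approximating the forest's behavior
print(export_text(surrogate, feature_names=list(feature_names)))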
Pros & Cons (Summary)
• Pros: strong accuracy out of the box, robust to noise/outliers, little tuning needed
• Cons: larger memory/latency, less interpretable, may struggle with very sparse high-dimensional data
• Widely used as a baseline for tabular problems