Decision Trees: Intuition to Implementation
Classification & Regression Trees (CART/ID3) — Teaching Deck
Why Decision Trees?
- Interpretable, white-box models with human-readable rules
- Handle mixed data types (numeric + categorical) and non-linear boundaries
- Minimal preprocessing (no scaling required); robust to outliers
- Foundation for powerful ensembles (Random Forests, Gradient Boosting)
Core Idea
- Recursively split the feature space to create regions with high class purity (classification)
- Choose the split that maximizes impurity reduction / information gain at each node
- Continue until a stopping criterion is met; then assign leaf predictions
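A minimal sketch of this fit-then-predict loop with scikit-learn; the iris dataset and the settings below are illustrative additions, not from the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit() performs the greedy recursive partitioning; each leaf stores a class prediction
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))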
Impurity Measures (Classification)
- Gini: G = 1 - \sum_k p_k^2
- Entropy: H = -\sum_k p_k \log_2 p_k
- Information Gain = parent impurity − weighted child impurities
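A short NumPy sketch of these three quantities (the function names are illustrative, not part of any library):

import numpy as np

def gini(labels):
    # G = 1 - sum_k p_k^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # H = -sum_k p_k log2 p_k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children, impurity=entropy):
    # parent impurity minus the sample-weighted impurity of the child nodes
    n = len(parent)
    weighted = sum(len(c) / n * impurity(c) for c in children)
    return impurity(parent) - weighted

# Example: a perfect split of [0,0,1,1] yields a gain of 1 bit
print(information_gain(np.array([0, 0, 1, 1]), [np.array([0, 0]), np.array([1, 1])]))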
Split Selection
- Numerical features: try candidate thresholds (e.g., midpoints of sorted unique values)
- Categorical features: group categories (may be exhaustive or heuristic)
- Pick the split with the best impurity reduction (ties broken by secondary criteria)
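A sketch of the threshold search for a single numeric feature; x and y are NumPy arrays, impurity is any impurity function (e.g., the gini() helper from the previous sketch), and all names here are illustrative:

import numpy as np

def best_threshold(x, y, impurity):
    # Candidate thresholds: midpoints between consecutive sorted unique values
    values = np.sort(np.unique(x))
    thresholds = (values[:-1] + values[1:]) / 2.0
    parent = impurity(y)
    best = (None, 0.0)  # (threshold, impurity reduction)
    for t in thresholds:
        left, right = y[x <= t], y[x > t]
        weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / len(y)
        gain = parent - weighted
        if gain > best[1]:
            best = (t, gain)
    return best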
CART vs. ID3/C4.5
- ID3/C4.5: uses Entropy/Information Gain (or Gain Ratio), often for categorical features
- CART: uses Gini (classification) and MSE (regression), builds binary trees
- Modern libraries (e.g., scikit-learn) implement CART-style trees
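In scikit-learn the impurity choice is simply the criterion argument of the CART-style estimators; the values below are for recent scikit-learn versions:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Gini (the default) vs. an entropy / information-gain criterion
clf_gini = DecisionTreeClassifier(criterion="gini")
clf_entropy = DecisionTreeClassifier(criterion="entropy")

# Regression trees minimize squared error (MSE) by default
reg = DecisionTreeRegressor(criterion="squared_error")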
Stopping Criteria (Pre-pruning)
- max_depth: limit tree depth
- min_samples_split / min_samples_leaf: minimum samples to split a node / required at a leaf
- max_leaf_nodes: cap the number of leaves
- min_impurity_decrease: require sufficient gain to split
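These pre-pruning controls map one-to-one onto DecisionTreeClassifier constructor arguments; the numeric values below are illustrative, not recommendations:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=4,                # limit tree depth
    min_samples_split=20,       # need at least 20 samples to split a node
    min_samples_leaf=10,        # every leaf keeps at least 10 samples
    max_leaf_nodes=16,          # cap the number of leaves
    min_impurity_decrease=1e-3, # only split if the impurity reduction is large enough
)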
Post-pruning (Cost-Complexity)
- Grow a large tree, then prune back by penalizing complexity
- Minimize: R_\alpha(T) = \sum_{t \in \text{leaves}(T)} R(t) + \alpha \, |\text{leaves}(T)|
- In scikit-learn: tune ccp_alpha via cross-validation
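A sketch of tuning ccp_alpha by cross-validation, assuming X_train and y_train already exist; the pruning path of a fully grown tree supplies the candidate alphas:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate alphas from the cost-complexity pruning path of a fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = np.unique(np.clip(path.ccp_alphas, 0, None))  # guard against tiny negative values

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)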
Regression Trees
- Impurity measure: Mean Squared Error (MSE) or Mean Absolute Error (MAE)
- Leaf prediction: average of the targets in the leaf
- Same control parameters; beware of overfitting on noisy targets
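A minimal regression-tree sketch on synthetic data (the data and the depth are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# criterion="squared_error" (MSE) is the default; "absolute_error" uses MAE instead
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3)
reg.fit(X, y)
print(reg.predict([[np.pi / 2]]))  # each leaf predicts the mean target of its samples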
Bias–Variance Trade-off
- Shallow tree: high bias, low variance (underfits)
- Deep tree: low bias, high variance (overfits)
- Cross-validate depth and leaf sizes to balance performance
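One way to see the trade-off is to cross-validate accuracy across depths; a sketch assuming X and y are already loaded:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Very shallow trees underfit; unlimited depth tends to overfit
for depth in [1, 2, 4, 8, 16, None]:
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth, random_state=0), X, y, cv=5)
    print(f"max_depth={depth}: CV accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")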
Interpretability & Explanations
- Path explanations: 'IF (conditions) THEN prediction'
- Global feature importance (impurity decrease)
- Caveat: impurity-based importance can be biased toward high-cardinality features
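scikit-learn can print the IF/THEN paths and the impurity-based importances directly; a sketch assuming a fitted tree clf and a list of strings feature_names:

from sklearn.tree import export_text

# Human-readable rules, one indented branch per path from root to leaf
print(export_text(clf, feature_names=feature_names))

# Impurity-decrease importances (keep the high-cardinality bias caveat in mind)
for name, imp in zip(feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")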
Handling Practicalities
- Missing values: impute before training or use surrogate splits (if the library supports them)
- Class imbalance: class_weight, balanced subsampling, or threshold tuning
- No need for feature scaling; still wise to encode categoricals consistently
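A sketch of packaging these choices into one pipeline; the column names are placeholders:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

numeric_cols = ["age", "income"]   # placeholder column names
categorical_cols = ["city"]

preprocess = ColumnTransformer([
    # Impute numeric gaps before training (trees do not need scaling)
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    # Impute + consistently encode categoricals
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ]), categorical_cols),
])

model = Pipeline([
    ("prep", preprocess),
    # class_weight="balanced" reweights classes to counter imbalance
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
])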
Exporting Rules / Visualization

from sklearn import tree
import matplotlib.pyplot as plt

# `grid` is a fitted GridSearchCV over a decision tree; feature_names / class_names come from the dataset
plt.figure(figsize=(12, 6))
tree.plot_tree(grid.best_estimator_, feature_names=feature_names,
               class_names=class_names, filled=True)
plt.show()
Pros & Cons (Summary)
- Pros: simple, interpretable, minimal preprocessing, handles interactions
- Cons: unstable to small data changes, prone to overfitting, axis-aligned splits only
- Often used as base learners for ensembles (RF, GBM, XGBoost)