Decision Tree Basics.pptx

shivangisingh564490 · 15 slides · Aug 30, 2025

Slide Content

Decision Trees: Intuition to Implementation
Classification & Regression Trees (CART/ID3) Teaching Deck

Why Decision Trees?
- Interpretable, white-box models with human-readable rules
- Handle mixed data types (numeric + categorical) and non-linear boundaries
- Minimal preprocessing (no scaling required); robust to outliers
- Foundation for powerful ensembles (Random Forests, Gradient Boosting)

Core Idea
- Recursively split the feature space to create regions of high class purity (classification)
- Choose the split that maximizes impurity reduction / information gain at each node
- Continue until a stopping criterion is met, then assign leaf predictions (see the sketch below)
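
A minimal sketch of this recursive loop, assuming NumPy arrays, Gini impurity, binary threshold splits at midpoints of sorted unique values, and a depth cap; it illustrates the idea only and is not how scikit-learn's optimized implementation works.

import numpy as np

def gini(y):
    # Gini impurity of a label vector
    _, counts = np.unique(y, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def best_split(X, y):
    # Exhaustively score (feature, threshold) pairs by impurity reduction
    best_gain, best_feat, best_thr = 0.0, None, None
    parent = gini(y)
    for j in range(X.shape[1]):
        u = np.unique(X[:, j])
        for thr in (u[:-1] + u[1:]) / 2.0:            # midpoints of sorted unique values
            left = X[:, j] <= thr
            gain = parent - left.mean() * gini(y[left]) - (~left).mean() * gini(y[~left])
            if gain > best_gain:
                best_gain, best_feat, best_thr = gain, j, thr
    return best_feat, best_thr, best_gain

def build_tree(X, y, depth=0, max_depth=3):
    feat, thr, _ = best_split(X, y) if depth < max_depth else (None, None, 0.0)
    if feat is None:                                   # no useful split: make a leaf
        classes, counts = np.unique(y, return_counts=True)
        return {"leaf": classes[np.argmax(counts)]}    # majority-class prediction
    mask = X[:, feat] <= thr
    return {"feature": feat, "threshold": thr,
            "left":  build_tree(X[mask],  y[mask],  depth + 1, max_depth),
            "right": build_tree(X[~mask], y[~mask], depth + 1, max_depth)}

# Tiny made-up dataset: the first feature separates the classes
X = np.array([[2.0, 0.0], [3.0, 1.0], [10.0, 0.0], [11.0, 1.0]])
y = np.array([0, 0, 1, 1])
print(build_tree(X, y))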

Impurity Measures (Classification)
- Gini: G = 1 - \sum_k p_k^2
- Entropy: H = -\sum_k p_k \log_2 p_k
- Information gain = parent impurity − weighted sum of child impurities (weights = child sample fractions)
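
A worked example of these formulas on a hypothetical node with 10 samples (6 of class A, 4 of class B), split into children holding (5 A, 1 B) and (1 A, 3 B):

import numpy as np

def gini(p):
    # G = 1 - sum_k p_k^2
    return 1.0 - np.sum(np.asarray(p) ** 2)

def entropy(p):
    # H = -sum_k p_k log2 p_k  (0 log 0 treated as 0)
    p = np.asarray(p, dtype=float)
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

parent = [0.6, 0.4]                      # class proportions in the parent node
left,  n_left  = [5/6, 1/6], 6           # left child: 5 A, 1 B
right, n_right = [1/4, 3/4], 4           # right child: 1 A, 3 B

info_gain = entropy(parent) - (n_left/10) * entropy(left) - (n_right/10) * entropy(right)
gini_gain = gini(parent)    - (n_left/10) * gini(left)    - (n_right/10) * gini(right)
print(f"information gain = {info_gain:.3f}, Gini decrease = {gini_gain:.3f}")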

Split Selection
- Numerical features: try candidate thresholds (e.g., midpoints of sorted unique values)
- Categorical features: group categories (exhaustive or heuristic search)
- Pick the split with the best impurity reduction (ties broken by secondary criteria)
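
To see which thresholds a trained tree actually chose, the fitted estimator's tree_ arrays can be inspected; a small sketch, using the Iris dataset purely as a stand-in:

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
clf = DecisionTreeClassifier(max_depth=2, random_state=0).fit(X, y)

t = clf.tree_
for node in range(t.node_count):
    if t.children_left[node] != -1:      # internal (split) node
        print(f"node {node}: feature {t.feature[node]} <= {t.threshold[node]:.3f}")
    else:
        print(f"node {node}: leaf")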

CART vs. ID3/C4.5
- ID3/C4.5: use entropy / information gain (or Gain Ratio in C4.5), traditionally for categorical features
- CART: uses Gini (classification) and MSE (regression), and builds binary trees
- Modern libraries (e.g., scikit-learn) implement CART-style trees
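
C4.5's Gain Ratio is the information gain divided by the split information (the entropy of the branch sizes), which penalizes attributes that fragment the data into many small branches; a sketch with made-up class counts for a hypothetical three-valued attribute:

import numpy as np

def entropy(counts):
    # Entropy of a discrete distribution given as raw counts
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]
    return -np.sum(p * np.log2(p))

parent_counts = [9, 5]                             # [class yes, class no] at the parent
branch_counts = [[2, 3], [4, 0], [3, 2]]           # per-branch class counts (hypothetical)
branch_sizes  = [sum(b) for b in branch_counts]
n = sum(branch_sizes)

info_gain  = entropy(parent_counts) - sum(s / n * entropy(b)
                                          for s, b in zip(branch_sizes, branch_counts))
split_info = entropy(branch_sizes)                 # entropy of the branch-size distribution
gain_ratio = info_gain / split_info
print(f"gain = {info_gain:.3f}, split info = {split_info:.3f}, gain ratio = {gain_ratio:.3f}")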

Stopping Criteria (Pre-pruning)
- max_depth: limit tree depth
- min_samples_split / min_samples_leaf: minimum samples required to split a node / to form a leaf
- max_leaf_nodes: cap the number of leaves
- min_impurity_decrease: require sufficient gain before splitting (example below)
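
All of these are constructor parameters of scikit-learn's DecisionTreeClassifier; a brief sketch comparing an unconstrained tree to a pre-pruned one (Iris used only as a convenient toy dataset):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

full   = DecisionTreeClassifier(random_state=0).fit(X, y)
pruned = DecisionTreeClassifier(max_depth=3,
                                min_samples_leaf=5,
                                min_impurity_decrease=0.01,
                                random_state=0).fit(X, y)

# Pre-pruning limits both the depth and the number of leaves
print("full  :", full.get_depth(),   "levels,", full.get_n_leaves(),   "leaves")
print("pruned:", pruned.get_depth(), "levels,", pruned.get_n_leaves(), "leaves")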

Post-pruning (Cost-Complexity)
- Grow a large tree, then prune it back by penalizing complexity
- Minimize R_\alpha(T) = \sum_{t \in \text{leaves}(T)} R(t) + \alpha \, |\text{leaves}(T)|
- In scikit-learn: tune ccp_alpha via cross-validation (sketched below)
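
A sketch of that workflow in scikit-learn, assuming the X_train/y_train split used elsewhere in this deck: the pruning path supplies candidate alphas, and cross-validation picks among them.

import numpy as np
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Candidate alphas come from the fully grown tree's pruning path
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)

scores = []
for alpha in path.ccp_alphas:
    clf = DecisionTreeClassifier(random_state=0, ccp_alpha=alpha)
    scores.append(cross_val_score(clf, X_train, y_train, cv=5).mean())

best_alpha = path.ccp_alphas[int(np.argmax(scores))]
print("best ccp_alpha:", best_alpha)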

Regression Trees
- Impurity measure: Mean Squared Error (MSE) or Mean Absolute Error (MAE)
- Leaf prediction: mean of the targets in the leaf (median when the MAE criterion is used)
- Same control parameters; beware of overfitting on noisy targets
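
A minimal DecisionTreeRegressor sketch on synthetic data; the criterion names 'squared_error'/'absolute_error' apply to recent scikit-learn releases (older versions used 'mse'/'mae').

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = np.sort(rng.uniform(0, 5, size=(200, 1)), axis=0)     # one noisy feature
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

reg = DecisionTreeRegressor(criterion="squared_error",     # MSE split criterion
                            max_depth=4,
                            min_samples_leaf=10).fit(X, y)
print(reg.predict([[1.5], [4.0]]))    # each prediction is the mean target of a leaf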

Bias–Variance Trade-off
- Shallow tree: high bias, low variance (underfits)
- Deep tree: low bias, high variance (overfits)
- Cross-validate depth and leaf sizes to balance the two
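
A sketch of the standard diagnostic, comparing training accuracy against cross-validated accuracy as depth grows (Iris again as a stand-in):

from sklearn.datasets import load_iris
from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)

for depth in [1, 2, 3, 5, 10, None]:
    clf = DecisionTreeClassifier(max_depth=depth, random_state=0)
    cv_score    = cross_val_score(clf, X, y, cv=5).mean()
    train_score = clf.fit(X, y).score(X, y)
    # A large train/CV gap at high depth signals variance (overfitting);
    # low scores on both at small depth signal bias (underfitting)
    print(f"max_depth={depth}: train={train_score:.3f}, cv={cv_score:.3f}")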

Interpretability & Explanations
- Path explanations: "IF (conditions) THEN prediction"
- Global feature importance (impurity decrease)
- Caveat: impurity-based importance can be biased toward high-cardinality features (example below)
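
scikit-learn can print the IF/THEN paths with export_text and exposes impurity-based importances via feature_importances_; a small sketch (keep the high-cardinality caveat in mind; permutation importance is a common alternative):

from sklearn.datasets import load_iris
from sklearn.tree import DecisionTreeClassifier, export_text

data = load_iris()
clf = DecisionTreeClassifier(max_depth=3, random_state=0).fit(data.data, data.target)

# Human-readable IF (condition) ... THEN class rules
print(export_text(clf, feature_names=list(data.feature_names)))

# Global impurity-based importances (they sum to 1)
for name, imp in zip(data.feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")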

Handling Practicalities
- Missing values: impute before training, or use surrogate splits if the library supports them
- Class imbalance: class_weight, balanced subsampling, or threshold tuning
- No feature scaling needed; still encode categoricals consistently
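
A sketch tying these together on made-up data: median imputation ahead of the tree and class_weight='balanced' for the rare class, wrapped in a Pipeline.

import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.tree import DecisionTreeClassifier

# Toy numeric data with missing values and a roughly 9:1 class imbalance
rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
X[rng.random(X.shape) < 0.05] = np.nan            # sprinkle in missing values
y = (rng.random(200) < 0.1).astype(int)

model = Pipeline([
    ("impute", SimpleImputer(strategy="median")),                 # fill NaNs before training
    ("tree", DecisionTreeClassifier(class_weight="balanced",      # reweight the rare class
                                    min_samples_leaf=5,
                                    random_state=0)),
])
model.fit(X, y)
print(model.predict(X[:5]))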

scikit-learn Example (Classification)

from sklearn.tree import DecisionTreeClassifier
from sklearn.model_selection import GridSearchCV

clf = DecisionTreeClassifier(random_state=42)
param_grid = {
    'max_depth': [None, 3, 5, 10],
    'min_samples_leaf': [1, 2, 5, 10],
    'ccp_alpha': [0.0, 0.001, 0.01],
}
grid = GridSearchCV(clf, param_grid, cv=5, n_jobs=-1)  # 5-fold CV over all 48 combinations
grid.fit(X_train, y_train)
print(grid.best_params_, grid.best_score_)

Exporting Rules / Visualization

from sklearn import tree
import matplotlib.pyplot as plt

plt.figure(figsize=(12, 6))
tree.plot_tree(grid.best_estimator_,        # best tree found by the grid search above
               feature_names=feature_names,
               class_names=class_names,
               filled=True)                 # color nodes by majority class
plt.show()

Pros & Cons (Summary)
- Pros: simple, interpretable, minimal preprocessing, handles feature interactions
- Cons: unstable under small data changes, prone to overfitting, axis-aligned splits only
- Often used as base learners for ensembles (RF, GBM, XGBoost)