Decision Trees: Intuition to Implementation
Classification & Regression Trees (CART/ID3) — Teaching Deck
Why Decision Trees?
- Interpretable, white-box models with human-readable rules
- Handle mixed data types (numeric + categorical) and non-linear boundaries
- Minimal preprocessing (no scaling required); robust to outliers
- Foundation for powerful ensembles (Random Forests, Gradient Boosting)
Core Idea
- Recursively split the feature space to create regions with high class purity (classification)
- Choose the split that maximizes impurity reduction / information gain at each node
- Continue until a stopping criterion is met; then assign leaf predictions
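A minimal sketch of this fit-then-predict loop with scikit-learn; the iris dataset and the settings below are illustrative additions, not from the slides:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# fit() performs the greedy recursive partitioning; each leaf stores a class prediction
clf = DecisionTreeClassifier(criterion="gini", random_state=0)
clf.fit(X_train, y_train)
print("test accuracy:", clf.score(X_test, y_test))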
Impurity Measures (Classification)
- Gini: G = 1 - \sum_k p_k^2
- Entropy: H = -\sum_k p_k \log_2 p_k
- Information Gain = parent impurity − weighted child impurities
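A short NumPy sketch of these three quantities (the function names are illustrative, not part of any library):

import numpy as np

def gini(labels):
    # G = 1 - sum_k p_k^2
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return 1.0 - np.sum(p ** 2)

def entropy(labels):
    # H = -sum_k p_k log2 p_k
    _, counts = np.unique(labels, return_counts=True)
    p = counts / counts.sum()
    return -np.sum(p * np.log2(p))

def information_gain(parent, children, impurity=entropy):
    # parent impurity minus the sample-weighted impurity of the child nodes
    n = len(parent)
    weighted = sum(len(c) / n * impurity(c) for c in children)
    return impurity(parent) - weighted

# Example: a perfect split of [0,0,1,1] yields a gain of 1 bit
print(information_gain(np.array([0, 0, 1, 1]), [np.array([0, 0]), np.array([1, 1])]))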
Split Selection
- Numerical features: try candidate thresholds (e.g., midpoints of sorted unique values)
- Categorical features: group categories (may be exhaustive or heuristic)
- Pick the split with the best impurity reduction (ties broken by secondary criteria)
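A sketch of the threshold search for a single numeric feature; x and y are NumPy arrays, impurity is any impurity function (e.g., the gini() helper from the previous sketch), and all names here are illustrative:

import numpy as np

def best_threshold(x, y, impurity):
    # Candidate thresholds: midpoints between consecutive sorted unique values
    values = np.sort(np.unique(x))
    thresholds = (values[:-1] + values[1:]) / 2.0
    parent = impurity(y)
    best = (None, 0.0)  # (threshold, impurity reduction)
    for t in thresholds:
        left, right = y[x <= t], y[x > t]
        weighted = (len(left) * impurity(left) + len(right) * impurity(right)) / len(y)
        gain = parent - weighted
        if gain > best[1]:
            best = (t, gain)
    return best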
CART vs. ID3/C4.5
- ID3/C4.5: uses Entropy/Information Gain (or Gain Ratio), often for categorical features
- CART: uses Gini (classification) and MSE (regression), builds binary trees
- Modern libraries (e.g., scikit-learn) implement CART-style trees
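In scikit-learn the impurity choice is simply the criterion argument of the CART-style estimators; the values below are for recent scikit-learn versions:

from sklearn.tree import DecisionTreeClassifier, DecisionTreeRegressor

# Gini (the default) vs. an entropy / information-gain criterion
clf_gini = DecisionTreeClassifier(criterion="gini")
clf_entropy = DecisionTreeClassifier(criterion="entropy")

# Regression trees minimize squared error (MSE) by default
reg = DecisionTreeRegressor(criterion="squared_error")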
Stopping Criteria (Pre-pruning)
- max_depth: limit tree depth
- min_samples_split / min_samples_leaf: minimum samples to split a node / required at a leaf
- max_leaf_nodes: cap the number of leaves
- min_impurity_decrease: require sufficient gain to split
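These pre-pruning controls map one-to-one onto DecisionTreeClassifier constructor arguments; the numeric values below are illustrative, not recommendations:

from sklearn.tree import DecisionTreeClassifier

clf = DecisionTreeClassifier(
    max_depth=4,                # limit tree depth
    min_samples_split=20,       # need at least 20 samples to split a node
    min_samples_leaf=10,        # every leaf keeps at least 10 samples
    max_leaf_nodes=16,          # cap the number of leaves
    min_impurity_decrease=1e-3, # only split if the impurity reduction is large enough
)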
Post-pruning (Cost-Complexity)
- Grow a large tree, then prune back by penalizing complexity
- Minimize: R_\alpha(T) = \sum_{t \in \text{leaves}(T)} R(t) + \alpha \, |\text{leaves}(T)|
- In scikit-learn: tune ccp_alpha via cross-validation
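A sketch of tuning ccp_alpha by cross-validation, assuming X_train and y_train already exist; the pruning path of a fully grown tree supplies the candidate alphas:

import numpy as np
from sklearn.model_selection import GridSearchCV
from sklearn.tree import DecisionTreeClassifier

# Candidate alphas from the cost-complexity pruning path of a fully grown tree
path = DecisionTreeClassifier(random_state=0).cost_complexity_pruning_path(X_train, y_train)
alphas = np.unique(np.clip(path.ccp_alphas, 0, None))  # guard against tiny negative values

grid = GridSearchCV(
    DecisionTreeClassifier(random_state=0),
    param_grid={"ccp_alpha": alphas},
    cv=5,
)
grid.fit(X_train, y_train)
print(grid.best_params_)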
Regression Trees
- Impurity measure: Mean Squared Error (MSE) or Mean Absolute Error (MAE)
- Leaf prediction: average of the targets in the leaf
- Same control parameters; beware of overfitting on noisy targets
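A minimal regression-tree sketch on synthetic data (the data and the depth are illustrative):

import numpy as np
from sklearn.tree import DecisionTreeRegressor

rng = np.random.default_rng(0)
X = rng.uniform(0, 2 * np.pi, size=(200, 1))
y = np.sin(X).ravel() + rng.normal(scale=0.1, size=200)

# criterion="squared_error" (MSE) is the default; "absolute_error" uses MAE instead
reg = DecisionTreeRegressor(criterion="squared_error", max_depth=3)
reg.fit(X, y)
print(reg.predict([[np.pi / 2]]))  # each leaf predicts the mean target of its samples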
Bias–Variance Trade-off
- Shallow tree: high bias, low variance (underfits)
- Deep tree: low bias, high variance (overfits)
- Cross-validate depth and leaf sizes to balance performance
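One way to see the trade-off is to cross-validate accuracy across depths; a sketch assuming X and y are already loaded:

from sklearn.model_selection import cross_val_score
from sklearn.tree import DecisionTreeClassifier

# Very shallow trees underfit; unlimited depth tends to overfit
for depth in [1, 2, 4, 8, 16, None]:
    scores = cross_val_score(DecisionTreeClassifier(max_depth=depth, random_state=0), X, y, cv=5)
    print(f"max_depth={depth}: CV accuracy = {scores.mean():.3f} +/- {scores.std():.3f}")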
Interpretability & Explanations
- Path explanations: 'IF (conditions) THEN prediction'
- Global feature importance (impurity decrease)
- Caveat: impurity-based importance can be biased toward high-cardinality features
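scikit-learn can print the IF/THEN paths and the impurity-based importances directly; a sketch assuming a fitted tree clf and a list of strings feature_names:

from sklearn.tree import export_text

# Human-readable rules, one indented branch per path from root to leaf
print(export_text(clf, feature_names=feature_names))

# Impurity-decrease importances (keep the high-cardinality bias caveat in mind)
for name, imp in zip(feature_names, clf.feature_importances_):
    print(f"{name}: {imp:.3f}")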
Handling Practicalities
- Missing values: impute before training or use surrogate splits (if the library supports them)
- Class imbalance: class_weight, balanced subsampling, or threshold tuning
- No need for feature scaling; still wise to encode categoricals consistently
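A sketch of packaging these choices into one pipeline; the column names are placeholders:

from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OrdinalEncoder
from sklearn.tree import DecisionTreeClassifier

numeric_cols = ["age", "income"]   # placeholder column names
categorical_cols = ["city"]

preprocess = ColumnTransformer([
    # Impute numeric gaps before training (trees do not need scaling)
    ("num", SimpleImputer(strategy="median"), numeric_cols),
    # Impute + consistently encode categoricals
    ("cat", Pipeline([
        ("impute", SimpleImputer(strategy="most_frequent")),
        ("encode", OrdinalEncoder(handle_unknown="use_encoded_value", unknown_value=-1)),
    ]), categorical_cols),
])

model = Pipeline([
    ("prep", preprocess),
    # class_weight="balanced" reweights classes to counter imbalance
    ("tree", DecisionTreeClassifier(class_weight="balanced", random_state=0)),
])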
Exporting Rules / Visualization

from sklearn import tree
import matplotlib.pyplot as plt

# `grid` is a fitted GridSearchCV over a decision tree; feature_names / class_names come from the dataset
plt.figure(figsize=(12, 6))
tree.plot_tree(grid.best_estimator_, feature_names=feature_names,
               class_names=class_names, filled=True)
plt.show()
Pros & Cons (Summary)
- Pros: simple, interpretable, minimal preprocessing, handles interactions
- Cons: unstable to small data changes, prone to overfitting, axis-aligned splits only
- Often used as base learners for ensembles (RF, GBM, XGBoost)