Decision Trees: Data Analytics and Ensemble Methods


About This Presentation

Decision trees: terminology, the ID3 algorithm, and a step-by-step construction example.


Slide Content

Presented by Gayathri Hegde, Research Scholar, Dept. of CSE, UVCE, under the guidance of Dr. P Deepa Shenoy, Dean and Professor, Dept. of CSE, UVCE, Bangalore.

[Diagram] Example decision tree for buying a car: root node Price; Price > 10L -> Decline; otherwise check Mileage (Bad Mileage -> Decline, Good Mileage -> check Safety); Less Safety -> Decline; otherwise check Color (Red -> Decline, Blue -> Buy).

Decision Trees: a simple but powerful learning paradigm. A decision tree is a classification algorithm used in supervised learning. It is a graphical, tree-shaped representation used to determine a course of action: each branch node represents a choice among a number of alternatives, and each leaf node represents a decision. Each node represents a feature (attribute), each link (branch) represents a decision (rule), and each leaf represents an outcome (categorical or continuous).

Example of Decision Tree

Decision Tree Algorithms: Hunt's Algorithm, CART (Classification And Regression Trees), ID3 (Iterative Dichotomiser 3), C4.5, SLIQ, SPRINT, CHAID (Chi-square Automatic Interaction Detection).

Important terminologies used in Decision Trees. Root Node: the topmost node of the decision tree; it holds the entire dataset and has the highest entropy. Decision Node: a mid-level node with two or more branches, where a split into two or more alternatives arises. Leaf Node: a terminal node of the tree that carries the classification or decision.

Important terminologies used in Decision Trees. Entropy: a measure of the amount of uncertainty in the dataset S. For P positive and N negative examples,
Entropy(S) = -(P/(P+N)) log2(P/(P+N)) - (N/(P+N)) log2(N/(P+N))
Average information entropy of an attribute, taken over its values:
I(Attribute) = sum over values v of ((p_v + n_v)/(P + N)) * Entropy(value v)
Information Gain: tells us how much the uncertainty was reduced after splitting the dataset on an attribute:
Information Gain = Entropy(S) - I(Attribute)
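The two quantities above can be written as a small Python sketch (the names entropy, average_info and information_gain are illustrative choices, not from the slides):

```python
# Minimal sketch of the entropy and information-gain formulas above,
# working from positive/negative class counts.
from math import log2

def entropy(p: int, n: int) -> float:
    """Entropy(S) = -P/(P+N) log2(P/(P+N)) - N/(P+N) log2(N/(P+N))."""
    total = p + n
    result = 0.0
    for count in (p, n):
        if count:                       # treat 0 * log2(0) as 0
            frac = count / total
            result -= frac * log2(frac)
    return result

def average_info(value_counts) -> float:
    """I(Attribute) = sum of (p_v + n_v)/(P+N) * Entropy(p_v, n_v) over values."""
    total = sum(p + n for p, n in value_counts)
    return sum((p + n) / total * entropy(p, n) for p, n in value_counts)

def information_gain(p: int, n: int, value_counts) -> float:
    """Information Gain = Entropy(S) - I(Attribute)."""
    return entropy(p, n) - average_info(value_counts)
```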

The ID3 (Iterative Dichotomiser 3) algorithm iteratively (repeatedly) dichotomises (divides) the features into two or more groups at each step. It is a classification algorithm that follows a greedy approach, selecting at each step the attribute that yields the maximum Information Gain (IG), i.e. the minimum entropy.

Steps to construct the Decision Tree:
1. Compute the entropy of the dataset, Entropy(S).
2. For every attribute/feature: calculate the entropy of each of its values, Entropy(A); take the average information entropy for the current attribute; and calculate the gain for the current attribute.
3. Pick the attribute with the highest gain.
4. Repeat until the desired tree is obtained.
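Putting the steps together, a minimal runnable sketch of the ID3 loop might look as follows (the row format, a list of dicts with a target column, and the helper names are assumptions made for illustration):

```python
# Compact sketch of the construction steps above as a recursive ID3 builder.
from collections import Counter
from math import log2

def entropy_of(rows, target):
    counts = Counter(r[target] for r in rows)
    total = sum(counts.values())
    return -sum(c / total * log2(c / total) for c in counts.values())

def gain(rows, attr, target):
    total = len(rows)
    remainder = 0.0
    for value in {r[attr] for r in rows}:
        subset = [r for r in rows if r[attr] == value]
        remainder += len(subset) / total * entropy_of(subset, target)
    return entropy_of(rows, target) - remainder

def id3(rows, attributes, target):
    labels = {r[target] for r in rows}
    if len(labels) == 1:                      # pure node -> leaf
        return labels.pop()
    if not attributes:                        # no attributes left -> majority leaf
        return Counter(r[target] for r in rows).most_common(1)[0][0]
    best = max(attributes, key=lambda a: gain(rows, a, target))
    tree = {best: {}}
    for value in {r[best] for r in rows}:
        subset = [r for r in rows if r[best] == value]
        remaining = [a for a in attributes if a != best]
        tree[best][value] = id3(subset, remaining, target)
    return tree
```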

Decision Tree for the dataset:

Sl No  Outlook   Temperature  Humidity  Windy   Play Tennis
1      Sunny     Hot          High      Weak    No
2      Sunny     Hot          High      Strong  No
3      Overcast  Hot          High      Weak    Yes
4      Rainy     Mild         High      Weak    Yes
5      Rainy     Cool         Normal    Weak    Yes
6      Rainy     Cool         Normal    Strong  No
7      Overcast  Cool         Normal    Strong  Yes
8      Sunny     Mild         High      Weak    No
9      Sunny     Cool         Normal    Weak    Yes
10     Rainy     Mild         Normal    Weak    Yes
11     Sunny     Mild         Normal    Strong  Yes
12     Overcast  Mild         High      Strong  Yes
13     Overcast  Hot          Normal    Weak    Yes
14     Rainy     Mild         High      Strong  No
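For reference, the same table written as a Python list of dicts, so the earlier ID3 sketch can be run against it (column names are assumed to match the table headers):

```python
# The Play Tennis table above as a list of dicts.
columns = ["Outlook", "Temperature", "Humidity", "Windy", "Play Tennis"]
rows = [
    ("Sunny",    "Hot",  "High",   "Weak",   "No"),
    ("Sunny",    "Hot",  "High",   "Strong", "No"),
    ("Overcast", "Hot",  "High",   "Weak",   "Yes"),
    ("Rainy",    "Mild", "High",   "Weak",   "Yes"),
    ("Rainy",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rainy",    "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"),
    ("Sunny",    "Mild", "High",   "Weak",   "No"),
    ("Sunny",    "Cool", "Normal", "Weak",   "Yes"),
    ("Rainy",    "Mild", "Normal", "Weak",   "Yes"),
    ("Sunny",    "Mild", "Normal", "Strong", "Yes"),
    ("Overcast", "Mild", "High",   "Strong", "Yes"),
    ("Overcast", "Hot",  "Normal", "Weak",   "Yes"),
    ("Rainy",    "Mild", "High",   "Strong", "No"),
]
data = [dict(zip(columns, row)) for row in rows]
```

Calling the id3 sketch above as id3(data, columns[:-1], "Play Tennis") should reproduce the tree derived step by step in the following slides.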

Step 1: Calculate Entropy(S). Number of positive examples P = 9, negative examples N = 5, total = 14.
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940 (entropy of the entire dataset)
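A quick numeric check of this value (a throwaway snippet, not part of the slides):

```python
# Entropy of the full dataset with P = 9 positive and N = 5 negative examples.
from math import log2

p, n = 9, 5
total = p + n
entropy_s = -(p / total) * log2(p / total) - (n / total) * log2(n / total)
print(round(entropy_s, 3))   # 0.94
```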

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Outlook

Outlook   P  N  Entropy
Sunny     2  3  0.971
Rainy     3  2  0.971
Overcast  4  0  0

Step 2.2: Calculate the average information entropy.
I(Outlook) = (5/14)(0.971) + (5/14)(0.971) + (4/14)(0) = 0.693

Step 2.3: Calculate the information gain for attribute Outlook.
Information Gain = Entropy(S) - I(Outlook) = 0.940 - 0.693 = 0.247

Attribute      Information Gain
Outlook        0.247
Temperature
Humidity
Windy
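The Outlook numbers can be re-derived directly from the per-value counts above (Sunny 2/3, Rainy 3/2, Overcast 4/0); a small sketch:

```python
# Weighted average entropy and gain for Outlook.
from math import log2

def entropy(p, n):
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c)

counts = {"Sunny": (2, 3), "Rainy": (3, 2), "Overcast": (4, 0)}
total = sum(p + n for p, n in counts.values())                        # 14
i_outlook = sum((p + n) / total * entropy(p, n) for p, n in counts.values())
entropy_s = entropy(9, 5)                                             # ~0.940
print(round(i_outlook, 3), round(entropy_s - i_outlook, 3))
# ~0.694 (shown as 0.693 on the slide) and 0.247
```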

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Temperature

Temperature  P  N  Entropy
Hot          2  2  1
Mild         4  2  0.918
Cool         3  1  0.811

Step 2.2: Calculate the average information entropy.
I(Temperature) = (4/14)(1) + (6/14)(0.918) + (4/14)(0.811) = 0.911

Step 2.3: Calculate the information gain for attribute Temperature.
Information Gain = Entropy(S) - I(Temperature) = 0.940 - 0.911 = 0.029

Attribute      Information Gain
Outlook        0.247
Temperature    0.029
Humidity
Windy

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Humidity

Humidity  P  N  Entropy
High      3  4  0.985
Normal    6  1  0.591

Step 2.2: Calculate the average information entropy.
I(Humidity) = (7/14)(0.985) + (7/14)(0.591) = 0.788

Step 2.3: Calculate the information gain for attribute Humidity.
Information Gain = Entropy(S) - I(Humidity) = 0.940 - 0.788 = 0.152

Attribute      Information Gain
Outlook        0.247
Temperature    0.029
Humidity       0.152
Windy

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Windy

Windy   P  N  Entropy
Strong  3  3  1
Weak    6  2  0.811

Step 2.2: Calculate the average information entropy.
I(Windy) = (6/14)(1) + (8/14)(0.811) = 0.892

Step 2.3: Calculate the information gain for attribute Windy.
Information Gain = Entropy(S) - I(Windy) = 0.940 - 0.892 = 0.048

Attribute      Information Gain
Outlook        0.247
Temperature    0.029
Humidity       0.152
Windy          0.048
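With all four attributes done, the whole gain table can be reproduced in one pass from the per-value (P, N) counts used in steps 2.1 to 2.3 (a sketch under the same assumptions as the earlier helpers):

```python
# Reproduce the information-gain table for all four attributes.
from math import log2

def entropy(p, n):
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c)

value_counts = {
    "Outlook":     [(2, 3), (3, 2), (4, 0)],   # Sunny, Rainy, Overcast
    "Temperature": [(2, 2), (4, 2), (3, 1)],   # Hot, Mild, Cool
    "Humidity":    [(3, 4), (6, 1)],           # High, Normal
    "Windy":       [(3, 3), (6, 2)],           # Strong, Weak
}
entropy_s = entropy(9, 5)
for attr, counts in value_counts.items():
    total = sum(p + n for p, n in counts)
    i_attr = sum((p + n) / total * entropy(p, n) for p, n in counts)
    print(attr, round(entropy_s - i_attr, 3))
# Outlook 0.247, Temperature 0.029, Humidity 0.152, Windy 0.048
```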

Step 3: Pick the attribute with the highest gain. Here the attribute with the maximum information gain is Outlook, so the root node is Outlook.

Attribute      Information Gain
Outlook        0.247
Temperature    0.029
Humidity       0.152
Windy          0.048

The Overcast branch is pure (all Yes), so it becomes a leaf: Outlook -> Overcast: Yes, Sunny: ?, Rainy: ?. Repeat steps 1 to 3 for the Sunny and Rainy branches.

New dataset to be considered: Sunny

Sl No  Outlook  Temperature  Humidity  Windy   Play Tennis
1      Sunny    Hot          High      Weak    No
2      Sunny    Hot          High      Strong  No
8      Sunny    Mild         High      Weak    No
9      Sunny    Cool         Normal    Weak    Yes
11     Sunny    Mild         Normal    Strong  Yes

Step 1: Calculate Entropy(S). Number of positive examples P = 2, negative examples N = 3, total = 5.
Entropy(S) = -(2/5) log2(2/5) - (3/5) log2(3/5) = 0.971 (entropy of the Sunny subset)
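A quick check of the Sunny-branch entropy (throwaway snippet):

```python
# Entropy of the Sunny subset: 2 Yes and 3 No out of 5 rows.
from math import log2

p, n = 2, 3
total = p + n
entropy_sunny = -(p / total) * log2(p / total) - (n / total) * log2(n / total)
print(round(entropy_sunny, 3))   # 0.971
```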

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Temperature

Temperature  P  N  Entropy
Hot          0  2  0
Mild         1  1  1
Cool         1  0  0

Step 2.2: Calculate the average information entropy.
I(Temperature) = (2/5)(0) + (2/5)(1) + (1/5)(0) = 0.4

Step 2.3: Calculate the information gain for attribute Temperature.
Information Gain = Entropy(S) - I(Temperature) = 0.971 - 0.4 = 0.571

Attribute      Information Gain
Temperature    0.571
Humidity
Windy

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Humidity

Humidity  P  N  Entropy
High      0  3  0
Normal    2  0  0

Step 2.2: Calculate the average information entropy.
I(Humidity) = (3/5)(0) + (2/5)(0) = 0

Step 2.3: Calculate the information gain for attribute Humidity.
Information Gain = Entropy(S) - I(Humidity) = 0.971 - 0 = 0.971

Attribute      Information Gain
Temperature    0.571
Humidity       0.971
Windy

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Windy

Windy   P  N  Entropy
Strong  1  1  1
Weak    1  2  0.918

Step 2.2: Calculate the average information entropy.
I(Windy) = (2/5)(1) + (3/5)(0.918) = 0.951

Step 2.3: Calculate the information gain for attribute Windy.
Information Gain = Entropy(S) - I(Windy) = 0.971 - 0.951 = 0.020

Attribute      Information Gain
Temperature    0.571
Humidity       0.971
Windy          0.020

Step 3: Pick the attribute with the highest gain. On the Sunny branch the attribute with the maximum information gain is Humidity, so it becomes the decision node: High -> No, Normal -> Yes.

Attribute      Information Gain
Temperature    0.571
Humidity       0.971
Windy          0.020

Tree so far: Outlook -> Overcast: Yes; Sunny: Humidity (High -> No, Normal -> Yes); Rainy: ?. Repeat steps 1 to 3 for the Rainy branch.
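The branch-level selection can be sketched as a small helper applied to the Sunny-subset counts (the function name best_attribute is an illustrative choice, not from the slides):

```python
# Pick the best splitting attribute on the Sunny branch from per-value counts.
from math import log2

def entropy(p, n):
    total = p + n
    return -sum(c / total * log2(c / total) for c in (p, n) if c)

def best_attribute(entropy_s, value_counts):
    gains = {}
    for attr, counts in value_counts.items():
        total = sum(p + n for p, n in counts)
        i_attr = sum((p + n) / total * entropy(p, n) for p, n in counts)
        gains[attr] = entropy_s - i_attr
    return max(gains, key=gains.get), gains

sunny_counts = {
    "Temperature": [(0, 2), (1, 1), (1, 0)],   # Hot, Mild, Cool
    "Humidity":    [(0, 3), (2, 0)],           # High, Normal
    "Windy":       [(1, 1), (1, 2)],           # Strong, Weak
}
best, gains = best_attribute(entropy(2, 3), sunny_counts)
print(best, {a: round(g, 3) for a, g in gains.items()})
# Humidity {'Temperature': 0.571, 'Humidity': 0.971, 'Windy': 0.02}
```

The same helper applied to the Rainy-subset counts would pick Windy, matching the next slides.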

New dataset to be considered: Rainy

Sl No  Outlook  Temperature  Humidity  Windy   Play Tennis
4      Rainy    Mild         High      Weak    Yes
5      Rainy    Cool         Normal    Weak    Yes
6      Rainy    Cool         Normal    Strong  No
10     Rainy    Mild         Normal    Weak    Yes
14     Rainy    Mild         High      Strong  No

Step 1: Calculate Entropy(S). Number of positive examples P = 3, negative examples N = 2, total = 5.
Entropy(S) = -(3/5) log2(3/5) - (2/5) log2(2/5) = 0.971 (entropy of the Rainy subset)

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Temperature

Temperature  P  N  Entropy
Mild         2  1  0.918
Cool         1  1  1

Step 2.2: Calculate the average information entropy.
I(Temperature) = (3/5)(0.918) + (2/5)(1) = 0.951

Step 2.3: Calculate the information gain for attribute Temperature.
Information Gain = Entropy(S) - I(Temperature) = 0.971 - 0.951 = 0.020

Attribute      Information Gain
Temperature    0.020
Humidity
Windy

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Humidity

Humidity  P  N  Entropy
High      1  1  1
Normal    2  1  0.918

Step 2.2: Calculate the average information entropy.
I(Humidity) = (2/5)(1) + (3/5)(0.918) = 0.951

Step 2.3: Calculate the information gain for attribute Humidity.
Information Gain = Entropy(S) - I(Humidity) = 0.971 - 0.951 = 0.020

Attribute      Information Gain
Temperature    0.020
Humidity       0.020
Windy

Step 2.1: For each attribute, calculate the entropy of each of its values. Attribute: Windy

Windy   P  N  Entropy
Strong  0  2  0
Weak    3  0  0

Step 2.2: Calculate the average information entropy.
I(Windy) = (2/5)(0) + (3/5)(0) = 0

Step 2.3: Calculate the information gain for attribute Windy.
Information Gain = Entropy(S) - I(Windy) = 0.971 - 0 = 0.971

Attribute      Information Gain
Temperature    0.020
Humidity       0.020
Windy          0.971

Step 3: Pick the attribute with the highest gain. On the Rainy branch the attribute with the maximum information gain is Windy, so it becomes the decision node: Strong -> No, Weak -> Yes. All branches now end in leaves, so the tree is complete.

Attribute      Information Gain
Temperature    0.020
Humidity       0.020
Windy          0.971

Final decision tree: Outlook is the root; Overcast -> Yes; Sunny -> Humidity (High -> No, Normal -> Yes); Rainy -> Windy (Strong -> No, Weak -> Yes).
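The finished tree can be written down directly as a nested dict together with a tiny predict helper (the representation and key names are an illustrative choice, not from the slides):

```python
# The final Play Tennis tree and a helper that walks it for one sample.
final_tree = {
    "Outlook": {
        "Overcast": "Yes",
        "Sunny": {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Rainy": {"Windy": {"Strong": "No", "Weak": "Yes"}},
    }
}

def predict(tree, sample):
    # Walk down the tree until a leaf (a plain "Yes"/"No" string) is reached.
    while isinstance(tree, dict):
        attribute = next(iter(tree))
        tree = tree[attribute][sample[attribute]]
    return tree

print(predict(final_tree, {"Outlook": "Sunny", "Humidity": "Normal"}))  # Yes
print(predict(final_tree, {"Outlook": "Rainy", "Windy": "Strong"}))     # No
```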

Advantages and Disadvantages. Advantages: simple to understand and interpret; requires little effort in data preparation; non-linear relationships between parameters do not affect performance. Disadvantages: overfitting, when the data is noisy; instability, since small variations in the data can produce a very different tree.
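As a practical aside not covered in the slides, overfitting is usually mitigated by limiting tree size; a hedged sketch using scikit-learn's DecisionTreeClassifier and its max_depth parameter (dataset choice is arbitrary, purely for illustration):

```python
# Comparing an unrestricted tree with a depth-limited one on held-out data.
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

deep = DecisionTreeClassifier(random_state=0).fit(X_train, y_train)
shallow = DecisionTreeClassifier(max_depth=3, random_state=0).fit(X_train, y_train)

print("unrestricted depth, test accuracy:", deep.score(X_test, y_test))
print("max_depth=3,        test accuracy:", shallow.score(X_test, y_test))
```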