Decision tree algorithm in Machine Learning

About This Presentation

Decision tree


Slide Content

21ES613 - MACHINE LEARNING FOR EMBEDDED APPLICATIONS. Dr T. ANANTHAN, Associate Professor, Dept. of EEE. 1/23/2025

Decision Tree

Decision tree learning
Decision tree learning is a method for approximating discrete-valued target functions, in which the learned function is represented by a decision tree. The algorithm has been successfully applied to a broad range of tasks, from diagnosing medical cases to assessing the credit risk of loan applicants.

Decision tree representation
A decision tree classifies instances by sorting them down the tree from the root to some leaf node, which provides the classification of the instance. Each node in the tree specifies a test of some attribute (feature) of the instance, and each branch descending from that node corresponds to one of the possible values of this feature. An instance is classified by starting at the root node, testing the feature specified by that node, and then moving down the branch corresponding to the value of the feature in the given instance. This process is repeated for the subtree rooted at the new node.

Simple Training Data Set

Days  Outlook   Temperature  Humidity  Wind    Play Tennis?
D1    Sunny     Hot          High      Weak    No
D2    Sunny     Hot          High      Strong  No
D3    Overcast  Hot          High      Weak    Yes
D4    Rain      Mild         High      Weak    Yes
D5    Rain      Cool         Normal    Weak    Yes
D6    Rain      Cool         Normal    Strong  No
D7    Overcast  Cool         Normal    Strong  Yes
D8    Sunny     Mild         High      Weak    No
D9    Sunny     Cool         Normal    Weak    Yes
D10   Rain      Mild         Normal    Weak    Yes
D11   Sunny     Mild         Normal    Strong  Yes
D12   Overcast  Mild         High      Strong  Yes
D13   Overcast  Hot          Normal    Weak    Yes
D14   Rain      Mild         High      Strong  No

A decision tree for <Outlook, Temperature, Humidity, Wind> Play tennis?

[Figure: a decision tree with Outlook at the root (starting node). The Sunny branch leads to an internal node testing Humidity (High leads to No, Normal leads to Yes), the Overcast branch leads directly to the leaf Yes, and the Rain branch leads to an internal node testing Wind (Strong leads to No, Weak leads to Yes). Leaves are terminal nodes carrying the classification; each internal node tests an attribute.]
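As an illustration only (the slide shows this tree as a diagram, not code), the same hypothesis can be written as a nested Python dictionary, with each internal node mapping an attribute to its branches and each leaf holding a class label:

```python
# A minimal sketch of the slide's tree: internal nodes are
# {attribute: {value: subtree, ...}}, leaves are class labels.
play_tennis_tree = {
    "Outlook": {
        "Sunny":    {"Humidity": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}
```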

Decision tree learning
Problem setting:
- Set of possible instances X: each instance x in X is a feature vector, e.g. <Humidity = low, Wind = weak, Outlook = rain, Temp = high>.
- Unknown target function f : X -> Y, where Y = 1 if we play tennis on this day, else 0.
- Set of function hypotheses H = {h | h : X -> Y}: each hypothesis h is a decision tree; the tree sorts x to a leaf, which assigns y.

Decision tree learning
Problem setting:
- Set of possible instances X: each instance x in X is a feature vector x = <x1, x2, ..., xn>.
- Unknown target function f : X -> Y, where Y is discrete-valued.
- Set of function hypotheses H = {h | h : X -> Y}: each hypothesis h is a decision tree.
Input: training examples {<x(i), y(i)>} of the unknown target function f.
Output: the hypothesis h in H that best approximates the target function f.

Decision Tree Algorithm-ID3
ID3 is the basic and simplest decision tree learning algorithm. It grows the decision tree top-down, starting at the root, until the training data are (typically) perfectly classified, and it stops growing the tree at that point. The root node is assigned the best attribute; for example, it selects among Humidity, Outlook, Temperature and Wind. A descendant of the root node is then created for each possible value of this attribute, and the entire process is repeated using the training examples sorted to each descendant. This forms a greedy search for an acceptable decision tree.

Decision Tree Algorithm-ID3
START: node = root
Main loop:
1. A <- the 'best' decision attribute for the next node.
2. Assign A as the decision attribute for the node.
3. For each value of A, create a descendant of the node.
4. Sort the training examples to the leaf nodes.
5. If the training examples are perfectly classified, then STOP; else iterate over the new leaf nodes.
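The deck states this loop in words only; below is a minimal Python sketch of it, assuming training examples are (feature-dict, label) pairs and that a best_attribute function is supplied (ID3's own choice, information gain, is introduced on the following slides):

```python
from collections import Counter

def id3(examples, attributes, best_attribute):
    """Recursive sketch of the ID3 main loop.

    examples       : list of (feature_dict, label) pairs
    attributes     : attribute names still available for testing
    best_attribute : function (examples, attributes) -> attribute name
    """
    labels = [label for _, label in examples]
    # Stop: all training examples at this node have the same class.
    if len(set(labels)) == 1:
        return labels[0]
    # Stop: no attributes left to test; return the majority class.
    if not attributes:
        return Counter(labels).most_common(1)[0][0]
    # Choose the "best" decision attribute A for this node.
    a = best_attribute(examples, attributes)
    node = {a: {}}
    # Create one descendant per observed value of A and sort the examples down to it.
    for value in set(x[a] for x, _ in examples):
        subset = [(x, y) for x, y in examples if x[a] == value]
        remaining = [attr for attr in attributes if attr != a]
        node[a][value] = id3(subset, remaining, best_attribute)
    return node
```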

What is the statistical test to select the attribute?
The statistical property used is information gain, which measures how well a given attribute separates the training examples. Information gain is a good quantitative measure of the worth of an attribute, and ID3 uses it to select among the candidate attributes at each step while growing the tree. A measure commonly used in information theory is entropy, which characterizes the impurity of an arbitrary collection of examples.

Entropy measure
Given a sample of training examples S, let p+ be the proportion of positive examples in S and p- the proportion of negative examples in S. The entropy of S measures its impurity:
Entropy(S) = -p+ log2 p+ - p- log2 p-
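A direct Python translation of this definition (not from the slides; the 0 * log2 0 = 0 convention noted on a later slide is applied), checked against the [9+, 5-] collection used next:

```python
import math

def entropy_binary(p_pos, p_neg):
    """Entropy(S) = -p+ log2(p+) - p- log2(p-), taking 0*log2(0) as 0."""
    return -sum(p * math.log2(p) for p in (p_pos, p_neg) if p > 0)

print(entropy_binary(9 / 14, 5 / 14))   # ~0.940
```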

Entropy Measure
To illustrate, suppose S is a collection of the 14 examples shown in the training table, with 9 positive (+) and 5 negative (-) examples; the notation [9+, 5-] summarizes such a sample of data. Then the entropy of S relative to this Boolean classification is
Entropy([9+, 5-]) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940

Entropy Measure
Entropy(S) = -p+ log2 p+ - p- log2 p-
Notice that the entropy is 0 if all members of S belong to the same class. For example, if all members are positive (p+ = 1), then p- is 0 and Entropy(S) = -1 * log2(1) - 0 * log2(0) = 0 (0 * log2(0) is defined to be 0). The entropy is 1 when the collection contains an equal number of positive and negative examples. If the collection contains unequal numbers of positive and negative examples, the entropy is between 0 and 1. The figure on the slide shows the form of the entropy function relative to a Boolean classification as p+ varies between 0 and 1. Hence, entropy is a measure of the impurity of a collection of training examples.

Entropy measure
One interpretation of entropy from information theory is that it specifies the minimum number of bits of information needed to encode the classification of an arbitrary member of S. For example, if p+ is 1, the receiver knows the drawn example will be positive, so no message need be sent, and the entropy is zero. On the other hand, if p+ is 0.5, one bit is required to indicate whether the drawn example is positive or negative. If p+ is 0.8, then a collection of messages can be encoded using, on average, less than 1 bit per message.

Entropy Measure
So far we have discussed entropy for the case where the target classification is Boolean. More generally, if the target can take on c different values, then the entropy of S is defined as
Entropy(S) = sum over i = 1 to c of -p_i log2 p_i
where p_i is the proportion of S belonging to class i.
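A small Python version of this more general definition, computing the proportions p_i from a list of class labels (again illustrative, not slide code):

```python
import math
from collections import Counter

def entropy(labels):
    """Entropy over c classes: sum over i of -p_i * log2(p_i)."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

print(entropy(["Yes"] * 9 + ["No"] * 5))           # ~0.940 (Boolean case)
print(entropy(["a"] * 4 + ["b"] * 4 + ["c"] * 4))  # ~1.585 (three equal classes)
```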

Information Gain
What is information gain? The measure of the effectiveness of an attribute in classifying the training data is called information gain. It is simply the expected reduction in entropy caused by partitioning the examples according to this attribute. Let Gain(S, A) be the information gain of an attribute A relative to the collection of examples S. The information gain is given by
Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v| / |S|) * Entropy(S_v)

Information Gain
where Values(A) is the set of all possible values of attribute A, and S_v is the subset of S for which attribute A has value v.
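Putting the two formulas together, a minimal sketch of Gain(S, A) in Python, assuming examples are (feature-dict, label) pairs as in the earlier ID3 sketch:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(examples, attribute):
    """Gain(S, A) = Entropy(S) - sum over v in Values(A) of (|S_v|/|S|) * Entropy(S_v)."""
    labels = [y for _, y in examples]
    gain = entropy(labels)
    for value in set(x[attribute] for x, _ in examples):          # Values(A) seen in S
        s_v = [y for x, y in examples if x[attribute] == value]   # labels of S_v
        gain -= (len(s_v) / len(examples)) * entropy(s_v)
    return gain
```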

Information Gain
Example: measure the information gain of the attribute Wind. Information gain is precisely the measure used by ID3 to select the best attribute at each step in growing the decision tree.
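Reading the counts for Wind off the training table gives S = [9+, 5-], S_weak = [6+, 2-] and S_strong = [3+, 3-]; a quick self-contained check (my arithmetic, not slide code) of the gain ID3 would compute:

```python
import math

def H(pos, neg):
    """Binary entropy from positive/negative counts (0*log2(0) taken as 0)."""
    total = pos + neg
    return -sum((c / total) * math.log2(c / total) for c in (pos, neg) if c > 0)

# Gain(S, Wind) = Entropy(S) - (8/14) Entropy(S_weak) - (6/14) Entropy(S_strong)
gain_wind = H(9, 5) - (8 / 14) * H(6, 2) - (6 / 14) * H(3, 3)
print(round(gain_wind, 3))   # ~0.048
```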

Example 1: Construct the decision tree for the given training examples using the ID3 algorithm.

Days  Outlook   Temperature  Humidity  Wind    Play Tennis?
D1    Sunny     Hot          High      Weak    No
D2    Sunny     Hot          High      Strong  No
D3    Overcast  Hot          High      Weak    Yes
D4    Rain      Mild         High      Weak    Yes
D5    Rain      Cool         Normal    Weak    Yes
D6    Rain      Cool         Normal    Strong  No
D7    Overcast  Cool         Normal    Strong  Yes
D8    Sunny     Mild         High      Weak    No
D9    Sunny     Cool         Normal    Weak    Yes
D10   Rain      Mild         Normal    Weak    Yes
D11   Sunny     Mild         Normal    Strong  Yes
D12   Overcast  Mild         High      Strong  Yes
D13   Overcast  Hot          Normal    Weak    Yes
D14   Rain      Mild         High      Strong  No

Solution:
Step 1: Calculate the entropy of the examples.
S = [9+, 5-]
Entropy(S) = -(9/14) log2(9/14) - (5/14) log2(5/14) = 0.940
Step 2: Calculate the information gain of all attributes.
Attribute: Outlook
Values(Outlook) = Sunny, Overcast, Rain
S_sunny = [2+, 3-], Entropy = 0.971; S_overcast = [4+, 0-], Entropy = 0; S_rain = [3+, 2-], Entropy = 0.971
Gain(S, Outlook) = 0.940 - (5/14)(0.971) - (4/14)(0) - (5/14)(0.971) = 0.246

Attribute: Temperature
Values(Temperature) = Hot, Mild, Cool
S_hot = [2+, 2-], Entropy = 1.0; S_mild = [4+, 2-], Entropy = 0.918; S_cool = [3+, 1-], Entropy = 0.8113
Gain(S, Temperature) = 0.940 - (4/14)(1.0) - (6/14)(0.918) - (4/14)(0.8113) = 0.029

Attribute: Humidity
Values(Humidity) = High, Normal
S_high = [3+, 4-], Entropy = 0.985; S_normal = [6+, 1-], Entropy = 0.592
Gain(S, Humidity) = 0.940 - (7/14)(0.985) - (7/14)(0.592) = 0.151

Attribute: Wind
Values(Wind) = Weak, Strong
S_weak = [6+, 2-], Entropy = 0.811; S_strong = [3+, 3-], Entropy = 1.0
Gain(S, Wind) = 0.940 - (8/14)(0.811) - (6/14)(1.0) = 0.048

Step 3: Select the attribute for the root node.
Gain(S, Outlook) = 0.246, Gain(S, Temperature) = 0.029, Gain(S, Humidity) = 0.151, Gain(S, Wind) = 0.048
According to the information gain measure, the Outlook attribute provides the best prediction of the target attribute Play Tennis over the training examples. Therefore, Outlook is selected as the decision attribute for the root node, and branches are created below the root for each of its possible values (Sunny, Overcast, Rain).
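Steps 1-3 can be reproduced with a short self-contained script (illustrative only; the attribute names Outlook, Temp, Hum, Wind are my shorthand for the table's columns, and the printed values match the slide's 0.246 / 0.029 / 0.151 / 0.048 up to rounding):

```python
import math
from collections import Counter

# The 14 PlayTennis training examples from the table above.
rows = [
    ("Sunny", "Hot", "High", "Weak", "No"),          ("Sunny", "Hot", "High", "Strong", "No"),
    ("Overcast", "Hot", "High", "Weak", "Yes"),      ("Rain", "Mild", "High", "Weak", "Yes"),
    ("Rain", "Cool", "Normal", "Weak", "Yes"),       ("Rain", "Cool", "Normal", "Strong", "No"),
    ("Overcast", "Cool", "Normal", "Strong", "Yes"), ("Sunny", "Mild", "High", "Weak", "No"),
    ("Sunny", "Cool", "Normal", "Weak", "Yes"),      ("Rain", "Mild", "Normal", "Weak", "Yes"),
    ("Sunny", "Mild", "Normal", "Strong", "Yes"),    ("Overcast", "Mild", "High", "Strong", "Yes"),
    ("Overcast", "Hot", "Normal", "Weak", "Yes"),    ("Rain", "Mild", "High", "Strong", "No"),
]
attributes = ["Outlook", "Temp", "Hum", "Wind"]
examples = [(dict(zip(attributes, r[:4])), r[4]) for r in rows]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute):
    labels = [y for _, y in examples]
    g = entropy(labels)
    for v in set(x[attribute] for x, _ in examples):
        s_v = [y for x, y in examples if x[attribute] == v]
        g -= (len(s_v) / len(examples)) * entropy(s_v)
    return g

for a in attributes:
    print(a, round(gain(examples, a), 3))
# Outlook ~0.247, Temp ~0.029, Hum ~0.152, Wind ~0.048  ->  Outlook becomes the root
```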

Decision Tree (after the first split)

[Figure: Outlook at the root over {D1, D2, ..., D14} [9+, 5-]. Sunny branch: {D1, D2, D8, D9, D11} [2+, 3-], still to be expanded. Overcast branch: {D3, D7, D12, D13} [4+, 0-], a pure subset, so this branch becomes the leaf Yes. Rain branch: {D4, D5, D6, D10, D14} [3+, 2-], still to be expanded.]

Growing the Sunny branch. The training examples sorted to the Sunny branch are:

Days  Temp  Hum     Wind    Play Tennis
D1    Hot   High    Weak    No
D2    Hot   High    Strong  No
D8    Mild  High    Weak    No
D9    Cool  Normal  Weak    Yes
D11   Mild  Normal  Strong  Yes

S_sunny = [2+, 3-], Entropy(S_sunny) = 0.971

Attribute: Temp
Values(Temp) = Hot, Mild, Cool
Gain(S_sunny, Temp) = 0.971 - (2/5)(0) - (2/5)(1.0) - (1/5)(0) = 0.571

Attribute: Hum
Values(Hum) = High, Normal
Gain(S_sunny, Hum) = 0.971 - (3/5)(0) - (2/5)(0) = 0.971

Attribute: Wind
Values(Wind) = Weak, Strong
Gain(S_sunny, Wind) = 0.971 - (3/5)(0.918) - (2/5)(1.0) = 0.020

Humidity has the highest information gain of the three attributes, so it is selected as the decision attribute for the Sunny branch (see the check below).
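The three gains for the Sunny branch can be reproduced on just its five examples (illustrative code, not from the deck):

```python
import math
from collections import Counter

# The Sunny branch's training examples (D1, D2, D8, D9, D11).
sunny = [
    ({"Temp": "Hot",  "Hum": "High",   "Wind": "Weak"},   "No"),
    ({"Temp": "Hot",  "Hum": "High",   "Wind": "Strong"}, "No"),
    ({"Temp": "Mild", "Hum": "High",   "Wind": "Weak"},   "No"),
    ({"Temp": "Cool", "Hum": "Normal", "Wind": "Weak"},   "Yes"),
    ({"Temp": "Mild", "Hum": "Normal", "Wind": "Strong"}, "Yes"),
]

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def gain(examples, attribute):
    labels = [y for _, y in examples]
    g = entropy(labels)
    for v in set(x[attribute] for x, _ in examples):
        s_v = [y for x, y in examples if x[attribute] == v]
        g -= (len(s_v) / len(examples)) * entropy(s_v)
    return g

for a in ("Temp", "Hum", "Wind"):
    print(a, round(gain(sunny, a), 3))
# Temp ~0.571, Hum ~0.971, Wind ~0.020  ->  Humidity is chosen for the Sunny branch
```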

Decision Tree (after expanding the Sunny branch)

[Figure: Outlook at the root over {D1, D2, ..., D14} [9+, 5-]. Sunny branch: Humidity over {D1, D2, D8, D9, D11} [2+, 3-], with High leading to {D1, D2, D8} No and Normal leading to {D9, D11} Yes. Overcast branch: {D3, D7, D12, D13} [4+, 0-] Yes. Rain branch: {D4, D5, D6, D10, D14} [3+, 2-], still to be expanded.]

Expanding the Rain branch: Wind has the highest information gain. The training examples sorted to the Rain branch are:

Days  Temp  Hum     Wind    Play Tennis
D4    Mild  High    Weak    Yes
D5    Cool  Normal  Weak    Yes
D6    Cool  Normal  Strong  No
D10   Mild  Normal  Weak    Yes
D14   Mild  High    Strong  No

S_rain = [3+, 2-], Entropy(S_rain) = 0.971. Splitting on Wind gives pure subsets (Weak = {D4, D5, D10}, all Yes; Strong = {D6, D14}, all No), so Gain(S_rain, Wind) = 0.971 is the highest of the three attributes and Wind is selected as the decision attribute for the Rain branch.

Decision Tree (Answer):

[Figure: the final tree. Outlook at the root over {D1, D2, ..., D14} [9+, 5-]. Sunny branch: Humidity over {D1, D2, D8, D9, D11} [2+, 3-], with High leading to {D1, D2, D8} No and Normal leading to {D9, D11} Yes. Overcast branch: {D3, D7, D12, D13} [4+, 0-] Yes. Rain branch: Wind over {D4, D5, D6, D10, D14} [3+, 2-], with Strong leading to {D6, D14} No and Weak leading to {D4, D5, D10} Yes.]

Predicting an unseen instance
Using the learned tree, consider the unseen instance <Outlook = Sunny, Temp = Hot, Hum = Normal, Wind = Weak>. Starting at the root, Outlook = Sunny leads to the Humidity node; Humidity = Normal leads to the leaf Yes, so the tree predicts Play Tennis = Yes.
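A minimal sketch of this prediction step, assuming the nested-dict tree representation used in the earlier sketches (the slides themselves only show the traversal graphically):

```python
# The learned PlayTennis tree as a nested dict: each internal node tests one attribute.
tree = {
    "Outlook": {
        "Sunny":    {"Hum": {"High": "No", "Normal": "Yes"}},
        "Overcast": "Yes",
        "Rain":     {"Wind": {"Strong": "No", "Weak": "Yes"}},
    }
}

def classify(node, instance):
    """Follow branches from the root until a leaf (a class label) is reached."""
    while isinstance(node, dict):
        attribute = next(iter(node))                 # attribute tested at this node
        node = node[attribute][instance[attribute]]  # follow the matching branch
    return node

unseen = {"Outlook": "Sunny", "Temp": "Hot", "Hum": "Normal", "Wind": "Weak"}
print(classify(tree, unseen))   # Yes  (Sunny -> Humidity = Normal -> Yes)
```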

Further example instances to classify with the learned tree:
<Sunny, Hot, High, Strong>, <Sunny, Hot, High, Weak>, <Overcast, Hot, High, Weak>, <Rain, Hot, High, Weak>, <Overcast, Cool, High, Weak>, <Overcast, Mild, High, Weak>

How can random noise in the training examples lead to overfitting?
Suppose the label of D11 is corrupted by noise, so that <Sunny, Mild, Normal, Strong> is recorded as Play Tennis = No instead of Yes:

Days  Outlook   Temp  Hum     Wind    Play Tennis
D1    Sunny     Hot   High    Weak    No
D2    Sunny     Hot   High    Strong  No
D3    Overcast  Hot   High    Weak    Yes
D4    Rain      Mild  High    Weak    Yes
D5    Rain      Cool  Normal  Weak    Yes
D6    Rain      Cool  Normal  Strong  No
D7    Overcast  Cool  Normal  Strong  Yes
D8    Sunny     Mild  High    Weak    No
D9    Sunny     Cool  Normal  Weak    Yes
D10   Rain      Mild  Normal  Weak    Yes
D11   Sunny     Mild  Normal  Strong  No    <- noise
D12   Overcast  Mild  High    Strong  Yes
D13   Overcast  Hot   Normal  Weak    Yes
D14   Rain      Mild  High    Strong  No

The Sunny/Normal subset {D9, D11} is no longer pure, so ID3 must add further tests below that node just to fit the mislabeled example. Thus overfitting results in decision trees that are more complex than necessary.

How can we avoid overfitting in a decision tree?
A decision tree can grow very large; this sometimes creates extra, unnecessary branches (overfitting), which slows the performance of the decision tree. The tree can be pruned by removing branches that do not add much value. Pruning reduces the size of the tree and improves the accuracy of its predictions. Therefore, overfitting can be avoided by pruning.

What is pruning?
Pruning is the process of removing less important branches, which improves the performance of a decision tree.
What are the types of pruning?
Pre-pruning and post-pruning.

What is pre-pruning?
Stop growing the tree before it reaches the point where it perfectly classifies the training data.
What is post-pruning?
Remove branches after the decision tree is complete.
Which one is found to be more successful in practice?
Post-pruning.
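The deck stops at the definitions; as one concrete illustration (an assumption on my part, since the slides build ID3 by hand rather than use a library), scikit-learn's DecisionTreeClassifier exposes pre-pruning through growth limits such as max_depth and min_samples_leaf, and post-pruning through cost-complexity pruning via ccp_alpha:

```python
# Illustration only: scikit-learn's CART trees on a bundled dataset,
# not the hand-built ID3 example above.
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# Pre-pruning: stop growth early with limits on depth and leaf size.
pre = DecisionTreeClassifier(criterion="entropy", max_depth=3,
                             min_samples_leaf=5, random_state=0).fit(X_train, y_train)

# Post-pruning: grow the full tree, then cut it back with cost-complexity pruning.
post = DecisionTreeClassifier(criterion="entropy", ccp_alpha=0.01,
                              random_state=0).fit(X_train, y_train)

print(pre.score(X_test, y_test), post.score(X_test, y_test))
```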