Id3 algorithm

1,256 views 22 slides Mar 07, 2020

About This Presentation

In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan that generates a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm and is typically used in the machine learning and natural language processing domains.


Slide Content

ID3 Algorithm

Abstract ID3 builds a decision tree from a fixed set of examples, and the resulting tree is used to classify future samples. Each example has several attributes and belongs to a class. The leaf nodes of the decision tree contain a class name, whereas a non-leaf node is a decision node: an attribute test with one branch for each possible value of the attribute. ID3 uses information gain to decide which attribute goes into a decision node.

Algorithm
1. Calculate the entropy of every attribute using the data set.
2. Split the set into subsets using the attribute for which entropy is minimum (or, equivalently, information gain is maximum).
3. Make a decision tree node containing that attribute.
4. Recurse on the subsets using the remaining attributes.
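
The steps above can be sketched as a short recursive function. A minimal sketch in Python, assuming examples are represented as dicts mapping attribute names to values (this representation and the helper names are illustrative, not from the slides):

```python
from collections import Counter
from math import log2

def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    return -sum((n / total) * log2(n / total)
                for n in Counter(labels).values())

def id3(examples, labels, attributes):
    """Build a decision tree: nested dicts for decision nodes,
    a bare class label at each leaf."""
    if len(set(labels)) == 1:          # pure node: leaf with that class
        return labels[0]
    if not attributes:                 # no attributes left: majority class
        return Counter(labels).most_common(1)[0][0]

    def weighted_entropy(attr):
        """Weighted entropy of the subsets produced by splitting on attr."""
        total = len(labels)
        acc = 0.0
        for value in set(ex[attr] for ex in examples):
            subset = [lab for ex, lab in zip(examples, labels)
                      if ex[attr] == value]
            acc += len(subset) / total * entropy(subset)
        return acc

    # Minimum weighted entropy is the same as maximum information gain.
    best = min(attributes, key=weighted_entropy)
    rest = [a for a in attributes if a != best]
    tree = {best: {}}
    for value in set(ex[best] for ex in examples):
        sub_ex = [ex for ex in examples if ex[best] == value]
        sub_lab = [lab for ex, lab in zip(examples, labels)
                   if ex[best] == value]
        tree[best][value] = id3(sub_ex, sub_lab, rest)
    return tree
```

Each recursive call removes the chosen attribute, so the recursion always terminates at a pure or attribute-exhausted leaf.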

Entropy and information gain Entropy is a measure of the randomness in the information being processed. If the sample is completely homogeneous the entropy is zero, and if the sample is equally divided between classes the entropy is one. Entropy can be calculated as: Entropy(S) = Σ −p(I) · log₂ p(I). Information gain is based on the decrease in entropy after a data set is split on an attribute. It can be calculated as: Gain(S, A) = Entropy(S) − Σ [ p(S|A) · Entropy(S|A) ]
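
Both formulas translate directly into code. A minimal sketch in Python (the function names and the list-of-tuples row format are my own, not from the slides):

```python
from math import log2

def entropy(labels):
    """Entropy(S) = sum over classes I of -p(I) * log2 p(I)."""
    total = len(labels)
    return -sum((labels.count(c) / total) * log2(labels.count(c) / total)
                for c in set(labels))

def information_gain(rows, labels, attr_index):
    """Gain(S, A): entropy of S minus the weighted entropy of the
    subsets produced by splitting S on attribute A."""
    total = len(labels)
    remainder = 0.0
    for value in set(row[attr_index] for row in rows):
        subset = [lab for row, lab in zip(rows, labels)
                  if row[attr_index] == value]
        remainder += len(subset) / total * entropy(subset)
    return entropy(labels) - remainder
```

For the 14-day tennis data used below, `entropy(['yes'] * 9 + ['no'] * 5)` comes out to about 0.940, matching the figure on the next slide.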

Decision tree for deciding if tennis is playable, using data from the past 14 days

Entropy(Decision) = −p(Yes) · log₂ p(Yes) − p(No) · log₂ p(No) = −(9/14) · log₂(9/14) − (5/14) · log₂(5/14) = 0.940
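
This arithmetic is easy to check with a couple of lines of Python:

```python
from math import log2

# Entropy of the Decision column: 9 yes and 5 no out of 14 days.
p_yes, p_no = 9 / 14, 5 / 14
h = -p_yes * log2(p_yes) - p_no * log2(p_no)
print(f"{h:.3f}")  # → 0.940
```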

Wind factor on decision The Wind attribute has two labels: weak and strong. We need to calculate Entropy(Decision | Wind=Weak) and Entropy(Decision | Wind=Strong) respectively.

Weak wind factor There are 8 instances with weak wind; the decision is no for 2 of them and yes for 6. Entropy(Decision | Wind=Weak) = −p(No) · log₂ p(No) − p(Yes) · log₂ p(Yes) = −(2/8) · log₂(2/8) − (6/8) · log₂(6/8) = 0.811

Strong wind factor There are 6 instances with strong wind, and the decision is divided into two equal parts. Entropy(Decision | Wind=Strong) = −(3/6) · log₂(3/6) − (3/6) · log₂(3/6) = 1

Wind factor on decision The information gain can now be calculated as: Gain(Decision, Wind) = Entropy(Decision) − [ p(Wind=Weak) · Entropy(Decision | Wind=Weak) ] − [ p(Wind=Strong) · Entropy(Decision | Wind=Strong) ] = 0.940 − [ (8/14) · 0.811 ] − [ (6/14) · 1 ] = 0.048
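
The same computation in code, using a small helper for the entropy of a yes/no split (the helper name is mine):

```python
from math import log2

def h(yes, total):
    """Entropy of a subset with `yes` positive labels out of `total`."""
    probs = (yes / total, (total - yes) / total)
    return -sum(p * log2(p) for p in probs if p > 0)

# Gain(Decision, Wind) = Entropy(Decision)
#   - (8/14) * Entropy(weak subset) - (6/14) * Entropy(strong subset)
gain_wind = h(9, 14) - (8 / 14) * h(6, 8) - (6 / 14) * h(3, 6)
print(f"{gain_wind:.3f}")  # → 0.048
```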

Other factors on decision Applying the same calculation to the other columns gives: Gain(Decision, Outlook) = 0.246, Gain(Decision, Temperature) = 0.029, Gain(Decision, Humidity) = 0.151. Outlook has the highest gain, so it becomes the root node, with branches Sunny, Overcast, and Rainy.

Overcast outlook on decision The decision is always yes when the outlook is overcast.

[Tree so far: Outlook → Overcast ⇒ Yes; the Sunny and Rainy branches are still open]

Sunny outlook on decision There are 5 instances with a sunny outlook: 3/5 are no and 2/5 are yes. Gain(Outlook=Sunny | Temperature) = 0.570, Gain(Outlook=Sunny | Humidity) = 0.970, Gain(Outlook=Sunny | Wind) = 0.019
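
The humidity figure can be verified directly: the sunny subset (2 yes, 3 no) has entropy of about 0.970, and humidity splits it into two pure subsets, so the gain equals the subset entropy. A quick check (the helper name is mine):

```python
from math import log2

def h(yes, total):
    """Entropy of a subset with `yes` positive labels out of `total`."""
    probs = (yes / total, (total - yes) / total)
    return -sum(p * log2(p) for p in probs if p > 0)

# Sunny subset: 2 yes / 3 no. Humidity = high -> 3 no (pure) and
# humidity = normal -> 2 yes (pure), so both split entropies are 0.
gain_humidity = h(2, 5) - (3 / 5) * h(0, 3) - (2 / 5) * h(2, 2)
print(f"{gain_humidity:.2f}")  # → 0.97
```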

[Tree so far: Outlook → Overcast ⇒ Yes; Sunny → Humidity (Normal / High); the Rainy branch is still open]

The decision is always no when humidity is high, and always yes when humidity is normal.

[Tree so far: Outlook → Overcast ⇒ Yes; Sunny → Humidity: Normal ⇒ Yes, High ⇒ No; the Rainy branch is still open]

Rain outlook on decision The information gains for the rain outlook are: Gain(Outlook=Rain | Temperature) = 0.02, Gain(Outlook=Rain | Humidity) = 0.02, Gain(Outlook=Rain | Wind) = 0.971
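
The wind gain checks out the same way: the rain subset (3 yes, 2 no) has entropy of about 0.971, and wind splits it into two pure subsets, so the gain is the full subset entropy (the helper name is mine):

```python
from math import log2

def h(yes, total):
    """Entropy of a subset with `yes` positive labels out of `total`."""
    probs = (yes / total, (total - yes) / total)
    return -sum(p * log2(p) for p in probs if p > 0)

# Rain subset: 3 yes / 2 no. Wind = weak -> 3 yes (pure) and
# wind = strong -> 2 no (pure), so both split entropies are 0.
gain_wind_rain = h(3, 5) - (3 / 5) * h(3, 3) - (2 / 5) * h(0, 2)
print(f"{gain_wind_rain:.3f}")  # → 0.971
```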

[Tree so far: Outlook → Overcast ⇒ Yes; Sunny → Humidity: Normal ⇒ Yes, High ⇒ No; Rainy → Wind (Strong / Weak)]

The decision is always yes when the wind is weak and the outlook is rain, and always no when the wind is strong and the outlook is rain.

[Final tree: Outlook → Overcast ⇒ Yes; Sunny → Humidity: Normal ⇒ Yes, High ⇒ No; Rainy → Wind: Weak ⇒ Yes, Strong ⇒ No]

Thank you