03b-algorithm-data-mining-DTs-gain-ratio.pptx


Slide Content

CSC4316: Data Mining
Algorithms – Decision Trees: Information Gain vs Gain Ratio
Lecturer: Abdullahi Ahamad Shehu (M.Sc. Data Science, M.Sc. Computer Science)
Office: Faculty of Computing Extension

Decision trees purity criteria: Information gain
- Information is measured in bits.
- Entropy is the information required to predict the class, given a probability distribution:
  entropy(p1, p2, ..., pn) = -p1*log2(p1) - p2*log2(p2) - ... - pn*log2(pn)
  where pi is the proportion of instances in class i.
- Information via the entropy of the class distribution:
  info([c1, c2, ..., cn]) = entropy(c1/c, c2/c, ..., cn/c)
  where ci is the number of instances of class i and c = c1 + c2 + ... + cn.
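A minimal Python sketch of these two formulas (the helper names entropy and info are my own, not from the slides):

```python
import math

def entropy(probs):
    """Entropy in bits of a probability distribution; zero-probability terms contribute 0."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def info(counts):
    """info([c1, ..., cn]): entropy of the class distribution given raw class counts."""
    total = sum(counts)
    return entropy(c / total for c in counts)

# Weather data class distribution: 9 "yes" and 5 "no" instances
print(round(info([9, 5]), 3))   # -> 0.94
```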

Decision trees purity criteria: Information gain
- Information gain = the difference between the information contained in the dataset and the expected information after an attribute split.
- Expected information from an attribute split = the weighted average of the information in each partition:
  info([a1, ..., an], [b1, ..., bn], [c1, ..., cn], ...) = A/Σ * info([a1, ..., an]) + B/Σ * info([b1, ..., bn]) + C/Σ * info([c1, ..., cn]) + ...
  where xi is the number of instances of class i within a partition, so A = a1 + ... + an, B = b1 + ... + bn, etc., and Σ = A + B + C + ... is the total number of instances.
- [Diagram: splitting on attribute A with values a, b, ..., x; the root carries info([A_T, A_F]) and the branches A=a, A=b, ..., A=x carry info([a_T, a_F]), info([b_T, b_F]), ..., info([x_T, x_F]) for a two-class (T/F) classification.]
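Building on the earlier sketch, a hedged version of the expected-information and gain calculations (expected_info and gain are assumed names; the layout is one class-count list per attribute value):

```python
def expected_info(partitions):
    """Weighted average of info() over the partitions produced by an attribute split.

    partitions: one class-count list per attribute value,
    e.g. [[2, 3], [4, 0], [3, 2]] for outlook = sunny / cloudy / rainy.
    """
    total = sum(sum(counts) for counts in partitions)
    return sum(sum(counts) / total * info(counts) for counts in partitions)

def gain(parent_counts, partitions):
    """Information gain = info(dataset) - expected info after the split."""
    return info(parent_counts) - expected_info(partitions)

# Weather data split on outlook:
print(round(gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))   # -> 0.247
```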

Highly-branching attributes
- Subsets are more likely to be pure when an attribute has a large number of values.
- Attributes with a large number of values therefore tend to have higher gain.
- Information gain is biased towards choosing attributes with many values, which may result in overfitting.
- Gain ratio compensates for this bias.

Root node: ID code
- info(weather data) = 0.940
- info(ID split) = info([0,1], [0,1], [1,0], ..., [1,0], [0,1]) = 1/14*0 + 1/14*0 + 1/14*0 + ... + 1/14*0 + 1/14*0 = 0
- Gain(ID) = 0.940 - 0 = 0.940 bits
- Gain is maximal for the ID code attribute, yet it is not a useful decision node!
- [Diagram: ID code split with one branch per value a, b, c, ..., m, n, each leading to a single pure leaf (No / Yes / No / ... / No / Yes).]
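Reusing the sketched helpers, the ID-code split (one pure single-instance branch per value) reproduces this behaviour:

```python
# 14 branches, each containing exactly one instance, so every branch is pure (info = 0).
id_partitions = [[1, 0]] * 9 + [[0, 1]] * 5
print(round(gain([9, 5], id_partitions), 3))   # -> 0.94: maximal gain, but a useless split
```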

Gain ratio
- Takes the number and size of branches into account when choosing an attribute.
- Information gain is entropy with respect to the classification.
- Gain ratio normalises information gain by the entropy of the attribute partition itself, which ignores the classification.
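A sketch of gain ratio on top of the helpers above (intrinsic_info and gain_ratio are assumed names; intrinsic_info is just the entropy of the partition sizes, ignoring class labels):

```python
def intrinsic_info(partitions):
    """Entropy of the attribute partition itself; class labels inside each partition are ignored."""
    return info([sum(counts) for counts in partitions])

def gain_ratio(parent_counts, partitions):
    """Gain ratio = information gain normalised by the split's intrinsic information."""
    return gain(parent_counts, partitions) / intrinsic_info(partitions)
```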

Example: Gain ratio for outlook
- Information as before (entropy wrt the classification): info(data) = info([9,5]) = 0.940, info(outlook) = 0.693
- Gain as before: gain(outlook) = 0.940 - 0.693 = 0.247
- Information wrt the outlook partitions (5 sunny, 4 cloudy, 5 rainy):
  info(data wrt outlook) = info([5,4,5]) = -5/14*log2(5/14) - 4/14*log2(4/14) - 5/14*log2(5/14) = 1.577
- GainRatio(outlook) = gain(outlook) / info(data wrt outlook) = 0.247/1.577 = 0.157
- [Diagram: outlook split, root info([9,5]); sunny (weight 5/14) info([2,3]), cloudy (weight 4/14) info([4,0]), rainy (weight 5/14) info([3,2]).]
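With the class counts read off the diagram above, the sketched helpers give the same figures (up to rounding):

```python
outlook = [[2, 3], [4, 0], [3, 2]]             # sunny, cloudy, rainy
print(round(intrinsic_info(outlook), 3))        # -> 1.577, i.e. info([5, 4, 5])
print(round(gain_ratio([9, 5], outlook), 3))    # -> 0.156 (about 0.247 / 1.577)
```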

Example: Gain ratio for temperature
- Gain: info(data) = 0.940, info(temp) = 0.911 bits, gain(temp) = 0.029 bits
- Information wrt the temperature partitions (4 hot, 6 mild, 4 cool):
  info(data wrt temp) = info([4,6,4]) = -4/14*log2(4/14) - 6/14*log2(6/14) - 4/14*log2(4/14) = 1.557
- GainRatio(temp) = gain(temp) / info(data wrt temp) = 0.029/1.557 = 0.019
- [Diagram: temperature split, root info([9,5]); hot (weight 4/14) info([2,2]), mild (weight 6/14) info([4,2]), cool (weight 4/14) info([3,1]).]

Example: Gain ratio for humidity
- Gain: info(data) = 0.940, info(humidity) = 0.788 bits, gain(humidity) = 0.152 bits
- Information wrt the humidity partitions (7 high, 7 normal):
  info(data wrt humidity) = info([7,7]) = -7/14*log2(7/14) - 7/14*log2(7/14) = 1
- GainRatio(humidity) = gain(humidity) / info(data wrt humidity) = 0.152/1 = 0.152
- [Diagram: humidity split, root info([9,5]); high (weight 7/14) info([3,4]), normal (weight 7/14) info([6,1]).]

Example: Gain ratio for windy
- Gain: info(data) = 0.940, info(windy) = 0.892 bits, gain(windy) = 0.048 bits
- Information wrt the windy partitions (6 true, 8 false):
  info(data wrt windy) = info([6,8]) = -6/14*log2(6/14) - 8/14*log2(8/14) = 0.985
- GainRatio(windy) = gain(windy) / info(data wrt windy) = 0.048/0.985 = 0.049
- [Diagram: windy split, root info([9,5]); true (weight 6/14) info([3,3]), false (weight 8/14) info([6,2]).]

Example: Gain ratios summarised
- GainRatio(outlook) = 0.247/1.577 = 0.157
- GainRatio(temp) = 0.029/1.557 = 0.019
- GainRatio(humidity) = 0.152/1 = 0.152
- GainRatio(windy) = 0.048/0.985 = 0.049
- Outlook is still selected as the root node, but humidity is now a much closer second choice.
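The full table can be reproduced with the sketched helpers; the class-count lists below are read off the weather data, and small differences from the slide figures are rounding:

```python
splits = {
    "outlook":  [[2, 3], [4, 0], [3, 2]],   # sunny, cloudy, rainy
    "temp":     [[2, 2], [4, 2], [3, 1]],   # hot, mild, cool
    "humidity": [[3, 4], [6, 1]],           # high, normal
    "windy":    [[3, 3], [6, 2]],           # true, false
}
for name, parts in splits.items():
    print(name, round(gain([9, 5], parts), 3), round(gain_ratio([9, 5], parts), 3))
# outlook 0.247 0.156
# temp 0.029 0.019
# humidity 0.152 0.152
# windy 0.048 0.049
```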

Problems with gain ratio
- Extreme scenario: an attribute with a very high number of values may be rejected even when it has high gain (or even a high gain ratio).
- Gain ratio may overcompensate: it may choose an attribute merely because its intrinsic (split) information is very low.
- Standard solution (see the sketch below): only consider attributes with greater-than-average information gain:
  calculate the information gain for all attributes, calculate the gain ratio only for those with above-average gain, then choose the attribute with the highest gain ratio.
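A sketch of the standard heuristic just described (choose_attribute is an assumed name; this mirrors the C4.5-style rule rather than quoting any particular implementation, and reuses the splits dictionary from the previous sketch):

```python
def choose_attribute(parent_counts, candidate_splits):
    """Highest gain ratio among attributes whose gain is at least the average gain."""
    gains = {a: gain(parent_counts, parts) for a, parts in candidate_splits.items()}
    avg_gain = sum(gains.values()) / len(gains)
    eligible = [a for a, g in gains.items() if g >= avg_gain]
    return max(eligible, key=lambda a: gain_ratio(parent_counts, candidate_splits[a]))

print(choose_attribute([9, 5], splits))   # -> 'outlook' (only outlook and humidity are eligible)
```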

Exercises 3 – Information Gain Ratio
- Use the information gain metric to choose a suitable attribute for the root node of the Credit Risk dataset, which predicts credit risk from credit history, debt, collateral and income. The class attribute is risk.
- Then use gain ratio to choose the root node attribute for the same dataset.