CSC4316: DATA MINING
Algorithms – Decision Trees: Information Gain vs Gain Ratio
Lecturer: Abdullahi Ahamad Shehu (M.Sc. Data Science, M.Sc. Computer Science)
Office: Faculty of Computing Extension
Decision tree purity criteria: Information gain
Information is measured in bits
Entropy: the information required to predict the class, given a probability distribution
  entropy(p1, p2, …, pn) = -p1*log2(p1) - p2*log2(p2) - … - pn*log2(pn)
  where pi is the proportion of instances in class i
Information via the entropy of the class distribution
  info([c1, c2, …, cn]) = entropy(c1/c, c2/c, …, cn/c)
  where ci is the number of instances of class i and c = c1 + c2 + … + cn
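A minimal Python sketch of the two formulas above (not part of the original slides; the helper names entropy and info are my own):

```python
import math

def entropy(probabilities):
    """entropy(p1,...,pn) = -p1*log2(p1) - ... - pn*log2(pn); zero proportions contribute 0."""
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

def info(class_counts):
    """info([c1,...,cn]) = entropy(c1/c, ..., cn/c) with c = c1 + ... + cn."""
    total = sum(class_counts)
    return entropy(c / total for c in class_counts)

# Weather data used later in these slides: 9 "yes" and 5 "no" instances
print(round(info([9, 5]), 3))   # 0.94
```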
Decision tree purity criteria: Information gain
Information gain: the difference between the information contained in the dataset and the expected information after an attribute split
Expected information from splitting on an attribute: the weighted average of the information in each partition
  info([a1,…,an], [b1,…,bn], [c1,…,cn], …) = A/Σ * info([a1,…,an]) + B/Σ * info([b1,…,bn]) + C/Σ * info([c1,…,cn]) + …
  where xi is the number of instances of class i in partition X, X = x1 + x2 + … is the size of that partition, and Σ = A + B + C + … is the total number of instances
[Diagram: splitting on an attribute with values a, b, …, x for a two-class (T/F) classification – the root shows info([AT, AF]) and the branches show info([aT, aF]), info([bT, bF]), …, info([xT, xF])]
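A sketch of expected information and information gain under the same assumptions (helper names are mine; the class counts [[2,3],[4,0],[3,2]] are the outlook split used later in the example):

```python
import math

def info(class_counts):
    """info([c1,...,cn]) as in the previous sketch."""
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c > 0)

def expected_info(partitions):
    """Weighted average of info() over the partitions produced by an attribute split."""
    total = sum(sum(p) for p in partitions)
    return sum(sum(p) / total * info(p) for p in partitions)

def information_gain(dataset_counts, partitions):
    """Information in the dataset minus the expected information after the split."""
    return info(dataset_counts) - expected_info(partitions)

# Outlook splits the weather data ([9 yes, 5 no]) into [2,3], [4,0] and [3,2]
print(round(information_gain([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))   # 0.247
```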
Highly-branching attributes
Subsets are more likely to be pure if there is a large number of values
Attributes with a large number of values therefore tend to have higher gain
Information gain is biased towards choosing attributes with a large number of values
This may result in overfitting
Gain ratio compensates for this bias
Root Node: ID code
info(Weather Data) = 0.940 bits
info(ID split) = info([0,1], [0,1], [1,0], …, [1,0], [0,1])
               = 1/14*0 + 1/14*0 + 1/14*0 + … + 1/14*0 + 1/14*0 = 0 bits
Gain(ID) = 0.940 - 0 = 0.940 bits
Gain is maximal for the ID code
Not a useful decision node!
[Diagram: splitting on ID code gives one single-instance branch per value (a, b, c, …, m, n), each a pure leaf labelled Yes or No]
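A sketch of the calculation above, assuming the standard 14-instance weather data (9 yes, 5 no); it reproduces the maximal but useless gain of the ID-code attribute:

```python
import math

def info(class_counts):
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c > 0)

# ID code creates 14 single-instance branches, each already pure (info = 0)
id_partitions = [[1, 0]] * 9 + [[0, 1]] * 5               # 9 "yes" leaves, 5 "no" leaves
expected = sum(1 / 14 * info(p) for p in id_partitions)   # 0.0
gain_id = info([9, 5]) - expected
print(round(gain_id, 3))   # 0.94 - maximal gain, yet useless for prediction
```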
Gain Ratio
Takes the number and size of branches into account when choosing an attribute
Information gain: entropy with respect to the classification
Gain ratio: normalises information gain by the entropy of the attribute partition (its intrinsic information)
  entropy with respect to the attribute partition – ignores the classification
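A sketch of gain ratio under the same assumptions (helper names are mine): the split information is the entropy of the partition sizes alone, ignoring the class labels.

```python
import math

def info(class_counts):
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c > 0)

def gain_ratio(dataset_counts, partitions):
    total = sum(sum(p) for p in partitions)
    gain = info(dataset_counts) - sum(sum(p) / total * info(p) for p in partitions)
    split_info = info([sum(p) for p in partitions])   # entropy of partition sizes only
    return gain / split_info

# Outlook: partitions [2,3], [4,0], [3,2] -> sizes 5, 4, 5
print(round(gain_ratio([9, 5], [[2, 3], [4, 0], [3, 2]]), 3))   # 0.156 (0.157 with the rounded figures on the next slide)
```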
Example: Gain Ratio – Outlook
Information as before (entropy w.r.t. classification):
  info(data) = info([9,5]) = 0.940
  info(outlook) = 0.693
Gain as before:
  gain(outlook) = 0.940 - 0.693 = 0.247
Information w.r.t. the outlook partition: 5 sunny, 4 cloudy, 5 rainy
  info(data wrt outlook) = info([5,4,5]) = -5/14*log2(5/14) - 4/14*log2(4/14) - 5/14*log2(5/14) = 1.577
GainRatio(outlook) = gain(outlook) / info(data wrt outlook) = 0.247/1.577 = 0.157
[Diagram: outlook split – root info([9,5]); sunny branch (instances 9,11 / 1,2,8) info([2,3]), weight 5/14; cloudy branch (3,7,12,13) info([4,0]), weight 4/14; rainy branch (4,5,10 / 6,14) info([3,2]), weight 5/14]
Example: Gain Ratios
GainRatio(outlook) = 0.247/1.577 = 0.157
GainRatio(temp) = 0.029/1.557 = 0.019
GainRatio(humidity) = 0.152/1 = 0.152
GainRatio(windy) = 0.048/0.985 = 0.049
Outlook is still selected as the root node
But humidity is now a much closer choice
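A sketch reproducing the ranking above; the per-branch [yes, no] counts are taken from the standard weather dataset (an assumption, since they are not shown on this slide) and the helper names are mine.

```python
import math

def info(class_counts):
    total = sum(class_counts)
    return -sum(c / total * math.log2(c / total) for c in class_counts if c > 0)

def gain_and_ratio(dataset_counts, partitions):
    total = sum(sum(p) for p in partitions)
    gain = info(dataset_counts) - sum(sum(p) / total * info(p) for p in partitions)
    return gain, gain / info([sum(p) for p in partitions])

# [yes, no] counts per attribute value in the standard weather data
attributes = {
    "outlook":  [[2, 3], [4, 0], [3, 2]],   # sunny, cloudy, rainy
    "temp":     [[2, 2], [4, 2], [3, 1]],   # hot, mild, cool
    "humidity": [[3, 4], [6, 1]],           # high, normal
    "windy":    [[3, 3], [6, 2]],           # true, false
}
for name, parts in attributes.items():
    gain, ratio = gain_and_ratio([9, 5], parts)
    print(f"{name:9s} gain = {gain:.3f}   gain ratio = {ratio:.3f}")
# Outlook still ranks first on gain ratio, with humidity a close second
```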
Problems with Gain Ratio
In extreme scenarios (a very high number of values, e.g. an ID code), reject the attribute even if its gain or gain ratio is high
Gain ratio may overcompensate: it may choose an attribute simply because its intrinsic information is very low
Standard solution (sketched below): only consider attributes with greater-than-average information gain
  calculate the information gain for all attributes
  calculate the gain ratio only for those with above-average gain
  choose the attribute with the highest gain ratio
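A sketch of this standard solution (the dictionaries of gains and gain ratios are assumed inputs, e.g. produced by the earlier sketches):

```python
def choose_split_attribute(gains, gain_ratios):
    """Pick the highest gain ratio among attributes with above-average information gain."""
    average_gain = sum(gains.values()) / len(gains)
    candidates = [a for a in gains if gains[a] >= average_gain]
    return max(candidates, key=lambda a: gain_ratios[a])

# Weather data figures from the previous slides
gains  = {"outlook": 0.247, "temp": 0.029, "humidity": 0.152, "windy": 0.048}
ratios = {"outlook": 0.157, "temp": 0.019, "humidity": 0.152, "windy": 0.049}
print(choose_split_attribute(gains, ratios))   # outlook (temp and windy are filtered out)
```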
Exercises 3 – Information Gain and Gain Ratio
1. Use the information gain metric to choose a suitable attribute for the root node of the Credit Risk Dataset, which predicts credit risk given credit history, debt, collateral and income. The class attribute is risk.
2. Use gain ratio to choose the root node attribute for the Credit Risk Dataset.