04-evaluation-ExercisesAndSolutions.pptx

DavidClement34 · 12 slides · Mar 10, 2025

Slide Content

Lecturer: Abdullahi Ahamad Shehu (M.Sc. Data Science, M.Sc. Computer Science)
Office: Faculty of Computing Extension
CSC4316: Data Mining. Evaluation of Learning: Exercises (from the Week 4 lecture)

Exercises
The following contingency tables show the performance of a learner.
a) Estimate the predictive accuracy of the learner on the training data (table a below).
b) Estimate the predictive accuracy of the learner on the test data (table b below).
c) Comment on the values in these two contingency tables.

a) Training data:
                       Predicted
                   Positive  Negative
  Actual Positive     400       200
  Actual Negative     100       300

b) Test data:
                       Predicted
                   Positive  Negative
  Actual Positive     164        36
  Actual Negative     111        52

Exercises
Suppose you have a dataset containing 300 instances. Estimate the sizes (number of instances) of the training and test sets under:
a) Holdout estimation
b) 3-fold cross-validation
c) 10-fold cross-validation
d) 30-fold cross-validation
e) Leave-one-out testing
f) Bootstrap estimation
For each of the evaluation methods above, how are repeated experiments undertaken?

Exercise - Evaluation
Using the results in the graph below:
a) Which of the two algorithms (non-optimised / fully-optimised retrieval) appears to be better?
b) List the domains for which the difference is significant. The error bars show 95% confidence intervals.
c) Would it be easier or harder to achieve a significant difference with 99% confidence?
d) Would the error bars for 99% confidence be larger or smaller?

Exercise - Evaluation
Calculate the Kappa statistic, and precision and recall, for the following confusion matrix:

=== Confusion Matrix ===
     a     b     c     d     e    f  g   ← classified as
 19407    56    50    24   468    2  0 | a
  2837  1088  2808  1566   357   18  0 | b
   820  1149  6737  4526   618   40  0 | c
    73   716  4707  8651   894   31  0 | d
  1732   112   533  1173 38550   16  0 | e
    13    13    85    47    31  110  0 | f
     5     0     0     1     3    0  0 | g

Exercise - Solution
The following contingency tables show the performance of a learner.
a) Estimate the predictive accuracy of the learner on the training data below.
   700/1000 = 70%
b) Estimate the predictive accuracy of the learner on the test data below.
   216/363 ≈ 60%
c) Comment on the values in these two contingency tables.
   Lower accuracy on the test data can be indicative of overfitting.

a) Training data:
                       Predicted
                   Positive  Negative
  Actual Positive     400       200
  Actual Negative     100       300

b) Test data:
                       Predicted
                   Positive  Negative
  Actual Positive     164        36
  Actual Negative     111        52
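The accuracy figures above come from dividing the main diagonal (correct predictions) by the table total. A minimal sketch of that calculation (the row/column convention matches the tables above: rows are actual classes, columns are predicted):

```python
# Contingency tables from the exercise: rows = actual, columns = predicted.
train = [[400, 200],   # actual positive: predicted positive, predicted negative
         [100, 300]]   # actual negative: predicted positive, predicted negative
test = [[164, 36],
        [111, 52]]

def accuracy(cm):
    """Fraction of instances on the main diagonal (correct predictions)."""
    correct = sum(cm[i][i] for i in range(len(cm)))
    total = sum(sum(row) for row in cm)
    return correct / total

print(f"training accuracy = {accuracy(train):.2%}")  # 70.00%
print(f"test accuracy     = {accuracy(test):.2%}")   # 59.50%
```

Note that 216/363 is 59.5% exactly, which the slide rounds to 60%.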

Exercise - Solution
Sizes of the test and training sets (300 instances):
- Holdout estimation: ~100 testing, ~200 training
- 3-fold cross-validation: 100 testing, 200 training
- 10-fold cross-validation: 30 testing, 270 training
- 30-fold cross-validation: 10 testing, 290 training
- Leave-one-out testing: 1 testing, 299 training
- Bootstrap estimation: ~110 testing, ~190 training

How repeated experiments are undertaken:
- Holdout estimation: a new test set is randomly chosen.
- k-fold cross-validation: the dataset is re-partitioned into k subsets and k experiments are performed with the new partition [same answer for the 10-fold and 30-fold cases].
- Leave-one-out testing: repetition generates identical results, so leave-one-out testing cannot be usefully repeated.
- Bootstrap estimation: a new training set is selected with replacement.
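The sizes above can be checked with a short sketch. Two assumptions are labeled in the code: the holdout split is taken as the common 2/3 train / 1/3 test convention (the slide's "~100 testing, ~200 training"), and the bootstrap figure uses the expected fraction of distinct instances in a sample drawn with replacement, 1 − 1/e ≈ 63.2%:

```python
import math

n = 300  # dataset size from the exercise

# k-fold cross-validation: each fold tests on n/k instances, trains on the rest.
# Leave-one-out is the special case k = n.
def kfold_sizes(n, k):
    """Return (training size, test size) for one fold of k-fold CV."""
    return n - n // k, n // k

for k in (3, 10, 30, 300):                     # k = 300 is leave-one-out
    train, test = kfold_sizes(n, k)
    print(f"{k:>3}-fold: train {train}, test {test}")

# Bootstrap: draw n instances with replacement as the training set;
# on average 1 - 1/e ~ 63.2% of the originals appear in it (assumed
# expectation), and the unsampled rest form the test set.
distinct = round(n * (1 - math.exp(-1)))
print(f"bootstrap: ~{distinct} train, ~{n - distinct} test")
```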

Exercise - Solution
Using the results in the graph:
a) Which of the two algorithms (non-optimised / fully-optimised retrieval) appears to be better?
   The optimised algorithm seems better for all but one dataset (ionosphere).
b) List the domains for which the difference is significant.
   The bars do not overlap for LED, LED_17 and WINE.
c) The error bars show 95% confidence intervals. Would it be easier or harder to achieve a significant difference with 99% confidence?
   Harder.
d) Would the error bars for 99% confidence be larger or smaller?
   Larger.
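The "larger / harder" answers follow from the critical values of the normal distribution: a 99% interval uses a larger z than a 95% one, so the bars widen and overlap more often. A quick check with the standard library:

```python
from statistics import NormalDist

# Two-sided critical values: 95% leaves 2.5% in each tail, 99% leaves 0.5%.
z95 = NormalDist().inv_cdf(0.975)   # ~1.96
z99 = NormalDist().inv_cdf(0.995)   # ~2.58

print(f"z(95%) = {z95:.2f}, z(99%) = {z99:.2f}")
print(f"99% bars are {z99 / z95:.2f}x wider than 95% bars")
# Wider bars overlap more easily, so showing a significant
# difference at 99% confidence is harder.
```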

Exercise - Solution
Kappa statistic, precision and recall for the confusion matrix:

=== Confusion Matrix ===
     a     b     c     d     e    f  g   ← classified as
 19407    56    50    24   468    2  0 | a
  2837  1088  2808  1566   357   18  0 | b
   820  1149  6737  4526   618   40  0 | c
    73   716  4707  8651   894   31  0 | d
  1732   112   533  1173 38550   16  0 | e
    13    13    85    47    31  110  0 | f
     5     0     0     1     3    0  0 | g

Correctly Classified Instances      74543    74.4931 %
Incorrectly Classified Instances    25524    25.5069 %
Total Number of Instances          100067

P_o = sum(diagonal) / sum(all) = 74543 / 100067 = 0.7449
P_e = 0.2693
K = (P_o - P_e) / (1 - P_e) = (0.7449 - 0.2693) / (1 - 0.2693) = 0.6509 (good agreement)

Each class contributes (total in class / N) × (total classified as class / N) to P_e,
e.g. class a: 20007/100067 × 24887/100067 = 0.0497.

  Class   Total in class   Total classified as   Contribution to P_e
    a          20007              24887                0.0497
    b           8674               3134                0.0027
    c          13890              14920                0.0207
    d          15072              15988                0.0241
    e          42116              40921                0.1721
    f            299                217                0.0000
    g              9                  0                0.0000
  Total       100067             100067                0.2693
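The Kappa computation above can be sketched end-to-end from the raw matrix: observed agreement from the diagonal, chance agreement from the row/column marginals.

```python
# Confusion matrix from the exercise: rows = actual class, columns = predicted.
cm = [
    [19407,   56,   50,   24,   468,   2, 0],  # a
    [ 2837, 1088, 2808, 1566,   357,  18, 0],  # b
    [  820, 1149, 6737, 4526,   618,  40, 0],  # c
    [   73,  716, 4707, 8651,   894,  31, 0],  # d
    [ 1732,  112,  533, 1173, 38550,  16, 0],  # e
    [   13,   13,   85,   47,    31, 110, 0],  # f
    [    5,    0,    0,    1,     3,   0, 0],  # g
]

total = sum(sum(r) for r in cm)
p_o = sum(cm[i][i] for i in range(len(cm))) / total     # observed agreement
row = [sum(r) for r in cm]                              # total actually in each class
col = [sum(c) for c in zip(*cm)]                        # total classified as each class
p_e = sum(r * c for r, c in zip(row, col)) / total**2   # chance agreement
kappa = (p_o - p_e) / (1 - p_e)

print(f"P_o = {p_o:.4f}, P_e = {p_e:.4f}, kappa = {kappa:.4f}")
# P_o = 0.7449, P_e = 0.2693, kappa = 0.6509
```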

Precision and recall

=== Detailed Accuracy By Class ===
        TP Rate  FP Rate  Precision  Recall  Class
        0.97     0.068    0.78       0.97    a
        0.125    0.022    0.347      0.125   b
        0.485    0.095    0.452      0.485   c
        0.574    0.086    0.541      0.574   d
        0.915    0.041    0.942      0.915   e
        0.368    0.001    0.507      0.368   f
        0        0        0          0       g
W.Avg   0.745    0.059    0.728      0.745

TP Rate = true positive rate, FP Rate = false positive rate, W.Avg = weighted average.

Worked example for class a (5480 = 2837 + 820 + 73 + 1732 + 13 + 5, the instances of
other classes classified as a; 20007 = all instances actually in class a):
Precision (class a) = 19407 / (19407 + 5480) = 0.78
Recall (class a)    = 19407 / 20007 = 0.97
FP Rate (class a)   = 5480 / (100067 - 20007) = 0.068
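The per-class figures in the table generalise the worked example: precision divides each diagonal entry by its column total (everything classified as that class), recall by its row total (everything actually in that class). A sketch:

```python
# Confusion matrix from the exercise: rows = actual class, columns = predicted.
cm = [
    [19407,   56,   50,   24,   468,   2, 0],  # a
    [ 2837, 1088, 2808, 1566,   357,  18, 0],  # b
    [  820, 1149, 6737, 4526,   618,  40, 0],  # c
    [   73,  716, 4707, 8651,   894,  31, 0],  # d
    [ 1732,  112,  533, 1173, 38550,  16, 0],  # e
    [   13,   13,   85,   47,    31, 110, 0],  # f
    [    5,    0,    0,    1,     3,   0, 0],  # g
]
labels = "abcdefg"

col = [sum(c) for c in zip(*cm)]   # total classified as each class (column sums)
row = [sum(r) for r in cm]         # total actually in each class (row sums)

def precision(i):
    """Diagonal entry over column total; 0 if nothing was classified as i."""
    return cm[i][i] / col[i] if col[i] else 0.0

def recall(i):
    """Diagonal entry over row total; 0 if the class is empty."""
    return cm[i][i] / row[i] if row[i] else 0.0

for i, lab in enumerate(labels):
    print(f"{lab}: precision {precision(i):.3f}, recall {recall(i):.3f}")
```

Class g shows why the zero guard matters: nothing was ever classified as g, so its column total is 0 and its precision is conventionally reported as 0.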