Classification Assessment Methods.pptx

RiyadhALHaidari 66 views 17 slides Apr 08, 2023

Classification Assessment Methods Riadh Al-Haidari

Introduction
Classification techniques such as logistic regression, LDA and SVM have been applied in many fields of science.
To evaluate different learning algorithms and find the most suitable one for a particular problem, the assessment metrics and their significance must be interpreted correctly.
There are several ways of evaluating classification algorithms. Most of the measures are scalar metrics, and some are graphical.

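1- Confusion matrix
The four cells of the confusion matrix: true positives (TP), true negatives (TN), false positives (FP) and false negatives (FN).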

1- Confusion matrix
1- Accuracy (sensitive to imbalanced data): $\mathrm{Accuracy} = \frac{TP + TN}{TP + FP + TN + FN}$
2- Error rate (ERR) or misclassification rate (sensitive to imbalanced data): $ERR = 1 - \mathrm{Accuracy}$
3- Sensitivity (recall), or true positive rate (TPR): $TPR = \frac{TP}{TP + FN} = 1 - FNR$, where FNR is the false negative rate
4- Specificity: $\mathrm{Specificity} = 1 - FPR$, where the false positive rate is $FPR = \frac{FP}{FP + TN}$
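The four measures above can be computed directly from the confusion-matrix counts. The following is a minimal sketch, not the presenter's code: the function name `basic_rates` and the example counts are hypothetical, and all denominators are assumed to be nonzero.

```python
def basic_rates(tp, tn, fp, fn):
    """Return accuracy, error rate, TPR, TNR and FPR from confusion-matrix counts.

    Assumes every denominator is nonzero (at least one actual positive
    and one actual negative sample).
    """
    total = tp + tn + fp + fn
    accuracy = (tp + tn) / total
    return {
        "accuracy": accuracy,
        "error_rate": 1 - accuracy,   # ERR = 1 - accuracy
        "tpr": tp / (tp + fn),        # sensitivity / recall
        "tnr": tn / (tn + fp),        # specificity
        "fpr": fp / (fp + tn),        # 1 - specificity
    }


# Hypothetical balanced example: 50 positives, 50 negatives.
balanced = basic_rates(tp=40, tn=45, fp=5, fn=10)

# Hypothetical imbalanced example: a classifier that predicts "negative"
# for every sample still reaches 99% accuracy while finding no positives.
imbalanced = basic_rates(tp=0, tn=990, fp=0, fn=10)
```

The second call illustrates why accuracy and error rate are marked as sensitive to imbalanced data: the accuracy is high even though the TPR is zero.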

1- Confusion matrix
5- Predictive value, precision (PPV): $PPV = \frac{TP}{TP + FP}$
6- Likelihood ratio
Positive likelihood: $LR^{+} = \frac{TPR}{1 - TNR}$
Negative likelihood: $LR^{-} = \frac{1 - TPR}{TNR}$
where TNR is the true negative rate, $TNR = \frac{TN}{TN + FP}$, and $FNR = 1 - TPR$
Diagnostic odds ratio (DOR): $DOR = \frac{LR^{+}}{LR^{-}}$
7- Youden's index: $YI = TNR + TPR - 1$
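A short sketch of measures 5 to 7, written from the formulas on this slide. The function name `diagnostic_ratios` and the counts are hypothetical, and the code assumes that no denominator is zero (in particular TNR is neither 0 nor 1 and TPR is not 1).

```python
def diagnostic_ratios(tp, tn, fp, fn):
    """Return PPV, LR+, LR-, DOR and Youden's index from confusion-matrix counts."""
    tpr = tp / (tp + fn)            # sensitivity
    tnr = tn / (tn + fp)            # specificity
    lr_pos = tpr / (1 - tnr)        # positive likelihood ratio
    lr_neg = (1 - tpr) / tnr        # negative likelihood ratio
    return {
        "ppv": tp / (tp + fp),      # precision
        "lr_pos": lr_pos,
        "lr_neg": lr_neg,
        "dor": lr_pos / lr_neg,     # diagnostic odds ratio
        "youden": tpr + tnr - 1,    # Youden's index
    }


# Hypothetical counts, used only for illustration.
ratios = diagnostic_ratios(tp=40, tn=45, fp=5, fn=10)
```

With these counts the DOR also equals (TP x TN) / (FP x FN), which is a convenient consistency check on the two likelihood ratios.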

1- Confusion matrix
8- Matthews correlation coefficient (MCC) (sensitive to imbalanced data)
9- Discriminant power (DP)
10- F1-score
11- Markedness (MK) (sensitive to imbalanced data): $MK = PPV + NPV - 1$
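The slide text gives a formula only for markedness. As a rough sketch, MCC and the F1-score can be computed with their commonly used definitions, which are an assumption here rather than something taken from the slides; the function name `correlation_style_metrics` and the counts are hypothetical, and NPV is taken as TN / (TN + FN).

```python
import math


def correlation_style_metrics(tp, tn, fp, fn):
    """Return MCC, F1-score and markedness from confusion-matrix counts.

    Uses the standard textbook definitions (assumed, not from the slides).
    Assumes no denominator is zero.
    """
    ppv = tp / (tp + fp)            # precision
    npv = tn / (tn + fn)            # negative predictive value
    tpr = tp / (tp + fn)            # recall
    mcc = (tp * tn - fp * fn) / math.sqrt(
        (tp + fp) * (tp + fn) * (tn + fp) * (tn + fn)
    )
    return {
        "mcc": mcc,
        "f1": 2 * ppv * tpr / (ppv + tpr),   # harmonic mean of precision and recall
        "markedness": ppv + npv - 1,         # MK = PPV + NPV - 1
    }


# Hypothetical counts, used only for illustration.
scores = correlation_style_metrics(tp=40, tn=45, fp=5, fn=10)
```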

1- Confusion matrix
12- Balanced classification rate or balanced accuracy (BCR): $BCR = \frac{1}{2}(TPR + TNR)$
13- Geometric mean (GM)
14- Optimization precision (OP) (sensitive to imbalanced data)
15- Jaccard (sensitive to imbalanced data)
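Only the BCR formula survives in the slide text. The sketch below fills in the other three with definitions that are commonly used in the literature and are assumptions here: GM as the square root of TPR x TNR, OP as accuracy minus |TPR - TNR| / (TPR + TNR), and Jaccard as TP / (TP + FP + FN). The function name `balance_metrics` and the counts are hypothetical.

```python
import math


def balance_metrics(tp, tn, fp, fn):
    """Return BCR, GM, OP and the Jaccard index from confusion-matrix counts.

    BCR follows the slide; GM, OP and Jaccard use assumed standard definitions.
    Assumes no denominator is zero.
    """
    tpr = tp / (tp + fn)
    tnr = tn / (tn + fp)
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    return {
        "bcr": 0.5 * (tpr + tnr),                            # balanced accuracy
        "gm": math.sqrt(tpr * tnr),                          # geometric mean
        "op": accuracy - abs(tpr - tnr) / (tpr + tnr),       # optimization precision
        "jaccard": tp / (tp + fp + fn),                      # ignores true negatives
    }


# Hypothetical counts, used only for illustration.
balance = balance_metrics(tp=40, tn=45, fp=5, fn=10)
```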

2- Receiver operating characteristics (ROC)
The ROC curve is used to balance the benefits (true positives) against the costs (false positives).
In multi-class classification, ROC analysis becomes more complex than in the binary case; one solution to this problem is to produce one ROC curve for each class.
The ROC curve is insensitive to imbalanced data.
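One way to read "one ROC curve for each class" is a one-vs-rest relabeling: each class in turn is treated as positive and all other classes as negative, which gives one binary problem per class. The helper below is a hypothetical sketch of that relabeling step only; the per-class scores would still have to come from the classifier.

```python
def one_vs_rest_labels(labels):
    """Map each class to a binary label list (1 = that class, 0 = any other class)."""
    classes = sorted(set(labels))
    return {
        cls: [1 if y == cls else 0 for y in labels]
        for cls in classes
    }


# Hypothetical three-class label vector.
binary_problems = one_vs_rest_labels(["a", "b", "c", "a"])
```

Each binary label list can then be paired with the classifier's score for that class to draw one ROC curve per class.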

ROC
The steps of generating a ROC curve:
1- Sort the samples according to their scores.
2- Change the threshold value from maximum to minimum, processing one sample at a time and updating the values of TP and FP each time.
3- Calculate the values of TPR and FPR and push them into the ROC curve.
4- When the threshold becomes very low, all samples are classified as positive, and hence the values of both TPR and FPR are one.
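The steps above can be sketched in a few lines of plain Python. This is an illustrative version, not the presenter's implementation: the function name `roc_points`, the scores and the labels are hypothetical, labels are assumed to be 1 (positive) or 0 (negative), and tied scores are grouped so that only one point is emitted per distinct threshold.

```python
def roc_points(scores, labels):
    """Return the ROC curve as a list of (FPR, TPR) points.

    Assumes at least one positive (1) and one negative (0) label.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    # Step 1: sort samples by score, highest first.
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    points = []
    previous_score = None
    # Step 2: lower the threshold one sample at a time, updating TP and FP.
    for score, label in ranked:
        if score != previous_score:
            # Step 3: push the current (FPR, TPR) before crossing a new threshold.
            points.append((fp / n_neg, tp / n_pos))
            previous_score = score
        if label == 1:
            tp += 1
        else:
            fp += 1
    # Step 4: the lowest threshold classifies everything as positive -> (1, 1).
    points.append((fp / n_neg, tp / n_pos))
    return points


# Hypothetical classifier scores and true labels.
example_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
example_labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(example_scores, example_labels)
```

The curve starts at (0, 0), where the threshold is above every score, and ends at (1, 1), matching the last step on the slide.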

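3- Area under the ROC curve (AUC)
The AUC is a scalar value that represents the expected performance summarized by the ROC curve.
It is obtained by summing, over consecutive ROC points, the trapezoid area Base × Height, with $\mathrm{Base} = FPR_2 - FPR_1$ and $\mathrm{Height} = \frac{TPR_1 + TPR_2}{2}$.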
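A minimal sketch of the area computation, assuming the trapezoidal rule is applied between consecutive ROC points (base = difference in FPR, height = average of the two TPR values). The function name `auc_trapezoid` and the example points are hypothetical.

```python
def auc_trapezoid(points):
    """Sum trapezoid areas between consecutive (FPR, TPR) points.

    Assumes the points are ordered by increasing FPR.
    """
    area = 0.0
    for (fpr1, tpr1), (fpr2, tpr2) in zip(points, points[1:]):
        base = fpr2 - fpr1
        height = (tpr1 + tpr2) / 2
        area += base * height
    return area


# Hypothetical ROC points for six samples (three positive, three negative).
roc = [(0, 0), (0, 1 / 3), (0, 2 / 3), (1 / 3, 2 / 3),
       (1 / 3, 1), (2 / 3, 1), (1, 1)]
auc_value = auc_trapezoid(roc)

# A classifier no better than chance follows the diagonal.
chance_value = auc_trapezoid([(0, 0), (1, 1)])
```

For the example points the area is 8/9, and the diagonal gives 0.5, the usual reference value for random guessing.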

4- Precision-Recall (PR) curve
Shows the relationship between recall and precision.
It is generated by changing the threshold, as for the ROC curve.
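The same threshold sweep can produce PR points instead of ROC points. The sketch below is illustrative only: the function name `pr_points` and the data are hypothetical, labels are assumed to be 1 or 0, and one point is emitted after each group of tied scores.

```python
def pr_points(scores, labels):
    """Return the PR curve as a list of (recall, precision) points.

    Assumes at least one positive (1) label.
    """
    n_pos = sum(labels)
    ranked = sorted(zip(scores, labels), key=lambda pair: pair[0], reverse=True)

    tp = fp = 0
    points = []
    for index, (score, label) in enumerate(ranked):
        if label == 1:
            tp += 1
        else:
            fp += 1
        # Emit a point once all samples sharing this score have been counted.
        is_last = index + 1 == len(ranked)
        if is_last or ranked[index + 1][0] != score:
            points.append((tp / n_pos, tp / (tp + fp)))
    return points


# Hypothetical classifier scores and true labels.
pr_curve = pr_points([0.9, 0.8, 0.7, 0.6, 0.55, 0.4], [1, 1, 0, 1, 0, 0])
```

Unlike the ROC curve, precision depends on the ratio of positives to negatives, so the final point (recall 1) has a precision equal to the share of positive samples in the data.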

Demo
I used the Wisconsin Diagnostic Breast Cancer Dataset (WDBC) [1] [2] to build models and apply the classification assessment methods.
The data has dimensions (569, 32) and two classes, B and M: Benign 357, Malignant 212 (not balanced).
Three classification techniques: logistic regression, decision tree, linear discriminant.
Three assessment methods: confusion matrix (recall, accuracy, F1-score, precision), ROC, AUC.

Demo: Confusion Matrix Result

Demo: ROC & AUC

Demo
Finally, after the assessment we can choose the best classifier based on the confusion matrix, ROC and AUC results.
The best classifier is the linear discriminant, as it has the highest recall, accuracy, F1-score, precision and AUC.
The ROC also indicates that the linear discriminant is the best one, as its curve is nearest to the ideal curve (close to the top-left corner).

Conclusion
The paper gives a detailed overview of the classification assessment measures. It explains the relations between these measures and the robustness of each of them against imbalanced data.
I applied the measures to a real problem (breast cancer diagnosis) to show the importance of evaluating classification when choosing the best classifier with the highest accuracy, which I think is better than the random numerical values used in the articles.

References
[1] Tharwat, A., 2018. Classification assessment methods. Applied Computing and Informatics.
[2] Breast Cancer Wisconsin (Diagnostic) Data Set, accessed March 28, 2020. http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)
[3] Omondiagbe, D.A., Veeramani, S. and Sidhu, A.S., 2019, April. Machine Learning Classification Techniques for Breast Cancer Diagnosis. In IOP Conference Series: Materials Science and Engineering (Vol. 495, No. 1, p. 012033). IOP Publishing.