Introduction
Classification techniques such as logistic regression, LDA, and SVM have been applied in many fields of science. Evaluation metrics and their significance must be interpreted correctly when comparing different learning algorithms, in order to find the most suitable algorithm for a particular problem. There are several ways of evaluating classification algorithms; most of these measures are scalar metrics and some of them are graphical.
1- Confusion matrix
5- Predictive values
   Positive predictive value (PPV) = TP / (TP + FP)
6- Likelihood ratios
   Positive likelihood ratio (LR+) = TPR / (1 - TNR)
   Negative likelihood ratio (LR-) = (1 - TPR) / TNR
   where the true negative rate TNR = TN / (TN + FP) and the false positive rate FPR = 1 - TNR
   Diagnostic odds ratio (DOR) = (LR+) / (LR-)
7- Youden's index
   YI = TPR + TNR - 1
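For illustration, a minimal Python sketch of these formulas, using assumed example counts TP, FP, TN, FN rather than values from the demo:

# Minimal sketch: derive the metrics above from raw confusion-matrix counts.
# TP, FP, TN, FN are assumed example values, not results from the demo.
TP, FP, TN, FN = 50, 5, 90, 10

TPR = TP / (TP + FN)      # true positive rate (sensitivity / recall)
TNR = TN / (TN + FP)      # true negative rate (specificity)
FPR = 1 - TNR             # false positive rate
PPV = TP / (TP + FP)      # positive predictive value (precision)

LR_pos = TPR / (1 - TNR)  # positive likelihood ratio (LR+)
LR_neg = (1 - TPR) / TNR  # negative likelihood ratio (LR-)
DOR = LR_pos / LR_neg     # diagnostic odds ratio
YI = TPR + TNR - 1        # Youden's index

print(PPV, LR_pos, LR_neg, DOR, YI)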
1- Confusion matrix (cont.)
8- Matthews correlation coefficient (MCC) (sensitive to imbalanced data)
   MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))
9- Discriminant power (DP)
10- F1-score
   F1 = 2 * (PPV * TPR) / (PPV + TPR)
11- Markedness (MK) (sensitive to imbalanced data)
   MK = PPV + NPV - 1
1- Confusion matrix (cont.)
12- Balanced classification rate or balanced accuracy (BCR)
   BCR = (TPR + TNR) / 2
13- Geometric mean (GM)
   GM = sqrt(TPR * TNR)
14- Optimization precision (OP) (sensitive to imbalanced data)
15- Jaccard index (sensitive to imbalanced data)
   Jaccard = TP / (TP + FP + FN)
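A similar sketch for the remaining scalar metrics, again with assumed example counts; DP and OP are left out since their formulas are not reproduced above:

# Minimal sketch: scalar metrics derived from the same assumed counts.
import math

TP, FP, TN, FN = 50, 5, 90, 10
TPR = TP / (TP + FN)
TNR = TN / (TN + FP)
PPV = TP / (TP + FP)
NPV = TN / (TN + FN)

MCC = (TP * TN - FP * FN) / math.sqrt(
    (TP + FP) * (TP + FN) * (TN + FP) * (TN + FN))
F1 = 2 * PPV * TPR / (PPV + TPR)
MK = PPV + NPV - 1             # markedness
BCR = (TPR + TNR) / 2          # balanced accuracy
GM = math.sqrt(TPR * TNR)      # geometric mean
jaccard = TP / (TP + FP + FN)  # Jaccard index

print(MCC, F1, MK, BCR, GM, jaccard)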
2- Receiver operating characteristic (ROC) curve
It is used to balance the benefits (true positives) against the costs (false positives). In multi-class classification, ROC becomes more complex than in the binary case; one solution to this problem is to produce one ROC curve for each class. ROC is insensitive to imbalanced data.
ROC
The steps of generating a ROC curve:
- Sort the samples according to their scores.
- Change the threshold value from maximum to minimum, processing one sample at a time and updating the values of TP and FP each time.
- At each step, calculate the values of TPR and FPR and push them onto the ROC curve.
- When the threshold becomes very low, all samples are classified as positive, and hence the values of both TPR and FPR are one.
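A minimal NumPy sketch of these steps; roc_points is a hypothetical helper name, and ties between scores are ignored for simplicity:

import numpy as np

def roc_points(scores, labels):
    # Sort samples by score in descending order, then lower the threshold
    # one sample at a time, updating the TP and FP counts as we go.
    order = np.argsort(scores)[::-1]
    labels = np.asarray(labels)[order]
    P, N = labels.sum(), (1 - labels).sum()
    tp = fp = 0
    fpr_list, tpr_list = [0.0], [0.0]
    for y in labels:
        if y == 1:
            tp += 1
        else:
            fp += 1
        tpr_list.append(tp / P)
        fpr_list.append(fp / N)
    # The last point is (FPR, TPR) = (1, 1): every sample classified positive.
    return np.array(fpr_list), np.array(tpr_list)

# Example: fpr, tpr = roc_points([0.9, 0.8, 0.7, 0.6], [1, 0, 1, 0])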
3- Area under the ROC curve (AUC)
A scalar value that summarizes the expected performance shown by the ROC curve. It is obtained by summing the area (base * average height) of the trapezoids between consecutive ROC points:
AUC = sum over i of (FPR_{i+1} - FPR_i) * (TPR_i + TPR_{i+1}) / 2
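A small sketch of this computation with the trapezoidal rule; auc_trapezoid is a hypothetical helper name, and its inputs are the points produced by roc_points above:

import numpy as np

def auc_trapezoid(fpr, tpr):
    # Sum the areas of the trapezoids between consecutive ROC points.
    fpr, tpr = np.asarray(fpr), np.asarray(tpr)
    bases = np.diff(fpr)                # FPR_{i+1} - FPR_i
    heights = (tpr[:-1] + tpr[1:]) / 2  # average of TPR_i and TPR_{i+1}
    return float(np.sum(bases * heights))

# Example: auc = auc_trapezoid(*roc_points(scores, labels))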
4- Precision-Recall (PR) curve
Shows the relationship between recall and precision. It is generated by changing the threshold, as with the ROC curve.
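The same threshold sweep used for ROC can produce the PR points; pr_points is a hypothetical helper name, not from the referenced paper:

import numpy as np

def pr_points(scores, labels):
    # Lower the threshold one sample at a time and record (recall, precision).
    order = np.argsort(scores)[::-1]
    labels = np.asarray(labels)[order]
    P = labels.sum()
    tp = fp = 0
    recall, precision = [], []
    for y in labels:
        tp += y
        fp += 1 - y
        recall.append(tp / P)
        precision.append(tp / (tp + fp))
    return np.array(recall), np.array(precision)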
Demo
I used the Wisconsin Diagnostic Breast Cancer (WDBC) dataset [1][2] to build models and apply the classification assessment methods. The data has dimensions (569, 32) and two classes: Benign (B, 357 samples) and Malignant (M, 212 samples), so it is not balanced.
Three classification techniques: logistic regression, decision tree, linear discriminant.
Three assessment methods: confusion matrix (recall, accuracy, F1-score, precision), ROC, AUC.
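The original demo code is not shown in the slides; the sketch below shows how such a demo could be reproduced with scikit-learn. Its bundled copy of WDBC keeps the 30 feature columns and stores the diagnosis as the target, dropping the ID column; the train/test split, feature scaling, and random seed are my assumptions:

# Sketch of reproducing the demo with scikit-learn (split, scaling, seed assumed).
from sklearn.datasets import load_breast_cancer
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import classification_report, roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.tree import DecisionTreeClassifier

X, y = load_breast_cancer(return_X_y=True)  # WDBC: 569 samples, 30 features
X_tr, X_te, y_tr, y_te = train_test_split(
    X, y, test_size=0.3, stratify=y, random_state=0)

models = {
    "Logistic regression": make_pipeline(StandardScaler(),
                                         LogisticRegression(max_iter=1000)),
    "Decision tree": DecisionTreeClassifier(random_state=0),
    "Linear discriminant": LinearDiscriminantAnalysis(),
}

for name, model in models.items():
    model.fit(X_tr, y_tr)
    y_pred = model.predict(X_te)
    y_score = model.predict_proba(X_te)[:, 1]
    print(name)
    print(classification_report(y_te, y_pred, digits=3))  # precision, recall, F1
    print("AUC:", roc_auc_score(y_te, y_score))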
Demo: Confusion Matrix Result
Demo: ROC & AUC
Demo
Finally, after assessment we can choose the best classifier based on the confusion matrix, ROC, and AUC results. The best classifier is the linear discriminant, as it has the highest recall, accuracy, F1-score, precision, and AUC. The ROC curves also suggest that the linear discriminant is the best one, as it is closest to the ideal curve (nearest the top-left corner).
Conclusion
The paper [1] gives a detailed overview of classification assessment measures. It explains the relations between these measures and the robustness of each of them against imbalanced data. I implemented a real problem (breast cancer diagnosis) to show the importance of classification evaluation in choosing the best classifier with the highest accuracy, which I think is more illustrative than the arbitrary numerical values used in the articles.
References
[1] Tharwat, A., 2018. Classification assessment methods. Applied Computing and Informatics.
[2] Breast Cancer Wisconsin (Diagnostic) Data Set, accessed March 28, 2020. http://archive.ics.uci.edu/ml/datasets/breast+cancer+wisconsin+(diagnostic)
[3] Omondiagbe, D.A., Veeramani, S. and Sidhu, A.S., 2019. Machine Learning Classification Techniques for Breast Cancer Diagnosis. IOP Conference Series: Materials Science and Engineering, Vol. 495, No. 1, p. 012033. IOP Publishing.