Create and apply the model (using Gaussian Naïve Bayes)
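A minimal sketch of this step with scikit-learn, assuming the heart-disease data lives in a CSV file with a 0/1 target column; the file name 'heart.csv' and the 'target' column are hypothetical placeholders for whatever dataset the lesson uses.

```python
# Minimal sketch: create and apply a Gaussian Naive Bayes classifier.
# 'heart.csv' and the 'target' column name are hypothetical placeholders.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

df = pd.read_csv('heart.csv')        # hypothetical heart-disease dataset
X = df.drop(columns=['target'])      # feature columns
y = df['target']                     # 1 = heart disease, 0 = no heart disease

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.25, random_state=42)

model = GaussianNB()                 # create the model
model.fit(X_train, y_train)          # fit it to the training data
y_pred = model.predict(X_test)       # apply it to unseen test data
```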
Evaluate Classification Models
Confusion Matrix
A confusion matrix is a method of visualizing the results of a classification problem against the ground truth. It lets you examine how a model's predictions break down, enumerating the true positives/negatives and false positives/negatives within the predicted data.
• 43 patients were correctly predicted to have heart disease (true positives).
• 6 patients were incorrectly predicted to have heart disease when they actually do not (false positives).
• 3 patients were incorrectly predicted to not have heart disease when they actually do (false negatives).
• 212 patients were correctly predicted to not have heart disease (true negatives).
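Laid out as the 2×2 matrix these counts come from (rows are actual classes, columns are predicted classes):

                     Predicted: disease   Predicted: no disease
Actual: disease           43 (TP)                3 (FN)
Actual: no disease         6 (FP)              212 (TN)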
Classifier Performance Measurement
Accuracy
Accuracy is a measure of how often the model's predictions, positive or negative, are correct: the fraction of all predictions that match the true labels.
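Using the example confusion matrix above: accuracy = (TP + TN) / (TP + TN + FP + FN) = (43 + 212) / 264 ≈ 0.966.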
Precision
Precision is a measure of how often the positives identified by the model are true positives.
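Using the same example: precision = TP / (TP + FP) = 43 / (43 + 6) ≈ 0.878.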
Recall
Recall is the percentage of positive instances found by the model out of all relevant instances. A "relevant" instance is any instance that is actually positive, even if the model predicted it incorrectly.
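Using the same example: recall = TP / (TP + FN) = 43 / (43 + 3) ≈ 0.935.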
F₁ Score
The F₁ score helps you find the optimal balance of precision and recall. It takes a weighted average (more precisely, the harmonic mean) of precision and recall.
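Using the same example: F₁ = 2 × (precision × recall) / (precision + recall) = 2 × (0.878 × 0.935) / (0.878 + 0.935) ≈ 0.905.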
Receiver Operating Characteristic (ROC) Curve
A receiver operating characteristic (ROC) curve is a method of plotting the relationship between predicted "hits" and false alarms. On the y-axis is the true positive rate (TPR), the "hits," which is essentially the same as recall. On the x-axis is the false positive rate (FPR), the false alarms.
• Any data above and to the left of the dotted line (the diagonal where TPR = FPR) is better than a random guess. This is what you want for your model.
• Any data near or at the dotted line is no better than a random guess. This is not what you want for your model.
• Any data below and to the right of the dotted line is worse than a random guess. This typically means there's a problem with the model.
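A minimal sketch of plotting a ROC curve for the model above with scikit-learn and Matplotlib; it assumes the fitted model, X_test, and y_test from the earlier step.

```python
# Minimal sketch: plot TPR vs. FPR for the fitted model.
import matplotlib.pyplot as plt
from sklearn.metrics import roc_curve

y_scores = model.predict_proba(X_test)[:, 1]   # probability of the positive class
fpr, tpr, thresholds = roc_curve(y_test, y_scores)

plt.plot(fpr, tpr, label='Gaussian Naive Bayes')
plt.plot([0, 1], [0, 1], 'k--', label='Random guess')  # the dotted diagonal
plt.xlabel('False Positive Rate (FPR)')
plt.ylabel('True Positive Rate (TPR)')
plt.legend()
plt.show()
```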
Calculate Accuracy Score
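One way to do this with scikit-learn, assuming y_test and y_pred from the model above:

```python
from sklearn.metrics import accuracy_score

# Fraction of test predictions that match the true labels.
print(accuracy_score(y_test, y_pred))
```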
Create Confusion Matrix
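A sketch with scikit-learn, again assuming y_test and y_pred; note that confusion_matrix lays out the counts as [[TN, FP], [FN, TP]]:

```python
from sklearn.metrics import confusion_matrix

# Rows are actual classes, columns are predicted classes: [[TN, FP], [FN, TP]].
print(confusion_matrix(y_test, y_pred))
```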
Create Classification Report
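A sketch with scikit-learn; classification_report summarizes precision, recall, and F₁ for each class:

```python
from sklearn.metrics import classification_report

# Per-class precision, recall, F1, and support for the test predictions.
print(classification_report(y_test, y_pred))
```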
Discussion
• What is the accuracy score of your model?
• Describe some possible ways to improve your model.