Understanding the Confusion Matrix in Machine Learning

In machine learning, evaluating the performance of classification models is crucial. While accuracy is a common metric, it often falls short, especially when dealing with imbalanced datasets. The confusion matrix, along with metrics like precision, recall, and F1 score, provides a more comprehensive view of a model's performance. This article explains the confusion matrix and its related metrics, offering a clear understanding with formulas, intuition, and real-world examples.

What is a Confusion Matrix?

A confusion matrix is a performance evaluation tool for classification models. It is an N x N matrix, where N is the number of target classes, that compares the actual target values with those predicted by the model. For a binary problem, its entries are the counts of true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN). By breaking errors out this way, it supports a much more in-depth analysis of a model's behavior than a single accuracy score.

Important Terms in a Confusion Matrix

To understand the confusion matrix, it's essential to define its key components:

  • True Positive (TP): The model correctly predicts the positive class.
  • True Negative (TN): The model correctly predicts the negative class.
  • False Positive (FP): The model incorrectly predicts the positive class when it is actually negative (Type I error).
  • False Negative (FN): The model incorrectly predicts the negative class when it is actually positive (Type II error).
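These four counts can be tallied directly from a list of actual and predicted labels. Here is a minimal sketch, using a made-up set of binary labels where 1 is the positive class and 0 the negative class:

```python
# Hypothetical labels for illustration: 1 = positive class, 0 = negative class
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)  # correct positives
tn = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 0)  # correct negatives
fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)  # Type I errors
fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)  # Type II errors

print(tp, tn, fp, fn)  # 3 3 1 1
```
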

Why Do We Need a Confusion Matrix?

While a simple accuracy score might seem sufficient, it can be misleading, especially with imbalanced datasets. For example, consider a dataset where 96% of the data points belong to the negative class and only 4% belong to the positive class. A model that always predicts the negative class would achieve 96% accuracy. However, this model would be useless in identifying the positive class.


In such cases, the confusion matrix becomes crucial. It provides insights into the types of errors the model is making, allowing for a more nuanced evaluation. It helps you understand where the model is "confused."

How to Calculate a Confusion Matrix for a 2-Class Classification Problem?

Let's consider a binary classification problem: predicting whether a driver will turn left or right at a light. The same reasoning applies to any prediction task that makes a yes/no or true/false distinction. The purpose of the confusion matrix is to show exactly how the model is confused.

To do so, we introduce two concepts: false positives and false negatives. If the model treats left as the positive class and right as the negative class, then a false positive is predicting left when the actual direction is right. A false negative works the opposite way: the model predicts right, but the actual direction is left.

Suppose the model makes 19 predictions in total: 14 correct and 5 wrong. A False Negative count of 3 means the model predicted a negative (right) three times when the actual direction was positive (left). A False Positive count of 2 means the model predicted a positive (left) twice when the actual direction was negative (right).

Precision and Recall: Beyond Accuracy

The confusion matrix allows us to calculate other important metrics, such as precision and recall.


Precision

Precision is the ratio of true positives to the sum of true positives and false positives. Precision asks how many junk positives got thrown into the mix: if there are no bad positives (those FPs), the model has 100% precision, and every FP that sneaks in drags precision down. To calculate a model's precision, we only need the TP and FP counts from the confusion matrix.

Precision = TP / (TP + FP)

Precision measures how many of the positively predicted instances were actually positive. A high precision score indicates that the model is good at avoiding false positives.
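As a quick sketch, here is precision computed from raw counts. The numbers are loosely based on the left/right example; since the TP/TN split of the 14 correct predictions is not given, TP = 8 is assumed for illustration:

```python
# Hypothetical counts: TP = 8 is an assumption, FP = 2 comes from the example
tp, fp = 8, 2
precision = tp / (tp + fp)
print(precision)  # 0.8
```
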

Recall

Recall goes another route. Instead of counting the false positives the model produced, recall counts the false negatives that were thrown into the prediction mix: the recall rate is penalized whenever a false negative occurs. Because precision and recall penalize opposite errors, their equations mirror each other. Precision and recall are the yin and yang of assessing the confusion matrix.

Recall = TP / (TP + FN)

Recall measures how many of the actual positive instances were correctly predicted by the model. A high recall score indicates that the model is good at avoiding false negatives.
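The same sketch works for recall. Again the counts are hypothetical, loosely based on the left/right example (TP = 8 assumed, FN = 3 from the example):

```python
# Hypothetical counts: TP = 8 is an assumption, FN = 3 comes from the example
tp, fn = 8, 3
recall = tp / (tp + fn)
print(round(recall, 3))  # 0.727
```
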

The Trade-off Between Precision and Recall

As seen before, a model can sometimes afford to let more false negatives slip by. That results in higher precision, because false negatives do not appear in the precision equation. Conversely, letting more false positives slip by results in higher recall, because false positives do not appear in the recall equation. Generally, a model cannot have both very high recall and very high precision: there is a cost to gaining points in either one. A model may sit at an equilibrium point where precision and recall are equal, but tweaking the model to squeeze a few more percentage points of precision will likely lower the recall rate.


In practice, when we try to increase the precision of our model, the recall goes down, and vice-versa.

F1-Score

F1-score is the harmonic mean of precision and recall, so it gives a combined idea of these two metrics.

F1-score = 2 * (Precision * Recall) / (Precision + Recall)

However, there is a catch here. The interpretability of the F1-score is poor. This means that we don’t know what our classifier is maximizing - precision or recall.
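A quick sketch of the formula, using hypothetical precision and recall values (a precision of 0.8 and a recall of 8/11, consistent with the assumed counts used earlier):

```python
# Hypothetical inputs for illustration
precision, recall = 0.8, 8 / 11

# Harmonic mean of precision and recall
f1 = 2 * precision * recall / (precision + recall)
print(round(f1, 3))  # 0.762
```

Note that the harmonic mean is dominated by the smaller of the two inputs, which is why a model with excellent precision but terrible recall (or vice versa) still gets a low F1-score.
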

Real-World Examples and Applications

The importance of precision and recall depends on the specific problem. Consider the following examples:

Cancer Prediction

For this dataset, a model that predicts cancer records as non-cancer is risky. The cancer dataset has 100 records, of which 94 are cancer records and 6 are non-cancer records. The model predicts only 90 of the 94 cancer records correctly, so 4 cancer patients are misclassified as non-cancer. That is why the recall metric is given more importance while evaluating this model: the recall rate is about 96% (90/94), when it should be 100%.

If a non-cancer patient is predicted as cancer, he or she can go for another screening and will get the correct result next time. But the 4 misclassified records in the first category (cancer patients predicted as non-cancer) are very risky, because those patients may go untreated.

In this example, recall is more important than precision. The recall rate should be 100%: all positive records (cancer records) should be predicted correctly.
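The recall number quoted above can be checked directly from the counts given in the text (94 actual cancer records, 90 predicted correctly, so FN = 4):

```python
# Counts from the cancer example: 90 of 94 cancer records caught, 4 missed
tp, fn = 90, 4
recall = tp / (tp + fn)
print(round(recall, 3))  # 0.957
```
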

Spam Detection

Consider 10 email records, with the spam class assigned to 1 and non-spam to 0. Here the errors are weighted the other way around from the cancer example: a false positive (a legitimate email flagged as spam) means the user may miss an important message, while a false negative (a spam email reaching the inbox) is merely annoying. So for spam detection, precision is the more important metric.

Contagious Virus Detection

In our example, when dealing with a contagious virus, the Confusion Matrix becomes crucial. Recall, assessing the ability to capture all actual positives, emerges as a better metric. We aim to avoid mistakenly releasing an infected person into the healthy population, potentially spreading the virus. This context highlights why accuracy proves inadequate as a metric for our model’s evaluation.

Instagram Nudity Filter

The Instagram algorithm needs to put a nudity filter on all the pictures people post, so a nude photo classifier is created to detect any nudity. If a nude picture gets posted and makes it past the filter (a false negative), that could be very costly to Instagram. So the classifier is tuned toward high recall: it flags more photos than strictly necessary, because the cost of missing one is so high.

Confusion Matrix for Multi-Class Classification

Finally, confusion matrices do not apply only to a binary classifier. For a multi-class problem with N classes, the matrix is simply N x N: each row corresponds to an actual class, each column to a predicted class, and the diagonal holds the correct predictions. Precision and recall can then be computed per class in a one-vs-rest fashion, and the same rules of analysis apply.
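Here is a minimal sketch for a 3-class problem (the class names and labels are made up for illustration); passing `labels` to Scikit-learn's `confusion_matrix` fixes the row/column order:

```python
from sklearn.metrics import confusion_matrix

# Hypothetical 3-class labels for illustration
y_true = ["cat", "dog", "bird", "cat", "dog", "bird", "cat"]
y_pred = ["cat", "dog", "cat", "cat", "bird", "bird", "dog"]

# Rows are actual classes, columns are predicted classes
print(confusion_matrix(y_true, y_pred, labels=["bird", "cat", "dog"]))
# [[1 1 0]
#  [0 2 1]
#  [1 0 1]]
```
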

Using Confusion Matrix in Machine Learning: A Step-by-Step Guide

To use a confusion matrix in machine learning:

  1. Train a machine learning model. This can be done using any machine learning algorithm, such as logistic regression, decision tree, or random forest.
  2. Make predictions on a test dataset. This is data the model has not seen during training.
  3. Construct a confusion matrix. This can be done using a Python library such as Scikit-learn.
  4. Analyze the confusion matrix. Look at the diagonal elements of the matrix to see how many instances the model predicted correctly. Look at the off-diagonal elements of the matrix to see how many instances the model predicted incorrectly.

Practical Implementation with Scikit-learn

You know the theory; now let's put it into practice. Scikit-learn's confusion_matrix() returns the values of the confusion matrix. The output is, however, slightly different from what we have studied so far: it takes the rows as actual values and the columns as predicted values. classification_report() outputs precision, recall, and F1-score for each target class.
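A short sketch of both functions on toy binary labels (the labels themselves are made up):

```python
from sklearn.metrics import classification_report, confusion_matrix

# Hypothetical labels for illustration
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 0, 1, 1, 0, 1, 0]

# Rows are actual values, columns are predicted values
print(confusion_matrix(y_true, y_pred))
# [[3 1]
#  [1 3]]

# Precision, recall, and F1-score per target class
print(classification_report(y_true, y_pred))
```
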
