Cyber Crime — Confusion Matrix

Jun 6, 2021

What Does a Confusion Matrix Mean?

A confusion matrix is a table used in machine learning to summarize how well a classifier performs. It compares the model's predictions against test data whose true values are known, which makes it possible to compute measures such as precision and recall.

For a binary classification use case, a confusion matrix is a 2×2 matrix with the following layout:

                      Predicted 1 (Positive)   Predicted 0 (Negative)
Actual 1 (Positive)   TP                       FN
Actual 0 (Negative)   FP                       TN

In this layout:

Actual Class 1 has the value 1, which corresponds to the positive outcome in a binary classification.

Actual Class 2 has the value 0, which corresponds to the negative outcome in a binary classification.

The rows of the confusion matrix (the left-side index) represent the actual values, and the columns (the top index) represent the predicted values.

A confusion matrix is built from the following components:

Positive (P): The observation is positive (example: the image is a cat).
Negative (N): The observation is negative (example: the image is not a cat).
True Positive (TP): The predicted value and the actual value are both 1 (positive).
True Negative (TN): The predicted value and the actual value are both 0 (negative).
False Negative (FN): The predicted value is 0 (negative) but the actual value is 1. The two values do not match, so the negative prediction is false.
False Positive (FP): The predicted value is 1 (positive) but the actual value is 0. Again the two values do not match, so the positive prediction is false.
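The four components can be counted directly from a list of actual labels and a list of predicted labels. The snippet below is a minimal sketch of that idea, not a reference implementation: the function name `confusion_counts` and the sample labels are made up for illustration, and it assumes labels are encoded as 1 (positive) and 0 (negative) as described above.

```python
from collections import Counter


def confusion_counts(actual, predicted):
    """Count TP, TN, FP and FN for binary labels (1 = positive, 0 = negative)."""
    counts = Counter()
    for a, p in zip(actual, predicted):
        if a == 1 and p == 1:
            counts["TP"] += 1      # predicted positive, actually positive
        elif a == 0 and p == 0:
            counts["TN"] += 1      # predicted negative, actually negative
        elif a == 0 and p == 1:
            counts["FP"] += 1      # predicted positive, actually negative
        elif a == 1 and p == 0:
            counts["FN"] += 1      # predicted negative, actually positive
    # Return all four keys, even when a count is zero
    return {key: counts[key] for key in ("TP", "TN", "FP", "FN")}


# Hypothetical labels, for illustration only
actual = [1, 1, 1, 0, 0, 0, 0, 1]
predicted = [1, 0, 1, 0, 1, 0, 0, 1]

counts = confusion_counts(actual, predicted)
print(counts)  # {'TP': 3, 'TN': 3, 'FP': 1, 'FN': 1}
```

Libraries such as scikit-learn offer a ready-made `confusion_matrix` function; the hand-written loop is used here only to make each of the four cases explicit.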

TPR (True Positive Rate) or Sensitivity

The True Positive Rate, also known as Sensitivity, is the percentage of actual positives that are correctly predicted as positive: TPR = TP / (TP + FN), where (TP + FN) is the total number of actual positives.

TNR (True Negative Rate) or Specificity

The True Negative Rate, or Specificity, is the percentage of actual negatives that are correctly predicted as negative: TNR = TN / (TN + FP), where (TN + FP) is the total number of actual negatives.

False Positive Rate(FPR)

The False Positive Rate is the percentage of actual negatives that are wrongly predicted as positive: FPR = FP / (FP + TN), where (FP + TN) is the total number of actual negatives. It is equal to 1 − TNR.

False Negative Rate (FNR)

The False Negative Rate is the percentage of actual positives that are wrongly predicted as negative: FNR = FN / (FN + TP), where (FN + TP) is the total number of actual positives. It is equal to 1 − TPR.
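The four rates can be sketched in a few lines of code. The example uses the standard definitions TPR = TP / (TP + FN), TNR = TN / (TN + FP), FPR = FP / (FP + TN) and FNR = FN / (FN + TP). The function name `rates` and the input counts are hypothetical, chosen only to illustrate the arithmetic.

```python
def rates(tp, tn, fp, fn):
    """Return TPR, TNR, FPR and FNR as fractions (None if a denominator is zero)."""

    def ratio(numerator, denominator):
        # Guard against empty classes, e.g. no actual positives at all
        return numerator / denominator if denominator else None

    return {
        "TPR": ratio(tp, tp + fn),  # sensitivity: share of actual positives caught
        "TNR": ratio(tn, tn + fp),  # specificity: share of actual negatives cleared
        "FPR": ratio(fp, fp + tn),  # share of actual negatives wrongly flagged
        "FNR": ratio(fn, fn + tp),  # share of actual positives missed
    }


# Hypothetical counts, for illustration only
example = rates(tp=3, tn=3, fp=1, fn=1)
print(example)  # {'TPR': 0.75, 'TNR': 0.75, 'FPR': 0.25, 'FNR': 0.25}
```

Note that TPR and FNR share the same denominator (actual positives), as do TNR and FPR (actual negatives), so each pair sums to 1.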

An Overview of False Positives and False Negatives and How They Relate to Cybersecurity

Understanding the differences between false positives and false negatives, and how they relate to cybersecurity, is important for anyone working in information security. Why? Investigating false positives wastes time and resources, and it distracts your team from the real cyber incidents surfaced by your SIEM.

What Are False Positives, and How Do They Relate to Cybersecurity?

False positives are mislabeled security alerts that indicate a threat when in actuality there isn't one. These false, non-malicious alerts (SIEM events) increase noise for already overworked security teams, and they can be caused by software bugs, poorly written software, or unrecognized network traffic.

By default, most security teams are conditioned to ignore false positives. Unfortunately, this practice of ignoring security alerts, no matter how trivial they may seem, can create alert fatigue and cause your team to miss actual, important alerts related to real, malicious cyber threats (as was the case with the Target data breach).
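To see why false positives can dominate an alert queue, it helps to work through the arithmetic. The sketch below is purely illustrative: the event volumes and the detection rates are invented numbers, not measurements from any real SIEM, and `alert_breakdown` is a hypothetical helper name.

```python
def alert_breakdown(benign_events, malicious_events, fpr, tpr):
    """Estimate how many alerts are false and how many are true.

    fpr: fraction of benign events that wrongly raise an alert
    tpr: fraction of malicious events that correctly raise an alert
    """
    false_alerts = benign_events * fpr
    true_alerts = malicious_events * tpr
    total_alerts = false_alerts + true_alerts
    share_false = false_alerts / total_alerts if total_alerts else 0.0
    return false_alerts, true_alerts, share_false


# Assumed numbers: 1,000,000 benign events, 100 malicious events,
# a 1% false positive rate and a 90% true positive rate.
false_alerts, true_alerts, share_false = alert_breakdown(
    benign_events=1_000_000, malicious_events=100, fpr=0.01, tpr=0.9
)
print(f"false alerts: {false_alerts:.0f}")
print(f"true alerts:  {true_alerts:.0f}")
print(f"share of alerts that are false: {share_false:.1%}")
```

Under these assumed numbers, even a 1% false positive rate produces about ten thousand false alerts against roughly ninety true ones, so around 99% of the queue is noise. That is the situation in which alert fatigue sets in.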

Strengthening Our Cybersecurity Posture with the Confusion Matrix

The existence of both false positives and false negatives raises the question: does your cybersecurity strategy include proactive measures? Most security programs rely on preventative and reactive components, which establish strong defenses against the attacks those tools already know about. Proactive security measures, on the other hand, include implementing incident response policies and procedures and actively hunting for hidden or unknown attacks.

Here are a few simple rules to help govern our approach to cybersecurity with a preventative, reactive, and proactive mindset:

Assume we are breached, and begin our offensive initiatives with the goal of finding those breaches. By doing so, we validate the strength of our defensive and prevention tools, with the understanding that none of them are 100% effective.
Use asset discovery tools to discover the hosts, systems, servers, and applications within our network environment, because we can't protect what we don't know exists.
Execute regular compromise assessments and inspect every asset residing on our network.
Define security policies and procedures, and implement educational and training requirements so our entire team knows what to do in the event we discover a hidden breach or, worse, fall victim to a data breach.
Time is our most valuable asset, so implementing tools and technology that shorten our time to detect and time to respond is key and can help our security team prevent a data breach.

To reduce false positives in fraud investigations, careful attention should be paid to steps including:

Use natural language processing (NLP): Sift through unstructured data, including emails, messages, and audio and video files, to unearth nuances in communication or connections that are unclear in structured, text-only data. For example, the ability of NLP to analyze word choice, tone, and possible stress levels expressed in a voicemail can sometimes offer more insight during investigations than text on a page alone.
Training and self-learning: Train analytics to learn from a variety of data sources, such as risk issues the organization has confronted in the past. The corresponding models can then adapt over time to future risks.
Backtesting: Scientifically test the performance of forensic analytics to evaluate their continued use. Backtesting can help establish confidence that pattern recognition models and algorithms work well and are effective in finding suspicious patterns of interest.
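The backtesting step can be illustrated with a small sketch that replays a detection rule over historical cases whose outcome is already known and tallies the confusion matrix at several thresholds. Everything here is an assumption made for illustration: the rule is a simple risk-score threshold, `backtest` is a hypothetical helper, and the history is invented data.

```python
def backtest(history, threshold):
    """Replay a score-threshold rule over labeled historical events.

    history: iterable of (risk_score, confirmed_fraud) pairs
    threshold: events with risk_score >= threshold are flagged
    """
    result = {"TP": 0, "TN": 0, "FP": 0, "FN": 0}
    for risk_score, confirmed_fraud in history:
        flagged = risk_score >= threshold
        if flagged and confirmed_fraud:
            result["TP"] += 1   # flagged and really fraud
        elif flagged and not confirmed_fraud:
            result["FP"] += 1   # flagged but harmless
        elif not flagged and confirmed_fraud:
            result["FN"] += 1   # missed fraud
        else:
            result["TN"] += 1   # correctly left alone
    return result


# Invented historical cases: (risk_score, confirmed_fraud)
history = [
    (0.95, True), (0.80, False), (0.65, True), (0.40, False),
    (0.30, False), (0.20, True), (0.10, False), (0.05, False),
]

for threshold in (0.25, 0.5, 0.75):
    print(threshold, backtest(history, threshold))
```

On this toy data, lowering the threshold to 0.25 raises false positives from 1 to 3 without catching any more fraud than the 0.5 threshold does, while raising it to 0.75 adds a missed case. A backtest is meant to surface this kind of trade-off before a rule is trusted in production.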

I hope you find this article helpful!

For any queries, contact me through my LinkedIn profile.

Happy learning :)