ML Model Metrics

ML model metrics help evaluate the performance of machine learning models.

What are Metrics?

Metrics quantify how well a model's predictions match the ground truth, and they play a crucial role in validating a model within the machine learning pipeline.

Choosing the right evaluation metric from the available set is tricky. A single metric is sometimes inadequate to assess a model properly. In such cases, ML practitioners rely on a subset of the available metrics.

Different types of Metrics used in ML

Confusion Matrix

The confusion matrix is not exactly a performance metric, but it offers a convenient representation from which other performance metrics can be computed. Also called an error matrix, it is a tabular view of the ground-truth labels versus the model's predictions. In the layout used here, each row corresponds to the instances in a predicted class, whereas each column corresponds to the instances in an actual class; some tools use the transposed layout.

For example, let's assume our null hypothesis H₀ is “The individual has diabetic retinopathy”.
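To make the table concrete, here is a minimal Python sketch that counts the four cells of a binary confusion matrix, for instance for a screening task like the one above. The labels are invented for illustration, and `confusion_counts` is a hypothetical helper, not a library function.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, FP, FN, TN) for a binary classification problem."""
    tp = fp = fn = tn = 0
    for actual, predicted in zip(y_true, y_pred):
        if predicted == positive and actual == positive:
            tp += 1  # predicted positive, actually positive
        elif predicted == positive:
            fp += 1  # predicted positive, actually negative
        elif actual == positive:
            fn += 1  # predicted negative, actually positive
        else:
            tn += 1  # predicted negative, actually negative
    return tp, fp, fn, tn


# Invented labels: 1 = has the condition, 0 = does not.
y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 0, 1, 0, 1, 0, 0, 1]

tp, fp, fn, tn = confusion_counts(y_true, y_pred)

# Rows are predicted classes, columns are actual classes.
matrix = [[tp, fp],
          [fn, tn]]
print(matrix)  # [[3, 1], [1, 3]]
```

Every metric discussed in the rest of this article can be derived from these four counts.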

Classification accuracy

Classification accuracy is the simplest metric used to evaluate classification models.

It is defined as the number of correct predictions divided by the total number of predictions; multiplying the result by 100 expresses it as a percentage.
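A minimal Python sketch of this definition, using invented labels. The name `accuracy_percent` is illustrative and not taken from any library. The second call hints at why accuracy alone can mislead when one class dominates the data.

```python
def accuracy_percent(y_true, y_pred):
    """Percentage of predictions that match the ground-truth labels."""
    if len(y_true) != len(y_pred) or len(y_true) == 0:
        raise ValueError("inputs must be non-empty and of equal length")
    correct = sum(1 for actual, predicted in zip(y_true, y_pred)
                  if actual == predicted)
    return 100.0 * correct / len(y_true)


# Invented labels for illustration only: 3 of 4 predictions are correct.
balanced = accuracy_percent([1, 0, 1, 0], [1, 0, 0, 0])

# Imbalanced case: 95 negatives, 5 positives, and a model that always
# predicts the negative class. It never finds a positive, yet scores 95%.
imbalanced_truth = [0] * 95 + [1] * 5
always_negative = accuracy_percent(imbalanced_truth, [0] * 100)

print(balanced, always_negative)  # 75.0 95.0
```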


Precision

Precision is typically used where classification accuracy alone is not sufficient to indicate model performance, for example when the classes are imbalanced.

Precision is the ratio of true positives to all predicted positives. Mathematically, it is defined as:

Precision = True_Positive / (True_Positive + False_Positive)
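As a small sketch, the formula translates directly into Python. The counts below are invented, and returning 0.0 when the model predicts no positives at all is a convention assumed for this example, since the ratio is undefined in that case.

```python
def precision(true_positive, false_positive):
    """TP / (TP + FP), i.e. how many predicted positives are correct."""
    predicted_positive = true_positive + false_positive
    if predicted_positive == 0:
        return 0.0  # assumed convention; the ratio is undefined here
    return true_positive / predicted_positive


# Invented counts: 30 correct positive predictions, 10 false alarms.
print(precision(30, 10))  # 0.75
```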


Recall

Recall is defined as the fraction of samples from a class that the model predicts correctly. For the positive class, it measures how many of the actual positives the model captures and labels as positive. Mathematically, it is defined as:

Recall = True_Positive / (True_Positive + False_Negative)
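Because recall is defined per class, one way to illustrate it is to compute it for every class in a small multi-class example. This is a sketch with invented labels; `recall_per_class` is a hypothetical helper.

```python
from collections import Counter


def recall_per_class(y_true, y_pred):
    """For each actual class, the fraction of its samples predicted correctly."""
    actual_counts = Counter(y_true)
    # Count only the samples where the prediction matches the true label.
    correct_counts = Counter(a for a, p in zip(y_true, y_pred) if a == p)
    return {label: correct_counts[label] / count
            for label, count in actual_counts.items()}


# Invented labels for illustration only.
y_true = ["cat", "cat", "cat", "dog", "dog", "bird"]
y_pred = ["cat", "cat", "dog", "dog", "dog", "cat"]

recalls = recall_per_class(y_true, y_pred)
# cat: 2 of 3 found, dog: 2 of 2 found, bird: 0 of 1 found
print(recalls)
```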


F1 Score

Several ML applications require attention to both precision and recall. For such applications, the two are combined into a single metric, the F1 score, which is the harmonic mean of precision and recall.

It is defined as:

F1-score = 2 * Precision * Recall / (Precision + Recall)
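The following sketch evaluates the formula for two invented precision/recall pairs. It is meant to show that the F1 score stays low when either of the two inputs is low, unlike a plain average. Returning 0.0 when both inputs are zero is an assumed convention.

```python
def f1(precision, recall):
    """2 * P * R / (P + R), the harmonic mean of precision and recall."""
    if precision + recall == 0:
        return 0.0  # assumed convention; the ratio is undefined here
    return 2 * precision * recall / (precision + recall)


balanced = f1(0.75, 0.75)  # equal inputs give the same value back
lopsided = f1(1.0, 0.1)    # perfect precision cannot hide poor recall

print(f"{balanced:.3f} {lopsided:.3f}")  # 0.750 0.182
```

For comparison, the arithmetic mean of 1.0 and 0.1 would be 0.55, which would overstate how useful the second model is.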

ML professionals often observe a trade-off between the precision and recall of a model. Raising the decision threshold typically increases precision at the cost of recall, and lowering it typically does the opposite.
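The trade-off can be seen by sweeping the decision threshold over a set of predicted scores. The sketch below uses invented scores and labels; `precision_recall_at` is a hypothetical helper, and the clean monotone pattern in the output is a property of this toy data rather than a guarantee.

```python
def precision_recall_at(y_true, scores, threshold):
    """Precision and recall when scores >= threshold are labelled positive."""
    tp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 1)
    fp = sum(1 for y, s in zip(y_true, scores) if s >= threshold and y == 0)
    fn = sum(1 for y, s in zip(y_true, scores) if s < threshold and y == 1)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    return precision, recall


# Invented ground truth and model scores for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]

for threshold in (0.35, 0.55, 0.75):
    p, r = precision_recall_at(y_true, scores, threshold)
    print(f"threshold={threshold:.2f} precision={p:.2f} recall={r:.2f}")
# threshold=0.35 precision=0.67 recall=1.00
# threshold=0.55 precision=0.75 recall=0.75
# threshold=0.75 precision=1.00 recall=0.50
```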

Sensitivity and Specificity 

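Sensitivity and specificity are two other metrics, commonly used in the medical and biological domains. Here TP, FP, TN and FN denote the numbers of true positives, false positives, true negatives and false negatives. The metrics are defined as:

Sensitivity = Recall = TP / (TP + FN)
Specificity = True Negative Rate = TN / (TN + FP)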
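A short sketch of both formulas, applied to invented counts from a hypothetical screening test. The function names are illustrative, and raising an error on an empty class is a choice made for this example.

```python
def sensitivity(tp, fn):
    """TP / (TP + FN): share of actual positives that are detected."""
    if tp + fn == 0:
        raise ValueError("no actual positives in the data")
    return tp / (tp + fn)


def specificity(tn, fp):
    """TN / (TN + FP): share of actual negatives that are correctly cleared."""
    if tn + fp == 0:
        raise ValueError("no actual negatives in the data")
    return tn / (tn + fp)


# Invented counts: 50 people with the condition, 950 without it.
tp, fn = 45, 5
tn, fp = 900, 50

print(f"sensitivity={sensitivity(tp, fn):.3f}")  # sensitivity=0.900
print(f"specificity={specificity(tn, fp):.3f}")  # specificity=0.947
```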

AUROC (Area Under the Receiver Operating Characteristic Curve)

It is also referred to as the AUC-ROC score. It is built from the true-positive rate (TPR) and the false-positive rate (FPR):

True Positive Rate = True Positive / (True Positive + False Negative)
False Positive Rate = False Positive / (False Positive + True Negative)

The receiver operating characteristic curve or plot presents the performance of a binary classifier as a function of its cut-off threshold. The plot shows TPR against FPR for different threshold values.

  • TPR/recall specifies the proportion of positive data points that are correctly classified as positive, w.r.t. all actual positive data points.
  • FPR/fallout specifies the proportion of negative data points that are mistakenly classified as positive, w.r.t. all actual negative data points.

FPR and TPR are then combined into a single metric known as AUROC. To do this, we compute FPR and TPR at many different threshold values of the classifier (for example, a logistic regression) and plot the resulting pairs on a single graph, the ROC curve.
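The procedure above can be sketched in plain Python: each distinct score is used as a threshold, and one (FPR, TPR) pair is recorded per threshold. The data is invented and `roc_points` is a hypothetical helper; an infinite starting threshold is added so the curve begins at (0, 0).

```python
def roc_points(y_true, scores):
    """Return (FPR, TPR) pairs, one per threshold, from strictest to loosest."""
    positives = sum(1 for y in y_true if y == 1)
    negatives = len(y_true) - positives
    if positives == 0 or negatives == 0:
        raise ValueError("both classes must be present")
    # Start above every score so the first point is (0.0, 0.0).
    thresholds = [float("inf")] + sorted(set(scores), reverse=True)
    points = []
    for t in thresholds:
        tp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 1)
        fp = sum(1 for y, s in zip(y_true, scores) if s >= t and y == 0)
        points.append((fp / negatives, tp / positives))
    return points


# Invented ground truth and model scores for illustration only.
y_true = [1, 1, 1, 1, 0, 0, 0, 0]
scores = [0.9, 0.8, 0.6, 0.4, 0.7, 0.5, 0.3, 0.2]

for fpr, tpr in roc_points(y_true, scores):
    print(f"FPR={fpr:.2f} TPR={tpr:.2f}")
```

Plotting TPR against FPR for these points gives the ROC curve, which runs from (0, 0) at the strictest threshold to (1, 1) at the loosest.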

The area under the ROC curve (AUC) is an aggregate measure of a binary classifier's performance across all possible thresholds. It lies between 0 and 1: a value of 0.5 corresponds to random guessing, and a value of 1 means every positive is scored above every negative.
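One common way to approximate this area is the trapezoidal rule over the ROC points. The sketch below assumes the points are already sorted by FPR; the example points are invented, and `auc_trapezoid` is a hypothetical helper rather than a library call.

```python
def auc_trapezoid(points):
    """Area under a curve given as (FPR, TPR) pairs sorted by FPR."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        # Each segment contributes width * average height.
        area += (x1 - x0) * (y0 + y1) / 2.0
    return area


# Invented ROC points for illustration only, from (0, 0) to (1, 1).
roc = [(0.0, 0.0), (0.0, 0.5), (0.25, 0.5), (0.25, 0.75),
       (0.5, 0.75), (0.5, 1.0), (1.0, 1.0)]

# The diagonal is what a classifier guessing at random would trace.
diagonal = [(0.0, 0.0), (1.0, 1.0)]

print(auc_trapezoid(roc), auc_trapezoid(diagonal))  # 0.8125 0.5
```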

Tracking Performance Metrics Effortlessly

ML models are evaluated by monitoring performance metrics specific to each model. ML monitoring solutions simplify tracking these metrics. One such complete model monitoring solution is the Censius AI Observability Platform. It lets you track the metrics mentioned above and take the required actions when monitoring violations occur. A prompt user alert system helps you maintain the desired performance of your ML models.

The Censius ML Monitoring solution accelerates your ML initiatives by automating monitoring tasks and allows you to focus on core jobs.
