ML diagnostics are tests that identify and troubleshoot potential issues, and point to possible improvements, at different stages of developing and training ML models.

ML diagnostic tests offer contextual insight into what is working in the learning algorithm, what is not, and how to improve the performance of the ML model.

Diagnostic checks come in several types, including dataset sanity checks, model checks, and leakage detection. Some checks examine dataset characteristics and use warning mechanisms to help avoid common pitfalls in evaluation metrics. Others run after model training and help identify problems such as data leakage or overfitting, so that these issues can be fixed before deployment.

Why is Model Diagnostics Important?

Machine learning diagnostics help ML practitioners gain insight into the failure modes and capabilities of the models they train. They also guide ML professionals on how to improve model performance.

Diagnostic runs probe a model for particular learned qualities, which can be positive or potentially problematic. Positive qualities include syntactic knowledge acquired by a model, and problematic qualities include bias or variance. Diagnostic checks often investigate the following aspects of ML models:

Evaluation of a hypothesis
Acquisition of syntactic knowledge
Bias and stereotypes
Any specific phenomena or scope for further improving the model

Performing ML Diagnostics for ML Projects

Modern collaborative data science tools facilitate diagnostic test runs on models in training as well as on deployed models. Some popular diagnostic tests performed on such platforms are:

Dataset sanity checks: These diagnostic checks verify that the dataset used for evaluation accurately represents both the training data and future scoring data.
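One simple way to approximate such a check is to compare how often each class appears in the training labels and in the evaluation labels. The sketch below is only an illustration: the function names and the 0.1 tolerance are assumptions, not part of any particular platform, and a real check would usually look at feature distributions as well.

```python
from collections import Counter


def class_proportions(labels):
    """Return {class: share of samples} for a non-empty list of labels."""
    if not labels:
        raise ValueError("labels must not be empty")
    counts = Counter(labels)
    return {cls: n / len(labels) for cls, n in counts.items()}


def max_proportion_gap(train_labels, eval_labels):
    """Largest absolute difference in class share between the two sets."""
    train_p = class_proportions(train_labels)
    eval_p = class_proportions(eval_labels)
    classes = set(train_p) | set(eval_p)
    return max(abs(train_p.get(c, 0.0) - eval_p.get(c, 0.0)) for c in classes)


def looks_representative(train_labels, eval_labels, tolerance=0.1):
    """True if no class share differs by more than `tolerance` (assumed value)."""
    return max_proportion_gap(train_labels, eval_labels) <= tolerance


train_labels = ["spam"] * 80 + ["ham"] * 20
eval_labels = ["spam"] * 50 + ["ham"] * 50
print(looks_representative(train_labels, eval_labels))
```

Here the evaluation set is far more balanced than the training set, so the check reports that it does not look representative.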

Underfitting and overfitting detection: Diagnosing high bias or high variance shows whether the model is underfit or overfit. An underfit model fails to capture sufficient information from the data, whereas an overfit model fails to generalize to new data.
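A common heuristic for this diagnosis compares the training score with the validation score: a low training score suggests high bias, while a large gap between the two suggests high variance. The following is a minimal sketch of that heuristic. The function name and both thresholds are hypothetical and would need tuning for a given metric and task.

```python
def diagnose_fit(train_score, val_score, min_train_score=0.8, max_gap=0.1):
    """Classify a model from two scores where higher is better (e.g. accuracy).

    min_train_score and max_gap are illustrative thresholds, not standards.
    """
    if train_score < min_train_score:
        # The model cannot even fit the data it was trained on: high bias.
        return "underfit"
    if train_score - val_score > max_gap:
        # Good on training data, much worse on unseen data: high variance.
        return "overfit"
    return "ok"


print(diagnose_fit(0.62, 0.60))  # low training score
print(diagnose_fit(0.99, 0.71))  # large train/validation gap
print(diagnose_fit(0.91, 0.89))  # scores are high and close together
```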

Leakage detection: When the test and training datasets overlap, the model achieves unrealistically high performance during evaluation because of data leakage.
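The most direct form of this check looks for test rows that are exact copies of training rows. The sketch below assumes each row is a sequence of hashable values, and its function names are hypothetical. It catches only exact duplicates, not near-duplicates or leakage through features derived from the target.

```python
def shared_rows(train_rows, test_rows):
    """Return the test rows that also appear, value for value, in training."""
    train_seen = {tuple(row) for row in train_rows}
    return [row for row in test_rows if tuple(row) in train_seen]


def leakage_ratio(train_rows, test_rows):
    """Fraction of test rows that are exact copies of a training row."""
    if not test_rows:
        raise ValueError("test_rows must not be empty")
    return len(shared_rows(train_rows, test_rows)) / len(test_rows)


train = [(5.1, 3.5, "a"), (4.9, 3.0, "a"), (6.2, 2.9, "b")]
test = [(4.9, 3.0, "a"), (5.9, 3.0, "b")]
print(shared_rows(train, test))
print(leakage_ratio(train, test))
```

A ratio above zero means some evaluation samples were already seen during training, so scores measured on them overstate how well the model generalizes.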

Abnormal prediction detection: This diagnostic test flags a model that predicts the same class (output) for all the samples provided. Such behavior can be caused by an imbalanced dataset or inadequate training parameters.
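This check can be sketched by measuring the share of predictions that fall into the most frequent class. The names below are hypothetical, and the default threshold of 1.0 (every prediction identical) is an assumption. A lower threshold would also flag models that almost always predict one class.

```python
from collections import Counter


def dominant_class_share(predictions):
    """Share of predictions that belong to the most frequently predicted class."""
    if not predictions:
        raise ValueError("predictions must not be empty")
    _, top_count = Counter(predictions).most_common(1)[0]
    return top_count / len(predictions)


def predicts_single_class(predictions, threshold=1.0):
    """True if the dominant class covers at least `threshold` of the predictions."""
    return dominant_class_share(predictions) >= threshold


print(predicts_single_class([0, 0, 0, 0]))
print(predicts_single_class([0, 1, 0, 0]))
print(predicts_single_class([0, 1, 0, 0], threshold=0.7))
```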

Industry experts, such as researchers at Google, suggest the following practices when drawing conclusions from a diagnostic experiment:

Restrict conclusions to a specific checkpoint
Do not generalize a single diagnostic outcome to the complete training setup
Test diagnostic tools on publicly available checkpoints and on multiple model configurations

ML diagnostics provide insight into how a model might fail and help apply suitable remedies for the issues detected in diagnostic runs.
