ML diagnostics are tests that identify potential issues and guide improvements at different stages of the ML development process.
What is ML Diagnostics?
ML diagnostics refers to tests designed to identify and troubleshoot potential issues, and to apply possible improvements, at different stages of training and developing ML models.
These tests offer contextual insights into what is working in the learning algorithm, what is not, and how to improve the model's performance.
Diagnostic checks include different types of tests, such as dataset sanity checks, model checks, and leakage detection. Some checks consider dataset characteristics and use warning mechanisms to help avoid common pitfalls in evaluation metrics. Others run after model training and help identify issues such as data leakage or overfitting, so they can be fixed before deployment.
Why is Model Diagnostics Important?
Machine learning diagnostics help ML practitioners gain insights into the failure modes and capabilities of ML models during training. They also guide ML professionals in improving model performance.
Diagnostic runs involve probing a model for particular learned qualities, which can be positive or potentially problematic. Positive qualities include syntactic knowledge acquired by the model; problematic qualities include high bias or variance. Diagnostic checks often investigate the following aspects of ML models:
- Evaluating a hypothesis
- Acquisition of syntactic knowledge
- Presence of bias and stereotypes
- Specific phenomena or areas where the model can be further improved
Performing ML Diagnostics for ML Projects
Modern collaborative data science tools support a range of diagnostic test runs on both training and deployed models. Here are some examples of popular diagnostic tests performed using such platforms:
Dataset sanity checks: These diagnostic checks ensure that the evaluation dataset accurately represents both the training data and the future scoring data.
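One simple sanity check is comparing the label distribution of the training split against the evaluation split. The sketch below uses only the Python standard library; the function names and the 10% tolerance are illustrative assumptions, not part of any particular platform's API.

```python
from collections import Counter

def label_distribution(labels):
    """Return each label's relative frequency in a list of labels."""
    counts = Counter(labels)
    total = len(labels)
    return {label: count / total for label, count in counts.items()}

def check_label_shift(train_labels, eval_labels, tolerance=0.1):
    """Warn if any label's frequency differs between the training and
    evaluation splits by more than `tolerance` (absolute difference).
    The 0.1 default is an illustrative assumption."""
    train_dist = label_distribution(train_labels)
    eval_dist = label_distribution(eval_labels)
    warnings = []
    for label in set(train_dist) | set(eval_dist):
        gap = abs(train_dist.get(label, 0.0) - eval_dist.get(label, 0.0))
        if gap > tolerance:
            warnings.append((label, round(gap, 3)))
    return warnings

# A balanced training split vs. a heavily skewed evaluation split:
train = ["cat"] * 50 + ["dog"] * 50
evals = ["cat"] * 90 + ["dog"] * 10
print(check_label_shift(train, evals))  # both labels flagged with a 0.4 gap
```

Real-world checks would also compare feature distributions (for example, with a statistical distance measure), but the idea is the same: flag evaluation data that does not resemble what the model will see in production.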
Underfitting and overfitting detection: Diagnosing high bias or high variance shows whether the model is underfit or overfit. An underfit model fails to capture sufficient information from the data, whereas an overfit model fails to generalize to newly arriving data.
Leakage detection: When the test and training datasets overlap, the model achieves unrealistically high evaluation performance due to data leakage.
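The most basic leakage check looks for exact-duplicate rows shared by both splits. This stdlib sketch assumes rows are hashable tuples; real pipelines might hash normalized records or search for near-duplicates instead.

```python
def detect_leakage(train_rows, test_rows):
    """Return the rows that appear in both splits and the fraction of
    the test split they contaminate."""
    overlap = set(train_rows) & set(test_rows)
    leakage_ratio = len(overlap) / len(test_rows) if test_rows else 0.0
    return overlap, leakage_ratio

train_rows = [(1.0, "a"), (2.0, "b"), (3.0, "c")]
test_rows = [(3.0, "c"), (4.0, "d")]
overlap, ratio = detect_leakage(train_rows, test_rows)
print(overlap, ratio)  # {(3.0, 'c')} 0.5
```

A nonzero ratio means the reported test metrics overstate how the model will perform on genuinely unseen data, and the duplicated rows should be removed from one of the splits.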
Abnormal predictions detection: A diagnostic test that checks whether the model predicts the same class (output) for all samples. This behavior can be caused by imbalanced datasets or inadequate training parameters.
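Such a check can be as simple as measuring how dominant the most frequent predicted class is. The sketch below is a minimal stdlib version; the 99% dominance threshold is an illustrative assumption.

```python
from collections import Counter

def detect_constant_predictions(predictions, dominance_threshold=0.99):
    """Return the dominant class if the model predicts (almost) the same
    class for every sample, else None. The threshold is an
    illustrative assumption."""
    if not predictions:
        return None
    label, count = Counter(predictions).most_common(1)[0]
    if count / len(predictions) >= dominance_threshold:
        return label
    return None

print(detect_constant_predictions([1] * 100))            # 1
print(detect_constant_predictions([1] * 60 + [0] * 40))  # None
```

When this check fires, typical remedies include rebalancing or reweighting the classes, lowering the learning rate, or reviewing the loss function.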
Industry experts, such as Google researchers, suggest the following practices after performing diagnostic experiments:
- Restrict conclusions to a specific checkpoint
- Do not generalize a single diagnostic outcome to the complete training setup
- Prefer testing diagnostic tools on publicly available checkpoints and multiple model configurations
ML diagnostics provide insights into likely model failure modes and help practitioners apply the best remedial solutions to the issues detected in diagnostic runs.