Machine learning is becoming more prominent in complex real-world applications such as self-driving vehicles and healthcare. However, ML-powered models can fail in mysterious or complicated ways. Autonomous cars, for example, have repeatedly collided with the same type of highway lane barrier. Such failures emphasize the need to verify model quality and improve models over time, particularly when machine learning is applied in mission-critical domains.
In this post, we'll look at model assertions and how they can help you improve your machine learning models. The post draws on a research paper by Daniel Kang, Deepti Raghavan, Peter Bailis, and Matei Zaharia.
What are Model Assertions?
Model assertions are black-box functions that flag when ML models exhibit potential errors. Data scientists, engineers, and domain experts can use model assertions to detect when ML models are having issues. In simple terms, a model assertion takes a model's inputs and outputs and identifies records with possible mistakes.
- A model assertion returns a severity score, where 0 represents abstention.
- It can be used to take corrective action or to monitor the model in real time.
- The inputs to a model assertion are a collection of past inputs and predictions (see the sketch after this list).
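To make the interface concrete, here is a minimal sketch of what a model assertion can look like as a plain Python function. The types and the example assertion below are illustrative only and are not the OMG library's actual API; the only contract is that the function takes recent inputs and outputs and returns a severity score, with 0 meaning it abstains.

```python
from dataclasses import dataclass
from typing import Callable, Sequence


@dataclass
class Prediction:
    """Illustrative model output; real outputs could be boxes, labels, etc."""
    label: str
    confidence: float


# A model assertion is a black-box function over recent inputs and outputs
# that returns a severity score; 0.0 means the assertion abstains.
Assertion = Callable[[Sequence[object], Sequence[Prediction]], float]


def low_confidence(inputs: Sequence[object], outputs: Sequence[Prediction]) -> float:
    """Flag batches where the model is unusually unsure of its predictions."""
    if not outputs:
        return 0.0  # nothing to judge, so abstain
    avg_conf = sum(p.confidence for p in outputs) / len(outputs)
    return 1.0 - avg_conf if avg_conf < 0.5 else 0.0
```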
To better grasp the concept, let's take an example of object detection. Pretrained object detection models can be used in video analytics to recognize objects such as vehicles (as seen in the video above). Flickering is a prevalent issue with these models: an object detected in one frame vanishes in the next and reappears shortly after, which is clearly incorrect. Sometimes labelers also consistently miss particular objects, which can lead to significant safety problems.
Using model assertions
Model assertions are available as a Python library. Simply install the package to use it on your system:
```bash
pip install model_assertions
```
OMG is a prototype library for model assertions that integrates with existing Python machine learning training and deployment frameworks. OMG comprises three components: an API for specifying assertions, a runtime engine for evaluating assertions, and a pipeline that improves models by combining assertions with weak supervision and active learning. Model assertions can be used in four different ways at runtime and during training:
- Runtime Monitoring: Assertions can be used in a data analysis pipeline to gather information about improper behavior and to identify the kinds of failures where models go haywire.
- Corrective Action: If a model assertion triggers at runtime, it can initiate a corrective action, such as handing control back to a human operator (a small sketch of runtime monitoring with a corrective fallback follows this list).
- Active Learning: Assertions can be used to determine which inputs cause the model to fail. Frames that trigger the flickering assertion, for example, can be relabeled and then used to retrain the model.
- Weak Supervision: Corrective rules associated with assertions can automatically relabel incorrect model outputs, providing approximate (weak) labels that improve model quality without additional effort from human labelers.
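As a rough illustration of the first two uses, the sketch below logs every assertion that fires (runtime monitoring) and escalates to a human operator when the severity is high (corrective action). The helper names, types, and threshold are hypothetical, not part of the OMG API.

```python
import logging
from typing import Callable, Dict, Sequence

logger = logging.getLogger("assertion-monitor")

# An assertion maps (recent inputs, recent outputs) to a severity score;
# 0.0 means it abstains. These names are illustrative.
Assertion = Callable[[Sequence[object], Sequence[object]], float]


def monitor_and_correct(
    inputs: Sequence[object],
    outputs: Sequence[object],
    assertions: Dict[str, Assertion],
    severity_threshold: float = 0.8,
) -> bool:
    """Return True if control should be handed back to a human operator."""
    escalate = False
    for name, assertion in assertions.items():
        severity = assertion(inputs, outputs)
        if severity > 0.0:
            # Runtime monitoring: record the failure mode for offline analysis.
            logger.warning("assertion %s fired with severity %.2f", name, severity)
        if severity >= severity_threshold:
            # Corrective action: defer to a human rather than trust the model.
            escalate = True
    return escalate
```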
Training models via model assertions
You can use model assertions to train your models. Assertions flag the inputs on which the model misbehaves; human labelers can then label those inputs, and the model can be retrained on them. This, however, raises several questions, the most significant of which is how to choose which data points to label for active learning.
Let's say you have a set of data points, and one assertion flags data points B and C, whereas another assertion flags data points A and C.
In general, many assertions can flag the same data point, and a single assertion can flag multiple data points, which raises the question of which data points should be labeled first. The researchers developed a bandit algorithm based on model assertions. The fundamental idea is to prioritize data points flagged by the assertions that yield the highest reduction in the number of assertions triggered.
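The sketch below is a simplified greedy stand-in for the paper's bandit algorithm, not the algorithm itself. It assumes we already have an estimate of how much labeling each assertion's flagged points would reduce future assertion firings, and it fills the labeling budget from the most promising assertions first; all names and numbers are illustrative.

```python
from typing import Dict, List, Set


def select_points_to_label(
    flagged: Dict[str, Set[str]],  # assertion name -> ids of flagged data points
    reduction: Dict[str, float],   # estimated drop in assertions fired per assertion
    budget: int,
) -> List[str]:
    """Greedily pick points from the assertions expected to help retraining most."""
    ranked = sorted(flagged, key=lambda name: reduction.get(name, 0.0), reverse=True)
    chosen: List[str] = []
    seen: Set[str] = set()
    for name in ranked:
        for point in sorted(flagged[name]):
            if point not in seen:
                chosen.append(point)
                seen.add(point)
            if len(chosen) == budget:
                return chosen
    return chosen


# Tiny example mirroring the text: one assertion flags B and C, another flags A and C.
flagged = {"flickering": {"B", "C"}, "overlapping": {"A", "C"}}
reduction = {"flickering": 0.4, "overlapping": 0.1}
print(select_points_to_label(flagged, reduction, budget=2))  # -> ['B', 'C']
```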
Model assertions can also be used to train models via weak supervision. Given a set of inputs that trigger an assertion, we can utilize model assertions combined with corrective rules to create weak labels, which can subsequently be used to retrain models.
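A minimal sketch of that weak-supervision loop is shown below, with hypothetical signatures: for each (input, prediction) pair that triggers the assertion, a corrective rule proposes a replacement label, and the resulting pairs become weakly labeled training data.

```python
from typing import Callable, List, Tuple

# Hypothetical signatures: the assertion returns a severity score (0 = abstain),
# and the corrective rule proposes a replacement label for a flagged input.
Assertion = Callable[[object, object], float]
CorrectiveRule = Callable[[object, object], object]


def weak_labels_from_assertion(
    data: List[Tuple[object, object]],  # (input, model output) pairs
    assertion: Assertion,
    corrective_rule: CorrectiveRule,
) -> List[Tuple[object, object]]:
    """Build a weakly labeled training set from inputs that trigger the assertion."""
    weakly_labeled = []
    for x, y_pred in data:
        if assertion(x, y_pred) > 0.0:
            # No human labeler involved: the rule itself proposes the new label.
            weakly_labeled.append((x, corrective_rule(x, y_pred)))
    return weakly_labeled
```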
Finding errors in ML models via model assertions
Let's take the same flickering example. Suppose a box is detected in frames 1 and 3, but nothing is detected in frame 2. If a box appears in frames 1 and 3, a box should also appear in frame 2, i.e., assert(box in frames 1 and 3 => box in frame 2). If this assertion triggers, it can be used for corrective action or runtime monitoring.
```python
from typing import List

# The assertion returns a severity score; 0 means it abstains.
def flickering(
    recent_frames: List["PixelBuf"],
    recent_outputs: List["BoundingBox"],
) -> float:
    ...
```
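One possible body for this assertion is sketched below. It is a simplified version that only inspects the recent outputs (the raw frames in the signature above are not needed for this particular check): if a box is present in the frames on either side of a frame with no detection, the assertion fires with severity 1.0. The BoundingBox stand-in is illustrative.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class BoundingBox:
    """Stand-in for a frame's detection; None in the list below means no detection."""
    x1: float
    y1: float
    x2: float
    y2: float


def flickering(recent_outputs: List[Optional[BoundingBox]]) -> float:
    """Fire when a box appears in frames i-1 and i+1 but not in frame i."""
    for prev, curr, nxt in zip(recent_outputs, recent_outputs[1:], recent_outputs[2:]):
        if prev is not None and nxt is not None and curr is None:
            return 1.0  # object vanished for a single frame: likely a missed detection
    return 0.0  # no flicker found, so the assertion abstains
```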
Improving Model Quality with Assertions
Companies, in most circumstances, do not have an enormous number of data points to work with. Instead, they frequently have to make do with small datasets, which are prone to disappointing results. Model assertions can provide a significant improvement here:
- They help identify label inconsistencies and improve label quality.
- Model assertion-based active learning outperforms baselines.
- They produce high-accuracy results with fewer errors.
- They produce more consistent results.
OMG and model assertions have been applied in various fields, including entertainment, medicine, and autonomous vehicles. To test accuracy, the researchers collected frames where each assertion fired, along with random frames, from multiple videos to use as training data.
The first several assertions were tested on video analytics use cases, while the others were applied to self-driving vehicle and ECG datasets. The true positive rate for all five assertions was at least 88%, indicating that model assertions can be deployed with a few lines of code and a high true positive rate. In simple terms, they find errors with high precision.
Active Learning: As a simple baseline for active learning, the researchers compared against retraining with randomly sampled frames. They found that retraining with frames that triggered each assertion alone, and with both assertions together, significantly increased mAP compared to the pretrained model and the random baseline.
Weak Supervision: When access to ground-truth labels is limited, retraining the model with the flickering assertion and its accompanying corrective rule can improve mAP.
Two videos are included below to give a qualitative sense of the improvements. The original SSD is shown on the left and the best retrained SSD on the right, both over the same data clip.
As you can see, the video on the right is more consistent than the one on the left. The retrained model makes fewer errors: it does not produce predictions that flicker or bounding boxes that improperly overlap.
Conclusion
Model assertions can help you consistently improve your models. Although the approach is still in its early stages, the results are promising. Selecting training data using model assertions can be up to 40% less expensive than traditional approaches. There is plenty of development ahead in this area, and it should become possible to apply model assertions at a large scale.
If you have any queries about model assertions, please contact the researchers by email: modelassertions@cs.stanford.edu
References:
- OMG - GitHub
- Model Assertions for Debugging Machine Learning
- Model Assertions as a Tool for Quality Assurance and Improving ML Models