Training-serving skew is a gap between an ML model’s performance during training and its performance during serving, caused by changes in the data or by discrepancies in how the data is handled.
What is Training-Serving Skew?
Google defines training-serving skew as a difference between an ML model’s performance during training and its performance during serving. It can be caused by:
- A discrepancy between how data is handled in the training and serving pipelines. For example, the training and serving code paths differ, as when a model is trained in Python and served in Java.
- A change in the data between training and serving
- A feedback loop between the ML model and the algorithm
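The first cause can be illustrated with a deliberately small sketch. The two functions, the `income` field, and the imputation values below are hypothetical and exist only to show how two separately written code paths can produce different features from the same raw record.

```python
def training_preprocess(record):
    # Training path (hypothetical): a missing income is imputed with a
    # mean value that was computed offline.
    income = record.get("income")
    return {"income": 52000.0 if income is None else float(income)}


def serving_preprocess(record):
    # Serving path (hypothetical), re-implemented separately:
    # a missing income silently becomes 0.
    income = record.get("income")
    return {"income": 0.0 if income is None else float(income)}


# The same raw record yields different model inputs in the two paths.
record = {"income": None}
train_features = training_preprocess(record)
serve_features = serving_preprocess(record)
has_skew = train_features != serve_features
print(train_features, serve_features, has_skew)
```

One common way to avoid this kind of mismatch is to have both pipelines call the same preprocessing function rather than maintaining two implementations.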
Although data drift and training-serving skew seem similar, they have different root causes. Training-serving skew typically shows up as a mismatch as soon as the model is introduced into a production environment, whereas data drift develops as production data changes over time.
Example of training-serving skew
The Google Health team launched a computer vision model to detect signs of diabetic retinopathy from eye scan images. In training, the model identified these signs with more than 90% accuracy, which is “human specialist level”, and, in principle, gave a result in less than 10 minutes. However, the production model struggled to detect signs of the disease in images captured in poor lighting conditions.
Why is Detecting Training-Serving Skew Important?
Detecting training-serving skew is challenging but essential because skew affects model health:
- The model can behave erratically when predicting on data that was generated differently from the training data
- Training-serving skew can introduce logic discrepancies that require additional engineering effort to debug
- Skew can create timing discrepancies that cause production models to consume stale data
Detecting Training-Serving Skew
Distribution skew detection
Detecting distribution skew for categorical and numerical features starts with computing a baseline distribution of each feature’s values in the training data. Production feature inputs are then analyzed over fixed time intervals. For each interval, the statistical distribution of each feature is compared with its baseline distribution.
A statistical distance score is computed with a measure such as Jensen-Shannon divergence or L-infinity distance. A score above the defined threshold indicates possible skew.
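The comparison described above can be sketched in a few lines of standard-library Python. This is a minimal illustration for one categorical feature, not the implementation used by any particular tool. The function names, the `device` example data, and the threshold of 0.2 are assumptions made for the example.

```python
import math
from collections import Counter


def categorical_distribution(values, categories):
    """Return the relative frequency of each category, in a fixed order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total for c in categories]


def l_infinity_distance(p, q):
    """Largest absolute difference between matching probabilities."""
    return max(abs(a - b) for a, b in zip(p, q))


def jensen_shannon_divergence(p, q):
    """Jensen-Shannon divergence with base-2 logs, so the score is in [0, 1]."""
    def kl(a, b):
        # Terms where a_i == 0 contribute nothing to the KL sum.
        return sum(x * math.log2(x / y) for x, y in zip(a, b) if x > 0)

    # Mixture distribution; it is non-zero wherever p or q is non-zero.
    m = [(a + b) / 2 for a, b in zip(p, q)]
    return 0.5 * kl(p, m) + 0.5 * kl(q, m)


# Hypothetical data: a "device" feature in training vs. one serving window.
categories = ["mobile", "desktop", "tablet"]
training_values = ["mobile"] * 6 + ["desktop"] * 3 + ["tablet"] * 1
serving_window = ["mobile"] * 3 + ["desktop"] * 3 + ["tablet"] * 4

baseline = categorical_distribution(training_values, categories)
current = categorical_distribution(serving_window, categories)

SKEW_THRESHOLD = 0.2  # assumed value; real thresholds are tuned per feature

linf_score = l_infinity_distance(baseline, current)
js_score = jensen_shannon_divergence(baseline, current)
print(f"L-infinity: {linf_score:.3f}, skew suspected: {linf_score > SKEW_THRESHOLD}")
print(f"Jensen-Shannon: {js_score:.3f}")
```

For a numerical feature, the same functions apply once the values have been bucketed into a histogram with identical bin edges for the baseline and the serving window.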
Feature skew detection
Feature skew detection uses the following approach:
- A key join between corresponding batches of training and serving data
- A feature-by-feature comparison of the joined records
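The two steps above can be sketched as follows. This is a minimal standard-library illustration, assuming that each record is a dictionary and that both batches share a join key. The function names, the `request_id` key, and the sample records are hypothetical.

```python
import math


def values_match(a, b, tolerance):
    """Compare two feature values, allowing a small numeric tolerance."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, abs_tol=tolerance)
    return a == b


def feature_skew_report(training_rows, serving_rows, key, tolerance=1e-6):
    """Join two batches on `key` and report per-feature mismatch rates."""
    serving_by_key = {row[key]: row for row in serving_rows}
    mismatches = {}
    joined = 0
    for train_row in training_rows:
        serve_row = serving_by_key.get(train_row[key])
        if serve_row is None:
            continue  # inner join: skip keys missing from the serving batch
        joined += 1
        for feature, train_value in train_row.items():
            if feature == key:
                continue
            if not values_match(train_value, serve_row.get(feature), tolerance):
                mismatches[feature] = mismatches.get(feature, 0) + 1
    # Only features with at least one mismatch appear in the result.
    rates = {feature: count / joined for feature, count in mismatches.items()}
    return {"joined": joined, "mismatch_rate": rates}


# Hypothetical batches: the serving path lowercases the country code.
training_rows = [
    {"request_id": 1, "age": 34, "country": "US", "avg_spend": 12.5},
    {"request_id": 2, "age": 51, "country": "DE", "avg_spend": 40.0},
    {"request_id": 3, "age": 27, "country": "IN", "avg_spend": 8.25},
]
serving_rows = [
    {"request_id": 1, "age": 34, "country": "US", "avg_spend": 12.5},
    {"request_id": 2, "age": 51, "country": "de", "avg_spend": 40.0},
    {"request_id": 4, "age": 19, "country": "FR", "avg_spend": 3.0},
]

report = feature_skew_report(training_rows, serving_rows, key="request_id")
print(report)
```

Here two records join on `request_id`, and only `country` differs between them, which points at the serving-side handling of that feature as the place to start debugging.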
Addressing Skew With AI Observability
Training-serving skew has been a persistent challenge in the ML model lifecycle. However, modern observability tools address this challenge by continuously monitoring data for such discrepancies. One such tool is the Censius AI Observability Platform.
With the Censius AI Observability Platform, you can set up custom monitors and define the specific metrics to track. The platform continuously monitors those metrics and notifies the user as soon as any skew is detected. This capability helps ML engineers respond to skew more quickly.