Google defines training-serving skew as a difference between an ML model's performance during training and its performance during serving. It can arise for the following reasons:

A discrepancy between how data is handled in the training and serving pipelines. For example, the training and serving code paths differ, as when a model is trained in Python and served in Java.
A change in the data between training and serving.
A feedback loop between the ML model and the algorithm.

Although data drift and training-serving skew seem similar, they have different root causes. Training-serving skew typically shows up as a mismatch when the model is introduced into a production environment.

Example of training-serving skew

A Google Health team launched a computer vision model to detect signs of diabetic retinopathy in eye scan images. In training, the model identified signs of the disease with more than 90% accuracy, described as “human specialist level”, and in principle returned a result in less than 10 minutes. In production, however, the model struggled to detect signs of the disease in images captured under poor lighting conditions.

Why is Detecting Training-Serving Skew Important?

Detecting training-serving skew is challenging but essential because skew affects model health:

The model can behave erratically when it predicts on data generated differently from the training data.
Training-serving skew can introduce logic discrepancies that require additional engineering effort to debug.
Skew can create timing discrepancies that cause production models to consume stale data.

Detecting Training-Serving Skew

Distribution skew detection

Detecting distribution skew for categorical and numerical features starts with computing the baseline distribution of each feature's values in the training data. The production feature inputs are then analyzed over specific time intervals: for each interval, the statistical distribution of each feature is compared with its “baseline” distribution.
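The baseline-and-window procedure above can be sketched in a few lines of Python. This is a minimal illustration, not a production implementation: the function names, the `age` feature, the bin edges, and the daily window are all assumptions made for the example.

```python
from bisect import bisect_right
from collections import Counter, defaultdict


def categorical_distribution(values, categories):
    """Relative frequency of each category, in a fixed category order."""
    counts = Counter(values)
    total = len(values)
    return [counts[c] / total if total else 0.0 for c in categories]


def numeric_distribution(values, bin_edges):
    """Normalized histogram over fixed bin edges.

    Values outside the edges are clipped into the first or last bin.
    """
    counts = [0] * (len(bin_edges) - 1)
    for v in values:
        idx = bisect_right(bin_edges, v) - 1
        idx = min(max(idx, 0), len(counts) - 1)
        counts[idx] += 1
    total = len(values)
    return [c / total if total else 0.0 for c in counts]


def windowed_distributions(records, window_of, feature, bin_edges):
    """Group serving records by time window; one distribution per window."""
    by_window = defaultdict(list)
    for rec in records:
        by_window[window_of(rec)].append(rec[feature])
    return {w: numeric_distribution(vals, bin_edges)
            for w, vals in sorted(by_window.items())}


# Hypothetical data: an "age" feature in training and two days of serving traffic.
bin_edges = [0, 20, 40, 60, 80]
training_ages = [25, 31, 38, 45, 52, 33, 29, 61]
baseline = numeric_distribution(training_ages, bin_edges)

serving = [
    {"day": "2024-01-01", "age": 30},
    {"day": "2024-01-01", "age": 47},
    {"day": "2024-01-02", "age": 72},
    {"day": "2024-01-02", "age": 66},
]
per_day = windowed_distributions(serving, lambda r: r["day"], "age", bin_edges)
```

Using the same bin edges (or category order) for the baseline and for every serving window keeps the resulting distributions directly comparable, position by position.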

A statistical distance measure, such as Jensen-Shannon divergence or L-infinity distance, is used to score the difference between the two distributions. When the score exceeds a defined threshold, it indicates possible skew.
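As a rough sketch of how these two scores can be computed, the snippet below implements both from their standard definitions using only the Python standard library. The example distributions and the threshold of 0.3 are hypothetical values chosen for illustration; a suitable threshold depends on the feature and the use case.

```python
import math


def l_infinity_distance(p, q):
    """Largest absolute difference between corresponding probabilities."""
    return max(abs(a - b) for a, b in zip(p, q))


def kl_divergence(p, q):
    """Kullback-Leibler divergence in bits; terms with p_i == 0 contribute 0."""
    return sum(a * math.log2(a / b) for a, b in zip(p, q) if a > 0)


def jensen_shannon_divergence(p, q):
    """Symmetric divergence, bounded in [0, 1] when using log base 2."""
    m = [(a + b) / 2 for a, b in zip(p, q)]  # mixture of the two distributions
    return 0.5 * kl_divergence(p, m) + 0.5 * kl_divergence(q, m)


def exceeds_threshold(score, threshold):
    """A score above the threshold is treated as a possible skew."""
    return score > threshold


# Hypothetical baseline (training) and serving distributions over three bins.
baseline = [0.5, 0.5, 0.0]
serving = [0.0, 0.5, 0.5]
THRESHOLD = 0.3  # assumed value, for illustration only

js_score = jensen_shannon_divergence(baseline, serving)
linf_score = l_infinity_distance(baseline, serving)
print(js_score, exceeds_threshold(js_score, THRESHOLD))
print(linf_score, exceeds_threshold(linf_score, THRESHOLD))
```

Both functions expect the two distributions to be aligned, meaning the same bins or categories in the same order, and each to sum to 1.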

Feature skew detection

Feature skew detection takes the following approach:

A key join between the corresponding batches of training and serving data
A feature-wise comparison of the joined examples
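The two steps above can be sketched as follows. This is an illustrative example only: the `id` join key, the feature names, the sample rows, and the numeric tolerance are assumptions, and real pipelines would usually run the join on much larger batches.

```python
import math


def values_match(a, b, tolerance):
    """Numeric values match within a tolerance; everything else must be equal."""
    if isinstance(a, (int, float)) and isinstance(b, (int, float)):
        return math.isclose(a, b, abs_tol=tolerance)
    return a == b


def feature_skew(training_rows, serving_rows, key, features, tolerance=1e-6):
    """Join rows on `key`, then count per-feature mismatches among joined pairs."""
    serving_by_key = {row[key]: row for row in serving_rows}
    mismatches = {f: 0 for f in features}
    joined = 0
    for train_row in training_rows:
        serve_row = serving_by_key.get(train_row[key])
        if serve_row is None:
            continue  # no serving example with this key, so nothing to compare
        joined += 1
        for f in features:
            if not values_match(train_row.get(f), serve_row.get(f), tolerance):
                mismatches[f] += 1
    return joined, mismatches


# Hypothetical batches: the same examples as seen by training and by serving.
training = [
    {"id": 1, "age": 34, "country": "US"},
    {"id": 2, "age": 51, "country": "DE"},
    {"id": 3, "age": 28, "country": "FR"},
]
serving = [
    {"id": 1, "age": 34, "country": "us"},   # casing differs in the serving path
    {"id": 2, "age": 51.0, "country": "DE"},
    {"id": 4, "age": 40, "country": "IN"},   # never seen in training
]

joined, mismatches = feature_skew(training, serving, "id", ["age", "country"])
print(joined, mismatches)
```

Here the `country` mismatch for example 1 is the kind of discrepancy a divergent serving code path might produce, while examples without a partner on the other side are simply left out of the comparison.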


Addressing Skew With AI Observability

Training-serving skew has been a persistent challenge in the ML model lifecycle. Modern observability tools address it by continuously monitoring data for such discrepancies. One such tool is the Censius AI Observability Platform.

*The Censius AI Observability Platform detecting training-serving skew against a defined threshold*

With the Censius AI Observability Platform, you can set up custom monitors and define the specific metrics to monitor. The platform continuously tracks those metrics and notifies the user as soon as skew is detected, which helps ML engineers respond to skew more quickly.

