A few days back, I was speaking to a friend who works in the fintech industry.
He said, "AI is challenging. The models need continuous monitoring."
I asked, “Why, what happened?”
“I insisted on using an AI model to assign credit risk scores. The model worked well for a year. But we received many complaints about credit decisions, and then we realized that our model denied credit to several deserving customers. We had to quickly stop the model and retrain it.”
My friend’s story intrigued me, and I started digging through the internet to see whether there were similar stories. Surprisingly, I found many examples where ML model degradation hurt businesses.
- Instacart's model for predicting item availability at stores dropped from 93% to 61% accuracy due to the sudden, Covid-induced shift in shopping habits.
- In March 2020, trading algorithms failed to keep up with market volatility; Bridgewater Associates' funds fell 21% amid the Covid-driven turbulence.
- In a poll conducted by KDnuggets, 24% of respondents answered that model performance was not considered strong enough by decision-makers.
Why Does ML Monitoring Come into the Picture?
You work hard to deploy your models. Once your models get operationalized, you celebrate. It is undoubtedly an achievement, as many ML projects fail to reach this stage.
But is this success sustainable in production? Do your models deliver the same performance in production as in testing?
ML model development is challenging because of its experimental nature. Enterprises have to sort and prioritize many things with limited resources: data preparation, feature engineering, model training, testing, and deployment. In this situation, one of the most critical post-deployment components of the ML lifecycle often goes unnoticed - model monitoring.
Machine learning model monitoring refers to tracking ML models' performance in production to detect potential issues.
- Monitoring helps address the issue of poor generalization that arises due to training models on smaller subsets of data
- It helps optimize production models based on the variables and parameters provided to them
- It helps analyze how the deployed model performs on real-time data over a long period
- ML monitoring involves tracking stability metrics such as the Population Stability Index (PSI) and Characteristic Stability Index (CSI) (see the sketch after this list).
- It aids in a detailed examination of the complete ML workflow and the implementation of tweaks like model retraining or replacement
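To make the stability-metric idea concrete, here is a minimal sketch of a PSI calculation, assuming numpy is available; the bin count and the commonly cited 0.1/0.2 interpretation thresholds are general conventions, not values prescribed by any particular tool.

```python
import numpy as np

def population_stability_index(expected, actual, bins=10):
    """PSI = sum((actual% - expected%) * ln(actual% / expected%)) over bins,
    where 'expected' is the reference (training) sample and 'actual' is the
    current production sample of a feature or score."""
    edges = np.histogram_bin_edges(expected, bins=bins)
    expected_pct = np.histogram(expected, bins=edges)[0] / len(expected)
    actual_pct = np.histogram(actual, bins=edges)[0] / len(actual)

    # Floor the percentages to avoid division by zero and log(0).
    expected_pct = np.clip(expected_pct, 1e-6, None)
    actual_pct = np.clip(actual_pct, 1e-6, None)
    return float(np.sum((actual_pct - expected_pct) * np.log(actual_pct / expected_pct)))

# Hypothetical usage: scores at training time vs. scores seen this week in production.
train_scores = np.random.beta(2, 5, size=10_000)
prod_scores = np.random.beta(2.5, 4, size=10_000)
print(f"PSI = {population_stability_index(train_scores, prod_scores):.3f}")
# Rule of thumb: < 0.1 stable, 0.1-0.2 moderate shift, > 0.2 significant shift.
```

CSI follows the same formula, applied to individual input features rather than the model's output score.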
AI Observability – A New Game Changer
Model monitoring helps you track which issues arise so you can decide on the next actions. But what if you could also get insights into how the model performs the way it does and why it makes certain predictions? You get this added advantage with AI observability.
While monitoring focuses on the what and when of model issues, AI observability broadens your perspective to the how and why.
Let’s define AI observability formally:
AI observability is a holistic approach to deriving insights into an ML model’s behavior, data, and performance across its lifecycle.
AI observability brings accountability and explainability into the picture. It enables root cause analysis of model behavior to help you detect what went wrong, the severity of the issue, and its implications, so that you can come up with the best strategy to overcome it.
AI observability ensures a better understanding of your model decisions. Such clarity on model decisions helps you control the intelligence you are building. It also fosters the user’s trust in the ML system as they are well-informed of why a particular prediction was made.
What Could Go Wrong with Your Models?
Deploying ML models into production is not the end goal of solving business problems because models degrade over time. Once your model is deployed in the production environment, it can behave differently. Initially, it might be perfectly accurate. Later it starts degrading, but you probably won’t be aware of it. In the worst case, it could be disastrous, similar to my friend's story.
Numerous things can go wrong with your deployed models.
Model drift
Model drift is the decay of a model’s predictive accuracy due to changes in the digital environment and the resulting changes in data and concepts. It is usually classified into two types.
- Concept drift: It occurs when the properties of the dependent variable, the target being predicted, change over time. E.g., a financial portfolio classified as high-risk last year may no longer be high-risk today, and content users treated as spam earlier may not be spam under new consumer preferences.
- Data drift: It occurs due to changes in the statistical properties of the independent variables, such as shifts in feature distributions, changes in the data collection system, or changing noise in the data. E.g., semantic changes in spam content, such as a new synonym replacing an old term, require the spam email classifier to be updated.
How Does AI Observability Solve Model Drift?
AI observability helps you detect early signals of changes in the data and in real-world concepts. These clues help you plan subsequent actions such as model updates. A drift warning is not always a reason to panic; sometimes it points to a hidden data quality issue or is a false positive, and the model remains performant in production despite the detected drift. AI observability not only raises drift warnings but also helps you analyze their cause and find the exact segments affected. Backed by these insights, you can decide whether to retrain or rebuild your model.
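As an illustration of the detection step, here is a minimal per-feature drift check, assuming pandas and scipy are available; the significance level and the choice of a two-sample Kolmogorov–Smirnov test are illustrative, not the only way to detect drift.

```python
import pandas as pd
from scipy.stats import ks_2samp

def detect_feature_drift(reference: pd.DataFrame, current: pd.DataFrame, alpha: float = 0.01):
    """Flag numeric features whose production distribution differs from the
    reference (training) distribution, using a two-sample KS test."""
    drifted = {}
    for col in reference.select_dtypes("number").columns.intersection(current.columns):
        stat, p_value = ks_2samp(reference[col].dropna(), current[col].dropna())
        if p_value < alpha:
            drifted[col] = {"ks_stat": round(float(stat), 3), "p_value": float(p_value)}
    return drifted

# Hypothetical usage: compare the training window against this week's requests.
# drifted = detect_feature_drift(train_df, serving_df)
# for feature, info in drifted.items():
#     print(f"Possible drift in '{feature}': {info}")
```

A flag from a check like this is a prompt to slice the affected segments and look for a data quality cause before deciding whether retraining is needed.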
Data Quality Issues
There could be various issues related to data quality.
Problems with the preprocessing pipeline: Streaming data pipelines can include multiple data sources, and a change in one or more of these sources can break the pipeline.
Changes in the source data schema: Sometimes, valid changes made to the data at the source can cause data quality issues, for example, renaming an existing feature column or adding a new column to capture new data. These schema changes affect the model unless it is updated to map the relationship between the new column and the existing ones.
Data loss at the source: Data loss or corruption is one of the biggest threats to the data integrity of ML systems. Sometimes the ML application is unaware of these losses and accepts data from a corrupted source.
How Does AI Observability Solve Data Quality Issues?
AI observability helps you investigate when feature data is missing, deviates from an expected range, or exceeds defined thresholds. These investigations let you track streaming production data for unwanted changes and decide on fixes such as updating the preprocessing pipeline or retraining the model. With timely alerts on data quality violations, you can act immediately or later, based on severity.
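For illustration, a minimal sketch of such threshold-based checks on a batch of incoming data is shown below, assuming pandas; the column names, valid ranges, and the 5% missing-value threshold are hypothetical.

```python
import pandas as pd

EXPECTED_COLUMNS = {"age", "income", "credit_utilization"}           # hypothetical schema
VALID_RANGES = {"age": (18, 100), "credit_utilization": (0.0, 1.0)}  # assumed numeric columns
MAX_MISSING_RATE = 0.05                                              # alert above 5% missing

def check_batch(batch: pd.DataFrame) -> list:
    """Return a list of human-readable data quality violations for one batch."""
    violations = []
    # Schema check: did a source rename or drop a column?
    missing_cols = EXPECTED_COLUMNS - set(batch.columns)
    if missing_cols:
        violations.append(f"missing columns: {sorted(missing_cols)}")

    for col in EXPECTED_COLUMNS & set(batch.columns):
        # Missing-value check
        missing_rate = batch[col].isna().mean()
        if missing_rate > MAX_MISSING_RATE:
            violations.append(f"{col}: {missing_rate:.1%} values missing")
        # Range check
        if col in VALID_RANGES:
            lo, hi = VALID_RANGES[col]
            out_of_range = ((batch[col] < lo) | (batch[col] > hi)).mean()
            if out_of_range > 0:
                violations.append(f"{col}: {out_of_range:.1%} values outside [{lo}, {hi}]")
    return violations
```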
Outliers
An outlier is an instance in a dataset that lies far from the rest of the data points. In simple terms, an outlier is a data point vastly larger or smaller than the rest of the values in the set. For example, a sudden spike in credit or debit card usage might suggest that the card is stolen and the attempted transactions are fraudulent.
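As a small illustration, here is an IQR-based outlier flag in numpy; the 1.5 × IQR cutoff is the usual textbook convention rather than something specific to any monitoring product.

```python
import numpy as np

def flag_outliers(values: np.ndarray, k: float = 1.5) -> np.ndarray:
    """Return a boolean mask marking points outside [Q1 - k*IQR, Q3 + k*IQR]."""
    q1, q3 = np.percentile(values, [25, 75])
    iqr = q3 - q1
    return (values < q1 - k * iqr) | (values > q3 + k * iqr)

# Hypothetical daily card spend with one suspicious spike.
daily_spend = np.array([42.0, 55.5, 38.2, 61.0, 47.9, 2500.0, 52.3])
print(daily_spend[flag_outliers(daily_spend)])  # -> [2500.]
```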
Training-serving skew
Training-serving skew occurs when the ML model’s performance during training significantly deviates from the performance during serving.
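One common source of such skew is the offline (training) feature pipeline and the online (serving) feature pipeline silently computing different values for the same record. A minimal check under that assumption, sketched below with pandas, is to run both pipelines on the same raw records and compare the outputs; the tolerance and column handling are illustrative, and numeric features are assumed.

```python
import numpy as np
import pandas as pd

def pipeline_skew_report(offline: pd.DataFrame, online: pd.DataFrame, tol: float = 1e-6):
    """Given features computed by the training-time and serving-time pipelines
    for the SAME raw records (same row order), report columns that disagree."""
    mismatches = {}
    for col in offline.columns.intersection(online.columns):
        diff_rate = (~np.isclose(offline[col], online[col], atol=tol)).mean()
        if diff_rate > 0:
            mismatches[col] = f"{diff_rate:.1%} of rows differ"
    return mismatches

# Hypothetical usage:
# report = pipeline_skew_report(offline_features(raw_batch), online_features(raw_batch))
# print(report)   # e.g. {"income_log": "3.2% of rows differ"}
```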
Adversarial attacks
ML models are increasingly exposed to adversarial attacks. Because ML models serve critical industries like finance, banking, and insurance, fraudsters target these applications to manipulate the models.
In such attacks, attackers feed your ML system carefully crafted inputs so that it produces the outputs they want.
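To show what this looks like in miniature, here is a toy evasion example against a linear classifier trained on synthetic data, assuming scikit-learn is available; real attacks are far more sophisticated, but the principle is the same: a small, targeted change to the input flips the model's decision.

```python
import numpy as np
from sklearn.datasets import make_classification
from sklearn.linear_model import LogisticRegression

X, y = make_classification(n_samples=1_000, n_features=10, random_state=0)
clf = LogisticRegression(max_iter=1_000).fit(X, y)

x = X[0].copy()
margin = clf.decision_function([x])[0]        # signed score: how far from the boundary
print("original prediction:", clf.predict([x])[0])

# For a linear model, moving against the sign of each coefficient is the most
# efficient way to push the input across the decision boundary (an FGSM-style step).
step = 1.1 * abs(margin) / np.sum(np.abs(clf.coef_[0]))   # just enough to cross
x_adv = x + step * (-np.sign(margin)) * np.sign(clf.coef_[0])
print("perturbed prediction:", clf.predict([x_adv])[0])   # the decision flips
```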
Other concerns that need your attention
Machine learning model monitoring and software monitoring are not the same. AI model monitoring becomes a challenging job as data plays a pivotal role here. It is not just the code snippets you are worried about, but data quality, integrity, and seamless pipelines - all must be taken care of.
Secondly, silent failures are riskier when they go unnoticed. You don’t have direct indicators like a “page not found” error or an HTTP 404; even when problematic input data is fed to the model, it still generates predictions. These silent failures only wake you up once the damage is done.
Finally, how do we distinguish between good and bad model performance? One outlier does not mean the model is in trouble, and a seemingly stable prediction accuracy can itself be misleading.
How Does AI Observability Maintain Model Performance?
AI observability helps you track performance metrics like accuracy, F1 score, recall, precision, sensitivity, specificity, RMSE, and MAE based on the model type. It enables you to analyze model performance across different cohorts and slices of predictions. You can get insights on issues to decide subsequent actions based on related monitoring violations.
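A minimal sketch of per-cohort performance tracking is shown below, assuming pandas and scikit-learn; the column names ("age_group", "y_true", "y_pred") are hypothetical placeholders for whatever your prediction log contains.

```python
import pandas as pd
from sklearn.metrics import accuracy_score, f1_score

def cohort_performance(df: pd.DataFrame, cohort_col: str) -> pd.DataFrame:
    """Compute classification metrics separately for each cohort of predictions."""
    rows = []
    for cohort, group in df.groupby(cohort_col):
        rows.append({
            cohort_col: cohort,
            "n": len(group),
            "accuracy": accuracy_score(group["y_true"], group["y_pred"]),
            "f1": f1_score(group["y_true"], group["y_pred"], zero_division=0),
        })
    return pd.DataFrame(rows).sort_values("accuracy")

# Hypothetical usage: the weakest cohorts surface at the top of the report.
# print(cohort_performance(predictions_df, cohort_col="age_group"))
```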
Choosing between ML Monitoring and AI Observability
Choosing between ML monitoring and AI observability becomes easier if you walk through an example.
Consider a hypothetical model deployed to predict customer churn. The model identifies users likely to churn so that marketing strategies can be tweaked to retain them. The model predicted churn well overall, but it delivered odd decisions for a specific product cost range.
ML monitoring helps detect the model’s failure by comparing its churn predictions against ground truth, but it won’t help you investigate further.
AI observability, on the other hand, will not just detect the model’s failure but also help you understand the why behind it. It lets you slice the data and analyze the exact product cost segment affected. You can trace specific decisions back to their root cause and learn why those predictions were made.
With segmented analysis, you might discover interesting insights such as:
- The model might be performing well overall but not giving correct predictions for a specific age group
- In A/B testing, one model version might show better results for some segments, while another version performs better on the others
Here’s a table that summarizes the differences between ML model monitoring and AI Observability.
Getting AI Observability Right
“You can’t improve something that cannot be measured”
While ML model monitoring solves the challenge of detection, AI observability ropes in detection and analysis.
So how do you go ahead with AI Observability?
Though there are multiple tools and solutions out there, we would like to talk about our product - the Censius AI Observability Platform.
The core benefit of the Censius AI Observability Platform is its user experience, which makes it an easy tool for every stakeholder to use. The platform was designed to make AI observability inclusive and to address the needs of everybody involved - setting up monitors, sharing performance through beautiful dashboards, customizing alerts, and much more.
Apart from the user experience, we have built the platform with continuous feedback from the ML community. We wanted to ensure the platform could address the unique needs of businesses, which led us to include features such as:
- Custom metrics to measure model performance
- Cohort monitoring
- Setting up custom model monitors
- Automatic monitors
- Customizing alert severity
You can get a customized demo to see how Censius brings model monitoring and explainability to your unique use case. We also offer a 14-day, no-strings-attached free trial to every sign-up so that you can experience the product before committing to it.
Explore how Censius helps you monitor, analyze and explain your ML models
Explore Platform