"If you can't explain it simply, you don't understand it well enough." - Albert Einstein.
We are all familiar with this philosophy. But how does it relate to ML model decisions? Can we always trust a model that excels in performance?
With ML models making everything from routine to life-altering decisions, understanding how your model thinks and arrives at a conclusion becomes a prerequisite. An ML model might reject your loan application for a specific reason, and you would expect the model, or the bank behind it, to explain the denial. Perhaps a low income range was the exact reason your application was rejected.
So to understand a model's thought process, you need AI interpretability and explainability. In one of our blogs, we compared the two. Here, we will dive deep into AI interpretability concepts and tools.
What is AI Interpretability?
AI interpretability helps users or experts understand the rationale behind the ML model’s predictions. Two popular definitions of AI interpretability in the literature include:
“Interpretability is the degree to which a human can understand the cause of a decision.” — Tim Miller.
“Interpretability is the degree to which a human can consistently predict the model’s result.” — Been Kim.
Why do we need to Interpret ML models?
Model interpretation helps you understand the model's decision-making process and why it arrived at a particular conclusion.
For example, a model used to process loan applications might produce erroneous outcomes for younger individuals with a low monthly average balance at a bank.
Or let's take a more familiar example: a cat classifier!
A model trained on 4,000 images (including cat images) and tested on 1,000 images attained 85% accuracy, with 150 misclassifications. Here are some misclassified images:
From these misclassified examples, a few patterns are apparent:
- White cats are not being identified correctly
- Some dogs are recognized as cats
- Dark background images showed misclassification
Model interpretation helps identify the cause-and-effect relationship within the ML system’s input and output. Model interpretability is essential for the following reasons:
- Model interpretability is crucial for debugging an algorithm
- The interpretability of ML models helps guard against embedding bias by identifying and rectifying its source
- It facilitates transferring learnings into a broader knowledge base
- With interpretation, we can ensure higher reliability of ML systems and more trust in model outcomes
- Machine learning interpretability helps compute the effects of trade-offs and anticipate the future performance of models
- Interpretability helps ML systems comply with industry standards, company policies, and government regulations
- Sometimes your model predicts correctly but arrives at that prediction for the wrong reasons; interpretation helps you catch and fix such errors
Methods of Model Interpretation
With an understanding of model interpretability concepts and the need for them, we can move on to the different ML model interpretation methods.
In this first section, we will cover the different phases of the ML lifecycle where model interpretation helps you.
In the pre-modeling phase
How do you get the best understanding of data? Mainly through exploratory data analysis and visualizations. This stage primarily refers to the interpretability methods applicable before selecting and building the model. Also, understanding the statistical properties of attributes and their impact on the output helps here.
During the modeling phase
This in-model interpretability refers to intrinsically interpretable models. We can distinguish between white-box and black-box models: white-box models are simple and explainable, while black-box models are opaque and complex to understand. Making such complex models more transparent and explainable is an active research area for tech teams.
Post-hoc or Post modeling phase
These techniques are aimed at interpreting models after their development. The whole purpose of post hoc interpretation is to understand the dynamics of input features and output predictions.
Model-specific Vs. Model-agnostic interpretation
Model-specific interpretation techniques apply only to certain classes of models and rely on the model's inner mechanics, such as the weights and biases of a neural network, to explain its predictions.
Model-agnostic methods, on the other hand, apply to any model. They are primarily post-hoc techniques used after training, and they typically analyze the relationships between input feature values and output predictions.
A distinction can also be drawn between the local & global interpretation techniques, wherein local interpretation drives explanations for a single specific prediction, and global interpretability techniques explain the overall model.
Model Interpretation Tools
Let’s start exploring different model interpretation tools in this section. The first one works by the principle of trust - LIME.
With ML becoming necessary everywhere, users have become more conscious about trusting ML predictions. Therefore, an ML developer's goal is to build a trustworthy ML system that can add consistent and positive business value. This is where LIME comes into the picture.
LIME stands for Local Interpretable Model-agnostic Explanations. It is based on the paper “‘Why Should I Trust You?’: Explaining the Predictions of Any Classifier” by Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin.
LIME is a technique that approximates any black-box model prediction with a local, interpretable model to explain each prediction. It helps interpret prediction scores generated by any classifier. It has a library for both Python and R.
LIME provides a generic framework to interpret black-box models. It requires minimal effort to find out which features affect a model's decisions. The creators of LIME highlight four essential criteria for explanations:
- Interpretable: The explanation must be easy to understand depending on the target demographic
- Local fidelity: The explanation should faithfully reflect how the model behaves in the vicinity of the individual prediction being explained
- Model-agnostic: The method applies to any model
- Global perspective: While explaining individual predictions, the model should be interpretable in its entirety.
LIME supports tabular, text, multiclass, and image interpretation. Its installation is simple. Let's find out how LIME provides explanations for a linear regression model trained on the cereals rating dataset. Some critical attributes include cereal name, manufacturer, type, calories, protein, fats, and more. You can check the complete dataset here.
The features include nutritional components such as carbohydrates, protein, fiber, sodium, and sugars. The output is the rating given by consumers on a scale of 0 to 100.
You can check LIME explanations specific to low-rating instances here:
For an average rating instance, LIME provided the following explanations.
Here, you can see which features had a positive or a negative influence on the specific ratings in the dataset.
The next tool, SHAP (SHapley Additive exPlanations), is a game-theoretic approach to explaining the output of any ML model. SHAP explains a specific prediction by computing each feature's contribution to it. It uses the classic Shapley values from game theory, and their extensions, to ensure optimal credit allocation with local explanations.
Let's simplify things with an example. Consider a credit score classification dataset that includes features like monthly debt, the amount invested, current loan amount, etc. The model predicts if a loan can be granted or not.
The scatter plot shows the distribution of values, ranging from 0 to 1, according to each feature's contribution to the model's prediction.
To determine the importance of a single feature, the outcome of every possible combination (coalition) of features is considered. Each value reflects how much that particular feature contributes to the loan-recommendation outcome.
Now, look at the bar chart. We can infer that count_of_emi_bounce is the feature that contributes the most to the outcome, while term contributes the least.
The next tool is ELI5, an acronym for ‘Explain like I am a 5-year-old’. It is a Python library that helps interpret most machine learning models. Let’s have a look at two popular ways of interpreting classification or regression models:
Global interpretation: Inspect model parameters and try to figure out how the model works globally. Global interpretation is a way of analyzing important features in predictions.
Local interpretation: Inspect a model's individual prediction to analyze why the model makes a particular decision.
Now let’s find out how ELI5 helped analyze predictions on the Titanic dataset. For global interpretation, we use:
And ELI5 shows:
For interpreting an individual prediction or local interpretation, ELI5 has:
Here, Valid_xs indicates the record at index 1.
And ELI5 shows the interpretation for only the first record:
ELI5 has built-in support for many machine learning frameworks, including scikit-learn, Keras, XGBoost, LightGBM, and more. And its TextExplainer module helps to interpret text classification models.
In this blog post, we covered AI interpretability concepts in detail. We also gave an overview of interpretability tools, such as LIME, SHAP, and ELI5, that make model interpretation easier.
And if you are looking for AI explainability tools, Censius is your one-stop solution to all explainability needs.
With global, cohort, and local explainability offered by Censius, you can make highly informed decisions backed by ML models. Check out Censius's explainability features in detail.
Reach out to us in your journey of building trustworthy, responsible, and explainable AI systems. We would be more than happy to help you.
Explore how Censius helps you monitor, analyze, and explain your ML models: Explore Platform