It is straightforward to fit a range of ML models on a given predictive modeling dataset using a variety of tools and libraries. The real problem is how to select among the models that you might employ to solve your problem.
It is tempting to believe that a model's performance alone is sufficient, but you must also consider other aspects, such as the time it takes to train the model or how easily it can be explained to project stakeholders.
We cannot predict which model will solve the problem the best. As a result, we fit and assess various models for the issue. This article will discuss model selection, its different approaches, and how you can evaluate ML models.
What is Model Selection?
“The process of selecting the machine learning model most appropriate for a given issue is known as model selection.”
Model selection can be used to compare models of the same type that have been configured with different hyperparameters, as well as models of different types.
Why Model Selection?
Model selection is a procedure used by statisticians to examine the relative merits of different predictive methods and identify which one best fits the observed data. Evaluating a model on the same data used for training is not accepted practice in data science because it readily produces overoptimistic scores and overfitted models.
You may have to check things like
- Overfitting and underfitting
- Generalization error
- Validation for model selection
For certain algorithms, the best way to reveal the problem's structure to the learning algorithm is through specific data preparation. The next logical step is to define model selection as the process of choosing amongst model development workflows.
So, depending on your use case, you choose an ML model.
How to Choose the Best Model in Machine Learning
The choice of model is influenced by many variables, including dataset, task, model type, etc.
Generally, you need to consider two factors:
- Reason for choosing a model
- The model's performance
So let's explore the reason behind selecting a model. You can choose models based on your data and task:
Type of data
- Images and videos
If your application mainly focuses on images and videos, for example image recognition, a Convolutional Neural Network (CNN) generally works better than other models.
- Text data or speech data
Similarly, recurrent neural networks (RNN) are employed if your problem includes speech or text data.
- Numerical data
You may use Support Vector Machine (SVM), logistic regression, and decision trees if your data is numerical.
How to select a model based on the task?
- Classification tasks - SVM, logistic regression, and decision trees.
- Regression tasks - Linear regression, Random Forest, Polynomial regression, etc.
- Clustering tasks - K-means clustering, hierarchical clustering.
Therefore, depending on the type of data you have and the task you do, you may use a variety of models.
Model Selection Techniques
As the name implies, resampling methods are straightforward methods of rearranging data samples to see how well the model performs on samples of data it hasn't been trained on. Resampling, in other words, enables us to estimate how well the model generalizes.
There are two main types of resampling techniques: cross-validation and bootstrap.
Cross-validation is a resampling procedure that evaluates models by splitting the data. Consider a situation where you have two models and want to determine which one is the most appropriate for a certain issue. In this case, we can use a cross-validation process.
So, let's say you are working on an SVM model and have a dataset. You divide the dataset into a few groups, for example five, and iterate over them. In each iteration, one group out of the five is used as test data and the remaining four are used as training data. The model is trained on the training data and then evaluated on the test data, and this repeats until every group has served as test data once.
Let's say you calculated the accuracy of each iteration; the figure below illustrates the iteration and accuracy of that iteration.
Now, let's calculate the mean accuracy of all the iterations, which comes to around 84.4%. You now use the same procedure once again for the logistic regression model.
You can now compare the mean accuracy of the logistic regression model with the SVM. So, according to accuracy, you might claim that a certain model is better for a given use case.
To implement cross-validation you can use sklearn.model_selection.cross_val_score, like this:
>>> from sklearn import datasets, linear_model
>>> from sklearn.model_selection import cross_val_score
>>> diabetes = datasets.load_diabetes()
>>> X = diabetes.data[:150]
>>> y = diabetes.target[:150]
>>> lasso = linear_model.Lasso()
>>> print(cross_val_score(lasso, X, y, cv=3))
[0.3315057 0.08022103 0.03531816]
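The SVM versus logistic regression comparison described above can be sketched in the same way. This is a minimal illustration, not a recommended setup: the breast cancer dataset, the scaling step, the default SVC settings, and `max_iter=1000` are all assumptions chosen for the example, and the accuracies it prints are unrelated to the 84.4% figure mentioned earlier.

```python
from sklearn.datasets import load_breast_cancer
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

# Example dataset bundled with scikit-learn (an assumption for this sketch).
X, y = load_breast_cancer(return_X_y=True)

# Two candidate models; scaling is included because both are scale-sensitive.
candidates = {
    "svm": make_pipeline(StandardScaler(), SVC()),
    "logistic_regression": make_pipeline(
        StandardScaler(), LogisticRegression(max_iter=1000)
    ),
}

mean_accuracy = {}
for name, model in candidates.items():
    # cv=5: five groups, each used once as test data.
    scores = cross_val_score(model, X, y, cv=5, scoring="accuracy")
    mean_accuracy[name] = scores.mean()
    print(f"{name}: fold accuracies={scores.round(3)}, mean={scores.mean():.3f}")

# The model with the higher mean accuracy is the one accuracy alone would favor.
best = max(mean_accuracy, key=mean_accuracy.get)
print("higher mean accuracy:", best)
```

A small difference in mean accuracy may not be meaningful, so it can help to look at the spread of the per-fold scores as well.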
Another resampling technique is the bootstrap, which draws random samples from the data with replacement. It is used to estimate statistics of a population from a single dataset.
- Used with smaller datasets
- The number of samples must be chosen.
- Size of all samples and test data should be the same.
- The sample with the most scores is therefore taken into account.
In simple terms, you:
- Randomly select an observation.
- Note its value.
- Put the observation back.
You repeat these steps N times, where N is the number of observations in the initial dataset. The final result is one bootstrap sample with N observations.
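The steps above can be sketched in a few lines of plain Python. The helper name `bootstrap_sample`, the observation values, and the choice of 1000 repetitions are all made up for illustration.

```python
import random
import statistics


def bootstrap_sample(data, rng):
    """Draw len(data) observations from data, with replacement."""
    n = len(data)
    # Each draw picks a random index; the observation stays available
    # for later draws, which is what "putting it back" means.
    return [data[rng.randrange(n)] for _ in range(n)]


rng = random.Random(42)  # fixed seed so the sketch is repeatable
observations = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]  # made-up values

one_sample = bootstrap_sample(observations, rng)

# Repeating the procedure gives a distribution for a statistic, here the mean.
bootstrap_means = [
    statistics.mean(bootstrap_sample(observations, rng)) for _ in range(1000)
]

print("one bootstrap sample:", one_sample)
print("mean of bootstrap means:", round(statistics.mean(bootstrap_means), 3))
print("std dev of bootstrap means:", round(statistics.stdev(bootstrap_means), 3))
```

Because sampling is done with replacement, some observations usually appear more than once in a sample and others not at all.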
An information criterion (IC) is a kind of probabilistic measure that can be used to evaluate the quality of statistical models. It assigns each candidate model a score based on its log-likelihood under Maximum Likelihood Estimation (MLE), and the scores are used to select the most effective candidate.
Resampling only focuses on model performance, whereas probabilistic measures take into account both model performance and complexity.
- IC is a statistical metric that yields a score. The model with the lowest score is the most effective.
- Performance is calculated using in-sample data; therefore, a test set is unnecessary. Instead, the score is calculated using all the training data.
- Less complexity means a simpler model with fewer parameters that is easy to learn and maintain but may be unable to capture variations in the data that affect a model's performance.
There are four statistical methods for calculating the degree of complexity and how well a particular model fits a dataset:
Akaike Information Criterion (AIC)
AIC is a single numerical score that may be used to identify, among many models, the one that is most likely to be the best fit for a given dataset. AIC scores are only helpful when compared to other scores for the same dataset.
Lower AIC scores are preferable.
AIC measures how well the model fits the training dataset and includes a penalty term for model complexity.
K = the number of independent variables or predictors
L = the model's maximum likelihood
N = the number of data points in the training set (especially helpful in the case of small datasets)
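The formula these symbols belong to does not appear in the text. One form that is consistent with the three definitions, given here as an assumption rather than as the original equation, is:

```latex
\mathrm{AIC} = \frac{2K - 2\ln(L)}{N}
```

The more widely quoted form is simply $2K - 2\ln(L)$. Dividing by N does not change which model ranks lowest when all candidates are scored on the same dataset.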
The drawback of AIC is that it can struggle with generalization, since it tends to favor more complex models that fit the training data closely. In addition, AIC is only a relative score, so all tested models might still have a poor fit.
Minimum Description Length (MDL)
According to the MDL concept, given a limited set of observed data, the best explanation is the one that allows for the most compression of the data. The idea underpins much of statistical modeling, pattern recognition, and machine learning.
h = the model
D = the model's predictions
L(h) = the number of bits needed to describe the model
L(D | h) = the number of bits needed to describe the model's predictions
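The equation that combines these terms is missing from the text. The usual two-part statement of MDL, with h the model and D its predictions, is the sum of the two description lengths:

```latex
\mathrm{MDL} = L(h) + L(D \mid h)
```

The model with the smallest total is preferred: a very detailed model makes $L(h)$ large, while a model that explains the data poorly makes $L(D \mid h)$ large.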
Bayesian Information Criterion (BIC)
BIC was derived using the Bayesian probability idea and is appropriate for models that use maximum likelihood estimation during training.
BIC is more commonly employed in time series and linear regression models. However, it may be applied broadly to any model based on maximum likelihood.
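The text gives no formula for BIC. The standard definition is BIC = k ln(n) - 2 ln(L), where k is the number of parameters, n the number of data points, and L the maximum likelihood. The sketch below scores two hand-fitted models with this BIC and with the unnormalized AIC, 2k - 2 ln(L). It rests on several assumptions that are not from the text: Gaussian errors, a parameter count that includes the noise variance, and made-up data.

```python
import math


def gaussian_log_likelihood(residuals):
    """Maximized Gaussian log-likelihood, given a model's residuals."""
    n = len(residuals)
    sigma2 = sum(r * r for r in residuals) / n  # MLE of the noise variance
    return -0.5 * n * (math.log(2 * math.pi) + math.log(sigma2) + 1)


def aic(log_l, k):
    return 2 * k - 2 * log_l


def bic(log_l, k, n):
    return k * math.log(n) - 2 * log_l


# Made-up data that is roughly y = 2x + 1 plus small noise.
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9]
y = [1.1, 2.9, 5.2, 7.1, 8.8, 11.2, 12.9, 15.1, 17.0, 18.8]
n = len(y)
mean_x, mean_y = sum(x) / n, sum(y) / n

# Candidate 1: constant model (predict the mean); parameters: mean, variance.
res_const = [yi - mean_y for yi in y]

# Candidate 2: straight line by least squares; parameters: slope, intercept, variance.
slope = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y)) / sum(
    (xi - mean_x) ** 2 for xi in x
)
intercept = mean_y - slope * mean_x
res_line = [yi - (intercept + slope * xi) for xi, yi in zip(x, y)]

scores = {}
for name, residuals, k in [("constant", res_const, 2), ("linear", res_line, 3)]:
    log_l = gaussian_log_likelihood(residuals)
    scores[name] = {"aic": aic(log_l, k), "bic": bic(log_l, k, n)}
    print(name, {key: round(value, 2) for key, value in scores[name].items()})
```

For both criteria the lower score wins. BIC's penalty grows with ln(n), so on larger datasets it penalizes extra parameters more heavily than AIC does.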
Structural Risk Minimization (SRM)
There are instances of overfitting when the model becomes biased toward the training data, which is its primary source of learning.
A generalized model must frequently be chosen from a limited data set in machine learning, which leads to the issue of overfitting when the model becomes too fitted to the specifics of the training set and performs poorly on new data. By weighing the model's complexity against how well it fits the training data, the SRM principle solves this issue.
J(f) is the complexity of the model
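The objective that J(f) belongs to is not shown in the text. A common way of writing the trade-off SRM describes is the training error plus a weighted complexity term. In the form below, the loss $\ell$, the weight $\lambda \ge 0$, and the N training pairs $(x_i, y_i)$ are notation assumed for illustration:

```latex
R_{\mathrm{srm}}(f) = \frac{1}{N} \sum_{i=1}^{N} \ell\bigl(y_i, f(x_i)\bigr) + \lambda \, J(f)
```

A larger $\lambda$ puts more weight on keeping the model simple, while a smaller one favors fitting the training data closely.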
Metrics for Evaluating Regression Models
Model evaluation is crucial in machine learning. It simplifies presenting your model to others and helps you understand how well it performs. Several evaluation metrics are available, but only a few can be employed with regression.
- Mean Absolute Error(MAE):
The MAE is the average of the absolute values of the errors. It is an important metric for evaluating a model. You can calculate MAE by importing:
from sklearn.metrics import mean_absolute_error
- Mean Square Error(MSE)
While MAE weights all errors equally, MSE penalizes large errors more heavily. It is computed by summing the squared differences between the actual output and the predicted output, then dividing the result by the total number of data points. It provides a single number indicating how much your predictions differ from the observed values.
from sklearn.metrics import mean_squared_error
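- Adjusted R Square
R Square quantifies how much of the variation in the dependent variable the model can account for. Its name, R Square, refers to the fact that, in simple linear regression, it is the square of the correlation coefficient (R). Adjusted R Square additionally corrects for the number of predictors, so adding a predictor that does not help no longer raises the score.
You can use the statsmodels or scikit-learn package for this.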
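To make the three metrics concrete, the sketch below computes them directly from their definitions in plain Python. The function names, the sample values, and the assumption of a single predictor are illustrative only; the `mean_absolute_error` and `mean_squared_error` functions imported above compute the same MAE and MSE.

```python
def mean_absolute_error_(y_true, y_pred):
    # Average of |actual - predicted|.
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def mean_squared_error_(y_true, y_pred):
    # Average of (actual - predicted)^2.
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true, y_pred):
    # 1 - (residual sum of squares / total sum of squares).
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def adjusted_r_squared(r2, n_samples, n_predictors):
    # Shrinks R^2 according to how many predictors the model uses.
    return 1 - (1 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


y_true = [3.0, 5.0, 2.5, 7.0]  # made-up observed values
y_pred = [2.5, 5.0, 3.0, 8.0]  # made-up predictions

mae = mean_absolute_error_(y_true, y_pred)
mse = mean_squared_error_(y_true, y_pred)
r2 = r_squared(y_true, y_pred)
adj_r2 = adjusted_r_squared(r2, n_samples=len(y_true), n_predictors=1)

print(f"MAE={mae:.3f} MSE={mse:.3f} R2={r2:.3f} adjusted R2={adj_r2:.3f}")
```

The trailing underscores only keep the hand-written functions from shadowing the scikit-learn ones if both are used in the same session.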
When comparing machine learning models, you must choose a tool or platform that can support your team's needs and your business goal.
With Censius, you can monitor each model's health in one place, use the user-friendly interface to comprehend models and analyze them for particular problems.
- Evaluate performance without ground truth
- Compare the past performance of a model.
- Create personalized dashboards.
- Compare performance between model iterations.
Model selection is the process of selecting the model that generalizes the best. Less complicated models have fewer parameters, which leads to high bias and low variance and, in turn, to under-fitting.
More parameters lead to low bias but high variance, which causes over-fitting. Ineffective model performance might result from either too few or too many parameters.
To maintain balance, a penalty term is introduced. For example, each added parameter increases the penalty, which steers the selection toward a simpler model.
In this article, we saw what model selection is and what the different model selection techniques are. We also discussed how to employ the methods that make the most sense for your project.