Overfitting Vs. Underfitting

Overfitting and underfitting describe how well a model learns the patterns in its training data and generalizes to new data. Finding a good fit between these two extremes is essential.

Model Overfitting 

Model overfitting arises when a model learns the training data too well. The model captures the detail and noise in the training data so closely that its performance on unseen data suffers.

An overfit model will still make predictions on new data, but because it has memorized noise rather than the underlying signal, those predictions are likely to be inaccurate.

A low training error combined with high variance indicates overfitting. In the bias-variance trade-off, variance is the counterpart of bias: it measures how sensitive the model is to the particular data points it was trained on, so a high-variance model has absorbed unnecessary detail from the training set.
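To make this concrete, here is a minimal sketch of overfitting. The scikit-learn pipeline and the synthetic noisy sine dataset are illustrative assumptions, not anything prescribed by this article: a high-degree polynomial has enough capacity to memorize the noise, so its training error is near zero while its test error is much larger.

```python
# Minimal overfitting sketch: a degree-15 polynomial memorizes noisy data.
import numpy as np
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=60)  # signal + noise

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A degree-15 polynomial can bend through nearly every training point,
# fitting the noise along with the signal.
overfit = make_pipeline(PolynomialFeatures(degree=15), LinearRegression())
overfit.fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, overfit.predict(X_train)))  # near zero
print("test MSE: ", mean_squared_error(y_test, overfit.predict(X_test)))    # much larger
```

The widening gap between training and test error is the telltale sign: the model looks excellent on the data it has seen and noticeably worse on data it has not.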

Example

Consider a model predicting the likelihood of diabetes in a population. If the model also learns from weakly related data points such as income, how often a person eats out, sleep and wake times, and gym membership, it may pick up spurious patterns and deliver skewed results.

Model Underfitting 

An ML algorithm underfits when it cannot capture the underlying trend of the data: it fails to model the training data well and therefore cannot generalize to new data. Underfit models are poor learners, typically characterized by insufficient learning and wrong assumptions that limit what they can capture.

Model underfitting occurs when a model is overly simplistic; it may need more training time, more input features, or less regularization. Indicators of underfitting include high bias and low variance.
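The same kind of setup can illustrate underfitting. In this sketch (again assuming a synthetic sine-shaped dataset, purely for illustration), a plain straight-line fit is too simple for the curved trend, so the error is high on both the training and the test split:

```python
# Minimal underfitting sketch: a straight line cannot follow a sine curve.
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error

rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(60, 1))
y = np.sin(2 * np.pi * X).ravel() + rng.normal(0, 0.2, size=60)

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)

# A linear model misses the curvature entirely, so error is high on
# BOTH splits: the signature of high bias and low variance.
underfit = LinearRegression().fit(X_train, y_train)

print("train MSE:", mean_squared_error(y_train, underfit.predict(X_train)))  # high
print("test MSE: ", mean_squared_error(y_test, underfit.predict(X_test)))    # also high
```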

Example

In the diabetes prediction model above, suppose a lack of data and limited access to domain experts mean that only three features are selected: age, gender, and weight. Crucial data points such as genetic history, physical activity, ethnicity, and pre-existing disorders are overlooked. The result is an underfit, biased model.

Achieving a Good Fit In Machine Learning 

Finding a good balance between overfitting and underfitting models is crucial but difficult to achieve in practice. 

To discover the best-fit model, examine the model's performance on the training data over time. As the algorithm learns, its error on the training data decreases, and so does its error on the test dataset. However, training the model for too long lets it capture extraneous detail and noise in the training set, leading to an overfit model. It is therefore essential to stop training at the right time.

You can get the best-fit model by locating a sweet spot at the point just before the error on the test dataset starts to increase. At this point, your model has good skill on both the training and unseen test datasets.
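One common way to stop at that sweet spot is early stopping against a held-out validation set. The sketch below is one possible implementation, not the only one: scikit-learn's SGDRegressor, the synthetic data, and the patience rule are all assumptions chosen for illustration. Training halts once the held-out error stops improving for several consecutive epochs.

```python
# Early-stopping sketch: SGDRegressor, the synthetic data, and the patience
# rule are illustrative assumptions, not prescribed by the article.
import numpy as np
from sklearn.linear_model import SGDRegressor
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(0)
X = rng.normal(size=(500, 10))
y = X @ rng.normal(size=10) + rng.normal(0, 0.5, size=500)

# Hold out a validation set to stand in for "unseen" data during training.
X_train, X_val, y_train, y_val = train_test_split(X, y, random_state=0)

model = SGDRegressor(learning_rate="constant", eta0=0.01, random_state=0)
best_val_mse, strikes, patience = np.inf, 0, 5

for epoch in range(200):
    model.partial_fit(X_train, y_train)  # one incremental pass over the data
    val_mse = mean_squared_error(y_val, model.predict(X_val))
    if val_mse < best_val_mse:
        best_val_mse, strikes = val_mse, 0  # still improving: keep training
    else:
        strikes += 1  # validation error did not improve this epoch
        if strikes >= patience:  # error has stopped decreasing: stop here
            break

print(f"stopped after epoch {epoch}, best validation MSE = {best_val_mse:.4f}")
```

The patience counter keeps a single noisy epoch from ending training prematurely; only a sustained rise in held-out error triggers the stop.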

Comparing Model Overfitting & Underfitting

Overfitting: low bias, high variance. The model learns the detail and noise in the training data, so training error is low but error on unseen data is high.

Underfitting: high bias, low variance. The model is too simple to capture the underlying trend, so error is high on both the training and test data.

Further Learning 

Overfitting and Underfitting Principles

Overfitting and Underfitting in Machine Learning + [Example]

Overfitting and Underfitting With Machine Learning Algorithms
