Data And Model Versioning
MLOps

Data And Model Versioning

Versioning is the process of uniquely naming multiple iterations of models used at different stages of ML development to track all changes.

What is Data and Model Versioning?

Versioning refers to the process of uniquely naming multiple iterations of an ML model used at different stages of ML development. It helps track and control all changes applied to various versions allowing the easy recovery of a previous model version when needed.

An example of data and model versioning

ML experiment involves different project versions with specific enhancements or changes in each version. These changes might include

  1. Update features
  2. Update parameters
  3. Adjust parameters
  4. Add the new dataset and features
  5. Readjust parameters

Data versioning tools allow capturing the versions of data and models and switching between different versions as needed. It offers a unified way of accessing data, code, and ML models – a complete journal of work performed. For example, check this elaborative image given in DVC user documentation.

An image showing the different versions of data, features and model
An image showing the different versions of data, features and model. Source: DVC

 

Why is Data and Model Versioning Essential?

Helps bring reproducibility 

ML versioning helps finalize the best model and its tradeoffs. It plays a crucial role in ensuring reproducibility in ML experiments. By catching the snapshot of the entire ML pipeline, it becomes easy to reproduce the same results while saving on the retraining and testing efforts and time.  

Better tracking 

ML workflows are error-prone and complex and hence require tracking. ML models can fail or underperform for several reasons like, adding more data or updating features. Model versioning helps revert failed models to their previous, stable, and working versions. 

Track dependencies 

ML experiments involve complex workflows with several elements that affect model performance. For example, datasets, frameworks, features set, test cases contribute to model performance. Model versioning helps track dependencies that affect ML model performance. It helps test multiple models in various ML pipelines, tune parameters and hyperparameters, and keep model accuracy in check. 

Scaled AI-ML Governance

ML projects are rolled out iteratively for scaled performance and failure tolerance. Model versioning supports better AI governance with access control, policies, the right version deployments, and model activity tracking.

 

Getting ML Versioning Right  

Data and model versioning is better achieved with the right set of tools. The chosen toolset should offer insights on elements of the ML pipeline and mechanism to link data, code, and model versions. The following tools are used for precise ML versioning.

A table listing and describing the trending ML versioning tools
A table listing and describing the trending ML versioning tools

Data and model versioning is an important step in the entire ML lifecycle. Precise ML versioning helps drive reproducibility and desired performance of ML systems over multiple versions. 

Further Reading

Version Control For ML Models, Explained

Version Control for ML Models: Why You Need It, What It Is, How To Implement It

What is data model versioning?

Versioning data models

Versioning data and models 

Data Versioning: Does it mean what you think it means?

Liked the content? You'll love our emails!

The best MLOps and AI Observability content handpicked and delivered to your email twice a month

Thank you! Your submission has been received!
Oops! Something went wrong while submitting the form.

Censius automates model monitoring

so that you can 

boost healthcare

improve models

scale businesses

detect frauds

boost healthcare

improve models

scale businesses

detect frauds

boost healthcare

Start Monitoring