DVC

Released: Jun 2020
License: Apache-2.0
GitHub open issues: 535
GitHub stars: 8757
GitHub last commit: 21 Oct
Stack Overflow questions: 57

What is DVC?

Iterative.ai launched Data Version Control (DVC) as a Git-based version control solution for data and models. It follows the Commercial Open Source Software (COSS) model.

DVC simplifies data versioning for ML projects, with an emphasis on reproducibility and efficient sharing. It helps teams organize and access large datasets, and its Git-like workflow and full code-to-data provenance make it possible to trace how each ML model evolved.


How Does DVC Help?

DVC addresses the following ML experiment challenges:

  • Keeps all files and metrics consistent, so an experiment can be reproduced or used as the baseline for a new iteration.
  • Stores metafiles in Git, which makes version control of datasets and models easier, and supports several external storage options as a remote cache for large files.
  • Establishes norms and processes for effective team collaboration and efficient code sharing.
  • Aims to replace the document-sharing tools (Excel, Google Docs) and ad hoc scripts often used to track and manage model versions.
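
To make the metafile idea concrete, here is a minimal sketch of how a large file can be represented in Git by a small pointer record. It assumes a simplified, hypothetical record layout (path, MD5 hash, size) and is not DVC's actual implementation or file format; the helper names file_md5 and make_metafile are invented for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def file_md5(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large files never need to fit in memory."""
    digest = hashlib.md5()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def make_metafile(data_path: Path) -> dict:
    """Build a small pointer record (hypothetical layout) for a data file."""
    return {
        "path": data_path.name,
        "md5": file_md5(data_path),
        "size": data_path.stat().st_size,
    }


with tempfile.TemporaryDirectory() as tmp:
    data_file = Path(tmp) / "train.csv"
    data_file.write_bytes(b"id,label\n1,0\n2,1\n")
    meta = make_metafile(data_file)
    # The pointer is a few lines of text, so it is cheap to commit to Git,
    # while the data itself can live in a cache or remote storage.
    print(json.dumps(meta, indent=2))
```

Because the record changes whenever the file's content changes, committing it to Git gives each data version a place in the project history without storing the data in Git itself.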

 

Key Features of DVC

Storage and language agnostic

DVC can store data in Microsoft Azure Blob Storage, Amazon S3, Google Cloud Storage, Google Drive, SSH/SFTP, HTTP, or on local disk. Pipelines can be defined with R, Python, notebooks, Scala Spark, TensorFlow, PyTorch, and other tools.

Reproducible

DVC supports reproducibility by consistently tracking the configuration, input data, and code used to run an experiment. With a single ‘dvc repro’ command, users can reproduce experiments end-to-end.
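
The idea behind reproducing only what has changed can be sketched as a hash comparison: record a fingerprint of every input when a stage runs, then rerun the stage only if a fingerprint differs. This is a simplified illustration under that assumption, not the tool's actual logic; fingerprint, stage_is_stale, and the file names are hypothetical.

```python
import hashlib


def fingerprint(content: bytes) -> str:
    """Return a content hash used to detect changes."""
    return hashlib.md5(content).hexdigest()


def stage_is_stale(deps: dict[str, bytes], lock: dict[str, str]) -> bool:
    """Return True if any dependency hash differs from the recorded one."""
    current = {name: fingerprint(data) for name, data in deps.items()}
    return current != lock


# Inputs of a hypothetical training stage, held in memory for brevity.
deps = {"train.py": b"print('v1')", "data.csv": b"1,2,3"}

# Record the hashes at the time of the last successful run.
lock = {name: fingerprint(data) for name, data in deps.items()}
print(stage_is_stale(deps, lock))  # inputs unchanged, no rerun needed

# Changing the data invalidates the recorded state.
deps["data.csv"] = b"1,2,3,4"
print(stage_is_stale(deps, lock))  # inputs changed, stage should rerun
```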

Metric tracking

DVC manages experiments through Git tags and branches and tracks metrics alongside them, so users can follow the progress of experiments and pick the best version.
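
Picking the best version from tracked metrics amounts to comparing one metric across experiments. The sketch below assumes the metrics have already been collected into a dictionary; the experiment names and numbers are illustrative placeholders, and best_experiment is a hypothetical helper, not part of DVC.

```python
# Illustrative placeholder values, not real results.
experiments = {
    "baseline": {"accuracy": 0.81, "loss": 0.52},
    "more-features": {"accuracy": 0.86, "loss": 0.41},
    "deeper-model": {"accuracy": 0.84, "loss": 0.45},
}


def best_experiment(results: dict, metric: str, higher_is_better: bool = True) -> str:
    """Return the name of the experiment with the best value for `metric`."""
    choose = max if higher_is_better else min
    return choose(results, key=lambda name: results[name][metric])


print(best_experiment(experiments, "accuracy"))
print(best_experiment(experiments, "loss", higher_is_better=False))
```

The higher_is_better flag matters because some metrics (accuracy) improve upward while others (loss) improve downward.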

ML pipeline framework

DVC offers a built-in way to connect ML steps into a directed acyclic graph (DAG) and run the pipeline end to end, covering steps such as data cleaning, loading, feature engineering, and training.
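
Running a DAG of steps means executing each stage only after the stages it depends on. The following sketch uses Python's standard graphlib module to show that ordering; the stage names and the run_pipeline helper are assumptions made for illustration and do not reflect how DVC defines or executes pipelines internally.

```python
from graphlib import TopologicalSorter

# Hypothetical stages: each maps to the set of stages it depends on.
pipeline = {
    "load": set(),
    "clean": {"load"},
    "featurize": {"clean"},
    "train": {"featurize"},
}


def run_pipeline(dag: dict[str, set[str]]) -> list[str]:
    """Visit stages in dependency order and return the order used."""
    executed = []
    for stage in TopologicalSorter(dag).static_order():
        # A real stage would run a command here; this sketch only records it.
        executed.append(stage)
    return executed


order = run_pipeline(pipeline)
print(" -> ".join(order))
```

If the dependencies contained a cycle, TopologicalSorter would raise graphlib.CycleError, which is why the graph has to be acyclic.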

Compatibility with Git

DVC runs on top of any Git repository and is compatible with GitHub and GitLab. It provides the advantages of a distributed version control system:

  • Lock-free operation
  • Local branching
  • Versioning
