How to use MLflow to Track and Structure Machine Learning Projects?

Learn what experiment tracking is, why it is important, what MLflow is, its features and benefits, and how to track experiments with MLflow.

By 
Harshil Patel

Machine learning is an expensive, experimental process. Each step needs careful planning, and every input should have a meaningful effect on the output. A project can take weeks or months to complete, and the work rarely stops there: a model's behavior can shift whenever it is retrained on data points that were not present before.

MLflow is one of the tools explicitly designed to streamline the machine learning process, so that models are not only of high quality but are also built efficiently and economically.

What is Experiment Tracking, and Why is it important?

Experiment tracking is the practice of recording the relevant information about each experiment you run while building a machine learning model. For example:

  • Various ML models
  • Hyperparameters 
  • Configuration files for the environment
  • Data versions used for training and evaluation
  • Performance visualizations and a lot more

Data scientists can discover the elements that impact model performance, compare the findings, and choose the best version by tracking ML model trials in an organized way.
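To make the idea concrete, here is a minimal sketch in plain Python (no MLflow yet) of what a tracked run might record and how tracked runs can be compared. The `RunRecord` class, its field names, and the sample numbers are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class RunRecord:
    """One tracked experiment: what was tried and how it performed."""
    model_type: str
    hyperparameters: dict
    data_version: str
    metrics: dict = field(default_factory=dict)


def best_run(runs, metric):
    """Return the run with the highest value for the given metric."""
    return max(runs, key=lambda r: r.metrics[metric])


# Hypothetical results from three trials (illustrative numbers only).
runs = [
    RunRecord("decision_tree", {"max_depth": 3}, "v1", {"accuracy": 0.81}),
    RunRecord("decision_tree", {"max_depth": 5}, "v1", {"accuracy": 0.86}),
    RunRecord("random_forest", {"n_estimators": 50}, "v2", {"accuracy": 0.84}),
]

winner = best_run(runs, "accuracy")
print(winner.model_type, winner.hyperparameters, winner.metrics["accuracy"])
```

A tracking tool such as MLflow stores this kind of record for you, together with files and code versions, so you do not have to maintain it by hand.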

Collecting and preparing training data, picking a model, and training the model on the prepared data are all common steps in developing an ML model. A slight change in the training data, hyperparameters, model type, or code used in an experiment can significantly impact model performance. Many open-source and enterprise-level MLOps tools and platforms are available to help you track your machine learning experiments. MLflow is a popular open-source tool used by many data scientists and ML engineers.

What is MLflow?

MLflow is an open-source platform that helps developers and data scientists manage the entire machine learning lifecycle: experimentation, reproducibility, deployment, and a central model registry. Let's look at some of the features MLflow offers before moving on to the key components.

  • It is compatible with a wide range of machine learning frameworks, languages, and code.
  • It packages an ML model in a common format that downstream tools can use.
  • It provides a model store with APIs and a user interface for managing the MLOps lifecycle.
Figure: Key components of MLflow, with brief descriptions (Source: medium.com/pytorch)


MLflow Tracking

The MLflow Tracking component offers an API and a UI for recording parameters, code versions, metrics, and output files when you run your machine learning code, and for viewing the results later. You can log and query experiments through its Python and REST APIs.

It helps you keep track of the many experiments and iterations you run on your data, and lets you retrieve the hyperparameters, metrics, and outputs recorded for any particular iteration.

Some important functions of MLflow Tracking:

mlflow.start_run() -- starts a new run and makes it the active run.
mlflow.end_run() -- ends the currently active run.
mlflow.log_artifacts() -- logs all the files in a given local directory as artifacts.
....


MLflow Projects

An MLflow Project is a convention-based framework for packaging data science code in a reusable and repeatable workflow. The Projects component also offers an API and command-line utilities for executing projects, allowing you to create workflows by chaining projects together.

MLflow Projects can run in several types of environments: Conda environments, Docker container environments, and the system environment.

Figure: How an MLflow Project works (Image source: infoq.com)

MLflow Models

An MLflow Model is a standard format for packaging a machine learning model so that it can be used by a variety of downstream tools, such as real-time serving over a REST API or batch inference on Apache Spark. The format defines a convention that lets you store a model in different “flavors” that different tools can understand. MLflow makes it easy to package models from many popular machine learning libraries in the MLflow Model format, with plenty of customization options.

Model Registry

The MLflow Model Registry component provides a centralized model store, a set of APIs, and a UI for collaboratively managing the full lifecycle of an MLflow Model. It provides model lineage, versioning, and annotations.

Recommended Reading: Data Version control: MLflow vs DVC

The Model Registry also supports governance and control: you can integrate it with CI/CD workflows to track stage transitions, review changes, and approve them.

Recommended Reading: MLflow Tracking Docs

Benefits Of Using MLflow 

Let's take a look at some of MLflow's benefits.

  • It is an open-source MLOps tool.
  • It supports many tools and frameworks.
  • It is highly customizable.
  • It is ideal for data science projects.
  • It focuses on the entire machine learning lifecycle.
  • It works with any ML library.
  • It supports custom visualizations.

Let's look at how you can use MLflow to keep track of your machine learning and deep learning projects.

Recommended Reading: MLflow Best Practices

Tracking ML Experiments using MLflow

This section covers the basic process of integrating MLflow into your machine learning application or project. We start with the UI workflow, where you log values from your code and then inspect them in the MLflow UI.

UI Workflow

Installing MLflow:

pip install mlflow

Now, open your machine learning project/ML pipeline code file.

First, import MLflow:

import mlflow
import mlflow.sklearn

Now, name the experiment you are going to track. 

mlflow.set_experiment(experiment_name="MLflow demo")

You have to specify what you're going to track.

mlflow.log_metric("accuracy", model_accuracy)  # metric logging
mlflow.log_metric("precision", precision)  # metric logging

mlflow.sklearn.log_model(model, "model")  # model logging

mlflow.log_param("max_depth", max_depth)  # hyperparameter logging
...

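Now open a terminal in the directory where you ran your code, and run:

mlflow ui

You should see output indicating that the server is running, for example “Serving on http://127.0.0.1:5000”. Open that address in your browser to view the UI.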


Figure: The MLflow UI


To learn more, download the quickstart code by cloning the MLflow repository with git clone, then cd into the examples subdirectory of the repository.


API Workflow

MLflow also provides a more detailed Tracking Service API for managing experiments and runs directly, which is available through the client SDK in the mlflow.tracking module. This allows you to search for data from previous runs, log extra information about them, create experiments, tag runs, and more.

from mlflow.tracking import MlflowClient


After importing MlflowClient, create a client and use it to create and update a run.

client = MlflowClient()
# list_experiments() was removed in MLflow 2.0; search_experiments() replaces it
experiments = client.search_experiments()  # returns a list of mlflow.entities.Experiment
run = client.create_run(experiments[0].experiment_id)  # returns mlflow.entities.Run
client.log_param(run.info.run_id, "hello", "world")
client.set_terminated(run.info.run_id)


You can find more information in MLflow's GitHub examples repository.

Recommended Reading: Be more efficient to produce ML models with MLflow

Some Highlights of MLflow

The MLflow API is well-designed, and new features are released regularly. It's important to keep up with new features and updates by monitoring the API. However, I'd like to draw attention to a few noteworthy characteristics of MLflow.

  • MLflow includes auto-logging. It is easy to use: once activated, it captures and logs the metrics, parameters, and models that the integration supports, without explicit logging calls. Keras, TensorFlow, XGBoost, and Spark all have autolog support.
  • A number of task orchestration platforms are available, but MLflow is designed specifically to support the machine learning lifecycle. This means that MLflow can run experiments and track their outcomes, as well as help you train and deploy machine learning models.
  • Deep learning models benefit from auto-logging in particular, because training them involves many parameters and hyperparameters that would otherwise have to be recorded by hand.
  • You can customize MLflow to meet your specific requirements. It can also handle massive volumes of data.
  • The MLflow API supports not just Python but also Java and R.
  • It is open source and has an active community for support.
  • It can be used to deploy a variety of machine learning models, each of which is saved as a directory containing any number of files.
  • With MLflow, data scientists no longer need to manually keep track of the parameters they use in each run.

Conclusion

We've seen MLflow's potential and learnt how it can help you with experiment tracking and monitoring. We also discussed what MLflow is and how it fits into your machine learning lifecycle. With only a few lines of code, MLflow provides a solid way to handle model tracking, packaging, and reproducibility. It is a must-have tool in the machine learning arsenal.

PS: Speaking of must-have tools in the ML Space, Censius can easily track the performance and behavior of your ML models in real-time. Our platform seamlessly integrates with popular ML frameworks like MLflow, so you can gain valuable insights into your projects without any additional effort.
