MLOps, although widely acknowledged, intensely discussed, and moderately implemented, is still a young discipline that began gaining traction in the IT industry only about two to three years ago. It is therefore not surprising that machine learning (ML) practitioners, teams, and leaders are still exploring and experimenting with how to adopt it. Scores of MLOps tools and platforms promise to improve part or all of ML pipeline management, and MLflow is one of them.
By the end of this article, you will be able to decide on immediate and long-term action items for optimizing your ML pipeline, using some best practices for implementing MLOps with MLflow.
What is MLflow?
MLflow is a one-stop open-source platform that helps to manage end-to-end machine learning pipelines. MLflow is library-agnostic and language-agnostic, which means it can be implemented and used with any ML library or any language.
The flexibility of MLflow across libraries and languages is possible because it can be accessed through a REST API and Command Line Interface (CLI). Python, R, and Java APIs are also available for convenience.
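As a rough illustration of the language-agnostic REST route, the sketch below builds the HTTP request that logs a single metric to a tracking server, using only the Python standard library. The helper name build_log_metric_request, the server address, and the metric values are assumptions made for this example; the endpoint path and field names follow the MLflow REST API as documented for its 2.0 API prefix, so check them against the version you run.

```python
import json
import time
import urllib.request


def build_log_metric_request(base_url, run_id, key, value, step=0):
    """Build (but do not send) a POST request that logs one metric to a run."""
    payload = {
        "run_id": run_id,
        "key": key,
        "value": value,
        "timestamp": int(time.time() * 1000),  # milliseconds since the epoch
        "step": step,
    }
    return urllib.request.Request(
        url=f"{base_url.rstrip('/')}/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical server address and placeholder run ID, for illustration only.
request = build_log_metric_request("http://localhost:5000", "<RUN_ID>", "rmse", 0.79)
print(request.get_method(), request.full_url)

# Against a running tracking server, the request could then be sent with:
# urllib.request.urlopen(request)
```

Because the request is plain JSON over HTTP, the same call can be made from any language that has an HTTP client.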
MLflow is developed and continually improved by a large community of open-source contributors. As a result, it has grown into a solution shaped by the common problems the ML community faces.
What are the common challenges of an ML pipeline?
Every ML pipeline is complex since it has various modules that are individually evolving and can change the dynamic of other modules. A minor change in one parameter can create a domino effect across all the following modules. Therefore, it is critical to closely monitor and control every stage of the pipeline, especially the training, production, and retraining phases, to maintain a high-performance solution for as long as feasible.
Some common challenges of an ML pipeline include:
- Reproducing results
ML projects usually start with simple plans and then grow beyond them, resulting in an overwhelming number of experiments. Manual, non-automated tracking makes it likely that finer details will be missed. ML pipelines are fragile, and even a single missing element can throw off the results. The inability to reproduce results and code is one of the top challenges for inexperienced ML teams.
- Non-standard solution packaging
Different teams or developers often work on different ML projects and use different packaging methods to deploy their ML solutions. With no standard way to package the modules together, the maintenance and retraining phases can be disrupted, since projects often change hands during these stages.
- Inability to track stages
Each experiment is a series of different modules and ML stages. Data gathering, processing, feature engineering, model selection, and tuning are just the primary high-level pieces, and they are known to be highly interlinked. For example, the model tuning stage can lead the developer to revisit the feature engineering stage and vice versa. With so many changes happening simultaneously, it is difficult to keep track of the routes and choices that led to high-performing results.
- High downtime
Without detailed model artifacts, recovering from a failed instance takes much longer than it would if developers had access to the tracking details of every stage and a quick, standard way to deploy the revised solution. ML teams often have neither centralized, easily accessible storage for model artifacts nor a detailed tracking system. Access to such details helps developers locate and troubleshoot the exact stages that disrupted the solution.
What are the functions of MLflow?
MLflow solves the recurrent problems of an end-to-end pipeline through its key functionalities. It offers four primary functions, which are as follows:
- MLflow Tracking
MLflow Tracking records the details of each pipeline run: parameters (such as hyperparameters and feature settings), metrics, code versions, and other artifacts. The logs can later be used to visualize or compare results across experiments, users, or environments. They can be stored either on a local system or on a remote server.
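A minimal sketch of what logging a run can look like with the Python tracking API is shown below. The helper name log_training_run and the example values are assumptions for illustration; mlflow.start_run, mlflow.log_params, and mlflow.log_metrics are the MLflow calls doing the work.

```python
def log_training_run(params, metrics, run_name=None):
    """Record one training attempt as an MLflow run and return its run ID."""
    # Imported here so the helper can be defined even where MLflow is absent.
    import mlflow

    with mlflow.start_run(run_name=run_name) as run:
        mlflow.log_params(params)    # e.g. hyperparameters or feature settings
        mlflow.log_metrics(metrics)  # e.g. validation scores
        return run.info.run_id


# Example usage with hypothetical values (requires `pip install mlflow`):
# run_id = log_training_run(
#     params={"alpha": 0.5, "l1_ratio": 0.1},
#     metrics={"rmse": 0.79, "r2": 0.11},
#     run_name="elasticnet-baseline",
# )
```

Each call creates a separate run, so repeated experiments can be compared side by side later instead of being reconstructed from memory.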
- MLflow Projects
MLflow Projects offer a convention for packaging ML code so that it is reusable and reproducible. Fundamentally, a project is a directory together with a descriptor file (named MLproject) that defines its entry points and dependencies. Additionally, when the MLflow Tracking API is used inside a project, MLflow automatically records the project version (for example, the Git commit) and any parameters.
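To make the descriptor file concrete, the sketch below writes a small MLproject file into a temporary directory. The project name, the train.py script, the conda.yaml file, and the alpha parameter are hypothetical; the overall layout (name, environment file, entry_points with parameters and a command) follows the MLproject format.

```python
import tempfile
from pathlib import Path

# Doubled braces are literal braces once str.format() has run.
MLPROJECT_TEMPLATE = """\
name: {name}

conda_env: conda.yaml

entry_points:
  main:
    parameters:
      alpha: {{type: float, default: 0.5}}
    command: "python train.py --alpha {{alpha}}"
"""


def write_mlproject(project_dir, name):
    """Write a minimal MLproject descriptor and return its path."""
    path = Path(project_dir) / "MLproject"
    path.write_text(MLPROJECT_TEMPLATE.format(name=name), encoding="utf-8")
    return path


with tempfile.TemporaryDirectory() as tmp:
    descriptor = write_mlproject(tmp, "wine_quality_demo")
    print(descriptor.read_text(encoding="utf-8"))
```

With such a file in place, the -P option of mlflow run fills the {alpha} placeholder in the command.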
- MLflow Models
With MLflow Models, your ML model can be packaged in one or more ‘flavors.’ A ‘flavor’ is a format that a downstream tool understands, such as a TensorFlow DAG or a generic Python function, and a descriptor file lists the flavors in which the model is available. Packaging in ‘flavors’ lets the model be used across a host of downstream tools and platforms, such as Docker or AWS SageMaker, which makes the model lifecycle easier to manage.
- MLflow Model Registry
The MLflow Model Registry is a centralized model store with its own APIs and UI. It governs the lifecycle of a model by tracking its lineage, versions, and stage transitions.
Five Best Practices for MLOps with MLflow
Below are some of the best practices for MLOps that can be achieved through MLflow:
1. Dedicated storage for pipeline parameters
The parameters discussed in the MLflow Tracking module can add up over time to an overwhelming quantity that can seem challenging to manage and comprehend. To overcome the complexities of managing artifacts such as metadata, hyperparameters, databases, or metrics, especially in the production environment which runs on the cloud, it is essential to reserve dedicated cloud storage for seamless tracking.
2. Testing on staging environments
The MLflow Model Registry lets a model version be assigned to a stage such as Staging or Production. Staging refers to a replica of the production environment where models can be tested and readjusted before going live. Model staging is a best practice in the machine learning lifecycle because it catches bugs and potential failures before the code runs in the production or customer-facing environment.
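One possible way to move a logged model into staging through the Python API is sketched below. The helper name promote_run_model_to_staging and the "model" artifact path are assumptions for this example; mlflow.register_model and MlflowClient.transition_model_version_stage are the MLflow calls used. Recent MLflow releases steer users toward model version aliases instead of stages, so it is worth checking the documentation for the version in use.

```python
def promote_run_model_to_staging(run_id, model_name, artifact_path="model"):
    """Register the model logged by a run and mark the new version as Staging."""
    # Imported here so the helper can be defined even where MLflow is absent.
    import mlflow
    from mlflow.tracking import MlflowClient

    # Create a new version of `model_name` from the run's logged model.
    version = mlflow.register_model(f"runs:/{run_id}/{artifact_path}", model_name)

    # Assign that version to the Staging stage for pre-production testing.
    MlflowClient().transition_model_version_stage(
        name=model_name, version=version.version, stage="Staging"
    )

    # URI that resolves to whichever version currently sits in Staging.
    return f"models:/{model_name}/Staging"


# Example usage with hypothetical names (requires a tracking server with a registry):
# staging_uri = promote_run_model_to_staging("<RUN_ID>", "wine-quality")
```

Test code can then load the model through the returned URI and stay unchanged when a newer version is promoted.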
3. Centralizing experiment tracking
Experiment tracking is one of the critical requirements of MLOps, and MLflow offers a simple way to centralize tracking details across users and systems. The tracking API can log details directly from Jupyter notebooks, and runs from different users are grouped together when they log to the same tracking server under the same experiment name. The details are then stored in a common table for quick analysis and comparison.
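A small sketch of this pattern follows. Every user points at the same tracking server and experiment name, and the collected runs are read back as one table. The two helper names, the server URL, and the experiment name are assumptions for illustration, and the experiment_names argument of mlflow.search_runs is assumed to be available in the installed MLflow version (older releases accept experiment IDs only).

```python
def log_to_shared_experiment(tracking_uri, experiment_name, params, metrics):
    """Log one run to an experiment that several users share."""
    # Imported here so the helper can be defined even where MLflow is absent.
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)   # same server for every user
    mlflow.set_experiment(experiment_name)  # same experiment name for every user
    with mlflow.start_run():
        mlflow.log_params(params)
        mlflow.log_metrics(metrics)


def load_experiment_table(tracking_uri, experiment_name):
    """Return all runs of the shared experiment (a pandas DataFrame by default)."""
    import mlflow

    mlflow.set_tracking_uri(tracking_uri)
    return mlflow.search_runs(experiment_names=[experiment_name])


# Example usage with a hypothetical server and experiment name:
# log_to_shared_experiment("http://mlflow.example.internal:5000", "churn-model",
#                          {"alpha": 0.5}, {"rmse": 0.79})
# runs = load_experiment_table("http://mlflow.example.internal:5000", "churn-model")
```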
4. Leveraging AWS for MLflow server
MLflow and AWS work well together because setup is easy and operating costs during maintenance are low. AWS offers free-tier services that can be used to host MLflow, and since the tracking server does not carry out any heavy tasks, the costs stay minimal. Overall, the setup takes about ten minutes and helps streamline the MLOps cycle.
5. Tuning the complete pipeline
As a best practice, tune the entire ML pipeline rather than individual or isolated modules, so that the full potential of different hyperparameter combinations can be explored. This generates a very large number of combinations, which can be tracked, stored, and analyzed through MLflow’s tracking API and centralized storage.
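The sketch below illustrates the idea under simple assumptions: a hypothetical search space spans two pipeline stages, every combination across both stages is enumerated, and each one could be logged as its own run. The stage names, parameter names, the track_grid helper, and the evaluate callback are all made up for this example.

```python
import itertools


def pipeline_grid(search_space):
    """Yield one flat {"stage.param": value} dict per combination across all stages."""
    keys = [
        f"{stage}.{param}"
        for stage, params in search_space.items()
        for param in params
    ]
    value_lists = [
        values for params in search_space.values() for values in params.values()
    ]
    for combo in itertools.product(*value_lists):
        yield dict(zip(keys, combo))


def track_grid(configs, evaluate):
    """Log each configuration and its score as a separate MLflow run."""
    # Imported here so the grid helper above works without MLflow installed.
    import mlflow

    for config in configs:
        with mlflow.start_run():
            mlflow.log_params(config)
            mlflow.log_metric("score", evaluate(config))


# Hypothetical search space covering feature engineering and the model together.
SEARCH_SPACE = {
    "features": {"n_components": [5, 10]},
    "model": {"alpha": [0.1, 0.5, 1.0], "l1_ratio": [0.2, 0.8]},
}

configs = list(pipeline_grid(SEARCH_SPACE))
print(len(configs))  # 2 * 3 * 2 = 12 combinations
```

Prefixing each parameter with its stage name keeps the logged parameters unambiguous when two stages happen to use the same parameter name.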
How to Get Started with MLflow?
Getting started with MLflow takes five simple steps. A simplified overview is given below; for a more detailed walkthrough, refer to the MLflow docs.
Step 1: Install MLflow
Installing MLflow is the quickest step and takes a single command:
pip install mlflow
Any other dependencies that MLflow or the examples need must also be installed if they are not already present on the system.
Step 2: Cloning the quickstart code
MLflow’s quickstart code walks the user through the MLflow modules and enables quick learning. It can be cloned from the MLflow Git repository:
git clone https://github.com/mlflow/mlflow
After cloning, the quickstart code is available in the ‘examples’ subdirectory of the repo.
Step 3: Kickstarting the tracking API
To produce tracking logs, run the Python script at quickstart/mlflow_tracking.py inside the ‘examples’ subdirectory. After running the program, the tracking logs can be viewed through the command:
mlflow ui
After entering the command, the UI can be viewed at http://localhost:5000/
Step 4: Running an MLflow project
With MLflow Projects, code can be packaged according to a set convention and reused by others. To run a local or Git project, the following commands can be used:
mlflow run sklearn_elasticnet_wine -P alpha=0.5
mlflow run https://github.com/mlflow/mlflow-example.git -P alpha=5.0
Refer to the MLflow tutorial for a more in-depth understanding of building and running MLflow projects.
Step 5: Saving and serving Models
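A trained model can be saved with the log_model function of the matching MLflow flavor module, for example mlflow.sklearn.log_model(model, "model") for a scikit-learn model.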
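As a minimal sketch, assuming a scikit-learn model, saving could look like the helper below. The name log_sklearn_model is hypothetical; mlflow.sklearn.log_model is the MLflow call that stores the model as an artifact of the run, and the returned runs:/ URI is the kind of value passed to the -m option of the serve command.

```python
def log_sklearn_model(model, artifact_path="model"):
    """Log a fitted scikit-learn model under a new run and return its runs:/ URI."""
    # Imported here so the helper can be defined even where MLflow is absent.
    import mlflow
    import mlflow.sklearn

    with mlflow.start_run() as run:
        mlflow.sklearn.log_model(model, artifact_path)
        return f"runs:/{run.info.run_id}/{artifact_path}"


# Example usage (requires mlflow and scikit-learn; X_train and y_train are your data):
# from sklearn.linear_model import ElasticNet
# model = ElasticNet(alpha=0.5).fit(X_train, y_train)
# print(log_sklearn_model(model))
```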
To serve the model, the following command can be used:
mlflow models serve -m runs:/<RUN_ID>/model
If the default port 5000 is not available, another open port can be specified through the --port option.
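Once the model is being served, predictions can be requested over HTTP. The sketch below builds such a request with the Python standard library. It assumes the JSON input format used by MLflow 2.x scoring servers (a "dataframe_split" object posted to the /invocations endpoint); older versions expect a different payload. The feature names, values, and port 5001 are hypothetical.

```python
import json
import urllib.request


def build_invocation_request(columns, rows, host="127.0.0.1", port=5000):
    """Build (but do not send) a scoring request for a locally served model."""
    payload = {"dataframe_split": {"columns": columns, "data": rows}}
    return urllib.request.Request(
        url=f"http://{host}:{port}/invocations",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


# Hypothetical feature names and values, for illustration only.
request = build_invocation_request(["alcohol", "pH"], [[12.8, 3.2]], port=5001)
print(request.full_url)

# With the server running on that port, send the request and read the predictions:
# with urllib.request.urlopen(request) as response:
#     print(json.loads(response.read()))
```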
Today, MLOps is at an evolutionary stage, and several MLOps tools are being developed to better serve the machine learning ecosystem. MLflow is one such tool and has demonstrated high performance across enterprises and research projects. It has the proven ability to evolve mismanaged ML projects into organized, streamlined, and high-performance ML pipelines. Learning about the best practices in MLOps and taking gradual steps towards implementing them at regular intervals can significantly change the team’s outputs, and MLflow is one of the top choices that can help the ML teams to get there faster.