This post will answer the following questions:
- Why scale ML flows?
- What are the potential pain points when scaling?
- How is MLOps helping the industry cope with scaling issues?
Scalability of ML Flows and Possible Bottlenecks
Machine Learning (ML) powered systems are found in likely places such as loan credibility calculators, movie recommenders, diagnostic laboratories, and unlikely places like children's games and recipe catalogs. It is an exciting world where an idea can be quickly translated into an ML service, provided you know how to use the relevant data and design a good solution.
Post-training and testing, the model is ready to be served to the intended audience on a shiny platter. But wait!
Your model has been trained and tested on squeaky clean acquired data, but will it tolerate the new data encountered in the wild? Think of the users who will use the loan credibility calculator and the diversity of their attributes. Or the huge volumes of data that live systems see on an hourly basis.
When you anticipate high-dimensional or large datasets in a setting requiring quick predictions, there are some factors to consider. You would not want the user to wait for long to determine their loan credibility score; otherwise, they will simply move to another service. Ensuring your ML project can scale is the answer to these requirements. Of course, the approach to scalability may differ among use cases and system environment capabilities. Therefore, we would like to introduce the potential bottlenecks that a typical ML workflow may encounter.
As you read along, you may identify with the mentioned issues or anticipate similar problems in your setup and find solutions to them here. Some of the ways in which your ML project may stumble if it lacks scalability are:
- The training datasets are too large to train the model on: The apparent consequence is an ill-fitted model that produces incorrect predictions. The results may seem correct on a cursory evaluation, but model performance will degrade due to insufficient training.
- Impact on the prediction latency of a deployed model: Deployment of the model is followed by monitoring and re-training to ensure relevant predictions and counter problems like model drift. If the model encounters large-sized input data that had not been adequately pre-processed or had integrity issues, the prediction latency may worsen.
- The data rates are too fast for your model to consume: The dynamic data that may be relevant to your model will languish in the pipeline while it makes predictions based on stale datasets.
- The feature engineering logic slows down the model: The relevance of real-time predictions will be lost when performance is swamped by heavy feature computation.
- In the case of batch processing, the system may not be able to handle large numbers of batches: If the data is too large to process, then the model will break down. No more predictions for the user.
The ML development flow for the hypothetical loan credibility calculator may look like this:
Figure: A typical ML lifecycle, the potential pain points that riddle it, and the scope of MLOps over it. Source: Censius
Now let us check out the potential pain points or bottlenecks for different workflow stages and how MLOps practice addresses them.
Scalability issues during data acquisition
As mentioned earlier, the deployed model should be able to handle the barrage of incoming data irrespective of its dimensionality and size. Additionally, the acquisition stage might receive datasets from multiple sources. The loan credibility scoring model may expect to receive data from the user, financial market updates, and data belonging to persons of similar demographics.
How does MLOps handle dynamic data situations?
- A data pipeline will allow your team to speed up data consumption without swamping the system performance. This is especially helpful if there are multiple data sources and parallel consumption is feasible.
- Automated data pipelines can help streamline the acquisition process. MLOps tools like Apache Airflow not only allow the design of complex data pipelines but also automate their execution and handle failures gracefully (a minimal pipeline sketch follows this list).
- Job schedulers and alerting systems minimize human effort and allow your team to focus on development and debugging.
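For illustration, here is a minimal sketch of such an automated acquisition pipeline as an Apache Airflow DAG. The source systems and the `fetch_*` / `merge_sources` helpers are hypothetical placeholders; the point is that parallel fetch tasks, scheduling, and retries are declared once and Airflow takes care of execution and failure handling.

```python
# A minimal Apache Airflow DAG that pulls data from several sources in
# parallel and merges them downstream. The fetch_* and merge_sources
# functions are hypothetical placeholders for your own ingestion logic.
from datetime import datetime, timedelta

from airflow import DAG
from airflow.operators.python import PythonOperator


def fetch_user_applications():
    # e.g., pull recent loan applications from an internal API
    ...


def fetch_market_updates():
    # e.g., download the latest financial market data
    ...


def merge_sources():
    # e.g., join the raw extracts into one training-ready dataset
    ...


with DAG(
    dag_id="loan_data_acquisition",
    start_date=datetime(2022, 1, 1),
    schedule_interval="@hourly",  # run the pipeline every hour
    catchup=False,
    default_args={"retries": 2, "retry_delay": timedelta(minutes=5)},
) as dag:
    fetch_users = PythonOperator(
        task_id="fetch_user_applications", python_callable=fetch_user_applications
    )
    fetch_market = PythonOperator(
        task_id="fetch_market_updates", python_callable=fetch_market_updates
    )
    merge = PythonOperator(task_id="merge_sources", python_callable=merge_sources)

    # The two fetch tasks run in parallel; merging waits for both, and
    # Airflow retries failed tasks automatically per default_args.
    [fetch_users, fetch_market] >> merge
```

With a setup like this, adding a new data source is a matter of declaring one more task rather than rewriting the ingestion script.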
Scalability issues during data preprocessing
A scalable data preprocessing stage must guard against the computational costs incurred by feature engineering. The selection of the best features, or the creation of new ones, significantly impacts model performance. The profile of a person using your system to check loan credibility will be a broad dataset. It would contain all the information that the system could capture at the time of collection, whether relevant to your model or not. You certainly do not want a person's preference for a specific color to be considered a feature while training the prediction model.
How does MLOps handle scalability issues during the preprocessing stage?
- Initial exploration of the datasets and their statistical properties can help teams devise efficient feature engineering strategies. The trends can be plotted and presented in a dashboard to support the decisions and create a comprehensive source of information for other stakeholders.
- When scaling up the system to cater to big data, you could scale horizontally, increasing the number of machines to handle the big influx. The other option could be vertical scaling, where you can upgrade a smaller number of machines with more computational resources like memory and CPU/GPU cores.
- Horizontal scaling systems like Apache Hadoop facilitate the addition of nodes to improve data processing capabilities. Such tools also allow parallel processing by splitting the dataset among the nodes. This approach is the answer to the limitation on how much an existing machine can be upgraded to perform vertical scaling.
- There are also powerful distributed computation frameworks like Apache Spark that scale horizontally across a cluster while keeping intermediate data in memory for faster processing (a short PySpark example follows this list).
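To make the horizontal-scaling idea concrete, the minimal PySpark sketch below expresses feature preprocessing once and lets Spark distribute the work across however many executor nodes are available. The storage paths and column names (income, loan_amount, favorite_color) are hypothetical.

```python
# A minimal PySpark sketch: the preprocessing logic is written once and
# Spark distributes it across the cluster. Paths and columns are
# hypothetical placeholders for a loan-application dataset.
from pyspark.sql import SparkSession
from pyspark.sql import functions as F

spark = SparkSession.builder.appName("loan-preprocessing").getOrCreate()

applicants = spark.read.parquet("s3://my-bucket/raw/loan_applications/")

features = (
    applicants
    .drop("favorite_color")                     # drop columns irrelevant to credit risk
    .withColumn("debt_to_income",               # derive a new feature at scale
                F.col("loan_amount") / F.col("income"))
    .na.drop(subset=["income", "loan_amount"])  # discard rows missing key fields
)

# Writing back in a columnar format keeps the data partitioned for
# downstream distributed training jobs.
features.write.mode("overwrite").parquet("s3://my-bucket/features/loan_applications/")
```

The same code runs unchanged whether the cluster has two nodes or two hundred, which is exactly the scalability property this stage needs.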
Scalability issues during modeling
The model development phase is the time when the design and concepts begin to take shape, and it goes through an exhaustive cycle of improvement. Experiment tracking records items such as hyperparameters, trained models, development environment details, and the versions of the training and validation datasets. Tracking and logging these outputs helps the team catch potential bugs early.
How does MLOps handle scalability issues during the modeling stage?
The following MLOps tools and practices will help your team address potential pain points during experiment tracking (a minimal tracking sketch follows this list):
- ML artifacts such as the developed models, data, and environment-specific configuration can be tracked and shared among teams using a suitable MLOps strategy.
- MLOps tools can package a model into formats compatible with different serving platforms such as Azure ML, AWS SageMaker, and Apache Spark, making it portable across environments.
- Assuming that you decided to implement the loan credibility calculator using neural networks and harness the power of GPUs, the system should be scalable enough to work in multi-GPU environments. MLOps tools can scale up deep learning toolkits like PyTorch, Keras, TensorFlow, and Apache MXNet to multi-GPU parallel processing environments.
- ML development is driven by the data and needs more than the vanilla software versioning process. Dedicated tools like DVC for data version control have much more to offer than dev-oriented platforms.
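As a minimal illustration of experiment tracking, the sketch below uses MLflow, one widely used open-source tracker (an assumed choice, not one prescribed by this post), to log hyperparameters, a dataset version tag, a validation metric, and the trained model for every run. The synthetic dataset and the version tag stand in for real loan data.

```python
# A minimal experiment-tracking sketch with MLflow (one possible tool):
# each run records its hyperparameters, dataset version, metrics, and the
# trained model so results remain comparable and reproducible.
import mlflow
import mlflow.sklearn
from sklearn.datasets import make_classification
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split

# Synthetic stand-in for the loan-credibility training data.
X, y = make_classification(n_samples=2000, n_features=20, random_state=0)
X_train, X_valid, y_train, y_valid = train_test_split(X, y, test_size=0.25, random_state=0)

mlflow.set_experiment("loan-credibility")

params = {"n_estimators": 200, "learning_rate": 0.05, "max_depth": 3}

with mlflow.start_run():
    mlflow.log_params(params)
    mlflow.set_tag("training_data_version", "v2.3")  # hypothetical dataset tag

    model = GradientBoostingClassifier(**params).fit(X_train, y_train)

    auc = roc_auc_score(y_valid, model.predict_proba(X_valid)[:, 1])
    mlflow.log_metric("valid_auc", auc)

    # Store the trained artifact so it can be pulled into serving later.
    mlflow.sklearn.log_model(model, "model")
```

Once every run is logged this way, comparing hyperparameter choices or reproducing last month's model becomes a lookup rather than an archaeology exercise.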
Scalability issues during integration and deployment
The ML artifacts ready to be shipped to the target environment undergo extensive tests and versioning. Your team could integrate enhancements whenever needed or import trained models from the shared repository. The integration of different artifacts must not downgrade the model performance and should move the product down the development assembly line.
The concept of Continuous Integration and Continuous Delivery (CI/CD) was borrowed from DevOps and further enhanced to suit the needs of ML development. You may read the in-depth comparison to better understand the differences between MLOps and DevOps.
How does MLOps handle scalability issues during the integration and deployment stage?
- CI/CD processes in MLOps let teams keep the ML service running stably while their changes are deployed quickly. The intent is to focus on implementation instead of locating and debugging the code additions that caused unintentional bugs.
- A truly scalable ML system can be achieved by bridging data science requirements with DevOps practices to produce scalable API endpoints. Further scalability comes from auto-scaling the APIs according to the load dynamics of the production environment.
- Deployment strategies like canary and A/B rollouts let upgrades reach users gradually, with no downtime between versions.
- MLOps tools address the pain points of integrating models for different workloads and cloud infrastructure and provide methods to automate and monitor deployments.
- Seldon Core is a solution that has found wide acceptance due to its framework-agnostic working. Being built on Kubernetes, a model packaged by Seldon can be deployed to any cloud or on-premise service. It provides fast and reliable deployment of models containerized for custom servers and language wrappers (as sketched below), scaling up to thousands of production models.
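For a flavor of how this looks in practice, here is a minimal sketch of the kind of Python class that Seldon Core's Python language wrapper can containerize and expose as a REST/gRPC endpoint. The artifact path is a hypothetical placeholder, and replica counts or autoscaling behavior would be declared separately in the Kubernetes deployment resource.

```python
# A minimal sketch of a model class for Seldon Core's Python language
# wrapper. Seldon containerizes this class and serves it behind a scalable
# endpoint; the artifact path below is a hypothetical placeholder.
import joblib


class LoanCredibilityModel:
    def __init__(self):
        # Load the trained artifact once, when the serving container starts.
        self.model = joblib.load("model/loan_credibility.joblib")

    def predict(self, X, features_names=None):
        # Called for every incoming request; return class probabilities.
        return self.model.predict_proba(X)
```

Because the serving logic lives in one small class, scaling it to more traffic is a matter of running more replicas of the same container rather than changing the model code.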
Scalability issues during performance evaluation
The scaling of ML workflows is not limited to handling large datasets. Imagine that a popular bank adopted the loan credibility calculator. Suddenly your ML model caters to millions of applicants who wish to gauge their credibility. In addition, suppose the bank asked you to predict the probability of defaulting in the next twelve months. Now you need to make far more predictions each day, with volumes that may reach thousands per second.
You have already taken care of the scalability of data processing, model development, and deployment. Now the predictions themselves need to scale to keep up with this demand.
How does MLOps handle scalability issues in production?
- The model working in production will encounter data different from what it was trained on. For instance, a loan credibility scoring model trained before the Covid-19 pandemic was prepared for a different economic situation than the crisis that followed. This is an extreme case that can result in data and concept drift. Not all changes are sudden and intense, but monitoring the live model ensures that your team is not caught off-guard (a minimal drift check is sketched after this list).
- MLOps tools that monitor model functioning in terms of performance and business metrics can raise alerts on observing any discrepancies.
- The team can reason about the predictions better with the help of interpretability and enhance the model by using explainability metrics.
- A model in production should be periodically checked, and monitoring can make it easier through scheduled checks and custom configuration options.
- The Censius AI Observability Platform can be a lifesaver with tailor-made monitoring facilities and a rich user interface for truly scalable projects. The automated monitoring of the whole lifecycle can not only alert your team in case of an anomaly but allow for the refinement of the process altogether.
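As a simple illustration of the kind of check such monitoring automates, the sketch below compares the live distribution of a feature against its training distribution with a two-sample Kolmogorov-Smirnov test and raises an alert when they diverge. The threshold and the `notify_team` helper are hypothetical; a dedicated monitoring platform schedules and manages checks like this across all features and models for you.

```python
# A minimal data-drift check: compare a feature's live distribution against
# its training distribution with a two-sample Kolmogorov-Smirnov test.
# The alert threshold and notify_team() are hypothetical placeholders.
import numpy as np
from scipy.stats import ks_2samp


def notify_team(message: str) -> None:
    # Placeholder for your alerting channel (email, Slack, pager, ...).
    print(message)


def check_drift(training_values: np.ndarray, live_values: np.ndarray,
                p_threshold: float = 0.01) -> bool:
    """Return True (and alert) when the live feature distribution has drifted."""
    statistic, p_value = ks_2samp(training_values, live_values)
    if p_value < p_threshold:
        notify_team(f"Drift detected: KS={statistic:.3f}, p={p_value:.4f}")
        return True
    return False


# Example: incomes seen in production skew higher than in the training data.
rng = np.random.default_rng(42)
training_income = rng.normal(50_000, 12_000, size=5_000)
live_income = rng.normal(58_000, 15_000, size=1_000)
check_drift(training_income, live_income)
```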
Explore the Censius AI Monitoring Features
MLOps Best Practices
In this post, we introduced the potential bottlenecks that might slow down the scaling of an ML project. We also showed how MLOps can solve such problems and which community-curated MLOps tools can best address your concerns. We will wrap up this blog by listing a few best practices for MLOps:
- Selection of tools that suit the organizational needs. There are numerous options, both open-source and enterprise. They may require different technical expertise within the team or concentrate on either one lifecycle stage or the whole workflow. The MLOps tools collection has some resources curated for you.
- Automate wherever possible. Manual setup of ML and data pipelines can be cumbersome, especially if the scalability needs are considered. Selecting tools and strategies that automatically consume outputs of one stage and move the product along the MLOps assembly line can lessen the costs incurred by human error and delays.
- Versioning of the data is as important as the code. The power of ML is in the data used to train and test it; therefore, dedicated tools for its management are a must. For scalable projects, a data lake environment can be quite beneficial for assessment needs.
- Quality checks and monitoring. Code checks and monitoring of the different stages of ML development can detect issues in time and direct the team towards possible enhancements.
- Documentation is important. Well-documented quality control for processes involved in MLOps can help with contingencies. Additionally, logging issues and metrics signifies a mature ML development process.
Thank you for reading
Explore how Censius helps you monitor, analyze and explain your ML models
Explore Platform