CI/CD for Machine Learning
CI/CD of ML pipelines enables teams to build source code, run tests, and deploy automated pipelines for continuous delivery and training.
What is a CI/CD Pipeline in ML?
CI stands for Continuous Integration, and CD stands for Continuous Delivery. Continuous integration lets team members work in parallel and merge changes to code, data, and features into a central repository several times a day.
Continuous delivery automates the deployment of ML pipelines and their components. This automation replaces manual, multi-stage tasks such as provisioning and deployment.
Applying CI/CD practices in DevOps is comparatively straightforward with a four-stage CI/CD pipeline: code, build, test, and deploy. However, applying CI/CD practices to the machine learning lifecycle poses unique challenges for MLOps practitioners.
CI/CD For Machine Learning - Challenges
CI/CD implementation as part of MLOps practices needs to address these challenges:
Achieving reproducibility
Evaluating ML experiments to determine the best model and parameter configuration is difficult. Machine learning is experimental by nature, so it is hard to guarantee that rerunning the same code with the same data and configuration reproduces the same results.
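One common way to make an experiment repeatable is to pin every source of randomness to a seed stored in the configuration, and to fingerprint both the configuration and the result. The sketch below illustrates the idea with the standard library only. The function run_experiment is a hypothetical stand-in for a real training run, and the configuration keys are assumptions made for illustration.

```python
import hashlib
import json
import random


def run_experiment(config: dict) -> dict:
    """Hypothetical stand-in for a training run: the 'score' is seeded noise."""
    rng = random.Random(config["seed"])  # local RNG so global state cannot leak in
    # Simulated validation score that depends only on the config and the seed.
    score = config["learning_rate"] * 10 + rng.uniform(-0.05, 0.05)
    return {"score": round(score, 6)}


def fingerprint(obj: dict) -> str:
    """Stable hash of a dict, usable as an experiment or result identifier."""
    payload = json.dumps(obj, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


config = {"seed": 42, "learning_rate": 0.01, "epochs": 5}
first = run_experiment(config)
second = run_experiment(config)

# Same code + same config + same seed should give identical results.
reproducible = fingerprint(first) == fingerprint(second)
print("config id:", fingerprint(config)[:12])
print("reproducible:", reproducible)
```

A CI job could run a check like this on a small sample and fail the build when the two fingerprints differ. Real training code usually has further sources of nondeterminism (library versions, hardware, data ordering) that a seed alone does not control.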
ML testing complexities
Compared with CI/CD for conventional software systems, ML systems are harder to test. In addition to the usual unit and integration tests, the data and the models themselves must be tested.
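As a rough illustration of what testing data and models can mean in practice, the sketch below checks a batch of rows for missing or out-of-range values and checks a model against a minimum accuracy on held-out examples. The field names, the toy model, and the 0.7 accuracy bar are all assumptions chosen for the example, not recommended values.

```python
def validate_rows(rows, required_fields, ranges):
    """Return a list of data problems (an empty list means the data passed)."""
    problems = []
    for i, row in enumerate(rows):
        for field in required_fields:
            if row.get(field) is None:
                problems.append(f"row {i}: missing {field}")
        for field, (low, high) in ranges.items():
            value = row.get(field)
            if value is not None and not (low <= value <= high):
                problems.append(f"row {i}: {field}={value} outside [{low}, {high}]")
    return problems


def accuracy(model, examples):
    """Fraction of (features, label) pairs the model predicts correctly."""
    correct = sum(1 for features, label in examples if model(features) == label)
    return correct / len(examples)


def threshold_model(features):
    """Hypothetical toy model: predicts 1 when age is at least 30."""
    return int(features["age"] >= 30)


# Data test: the second row is deliberately broken.
rows = [{"age": 34, "income": 52000}, {"age": -3, "income": None}]
problems = validate_rows(rows, ["age", "income"], {"age": (0, 120)})

# Model test: compare held-out accuracy with an assumed quality bar.
holdout = [({"age": 45}, 1), ({"age": 22}, 0), ({"age": 31}, 1), ({"age": 29}, 1)]
model_ok = accuracy(threshold_model, holdout) >= 0.7

print(problems)
print("model passes quality bar:", model_ok)
```

In a CI job, a non-empty problems list or a failed quality bar would typically fail the build in the same way a failing unit test does.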
Deployment of multi-step workflows
Putting an ML system into production means deploying a multi-step pipeline together with other cascading services, not just a single artifact. New models must be trained and validated automatically before they are deployed, which adds complexity to the CD process.
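The sketch below shows the shape of such a workflow in miniature: a train step, a validation step, and a deploy step that only runs when validation passes. The "model" (a mean of the targets), the error threshold, and all function names are hypothetical and chosen only to keep the example self-contained.

```python
def train(data):
    """Hypothetical training step: the 'model' is just the mean of the targets."""
    targets = [y for _, y in data]
    return {"prediction": sum(targets) / len(targets)}


def validate(model, data, max_error):
    """Validation step: mean absolute error must stay under an assumed threshold."""
    errors = [abs(model["prediction"] - y) for _, y in data]
    mae = sum(errors) / len(errors)
    return {"mae": mae, "passed": mae <= max_error}


def run_pipeline(train_data, validation_data, max_error=1.0):
    """Run train -> validate -> deploy, stopping before deploy if validation fails."""
    model = train(train_data)
    report = validate(model, validation_data, max_error)
    if not report["passed"]:
        return {"deployed": False, "report": report}
    # A real pipeline would push the model to a registry or serving system here.
    return {"deployed": True, "report": report}


train_data = [(1, 2.0), (2, 3.0), (3, 4.0)]
good = run_pipeline(train_data, [(4, 3.5), (5, 2.5)])
bad = run_pipeline(train_data, [(4, 30.0), (5, 40.0)])
print(good["deployed"], bad["deployed"])
```

The point of the structure is that deployment is a consequence of the pipeline run rather than a separate manual action, so every deployed model has passed the same validation step.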
CI/CD Implementation For Machine Learning
A CI/CD implementation for ML pipelines covers these two concepts:
- Continuous integration to build source code and run various tests
- Continuous delivery to deploy artifacts produced in the CI stage
Continuous integration
In the continuous integration stage, the ML pipeline is built, tested, and packaged whenever new changes are committed or pushed to the source code repository. The CI process also involves the following tests:
- Unit testing for feature engineering logic and methods implemented
- Data and model tests
- Testing to confirm that each component produces the expected artifacts
- Integration testing
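Two of the test types listed above can be sketched with Python's built-in unittest module: a unit test for a piece of feature engineering logic, and a test that a pipeline component writes the artifact it is expected to produce. The functions min_max_scale and export_features, and the file name features.json, are hypothetical examples rather than part of any particular framework.

```python
import json
import os
import tempfile
import unittest


def min_max_scale(values):
    """Feature engineering step under test: scale values into [0, 1]."""
    low, high = min(values), max(values)
    if high == low:
        return [0.0 for _ in values]  # constant column: avoid division by zero
    return [(v - low) / (high - low) for v in values]


def export_features(values, out_dir):
    """Pipeline component that should produce a features.json artifact."""
    path = os.path.join(out_dir, "features.json")
    with open(path, "w", encoding="utf-8") as f:
        json.dump({"scaled": min_max_scale(values)}, f)
    return path


class FeaturePipelineTests(unittest.TestCase):
    def test_scaling_bounds(self):
        self.assertEqual(min_max_scale([10, 20, 30]), [0.0, 0.5, 1.0])

    def test_constant_column(self):
        self.assertEqual(min_max_scale([5, 5]), [0.0, 0.0])

    def test_component_writes_artifact(self):
        with tempfile.TemporaryDirectory() as tmp:
            path = export_features([1, 2, 3], tmp)
            self.assertTrue(os.path.exists(path))
            with open(path, encoding="utf-8") as f:
                self.assertEqual(json.load(f)["scaled"], [0.0, 0.5, 1.0])


# Run the suite programmatically; a CI job would fail if this is not successful.
suite = unittest.defaultTestLoader.loadTestsFromTestCase(FeaturePipelineTests)
result = unittest.TextTestRunner(verbosity=0).run(suite)
```

In a real repository these tests would normally live in their own test module and be discovered by the test runner on every push.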
Continuous delivery
The continuous delivery process involves automated pipeline deployment for continuous training and delivery of ML models. This stage involves:
- Model compatibility verification with the target infrastructure
- Testing the prediction service, including its performance on metrics such as queries per second and model latency
- Data validation for retraining or batch prediction
- Automated deployments to a test environment
- Semi-automated deployment to a pre-production environment
- Manual deployment to a production environment after successful trials in the pre-production environment
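The stages above can be sketched as a small benchmark of the prediction service followed by a promotion rule per environment. This is a minimal sketch under several assumptions: predict is a local stand-in for a real service call, the latency and throughput budgets are arbitrary, and "semi-automated" is interpreted here as an automated deployment that waits for a human approval.

```python
import time


def predict(features):
    """Hypothetical stand-in for a call to the prediction service."""
    return sum(features) / len(features)


def benchmark(fn, payload, n_requests=200):
    """Measure per-request latency and overall throughput for sequential calls."""
    latencies = []
    start = time.perf_counter()
    for _ in range(n_requests):
        t0 = time.perf_counter()
        fn(payload)
        latencies.append(time.perf_counter() - t0)
    elapsed = time.perf_counter() - start
    latencies.sort()
    p95 = latencies[int(0.95 * (len(latencies) - 1))]
    return {"p95_latency_s": p95, "qps": n_requests / elapsed}


def next_action(environment, checks_passed, approved_by=None):
    """Decide what happens in each environment, mirroring the stages listed above."""
    if not checks_passed:
        return "blocked"
    if environment == "test":
        return "deploy"  # fully automated
    if environment == "pre-production":
        return "deploy" if approved_by else "await approval"  # semi-automated
    if environment == "production":
        return "manual deploy" if approved_by else "await approval"
    raise ValueError(f"unknown environment: {environment}")


stats = benchmark(predict, [0.1, 0.2, 0.3])
# Assumed budgets: p95 latency under 100 ms and more than 10 queries per second.
checks_passed = stats["p95_latency_s"] < 0.1 and stats["qps"] > 10

for env in ("test", "pre-production", "production"):
    print(env, "->", next_action(env, checks_passed))
```

Measuring a local function only illustrates the mechanics. A realistic performance test would send requests to the deployed service over the network, often concurrently, and compare the results with budgets agreed for that service.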
Implementing CI/CD practices for ML as part of MLOps processes helps teams automatically build, test, and deploy ML pipelines and adapt readily to changes in data and the business environment.
Further Reading
MLOps: Continuous delivery and automation pipelines in machine learning