AI artifacts are the outputs generated by the training process. They are produced at various stages and used across the ML lifecycle.
What are AI Artifacts?
The term artifact denotes an output generated by the training process. An artifact can be a fully trained model, a model checkpoint, or a file created during training.
AI artifacts are generated at various stages and used across the ML project lifecycle. They can change over the course of a project, and you may need to work with multiple versions of the same artifact during ML development.
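One simple way to tell versions of the same artifact apart is to derive an identifier from the file's content. The following is a minimal sketch, not a prescribed method: the function name `artifact_version` and the 12-character truncation are assumptions made for illustration.

```python
import hashlib
from pathlib import Path


def artifact_version(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return a short, content-derived version id for a file artifact.

    Hypothetical helper: two files with identical bytes get the same id,
    and any change to the content yields a different id.
    """
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read in chunks so large checkpoints do not need to fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    # Truncate for readability; keep the full digest if collisions matter.
    return digest.hexdigest()[:12]
```

Calling `artifact_version(Path("model.ckpt"))` before and after retraining would then give two different ids if the checkpoint changed, which makes it possible to refer to each version unambiguously.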
Artifacts Generated Across the ML Lifecycle
ML development involves four main stages: requirements, data, modelling, and operations. The artifacts produced in each stage include:
- Requirements stage: Model requirements analysis
- Data stage: Datasets, labels and annotations, feature sets, data processing source code, logs, and environment dependencies
- Modelling stage: Results from the data stage, metadata such as parameters, hyperparameters, and captured metrics, model processing source code, logs, and environment dependencies
- Operations stage: Trained models and their dependencies such as libraries and runtimes, execution logs and statistics, and metadata artifacts such as model parameters, hyperparameters, lineage traces, and performance metrics
Why Is AI Artifact Management Important?
ML artifact management is essential for achieving comparability, traceability, and reproducibility of model and data artifacts across all lifecycle steps and iterations. Capturing the input and output artifacts of each lifecycle step and iteration is what lets ML practitioners compare runs, trace results back to their sources, and reproduce them.
Software-related artifacts such as code, configurations, and environment dependencies support reproducibility. Metadata artifacts such as model parameters, hyperparameters, quality metrics, and execution statistics enable comparability between ML runs.
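To make the idea of capturing inputs, outputs, and metadata per run more concrete, here is a minimal sketch using only the Python standard library. The names `RunRecord`, `fingerprint`, `capture_environment`, `changed_params`, and `to_json` are hypothetical and do not come from any particular artifact management tool.

```python
import hashlib
import json
import platform
import sys
from dataclasses import asdict, dataclass, field


@dataclass
class RunRecord:
    """Hypothetical record of one lifecycle step or iteration."""
    run_id: str
    inputs: dict = field(default_factory=dict)       # artifact name -> content hash
    outputs: dict = field(default_factory=dict)      # artifact name -> content hash
    params: dict = field(default_factory=dict)       # parameters and hyperparameters
    metrics: dict = field(default_factory=dict)      # quality metrics, execution stats
    environment: dict = field(default_factory=dict)  # software-related context


def fingerprint(data: bytes) -> str:
    """Content hash used to trace exactly which artifact a run consumed or produced."""
    return hashlib.sha256(data).hexdigest()


def capture_environment() -> dict:
    """Record a small part of the environment (an assumption: real setups capture more)."""
    return {"python": sys.version.split()[0], "platform": platform.platform()}


def changed_params(a: RunRecord, b: RunRecord) -> dict:
    """Return the parameters that differ between two runs as (a_value, b_value) pairs."""
    keys = a.params.keys() | b.params.keys()
    return {
        k: (a.params.get(k), b.params.get(k))
        for k in keys
        if a.params.get(k) != b.params.get(k)
    }


def to_json(record: RunRecord) -> str:
    """Serialize a record deterministically so it can be stored next to the artifacts."""
    return json.dumps(asdict(record), sort_keys=True, indent=2)
```

In this sketch, the hashes in `inputs` and `outputs` serve traceability, the `environment` and `params` fields serve reproducibility, and `changed_params` illustrates comparability by showing what differs between two runs.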
How to Manage AI Artifacts?
One option is to manage ML artifacts manually, but this is inefficient because of the complexity and time involved.
The recommended approach is to use ML artifact management tools. ML artifact management covers the methods and tools for managing artifacts produced and used in ML development, deployment, and operations.
Define your own criteria for selecting an artifact management system. Aspects to consider include the ML lifecycle stages supported, the types of artifacts supported (data, model, metadata, software, etc.), the operations supported (logging, versioning, exploration, collaboration, management), storage types, integrations, cloud availability, and licences.
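The selection criteria above can be written down as an explicit checklist and applied to candidate tools. This is only an illustrative sketch: `ToolProfile` and `meets_requirements` are hypothetical names, only a few of the criteria are modelled, and the tool capabilities would have to be filled in from each tool's own documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ToolProfile:
    """Hypothetical description of a candidate artifact management tool."""
    name: str
    artifact_types: frozenset  # e.g. {"data", "model", "metadata", "software"}
    operations: frozenset      # e.g. {"logging", "versioning", "exploration"}
    open_source: bool


def meets_requirements(tool, required_types, required_ops, need_open_source=False):
    """Return True if the tool covers every required artifact type and operation."""
    if need_open_source and not tool.open_source:
        return False
    # Subset checks: every requirement must appear in the tool's capabilities.
    return set(required_types) <= tool.artifact_types and set(required_ops) <= tool.operations
```

For example, a team that needs model and metadata artifacts with logging and versioning would call `meets_requirements(tool, {"model", "metadata"}, {"logging", "versioning"})` for each candidate and keep only the tools for which it returns `True`.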
Several platforms and tools are available for artifact management.
If you want to set up an end-to-end artifact management system, one option is to buy an ML platform, especially if your focus is on solving business problems rather than building and maintaining pipelines. Companies like Attri, ZenML, and DataRobot provide ML platforms that allow you to ingest your data and generate results, so that artifact management is automated for you.