Feature Management
A feature store is a central repository for ML features: a feature transformation is defined once, and its values are computed and stored so they can be served to models.
What Is a Feature Store?
Uber introduced the term "feature store" in 2017, in the blog post that introduced its Michelangelo platform. That post defined a feature store as a central repository for storing curated features within an organization. It is an ML-centric data system used to:
- Transform raw data into valuable features and execute data pipelines
- Store and manage features
- Serve features for training and inference
A feature store provides a complete data management layer, including a data transformation service that lets users turn raw data into stored features that any ML model can use.
For effective feature management, feature stores provide data abstractions for building, deploying, and using features across development and production environments. These abstractions let a team define a feature transformation once and compute its values consistently for both training and production models.
Why Use Feature Stores for Feature Management?
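The "define once, compute consistently" idea can be illustrated with a toy in-memory sketch. Every name here (`Feature`, `TinyFeatureStore`, `avg_order_value`, the `order_totals` field) is hypothetical and chosen only for illustration; it is not the API of any real feature store. The point is that the training path and the serving path both go through the same registered transformation.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List


@dataclass(frozen=True)
class Feature:
    """A named transformation from one raw record to one feature value."""
    name: str
    transform: Callable[[dict], float]


class TinyFeatureStore:
    """Toy in-memory store (hypothetical, for illustration only)."""

    def __init__(self) -> None:
        self._features: Dict[str, Feature] = {}
        # Online store: entity id -> {feature name: latest value}
        self._online: Dict[str, Dict[str, float]] = {}

    def register(self, feature: Feature) -> None:
        # The transformation is defined exactly once, here.
        self._features[feature.name] = feature

    def _compute(self, raw: dict) -> Dict[str, float]:
        return {name: f.transform(raw) for name, f in self._features.items()}

    def get_training_rows(self, raw_records: List[dict]) -> List[Dict[str, float]]:
        # Batch path: compute features over historical records.
        return [self._compute(r) for r in raw_records]

    def materialize(self, entity_id: str, raw: dict) -> None:
        # Serving path: precompute and store the latest values per entity.
        self._online[entity_id] = self._compute(raw)

    def get_online_features(self, entity_id: str) -> Dict[str, float]:
        return dict(self._online[entity_id])


def avg_order_value(raw: dict) -> float:
    orders = raw["order_totals"]
    return sum(orders) / len(orders) if orders else 0.0


store = TinyFeatureStore()
store.register(Feature("avg_order_value", avg_order_value))

history = [{"order_totals": [10.0, 30.0]}, {"order_totals": []}]
training_rows = store.get_training_rows(history)

store.materialize("customer_1", {"order_totals": [10.0, 30.0]})
online_row = store.get_online_features("customer_1")
```

Because both paths call the same `transform`, the same raw input yields the same feature value during training and at inference time, which is the consistency the abstraction is meant to provide.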
Using a feature store is not just a recommended MLOps practice; it also brings economies of scale to ML projects. Once a feature is registered in the feature store, it becomes immediately available for reuse by other models across the enterprise. This reusability saves significant data engineering effort, because new ML projects can start from a ready-to-use library of curated features.
Feature stores enable effective feature management by allowing:
- Adding new features without heavy engineering effort
- Sharing and reusing features across multiple ML projects
- Automating feature computation, backfilling, and logging
- Monitoring feature pipelines to ensure consistency between training and serving data
- Tracking feature metadata, versions, and lineage
Feature Stores to Complete Your ML Stack
Feature stores become critical ML infrastructure once ML projects involve deploying models at scale. Practitioners often cite encapsulating the logic of feature transformations as the key benefit of a feature store. The following feature stores are common options for completing your ML stack.
Feast: An open-source feature store that works well as a storage and serving layer for using features in production. It is best suited to teams that already have transformation pipelines in place to compute their features.
Tecton: A managed feature store offered as a service. Tecton is a separate commercial product rather than a layer on top of Feast, although its team has been a major contributor to the Feast project. It supports feature transformations and manages feature pipelines end to end. It is a good choice if you want a web UI, advanced collaboration, and managed batch, streaming, and real-time transformations.
Hopsworks: An enterprise-grade feature store that stores features, manages transformations, and serves features to both production and training models. It has a web UI for exploring features. It supports a range of infrastructure (Azure, AWS, Kubernetes) and data sources (Snowflake, Redshift, HDFS). Hopsworks comes in free and paid options.