Model Serving
What is Cortex?

Cortex is an open-source platform that simplifies ML model deployment with model serving, model monitoring, CI/CD, and observability integrations. It supports cluster provisioning, log management, and metrics tracking through pre-built utilities such as Grafana dashboards and CloudWatch integration.

Cortex builds on top of AWS services such as Elastic Kubernetes Service (EKS), Lambda, and Fargate, as well as projects such as Kubernetes, Docker, TensorFlow Serving, and TorchServe. It manages ML models on EKS to scale workloads and integrates with IAM for authentication.

How Does Cortex Help?

Cortex stands out for its autoscaling, which responds to real-time request volume and queue length. The multi-framework tool scales APIs automatically to handle ever-expanding production workloads.

Cortex deploys multiple models behind a single API load balancer and supports updating models without downtime.
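To illustrate serving multiple models behind one endpoint, here is a minimal sketch in the style of a Cortex Python handler class. The class shape (a constructor taking a config, plus a request method) follows Cortex's handler convention, but the class name, the method name, and the stub "models" are assumptions for this example, not Cortex's actual API.

```python
# Hypothetical multi-model handler sketch; in a real Cortex deployment,
# the config would point at model artifacts loaded from storage.

class MultiModelHandler:
    def __init__(self, config):
        # Register two stand-in "models" keyed by name; real code would
        # load TensorFlow/PyTorch models here instead of lambdas.
        self.models = {
            "sentiment": lambda text: "positive" if "good" in text else "negative",
            "length": lambda text: len(text),
        }

    def handle_post(self, payload):
        # Route the request to the model named in the payload.
        model = self.models[payload["model"]]
        return {"prediction": model(payload["text"])}
```

A request like `{"model": "length", "text": "hi"}` is dispatched to the `length` stub and returns `{"prediction": 2}`, showing how one API surface can front several models.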

Cortex provides Prometheus for metrics collection and Grafana for visualization. You can monitor APIs with the default Grafana dashboards or define custom metrics and dashboards.

Cortex also supports alerting on API performance: it ships with dashboards for Realtime APIs, Batch APIs, cluster resources, and node resources, each of which can be paired with alert configurations.

Key Features of Cortex

Autoscaling APIs

Cortex autoscales APIs to manage growing production workloads and traffic fluctuations, while monitoring API performance and prediction outcomes.
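The scaling behavior described above can be sketched as a simple calculation: desired replica count is derived from the number of in-flight requests divided by a per-replica target, clamped to configured bounds. The field names in this config dict (`min_replicas`, `max_replicas`, `target_in_flight`) and the exact formula are assumptions for illustration, loosely modeled on how Cortex documents its autoscaler.

```python
import math

# Illustrative autoscaling config; field names are assumptions.
autoscaling = {"min_replicas": 1, "max_replicas": 10, "target_in_flight": 4}

def desired_replicas(in_flight_requests, cfg):
    # Scale replicas so each handles roughly `target_in_flight` requests,
    # never dropping below min_replicas or exceeding max_replicas.
    raw = math.ceil(in_flight_requests / cfg["target_in_flight"])
    return max(cfg["min_replicas"], min(cfg["max_replicas"], raw))
```

With this config, 25 in-flight requests yield 7 replicas, while a quiet period scales down to the 1-replica floor and a traffic spike is capped at 10.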

Automated Cluster Management

Cortex enables cluster autoscaling and runs workloads on spot instances with automated fallback to on-demand instances. It supports creating multiple clusters with different configurations to match each workload.

Realtime APIs

Realtime APIs respond to requests synchronously and autoscale based on in-flight request volume. They also support A/B tests, canary deployments, and rolling updates.
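A canary deployment or A/B test boils down to weighted routing between API versions. The sketch below shows one way such a traffic split can work; the API names and weights are illustrative, not Cortex's actual internals.

```python
import random

# Hypothetical weighted split: 90% of traffic to v1, 10% to the canary.
apis = [("model-v1", 90), ("model-v2", 10)]

def pick_api(apis, rng=random.random):
    # Sample a point in [0, total weight) and walk the weight intervals
    # until the point falls inside one; that interval's API wins.
    total = sum(weight for _, weight in apis)
    point = rng() * total
    for name, weight in apis:
        point -= weight
        if point < 0:
            return name
    return apis[-1][0]
```

Shifting the weights gradually from `(90, 10)` toward `(0, 100)` is what makes a rolling canary promotion possible without downtime.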

Async APIs

Async APIs handle longer-running workloads asynchronously and autoscale based on request queue length.
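The client-facing pattern for an async workload is submit-then-poll: a request is enqueued and assigned an id, and the result is fetched later by id. This in-memory sketch shows the state transition; the class and method names are assumptions, and a real deployment would run the workload on autoscaled workers rather than inline.

```python
import uuid

# Minimal in-memory sketch of the async submit/poll pattern.
class AsyncQueue:
    def __init__(self):
        self.results = {}

    def submit(self, payload):
        # Enqueue the request and hand back an id to poll with.
        request_id = str(uuid.uuid4())
        self.results[request_id] = {"status": "in_queue", "payload": payload}
        return request_id

    def process(self, request_id, workload_fn):
        # In a real system this runs on a worker; inline here to show
        # the in_queue -> completed transition.
        payload = self.results[request_id]["payload"]
        self.results[request_id] = {"status": "completed",
                                    "result": workload_fn(payload)}

    def status(self, request_id):
        return self.results[request_id]
```

Because workers only ever pull from the queue, scaling on queue length (as described above) directly controls how quickly submitted jobs reach the `completed` state.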

Batch APIs

Batch APIs execute fault-tolerant, distributed batch-processing jobs on demand. They are a good fit for users who want to divide their workloads into smaller jobs and assign them to a dedicated pool of workers.
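The core of that division step is partitioning a large input into fixed-size batches that workers can process in parallel. A minimal sketch, where the helper name and batch size are assumptions for the example:

```python
# Split a list of items into consecutive batches of at most batch_size,
# mirroring the fan-out a batch job performs before dispatching work.
def partition(items, batch_size):
    return [items[i:i + batch_size] for i in range(0, len(items), batch_size)]
```

For example, `partition(list(range(10)), 4)` produces three batches (`4 + 4 + 2` items); each batch can then be dispatched to a separate worker, and a failed batch can be retried independently, which is what makes the jobs fault-tolerant.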


Cortex makes it possible to implement compute-intensive microservices without worrying about resource limits and timeouts. The tool offers a strong MLOps alternative, with services well suited to retraining and evaluating models.
