What is Cortex?
Cortex is an open-source platform that simplifies ML model deployment, combining model serving, model monitoring, CI/CD, and observability integrations. It handles cluster provisioning, log management, and metrics tracking through pre-built utilities such as Grafana dashboards and CloudWatch integration.
Cortex builds on AWS services such as Elastic Kubernetes Service (EKS), Lambda, and Fargate, and on open-source projects such as Kubernetes, Docker, TensorFlow Serving, and TorchServe. It manages ML models on EKS to scale workloads and integrates with IAM for authentication.
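To make the workflow concrete, here is a minimal sketch of deploying a model as a realtime API through Cortex's Python client. The client calls (cortex.client, deploy) and the spec fields below follow one older Cortex release and are assumptions; check the documentation for your version.

```python
import cortex

# Connect to an existing Cortex environment (the name "aws" is a
# placeholder); the cluster itself is provisioned separately with
# the Cortex CLI.
cx = cortex.client("aws")

# A RealtimeAPI spec; these field names follow an older Cortex
# release and may differ in yours.
api_spec = {
    "name": "iris-classifier",
    "kind": "RealtimeAPI",
    "predictor": {
        "type": "python",
        "path": "predictor.py",  # module that defines the predictor class
    },
    "compute": {"cpu": 1, "mem": "512Mi"},
}

cx.deploy(api_spec, project_dir=".")
```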
How Does Cortex Help?
Cortex stands out for autoscaling driven by real-time request volumes and queue length. As a multi-framework tool, it scales APIs automatically to handle ever-expanding production workloads.
Cortex deploys multiple models behind a single API load balancer and supports updating models in place without any downtime.
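In practice, updating a model is typically just a redeploy: pushing a new spec under the same API name is assumed to trigger a rolling update, so the endpoint stays up throughout. A hedged sketch, reusing the hypothetical client and spec from the earlier example:

```python
# Re-deploying under the same name ("iris-classifier") rolls out new
# replicas and drains the old ones only once the new ones are ready,
# so the endpoint never goes down. predictor_v2.py is hypothetical.
api_spec["predictor"]["path"] = "predictor_v2.py"
cx.deploy(api_spec, project_dir=".")
```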
Cortex ships with Prometheus for metrics collection and Grafana for visualization. The bundled Grafana dashboards monitor your APIs out of the box, and you can also define custom metrics and dashboards.
Cortex also supports alerting on API performance: the RealtimeAPI, BatchAPI, cluster-resource, and node-resource dashboards can each be paired with alert configurations.
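Because metrics flow through Prometheus, a custom metric can be anything Prometheus can scrape. The sketch below uses the generic prometheus_client library rather than any Cortex-specific API, and the metric name and port are illustrative:

```python
import random
import time

from prometheus_client import Histogram, start_http_server

# Track prediction latency as a Prometheus histogram; Grafana can
# chart this and alert on, for example, its 99th percentile.
PREDICT_LATENCY = Histogram(
    "predict_latency_seconds", "Time spent handling a prediction"
)

@PREDICT_LATENCY.time()
def predict(payload):
    time.sleep(random.uniform(0.01, 0.05))  # stand-in for model inference
    return {"label": "positive"}

if __name__ == "__main__":
    start_http_server(8001)  # expose /metrics for Prometheus to scrape
    while True:
        predict({"text": "example"})
```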
Key Features of Cortex
Autoscaling APIs
Cortex autoscales APIs to handle growing production workloads and traffic fluctuations, while monitoring API performance and prediction outcomes.
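The autoscaling knobs live in the API spec. Below is a hedged sketch of that section as a Python dict; the field names (min_replicas, max_replicas, target_in_flight, downscale_stabilization_period) follow one Cortex release and are assumptions:

```python
# Autoscaling section of a hypothetical API spec: Cortex adds or
# removes replicas so that in-flight requests per replica stay near
# the target.
autoscaling = {
    "min_replicas": 1,
    "max_replicas": 10,
    "target_in_flight": 8,  # desired concurrent requests per replica
    "downscale_stabilization_period": "5m",  # avoid thrashing on brief dips
}
api_spec["autoscaling"] = autoscaling  # api_spec from the earlier sketch
```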
Automated Cluster Management
Cortex autoscales the cluster and runs workloads on spot instances with automated on-demand backup, falling back to on-demand instances when spot capacity is reclaimed. It also supports creating multiple clusters with different configurations.
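Cluster-level settings are declared in a cluster configuration (in Cortex itself this is a YAML file; it is rendered here as a Python dict for consistency, and the field names follow one release and are assumptions):

```python
# Hypothetical cluster configuration: a spot node group for cheap
# capacity plus an on-demand group as the fallback.
cluster_config = {
    "cluster_name": "ml-cluster",
    "region": "us-east-1",
    "node_groups": [
        {
            "name": "spot-workers",
            "instance_type": "m5.large",
            "min_instances": 0,
            "max_instances": 20,
            "spot": True,  # cheaper, but instances can be reclaimed
        },
        {
            "name": "on-demand-backup",
            "instance_type": "m5.large",
            "min_instances": 1,
            "max_instances": 5,
            "spot": False,  # always-available backup capacity
        },
    ],
}
```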
Realtime APIs
Realtime APIs respond to requests in real time and autoscale based on in-flight request volumes. They also support A/B tests, canary deployments, rolling updates, and synchronous responses.
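Canary deployments and A/B tests boil down to weighted traffic splitting between two versions of an API. A hedged sketch; the TrafficSplitter kind and its fields follow one Cortex release and are assumptions:

```python
# Route 90% of traffic to the current model and 10% to the canary;
# shift the weights gradually as the canary proves itself.
traffic_splitter = {
    "name": "iris-classifier",
    "kind": "TrafficSplitter",
    "apis": [
        {"name": "iris-classifier-v1", "weight": 90},
        {"name": "iris-classifier-v2", "weight": 10},
    ],
}
```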
Async APIs
Async APIs handle asynchronous, longer-running workloads and autoscale based on request queue length.
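An async API answers immediately with a job ID and delivers the result later. A hedged client-side sketch using the requests library; the submit-then-poll shape matches the async workflow described above, but the endpoint URL and the response fields (id, status, result) are assumptions:

```python
import time

import requests

ENDPOINT = "https://example.com/my-async-api"  # hypothetical endpoint

# Submit the work; the async API queues it and returns right away.
job_id = requests.post(ENDPOINT, json={"text": "a long document"}).json()["id"]

# Poll until a worker has drained the request from the queue.
while True:
    response = requests.get(f"{ENDPOINT}/{job_id}").json()
    if response["status"] == "completed":
        print(response["result"])
        break
    time.sleep(2)
```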
Batch APIs
Batch APIs run fault-tolerant, distributed batch processing jobs on demand. They are a good fit when you want to split a large workload into batches and fan them out across a dedicated pool of workers.
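A batch job is submitted with the items to process and the number of workers to fan them out across. A hedged sketch; the endpoint URL and request fields (workers, item_list, batch_size) follow one Cortex release and are assumptions:

```python
import requests

BATCH_ENDPOINT = "https://example.com/my-batch-api"  # hypothetical endpoint

# Split 1,000 items into batches of 100 and process them with 4
# parallel workers; failed batches can be retried on other workers.
job = requests.post(
    BATCH_ENDPOINT,
    json={
        "workers": 4,
        "item_list": {
            "items": [{"id": i} for i in range(1000)],
            "batch_size": 100,
        },
    },
).json()
print(job["job_id"])  # assumed field; use it to check job status later
```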
Microservices
Cortex lets you implement compute-intensive microservices without worrying about resource limits or timeouts. It is also a capable MLOps alternative, with services well suited to retraining and evaluating models.
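In Cortex, such a microservice is just a handler class; the platform supplies the containers, scaling, and request handling around it. A minimal sketch following the Python predictor interface of older Cortex releases (newer versions use a different handler shape), with a stand-in linear model:

```python
import torch

class PythonPredictor:
    def __init__(self, config):
        # Runs once per replica at startup: load the model here so
        # individual requests do not pay the loading cost.
        self.model = torch.nn.Linear(4, 2)  # stand-in for a real model
        self.model.eval()

    def predict(self, payload):
        # Runs per request; payload is the parsed JSON request body.
        features = torch.tensor(payload["features"], dtype=torch.float32)
        with torch.no_grad():
            scores = self.model(features)
        return {"class": int(scores.argmax())}
```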