What is KFServing?
KFServing was launched in 2019 as a collaborative project by Google, IBM, Bloomberg, NVIDIA, and Seldon. It is an open-source serverless inferencing solution on Kubernetes.
KFServing facilitates production model serving use cases for common ML frameworks like XGBoost, TensorFlow, scikit-learn, PyTorch, and ONNX. The tool offers a consistent and friendly interface to deploy ML models on Kubernetes.
KFServing has since been rebranded as KServe, and its GitHub repository has been transferred to an independent KServe GitHub organization.
How does KFServing help?
KFServing provides an API for inference requests and standardizes ML operations on top of Kubernetes. With the ‘model as data’ approach, KFServing encapsulates the complexity of networking, configuration, autoscaling, health checking, and canary deployments.
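To make the inference API more concrete, here is a minimal Python sketch of how a prediction request could be assembled. It assumes the V1 data-plane convention of POSTing a JSON body with an "instances" list to /v1/models/<model-name>:predict. The host name, the model name, and the helper build_predict_request are hypothetical and used only for illustration.

```python
import json
from typing import List, Tuple


def build_predict_request(host: str, model_name: str, instances: List) -> Tuple[str, str]:
    """Return (url, json_body) for a V1-style predict call.

    Assumption: the model is exposed at /v1/models/<model_name>:predict
    and accepts a JSON body of the form {"instances": [...]}.
    """
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical host and model name, for illustration only.
url, body = build_predict_request(
    "sklearn-iris.default.example.com",
    "sklearn-iris",
    [[6.8, 2.8, 4.8, 1.4]],
)
print(url)
print(body)
```

The sketch only builds the URL and payload. Sending them, for example with an HTTP client of your choice, depends on how ingress is configured in your cluster.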
KFServing provides a simple yet complete story for production ML inference serving. It is compatible with several ML frameworks, including TensorFlow, XGBoost, scikit-learn, and ONNX.
KFServing builds on two cloud-native technologies: Knative and Istio. Knative is a Kubernetes-based platform that helps manage serverless workloads. Istio is a service mesh that runs as a sidecar container alongside each workload and enables canary rollouts, load balancing, and routing.
Key Features of KFServing
Custom Resource Definition
KFServing implements a Kubernetes Custom Resource Definition (CRD) to serve machine learning models on arbitrary frameworks. The CRD extends the Kubernetes API and helps serve ML models built with frameworks like TensorFlow, PyTorch, ONNX, and more.
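As an illustration of what such a custom resource can look like, the sketch below builds an InferenceService manifest as a plain Python dictionary and prints it as JSON. It assumes the KServe-era API group serving.kserve.io/v1beta1 (older KFServing installations used a different group), and the resource name, namespace, and storage URI are hypothetical placeholders.

```python
import json


def inference_service(name: str, framework: str, storage_uri: str,
                      namespace: str = "default") -> dict:
    """Build a minimal InferenceService manifest as a dict.

    Assumptions: apiVersion serving.kserve.io/v1beta1 and a predictor
    keyed by framework name (for example "sklearn") with a storageUri.
    """
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"predictor": {framework: {"storageUri": storage_uri}}},
    }


# Hypothetical model location, for illustration only.
manifest = inference_service(
    "sklearn-iris", "sklearn", "gs://example-bucket/models/iris"
)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied with kubectl, since Kubernetes accepts JSON manifests as well as YAML.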
KFServing abstracts away the complexity of autoscaling, health checking, networking, and server configuration. It accelerates ML deployments with features like GPU and TPU autoscaling, scale to zero, canary rollouts, and more.
A complete story
KFServing offers a complete, pluggable, yet simple story for production ML inference serving, with support for prediction, pre-processing, post-processing, and explainability.
KFServing uses the cloud-native technology Knative at its core. This Kubernetes-based platform helps manage serverless workloads and allows:
- Scaling to and from zero
- Autoscaling of GPUs and TPUs to reduce latency
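The idea behind request-driven autoscaling and scale to zero can be sketched with a small calculation. The function below is a deliberately simplified model and not Knative's actual algorithm: it assumes the replica count is the number of in-flight requests divided by a per-replica concurrency target, rounded up and clamped between a minimum and a maximum. The function name and the default limits are hypothetical.

```python
import math


def desired_replicas(in_flight_requests: float, target_per_replica: float,
                     min_replicas: int = 0, max_replicas: int = 10) -> int:
    """Simplified concurrency-based scaling rule (illustrative only).

    With min_replicas=0, zero traffic yields zero replicas, which is
    the "scale to zero" behavior described above.
    """
    if target_per_replica <= 0:
        raise ValueError("target_per_replica must be positive")
    wanted = math.ceil(in_flight_requests / target_per_replica)
    return max(min_replicas, min(max_replicas, wanted))


print(desired_replicas(0, 10))                   # no traffic
print(desired_replicas(25, 10))                  # moderate traffic
print(desired_replicas(500, 10))                 # capped at max_replicas
print(desired_replicas(0, 10, min_replicas=1))   # scale to zero disabled
```

Setting the minimum to 1 instead of 0 trades idle cost for the absence of cold starts, which is the usual design choice for latency-sensitive models.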
Istio is the other cloud-native technology at the core of KFServing. It is a service mesh that runs as a sidecar container alongside each workload and enables:
- Canary roll-outs
- Traffic routing and ingress management
- Observability for tracing, logging, and monitoring
- Load balancing
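A canary rollout can be expressed declaratively on the serving resource. The sketch below assumes the v1beta1 InferenceService schema, in which a canaryTrafficPercent field on the predictor sends that share of traffic to the newly applied revision. The helper with_canary, the base manifest, and the storage URIs are hypothetical and serve only as an illustration.

```python
import copy


def with_canary(manifest: dict, new_storage_uri: str, percent: int) -> dict:
    """Return a copy of the manifest that rolls out a new model as a canary.

    Assumption: spec.predictor.canaryTrafficPercent controls the share
    of traffic routed to the latest revision.
    """
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    updated = copy.deepcopy(manifest)
    predictor = updated["spec"]["predictor"]
    predictor["canaryTrafficPercent"] = percent
    # Point every framework block that has a storageUri at the new model.
    for value in predictor.values():
        if isinstance(value, dict) and "storageUri" in value:
            value["storageUri"] = new_storage_uri
    return updated


# Hypothetical base manifest, for illustration only.
base = {
    "apiVersion": "serving.kserve.io/v1beta1",
    "kind": "InferenceService",
    "metadata": {"name": "sklearn-iris"},
    "spec": {"predictor": {"sklearn": {"storageUri": "gs://example-bucket/models/iris-v1"}}},
}
canary = with_canary(base, "gs://example-bucket/models/iris-v2", 10)
print(canary["spec"]["predictor"])
```

Raising the percentage step by step, and finally removing the field, would promote the new model to full traffic under this assumption.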