Model Serving


Jun 2019
Apache-2.0 License

What is KFServing?

KFServing was launched in 2019 as a collaborative project by Google, IBM, Bloomberg, NVIDIA, and Seldon. It is an open-source, serverless inference solution that runs on Kubernetes.

KFServing supports production model serving for common ML frameworks such as XGBoost, TensorFlow, scikit-learn, PyTorch, and ONNX. The tool offers a consistent, friendly interface for deploying ML models on Kubernetes.

KFServing has since been renamed KServe, and its GitHub repository has been transferred to an independent KServe GitHub organization.


How does KFServing help?

KFServing provides an API for inference requests and standardizes ML operations on top of Kubernetes. With the ‘model as data’ approach, KFServing hides the complexity of networking, configuration, autoscaling, health checking, and canary deployments.
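To make the inference API concrete, here is a minimal sketch of how a client might assemble a request in the V1 data plane format, where a model is addressed at /v1/models/<name>:predict and inputs are sent under an "instances" key. The hostname, model name, and input values are assumptions for illustration only, and the sketch only builds the request rather than sending it.

```python
import json
from typing import List, Tuple


def build_predict_request(host: str, model_name: str, instances: List) -> Tuple[str, str]:
    """Return the URL and JSON body for a V1-style 'predict' call."""
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical hostname and model name, chosen only for this example.
url, body = build_predict_request(
    "sklearn-iris.default.example.com",
    "sklearn-iris",
    [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]],
)
print(url)
print(body)
```

Any HTTP client can then POST the body to the URL. A successful response is expected to carry the results under a "predictions" key.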

KFServing provides a simple yet complete story for production ML inference serving. It is compatible with different ML frameworks, including TensorFlow, XGBoost, scikit-learn, and ONNX.

KFServing builds on two cloud-native technologies: Knative and Istio. Knative is a Kubernetes-based platform that helps manage serverless workloads. Istio is a service mesh that runs sidecar proxy containers alongside application pods and enables canary roll-outs, load balancing, and routing.


Key Features of KFServing

Custom Resource Definition

KFServing implements a Kubernetes Custom Resource Definition (CRD) to serve machine learning models on arbitrary frameworks. The CRD extends the Kubernetes API so that a served model can be declared like any other Kubernetes resource, for frameworks such as TensorFlow, PyTorch, ONNX, and more.
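As an illustration of what such a custom resource looks like, the sketch below builds a minimal InferenceService manifest as a Python dictionary and prints it as JSON, which kubectl accepts alongside YAML. It assumes the v1beta1 API under the serving.kubeflow.org group used by KFServing (KServe uses serving.kserve.io instead). The resource name and storage bucket are hypothetical.

```python
import json


def inference_service(name: str, framework: str, storage_uri: str) -> dict:
    """Build a minimal InferenceService manifest for a single predictor."""
    return {
        # Assumed API group and version for KFServing; KServe uses
        # "serving.kserve.io/v1beta1".
        "apiVersion": "serving.kubeflow.org/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            "predictor": {
                # The framework key (for example "sklearn") selects the model server.
                framework: {"storageUri": storage_uri},
            }
        },
    }


# Hypothetical name and bucket path, for illustration only.
manifest = inference_service("sklearn-iris", "sklearn", "gs://example-bucket/models/iris")
print(json.dumps(manifest, indent=2))
```

Saving the printed JSON to a file and applying it with kubectl would ask the cluster to create the service, provided KFServing is installed and the storage path points to a real model.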


KFServing abstracts away the complexity of autoscaling, health checking, networking, and server configuration. It speeds up ML deployments with features such as GPU and TPU autoscaling, scale to zero, and canary roll-outs.

A complete story

KFServing offers a complete, pluggable, yet simple approach to production ML inference serving, with support for prediction, pre-processing, post-processing, and explainability.
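The way pre-processing, prediction, and post-processing chain together can be sketched in plain Python. This is not the KFServing SDK: the class, the toy "model", and the labels are all hypothetical, and the sketch only shows the order in which the three stages run on a request.

```python
from typing import Callable, Dict, List


class InferencePipeline:
    """Hypothetical three-stage pipeline: preprocess -> predict -> postprocess."""

    def __init__(self, predict_fn: Callable[[List[float]], float]):
        self.predict_fn = predict_fn

    def preprocess(self, request: Dict) -> Dict:
        # Scale raw values from the 0..255 range into 0..1.
        return {"instances": [[v / 255.0 for v in row] for row in request["instances"]]}

    def predict(self, request: Dict) -> Dict:
        # Run the model function on every instance.
        return {"predictions": [self.predict_fn(row) for row in request["instances"]]}

    def postprocess(self, response: Dict) -> Dict:
        # Map numeric scores to human-readable labels.
        labels = ["dark", "bright"]
        return {"predictions": [labels[int(p >= 0.5)] for p in response["predictions"]]}

    def __call__(self, request: Dict) -> Dict:
        return self.postprocess(self.predict(self.preprocess(request)))


def mean_brightness(row: List[float]) -> float:
    """Toy stand-in for a real model: the average of the scaled inputs."""
    return sum(row) / len(row)


pipeline = InferencePipeline(mean_brightness)
result = pipeline({"instances": [[0, 51, 102], [255, 204, 153]]})
print(result)
```

Keeping the stages separate is what makes the design pluggable: the transformation steps can change without touching the model, and the model can change without touching the transformations.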

Knative implementation

KFServing uses the cloud-native technology Knative at its core. This Kubernetes-based platform helps manage serverless workloads and enables:

  • Scaling to and from zero
  • Autoscaling of GPUs and TPUs to reduce latency 
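The idea behind concurrency-based autoscaling with scale to zero can be sketched as a single function. This is a deliberately simplified model, not Knative's actual autoscaler, which also averages metrics over time windows and waits for a grace period before removing the last replica. The function name and parameters are assumptions made for this example.

```python
import math


def desired_replicas(in_flight: int, target_concurrency: int, max_replicas: int) -> int:
    """Return how many replicas are needed for the current number of in-flight requests."""
    if in_flight <= 0:
        # No traffic: scale all the way down to zero.
        return 0
    # Enough replicas so each handles at most target_concurrency requests,
    # capped at the configured maximum.
    return min(max_replicas, math.ceil(in_flight / target_concurrency))


for load in (0, 1, 25, 500):
    print(load, desired_replicas(load, target_concurrency=10, max_replicas=5))
```

With a target of 10 concurrent requests per replica, an idle service holds no replicas, and the first incoming request brings one back.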

Istio implementation

Istio is another cloud-native technology at the core of KFServing. It is a service mesh that runs as sidecar containers in Kubernetes and enables:

  • Canary roll-outs
  • Traffic routing and ingress management
  • Observability for tracing, logging, and monitoring 
  • Load balancing 
  • Security
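A canary roll-out sends a small share of traffic to a new model version while the rest continues to reach the stable one. The sketch below simulates that weighted split in plain Python. It illustrates the concept only and is not how Istio implements routing. The function name, the 10 percent weight, and the fixed random seed are assumptions for this example.

```python
import random
from collections import Counter


def route(canary_percent: int, rng: random.Random) -> str:
    """Pick a target version for one request using a percentage weight."""
    return "canary" if rng.random() * 100 < canary_percent else "stable"


rng = random.Random(0)  # fixed seed so repeated runs give the same split
counts = Counter(route(10, rng) for _ in range(10_000))
print(counts)
```

Roughly one request in ten reaches the canary. If its error rate or latency looks healthy, the weight can be raised step by step until the new version takes all traffic.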
