KFServing

What is KFServing?

KFServing was launched in 2019 as a collaborative project by Google, IBM, Bloomberg, NVIDIA, and Seldon. It is an open-source, serverless inference solution for Kubernetes.

KFServing facilitates production model serving use cases for common ML frameworks like XGBoost, TensorFlow, scikit-learn, PyTorch, and ONNX. The tool offers a consistent and friendly interface to deploy ML models on Kubernetes.

KFServing was recently rebranded as KServe, and its GitHub repository has been transferred to an independent KServe GitHub organization.

How does KFServing help?

KFServing provides an API for inference requests and standardizes ML operations on top of Kubernetes. With its ‘model as data’ approach, KFServing encapsulates the complexity of networking, configuration, autoscaling, health checking, and canary deployments.
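As an illustration of the inference API, here is a minimal sketch that builds a prediction request in the V1 data-plane style (`/v1/models/<name>:predict` with an `instances` list). The host name, model name, and feature values are placeholders chosen for this example, and the code only constructs the request rather than sending it.

```python
import json
from typing import List, Tuple


def build_predict_request(host: str, model_name: str, instances: List) -> Tuple[str, str]:
    """Return the URL and JSON body for a V1-style predict call."""
    url = f"http://{host}/v1/models/{model_name}:predict"
    body = json.dumps({"instances": instances})
    return url, body


# Hypothetical host and model name, used only for illustration.
url, body = build_predict_request(
    "sklearn-iris.default.example.com",
    "sklearn-iris",
    [[6.8, 2.8, 4.8, 1.4]],
)
print(url)
print(body)
```

The resulting URL and body can be passed to any HTTP client; the request shape is the same whichever supported framework serves the model.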

KFServing provides a simple yet complete story for production ML inference serving. It is compatible with different ML frameworks, including TensorFlow, XGBoost, scikit-learn, and ONNX.

KFServing builds on two cloud-native technologies: Knative and Istio. Knative is a Kubernetes-based platform that helps manage serverless workloads. Istio is a service mesh that runs as a sidecar container alongside each workload and enables canary roll-outs, load balancing, and routing.

Key Features of KFServing

Custom Resource Definition

KFServing implements a Kubernetes Custom Resource Definition (CRD) to serve machine learning models on arbitrary frameworks. The CRD extends the Kubernetes API so that models built with frameworks like TensorFlow, PyTorch, ONNX, and more can be served as ordinary Kubernetes resources.
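To make the custom resource concrete, the sketch below assembles a minimal `InferenceService` manifest as a Python dictionary and prints it as JSON. It assumes the KFServing-era `serving.kubeflow.org/v1beta1` API group (KServe releases use `serving.kserve.io` instead); the service name and storage URI are hypothetical placeholders.

```python
import json
from typing import Dict


def inference_service(name: str, framework: str, storage_uri: str) -> Dict:
    """Build a minimal InferenceService manifest for one predictor."""
    return {
        # Assumption: KFServing-era API group; KServe uses serving.kserve.io.
        "apiVersion": "serving.kubeflow.org/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name},
        "spec": {
            # The framework key (e.g. "sklearn") selects the model server.
            "predictor": {framework: {"storageUri": storage_uri}},
        },
    }


# Hypothetical name and bucket path, used only for illustration.
manifest = inference_service("sklearn-iris", "sklearn", "gs://example-bucket/models/iris")
print(json.dumps(manifest, indent=2))
```

Because Kubernetes accepts JSON manifests as well as YAML, the printed output could be saved to a file and applied with `kubectl apply -f`.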

Encapsulation

KFServing abstracts away the complexity of autoscaling, health checking, networking, and server configuration. It accelerates ML deployments with features such as GPU and TPU autoscaling, scale to zero, and canary roll-outs.

A complete story

KFServing offers a complete, pluggable, yet simple story for production ML inference serving, with support for prediction, pre-processing, post-processing, and explainability.
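The flow from pre-processing through prediction to post-processing can be sketched in plain Python. This is a conceptual illustration only: the function names, the toy threshold "model", and the scaling factor are all invented for this example, and in a real deployment these stages would typically run as separate serving components rather than in one process.

```python
from typing import Dict, List


def preprocess(request: Dict) -> List[List[float]]:
    """Turn the raw request into model-ready features (toy scaling)."""
    return [[float(x) / 10.0 for x in row] for row in request["instances"]]


def predict(instances: List[List[float]]) -> List[int]:
    """Toy stand-in for a model: class 1 if the features sum above 1.0."""
    return [1 if sum(row) > 1.0 else 0 for row in instances]


def postprocess(predictions: List[int]) -> Dict:
    """Map numeric classes to readable labels for the response."""
    labels = ["positive" if p == 1 else "negative" for p in predictions]
    return {"predictions": labels}


def handle(request: Dict) -> Dict:
    """Chain the three stages in the order a request passes through them."""
    return postprocess(predict(preprocess(request)))


result = handle({"instances": [[2, 3], [8, 9]]})
print(result)
```

Keeping each stage behind its own function boundary is what makes the pipeline pluggable: any one stage can be swapped without touching the others.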

Knative implementation

KFServing uses the cloud-native technology Knative at its core. This Kubernetes-based platform helps manage serverless workloads and allows:

Scaling to and from zero
Autoscaling of GPUs and TPUs to reduce latency
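Scaling bounds are usually expressed on the predictor itself. The sketch below adds `minReplicas` and `maxReplicas` to a predictor spec, where a minimum of zero is what permits scale to zero. The storage URI is a placeholder, and the validation checks are this example's own sanity checks rather than KFServing's validation rules.

```python
import json
from typing import Dict


def with_scaling(predictor: Dict, min_replicas: int, max_replicas: int) -> Dict:
    """Return a copy of the predictor spec with replica bounds applied."""
    if min_replicas < 0:
        raise ValueError("min_replicas must be >= 0")
    if max_replicas < max(min_replicas, 1):
        raise ValueError("max_replicas must be >= 1 and >= min_replicas")
    # Copy rather than mutate, so the base spec can be reused.
    return {**predictor, "minReplicas": min_replicas, "maxReplicas": max_replicas}


# Hypothetical bucket path, used only for illustration.
base = {"tensorflow": {"storageUri": "gs://example-bucket/models/flowers"}}
scaled = with_scaling(base, min_replicas=0, max_replicas=3)
print(json.dumps({"spec": {"predictor": scaled}}, indent=2))
```

With a minimum of zero, an idle model holds no pods; the trade-off is a cold-start delay on the first request after scaling down.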

Istio implementation

Istio is the other cloud-native technology at the core of KFServing. It is a service mesh that uses Kubernetes sidecars and enables:

Canary roll-outs
Traffic routing and ingress management
Observability for tracing, logging, and monitoring
Load balancing
Security
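A canary roll-out can be sketched as a single field on the predictor. The example below assumes the v1beta1 `canaryTrafficPercent` field, which sends that share of traffic to the newly deployed revision while the remainder stays on the previous one. The storage URI and the `traffic_split` helper are hypothetical and exist only for this illustration.

```python
from typing import Dict


def with_canary(predictor: Dict, percent: int) -> Dict:
    """Return a copy of the predictor spec with a canary traffic share."""
    if not 0 <= percent <= 100:
        raise ValueError("percent must be between 0 and 100")
    return {**predictor, "canaryTrafficPercent": percent}


def traffic_split(predictor: Dict) -> Dict[str, int]:
    """Describe how traffic divides between the new and previous revision."""
    canary = predictor.get("canaryTrafficPercent", 100)
    return {"latest": canary, "previous": 100 - canary}


# Hypothetical path to a new model version, used only for illustration.
new_version = {"sklearn": {"storageUri": "gs://example-bucket/models/iris-v2"}}
canary_spec = with_canary(new_version, 10)
print(canary_spec)
print(traffic_split(canary_spec))
```

Raising the percentage in steps and finally removing the field is one common way to promote the new revision once it has proven itself on live traffic.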


