Get an overview of MLOps (Machine Learning Operations) and discover how Kubernetes and Kubeflow can streamline your ML workflows. This talk will introduce the core principles of MLOps, highlighting its role in ensuring reliable, scalable, and efficient ML deployments.
We'll then dive into Kubeflow, a powerful Kubernetes-native platform purpose-built for MLOps. You'll learn about Kubeflow's key components, including:
Pipeline Orchestration: Building and automating end-to-end ML workflows
Notebooks: Interactive development environments for experimentation
Model Serving: Deploying models for real-time or batch predictions
Training: Scaling model training on Kubernetes
Hyperparameter Tuning: Automating hyperparameter optimisation with Katib to find the best model configuration and improve model performance
Security & Access Control: Integration with Kubernetes RBAC
Key Takeaways:
Understand the core concepts of MLOps and why it's important
Learn how Kubeflow simplifies common ML challenges
Gain insights into Kubeflow's key components and how they work together
Get tips for starting your own MLOps journey with Kubernetes and Kubeflow.
MLOps Definition
MLOps (Machine Learning Operations): A set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.
Purpose: Bridge the gap between machine learning (ML) and operations (Ops), ensuring smooth integration and functioning of ML models in real-world applications.
Kubeflow was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan.
The Kubeflow ecosystem is composed of multiple open-source projects that address different aspects of the ML lifecycle.
The Kubeflow Platform refers to the full suite of Kubeflow components bundled together with additional integration and management tools. Using Kubeflow as a platform means deploying a comprehensive ML toolkit for the entire ML lifecycle.
Kubeflow Components in the ML Lifecycle
The Kubeflow UI - Central Dashboard
Kubeflow Pipelines (KFP)
• KFP is a platform for building and running machine learning workflows on Kubernetes. It provides higher-level abstractions for Argo Workflows to reduce repetition when defining machine learning tasks.
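The idea behind a pipeline can be sketched as a directed acyclic graph of steps that run in dependency order. The example below uses only the Python standard library and is not the KFP SDK; the step functions, the `PIPELINE` table, and the `run_pipeline` helper are all hypothetical names chosen for illustration.

```python
from graphlib import TopologicalSorter


def load_data():
    # Stand-in for a data-ingestion step.
    return [1.0, 2.0, 3.0, 4.0]


def preprocess(data):
    # Scale values so the largest becomes 1.0.
    top = max(data)
    return [x / top for x in data]


def train(features):
    # A toy "model": just the mean of the features.
    return sum(features) / len(features)


# Each step maps to (function, names of the steps it depends on).
PIPELINE = {
    "load_data": (load_data, []),
    "preprocess": (preprocess, ["load_data"]),
    "train": (train, ["preprocess"]),
}


def run_pipeline(pipeline):
    """Run every step after its dependencies and return all step outputs."""
    graph = {name: set(deps) for name, (_, deps) in pipeline.items()}
    outputs = {}
    for name in TopologicalSorter(graph).static_order():
        func, deps = pipeline[name]
        outputs[name] = func(*(outputs[d] for d in deps))
    return outputs


if __name__ == "__main__":
    print(run_pipeline(PIPELINE)["train"])
```

A real orchestrator would run each step in its own container and pass artifacts between them; the ordering logic is the part this sketch shares with it.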
Kubeflow Spark Operator
• Apache Spark is a unified engine for large-scale data analytics.
• The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes.
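As a rough illustration of what "idiomatic" means here, a Spark job is declared as a `SparkApplication` custom resource. The sketch below builds such a manifest as a plain Python dict and prints it as JSON, which `kubectl apply -f` also accepts. The helper name, image, file path, Spark version, and service account are assumptions for illustration only.

```python
import json


def spark_application(name, image, main_file, executors=2):
    """Return a SparkApplication manifest as a plain dict."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": "default"},
        "spec": {
            "type": "Python",
            "mode": "cluster",
            "image": image,  # hypothetical image name
            "mainApplicationFile": main_file,  # hypothetical path inside the image
            "sparkVersion": "3.5.0",  # assumed version
            # The service account name is an assumption; use one that exists in the cluster.
            "driver": {"cores": 1, "memory": "512m", "serviceAccount": "spark"},
            "executor": {"cores": 1, "instances": executors, "memory": "512m"},
        },
    }


if __name__ == "__main__":
    manifest = spark_application(
        "pi-example", "example.registry/spark-py:latest", "local:///app/pi.py"
    )
    print(json.dumps(manifest, indent=2))
```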
Kubeflow Notebooks
• Kubeflow Notebooks allows users to spawn Pods running instances of JupyterLab, Visual Studio Code (code-server), and RStudio in profile namespaces.
• As the cluster administrator, you may configure which options are available to users when deploying a Notebook Pod:
  o Container Images
  o Container Resources (CPU, Memory, GPU)
  o Storage Volumes
  o Advanced Pod Options (Affinity, Tolerations, PodDefaults)
  o Idle Notebook Culling
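Several of the options above end up as fields of a `Notebook` custom resource in the user's profile namespace. The sketch below assembles one as a Python dict. The helper name, image, namespace, mount path, and PVC name are hypothetical, and the GPU resource name assumes NVIDIA's device plugin.

```python
import json


def notebook_manifest(name, namespace, image, cpu="500m", memory="1Gi", gpus=0, pvc=None):
    """Return a Notebook custom resource as a plain dict."""
    resources = {"requests": {"cpu": cpu, "memory": memory}}
    if gpus:
        # GPUs are requested through limits on the vendor resource name (assumed NVIDIA).
        resources["limits"] = {"nvidia.com/gpu": gpus}
    container = {"name": name, "image": image, "resources": resources}
    pod_spec = {"containers": [container]}
    if pvc:
        # Mount an existing PersistentVolumeClaim as the workspace (path is an assumption).
        container["volumeMounts"] = [{"name": "workspace", "mountPath": "/home/jovyan"}]
        pod_spec["volumes"] = [
            {"name": "workspace", "persistentVolumeClaim": {"claimName": pvc}}
        ]
    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"template": {"spec": pod_spec}},
    }


if __name__ == "__main__":
    nb = notebook_manifest(
        "demo-notebook", "team-a", "example.registry/jupyter:latest", gpus=1, pvc="demo-workspace"
    )
    print(json.dumps(nb, indent=2))
```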
Katib - Hyperparameter Tuning
• Katib is an open-source project that is agnostic to ML frameworks. It can tune hyperparameters in applications written in any language of the user’s choice.
• By automating the search, Katib helps build more accurate models and can reduce operational and infrastructure costs.
Each line represents a single run’s configuration and its metrics.
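To make the notion of a run concrete, here is a minimal sketch of random search, one of the search algorithms Katib supports. It does not use Katib; the toy `objective`, the `SEARCH_SPACE` layout, and the helper names are hypothetical. Each entry in `runs` pairs one configuration with its metric, like one line in the plot.

```python
import random


def objective(lr, batch_size):
    """Toy stand-in for a training run; returns a loss to minimize."""
    return (lr - 0.01) ** 2 * 1e4 + abs(batch_size - 64) / 64


# Parameter name -> (type, bounds or choices).
SEARCH_SPACE = {
    "lr": ("double", 0.001, 0.1),
    "batch_size": ("categorical", [16, 32, 64, 128]),
}


def sample(space, rng):
    """Draw one configuration uniformly at random from the search space."""
    params = {}
    for name, spec in space.items():
        if spec[0] == "double":
            params[name] = rng.uniform(spec[1], spec[2])
        else:
            params[name] = rng.choice(spec[1])
    return params


def random_search(space, trials, seed=0):
    """Evaluate `trials` random configurations; return (best run, all runs)."""
    rng = random.Random(seed)
    runs = []
    for _ in range(trials):
        params = sample(space, rng)
        runs.append({"params": params, "loss": objective(**params)})
    return min(runs, key=lambda r: r["loss"]), runs


if __name__ == "__main__":
    best, runs = random_search(SEARCH_SPACE, trials=20)
    print(best)
```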
Training Operator
• The Training Operator lets you run Kubernetes workloads to train large models, either through Kubernetes Custom Resource APIs or through the Training Operator Python SDK.
• The Training Operator implements a centralized Kubernetes controller to orchestrate distributed training jobs.
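One of the custom resources the Training Operator handles is `PyTorchJob`. The sketch below builds a manifest with one master and a configurable number of workers. The helper name, image, and command are hypothetical; the container is named `pytorch` because that is the name the operator expects by default.

```python
import json


def pytorch_job(name, image, command, workers=2):
    """Return a PyTorchJob manifest with one master and `workers` workers."""

    def replica(count):
        return {
            "replicas": count,
            "restartPolicy": "OnFailure",
            "template": {
                "spec": {
                    "containers": [
                        {"name": "pytorch", "image": image, "command": command}
                    ]
                }
            },
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name},
        "spec": {
            "pytorchReplicaSpecs": {"Master": replica(1), "Worker": replica(workers)}
        },
    }


if __name__ == "__main__":
    job = pytorch_job(
        "demo-train", "example.registry/train:latest", ["python", "train.py"], workers=3
    )
    print(json.dumps(job, indent=2))
```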
Kubeflow Model Registry (alpha)
Initial commit Sep 16, 2023
• Kubeflow Model Registry is a Go-based application that leverages the ML Metadata (MLMD) project under the hood. MLMD is a library for recording and retrieving metadata associated with ML developer and data scientist workflows.
• Kubeflow Model Registry is an efficient way to share model versions, artifacts and metadata with other users that need access to those models as part of the MLOps workflow.
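The relationship between models, versions, artifacts, and metadata can be sketched with a small in-memory registry. This is a conceptual illustration only and not the Kubeflow Model Registry API; every class, method, and URI below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ModelVersion:
    version: str
    artifact_uri: str  # where the serialized model is stored
    metadata: dict = field(default_factory=dict)


@dataclass
class RegisteredModel:
    name: str
    versions: list = field(default_factory=list)


class InMemoryRegistry:
    """Toy registry: one entry per model name, each with an ordered list of versions."""

    def __init__(self):
        self._models = {}

    def register(self, name, version, artifact_uri, **metadata):
        model = self._models.setdefault(name, RegisteredModel(name))
        if any(v.version == version for v in model.versions):
            raise ValueError(f"{name} already has version {version}")
        entry = ModelVersion(version, artifact_uri, metadata)
        model.versions.append(entry)
        return entry

    def latest(self, name):
        # The most recently registered version of the model.
        return self._models[name].versions[-1]


if __name__ == "__main__":
    registry = InMemoryRegistry()
    registry.register("churn", "v1", "s3://example-bucket/churn/v1", accuracy=0.91)
    registry.register("churn", "v2", "s3://example-bucket/churn/v2", accuracy=0.93)
    print(registry.latest("churn"))
```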
KServe (formerly KFServing)
• KServe is a standard model inference platform on Kubernetes, built for highly scalable use cases.
• Provides a performant, standardized inference protocol across ML frameworks.
• Supports modern serverless inference workloads with autoscaling, including scale to zero on GPU.
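The standardized protocol means a client sends the same request shape whichever framework serves the model. The sketch below builds a request body in the Open Inference Protocol (V2) format and the matching URL path. The helper names, model name, and input tensor name are hypothetical, and FP32 input is assumed.

```python
import json


def infer_request(input_name, rows):
    """Build a V2 inference request body for a 2-D batch of floats."""
    if not rows or any(len(r) != len(rows[0]) for r in rows):
        raise ValueError("rows must be a non-empty rectangular batch")
    # Tensor data is sent flattened in row-major order alongside its shape.
    flat = [float(x) for row in rows for x in row]
    return {
        "inputs": [
            {
                "name": input_name,
                "shape": [len(rows), len(rows[0])],
                "datatype": "FP32",
                "data": flat,
            }
        ]
    }


def infer_path(model_name):
    """URL path of the V2 inference endpoint for a model."""
    return f"/v2/models/{model_name}/infer"


if __name__ == "__main__":
    body = infer_request("input-0", [[6.8, 2.8, 4.8, 1.4], [6.0, 3.4, 4.5, 1.6]])
    print(infer_path("iris"))
    print(json.dumps(body))
```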
External Add-Ons
• Elyra enables data scientists to visually create end-to-end machine learning (ML) workflows.
• Feast is an open-source feature store that helps teams operate ML systems at scale by allowing them to define, manage, validate, and serve features to models in production.
• Tools for serving ML models in Kubeflow:
  o Seldon Core Serving
  o BentoML
  o MLRun Serving
Use Cases and Examples
• Deploying and managing complex ML systems at scale
• Research and development with various machine learning models
• Hybrid and multi-cloud machine learning workloads
• Hyperparameter tuning and optimization
Alternatives to Kubeflow
• Public Cloud MLOps Tools:
  o Amazon SageMaker: Simplest to get started with, as it is a fully managed service with built-in capabilities.
  o GCP Vertex AI: Provides a unified interface, making it relatively simple to manage end-to-end ML workflows.
  o Azure Machine Learning (Azure ML): User-friendly and fully managed, making it simple to start with.
• Open Source MLOps Tools for Kubernetes:
  o MLflow: Simplest among the open-source tools, providing a lightweight library for managing the ML lifecycle.
  o Flyte: Created by Lyft. Medium complexity; powerful, but can be more complex to set up and configure.
  o Airflow: Medium complexity; widely used for orchestrating complex workflows, with good support for Kubernetes.
  o Metaflow: Originally developed at Netflix. Medium to high complexity, especially when integrating with Kubernetes.