MLOps with Kubernetes - Thiago Ramos

Uploaded by ThiagoRamos343326 on Jul 06, 2024 (32 slides)

About This Presentation

Get an overview of MLOps (Machine Learning Operations) and discover how Kubernetes and Kubeflow can streamline your ML workflows. This talk will introduce the core principles of MLOps, highlighting its role in ensuring reliable, scalable, and efficient ML deployments.

We'll then dive into Kubef...


Slide Content

An Introduction to Kubeflow
Thiago Ramos, CKAD

Agenda
• What is MLOps
• Role of Kubernetes in MLOps
• What is Kubeflow
• Kubeflow Platform & Ecosystem
• Kubeflow Features and Capabilities
• Use Cases and Real-World Examples
• Q&A

MLOps Definition
MLOps (Machine Learning Operations): A set of practices that aims to deploy and maintain machine learning models in production reliably and efficiently.
Purpose: Bridge the gap between machine learning (ML) and operations (Ops), ensuring smooth integration and functioning of ML models in real-world applications.

Key Principles of MLOps

For each ML lifecycle stage, the key benefit of MLOps and the key challenge:
• Development (productivity)
   o Key benefit: Automates repetitive tasks, freeing up time for data scientists to focus on high-value tasks.
   o Key challenge: Initial implementation may require significant effort.
• Development (reproducibility)
   o Key benefit: Ensures consistent and reproducible results across environments.
   o Key challenge: Ensuring version control for datasets, code, and models.
• Deployment
   o Key benefit: Speeds up the time from model development to deployment.
   o Key challenge: Requires an initial setup of automated pipelines.
• Collaboration
   o Key benefit: Enhances collaboration between data scientists, developers, and operations.
   o Key challenge: Bridging gaps between cross-functional teams.
• Scaling
   o Key benefit: Efficiently manages computational resources and scales models across environments.
   o Key challenge: Scaling infrastructure to handle growing data volumes and model complexity.
• Monitoring
   o Key benefit: Systematic monitoring of model performance detects data and model drift, helping keep models accurate over time.
   o Key challenge: Setting up comprehensive monitoring systems and handling false-positive alerts.
• Maintenance
   o Key benefit: Continuous monitoring and real-time feedback improve model accuracy and reliability.
   o Key challenge: Setting up robust monitoring systems.
• Cost Management
   o Key benefit: Minimizes manual effort and operational costs through automation.
   o Key challenge: Initial investment in MLOps tools and infrastructure.
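The Monitoring entry mentions detecting data drift. One common (but by no means the only) way to quantify drift in a single numeric feature is the Population Stability Index (PSI), which compares the binned distribution of a reference sample with that of live data. The stdlib-only sketch below is an illustration under assumed choices: equal-width bins taken from the reference range, a small epsilon to avoid log(0), and the often-quoted 0.2 alert threshold, which is a rule of thumb rather than a standard. It is not part of Kubeflow.

```python
import math
import random
from typing import List, Sequence


def psi(reference: Sequence[float], current: Sequence[float],
        bins: int = 10, eps: float = 1e-6) -> float:
    """Population Stability Index of `current` relative to `reference`.

    Bin edges are equal-width over the reference range; values outside that
    range are clamped into the first or last bin.
    """
    lo, hi = min(reference), max(reference)
    width = (hi - lo) / bins or 1.0  # avoid zero width for a constant sample

    def proportions(sample: Sequence[float]) -> List[float]:
        counts = [0] * bins
        for x in sample:
            idx = int((x - lo) / width)
            counts[min(max(idx, 0), bins - 1)] += 1
        # eps keeps empty bins from causing log(0) or division by zero
        return [max(c / len(sample), eps) for c in counts]

    ref_p, cur_p = proportions(reference), proportions(current)
    return sum((c - r) * math.log(c / r) for r, c in zip(ref_p, cur_p))


def drift_alert(reference: Sequence[float], current: Sequence[float],
                threshold: float = 0.2) -> bool:
    # 0.2 is a commonly quoted rule of thumb, not a standard (assumption)
    return psi(reference, current) > threshold


if __name__ == "__main__":
    rng = random.Random(0)
    ref = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    same = [rng.gauss(0.0, 1.0) for _ in range(5000)]
    shifted = [rng.gauss(1.0, 1.0) for _ in range(5000)]
    print(f"PSI same={psi(ref, same):.3f} shifted={psi(ref, shifted):.3f}")
```

In a real deployment a check like this would run on a schedule against logged inference inputs; which features to watch and what threshold to alert on are choices each team has to make, which is part of the "false-positive alerts" challenge listed above.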

Kubeflow was first announced at KubeCon + CloudNativeCon North America 2017 by Google engineers David Aronchick, Jeremy Lewi, and Vishnu Kannan.

The Kubeflow ecosystem is composed of multiple open-source projects that address different aspects of the ML lifecycle.

The Kubeflow Platform refers to the full suite of Kubeflow components bundled together with additional integration and management tools. Using Kubeflow as a platform means deploying a comprehensive ML toolkit for the entire ML lifecycle.

Kubeflow Components in the ML Lifecycle

The Kubeflow UI: Central Dashboard

Kubeflow Pipelines (KFP)
• KFP is a platform for building and running machine learning workflows on Kubernetes. It provides higher-level abstractions for Argo Workflows to reduce repetition when defining machine learning tasks.
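A KFP pipeline is ultimately a directed acyclic graph (DAG) of steps that the backend runs in dependency order. The stdlib sketch below only illustrates that execution model with `graphlib.TopologicalSorter` (Python 3.9+); it is not the KFP SDK, whose real entry points are the `@dsl.component` and `@dsl.pipeline` decorators in `kfp.dsl`. The step names (`load_data`, `preprocess`, `train`, `evaluate`) and their toy logic are hypothetical.

```python
from graphlib import TopologicalSorter
from typing import Callable, Dict, Set

Step = Callable[[Dict[str, object]], object]


def run_pipeline(steps: Dict[str, Step],
                 deps: Dict[str, Set[str]]) -> Dict[str, object]:
    """Run steps so that each one runs after the steps it depends on.

    `deps` maps a step name to the names of its upstream steps. Each step
    receives a dict of upstream outputs; the returned dict preserves the
    execution order.
    """
    outputs: Dict[str, object] = {}
    for name in TopologicalSorter(deps).static_order():
        upstream = {d: outputs[d] for d in deps.get(name, set())}
        outputs[name] = steps[name](upstream)
    return outputs


# Hypothetical steps of a toy training pipeline
def load_data(_: Dict[str, object]) -> list:
    return list(range(10))


def preprocess(up: Dict[str, object]) -> list:
    return [x * 2 for x in up["load_data"]]


def train(up: Dict[str, object]) -> float:
    data = up["preprocess"]
    return sum(data) / len(data)  # stand-in "model": just the mean


def evaluate(up: Dict[str, object]) -> float:
    return abs(up["train"] - 9.0)  # distance from an assumed target value


STEPS = {"load_data": load_data, "preprocess": preprocess,
         "train": train, "evaluate": evaluate}
DEPS = {"preprocess": {"load_data"}, "train": {"preprocess"},
        "evaluate": {"train"}}

if __name__ == "__main__":
    print(run_pipeline(STEPS, DEPS))
```

`static_order()` yields one valid sequential order; an orchestrator such as KFP on Argo can instead run independent branches concurrently, each step in its own container.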

Kubeflow Spark Operator
• Apache Spark is a unified engine for large-scale data analytics.
• The Kubernetes Operator for Apache Spark aims to make specifying and running Spark applications as easy and idiomatic as running other workloads on Kubernetes.
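With the Spark Operator, a Spark job is declared as a `SparkApplication` custom resource rather than launched by hand with `spark-submit`. The sketch below builds such a manifest as a Python dict and prints it as JSON, which `kubectl apply -f` also accepts. It assumes the operator's `sparkoperator.k8s.io/v1beta2` API; the image tag, application file path, service account, and Spark version are placeholders to adapt, not values from the talk.

```python
import json


def spark_application(name: str, namespace: str, main_file: str,
                      executors: int = 2,
                      image: str = "spark:3.5.0",  # placeholder image tag
                      spark_version: str = "3.5.0") -> dict:
    """Build a SparkApplication manifest (sparkoperator.k8s.io/v1beta2)."""
    return {
        "apiVersion": "sparkoperator.k8s.io/v1beta2",
        "kind": "SparkApplication",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "type": "Python",       # a PySpark job
            "mode": "cluster",      # the driver runs as a pod in the cluster
            "image": image,
            "mainApplicationFile": main_file,
            "sparkVersion": spark_version,
            "restartPolicy": {"type": "Never"},
            "driver": {"cores": 1, "memory": "1g",
                       "serviceAccount": "spark"},  # assumed service account
            "executor": {"cores": 1, "instances": executors, "memory": "2g"},
        },
    }


if __name__ == "__main__":
    manifest = spark_application("feature-prep", "team-a",
                                 "local:///opt/app/prep.py")  # placeholder path
    print(json.dumps(manifest, indent=2))
```

The operator watches for these objects and creates the driver and executor pods, so the job can be versioned and applied like any other Kubernetes workload.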

Kubeflow Notebooks
• Kubeflow Notebooks allows users to spawn Pods running instances of JupyterLab, Visual Studio Code (code-server), and RStudio in profile namespaces.
• As the cluster administrator, you may configure which options are available to users when deploying a Notebook Pod:
   o Container Images
   o Container Resources (CPU, Memory, GPU)
   o Storage Volumes
   o Advanced Pod Options (Affinity, Tolerations, PodDefaults)
   o Idle Notebook Culling
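Behind the UI, each notebook server is a `Notebook` custom resource (API `kubeflow.org/v1`) whose `spec.template.spec` is an ordinary Pod spec, so the image, resource, and volume options listed above map onto standard Pod fields. The sketch below assembles one such manifest. The image, namespace, PVC name, the `/home/jovyan` mount path, and the GPU resource name `nvidia.com/gpu` (which assumes the NVIDIA device plugin) are illustrative assumptions; the administrator-defined choices offered in the UI are configured separately.

```python
import json
from typing import Optional


def notebook_manifest(name: str, namespace: str, image: str,
                      cpu: str = "1", memory: str = "2Gi", gpus: int = 0,
                      pvc: Optional[str] = None) -> dict:
    """Build a Notebook custom resource (kubeflow.org/v1) for one user server."""
    resources = {"requests": {"cpu": cpu, "memory": memory},
                 "limits": {"cpu": cpu, "memory": memory}}
    if gpus:
        # GPUs are an extended resource; the name assumes the NVIDIA device plugin
        resources["limits"]["nvidia.com/gpu"] = str(gpus)

    container = {"name": name, "image": image, "resources": resources}
    pod_spec = {"containers": [container]}
    if pvc:
        # Mount an existing PersistentVolumeClaim as the workspace (assumed path)
        pod_spec["volumes"] = [{"name": "workspace",
                                "persistentVolumeClaim": {"claimName": pvc}}]
        container["volumeMounts"] = [{"name": "workspace",
                                      "mountPath": "/home/jovyan"}]

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "Notebook",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"template": {"spec": pod_spec}},
    }


if __name__ == "__main__":
    nb = notebook_manifest("my-notebook", "team-a", "jupyter/scipy-notebook",
                           gpus=1, pvc="my-notebook-workspace")
    print(json.dumps(nb, indent=2))
```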

Katib: Hyperparameter Tuning
• Katib is an open-source, ML-framework-agnostic project. It can tune hyperparameters in applications written in any language of the user's choice.
• By automating hyperparameter search, Katib helps build more accurate models efficiently while reducing operational and infrastructure expenses, improving business results.
Each line represents a single run’s configuration and its metrics.
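Katib treats tuning as a search: it proposes parameter values from a declared search space, runs a trial for each, and keeps the best objective value. Random search is one of the algorithms Katib offers, alongside grid search, Bayesian optimization, and others. The stdlib sketch below mimics that loop locally; the search space, the `toy_accuracy` objective, and the trial count are made up for illustration. In Katib itself the equivalent settings are declared in an `Experiment` resource (objective, algorithm, parameters, `maxTrialCount`), not written as Python.

```python
import math
import random
from typing import Callable, Dict, Tuple

# Hypothetical search space: parameter name -> (min, max, log_scale)
SEARCH_SPACE: Dict[str, Tuple[float, float, bool]] = {
    "learning_rate": (1e-4, 1e-1, True),
    "dropout": (0.0, 0.5, False),
}


def sample(space: Dict[str, Tuple[float, float, bool]],
           rng: random.Random) -> Dict[str, float]:
    """Draw one configuration; log-scale params are uniform in log space."""
    params = {}
    for name, (lo, hi, log_scale) in space.items():
        if log_scale:
            params[name] = math.exp(rng.uniform(math.log(lo), math.log(hi)))
        else:
            params[name] = rng.uniform(lo, hi)
    return params


def random_search(objective: Callable[[Dict[str, float]], float],
                  space: Dict[str, Tuple[float, float, bool]],
                  max_trials: int = 20, seed: int = 0):
    """Maximize `objective` with plain random search; returns best and history."""
    rng = random.Random(seed)
    best_params, best_value = None, -math.inf
    history = []  # one (params, metric) entry per trial
    for _ in range(max_trials):
        params = sample(space, rng)
        value = objective(params)  # in Katib, each evaluation is a separate trial
        history.append((params, value))
        if value > best_value:
            best_params, best_value = params, value
    return best_params, best_value, history


def toy_accuracy(p: Dict[str, float]) -> float:
    # Made-up objective that peaks at learning_rate=0.01, dropout=0.2
    lr_term = (math.log10(p["learning_rate"]) + 2) ** 2 * 0.05
    return 1.0 - lr_term - (p["dropout"] - 0.2) ** 2


if __name__ == "__main__":
    best, value, _ = random_search(toy_accuracy, SEARCH_SPACE, max_trials=50)
    print(best, round(value, 4))
```

The `history` list corresponds to the per-trial records that a results view plots, one configuration and its metric per trial.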

Training Operator
• Training Operator lets you use Kubernetes workloads to train large models effectively, either through Kubernetes Custom Resource APIs or with the Training Operator Python SDK.
• Training Operator implements a centralized Kubernetes controller to orchestrate distributed training jobs.
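As an example of those Custom Resource APIs, distributed PyTorch training is described with a `PyTorchJob` (API `kubeflow.org/v1`), where `pytorchReplicaSpecs` declares one `Master` and a number of `Worker` replicas; the operator then creates the pods and sets environment variables such as `MASTER_ADDR` and `WORLD_SIZE` so they can find each other. The sketch below generates such a manifest. The image, command, and GPU counts are placeholders, and the training container is named `pytorch`, which the operator expects.

```python
import json
from typing import List


def pytorch_job(name: str, namespace: str, image: str, command: List[str],
                workers: int = 2, gpus_per_replica: int = 0) -> dict:
    """Build a PyTorchJob manifest (kubeflow.org/v1): one Master, N Workers."""

    def replica_spec(replicas: int) -> dict:
        container = {
            "name": "pytorch",  # the operator expects this container name
            "image": image,
            "command": list(command),
        }
        if gpus_per_replica:
            container["resources"] = {
                "limits": {"nvidia.com/gpu": gpus_per_replica}}
        return {
            "replicas": replicas,
            "restartPolicy": "OnFailure",
            "template": {"spec": {"containers": [container]}},
        }

    return {
        "apiVersion": "kubeflow.org/v1",
        "kind": "PyTorchJob",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {"pytorchReplicaSpecs": {
            "Master": replica_spec(1),
            "Worker": replica_spec(workers),
        }},
    }


if __name__ == "__main__":
    job = pytorch_job("mnist-ddp", "team-a",
                      "registry.example.com/mnist:latest",  # placeholder image
                      ["python", "train.py", "--epochs", "3"],
                      workers=3, gpus_per_replica=1)
    print(json.dumps(job, indent=2))
```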

Kubeflow Model Registry (alpha)
Initial commit Sep 16, 2023
• Kubeflow Model Registry is a Go-based application that uses the ML Metadata (MLMD) project under the hood. MLMD is a library for recording and retrieving metadata associated with ML developer and data scientist workflows.
• Kubeflow Model Registry is an efficient way to share model versions, artifacts, and metadata with other users who need access to those models as part of the MLOps workflow.
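Conceptually, a model registry organizes metadata as registered models that own versions, each pointing at a stored artifact. The stdlib sketch below mirrors that hierarchy in memory to show why a shared registry makes it easy to find the latest version of a model and where its files live. The class names echo the registry's `RegisteredModel`, `ModelVersion`, and `ModelArtifact` concepts, but these classes, the `InMemoryRegistry` type, and its methods are hypothetical; they are not the Model Registry API or its Python client.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class ModelArtifact:
    uri: str           # e.g. an object-store path (hypothetical)
    model_format: str  # e.g. "sklearn", "onnx"


@dataclass
class ModelVersion:
    name: str
    artifact: ModelArtifact
    metadata: Dict[str, str] = field(default_factory=dict)


@dataclass
class RegisteredModel:
    name: str
    versions: List[ModelVersion] = field(default_factory=list)


class InMemoryRegistry:
    """Hypothetical toy registry: model name -> RegisteredModel with ordered versions."""

    def __init__(self) -> None:
        self._models: Dict[str, RegisteredModel] = {}

    def register(self, model_name: str, version: str, uri: str,
                 model_format: str, **metadata: str) -> ModelVersion:
        model = self._models.setdefault(model_name, RegisteredModel(model_name))
        if any(v.name == version for v in model.versions):
            raise ValueError(f"{model_name}:{version} is already registered")
        mv = ModelVersion(version, ModelArtifact(uri, model_format), dict(metadata))
        model.versions.append(mv)
        return mv

    def latest(self, model_name: str) -> Optional[ModelVersion]:
        """Most recently registered version, or None if the model is unknown."""
        model = self._models.get(model_name)
        return model.versions[-1] if model and model.versions else None


if __name__ == "__main__":
    reg = InMemoryRegistry()
    reg.register("fraud-detector", "v1", "s3://models/fraud/1", "sklearn", author="alice")
    reg.register("fraud-detector", "v2", "s3://models/fraud/2", "sklearn")
    print(reg.latest("fraud-detector"))
```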

KServe (formerly KFServing)
• KServe is a standard model inference platform on Kubernetes, built for highly scalable use cases.
• Provides a performant, standardized inference protocol across ML frameworks.
• Supports modern serverless inference workloads with autoscaling, including scale to zero on GPU.
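A minimal `InferenceService` (API `serving.kserve.io/v1beta1`) needs little more than a model format and a storage URI; setting the predictor's `minReplicas` to 0 is what allows scale to zero, which relies on KServe's serverless (Knative-based) deployment mode. The sketch below builds that manifest plus a request for KServe's V1 inference protocol (`POST /v1/models/<name>:predict` with an `instances` list). The model name, bucket path, replica bounds, and feature values are placeholders.

```python
import json
from typing import Sequence, Tuple


def inference_service(name: str, namespace: str, storage_uri: str,
                      model_format: str = "sklearn",
                      min_replicas: int = 0, max_replicas: int = 3) -> dict:
    """Build an InferenceService manifest (serving.kserve.io/v1beta1)."""
    return {
        "apiVersion": "serving.kserve.io/v1beta1",
        "kind": "InferenceService",
        "metadata": {"name": name, "namespace": namespace},
        "spec": {
            "predictor": {
                "minReplicas": min_replicas,  # 0 allows scale to zero (serverless mode)
                "maxReplicas": max_replicas,
                "model": {
                    "modelFormat": {"name": model_format},
                    "storageUri": storage_uri,  # placeholder location
                },
            }
        },
    }


def v1_predict_request(model_name: str,
                       rows: Sequence[Sequence[float]]) -> Tuple[str, str]:
    """Return (path, JSON body) for a V1-protocol prediction request."""
    path = f"/v1/models/{model_name}:predict"
    body = json.dumps({"instances": [list(r) for r in rows]})
    return path, body


if __name__ == "__main__":
    print(json.dumps(inference_service("sklearn-iris", "team-a",
                                       "gs://example-bucket/iris-model"), indent=2))
    print(v1_predict_request("sklearn-iris", [[6.8, 2.8, 4.8, 1.4]]))
```

Because the request shape is the same whatever framework the model was trained with, clients do not need to change when the serving runtime behind `modelFormat` changes.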

External Add-Ons
• Elyra enables data scientists to visually create end-to-end machine learning (ML) workflows.
• Feast is an open-source feature store that helps teams operate ML systems at scale by allowing them to define, manage, validate, and serve features to models in production.
• Tools for Serving ML models in Kubeflow:
   o Seldon Core Serving
   o BentoML
   o MLRun Serving
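A central job of a feature store such as Feast is serving feature values as they were at a given moment, so that training data does not leak information from the future (a "point-in-time correct" lookup). The stdlib sketch below shows only that lookup rule, on made-up data: for an (entity, event time) pair it returns the most recent feature value whose timestamp is not later than the event. It is not the Feast API, which exposes this behavior through methods such as `get_historical_features`; the `PointInTimeStore` class and the example feature are hypothetical.

```python
from bisect import bisect_right
from collections import defaultdict
from datetime import datetime
from typing import Dict, List, Optional, Tuple

# (entity_id, timestamp, value) rows for one hypothetical feature, e.g. avg_txn_amount
FeatureRow = Tuple[str, datetime, float]


class PointInTimeStore:
    def __init__(self, rows: List[FeatureRow]) -> None:
        by_entity: Dict[str, List[Tuple[datetime, float]]] = defaultdict(list)
        for entity, ts, value in rows:
            by_entity[entity].append((ts, value))
        # Keep each entity's history sorted by timestamp for binary search
        self._times: Dict[str, List[datetime]] = {}
        self._values: Dict[str, List[float]] = {}
        for entity, history in by_entity.items():
            history.sort(key=lambda tv: tv[0])
            self._times[entity] = [t for t, _ in history]
            self._values[entity] = [v for _, v in history]

    def as_of(self, entity: str, event_time: datetime) -> Optional[float]:
        """Latest value with timestamp <= event_time, or None if none exists."""
        times = self._times.get(entity)
        if not times:
            return None
        idx = bisect_right(times, event_time)  # number of timestamps <= event_time
        return self._values[entity][idx - 1] if idx else None


if __name__ == "__main__":
    store = PointInTimeStore([
        ("user-1", datetime(2024, 1, 1), 10.0),
        ("user-1", datetime(2024, 1, 5), 20.0),
    ])
    # A training example labeled on Jan 4 must not see the Jan 5 value
    print(store.as_of("user-1", datetime(2024, 1, 4)))
```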

Use Cases and Examples
• Deploying and managing complex ML systems at scale
• Research and development with various machine learning models
• Hybrid and multi-cloud machine learning workloads
• Hyperparameter tuning and optimization

Alternatives to Kubeflow
• Public Cloud MLOps Tools:
   o Amazon SageMaker: Simplest to get started with, as it is a fully managed service with built-in capabilities.
   o GCP Vertex AI: Provides a unified interface, making it relatively simpler to manage end-to-end ML workflows.
   o Azure Machine Learning (Azure ML): User-friendly and fully managed, making it simpler to start with.
• Open Source MLOps Tools for Kubernetes:
   o MLflow: Simplest among the open-source tools, providing a lightweight library for managing the ML lifecycle.
   o Flyte: Created by Lyft. Medium complexity; while powerful, it can be more complex to set up and configure.
   o Airflow: Medium complexity; widely used for orchestrating complex workflows, with good support for Kubernetes.
   o Metaflow: Originally developed at Netflix. Medium to high complexity, especially when integrating with Kubernetes.

Questions?

Thank you!