Machine learning operations model book mlops

Prof. Dr. Jan Kirenz
Machine Learning Operations (MLOps)
Usage of Pipelines in the ML Lifecycle with
TensorFlow Extended (TFX) and Kubeflow
Prof. Dr. Jan Kirenz
HdM Stuttgart

The Proof of Concept (PoC) Factory
80-85%: PoC Factory
Most companies...
● ... conduct AI experiments and pilots but achieve a low scaling success rate
● ... underinvest significantly, yielding low returns
Source: Accenture (2019) https://www.accenture.com/us-en/insights/artificial-intelligence/ai-investments

Source: https://www.gartner.com/smarterwithgartner/gartner-top-10-data-and-analytics-trends-for-2021/

Scalable AI

ML Project Code
The problem with scaling AI: ML code is only a fraction (5-10%) of the code in a production-ready ML project.

Hidden technical debt in machine learning systems
The ML code is surrounded by the other components of a production ML system:
● Configuration
● Data Collection
● Feature Engineering
● Data Verification
● Metadata Management
● Model Analysis
● Machine Resource Management
● Process Management
● Testing and Debugging
● Automation
● Serving Infrastructure
● Monitoring
Source: Sculley, D. et al. (2015). Hidden technical debt in machine learning systems. Advances in Neural Information Processing Systems, 28, pp. 2503-2511

Machine learning operations (MLOps)
● ML engineering culture and practice that aims at unifying ML system development (Dev) and ML system operations (Ops)
● Tools and principles to support workflow standardization and automation throughout the ML system lifecycle (e.g. with pipelines)

Machine learning lifecycle

Lifecycle of an ML System
Plan | Data | Model | Deployment
Roles: Business Analyst, Data Engineer, Data Scientist, Software Developer
● Plan: identify use case, frame problem, identify variables, define metrics
● Data: data ingestion, analyze & clean data, define schema, feature engineering, data splitting, anomaly detection, data preprocessing
● Model: select algorithm, model training & tuning, evaluate model
● Deployment: validate model, deploy model, serve model, monitor model, retrain triggers

Common issues that lead to a PoC-to-production gap
● Lack of reuse, and duplication
● Inconsistency (data, code, models)
● Manual and slow transition from PoC to production

MLOps components
The lifecycle of an ML system (Plan | Data | Model | Deployment) is supported by four MLOps components:
● Data and feature management: feature store
● Model management: model registry
● Pipeline management: pipeline orchestration
● Metadata management: metadata store
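
To make two of these components more concrete, here is a minimal in-memory sketch of a metadata store (which records what each run produced from which inputs) and a model registry (which versions models and tracks their stage). The class names, method names and stage labels are hypothetical and chosen for illustration only; real tools such as ML Metadata have their own APIs.

```python
from dataclasses import dataclass, field


@dataclass
class MetadataStore:
    """Hypothetical in-memory store: which run produced which artifact."""
    records: list = field(default_factory=list)

    def log(self, run_id, artifact, inputs):
        self.records.append(
            {"run_id": run_id, "artifact": artifact, "inputs": list(inputs)}
        )

    def lineage(self, artifact):
        """Walk back from an artifact to everything it was derived from."""
        for rec in self.records:
            if rec["artifact"] == artifact:
                upstream = []
                for parent in rec["inputs"]:
                    upstream.append(parent)
                    upstream.extend(self.lineage(parent))
                return upstream
        return []  # no record: treated as a source artifact


@dataclass
class ModelRegistry:
    """Hypothetical registry: versioned models with a stage label."""
    versions: dict = field(default_factory=dict)

    def register(self, name, artifact):
        entries = self.versions.setdefault(name, [])
        version = len(entries) + 1
        entries.append({"version": version, "artifact": artifact, "stage": "staging"})
        return version

    def promote(self, name, version):
        # Simplification: promoting one version archives all other versions.
        for entry in self.versions[name]:
            entry["stage"] = "production" if entry["version"] == version else "archived"


store = MetadataStore()
store.log("run-1", "features-v1", ["raw-data-v1"])
store.log("run-1", "model-v1", ["features-v1"])

registry = ModelRegistry()
v = registry.register("churn", "model-v1")
registry.promote("churn", v)

print(store.lineage("model-v1"))
print(registry.versions["churn"])
```

The lineage lookup is what lets you answer "which data produced the model that is in production?", which addresses the inconsistency problem between data, code and models.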

What is a pipeline?
● Description of an ML workflow
● A pipeline component is a self-contained set of user code that performs one step in the pipeline
● Includes the definition of the configuration and inputs required to run the pipeline (e.g. model hyperparameters)
● The workflow is also called a directed acyclic graph (DAG)
[Figure: a pipeline as a chain of components (Start → "do this" → "then that" → ... → "the end"), covering the complete workflow of the ML system lifecycle]
Source: Baer & Ngahane (2019)
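
As a rough illustration of the DAG idea, the sketch below models a pipeline as a mapping from each component to the components it depends on, and runs them in dependency order with Python's standard-library `graphlib` (Python 3.9+). The component functions, the `run_pipeline` helper and the configuration keys are hypothetical; this is not how TFX or Kubeflow define pipelines.

```python
from graphlib import TopologicalSorter


# Hypothetical components: each is a self-contained function that performs
# one step and reads/writes a shared dict of artifacts.
def ingest(artifacts, config):
    artifacts["raw"] = [1.0, 2.0, 3.0, 4.0]


def transform(artifacts, config):
    artifacts["features"] = [x * config["scale"] for x in artifacts["raw"]]


def train(artifacts, config):
    # "Training" is just computing a mean, to keep the sketch tiny.
    artifacts["model"] = sum(artifacts["features"]) / len(artifacts["features"])


def evaluate(artifacts, config):
    artifacts["error"] = abs(artifacts["model"] - config["target"])


COMPONENTS = {
    "ingest": ingest,
    "transform": transform,
    "train": train,
    "evaluate": evaluate,
}

# The DAG: component -> set of components that must run before it.
DAG = {
    "ingest": set(),
    "transform": {"ingest"},
    "train": {"transform"},
    "evaluate": {"train"},
}


def run_pipeline(dag, components, config):
    """Run every component once, in an order that respects the DAG."""
    artifacts = {}
    order = list(TopologicalSorter(dag).static_order())
    for name in order:
        components[name](artifacts, config)
    return order, artifacts


# The configuration (here: scale and target) is passed in with the pipeline run.
order, artifacts = run_pipeline(DAG, COMPONENTS, {"scale": 2.0, "target": 5.0})
print(order)
print(artifacts["error"])
```

Because the graph is acyclic, an execution order always exists; `TopologicalSorter` raises an error if a cycle is introduced.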

TensorFlow Extended (TFX)
● Google production-scale machine learning (ML) platform based on TensorFlow
● Portable to multiple environments (Azure, AWS, Google Cloud, IBM, ...)
● Python-based toolkit; can be used with notebooks
● Helps you orchestrate your ML process with Apache Airflow, Apache Beam or Kubeflow Pipelines
Source: TensorFlow (2021)

TFX 1.0 (19 May 2021)
● Enterprise-grade support
● Security patches and select bug fixes for up to three years
● Guaranteed API and artifact backward compatibility
Source: Google (2021)

TFX components in the lifecycle of an ML system
Plan | Data | Model | Deployment
● Data ingestion and data splitting: ExampleGen
● Analyze data, define schema, anomaly detection: StatisticsGen, SchemaGen, ExampleValidator (TF Data Validation, TFDV)
● Feature engineering and data preprocessing: Transform (TensorFlow Transform, TFT)
● Model training & tuning: Trainer (TensorFlow, TF), Tuner (KerasTuner)
● Evaluate and validate model: Evaluator (TensorFlow Model Analysis, TFMA), InfraValidator
● Deploy and serve model: Pusher, BulkInferrer, Model Server (HUB / JS / LITE / SERVING)
● Metadata store: ML Metadata (MLMD)
● Pipeline orchestration: TFX options for pipeline orchestration
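
The data validation steps (compute statistics, derive a schema, check new examples against it) can be sketched conceptually in plain Python. This is explicitly not the TFDV or TFX API: the function names, the schema layout and the example rows are all hypothetical, and the rules are far simpler than what TFDV checks.

```python
# Conceptual sketch only: plain Python, NOT the TFDV or TFX API.

def compute_statistics(rows):
    """Per-feature summary: observed Python types and min/max for numbers."""
    stats = {}
    for row in rows:
        for name, value in row.items():
            s = stats.setdefault(name, {"types": set(), "min": None, "max": None})
            s["types"].add(type(value).__name__)
            if isinstance(value, (int, float)):
                s["min"] = value if s["min"] is None else min(s["min"], value)
                s["max"] = value if s["max"] is None else max(s["max"], value)
    return stats


def infer_schema(stats):
    """Turn statistics into expectations: one type and an optional range."""
    schema = {}
    for name, s in stats.items():
        # Simplification: if several types were seen, keep the first by name.
        schema[name] = {"type": sorted(s["types"])[0], "min": s["min"], "max": s["max"]}
    return schema


def validate_examples(rows, schema):
    """Return a list of human-readable anomalies found in new data."""
    anomalies = []
    for i, row in enumerate(rows):
        for name, expected in schema.items():
            if name not in row:
                anomalies.append(f"row {i}: missing feature '{name}'")
                continue
            value = row[name]
            actual = type(value).__name__
            if actual != expected["type"]:
                anomalies.append(
                    f"row {i}: '{name}' has type {actual}, expected {expected['type']}"
                )
            elif expected["min"] is not None and not (
                expected["min"] <= value <= expected["max"]
            ):
                anomalies.append(
                    f"row {i}: '{name}'={value} outside [{expected['min']}, {expected['max']}]"
                )
    return anomalies


training_rows = [{"age": 31, "country": "DE"}, {"age": 45, "country": "FR"}]
schema = infer_schema(compute_statistics(training_rows))

new_rows = [{"age": 38, "country": "DE"}, {"age": 120, "country": "DE"}, {"country": "IT"}]
anomalies = validate_examples(new_rows, schema)
for anomaly in anomalies:
    print(anomaly)
```

The design point carried over from the slide is the separation of concerns: statistics are computed once, the schema is a reviewable artifact, and validation of new data only needs the schema.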

TensorFlow Lite is a set of tools that enables on-device machine learning by helping developers run their models on mobile, embedded, and IoT devices.

● Data preparation phase: automatically ingest, validate and transform data and provide features to models
● Development phase: run the ML experiment as a pipeline instead of manually executing each step
● Production phase: automate the execution of the ML pipeline based on a schedule or certain triggering conditions
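
The production-phase idea of running the pipeline "based on a schedule or certain triggering conditions" can be sketched as a small decision function. The function name, the seven-day schedule and the 0.90 accuracy threshold are assumptions made for this example, not values from TFX or from the slides.

```python
from datetime import datetime, timedelta


def should_retrain(now, last_trained, live_accuracy, *,
                   max_age=timedelta(days=7), min_accuracy=0.90):
    """Return the reason a pipeline run should start, or None.

    Hypothetical rules: retrain on a fixed schedule (model older than
    max_age) or when monitored accuracy falls below min_accuracy.
    """
    if now - last_trained >= max_age:
        return "schedule"
    if live_accuracy < min_accuracy:
        return "accuracy_drop"
    return None


last_trained = datetime(2021, 6, 1)

fresh = should_retrain(datetime(2021, 6, 3), last_trained, 0.95)
stale = should_retrain(datetime(2021, 6, 9), last_trained, 0.95)
degraded = should_retrain(datetime(2021, 6, 3), last_trained, 0.80)

print(fresh, stale, degraded)
```

In practice an orchestrator would evaluate a condition like this and start the pipeline run; the sketch only shows the decision itself.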

Pipeline orchestration

TFX & Apache Airflow
● Programmatically author, schedule and monitor workflows with Python code.
● User interface to visualize pipelines running in production, monitor progress, and troubleshoot issues.

TFX & Apache Beam
● Provides a framework for batch and streaming data processing jobs that run on a variety of runners (Spark, Flink, ...).
● Beam provides an abstraction layer that enables TFX to run on any supported runner without code modifications.
● TFX only uses the Beam Python API.

TFX & Kubeflow Pipelines
The Kubeflow Pipelines platform consists of:
● An engine for scheduling multi-step ML workflows (using Kubernetes).
● A user interface (UI) for managing and tracking experiments, jobs, and runs.
● A Python SDK for defining and manipulating pipelines and components.
● Notebooks for interacting with the system using the SDK.
Kubeflow Pipelines is available as a core component of Kubeflow or as a standalone installation.

Kubeflow
ML toolkit for Kubernetes

Google’s Vertex AI
Launched in May 2021

ML Pipelines | wrap-up
Source: TensorFlow (2021)
By using an ML pipeline, you can:
● Automate your ML process, which lets you regularly retrain, evaluate, and deploy your model.
● Utilize distributed compute resources for processing large datasets and workloads.
● Increase the velocity of experimentation by running a pipeline with different sets of hyperparameters.
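
The last point, running one pipeline with different sets of hyperparameters, can be sketched as a simple grid sweep. The `run_pipeline` function is a hypothetical stand-in for a full pipeline run (here a toy gradient descent), and the grid values are arbitrary.

```python
from itertools import product


def run_pipeline(learning_rate, epochs):
    """Hypothetical stand-in for one full pipeline run; returns a loss.

    Toy objective: gradient descent on f(w) = (w - 3)^2 starting at w = 0.
    """
    w = 0.0
    for _ in range(epochs):
        w -= learning_rate * 2 * (w - 3)
    return (w - 3) ** 2


grid = {"learning_rate": [0.01, 0.1, 0.5], "epochs": [5, 20]}

# One pipeline run per combination of hyperparameters.
results = []
for lr, epochs in product(grid["learning_rate"], grid["epochs"]):
    loss = run_pipeline(lr, epochs)
    results.append({"learning_rate": lr, "epochs": epochs, "loss": loss})

best = min(results, key=lambda r: r["loss"])
print(best)
```

Since every run goes through the same pipeline definition, the runs differ only in their configuration, which keeps experiments comparable.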
To learn more, visit the tutorials at:
https://kirenz.github.io/
MLOps tutorials on how to:
● Install TF and TFX
● Build your first TFX pipeline
● Install Kubeflow
● Build your first Kubeflow pipeline

Jan Kirenz
www.kirenz.com

Backup

Continuous Integration and Continuous Delivery pipeline for an ML/AI project in Microsoft Azure
● CI/CD and source code management
● Cloud-based machine learning service
Source: Microsoft (2021) https://github.com/Microsoft/MLOpsPython
Source: Microsoft (2021) https://docs.microsoft.com/en-us/azure/architecture/reference-architectures/ai/mlops-python

End of Demo
