MLflow: A Platform for Production Machine Learning

matei 1,027 views 15 slides Dec 14, 2019

About This Presentation

Presentation about MLflow and the ML Platforms / MLOps class of software systems at the NeurIPS 2019 ML Systems workshop.


Slide Content

MLflow: A Platform for
Production Machine Learning
Matei Zaharia
Databricks and Stanford University
@matei_zaharia

ML in Production is Different from ML Research
ML Research & Courses
• Focus: designing a good model
• Data is provided and ready to use (e.g. benchmark dataset)
• No need to deploy, monitor, retrain
• Tools for model design & evaluation (e.g. TensorFlow, PyTorch, …)
ML Products
• Focus: reliably solving a business problem
• Data is often the top challenge (for models, try many common ones)
• Must continuously deploy, monitor & retrain models to maintain quality
• Need new tools to enable this process! (reproducibility, monitoring, …)

Response: ML Platforms
Facebook FBLearner, Uber Michelangelo, Google TFX, …
+ Standardize the data prep / training / deploy cycle: if you work within the platform, you get these!
– Limited to a few algorithms or frameworks
– Tied to each company’s infrastructure
Can we provide similar benefits in an open manner?

MLflow: Open source machine learning platform
• Works with any ML library, algorithm, language, etc.
• Open interface design (use with any code you already have)
Tracking: Record and query experiments: code, data, confs, results
Projects: Packaging format for reproducible runs and workflows
Models: General format that standardizes deployment paths
Model Registry (new): Centralized model management, review & sharing

Community
158 contributors from >50 companies
• Integrated in RStudio, Azure ML, Faculty.ai, Neptune, Splice
900k downloads/month on PyPI

MLflow Tracking
Track parameters, metrics, output files & code version
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)
mlflow.log_param("data_file", file)
mlflow.log_param("n", n)
mlflow.log_param("learning_rate", lr)
mlflow.log_metric("score", score)
mlflow.keras.log_model(model)
$ mlflow ui

MLflow Tracking
Track parameters, metrics, output files & code version
data = load_text(file)
ngrams = extract_ngrams(data, N=n)
model = train_model(ngrams, learning_rate=lr)
score = compute_accuracy(model)
mlflow.log_param("data_file", file)
mlflow.log_param("n", n)
mlflow.log_param("learning_rate", lr)
mlflow.log_metric("score", score)
mlflow.keras.log_model(model)
$ mlflow ui
mlflow.keras.autolog()

MLflow UI: Inspecting Runs

MLflow Model Registry
GitHub-like environment for organizing & reviewing models
[Diagram labels: Model Developer; Model Registry; Downstream Users; REST Serving; Reviewers, CI/CD Tools]

Released in MLflow 1.4

Interesting MLflow Use Cases
1) Massive number of independent models
• Company wants to train a separate model for each {facility, chemical processing machine, household, …}
• Solution: large Spark job that runs an AutoML library for each task + MLflow for managing & selecting models
• ML scientists can’t look at each model ⇒ need hands-free ML!

Example:
Millions of models trained on terabytes of data/day
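
The "hands-free" idea in use case 1 (one model per task, chosen automatically by a validation metric) can be illustrated at toy scale. This is only a sketch under stated assumptions: the readings, the two constant-prediction "models" (`mean` and `median`), and the helper names are hypothetical, and plain Python stands in for the Spark job and AutoML library mentioned on the slide.

```python
from statistics import mean, median

# Hypothetical per-task data: (training values, validation values).
data = {
    "facility_a": ([10.0, 12.0, 9.0, 10.0], [10.5, 11.0]),
    "facility_b": ([1.0, 1.0, 1.0, 50.0], [1.0, 1.2]),
}

# Candidate "models": each maps training values to one constant prediction.
candidates = {"mean": mean, "median": median}


def mae(prediction, actual):
    """Mean absolute error of a constant prediction on held-out values."""
    return mean(abs(prediction - y) for y in actual)


def select_model(train, valid):
    """Fit every candidate and return (name, error) of the best one."""
    scores = {name: mae(fit(train), valid) for name, fit in candidates.items()}
    best = min(scores, key=scores.get)
    return best, scores[best]


# One independent selection per task, with no human in the loop.
selected = {
    task: select_model(train, valid) for task, (train, valid) in data.items()
}
for task, (name, error) in selected.items():
    print(task, name, error)
```

In the setting the slide describes, each task's chosen model and its metric would be logged to MLflow rather than kept in a dictionary, so that selection and later review do not depend on anyone inspecting individual models.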

Interesting MLflow Use Cases
2) Big data analytics on model training results
• ML developer wants to analyze the result of multiple runs interactively, possibly slicing across data points
• Solution: Pandas & SQL interfaces to MLflow tracking data
df = mlflow.search_runs(experiment_ids=[experiment_id], filter_string="metrics.loss < 2.5")

Conclusion
Turning ML into reliable products is hard and requires a new class of systems (ML Platforms)
Try MLflow at mlflow.org
Join the MLOps workshop at MLSys 2020