CNCF-Istanbul-MLOps for Devops Engineers.pptx

cansukavili1 220 views 29 slides Oct 16, 2024

Slide Content

Bridging DevOps and MLOps: A Practitioner’s Guide
CNCF Istanbul, October 2024

☘️ Introduction
Cansu Kavili Örnek
github.com/ckavili
[email protected]

🏃‍♀️ Let’s start
DevOps, Data Science

🥜 DevOps in a Nutshell
Plan & Code
Build & Test
Release & Deploy
Monitor & Learn

🤖 What is MLOps?
The practice of deploying machine learning models into production reliably and efficiently.

🥰 Why MLOps?
Because many of the challenges facing developers also apply to data scientists.

🙊 Common Challenges
Siloed organization and poor communication between teams
“Works on my machine” 🤷🏻‍♀️
Lacking the ability to properly test, deploy, and maintain software
Not having access to decision makers
Verifying that the model/feature you deploy is still relevant
Reproducibility, traceability and explainability
...

🦄 MLOps Overview
The Machine Learning Lifecycle
Stages: Gather and Prep Data, Training, Deployment, Monitoring
Disciplines: DataOps, Experimentation, MLOps
Data Engineering: Data Ingestion, Data Cleansing, Data Analysis, Data Transformation, Data Validation
Data Science: Data Splitting, Feature Engineering, Model Development, Model Training, Training Optimization, Model Validation
Continuous Integration & Deployment: Data Preprocessing, App Dev / Heuristics, Inferencing Pipeline, Deployment Targets, Deployment Patterns
Monitor / alerts: Consumption & optimization metrics, Satisficing (Gating) metric, Logging & Visualization, Explainability, Interpolation, Drift, Decay, Skew, Shift, Improvements

🦄 MLOps Overview
Operationalizing AI/ML requires collaboration
Roles: App developer, ML platform engineer, Data engineer, Data scientist, ML engineer, Business leadership
Activities: Set goals, Gather and prepare data, Training, Monitoring, Deployment
Every member of your team plays a critical role in a complex process.

🦄 MLOps Overview
The Machine Learning Lifecycle
codifying problem and metrics
data collection and cleaning
feature engineering
model training and tuning
model validation
model deployment
monitoring, validation

🦄 MLOps Overview
The Machine Learning Lifecycle
Gather and prepare data, Develop model, Deploy model, Monitor model, Retrain model

🦄 MLOps Overview
We’ve seen this before...
ML lifecycle: Gather and prepare data, Develop model, Deploy model, Monitor model, Retrain model
DevOps lifecycle: Code, QA, Deploy, Operate & monitor, Iterate

🦄 MLOps Overview
We’ve seen this before...
ML lifecycle: codifying problem and metrics, data collection and cleaning, feature engineering, model training and tuning, model validation, model deployment, monitoring, validation
Software lifecycle: bugfix or feature request, architecture design, develop, test, UAT, production, monitoring

Data Science Essentials

🔥 Data Science Essentials
What is a model really?
In a nutshell, it is a set of parameters plus the algorithm or neural network architecture, which can be packaged in a single (usually binary or compressed) file.
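
To make "parameters plus algorithm in a single file" concrete, here is a minimal sketch using only the Python standard library. The linear model, the `ALGORITHMS` registry, the function names and the `model.json.gz` file name are all hypothetical illustrations, not something taken from the talk; real frameworks use their own formats.

```python
import gzip
import json
import tempfile
from pathlib import Path


def predict_linear(params, x):
    """Toy algorithm (an assumption for illustration): y = weight * x + bias."""
    return params["weight"] * x + params["bias"]


# Hypothetical registry that maps an algorithm name stored in the file to code.
ALGORITHMS = {"linear": predict_linear}


def save_model(path, algorithm, params):
    """Package the algorithm name and its parameters into one compressed file."""
    payload = json.dumps({"algorithm": algorithm, "parameters": params})
    Path(path).write_bytes(gzip.compress(payload.encode("utf-8")))


def load_model(path):
    """Read the single file back and return a callable that makes predictions."""
    raw = gzip.decompress(Path(path).read_bytes()).decode("utf-8")
    artifact = json.loads(raw)
    predict_fn = ALGORITHMS[artifact["algorithm"]]
    params = artifact["parameters"]
    return lambda x: predict_fn(params, x)


def demo():
    """Round-trip a toy model through a temporary file and predict once."""
    with tempfile.TemporaryDirectory() as tmp:
        artifact_path = Path(tmp) / "model.json.gz"
        save_model(artifact_path, "linear", {"weight": 2.0, "bias": 1.0})
        model = load_model(artifact_path)
        return model(3.0)
```

Calling `demo()` writes the artifact, loads it again and predicts for the input `3.0`. The point of the sketch is only that the file is enough to reconstruct a working predictor.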

🔥 Data Science Essentials
Model Training Overview
Raw Data → Labeled Data (Square, Pentagon, Triangle) → Training Data and Test Data
Model Training → Model + Model Artifact
Model Evaluation: 87% Accuracy, .82 R-square, .032 MSE
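
The flow on this slide (labeled data, a training/test split, training, then evaluation) can be sketched in a few lines of standard-library Python. Everything here is an assumption for illustration: the feature (number of sides), the majority-vote "model" and the function names are hypothetical, and the metric values on the slide are not reproduced.

```python
import random
from collections import Counter, defaultdict


def split_data(rows, test_fraction=0.25, seed=0):
    """Shuffle labeled rows and split them into (training, test) lists."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    return shuffled[n_test:], shuffled[:n_test]


def train(rows):
    """Toy training step: remember the most common label per number of sides."""
    votes = defaultdict(Counter)
    for sides, label in rows:
        votes[sides][label] += 1
    return {sides: counts.most_common(1)[0][0] for sides, counts in votes.items()}


def accuracy(model, rows):
    """Fraction of rows whose predicted label matches the true label."""
    correct = sum(1 for sides, label in rows if model.get(sides) == label)
    return correct / len(rows)


def demo():
    """Label, split, train and evaluate on a tiny synthetic shape dataset."""
    labeled = [(3, "Triangle"), (4, "Square"), (5, "Pentagon")] * 4
    training_rows, test_rows = split_data(labeled)
    model = train(training_rows)
    return accuracy(model, test_rows)
```

The test rows are held back from training so that the accuracy says something about unseen data, which is the reason the slide separates Training Data from Test Data.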

🔥 Data Science Essentials
Model Serving
Model → Model in a Container
Clients → API: Input in, Prediction out
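
As a rough sketch of the serving pattern on this slide, the snippet below wraps a stand-in model behind a small HTTP API using Python's built-in `http.server`. The `predict` lookup, the JSON field names `sides` and `prediction`, and port `8080` are assumptions made for illustration; a production setup would typically use a dedicated model server inside the container.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer


def predict(sides):
    """Hypothetical stand-in for a loaded model artifact."""
    return {3: "Triangle", 4: "Square", 5: "Pentagon"}.get(sides, "Unknown")


def handle_payload(body):
    """Turn a JSON request body into a JSON response body (both as bytes)."""
    request = json.loads(body)
    response = {"prediction": predict(request["sides"])}
    return json.dumps(response).encode("utf-8")


class PredictionHandler(BaseHTTPRequestHandler):
    """Accepts POST requests carrying the model input and returns a prediction."""

    def do_POST(self):
        length = int(self.headers.get("Content-Length", 0))
        result = handle_payload(self.rfile.read(length))
        self.send_response(200)
        self.send_header("Content-Type", "application/json")
        self.send_header("Content-Length", str(len(result)))
        self.end_headers()
        self.wfile.write(result)


def serve(port=8080):
    """Blocking entry point, e.g. the command a container image would run."""
    HTTPServer(("0.0.0.0", port), PredictionHandler).serve_forever()
```

Keeping `handle_payload` separate from the HTTP handler means the prediction logic can be tested without starting a server.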

🔥 Data Science Essentials
What about LLMs?

ML Platform Engineer

🐈 ML Platform Engineer
Clark Kent, ML Platform Engineer
[email protected] | (123) 456-7890 | Metropolis, NY
SKILLS: Kubernetes, Day Two Ops, GPU Management, GitOps, Containers, Python Package Management
CAREER OBJECTIVE: To enable data science teams to scale development and experimentation of data science projects using Kubernetes-native tooling, so they can prove value with machine learning more rapidly.
WORK EXPERIENCE:
Created a GPU-enabled Kubernetes cluster so that multiple data science teams could collaborate and rapidly iterate on data science experiments. Resulted in a shorter time from experiment idea to proof of value and an increase in the number of value-adding experiments.
Managed a large multi-tenant environment and provided best practices for managing shared resources for large, distributed-compute ML training jobs, including GPU, CPU, and large memory pools. This effort resulted in increased resource utilization and faster training times for ML jobs.
Created multiple Jupyter Notebook images with team-specific Python packages to increase collaboration and reduce the number of Python dependency issues.
Established a multi-cluster architecture to enable training and deployment of ML models alongside existing non-ML microservices.

🐈 ML Platform Engineer
Multi Cluster Architecture
Clusters: ML Training Cluster, Application Cluster
Phases: Experiment, Deployment
In both clusters: Multi-Tenant Projects, GitOps Management, Cluster Monitoring, S3 Compatible Storage
Other components: Accelerated Compute, IDE, Distributed ML Training, ML Pipelines, Model Serving, Model Monitoring, Model Explainability

ML Engineer

🐈‍⬛ ML Engineer
Lois Lane, ML Engineer
[email protected] | (123) 987-6543 | Metropolis, NY
SKILLS: Kubernetes, GitOps, CI/CD Automation, Python, Testing, REST/gRPC APIs, Observability
CAREER OBJECTIVE: Help businesses evolve ML experiments into production-ready inference services by creating repeatable pipelines while building trust in ML services.
WORK EXPERIENCE:
Helped a data scientist transform an ML experiment into a production-ready model and deploy it as an API endpoint that can be consumed as a microservice. Enabled the data science team to realize the value of the experiment and get the model out of Jupyter.
Created a repeatable pipeline to orchestrate training and deployment of an ML model as a REST API. Resulted in rapid iteration on the ML model, enabling increased accuracy of predictions.
Created robust ML testing to ensure code changes result in accurate ML models, and to validate deployed models using blue/green deployment strategies. Resulted in fewer rolled-back models and improved model accuracy against production results.
Created observability with dashboards and alerts on model performance in relation to inference time, accuracy of predictions, and impact on business objectives.

🐈‍⬛ ML Engineer
ML Pipeline
Build Training Container
Gather data
Process data
Train model
Download existing model
Compare new model with existing
Deploy new model if better
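
The pipeline steps on this slide can be sketched as plain Python functions chained together. This is only an illustration under stated assumptions: the inline dataset, the one-parameter model, the hard-coded "existing" model and the `deploy` callback are hypothetical stand-ins for real data sources, training code, a model registry and a deployment step. The container build step is left out.

```python
def gather_data():
    """Hypothetical data source: (x, y) pairs, one of them incomplete."""
    return [(1.0, 2.1), (2.0, 3.9), (3.0, 6.2), (4.0, 7.8), (None, 1.0)]


def process_data(rows):
    """Drop rows with missing values."""
    return [(x, y) for x, y in rows if x is not None and y is not None]


def train_model(rows):
    """Fit y ~ weight * x by least squares through the origin."""
    numerator = sum(x * y for x, y in rows)
    denominator = sum(x * x for x, _ in rows)
    return {"weight": numerator / denominator}


def download_existing_model():
    """Stand-in for fetching the currently deployed model from storage."""
    return {"weight": 1.5}


def mean_squared_error(model, rows):
    """Average squared prediction error of a model on the given rows."""
    return sum((model["weight"] * x - y) ** 2 for x, y in rows) / len(rows)


def run_pipeline(deploy):
    """Run all steps; call deploy(model) only if the new model is better."""
    rows = process_data(gather_data())
    training_rows, holdout_rows = rows[:-1], rows[-1:]  # keep one row unseen
    candidate = train_model(training_rows)
    existing = download_existing_model()
    candidate_error = mean_squared_error(candidate, holdout_rows)
    existing_error = mean_squared_error(existing, holdout_rows)
    if candidate_error < existing_error:
        deploy(candidate)
        return True
    return False
```

For example, `run_pipeline(print)` would print the new model only when it beats the existing one on the held-out row. Comparing both models on data neither was trained on is what makes the "deploy if better" gate meaningful.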

🐈‍⬛ ML Engineer
Python Package Landscape
Categories: Model Training, Data Tools, Package Management, Code Quality and Testing, Other ML Tools, Miscellaneous
Named packages: Poetry, Polars

Show, Not Tell

❤️ opendatahub.io
DevOps methodology for ML models. Operationalize CI/CD pipeline for ML.
github.com/opendatahub-io

❤️ opendatahub.io

📖 What’s in the box?
Scale distributed computing for AI: automatically adjust the underlying workers based on demand
Multi-user Jupyter: used for data science and research
Streamline the entire ML lifecycle: accelerate model development & deployment
Design pipelines with drag-and-drop ease: Kubeflow integration
Supports multiple ML frameworks: deploy and scale AI models quickly and efficiently
TrustyAI: AI explainability toolkit that aims to mitigate AI bias, enhancing trust and fairness in AI systems