Kubernetes & AI - Beauty and the Beast !?! @KCD Istanbul 2024
Tobias Schneck
51 slides
May 28, 2024
About This Presentation
As AI technology pushes into IT, I asked myself, as an “infrastructure container Kubernetes guy”: how can this fancy AI technology be managed from an infrastructure operations point of view? Is it possible to apply our beloved cloud native principles as well? What benefits could both technologies bring to each other?
Let me take these questions and guide you on a short journey through existing deployment models and use cases for AI software. Using practical examples, we discuss what cloud/on-premise strategy we may need to apply AI to our own infrastructure and get it to work from an enterprise perspective. I want to give an overview of infrastructure requirements and technologies, and of what could benefit or limit your AI use cases in an enterprise environment. An interactive demo will give you some insights into which approaches I have already got working for real.
Slide Content
Kubernetes & AI
Beauty and the Beast!?!
Tobias Schneck
Principal Architect
@toschneck
As Kubernetes folks
why should we care about AI?
… will it be the next big thing?!
“By 2028, the adoption of AI will culminate in over 50% of cloud compute resources devoted to AI workload, up from less than 10% in 2023.”
Gartner®, 2023
OK … so what about this AI thingy?
Drawings created with Excalidraw, thanks to Koray Oksay (@korayoksay) for the hint
… a lot of data and math for an infrastructure guy
… how does such data get computed?
Credits to Andrej Karpathy: awesome intro to LLMs
[1hr Talk] Intro to Large Language Models
What does our normal job look like?
Based on Adel Zaalouk (@ZaNetworker) drawings from the CNCF Cloud Native AI white paper
What will change in our infra?
So, why Kube?
Flexibility & Standardization
Container Types: Standard Container, High-Cube Container, Hardtop Container, Open Top Container, Flat Platform (Plat), Ventilated Container, Cooling Container, Bulk Container, Tank Container
Standardization with Kubernetes
Infrastructure Layer: Data Center I, Data Center II, Data Center III, Edge, Cloud Providers
IT Space I, IT Space II, IT Space III
App Services
Backend Services: DB Services, Analytics, Observability, Caches, AI / ML
DDoS Protect, Managed Services, Real Time Analysis, Intelligent Edge Devices, Smart Automation, Data Processing
Kube for AI ⇔ Kube for Applications
Kube is the de facto standard “cloud operating system”
✅ API abstraction layer for multiple types of network, storage, and compute resources
Standard interfaces that support DevOps best practices like GitOps
A variety of cloud providers and services is consumable via standard APIs
… and OpenAI was already using Kube in 2017
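To make the “API abstraction layer for ... compute resources” point concrete, here is a minimal sketch of how a GPU could be requested through the same Pod API used for any other workload. It assumes a cluster where the NVIDIA device plugin exposes the `nvidia.com/gpu` extended resource; the Pod name and image (`trainer`, `example.com/trainer:latest`) are hypothetical placeholders and not taken from the talk.

```python
import json


def gpu_pod_manifest(name: str, image: str, gpus: int = 1) -> dict:
    """Build a plain Pod manifest that asks the scheduler for `gpus` GPUs."""
    return {
        "apiVersion": "v1",
        "kind": "Pod",
        "metadata": {"name": name},
        "spec": {
            "restartPolicy": "Never",
            "containers": [
                {
                    "name": name,
                    "image": image,  # hypothetical image, replace with your own
                    # Extended resources such as GPUs are requested under "limits".
                    # Assumption: the NVIDIA device plugin is installed.
                    "resources": {"limits": {"nvidia.com/gpu": gpus}},
                }
            ],
        },
    }


manifest = gpu_pod_manifest("trainer", "example.com/trainer:latest", gpus=2)
print(json.dumps(manifest, indent=2))
```

The printed JSON could be saved to a file and applied like any other manifest, which is the point: the AI workload differs from a regular application only in the resource it requests.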
So …
How to build an AI Platform?
What does the ecosystem look like!?
AI Frameworks
Options to Adapt the Frameworks
Feature | “Kube Native” via Kubeflow | Managed Platforms (e.g., SageMaker) | Focused Tools (e.g., MLflow)
Scope | End-to-end MLOps platform | Managed MLOps service | Specific functionalities within ML lifecycle
Open Source | Yes | No | Yes
Scalability & Portability | High | Depends on cloud provider | Moderate
Setup & Management | Complex | Simpler | Simpler
Portability | Everywhere | Mostly Cloud | Mostly Machine based
Vendor Lock-in | No | Yes (to specific cloud provider) | No
AI Frameworks
Could use Kubeflow ⁉
Currently the most feature-complete choice for Kube
But setup is complex!
Kubeflow ⁉
The Beauty:
● Incubating CNCF project
● Serves an AI platform with multi-tenancy
● Popularity: 13.7k ⭐ ~ good chance of long-term maintenance
● Alternatives like MLflow / KServe are integrated
The Beast:
● Mostly vendor-specific installer instructions
○ No maintained automated installer for generic Kubernetes
○ Helm chart issue #3173
● Dependency “hell”
○ A lot of different 3rd-party dependency constraints
○ Hard to adapt to existing company defaults
● Only supports EOL Kubernetes <= 1.26 ❗
○ Usability in production is therefore questionable
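The version constraint above is easy to turn into a preflight check before attempting an install. The following is only a sketch: the upper bound is the “<= 1.26” stated on the slide, and the helper names (`parse_major_minor`, `within_supported_range`) are made up for illustration. The version string is assumed to look like the `gitVersion` a cluster reports, for example `v1.26.3`.

```python
import re

# Upper bound taken from the slide ("Kubernetes <= 1.26"); adjust as needed.
MAX_SUPPORTED = (1, 26)


def parse_major_minor(version: str) -> tuple[int, int]:
    """Extract (major, minor) from strings such as 'v1.26.3' or '1.25'."""
    match = re.match(r"v?(\d+)\.(\d+)", version)
    if match is None:
        raise ValueError(f"unrecognized Kubernetes version: {version!r}")
    return int(match.group(1)), int(match.group(2))


def within_supported_range(version: str, maximum: tuple[int, int] = MAX_SUPPORTED) -> bool:
    """Return True if the cluster version does not exceed the supported maximum."""
    return parse_major_minor(version) <= maximum


for candidate in ("v1.25.9", "v1.26.3", "v1.29.0"):
    print(candidate, within_supported_range(candidate))
```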
Sounds good, but what about on-prem / offline cases?
⚙ Separate Model Training / Model Usage Example
Infrastructure Layer: [Cloud] Data Center I, Data Center II, Data Center III, Edge, Cloud Providers
GPU / TPU Powered Services, based on Argo CD, AI Model Serving, [AI] Application Service, Application
Real Time Analysis, Intelligent Edge Devices, Smart Automation, Data Processing
Data Delivery, Model Export, Local AI Consume, Scale for Training
Vanilla Setup
Starting a POC
github.com/toschneck/kubernetes-and-ai
Kubeflow | Katib Architecture for Hyperparameter Tuning (aka optimization run)
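For readers new to the term, a hyperparameter tuning run boils down to: pick parameter values from a search space, run a trial, record the objective, keep the best. The sketch below illustrates that loop with a plain random search. It is not Katib's API; the `objective` function is a toy stand-in for a real training job, and the search space (learning rate, batch size) is an assumption chosen for illustration.

```python
import random


def objective(learning_rate: float, batch_size: int) -> float:
    """Toy stand-in for a training run; returns a loss to minimize."""
    return (learning_rate - 0.01) ** 2 * 1e4 + abs(batch_size - 64) / 64


def random_search(n_trials: int, seed: int = 0) -> dict:
    """Run `n_trials` random trials and return the best one found."""
    rng = random.Random(seed)  # seeded so repeated runs give the same trials
    best = None
    for trial in range(n_trials):
        params = {
            # Sample the learning rate on a log scale between 1e-4 and 1e-1.
            "learning_rate": 10 ** rng.uniform(-4, -1),
            "batch_size": rng.choice([16, 32, 64, 128]),
        }
        loss = objective(**params)
        if best is None or loss < best["loss"]:
            best = {"trial": trial, "loss": loss, **params}
    return best


best = random_search(n_trials=20)
print(best)
```

In a real setup each trial would be a training job on the cluster rather than a function call, which is where running the tuning on Kubernetes pays off: trials can run in parallel on whatever compute is available.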