AI/ML Infra Meetup | How Uber Optimizes LLM Training and Finetune
Alluxio
About This Presentation
AI/ML Infra Meetup
Mar. 06, 2025
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Chongxiao Cao (Senior SWE @ Uber)
Chongxiao Cao from Uber's Michelangelo training team shared valuable insights into Uber's approach to optimizing LLM training and fine-tuning workflows.
Size: 2.09 MB
Language: en
Added: Mar 11, 2025
Slides: 17 pages
Slide Content
How Uber Optimizes LLM Training and Finetune
Chongxiao Cao
3/6/2025
Seattle, WA
[Architecture diagram: Apache Kafka® and Apache Flink® stream data into the Data Lake; Apache Spark® performs ETL into the Realtime and Batch Feature Stores; Basis Features & Labels feed Model Training and the Model Store under Multi-level Orchestration; a Realtime Prediction Service and Batch Prediction Jobs (backed by the Data Lake) serve an RPC Client; stages: Get Data → Train Models → Deployment, with ML Observability across the Realtime and Batch paths.]
Michelangelo Overview
Rec Models vs LLMs
Source: Naumov et al., "Deep Learning Recommendation Model for Personalization and Recommendation Systems," 2019.
Source: Vaswani et al., "Attention Is All You Need," 2017.
1. Complexity from LLMs
   a. Require model parallelism support
2. Complexity from integration
   a. Storage: cloud, on-prem remote, Uber proprietary (TerraBlob), local
      i. TerraBlob: Uber's centralized solution for storing blob data
   b. Hardware: on-prem/cloud A100/H100 GPUs
   c. Open source libraries (details on next slide)
3. Agile development environment
   a. Accelerate dev and debug
Challenges of Supporting LLMs @ Uber
Build a docker image → Run a full pipeline → Validate results → Update infra code
A few building blocks for LLMOps Journey
● Open source components:
   a. Ray®
   b. Hugging Face®: Transformers, Accelerate, PEFT, Bitsandbytes
   c. PyTorch®
   d. Microsoft DeepSpeed®
   e. NVIDIA®: TensorRT, Triton Inference Server
   f. Dao AI Lab Flash Attention
   g. …
Our mission: provide a unified framework for Uber LLMOps, saving development effort.
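As a rough illustration of how some of the listed components fit together, here is a minimal parameter-efficient fine-tuning sketch combining Transformers, PEFT, and bitsandbytes. The checkpoint name, quantization settings, and LoRA hyperparameters are illustrative assumptions, not Uber's actual configuration.

# Minimal sketch: Transformers + PEFT + bitsandbytes for parameter-efficient
# fine-tuning. Model name and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed example checkpoint

# Load the base model in 4-bit via bitsandbytes to reduce GPU memory.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             quantization_config=bnb_config,
                                             device_map="auto")

# Attach LoRA adapters so only a small fraction of parameters is trained.
lora_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()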
LLM Training Architecture
[Architecture diagram: Ray sets up the distributed process group; workers download/train SOTA LLMs with model parallelism; communication over NVLink/NVSwitch/Ethernet/IB.]
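A minimal sketch of the "Ray sets up the distributed process group" step, using Ray Train's TorchTrainer to launch a training function across GPU workers. The worker count and the training body are placeholders, not Uber's implementation.

# Minimal sketch: Ray Train initializes the torch distributed process group
# on each worker before calling train_func.
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_func(config):
    import torch.distributed as dist
    # The NCCL process group is already set up, so collectives run over
    # NVLink/NVSwitch within a node and Ethernet/IB across nodes.
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    print(f"worker {rank}/{world_size} ready")
    # ... build the model, wrap with DDP/DeepSpeed, run the training loop ...

trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),  # placeholder size
)
result = trainer.fit()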
Ray on Kubernetes®
1. Federated scheduling over multiple k8s clusters
2. Elastic resource sharing within a cluster using hierarchical queues
3. Open source KubeRay operator for managing Ray clusters
Training Pipeline Design
● Training Pipeline   ● Model Parallelism
1. Data source:
   Cloud (e.g., GCS) / on-prem remote
   Public datasets (Hugging Face Datasets)
   TerraBlob (Uber's blob store)
2. Data format:
   Parquet/JSON
Distributed Data Loading
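A minimal sketch of distributed data loading with Ray Data for the sources and formats listed above. The bucket paths are placeholders, and TerraBlob access (which goes through Uber-internal tooling) is not shown.

# Minimal sketch: parallel reads of Parquet/JSON shards with Ray Data.
import ray

# Read Parquet shards directly from cloud storage (e.g. GCS) in parallel.
ds = ray.data.read_parquet("gs://example-bucket/llm-training-data/")  # placeholder path

# JSON(L) inputs are handled the same way:
# ds = ray.data.read_json("gs://example-bucket/llm-training-data-jsonl/")

def tokenize_batch(batch):
    # Placeholder: apply a Hugging Face tokenizer to batch["text"] here.
    return batch

# Tokenize in parallel across the cluster before feeding the trainer.
ds = ds.map_batches(tokenize_batch)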
Logging/Profiling
● Logging support
   ○ Comet® Logging
● Profiler support
   ○ PyTorch Profiler
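On the profiling side, a minimal PyTorch Profiler sketch; the schedule, output directory, and the placeholder training step are illustrative, not taken from the talk.

# Minimal sketch: profile a few training steps with the PyTorch Profiler.
import torch
from torch.profiler import profile, schedule, ProfilerActivity

prof_schedule = schedule(wait=1, warmup=1, active=3, repeat=1)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=prof_schedule,
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./profiler_logs"),
    profile_memory=True,
) as prof:
    for step in range(6):
        # training_step()  # placeholder for one optimizer step
        prof.step()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))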
1. Simple APIs
   a. Input data
   b. Model
   c. Output destination
2. Data parallelism
   a. Use Ray APIs
3. Model parallelism to handle large models such as llama-2-70B
4. Save prediction results to remote storage
Offline Model Prediction
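A minimal sketch of data-parallel offline prediction with Ray Data, mirroring the input/model/output API above. The paths, model name, batch size, and replica count are assumptions, and the model-parallel path needed for llama-2-70B is not shown.

# Minimal sketch: offline batch prediction with Ray Data actor replicas.
import numpy as np
import ray

ds = ray.data.read_parquet("gs://example-bucket/prediction-input/")  # placeholder path

class LLMPredictor:
    def __init__(self):
        from transformers import pipeline
        # One model replica per actor; Ray Data scales replicas for data parallelism.
        self.pipe = pipeline("text-generation",
                             model="meta-llama/Llama-2-7b-hf",  # placeholder model
                             device_map="auto")

    def __call__(self, batch):
        outputs = self.pipe(list(batch["prompt"]), max_new_tokens=64)
        batch["prediction"] = np.array([o[0]["generated_text"] for o in outputs])
        return batch

preds = ds.map_batches(LLMPredictor, concurrency=4, num_gpus=1, batch_size=16)

# Save prediction results back to remote storage.
preds.write_parquet("gs://example-bucket/prediction-output/")  # placeholder path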
1. Provision a Ray cluster with JupyterLab.
2. Edit training config files and code (for dev purposes) on the fly.
3. Reuse the same Ray cluster and submit multiple jobs.
4. Ray Dashboard logging and monitoring support.
Interactive Dev Environment
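A minimal sketch of reusing one long-lived Ray cluster for multiple submissions via the Ray Job Submission SDK. The head address, entrypoint script, and config file names are hypothetical.

# Minimal sketch: submit several jobs to the same Ray cluster without
# reprovisioning it.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://raycluster-head:8265")  # assumed head address

# Edited config files and code in working_dir are picked up on each submission.
for config_file in ["configs/run_a.yaml", "configs/run_b.yaml"]:  # placeholder configs
    job_id = client.submit_job(
        entrypoint=f"python train.py --config {config_file}",  # placeholder script
        runtime_env={"working_dir": "."},
    )
    print("submitted", job_id)

# Status and logs are also visible in the Ray Dashboard.
print(client.get_job_status(job_id))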
Uber Eng Blog (10/2024): Open Source and In-House: How Uber Optimizes LLM Training