AI/ML Infra Meetup | How Uber Optimizes LLM Training and Finetune
Alluxio
About This Presentation
AI/ML Infra Meetup
Mar. 06, 2025
Organized by Alluxio
For more Alluxio Events: https://www.alluxio.io/events/
Speaker:
- Chongxiao Cao (Senior SWE @ Uber)
Chongxiao Cao from Uber's Michelangelo training team shared valuable insights into Uber's approach to optimizing LLM training and fine-tuning workflows.
Size: 2.09 MB
Language: en
Added: Mar 11, 2025
Slides: 17 pages
Slide Content
How Uber Optimizes LLM Training and Finetune
Chongxiao Cao
3/6/2025
Seattle, WA
[Architecture diagram: Apache Kafka® and Apache Flink® stream data into the Data Lake; Apache Spark® performs ETL into the Realtime and Batch Feature Stores; Basis Features & Labels feed Model Training and the Model Store under Multi-level Orchestration; a Realtime Prediction Service and Batch Prediction Jobs (backed by the Data Lake) serve an RPC Client; stages: Get Data → Train Models → Deployment, with ML Observability across the Realtime and Batch paths.]
Michelangelo Overview
Rec Models vs LLMs
Source: Naumov et al., "Deep Learning Recommendation Model for Personalization and Recommendation Systems," 2019.
Source: Vaswani et al., "Attention Is All You Need," 2017.
1. Complexity from LLMs
   a. Require model parallelism support
2. Complexity from integration
   a. Storage: cloud, on-prem remote, Uber proprietary (TerraBlob), local
      i. TerraBlob: Uber's centralized solution for storing blob data
   b. Hardware: on-prem/cloud A100/H100 GPUs
   c. Open source libraries (details on next slide)
3. Agile development environment
   a. Accelerate dev and debug
Challenges of Supporting LLMs @ Uber
Build a docker image → Run a full pipeline → Validate results → Update infra code
A few building blocks for LLMOps Journey
● Open source components:
   a. Ray®
   b. Hugging Face®: Transformers, Accelerate, PEFT, Bitsandbytes
   c. PyTorch®
   d. Microsoft DeepSpeed®
   e. NVIDIA®: TensorRT, Triton Inference Server
   f. Dao AI Lab Flash Attention
   g. …
Our mission: provide a unified framework for Uber LLMOps, saving development effort.
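As a rough illustration of how some of the listed components fit together, here is a minimal parameter-efficient fine-tuning sketch combining Transformers, PEFT, and bitsandbytes. The checkpoint name, quantization settings, and LoRA hyperparameters are illustrative assumptions, not Uber's actual configuration.

# Minimal sketch: Transformers + PEFT + bitsandbytes for parameter-efficient
# fine-tuning. Model name and hyperparameters are illustrative placeholders.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig
from peft import LoraConfig, get_peft_model

model_name = "meta-llama/Llama-2-7b-hf"  # assumed example checkpoint

# Load the base model in 4-bit via bitsandbytes to reduce GPU memory.
bnb_config = BitsAndBytesConfig(load_in_4bit=True,
                                bnb_4bit_compute_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name,
                                             quantization_config=bnb_config,
                                             device_map="auto")

# Attach LoRA adapters so only a small fraction of parameters is trained.
lora_config = LoraConfig(r=16, lora_alpha=32,
                         target_modules=["q_proj", "v_proj"],
                         task_type="CAUSAL_LM")
model = get_peft_model(model, lora_config)
model.print_trainable_parameters()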
LLM Training Architecture
[Architecture diagram: Ray sets up the distributed process group; workers download/train SOTA LLMs with model parallelism; communication over NVLink/NVSwitch/Ethernet/IB.]
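A minimal sketch of the "Ray sets up the distributed process group" step, using Ray Train's TorchTrainer to launch a training function across GPU workers. The worker count and the training body are placeholders, not Uber's implementation.

# Minimal sketch: Ray Train initializes the torch distributed process group
# on each worker before calling train_func.
from ray.train import ScalingConfig
from ray.train.torch import TorchTrainer

def train_func(config):
    import torch.distributed as dist
    # The NCCL process group is already set up, so collectives run over
    # NVLink/NVSwitch within a node and Ethernet/IB across nodes.
    rank = dist.get_rank()
    world_size = dist.get_world_size()
    print(f"worker {rank}/{world_size} ready")
    # ... build the model, wrap with DDP/DeepSpeed, run the training loop ...

trainer = TorchTrainer(
    train_func,
    scaling_config=ScalingConfig(num_workers=8, use_gpu=True),  # placeholder size
)
result = trainer.fit()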
Ray on Kubernetes®
1. Federated scheduling over multiple k8s clusters
2. Elastic resource sharing within a cluster using hierarchical queues
3. Open source KubeRay operator for managing Ray clusters
Training Pipeline Design
● Training Pipeline   ● Model Parallelism
1. Data source:
   Cloud (e.g., GCS) / on-prem remote
   Public datasets (Hugging Face Datasets)
   TerraBlob (Uber's blob store)
2. Data format:
   Parquet/JSON
Distributed Data Loading
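A minimal sketch of distributed data loading with Ray Data for the sources and formats listed above. The bucket paths are placeholders, and TerraBlob access (which goes through Uber-internal tooling) is not shown.

# Minimal sketch: parallel reads of Parquet/JSON shards with Ray Data.
import ray

# Read Parquet shards directly from cloud storage (e.g. GCS) in parallel.
ds = ray.data.read_parquet("gs://example-bucket/llm-training-data/")  # placeholder path

# JSON(L) inputs are handled the same way:
# ds = ray.data.read_json("gs://example-bucket/llm-training-data-jsonl/")

def tokenize_batch(batch):
    # Placeholder: apply a Hugging Face tokenizer to batch["text"] here.
    return batch

# Tokenize in parallel across the cluster before feeding the trainer.
ds = ds.map_batches(tokenize_batch)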
Logging/Profiling
● Logging support
   ○ Comet® Logging
● Profiler support
   ○ PyTorch Profiler
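On the profiling side, a minimal PyTorch Profiler sketch; the schedule, output directory, and the placeholder training step are illustrative, not taken from the talk.

# Minimal sketch: profile a few training steps with the PyTorch Profiler.
import torch
from torch.profiler import profile, schedule, ProfilerActivity

prof_schedule = schedule(wait=1, warmup=1, active=3, repeat=1)

with profile(
    activities=[ProfilerActivity.CPU, ProfilerActivity.CUDA],
    schedule=prof_schedule,
    on_trace_ready=torch.profiler.tensorboard_trace_handler("./profiler_logs"),
    profile_memory=True,
) as prof:
    for step in range(6):
        # training_step()  # placeholder for one optimizer step
        prof.step()

print(prof.key_averages().table(sort_by="cuda_time_total", row_limit=10))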
1. Simple APIs
   a. Input data
   b. Model
   c. Output destination
2. Data parallelism
   a. Use Ray APIs
3. Model parallelism to handle large models such as llama-2-70B
4. Save prediction results to remote storage
Offline Model Prediction
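A minimal sketch of data-parallel offline prediction with Ray Data, mirroring the input/model/output API above. The paths, model name, batch size, and replica count are assumptions, and the model-parallel path needed for llama-2-70B is not shown.

# Minimal sketch: offline batch prediction with Ray Data actor replicas.
import numpy as np
import ray

ds = ray.data.read_parquet("gs://example-bucket/prediction-input/")  # placeholder path

class LLMPredictor:
    def __init__(self):
        from transformers import pipeline
        # One model replica per actor; Ray Data scales replicas for data parallelism.
        self.pipe = pipeline("text-generation",
                             model="meta-llama/Llama-2-7b-hf",  # placeholder model
                             device_map="auto")

    def __call__(self, batch):
        outputs = self.pipe(list(batch["prompt"]), max_new_tokens=64)
        batch["prediction"] = np.array([o[0]["generated_text"] for o in outputs])
        return batch

preds = ds.map_batches(LLMPredictor, concurrency=4, num_gpus=1, batch_size=16)

# Save prediction results back to remote storage.
preds.write_parquet("gs://example-bucket/prediction-output/")  # placeholder path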
1. Provision a Ray cluster with JupyterLab.
2. Edit training config files and code (for dev purposes) on the fly.
3. Reuse the same Ray cluster and submit multiple jobs.
4. Ray Dashboard logging and monitoring support.
Interactive Dev Environment
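A minimal sketch of reusing one long-lived Ray cluster for multiple submissions via the Ray Job Submission SDK. The head address, entrypoint script, and config file names are hypothetical.

# Minimal sketch: submit several jobs to the same Ray cluster without
# reprovisioning it.
from ray.job_submission import JobSubmissionClient

client = JobSubmissionClient("http://raycluster-head:8265")  # assumed head address

# Edited config files and code in working_dir are picked up on each submission.
for config_file in ["configs/run_a.yaml", "configs/run_b.yaml"]:  # placeholder configs
    job_id = client.submit_job(
        entrypoint=f"python train.py --config {config_file}",  # placeholder script
        runtime_env={"working_dir": "."},
    )
    print("submitted", job_id)

# Status and logs are also visible in the Ray Dashboard.
print(client.get_job_status(job_id))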
Uber Eng Blog (10/2024): Open Source and In-House: How Uber Optimizes LLM Training