AI/ML Infra Meetup | Optimizing ML Data Access with Alluxio: Preprocessing, Pretraining, & Inference at Scale


About This Presentation

AI/ML Infra Meetup
Mar. 06, 2025
Organized by Alluxio

For more Alluxio Events: https://www.alluxio.io/events/

Speaker:
- Bin Fan (VP of Technology @ Alluxio)

In this talk, Bin Fan shares his insights on data access challenges in ML applications, with particular emphasis on how Alluxio's distributed caching addresses these challenges across preprocessing, pretraining, and inference.


Slide Content

Optimizing ML Data Access with Alluxio
Preprocessing, Pretraining, & Inference at Scale
Bin Fan
Founding Engineer, VP of Technology @ Alluxio
March 6th, 2025

About Me

Bin Fan
○ Founding Engineer, VP of Technology @ Alluxio
○ Email: [email protected]
○ LinkedIn: https://www.linkedin.com/in/bin-fan/
○ Previously worked at Google
○ PhD in CS from Carnegie Mellon University

Powered by Alluxio
(Logo wall of Alluxio users across Tech & Internet, Telco & Media, E-commerce, Financial Services, and other sectors, including Zhihu)
Alluxio Data Platform
Accelerate data-intensive AI & Analytics workloads

Pretraining

DeepSeek: Redefining Open-Source LLMs
● Performance on par with SOTA models like GPT-4, at a fraction of the cost
● Disrupting the competitive landscape
○ Expanding accessibility to much broader audiences
○ Raising the bar for upcoming general-purpose LLMs
○ Opening more possibilities for LLMs with private-domain adaptation
● A key lesson: great LLMs can be created by small teams with extremely efficient resource utilization

Engineering/Resource Efficiency in Pre-training
(Diagram: a central data lake holding all data feeds training clusters in us-east-1 and us-west-1, each fronted by an Alluxio distributed cache; only hot data is cached, and data is retrieved on demand)

● High and consistent I/O performance
→ Comparable I/O performance to HPC storage

● Cloud agnostic
→ Easy to extend the prod env to multi-region/multi-cloud

● Transparent cache management
→ Avoids repeatedly preparing the same data, and the overhead of maintaining local storage
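To make the access pattern concrete, here is a minimal PyTorch-style sketch of reading training shards through an Alluxio FUSE mount, so hot shards come from the distributed cache and cold shards are pulled from the data lake on demand. The mount path and shard layout are assumptions for illustration, not a prescribed setup.

```python
import os
from torch.utils.data import Dataset, DataLoader

class CachedShardDataset(Dataset):
    """Reads training shards via an Alluxio FUSE mount point.
    The path below is a hypothetical example of such a mount."""

    def __init__(self, root="/mnt/alluxio/datasets/pretrain"):
        self.paths = sorted(
            os.path.join(root, f) for f in os.listdir(root)
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Plain POSIX read; Alluxio decides whether this is a
        # cache hit or an on-demand fetch from the data lake.
        with open(self.paths[idx], "rb") as f:
            return f.read()

loader = DataLoader(CachedShardDataset(), batch_size=1, num_workers=4)
```

Because the training code only sees a filesystem path, the same loader works unchanged whether the cluster runs in us-east-1, us-west-1, or another cloud entirely.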

Inference

LLM Inference: Two Key Metrics
Throughput (System Perspective)
● Measures tokens/sec
● Higher throughput → better resource utilization, lower system cost
Time to First Token (User Perspective)
● Measures time from request submission to first token generation
● < 100ms → smooth user experience
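Both metrics are easy to instrument on the client side. A minimal, framework-agnostic sketch, assuming `stream` is any iterator that yields tokens from a streaming inference endpoint:

```python
import time

def measure_inference(stream):
    """Return (time-to-first-token, tokens/sec) for a token stream."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token arrived
        n_tokens += 1
    end = time.perf_counter()
    ttft = first_token_at - start        # user-perceived latency
    throughput = n_tokens / (end - start)  # system-side efficiency
    return ttft, throughput
```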

GPU Memory Capacity: Primary Bottleneck
● VRAM is needed for model weights & the KV cache
● Example: a typical 13B-model inference on an A100
● GPT-3 (175B) requires 350GB of GPU RAM just to load model weights
● A large KV cache is needed for longer context windows
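A quick back-of-envelope calculation shows why both terms matter. The 350GB figure follows directly from 175B parameters at fp16; the 13B-class layer/head shapes below are illustrative assumptions:

```python
bytes_fp16 = 2  # fp16 weights and activations: 2 bytes each

# Model weights: GPT-3 has 175B parameters.
weights_gb = 175e9 * bytes_fp16 / 1e9   # = 350 GB, matching the slide

# KV cache: 2 tensors (K and V) per layer, per token.
# Illustrative 13B-class shape (assumed, not exact):
n_layers, n_heads, head_dim = 40, 40, 128
kv_bytes_per_token = 2 * n_layers * n_heads * head_dim * bytes_fp16

seq_len, batch = 4096, 8
kv_cache_gb = kv_bytes_per_token * seq_len * batch / 1e9  # ~27 GB
print(f"weights: {weights_gb:.0f} GB, KV cache: {kv_cache_gb:.1f} GB")
```

At ~0.8 MB of KV cache per token in this configuration, longer contexts and bigger batches exhaust an A100's VRAM long before compute becomes the limit, which motivates offloading.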

KV Cache Offloading
● A critical optimization for speeding up Transformer models
○ Significantly speeds up text generation by reusing previous context instead of recalculating attention for all tokens at each step
○ Example KV cache systems: LMCache (vLLM Production Stack), Mooncake, etc.
● Experimenting with Alluxio as a tiered KV cache (sketched below)
○ Talk to me if you are interested in this
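The tiering idea can be sketched in a few lines. Everything here (class name, spill path, eviction policy) is purely illustrative, not LMCache's, Mooncake's, or Alluxio's actual implementation:

```python
import os
import pickle

class TieredKVCache:
    """Illustrative two-tier KV cache: hot entries in process memory,
    cold entries spilled to a slower shared tier (here, a directory
    that could be an Alluxio FUSE mount; the path is hypothetical)."""

    def __init__(self, spill_dir="/mnt/alluxio/kv-cache", max_hot=1024):
        self.hot = {}               # tier 1: process memory
        self.spill_dir = spill_dir  # tier 2: shared/remote storage
        self.max_hot = max_hot
        os.makedirs(spill_dir, exist_ok=True)

    def put(self, prefix_hash, kv_tensors):
        if len(self.hot) >= self.max_hot:
            # Evict one hot entry (naive policy) to the cold tier.
            victim, tensors = self.hot.popitem()
            with open(os.path.join(self.spill_dir, victim), "wb") as f:
                pickle.dump(tensors, f)
        self.hot[prefix_hash] = kv_tensors

    def get(self, prefix_hash):
        if prefix_hash in self.hot:       # hot hit: no recomputation
            return self.hot[prefix_hash]
        path = os.path.join(self.spill_dir, prefix_hash)
        if os.path.exists(path):          # cold hit: slower read, but
            with open(path, "rb") as f:   # cheaper than re-prefilling
                return pickle.load(f)
        return None                       # miss: recompute the prefill
```

The design point is the middle case: a cold-tier read costs milliseconds, whereas recomputing attention over a long shared prefix costs a full prefill pass on the GPU.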


Mooncake

DeepSeek 3FS

DeepSeek 3FS: High-Performance Parallel Filesystem
● Newly open-sourced parallel filesystem by DeepSeek
○ Purpose-built for RDMA + NVMe hardware
○ Scalable metadata powered by FoundationDB
○ Achieves 40GB/s per-node throughput (8TB/s with 180 nodes)
● Optimized for high-throughput workloads
○ Focused on large-file read/write performance (not for general-purpose use)
○ FFRecord format recommended for efficient small-file aggregation (sketched below)
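For reference, small-file aggregation with FFRecord looks roughly like this. The snippet follows the FFRecord project's published Python interface, but treat the exact signatures as assumptions; the file path is illustrative:

```python
import pickle
from ffrecord import FileWriter, FileReader  # pip install ffrecord

samples = [{"id": i, "text": f"sample {i}"} for i in range(1000)]

# Pack many small records into one large file so the filesystem
# serves big sequential reads instead of many tiny ones.
writer = FileWriter("train.ffr", len(samples))
for s in samples:
    writer.write_one(pickle.dumps(s))
writer.close()

# Later: random access by index, e.g. from a training Dataset.
reader = FileReader("train.ffr", check_data=True)
batch = [pickle.loads(b) for b in reader.read([3, 14, 159])]
reader.close()
```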

Complementary Technologies
● 3FS: Modern Parallel Filesystem (similar to GPFS, Lustre)
○ Optimized for I/O-intensive workloads with RDMA + NVMe
● Alluxio: Distributed Caching & Access Layer
○ Bridges compute & data lakes, accelerating I/O workloads
○ Achieves RDMA-comparable read speeds with intelligent caching
○ Provides namespace abstraction & indirection for S3, HDFS, GCP, and more → cloud-agnostic I/O
● Alluxio can integrate with 3FS, just like S3 or HDFS (sketched below)
○ Enables high/mid/low tiered I/O solutions, allowing applications to optimize performance and cost
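The namespace-abstraction point is visible from application code: with backends mounted under one Alluxio namespace (e.g. via FUSE), the same POSIX read works whether the data sits in an S3 data lake or a 3FS hot tier. The mount layout and paths below are hypothetical:

```python
# Hypothetical Alluxio FUSE mounts unifying two backends:
#   /mnt/alluxio/lake -> s3://bucket/datasets    (cloud data lake)
#   /mnt/alluxio/hot  -> 3FS cluster             (RDMA + NVMe tier)

def load_shard(path: str) -> bytes:
    # One POSIX code path, regardless of the backing tier.
    with open(path, "rb") as f:
        return f.read()

hot = load_shard("/mnt/alluxio/hot/train/shard-00000.ffr")
cold = load_shard("/mnt/alluxio/lake/train/shard-00421.ffr")
```

Applications can then place data on the high, mid, or low tier purely by path, trading performance against cost without changing I/O code.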
