AI/ML Infra Meetup | Optimizing ML Data Access with Alluxio: Preprocessing, Pretraining, & Inference at Scale


About This Presentation

AI/ML Infra Meetup
Mar. 06, 2025
Organized by Alluxio

For more Alluxio Events: https://www.alluxio.io/events/

Speaker:
- Bin Fan (VP of Technology @ Alluxio)

In this talk, Bin Fan shares his insights on data access challenges in ML applications, with particular emphasis on how Alluxio's distributed caching addresses these challenges across preprocessing, pretraining, and inference.


Slide Content

Optimizing ML Data Access with Alluxio
Preprocessing, Pretraining, & Inference at Scale
Bin Fan
Founding Engineer, VP of Technology @ Alluxio
March 6th, 2025

About Me

Bin Fan
○ Founding Engineer, VP of Technology @ Alluxio
○ Email: [email protected]
○ LinkedIn: https://www.linkedin.com/in/bin-fan/
○ Previously worked at Google
○ PhD in CS from Carnegie Mellon University

Powered by Alluxio
(Logo wall of Alluxio users across Tech & Internet, Telco & Media, E-commerce, Financial Services, and other sectors, including Zhihu)
Alluxio Data Platform
Accelerate data-intensive AI & Analytics workloads

Pretraining

DeepSeek: Redefining Open-Source LLMs
● Performance on par with SOTA models like GPT-4, at a fraction of the cost
● Disrupting the competitive landscape
○ Expanding accessibility to much broader audiences
○ Raising the bar for upcoming general-purpose LLMs
○ Opening more possibilities for LLMs with private-domain adaptation
● A key lesson: great LLMs can be created by small teams with extremely efficient resource utilization

Engineering/Resource Efficiency in Pre-training
(Diagram: a central data lake holding all data feeds training clusters in us-east-1 and us-west-1, each fronted by an Alluxio distributed cache; only hot data is cached, and data is retrieved on demand)

● High and consistent I/O performance
→ Comparable I/O performance to HPC storage

● Cloud agnostic
→ Easy to extend the prod env to multi-region/multi-cloud

● Transparent cache management
→ Avoids repeatedly preparing the same data, and the overhead of maintaining local storage
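To make the access pattern concrete, here is a minimal PyTorch-style sketch of reading training shards through an Alluxio FUSE mount, so hot shards come from the distributed cache and cold shards are pulled from the data lake on demand. The mount path and shard layout are assumptions for illustration, not a prescribed setup.

```python
import os
from torch.utils.data import Dataset, DataLoader

class CachedShardDataset(Dataset):
    """Reads training shards via an Alluxio FUSE mount point.
    The path below is a hypothetical example of such a mount."""

    def __init__(self, root="/mnt/alluxio/datasets/pretrain"):
        self.paths = sorted(
            os.path.join(root, f) for f in os.listdir(root)
        )

    def __len__(self):
        return len(self.paths)

    def __getitem__(self, idx):
        # Plain POSIX read; Alluxio decides whether this is a
        # cache hit or an on-demand fetch from the data lake.
        with open(self.paths[idx], "rb") as f:
            return f.read()

loader = DataLoader(CachedShardDataset(), batch_size=1, num_workers=4)
```

Because the training code only sees a filesystem path, the same loader works unchanged whether the cluster runs in us-east-1, us-west-1, or another cloud entirely.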

Inference

LLM Inference: Two Key Metrics
Throughput (System Perspective)
● Measures tokens/sec
● Higher throughput → better resource utilization, lower system cost
Time to First Token (User Perspective)
● Measures time from request submission to first token generation
● < 100ms → smooth user experience
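Both metrics are easy to instrument on the client side. A minimal, framework-agnostic sketch, assuming `stream` is any iterator that yields tokens from a streaming inference endpoint:

```python
import time

def measure_inference(stream):
    """Return (time-to-first-token, tokens/sec) for a token stream."""
    start = time.perf_counter()
    first_token_at = None
    n_tokens = 0
    for _ in stream:
        if first_token_at is None:
            first_token_at = time.perf_counter()  # first token arrived
        n_tokens += 1
    end = time.perf_counter()
    ttft = first_token_at - start        # user-perceived latency
    throughput = n_tokens / (end - start)  # system-side efficiency
    return ttft, throughput
```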

GPU Memory Capacity: Primary Bottleneck
● VRAM is needed for model weights & the KV cache
● Example: a typical 13B-model inference on an A100
● GPT-3 (175B) requires 350GB of GPU RAM just to load model weights
● A large KV cache is needed for longer context windows
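A quick back-of-envelope calculation shows why both terms matter. The 350GB figure follows directly from 175B parameters at fp16; the 13B-class layer/head shapes below are illustrative assumptions:

```python
bytes_fp16 = 2  # fp16 weights and activations: 2 bytes each

# Model weights: GPT-3 has 175B parameters.
weights_gb = 175e9 * bytes_fp16 / 1e9   # = 350 GB, matching the slide

# KV cache: 2 tensors (K and V) per layer, per token.
# Illustrative 13B-class shape (assumed, not exact):
n_layers, n_heads, head_dim = 40, 40, 128
kv_bytes_per_token = 2 * n_layers * n_heads * head_dim * bytes_fp16

seq_len, batch = 4096, 8
kv_cache_gb = kv_bytes_per_token * seq_len * batch / 1e9  # ~27 GB
print(f"weights: {weights_gb:.0f} GB, KV cache: {kv_cache_gb:.1f} GB")
```

At ~0.8 MB of KV cache per token in this configuration, longer contexts and bigger batches exhaust an A100's VRAM long before compute becomes the limit, which motivates offloading.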

KV Cache Offloading
● A critical optimization for speeding up Transformer models
○ Significantly speeds up text generation by reusing previous context instead of recalculating attention for all tokens at each step
○ Example KV cache systems: LMCache (vLLM Production Stack), Mooncake, etc.
● Experimenting with Alluxio as a tiered KV cache (sketched below)
○ Talk to me if you are interested in this
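The tiering idea can be sketched in a few lines. Everything here (class name, spill path, eviction policy) is purely illustrative, not LMCache's, Mooncake's, or Alluxio's actual implementation:

```python
import os
import pickle

class TieredKVCache:
    """Illustrative two-tier KV cache: hot entries in process memory,
    cold entries spilled to a slower shared tier (here, a directory
    that could be an Alluxio FUSE mount; the path is hypothetical)."""

    def __init__(self, spill_dir="/mnt/alluxio/kv-cache", max_hot=1024):
        self.hot = {}               # tier 1: process memory
        self.spill_dir = spill_dir  # tier 2: shared/remote storage
        self.max_hot = max_hot
        os.makedirs(spill_dir, exist_ok=True)

    def put(self, prefix_hash, kv_tensors):
        if len(self.hot) >= self.max_hot:
            # Evict one hot entry (naive policy) to the cold tier.
            victim, tensors = self.hot.popitem()
            with open(os.path.join(self.spill_dir, victim), "wb") as f:
                pickle.dump(tensors, f)
        self.hot[prefix_hash] = kv_tensors

    def get(self, prefix_hash):
        if prefix_hash in self.hot:       # hot hit: no recomputation
            return self.hot[prefix_hash]
        path = os.path.join(self.spill_dir, prefix_hash)
        if os.path.exists(path):          # cold hit: slower read, but
            with open(path, "rb") as f:   # cheaper than re-prefilling
                return pickle.load(f)
        return None                       # miss: recompute the prefill
```

The design point is the middle case: a cold-tier read costs milliseconds, whereas recomputing attention over a long shared prefix costs a full prefill pass on the GPU.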


Mooncake

DeepSeek 3FS

DeepSeek 3FS: High-Performance Parallel Filesystem
● Newly open-sourced parallel filesystem by DeepSeek
○ Purpose-built for RDMA + NVMe hardware
○ Scalable metadata powered by FoundationDB
○ Achieves 40GB/s per-node throughput (8TB/s with 180 nodes)
● Optimized for high-throughput workloads
○ Focused on large-file read/write performance (not for general-purpose use)
○ FFRecord format recommended for efficient small-file aggregation (sketched below)
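For reference, small-file aggregation with FFRecord looks roughly like this. The snippet follows the FFRecord project's published Python interface, but treat the exact signatures as assumptions; the file path is illustrative:

```python
import pickle
from ffrecord import FileWriter, FileReader  # pip install ffrecord

samples = [{"id": i, "text": f"sample {i}"} for i in range(1000)]

# Pack many small records into one large file so the filesystem
# serves big sequential reads instead of many tiny ones.
writer = FileWriter("train.ffr", len(samples))
for s in samples:
    writer.write_one(pickle.dumps(s))
writer.close()

# Later: random access by index, e.g. from a training Dataset.
reader = FileReader("train.ffr", check_data=True)
batch = [pickle.loads(b) for b in reader.read([3, 14, 159])]
reader.close()
```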

Complementary Technologies
● 3FS: Modern Parallel Filesystem (similar to GPFS, Lustre)
○ Optimized for I/O-intensive workloads with RDMA + NVMe
● Alluxio: Distributed Caching & Access Layer
○ Bridges compute & data lakes, accelerating I/O workloads
○ Achieves RDMA-comparable read speeds with intelligent caching
○ Provides namespace abstraction & indirection for S3, HDFS, GCP, and more → cloud-agnostic I/O
● Alluxio can integrate with 3FS, just like S3 or HDFS (sketched below)
○ Enables high/mid/low tiered I/O solutions, allowing applications to optimize performance and cost
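The namespace-abstraction point is visible from application code: with backends mounted under one Alluxio namespace (e.g. via FUSE), the same POSIX read works whether the data sits in an S3 data lake or a 3FS hot tier. The mount layout and paths below are hypothetical:

```python
# Hypothetical Alluxio FUSE mounts unifying two backends:
#   /mnt/alluxio/lake -> s3://bucket/datasets    (cloud data lake)
#   /mnt/alluxio/hot  -> 3FS cluster             (RDMA + NVMe tier)

def load_shard(path: str) -> bytes:
    # One POSIX code path, regardless of the backing tier.
    with open(path, "rb") as f:
        return f.read()

hot = load_shard("/mnt/alluxio/hot/train/shard-00000.ffr")
cold = load_shard("/mnt/alluxio/lake/train/shard-00421.ffr")
```

Applications can then place data on the high, mid, or low tier purely by path, trading performance against cost without changing I/O code.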
