Alluxio Webinar | Model Training Across Regions and Clouds – Challenges, Solutions and Live Demo

Alluxio 139 views 18 slides Oct 15, 2024

About This Presentation

Alluxio Webinar
October 15, 2024

For more Alluxio Events: https://www.alluxio.io/events/

Speaker:
- Tom Luckenbach (Solutions Engineering Manager, Alluxio)

AI training workloads running on compute engines like PyTorch, TensorFlow, and Ray require consistent, high-throughput access to training da...


Slide Content

Alluxio Confidential
Model Training Across Regions & Clouds: Challenges, Solutions and Live Demo
Monthly Webinar
Tom Luckenbach
Solutions Engineering Manager @ Alluxio

AI teams must conquer…
● SPEED: Build models fast. Get to market fast. Learn and iterate fast.
● SCALE: Sustaining speed at scale requires an efficient and cost-effective infrastructure.
● SCARCITY: Ensure AI builders always have the GPUs they need, when they need them.

Unfortunately, most AI teams are stuck.
● SPEED: Slow, brittle development and training workloads delay launch and erode productivity.
● SCALE: Data and compute infra needed to achieve speed at scale is cost-prohibitive.
● SCARCITY: Cost & complexity of replicating persistent data prevents relocating workloads to available GPUs.

Fortunately, there's Alluxio.
● SPEED: Accelerates AI development, training, and deployment cycles to get to market faster.
● SCALE: Maximizes speed and GPU utilization even with low-cost, large-scale data infrastructure.
● SCARCITY: Enables seamless workload portability to utilize GPUs wherever they are.

Alluxio makes it easy to use data from
any storage
with any compute
in any environment,
for higher performance, at lower cost

Accelerated by Alluxio
Zhihu
TELCO & MEDIA
E-COMMERCE
FINANCIAL SERVICES
TECH & INTERNET
OTHERS

Options for Accessing Data

Option 1: Single Location - Access Data Locally and Remotely
[Diagram: Data Lake (Sources of Truth), two Training Clusters]
Pros:
● Simple: a single source of truth
Cons:
● Slow, inconsistent performance
● High costs for accessing regional cloud storage

Option 2: Duplication of Data Between Locations
[Diagram: Data Lake (Sources of Truth), two Training Clusters, REPLICATION]
Pros:
● Gains performance from data locality
Cons:
● Cost of managing replication
● Slow, inconsistent performance
● High access costs plus the cost of duplicated cloud storage

Option 3: Using HPC Storage + Duplicating Data
[Diagram: Data Lake (Sources of Truth), two Training Clusters each with HPC Storage, REPLICATION]
Pros:
● Consistent I/O performance
Cons:
● High cost of HPC storage
● Cost and complexity of managing replication

Option 4: Leverage AI-Optimized, Distributed Caches
[Diagram: Data Lake (Sources of Truth), two Training Clusters]
Pros:
● Consistent I/O performance
● Single source of truth
● Dynamically caches data needed for the jobs
● Scalable across N regions/clouds
● Simplifies data abstraction across multiple data protocols
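To make "dynamically caches data needed for the jobs" concrete, here is a minimal read-through cache sketch. It is not Alluxio's implementation: the class name `ReadThroughCache`, the `fetch_remote` callback, and the least-recently-used eviction policy are all assumptions chosen for illustration. It only shows the general idea that the first read of an object goes to the source of truth and repeated reads are served from local capacity.

```python
from collections import OrderedDict
from typing import Callable


class ReadThroughCache:
    """Hypothetical read-through cache with LRU eviction (illustration only)."""

    def __init__(self, fetch_remote: Callable[[str], bytes], capacity_bytes: int):
        self._fetch_remote = fetch_remote  # slow path: read from the source of truth
        self._capacity = capacity_bytes    # local cache budget in bytes
        self._entries: "OrderedDict[str, bytes]" = OrderedDict()
        self._used = 0
        self.hits = 0
        self.misses = 0

    def read(self, key: str) -> bytes:
        """Return the object for `key`, fetching it remotely only on a miss."""
        if key in self._entries:
            self._entries.move_to_end(key)  # mark as most recently used
            self.hits += 1
            return self._entries[key]
        self.misses += 1
        data = self._fetch_remote(key)
        self._admit(key, data)
        return data

    def _admit(self, key: str, data: bytes) -> None:
        """Store `data`, evicting least recently used entries if over budget."""
        if len(data) > self._capacity:
            return  # object larger than the whole cache: serve it uncached
        self._entries[key] = data
        self._used += len(data)
        while self._used > self._capacity:
            _, evicted = self._entries.popitem(last=False)
            self._used -= len(evicted)
```

In a multi-epoch training job, every epoch after the first would mostly hit the cache as long as the working set fits in the configured capacity, which is the behavior the Option 4 bullet describes.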

Alluxio AI Acceleration Platform Overview
SPEED, SCALE, SCARCITY. SOLVED.
[Diagram: AI Training Cluster (On-Prem); Alluxio AI Acceleration Platform with ALLUXIO DISTRIBUTED CACHE and ALLUXIO UNIFIED NAMESPACE; Data Lake & Data Silos (Sources of Truth)]
● Deploys on or near your training workloads
● Distributed cache leverages commodity SSD/NVMe drives
● Dynamic loading or scheduled pre-loading of training data
● No modifications to apps. Data access APIs: S3, POSIX/FUSE, or REST API
● Faster reads and faster checkpoints
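As a rough illustration of the "no modifications to apps" point for the POSIX/FUSE path, the sketch below reads training files with ordinary file calls, so the same code works whether the directory is a local disk or a FUSE mount. The mount point `/mnt/training-data`, the `DATA_ROOT` environment variable, and the helper names are hypothetical and not taken from the presentation.

```python
import os
from typing import Iterator, Tuple


def iter_training_files(
    root: str, suffixes: Tuple[str, ...] = (".jpg", ".jpeg", ".png")
) -> Iterator[str]:
    """Yield matching file paths under `root` in a deterministic order."""
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # in-place sort makes the traversal order stable
        for name in sorted(filenames):
            if name.lower().endswith(suffixes):
                yield os.path.join(dirpath, name)


def read_sample(path: str) -> bytes:
    """Read one sample with a plain POSIX open/read."""
    with open(path, "rb") as f:
        return f.read()


if __name__ == "__main__":
    # Hypothetical mount point; only this path changes between storage backends.
    data_root = os.environ.get("DATA_ROOT", "/mnt/training-data")
    count = sum(1 for _ in iter_training_files(data_root))
    print(f"found {count} training files under {data_root}")
```

The design choice here is that the data location is a single configuration value, so switching between a local copy and a mounted cache would not touch the training code itself.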

Alluxio Benchmark with ResNet-50
CONFIGURATION
● GPU server: AWS EC2/Kubernetes
● Deep learning algorithm: ResNet (top CV algorithm)
● Deep learning framework: PyTorch
● Dataset: ImageNet (subset: ~35k images, each ~100 kB to 200 kB)
● Dataset storage: AWS S3 (single region)
● Visualization: TensorBoard
● Code execution: Jupyter notebook

ResNet-50, 3 Epochs, Using S3 FUSE

GPU Summary
Name                             Tesla T4
Memory                           14.62 GB
Compute Capability               7.5
GPU Utilization                  16.96%
Est. SM Efficiency               16.91%
Est. Achieved Occupancy          68.75%
Kernel Time using Tensor Cores   0.0%

Category            Time Duration (us)   Percentage
Average Step Time        1,763,649,145   100%
Kernel                     299,168,905   16.96%
Memcpy                      10,521,722   0.6%
Memset                          39,459   0%
Runtime                      3,043,169   0.17%
DataLoader               1,446,068,956   81.99%
CPU Exec                     1,570,076   0.09%
Other                        3,245,858   0.18%

● More than 80% of total time is spent in DataLoader
● This results in a low GPU utilization rate (<20%)

ResNet-50, 3 Epochs, Using Alluxio

GPU Summary
Name                             Tesla T4
Memory                           14.62 GB
Compute Capability               7.5
GPU Utilization                  93.29%
Est. SM Efficiency               92.98%
Est. Achieved Occupancy          68.03%
Kernel Time using Tensor Cores   0.0%

Category            Time Duration (us)   Percentage
Average Step Time          334,274,946   100%
Kernel                     311,847,023   93.29%
Memcpy                      10,500,126   3.14%
Memset                          43,946   0.01%
Runtime                      3,899,241   1.17%
DataLoader                   3,343,301   1%
CPU Exec                     1,648,391   0.49%
Other                        2,992,918   0.9%

● Reduces the DataLoader share of step time from 82% to 1%
● Increases the GPU utilization rate from 17% to 93%
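The headline percentages can be re-derived from the raw durations in the two profiler tables above. The short sketch below does that arithmetic; the dictionary and function names are illustrative, and the numbers are copied from the slides as transcribed.

```python
# Durations in microseconds, copied from the two profiler tables.
S3_FUSE_US = {
    "Average Step Time": 1_763_649_145,
    "Kernel": 299_168_905,
    "DataLoader": 1_446_068_956,
}
ALLUXIO_US = {
    "Average Step Time": 334_274_946,
    "Kernel": 311_847_023,
    "DataLoader": 3_343_301,
}


def share_percent(profile: dict, category: str) -> float:
    """Share of the step time spent in `category`, as a percentage."""
    return 100.0 * profile[category] / profile["Average Step Time"]


def step_time_ratio(baseline: dict, improved: dict) -> float:
    """How many times longer the baseline step time is than the improved one."""
    return baseline["Average Step Time"] / improved["Average Step Time"]


if __name__ == "__main__":
    for label, profile in (("S3 FUSE", S3_FUSE_US), ("Alluxio", ALLUXIO_US)):
        print(
            f"{label}: DataLoader {share_percent(profile, 'DataLoader'):.2f}%, "
            f"Kernel {share_percent(profile, 'Kernel'):.2f}%"
        )
    print(f"Step-time ratio: {step_time_ratio(S3_FUSE_US, ALLUXIO_US):.2f}x")
```

Note that kernel time is nearly the same in both runs (about 299 s versus 312 s), so the roughly 5.3x shorter step time comes almost entirely from the time no longer spent waiting in DataLoader.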

RAD (Rapid Alluxio Deployer) Demo

Questions?
Sign up for RAD at
https://signup.alluxio-rad.io/
SPEED, SCALE, SCARCITY. SOLVED.