Alluxio Webinar | Alluxio + S3: A Tiered Architecture for Latency-Critical, Semantically-Rich Workloads

Alluxio 41 views 26 slides Oct 29, 2025

About This Presentation

Alluxio Webinar
Oct 28, 2025

For more Alluxio Events: https://www.alluxio.io/events/

Speaker:
Jingwen Ouyang (Senior Product Manager @ Alluxio)

Amazon S3 and other cloud object stores have become the de facto storage system for organizations large and small. And it’s no wonder why. Cloud objec...


Slide Content

Alluxio Confidential
Alluxio + S3:
A Tiered Architecture for Latency-Critical,
Semantically-Rich Workloads

Jingwen Ouyang
Senior Product Manager @Alluxio
Oct 28th, 2025

Challenge In The New Era
Data… Lots of data… Lots of fast Data!

400+ trillion objects stored
150+ million requests per second
11 nines durability (99.999999999%)
~$23/TB/month cost-effective pricing
But Modern Workloads Demand More...
As AI training, inference, and real-time analytics evolve, S3, like object storage in general, shows strain with latency-critical and
semantically rich operations.
Amazon S3 Becomes the Hard Disk for Cloud

What an Architect is asking for today:
-Sub-millisecond SLAs for online queries, feature stores, agentic memory, etc
-Efficient write-ahead logs and checkpointing for large objects
-High-performance metadata operations across millions of objects
-All while maintaining S3's pricing, scalability, and operational simplicity
Modern AI & Real Time Workload Requirements

-Latency: GetObject TTFB typically 30-200ms — acceptable for batch,
painful for inference and real-time access
-Limited Semantics: Rename operations require copy + delete; append
writes not supported in Standard S3
-Metadata Operations: Standard S3 directories are prefixes, making
large-scale listing expensive and slow
Where S3 Shows Strain
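
The "rename requires copy + delete" point can be illustrated with a minimal sketch. It assumes a client object that exposes `copy_object` and `delete_object` with the keyword arguments used by boto3's S3 client; the helper name `rename_object` and the in-memory `FakeS3` stand-in are hypothetical and exist only so the example runs without AWS credentials.

```python
def rename_object(s3, bucket, src_key, dst_key):
    """Emulate a rename on an object store: server-side copy, then delete the source.

    `s3` is assumed to behave like a boto3 S3 client. The two calls are not
    atomic, so a failure between them leaves both keys in place.
    """
    s3.copy_object(
        Bucket=bucket,
        Key=dst_key,
        CopySource={"Bucket": bucket, "Key": src_key},
    )
    s3.delete_object(Bucket=bucket, Key=src_key)


class FakeS3:
    """Hypothetical in-memory stand-in exposing only the two calls used above."""

    def __init__(self):
        self.objects = {}   # (bucket, key) -> bytes
        self.calls = []     # order of operations, for illustration

    def copy_object(self, Bucket, Key, CopySource):
        self.calls.append("copy")
        source = (CopySource["Bucket"], CopySource["Key"])
        self.objects[(Bucket, Key)] = self.objects[source]

    def delete_object(self, Bucket, Key):
        self.calls.append("delete")
        del self.objects[(Bucket, Key)]


fake = FakeS3()
fake.objects[("bucket", "tmp/part-0")] = b"payload"
rename_object(fake, "bucket", "tmp/part-0", "final/part-0")
print(fake.calls)  # ['copy', 'delete']
```

The copy rewrites every byte of the object, so the cost of a "rename" grows with object size, and renaming a directory-like prefix repeats this for each key under it.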

The Reality: S3 excels as a capacity store, but it falls short for real-time and
latency-critical workloads that require sub-millisecond response times.

Add a transparent, distributed caching and augmentation layer on top of S3,
combining the best of multiple worlds:
-Mountable experience of AWS FSx for Lustre
-Ultra-low latency of AWS S3 Express One Zone
-Cost efficiency of standard S3 buckets
-No data migration required
Our Solution: Augment, Don't Replace

What is Alluxio, indeed?
A shim layer on S3 (or other cloud storage) that
provides sub-ms read latency, single-digit ms
write latency, and enhanced semantics, driven
by modern data-intensive workloads

Journey of Alluxio Since Inception

Timeline with year markers 2014, 2019, 2023, and 2024, spanning three eras: Big Data Analytics, Cloud Adoption, and Generative AI.
Milestones shown on the slide:
●Alluxio open source project founded (UC Berkeley AMPLab)
●Baidu deploys 1000+ node cluster
●Alluxio scales to 1 billion files
●7/10 top internet brands accelerated by Alluxio
●AliPay accelerates model training
●1000+ OSS contributors
●Meta accelerates Presto workloads
●9/10 top internet brands accelerated by Alluxio
●Alluxio scales to 10+ billion files
●Leading ecommerce brand accelerates model training
●Fortune 5 brand accelerates model training
●Zhihu accelerates LLM model training

Alluxio in AI & Analytics Ecosystem
Access interfaces: S3, POSIX, FSSPEC, HDFS
The same file can be addressed as s3://bucket/text.txt or as /S3/text.txt

Alluxio for Low Latency Caching
Alluxio is the industry-leading
sub-ms time to first byte (TTFB) solution on S3-class storage
How much better is Alluxio? (Details next slide)
➔45x Lower Latency than S3 Standard
➔5x Lower Latency than S3 Express One Zone
➔Unlimited, linear scalability

Test environment references

Alluxio EE
● Version/Spec: Alluxio Enterprise AI 3.6 (50TB cache)
● Test env: 1 FUSE (C5n.metal, 100Gbps network) and 1 Worker (i3en.metal)
AWS S3
● Version/Spec: AWS S3 bucket (Standard Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps network)
AWS S3 Express One Zone
● Version/Spec: AWS bucket (S3 Express One Zone Class)
● Test env: 1 FUSE (C5n.metal, 100Gbps network)

Core Feature 1: Distributed Caching
Diagram: clients (Big Data Query, Big Data ETL, Model Training) read through Alluxio Worker 1, Alluxio Worker 2, ..., Alluxio Worker n. Chunks labeled A, B, and C from s3://bucket/file1 and s3://bucket/file2 are placed on different workers.
●Worker selection based on consistent hashing
●Fine-grained chunks
●Can cache the more important parts
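
Worker selection by consistent hashing over fine-grained chunks can be sketched as follows. This is a generic hash ring, not Alluxio's actual implementation: the MD5-based hash, the 100 virtual nodes per worker, the `#chunkN` key format, and the 1 MiB chunk size are all assumptions made for illustration.

```python
import bisect
import hashlib


def _hash(key: str) -> int:
    """Map a string to a 64-bit position on the ring."""
    return int.from_bytes(hashlib.md5(key.encode()).digest()[:8], "big")


class HashRing:
    """Generic consistent-hash ring with virtual nodes (illustrative only)."""

    def __init__(self, workers, vnodes=100):
        # Each worker owns `vnodes` points on the ring to even out the load.
        self._ring = sorted(
            (_hash(f"{w}#{i}"), w) for w in workers for i in range(vnodes)
        )
        self._points = [point for point, _ in self._ring]

    def worker_for(self, key: str) -> str:
        # The first ring point clockwise from the key's hash owns the key.
        idx = bisect.bisect(self._points, _hash(key)) % len(self._ring)
        return self._ring[idx][1]


def chunk_keys(path: str, size: int, chunk_size: int):
    """Split a file into chunk identifiers so chunks can be cached independently."""
    n_chunks = max(1, -(-size // chunk_size))  # ceiling division
    return [f"{path}#chunk{i}" for i in range(n_chunks)]


ring = HashRing(["worker-1", "worker-2", "worker-3"])
for key in chunk_keys("s3://bucket/file2", size=3 * 1024 * 1024, chunk_size=1024 * 1024):
    print(key, "->", ring.worker_for(key))
```

The property that matters for a cache is that removing one worker only remaps the chunks that worker held; chunks on the remaining workers keep their placement, so their cached data stays valid.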

Core Feature 2: Filesystem Namespace Virtualization
●Alluxio can be viewed as a logical file system
○Multiple different storage services can be mounted into the same logical Alluxio namespace
●An Alluxio path is backed by a persistent storage address
○alluxio://ip:port/bucket/Users/ <-> S3://bucket/Users
●Easy mount command

Alluxio Namespace (diagram): the root / contains Data and Users.
●/Users (Alice, Bob) is backed by s3://bucket/Users in AWS us-east-1
●/Data (Reports, Sales) is backed by hdfs://service/salesdata in an on-prem data warehouse
$ bin/alluxio mount add \
--path /s3/bucket/Users \
--ufs-uri s3://bucket/Users
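
How a path in the logical namespace maps back to its underlying storage address can be sketched with a small mount table that does longest-prefix matching. The `MountTable` class is hypothetical and is not Alluxio's API; the `/hdfs/salesdata` mount path is also an assumption, since the slide only shows a mount command for the S3 bucket.

```python
class MountTable:
    """Hypothetical mount table: logical path prefix -> underlying storage URI."""

    def __init__(self):
        self._mounts = {}

    def add(self, path: str, ufs_uri: str) -> None:
        # Normalize trailing slashes so prefix matching is consistent.
        self._mounts[path.rstrip("/")] = ufs_uri.rstrip("/")

    def resolve(self, path: str) -> str:
        """Return the underlying URI for a logical path (longest mount wins)."""
        best = None
        for mount in self._mounts:
            if path == mount or path.startswith(mount + "/"):
                if best is None or len(mount) > len(best):
                    best = mount
        if best is None:
            raise KeyError(f"no mount covers {path}")
        return self._mounts[best] + path[len(best):]


table = MountTable()
table.add("/s3/bucket/Users", "s3://bucket/Users")            # from the mount command above
table.add("/hdfs/salesdata", "hdfs://service/salesdata")      # assumed mount path
print(table.resolve("/s3/bucket/Users/Alice/data.parquet"))   # s3://bucket/Users/Alice/data.parquet
print(table.resolve("/hdfs/salesdata/Reports"))               # hdfs://service/salesdata/Reports
```

Because the mapping is only a prefix rewrite, mounting a bucket does not copy any data; the files stay where they are and only the addressing changes.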

Common theme:
●Use Apache Parquet format for fast point-query lookups into structured data
○The industry standard today for data lakes
●Store PB-scale Parquet files on S3
●Reading directly from S3 has poor tail latency
Low Latency Read Accelerator on S3
Diagram: SQL/Pandas/Polars clients read from the distributed cache in ~1 ms; reads that go to the data lake on AWS take 30 ms - 200 ms.

Common theme:
●Writes can be overwrites or appends
●Data is either kept replicated within Alluxio space
or uploaded asynchronously to S3

Low-latency & “Reliable” Write Buffer on S3
Diagram: a RocksDB or S3 client appends to the distributed cache in ~5 ms; the cache uploads to the data lake on AWS in the background.
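
The write-buffer pattern (acknowledge an append quickly, upload in the background) can be sketched as below. This is a single-process, in-memory illustration under stated assumptions: the `WriteBuffer` class and the `upload` callback are hypothetical, and the sketch does not model the replication across cache nodes that the slide relies on for reliability.

```python
import queue
import threading


class WriteBuffer:
    """Hypothetical append buffer that uploads full objects in the background."""

    def __init__(self, upload):
        self._upload = upload            # callable(key, data): writes to the object store
        self._data = {}                  # key -> bytearray held in the cache layer
        self._lock = threading.Lock()
        self._pending = queue.Queue()
        threading.Thread(target=self._drain, daemon=True).start()

    def append(self, key: str, chunk: bytes) -> None:
        # The caller returns as soon as the bytes are buffered locally.
        with self._lock:
            self._data.setdefault(key, bytearray()).extend(chunk)
        self._pending.put(key)

    def read(self, key: str) -> bytes:
        with self._lock:
            return bytes(self._data[key])

    def _drain(self) -> None:
        # Standard S3 has no append, so each upload rewrites the whole object.
        while True:
            key = self._pending.get()
            with self._lock:
                snapshot = bytes(self._data[key])
            self._upload(key, snapshot)
            self._pending.task_done()

    def flush(self) -> None:
        """Block until every queued upload has completed."""
        self._pending.join()


store = {}  # stand-in for the S3 bucket
buf = WriteBuffer(lambda key, data: store.__setitem__(key, data))
buf.append("wal/000001.log", b"put k1 v1\n")
buf.append("wal/000001.log", b"put k2 v2\n")
buf.flush()
print(store["wal/000001.log"])  # b'put k1 v1\nput k2 v2\n'
```

The trade-off is visible in the code: between `append` returning and the upload finishing, the data exists only in the buffer, which is why the slide pairs this mode with replication inside the cache.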

Alluxio: Bringing Performance and Semantics to S3
A software layer that transparently sits between applications and S3 (or any object store)
offering both POSIX and S3-compatible APIs.
Benefits (on top of S3) | Capability
Zero-migration | Mount existing S3 buckets as-is; no data move required
Low-latency accelerator | Achieves sub-ms latency for S3 objects
Semantic bridge | Lets users access data through S3 or POSIX APIs
Minimal-hardware requirement | Pool local SSDs/NVMe drives for intelligent, cost-efficient caching
Flexible write modes | Enable append, async writes, and cache-only updates
Kubernetes-native | Deploy via Operator; integrated metrics, tracing, and observability

Alternatives on AWS: Side-by-Side Comparison
Feature | S3 Standard | S3 Express One Zone | FSx Lustre + S3 | Alluxio + S3
Latency (TTFB) | 100+ ms | 1–10 ms | 1 ms | 1 ms
Multi-cloud | ❌ | ❌ | ❌ | ✅
POSIX API | ❌ | ❌ | ✅ | ✅
S3 API | ✅ | ✅ | ❌ | ✅
Support WALs (Append) | ❌ | ✅ | ✅ | ✅ (via POSIX)
Data Migration Required | No | High (creation-time choice) | No | No
Cost ($/TB/mo) | ~$23 [1] | ~$110 [2] | ~$143 [3] | ~$23 [4] to ~$41 [5]

[1] Assumes S3 Standard is the source of truth, holding full data
[2] Assumes S3 Express One Zone holds full data, as the storage class must be chosen at bucket creation time
[3] Assumes the 1,000 MB/s/TiB class, with FSx Lustre holding 20% hot data while S3 keeps full data
[4] Assumes Alluxio deployed on GPU spare disks holding 20% hot data, with no additional hardware cost, while S3 keeps full data
[5] Assumes a separate Alluxio cluster holding 20% hot data using i3en.6xlarge instances (1 yr reserved), while S3 keeps full data
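
The cost row can be read as a blended figure: the full dataset on S3 plus a cache tier sized at 20% of the data. The sketch below assumes that simple additive model (the deck does not state its formula) and back-solves the per-TB price of each cache tier implied by the table. The function names are hypothetical, and the implied prices are derived here, not quoted from the slides.

```python
S3_STANDARD = 23.0   # $/TB/month, from the table
HOT_FRACTION = 0.20  # share of data held in the fast tier, from the footnotes


def blended_cost(capacity_cost, cache_cost, hot_fraction):
    """Assumed model: pay for all data on S3, plus the cache tier for the hot share."""
    return capacity_cost + hot_fraction * cache_cost


def implied_cache_cost(total_cost, capacity_cost, hot_fraction):
    """Invert the model to get the cache tier's own $/TB/month."""
    return (total_cost - capacity_cost) / hot_fraction


fsx_tier = implied_cache_cost(143, S3_STANDARD, HOT_FRACTION)
alluxio_tier = implied_cache_cost(41, S3_STANDARD, HOT_FRACTION)
print(round(fsx_tier), round(alluxio_tier))            # 600 90

# Spare GPU-node disks are treated as free in footnote 4, so the total stays at S3's price.
print(blended_cost(S3_STANDARD, 0.0, HOT_FRACTION))    # 23.0
```

Under this model the spread between the FSx and Alluxio columns comes entirely from the price of the tier holding the hot 20%, since all three tiered options pay the same S3 bill for the full dataset.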

Real World Use Cases
●Blackout Power Trading
●Salesforce

Market Window Pressure Cooker
30 minutes between last forecast and market close
= 15 minutes to run inference w/ thousands of small ML models
+ 15 minutes to review, manually adjust for risk profiles and
human insights, and submit

S3 Becoming Inefficient
●Currently 5k models (~6/sec) -> growing to 100k (~110/sec)
●Queries averaged 3.8 seconds for inference
●Reading model artifacts and saving results takes 3+ seconds
●~80% of time is spent waiting on I/O
CASE STUDY 1: Blackout Power Trading

Achieving Double-Digit Millisecond Offline Feature Stores with Alluxio

Performance
Query Type* | Without Alluxio: Query (ms) | With Alluxio: Cold Query (ms) | With Alluxio: Hot Query (ms)
Inference (24 rows) | 3,727 | 99 | 45
Training (70k rows) | 3,841 | 171 | 104
* 20-table join, 4 columns per table, 1 primary key, 81-column result

Operational Payoff
● ~60x reduction in latency for inference
● ~30x reduction in latency for training
● Scaling from 5,000 to 100,000+ models in the same 15-minute window
● No online feature store necessary
● Low latency training data
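
The throughput targets and latency ratios follow directly from the figures on the slides. The short calculation below reproduces them; the helper names are hypothetical, and the inputs are the 15-minute window, the model counts, and the query times from the table.

```python
WINDOW_S = 15 * 60  # the 15-minute inference window, in seconds


def models_per_second(n_models, window_s=WINDOW_S):
    """Required sustained rate to finish all models inside the window."""
    return n_models / window_s


def speedup(before_ms, after_ms):
    """Latency ratio between the baseline and the accelerated query."""
    return before_ms / after_ms


print(round(models_per_second(5_000), 1))     # 5.6
print(round(models_per_second(100_000), 1))   # 111.1

# Inference query: cold and hot cache versus no cache
print(round(speedup(3727, 99), 1), round(speedup(3727, 45), 1))     # 37.6 82.8
# Training query: cold and hot cache versus no cache
print(round(speedup(3841, 171), 1), round(speedup(3841, 104), 1))   # 22.5 36.9
```

The quoted ~60x and ~30x reductions fall between the cold-cache and hot-cache ratios computed here.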

Talk from Greg Lindstrom @ Blackout Power Trading
https://www.alluxio.io/videos/ai-ml-infra-meetup-achieving-double-digit-millisecond-offline-feature-stores-with-alluxio

CASE STUDY 2: Salesforce

Ultra Low Latency Access for Parquet on S3
Challenge: Agentforce requires ultra-low-latency
access to PBs of Parquet files.
●S3 P99 latency is ~100 ms
●Data lake is huge (PB+)
●Agents make multiple (10 to 100+) fine-grained requests to the data lake
●Same function needed across 50 AZs
Before: Agents -> Iceberg data lake, 50 ms - 200 ms
After: Agents -> query offloading (~1 ms) -> Iceberg data lake

With query offloading (for example, a single RPC to a
Parquet reader), the request is optimized, and
much of the filtering and logic happens closer to the
data, reducing round trips and latency.
Result: 411 ms → 0.3 ms
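
Why collapsing many fine-grained reads into one RPC matters can be shown with a deliberately simple latency model. It assumes the requests are issued one after another, uses the ~100 ms S3 and ~1 ms cache figures from the slides as per-request latencies, and does not attempt to reproduce the measured 411 ms and 0.3 ms results. The function names are hypothetical.

```python
def chatty_latency_ms(n_requests, per_request_ms):
    """Sequential fine-grained reads: each round trip adds its full latency."""
    return n_requests * per_request_ms


def offloaded_latency_ms(rpc_ms):
    """One RPC; filtering runs next to the data, so only one round trip is paid."""
    return rpc_ms


for n in (10, 100):
    print(
        n,
        chatty_latency_ms(n, 100),   # direct to S3 at ~100 ms per request
        chatty_latency_ms(n, 1),     # through the cache at ~1 ms per request
        offloaded_latency_ms(1),     # single offloaded RPC at ~1 ms
    )
# 10 1000 10 1
# 100 10000 100 1
```

Caching shrinks the cost of each round trip, while offloading removes the multiplier; the two effects compound when an agent issues 10 to 100+ requests per task.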

A joint engineering collaboration between Alluxio and Salesforce. For more details:
https://www.alluxio.io/whitepaper/meet-in-the-middle-for-a-1-000x-performance-boost-querying-parquet-files-on-petabyte-scale-data-lakes

Takeaway
●S3 remains essential but wasn't designed for latency-critical, semantically rich
workloads
●You don't need to migrate away from S3; augment it with intelligent caching and
performance layers
●Alluxio bridges the gap between S3's cost-effectiveness and the performance
demands of modern AI/ML workloads
●Proven results: sub-millisecond latency and real-world production deployments
●Maximize ROI: Get premium performance without premium costs or operational
complexity
