Alluxio Webinar
Oct 28, 2025
For more Alluxio Events: https://www.alluxio.io/events/
Amazon S3 and other cloud object stores have become the de facto storage system for organizations large and small. And it’s no wonder why. Cloud object stores deliver unprecedented flexibility with unlimited capacity that scales on demand and ensures data durability out-of-the-box at unbeatable prices.
Yet as workloads shift toward real-time AI, inference, feature stores, and agentic memory systems, S3’s latency and limited semantics begin to show their limits. In this webinar, you’ll learn how to augment — rather than replace — S3 with a tiered architecture that restores sub-millisecond performance, richer semantics, and high throughput — all while preserving S3’s advantages of low-cost capacity, durability, and operational simplicity.
We’ll walk through:
- The key challenges posed by latency-sensitive, semantically rich workloads (e.g. feature stores, RAG pipelines, write-ahead logs)
- Why “just upgrading storage” isn’t sufficient — the bottlenecks in metadata, object access latency, and write semantics
- How Alluxio transparently layers on top of S3 to provide ultra-low latency caching, append semantics, and zero data migration with both FSx-style POSIX access and S3 API access.
- Real-world results: achieving sub-ms TTFB, >90% GPU utilization in ML training, 80X faster feature store query response times, and dramatic cost savings from reduced S3 operations
- Trade-offs, deployment patterns, and best practices for integrating this tiered approach in your AI/analytics stack
Size: 4.19 MB
Language: en
Added: Oct 29, 2025
Slides: 26 pages
Slide Content
Alluxio + S3:
A Tiered Architecture for Latency-Critical,
Semantically-Rich Workloads
Jingwen Ouyang
Senior Product Manager @Alluxio
Oct 28th, 2025
Challenge In The New Era
Data… Lots of data… Lots of fast Data!
400+ trillion objects stored
150+ million requests per second
11 nines durability (99.999999999%)
~$23/TB/month cost-effective pricing
But Modern Workloads Demand More...
As AI training, inference, and real-time analytics evolve, S3 (and object storage in general) shows strain with latency-critical and semantically rich operations.
Amazon S3 Becomes the Hard Disk for the Cloud
What an Architect is asking for today:
- Sub-millisecond SLAs for online queries, feature stores, agentic memory, etc.
- Efficient write-ahead logs and checkpointing for large objects
- High-performance metadata operations across millions of objects
- All while maintaining S3's pricing, scalability, and operational simplicity
Modern AI & Real Time Workload Requirements
- Latency: GetObject TTFB is typically 30-200 ms — acceptable for batch, painful for inference and real-time access
- Limited Semantics: rename operations require copy + delete; append writes are not supported in Standard S3
- Metadata Operations: Standard S3 directories are prefixes, making large-scale listing expensive and slow
Where S3 Shows Strain
The Reality: S3 excels as a capacity store, but real-time, latency-critical workloads require sub-millisecond response times.
Our Solution: Augment, Don't Replace
Add a transparent, distributed caching and augmentation layer on top of S3, combining the best of multiple worlds:
- Mountable experience of AWS FSx for Lustre
- Ultra-low latency of AWS S3 Express One Zone
- Cost efficiency of standard S3 buckets
- No data migration required
What is Alluxio, really?
A shim layer on S3 (or other cloud storage) that provides sub-ms read latency, single-digit-ms write latency, and enhanced semantics, driven by modern data-intensive workloads.
Journey of Alluxio Since Inception
[Timeline slide, 2014 / 2019 / 2023 / 2024, spanning the Big Data Analytics, Cloud Adoption, and Generative AI eras]
Milestones: Alluxio open source project founded at UC Berkeley AMPLab (2014); Baidu deploys a 1000+ node cluster; Meta accelerates Presto workloads; Alluxio scales to 1 billion files; 7/10 top internet brands accelerated by Alluxio; 1000+ OSS contributors; AliPay accelerates model training; Alluxio scales to 10+ billion files; 9/10 top internet brands accelerated by Alluxio; Zhihu accelerates LLM model training; a leading ecommerce brand and a Fortune 5 brand accelerate model training.
Alluxio in AI & Analytics Ecosystem
[Diagram: applications access Alluxio through S3, POSIX, FSSPEC, and HDFS interfaces; the same object can be reached as s3://bucket/text.txt or /S3/text.txt]
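To make the multi-protocol access above concrete, here is a minimal sketch (in Python) that reads the same object once through the S3 API and once through a POSIX path. The endpoint URL and the /S3 mount point are assumptions for illustration, not documented defaults.

import boto3

# Hypothetical S3-compatible endpoint exposed by the caching layer;
# replace with your deployment's actual proxy address.
s3 = boto3.client("s3", endpoint_url="http://alluxio-proxy:39999/api/v1/s3")
obj = s3.get_object(Bucket="bucket", Key="text.txt")
via_s3_api = obj["Body"].read()

# The same object through the POSIX view of the namespace (hypothetical mount).
with open("/S3/text.txt", "rb") as f:
    via_posix = f.read()

assert via_s3_api == via_posix  # both paths return the same bytes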
Alluxio for Low Latency Caching
Alluxio is the industry-leading sub-ms time to first byte (TTFB) solution on S3-class storage.
How much better is Alluxio? (Details on the next slide)
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
➔ Unlimited, linear scalability
Alluxio for Low Latency Caching
➔ 45x Lower Latency than S3 Standard
➔ 5x Lower Latency than S3 Express One Zone
Test environment references:
● Alluxio EE: Alluxio Enterprise AI 3.6 (50TB cache); test env: 1 FUSE (C5n.metal, 100Gbps network) and 1 Worker (i3en.metal)
● AWS S3: AWS S3 bucket (Standard class); test env: 1 FUSE (C5n.metal, 100Gbps network)
● AWS S3 Express One Zone: AWS bucket (S3 Express One Zone class); test env: 1 FUSE (C5n.metal, 100Gbps network)
Core Feature 1: Distributed Caching
[Diagram: blocks A, B, C of s3://bucket/file1 and s3://bucket/file2 cached across Alluxio Workers 1..n, serving Big Data Query, Big Data ETL, and Model Training workloads]
● Worker selection based on consistent hashing
● Fine-grained chunks
● Can cache only the more important parts
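To illustrate the "worker selection based on consistent hashing" bullet, below is a generic consistent-hash-ring sketch that maps cached chunks to workers so that adding or removing a worker only remaps a small fraction of chunks. It is a conceptual example, not Alluxio's actual implementation.

import bisect
import hashlib

def _hash(key: str) -> int:
    # Stable hash of a string onto the ring.
    return int(hashlib.md5(key.encode()).hexdigest(), 16)

class ConsistentHashRing:
    def __init__(self, workers, virtual_nodes=100):
        # Each worker gets many virtual points on the ring for better balance.
        self._ring = sorted(
            (_hash(f"{w}#{i}"), w) for w in workers for i in range(virtual_nodes)
        )
        self._points = [p for p, _ in self._ring]

    def worker_for(self, chunk_key: str) -> str:
        # Pick the first virtual point clockwise from the chunk's hash.
        idx = bisect.bisect(self._points, _hash(chunk_key)) % len(self._ring)
        return self._ring[idx][1]

ring = ConsistentHashRing(["worker-1", "worker-2", "worker-3"])
print(ring.worker_for("s3://bucket/file2#chunk-0"))  # consistently maps to one worker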
Core Feature 2: Filesystem Namespace Virtualization
● Alluxio can be viewed as a logical file system
○ Multiple different storage services can be mounted into the same logical Alluxio namespace
● An Alluxio path is backed by a persistent storage address
○ alluxio://ip:port/bucket/Users/ <-> s3://bucket/Users
● Easy mount command
[Diagram: an Alluxio namespace rooted at / with Data and Users directories (Alice, Bob), backed by s3://bucket/Users in AWS us-east-1 and an on-prem data warehouse]
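As a conceptual sketch of namespace virtualization, the snippet below resolves a logical path to its backing store address with a longest-prefix mount table. The table entries and the on-prem address are hypothetical; in practice the mapping is created with Alluxio's mount command.

# Hypothetical mount table: logical Alluxio paths -> backing storage URIs.
MOUNT_TABLE = {
    "/Users": "s3://bucket/Users",           # AWS us-east-1 bucket
    "/Data": "hdfs://warehouse:8020/data",   # hypothetical on-prem data warehouse
}

def resolve(logical_path: str) -> str:
    # Longest-prefix match against mounted paths.
    mount = max(
        (m for m in MOUNT_TABLE if logical_path.startswith(m)),
        key=len,
        default=None,
    )
    if mount is None:
        raise FileNotFoundError(logical_path)
    return MOUNT_TABLE[mount] + logical_path[len(mount):]

print(resolve("/Users/Alice/model.pt"))  # -> s3://bucket/Users/Alice/model.pt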
Low Latency Read Accelerator on S3
Common theme:
● Use the Apache Parquet format for fast point-query lookups into structured data
○ Industry standard today for data lakes
● Store Parquet files at PB scale on S3
● Reading directly from S3 suffers from poor tail latency
[Diagram: SQL/Pandas/Polars clients read through the distributed cache at ~1 ms instead of hitting the AWS data lake directly at 30 ms - 200 ms]
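To ground the read-accelerator pattern, here is a sketch of the same Parquet point query issued against S3 directly and against a cached POSIX mount. The paths, column names, and key are placeholders, and reading s3:// paths with pandas assumes s3fs and pyarrow are installed.

import time
import pandas as pd

def point_query(path: str) -> pd.DataFrame:
    # Read only the needed columns, then filter to the key of interest.
    df = pd.read_parquet(path, columns=["entity_id", "feature_1"])
    return df[df["entity_id"] == 12345]

for path in [
    "s3://bucket/features/part-0.parquet",          # direct S3: ~30-200 ms TTFB
    "/mnt/alluxio/bucket/features/part-0.parquet",  # cached mount: ~1 ms TTFB
]:
    start = time.perf_counter()
    point_query(path)
    print(path, f"{(time.perf_counter() - start) * 1e3:.1f} ms")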
Low-latency & “Reliable” Write Buffer on S3
Common theme:
● Writes can be overwrites or appends
● Data is either kept replicated in the Alluxio space or asynchronously uploaded to S3
[Diagram: RocksDB/S3 clients append to the distributed cache at ~5 ms, with uploads to the AWS data lake happening in the background]
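A minimal sketch of the write-buffer pattern, assuming the bucket is exposed through a POSIX (FUSE) mount at a hypothetical path: write-ahead-log records are appended with plain file I/O, and the caching layer replicates and uploads to S3 in the background, so durability guarantees depend on that layer's policy.

import os

# Hypothetical FUSE mount of the bucket; standard S3 alone has no append primitive.
WAL_PATH = "/mnt/alluxio/bucket/wal/segment-000001.log"

def append_record(record: bytes) -> None:
    with open(WAL_PATH, "ab") as f:   # POSIX append through the mount
        f.write(record + b"\n")
        f.flush()
        os.fsync(f.fileno())          # durability still bounded by the layer's upload policy

append_record(b'{"txn": 42, "op": "put", "key": "k1"}')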
Alluxio: Bringing Performance and Semantics to S3
A software layer that transparently sits between applications and S3 (or any object store), offering both POSIX and S3-compatible APIs.
Benefit (on top of S3) | Capability
Zero migration | Mount existing S3 buckets as-is; no data move required
Low-latency accelerator | Achieves sub-ms latency for S3 objects
Semantic bridge | Allows users to access data via S3 or POSIX
Minimal hardware requirement | Pool local SSDs/NVMes for intelligent, cost-efficient caching
Flexible write modes | Enable append, async writes, and cache-only updates
Kubernetes-native | Deploy via Operator; integrated metrics, tracing, and observability
Alternatives on AWS: Side-by-Side Comparison
Feature | S3 Standard | S3 Express One Zone | FSx Lustre + S3 | Alluxio + S3
Latency (TTFB) | 100+ ms | 1–10 ms | 1 ms | 1 ms
Multi-cloud | ❌ | ❌ | ❌ | ✅
POSIX API | ❌ | ❌ | ✅ | ✅
S3 API | ✅ | ✅ | ❌ | ✅
Supports WALs (append) | ❌ | ✅ | ✅ | ✅ (via POSIX)
Data migration required | No | High (creation-time choice) | No | No
Cost ($/TB/mo) | ~$23 [1] | ~$110 [2] | ~$143 [3] | ~$23 [4] to ~$41 [5]
[1] Assumes S3 Standard is the source of truth, holding the full data
[2] Assumes S3 Express One Zone holds the full data, since the storage class must be chosen at bucket creation time
[3] Assumes the 1,000 MB/s/TiB class, with FSx Lustre holding 20% hot data while S3 keeps the full data
[4] Assumes Alluxio deployed on GPU servers' spare disks holding 20% hot data (no additional hardware cost), while S3 keeps the full data
[5] Assumes a separate Alluxio cluster holding 20% hot data on i3en.6xlarge instances (1-yr reserved), while S3 keeps the full data
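For intuition on how footnotes 4 and 5 could compose a blended $/TB/month figure, here is a back-of-the-envelope sketch. Only the ~$23/TB/mo S3 price and the 20% hot-data ratio come from the slide; the cache-tier price is a made-up placeholder.

# Blended cost: S3 keeps the full dataset while the cache tier holds ~20% hot data.
s3_price_per_tb = 23.0      # S3 Standard, full dataset (from the slide)
hot_fraction = 0.20         # share of data held in the cache tier (from the footnotes)
cache_price_per_tb = 90.0   # HYPOTHETICAL $/TB/mo for a dedicated cache cluster

blended_spare_disk = s3_price_per_tb                                    # cache on GPU hosts' spare disks
blended_dedicated = s3_price_per_tb + hot_fraction * cache_price_per_tb

print(f"spare-disk deployment: ~${blended_spare_disk:.0f}/TB/mo")
print(f"dedicated cache cluster: ~${blended_dedicated:.0f}/TB/mo")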
Real World Use Cases
● Blackout Power Trading
● Salesforce
CASE STUDY 1: Blackout Power Trading
Achieving Double-Digit Millisecond Offline Feature Stores with Alluxio
Market Window Pressure Cooker
30 minutes between the last forecast and market close
= 15 minutes to run inference with thousands of small ML models
+ 15 minutes to review, manually adjust for risk profiles and human insights, and submit
S3 Becoming Inefficient
● Currently 5k models (~6/sec), growing to 100k (~110/sec)
● Queries averaged 3.8 seconds for inference
● Reading model artifacts and saving results takes 3+ seconds
● ~80% of time waiting on I/O
Performance
Query Type* | Without Alluxio: Query (ms) | With Alluxio: Cold Query (ms) | With Alluxio: Hot Query (ms)
Inference (24 rows) | 3,727 | 99 | 45
Training (70k rows) | 3,841 | 171 | 104
* 20-table join, 4 columns per table, 1 primary key, 81-column result
Operational Payoff
● ~60x reduction in latency for inference
● ~30x reduction in latency for training
● Scaling from 5,000 to 100,000+ models in the same 15-minute window
● No online feature store necessary
● Low-latency training data
CASE STUDY 1: Blackout Power Trading
Achieving Double-Digit Millisecond Offline Feature Stores with Alluxio
Talk from Greg Lindstrom @ Blackout Power Trading
https://www.alluxio.io/videos/ai-ml-infra-meetup-achieving-double-digit-millisecond-offline-feature-stores-with-alluxio
CASE STUDY 2: Salesforce
Ultra Low Latency Access for Parquet on S3
Challenges: Agentforce requires ultra-low-latency data access to PBs of Parquet files.
● S3 P99 latency is ~100 ms
● The data lake is huge (PB+)
● Agents make multiple (10 to 100+) fine-grained requests to the data lake
● The same function is needed across 50 AZs
[Diagram: Before, agents query the Iceberg data lake directly at 50 ms - 200 ms; After, with query offloading, agents are served at ~1 ms]
CASE STUDY 2: Salesforce
Ultra Low Latency Access for Parquet on S3
With query offloading (like a single RPC to a Parquet reader), the request is optimized, and much of the filtering/logic happens closer to the data, reducing chatter and latency.
Result: 411 ms → 0.3 ms
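To show the general shape of query offloading, the sketch below pushes projection and predicate filtering into a single Parquet-reader call instead of issuing many fine-grained object reads. The dataset path, column names, and filter value are placeholders, and this is plain pyarrow usage, not the actual Alluxio/Salesforce offload API.

import pyarrow.dataset as ds

# Hypothetical cached path to the Parquet data lake.
dataset = ds.dataset("/mnt/alluxio/lake/events", format="parquet")

table = dataset.to_table(
    columns=["account_id", "event_type", "ts"],           # projection pushdown
    filter=ds.field("account_id") == "001xx000003DGb2",   # predicate pushdown near the data
)
print(table.num_rows)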
CASE STUDY 2: Salesforce
Ultra Low Latency Access for Parquet on S3
A joint engineering collaboration between Alluxio and Salesforce. For more details:
https://www.alluxio.io/whitepaper/meet-in-the-middle-for-a-1-000x-performance-boost-querying-parquet-files-on-petabyte-scale-data-lakes
Takeaway
● S3 remains essential but wasn't designed for latency-critical, semantically rich workloads
● Don't need to migrate away from S3 — augment it with intelligent caching and performance layers
● Alluxio bridges the gap between S3's cost-effectiveness and the performance demands of modern AI/ML workloads
● Proven results: sub-millisecond latency and real-world production deployments
● Maximize ROI: get premium performance without premium costs or operational complexity