Achieving Double-Digit Millisecond Offline Feature Stores with Alluxio

About This Presentation

AI/ML Infra Meetup
Sep. 30, 2025
Organized by Alluxio

For more Alluxio Events: https://www.alluxio.io/events/

Speaker:
- Greg Lindstrom (VP ML Trading @ Blackout Power Trading)

Greg Lindstrom shared how they achieved double-digit millisecond offline feature store performance using Alluxio.


Slide Content

Achieving Double-Digit Millisecond Offline Feature Stores with Alluxio
Greg Lindstrom, VP ML Trading
Alluxio Meetup | September 30, 2025

What if…
●What if S3 were 10x faster? 30x faster? (bandwidth and latency)

●What if S3 data was served at in-memory latency?

●What complexity could be reduced, how much money and time
could be saved?

But First… Power Trading 101

10k+ Tradeable Locations
Source: Yes Energy

Locational Marginal Price (LMP)
●Prices are the result of a huge optimization model solving for optimal dispatch (minimizing cost)

●Marginal generation unit sets price

Day-Ahead Versus Real Time Power
●Day-ahead (DA) power scheduled one day in advance

●Real-time (RT) power is scheduled every 5 minutes

●Trading the spread between DA and RT power
○Why are these markets different?

●Think of an air traffic controller scheduling planes one day in advance

Day-Ahead Power Trading Primer
●Daily blind auction

●Who participates:
○Utilities
○Asset owners
○Speculators (optimizers)

●Speculators physically change generator dispatch

Competitive Market
●Using the latest data
○Weather forecasts
○Renewables forecasts
○Outages
○Pricing
○Etc

●Latest renewables forecast comes out 30 minutes before
market close

Market Window Pressure Cooker
●30 minutes between last forecast and market close
=
●15 minutes to run inference across thousands of small ML models
+
●15 minutes to review, manually adjust for risk profiles and
human insights, and submit

The Feature Join Problem
Benchmark Join:
●20 feature tables, 4 columns from each table, plus 1 primary key: an 81-column result
●24 rows for inference and 70k rows for training

Join Example in SQL
SELECT
t1.col1, t1.col2, t1.col3, t1.col4,
t2.col1, t2.col2, t2.col3, t2.col4,
...
t20.col1, t20.col2, t20.col3, t20.col4
FROM table1 t1
JOIN table2 t2 ON t1.id = t2.t1_id
JOIN table3 t3 ON t2.id = t3.t2_id
JOIN table4 t4 ON t3.id = t4.t3_id
...
JOIN table20 t20 ON t19.id = t20.t19_id;

S3 Becoming Inefficient
●Currently 5k models (~6/sec) -> growing to 100k (~110/sec)

●The benchmark join query averaged 3.8 seconds for inference

●Reading model artifacts and saving results took 3+ seconds

●~80% of the time was spent waiting on I/O

Why Have a Feature Store?
●Simplicity
○Ingest
○Query

●Metadata

●Feature Views

●Versioning

Standard Feature Store Pipeline
Source: https://www.tecton.ai/blog/what-is-a-feature-store/

Online Feature Stores
Pros:
●Very high performance inference
●Ingesting streaming data is simple

Cons:
●Limited data
●Complexity
○Data lifecycle
○Split sources of truth
●Expensive
●Training data serving is still very slow
●Volatile

Offline Feature Stores
Pros:
●Relatively cheap
●Low complexity
●Durable
●All data

Cons:
●Slow
●Ingesting streaming data is more complex

What if it wasn't slow?

What Makes Offline Slow?
1. Storage latency / bandwidth

2. Storage format

3. Feature joining

4. Post-join data transfer

What if we optimized for speed?

1. Solving Storage Latency / Bandwidth
[Alluxio enters the chat]
●Cache files on NVMe drives for low latency

●Bandwidth now constrained by EC2 instance types (how does 10 GB/s sound?)

●High availability + maintains S3 durability

●Scales linearly

●Lots of tuning options depending on workload
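
As a rough sketch of what this looks like from the client side: Alluxio can expose an S3-compatible endpoint, so a reader such as Polars can point at the cache instead of S3 itself. The endpoint URL, bucket, and credentials below are illustrative assumptions, not details from the talk.

import polars as pl

# Point the S3 client at the Alluxio cache rather than AWS. The
# endpoint and credentials here are placeholders (assumed setup).
storage_options = {
    "aws_endpoint_url": "http://alluxio-proxy:39999/api/v1/s3",
    "aws_access_key_id": "dummy",        # seen by the proxy, not AWS
    "aws_secret_access_key": "dummy",
    "aws_region": "us-east-1",
    "aws_allow_http": "true",            # plain HTTP inside the cluster
}

# Cold read: Alluxio fetches the blocks from S3 and caches them on NVMe.
# Hot read: the same call is served from the local cache at low latency.
df = pl.read_parquet(
    "s3://features/table1.parquet",
    storage_options=storage_options,
)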

2. Solving Storage Format
●Contenders: Parquet, Delta Lake, Iceberg, Hudi, Avro

●Parquet is easily the winner for speed; however:
○Can't handle concurrent writes
○Queries may see partial results (partitioned tables)
○No version history

Can we still make Parquet work?
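
One common way to square this circle (an assumption about the approach, not something stated in the slides) is to write each snapshot to a fresh file and publish it with an atomic rename of a small manifest, so readers see either the old version or the new one, never a partial file. This relies on a filesystem-like layer with atomic rename, which plain S3 does not provide.

import os
import uuid
import polars as pl

def publish_snapshot(df: pl.DataFrame, table_dir: str) -> str:
    # Each writer gets a unique file name, so concurrent writers
    # never clobber each other's Parquet output.
    version = uuid.uuid4().hex
    data_path = os.path.join(table_dir, f"data-{version}.parquet")
    df.write_parquet(data_path)

    # Publish by atomically renaming a tiny manifest that points at
    # the current snapshot; older snapshots remain as version history.
    tmp = os.path.join(table_dir, f"_MANIFEST.{version}")
    with open(tmp, "w") as f:
        f.write(data_path)
    os.replace(tmp, os.path.join(table_dir, "_MANIFEST"))
    return data_path

def read_snapshot(table_dir: str) -> pl.DataFrame:
    # Readers resolve the manifest first, so they never observe a
    # half-written Parquet file.
    with open(os.path.join(table_dir, "_MANIFEST")) as f:
        return pl.read_parquet(f.read().strip())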

3. Solving Multi-Join Performance
●Contenders: Spark, Trino, Flink, Dask, DuckDB, Pandas, Polars

●Polars and DuckDB are the fastest, with Polars the clear winner

●Ideally zero-copy for post-join data transfer
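
A minimal Polars sketch of the benchmark-style chained join follows; the file and column names mirror the SQL example above and are illustrative. The lazy API lets Polars plan the whole 20-table join before reading anything.

import polars as pl

# Lazily scan all 20 feature tables; no data is read until .collect().
tables = [pl.scan_parquet(f"table{i}.parquet") for i in range(1, 21)]

joined = tables[0]
left_key = "id"
for i in range(2, 21):
    # table{i} carries a foreign key t{i-1}_id back to the previous
    # table, as in the SQL example. Polars drops the right-hand join
    # key and suffixes colliding column names from the right table.
    joined = joined.join(
        tables[i - 1],
        left_on=left_key,
        right_on=f"t{i - 1}_id",
        how="inner",
        suffix=f"_t{i}",
    )
    left_key = f"id_t{i}"  # table{i}'s id column after suffixing

df = joined.collect()  # 24 rows for inference, ~70k rows for training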

4. Solving Post-Join Data Transfer
●Format contenders: JSON, Parquet, Arrow, others

●Arrow streaming over gRPC (HTTP/2) outperforms all the others, which is a huge time savings due to…

●The Arrow Flight server
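
A hedged sketch of the Flight piece (pyarrow.flight is the actual library; the port, ticket, and in-memory table below are illustrative): record batches stream over gRPC straight into Arrow memory on the client, with no JSON or Parquet re-encoding on the wire.

import pyarrow as pa
import pyarrow.flight as flight

class FeatureFlightServer(flight.FlightServerBase):
    """Serves join results as an Arrow record-batch stream over gRPC."""

    def __init__(self, location="grpc://0.0.0.0:8815"):
        super().__init__(location)

    def do_get(self, context, ticket):
        # In the real pipeline this would be the post-join frame,
        # converted to Arrow (Polars' df.to_arrow() is cheap since
        # Polars is already Arrow-backed).
        table = pa.table({"id": [1, 2, 3], "col1": [0.1, 0.2, 0.3]})
        return flight.RecordBatchStream(table)

# Client side: one call streams the whole result into an Arrow table.
# client = flight.connect("grpc://feature-server:8815")
# table = client.do_get(flight.Ticket(b"inference-batch")).read_all()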

Putting It Together
●Kubernetes
●High Availability
●Offline Low Latency
●Scalable
●Durable

Performance
Query Type*            Without Alluxio   With Alluxio      With Alluxio
                       Query (ms)        Cold Query (ms)   Hot Query (ms)
Inference (24 rows)    3,727             99                45
Training (70k rows)    3,841             171               104

* 20-table join, 4 columns per table, 1 primary key, 81-column result

Operational Payoff
●~60x reduction in latency for inference

●~30x reduction in latency for training

●Scaling from 5,000 to 100,000+ models in the same 15-minute window

●No online feature store necessary

●Low-latency training data

Q&A
●Discussion

●Blog Post:
https://www.alluxio.io/blog/blackout-power-trading-achieved-low-latency-offline-feature-store-performance-with-alluxio-caching

●Get in touch: linkedin.com/in/greg-lindstrom