Using ScyllaDB to Implement Lists in Medium’s Feature Store by Andreas Saudemont

About This Presentation

Discover how Medium is leveraging ScyllaDB to power a fast, scalable data layer for lists in Medium’s feature store.

Added: Mar 05, 2025

Slide Content

A ScyllaDB Community
Using ScyllaDB to Implement
Lists in Medium’s Feature Store
Andréas Saudemont
Software Engineer

Andréas Saudemont (he/him)
■Principal software engineer at Medium
■Building scalable architectures for heavy loads

Presentation Agenda
■Medium’s feature store and list features
■ScyllaDB data model
■Implementing list operations
■Some metrics
■Recap

Medium’s Feature Store
and List Features

Medium’s Feature Store
Key component of Medium’s recommendation system
Used by machine learning models for ‘For you’ feed, Daily Digest, etc.
Database with a specialized API
Low-latency, high-throughput access patterns

Features
A property of an entity in the feature store
Defined by:
●entity type – e.g. user
●name – e.g. is_member
●data type of its values – e.g. boolean
●version (optional) – e.g. ‘2025/03/11’
A feature value is the value of a feature for a given entity ID
●e.g. true for the user.is_member feature for entity "user123"
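As a minimal sketch of the definition above, a feature can be modeled as an immutable record, with feature values stored per (feature, entity ID) pair. The `Feature` class and the `feature_values` dictionary are hypothetical names used for illustration only; the deck does not show how the feature store represents features internally.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Feature:
    """Hypothetical model of a feature definition (illustration only)."""
    entity_type: str                # e.g. "user"
    name: str                       # e.g. "is_member"
    data_type: type                 # e.g. bool
    version: Optional[str] = None   # optional, e.g. "2025/03/11"


# The user.is_member feature from the slide.
is_member = Feature(entity_type="user", name="is_member", data_type=bool)

# A feature value is the value of a feature for a given entity ID.
feature_values = {(is_member, "user123"): True}
```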

Relational Features
Has multiple values for a given entity ID
Each value has:
●a relation ID – the ID of the related entity
●a timestamp
Sample: story.user_has_read
●Relates with the user entity type
●Values indicate whether and when a given user has read a given story

Limitations of Relational Features
Suboptimal data model
Data split across 2 tables
→ Too many DB queries
●Inefficient “ALLOW FILTERING” queries to fetch entity IDs
●Plus one query for each entity ID to fetch values
→ Hard to optimize using primary keys/indexes

List Features
Goal: better way to handle cross-entity relations in the feature store
List feature: defined by its entity type, name, and optional version (like other features)
List: value of a list feature for a given entity ID
List item: holds a value and a timestamp
Mandatory time-to-live (TTL)
Expected call rate of up to 1M ops/s

Sample: user.reading_history

ScyllaDB Data Model

The list_items Table
-- Stores all the list items for all the list features
-- managed by the feature store.
CREATE TABLE list_items (
  feature_key TEXT,
  entity_id TEXT,
  item_key TEXT,
  value BLOB,
  PRIMARY KEY ((feature_key, entity_id), item_key)
)
WITH CLUSTERING ORDER BY (item_key DESC)
AND DEFAULT_TIME_TO_LIVE = $defaultTTL;

Columns
feature_key TEXT
●Identifies the list feature
●Built by concatenating the entity type, name, and version
entity_id TEXT
●ID of the entity the list belongs to
item_key TEXT
●Identifies the item in the list
●Built by concatenating the timestamp and a hash of the value
value BLOB
●Opaque bytes representation of list item value
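The feature_key concatenation can be sketched as a small helper. The `type#name|version` layout is taken from the INSERT template in the Add List Items slide; the function name `build_feature_key` and the choice to leave the version part empty when no version is set are assumptions made for illustration.

```python
from typing import Optional


def build_feature_key(entity_type: str, name: str,
                      version: Optional[str] = None) -> str:
    """Concatenate entity type, name, and version into a feature_key.

    Layout follows the deck's INSERT template: entityType#featureName|version.
    Using an empty string for a missing version is an assumption.
    """
    return f"{entity_type}#{name}|{version or ''}"


key = build_feature_key("user", "reading_history", "2025/03/11")
```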

Primary Key
Partition key: feature_key + entity_id
●feature_key = entity type + name + version
●All items of a given list are stored in the same partition
●Enables efficient operations on a given list
Clustering key: item_key
●item_key = timestamp + hash(value)
●Items in a list are sorted following the order of their timestamp
●Enables efficient retrieval of list items in reverse-chronological order
●Allows multiple items in a list with same timestamp but distinct values
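To illustrate why a timestamp-plus-hash clustering key gives both ordering and uniqueness, here is a minimal sketch. The deck does not specify the actual encoding, so the zero-padded millisecond timestamp, the `:` separator, and the truncated SHA-256 digest are all assumptions, as is the name `build_item_key`.

```python
import hashlib


def build_item_key(timestamp_ms: int, value: bytes = b"") -> str:
    """Hypothetical item_key encoding: padded timestamp, then a value hash."""
    # Zero-padding makes lexicographic order match numeric order (assumption).
    prefix = f"{timestamp_ms:013d}"
    if not value:
        return prefix
    digest = hashlib.sha256(value).hexdigest()[:16]
    return f"{prefix}:{digest}"


items = [
    (1741000000000, b"story-a"),
    (1741000005000, b"story-b"),
    (1741000005000, b"story-c"),  # same timestamp, different value
]
keys = [build_item_key(ts, value) for ts, value in items]

# Sorting keys in descending order mimics CLUSTERING ORDER BY (item_key DESC).
newest_first = sorted(keys, reverse=True)
```

With this encoding the two items sharing a timestamp get distinct keys because their value hashes differ, and the oldest item sorts last.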

The list_items_by_value LSI
Used by the Remove List Items with Value operation
Local index ensures data is stored on same node as the base table
Faster than a scan as query is highly selective
Faster than a global index for our use cases
CREATE INDEX list_items_by_value
ON list_items((feature_key, entity_id), value);

Implementing List Operations

Add List Items
BEGIN BATCH
  -- for each $item in $items:
  INSERT INTO list_items (feature_key, entity_id, item_key, value)
  VALUES (
    '$entityType#$featureName|$featureVersion',
    $entityID,
    ${buildItemKey(item)},
    ${item.Value}
  )
  USING TTL ${item.Timestamp + ttl - now};
APPLY BATCH;  -- (1)
(1) Logged batch (default) ensures that insertion of items is atomic
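The USING TTL expression in the template makes each item expire `ttl` after its own timestamp rather than after its insertion time. A minimal sketch of that arithmetic follows, assuming all quantities are in seconds (the deck does not state the units). The function names are hypothetical. Skipping items whose remaining TTL is not positive is also an assumption about how such items would be handled; it is motivated by CQL rejecting negative TTLs and treating a TTL of 0 as "no expiration".

```python
import time
from typing import Optional


def remaining_ttl_seconds(item_timestamp: int, ttl: int,
                          now: Optional[int] = None) -> int:
    """Seconds left before an item with this timestamp should expire."""
    if now is None:
        now = int(time.time())
    return item_timestamp + ttl - now


def should_insert(item_timestamp: int, ttl: int,
                  now: Optional[int] = None) -> bool:
    """Assumption: items that have already expired are not written at all."""
    return remaining_ttl_seconds(item_timestamp, ttl, now) > 0


# Item stamped at t=1000 with a one-hour TTL, inserted at t=1600.
left = remaining_ttl_seconds(1000, 3600, now=1600)
```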

Get List Items
SELECT value, item_key
FROM list_items
WHERE feature_key = '...' AND entity_id = $entityID  -- (1)
AND item_key >= ${buildItemKey(minTimestamp)}  -- (2)
ORDER BY item_key DESC  -- (3)
LIMIT $limit;
(1) Single partition for maximum efficiency
(2) Discard old items
(3) Efficient sorting via clustering key
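The behavior of this query can be mimicked in memory to show how the lower bound on item_key discards old items. This is a sketch only: the key encoding (zero-padded millisecond timestamp, optional `:hash` suffix) is an assumption, and so is the idea that a key built from a timestamp alone is the bare timestamp prefix. The function names and the dictionary standing in for a partition are hypothetical.

```python
from typing import Dict, List


def build_item_key(timestamp_ms: int, value_hash: str = "") -> str:
    """Hypothetical key: padded timestamp, optionally followed by a hash."""
    prefix = f"{timestamp_ms:013d}"
    return f"{prefix}:{value_hash}" if value_hash else prefix


def get_list_items(partition: Dict[str, bytes], min_timestamp_ms: int,
                   limit: int) -> List[bytes]:
    """In-memory stand-in for the SELECT on a single partition."""
    # A bare timestamp prefix sorts before every full key sharing that
    # timestamp, so ">=" keeps all items stamped at or after the minimum.
    lower_bound = build_item_key(min_timestamp_ms)
    kept = [(key, value) for key, value in partition.items()
            if key >= lower_bound]
    kept.sort(key=lambda pair: pair[0], reverse=True)  # newest first
    return [value for _, value in kept[:limit]]


partition = {
    build_item_key(1000, "aa"): b"old",
    build_item_key(2000, "bb"): b"mid",
    build_item_key(2000, "cc"): b"mid2",
    build_item_key(3000, "dd"): b"new",
}
result = get_list_items(partition, min_timestamp_ms=2000, limit=2)
```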

Remove List Items with Value
SELECT item_key FROM list_items  -- (1)
WHERE feature_key = '...' AND entity_id = $entityID
AND value = $value;

-- for each batch of item_key values:
DELETE FROM list_items  -- (2)
WHERE feature_key = '...' AND entity_id = $entityID
AND item_key IN ($itemKeyBatch);
(1) Fetch keys of items to delete via list_items_by_value LSI
(2) Run batches of DELETE statements
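The "for each batch of item_key values" step amounts to splitting the fetched keys into fixed-size chunks, one per DELETE statement. A minimal sketch follows; the helper name `chunked` and the batch size used in the example are assumptions, since the deck does not say how large each batch is.

```python
from typing import Iterator, List, Sequence


def chunked(item_keys: Sequence[str], batch_size: int) -> Iterator[List[str]]:
    """Yield consecutive slices of at most batch_size keys."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    for start in range(0, len(item_keys), batch_size):
        yield list(item_keys[start:start + batch_size])


# Each yielded batch would be bound to $itemKeyBatch in one DELETE.
batches = list(chunked(["k1", "k2", "k3", "k4", "k5"], batch_size=2))
```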

Remove All List Items
DELETE FROM list_items
WHERE feature_key = '...' AND entity_id = $entityID;  -- (1)
(1) Deletion is atomic as we’re deleting a whole partition

Some Metrics

Latencies

Recap

Recap
●Suboptimal data model leads to queries that scale badly
●List features designed as replacement for relational features
●Primary key shaped for efficient querying, sorting, and filtering
●Local secondary index for efficient querying outside the primary key
●ScyllaDB is fast

Stay in Touch
Andréas Saudemont
https://medium.com/@asaudemont
https://www.linkedin.com/in/andreassaudemont/

Features

Lists

Controlling Storage Usage with TTL

List Operations
