Caching for Performance Masterclass: Caching Strategies

ScyllaDB 284 views 33 slides Feb 27, 2025

About This Presentation

Exploring the tradeoffs of common caching strategies – and a look at the architectural differences.

- Which strategies exist
- When to apply different strategies
- ScyllaDB cache design


Slide Content

Caching Strategies
Felipe Mendes
CACHE FORGE

Which Strategies Exist?

Top Strategies
Alex Xu @ ByteByteGo – https://blog.bytebytego.com/p/top-caching-strategies

Applying These Strategies
Cache Aside
External Write/Read Through
Write Around / Write Back
Embedded Read Through
[Diagram label: DAX]
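
To make the difference between the strategies concrete, here is a minimal sketch contrasting cache aside with read/write through. Everything in it (the `Database` stand-in, the function names, the invalidate-on-write policy) is a hypothetical illustration, not code from the talk or from any product.

```python
class Database:
    """Hypothetical in-memory stand-in for the backing store."""

    def __init__(self):
        self.rows = {}
        self.reads = 0  # counts how often the store is actually read

    def get(self, key):
        self.reads += 1
        return self.rows.get(key)

    def put(self, key, value):
        self.rows[key] = value


def cache_aside_read(cache, db, key):
    """Cache aside: the application checks the cache, then the database."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value  # the application populates the cache itself
    return value


def cache_aside_write(cache, db, key, value):
    """Cache aside: write to the database and invalidate the cached copy."""
    db.put(key, value)
    cache.pop(key, None)


class ReadWriteThroughCache:
    """Read/write through: the application talks only to the cache layer."""

    def __init__(self, db):
        self.db = db
        self.entries = {}

    def get(self, key):
        if key not in self.entries:
            value = self.db.get(key)  # the cache layer loads on a miss
            if value is None:
                return None
            self.entries[key] = value
        return self.entries[key]

    def put(self, key, value):
        self.db.put(key, value)  # synchronous write to the store
        self.entries[key] = value
```

With cache aside the application owns both lookups and invalidation. With read/write through that logic lives in the cache layer, whether it is external to the database or embedded in it.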

ScyllaDB Cache Design

Data Flow – Writes
■Write goes to the memtable (RAM)
■Write is also recorded in the commitlog (Disk)
■A full memtable is flushed to an sstable (Disk) and replaced by a fresh memtable
Data Flow – Reads
■Read combines the memtable (RAM) with the sstables (Disk)
■Read consistency is easy
○Pin sstables and memtable
○Thanks to collocation
■… But SLOW

Buffer Cache?
RAM
Disk
sstable 4K

Buffer Cache?
Inefficient use of memory:
■Need to cache whole buffers to cache a single row
■Access locality not likely if data set >> RAM

SSTable page (4K)
Row (300B)
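
A back-of-the-envelope calculation shows how large the gap is. The 4K page and 300B row come from the slide; the 1 GiB cache size, the worst-case assumption of one hot row per cached page, and the omission of per-entry overhead are illustrative assumptions.

```python
PAGE_BYTES = 4 * 1024  # SSTable page size from the slide (4K)
ROW_BYTES = 300        # row size from the slide (300B)


def useful_fraction(row_bytes, page_bytes):
    """Share of a cached page that holds the row we actually wanted."""
    return row_bytes / page_bytes


def rows_cached(ram_bytes, bytes_per_entry):
    """How many rows fit if each cached row costs bytes_per_entry."""
    return ram_bytes // bytes_per_entry


ram = 1024 ** 3  # hypothetical 1 GiB of cache memory

print(f"useful share of each page: {useful_fraction(ROW_BYTES, PAGE_BYTES):.1%}")
print(f"rows cached, page granularity: {rows_cached(ram, PAGE_BYTES)}")
print(f"rows cached, row granularity:  {rows_cached(ram, ROW_BYTES)}")
```

Under these assumptions only about 7% of each cached page is useful, and the same memory holds roughly 13 times more rows when caching at row granularity.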

Buffer Cache?
Poor negative caching:
■Need to cache whole data buffer to indicate absent data

SSTable page (4K)
?

Buffer Cache?
Inefficient use of memory:
■Redundant buffers due to LSM
○Read may touch multiple SSTables
○Memory waste more pronounced

sstable sstable sstable
Read

Buffer Cache?
High CPU overhead for reads:
■Reads need to merge data from multiple sstables

sstable sstable sstable
Read

Buffer Cache?
High CPU overhead for reads:
■SSTable format optimized for compact storage, not read speed
■Parsing overhead:
○Need to parse index buffers sequentially
○Need to parse the data file

Buffer Cache?
Premature cache eviction due to SSTable compaction:
■SSTable compaction removes old files => buffer invalidation
■Hurts read performance by incurring misses


sstable
sstable
sstable
sstable

Naturally, we built our own cache
memtable
RAM
Disk
Read
cache
sstable
sstable
sstable

ScyllaDB Cache Structure
■Object cache
○Like memtable
■Optimized for low CPU overhead
○Fast reads
■Row-granularity caching
■Reflects data in all relevant SSTables for a given object (e.g. row)
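
The properties above can be illustrated with a small row-granularity object cache. This is only a sketch: `RowCache`, the `_ABSENT` sentinel and the `(timestamp, row)` layout are hypothetical names and simplifications, not ScyllaDB internals.

```python
_ABSENT = object()  # sentinel: the row is known not to exist


class RowCache:
    """Caches one merged object per row instead of raw sstable buffers."""

    def __init__(self, sstables):
        # Each sstable is modeled as a dict: key -> (timestamp, row).
        self.sstables = sstables
        self.rows = {}
        self.sstable_reads = 0  # counts the slow merge path

    def _read_from_sstables(self, key):
        self.sstable_reads += 1
        versions = [s[key] for s in self.sstables if key in s]
        if not versions:
            return _ABSENT
        # Merge once: keep only the newest version of the row.
        return max(versions, key=lambda version: version[0])[1]

    def get(self, key):
        if key not in self.rows:
            # Miss: merge across all sstables and cache the result,
            # including the fact that a row is absent.
            self.rows[key] = self._read_from_sstables(key)
        row = self.rows[key]
        return None if row is _ABSENT else row
```

A cached entry already reflects every relevant sstable, so a hit needs no merging or parsing, and an absent row costs one small sentinel rather than a whole data buffer.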

ScyllaDB Memory Management
■ScyllaDB reserves and manages most of the memory on a node
○Small reserve for the OS
○No use of Linux page cache (only direct I/O)
■Cache uses all available free memory
○Shrunk on pressure from memtable and other allocations

memtable cache other

Shard per core
CPU 0
CPU 1
CPU 2
CPU 3

Coherency
memtable
Read
cache
task
task
task
■Complex operations on data without dealing with concurrency
■No locking or complex lock-free algorithms
■Data structures and algorithms simple

memtable cache
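
A minimal sketch of the shard-per-core idea: each partition key maps to exactly one shard, and only that shard touches its own memtable and cache, so no locks are needed. The `crc32` hash, the `Shard` class and the helper functions are hypothetical stand-ins; ScyllaDB's real sharding algorithm is not shown on the slides.

```python
import zlib

NUM_SHARDS = 4  # matches CPU 0..3 on the slide


def shard_of(partition_key, num_shards=NUM_SHARDS):
    """Map a key to its owning shard (crc32 is a stand-in hash)."""
    return zlib.crc32(partition_key.encode("utf-8")) % num_shards


class Shard:
    """State owned by one core; never shared with other shards."""

    def __init__(self):
        self.memtable = {}
        self.cache = {}


shards = [Shard() for _ in range(NUM_SHARDS)]


def write(key, value):
    shard = shards[shard_of(key)]
    shard.memtable[key] = value
    shard.cache.pop(key, None)  # keep the cache coherent, no lock needed


def read(key):
    shard = shards[shard_of(key)]
    if key in shard.cache:
        return shard.cache[key]
    value = shard.memtable.get(key)
    if value is not None:
        shard.cache[key] = value
    return value
```

Because a key's memtable entry and cache entry always live on the same shard, updating both is a plain sequential operation within that shard.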

Challenge: Query & Manipulation (DQL/DML)
SELECT * FROM tbl WHERE pk = 0 AND ck >= 2


DELETE FROM tbl WHERE pk = 0 AND ck >= 2
What Now?

Range Queries
SELECT * FROM tbl WHERE pk = 0 AND ck >= 2
[Diagram: rows with ck = 2 and ck = 5 are cached; "?" marks the gaps around them]
range continuity
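
One way to picture range continuity is a per-entry flag saying "nothing is missing between the previous cached row and this one". The function below is a sketch under that assumption; the flag layout and the names `range_is_cached`, `continuous` and `tail_continuous` are hypothetical, not ScyllaDB's actual representation.

```python
import bisect


def range_is_cached(cks, continuous, tail_continuous, start):
    """Can `ck >= start` be answered from the cache alone?

    cks             -- sorted clustering keys of the cached rows
    continuous[i]   -- True if no uncached row can exist between the
                       previous cached key and cks[i]
    tail_continuous -- True if no uncached row can exist after cks[-1]
    """
    i = bisect.bisect_left(cks, start)
    if i == len(cks):
        # The range starts after every cached row.
        return tail_continuous
    if cks[i] != start and not continuous[i]:
        # The range starts inside a gap whose contents are unknown.
        return False
    if not all(continuous[i + 1:]):
        # Some later gap might hide rows that exist only on disk.
        return False
    return tail_continuous
```

With rows 2 and 5 cached by point reads, no gap is known to be complete, so `range_is_cached([2, 5], [False, False], False, 2)` is false and the query must go to disk. Once a range read has populated the cache and marked the gaps continuous, the same query can be served from memory.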

Range Deletes
DELETE FROM tbl WHERE pk = 0 AND ck >= 2
[Diagram: range tombstone starting at ck = 2]
range continuity + tombstone

BYPASS CACHE
■Read-through weakness:
○Cold reads may evict important cache items
○Workloads with infrequent access patterns don't benefit from any caching
■Frequent heavy evictions are bad
■Cache thrashing

SELECT name, occupation FROM users WHERE userid IN (199, 200, 207) BYPASS CACHE;
SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING BYPASS CACHE;
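
The effect of bypassing the cache can be reproduced with a toy read-through LRU cache. The `ReadThroughLRU` class and its `bypass_cache` flag are hypothetical illustrations of the idea, and this sketch assumes a bypassing read neither consults nor populates the cache.

```python
from collections import OrderedDict


class ReadThroughLRU:
    """Toy read-through cache with least-recently-used eviction."""

    def __init__(self, store, capacity):
        self.store = store        # dict standing in for on-disk data
        self.capacity = capacity
        self.entries = OrderedDict()

    def get(self, key, bypass_cache=False):
        if bypass_cache:
            # Cold read: go straight to the store, leave the cache alone.
            return self.store[key]
        if key in self.entries:
            self.entries.move_to_end(key)  # mark as recently used
            return self.entries[key]
        value = self.store[key]
        self.entries[key] = value
        if len(self.entries) > self.capacity:
            self.entries.popitem(last=False)  # evict the LRU entry
        return value


store = {i: f"row{i}" for i in range(100)}
cache = ReadThroughLRU(store, capacity=2)
cache.get(1)
cache.get(2)  # keys 1 and 2 are the hot items

for key in range(10, 20):
    cache.get(key, bypass_cache=True)  # scan leaves hot items cached
hot_after_bypass = list(cache.entries)

for key in range(10, 20):
    cache.get(key)  # the same scan through the cache evicts them
hot_after_scan = list(cache.entries)
```

After the bypassing scan the hot keys 1 and 2 are still cached; after the ordinary scan they have been evicted in favor of rows that may never be read again.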

ScyllaDB Cache Highlights
■ScyllaDB has a fast cache
■Efficient access & maintenance
○Thanks to collocation with replica and design
■Takes care of consistency guarantees
■Handles complexities of data and query model

Keep in touch!
Felipe Cardeneti Mendes
Technical Director
ScyllaDB

@felipemendes.dev