Caching for Performance Masterclass: Caching Strategies
ScyllaDB
33 slides
Feb 27, 2025
About This Presentation
Exploring the tradeoffs of common caching strategies – and a look at the architectural differences.
- Which strategies exist
- When to apply different strategies
- ScyllaDB cache design
Slide Content
Caching Strategies
Felipe Mendes
CACHE FORGE
Which Strategies Exist?
Top Strategies
Alex Yu @ ByteByteGo – https://blog.bytebytego.com/p/top-caching-strategies
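To make the read-side strategies concrete, here is a minimal sketch contrasting cache-aside (the application manages the cache) with read-through (the cache loads misses itself). The `Database`, `cache_aside_get`, and `ReadThroughCache` names are hypothetical and only illustrate the pattern; they are not taken from the slides or from any product.

```python
from typing import Callable, Dict, Optional


class Database:
    """Hypothetical backing store; counts reads so cache hits are visible."""

    def __init__(self, rows: Dict[str, str]) -> None:
        self.rows = dict(rows)
        self.reads = 0

    def get(self, key: str) -> Optional[str]:
        self.reads += 1
        return self.rows.get(key)


def cache_aside_get(cache: Dict[str, str], db: Database, key: str) -> Optional[str]:
    """Cache-aside: the application checks the cache, then the database."""
    if key in cache:
        return cache[key]
    value = db.get(key)
    if value is not None:
        cache[key] = value  # the application populates the cache itself
    return value


class ReadThroughCache:
    """Read-through: callers only talk to the cache, which loads misses."""

    def __init__(self, loader: Callable[[str], Optional[str]]) -> None:
        self._loader = loader
        self._data: Dict[str, str] = {}

    def get(self, key: str) -> Optional[str]:
        if key not in self._data:
            value = self._loader(key)
            if value is None:
                return None
            self._data[key] = value
        return self._data[key]


db = Database({"user:1": "Ada"})

app_cache: Dict[str, str] = {}
first = cache_aside_get(app_cache, db, "user:1")   # miss: goes to the database
second = cache_aside_get(app_cache, db, "user:1")  # hit: served from the cache
reads_after_cache_aside = db.reads

rt = ReadThroughCache(db.get)
rt_first = rt.get("user:1")   # miss: the cache calls the loader
rt_second = rt.get("user:1")  # hit
reads_after_read_through = db.reads
```

In both cases the second lookup never reaches the database; the difference is who owns the population logic.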
Applying These Strategies
Cache Aside
DAX
External Write/Read Through
DAX
Write Around / Write Back
Embedded Read Through
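The write-side strategies named above differ mainly in when the backing store and the cache are updated. The following is a minimal sketch under simple assumptions (a dictionary as the cache, a hypothetical `Store` class as the database); it is not how any particular product implements them.

```python
from typing import Dict, Set


class Store:
    """Hypothetical backing store; counts writes to show when they happen."""

    def __init__(self) -> None:
        self.rows: Dict[str, str] = {}
        self.writes = 0

    def put(self, key: str, value: str) -> None:
        self.writes += 1
        self.rows[key] = value


def write_through(cache: Dict[str, str], store: Store, key: str, value: str) -> None:
    """Write-through: update the store and the cache on every write."""
    store.put(key, value)
    cache[key] = value


def write_around(cache: Dict[str, str], store: Store, key: str, value: str) -> None:
    """Write-around: write to the store only and drop any stale cached copy."""
    store.put(key, value)
    cache.pop(key, None)


class WriteBackCache:
    """Write-back: acknowledge writes from the cache, persist them later."""

    def __init__(self, store: Store) -> None:
        self._store = store
        self.data: Dict[str, str] = {}
        self._dirty: Set[str] = set()

    def put(self, key: str, value: str) -> None:
        self.data[key] = value
        self._dirty.add(key)

    def flush(self) -> int:
        """Persist all dirty keys; returns how many were written."""
        for key in sorted(self._dirty):
            self._store.put(key, self.data[key])
        flushed = len(self._dirty)
        self._dirty.clear()
        return flushed


store = Store()
cache: Dict[str, str] = {}
write_through(cache, store, "a", "1")
write_around(cache, store, "b", "2")

wb = WriteBackCache(store)
wb.put("c", "3")
wb.put("c", "4")                     # overwrites are absorbed by the cache
writes_before_flush = store.writes   # still only the two earlier writes
flushed = wb.flush()
```

Write-back trades durability for fewer store writes: anything still dirty is lost if the cache fails before `flush()`.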
ScyllaDB Cache Design
Data Flow – Writes
[Diagram: a write lands in the memtable (RAM) and the commitlog (disk); memtables are later flushed to sstables on disk.]
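The write flow on these slides can be sketched as a toy model: append to a commitlog, update the memtable, and flush the memtable into an immutable sstable once it is full. The `ToyWritePath` class, its `flush_threshold`, and the choice to clear the commitlog on flush are illustrative assumptions, not ScyllaDB's actual implementation.

```python
from typing import Dict, List, Tuple


class ToyWritePath:
    """Toy LSM write path: commitlog + memtable, flushed to sstables."""

    def __init__(self, flush_threshold: int = 2) -> None:
        self.commitlog: List[Tuple[str, str]] = []  # "disk": append-only log
        self.memtable: Dict[str, str] = {}          # RAM
        self.sstables: List[Dict[str, str]] = []    # "disk": immutable, newest last
        self.flush_threshold = flush_threshold      # assumed: flush by row count

    def write(self, key: str, value: str) -> None:
        self.commitlog.append((key, value))  # durability first
        self.memtable[key] = value
        if len(self.memtable) >= self.flush_threshold:
            self.flush()

    def flush(self) -> None:
        # Write the memtable out as a sorted, immutable sstable.
        self.sstables.append(dict(sorted(self.memtable.items())))
        self.memtable = {}
        # Flushed data no longer needs replay after a crash (simplification).
        self.commitlog.clear()


path = ToyWritePath(flush_threshold=2)
path.write("k1", "v1")
path.write("k2", "v2")  # reaches the threshold and triggers a flush
path.write("k3", "v3")  # stays in the new memtable
```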
Data Flow – Reads
■Read consistency easy
○Pin sstables and memtable
○Thanks to collocation
■… But SLOW
[Diagram: a read combines the memtable in RAM with multiple sstables on disk.]
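Why an uncached read is slow can be seen in a small sketch: every source that may hold the key has to be consulted, and the newest version wins. The `(timestamp, value)` cell layout and the `read_key` helper are assumptions made for illustration.

```python
from typing import Dict, List, Optional, Tuple

Cell = Tuple[int, str]  # (write timestamp, value)


def read_key(
    key: str,
    memtable: Dict[str, Cell],
    sstables: List[Dict[str, Cell]],
) -> Optional[str]:
    """Check the memtable and every sstable; the highest timestamp wins."""
    candidates = [source[key] for source in [memtable, *sstables] if key in source]
    if not candidates:
        return None
    return max(candidates)[1]


memtable = {"k": (30, "newest")}
sstables = [
    {"k": (10, "old"), "only_on_disk": (5, "x")},
    {"k": (20, "newer")},
]

latest = read_key("k", memtable, sstables)
disk_only = read_key("only_on_disk", memtable, sstables)
missing = read_key("missing", memtable, sstables)
```

Even the lookup for a missing key has to touch every source before it can answer.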
Buffer Cache?
[Diagram: a buffer cache in RAM holding 4K buffers of an sstable on disk.]
Buffer Cache?
Inefficient use of memory:
■Need to cache whole buffers to cache a single row
■Access locality not likely if data set >> RAM
SSTable page (4K)
Row (300B)
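A rough calculation with the slide's numbers (4K page, 300B row) shows the cost. It assumes the worst case the slide describes, where each hot row sits on a different page, and it ignores per-entry overhead; the 1 GiB cache size is an arbitrary example.

```python
PAGE_BYTES = 4 * 1024   # SSTable page (4K)
ROW_BYTES = 300         # Row (300B)
CACHE_BYTES = 1 << 30   # assumed cache size: 1 GiB

# Fraction of each cached page that is the row we actually wanted.
useful_fraction = ROW_BYTES / PAGE_BYTES

# Hot rows that fit when caching whole pages vs. individual rows.
rows_with_page_cache = CACHE_BYTES // PAGE_BYTES
rows_with_row_cache = CACHE_BYTES // ROW_BYTES

print(f"useful bytes per page: {useful_fraction:.1%}")
print(f"hot rows cached (page granularity): {rows_with_page_cache}")
print(f"hot rows cached (row granularity):  {rows_with_row_cache}")
```

With no access locality, only about 7% of the cached bytes are useful, so a row-granular cache of the same size holds over 13 times as many hot rows.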
Buffer Cache?
Poor negative caching:
■Need to cache whole data buffer to indicate absent data
SSTable page (4K)
?
Buffer Cache?
Inefficient use of memory:
■Redundant buffers due to LSM
○Read may touch multiple SSTables
○Memory waste is more pronounced
[Diagram: a single read touching three sstables.]
Buffer Cache?
High CPU overhead for reads:
■Reads need to merge data from multiple sstables
[Diagram: a single read touching three sstables.]
Buffer Cache?
High CPU overhead for reads:
■SSTable format optimized for compact storage, not read speed
■Parsing overhead:
○Need to parse index buffers sequentially
○Need to parse the data file
Buffer Cache?
Premature cache eviction due to SSTable compaction:
■SSTable compaction removes old files => buffer invalidation
■Hurts read performance by incurring misses
[Diagram: several sstables being compacted.]
Naturally, we built our own cache
[Diagram: a read is served by the memtable and the cache in RAM, with the sstables on disk behind them.]
ScyllaDB Cache Structure
■Object cache
○Like memtable
■Optimized for low CPU overhead
○Fast reads
■Row-granularity caching
■Reflects data in all relevant SSTables for a given object (e.g. row)
ScyllaDB Memory Management
■ScyllaDB reserves and manages most of the memory on a node
○Small reserve for the OS
○No use of Linux page cache (only direct I/O)
■Cache uses all available free memory
○Shrunk under pressure from memtable and other allocations
[Diagram: memory divided among memtable, cache, and other allocations.]
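The idea that the cache takes whatever memory is free and gives it back under pressure can be sketched as below. The `ShrinkableCache` class, its byte accounting, and the LRU eviction order are simplifying assumptions for illustration, not a description of ScyllaDB's allocator.

```python
from collections import OrderedDict
from typing import Optional


class ShrinkableCache:
    """LRU cache that only uses the memory the memtable leaves free."""

    def __init__(self, total_bytes: int) -> None:
        self.total_bytes = total_bytes
        self.memtable_bytes = 0
        self.cache_bytes = 0
        self._rows: "OrderedDict[str, bytes]" = OrderedDict()

    def _evict_to_fit(self) -> None:
        # Evict least recently used rows until cache + memtable fit in memory.
        while self._rows and self.cache_bytes + self.memtable_bytes > self.total_bytes:
            _, value = self._rows.popitem(last=False)
            self.cache_bytes -= len(value)

    def put(self, key: str, value: bytes) -> None:
        if key in self._rows:
            self.cache_bytes -= len(self._rows.pop(key))
        self._rows[key] = value
        self.cache_bytes += len(value)
        self._evict_to_fit()

    def get(self, key: str) -> Optional[bytes]:
        if key not in self._rows:
            return None
        self._rows.move_to_end(key)  # mark as most recently used
        return self._rows[key]

    def grow_memtable(self, nbytes: int) -> None:
        """Memtable allocations take priority; the cache shrinks to make room."""
        self.memtable_bytes += nbytes
        self._evict_to_fit()


cache = ShrinkableCache(total_bytes=100)
cache.put("a", b"x" * 40)
cache.put("b", b"y" * 40)
cache.get("a")            # "a" becomes the most recently used row
cache.grow_memtable(50)   # pressure: only 50 bytes remain for the cache
```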
Shard per core
[Diagram: one shard per core, CPU 0 through CPU 3.]
Coherency
■Complex operations on data without dealing with concurrency
■No locking or complex lock-free algorithms
■Data structures and algorithms stay simple
[Diagram: within one shard, reads and other tasks take turns operating on that shard's memtable and cache.]
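A minimal sketch of the shard-per-core idea: each key has exactly one owning shard, and that shard alone touches its memtable and cache, so no locks are needed. The `Shard` and `Node` classes and the CRC32-modulo routing are stand-ins chosen for illustration; ScyllaDB's real key-to-shard mapping is different.

```python
import zlib
from typing import Dict, List, Optional


def shard_for(key: str, shard_count: int) -> int:
    """Assumed routing function: a stable hash of the key, modulo shard count."""
    return zlib.crc32(key.encode("utf-8")) % shard_count


class Shard:
    """State owned by one core; only that core's tasks ever touch it."""

    def __init__(self, shard_id: int) -> None:
        self.shard_id = shard_id
        self.memtable: Dict[str, str] = {}
        self.cache: Dict[str, str] = {}

    def write(self, key: str, value: str) -> None:
        self.memtable[key] = value
        if key in self.cache:
            self.cache[key] = value  # keep the cache coherent, no lock needed

    def read(self, key: str) -> Optional[str]:
        if key in self.cache:
            return self.cache[key]
        value = self.memtable.get(key)
        if value is not None:
            self.cache[key] = value
        return value


class Node:
    def __init__(self, shard_count: int) -> None:
        self.shards: List[Shard] = [Shard(i) for i in range(shard_count)]

    def _owner(self, key: str) -> Shard:
        return self.shards[shard_for(key, len(self.shards))]

    def write(self, key: str, value: str) -> None:
        self._owner(key).write(key, value)

    def read(self, key: str) -> Optional[str]:
        return self._owner(key).read(key)


node = Node(shard_count=4)
node.write("pk0", "v1")
read_before = node.read("pk0")  # populates the owning shard's cache
node.write("pk0", "v2")         # same shard updates memtable and cache
read_after = node.read("pk0")
owners = [s.shard_id for s in node.shards if "pk0" in s.memtable]
```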
Challenge: Query & Manipulation (DQL/DML)
SELECT * FROM tbl WHERE pk = 0 AND ck >= 2
DELETE FROM tbl WHERE pk = 0 AND ck >= 2
What Now?
Range Queries
[Diagram: rows with ck = 2 and ck = 5 are cached; is anything between them missing from the cache?]
SELECT * FROM tbl WHERE pk = 0 AND ck >= 2
range continuity
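One way to picture range continuity: alongside the cached rows, the cache remembers which clustering-key ranges it knows completely. Only a query that falls inside such a range can be answered without going to disk. The `PartitionCache` class below is a simplified sketch under that assumption (closed intervals, no merging of adjacent ranges), not ScyllaDB's data structure.

```python
import math
from typing import Dict, List, Optional, Tuple


class PartitionCache:
    """Cached rows of one partition plus the ranges known to be complete."""

    def __init__(self) -> None:
        self.rows: Dict[int, str] = {}
        self.continuous: List[Tuple[float, float]] = []  # closed [lo, hi] ranges

    def populate_row(self, ck: int, value: str) -> None:
        """A point read caches one row but says nothing about its neighbours."""
        self.rows[ck] = value

    def populate_range(self, lo: float, hi: float, rows_from_disk: Dict[int, str]) -> None:
        """A range read caches its rows and marks the whole range continuous."""
        self.rows.update(rows_from_disk)
        self.continuous.append((lo, hi))

    def read_range(self, lo: float, hi: float) -> Optional[List[Tuple[int, str]]]:
        """Return the rows if the range is fully cached, else None (a miss)."""
        covered = any(c_lo <= lo and hi <= c_hi for c_lo, c_hi in self.continuous)
        if not covered:
            return None
        return sorted((ck, v) for ck, v in self.rows.items() if lo <= ck <= hi)


cache = PartitionCache()
cache.populate_row(2, "b")
cache.populate_row(5, "e")

# ck >= 2: rows 2 and 5 are cached, but rows 3 or 4 might exist on disk.
before = cache.read_range(2, math.inf)

# After reading the range from disk, the cache knows it is complete.
cache.populate_range(2, math.inf, {2: "b", 5: "e"})
after = cache.read_range(2, math.inf)
gap = cache.read_range(3, 4)         # known to be empty, no disk access needed
wider = cache.read_range(0, math.inf)  # ck < 2 is still unknown
```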
Range Deletes
DELETE FROM tbl WHERE pk = 0 AND ck >= 2
[Diagram: from ck = 2 onward, range continuity + tombstone.]
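A range delete can be represented as a tombstone over a clustering-key range rather than a marker per row. The sketch below assumes a `(timestamp, value)` row layout and a hypothetical `RangeTombstone` type; a row is hidden when a covering tombstone is at least as new as the row.

```python
import math
from dataclasses import dataclass
from typing import Dict, List, Tuple


@dataclass(frozen=True)
class RangeTombstone:
    lo: float        # inclusive lower bound on the clustering key
    hi: float        # inclusive upper bound on the clustering key
    timestamp: int   # when the delete was issued


def visible_rows(
    rows: Dict[int, Tuple[int, str]],
    tombstones: List[RangeTombstone],
) -> List[Tuple[int, str]]:
    """Return (ck, value) pairs not shadowed by any range tombstone."""
    result = []
    for ck, (ts, value) in sorted(rows.items()):
        deleted = any(
            t.lo <= ck <= t.hi and t.timestamp >= ts for t in tombstones
        )
        if not deleted:
            result.append((ck, value))
    return result


rows = {1: (10, "a"), 2: (10, "b"), 5: (10, "e")}

# DELETE ... WHERE pk = 0 AND ck >= 2, issued at timestamp 20.
tombstones = [RangeTombstone(lo=2, hi=math.inf, timestamp=20)]

# A row written after the delete is not covered by it.
rows[7] = (30, "g")

remaining = visible_rows(rows, tombstones)
```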
BYPASS CACHE
■Read-through weakness:
○Cold reads may evict important cache items
○Workloads with infrequent access patterns don't benefit from any caching
■Frequent heavy evictions are bad
■Cache thrashing
SELECT name, occupation FROM users WHERE userid IN (199, 200, 207) BYPASS CACHE;
SELECT * FROM users WHERE birth_year = 1981 AND country = 'FR' ALLOW FILTERING BYPASS CACHE;
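The effect of bypassing the cache on a cold scan can be sketched with a small read-through LRU. The `ReadThroughLRU` class and its `bypass_cache` flag are hypothetical; in this simplified model a bypassing read neither consults nor populates the cache.

```python
from collections import OrderedDict
from typing import Callable, List


class ReadThroughLRU:
    """Read-through LRU cache with an optional per-read bypass."""

    def __init__(self, capacity: int, loader: Callable[[str], str]) -> None:
        self.capacity = capacity
        self._loader = loader
        self._rows: "OrderedDict[str, str]" = OrderedDict()

    def get(self, key: str, bypass_cache: bool = False) -> str:
        if bypass_cache:
            return self._loader(key)  # straight to disk, cache untouched
        if key in self._rows:
            self._rows.move_to_end(key)
            return self._rows[key]
        value = self._loader(key)
        self._rows[key] = value
        if len(self._rows) > self.capacity:
            self._rows.popitem(last=False)  # evict the least recently used row
        return value

    def keys(self) -> List[str]:
        return list(self._rows)


def load_from_disk(key: str) -> str:
    return f"row-{key}"


cache = ReadThroughLRU(capacity=2, loader=load_from_disk)
cache.get("a")
cache.get("b")  # "a" and "b" are the hot working set

for key in ("x1", "x2", "x3"):
    cache.get(key, bypass_cache=True)  # cold scan that bypasses the cache
hot_keys_after_bypass = cache.keys()

for key in ("x1", "x2", "x3"):
    cache.get(key)  # same scan through the cache
hot_keys_after_scan = cache.keys()
```

The bypassing scan leaves the hot rows in place, while the same scan through the cache evicts both of them.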
ScyllaDB Cache Highlights
■ScyllaDB has a fast cache
■Efficient access & maintenance
○Thanks to collocation with replica and design
■Takes care of consistency guarantees
■Handles complexities of data and query model
Keep in touch!
Felipe Cardeneti Mendes
Technical Director
ScyllaDB