Why Databases Cache, but Caches Go to Disk

ScyllaDB 468 views 19 slides Oct 15, 2024

About This Presentation

ScyllaDB teamed up with Memcached to compare how caches and databases handle storage and memory across different scenarios. We'll dive into ScyllaDB's row-based cache vs. Memcached's in-memory hash table and IO handling. #Databases #Caching


Slide Content

A ScyllaDB Community
Why Databases Cache, but Caches Go to Disk
Felipe Mendes
Technical Director at ScyllaDB
Alan "Dormando" Kasindorf
Production Memcached at Cache Forge

Felipe Mendes (he/him)

Technical Director at ScyllaDB
■ScyllaDB Committer
■"Database Performance at Scale" co-author
■this=(optimized out)

Alan "dormando" (he/him)
Founder at Cache Forge
■Memcached project maintainer
■I scale things, and fix bugs
■Still finds hardware interesting

Not so long ago in a galaxy far, far away…

Super Idea: Let's compare ScyllaDB and Memcached!
But… How? :-)

Industry Benchmarking is Largely Biased
Fair Benchmarking Considered Difficult

Do it together!

Cache Efficiency – Key:Value
Memcached:
■4-12 bytes Key, 1KB Value (no CAS)
■~101M Cached Items

ScyllaDB:
■12 bytes Key, 1KB Value
■61M Cached Items (WYSINWYG!)
■Be mindful of dummy rows

Differences Explained
Memcached:
■Per-key overhead: 48 / 56 bytes

ScyllaDB:
■Per-key overhead: Higher than Memcached :-)
■Memtables, Bloom Filters, SSTable Summaries, etc.
■Richer data model (range continuity, timestamps, liveness...)
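A quick back-of-envelope check of what the per-key overhead means at the scale of this benchmark (the item count, value size, and overhead figures are the slides' numbers; the script is just illustrative arithmetic):

```python
# Back-of-envelope: metadata overhead for ~101M cached 1KB items,
# assuming the 48-56 bytes of per-key overhead quoted on the slide.
items = 101_000_000
value_bytes = 1024  # 1KB values

for overhead in (48, 56):
    overhead_gb = items * overhead / 1e9
    values_gb = items * value_bytes / 1e9
    pct = 100 * overhead / (overhead + value_bytes)
    print(f"{overhead}B/key -> {overhead_gb:.2f} GB of metadata "
          f"on {values_gb:.1f} GB of values ({pct:.1f}%)")
```

Even at the high end, per-key metadata stays in the single-digit percent range for 1KB values; it is small payloads that make the overhead dominate.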

Read-Only In-Memory Workload
Memcached:
■mcshredder perfrun_metaget_pipe
■Howdy P99.999 :-)

ScyllaDB:
■cassandra-stress profile (boo Java!)
■Small single-key lookups need extra compute :(
■Shines on wide-column (or larger payloads)

Opposite Directions
Memcached says:
my @post_ids = fetch_all_posts($thread_id);
my @post_entries = ();
for my $post_id (@post_ids) {
    push(@post_entries, $memc->get($post_id));
}
# Yay I have all my post entries!
See that? Don't do that: Do Pipeline.
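The pipelined version of the loop above can be sketched as follows. This is a minimal Python illustration with a fake in-memory client standing in for a real memcached client (real clients expose the same idea as a multi-get that batches all keys into one round trip, e.g. pymemcache's get_many); the post IDs and stored entries are made up:

```python
# Sketch: one pipelined multi-get instead of N sequential round trips.
# FakeCache is a stand-in for a real memcached client, used only to
# count round trips.

class FakeCache:
    def __init__(self, data):
        self.data = data
        self.round_trips = 0

    def get(self, key):            # one network round trip per call
        self.round_trips += 1
        return self.data.get(key)

    def get_many(self, keys):      # one round trip for the whole batch
        self.round_trips += 1
        return {k: self.data[k] for k in keys if k in self.data}

cache = FakeCache({f"post:{i}": f"entry {i}" for i in range(100)})
post_ids = [f"post:{i}" for i in range(100)]

# Anti-pattern from the slide: one round trip per key.
entries = [cache.get(pid) for pid in post_ids]   # 100 round trips

# Pipelined version: a single additional round trip.
entries = cache.get_many(post_ids)               # 1 round trip
print(cache.round_trips)  # 101
```

The sequential loop pays one network latency per key; the batched call pays it once, which is why the slide says "Do Pipeline."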
ScyllaDB says:
def process_items(batch):
    session.execute(my_statement, batch)

items_to_process = []
for item in incoming_requests():
    items_to_process.append(item)
process_items(items_to_process)
See that? Don't do that: Do Parallelize.
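What "parallelize" means here can be sketched like this, with a hypothetical FakeSession in place of a real driver session (the ScyllaDB/Cassandra Python driver offers execute_async and an execute_concurrent helper with the same shape):

```python
# Sketch of "parallelize, don't batch": instead of accumulating every
# item into one giant batch statement, fire individual requests with
# bounded concurrency. FakeSession is a stand-in for a driver session.
from concurrent.futures import ThreadPoolExecutor

class FakeSession:
    def __init__(self):
        self.executed = []

    def execute(self, statement, params):
        self.executed.append((statement, params))

session = FakeSession()
items = [{"id": i} for i in range(1_000)]

# Up to 32 in-flight requests instead of one 1000-row batch.
with ThreadPoolExecutor(max_workers=32) as pool:
    futures = [pool.submit(session.execute, "INSERT ...", item)
               for item in items]
    for f in futures:
        f.result()   # surface any per-request error

print(len(session.executed))  # 1000
```

Individual requests spread across shards and nodes; a single huge batch funnels everything through one coordinator, which is the anti-pattern the slide warns against.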

To Shard or Not?
Memcached:
■Pipelining is recommended
■Replication halves your cache size
■Consistent hashing is the client's concern (e.g. via the Memcached Proxy)

ScyllaDB:
■Shard per core complicates "pipelining"
■Replication is the norm
■Built-in consistent hashing
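Since consistent hashing is the client's (or proxy's) job on the memcached side, a minimal hash ring of the kind clients implement looks roughly like this (server names and virtual-node count are illustrative):

```python
# Minimal consistent-hash ring, as memcached clients implement it:
# each server gets many virtual points on a ring; a key maps to the
# first server point at or after its own hash.
import bisect
import hashlib

def _hash(s):
    return int(hashlib.md5(s.encode()).hexdigest(), 16)

class HashRing:
    def __init__(self, nodes, vnodes=100):
        self.ring = sorted((_hash(f"{n}#{v}"), n)
                           for n in nodes for v in range(vnodes))
        self.points = [h for h, _ in self.ring]

    def node_for(self, key):
        i = bisect.bisect(self.points, _hash(key)) % len(self.ring)
        return self.ring[i][1]

ring = HashRing(["mc1:11211", "mc2:11211", "mc3:11211"])
print(ring.node_for("post:42"))  # deterministic; one of the three nodes
```

Virtual nodes keep the key distribution even, and removing a server only remaps the keys that pointed at it, rather than rehashing the whole keyspace.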

Disks – Small (1K) Payload
Memcached:
■Extstore's Achilles' Heel :-(
■Many small items eventually fill up the hash table
■IO Threads require fine-tuning
■Individual GETs perform better

ScyllaDB:
■Back to key-value GETs!
■268K Disk Reads/s
■P99 ~2ms

I/O Access Methods
Memcached:
■Buffered I/O ( pread(), pwrite() )
■ext_threads controls the number of threads available for Extstore
■Simple: a pointer to flash adds 12 bytes of overhead per key

ScyllaDB:
■Asynchronous Direct I/O
■Complex and error-prone to implement
■The fine-grained I/O control databases need (they have to persist, after all!)

iotop with 64 ext_threads
Userspace I/O Scheduler "Self-tuning"
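The buffered-I/O side can be illustrated with a plain pread() at a known offset, which is what lets multiple ext_threads read independently without sharing a seek pointer (the temp file and 1KB "items" are illustrative; this is not Extstore's actual code):

```python
# Sketch of the buffered-I/O path: positional reads through the page
# cache. Each ext_thread can pread() its own offset concurrently,
# because pread() takes the offset explicitly instead of seeking.
import os
import tempfile

fd, path = tempfile.mkstemp()
try:
    os.pwrite(fd, b"A" * 1024 + b"B" * 1024, 0)  # two 1KB "items"
    item0 = os.pread(fd, 1024, 0)      # read item at offset 0
    item1 = os.pread(fd, 1024, 1024)   # read item at offset 1024
    print(item0[:1], item1[:1])  # b'A' b'B'
finally:
    os.close(fd)
    os.remove(path)
```

The trade-off the slide draws: this path is simple but leaves scheduling to the kernel and page cache, while asynchronous direct I/O gives a database the per-request control it needs at the cost of much more complex code.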

Extstore Extends The RAM Cache
[Diagram: an LRU from HEAD to TAIL holding Key1 (1KB RAM value), Key2 (1KB RAM value), and Key3 (8KB RAM value). After eviction, Key3 holds 100 bytes in RAM plus a 7.9KB value on disk: Key3 was evicted from RAM to disk, but its key and pointer stay in RAM.]
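The eviction pictured above can be modeled with a toy class (purely illustrative, not Memcached's implementation): when an item falls off the LRU, its value moves to "disk" while the key and a small pointer remain in the RAM hash table.

```python
# Toy model of Extstore-style eviction: values over the RAM budget are
# pushed to "disk", but the key and a small pointer stay resident, so
# lookups never miss on keys that still exist.
from collections import OrderedDict

class TinyExtstore:
    def __init__(self, ram_budget):
        self.ram_budget = ram_budget
        self.lru = OrderedDict()   # key -> value held in RAM
        self.disk = {}             # key -> value on "flash"
        self.pointers = {}         # key -> disk pointer (stays in RAM)

    def set(self, key, value):
        self.lru[key] = value
        self.lru.move_to_end(key)  # most recently used at the tail
        while sum(len(v) for v in self.lru.values()) > self.ram_budget:
            old_key, old_value = self.lru.popitem(last=False)  # evict LRU
            self.disk[old_key] = old_value
            self.pointers[old_key] = ("page0", len(old_value))

    def get(self, key):
        if key in self.lru:
            return self.lru[key]           # RAM hit
        if key in self.pointers:           # hash table still knows the key
            return self.disk[key]          # costs one disk read
        return None

store = TinyExtstore(ram_budget=2048)
store.set("key1", b"x" * 1024)
store.set("key2", b"y" * 1024)
store.set("key3", b"z" * 1024)             # pushes key1 out of RAM
print("key1" in store.lru, "key1" in store.pointers)  # False True
```

This is why the deck calls Extstore an extension of the RAM cache rather than a separate tier: the index stays fully in memory, only cold values spill.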

Disks – Larger (8K) Payload
Memcached:
■Still requires tuning IO Threads
■IMPRESSIVE (HUGE) 25x savings!
■MGET can easily introduce I/O contention

ScyllaDB:
■Back to key-value GETs!
■Again maximized I/O (~156K reads/s)
■P99 ~2ms

Ok …so who won?

Who told you this was a
competition? ;-)
Read the full report at: https://fee-mendes.gitbook.io/scylladb-mc-compare

Takeaways
■Trust Nobody – Benchmarks are painfully hard to map to YOUR reality
■Databases and Caches take different approaches – Know their tradeoffs
■Perf is not enough. Costs, feature parity, UX, etc. are other dimensions
■At the heart of every optimization, there's a sacrifice. Beware of it.
■BTW… Despite all this, the numbers here are much better than last year's :-)

Felipe Mendes
[email protected]
@felipemendes.dev
scylladb.com
Thank you! Let’s connect.
Alan "dormando"
[email protected]
dormando.me
Memcached