Using ScyllaDB for Real-Time Read-Heavy Workloads.pdf

ScyllaDB 470 views 32 slides Aug 22, 2024

About This Presentation

Keeping latencies predictably low for instant user experiences

ScyllaDB’s “sweet spot” is workloads over 50K operations per second that require predictably low (e.g., single-digit millisecond) latency. And its unique architecture makes it particularly valuable for the real-time read-heavy wor...


Slide Content

Using ScyllaDB for
Real-Time Read-Heavy
Workloads
Felipe Cardeneti Mendes, Technical Director, ScyllaDB
Tim Koopmans, Product Experience, ScyllaDB

Poll
Where are you in your NoSQL adoption?

The Database for Gamechangers
+For data-intensive applications that require high throughput and predictable low latencies
+Close-to-the-metal design takes full advantage of modern infrastructure
+>5x higher throughput
+>20x lower latency
+>75% TCO savings
+Compatible with Apache Cassandra and Amazon DynamoDB
+DBaaS/Cloud, Enterprise and Open Source solutions

“ScyllaDB stands apart...It’s the rare product that exceeds my expectations.”
– Martin Heller, InfoWorld contributing editor and reviewer
“For 99.9% of applications, ScyllaDB delivers all the power a customer will ever need, on workloads that other databases can’t touch – and at a fraction of the cost of an in-memory solution.”
– Adrian Bridgewater, Forbes senior contributor

400+ Gamechangers Leverage ScyllaDB
+Seamless experiences across content + devices
+Digital experiences at massive scale
+Corporate fleet management
+Real-time analytics
+2,000,000 SKU e-commerce management
+Video recommendation management
+Threat intelligence service using JanusGraph
+Real time fraud detection across 6M transactions/day
+Uber scale, mission critical chat & messaging app
+Network security threat detection
+Power ~50M X1 DVRs with billions of reqs/day
+Precision healthcare via Edison AI
+Inventory hub for retail operations
+Property listings and updates
+Unified ML feature store across the business
+Cryptocurrency exchange app
+Geography-based recommendations
+Global operations: Avon, Body Shop + more
+Predictable performance for on sale surges
+GPS-based exercise tracking
+Serving dynamic live streams at scale
+Powering India's top social media platform
+Personalized advertising to players
+Distribution of game assets in Unreal Engine

Presenters

Felipe Cardeneti Mendes, Technical Director
+Puppy Lover
+Open Source Enthusiast
+ScyllaDB passionate!

Tim Koopmans, Product Experience
+Rust devy
+Marathon Swimmer
+Love all things P99

Agenda
+Characterizing Read-heavy workloads
+Challenges and Tradeoffs
+ScyllaDB Under Load
+Best Practices
+Success Stories

High throughput REAL-TIME data processing.
Characterizing
Read-Heavy Workloads

Real-time Read-Heavy?
+Commonly referred to as "read-mostly"
+Workloads requiring a high volume of reads under very low response times
+Challenges involve:
  +Scaling reads – Caches can become prohibitively expensive
  +Competing workloads – No coordination
  +Expensive queries – Aggregations, expensive filtering clauses
  +Performance over time – Dataset growth, changing access patterns (hotspots)

Commonly Seen Use Cases
+Social
  +Feeds
  +Activity Timelines
  +Chat
+Recommendation / Personalization
  +Past Interactions
  +Collaborative & Content-based Filtering
  +User Profile (preferences, demographics…)
+Session / Profile Management
  +Authentication / Authorization
  +Security
  +User Experience
+Product Catalogs
  +Product Browsing / Views
  +Reviews
  +Inventory & Pricing Tracking
+Betting
  +Live Odds and Updates
  +Leaderboards
  +Real-time notifications
+Metadata Store / CDNs
  +Static assets
  +Caching layer
  +Content versioning

What happens during a read?
Challenges and
Tradeoffs

ScyllaDB Read Path
[Diagram: RAM holds the memtable and cache; Disk holds three sstables. Arrows labeled Read, Flush, and Merge.]
Hot versus Cold Reads
+Cache items have unlimited fetch ceiling
+Be mindful of your read:
  +Do I often retrieve a single key:value or scan a wide partition?
  +Is the data frequently accessed?
  +Will the read cause eviction of important items I need?
[Figure panels: "Read from cache" and "Gone to disk"]

Cache Thrashing

Constant populations and evictions without fully taking advantage of caching
■Commonly seen in heavy full-scans / Analytics
■BYPASS CACHE
Image Credits: Yuri Kushch – Caching Strategies
Inside ScyllaDB’s Internal Cache
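
To make the thrashing effect concrete, here is a small simulation of an LRU cache serving a hot working set while a full scan runs. It is only a toy model: the `LRUCache` class, its `bypass_cache` flag, and the key names are assumptions made for this sketch, and the flag merely imitates the idea behind the `BYPASS CACHE` clause rather than reproducing ScyllaDB's cache.

```python
from collections import OrderedDict


class LRUCache:
    def __init__(self, capacity):
        self.capacity = capacity
        self.rows = OrderedDict()  # least recently used entry comes first

    def get(self, key, bypass_cache=False):
        """Return where the read was served from: 'cache' or 'disk'."""
        if bypass_cache:
            return "disk"                   # cache is neither read nor populated
        if key in self.rows:
            self.rows.move_to_end(key)      # mark as most recently used
            return "cache"
        self.rows[key] = True               # populate on miss
        if len(self.rows) > self.capacity:
            self.rows.popitem(last=False)   # evict the least recently used row
        return "disk"


def hot_hit_ratio(bypass_scan):
    cache = LRUCache(capacity=100)
    hot_keys = [f"hot-{i}" for i in range(100)]
    for key in hot_keys:                    # warm the cache with the hot set
        cache.get(key)
    for i in range(10_000):                 # analytical full scan
        cache.get(f"scan-{i}", bypass_cache=bypass_scan)
    served = [cache.get(key) for key in hot_keys]
    return served.count("cache") / len(served)


ratio_with_thrashing = hot_hit_ratio(bypass_scan=False)
ratio_with_bypass = hot_hit_ratio(bypass_scan=True)
```

In this model the scan without bypass evicts the entire hot set, while the bypassing scan leaves it resident.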

Paging (internal and external)
[Diagram: Client, Coordinator Node, and replicas R1, R2, R3. A Quorum read, with responses labeled Data, Data + Digest, and Digest.]

Tombstones
+Deletes are actually writes of a "tombstone marker"
+Too many deletes slow down the read path
+When you read:
  +Scans need to iterate through your deletes
  +Many deletes result in higher latencies
[Figure callout: "6 seconds!"]
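
The cost of scanning through deletes can be sketched with a toy model in which a partition is an ordered list of rows and a deleted row is represented by `None`. The `scan_live_rows` function and this data layout are hypothetical and only illustrate why a scan has to step over tombstones before it can return live data.

```python
def scan_live_rows(partition, limit):
    """Return (live_rows, entries_examined) for a scan over one partition.

    partition is a list of (clustering_key, value) pairs in clustering order;
    a value of None stands for a tombstone.
    """
    live = []
    examined = 0
    for key, value in partition:
        examined += 1
        if value is None:
            continue                # tombstone: work done, nothing returned
        live.append((key, value))
        if len(live) == limit:
            break
    return live, examined


# 1,000 deleted rows followed by a single live row.
heavy_deletes = [(i, None) for i in range(1000)] + [(1000, "live")]
no_deletes = [(0, "live")]

rows_a, examined_a = scan_live_rows(heavy_deletes, limit=1)
rows_b, examined_b = scan_live_rows(no_deletes, limit=1)
```

Both scans return one row, but the first one examines 1,001 entries to find it while the second examines a single entry.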

ScyllaDB Under Load
Live Optimizing (or Worsening) Read Performance

Avoiding common mishaps
Best Practices

ScyllaDB Cache
■Cache is LRU on rows
●Use BYPASS CACHE for analytical workloads
■Efficient access & maintenance
●Thanks to replica collocation and design
[Diagram: CPU 0, CPU 1, CPU 2, CPU 3]

SSTable index caching
■The whole index can now be cached in memory
■Populated on access (read-through)
■Evicted on memory pressure
■Partition index summary is still non-evictable and always resident
[Diagram: RAM, Disk]

Workload Prioritization
Different workloads require different priorities
■Meet SLAs
■Flexible Configuration
■Adaptability to Changing Conditions

Heat-Weighted Load Balancing
+Replica goes down and comes back up
+Caches are cold
+Never sending requests to the node means caches never warm up
+Mathematically optimize the desired hit ratio so that caches warm up, while still keeping latencies down

Restarted node: cache misses are initially high but deterministically go down.

Prepare your Queries
Ad-hoc, rare queries are the only excuse not to prepare statements.
[Diagram: Client, replicas R1, R2, R3, Quorum]
Both coordinator and replica, one less round-trip!

Keep parallelism HIGH
■Low parallelism hurts ScyllaDB
●Fewer units will be working, database will not be efficient
■Is there such a thing as too high? Nope!
■No need to guess:
●C = T x L (concurrency = throughput x latency)
●Example: 200,000 requests/s at 1ms average latency:
■C = 200,000 * 0.001
■C = 200
■Driver settings:
●Number of connections x maximum requests per connection
●Remaining requests will be queued on the application side
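
The arithmetic above can be sketched as a small helper. The function names and the value of 64 maximum requests per connection are assumptions made for illustration; they are not driver defaults or recommendations.

```python
import math


def required_concurrency(throughput_per_s, latency_s):
    """C = T x L: in-flight requests needed to sustain the target throughput."""
    return throughput_per_s * latency_s


def connections_needed(concurrency, max_requests_per_connection):
    """Smallest connection count whose combined capacity covers the concurrency."""
    return math.ceil(concurrency / max_requests_per_connection)


# Example from the slide: 200,000 requests/s at 1 ms average latency.
concurrency = required_concurrency(200_000, 0.001)

# Hypothetical driver limit of 64 in-flight requests per connection.
connections = connections_needed(concurrency, max_requests_per_connection=64)
```

With these inputs the target concurrency is 200, which four connections of 64 requests each can cover. Anything beyond the configured capacity would wait in the application-side queue.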

ScyllaDB and Memcached
p99.999 < 1ms :-)

Success Stories
How ScyllaDB is being used among your peers!

The different GCP disk types each meet these requirements in
different ways. It would be all too convenient if we could combine
both disk types into one super-disk. Since our primary focus for disk
performance was low-latency reads, we would love to read from
GCP's Local SSDs (low latency) while still writing to Persistent Disks
(snapshotting, redundancy via replication). But is there a way to
create such a super-disk at the software level?

How Discord Supercharges Network Disks for Extreme Low Latency

This workload is quite performance sensitive, so getting quick
responses from our database is key. This approach saves us plenty of
headaches, and it performs really well. We’ve had this system
deployed within Epic for over a year, and are working with licensees
to get this deployed for them as well. It’s been serving us well to
allow people to work much more efficiently. [Even as] assets
continue to grow even larger, people can still work from home

Epic Games & Unreal Engine: Where ScyllaDB Comes Into Play

Just to illustrate that idea, let’s say we have a customer in London.
We will place a copy of our services (“a cell”) into that region. And all
of that customer’s interactions will be contained in that region,
ensuring that they always have low latency. We’ll place multiple
replicas of their data in that region. And will also place additional
replicas of their data in other regions. This becomes important later.

Worldwide Local Latency With ScyllaDB: The ZeroFlucs Strategy

Poll
How much data do you have under
management of your transactional
database?

Guilherme Nogueira
Keep Learning
Visit our blog for more on ScyllaDB engineering: scylladb.com/category/engineering
Register now at p99conf.io

Thank you
for joining us today.
@scylladb
slack.scylladb.com
company/scylladb/
scylladb/