How We Boosted ScyllaDB Data Streaming by 25x by Asias He
ScyllaDB
Mar 05, 2025
About This Presentation
Streaming, the process of moving data to other nodes when scaling out or in, used to analyze every partition one by one; it was too slow and depended on the schema. File based streaming is a new feature that significantly optimizes tablet movement. It streams entire SSTable files without deserializing them into mutation fragments and re-serializing them back into SSTables on the receiving nodes. As a result, less data is streamed over the network and less CPU is consumed, especially for data models that contain small cells.
Slide Content
A ScyllaDB Community
How We Boosted ScyllaDB
Data Streaming by 25x
Asias He
Principal Software Engineer
Asias He
■ Asias He is a long-time open source developer who
previously worked on the Debian Project, the Solaris kernel,
KVM virtualization for Linux, and the OSv unikernel. He
now works on Seastar and ScyllaDB.
Agenda
■ What is streaming
■ Mutation based streaming
■ File based streaming
■ Performance improvement
What is streaming in Scylla
Streaming is a low-level mechanism to move data
between nodes for multiple operations
● Add node
● Remove node
● Migrate tablets
● Rebuild tablets
● …
What is mutation based streaming
[Diagram] Sender Node: SSTables → Reader → Mutations → Serialize → Network → Deserialize → Mutations → Writer → SSTables (Receiver Node)
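The mutation based path in the diagram can be sketched as a toy pipeline. This is only an illustration under loud assumptions: the "SSTables" here are plain Python lists, JSON stands in for the wire format, and every function name (`read_mutations`, `serialize`, `deserialize`, `stream_mutation_based`) is hypothetical rather than a ScyllaDB API. What it shows is that every partition is decoded, serialized, sent, deserialized, and written again.

```python
import json

# Toy stand-in for an SSTable: a list of (partition_key, cells) rows.
# Real SSTables are a binary on-disk format; this is illustration only.

def read_mutations(sstable):
    """Reader: yield one mutation per partition."""
    for key, cells in sstable:
        yield {"key": key, "cells": cells}

def serialize(mutation):
    """Sender side: encode a mutation for the network (JSON is an assumption)."""
    return json.dumps(mutation, sort_keys=True).encode("utf-8")

def deserialize(frame):
    """Receiver side: decode a frame back into a mutation."""
    return json.loads(frame.decode("utf-8"))

def stream_mutation_based(sender_sstables):
    """Process every partition one by one; return (receiver_sstable, bytes_on_wire)."""
    received = []
    bytes_on_wire = 0
    for sstable in sender_sstables:
        for mutation in read_mutations(sstable):
            frame = serialize(mutation)      # CPU spent per mutation on the sender
            bytes_on_wire += len(frame)
            decoded = deserialize(frame)     # CPU spent per mutation on the receiver
            received.append((decoded["key"], decoded["cells"]))
    # Writer: the receiver builds a new, sorted SSTable from the mutations.
    received.sort(key=lambda row: row[0])
    return received, bytes_on_wire

sender = [
    [("a", {"v": 1}), ("c", {"v": 3})],
    [("b", {"v": 2})],
]
receiver_sstable, wire_bytes = stream_mutation_based(sender)
print(receiver_sstable)
print(wire_bytes)
```

The per-mutation work in the inner loop is the cost that grows with the number of partitions and cells.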
What is file based streaming (New!)
[Diagram] Sender Node: SSTables → Reader → File Stream → Network → File Stream → Writer → SSTables (Receiver Node)
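By contrast, the file based path can be sketched as an opaque byte copy. This is a minimal illustration, not ScyllaDB's implementation: in-memory `io.BytesIO` objects stand in for the SSTable file and the network, and `stream_file_based` and `CHUNK_SIZE` are hypothetical names. The point is that the bytes are never parsed into mutations on either side.

```python
import io

CHUNK_SIZE = 4  # hypothetical value, kept tiny so the loop runs several times

def stream_file_based(src, dst, chunk_size=CHUNK_SIZE):
    """Copy src to dst in fixed-size chunks without interpreting the bytes.

    Returns the number of bytes sent.
    """
    sent = 0
    while True:
        chunk = src.read(chunk_size)
        if not chunk:          # empty read means end of file
            break
        dst.write(chunk)       # receiver writes the bytes as-is
        sent += len(chunk)
    return sent

# The "SSTable file" is treated as an opaque blob.
sstable_file = io.BytesIO(b"opaque sstable bytes")
received_file = io.BytesIO()
sent = stream_file_based(sstable_file, received_file)
print(sent, received_file.getvalue())
```

Because the copy does not depend on the file's contents, the work scales with file size rather than with the number of partitions or cells.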
Benefits of file based streaming
■ Less CPU consumption
   ■ No more processing of individual mutations
   ■ No more serialization work for mutations
■ Less network consumption
   ■ SSTable format is more compact than mutations
Performance
Tablet migration test with mutation and file based streaming

                           Time to finish   Stream bandwidth   Bytes on wire per tablet   CPU load
Mutation based streaming   3003 seconds     100 MB/s           20090 MB                   12%
File based streaming       116 seconds      1000 MB/s          7280 MB                    4%
Difference                 25X              10X                2.75X                      3X

● 3 Scylla nodes i4i.2xlarge
● 3 Loaders t3.2xlarge
● 1 Billion partitions
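The "Difference" row can be rechecked from the measured values with a few divisions. The sketch below uses only the numbers from the table; the dictionary keys and the `ratio` helper are names made up for this example.

```python
# Values copied from the performance table.
mutation_based = {"seconds": 3003, "bandwidth_mb_s": 100, "wire_mb": 20090, "cpu_pct": 12}
file_based = {"seconds": 116, "bandwidth_mb_s": 1000, "wire_mb": 7280, "cpu_pct": 4}

def ratio(metric, higher_is_better=False):
    """Improvement factor of file based over mutation based streaming."""
    old, new = mutation_based[metric], file_based[metric]
    return new / old if higher_is_better else old / new

time_x = round(ratio("seconds"), 1)                                  # time to finish
bandwidth_x = round(ratio("bandwidth_mb_s", higher_is_better=True), 1)
wire_x = round(ratio("wire_mb"), 2)                                  # bytes on wire
cpu_x = round(ratio("cpu_pct"), 1)

print(time_x, bandwidth_x, wire_x, cpu_x)  # 25.9 10.0 2.76 3.0
```

The computed factors are about 25.9, 10, 2.76, and 3, which the slide reports as 25X, 10X, 2.75X, and 3X.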