Cassandra to ScyllaDB: Technical Comparison and the Path to Success
ScyllaDB
30 slides
Jun 24, 2024
About This Presentation
What can you expect when migrating from Cassandra to ScyllaDB? This session provides a jumpstart based on what we’ve learned from working with your peers across hundreds of use cases. Discover how ScyllaDB’s architecture, capabilities, and performance compare to Cassandra’s. Then, hear about your Cassandra to ScyllaDB migration options and practical strategies for success, including our top do’s and don’ts.
Slide Content
Cassandra to ScyllaDB: A technical comparison and path to success
Lewis Carr, Senior Director, Product Marketing
Paul Preuveneers, Solution Architecture, Customer Experience
Presentation Agenda
- Introduction
- Comparing Cassandra and ScyllaDB
- Migration Options and Tools
- Case Studies and Best Practices
- Conclusion and Extended Q&A
Cassandra & ScyllaDB: Same starting line
- Wide column NoSQL key-key-value DB
- Token ring node structured clusters
- Tunable consistency and virtual nodes
- Cassandra Query Language (CQL)
- Memtable (in-memory) & SSTables (disk)
- Repair and compaction
- …
So why are users switching to us?
High throughput AND low latency delivered predictably, without the complexity or high cost usually needed for hyperscale applications
- Comcast: 962 Cassandra nodes down to only 78 ScyllaDB nodes; 60% savings, latencies reduced by 95%
- Discord: 250M users. Saved time, improved consistency, and reduced downtime compared to C*.
- Expedia: Moved from C* to ScyllaDB Cloud to avoid Java GC, burst traffic, high infrastructure costs, and infrequent release schedules
- Fanatics: Went from 55 nodes to 6 and dramatically reduced their AWS EC2 bill by moving to ScyllaDB
Consistent, Low Latencies
- European Ecommerce Platform
- 14,000,000 monthly users
- Maintenance operations in Cassandra → latency spikes
- P99 comparison: 7X lower response times with ScyllaDB
- “Usable” P99 latencies
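The P99 figures on this slide refer to the 99th-percentile response time: the latency that 99% of requests stay at or below. As a minimal sketch of why a few slow requests dominate P99 while barely moving the median, the snippet below uses the nearest-rank method on made-up sample data. The `percentile` helper and the latency values are assumptions for illustration, not measurements from the presentation.

```python
import math

def percentile(samples, pct):
    """Nearest-rank percentile: the smallest sample with at least pct% of
    all samples at or below it. pct is an integer between 1 and 100."""
    if not samples:
        raise ValueError("need at least one sample")
    ordered = sorted(samples)
    rank = math.ceil(pct * len(ordered) / 100)  # 1-based rank
    return ordered[max(rank, 1) - 1]

# Hypothetical latencies in milliseconds: 98 fast requests, 2 slow ones
# (for example, requests that landed during a maintenance operation).
latencies_ms = [2.0] * 98 + [40.0, 45.0]

p50 = percentile(latencies_ms, 50)  # the median ignores the slow tail
p99 = percentile(latencies_ms, 99)  # P99 is set by the slow tail
print(f"P50 = {p50} ms, P99 = {p99} ms")
```

With only 2% of requests slowed down, the median is unchanged while P99 jumps to the slow requests' latency, which is why latency spikes during maintenance show up in P99 comparisons.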
ScyllaDB vs. C*: Latency vs Throughput
- C* cannot maintain low latencies except at very low throughput (≤30-40k ops)
- ScyllaDB can maintain low latencies for far greater throughputs (≤170-180k ops)
Comparing Cassandra (C*) and ScyllaDB
Design Tenets for Hyperscale Applications
| Design Decision Point | Cassandra (C*) Implementation | ScyllaDB Implementation | Impact of difference in implementation |
| Distributed operations and redundancy | Shared-nothing, leaderless node, token ring topology | Shard-aware tokens mapped to shards, in turn, mapped to CPU cores | Improved granularity of resource allocation and use |
| Language implementation | Java | C++ | No garbage collection, better real-time response |
| Vertical vs. horizontal scaling | Default scale out | Scale up then out | Larger, more performant nodes for higher throughput and lower latency; extracts more from the infrastructure investment |
| Cloud compute memory use | Integrated RAM and disk ops with compaction & repair | Unified cache | Provided you can allocate sufficient cache, you can eliminate the need for frontend caches like Redis and often avoid front-ends like Kafka |
| Scale elasticity | Serverless | Faster operations; Tablets (coming soon) will further accelerate new node spinup | Speed of scaling up/down and in/out without the current cost of serverless |
Close-to-the-metal architecture
ScyllaDB Shard-Per-Core with Seastar delivers an order of magnitude better performance
- Shared-nothing asynchronous operations with pinned resources
- Scale up before scaling out, take full advantage of the largest VM instances
- Reduce node sprawl, operational complexity, and intranode latency
- Completely use what you pay for
[Diagram: 21-node C* cluster vs. 5-node ScyllaDB cluster; partitions 1-5, with each ScyllaDB node holding three of them (2, 4, 5 | 1, 2, 5 | 1, 3, 4 | 1, 3, 4 | 2, 3, 5)]
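The shard-per-core idea (tokens mapped to shards, which are in turn mapped to CPU cores) can be illustrated with a simplified sketch. This is not ScyllaDB's actual sharding algorithm: the unsigned 64-bit token space, the SHA-256 stand-in hash, the equal-sized token ranges per node, and the names `token_for`, `node_for`, and `shard_for` are all assumptions made for the example.

```python
import hashlib

TOKEN_SPACE = 2 ** 64  # assumed unsigned token range for this sketch

def token_for(partition_key: str) -> int:
    """Hash a partition key to a token (stand-in hash, chosen for portability)."""
    digest = hashlib.sha256(partition_key.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big")

def node_for(token: int, num_nodes: int) -> int:
    """Assume the ring is split into equal contiguous ranges, one per node."""
    return token * num_nodes // TOKEN_SPACE

def shard_for(token: int, num_nodes: int, shards_per_node: int) -> int:
    """Split each node's range again, one sub-range per shard (CPU core)."""
    slot = token * num_nodes * shards_per_node // TOKEN_SPACE
    return slot % shards_per_node

# Hypothetical cluster: 5 nodes with 8 cores (shards) each.
for key in ("user:1", "user:2", "cart:42"):
    t = token_for(key)
    print(key, "-> node", node_for(t, 5), "shard", shard_for(t, 5, 8))
```

Because the mapping is a pure function of the token, a request for a given partition always lands on the same core, which is what lets each shard own its data and resources without cross-core locking.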
Seastar performance boost alone is 3X
Performance Comparison, Up to 5X faster
ScyllaDB 2024.1 vs 2023.1 vs OSS 5.4 (2022) Max Throughput, Higher is better
C* tuning and optimization can be a challenge
- Tuning JVM for heap and GC
- Optimal vNode count setting
- Combating node sprawl
- Compaction and repair
- Identifying hot partitions early and often
- Provisioning to account for lower price performance
ScyllaDB reduces operational complexity, cost, and risk
- No Java! So no JVM or GC
- Automated memory, storage and IO tuning
- Workload prioritization
- Automated hot partition avoidance
- Unified memory cache (Memtable)
- Automated repair and compaction (ICS)
- Automated scale with speed and granularity
- Reduced node sprawl reduces admin and infrastructure costs
Migration Strategies
- Offline / Cold Migration
- Online / Hot Migration
- (Data Migration Using Kafka)
Tools and Techniques
How to migrate the data. From the migration guide we have:
- CQL COPY
- ScyllaDB’s SSTableloader
- Mirror Loader
- Spark Migrator
Scylla Apache Spark Migration tool: https://github.com/scylladb/scylla-migrator
Scylla Migration Guide: https://enterprise.docs.scylladb.com/stable/operating-scylla/procedures/cassandra-to-scylla-migration-process.html
Potential Technical Challenges
Failure handling
- What should I do if SSTableloader fails?
- What should I do if an Apache Cassandra node fails?
- What should I do if a ScyllaDB node fails?
- How to roll back and start from scratch?
Best Practices
- Cassandra and ScyllaDB Running in Parallel
- Live (Hot) Setup
- Use ScyllaDB Tools
Best Practices
How to perform Live Migration
1. Create the same schema from Apache Cassandra in ScyllaDB
2. Configure your application to perform dual writes (read only from Cassandra)
3. Snapshot the to-be-migrated data from Cassandra
4. Load the SSTable files to ScyllaDB (using the ScyllaDB sstableloader tool)
5. Verification: dual writes and reads, ScyllaDB serves reads; log mismatches until a minimal data mismatch threshold is reached
6. Apache Cassandra End Of Life: ScyllaDB only for reads and writes
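The dual-write and verification phases of a live migration can be sketched in application code. The example below is a minimal illustration under stated assumptions, not the presenters' implementation: `InMemoryStore` is a hypothetical stand-in for a real driver session, `DualWriteClient` and `MISMATCH_THRESHOLD` are invented names, and the snapshot plus sstableloader step is simulated with a plain dictionary copy.

```python
class InMemoryStore:
    """Hypothetical stand-in for a database session; not a real driver API."""
    def __init__(self):
        self.rows = {}

    def write(self, key, value):
        self.rows[key] = value

    def read(self, key):
        return self.rows.get(key)


class DualWriteClient:
    """Writes to both clusters; reads from Cassandra until verification starts."""
    def __init__(self, cassandra, scylla):
        self.cassandra = cassandra
        self.scylla = scylla
        self.verify = False      # set to True for the verification phase
        self.reads = 0           # verified reads so far
        self.mismatches = []     # keys whose values differed

    def write(self, key, value):
        self.cassandra.write(key, value)
        self.scylla.write(key, value)

    def read(self, key):
        old = self.cassandra.read(key)
        if not self.verify:
            return old           # dual-write phase: read only from Cassandra
        new = self.scylla.read(key)
        self.reads += 1
        if new != old:
            self.mismatches.append(key)  # log the mismatch
        return new               # verification phase: ScyllaDB serves reads

    def mismatch_rate(self):
        return len(self.mismatches) / self.reads if self.reads else 0.0


MISMATCH_THRESHOLD = 0.01        # assumed cutover threshold, for illustration

cassandra, scylla = InMemoryStore(), InMemoryStore()
cassandra.write("cart:1", "old-item")        # data that predates the migration
client = DualWriteClient(cassandra, scylla)
client.write("cart:2", "new-item")           # dual write reaches both clusters

# Simulated bulk load: copy rows ScyllaDB lacks, never overwriting newer writes.
for key, value in cassandra.rows.items():
    scylla.rows.setdefault(key, value)

client.verify = True
results = [client.read(k) for k in ("cart:1", "cart:2")]
rate = client.mismatch_rate()
ready_for_cutover = rate <= MISMATCH_THRESHOLD
print(results, rate, ready_for_cutover)
```

Starting dual writes before taking the snapshot means no write is lost between the snapshot and the cutover; the verification reads then confirm that the loaded data and the live writes agree before Cassandra is retired.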
Real World Examples
Online Sports Apparel Powerhouse
- 2015 move to Cassandra
- JVM Garbage Collection issues
- CPU spiking and timeouts
- Huge costs and huge maintenance overhead
- Remedy was ScyllaDB
Out of a total cluster size of 55 Cassandra nodes, Fanatics was able to replace 43 Cassandra nodes with 6 ScyllaDB nodes, dramatically reducing its EC2 bill.
“During the peak minute we saw close to 280,000 IOPS… and we had zero timeouts.”
“Just moving one use case [cart mutations] to ScyllaDB we got a huge benefit out of it.”
Niraj Konathi, Director of Platform Engineering
Real World Examples
C* Challenge: “Volatile Latencies”
- Inconsistent performance
- Instability
- Maintenance overhead
24 nodes of C* = 6 nodes of ScyllaDB
- Publish items 5x faster
- 2.5x lower infrastructure costs
- 4x node reduction
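The node-reduction figures quoted in the case studies follow from simple arithmetic. The short sketch below recomputes them from the node counts given in the slides; the `node_reduction` helper is an assumed name for illustration, and only the node counts come from the presentation.

```python
def node_reduction(before, after):
    """Return (reduction factor, percent fewer nodes) for a migration."""
    factor = before / after
    percent_fewer = (1 - after / before) * 100
    return factor, percent_fewer

# Node counts as quoted in the slides: (Cassandra nodes, ScyllaDB nodes).
migrations = {
    "Comcast": (962, 78),
    "Fanatics": (43, 6),
    "24-node C* cluster": (24, 6),
}

for name, (before, after) in migrations.items():
    factor, percent_fewer = node_reduction(before, after)
    print(f"{name}: {before} -> {after} nodes, "
          f"{factor:.1f}x reduction, {percent_fewer:.0f}% fewer nodes")
```

The 24-to-6 case works out to exactly the 4x node reduction stated on the slide.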
Takeaways
- ScyllaDB delivers predictable high performance and low latency
- ScyllaDB and Cassandra share a large driver and connectors ecosystem
- ScyllaDB reduces operational complexity, cost and risk
- The path for migration is straightforward, low risk, and we're here to help!
- Customers met their performance challenges with ScyllaDB