Database Migration Strategies and Pitfalls by Patrick Bossman
ScyllaDB
Mar 10, 2025
About This Presentation
Jump start your migration to ScyllaDB! In this talk, we will discuss how to migrate from Cassandra, DynamoDB, and other sources. We will review the tooling available to assist with ScyllaDB migrations, the approaches and considerations for online and offline migrations, and how to plan for a faster migration if necessary.
Slide Content
A ScyllaDB Community
Database Migration
Strategies and Pitfalls
Patrick Bossman
Senior Customer Engineer
Patrick Bossman (he/him)
■30+ years of experience helping enterprise customers design and deploy highly performant, reliable, available solutions
■Passion for building new things, solving problems
■2 x girl dad
■Overview of types of migrations
■How to migrate from
  ■Cassandra (CQL)
  ■DynamoDB
  ■Others
■Tuning migrations
Presentation Agenda
Migration Overview
We can do it! Migrations don’t have to be feared.
■We’re successfully migrating customers all the time
■Frequently with no/minimal application changes
■Rich tooling to assist with the process
■Experienced professionals to lend a hand
Demystifying migration
Migration options
■Online - using Scylla DC-to-DC migration (way to go!)
  ■Set replication factor (== dual writes)
  ■Start rebuild (== historical data load)
  ■Repair (== validation)
■Online / Live Migration (way to go)
  ■Dual writes
  ■Historical data forklifting
  ■Validation
■Offline / Downtime Migration (but why?)
  ■Historical data forklifting
  ■Validation
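The three DC-to-DC steps above map onto standard CQL and nodetool commands. A minimal sketch that only builds the command strings, assuming hypothetical keyspace and datacenter names (`my_ks`, `dc_old`, `dc_new`) and a replication factor of 3 in both datacenters. It is an illustration of the sequence, not a runbook; check the exact procedure against the ScyllaDB documentation before running anything.

```python
def dc_to_dc_plan(keyspace: str, source_dc: str, target_dc: str, rf: int = 3) -> list[str]:
    """Return the commands for an online DC-to-DC migration, in order.

    All names passed in are hypothetical placeholders for illustration.
    """
    alter = (
        f"ALTER KEYSPACE {keyspace} WITH replication = "
        f"{{'class': 'NetworkTopologyStrategy', '{source_dc}': {rf}, '{target_dc}': {rf}}};"
    )
    return [
        alter,                               # new writes now replicate to both DCs
        f"nodetool rebuild -- {source_dc}",  # on each new-DC node: stream historical data
        f"nodetool repair {keyspace}",       # validate / reconcile replicas
    ]


plan = dc_to_dc_plan("my_ks", "dc_old", "dc_new")
for step in plan:
    print(step)
```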
Cassandra (CQL) Migration
■Show example of online migration
  ■Can preserve source writetime and TTL
  ■Bonus - backfill will not “overwrite” new writes
■Discuss tooling
  ■Zero Downtime Migration (ZDM) - application-transparent dual writes
  ■Scylla Migrator - Spark-based, performant backfill and validation
Cassandra migration
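The point that a backfill will not overwrite new writes follows from CQL's last-write-wins rule: every cell carries a write timestamp, and a write with an older timestamp loses to a newer one regardless of arrival order. A minimal in-memory sketch of that rule, assuming a hypothetical `LwwTable` class that stands in for the target table (it is not a driver API, and it ignores timestamp ties):

```python
from dataclasses import dataclass


@dataclass
class Cell:
    value: str
    writetime: int  # microseconds since epoch, as CQL WRITETIME() reports


class LwwTable:
    """Hypothetical stand-in for a CQL table with last-write-wins cells."""

    def __init__(self):
        self.cells: dict[str, Cell] = {}

    def write(self, key: str, value: str, writetime: int) -> None:
        current = self.cells.get(key)
        # Keep the incoming write only if it is newer than what is stored.
        if current is None or writetime > current.writetime:
            self.cells[key] = Cell(value, writetime)


target = LwwTable()
target.write("user:1", "new-address", writetime=2_000)  # live dual write arrives first
target.write("user:1", "old-address", writetime=1_000)  # backfill with preserved source writetime
target.write("user:2", "only-in-source", writetime=500) # backfill of a row with no newer write
```

Because the backfill carries the original writetime, the stale `old-address` value is discarded while the untouched `user:2` row is loaded normally.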
[Timeline diagram, built up over five slides]
■Start: application writes to and reads from Cassandra
■Migrate schema, then enable dual writes (application or ZDM): write to Cassandra and ScyllaDB, read from Cassandra
■Forklift existing data with Scylla Migrator until the DBs are in sync
■Validation: dual reads from Cassandra and ScyllaDB
■Fade off Cassandra: reads and writes continue on ScyllaDB
Live Migration
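The live-migration phases can be sketched end to end with plain dictionaries standing in for the two databases. Everything here (`DualWriter`, `forklift`, `validate`) is a hypothetical illustration of the flow, assuming application-level dual writes; it is not the ZDM proxy or Scylla Migrator code.

```python
class DualWriter:
    """Writes go to both stores; reads come from whichever store is primary."""

    def __init__(self, source: dict, target: dict):
        self.source, self.target = source, target
        self.read_from_target = False  # flipped at cutover

    def write(self, key, value):
        self.source[key] = value
        self.target[key] = value

    def read(self, key):
        store = self.target if self.read_from_target else self.source
        return store.get(key)


def forklift(snapshot: dict, target: dict) -> None:
    """Backfill historical rows without clobbering newer dual-written rows."""
    for key, value in snapshot.items():
        target.setdefault(key, value)


def validate(source: dict, target: dict) -> list:
    """Return the keys whose values differ between source and target."""
    return [k for k in source if target.get(k) != source[k]]


source, target = {"a": 1, "b": 2}, {}
app = DualWriter(source, target)   # phase 1: enable dual writes
snapshot = dict(source)            # historical data as of this moment
app.write("b", 20)                 # live traffic continues during the backfill
app.write("c", 3)
forklift(snapshot, target)         # phase 2: forklift existing data
mismatches = validate(source, target)  # phase 3: validation
app.read_from_target = True        # phase 4: cut reads over, then fade off the source
```

In a real CQL migration the "do not clobber" behavior comes from preserved write timestamps rather than `setdefault`; the dictionary version only mimics the outcome.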
Dual Writes (Application or ZDM)
Alternative to implementing dual writes within application.
■Application connects to zdm-proxy
■Proxy dual writes to source and target
■Optionally dual read to target (validate target performance)
zdm-proxy - Zero Downtime Migration
■Highly resilient to failures
■Access compatible databases using a native connector
■Scalable, high performance parallelized reads and writes
■Support transformations
■Open Source!
ScyllaDB Migrator
■Slower, but may be OK for smaller datasets:
■DSBulk
  ■Bulk unload/load tool
  ■Can also pipe unload output straight into load
■cqlsh COPY TO / COPY FROM
  ■Copies tables to and from CSV files
Other options
DynamoDB Migration
■ScyllaDB Alternator API is compatible with DynamoDB
■Frequently, no application changes required
■Show example of online migration using Scylla Migrator
■Alternatives
DynamoDB migration
[Timeline diagram, built up over three slides]
■Start: application writes to and reads from DynamoDB
■Migrate schema and enable DynamoDB Streams
■Forklift existing data, then apply updates from DynamoDB Streams until the DBs are in sync
■Validation: dual reads from DynamoDB and ScyllaDB
■Fade off DynamoDB: write to and read from ScyllaDB
Live Migration
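Applying stream updates on top of a forklifted snapshot can be sketched as a small replay loop. The record shape below (`eventName`, `dynamodb.Keys`, `dynamodb.NewImage`) follows the DynamoDB Streams record format; the in-memory `target` dictionary, the single string partition key named `id`, and the sample records are assumptions made for illustration. This is not the Scylla Migrator implementation.

```python
def apply_stream_record(table: dict, record: dict) -> None:
    """Apply one DynamoDB Streams record to an in-memory stand-in for the target."""
    # Assumption: the table has a single string partition key called "id".
    key = record["dynamodb"]["Keys"]["id"]["S"]
    if record["eventName"] == "REMOVE":
        table.pop(key, None)
    else:
        # INSERT and MODIFY carry the full new item when the stream view
        # type includes new images.
        table[key] = record["dynamodb"]["NewImage"]


# State of the target after forklifting the existing data.
target = {"1": {"id": {"S": "1"}, "qty": {"N": "5"}}}

# Changes captured by the stream while the forklift was running.
records = [
    {"eventName": "MODIFY",
     "dynamodb": {"Keys": {"id": {"S": "1"}},
                  "NewImage": {"id": {"S": "1"}, "qty": {"N": "7"}}}},
    {"eventName": "INSERT",
     "dynamodb": {"Keys": {"id": {"S": "2"}},
                  "NewImage": {"id": {"S": "2"}, "qty": {"N": "1"}}}},
    {"eventName": "REMOVE",
     "dynamodb": {"Keys": {"id": {"S": "2"}}}},
]

for record in records:  # records must be applied in stream order
    apply_stream_record(target, record)
```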
■Highly resilient to failures
■Access compatible databases using a native connector
■High performance parallelized reads and writes
■Support transformations
■Open Source!
ScyllaDB Migrator
■Existing application fed from a queue (Kafka)
  ■Load a backup, or run Scylla Migrator against the source
  ■Replay the queue from the matching offset into the target database
■Scylla Migrator now supports loading DynamoDB S3 backups (Parquet) into ScyllaDB
Alternatives
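The queue-based alternative relies on replaying from an offset at or before the point the backup was taken. A minimal sketch, assuming a Python list stands in for the Kafka topic and a dictionary for the target database; `replay` is a hypothetical helper, not a Kafka client call.

```python
def replay(log: list, start_offset: int, target: dict) -> int:
    """Apply log entries from start_offset onward; return the next offset to consume."""
    for offset in range(start_offset, len(log)):
        key, value = log[offset]
        target[key] = value  # upsert, so re-applying an entry is harmless
    return len(log)


# The queue that feeds the existing application.
log = [("a", 1), ("b", 2), ("a", 3), ("c", 4)]

# A backup taken after the first two entries were applied (offset 2).
backup_offset = 2
target = {"a": 1, "b": 2}

next_offset = replay(log, backup_offset, target)

# Starting slightly earlier than the backup offset gives the same result,
# because the writes are idempotent upserts applied in order.
target_early = {"a": 1, "b": 2}
replay(log, backup_offset - 1, target_early)
```

When the exact backup offset is uncertain, erring on the early side is the safer choice in this model, since missing an entry loses data while repeating one does not.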
Migration Tuning
■Spark and ScyllaDB are scalable!
  ■Scylla Migrator breaks work into segments
  ■Allocate larger/more workers to process faster
  ■Consider over-provisioning Spark/ScyllaDB during the migration phase
■Configuration (during migration)
  ■DynamoDB - temporarily disable TTL during backfill
  ■Compactions: increase min_threshold, set compaction_static_shares: 100
Migration tuning considerations
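The idea of breaking work into segments and adding workers can be illustrated by splitting the Murmur3 token range into contiguous slices and handing them to a worker pool. This is a sketch of the general technique, not the Scylla Migrator's actual partitioning logic; `split_token_range` and `copy_segment` are hypothetical names, and the copy step is a placeholder that only reports the width of its slice.

```python
from concurrent.futures import ThreadPoolExecutor

MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1  # Murmur3 partitioner token range


def split_token_range(segments: int) -> list[tuple[int, int]]:
    """Split the full token range into contiguous, inclusive (start, end) slices."""
    total = MAX_TOKEN - MIN_TOKEN + 1
    bounds = [MIN_TOKEN + total * i // segments for i in range(segments + 1)]
    return [(bounds[i], bounds[i + 1] - 1) for i in range(segments)]


def copy_segment(segment: tuple[int, int]) -> int:
    """Placeholder for: read rows with token in [start, end] and write them to the target."""
    start, end = segment
    return end - start + 1  # number of tokens covered by this segment


segments = split_token_range(8)
# More workers means more segments are copied concurrently.
with ThreadPoolExecutor(max_workers=4) as pool:
    covered = list(pool.map(copy_segment, segments))
```

Smaller segments also make retries cheaper, since a failed slice can be re-run without repeating the whole table.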
■Test migration throughput for a few minutes, then terminate
■Observe impact on source
■Verify throughput meets migration duration requirements
■Practice validation
■Truncate target, and plan/execute real migration
Validate migration throughput
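Checking that measured throughput meets the migration window is simple arithmetic. A small sketch with made-up numbers (2 billion rows, 50,000 rows per second observed in the short test, an 8-hour window); the function names and figures are illustrative assumptions only.

```python
def estimated_hours(total_rows: int, rows_per_second: float) -> float:
    """Projected duration of the full backfill at the measured rate."""
    return total_rows / rows_per_second / 3600


def required_rate(total_rows: int, window_hours: float) -> float:
    """Rows per second needed to finish inside the migration window."""
    return total_rows / (window_hours * 3600)


total_rows = 2_000_000_000      # hypothetical dataset size
measured_rate = 50_000          # rows/s observed during the short test run
window_hours = 8                # hypothetical migration window

hours = estimated_hours(total_rows, measured_rate)
needed = required_rate(total_rows, window_hours)
meets_window = hours <= window_hours

print(f"projected: {hours:.1f} h, need {needed:,.0f} rows/s, fits window: {meets_window}")
```

With these numbers the projection is about 11 hours, so the test would signal that more workers or a larger cluster are needed before the real run.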
Other Migrations
■SQL, MongoDB, …
■Data modeling and application changes likely required
■ETL-style migration required, with application considerations
■Follow migration dual write and backfill principles discussed earlier
Other migrations
Wrap it up
Summary
■ScyllaDB scales well horizontally and vertically
■Many migrations successful with no application changes!
■Rich tooling available to perform migration
■You can do it!
■Don’t be afraid to reach out and ask questions!
Stay in Touch
Patrick Bossman [email protected]
linkedin.com/in/bossman