The Future of Repair: Transparent and Incremental by Botond Dénes
ScyllaDB
7 slides
Mar 05, 2025
About This Presentation
Regularly run repairs are essential to keeping clusters healthy, yet maintaining a good repair schedule is more challenging than it should be. Repairs often take a long time, which prevents running them frequently. This hurts data consistency and also limits the usefulness of the new repair-based tombstone garbage collection. We want to address these challenges by making repairs incremental and by allowing automatic repair scheduling, without relying on external tools.
Slide Content
A ScyllaDB Community
The Future of Repair:
Transparent and Incremental
Botond Dénes
Software Team Lead
■Consistent data is important
■Consistent tombstones are even more important
■tombstone_gc = {'mode':'timeout'} - at least one repair is required per GC_GRACE_SECONDS
■tombstone_gc = {'mode':'repair'} - only tombstones written before the last repair can be garbage-collected
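To make the difference between the two tombstone_gc modes concrete, here is a minimal Python sketch of the purge decision under a deliberately simplified model. The names `TombstoneGcMode`, `Tombstone` and `is_purgeable` are hypothetical, not ScyllaDB APIs. The model assumes write and repair times in whole seconds and ignores other conditions a real compaction would have to check before dropping a tombstone.

```python
from dataclasses import dataclass
from enum import Enum
from typing import Optional


class TombstoneGcMode(Enum):
    # Hypothetical mirror of the two tombstone_gc modes on the slide.
    TIMEOUT = "timeout"
    REPAIR = "repair"


@dataclass
class Tombstone:
    written_at: int  # write time of the deletion, in seconds (simplified model)


def is_purgeable(
    tombstone: Tombstone,
    mode: TombstoneGcMode,
    now: int,
    gc_grace_seconds: int,
    last_repair_at: Optional[int],
) -> bool:
    """Simplified, hypothetical model of when a tombstone may be garbage-collected."""
    if mode is TombstoneGcMode.TIMEOUT:
        # Purely time-based: only safe if a repair actually ran within
        # gc_grace_seconds, which this check cannot verify.
        return now - tombstone.written_at > gc_grace_seconds
    # Repair mode: only tombstones written before the last repair are known
    # to have been synchronized across replicas.
    return last_repair_at is not None and tombstone.written_at < last_repair_at
```

In timeout mode the decision depends only on the clock, so a missed repair risks resurrecting deleted data; in repair mode the same tombstone simply waits for the next repair, which is why being able to repair often matters.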
Incremental Repair
Not every repair needs to repair all the data!
■Repair only data written since last repair
■The more frequently repairs run, the less work each one has to do
■Repairing often is no longer a problem
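The idea of repairing only recently written data can be illustrated with a toy model. This Python sketch is hypothetical: each replica is a list of write timestamps (used as unique write identifiers), and `IncrementalRepairer` remembers when it last ran and compares only writes in the window since then. A real implementation would more plausibly track which on-disk data has already been repaired rather than filter by write time, since a late-arriving write can carry an old timestamp.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Replica:
    # Hypothetical simplified replica: a list of write timestamps.
    writes: List[int] = field(default_factory=list)


@dataclass
class IncrementalRepairer:
    last_repair_at: int = 0

    def repair(self, replicas: List[Replica], now: int) -> int:
        """Sync only writes in [last_repair_at, now); return how many distinct writes were examined."""
        since = self.last_repair_at
        # Only the unrepaired window of each replica is compared.
        unrepaired = [{w for w in r.writes if since <= w < now} for r in replicas]
        union = set().union(*unrepaired)
        for replica, pending in zip(replicas, unrepaired):
            # Stream the writes this replica is missing from the window.
            replica.writes.extend(sorted(union - pending))
        self.last_repair_at = now
        return len(union)
```

For example, after a first repair at time 10 over writes `[1, 2, 3]` and `[1, 2]`, a second repair at time 20 only examines the writes made between 10 and 20, so its cost tracks the write rate between repairs rather than the total data size.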
Move scheduling into ScyllaDB core!
■Operational simplicity - no need for external tools
■External tools can only observe past/current state
■Internal scheduling is also aware of plans
Automatic repair
■Tablet only
■Tablet repair is simpler - only 1 peer shard on each peer node
■Tablet repair happens via the tablet scheduler
■User requests tablet(s) to be repaired via /storage_service/tablets/repair
■Request saved in system.tablets
■Repair of individual tablets is scheduled
■Excludes tablets with ongoing migrations
■Considers load on individual nodes and shards
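The scheduling rules listed above can be sketched as a greedy pass over the tablets. Everything here is a hypothetical illustration rather than ScyllaDB's tablet scheduler: `Tablet`, `repair_requested` (standing in for the request saved in system.tablets), `schedule_repairs`, and the per-shard and per-node limits are all assumptions.

```python
from dataclasses import dataclass
from typing import Dict, List, Tuple

# Hypothetical model: a replica is a (node, shard) pair.
Replica = Tuple[str, int]


@dataclass
class Tablet:
    tablet_id: int
    replicas: List[Replica]         # one shard on each replica node
    repair_requested: bool = False  # stands in for the request in system.tablets
    migrating: bool = False


def schedule_repairs(
    tablets: List[Tablet],
    shard_load: Dict[Replica, int],
    node_load: Dict[str, int],
    max_per_shard: int = 1,
    max_per_node: int = 2,
) -> List[int]:
    """Greedily pick requested tablets whose replicas are not overloaded (hypothetical)."""
    scheduled: List[int] = []
    for tablet in tablets:
        if not tablet.repair_requested or tablet.migrating:
            continue  # not requested, or excluded due to an ongoing migration
        overloaded = any(
            shard_load.get(r, 0) >= max_per_shard
            or node_load.get(r[0], 0) >= max_per_node
            for r in tablet.replicas
        )
        if overloaded:
            continue  # leave it for a later scheduling round
        for node, shard in tablet.replicas:
            shard_load[(node, shard)] = shard_load.get((node, shard), 0) + 1
            node_load[node] = node_load.get(node, 0) + 1
        scheduled.append(tablet.tablet_id)
    return scheduled
```

First-fit keeps the sketch short. Because each tablet touches exactly one shard per replica node, checking load is a small, local decision, which is part of why tablet repair lends itself to central scheduling.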