The Future of Repair: Transparent and Incremental by Botond Dénes

ScyllaDB, 7 slides, Mar 05, 2025

About This Presentation

Regularly run repairs are essential to keep clusters healthy, yet maintaining a good repair schedule is more challenging than it should be. Repairs often take a long time, which prevents running them often. This affects data consistency and also limits the usefulness of the new repair-based tombston...


Slide Content

A ScyllaDB Community
The Future of Repair:
Transparent and Incremental
Botond Denes
Software Team Lead

■Consistent data is important
■Consistent tombstones are even more important
■tombstone_gc = {'mode':'timeout'} - requires at least one repair per GC_GRACE_SECONDS
■tombstone_gc = {'mode':'repair'} - only tombstones written before the last repair can be garbage-collected

Repair is important
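
The two tombstone_gc modes above can be sketched as a small eligibility check. This is a simplified model, not ScyllaDB's implementation: the `Tombstone` class, the `can_gc` function, and its parameters are hypothetical names chosen for illustration, and real garbage collection takes more factors into account than this.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Tombstone:
    write_time: float  # when the tombstone was written, in seconds


def can_gc(tombstone: Tombstone, now: float, mode: str,
           gc_grace_seconds: float = 864000,
           last_repair_time: Optional[float] = None) -> bool:
    """Return True if the tombstone may be garbage-collected (illustrative model)."""
    if mode == "timeout":
        # Purely time-based: safe only if a repair ran within the grace period.
        return now >= tombstone.write_time + gc_grace_seconds
    if mode == "repair":
        # Repair-based: only tombstones written before the last repair qualify.
        return last_repair_time is not None and tombstone.write_time < last_repair_time
    raise ValueError(f"unknown tombstone_gc mode: {mode}")
```

In "timeout" mode the operator is responsible for repairing in time; in "repair" mode a tombstone simply waits until a repair has covered it, so infrequent repairs delay garbage collection rather than risking data resurrection.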

■Slow
  ■Large dataset
  ■Mixed-shard cluster
■Need external tools to schedule
  ■nodetool - manual
  ■scylla-manager - needs provisioning

Repair is challenging

Incremental Repair
Not every repair needs to repair all the data!
■Repair only data written since last repair
■The more frequent repairs are – the less work they have to do
■Repairing often is no longer a problem
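
A toy model of why frequent incremental repairs are cheap: each repair only covers writes made since the previous one. The function name `incremental_work` and the representation of writes as plain timestamps are assumptions made for this sketch; it does not reflect how ScyllaDB tracks repaired data internally.

```python
def incremental_work(write_times, repair_times):
    """For each repair, count the writes made since the previous repair."""
    work = []
    last_repair = float("-inf")
    for t in sorted(repair_times):
        # Only data written after the last repair, up to this one, is repaired.
        work.append(sum(1 for w in write_times if last_repair < w <= t))
        last_repair = t
    return work


# One write per time unit, for 100 time units.
writes = list(range(1, 101))
print(incremental_work(writes, [50, 100]))               # two large repairs
print(incremental_work(writes, list(range(10, 101, 10))))  # ten small repairs
```

With a steady write rate, the total work is the same in both schedules, but each individual repair shrinks in proportion to how often repair runs.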

Move scheduling into ScyllaDB core!
■Operational simplicity - no need for external tools
■External tools can only observe past/current state
■Internal scheduling is also aware of plans

Automatic repair

■Tablet only
  ■Tablet repair is simpler - 1 peer shard on each peer node
■Tablet repair happens via the tablet scheduler
  ■User requests tablet(s) to be repaired via /storage_service/tablets/repair
  ■Request saved in system.tablets
  ■Repair of individual tablets is scheduled
    ■Excludes tablets with ongoing migrations
    ■Considers load on individual nodes and shards

The way to incremental and automatic repair
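
The scheduling rules listed on this slide (skip tablets with ongoing migrations, consider load on individual nodes and shards) can be illustrated with a minimal sketch. Everything here is hypothetical: the `Tablet` and `TabletReplica` classes, the `schedule_repairs` function, and the `max_per_shard` limit are stand-ins for illustration, not the actual tablet scheduler.

```python
from dataclasses import dataclass


@dataclass
class TabletReplica:
    node: str
    shard: int


@dataclass
class Tablet:
    tablet_id: int
    replicas: list  # list of TabletReplica, one shard per node
    migrating: bool = False


def schedule_repairs(requested, max_per_shard=1):
    """Pick tablets to repair in this round (illustrative policy only)."""
    load = {}  # (node, shard) -> repairs already scheduled in this round
    scheduled = []
    for tablet in requested:
        if tablet.migrating:
            continue  # exclude tablets with ongoing migrations
        slots = [(r.node, r.shard) for r in tablet.replicas]
        if any(load.get(s, 0) >= max_per_shard for s in slots):
            continue  # a participating shard is already busy; retry later
        for s in slots:
            load[s] = load.get(s, 0) + 1
        scheduled.append(tablet.tablet_id)
    return scheduled
```

Because each tablet lives on exactly one shard per node, per-shard load accounting stays simple, which is part of what makes tablet repair easier to schedule internally.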

Stay in Touch
Botond Denes
[email protected]
denesb