Spotify: Automating Cassandra repairs

planetcassandra | Oct 02, 2015

About This Presentation

Anti-entropy repairs are known to be a peculiar maintenance operation of Cassandra clusters. They are problematic mostly because of their potential negative impact on the cluster's performance. Another problematic aspect is the difficulty of managing the repairs of a Cassandra cluster...


Slide Content

Automating Cassandra Repairs

Radovan Zvoncek
[email protected]

github.com/spotify/cassandra-reaper
#CassandraSummit

About zvo
Likes pancakes

Does this for the 3rd time

Works at Spotify

Working at Spotify
Is autonomous

Squads are responsible for their full stack

Including Cassandra

Cassandra
[ring diagram: a node's data]
[ring diagram: replication of that data to other nodes]
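
To make the replication picture concrete, here is a minimal sketch of how a token ring maps a partition to its replicas. The three-node ring, the tokens, and RF = 3 are made-up example values, not anything from the talk:

    import java.util.ArrayList;
    import java.util.List;
    import java.util.TreeMap;

    // Sketch: each node owns the token range ending at its token, and with
    // replication factor 3 the next two nodes on the ring hold copies too.
    // Node names and tokens are invented for the example.
    public class RingSketch {
        public static void main(String[] args) {
            TreeMap<Long, String> ring = new TreeMap<>();
            ring.put(-4611686018427387904L, "node1");
            ring.put(0L, "node2");
            ring.put(4611686018427387904L, "node3");

            long keyToken = 42L;  // token of some partition key
            Long owner = ring.ceilingKey(keyToken);
            if (owner == null) owner = ring.firstKey();  // wrap around the ring

            List<String> replicas = new ArrayList<>();
            Long t = owner;
            for (int i = 0; i < 3; i++) {  // walk clockwise collecting RF = 3 replicas
                replicas.add(ring.get(t));
                t = ring.higherKey(t);
                if (t == null) t = ring.firstKey();
            }
            System.out.println("token " + keyToken + " -> " + replicas);
        }
    }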

Running Cassandra
Requires many things

One of them is keeping data consistent
Eventually

Otherwise it can get lost or reappear

Eventual consistency
Three mechanisms bring replicas back in sync:

Read Repairs
[ring diagram: a read (R) and write (W) across replicas]

Hinted Handoff
[ring diagram]

Anti-entropy Repair
[ring diagram]

Anti-entropy Repair
Coordinated process
Four steps:
1: Hash
2: Compare
3: Stream
4: Merge
Can go wild...
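
As a rough illustration of the hash-and-compare steps: each replica digests its data for a token range, and the coordinator diffs the digests, streaming only on a mismatch. This is a simplified sketch; real Cassandra builds Merkle trees over the range rather than a single digest, and the row data here is invented:

    import java.nio.charset.StandardCharsets;
    import java.security.MessageDigest;
    import java.util.*;

    // Simplified sketch of repair's hash-and-compare phase: hash the rows a
    // replica holds for a range, then compare digests across two replicas.
    public class RepairSketch {
        static byte[] digest(SortedMap<String, String> rows) throws Exception {
            MessageDigest md = MessageDigest.getInstance("MD5");
            for (Map.Entry<String, String> e : rows.entrySet()) {
                md.update(e.getKey().getBytes(StandardCharsets.UTF_8));
                md.update(e.getValue().getBytes(StandardCharsets.UTF_8));
            }
            return md.digest();
        }

        public static void main(String[] args) throws Exception {
            SortedMap<String, String> replicaA = new TreeMap<>(Map.of("k1", "v1", "k2", "v2"));
            SortedMap<String, String> replicaB = new TreeMap<>(Map.of("k1", "v1", "k2", "stale"));

            // Step 1: each replica hashes its data for the range.
            // Step 2: the coordinator compares the digests.
            boolean inSync = MessageDigest.isEqual(digest(replicaA), digest(replicaB));

            // Steps 3 and 4 (stream and merge) only happen on a mismatch.
            System.out.println(inSync ? "range in sync" : "mismatch: stream + merge needed");
        }
    }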

Repair gone wild
Eats a lot of disk IO
● because of hashing all the data
Saturates the network
● because of streaming a lot of data around
Fills up the disk
● because of receiving all replicas, possibly from all other data centers
Causes a ton of compactions
● because of having to merge the received data

… one better be careful

Careful repair

nodetool repair
● repairs all three intervals (every range the node replicates)

Partitioner range
● nodetool repair -pr
● repairs this interval only (the node's primary range)

Start & end tokens
● nodetool repair -pr -st <start_token> -et <end_token>
● repairs a part of the interval only


Careful repair
Repairing a part of an interval requires splitting the ring into smaller intervals

Smaller intervals mean less data

Less data means fewer repairs gone wild
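
A minimal sketch of the splitting idea, assuming the Murmur3 partitioner's token range of [-2^63, 2^63 - 1]: divide the full ring into equally sized segments, each repairable with -st/-et. Reaper's real segment generation is more involved (it accounts for the cluster's actual node tokens); this only shows the core idea:

    import java.math.BigInteger;
    import java.util.ArrayList;
    import java.util.List;

    // Sketch: split the full Murmur3 token ring into equally sized
    // repair segments, each small enough to repair safely.
    public class SegmentSplitter {
        static final BigInteger MIN = BigInteger.valueOf(Long.MIN_VALUE);
        static final BigInteger MAX = BigInteger.valueOf(Long.MAX_VALUE);

        static List<BigInteger[]> split(int segmentCount) {
            BigInteger span = MAX.subtract(MIN).divide(BigInteger.valueOf(segmentCount));
            List<BigInteger[]> segments = new ArrayList<>();
            BigInteger start = MIN;
            for (int i = 0; i < segmentCount; i++) {
                // The last segment absorbs any rounding remainder.
                BigInteger end = (i == segmentCount - 1) ? MAX : start.add(span);
                segments.add(new BigInteger[] { start, end });
                start = end;
            }
            return segments;
        }

        public static void main(String[] args) {
            for (BigInteger[] seg : split(8)) {
                System.out.println("nodetool repair -pr -st " + seg[0] + " -et " + seg[1]);
            }
        }
    }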

Careful repair

Smaller intervals also mean more intervals

More intervals mean more actual repairs

Repairs need to be babysat :(


The Spotify way
Feature teams are meant to build features
Not to waste time operating their C* clusters

Cron-ing nodetool repair is no good
● mostly because there is no feedback loop

This all led to the creation of the Reaper

The Reaper
REST(ish) service

Does a lot of JMX

Orchestrates repairs for you

The reaping
You:
curl http://reaper/cluster --data '{"seedHost": "my.cassandra.host.net"}'

The Reaper:
●Figures out cluster info (e.g. name, partitioner)

The reaping
You:
curl http://reaper/repair_run --data '{"clusterName": "myCluster"}'

The Reaper:
●Prepares repair intervals

The reaping
You:
curl -X PUT http://reaper/repair_run/42 -d state=RUNNING

The Reaper:
●Starts triggering repairs of repair intervals
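
Conceptually, the triggering loop looks something like the sketch below: repair one interval at a time, retry failures, and pause between intervals. All names, the retry count, and the back-off are illustrative, not Reaper's actual internals:

    import java.util.List;
    import java.util.concurrent.TimeUnit;

    // Sketch of the triggering loop behind a repair run: one segment at a
    // time, with retries (resilience) and a pause between segments
    // (carefulness).
    public class RepairRunLoop {
        interface SegmentRepairer {
            boolean repair(long startToken, long endToken) throws Exception;
        }

        static void run(List<long[]> segments, SegmentRepairer repairer) throws Exception {
            for (long[] seg : segments) {
                for (int attempt = 1; attempt <= 3; attempt++) {  // retry when things break
                    try {
                        if (repairer.repair(seg[0], seg[1])) break;
                    } catch (Exception e) {
                        System.err.println("segment failed (attempt " + attempt + "): " + e);
                    }
                }
                TimeUnit.SECONDS.sleep(1);  // back off between segments
            }
        }

        public static void main(String[] args) throws Exception {
            // Fake repairer that always succeeds; a real one would talk JMX
            // to the replicas covering the segment.
            run(List.of(new long[] {0L, 100L}, new long[] {100L, 200L}),
                (st, et) -> { System.out.println("repaired " + st + ".." + et); return true; });
        }
    }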

Reaper’s features
Carefulness - doesn’t kill a node
● checks for node load (see the JMX sketch below)
● backs off after repairing an interval
Resilience - retries when things break
● because things break all the time
Parallelism - no idle nodes
● multiple small intervals repaired in parallel
Scheduling - set things up only once
● regular full-ring repairs
Persistence - state saved somewhere
● a bit of extra resilience
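
The carefulness check can be pictured as a JMX call like the following sketch: read Cassandra's pending-compactions gauge and back off if the node looks busy. The host, port, and threshold are made-up example values, and this is not Reaper's actual code:

    import javax.management.MBeanServerConnection;
    import javax.management.ObjectName;
    import javax.management.remote.JMXConnector;
    import javax.management.remote.JMXConnectorFactory;
    import javax.management.remote.JMXServiceURL;

    // Sketch of a "carefulness" check over JMX: inspect Cassandra's
    // pending-compactions metric before triggering the next segment.
    public class NodeLoadCheck {
        public static void main(String[] args) throws Exception {
            JMXServiceURL url = new JMXServiceURL(
                "service:jmx:rmi:///jndi/rmi://my.cassandra.host.net:7199/jmxrmi");
            try (JMXConnector connector = JMXConnectorFactory.connect(url)) {
                MBeanServerConnection mbs = connector.getMBeanServerConnection();
                ObjectName pending = new ObjectName(
                    "org.apache.cassandra.metrics:type=Compaction,name=PendingTasks");
                Number tasks = (Number) mbs.getAttribute(pending, "Value");
                if (tasks.intValue() > 20) {  // threshold is an arbitrary example
                    System.out.println("node busy (" + tasks + " pending compactions), backing off");
                } else {
                    System.out.println("node idle enough, triggering next segment");
                }
            }
        }
    }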

What we reaped
First repair done 2015-01-28
1,700 repairs since then, recently 90 per week
176,000 (16%) segments failed at least once
60 repair failures

Reaper’s Future
CASSANDRA-10070 (automatic repair scheduling built into Cassandra)

Whatever is needed until then

Greatest benefit
Cassandra Reaper automates a very tedious maintenance operation of Cassandra clusters in a smart, efficient and careful manner, while requiring minimal Cassandra expertise

github.com/spotify/cassandra-reaper

#CassandraSummit

Thank you!


github.com/spotify/cassandra-reaper

#CassandraSummit