About This Presentation
Anti-entropy repairs are known to be a very peculiar maintenance operation of Cassandra clusters. They are problematic mostly because they can have a negative impact on the cluster's performance, and because managing the repairs carefully enough to prevent that impact is difficult.
Based on the long-term pain we have been experiencing with managing repairs of nearly 100 Cassandra clusters, and being unable to find a solution that would meet our needs, we went ahead and developed an open-source tool, named Cassandra Reaper [1], for easy management of Cassandra repairs.
Cassandra Reaper is a tool that automates the management of anti-entropy repairs of Cassandra clusters in a rather smart, efficient and careful manner while requiring minimal Cassandra expertise.
I will first cover some basics of Cassandra's eventual consistency mechanisms, and then focus on the features of Cassandra Reaper and our six months of experience with the tool managing the repairs of our production clusters.
Anti-entropy Repair
Coordinated process
Four steps:
1: Hash
2: Compare
3: Stream
4: Merge
Can go wild...
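In real Cassandra the hash step builds Merkle trees on each replica and the coordinator compares them; the following is only a minimal Python sketch of steps 1 and 2 using a flat per-range digest, with toy data and names that are purely illustrative.

import hashlib

def range_digest(rows):
    """Step 1 (Hash): hash all rows owned by one token range on one replica."""
    d = hashlib.md5()
    for key in sorted(rows):
        d.update(f"{key}:{rows[key]}".encode())
    return d.hexdigest()

def mismatching_ranges(local, remote):
    """Step 2 (Compare): compare the per-range digests of two replicas.
    Ranges whose digests differ are what Cassandra would then
    stream between the replicas (step 3) and merge on disk (step 4)."""
    return [rng for rng in local
            if range_digest(local[rng]) != range_digest(remote.get(rng, {}))]

# Toy replicas: token range -> {partition key: row value}
local_replica  = {"(0,100]": {"a": 1, "b": 2}, "(100,200]": {"c": 3}}
remote_replica = {"(0,100]": {"a": 1, "b": 2}, "(100,200]": {"c": 4}}
print(mismatching_ranges(local_replica, remote_replica))  # ['(100,200]']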
Repair gone wild
Eats a lot of disk IO
●because of hashing all the data
Saturates the network
●because of streaming a lot of data around
Fills up the disk
●because of receiving all replicas, possibly from all other data centers
Causes a ton of compactions
●because of having to merge the received data
… one better be careful
Careful repair
nodetool repair
●All three intervals
Careful repair
Partitioner range
●nodetool repair -pr
●This interval only
Careful repair
Start & end tokens
●nodetool repair -pr -st -et
●A part of interval only
Careful repair
Requires splitting the ring into smaller intervals
Smaller intervals mean less data
Less data means fewer repairs gone wild
Careful repair
Smaller intervals also mean more intervals
More intervals mean more actual repairs
Repairs need to be babysat :(
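What splitting the ring and repairing it interval by interval looks like, as a minimal Python sketch shelling out to nodetool; the Murmur3 token bounds, the segment count and the keyspace name my_keyspace are illustrative assumptions, and the repair flags follow the slides above.

import subprocess

# Assumption: the cluster uses the Murmur3 partitioner's token space.
MIN_TOKEN, MAX_TOKEN = -2**63, 2**63 - 1

def split_ring(segments):
    """Split the full token ring into `segments` equal-width intervals."""
    width = (MAX_TOKEN - MIN_TOKEN) // segments
    bounds = [MIN_TOKEN + i * width for i in range(segments)] + [MAX_TOKEN]
    return list(zip(bounds[:-1], bounds[1:]))

def repair_interval(start, end, keyspace="my_keyspace"):
    """Repair one small interval only: less data, fewer repairs gone wild."""
    subprocess.run(
        ["nodetool", "repair", "-pr", "-st", str(start), "-et", str(end), keyspace],
        check=True,
    )

# More intervals mean more actual repairs -- this loop is the babysitting
# that the Reaper automates.
for start, end in split_ring(segments=256):
    repair_interval(start, end)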
Careful repair
The Spotify way
Feature teams are meant to do features
Not to waste time operating their C* clusters
Cron-ing nodetool repair is no good
●mostly due to no feedback loop
This all led to the creation of the Reaper
The Reaper
REST(ish) service
Does a lot of JMX
Orchestrates repairs for you
The reaping
You:
curl http://reaper/cluster --data '{"seedHost": "my.cassandra.host.net"}'
The Reaper:
●Figures out cluster info (e.g. name, partitioner)
The reaping
You:
curl http://reaper/repair_run --data '{"clusterName": "myCluster"}'
The Reaper:
●Prepares repair intervals
The reaping
You:
curl -X PUT http://reaper/repair_run/42 -d state=RUNNING
The Reaper:
●Starts triggering repairs of repair intervals
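The same three calls as a small Python sketch using the requests library; the reaper host, the seed host, the cluster name and the run id 42 come from the slides above, while the exact content types and response handling are assumptions.

import requests

REAPER = "http://reaper"

# 1. Register the cluster; the Reaper figures out cluster info (name, partitioner) over JMX.
requests.post(f"{REAPER}/cluster",
              json={"seedHost": "my.cassandra.host.net"}).raise_for_status()

# 2. Create a repair run; the Reaper prepares the repair intervals.
run = requests.post(f"{REAPER}/repair_run", json={"clusterName": "myCluster"})
run.raise_for_status()

# 3. Set the run to RUNNING; the Reaper starts triggering repairs of the intervals.
requests.put(f"{REAPER}/repair_run/42", data={"state": "RUNNING"}).raise_for_status()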
Reaper’s features
Carefulness - doesn’t kill a node
●checks for node load
●backs off after repairing an interval
Resilience - retries when things break
●because things break all the time
Parallelism - no idle nodes
●multiple small intervals in parallel
Scheduling - set up things only once
●regular full-ring repairs
Persistency - state saved somewhere
●a bit of extra resilience
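How carefulness and resilience can fit together, shown as a hypothetical control loop; the load check, the back-off time, the retry limit and the callables node_is_busy and trigger_repair are illustrative assumptions, not the Reaper's actual logic or defaults.

import time

MAX_ATTEMPTS = 3        # resilience: retry when things break (assumed limit)
BACKOFF_SECONDS = 300   # carefulness: back off after repairing an interval (assumed value)

def repair_segment(segment, node_is_busy, trigger_repair):
    """Repair one interval carefully: skip busy nodes, retry on failure."""
    for _ in range(MAX_ATTEMPTS):
        if node_is_busy(segment):        # carefulness: check node load first
            time.sleep(BACKOFF_SECONDS)
            continue
        try:
            trigger_repair(segment)      # e.g. a JMX repair call on the coordinator
            time.sleep(BACKOFF_SECONDS)  # let the nodes settle before the next interval
            return True
        except Exception:
            continue                     # things break all the time; try again
    return False                         # give up; the segment is recorded as failed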
What we reaped
First repair done 2015-01-28
1,700 repairs since then, recently 90 per week
176,000 (16%) segments failed at least once
60 repair failures
Reaper’s Future
CASSANDRA-10070
Whatever is needed until then
Greatest benefit
Cassandra Reaper automates a very tedious maintenance operation of Cassandra clusters in a rather smart, efficient and careful manner while requiring minimal Cassandra expertise