Geo-Distributed Databases: from Spanner (2012) to R4 (2024) 2012 2015 2019 2024 Optimize mainly Atomic Commit NOT focusing on Tail Latency!
Tail Latency? No Improvement! For Tail Latency: Atomic Commit is NOT the key Abort time is the key!
Bonspiel’s Goal: Reduce Abort Time Reduce abort rate Geo-Distributed Concurrency Control (GDCC) Reduce abort penalty Geo-aware Access Method Selection (GAMS) Abort time = abort rate X abort penalty
Geo-Distributed Concurrency Control (GDCC) Design Principle #1: MR abort-free (with respect to SR ) Reduce MR abort rate Design Principle #2: SR wound-free Minimize performance impact on SR Design Principle #3: MR wait-minimal (with respect to SR ) SR is NOT lightweight in geo-distributed environment – Paxos Logging for 1 WAN RTT Minimize MR’s wait on SR Distinguish them! Multi-Region (MR) Transactions Costly minority Single-Region (SR) Transactions Cheaper majority Dominate Tail Latency
Idea of GDCC OCC Reservation [Polaris; SIGMOD’23] Conditional Wait Early Visible Write
What is Reservation? [Polaris; SIGMOD’23] For supporting priority in OCC Records reserved by a transaction can’t be updated by lower-priority transactions (i.e., validation would fail)
Holder Idea of GDCC MR transactions higher priority SR transactions lower priority Does Polaris solve all problems? NO!
MR Reserve(x) SR Paxos-logging (1 WAN RTT) Validation lock(x) Commit update & unlock(x) ... ... ... Abort MR? violating MR abort-free Wound SR? violating SR wound-free Problems of Polaris in the Geo-Distributed Setting
MR Reserve(x) ... ... ... ... ... ... ... ... ... ... ... ... ... SR Paxos-logging (1 WAN RTT) Validation lock(x) Commit update & unlock(x) ... ... ... Let MR wait? LONG WAIT for MR! Problems of Polaris in the Geo-Distributed Setting
MR Reserve(x) SR Paxos-logging (1 WAN RTT) Validation lock(x) Commit update & unlock(x) ... ... ... Idea of GDCC: Early Visible Write Read(x) ... Early_visible (x) Many other challenges: Full details in the paper
GAMS: Reduce Abort Penalty of Multi-Region Transactions “Cold” record? “Hot” record? Read-Leader(x) Fresh but long latency Read-Nearest(x) Shortest latency but might be stale Transaction T: Read(x) // x in P1 Read-Nearest Read-Leader
Experiment Results (TPC-C): Top-tier Average and P50 Latency
Experiment Results (TPC-C): Top-tier Throughput
Conclusion Low tail latency : around 1.8s even under high contention Top-tier throughput and average latency compared to SOTAs Full generality – no restrictions enforced