Conquering Load Balancing: Experiences from ScyllaDB Drivers

ScyllaDB · 35 slides · Jul 01, 2024

About This Presentation

Load balancing seems simple on the surface, with algorithms like round-robin, but the real world loves throwing curveballs. Join me in this session as we delve into the intricacies of load balancing within ScyllaDB Drivers. Discover firsthand experiences from our journey in driver development, where...


Slide Content

Conquering Load Balancing: Experiences from ScyllaDB Drivers. Piotr Grabowski, Software Team Leader at ScyllaDB

Piotr Grabowski (he/him), Software Team Leader at ScyllaDB, responsible for all ScyllaDB drivers and the ScyllaDB Kafka Connectors (ScyllaDB Sink Connector and ScyllaDB CDC Source Connector). Joined ScyllaDB 3 years ago.

Load balancing in ScyllaDB

Load balancing in ScyllaDB: for SELECT * FROM table WHERE partition_key = “R1250GS”, the driver computes hash(“R1250GS”), which determines the replica nodes for that partition.
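The hash-to-replicas step can be sketched as a token ring lookup. This is a minimal illustration, assuming a toy hash function in place of the Murmur3 partitioner ScyllaDB actually uses, and a hand-built ring of (token, node) pairs: the key's hash is mapped to the first node whose token is greater than or equal to it (wrapping around), and the next RF - 1 ring positions hold the remaining replicas.

```rust
/// Toy stand-in for the partitioner hash (NOT the real Murmur3 token hash).
fn toy_hash(key: &str) -> u64 {
    key.bytes()
        .fold(0u64, |acc, b| acc.wrapping_mul(31).wrapping_add(b as u64))
}

/// `ring` must be sorted by token; returns the `rf` replica names for `key`.
fn replicas_for<'a>(ring: &[(u64, &'a str)], key: &str, rf: usize) -> Vec<&'a str> {
    let h = toy_hash(key);
    // First ring position whose token is >= the hash, wrapping past the end.
    let start = ring.partition_point(|&(token, _)| token < h) % ring.len();
    (0..rf).map(|i| ring[(start + i) % ring.len()].1).collect()
}

fn main() {
    let ring = [(100, "node-a"), (200, "node-b"), (300, "node-c")];
    let replicas = replicas_for(&ring, "R1250GS", 2);
    // With RF = 2, two consecutive ring positions own the partition.
    assert_eq!(replicas.len(), 2);
    println!("{:?}", replicas);
}
```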

Naïve load balancing policy: SELECT * FROM table WHERE partition_key = “R1250GS”

Round-robin policy: a very basic load balancing algorithm. Maintain an incrementing counter N. When sending a query, send it to replica number N, then increment the counter (modulo the number of replicas). The load is distributed equally between the nodes.
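The counter scheme above can be sketched in a few lines. This is a simplified illustration, not the driver's actual implementation; the `RoundRobin` struct and its `pick` method are hypothetical names:

```rust
use std::sync::atomic::{AtomicUsize, Ordering};

/// Minimal round-robin sketch: a shared atomic counter cycled over replicas.
struct RoundRobin {
    counter: AtomicUsize,
}

impl RoundRobin {
    fn new() -> Self {
        Self { counter: AtomicUsize::new(0) }
    }

    /// Return the next replica index, wrapping modulo the replica count.
    fn pick(&self, replica_count: usize) -> usize {
        self.counter.fetch_add(1, Ordering::Relaxed) % replica_count
    }
}

fn main() {
    let rr = RoundRobin::new();
    let picks: Vec<usize> = (0..6).map(|_| rr.pick(3)).collect();
    // Each of the 3 replicas receives exactly 2 of the 6 queries.
    assert_eq!(picks, vec![0, 1, 2, 0, 1, 2]);
    println!("{:?}", picks);
}
```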

Random choice policy: a very basic load balancing algorithm. When sending a query, send it to a random replica.

Problems!

Slow node: SELECT * FROM table WHERE partition_key = “R1250GS”. Causes: faulty network, temporary overload.

Slow node (continued): the slow replica adds +10 ms latency to requests routed to it.

Slow core: SELECT * FROM table WHERE partition_key = “R1250GS”

Inflight requests per node+shard (charts of inflight request counts)

Slow core (synthetic scenario): a 5-node i3.4xlarge Scylla cluster, in which we purposefully and severely overloaded a single core on one of the nodes. That's only 1/80 of the cluster! The results were staggering.

Slow core (synthetic scenario): results (charts)

Least Inflight load balancing? One potential solution to that problem: instead of choosing any of the replicas, send the request to the least loaded one (as measured by the number of inflight requests). This works well for a single client, but with many clients it is susceptible to all of them overloading the same node.

Power of Two Choices load balancing: a simple algorithm that aims to avoid the slowest replica while avoiding the problem of many clients overloading the same node. When selecting the replica: pick 2 replicas at random, then select the one with the fewer inflight requests.
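The selection step can be sketched as follows. To keep the example deterministic, the two randomly drawn candidate indices are passed in as arguments (a real policy would draw them itself); the `inflight` slice is a hypothetical per-replica inflight-request counter:

```rust
/// Power of Two Choices: of two randomly drawn replicas, take the less loaded.
fn power_of_two_choices(inflight: &[usize], a: usize, b: usize) -> usize {
    if inflight[a] <= inflight[b] { a } else { b }
}

fn main() {
    // Replica 1 is overloaded (e.g. it owns the slow core).
    let inflight = [3, 250, 5];
    // Whenever the overloaded replica is drawn against a healthy one, it loses.
    assert_eq!(power_of_two_choices(&inflight, 0, 1), 0);
    assert_eq!(power_of_two_choices(&inflight, 1, 2), 2);
    // Two healthy candidates: the one with fewer inflight requests wins.
    assert_eq!(power_of_two_choices(&inflight, 2, 0), 0);
    println!("ok");
}
```

Because only two candidates are compared, many independent clients do not all converge on the globally least-loaded node the way a pure least-inflight policy can.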

Sending requests to the wrong shard: a counterintuitive approach to dealing with an overloaded shard. If you send the request to the wrong shard, the initial processing of the request happens on the wrong shard, which then moves the processing to the correct shard. The correct shard didn't have to do the initial processing, reducing its load!

Zone awareness in the Cloud (diagrams)

Zone awareness in the Cloud: instead of sending the request to any replica, prefer the replica in the local AZ. Potential issues: a common setup uses a Replication Factor of 3; if the cluster is deployed across 3 AZs, there will be only a single replica in each AZ, which severely limits the choices a load balancing algorithm has. A problem with many solutions: a tradeoff between latencies and cost.
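The AZ preference can be sketched as reordering the replica plan so local-AZ replicas come first, with the others kept as fallback. The `Replica` struct and its `az` label here are hypothetical, purely for illustration:

```rust
/// Hypothetical replica record carrying its availability-zone label.
#[derive(Debug, PartialEq)]
struct Replica<'a> {
    name: &'a str,
    az: &'a str,
}

/// Order local-AZ replicas first; a stable sort preserves the original
/// relative order within each group.
fn order_by_local_az<'a>(mut replicas: Vec<Replica<'a>>, local_az: &str) -> Vec<Replica<'a>> {
    replicas.sort_by_key(|r| r.az != local_az); // false (local) sorts first
    replicas
}

fn main() {
    let replicas = vec![
        Replica { name: "n1", az: "us-east-1a" },
        Replica { name: "n2", az: "us-east-1b" },
        Replica { name: "n3", az: "us-east-1c" },
    ];
    let ordered = order_by_local_az(replicas, "us-east-1b");
    // With RF = 3 across 3 AZs, exactly one replica is local.
    assert_eq!(ordered[0].name, "n2");
    println!("{:?}", ordered);
}
```

Note the tradeoff from the slide: preferring the single local replica avoids cross-AZ traffic costs, but leaves the policy only one preferred target, so a slow local replica hurts latency.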

Optimizing load balancing in ScyllaDB Rust Driver: benchmarks of the ScyllaDB Rust Driver have shown that load balancing computation is a significant part of sending a query. Main goal: reduce the number of allocations and atomic operations while building the query plan, especially on the happy path. The plan function was split into pick() and fallback() methods, which allowed better optimization of the most common case, where only one node from the load balancing plan is needed. Precomputation of replica sets: a struct was introduced that precomputes the replica lists for the given strategies and provides O(1) access to the desired replica slices.
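The pick()/fallback() split can be illustrated with a simplified sketch (the `Plan` struct below is hypothetical, not the driver's real API): pick() serves the happy path in O(1) with no allocation, while fallback() lazily yields the remaining replicas only if the first attempt fails.

```rust
/// Hypothetical simplified query plan over a precomputed replica slice.
struct Plan<'a> {
    replicas: &'a [&'a str],
}

impl<'a> Plan<'a> {
    /// Happy path: the single preferred replica, no allocation needed.
    fn pick(&self) -> Option<&'a str> {
        self.replicas.first().copied()
    }

    /// Cold path: the remaining replicas, yielded lazily for retries.
    fn fallback(&self) -> impl Iterator<Item = &'a str> + '_ {
        self.replicas.iter().skip(1).copied()
    }
}

fn main() {
    let plan = Plan { replicas: &["n1", "n2", "n3"] };
    assert_eq!(plan.pick(), Some("n1"));
    // The fallback iterator is only consumed if the first node fails.
    let rest: Vec<_> = plan.fallback().collect();
    assert_eq!(rest, vec!["n2", "n3"]);
    println!("{:?}", rest);
}
```

Borrowing a precomputed slice, rather than building a fresh Vec per query, is the kind of change that removes per-request allocations on the happy path.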

Optimizing load balancing in ScyllaDB Rust Driver: allocation benchmark for inserts, before vs. after the optimizations (9 fewer allocations per request, -60%; 84% fewer bytes allocated):

Inserts (per request)       Before     After
allocs/req                   15.00      6.01
reallocs/req                  8.00      6.00
frees/req                    15.00      6.00
bytes allocated/req        2458.05    381.80
bytes reallocated/req       269.06    173.05
bytes freed/req            2456.80    380.62
(allocated - freed)/req       1.25      1.18

Piotr Grabowski [email protected] Thank you! Let’s connect.