Real-Time or Analytics Workloads... Why Not Both?

ScyllaDB, 26 slides, Jun 20, 2024

About This Presentation

ScyllaDB’s Workload Prioritization provides resource optimization and performance isolation across workloads with different performance needs, such as Analytics and Real-time. In this session, you will learn how Workload Prioritization works, how you can use it to run different types of workloads ...


Slide Content

Real-Time or Analytics Workloads... Why Not Both?
Felipe Cardeneti Mendes, Solution Architect at ScyllaDB

Felipe Mendes: Published Author, ScyllaDB Committer, IT Specialist & Solution Architect, Open Source Enthusiast

ScyllaDB as an Analytics Engine

An IoT Application. Total amount of data points: 526 billion temperature readings, from 1,000,000 sensors (representing homes in an area) reporting 1 reading per minute over 365 days (1-year storage requirement).
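The 526 billion figure is simply the product of the three parameters on the slide. A minimal Python sketch of that sizing arithmetic (the constant and function names are illustrative, not from the talk):

```python
# Back-of-the-envelope sizing for the IoT scenario (names are illustrative).
SENSORS = 1_000_000          # homes in the area
READINGS_PER_DAY = 24 * 60   # 1 reading per minute
RETENTION_DAYS = 365         # 1-year storage requirement


def total_readings(sensors: int, readings_per_day: int, days: int) -> int:
    """Number of data points stored over the retention window."""
    return sensors * readings_per_day * days


if __name__ == "__main__":
    n = total_readings(SENSORS, READINGS_PER_DAY, RETENTION_DAYS)
    print(f"{n:,} readings")  # 525,600,000,000 readings (~526 billion)
```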

Analytics over the entire dataset? How long would it take at normal speeds? At 200,000 points/second: 730 hours (30 days). At 1 million points/second: 146 hours (almost a week). We need more if analytics are part of the pipeline. We need ScyllaDB!
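The scan-time estimates follow from dividing the dataset size by a sustained read rate. A small Python sketch, assuming a constant rate with no per-request overhead (the helper name `scan_hours` is hypothetical):

```python
# Total readings from the IoT scenario: 1M sensors x 1,440 readings/day x 365 days.
TOTAL_POINTS = 1_000_000 * 24 * 60 * 365


def scan_hours(points: int, points_per_second: float) -> float:
    """Hours needed to read every point once at a constant rate."""
    return points / points_per_second / 3600


for rate in (200_000, 1_000_000):
    hours = scan_hours(TOTAL_POINTS, rate)
    print(f"{rate:,} points/s -> {hours:,.0f} hours ({hours / 24:.1f} days)")
```

It reports 730 hours (about 30.4 days) at 200,000 points/s and 146 hours (about 6.1 days) at 1 million points/s, matching the slide.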

Easy Peasy: Scanning 3 months of data
Finished Scanning. Succeeded 132,480,000,000 rows. Failed 0 rows. RPC Failures: 0. Took 110,892.91 ms
Processed 1,194,666,083 rows/s
Absolute min: 19.71, date 2019-08-27, sensorID 473869
Absolute max: 135.21, date 2019-08-27, sensorID 473869

Easy Peasy: Scanning the entire dataset
Finished Scanning. Succeeded 525,599,474,400 rows. Failed 0 rows. RPC Failures: 0. Took 542,191.31 ms
Processed 969,398,554 rows/s
Absolute min: 68.00, date 2019-05-28, sensorID 82114
Absolute max: 79.99, date 2019-03-19, sensorID 152594
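The rows/s figures in both scan summaries can be cross-checked from the row counts and elapsed times. This Python sketch does that; its only assumption is that the "3 months" window spans 92 days, which reproduces the 132,480,000,000 row count exactly (the helper name is hypothetical).

```python
SENSORS = 1_000_000
READINGS_PER_DAY = 24 * 60  # 1 reading per minute


def rows_per_second(rows: int, elapsed_ms: float) -> float:
    """Throughput implied by a scan's row count and wall-clock time."""
    return rows / (elapsed_ms / 1000)


# Row counts and elapsed times copied from the two scan summaries.
three_months = rows_per_second(132_480_000_000, 110_892.91)
full_dataset = rows_per_second(525_599_474_400, 542_191.31)

# Assumption: "3 months" = 92 days of readings from every sensor.
assert SENSORS * READINGS_PER_DAY * 92 == 132_480_000_000

print(f"3 months: {three_months:,.0f} rows/s")
print(f"full:     {full_dataset:,.0f} rows/s")
```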

We can efficiently process close to 1.2 billion points per second (and we'll process whatever you need, too!)

Concurrency Challenges


The Latency Problem: P99 latency climbs to unacceptable values.

The Throughput Problem: throughput gradually drops, and load distribution becomes unfair.

Why Does Contention Happen? Primarily a lack of system resources (disk I/O, CPU time). That is not necessarily a problem in itself, but it introduces queueing.
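To see why queueing alone can push latency up, a textbook M/M/1 queue is a handy (if heavily simplified) stand-in for a saturated resource: the mean time in system is $W = 1/(\mu - \lambda)$. This Python sketch illustrates that general queueing result; it is not a model of ScyllaDB's internals, and the 1,000 requests/s service rate is hypothetical.

```python
def mm1_mean_latency(arrival_rate: float, service_rate: float) -> float:
    """Mean time in system (seconds) for an M/M/1 queue: W = 1 / (mu - lambda).

    Illustrative textbook model only; a database node is far more complex
    than a single exponential server.
    """
    if arrival_rate >= service_rate:
        raise ValueError("unstable queue: arrival rate must be below service rate")
    return 1.0 / (service_rate - arrival_rate)


SERVICE_RATE = 1000.0  # hypothetical: 1 ms mean service time
for utilization in (0.5, 0.8, 0.9, 0.99):
    w_ms = mm1_mean_latency(utilization * SERVICE_RATE, SERVICE_RATE) * 1000
    print(f"utilization {utilization:.0%}: mean latency {w_ms:.1f} ms")
```

With a 1 ms mean service time, mean latency goes from 2 ms at 50% utilization to 100 ms at 99%: nothing is broken, requests are simply waiting in line.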

Addressing the Problem? Easy then, let's simply divide and conquer! Division in space (multi-DC), or division in time (off-peak OLAP).
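Division in time usually means gating batch jobs on an off-peak window. A minimal Python sketch, assuming a hypothetical 22:00 to 06:00 window (the window and function names are illustrative):

```python
from datetime import time


def in_window(now: time, start: time, end: time) -> bool:
    """True if `now` falls in [start, end), handling windows that wrap midnight."""
    if start <= end:
        return start <= now < end
    return now >= start or now < end


# Hypothetical off-peak window for analytics (OLAP) jobs.
OFF_PEAK_START, OFF_PEAK_END = time(22, 0), time(6, 0)


def may_run_olap(now: time) -> bool:
    """Only start batch scans while real-time traffic is expected to be low."""
    return in_window(now, OFF_PEAK_START, OFF_PEAK_END)
```

The wrap-around branch matters because most off-peak windows cross midnight.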

Workload Prioritization

Isolation and Performance Optimization: background tasks and user tasks

Shares: different workloads require different priorities. Shares help you meet SLAs, with flexible configuration and adaptability to changing conditions.
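Shares are relative weights: under contention, each workload gets a slice proportional to its shares, and capacity one workload doesn't need is left for the others. The Python sketch below models that idea as generic weighted max-min fairness (water-filling); it is an assumption-level illustration, not ScyllaDB's scheduler code, and the workload names are hypothetical.

```python
def weighted_fair_share(capacity: float, demands: dict, shares: dict) -> dict:
    """Split `capacity` among workloads in proportion to their shares.

    Work-conserving: a workload needing less than its proportional slice
    gets only what it needs, and the leftover is re-split among the rest
    by their shares (weighted max-min fairness / water-filling).
    """
    alloc = {name: 0.0 for name in demands}
    active = {name for name, d in demands.items() if d > 0}
    remaining = capacity
    while active and remaining > 1e-12:
        total_shares = sum(shares[name] for name in active)
        # Workloads whose whole demand fits inside their proportional slice.
        satisfied = {
            name for name in active
            if demands[name] <= remaining * shares[name] / total_shares
        }
        if not satisfied:
            # Everyone left is contended: hand out the rest strictly by shares.
            for name in active:
                alloc[name] = remaining * shares[name] / total_shares
            break
        for name in satisfied:
            alloc[name] = demands[name]
            remaining -= demands[name]
        active -= satisfied
    return alloc
```

With shares of 800 and 200 and both workloads saturating the resource, the split is 80/20; if the 800-share workload only needs 30% of capacity, the other one gets the remaining 70%.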

Getting Started
Set up authentication and assign a role to each workload; prioritization is wired to the authenticated user's role.
Define your service levels.
For a primary workload: CREATE SERVICE LEVEL main WITH shares = 200
For a secondary one: CREATE SERVICE LEVEL secondary WITH shares = 600
Profit!
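Because service levels are attached to roles, the database can pick a request's priority from whoever authenticated the connection. This Python sketch models that lookup only; the service level names and shares come from the slide, while the role names (`app_user`, `analytics_user`) and the mapping dictionaries are hypothetical stand-ins for what the database stores.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class ServiceLevel:
    name: str
    shares: int


# Service levels as defined on the slide.
SERVICE_LEVELS = {
    "main": ServiceLevel("main", shares=200),
    "secondary": ServiceLevel("secondary", shares=600),
}

# Hypothetical roles: each workload authenticates as its own role.
ROLE_TO_SERVICE_LEVEL = {
    "app_user": "main",
    "analytics_user": "secondary",
}


def service_level_for(role: str) -> ServiceLevel:
    """Resolve the service level governing requests from an authenticated role."""
    return SERVICE_LEVELS[ROLE_TO_SERVICE_LEVEL[role]]
```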

Prioritization and Isolation are NOT enough!

Workload Characteristics

Workload Characteristics: Time
The timeout dilemma: timeouts should follow $T_{\text{server}} \le T_{\text{client}}$.
For real-time: the timeout can't be too high. Timed-out requests are retried or dropped, and excessive retries waste resources.
For analytics: the timeout can't be too low, otherwise the batch will likely fail. High throughput will typically increase latencies due to contention.
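The rule $T_{\text{server}} \le T_{\text{client}}$ is easiest to see by classifying a single request. In this Python sketch (the function name and outcome labels are hypothetical), a server that times out first returns an error the client can act on, whereas a client that gives up first leaves the server doing work nobody will read:

```python
def outcome(latency_ms: float, server_timeout_ms: float, client_timeout_ms: float) -> str:
    """Classify one request under a server-side and a client-side timeout.

    "ok":             the reply arrives before either timeout fires.
    "server_timeout": the server gives up first and returns a timeout error,
                      so the client knows the request failed and may retry.
    "wasted":         the client gives up first, so the work the server keeps
                      doing for this request is thrown away.
    """
    if latency_ms <= min(server_timeout_ms, client_timeout_ms):
        return "ok"
    if server_timeout_ms <= client_timeout_ms:
        return "server_timeout"
    return "wasted"
```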

Workload Characteristics: Shedding
Overload response:
Interactive workload: throttling won't help. Delaying the response to user A will not cause user B to stop sending requests, so concurrency is unbounded.
Batch workload: just throttle. This gives us a knob that controls the pace of the analytics workload, since its concurrency is bounded.
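The two overload responses can be sketched in Python as two admission policies: a concurrency limit that rejects excess interactive requests immediately, and a token bucket that paces batch requests. Both classes are hypothetical illustrations of these general techniques, not ScyllaDB components; time is passed in explicitly so the example stays deterministic.

```python
class LoadShedder:
    """Interactive workloads: reject work beyond a concurrency limit right away."""

    def __init__(self, max_in_flight: int):
        self.max_in_flight = max_in_flight
        self.in_flight = 0

    def try_admit(self) -> bool:
        if self.in_flight >= self.max_in_flight:
            return False  # shed: fail fast instead of queueing
        self.in_flight += 1
        return True

    def release(self) -> None:
        self.in_flight -= 1


class Throttle:
    """Batch workloads: token bucket that delays requests to a target pace."""

    def __init__(self, rate_per_s: float, burst: float, now: float = 0.0):
        self.rate = rate_per_s
        self.capacity = burst
        self.tokens = burst
        self.last = now

    def delay_for(self, now: float) -> float:
        """Seconds to wait before issuing a request at time `now` (non-decreasing)."""
        # Refill tokens accrued since the last call, capped at the burst size.
        self.tokens = min(self.capacity, self.tokens + (now - self.last) * self.rate)
        self.last = now
        self.tokens -= 1  # reserve a token; a negative balance is queued debt
        return 0.0 if self.tokens >= 0 else -self.tokens / self.rate
```

The token bucket's `rate_per_s` plays the role of the pacing knob: lowering it slows the analytics job down instead of failing its requests.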

Introducing Workload Characterization
Ideally, we want the database to behave differently for each workload:
For Real-time: have a low timeout; shed load (fail excess requests), since the database can NOT slow down interactive workloads; dedicate most of the resources to this workload.
For Batch: have a relatively higher timeout; apply back-pressure via throttling; use mostly unused resources.

Introducing Workload Characterization
Why not just hint the database with specifics?
For Real-Time:
Have a low timeout (30ms): timeout=30ms
Load shedding: AND workload_type=interactive
Dedicate most of the resources: AND shares=800
For Analytics:
Have a relatively high timeout (5s): timeout=5s
Throttling: AND workload_type=batch
Use mostly unused resources: AND shares=200
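The hints above can be captured as a small profile per workload. This Python sketch (class and constant names are hypothetical) stores the slide's values, renders them in the same `timeout=... AND workload_type=... AND shares=...` form, and derives the overload response each `workload_type` implies; it does not talk to a database.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class WorkloadProfile:
    timeout_ms: int
    workload_type: str  # "interactive" or "batch"
    shares: int

    def options(self) -> str:
        """Render the hints in the same form as the slide."""
        if self.timeout_ms % 1000 == 0:
            timeout = f"{self.timeout_ms // 1000}s"
        else:
            timeout = f"{self.timeout_ms}ms"
        return f"timeout={timeout} AND workload_type={self.workload_type} AND shares={self.shares}"

    def on_overload(self) -> str:
        """Interactive work is shed; batch work is throttled (back-pressure)."""
        return "shed" if self.workload_type == "interactive" else "throttle"


REAL_TIME = WorkloadProfile(timeout_ms=30, workload_type="interactive", shares=800)
ANALYTICS = WorkloadProfile(timeout_ms=5000, workload_type="batch", shares=200)
```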

Takeaways
ScyllaDB powers both analytics and real-time intensive workloads.
Workload Prioritization helps with infrastructure consolidation, resource optimization, and performance isolation.
Workload Characterization provides workload-specific settings and distinct overload and timeout responses.

Stay in Touch
Felipe Mendes
[email protected]
@cardeneti82118
fee-mendes
Find me on LinkedIn