Low-Latency Data Access: The Required Synergy Between Memory & Disk

ScyllaDB, Jun 26, 2024 (21 slides)

About This Presentation

Analytics has moved from internal dashboards to dashboards inside the product, giving each user a personalized experience, be it LinkedIn profile views or Uber’s online order management and inventory. Given the requirement of sub-millisecond response times on user-facing apps, how does...


Slide Content

Low-Latency Data Access: The Required Synergy Between CPU, Memory & Disk
Kriti Kathuria, Database Researcher

Kriti Kathuria (she/her), Database Researcher
- Conceptualizing Eventual Durability
- SQL-gen for Incremental View Maintenance
- Data Engineer in a past life
- Good mentorship is powerful and fundamental
- At scale, the insignificant becomes significant!

Motivation
At scale, the insignificant becomes significant! A single I/O takes negligible time, but when a query touches gigabytes of data and thousands of I/O operations, those latencies add up. Thus, at scale, tail latency (p99) matters.
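To make the p99 argument concrete, here is a small worked calculation under a hypothetical assumption: each I/O independently has a 1% chance of exceeding its p99 latency (independence is a simplification). A request that issues n I/Os then hits at least one tail-latency I/O with probability 1 − 0.99^n. The function name `prob_request_hits_tail` is illustrative only.

```python
# Hypothetical assumption: each I/O independently lands in the slow tail
# (beyond the per-I/O p99 latency) with probability 1%.
P_SLOW = 0.01


def prob_request_hits_tail(num_ios: int, p_slow: float = P_SLOW) -> float:
    """Probability that at least one of num_ios independent I/Os is slow."""
    return 1.0 - (1.0 - p_slow) ** num_ios


for n in (1, 10, 100, 1000):
    print(f"{n:>5} I/Os -> {prob_request_hits_tail(n):6.1%} chance of a tail-latency I/O")
```

Under these assumptions, a request issuing 100 I/Os meets the per-I/O tail about 63% of the time, so tail behavior shows up in most requests rather than in 1 of every 100.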

Outline
- Motivation
- Existing Techniques
  - Aggregation Processing
  - Vectorization
  - Query Compilation
- Closing Remarks

Matrix Multiplication
Example from MIT 6.172, Fall 2018, Lecture 1

Matrix Multiplication
Three matrices, A × B = C, where A is indexed by (i, k), B by (k, j), and C by (i, j): $C_{ij} = \sum_{k} A_{ik} B_{kj}$
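A direct transcription of the definition into code, as a minimal sketch (the function name `matmul_ijk` is hypothetical, and Python is used only for readability). With the textbook i, j, k loop order, the innermost loop steps down a column of B; for row-major arrays, as in C, that is a strided memory access on every iteration.

```python
def matmul_ijk(A, B):
    """C = A x B using the textbook i, j, k loop order.

    A is n x m (indexed [i][k]), B is m x p (indexed [k][j]),
    and the result C is n x p (indexed [i][j]).
    """
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("inner dimensions of A and B must match")
    p = len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                # Innermost loop walks down column j of B: B[0][j], B[1][j], ...
                C[i][j] += A[i][k] * B[k][j]
    return C


print(matmul_ijk([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```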


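Loop interchange is one of the memory-hierarchy optimizations the MIT 6.172 matrix-multiplication example walks through. The sketch below (hypothetical name `matmul_ikj`) reorders the loops to i, k, j, so the innermost loop walks a row of B and a row of C sequentially, which for row-major arrays in a language like C means unit-stride, cache-friendly access. Python lists will not reproduce the hardware speedup; the point is the access pattern, and the result is identical to the i, j, k order.

```python
def matmul_ikj(A, B):
    """C = A x B with the loops interchanged to i, k, j order."""
    n, m = len(A), len(A[0])
    if len(B) != m:
        raise ValueError("inner dimensions of A and B must match")
    p = len(B[0])
    C = [[0] * p for _ in range(n)]
    for i in range(n):
        C_row = C[i]
        for k in range(m):
            a_ik = A[i][k]  # constant across the whole j loop, so hoist it
            B_row = B[k]
            for j in range(p):
                # Both C[i] and B[k] are traversed left to right (row-wise).
                C_row[j] += a_ik * B_row[j]
    return C


print(matmul_ikj([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```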

Processing aggregate queries in a database
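A common baseline for a GROUP BY query is sort-based aggregation: sort every qualifying row on the grouping key, then reduce each run of equal keys in one pass. The sketch below uses hypothetical rows shaped like TPC-H `lineitem` columns (`l_returnflag`, `l_linestatus`, `l_quantity`); grouping on those two flags, as TPC-H Q1 does, yields only 4 groups. Note that the whole input is sorted even though the output is tiny, which is the cost the following slides target.

```python
from itertools import groupby
from operator import itemgetter

# Hypothetical rows: (l_returnflag, l_linestatus, l_quantity)
rows = [
    ("N", "O", 17), ("R", "F", 36), ("A", "F", 8), ("N", "O", 28),
    ("N", "F", 24), ("R", "F", 32), ("A", "F", 38), ("N", "O", 2),
]


def sort_based_group_sum(rows):
    """SELECT flag, status, SUM(qty), COUNT(*) ... GROUP BY flag, status.

    Sort all input rows on the grouping key, then collapse each run of
    equal keys in a single sequential pass.
    """
    key = itemgetter(0, 1)
    result = []
    for (flag, status), group in groupby(sorted(rows, key=key), key=key):
        quantities = [qty for _, _, qty in group]
        result.append((flag, status, sum(quantities), len(quantities)))
    return result


print(sort_based_group_sum(rows))
# [('A', 'F', 46, 2), ('N', 'F', 24, 1), ('N', 'O', 47, 3), ('R', 'F', 68, 2)]
```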

Aggregation during run-generation

Aggregation during run-generation
TPC-H SF 1: 6M rows; after the filter, 5M rows fetched from disk; output: 4 rows
Thanh Do, Goetz Graefe, and Jeffrey Naughton. 2023. Efficient Sorting, Duplicate Removal, Grouping, and Aggregation. ACM Trans. Database Syst. 47, 4, Article 16 (December 2022), 35 pages. https://doi.org/10.1145/3568027

Aggregation during run-generation
TPC-H SF 1: 6M rows; after the filter, 5M rows fetched from disk; output: 4 rows
TPC-H SF 1000: 6B rows; after the filter, 5B rows fetched from disk; output: 4 rows

Aggregation during run-generation
Run generation for sorting
Sorted runs
Reduction of sorted data
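A simplified sketch of the idea, assuming fixed-size memory loads and in-memory lists standing in for run files on disk (the names `external_group_sum` and `memory_rows` are hypothetical). The cited paper by Do, Graefe, and Naughton treats this in far more depth; this only shows the core effect: aggregating duplicates while each sorted run is generated means a run holds at most one entry per distinct key, so with 4 groups every run shrinks to 4 entries no matter how many rows it covered.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def aggregate_sorted(entries):
    """Collapse adjacent (key, sum, count) entries that share a key.

    The input must already be ordered by key.
    """
    out = []
    for key, group in groupby(entries, key=itemgetter(0)):
        total = count = 0
        for _, part_sum, part_count in group:
            total += part_sum
            count += part_count
        out.append((key, total, count))
    return out


def external_group_sum(rows, memory_rows):
    """GROUP BY key with SUM(value) and COUNT(*), using sorted runs.

    rows: list of (key, value) pairs in arbitrary order.
    memory_rows: hypothetical memory budget, in input rows per run.
    """
    runs = []
    for start in range(0, len(rows), memory_rows):
        # Run generation: sort one memory-sized chunk of the input.
        chunk = sorted(rows[start:start + memory_rows], key=itemgetter(0))
        # Early aggregation: collapse duplicates before the run is "written",
        # so each run holds at most one entry per distinct key.
        runs.append(aggregate_sorted((k, v, 1) for k, v in chunk))
    # Merge phase: k-way merge of all runs, then aggregate across runs.
    merged = heapq.merge(*runs, key=itemgetter(0))
    return aggregate_sorted(merged), runs


rows = [(key, 3) for key in "ABCD" * 2500]  # 10,000 rows, only 4 distinct keys
result, runs = external_group_sum(rows, memory_rows=1000)
print(result)  # [('A', 7500, 2500), ('B', 7500, 2500), ('C', 7500, 2500), ('D', 7500, 2500)]
print(len(runs), [len(run) for run in runs])  # 10 runs, each with 4 entries
```

Without early aggregation, each of those 10 runs would hold 1,000 rows; with it, the data written and later merged is bounded by the number of groups per run.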

Aggregation during run-generation
In-memory index
Unsorted data on disk
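The slide pairs an in-memory index with unsorted data on disk. One plausible reading, offered as an assumption rather than the talk's exact design: when the set of groups is small, an in-memory hash index keyed on the grouping columns can absorb a single sequential scan of the unsorted on-disk data, so no sorted runs need to be written at all. The CSV file, its name `lineitem.csv`, and the helper names are hypothetical.

```python
import csv
import os
import tempfile

# Hypothetical unsorted rows: (l_returnflag, l_linestatus, l_quantity)
rows = [
    ("N", "O", 17), ("R", "F", 36), ("A", "F", 8), ("N", "O", 28),
    ("N", "F", 24), ("R", "F", 32), ("A", "F", 38), ("N", "O", 2),
]


def write_unsorted_table(path, rows):
    """Stand-in for a table stored on disk in arbitrary (unsorted) order."""
    with open(path, "w", newline="") as f:
        csv.writer(f).writerows(rows)


def hash_group_sum_from_disk(path):
    """One sequential scan; aggregates live in an in-memory dict.

    Memory use grows with the number of distinct groups, not the number of rows.
    """
    index = {}  # (flag, status) -> [sum of qty, count]
    with open(path, newline="") as f:
        for flag, status, qty in csv.reader(f):
            entry = index.setdefault((flag, status), [0, 0])
            entry[0] += int(qty)
            entry[1] += 1
    return {key: tuple(agg) for key, agg in index.items()}


with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "lineitem.csv")
    write_unsorted_table(path, rows)
    result = hash_group_sum_from_disk(path)

print(sorted(result.items()))
```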

Vectorized Query Processing


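A hedged sketch of the contrast usually drawn here. In the tuple-at-a-time (Volcano) model, every row travels through each operator with its own `next()` call; in vectorized execution, as popularized by MonetDB/X100, each operator call processes a batch of column values, producing a selection vector that the next primitive consumes. That amortizes per-call overhead and leaves tight, SIMD-friendly inner loops. The column data, batch size, and function names are hypothetical, and Python will not show the speedup; only the structure is the point.

```python
# Hypothetical columnar data: one Python list per column.
quantity = [17, 36, 8, 28, 24, 32, 38, 2]
discount_pct = [4, 9, 10, 9, 10, 7, 0, 6]


# --- Tuple-at-a-time (Volcano-style) pipeline: one next() per row per operator.
def scan(quantity, discount_pct):
    yield from zip(quantity, discount_pct)


def select(rows, max_disc):
    for q, d in rows:
        if d <= max_disc:
            yield q, d


def volcano_sum(quantity, discount_pct, max_disc):
    return sum(q for q, _ in select(scan(quantity, discount_pct), max_disc))


# --- Vectorized pipeline: each operator call handles a whole batch.
def scan_batches(quantity, discount_pct, batch_size):
    for start in range(0, len(quantity), batch_size):
        yield quantity[start:start + batch_size], discount_pct[start:start + batch_size]


def vectorized_sum(quantity, discount_pct, max_disc, batch_size=4):
    total = 0
    for q_vec, d_vec in scan_batches(quantity, discount_pct, batch_size):
        # Filter primitive: one tight loop over the batch -> selection vector.
        selection = [i for i, d in enumerate(d_vec) if d <= max_disc]
        # Aggregate primitive: one tight loop over the selected positions.
        total += sum(q_vec[i] for i in selection)
    return total


print(volcano_sum(quantity, discount_pct, 7))  # 89
print(vectorized_sum(quantity, discount_pct, 7))  # 89
```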

JIT Query Compilation


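A toy illustration of the idea behind JIT query compilation, under stated assumptions: instead of walking an expression tree for every row (interpretation), the engine generates code specialized to one query and compiles it once. Systems such as HyPer generate LLVM IR; here Python's built-in `compile` and `exec` stand in for the code generator, and the tuple-based expression format and all function names are hypothetical. Generated source is built only from a trusted expression tree.

```python
def interpret(expr, row):
    """Evaluate an expression tree against one row by walking the tree."""
    op = expr[0]
    if op == "col":
        return row[expr[1]]
    if op == "const":
        return expr[1]
    left, right = interpret(expr[1], row), interpret(expr[2], row)
    if op == "<=":
        return left <= right
    if op == "*":
        return left * right
    raise ValueError(f"unknown operator {op!r}")


def to_source(expr):
    """Translate the same expression tree into a Python source fragment."""
    op = expr[0]
    if op == "col":
        return f"row[{int(expr[1])}]"
    if op == "const":
        return repr(expr[1])
    if op in ("<=", "*"):
        return f"({to_source(expr[1])} {op} {to_source(expr[2])})"
    raise ValueError(f"unknown operator {op!r}")


def compile_filter_sum(pred, value):
    """Generate and compile: SELECT SUM(value) FROM rows WHERE pred."""
    src = (
        "def query(rows):\n"
        "    total = 0\n"
        "    for row in rows:\n"
        f"        if {to_source(pred)}:\n"
        f"            total += {to_source(value)}\n"
        "    return total\n"
    )
    namespace = {}
    exec(compile(src, "<generated query>", "exec"), namespace)
    return namespace["query"], src


# Hypothetical rows: (quantity, price, discount_pct)
rows = [(17, 10, 4), (36, 20, 9), (8, 5, 10), (2, 50, 6)]
pred = ("<=", ("col", 2), ("const", 7))
value = ("*", ("col", 0), ("col", 1))

interpreted = sum(interpret(value, r) for r in rows if interpret(pred, r))
query, src = compile_filter_sum(pred, value)
print(src)
print(interpreted, query(rows))  # 270 270
```

The compiled `query` contains no per-row dispatch on operator types; that dispatch happened once, at code-generation time.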

Thank you! Let’s connect.
Kriti Kathuria
linkedin.com/in/kriti-kathuria/
twitter.com/kaykathuria