Low-Latency Data Access: The Required Synergy Between Memory & Disk

ScyllaDB 203 views 21 slides Jun 26, 2024


About This Presentation

Analytics has moved from internal dashboards to a dashboard inside the product, providing a personalized experience for each user, be it the LinkedIn profile views or Uber’s online order management and inventory. Given the requirement of sub-millisecond response times on user-facing apps, how does...


Slide Content

Low-Latency Data Access: The Required Synergy Between CPU, Memory & Disk
Kriti Kathuria
Database Researcher

Kriti Kathuria (she/her)
- Database Researcher
- Conceptualizing Eventual Durability
- SQL-gen for Incremental View Maintenance
- Data Engineer in a past life
- Good mentorship is powerful and fundamental
- At scale, the insignificant becomes significant!

Motivation
- At scale, the insignificant becomes significant!
- A single IO takes an insignificant amount of time.
- But with GBs of data and thousands of IO operations, the latencies add up and become significant.
- Thus, p99 latency matters at scale.
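The point about p99 can be made concrete with a small calculation. The sketch below assumes (this is not stated on the slide) that each IO independently has a 1% chance of being slower than the p99 latency, and computes the chance that a request issuing several IOs hits at least one slow IO. The function name and parameters are illustrative.

```python
def p_any_slow(n_ios: int, p_slow: float = 0.01) -> float:
    """Probability that at least one of n_ios IOs is slow.

    Assumes the IOs are independent and that each one exceeds the
    p99 latency with probability p_slow (1% by definition of p99).
    """
    return 1.0 - (1.0 - p_slow) ** n_ios


def report(fanouts=(1, 10, 100, 1000)):
    """Return (fan-out, probability) pairs for a few request sizes."""
    return [(n, p_any_slow(n)) for n in fanouts]
```

Under these assumptions a request with a single IO is slow 1% of the time, while a request that needs 100 IOs hits at least one slow IO roughly 63% of the time, which is why the tail dominates at scale.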

Outline
- Motivation
- Existing Techniques
- Aggregation Processing
- Vectorization
- Query Compilation
- Closing Remarks

Matrix Multiplication
Example from MIT 6.172, Fall 2018, Lecture 1.
Three matrices: A x B = C, with A indexed by (i, k), B by (k, j), and C by (i, j).
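The slide images are not preserved here, so the following is only a sketch of the kind of comparison this example usually illustrates: the same product computed with two loop orders. It assumes row-major storage (lists of rows); the function names are illustrative. In the i-k-j order the inner loop walks a row of B and a row of C sequentially, whereas the i-j-k order strides down a column of B.

```python
def matmul_ijk(A, B):
    """Textbook order: the inner loop strides down a column of B."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        for j in range(p):
            for k in range(m):
                C[i][j] += A[i][k] * B[k][j]
    return C


def matmul_ikj(A, B):
    """Interchanged order: the inner loop scans rows of B and C."""
    n, m, p = len(A), len(B), len(B[0])
    C = [[0.0] * p for _ in range(n)]
    for i in range(n):
        row_c = C[i]
        for k in range(m):
            a_ik = A[i][k]
            row_b = B[k]
            for j in range(p):
                row_c[j] += a_ik * row_b[j]
    return C
```

Both functions return the same result. Pure Python hides most of the cache effect behind interpreter overhead, so this shows the access pattern rather than the speedup; the difference is far more visible in a compiled language with contiguous arrays.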

Processing aggregate queries in a database

Aggregation during run-generation
- TPC-H, scale factor (sf) = 1: 6M rows; filter: 5M rows fetched from disk; output: 4 rows
- TPC-H, sf = 1000: 6B rows; filter: 5B rows fetched from disk; output: 4 rows

Aggregation during run-generation
- Run generation for sorting
- Sorted runs
- Reduction of sorted data

Aggregation during run-generation
- In-memory index
- Unsorted data on disk

Source: Thanh Do, Goetz Graefe, and Jeffrey Naughton. 2023. Efficient Sorting, Duplicate Removal, Grouping, and Aggregation. ACM Trans. Database Syst. 47, 4, Article 16 (December 2022), 35 pages. https://doi.org/10.1145/3568027
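The idea of aggregating while runs are generated can be sketched as follows. This is a simplified illustration, not the algorithm from the cited paper: a plain dict that is sorted when it spills stands in for the ordered in-memory index, runs are kept in Python lists rather than written to disk, and the only aggregate is a sum. The names generate_runs, merge_and_aggregate, and memory_limit are hypothetical.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def generate_runs(rows, memory_limit):
    """Absorb (key, value) rows into an in-memory index keyed by group.

    Rows whose key is already in the index are aggregated in place and
    take no extra memory. When a new key arrives and the index already
    holds memory_limit groups, the index is spilled as one sorted,
    pre-aggregated run (a list here, a file on disk in a real system).
    """
    index = {}
    runs = []
    for key, value in rows:
        if key not in index and len(index) >= memory_limit:
            runs.append(sorted(index.items()))
            index = {}
        index[key] = index.get(key, 0) + value
    if index:
        runs.append(sorted(index.items()))
    return runs


def merge_and_aggregate(runs):
    """Merge sorted runs and reduce equal keys to one output row."""
    merged = heapq.merge(*runs, key=itemgetter(0))
    return [
        (key, sum(value for _, value in group))
        for key, group in groupby(merged, key=itemgetter(0))
    ]
```

When the number of groups fits in memory, as with the 4 output rows in the TPC-H example above, the index never spills and a single run already holds the final result, regardless of how many input rows were scanned.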

Vectorized Query Processing

JIT Query Compilation

Thank you! Let’s connect.
Kriti Kathuria
linkedin.com/in/kriti-kathuria/
twitter.com/kaykathuria
