Cube Store Design Decisions OSA Analytic


About This Presentation

Cube store design decisions


Slide Content

Cube Store Design Decisions
November 2, 2021

Pavel Tiunov (@paveltiunov87)
Co-founder & CTO @ Cube Dev
Cube.js and Cube Store author
github.com/cube-js/cube.js

User-facing analytics

User-facing analytics

Uber Eats eng.uber.com/restaurant-manager

How to build a customer-facing analytics data app

Pipelines

Pipelines: pros & cons
Virtually unlimited data size: ✅
Sub-second query latency: ✅
High query concurrency (1000+ QPS): ✅
Low ingestion latency: ✅
Low ingestion pipeline maintenance cost: ❌
Cheap data replays: ❌
Scalable join support: ✅

Time series databases

TSDBs: pros & cons
Virtually unlimited data size: ✅ / ❌
Sub-second query latency: ✅
High query concurrency (1000+ QPS): ✅ / ❌
Low ingestion latency: ✅
Low ingestion pipeline maintenance cost: ✅
Cheap data replays: ✅
Scalable join support: ❌

OLAP datastores

OLAP datastores: pros & cons
Virtually unlimited data size: ✅
Sub-second query latency: ✅
High query concurrency (1000+ QPS): ✅
Low ingestion latency: ✅
Low ingestion pipeline maintenance cost: ❌
Cheap data replays: ✅ / ❌
Scalable join support: ✅

Cloud data warehouses

Cloud DWHs: pros & cons
Virtually unlimited data size: ✅
Sub-second query latency: ❌
High query concurrency (1000+ QPS): ❌
Low ingestion latency: ❌
Low ingestion pipeline maintenance cost: ✅
Cheap data replays: ✅
Scalable join support: ✅

Solution 🦄
Enhance the Cloud DWH so it can be as performant as an OLAP store
Preserve the original Cloud DWH user experience

Solution 🦄
Virtually unlimited data size: ✅
Sub-second query latency: ❌ ⇒ ✅
High query concurrency (1000+ QPS): ❌ ⇒ ✅
Low ingestion latency: ❌ ⇒ ✅
Low ingestion pipeline maintenance cost: ✅
Cheap data replays: ✅
Scalable join support: ✅

Enter Cube Store

Requirements
Billions of rows as an input
Sub-second response time
High query concurrency
Input data always has a unique primary key
Sources are both batching and real-time streaming
Joins between data sources

Cube Store Architecture

Design decisions
High throughput Meta Store
Indexes are sorted copies
Auto partitioning
Parquet as a format
Distributed FS as a storage layer
Shared-nothing architecture
Collocated join indexes
Real-time in-memory chunks
Tools: Rust, Arrow, DataFusion, RocksDB

Design decisions matrix
Requirements (columns): billions of rows as an input · sub-second response time · high query concurrency · input data always has a unique primary key · sources are both batching and real-time streaming · joins between data sources
High throughput Meta Store ✅ ✅
Indexes are sorted copies ✅ ✅ ✅ ✅
Auto partitioning ✅ ✅ ✅ ✅
Parquet as a format ✅
Distributed FS as a storage layer ✅ ✅
Shared-nothing architecture ✅
Collocated join indexes ✅ ✅ ✅ ✅
Real-time in-memory chunks ✅ ✅ ✅

High Throughput Meta Store

High Throughput Meta Store
Most accessed data in Cube Store
Relatively small compared to the data stored, but can still be 10M+ rows in size
Strong read-after-write consistency
Heavily read-oriented, but with significant write concurrency
Should provide an RDBMS-like experience
RocksDB is a great tool for the problem

Meta Store tables: Schema, Table, Index, Partition, Chunk, Job, Source

Meta Store Access Flow

Meta Store key-value structure
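
The slides above describe the Meta Store as an RDBMS-like layer on top of RocksDB. Below is a minimal, hypothetical sketch of the general idea: a logical row (here a Partition record) is encoded into an ordered key-value pair, with a per-table prefix so that a range scan enumerates one logical table. The field layout and prefix values are assumptions for illustration, not Cube Store's actual encoding, and a BTreeMap stands in for RocksDB.

```rust
use std::collections::BTreeMap;

// Hypothetical meta-store row: a partition record.
struct Partition {
    id: u64,
    table_id: u64,
    min_row: i64,
    max_row: i64,
}

// Key = table-kind prefix + big-endian id, so keys of one logical table sort together.
fn encode_key(table_kind: u8, id: u64) -> Vec<u8> {
    let mut key = vec![table_kind];
    key.extend_from_slice(&id.to_be_bytes());
    key
}

// A real implementation would use a proper serialization format; a fixed-width
// layout keeps the sketch simple.
fn encode_value(p: &Partition) -> Vec<u8> {
    let mut v = Vec::new();
    v.extend_from_slice(&p.table_id.to_be_bytes());
    v.extend_from_slice(&p.min_row.to_be_bytes());
    v.extend_from_slice(&p.max_row.to_be_bytes());
    v
}

fn main() {
    const PARTITION_KIND: u8 = 4; // hypothetical prefix for Partition rows
    let mut store: BTreeMap<Vec<u8>, Vec<u8>> = BTreeMap::new(); // stands in for RocksDB

    let p = Partition { id: 42, table_id: 7, min_row: 0, max_row: 999_999 };
    store.insert(encode_key(PARTITION_KIND, p.id), encode_value(&p));

    // Range scan over the Partition prefix, analogous to an iterator seek.
    for (k, v) in store.range(vec![PARTITION_KIND]..vec![PARTITION_KIND + 1]) {
        println!("key = {:?}, value bytes = {}", k, v.len());
    }
}
```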

Indexes are sorted copies

Indexes are sorted copies
Sorting is the most efficient way to compress data in columnar data structures
Most optimal for filtering
An index per query is the way to go
Paying write amplification for the speed of reads is acceptable
Every algorithm can be merge-based: merge sort, merge join, merge aggregate
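
Because every index is a sorted copy, the query operators can be merge-based, as the slide notes. The sketch below shows a streaming merge aggregate (SUM grouped by key) over two already-sorted runs; the data and types are hypothetical, and this illustrates the general technique rather than Cube Store's actual operators.

```rust
// Merge-aggregate two already-sorted (key, value) runs in one streaming pass,
// without hashing or re-sorting.
fn merge_aggregate(a: &[(u32, i64)], b: &[(u32, i64)]) -> Vec<(u32, i64)> {
    let (mut i, mut j) = (0, 0);
    let mut out: Vec<(u32, i64)> = Vec::new();

    let push = |key: u32, val: i64, out: &mut Vec<(u32, i64)>| {
        match out.last_mut() {
            Some(last) if last.0 == key => last.1 += val, // same group: accumulate
            _ => out.push((key, val)),                    // new group
        }
    };

    while i < a.len() || j < b.len() {
        // Pick the smaller key, exactly like the merge step of merge sort.
        let take_a = j >= b.len() || (i < a.len() && a[i].0 <= b[j].0);
        if take_a {
            push(a[i].0, a[i].1, &mut out);
            i += 1;
        } else {
            push(b[j].0, b[j].1, &mut out);
            j += 1;
        }
    }
    out
}

fn main() {
    // Two sorted runs, e.g. from two partitions of the same index.
    let run1 = [(1, 10), (3, 5), (3, 7), (9, 1)];
    let run2 = [(1, 2), (4, 4), (9, 3)];
    // SUM(value) GROUP BY key over both runs:
    assert_eq!(merge_aggregate(&run1, &run2), vec![(1, 12), (3, 12), (4, 4), (9, 4)]);
    println!("{:?}", merge_aggregate(&run1, &run2));
}
```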

Indexes are used based on filters

Data compression by sorting

Data compression by sorting

Auto partitioning

Auto partitioning
Helps keep all partitions roughly the same size
Easy to implement since everything is sorted

Partition split at compaction
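
A rough sketch of the split idea: because rows are kept sorted by the index key, an oversized partition can be cut at the midpoint into two contiguous key ranges. The row representation and threshold are assumptions for illustration, not Cube Store's actual compaction code.

```rust
// A partition with sorted index keys; real rows would carry more columns.
struct Partition {
    min_key: i64,
    max_key: i64,
    rows: Vec<i64>,
}

// Split an oversized partition at the midpoint; the sort order guarantees
// that both halves cover contiguous, non-overlapping key ranges.
fn split_if_needed(p: Partition, max_rows: usize) -> Vec<Partition> {
    if p.rows.len() <= max_rows {
        return vec![p];
    }
    let mid = p.rows.len() / 2;
    let right_rows = p.rows[mid..].to_vec();
    let left_rows = p.rows[..mid].to_vec();
    vec![
        Partition { min_key: *left_rows.first().unwrap(), max_key: *left_rows.last().unwrap(), rows: left_rows },
        Partition { min_key: *right_rows.first().unwrap(), max_key: *right_rows.last().unwrap(), rows: right_rows },
    ]
}

fn main() {
    let p = Partition { min_key: 1, max_key: 10, rows: vec![1, 2, 3, 5, 7, 8, 9, 10] };
    for part in split_if_needed(p, 4) {
        println!("split part: {} rows, keys {}..={}", part.rows.len(), part.min_key, part.max_key);
    }
}
```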

Parquet as a format

Parquet as a format
De facto standard for storing columnar data
Implements the Partition Attributes Across (PAX) layout [1]
Allows leveraging min-max statistics for filter pushdown optimizations
[1] https://www.pdl.cmu.edu/PDL-FTP/Database/pax.pdf
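
To illustrate the min-max filter pushdown mentioned above, here is a small, self-contained sketch of range pruning. The statistics struct is hypothetical (not the parquet crate's API); it only shows how non-overlapping min/max ranges let whole row groups, or whole partitions, be skipped.

```rust
// Hypothetical min-max statistics for one Parquet row group.
struct RowGroupStats {
    min: i64,
    max: i64,
}

// The row group can be skipped for `column BETWEEN lo AND hi`
// when its min/max range does not overlap the filter range.
fn can_skip(stats: &RowGroupStats, lo: i64, hi: i64) -> bool {
    stats.max < lo || stats.min > hi
}

fn main() {
    let row_groups = vec![
        RowGroupStats { min: 0, max: 99 },
        RowGroupStats { min: 100, max: 199 },
        RowGroupStats { min: 200, max: 299 },
    ];

    // Filter: column BETWEEN 120 AND 180 — only the second row group survives.
    let (lo, hi) = (120, 180);
    for (i, rg) in row_groups.iter().enumerate() {
        if can_skip(rg, lo, hi) {
            println!("row group {i}: skipped ({}..={})", rg.min, rg.max);
        } else {
            println!("row group {i}: scanned ({}..={})", rg.min, rg.max);
        }
    }
}
```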

Parquet file structure

Distributed FS as a storage layer

Distributed FS as a storage layer
Storage-compute separation
Compute nodes can be disposable and interchangeable
Easy to build a replication implementation
Warm up everything that queries will access just before the table comes online, to avoid download wait time

Distributed FS as a storage layer

Partition Warmup
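
A minimal sketch of the warmup ordering described above: fetch every partition file a table needs from the distributed FS first, and only mark the table online afterwards, so queries never block on downloads. The file names and download function are placeholders, not Cube Store's actual code.

```rust
use std::{thread, time::Duration};

// Stand-in for fetching a Parquet file from the distributed FS to local disk.
fn download_partition(remote_path: &str) {
    println!("downloading {remote_path} ...");
    thread::sleep(Duration::from_millis(50));
}

fn main() {
    let partitions = ["t1/p1.parquet", "t1/p2.parquet", "t1/p3.parquet"];

    // Warmup: fetch everything the table's queries will touch.
    for p in &partitions {
        download_partition(p);
    }

    // Only after warmup does the table become visible to queries.
    let table_online = true;
    println!("table online: {table_online}");
}
```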

Real-time in-memory chunks

Real-time in-memory chunks
Act as a buffer to avoid single-row writes to Parquet
Streaming rows are sent to the partition owner worker
In-memory chunks are compacted at 1-minute intervals into a persisted Parquet partition
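
The sketch below illustrates the buffering idea with hypothetical types: streamed rows accumulate in an in-memory chunk and are flushed into a persisted partition once the chunk is old enough. A Vec stands in for the Parquet partition, and the one-minute interval is shortened so the example runs instantly.

```rust
use std::time::{Duration, Instant};

// In-memory chunk acting as a write buffer for streamed rows.
struct InMemoryChunk {
    created_at: Instant,
    rows: Vec<(u64, String)>, // (primary key, payload) — hypothetical row shape
}

impl InMemoryChunk {
    fn new() -> Self {
        InMemoryChunk { created_at: Instant::now(), rows: Vec::new() }
    }

    fn append(&mut self, row: (u64, String)) {
        self.rows.push(row);
    }

    fn should_compact(&self, max_age: Duration) -> bool {
        self.created_at.elapsed() >= max_age
    }
}

fn main() {
    let mut chunk = InMemoryChunk::new();
    let mut persisted: Vec<(u64, String)> = Vec::new(); // stands in for a Parquet partition

    chunk.append((2, "event-b".to_string()));
    chunk.append((1, "event-a".to_string()));

    // Cube Store compacts roughly every minute; zero makes the sketch flush immediately.
    if chunk.should_compact(Duration::from_secs(0)) {
        // Sort by primary key before persisting so the partition stays merge-friendly.
        chunk.rows.sort_by_key(|(pk, _)| *pk);
        persisted.append(&mut chunk.rows);
        chunk = InMemoryChunk::new();
    }

    println!("persisted rows: {}, buffered rows: {}", persisted.len(), chunk.rows.len());
}
```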

Routing for in-memory chunks

Shared-nothing architecture

Shared-nothing architecture
Worker nodes never communicate with each other
The router node receives only aggregated results and just needs to merge them
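
A small illustration of that query path, with made-up data: each worker aggregates only the partitions it owns, and the router merely merges the partial aggregates it receives, so workers never exchange rows.

```rust
use std::collections::BTreeMap;

type PartialAggregate = BTreeMap<String, i64>; // group key -> SUM(value)

// Each worker scans only the partitions it owns and aggregates locally.
fn worker_aggregate(rows: &[(&str, i64)]) -> PartialAggregate {
    let mut agg = PartialAggregate::new();
    for (key, value) in rows {
        *agg.entry(key.to_string()).or_insert(0) += value;
    }
    agg
}

// The router receives already-aggregated results and just merges them.
fn router_merge(partials: Vec<PartialAggregate>) -> PartialAggregate {
    let mut merged = PartialAggregate::new();
    for partial in partials {
        for (key, value) in partial {
            *merged.entry(key).or_insert(0) += value;
        }
    }
    merged
}

fn main() {
    let worker1 = worker_aggregate(&[("US", 10), ("DE", 3), ("US", 5)]);
    let worker2 = worker_aggregate(&[("DE", 7), ("FR", 2)]);

    let result = router_merge(vec![worker1, worker2]);
    println!("{:?}", result); // {"DE": 10, "FR": 2, "US": 15}
}
```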

Shared-nothing architecture

Lambda architecture query

Collocated join indexes

Collocated join indexes
The fastest distributed join algorithm
Can also be used by other algorithms, such as shared-nothing Top-K
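
The sketch below shows why collocated joins are cheap: when both indexes are partitioned and sorted by the same join key, each worker can merge-join its local slices without moving any rows between nodes. The table shapes and data are hypothetical.

```rust
// Merge join of two locally collocated, key-sorted index slices.
fn merge_join<'a>(
    left: &'a [(u32, &'a str)], // sorted by key: e.g. (user_id, name)
    right: &'a [(u32, i64)],    // sorted by key: e.g. (user_id, amount)
) -> Vec<(u32, &'a str, i64)> {
    let (mut i, mut j) = (0, 0);
    let mut out = Vec::new();
    while i < left.len() && j < right.len() {
        if left[i].0 < right[j].0 {
            i += 1;
        } else if left[i].0 > right[j].0 {
            j += 1;
        } else {
            // Emit all right-side matches for the current left key.
            let key = left[i].0;
            let mut k = j;
            while k < right.len() && right[k].0 == key {
                out.push((key, left[i].1, right[k].1));
                k += 1;
            }
            i += 1;
        }
    }
    out
}

fn main() {
    // One worker's local slices of both indexes, partitioned by the same key.
    let users = [(1, "alice"), (2, "bob"), (4, "carol")];
    let orders = [(1, 30), (2, 10), (2, 25), (3, 5)];
    for row in merge_join(&users, &orders) {
        println!("{:?}", row); // (1, "alice", 30), (2, "bob", 10), (2, "bob", 25)
    }
}
```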

Collocated join indexes

Collocated join indexes

Learn more about Cube & Cube Store
@paveltiunov87 · cube.dev
Thanks! 🙌