[Paper Reading] QAGen: Generating query-aware test databases
PingCAP-TiDB
18 slides
Nov 05, 2021
About This Presentation
Today, a common methodology for testing a database management system (DBMS) is to generate a set of test databases and then execute queries on top of them. However, for DBMS testing, it would be a big advantage if we can control the input and/or the output (e.g., the cardinality) of each individual operator of a test query for a particular test case. Unfortunately, current database generators generate databases independent of queries. As a result, it is hard to guarantee that executing the test query on the generated test databases can obtain the desired (intermediate) query results that match the test case. In this paper, we propose a novel way for DBMS testing. Instead of first generating a test database and then seeing how well it matches a particular test case (or otherwise use a trial-and-error approach to generate another test database), we propose to generate a query-aware database for each test case. To that end, we designed a query-aware test database generator called QAGen. In addition to the database schema and the set of basic constraints defined on the base tables, QAGen takes the query and the set of constraints defined on the query as input, and generates a query-aware test database as output. The generated database guarantees that the test query can get the desired (intermediate) query results as defined in the test case. This approach of testing facilitates a wide range of DBMS testing tasks such as testing of memory managers and testing the cardinality estimation components of query optimizers.
Size: 2.07 MB
Language: en
Slide Content
Query-Aware Database Generators
Background and Demands
How to simulate a client’s business?
Ideal circumstance: database schema + real data + SQL
Sometimes real data is not available
Alternative: database schema + SQL + (statistics?)
Mock a distribution/domain for each attribute:
pk -> uniform distribution
pk + fk -> integrity constraint
Zipfian
enum type (very low cardinality)
Some SQL is too complex to understand: TPC… SSB… heavy analytical tasks
We need a query-aware generator
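The mocking strategies above can be sketched as a few generator helpers. This is a minimal illustration, not any particular tool; all function names and parameters are assumptions:

```python
import random

def uniform_pk(n):
    """Primary key: unique values, effectively a uniform domain."""
    return list(range(1, n + 1))

def fk_column(n, pk_values):
    """Foreign key: drawn from the referenced pk, preserving integrity."""
    return [random.choice(pk_values) for _ in range(n)]

def zipfian(n, domain, s=1.5):
    """Zipf-like skew: weight of the k-th value proportional to 1/k^s."""
    weights = [1 / (rank ** s) for rank in range(1, len(domain) + 1)]
    return random.choices(domain, weights=weights, k=n)

def enum_column(n, values):
    """Very-low-cardinality attribute, e.g. a status flag."""
    return [random.choice(values) for _ in range(n)]

t1_pk = uniform_pk(1000)
t2_fk = fk_column(5000, t1_pk)            # every fk value exists in t1.pk
t2_status = enum_column(5000, ["A", "B", "C"])
t2_city = zipfian(5000, ["NY", "LA", "SF", "SEA"])
```

This per-column approach works as long as attributes are independent; the next slide shows where it breaks down.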
Hassles of Making a Generator
For a simple query, the solution is simple and direct:
select [cols...] from t1 join t2 on t1.pk = t2.fk
What if we need to simulate some correlation?
where a < ‘xx’ and b > ‘yy’
Or process a complex projection?
select case when cond1 then ***, case when cond2 …
And then the combination of the above circumstances…
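To see why correlated predicates are a hassle, consider a quick experiment (an illustrative sketch with assumed thresholds, not from either paper): with independently generated columns, the selectivity of `a < X AND b > Y` is roughly the product of the per-column selectivities, so hitting a specific joint cardinality requires generating `a` and `b` together, not column by column.

```python
import random

random.seed(0)
n = 100_000
a = [random.random() for _ in range(n)]   # column a, independent
b = [random.random() for _ in range(n)]   # column b, independent

sel_a = sum(x < 0.3 for x in a) / n                      # ~0.3
sel_b = sum(x > 0.6 for x in b) / n                      # ~0.4
sel_joint = sum(a[i] < 0.3 and b[i] > 0.6 for i in range(n)) / n

# Under independence, sel_joint ~= sel_a * sel_b (~0.12). A correlated
# workload could put it anywhere from 0 up to min(sel_a, sel_b), which
# a per-column generator cannot control.
```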
Two papers, two different methods:
QAGen
DGL
QAGen: Generating query-aware test databases
Symbolic Execution
Volcano-model iterator
Data are symbols instead of real values
Generate constraints during execution
Agenda of QAGen
1. Analyze the query and find the available “knobs” — a knob of an operator is the cardinality of its output (or the distribution, if the input is pre-grouped)
2. Do the symbolic execution (and collect the constraints on the data)
3. Instantiate the data — always expensive
Analyze the Query
Symbolic Execution - Selection
GetNext on the child, read tuple “t”
[Positive Tuple Annotation] if output has not reached cardinality c: for each symbol s in t, insert <s, p>; return t to the parent
[Negative Tuple Processing] otherwise: for each symbol s in t, insert <s, !p>
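The selection operator above can be sketched as a Volcano-style iterator over symbolic tuples. This is a simplified illustration of the idea, not QAGen’s implementation; class and field names are assumptions:

```python
class SymbolicSelect:
    def __init__(self, child, predicate_label, cardinality_knob):
        self.child = child            # iterator of symbolic tuples
        self.p = predicate_label      # predicate kept as an opaque label
        self.c = cardinality_knob     # knob: desired output cardinality
        self.emitted = 0
        self.constraints = []         # collected (symbol, constraint) pairs

    def get_next(self):
        for t in self.child:
            if self.emitted < self.c:
                # positive tuple: every symbol must satisfy p
                for s in t:
                    self.constraints.append((s, self.p))
                self.emitted += 1
                return t
            # negative tuple: must violate p, so it is filtered out
            for s in t:
                self.constraints.append((s, "!" + self.p))
        return None

op = SymbolicSelect(iter([("s1",), ("s2",), ("s3",)]), "a < 10", cardinality_knob=2)
out = []
while (t := op.get_next()) is not None:
    out.append(t)
```

With the knob set to 2, the first two tuples are annotated `<s, p>` and returned; the third gets `<s, !p>` and is dropped, so the output cardinality is guaranteed by construction.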
Symbolic Execution - Join
Generate the join distribution
GetNext on the (outer) child, read tuple “t”
[Positive Tuple Annotation] replace the fk symbol with the matching pk symbol
[Negative Tuple Processing] constrain the fk so it matches no pk
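The join step can be sketched the same way: tuples chosen to join (per the generated join distribution) have their fk symbol unified with an inner pk symbol, while the rest are constrained to match no pk. Again a hedged sketch, not QAGen’s code; the round-robin distribution and all names are assumptions:

```python
def symbolic_join(outer, inner_pks, join_count):
    """outer: symbolic tuples as dicts with an 'fk' symbol.
    join_count: cardinality knob for the join output."""
    constraints, output = [], []
    for i, t in enumerate(outer):
        if i < join_count:
            # positive tuple: unify the fk symbol with a real pk symbol,
            # picked here round-robin as a stand-in join distribution
            pk = inner_pks[i % len(inner_pks)]
            output.append(dict(t, fk=pk))
        else:
            # negative tuple: this fk must not match any inner pk
            constraints.append((t["fk"], "not in pk(inner)"))
    return output, constraints

outer = [{"fk": "f1"}, {"fk": "f2"}, {"fk": "f3"}]
joined, neg = symbolic_join(outer, ["p1", "p2"], join_count=2)
```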
Instantiating
A typical model-checking process
Low efficiency
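The instantiation phase assigns a concrete value to every symbol so that all collected constraints hold. The toy sketch below brute-forces a tiny domain to make the cost visible — the search space grows exponentially with the number of symbols, which is why real systems hand this to a constraint solver. Everything here is illustrative:

```python
from itertools import product

def instantiate(symbols, constraints, domain):
    """constraints: list of (symbol, predicate-on-value) pairs.
    Returns the first assignment satisfying every constraint, else None."""
    for values in product(domain, repeat=len(symbols)):
        binding = dict(zip(symbols, values))
        if all(pred(binding[s]) for s, pred in constraints):
            return binding
    return None

# Constraints as produced by the symbolic selection "a < 10" with knob c = 2:
binding = instantiate(
    ["s1", "s2", "s3"],
    [("s1", lambda v: v < 10),        # positive tuples pass the selection
     ("s2", lambda v: v < 10),
     ("s3", lambda v: not v < 10)],   # negative tuple must be filtered out
    domain=range(20),
)
```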
Do we need a state-of-the-art generator?
Pros:
without meddling, we can reliably get effective data
works for complex queries that are hard to understand
perfectly guarantees the constraints:
cardinality constraints
integrity constraints
Cons:
low efficiency
some cases still can’t be processed
How about introducing an intermediate DSL?
DGL: Flexible Database Generators
Another alternative
What if we already understand the workload pretty well? Then use a more flexible DSL instead of coding a generator for every workload.
DGL (Data Generation Language):
Scalars
Rows
Iterators
Distributions
Real SQL (query / persist)
Tables
Expressions / functions
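The DGL idea — composing small generator primitives (scalars, distributions, iterators) into table definitions — can be sketched in a host language like this. The syntax and names below are illustrative assumptions, not DGL’s actual language:

```python
import random

def uniform(lo, hi):
    """Scalar generator: uniform integer in [lo, hi]."""
    return lambda: random.randint(lo, hi)

def zipf_like(values, s=1.2):
    """Distribution generator: Zipf-like skew over a small domain."""
    weights = [1 / (r ** s) for r in range(1, len(values) + 1)]
    return lambda: random.choices(values, weights=weights)[0]

def table(n, **columns):
    """Row iterator materialized as a table: each column is a
    zero-argument generator, invoked once per row."""
    return [{name: gen() for name, gen in columns.items()} for _ in range(n)]

orders = table(
    1000,
    order_id=iter(range(1, 1001)).__next__,   # sequential pk
    amount=uniform(1, 500),
    region=zipf_like(["NA", "EU", "APAC"]),
)
```

Because the primitives compose, describing a new workload means combining a few expressions rather than writing a fresh generator from scratch.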