Leveraging AI to Simplify and Speed Up ETL Testing

RTTS 37,861 views 31 slides Jul 31, 2024
Slide 1
Slide 1 of 31
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31

About This Presentation

The data validation and ETL testing process is difficult and time-consuming without an automated ETL testing solution like QuerySurge.

Creating tests between source and target data stores requires:

- Strong SQL skills
- Lots of time

QuerySurge​’s new AI-powered technology is a generative arti...


Slide Content

Challenges

“79% of organizations
have more than 100 data sources
with 30% using
more than 1,000 sources.” - IDC
1.Driving Business Value from Data in the Face of Fragmentation and Complexity
Data Complexity

Data Growth
1.IDC High Data Growth and Modern Applications Drive New Storage Requirements in Digitally Transformed Enterprises
Data has been growing at an ever-increasing pace. Over the next five
years IDC predicts structured data will At a compounded annual
growth rate for 40.15% from 3,892 exabytes in 2021 to 21,406
exabytes in 2026.
1 exabyte = 1,000 petabyte
1 petabyte = 1,000 terabyte

“Lack of automation impacts the ability to scale data
management. ”
- IDC
“40% of organizations haven’t been able to automate any
process. ”
- IDC
“Poor data quality now costs organizations
an average of $12.9 million annually”
- Gartner
1.Driving Business Value from Data in the Face of Fragmentation and Complexity
2.How to Improve Your Data Quality

How can we solve these data
challenges?

“AI is the defining technology of our times.”
- Satya Nadella
Microsoft CEO

Artificial Intelligence and Language Models
Artificial Intelligence – Is the ability for computers to
understand their environment and to solve problems
rationally.
Language Models (LM) – Are AI systems built for Natural
Language Processing (NLP)

AI Expansion

How can Data Testers Leverage AI?

Test Design Strategy (before AI)
Skill LevelDesign EffortTest Strategy
Table-to-Table &
Column-to-Column Compares
Basic
create with
QuerySurge Wizard
Minimal
incredibly fast
Business Rules
Intermediate
create with SQL
Moderate
medium /fast
Row Count Compares
Basic
create with
QuerySurge Wizard
Minimal
incredibly fast
ETL Logic & Transformations
Intermediate/
Advanced
create with SQL
Moderate/
Difficult
time-consuming

Data Mappings
What is a Mapping Document?
•Also known as a Source-to-Target Map
•It’s THE critical element required to efficiently plan
data movement
•Defines the Extract, Transform, Load (ETL) process
It Captures:
✓Business rules
✓Data flow mapping
✓Data movement requirements
It Specifies:
▪Source input definition
▪Target/output details
▪Business & data transformation rules
▪Absolute & Optional data quality requirements
What is Data Mapping?
It’s the process of matching fields from one database
to another
Schema
Mapping Doc

Survey of Data Experts
Consensus:
•81% said their teams use formal mapping documents
•78% of those teams with mappings store them in Excel
(the next highest answer was a 3
rd
party vendor tool at 14%)
1
st
Question:
Does your team use formal mapping documents and if yes,
in what form do you store the information?
1
Poll conducted by RTTS on
targeted LinkedIn groups
We conducted a survey
1
on Linkedin of 380+ Data Analysts, Data Architects &
ETL Developers and asked 2 questions about mappings:
2
nd
question:
Approximately how many mappings are in your typical Data
Warehouse project?
Consensus:
•68% have more than 250 mappings
•14% have between 100 and 250 mappings

Typical Data Movement Process
(without automated solution)
Roles Tasks
Timeline
Data Analyst
Data Architect
ETL Developer
Data/ETL Tester
Models and
builds target
data store
Reviews
Mapping
Document
Maintains
Target data
Store
Creates 2
tests: 1 for
each mapping
w/SQL editor
Reviews
Mapping
Document
Dumps
results of
tests to 2
Excel files
Compares
Excel files
by eye
Executes
tests
Determines
Requirements
Creates & maintains
Mapping Document
iterate
iterate
Reviews
Mapping
Document
Extracts & Loads data or
Extracts, Transforms &
Loads data into new store
Builds data
movement
logic
where
QuerySurge

typically fits

Integrating AI with QuerySurge
Test Process without automated data validation & ETL testing solution
Test Process with QuerySurge + QuerySurge AI
Test process with automated ETL testing solution (but not QuerySurge AI)
Skilled resource creates
2 SQL tests
for each mapping
(only db to db compares)
Dump results
to
files
Compare
data manually
&
Summaries Results
Execute
tests
QuerySurge compares results
up to 1,000x faster
&
provides detailed reports
Execute
tests
Skilled resource creates
2 SQL tests
for each mapping
(200+ data stores: db, json,
csv, Excel, xml, other)
QuerySurge compares results
up to 1,000x faster
&
provides detailed reports
Execute
tests
Any resource converts
all mappings
in minutes
Time Sinkhole Time Sinkhole
Time Sinkhole Immense Time Savings
Immense Time Savings Immense Time Savings

About QuerySurge

The Executive Office and Critical Data
C-level executives are using BI & Analytics to
make critical business decisions with the
assumption that the underlying data is fine.
We know it is not

How QuerySurge
Works
QuerySurge connects
to any 2 points
at one time
Compares every data set to
find anomalies
Source Data Target Data
•Databases
•Data Warehouses
•Flat Files & Excel
•JSON files
•Web Services
•Mainframe files
•Data Warehouses
•Big Data Stores
(Hadoop & NoSQL)
•Business Intelligence
reports
Source Data
…and provides
Data Validation Analytics
Target Data

•Converts hundreds to thousands of data mappings
into tests in minutes
•Creates tests in the data store’s native SQL with little
to no human intervention
•Requires minimal technical skillset

Integrating AI with QuerySurge

2 Use Cases
1)Initial mapping upload:
Bulk upload of 100’s of mappings
at once
2) Incremental mapping uploads:
One or more mappings at a time

Mapping Uploads: Initial & Incremental
Initial Mapping Document
(100s of source-to-target data requirements)
QuerySurge QueryPairs
(SQL tests pulling data from source & target data stores for comparison)
Incremental Mapping Upload
(single source-to-target data requirement) QuerySurge AI
Pulls mappings into AI
model, converts to SQL and
injects into QuerySurge

QuerySurge Survey & ROI
Metrics:
•A data mapping of average complexity ≈ 1 hour of time to create
data tests
•To create data tests for 1,000 mappings ≈ 1,000 people/hours
•1,000 hours * $100/hour consulting rate = $100,000
•QuerySurge AI can translate 200 mappings per hour or
1,000 mappings in 5 hours
•5 hours * $100/hour = $500
Survey of 530 QuerySurge prospects and customers:
•20% have > 1,000 mappings
•10% have > 5,000 mappings
QuerySurge AI = Huge ROITherefore,

Demo

Future

Current AI Concerns
https://www.intel.com/content/www/us/en/newsroom/news/10-per-cent-orgs-launched-genai-solutions-2023.html

Compliance & Security
Network
Network
Cloud Hosted AI
Internally Hosted AI
QuerySurge
TM
AIOther AI Platforms

Reliability/Accuracy
QuerySurge’s fine tuned model was over 30% more accurate generating
SQL from mapping requirements then GPT-4
31.34%
38.63%
68.82%
0.00%
20.00%
40.00%
60.00%
80.00%
100.00%
GPT-3.5 GPT-4 QuerySurge AI
Execution Accuracy (%)
Model
SQL Generation Accuracy w/ Augmented Dataset

Cost
$51.00
$0.43
$0.00
$10.00
$20.00
$30.00
$40.00
$50.00
$60.00
GPT 4.0 QuerySurge AI
Cost Per 1,000 Mappings
QuerySurge
TM
AI has been optimized for both accuracy and size (cost). When compared to
GPT-4 QuerySurge
TM
AI is 30% more accurate and costs 1/100
th
of the price.

QuerySurge
TM
Trial
(1)Trial in the Cloud of QuerySurge
TM
, including
self-learning tutorial that works with sample data for 3 days
(2) Downloaded Trial of QuerySurge
TM
, including
self-learning tutorial with sample data or your data for 15 days
(3) Proof of Concept of QuerySurge
TM
for your clients for 45 days
•Includes all QuerySurge add-ons:
- Import/Export, Ready for Analytics, BI Tester,
QuerySurge DevOps for Data
•As-needed assistance from QuerySurge team
TRIAL
IN THE CLOUD
Proof
of
Concept
Available at: https://www.querysurge.com/trial

Free Self-Paced Training and Certifications
Learn. Earn. Inform.
Learn QuerySurge.
Earn your digital badge.
Inform your social network of your new skills.
Use Code: QS-TG-2024
Available until August 31
st
2024
Available at:
https://www.querysurge.com/cert

Q&A