[DSC DACH 24] Automatic ETL Migration - on-prem to cloud and more - Miljenko Vukovic

DataScienceConferenc1 79 views 31 slides Sep 21, 2024
Slide 1
Slide 1 of 31
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31

About This Presentation

Nowadays, Business Intelligence is a mature technology. Technological migrations are more frequent than one might imagine. Regardless if we are moving to cloud or just reengineering old integration jobs, rewriting ETL from scratch in a new technology might be looming over our heads. As one might ima...


Slide Content

Automatic ETL Migration
On-prem to Cloud and More
DSC Dach, Vienna
September2024

Speaker
2
Miljenko Vuković
VIS Solutions
Founder, Data Engineer
[email protected]
VIS Solutions
Zagrebačkiinovacijskicentar
AvenijaDubrovnik 15 (Paviljon12)
10000 Zagreb, Croatia
Web page: https://vis-solutions.eu/
Phone number: +385 1 353 55 71
Mail address: [email protected]

AGENDA
3
•Introduction
•Data Architectures
•Integration Technologies
•ETL Migration –When & Why
•ETL Migration –Recommended Project Steps
•ETL Migration –Many, Many Use Cases
•ETL Migration –Benefits & Minuses
•Conclusion

INTRODUCTION
Analytical Databases
4
Data
Warehouse
/ Data Lake
Data Engineering
Data Science
Analytics
Reporting
ERP
CRM
HR
….

INTRODUCTION
Reporting & Analytics
5
1.Reports& Dashboards
2.Ad-hoc Analytics
3.Data Science
•Machine Learning
•Statistical Methods

INTRODUCTION
Data Integration
6
ETL
•Extract, Transform, Load
ELT
•Extract, Load, Transform
Streaming
File Ingestion
… and many more

DATA
Architectures
7
Data Warehouse
•Among the oldest and most well established data architectures
Operational Data Store
Data Lake
•Why not simply dump all your data –structured and not –into a central place?
Data Lakehouse
Modern Data Architecture
•Cloud Data Warehouse
Lambda, Kappa, Data Mesh, Architecture for IoT and many more

CORPORATE ARCHITECTURE
Example
8
Data
Warehouse
Data Science
Analytics
Reporting
ERP
CRM
HR
IoT
(Big Data)
DataLake
Streaming

INTEGRATION
Technologies
9
Stored Procedures
•Early DWH projects often used this solution, especially with source and target
on the same database technology
Blue Chip (and other) Vendors’ ETL Tools
•MS SSIS, Oracle Data Integrator (ODI), IBM DataStage, Informatica…
On Cloud
•Exporting data in files (csv, xml, json…), loading it into cloud stage area,
executing transformation in tools like dbtor writing stored procedures
Cloud Data
Warehouse
Source
System(s)
Cloud
StagingArea

ETL Migration
When & Why?
10
•Moving to cloud (or back ☺)
•Existing technology is outdated / deprecated
•Reengineering estimated as long term cheaper than maintaining old
code
•Company politics (e.g. changing primary vendor)
•… and many, many more

ETL Migration
Recommended steps
11
1.Migrate existing jobs to new technology
as-is
•New job has to generate exactly the same rows
and columns as the old one
2.Technical refactoring
•Remove unnecessary steps, simplify
•Performance optimization
3.Functional refactoring
•Include business / BA to update logic if
necessary
1.AS IS
2. Technical
3. Functional

As-is ETL Migration
Automatic vs. Manual
12
1.Manual
•Basically, opening integration job in the old
technology and rewriting it into the new
technology
2.Automatic
•Technologies, algorithms and procedures for
automatic parsing of original job and generating
the new one

Automatic ETL Migration
Principles
13
1.Direct migration from one technology to another
•Reading file export of integration routine or integration
tool’s repository
•Generating script for the new job, or generating new job
directly in the target tool repository
2.Using middle layer (meta definition of job)
•Source parsing uses middle layer as target
•Target generation uses middle layer as source
Source
Technology
Target
Technology
Source
Technology
Target
Technology
Meta
Format

Automatic ETL Migration
MiddleLayerConcept
14

Automatic ETL Migration
Excel as Middle Layer
15
Using Excel as middle layer solves (integration routine’s) documentation in one
step
* There is also „where” part as well as part with integration routine’s variables
„Select” part
„From” part

Automatic ETL Migration
IBM DataStage as source
16
Example of using file export as conversion source
Export to
XML Select file,
click Start

Automatic ETL Migration
Cloud as Target
17
Example of Cloud Data Architecture
17
Cloud Data
Warehouse
Extract
(csv, json, xml…)
Data Science
Analytics
Reporting
Source
System(s)
Cloud
StagingArea
Load Transform

Automatic ETL Migration
Snowflake as target
18
Automation of ETL migration to Snowflakeusingstoredproceduresin
„Transform” step
Select file,
Click Start

Automatic ETL Migration
Stored Procedures as Target
19
Using templates, stored procedures can be easily generated for any target
technology
Select file,
Click Start

Automatic ETL Migration
Development from scratch
20
•Typically, developers will investigate data relations
using SQL
•That will quite often result in a draft of integration
logic in SQL, regardless of target technology
•Only then they will develop integration job in the
target technology, trying to get the same results as
SQL in the previous bullet
•The million dollars question –can the last step be
automated? ??????

Automatic ETL Migration
SQL as source
21
Use SQL Parser (select file, click) to
generate Excel
Use Excel to generate target technology
integration routines (select file, click)

Automatic ETL Migration
Stored Procedures as Source
22
•Extension of previous case since stored procedures
typically contain main logic in SQL
•Even if there are multiple SQL statements, stored
procedure can still be migrated as series of
integration jobs in target technology
•Intermediate data will be stored in tmptables

Automatic ETL Migration
ODI as Target
23
Example of generating target job (mappingin ODI terminology) directly
in repository
Select file,
Click Start

Automatic ETL Migration
ODI as Source
24
This time, since ODI is basically SQL generator,we use generated
SQL as a source
Select file,
Click Start

Automatic ETL Migration
Staging Area
25
Yes, the routines for loading staging area tables (copy
of source tables) can also be automatically generated,
regardless if:
•Staging area tables are loaded using python scripts
•Staging area tables are loaded using ETL tool itself
•Staging area tables are loaded using stored
procedures

Automatic ETL Migration
Benefits
26
•Smaller time (i.e. money) consumption per integration routine
•Productivity boost
•Standardization and best practice enforcement
•Self-documenting process

Automatic ETL Migration
Minuses
27
•Initial investment
•Time required to develop a solution or
select / implement existing one
•Even existing solutions probably will not be
able to process 100% of jobs!
•Technical and functional refactoring can
not be done automatically (yet ☺)

CONCLUSION
28
•If you have a small number of integration
routines (1 –2 digits), migrate ETL
manually
•If you have a large number of integration
routines (hundreds or more), try any
automation you can

WIN A PRIZE !
Trip to Vis Island for Two
29
Visit us at our stand and apply!

QUESTIONS?
…and Discussion
30
??????

SELECTED
Literature
31
Fundamentals of Data Engineering –Plan and Build Robust Data Systems
Joe Reis & Matt Housley, 2022 O’Reilly Media
The Data Warehouse Toolkit –The Definitive Guide to Dimensional Modelling
Ralph Kimball & MargyRoss, 2013 (3rd Edition), John Wiley & Sons
https://www.vis-solutions.eu/products/epic-ingenioso
Tags