A brief history of data warehousing

RobWinters1 231 views 29 slides Sep 21, 2021

About This Presentation

A few months back I spoke with some graduate students about "what is data warehousing". In this talk I covered the past, present, and probable future of data warehousing and how it can add value to a company.


Slide Content

All About Data Warehousing
University of Westminster, 26 May 2021

Agenda
01 The Goal: Why we create a warehouse
02 The Past: Warehouses past and challenges
03 The Present: How we approach it today
04 An Example: The warehouse at TripActions

About Me
16 years in the data space
Worked in a number of industries: Telco, Gaming, Retail, Travel
Currently leading all data at TripActions: Business Intelligence, Data Science, Data Engineering, Data Warehousing

01 The Goal: Why we warehouse

Why we need a warehouse
Traditional data systems have a number of problems:
- Built to match a technical (write, single-record-read oriented) model
- Match a single mechanical system, not the full “world” of a company
- 99% of real-world data is terrible to work with: incorrect, inconsistent, unclear

Definition of a data warehouse
A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data to support analysis and reporting.

A Warehouse
Subject Oriented: The data model matches the subjects of the business, not the systems
Time Variant: The data model tracks and reflects history over time in a way that matches business thinking
Integrated: Data is combined to create consistency of granularity and terminology
Non-Volatile: Records do not change unless there is a legally required reason to do so
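
The "time variant" and "non-volatile" properties above can be sketched as a versioned history: a change never overwrites the old value, it closes the current version and appends a new one. This is a minimal illustration only; the `CustomerVersion` schema and the helper names `record_change` and `city_as_of` are assumptions made for this example, not something taken from the talk.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)
class CustomerVersion:
    """One immutable version of a customer record (hypothetical schema)."""
    customer_id: int
    city: str
    valid_from: date
    valid_to: Optional[date] = None  # None means "still current"


def record_change(history: List[CustomerVersion], customer_id: int,
                  city: str, changed_on: date) -> List[CustomerVersion]:
    """Return a new history: the open version is closed, a new one is appended."""
    updated = []
    for version in history:
        if version.customer_id == customer_id and version.valid_to is None:
            # Close the open version instead of overwriting its city.
            version = CustomerVersion(version.customer_id, version.city,
                                      version.valid_from, changed_on)
        updated.append(version)
    updated.append(CustomerVersion(customer_id, city, changed_on))
    return updated


def city_as_of(history: List[CustomerVersion], customer_id: int,
               as_of: date) -> Optional[str]:
    """Answer "where was this customer on a given day?" from the history."""
    for version in history:
        if (version.customer_id == customer_id
                and version.valid_from <= as_of
                and (version.valid_to is None or as_of < version.valid_to)):
            return version.city
    return None


history = record_change([], 1, "London", date(2020, 1, 1))
history = record_change(history, 1, "Amsterdam", date(2021, 3, 1))
```

With this history, `city_as_of(history, 1, date(2020, 6, 1))` still answers "London" after the move, which is the point of keeping history rather than updating in place.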

Benefits of a Warehouse
Consistency: Data is always reflected in the same way, over time and across domains
Durability: Data resists change and changing systems
Accessibility: Data becomes a resource usable beyond the original creator, fast

02 The Past: A brief history of data warehousing

The Daddies of DWH
Bill Inmon: Invented the data warehouse
Ralph Kimball: Made the data warehouse business-oriented rather than technically oriented

What we worked with

Why it broke: Volume

Why it broke: Velocity

Why it broke: Accessibility

03 The Present: How the world is today

What’s old is New: Back to SQL

The Data Lakehouse

DWH velocity matches Engineering
As fast as the business and the systems change, the data warehouse concept evolves

Data Warehouse becomes part of production

Working Differently
ELT not ETL: Data is preserved raw and changes are made in the data warehouse to keep flexibility
Design on demand: The data model is created as the business needs it, to the scope required for today
Software mindset: Using dev best practices (CI/CD, TDD) to deliver fast, reliable changes
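
As a rough sketch of "ELT not ETL": the raw rows are loaded untouched, and the cleanup lives in a SQL view inside the warehouse, so the transformation can be changed later without reloading anything. SQLite stands in for the warehouse here, and the table name `raw_bookings`, the view name `bookings`, and the sample rows are all made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for a real warehouse

# Extract + Load: keep source values exactly as they arrived (all text).
conn.execute("CREATE TABLE raw_bookings (id TEXT, amount TEXT, currency TEXT)")
conn.executemany(
    "INSERT INTO raw_bookings VALUES (?, ?, ?)",
    [("1", "120.50", "eur"), ("2", "80", "USD")],
)

# Transform: done afterwards, in SQL, on top of the raw table.
conn.execute("""
    CREATE VIEW bookings AS
    SELECT
        CAST(id AS INTEGER)  AS booking_id,
        CAST(amount AS REAL) AS amount,
        UPPER(currency)      AS currency
    FROM raw_bookings
""")

rows = conn.execute(
    "SELECT booking_id, amount, currency FROM bookings ORDER BY booking_id"
).fetchall()
raw_first = conn.execute(
    "SELECT currency FROM raw_bookings WHERE id = '1'"
).fetchone()
```

The cleaned view returns typed, normalized values, while `raw_first` still holds the original lowercase `eur`, so a different cleaning rule can be applied tomorrow against the same raw data.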

04 A Real Warehouse: What this looks like in the real world

Storage Architecture
Data Lake: Persistent, raw, unedited data for analytics (30+ TB)
Integration: Kimball modelled data warehouse to combine data and normalize it
Marts: Denormalized marts for use by end users, dashboards, and machine learning
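
The three storage layers can be illustrated in miniature as follows. This is a toy sketch under stated assumptions, not the actual TripActions schema: SQLite stands in for the warehouse, and every table name (`lake_bookings`, `dim_customer`, `fact_booking`, `mart_revenue_by_customer`) and every row is invented for the example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    -- Data lake layer: raw, unedited rows as they arrived.
    CREATE TABLE lake_bookings (customer_name TEXT, country TEXT, amount TEXT);
    INSERT INTO lake_bookings VALUES
        ('Acme', 'NL', '100'),
        ('Acme', 'NL', '50'),
        ('Globex', 'UK', '70');

    -- Integration layer: a small Kimball-style star (one dimension, one fact).
    CREATE TABLE dim_customer (
        customer_key  INTEGER PRIMARY KEY,
        customer_name TEXT UNIQUE,
        country       TEXT
    );
    INSERT INTO dim_customer (customer_name, country)
        SELECT DISTINCT customer_name, country
        FROM lake_bookings
        ORDER BY customer_name;

    CREATE TABLE fact_booking (
        customer_key INTEGER REFERENCES dim_customer (customer_key),
        amount       REAL
    );
    INSERT INTO fact_booking
        SELECT d.customer_key, CAST(l.amount AS REAL)
        FROM lake_bookings AS l
        JOIN dim_customer AS d ON d.customer_name = l.customer_name;

    -- Mart layer: one wide, denormalized table for dashboards.
    CREATE TABLE mart_revenue_by_customer AS
        SELECT d.customer_name, d.country, SUM(f.amount) AS revenue
        FROM fact_booking AS f
        JOIN dim_customer AS d USING (customer_key)
        GROUP BY d.customer_name, d.country;
""")

mart = conn.execute(
    "SELECT customer_name, country, revenue "
    "FROM mart_revenue_by_customer ORDER BY customer_name"
).fetchall()
```

Each layer is derived from the one before it, so the mart can be dropped and rebuilt at any time as long as the lake table is preserved.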

Data flows in and out

Data Warehouses are built in code

Workflows like an engineer

Documentation and TDD

Documentation and TDD
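
One way test-driven development can look for a warehouse table is to write each expectation as a query that must return zero rows, before the model is built. The sketch below is an assumption about the general technique, not the tooling shown in the talk; the helper `failing_rows`, the table `dim_customer`, and its rows are hypothetical, and a duplicate key is inserted on purpose so one check fails.

```python
import sqlite3


def failing_rows(conn: sqlite3.Connection, check_sql: str) -> list:
    """Run a check query; any returned row counts as a test failure."""
    return conn.execute(check_sql).fetchall()


conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE dim_customer (customer_key INTEGER, customer_name TEXT)")
conn.executemany(
    "INSERT INTO dim_customer VALUES (?, ?)",
    [(1, "Acme"), (2, "Globex"), (2, "Initech")],  # key 2 is duplicated on purpose
)

# Expectations are written first, as queries that should return nothing.
checks = {
    "customer_key is unique":
        "SELECT customer_key FROM dim_customer "
        "GROUP BY customer_key HAVING COUNT(*) > 1",
    "customer_name is not null":
        "SELECT * FROM dim_customer WHERE customer_name IS NULL",
}

results = {name: failing_rows(conn, sql) for name, sql in checks.items()}
failed = [name for name, rows in results.items() if rows]
```

Here `failed` names the uniqueness check, and `results` holds the offending key, which is the kind of signal a CI run would use to block a change.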

Maximum accessibility
Dashboards: >80% of the company uses the data warehouse through dashboards
Raw SQL: >80 end users write raw SQL to explore the data
Self-Service BI: >200 end users explore data in the data warehouse via drag-and-drop tooling

Thanks
Do you have any questions?
[email protected]