A few months back I spoke with some graduate students about "what is data warehousing". In this talk I covered the past, present, and probable future of data warehousing and how it can add value to a company.
Size: 9.02 MB
Language: en
Added: Sep 21, 2021
Slides: 29 pages
Slide Content
All About Data Warehousing. University of Westminster, 26 May 2021
Agenda: 01 The Goal: Why we create a warehouse. 02 The Past: Warehouses past and challenges. 03 The Present: How we approach it today. 04 An Example: The warehouse at TripActions.
About Me: 16 years in the data space. Worked in a number of industries: Telco, Gaming, Retail, Travel. Currently leading all data at TripActions: Business Intelligence, Data Science, Data Engineering, Data Warehousing.
The Goal 01 Why we warehouse
Why we need a warehouse: Traditional data systems have a number of problems: They are built to match a technical (write, single-record-read oriented) model. They match a single mechanical system, not the full “world” of a company. 99% of real world data is terrible to work with: Incorrect, Inconsistent, Unclear.
Definition of a data warehouse: A data warehouse is a subject-oriented, integrated, non-volatile, time-variant collection of data to support analysis and reporting.
A Warehouse is: Subject Oriented: The data model matches the subjects of the business, not the systems. Time Variant: The data model tracks and reflects history of time in a way that matches business thinking. Integrated: Data is combined to create consistency of granularity and terminology. Non-Volatile: Records do not change unless they have a legally required reason to do so.
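The "Time Variant" and "Non-Volatile" properties above can be illustrated with a small append-only history. This is only a minimal sketch, not something shown in the talk: the `CustomerVersion` record, the `record_change` and `city_as_of` helpers, and the sample customer are all hypothetical names chosen for illustration.

```python
from dataclasses import dataclass
from datetime import date
from typing import List, Optional


@dataclass(frozen=True)  # frozen: a stored version can never be edited in place
class CustomerVersion:
    customer_id: int
    city: str
    valid_from: date  # the date from which this version is true


def record_change(history: List[CustomerVersion], customer_id: int,
                  city: str, valid_from: date) -> List[CustomerVersion]:
    """Non-volatile: a change is appended as a new version, nothing is overwritten."""
    return history + [CustomerVersion(customer_id, city, valid_from)]


def city_as_of(history: List[CustomerVersion], customer_id: int,
               as_of: date) -> Optional[str]:
    """Time variant: answer the question as the business saw it on a given date."""
    candidates = [row for row in history
                  if row.customer_id == customer_id and row.valid_from <= as_of]
    if not candidates:
        return None
    # The latest version that had already started on the as_of date wins.
    return max(candidates, key=lambda row: row.valid_from).city


history: List[CustomerVersion] = []
history = record_change(history, 1, "London", date(2020, 1, 1))
history = record_change(history, 1, "Berlin", date(2021, 5, 26))

print(city_as_of(history, 1, date(2020, 6, 1)))
print(city_as_of(history, 1, date(2021, 6, 1)))
```

Because old versions are kept, a report for mid-2020 and a report for mid-2021 can both be reproduced from the same data.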
Benefits of a Warehouse: Consistency: Data is always reflected in the same way, over time and across domains. Durability: Data resists change and changing systems. Accessibility: Data becomes a resource usable beyond the original creator, fast.
The Past 02 A brief history of data warehousing
The Daddies of DWH: Bill Inmon: Invented the data warehouse. Ralph Kimball: Made the data warehouse business oriented, not technically oriented.
What we worked with
why it broke: Volume
why it broke: Velocity
why it broke: Accessibility
The Present 03 How the world is today
What’s old is New: Back to SQL
The Data Lakehouse
DWH velocity matches Engineering: As fast as the business and the systems change, the data warehouse concept evolves.
Data Warehouse becomes part of production
Working Differently: ELT not ETL: Data is preserved raw and changes are made in the data warehouse to keep flexibility. Design on demand: The data model is created as the business needs it, to the scope required for today. Software mindset: Using dev best practices (CI/CD, TDD) to deliver fast, reliable changes.
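The "ELT not ETL" point above can be sketched with an in-memory SQLite database standing in for the warehouse. This is an illustrative assumption, not the tooling used in the talk: the `raw_bookings` table, the `clean_bookings` view, and the sample rows are hypothetical.

```python
import sqlite3

# In-memory SQLite stands in for the warehouse (an assumption for illustration).
conn = sqlite3.connect(":memory:")

# Extract + Load: land the source data exactly as it arrived, messy values included.
conn.execute(
    "CREATE TABLE raw_bookings (booking_id TEXT, amount_text TEXT, currency TEXT)"
)
raw_rows = [("b1", " 120.50", "usd"), ("b2", "80", "USD"), ("b3", "15.25 ", "Usd")]
conn.executemany("INSERT INTO raw_bookings VALUES (?, ?, ?)", raw_rows)

# Transform: cleaning happens afterwards, inside the warehouse, as SQL.
# A view keeps the raw table untouched, so the logic can be changed later.
conn.execute(
    """
    CREATE VIEW clean_bookings AS
    SELECT booking_id,
           CAST(TRIM(amount_text) AS REAL) AS amount,
           UPPER(currency)                 AS currency
    FROM raw_bookings
    """
)

clean = conn.execute(
    "SELECT booking_id, amount, currency FROM clean_bookings ORDER BY booking_id"
).fetchall()
raw_after = conn.execute(
    "SELECT booking_id, amount_text, currency FROM raw_bookings ORDER BY booking_id"
).fetchall()

print(clean)
```

Since the raw rows are never modified, a different cleaning rule tomorrow only means redefining the view, not reloading the source data.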
A Real Warehouse 04 What this looks like in the real world
Storage Architecture: Data Lake: Persistent, raw, unedited data for analytics (30+ TB). Integration: Kimball modelled data warehouse to combine data and normalize it. Marts: Denormalized marts for use by end users, dashboards, and machine learning.
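The three storage layers above can be sketched end to end in a few statements. This is a toy sketch under stated assumptions, not the TripActions implementation: SQLite stands in for the warehouse, and every table name (`lake_bookings`, `lake_crm_customers`, `dim_customer`, `fact_booking`, `mart_bookings`) and every row is hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")  # stand-in for the warehouse (assumption)

conn.executescript(
    """
    -- Data Lake: raw, unedited copies of two hypothetical source systems.
    CREATE TABLE lake_bookings (booking_id TEXT, customer_id TEXT, amount REAL);
    CREATE TABLE lake_crm_customers (customer_id TEXT, name TEXT, country TEXT);
    INSERT INTO lake_bookings VALUES
        ('b1', 'c1', 100.0), ('b2', 'c1', 50.0), ('b3', 'c2', 70.0);
    INSERT INTO lake_crm_customers VALUES
        ('c1', 'Acme', 'UK'), ('c2', 'Globex', 'US');

    -- Integration: a Kimball-style star schema combining both sources.
    CREATE TABLE dim_customer (
        customer_key INTEGER PRIMARY KEY,  -- surrogate key owned by the warehouse
        customer_id  TEXT,
        name         TEXT,
        country      TEXT
    );
    INSERT INTO dim_customer (customer_id, name, country)
        SELECT customer_id, name, country FROM lake_crm_customers ORDER BY customer_id;

    CREATE TABLE fact_booking (booking_id TEXT, customer_key INTEGER, amount REAL);
    INSERT INTO fact_booking
        SELECT b.booking_id, d.customer_key, b.amount
        FROM lake_bookings b
        JOIN dim_customer d ON d.customer_id = b.customer_id;

    -- Mart: one wide, denormalized table that end users can query without joins.
    CREATE TABLE mart_bookings AS
        SELECT f.booking_id, d.name AS customer_name, d.country, f.amount
        FROM fact_booking f
        JOIN dim_customer d ON d.customer_key = f.customer_key;
    """
)

mart = conn.execute(
    "SELECT booking_id, customer_name, country, amount "
    "FROM mart_bookings ORDER BY booking_id"
).fetchall()
dim_count = conn.execute("SELECT COUNT(*) FROM dim_customer").fetchone()[0]

print(mart)
```

The lake tables stay as loaded, the star schema holds each customer once, and the mart repeats customer attributes on every booking row so dashboards and SQL users need no joins.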
Data flows in and out
Data Warehouses are built in code
Workflows like an engineer
Documentation and TDD
Documentation and TDD
Maximum accessibility: Dashboards: >80% of the company uses the data warehouse through dashboards. Raw SQL: >80 end users write raw SQL to explore the data. Self-Service BI: >200 end users explore data in the data warehouse via drag-and-drop tooling.