Data Warehousing - in the real world

ukc4 6,494 views 34 slides Nov 26, 2015
Slide 1
Slide 1 of 34
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34

About This Presentation

Data Warehouse, Business Intelligence, Big Data


Slide Content

Data Warehousing - in the real world - Dr. Thomas Zurek @tfxz November 2015 Big Data und Analytische Applikationen

Who am I ? Vice President of Development @ SAP for HANA Data Warehousing (DW) Enterprise Performance Management (EPM) HANA Analytics 18 years at SAP PhD in Computer Science Universities of Karlsruhe and Edinburgh 2 November 2015 Real-World Data Warehouses / Thomas Zurek

Agenda Examples Business Intelligence (BI) + Data Warehouses (DW) Data Warehouses Layered Scalable Architecture (LSA) Big Data + Data Warehousing Summary 3 November 2015 Real-World Data Warehouses / Thomas Zurek

Examples 4 November 2015 Real-World Data Warehouses / Thomas Zurek

Examples of Business Intelligence Scenarios fraud detection retail company point-of-sales data & given discounts huge amounts of data a prototypical BI question long tail analysis e-commerce companies like Amazon, Ebay, iTunes, Netflix, … translate sales of popular products into (additional) sales in the long tail BI integrated into operational processes November 2015 Real-World Data Warehouses / Thomas Zurek 5

Long Tail Analysis (1) – An Amazon Example November 2015 Real-World Data Warehouses / Thomas Zurek 6

Long Tail Analysis (2) Source: Chris Anderson, The Long Tail , Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html November 2015 Real-World Data Warehouses / Thomas Zurek 7

Long Tail Analysis (3) Source: Chris Anderson, The Long Tail , Wired, October 2004, http://www.wired.com/wired/archive/12.10/tail.html November 2015 Real-World Data Warehouses / Thomas Zurek 8

Business Intelligence + Data Warehouses 9 November 2015 Real-World Data Warehouses / Thomas Zurek

Business Intelligence and Data Warehouses Business Intelligence (BI) An environment in which business users conduct analyses that yield overall understanding of where the business has been, where it is now, and where it will be in the near future (i.e. planning). Data Warehouse (DW) An implementation of an informational database used to collect, integrate and provide sharable data sourced from multiple operational databases for analyses. Provide data that is reliable, consistent, understandable . It typically serves as the foundation for a business intelligence system. November 2015 Real-World Data Warehouses / Thomas Zurek 10

Business Intelligence and Data Warehouses 11 Business Intelligence OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, … Data Warehouse connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation,consistency, compliance, auditing, big data, scalability, … Operational System ERP, CRM, SCM, HR, … Meta Data security, models, … November 2015 Real-World Data Warehouses / Thomas Zurek

Business Intelligence and Data Warehouses 12 Operational System ERP, CRM, SCM, HR, … Meta Data security, models, … simply remember: BI and DW BI ≠ DW Business Intelligence OLAP, cubes, dimensions, measures, KPIs, scoreboards, dashboards, pivot tables, data mining, predictive, slice & dice, planning, EPM, analytics, … Data Warehouse connectivity, cleansing, scrubbing, ETL, ELT, EHL, transformation, harmonisation,consistency, compliance, auditing, big data, scalability, … Focus today! November 2015 Real-World Data Warehouses / Thomas Zurek

Data Warehouses 13 November 2015 Real-World Data Warehouses / Thomas Zurek

Multiple Data Sources Why are there so many DBs at an enterprise? business processes  data captured in some DB organisation reflected in system landscape geography reflected in system landscape smaller systems easier to manage than big systems mergers and acquisitions external data: market data, supplier data, … … 14 November 2015 Real-World Data Warehouses / Thomas Zurek

A Typical Example for Business Processes in an Enterprise 15 source: http ://thebankwatch.com/2006/09/13/simplifying-the-business-model/ November 2015 Real-World Data Warehouses / Thomas Zurek

Business transform End-user access / Presentation Provide data Data Acquisition Harmonization Data Propagation Reporting / Analyses / Planning Main Service : Spot for apps/Delta to app/App recovery Transform : Enriched || General Business logic Content : Data source || Business domain specific History : Determined by rebuild requirements of apps Store : DSO(can be logical partitioned) Main Service : Decouple, Fast load and distribute Transform : 1:1 Content : 1 data source, All fields History : 4 weeks Store : PSA, DSO-WO. Main Service : Integrated, harmonized Transform : Harmonize quality assure (in flow|| lookup) Content : Defined fields History : Short or not at all || Long term Store : Info source || IO/DSO/Z-table Main Service : Make data available for reporting & planning tools Transform : Application specific/( dis -)aggregate/lookup Content : Application specific History : Application specific Store : IC,DSO, Info Set, Virtual Provider, Multi Provider. A Typical Data Warehouse Architecture Corp. Memory ODS BI Layer Data Warehouse Source 1 Source 2 Source 3 Source 4 Source 5 Project Governance IT Governance November 2015 Real-World Data Warehouses / Thomas Zurek 16

Challenge 1: RELIABLE typical: data from 50-100 data sources availability of data sources not given system downtimes network failures example: availability per data source = 98% all 100 data sources available = 0.98**100 = 13% 1 out of 100 data sources not available = 1 – 0.13 = 87%  all data in one place asserts reliable data access 17 November 2015 Real-World Data Warehouses / Thomas Zurek

Challenge 2: CONSISTENT Assume: each data source is consistent! Is the union of all data sources consistent? NO !  In a DW, data gets synchronised and harmonized to provide a consistent view spanning multiple data sources. 18 November 2015 Real-World Data Warehouses / Thomas Zurek

Examples Challenge 2: T ransformation , C leansing Jun 1, 2011 = 1.6.2011 = 06/01/11 = … VW Touareg = VW TOUAREG = [product] 87654 = … currency and unit conversions: box  kg €, $, £, ¥, …  € resolve ID clashes: product 123 [in subsiduary A] ≠ product 123 [in subsiduary B] enrich data: add attributes from source A to data from source B 19 November 2015 Real-World Data Warehouses / Thomas Zurek

Examples Challenge 2: History / Time-Dependency data is time-dependent, e.g. employee A worked in department X in 2012 employee A worked in department Y in 2013 currency exchange rates current view vs historic view analysis versioning of meta data models change development  test  production auditing 20 November 2015 Real-World Data Warehouses / Thomas Zurek

Automatisierte Überprüfung der Datenqualität in Form eines Plausibility Gates Single Point of Truth Quelle 1 Quelle 2 Quelle ... Quelle n Fachliche Überprüfung der Daten verringern den Administrationsaufwand und den anschließenden „Ärger“ Harmonisierte Auswertungen Plausibility Gate UNSPSC-Code vorhanden? RVO mit BVO-Bezug? DUNS-Nummer vorhanden? Größenordnung BVO/RVO? real customer example

Challenge 3: UNDERSTANDABLE texts for cryptic numbers multi-language support data provenance: know where the data originated auditing: track changes relevance: show the user data from his "realm of command" 22 November 2015 Real-World Data Warehouses / Thomas Zurek

Layered Scalabale Architecture (LSA) 23 November 2015 Real-World Data Warehouses / Thomas Zurek

Business transform End-user access / Presentation Provide data Data Acquisition Harmonization Data Propagation Reporting / Analyses / Planning Main Service : Spot for apps/Delta to app/App recovery Transform : Enriched || General Business logic Content : Data source || Business domain specific History : Determined by rebuild requirements of apps Store : DSO(can be logical partitioned) Main Service : Decouple, Fast load and distribute Transform : 1:1 Content : 1 data source, All fields History : 4 weeks Store : PSA, DSO-WO. Main Service : Integrated, harmonized Transform : Harmonize quality assure (in flow|| lookup) Content : Defined fields History : Short or not at all || Long term Store : Info source || IO/DSO/Z-table Main Service : Make data available for reporting & planning tools Transform : Application specific/( dis -)aggregate/lookup Content : Application specific History : Application specific Store : IC,DSO, Info Set, Virtual Provider, Multi Provider. A Typical Data Warehouse Architecture Corp. Memory ODS BI Layer Data Warehouse Source 1 Source 2 Source 3 Source 4 Source 5 Project Governance IT Governance November 2015 Real-World Data Warehouses / Thomas Zurek 24

Yet Another, Arbitrary Example … 25 Source: http ://www.zentut.com/wp-content/uploads/2012/10/stand-alone-data-mart.jpg November 2015 Real-World Data Warehouses / Thomas Zurek

The Layered Scalable Architecture (LSA) reference architecture for DW term introduced by SAP, but not SAP-specific layers: each layer has a certain task each layer has an associated service-level layers describe the step-wise refinement of data not every DW needs all LSA-layers modern technology allows to remove / merge layers as less or no performance-motivated services are required more: http://tinyurl.com/sap-lsa 26 November 2015 Real-World Data Warehouses / Thomas Zurek

LSA Reference Layers LSA Reporting Layer Business Transformation Layer Operational Data Store Data Propagation Layer Quality & Harmonisation Layer Corporate Memory Data Acquisition Layer Virtualization Layer 1:1 from extraction, temporary source system service level, long term, comprehensive, complete, master the unknown create harmonised view, guarantee quality EDW layers application neutral corporate owned granular BI Applications/ Analytics Layers digestible, integrated, unified data, ready to consume apply business logic reporting, analysis ready abstraction near real time, operational like November 2015 27

Big Data + Data Warehousing 28 November 2015 Real-World Data Warehouses / Thomas Zurek

Big Data – The 3 "V"s V elocity  speed, parallelism V olume  scale V ariety  many formats, file system November 2015 Real-World Data Warehouses / Thomas Zurek 29

Big Data Example: Connected Cows November 2015 Real-World Data Warehouses / Thomas Zurek 30 estrus detection by counting steps motion sensors for 40000 cows benefits for artifical insemination: less labour intensive higher success rates: 45 %  63% sex determination references: Strata Feb 2015 : http:// tinyurl.com/connected-cows http://www.fujitsu.com/global/about/resources/news/press-releases/2013/1015-01.html

RDBMS vs Big Data Data Warehouses / RDBMS INSERT + UPDATE + DELETE prescreptive schema DDL strict notion of consistency Big Data / HDFS INSERT (mostly) d escriptive schema XML JSON sloppier due to scale of data and number of systems involved November 2015 Real-World Data Warehouses / Thomas Zurek 31

Big Data + Data Warehouses November 2015 Real-World Data Warehouses / Thomas Zurek 32 Source: Strata Feb 2015, San Jose, CA, US – Keynote by Amr Awadallah ( Cloudera)

Summary 33 November 2015 Real-World Data Warehouses / Thomas Zurek

What You Should Take Away Difference: BI vs DW vs Big Data What are the problems that a DW handles? How are those problems tackled? 34 November 2015 Real-World Data Warehouses / Thomas Zurek
Tags