CHAPTER 2 - Datawarehouse Architecture.pptx

AnithaSakthivel3 14 views 30 slides Mar 03, 2025
Slide 1
Slide 1 of 30
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30

About This Presentation

Unit 2


Slide Content

Data warehousing and Data Mining Dr.R.U.Anitha 1

Chapter # 2 Data Warehouse Architecture & Models     Text Book: Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas 2

Learning Objectives The main objective of this chapter is to let the students know the basic structure of a Data warehouse. Learn about Data warehouse physical architecture and various principles of a Data warehousing. To make students understand a data warehouse model. It also explains various types of multidimensional models and Schemas. 3

Major Topics Structure of a Data Warehouse Data Warehouse Architectures Three Data Warehouse Models Extraction, Transformation and Loading (ETL) Design of Data Warehouse Multidimensional Data Model Data cube Conceptual Modeling of Data Warehouses Dimension Schema 4

Structure of a Data Warehouse Essentially, a data warehouse provides historical data for decision-support applications. Such applications include reporting, online analytical processing (OLAP), executive information systems (EIS), and data mining. Data warehouse is a centralized, integrated repository of information. 5

Here, integrated means cleaned up, merged, and redesigned. This may be more or less complicated Depending on how many systems feed into a warehouse and how widely they differ in handling similar information. But most companies already have repositories of information in their production systems and many of them are centralized. 6

Data Warehouse Architectures : Conceptual View Single-layer –Every data element is stored once only –Virtual warehouse 7

Two-layer Real-time + derived data –Most commonly used approach in industry today 8

Three-layer Architecture: Conceptual View Transformation of real-time data to derived data really requires two steps 9

The architecture consists of various interconnected elements: Operational and external database layer – the source data for the DW Information access layer – the tools the end user access to extract and analyze the data Data access layer – the interface between the operational and information access layers Metadata layer – the data directory or repository of metadata information 10

Additional layers are: Process management layer – the scheduler or job controller Application messaging layer – the “middleware” that transports information around the firm Physical data warehouse layer – where the actual data used in the DSS are located Data staging layer – all of the processes necessary to select, edit, summarize and load warehouse data from the operational and external data bases 11

Components of the Data Warehouse Architecture 12

Data Warehouse Architecture 13

Three Data Warehouse Models Enterprise Warehouse - Collects all of the information about subjects spanning the entire organization. The Central data warehouse - A single physical database contains all of the data for a specific functional area. The distributed data warehouse - The components are distributed across several physical databases. 14

Extraction, Transformation and Loading (ETL) Data extraction - get data from multiple, heterogeneous, and external sources Data cleaning –detect errors in the data and rectify them when possible   Data transformation –convert data from legacy or host format to warehouse format Loading –sort, summarize, consolidate, compute views, check integrity, and build indices and partitions Refresh –propagate the updates from the data sources to the warehouse   15

Design of Data Warehouse: A Business Analysis Framework Four views regarding the design of a data warehouse Top-down view It allows selection of the relevant information necessary for the data warehouse Data source view It exposes the information being captured, stored, and managed by operational systems Data warehouse view It consists of fact tables and dimension tables Business query view It shows the perspectives of data in the warehouse from the view of end-user. 16

Multidimensional Data Model From Tables and Spreadsheets to Data Cubes A data warehouse is based on a multidimensional data model which views data in the form of a data cube A data cube, such as sales, allows data to be modeled and viewed in multiple dimensions Dimension tables [ Eg . item ( item_name , brand, type), or time(day, week, month, quarter, year) ] Fact table contains measures (such as dollars_sold ) and keys to each of the related dimension tables In data warehousing literature, an n-D base cube is called a base cuboid . The top most 0-D cuboid , which holds the highest-level of summarization, is called the apex cuboid . The lattice of cuboids forms a data cube. 17

18

Data Cube n-dimensional cube C[A1,A2,…An] is a database with n dimensions as A1,A2…An each of which represents a theme and contains |Ai| number of distinct elements in the dimension Ai. Each distinct element of Ai corresponds to a data row of C. A data cell in the cube C[a1,a2,..an] stores the numeric measures of the data for Ai= ai , for all i . Eg . Data cube C[place, product, year] Data Cell [Peshawar, TV, 4Qtr] stores 314 as numeric measure which represents three dimensions with 3,1 and 4rows respectively. 19

A Sample Data Cube 20

21

Typical Data cube OLAP Operations Roll up (drill-up): Summarize data by climbing up hierarchy or by dimension reduction Drill down(roll down): Reverse of roll-up From higher level summary to lower level summary or detailed data, or introducing new dimensions Slice and Dice: Project and select Pivot (rotate): reorient the cube visualization, 3D to series of 2D planes Drill across: involving (across) more than one fact table Drill through: through the bottom level of the cube to its back-end relational tables (using SQL) 22

23

Conceptual Modeling of Data Warehouses Dimension Schema Modeling data warehouses: dimensions & measures Starschema : A fact table in the middle connected to a set of dimension tables Snowflakeschema : A refinement of star schema where some dimensional hierarchy is normalized into a set of smaller dimension tables, forming a shape similar to snowflake Factconstellations : Multiple fact tables share dimension tables, viewed as a collection of stars, therefore called galaxy schema or fact constellation 24

Star Schema Easy for users to understand Fast response to queries Simple metadata Supported by many front end tools Less robust to change Slower to build Does not support history 25

Example of STAR Schema 26

Snowflake Schema Direct use by some tools More flexible to change Provides for speedier data loading May become large and unmanageable Degrades query performance More complex metadata 27

28

Fact Constellation Measure of online analytical processing Collection of multiple  fact  tables Sharing dimension tables Viewed as a collection of stars. Improvement over Star  schema . 29

30
Tags