Data Warehouse – Introduction, characteristics, architecture, scheme and modelling, Differences between operational database systems and data warehouse.
Size: 1.98 MB
Language: en
Added: May 19, 2023
Slides: 52 pages
Slide Content
UBN204A62
Mr.T.Somasundaram
Assistant Professor
Department Of Management
KristuJayantiCollege (Autonomous), Bengaluru
UNIT 1: DATA WAREHOUSE
DataWarehouse–Introduction,
characteristics,architecture,scheme
andmodelling,Differencesbetween
operationaldatabasesystemsand
datawarehouse.
Let’s go for thinking
Data Set (Combo)Item Set
Transaction / Purchasing
Frequent items
This is what need of Data Warehouse
What is Data Warehouse?
DataWarehouseisanenvironment,notaproduct.
Dataisoftenscatteredacrossdifferentdatabase,itneedDWto
getcompleteinformation.
Itisaimedateffectiveintegrationofoperationaldatabasesthat
enablesstrategicuseofdata.
DataWarehouseworksasacentralrepository,whereinformation
arrivesfromoneormoredatasources.
Dataflowsintodatawarehousefromtransactionalsystemand
otherrelationaldatabases.
Datamaybestructured,semi-structuredandunstructureddata.
Itisprocessed,transformedandingestedthroughBItools,SQL
clientsandspreadsheets.
Datawarehousecontainmultipledatabasesanddataisorganized
intotablesandcolumnsineachdatabase.
Datastoredinvarioustablesdescribedbyschema&Querytoolsuse
theschematodeterminewhichdatatablestoaccessandanalyze.
Datastoredincolumndescribethedatasuchasinteger,datafield,
string,etc.
Datawarehousingmakesdataminingpossible.
How Data Warehouse works?
1.Subject-Oriented:
Itprovidestopicwiseinformationratherthantheoverallprocessesof
abusiness.(i.e.)sales,inventory,promotion,etc.
2.Integrated:
DWisdevelopedbyintegratingdatafromvariedsourceintoa
consistentformat.
Itisstoredindatawarehouseinaconsistentmannerintermsof
naming,format&coding,whichfacilitatesdataanalysis.
3.Non-Volatile:
Dataonceenteredintoadatawarehousemustremainunchanged&
alldataisreadonly.
4.TimeVariant:
Datastoredinadatawarehouseisdocumentedwithanelementof
time,eitherexplicitlyorimplicitly.
Itisexhibitedinprimarykeywithelementoftimelikeday,week,etc.
Characteristics of Data Warehouse
1.EnterpriseDataWarehouse(EDW):
Itservesasakeyorcentraldatabasethatfacilitatesdecision
supportservicesthroughouttheenterprise.
Itprovidesaccesstocross-organizationalinformation,
offersaunifiedapproachtodatarepresentation.
2.OperationalDataStore(ODS):
Itispreferredforroutineactivitieslikestoringemployee
records.
3.DataMart:
Itisasubsetofadatawarehousebuilttomaintaina
particulardepartment,regionorbusinessunit.
Everydepartmentofabusinesshasacentralrepositoryor
datamarttostoredata.
Types of Data Warehouse
Data Warehouse Vs Data base
CharacteristicsData Warehouse Transactional Database
Suitable
workloads
Analytics, reporting, big
data
Transaction processing
Data source
Data collected and
normalized from many
sources
Data captured as-is from a
single source, such as a
transactional system
Data capture
Bulk write operations
typically on a
predetermined batch
schedule
Optimized for continuous
write operations as new
data is available to
maximize transaction
throughput
Data Warehouse Vs Data base
CharacteristicsData Warehouse Transactional Database
Data
normalization
Denormalizedschemas,
such as the Star schema
or Snowflake schema
Highly normalized, static
schemas
Data storage
Optimized for simplicity
of access and high-speed
query performance using
columnar storage
Optimized for high
throughout write
operations to a single row-
oriented physical block
Data access
Optimized to minimize
I/O and maximize data
throughput
High volumes of small
read operations
Data Warehouse Vs Data lake
Characteristics Data Warehouse Data Lake
Data
Relational data from
transactional systems,
operational databases, and line
of business applications
Alldata, including
structured, semi-
structured, and
unstructured
Schema
Often designed prior to the data
warehouseimplementation but
also can be written at the time of
analysis (schema-on-write or
schema-on-read)
Written at the time of
analysis (schema-on-
read)
Price/Performance
Fastest query results using local
storage
Query results getting
faster using low-cost
storage and decoupling
of compute and storage
Data Warehouse Vs Data lake
CharacteristicsData Warehouse Data Lake
Data quality
Highly curated data that
serves as the central
version of the truth
Any data that may or may not
be curated (i.e. raw data)
Users
Business analysts, data
scientists, and data
developers
Business analysts (using
curated data), data scientists,
data developers, data
engineers, and data architects
Analytics
Batch reporting, BI, and
visualizations
Machine learning,
exploratory analytics, data
discovery, streaming,
operational analytics, big
data, and profiling
Thereareseveralbenefitsofdatawarehouseforenduserslike–
Improveddataconsistency,dataqualityandaccuracy.
Betterbusinessdecisions.
Easieraccesstoenterprisedataforend-users.
Betterdocumentationofdata.
Historicaldataanalysis.
Reducedcomputercostsandhigherproductivity.
Enablingend-userstoaskad-hocqueriesorreportswithout
deterringtheperformanceofoperationalsystems.
Collectionorconsolidationofrelateddatafromvarioussources
intoaplace.
Benefits of Data Warehouse
Itallowsbusinessuserstoquicklyaccesscriticaldatafromsome
sources.
Itprovidesconsistentinformationonvariouscross-functional
activities.
Ithelpstointegratemanysourcesofdatatoreducestresson
productionsystem.
Ithelpstoreducetotalturnaroundtimeforanalysis&reporting.
Itstoresalargeamountofhistoricaldataandhelpsusersto
analysedifferenttimeperiods.
Restructuring&Integrationmakeiteasierforusertousefor
reporting&analysis.
Advantages of Data Warehouse
DataWarehousecanbeoutdatedrelativelyquickly.
Difficulttomakechangesindatatypesandranges,data
sourceschema,indexesandqueries.
Datawarehouseseemeasy,butactually,itistoocomplexfor
averageusers.
Sometimewarehouseuserswilldevelopdifferentbusiness
rules.
Organizationneedtospendlotsoftheirresourcesfor
trainingandimplementationpurpose.
Thisisnotanidealoptionforunstructureddata.
Disadvantages of Data Warehouse
1.BasicStructure(SingleTier):
Structure of Data Warehouse (Architecture)
Data
Warehouse
Other Data inputs
Update new data
Cleaning
Reformatting
OLAP
DSS
Data
mining
DB
Sample Data Warehouse Architecture
2.GenericTwoLevel(TwoTier):
Two Tier Architecture
Sources
Sources
(file)
Sources
(DB)
Sources
(External)
Data
Warehouse
Translation &
Integration
Decision Support
Environment
Recommended approach for data
warehouse development
Define a high –level corporate data model
Data
mart
Data
mart
Distributed
data marts
Multi tier
data
warehouse
Enterprise
data
warehouse
Model RefinementModel Refinement
Metadata:
“Metadataaredataaboutdata.Itrepresentswarehouseobjects
whenusedindatawarehouse.Itiscreatedforthedatanamesand
definitionsofgivendatawarehouse”.
Itiscreatedandcapturedforanyextracteddata,sourceofextracted
dataanditisaddedbycleaningorintegrationprocess.
Metadatarepositoryareplacedinthebottomofthedatawarehouse
architecture.
Itisusedasdirectorytohelpdecisionsupportsystemanalystlocate
thecontentsofdatawarehouse.
Itserveasaguidetoalgorithmsusedforsummarization.
Metadatashouldbestoredandmanagedpersistently.
Ametadatarepositoryconsistsof–
a)Datawarehousestructure–itincludesdatawarehouseschema,
view,dimensions,hierarchiesandderiveddatadefinitionsaswellas
datamartlocationsandcontents.
Meta Data Models
b)Operationalmetadata–itincludedatalineage(i.e.)historyof
migrateddataandsequenceoftransformations),currencyofdata
andmonitoringinformation(errorreportsandaudittrails).
c)Algorithmsusedforsummarization–includesmeasureand
dimensiondefinitionalgorithms,dataonpartitions,subjectareas,
aggregation,summarizationandpredefinedqueriesandreports.
d)Mappingfromoperationalenvironmenttodatawarehouse–
includesdatabasesandtheircontents,descriptions,data
partitions,dataextraction,cleaningandsecurity.
e)Datarelatedtosystemperformance–includesindicesand
profilesthatimprovedataaccessandretrievalperformance,
schedulingofupdateandreplicationcycles.
f)Businessmetadata–itincludesbusinesstermsanddefinitions,
dataownershipinformationandchargingpolicies.
Meta Data Models
Difference between Star, Snowflake & Fact
Constellation Schema
Star Schema
Snowflake
Schema Fact Schema
Elements
Single Fact Table
connected to
multiple dimension
tables with no sub-
dimension tables
Single Fact Table
connects to
multiple dimension
tables that connects
to multiple sub-
dimension tables
Multiple Fact
Tables connects to
multiple dimension
tables that connects
to multiple sub-
dimension tables
NormalizationDenormalized Normalized Normalized
Number of
Dimensions
Multiple dimension
tables map to a
single Fact Table
Multiple dimension
tables map to
multiple dimension
tables
Multiple dimension
tables map to
multiple Fact
Tables
Data
Redundancy High Low Low
Difference between Star, Snowflake & Fact
Schema
Star Schema
Snowflake
Schema Fact Schema
Performance
Fewer foreign keys
resulting in
increased
performance
Decreased
performance
compared to Star
Schema from
higher number of
foreign keys
Decreased
performance
compared to Star
and Snowflake.
Used for complex
data aggregation.
Complexity
Simple, designed to
be easy to
understand
More complicated
compared to Star
Schema –can be
more challenging
to understand
Most complicated
to understand.
Reserved for highly
complex data
structures
Difference between Star, Snowflake & Fact
Schema
Star Schema
Snowflake
Schema Fact Schema
Storage
Usage
Higher disk space
due to data
redundancy
Lower disk space
due to limited data
redundancy
Low disk space
usage compared to
the level of
sophistication due
to the limited data
redundancy
Design
Limitations
One Fact Table
only, no sub-
dimensions
One Fact Table
only, multiple sub-
dimensions are
permitted
Multiple Fact
Tables permitted,
only first level
dimensions are
permitted
OperationalDatabasesystems:
Operationalsystemassistacompanyororganizationinitsday-to-
daybusiness.
Itsapplicationsanddataarehighlystructureandprovideimmediate
focusonbusinessfunctionswiththehelpofOLTP.
Itrequiredtosupportalargenumberoftransactiononadaily
basis.
Operationaldatastoressmall,focusingthedatabaseonspecific
businessareaandeliminatingdatabaseoverheadinareassuchas
indexes.
DataWarehousesystem:
Thissystemisorganizedaroundthetrendsorpatternsinthose
eventssetbyoperationalsystems.
Datawarehousefocusonbusinessneedsandrequirements.
Itdevelopideasforchangingthebusinessrulestomakethese
eventsmoreeffective.
Operational Data base systems and Data
warehouse
Themultiplepurposesare–
Itminimisestheimpactofreportingandcomplexquery
processingonoperationalsystems.
Itpreservesoperationaldataforreuseafterthatdatahas
beenpurgedfromtheoperationalsystems.
Itmanagesthedatabasedontime,allowingtheuserto
lookbackandseehowthecompanylookedinthepast
versusthepresent.
Itprovidesadatastorethatcanbemodifiedtoconformto
thewaytheusersviewthedata.
Itunifiesthedatawithinacommonbusinessdefinition,
offeringoneversionofreality.
Operational Data base systems and Data
warehouse
Difference between Operational Data base
systems and Data warehouse
DifferenceOperational SystemData Warehouse
system
Size &
Content
Small Large
PerformanceSpeed in natureSlow, as therequest
of data is based on
query
Content focusSmall work areasCross functional
areas
Tools Typicalstructure and
deals with less no. of
tools
Varioustools and
supports the types of
data