OLTP OLAP
Data Source Operational data Consolidateddata from various OLTP sources
Purpose of data DailyBusiness transactions For Decisionsupport, planning, reporting &
analysis
Insert & Updates Short& Fast Periodic long running batch jobs
Queries Simple queries that returns few
records
Often complex or ad hoc queries involving
aggregations
Processing Speed Very Fast Depends on volume of data
Space Requirements Less Large
Database Design Highly normalized Denormalized
Back Up & Recovery Regularback ups & proper
recovery plans/policies
In frequent back up policy
Data Ware House Architecture
OS
OS
Raw Data
Summary
Data
Metadata
Inventory
Sales
Purchasing
Flat Files
STG DB
Analyti
cs
Report
ing
Mining
Data Source Staging Ware House Data Marts Users
Benefits:
Itprovidesadirect&simplemappingbetweenthebusinessentitiesand
theschemadesign.
Itprovideshighlyoptimizedqueryperformanceduetosimplejoins
betweenoneFact&Dimensions
ItiswidelysupportedbyvariousBItools
Product Dim
Time Dim
Sales Fact
Location Dim
Customer Dim
Product Dim
Time Dim
Sales Fact
Location Dim
Customer Dim
Product
Category Look
Up
Month Look
Up
Customer
Type Look Up
State Look Up
FactConstellationSchemaasnamesuggestedisagroupoffact
tablessharingmultipledimensionsbetweeneachotherrepresenting
shapesimilartolikeaconstellationofstars(i.e.,starschemas).Itis
prettycomplexinnaturehenceshouldbeusedforapplicationswhichare
highlysophisticated&complicatedinnature.
Product
Dim
Time Dim
Sales Fact
Location
Dim
Customer
Dim
Shipping
Fact
Shipper
Dim
Time Dim
Inmanyscenarios,intheprocessofidentifyingdimensionsoutof
transactionalsystemtablescompriseofmanymiscellaneousindicators
andflags,eachofwhichtakesonasmallrangeofdiscretevaluesthat
can’tbincludedinthedimensionsresultinginverylargefacttables.By
creatinganabstractdimension,theseflagsandindicatorsareremoved
fromthefacttable&placedintoadummydimension.Thesedimensions
arecalledJunkDimensions.
Fore.g.,ifweremove10two-valueindicators,suchasthecash
versuscreditpaymenttype,fromtheorderfacttableandplacetheminto
asingledimensioni.e.junkdimensionwith1024rows(2)withasingle
smallsurrogatekeyincludedinthefacttable.
Order Indicator
Key
Payment
Type
Payment Type
Group
Inbound
/Outbound
Commission Credit
Indicator
Order Type
Indicator
1 Cash Cash I C Regular
2 Cash Cash I N Display
3 Cash Cash O C Regular
ADegenerateddimensionisadatadimensionthatisalthough
dimensionalinnaturebutstoredinfacttableItdoesn’thaveany
separatedimensiontojoin&onecanuseittoslice&dicethemeasures
infacttable.
Fore.g.,dimensionkey,suchasatransactionnumber,invoicenumber,
ticketnumber,orbill-of-ladingnumber,thathasnoattributesbutisused
toprovideadirectreferencebacktoatransactionalsystemwithoutthe
overheadofmaintainingaseparatedimensiontable.
Sales Fact table
POS Transaction No (DD)
ProductID (FK)
Date Key(FK)
Store ID (FK)
Salesquantity
Gross Profit
Cost ($)
Selling Price($)
Promotion Key(FK)
Product Dim
Date Dim
Store Dim
Promotion
Dim
Arole-playinginadatawarehouseoccurswhenasingledimension
simultaneouslyappearsseveraltimesinthesamefacttable.The
underlyingdimensionmayexistasasinglephysicaltable,buteachof
therolesshouldbepresentedasaseparatelylabeledview.Fore.g.,the
Datedimensioncanbeusedfortheordereddate,scheduledshipping
date,shipmentdate,andinvoicedateinanorderlinefactorina
InsurancedomainaCustomerdimensioncanbeusedasnominee,
proposer&beneficiaryinaPolicyDetailFact.
Order Transaction Fact
Order Date Key(FK)
Shipped Date(FK)
Invoice Date (FK)
OrderNo(DD)
Order quantity
Product Key(FK)
Total Amount
DiscountAmount
Net Order Amount
Order Line No(DD)
Date Dimension
Product Dimension
Type 1:Overwrite the Value
Withthetype1,weoverwritetheoldattributevalueinthedimension
row,replacingitwiththecurrentvalue.Insodoing,theattributealways
reflectsthemostrecentassignment.Thisismostappropriatefor
rectifyingthecertaintypeofdataerrorslikemisspellingofthename
,addressetcornovalueinkeepingolddescription.
Example:SupplierstatechangesfromCAtoNY.
Old
New
Pros:Easytomaintain&fast.
Cons:Nohistorycanbekept&therecalculationorloadingofthe
aggregatefacttablebasedonstate.
Supplier_KeySupplier_CodeSupplier NameSupplier State
123 XYZ MaxTrading CA
Supplier_KeySupplier_CodeSupplier NameSupplier State
123 XYZ MaxTrading NY
Type 2: Add a dimension row
TheType2methodtrackshistoricaldatabycreatingmultiplerecordsfor
agivenbusinesskeyinthedimensionaltableswithseparatesurrogate
keyswitheffectivedatetimeand/ordifferentversionnumbersor.Using
Type2,wecankeeptheentirehistoryofarecordsbecauseanew
recordisinsertedeachtimeachangeismade.
Thetype2responseistheprimarytechniqueforaccuratelytracking
slowlychangingdimensionattributes.Itisextremelypowerfulbecause
thenewdimensionrowautomaticallypartitionshistoryinthefacttable.
Example:SupplierstatechangesfromCAtoNY
Supplier_
Key
Supplier_C
ode
Supplier
Name
Supplier
State
Version
123 XYZ MaxTradingCA 0
123 XYZ MaxTradingNY 1
Type 2 : Versioning
Pros :
Historycanbekept&trackingofunlimiteddimensionchanges
Itperfectlysegmentsfacttablehistorybecauseprechangefactrows
usetheprechangesurrogatekey.
Cons:
Itcontributesinrapidgrowthofdimensiontables
Databaseoperationslikejoinsareexpensive
Supplier_K
ey
Supplier_C
ode
Supplier
Name
Supplier
State
Start Date EndDate
123 XYZ MaxTradingCA 01/11/2009 07/22/2010
123 XYZ MaxTradingNY 07/23/2010
Type 2: Effective Date Stamp Method
Type 3 : Add a Dimension Column
Type3solutionstrackchangeshorizontallyinthedimensiontableby
addingnewfieldstocontaintheolddata.Oftenonlytheoriginaland
currentvaluesareretainedandintermediatevaluesarediscardedbutat
timeswealsoretaintheprevious¤tvaluesonly.Thiskindoftypeis
usedwherewewanttocomparethecurrentdatavalueswithoriginalor
previousdatavaluesinonegoforthepurposelikesalesforce
reorganizationsetc.
Pros:Avoidanceofmultipledimensionrecordsforsingleentity.
Cons:Limitedhistorytracking&morecomplexqueriestoaccesstheold
values.
Supplier_Key Supplier_CodeSupplier NameOriginal Supplier
State
Current Supplier
State
123 XYZ MaxTrading CA NY
Predictable Changes with Multiple Version Overlays :
Insomecasesweneedtoretainthehistoryfor4-5yearsortimeswhere
thenumberisknowntousorwecanpredictthechangeforagiven
attribute.ItisanextensiontoType3wherewecankeeponaddingthe
columnforachangingattributeas&whenitchangeswhilekeepingthe
latestdatavalueinthecurrentvaluecolumn.For,e.g.,Salesrepdistrict
gotrevisedeveryyearortwothenwecandesigntheSalesRepDimtable
(asshowninfigure).IfhisDistrictchanges
laterthenweneedtoonlyaddanother
columnwithlabelDistrictfor2010withvalue
fromcurrentdistrict&overwritethe
currentdistrictwithnewvalue.
Sales Representative Dimension
Name
Key
Address
Current Sales District
District 2009
District 2008
…….
Unpredictable Changes with Single-Version Overlay:
Itbasicallycombinestheapproachofalltype1,2&3intosingletype.
Thismakesensewhenonehastopreservehistoricalaccuracy
surroundingunpredictableattributechangeswhilesupportingtheabilityto
reporthistoricaldataaccordingtothecurrentvalues.Thisisonlypossible
byclubbingallthe3typestogetherasshownbelow:
InthisnewdimensionrowforSupplier,thecurrentstatewillbeidenticalto
thehistoricalstate.ForallpreviousinstancesofthatSupplierdimension
rows,thecurrentstateattributewillbeoverwrittentoreflectthecurrent
structure.
Supplier
_Key
Supplier
_Code
Supplier Name Historical S -
upplierState
Current
Supplier State
Current
Flag
Start
Date
End
Date
123 XYZ MaxTrading CA IL N
124 XYZ MaxTrading NY IL N
125 XYZ MaxTrading IL IL Y
Thesolutionistobreakofffrequentlyanalyzedorfrequentlychanging
attributesintoaseparatedimension,referredtoasaminidimension&
trackthemasband.Aseparatedimensionofvariablevalued
demographicattributes,suchasage,gender,numberofchildren,and
incomelevel,presumingthatthesecolumnsgetusedextensively.There
wouldbeonerowinthisminidimensionforeachuniquecombinationof
age,gender,numberofchildren,andincomelevelencounteredinthe
data,notonerowpercustomer(referfigurebelow).Businessneedswill
determinewhichcontinuouslyvariableattributesaresuitablefor
convertingtobands.
DemographicKeyAge Gender Income Level
1 20-24 M 0-$20000
2 20-24 M $20000-$24999
3 20-24 F 0-$20000
4 25-29 M 0-$20000
5 25-29 F 0-$20000
Everytimewebuildafacttablerow,weincludetwoforeignkeysrelated
tothecustomer:theregularcustomerdimensionkeyandthe
minidimensiondemographicskey.AsshowninFigurebelow,the
demographicskeyshouldbepartofthefacttable’ssetofforeignkeysin
ordertoprovideefficientaccesstothefacttablethroughthe
demographicsattributes.
Customer Dim
Customer Key (PK)
CustomerID (NK)
Customer Name
Address
DOB
…….
Age
Gender
Income
No of Children
Customer Dim
Customer Key (PK)
CustomerID (NK)
Customer Name
Address
……….
CustDemo Dim
C Demo Key (PK)
C Age Band
Gender
C Income Band
becomes
Fact Table
Customer Key (FK)
C Demo Key (FK)
More Foreign
Keys…..
Facts…….
The minidimensionterminology refers to when the demographics key is
part of the fact table composite key; if the demographics key is a foreign
key in the customer dimension, we refer to it as an outrigger which has
to be a Type 1 attribute
The best approach for efficiently browsing and tracking changes of key
attributes in really huge dimensions is to break off one or more
minidimensions from the dimension table, each consisting of small
clumps of attributes that have been administered to have a limited
number of values
To store exact values instead of the bands or ranges sometimes one
need to create fact less schema that focuses on attribute changes. In
this case the dimension & minidimension are connected via a dummy
fact table which only consists the keys from various dimension table but
no numeric measurement values
On the basis of measure :
Additive
Semi additive
Non additive
Fact less fact or Junction Fact
On the basis of measurement events:
Transactional snapshots
Periodic snapshots
Accumulating snapshots
A Semi Additivefacts are the measurements that can be added across
few of the dimensions but not against all in the fact table. In the figure,
Bill amount is a semi additive fact because it can be added up for a date
or Store key to arrive at the total sales amount for the day or for a given
store but adding up against Product key doesn’t make any sense.
Retail Sales Transactions Fact
Date Key(FK)
ProductKey (FK)
Store Key(FK)
TxnNo
Sales Amount($)
Sales Quantity
CostAmount($)
Gross Profit($)
Gross Margin(%)
Bill Amount($)
ANonAdditivefactsarethemeasurementsthatcannotbeadded
acrossanyofthedimensionsinthefacttable.Percentagesandratios,
suchasgrossmargin,arenonadditive.Thenumeratoranddenominator
shouldbestoredinthefacttable.Theratiocanbecalculatedinadata
accesstoolforanysliceofthefacttablebyrememberingtocalculatethe
ratioofthesums,notthesumoftheratios.Inthefigure,UnitPrice&
Grossmarginarenonadditivefactsbecausethecan’tbesummarized
alonganydimension.
Retail Sales Transactions Fact
Date Key(FK)
ProductKey (FK)
Store Key(FK)
TxnNo
Sales Amount($)
Gross Profit($)
Gross Margin(%)
Unit Price($)
Afactlessfactsarethosetablesthatdoesn’thaveanymeasurement
metrics,itmerelycapturestherelationshipbetweentheinvolved
dimensionkey.Afacttablethathasnofactsbutcapturescertainmany
to-manyrelationshipsbetweenthedimensionkeys.Mostoftenusedto
representeventsorprovidecoverageinformationthatdoesnotappearin
otherfacttables.Fore.g.,todeterminewhatproductswhereon
promotionbutdidn’tsellrequiresaseparatepromotioncoveragefact
table,we’dloadonerowinthefacttableforeachproductonpromotion
inastoreeachday(orweek,sincemanyretailpromotionsareaweekin
duration)regardlessofwhethertheproductsoldornot.Thistableisfact
lessfactduetoabsenceofanymeasurements.
Promotion Coverage Fact
Date Key(FK)
ProductKey (FK)
Store Key(FK)
Promotion Key (FK)
Product Dim
Time Dim
Store Dim
Promotion Dim
Atransactionalsnapshotfacttablerepresentsapointoftimeinthe
lifeofbusinessevents.Arowexistsinthefacttableforagiven
dimensiononlyifatransactioneventoccurred.Transactionalfacttable
holdsdataofthemostdetailedlevel,causing
ittohaveagreatnumberofdimensions
associatedwithit.Onceatransactionhas
beenposted,needn’tberevisited.
Transactional Fact Table
TxnNo
Item ID (FK)
BillNo (DD)
Count of Item
Cost
Profit
SellingPrice
Store ID (FK)
Date Key(FK)
Tax Amount
Discount
Promotional Code (FK)
APeriodicsnapshotfactrepresentsapre-definedintervaloraperiod.
Unlikethetransactionfacttable,withtheperiodicsnapshot,we
takeapicture(hencethesnapshotterminology)of
theactivityattheendofaday,week,ormonth,
thenanotherpictureattheendofthenextperiod,
andsoon.Theperiodicsnapshotsarestacked
consecutivelyintothefacttable.Dailysnapshots
andmonthlysnapshotsarecommon.Aseparate
recordisplacedinaperiodicsnapshotfacttable
eachperiodregardlessofwhetheranyactivityhastakenplaceinthe
underlyingtransaction.
Periodic Fact Table
Item ID (FK)
Store ID (FK)
Date Key(FK)
Cost ($)
Last SellingPrice($)
Quantity on hand
Quantity Sold
Anaccumulatingsnapshotfacttablerepresentsbusinessactivities
overatimeperiod.Accumulatingsnapshotsalmost
alwayshavemultipledatestamps,representingthe
predictablemajoreventsorphasesthattakeplace
duringthecourseofalifetime.Oftenthere’san
additionaldatecolumnthatindicateswhenthe
snapshotrowwaslastupdated.Sincemanyofthese
datesarenotknownwhenthefactrowisfirstloaded,
wemustusesurrogatedatekeystohandleundefined
dates.Thefacttableisrevisitedandupdatedas
activityoccurs.ArecordisplacedinanAccumulating
snapshotfacttablejustonce,whentheitemthatit
representsisfirstcreatedrestofthetimeitisjust
updatedwithnewdatevalues.
Accumulating Snapshot
Fact Table
TxnNo
Order Date Key (FK)
Backlog Date Key (FK)
Release to Manufacturing
Date Key (FK)
Finished Inventory
Placement Date Key (FK)
Requested Ship Date Key
(FK)
Scheduled Ship Date Key
(FK)
Actual Ship Date Key (FK)
Arrival Date Key (FK)
Invoice Date Key (FK)
……..
Atablewithamultipartkeycapturingamany-to-manyrelationship
thatcan’tbeaccommodatedbythenaturalgranularityofasinglefact
tableorsingle-dimensiontable.Servestobridgebetweenthefacttable
andthedimensiontableinordertoallowmany-valueddimensionsor
raggedhierarchies.Sometimesreferredtoasahelperorassociative
table.Whenusingabridgetable,thefactsinthefacttablearemultiplied
bythebridgetable’sweightingfactortoappropriatelyallocatethefactsto
themultivalueddimension.ItiscalledWeightReport&ifweexcludethis
thenitturnsouttobeImpactreport.
Diagnosis Group
Bridge
Diagnosis Group
Key(FK)
Diagnosis Key(FK)
Weighted Factor
Diagnosis
Group Dim
Diagnosis
Group(PK)
Health Care
Billing Line Item
Fact
Diagnosis
Dimension
Balanced
Unbalanced
Ragged
Parentchildrelationship
CEO
CIO COO
Fin HeadIT Head
Emp2
HR Head
Emp1 Emp3 Emp4
Inbalancedhierarchies(balanced/standard),thebranchesofthe
hierarchyalldescendtothesamelevel,witheachmember'sparent
beingatthelevelimmediatelyabovethemember.Theyare
consistentbecauseeachlevelrepresentsthesametypeof
information,andeachlevelislogicallyequivalent.Ancommon
exampleofabalancedhierarchyisonethatrepresentstime,where
thedepthofeachlevel(year,quarter,andmonth)isconsistent.
2009
1
st
Quarter
Jan MarFeb
2010
1
st
Quarter
Jan MarFeb
Unbalancedhierarchiesincludelevelsthathaveaconsistentparent-
childrelationship,buthavelogicallyinconsistentlevels.The
hierarchybranchescanalsohaveinconsistentdepths.Fore.g.,the
organizationalstructureisunbalanced,withsomebranchesinthe
hierarchyhavingmorelevelsthanothers.Inanunbalanced
hierarchy,nullvaluescanappearonthelowerlevelsofthe
hierarchy.
CEO
CIO COO
IT Head
Emp2
Admin 1
Emp1
CFO
Admin 2Fin 1 Fin 2
Emp3
Inaraggedhierarchy,thelogicalparentmemberofatleastone
memberisnotinthelevelimmediatelyabovethemember.Thiscan
causebranchesofthehierarchytodescendtodifferentlevels.A
raggedhierarchycanrepresentageographichierarchyinwhichthe
meaningofeachlevelsuchascityorcountryisusedconsistently,
butthedepthofthehierarchyvaries
North America
Athens
Greece
Europe
US
CA
San Francisco
A parent-child hierarchyis a hierarchy with multiple levels that track
the relationships within the hierarchy. A single table or view is used
that represents the parent-child hierarchy. A view can be used to
flatten the structure in case this kind of hierarchy used multiple
tables . The top level uses the parent key as the level key, whereas
the bottom level contains the child key. For example, in a hierarchy
that represents an organizational structure, you can have two levels:
Manager and Employee. The Manager level is the parent level, and
the Employee level is the child level