Dimensional Modeling Concepts_Nishant.ppt

nishant523869 31 views 55 slides May 08, 2024
Slide 1
Slide 1 of 55
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49
Slide 50
50
Slide 51
51
Slide 52
52
Slide 53
53
Slide 54
54
Slide 55
55

About This Presentation

DWH Dimensional Modeling


Slide Content

A Beginners’ Guide
By Nishant Gupta

Adatawarehouseisarelationaldatabasethatisacollectionoflarge
amountofdatadesignedforreportingandanalysispurpose.Itkeeps
historicaldatafromvarioussourceseparatingtheanalyticaldatafromthe
transactionaldataenablingitsenterprisewideconsolidation.Themain
sourceofthedataiscleaned,transformedandcatalogedusing
extraction,transportation,transformation,andloading(ETL)solutionfor
onlineanalyticalprocessing(OLAP),dataminingcapabilities,client
analysistools,andotherapplicationsthatmanagetheprocessof
gatheringdataanddeliveringittobusinessusers.

Adatawarehouseisasubject-oriented,integrated,time-variantandnon-
volatilecollectionofdatainsupportofmanagement'sdecisionmaking
process.
-W.H.Inmon
Adatawarehouseisacopyoftransactiondataspecificallystructuredfor
queryandanalysis.
-RalphKimball

Onlinetransactionprocessing,orOLTP,arethoseapplicationswhich
caterstodailytransactionalneedsofabusiness.Inthesekindofsystems
fasterturnaround/responsetimeisthekeyfordatastoring&retrievalpoint
ofview.
Onlineanalyticalprocessing,orOLAP,arethosesystemswhichare
usedfordecisionmaking&analyticspurpose.Herethroughputismore
criticalthanresponsetimetogenerateresultsforadhocqueries,business
intelligence,relationalreporting,datamining&forecasting/budgeting
sake.

OLTP OLAP
Data Source Operational data Consolidateddata from various OLTP sources
Purpose of data DailyBusiness transactions For Decisionsupport, planning, reporting &
analysis
Insert & Updates Short& Fast Periodic long running batch jobs
Queries Simple queries that returns few
records
Often complex or ad hoc queries involving
aggregations
Processing Speed Very Fast Depends on volume of data
Space Requirements Less Large
Database Design Highly normalized Denormalized
Back Up & Recovery Regularback ups & proper
recovery plans/policies
In frequent back up policy

Data Ware House Architecture
OS
OS
Raw Data
Summary
Data
Metadata
Inventory
Sales
Purchasing
Flat Files
STG DB
Analyti
cs
Report
ing
Mining
Data Source Staging Ware House Data Marts Users

OperationalSource:Thesearebasicallythoseoperationalsystems
whichareusedforbusinesstransactionpurpose.Itresidesoutsidethe
realmsofdatawarehosuingtoprovideperformanceefficiency&
availabilitytocatertoroutinebusinessoperations/transactions.
DataStagingArea:Thisisbasicallytheplacesusedbothfordata
storage&ETLprocessessuchascleansingthedata(correcting
misspellings,resolvingdomainconflicts,dealingwithmissingelements,
orparsingintostandardformats),combiningdatafrommultiplesources,
deduplicatingdata,andassigningwarehousekeys.Itshouldbeofflimit
tobusinessuser&mayconsistnormalizedstructure.
DataPresentationArea(Mart):Thedatapresentationareaiswhere
dataisorganized,stored,andmadeavailablefordirectqueryingby
users,reportwriters,andotheranalyticalapplications.Itconsistsofa
dimensionalmodelwhichisatomic&simpleenablingperformance-
enhancingsummarydata,oraggregates.

DataAccessTools:Itisbasicallythevarietyofabilitiesprovidedto
endsusertoqueryorconsumethedatawarehousedataforanalytical&
decisionmakingpurpose.Itcouldeitherbeasimpleadhocqueryor
dataminingtooloranyreportingapplication.
Metadata:Itisnottheactuallytheoperationaldatabuttheconfiguration
datanecessaryforthegeneralfunctioningofourdatawarehouse.Itmay
besystemtables,partitionsettings,indexes,viewdefinitions,
numerator/denominatordefinitions,Inclusion/exclusiondefinitionsand
DBMS-levelsecurityprivilegesandgrants,businessnamesand
definitionsfortheDataMarttablesandcolumnsaswellasconstraint
filters,applicationtemplatespecifications,accessandusagestatistics,
andotheruserdocumentation.
OperationalDataStore:AnODSisimplementedtodeliveradequate
operationalreporting,especiallywhenneitherthelegacynoron-line
transactionprocessing(OLTP)systemsprovidesthem.Performance-
enhancingaggregations,significanthistoricaltimeseries,andextensive
descriptiveattributionarespecificallyexcludedfromtheODS.

DimensionalModelingisatechniqueoftenassociatedwithlogical
designingofaDatawarehouses.Itisamodelingmethodwhich
primarilyconsistsofFacts&Dimensions.Ithasfollowingmainfeatures:
Itisprimarilyqueryoriented
Itisstructuredaccordingtodatausagethanthebusinessrules/needs
Itisorganizedintobasefacts,dimensionsofthosefacts&lookupsof
thosedimensions
Itisbasedonidentificationofkeygrainsofdataandtheir
characteristicsItusuallycomprisessnapshot,groupedorSummarydata
Itnormallyhaslessnumberofjoins&itsdepth
Itismoreextensibleinnaturewithnewdatabeingeasily
accommodatedwithoutchangingtheexistingstructureorquery

DimensionalModelscanbeorganizedintovariousschemaslike:
StarSchema
SnowflakeSchema
Constellation
StarSchema:ItisthesimplestformoftheDatawarehouseschema.In
thisoneormorefacttableisconnectedwithmultidimensionaltables
resemblingthestarformation.Thecenterofthestarconsistsofalarge
facttableandthepointsofthestararethedimensiontables.The
primarykeyineachdimensiontableisrelatedtoaforeignkeyinthefact
table.Inotherwords,theyallhavethesamelevelofgranularity.

Benefits:
Itprovidesadirect&simplemappingbetweenthebusinessentitiesand
theschemadesign.
Itprovideshighlyoptimizedqueryperformanceduetosimplejoins
betweenoneFact&Dimensions
ItiswidelysupportedbyvariousBItools
Product Dim
Time Dim
Sales Fact
Location Dim
Customer Dim

SnowFlakeSchemaisanextensionoftheStarschemawhereeachpoint
ofstarisfurtherexplodesintomorepoints.Inotherwords,the
dimensionaltablesarefurthernormalized(3
rd
NormalForm)intomultiple
relatedlookuptableseachrepresentingalevelinthedimensional
hierarchy.TheSnowflakeschemaderivesitsnameduetoitsresemblance
totheshapeofarealSnowflake.The"snowflaking"effectonlyaffects
thedimensiontablesandnotthefacttables.
Pros:
Iteliminatesredundancy
Itrequireslessdiskspaceforstorage
Itrepresentstherealworldscenarioinschemadesign
ItaidsintransactionalreportingviaDatawarehouse
Cons:
Itrequiresadditionalmaintenanceeffortduetoincreaseinnumberof
lookuptables
Itincreasenumberofjoinsresultinginpoorperformanceofdataretrieval

Product Dim
Time Dim
Sales Fact
Location Dim
Customer Dim
Product
Category Look
Up
Month Look
Up
Customer
Type Look Up
State Look Up

FactConstellationSchemaasnamesuggestedisagroupoffact
tablessharingmultipledimensionsbetweeneachotherrepresenting
shapesimilartolikeaconstellationofstars(i.e.,starschemas).Itis
prettycomplexinnaturehenceshouldbeusedforapplicationswhichare
highlysophisticated&complicatedinnature.
Product
Dim
Time Dim
Sales Fact
Location
Dim
Customer
Dim
Shipping
Fact
Shipper
Dim
Time Dim

AsperRalphKimballthefollowingmentionedfourstepsaretheback
boneofanydimensionaldesignprocess:
Selectthebusinessprocesstomodel
Declarethegrainofthebusinessprocess
Choosethedimensionsthatapplytoeachfacttablerow
Identifythenumericfactsthatwillpopulateeachfacttablerow

Firstlyweneedtoselectthebusinessprocessforwhichthe
dimensionalmodelneedstobedesigned.Abusinessprocessmay
requiremorethanonedimensionalmodel.Abusinessprocessisasetof
relatedactivitiessharedacrosslineofservicesorbusinessdepartments.
Toidentifythebusinessprocessesofadimensionalmodel,wecollect
thefollowingmetadata:
Businessrequirements&processes
Stakeholders
Sourcesystems
Dataqualityrelatedissues
Businessprocessrelatedglossary
Otherbusiness-relatedmetadata

Nextstepistoidentify&declarethegrainoftheofthemodel.The
grainofatablerepresentsthemostatomiclevelbywhichthetablesmay
bedefined.
Preferablyweshoulddevelopdimensionalmodelsforthemostatomic
informationcapturedbyabusinessprocess.Atomicdataisthemost
detailedinformationcollected;suchdatacannotbesubdividedfurther.
Atomicdataishighlydimensional&givesusthecapabilitytodrilldown
tothelowestlevelofdetails.Itreallyhelpsinslicing&dicingofthe
data.
ForexamplethegrainofaSALESfacttablemightbestatedas"Sales
volumebyDaybyProductbyStore".Eachrecordinthisfacttableis
thereforeuniquelydefinedbyaday,productandstore.

Dimensionsarebasicallyusedtodescribethebusinessentitiesofan
enterpriseoftencomposedofoneormorehierarchies,thatcategorizes
data.Itcontainsthetextualdescriptorofbusinessusedforfiltering,
grouping&labeling.Dimensiondataistypicallycollectedatthelowest
levelofdetailandthenaggregatedintohigherleveltotalsthataremore
usefulforanalysis.Thesenaturalrollupsoraggregationswithina
dimensiontablearecalledhierarchies.
Inotherwords,dimensiontablescontainattributesthatdescribefact
recordsinthefacttable.Someoftheseattributesprovidedescriptive
information;othersareusedtospecifyhowfacttabledatashouldbe
summarizedtoprovideusefulinformation.Inanycaseitmustcontain
oneprimarykeyusedtouniquelyidentifyeachrecordsindimension
table,e.g.:
ProductDimension >LocationDimension
TimeDimension >CustomerDimension

Therearevarioustypeofdimensionsavailable,suchas,
ConformedDimension
JunkDimension
DegeneratedDimension
RoleplayingDimension
SlowlychangingDimension
RapidlychangingDimension

Conformeddimensionsarethosewhichareeitheridenticaloraperfect
subsetofthemostgranular,detaileddimension.Conformeddimensions
have:
Consistentdimensionkeys
Consistentattributecolumnnames
Consistentattributedéfinitions
Consistentattributevalues
Dimensiontablesarenotconformediftheattributesarelabeled
differentlyorcontaindifferentvalues.Incasethedimensionslike
ProductorCustomeraredeployedinonconformedmannerthen
differentDataMartscan’tbemergedorusedtogether.

ThevariousflavorsofConformeddimensionsare:
Exactlysamedimensionsjoinedwitheverypossiblefacttablesacross
datamarts
Conformeddimensionsatarolleduplevelofgranularitylike
maintainingweeklyinventorysnapshotalongwithdailysnapshot.In
anothersituationlikesales&forecastingfactsaremaintainedatatomic
productlevel&brandlevelrespectively.Roll-updimensionsconformto
thebase-levelatomicdimensioniftheyareastrictsubsetofthatatomic
dimension
Product Dimension
Product Key
Prod Description
Brand Description
SKU Number
Category
Brand Dimensions
Brand Key
Description
Category
Conforms

Conformeddimensionssubsetsatthesamegranularity.
Appliance
Products
Apparel
Products
Enterprise Product Dimension
Drilling across (conforming)

Inmanyscenarios,intheprocessofidentifyingdimensionsoutof
transactionalsystemtablescompriseofmanymiscellaneousindicators
andflags,eachofwhichtakesonasmallrangeofdiscretevaluesthat
can’tbincludedinthedimensionsresultinginverylargefacttables.By
creatinganabstractdimension,theseflagsandindicatorsareremoved
fromthefacttable&placedintoadummydimension.Thesedimensions
arecalledJunkDimensions.
Fore.g.,ifweremove10two-valueindicators,suchasthecash
versuscreditpaymenttype,fromtheorderfacttableandplacetheminto
asingledimensioni.e.junkdimensionwith1024rows(2)withasingle
smallsurrogatekeyincludedinthefacttable.
Order Indicator
Key
Payment
Type
Payment Type
Group
Inbound
/Outbound
Commission Credit
Indicator
Order Type
Indicator
1 Cash Cash I C Regular
2 Cash Cash I N Display
3 Cash Cash O C Regular

ADegenerateddimensionisadatadimensionthatisalthough
dimensionalinnaturebutstoredinfacttableItdoesn’thaveany
separatedimensiontojoin&onecanuseittoslice&dicethemeasures
infacttable.
Fore.g.,dimensionkey,suchasatransactionnumber,invoicenumber,
ticketnumber,orbill-of-ladingnumber,thathasnoattributesbutisused
toprovideadirectreferencebacktoatransactionalsystemwithoutthe
overheadofmaintainingaseparatedimensiontable.
Sales Fact table
POS Transaction No (DD)
ProductID (FK)
Date Key(FK)
Store ID (FK)
Salesquantity
Gross Profit
Cost ($)
Selling Price($)
Promotion Key(FK)
Product Dim
Date Dim
Store Dim
Promotion
Dim

Arole-playinginadatawarehouseoccurswhenasingledimension
simultaneouslyappearsseveraltimesinthesamefacttable.The
underlyingdimensionmayexistasasinglephysicaltable,buteachof
therolesshouldbepresentedasaseparatelylabeledview.Fore.g.,the
Datedimensioncanbeusedfortheordereddate,scheduledshipping
date,shipmentdate,andinvoicedateinanorderlinefactorina
InsurancedomainaCustomerdimensioncanbeusedasnominee,
proposer&beneficiaryinaPolicyDetailFact.
Order Transaction Fact
Order Date Key(FK)
Shipped Date(FK)
Invoice Date (FK)
OrderNo(DD)
Order quantity
Product Key(FK)
Total Amount
DiscountAmount
Net Order Amount
Order Line No(DD)
Date Dimension
Product Dimension

Acharacteristicofdimensionsisthatitsdataisrelativelystatic—data
maybeaddedasnewrecord,butdata,assuchchangesinfrequently.
SlowlyChangingDimensions(SCDs)aredimensionsthathavedatathat
changesslowlyoveraperiodoftime,ratherthanbeingtimebarredor
scheduled.
Totrackchangesofthesedimensionsaremoredependentonbusiness
needs&canbeachievedthroughvariouswaysaspertherequirement.
Thetechnique/methodologytohandleormanageSCDsistermedas
Type0toType6.
TypesofSCD’s
Type0:ItisanapproachinwhichtheSCDismaintainedinthesame
formasitiscreated&thechangestotheexistingrecordsareignored.It
isapassiveapproachoftrackingthedimensionvaluechanges.

Type 1:Overwrite the Value
Withthetype1,weoverwritetheoldattributevalueinthedimension
row,replacingitwiththecurrentvalue.Insodoing,theattributealways
reflectsthemostrecentassignment.Thisismostappropriatefor
rectifyingthecertaintypeofdataerrorslikemisspellingofthename
,addressetcornovalueinkeepingolddescription.
Example:SupplierstatechangesfromCAtoNY.
Old
New
Pros:Easytomaintain&fast.
Cons:Nohistorycanbekept&therecalculationorloadingofthe
aggregatefacttablebasedonstate.
Supplier_KeySupplier_CodeSupplier NameSupplier State
123 XYZ MaxTrading CA
Supplier_KeySupplier_CodeSupplier NameSupplier State
123 XYZ MaxTrading NY

Type 2: Add a dimension row
TheType2methodtrackshistoricaldatabycreatingmultiplerecordsfor
agivenbusinesskeyinthedimensionaltableswithseparatesurrogate
keyswitheffectivedatetimeand/ordifferentversionnumbersor.Using
Type2,wecankeeptheentirehistoryofarecordsbecauseanew
recordisinsertedeachtimeachangeismade.
Thetype2responseistheprimarytechniqueforaccuratelytracking
slowlychangingdimensionattributes.Itisextremelypowerfulbecause
thenewdimensionrowautomaticallypartitionshistoryinthefacttable.
Example:SupplierstatechangesfromCAtoNY
Supplier_
Key
Supplier_C
ode
Supplier
Name
Supplier
State
Version
123 XYZ MaxTradingCA 0
123 XYZ MaxTradingNY 1
Type 2 : Versioning

Pros :
Historycanbekept&trackingofunlimiteddimensionchanges
Itperfectlysegmentsfacttablehistorybecauseprechangefactrows
usetheprechangesurrogatekey.
Cons:
Itcontributesinrapidgrowthofdimensiontables
Databaseoperationslikejoinsareexpensive
Supplier_K
ey
Supplier_C
ode
Supplier
Name
Supplier
State
Start Date EndDate
123 XYZ MaxTradingCA 01/11/2009 07/22/2010
123 XYZ MaxTradingNY 07/23/2010
Type 2: Effective Date Stamp Method

Type 3 : Add a Dimension Column
Type3solutionstrackchangeshorizontallyinthedimensiontableby
addingnewfieldstocontaintheolddata.Oftenonlytheoriginaland
currentvaluesareretainedandintermediatevaluesarediscardedbutat
timeswealsoretaintheprevious&currentvaluesonly.Thiskindoftypeis
usedwherewewanttocomparethecurrentdatavalueswithoriginalor
previousdatavaluesinonegoforthepurposelikesalesforce
reorganizationsetc.
Pros:Avoidanceofmultipledimensionrecordsforsingleentity.
Cons:Limitedhistorytracking&morecomplexqueriestoaccesstheold
values.
Supplier_Key Supplier_CodeSupplier NameOriginal Supplier
State
Current Supplier
State
123 XYZ MaxTrading CA NY

Predictable Changes with Multiple Version Overlays :
Insomecasesweneedtoretainthehistoryfor4-5yearsortimeswhere
thenumberisknowntousorwecanpredictthechangeforagiven
attribute.ItisanextensiontoType3wherewecankeeponaddingthe
columnforachangingattributeas&whenitchangeswhilekeepingthe
latestdatavalueinthecurrentvaluecolumn.For,e.g.,Salesrepdistrict
gotrevisedeveryyearortwothenwecandesigntheSalesRepDimtable
(asshowninfigure).IfhisDistrictchanges
laterthenweneedtoonlyaddanother
columnwithlabelDistrictfor2010withvalue
fromcurrentdistrict&overwritethe
currentdistrictwithnewvalue.
Sales Representative Dimension
Name
Key
Address
Current Sales District
District 2009
District 2008
…….

Unpredictable Changes with Single-Version Overlay:
Itbasicallycombinestheapproachofalltype1,2&3intosingletype.
Thismakesensewhenonehastopreservehistoricalaccuracy
surroundingunpredictableattributechangeswhilesupportingtheabilityto
reporthistoricaldataaccordingtothecurrentvalues.Thisisonlypossible
byclubbingallthe3typestogetherasshownbelow:
InthisnewdimensionrowforSupplier,thecurrentstatewillbeidenticalto
thehistoricalstate.ForallpreviousinstancesofthatSupplierdimension
rows,thecurrentstateattributewillbeoverwrittentoreflectthecurrent
structure.
Supplier
_Key
Supplier
_Code
Supplier Name Historical S -
upplierState
Current
Supplier State
Current
Flag
Start
Date
End
Date
123 XYZ MaxTrading CA IL N
124 XYZ MaxTrading NY IL N
125 XYZ MaxTrading IL IL Y

Arapidlychangingdimensionisadimensionifoneormoreofits
attributeschangesfrequentlyinmanyrows.Forarapidlychanging
dimension,thedimensiontablecangrowverylargefromtheapplication
ofnumerousType2changes.
Fore.g.,letstakeacaseofCustomerdimensionwith10Krecords
with10changepercustomerperyearwillresultin500krecordsin5
yearswhichisacceptablebutconsideranyfinancialorinsurance
organizationwherenotonlychangesbutalsothecustomerbaseis
hugethatcouldresultinadditionofmultimillionrecordovertheperiod
oftimeresultinginrapidlychangingmonsterdimensionproblemlike
browsingperformance&changetrackingchallenges.

Thesolutionistobreakofffrequentlyanalyzedorfrequentlychanging
attributesintoaseparatedimension,referredtoasaminidimension&
trackthemasband.Aseparatedimensionofvariablevalued
demographicattributes,suchasage,gender,numberofchildren,and
incomelevel,presumingthatthesecolumnsgetusedextensively.There
wouldbeonerowinthisminidimensionforeachuniquecombinationof
age,gender,numberofchildren,andincomelevelencounteredinthe
data,notonerowpercustomer(referfigurebelow).Businessneedswill
determinewhichcontinuouslyvariableattributesaresuitablefor
convertingtobands.
DemographicKeyAge Gender Income Level
1 20-24 M 0-$20000
2 20-24 M $20000-$24999
3 20-24 F 0-$20000
4 25-29 M 0-$20000
5 25-29 F 0-$20000

Everytimewebuildafacttablerow,weincludetwoforeignkeysrelated
tothecustomer:theregularcustomerdimensionkeyandthe
minidimensiondemographicskey.AsshowninFigurebelow,the
demographicskeyshouldbepartofthefacttable’ssetofforeignkeysin
ordertoprovideefficientaccesstothefacttablethroughthe
demographicsattributes.
Customer Dim
Customer Key (PK)
CustomerID (NK)
Customer Name
Address
DOB
…….
Age
Gender
Income
No of Children
Customer Dim
Customer Key (PK)
CustomerID (NK)
Customer Name
Address
……….
CustDemo Dim
C Demo Key (PK)
C Age Band
Gender
C Income Band
becomes
Fact Table
Customer Key (FK)
C Demo Key (FK)
More Foreign
Keys…..
Facts…….

The minidimensionterminology refers to when the demographics key is
part of the fact table composite key; if the demographics key is a foreign
key in the customer dimension, we refer to it as an outrigger which has
to be a Type 1 attribute
The best approach for efficiently browsing and tracking changes of key
attributes in really huge dimensions is to break off one or more
minidimensions from the dimension table, each consisting of small
clumps of attributes that have been administered to have a limited
number of values
To store exact values instead of the bands or ranges sometimes one
need to create fact less schema that focuses on attribute changes. In
this case the dimension & minidimension are connected via a dummy
fact table which only consists the keys from various dimension table but
no numeric measurement values

Afacttableistheprimarytableinadimensionalmodelwherethenumerical
performancemeasurementsofthebusinessarestored.Itbasicallyconsistsoftwo
typesofcolumns:onethosecontainmeasurements&theotherwhichareforeign
keytodimensionaltables.Thislistofdimensionsdefinesthegrainofthefacttable
andtellsuswhatthescopeofthemeasurementis.
Arowinafacttablecorrespondstoameasurement.Ameasurementisa
rowinafacttable.Allthemeasurementsinafacttablemustbeatthesame
grain.
AlsoFacttablesexpressthemany-to-manyrelationshipsbetween
dimensionsindimensionalmodels.
Theprimarykeyofafacttableisusuallyacompositekeythatismadeupofallof
itsforeignkeys.Facttablescontainthecontentofthedatawarehouseandstore
differenttypesofmeasureslikeadditive,nonadditive,andsemiadditivemeasures.

On the basis of measure :
Additive
Semi additive
Non additive
Fact less fact or Junction Fact
On the basis of measurement events:
Transactional snapshots
Periodic snapshots
Accumulating snapshots

AnAdditivefactsarethemeasurementsthatcanbesummedup
throughallofthedimensionsinthefacttable.Themostusefulfactsin
afacttablearenumericandadditive.Inthefigure,threeofthefacts,
salesquantity,salesdollaramount,andcostdollaramount,are
beautifullyadditiveacrossallthedimensions.Wecansliceanddicethe
facttablewithimpunity,andeverysumofthesethreefactsisvalidand
correct.
Retail Sales Transactions Fact
Date Key(FK)
ProductKey (FK)
Store Key(FK)
TxnNo
Sales Amount($)
Sales Quantity
CostAmount($)
Gross Profit($)
Gross Margin(%)

A Semi Additivefacts are the measurements that can be added across
few of the dimensions but not against all in the fact table. In the figure,
Bill amount is a semi additive fact because it can be added up for a date
or Store key to arrive at the total sales amount for the day or for a given
store but adding up against Product key doesn’t make any sense.
Retail Sales Transactions Fact
Date Key(FK)
ProductKey (FK)
Store Key(FK)
TxnNo
Sales Amount($)
Sales Quantity
CostAmount($)
Gross Profit($)
Gross Margin(%)
Bill Amount($)

ANonAdditivefactsarethemeasurementsthatcannotbeadded
acrossanyofthedimensionsinthefacttable.Percentagesandratios,
suchasgrossmargin,arenonadditive.Thenumeratoranddenominator
shouldbestoredinthefacttable.Theratiocanbecalculatedinadata
accesstoolforanysliceofthefacttablebyrememberingtocalculatethe
ratioofthesums,notthesumoftheratios.Inthefigure,UnitPrice&
Grossmarginarenonadditivefactsbecausethecan’tbesummarized
alonganydimension.
Retail Sales Transactions Fact
Date Key(FK)
ProductKey (FK)
Store Key(FK)
TxnNo
Sales Amount($)
Gross Profit($)
Gross Margin(%)
Unit Price($)

Afactlessfactsarethosetablesthatdoesn’thaveanymeasurement
metrics,itmerelycapturestherelationshipbetweentheinvolved
dimensionkey.Afacttablethathasnofactsbutcapturescertainmany
to-manyrelationshipsbetweenthedimensionkeys.Mostoftenusedto
representeventsorprovidecoverageinformationthatdoesnotappearin
otherfacttables.Fore.g.,todeterminewhatproductswhereon
promotionbutdidn’tsellrequiresaseparatepromotioncoveragefact
table,we’dloadonerowinthefacttableforeachproductonpromotion
inastoreeachday(orweek,sincemanyretailpromotionsareaweekin
duration)regardlessofwhethertheproductsoldornot.Thistableisfact
lessfactduetoabsenceofanymeasurements.
Promotion Coverage Fact
Date Key(FK)
ProductKey (FK)
Store Key(FK)
Promotion Key (FK)
Product Dim
Time Dim
Store Dim
Promotion Dim

Factsfrommultiplefacttablesareconformedwhenthetechnical
definitionsofthefactsareequivalent.Conformedfactsareallowed
tohavethesamenameinseparatetablesandcanbecombinedand
comparedmathematically.Iffactsdonotconform,thenthedifferent
interpretationsmustbegivendifferentnames.

Atransactionalsnapshotfacttablerepresentsapointoftimeinthe
lifeofbusinessevents.Arowexistsinthefacttableforagiven
dimensiononlyifatransactioneventoccurred.Transactionalfacttable
holdsdataofthemostdetailedlevel,causing
ittohaveagreatnumberofdimensions
associatedwithit.Onceatransactionhas
beenposted,needn’tberevisited.
Transactional Fact Table
TxnNo
Item ID (FK)
BillNo (DD)
Count of Item
Cost
Profit
SellingPrice
Store ID (FK)
Date Key(FK)
Tax Amount
Discount
Promotional Code (FK)

APeriodicsnapshotfactrepresentsapre-definedintervaloraperiod.
Unlikethetransactionfacttable,withtheperiodicsnapshot,we
takeapicture(hencethesnapshotterminology)of
theactivityattheendofaday,week,ormonth,
thenanotherpictureattheendofthenextperiod,
andsoon.Theperiodicsnapshotsarestacked
consecutivelyintothefacttable.Dailysnapshots
andmonthlysnapshotsarecommon.Aseparate
recordisplacedinaperiodicsnapshotfacttable
eachperiodregardlessofwhetheranyactivityhastakenplaceinthe
underlyingtransaction.
Periodic Fact Table
Item ID (FK)
Store ID (FK)
Date Key(FK)
Cost ($)
Last SellingPrice($)
Quantity on hand
Quantity Sold

Anaccumulatingsnapshotfacttablerepresentsbusinessactivities
overatimeperiod.Accumulatingsnapshotsalmost
alwayshavemultipledatestamps,representingthe
predictablemajoreventsorphasesthattakeplace
duringthecourseofalifetime.Oftenthere’san
additionaldatecolumnthatindicateswhenthe
snapshotrowwaslastupdated.Sincemanyofthese
datesarenotknownwhenthefactrowisfirstloaded,
wemustusesurrogatedatekeystohandleundefined
dates.Thefacttableisrevisitedandupdatedas
activityoccurs.ArecordisplacedinanAccumulating
snapshotfacttablejustonce,whentheitemthatit
representsisfirstcreatedrestofthetimeitisjust
updatedwithnewdatevalues.
Accumulating Snapshot
Fact Table
TxnNo
Order Date Key (FK)
Backlog Date Key (FK)
Release to Manufacturing
Date Key (FK)
Finished Inventory
Placement Date Key (FK)
Requested Ship Date Key
(FK)
Scheduled Ship Date Key
(FK)
Actual Ship Date Key (FK)
Arrival Date Key (FK)
Invoice Date Key (FK)
……..

Atablewithamultipartkeycapturingamany-to-manyrelationship
thatcan’tbeaccommodatedbythenaturalgranularityofasinglefact
tableorsingle-dimensiontable.Servestobridgebetweenthefacttable
andthedimensiontableinordertoallowmany-valueddimensionsor
raggedhierarchies.Sometimesreferredtoasahelperorassociative
table.Whenusingabridgetable,thefactsinthefacttablearemultiplied
bythebridgetable’sweightingfactortoappropriatelyallocatethefactsto
themultivalueddimension.ItiscalledWeightReport&ifweexcludethis
thenitturnsouttobeImpactreport.
Diagnosis Group
Bridge
Diagnosis Group
Key(FK)
Diagnosis Key(FK)
Weighted Factor
Diagnosis
Group Dim
Diagnosis
Group(PK)
Health Care
Billing Line Item
Fact
Diagnosis
Dimension

Hierarchiesarelogicalstructuresthatuseorderedlevelsasameans
oforganizingdata.Ahierarchycanbeusedtodefinedata
aggregation(lowerlevelsaggregated&rolleduptohigherlevels).
Forexample,inatimedimension,ahierarchymightaggregatedata
fromthemonthleveltothequarterleveltotheyearlevel.A
hierarchycanalsobeusedtodefineanavigationaldrillpathandto
establishafamilystructure.
Alevelrepresentsapositioninahierarchy
Levelrelationshipsspecifytop-to-bottomorderingoflevelsfrom
roottoleavesnode.
Eachlevelislogicallyconnectedtothelevelsabove(parent)and
below(children)it.
Adimensioncanbecomposedofmorethanonehierarchy.For
example,intheproductdimension,theremightbetwohierarchies--
oneforproductcategoriesandoneforproductsuppliers.

Balanced
Unbalanced
Ragged
Parentchildrelationship
CEO
CIO COO
Fin HeadIT Head
Emp2
HR Head
Emp1 Emp3 Emp4

Inbalancedhierarchies(balanced/standard),thebranchesofthe
hierarchyalldescendtothesamelevel,witheachmember'sparent
beingatthelevelimmediatelyabovethemember.Theyare
consistentbecauseeachlevelrepresentsthesametypeof
information,andeachlevelislogicallyequivalent.Ancommon
exampleofabalancedhierarchyisonethatrepresentstime,where
thedepthofeachlevel(year,quarter,andmonth)isconsistent.
2009
1
st
Quarter
Jan MarFeb
2010
1
st
Quarter
Jan MarFeb

Unbalancedhierarchiesincludelevelsthathaveaconsistentparent-
childrelationship,buthavelogicallyinconsistentlevels.The
hierarchybranchescanalsohaveinconsistentdepths.Fore.g.,the
organizationalstructureisunbalanced,withsomebranchesinthe
hierarchyhavingmorelevelsthanothers.Inanunbalanced
hierarchy,nullvaluescanappearonthelowerlevelsofthe
hierarchy.
CEO
CIO COO
IT Head
Emp2
Admin 1
Emp1
CFO
Admin 2Fin 1 Fin 2
Emp3

Inaraggedhierarchy,thelogicalparentmemberofatleastone
memberisnotinthelevelimmediatelyabovethemember.Thiscan
causebranchesofthehierarchytodescendtodifferentlevels.A
raggedhierarchycanrepresentageographichierarchyinwhichthe
meaningofeachlevelsuchascityorcountryisusedconsistently,
butthedepthofthehierarchyvaries
North America
Athens
Greece
Europe
US
CA
San Francisco

A parent-child hierarchyis a hierarchy with multiple levels that track
the relationships within the hierarchy. A single table or view is used
that represents the parent-child hierarchy. A view can be used to
flatten the structure in case this kind of hierarchy used multiple
tables . The top level uses the parent key as the level key, whereas
the bottom level contains the child key. For example, in a hierarchy
that represents an organizational structure, you can have two levels:
Manager and Employee. The Manager level is the parent level, and
the Employee level is the child level
Tags