Difference between fact tables and dimension tables
kamikhan17
1,714 views
5 slides
Jan 07, 2020
Slide 1 of 5
1
2
3
4
5
About This Presentation
Difference between fact tables and dimension tables
Size: 391.09 KB
Language: en
Added: Jan 07, 2020
Slides: 5 pages
Slide Content
Assignment
Title Fact & Dimensions
Subject Data Warehouse
Submitted by Kamran Haider
Roll No BSIT-2016-20
Class BSIT (6
th
)
Submitted to Sir Samar Abbas
Dated 16-04-2019
Department of Cs &IT
Ghazi University Dera Ghazi Khan
Difference between Fact tables and Dimension Tables:
Fact Tables Dimension Tables
The fact table mainly consists of business
facts and foreign keys that refer to primary
keys in the dimension tables.
A dimension table consists mainly of
descriptive attributes that are textual fields
A fact table is the central table in a star
schema of a data warehouse
Data warehouses are built using dimensional
data models which consist of fact
and dimension tables. Dimension tables are
used to describe dimensions; they
contain dimension keys, values and
attributes.
The fact table consists of two types of
columns. The foreign keys column allows
joins with dimension tables, and the
measures columns contain the data that is
being analyzed.
A dimension table typically has two types of
columns, primary keys to fact tables and
textual\descriptive data.
The fact table contains business facts (or
measures), and foreign keys which refer to
candidate keys (normally primary keys) in
the dimension tables.
Dimension tables contain descriptive
attributes (or fields) that are typically textual
fields (or discrete numbers that behave like
text).
A fact table consists of facts of a particular
business process e.g., sales revenue by
month by product.
For example, a large credit card company
could have a customer dimension with
millions of rows
Example Table
Example Table
What are surrogate key?
Surrogate Keys are integers that are assigned sequentially in the dimension table which can be
used as primary key. The surrogate key column could be identity column or database
sequences are used.
These surrogate keys are used to join dimensioned fact tables.
Usually, database sequences are used to generate surrogate key so it is
always unique number.
Surrogate keys cannot be NULLs. Surrogate key are never populated with
NULL values.
It does not hold any meaning in data warehouse, often called meaningless
numbers. It is just sequentially generated INTEGER number for better lookup
and faster joins.
Why surrogate keys are used in Data warehouse?
Basically, surrogate key is an artificial key that is used as a substitute for natural key
(NK) defined in data warehouse tables. We can use natural key or business keys as a
primary key for tables. However, it is not recommended because of following reasons:
Natural keys (NK) or Business keys are generally alphanumeric values that is
not suitable for index as traversing become slower. For example, prod123 and
prod231 etc…
Business keys are often reused after sometime. It will cause the problem as in
data warehouse we maintain historic data as well as current data.
For example, product codes can be revised and reused after few years. It will become
difficult to differentiate current products and historic products. To avoid such a
situation, surrogate keys are used.
Advantages of Surrogate Key
Below are some of advantages of using surrogate keys in data warehouse:
With help of surrogate keys, you can integrate heterogeneous data sources to
data warehouse if they don’t have natural or business keys.
Joining tables (fact and dimensions) using surrogate key is faster hence better
performance
Surrogate keys are very helpful for ETL transformations.
Data warehouse Surrogate keys are usually small integer numbers that makes
smaller index and better performance
Surrogate keys are required if you are implementing slowly changing dimension
(SCD)
Disadvantages of Surrogate Key
Below are some of disadvantages of using surrogate keys in data warehouse:
Surrogate key generation and assignment takes unnecessary burden on ETL
framework
You should not over use the surrogate keys as they don’t have any meaning in
data warehouse tables.
Data migration becomes difficult if you have database sequence associated
with surrogate key columns. You should carefully take care of number
surrogate key generation in new database otherwise you may end up with
duplicate surrogate keys.
What are natural keys?
A natural key (also known as business key) is a type of unique key, found in relational model
database design, that is formed of attributes that already exist in the real world. It is used in
business-related columns.
A natural key is a column or set of columns that already exist in the table (e.g. they are
attributes of the entity within the data model) and uniquely identify a record in the table. Since
these columns are attributes of the entity they obviously have business meaning. The following
is an example of a table with a natural key (SSN column) along with some sample data. Notice
that the key for the data in this table has business meaning.
Natural Key Pros
Key values have business meaning and can be used as a search key when querying the
table
Column(s) and primary key index already exist so no disk extra space is required for the
extra column/index that would be used by a surrogate key column
Fewer table joins since join columns have meaning. For example, this can reduce disk
IO by not having to perform extra reads on a lookup table
Natural Key Cons
May need to change/rework key if business requirements change. For example, if you
used SSN for your employee as in the example above and your company expands
outside of the United States not all employees would have a SSN so you would have to
come up with a new key.
More difficult to maintain if key requires multiple columns. It's much easier from the
application side dealing with a key column that is constructed with just a single column.
Poorer performance since key value is usually larger and/or is made up of multiple
columns. Larger keys will require more IO both when inserting/updating data as well as
when you query.
Can't enter record until key value is known. It's sometimes beneficial for an application
to load a placeholder record in one table then load other tables and then come back and
update the main table.
Can sometimes be difficult to pick a good key. There might be multiple candidate keys
each with their own trade-offs when it comes to design and/or performance.