Course Outline
Week 1: Introduction to Business Intelligence , data warehouses and
Corporate Information
Week 2: Data warehouse architectures
Week 3: Data warehouse lifecycle and project management
Week 4: Introduction to Dimensional Modeling
Week 5: Building dimensional models – tutorial session
Week 6: Test 1 and Data extraction, transformation and loading(ETL)
Week 7: Online analytical processing (OLAP)
Week 8: Data mining
Week 9: Data warehousing and the Web
Week 10: Data warehousing and the Web
Week 11: Matching information to users
2
Introduction
Data is an enterprise’s most valuable asset. To
fuel innovation, which fuels growth, an
enterprise must:
Store every relevant data point about their
business
Give data access to everyone who needs it
Have the ability to analyze the data in
different ways
Distill the data down to insights
3
Introduction
Data
Items that are the most elementary descriptions of
things, events, activities, and transactions
May be internal or external
Information
Organized data that has meaning and value
Knowledge
Processed data or information that conveys
understanding or learning applicable to a problem
or activity
4
Introduction
Over the years, storage and management of data
from various operational systems has become a
great challenge.
Long-term strategic planning has become
increasingly important in the modern global
market.
For this reason, companies have worked towards:
Access to information at all levels
Survival and prosperity in a competitive world.
The focus of technology shifted from data input
and capture through the operational systems to
information access and availability for strategic
planning.
5
6
Executives and managers need information to
formulate the business strategies, establish goals, set
objectives, and monitor results.
Examples of business objectives include:
Retain the present customer base
Increase the customer base by 15% over the next 5
years
Gain market share by 10% in the next 3 years
Improve product quality levels in the top five product
groups
Enhance customer service level in shipments
Bring three new products to market in 2 years
Increase sales by 15% in a given supermarket branch
7
What is a Data Warehouse (DWH)?
A decision support database that is maintained
separately from the organization’s operational database
Support information processing by providing a solid
platform of consolidated, historical data for analysis.
“A data warehouse is a subject-oriented,
integrated, time-variant, historical and non-
volatile collection of data in support of
management’s decision-making process.”—
W. H. Inmon
8
9
Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer,
product, sales.
Focusing on the modeling and analysis of data for
decision makers, not on daily operations or
transaction processing.
Provide a simple and concise view around particular
subject issues by excluding data that are not useful in
the decision support process.
10
Data Warehouse—Integrated
Constructed by integrating multiple,
heterogeneous data sources
relational databases, flat files, on-line transaction
records
Data cleaning and data integration techniques
are applied.
Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different
data sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.
11
Data Warehouse—Time Variant
The time horizon for the data warehouse is
significantly longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse
Contains an element of time, explicitly or implicitly
But the key of operational data may or may not contain
“time element”.
12
Data Warehouse—Non-Volatile
A physically separate store of data transformed
from the operational environment.
Operational update of data does not occur in
the data warehouse environment.
Does not require transaction processing, recovery,
and concurrency control mechanisms
Requires only two operations in data accessing:
initial loading of data and access of data.
Basic concept of data warehousing
Take all the data from the operational systems
Where necessary, include relevant data from
outside, such as industry benchmark indicators
Integrate all the data from the various sources
Remove inconsistencies and transform the
data
Store the data in formats suitable for easy
access for decision making
13
A blend of technologies
14
15
Although a simple concept, it involves different
functions: data extraction, the function of loading
the data, transforming the data, storing the data,
and providing user interfaces.
Different technologies are, therefore, needed to
support these functions. Although many
technologies are in use, they all work together in a
data warehouse.
The end result is the creation of a new computing
environment for the purpose of providing the
strategic information every enterprise needs
desperately.
There are several vendor tools available in each of
these technologies. You do not have to build your
data warehouse from scratch.
16
Data Warehouse vs. traditional integration in
Heterogeneous DBMS
Traditional heterogeneous DB integration:
Build wrappers/mediators on top of heterogeneous databases
Query driven approach
When a query is posed to a client site, a meta-dictionary is used to
translate the query into queries appropriate for individual
heterogeneous sites involved, and the results are integrated into a
global answer set
Complex information filtering, compete for resources
Data warehouse: update-driven, high performance
Information from heterogeneous sources is integrated in advance
and stored in warehouses for direct query and analysis
17
Data Warehouse (OLAP) vs. Operational DBMS (OLTP)
OLTP (on-line transaction processing)
Major task of traditional relational DBMS
Day-to-day operations such as purchasing, inventory,
banking, manufacturing, payroll, registration,
accounting, etc.
OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making
Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries
18
OLTP vs. OLAP
OLTP OLAP
users clerk, IT professional knowledge worker
function day to day operations decision support
DB design application-oriented subject-oriented
data current, up-to-date
detailed, flat relational
isolated
historical,
summarized, multidimensional
integrated, consolidated
usage repetitive ad-hoc
access read/write
index/hash on prim. key
lots of scans
unit of work short, simple transaction complex query
# records accessed tens millions
#users thousands hundreds
DB size 100MB-GB 100GB-TB
metric transaction throughput query throughput, response
Successfully implemented data warehouses can bring
benefits to an organization as below:
1. Organisational Performance: - Operational Systems and
Data Warehouse
2. Simplify - Make Complex Data from Many Systems
available in one
3. Accuracy - Standardize and Cleanse
4. Business Value - Provide the Foundation for the Business
to Have Access to Information to Make Timely, Informed
Decisions
5. Direct Use- Non-IT personnel can make reports
19
20
Typical data warehouse queries (Case study: Banking
industry)
• Which corporate customers are above the average
account usage per month and how does this correlate to
their business?
• Who were the first hundred customers in Jan 2006 and
how does this list compare with the list for the previous
three years?
• What is the revenue by destination, by month, by
business unit, by region?
Complexities of Creating a Data Warehouse
Incomplete errors
Missing Fields
Records or Fields That, by Design, are not Being
Recorded
Incorrect errors
Wrong Calculations, Aggregations
Duplicate Records
Wrong Information Entered into Source
System
Inconsistency errors
Inconsistent Use of Different Codes
21
Best Practices
Data Warehousing is a process and not a
project
Complete requirements and design
Prototyping is key to business
understanding
Utilizing proper aggregations and detailed
data
A full iterative approach is essential
Training is an on-going process
Build data integrity checks into your system
22
23
•High investment
•The initial cost of building a data warehouse is very high
and ROI cannot easily be explained.
• Large storage
•Data warehouse stores useful historical data of an
enterprise.
• Maintenance of source systems
• If data source systems are not cleaned, we automatically
get dirty data into the data warehouse. Decision makers
using such data are likely to be misled and their decisions
may lead to loss of company revenue.
• Qualified staff
•Data warehouse building and maintenance requires
skilled personnel.
24
•New insights into
•Customer habits
•Developing new products
•Selling more products
•Cost savings and revenue increases
•Cross-selling of products
•Identify and target most profitable customers
Conclusions
Building data warehouse is good but not
sufficient. The data in a data warehouse has to
be accessed by users and in order to access it; a
BI tool has to be used.
25
All processes, techniques, and tools that support
business decision making based on information
technology.
The approaches can range from a simple excel
spreadsheet to a major competitive intelligence
undertaking.
e.g. data visualization, data mining, statistical
analysis using R, SPSS, etc
What is Business Intelligence (BI)?
26
Discussion Questions
The current trends in hardware/software technology make
data warehousing feasible. Explain via some examples how
exactly technology trends do help.
You are the IT Director of a nationwide insurance company.
Write a memo to the Executive Vice President explaining the
types of opportunities that can be realized with readily
available strategic information.
For an airlines company, how can strategic information
increase the number of frequent flyers? Discuss giving
specific details.
You are a Senior Analyst in the IT department of a company
manufacturing automobile parts. The marketing VP is
complaining about the poor response by IT in providing
strategic information. Explain the reasons for the problems
and why a data warehouse would be the only viable solution.
27