Introduction to Business Intelligence and Data warehousing - ppt

nansambakuluthum7 22 views 27 slides Aug 08, 2024
Slide 1
Slide 1 of 27
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27

About This Presentation

For data mining and cleaning


Slide Content

By:
Dr. Alice Mugisha
[email protected]

Course Outline
Week 1: Introduction to Business Intelligence , data warehouses and
Corporate Information
Week 2: Data warehouse architectures
Week 3: Data warehouse lifecycle and project management
Week 4: Introduction to Dimensional Modeling
Week 5: Building dimensional models – tutorial session
Week 6: Test 1 and Data extraction, transformation and loading(ETL)
Week 7: Online analytical processing (OLAP)
Week 8: Data mining
Week 9: Data warehousing and the Web
Week 10: Data warehousing and the Web
Week 11: Matching information to users
2

Introduction
Data is an enterprise’s most valuable asset. To
fuel innovation, which fuels growth, an
enterprise must:
Store every relevant data point about their
business
Give data access to everyone who needs it
Have the ability to analyze the data in
different ways
Distill the data down to insights
3

Introduction
Data
Items that are the most elementary descriptions of
things, events, activities, and transactions
May be internal or external
Information
Organized data that has meaning and value
Knowledge
Processed data or information that conveys
understanding or learning applicable to a problem
or activity
4

Introduction
Over the years, storage and management of data
from various operational systems has become a
great challenge.
Long-term strategic planning has become
increasingly important in the modern global
market.
For this reason, companies have worked towards:
Access to information at all levels
Survival and prosperity in a competitive world.
The focus of technology shifted from data input
and capture through the operational systems to
information access and availability for strategic
planning.
5

6
Executives and managers need information to
formulate the business strategies, establish goals, set
objectives, and monitor results.
Examples of business objectives include:
Retain the present customer base
Increase the customer base by 15% over the next 5
years
Gain market share by 10% in the next 3 years
Improve product quality levels in the top five product
groups
Enhance customer service level in shipments
Bring three new products to market in 2 years
Increase sales by 15% in a given supermarket branch

7

What is a Data Warehouse (DWH)?
 A decision support database that is maintained
separately from the organization’s operational database
Support information processing by providing a solid
platform of consolidated, historical data for analysis.
“A data warehouse is a subject-oriented,
integrated, time-variant, historical and non-
volatile collection of data in support of
management’s decision-making process.”—
W. H. Inmon
8

9
Data Warehouse—Subject-Oriented
Organized around major subjects, such as customer,
product, sales.
Focusing on the modeling and analysis of data for
decision makers, not on daily operations or
transaction processing.
Provide a simple and concise view around particular
subject issues by excluding data that are not useful in
the decision support process.

10
Data Warehouse—Integrated
Constructed by integrating multiple,
heterogeneous data sources
relational databases, flat files, on-line transaction
records
Data cleaning and data integration techniques
are applied.
Ensure consistency in naming conventions, encoding
structures, attribute measures, etc. among different
data sources
E.g., Hotel price: currency, tax, breakfast covered, etc.
When data is moved to the warehouse, it is converted.

11
Data Warehouse—Time Variant
The time horizon for the data warehouse is
significantly longer than that of operational systems.
Operational database: current value data.
Data warehouse data: provide information from a
historical perspective (e.g., past 5-10 years)
Every key structure in the data warehouse
Contains an element of time, explicitly or implicitly
But the key of operational data may or may not contain
“time element”.

12
Data Warehouse—Non-Volatile
A physically separate store of data transformed
from the operational environment.
Operational update of data does not occur in
the data warehouse environment.
Does not require transaction processing, recovery,
and concurrency control mechanisms
Requires only two operations in data accessing:
initial loading of data and access of data.

Basic concept of data warehousing
Take all the data from the operational systems
 Where necessary, include relevant data from
outside, such as industry benchmark indicators
 Integrate all the data from the various sources
 Remove inconsistencies and transform the
data
 Store the data in formats suitable for easy
access for decision making
13

A blend of technologies
14

15
Although a simple concept, it involves different
functions: data extraction, the function of loading
the data, transforming the data, storing the data,
and providing user interfaces.
Different technologies are, therefore, needed to
support these functions. Although many
technologies are in use, they all work together in a
data warehouse.
The end result is the creation of a new computing
environment for the purpose of providing the
strategic information every enterprise needs
desperately.
There are several vendor tools available in each of
these technologies. You do not have to build your
data warehouse from scratch.

16
Data Warehouse vs. traditional integration in
Heterogeneous DBMS
Traditional heterogeneous DB integration:
Build wrappers/mediators on top of heterogeneous databases
Query driven approach
When a query is posed to a client site, a meta-dictionary is used to
translate the query into queries appropriate for individual
heterogeneous sites involved, and the results are integrated into a
global answer set
Complex information filtering, compete for resources
Data warehouse: update-driven, high performance
Information from heterogeneous sources is integrated in advance
and stored in warehouses for direct query and analysis

17
Data Warehouse (OLAP) vs. Operational DBMS (OLTP)
OLTP (on-line transaction processing)
Major task of traditional relational DBMS
Day-to-day operations such as purchasing, inventory,
banking, manufacturing, payroll, registration,
accounting, etc.
OLAP (on-line analytical processing)
Major task of data warehouse system
Data analysis and decision making
Distinct features (OLTP vs. OLAP):
User and system orientation: customer vs. market
Data contents: current, detailed vs. historical, consolidated
Database design: ER + application vs. star + subject
View: current, local vs. evolutionary, integrated
Access patterns: update vs. read-only but complex queries

18
OLTP vs. OLAP
OLTP OLAP
users clerk, IT professional knowledge worker
function day to day operations decision support
DB design application-oriented subject-oriented
data current, up-to-date
detailed, flat relational
isolated
historical,
summarized, multidimensional
integrated, consolidated
usage repetitive ad-hoc
access read/write
index/hash on prim. key
lots of scans
unit of work short, simple transaction complex query
# records accessed tens millions
#users thousands hundreds
DB size 100MB-GB 100GB-TB
metric transaction throughput query throughput, response

Successfully implemented data warehouses can bring
benefits to an organization as below:
1. Organisational Performance: - Operational Systems and
Data Warehouse
2. Simplify - Make Complex Data from Many Systems
available in one
3. Accuracy - Standardize and Cleanse
4. Business Value - Provide the Foundation for the Business
to Have Access to Information to Make Timely, Informed
Decisions
5. Direct Use- Non-IT personnel can make reports
19

20
Typical data warehouse queries (Case study: Banking
industry)
• Which corporate customers are above the average
account usage per month and how does this correlate to
their business?
• Who were the first hundred customers in Jan 2006 and
how does this list compare with the list for the previous
three years?
• What is the revenue by destination, by month, by
business unit, by region?

Complexities of Creating a Data Warehouse
Incomplete errors
Missing Fields
Records or Fields That, by Design, are not Being
Recorded
Incorrect errors
Wrong Calculations, Aggregations
Duplicate Records
Wrong Information Entered into Source
System
Inconsistency errors
Inconsistent Use of Different Codes
21

Best Practices
Data Warehousing is a process and not a
project
Complete requirements and design
Prototyping is key to business
understanding
Utilizing proper aggregations and detailed
data
A full iterative approach is essential
Training is an on-going process
Build data integrity checks into your system
22

23
•High investment
•The initial cost of building a data warehouse is very high
and ROI cannot easily be explained.
• Large storage
•Data warehouse stores useful historical data of an
enterprise.
• Maintenance of source systems
• If data source systems are not cleaned, we automatically
get dirty data into the data warehouse. Decision makers
using such data are likely to be misled and their decisions
may lead to loss of company revenue.
• Qualified staff
•Data warehouse building and maintenance requires
skilled personnel.

24
•New insights into
•Customer habits
•Developing new products
•Selling more products
•Cost savings and revenue increases
•Cross-selling of products
•Identify and target most profitable customers

Conclusions
 Building data warehouse is good but not
sufficient. The data in a data warehouse has to
be accessed by users and in order to access it; a
BI tool has to be used.
25

All processes, techniques, and tools that support
business decision making based on information
technology.
The approaches can range from a simple excel
spreadsheet to a major competitive intelligence
undertaking.
e.g. data visualization, data mining, statistical
analysis using R, SPSS, etc
What is Business Intelligence (BI)?
26

Discussion Questions
The current trends in hardware/software technology make
data warehousing feasible. Explain via some examples how
exactly technology trends do help.
You are the IT Director of a nationwide insurance company.
Write a memo to the Executive Vice President explaining the
types of opportunities that can be realized with readily
available strategic information.
For an airlines company, how can strategic information
increase the number of frequent flyers? Discuss giving
specific details.
You are a Senior Analyst in the IT department of a company
manufacturing automobile parts. The marketing VP is
complaining about the poor response by IT in providing
strategic information. Explain the reasons for the problems
and why a data warehouse would be the only viable solution.
27
Tags