Chapter 1- Introduction to Data Warehousing, OLAP & OLTP Architecture Text Book: Modern Data Warehousing, Mining, and Visualization: Core Concepts by George M. Marakas 2
Learning Objectives The main objective of this lesson is to introduce the students with the basic concept and terminology relating to Data Warehousing. By the end of this lesson the students will be able to understand: Meaning of a Data warehouse Evolution of Data warehouse Data Warehouse Roles and structures Explain the students the significance and difference between Operational systems and Informational systems. This lesson also includes various characteristics of a Data warehouse and data Marts. 3
Major Topics Data Warehouse: Basic Concepts Data Warehousing Data Warehouse Roles and Structures Data Marts Data Marts and the Data Warehouse Online Analytical Processing, OLAP Online Transaction Processing, OLTP OLAP vs OLTP 4
Data Warehouse “A data warehouse is a subject-oriented, integrated, time-variant, and nonvolatile collection of data in support of management’s decision-making process.”—W. H. Inmon A data warehouse is simply a single, complete, and consistent store of data obtained from a variety of sources and made available to end users in a way they can understand and use it in a business context. Note that the data warehouse contains a copy of the transactions. These are not updated or changed later by the transaction system. Also note that this data is specially structured, and may have been transformed when it was placed in the warehouse 5
Where it is used? It is used for evaluating future strategy. In order for data to be effective. DW must be: Consistent, well integrated, well defined, time stamped DW environment: The data store, data mart & the metadata. Data warehousing Data warehousing is a process, not a product, for assembling and managing data from various sources for the purpose of gaining a single, detailed view of part or all of the business.” The process whereby organizations extract meaning from their informational assets through the use of data warehouses 6
Characteristics of Data Warehouse 1. Data Warehouse—Subject-Oriented Organized around major subjects, such as customer, product, sales. Focusing on the modeling and analysis of data for decision makers, not on daily operations or transaction processing. Provide a simple and concise view around particular subject issues by excluding data that are not useful in the decision support process. 7
8
2. Data Warehouse—Integrated Constructed by integrating multiple, heterogeneous data sources, relational databases, flat files, on-line transaction records Data cleaning and data integration techniques are applied. Ensure consistency in naming conventions, encoding structures, attribute measures, etc. among different data sources E.g., Hotel price: currency, tax, breakfast covered, etc. When data is moved to the warehouse, it is converted. 9
10
3. Data Warehouse—Time Variant The time horizon for the data warehouse is significantly longer than that of operational systems. Operational database: current value data. Data warehouse data: provide information from a historical perspective (e.g., past 5-10 years) Every key structure in the data warehouse contains an element of time, explicitly or implicitly but the key of operational data may or may not contain “time element”. 11
4. Data Warehouse—Non-Volatile A physically separate store of data transformed from the operational environment. Operational update of data does not occur in the data warehouse environment. Does not require transaction processing, recovery, and concurrency control mechanisms. Requires only two operations in data accessing: initial loading of data and access of data. 12
Data Warehouse Roles and Structures The DW has the following primary functions: It is a direct reflection of the business rules of the enterprise. It is the collection point for strategic information. It is the historical store of strategic information. It is the source of information later delivered to data marts. It is the source of stable data regardless of how the business processes may change. 13
Position of the Data Warehouse within the Organization 14
Data Marts A data mart is a smaller, more focused data warehouse. It reflects the business rules of a specific business unit. The data mart does not need to cleanse its data because that was done when it went into the warehouse. It is a set of tables for direct access by users. These tables are designed for aggregation. It typically is not a source for traditional statistical analysis. 15
Position of the Data Mart within the Organization 16
What Can a Data Warehouse Do? Some of the benefits of a DW are: Immediate information delivery Data integration from across and even outside the organization Future vision from historical trends Tools for looking at data in new ways Freedom from IS department resource limitations (you don’t need programmers to use a data warehouse) 17
Examples of common Datawarehousing Applications 1. Sales Analysis Determine real-time product sales to make vital pricing and distribution decisions. Analyze historical product sales to determine success or failure attributes. Evaluate successful products and determine key success factors. Use corporate data to understand the margin as well as the revenue implications of a decision. Rapidly identify a preferred customer segments based on revenue and margin. Quickly isolate past preferred customers who no longer buy. Identify daily what product is in the manufacturing and distribution pipeline. Instantly determine which salespeople are performing, on both a revenue and margin basis, and which are behind. 18
2. Financial Analysis Compare actual to budgets on an annual, monthly and month-to-date basis. Review past cash flow trends and forecast future needs. Identify and analyze key expense generators. Instantly generate a current set of key financial ratios and indicators. Receive near-real-time, interactive financial statements. 19
3. Human Resource Analysis Evaluate trends in benefit program use. Identify the wage and benefits costs to determine company-wide variation. Review compliance levels for EEOC and other regulated activities. 4. Other Areas Warehouses have also been applied to areas such as: logistics, inventory, purchasing, detailed transaction analysis and load balancing. 20
Data Marts and the Data Warehouse Legacy systems feed data to the warehouse. The warehouse feeds specialized information to departments. 21
The Data Mart is More Specialized The data mart serves the needs of one business unit, not the organization. 22
Data Warehouse for Decision Support & OLAP Putting Information technology to help the knowledge worker make faster and better decisions Data Warehousing provides the Enterprise with a memory Data Mining provides the Enterprise with intelligence. 23
Online Transaction Processing (OLTP) OLTP (online transaction processing) is a class of software programs capable of supporting transaction -oriented applications on the Internet. Typically, OLTP systems are used for order entry, financial transactions, customer relationship management ( CRM ) and retail sales. Such systems have a large number of users who conduct short transactions. 24
Database queries are usually simple, require sub-second response times and return relatively few records. Online Transaction Processing (OLTP) system collects data electronically and process the transactions online Backbone of all functional, cross-functional, and inter-organizational systems in an organization An important attribute of an OLTP system is its ability to maintain concurrency. To avoid single points of failure , OLTP systems are often decentralized. IBM's CICS (Customer Information Control System) is a well-known OLTP product. 25
Data Warehouse vs. Heterogeneous DBMS Traditional heterogeneous DB integration: Build wrappers/mediators on top of heterogeneous databases Query driven approach When a query is posed to a client site, a meta-dictionary is used to translate the query into queries appropriate for individual heterogeneous sites involved, and the results are integrated into a global answer set Complex information filtering, compete for resources Data warehouse: update-driven, high performance Information from heterogeneous sources is integrated in advance and stored in warehouses for direct query and analysis 26
OLTP Compared with OLAP OLTP (On-Line Transaction Processing) Maintains a database that is an accurate model of some real-world enterprise. Supports day-to-day operations Major task of traditional relational DBMS Day-to-day operations: purchasing, inventory, banking, manufacturing, payroll, registration, accounting, etc. Characteristics: Short simple transactions Relatively frequent updates Transactions access only a small fraction of the database 27
OLAP (On-Line Analytical Processing) Uses information in database to guide strategic decisions Major task of data warehouse system Data analysis and decision making Characteristics: Complex queries Infrequent updates Transactions access a large fraction of the database Data need not be up-to- update 28