The seminar is about Data warehousing, in here we are gonna discuss about what is data warehousing, comparison b/w database and data warehouse, different data warehouse models.about Data mart, and disadvantages of data warehousing.
Size: 888.83 KB
Language: en
Added: Oct 31, 2014
Slides: 29 pages
Slide Content
welcome
DATA WAREHOUSING Presented By, Neenu C. Paul(12120051) CS B, S7 SOE, CUSAT Guided By, Dr. Sudheep Elayidom Division of Computer Science SOE, CUSAT
CONTENTS What is a data warehouse? What is data warehousing? Database vs Data warehouse OLTP & OLAP Data warehouse architecture Multidimensional data model Data Mart ETL Advantages of data warehouse Disadvantages of data warehouse S/W Solutions of data warehouse Conclusion References
A producer wants to know…. Which are our lowest/highest margin customers ? Who are my customers and what products are they buying? What is the most effective distribution channel? What product prom- - otions have the biggest impact on revenue? What impact will new products/services have on revenue and margins? Which customers are most likely to go to the competition ?
What is a Data Warehouse?? A data warehouse is an appliance for storing and analyzing data , and reporting . C entral database that includes information from several different sources. Keeps current as well as historical data. U sed to produce reports to assist in decision-making and management .
“ Data Warehouse is a subject oriented , integrated , time-variant and non-volatile collection of data in support of management’s decision making process.” – W. H. Inmon
What is Data Warehousing? A process of transforming data into information and making it available to users in a timely enough manner to make a difference Data Information
Database vs Data Warehouse Database Transaction Oriented For saving online bargain data E-R modeling techniques are used for designing Capture data Constitute real time information Data Warehouse Subject oriented For saving historical data Data modeling techniques are used for designing. Analyze data Constitute entire information base for all time.
Data Processing Technologies OLTP (on-line transaction processing) The major task is to perform on-line transaction and query processing. Covers most of the day-to-day operations of an organization. OLAP(On-Line Analytical Processing ) Serve knowledge workers(users) in the role of data analysis and decision making . O rganize and present data in various formats to accommodate the diverse needs of the different users. Data Processing Technologies OLTP OLAP
OLTP vs OLAP
11 July 17, 2014 To summarize ... OLTP Systems are used to “run” a business The Data Warehouse helps to “optimize” the business
Typical DW Architecture System B System C System D System A Extract Transform Load The Data Warehouse Business Model Self Serve Data Sources ETL Data Store Data Access Presentation Prompted Views Dashboards Scorecards Ad-Hoc Reporting 12
Multidimensional data model Developed for implementing data warehouse and data marts. Provides both a mechanism to store data and a way for business analysis. An alternative to entity-relationship (E/R) model TYPES OF MULTIDIMENSIONAL DATA MODEL Data cube model. Star schema model. Snow flake schema model. Fact Constellations.
Data cubes A data warehouse is based on a multidimensional data model which views data in the form of a data cube. Three important concepts are associated with data cubes - Slicing - Dicing - Rotating In the cube given below we have the results of the 1991 Canadian Census with ethnic origin, age group and geography representing the dimensions of the cube, while 174 represents the measure. The dimension is a category of data. Each dimension includes different levels of categories. The measures are actual data values that occupy the cells as defined by the dimensions selected.
1991 Canadian Census 15
Slicing the Data Cube Figure 2 illustrates slicing the Ethnic origin Chinese. When the cube is sliced like in this example, we are able to generate data for Chinese origin for the geography and age groups as a result. The data that is contained within the cube has effectively been filtered in order to display the measures associated only with the Chinese ethnic origin. From an end user perspective, the term slice most often refers to a two- dimensional page selected from the cube . 16
Dicing and Rotating Dicing is a related operation to slicing in which a sub-cube of the original space is defined Dicing provides the user with the smallest available slice of data, enabling you to examine each sub-cube in greater detail. Rotating, which is sometimes called pivoting changes the dimensional orientation of the report or page display from the cube data. Rotating may consist of swapping the rows an columns, or moving one of the row dimensions into the column dimension . 17
Data Mart Contains a subset of the data stored in the data warehouse that is of interest to a specific business community, department, or set of users. E.g.: Marketing promotions, finance ,or account collections. Data marts are small slices of the data warehouse . Data marts improve end-user response time by allowing users to have access to the specific type of data they need to view. A data mart is basically a condensed and more focused version of a data warehouse.
Data warehouse vs Data mart DATA WAREHOUSE Holds multiple subject areas Holds very detailed information Works to integrate all data sources Does not necessarily use a dimensional model but feeds dimensional models DATA MART Often holds only one subject area- for example, Finance, or Sales May hold more summarized data (although many hold full detail) Concentrates on integrating information from a given subject area or set of source systems Is built focused on a dimensional model using a star schema
Reasons for creating a data mart Easy access to frequently needed data Creates collective view by a group of users Improves end-user response time Ease of creation Lower cost than implementing a full data warehouse Potential users are more clearly defined than in a full data warehouse Contains only business essential data and is less cluttered.
Advantages & Disadvantages of data warehousing Advantages Enhances end-user access to a wide variety of data. Increases data consistency. Increases productivity and decreases computing costs. Is able to combine data from different sources, in one place. It provides an infrastructure that could support changes to data and replication of the changed data back into the operational systems. Disadvantages Extracting, cleaning and loading data could be time consuming. Problems with compatibility with systems already in place e.g. transaction processing system. Providing training to end-users, who end up not using the data warehouse. Security could develop into a serious issue, especially if the data warehouse is web accessible.
Applications of data warehousing Industry Application Finance Credit card Analysis Insurance Claims, Fraud Analysis Telecommunication Call record Analysis Transport Logistics management Consumer goods Promotion Analysis
etl Extract-Transform-Load Responsible for the operations taking place in the backstage of data warehouse architecture. Extract : Get the data from source system as efficiently as possible Transform : Perform calculations on data Load : Load the data in the target storage ADVANTAGES OF ETL TOOL Simple, faster and cheaper Deliver good performance even for very large data set Allows reuse of existing complex programs
Popular etl tools Tools Company Infomix IBM Oracle Warehouse Builder ORACLE Microsoft SQL Server Integration Microsoft
IBM Infomix Informix is one of the world’s most widely used database servers High levels of performance and availability, distinctive capabilities in data replication and scalability, and minimal administrative overhead. HIGHLIGHTS Real-time Analytics: Informix is a single platform that can power OLTP and OLAP workloads and successfully meet service-level agreements (SLAs) for each Fast, Always-on Transactions: Provides one of the industry’s widest sets of options for keeping data available at all times, including zero downtime for maintenance Sensor data management: Solves the big data challenge of sensor data with unmatched performance and scalability for managing time series data Easy to Use: Informix runs virtually unattended with self-configuring, self-managing and self-healing capabilities Best-of-breed embeddability: Provides a proven embedded data management platform for ISVs and OEMs to deliver integrated, world-class solutions, enabling platform independence NoSQL capability: IBM Informix unleashes new capabilities, giving you a way to combine unstructured and structured data in a smart way, bringing NoSQL to your SQL database.
Data Warehousing is not a new phenomenon. All large organizations already have data warehouses, but they are just not managing them. Over the next few years, the growth of data warehousing is going to be enormous with new products and technologies coming out frequently. In order to get the most out of this period, it is going to be important that data warehouse planners and developers have a clear idea of what they are looking for and then choose strategies and methods that will provide them with performance today and flexibility for tomorrow. conclusion
Reference Data Mining , Gupta Data Warehousing , C.S.R. Prabhu Jeff Lawyer and Shamsul Chowdhury “Best Practices in Data Warehousing to Support Business Initiatiatives and Needs”, IEEE 2004 Ruilian Hou “Research and Analysis of Data Warehouse Technologies”, IEEE 2011 S. Sai Sathyanarayana Reddy, Dr. L.S.S.Reddy, Dr.V.Khanna, A.Lavanya “Advanced Techniques for Scientific Data Warehousing”, IEEE 2009 Murat Obali, Abdul Kadir Gorur, “A Real Time Data Warehouse Approach for Data Processing”, IEEE 2013 Ruilian Hou “Analysis and research on the difference between data warehouse and database”, IEEE 2011