data warehousing need and characteristics. types of data w data warehouse architecture

103 views 15 slides Feb 28, 2024
Slide 1
Slide 1 of 15
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15

About This Presentation

data warehousing need and characteristics. types of data w data warehouse architecture


Slide Content

DBMS

Data Warehouse & Types of Data The term  data warehouse  is used to distinguish a database that is used for business analysis (OLAP) rather than transaction processing (OLTP). While an OLTP database contains current low-level data and is typically optimized for the selection and retrieval of records, a data warehouse typically contains aggregated historical data and is optimized for particular types of analyses, depending upon the client applications. The contents of your data warehouse depends on the requirements of your users. They should be able to tell you what type of data they want to view and at what levels of aggregation they want to be able to view it. Data warehouse will store these types of data: Historical data Derived data Metadata METADATA :  Metadata is a data about data. Metadata shows basic information about data, which can make finding and working with specific instances of data easier. Metadata increases the accuracy of searching and operating of data from large amount of data

Definition of DBMS A Database Management System (DBMS) is a software system that is designed to manage and organize data in a structured manner. It allows users to create, modify, and query a database, as well as manage the security and access controls for that database . DBMS provides an environment to store and retrieve the data in an efficient manner . A database is a collection of interrelated data which helps in the efficient retrieval, insertion, and deletion of data from the database and organizes the data in the form of tables, views, schemas, reports, etc.

Key Features of DBMS Data storage and retrieval:  A DBMS is responsible for storing and retrieving data from the database, and can provide various methods for searching and querying the data. Concurrency control:  A DBMS provides mechanisms for controlling concurrent access to the database, to ensure that multiple users can access the data without conflicting with each other. Data integrity and security:  A DBMS provides tools for enforcing data integrity and security constraints, such as constraints on the values of data and access controls that restrict who can access the data. Backup and recovery:  A DBMS provides mechanisms for backing up and recovering the data in the event of a system failure.

DBMS can be classified into two types:  Relational Database Management System (RDBMS) and Non-Relational Database Management System ( NoSQL or Non-SQL) RDBMS:  Data is organized in the form of tables and each table has a set of rows and columns. The data are related to each other through primary and foreign keys. NoSQL :  Data is organized in the form of key-value pairs, documents, graphs, or column-based. These are designed to handle large-scale, high-performance scenarios.

Properties of DBMS A transaction is a single logical unit of work that accesses and possibly modifies the contents of a database. Transactions access data using read and write operations. In order to maintain consistency in a database, before and after the transaction, certain properties are followed. These are called  ACID  properties . Atomicity :  A transaction is either completely successful or completely unsuccessful. There is no partial commit. Each transaction is considered as one unit and either runs to completion or is not executed at all. It involves the following two operations. Abort : If a transaction aborts, changes made to the database are not visible.  Commit : If a transaction commits, changes made are visible. Atomicity is also known as the ‘All or nothing rule ’. Consistency :  The database remains in a consistent state before and after a transaction is committed. It refers to the correctness of a database. Isolation :  Transactions are isolated from each other, so that the changes made by one transaction do not affect the other transactions until they are committed. This property ensures that the execution of transactions concurrently will result in a state that is equivalent to a state achieved these were executed serially in some order.  Durability:  Once a transaction is committed, its changes are permanent and cannot be rolled back. This property ensures that once the transaction has completed execution, the updates and modifications to the database are stored in and written to disk and they persist even if a system failure occurs. These updates now become permanent and are stored in non-volatile memory. The effects of the transaction, thus, are never lost. 

Data warehouse Architectur e A  data-warehouse  is a heterogeneous collection of different data sources organized under a unified schema (visual or diagrammatical representation). There are 2 approaches for constructing data-warehouse: Top-down approach and Bottom-up approach Top-down approach:  

External Sources –   External source is a source from where data is collected irrespective of the type of data. Data can be structured, semi structured and unstructured as well.    Stage Area –   Since the data, extracted from the external sources does not follow a particular format, so there is a need to validate this data to load into data warehouse . For this purpose, it is recommended to use  ETL  tool.  E(Extracted):  Data is extracted from External data source.    T(Transform):  Data is transformed into the standard format.    L(Load):  Data is loaded into data warehouse after transforming it into the standard format.    Data-warehouse –   After cleansing of data, it is stored in the data warehouse as central repository. It actually stores the meta data and the actual data gets stored in the data marts.  Note  that data warehouse stores the data in its purest form in this top-down approach. 

  Data Marts –   Data mart is also a part of storage component. It stores the information of a particular function of an organization which is handled by single authority. There can be as many number of data marts in an organization depending upon the functions. We can also say that data mart contains subset of the data stored in data warehouse.    Data Mining –   The practice of analyzing the big data present in data warehouse is data mining. It is used to find the hidden patterns that are present in the database or in data warehouse with the help of algorithm of data mining.  This approach is defined by  Inmon  as – data warehouse as a central repository for the complete organization and data marts are created from it after the complete data warehouse has been created. 

2. Bottom-up approach:  

First, the data is extracted from external sources (same as happens in top-down approach).    Then, the data go through the staging area (as explained above) and loaded into data marts instead of data warehouse . A data mart is a data storage system that contains information specific to an organization's business unit. It contains a small and selected part of the data that the company stores in a larger storage system.  These data marts are then integrated into data warehouse. This approach is given by  Kinball  as – data marts are created first and provides a thin view for analyses and data warehouse is created after complete data marts have been created. 

Category OLAP (Online Analytical Processing) OLTP (Online Transaction Processing) Definition It is well-known as an online database query management system. It is well-known as an online database modifying system. Data source Consists of historical data from various Databases. Consists of only operational current data.  Method used It makes use of a data warehouse. It makes use of a standard  database management system (DBMS). Application It is subject-oriented. Used for  Data Mining , Analytics, Decisions making, etc. It is application-oriented. Used for business tasks. Normalized In an OLAP database, tables are not normalized. In an OLTP database, tables are  normalized . Usage of data The data is used in planning, problem-solving, and decision-making. The data is used to perform day-to-day fundamental operations.

Task It provides a multi-dimensional view of different business tasks. It reveals a snapshot of present business tasks. Purpose It serves the purpose to extract information for analysis and decision-making. It serves the purpose to Insert, Update, and Delete information from the database. Volume of data A large amount of data is stored typically in TB, PB The size of the data is relatively small as the historical data is archived in MB, and GB. Queries Relatively slow as the amount of data involved is large. Queries may take hours. Very Fast as the queries operate on 5% of the data. Update  The OLAP database is not often updated. As a result, data integrity is unaffected. The data integrity constraint must be maintained in an OLTP database. Backup and Recovery It only needs backup from time to time as compared to OLTP. The backup and recovery process is maintained rigorously Processing time The processing of complex queries can take a lengthy time. It is comparatively fast in processing because of simple and straightforward queries.

Types of users This data is generally managed by CEO, MD, and GM. This data is managed by clerksForex and managers. Operations Only read and rarely write operations. Both read and write operations. Updates With lengthy, scheduled batch operations, data is refreshed on a regular basis. The user initiates data updates, which are brief and quick. Nature of audience The process is focused on the customer.    The process is focused on the market.  Database Design Design with a focus on the subject.  Design that is focused on the application. Productivity Improves the efficiency of business analysts. Enhances the user’s productivity.