1. Briefly describe the major components of a data warehouse archi.docx
monicafrancis71118
Jan 07, 2023
1. Briefly describe the major components of a data warehouse
architecture.
Components in a data warehouse
A data warehouse is a collection of data used for decision
making and business intelligence.
· It is a subject-oriented, integrated, time-variant, and
non-updateable collection of data.
· The three components in the architecture of the data
warehouse are:
· Operational data
· Reconciled data
· Derived data
[Figure: diagrammatic representation of the data warehouse
architecture]
Components in the data warehouse architecture:
Operational data:
· It is the data held in the operational systems throughout the
organization.
Reconciled data:
· It is the data stored in the enterprise data warehouse and an
operational data store.
· It contains current, detailed data and is the authoritative
source for decision support applications.
Derived data:
· Derived data is the data obtained from a data mart and used by
end-user decision support applications.
· It contains selected, formatted, and aggregated data.
· It is the data stored in every data mart.
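The flow between the three layers described above can be illustrated with a minimal sketch. The two source systems, their field names, and the functions reconcile and derive_sales_mart are hypothetical and chosen only for illustration; a real warehouse would use ETL tooling rather than in-memory lists.

```python
from collections import defaultdict

# Operational data (hypothetical): two source systems with different formats.
orders_system = [
    {"cust": "C1", "amount_usd": "120.50", "date": "2023-01-05"},
    {"cust": "C2", "amount_usd": "80.00", "date": "2023-01-06"},
]
web_system = [
    {"customer_id": "c1", "total_cents": 4550, "day": "2023-01-06"},
]


def reconcile(orders, web):
    """Integrate both sources into one detailed, consistent format."""
    rows = []
    for r in orders:
        rows.append({"customer": r["cust"].upper(),
                     "amount": float(r["amount_usd"]),
                     "date": r["date"]})
    for r in web:
        rows.append({"customer": r["customer_id"].upper(),
                     "amount": r["total_cents"] / 100,
                     "date": r["day"]})
    return rows


def derive_sales_mart(reconciled_rows):
    """Select and aggregate reconciled data for one (assumed) sales data mart."""
    totals = defaultdict(float)
    for row in reconciled_rows:
        totals[row["customer"]] += row["amount"]
    return dict(totals)


reconciled = reconcile(orders_system, web_system)   # reconciled layer
sales_mart = derive_sales_mart(reconciled)          # derived layer
print(sales_mart)  # {'C1': 166.0, 'C2': 80.0}
```

In this sketch the reconciled layer keeps every detailed row in one format, while the derived layer keeps only the aggregated totals that one data mart needs.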
Types of metadata in the data warehouse architecture:
There are three types of metadata:
· Operational metadata
· Enterprise data warehouse (EDW) metadata
· Data mart metadata
Operational metadata:
It describes the data in the operational systems that feed the
enterprise data warehouse.
It exists in various formats, and its quality is often poor.
Enterprise data warehouse (EDW) metadata:
It describes the data of the reconciled data layer.
It provides the rules for converting the operational data into
reconciled data.
It is derived from the enterprise data model.
Data mart metadata:
It describes the data of the derived data layer.
It provides the rules for converting the reconciled data into
derived data.
2. Explain how the volatility of a data warehouse is different
from the volatility of a database for an operational information
system.
Data warehouse:
· A data warehouse is a collection of data used for decision
making and business intelligence.
· It is a special kind of database that focuses on business
intelligence, time-variant data, and external data.
· The term data warehouse usually refers to the combination of
many different databases across an entire enterprise.
· It is a subject-oriented, integrated, time-variant, and
non-updateable collection of data.
Operational database:
An operational database is a database that is accessed and
updated on a regular basis and generally handles the daily
transactions of a business.
It is used to manage dynamic data and to modify data in real
time.
Volatility of a data warehouse and operational database:
A key difference between a data warehouse and an operational
system is the type of data stored.
A data warehouse is based on periodic data, whereas an
operational system is based on transient data.
Transient data is data in which a change to an existing record
overwrites the previous record, so the old content is lost.
Periodic data is data that is never overwritten or deleted once
it has been added to the store.
In an operational system the data are very volatile, whereas a
data warehouse stores each change in the data.
Therefore, the two differ in volatility because of the way each
one stores its data.
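The difference between transient and periodic data can be sketched in a few lines. The classes TransientStore and PeriodicStore and the account key "acct-1" are hypothetical names used only for illustration, not part of any real database product.

```python
class TransientStore:
    """Operational-style store: an update overwrites the previous record."""

    def __init__(self):
        self.records = {}

    def write(self, key, value):
        self.records[key] = value  # the old value is lost

    def history(self, key):
        return [self.records[key]] if key in self.records else []


class PeriodicStore:
    """Warehouse-style store: every change is appended with a timestamp."""

    def __init__(self):
        self.rows = []

    def write(self, key, value, as_of):
        self.rows.append((key, value, as_of))  # nothing is overwritten

    def history(self, key):
        return [(v, t) for k, v, t in self.rows if k == key]


operational = TransientStore()
warehouse = PeriodicStore()
for balance, day in [(100, "2023-01-05"), (250, "2023-01-06")]:
    operational.write("acct-1", balance)
    warehouse.write("acct-1", balance, day)

print(operational.history("acct-1"))  # [250]
print(warehouse.history("acct-1"))    # [(100, '2023-01-05'), (250, '2023-01-06')]
```

After two updates the transient store holds only the latest balance, while the periodic store holds both values with their dates.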
Ch11
What are the key capabilities of NoSQL that extend what SQL
can do?
NoSQL is a technology designed for handling big data. It stores
and retrieves data, but not on the basis of the relational model.
The key capabilities of NoSQL that extend what SQL can do are:
· It is less concerned with reducing storage space than SQL
was, because the cost of storage has fallen so much.
· It focuses on flexibility, variety, versatility, agility, and
scalability.
· It facilitates “scaling out” instead of “scaling up” as SQL
did. Scaling out is possible because a large number of commodity
servers can be added to the architecture.
· Other parts of the system can continue to work efficiently
even if a single component fails.
· It facilitates a “shared-nothing” architecture, which refers
to a replication architecture that has no separate master and
slave roles.
· It provides “schema on read” instead of “schema on write” as
SQL did. “Schema on read” means that the structure is specified
separately for each individual collection of data items, using
languages such as JSON or XML.
· Instead of the ACID (Atomicity, Consistency, Isolation, and
Durability) properties used in SQL, it uses the BASE (Basically
Available, Soft State, and Eventually Consistent) properties.
· NoSQL favors guaranteed high availability over guaranteed
consistency, while SQL offers guaranteed consistency at the cost
of availability in a number of situations.
· Multi-Model: NoSQL databases play a significant role in
multi-model database applications because they can handle all
kinds of data (structured, semi-structured, and unstructured)
and can serve applications that require all of these kinds of
data.
· Easy Scalability: Unlike a traditional master-slave
architecture, this kind of database can be expanded by adding
more servers as requirements grow.
· Flexibility: It is more flexible than a relational database
because its multi-model design allows it to handle multiple
forms of data.
· Distributed: This kind of database is distributed in nature
and provides global accessibility, meaning that it can be used
at multiple locations by multiple companies at the same time
using their central data centers.
· Zero Downtime: It uses a masterless architecture that keeps
multiple copies of the same data on different nodes, so if one
database node is under maintenance or not working, another node
can take over.
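The “schema on read” capability listed above can be illustrated with a minimal sketch. The sample documents, their field names, and the function read_contact are hypothetical; the sketch uses only the json module from the Python standard library and no particular NoSQL product.

```python
import json

# Documents are stored as-is; each one may carry a different structure.
raw_documents = [
    '{"name": "Ada", "email": "ada@example.com"}',
    '{"name": "Lin", "phones": ["555-0100", "555-0101"]}',
    '{"name": "Sam", "email": "sam@example.com", "age": 41}',
]


def read_contact(raw):
    """Apply the reader's view of the structure at read time."""
    doc = json.loads(raw)
    return {
        "name": doc["name"],
        "email": doc.get("email"),                 # missing fields become None
        "phone_count": len(doc.get("phones", [])),
    }


contacts = [read_contact(raw) for raw in raw_documents]
for contact in contacts:
    print(contact)
```

Nothing forced the three documents into one shape when they were written. The structure is imposed only by the code that reads them, so a different reader could interpret the same documents in a different way.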
Explain the relationship between Hadoop and MapReduce.
MapReduce is a programming model designed for large-scale
parallel processing of data. In other words, MapReduce helps to
solve computing problems by parallelizing the processing of
large stored data sets in an environment that consists of a
large number of commodity servers.
1. Mapper phase: In this phase, raw files are taken as input,
and the required output keys and output values are extracted.
2. Reducer phase: The output of the Mapper phase is taken as the
input of the Reducer phase. The data are then grouped by key,
and the output values for each key are aggregated.
At the end, the output of the reducers is written to the Hadoop
Distributed File System (HDFS).
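The two phases above can be sketched as a word count that runs in a single Python process. This only illustrates the programming model: it does not use Hadoop or HDFS, and the function names mapper, shuffle, reducer, and map_reduce are assumptions made for this example.

```python
from collections import defaultdict


def mapper(line):
    """Mapper phase: emit an (output key, output value) pair per word."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group the mapper output by key before it reaches the reducers."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reducer(key, values):
    """Reducer phase: aggregate all values that share one key."""
    return key, sum(values)


def map_reduce(lines):
    mapped = [pair for line in lines for pair in mapper(line)]
    grouped = shuffle(mapped)
    return dict(reducer(key, values) for key, values in grouped.items())


lines = ["big data big servers", "data on commodity servers"]
counts = map_reduce(lines)
print(counts)
```

In a real cluster the mapper calls would run in parallel on the servers that hold the input blocks, and the reducer output would be written to HDFS rather than returned as a dictionary.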
Hadoop is a batch processing tool. Hadoop’s essence is in
processing very large amounts of data by distributing the data
(using HDFS, the Hadoop Distributed File System) and the
processing tasks among a large number of low-cost commodity
servers.
Hadoop is an open-source implementation of MapReduce that makes
the capabilities of this algorithm available to other
applications.
In this way, MapReduce and Hadoop are related to each other.