Overview of :
- Data Analysis
- Business Intelligence
- Data warehouses
- Dimensional Modeling / Data Cubes.
- Summery Tables.
-
Size: 947.83 KB
Language: en
Added: Jul 12, 2022
Slides: 49 pages
Slide Content
Part 1:
Data Analysis Introduction.
Data Analysis
Data analysis is a process of inspecting, cleansing, transforming,
and modelling data with the goal of discovering useful
information, informing conclusions, and supporting
decision-making.
Examples Of the information from the data
When and Where Each product is Sold ,
Quantity of a certain Product to order to minimize expiration
rate
Which Features Will actually improve sales or customer
satisfaction.
Types of Data Analysis
1- Descriptive Analysis
The goal of descriptive analytics is to find out what happened? For example,
what was the average revenue for the month of January? Cases of Covid Per
countries .
In other words it is Generating simple summaries of the data.
2. Exploratory Analysis (EDA)
Goal — Examine or explore data and find relationships
between variables which were previously unknown
Data Architect Role
Designing How data will be stored , consumed, managed
and integrated between systems.
Data Analyst Role
Data analysts mostly work with an organization's structured data.
They create advanced and sophisticated visualizations to provide
insights using BI tools like Tableau, Metabase, Redash , Power BI. .
They do this on the fly without IT assistance.
Big Percentage of those who do Data Analysis don’t carry a title with
the words ‘Data’ or ‘Analysis’ On them.
BI tools
BI Tools Features
Combine multiple data sets to create a new one.
Preparing, and cleaning data for analysis.
Data Visualization
Geospatial Analysis
BI features
-Reporting
-What IF analysis using past data to predict
potential outcomes.
- Scenario analysis
-Statistical analysis using advanced functions like
mean, median, mode, standard deviation,
-Mobile Dashboards
-Integrations
Data Acquisition
Process of gathering data from multiple sources such as:
-Server logs
-APIs
-Databases Could be more than one
-File systems and Excel sheets
-IOT sensors data
-Satellite images
-Emails
Data preparation:
Data Cleaning:
Very consuming process to get rid of inconsistent data, misspelled
attributes, duplication
Data transformation
Converting data to a structural data
Part 2: Designing Data intensive systems
(Technical)
3Vs Of Data to consider
Volume
Variety
Velocity
Structured data
is data that has been predefined
and formatted to a set structure
before being placed in data
storage. The best example of
structured data is the relational
database: the data has been
formatted into precisely defined
fields, to be easily queried with
SQL.
Types of Data
UnStructured data
Unstructured data is data stored in its
native format and not processed until
it is used, a myriad of file formats,
including:
email, social media posts,
presentations, chats, IoT sensor data,
and satellite imagery.
Types of Data
Transactional Data
The purpose of Transactional Data
is to support day-to-day operations
of the business. An inspection
Analytical Data
Analytical Data is used for
managerial analysis and decision
making.
Analytical data are:
-Scanning over a huge number of records.
-Only reading from few columns.
-calculates aggregate statistics (such as count, sum, or
average) rather than just returning the raw data to the user.
OLTP Vs OLAP systems
Using Database for Both OLTP and OLAP
At first, the same databases were used for both transaction
processing and analytic queries. SQL turned out to be quite
flexible in this regard: it works well for OLTP type queries as
well as OLAP-type queries.
When OLAP is not recommended with a database
-Slow Analytical Queries.
-Guarding the database which is facing the systems from the expensive
Analytical queries.
-Data Admins won’t allow Business Analyst to run queries on live databases.
-Multiple sources of data, from multiple systems the enterprise might be
using.
ELT tools (Extract, Transform Load)
ETL tools collect, read, and migrate large volumes of raw data from
multiple data sources and across disparate platforms. They load that
data into a single database, data store, or data warehouse for easy
access.
●Hand-coding
●Batch processing during off hours (not real time).
●Real-time ETL tools capture data from and deliver data to
applications in real time using distributed message queues and
continuous data processing. This allows analytics tools to query
Internet of Things (IoT) sensors,
Data warehouse Vs Database
Dataware Houses
●Relational tables
●Uses SQL
●Columnar storage to know more click here
●Single point of truth SSOT
●less well known, because they are primarily used by business
analysts, not by end users.
●Handles a much lower volume of queries than OLTP systems,
but each query is typically very demanding, requiring many
millions of records to be scanned in a short time.
Data Warehouse structure
●The Data Modeling of Data warehouses is called star scheme.
Also called Dimensional Modelling.
●Materialized Views not virtual views
●Data cubes or OLAP cubes
Elements of Dimensional Modeling DM
1. Facts
2.Dimensions
DM has no Many To Many only Fact Dimension relationship.
So tables gets wide.
Facts
Facts are the measurements/metrics or facts from your business process.
For a Sales business process, a measurement would be quarterly sales
numbers.
A Fact Table contains
1.Measurements/facts
2.Foreign key to dimension table
3.Only numerical attributes that can be used for calculations.
-Can grow and be huge.
Dimensions
Dimension provides the context surrounding a business process event. In
simple terms, they give who, what, where of a fact. In the Sales business
process, for the fact quarterly sales number, dimensions would be
●Who – Customer Names
●Where – Location
●What – Product Name
●When -Date
●With what (How) - equipments and services
●Why
OLAP query example
DM Notes
-if the customer buys several different products at once, they are
represented as separate rows in the fact table.)
-Date And time are often represented using dimension tables,
because this allows additional information about dates (such as
public holidays) to be encoded, allowing queries to differentiate
between sales on holidays and non-holidays.
-
DM notes
The name “star schema” comes from the fact that when the table
relationships are visualized, the fact table is in the middle, surrounded
by its dimension tables; the connections to these tables are like the rays
of a star.
Data warehouse queries often involve an aggregate function, such as
COUNT, SUM, AVG, MIN, or MAX in SQL. If the same aggregates are
used by many different queries, it can be wasteful to crunch through the
raw data every time. Why not cache some of the counts or sums that
queries use most often?
Materialized Views Vs Virtual Views
The difference is that a materialized view is an actual copy of the
query results, written to disk, whereas a virtual view is just a
shortcut for writing queries.
When the underlying data changes, a materialized view needs to be
updated, because it is a denormalized copy of the data. The database
can do that automatically,
Data Cubes (Also called Multidimension database)
data is grouped or combined in
multidimensional matrices called
Data Cubes.
Example, XYZ may create a sales data
warehouse to keep records of the
store's sales for the dimensions time,
item, branch, and location.
Data Cubes
In data warehousing, the data cubes are n-dimensional. The cuboid
which holds the lowest level of summarization is called a base cuboid.
For example, the 4-D cuboid in the figure is the base cuboid for the
given time, item, location, and supplier dimensions.
The topmost 0-D cuboid, which holds the highest level of summarization, is known as the
apex cuboid. In this example, this is the total sales, or dollars sold, summarized over all
four dimensions.
Summary Tables
Following table is huge table of Visits with their dates and
Browser and its versions. The needed chart or report is to
answer How many Visits Per Browser Per Version Per Date
Summary tables
Summary Tables
●Best Material about Summary table from the official site of
Maria DB
●Using summary tables can dramatically improve query
performance for queries that access commonly
●There is nothing wrong about redundancy as long as
it's controlled.
●If your clients are requesting information "per day", then you can
create summary tables that include information per day.
Augmenting summary tables cases
"Augment" in this section means to add new rows into the summary table or increment the counts
in existing rows.
Plan A: "While inserting" rows into the Fact table, augment the summary
table(s).
Plan B: "Periodically", via cron or an EVENT.
Plan C: "As needed". That is, when someone asks for a report, First check if
the row table have been updated since last datetime , if chang happened
then the code first updates the summary tables that will be needed then
keep track of this report creation datetime.
Multiple summary tables
●Look at the reports you will need.
●Design a summary table for each.
●Then look at the summary tables -- you are likely to find some similarities.
●Merge similar ones.
Part 3 Data Analysis and SaaS
Data Analysis with SaaS
-The Problem.
-Scenario
-Hand Coding + Client side Visualization.
-Scenario
-Headless BI + Client side Visualization.
-Embedding and integration
- Tableau
- Devexpress , telerik
- Power Bi
-
-