Date Analysis .pdf

webresso 133 views 49 slides Jul 12, 2022
Slide 1
Slide 1 of 49
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46
Slide 47
47
Slide 48
48
Slide 49
49

About This Presentation

Overview of :
- Data Analysis
- Business Intelligence
- Data warehouses
- Dimensional Modeling / Data Cubes.
- Summery Tables.

-


Slide Content

Part 1:
Data Analysis Introduction.

Data Analysis
Data analysis is a process of inspecting, cleansing, transforming,
and modelling data with the goal of discovering useful
information, informing conclusions, and supporting
decision-making.

Examples Of the information from the data
When and Where Each product is Sold ,
Quantity of a certain Product to order to minimize expiration
rate
Which Features Will actually improve sales or customer
satisfaction.

Types of Data Analysis
1- Descriptive Analysis
The goal of descriptive analytics is to find out what happened? For example,
what was the average revenue for the month of January? Cases of Covid Per
countries .
In other words it is Generating simple summaries of the data.

2. Exploratory Analysis (EDA)

Goal — Examine or explore data and find relationships
between variables which were previously unknown

Data Architect Role
Designing How data will be stored , consumed, managed
and integrated between systems.

Data Analyst Role
Data analysts mostly work with an organization's structured data.
They create advanced and sophisticated visualizations to provide
insights using BI tools like Tableau, Metabase, Redash , Power BI. .
They do this on the fly without IT assistance.
Big Percentage of those who do Data Analysis don’t carry a title with
the words ‘Data’ or ‘Analysis’ On them.

BI tools

BI Tools Features
Combine multiple data sets to create a new one.
Preparing, and cleaning data for analysis.
Data Visualization
Geospatial Analysis

BI features
-Reporting
-What IF analysis using past data to predict
potential outcomes.
- Scenario analysis
-Statistical analysis using advanced functions like
mean, median, mode, standard deviation,
-Mobile Dashboards
-Integrations

Data Acquisition
Process of gathering data from multiple sources such as:
-Server logs
-APIs
-Databases Could be more than one
-File systems and Excel sheets
-IOT sensors data
-Satellite images
-Emails

Data preparation:
Data Cleaning:
Very consuming process to get rid of inconsistent data, misspelled
attributes, duplication

Data transformation
Converting data to a structural data

Part 2: Designing Data intensive systems
(Technical)

3Vs Of Data to consider
Volume
Variety
Velocity

Structured data

is data that has been predefined
and formatted to a set structure
before being placed in data
storage. The best example of
structured data is the relational
database: the data has been
formatted into precisely defined
fields, to be easily queried with
SQL.




Types of Data
UnStructured data

Unstructured data is data stored in its
native format and not processed until
it is used, a myriad of file formats,
including:
email, social media posts,
presentations, chats, IoT sensor data,
and satellite imagery.

Types of Data
Transactional Data
The purpose of Transactional Data
is to support day-to-day operations
of the business. An inspection
Analytical Data
Analytical Data is used for
managerial analysis and decision
making.

Analytical data are:
-Scanning over a huge number of records.
-Only reading from few columns.
-calculates aggregate statistics (such as count, sum, or
average) rather than just returning the raw data to the user.

OLTP Vs OLAP systems

Using Database for Both OLTP and OLAP
At first, the same databases were used for both transaction
processing and analytic queries. SQL turned out to be quite
flexible in this regard: it works well for OLTP type queries as
well as OLAP-type queries.

When OLAP is not recommended with a database
-Slow Analytical Queries.
-Guarding the database which is facing the systems from the expensive
Analytical queries.
-Data Admins won’t allow Business Analyst to run queries on live databases.
-Multiple sources of data, from multiple systems the enterprise might be
using.

ELT tools (Extract, Transform Load)
ETL tools collect, read, and migrate large volumes of raw data from
multiple data sources and across disparate platforms. They load that
data into a single database, data store, or data warehouse for easy
access.
●Hand-coding
●Batch processing during off hours (not real time).
●Real-time ETL tools capture data from and deliver data to
applications in real time using distributed message queues and
continuous data processing. This allows analytics tools to query
Internet of Things (IoT) sensors,

Data warehouse Vs Database

Dataware Houses
●Relational tables
●Uses SQL
●Columnar storage to know more click here
●Single point of truth SSOT
●less well known, because they are primarily used by business
analysts, not by end users.
●Handles a much lower volume of queries than OLTP systems,
but each query is typically very demanding, requiring many
millions of records to be scanned in a short time.

Data Warehouse structure
●The Data Modeling of Data warehouses is called star scheme.
Also called Dimensional Modelling.
●Materialized Views not virtual views
●Data cubes or OLAP cubes

Elements of Dimensional Modeling DM
1. Facts
2.Dimensions
DM has no Many To Many only Fact Dimension relationship.
So tables gets wide.

Facts
Facts are the measurements/metrics or facts from your business process.
For a Sales business process, a measurement would be quarterly sales
numbers.
A Fact Table contains
1.Measurements/facts
2.Foreign key to dimension table
3.Only numerical attributes that can be used for calculations.

-Can grow and be huge.

Dimensions
Dimension provides the context surrounding a business process event. In
simple terms, they give who, what, where of a fact. In the Sales business
process, for the fact quarterly sales number, dimensions would be
●Who – Customer Names
●Where – Location
●What – Product Name
●When -Date
●With what (How) - equipments and services
●Why

OLAP query example

DM Notes
-if the customer buys several different products at once, they are
represented as separate rows in the fact table.)

-Date And time are often represented using dimension tables,
because this allows additional information about dates (such as
public holidays) to be encoded, allowing queries to differentiate
between sales on holidays and non-holidays.
-

DM notes
The name “star schema” comes from the fact that when the table
relationships are visualized, the fact table is in the middle, surrounded
by its dimension tables; the connections to these tables are like the rays
of a star.
Data warehouse queries often involve an aggregate function, such as
COUNT, SUM, AVG, MIN, or MAX in SQL. If the same aggregates are
used by many different queries, it can be wasteful to crunch through the
raw data every time. Why not cache some of the counts or sums that
queries use most often?

Materialized Views Vs Virtual Views
The difference is that a materialized view is an actual copy of the
query results, written to disk, whereas a virtual view is just a
shortcut for writing queries.
When the underlying data changes, a materialized view needs to be
updated, because it is a denormalized copy of the data. The database
can do that automatically,

Data Cubes (Also called Multidimension database)
data is grouped or combined in
multidimensional matrices called
Data Cubes.
Example, XYZ may create a sales data
warehouse to keep records of the
store's sales for the dimensions time,
item, branch, and location.

Data Cubes
In data warehousing, the data cubes are n-dimensional. The cuboid
which holds the lowest level of summarization is called a base cuboid.
For example, the 4-D cuboid in the figure is the base cuboid for the
given time, item, location, and supplier dimensions.
The topmost 0-D cuboid, which holds the highest level of summarization, is known as the
apex cuboid. In this example, this is the total sales, or dollars sold, summarized over all
four dimensions.

Summary Tables
Following table is huge table of Visits with their dates and
Browser and its versions. The needed chart or report is to
answer How many Visits Per Browser Per Version Per Date

Summary tables

Summary Tables
●Best Material about Summary table from the official site of
Maria DB
●Using summary tables can dramatically improve query
performance for queries that access commonly
●There is nothing wrong about redundancy as long as
it's controlled.
●If your clients are requesting information "per day", then you can
create summary tables that include information per day.

Augmenting summary tables cases
"Augment" in this section means to add new rows into the summary table or increment the counts
in existing rows.
Plan A: "While inserting" rows into the Fact table, augment the summary
table(s).
Plan B: "Periodically", via cron or an EVENT.
Plan C: "As needed". That is, when someone asks for a report, First check if
the row table have been updated since last datetime , if chang happened
then the code first updates the summary tables that will be needed then
keep track of this report creation datetime.

Multiple summary tables
●Look at the reports you will need.
●Design a summary table for each.
●Then look at the summary tables -- you are likely to find some similarities.
●Merge similar ones.

Part 3 Data Analysis and SaaS
Data Analysis with SaaS
-The Problem.
-Scenario
-Hand Coding + Client side Visualization.
-Scenario
-Headless BI + Client side Visualization.

-Embedding and integration
- Tableau
- Devexpress , telerik
- Power Bi
-
-