Introduction to Descriptive & Predictive Analytics

Dilum Bandara, 25 slides, Apr 17, 2024

About This Presentation

Introduction to Descriptive & Predictive Analytics
Basic statistics


Slide Content

Introduction to Descriptive & Predictive Analytics
CS5122 Descriptive & Predictive Analytics
Dilum Bandara

Data Analytics, Engineering, & Science
- Buzzwords that every decision maker either wants to or is forced to look into
- Data-driven decision making is hard
  - Needs the right data, fitting tools, skilled analysts, & a supportive environment
- [Figure: data analysts, domain experts, & tool experts; useful insights]

Objectives
- Given a dataset, to train you to
  - Ask the right questions
  - Identify the right tool(s)
  - Derive the right answers/insights
- We take a data-driven approach
  - First, derive a set of questions based on the data available for analysis
  - Explore potential techniques that support answering those questions with the available data
  - Derive the right answers by interpreting processed data & visualizations

Source: https://moz.com/blog/when-it-comes-to-analytics-are-you-doing-enough

Descriptive, Predictive, & Prescriptive Analytics
- Descriptive Analytics
  - Use data aggregation & data mining techniques to provide insight into the past & answer: "What has happened?"
- Predictive Analytics
  - Use statistical models & forecasting techniques to understand the future & answer: "What could happen?"
- Prescriptive Analytics
  - Use optimization & simulation algorithms to advise on possible outcomes & answer: "What should we do?"
Source: https://halobi.com/2014/10/descriptive-predictive-and-prescriptive-analytics-explained/

Example
- Descriptive analytics
  - Wal-Mart found that on Friday afternoons, young American males who buy diapers also tend to buy beer
  - Potential sales of each item can increase if they are kept close to each other
- Predictive analytics
  - Demand for diapers could increase in mid to late summer, as more babies are expected to be born in the USA
  - Make sure expectant mothers are informed of their diaper choices through advertising, & that production & supply are ready to meet the extra demand
  - Increased sales
- Prescriptive analytics
  - When to start advertising & when to give discounts?
  - Helps us understand the most effective dates & discount percentages that increase not only sales but also profit

Tools can help reduce difficulty

Source: IBM

Review of Basic Statistics & Probability

Populations & Samples
- Population
  - All items of interest for a particular decision or investigation
  - E.g., all Gmail users, all subscribers to Netflix
- Sample
  - A subset of the population
  - E.g., all Google Apps for Education users, list of customers who rented a comedy from Netflix in the past year
- The purpose of sampling is to obtain sufficient information to draw a valid inference about a population
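To make the population/sample distinction concrete, here is a minimal sketch that draws a simple random sample and compares its mean with the population mean. The population is simulated; the "viewing hours" framing and all numbers are assumptions made for illustration, not data from the slides.

```python
import random
import statistics

random.seed(42)  # fixed seed so the sketch is repeatable

# Hypothetical population: monthly viewing hours for 10,000 subscribers
population = [random.gauss(30, 8) for _ in range(10_000)]

# Simple random sample of 200 subscribers, drawn without replacement
sample = random.sample(population, k=200)

population_mean = statistics.fmean(population)
sample_mean = statistics.fmean(sample)

print(f"population mean = {population_mean:.2f}")
print(f"sample mean     = {sample_mean:.2f}")
```

The sample mean will usually land close to the population mean, which is what allows an inference about the whole population from a subset.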

Sample Space & Events
- Sample Space
  - All possible outcomes of an experiment
  - E.g., flipping a coin {H, T}
  - E.g., rolling a die {1, 2, 3, 4, 5, 6}
- Event
  - Any subset of the sample space
  - E.g., {H}, {T}, {H, T}, {1}, or {2, 4, 6}
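Sample spaces and events map naturally onto sets. The sketch below models one roll of a die and computes the probability of an event, under the added assumption that all outcomes are equally likely (a fair die); the helper name `probability` is made up for this example.

```python
from fractions import Fraction

# Sample space for rolling one die
sample_space = {1, 2, 3, 4, 5, 6}

# An event is any subset of the sample space, e.g. "roll an even number"
even = {2, 4, 6}

def probability(event, space):
    """Probability of an event, assuming equally likely outcomes."""
    if not event <= space:
        raise ValueError("an event must be a subset of the sample space")
    return Fraction(len(event), len(space))

p_even = probability(even, sample_space)
print(p_even)  # 1/2
```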

Random Variable
- Variable whose value is subject to variations due to chance
- Discrete random variables
  - Toss a coin, roll a die
- Continuous random variables
  - Stock value, voltage of a sensor

Measures of Location
- Mean
  - Population mean: $\mu = \frac{1}{N}\sum_{i=1}^{N} x_i$
  - Sample mean: $\bar{x} = \frac{1}{n}\sum_{i=1}^{n} x_i$
- Median
  - Middle value of the data when sorted from least to greatest
- Mode
  - Observation that occurs most often
- Midrange
  - Average of the greatest & least values = (max + min)/2
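A small sketch of the four measures of location using Python's standard `statistics` module. The data values are made up, and the midrange is computed as the average of the greatest and least values, (max + min)/2.

```python
import statistics

data = [2, 3, 3, 5, 7, 10]  # made-up observations

mean = statistics.mean(data)             # 30 / 6 = 5
median = statistics.median(data)         # average of the two middle values, 3 and 5
mode = statistics.mode(data)             # 3 occurs most often
midrange = (max(data) + min(data)) / 2   # (10 + 2) / 2

print(mean, median, mode, midrange)
```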

Probability Distribution/Mass Function

Measures of Dispersion
- Dispersion
  - Refers to the degree of variation in data
- Range
  - Difference between max & min values
- Interquartile Range (IQR)
  - Difference between the 3rd & 1st quartiles
- Variance
  - Average of squared deviations from the mean
- Standard Deviation (STD)
  - Square root of the variance
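The dispersion measures can be sketched with the standard `statistics` module as follows. The data are made up. Note that quartile conventions differ between tools; `statistics.quantiles` uses its default "exclusive" method here, so the IQR may not match a spreadsheet's result exactly.

```python
import statistics

data = [2, 4, 4, 4, 5, 5, 7, 9]  # made-up observations, mean = 5

data_range = max(data) - min(data)             # 9 - 2

q1, q2, q3 = statistics.quantiles(data, n=4)   # quartile cut points
iqr = q3 - q1                                  # interquartile range

# Population variance: average of squared deviations from the mean
pop_var = statistics.pvariance(data)           # 32 / 8
pop_std = statistics.pstdev(data)              # square root of the variance

# Sample variance divides by n - 1 instead of n
sample_var = statistics.variance(data)         # 32 / 7

print(data_range, iqr, pop_var, pop_std, sample_var)
```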

Measures of Dispersion (Cont.)
- z-score
  - The standard score is the number of STDs an observation is above/below the mean: $z = \frac{x - \bar{x}}{s}$
- For many data sets encountered in practice:
  - ~68% of observations fall within 1 STD of the mean
  - ~95% fall within 2 STDs
  - ~99.7% fall within 3 STDs
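The 68/95/99.7 percentages can be checked empirically. The sketch below simulates normally distributed data (an assumption; the mean of 50 and STD of 10 are arbitrary) and counts the share of observations whose z-score lies within 1, 2, and 3 STDs of the mean.

```python
import random
import statistics

random.seed(0)  # fixed seed so the sketch is repeatable

# Simulated, roughly bell-shaped data (hypothetical values)
data = [random.gauss(50, 10) for _ in range(20_000)]

mean = statistics.fmean(data)
std = statistics.pstdev(data)

def z_score(x, mean, std):
    """Number of standard deviations x lies above (+) or below (-) the mean."""
    return (x - mean) / std

def share_within(k):
    """Fraction of observations with |z| <= k."""
    return sum(abs(z_score(x, mean, std)) <= k for x in data) / len(data)

for k in (1, 2, 3):
    print(f"within {k} STD: {share_within(k):.1%}")
```

For data that are far from bell-shaped, the observed shares can differ noticeably from these percentages.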

Measures of Dispersion (Cont.)
- Coefficient of Variation (CV)
  - A relative measure of dispersion: CV = STD / mean
  - Return to risk = 1/CV

Exercise
- Mean & STD of closing stock prices:
  - Intel (INTC): Mean = $18.81, STD = $0.50
  - General Electric (GE): Mean = $16.19, STD = $0.35
- Which stock has the higher investment risk?
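One way to approach the exercise is to compare the coefficient of variation, CV = STD / mean, of the two stocks, treating a higher CV as higher relative risk. This is a sketch of that reading, not necessarily the intended model answer; the dictionary layout and function name are illustrative.

```python
# (mean, STD) of closing prices, taken from the exercise
stocks = {
    "INTC": (18.81, 0.50),
    "GE": (16.19, 0.35),
}

def coefficient_of_variation(mean, std):
    """Relative dispersion: standard deviation per unit of mean."""
    return std / mean

cv = {ticker: coefficient_of_variation(m, s) for ticker, (m, s) in stocks.items()}
riskier = max(cv, key=cv.get)

for ticker, value in cv.items():
    print(f"{ticker}: CV = {value:.4f}, return to risk = {1 / value:.1f}")
print("Higher relative risk:", riskier)
```

Under this measure Intel has the larger CV (about 0.027 versus about 0.022 for GE), so it carries slightly more risk relative to its mean price.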

Measures of Dispersion (Cont.)
- Percentiles
  - Value below which a given percentage of observations in a group of observations fall
Source: www.mathsisfun.com/data/percentiles.html
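Several conventions exist for computing percentiles. The sketch below uses the nearest-rank method, which is one common choice and an assumption here, since the slides do not name a method. It returns the smallest observation such that at least p% of the data are at or below it.

```python
import math

def percentile_nearest_rank(data, p):
    """p-th percentile (0 < p <= 100) using the nearest-rank method."""
    if not 0 < p <= 100:
        raise ValueError("p must be in (0, 100]")
    ordered = sorted(data)
    rank = math.ceil(p * len(ordered) / 100)  # 1-based position in the sorted data
    return ordered[rank - 1]

scores = [15, 20, 35, 40, 50]  # made-up observations

print(percentile_nearest_rank(scores, 30))   # 20
print(percentile_nearest_rank(scores, 50))   # 35
print(percentile_nearest_rank(scores, 100))  # 50
```

Interpolating methods, such as those in spreadsheets or `statistics.quantiles`, can give slightly different values on small data sets.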

Measures of Shape
- Skewness
  - Describes lack of symmetry
- Coefficient of Skewness (CS)
  - CS < 0 for left-skewed data
  - CS > 0 for right-skewed data
  - |CS| > 1 suggests a high degree of skewness
  - 0.5 ≤ |CS| ≤ 1 suggests moderate skewness
  - |CS| < 0.5 suggests relative symmetry
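The formula for the coefficient of skewness did not survive in the slide text. The sketch below assumes the common moment-based population definition, CS = [(1/N) Σ (x_i − μ)^3] / σ^3, which is consistent with the sign conventions and thresholds listed above. The data sets are made up.

```python
import statistics

def coefficient_of_skewness(data):
    """Third central moment divided by the cubed population STD (assumed definition)."""
    n = len(data)
    mu = statistics.fmean(data)
    sigma = statistics.pstdev(data)
    third_moment = sum((x - mu) ** 3 for x in data) / n
    return third_moment / sigma ** 3

symmetric = [1, 2, 3, 4, 5]
right_skewed = [1, 2, 2, 3, 3, 3, 4, 15]       # long tail to the right
left_skewed = [-x for x in right_skewed]       # mirror image: long tail to the left

cs_sym = coefficient_of_skewness(symmetric)
cs_right = coefficient_of_skewness(right_skewed)
cs_left = coefficient_of_skewness(left_skewed)

print(f"symmetric: {cs_sym:.2f}, right-skewed: {cs_right:.2f}, left-skewed: {cs_left:.2f}")
```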

Measures of Shape (Cont.)
- Kurtosis
  - Refers to peakedness or flatness
- Coefficient of Kurtosis (CK)
  - CK < 3 indicates data is somewhat flat with a wide degree of dispersion
  - CK > 3 indicates data is somewhat peaked with less dispersion
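As with skewness, the kurtosis formula is missing from the slide text. The sketch assumes the moment-based population definition CK = [(1/N) Σ (x_i − μ)^4] / σ^4, for which a normal distribution gives CK = 3, matching the threshold of 3 used above. The data sets are made up.

```python
import statistics

def coefficient_of_kurtosis(data):
    """Fourth central moment divided by the squared population variance (assumed definition)."""
    n = len(data)
    mu = statistics.fmean(data)
    variance = statistics.pvariance(data)
    fourth_moment = sum((x - mu) ** 4 for x in data) / n
    return fourth_moment / variance ** 2

flat = [1, 2, 3, 4, 5]                          # evenly spread values
peaked = [0, 0, 0, 0, 0, 0, 0, 0, 10, -10]      # most values at the center, two far out

ck_flat = coefficient_of_kurtosis(flat)         # 6.8 / 4 = 1.7
ck_peaked = coefficient_of_kurtosis(peaked)     # 2000 / 400 = 5.0

print(ck_flat, ck_peaked)
```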

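Measures of Association
- Covariance
  - Measure of linear association between 2 variables, X & Y
  - Population: $\sigma_{xy} = \frac{1}{N}\sum_{i=1}^{N} (x_i - \mu_x)(y_i - \mu_y)$
  - Sample: $s_{xy} = \frac{1}{n-1}\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})$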
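A minimal sketch of population and sample covariance, written out by hand so the only difference between them (dividing by N versus n − 1) is visible. The paired observations are made up.

```python
import statistics

x = [1, 2, 3, 4, 5]   # made-up paired observations
y = [2, 4, 5, 4, 5]

def sum_of_cross_deviations(x, y):
    """Sum of (x_i - mean_x) * (y_i - mean_y) over all pairs."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))

def population_covariance(x, y):
    return sum_of_cross_deviations(x, y) / len(x)

def sample_covariance(x, y):
    return sum_of_cross_deviations(x, y) / (len(x) - 1)

print(population_covariance(x, y))  # 6 / 5
print(sample_covariance(x, y))      # 6 / 4
```

A positive value indicates that X and Y tend to move together, but its size depends on the units of both variables.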

Measures of Association
- Correlation
  - Measure of linear association between 2 variables, X & Y
- Correlation Coefficient
  - Doesn't depend upon units of measurement (unlike covariance)
  - Population: $\rho_{xy} = \frac{\sigma_{xy}}{\sigma_x \sigma_y}$
  - Sample: $r_{xy} = \frac{s_{xy}}{s_x s_y}$
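The sketch below computes the correlation coefficient as the covariance divided by the product of the two standard deviations, and then rescales X to show that the result does not depend on units. The data and the metres-to-centimetres framing are illustrative assumptions.

```python
import math
import statistics

def correlation(x, y):
    """Correlation coefficient: covariance scaled by both standard deviations."""
    if len(x) != len(y):
        raise ValueError("x and y must have the same length")
    mean_x = statistics.fmean(x)
    mean_y = statistics.fmean(y)
    cross = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    ss_x = sum((a - mean_x) ** 2 for a in x)
    ss_y = sum((b - mean_y) ** 2 for b in y)
    # The 1/n (or 1/(n-1)) factors cancel, so population and sample versions agree
    return cross / math.sqrt(ss_x * ss_y)

x_m = [1, 2, 3, 4, 5]              # made-up values, imagined in metres
y = [2, 4, 5, 4, 5]
x_cm = [v * 100 for v in x_m]      # same values in centimetres

r = correlation(x_m, y)
r_rescaled = correlation(x_cm, y)

print(f"r = {r:.4f}, after rescaling X: {r_rescaled:.4f}")
```

Rescaling X multiplies the covariance by 100 but leaves the correlation coefficient unchanged.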

Measures of Association

Outliers
- Mean & range are sensitive to outliers
- No standard definition of what constitutes an outlier
- Possible methods to identify outliers:
  - z-scores greater than +3 or less than -3
  - Extreme outliers are more than 3*IQR to the left of Q1 or right of Q3
  - Mild outliers are between 1.5*IQR and 3*IQR to the left of Q1 or right of Q3
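The IQR-based rules can be sketched as follows. Quartiles are computed with `statistics.quantiles` using the "inclusive" method, which is one of several quartile conventions and an assumption here; the data and the function name are made up for illustration.

```python
import statistics

def classify_outliers(data):
    """Return (mild, extreme) outliers based on distance from Q1/Q3 in IQR units."""
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    iqr = q3 - q1
    mild, extreme = [], []
    for x in data:
        # Distance to the left of Q1 or to the right of Q3 (negative if inside the box)
        distance = max(q1 - x, x - q3)
        if distance > 3 * iqr:
            extreme.append(x)
        elif distance > 1.5 * iqr:
            mild.append(x)
    return mild, extreme

data = [10, 12, 12, 13, 14, 15, 15, 16, 18, 25, 40]  # made-up observations

mild, extreme = classify_outliers(data)
print("mild:", mild)        # [25]
print("extreme:", extreme)  # [40]
```

Here Q1 = 12.5 and Q3 = 17, so IQR = 4.5; values beyond 23.75 are mild outliers and values beyond 30.5 are extreme.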