Correlation and Regression in Statistics

chicogil 25 views 18 slides Aug 07, 2024

Slide Content

Statistical Significance

Statistical significance: some interesting examples. Coin 1 is flipped four times: 3 times it shows heads and once it shows tails. Can we say it is a biased coin? The same coin is flipped 10 times: 7 times it shows heads and 3 times tails. Can we now say it is a biased coin? The same coin is flipped 100 times: 69 times it shows heads and 31 times tails. Can we now say it is a biased coin? Lesson: the larger the sample, the more sure you can be about how biased the coin is. Coin 2 is flipped 10 times: 9 times it shows heads and once it shows tails. Can we say it is a biased coin? Lesson: the less variation in the results, the more sure we can be about the bias of the coin.
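The coin examples above can be checked numerically with an exact two-sided binomial test. The sketch below is not from the slides; it is a minimal pure-Python illustration that sums the probabilities of every outcome at least as unlikely as the observed head count, assuming a fair coin.

```python
from math import comb

def binom_pvalue(heads: int, flips: int) -> float:
    """Two-sided exact binomial test against a fair coin (p = 0.5).

    Sums P(X = i) over every outcome i that is no more likely than
    the observed count -- the standard two-sided method.
    """
    total = 2 ** flips
    probs = [comb(flips, i) / total for i in range(flips + 1)]
    observed = probs[heads]
    return sum(p for p in probs if p <= observed + 1e-12)

# 3 heads out of 4 flips: p = 0.625, far above 0.05 -> no evidence of bias
print(binom_pvalue(3, 4))
# 69 heads out of 100 flips: p is well below 0.05 -> strong evidence of bias
print(binom_pvalue(69, 100))
# 9 heads out of 10 flips (coin 2): p ~ 0.021 -> evidence of bias even at n = 10
print(binom_pvalue(9, 10))
```

Note how the same 70/30 split that is inconclusive at n = 10 becomes significant at n = 100, which is exactly the slide's first lesson.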

Statistical significance is simply a way of stating how sure you are that the sample values are true to the population. Statisticians have developed several algorithms to calculate the level of statistical significance. Usually a statistical test is carried out which provides a p-value, or significance value. The lower the p-value, the greater the certainty that the sample represents the population.

Statistical Significance. Before you test statistical significance you first need to develop a hypothesis. Hypotheses are usually developed in pairs: a null hypothesis and an alternative hypothesis. E.g. null hypothesis H0: the coin is not biased; there is no difference between males and females w.r.t. the usage level. Alternative hypothesis H1: the coin is biased; there is a difference between males and females w.r.t. the usage level.

Chi-squared test. The chi-squared test is used to establish an association between two categorical variables.
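As an illustration of the test (the counts here are invented, not from the course data), the chi-squared statistic for a 2x2 table can be computed by hand and compared with 3.841, the 5% critical value for one degree of freedom:

```python
def chi_squared_2x2(table):
    """Chi-squared statistic for a 2x2 contingency table (list of two rows)."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    n = sum(row_totals)
    stat = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count under independence of the two variables
            expected = row_totals[i] * col_totals[j] / n
            stat += (observed - expected) ** 2 / expected
    return stat

# Hypothetical counts: gender (rows) vs. "bought the product" yes/no (columns)
table = [[30, 10],   # male:   30 yes, 10 no
         [20, 40]]   # female: 20 yes, 40 no
stat = chi_squared_2x2(table)
# df = (2-1)*(2-1) = 1; the 5% critical value is 3.841
print(stat, stat > 3.841)
```

Here the statistic exceeds the critical value, so for these invented counts we would reject the null hypothesis of no association.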

Dependent and independent variables. The cause is the independent variable and the effect is the dependent variable. E.g. customer satisfaction (cause, independent variable) drives customer loyalty (effect, dependent variable).

Analysis of Variance (ANOVA). ANOVA is a popular statistical analysis that shows whether a discontinuous (categorical) independent variable impacts a continuous dependent variable with an acceptable level of statistical significance. The null hypothesis in ANOVA is that the independent variable does not have a significant impact on the dependent variable.

ANOVA example. We are trying to test whether gender influences the usage rate of the website under study (please refer to the data and the questionnaire). Gender of the visitor is a discontinuous/categorical variable; usage rate of the website is a continuous variable. Null hypothesis: the customer's gender does not influence the usage rate of the website. Alternative hypothesis: the customer's gender influences the usage rate of the website.
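The F statistic that SPSS reports for this design can also be computed directly. This sketch uses invented usage-length data (not the course dataset) and compares the result with the 5% critical value F(1, 6) ≈ 5.99 from an F-table:

```python
def one_way_anova_f(groups):
    """F statistic for a one-way ANOVA over a list of samples."""
    n = sum(len(g) for g in groups)
    k = len(groups)
    grand_mean = sum(sum(g) for g in groups) / n
    # Between-group sum of squares: variation explained by group membership
    ssb = sum(len(g) * (sum(g) / len(g) - grand_mean) ** 2 for g in groups)
    # Within-group sum of squares: variation inside each group
    ssw = sum(sum((x - sum(g) / len(g)) ** 2 for x in g) for g in groups)
    return (ssb / (k - 1)) / (ssw / (n - k))

males = [5, 7, 6, 8]        # hypothetical usage lengths (hours/week)
females = [9, 11, 10, 12]   # hypothetical usage lengths (hours/week)
f = one_way_anova_f([males, females])
# df = (1, 6); the critical value at the 5% level is about 5.99
print(f, f > 5.99)
```

With these invented numbers F is large, so the null hypothesis would be rejected; in the slide's real data the p-value of 0.914 points the other way.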

ANOVA. Go to Analyze > Compare Means > One-Way ANOVA.

Select the independent variable (gender here) under Factor and the dependent variable (usage length) under Dependent List.

ANOVA. Open the Options dialogue box, tick Descriptives, click Continue and then click OK.

Interpret the results: here the p-value is 0.914, much above the 0.05 cut-off value, hence there is no difference between males and females in terms of usage length. The null hypothesis holds and the alternative hypothesis is rejected.

Correlation between two variables. Two variables can vary in tandem, i.e. be correlated. E.g. a student's likelihood of scoring high marks in two separate subjects. The correlation coefficient 'r' denotes the strength of association between two variables. 'r' can vary between +1 and -1; a value close to 0 means there is no correlation, i.e. no association.
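The coefficient 'r' can be computed directly from its definition. This is a minimal sketch with made-up marks in two subjects (the data are invented for illustration):

```python
from math import sqrt

def pearson_r(xs, ys):
    """Pearson correlation coefficient between two equal-length samples."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

maths   = [55, 60, 70, 80, 90]   # hypothetical marks in subject 1
physics = [50, 58, 66, 82, 94]   # hypothetical marks in subject 2
print(pearson_r(maths, physics))  # close to +1: the marks rise together
```

A student who does well in one subject tends to do well in the other, so r here is strongly positive; perfectly linear data would give exactly +1.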

Correlation plots Source: https://www.latestquality.com/interpreting-a-scatter-plot/

Simple regression. Regression involves predicting the value of a dependent (effect) variable by developing an equation from past values of the dependent and independent (cause) variables. E.g. if we have data from hundreds of Facebook users on the number of friends they have on Facebook and the average time they spend on Facebook, we can develop a regression equation that relates the number of Facebook friends to the average time someone spends on Facebook. The equation could look like this: average time spent per day on Facebook = a + b * number of friends on Facebook + e. Here a is called the intercept, b is called the regression coefficient and e is an error term; a and b are numbers estimated by the regression algorithm, and e captures what the equation cannot explain. If you now know the number of friends your friend has on Facebook, you could predict the average time she spends on Facebook.
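The intercept a and coefficient b are found by ordinary least squares. A minimal sketch of the fit, using fabricated friend/minutes data chosen to lie exactly on a line so the result is easy to check:

```python
def least_squares(xs, ys):
    """Fit y = a + b*x by ordinary least squares; returns (a, b)."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # Slope: covariance of x and y divided by the variance of x
    b = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
         / sum((x - mx) ** 2 for x in xs))
    a = my - b * mx  # the fitted line passes through the means
    return a, b

friends = [100, 200, 300, 400]   # number of Facebook friends (invented)
minutes = [60, 110, 160, 210]    # average minutes/day (invented, exactly linear)
a, b = least_squares(friends, minutes)
print(a, b)          # -> 10.0 0.5
print(a + b * 250)   # predicted minutes for 250 friends -> 135.0
```

With real, noisy data the points would not sit on the line; least squares then picks the a and b that minimise the squared errors e.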

Multiple regression. In multiple regression we try to find out the impact of more than one independent variable (cause) on a dependent variable (effect). A dependent variable may be impacted by several independent variables at the same time.

How to interpret a multiple regression output: Intention to reuse = 2.66 + 0.107 * Trust + 0.061 * Usefulness - 0.002 * Usage length
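Plugging hypothetical scores into the slide's fitted equation shows how each coefficient contributes to the prediction (the input values below are invented for illustration):

```python
def predict_intention(trust, usefulness, usage_length):
    """Fitted multiple-regression equation from the slide."""
    return 2.66 + 0.107 * trust + 0.061 * usefulness - 0.002 * usage_length

# Hypothetical customer: trust = 4 and usefulness = 5 (on 1-5 scales),
# usage length = 30 minutes
print(predict_intention(4, 5, 30))
# Trust has the largest coefficient, so a one-unit rise in trust moves the
# prediction most; the usage-length effect is tiny and negative.
```

Each coefficient is the change in the dependent variable for a one-unit change in that predictor, holding the other predictors constant.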

Household energy bill study. Model 1: the monthly energy bill is predicted from monthly income alone. Model 2: the monthly energy bill is predicted from monthly income, number of rooms and number of members.