Statistical significance: some interesting examples
Coin 1 is flipped 4 times: it shows H 3 times and T once. Can we say it is a biased coin?
The same coin is flipped 10 times: it shows H 7 times and T 3 times. Can we now say it is a biased coin?
The same coin is now flipped 100 times: it shows H 69 times and T 31 times. Can we now say it is a biased coin?
Lesson: The larger the sample, the more confident you can be about whether the coin is biased.
Coin 2 is flipped 10 times: it shows H 9 times and T once. Can we say it is a biased coin?
Lesson: The less variation there is in the results, the more confident we can be that the coin is biased.
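One rough way to put numbers on the coin examples above is an exact two-sided binomial test against a fair coin. The sketch below is only an illustration: the function name `two_sided_p_value` is an assumption of this example, not something from the slides, and it uses only Python's standard library.

```python
from math import comb


def two_sided_p_value(heads, flips):
    """Exact two-sided binomial p-value, assuming the null hypothesis of a fair coin."""

    def pmf(k):
        # Probability of exactly k heads in `flips` tosses of a fair coin.
        return comb(flips, k) / 2 ** flips

    # How far the observed count lies from the expected count (flips / 2).
    distance = abs(heads - flips / 2)
    # Add up the probabilities of every outcome at least that far from the centre.
    return sum(pmf(k) for k in range(flips + 1) if abs(k - flips / 2) >= distance)


# The four situations from the coin examples: (heads, flips).
examples = [(3, 4), (7, 10), (69, 100), (9, 10)]
for heads, flips in examples:
    print(heads, "H out of", flips, "flips: p =", two_sided_p_value(heads, flips))
```

Under these assumptions, 3 H out of 4 gives p = 0.625 and 7 H out of 10 gives p of about 0.34, so neither is convincing evidence of bias. 69 H out of 100 gives a p-value below 0.001, and 9 H out of 10 gives about 0.02.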
Statistical significance
Statistical significance is simply a way of stating how sure you are that the sample values are true to the population.
Statisticians have developed several tests to calculate the level of statistical significance.
Usually a statistical test is carried out, which provides a p-value (or significance value).
The lower the p-value, the stronger the evidence against the null hypothesis, i.e. the less likely it is that the pattern seen in the sample arose by chance alone.
Statistical significance
Before you test statistical significance, you first need to develop a hypothesis. Hypotheses are usually developed in pairs: a null hypothesis and an alternative hypothesis.
E.g. Null hypothesis H0: The coin is not biased; there is no difference between males and females with respect to usage level.
Alternative hypothesis H1: The coin is biased; there is a difference between males and females with respect to usage level.
Chi-squared test
The chi-squared test is used to test whether there is an association between two categorical variables.
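As an illustration of the idea, the sketch below computes the chi-squared statistic for a 2x2 table by comparing observed counts with the counts expected if the two variables were unrelated. The counts and the function name `chi_squared_2x2` are hypothetical, chosen only for this example. The p-value shortcut applies only to a 2x2 table (1 degree of freedom), and no continuity correction is applied.

```python
from math import erfc, sqrt


def chi_squared_2x2(table):
    """Return (statistic, p_value) for a 2x2 table of observed counts."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Count expected in this cell if the two variables were independent.
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected

    # A 2x2 table has 1 degree of freedom; for that case the upper tail of
    # the chi-squared distribution equals erfc(sqrt(statistic / 2)).
    p_value = erfc(sqrt(statistic / 2))
    return statistic, p_value


# Hypothetical counts: rows = male / female, columns = heavy / light users.
observed_counts = [[30, 20], [20, 30]]
statistic, p_value = chi_squared_2x2(observed_counts)
print("chi-squared =", statistic, "p =", p_value)
```

With these made-up counts the p-value falls just below the usual 0.05 cut-off, so the null hypothesis of no association would be rejected.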
Dependent and independent variables
Customer Satisfaction (cause, independent variable) -> Customer Loyalty (effect, dependent variable)
Analysis of Variance (ANOVA)
Independent variable (discontinuous variable) -> Dependent variable (continuous variable)
ANOVA is a popular statistical analysis for testing whether a discontinuous independent variable affects a continuous dependent variable at an acceptable level of statistical significance.
The null hypothesis in ANOVA is that the independent variable does not have a significant impact on the dependent variable.
ANOVA example
We are trying to test whether gender influences the usage rate of the website under study (please refer to the data and the questionnaire).
Gender of the visitor (discontinuous/categorical variable) -> Usage rate of website (continuous variable)
Null hypothesis: The customer's gender does not influence the usage rate of the website.
Alternative hypothesis: The customer's gender influences the usage rate of the website.
ANOVA
Go to Analyze > Compare Means > One-Way ANOVA.
Select the independent variable (gender here) under Factor and the dependent variable (usage length) under Dependent List.
ANOVA
Click Options to open the dialogue box, tick Descriptives, click Continue and then click OK.
Interpret the results: Here the p-value is 0.914, well above the 0.05 cut-off value, so there is no statistically significant difference between males and females in terms of usage length. We fail to reject the null hypothesis, and the alternative hypothesis is not supported.
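To show what sits behind the p-value in such an output, the sketch below computes the one-way ANOVA F statistic by hand: the variation between group means divided by the variation within groups. The usage figures and the function name `one_way_anova_f` are hypothetical and are not taken from the study data. Turning F into a p-value needs the F distribution, which a statistics package supplies.

```python
def one_way_anova_f(groups):
    """Return the one-way ANOVA F statistic for a list of groups of values."""
    all_values = [value for group in groups for value in group]
    grand_mean = sum(all_values) / len(all_values)
    group_means = [sum(group) / len(group) for group in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(group) * (mean - grand_mean) ** 2
        for group, mean in zip(groups, group_means)
    )
    # Variation of individual values around their own group mean.
    ss_within = sum(
        (value - mean) ** 2
        for group, mean in zip(groups, group_means)
        for value in group
    )

    df_between = len(groups) - 1
    df_within = len(all_values) - len(groups)
    return (ss_between / df_between) / (ss_within / df_within)


# Hypothetical usage lengths for two groups of visitors.
male_usage = [4, 5, 6]
female_usage = [5, 6, 7]
f_statistic = one_way_anova_f([male_usage, female_usage])
print("F =", f_statistic)
```

An F value near 1 means the group means differ by about as much as the noise within the groups would produce, which corresponds to a large p-value.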
Correlation between two variables
Two variables can vary in tandem, or be correlated. E.g. a student's likelihood of scoring high marks in two separate subjects.
The correlation coefficient 'r' denotes the strength of association between two variables.
'r' can vary between +1 and -1. A value close to +1 or -1 indicates a strong positive or negative association; a value close to 0 means there is little or no linear correlation, or association.
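A minimal sketch of how the correlation coefficient can be computed from paired observations is given below. The marks and the function name `pearson_r` are hypothetical, introduced only for this illustration.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Return the Pearson correlation coefficient of two equal-length lists."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # How the two variables move together around their means.
    co_variation = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # How much each variable moves on its own.
    variation_x = sum((x - mean_x) ** 2 for x in xs)
    variation_y = sum((y - mean_y) ** 2 for y in ys)

    return co_variation / sqrt(variation_x * variation_y)


# Hypothetical marks of five students in two subjects.
maths_marks = [50, 60, 70, 80, 90]
physics_marks = [55, 65, 70, 85, 95]
r_marks = pearson_r(maths_marks, physics_marks)
print("r =", r_marks)
```

In this made-up data, students who score high in one subject also score high in the other, so r comes out close to +1.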
Simple regression
Cause -> Effect
Regression involves predicting the value of a dependent (effect) variable by developing an equation from past values of the dependent and independent variables.
E.g. If we have data from hundreds of Facebook users on the number of Facebook friends they have and the average time they spend on Facebook, we can develop a regression equation that relates the number of Facebook friends to the average time someone spends on Facebook.
The equation could be something like this: Average time spent per day on Facebook = a + b * number of Facebook friends + e
a is called the intercept, b is called the regression coefficient and e is an error term. a and b are numbers estimated by the regression algorithm; e is the part of the time spent that the equation does not explain.
If you now know the number of Facebook friends your friend has, you could predict the average time she spends on Facebook.
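The sketch below shows one common way a and b can be estimated, ordinary least squares, and how the fitted equation is then used for prediction. The data points and the function name `fit_simple_regression` are hypothetical, and the slides do not say which estimation method is used.

```python
def fit_simple_regression(xs, ys):
    """Return (a, b) for the least-squares line y = a + b * x."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n

    # Slope: co-variation of x and y divided by the variation of x.
    b = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / sum(
        (x - mean_x) ** 2 for x in xs
    )
    # Intercept: forces the line through the point (mean_x, mean_y).
    a = mean_y - b * mean_x
    return a, b


# Hypothetical data: number of friends and average minutes per day.
friends = [100, 200, 300, 400]
minutes_per_day = [20, 30, 40, 50]

a, b = fit_simple_regression(friends, minutes_per_day)

# Predict the time spent by someone with 250 friends.
predicted_minutes = a + b * 250
print("a =", a, "b =", b, "prediction =", predicted_minutes)
```

Because the made-up points lie exactly on a line, the error term is zero here; with real data the points scatter around the line and e captures that scatter.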
Multiple regression
Cause 1, Cause 2, Cause 3 -> Effect
In multiple regression we try to find out the impact of more than one independent variable on a dependent variable.
A dependent variable may be affected by several independent variables at the same time.
How to interpret a multiple regression output
Intention to reuse = 2.66 + 0.107 * Trust + 0.061 * Usefulness - 0.002 * Usage length
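One way to read this equation is to plug in values. The sketch below uses the coefficients shown above; the input scores and the function name `predict_intention_to_reuse` are hypothetical. Each coefficient is the change in predicted intention to reuse for a one-unit increase in that variable, with the other variables held constant.

```python
def predict_intention_to_reuse(trust, usefulness, usage_length):
    """Evaluate the regression equation from the slide for given inputs."""
    return 2.66 + 0.107 * trust + 0.061 * usefulness - 0.002 * usage_length


# Hypothetical respondent: trust = 4, usefulness = 5, usage length = 10.
baseline = predict_intention_to_reuse(4, 5, 10)

# The same respondent with trust one unit higher.
higher_trust = predict_intention_to_reuse(5, 5, 10)

print("baseline prediction:", baseline)
print("effect of one extra unit of trust:", higher_trust - baseline)
```

The difference between the two predictions equals the Trust coefficient, 0.107, which is the largest of the three coefficients. The Usage length coefficient is small and negative, so in this model a longer usage length lowers the prediction very slightly.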
Household energy bill study
Model 1: Monthly Income -> Monthly Energy Bill
Model 2: Monthly Income, No. of rooms, No. of members -> Monthly Energy Bill