The easiest way to run linear regression using SPSS
Linear Regression Analysis using
SPSS Statistics
Dr Athar Khan
MBBS, MCPS, DPH, DCPS-HCSM, DCPS-HPE, MBA,
PGD-Statistics
Associate Professor
Liaquat College of Medicine & Dentistry
Introduction
• Linear regression is the next step up after correlation.
• It is used when we want to predict the value of a variable based on the value of another variable.
• The variable we want to predict is called the dependent variable (or sometimes, the outcome variable).
• The variable we are using to predict the other variable's value is called the independent variable (or sometimes, the predictor variable). Both appear in the syntax sketch below.
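A minimal sketch of this in SPSS syntax, assuming hypothetical variable names exam_score (dependent) and revision_time (predictor) that are not from the slides:

    * Hypothetical variables: predict exam_score from revision_time.
    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT exam_score
      /METHOD=ENTER revision_time.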
Introduction
• For example, exam performance can be predicted based on revision time, cigarette consumption based on smoking duration, and so forth.
• If you have two or more independent variables, rather than just one, you need to use multiple regression (see the sketch after this list).
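With two or more predictors, the only change to the sketch above is to list them all on METHOD=ENTER (sleep_hours is another hypothetical variable):

    REGRESSION
      /STATISTICS COEFF OUTS R ANOVA
      /DEPENDENT exam_score
      /METHOD=ENTER revision_time sleep_hours.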
Assumptions
• Assumption #1: Your two variables should be measured at the continuous level (i.e., they are either interval or ratio variables).
Assumptions
• Assumption #2: There needs to be a linear relationship between the two variables.
• Create a scatterplot using SPSS Statistics and then visually inspect it to check for linearity (see the syntax sketch after this list).
• If the relationship displayed in your scatterplot is not linear, you will have to either run a non-linear regression analysis, perform a polynomial regression, or "transform" your data.
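One way to create the scatterplot is SPSS's legacy GRAPH command; a minimal sketch using the same hypothetical variables as before:

    * Plot the dependent variable against the predictor to check linearity.
    GRAPH
      /SCATTERPLOT(BIVAR)=revision_time WITH exam_score.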
Assumptions
• Assumption #3: There should be no significant outliers.
• An outlier is an observed data point whose dependent-variable value is very different from the value predicted by the regression equation.
• As such, an outlier will be a point on a scatterplot that is (vertically) far away from the regression line, indicating that it has a large residual: the difference between an observed value and the value the regression predicts for it. SPSS can flag such cases for you (see the sketch after this list).
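A sketch of casewise diagnostics, again with the hypothetical variable names used above:

    * Report cases whose standardized residual is larger than +/-3.
    REGRESSION
      /DEPENDENT exam_score
      /METHOD=ENTER revision_time
      /CASEWISE PLOT(ZRESID) OUTLIERS(3).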
Residual
In regression analysis, the difference between the observed value of the dependent variable (y) and the predicted value (ŷ) is called the residual (e). Each data point has one residual.
Residual = Observed value − Predicted value
e = y − ŷ
Both the sum and the mean of the residuals are equal to zero. That is, Σe = 0 and ē = 0.
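For example (illustrative numbers only): if a student's observed exam score is y = 72 and the regression predicts ŷ = 68, the residual is e = 72 − 68 = 4.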
Assumptions
• Assumption #4: You should have independence of observations, which you can easily check using the Durbin-Watson statistic.
• If observations are made over time, it is likely that successive observations are related.
• If there is no autocorrelation (i.e., successive observations are not related), the Durbin-Watson statistic should be between 1.5 and 2.5 (see the sketch after this list).
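In syntax, the Durbin-Watson statistic is requested on the RESIDUALS subcommand; a sketch with the same hypothetical variables:

    REGRESSION
      /DEPENDENT exam_score
      /METHOD=ENTER revision_time
      /RESIDUALS DURBIN.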
Assumptions
• Assumption #5: Your data needs to show homoscedasticity, which is where the variances along the line of best fit remain similar as you move along the line (see the sketch below).
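A common way to inspect this in SPSS is a plot of standardized residuals (*ZRESID) against standardized predicted values (*ZPRED); a sketch, hypothetical variables as before:

    * Look for an even spread of points around zero across the whole plot.
    REGRESSION
      /DEPENDENT exam_score
      /METHOD=ENTER revision_time
      /SCATTERPLOT=(*ZRESID ,*ZPRED).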
Assumptions
• Assumption #6: Finally, the residuals (errors) of the regression line should be approximately normally distributed.
• Two common methods to check this assumption are a histogram (with a superimposed normal curve) and a Normal P-P Plot; both can be requested in one run (see the sketch after this list).
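Both plots are available on the RESIDUALS subcommand; a sketch with the same hypothetical variables:

    * Histogram and Normal P-P plot of the standardized residuals.
    REGRESSION
      /DEPENDENT exam_score
      /METHOD=ENTER revision_time
      /RESIDUALS HISTOGRAM(ZRESID) NORMPROB(ZRESID).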
If the beta coefficient is not statistically significant, no effect can be interpreted for that predictor. If the beta coefficient is significant, examine the sign of the beta.
For every 1-unit increase in the predictor variable, the dependent variable changes by the value of the unstandardized beta coefficient (B).
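For instance, if the hypothetical predictor revision_time had an unstandardized coefficient of B = 2.5 (an illustrative value, not from the slides), each additional hour of revision would predict a 2.5-point higher exam score.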