Regression analysis is simplified in this presentation. Starting with simple linear regression and moving on to multiple regression, it covers the key statistics and the interpretation of various diagnostic plots. It also shows how to verify the regression assumptions and introduces some advanced concepts for choosing the best model. SAS program code for two examples is also included.
STEPS & ASSUMPTIONS OF REGRESSION
Step 1: Formulate the problem
Step 2: Define dependent & independent variables
Step 3: Build the general model
Step 4: Plot the scatter diagram
Step 5: Estimate the parameters
Step 6: Estimate the regression coefficient
Step 7: Test for significance
Step 8: Find the strength of the association
Step 9: Check the prediction accuracy
Step 10: Examine the residuals
Step 11: Cross-validate the model
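The estimation and residual steps above (Steps 5, 6 and 10) can be sketched for the simple linear case as follows. This is a minimal illustration in Python rather than the SAS code the slides refer to; the function name `fit_simple_linear` and the data values are made up for the example.

```python
def fit_simple_linear(x, y):
    """Least-squares estimates (b0, b1) for the model y = b0 + b1 * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of squared deviations of x, and cross-deviations of x and y
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("x must not be constant")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    b1 = sxy / sxx               # slope (regression coefficient)
    b0 = mean_y - b1 * mean_x    # intercept
    return b0, b1


# Hypothetical data, invented for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

b0, b1 = fit_simple_linear(x, y)
# Residual = observed value minus fitted value (examined in Step 10)
residuals = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
print(f"intercept={b0:.3f}, slope={b1:.3f}")
```

With an intercept in the model, the least-squares residuals sum to zero up to rounding, which is a quick sanity check on the fit.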
•Linearity of the phenomenon measured, meaning the mean of the dependent variable is linearly related to the independent variable
•Errors are normally distributed with a mean of zero
•Errors have equal variances, or in other words the variance of the error term is constant (homoscedasticity)
•Errors are independent, meaning uncorrelated
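Some of the error assumptions above can be checked numerically from the residuals. The sketch below is only one possible approach, not the method used in the slides: the helper `residual_checks` and its inputs are hypothetical. It uses the Durbin-Watson statistic for independence, which assumes the residuals are in observation order, and a crude comparison of residual spread between low and high values of x for equal variance. Normality is not tested here; a histogram or normal probability plot of the residuals is the usual visual check.

```python
def residual_checks(x, residuals):
    """Rough numeric summaries for three of the error assumptions."""
    n = len(residuals)
    if n < 4 or n != len(x):
        raise ValueError("need at least 4 residuals, one per x value")

    # Mean of zero: the average residual should be close to 0.
    mean_resid = sum(residuals) / n

    squared_total = sum(e ** 2 for e in residuals)
    if squared_total == 0:
        raise ValueError("all residuals are zero; nothing to check")

    # Independence: Durbin-Watson statistic. Values near 2 suggest little
    # first-order autocorrelation; values near 0 or 4 suggest strong
    # positive or negative autocorrelation.
    diffs = sum((residuals[i] - residuals[i - 1]) ** 2 for i in range(1, n))
    durbin_watson = diffs / squared_total

    # Equal variance: compare the mean squared residual in the lower and
    # upper halves of x. A ratio far from 1 hints at heteroscedasticity.
    order = sorted(range(n), key=lambda i: x[i])
    half = n // 2
    low = [residuals[i] for i in order[:half]]
    high = [residuals[i] for i in order[n - half:]]
    ms_low = sum(e ** 2 for e in low) / half
    ms_high = sum(e ** 2 for e in high) / half
    spread_ratio = ms_high / ms_low if ms_low > 0 else float("inf")

    return {
        "mean_residual": mean_resid,
        "durbin_watson": durbin_watson,
        "spread_ratio": spread_ratio,
    }


# Hypothetical residuals, invented for illustration only
checks = residual_checks(
    [1, 2, 3, 4, 5, 6],
    [0.5, -0.4, 0.3, -0.6, 0.4, -0.2],
)
print(checks)
```

These summaries complement, rather than replace, the diagnostic plots: a residual-versus-fitted plot usually shows non-linearity and unequal variance more clearly than any single number.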
•To provide justification for accepting or rejecting a given hypothesis
•In ANOVA, the null hypothesis is that all population means are equal and the alternative hypothesis is that not all of the population means are equal. It is assumed that the populations are normal and that they have equal variances.
SIGNIFICANCE TESTING
•To test the hypothesis, an F ratio is calculated. It has to be higher than the critical value of the F (Fisher) distribution, which depends on the degrees of freedom and therefore on the sample size, to show that the model fits the data better than the baseline model
•The result includes a p-value, which should be lower than 0.05 to conclude that a relationship exists between the dependent and independent variables
•The significance of the individual model parameters can be tested in a similar manner, using t-test statistics
•In regression, there are three types of sums of squares: variation explained by the model ($SS_M$), unexplained variation or error ($SS_E$), and total variation ($SS_T$)
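The three sums of squares and the F ratio described above can be computed directly from observed and fitted values. The following is a minimal sketch under stated assumptions: the function name `sums_of_squares` and the data are hypothetical, and the fitted values are taken from a least-squares line with an intercept, which is what makes $SS_T = SS_M + SS_E$ hold. Turning the F ratio into a p-value needs the F distribution, which is left to a statistics package.

```python
def sums_of_squares(y, y_hat, n_predictors):
    """Return SS_M, SS_E, SS_T and the F ratio for a fitted regression."""
    n = len(y)
    mean_y = sum(y) / n
    ss_t = sum((yi - mean_y) ** 2 for yi in y)              # total variation
    ss_m = sum((fi - mean_y) ** 2 for fi in y_hat)          # explained by model
    ss_e = sum((yi - fi) ** 2 for yi, fi in zip(y, y_hat))  # unexplained (error)

    df_m = n_predictors           # model degrees of freedom
    df_e = n - n_predictors - 1   # error degrees of freedom
    if df_e <= 0:
        raise ValueError("not enough observations for this many predictors")

    # F ratio = mean square of the model / mean square of the error
    f_ratio = (ss_m / df_m) / (ss_e / df_e) if ss_e > 0 else float("inf")
    return {"ss_m": ss_m, "ss_e": ss_e, "ss_t": ss_t, "f_ratio": f_ratio}


# Hypothetical data, invented for illustration only. The fitted values come
# from the least-squares line y_hat = 0.05 + 1.99 * x for x = 1..5.
y = [2.1, 3.9, 6.2, 7.8, 10.1]
y_hat = [2.04, 4.03, 6.02, 8.01, 10.00]

table = sums_of_squares(y, y_hat, n_predictors=1)
print(table)
```

A large F ratio means the model explains far more variation per degree of freedom than is left in the error term, which is what the comparison against the critical F value formalizes.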
COEFFICIENT OF DETERMINATION
•Coefficient of determination ($R^2$) explains the strength of association
•$R^2 = SS_M / SS_T$
•It measures the percentage of the variation in the dependent variable that is explained by the independent variable
•A value of $R^2$ closer to 1 means the regression line fits the data well (a value of exactly 1 is a perfect fit), whereas a value closer to 0 means the line does not fit the data well
•The $R^2$ value never decreases, and usually keeps increasing, as more independent variables are added to the model, so the results can be misleading
•After adding the first few variables, additional independent variables do not make much contribution
•Adjusted $R^2$ penalizes each added independent variable, so it reflects the percentage of variation explained by only the independent variables that actually affect the dependent variable
•For example, in the $R^2$ values below, adding more than 3 variables does not add any value to the model
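The difference between $R^2$ and adjusted $R^2$ discussed above can be illustrated with a short sketch. It uses the standard adjustment $1 - (1 - R^2)(n - 1)/(n - p - 1)$, where $n$ is the number of observations and $p$ the number of independent variables. The function names and all numbers are hypothetical and are not the values from the slides.

```python
def r_squared(ss_m, ss_t):
    """Coefficient of determination: share of total variation explained."""
    return ss_m / ss_t


def adjusted_r_squared(r2, n_obs, n_predictors):
    """Adjusted R^2 = 1 - (1 - R^2) * (n - 1) / (n - p - 1)."""
    df_e = n_obs - n_predictors - 1
    if df_e <= 0:
        raise ValueError("not enough observations for this many predictors")
    return 1 - (1 - r2) * (n_obs - 1) / df_e


# Hypothetical sums of squares: the model explains 90 of 100 units of variation
r2 = r_squared(90.0, 100.0)

# Same R^2 and same 30 observations, but 3 versus 6 independent variables
adj_3 = adjusted_r_squared(r2, n_obs=30, n_predictors=3)
adj_6 = adjusted_r_squared(r2, n_obs=30, n_predictors=6)
print(f"R^2={r2:.3f}, adjusted (3 vars)={adj_3:.3f}, adjusted (6 vars)={adj_6:.3f}")
```

When extra variables leave $R^2$ essentially unchanged, adjusted $R^2$ drops, which is the signal that those variables are not contributing to the model.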