CORRELATION AND REGRESSION_lesson.pptx.pdf

JoelHinay3 · 36 slides · Aug 12, 2024

Slide Content

CORRELATION AND REGRESSION
CERTIFIED STATISTICIAN SPECIALIST (CSS) PROGRAM
(VISIONARY RESEARCH ASSOCIATION, INC.)






Eugine B. Dodongan, MAEd
Agriculture Department
Davao de Oro State College

Objectives
1. Building pre-requisite knowledge on correlation and regression
2. Establishing the desired knowledge on correlation and regression
3. Developing skills in analyzing data

CORRELATION ANALYSIS
Correlation analysis is a statistical technique that gives
you information about the relationship between
variables.

Correlation analysis can be used to investigate the relationship
between variables. The strength of the correlation is measured by the
correlation coefficient, which ranges from -1 to +1. Correlation
analysis can thus be used to make a statement about both the strength
and the direction of the relationship.
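As a concrete illustration (not taken from the slides), the coefficient described above can be computed from scratch. The function name and sample data below are invented for demonstration:

```python
# Illustrative sketch: Pearson correlation coefficient from scratch.
import math

def pearson_r(x, y):
    """Pearson correlation between two equal-length sequences."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    # Sum of cross-products of deviations from the means.
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = math.sqrt(sum((a - mean_x) ** 2 for a in x))
    sd_y = math.sqrt(sum((b - mean_y) ** 2 for b in y))
    return cov / (sd_x * sd_y)

# A perfectly linear increasing relationship gives r = +1.
print(pearson_r([1, 2, 3, 4, 5], [2, 4, 6, 8, 10]))  # → 1.0
```

A perfectly decreasing relationship would give -1, and unrelated variables give values near 0, matching the -1 to +1 range stated above.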

ISSUES OF CORRELATION

Correlation DOES NOT IMPLY CAUSATION.

The size of a correlation can be influenced by the size of your sample.

LINEARITY of the relationship

RANGE of talent (VARIABILITY)

Homoscedasticity (equal variability)

Effect of discontinuous distributions (outliers)

Deciding what is a "good" correlation

CORRELATION ANALYSIS: ASSUMPTIONS

LEVEL OF MEASUREMENT
Both variables should be measured on a continuous scale.

RELATED PAIRS
Each observation must include a pair of values, one for each of the two variables.

ABSENCE OF OUTLIERS
The data should not contain outliers in either variable.

LINEARITY
The relationship between the variables should be linear.

REGRESSION ANALYSIS
Regression is a statistical method for modeling the relationship
between a dependent variable and one or more independent variables.
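For the simplest case, one dependent and one independent variable, the model y = b0 + b1·x can be fitted by ordinary least squares. This sketch is illustrative only; the function name and data are invented:

```python
# Illustrative sketch: simple linear regression (OLS) from scratch.
def ols_fit(x, y):
    """Return (intercept, slope) minimizing the sum of squared residuals."""
    n = len(x)
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope

# Data lying exactly on y = 1 + 2x recover intercept 1 and slope 2.
b0, b1 = ols_fit([1, 2, 3, 4], [3, 5, 7, 9])
print(b0, b1)  # → 1.0 2.0
```

With more than one independent variable the same least-squares idea applies, but the coefficients are found with matrix algebra rather than these closed-form sums.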

WHEN TO USE REGRESSION
ANALYSIS?

TYPES OF REGRESSION ANALYSES

ASSUMPTIONS OF LINEAR REGRESSION
1
First, linear regression requires the relationship between the
independent and dependent variables to be linear. It is also important
to check for outliers, since linear regression is sensitive to outlier
effects. The linearity assumption is best tested with scatter plots;
the following two examples depict cases where no linearity and little
linearity are present.


ASSUMPTIONS OF LINEAR REGRESSION
2
Secondly, linear regression analysis requires all variables to be
multivariate normal. This assumption is best checked with a histogram
or a Q-Q plot. Normality can also be checked with a goodness-of-fit
test, e.g., the Kolmogorov-Smirnov test. When the data are not
normally distributed, a non-linear transformation (e.g., a
log transformation) might fix the issue.
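A quick numerical screen, short of a formal goodness-of-fit test, is to compute sample skewness and excess kurtosis, both of which are near zero for normally distributed data. This is a sketch with invented data, not a replacement for the Kolmogorov-Smirnov test mentioned above:

```python
# Rough normality screen: sample skewness and excess kurtosis
# (both approximately 0 for normal data).
def skew_kurtosis(data):
    n = len(data)
    mean = sum(data) / n
    m2 = sum((v - mean) ** 2 for v in data) / n  # 2nd central moment
    m3 = sum((v - mean) ** 3 for v in data) / n  # 3rd central moment
    m4 = sum((v - mean) ** 4 for v in data) / n  # 4th central moment
    skew = m3 / m2 ** 1.5
    excess_kurt = m4 / m2 ** 2 - 3.0
    return skew, excess_kurt

# A symmetric sample has skewness exactly 0.
s, k = skew_kurtosis([1, 2, 3, 4, 5])
print(round(s, 6))  # → 0.0
```

Large absolute skewness (say, beyond about 1) is the kind of situation where the log transformation suggested above may help.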


ASSUMPTIONS OF LINEAR REGRESSION
3
Thirdly, linear regression assumes that there is little or no
multicollinearity in the data. Multicollinearity occurs when the
independent variables are too highly correlated with each other.
It can be detected with:
CORRELATION MATRIX
TOLERANCE
Variance Inflation Factor (VIF)
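In the special case of exactly two predictors, the VIF has a simple closed form, 1 / (1 - r²), where r is the Pearson correlation between the predictors. The sketch below uses that shortcut with invented data; with more predictors, each VIF comes from regressing one predictor on all the others:

```python
# Sketch: VIF for the two-predictor case, VIF = 1 / (1 - r^2).
import math

def pearson_r(x, y):
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    return cov / math.sqrt(vx * vy)

def vif_two_predictors(x1, x2):
    r = pearson_r(x1, x2)
    return 1.0 / (1.0 - r ** 2)

# Uncorrelated predictors give VIF = 1 (no multicollinearity).
print(vif_two_predictors([1, 2, 3, 4], [1, -1, -1, 1]))  # → 1.0
```

Near-collinear predictors push the VIF well above 10, the "highly correlated" zone described in the summary at the end of this deck.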

ASSUMPTIONS OF LINEAR REGRESSION
4
Fourthly, linear regression analysis requires that there is little or
no autocorrelation in the data. Autocorrelation occurs when the
residuals are not independent of each other; in other words, when the
value of y(x+1) is not independent of the value of y(x).
You can test the linear regression model for autocorrelation with the
Durbin-Watson test. Durbin-Watson's d tests the null hypothesis that
the residuals are not linearly autocorrelated.
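The Durbin-Watson statistic itself is straightforward to compute from the residuals: d = Σ(e_t - e_(t-1))² / Σe_t². The residual sequences below are invented to show the two extremes:

```python
# Sketch: the Durbin-Watson statistic computed directly from residuals.
# Values near 2 suggest no first-order autocorrelation.
def durbin_watson(residuals):
    num = sum((residuals[t] - residuals[t - 1]) ** 2
              for t in range(1, len(residuals)))
    den = sum(e ** 2 for e in residuals)
    return num / den

# Alternating-sign residuals (negative autocorrelation): d above 2.
print(durbin_watson([1, -1, 1, -1, 1, -1]))

# Residuals clustered by sign (positive autocorrelation): d below 2.
print(durbin_watson([1, 1, 1, -1, -1, -1]))
```

This matches the interpretation given in the summary slide: the statistic ranges from 0 to 4, with 2.0 indicating zero autocorrelation.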


ASSUMPTIONS OF LINEAR REGRESSION
5
The last assumption of linear regression analysis is
homoscedasticity. Homoscedasticity, or homogeneity of variances, is
the assumption of equal or similar variances in different groups being
compared. A scatter plot is a good way to check whether the data are
homoscedastic (meaning the residuals are equal across the regression
line). The following scatter plots show examples of data that are not
homoscedastic (i.e., heteroscedastic):
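Besides eyeballing a scatter plot, a crude numerical check in the spirit of the Goldfeld-Quandt test is to compare residual variance in the first and second halves of the data (ordered by the predictor); a ratio far from 1 hints at heteroscedasticity. A sketch with invented residuals:

```python
# Sketch of a Goldfeld-Quandt-style variance-ratio check for
# heteroscedasticity. Residual values are invented for illustration.
def variance(vals):
    m = sum(vals) / len(vals)
    return sum((v - m) ** 2 for v in vals) / (len(vals) - 1)

def variance_ratio(residuals):
    """Ratio of residual variance in the second half vs the first half."""
    half = len(residuals) // 2
    return variance(residuals[half:]) / variance(residuals[:half])

# Residual spread grows along the data: ratio well above 1
# suggests heteroscedasticity (a fan-shaped scatter plot).
print(variance_ratio([0.1, -0.1, 0.2, -0.2, 1.0, -1.0, 2.0, -2.0]))
```

The formal Goldfeld-Quandt test compares this ratio to an F distribution; the sketch only illustrates the idea behind the fan-shaped plots mentioned above.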


Summary of assumption checks:

Linearity: Scatter plot

Normality: Shapiro-Wilk / Kolmogorov-Smirnov (p-value > 0.05)

Multicollinearity: VIF. A VIF of 1 means the variables are not
correlated; a VIF between 1 and 5 shows that variables are moderately
correlated; and a VIF between 5 and 10 means that variables are highly
correlated.

Autocorrelation: Durbin-Watson test. The DW statistic ranges from zero
to four, with a value of 2.0 indicating zero autocorrelation. Values
below 2.0 mean there is positive autocorrelation, and values above 2.0
indicate negative autocorrelation.

Homoscedasticity: Scatter plot

CORRELATION AND
REGRESSION ANALYSIS USING
JAMOVI AND JASP