Correlation and Regression analysis .ppt

jayeshraj0000 53 views 46 slides Aug 03, 2024

About This Presentation

Theory and Problem based explanation


Slide Content

Correlation & Regression
Chapter 15

Correlation
A statistical technique that is used to measure
and describe a relationship between two
variables (X and Y).

3 Characteristics

1. The direction of the relationship
–Positive correlation ( +)
–Negative Correlation (-)

2. The Form of the Relationship
Relationships tend to be linear: a line can be drawn through
the middle of the data points in each figure.
The most common use of regression is to
measure straight-line relationships.
This is not always the case; some relationships are not linear.

Scatterplot
Visual representation of scores.
Each individual score is represented by a
single point on the graph.
Allows you to see any patterns or trends
that exist in the data.

Psychology 295
[Figure: scatterplot with EXAM2 on the horizontal axis and HOMEWORK on the vertical axis]

3. The Degree of the Relationship
Measures how well the data fit the specific
form being considered.
The degree of relationship is measured by
the numerical value of the correlation (0 to
1.00, ignoring the sign)
–A perfect correlation is always identified by a
correlation of 1.00 or –1.00 and indicates a perfect fit.
–A correlation value of 0 indicates no fit or
relationship at all.

Example Correlations

Pearson Product-Moment Correlation
Measures the degree and the direction of the linear
relationship between two variables
Identified by r
$r = \frac{\text{degree to which X and Y vary together}}{\text{degree to which X and Y vary separately}} = \frac{\text{covariability of X and Y}}{\text{variability of X and Y separately}}$

How do we calculate the Pearson Correlation?
Sum of products of deviations (SP): provides a
parallel procedure for measuring the amount of
covariability between two variables.
Definitional formula: $SP = \sum (X - \bar{X})(Y - \bar{Y})$
Computational formula: $SP = \sum XY - \frac{\sum X \sum Y}{n}$
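As a minimal sketch, the two SP formulas can be checked against each other in Python. The scores below are hypothetical, chosen only so the arithmetic is easy to follow by hand, and the function names are illustrative.

```python
def sp_definitional(xs, ys):
    """SP as the sum of (X - mean_X) * (Y - mean_Y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def sp_computational(xs, ys):
    """SP as sum(XY) - sum(X) * sum(Y) / n."""
    n = len(xs)
    return sum(x * y for x, y in zip(xs, ys)) - sum(xs) * sum(ys) / n


# Hypothetical scores (not taken from the slides).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

print(sp_definitional(X, Y))   # 28.0
print(sp_computational(X, Y))  # 28.0
```

Both routes give the same value; the computational form avoids computing deviations from the means one score at a time.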

Computational Formula
$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$
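A short sketch of the computational formula, r = SP / sqrt(SS_X * SS_Y), follows. The data are hypothetical and the helper names are assumptions made for illustration.

```python
import math


def sum_of_squares(values):
    """SS: sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def sum_of_products(xs, ys):
    """SP: sum of products of deviations."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    return sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))


def pearson_r(xs, ys):
    """Pearson correlation as SP / sqrt(SS_X * SS_Y)."""
    return sum_of_products(xs, ys) / math.sqrt(sum_of_squares(xs) * sum_of_squares(ys))


# Hypothetical scores (not taken from the slides).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

print(pearson_r(X, Y))  # 0.875  (SP = 28, SS_X = 64, SS_Y = 16)
```

The function divides by zero if either variable has no variability, which matches the fact that r is undefined in that case.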

Standardized Formula
$r = \frac{\sum z_X z_Y}{N - 1}$
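The standardized formula can be sketched as below, assuming the z-scores are computed with the sample standard deviation (dividing SS by N - 1). That assumption is what makes the N - 1 divisor reproduce the same r as SP / sqrt(SS_X * SS_Y). The data and function names are hypothetical.

```python
import math


def z_scores(values):
    """Standardize scores using the sample standard deviation (n - 1)."""
    n = len(values)
    mean = sum(values) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in values) / (n - 1))
    return [(v - mean) / sd for v in values]


def pearson_r_from_z(xs, ys):
    """r as the sum of z-score products divided by N - 1."""
    zx = z_scores(xs)
    zy = z_scores(ys)
    return sum(a * b for a, b in zip(zx, zy)) / (len(xs) - 1)


# Hypothetical scores (not taken from the slides).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

print(pearson_r_from_z(X, Y))  # 0.875
```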

Using and Interpreting r
Prediction
Validity
Reliability
Theory Verification
*“CORRELATION DOES NOT MEAN CAUSATION”

Restriction of Range
Occurs whenever a correlation is computed
from scores that do not represent the full
range of possible values.
e.g., IQ tests among college students.
Correlations should not be generalized
beyond the range of data represented in the
sample.

Other Correlation Coefficients
Spearman r
–Two ranked (ordinal) variables
Point-biserial r
–Pearson r between dichotomous and continuous
variable
Phi Coefficient
–Pearson r between two dichotomous variables

Outliers
An individual with X and/or Y values that
are substantially different (larger or smaller)
from the values obtained for the other
individuals in the data set.
An outlier can dramatically influence the
value obtained for the correlation.
Always look at scatter plots to determine if
there are outliers.

Coefficient of Determination
$r^2$ measures the proportion of variability in
one variable that can be determined from
the relationship with the other variable.
A correlation of r = .80 means that $r^2 = .64$,
or 64% of the variability in Y scores can be
predicted from the relationship with X.

Hypothesis Testing with r
Standard hypotheses:
$H_0: \rho = 0$ (There is no population correlation)
$H_1: \rho \neq 0$ (There is a real correlation)
Other hypotheses are possible, e.g., one-
sided hypotheses or hypotheses with $\rho \neq 0$.
If the alternative hypothesis prevails, one
can state that a correlation is significant
in the sample.

There will always be some error between a sample
correlation (r) and the population correlation ($\rho$)
it represents.
The goal of the hypothesis test is to decide between
the following two alternatives:
–The nonzero sample correlation is simply due to
chance.
–The nonzero sample correlation accurately represents a
real, nonzero correlation in the population.
Use Table B6 to find the critical value.
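Where no table of critical r values is at hand, a common equivalent route is to convert r to a t statistic with df = n - 2, using t = r * sqrt((n - 2) / (1 - r^2)), and compare it with a critical t value. The sketch below assumes that route; it is not necessarily how the table referenced on the slide was built, and the function name is illustrative.

```python
import math


def t_from_r(r, n):
    """Convert a sample correlation r (from n pairs) to a t statistic.

    Returns (t, df) with df = n - 2, for testing H0: rho = 0.
    """
    if n < 3:
        raise ValueError("need at least 3 pairs of scores")
    if abs(r) >= 1:
        raise ValueError("t is undefined for a perfect correlation")
    df = n - 2
    t = r * math.sqrt(df / (1 - r * r))
    return t, df


# Hypothetical result: r = 0.875 from n = 5 pairs.
t, df = t_from_r(0.875, 5)
print(round(t, 2), df)  # 3.13 3
```

The resulting t is then compared with the critical t for the chosen alpha level and df; if the absolute value of t exceeds it, the null hypothesis is rejected.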

CAPA Example
Questions 5-10
Step 1) Calculate the SS for X
Step 2) Calculate the SS for Y
Formula for SS: $SS = \sum (X - \bar{X})^2$ OR $SS = \sum X^2 - \frac{(\sum X)^2}{n}$
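Steps 1 and 2 can be sketched as follows. The slides do not reproduce the CAPA data, so the scores here are hypothetical; the point is only that the definitional and computational SS formulas agree.

```python
def ss_definitional(values):
    """SS as the sum of squared deviations from the mean."""
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def ss_computational(values):
    """SS as sum(X^2) - (sum(X))^2 / n."""
    return sum(v * v for v in values) - sum(values) ** 2 / len(values)


# Hypothetical scores (not the CAPA data).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

print(ss_definitional(X), ss_computational(X))  # 64.0 64.0
print(ss_definitional(Y), ss_computational(Y))  # 16.0 16.0
```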

Calculation cont’d
Calculate $\sum XY$ to obtain SP:
Definitional formula: $SP = \sum (X - \bar{X})(Y - \bar{Y})$ or
Computational formula: $SP = \sum XY - \frac{\sum X \sum Y}{n}$

Calculate r:
$r = \frac{SP}{\sqrt{SS_X \, SS_Y}}$
Compare to Table B6 and find the critical
value
What can we determine?

The errors in prediction are
the distances between
the actual Y values and
the prediction line.

Best Fitting Line
The line that gives the best prediction of Y
We must find the specific values for a and b
$b = \frac{SP}{SS_X}$, $a = \bar{Y} - b\bar{X}$
$\hat{Y} = bX + a$
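A minimal sketch of finding a and b with these formulas, b = SP / SS_X and a = mean_Y - b * mean_X, is shown below. The data are hypothetical and `fit_line` and `predict` are illustrative names.

```python
def fit_line(xs, ys):
    """Return (b, a) for the line Y_hat = b * X + a."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x             # slope
    a = mean_y - b * mean_x   # Y-intercept
    return b, a


def predict(x, b, a):
    """Predicted Y for a given X."""
    return b * x + a


# Hypothetical scores (not taken from the slides).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

b, a = fit_line(X, Y)
print(b, a)              # 0.4375 1.375
print(predict(6, b, a))  # 4.0
```

The prediction at X = 6 equals the mean of Y because the fitted line always passes through the point (mean_X, mean_Y).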

Caution Be Aware
The predicted value is not perfect unless r =
1.00 or –1.00
The regression equation should not be used
to make predictions for X values that fall
outside the range of values covered by the
original data (restriction of range).

The Spearman Correlation
Used for non-linear relationships
Ordinal (ranked) Data
Can be used as an alternative to the Pearson
Measure of consistency

Ranks
Consistent relationships among scores
produce a linear relationship when the
scores are converted to ranks.

When is the Spearman correlation used?
When the original data are ordinal, that is, when
the X and Y values are ranks.
When a researcher wants to measure the
consistency of a relationship between X and
Y, independent of the specific form of the
relationship.
–that is, whether the relationship is monotonic

Calculating a Spearman Correlation
Step 1) Rank X and Y scores (separately)
Step 2) Use the Pearson correlation formula
for the ranks of the X and Y scores.
The result is the Spearman correlation, $r_s$.

Tied Scores
When converting scores into ranks for the
Spearman correlation, there may be two or more
identical scores. If this occurs:
1. List the scores in order from smallest to largest
(include tied values)
2. Assign a rank (1st, 2nd, etc.) to each position in the list.
3. When two or more scores are tied, compute the
mean of their ranked positions, and assign this
mean value as the final rank for each score.
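The three steps above can be sketched as a small ranking function. The function name and the example scores are hypothetical.

```python
def average_ranks(scores):
    """Rank scores from smallest to largest, giving tied scores the mean
    of the positions they occupy. Ranks are returned in the original order."""
    # Step 1: order the indices by score (ties stay next to each other).
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    ranks = [0.0] * len(scores)
    pos = 0
    while pos < len(order):
        # Find the end of the run of scores tied with the one at `pos`.
        end = pos
        while end + 1 < len(order) and scores[order[end + 1]] == scores[order[pos]]:
            end += 1
        # Steps 2 and 3: positions are 1-based; ties share their mean position.
        mean_rank = ((pos + 1) + (end + 1)) / 2
        for k in range(pos, end + 1):
            ranks[order[k]] = mean_rank
        pos = end + 1
    return ranks


print(average_ranks([3, 3, 5, 6, 6, 6, 12]))
# [1.5, 1.5, 3.0, 5.0, 5.0, 5.0, 7.0]
```

The two scores of 3 occupy positions 1 and 2 and both receive 1.5; the three scores of 6 occupy positions 4, 5, and 6 and all receive 5.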

Special Formula for the Spearman Correlation
$\bar{X} = \frac{n + 1}{2}$
$SS = \frac{n(n^2 - 1)}{12}$
$r_s = 1 - \frac{6 \sum D^2}{n(n^2 - 1)}$
*D is the difference between the X rank and the Y rank for each individual.
*n = number of pairs
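As a hedged check, the special formula can be compared with the Pearson formula applied to the same ranks. The two agree exactly only when there are no tied ranks, which the sketch below assumes; the ranks are hypothetical.

```python
import math


def spearman_from_ranks(x_ranks, y_ranks):
    """Special formula: 1 - 6 * sum(D^2) / (n * (n^2 - 1)). Assumes no ties."""
    n = len(x_ranks)
    sum_d2 = sum((rx - ry) ** 2 for rx, ry in zip(x_ranks, y_ranks))
    return 1 - 6 * sum_d2 / (n * (n * n - 1))


def pearson_r(xs, ys):
    """Pearson correlation as SP / sqrt(SS_X * SS_Y)."""
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    return sp / math.sqrt(ss_x * ss_y)


# Hypothetical ranks for five individuals (no ties).
x_ranks = [1, 2, 3, 4, 5]
y_ranks = [3, 1, 2, 5, 4]

print(round(spearman_from_ranks(x_ranks, y_ranks), 6))  # 0.6
print(round(pearson_r(x_ranks, y_ranks), 6))            # 0.6
```

Here the sum of D squared is 8, so the special formula gives 1 - 48/120 = 0.6, matching SP / sqrt(SS_X * SS_Y) = 6 / 10.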

Regression
The statistical technique for finding
the best-fitting straight line for a set
of data.
The goal is to find the line that best describes the
relationship for a set of X and Y data.

Regression Analysis
Question asked: Given one variable, can we
predict values of another variable?

Examples: Given the weight of a person, can we
predict how tall he/she is; given the IQ of a
person, can we predict their performance in
statistics; given the basketball team’s wins, can we
predict the extent of a riot. ...

Using regression analysis one can make this type
of prediction: the known variable is the predictor
and the variable being predicted is the criterion.

Regression analysis allows one to
–predict values of the criterion: point prediction
–estimate strength of predictability (significance
testing)

Regression line
Makes the relationship between variables
easier to see.
Identifies the center, or central tendency, of
the relationship, just as the mean describes
central tendency for a set of scores.
Can be used for prediction.

The Equation for a Line
$\hat{Y} = bX + a$
–b = the slope
–a = Y-intercept
–$\hat{Y}$ = predicted value

Example
A local tennis club charges $5 per hour plus an annual
membership fee of $25.
Compute the total cost of playing tennis for 10 hours per
month.
(predicted cost) Y = (constant) bX + (constant) a
When X = 10
Y = $5(10 hrs) + $25
Y = $75

When X = 30
Y = $5(30 hrs) + $25
Y = $175
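The tennis example is a direct instance of Y = bX + a with slope b = 5 (dollars per hour) and intercept a = 25 (the membership fee). A minimal sketch, with an illustrative function name:

```python
def total_cost(hours, rate_per_hour=5, membership_fee=25):
    """Total cost in dollars: slope * hours + intercept."""
    return rate_per_hour * hours + membership_fee


print(total_cost(10))  # 75
print(total_cost(30))  # 175
```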

Least Squares Solution
Minimize the sum of the squared
differences between data points and the line
The best fit line has the smallest total
squared error
We seek to minimize
$\sum (Y - \hat{Y})^2$

•When estimating the parameters for slope and intercept,
one minimizes the sum of the squared residuals, that is,
the squared prediction errors.
•This is least squares estimation.
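To illustrate what is being minimized, the sketch below computes the total squared error for the least-squares line and for two nudged lines. The data, the helper names, and the size of the nudges are all hypothetical; the slope and intercept use b = SP / SS_X and a = mean_Y - b * mean_X.

```python
def sse(xs, ys, b, a):
    """Sum of squared prediction errors for the line Y_hat = b * X + a."""
    return sum((y - (b * x + a)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Slope and intercept that minimize the sum of squared errors."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sp = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    b = sp / ss_x
    return b, mean_y - b * mean_x


# Hypothetical scores (not taken from the slides).
X = [0, 10, 4, 8, 8]
Y = [2, 6, 2, 4, 6]

b, a = least_squares_line(X, Y)
best = sse(X, Y, b, a)
print(best)  # 3.75

# Any other line gives a larger total squared error.
print(sse(X, Y, b + 0.1, a) > best)  # True
print(sse(X, Y, b, a + 0.5) > best)  # True
```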

The errors in prediction are
the distances between
the actual Y values and
the prediction line.

Equations
The line that gives the best prediction of Y
We must find the specific values for a and b
$b = \frac{SP}{SS_X}$, $a = \bar{Y} - b\bar{X}$
$\hat{Y} = bX + a$

Caution Be Aware
The predicted value is not perfect unless r =
1.00 or –1.00
The regression equation should not be used
to make predictions for X values that fall
outside the range of values covered by the
original data (restriction of range).

Conclusion
Using methods of statistical inference in
regression analysis we ask whether the
regression line explains a significant
portion of the variance of Y.