"Understanding Correlation and Regression: Key Concepts for Data Analysis"

RekhaBoraChatare 66 views 43 slides Oct 09, 2024

About This Presentation

Correlation and Regression: An Overview
1. Definition of Correlation
Correlation is a statistical measure that describes the extent to which two variables are related. It indicates the direction and strength of a relationship between variables. The most common correlation coefficient is Pearson’s ...


Slide Content

Correlation & Regression

Correlation
Correlation is a statistical technique
used to determine the degree to which
two variables are related

Scatter diagram
• Rectangular coordinates
• Two quantitative variables
• One variable is called independent (X) and the second is called dependent (Y)
• Points are not joined
• No frequency table

Example
[Figure: scatter diagram of weight and systolic blood pressure; x-axis: Wt (kg), 60 to 120; y-axis: SBP (mmHg), 80 to 220]

Scatter plots
The pattern of data is indicative of the type of
relationship between your two variables:
positive relationship
negative relationship
no relationship

Positive relationship
[Figure: scatter plot; x-axis: Age in Weeks, 0 to 90; y-axis: Height in CM, 0 to 18]

Negative relationship
[Figure: scatter plot of Reliability against Age of Car]

No relation

Correlation Coefficient
Statistic showing the degree of relation
between two variables

Simple Correlation coefficient (r)

It is also called Pearson's correlation or product moment correlation coefficient.

It measures the nature and strength of the association between two variables of the quantitative type.

The sign of r denotes the nature of the association, while the value of r denotes the strength of the association.


If the sign is +ve this means the relation is direct (an increase in one variable is associated with an increase in the other variable and a decrease in one variable is associated with a decrease in the other variable).

While if the sign is -ve this means an inverse or indirect relationship (which means an increase in one variable is associated with a decrease in the other).


The value of r ranges between (-1) and (+1).

The value of r denotes the strength of the association, as illustrated by the following diagram.
[Diagram: number line from -1 to +1 with marks at -0.75, -0.25, 0, 0.25 and 0.75. At -1 and +1: perfect correlation. At 0: no relation. From 0 to 0.25 on either side: weak; from 0.25 to 0.75: intermediate; from 0.75 to 1: strong. Negative side: indirect; positive side: direct.]

If r = 0, this means no association or correlation between the two variables.
The strength bands apply to the magnitude of r, whatever its sign:
If 0 < |r| < 0.25 = weak correlation.
If 0.25 ≤ |r| < 0.75 = intermediate correlation.
If 0.75 ≤ |r| < 1 = strong correlation.
If |r| = 1 = perfect correlation.
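
The bands above can be sketched as a small lookup. This is only an illustration: the function name `describe_correlation` is hypothetical, and it assumes the bands are applied to the magnitude of r, as the symmetric diagram suggests.

```python
def describe_correlation(r: float) -> str:
    """Label r using the strength bands from the slides, applied to |r|."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    if size == 0:
        return "no correlation"
    if size < 0.25:
        strength = "weak"
    elif size < 0.75:
        strength = "intermediate"
    elif size < 1:
        strength = "strong"
    else:
        strength = "perfect"
    # The sign gives the nature of the association.
    direction = "direct" if r > 0 else "inverse"
    return f"{strength} {direction} correlation"


print(describe_correlation(0.76))   # strong direct correlation
print(describe_correlation(-0.10))  # weak inverse correlation
```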



























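How to compute the simple correlation coefficient (r)

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$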

Example:
A sample of 6 children was selected, and data about their age in years and weight in kilograms were recorded as shown in the following table. It is required to find the correlation between age and weight.

Serial No   Age (years)   Weight (Kg)
1           7             12
2           6             8
3           8             12
4           5             10
5           6             11
6           9             13

These 2 variables are of the quantitative type. One variable (age) is called the independent variable and denoted (X); the other (weight) is called the dependent variable and denoted (Y). To find the relation between age and weight, compute the simple correlation coefficient using the following formula:

$$r = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sqrt{\left(\sum x^2 - \dfrac{(\sum x)^2}{n}\right)\left(\sum y^2 - \dfrac{(\sum y)^2}{n}\right)}}$$
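
As a rough check of the formula on the six children in the table, the computation can be sketched in Python. The function name `pearson_r` is an illustrative choice, not something taken from the slides.

```python
from math import sqrt

# Age (x, years) and weight (y, kg) of the 6 children in the example table.
ages = [7, 6, 8, 5, 6, 9]
weights = [12, 8, 12, 10, 11, 13]


def pearson_r(xs, ys):
    """Simple correlation coefficient via the computational formula."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    sum_y2 = sum(y * y for y in ys)
    numerator = sum_xy - sum_x * sum_y / n
    denominator = sqrt((sum_x2 - sum_x ** 2 / n) * (sum_y2 - sum_y ** 2 / n))
    return numerator / denominator


r = pearson_r(ages, weights)
print(round(r, 2))  # 0.76
```

With the strength bands given earlier, r of about 0.76 would be read as a strong direct correlation between age and weight in this sample.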

Regression Analyses
Regression: technique concerned with predicting
some variables by knowing others
The process of predicting variable Y using
variable X

Regression

Uses a variable (x) to predict some outcome variable (y)

Tells you how values in y change as a function of changes in values of x

Correlation and Regression

Correlation describes the strength of a linear relationship between two variables
Linear means “straight line”
Regression tells us how to draw the straight line
described by the correlation

Regression

Calculates the “best-fit” line for a certain set of data
The regression line makes the sum of the squares of the residuals smaller than for any other line
Regression minimizes residuals
[Figure: scatter diagram with Wt (kg) on the x-axis, 60 to 120; y-axis ticks from 80 to 220]
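
The claim that the regression line has the smallest sum of squared residuals can be spot-checked numerically. The sketch below borrows the age and weight values from the 6-children example in this deck (the blood pressure points are only shown as a plot), and the helper names are hypothetical. It compares the fitted line against a few nudged lines, which illustrates the property but does not prove it.

```python
ages = [7, 6, 8, 5, 6, 9]
weights = [12, 8, 12, 10, 11, 13]


def sum_squared_residuals(a, b, xs, ys):
    """Sum of squared vertical distances between the points and y = a + b*x."""
    return sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))


def least_squares_line(xs, ys):
    """Return (intercept, slope) of the least-squares line."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


a, b = least_squares_line(ages, weights)
best = sum_squared_residuals(a, b, ages, weights)

# Nudging the intercept or the slope away from the fitted values
# makes the sum of squared residuals larger.
for da, db in [(0.5, 0.0), (-0.5, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    assert sum_squared_residuals(a + da, b + db, ages, weights) > best

print(round(best, 2))
```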

Regression Equation
The regression equation describes the regression line mathematically, in terms of its:
Intercept
Slope
[Figure: plot of SBP (mmHg), 80 to 220, against Wt (kg), 60 to 120]

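Linear Equations
Y = bX + a
a = Y-intercept
b = Slope = (Change in Y) / (Change in X)
$$\hat{y} = a + bX$$
[Figure: straight line drawn on X and Y axes, marking the Y-intercept a and the slope b as the change in Y over the change in X]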

By using the least squares method (a procedure that minimizes the sum of the squared vertical deviations of the plotted points from a straight line) we are able to construct a best fitting straight line to the scatter diagram points and then formulate a regression equation in the form of:

$$\hat{y} = a + bX$$

$$\hat{y} = \bar{y} + b\,(x - \bar{x})$$

where

$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}}$$
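
The two ways of writing the regression equation describe the same line. Expanding the mean-centred form shows how the intercept a follows from the slope b and the two means:

$$\hat{y} = \bar{y} + b\,(x - \bar{x}) = (\bar{y} - b\,\bar{x}) + b\,x \quad\Longrightarrow\quad a = \bar{y} - b\,\bar{x}$$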

Hours studying and grades

Regressing grades on hours
[Figure: Linear Regression; scatter plot with x-axis: Number of hours spent studying, 2.00 to 10.00; y-axis ticks from 70.00 to 90.00]
Final grade in course = 59.95 + 3.17 * study
R-Square = 0.88
Predicted final grade in class = 59.95 + 3.17*(number of hours you study per week)

Predict the final grade of…
Someone who studies for 12 hours:
Final grade = 59.95 + (3.17*12)
Final grade = 97.99
Someone who studies for 1 hour:
Final grade = 59.95 + (3.17*1)
Final grade = 63.12
Predicted final grade in class = 59.95 + 3.17*(hours of study)
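
The same arithmetic as a minimal Python sketch. The coefficients 59.95 and 3.17 come from the fitted line above; the constant and function names are illustrative.

```python
INTERCEPT = 59.95  # predicted grade at 0 hours of study, from the fitted line
SLOPE = 3.17       # predicted grade points gained per extra hour of study


def predicted_final_grade(hours_per_week: float) -> float:
    """Plug a number of study hours into the regression equation."""
    return INTERCEPT + SLOPE * hours_per_week


print(round(predicted_final_grade(12), 2))  # 97.99
print(round(predicted_final_grade(1), 2))   # 63.12
```

Both 1 hour and 12 hours appear to fall outside the 2 to 10 hour range shown on the plot, so these predictions extrapolate beyond the observed data and should be read with caution.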

Exercise
A sample of 6 persons was selected. Their age (x variable) and weight (y variable) are given in the following table. Find the regression equation and the predicted weight when age is 8.5 years.

Serial no.   Age (x)   Weight (y)
1            7         12
2            6         8
3            8         12
4            5         10
5            6         11
6            9         13

Answer
Serial no.   Age (x)   Weight (y)   xy    x^2   y^2
1            7         12           84    49    144
2            6         8            48    36    64
3            8         12           96    64    144
4            5         10           50    25    100
5            6         11           66    36    121
6            9         13           117   81    169
Total        41        66           461   291   742

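$$\bar{x} = \frac{41}{6} = 6.83 \qquad \bar{y} = \frac{66}{6} = 11$$

$$b = \frac{461 - \dfrac{41 \times 66}{6}}{291 - \dfrac{(41)^2}{6}} = 0.92$$

Regression equation

$$\hat{y}_{(x)} = 11 + 0.92\,(x - 6.83)$$

$$\hat{y}_{(x)} = 4.675 + 0.92x$$

(The slides use 4.675 as the intercept. Expanding with the rounded values gives about 4.72, and with unrounded b and mean age about 4.69, so the predictions below are accurate to roughly the first decimal place.)

$$\hat{y}_{(8.5)} = 4.675 + 0.92 \times 8.5 = 12.50 \text{ Kg}$$

$$\hat{y}_{(7.5)} = 4.675 + 0.92 \times 7.5 = 11.58 \text{ Kg}$$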
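
The slope and intercept can be re-derived from the column totals without intermediate rounding. The sketch below assumes only the totals in the answer table; the variable and function names are illustrative.

```python
# Column totals from the answer table (n = 6 persons).
n = 6
sum_x, sum_y = 41, 66
sum_xy, sum_x2 = 461, 291

mean_x = sum_x / n
mean_y = sum_y / n

# Least-squares slope, then the intercept from a = mean_y - b * mean_x.
slope = (sum_xy - sum_x * sum_y / n) / (sum_x2 - sum_x ** 2 / n)
intercept = mean_y - slope * mean_x


def predicted_weight(age_years: float) -> float:
    """Predicted weight in kg for a given age in years."""
    return intercept + slope * age_years


print(round(slope, 2))                  # 0.92
print(round(intercept, 2))              # 4.69
print(round(predicted_weight(8.5), 2))  # 12.54
```

Carrying full precision moves the predicted weight at 8.5 years from 12.50 Kg to about 12.54 Kg, a difference that comes only from rounding in the hand calculation.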


[Figure: regression line; x-axis: Age (in years), 7 to 9; y-axis: Weight (in Kg), 11.4 to 12.6]
We create a regression line by plotting two estimated values for y against their X component, then extending the line right and left.

Exercise 2
The following are the age (in years) and systolic blood pressure of 20 apparently healthy adults.

Age (x)   B.P (y)   Age (x)   B.P (y)
20        120       46        128
43        128       53        136
63        141       60        146
26        126       20        124
53        134       63        143
31        128       43        130
58        136       26        124
46        132       19        121
58        140       31        126
70        144       23        123

Find the correlation between age and blood pressure using simple and Spearman's correlation coefficients, and comment.
Find the regression equation.
What is the predicted blood pressure for a man aged 25 years?

Serial   x     y      xy       x^2
1        20    120    2400     400
2        43    128    5504     1849
3        63    141    8883     3969
4        26    126    3276     676
5        53    134    7102     2809
6        31    128    3968     961
7        58    136    7888     3364
8        46    132    6072     2116
9        58    140    8120     3364
10       70    144    10080    4900
11       46    128    5888     2116
12       53    136    7208     2809
13       60    146    8760     3600
14       20    124    2480     400
15       63    143    9009     3969
16       43    130    5590     1849
17       26    124    3224     676
18       19    121    2299     361
19       31    126    3906     961
20       23    123    2829     529
Total    852   2630   114486   41678

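$$b = \frac{\sum xy - \dfrac{\sum x \sum y}{n}}{\sum x^2 - \dfrac{(\sum x)^2}{n}} = \frac{114486 - \dfrac{852 \times 2630}{20}}{41678 - \dfrac{(852)^2}{20}} = 0.4547$$

$$\hat{y} = 112.13 + 0.4547x$$
for age 25
B.P = 112.13 + 0.4547 * 25 = 123.49 ≈ 123.5 mmHg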
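
The regression answer, and the simple (Pearson) correlation the exercise also asks for, can be cross-checked from the raw data. This is a sketch with illustrative variable names; Spearman's coefficient, which needs a ranking step with tie handling, is not covered here.

```python
from math import sqrt

# Age (x, years) and systolic blood pressure (y, mmHg) of the 20 adults.
ages = [20, 43, 63, 26, 53, 31, 58, 46, 58, 70,
        46, 53, 60, 20, 63, 43, 26, 19, 31, 23]
pressures = [120, 128, 141, 126, 134, 128, 136, 132, 140, 144,
             128, 136, 146, 124, 143, 130, 124, 121, 126, 123]

n = len(ages)
mean_x = sum(ages) / n
mean_y = sum(pressures) / n

# Corrected sums of squares and cross-products.
sxy = sum(x * y for x, y in zip(ages, pressures)) - sum(ages) * sum(pressures) / n
sxx = sum(x * x for x in ages) - sum(ages) ** 2 / n
syy = sum(y * y for y in pressures) - sum(pressures) ** 2 / n

slope = sxy / sxx
intercept = mean_y - slope * mean_x
pearson_r = sxy / sqrt(sxx * syy)

print(round(slope, 4))                   # 0.4548 (the slides truncate to 0.4547)
print(round(intercept, 2))               # 112.13
print(round(intercept + slope * 25, 1))  # 123.5
print(round(pearson_r, 2))               # 0.95
```

By the strength bands given earlier, r of about 0.95 would count as a strong direct correlation between age and systolic blood pressure in this sample.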

Multiple Regression
Multiple regression analysis is a
straightforward extension of simple
regression analysis which allows more
than one independent variable.
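
As a sketch of what the extension looks like, the simple regression equation gains one slope per independent variable. The subscripted symbols and the count k are notation assumed here, not taken from the slides:

$$\hat{y} = a + b_1 x_1 + b_2 x_2 + \dots + b_k x_k$$

With k = 1 this reduces to the simple regression equation ŷ = a + bX used above.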