Correlation and Regression

60 slides, Oct 21, 2020

About This Presentation

This presentation covers the following topics:
1. Definition of Correlation and Regression
2. Meaning of Correlation and Regression
3. Types of Correlation and Regression
4. Karl Pearson's methods of correlation
5. Bivariate grouped data method
6. Spearman's rank correlation method
7...


Slide Content

By
Tushar Bhatt
[M.Sc (Maths), M.Phil (Maths), M.Phil (Stat.), M.A (Edu.), P.G.D.C.A]
Assistant Professor in Mathematics,
Atmiya University,
Rajkot

Correlation and Regression

Meaning of Correlation
• The prefix "co-" means "together", so correlation is a mutual relation between two variables (such as X and Y).

• Correlation is a statistical method that is commonly used to compare two or more variables.

• For example: income and expenditure, price and demand, etc.

Definition of Correlation
• Correlation is a statistical measure of the degree (strength) of association between two or more variables.

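Types of Correlation
• Correlation can be classified in three ways:

1. Type 1 correlation: positive and negative correlation
2. Type 2 correlation: simple, multiple, partial and total correlation
3. Type 3 correlation: linear and non-linear correlation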

Type 1 Correlation
• Positive correlation
• Negative correlation

Positive Correlation
• The correlation is said to be positive if the values of the two variables change in the same direction.

• In other words, as X increases, Y increases; similarly, as X decreases, Y decreases.

• For example: water consumption and temperature.

Negative Correlation
• The correlation is said to be negative if the values of the two variables change in opposite directions.

• In other words, as X increases, Y decreases; similarly, as X decreases, Y increases.

• For example: alcohol consumption and driving ability.

Type 2 Correlation
• Simple correlation
• Multiple correlation
• Partial correlation
• Total correlation

Simple Correlation
• In a simple correlation problem, only two variables are studied.

Multiple Correlation
• In a multiple correlation problem, three or more variables are studied.

Partial Correlation
• In a multiple correlation problem, when only two variables are considered and the other variables are held constant, the result is known as partial correlation.

Total Correlation
• Total correlation is based on all the relevant variables, which is normally not feasible.

Type 3 Correlation
• Linear correlation
• Non-linear correlation

Linear Correlation
• A correlation is said to be linear when the amount of change in one variable tends to bear a constant ratio to the amount of change in the other.
• The graph of variables having a linear relationship forms a straight line.
• For example:

X 1 2 3 4 5
Y 5 7 9 11 13

Here Y = 3 + 2X: every unit increase in X increases Y by 2.

Non-Linear Correlation
• The correlation is non-linear if the amount of change in one variable does not bear a constant ratio to the amount of change in the other variable.
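The constant-ratio idea can be checked numerically. The short Python sketch below (the helper name `change_ratios` is hypothetical, chosen for illustration) divides the change in Y by the change in X between consecutive observations, for the Y = 3 + 2X table and for Y = X², an assumed non-linear example that is not taken from the slides.

```python
def change_ratios(xs, ys):
    """Ratio (change in Y) / (change in X) between consecutive observations."""
    return [(ys[i + 1] - ys[i]) / (xs[i + 1] - xs[i]) for i in range(len(xs) - 1)]


x = [1, 2, 3, 4, 5]
linear = [3 + 2 * xi for xi in x]  # Y = 3 + 2X  ->  5, 7, 9, 11, 13
curved = [xi ** 2 for xi in x]     # assumed non-linear example: Y = X^2

print(change_ratios(x, linear))  # [2.0, 2.0, 2.0, 2.0]  constant ratio
print(change_ratios(x, curved))  # [3.0, 5.0, 7.0, 9.0]  ratio keeps changing
```

A constant list of ratios corresponds to linear correlation; a list whose values vary corresponds to non-linear correlation.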

Methods of Measuring Correlation
• There are four methods of measuring correlation:

1. Karl Pearson's coefficient of correlation method
2. Coefficient of correlation for bivariate grouped data method
3. Spearman's rank correlation method
4. Scatter diagram method

Karl Pearson's Coefficient of Correlation Method
• Direct method, used when:
  - the means of the x-series and y-series are integers;
  - mid-values of the x-series and y-series are not given in the instructions.
• Short-cut method, used when:
  - either the mean of the x-series or the mean of the y-series is not an integer;
  - mid-values of the x-series and y-series are given in the instructions;
  - the data are given in terms of the middle values of X and Y.

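Definition: Covariance
The covariance between X and Y is

$$\operatorname{cov}(X,Y) = \frac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{n}, \qquad n = \text{number of observations}$$

Karl Pearson's coefficient (r) of correlation: Direct method (frequency is not given)

Case 1: If $\bar{X}$ and $\bar{Y}$ are integers, then

$$r = \frac{\operatorname{cov}(X,Y)}{\sigma_x \, \sigma_y}$$

where

$$\sigma_x = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n}} \;(\text{standard deviation of } X), \qquad \sigma_y = \sqrt{\frac{\sum (y_i - \bar{Y})^2}{n}} \;(\text{standard deviation of } Y)$$

and $\bar{X}$ = mean of the x-series, $\bar{Y}$ = mean of the y-series.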
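As a cross-check of the direct method, here is a minimal Python sketch (the function name `pearson_direct` is hypothetical) that computes the two means, cov(X, Y) and the two standard deviations with divisor n, exactly as in the Case 1 formulas, and then divides. It is applied to the data of Ex-1 from the examples.

```python
import math


def pearson_direct(xs, ys):
    """Karl Pearson's r using deviations from the actual means (direct method)."""
    n = len(xs)
    if n == 0 or n != len(ys):
        raise ValueError("xs and ys must be non-empty and of equal length")
    x_bar = sum(xs) / n
    y_bar = sum(ys) / n
    # cov(X, Y) = sum((x_i - X_bar)(y_i - Y_bar)) / n
    cov = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / n
    # standard deviations with divisor n, as in the Case 1 formulas
    sigma_x = math.sqrt(sum((x - x_bar) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - y_bar) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)


# Ex-1 data from the examples
x = [1, 2, 3, 4, 5, 6, 7]
y = [6, 8, 11, 9, 12, 10, 14]
print(round(pearson_direct(x, y), 4))  # 0.8457
```

Because the n in cov(X, Y) cancels against the n inside σx and σy, using divisor n - 1 throughout would give the same r.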

Karl Pearson's coefficient (r) of correlation: Short-cut method (frequency is not given)

Case 2: If either $\bar{X}$ or $\bar{Y}$ is not an integer, then

$$r = \frac{\sum dx\,dy - \dfrac{(\sum dx)(\sum dy)}{n}}{\sqrt{\sum dx^2 - \dfrac{(\sum dx)^2}{n}}\;\sqrt{\sum dy^2 - \dfrac{(\sum dy)^2}{n}}}$$

where $dx = x - A$ ($A$ is the assumed mean of the x-series) and $dy = y - B$ ($B$ is the assumed mean of the y-series).
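The short-cut method can be sketched the same way. In this Python version (the name `pearson_shortcut` is hypothetical) the deviations are taken from assumed means A and B instead of the actual means. The data are those of Ex-4 in the examples, with A = 100 and B = 50 as that question specifies; the second call uses arbitrarily chosen A = 110 and B = 60 to illustrate that the choice of assumed means does not change r.

```python
import math


def pearson_shortcut(xs, ys, a, b):
    """Karl Pearson's r by the short-cut method, with assumed means A = a and B = b."""
    n = len(xs)
    dx = [x - a for x in xs]  # dx = x - A
    dy = [y - b for y in ys]  # dy = y - B
    s_dx, s_dy = sum(dx), sum(dy)
    s_dxdy = sum(p * q for p, q in zip(dx, dy))
    s_dx2 = sum(p * p for p in dx)
    s_dy2 = sum(q * q for q in dy)
    numerator = s_dxdy - s_dx * s_dy / n
    denominator = math.sqrt(s_dx2 - s_dx ** 2 / n) * math.sqrt(s_dy2 - s_dy ** 2 / n)
    return numerator / denominator


# Ex-4 data, assumed means A = 100 and B = 50
x = [104, 111, 104, 114, 118, 117, 105, 108, 106, 100, 104, 105]
y = [57, 55, 47, 45, 45, 50, 64, 63, 66, 62, 69, 61]
print(round(pearson_shortcut(x, y, 100, 50), 4))  # -0.6742
print(round(pearson_shortcut(x, y, 110, 60), 4))  # -0.6742 (different A, B, same r)
```

Working with small deviations such as 4, 11, -3 instead of the raw values is what makes the method a "short cut" for hand calculation.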

Examples
Ex-1: Find the correlation coefficient from the following tabular data:

X 1 2 3 4 5 6 7
Y 6 8 11 9 12 10 14

Ans: 0.845

Ex-2: Calculate Karl Pearson's coefficient of correlation between advertisement cost and sales from the data given below:

Adv. Cost 39 65 62 90 82 75 25 98 36 78
Sales 47 53 58 86 62 68 60 91 51 84

Ans: 0.7804 (approx. 0.78)

Ex-3: Find the correlation coefficient from the following tabular data:

X 1 2 3 4 5 6 7 8 9 10
Y 46 42 38 34 30 26 22 18 14 10

Ans: -1 (here Y = 50 - 4X exactly, so the correlation is perfectly negative)

Ex-4: Calculate Pearson's coefficient of correlation from the following data, taking 100 and 50 as the assumed means of the x-series and y-series respectively:

X 104 111 104 114 118 117 105 108 106 100 104 105
Y 57 55 47 45 45 50 64 63 66 62 69 61

Ans: -0.67

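Coefficient (r) of Correlation for Bivariate Grouped Data Method
• In the case of a bivariate grouped frequency distribution, the coefficient of correlation is given by

$$r = \frac{\sum fuv - \dfrac{(\sum fu)(\sum fv)}{n}}{\sqrt{\sum fu^2 - \dfrac{(\sum fu)^2}{n}}\;\sqrt{\sum fv^2 - \dfrac{(\sum fv)^2}{n}}}$$

where $u = \dfrac{X - A}{c}$ ($A$ is the assumed mean of the x-series and $c$ is the length of the class interval of X), $v = \dfrac{Y - B}{d}$ ($B$ is the assumed mean of the y-series and $d$ is the length of the class interval of Y), X and Y are the mid-values of the classes, and $n = \sum f$ is the total frequency.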

Examples
Ex-5: Find the correlation coefficient between the two variables Profit and Sales, whose grouped frequency distribution is given below in the form of a two-way frequency table:

Columns: Sales (in rupees); rows: Profit (Rs)

          80-90  90-100  100-110  110-120  120-130  Total
50-55     1      3       7        5        2        18
55-60     2      4       10       7        4        27
60-65     1      5       12       10       7        35
65-70     -      3       8        6        3        20
Total     4      15      37       28       16       100

Ans: 0.0946
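For the bivariate grouped data formula, a minimal Python sketch (the name `pearson_grouped` and its argument order are hypothetical) is shown below, applied to the Ex-5 table. It assumes X = Sales (columns) and Y = Profit (rows), uses the class mid-values, and takes A = 105, c = 10, B = 57.5, d = 5; these assumed means are an arbitrary choice, since r does not depend on them. A "-" cell is read as a frequency of 0.

```python
import math


def pearson_grouped(x_mids, y_mids, freq, a, c, b, d):
    """r for a two-way frequency table.

    freq[i][j] is the frequency in row i (a Y class) and column j (an X class);
    x_mids and y_mids are the class mid-values.
    """
    u = [(x - a) / c for x in x_mids]  # u = (X - A) / c
    v = [(y - b) / d for y in y_mids]  # v = (Y - B) / d
    n = s_fu = s_fv = s_fu2 = s_fv2 = s_fuv = 0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            n += f
            s_fu += f * u[j]
            s_fv += f * v[i]
            s_fu2 += f * u[j] ** 2
            s_fv2 += f * v[i] ** 2
            s_fuv += f * u[j] * v[i]
    numerator = s_fuv - s_fu * s_fv / n
    denominator = math.sqrt(s_fu2 - s_fu ** 2 / n) * math.sqrt(s_fv2 - s_fv ** 2 / n)
    return numerator / denominator


# Ex-5: X = Sales (columns 80-90 ... 120-130), Y = Profit (rows 50-55 ... 65-70)
sales_mids = [85, 95, 105, 115, 125]
profit_mids = [52.5, 57.5, 62.5, 67.5]
table = [
    [1, 3, 7, 5, 2],
    [2, 4, 10, 7, 4],
    [1, 5, 12, 10, 7],
    [0, 3, 8, 6, 3],  # the "-" cell is read as a frequency of 0
]
r = pearson_grouped(sales_mids, profit_mids, table, a=105, c=10, b=57.5, d=5)
print(round(r, 3))  # 0.095
```

The unrounded result, about 0.09455, agrees with the stated answer of 0.0946 up to rounding.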

Ex-6: Find the correlation coefficient between the ages of husbands and the ages of wives, given below in the form of a two-way frequency table:

Columns: Ages of Husbands (in years); rows: Wives' ages (yr)

          20-25  25-30  30-35  35-40  Total
15-20     20     10     3      2      35
20-25     4      28     6      4      42
25-30     -      5      11     -      16
30-35     -      -      2      -      2
35-40     -      -      -      -      0
Total     24     43     22     6      95

Ans: 0.47

Spearman's Rank Correlation Method
• The methods discussed in the previous sections depend on the magnitudes of the variables.
• However, there are situations where measuring the magnitude of a variable is not possible; in such cases we use Spearman's rank correlation method.
• For example, we cannot measure beauty and intelligence quantitatively, but it is possible to rank individuals in order.
• Charles Edward Spearman's formula for the rank correlation coefficient R is as follows:

$$R = 1 - \frac{6\sum d^2}{n^3 - n}$$

where n = number of individuals in each series and d = the difference between the ranks of the two series.
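Spearman's formula is easy to sketch in Python. In the version below (the function names `ranks_descending` and `spearman_r` are hypothetical), rank 1 is given to the largest value, and tied values are rejected because the formula has no correction for ties. Ranking from smallest instead would give the same R, as long as both series are ranked the same way. The data are those of Ex-7 and Ex-8.

```python
def ranks_descending(values):
    """Rank 1 for the largest value. Assumes no ties (no tie correction here)."""
    if len(set(values)) != len(values):
        raise ValueError("tied values need a tie-corrected formula")
    ordered = sorted(values, reverse=True)
    return [ordered.index(v) + 1 for v in values]


def spearman_r(rank_x, rank_y):
    """R = 1 - 6 * sum(d^2) / (n^3 - n), d = difference between paired ranks."""
    n = len(rank_x)
    sum_d2 = sum((p - q) ** 2 for p, q in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)


# Ex-7: the judges' ranks are given directly
print(spearman_r([1, 2, 3, 4, 5], [5, 4, 3, 2, 1]))  # -1.0

# Ex-8: marks must first be converted to ranks
maths = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
stats = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
print(round(spearman_r(ranks_descending(maths), ranks_descending(stats)), 4))  # 0.8182
```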




Examples
Ex-7: Calculate the rank correlation coefficient if two judges in a beauty contest ranked the entries as follows:

Judge X 1 2 3 4 5
Judge Y 5 4 3 2 1

Ans: -1

Ex-8: Ten students got the following percentages of marks in mathematics and statistics. Evaluate the rank correlation between them.

Roll No. 1 2 3 4 5 6 7 8 9 10
Marks in Maths 78 36 98 25 75 82 90 62 65 39
Marks in Stat. 84 51 91 60 68 62 86 58 53 47

Ans: 0.8181

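Scatter Diagram Method
• In this method, we first plot the observations as points in the XY-plane.
• X (the independent variable) is taken along the horizontal axis.
• Y (the dependent variable) is taken along the vertical axis.
• The pattern of the plotted points then shows the direction of the correlation (an upward or downward trend) and whether the relationship is linear or non-linear.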

Interpretation of Correlation Coefficient
The closer the value of the correlation coefficient is to 1 or -1, the stronger the linear relationship between the two variables and the more closely their fluctuations move together.
• If r = 1, there is a perfect positive relationship: all points lie exactly on a straight line that rises from left to right.
• If r = 0.5, there is a moderate positive relationship: the points tend to rise from left to right but are scattered around a line.
• If r = 0, there is no linear relationship between the two variables (a non-linear relationship may still exist).
• If r = -0.5, there is a moderate negative relationship: the points tend to fall from left to right but are scattered around a line.
• If r = -1, there is a perfect negative relationship: all points lie exactly on a straight line that falls from left to right.
Note that r measures how closely the points cluster around a straight line, not how steep that line is.

If the value of the correlation coefficient is between 0.1 and 0.5 (or between -0.1 and -0.5), the two variables are said to be weakly related. If it is between 0.9 and 1 (or between -0.9 and -1), the two variables are very strongly related.
As discussed earlier, a positive coefficient indicates variables that rise together, while a negative coefficient indicates variables that move in opposite directions. The sign of the coefficient therefore tells the direction of the relationship.
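Because Pearson's r measures linear association only, r = 0 does not rule out every kind of relationship. The Python sketch below uses a small made-up data set (not from the slides), Y = X² for X = -2, ..., 2: Y is completely determined by X, yet r is exactly 0 because the points lie on a curve that falls and then rises. The helper name `pearson_r` is hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Karl Pearson's r from deviations about the means."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    s_yy = sum((y - y_bar) ** 2 for y in ys)
    return s_xy / math.sqrt(s_xx * s_yy)


x = [-2, -1, 0, 1, 2]
y = [xi ** 2 for xi in x]  # Y = X^2: Y is completely determined by X
print(pearson_r(x, y))     # 0.0
```

A scatter diagram of these points would show the curved pattern immediately, which is why plotting the data is a useful companion to computing r.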

Regression

Types of Regression
• Simple regression: only two variables are studied at a time.

• Multiple regression: more than two variables are studied at a time.

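Lines of Regression
(a) Regression equation of Y on X

$$Y - \bar{Y} = b_{yx}\,(X - \bar{X})$$

where
1. $b_{yx} = \dfrac{\operatorname{cov}(X,Y)}{\sigma_x^2}$,
2. $\operatorname{cov}(X,Y) = \dfrac{\sum XY}{n} - \bar{X}\,\bar{Y}$,
3. $\sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2$,
4. n = total number of observations,
5. $b_{yx}$ = regression coefficient of the regression line of Y on X.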
(b) Regression equation of X on Y

$$X - \bar{X} = b_{xy}\,(Y - \bar{Y})$$

where
1. $b_{xy} = \dfrac{\operatorname{cov}(X,Y)}{\sigma_y^2}$,
2. $\operatorname{cov}(X,Y) = \dfrac{\sum XY}{n} - \bar{X}\,\bar{Y}$,
3. $\sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2$,
4. n = total number of observations,
5. $b_{xy}$ = regression coefficient of the regression line of X on Y.
Regression Equations
• The algebraic expressions of the regression lines are called regression equations.
• Since there are two regression lines, there are two regression equations.
• Using the previous method, the regression equation of Y on X can be written in the form Y = a + bX and that of X on Y in the form X = a + bY (the constants a and b differ between the two equations).
• The values of a and b depend on the means, the standard deviations and the coefficient of correlation between the two variables.
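The regression coefficients can be computed directly from raw data with $\operatorname{cov}(X,Y) = \sum XY/n - \bar{X}\bar{Y}$ and $\sigma_x^2 = \sum X^2/n - (\sum X/n)^2$ (and the same for Y). The Python sketch below (the function name `regression_coefficients` is hypothetical) applies these formulas to the Ex-1 correlation data; this is an illustrative choice, as the slides do not fit regression lines to that data.

```python
def regression_coefficients(xs, ys):
    """Return (X_bar, Y_bar, b_yx, b_xy) using divisor-n moment formulas."""
    n = len(xs)
    x_bar, y_bar = sum(xs) / n, sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - x_bar * y_bar  # sum(XY)/n - X_bar*Y_bar
    var_x = sum(x * x for x in xs) / n - x_bar ** 2                 # sigma_x^2
    var_y = sum(y * y for y in ys) / n - y_bar ** 2                 # sigma_y^2
    return x_bar, y_bar, cov / var_x, cov / var_y


# Illustration with the Ex-1 correlation data
x = [1, 2, 3, 4, 5, 6, 7]
y = [6, 8, 11, 9, 12, 10, 14]
x_bar, y_bar, b_yx, b_xy = regression_coefficients(x, y)
print(x_bar, y_bar)                    # 4.0 10.0
print(round(b_yx, 4), round(b_xy, 4))  # 1.0357 0.6905
```

This gives Y - 10 = 1.0357(X - 4) for Y on X and X - 4 = 0.6905(Y - 10) for X on Y. The product b_yx · b_xy = 29²/(28 × 42) ≈ 0.7151 equals r² for the same data (r ≈ 0.8457), a standard property of regression coefficients. The computational form ΣXY/n - X̄Ȳ is convenient by hand but can lose precision for very large values, where deviations from the means are safer.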

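• Regression equation of Y on X:

$$Y - \bar{Y} = r\,\frac{\sigma_y}{\sigma_x}\,(X - \bar{X})$$

where
1. $\sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2$,
2. $\sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2$,
3. r = correlation coefficient between X and Y,
4. n = total number of observations (or $\sum f$).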
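• Regression equation of X on Y:

$$X - \bar{X} = r\,\frac{\sigma_x}{\sigma_y}\,(Y - \bar{Y})$$

where
1. $\sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2$,
2. $\sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2$,
3. r = correlation coefficient between X and Y,
4. n = total number of observations (or $\sum f$).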
Ex-3: From the following data, calculate the two regression equations (lines of regression).

                     X     Y
Mean                 60    67.5
Standard Deviation   15    13.5

The correlation coefficient is r = 0.5.
Ans: Y = 0.45X + 40.5 and X = 0.556Y + 22.47

Ex-4: From the following data, calculate the two regression equations (lines of regression).

                     X       Y
Mean                 508.4   23.7
Standard Deviation   36.8    4.6

The correlation coefficient is r = 0.52.
Ans: Y = 0.065X - 9.35 and X = 4.16Y + 409.81
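When only the means, standard deviations and r are known, the two lines follow from b_yx = r σy/σx and b_xy = r σx/σy. The Python sketch below (the name `regression_lines` is hypothetical) returns the slope and intercept of each line for the data of Ex-3 and Ex-4, and then uses the Y-on-X line of Ex-3 to predict Y for X = 70, a value chosen only for illustration.

```python
def regression_lines(x_bar, y_bar, sd_x, sd_y, r):
    """Return ((slope, intercept) of Y on X, (slope, intercept) of X on Y)."""
    b_yx = r * sd_y / sd_x  # from Y - Y_bar = r (sd_y / sd_x)(X - X_bar)
    b_xy = r * sd_x / sd_y  # from X - X_bar = r (sd_x / sd_y)(Y - Y_bar)
    return (b_yx, y_bar - b_yx * x_bar), (b_xy, x_bar - b_xy * y_bar)


examples = [("Ex-3", (60, 67.5, 15, 13.5, 0.5)), ("Ex-4", (508.4, 23.7, 36.8, 4.6, 0.52))]
for name, data in examples:
    (b_yx, a_y), (b_xy, a_x) = regression_lines(*data)
    print(name, "Y on X:", round(b_yx, 4), round(a_y, 2), "| X on Y:", round(b_xy, 4), round(a_x, 2))
# Ex-3 Y on X: 0.45 40.5 | X on Y: 0.5556 22.5
# Ex-4 Y on X: 0.065 -9.35 | X on Y: 4.16 409.81

# Prediction from the Y-on-X line of Ex-3
(b_yx, a_y), _ = regression_lines(60, 67.5, 15, 13.5, 0.5)
print(round(b_yx * 70 + a_y, 2))  # 72.0
```

For Ex-4 this gives Y = 0.065X - 9.35 (Y on X) and X = 4.16Y + 409.81 (X on Y). For Ex-3 the unrounded X-on-Y intercept is 22.5; a value of 22.47 results from rounding b_xy to 0.556 before computing the intercept.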

Difference between Correlation and Regression
1. Describing Relationships
Correlation describes the degree to which two variables are related.
Regression gives a method for finding the relationship between two variables.
2. Making Predictions
Correlation merely describes how well two variables are related. Analysing the correlation between two variables does not improve the accuracy with which the value of the dependent variable can be predicted for a given value of the independent variable.
Regression allows us to predict values of the dependent variable for a given value of the independent variable more accurately.
3. Dependence Between Variables
In analysing correlation, it does not matter which variable is dependent and which is independent.
In analysing regression, it is necessary to distinguish between the dependent and the independent variable.

Assignment

Q-1 Do as directed (solve Ex-1 to Ex-5 using Karl Pearson's method).
Ex-1 Find the correlation coefficient between serum cholesterol level (X) and diastolic blood pressure (Y) for 10 randomly selected persons.

Person 1 2 3 4 5 6 7 8 9 10
Cholesterol (X) 307 259 341 317 274 416 267 320 274 336
Diastolic B.P. (Y) 80 75 90 74 75 110 70 85 88 78

Ans. = 0.809

Ex-2 Find the correlation coefficient between Intelligence Ratio (I.R.) and Emotional Ratio (E.R.) from the following data:

Student 1 2 3 4 5 6 7 8 9 10
I.R. (X) 105 104 102 101 100 99 98 96 93 92
E.R. (Y) 101 103 100 98 95 96 104 92 97 94

Ans. = 0.5963

Ex-3 Find the correlation coefficient from the following data:

X 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
Y 0.30 0.29 0.29 0.25 0.24 0.24 0.24 0.29 0.18 0.15

Ans. = -0.79

Ex-4 Find the correlation coefficient from the following data:

X 1 2 3 4 5 6 7 8 9 10
Y 10 12 16 28 25 36 41 49 40 50

Ans. = 0.9582

Ex-5 Find the correlation coefficient from the following data:

X 78 89 97 69 59 79 68 61
Y 125 137 156 112 107 138 123 110

Ans. = 0.9495

Ex-6: Find the correlation coefficient between the marks of a class test in the subjects maths and science, given in the form of a two-way frequency table:

Columns: Ages of Husbands (in years); rows: Wives' ages (yr)

          10-15  15-20  20-25  25-30  Total
40-50     0      1      1      1      3
50-60     3      3      0      1      7
60-70     3      3      3      1      10
70-80     1      0      1      1      3
80-90     0      3      3      1      7
Total     7      10     8      5      30

Ans: 0.1413

Ex-7: Find the correlation coefficient between the annual exam marks in the subjects Account and Statistics, given in the form of a two-way frequency table:

Columns: Marks in Account; rows: Stat. marks

          60-65  65-70  70-75  75-80  Total
50-60     5      5      5      5      20
60-70     0      5      5      10     20
70-80     8      10     0      22     40
80-90     3      3      3      3      12
90-100    3      3      0      2      8
Total     19     26     13     42     100

Ans: -0.088

Q-2 Two judges in a beauty contest rank the 12 contestants as follows:

X 1 2 3 4 5 6 7 8 9 10 11 12
Y 12 9 6 10 3 5 4 7 8 2 11 1

What degree of agreement is there between the judges?
Ans: -0.454

Q-3 Nine students secured the following percentages of marks in mathematics and chemistry:

Roll No. 1 2 3 4 5 6 7 8 9
Marks in Maths 78 36 98 25 75 82 90 62 65
Marks in Chem. 84 51 91 60 68 62 86 58 53

Find the rank correlation coefficient and comment on its value.
Ans: 0.833

Q-4 What is correlation? How will you measure it?
Q-5 Define the coefficient of correlation. Explain how you will interpret the value of the coefficient of correlation.
Q-6 What is a scatter diagram? To what extent does it help in finding the correlation between two variables? Or: Explain the scatter diagram method.
Q-7 What is rank correlation?
Q-8 Explain the following terms with an example.
(i) Positive and negative correlation
(ii) Scatter diagram
(iii) Correlation coefficient
(iv) Total correlation
(v) Partial correlation
Q-9 Explain the term regression and state the difference between correlation and regression.
Q-10 What are regression coefficients? State their properties.
Q-11 Explain the terms lines of regression and regression equations.

Q-12 Two judges in a beauty contest rank the 12 contestants as follows:

X 1 2 3 4 5 6 7 8 9 10 11 12
Y 12 9 6 10 3 5 4 7 8 2 11 1

What degree of agreement is there between the judges?
Ans: -0.454

Q-13 Nine students secured the following percentages of marks in mathematics and chemistry:

Roll No. 1 2 3 4 5 6 7 8 9
Marks in Maths 78 36 98 25 75 82 90 62 65
Marks in Chem. 84 51 91 60 68 62 86 58 53

Find the rank correlation coefficient and comment on its value.
Ans: 0.833

Mathematicians are born not made