This presentation covered the following topics:
1. Definition of Correlation and Regression
2. Meaning of Correlation and Regression
3. Types of Correlation and Regression
4. Karl Pearson's methods of correlation
5. Bivariate Grouped data method
6. Spearman's Rank correlation Method
7. Scattered diagram method
8. Interpretation of correlation coefficient
9. Lines of Regression
10. Regression Equations
11. Difference between correlation and regression
12. Related examples
Size: 1.16 MB
Language: en
Added: Oct 21, 2020
Slides: 60 pages
Slide Content
By
Tushar Bhatt
[M.Sc(Maths), M.Phil(Maths), M.Phil(Stat.),M.A(Edu.),P.G.D.C.A]
Assistant Professor in Mathematics,
Atmiya University,
Rajkot
Correlation and Regression
Meaning of Correlation
The prefix "co" means together, so correlation is a relation between
two variables (such as X and Y).
Correlation is a statistical method that is commonly used to
compare two or more variables.
For example: income and expenditure, price and demand, etc.
Definition of Correlation
Correlation is a statistical measure of the degree (strength) of
association between two or more variables.
Types of Correlation
Correlation can be classified in three ways:
1. Type – 1 correlation
2. Type – 2 correlation
3. Type – 3 correlation
Type – 1 correlation
Type – 1 correlation is divided into:
• Positive correlation
• Negative correlation
Positive Correlation
The correlation is said to be positive if the values of the two
variables change in the same direction.
In other words, as X increases Y increases, and as X decreases
Y decreases.
For example: water consumption and temperature.
Negative Correlation
The correlation is said to be negative if the values of the two
variables change in opposite directions.
In other words, as X increases Y decreases, and as X decreases
Y increases.
For example: alcohol consumption and driving ability.
Type – 2 correlation
Type – 2 correlation is divided into:
• Simple correlation
• Multiple correlation
• Partial correlation
• Total correlation
Simple Correlation
In a simple correlation problem, only two variables are studied.
Multiple Correlation
In a multiple correlation problem, three or more variables are
studied.
Partial Correlation
In a multiple correlation problem, when the relationship between two
variables is studied while the other variables are held constant,
it is known as partial correlation.
Total Correlation
Total correlation is based on all the relevant variables, which
is normally not feasible.
Type – 3 correlation
Type – 3 correlation is divided into:
• Linear correlation
• Non-linear correlation
Linear Correlation
A correlation is said to be linear when the amount of change
in one variable tends to bear a constant ratio to the amount
of change in the other.
The graph of the variables having a linear relationship will
form a straight line.
For example, the values in the table below satisfy Y = 3 + 2X:
X 1 2 3 4 5
Y 5 7 9 11 13
Non – Linear Correlation
The correlation would be non-linear, if the amount of change
in one variable does not bear a constant ratio to the amount
of change in the other variable.
Methods of measuring correlation
There are four methods of measuring correlation:
1. Karl Pearson’s coefficient of correlation method
2. Coefficient of correlation for bivariate grouped data method
3. Spearman’s rank correlation method
4. Scatter diagram method
Karl Pearson’s coefficient of correlation method can be applied in two ways:
• Direct method
  - If the means of the x-series and y-series are integers
  - If mid values of the x-series and y-series are not given in the instructions
• Short-cut method
  - If either the mean of the x-series or the mean of the y-series is not an integer
  - If mid values of the x-series and y-series are given in the instructions
  - If the data are given in terms of the middle values of X and Y
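Definition : Covariance
\[ \operatorname{cov}(X,Y) = \frac{\sum (x_i - \bar{X})(y_i - \bar{Y})}{n}, \qquad n = \text{number of observations} \]
Karl Pearson’s coefficient (r) of correlation method
Direct method (Frequency is not given)
Case -1: If \( \bar{X} \) and \( \bar{Y} \) are integers, then
\[ r = \frac{\operatorname{cov}(X,Y)}{\sigma_x \, \sigma_y} \]
where
\[ \sigma_x = \sqrt{\frac{\sum (x_i - \bar{X})^2}{n}} = \text{standard deviation of } X, \qquad \sigma_y = \sqrt{\frac{\sum (y_i - \bar{Y})^2}{n}} = \text{standard deviation of } Y \]
\( \bar{X} \) = mean of the x-series, \( \bar{Y} \) = mean of the y-series.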
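Karl Pearson’s coefficient (r) of correlation method
Short-cut Method (Frequency is not given)
Case -2: If either \( \bar{X} \) or \( \bar{Y} \) is not an integer, then
\[ r = \frac{\sum dx\,dy - \dfrac{\left(\sum dx\right)\left(\sum dy\right)}{n}}{\sqrt{\sum dx^2 - \dfrac{\left(\sum dx\right)^2}{n}} \; \sqrt{\sum dy^2 - \dfrac{\left(\sum dy\right)^2}{n}}} \]
where
dx = x - A, A is the assumed mean of the x-series,
dy = y - B, B is the assumed mean of the y-series.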
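The direct and short-cut formulas above can be sketched in Python as follows. This is a minimal illustration, not part of the original slides: the function names `pearson_direct` and `pearson_shortcut` and the assumed means 5 and 30 are hypothetical choices. The data are the X and Y values of assignment Ex-4 in this deck, whose stated answer is 0.9582.

```python
import math

def pearson_direct(xs, ys):
    """Direct method: r = cov(X, Y) / (sigma_x * sigma_y)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) / n
    sigma_x = math.sqrt(sum((x - mean_x) ** 2 for x in xs) / n)
    sigma_y = math.sqrt(sum((y - mean_y) ** 2 for y in ys) / n)
    return cov / (sigma_x * sigma_y)

def pearson_shortcut(xs, ys, a, b):
    """Short-cut method with assumed means a (x-series) and b (y-series)."""
    n = len(xs)
    dx = [x - a for x in xs]  # deviations from the assumed mean A
    dy = [y - b for y in ys]  # deviations from the assumed mean B
    num = sum(p * q for p, q in zip(dx, dy)) - sum(dx) * sum(dy) / n
    den_x = math.sqrt(sum(p * p for p in dx) - sum(dx) ** 2 / n)
    den_y = math.sqrt(sum(q * q for q in dy) - sum(dy) ** 2 / n)
    return num / (den_x * den_y)

# Data of assignment Ex-4 (true means are 5.5 and 30.7, not integers).
xs = [1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
ys = [10, 12, 16, 28, 25, 36, 41, 49, 40, 50]

r_direct = pearson_direct(xs, ys)
r_shortcut = pearson_shortcut(xs, ys, a=5, b=30)  # assumed means are arbitrary
print(round(r_direct, 4), round(r_shortcut, 4))
```

Both methods give the same value because the short-cut formula does not depend on which assumed means A and B are chosen; it only avoids working with fractional deviations by hand.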
Examples
Ex-1 : Find the correlation coefficient from the following tabular data :
Ans : 0.845
Ex-2 : Calculate Karl Pearson’s coefficient of correlation between
advertisement cost and sales as per the data given below:
Examples
Ex-3 : Find the correlation coefficient from the following
tabular data :
Ans : -0.99(approx)
• Ex-4: Calculate Pearson’s coefficient of correlation from the
following taking 100 and 50 as the assumed average of x-
series and y-series respectively:
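Coefficient (r) of correlation for Bivariate Grouped data method
In the case of a bivariate grouped frequency distribution, the
coefficient of correlation is given by
\[ r = \frac{\sum fuv - \dfrac{\left(\sum fu\right)\left(\sum fv\right)}{n}}{\sqrt{\sum fu^2 - \dfrac{\left(\sum fu\right)^2}{n}} \; \sqrt{\sum fv^2 - \dfrac{\left(\sum fv\right)^2}{n}}} \]
where
\( u = \dfrac{X - A}{c} \), A is the assumed mean of the x-series, c is the length of an interval,
\( v = \dfrac{Y - B}{d} \), B is the assumed mean of the y-series, d is the length of an interval.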
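The grouped-data formula above can be sketched in Python as shown below. This is an illustration under stated assumptions: the function name `grouped_correlation`, the class mid values and the two-way frequency table are all hypothetical (they are not the data of Ex-5 or Ex-6, whose tables are not reproduced in this text), and n is taken to be the total frequency Σf.

```python
import math

def grouped_correlation(x_mids, y_mids, freq, a, c, b, d):
    """Correlation from a two-way frequency table.

    freq[i][j] is the frequency of the cell with mid values
    (y_mids[i], x_mids[j]); a, b are assumed means and c, d class widths.
    """
    u = [(x - a) / c for x in x_mids]
    v = [(y - b) / d for y in y_mids]
    n = sum(sum(row) for row in freq)  # assumption: n is the total frequency
    sum_fu = sum_fv = sum_fu2 = sum_fv2 = sum_fuv = 0.0
    for i, row in enumerate(freq):
        for j, f in enumerate(row):
            sum_fu += f * u[j]
            sum_fv += f * v[i]
            sum_fu2 += f * u[j] ** 2
            sum_fv2 += f * v[i] ** 2
            sum_fuv += f * u[j] * v[i]
    num = sum_fuv - sum_fu * sum_fv / n
    den = math.sqrt(sum_fu2 - sum_fu ** 2 / n) * math.sqrt(sum_fv2 - sum_fv ** 2 / n)
    return num / den

# Hypothetical table: columns are X classes, rows are Y classes.
x_mids = [5, 15, 25]    # class width c = 10
y_mids = [10, 30, 50]   # class width d = 20
freq = [
    [4, 2, 0],
    [1, 5, 2],
    [0, 1, 5],
]
r_grouped = grouped_correlation(x_mids, y_mids, freq, a=15, c=10, b=30, d=20)
print(round(r_grouped, 3))
```

As with the short-cut method, the result does not change if different assumed means A and B are used; the substitution to u and v only keeps the hand arithmetic small.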
Examples
Ex-5 : Find the correlation coefficient between the grouped
frequency distribution of two variables (Profit and Sales)
given in the form of a two way frequency table :
Examples
Ex-6 : Find the correlation coefficient between the ages of
husbands and the ages of wives given in the form of a two
way frequency table :
Spearman’s Rank Correlation Method
The methods discussed in the previous sections depend on the
magnitude of the variables.
However, there are situations where the magnitude of a variable cannot
be measured; in such cases we use “Spearman’s Rank correlation method”.
For example, we cannot measure beauty and intelligence
quantitatively, but it is possible to rank individuals in order.
Edward Spearman’s formula for the rank correlation coefficient R is
as follows:
\[ R = 1 - \frac{6 \sum d^2}{n^3 - n} \]
where
n = number of individuals in each series,
d = the difference between the ranks of the two series.
Examples
Ex-7 : Calculate the rank correlation coefficient if two judges
in a beauty contest ranked the entries as follows:
Judge X 1 2 3 4 5
Judge Y 5 4 3 2 1
Ans : -1
• Ex-8: Ten students got the following percentage of marks in
mathematics and statistics. Evaluate the rank correlation
between them.
Roll. No.      1  2  3  4  5  6  7  8  9  10
Marks in Maths 78 36 98 25 75 82 90 62 65 39
Marks in Stat. 84 51 91 60 68 62 86 58 53 47
Ans : 0.8181
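Spearman's formula can be checked against Ex-7 and Ex-8 with a short Python sketch. The helper names `ranks_descending` and `spearman_rank` are hypothetical, and the ranking helper assumes there are no tied values (true for both examples); tied ranks would need a correction that is not covered here.

```python
def ranks_descending(values):
    """Rank 1 goes to the largest value; assumes no tied values."""
    order = sorted(values, reverse=True)
    return [order.index(v) + 1 for v in values]

def spearman_rank(rank_x, rank_y):
    """R = 1 - 6 * sum(d^2) / (n^3 - n), with d the rank differences."""
    n = len(rank_x)
    sum_d2 = sum((a - b) ** 2 for a, b in zip(rank_x, rank_y))
    return 1 - 6 * sum_d2 / (n ** 3 - n)

# Ex-7: the judges' ranks are given directly.
judge_x = [1, 2, 3, 4, 5]
judge_y = [5, 4, 3, 2, 1]
r_judges = spearman_rank(judge_x, judge_y)

# Ex-8: marks must first be converted to ranks.
maths = [78, 36, 98, 25, 75, 82, 90, 62, 65, 39]
stats = [84, 51, 91, 60, 68, 62, 86, 58, 53, 47]
r_marks = spearman_rank(ranks_descending(maths), ranks_descending(stats))

print(r_judges, round(r_marks, 4))  # -1.0 0.8182
```

For Ex-8 the sum of squared rank differences is 30, so R = 1 - 180/990 = 0.8181..., which matches the stated answer up to rounding.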
Scatter Diagram Method
In this method, we first plot the observations in the XY-plane.
X - the independent variable, along the horizontal axis.
Y - the dependent variable, along the vertical axis.
Interpretation of correlation coefficient
The closer the value of the correlation coefficient is to 1 or -1, the
stronger the linear relationship between the two variables.
If the value of r is 1, this denotes a perfect positive relationship
between the two: on a graph, all the points lie exactly on a line
that goes upwards.
If the value of r is 0.5, this denotes a positive relationship
between the two variables: on a graph, the points are scattered loosely
around a line that goes upwards.
If the value of r is 0, there is no linear relationship between the
two variables.
If the value of r is -0.5, this denotes a negative relationship
between the two variables: on a graph, the points are scattered loosely
around a line that goes downwards.
If the value of r is -1, this denotes a perfect negative relationship
between the two variables: on a graph, all the points lie exactly on a
line that goes downwards.
The value of r measures how closely the points follow a straight line,
not how steep that line is.
If the value of the correlation coefficient is between 0.1 and 0.5 or
between -0.1 and -0.5, the two variables are said to be
weakly related. If the value of the correlation coefficient is
between 0.9 and 1 or between -0.9 and -1, the two variables are
extremely strongly related.
As discussed earlier, a positive coefficient indicates variables
that rise and fall together.
A negative coefficient, on the other hand, indicates variables that
move in opposite directions. The direction of the relationship can be
read directly from the sign of the coefficient.
Regression
Types of Regression
• SIMPLE REGRESSION
Study of only two variables at a time.
• MULTIPLE REGRESSION
Study of more than two variables at a time.
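Lines of Regression
(a) Regression Equation Y on X
\[ (Y - \bar{Y}) = b_{yx} \, (X - \bar{X}) \]
where
1. \( b_{yx} = \dfrac{\operatorname{cov}(X,Y)}{\sigma_x^2} \),
2. \( \operatorname{cov}(X,Y) = \dfrac{\sum XY}{n} - \bar{X}\,\bar{Y} \),
3. \( \sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2 \),
4. n = total number of observations,
5. \( b_{yx} \) = regression coefficient of the regression line Y on X.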
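Lines of Regression
(b) Regression Equation X on Y
\[ (X - \bar{X}) = b_{xy} \, (Y - \bar{Y}) \]
where
1. \( b_{xy} = \dfrac{\operatorname{cov}(X,Y)}{\sigma_y^2} \),
2. \( \operatorname{cov}(X,Y) = \dfrac{\sum XY}{n} - \bar{X}\,\bar{Y} \),
3. \( \sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2 \),
4. n = total number of observations,
5. \( b_{xy} \) = regression coefficient of the regression line X on Y.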
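The two lines of regression can be computed from raw data with a short Python sketch. The function name `regression_lines` and its return layout are hypothetical; the data are the X and Y values from the linear-correlation table earlier in this deck (Y = 3 + 2X), so the Y-on-X line should recover that equation.

```python
def regression_lines(xs, ys):
    """Return ((a_yx, b_yx), (a_xy, b_xy)) for Y = a_yx + b_yx*X and X = a_xy + b_xy*Y."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum(x * y for x, y in zip(xs, ys)) / n - mean_x * mean_y
    var_x = sum(x * x for x in xs) / n - mean_x ** 2  # sigma_x squared
    var_y = sum(y * y for y in ys) / n - mean_y ** 2  # sigma_y squared
    b_yx = cov / var_x  # regression coefficient of Y on X
    b_xy = cov / var_y  # regression coefficient of X on Y
    # Rearranging (Y - mean_y) = b_yx (X - mean_x) gives the intercepts.
    return (mean_y - b_yx * mean_x, b_yx), (mean_x - b_xy * mean_y, b_xy)

xs = [1, 2, 3, 4, 5]
ys = [5, 7, 9, 11, 13]
(a_yx, b_yx), (a_xy, b_xy) = regression_lines(xs, ys)
print(f"Y = {a_yx} + {b_yx} X")  # Y = 3.0 + 2.0 X
print(f"X = {a_xy} + {b_xy} Y")  # X = -1.5 + 0.5 Y
```

Because this data set is perfectly linear, the two lines coincide and the product of the two regression coefficients is 1; for data with scatter the two lines differ.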
Regression Equations
The algebraic expressions of the regression lines are called
regression equations.
Since there are two regression lines, there are two
regression equations.
Using the previous method, we obtain the regression
equation of Y on X as Y = a + bX and that of X on Y as X = a + bY
(the constants a and b differ between the two equations).
The values of “a” and “b” depend on the means, the standard
deviations and the coefficient of correlation between the two
variables.
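Regression equation Y on X
\[ (Y - \bar{Y}) = r \, \frac{\sigma_y}{\sigma_x} \, (X - \bar{X}) \]
where
1. \( \sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2 \),
2. r = correlation coefficient between X and Y,
3. \( \sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2 \),
4. n = total number of observations, or \( \sum f \).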
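Regression equation X on Y
\[ (X - \bar{X}) = r \, \frac{\sigma_x}{\sigma_y} \, (Y - \bar{Y}) \]
where
1. \( \sigma_x^2 = \dfrac{\sum X^2}{n} - \left(\dfrac{\sum X}{n}\right)^2 \),
2. r = correlation coefficient between X and Y,
3. \( \sigma_y^2 = \dfrac{\sum Y^2}{n} - \left(\dfrac{\sum Y}{n}\right)^2 \),
4. n = total number of observations, or \( \sum f \).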
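The regression equations written with r agree with the regression-coefficient form used under Lines of Regression. The short derivation below is added as a reading aid; it uses only the definition of r from the Karl Pearson section, r = cov(X,Y) / (σx σy).

```latex
\[
b_{yx} = \frac{\operatorname{cov}(X,Y)}{\sigma_x^2}
       = \frac{r\,\sigma_x\,\sigma_y}{\sigma_x^2}
       = r\,\frac{\sigma_y}{\sigma_x},
\qquad
b_{xy} = \frac{\operatorname{cov}(X,Y)}{\sigma_y^2}
       = \frac{r\,\sigma_x\,\sigma_y}{\sigma_y^2}
       = r\,\frac{\sigma_x}{\sigma_y}
\]
\[
b_{yx}\, b_{xy} = r^2
\]
```

It follows that both regression coefficients have the same sign as r, and that r can be recovered from them as the square root of their product, taken with that sign.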
Ex-3 From the following data, calculate the two equations of the
lines of regression, where the correlation coefficient r = 0.5.
                    X      Y
Mean                60     67.5
Standard Deviation  15     13.5
Ans : Y = 0.45X + 40.5 and X = 0.556Y + 22.47
Ex-4 From the following data, calculate the two equations of the
lines of regression, where the correlation coefficient r = 0.52.
                    X      Y
Mean                508.4  23.7
Standard Deviation  36.8   4.6
Ans : Y = 0.065X – 9.35 and X = 4.16Y + 409.81
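When only the means, standard deviations and r are known, as in the two examples above, the regression lines follow directly from the formulas with r. The sketch below is illustrative: the function name `lines_from_summary` is hypothetical, and X and Y are taken as labelled in the two summary tables.

```python
def lines_from_summary(mean_x, mean_y, sd_x, sd_y, r):
    """Return ((a_yx, b_yx), (a_xy, b_xy)) for Y = a_yx + b_yx*X and X = a_xy + b_xy*Y."""
    b_yx = r * sd_y / sd_x  # slope of the line of Y on X
    b_xy = r * sd_x / sd_y  # slope of the line of X on Y
    return (mean_y - b_yx * mean_x, b_yx), (mean_x - b_xy * mean_y, b_xy)

# First example: means 60 and 67.5, standard deviations 15 and 13.5, r = 0.5.
ex3 = lines_from_summary(60, 67.5, 15, 13.5, 0.5)
# Second example: means 508.4 and 23.7, standard deviations 36.8 and 4.6, r = 0.52.
ex4 = lines_from_summary(508.4, 23.7, 36.8, 4.6, 0.52)

for label, ((a_yx, b_yx), (a_xy, b_xy)) in (("first", ex3), ("second", ex4)):
    print(f"{label}: Y = {b_yx:.3f} X + {a_yx:.2f},  X = {b_xy:.3f} Y + {a_xy:.2f}")
```

For the first example this gives Y = 0.45X + 40.5 and X = 0.556Y + 22.5 (22.47 if the slope is rounded to 0.556 before computing the intercept). For the second it gives Y = 0.065X - 9.35 and X = 4.16Y + 409.81.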
Difference between correlation and Regression
1. Describing Relationships
Correlation describes the degree to which two variables are related.
Regression gives a method for finding the relationship between two
variables.
2. Making Predictions
Correlation merely describes how well two variables are related. Analysing
the correlation between two variables does not improve the accuracy with
which the value of the dependent variable could be predicted for a given
value of the independent variable.
Regression allows us to predict values of the dependent variable for a given
value of the independent variable more accurately.
3. Dependence Between Variables
In analysing correlation, it does not matter which variable is dependent
and which is independent.
In analysing regression, it is necessary to distinguish between the dependent
and the independent variable.
Assignment
Q-1 Do as directed (Ex-1 to Ex-5: solve using Karl Pearson’s method)
Ex-1 Find the correlation coefficient between the serum cholesterol
levels and the diastolic blood pressure of 10 randomly selected
persons.
Person            1   2   3   4   5   6   7   8   9   10
Cholesterol (X)   307 259 341 317 274 416 267 320 274 336
Diastolic B.P (Y) 80  75  90  74  75  110 70  85  88  78
Ans. = 0.809
Ex-2 Find the correlation coefficient between Intelligence Ratio (I.R) and
Emotional Ratio (E.R) from the following data
Student 1   2   3   4   5   6  7   8  9  10
I.R (X) 105 104 102 101 100 99 98  96 93 92
E.R (Y) 101 103 100 98  95  96 104 92 97 94
Ans. = 0.5963
Assignment
Ex-3 Find the correlation coefficient from the following data
X 1100 1200 1300 1400 1500 1600 1700 1800 1900 2000
Y 0.30 0.29 0.29 0.25 0.24 0.24 0.24 0.29 0.18 0.15
Ans. = -0.79
Ex-4 Find the correlation coefficient from the following data
X 1 2 3 4 5 6 7 8 9 10
Y 10 12 16 28 25 36 41 49 40 50
Ans. = 0.9582
Ex-5 Find the correlation coefficient from the following data
X 78 89 97 69 59 79 68 61
Y 125 137 156 112 107 138 123 110
Ans. = 0.9495
Ex-6 : Find the correlation coefficient between the marks of
class test for the subjects maths and science given in the
form of a two way frequency table :
Ex-7 : Find the correlation coefficient between the marks of
annual exam for the subjects Account and statistics given in
the form of a two way frequency table :
Q-2 Two judges in a beauty contest rank the 12 contestants as follows:
X 1 2 3 4 5 6 7 8 9 10 11 12
Y 12 9 6 10 3 5 4 7 8 2 11 1
What degree of agreement is there between the judges?
Ans. = -0.454
Q-3 Nine students secured the following percentages of marks in
mathematics and chemistry:
Roll.No        1  2  3  4  5  6  7  8  9
Marks in Maths 78 36 98 25 75 82 90 62 65
Marks in Chem. 84 51 91 60 68 62 86 58 53
Find the rank correlation coefficient and comment on its value.
Ans. = 0.84
Assignment
Q-4 What is correlation ? How will you measure it?
Q-5 Define coefficient of correlation. Explain how you will interpret the
value of coefficient of correlation .
Q-6 What is Scatter diagram? To what extent does it help in finding
correlation between two variables ? Or Explain Scatter diagram
method.
Q-7 What is Rank correlation?
Q-8 Explain the following terms with an example .
(i)Positive and negative correlation
(ii) Scatter diagram
(iii) correlation coefficient
(iv) total correlation
(v) partial correlation
Q-9 Explain the term regression and state the difference between
correlation and regression.
Q-10 What are the regression coefficients? State their properties.
Q-11 Explain the terms Lines of regression and Regression equations.