Linear Correlation

Tawfikzahran 3,262 views 26 slides May 04, 2014

About This Presentation

Types, indications of correlations, types of variables, linear correlation, strength, direction and significance


Slide Content

05/04/14 Dr Tarek Amin 1
Investigating the Relationship
between Two or More Variables
(Correlation)
Professor Tarek Tawfik Amin
Public Health, Faculty of Medicine
Cairo University

The Relationship Between Variables
Variables can be categorized into two types when investigating
their relationship:
Dependent:
A dependent variable is explained or affected
by an independent variable.
Example: age and height.
Independent:
Two variables are independent if the pattern of
variation in the scores for one variable is not
related to or associated with variation in the scores
for the other variable.
Example: the level of education in Ecuador and infant
mortality in Mali.

Techniques Used to Analyze the Relationship between Two Variables
I- Tabular and graphical methods:
These present data in a way that reveals a possible relationship between two variables.
Examples:
- Bivariate table for categorical data (nominal/ordinal data)
- Scatter plot for interval/ratio data
II- Numerical methods:
Mathematical operations used to quantify, in a single number, the strength of a
relationship (measures of association). When both variables are measured at least
at the ordinal level, they also indicate the direction of the relationship.
Examples:
- Lambda, Cramér's V (nominal)
- Gamma, Somers' d, Kendall's tau-b/c (ordinal with few values)
- Spearman's rank-order correlation coefficient (ordinal scales with many values)
- Pearson's product-moment correlation (interval/ratio)
Together, these techniques are called bivariate descriptive statistics.

Correlation: indications
o Correlational techniques are used to study
relationships.
o They may be used in exploratory studies in
which the intent is to determine whether
relationships exist,
o and in hypothesis testing about a particular
relationship.

Correlation techniques are used to assess
the existence,
the direction,
and the strength
of association between variables.

Pearson Correlation (Numeric, interval/ratio)
The Pearson product-moment correlation coefficient (r for a sample, rho for the population)
is the usual method by which the relation between two
variables is quantified.
Type of data required:
Interval/ratio data, sometimes ordinal data.
At least two measures on each subject at the
interval/ratio level.
Assumptions:
The sample must be representative of the population.
The variables that are being correlated must be normally
distributed.
The relationship between variables must be LINEAR.

Directions of Correlations on Scatter Plot
Positive
Negative
No Correlation
Non-linear (curvilinear)

Relationships Measured with Correlation Coefficient
The correlation coefficient is the mean of the cross-products
of the z-scores:
$r = \frac{\sum z_X z_Y}{n}$
Where:
$z_X$ = the z-score of variable X
$z_Y$ = the z-score of variable Y
$n$ = number of observations

Tips
Because the means and standard deviations
of any two variables are generally
different, we cannot directly compare their
raw scores.
However, we can transform them from the
original raw values to z-scores with a
mean of 0 and an SD of 1.
The correlation is the mean of the
cross-products of the z-scores for each pair of values,
a measure of how much each pair
of observations (scores) varies together.
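
The standardization described on this slide can be written out explicitly. The following is a sketch in LaTeX notation, assuming the population standard deviation (dividing by n rather than n - 1), which is the convention under which the mean of the cross-products equals r. The symbols \bar{X}, \bar{Y}, s_X and s_Y (means and standard deviations of the two variables) are introduced here for illustration:

```latex
z_{X_i} = \frac{X_i - \bar{X}}{s_X}, \qquad
z_{Y_i} = \frac{Y_i - \bar{Y}}{s_Y}, \qquad
r = \frac{1}{n}\sum_{i=1}^{n} z_{X_i}\, z_{Y_i}
  = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{n\, s_X\, s_Y}
```

The last form shows that r is the covariance of X and Y rescaled by both standard deviations, which is why it has no units and stays between -1 and +1.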

Correlation Coefficient (r)
The correlation coefficient r allows us to
state mathematically the relationship that
exists between two variables. The correlation
coefficient may range from +1.00 through 0.00 to -1.00.
+1.00 indicates a perfect positive
relationship,
0.00 indicates no relationship,
and -1.00 indicates a perfect negative
relationship.

I-Strength of the Correlation Coefficient
How large should r be for it to be useful?
For decision making, r should be at least 0.95, while for studies of
human behavior an r of 0.5 is considered fair.
The strengths of r are as follows:
0.00 - 0.25 Little, if any
0.26 - 0.49 Low
0.50 - 0.69 Moderate
0.70 - 0.89 High
0.90 - 1.00 Very high
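
The ranges above can be turned into a small lookup. This is a minimal Python sketch, not part of the original slides; the function name `describe_strength` is hypothetical, and assigning values that fall between the listed ranges (for example 0.255) to the lower band is an assumption.

```python
def describe_strength(r):
    """Return the slide's verbal label for the size of r (the sign is ignored)."""
    if not -1.0 <= r <= 1.0:
        raise ValueError("r must lie between -1 and +1")
    size = abs(r)
    # Cut-offs follow the ranges listed on the slide. Values in the gaps
    # between ranges (e.g. 0.255) go to the lower band: an assumption.
    if size < 0.26:
        return "little if any"
    if size < 0.50:
        return "low"
    if size < 0.70:
        return "moderate"
    if size < 0.90:
        return "high"
    return "very high"


# The two r values reported in the figures at the end of the deck.
for value in (0.804, -0.166):
    print(value, "->", describe_strength(value))
```

Because only the absolute value is classified, the direction of the relationship has to be read separately from the sign of r.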

II-Significance of the Correlation
The level of statistical significance is greatly
affected by the sample size n.
If r is based on a sample of 1,000, there is a much
greater likelihood that it represents the r of the
population than if it were based on 10 subjects.

‘With large sample sizes, values of r that are described as
demonstrating (little if any) relationship can be
statistically significant.’
Statistical significance implies that r
is unlikely to have occurred by chance alone; that is, the
relationship in the population differs from zero.
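
The effect of sample size can be illustrated numerically. The sketch below is not from the slides: it uses the Fisher z-transformation with a normal approximation (only approximate for small n) to estimate a two-tailed p-value for the null hypothesis that the population correlation is zero. The function name `approx_p_value` and the value r = 0.10 are chosen here purely for illustration.

```python
from math import atanh, sqrt
from statistics import NormalDist


def approx_p_value(r, n):
    """Approximate two-tailed p-value for H0: population correlation = 0.

    Uses the Fisher z-transformation and a normal approximation, so the
    result is rough when n is small.
    """
    if n <= 3:
        raise ValueError("n must be greater than 3")
    if not -1.0 < r < 1.0:
        raise ValueError("r must lie strictly between -1 and +1")
    z = atanh(r) * sqrt(n - 3)  # approximately standard normal under H0
    return 2.0 * (1.0 - NormalDist().cdf(abs(z)))


# Same weak correlation (illustrative value), two sample sizes from the slide.
for n in (10, 1000):
    print(f"r = 0.10, n = {n}: p is about {approx_p_value(0.10, n):.3f}")
```

With 10 subjects the approximate p-value is far above 0.05, while with 1,000 subjects the same r falls below 0.01, even though the strength of the relationship is unchanged.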

III- Direction of correlation
- The correlation coefficient also tells us the type
of relation that exists; that is, whether it is
positive or negative.
- The relationship between job satisfaction and job
turnover has been shown to be negative; an
inverse relationship exists between them.
When one variable increases, the other decreases.
- Those with higher grades have lower dropout rates, that is, higher retention
(a positive relationship between grades and retention).
An increase in the score of one variable is accompanied by
an increase in the other.

Relationships Measured by Correlation
Coefficients:
When using the formula with z-scores, r is the
average of the cross-products of the z-scores.

$r = \frac{\sum z_X z_Y}{n}$
Five subjects took a quiz X, on which the scores ranged from
6 to 10, and an examination Y, on which the scores ranged from
82 to 98.
Calculate r and determine the pattern of correlation.

Formula for calculating the correlation coefficient r:
$r = \frac{\sum z_X z_Y}{n}$

A perfect positive relationship between two variables.
Subject  X (quiz)  Y (examination)  zX     zY     zX*zY
1        6         82               -1.42  -1.42  2.0
2        7         86               -0.71  -0.71  0.5
3        8         90                0.00   0.00  0.0
4        9         94                0.71   0.71  0.5
5        10        98                1.42   1.42  2.0
Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 5.00

$r = \frac{\sum z_X z_Y}{n} = \frac{5.00}{5} = +1.00$
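
The worked example above can be reproduced step by step. This is a minimal Python sketch using only the standard library; it assumes the population standard deviation (`statistics.pstdev`), which matches the SD values of 1.41 and 5.66 shown in the table. The helper name `z_scores` is hypothetical.

```python
from statistics import mean, pstdev

quiz = [6, 7, 8, 9, 10]       # X, from the table above
exam = [82, 86, 90, 94, 98]   # Y, from the table above


def z_scores(values):
    """Standardize values using the population SD (divide by n)."""
    m, sd = mean(values), pstdev(values)
    return [(v - m) / sd for v in values]


zx, zy = z_scores(quiz), z_scores(exam)
products = [a * b for a, b in zip(zx, zy)]
r = sum(products) / len(products)  # mean of the cross-products

for a, b, p in zip(zx, zy, products):
    print(f"{a:6.2f} {b:6.2f} {p:6.2f}")
print(f"r = {r:.2f}")
```

Computed without intermediate rounding, the extreme z-scores come out as about 1.41 in absolute value; the 1.42 in the table appears to come from dividing by the already rounded SD of 1.41. The cross-products and r are unaffected at the precision shown.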

[Figure: scatter plot titled "Positive Correlation". x-axis: X score (0 to 15); y-axis: Y score (80 to 100).]

Perfect negative relationship
Subject  X   Y   zX     zY     zX*zY
1        6   98  -1.42   1.42  -2.0
2        7   94  -0.71   0.71  -0.5
3        8   90   0.00   0.00   0.0
4        9   86   0.71  -0.71  -0.5
5        10  82   1.42  -1.42  -2.0
Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = -5.00
$r = \frac{\sum z_X z_Y}{n} = \frac{-5.00}{5} = -1.00$

[Figure: scatter plot titled "Negative Correlation". x-axis: X score (0 to 15); y-axis: Y score (80 to 100).]

No relationship
Subject  X   Y   zX     zY     zX*zY
1        6   94  -1.42   0.71  -1.0
2        7   82  -0.71  -1.42   1.0
3        8   90   0.00   0.00   0.0
4        9   98   0.71   1.42   1.0
5        10  86   1.42  -0.71  -1.0
Mean X = 8, SD = 1.41; mean Y = 90, SD = 5.66; ∑zXzY = 0.00
$r = \frac{\sum z_X z_Y}{n} = \frac{0.00}{5} = 0.00$
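
The three worked cases differ only in how the same five examination scores are paired with the quiz scores. The sketch below checks all three with the deviation-score form of Pearson's r, which is algebraically equivalent to the mean of the z-score cross-products; the function name `pearson_r` and the dictionary labels are hypothetical, not taken from the slides.

```python
from math import sqrt


def pearson_r(xs, ys):
    """Pearson r from deviation scores (no explicit z-scores needed)."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two equal-length samples of at least 2 values")
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    dx = [x - mx for x in xs]
    dy = [y - my for y in ys]
    sxy = sum(a * b for a, b in zip(dx, dy))  # sum of cross-products
    sxx = sum(a * a for a in dx)
    syy = sum(b * b for b in dy)
    if sxx == 0 or syy == 0:
        raise ValueError("r is undefined when a variable has no variation")
    return sxy / sqrt(sxx * syy)


quiz = [6, 7, 8, 9, 10]
exam_orders = {
    "perfect positive": [82, 86, 90, 94, 98],
    "perfect negative": [98, 94, 90, 86, 82],
    "no relationship": [94, 82, 90, 98, 86],
}
results = {label: pearson_r(quiz, ys) for label, ys in exam_orders.items()}
for label, r in results.items():
    print(f"{label}: r = {r:+.2f}")
```

Since the means and standard deviations are identical in all three cases, the change in r comes entirely from how the scores are paired.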

[Figure: scatter plot titled "No Correlation". x-axis: X score (0 to 15); y-axis: Y score (80 to 100).]

The following table is SPSS output describing the correlations between age, education in years,
smoking history, satisfaction with current weight, and overall state of health for a randomly
selected sample of subjects.
Variable                          Correlated with                   Pearson Correlation  Sig. (2-tailed)  N
Subject's age                     Subject's age                     1.000                .                434
Education in years                Subject's age                     .022                 .649             419
Smoking history                   Education in years                -.108*               .026             423
Smoking history                   Subject's age                     .143**               .003             432
Satisfaction with current weight  Smoking history                   -.009                .849             440
Satisfaction with current weight  Education in years                .033                 .493             424
Satisfaction with current weight  Subject's age                     -.077                .109             432
Overall state of health           Overall state of health           1.000                .                444
Overall state of health           Satisfaction with current weight  .370*                .000             443
Overall state of health           Smoking history                   -.200*               .000             441
Overall state of health           Education in years                .149**               .000             425
Overall state of health           Subject's age                     -.126**              .009             433

* Correlation is significant at the 0.05 level (2-tailed).
** Correlation is significant at the 0.01 level (2-tailed).

Figure (1): Insulin resistance (HOMA-IR) in relation to
serum ferritin level among cases and controls.
[Scatter plot. x-axis: Ferritin (log), 1.8 to 2.8; y-axis: HOMA-IR, 2 to 8; series: Controls, Sickle, Total Population. r = 0.804, P = 0.0001]

Figure (2): 1,25 (OH) vitamin D in relation to body mass
index among obese and lean controls.
[Scatter plot. x-axis: Body mass index, 10 to 50; y-axis: Vitamin D level, 0 to 100; series: Lean, Obese, Total Population. r = -0.166, P = 0.036]

Thank you