Statistics and Probability Correlation and Regression
MathewBuera
41 slides
Feb 27, 2025
Larson & Farber, Elementary Statistics: Picturing the World, 3e 1
Correlation and Linear Regression
Quarter 4
Week 5
Describe the relationship between the two pictures
Correlation
A correlation is a relationship between two variables.
The data can be represented by the ordered pairs (x, y)
where x is the independent variable, and y is the
dependent variable.
A scatter plot can be used to
determine whether a linear
(straight line) correlation exists
between two variables.
Example:
x    1    2    3    4    5
y   –4   –2   –1    0    2
[Figure: scatter plot of the example data, with x on the horizontal axis and y on the vertical axis]
Linear Correlation
[Figure: four scatter plots of y against x, one for each type of correlation]
Negative Linear Correlation: as x increases, y tends to decrease.
No Correlation
Positive Linear Correlation: as x increases, y tends to increase.
Nonlinear Correlation
Correlation Coefficient
The correlation coefficient is a measure of the strength
and the direction of a linear relationship between two
variables. The symbol r represents the sample
correlation coefficient. The formula for r is
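$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$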
The range of the correlation coefficient is –1 to 1. If x and
y have a strong positive linear correlation, r is close to 1.
If x and y have a strong negative linear correlation, r is
close to –1. If there is no linear correlation or only a weak
linear correlation, r is close to 0.
Interpretation Guideline
Linear Correlation
[Figure: four scatter plots of y against x]
r = –0.91: Very strong negative correlation
r = 0.88: Very strong positive correlation
r = 0.42: Moderately positive correlation
r = 0.07: Very weak/negligible correlation
Calculating a Correlation Coefficient
1. Find the sum of the x-values: Σx
2. Find the sum of the y-values: Σy
3. Multiply each x-value by its corresponding y-value and find the sum: Σxy
4. Square each x-value and find the sum: Σx²
5. Square each y-value and find the sum: Σy²
6. Use these five sums to calculate the correlation coefficient:
$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} $$
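The six steps translate directly into code. Below is a minimal Python sketch, assuming the paired data are given as two equal-length lists; `five_sums` and `correlation_coefficient` are hypothetical helper names, not part of the lesson. It uses the data from the worked example on the next slide.

```python
from math import sqrt

def five_sums(xs, ys):
    """Return (n, Σx, Σy, Σxy, Σx², Σy²) for paired data."""
    if len(xs) != len(ys):
        raise ValueError("x and y must have the same length")
    n = len(xs)
    sum_x = sum(xs)                                # step 1
    sum_y = sum(ys)                                # step 2
    sum_xy = sum(x * y for x, y in zip(xs, ys))    # step 3
    sum_x2 = sum(x * x for x in xs)                # step 4
    sum_y2 = sum(y * y for y in ys)                # step 5
    return n, sum_x, sum_y, sum_xy, sum_x2, sum_y2

def correlation_coefficient(xs, ys):
    """Step 6: sample correlation coefficient r from the five sums.

    Raises ZeroDivisionError if all x-values (or all y-values) are equal,
    because r is undefined in that case.
    """
    n, sx, sy, sxy, sx2, sy2 = five_sums(xs, ys)
    numerator = n * sxy - sx * sy
    denominator = sqrt(n * sx2 - sx ** 2) * sqrt(n * sy2 - sy ** 2)
    return numerator / denominator

x = [1, 2, 3, 4, 5]
y = [-3, -1, 0, 1, 2]
print(five_sums(x, y))                          # (5, 15, -1, 9, 55, 15)
print(round(correlation_coefficient(x, y), 3))  # 0.986
```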
Correlation Coefficient
Example:
Calculate the correlation coefficient r for the following data.
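x    y    xy   x²   y²
1   –3   –3    1    9
2   –1   –2    4    1
3    0    0    9    0
4    1    4   16    1
5    2   10   25    4
Σx = 15, Σy = –1, Σxy = 9, Σx² = 55, Σy² = 15
$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{5(9) - (15)(-1)}{\sqrt{5(55) - 15^2}\,\sqrt{5(15) - (-1)^2}} = \frac{60}{\sqrt{50}\,\sqrt{74}} \approx 0.986 $$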
There is a very strong
positive linear correlation
between x and y.
Correlation Coefficient
Example:
The following data represent the number of hours that 12
different students watched television during the weekend
and the score each student earned on a test the following
Monday.
Hours, x:        0   1   2   3   3   5   5   5   6   7   7  10
Test score, y:  96  85  82  74  95  68  76  84  58  65  75  50
a.) Display the scatter plot.
b.) Calculate the correlation coefficient r.
Continued.
Correlation Coefficient
Example continued:
[Figure: scatter plot of test score, y, against hours watching TV, x]
Continued.
Correlation Coefficient
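Hours, x:        0     1     2     3     3     5     5     5     6     7     7    10
Test score, y:  96    85    82    74    95    68    76    84    58    65    75    50
xy:              0    85   164   222   285   340   380   420   348   455   525   500
x²:              0     1     4     9     9    25    25    25    36    49    49   100
y²:           9216  7225  6724  5476  9025  4624  5776  7056  3364  4225  5625  2500
Σx = 54, Σy = 908, Σxy = 3724, Σx² = 332, Σy² = 70836
Example continued:
$$ r = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{\sqrt{n\sum x^2 - \left(\sum x\right)^2}\,\sqrt{n\sum y^2 - \left(\sum y\right)^2}} = \frac{12(3724) - (54)(908)}{\sqrt{12(332) - 54^2}\,\sqrt{12(70836) - 908^2}} = \frac{-4344}{\sqrt{1068}\,\sqrt{25568}} \approx -0.831 $$
There is a very strong negative linear correlation.
As the number of hours spent watching TV increases,
the test scores tend to decrease.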
Linear Regression
Regression Line
A regression line, also called a line of best fit, is the line for
which the sum of the squares of the residuals is a minimum.
The Equation of a Regression Line
The equation of a regression line for an independent variable
x and a dependent variable y is
ŷ = mx + b
where ŷ is the predicted y-value for a given x-value. The
slope m and y-intercept b are given by
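$$ m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} \quad\text{and}\quad b = \bar{y} - m\bar{x} = \frac{\sum y}{n} - m\,\frac{\sum x}{n} $$
where ȳ is the mean of the y-values and x̄ is the mean of the
x-values. The regression line always passes through (x̄, ȳ).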
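The slope and intercept formulas can be sketched in Python as follows, assuming two equal-length lists of data; `regression_line` is a hypothetical helper name. The sketch uses the data from the first regression example and also confirms that the fitted line passes through (x̄, ȳ).

```python
def regression_line(xs, ys):
    """Least-squares slope m and intercept b for y-hat = m*x + b."""
    n = len(xs)
    sum_x, sum_y = sum(xs), sum(ys)
    sum_xy = sum(x * y for x, y in zip(xs, ys))
    sum_x2 = sum(x * x for x in xs)
    m = (n * sum_xy - sum_x * sum_y) / (n * sum_x2 - sum_x ** 2)
    b = sum_y / n - m * (sum_x / n)  # b = y-bar - m * x-bar
    return m, b

x = [1, 2, 3, 4, 5]
y = [-3, -1, 0, 1, 2]
m, b = regression_line(x, y)
print(round(m, 2), round(b, 2))  # 1.2 -3.8

# The line passes through (x-bar, y-bar), up to floating-point rounding.
x_bar, y_bar = sum(x) / len(x), sum(y) / len(y)
print(abs((m * x_bar + b) - y_bar) < 1e-12)  # True
```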
Regression Line
Example:
Find the equation of the regression line.
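x    y    xy   x²   y²
1   –3   –3    1    9
2   –1   –2    4    1
3    0    0    9    0
4    1    4   16    1
5    2   10   25    4
Σx = 15, Σy = –1, Σxy = 9, Σx² = 55, Σy² = 15
$$ m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{5(9) - (15)(-1)}{5(55) - 15^2} = \frac{60}{50} = 1.2 $$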
Continued.
Regression Line
Example continued:
$$ b = \bar{y} - m\bar{x} = \frac{-1}{5} - (1.2)\left(\frac{15}{5}\right) = -3.8 $$
The equation of the regression line is
ŷ = 1.2x – 3.8.
[Figure: the five data points and the regression line ŷ = 1.2x – 3.8, which passes through (x̄, ȳ) = (3, –1/5)]
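The definition says the regression line minimizes the sum of the squared residuals. As a numeric illustration (not a proof), this Python sketch computes that sum for ŷ = 1.2x – 3.8 and for a few nearby lines; `sse` is a hypothetical helper name.

```python
def sse(xs, ys, m, b):
    """Sum of squared residuals (y - y_hat)^2 for the line y_hat = m*x + b."""
    return sum((y - (m * x + b)) ** 2 for x, y in zip(xs, ys))

x = [1, 2, 3, 4, 5]
y = [-3, -1, 0, 1, 2]

best = sse(x, y, 1.2, -3.8)
print(round(best, 2))  # 0.4

# Nudge the slope or the intercept by 0.1 in each direction.
for dm, db in [(0.1, 0.0), (-0.1, 0.0), (0.0, 0.1), (0.0, -0.1)]:
    nudged = sse(x, y, 1.2 + dm, -3.8 + db)
    print(dm, db, round(nudged, 2), nudged > best)
# Slope nudges give 0.95 and intercept nudges give 0.45; all exceed 0.4.
```

Every nudged line has a larger sum of squared residuals, which is what the least-squares definition predicts.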
Regression Line
Example:
The following data represent the number of hours that 12
different students watched television during the weekend
and the score each student earned on a test the following
Monday.
Hours, x:        0   1   2   3   3   5   5   5   6   7   7  10
Test score, y:  96  85  82  74  95  68  76  84  58  65  75  50
Σx = 54, Σy = 908, Σxy = 3724, Σx² = 332, Σy² = 70836
a.) Find the equation of the regression line.
b.) Use the equation to find the expected test score
for a student who watches 9 hours of TV.
Regression Line
Example continued:
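$$ m = \frac{n\sum xy - \left(\sum x\right)\left(\sum y\right)}{n\sum x^2 - \left(\sum x\right)^2} = \frac{12(3724) - (54)(908)}{12(332) - 54^2} = \frac{-4344}{1068} \approx -4.067 $$
$$ b = \bar{y} - m\bar{x} = \frac{908}{12} - (-4.067)\left(\frac{54}{12}\right) \approx 93.97 $$
ŷ = –4.07x + 93.97
[Figure: scatter plot of test score, y, against hours watching TV, x, with the regression line passing through (x̄, ȳ) = (54/12, 908/12) ≈ (4.5, 75.7)]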
Continued.
Regression Line
Example continued:
Using the equation ŷ = –4.07x + 93.97, we can predict
the test score for a student who watches 9 hours of TV.
ŷ = –4.07x + 93.97
= –4.07(9) + 93.97
= 57.34
A student who watches 9 hours of TV over the weekend
can expect a score of about 57.34 on Monday's test.
Predicting y-Values
After finding the equation of the multiple regression line, you
can use the equation to predict y-values over the range of the data.
Example:
The following multiple regression equation can be used to predict
the annual U.S. rice yield (in pounds):
ŷ = 859 + 5.76x₁ + 3.82x₂
where x₁ is the number of acres planted (in thousands) and x₂ is
the number of acres harvested (in thousands).
(Source: U.S. National Agricultural Statistics Service)
a.) Predict the annual rice yield when x₁ = 2758 and x₂ = 2714.
b.) Predict the annual rice yield when x₁ = 3581 and x₂ = 3021.
Continued.
Predicting y-Values
Example continued:
a.) ŷ = 859 + 5.76x₁ + 3.82x₂
= 859 + 5.76(2758) + 3.82(2714)
= 27,112.56
The predicted annual rice yield is 27,112.56 pounds.
b.) ŷ = 859 + 5.76x₁ + 3.82x₂
= 859 + 5.76(3581) + 3.82(3021)
= 33,025.78
The predicted annual rice yield is 33,025.78 pounds.
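Evaluating a multiple regression equation is just substitution. Here is a minimal Python sketch in which `predict_rice_yield` is a hypothetical name and the coefficients are the ones given on the slide.

```python
def predict_rice_yield(acres_planted, acres_harvested):
    """y_hat = 859 + 5.76*x1 + 3.82*x2, with x1 and x2 in thousands of acres."""
    return 859 + 5.76 * acres_planted + 3.82 * acres_harvested

print(round(predict_rice_yield(2758, 2714), 2))  # 27112.56
print(round(predict_rice_yield(3581, 3021), 2))  # 33025.78
```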
Assessment
Directions:
a. Calculate the correlation (r) between the two
variables.
b. Write a brief interpretation of this correlation,
including the strength, direction, and an
explanation of the effect.
c. Find the equation of the regression line.
1.
Age (x):            43   21   25   42   57   33   28
Glucose Level (y):  99   65   79   75   87   82   70

2.
Age (x):       20   21   24   45   46   54   60
Weight (y):   123  132  145  155  160  162  150
Problem Solving
1. Alice and Leo did a study on feelings of stress
and life satisfaction during quarantine.
Participants completed a measure of how stressed
they were feeling (on a 1 to 30 scale) and a measure
of how satisfied they felt with their lives (on a
1 to 10 scale). The table below indicates the
participants' scores.
a. Calculate the correlation (r) between stress and
life satisfaction.
b. Write a brief interpretation of this correlation,
including the strength, direction, and an
explanation of the effect.
c. Can you say that being more stressed causes a
lower level of life satisfaction? Why or why
not?
2. In a biology experiment, a number of cultures
from Brgy. Aplaya Lake were grown in the
laboratory of ANHS. The numbers of bacteria, in
millions, and their ages, in days, are given below.
Age (x):               1    2    3    4    5    6    7    8
No. of bacteria (y):  34  106  135  181  192  231  268  300
a. Calculate the correlation (r) and write a brief
interpretation of this correlation, including the
strength, direction, and an explanation of the effect.
b. Some later readings were taken and are given below.
Age (x):               13   14   15
No. of bacteria (y):  400  403  405
Add these points to your graph and describe what
they show.
3. A metal rod was gradually heated, and its length,
L, was measured at various temperatures, T.
Temperature, T (°C):   15      20      25      30     35      40
Length, L (cm):       100   103.8   106.1   112   116.1   119.9
a. Calculate the correlation (r) and write a brief
interpretation of this correlation, including the
strength, direction, and an explanation of the effect.
b. Do you suspect a major inaccuracy in any of the
recorded values? If so, discard any you consider
untrustworthy and find the new value of r.
What I Can Do
Answer the following questions:
1. What are the three types of correlation?
2. How will we know if we have a perfect
correlation?
3. Can we consider a correlation of 0.02 significant?
What I Have Learned
I have learned that …
I understand that …
I realized that …
Reflection
Answer the following prompts about your personal
insights from the lesson:
How does your current situation compare with your
situation last year, before the pandemic?
Is vaccination enough to stop the spread of the
COVID-19 virus? Why or why not?