Everything that you always wanted to know about Linear Regression for a Senior level college class in finance but were too afraid to ask

lucfaucheux 0 views 99 slides Oct 14, 2025


Slide Content

Investment Portfolio
Management
FIN 421
Luc Faucheux, PhD
Fall 2025

Straight Lines
Part Un
Linear Regression

Straight lines are everywhere
3
Luc Faucheux 2025

Straight Lines - I
“God does not build in straight lines” Prometheus.. (the movie not the Greek Titan god)

Straight Lines - II

Straight Lines - III

Straight Lines - IV

Straight Lines - V

Straight Lines - VI

Straight Lines - VII
[Figure: straight-line diagram with equation labels; the annotations are not recoverable from the extracted text]

Straight Lines - VIII
•Sampling – Measuring – OLS – Linear regression
•Predicting – Modeling – Security Market Line – CAPM
•Evaluating Performance – alpha and beta
•Explaining away – Factor Analysis – PCA
•Explaining away – Fama-French factor analysis
•In this deck we will barely touch on Linear Regression (most of it within the OLS, Ordinary Least Squares, method).
•Just that will take us almost 100 slides
•Parts Deux, Trois, Quatre and more, about the CAPM, the SML, the alpha and beta of portfolios, factor analysis, PCA and the Fama-French framework, will be in another deck, or other decks depending on the number of slides.

Measuring – Sampling – Linear regression

Measuring – Sampling – Linear regression - I
•With one variable: sampling a distribution of returns $\{X_i\}$ with N observations
•A reasonable approximation is that the observations are iid
•Independent: $\mathrm{Cov}(X_i, X_j) = 0$ except when $i = j$
•Identically distributed: $E[X_i] = \mu$ and $V[X_i] = \sigma^2$
•The Mean Estimator (Sample Mean) is: $\hat{\mu} = \frac{1}{N} \sum_i X_i$
•$E[\hat{\mu}] = \mu$
•$V[\hat{\mu}] = \sigma^2_{\hat{\mu}} = \frac{\sigma^2}{N}$
•$\hat{\mu}$ is BLUE (Best Linear Unbiased Estimator)
•$\hat{\mu} \xrightarrow{d} \mathcal{N}(\mu, \frac{\sigma^2}{N})$
•Standard Deviation (population): $\sigma = \sigma[X]$
•Standard Error (sample): $\frac{\sigma}{\sqrt{N}} = \sigma[\hat{\mu}]$

Measuring – Sampling – Linear regression - II
•With one variable: sampling a distribution of returns $\{X_i\}$ with N observations
•Standard Deviation (population): $\sigma = \sigma[X]$
•Standard Error (sample): $\frac{\sigma}{\sqrt{N}} = \sigma[\hat{\mu}]$
•Sample Variance Estimator: $\hat{\sigma}^2 = \frac{1}{N} \sum_i (X_i - \hat{\mu})^2$
•$E[\hat{\sigma}^2] = \frac{N-1}{N} \sigma^2$
•$V[\hat{\sigma}^2] = \sigma^2_{\hat{\sigma}^2} \approx \frac{\mu_4 - \sigma^4}{N}$ for large N, where $\mu_4$ is the fourth central moment of $X$

Measuring – Sampling – Linear regression - III
•HOW DOES IT WORK?
•Get a sample with N observations $\{x_i\}$
•Compute the sample mean $\hat{\mu} = \frac{1}{N} \sum_i x_i$
•$E[\hat{\mu}] = \mu$
•The sample mean $\hat{\mu}$ is your estimate of the “true” population mean $\mu$
•That is an estimate, which might not be the actual value
•Compute the sample variance $\hat{\sigma}^2 = \frac{1}{N} \sum_i (x_i - \hat{\mu})^2$

Measuring – Sampling – Linear regression - IV
•HOW DOES IT WORK?
•Compute the sample variance $\hat{\sigma}^2 = \frac{1}{N} \sum_i (x_i - \hat{\mu})^2$
•$E[\hat{\sigma}^2] = \frac{N-1}{N} \sigma^2$
•$\sigma^2 = E[\hat{\sigma}^2] \cdot \frac{N}{N-1}$
•So the value $\hat{\sigma}^2 \cdot \frac{N}{N-1}$ is your estimate of the “true” population variance $\sigma^2$
•That is an estimate, which might not be the actual value
•But hey, that is the best that you can do…
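
The recipe above can be sketched in a few lines of Python. This is only a minimal illustration: the function name `sample_moments` and the four sample returns are made up for the example and do not come from the deck.

```python
import math

def sample_moments(xs):
    """Return (mean, 1/N variance, N/(N-1)-corrected variance, standard error)."""
    n = len(xs)
    mu_hat = sum(xs) / n                                   # sample mean
    var_biased = sum((x - mu_hat) ** 2 for x in xs) / n    # divides by N
    var_corrected = var_biased * n / (n - 1)               # rescaled by N/(N-1)
    std_error = math.sqrt(var_corrected / n)               # estimate of sigma/sqrt(N)
    return mu_hat, var_biased, var_corrected, std_error

# Hypothetical sample of four returns (illustrative numbers only)
returns = [0.01, 0.02, 0.03, 0.04]
mu_hat, var_biased, var_corrected, std_error = sample_moments(returns)
print(mu_hat, var_biased, var_corrected, std_error)
```

The corrected variance is always a bit larger than the 1/N version, and the gap shrinks as N grows, which is why the distinction is often ignored for large samples.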

Measuring – Sampling – Linear regression - V
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot with axes X and Y showing the first observation $(x_1, y_1)$, with $x_1$ and $y_1$ marked on the axes]

Measuring – Sampling – Linear regression - VI
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot with axes X and Y showing the observations $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$, with $x_1$ and $y_1$ marked on the axes]

Measuring – Sampling – Linear regression - VII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot with axes X and Y showing the observations $(x_1, y_1)$, $(x_2, y_2)$ and $(x_3, y_3)$]

Measuring – Sampling – Linear regression - VIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot with axes X and Y showing the observations $(x_1, y_1)$ through $(x_7, y_7)$]

Measuring – Sampling – Linear regression - IX
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot of the observations, axes X and Y]

Measuring – Sampling – Linear regression - X
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot of the observations, axes X and Y]

Measuring – Sampling – Linear regression - XI
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot of the observations, axes X and Y, with a generic observation $(x_i, y_i)$ and the caption “The truth ?”]

Measuring – Sampling – Linear regression - XII
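•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot of the observations, axes X and Y, with a generic observation $(x_i, y_i)$ and the caption “The truth ?” next to the line $Y = \alpha + \beta X + \mathcal{E}$]

Measuring – Sampling – Linear regression - XIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot of the observations, axes X and Y, with a generic observation $(x_i, y_i)$ and the caption “The truth ?” next to the line $Y = \alpha + \beta X + \mathcal{E}$]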

Measuring – Sampling – Linear regression - XIV
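•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot of the observations, axes X and Y, with a generic observation $(x_i, y_i)$ and the line $Y = \alpha + \beta X + \mathcal{E}$]
•So really a linear regression has not TWO but THREE variables: $X$, $Y$, AND $\mathcal{E}$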

Measuring – Sampling – Linear regression - XVI
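•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
[Figure: scatter plot of the observations, axes X and Y, with a generic observation $(x_i, y_i)$]
•So really a linear regression has not TWO but THREE variables: $X$, $Y$, AND $\mathcal{E}$
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$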

Measuring – Sampling – Linear regression - XVII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•So really a linear regression has not TWO but THREE variables: $X$, $Y$, AND $\mathcal{E}$
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•So all THREE of those variables have their own distributions, with of course a mean and a standard deviation, among many other things….
•If you think about it, it makes a lot of sense: there were TWO variables $X$ and $Y$, but by trying to find a linear relationship between those TWO we are obviously introducing a THIRD one, $\mathcal{E}$. In many ways you can say that we are replacing $Y$ by $\mathcal{E}$ by imposing that $Y = \alpha + \beta X + \mathcal{E}$

Measuring – Sampling – Linear regression - XVIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•So for a given $(X_i, Y_i)$ we will calculate the residual $\mathcal{E}_i$ as:
•$Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•$\mathcal{E}_i = Y_i - \alpha - \beta X_i$
•The goal is to find the intercept $\alpha$ and the slope $\beta$ that “best” describe the relationship between $X$ and $Y$
•“Best” is being used loosely here, but can be made mathematically precise, in particular based on the specific method being used to compute the intercept and the slope (or, more exactly, the estimates of the intercept and the slope)
•This should be somewhat familiar to us now, having gone through the sample moments.

Measuring – Sampling – Linear regression - XIX
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•Just like with one variable, where we defined the sample mean estimator and the sample variance estimator, we will also define some estimators that are ALSO themselves stochastic variables, hence having their own distribution, mean, and standard deviation for example
•Just like with one variable, we will have to make some “reasonable” assumptions in order to be able to say anything meaningful at all about solving the problem at hand

Measuring – Sampling – Linear regression - XX
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•“Reasonable” assumptions
•There are a lot of different ways to approach the problem
•This is NOT a course in statistics or quantitative analysis, as much as I would like to make it one
•We are dealing primarily here with OLS (Ordinary Least Squares).
•Suffice it to say that it is a powerful and well-oiled method to get what we want, in most cases that you will encounter in finance
•When it gets more complicated, call up your favorite statistician….

Measuring – Sampling – Linear regression - XXI
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•“Reasonable” assumptions:
•$E[\mathcal{E}] = 0$
•$E[\mathcal{E}|X] = 0$
•$\mathrm{Cov}(X, \mathcal{E}) = 0$
•The pairs $(X_i, Y_i)$ are iid (independent and identically distributed) draws from their joint distribution
•$V[X] > 0$
•$V[\mathcal{E}|X] = \sigma^2 = \text{constant}$

Measuring – Sampling – Linear regression - XXIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Under those “reasonable” assumptions, one can show that:
•If we define the sample slope estimator as:
•$\hat{\beta} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}} = \hat{\rho}_{XY} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X}$
•With:
•$\hat{\sigma}_{XY} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y)$
•$\hat{\mu}_X = \frac{1}{N} \sum_i X_i$
•$\hat{\mu}_Y = \frac{1}{N} \sum_i Y_i$
•$\hat{\sigma}_{XX} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(X_i - \hat{\mu}_X)$
•And the sample intercept estimator as:
•$\hat{\alpha} = \hat{\mu}_Y - \hat{\beta} \hat{\mu}_X$
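
As a minimal sketch of those formulas, here is the slope and intercept computed directly from the sample moments. The function name `ols_fit` and the four data points are hypothetical, chosen only so the arithmetic is easy to follow.

```python
import math

def ols_fit(xs, ys):
    """Slope and intercept from sample moments (all sums divided by N)."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    s_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    s_yy = sum((y - mu_y) ** 2 for y in ys) / n
    beta_hat = s_xy / s_xx                    # sigma_XY / sigma_XX
    alpha_hat = mu_y - beta_hat * mu_x        # mu_Y - beta * mu_X
    rho_hat = s_xy / math.sqrt(s_xx * s_yy)   # sample correlation
    return alpha_hat, beta_hat, rho_hat

# Hypothetical observations (illustrative numbers only)
xs = [0.0, 1.0, 2.0, 3.0]
ys = [1.1, 2.9, 5.2, 6.8]
alpha_hat, beta_hat, rho_hat = ols_fit(xs, ys)
print(alpha_hat, beta_hat, rho_hat)
```

The second form of the slope, correlation times the ratio of standard deviations, gives the same number because the square roots of the variances cancel against the correlation's denominator.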

Measuring – Sampling – Linear regression - XXIV
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•Under those “reasonable” assumptions, one can show that:
•$E[\hat{\alpha}] = \alpha$
•$E[\hat{\beta}] = \beta$
•$\hat{\beta} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}} = \hat{\rho}_{XY} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X}$
•$\hat{\alpha} = \hat{\mu}_Y - \hat{\beta} \hat{\mu}_X$

Measuring – Sampling – Linear regression - XXV
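•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•Under those “reasonable” assumptions, one can show that:
•$E[\hat{\alpha}] = \alpha$
•$E[\hat{\beta}] = \beta$
•$\hat{\beta} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}} = \hat{\rho}_{XY} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X}$ and $\sqrt{N} (\hat{\beta} - \beta) \xrightarrow{d} \mathcal{N}(0, \frac{\sigma^2}{\sigma_{XX}})$
•$\hat{\alpha} = \hat{\mu}_Y - \hat{\beta} \hat{\mu}_X$ and $\sqrt{N} (\hat{\alpha} - \alpha) \xrightarrow{d} \mathcal{N}(0, \frac{\sigma^2 (\sigma_{XX} + \mu_X^2)}{\sigma_{XX}})$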
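
One way to build some intuition for these statements is a small simulation. The sketch below is an illustration under assumed settings, not part of the deck: the true parameters (alpha 0.01, beta 2, residual standard deviation 0.5), the standard normal X, and the helper names are all hypothetical choices. It draws many samples, fits each one, and compares the mean and variance of the fitted slopes with the large-sample formula.

```python
import random

def ols_fit(xs, ys):
    """Return (alpha_hat, beta_hat) from sample moments."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    s_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    beta_hat = s_xy / s_xx
    return mu_y - beta_hat * mu_x, beta_hat

def simulate_beta_hats(alpha, beta, sigma, n_obs, n_trials, seed=0):
    """Fit the slope on n_trials independent samples of size n_obs."""
    rng = random.Random(seed)
    betas = []
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]   # sigma_XX = 1 by design
        ys = [alpha + beta * x + rng.gauss(0.0, sigma) for x in xs]
        betas.append(ols_fit(xs, ys)[1])
    return betas

betas = simulate_beta_hats(alpha=0.01, beta=2.0, sigma=0.5, n_obs=50, n_trials=2000)
mean_beta = sum(betas) / len(betas)
var_beta = sum((b - mean_beta) ** 2 for b in betas) / len(betas)
theory = 0.5 ** 2 / (50 * 1.0)   # sigma^2 / (N * sigma_XX)
print(mean_beta, var_beta, theory)
```

The average fitted slope should sit close to the true slope, and the simulated variance should be in the same ballpark as the formula; with only 50 observations per sample a small gap is expected, since the formula is a large-sample result.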

Measuring – Sampling – Linear regression - XXVI
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•Under those “reasonable” assumptions, one can show that:
•The sample slope and intercept estimators are also such that they minimize the residual sum of squares, which is defined as: $\sum_i (Y_i - \hat{\alpha} - \hat{\beta} X_i)^2$
•Another formulation that you sometimes encounter is that the sample slope and intercept are the arguments of the minimum of the function over the span of possible $\alpha$ and $\beta$
•$(\hat{\alpha}, \hat{\beta}) = \arg\min_{\alpha, \beta} \{ \sum_i (Y_i - \alpha - \beta X_i)^2 \}$
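
The link between the arg-min formulation and the closed-form estimators comes from setting the two partial derivatives to zero. A short worked derivation, using the same symbols as above (the name $S$ for the residual sum of squares is introduced here only for illustration):

```latex
\begin{aligned}
S(\alpha,\beta) &= \sum_i (Y_i - \alpha - \beta X_i)^2 \\
\frac{\partial S}{\partial \alpha} = 0
  &\;\Rightarrow\; \sum_i (Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0
  \;\Rightarrow\; \hat{\alpha} = \hat{\mu}_Y - \hat{\beta}\,\hat{\mu}_X \\
\frac{\partial S}{\partial \beta} = 0
  &\;\Rightarrow\; \sum_i X_i\,(Y_i - \hat{\alpha} - \hat{\beta} X_i) = 0 \\
  &\;\Rightarrow\; \sum_i (X_i - \hat{\mu}_X)\big[(Y_i - \hat{\mu}_Y) - \hat{\beta}\,(X_i - \hat{\mu}_X)\big] = 0 \\
  &\;\Rightarrow\; \hat{\beta}
   = \frac{\sum_i (X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y)}{\sum_i (X_i - \hat{\mu}_X)^2}
   = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}}
\end{aligned}
```

The fourth line substitutes the expression for the intercept and uses the first condition (the bracketed terms sum to zero) to replace $X_i$ by $X_i - \hat{\mu}_X$.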

Measuring – Sampling – Linear regression - XXVII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•A couple of notes:
•As always, once the proverbial rubber hits the road, things are more complicated and the devil is always in the details….
•IN PARTICULAR, when starting to test hypotheses, you cannot rely directly on the following:
•$\sqrt{N} (\hat{\beta} - \beta) \xrightarrow{d} \mathcal{N}(0, \frac{\sigma^2}{\sigma_{XX}})$
•$\sqrt{N} (\hat{\alpha} - \alpha) \xrightarrow{d} \mathcal{N}(0, \frac{\sigma^2 (\sigma_{XX} + \mu_X^2)}{\sigma_{XX}})$
•BECAUSE you do not know the values $\sigma^2$, $\sigma_{XX}$, and $\mu_X$, for example
•So you will have to rely on estimates for those

Measuring – Sampling – Linear regression - XXVIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•A couple more notes:
•FOR EXAMPLE, how do we estimate $\sigma^2$?
•Well, it gets a little technical, so I am only giving you the outline, but the outline follows exactly what we did in the case of a single random variable, so you should be able to get the gist of it
•STEP 1: compute the sample slope estimator $\hat{\beta}$
•$\hat{\beta} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}} = \hat{\rho}_{XY} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X}$
•With:
•$\hat{\sigma}_{XY} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y)$
•$\hat{\mu}_X = \frac{1}{N} \sum_i X_i$
•$\hat{\mu}_Y = \frac{1}{N} \sum_i Y_i$
•$\hat{\sigma}_{XX} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(X_i - \hat{\mu}_X)$

Measuring – Sampling – Linear regression - XXIX
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•STEP 1: compute the sample slope estimator $\hat{\beta}$
•$\hat{\beta} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}} = \hat{\rho}_{XY} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X} = \frac{\frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y)}{\frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(X_i - \hat{\mu}_X)}$
•So…
•STEP 0: compute $\hat{\mu}_X$ and $\hat{\mu}_Y$
•That is easy because we have done that in the case of a single variable: under some reasonable assumptions of iid, the sample mean estimators (Sample Mean) are $\hat{\mu}_X = \frac{1}{N} \sum_i X_i$ and $\hat{\mu}_Y = \frac{1}{N} \sum_i Y_i$
•$\hat{\mu}_X$ and $\hat{\mu}_Y$ are BLUE (Best Linear Unbiased Estimator)
•$\hat{\mu}_X \xrightarrow{d} \mathcal{N}(\mu_X, \frac{\sigma_X^2}{N})$ and $\hat{\mu}_Y \xrightarrow{d} \mathcal{N}(\mu_Y, \frac{\sigma_Y^2}{N})$

Measuring – Sampling – Linear regression - XXX
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•OK, back to STEP 1: compute the sample slope estimator $\hat{\beta}$
•$\hat{\beta} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}} = \hat{\rho}_{XY} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X} = \frac{\frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y)}{\frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(X_i - \hat{\mu}_X)}$
•NOW, STEP 2: compute the sample intercept estimator:
•$\hat{\alpha} = \hat{\mu}_Y - \hat{\beta} \hat{\mu}_X$
•NOW, STEP 3: define the “sample model residuals” $\hat{\mathcal{E}}_i$ as:
•$Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•$\hat{\mathcal{E}}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$

Measuring – Sampling – Linear regression - XXXI
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•NOW, STEP 3: define the “sample model residuals” $\hat{\mathcal{E}}_i$ as:
•$Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•$\hat{\mathcal{E}}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$
•So just to make sure, you have to juggle all those different notations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$
•Estimate: $Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{\mathcal{E}}_i$

Measuring – Sampling – Linear regression - XXXI
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•So just to make sure, you have to juggle all those different notations
•Modeling: $Y = \alpha + \beta X + \mathcal{E}$, where $X$, $Y$ and $\mathcal{E}$ are random variables
•Sampling: $Y_i = \alpha + \beta X_i + \mathcal{E}_i$, where $X_i$, $Y_i$ and $\mathcal{E}_i$ are sample draws
•Outcome: $y_i = \alpha + \beta x_i + \varepsilon_i$, where $x_i$, $y_i$ and $\varepsilon_i$ are the actual numbers
•Note: this is technical but it matters, because $E[X_i] = \mu_X$ but $E[x_i] = x_i$
•Estimate: $Y_i = \hat{\alpha} + \hat{\beta} X_i + \hat{\mathcal{E}}_i$, based on the estimators $\hat{\alpha}$ and $\hat{\beta}$, which are random variables because they are functions of the $(X_i, Y_i)$, and so the sample residual estimators are also random variables, defined as: $\hat{\mathcal{E}}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$

Measuring – Sampling – Linear regression - XXXII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•So to avoid this rather technical distinction, sometimes in textbooks you will find the following terminology:
•$\mathcal{E}_i$ are called “population disturbances”
•$\hat{\mathcal{E}}_i$ are called “model residuals” but I like to call them “sample residual estimators”
•REMEMBER, if capital letter, it is a random variable
•REMEMBER, if funny hat, it is also a random variable, but computed from the draws (the sample). It is also an estimate of the population parameter, hence the term “estimator”
•So, if funny hat, I ALWAYS try to use a terminology such as “sample something estimator”, like
•Sample mean estimator: $\hat{\mu}_X = \frac{1}{N} \sum_i X_i$
•Sample variance estimator: $\hat{\sigma}_X^2 = \hat{\sigma}_{XX} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)^2$

Measuring – Sampling – Linear regression - XXXII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Sample mean estimator: $\hat{\mu}_X = \frac{1}{N} \sum_i X_i$
•Sample variance estimator: $\hat{\sigma}_X^2 = \hat{\sigma}_{XX} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)^2$
•Sample slope estimator: $\hat{\beta} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}} = \hat{\rho}_{XY} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X} = \frac{\frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y)}{\frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(X_i - \hat{\mu}_X)}$
•Sample intercept estimator: $\hat{\alpha} = \hat{\mu}_Y - \hat{\beta} \hat{\mu}_X$
•Sample “model residuals” estimator: $\hat{\mathcal{E}}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$
•Sample “model residual sum of squares” estimator: $\sum_i \hat{\mathcal{E}}_i^2$
•You observe $\hat{\mathcal{E}}_i$, you do not observe $\mathcal{E}_i$

Measuring – Sampling – Linear regression - XXXIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•You observe $\hat{\mathcal{E}}_i$, you do not observe $\mathcal{E}_i$
•You could question: if the $\mathcal{E}_i$ are drawn from a population $\mathcal{E}$, are the $\hat{\mathcal{E}}_i$ ALSO draws from that same population?
•Or, since the $\hat{\mathcal{E}}_i$ are very complicated functions of the $\hat{\mu}_X$, $\hat{\mu}_Y$, $\hat{\sigma}_{XX}$, $\hat{\sigma}_{YY}$, $\hat{\sigma}_{XY}$ (through their relationship to define the $\hat{\alpha}$ and $\hat{\beta}$), are we actually justified in making the same assumptions on the $\hat{\mathcal{E}}_i$ as we did on the $\mathcal{E}_i$?
•Remember we said something like “in OLS, it helps to make some reasonable assumptions such as:”
•$E[\mathcal{E}] = 0$
•$E[\mathcal{E}|X] = 0$
•$\mathrm{Cov}(X, \mathcal{E}) = 0$
•The pairs $(X_i, Y_i)$ are iid (independent and identically distributed) draws from their joint distribution
•$V[X] > 0$
•$V[\mathcal{E}|X] = \sigma^2 = \text{constant}$

Measuring – Sampling – Linear regression - XXXIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•You observe $\hat{\mathcal{E}}_i$, you do not observe $\mathcal{E}_i$.
•Remember we said something like “in OLS, it helps to make some reasonable assumptions such as:”
•$E[\mathcal{E}] = 0$
•$E[\mathcal{E}|X] = 0$
•$\mathrm{Cov}(X, \mathcal{E}) = 0$
•The pairs $(X_i, Y_i)$ are iid (independent and identically distributed) draws from their joint distribution
•$V[X] > 0$
•$V[\mathcal{E}|X] = \sigma^2 = \text{constant}$
•Can we still say, for example: $E[\hat{\mathcal{E}}_i] = 0$?
•It is a very subtle BUT very valid question, that might send you on a long journey down the rabbit hole, but it is worth thinking about for a minute.

Measuring – Sampling – Linear regression - XXXIV
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•You observe $\hat{\mathcal{E}}_i$, you do not observe $\mathcal{E}_i$.
•It is a very subtle BUT very valid question, that might send you on a long journey down the rabbit hole, but it is worth thinking about for a minute.
•Another quick note to guide you through this, remember in one variable we had:
•$\hat{\sigma}_X^2 = \hat{\sigma}_{XX} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)^2$ where $\hat{\mu}_X = \frac{1}{N} \sum_i X_i$
•$E[\hat{\sigma}_{XX}] = \frac{N-1}{N} \sigma_{XX} = \frac{N-1}{N} \sigma_X^2$ where $V[X_i] = \sigma_X^2 = \sigma_{XX}$
•Here, after much math you will get something very similar
•$\hat{\sigma}_{\mathcal{E}}^2 = \hat{\sigma}_{\mathcal{E}\mathcal{E}} = \frac{1}{N} \sum_i (\hat{\mathcal{E}}_i - \hat{\mu}_{\mathcal{E}})^2$ where $\hat{\mu}_{\mathcal{E}} = \frac{1}{N} \sum_i \hat{\mathcal{E}}_i$
•$E[\hat{\sigma}_{\mathcal{E}\mathcal{E}}] = \frac{N-2}{N} \sigma_{\mathcal{E}\mathcal{E}} = \frac{N-2}{N} \sigma_{\mathcal{E}}^2$ where $V[\mathcal{E}|X] = \sigma_{\mathcal{E}}^2 = \sigma_{\mathcal{E}\mathcal{E}}$
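
The (N-2)/N factor can be checked numerically. The following is a rough simulation sketch under assumed settings, not something taken from the deck: the true line (intercept 0.5, slope 2), the standard normal X and residuals, and the helper names are all hypothetical. With N = 10 the average of the 1/N residual variance should land near 0.8 times the true residual variance.

```python
import random

def ols_fit(xs, ys):
    """Return (alpha_hat, beta_hat) from sample moments."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    s_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    beta_hat = s_xy / s_xx
    return mu_y - beta_hat * mu_x, beta_hat

def mean_residual_variance(n_obs, n_trials, sigma=1.0, seed=1):
    """Average the 1/N variance of the fitted residuals over many samples."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n_trials):
        xs = [rng.gauss(0.0, 1.0) for _ in range(n_obs)]
        ys = [0.5 + 2.0 * x + rng.gauss(0.0, sigma) for x in xs]
        a, b = ols_fit(xs, ys)
        res = [y - a - b * x for x, y in zip(xs, ys)]
        mu_e = sum(res) / n_obs
        total += sum((e - mu_e) ** 2 for e in res) / n_obs
    return total / n_trials

avg = mean_residual_variance(n_obs=10, n_trials=4000)
print(avg, (10 - 2) / 10)
```

Two parameters were fitted from the same sample, so two degrees of freedom are used up, compared with one for the sample mean in the single-variable case.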

Measuring – Sampling – Linear regression - XXXV
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•You observe $\hat{\mathcal{E}}_i$, you do not observe $\mathcal{E}_i$.
•It is a very subtle BUT very valid question, that might send you on a long journey down the rabbit hole, but it is worth thinking about for a minute.
•Note that to be truly consistent, I should not be writing:
•$\hat{\mu}_{\mathcal{E}} = \frac{1}{N} \sum_i \hat{\mathcal{E}}_i$
•But really
•$\hat{\hat{\mu}}_{\mathcal{E}} = \frac{1}{N} \sum_i \hat{\mathcal{E}}_i$
•Since the estimator is based on something that is ALREADY an estimator. But first of all, no textbook out there that I know of actually goes into that level of rigorous terminology (or if there is one please send it to me), and also I think that this little slip in rigor is actually OK (but I am still on the fence on that one)

Measuring – Sampling – Linear regression - XXXVI
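•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•OK, so NOW we are almost there, we were at:
•NOW, STEP 3: define the “sample model residuals” $\hat{\mathcal{E}}_i$ as: $\hat{\mathcal{E}}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$
•$\hat{\sigma}_{\mathcal{E}}^2 = \hat{\sigma}_{\mathcal{E}\mathcal{E}} = \frac{1}{N} \sum_i (\hat{\mathcal{E}}_i - \hat{\mu}_{\mathcal{E}})^2$ where $\hat{\mu}_{\mathcal{E}} = \frac{1}{N} \sum_i \hat{\mathcal{E}}_i$
•$E[\hat{\sigma}_{\mathcal{E}\mathcal{E}}] = E[\frac{1}{N} \sum_i (\hat{\mathcal{E}}_i - \hat{\mu}_{\mathcal{E}})^2] = E[\frac{1}{N} \sum_i (\hat{\mathcal{E}}_i - (\frac{1}{N} \sum_j \hat{\mathcal{E}}_j))^2] = \frac{N-2}{N} \sigma_{\mathcal{E}\mathcal{E}} = \frac{N-2}{N} \sigma_{\mathcal{E}}^2$
•where $V[\mathcal{E}|X] = \sigma_{\mathcal{E}}^2 = \sigma_{\mathcal{E}\mathcal{E}}$
•SO WE DID IT! The estimate of $\sigma_{\mathcal{E}}^2$, which was unknown to us, can be computed by:
•$V[\mathcal{E}|X] = \sigma_{\mathcal{E}}^2 = \sigma_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2} E[\hat{\sigma}_{\mathcal{E}\mathcal{E}}]$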

Measuring – Sampling – Linear regression - XXXVII
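•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•SO WE DID IT! The estimate of $\sigma_{\mathcal{E}}^2$, which was unknown to us, can be computed by:
•$V[\mathcal{E}|X] = \sigma_{\mathcal{E}}^2 = \sigma_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2} E[\hat{\sigma}_{\mathcal{E}\mathcal{E}}] = \frac{N}{N-2} E[\frac{1}{N} \sum_i (\hat{\mathcal{E}}_i - (\frac{1}{N} \sum_j \hat{\mathcal{E}}_j))^2]$
•Where:
•$\hat{\mathcal{E}}_i = Y_i - \hat{\alpha} - \hat{\beta} X_i$
•$\hat{\alpha} = \hat{\mu}_Y - \hat{\beta} \hat{\mu}_X$
•$\hat{\beta} = \frac{\hat{\sigma}_{XY}}{\hat{\sigma}_{XX}} = \hat{\rho}_{XY} \cdot \frac{\hat{\sigma}_Y}{\hat{\sigma}_X} = \frac{\frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y)}{\frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(X_i - \hat{\mu}_X)}$
•$\hat{\mu}_X = \frac{1}{N} \sum_i X_i$
•$\hat{\mu}_Y = \frac{1}{N} \sum_i Y_i$
•$\hat{\sigma}_X^2 = \hat{\sigma}_{XX} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)^2$
•$\hat{\sigma}_{XY} = \frac{1}{N} \sum_i (X_i - \hat{\mu}_X)(Y_i - \hat{\mu}_Y)$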
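
Putting STEP 0 through STEP 3 and the N/(N-2) rescaling together, the whole chain fits in one short function. This is a sketch, not the deck's spreadsheet: `regression_summary` and the five data points are hypothetical, and the standard error of the slope is obtained by plugging the estimates into the large-sample formula $\sigma^2 / (N \sigma_{XX})$, which is an assumption of this example.

```python
import math

def regression_summary(xs, ys):
    """Run the estimation steps in order and return the pieces as a dict."""
    n = len(xs)
    # STEP 0: sample means
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    # STEP 1: sample slope from the 1/N moments
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    s_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    beta_hat = s_xy / s_xx
    # STEP 2: sample intercept
    alpha_hat = mu_y - beta_hat * mu_x
    # STEP 3: sample residuals and their 1/N variance
    residuals = [y - alpha_hat - beta_hat * x for x, y in zip(xs, ys)]
    mu_e = sum(residuals) / n
    s_ee = sum((e - mu_e) ** 2 for e in residuals) / n
    # Rescale by N/(N-2) to estimate the residual variance
    sigma2_hat = s_ee * n / (n - 2)
    # Plug-in standard error of the slope (assumption: large-sample formula)
    se_beta = math.sqrt(sigma2_hat / (n * s_xx))
    return {"alpha_hat": alpha_hat, "beta_hat": beta_hat, "residuals": residuals,
            "s_ee": s_ee, "sigma2_hat": sigma2_hat, "se_beta": se_beta}

# Hypothetical observations (illustrative numbers only)
summary = regression_summary([1.0, 2.0, 3.0, 4.0, 5.0], [2.1, 3.9, 6.2, 7.8, 10.1])
print(summary["alpha_hat"], summary["beta_hat"], summary["sigma2_hat"], summary["se_beta"])
```

The fitted residuals average to zero by construction, so subtracting their mean changes nothing numerically; it is kept here only to mirror the formula.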

Measuring – Sampling – Linear regression - XXXVIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•SO WE DID IT! The estimate of $\sigma_{\mathcal{E}}^2$, which was unknown to us, can be computed by:
•$V[\mathcal{E}|X] = \sigma_{\mathcal{E}}^2 = \sigma_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2} E[\hat{\sigma}_{\mathcal{E}\mathcal{E}}] = \frac{N}{N-2} E[\frac{1}{N} \sum_i (\hat{\mathcal{E}}_i - (\frac{1}{N} \sum_j \hat{\mathcal{E}}_j))^2]$
•That is an awful lot of steps, and super easy to get lost in, and there are also an awful lot of assumptions (some pretty obvious, some pretty hidden and hard to pinpoint) that we had to rely on.
•Quite frankly, in a lot of situations people just add on top of those “hey, it is a large sample”, so let me replace things I do not know by their sample estimate
•So, for example:
•For $\sigma_{\mathcal{E}}^2$ let me just use $\hat{\sigma}_{\mathcal{E}\mathcal{E}}$
•For $\sigma_X^2$ let me just use $\hat{\sigma}_{XX}$
•So for large samples, an easy way out of a lot of math is to just “drop the hat”
•This is why, when teaching stats the right way, people always get super confused about the hat, because in most cases they never really had to pay attention to it. The issue is that you need it for hypothesis testing.

Measuring – Sampling – Linear regression - XXXIX
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Quite frankly, in a lot of situations people just add on top of those “hey, it is a large sample”, so let me replace things I do not know by their sample estimate
•So, for example:
•For $\sigma_{\mathcal{E}}^2$ let me just use $\hat{\sigma}_{\mathcal{E}\mathcal{E}}$, which I know how to calculate
•For $\sigma_X^2$ let me just use $\hat{\sigma}_{XX}$, which I know how to calculate
•So for large samples, an easy way out of a lot of math is to just “drop the hat”
•This is why, when teaching stats the right way, people always get super confused about the hat, because in most cases they never really had to pay attention to it. The issue is that you need it for hypothesis testing, to do it the right way and not get confused between, say, the population variance, the sample variance, the variance of the sample variance, the variance of the sample mean, and all that good stuff….

Measuring – Sampling – Linear regression - XXXXI
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•FURTHERMORE, in practice a lot of the estimators are often just treated as random variables drawn from a Normal Distribution Function (because of some general use of the CLT, Central Limit Theorem)
•So for large samples, an easy way out of a lot of math is to just “drop the hat”
•Furthermore, for hypothesis testing, in practice use the NDF (Normal Distribution Function) $\mathcal{N}$
•So drop the hat, not the ball…

Measuring – Sampling – Linear regression - XXXXII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•So drop the hat, but do not drop the ball…

Measuring – Sampling – Linear regression - XXXXIII
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•I bet that you had no idea that linear regression could be so complicated…
•But as always, there is nothing that a good Excel spreadsheet cannot solve….
•Onto the Excel spreadsheet, Robin!

Measuring – Sampling – Linear regression - XXXXIV
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•Let’s play with those numbers:

Measuring – Sampling – Linear regression - XXXXV
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•We can try to draw lines using the Alpha and Beta parameters: $Y = \alpha + \beta X$

Measuring – Sampling – Linear regression - XXXXV
•With TWO variables: sampling a distribution of returns $\{X_i, Y_i\}$ with N observations
•We can try to draw lines using the Alpha and Beta parameters: $Y = \alpha + \beta X$
•That one looks better

Measuring – Sampling – Linear regression - XXXXV
•Let’s compute the residuals such that: $\mathcal{E}_i = Y_i - \alpha - \beta X_i$

Measuring – Sampling – Linear regression - XXXXV
•Let’s compute the sum of the squares of the residuals, defined as: $\sum_i \mathcal{E}_i^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$

Measuring – Sampling – Linear regression - XXXXV
•We can use a very “brute force” approach and take advantage of the fact that Excel has a Goal Seek or Solver
•This is very much along the lines of what AI is really doing these days, not very smart, but brute force
•This is actually somewhat pragmatic, if theoretically unsatisfying
•Because the theory only works within the assumptions that you are making
•Change the assumption, and you have to redo everything
•Here, if you want to add any kind of constraints (say you want to bound some of the parameters), you can very easily add them, and let the solver find a minimum
•If you are lucky, like in a Neural Network, the minimum will be somewhat global, somewhat stable, somewhat well behaved, somewhat easily interpretable
•So sometimes brute force is the most resilient and pragmatic approach, nothing wrong with a little brute force sometimes
•So let’s use Excel Solver to find the parameters $(\alpha, \beta)$ that minimize the sum of the squares of the residuals: $\sum_i \mathcal{E}_i^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$

Measuring – Sampling – Linear regression - XXXXVI
•So let’s use Excel Solver to find the parameters $(\alpha, \beta)$ that minimize the sum of the squares of the residuals: $\sum_i \mathcal{E}_i^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$
…..READY ? SET ?

Measuring – Sampling – Linear regression - XXXXVII
•So let’s use Excel Solver to find the parameters $(\alpha, \beta)$ that minimize the sum of the squares of the residuals: $\sum_i \mathcal{E}_i^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$
…..READY ? SET ? GO !!!!

Measuring – Sampling – Linear regression - XXXXVIII
•So let’s use Excel Solver to find the parameters $(\alpha, \beta)$ that minimize the sum of the squares of the residuals: $\sum_i \mathcal{E}_i^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$
…..READY ? SET ? GO !!!!
•EXCEL solved and found the following solution:

Measuring – Sampling – Linear regression - XXXXIX
•So let’s use Excel Solver to find the parameters $(\alpha, \beta)$ that minimize the sum of the squares of the residuals: $\sum_i \mathcal{E}_i^2 = \sum_i (Y_i - \alpha - \beta X_i)^2$
…..READY ? SET ? GO !!!!
•EXCEL solved and found the following solution:
ALPHA      BETA
0.016405   2.00142546
•Not bad, and as I said, you can do a lot in Excel, add constraints, and all of that good stuff that you would not be able to do in OLS.
•But the issue is the following now: for you to make any statement about how good your solution is, you would need to know something about the distribution of those parameters, meaning the following:
•Someone told you that the BETA should be 3 (because, say, it is a 3x leveraged ETF on the underlying asset).
•You found roughly 2
•Is that because your friend is lying, or is that because you do not have enough samples?
•In order to answer that question, you sort of need to know the “Z-score” of your observation (roughly 2) against the distribution of the parameters, to know “how far is 2 from 3?”
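
To see that brute force and the formulas agree, here is a small stand-in for the Solver step. It uses plain gradient descent, which is NOT Excel Solver's actual algorithm, only a simple substitute for illustration, and the data are synthetic (generated with an assumed intercept of 0.02 and slope of 2), since the spreadsheet numbers are not part of this transcript. All function names are hypothetical.

```python
import random

def rss(alpha, beta, xs, ys):
    """Residual sum of squares for a candidate line."""
    return sum((y - alpha - beta * x) ** 2 for x, y in zip(xs, ys))

def brute_force_fit(xs, ys, step=0.1, n_iter=2000):
    """Minimize the mean squared residual by repeated small downhill steps."""
    n = len(xs)
    alpha, beta = 0.0, 0.0
    for _ in range(n_iter):
        errors = [y - alpha - beta * x for x, y in zip(xs, ys)]
        grad_alpha = -2.0 * sum(errors) / n
        grad_beta = -2.0 * sum(e * x for e, x in zip(errors, xs)) / n
        alpha -= step * grad_alpha
        beta -= step * grad_beta
    return alpha, beta

def closed_form_fit(xs, ys):
    """Slope and intercept from the sample-moment formulas."""
    n = len(xs)
    mu_x = sum(xs) / n
    mu_y = sum(ys) / n
    s_xy = sum((x - mu_x) * (y - mu_y) for x, y in zip(xs, ys)) / n
    s_xx = sum((x - mu_x) ** 2 for x in xs) / n
    beta_hat = s_xy / s_xx
    return mu_y - beta_hat * mu_x, beta_hat

# Synthetic data (assumption: true intercept 0.02, true slope 2.0)
rng = random.Random(42)
xs = [rng.gauss(0.0, 1.0) for _ in range(100)]
ys = [0.02 + 2.0 * x + rng.gauss(0.0, 0.3) for x in xs]

alpha_bf, beta_bf = brute_force_fit(xs, ys)
alpha_cf, beta_cf = closed_form_fit(xs, ys)
print(alpha_bf, beta_bf, alpha_cf, beta_cf)
```

The step size of 0.1 works here because X has a variance of about 1; on raw return data with a much smaller scale, the step size or the number of iterations would need adjusting.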

Measuring – Sampling – Linear regression - XXXXX
69
Luc Faucheux 2025
•To answer that you need some estimate of the standard deviation of BETA
•If you think that it is, say, 0.2, then observing 2 when the mean is 3 is an observation with a Z-score of (2-3)/0.2 = -5
•That is quite significant, and you should be able to make some sort of a statement that the sampling strongly indicates that the true BETA is not 3, and, using a Normal Distribution table, put some numbers on that statement
•But if you think that the standard deviation is more like 3, then the Z-score is now (2-3)/3 = -0.33, and the strength of any statement that you can make just got a lot smaller
•That is why it is useful to work through the math, no matter how painful it is, because it allows you to put some numbers on your estimate
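The two Z-scores above can be checked in a few lines. This is only a minimal sketch: the function names `z_score` and `normal_cdf` are made up for this illustration, and the Normal table lookup is replaced by the error function from the Python standard library.

```python
import math

def z_score(observed, hypothesized_mean, std_dev):
    """Number of standard deviations between the observation and the hypothesized mean."""
    return (observed - hypothesized_mean) / std_dev

def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

observed_beta = 2.0   # what the sample gave us (roughly)
claimed_beta = 3.0    # what the friend says

# The two guesses for the standard deviation of BETA discussed above
for assumed_std in (0.2, 3.0):
    z = z_score(observed_beta, claimed_beta, assumed_std)
    print(f"std = {assumed_std}: Z = {z:.2f}, P(Z <= z) = {normal_cdf(z):.2e}")
```

Same observation, same claim, and a completely different conclusion depending on the standard deviation, which is the whole point of working out that standard deviation properly.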

Measuring – Sampling – Linear regression - XXXXXI
70
Luc Faucheux 2025
•Just as a note, you could of course come up with an estimate manually using the brute force approach, by changing some inputs, adding some constraints, and essentially recovering some estimate of the standard deviation of your parameters, but in the end that might take more work and more brain power to justify than some simple statistics (keeping in mind of course all the assumptions, hidden or not, that you had to swallow on the way)
•OK, so back to Excel Robin !

Measuring – Sampling – Linear regression - XXXXXII
71
Luc Faucheux 2025
•Let’s get some stats on those pesky variables

Measuring – Sampling – Linear regression - XXXXXIII
72
Luc Faucheux 2025
•With a little more Zoom

Measuring – Sampling – Linear regression - XXXXXIII
73
Luc Faucheux 2025
•With a little more Zoom

Measuring – Sampling – Linear regression - XXXXXIII
74
Luc Faucheux 2025
•EXCEL solved and found the following solution:
ALPHA BETA
0.016405 2.00142546
•Using all our newly acquired math skills we found:
ALPHA BETA
0.016405 2.001427
•So we know that we are on the right track, and that we did not mess up too much.
•The slight difference in values comes from the numerical solver, so no worries about that
•OK, so now your friend told you BETA should be equal to 3, but we are finding more like around 2 from Excel Solver and also from the OLS Linear regression. We need to get an estimate of the standard deviation of the slope, because we already know that it is an UNBIASED estimate of the population slope
•$E[\hat{\beta}] = \beta$
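The solver-versus-formula comparison above can be sketched outside of Excel. Everything here is an assumption made for illustration: the data are synthetic (not the spreadsheet data), plain gradient descent stands in for whatever algorithm Excel Solver really uses, and the function names are hypothetical.

```python
import random

def ols_closed_form(x, y):
    """Closed-form OLS: slope = S_xy / S_xx, intercept = y_bar - slope * x_bar."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    s_xy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
    s_xx = sum((xi - x_bar) ** 2 for xi in x)
    beta = s_xy / s_xx
    alpha = y_bar - beta * x_bar
    return alpha, beta

def ols_gradient_descent(x, y, learning_rate=0.4, steps=200):
    """Brute-force minimization of the mean squared residual (same minimizer as the sum)."""
    n = len(x)
    alpha, beta = 0.0, 0.0
    for _ in range(steps):
        residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
        grad_alpha = -2.0 * sum(residuals) / n
        grad_beta = -2.0 * sum(r * xi for r, xi in zip(residuals, x)) / n
        alpha -= learning_rate * grad_alpha
        beta -= learning_rate * grad_beta
    return alpha, beta

# Synthetic sample: a "true" line with alpha = 0.02 and beta = 2, plus noise
rng = random.Random(42)
n = 50
x = [-0.5 + i / (n - 1) for i in range(n)]
y = [0.02 + 2.0 * xi + rng.gauss(0.0, 0.1) for xi in x]

alpha_cf, beta_cf = ols_closed_form(x, y)
alpha_gd, beta_gd = ols_gradient_descent(x, y)
print("closed form     :", alpha_cf, beta_cf)
print("numerical solver:", alpha_gd, beta_gd)
```

The numerical answer stops a hair short of the closed-form one because the iteration is cut off after a fixed number of steps, the same kind of small gap as between the two tables above.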

Measuring – Sampling – Linear regression - XXXXXIV
75
Luc Faucheux 2025
•$E[\hat{\beta}] = \beta$
•We now need some numbers around the standard deviation of $\hat{\beta}$
•We know that:
•$\sqrt{N}\,(\hat{\beta} - \beta) \xrightarrow{d} \mathcal{N}\left(0, \frac{\sigma^2}{V_{XX}}\right)$
•But we do not know $\sigma^2$, nor do we know $V_{XX}$
•Hold on I hear you say, yes we do know those, we have spent like 30 slides or so deriving those.
•YES INDEED
•Since: $E[\hat{\sigma}_X^2] = E[\hat{V}_{XX}] = \frac{N-1}{N}\,\sigma_X^2 = \frac{N-1}{N}\,V_{XX}$
•We have the estimate of the population variance: $V_{XX} = \frac{N}{N-1}\,\hat{\sigma}_X^2$
•OK, so that’s one, we have $V_{XX}$ (0.234 in Excel cell C20)
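The N/(N-1) rescaling above is exactly the difference between the two variance functions in Python's standard `statistics` module. A minimal sketch, on a made-up five-point sample (not the spreadsheet data):

```python
import statistics

x = [1.0, 2.0, 3.0, 4.0, 5.0]   # hypothetical sample, for illustration only
n = len(x)

biased = statistics.pvariance(x)    # divides by N (the "hat" sample variance)
unbiased = statistics.variance(x)   # divides by N - 1
rescaled = n / (n - 1) * biased     # the N/(N-1) correction applied by hand

print(biased, unbiased, rescaled)
```

The hand-rescaled number and `statistics.variance` agree, so either route gives the estimate of the population variance of X.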

Measuring – Sampling – Linear regression - XXXXXV
76
Luc Faucheux 2025
•OK, so that’s one, we have $V_{XX}$ (0.234 in Excel cell C20)

Measuring – Sampling – Linear regression - XXXXXVI
77
Luc Faucheux 2025
•OK, so that’s one, we have $V_{XX}$ (0.234 in Excel cell C20)
•$\sqrt{N}\,(\hat{\beta} - \beta) \xrightarrow{d} \mathcal{N}\left(0, \frac{\sigma^2}{V_{XX}}\right)$
•So we need $\sigma^2$, the variance for the population “shocks”. Some textbooks use the term “shock” as opposed to “residuals”
•We went through that in the previous slides !! (slide XXXVIII)
•SO WE DID IT! The estimate $\hat{\sigma}^2$ of $\sigma^2$, which was unknown to us, can be computed by:
•$V[\mathcal{E}|X] = \sigma^2$, estimated by $\hat{\sigma}^2 = V_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\,\hat{V}_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\cdot\frac{1}{N}\sum_i\left(\hat{\mathcal{E}}_i - \frac{1}{N}\sum_j \hat{\mathcal{E}}_j\right)^2$
•OK we are almost there, back again to the spreadsheet Robin !

Measuring – Sampling – Linear regression - XXXXXVII
78
Luc Faucheux 2025
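•We need to compute:
•$V[\mathcal{E}|X] = \sigma^2$, estimated by $\hat{\sigma}^2 = V_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\,\hat{V}_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\cdot\frac{1}{N}\sum_i\left(\hat{\mathcal{E}}_i - \frac{1}{N}\sum_j \hat{\mathcal{E}}_j\right)^2$
•We plug back into our straight line the ALPHA and BETA values that we got from our OLS Linear Regression:
ALPHA BETA
0.016405 2.001427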

Measuring – Sampling – Linear regression - XXXXXVIII
79
Luc Faucheux 2025

Measuring – Sampling – Linear regression - XXXXXIX
80
Luc Faucheux 2025
•Turns out that the Sample Mean estimator for the residuals is EXACTLY ZERO ??!!
•That is a neat little result of OLS: because we are minimizing the sum of the squares with a free intercept ALPHA, the residuals sum to zero
•CAREFUL: that would not always be the case should we choose another minimization
•But Hey, let’s use that because it is really nice to work in a mean adjusted sample
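For those who want to see where the EXACTLY ZERO comes from, here is a short sketch of the argument. It assumes the fit includes a free intercept $\alpha$, and the symbol $S$ for the sum of squares is introduced here only for convenience.

```latex
S(\alpha, \beta) = \sum_i (Y_i - \alpha - \beta X_i)^2

\left.\frac{\partial S}{\partial \alpha}\right|_{\hat{\alpha},\hat{\beta}}
  = -2 \sum_i (Y_i - \hat{\alpha} - \hat{\beta} X_i)
  = -2 \sum_i \hat{\mathcal{E}}_i = 0
\quad\Longrightarrow\quad
\frac{1}{N}\sum_i \hat{\mathcal{E}}_i = 0
```

The first-order condition in $\alpha$ is what forces the residuals to sum to zero. Change the objective (say, the sum of absolute values) or drop the intercept, and that condition changes, so the sample mean of the residuals need not vanish anymore.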

Measuring – Sampling – Linear regression - XXXXXX
81
Luc Faucheux 2025
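•We need to compute:
•$V[\mathcal{E}|X] = \sigma^2$, estimated by $\hat{\sigma}^2 = V_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\,\hat{V}_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\cdot\frac{1}{N}\sum_i\left(\hat{\mathcal{E}}_i - \frac{1}{N}\sum_j \hat{\mathcal{E}}_j\right)^2$
•Since $\frac{1}{N}\sum_i \hat{\mathcal{E}}_i = 0$, we only need to compute:
•$\hat{\sigma}^2 = V_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\,\hat{V}_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\cdot\frac{1}{N}\sum_i \hat{\mathcal{E}}_i^2$
•We compute the sample variance residuals estimator: $\frac{1}{N}\sum_i \hat{\mathcal{E}}_i^2$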

Measuring – Sampling – Linear regression - XXXXXXI
82
Luc Faucheux 2025

Measuring – Sampling – Linear regression - XXXXXXII
83
Luc Faucheux 2025
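•$\hat{\sigma}^2 = V_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\,\hat{V}_{\mathcal{E}\mathcal{E}} = \frac{N}{N-2}\cdot\frac{1}{N}\sum_i \hat{\mathcal{E}}_i^2$
•We now compute: $\frac{N}{N-2}\cdot\frac{1}{N}\sum_i \hat{\mathcal{E}}_i^2$
•That is an UNBIASED estimate of the population variance for the residuals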
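The same computation can be sketched in a few lines. The five data points and the function names `fit_line` and `residual_variance` are made up for this illustration; they are not the spreadsheet data.

```python
def fit_line(x, y):
    """Closed-form OLS estimates (alpha, beta) for y = alpha + beta * x."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    beta = (sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
            / sum((xi - x_bar) ** 2 for xi in x))
    return y_bar - beta * x_bar, beta

def residual_variance(x, y):
    """Return (mean residual, biased 1/N variance, unbiased N/(N-2)-corrected variance)."""
    alpha, beta = fit_line(x, y)
    residuals = [yi - alpha - beta * xi for xi, yi in zip(x, y)]
    n = len(x)
    mean_resid = sum(residuals) / n                                # zero for OLS with intercept
    biased = sum((e - mean_resid) ** 2 for e in residuals) / n     # divides by N
    unbiased = n / (n - 2) * biased                                # divides by N - 2 in effect
    return mean_resid, biased, unbiased

# Hypothetical sample, for illustration only
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

mean_resid, biased, unbiased = residual_variance(x, y)
print(mean_resid, biased, unbiased)
```

The N-2 shows up because two parameters (ALPHA and BETA) were estimated from the same sample before the residuals were computed.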

Measuring – Sampling – Linear regression - XXXXXXIII
84
Luc Faucheux 2025
•OK, so that’s one, we have $V_{XX}$ (0.234 in Excel cell C20)
•We now have $\hat{\sigma}^2$ (0.0272 in Excel cell F14)
•Bear in mind that those are ESTIMATES
•If we get more samples, or different samples, the exact number will change
•STILL, they are both UNBIASED estimates of what we are after
•OK so we have :
•$\sqrt{N}\,(\hat{\beta} - \beta) \xrightarrow{d} \mathcal{N}\left(0, \frac{\hat{\sigma}^2}{V_{XX}}\right)$
•Let’s assume that we can use the Normal Distribution Function $\mathcal{N}$ for $\hat{\beta}$

Measuring – Sampling – Linear regression - XXXXXXIV
85
Luc Faucheux 2025
•So the $\hat{\beta}$ that we computed is a draw from a Normal Distribution Function with mean $\beta$ and with variance $\frac{1}{N}\cdot\frac{\hat{\sigma}^2}{V_{XX}} = 0.051$ (in Excel cell M3), so the standard deviation will be the usual square root (0.227 in cell M4)
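As a sketch, the variance formula above fits in a one-line function. The function name and the inputs below are hypothetical, chosen only to make the arithmetic easy to follow; they are not the spreadsheet values (the sample size N is not shown on the slide).

```python
import math

def beta_standard_error(residual_variance, x_variance, n_samples):
    """Standard deviation of the slope estimate: sqrt( (1/N) * sigma^2 / V_XX )."""
    variance_of_beta = residual_variance / x_variance / n_samples
    return math.sqrt(variance_of_beta)

# Hypothetical inputs: sigma^2 = 0.04, V_XX = 0.25, N = 16
se = beta_standard_error(0.04, 0.25, 16)
print(se)
```

More samples (larger N), less noisy shocks (smaller variance of the residuals), or a wider spread of X values (larger $V_{XX}$) all tighten the estimate of the slope.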

Measuring – Sampling – Linear regression - XXXXXXV
86
Luc Faucheux 2025

Measuring – Sampling – Linear regression - XXXXXXVI
87
Luc Faucheux 2025
•OK, so now we are almost there:
•Your friend is telling you that the BETA should be 3 (expected mean)
•Your friend is not telling you anything about the standard deviation of the BETA population, so you will have to rely on your estimate, which you computed to be 0.227
•Your estimate of the BETA is 2
•So if the Null Hypothesis (a big word to use since we have not yet covered Hypothesis Testing, but trust me on this one for now), or if you prefer your assumption, is that the BETA should indeed be centered around 3, with a Normal distribution function of standard deviation 0.227, then your observation of 2 has a Z-score of:
•Z-score = (2-3)/0.227 = -4.41
•That is pretty large in magnitude
•From the Normal Distribution function, the probability of observing a draw with a Z-score less than (-4.41) is essentially 0
•So you can tell your friend that the sample that you are observing strongly indicates that the real BETA is not 3 as they say, and is much more likely to be around 2
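To put numbers on that last statement, here is a minimal sketch using the values from the slides (BETA estimate 2.001427, standard deviation 0.227, claimed value 3). The 95% interval and its 1.96 multiplier are an addition for illustration, under the same Normal assumption as above; the variable names are made up.

```python
import math

def normal_cdf(z):
    """Standard Normal cumulative probability P(Z <= z), via the error function."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))

beta_hat = 2.001427    # OLS estimate from the slides
std_err = 0.227        # estimated standard deviation of the slope (cell M4)
claimed_beta = 3.0     # the friend's claim, used as the Null Hypothesis

z = (beta_hat - claimed_beta) / std_err
p_lower_tail = normal_cdf(z)                  # P(draw this low or lower | BETA = 3)
p_two_sided = 2.0 * normal_cdf(-abs(z))       # P(draw this far from 3, either side)

# 95% interval around the estimate (1.96 is the usual two-sided Normal quantile)
ci_low = beta_hat - 1.96 * std_err
ci_high = beta_hat + 1.96 * std_err

print(f"Z = {z:.2f}, lower tail p = {p_lower_tail:.2e}, two-sided p = {p_two_sided:.2e}")
print(f"95% interval for BETA: [{ci_low:.3f}, {ci_high:.3f}]")
```

The claimed value of 3 falls well outside the interval, while 2 sits comfortably inside it, which is a slightly more careful way of saying "not 3, probably around 2".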

Measuring – Sampling – Linear regression - XXXXXXVII
88
Luc Faucheux 2025

Straight lines are everywhere
89
Luc Faucheux 2025

Straight Lines - I
90
Luc Faucheux 2025
“God does not build in straight lines” Prometheus.. (the movie not the Greek Titan god)
CHECKED, I watched the movie, awesome movie…

Straight Lines - Ia
91
Luc Faucheux 2025
And yes, I was already working for Weyland-Yutani in 2008 while trading options at Lehman
Brothers….

Straight Lines - II
92
Luc Faucheux 2025

Straight Lines - III
93
Luc Faucheux 2025

Straight Lines - IV
94
Luc Faucheux 2025
$R - R_f = \alpha + \beta\,(R_M - R_f)$

Straight Lines - V
95
Luc Faucheux 2025

Straight Lines - VI
96
Luc Faucheux 2025

Straight Lines - VII
97
Luc Faucheux 2025
$R - R_f = \alpha + \beta\,(R_M - R_f) + \beta_{SMB}\cdot SMB + \beta_{HML}\cdot HML$

Straight Lines - VIII
98
Luc Faucheux 2025
•Sampling – Measuring – OLS – Linear regression
•DONE (minus small stuff)
•Predicting – Modeling – Security Market Line – CAPM
•NEXT DECK (but essentially a straightforward application of Linear regression)
•Evaluating Performance – alpha and beta
•NEXT DECK (but again now that we have gone through the linear regression, the hard part is behind us)
•Explaining away – Factor Analysis – PCA
•DECK TROIS
•Explaining away – French Fama factor analysis
•ALSO DECK TROIS (essentially it is adding explanatory variables in order to improve the linear regression and reduce the residuals, so now that we have done the hard part, it is all about how to apply it)
•Part Deux, Trois, Quatre and more about the CAPM, the SML, the alpha and beta of portfolios, factor analysis,
PCA and the French Fama framework will be in another deck, or other decks depending on the number of slides.

So at least for now…
99 Luc Faucheux 2025