Bivariate linear regression

3,925 views 33 slides Nov 22, 2013

About This Presentation

In this presentation, I've tried to compile all the details about bivariate linear regression and correlation. It addresses all the key issues, but anyone who wants to use it will have to speak more and verbally describe the details covered, according to the understanding of your...


Slide Content

Linear Regression
Dr Menaal Kaushal, JR II, Department of S P M, S N Medical College, Agra
22/1/13

Statistical analysis can be:
Univariate: when only one variable is studied, e.g. the heights of all the grade IV children, or the ages of mothers delivering at a DH (measures of central tendency, measures of dispersion).
Bivariate: when the relationship between two variables is studied, e.g. the relationship between the height and weight of every child in grade IV, or between a mother’s age and the birth weight of her baby.
Multivariate: when the relationship between more than two variables is studied, e.g. the relationship between the height, weight and MAC of every child in grade IV.

Bivariate Regression
Linear regression: when the outcome variable is continuous.
Logistic regression: when the outcome variable is categorical, e.g. when the research question is answered with a yes or no category.

Levels (Types) of Data
Nominal (categorical) measures: the categories are exhaustive and mutually exclusive (e.g., religion, gender).
Ordinal measures: all of the above, plus the categories can be rank-ordered (e.g., social class).
Interval measures: all of the above, plus equal differences between measurement points (e.g., temperature in ℃ or ℉).
Ratio measures: all of the above, plus a true zero point (e.g., weight, absolute temperature in kelvin).

Relationship Between Two Variables
Association: any relation between variables.
Positive association: above-average values of one variable tend to go with above-average values of the other; the scatter slopes up.
Negative association: above-average values of one variable tend to go with below-average values of the other; the scatter slopes down.
Linear association: roughly, the scatter diagram is clustered around a straight line. This is what correlation measures.


The “Football” Bivariate Normal Scatter Plot

Can you identify any difference?

How Tightly Clustered Are These Data?

Calculating the Correlation Coefficient

So, How to Calculate r

Formula of Correlation Coefficient
Let's simplify:
1. Convert the data into standard units.
2. Multiply the corresponding standard-unit values of x and y.
3. r is the mean of these products.
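The three steps above can be sketched in a few lines of Python. This is only an illustration: the function names and the height/weight values are made up for the example, and the SD used is the population SD (dividing by n), which is what makes the mean of the products equal r.

```python
from statistics import fmean, pstdev


def standard_units(values):
    """Step 1: convert a list to standard units, (value - mean) / SD."""
    mu = fmean(values)
    sd = pstdev(values, mu)  # population SD (divide by n), an assumption of this sketch
    return [(v - mu) / sd for v in values]


def correlation(xs, ys):
    """Steps 2 and 3: multiply paired standard units, then take the mean."""
    zx = standard_units(xs)
    zy = standard_units(ys)
    products = [a * b for a, b in zip(zx, zy)]
    return fmean(products)


# Hypothetical heights (cm) and weights (kg) of five children
heights = [120, 125, 130, 135, 140]
weights = [22, 25, 24, 30, 31]

r = correlation(heights, weights)
print(round(r, 3))
```

Because both lists are converted to standard units first, the result carries no units.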

Properties of Correlation Coefficient
The calculation uses only standard units, so r is a pure number with no units.
-1 ≤ r ≤ 1
In the extreme cases, r = -1 when the scatter diagram is a perfect straight line sloping down, and r = 1 when it is a perfect straight line sloping up.
Switching the variables x and y does not change r; it remains the same.

Adding a constant to one of the lists just slides the scatter diagram, so r stays the same.
Multiplying one of the lists by a positive constant does not change the standard units, so r stays the same.
Multiplying just one (not both) of the lists by a negative constant switches the signs of the standard units of that variable, so r keeps the same absolute value but its sign is switched.
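These properties are easy to check numerically. The sketch below uses a small made-up data set and a hand-written `correlation` helper (both are assumptions of the example, not part of the slides) and compares r before and after each transformation.

```python
from math import sqrt
from statistics import fmean


def correlation(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

r = correlation(xs, ys)
r_swapped = correlation(ys, xs)                    # switch x and y
r_shifted = correlation([x + 10 for x in xs], ys)  # add a constant
r_scaled = correlation([3 * x for x in xs], ys)    # multiply by a positive constant
r_flipped = correlation([-2 * x for x in xs], ys)  # multiply by a negative constant

print(r, r_swapped, r_shifted, r_scaled, r_flipped)
```

The first four values agree up to rounding, and the last has the same size with the opposite sign.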

Heteroscedastic Curve

What r cannot tell
Association is not causation: r does not tell “why”.
r is only meaningful for linearly related variables; it measures linear association.
This diagram shows a strong relation between x and y, but it is not linear, and r for this diagram comes out to be zero.
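A small numeric illustration of the last point, using made-up data rather than the data behind the slide's diagram: here y is completely determined by x (y = x squared), yet r is zero because the relation is not linear.

```python
from math import sqrt
from statistics import fmean


def correlation(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical data: a perfect but non-linear (U-shaped) relation
xs = [-3, -2, -1, 0, 1, 2, 3]
ys = [x * x for x in xs]

r_curve = correlation(xs, ys)
print(r_curve)
```

The positive and negative halves of the curve cancel each other in the sum of products, which is why r cannot detect this kind of relation.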

Beware of:
Outliers
The tendency toward ecological correlations

Deal with the outliers

Can you find the outlier?

Avoid “Ecological Correlation”:
Replacing individual students by group averages can artificially increase clustering, which overstates the strength of the correlation. This is not desirable.
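The effect can be seen in a toy example. The three groups and their values below are invented for illustration: within each group the students are widely scattered, but the group averages fall on a straight line, so the correlation of the averages is far higher than the correlation of the individuals.

```python
from math import sqrt
from statistics import fmean


def correlation(xs, ys):
    """Correlation coefficient computed from deviations about the means."""
    mx, my = fmean(xs), fmean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / sqrt(sxx * syy)


# Hypothetical (x, y) scores for students in three groups
groups = [
    [(1, 4), (2, 1), (3, 2)],
    [(3, 2), (4, 6), (5, 3)],
    [(5, 7), (6, 3), (7, 5)],
]

# Correlation across all individual students
students = [pair for group in groups for pair in group]
r_individuals = correlation([x for x, _ in students], [y for _, y in students])

# Correlation across group averages only (the "ecological" correlation)
avg_x = [fmean([x for x, _ in group]) for group in groups]
avg_y = [fmean([y for _, y in group]) for group in groups]
r_averages = correlation(avg_x, avg_y)

print(round(r_individuals, 2), round(r_averages, 2))
```

Averaging removes the scatter inside each group, so the second value says little about how strongly x and y are related for an individual student.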

Regression
The technique to estimate the dependent variable “y” for a given value of the variable “x”, when the two are linearly associated and the correlation coefficient “r” is known.

Each estimate is at the center of the vertical strip

The slope of the green line = r

The Equation of Regression
In standard units: estimate of y = r × (given x).
In original units: $\frac{\text{estimate of } y - \mu_y}{SD_y} = r \cdot \frac{x - \mu_x}{SD_x}$
Rearranged: estimate of y = slope × x + intercept, where slope = r × SD_y / SD_x and intercept = µ_y − slope × µ_x.
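As a rough sketch, the slope and intercept can be computed directly from the means, the SDs and r. The function name and the data are assumptions made for this example, and the SDs are population SDs (dividing by n).

```python
from statistics import fmean, pstdev


def regression_line(xs, ys):
    """Return (slope, intercept) using slope = r * SDy / SDx."""
    mx, my = fmean(xs), fmean(ys)
    sdx, sdy = pstdev(xs, mx), pstdev(ys, my)
    # r is the mean of the products of the paired standard units
    r = fmean([((x - mx) / sdx) * ((y - my) / sdy) for x, y in zip(xs, ys)])
    slope = r * sdy / sdx
    intercept = my - slope * mx  # the line passes through (mean of x, mean of y)
    return slope, intercept


# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

slope, intercept = regression_line(xs, ys)
estimate_at_4 = slope * 4 + intercept
print(round(slope, 3), round(intercept, 3), round(estimate_at_4, 3))
```

Note that the intercept uses the mean of x, not an individual x value, so the estimate for an average x is always the average y.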

Why call it “Regression”?
Sir Francis Galton (1822-1911): “The Galton Effect”
“Those who have high values in one variable tend to be not as high in the second variable.”
A eugenicist, who introduced the idea of regression.
“Fathers who are tall tend to have sons who are not quite that tall on average.”
All data regress towards “mediocrity”, i.e. regress towards the mean.
The Regression Fallacy, or Sophomore Slump
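The regression effect follows directly from the regression estimate in standard units: whenever r is below 1, the estimate for y is closer to the mean than the given x. The numbers below (r = 0.5, mean height 68 inches, SD 3 inches for both fathers and sons) are assumed purely for illustration and are not Galton's figures.

```python
def predicted_standard_units(r, x_standard_units):
    """Regression estimate in standard units: estimate of y = r * x."""
    return r * x_standard_units


# Assumed values for illustration only (not taken from Galton's data)
R_ASSUMED = 0.5
MEAN_HEIGHT = 68.0  # inches, same for fathers and sons in this sketch
SD_HEIGHT = 3.0     # inches, same for fathers and sons in this sketch

father_height = 74.0
father_z = (father_height - MEAN_HEIGHT) / SD_HEIGHT   # 2 SDs above average
son_z = predicted_standard_units(R_ASSUMED, father_z)  # only 1 SD above average
son_height = MEAN_HEIGHT + son_z * SD_HEIGHT

print(father_z, son_z, son_height)  # prints: 2.0 1.0 71.0
```

The predicted son is still taller than average, just not as far above average as his father.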


Univariate Normal vs. Bivariate Normal
[Figure labels: µ, x, r, +1 SD, +1 r.m.s. error, 68%, 68%]
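The r.m.s. error labelled on this slide is the typical size of a regression residual. The sketch below, on made-up data, computes it directly from the residuals and compares it with the standard shortcut sqrt(1 − r²) × SD_y. The shortcut is a well-known identity, not something stated on the slide.

```python
from math import sqrt
from statistics import fmean

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

mx, my = fmean(xs), fmean(ys)
sxx = sum((x - mx) ** 2 for x in xs)
syy = sum((y - my) ** 2 for y in ys)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

r = sxy / sqrt(sxx * syy)
slope = sxy / sxx            # algebraically equal to r * SDy / SDx
intercept = my - slope * mx

# r.m.s. error: root of the mean of the squared residuals
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]
rms_error = sqrt(fmean([e * e for e in residuals]))

# Shortcut using r and the population SD of y
sd_y = sqrt(syy / len(ys))
rms_shortcut = sqrt(1 - r * r) * sd_y

print(round(rms_error, 4), round(rms_shortcut, 4))
```

The r.m.s. error plays the same role around the regression line that the SD plays around the mean.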

Residual Plot
Regardless of the shape of the scatter diagram, the average of the residuals is always 0 and there is no linear association between the residuals and x.
The residual plot should not show any trend or linear relation.
Good regression: the residual plot should look like a formless blob around the horizontal axis.
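Both properties of the residuals can be checked on a small example. The data here are invented for illustration, and the regression line is fitted with slope = r × SD_y / SD_x, written in its equivalent sum-of-products form.

```python
from statistics import fmean

# Hypothetical data, chosen only for illustration
xs = [1, 2, 3, 4, 5]
ys = [2, 1, 4, 3, 6]

mx, my = fmean(xs), fmean(ys)
sxx = sum((x - mx) ** 2 for x in xs)
sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))

slope = sxy / sxx            # algebraically equal to r * SDy / SDx
intercept = my - slope * mx

# Residual = observed y minus the regression estimate of y
residuals = [y - (slope * x + intercept) for x, y in zip(xs, ys)]

mean_residual = fmean(residuals)
# Sum of products of x-deviations and residuals: zero means no linear association
residual_x_association = sum((x - mx) * e for x, e in zip(xs, residuals))

print(round(mean_residual, 9), round(residual_x_association, 9))
```

Since these two checks pass for any data set, they cannot tell a good fit from a bad one. That is why the shape of the residual plot is what needs to be inspected.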

Residual Plot as a Diagnostic Tool

Thank you
Questions?