STATISTICAL REGRESSION MODELS

3,779 views 35 slides Sep 28, 2023

About This Presentation

MDS (PUBLIC HEALTH DENTISTRY)


Slide Content

STATISTICAL REGRESSION MODELS (Seminar no. 6)


Regression analysis is a tool for assessing specific forms of relationship between variables. The ultimate objective of this method of analysis is to predict or estimate the value of one variable corresponding to a given value of another variable. The ideas of regression were first elucidated by the English scientist Sir Francis Galton (1822–1911) in reports of his research on heredity, first in sweet peas and later in human stature.

Correlation gives the degree and direction of the relationship between two variables, whereas regression analysis enables us to predict the values of one variable (dependent) from the other variable(s) (independent). The regression coefficient is a measure of the change in the dependent variable for one unit of change in the independent variable.

The purpose of regression analysis is to find a reasonable regression equation to predict the average value of the dependent variable associated with a fixed value of one independent variable. If more than one independent variable is used to predict the average value of a dependent variable, then we need multiple regression.

Scatter plots aid us in determining the nature of the relationship and the correlation by comparing two sets of data. They give a visual picture of any connection between the two variables.

A simple regression problem concerns a statistical relationship between a dependent and an independent variable. Dependent variable (also response or outcome variable, y): the main factor that you are trying to understand or predict, e.g. weight. Independent variable (also predictor, regressor, or explanatory variable, x): a factor you suspect has an impact on your dependent variable, e.g. height.

The general equation for a straight line may be written as Y = A + B(X), where Y is a value on the vertical axis (dependent variable); X is a value on the horizontal axis (independent variable); A is the Y-intercept, the point where the line crosses the vertical axis, i.e. the value of Y at X = 0; and B is the slope of the straight line, the amount by which Y changes for each unit change in X. The equation predicts the value of the dependent variable for a given value of the independent variable.
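The straight-line equation can be evaluated directly, as in the minimal sketch below; the values chosen for A and B are made up purely for illustration.

```python
# Evaluating the straight-line equation Y = A + B(X);
# the intercept and slope values below are invented for illustration.

def predict(a, b, x):
    # a is the Y-intercept, b the slope: Y changes by b for each unit of X
    return a + b * x

a, b = 4.0, 0.5            # hypothetical intercept and slope
print(predict(a, b, 0))    # at X = 0 the prediction equals the intercept: 4.0
print(predict(a, b, 10))   # 4.0 + 0.5 * 10 = 9.0
```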

In symbols, the simple linear regression model is written Yi = β0 + β1Xi + εi, where Yi is the value of the dependent variable at the i-th level of the independent variable, β0 and β1 are unknown regression coefficients whose values are to be estimated, Xi is a known constant (the value of the independent variable at the i-th level), and εi is a random error term.

A regression line, also called a line of best fit, is the line for which the sum of the squares of the residuals is a minimum. The vertical distance between a data point and the fitted regression line is called a residual; regression minimises the residuals. Using the least squares method (a procedure that minimises the vertical deviations of the plotted points about a straight line), a best-fitting straight line is constructed through the scatter-diagram points and the regression equation is then formulated.
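The least squares construction just described can be sketched in a few lines of code; the height/weight pairs below are invented for illustration.

```python
# Least squares fit of a straight line y = a + b*x (a minimal sketch;
# the height/weight data below are invented for illustration).

def least_squares(xs, ys):
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # slope b = sum((x - mean_x)(y - mean_y)) / sum((x - mean_x)^2)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    sxx = sum((x - mean_x) ** 2 for x in xs)
    b = sxy / sxx
    a = mean_y - b * mean_x    # intercept: the line passes through the means
    return a, b

heights = [150, 160, 170, 180]   # x: independent variable (cm)
weights = [52, 60, 67, 75]       # y: dependent variable (kg)
a, b = least_squares(heights, weights)
residuals = [y - (a + b * x) for x, y in zip(heights, weights)]
print(a, b)              # intercept ≈ -61.9, slope ≈ 0.76
print(sum(residuals))    # least squares residuals sum to (nearly) zero
```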

The equation of the line is μy = β0 + β1x, with intercept β0 and slope β1.

Types of Regression Analysis: 1. Bivariate regression – the simplest form, involving the prediction of the value of an unknown variable from the value of a known variable, e.g. predicting the amount of mandibular growth remaining from the Cervical Vertebra Maturation stage.

Types of Regression Models:

Linear Regression: the dependent variable is quantitative (preferably continuous). Linear regression is a parametric test and is based on a linear relationship between variables. Linear means "straight line": regression tells us how to draw the straight line described by the correlation.


1. Simple Linear Regression Model – the most widely known model, in which the dependent variable is continuous and the independent variable can be continuous or discrete. The relationship between the variables is established using a best-fit straight line, known as the regression line. In a simple linear regression model we assume that the graph of the mean of the response variable Yi, for given values of the independent variable Xi, is a straight line.

2. Multiple Linear Regression Model – the difference between simple and multiple linear regression is that the latter has more than one independent variable, whereas the former has only one.
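Under the hood, multiple linear regression coefficients can be estimated from the normal equations (XᵀX)β = Xᵀy. The sketch below solves them with plain Gaussian elimination; the data are invented and follow y = 10 + 0.5·x1 + 2·x2 exactly, so the fitted coefficients should recover those values.

```python
# Multiple linear regression via the normal equations (X^T X) beta = X^T y,
# solved with plain Gaussian elimination; data invented for illustration.

def fit_multiple(rows, ys):
    # rows: list of predictor tuples; a column of 1s is prepended for the intercept
    X = [[1.0, *row] for row in rows]
    n, p = len(X), len(X[0])
    # build X^T X and X^T y
    xtx = [[sum(X[i][a] * X[i][b] for i in range(n)) for b in range(p)]
           for a in range(p)]
    xty = [sum(X[i][a] * ys[i] for i in range(n)) for a in range(p)]
    # forward elimination with partial pivoting
    for col in range(p):
        piv = max(range(col, p), key=lambda r: abs(xtx[r][col]))
        xtx[col], xtx[piv] = xtx[piv], xtx[col]
        xty[col], xty[piv] = xty[piv], xty[col]
        for r in range(col + 1, p):
            f = xtx[r][col] / xtx[col][col]
            for c in range(col, p):
                xtx[r][c] -= f * xtx[col][c]
            xty[r] -= f * xty[col]
    # back substitution
    beta = [0.0] * p
    for r in range(p - 1, -1, -1):
        beta[r] = (xty[r] - sum(xtx[r][c] * beta[c]
                                for c in range(r + 1, p))) / xtx[r][r]
    return beta

# invented data generated exactly by y = 10 + 0.5*x1 + 2*x2
beta = fit_multiple([(100, 1), (120, 2), (140, 1), (160, 3)], [62, 74, 82, 96])
print(beta)   # ≈ [10.0, 0.5, 2.0]: intercept and the two slopes
```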

Assumptions for Linear Regression: four assumptions are associated with a linear regression model. Linearity: the relationship between X and the mean of Y is linear. Homoscedasticity: the variance of the residuals is the same for any value of X. Independence: observations are independent of each other. Normality: for any fixed value of X, Y is normally distributed.

Logistic regression is used when the dependent variable is binary/nominal or categorical with just two values, such as yes/no, true/false, male/female, or healthy/diseased. The logit link function maps the predicted outcome onto a probability ranging from 0 to 1. There are two types of logistic regression: simple logistic regression, with only one independent variable, and multiple logistic regression, with more than one independent variable. Simple logistic regression finds the equation that best predicts the value of the Y variable for each value of the X variable.
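A minimal sketch of simple logistic regression follows; the centred "age" values and binary outcomes are invented, and gradient descent on the negative log-likelihood stands in here for the maximum-likelihood routines that statistical packages normally use.

```python
import math

# Simple logistic regression fitted by gradient descent on the negative
# log-likelihood (an illustrative sketch; data invented).

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(xs, ys, lr=0.1, steps=5000):
    b0, b1 = 0.0, 0.0
    n = len(xs)
    for _ in range(steps):
        # gradient of the negative log-likelihood: sums of (p_i - y_i) terms
        g0 = sum(sigmoid(b0 + b1 * x) - y for x, y in zip(xs, ys)) / n
        g1 = sum((sigmoid(b0 + b1 * x) - y) * x for x, y in zip(xs, ys)) / n
        b0 -= lr * g0
        b1 -= lr * g1
    return b0, b1

ages = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]   # predictor, centred for stability
sick = [0, 0, 0, 1, 1, 1]                  # binary outcome (0 = healthy, 1 = diseased)
b0, b1 = fit_logistic(ages, sick)
p_high = sigmoid(b0 + b1 * 2.0)    # predicted probability of disease, near 1
p_low = sigmoid(b0 + b1 * -2.0)    # predicted probability of disease, near 0
print(p_high, p_low)
```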

A two-way plot of the residuals versus the values of the independent variable (or the fitted values of the response variable), known as a residual plot, is a useful tool for examining the aptness of a regression model. A residual can be viewed as an observed error.

Residual plots are used to check the following: the regression function is linear; the error terms have constant variance; the error terms are statistically independent; the error terms are normally distributed; and the model fits the data points except for a few outlier observations.



Correlation is a statistical method used to assess a possible linear association between two continuous variables. Statisticians use a correlation coefficient to quantify the strength and direction of the relationship between two variables.

Pearson's correlation coefficient: denoted ρ (rho) for a population parameter and r for a sample statistic. It is used when both variables being studied are normally distributed. For a correlation between variables x and y, the sample Pearson's correlation coefficient is given by r = Σ(x − x̄)(y − ȳ) / √[Σ(x − x̄)² Σ(y − ȳ)²].
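The sample coefficient can be computed directly from that definition, as in the sketch below; the paired data are invented for illustration.

```python
# Sample Pearson correlation coefficient r computed from its definition
# (a minimal sketch; the paired data below are invented).

def pearson_r(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))   # sum (x-x̄)(y-ȳ)
    sxx = sum((x - mx) ** 2 for x in xs)                     # sum (x-x̄)^2
    syy = sum((y - my) ** 2 for y in ys)                     # sum (y-ȳ)^2
    return sxy / (sxx * syy) ** 0.5

print(pearson_r([1, 2, 3, 4], [2, 4, 6, 8]))   # perfectly linear: 1.0
print(pearson_r([1, 2, 3, 4], [8, 6, 4, 2]))   # perfectly inverse: -1.0
```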

Spearman's rank correlation coefficient: denoted ρs for a population parameter and rs for a sample statistic. It is appropriate when one or both variables are skewed or ordinal, and is robust when extreme values are present. For a correlation between variables x and y with no tied ranks, the sample Spearman's correlation coefficient is given by rs = 1 − 6Σdi² / [n(n² − 1)], where di is the difference between the ranks of the i-th pair.
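The rank-based formula can be sketched as follows (valid when there are no tied ranks); the paired data are invented, and the first pair is monotone but nonlinear, which Pearson's r would not score as a perfect 1.

```python
# Spearman's rank correlation via the d^2 formula (assumes no tied ranks);
# the paired data below are invented for illustration.

def spearman_rs(xs, ys):
    def ranks(values):
        order = sorted(range(len(values)), key=lambda i: values[i])
        r = [0] * len(values)
        for rank, idx in enumerate(order, start=1):
            r[idx] = rank            # rank 1 = smallest value
        return r
    rx, ry = ranks(xs), ranks(ys)
    n = len(xs)
    d2 = sum((a - b) ** 2 for a, b in zip(rx, ry))   # sum of d_i^2
    return 1 - 6 * d2 / (n * (n * n - 1))

print(spearman_rs([10, 20, 30, 40], [1, 4, 9, 16]))    # monotone, nonlinear: 1.0
print(spearman_rs([10, 20, 30, 40], [16, 9, 4, 1]))    # perfectly inverse ranks: -1.0
```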

Types of correlation: Positive – the two variables change in the same direction and in the same proportion. Negative – the two variables change in opposite directions and in the same proportion. Curved – represents a nonlinear association between the two variables; even if the relationship is strong, the correlation coefficient can be small or zero. Partial – measures the association between two variables while controlling or adjusting for the effect of one or more other variables.


Degree of correlation: None – there is no relationship between the variables. Low – there is some relationship between the variables, but a weak one. High – there is a very close relationship between the two variables. Perfect – an ideal relationship: as scores on one of the two variables increase or decrease, scores on the other variable increase or decrease by the same magnitude.

A parametric correlation coefficient (r) is a measure of the linear relationship between two continuous variables. The range of r is –1 to 1. When r is equal to zero, there is no linear relationship between the two variables. Regression coefficients estimate how much of the change in the outcome is associated with changes in the explanatory variables. Correlation measures the strength of the relationship, while regression measures the magnitude of the relationship between the two variables.

When a ŷ-value is predicted from an x-value, the prediction is a point estimate; an interval can also be constructed. The standard error of estimate se is the standard deviation of the observed yi-values about the predicted ŷ-values for given xi-values. It is given by se = √[Σ(yi − ŷi)² / (n − 2)], where n is the number of ordered pairs in the data set. The closer the observed y-values are to the predicted ŷ-values, the smaller the standard error of estimate will be.
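The standard error of estimate can be computed after a least squares fit, as sketched below; the height/weight data are invented for illustration.

```python
# Standard error of estimate s_e = sqrt(sum (y_i - ŷ_i)^2 / (n - 2)) after a
# least squares fit (a minimal sketch; the data below are invented).

def standard_error_of_estimate(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    # least squares slope and intercept for ŷ = a + b*x
    b = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / \
        sum((x - mx) ** 2 for x in xs)
    a = my - b * mx
    sse = sum((y - (a + b * x)) ** 2 for x, y in zip(xs, ys))  # sum (y - ŷ)^2
    return (sse / (n - 2)) ** 0.5

se = standard_error_of_estimate([150, 160, 170, 180], [52, 60, 67, 75])
print(se)   # small s_e: the observed y-values lie close to the fitted line
```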

Uses of regression analysis: it predicts the future course of events in growth, development and treatment; supports treatment-planning decisions; helps in correcting errors; and provides new insights.

Limitations: regression looks only at the linear, straight-line relationship between the dependent and independent variables; it focuses on the mean of the dependent variable; it is sensitive to extreme data; and it is not useful when the actual, exact mathematical relationship between the variables is already known.

Regression analysis is a statistical technique that deals with the analysis of relationships between variables. It helps us to predict the duration, course and outcome of treatment parameters. It is a complex process of predicting or estimating the magnitude of some unknown characteristic which might be involved in the growth and treatment of a given patient. Correlation coefficients are used to assess the strength and direction of the linear relationships between pairs of variables.
