REGRESSION ANALYSIS THEORY EXPLAINED HERE

154 views 12 slides Apr 01, 2024

About This Presentation

REGRESSION ANALYSIS


Slide Content

Unit 2 Regression

Regression As Prediction Model: In earlier discussions we have studied correlation, which gives the extent of the linear relationship between two variables. In regression analysis, one variable is the cause, known as the independent variable or regressor, and the other is the effect, known as the dependent or response variable. If two variables are correlated, we can use this correlation to predict one variable given the other. Applications of regression are numerous; it is applied in almost every field, including Engineering, the Physical and Chemical Sciences, the Life and Biological Sciences, the Social Sciences and Computer Science. Within computer science, regression analysis is widely used for prediction and forecasting in areas such as data mining, artificial intelligence, machine learning, deep learning, data science and big data analytics.

FUNDAMENTAL CONCEPTS OF REGRESSION

Model: A model is a mathematical representation of a phenomenon. For example, when we conduct an experiment there is a process with two variables: an output variable, and the variable causing it, called the input variable. The relationship between the input (independent) variable and the output (dependent) variable is called a model.

Regression: The technique of prediction on the basis of correlation is called regression. Regression represents the relationship between input and output variables, where both X (input) and Y (output) are measurable. The model has two components: variables and parameters. Since correlation measures the linear relation between two variables, we find a linear equation in these variables. In other words, we state the relation as the equation of a straight line, Y = mX + c, where X and Y are variables and m and c are parameters, with slope m = tan θ. Therefore, knowledge of the slope m and the intercept c completely describes the line Y = mX + c. The statistical tool used to find these constants is known as regression analysis.

Montgomery (1982) gave four purposes for carrying out regression analysis:
1) Description: The analyst finds the regression model that describes or summarizes the relationship between two variables.
2) Estimation of coefficients: The analyst may have a theoretical relationship in mind, and the regression analysis will confirm this theory.
3) Prediction: The prime concern of regression analysis is to predict the response variable. These predictions may be crucial in planning, monitoring, or evaluating some process or system.
4) Control: Regression models may be used for monitoring and controlling a system. When a regression model is used for control purposes, the independent variable must be related to the dependent variable in a causal way. Furthermore, this functional relationship must continue over time.

Definition: Regression analysis is a statistical technique for investigating and modelling the relationship between variables. Regression is used for prediction and forecasting. Thus, the primary objective of regression is to develop an equation that helps the investigator predict the response variable as accurately as possible for given values of the regressor.

LINEAR AND NON-LINEAR REGRESSION ANALYSIS

In our day-to-day life we come across variables where one variable may depend on one or more other variables. Linear regression models describe straight-line relationships between the variables X and Y, whereas non-linear regression models describe curvilinear relationships between X and Y. Whether the relationship between two variables is linear or non-linear can be easily assessed using a scatter plot.

The goodness of fit of a model can be studied with the help of the coefficient of determination. In some situations two variables exhibit non-linear correlation, and if we plot the points on a scatter plot they show a curvilinear pattern. Consider the following example:

X: -1  -2   1   2
Y:  1   4   1   4

[Fig.: Curvilinear relationship between X and Y]

Here, if we calculate the correlation coefficient we get r = 0, meaning that X and Y are not linearly correlated, yet it can clearly be seen that Y = X². Hence there exists a non-linear relationship between X and Y. In this unit you will learn some non-linear regression models, such as the second degree parabola and exponential models, how to fit them using the least squares method, and how to use the fitted regression equations to predict the value of the dependent variable for a given value of the independent variable.
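The example above can be checked directly. This is a minimal sketch that computes the Pearson correlation coefficient for the four data points and confirms that r = 0 even though Y = X² holds exactly:

```python
import math

# The X, Y data from the example above
x = [-1, -2, 1, 2]
y = [1, 4, 1, 4]
n = len(x)

x_bar = sum(x) / n
y_bar = sum(y) / n

# Pearson correlation coefficient r = Sxy / (Sx * Sy)
cov = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sx = math.sqrt(sum((xi - x_bar) ** 2 for xi in x))
sy = math.sqrt(sum((yi - y_bar) ** 2 for yi in y))
r = cov / (sx * sy)

print(r)  # 0.0 -> no linear relationship
print(all(yi == xi ** 2 for xi, yi in zip(x, y)))  # True -> Y = X^2 exactly
```

This illustrates the point in the text: r measures only linear association, so r = 0 does not rule out a strong curvilinear relationship.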

APPROPRIATE SITUATIONS FOR REGRESSION ANALYSIS

Following are some real-life situations where linear regression analysis is appropriate:
1. The weight of a person can be estimated using the regression equation of weight on height. It may be used to construct the growth chart of a normal child.
2. A medical practitioner may be interested in the relationship between the blood pressure levels of a group of overweight persons and their weights. The regression between weight and blood pressure may be used to detect abnormalities.
3. In agriculture, one can use a regression equation to estimate the variation in yield due to factors such as rainfall, fertilizers, pH, irrigation facilities, etc. Similarly, one can find the relation between germination time and soil temperature.
4. In business and trade, sales of products depend on the amount of rupees spent on advertisement. Similarly, the price of a commodity may depend on supply.
5. An economist may be interested in the relationship between productivity index and age, which can be used to estimate the productivity index based on age.

Sales, profit, production, or population over time, and bean root cells, water content versus distance from growing tip (data source: Ratkowsky (1983)) are situations where non-linear regression analysis is appropriate.

FITTING OF REGRESSION LINES

Using a scatter diagram we get an idea of the correlation, and one can draw a line passing through the points. However, if the correlation is not perfect (i.e. r ≠ ±1), then several lines can be drawn through the points, and choosing the best line among them is a problem. We choose the line which minimizes the total sum of squares of the differences between the observed values and the values given by the straight line. This is called the least squares principle, and the equation so obtained is called the least squares regression line. As there are two correlated variables (X, Y), in some cases Y may be the dependent variable while in others X may be the dependent variable, so we study the procedure for fitting both regression lines: Y on X and X on Y.

1. Regression line of Y on X: Suppose (x1, y1), (x2, y2), ..., (xn, yn) are n pairs of observations on a bivariate variable (X, Y). The general equation of the line of Y is Y = a + bX ……(1). In this case we take Y as the dependent variable (effect, or response variable) and X as the independent variable (cause, or explanatory variable). Therefore this line can be used to predict values of Y for known values of X. If the correlation between X and Y is r = ±1, equation (1) itself is the regression line of Y on X; but when r ≠ ±1, we need to find the equation of the best line, and for this the method of least squares is used. Let yi be the observed value of Y and ŷi its estimate. The error in the estimate of Y is defined as yi − ŷi, where ŷi is obtained by putting the known value X = xi in the equation of the line: ŷi = a + bxi.

We need to find the constants a and b such that the error in the estimation of Y is minimized. A line is fitted through the (X, Y) points such that the sum of the squared vertical distances (ei, i = 1, 2, ..., n) between the observed values and the estimated values on the line is minimized, as shown in the figure. The least squares principle fits the regression line that minimizes the sum of squares of the vertical distances between the actual Y values and the estimated values of Y.

[Fig.: Fitted regression line, estimated value and its error from the observed value]

According to the principle of least squares, we determine a and b so that the sum of squared errors

S = Σ ei² = Σ (yi − ŷi)² = Σ (yi − a − b xi)²

is minimum. To find the minimum of S, set the partial derivative with respect to a equal to zero:

∂S/∂a = −2 Σ (yi − a − b xi) = 0  ⇒  Σ yi = n a + b Σ xi,  i.e.  a = ȳ − b x̄ ……(2)

Similarly, setting the partial derivative with respect to b equal to zero:

∂S/∂b = −2 Σ xi (yi − a − b xi) = 0  ⇒  Σ xi yi = a Σ xi + b Σ xi²,

which gives

b = Σ (xi − x̄)(yi − ȳ) / Σ (xi − x̄)² ……(3)

b is called the regression coefficient of Y on X. Since b is the slope of the line of regression of Y on X, it is denoted by byx. Substituting (2) and (3) in equation (1), and noting that the line of regression passes through the point (x̄, ȳ), its equation is

ŷ − ȳ = byx (x − x̄) ……(4)

Equation (4) represents the least squares regression equation of Y on X.
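Equations (2) and (3) translate directly into code. The sketch below fits the regression line of Y on X by least squares; the data values are made up purely for illustration:

```python
def fit_y_on_x(x, y):
    """Return (a, b) for the least squares line Y = a + bX."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    # Regression coefficient of Y on X, eq. (3)
    b_yx = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y)) \
         / sum((xi - x_bar) ** 2 for xi in x)
    # Intercept from eq. (2): a = y_bar - b * x_bar
    a = y_bar - b_yx * x_bar
    return a, b_yx

# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
a, b = fit_y_on_x(x, y)
print(a, b)        # approximately 2.2 and 0.6
print(a + b * 6)   # predicted Y at X = 6, approximately 5.8
```

Note that the fitted line passes through the mean point (x̄, ȳ) = (3, 4), exactly as equation (4) requires.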

2. Regression line of X on Y: Suppose (x1, y1), (x2, y2), ..., (xn, yn) are n pairs of observations on a bivariate variable (X, Y). The general equation of the line of X is X = a + bY ……(5). In this case X is the dependent variable (effect) and Y is the independent variable (cause). Let xi be the observed value of X and x̂i its estimate. The error in the estimate of X is defined as xi − x̂i, where x̂i is obtained by putting the known value Y = yi in the equation of the line: x̂i = a + b yi. We need to find the constants a and b such that the error in the estimation of X is minimized. A line is fitted through the (X, Y) points such that the sum of the squared horizontal distances (ei, i = 1, 2, ..., n) between the observed values and the estimated values on the line is minimized. According to the principle of least squares, we determine a and b so that the sum of squared errors

S = Σ ei² = Σ (xi − x̂i)² = Σ (xi − a − b yi)²

is minimum.

To find the minimum of S, set the partial derivative with respect to a equal to zero:

∂S/∂a = −2 Σ (xi − a − b yi) = 0  ⇒  Σ xi = n a + b Σ yi,  i.e.  a = x̄ − b ȳ ……(6)

Similarly, setting the partial derivative with respect to b equal to zero gives

b = Σ (xi − x̄)(yi − ȳ) / Σ (yi − ȳ)² ……(7)

b is called the regression coefficient of X on Y. Since b is the slope of the line of regression of X on Y, it is denoted by bxy. Substituting (6) and (7) in equation (5), and noting that the line of regression passes through the point (x̄, ȳ), its equation is

x̂ − x̄ = bxy (y − ȳ) ……(8)

Equation (8) represents the least squares regression equation of X on Y.
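A useful check that ties the two regression lines back to correlation: the product of the two regression coefficients equals the square of the correlation coefficient, byx · bxy = r². The sketch below verifies this on made-up data:

```python
import math

# Illustrative (made-up) data
x = [1, 2, 3, 4, 5]
y = [2, 4, 5, 4, 5]
n = len(x)
x_bar, y_bar = sum(x) / n, sum(y) / n

# Sums of squares and cross-products about the means
sxy = sum((xi - x_bar) * (yi - y_bar) for xi, yi in zip(x, y))
sxx = sum((xi - x_bar) ** 2 for xi in x)
syy = sum((yi - y_bar) ** 2 for yi in y)

b_yx = sxy / sxx                  # slope of Y on X, eq. (3)
b_xy = sxy / syy                  # slope of X on Y, eq. (7)
r = sxy / math.sqrt(sxx * syy)    # correlation coefficient

print(b_yx * b_xy)  # approximately 0.6
print(r ** 2)       # approximately 0.6 -> b_yx * b_xy = r^2
```

This identity also shows why the two regression lines coincide only when r = ±1: then byx · bxy = 1, and each line is the inverse of the other.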