What is regression analysis?
- Regression analysis is the technique of estimating the unknown value of a dependent variable from the known value of an independent variable.
- Example: the effect of a price increase on demand, or the effect of changes in the money supply on the inflation rate.
Regression lines
- A regression line is the line that best describes the linear relationship between two variables: y = a + bx.
[Figure: regression lines Y = a + bX and Y = a - bX plotted against the X and Y axes; a is the intercept]
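As a minimal sketch of how the line y = a + bx can be estimated, the snippet below applies the usual least-squares formulas for one regressor. The function name `fit_line` and the data are hypothetical and chosen only for illustration.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single regressor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Slope = sum of cross-deviations / sum of squared deviations of x
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    a = mean_y - b * mean_x  # the fitted line passes through the means
    return a, b

# Hypothetical data lying exactly on y = 2 + 3x
xs = [0, 1, 2, 3, 4]
ys = [2, 5, 8, 11, 14]
a, b = fit_line(xs, ys)
print(f"intercept a = {a:.3f}, slope b = {b:.3f}")
```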
Assumptions for regression
- Measurement: all independent variables are interval, ratio or dichotomous; the dependent variable is interval or ratio.
- Specification: the relationship between the dependent and independent variables is linear.
- Expected value of the error term: zero.
Assumptions for regression (contd.)
- Homoscedasticity: the variance of the error term is constant.
- Normality of errors: the errors are normally distributed for each set of values of the independent variables.
- Absence of multicollinearity.
Limitations of linear regression
Violation of the measurement assumptions:
- Dependent variable: linear regression is not appropriate if it is dichotomous, e.g. smoker and non-smoker, adopter and non-adopter, participating and non-participating.
- Independent variable: a dichotomous independent variable (IV), e.g. male and female, has to be dummy coded (0/1) before it can enter the model.
Shall we use the linear probability model (LPM)? Yes, but it has problems:
- Non-normality of the errors ui
- Heteroscedastic variances of the errors
- Non-fulfilment of 0 ≤ E(Yi | Xi) ≤ 1
- Questionable value of R² as a measure of goodness of fit
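The third problem can be seen in a small sketch: fitting a straight line to a 0/1 outcome gives fitted "probabilities" below 0 and above 1 at the extremes. The data and the helper `fit_line` are hypothetical and only illustrate the point.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single regressor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

# Hypothetical data: y = 1 (adopter) or 0 (non-adopter), x = some regressor
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 0, 0, 1, 1, 1, 1]

a, b = fit_line(xs, ys)
fitted = [a + b * x for x in xs]

# Fitted values of the LPM that cannot be read as probabilities
out_of_range = [p for p in fitted if p < 0 or p > 1]
for x, p in zip(xs, fitted):
    print(f"x = {x}: fitted E(Y|X) = {p:+.3f}")
print("values outside [0, 1]:", len(out_of_range))
```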
What is the way out?
- Logit
- Probit
- Tobit
Presentation on Logit, Probit and Tobit Models
Rabeesh Kumar Verma, Roll no: 10756
Division of Agricultural Extension, ICAR-IARI
What is logistic regression?
- It is used to analyze relationships between a dichotomous dependent variable and metric or dichotomous independent variables.
- It combines the independent variables to estimate the probability that a particular event will or will not occur.
- Logistic regression (LR) is a nonlinear regression model that constrains the predicted probabilities to lie between 0 and 1.
- In the terminology of economics it is a qualitative response (discrete choice) model.
Assumptions of logistic regression:
- A linear relationship between the dependent and independent variables is not required.
- Homoscedasticity is not required.
- The error terms need to be independent.
- It requires quite large sample sizes.
- Absence of perfect multicollinearity.
- Normality, linearity and homogeneity of variance are not required for the independent variables.
[Figure: cumulative distribution function (CDF); P lies between 0 and 1 as the horizontal axis runs from -∞ to +∞]
Features of the logit model:
- As P goes from 0 to 1, the logit L goes from -∞ to +∞. That is, although probabilities lie between 0 and 1, logits are not so bounded.
- L is linear in X, but the probabilities themselves are not. This is in contrast with the LPM, where the probabilities increase linearly with X.
- If L, the logit, is positive, it means that when the value of the regressor(s) increases, the odds that the regressand equals 1 increase, and vice versa.
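The first feature can be checked numerically. The sketch below, with hypothetical function names, maps a few probabilities to their logits and back; it shows that probabilities close to 0 or 1 give large negative or positive logits.

```python
import math

def logit(p):
    """Log of the odds, L = ln(P / (1 - P)), defined for 0 < P < 1."""
    return math.log(p / (1.0 - p))

def inverse_logit(l):
    """Map a logit back to a probability between 0 and 1."""
    return 1.0 / (1.0 + math.exp(-l))

for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    print(f"P = {p:<6} logit L = {logit(p):+.3f}")
```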
Level of measurement:
- Logistic regression analysis requires the dependent variable to be dichotomous.
- Logistic regression analysis requires the independent variables to be metric or dichotomous.
- If an independent variable is nominal and not dichotomous, the logistic regression procedure in SPSS has an option to dummy code the variable.
- If an independent variable is ordinal, the usual caution about treating it as metric applies.
Variables in logistic regression:
- A typical logistic regression analysis has one dichotomous dependent variable and, usually, a set of independent variables that may be dichotomous, quantitative or a combination of both.
Sample size: logit and probit models
- The minimum number of cases per independent variable is 10 (Hosmer and Lemeshow, Applied Logistic Regression).
- For example, with 8 independent variables the minimum sample size is 8 x 10 = 80.
Logistic regression equation
$$\ln\left(\frac{P_i}{1 - P_i}\right) = a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n$$
where
- $P_i$ = probability that the event happens, e.g. adoption of a technology
- $1 - P_i$ = probability that the event does not happen, e.g. non-adoption of a technology
- $X_1, X_2, \dots, X_n$ = independent variables
- $b_1, b_2, \dots, b_n$ = regression coefficients
- $a$ = constant (intercept)
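The equation can be solved for the probability itself. In the short derivation below, $z_i$ is shorthand introduced here (it is not on the slide) for the right-hand side of the equation.

```latex
\begin{aligned}
z_i &= a + b_1 X_1 + b_2 X_2 + \dots + b_n X_n \\
\ln\left(\frac{P_i}{1 - P_i}\right) = z_i
  &\;\Rightarrow\; \frac{P_i}{1 - P_i} = e^{z_i} \\
  &\;\Rightarrow\; P_i = \frac{e^{z_i}}{1 + e^{z_i}} = \frac{1}{1 + e^{-z_i}}
\end{aligned}
```

Whatever value $z_i$ takes, the resulting $P_i$ stays strictly between 0 and 1.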
Example:
Dependent variable: adoption / non-adoption
Independent variable | Description | Hypothesized relation
Age | Chronological age of the farmer in years | -
Education | Number of years of formal schooling | +
Land holding | Farm size measured in acres | +
Access to training | Yes = 1 / No = 0 | +
Distance to market | In kilometres | -
Access to credit | Yes = 1 / No = 0 | +
Extension services | Yes = 1 / No = 0 | +
Logit in SPSS
Logit in SPSS (contd.)
Logit model in Stata
Logit: predicted probabilities
Logit: odds ratios
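The three steps named on these slides (fit the logit, get predicted probabilities, get odds ratios) can be sketched in plain Python. This is not the SPSS or Stata procedure; it is a minimal Newton-Raphson maximum-likelihood fit for one regressor, and the data (years of schooling and adoption) are hypothetical.

```python
import math

def fit_logit(xs, ys, iterations=25):
    """Maximum-likelihood fit of ln(P/(1-P)) = a + b*x by Newton-Raphson."""
    a, b = 0.0, 0.0
    for _ in range(iterations):
        ps = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]
        # Gradient of the log-likelihood with respect to a and b
        g0 = sum(y - p for y, p in zip(ys, ps))
        g1 = sum((y - p) * x for x, y, p in zip(xs, ys, ps))
        # Information matrix (negative Hessian), weights w = p(1 - p)
        ws = [p * (1.0 - p) for p in ps]
        h00 = sum(ws)
        h01 = sum(w * x for w, x in zip(ws, xs))
        h11 = sum(w * x * x for w, x in zip(ws, xs))
        det = h00 * h11 - h01 * h01
        # Newton step: add (information matrix)^-1 times the gradient
        a += (h11 * g0 - h01 * g1) / det
        b += (h00 * g1 - h01 * g0) / det
    return a, b

# Hypothetical data: x = years of schooling, y = 1 if the farmer adopted
xs = [1, 2, 3, 4, 5, 6, 7, 8]
ys = [0, 0, 1, 0, 1, 0, 1, 1]

a, b = fit_logit(xs, ys)
probs = [1.0 / (1.0 + math.exp(-(a + b * x))) for x in xs]  # predicted probabilities
odds_ratio = math.exp(b)  # change in the odds for one more unit of x

print(f"a = {a:.3f}, b = {b:.3f}, odds ratio = {odds_ratio:.3f}")
for x, p in zip(xs, probs):
    print(f"x = {x}: predicted P(adoption) = {p:.3f}")
```

The toy data are deliberately not perfectly separable; with perfectly separated outcomes the maximum-likelihood estimates do not exist and the iteration would not settle.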
Case 1: A Logit Analysis of Bt Cotton Adoption and Assessment of Farmers' Training Need (Padaria et al., 2009)
Case 1 (contd.): reading the output table (Padaria et al., 2009)
- B: the regression coefficient.
- S.E.: the standard errors associated with the coefficients.
- Wald and Sig.: the Wald chi-square value and two-tailed p-value used in testing the null hypothesis that the coefficient (parameter) is 0, i.e. whether an independent variable is significant in the model.
- df: the degrees of freedom for the Wald chi-square test.
- Exp(B): the exponentiation of the B coefficient, which is an odds ratio. This value is given by default because odds ratios can be easier to interpret than the coefficients.
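How these columns relate to one another can be sketched as follows. The coefficient and standard error are hypothetical, not values from Padaria et al.; the function name `wald_summary` is also an assumption made for illustration. The Wald statistic is taken as (B / S.E.)² with one degree of freedom.

```python
import math

def wald_summary(b, se):
    """Wald chi-square (1 df), its p-value, and Exp(B) for one coefficient."""
    wald = (b / se) ** 2
    # Upper tail of a chi-square with 1 df equals the two-tailed normal p-value
    p_value = math.erfc(math.sqrt(wald / 2.0))
    odds_ratio = math.exp(b)
    return wald, p_value, odds_ratio

# Hypothetical coefficient and standard error
wald, p_value, odds_ratio = wald_summary(b=0.8, se=0.4)
print(f"Wald = {wald:.2f}, Sig. = {p_value:.4f}, Exp(B) = {odds_ratio:.3f}")
```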
Advantages of the logit model:
- It transforms a dichotomous dependent variable into a continuous variable (the logit).
- The results are easily interpretable and the method is simple to analyse.
- It gives parameter estimates that are asymptotically consistent, efficient and normal, so that the analogue of the regression t-test can be applied.
Limitations:
- As in the linear probability model, the disturbance term in the logit model is heteroscedastic when the model is estimated from grouped data, so weighted least squares should be used in that case.
- As in many other regressions, multicollinearity may be a problem if the explanatory variables are related among themselves.
Applications of the logit model:
1. It can be used to identify the factors that affect the adoption of a particular technology on the farm, e.g. new crop varieties, fertilizers or pesticides.
2. The model is used extensively to analyze growth phenomena such as population, GNP and money supply.
3. In marketing it can be used to study brand preference and brand loyalty.
4. Gender studies can use logit analysis to find the factors that affect the decision-making status of men and women in the family.
Probit regression model:
- The probit model is a type of regression in which the dependent variable can take only two values, for example adoption or non-adoption, married or not married.
- The purpose of the model is to estimate the probability that an observation falls into one of the two categories.
- The estimating model that emerges from the normal cumulative distribution function (CDF) is popularly known as the probit model.
- It is sometimes also called the normit model.
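A minimal sketch of the probit probability follows: the linear index is passed through the standard normal CDF. The coefficients `a` and `b` are hypothetical, and the function name is an assumption made for illustration.

```python
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1

def probit_probability(a, b, x):
    """P(Y = 1 | x) = Phi(a + b*x), with Phi the standard normal CDF."""
    return STANDARD_NORMAL.cdf(a + b * x)

# Hypothetical coefficients
a, b = -2.0, 0.5
for x in (0, 2, 4, 6, 8):
    print(f"x = {x}: P(Y = 1) = {probit_probability(a, b, x):.3f}")
```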
Probit: level of measurement requirements
- Dependent variable: dichotomous (categorical), e.g. adoption and non-adoption, participation and non-participation.
- Independent variables: metric or dichotomous, e.g. age (ratio-level data), gender (male/female, dichotomous).
Case 2: Factors Affecting Adoption of Improved Rice Varieties among Rural Farm Households in Central Nepal (Ghimire, 2015; published in Rice Science)
Probit result (contd.)
Difference between logit and probit models:
Logit | Probit
Slightly fatter tails | The conditional probability Pi approaches 0 or 1 at a faster rate
Based on the standard logistic distribution | Based on the standard normal distribution
Variance = π²/3 | Variance = 1
Simple mathematics | Sophisticated mathematics
Both usually give similar results. The choice of method depends on the researcher, but logit regression is more often preferred.
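The tail and variance rows of the comparison can be checked with a small sketch. To compare like with like, the logistic distribution is rescaled here to unit variance; that rescaling and the function names are assumptions made for this illustration.

```python
import math
from statistics import NormalDist

def logistic_cdf(z):
    """CDF of the standard logistic distribution (variance pi^2 / 3)."""
    return 1.0 / (1.0 + math.exp(-z))

# Standard deviation of the standard logistic distribution
logistic_sd = math.pi / math.sqrt(3.0)

def unit_variance_logistic_cdf(z):
    """Logistic CDF rescaled so that its variance is 1, like the normal."""
    return logistic_cdf(z * logistic_sd)

normal_cdf = NormalDist().cdf

for z in (-4, -3, -2):
    print(f"z = {z}: logistic tail = {unit_variance_logistic_cdf(z):.5f}, "
          f"normal tail = {normal_cdf(z):.5f}")
```

At these points the logistic tail probability is larger than the normal one, which is the "fatter tails" row of the comparison.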
Significance of the Wald test
- It tests the statistical significance of the unique contribution of each coefficient in the model.
- The test is similar to the t-test in multiple regression.
Ordinal logit and probit models
- Both are used when the outcome has more than two categories and the categories are ordinal in nature.
- Examples of dependent variables:
  Eg 1: Likert-type scale: strongly agree, somewhat agree, strongly disagree
  Eg 2: less than high school (0), high school (1), college (2), postgraduate (3)
- The independent variables remain the same as in the logit and probit models.
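The slides do not give the ordinal model's equation. The sketch below assumes the common cumulative-logit form, P(Y ≤ j) = 1 / (1 + exp(-(c_j - xb))), where the cutpoints c_j and the index value xb are hypothetical.

```python
import math

def ordered_logit_probs(xb, cutpoints):
    """Category probabilities for an ordered outcome.

    xb        : value of the linear index for one observation
    cutpoints : increasing thresholds separating the ordered categories
    """
    # Cumulative probabilities P(Y <= j); the last category closes at 1
    cumulative = [1.0 / (1.0 + math.exp(-(c - xb))) for c in cutpoints] + [1.0]
    probs = []
    previous = 0.0
    for c in cumulative:
        probs.append(c - previous)  # P(Y = j) = P(Y <= j) - P(Y <= j - 1)
        previous = c
    return probs

# Four education categories (0 to 3) need three cutpoints; values are hypothetical
probs = ordered_logit_probs(xb=0.3, cutpoints=[-1.0, 0.5, 2.0])
for category, p in enumerate(probs):
    print(f"P(Y = {category}) = {p:.3f}")
```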
Multinomial logit and multinomial probit
- Used when the dependent variable has more than two categories and the categories are not ordinal in nature.
- Eg 1: adoption of different adaptation strategies, or choice of transportation to work
- Eg 2: occupation classification: unskilled, semi-skilled, highly skilled
Multinomial logit model (Kassie et al., 2008)
- Dependent variable: compost, conservation tillage, both. There are three categories, i.e. more than two.
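As a rough illustration of how a multinomial logit turns linear indices into category probabilities, the sketch below uses the three categories named above. The index values and the choice of "compost" as base category are hypothetical; they are not taken from Kassie et al.

```python
import math

def multinomial_logit_probs(indices):
    """Category probabilities from linear indices (base category has index 0)."""
    largest = max(indices.values())
    # Subtracting the largest index avoids overflow and leaves the ratios unchanged
    exps = {name: math.exp(value - largest) for name, value in indices.items()}
    total = sum(exps.values())
    return {name: e / total for name, e in exps.items()}

# Hypothetical index values for one farm household
indices = {"compost": 0.0, "conservation tillage": 0.4, "both": -0.3}
probs = multinomial_logit_probs(indices)
for name, p in probs.items():
    print(f"P({name}) = {p:.3f}")
```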
Tobit model
- An extension of the probit model, developed by James Tobin (Nobel laureate economist).
- Used when information on the regressand is available only for some observations in the sample. Such a sample is called a censored sample.
- The Tobit model is therefore also known as the censored regression model.
- It is sometimes also called a limited dependent variable regression model.
Tobit model (contd.)
- Example: suppose we have a set of consumers and want to find out how much money a person or family spends on a house in relation to socioeconomic variables.
- Here we have a dilemma: if a consumer does not purchase a house, we have no data on housing expenditure for that consumer. We have such data only for the consumers who actually purchase a house.
Thus, consumers are divided into two groups of, say, n1 and n2 consumers:
- n1: consumers about whom we have information on the regressors (say income, number of people, mortgage interest rate) as well as the regressand (amount of expenditure on the house)
- n2: consumers about whom we have information only on the regressors but not on the regressand
Now a question arises:
- Can we estimate the regression using only the n1 observations and not worry about the remaining n2 observations?
- The answer is no. The OLS estimates of the parameters obtained from the subset of n1 observations will be biased as well as inconsistent.
Statistically, the Tobit model can be expressed as
$$Y_i = \begin{cases} \beta_1 + \beta_2 X_i + u_i & \text{if RHS} > 0 \\ 0 & \text{otherwise} \end{cases}$$
where RHS = right-hand side.
Note: additional X variables can easily be added to the model.
Truncated sample:
- To be distinguished from a censored sample. In a truncated sample, information on the regressors (IV) is available only if the regressand (DV) is observed.
If we estimate a regression line based on the n1 observations only, the resulting intercept and slope coefficients are bound to differ from those obtained if all (n1 + n2) observations were taken into account.
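A small simulation can illustrate the point. Everything below is hypothetical: the parameter values, the sample size and the helper `fit_line`. The latent outcome follows a straight line plus noise, and the n1 group is taken to be the cases with a positive outcome.

```python
import random

def fit_line(xs, ys):
    """Ordinary least squares for y = a + b*x with a single regressor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx
    return mean_y - b * mean_x, b

rng = random.Random(0)  # fixed seed so the run is repeatable
beta1, beta2, sigma = -5.0, 1.0, 2.0  # hypothetical true parameters

xs = [rng.uniform(0.0, 10.0) for _ in range(5000)]
latent = [beta1 + beta2 * x + rng.gauss(0.0, sigma) for x in xs]

# n1 group: outcome is positive and therefore observed
n1 = [(x, y) for x, y in zip(xs, latent) if y > 0]

a_all, b_all = fit_line(xs, latent)  # uses all n1 + n2 cases
a_n1, b_n1 = fit_line([x for x, _ in n1], [y for _, y in n1])  # n1 only

print(f"true slope               = {beta2:.3f}")
print(f"slope from all cases     = {b_all:.3f}")
print(f"slope from n1 cases only = {b_n1:.3f}")
```

In this setup the slope estimated from the n1 cases alone comes out well below the true slope, which is the bias described above.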
Mechanics of estimating the Tobit model:
- Tobit models are estimated by the method of maximum likelihood (ML).
- James Heckman has proposed an alternative to ML that is comparatively easy.
- The Heckman procedure yields consistent estimates of the parameters, but they are not as efficient as the ML estimates.
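The function that maximum likelihood maximizes can be sketched as follows, assuming normally distributed errors with standard deviation sigma and censoring at zero. The function name and argument layout are hypothetical, and no optimizer is shown; this is only the log-likelihood for the one-regressor model.

```python
import math
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()

def tobit_loglik(beta1, beta2, sigma, xs, ys):
    """Log-likelihood of the Tobit model censored at zero."""
    total = 0.0
    for x, y in zip(xs, ys):
        mu = beta1 + beta2 * x
        if y > 0:
            # Observed case: normal density of y around mu
            z = (y - mu) / sigma
            total += math.log(STANDARD_NORMAL.pdf(z) / sigma)
        else:
            # Censored case: probability that the latent value is <= 0
            total += math.log(STANDARD_NORMAL.cdf(-mu / sigma))
    return total

# Hypothetical observations: two observed outcomes and one censored at zero
xs = [1.0, 2.0, 0.5]
ys = [1.2, 1.9, 0.0]
print(f"log-likelihood = {tobit_loglik(0.0, 1.0, 1.0, xs, ys):.4f}")
```

An ML routine would search over beta1, beta2 and sigma for the values that make this quantity as large as possible.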
Nested regression analysis
- A nested model is one in which variables are added incrementally, so that every subsequent model is a superset of the preceding one.
- For example, if y = a + bx is the first model, the second model would be something like y = a + bx + cz + ....
- The advantage of this set-up is that it allows different specifications to be compared and, ultimately, the relative importance of specific variables to be investigated.
- Note that a model is nested if and only if the next model contains exactly the same terms as the preceding one and has at least one additional term.
- A two-stage model, on the other hand, is one in which two equations are estimated one after the other, with the second-stage equation including a predicted value (usually the predicted outcome or the residuals) from the first-stage equation.
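The nesting criterion can be written as a simple check on the sets of terms in two models. The function name `is_nested` and the term labels are hypothetical; the check covers only the "same terms plus at least one more" rule stated above.

```python
def is_nested(smaller_terms, larger_terms):
    """True if the larger model has every term of the smaller one plus more."""
    smaller, larger = set(smaller_terms), set(larger_terms)
    return smaller < larger  # proper subset: all terms kept, at least one added

print(is_nested(["x"], ["x", "z"]))       # second model adds z to the first
print(is_nested(["x", "w"], ["x", "z"]))  # w was dropped, so not nested
print(is_nested(["x"], ["x"]))            # nothing added, so not nested
```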
Conclusion
- Be clear about the assumptions of the different regression models.
- Researchers should be well aware of the different models and use them according to the defined research problem.
- Logit and probit models are used extensively in the health, behavioral and social sciences.
- These models are used extensively in social research when the dependent variable is dichotomous.
References:
- Meyers, L.S., Gamst, G., & Guarino, A.J. (2006). Applied Multivariate Research: Design and Interpretation.
- Padaria et al. (2009). A Logit Analysis of Bt Cotton Adoption and Assessment of Farmers' Training Need. Indian Res. J. Ext. Edu. 9(2).
- Gujarati, D.N., et al. (2012). Basic Econometrics. McGraw Hill Education, India.