What is a Dummy Variable?
Variables that are essentially qualitative in nature, or that are not readily quantifiable.
Examples: gender, marital status, race, colour, religion, nationality, geographical location, political/policy changes, party affiliation.
Other Names for Dummy Variables
Indicator variables, binary variables, categorical variables, dichotomous variables, qualitative variables.
Why Dummy Variable Regression?
To include qualitative variables as explanatory variables in the regression model.
Example: to see whether gender has any influence on earnings, apart from other factors (that is, whether there is gender discrimination in earnings).
How to quantify a qualitative aspect?
By constructing artificial variables that take on values of 1 or 0 (zero).
1 indicates presence of the attribute; 0 indicates absence of the attribute.
Example: Gender = 1 if the respondent is female, = 0 if the respondent is male.
Time = 1 if wartime, = 0 if peacetime.
Variables that take the values 1 and 0 in this way are called dummy variables.
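The coding rule above can be sketched in a few lines of Python. The helper name `make_dummy` and the two small samples are hypothetical and only illustrate the 1/0 construction.

```python
def make_dummy(values, category):
    """Return 1 where an observation equals `category`, else 0."""
    return [1 if v == category else 0 for v in values]

# Hypothetical samples, for illustration only.
gender = ["female", "male", "male", "female"]
period = ["war", "peace", "war"]

female_dummy = make_dummy(gender, "female")  # 1 = female, 0 = male
war_dummy = make_dummy(period, "war")        # 1 = wartime, 0 = peacetime

print(female_dummy)
print(war_dummy)
```

Whichever category is coded 0 becomes the group against which the other is compared.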
Types of Dummy Variable Models
Analysis of Variance (ANOVA) model: all explanatory variables are dummy variables.
Analysis of Covariance (ANCOVA) model: a mix of quantitative and qualitative explanatory variables.
ANOVA Model
Suppose we want to measure the impact of GENDER on wages/employee compensation.
In particular, we are interested to know whether female employees are discriminated against relative to their male counterparts.
Gender is not strictly quantifiable.
Hence, we describe gender using a dummy variable:
D = 1 if male respondent
D = 0 if female respondent [reference group]
Let the regression model be
$Y_i = \alpha + \beta D_i + u_i$   (1)
where $Y_i$ is monthly salary.
This specification helps us to see whether gender makes a difference in salary.
Interpretation of model (1): taking expectations on both sides of (1), we get
Mean salary of male: $E(Y_i \mid D_i = 1) = \alpha + \beta$
Mean salary of female: $E(Y_i \mid D_i = 0) = \alpha$
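The expectation step can be written out in full. This is a short derivation, writing the intercept as $\alpha$ and the dummy coefficient as $\beta$ (notation assumed here) and assuming the usual condition $E(u_i \mid D_i) = 0$:

```latex
\begin{aligned}
E(Y_i \mid D_i) &= \alpha + \beta D_i + E(u_i \mid D_i) = \alpha + \beta D_i \\
E(Y_i \mid D_i = 1) &= \alpha + \beta \\
E(Y_i \mid D_i = 0) &= \alpha \\
E(Y_i \mid D_i = 1) - E(Y_i \mid D_i = 0) &= \beta
\end{aligned}
```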
Note that the mean salary of female workers is given by the intercept $\alpha$.
The coefficient $\beta$ tells by how much the mean salary of male workers differs from the mean salary of female workers, i.e. the difference in average salary between men and women.
$\beta$ is called the differential intercept coefficient. It is attached to the category that is assigned the dummy value 1 (here male).
The intercept ($\alpha$) belongs to the category that is assigned the dummy value 0 (here female).
The category that is assigned the value 0 is known as the benchmark/control/reference category.
The intercept represents the mean value of the benchmark category.
All comparisons are made in relation to the benchmark category.
Hypothesis testing: done in the usual way.
$H_0: \beta = 0$ [No gender discrimination in salary determination, i.e. no statistically significant difference in salaries between males and females]
$H_1: \beta \neq 0$ [Gender discrimination is present in salary determination]
Use the t statistic. If $\hat{\beta}$ is significantly different from zero, we reject $H_0$ in favour of the alternative hypothesis.
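For reference, the test statistic and decision rule can be sketched as follows. This assumes the standard two-sided t test on the dummy coefficient, written here as $\beta$ (notation assumed), in a model with an intercept and one dummy estimated on $n$ observations:

```latex
t = \frac{\hat{\beta}}{\operatorname{se}(\hat{\beta})} \sim t_{\,n-2} \quad \text{under } H_0: \beta = 0,
\qquad
\text{reject } H_0 \text{ at the 5\% level if } |t| > t_{0.025,\; n-2}.
```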
Example: Cross-Section Data on Monthly Wages and Gender
Y = monthly wage (Rs.); D = 1 if male, 0 if female.

Y     D    Y     D    Y     D
1345  0    1566  0    2533  1
2435  1    1187  0    1602  0
1715  1    1345  0    1839  0
1461  1    1345  0    2218  1
1639  1    2167  1    1529  0
1345  0    1402  1    1461  1
1602  0    2115  1    3307  1
1144  0    2218  1    3833  1
1566  1    3575  1    1839  1
1496  1    1972  1    1461  0
1234  0    1234  0    1433  1
1345  0    1926  1    2115  0
1345  0    2165  0    1839  1
3389  1    2365  0    1288  1
1839  1    1345  0    1288  0
981   1    1839  0    1345  0
2613  1

Male: 26 observations; Female: 23 observations.
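Regression Results:
$\hat{Y}_i = 1518.696 + 568.227\,D_i$
t: (12.394) (3.378)
$R^2 = 0.195$; F = 11.410
[Figure: mean monthly salary by gender. Female: $\hat{\alpha} = 1518.696$; male: $\hat{\alpha} + \hat{\beta} = 2086.923$; difference $\hat{\beta} = 568.23$.]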
Results show that the mean salary of female workers is about Rs.1519.
The mean salary of male workers is higher by about Rs.568 (i.e. 1519 + 568 = 2087).
The t statistic shows that this difference of about Rs.568 in favour of male workers is statistically significant.
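The estimates can be checked directly from the example data. The sketch below uses NumPy least squares on the 49 wage observations, grouped by gender, with D = 0 for the 23 female and D = 1 for the 26 male workers; the variable names are illustrative, and the standard errors use the usual homoskedastic OLS formula.

```python
import numpy as np

# Monthly wages (Rs.) from the example data, grouped by the dummy D.
female = np.array([1345, 1566, 1187, 1602, 1345, 1839, 1345, 1529, 1345, 1602,
                   1144, 1461, 1234, 1234, 1345, 2115, 1345, 2165, 2365, 1345,
                   1288, 1839, 1345], dtype=float)          # D = 0
male = np.array([2533, 2435, 1715, 1461, 2218, 1639, 2167, 1402, 1461, 2115,
                 3307, 2218, 3833, 1566, 3575, 1839, 1496, 1972, 1433, 1926,
                 1839, 3389, 1288, 1839, 981, 2613], dtype=float)  # D = 1

y = np.concatenate([female, male])
d = np.concatenate([np.zeros(female.size), np.ones(male.size)])

# Design matrix: a column of ones (intercept) and the dummy.
X = np.column_stack([np.ones_like(d), d])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha_hat, beta_hat = coef

# Classical OLS standard errors, t statistics and R-squared.
resid = y - X @ coef
dof = y.size - X.shape[1]
sigma2 = resid @ resid / dof
std_err = np.sqrt(np.diag(sigma2 * np.linalg.inv(X.T @ X)))
t_stats = coef / std_err
r_squared = 1.0 - (resid @ resid) / np.sum((y - y.mean()) ** 2)

print(f"intercept (female mean): {alpha_hat:.3f}")
print(f"differential intercept : {beta_hat:.3f}")
print(f"male mean              : {male.mean():.3f}")
print("t statistics           :", np.round(t_stats, 3))
print(f"R-squared              : {r_squared:.3f}")
```

The intercept equals the female group mean and the dummy coefficient equals the male mean minus the female mean, which is why the regression reproduces 1518.696 and 568.227. The t statistics and R-squared it prints can be compared with the reported values.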
Does the conclusion of the model change if we interchange the dummy values?
Suppose $Y_i = \alpha + \beta D_i + u_i$, where Y = hourly wage and D = gender (1 = male; 0 = female).
If we now interchange the dummy values (1 = female; 0 = male), the overall conclusion of the original model does not change (see figure).
The only change is that the other category has become the benchmark category, and all comparisons are made in relation to it: the intercept is now the male mean, and the dummy coefficient has the same magnitude with the opposite sign.
Hence, the choice of benchmark category (the category coded 0) is strictly up to the researcher.
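A minimal sketch of the interchange, using six made-up hourly wages (hypothetical data, not from the slides) and an illustrative helper `fit_dummy_regression`:

```python
import numpy as np

def fit_dummy_regression(y, d):
    """OLS of y on an intercept and a 0/1 dummy; returns (intercept, slope)."""
    y = np.asarray(y, dtype=float)
    d = np.asarray(d, dtype=float)
    X = np.column_stack([np.ones(y.size), d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Hypothetical hourly wages: three female workers, then three male workers.
wage = np.array([10.0, 12.0, 14.0, 18.0, 20.0, 22.0])
male = np.array([0, 0, 0, 1, 1, 1])

a_female_base, b_female_base = fit_dummy_regression(wage, male)      # 1 = male
a_male_base, b_male_base = fit_dummy_regression(wage, 1 - male)      # 1 = female

print(a_female_base, b_female_base)  # female mean, male minus female
print(a_male_base, b_male_base)      # male mean, female minus male
```

Both codings imply the same two group means; only the benchmark and the sign of the differential intercept change.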
Extension of ANOVA Model
The model can be extended to include more than one qualitative variable:
$Y_i = \alpha + \beta_1 D_{1i} + \beta_2 D_{2i} + u_i$
where Y = hourly wage
$D_1$ = marital status (1 = married; 0 = otherwise)
$D_2$ = region of residence (1 = south; 0 = otherwise)
Which is the benchmark category here? Unmarried, non-south residence.
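Mean hourly wage of the benchmark category (unmarried, non-south) is $\alpha$.
Mean hourly wage of those who are married (and non-south) is $\alpha + \beta_1$.
Mean hourly wage of those who live in the south (and are unmarried) is $\alpha + \beta_2$.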
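With two dummies there are four groups in total. Writing the intercept as $\alpha$ and the coefficients on the marital-status and region dummies as $\beta_1$ and $\beta_2$ (notation assumed here), and assuming $E(u_i \mid D_1, D_2) = 0$, the group means implied by the model are:

```latex
\begin{aligned}
E(Y_i \mid D_1 = 0,\, D_2 = 0) &= \alpha && \text{(unmarried, non-south: benchmark)} \\
E(Y_i \mid D_1 = 1,\, D_2 = 0) &= \alpha + \beta_1 && \text{(married, non-south)} \\
E(Y_i \mid D_1 = 0,\, D_2 = 1) &= \alpha + \beta_2 && \text{(unmarried, south)} \\
E(Y_i \mid D_1 = 1,\, D_2 = 1) &= \alpha + \beta_1 + \beta_2 && \text{(married, south)}
\end{aligned}
```

Because the model has no interaction term, the married/unmarried gap is $\beta_1$ in both regions.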
ANCOVA Model
Consists of a mixture of qualitative and quantitative explanatory variables.
Suppose that in our original model (1) we include number of years of experience as an additional explanatory variable.
We can now raise one more question: between two employees with the same experience, is there a gender difference in wages?
We can express the regression model as
$Y_i = \alpha_1 + \alpha_2 D_i + \beta X_i + u_i$
where $D_i$ is the gender dummy and $X_i$ is the experience variable.
Mean salary of male: $E(Y_i \mid D_i = 1, X_i) = \alpha_1 + \alpha_2 + \beta X_i$
Mean salary of female: $E(Y_i \mid D_i = 0, X_i) = \alpha_1 + \beta X_i$
The slope $\beta$ is the same for both categories (male and female); only the intercept differs.
What does a common slope mean?
$\alpha_2$ measures the average difference in salary between male and female, given the same level of experience.
If we take a female and a male with the same level of experience $X$, then $\alpha_1 + \alpha_2 + \beta X$ is the salary of the male on average and $\alpha_1 + \beta X$ is the salary of the female on average, so the gap is $\alpha_2$ at every level of experience.
Note that since we controlled for experience in the regression, the wage differential cannot be explained by different average levels of experience between male and female.
Hence, we can conclude that wage differential is due to gender factor
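The role of controlling for experience can be sketched with a small noise-free example. All numbers below are hypothetical (made-up coefficients and experience values, not the slide data), chosen so that male workers have one more year of experience on average:

```python
import numpy as np

# Hypothetical "true" coefficients, for illustration only.
alpha1, alpha2, beta = 1000.0, 500.0, 20.0

# Hypothetical sample: four female (D = 0) and four male (D = 1) workers.
experience = np.array([1, 3, 5, 7, 2, 4, 6, 8], dtype=float)
male = np.array([0, 0, 0, 0, 1, 1, 1, 1], dtype=float)
salary = alpha1 + alpha2 * male + beta * experience  # no error term

# ANCOVA regression: intercept, gender dummy, experience.
X = np.column_stack([np.ones_like(male), male, experience])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
a1_hat, a2_hat, b_hat = coef

# Raw difference in group means, ignoring experience.
raw_gap = salary[male == 1].mean() - salary[male == 0].mean()

print("ANCOVA estimates:", np.round(coef, 3))
print("raw gap in means:", round(raw_gap, 3))
```

In this sketch the raw gap mixes the gender differential with the experience gap (500 plus 20 times one extra year), while the coefficient on the dummy isolates the differential at a given level of experience.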
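Diagrammatic Explanation
[Figure: salary Y against experience X. Two parallel lines with common slope $\beta$: $\alpha_1 + \beta X_i$ (female, base group) and $\alpha_1 + \alpha_2 + \beta X_i$ (male).]
$\alpha_1$: constant term, the intercept for the base group; $\alpha_1 + \alpha_2$: intercept for male; $\alpha_2$ measures the difference in intercepts.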
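Regression Estimation Results:
$\hat{Y}_i = 1366.267 + 525.632\,D_i + 19.807\,X_i$
t: (8.534) (3.114) (1.456)
$R^2 = 0.48$; F = 6.901
Intercept for the female (base) group: $\hat{\alpha}_1 = 1366.27$. It measures the mean salary of a female worker with no experience.
Intercept for the male group: $\hat{\alpha}_1 + \hat{\alpha}_2 = 1891.90$. It measures the mean salary of a male worker with no experience, of which 525.63 ($\hat{\alpha}_2$) is the average difference in salary between male and female.
Slope: $\hat{\beta} = 19.81$. As the number of years of experience goes up by 1 year, a worker's (male or female) salary goes up by Rs.19.81 on average, although with t = 1.456 this slope is not statistically significant at the 5% level.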
$\hat{\alpha}_2$, the difference in intercepts, is 525.63 and is statistically significant at the 5% level (t = 3.114). Therefore, we can reject the null hypothesis of no gender differential.
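The fitted ANCOVA equation can be turned into predicted salaries. The sketch below uses the reported coefficients; the function name `predicted_salary` and the choice of 10 years of experience are illustrative assumptions.

```python
# Reported ANCOVA estimates: intercept, gender dummy (1 = male), experience.
ALPHA1_HAT = 1366.267
ALPHA2_HAT = 525.632
BETA_HAT = 19.807

def predicted_salary(male, experience_years):
    """Fitted monthly salary (Rs.) for a worker with the given dummy and experience."""
    return ALPHA1_HAT + ALPHA2_HAT * male + BETA_HAT * experience_years

# Intercepts: predicted salary at zero experience.
female_intercept = predicted_salary(0, 0)
male_intercept = predicted_salary(1, 0)

# Hypothetical comparison at 10 years of experience.
female_10 = predicted_salary(0, 10)
male_10 = predicted_salary(1, 10)

print(round(female_intercept, 3), round(male_intercept, 3))
print(round(female_10, 3), round(male_10, 3), round(male_10 - female_10, 3))
```

The gap between the two predictions is the same at any level of experience, which is the common-slope property of the model.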