What is a Dummy Variable?
•Variables that are essentially qualitative in
nature, or variables that are not readily
quantifiable
•Examples: gender, marital status, race,
colour, religion, nationality, geographical
location, political/policy changes, party
affiliation
Other Names for Dummy Variable
Indicator variables
Binary variables
Categorical variables
Dichotomous variables
Qualitative variables
Why Dummy Variable Regression?
•To include qualitative variables as
explanatory variables in the regression
model
•Example: If we want to see whether gender
discrimination has any influence on
earnings, apart from other factors
How to quantify qualitative aspect?
•By constructing artificial variables that take
on values of 1 or 0 (zero)
•1 indicates presence of that attribute
•0 indicates absence of that attribute
•Example:
(1) Gender = 1 if the respondent is female; 0 if the respondent is male
(2) Time = 1 if wartime; 0 if peacetime
•Here, variables with values 1 and 0 are
called dummy variables
Types of Dummy Variable Models
(1) Analysis of Variance (ANOVA) Model: All
explanatory variables are dummy variables
(2) Analysis of Covariance (ANCOVA) Model:
Mix of quantitative and qualitative
explanatory variables
ANOVA Model
•Suppose we want to measure the impact of
GENDER on wages/employee compensation
•In particular, we are interested to know
whether female employees are
discriminated against relative to their male
counterparts
•Gender is not strictly quantifiable
•Hence, we describe gender using a dummy
variable
D = 1 if male respondent
= 0 if female respondent [reference group]
Let the regression model be
$Y_i = \alpha + \beta D + u_i$ (1)
(where $Y_i$ = Monthly salary)
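A minimal sketch of what a regression on a single dummy estimates, assuming NumPy is available. The salary figures are made up for illustration, and the names alpha_hat and beta_hat are labels chosen here for the estimated intercept and dummy coefficient. With D = 1 for male and D = 0 for female, the intercept should equal the female (reference group) mean, and the intercept plus the dummy coefficient should equal the male mean.

```python
import numpy as np

# Hypothetical monthly salaries (illustrative numbers only, not from the slides).
salary = np.array([2100.0, 1900.0, 2000.0, 1500.0, 1400.0, 1600.0])
# D = 1 for male respondents, 0 for female respondents (reference group).
D = np.array([1, 1, 1, 0, 0, 0])

# Design matrix: a column of ones for the intercept, then the dummy.
X = np.column_stack([np.ones_like(salary), D])
(alpha_hat, beta_hat), *_ = np.linalg.lstsq(X, salary, rcond=None)

mean_female = salary[D == 0].mean()
mean_male = salary[D == 1].mean()

print("intercept vs female mean:", alpha_hat, mean_female)
print("intercept + slope vs male mean:", alpha_hat + beta_hat, mean_male)
```

The dummy coefficient is therefore the difference between the two group means, which is why this setup is called an ANOVA model.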
Does conclusion of model change if we
interchange dummy values?
Suppose $Y_i = \alpha + \beta D + u_i$
where $Y_i$ = Hourly wage
D = Gender (1 = Male; 0 = Female)
Now, if we interchange the dummy values (1 = Female; 0 = Male),
it will not change the overall conclusion of the original
model (see figure)
The only change is that the “otherwise” category has now become the
benchmark category, and all comparisons are made in
relation to this category
•Hence, the choice of benchmark category (the category coded 0) is strictly up to
the researcher
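The effect of interchanging the dummy values can be checked numerically. The sketch below uses made-up hourly wages and a hypothetical helper, fit_dummy_regression, written here only for illustration. It fits the same data twice, once with male coded 1 and once with female coded 1.

```python
import numpy as np

def fit_dummy_regression(y, d):
    """OLS of y on a constant and a single 0/1 dummy; returns (intercept, slope)."""
    X = np.column_stack([np.ones(len(y)), d])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[0], coef[1]

# Hypothetical hourly wages (illustrative only).
wage = np.array([21.0, 19.0, 20.0, 15.0, 14.0, 16.0])
male = np.array([1, 1, 1, 0, 0, 0])   # coding A: 1 = male, 0 = female
female = 1 - male                     # coding B: 1 = female, 0 = male

a_m, b_m = fit_dummy_regression(wage, male)
a_f, b_f = fit_dummy_regression(wage, female)

print("coding A: intercept, slope =", a_m, b_m)
print("coding B: intercept, slope =", a_f, b_f)
```

The intercept moves to the mean of whichever group is coded 0 and the slope changes sign, but the implied group means and the size of the wage gap are the same under both codings.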
Extension of ANOVA Model
•Can be extended to include more than one
qualitative variable
$Y_i = \alpha + \beta_1 D_1 + \beta_2 D_2 + u_i$
where $Y_i$ = Hourly wage
$D_1$ = Marital status (1 = married; 0 = otherwise)
$D_2$ = Region of residence (1 = south; 0 = otherwise)
Which is the benchmark category here?
Unmarried, non-south residence
•Mean hourly wage of the benchmark category is $\alpha$
•Mean wage of those who are married (and not living in the south) is $\alpha + \beta_1$
•Mean wage of those who live in the south (and are unmarried) is $\alpha + \beta_2$
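As a rough illustration of how the group means follow from a model with two dummies, the sketch below generates noise-free data from assumed coefficients (an intercept of 10, a marital-status effect of 2 and a south effect of -1.5, all hypothetical) and recovers them by least squares. The variable names are chosen here and are not from the slides.

```python
import numpy as np

# Hypothetical coefficients, used only to generate illustrative data.
alpha, beta1, beta2 = 10.0, 2.0, -1.5

married = np.array([0, 0, 1, 1, 0, 1, 0, 1], dtype=float)  # 1 = married
south = np.array([0, 1, 0, 1, 0, 0, 1, 1], dtype=float)    # 1 = lives in south
wage = alpha + beta1 * married + beta2 * south             # noise-free for clarity

X = np.column_stack([np.ones(len(wage)), married, south])
coef, *_ = np.linalg.lstsq(X, wage, rcond=None)
a_hat, b1_hat, b2_hat = coef

# Mean wage implied by the additive model for each combination of the dummies.
group_means = {
    "unmarried, non-south (benchmark)": a_hat,
    "married, non-south": a_hat + b1_hat,
    "unmarried, south": a_hat + b2_hat,
    "married, south": a_hat + b1_hat + b2_hat,
}
for group, mean in group_means.items():
    print(group, round(mean, 2))
```

Every comparison is made against the benchmark group, for which both dummies are 0.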
ANCOVA Model
•Consists of a mixture of qualitative and
quantitative explanatory variables
•Suppose, in our original model (1), we include
number of years of experience as an additional
variable
•Now we can raise one more question: between
2 employees with the same experience, is there a
gender difference in wages?
•We can express the regression model as
$Y_i = \alpha_1 + \alpha_2 D + \beta X_i + u_i$
where D is the dummy; $X_i$ is the experience variable
•Now, mean salary of male is
$E(Y_i \mid D = 1, X_i) = (\alpha_1 + \alpha_2) + \beta X_i$
•Mean salary of female is
$E(Y_i \mid D = 0, X_i) = \alpha_1 + \beta X_i$
•Slope is same for both categories (male &
female), only intercept differs
What does common slope mean?
•$\alpha_2$ measures the average difference in salary between
male and female, given the same level of experience
•If we take a female and a male with the same level of
experience $X_i$, $\alpha_1 + \alpha_2 + \beta X_i$ represents the salary of the male, on
average, and $\alpha_1 + \beta X_i$ the salary of the female, on average
•Note that since we controlled for experience in the
regression, the wage differential can’t be explained by
different average levels of experience between male
and female
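The common-slope idea can be sketched numerically. The example below assumes hypothetical values for the base intercept (1000), the male differential intercept (500) and the experience slope (20), generates noise-free salaries, and fits the model by least squares. The names a1_hat, a2_hat, b_hat and predict are chosen here for illustration.

```python
import numpy as np

# Hypothetical values used only to generate illustrative data.
alpha1, alpha2, beta = 1000.0, 500.0, 20.0

experience = np.array([1, 3, 5, 7, 2, 4, 6, 8], dtype=float)
D = np.array([1, 1, 1, 1, 0, 0, 0, 0], dtype=float)  # 1 = male, 0 = female
salary = alpha1 + alpha2 * D + beta * experience     # noise-free for clarity

# Regress salary on a constant, the dummy and experience.
X = np.column_stack([np.ones(len(salary)), D, experience])
coef, *_ = np.linalg.lstsq(X, salary, rcond=None)
a1_hat, a2_hat, b_hat = coef

def predict(d, x):
    """Fitted salary for dummy value d and x years of experience."""
    return a1_hat + a2_hat * d + b_hat * x

# The male-female gap at several experience levels.
gaps = [predict(1, x) - predict(0, x) for x in (0, 5, 10)]
print(gaps)
```

Because both groups share one slope, the gap is the same at every experience level and equals the coefficient on the dummy. The two fitted lines are parallel.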
Diagrammatic Explanation
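[Figure: Y plotted against X, showing two parallel lines, $\alpha_1 + \beta X_i$ and $\alpha_1 + \alpha_2 + \beta X_i$, with intercepts $\alpha_1$ and $\alpha_1 + \alpha_2$ and a common slope]
Constant Term
$\alpha_1$: intercept for base group;
$\alpha_1 + \alpha_2$: intercept for male; and
$\alpha_2$ measures the difference in intercept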
Regression Estimation Results:
$\hat{Y}_i = 1366.267 + 525.632 D + 19.807 X_i$
(8.534) (3.114) (1.456)
$R^2$ = 0.48; F = 6.901
Intercept for female (base) group, $\alpha_1$ = 1366.27. It
measures the mean salary of female (at zero years of experience)
Intercept for male group, $\alpha_1 + \alpha_2$ = 1891.90. It
measures the mean salary of male (at zero years of experience), of which 525.63 ($\alpha_2$) is the
average difference in salary between male and female
$\beta$ (i.e. 19.81): as the number of years of experience goes up
by 1 year, on average, a worker's (male or female)
salary goes up by Rs. 19.81
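The arithmetic behind these interpretations can be reproduced directly from the reported estimates. In the sketch below the three coefficients are taken from the estimated equation above; the function name predicted_salary and the choice of 10 years of experience are illustrative assumptions.

```python
import math

# Coefficients as reported in the estimation results above.
INTERCEPT = 1366.267     # base (female) intercept
MALE_DIFF = 525.632      # differential intercept for male (D = 1)
EXPERIENCE_SLOPE = 19.807

def predicted_salary(d, years):
    """Fitted salary in Rs. for dummy d (1 = male, 0 = female) and years of experience."""
    return INTERCEPT + MALE_DIFF * d + EXPERIENCE_SLOPE * years

female_0 = predicted_salary(0, 0)    # the base intercept
male_0 = predicted_salary(1, 0)      # base intercept plus the differential
gap_10 = predicted_salary(1, 10) - predicted_salary(0, 10)
step = predicted_salary(0, 11) - predicted_salary(0, 10)

print(round(female_0, 2), round(male_0, 2), round(gap_10, 2), round(step, 2))
```

The male-female gap stays at about Rs. 525.63 at any experience level, and one more year of experience adds about Rs. 19.81 for either group.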
Example: Several qualitative variables, with
some having more than two categories:
•Example: Consumption function analysis.
•Suppose there are three qualitative factors:
gender, age of household head and education
level of head.
•Define dummy variables as:
$D_1$ = 1 if male and = 0 otherwise
$D_2$ = 1 if age < 25 and = 0 otherwise
$D_3$ = 1 if age between 25 and 50 and = 0 otherwise
$D_4$ = 1 if high school education and = 0 otherwise
$D_5$ = 1 if H.Sc., degree and above and = 0 otherwise
Base or Reference Groups:
Regression Model:
$C_t = \alpha + \beta Y_t + \gamma_1 D_1 + \gamma_2 D_2 + \gamma_3 D_3 + \gamma_4 D_4 + \gamma_5 D_5 + u_t$
$\alpha$: intercept for female head of household
$\alpha$: intercept term if age of head is above 50 years
$\alpha$: intercept term if head’s education is below high school
In short, $\alpha$ represents a female head of household aged above
50 years and with below high school education
Differential intercepts or mean
consumption for other groups:
$\alpha + \gamma_1$: for male household head
$\alpha + \gamma_2$: for age less than 25 years
$\alpha + \gamma_3$: for age between 25 and 50 years
$\alpha + \gamma_4$: for high school education
$\alpha + \gamma_5$: for above high school education
If the household head is male with age 40 years and
high school education, what is the intercept?
$\alpha + \gamma_1 + \gamma_3 + \gamma_4$
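The dummy coding for this example can be sketched as a small function. This is only an illustration under stated assumptions: the labels alpha and gamma_1 to gamma_5 stand for the base intercept and the coefficients on the five dummies, the education categories are represented by hypothetical string labels, and "between 25 and 50" is assumed to include both end points.

```python
def household_dummies(gender, age, education):
    """Return (D1, ..., D5) for a household head, following the definitions above."""
    d1 = 1 if gender == "male" else 0
    d2 = 1 if age < 25 else 0
    d3 = 1 if 25 <= age <= 50 else 0   # assumption: both end points included
    d4 = 1 if education == "high_school" else 0
    d5 = 1 if education == "above_high_school" else 0
    return (d1, d2, d3, d4, d5)

def intercept_terms(dummies):
    """List the coefficients that add up to the intercept for this household."""
    names = ["gamma_1", "gamma_2", "gamma_3", "gamma_4", "gamma_5"]
    return ["alpha"] + [name for name, d in zip(names, dummies) if d == 1]

# Male head, 40 years old, high school education.
example = household_dummies("male", 40, "high_school")
print(example, intercept_terms(example))

# Base group: female head, above 50 years, below high school education.
base = household_dummies("female", 60, "below_high_school")
print(base, intercept_terms(base))
```

Each qualitative factor with m categories gets m - 1 dummies, so the omitted category of every factor together defines the base group, where all five dummies are 0.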
Interactions Involving Dummy Variables
•Consider the following model:
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \beta X_i + u_i$ (1)
$Y_i$ = Hourly wage
$X_i$ = Education (years of schooling)
$D_{2i}$ = 1 if female, 0 if male [GENDER]
$D_{3i}$ = 1 if black, 0 if white [RACE]
Note that this model assumes the effects of the two dummy variables are
additive, whereas they may be interactive in nature. How?
•Here, if mean salary is higher for female than for
male, this is so whether they (female) are black or
white
•Similarly, if mean salary is lower for black, this is so
whether they (black) are male or female
•Implication: The effect of $D_2$ and $D_3$ on Y may not be
simply additive as in (1) but multiplicative as below
[Figure: tree diagrams. Gender: Male and Female, each split into Black and White. Race: Black and White, each split into Male and Female]
$Y_i = \alpha_1 + \alpha_2 D_{2i} + \alpha_3 D_{3i} + \alpha_4 (D_{2i} D_{3i}) + \beta X_i + u_i$ (2)
Eq (2) explicitly includes the interaction between GENDER &
RACE, i.e. $D_{2i} D_{3i}$
$\alpha_2$: differential effect of being a female (gender alone)
$\alpha_3$: differential effect of being black (race alone)
$\alpha_4$: differential effect of being a black female (gender & race)
$\alpha_1$: Male white (base category)
Note: While running (2), simply multiply the $D_{2i}$ and $D_{3i}$ values
Eq (2) is a different way of finding wage differentials
across all gender-race combinations
In other words, the interactive model (eq. 2) allows us
to obtain the estimated wage differential among all 4
groups (black female, black male, white male and white female). How?
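(i) Black Female, $E(Y_i \mid D_{2i} = 1, D_{3i} = 1, X_i)$: intercept $\alpha_1 + \alpha_2 + \alpha_3 + \alpha_4$
(ii) Black Male, $E(Y_i \mid D_{2i} = 0, D_{3i} = 1, X_i)$: intercept $\alpha_1 + \alpha_3$
(iii) White Male, $E(Y_i \mid D_{2i} = 0, D_{3i} = 0, X_i)$: intercept $\alpha_1$
(iv) White Female, $E(Y_i \mid D_{2i} = 1, D_{3i} = 0, X_i)$: intercept $\alpha_1 + \alpha_2$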
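A minimal sketch of the interactive model, assuming NumPy and noise-free data generated from hypothetical coefficients (10, -2, -1.5 and 1 for the intercept, gender, race and interaction terms, and 0.8 for education). The interaction regressor is simply the product of the two dummies, and the four group intercepts are built from the fitted coefficients, here named c1 to c4.

```python
import numpy as np
from itertools import product

# Hypothetical coefficients, used only to generate illustrative data.
a1, a2, a3, a4, beta = 10.0, -2.0, -1.5, 1.0, 0.8

# One observation per (gender, race, years of schooling) combination.
rows = []
for d2, d3, x in product([0, 1], [0, 1], [8, 12, 16]):
    y = a1 + a2 * d2 + a3 * d3 + a4 * d2 * d3 + beta * x   # noise-free
    rows.append((y, d2, d3, x))
y, d2, d3, x = np.array(rows, dtype=float).T

# The interaction column is the element-wise product of the two dummies.
X = np.column_stack([np.ones(len(y)), d2, d3, d2 * d3, x])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
c1, c2, c3, c4, b = coef

group_intercepts = {
    "white male (base)": c1,
    "white female": c1 + c2,
    "black male": c1 + c3,
    "black female": c1 + c2 + c3 + c4,
}
for group, value in group_intercepts.items():
    print(group, round(value, 2))
```

Without the interaction column, the black female intercept would be forced to equal the base intercept plus the two separate differentials. The interaction coefficient measures any departure from that additive pattern.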
[Figure: tree diagrams. Gender: Male and Female, each split into Married (M) and Unmarried (UM). Marital status: Married (M) and Unmarried (UM), each split into Male and Female]