Mastering ’Metrics: The Path from Cause to Effect

AbramMartino96 243 views 182 slides Sep 22, 2022

Mastering ’Metrics
The Path from Cause to Effect

Joshua D. Angrist
and
Jörn-Steffen Pischke

PRINCETON UNIVERSITY PRESS ▪ PRINCETON AND OXFORD



Copyright © 2015 by Princeton University Press
Published by Princeton University Press, 41 William Street, Princeton, New Jersey 08540
In the United Kingdom: Princeton University Press, 6 Oxford Street, Woodstock, Oxfordshire OX20 1TW

press.princeton.edu

Jacket and illustration design by Wanda Espana
Book illustrations by Garrett Scafani

All Rights Reserved

Library of Congress Cataloging-in-Publication Data

Angrist, Joshua David.
Mastering ’metrics : the path from cause to effect / Joshua D. Angrist, Jörn-Steffen Pischke.
pages cm
Includes index.

Summary: “Applied econometrics, known to aficionados as ’metrics, is the original data science. ’Metrics encompasses the statistical methods economists use to untangle cause and effect in human affairs. Through accessible discussion and with a dose of kung fu-themed humor, Mastering ’Metrics presents the essential tools of econometric research and demonstrates why econometrics is exciting and useful. The five most valuable econometric methods, or what the authors call the Furious Five (random assignment, regression, instrumental variables, regression discontinuity designs, and differences in differences), are illustrated through well-crafted real-world examples (vetted for awesomeness by Kung Fu Panda’s Jade Palace). Does health insurance make you healthier? Randomized experiments provide answers. Are expensive private colleges and selective public high schools better than more pedestrian institutions? Regression analysis and a regression discontinuity design reveal the surprising truth. When private banks teeter, and depositors take their money and run, should central banks step in to save them? Differences-in-differences analysis of a Depression-era banking crisis offers a response. Could arresting O.J. Simpson have saved his ex-wife’s life? Instrumental variables methods instruct law enforcement authorities in how best to respond to domestic abuse. Wielding econometric tools with skill and confidence, Mastering ’Metrics uses data and statistics to illuminate the path from cause to effect. Shows why econometrics is important. Explains econometric research through humorous and accessible discussion. Outlines empirical methods central to modern econometric practice. Works through interesting and relevant real-world examples.” Provided by publisher.

ISBN 978-0-691-15283-7 (hardback : alk. paper)
ISBN 978-0-691-15284-4 (paperback : alk. paper)
1. Econometrics. I. Pischke, Jörn-Steffen. II. Title.

HB139.A53984 2014
330.01′5195—dc23 2014024449

British Library Cataloging-in-Publication Data is available

This book has been composed in Sabon with Helvetica Neue Condensed family display using ZzTEX by Princeton Editorial Associates Inc., Scottsdale, Arizona

Printed on acid-free paper. ♾
Printed in the United States of America

1 3 5 7 9 10 8 6 4 2

http://press.princeton.edu


CONTENTS

List of Figures
List of Tables
Introduction

1 Randomized Trials
1.1 In Sickness and in Health (Insurance)
1.2 The Oregon Trail
Masters of ’Metrics: From Daniel to R. A. Fisher
Appendix: Mastering Inference

2 Regression
2.1 A Tale of Two Colleges
2.2 Make Me a Match, Run Me a Regression
2.3 Ceteris Paribus?
Masters of ’Metrics: Galton and Yule
Appendix: Regression Theory

3 Instrumental Variables
3.1 The Charter Conundrum
3.2 Abuse Busters
3.3 The Population Bomb
Masters of ’Metrics: The Remarkable Wrights
Appendix: IV Theory

4 Regression Discontinuity Designs
4.1 Birthdays and Funerals
4.2 The Elite Illusion
Masters of ’Metrics: Donald Campbell

5 Differences-in-Differences
5.1 A Mississippi Experiment
5.2 Drink, Drank, …
Masters of ’Metrics: John Snow
Appendix: Standard Errors for Regression DD

6 The Wages of Schooling
6.1 Schooling, Experience, and Earnings
6.2 Twins Double the Fun
6.3 Econometricians Are Known by Their … Instruments
6.4 Rustling Sheepskin in the Lone Star State
Appendix: Bias from Measurement Error

Abbreviations and Acronyms
Empirical Notes
Acknowledgments
Index



FIGURES

1.1 A standard normal distribution
1.2 The distribution of the t-statistic for the mean in a sample of size 10
1.3 The distribution of the t-statistic for the mean in a sample of size 40
1.4 The distribution of the t-statistic for the mean in a sample of size 100
2.1 The CEF and the regression line
2.2 Variance in X is good
3.1 Application and enrollment data from KIPP Lynn lotteries
3.2 IV in school: the effect of KIPP attendance on math scores
4.1 Birthdays and funerals
4.2 A sharp RD estimate of MLDA mortality effects
4.3 RD in action, three ways
4.4 Quadratic control in an RD design
4.5 RD estimates of MLDA effects on mortality by cause of death
4.6 Enrollment at BLS
4.7 Enrollment at any Boston exam school
4.8 Peer quality around the BLS cutoff
4.9 Math scores around the BLS cutoff
4.10 Thistlethwaite and Campbell’s Visual RD
5.1 Bank failures in the Sixth and Eighth Federal Reserve Districts
5.2 Trends in bank failures in the Sixth and Eighth Federal Reserve Districts
5.3 Trends in bank failures in the Sixth and Eighth Federal Reserve Districts, and the Sixth District’s DD counterfactual
5.4 An MLDA effect in states with parallel trends
5.5 A spurious MLDA effect in states where trends are not parallel
5.6 A real MLDA effect, visible even though trends are not parallel
5.7 John Snow’s DD recipe
6.1 The quarter of birth first stage
6.2 The quarter of birth reduced form
6.3 Last-chance exam scores and Texas sheepskin
6.4 The effect of last-chance exam scores on earnings



TABLES

1.1 Health and demographic characteristics of insured and uninsured couples in the NHIS
1.2 Outcomes and treatments for Khuzdar and Maria
1.3 Demographic characteristics and baseline health in the RAND HIE
1.4 Health expenditure and health outcomes in the RAND HIE
1.5 OHP effects on insurance coverage and health-care use
1.6 OHP effects on health indicators and financial health
2.1 The college matching matrix
2.2 Private school effects: Barron’s matches
2.3 Private school effects: Average SAT score controls
2.4 School selectivity effects: Average SAT score controls
2.5 Private school effects: Omitted variables bias
3.1 Analysis of KIPP lotteries
3.2 The four types of children
3.3 Assigned and delivered treatments in the MDVE
3.4 Quantity-quality first stages
3.5 OLS and 2SLS estimates of the quantity-quality trade-off
4.1 Sharp RD estimates of MLDA effects on mortality
5.1 Wholesale firm failures and sales in 1929 and 1933
5.2 Regression DD estimates of MLDA effects on death rates
5.3 Regression DD estimates of MLDA effects controlling for beer taxes
6.1 How bad control creates selection bias
6.2 Returns to schooling for Twinsburg twins
6.3 Returns to schooling using child labor law instruments
6.4 IV recipe for an estimate of the returns to schooling using a single quarter of birth instrument
6.5 Returns to schooling using alternative quarter of birth instruments



INTRODUCTION

BLIND MASTER PO: Close your eyes. What do you hear?

YOUNG KWAI CHANG CAINE: I hear the water, I hear the birds.

MASTER PO: Do you hear your own heartbeat?

KWAI CHANG CAINE: No.

MASTER PO: Do you hear the grasshopper that is at your feet?

KWAI CHANG CAINE: Old man, how is it that you hear these things?

MASTER PO: Young man, how is it that you do not?

Kung Fu, Pilot

Economists’ reputation for dismality is a bad rap. Economics is as exciting as any science can be: the world is our lab, and the many diverse people in it are our subjects.

The excitement in our work comes from the opportunity to learn about cause and effect in human affairs. The big questions of the day are our questions: Will loose monetary policy spark economic growth or just fan the fires of inflation? Iowa farmers and the Federal Reserve chair want to know. Will mandatory health insurance really make Americans healthier? Such policy kindling lights the fires of talk radio. We approach these questions coolly, however, armed not with passion but with data.

Economists’ use of data to answer cause-and-effect questions constitutes the field of applied econometrics, known to students and masters alike as ’metrics. The tools of the ’metrics trade are disciplined data analysis, paired with the machinery of statistical inference. There is a mystical aspect to our work as well: we’re after truth, but truth is not revealed in full, and the messages the data transmit require interpretation. In this spirit, we draw inspiration from the journey of Kwai Chang Caine, hero of the classic Kung Fu TV series. Caine, a mixed-race Shaolin monk, wanders in search of his U.S.-born half-brother in the nineteenth century American West. As he searches, Caine questions all he sees in human affairs, uncovering hidden relationships and deeper meanings. Like Caine’s journey, the Way of ’Metrics is illuminated by questions.

Other Things Equal

In a disturbing development you may have heard of, the proportion of American college students completing their degrees in a timely fashion has taken a sharp turn south. Politicians and policy analysts blame falling college graduation rates on a pernicious combination of tuition hikes and the large student loans many students use to finance their studies. Perhaps increased student borrowing derails some who would otherwise stay on track. The fact that the students most likely to drop out of school often shoulder large student loans would seem to substantiate this hypothesis.

You’d rather pay for school with inherited riches than borrowed money if you can. As we’ll discuss in detail, however, education probably boosts earnings enough to make loan repayment bearable for most graduates. How then should we interpret the negative correlation between debt burden and college graduation rates? Does indebtedness cause debtors to drop out? The first question to ask in this context is who borrows the most. Students who borrow heavily typically come from middle and lower income families, since richer families have more savings. For many reasons, students from lower income families are less likely to complete a degree than those from higher income families, regardless of whether they’ve borrowed heavily. We should therefore be skeptical of claims that high debt burdens cause lower college completion rates when these claims are based solely on comparisons of completion rates between those with more or less debt. By virtue of the correlation between family background and college debt, the contrast in graduation rates between those with and without student loans is not an other things equal comparison.
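
To see how family background alone can produce such a gap, the sketch below uses invented counts (not data from the book) in which borrowing makes no difference to graduation within either income group, yet the pooled comparison still shows borrowers graduating less often. The income labels, the cell counts, and the helper grad_rate are all assumptions made for illustration.

```python
# Hypothetical counts, invented purely for illustration (not data from the book).
# Key: (income group, borrowed?) -> (number of students, number who graduate)
cells = {
    ("lower", True): (80, 40),
    ("lower", False): (20, 10),
    ("higher", True): (20, 16),
    ("higher", False): (80, 64),
}


def grad_rate(borrowed, income=None):
    """Graduation rate for a borrowing status, optionally within one income group."""
    students = graduates = 0
    for (group, b), (n, g) in cells.items():
        if b == borrowed and (income is None or group == income):
            students += n
            graduates += g
    return graduates / students


# Pooled comparison: borrowers versus non-borrowers, ignoring family income.
naive_gap = grad_rate(True) - grad_rate(False)

# Comparison within each income group, holding family income fixed.
within_gaps = {
    income: grad_rate(True, income) - grad_rate(False, income)
    for income in ("lower", "higher")
}

print(f"pooled gap: {naive_gap:.2f}")
for income, gap in within_gaps.items():
    print(f"gap within {income} income group: {gap:.2f}")
```

In these made-up numbers the pooled gap comes entirely from who borrows: most borrowers are from the lower income group, which graduates at a lower rate whether or not its members borrow.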
As college students majoring in economics, we first learned the other things equal idea by its Latin name, ceteris paribus. Comparisons made under ceteris paribus conditions have a causal interpretation. Imagine two students identical in every way, so their families have the same financial resources and their parents are similarly educated. One of these virtual twins finances college by borrowing and the other from savings. Because they are otherwise equal in every way (their grandmother has treated both to a small nest egg), differences in their educational attainment can be attributed to the fact that only one has borrowed. To this day, we wonder why so many economics students first encounter this central idea in Latin; maybe it’s a conspiracy to keep them from thinking about it. Because, as this hypothetical comparison suggests, real other things equal comparisons are hard to engineer, some would even say impossibile (that’s Italian not Latin, but at least people still speak it).

Hard to engineer, maybe, but not necessarily impossible. The ’metrics craft uses data to get to other things equal in spite of the obstacles (called selection bias or omitted variables bias) found on the path running from raw numbers to reliable causal knowledge. The path to causal understanding is rough and shadowed as it snakes around the boulders of selection bias. And yet, masters of ’metrics walk this path with confidence as well as humility, successfully linking cause and effect.

Our first line of attack on the causality problem is a randomized experiment, often called a randomized trial. In a randomized trial, researchers change the causal variables of interest (say, the availability of college financial aid) for a group selected using something like a coin toss. By changing circumstances randomly, we make it highly likely that the variable of interest is unrelated to the many other factors determining the outcomes we mean to study. Random assignment isn’t the same as holding everything else fixed, but it has the same effect. Random manipulation makes other things equal hold on average across the groups that did and did not experience manipulation. As we explain in Chapter 1, “on average” is usually good enough.
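
The balancing property of a coin toss can be illustrated with a small simulation. This is a sketch under stated assumptions, not an analysis from the book: the background factor (a standardized family income), the sample size, and the fixed seed are all hypothetical choices.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is reproducible

# Hypothetical background factor (standardized family income) for 10,000 students.
n = 10_000
family_income = [rng.gauss(0.0, 1.0) for _ in range(n)]

# Random assignment: a fair coin toss decides who is offered financial aid.
offered_aid = [rng.random() < 0.5 for _ in range(n)]

treated = [x for x, d in zip(family_income, offered_aid) if d]
control = [x for x, d in zip(family_income, offered_aid) if not d]

# With random assignment, the two groups should look alike on average.
balance_gap = statistics.mean(treated) - statistics.mean(control)

print(f"treated: {len(treated)}, control: {len(control)}")
print(f"difference in average family income: {balance_gap:.3f}")
```

No individual is held fixed here, but the difference in average family income between the two groups should be close to zero, and it tends to shrink as the sample grows.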

Randomized trials take pride of place in our ’metrics toolkit. Alas, randomized social experiments are expensive to field and may be slow to bear fruit, while research funds are scarce and life is short. Often, therefore, masters of ’metrics turn to less powerful but more accessible research designs. Even when we can’t practicably randomize, however, we still dream of the trials we’d like to do. The notion of an ideal experiment disciplines our approach to econometric research. Mastering ’Metrics shows how wise application of our five favorite econometric tools brings us as close as possible to the causality-revealing power of a real experiment.

Our favorite econometric tools are illustrated here through a series of well-crafted and important econometric studies. Vetted by Grand Master Oogway of Kung Fu Panda’s Jade Palace, these investigations of causal effects are distinguished by their awesomeness. The methods they use (random assignment, regression, instrumental variables, regression discontinuity designs, and differences-in-differences) are the Furious Five of econometric research. For starters, motivated by the contemporary American debate over health care, the first chapter describes two social experiments that reveal whether, as many policymakers believe, health insurance indeed helps those who have it stay healthy. Chapters 2–5 put our other tools to work, crafting answers to important questions ranging from the benefits of attending private colleges and selective high schools to the costs of teen drinking and the effects of central bank injections of liquidity.

Our final chapter puts the Furious Five to the test by returning to the education arena. On average, college graduates earn about twice as much as high school graduates, an earnings gap that only seems to be growing. Chapter 6 asks whether this gap is evidence of a large causal return to schooling or merely a reflection of the many other advantages those with more education might have (such as more educated parents). Can the relationship between schooling and earnings ever be evaluated on a ceteris paribus basis, or must the boulders of selection bias forever block our way? The challenge of quantifying the causal link between schooling and earnings provides a gripping test match for ’metrics tools and the masters who wield them.



Mastering ’Metrics

Chapter 1

Randomized Trials

KWAI CHANG CAINE: What happens in a man’s life is already written. A man must move through life as his destiny wills.

OLD MAN: Yet each is free to live as he chooses. Though they seem opposite, both are true.

Kung Fu, Pilot

Our Path

Our path begins with experimental random assignment, both as a framework for causal questions and a benchmark by which the results from other methods are judged. We illustrate the awesome power of random assignment through two randomized evaluations of the effects of health insurance. The appendix to this chapter also uses the experimental framework to review the concepts and methods of statistical inference.

1.1 In Sickness and in Health (Insurance)

The Affordable Care Act (ACA) has proven to be one of the most controversial and interesting policy innovations we’ve seen. The ACA requires Americans to buy health insurance, with a tax penalty for those who don’t voluntarily buy in. The question of the proper role of government in the market for health care has many angles. One is the causal effect of health insurance on health. The United States spends more of its GDP on health care than do other developed nations, yet Americans are surprisingly unhealthy. For example, Americans are more likely to be overweight and die sooner than their Canadian cousins, who spend only about two-thirds as much on care. America is also unusual among developed countries in having no universal health insurance scheme. Perhaps there’s a causal connection here.

Elderly Americans are covered by a federal program called Medicare, while some poor Americans (including most single mothers, their children, and many other poor children) are covered by Medicaid. Many of the working, prime-age poor, however, have long been uninsured. In fact, many uninsured Americans have chosen not to participate in an employer-provided insurance plan.1 These workers, perhaps correctly, count on hospital emergency departments, which cannot turn them away, to address their health-care needs. But the emergency department might not be the best place to treat, say, the flu, or to manage chronic conditions like diabetes and hypertension that are so pervasive among poor Americans. The emergency department is not required to provide long-term care. It therefore stands to reason that government-mandated health insurance might yield a health dividend. The push for subsidized universal health insurance stems in part from the belief that it does.

The ceteris paribus question in this context contrasts the health of someone with insurance coverage to the health of the same person were they without insurance (other than an emergency department backstop). This contrast highlights a fundamental empirical conundrum: people are either insured or not. We don’t get to see them both ways, at least not at the same time in exactly the same circumstances.

In his celebrated poem, “The Road Not Taken,” Robert Frost used the metaphor of a crossroads to describe the causal effects of personal choice:

Two roads diverged in a yellow wood,
And sorry I could not travel both
And be one traveler, long I stood
And looked down one as far as I could
To where it bent in the undergrowth;

Frost’s traveler concludes:

Two roads diverged in a wood, and I—
I took the one less traveled by,
And that has made all the difference.

The traveler claims his choice has mattered, but, being only one person, he can’t be sure. A later trip or a report by other travelers won’t nail it down for him, either. Our narrator might be older and wiser the second time around, while other travelers might have different experiences on the same road. So it is with any choice, including those related to health insurance: would uninsured men with heart disease be disease-free if they had insurance? In the novel Light Years, James Salter’s irresolute narrator observes: “Acts demolish their alternatives, that is the paradox.” We can’t know what lies at the end of the road not taken.
We can’t know, but evidence can be brought to bear on the question. This chapter takes you through some of the evidence related to paths involving health insurance. The starting point is the National Health Interview Survey (NHIS), an annual survey of the U.S. population with detailed information on health and health insurance. Among many other things, the NHIS asks: “Would you say your health in general is excellent, very good, good, fair, or poor?” We used this question to code an index that assigns 5 to excellent health and 1 to poor health in a sample of married 2009 NHIS respondents who may or may not be insured.2 This index is our outcome: a measure we’re interested in studying. The causal relation of interest here is determined by a variable that indicates coverage by private health insurance. We call this variable the treatment, borrowing from the literature on medical trials, although the treatments we’re interested in need not be medical treatments like drugs or surgery. In this context, those with insurance can be thought of as the treatment group; those without insurance make up the comparison or control group. A good control group reveals the fate of the treated in a counterfactual world where they are not treated.

The first row of Table 1.1 compares the average health index of insured and uninsured Americans, with statistics tabulated separately for husbands and wives.3 Those with health insurance are indeed healthier than those without, a gap of about .3 in the index for men and .4 in the index for women. These are large differences when measured against the standard deviation of the health index, which is about 1. (Standard deviations, reported in square brackets in Table 1.1, measure variability in data. The chapter appendix reviews the relevant formula.) These large gaps might be the health dividend we’re looking for.
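
The coding of the survey question into a 1-to-5 index can be sketched as a simple lookup. The text fixes only the endpoints (5 for excellent, 1 for poor); the intermediate values 2, 3, and 4, the function name code_health, and the sample responses are assumptions made for illustration, not the authors’ code or NHIS data.

```python
# Assumed mapping: the text states only that excellent = 5 and poor = 1;
# the evenly spaced values in between are an illustrative assumption.
HEALTH_INDEX = {
    "poor": 1,
    "fair": 2,
    "good": 3,
    "very good": 4,
    "excellent": 5,
}


def code_health(response: str) -> int:
    """Convert a self-reported health answer into the numeric index."""
    return HEALTH_INDEX[response.strip().lower()]


# Hypothetical answers, not actual survey responses.
responses = ["Excellent", "good", "Very good", "poor"]
index = [code_health(r) for r in responses]
print(index)  # [5, 3, 4, 1]
```

Averaging such an index separately for insured and uninsured respondents is the kind of comparison the first row of Table 1.1 reports.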

Fruitless and Fruitful Comparisons

Simple comparisons, such as those at the top of Table 1.1, are often cited as evidence of causal effects. More often than not, however, such comparisons are misleading. Once again the problem is other things equal, or lack thereof. Comparisons of people with and without health insurance are not apples to apples; such contrasts are apples to oranges, or worse.

Among other differences, those with health insurance are better educated, have higher income, and are more likely to be working than the uninsured. This can be seen in panel B of Table 1.1, which reports the average characteristics of NHIS respondents who do and don’t have health insurance. Many of the differences in the table are large (for example, a nearly 3-year schooling gap); most are statistically precise enough to rule out the hypothesis that these discrepancies are merely chance findings (see the chapter appendix for a refresher on statistical significance). It won’t surprise you to learn that most variables tabulated here are highly correlated with health as well as with health insurance status. More-educated people, for example, tend to be healthier as well as being overrepresented in the insured group. This may be because more-educated people exercise more, smoke less, and are more likely to wear seat belts. It stands to reason that the difference in health between insured and uninsured NHIS respondents at least partly reflects the extra schooling of the insured.

TABLE 1.1
Health and demographic characteristics of insured and uninsured couples in the NHIS

Notes: This table reports average characteristics for insured and uninsured married couples in the 2009 National Health Interview Survey (NHIS). Columns (1), (2), (4), and (5) show average characteristics of the group of individuals specified by the column heading. Columns (3) and (6) report the difference between the average characteristic for individuals with and without health insurance (HI). Standard deviations are in brackets; standard errors are reported in parentheses.

Our effort to understand the causal connection between insurance and health is aided by fleshing out Frost’s two-roads metaphor. We use the letter $Y$ as shorthand for health, the outcome variable of interest. To make it clear when we’re talking about specific people, we use subscripts as a stand-in for names: $Y_i$ is the health of individual $i$. The outcome $Y_i$ is recorded in our data. But, facing the choice of whether to pay for health insurance, person $i$ has two potential outcomes, only one of which is observed. To distinguish one potential outcome from another, we add a second subscript: The road taken without health insurance leads to $Y_{0i}$ (read this as “y-zero-i”) for person $i$, while the road with health insurance leads to $Y_{1i}$ (read this as “y-one-i”) for person $i$. Potential outcomes lie at the end of each road one might take. The causal effect of insurance on health is the difference between them, written $Y_{1i} - Y_{0i}$.4

To nail this down further, consider the story of visiting Massachusetts Institute of Technology (MIT) student Khuzdar Khalat, recently arrived from Kazakhstan. Kazakhstan has a national health insurance system that covers all its citizens automatically (though you wouldn’t go there just for the health insurance). Arriving in Cambridge, Massachusetts, Khuzdar is surprised to learn that MIT students must decide whether to opt in to the university’s health insurance plan, for which MIT levies a hefty fee. Upon reflection, Khuzdar judges the MIT insurance worth paying for, since he fears upper respiratory infections in chilly New England. Let’s say that $Y_{0i} = 3$ and $Y_{1i} = 4$ for $i$ = Khuzdar. For him, the causal effect of insurance is one step up on the NHIS scale:

$$Y_{1,\text{Khuzdar}} - Y_{0,\text{Khuzdar}} = 4 - 3 = 1.$$

Table 1.2 summarizes this information.

TABLE 1.2
Outcomes and treatments for Khuzdar and Maria

                                                      Khuzdar Khalat   Maria Moreño
Potential outcome without insurance: $Y_{0i}$                3               5
Potential outcome with insurance: $Y_{1i}$                   4               5
Treatment (insurance status chosen): $D_i$                   1               0
Actual health outcome: $Y_i$                                 4               5
Treatment effect: $Y_{1i} - Y_{0i}$                          1               0

It’s worth emphasizing that Table 1.2 is an imaginary table: some of the information it describes must remain hidden. Khuzdar will either buy insurance, revealing his value of $Y_{1i}$, or he won’t, in which case his $Y_{0i}$ is revealed. Khuzdar has walked many a long and dusty road in Kazakhstan, but even he cannot be sure what lies at the end of those not taken.

Maria Moreño is also coming to MIT this year; she hails from Chile’s Andean highlands. Little concerned by Boston winters, hearty Maria is not the type to fall sick easily. She therefore passes up the MIT insurance, planning to use her money for travel instead. Because Maria has $Y_{0,\text{Maria}} = Y_{1,\text{Maria}} = 5$, the causal effect of insurance on her health is

$$Y_{1,\text{Maria}} - Y_{0,\text{Maria}} = 5 - 5 = 0.$$

Maria’s numbers likewise appear in Table 1.2.

Since Khuzdar and Maria make different insurance choices, they offer an interesting comparison. Khuzdar’s health is $Y_{\text{Khuzdar}} = Y_{1,\text{Khuzdar}} = 4$, while Maria’s is $Y_{\text{Maria}} = Y_{0,\text{Maria}} = 5$. The difference between them is

$$Y_{\text{Khuzdar}} - Y_{\text{Maria}} = 4 - 5 = -1.$$

Taken at face value, this quantity (which we observe) suggests Khuzdar’s decision to buy insurance is counterproductive. His MIT insurance coverage notwithstanding, insured Khuzdar’s health is worse than uninsured Maria’s.
In fact, the comparison between frail Khuzdar and hearty Maria tells us little about the causal effects of their choices. This can be seen by linking observed and potential outcomes as follows:

\[
\begin{aligned}
Y_{\text{Khuzdar}} - Y_{\text{Maria}} &= Y_{1,\text{Khuzdar}} - Y_{0,\text{Maria}} \\
&= \underbrace{Y_{1,\text{Khuzdar}} - Y_{0,\text{Khuzdar}}}_{1} + \underbrace{Y_{0,\text{Khuzdar}} - Y_{0,\text{Maria}}}_{-2}.
\end{aligned}
\]

The second line in this equation is derived by adding and subtracting Y0,Khuzdar, thereby generating two hidden comparisons that determine the one we see. The first comparison, Y1,Khuzdar − Y0,Khuzdar, is the causal effect of health insurance on Khuzdar, which is equal to 1. The second, Y0,Khuzdar − Y0,Maria, is the difference between the two students’ health status were both to decide against insurance. This term, equal to −2, reflects Khuzdar’s relative frailty. In the context of our effort to uncover causal effects, the lack of comparability captured by the second term is called selection bias.
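
The arithmetic behind this decomposition can be checked in a few lines. The sketch below is only an illustration: it hard-codes the imaginary potential outcomes from Table 1.2, and the dictionary layout and the helper name `observed` are our own assumptions, not anything from the NHIS data.

```python
# Imaginary potential outcomes from Table 1.2 (health on the NHIS scale).
# In real data only one of y0 / y1 is ever observed for each person.
students = {
    "Khuzdar": {"y0": 3, "y1": 4, "d": 1},  # d = 1: bought insurance
    "Maria": {"y0": 5, "y1": 5, "d": 0},    # d = 0: went without
}


def observed(person):
    """Observed outcome: Y1 if insured (d == 1), otherwise Y0."""
    return person["y1"] if person["d"] == 1 else person["y0"]


khuzdar, maria = students["Khuzdar"], students["Maria"]

observed_gap = observed(khuzdar) - observed(maria)  # the comparison we see
causal_effect = khuzdar["y1"] - khuzdar["y0"]       # hidden: effect on Khuzdar
selection_bias = khuzdar["y0"] - maria["y0"]        # hidden: gap in Y0

print(observed_gap, causal_effect, selection_bias)
```

The observed gap equals the causal effect plus selection bias, so a positive effect on Khuzdar is entirely masked by his lower Y0.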
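You might think that selection bias has something to do with our focus on particular individuals instead of on groups, where, perhaps, extraneous differences can be expected to “average out.” But the difficult problem of selection bias carries over to comparisons of groups, though, instead of individual causal effects, our attention shifts to average causal effects. In a group of n people, average causal effects are written Avgn[Y1i − Y0i], where averaging is done in the usual way (that is, we sum individual outcomes and divide by n):

\[
\begin{aligned}
\text{Avg}_n[Y_{1i} - Y_{0i}] &= \frac{1}{n}\sum_{i=1}^{n}\left[Y_{1i} - Y_{0i}\right] \\
&= \frac{1}{n}\sum_{i=1}^{n} Y_{1i} - \frac{1}{n}\sum_{i=1}^{n} Y_{0i}. \qquad (1.1)
\end{aligned}
\]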
The symbol \(\sum_{i=1}^{n}\) indicates a sum over everyone from i = 1 to n, where n is the size of the group over which we are averaging. Note that both summations in equation (1.1) are taken over everybody in the group of interest. The average causal effect of health insurance compares average health in hypothetical scenarios where everybody in the group does and does not have health insurance. As a computational matter, this is the average of individual causal effects like Y1,Khuzdar − Y0,Khuzdar and Y1,Maria − Y0,Maria for each student in our data.
An investigation of the average causal effect of insurance naturally begins by comparing the average health of groups of insured and uninsured people, as in Table 1.1. This comparison is facilitated by the construction of a dummy variable, Di, which takes on the values 0 and 1 to indicate insurance status:

\[
D_i = \begin{cases} 1 & \text{if } i \text{ is insured} \\ 0 & \text{otherwise.} \end{cases}
\]

We can now write Avgn[Yi|Di = 1] for the average among the insured and Avgn[Yi|Di = 0] for the average among the uninsured. These quantities are averages conditional on insurance status.5

The average Yi for the insured is necessarily an average of outcome Y1i, but contains no information about Y0i. Likewise, the average Yi among the uninsured is an average of outcome Y0i, but this average is devoid of information about the corresponding Y1i. In other words, the road taken by those with insurance ends with Y1i, while the road taken by those without insurance leads to Y0i. This in turn leads to a simple but important conclusion about the difference in average health by insurance status:

\[
\begin{aligned}
\text{Difference in group means} &= \text{Avg}_n[Y_i \mid D_i = 1] - \text{Avg}_n[Y_i \mid D_i = 0] \\
&= \text{Avg}_n[Y_{1i} \mid D_i = 1] - \text{Avg}_n[Y_{0i} \mid D_i = 0], \qquad (1.2)
\end{aligned}
\]

an expression highlighting the fact that the comparisons in Table 1.1 tell us something about potential outcomes, though not necessarily what we want to know. We’re after Avgn[Y1i − Y0i], an average causal effect involving everyone’s Y1i and everyone’s Y0i, but we see average Y1i only for the insured and average Y0i only for the uninsured.
To sharpen our understanding of equation (1.2), it helps to imagine that health insurance makes everyone healthier by a constant amount, κ. As is the custom among our people, we use Greek letters to label such parameters, so as to distinguish them from variables or data; this one is the letter “kappa.” The constant-effects assumption allows us to write:

\[ Y_{1i} = Y_{0i} + \kappa, \qquad (1.3) \]

or, equivalently, Y1i − Y0i = κ. In other words, κ is both the individual and average causal effect of insurance on health. The question at hand is how comparisons such as those at the top of Table 1.1 relate to κ.
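Using the constant-effects model (equation (1.3)) to substitute for Avgn[Y1i|Di = 1] in equation (1.2), we have:

\[
\begin{aligned}
\text{Avg}_n[Y_{1i} \mid D_i = 1] - \text{Avg}_n[Y_{0i} \mid D_i = 0] &= \left\{\kappa + \text{Avg}_n[Y_{0i} \mid D_i = 1]\right\} - \text{Avg}_n[Y_{0i} \mid D_i = 0] \\
&= \kappa + \left\{\text{Avg}_n[Y_{0i} \mid D_i = 1] - \text{Avg}_n[Y_{0i} \mid D_i = 0]\right\}. \qquad (1.4)
\end{aligned}
\]

This equation reveals that health comparisons between those with and without insurance equal the causal effect of interest (κ) plus the difference in average Y0i between the insured and the uninsured. As in the parable of Khuzdar and Maria, this second term describes selection bias. Specifically, the difference in average health by insurance status can be written:

\[ \text{Difference in group means} = \text{Average causal effect} + \text{Selection bias}, \]

where selection bias is defined as the difference in average Y0i between the groups being compared.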
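
A small simulation can make this identity concrete. Everything in the sketch below is invented for illustration: the value of κ, the role of education, and the insurance take-up rates are assumptions, not estimates from the NHIS. The point is only that, under constant effects, the difference in group means equals κ plus the gap in average Y0i.

```python
import random
from statistics import mean

random.seed(0)
KAPPA = 0.3  # hypothetical constant causal effect of insurance

people = []
for _ in range(10_000):
    educated = random.random() < 0.5
    # Assumption: more-educated people are healthier even without insurance.
    y0 = random.gauss(4.0 if educated else 3.5, 1.0)
    # Assumption: more-educated people are also more likely to be insured.
    insured = random.random() < (0.9 if educated else 0.6)
    y = y0 + KAPPA if insured else y0  # observed outcome
    people.append((insured, y0, y))

diff_means = (mean(y for d, _, y in people if d)
              - mean(y for d, _, y in people if not d))
selection_bias = (mean(y0 for d, y0, _ in people if d)
                  - mean(y0 for d, y0, _ in people if not d))

print(f"difference in group means: {diff_means:.3f}")
print(f"kappa + selection bias:    {KAPPA + selection_bias:.3f}")
```

Because self-selected insurance is correlated with Y0i here, the naive comparison overstates κ. Only in a simulation can selection bias be computed directly, since real data never reveal Y0i for the insured.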
How do we know that the difference in means by insurance status is contaminated by selection bias? We know because Y0i is shorthand for everything about person i related to health, other than insurance status. The lower part of Table 1.1 documents important noninsurance differences between the insured and uninsured, showing that ceteris isn’t paribus here in many ways. The insured in the NHIS are healthier for all sorts of reasons, including, perhaps, the causal effects of insurance. But the insured are also healthier because they are more educated, among other things. To see why this matters, imagine a world in which the causal effect of insurance is zero (that is, κ = 0). Even in such a world, we should expect insured NHIS respondents to be healthier, simply because they are more educated, richer, and so on.
We wrap up this discussion by pointing out the subtle role played by information like that reported in panel B of Table 1.1. This panel shows that the groups being compared differ in ways that we can observe. As we’ll see in the next chapter, if the only source of selection bias is a set of differences in characteristics that we can observe and measure, selection bias is (relatively) easy to fix. Suppose, for example, that the only source of selection bias in the insurance comparison is education. This bias is eliminated by focusing on samples of people with the same schooling, say, college graduates. Education is the same for insured and uninsured people in such a sample, because it’s the same for everyone in the sample.
The subtlety in Table 1.1 arises because when observed differences proliferate, so should our suspicions about unobserved differences. The fact that people with and without health insurance differ in many visible ways suggests that even were we to hold observed characteristics fixed, the uninsured would likely differ from the insured in ways we don’t see (after all, the list of variables we can see is partly fortuitous). In other words, even in a sample consisting of insured and uninsured people with the same education, income, and employment status, the insured might have higher values of Y0i. The principal challenge facing masters of ’metrics is elimination of the selection bias that arises from such unobserved differences.



Breaking the Deadlock: Just RANDomize

My doctor gave me 6 months to live … but when I couldn’t pay the bill, he gave me 6 months more.
Walter Matthau

Experimental random assignment eliminates selection bias. The logistics of a randomized experiment, sometimes called a randomized trial, can be complex, but the logic is simple. To study the effects of health insurance in a randomized trial, we’d start with a sample of people who are currently uninsured. We’d then provide health insurance to a randomly chosen subset of this sample, and let the rest go to the emergency department if the need arises. Later, the health of the insured and uninsured groups can be compared. Random assignment makes this comparison ceteris paribus: groups insured and uninsured by random assignment differ only in their insurance status and any consequences that follow from it.
Suppose the MIT Health Service elects to forgo payment and tosses a coin to determine the insurance status of new students Ashish and Zandile (just this once, as a favor to their distinguished Economics Department). Zandile is insured if the toss comes up heads; otherwise, Ashish gets the coverage. A good start, but not good enough, since random assignment of two experimental subjects does not produce insured and uninsured apples. For one thing, Ashish is male and Zandile female. Women, as a rule, are healthier than men. If Zandile winds up healthier, it might be due to her good luck in having been born a woman and unrelated to her lucky draw in the insurance lottery. The problem here is that two is not enough to tango when it comes to random assignment. We must randomly assign treatment in a sample that’s large enough to ensure that differences in individual characteristics like sex wash out.
Two randomly chosen groups, when large enough, are indeed comparable. This fact is due to a powerful statistical property known as the Law of Large Numbers (LLN). The LLN characterizes the behavior of sample averages in relation to sample size. Specifically, the LLN says that a sample average can be brought as close as we like to the average in the population from which it is drawn (say, the population of American college students) simply by enlarging the sample.
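To see the LLN in action, play dice.6 Specifically, roll a fair die once and save the result. Then roll again and average these two results. Keep on rolling and averaging. The numbers 1 to 6 are equally likely (that’s why the die is said to be “fair”), so we can expect to see each value an equal number of times if we play long enough. Since there are six possibilities here, and all are equally likely, the expected outcome is an equally weighted average of each possibility, with weights equal to 1/6:

\[
\left(1 \times \tfrac{1}{6}\right) + \left(2 \times \tfrac{1}{6}\right) + \left(3 \times \tfrac{1}{6}\right) + \left(4 \times \tfrac{1}{6}\right) + \left(5 \times \tfrac{1}{6}\right) + \left(6 \times \tfrac{1}{6}\right) = \frac{1 + 2 + 3 + 4 + 5 + 6}{6} = 3.5.
\]

This average value of 3.5 is called a mathematical expectation; in this case, it’s the average value we’d get in infinitely many rolls of a fair die. The expectation concept is important to our work, so we define it formally here.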

MATHEMATICAL EXPECTATION The mathematical expectation of a variable, Yi, written E[Yi], is the population average of this variable. If Yi is a variable generated by a random process, such as throwing a die, E[Yi] is the average in infinitely many repetitions of this process. If Yi is a variable that comes from a sample survey, E[Yi] is the average obtained if everyone in the population from which the sample is drawn were to be enumerated.

Rolling a die only a few times, the average toss may be far from the corresponding mathematical expectation. Roll two times, for example, and you might get boxcars or snake eyes (two sixes or two ones). These average to values well away from the expected value of 3.5. But as the number of tosses goes up, the average across tosses reliably tends to 3.5. This is the LLN in action (and it’s how casinos make a profit: in most gambling games, you can’t beat the house in the long run, because the expected payout for players is negative). More remarkably, it needn’t take too many rolls or too large a sample for a sample average to approach the expected value. The chapter appendix addresses the question of how the number of rolls or the size of a sample survey determines statistical accuracy.
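
For readers without dice at hand, the experiment is easy to imitate on a computer. The following is a minimal sketch using Python’s standard pseudo-random number generator; the seed, the function name `average_of_rolls`, and the particular sample sizes are arbitrary choices of ours.

```python
import random

random.seed(1)  # fixed seed so that reruns give the same rolls


def average_of_rolls(n_rolls):
    """Average of n_rolls simulated tosses of a fair six-sided die."""
    return sum(random.randint(1, 6) for _ in range(n_rolls)) / n_rolls


# Mathematical expectation: equally weighted average of the six faces.
expectation = sum(range(1, 7)) / 6

for n in (2, 10, 100, 10_000, 100_000):
    avg = average_of_rolls(n)
    print(f"{n:>7} rolls: average {avg:.3f}, gap {abs(avg - expectation):.3f}")
```

With only two rolls the average can land anywhere from 1 to 6; as the number of rolls grows, the gap from 3.5 tends to shrink, which is the LLN at work.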
In randomized trials, experimental samples are created by sampling from a population we’d like to study rather than by repeating a game, but the LLN works just the same. When sampled subjects are randomly divided (as if by a coin toss) into treatment and control groups, they come from the same underlying population. The LLN therefore promises that those in randomly assigned treatment and control samples will be similar if the samples are large enough. For example, we expect to see similar proportions of men and women in randomly assigned treatment and control groups. Random assignment also produces groups of about the same age and with similar schooling levels. In fact, randomly assigned groups should be similar in every way, including in ways that we cannot easily measure or observe. This is the root of random assignment’s awesome power to eliminate selection bias.

The power of random assignment can be described precisely using the following definition, which is closely related to the definition of mathematical expectation.



CONDITIONAL EXPECTATION The conditional expectation of a variable, Yi, given a dummy variable, Di = 1, is written E[Yi|Di = 1]. This is the average of Yi in the population that has Di equal to 1. Likewise, the conditional expectation of a variable, Yi, given Di = 0, written E[Yi|Di = 0], is the average of Yi in the population that has Di equal to 0. If Yi and Di are variables generated by a random process, such as throwing a die under different circumstances, E[Yi|Di = d] is the average of infinitely many repetitions of this process while holding the circumstances indicated by Di fixed at d. If Yi and Di come from a sample survey, E[Yi|Di = d] is the average computed when everyone in the population who has Di = d is sampled.

Because randomly assigned treatment and control groups come from the same underlying population, they are the same in every way, including their expected Y0i. In other words, the conditional expectations, E[Y0i|Di = 1] and E[Y0i|Di = 0], are the same. This in turn means that:

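RANDOM ASSIGNMENT ELIMINATES SELECTION BIAS When Di is randomly assigned, E[Y0i|Di = 1] = E[Y0i|Di = 0], and the difference in expectations by treatment status captures the causal effect of treatment:

\[
\begin{aligned}
E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0] &= E[Y_{1i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
&= E[Y_{0i} + \kappa \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
&= \kappa + E[Y_{0i} \mid D_i = 1] - E[Y_{0i} \mid D_i = 0] \\
&= \kappa.
\end{aligned}
\]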
Provided the sample at hand is large enough for the LLN to work its magic (so we can replace the conditional averages in equation (1.4) with conditional expectations), selection bias disappears in a randomized experiment. Random assignment works not by eliminating individual differences but rather by ensuring that the mix of individuals being compared is the same. Think of this as comparing barrels that include equal proportions of apples and oranges. As we explain in the chapters that follow, randomization isn’t the only way to generate such ceteris paribus comparisons, but most masters believe it’s the best.
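
To see random assignment do its work, consider a made-up population in which education raises Y0i, and insurance is handed out by a simulated coin toss rather than chosen. This is a sketch under stated assumptions: the value of κ, the health distribution, and the sample size are all invented for illustration.

```python
import random
from statistics import mean

random.seed(2)
KAPPA = 0.3  # hypothetical constant causal effect of insurance

treated_y, control_y = [], []    # observed outcomes by assigned group
treated_y0, control_y0 = [], []  # Y0, visible only inside a simulation

for _ in range(20_000):
    educated = random.random() < 0.5
    y0 = random.gauss(4.0 if educated else 3.5, 1.0)
    if random.random() < 0.5:    # coin toss, unrelated to education or y0
        treated_y.append(y0 + KAPPA)
        treated_y0.append(y0)
    else:
        control_y.append(y0)
        control_y0.append(y0)

diff_means = mean(treated_y) - mean(control_y)
selection_bias = mean(treated_y0) - mean(control_y0)

print(f"difference in group means: {diff_means:.3f}")
print(f"selection bias:            {selection_bias:.3f}")
```

Individual differences in Y0i remain just as large as before, but the coin toss spreads them evenly across the two groups, so the difference in means lands close to κ.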
When analyzing data from a randomized trial or any other research design, masters almost always begin with a check on whether treatment and control groups indeed look similar. This process, called checking for balance, amounts to a comparison of sample averages as in panel B of Table 1.1. The average characteristics in panel B appear dissimilar or unbalanced, underlining the fact that the data in this table don’t come from anything like an experiment. It’s worth checking for balance in this manner any time you find yourself estimating causal effects.
Random assignment of health insurance seems like a fanciful proposition. Yet health insurance coverage has twice been randomly assigned to large representative samples of Americans. The RAND Health Insurance Experiment (HIE), which ran from 1974 to 1982, was one of the most influential social experiments in research history. The HIE enrolled 3,958 people aged 14 to 61 from six areas of the country. The HIE sample excluded Medicare participants and most Medicaid and military health insurance subscribers. HIE participants were randomly assigned to one of 14 insurance plans. Participants did not have to pay insurance premiums, but the plans had a variety of provisions related to cost sharing, leading to large differences in the amount of insurance they offered.
The most generous HIE plan offered comprehensive care for free. At the other end of the insurance spectrum, three “catastrophic coverage” plans required families to pay 95% of their health-care costs, though these costs were capped as a proportion of income (or capped at $1,000 per family, if that was lower). The catastrophic plans approximate a no-insurance condition. A second insurance scheme (the “individual deductible” plan) also required families to pay 95% of outpatient charges, but only up to $150 per person or $450 per family. A group of nine other plans had a variety of coinsurance provisions, requiring participants to cover anywhere from 25% to 50% of charges, but always capped at a proportion of income or $1,000, whichever was lower. Participating families enrolled in the experimental plans for 3 or 5 years and agreed to give up any earlier insurance coverage in return for a fixed monthly payment unrelated to their use of medical care.7

The HIE was motivated primarily by an interest in what economists call the price elasticity of demand for health care. Specifically, the RAND investigators wanted to know whether and by how much health-care use falls when the price of health care goes up. Families in the free care plan faced a price of zero, while coinsurance plans cut prices to 25% or 50% of costs incurred, and families in the catastrophic coverage and deductible plans paid something close to the sticker price for care, at least until they hit the spending cap. But the investigators also wanted to know whether more comprehensive and more generous health insurance coverage indeed leads to better health. The answer to the first question was a clear “yes”: health-care consumption is highly responsive to the price of care. The answer to the second question is murkier.

Randomized Results

Randomized field experiments are more elaborate than a coin toss, sometimes regrettably so. The HIE was complicated by having many small treatment groups, spread over more than a dozen insurance plans. The treatment groups associated with each plan are mostly too small for comparisons between them to be statistically meaningful. Most analyses of the HIE data therefore start by grouping subjects who were assigned to similar HIE plans together. We do that here as well.8

A natural grouping scheme combines plans by the amount of cost sharing they require. The three catastrophic coverage plans, with subscribers shouldering almost all of their medical expenses up to a fairly high cap, approximate a no-insurance state. The individual deductible plan provided more coverage, but only by reducing the cap on total expenses that plan participants were required to shoulder. The nine coinsurance plans provided more substantial coverage by splitting subscribers’ health-care costs with the insurer, starting with the first dollar of costs incurred. Finally, the free plan constituted a radical intervention that might be expected to generate the largest increase in health-care usage and, perhaps, health. This categorization leads us to four groups of plans: catastrophic, deductible, coinsurance, and free, instead of the 14 original plans. The catastrophic plans provide the (approximate) no-insurance control, while the deductible, coinsurance, and free plans are characterized by increasing levels of coverage.
As with nonexperimental comparisons, a first step in our experimental analysis is to check for balance. Do subjects randomly assigned to treatment and control groups (in this case, to health insurance schemes ranging from little to complete coverage) indeed look similar? We gauge this by comparing demographic characteristics and health data collected before the experiment began. Because demographic characteristics are unchanging, while the health variables in question were measured before random assignment, we expect to see only small differences in these variables across the groups assigned to different plans.
In contrast with our comparison of NHIS respondents’ characteristics by insurance status in Table 1.1, a comparison of characteristics across randomly assigned treatment groups in the RAND experiment shows the people assigned to different HIE plans to be similar. This can be seen in panel A of Table 1.3. Column (1) in this table reports averages for the catastrophic plan group, while the remaining columns compare the groups assigned more generous insurance coverage with the catastrophic control group. As a summary measure, column (5) compares a sample combining subjects in the deductible, coinsurance, and free plans with subjects in the catastrophic plans. Individuals assigned to the plans with more generous coverage are a little less likely to be female and a little less educated than those in the catastrophic plans. We also see some variation in income, but differences between plan groups are mostly small and are as likely to go one way as another. This pattern contrasts with the large and systematic demographic differences between insured and uninsured people seen in the NHIS data summarized in Table 1.1.



The small differences across groups seen in panel A of Table 1.3 seem likely to reflect chance variation that emerges naturally as part of the sampling process. In any statistical sample, chance differences arise because we’re looking at one of many possible draws from the underlying population from which we’ve sampled. A new sample of similar size from the same population can be expected to produce comparisons that are similar, though not identical, to those in the table. The question of how much variation we should expect from one sample to another is addressed by the tools of statistical inference.

TABLE 1.3
Demographic characteristics and baseline health in the RAND HIE

Notes: This table describes the demographic characteristics and baseline health of subjects in the RAND Health Insurance Experiment (HIE). Column (1) shows the average for the group assigned catastrophic coverage. Columns (2)–(5) compare averages in the deductible, cost-sharing, free care, and any insurance groups with the average in column (1). Standard errors are reported in parentheses in columns (2)–(5); standard deviations are reported in brackets in column (1).

The appendix to this chapter briefly explains how to quantify sampling variation with formal statistical tests. Such tests amount to the juxtaposition of differences in sample averages with their standard errors, the numbers in parentheses reported below the differences in averages listed in columns (2)–(5) of Table 1.3. The standard error of a difference in averages is a measure of its statistical precision: when a difference in sample averages is smaller than about two standard errors, the difference is typically judged to be a chance finding compatible with the hypothesis that the populations from which these samples were drawn are, in fact, the same.
Differences that are larger than about two standard errors are said to be statistically significant: in such cases, it is highly unlikely (though not impossible) that these differences arose purely by chance. Differences that are not statistically significant are probably due to the vagaries of the sampling process. The notion of statistical significance helps us interpret comparisons like those in Table 1.3. Not only are the differences in this table mostly small, only two (for proportion female in columns (4) and (5)) are more than twice as large as the associated standard errors. In tables with many comparisons, the presence of a few isolated statistically significant differences is usually also attributable to chance. We also take comfort from the fact that the standard errors in this table are not very big, indicating differences across groups are measured reasonably precisely.
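
The two-standard-error rule of thumb is simple to apply by hand or in code. The sketch below uses invented schooling data, not the HIE sample, and it assumes one common formula for the standard error of a difference in means (the square root of the sum of each group’s sample variance divided by its size); the chapter appendix may use a different variant, such as one that pools the variance across groups. The function names are ours.

```python
from math import sqrt
from statistics import mean, variance


def difference_with_se(group_a, group_b):
    """Difference in sample means and an (unpooled) standard error for it."""
    diff = mean(group_a) - mean(group_b)
    se = sqrt(variance(group_a) / len(group_a)
              + variance(group_b) / len(group_b))
    return diff, se


def looks_like_chance(diff, se, threshold=2.0):
    """True when the difference is smaller than about two standard errors."""
    return abs(diff) < threshold * se


# Hypothetical years of schooling in two randomly assigned groups.
treatment = [12, 14, 11, 16, 12, 13, 12, 15]
control = [12, 13, 12, 16, 11, 14, 12, 14]

diff, se = difference_with_se(treatment, control)
print(f"difference {diff:.3f}, standard error {se:.3f}, "
      f"chance finding: {looks_like_chance(diff, se)}")
```

A balance table like Table 1.3 repeats this comparison for each characteristic and each pair of groups, which is why a few isolated significant differences are unsurprising.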
Panel B of Table 1.3 complements the contrasts in panel A with evidence for reasonably good balance in pre-treatment outcomes across treatment groups. This panel shows no statistically significant differences in a pre-treatment index of general health. Likewise, pre-treatment cholesterol, blood pressure, and mental health appear largely unrelated to treatment assignment, with only a couple of contrasts close to statistical significance. In addition, although lower cholesterol in the free group suggests somewhat better health than in the catastrophic group, differences in the general health index between these two groups go the other way (since lower index values indicate worse health). Lack of a consistent pattern reinforces the notion that these gaps are due to chance.
The first important finding to emerge from the HIE was that subjects assigned to more generous insurance plans used substantially more health care. This finding, which vindicates economists’ view that demand for a good should go up when it gets cheaper, can be seen in panel A of Table 1.4.9 As might be expected, hospital inpatient admissions were less sensitive to price than was outpatient care, probably because admissions decisions are usually made by doctors. On the other hand, assignment to the free care plan raised outpatient spending by two-thirds (169/248) relative to spending by those in catastrophic plans, while total medical expenses increased by 45%. These large gaps are economically important as well as statistically significant.
Subjects who didn’t have to worry about the cost of health care clearly consumed quite a bit more of it. Did this extra care and expense make them healthier? Panel B in Table 1.4, which compares health indicators across HIE treatment groups, suggests not. Cholesterol levels, blood pressure, and summary indices of overall health and mental health are remarkably similar across groups (these outcomes were mostly measured 3 or 5 years after random assignment). Formal statistical tests show no statistically significant differences, as can be seen in the group-specific contrasts (reported in columns (2)–(4)) and in the differences in health between those in a catastrophic plan and everyone in the more generous insurance groups (reported in column (5)).
These HIE findings convinced many economists that generous health insurance can have unintended and undesirable consequences, increasing health-care usage and costs, without generating a dividend in the form of better health.10

TABLE 1.4
Health expenditure and health outcomes in the RAND HIE

Notes: This table reports means and treatment effects for health expenditure and health outcomes in the RAND Health Insurance Experiment (HIE). Column (1) shows the average for the group assigned catastrophic coverage. Columns (2)–(5) compare averages in the deductible, cost-sharing, free care, and any insurance groups with the average in column (1). Standard errors are reported in parentheses in columns (2)–(5); standard deviations are reported in brackets in column (1).

1.2 The Oregon Trail

MASTER KAN: Truth is hard to understand.

KWAI CHANG CAINE: It is a fact, it is not the truth. Truth is often hidden, like a shadow in darkness.
Kung Fu, Season 1, Episode 14



The HIE was an ambitious attempt to assess the impact of health insurance on health-care costs and health. And yet, as far as the contemporary debate over health insurance goes, the HIE might have missed the mark. For one thing, each HIE treatment group had at least catastrophic coverage, so financial liability for health-care costs was limited under every treatment. More importantly, today’s uninsured Americans differ considerably from the HIE population: most of the uninsured are younger, less educated, poorer, and less likely to be working. The value of extra health care in such a group might be very different than for the middle class families that participated in the HIE.
One of the most controversial ideas in the contemporary health policy arena is the expansion of Medicaid to cover the currently uninsured (interestingly, on the eve of the RAND experiment, talk was of expanding Medicare, the public insurance program for America’s elderly). Medicaid now covers families on welfare, some of the disabled, other poor children, and poor pregnant women. Suppose we were to expand Medicaid to cover those who don’t qualify under current rules. How would such an expansion affect health-care spending? Would it shift treatment from costly and crowded emergency departments to possibly more effective primary care? Would Medicaid expansion improve health?
Many American states have begun to “experiment” with Medicaid expansion in the sense that they’ve agreed to broaden eligibility, with the federal government footing most of the bill. Alas, these aren’t real experiments, since everyone who is eligible for expanded Medicaid coverage gets it. The most convincing way to learn about the consequences of Medicaid expansion is to randomly offer Medicaid coverage to people in currently ineligible groups. Random assignment of Medicaid seems too much to hope for. Yet, in an awesome social experiment, the state of Oregon recently offered Medicaid to thousands of randomly chosen people in a publicly announced health insurance lottery.
We can think of Oregon’s health insurance lottery as randomly selecting winners and losers from a pool of registrants, though coverage was not automatic, even for lottery winners. Winners won the opportunity to apply for the state-run Oregon Health Plan (OHP), the Oregon version of Medicaid. The state then reviewed these applications, awarding coverage to Oregon residents who were U.S. citizens or legal immigrants aged 19–64, not otherwise eligible for Medicaid, uninsured for at least 6 months, with income below the federal poverty level, and few financial assets. To initiate coverage, lottery winners had to document their poverty status and submit the required paperwork within 45 days.
The rationale for the 2008 OHP lottery was fairness and not research, but it’s no less awesome for that. The Oregon health insurance lottery provides some of the best evidence we can hope to find on the costs and benefits of insurance coverage for the currently uninsured, a fact that motivated research on OHP by MIT master Amy Finkelstein and her coauthors.11

Roughly 75,000 lottery applicants registered for expanded coverage through the OHP. Of these, almost 30,000 were randomly selected and invited to apply for OHP; these winners constitute the OHP treatment group. The other 45,000 constitute the OHP control sample.
The first question that arises in this context is whether OHP lottery winners were more likely to end up insured as a result of winning. This question is motivated by the fact that some applicants qualified for regular Medicaid even without the lottery. Panel A of Table 1.5 shows that about 14% of controls (lottery losers) were covered by Medicaid in the year following the first OHP lottery. At the same time, the second column, which reports differences between the treatment and control groups, shows that the probability of Medicaid coverage increased by 26 percentage points for lottery winners. Column (4) shows a similar increase for the subsample living in and around Portland, Oregon’s largest city. The upshot is that OHP lottery winners were insured at much higher rates than were lottery losers, a difference that might have affected their use of health care and their health.12

The OHP treatment group (that is, lottery winners) used more health-care services than they otherwise would have. This can also be seen in Table 1.5, which shows estimates of changes in service use in the rows below the estimate of the OHP effect on Medicaid coverage. The hospitalization rate increased by about half a percentage point, a modest though statistically significant effect. Emergency department visits, outpatient visits, and prescription drug use all increased markedly. The fact that the number of emergency department visits rose about 10%, a precisely estimated effect (the standard error associated with this estimate, reported in column (4), is .029), is especially noteworthy. Many policymakers hoped and expected health insurance to shift formerly uninsured patients away from hospital emergency departments toward less costly sources of care.

TABLE 1.5
OHP effects on insurance coverage and health-care use

Notes: This table reports estimates of the effect of winning the Oregon Health Plan (OHP) lottery on insurance coverage and use of health care. Odd-numbered columns show control group averages. Even-numbered columns report the regression coefficient on a dummy for lottery winners. Standard errors are reported in parentheses.

Finally, the proof of the health insurance pudding appears in Table 1.6: lottery winners in the statewide sample report a modest improvement in the probability they assess their health as being good or better (an effect of .039, which can be compared with a control mean of .55; the Health is Good variable is a dummy). Results from in-person interviews conducted in Portland suggest these gains stem more from improved mental rather than physical health, as can be seen in the second and third rows in column (4) (the health variables in the Portland sample are indices ranging from 0 to 100). As in the RAND experiment, results from Portland suggest physical health indicators like cholesterol and blood pressure were largely unchanged by increased access to OHP insurance.

TABLE 1.6
OHP effects on health indicators and financial health

Notes: This table reports estimates of the effect of winning the Oregon Health Plan (OHP) lottery on health indicators and financial health. Odd-numbered columns show control group averages. Even-numbered columns report the regression coefficient on a dummy for lottery winners. Standard errors are reported in parentheses.

The weak health effects of the OHP lottery disappointed policymakers who looked to publicly provided insurance to generate a health dividend for low-income Americans. The fact that health insurance increased rather than decreased expensive emergency department use is especially frustrating. At the same time, panel B of Table 1.6 reveals that health insurance provided the sort of financial safety net for which it was designed. Specifically, households winning the lottery were less likely to have incurred large medical expenses or to have accumulated debt generated by the need to pay for health care. It may be this improvement in financial health that accounts for improved mental health in the treatment group.



It also bears emphasizing that the financial and health effects seen in Table 1.6 most likely come from the 25% of the sample who obtained insurance as a result of the lottery. Adjusting for the fact that insurance status was unchanged for many winners shows that gains in financial security and mental health for the one-quarter of applicants who were insured as a result of the lottery were considerably larger than simple comparisons of winners and losers would suggest. Chapter 3, on instrumental variables methods, details the nature of such adjustments. As you’ll soon see, the appropriate adjustment here amounts to the division of win/loss differences in outcomes by win/loss differences in the probability of insurance. This implies that the effect of being insured is as much as four times larger than the effect of winning the OHP lottery (statistical significance is unchanged by this adjustment).
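
The adjustment just described is a single division. As a rough illustration, the sketch below applies it to two numbers quoted in the text: the .039 win/loss difference in the probability of reporting good health and the roughly 25% of the sample insured as a result of the lottery. The function name `effect_of_insurance` is ours, and the calculation is a back-of-the-envelope sketch, not the estimation procedure detailed in Chapter 3.

```python
def effect_of_insurance(outcome_diff, coverage_diff):
    """Scale a win/loss difference in outcomes by the win/loss difference
    in the probability of being insured."""
    if coverage_diff == 0:
        raise ValueError("winning the lottery must change insurance coverage")
    return outcome_diff / coverage_diff


# Win/loss difference in "Health is Good" (statewide sample, from the text).
effect_of_winning = 0.039
# Approximate share of the sample insured as a result of the lottery.
coverage_gain = 0.25

effect = effect_of_insurance(effect_of_winning, coverage_gain)
print(f"effect of winning:       {effect_of_winning:.3f}")
print(f"effect of being insured: {effect:.3f}")
print(f"scaling factor:          {1 / coverage_gain:.0f}x")
```

Dividing by one-quarter multiplies the win/loss difference by four, which is where the “four times larger” figure comes from.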
The RAND and Oregon findings are remarkably similar. Two ambitious experiments targeting substantially different populations show that the use of health-care services increases sharply in response to insurance coverage, while neither experiment reveals much of an insurance effect on physical health. In 2008, OHP lottery winners enjoyed small but noticeable improvements in mental health. Importantly, and not coincidentally, OHP also succeeded in insulating many lottery winners from the financial consequences of poor health, just as a good insurance policy should. At the same time, these studies suggest that subsidized public health insurance should not be expected to yield a dramatic health dividend.

MASTER JOSHWAY: In a nutshell, please, Grasshopper.

GRASSHOPPER: Causal inference compares potential outcomes, descriptions of the world when alternative roads are taken.

MASTER JOSHWAY: Do we compare those who took one road with those who took another?

GRASSHOPPER: Such comparisons are often contaminated by selection bias, that is, differences between treated and control subjects that exist even in the absence of a treatment effect.

MASTER JOSHWAY: Can selection bias be eliminated?

GRASSHOPPER: Random assignment to treatment and control conditions eliminates selection bias. Yet even in randomized trials, we check for balance.

MASTER JOSHWAY: Is there a single causal truth, which all randomized investigations are sure to reveal?

GRASSHOPPER: I see now that there can be many truths, Master, some compatible, some in contradiction. We therefore take special note when findings from two or more experiments are similar.

Masters of ’Metrics: From Daniel to R. A. Fisher

The value of a control group was revealed in the Old Testament. The Book of Daniel recounts how Babylonian King Nebuchadnezzar decided to groom Daniel and other Israelite captives for his royal service. As slavery goes, this wasn’t a bad gig, since the king ordered his captives be fed “food and wine from the king’s table.” Daniel was uneasy about the rich diet, however, preferring modest vegetarian fare. The king’s chamberlains initially refused Daniel’s special meals request, fearing that his diet would prove inadequate for one called on to serve the king. Daniel, not without chutzpah, proposed a controlled experiment: “Test your servants for ten days. Give us nothing but vegetables to eat and water to drink. Then compare our appearance with that of the young men who eat the royal food, and treat your servants in accordance with what you see” (Daniel 1, 12–13). The Bible recounts how this experiment supported Daniel’s conjecture regarding the relative healthfulness of a vegetarian diet, though as far as we know Daniel himself didn’t get an academic paper out of it.
Nutrition is a recurring theme in the quest for balance. Scurvy, a debilitating disease caused by vitamin C deficiency, was the scourge of the British Navy. In 1742, James Lind, a surgeon on HMS Salisbury, experimented with a cure for scurvy. Lind chose 12 seamen with scurvy and started them on an identical diet. He then formed six pairs and treated each of the pairs with a different supplement to their daily food ration. One of the additions was an extra two oranges and one lemon (Lind believed an acidic diet might cure scurvy). Though Lind did not use random assignment, and his sample was small by our standards, he was a pioneer in that he chose his 12 study members so they were “as similar as I could have them.” The citrus eaters, Britain’s first limeys, were quickly and incontrovertibly cured, a life-changing empirical finding that emerged from Lind’s data even though his theory was wrong.13

Almost 150 years passed between Lind and the first recorded use of experimental random assignment. This was by Charles Peirce, an American philosopher and scientist, who experimented with subjects’ ability to detect small differences in weight. In a less-than-fascinating but methodologically significant 1885 publication, Peirce and his student Joseph Jastrow explained how they varied experimental conditions according to draws from a pile of playing cards.14

The idea of a randomized controlled trial emerged in earnest only at the beginning of the twentieth century, in the work of statistician and geneticist Sir Ronald Aylmer Fisher, who analyzed data from agricultural experiments. Experimental random assignment features in Fisher’s 1925 Statistical Methods for Research Workers and is detailed in his landmark The Design of Experiments, published in 1935.15

Fisher had many fantastically good ideas and a few bad ones. In addition to explaining the value of random assignment, he invented the statistical method of maximum likelihood. Along with ’metrics master Sewall Wright (and J.B.S. Haldane), he launched the field of theoretical population genetics. But he was also a committed eugenicist and a proponent of forced sterilization (as was regression master Sir Francis Galton, who coined the term “eugenics”). Fisher, a lifelong pipe smoker, was also on the wrong side of the debate over smoking and health, due in part to his strongly held belief that smoking and lung cancer share a common genetic origin. The negative effect of smoking on health now seems well established, though Fisher was right to worry about selection bias in health research. Many lifestyle choices, such as low-fat diets and vitamins, have been shown to be unrelated to health outcomes when evaluated with random assignment.

Appendix: Mastering Inference

YOUNG CAINE: I am puzzled.

MASTER PO: That is the beginning of wisdom.
Kung Fu, Season 2, Episode 25

This is the first of a number of appendices that fill in key econometric and statistical details. You can spend your life studying statistical inference; many masters do. Here we offer a brief sketch of essential ideas and basic statistical tools, enough to understand tables like those in this chapter.
The HIE is based on a sample of participants drawn (more or less) at random from the population eligible for the experiment. Drawing another sample from the same population, we’d get somewhat different results, but the general picture should be similar if the sample is large enough for the LLN to kick in. How can we decide whether statistical results constitute strong evidence or merely a lucky draw, unlikely to be replicated in repeated samples? How much sampling variance should we expect? The tools of formal statistical inference answer these questions. These tools work for all of the econometric strategies of concern to us. Quantifying sampling uncertainty is a necessary step in any empirical project and on the road to understanding statistical claims made by others. We explain the basic inference idea here in the context of HIE treatment effects.
The task at hand is the quantification of the uncertainty associated with a particular sample average and, especially, groups of averages and the differences among them. For example, we’d like to know if the large differences in health-care expenditure across HIE treatment groups can be discounted as a chance finding. The HIE samples were drawn from a much larger data set that we think of as covering the population of interest. The HIE population consists of all families eligible for the experiment (too young for Medicare and so on). Instead of studying the many millions of such families, a much smaller group of about 2,000 families (containing about 4,000 people) was selected at random and then randomly allocated to one of 14 plans or treatment groups. Note that there are two sorts of randomness at work here: the first pertains to the construction of the study sample and the second to how treatment was allocated to those who were sampled. Random sampling and random assignment are closely related but distinct ideas.

A World without Bias
We first quantify the uncertainty induced by random sampling, beginning with a single sample average, say, the average health of everyone in the sample at hand, as measured by a health index. Our target is the corresponding population average health index, that is, the mean over everyone in the population of interest. As we noted on p. 14, the population mean of a variable is called its mathematical expectation, or just expectation for short. For the expectation of a variable, Y_i, we write E[Y_i]. Expectation is intimately related to formal notions of probability. Expectations can be written as a weighted average of all possible values that the variable Y_i can take on, with weights given by the probability these values appear in the population. In our dice-throwing example, these weights are equal and given by 1/6 (see Section 1.1).
Unlike our notation for averages, the symbol for expectation does not reference the sample size. That’s because expectations are population quantities, defined without reference to a particular sample of individuals. For a given population, there is only one E[Y_i], while there are many Avg_n[Y_i], depending on how we choose n and just who ends up in our sample. Because E[Y_i] is a fixed feature of a particular population, we call it a parameter. Quantities that vary from one sample to another, such as the sample average, are called sample statistics.
At this point, it’s helpful to switch from Avg_n[Y_i] to a more compact notation for averages, Ȳ. Note that we’re dispensing with the subscript n to avoid clutter; henceforth, it’s on you to remember that sample averages are computed in a sample of a particular size. The sample average, Ȳ, is a good estimator of E[Y_i] (in statistics, an estimator is any function of sample data used to estimate parameters). For one thing, the LLN tells us that in large samples, the sample average is likely to be very close to the corresponding population mean. A related property is that the expectation of Ȳ is also E[Y_i]. In other words, if we were to draw infinitely many random samples, the average of the resulting Ȳ across draws would be the underlying population mean. When a sample statistic has expectation equal to the corresponding population parameter, it’s said to be an unbiased estimator of that parameter. Here’s the sample mean’s unbiasedness property stated formally:

UNBIASEDNESS OF THE SAMPLE MEAN
$$E[\bar{Y}] = E[Y_i]$$

The sample mean should not be expected to be bang on the corresponding population mean: the sample average in one sample might be too big, while in other samples it will be too small. Unbiasedness tells us that these deviations are not systematically up or down; rather, in repeated samples they average out to zero. This unbiasedness property is distinct from the LLN, which says that the sample mean gets closer and closer to the population mean as the sample size grows. Unbiasedness of the sample mean holds for samples of any size.
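A small simulation can make unbiasedness concrete. The sketch below is ours, not the authors’: it borrows the dice-throwing example, uses a deliberately tiny sample size of 5 (an arbitrary choice), and replaces "infinitely many" samples with 20,000 seeded draws.

```python
import random
import statistics

POPULATION_MEAN = 3.5  # E[Y_i] for one fair die: (1+2+3+4+5+6)/6

def sample_mean_of_dice(n, rng):
    """Average of n fair-die rolls: one draw of the sample mean."""
    return statistics.fmean(rng.randint(1, 6) for _ in range(n))

rng = random.Random(12345)  # fixed seed so the sketch is reproducible
n = 5                       # deliberately small: unbiasedness holds for any n
draws = [sample_mean_of_dice(n, rng) for _ in range(20_000)]

# Individual sample means miss the target in both directions...
share_above = sum(m > POPULATION_MEAN for m in draws) / len(draws)
share_below = sum(m < POPULATION_MEAN for m in draws) / len(draws)
# ...but their average across repeated samples sits close to E[Y_i].
avg_of_means = statistics.fmean(draws)
print(avg_of_means, share_above, share_below)
```

Individual draws land above and below 3.5 in roughly equal shares, while their average across draws is close to 3.5 even though each sample has only five observations.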

Measuring Variability
In addition to averages, we’re interested in variability. To gauge variability, it’s customary to look at average squared deviations from the mean, in which positive and negative gaps get equal weight. The resulting summary of variability is called variance.
The sample variance of Y_i in a sample of size n is defined as

$$S(Y_i)^2 = \frac{1}{n}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2.$$

The corresponding population variance replaces averages with expectations, giving:

$$V(Y_i) = E\left[\left(Y_i - E[Y_i]\right)^2\right].$$

Like E[Y_i], the quantity V(Y_i) is a fixed feature of a population, a parameter. It’s therefore customary to christen it in Greek: σ_Y^2, which is read as “sigma-squared-y.”16

Because variances square the data they can be very large. Multiply a variable by 10 and its variance goes up by 100. Therefore, we often describe variability using the square root of the variance: this is called the standard deviation, written σ_Y. Multiply a variable by 10 and its standard deviation increases by 10. As always, the population standard deviation, σ_Y, has a sample counterpart S(Y_i), the square root of S(Y_i)^2.
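The scaling claims are easy to check numerically. In this sketch the four data values are made up for illustration; `statistics.pvariance` and `statistics.pstdev` divide by n, matching the definition of sample variance used in the text, while `statistics.variance` divides by n − 1.

```python
import statistics

# Made-up data, for illustration only.
y = [1.0, 2.0, 4.0, 7.0]
y_times_10 = [10 * v for v in y]

# pvariance/pstdev divide by n, as in the definition of S(Y_i)^2 above.
var_y = statistics.pvariance(y)
var_y10 = statistics.pvariance(y_times_10)
sd_y = statistics.pstdev(y)
sd_y10 = statistics.pstdev(y_times_10)

print(var_y10 / var_y)  # variance scales with the square of the factor
print(sd_y10 / sd_y)    # standard deviation scales with the factor itself

# The divide-by-(n - 1) variant is slightly larger than the divide-by-n one.
var_y_n_minus_1 = statistics.variance(y)
print(var_y, var_y_n_minus_1)
```

Reporting the standard deviation keeps the measure of spread in the same units as the variable itself, which is why it is usually easier to read than the variance.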
Variance is a descriptive fact about the distribution of Y_i. (Reminder: the distribution of a variable is the set of values the variable takes on and the relative frequency that each value is observed in the population or generated by a random process.) Some variables take on a narrow set of values (like a dummy variable indicating families with health insurance), while others (like income) tend to be spread out with some very high values mixed in with many smaller ones.
It’s important to document the variability of the variables you’re working with. Our goal here, however, goes beyond this. We’re interested in quantifying the variance of the sample mean in repeated samples. Since the expectation of the sample mean is E[Y_i] (from the unbiasedness property), the population variance of the sample mean can be written as

$$V(\bar{Y}) = E\left[\left(\bar{Y} - E[\bar{Y}]\right)^2\right] = E\left[\left(\bar{Y} - E[Y_i]\right)^2\right].$$

The variance of a statistic like the sample mean is distinct from the variance used for descriptive purposes. We write V(Ȳ) for the variance of the sample mean, while V(Y_i) (or σ_Y^2) denotes the variance of the underlying data. Because the quantity V(Ȳ) measures the variability of a sample statistic in repeated samples, as opposed to the dispersion of raw data, V(Ȳ) has a special name: sampling variance.
Sampling variance is related to descriptive variance, but, unlike descriptive variance, sampling variance is also determined by sample size. We show this by simplifying the formula for V(Ȳ). Start by substituting the formula for Ȳ inside the notation for variance:

$$V(\bar{Y}) = V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right).$$

To simplify this expression, we first note that random sampling ensures the individual observations in a sample are not systematically related to one another; in other words, they are statistically independent. This important property allows us to take advantage of the fact that the variance of a sum of statistically independent observations, each drawn randomly from the same population, is the sum of their variances. Moreover, because each Y_i is sampled from the same population, each draw has the same variance, σ_Y^2. Finally, we use the property that the variance of a constant (like 1/n) times Y_i is the square of this constant times the variance of Y_i. From these considerations, we get

$$V(\bar{Y}) = V\left(\frac{1}{n}\sum_{i=1}^{n} Y_i\right) = \frac{1}{n^2}\sum_{i=1}^{n}\sigma_Y^2.$$

Simplifying further, we have

$$V(\bar{Y}) = \frac{1}{n^2}\, n\,\sigma_Y^2 = \frac{\sigma_Y^2}{n}. \qquad (1.5)$$

We’ve shown that the sampling variance of a sample average depends on the variance of the underlying observations, σ_Y^2, and the sample size, n. As you might have guessed, more data means less dispersion of sample averages in repeated samples. In fact, when the sample size is very large, there’s almost no dispersion at all, because when n is large, σ_Y^2/n is small. This is the LLN at work: as n approaches infinity, the sample average approaches the population mean, and sampling variance disappears.
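The result that sampling variance equals the variance of the underlying data divided by n can be checked by simulation. The sketch below is an illustration under our own assumptions: it uses a dummy variable equal to one 80% of the time (so the population variance is .8 × .2 = .16), three arbitrary sample sizes, and 5,000 seeded repetitions per size.

```python
import random
import statistics

def simulated_sampling_variance(n, reps, p, rng):
    """Variance, across repeated samples of size n, of the sample mean
    of a 0/1 variable that equals 1 with probability p."""
    means = []
    for _ in range(reps):
        ones = sum(rng.random() < p for _ in range(n))
        means.append(ones / n)
    return statistics.pvariance(means)

rng = random.Random(99)   # fixed seed for reproducibility
p = 0.8
sigma2 = p * (1 - p)      # population variance of the dummy variable

results = {}
for n in (10, 40, 100):
    simulated = simulated_sampling_variance(n, 5_000, p, rng)
    theoretical = sigma2 / n
    results[n] = (simulated, theoretical)
    print(n, simulated, theoretical)
```

The simulated and theoretical columns should track each other closely, and both shrink as n grows, which is the LLN at work.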
In practice, we often work with the standard deviation of the sample mean rather than its variance. The standard deviation of a statistic like the sample average is called its standard error. The standard error of the sample mean can be written as

$$SE(\bar{Y}) = \frac{\sigma_Y}{\sqrt{n}}. \qquad (1.6)$$

Every estimate discussed in this book has an associated standard error. This includes sample means (for which the standard error formula appears in equation (1.6)), differences in sample means (discussed later in this appendix), regression coefficients (discussed in Chapter 2), and instrumental variables and other more sophisticated estimates. Formulas for standard errors can get complicated, but the idea remains simple. The standard error summarizes the variability in an estimate due to random sampling. Again, it’s important to avoid confusing standard errors with the standard deviations of the underlying variables; the two quantities are intimately related yet measure different things.
One last step on the road to standard errors: most population quantities, including the standard deviation in the numerator of (1.6), are unknown and must be estimated. In practice, therefore, when quantifying the sampling variance of a sample mean, we work with an estimated standard error. This is obtained by replacing σ_Y with S(Y_i) in the formula for SE(Ȳ). Specifically, the estimated standard error of the sample mean can be written as

$$\widehat{SE}(\bar{Y}) = \frac{S(Y_i)}{\sqrt{n}}.$$

We often forget the qualifier “estimated” when discussing statistics and their standard errors, but that’s still what we have in mind. For example, the numbers in parentheses in Table 1.4 are estimated standard errors for the relevant differences in means.
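The distinction between a standard deviation and a standard error shows up clearly when the sample size changes. The following sketch, which uses simulated die rolls and sample sizes of our own choosing, computes S(Y_i) and the estimated standard error S(Y_i)/√n side by side.

```python
import math
import random
import statistics

def sd_and_se(y):
    """Return (S(Y_i), estimated standard error of the sample mean),
    with S(Y_i) computed by dividing by n, as in the text."""
    s = statistics.pstdev(y)
    return s, s / math.sqrt(len(y))

rng = random.Random(2024)  # fixed seed for reproducibility
summary = {}
for n in (25, 100, 400):
    sample = [rng.randint(1, 6) for _ in range(n)]  # n fair-die rolls
    summary[n] = sd_and_se(sample)
    print(n, summary[n])
```

The standard deviation describes the spread of the raw data and stays in the same neighborhood at every sample size, while the standard error describes the precision of the sample mean and roughly halves each time n is quadrupled.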

The t-Statistic and the Central Limit Theorem

Having laid out a simple scheme to measure variability using standard errors, it remains to interpret this measure. The simplest interpretation uses a t-statistic. Suppose the data at hand come from a distribution for which we believe the population mean, E[Y_i], takes on a particular value, μ (read this Greek letter as “mu”). This value constitutes a working hypothesis. A t-statistic for the sample mean under the working hypothesis that E[Y_i] = μ is constructed as

$$t(\mu) = \frac{\bar{Y} - \mu}{\widehat{SE}(\bar{Y})}.$$

The working hypothesis is a reference point that is often called the null hypothesis. When the null hypothesis is μ = 0, the t-statistic is the ratio of the sample mean to its estimated standard error.
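As a concrete illustration, the t-statistic can be computed in a few lines. The data below are made up, and the helper name `t_statistic` is ours; the estimated standard error divides the divide-by-n sample standard deviation by √n, as in the text.

```python
import math
import statistics

def t_statistic(y, mu):
    """t(mu): the gap between the sample mean and the hypothesized mean,
    measured in units of the estimated standard error."""
    n = len(y)
    y_bar = statistics.fmean(y)
    se_hat = statistics.pstdev(y) / math.sqrt(n)  # S(Y_i) / sqrt(n)
    return (y_bar - mu) / se_hat

# Made-up data, for illustration only.
y = [1.0, 2.0, 4.0, 7.0]

t_null_zero = t_statistic(y, 0.0)     # ratio of sample mean to its standard error
t_null_at_mean = t_statistic(y, 3.5)  # null equal to the sample mean gives zero
print(t_null_zero, t_null_at_mean)
```

With only four observations this is purely a check on the arithmetic; the interpretation of t-statistics discussed next relies on samples large enough for the Central Limit Theorem to apply.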
Many people think the science of statistical inference is boring, but in fact it’s nothing short of miraculous. One miraculous statistical fact is that if E[Y_i] is indeed equal to μ, then (as long as the sample is large enough) the quantity t(μ) has a sampling distribution that is very close to a bell-shaped standard normal distribution, sketched in Figure 1.1. This property, which applies regardless of whether Y_i itself is normally distributed, is called the Central Limit Theorem (CLT). The CLT allows us to make an empirically informed decision as to whether the available data support or cast doubt on the hypothesis that E[Y_i] equals μ.

FIGURE 1.1
A standard normal distribution

The CLT is an astonishing and powerful result. Among other things, it implies that the (large-sample) distribution of a t-statistic is independent of the distribution of the underlying data used to calculate it. For example, suppose we measure health status with a dummy variable distinguishing healthy people from sick and that 20% of the population is sick. The distribution of this dummy variable has two spikes, one of height .8 at the value 1 and one of height .2 at the value 0. The CLT tells us that with enough data, the distribution of the t-statistic is smooth and bell-shaped even though the distribution of the underlying data has only two values.
We can see the CLT in action through a sampling experiment. In sampling experiments, we use the random number generator in our computer to draw random samples of different sizes over and over again. We did this for a dummy variable that equals one 80% of the time and for samples of size 10, 40, and 100. For each sample size, we calculated the t-statistic in half a million random samples using .8 as our value of μ.
Figures 1.2–1.4 plot the distribution of 500,000 t-statistics calculated for each of the three sample sizes in our experiment, with the standard normal distribution superimposed. With only 10 observations, the sampling distribution is spiky, though the outlines of a bell-shaped curve also emerge. As the sample size increases, the fit to a normal distribution improves. With 100 observations, the standard normal is just about bang on.
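A scaled-down version of this sampling experiment is sketched below. It is our own reconstruction, not the authors’ code: it uses 10,000 repetitions per sample size rather than half a million, summarizes each distribution by the share of t-statistics beyond 2 in absolute value instead of plotting it, and skips the occasional sample in which every observation is identical (the t-statistic is undefined there; the text does not say how such samples were treated).

```python
import math
import random

def t_stat_for_dummy(n, p, mu, rng):
    """t(mu) for one random sample of a 0/1 variable with P(1) = p.
    Returns None when all observations are equal (zero standard error)."""
    ones = sum(rng.random() < p for _ in range(n))
    if ones == 0 or ones == n:
        return None
    y_bar = ones / n
    # For a 0/1 variable, the divide-by-n sample variance is y_bar * (1 - y_bar).
    s = math.sqrt(y_bar * (1 - y_bar))
    return (y_bar - mu) / (s / math.sqrt(n))

def share_beyond_two(n, reps=10_000, p=0.8, seed=0):
    """Share of defined t-statistics larger than 2 in absolute value."""
    rng = random.Random(seed)
    t_stats = [t_stat_for_dummy(n, p, p, rng) for _ in range(reps)]
    t_stats = [t for t in t_stats if t is not None]
    return sum(abs(t) > 2 for t in t_stats) / len(t_stats)

shares = {n: share_beyond_two(n) for n in (10, 40, 100)}
for n, share in shares.items():
    print(n, round(share, 3))
```

If the normal approximation is good, the reported share should be in the neighborhood of 5%; comparing the three sample sizes gives a rough numerical counterpart to the visual comparison in Figures 1.2–1.4.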
The standard normal distribution has a mean of 0 and standard deviation of 1. With any standard normal variable, values larger than 2 in absolute value are unlikely. In fact, realizations larger than 2 in absolute value appear only about 5% of the time. Because the t-statistic is close to normally distributed, we similarly expect it to fall between about −2 and +2 most of the time. Therefore, it’s customary to judge any t-statistic larger than about 2 (in absolute value) as too unlikely to be consistent with the null hypothesis used to construct it. When the null hypothesis is μ = 0 and the t-statistic exceeds 2 in absolute value, we say the sample mean is significantly different from zero. Otherwise, it’s not. Similar language is used for other values of μ as well.

FIGURE 1.2
The distribution of the t-statistic for the mean in a sample of size 10

Note: This figure shows the distribution of the sample mean of a dummy variable that equals 1 with probability .8.

FIGURE 1.3
The distribution of the t-statistic for the mean in a sample of size 40

Note: This figure shows the distribution of the sample mean of a dummy variable that equals 1 with probability .8.

FIGURE 1.4
The distribution of the t-statistic for the mean in a sample of size 100

Note: This figure shows the distribution of the sample mean of a dummy variable that equals 1 with probability .8.

We might also turn the question of statistical significance on its side: instead of checking whether the sample is consistent with a specific value of μ, we can construct the set of all values of μ that are consistent with the data. The set of such values is called a confidence interval for E[Y_i]. When calculated in repeated samples, the interval

$$\left[\bar{Y} - 2 \times \widehat{SE}(\bar{Y}),\ \bar{Y} + 2 \times \widehat{SE}(\bar{Y})\right]$$

should contain E[Y_i] about 95% of the time. This interval is therefore said to be a 95% confidence interval for the population mean. By describing the set of parameter values consistent with our data, confidence intervals provide a compact summary of the information these data contain about the population from which they were sampled.
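The "about 95% of the time" claim refers to behavior across repeated samples, which a simulation can mimic. The sketch below is ours: it builds the interval of the sample mean plus or minus two estimated standard errors for 5,000 seeded samples of 100 die rolls each (both numbers are arbitrary) and counts how often the interval contains the known population mean of 3.5.

```python
import math
import random

def two_se_interval(y):
    """Sample mean plus or minus two estimated standard errors."""
    n = len(y)
    y_bar = sum(y) / n
    s = math.sqrt(sum((v - y_bar) ** 2 for v in y) / n)  # divide-by-n S(Y_i)
    half_width = 2 * s / math.sqrt(n)
    return y_bar - half_width, y_bar + half_width

rng = random.Random(7)    # fixed seed for reproducibility
population_mean = 3.5     # E[Y_i] for a fair die
reps, n = 5_000, 100

covered = 0
for _ in range(reps):
    low, high = two_se_interval([rng.randint(1, 6) for _ in range(n)])
    covered += low <= population_mean <= high

coverage = covered / reps
print(coverage)
```

Any single interval either contains the population mean or it doesn’t; the 95% figure describes the share of intervals that do across many hypothetical samples.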

Pairing Off
One sample average is the loneliest number that you’ll ever do. Luckily, we’re usually concerned with two. We’re especially keen to compare averages for subjects in experimental treatment and control groups. We reference these averages with a compact notation, writing Ȳ^1 for Avg_n[Y_i|D_i = 1] and Ȳ^0 for Avg_n[Y_i|D_i = 0]. The treatment group mean, Ȳ^1, is the average for the n_1 observations belonging to the treatment group, with Ȳ^0 defined similarly. The total sample size is n = n_0 + n_1.
For our purposes, the difference between Ȳ^1 and Ȳ^0 is either an estimate of the causal effect of treatment (if Y_i is an outcome), or a check on balance (if Y_i is a covariate). To keep the discussion focused, we’ll assume the former. The most important null hypothesis in this context is that treatment has no effect, in which case the two samples used to construct treatment and control averages come from the same population. On the other hand, if treatment changes outcomes, the populations from which treatment and control observations are drawn are necessarily different. In particular, they have different means, which we denote μ^1 and μ^0.
We decide whether the evidence favors the hypothesis that μ^1 = μ^0 by looking for statistically significant differences in the corresponding sample averages. Statistically significant results provide strong evidence of a treatment effect, while results that fall short of statistical significance are consistent with the notion that the observed difference in treatment and control means is a chance finding. The expression “chance finding” in this context means that in a hypothetical experiment involving very large samples (so large that any sampling variance is effectively eliminated) we’d find treatment and control means to be the same.
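Statistical significance is determined by the appropriate t-statistic. A key ingredient in any t recipe is the standard error that lives downstairs in the t ratio. The standard error for a comparison of means is the square root of the sampling variance of Ȳ^1 − Ȳ^0. Using the fact that the variance of a difference between two statistically independent variables is the sum of their variances, we have

$$V\left(\bar{Y}^1 - \bar{Y}^0\right) = V\left(\bar{Y}^1\right) + V\left(\bar{Y}^0\right) = \frac{\sigma_Y^2}{n_1} + \frac{\sigma_Y^2}{n_0} = \sigma_Y^2\left[\frac{1}{n_1} + \frac{1}{n_0}\right].$$

The second equality here uses equation (1.5), which gives the sampling variance of a single average. The standard error we need is therefore

$$SE\left(\bar{Y}^1 - \bar{Y}^0\right) = \sigma_Y\sqrt{\frac{1}{n_1} + \frac{1}{n_0}}.$$
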
In deriving this expression, we’ve assumed that the variances of individual observations are the same in treatment and control groups. This assumption allows us to use one symbol, σ_Y^2, for the common variance. A slightly more complicated formula allows variances to differ across groups even if the means are the same (an idea taken up again in the discussion of robust regression standard errors in the appendix to Chapter 2).17

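Recognizing that σ_Y^2 must be estimated, in practice we work with the estimated standard error

$$\widehat{SE}\left(\bar{Y}^1 - \bar{Y}^0\right) = S(Y_i)\sqrt{\frac{1}{n_1} + \frac{1}{n_0}},$$

where S(Y_i) is the pooled sample standard deviation. This is the sample standard deviation calculated using data from both treatment and control groups combined.
Under the null hypothesis that μ^1 − μ^0 is equal to the value μ, the t-statistic for a difference in means is

$$t(\mu) = \frac{\bar{Y}^1 - \bar{Y}^0 - \mu}{\widehat{SE}\left(\bar{Y}^1 - \bar{Y}^0\right)}.$$
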
We use this t-statistic to test working hypotheses about μ^1 − μ^0 and to construct confidence intervals for this difference. When the null hypothesis is one of equal means (μ = 0), the statistic t(μ) equals the difference in sample means divided by the estimated standard error of this difference. When the t-statistic is large enough to reject a difference of zero, we say the estimated difference is statistically significant. The confidence interval for a difference in means is the difference in sample means plus or minus two standard errors.
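The recipe for comparing two group means can be collected into one short function. This is a sketch under the equal-variance assumption used in the text; the function name and the two small data sets are hypothetical, and the pooled standard deviation is computed, as described above, from treatment and control observations combined.

```python
import math
import statistics

def difference_in_means(y1, y0, mu=0.0):
    """Difference in means, its estimated standard error, t(mu), and the
    plus-or-minus-two-standard-errors confidence interval."""
    n1, n0 = len(y1), len(y0)
    diff = statistics.fmean(y1) - statistics.fmean(y0)
    # Pooled S(Y_i): standard deviation of both groups' data combined.
    pooled_sd = statistics.pstdev(list(y1) + list(y0))
    se_hat = pooled_sd * math.sqrt(1 / n1 + 1 / n0)
    t = (diff - mu) / se_hat
    interval = (diff - 2 * se_hat, diff + 2 * se_hat)
    return diff, se_hat, t, interval

# Hypothetical outcomes for treated and control subjects (illustration only).
treated = [3.0, 5.0, 7.0, 9.0]
control = [1.0, 3.0, 5.0, 7.0]

diff, se_hat, t, interval = difference_in_means(treated, control)
print(diff, se_hat, t, interval)
```

In this made-up example the t-statistic is below 2 and the confidence interval includes zero, so the difference would not be judged statistically significant despite the gap in sample means.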
Bear in mind that t-statistics and confidence intervals have little to say about whether findings are substantively large or small. A large t-statistic arises when the estimated effect of interest is large but also when the associated standard error is small (as happens when you’re blessed with a large sample). Likewise, the width of a confidence interval is determined by statistical precision as reflected in standard errors and not by the magnitude of the relationships you’re trying to uncover. Conversely, t-statistics may be small either because the difference in the estimated averages is small or because the standard error of this difference is large. The fact that an estimated difference is not significantly different from zero need not imply that the relationship under investigation is small or unimportant. Lack of statistical significance often reflects lack of statistical precision, that is, high sampling variance. Masters are mindful of this fact when discussing econometric results.

1 For more on this surprising fact, see Jonathan Gruber, “Covering the Uninsured in the United States,” Journal of Economic Literature, vol. 46, no. 3, September 2008, pages 571–606.
2 Our sample is aged 26–59 and therefore does not yet qualify for Medicare.
3 An Empirical Notes section after the last chapter gives detailed notes for this table and most of the other tables and figures in the book.
4 Robert Frost’s insights notwithstanding, econometrics isn’t poetry. A modicum of mathematical notation allows us to describe and discuss subtle relationships precisely. We also use italics to introduce repeatedly used terms, such as potential outcomes, that have special meaning for masters of ’metrics.
5 Order the n observations on Y_i so that the n_0 observations from the group indicated by D_i = 0 precede the n_1 observations from the D_i = 1 group. The conditional average

$$Avg_n[Y_i \mid D_i = 0] = \frac{1}{n_0}\sum_{i=1}^{n_0} Y_i$$

is the sample average for the n_0 observations in the D_i = 0 group. The term Avg_n[Y_i|D_i = 1] is calculated analogously from the remaining n_1 observations.
6 Six-sided cubes with one to six dots engraved on each side. There’s an app for ’em on your smartphone.
7 Our description of the HIE follows Robert H. Brook et al., “Does Free Care Improve Adults’ Health? Results from a Randomized Controlled Trial,” New England Journal of Medicine, vol. 309, no. 23, December 8, 1983, pages 1426–1434. See also Aviva Aron-Dine, Liran Einav, and Amy Finkelstein, “The RAND Health Insurance Experiment, Three Decades Later,” Journal of Economic Perspectives, vol. 27, Winter 2013, pages 197–222, for a recent assessment.
8 Other HIE complications include the fact that instead of simply tossing a coin (or the computer equivalent), RAND investigators implemented a complex assignment scheme that potentially affects the statistical properties of the resulting analyses (for details, see Carl Morris, “A Finite Selection Model for Experimental Design of the Health Insurance Study,” Journal of Econometrics, vol. 11, no. 1, September 1979, pages 43–61). Intentions here were good, in that the experimenters hoped to insure themselves against chance deviation from perfect balance across treatment groups. Most HIE analysts ignore the resulting statistical complications, though many probably join us in regretting this attempt to gild the random assignment lily. A more serious problem arises from the large number of HIE subjects who dropped out of the experiment and the large differences in attrition rates across treatment groups (fewer left the free plan, for example). As noted by Aron-Dine, Einav, and Finkelstein, “The RAND Experiment,” Journal of Economic Perspectives, 2013, differential attrition may have compromised the experiment’s validity. Today’s “randomistas” do better on such nuts-and-bolts design issues (see, for example, the experiments described in Abhijit Banerjee and Esther Duflo, Poor Economics: A Radical Rethinking of the Way to Fight Global Poverty, Public Affairs, 2011).
9 The RAND results reported here are based on our own tabulations from the HIE public use file, as described in the Empirical Notes section at the end of the book. The original RAND results are summarized in Joseph P. Newhouse et al., Free for All? Lessons from the RAND Health Insurance Experiment, Harvard University Press, 1994.
10 Participants in the free plan had slightly better corrected vision than those in the other plans; see Brook et al., “Does Free Care Improve Health?” New England Journal of Medicine, 1983, for details.
11 See Amy Finkelstein et al., “The Oregon Health Insurance Experiment: Evidence from the First Year,” Quarterly Journal of Economics, vol. 127, no. 3, August 2012, pages 1057–1106; Katherine Baicker et al., “The Oregon Experiment—Effects of Medicaid on Clinical Outcomes,” New England Journal of Medicine, vol. 368, no. 18, May 2, 2013, pages 1713–1722; and Sarah Taubman et al., “Medicaid Increases Emergency Department Use: Evidence from Oregon’s Health Insurance Experiment,” Science, vol. 343, no. 6168, January 17, 2014, pages 263–268.
12 Why weren’t all OHP lottery winners insured? Some failed to submit the required paperwork on time, while about half of those who did complete the necessary forms in a timely fashion turned out to be ineligible on further review.
13 Lind’s experiment is described in Duncan P. Thomas, “Sailors, Scurvy, and Science,” Journal of the Royal Society of Medicine, vol. 90, no. 1, January 1997, pages 50–54.
14 Charles S. Peirce and Joseph Jastrow, “On Small Differences in Sensation,” Memoirs of the National Academy of Sciences, vol. 3, 1885, pages 75–83.
15 Ronald A. Fisher, Statistical Methods for Research Workers, Oliver and Boyd, 1925, and Ronald A. Fisher, The Design of Experiments, Oliver and Boyd, 1935.
16 Sample variances tend to underestimate population variances. Sample variance is therefore sometimes defined as

$$S(Y_i)^2 = \frac{1}{n-1}\sum_{i=1}^{n}\left(Y_i - \bar{Y}\right)^2,$$

that is, dividing by n − 1 instead of by n. This modified formula provides an unbiased estimate of the corresponding population variance.
17 Using separate variances for treatment and control observations, we have

$$SE\left(\bar{Y}^1 - \bar{Y}^0\right) = \sqrt{\frac{V^1(Y_i)}{n_1} + \frac{V^0(Y_i)}{n_0}},$$

where V^1(Y_i) is the variance of treated observations, and V^0(Y_i) is the variance of control observations.

Chapter 2

Regression

KWAI CHANG CAINE: A worker is known by his tools. A shovel for a man who digs. An ax for a woodsman. The econometrician runs regressions.
Kung Fu, Season 1, Episode 8

Our Path

When the path to random assignment is blocked, we look for alternate routes to causal knowledge. Wielded skillfully, ’metrics tools other than random assignment can have much of the causality-revealing power of a real experiment. The most basic of these tools is regression, which compares treatment and control subjects who have the same observed characteristics. Regression concepts are foundational, paving the way for the more elaborate tools used in the chapters that follow. Regression-based causal inference is predicated on the assumption that when key observed variables have been made equal across treatment and control groups, selection bias from the things we can’t see is also mostly eliminated. We illustrate this idea with an empirical investigation of the economic returns to attendance at elite private colleges.

2.1 A Tale of Two Colleges

Students who attended a private four-year college in America paid an average of about $29,000 in tuition and fees in the 2012–2013 school year. Those who went to a public university in their home state paid less than $9,000. An elite private education might be better in many ways: the classes smaller, the athletic facilities newer, the faculty more distinguished, and the students smarter. But $20,000 per year of study is a big difference. It makes you wonder whether the difference is worth it.
The apples-to-apples question in this case asks how much a 40-year-old Massachusetts-born graduate of, say, Harvard, would have earned if he or she had gone to the University of Massachusetts (U-Mass) instead. Money isn’t everything, but, as Groucho Marx observed: “Money frees you from doing things you dislike. Since I dislike doing nearly everything, money is handy.” So when we ask whether the private school tuition premium is worth paying, we focus on the possible earnings gain enjoyed by those who attend elite private universities. Higher earnings aren’t the only reason you might prefer an elite private institution over your local state school. Many college students meet a future spouse and make lasting friendships while in college. Still, when families invest an additional $100,000 or more in human capital, a higher anticipated earnings payoff seems likely to be part of the story.
Comparisons of earnings between those who attend different sorts of schools invariably reveal large gaps in favor of elite-college alumni. Thinking this through, however, it’s easy to see why comparisons of the earnings of students who attended Harvard and U-Mass are unlikely to reveal the payoff to a Harvard degree. This comparison reflects the fact that Harvard grads typically have better high school grades and higher SAT scores, are more motivated, and perhaps have other skills and talents. No disrespect intended for the many good students who go to U-Mass, but it’s damn hard to get into Harvard, and those who do are a special and select group. In contrast, U-Mass accepts and even awards scholarship money to almost every Massachusetts applicant with decent tenth-grade test scores. We should therefore expect earnings comparisons across alma maters to be contaminated by selection bias, just like the comparisons of health by insurance status discussed in the previous chapter. We’ve also seen that this sort of selection bias is eliminated by random assignment. Regrettably, the Harvard admissions office is not yet prepared to turn their admissions decisions over to a random number generator.
The question of whether college selectivity matters must be answered using the data generated by the routine application, admission, and matriculation decisions made by students and universities of various types. Can we use these data to mimic the randomized trial we’d like to run in this context? Not to perfection, surely, but we may be able to come close. The key to this undertaking is the fact that many decisions and choices, including those related to college attendance, involve a certain amount of serendipitous variation generated by financial considerations, personal circumstances, and timing.
Serendipity can be exploited in a sample of applicants on the cusp, who could easily go one way or the other. Does anyone admitted to Harvard really go to their local state school instead? Our friend and former MIT PhD student, Nancy, did just that. Nancy grew up in Texas, so the University of Texas (UT) was her state school. UT’s flagship Austin campus is rated “Highly Competitive” in Barron’s rankings, but it’s not Harvard. UT is, however, much less expensive than Harvard (The Princeton Review recently named UT Austin a “Best Value College”). Admitted to both Harvard and UT, Nancy chose UT over Harvard because the UT admissions office, anxious to boost average SAT scores on campus, offered Nancy and a few other outstanding applicants an especially generous financial aid package, which Nancy gladly accepted.
What are the consequences of Nancy’s decision to accept UT’s offer and decline Harvard’s? Things worked out pretty well for Nancy in spite of her choice of UT over Harvard: today she’s an economics professor at another Ivy League school in New England. But that’s only one example. Well, actually, it’s two: Our friend Mandy got her bachelor’s from the University of Virginia, her home state school, declining offers from Duke, Harvard, Princeton, and Stanford. Today, Mandy teaches at Harvard.
A sample of two is still too small for reliable causal inference. We’d like to compare many people like Mandy and Nancy to many other similar people who chose private colleges and universities. From larger group comparisons, we can hope to draw general lessons. Access to a large sample is not enough, however. The first and most important step in our effort to isolate the serendipitous component of school choice is to hold constant the most obvious and important differences between students who go to private and state schools. In this manner, we hope (though cannot promise) to make other things equal.
Here’s a small-sample numerical example to illustrate the ceteris paribus idea (we’ll have more data when the time comes for real empirical work). Suppose the only things that matter in life, at least as far as your earnings go, are your SAT scores and where you go to school. Consider Uma and Harvey, both of whom have a combined reading and math score of 1,400 on the SAT.¹ Uma went to U-Mass, while Harvey went to Harvard. We start by comparing Uma’s and Harvey’s earnings. Because we’ve assumed that all that matters for earnings besides college choice is the combined SAT score, Uma vs. Harvey is a ceteris paribus comparison.
In practice, of course, life is more complicated. This simple example suggests one significant complication: Uma is a young woman, and Harvey is a young man. Women with similar educational qualifications often earn less than men, perhaps due to discrimination or time spent out of the labor market to have children. The fact that Harvey earns 20% more than Uma may be the effect of a superior Harvard education, but it might just as well reflect a male-female wage gap generated by other things.
We’d like to disentangle the pure Harvard effect from these other things. This is easy if the only other thing that matters is gender: replace Harvey with a female Harvard student, Hannah, who also has a combined SAT of 1,400, comparing Uma and Hannah. Finally, because we’re after general conclusions that go beyond individual stories, we look for many similar same-sex and same-SAT contrasts across the two schools. That is, we compute the average earnings difference among Harvard and U-Mass students with the same gender and SAT score. The average of all such group-specific Harvard versus U-Mass differences is our first shot at estimating the causal effect of a Harvard education. This is an econometric matching estimator that controls for (that is, holds fixed) sex and SAT scores. Assuming that, conditional on sex and SAT scores, the students who attend Harvard and U-Mass have similar earnings potential, this estimator captures the average causal effect of a Harvard degree on earnings.

Matchmaker, Matchmaker
Alas, there’s more to earnings than sex, schools, and SAT scores. Since college attendance decisions aren’t randomly assigned, we must control for all factors that determine both attendance decisions and later earnings. These factors include student characteristics, like writing ability, diligence, family connections, and more. Control for such a wide range of factors seems daunting: the possibilities are virtually infinite, and many characteristics are hard to quantify. But Stacy Berg Dale and Alan Krueger came up with a clever and compelling shortcut.² Instead of identifying everything that might matter for college choice and earnings, they work with a key summary measure: the characteristics of colleges to which students applied and were admitted.
Consider again the tale of Uma and Harvey: both applied to, and were admitted to, U-Mass and Harvard. The fact that Uma applied to Harvard suggests she has the motivation to go there, while her admission to Harvard suggests she has the ability to succeed there, just like Harvey. At least that’s what the Harvard admissions office thinks, and they are not easily fooled.³ Uma nevertheless opts for a cheaper U-Mass education. Her choice might be attributable to factors that are not closely related to Uma’s earnings potential, such as a successful uncle who went to U-Mass, a best friend who chose U-Mass, or the fact that Uma missed the deadline for that easily won Rotary Club scholarship that would have funded an Ivy League education. If such serendipitous events were decisive for Uma and Harvey, then the two of them make a good match.



Dale and Krueger analyzed a large data set called College and Beyond (C&B). The C&B data set contains information on thousands of students who enrolled in a group of moderately to highly selective U.S. colleges and universities, together with survey information collected from the students at the time they took the SAT, about a year before college entry, and information collected in 1996, long after most had graduated from college. The analysis here focuses on students who enrolled in 1976 and who were working in 1995 (most adult college graduates are working). The colleges include prestigious private universities, like the University of Pennsylvania, Princeton, and Yale; a number of smaller private colleges, like Swarthmore, Williams, and Oberlin; and four public universities (Michigan, The University of North Carolina, Penn State, and Miami University in Ohio). The average (1978) SAT scores at these schools ranged from a low of 1,020 at Tulane to a high of 1,370 at Bryn Mawr. In 1976, tuition rates were as low as $540 at the University of North Carolina and as high as $3,850 at Tufts (those were the days).
Table 2.1 details a stripped-down version of the Dale and Krueger matching strategy, in a setup we call the “college matching matrix.” This table lists applications, admissions, and matriculation decisions for a (made-up) list of nine students, each of whom applied to as many as three schools chosen from an imaginary list of six. Three out of the six schools listed in the table are public (All State, Tall State, and Altered State) and three are private (Ivy, Leafy, and Smart). Five of our nine students (numbers 1, 2, 4, 6, and 7) attended private schools. Average earnings in this group are $92,000. The other four, with average earnings of $72,500, went to a public school. The almost $20,000 gap between these two groups suggests a large private school advantage.

TABLE 2.1
The college matching matrix

Note: Enrollment decisions are highlighted in gray.

The students in Table 2.1 are organized in four groups defined by the set of schools to which they applied and were admitted. Within each group, students are likely to have similar career ambitions, while they were also judged to be of similar ability by admissions staff at the schools to which they applied. Within-group comparisons should therefore be considerably more apples-to-apples than uncontrolled comparisons involving all students.
The three group A students applied to two private schools, Leafy and Smart, and one public school, Tall State. Although these students were rejected at Leafy, they were admitted to Smart and Tall State. Students 1 and 2 went to Smart, while student 3 opted for Tall State. The students in group A have high earnings, and probably come from upper middle class families (a signal here is that they applied to more private schools than public). Student 3, though admitted to Smart, opted for cheaper Tall State, perhaps to save her family money (like our friends Nancy and Mandy). Although the students in group A have done well, with high average earnings and a high rate of private school attendance, within group A, the private school differential is negative: (110 + 100)/2 − 110 = −5, in other words, a gap of −$5,000.
The comparison in group A is one of a number of possible matched comparisons in the table. Group B includes two students, each of whom applied to one private and two public schools (Ivy, All State, and Altered State). The students in group B have lower average earnings than those in group A. Both were admitted to all three schools to which they applied. Number 4 enrolled at Ivy, while number 5 chose Altered State. The earnings differential here is $30,000 (60 − 30 = 30). This gap suggests a substantial private school advantage.
Group C includes two students who applied to a single school (Leafy), where they were admitted and enrolled. Group C earnings reveal nothing about the effects of private school attendance, because both students in this group attended private school. The two students in group D applied to three schools, were admitted to two, and made different choices. But these two students chose All State and Tall State, both public schools, so their earnings also reveal nothing about the value of a private education. Groups C and D are uninformative, because, from the perspective of our effort to estimate a private school treatment effect, each is composed of either all-treated or all-control individuals.
Groups A and B are where the action is in our example, since these groups include public and private school students who applied to and were admitted to the same set of schools. To generate a single estimate that uses all available data, we average the group-specific estimates. The average of −$5,000 for group A and $30,000 for group B is $12,500. This is a good estimate of the effect of private school attendance on average earnings, because, to a large degree, it controls for applicants’ choices and abilities.
The simple average of treatment-control differences in groups A and B isn’t the only well-controlled comparison that can be computed from these two groups. For example, we might construct a weighted average which reflects the fact that group B includes two students and group A includes three. The weighted average in this case is calculated as

\[ \left(\tfrac{3}{5} \times -5{,}000\right) + \left(\tfrac{2}{5} \times 30{,}000\right) = 9{,}000. \]

By emphasizing larger groups, this weighting scheme uses the data more efficiently and may therefore generate a statistically more precise summary of the private-public earnings differential.
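The matched comparisons above can be checked with a few lines of Python. This is a minimal sketch that uses only the five earnings figures quoted in the text for groups A and B (in thousands of dollars); the names `students`, `private_public_gap`, `simple_average`, and `weighted_average` are illustrative choices, not notation from the chapter.

```python
# Earnings (in $1,000s) for the five students in groups A and B,
# as quoted in the text. Each record: (group, attended_private, earnings).
students = [
    ("A", True, 110),
    ("A", True, 100),
    ("A", False, 110),
    ("B", True, 60),
    ("B", False, 30),
]


def mean(values):
    return sum(values) / len(values)


def private_public_gap(group):
    """Average private earnings minus average public earnings within a group."""
    private = [y for g, p, y in students if g == group and p]
    public = [y for g, p, y in students if g == group and not p]
    return mean(private) - mean(public)


# Within-group (matched) differentials and the number of students per group.
gaps = {g: private_public_gap(g) for g in ("A", "B")}
sizes = {g: sum(1 for s in students if s[0] == g) for g in ("A", "B")}

# Equal weight on each group versus weights proportional to group size.
simple_average = mean(list(gaps.values()))
weighted_average = sum(sizes[g] * gaps[g] for g in gaps) / sum(sizes.values())

print(gaps)
print(simple_average)
print(weighted_average)
```

The two summary numbers differ only in how the same two within-group gaps are weighted, which is the point of the comparison in the text.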
The most important point in this context is the apples-to-apples and oranges-to-oranges nature of the underlying matched comparisons. Apples in group A are compared to other group A apples, while oranges in group B are compared only with oranges. In contrast, naive comparisons that simply compare the earnings of private and public school students generate a much larger gap of $19,500 when computed using all nine students in the table. Even when limited to the five students in groups A and B, the uncontrolled comparison generates a gap of $20,000 (20 = (110 + 100 + 60)/3 − (110 + 30)/2). These much larger uncontrolled comparisons reflect selection bias: students who apply to and are admitted to private schools have higher earnings wherever they ultimately chose to go.
Evidence of selection bias emerges from a comparison of average earnings across (instead of within) groups A and B. Average earnings in group A, where two-thirds apply to private schools, are around $107,000. Average earnings in group B, where two-thirds apply to public schools, are only $45,000. Our within-group comparisons reveal that much of this shortfall is unrelated to students’ college attendance decisions. Rather, the cross-group differential is explained by a combination of ambition and ability, as reflected in application decisions and the set of schools to which students were admitted.

2.2 Make Me a Match, Run Me a Regression

Regression is the tool that masters pick up first, if only to provide a benchmark for more elaborate empirical strategies. Although regression is a many-splendored thing, we think of it as an automated matchmaker. Specifically, regression estimates are weighted averages of multiple matched comparisons of the sort constructed for the groups in our stylized matching matrix (the appendix to this chapter discusses a closely related connection between regression and mathematical expectation).

The key ingredients in the regression recipe are

▪ the dependent variable, in this case, student \(i\)’s earnings later in life, also called the outcome variable (denoted by \(Y_i\));

▪ the treatment variable, in this case, a dummy variable that indicates students who attended a private college or university (denoted by \(P_i\)); and

▪ a set of control variables, in this case, variables that identify sets of schools to which students applied and were admitted.

In our matching matrix, the five students in groups A and B (Table 2.1) contribute useful data, while students in groups C and D can be discarded. In a data set containing those left after discarding groups C and D, a single variable indicating the students in group A tells us which of the two groups the remaining students are in, because those not in group A are in group B. This variable, which we’ll call \(A_i\), is our sole control. Note that both \(P_i\) and \(A_i\) are dummy variables, that is, they equal 1 to indicate observations in a specific state or condition, and 0 otherwise. Dummies, as they are called (no reference to ability here), classify data into simple yes-or-no categories. Even so, by coding many dummies, we get a set of control variables that’s as detailed as we like.⁴

The regression model in this context is an equation linking the treatment variable to the dependent variable while holding control variables fixed by including them in the model. With only one control variable, \(A_i\), the regression of interest can be written as

\[ Y_i = \alpha + \beta P_i + \gamma A_i + e_i. \tag{2.1} \]

The distinction between the treatment variable, \(P_i\), and the control variable, \(A_i\), in equation (2.1) is conceptual, not formal: there is nothing in equation (2.1) to indicate which is which. Your research question and empirical strategy justify the choice of variables and determine the roles they play.
As in the previous chapter, here we also use Greek letters for parameters to distinguish them from the variables in the model. The regression parameters, called regression coefficients, are

▪ the intercept, \(\alpha\) (“alpha”);

▪ the causal effect of treatment, \(\beta\) (“beta”);

▪ and the effect of being a group A student, \(\gamma\) (“gamma”).

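The last component of equation (2.1) is the residual, \(e_i\) (also called an error term). Residuals are defined as the difference between the observed \(Y_i\) and the fitted values generated by the specific regression model we have in mind. These fitted values are written as

\[ \hat{Y}_i = \alpha + \beta P_i + \gamma A_i, \]

and the corresponding residuals are given by

\[ e_i = Y_i - \hat{Y}_i = Y_i - (\alpha + \beta P_i + \gamma A_i). \]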



Regression analysis assigns values to model parameters (\(\alpha\), \(\beta\), and \(\gamma\)) so as to make \(\hat{Y}_i\) as close as possible to \(Y_i\). This is accomplished by choosing values that minimize the sum of squared residuals, leading to the moniker ordinary least squares (OLS) for the resulting estimates.⁵ Executing this minimization in a particular sample, we are said to be estimating regression parameters. ’Metrics masters, who estimate regression models every day, are sometimes said to “run regressions,” though often it seems that regressions run us rather than the other way around. The formalities of regression estimation and the statistical theory that goes with it are sketched in the appendix to this chapter.
Running regression (2.1) on data for the five students in groups A and B generates the following estimates (these estimates can be computed using a hand calculator, but for real empirical work, we use professional regression software):

\[ \alpha = 40{,}000, \qquad \beta = 10{,}000, \qquad \gamma = 60{,}000. \]

The private school coefficient in this case is 10,000, implying a private-public earnings differential of $10,000. This is indeed a weighted average of our two group-specific effects (recall the group A effect is −5,000 and the group B effect is 30,000). While this is neither the simple unweighted average (12,500) nor the group-size weighted average (9,000), it’s not too far from either of them. In this case, regression assigns a weight of 4/7 to group A and 3/7 to group B. As with these other averages, the regression-weighted average is considerably smaller than the uncontrolled earnings gap between private and public school alumni.⁶
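For readers who want to see the least squares arithmetic, here is a minimal Python sketch that fits regression (2.1) to the five students by solving the normal equations. The helper names `solve` and `ols` are illustrative, and the closing weight calculation relies on an assumption not spelled out in this section: with a full set of group dummies, regression weights each group in proportion to its size times the within-group variance of \(P_i\).

```python
def solve(matrix, rhs):
    """Solve a small linear system by Gauss-Jordan elimination with pivoting."""
    n = len(rhs)
    aug = [row[:] + [b] for row, b in zip(matrix, rhs)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(n):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [a - factor * b for a, b in zip(aug[r], aug[col])]
    return [aug[i][n] / aug[i][i] for i in range(n)]


def ols(rows, y):
    """Least squares coefficients from the normal equations (X'X) b = X'y."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * yi for r, yi in zip(rows, y)) for i in range(k)]
    return solve(xtx, xty)


# Columns: constant, P_i (private school), A_i (group A). Earnings in dollars.
X = [[1, 1, 1], [1, 1, 1], [1, 0, 1], [1, 1, 0], [1, 0, 0]]
Y = [110_000, 100_000, 110_000, 60_000, 30_000]

alpha, beta, gamma = ols(X, Y)
print(round(alpha), round(beta), round(gamma))

# Assumed weighting rule: group size times within-group variance of P_i.
w_a = 3 * (2 / 3) * (1 / 3)  # group A: two of three students are private
w_b = 2 * (1 / 2) * (1 / 2)  # group B: one of two students is private
share_a = w_a / (w_a + w_b)
beta_from_weights = share_a * -5_000 + (1 - share_a) * 30_000
print(share_a, beta_from_weights)
```

Under that weighting assumption, the share placed on group A works out to 4/7, matching the weights quoted in the text.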

Regression estimates (and the associated standard errors used to quantify their sampling variance) are readily constructed using computers and econometric software. Computational simplicity and the conceptual interpretation of regression estimates as a weighted average of group-specific differences are two of the reasons we regress. Regression also has two more things going for it. First, it’s a convention among masters to report regression estimates in almost every econometric investigation of causal effects, including those involving treatment variables that take on more than two values. Regression estimates provide a simple benchmark for fancier techniques. Second, under some circumstances, regression estimates are efficient in the sense of providing the most statistically precise estimates of average causal effects that we can hope to obtain from a given sample. This technical point is reviewed briefly in the chapter appendix.

Public-Private Face-Off
The C&B data set includes more than 14,000 former students. These students were admitted and rejected at many different combinations of schools (C&B asked for the names of at least three schools students considered seriously, besides the one attended). Many of the possible application/acceptance sets in this data set are represented by only a single student. Moreover, in some sets with more than one student, all schools are either public or private. Just as with groups C and D in Table 2.1, these perfectly homogeneous groups provide no guidance as to the value of a private education.
We can increase the number of useful comparisons by deeming schools to be matched if they are equally selective instead of insisting on identical matches. To fatten up the groups this scheme produces, we’ll call schools comparable if they fall into the same Barron’s selectivity categories.⁷ Returning to our stylized matching matrix, suppose All State and Tall State are rated as Competitive, Altered State and Smart are rated Highly Competitive, and Ivy and Leafy are Most Competitive. In the Barron’s scheme, those who applied to Tall State, Smart, and Leafy, and were admitted to Tall State and Smart can be compared with students who applied to All State, Smart, and Ivy, and were admitted to All State and Smart. Students in both groups applied to one Competitive, one Highly Competitive, and one Most Competitive school, and they were admitted to one Competitive and one Highly Competitive school.



In the C&B data, 9,202 students can be matched in this way. But because we’re interested in public-private comparisons, our Barron’s matched sample is also limited to matched applicant groups that contain both public and private school students. This leaves 5,583 matched students for analysis. These matched students fall into 151 similar-selectivity groups containing both public and private students.
Our operational regression model for the Barron’s selectivity-matched sample differs from regression (2.1), used to analyze the matching matrix in Table 2.1, in a number of ways. First, the operational model puts the natural log of earnings on the left-hand side instead of earnings itself. As explained in the chapter appendix, use of a logged dependent variable allows regression estimates to be interpreted as a percent change. For example, an estimated \(\beta\) of .05 implies that private school alumni earn about 5% more than public school alumni, conditional on whatever controls were included in the model.
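The word “about” matters in the percent interpretation. As a rough illustration (a sketch, not a calculation taken from the chapter), the exact proportional gap implied by a coefficient \(\beta\) on a dummy variable in a log model is \(e^{\beta} - 1\), which stays close to \(\beta\) itself only when \(\beta\) is small. The function name `exact_percent_gap` is made up for this example.

```python
import math


def exact_percent_gap(beta):
    """Exact proportional gap implied by a dummy coefficient in a log model."""
    return math.exp(beta) - 1


# Compare the coefficient with the exact gap for small and large values.
for beta in (0.05, 0.135, 0.5):
    exact = exact_percent_gap(beta)
    print(f"beta = {beta:.3f}  exact gap = {100 * exact:.2f}%")
```

For coefficients of the size reported in this chapter the approximation is tight; for a coefficient as large as .5 it is noticeably off.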
Another important difference between our operational empirical model and the Table 2.1 example is that the former includes many control variables, while the example controls only for the dummy variable \(A_i\), indicating students in group A. The key controls in the operational model are a set of many dummy variables indicating all Barron’s matches represented in the sample (with one group left out as a reference category). These controls capture the relative selectivity of the schools to which students applied and were admitted in the real world, where many combinations of schools are possible. The resulting regression model looks like

\[ \ln Y_i = \alpha + \beta P_i + \sum_{j=1}^{150} \gamma_j \,\mathrm{GROUP}_{ji} + \delta_1 \mathrm{SAT}_i + \delta_2 \ln \mathrm{PI}_i + e_i. \tag{2.2} \]

The parameter \(\beta\) in this model is still the treatment effect of interest, an estimate of the causal effect of attendance at a private school. But this model controls for 151 groups instead of the two groups in our example. The parameters \(\gamma_j\), for \(j = 1\) to 150, are the coefficients on 150 selectivity-group dummies, denoted \(\mathrm{GROUP}_{ji}\).
It’s worth unpacking the notation in equation (2.2), since we’ll use it again. The dummy variable \(\mathrm{GROUP}_{ji}\) equals 1 when student \(i\) is in group \(j\) and is 0 otherwise. For example, the first of these dummies, denoted \(\mathrm{GROUP}_{1i}\), might indicate students who applied and were admitted to three Highly Competitive schools. The second, \(\mathrm{GROUP}_{2i}\), might indicate students who applied to two Highly Competitive schools and one Most Competitive school, and were admitted to one of each type. The order in which the categories are coded doesn’t matter as long as we code dummies for all possible combinations, with one group omitted as a reference group. Although we’ve gone from one group dummy to 150, the idea is as before: controlling for the sets of schools to which students applied and were admitted brings us one giant step closer to a ceteris paribus comparison between private and public school students.
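The dummy coding just described is mechanical, as the following minimal sketch suggests. The group labels `g1`, `g2`, `g3` and the function `group_dummies` are hypothetical stand-ins for selectivity groups; nothing here comes from the C&B data.

```python
def group_dummies(labels, reference):
    """Return (column_names, rows) of 0/1 dummies, omitting the reference group."""
    names = sorted(set(labels) - {reference})
    rows = [[1 if label == name else 0 for name in names] for label in labels]
    return names, rows


# Hypothetical selectivity-group labels for six students.
labels = ["g1", "g2", "g3", "g1", "g3", "g2"]
names, rows = group_dummies(labels, reference="g3")

print(names)
for label, row in zip(labels, rows):
    # Students in the reference group get zeros in every dummy column.
    print(label, row)
```

With three groups and one omitted as the reference, two dummy columns suffice, just as 150 dummies suffice for 151 groups in equation (2.2).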
A final modification for operational purposes is the addition of two further control variables: individual SAT scores (\(\mathrm{SAT}_i\)) and the log of parental income (\(\mathrm{PI}_i\)), plus a few variables we’ll relegate to a footnote.⁸ The individual SAT and log parental income controls appear in the model with coefficients \(\delta_1\) and \(\delta_2\) (read as “delta-1” and “delta-2”), respectively. Controls for a direct measure of individual aptitude, like students’ SAT scores, and a measure of family background, like parental income, may help make the public-private comparisons at the heart of our model more apples-to-apples and oranges-to-oranges than they otherwise would be. At the same time, conditional on selectivity-group dummies, such controls may no longer matter, a point explored in detail below.

Regressions Run
We start with regression estimates of the private school earnings advantage from models with no controls. The coefficient from a regression of log earnings (in 1995) on a dummy for private school attendance, with no other regressors (right-hand side variables) in the model, gives the raw difference in log earnings between those who attended a private school and everyone else (the chapter appendix explains why regression on a single dummy variable produces a difference in means across groups defined by the dummy). Not surprisingly, this raw gap, reported in the first column of Table 2.2, shows a substantial private school premium. Specifically, private school students are estimated to have earnings about 14% higher than the earnings of other students.
The numbers that appear in parentheses below the regression estimates in Table 2.2 are the estimated standard errors that go with these estimates. Like the standard errors for a difference in means discussed in the appendix to Chapter 1, these standard errors quantify the statistical precision of the regression estimates reported here. The standard error associated with the estimate in column (1) is .055. The fact that .135 is more than twice the size of the associated standard error of .055 makes it very unlikely the positive estimated private-school gap is merely a chance finding. The private school coefficient is statistically significant.
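The “more than twice the standard error” check can be written out explicitly. This small sketch uses the column (1) figures quoted above; the two-standard-error band is the usual rule of thumb rather than an exact calculation from the chapter, and the variable names are illustrative.

```python
estimate, std_error = 0.135, 0.055

# Ratio of the estimate to its standard error (the t-statistic for a zero null).
t_ratio = estimate / std_error

# Rule-of-thumb interval: the estimate plus or minus two standard errors.
interval = (estimate - 2 * std_error, estimate + 2 * std_error)

print(round(t_ratio, 2))
print(tuple(round(bound, 3) for bound in interval))
print("excludes zero:", interval[0] > 0)
```

An interval that excludes zero is another way of saying the estimate is more than two standard errors away from zero.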

TABLE 2.2
Private school effects: Barron’s matches

Notes: This table reports estimates of the effect of attending a private college or university on earnings. Each column reports coefficients from a regression of log earnings on a dummy for attending a private institution and controls. The results in columns (4)–(6) are from models that include applicant selectivity-group dummies. The sample size is 5,583. Standard errors are reported in parentheses.

The large private school premium reported in column (1) of Table 2.2 is an interesting descriptive fact, but, as in our example calculation, some of this gap is almost certainly due to selection bias. As we show below, private school students have higher SAT scores and come from wealthier families than do public school students, and so might be expected to earn more regardless of where they went to college. We therefore control for measures of ability and family background when estimating the private school premium. An estimate of the private school premium from a regression model that includes an individual SAT control is reported in column (2) of Table 2.2. Every 100 points of SAT achievement are associated with about a 5 percentage point earnings gain. Controlling for students’ SAT scores reduces the measured private school premium to about .1. Adding controls for parental income, as well as for demographic characteristics related to race and sex, high school rank, and whether the graduate was a college athlete brings the private school premium down a little further, to a still substantial and statistically significant .086, reported in column (3) of the table.
A substantial effect indeed, but probably still too big, that is, contaminated by positive selection bias. Column (4) reports estimates from a model with no controls for ability, family background, or demographic characteristics. Importantly, however, the regression model used to construct the estimate reported in this column includes a dummy for each matched college selectivity group in the sample. That is, the model used to construct this estimate includes the dummy variables \(\mathrm{GROUP}_{ji}\), for \(j = 1, \ldots, 150\) (the table omits the many estimated \(\gamma_j\) this model produces, but indicates their inclusion in the row labeled “selection controls”). The estimated private school premium with selectivity-group controls included is almost bang on 0, with a standard error of about .04. And that’s not all: having killed the private school premium with selectivity-group dummies, columns (5) and (6) show that the premium moves little when controls for ability and family background are added to the model. This suggests that control for college application and admissions selectivity groups takes us a long way toward the apples-to-apples and oranges-to-oranges comparisons at the heart of any credible regression strategy for causal inference.
The results in columns (4)–(6) of Table 2.2 are generated by the subsample of 5,583 students for whom we can construct Barron’s matches and generate within-group comparisons of public and private school students. Perhaps there’s something special about this limited sample, which contains less than half of the full complement of C&B respondents. This concern motivates a less demanding control scheme that includes only the average SAT score in the set of schools students applied to plus dummies for the number of schools applied to (that is, a dummy for students who applied to two schools, a dummy for students who applied to three schools, and so on), instead of a full set of 150 selectivity-group dummies. This regression, which can be estimated in the full C&B sample, is christened the “self-revelation model” because it’s motivated by the notion that applicants have a pretty good idea of their ability and where they’re likely to be admitted. This self-assessment is reflected in the number and average selectivity of the schools to which they apply. As a rule, weaker applicants apply to fewer and to less-selective schools than do stronger applicants.
The self-revelation model generates results remarkably similar to those generated by Barron’s matches. The self-revelation estimates, computed in a sample of 14,238 students, can be seen in Table 2.3. As before, the first three columns of the table show that the raw private school premium falls markedly, but remains substantial, when controls for ability and family background are added to the model (falling in this case, from .21 to .14). At the same time, columns (4)–(6) show that models controlling for the number and average selectivity of the schools students apply to generate small and statistically insignificant effects on the order of .03. Moreover, as with the models that control for Barron’s matches, models with average selectivity controls generate estimates that are largely insensitive to the inclusion of controls for ability and family background.
Private university attendance seems unrelated to future earnings once we control for selection bias. But perhaps our focus on public-private comparisons misses the point. Students may benefit from attending schools like Ivy, Leafy, or Smart simply because their classmates at such schools are so much better. The synergy generated by a strong peer group may be the feature that justifies the private school price tag.
We can explore this hypothesis by replacing the private school dummy in the self-revelation model with a measure of peer quality. Specifically, as in the original Dale and Krueger study that inspires our analysis, we replace \(P_i\) in equation (2.2) with the average SAT score of classmates at the school attended.⁹ Columns (1)–(3) of Table 2.4 show that students who attended more selective schools do markedly better in the labor market, with an estimated college selectivity effect on the order of 8% higher earnings for every 100 points of average selectivity increase. Yet, this effect too appears to be an artifact of selection bias due to the greater ambition and ability of those who attend selective schools. Estimates from models with self-revelation controls, reported in columns (4)–(6) of the table, show average college selectivity to be essentially unrelated to earnings.

TABLE 2.3
Private school effects: Average SAT score controls

Notes: This table reports estimates of the effect of attending a private college or university on earnings. Each column shows coefficients from a regression of log earnings on a dummy for attending a private institution and controls. The sample size is 14,238. Standard errors are reported in parentheses.

TABLE 2.4
School selectivity effects: Average SAT score controls

Notes: This table reports estimates of the effect of alma mater selectivity on earnings. Each column shows coefficients from a regression of log earnings on the average SAT score at the institution attended and controls. The sample size is 14,238. Standard errors are reported in parentheses.

2.3 Ceteris Paribus?

TOPIC: Briefly describe experiences, challenges, and accomplishments that define you as a person.

ESSAY: I am a dynamic figure, often seen scaling walls and crushing ice. I cook Thirty-Minute Brownies in twenty minutes. I am an expert in stucco, a veteran in love, and an outlaw in Peru. On Wednesdays, after school, I repair electrical appliances free of charge.

I am an abstract artist, a concrete analyst, and a ruthless bookie. I wave, dodge, and frolic, yet my bills are all paid. I have won bullfights in San Juan, cliff-diving competitions in Sri Lanka, and spelling bees at the Kremlin. I have played Hamlet, I have performed open-heart surgery, and I have spoken with Elvis.

But I have not yet gone to college.
From an essay by Hugh Gallagher, age 19.
(Hugh later went to New York University.)

Imagine Harvey and Uma on the day admissions letters go out. Both are delighted to get into Harvard (it must be those 20-minute brownies). Harvey immediately accepts Harvard’s offer (wouldn’t you?). But Uma makes a difficult choice and goes to U-Mass instead. What’s up with Uma? Is her ceteris really paribus?
Uma might have good reasons to opt for less-prestigious U-Mass over Harvard. Price is an obvious consideration (Uma won a Massachusetts Adams Scholarship, which pays state school tuition for good students like her but cannot be used at private schools). If price matters more to Uma than to Harvey, it’s possible that Uma’s circumstances differ from Harvey’s in other ways. Perhaps she’s poorer. Some of our regression models control for parental income, but this is an imperfect measure of family living standards. Among other things, we don’t know how many brothers and sisters the students in the C&B sample had. A larger family at the same income level may find it harder to pay for each child’s education. If family size is also related to later earnings (see Chapter 3 for more on this point), our regression estimates of private college premia may not be apples-to-apples after all.

This is more than a campfire story. Regression is a way to make other things equal, but equality is generated only for variables included as controls on the right-hand side of the model. Failure to include enough controls or the right controls still leaves us with selection bias. The regression version of the selection bias generated by inadequate controls is called omitted variables bias (OVB), and it’s one of the most important ideas in the ’metrics canon.
To illustrate OVB, we return to our five-student example and the bias from omitting control for membership in applicant group A. The “long regression” here includes the dummy variable, \(A_i\), which indicates those in group A. We write the regression model that includes \(A_i\) as

\[ Y_i = \alpha^l + \beta^l P_i + \gamma A_i + e_i^l. \tag{2.3} \]

This is equation (2.1) rewritten with superscript \(l\) on parameters and the residual to remind us that the intercept and private school coefficient are from the long model, and to facilitate comparisons with the short model to come.
Does the inclusion of \(A_i\) matter for estimates of the private school effect in the regression above? Suppose we make do with a short regression with no controls. This can be written as

\[ Y_i = \alpha^s + \beta^s P_i + e_i^s. \]

Because the single regressor here is a dummy variable, the slope coefficient in this model is the difference in average \(Y_i\) between those with \(P_i\) switched on and those with \(P_i\) switched off. As we noted in Section 2.1, \(\beta^s = 20{,}000\) in the short regression, while the long regression parameter, \(\beta^l\), is only 10,000. The difference between \(\beta^s\) and \(\beta^l\) is the OVB due to omission of \(A_i\) in the short regression. Here, OVB amounts to $10,000, a figure worth worrying about.
Why does the omission of the group A dummy change the private college effect so much? Recall that the average earnings of students in group A exceed the average earnings of those in group B. Moreover, two-thirds of the students in high-earning group A attended a private school, while lower-earning group B is only half private. Differences in earnings between private and public alumni come in part from the fact that the mostly private students in group A have higher earnings anyway, regardless of where they enrolled. Inclusion of the group A dummy in the long regression controls for this difference.
As this discussion suggests, the formal connection between short and long regression coefficients has two components:

(i) The relationship between the omitted variable (\(A_i\)) and the treatment variable (\(P_i\)); we’ll soon see how to quantify this with an additional regression.

(ii) The relationship between the omitted variable (\(A_i\)) and the outcome variable (\(Y_i\)). This is given by the coefficient on the omitted variable in the long regression, in this case, the parameter \(\gamma\) in equation (2.3).

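Together, these pieces produce the OVB formula. We start with the fact that

\[ \text{Effect of } P_i \text{ in short} = \text{Effect of } P_i \text{ in long} + \left(\{\text{Relationship between omitted and included}\} \times \{\text{Effect of omitted in long}\}\right). \]

To be specific, when the omitted variable is \(A_i\) and the treatment variable is \(P_i\), we have

\[ \text{Effect of } P_i \text{ in short} = \text{Effect of } P_i \text{ in long} + \left(\{\text{Relationship between } A_i \text{ and } P_i\} \times \{\text{Effect of } A_i \text{ in long}\}\right). \]

Omitted variables bias, defined as the difference between the coefficient on \(P_i\) in the short and long models, is a simple rearrangement of this equation:

\[ \mathrm{OVB} = \text{Effect of } P_i \text{ in short} - \text{Effect of } P_i \text{ in long} = \{\text{Relationship between } A_i \text{ and } P_i\} \times \{\text{Effect of } A_i \text{ in long}\}. \]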



We can refine the OVB formula using the fact that both terms in the formula are themselves regression coefficients. The first term is the coefficient from a regression of the omitted variable \(A_i\) on the private school dummy. In other words, this term is the coefficient \(\pi_1\) (read “pi-1”) in the regression model

\[ A_i = \pi_0 + \pi_1 P_i + u_i, \]

where \(u_i\) is a residual. We can now write the OVB formula compactly in Greek:

\[ \mathrm{OVB} = \beta^s - \beta^l = \pi_1 \times \gamma, \]

where \(\gamma\) is the coefficient on \(A_i\) in the long regression. This important formula is derived in the chapter appendix.
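Among students who attended private school, two are in group A and one in group B, while among those who went to public school, one is in group A and one in group B. The coefficient π1 in our five-student example is therefore 2/3 − 1/2 = .1667. As noted in Section 2.2, the coefficient γ is 60,000, reflecting the higher earnings of group A. Putting the pieces together, we have

\[ OVB = \text{Short} - \text{Long} = \beta^s - \beta^l = 20{,}000 - 10{,}000 = 10{,}000 \]

and

\[ OVB = \{\text{Regression of omitted on included}\} \times \{\text{Effect of omitted in long}\} = \pi_1 \times \gamma = .1667 \times 60{,}000 = 10{,}000. \]

Phew! The calculation suggested by the OVB formula indeed matches the direct comparison of short and long regression coefficients.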
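The two routes to OVB can be checked numerically. The sketch below is illustrative only: the five earnings values are an assumption (the data table lies outside this section), chosen because they reproduce every figure quoted above (short coefficient 20,000, long coefficient 10,000, γ = 60,000, π1 = 1/6). The helper names `cross`, `slope`, and `long_slopes` are hypothetical.

```python
# Five-student example. Earnings are ASSUMED values that reproduce the
# coefficients quoted in the text; they are not taken from this section.
earnings = [110_000, 100_000, 110_000, 60_000, 30_000]  # Y_i
private = [1, 1, 0, 1, 0]   # P_i: two of three group A students, one of two in B
group_a = [1, 1, 1, 0, 0]   # A_i


def mean(values):
    return sum(values) / len(values)


def cross(u, v):
    """Sum of cross-products of deviations from means."""
    mu, mv = mean(u), mean(v)
    return sum((a - mu) * (b - mv) for a, b in zip(u, v))


def slope(y, x):
    """Slope from a bivariate regression of y on x."""
    return cross(x, y) / cross(x, x)


def long_slopes(y, x1, x2):
    """Slopes on x1 and x2 from a regression of y on both (2x2 normal equations)."""
    s11, s22, s12 = cross(x1, x1), cross(x2, x2), cross(x1, x2)
    s1y, s2y = cross(x1, y), cross(x2, y)
    det = s11 * s22 - s12 ** 2
    return (s1y * s22 - s2y * s12) / det, (s2y * s11 - s1y * s12) / det


beta_short = slope(earnings, private)                       # short regression
beta_long, gamma = long_slopes(earnings, private, group_a)  # long regression
pi_1 = slope(group_a, private)                              # omitted on included

ovb_direct = beta_short - beta_long   # Short - Long
ovb_formula = pi_1 * gamma            # pi_1 x gamma

print(round(beta_short), round(beta_long), round(gamma), round(pi_1, 4))
print(round(ovb_direct), round(ovb_formula))
```

Both printed OVB figures should agree, which is the point of the formula: the gap between short and long is fully accounted for by π1 × γ.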
The OVB formula is a mathematical result that explains differences between regression coefficients in any short-versus-long scenario, irrespective of the causal interpretation of the regression parameters. The labels “short” and “long” are purely relative: The short regression need not be particularly short, but the long regression is always longer, since it includes the same regressors plus at least one more. Often, the additional variables that make the long regression long are hypothetical, that is, unavailable in our data. The OVB formula is a tool that allows us to consider the impact of control for variables we wish we had. This in turn helps us assess whether ceteris is indeed paribus. Which brings us back to Uma and Harvey.
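Suppose an omitted variable in equation (2.2) is family size, FSi. We’ve included parental income as a control variable, but not the number of brothers and sisters who might also go to college, which is not available in the C&B data set. When the omitted variable is FSi, we have

\[ OVB = \text{Short} - \text{Long} = \{\text{Relationship between } FS_i \text{ and } P_i\} \times \{\text{Effect of } FS_i \text{ in long}\}. \]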
Why might the omission of family size bias regression estimates of the private college effect? Because differences in earnings between Harvard and U-Mass graduates arise in part from differences in family size between the two groups of students (this is the relationship between FSi and Pi) and from the fact that smaller families are associated with higher earnings, even after controlling for the variables included in the short regression (this is the effect of FSi in the long regression, which includes these same controls as well). The long regression controls for the fact that students who go to Harvard come from smaller families (on average) than do students who went to U-Mass, while the short regression that omits FSi does not.
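The first term in this application of the OVB formula is the coefficient in a regression of omitted (FSi) on included (Pi) variables and everything else that appears on the right-hand side of equation (2.2). This regression, which is sometimes said to be “auxiliary” because it helps us interpret the regression we care about, can be written as

\[ FS_i = \pi_0 + \pi_1 P_i + \sum_{j=1}^{150} \theta_j GROUP_{ji} + \pi_2 SAT_i + \pi_3 \ln PI_i + u_i. \tag{2.4} \]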
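Most of the coefficients in equation (2.4) are of little interest. What matters here is π1, since this captures the relationship between the omitted variable, FSi, and the variable whose effect we’re after, Pi, after controlling for other variables that appear in both the short and long regression models.10

To complete the OVB formula for this case, we write the long regression as

\[ \ln Y_i = \alpha^l + \beta^l P_i + \sum_{j=1}^{150} \gamma_j^l GROUP_{ji} + \delta_1^l SAT_i + \delta_2^l \ln PI_i + \lambda FS_i + e_i^l, \tag{2.5} \]

again using superscript l for “long.” The regressor FSi appears here with coefficient λ.11 The OVB formula is therefore

\[ OVB = \text{Short} - \text{Long} = \beta - \beta^l = \pi_1 \times \lambda, \]

where β is from equation (2.2).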
Continuing to think of equation (2.2) as the short regression, while the long regression includes the control variables that appear in this model plus family size, we see that OVB here is probably positive. Private school students tend to come from smaller families on average, even after conditioning on family income. If so, the regression coefficient linking family size and private college attendance is negative (π1 < 0 in equation (2.4)). Students from smaller families are also likely to earn more no matter where they go to school, so the effect of family size in the long regression is also negative (λ < 0 in equation (2.5)). The product of these two negative terms is positive.
Careful reasoning about OVB is an essential part of the ’metrics game. We can’t use data to check the consequences of omitting variables that we don’t observe, but we can use the OVB formula to make an educated guess as to the likely consequences of their omission. Most of the control variables that might be omitted from equation (2.2) are similar to family size in that the sign of the OVB from their omission is probably positive. From this we conclude that, as small as the estimates of the effects of private school attendance in columns (4)–(6) of Tables 2.2–2.3 are, they could well be too big. These estimates therefore weigh strongly against the hypothesis of a substantial private school earnings advantage.

Regression Sensitivity Analysis
Because we can never be sure whether a given set of controls is enough to eliminate selection bias, it’s important to ask how sensitive regression results are to changes in the list of controls. Our confidence in regression estimates of causal effects grows when treatment effects are insensitive (masters say “robust”) to whether a particular variable is added or dropped as long as a few core controls are always included in the model. This desirable pattern is illustrated by columns (4)–(6) in Tables 2.2–2.3, which show that estimates of the private school premium are insensitive to the inclusion of students’ ability (as measured by own SAT scores), parental income, and a few other control variables, once we control for the nature of the schools to which students applied.
The OVB formula explains this remarkable finding. Start with Table 2.5, which reports coefficients from regressions like equation (2.4), except that instead of FSi, we put SATi on the left-hand side to produce the estimates in columns (1)–(3) while ln PIi on the left-hand side generates columns (4)–(6). These auxiliary regressions assess the relationship between private school attendance and two of our controls, SATi and ln PIi, conditional on other controls in the model. Not surprisingly, private school attendance is a strong predictor of students’ own SAT scores and family income, relationships documented in columns (1) and (4) in the table. The addition of demographic controls, high school rank, and a dummy for athletic participation does little to change this, as can be seen in columns (2) and (5). But control for the number of applications and the average SAT score of schools applied to, as in the self-revelation model, effectively eliminates the relationship between private school attendance and these important background variables. This explains why the estimated private school coefficients in columns (4), (5), and (6) of Table 2.3 are essentially the same.

TABLE 2.5
Private school effects: Omitted variables bias

Notes: This table describes the relationship between private school attendance and personal characteristics. Dependent variables are the respondent’s SAT score (divided by 100) in columns (1)–(3) and log parental income in columns (4)–(6). Each column shows the coefficient from a regression of the dependent variable on a dummy for attending a private institution and controls. The sample size is 14,238. Standard errors are reported in parentheses.

The OVB formula is the Prime Directive of applied econometrics, so let’s rock it with our numbers and see how it works out. For illustration, we’ll take the short model to be a regression of log wages on Pi with no controls and the long model to be the regression that adds individual SAT scores. The short (no controls) coefficient on Pi in column (1) of Table 2.3 is .212, while the corresponding long coefficient (controlling for SATi) in column (2) is .152. As can also be seen in column (2) of the table, the effect of SATi in the long regression is .051. The first column in Table 2.5 shows that the regression of omitted SATi on included Pi produces a coefficient of 1.165. Putting these together, we have OVB, two ways:

\[ OVB = \text{Short} - \text{Long} = .212 - .152 = .06 \]

\[ OVB = \{\text{Regression of omitted on included}\} \times \{\text{Effect of omitted in long}\} = 1.165 \times .051 = .0594. \]
Compare this with the parallel calculation taking us from column (4) to column (5) in Table 2.3. These columns report results from models that include self-revelation controls. Here, Short − Long is small: .034 − .031 = .003, to be precise. Both the short and long regressions include selectivity controls from the self-revelation model, as does the relevant auxiliary regression of own SAT scores on Pi. With self-revelation controls included in both models, we have

\[ OVB = \{\text{Regression of omitted on included}\} \times \{\text{Effect of omitted in long}\} = .066 \times .036 = .0024. \]

(Rounding error with small numbers pushes us off of the target of .003.) The effect of the omitted SATi in the long regression falls here from .051 to .036, while the regression of omitted on included goes from a hefty 1.165 to something an order of magnitude smaller at .066 (shown in column (3) of Table 2.5). This shows that, conditional on the number and average selectivity of schools applied to, students who chose private and public schools aren’t very different, at least as far as their own SAT scores go. Consequently, the gap between short and long estimates disappears.
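The arithmetic behind both comparisons can be reproduced in a few lines. This is only a check of the figures quoted in the text from Tables 2.3 and 2.5; the variable names are hypothetical.

```python
# Figures quoted in the text (Tables 2.3 and 2.5); names are illustrative.

# No controls versus adding own SAT score.
short_none, long_sat = 0.212, 0.152
sat_on_private, sat_effect = 1.165, 0.051   # omitted on included; omitted in long

ovb_direct = short_none - long_sat
ovb_formula = sat_on_private * sat_effect

# Same comparison with self-revelation controls in every regression.
short_sr, long_sr = 0.034, 0.031
sat_on_private_sr, sat_effect_sr = 0.066, 0.036

ovb_direct_sr = short_sr - long_sr
ovb_formula_sr = sat_on_private_sr * sat_effect_sr

print(f"no controls:     {ovb_direct:.4f} vs {ovb_formula:.4f}")
print(f"self-revelation: {ovb_direct_sr:.4f} vs {ovb_formula_sr:.4f}")
```

The small discrepancies in each pair come from working with coefficients rounded to three decimals, as the text notes.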
Because our estimated private school effect is insensitive to the inclusion of the available ability and family background variables once the self-revelation controls are included, other control variables, including those for which we have no data, might matter little as well. In other words, any remaining OVB due to uncontrolled differences is probably modest.12 This circumstantial evidence for modest OVB doesn’t guarantee that the regression results discussed in this chapter have the same causal force as results from a randomized trial; we’d still rather have a real experiment. At a minimum, however, these findings call into question claims for a substantial earnings advantage due to attendance at expensive private colleges.

MASTER STEVEFU: In a nutshell, please, Grasshopper.
GRASSHOPPER: Causal comparisons compare like with like. In assessing the effects of college choice, we focus on students with similar characteristics.
MASTER STEVEFU: Each is different in a thousand ways. Must all ways be similar?
GRASSHOPPER: Good comparisons eliminate systematic differences between those who choose one path and those who choose another, when such differences are associated with outcomes.
MASTER STEVEFU: How is this accomplished?
GRASSHOPPER: The method of matching sorts individuals into groups with the same values of control variables, like measures of ability and family background. Matched comparisons within these groups are then averaged to get a single overall effect.
MASTER STEVEFU: And regression?
GRASSHOPPER: Regression is an automated matchmaker. The regression estimate of a causal effect is also an average of within-group comparisons.
MASTER STEVEFU: What is the Tao of OVB?
GRASSHOPPER: OVB is the difference between short and long regression coefficients. The long regression includes additional controls, those omitted from the short. Short equals long plus the effect of omitted in long times the regression of omitted on included.
MASTER JOSHWAY: Nothing omitted here, Grasshopper.

Masters of ’Metrics: Galton and Yule

The term “regression” was coined by Sir Francis Galton, Charles Darwin’s half-cousin, in 1886. Galton had many interests, but he was gripped by Darwin’s masterpiece, The Origin of Species. Galton hoped to apply Darwin’s theory of evolution to variation in human traits. In the course of his research, Galton studied attributes ranging from fingerprints to beauty. He was also one of many British intellectuals to use Darwin in the sinister service of eugenics. This regrettable diversion notwithstanding, his work in theoretical statistics had a lasting and salutary effect on social science. Galton laid the statistical foundations for quantitative social science of the sort that grips us.

Galton discovered that the average heights of fathers and sons are linked by a regression equation. He also uncovered an interesting implication of this particular regression model: the average height of sons is a weighted average of their fathers’ height and the average height in the population from which the fathers and sons were sampled. Thus, parents who are taller than average will have children who are not quite as tall, while parents who are shorter than average will have children who are a bit taller. To be specific, Master Stevefu, who is 6′3″, can expect his children to be tall, though not as tall as he is. Thankfully, however, Master Joshway, who is 5′6″ on a good day, can expect his children to attain somewhat grander stature.
Galton explained this averaging phenomenon in his celebrated 1886 paper “Regression towards Mediocrity in Hereditary Stature.”13 Today, we call this property “regression to the mean.” Regression to the mean is not a causal relationship. Rather, it’s a statistical property of correlated pairs of variables like the heights of fathers and sons. Although fathers’ and sons’ heights are never exactly the same, their frequency distributions are essentially unchanging. This distributional stability generates the Galton regression.
We see regression as a statistical procedure with the power to make comparisons more equal through the inclusion of control variables in models for treatment effects. Galton seems to have been uninterested in regression as a control strategy. The use of regression for statistical control was pioneered by George Udny Yule, a student of statistician Karl Pearson, who was Galton’s protégé. Yule realized that Galton’s regression method could be extended to include many variables. In an 1899 paper, Yule used this extension to link the administration of the English Poor Laws in different counties to the likelihood county residents were poor, while controlling for population growth and the age distribution in the county.14 The poor laws provided subsistence for the indigent, usually by offering shelter and employment in institutions called workhouses. Yule was particularly interested in whether the practice of outdoor relief, which provided income support for poor people without requiring them to move to a workhouse, increased poverty rates by making pauperism less onerous. This is a well-defined causal question much like those that occupy social scientists today.



Appendix: Regression Theory

Conditional Expectation Functions
Chapter 1 introduces the notion of mathematical expectation, called “expectation” for short. We write E[Yi] for the expectation of a variable, Yi. We’re also concerned with conditional expectations, that is, the expectation of a variable in groups (also called “cells”) defined by a second variable. Sometimes this second variable is a dummy, taking on only two values, but it need not be. Often, as in this chapter, we’re interested in conditional expectations in groups defined by the values of variables that aren’t dummies, for example, the expected earnings for people who have completed 16 years of schooling. This sort of conditional expectation can be written as

\[ E[Y_i \mid X_i = x], \]

and it’s read as “The conditional expectation of Yi given that Xi equals the particular value x.”
Conditional expectations tell us how the population average of one variable changes as we move the conditioning variable over the values this variable might assume. For every value of the conditioning variable, we might get a different average of the dependent variable, Yi. The collection of all such averages is called the conditional expectation function (CEF for short). E[Yi|Xi] is the CEF of Yi given Xi, without specifying a value for Xi, while E[Yi|Xi = x] is one point in the range of this function.
A favorite CEF of ours appears in Figure 2.1. The dots in this figure show the average log weekly wage for men with different levels of schooling (measured by highest grade completed), with schooling levels arrayed on the X-axis (data here come from the 1980 U.S. Census). Though it bobs up and down, the earnings-schooling CEF is strongly upward-sloping, with an average slope of about .1. In other words, each year of schooling is associated with wages that are about 10% higher on average.

FIGURE 2.1
The CEF and the regression line

Notes: This figure shows the conditional expectation function (CEF) of log weekly wages given years of education, and the line generated by regressing log weekly wages on years of education (plotted as a broken line).

Many of the CEFs we’re interested in involve more than one conditioning variable, each of which takes on two or more values. We write

\[ E[Y_i \mid X_{1i} = x_1, \ldots, X_{Ki} = x_K] \]

for a CEF with K conditioning variables. With many conditioning variables, the CEF is harder to plot, but the idea is the same. E[Yi|X1i = x1, …, XKi = xK] gives the population average of Yi with these K other variables held fixed. Instead of looking at average wages conditional only on schooling, for example, we might also condition on cells defined by age, race, and sex.

Regression and the CEF
Table 2.1 illustrates the matchmaking idea by comparing students who attended public and private colleges, after sorting students into cells on the basis of the colleges to which they applied and were admitted. The body of the chapter explains how we see regression as a quick and easy way of automating such matched comparisons. Here, we use the CEF to make this interpretation of regression more rigorous.15

The regression estimates of equation (2.2) reported in Table 2.3 suggest that private school attendance is unrelated to average earnings once individual SAT scores, parental income, and the selectivity of colleges applied and admitted to are held fixed. As a simplification, suppose that the CEF of log wages is a linear function of these conditioning variables. Specifically, assume that

\[ E[\ln Y_i \mid P_i, GROUP_i, SAT_i, \ln PI_i] = \alpha + \beta P_i + \sum_j \gamma_j GROUP_{ji} + \delta_1 SAT_i + \delta_2 \ln PI_i, \tag{2.6} \]

where Greek letters, as always, are parameters. When the CEF of ln Yi is a linear function of the conditioning variables as in equation (2.6), the regression of ln Yi on these same conditioning variables recovers this linear function. (We skip a detailed proof of this fact, though it’s not hard to show.) In particular, given linearity, the coefficient on Pi in equation (2.2) will be equal to the coefficient on Pi in equation (2.6).


With a linear CEF, regression estimates of private school effects based on equation (2.2) are also identical to those we’d get from a strategy that (i) matches students by values of GROUPi, SATi, and ln PIi; (ii) compares the average earnings of matched students who went to private (Pi = 1) and public (Pi = 0) schools for each possible combination of the conditioning variables; and (iii) produces a single average by averaging all of these cell-specific contrasts. To see this, it’s enough to use equation (2.6) to write cell-specific comparisons as

\[ E[\ln Y_i \mid P_i = 1, GROUP_i, SAT_i, \ln PI_i] - E[\ln Y_i \mid P_i = 0, GROUP_i, SAT_i, \ln PI_i] = \beta. \]

Because our linear model for the CEF assumes that the effect of private school attendance is equal to the constant β in every cell, any weighted average of cell-specific private-attendance contrasts is also equal to β.
Linear models help us understand regression, but regression is a wonderfully flexible tool, useful regardless of whether the underlying CEF is linear. Regression inherits this flexibility from the following pair of closely related theoretical properties:

▪ If E[Yi|X1i, …, XKi] = a + b1X1i + … + bKXKi for some constants a and b1, …, bK, then the regression of Yi on X1i, …, XKi has intercept a and slopes b1, …, bK. In other words, if the CEF of Yi on X1i, …, XKi is linear, then the regression of Yi on X1i, …, XKi is it.

▪ If E[Yi|X1i, …, XKi] is a nonlinear function of the conditioning variables, then the regression of Yi on X1i, …, XKi gives the best linear approximation to this nonlinear CEF in the sense of minimizing the expected squared deviation between the fitted values from a linear model and the CEF.

To summarize: if the CEF is linear, regression finds it; if not linear, regression finds a good approximation to it. We’ve just used the first theoretical property to interpret regression estimates of private school effects when the CEF is linear. The second property tells us that we can expect regression estimates of a treatment effect to be close to those we’d get by matching on covariates and then averaging within-cell treatment-control differences, even if the CEF isn’t linear.
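The first property can be seen in a toy case. The sketch below uses made-up data (the names `schooling` and `log_wage` and all values are hypothetical) constructed so that the cell averages lie exactly on the line 1 + 2x; the regression line then passes through every point of the CEF.

```python
# Hypothetical data: three schooling cells whose average outcomes are 1, 3, 5,
# so the CEF is exactly linear: E[Y | X = x] = 1 + 2x.
schooling = [0, 0, 1, 1, 2, 2]
log_wage = [0.0, 2.0, 2.0, 4.0, 4.0, 6.0]


def mean(values):
    return sum(values) / len(values)


def fit_line(y, x):
    """Return (intercept, slope) from a bivariate regression of y on x."""
    mx, my = mean(x), mean(y)
    slope = (sum((a - mx) * (b - my) for a, b in zip(x, y))
             / sum((a - mx) ** 2 for a in x))
    return my - slope * mx, slope


# The CEF: the average of Y within each cell defined by X.
cef = {x: mean([y for xi, y in zip(schooling, log_wage) if xi == x])
       for x in sorted(set(schooling))}

intercept, slope = fit_line(log_wage, schooling)
fitted = {x: intercept + slope * x for x in cef}

print("CEF:   ", cef)
print("fitted:", fitted)
```

With a nonlinear set of cell averages the two dictionaries would differ, and the fitted line would instead be the best linear approximation described by the second property.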
Figure 2.1 documents the manner in which regression approximates the nonlinear CEF of log wages conditional on schooling. Although the CEF bounces around the regression line, this line captures the strong positive relationship between schooling and wages. Moreover, the regression slope is close to E{E[Yi|Xi] − E[Yi|Xi − 1]}; that is, the regression slope also comes close to the expected effect of a one-unit change in Xi on E[Yi|Xi].16

Bivariate Regression and Covariance

Regression is closely related to the statistical concept of covariance. The covariance between two variables, Xi and Yi, is defined as

\[ C(X_i, Y_i) = E[(X_i - E[X_i])(Y_i - E[Y_i])]. \]

Covariance has three important properties:

(i) The covariance of a variable with itself is its variance; C(Xi, Xi) = σ_X^2.

(ii) If the expectation of either Xi or Yi is 0, the covariance between them is the expectation of their product; C(Xi, Yi) = E[XiYi].

(iii) The covariance between linear functions of variables Xi and Yi, written Wi = a + bXi and Zi = c + dYi for constants a, b, c, d, is given by

\[ C(W_i, Z_i) = bd\,C(X_i, Y_i). \]
The intimate connection between regression and covariance can be seen in a bivariate regression model, that is, a regression with one regressor, Xi, plus an intercept.17 The bivariate regression slope and intercept are the values of a and b that minimize the associated residual sum of squares, which we write as

\[ RSS(a, b) = E[Y_i - a - bX_i]^2. \]

The term RSS references a sum of squares because, carrying out this minimization in a particular sample, we replace expectation with a sample average or sum. The solution for the bivariate case is

\[ b = \beta = \frac{C(Y_i, X_i)}{V(X_i)}, \qquad a = \alpha = E[Y_i] - \beta E[X_i]. \tag{2.7} \]

An implication of equation (2.7) is that when two variables are uncorrelated (have a covariance of 0), the regression of either one on the other generates a slope coefficient of 0. Likewise, a bivariate regression slope of 0 implies the two variables involved are uncorrelated.
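As a rough illustration of the covariance-over-variance solution, the sketch below computes the bivariate slope and intercept from sample analogs of C(Yi, Xi) and V(Xi). The data and the helper names `cov` and `bivariate` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Sample analog of C(X, Y): average product of deviations from means."""
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])


def bivariate(y, x):
    """Return (intercept, slope): slope = C(Y, X) / V(X), intercept = E[Y] - slope * E[X]."""
    slope = cov(y, x) / cov(x, x)
    return mean(y) - slope * mean(x), slope


# Hypothetical data with a strong positive relationship.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]
alpha, beta = bivariate(y, x)
print(round(alpha, 4), round(beta, 4))

# Two uncorrelated variables: regressing either on the other gives slope 0.
u = [-1, 0, 1]
v = [1, -2, 1]
print(cov(u, v), bivariate(v, u)[1], bivariate(u, v)[1])
```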

Fits and Residuals
Regression breaks any dependent variable into two pieces. Specifically, for dependent variable Yi, we can write

\[ Y_i = \hat{Y}_i + e_i. \]

The first term consists of the fitted values, Ŷi, sometimes said to be the part of Yi that’s “explained” by the model. The second part, the residuals, ei, is what’s left over.

Regression residuals and the regressors included in the model that produced them are uncorrelated. In other words, if ei is the residual from a regression on X1i, …, XKi, then the regression of ei on these same variables produces coefficients that are all 0. Because fitted values are a linear combination of regressors, they’re also uncorrelated with residuals. We summarize these important properties here.



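PROPERTIES OF RESIDUALS Suppose that α and β1, …, βK are the intercept and slope coefficients from a regression of Yi on X1i, …, XKi. The fitted values from this regression are

\[ \hat{Y}_i = \alpha + \sum_{k=1}^{K} \beta_k X_{ki}, \]

and the associated regression residuals are

\[ e_i = Y_i - \hat{Y}_i = Y_i - \alpha - \sum_{k=1}^{K} \beta_k X_{ki}. \]

Regression residuals

(i) have expectation and sample mean 0:

\[ E[e_i] = \sum_{i=1}^{n} e_i = 0; \]

(ii) are uncorrelated in both population and sample with all regressors that made them and with the corresponding fitted values. That is, for each regressor, Xki,

\[ C(X_{ki}, e_i) = \sum_{i=1}^{n} (X_{ki} - \bar{X}_k) e_i = 0 \quad \text{and} \quad C(\hat{Y}_i, e_i) = \sum_{i=1}^{n} (\hat{Y}_i - \bar{Y}) e_i = 0. \]

You can take these properties on faith, but for those who know a little calculus, they’re easy to establish. Start with the fact that regression parameters and estimates minimize the residual sum of squares. The first-order conditions for this minimization problem amount to statements equivalent to (i) and (ii).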
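Properties (i) and (ii) are easy to confirm in a sample. The following sketch fits a bivariate regression to made-up data and checks that the residuals average to zero and have zero sample covariance with both the regressor and the fitted values (up to floating-point error). All names and values are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    """Sample covariance (dividing by n)."""
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])


# Hypothetical data.
x = [1, 2, 3, 4, 5]
y = [2.1, 3.9, 6.2, 7.8, 10.1]

beta = cov(y, x) / cov(x, x)
alpha = mean(y) - beta * mean(x)

fitted = [alpha + beta * xi for xi in x]
residuals = [yi - fi for yi, fi in zip(y, fitted)]

mean_resid = mean(residuals)               # property (i)
cov_resid_x = cov(x, residuals)            # property (ii), regressor
cov_resid_fitted = cov(fitted, residuals)  # property (ii), fitted values

print(mean_resid, cov_resid_x, cov_resid_fitted)
```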

Regression for Dummies
An important regression special case is bivariate regression with a dummy regressor. The conditional expectation of Yi given a dummy variable, Zi, takes on two values. Write them in Greek, like this:

\[ E[Y_i \mid Z_i = 0] = \alpha, \qquad E[Y_i \mid Z_i = 1] = \alpha + \beta, \]

so that

\[ \beta = E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] \]

is the difference in expected Yi with the dummy regressor, Zi, switched on and off.

Using this notation, we can write

\[ E[Y_i \mid Z_i] = E[Y_i \mid Z_i = 0] + (E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]) Z_i = \alpha + \beta Z_i. \tag{2.8} \]

This shows that E[Yi|Zi] is a linear function of Zi, with slope β and intercept α. Because the CEF with a single dummy variable is linear, regression fits this CEF perfectly. As a result, the regression slope must also be β = E[Yi|Zi = 1] − E[Yi|Zi = 0], the difference in expected Yi with Zi switched on and off.

Regression for dummies is important because dummy regressors crop up often, as in our analyses of health insurance and types of college attended.
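A small numerical check makes the dummy-regressor result concrete. In the sketch below (made-up data, hypothetical names), the regression slope on a dummy equals the difference in group means and the intercept equals the mean of the group with the dummy switched off.

```python
def mean(values):
    return sum(values) / len(values)


# Hypothetical outcome and dummy regressor.
y = [3.0, 5.0, 4.0, 8.0, 10.0]
z = [0, 0, 0, 1, 1]

# Bivariate regression of y on z.
mz, my = mean(z), mean(y)
slope = (sum((zi - mz) * (yi - my) for zi, yi in zip(z, y))
         / sum((zi - mz) ** 2 for zi in z))
intercept = my - slope * mz

# Group averages with the dummy switched off and on.
mean_off = mean([yi for yi, zi in zip(y, z) if zi == 0])
mean_on = mean([yi for yi, zi in zip(y, z) if zi == 1])

print(intercept, slope)
print(mean_off, mean_on - mean_off)
```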

Regression Anatomy and the OVB Formula

The most interesting regressions are multiple; that is, they include a causal variable of interest, plus one or more control variables. Equation (2.2), for example, regresses log earnings on a dummy for private college attendance in a model that controls for ability, family background, and the selectivity of schools that students have applied to and been admitted to. We’ve argued that control for covariates in a regression model is much like matching. That is, the regression coefficient on a private school dummy in a model with controls is similar to what we’d get if we divided students into cells based on these controls, compared public school and private school students within these cells, and then took an average of the resulting set of conditional comparisons. Here, we offer a more detailed “regression anatomy” lesson.
Suppose the causal variable of interest is X1i (say, a dummy for private school) and the control variable is X2i (say, SAT scores). With a little work, the coefficient on X1i in a regression controlling for X2i can be written as

\[ \beta_1 = \frac{C(Y_i, \tilde{X}_{1i})}{V(\tilde{X}_{1i})}, \]

where X̃1i is the residual from a regression of X1i on X2i:

\[ X_{1i} = \pi_0 + \pi_1 X_{2i} + \tilde{X}_{1i}. \]

As always, residuals are uncorrelated with the regressors that made them, and so it is for the residual X̃1i. It’s not surprising, therefore, that the coefficient on X1i in a multivariate regression that controls for X2i is the bivariate coefficient from a model that includes only the part of X1i that is uncorrelated with X2i. This important regression anatomy formula shapes our understanding of regression coefficients from around the world.
The regression anatomy idea extends to models with more than two regressors. The multivariate coefficient on a given regressor can be written as the coefficient from a bivariate regression on the residual from regressing this regressor on all others. Here’s the anatomy of the kth coefficient in a model with K regressors:

REGRESSION ANATOMY

\[ \beta_k = \frac{C(Y_i, \tilde{X}_{ki})}{V(\tilde{X}_{ki})}, \]

where X̃ki is the residual from a regression of Xki on the K − 1 other covariates included in the model.
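Regression anatomy can be verified in the two-regressor case by computing the coefficient on X1i two ways: directly from the normal equations, and as a bivariate slope on the residualized regressor. The sketch below uses made-up data; the function names `residualize` and `long_slope` are hypothetical.

```python
def mean(values):
    return sum(values) / len(values)


def cov(x, y):
    mx, my = mean(x), mean(y)
    return mean([(a - mx) * (b - my) for a, b in zip(x, y)])


def residualize(x, on):
    """Residual from a bivariate regression of x on `on` (with intercept)."""
    b = cov(x, on) / cov(on, on)
    a = mean(x) - b * mean(on)
    return [xi - a - b * oi for xi, oi in zip(x, on)]


def long_slope(y, x1, x2):
    """Coefficient on x1 from regressing y on x1 and x2 (2x2 normal equations)."""
    det = cov(x1, x1) * cov(x2, x2) - cov(x1, x2) ** 2
    return (cov(x1, y) * cov(x2, x2) - cov(x2, y) * cov(x1, x2)) / det


# Hypothetical data.
y = [3, 4, 8, 9, 14, 13]
x1 = [1, 2, 3, 4, 5, 6]
x2 = [2, 1, 4, 3, 6, 5]

x1_tilde = residualize(x1, x2)   # the part of x1 uncorrelated with x2
beta_anatomy = cov(y, x1_tilde) / cov(x1_tilde, x1_tilde)
beta_direct = long_slope(y, x1, x2)

print(beta_anatomy, beta_direct)
```

The two numbers should agree up to floating-point error, since the residualized regressor carries only the variation in X1i that X2i does not explain.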

Regression anatomy is especially revealing when the controls consist of dummy variables, as in equation (2.2). For the purposes of this discussion, we simplify the model of interest to have only dummy controls, that is,

\[ \ln Y_i = \alpha + \beta P_i + \sum_{j=1}^{150} \gamma_j GROUP_{ji} + e_i. \tag{2.9} \]

Regression anatomy tells us that the coefficient on Pi controlling for the set of 150 GROUPji dummies is the bivariate coefficient from a regression on P̃i, where this is the residual from a regression of Pi on a constant and the set of 150 GROUPji dummies.
It’s helpful here to add a second subscript to index groups as well as individuals. In this scheme, ln Yij is the log earnings of college graduate i in selectivity group j, while Pij is this graduate’s private school enrollment status. What is the residual, P̃ij, from the auxiliary regression of Pij on the set of 150 selectivity-group dummies? Because the auxiliary regression that generates P̃ij has a parameter for every possible value of the underlying CEF, this regression captures the CEF of Pij conditional on selectivity group perfectly. (Here we’re extending the dummy-variable result described by equation (2.8) to regression on dummies describing a categorical variable that takes on many values instead of just two.) Consequently, the fitted value from a regression of Pij on the full set of selectivity-group dummies is the mean private school attendance rate in each group. For applicant i in group j, the auxiliary regression residual is therefore P̃ij = Pij − P̄j, where P̄j is shorthand for the mean private school enrollment rate in the selectivity group to which i belongs.

Finally, putting the pieces together, regression anatomy tells us that the multivariate β in the model described by equation (2.9) is

\[ \beta = \frac{C(\ln Y_{ij}, \tilde{P}_{ij})}{V(\tilde{P}_{ij})} = \frac{E[\ln Y_{ij}(P_{ij} - \bar{P}_j)]}{E[(P_{ij} - \bar{P}_j)^2]}. \tag{2.10} \]


This expression reveals that, just as if we were to manually sort students into groups and compare public and private students within each group, regression on private school attendance with control for selectivity-group dummies is also a within-group procedure: variation across groups is removed by subtracting P̄j to construct the residual, P̃ij. Moreover, as for groups C and D in Table 2.1, equation (2.10) implies that applicant groups in which everyone attends either a public or private institution are uninformative about the effects of private school attendance because Pij − P̄j is 0 for everyone in such groups.
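The within-group logic can be sketched with the chapter’s five-student example. The earnings values below are an assumption (they are not given in this section, but they reproduce the long-regression coefficient of 10,000 quoted in the chapter), and a hypothetical all-private group C is appended to show that such groups contribute nothing.

```python
def mean(values):
    return sum(values) / len(values)


def within_group_beta(y, p, groups):
    """Coefficient on p after removing group means of p (regression anatomy)."""
    p_bar = {g: mean([pi for pi, gi in zip(p, groups) if gi == g])
             for g in set(groups)}
    p_tilde = [pi - p_bar[gi] for pi, gi in zip(p, groups)]
    beta = (sum(yi * pt for yi, pt in zip(y, p_tilde))
            / sum(pt ** 2 for pt in p_tilde))
    return beta, p_tilde


# ASSUMED earnings for the five-student example (groups A and B).
earnings = [110_000, 100_000, 110_000, 60_000, 30_000]
private = [1, 1, 0, 1, 0]
group = ["A", "A", "A", "B", "B"]

beta, p_tilde = within_group_beta(earnings, private, group)
print(round(beta), [round(pt, 3) for pt in p_tilde])

# Add a hypothetical group C in which everyone attends a private school.
beta_c, p_tilde_c = within_group_beta(
    earnings + [90_000, 95_000], private + [1, 1], group + ["C", "C"])
print(round(beta_c), p_tilde_c[-2:])
```

Because the residual is zero for both group C students, the estimate is unchanged when they are added, whatever their earnings.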
The OVB formula, used at the end of this chapter (in Section 2.3) to interpret estimates from models with different sets of controls, provides another revealing take on regression anatomy. Call the coefficient on X1i in a multivariate regression model controlling for X2i the long regression coefficient, β^l:

\[ Y_i = \alpha^l + \beta^l X_{1i} + \gamma X_{2i} + e_i^l. \]

Call the coefficient on X1i in a bivariate regression (that is, without X2i) the short regression coefficient, β^s:

\[ Y_i = \alpha^s + \beta^s X_{1i} + e_i^s. \]

The OVB formula describes the relationship between short and long coefficients as follows.

OMITTED VARIABLES BIAS (OVB) FORMULA

\[ \beta^s = \beta^l + \pi_{21}\gamma, \]

where γ is the coefficient on X2i in the long regression, and π21 is the coefficient on X1i in a regression of X2i on X1i. In words: short equals long plus the effect of omitted times the regression of omitted on included.

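This central formula is worth deriving. The slope coefficient in the short model is

\[ \beta^s = \frac{C(Y_i, X_{1i})}{V(X_{1i})}. \tag{2.11} \]

Substituting the long model for Yi in equation (2.11) gives

\[ \frac{C(\alpha^l + \beta^l X_{1i} + \gamma X_{2i} + e_i^l,\, X_{1i})}{V(X_{1i})} = \frac{\beta^l V(X_{1i}) + \gamma C(X_{2i}, X_{1i}) + C(e_i^l, X_{1i})}{V(X_{1i})} = \beta^l + \gamma\,\frac{C(X_{2i}, X_{1i})}{V(X_{1i})} = \beta^l + \pi_{21}\gamma. \]

The first equals sign comes from the fact that the covariance of a linear combination of variables is the corresponding linear combination of covariances after distributing terms. Also, the covariance of a constant with anything else is 0, and the covariance of a variable with itself is the variance of that variable. The second equals sign comes from the fact that C(e_i^l, X1i) = 0, because residuals are uncorrelated with the regressors that made them (e_i^l is the residual from a regression that includes X1i). The third equals sign defines π21 to be the coefficient on X1i in a regression of X2i on X1i.18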

Often, as in the discussion of equations (2.2) and (2.5), we’re interested in short vs. long comparisons across regression models that include a set of controls common to both models. The OVB formula for this scenario is a straightforward extension of the one above. Call the coefficient on X1i in a multivariate regression controlling for X2i and X3i the long regression coefficient, β^l; call the coefficient on X1i in a multivariate regression controlling only for X3i (that is, without X2i) the short regression coefficient, β^s. The OVB formula in this case can still be written

\[ \beta^s = \beta^l + \pi_{21}\gamma, \tag{2.12} \]

where γ is the coefficient on X2i in the long regression, but that regression now includes X3i as well as X2i, and π21 is the coefficient on X1i in a regression of X2i on both X1i and X3i. Once again, we can say: short equals long plus the effect of omitted times the regression of omitted on included. We leave it to the reader to derive equation (2.12); this derivation tests your understanding (and makes an awesome exam question).

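Building Models with Logs
The regressions discussed in this chapter look like

\[ \ln Y_i = \alpha + \beta P_i + \sum_j \gamma_j GROUP_{ji} + \delta_1 SAT_i + \delta_2 \ln PI_i + e_i, \]

a repeat of equation (2.2). What’s up with ln Yi on the left-hand side? Why use logs and not the variable Yi itself? The answer is easiest to see in a bivariate regression, say,

\[ \ln Y_i = \alpha + \beta P_i + e_i, \tag{2.13} \]

where Pi is a dummy for private school attendance. Because this is a case of regression for dummies, we have

\[ E[\ln Y_i \mid P_i] = \alpha + \beta P_i. \]

In other words, regression in this case fits the CEF perfectly.

Suppose we engineer a ceteris paribus change in Pi for student i. This reveals potential outcome Y0i when Pi = 0 and Y1i when Pi = 1. Thinking now of equation (2.13) as a model for the log of these potential outcomes, we have

\[ \ln Y_{0i} = \alpha + e_i, \qquad \ln Y_{1i} = \alpha + \beta + e_i. \]

The difference in potential outcomes is therefore

\[ \ln Y_{1i} - \ln Y_{0i} = \beta. \tag{2.14} \]

Rearranging further gives

\[ \beta = \ln\frac{Y_{1i}}{Y_{0i}} = \ln\left\{1 + \frac{Y_{1i} - Y_{0i}}{Y_{0i}}\right\} = \ln\{1 + \Delta\%Y_p\}, \]

where Δ%Yp is shorthand for the percentage change in potential outcomes induced by Pi. Calculus tells us that ln{1 + Δ%Yp} is close to Δ%Yp, when the latter is small. From this, we conclude that the regression slope in a model with ln Yi on the left-hand side gives the approximate percentage change in Yi generated by changing the corresponding regressor.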
To calculate the exact percentage change generated by changing Pi, exponentiate both sides of equation (2.14)

\[ \frac{Y_{1i}}{Y_{0i}} = \exp(\beta), \]

so

\[ \frac{Y_{1i} - Y_{0i}}{Y_{0i}} = \exp(\beta) - 1. \]

When β is less than about .2, exp(β) − 1 and β are close enough to justify reference to the latter as percentage change.19

You might hear masters describe regression coefficients from a log-linear model as measuring “log points.” This terminology reminds listeners that the percentage change interpretation is approximate. In general, log points underestimate percentage change, that is,

\[ \beta < \exp(\beta) - 1, \]

with the gap between the two growing as β increases. For example, when β = .05, exp(β) − 1 = .051, but when β = .3, exp(β) − 1 = .35.
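The gap between log points and exact percentage change is simple to tabulate. The sketch below (the function name `exact_pct_change` is hypothetical) evaluates exp(β) − 1 for a few values of β, including the two used in the text.

```python
import math


def exact_pct_change(beta):
    """Exact proportional change implied by a log-point coefficient beta."""
    return math.exp(beta) - 1


for beta in (0.05, 0.1, 0.2, 0.3):
    exact = exact_pct_change(beta)
    print(f"beta = {beta:.2f}  exp(beta) - 1 = {exact:.3f}  gap = {exact - beta:.3f}")
```

The gap column grows with β, which is why the log-point shorthand is safest for small coefficients.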

Regression Standard Errors and Confidence Intervals
Our regression discussion has largely ignored the fact that our data come from samples. As we noted in the appendix to the first chapter, sample regression estimates, like sample means, are subject to sampling variance. Although we imagine the underlying relationship quantified by a regression to be fixed and nonrandom, we expect estimates of this relationship to change when computed in a new sample drawn from the same population. Suppose we’re after the relationship between the earnings of college graduates and the types of colleges they’ve attended. We’re unlikely to have data on the entire population of graduates. In practice, therefore, we work with samples drawn from the population of interest. (Even if we had a complete enumeration of the student population in one year, different students will have gone to school in other years.) The data set analyzed to produce the estimates in Tables 2.2–2.5 is one such sample. We would like to quantify the sampling variance associated with these estimates.
Just as with a sample mean, the sampling variance of a regression coefficient is measured by its standard error. In the appendix to Chapter 1, we explained that the standard error of a sample average is

\[ SE(\bar{Y}_n) = \frac{\sigma_Y}{\sqrt{n}}. \]

The standard error of the slope estimate in a bivariate regression (β̂) looks similar and can be written as

\[ SE(\hat{\beta}) = \frac{\sigma_e}{\sqrt{n}} \times \frac{1}{\sigma_X}, \]

where σe is the standard deviation of the regression residuals, and σX is the standard deviation of the regressor, Xi.
Like the standard error of a sample average, regression standard errors decrease with sample size. Standard errors increase (that is, regression estimates are less precise) when the residual variance is large. This isn’t surprising, since a large residual variance means the regression line doesn’t fit very well. On the other hand, variability in regressors is good: as σX increases, the slope estimate becomes more precise. This is illustrated in Figure 2.2, which shows how adding variability in Xi (specifically, adding the observations plotted in gray) helps pin down the slope linking Yi and Xi.

FIGURE 2.2
Variance in X is good

The regression anatomy formula for multiple regression carries over to standard errors. In a multivariate model like this,

\[ Y_i = \alpha + \sum_{k=1}^{K} \beta_k X_{ki} + e_i, \]

the standard error for the kth sample slope, β̂k, is

\[ SE(\hat{\beta}_k) = \frac{\sigma_e}{\sqrt{n}} \times \frac{1}{\sigma_{\tilde{X}_k}}, \tag{2.15} \]

where σ_X̃k is the standard deviation of X̃ki, the residual from a regression of Xki on all other regressors. The addition of controls has two opposing effects on SE(β̂k). The residual variance (σe in the numerator of the standard error formula) falls when covariates that predict Yi are added to the regression. On the other hand, the standard deviation of X̃ki in the denominator of the standard error formula is less than the standard deviation of Xki, increasing the standard error. Additional covariates explain some of the variation in other regressors, and this variation is removed by virtue of regression anatomy. The upshot of these changes to top and bottom can be either an increase or decrease in precision.
Standard errors computed using equation (2.15) are nowadays considered
old-fashioned and are not often seen in public. The old-fashioned
formula is derived assuming the variance of residuals is unrelated to
regressors, a scenario that masters call homoskedasticity.
Homoskedastic residuals can make regression estimates a statistically
efficient matchmaker. However, because the homoskedasticity assumption
may not be satisfied, kids today rock a more complicated calculation
known as robust standard errors.
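The robust standard error formula can be written as

$$RSE(\hat{\beta}_k) = \frac{1}{\sqrt{n}} \times \frac{\sqrt{V(\tilde{X}_{ki} e_i)}}{\sigma^2_{\tilde{X}_k}}.$$

Robust standard errors allow for the possibility that the regression
line fits more or less well for different values of Xi, a scenario known
as heteroskedasticity. If the residuals turn out to be homoskedastic
after all, the robust numerator simplifies:

$$\sqrt{V(\tilde{X}_{ki} e_i)} = \sqrt{V(\tilde{X}_{ki}) V(e_i)} = \sigma_{\tilde{X}_k} \sigma_e.$$

In this case, estimates of RSE($\hat{\beta}_k$) should be close to
estimates of SE($\hat{\beta}_k$), since the theoretical standard errors
are then identical. But if residuals are indeed heteroskedastic,
estimates of RSE($\hat{\beta}_k$) usually provide a more accurate (and
typically somewhat larger) measure of sampling variance.20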
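The comparison between old-fashioned and robust standard errors can be sketched in a few lines of Python for the bivariate case. This is only an illustration under stated assumptions: the function name `slope_standard_errors`, the simulated data, and the choice to divide by n with no degrees-of-freedom correction are ours, and econometric software typically applies small-sample adjustments that this sketch omits.

```python
import math
import random

def slope_standard_errors(x, y):
    """Return (slope, old-fashioned SE, robust SE) for a bivariate regression."""
    n = len(x)
    x_bar = sum(x) / n
    y_bar = sum(y) / n
    x_dev = [xi - x_bar for xi in x]            # regressor in deviations from its mean
    var_x = sum(d * d for d in x_dev) / n       # sigma_X squared
    slope = sum(d * (yi - y_bar) for d, yi in zip(x_dev, y)) / (n * var_x)
    intercept = y_bar - slope * x_bar
    resid = [yi - intercept - slope * xi for xi, yi in zip(x, y)]
    var_e = sum(e * e for e in resid) / n       # sigma_e squared
    # Old-fashioned formula: sigma_e / (sqrt(n) * sigma_X)
    se = math.sqrt(var_e) / (math.sqrt(n) * math.sqrt(var_x))
    # Robust formula: sqrt(V(x_dev * e)) / (sqrt(n) * sigma_X squared)
    v_xe = sum((d * e) ** 2 for d, e in zip(x_dev, resid)) / n
    rse = math.sqrt(v_xe) / (math.sqrt(n) * var_x)
    return slope, se, rse

# Simulated data (hypothetical): the true slope is 2 in both cases.
rng = random.Random(0)
x = [rng.gauss(0, 1) for _ in range(20000)]
y_homo = [1 + 2 * xi + rng.gauss(0, 1) for xi in x]              # homoskedastic
y_hetero = [1 + 2 * xi + abs(xi) * rng.gauss(0, 1) for xi in x]  # residual spread grows with |x|

b_homo, se_homo, rse_homo = slope_standard_errors(x, y_homo)
b_het, se_het, rse_het = slope_standard_errors(x, y_hetero)
print("ratio RSE/SE, homoskedastic:", round(rse_homo / se_homo, 2))
print("ratio RSE/SE, heteroskedastic:", round(rse_het / se_het, 2))
```

With homoskedastic residuals the two numbers should be close to each other; in the heteroskedastic design used here, the robust standard error comes out noticeably larger.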

1 SAT scores here are from the pre-2005 SAT. Pre-2005 total scores add
math and verbal scores, each of which range from 0 to 800, so the
combined maximum is 1,600.
2 Stacy Berg Dale and Alan B. Krueger, “Estimating the Payoff to
Attending a More Selective College: An Application of Selection on
Observables and Unobservables,” Quarterly Journal of Economics, vol.
117, no. 4, November 2002, pages 1491–1527.
3 Which isn’t to say they are never fooled. Adam Wheeler faked his way
into Harvard with doctored transcripts and board scores in 2007. His
fakery notwithstanding, Adam managed to earn mostly As and Bs at Harvard
before his scheme was uncovered (John R. Ellement and Tracy Jan,
“Ex-Harvard Student Accused of Living a Lie,” The Boston Globe, May 18,
2010).
4 When data fall into one of J groups, we need J − 1 dummies for a full
description of the groups. The category for which no dummy is coded is
called the reference group.
5 “Ordinary-ness” here refers to the fact that OLS weights each
observation in this sum of squares equally. We discuss weighted least
squares estimation in Chapter 5.
6 Our book, Mostly Harmless Econometrics (Princeton University Press,
2009), discusses regression-weighting schemes in more detail.
7 Barron’s classifies colleges as Most Competitive, Highly Competitive,
Very Competitive, Competitive, Less Competitive, and Noncompetitive,
according to the class rank of enrolled students and the proportion of
applicants admitted.
8 Other controls in the empirical model include dummies for female
students, student race, athletes, and a dummy for those who graduated in
the top 10% of their high school class. These variables are not written
out in equation (2.2).
9 Dale and Krueger, “Estimating the Payoff to Attending a More Selective
College,” Quarterly Journal of Economics, 2002.
10 The group dummies in (2.4), θj, are read “theta-j.”
11 This coefficient is read “lambda.”
12 Joseph Altonji, Todd Elder, and Christopher Taber formalize the
notion that the OVB associated with the regressors you have at hand
provides a guide to the OVB generated by those you don’t. For details,
see their study “Selection on Observed and Unobserved Variables:
Assessing the Effectiveness of Catholic Schools,” Journal of Political
Economy, vol. 113, no. 1, February 2005, pages 151–184.
13 Francis Galton, “Regression towards Mediocrity in Hereditary
Stature,” Journal of the Anthropological Institute of Great Britain and
Ireland, vol. 15, 1886, pages 246–263.
14 George Udny Yule, “An Investigation into the Causes of Changes in
Pauperism in England, Chiefly during the Last Two Intercensal Decades,”
Journal of the Royal Statistical Society, vol. 62, no. 2, June 1899,
pages 249–295.
15 For a more detailed explanation, see Chapter 3 of Angrist and
Pischke, Mostly Harmless Econometrics, 2009.
16 The thing inside braces here, E[Yi|Xi] − E[Yi|Xi − 1], is a function
of Xi, and so, like the variable Xi, it has an expectation.
17 The term “bivariate” comes from the fact that two variables are
involved, one dependent, on the left-hand side, and one regressor, on
the right. Multivariate regression models add regressors to this basic
setup.
18 The regression anatomy formula is derived similarly, hence we show
the steps only for OVB.
19 The percentage change interpretation of regression models built with
logs does not require a link with potential outcomes, but it’s easier to
explain in the context of models with such a link.
20 The distinction between robust and old-fashioned standard errors for
regression estimates parallels the distinction (noted in the appendix to
Chapter 1) between standard error estimators for the difference in two
means that use separate or common estimates of $\sigma^2_Y$ for the
variance of data from treatment and control groups.



Chapter 3

Instrumental Variables

KWAI CHANG CAINE: From a single action, you draw an entire universe.
Kung Fu, Season 1, Episode 1

Our Path

Statistical control through regression may fail to produce convincing
estimates of causal effects. Luckily, other paths lead to other things
equal. Just as in randomized trials, the forces of nature, including
human nature, sometimes manipulate treatment in a manner that obviates
the need for controls. Such forces are rarely the only source of
variation in treatment, but this is an obstacle easily surmounted. The
instrumental variables (IV) method harnesses partial or incomplete
random assignment, whether naturally occurring or generated by
researchers. We illustrate this important idea three ways. The first
evaluates an American education innovation, charter schools, with an
elementary IV analysis that exploits randomized school admissions
lotteries. A second IV application, examining the question of how best
to respond to domestic violence, shows how IV can be used to analyze
field experiments in which the subjects randomly assigned to treatment
are free to opt out. The third application explores the long-run effects
of growing up in a larger or smaller family. This application
illustrates two-stage least squares (2SLS), an elaboration on the IV
method and one of our most powerful tools.

3.1 The Charter Conundrum

INTERVIEWER: Have your mom and dad told you about the lottery?

DAISY: The lottery … isn’t that when people play and they win money?
Waiting for Superman, 2010

The release of Waiting for Superman, a documentary film that tells the
story of applicants to charter schools in New York and California,
intensified an already feverish debate over American education policy.
Superman argues that charter schools offer the best hope for poor
minority students who would otherwise remain at inner city public
schools, where few excel and many drop out.
Charter schools are public schools that operate with considerably more
autonomy than traditional American public schools. A charter, the right
to operate a public school, is typically awarded to an independent
operator (mostly private, nonprofit management organizations) for a
limited period, subject to renewal conditional on good performance.
Charter schools are free to structure their curricula and school
environments. Many charter schools expand instruction time by running
long school days and continuing school on weekends and during the
summer. Perhaps the most important and surely the most controversial
difference between charters and traditional public schools is that the
teachers and staff who work at the former rarely belong to labor unions.
By contrast, most big-city public school teachers work under teachers’
union contracts that regulate pay and working conditions, often in a
very detailed manner. These contracts may improve working conditions for
teachers, but they can make it hard to reward good teachers or dismiss
bad ones.
Among the schools featured in Waiting for Superman is KIPP LA College
Prep, one of more than 140 schools affiliated with the Knowledge Is
Power Program. KIPP schools are emblematic of the No Excuses approach to
public education, a widely replicated charter model that emphasizes
discipline and comportment and features a long school day, an extended
school year, selective teacher hiring, and a focus on traditional
reading and math skills. KIPP was started in Houston and New York City
in 1995 by veterans of Teach for America, a program that recruits
thousands of recent graduates of America’s most selective colleges and
universities to teach in low-performing school districts. Today, the
KIPP network serves a student body that is 95% black and Hispanic, with
more than 80% of KIPP students poor enough to qualify for the federal
government’s subsidized lunch program.1

The American debate over education reform often focuses on the
achievement gap, shorthand for uncomfortably large test score
differences by race and ethnicity. Black and Hispanic children generally
score well below white and Asian children on standardized tests. The
question of how policymakers should react to large and persistent racial
achievement gaps generates two sorts of responses. The first looks to
schools to produce better outcomes; the second calls for broader social
change, arguing that schools alone are unlikely to close achievement
gaps. Because of its focus on minority students, KIPP is often central
in this debate, with supporters pointing out that nonwhite KIPP students
have markedly higher average test scores than nonwhite students from
nearby schools. KIPP skeptics have argued that KIPP’s apparent success
reflects the fact that KIPP attracts families whose children are more
likely to succeed anyway:

KIPP students, as a group, enter KIPP with substantially higher
achievement than the typical achievement of schools from which they
came…. [T]eachers told us either that they referred students who were
more able than their peers, or that the most motivated and educationally
sophisticated parents were those likely to take the initiative … and
enroll in KIPP.2

This claim raises the important question of whether ceteris is paribus
when KIPP students are compared to other public school children.

Playing the Lottery
The first KIPP school in New England was a middle school in the town of
Lynn, Massachusetts, just north of Boston. An old ditty warns: “Lynn,
Lynn, city of sin, you never come out the way you came in.” Alas,
there’s not much coming out of Lynn today, sinful or otherwise. Once a
shoe manufacturing hub, Lynn has more recently been distinguished by
high rates of unemployment, crime, and poverty. In 2009, more than
three-quarters of Lynn’s mostly nonwhite public school students were
poor enough to qualify for a subsidized lunch. Poverty rates are even
higher among KIPP Lynn’s entering cohorts of fifth graders. Although
urban charter schools typically enroll many poor, black students, KIPP
Lynn is unusual among charters in enrolling a high proportion of
Hispanic children with limited English proficiency.
KIPP Lynn got off to a slow start when it opened in fall 2004, with
fewer applicants than seats. A year later the school was oversubscribed,
but not by much. After 2005, however, demand accelerated, with more than
200 students applying for about 90 seats in fifth grade each year. As
required by Massachusetts law, scarce charter seats are allocated by
lottery. More than a colorful institutional detail, these lotteries
allow us to untangle the charter school causality conundrum. Our IV tool
uses these admissions lotteries to frame a naturally occurring
randomized trial.
The decision to attend a charter school is never entirely random: even
among applicants, some of those offered a seat nevertheless choose to go
elsewhere, while a few lottery losers find their way in by other means.
However, comparisons of applicants who are and are not offered a seat as
a result of random admissions lotteries should be satisfyingly apples to
apples in nature. Assuming the only difference created by winning the
lottery is in the likelihood of charter enrollment (an assumption called
an exclusion restriction), IV turns randomized offer effects into causal
estimates of the effect of charter attendance. Specifically, IV
estimates capture causal effects on the sort of child who enrolls in
KIPP when offered a seat in a lottery but wouldn’t manage to get in
otherwise. As we explain below, this group is known as the set of KIPP
lottery compliers.
Master Joshway and his collaborators collected data on applicants to
KIPP Lynn from fall 2005 through fall 2008.3 Some applicants bypass the
lottery: those with previously enrolled siblings are (for the most part)
guaranteed admission. A few applicants are categorically excluded (those
too old for middle school, for example). Among the 446 applicants for
fifth-grade entry who were subject to random assignment in the four KIPP
lotteries held from 2005 to 2008, 303 (68%) were offered a seat. Perhaps
surprisingly, however, a fair number of these students failed to enroll
come September. Some had moved away, while others ultimately preferred a
nearby neighborhood school. Among those offered a seat, 221 (73%)
appeared at KIPP the following school year. At the same time, a handful
of those not offered a place (about 3.5%) nevertheless found their way
into KIPP (a few losing applicants were offered charter seats at a later
date or in a later lottery). Figure 3.1 summarizes this important
information.

KIPP lotteries randomize the offer of a charter seat. Random assignment
of offers should balance the demographic characteristics of applicants
who were and were not offered seats. Balance by offer status indeed
looks good, as can be seen in panel A of Table 3.1. As a benchmark, the
first column reports demographic characteristics and elementary school
test scores for all Lynn public school fifth graders. The second and
third columns, which report averages for KIPP lottery winners and the
difference in means between winners and losers, show that winners and
losers are about equally likely to be black or Hispanic or poor enough
to qualify for a free lunch.

FIGURE 3.1
Application and enrollment data from KIPP Lynn lotteries

Note: Numbers of Knowledge Is Power Program (KIPP) applicants are shown
in parentheses.

An especially important feature of Table 3.1 is the check for balance in
pretreatment outcomes, namely, the test scores of lottery applicants in
fourth grade, prior to KIPP enrollment (these are labeled “baseline
scores” in the table). As is common in research on student achievement,
these scores have been standardized by subtracting the mean and dividing
by the standard deviation of scores in a reference population, in this
case, the population of Massachusetts fourth graders. After
standardization, scores are measured in units defined by the standard
deviation of the reference population. As in many poorer cities and
towns in Massachusetts, average math scores in Lynn fall about
three-tenths of a standard deviation below the state mean. This level of
scores is written −.3σ (as in the appendix to Chapters 1 and 2, standard
deviation is represented by the Greek letter “sigma”). The small and
statistically insignificant baseline differences between KIPP lottery
winners and losers reported in column (3) of Table 3.1 are most likely
due to chance.
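The standardization step can be sketched as follows. This is a minimal illustration, not the procedure used to build Table 3.1: the function name `standardize` and all raw scores are invented, and we assume the reference population's standard deviation is computed as a population (not sample) standard deviation.

```python
import statistics

def standardize(scores, reference_scores):
    """Express scores in standard deviation units of a reference population."""
    ref_mean = statistics.fmean(reference_scores)
    ref_sd = statistics.pstdev(reference_scores)  # population SD of the reference group
    return [(s - ref_mean) / ref_sd for s in scores]

# Hypothetical raw scores, invented for illustration only.
state_scores = [220, 230, 240, 250, 260]   # stands in for the reference population
lynn_scores = [225, 235, 245]              # stands in for one city's students

z_state = standardize(state_scores, state_scores)
z_lynn = standardize(lynn_scores, state_scores)
print("mean standardized score, city:", round(statistics.fmean(z_lynn), 2))
```

By construction, the reference population itself has a standardized mean of 0 and a standard deviation of 1, so a negative city average reads directly as "so many σ below the reference mean."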

TABLE 3.1
Analysis of KIPP lotteries

Notes: This table describes baseline characteristics of Lynn fifth
graders and reports estimated offer effects for Knowledge Is Power
Program (KIPP) Lynn applicants. Means appear in columns (1), (2), and
(4). Column (3) shows differences between lottery winners and losers.
These are coefficients from regressions that control for risk sets,
namely, dummies for year and grade of application and the presence of a
sibling applicant. Column (5) shows differences between KIPP students
and applicants who did not attend KIPP. Standard errors are reported in
parentheses.

The final two columns in Table 3.1 show averages for fifth graders who
enrolled at KIPP Lynn, along with differences between KIPP applicants
who did and did not enroll at KIPP. Since enrollment is not randomly
assigned, differences between enrolled and nonenrolled students
potentially reflect selection bias: Lottery winners who chose to go
elsewhere may care less about school than those who accepted a KIPP
enrollment opportunity. This is the selection bias scenario described by
KIPP skeptics. As it turns out, however, the gaps in column (5) are
small, and none approach statistical significance, suggesting that
selection bias may not be important in this context after all.
Most KIPP applicants apply to enter KIPP in fifth grade, one year before
regular middle school starts, but some apply to enter in sixth. We look
here at effects of KIPP attendance on test scores for tests taken at the
end of the grade following the application grade. These scores are from
the end of fifth grade for those who applied to KIPP when they were in
fourth grade and the end of sixth grade for those who applied to KIPP
while in fifth. The resulting sample, which includes 371 applicants,
omits young applicants who applied for entry after finishing third grade
and a few applicants with missing baseline or outcome scores.4

Panel B of Table 3.1 shows that KIPP applicants who were offered a seat
had standardized math scores close to 0, that is, near the state mean.
Because KIPP applicants start with fourth-grade scores that average
roughly .3σ below the state mean, achievement at the level of the state
mean should be seen as impressive. By contrast, the average outcome
score among those not offered a seat is about −.36σ, a little below the
fourth-grade starting point.
Since lottery offers are randomly assigned, the difference between 0 and
−.36, reported in column (3), is an average causal effect: the offer of
a seat at KIPP Lynn boosts math scores by .36σ, a large gain (the effect
of KIPP offers on reading scores, though also positive, is smaller and
not statistically significant). As a technical note, the analysis here
is slightly more complicated than a simple comparison of means, though
the idea is the same. The results in column (3) come from regressions of
scores on a dummy variable indicating KIPP offers, along with dummies
for year and grade of application and the presence of a sibling
applicant. These control variables are necessary because the probability
of winning the lottery varies from year to year and from grade to grade,
and is much higher for siblings. The control variables used here
describe groups of students (sometimes called risk sets) for whom the
odds of a lottery offer are constant.5
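Why risk-set controls matter can be seen in a small sketch. It relies on regression anatomy: assuming the regression includes one dummy per risk set, the coefficient on the offer dummy equals the coefficient from a bivariate regression of scores on the offer dummy after subtracting its risk-set mean. The function name and the eight-applicant data set below are invented for illustration; the offer effect of .36 is built into the fake data rather than estimated from anything real.

```python
from collections import defaultdict

def offer_effect_with_risk_sets(offer, score, risk_set):
    """Offer coefficient from a regression with one dummy per risk set,
    computed by demeaning the offer dummy within each risk set."""
    groups = defaultdict(list)
    for i, g in enumerate(risk_set):
        groups[g].append(i)
    num = den = 0.0
    for members in groups.values():
        z_bar = sum(offer[i] for i in members) / len(members)
        for i in members:
            z_tilde = offer[i] - z_bar      # offer dummy net of its risk-set mean
            num += z_tilde * score[i]
            den += z_tilde ** 2
    return num / den

# Invented data: siblings win more often AND score higher at baseline.
risk_set = ["non-sibling"] * 4 + ["sibling"] * 4
offer = [1, 0, 0, 0, 1, 1, 1, 0]
baseline = {"non-sibling": 0.0, "sibling": 1.0}
score = [baseline[g] + 0.36 * z for g, z in zip(risk_set, offer)]

winners = [s for s, z in zip(score, offer) if z == 1]
losers = [s for s, z in zip(score, offer) if z == 0]
naive_gap = sum(winners) / len(winners) - sum(losers) / len(losers)
controlled = offer_effect_with_risk_sets(offer, score, risk_set)
print(round(naive_gap, 2), round(controlled, 2))
```

In this made-up example the raw winner-loser gap mixes the offer effect with the sibling advantage, while the within-risk-set comparison recovers the effect that was built into the data.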

What does an offer effect of .36σ tell us about the effects of KIPP Lynn
attendance? The IV estimator converts KIPP offer effects into KIPP
attendance effects. In this case, the instrumental variable (or
“instrument” for short) is a dummy variable indicating KIPP applicants
who receive offers. In general, an instrument meets three requirements:

(i) The instrument has a causal effect on the variable whose effects
we’re trying to capture, in this case KIPP enrollment. For reasons that
will soon become clear, this causal effect is called the first stage.

(ii) The instrument is randomly assigned or “as good as randomly
assigned,” in the sense of being unrelated to the omitted variables we
might like to control for (in this case variables like family background
or motivation). This is known as the independence assumption.

(iii) Finally, IV logic requires an exclusion restriction. The exclusion
restriction describes a single channel through which the instrument
affects outcomes. Here, the exclusion restriction amounts to the claim
that the .36σ score differential between winners and losers is
attributable solely to the .74 win-loss difference in attendance rates
shown in column (3) of Table 3.1 (at the top of panel B).

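The IV method uses these three assumptions to characterize a chain
reaction leading from the instrument to student achievement. The first
link in this causal chain, the first stage, connects randomly assigned
offers with KIPP attendance, while the second link, the one we’re after,
connects KIPP attendance with achievement. By virtue of the independence
assumption and the exclusion restriction, the product of these two links
generates the effect of offers on test scores:

$$\text{Effect of offers on scores} = \{\text{Effect of offers on attendance}\} \times \{\text{Effect of attendance on scores}\}.$$

Rearranging, the causal effect of KIPP attendance is

$$\text{Effect of attendance on scores} = \frac{\{\text{Effect of offers on scores}\}}{\{\text{Effect of offers on attendance}\}}. \qquad (3.1)$$

This works out to be .48σ, as shown at the left in Figure 3.2.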
The logic generating equation (3.1) is easily summarized: KIPP offers
are assumed to affect test scores via KIPP attendance alone. Offers
increase attendance rates by about 75 percentage points (.74 to be
precise), so multiplying effects of offers on scores by about 4/3 (≈
1/.74) generates the attendance effect. This adjustment corrects for the
facts that roughly a quarter of those who were offered a seat at KIPP
chose to go elsewhere, while a few of those not offered nevertheless
wound up at KIPP.
An alternative estimate of the KIPP attendance effect appears in columns
(4) and (5) in Table 3.1. Column (4) reports means for KIPP students,
while column (5) shows the contrast between KIPP students and everyone
else in the applicant pool. The differences in column (5) ignore
randomized lottery offers and come from a regression of post-enrollment
math scores on a dummy variable for KIPP attendance, along with the same
controls used to construct the win/loss differences in column (3). The
variation in KIPP attendance in this regression comes mostly, but not
entirely, from the lottery. Because KIPP enrollment involves random
assignment as well as individual choices (made, for example, when
winners opt out), comparisons between those who do and don’t enroll may
be compromised by selection bias. However, the estimate for math in
column (5) (about .47σ) is close to the IV estimate in Figure 3.2,
confirming our earlier conjecture that selection bias is unimportant in
this case.

FIGURE 3.2
IV in school: the effect of KIPP attendance on math scores

Note: The effect of Knowledge Is Power Program (KIPP) enrollment
described by this figure is .48σ = .355σ/.741.

A gain of half a standard deviation in math scores after one school year
is a remarkable effect. Lynn residents lucky enough to have attended
KIPP really don’t come out the way they came in.

LATE for Charter School
The KIPP lottery exemplifies an IV chain reaction. The components of
such reactions have been named, so masters can discuss them efficiently.
We’ve noted that the original randomizer (in this case, a KIPP offer) is
called an instrumental variable or just an instrument for short. As
we’ve seen, the link from the instrument to the causal variable of
interest (in this case, the effect of lottery offers on KIPP attendance)
is called the first stage, because this is the first link in the chain.
The direct effect of the instrument on outcomes, which runs the full
length of the chain (in this case, the effect of offers on scores), is
called the reduced form. Finally, the causal effect of interest, the
second link in the chain, is determined by the ratio of reduced form to
first-stage estimates. This causal effect is called a local average
treatment effect (LATE for short).
The links in the IV chain are made of differences between conditional
expectations, that is, comparisons of population averages for different
groups. In practice, population averages are estimated using sample
means, usually with data from random samples. The necessary data are

▪ the instrument, Zi: in this case, a dummy variable that equals 1 for
applicants randomly offered a seat at KIPP (defined only for those
participating in the lottery);

▪ the treatment variable, Di: in this case, a dummy variable that equals
1 for those who attended KIPP (for historical reasons, this is sometimes
called the endogenous variable); and

▪ the outcome variable, Yi: in this case, fifth-grade math scores.

Key relationships between these variables, that is, the links in the IV
chain, are parameters. We therefore christen them, you guessed it, in
Greek.

THE FIRST STAGE E[Di|Zi = 1] − E[Di|Zi = 0]; call this ϕ.

In the KIPP study, ϕ (“phi”) is the difference in KIPP attendance rates
between those who were and were not offered a seat in the lottery (equal
to .74 in Figure 3.2).

THE REDUCED FORM E[Yi|Zi = 1] − E[Yi|Zi = 0]; call this ρ.

In the KIPP study, ρ (“rho”) is the difference in average test scores
between applicants who were and were not offered a seat in the lottery
(equal to .36 in Figure 3.2).

THE LOCAL AVERAGE TREATMENT EFFECT (LATE)

$$\lambda = \frac{\rho}{\phi} = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]}. \qquad (3.2)$$

LATE, denoted here by λ (“lambda”), is the ratio of the reduced form to
the first stage.

In the KIPP study, LATE is the difference in scores between winners and
losers divided by the difference in KIPP attendance rates between
winners and losers (equal to .48 in Figure 3.2).
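The ratio of reduced form to first stage is simple enough to compute by hand from four sample means. The sketch below is only an illustration: the function name `wald_estimate` and the eight-applicant sample are made up, and it ignores the risk-set controls used in the actual KIPP analysis. The final lines redo the arithmetic reported in the note to Figure 3.2.

```python
def mean(values):
    return sum(values) / len(values)

def wald_estimate(z, d, y):
    """Return (first stage, reduced form, LATE) from sample means by offer status."""
    d_won = [di for zi, di in zip(z, d) if zi == 1]
    d_lost = [di for zi, di in zip(z, d) if zi == 0]
    y_won = [yi for zi, yi in zip(z, y) if zi == 1]
    y_lost = [yi for zi, yi in zip(z, y) if zi == 0]
    first_stage = mean(d_won) - mean(d_lost)     # phi
    reduced_form = mean(y_won) - mean(y_lost)    # rho
    return first_stage, reduced_form, reduced_form / first_stage

# Tiny made-up sample: four lottery winners followed by four losers.
z = [1, 1, 1, 1, 0, 0, 0, 0]                              # offered a seat?
d = [1, 1, 1, 0, 0, 0, 0, 1]                              # attended?
y = [0.6, 0.4, 0.5, -0.3, -0.4, -0.2, -0.3, 0.5]          # standardized scores
phi, rho, lam = wald_estimate(z, d, y)
print(round(phi, 2), round(rho, 2), round(lam, 2))

# The same ratio using the estimates reported in the note to Figure 3.2.
kipp_late = 0.355 / 0.741
print(round(kipp_late, 2))  # 0.48
```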

We can estimate λ by replacing the four population expectations on the
right-hand side of equation (3.2) with the corresponding sample
averages, an estimator masters call IV. In practice, however, we usually
opt for a method known as two-stage least squares (2SLS), detailed in
Section 3.3 below. 2SLS implements the same idea, with added
flexibility. Either way, the fact that parameters are estimated using
samples requires us to quantify their sampling variance with the
appropriate standard errors. It won’t surprise you to learn that there’s
a formula for IV standard errors and that your econometric software
knows it. Problem solved!
A more interesting question concerns the interpretation of λ: just who
is LATE for charter school, you might ask. Children probably differ in
the extent to which they benefit from KIPP. For some, perhaps those with
a supportive family environment, the choice of KIPP Lynn or a Lynn
public school matters little; the causal effect of KIPP attendance on
such applicants is 0. For others, KIPP attendance may matter greatly.
LATE is an average of these different individual causal effects.
Specifically, LATE is the average causal effect for children whose KIPP
enrollment status is determined solely by the KIPP lottery.
The biblical story of Passover explains that there are four types of
children, and so it is with children today. We’ll start with the first
three types: Applicants like Alvaro are dying to go to KIPP; if they
lose the lottery, their mothers get them into KIPP anyway. Applicants
like Camila are happy to go to KIPP if they win, but stoically accept
the verdict if they lose. Finally, applicants like Normando worry about
long days and lots of homework. Normando doesn’t really want to go to
KIPP and refuses to do so when hearing that he has won a seat. Normando
is called a never-taker, because his choice of school is unaffected by
the lottery (it’s the social worker who put his name in the hat). At the
other end of KIPP kommitment, Alvaro is called an always-taker. He’ll
happily take a seat when offered, while his mother finds a way to make
it happen for him even when he loses, perhaps by falsely claiming a
sibling among the winners. For Alvaro, too, choice of school is
unaffected by the lottery.
Camila attends KIPP when she wins the lottery but will regretfully take
a seat in her neighborhood school if she loses (Camila’s foster mother
has her hands full; she wants the best for her daughter, but plays the
hand she’s dealt). Camila is the type of applicant who gives IV its
power, because the instrument changes her treatment status. When her
Zi = 0, Camila’s Di = 0; and when her Zi = 1, Camila’s Di = 1. IV
strategies depend on applicants like Camila, who are called compliers, a
group we indicate with the dummy variable, Ci. The term “compliers”
comes from the world of randomized trials. In many randomized trials,
such as those used to evaluate new drugs, the decision to comply with a
randomized treatment assignment remains voluntary and nonrandom
(experimental subjects who are randomly offered treatment may decline
it, for example). Compliers in such trials are those who take treatment
when randomly offered treatment but not otherwise. With lottery
instruments, LATE is the average causal effect of KIPP attendance on
Camila and other compliers who enroll at KIPP if and only if they win
the lottery. IV methods are uninformative for always-takers like Alvaro
and never-takers like Normando, because the instrument is unrelated to
their treatment status.

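TABLE 3.2
The four types of children

                                  Lottery losers (Zi = 0)
                                  Doesn’t attend KIPP    Attends KIPP
                                  (Di = 0)               (Di = 1)
Lottery winners (Zi = 1)
  Doesn’t attend KIPP (Di = 0)    Never-takers           Defiers
                                  (Normando)
  Attends KIPP (Di = 1)           Compliers              Always-takers
                                  (Camila)               (Alvaro)

Note: KIPP = Knowledge Is Power Program.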

Table 3.2 classifies children like Alvaro, Normando, and Camila, as well
as a fourth type, called defiers. The columns indicate attendance
choices made when Zi = 0; rows indicate choices made when Zi = 1. The
table covers all possible scenarios for every applicant, not only those
we observe (for example, for applicants who won an offer, the table
describes what they would have done had they lost). Never-takers like
Normando and always-takers like Alvaro appear on the main diagonal. Win
or lose, their choice of school is unchanged. At the bottom left, Camila
complies with her lottery offer, attending KIPP if and only if she wins.
The first stage, E[Di|Zi = 1] − E[Di|Zi = 0], is driven by such
applicants, and LATE reflects average treatment effects in this group.
The defiers in Table 3.2 are those who enroll in KIPP only when not
offered a seat in the lottery. The Bible refers to such rebels as
“wicked,” but we make no moral judgments. We note, however, that such
perverse behavior makes IV estimates hard to interpret. With defiers as
well as compliers in the data, the average effect of a KIPP offer might
be 0 even if everyone benefits from KIPP attendance. Luckily, defiant
behavior is unlikely in charter lotteries and many other IV settings. We
therefore assume defiant behavior is rare to nonexistent. This
no-defiers assumption is called monotonicity, meaning that the
instrument pushes affected applicants in one direction only.
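A short worked example may help show why defiers cause trouble. Assuming independence and the exclusion restriction hold, an offer changes outcomes only for those whose attendance it changes: compliers gain their treatment effect, while defiers lose theirs. The population shares and effect sizes below are hypothetical, chosen only to make the arithmetic transparent.

```latex
% Reduced form when both compliers and defiers are present
\rho = P(\text{complier}) \, E[Y_{1i} - Y_{0i} \mid \text{complier}]
     - P(\text{defier}) \, E[Y_{1i} - Y_{0i} \mid \text{defier}]

% Hypothetical numbers: 30% compliers with effect .2\sigma,
% 10% defiers with effect .6\sigma
\rho = (.3)(.2\sigma) - (.1)(.6\sigma) = .06\sigma - .06\sigma = 0

% First stage: compliers push attendance up, defiers push it down
\phi = .3 - .1 = .2, \qquad \lambda = \rho / \phi = 0
```

In this made-up case everyone benefits from attendance, yet the offer effect and the IV ratio are both 0. Ruling out defiers removes the negative term, leaving a reduced form driven by compliers alone.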
We’ve argued that instrumental variables can be understood as initiating
a causal chain in which an instrument, Zi, changes the variable of
interest, Di, in turn affecting outcomes, Yi. The notion of a complier
population tied to each instrument plays a key role in our
interpretation of this chain reaction. The LATE theorem says that for
any randomly assigned instrument with a nonzero first stage, satisfying
both monotonicity and an exclusion restriction, the ratio of reduced
form to first stage is LATE, the average causal effect of treatment on
compliers.6 Recall (from Section 1.1) that Y1i denotes the outcome for i
with the treatment switched on, while Y0i is the outcome for the same
person with treatment switched off. Using this notation and the
parameters defined above, LATE can be written:

$$\lambda = \frac{\rho}{\phi} = E[Y_{1i} - Y_{0i} \mid C_i = 1].$$

Without stronger assumptions, such as a constant causal effect for
everybody (this is the model described by equation (1.3) in Chapter 1),
LATE needn’t describe causal effects on never-takers and always-takers.
It shouldn’t surprise you that an instrumental variable is not
necessarily helpful for learning about effects on people whose treatment
status cannot be changed by manipulating the instrument. The good news
here is that the population of compliers is a group we’d like to learn
about. In the KIPP example, compliers are children likely to attend KIPP
were the network to expand and offer additional seats in a lottery,
perhaps as a consequence of opening a new school in the same area. In
Massachusetts, where the number of charter seats is capped by law, the
consequences of charter expansion is the education policy question of
the day.
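The LATE theorem can be checked in a simulated lottery where we, unlike real researchers, get to see each applicant's type. The sketch below is an assumption-laden illustration: the shares of compliers, always-takers, and never-takers, the effect sizes, and the function names are all invented, and there are no defiers by construction, so monotonicity holds.

```python
import random

def simulate_lottery(n, seed=0):
    """Simulate (offer, attendance, score) rows for a lottery with no defiers."""
    rng = random.Random(seed)
    # Hypothetical causal effects of attendance by type; the never-taker
    # value never shows up in the data because never-takers are never treated.
    effects = {"complier": 0.5, "always-taker": 1.0, "never-taker": 0.2}
    types = list(effects)
    rows = []
    for _ in range(n):
        kind = rng.choices(types, weights=[0.70, 0.05, 0.25])[0]
        z = 1 if rng.random() < 0.5 else 0                       # random offer
        d = 1 if kind == "always-taker" or (kind == "complier" and z == 1) else 0
        y0 = rng.gauss(0, 1)                                     # score without treatment
        rows.append((z, d, y0 + effects[kind] * d))
    return rows

def late_from_rows(rows):
    """Ratio of reduced form to first stage, using sample means by offer status."""
    won = [(d, y) for z, d, y in rows if z == 1]
    lost = [(d, y) for z, d, y in rows if z == 0]
    first_stage = sum(d for d, _ in won) / len(won) - sum(d for d, _ in lost) / len(lost)
    reduced_form = sum(y for _, y in won) / len(won) - sum(y for _, y in lost) / len(lost)
    return first_stage, reduced_form / first_stage

rows = simulate_lottery(100_000)
first_stage, late = late_from_rows(rows)
print("first stage:", round(first_stage, 2), "IV estimate:", round(late, 2))
```

The first stage should land near the complier share (.70 here), and the IV estimate near the complier effect (.5 here), not the larger effect assigned to always-takers.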
Researchers and policymakers are sometimes interested in average causal
effects for the entire treated population, as well as in LATE. This
average causal effect is called the treatment effect on the treated (TOT
for short). TOT is written E[Y1i − Y0i|Di = 1]. As a rule, there are two
ways to be treated, that is, to have Di switched on. One is to be
treated regardless of whether the instrument is switched off or on. As
we’ve discussed, this is the story of Alvaro, an always-taker. The
remainder of the treated population consists of compliers who were
randomly assigned Zi = 1. In the KIPP study, the treated sample includes
compliers who were offered a seat (like Camila) and always-takers (like
Alvaro) who attend KIPP no matter what. The population of compliers who
were randomly offered a seat is representative of the population of all
compliers (including compliers who lose the lottery and go to public
schools), but effects on always-takers need not be the same as effects
on compliers. We might imagine, for example, that Alvaro is an
always-taker because his mother senses that KIPP will change his life.
The causal effect he experiences is therefore larger than that
experienced by less-committed treated applicants, that is, by treated
compliers.
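The relationship between TOT and LATE described here can be written as a weighted average. The decomposition below assumes monotonicity (no defiers) and uses the fact that treated compliers are representative of all compliers; the symbol $\pi_A$ and the numbers plugged in are hypothetical, introduced only for illustration.

```latex
% \pi_A: share of always-takers among the treated (hypothetical notation)
E[Y_{1i} - Y_{0i} \mid D_i = 1]
  = \pi_A \, E[Y_{1i} - Y_{0i} \mid \text{always-taker}]
  + (1 - \pi_A) \, E[Y_{1i} - Y_{0i} \mid C_i = 1]

% Hypothetical example: \pi_A = .1, always-taker effect .8\sigma, LATE .48\sigma
\text{TOT} = (.1)(.8\sigma) + (.9)(.48\sigma) = .08\sigma + .432\sigma = .512\sigma
```

When always-takers are a small share of the treated, or when their effects resemble those of compliers, TOT and LATE will be close; otherwise they can differ noticeably.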
Because the treated population includes always-takers, LATE and TOT are
usually not the same. Moreover, neither of these average causal effects
need be the same over time or in different settings (such as at charter
schools with fewer minority applicants). The question of whether a
particular causal estimate has predictive value for times, places, and
people beyond those represented in the study that produced it is called
external validity. When assessing external validity, masters must ask
themselves why a particular LATE estimate is big or small. It seems
likely, for example, that KIPP boosts achievement because the KIPP
recipe provides a structured educational environment in which many
children, but perhaps not all, find it easy to learn. Children who are
especially bright and independent might not thrive at KIPP. To explore
the external validity of a particular LATE, we can use a single
instrument to look at estimates for different types of students, say,
those with higher or lower baseline scores. We can also look for
additional instruments that affect different sorts of compliers, a theme
taken up in Section 3.3. As with estimates from randomized trials, the
best evidence for the external validity of IV estimates comes from
comparisons of LATEs for the same or similar treatments across different
populations.

3.2 Abuse Busters

The police were called to O. J. Simpson’s Los Angeles mansion at least
nine times over the course of his marriage to Nicole Brown Simpson. But
the former National Football League superstar, nicknamed “The Juice,”
was arrested only once, in 1989, when he pleaded no contest to a charge
of spousal abuse in an episode that put Nicole in the hospital. Simpson
paid a small fine, did token community service, and was ordered to seek
counseling from the psychiatrist of his choice. The prosecutor in the
1989 case, Robert Pingle, noted that Nicole had not been very
cooperative with authorities in the aftermath of her severe beating.
Five years later, Nicole Brown Simpson and her companion Ronald Goldman
were murdered by an unknown intruder whom many believe was Nicole’s
ex-husband, O. J.7

How should police respond to domestic violence? Like Nicole Brown
Simpson, abuse victims are often reluctant to press charges. Arresting
batterers without victim cooperation may be pointless and could serve to
aggravate an already bad situation. To many observers and not a few
police officers, social service agencies seem best equipped to respond
to domestic violence. At the same time, victim advocates worry that the
failure to arrest batterers signals social tolerance for violent acts
that, if observed between strangers, would likely provoke a vigorous law
enforcement response.
In the wake of a heated policy debate, the mayor and police chief of
Minneapolis embarked on a pathbreaking experiment in the early 1980s.
The Minneapolis Domestic Violence Experiment (MDVE) was designed to
assess the value of arresting batterers.8 The MDVE research design
incorporated three treatments: arrest, ordering the suspected offender
off the premises for 8 hours (separation), and a counseling intervention
that might include mediation by the officers called to the scene
(advice). The design called for one of these three treatments to be
randomly selected whenever participating Minneapolis police officers
encountered a situation meeting experimental criteria (specifically,
probable cause to believe that a cohabitant or spouse had committed
misdemeanor assault against a partner in the past 4 hours). Cases of
life-threatening or severe injury (that is, felony assault) were
excluded. Both suspect and victim had to be present at the time officers
arrived. The primary outcome examined by the MDVE was the reoccurrence
of a domestic assault at the same address within 6 months of the
original random assignment.
The MDVE randomization device was a pad of report forms randomly color-coded for three possible responses: arrest, separation, and advice. Officers who encountered a situation that met experimental criteria were to act according to the color of the form on top of the pad. The police officers who participated in the experiment had volunteered to take part and were therefore expected to implement the research design. At the same time, everyone involved with the study understood that strict adherence to the randomization protocol was unrealistic and inappropriate.

TABLE 3.3
Assigned and delivered treatments in the MDVE

Notes: This table shows percentages and counts for the distribution of assigned and delivered treatments in the Minneapolis Domestic Violence Experiment (MDVE). The first three columns show row percentages. The last column reports column percentages. The number of cases appears in parentheses.

In practice, officers often deviated from the responses called for by the color of the report form drawn at the time of an incident. In some cases, suspects were arrested even though random assignment called for separation or advice. Most arrests in these cases occurred when a suspect attempted to assault an officer, a victim persistently demanded an arrest, or when both parties were injured. A few deviations arose when officers forgot their report forms. As a result of these deviations from the experimental protocol, treatment delivered was not random. This can be seen in Table 3.3, which tabulates treatments assigned and delivered. Almost every case assigned to arrest resulted in arrest (91 of 92 cases assigned), but many cases assigned to the separation or advice treatments also resulted in arrest.
The contrast between arrest, which usually resulted in a night in jail, and gentler alternatives generates the most interesting and controversial findings in the MDVE. Table 3.3 therefore combines the two nonarrest treatments under the heading “coddled.” Random assignment had a large but not deterministic effect on the likelihood a suspected batterer was coddled: A case assigned to be coddled was coddled with probability .797, while a case not assigned to coddling (that is, assigned to arrest) was coddled with probability .011 (1/92). Because coddling was not delivered randomly, the MDVE looks like a broken experiment. IV methods, however, readily fix it.

When LATE Is the Effect on the Treated
The LATE framework is motivated by an analogy between IV and randomized trials. But some instrumental variables really come from randomized trials. IV methods allow us to capture the causal effect of treatment on the treated in spite of the nonrandom compliance decisions made by participants in experiments like the MDVE. In fact, the use of IV is usually necessary in such experiments. A naive analysis of the MDVE data based on treatment delivered is misleading.

Analysis of the MDVE based on treatment delivered is misleading because the cases in which police officers were supposed to coddle suspected batterers and actually did so are a nonrandom subset of all cases assigned to coddling. Comparisons of those who were and were not coddled are therefore contaminated by selection bias. Batterers who were arrested when assigned to coddling were often especially aggressive or agitated. Use of randomly assigned intention to treat as an instrumental variable for treatment delivered eliminates this source of selection bias.
As always, an IV chain reaction begins with the first stage.[9] The MDVE first stage is the difference between the probability of being coddled when assigned to be coddled and the probability of being coddled when assigned to be arrested. Let Zi indicate assignment to coddling, and let Di indicate incidents where coddling was delivered. The first stage for this setup is

$$E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0] = .797 - .011 = .786,$$

a large gap, but still far from the difference of 1 we’d get if compliance had been perfect.
Unfortunately, domestic abuse is often a repeat offense, as can be seen in the fact that the police were called for a second domestic violence intervention at 18% of the addresses in the MDVE sample. Most importantly from the point of view of MDVE researchers, recidivism was greater among suspects assigned to be coddled than among those assigned to be arrested. We learn this by calculating the effect of random assignment to coddling on an outcome variable, Yi, that indicates at least one post-treatment episode of suspected abuse:

$$E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0] = .114. \qquad (3.3)$$

Given that the overall recidivism rate is 18%, this estimated difference of 11 percentage points is substantial.
In randomized trials with imperfect compliance, where treatment assigned differs from treatment delivered, effects of random assignment such as that calculated in equation (3.3) are called intention-to-treat (ITT) effects. An ITT analysis captures the causal effect of being assigned to treatment. But an ITT analysis ignores the fact that some of those assigned to be coddled were nevertheless arrested. Because the ITT effect does not take this noncompliance into account, it’s too small relative to the average causal effect of coddling on those who were indeed coddled. This problem, however, is easily addressed: ITT effects divided by the difference in compliance rates between treatment and control groups capture the causal effect of coddling on compliers who were coddled as a result of the experiment.
Dividing ITT estimates from a randomized trial by the corresponding difference in compliance rates is another case of IV in action: We recognize ITT as the reduced form for a randomly assigned instrument, specifically, random assignment to coddling. As we’ve seen, many suspected batterers assigned to be coddled were nevertheless arrested. The regression of a dummy for having been coddled on a dummy for random assignment to coddling is the first stage that goes with this reduced form. The IV causal chain begins with random assignment to treatment, runs through treatment delivered, and ultimately affects outcomes.
The LATE estimate that emerges from the MDVE data is impressive: .114/.786 = .145, a large coddling effect, even in comparison with the corresponding ITT estimates. Remarkably, even though officers on the scene were highly selective in choosing whether to follow the experimental protocol, this estimate of LATE is likely to be a good measure of the causal effect of treatment delivered.
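
The ratio behind this estimate can be checked in a few lines of Python. This is only a minimal sketch: the function name `wald_late` is an illustrative choice, and the two inputs are the ITT effect and first stage quoted in the text.

```python
def wald_late(itt_effect, first_stage):
    """Return LATE as the reduced form (ITT) divided by the first stage."""
    if first_stage == 0:
        raise ValueError("first stage is zero, so the ratio is undefined")
    return itt_effect / first_stage


itt_effect = 0.114   # effect of assignment to coddling on recidivism (from the text)
first_stage = 0.786  # effect of assignment to coddling on coddling delivered (from the text)

late = wald_late(itt_effect, first_stage)
print(round(late, 3))
```

Because the first stage is below 1, the ratio is necessarily larger in magnitude than the ITT effect it rescales.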
As always, the causal interpretation of LATE turns in part on the relevant exclusion restriction, which requires that the treatment variable of interest be the only channel through which the instrument affects outcomes. In the MDVE, the IV chain reaction begins with the color of police officers’ incident report forms. The exclusion restriction here requires that randomly assigned form color affect recidivism solely through the decision to arrest or to coddle suspected batterers. This seems like a reasonable assumption, all the more so as batterers and victims were unaware of their participation in an experimental study.
Are the modest complications of an IV analysis really necessary? Suppose we analyze the MDVE using information on treatment delivered, ignoring the nonrandom nature of decisions to comply with random assignment. The resulting analysis compares recidivism among those who were and were not coddled, with no further complications or adjustments:

$$E[Y_i \mid D_i = 1] - E[Y_i \mid D_i = 0].$$

The estimated effect here is quite a bit smaller than the IV estimate of almost 15 percentage points.
Chapter 1 shows that without random assignment, comparisons of treated and untreated subjects equal the causal effect of interest plus selection bias. The selection bias that contaminates a naive analysis of the MDVE is the difference in potential recidivism (that is, in Y0i) between batterers who were and were not coddled. Although much of the variation in coddling was produced by random assignment, officers on the scene also used discretion. Batterers who were arrested even though they’d been randomly assigned to be coddled were often especially violent or agitated, while suspects in cases where officers complied with a coddling assignment were typically more subdued. In other words, batterers who were coddled were less likely to abuse again in any case. The resulting selection bias leads the calculation based on treatment delivered to underestimate the impact of coddling. In contrast with the KIPP study (discussed in Section 3.1), selection bias matters here.
IV analysis of the MDVE eliminates selection bias, capturing average causal effects on compliers (in this case, the effect of coddling batterers in incidents in which officers were willing to comply with random assignment to coddling). An interesting and important feature of the MDVE is the virtually one-sided nature of noncompliance in treatment delivered. When randomized to arrest, the police faithfully arrested (with only one exception in 92 cases). By contrast, more than 20% of those assigned to be coddled were nevertheless arrested.
The asymmetry in coddling compliance means there were almost no always-takers in the MDVE. In our IV analysis of the MDVE, always-takers are suspected batterers who were coddled without regard to treatment assigned. The size of this group is given by the probability of coddling when assigned to arrest, in this case, only 1/92. As we noted in Section 3.1, any treated population is the union of two groups, the set of compliers randomly assigned to be treated and the set of always-takers. With no always-takers, all of the treated are compliers, in which case, LATE is TOT:

$$\lambda = \frac{E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0]}{E[D_i \mid Z_i = 1] - E[D_i \mid Z_i = 0]} = E[Y_{1i} - Y_{0i} \mid D_i = 1].$$

Applying the no-always-takers property to the MDVE, we see that LATE is the average causal effect of coddling on the coddled. Specifically, the TOT estimate emerging from the MDVE contrasts recidivism among the coddled (E[Y1i|Di = 1]) with the rates we would observe in a counterfactual world in which coddled batterers were arrested instead (E[Y0i|Di = 1]). This important simplification of the usual LATE story emerges in any IV analysis with no always-takers, including many other randomized trials with one-sided noncompliance. When some of those randomly assigned to treatment go untreated, but no one randomly assigned to the control group gets treated, IV methods using random intention to treat as an instrument for treatment delivered capture TOT.[10]
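
A small worked example may help fix both ideas: that IV recovers TOT under one-sided noncompliance, and that a comparison by treatment delivered is biased. The sketch below uses an entirely hypothetical population (the shares and recidivism rates are invented for illustration and are not MDVE data) in which cases that officers would never coddle are also the most likely to reoffend.

```python
# Hypothetical shares and recidivism rates, chosen for illustration (not MDVE data).
# Each entry: (population share, coddled when assigned to coddling?,
#              recidivism rate if arrested, recidivism rate if coddled)
TYPES = {
    "complier": (0.8, True, 0.10, 0.25),
    "never_taker": (0.2, False, 0.30, 0.30),  # arrested no matter what
}


def cells():
    """List (weight, z, d, mean outcome) for each type under each assignment."""
    out = []
    for share, complies, y_arrest, y_coddle in TYPES.values():
        for z in (0, 1):  # assignment is random: half of each type gets z = 1
            d = 1 if (z == 1 and complies) else 0
            out.append((0.5 * share, z, d, y_coddle if d else y_arrest))
    return out


def cond_mean(value_index, key_index, key_value):
    """Weighted mean of one column among cells where another column equals key_value."""
    rows = [c for c in cells() if c[key_index] == key_value]
    return sum(c[0] * c[value_index] for c in rows) / sum(c[0] for c in rows)


Z, D, Y = 1, 2, 3  # column positions in each cell tuple (position 0 is the weight)

itt = cond_mean(Y, Z, 1) - cond_mean(Y, Z, 0)
first_stage = cond_mean(D, Z, 1) - cond_mean(D, Z, 0)
late = itt / first_stage
naive = cond_mean(Y, D, 1) - cond_mean(Y, D, 0)
tot = TYPES["complier"][3] - TYPES["complier"][2]  # all treated cases are compliers

print(f"ITT={itt:.3f} first stage={first_stage:.3f} LATE={late:.3f}")
print(f"TOT={tot:.3f} naive={naive:.3f}")
```

In this invented population there are no always-takers, so the IV ratio equals the effect on the treated, while the comparison by treatment delivered comes out smaller because the untreated group contains the high-risk never-takers.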

A final note on how much good ’metrics matters: It’s hard to overstate the impact of the MDVE on U.S. law enforcement. Batterers in misdemeanor domestic assault cases are now routinely arrested. In many states, arrest in cases of suspected domestic abuse has become mandatory.

GRASSHOPPER: Master, the O.J. case came a decade after the MDVE. The pathbreaking MDVE research design did not save Nicole Brown and Ron Goldman.
MASTER JOSHWAY: Social change happens slowly, Grasshopper. And the original MDVE analysts reported naive estimates based on treatment delivered, along with intention-to-treat effects. The IV estimates in my 2006 study are much larger.
GRASSHOPPER: Would Nicole and Ron have been saved if earlier analysts had used instrumental variables?
MASTER JOSHWAY: There are some things we can never know.

3.3 The Population Bomb

Population control or race to oblivion?
Paul Ehrlich, 1968

World population increased from 3 billion to 6 billion between 1960 and 1999, a doubling time of 39 years, and about half as long as the time it took to go from 1.5 billion to 3 billion. Only a dozen years passed before the seventh billion came along. But contemporary demographers agree that population growth has slowed dramatically. Projections using current fertility rates point to a doubling time of 100 years or more, perhaps even forever. One widely quoted estimate has population peaking at 9 billion in 2070.[11] Contemporary hand-wringing about sustainable growth notwithstanding, the population bomb has been defused. What a relief!
The question of how population growth affects living standards has both a macro side and a micro side. Macro demography traces its roots to the eighteenth-century English scholar Thomas Malthus, who argued that population size increases when food output increases, so much so that productivity gains fail to boost living standards. The unhappy Malthusian outcome is characterized by a permanent subsistence-level existence for most people. This pessimistic view of economic growth has repeatedly been falsified by history, but that hasn’t prevented it from gaining traction among latter-day doomsayers. Biologist Paul Ehrlich’s 1968 blockbuster The Population Bomb famously argued for a Malthusian scenario featuring imminent mass starvation in India. Since then, India’s population has tripled, while Indian living standards have increased markedly.[12]

Economists have turned a micro lens on the relationship between family size and living standards. Here, attention focuses on the ability of households of different sizes to support a comfortable standard of living. We might indeed expect increases in family size to be associated with increased poverty and reduced education (more mouths to feed means less for each), and that’s what simple correlations show. A more elaborate theoretical rationalization for this powerful relation comes from the work of the late Gary Becker and his collaborators. These studies introduced the notion of a “quantity-quality tradeoff,” the idea that reductions in family size increase parental investment in children. For example, parents with fewer children might guard their children’s health more closely and invest more in their schooling.[13]



On the policy side, the view that smaller families are essential for increasing living standards has motivated international agencies and many governments to promote, and occasionally even to require, smaller families. China led the way with the controversial One Child Policy, implemented in 1979. Other aggressive government-sponsored family planning efforts include a forced-sterilization program in India and the public promotion of family planning in Mexico and Indonesia. By 1990, 85% of people in the developing world lived in countries where the government considered high fertility to be a major force perpetuating poverty.[14]

The negative correlation between average family size and development indicators like schooling is hard to argue with. Is there a causal connection between family size and children’s education? The challenge in answering this question, as always, is the paribus-ness of the ceteris. For the most part, fertility is determined by the choices parents make.[15] Not surprisingly, therefore, women with large families differ in many ways from those with smaller families; they tend to be less educated, for example. And the children of less-educated mothers tend to be less educated themselves. Marked differences in observable characteristics across families of different sizes raise the red flag of selection bias. Since women with different numbers of children are so observably different, we must acknowledge the possibility of important unobserved differences associated with family size as well.
As always, the ideal solution to an omitted variables problem is random assignment. In this case, the experiment might go like this. (i) Draw a sample of families with one child. (ii) In some of these households, randomly distribute an additional child. (iii) Wait 20 years and collect data on the educational attainment of firstborns who did and did not get an extra sibling. Of course, we aren’t likely to see such an experiment any time soon. Clever masters might, however, find sources of variation that reveal the causal connection between family size and schooling without the benefit of a real experiment.



Which brings us to the question of where babies come from. As most of our readers will know, human infants are delivered to households by a long-legged, long-necked bird called a stork (though it’s a myth that the infant is dropped down the chimney; chimneys have a damper that prevents delivery of a live infant). Delivery occurs 9 months after a woman, whom we will refer to as the “mother,” declares her intention to have a child. Storks are unresponsive to the wishes of men (except when these wishes are passed on by women), so we focus here on the notional experiment from the point of view of the mother and her oldest child.
The experiment we have in mind is the addition of children to households that have one already. The first-born child is our experimental subject. The ’metrics challenge is how to generate “as good as randomly assigned” variation in family size for these subjects. Unfortunately, the Association of Stork Midwives rejects random assignment as unnatural. But storks nevertheless generate circumstantially random variation in family size by sometimes delivering more than one child in the form of twins (a consequence of the fact that storks are large and infants are small, so storks sometimes scoop multiples when picking babies in the infant storage warehouse). The fact that twins induce a family size experiment was first recognized in a pioneering study by Mark Rosenzweig and Kenneth Wolpin, who used a small sample of twins to investigate the quantity-quality trade-off in India.[16]

To exploit the twins experiment, we turn to a large sample from Israel, analyzed in a study of the quantity-quality tradeoff by Master Joshway, with colleagues Victor Lavy and Analia Schlosser (the “ALS study” for short).[17] Israel makes for an interesting case study because it has a very diverse population, including many people who were born in developing countries and into large families. About half of the Israeli Jewish population is of European ancestry, while the other half has roots in Asia or Africa. Quite a few Arabs live in Israel as well, but the data for Israeli non-Jews are less complete than for Jews. An attractive feature of the Israeli Jewish sample, besides ethnic diversity and larger families than are found in most developed countries, is the availability of information on respondents’ families of origin, including the age and sex of their siblings. This unusual data structure is the foundation of the ALS empirical strategy.
We focus here on a group of first-born adults in a random sample of men and women born to mothers with at least two children. These firstborns have at least one younger sibling, but many have two or more. Consider a family in which the second birth is a singleton. On average, such families include 3.6 children. A second twin birth, however, increases average family size by .32, that is, by about one-third of a child. Why do twin births increase family size by a Solomonic fractional child? Many Israeli parents would like three or four children; their family size is largely unaffected by the occurrence of a multiple twin birth, since they were going to have more than two children either way. On the other hand, some families are happy with only two children. The latter group is forced to increase family size from two to three when the stork delivers twins. The one-third-of-a-child twins differential in family size reflects a difference in probabilities: the likelihood of having a third child increases from about .7 with a singleton second birth to a certainty when the second birth is multiple. The .3 figure comes from the fact that the difference between a probability of 1 and probability of .7 is .3.



A simple regression of adult firstborns’ highest grade completed on family size shows that each extra sibling is associated with a reduction of about one-quarter of a year of schooling (these results come from a model with age and sex controls). On the other hand, as the ALS study shows, even though first-born adults with second-born twin siblings were raised in larger families, they are no less educated than first-born adults in families where the second-born child was a singleton. The comparison of schooling between firstborns with twin and singleton siblings constitutes the reduced form for an IV estimate that uses twin births as an instrument for family size.
IV estimates are constructed from the ratio of reduced-form to first-stage estimates, so a reduced form of zero immediately suggests the causal effect of sibship size is also zero. The fact that the twins reduced-form and associated IV estimates are close to zero weighs against the view that a larger family of origin reduces children’s schooling. In other words, the twins experiment generates no evidence of a quantity-quality tradeoff.
Multiple births have a marked effect on family size, but the twins experiment isn’t perfect. Because the Association of Stork Midwives refuses to use random assignment, there’s some imbalance in the incidence of twinning. Multiple births are more frequent among mothers who are older and for women in some racial and ethnic groups. This potentially leads to omitted variables bias in our analysis of the twins experiment, especially if some of the characteristics that boost twinning are hard to observe and control for.[18] Luckily, a second fertility experiment provides evidence on the quantity-quality trade-off.
In many countries, fertility is affected by sibling sex composition. For one thing, parents often hope for a son; son preference is particularly strong in parts of Asia. In Europe, the Americas, and Israel, parents seem to care little about whether children are male or female. Rather, many parents hope for a diversified sibling-sex portfolio: Families whose first two children are both boys or both girls are more likely to have a third child. Because the sex of a newborn is essentially randomly assigned (male births occur about half the time and, in the absence of sex-selective abortion, little can be done to change this), parental preferences for mixed sibling-sex composition generate sex-mix instruments.
First-born Israeli adults who have a second-born sibling of the opposite sex grew up in households with about 3.60 children. But firstborns whose second-born sibling is of the same sex were raised in families with 3.68 children. In other words, the same-sex first stage for Israeli firstborns is about .08. As with the twins first stage, this differential reflects changes in the probability of childbearing induced by an instrument. In this case, the instrumental variable is a dummy variable that equals 1 for families whose first two children are both male or both female and equals 0 for families with one boy and one girl. While the sex-mix first stage is smaller than that arising from twinning, the number of families affected by same-sex sibships is much larger than the number of families affected by twinning. About half of all families with at least two children have either two boys or two girls at births number one and number two. By contrast, only about 1% of mothers have twins. Sibling sex composition also has a leg up on twinning in being unrelated to maternal characteristics, such as age at birth and race (as shown by ALS and in an earlier study by Master Joshway and William Evans).[19]

As it turns out, the educational attainment of first-born Israeli adults is unaffected by their siblings’ sex composition. For example, the average highest grade completed by firstborns from families with mixed- and same-sex sibships is about equal at 12.6. Thus, the same-sex reduced form, and therefore the corresponding IV estimates, are both zero. Like the twins experiment, fertility changes generated by differences in sibling sex composition show no evidence of a quantity-quality trade-off.
The exclusion restriction required for a causal interpretation of sex-mix IV estimates asserts that sibling sex composition matters for adult outcomes only insofar as it changes family size. Might the sex-mix of the first two children affect children’s educational outcomes for other reasons? Two boys or two girls are likely to share a bedroom longer than mixed-sex siblings, for example, and same-sex siblings may make better use of hand-me-down clothing. Such household efficiencies might make families with a same-sex sibship feel a little richer, a feeling that may ultimately increase parental investment in their children’s schooling.
Can we test the exclusion restriction? Not directly, but, as is often the case, evidence can be brought to bear on the question. For some mothers, sex composition is unlikely to affect fertility. For example, in an Israeli sample, religious women who plan to have three or more children are always-takers for sex-mix instruments. On the other hand, highly educated women, most of whom plan small families, are never-takers if their fertility behavior is unchanged by sex mix. Because the fertility of always-takers and never-takers is unchanged by sibling sex composition, any relationship between sex-mix instruments and outcomes in samples with few compliers may signal violations of the underlying exclusion restriction.
We can express this idea more formally using the representation of LATE in equation (3.2). This expression defines LATE as the ratio of reduced-form to first-stage parameters, that is:

$$\lambda = \frac{\rho}{\phi},$$

which implies in turn that the reduced form, ρ, is the product of the first stage and LATE:

$$\rho = \phi \lambda.$$

From this we conclude that in samples where the first stage, ϕ, is zero, the reduced form should be zero as well. On the other hand, a statistically significant reduced-form estimate with no evidence of a corresponding first stage is cause for worry, because this suggests some channel other than the treatment variable (in this case, family size) links instruments with outcomes. In this spirit, ALS identified demographic groups for which the effect of twins and sex-composition instruments on family size is small and not significantly different from zero. These “no-first-stage samples” generate no evidence of significant reduced-form effects that might signal violations of the exclusion restriction.
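
The logic of this check can be sketched as a simple rule applied to subsample estimates. The function names, the 1.96 cutoff, and the example numbers below are all assumptions made for illustration; they are not taken from the ALS study.

```python
def t_ratio(estimate, std_error):
    """Ratio of an estimate to its standard error."""
    return estimate / std_error


def exclusion_red_flag(first_stage, fs_se, reduced_form, rf_se, critical=1.96):
    """Flag a significant reduced form that has no significant first stage."""
    first_stage_present = abs(t_ratio(first_stage, fs_se)) >= critical
    reduced_form_present = abs(t_ratio(reduced_form, rf_se)) >= critical
    return reduced_form_present and not first_stage_present


# Hypothetical subsample estimates, invented for illustration only.
quiet_sample = exclusion_red_flag(first_stage=0.01, fs_se=0.02,
                                  reduced_form=0.03, rf_se=0.05)
worrying_sample = exclusion_red_flag(first_stage=0.01, fs_se=0.02,
                                     reduced_form=0.20, rf_se=0.05)
print(quiet_sample, worrying_sample)
```

A flag raised by this rule is a reason for worry rather than proof of a violation, and the absence of a flag does not verify the exclusion restriction.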

One-Stop Shopping with Two-Stage Least Squares

IV estimates of causal effects boil down to reduced-form comparisons across groups defined by the instrument, scaled by the appropriate first stage. This is a universal IV principle, but the details vary across applications. The quantity-quality scenario differs from the KIPP story in that we have more than one instrument for the same underlying causal relation. Assuming that twins and sex-mix instruments both satisfy the required assumptions and capture similar average causal effects, we’d like to combine the two IV estimates they generate to increase statistical precision. At the same time, twinning might be correlated with maternal characteristics like age at birth and ethnicity, leading to bias in twins IV estimates. We’d therefore like a simple IV procedure that controls for maternal age and any other confounding factors. This suggests a payoff to integrating the IV idea with the regression methods discussed in Chapter 2.
Two-stage least squares (2SLS) generalizes IV in two ways. First, 2SLS estimates use multiple instruments efficiently. Second, 2SLS estimates control for covariates, thereby mitigating OVB from imperfect instruments. To see how 2SLS works, it helps to rewrite the first stage (ϕ) and reduced form (ρ) parameters as regression coefficients instead of differences in means. Starting with a single instrument, say, a dummy variable for multiple second births denoted by Zi, the reduced-form effect can be written as the coefficient ρ in the regression equation:

$$Y_i = \alpha_0 + \rho Z_i + e_{0i}. \qquad (3.4)$$

As we noted in the appendix to Chapter 2, regression on a constant term and a single dummy variable produces the difference in the conditional means of the dependent variable with the dummy switched off and on. The coefficient on Zi in equation (3.4) is therefore

$$\rho = E[Y_i \mid Z_i = 1] - E[Y_i \mid Z_i = 0].$$

Likewise, the first-stage effect of Zi is the coefficient ϕ in the first-stage equation:

$$D_i = \alpha_1 + \phi Z_i + e_{1i}, \qquad (3.5)$$

where ϕ = E[Di|Zi = 1] − E[Di|Zi = 0]. Since λ = ρ/ϕ, we conclude that LATE is the ratio of the slope coefficients in regressions (3.4) and (3.5).
The 2SLS procedure offers an alternative way of computing ρ/ϕ. The 2SLS name comes from the fact that LATE can be obtained from a sequence of two regressions. In the 2SLS first stage, we estimate equation (3.5) and save the fitted values, D̂i. These “first-stage fits” are defined as

$$\hat{D}_i = \alpha_1 + \phi Z_i.$$

The 2SLS second stage regresses Yi on D̂i, as in

$$Y_i = \alpha_2 + \lambda_{2SLS} \hat{D}_i + e_{2i}.$$

The value of λ2SLS generated by this second step is identical to the ratio of reduced form to first-stage regression coefficients, ρ/ϕ, a theoretical relationship derived in the chapter appendix.
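
The equivalence between the two-step procedure and the ratio ρ/ϕ is easy to confirm numerically. The following is a minimal sketch on made-up data; the helper `ols` and the eight observations are illustrative assumptions, not anything from the ALS sample.

```python
def ols(x, y):
    """Intercept and slope from a regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data for illustration: instrument z, treatment d, outcome y.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 1, 0, 1, 1, 0, 1]
y = [3, 5, 6, 4, 8, 7, 5, 9]

_, rho = ols(z, y)                       # reduced form
alpha1, phi = ols(z, d)                  # first stage
d_hat = [alpha1 + phi * zi for zi in z]  # first-stage fits
_, lam_2sls = ols(d_hat, y)              # second stage on the fitted values

print(rho, phi, rho / phi, lam_2sls)
```

With a single dummy instrument, the fitted values take only two distinct values, so the second-stage slope is the difference in mean outcomes divided by the difference in mean treatment, which is the ratio of reduced form to first stage.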
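Control variables like maternal age fit neatly into this two-step regression framework.[20] Adding maternal age, denoted Ai, the reduced form and first stage look like

$$Y_i = \alpha_0 + \rho Z_i + \gamma_0 A_i + e_{0i},$$
$$D_i = \alpha_1 + \phi Z_i + \gamma_1 A_i + e_{1i}.$$

Here, the first-stage fitted values come from models that include the control variable, Ai:

$$\hat{D}_i = \alpha_1 + \phi Z_i + \gamma_1 A_i.$$

2SLS estimates are again constructed by regressing Yi on both D̂i and Ai. Hence, the 2SLS second-stage equation is

$$Y_i = \alpha_2 + \lambda_{2SLS} \hat{D}_i + \gamma_2 A_i + e_{2i},$$

which also includes Ai.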
The 2SLS setup allows as many control variables as you like, provided they appear in both the first and second stages. As discussed in the chapter appendix, the corresponding covariate-adjusted LATE can still be constructed from the ratio of reduced-form to first-stage coefficients, ρ/ϕ. Indeed, we should separately inspect the upstairs and downstairs in this ratio to make sure all on both floors is kosher. But when it comes time to report results to the public, 2SLS is the way to go even in relatively simple scenarios like this one. Econometrics software packages compute 2SLS estimates directly, reducing the scope for mistakes and generating appropriate standard errors at no extra charge.[21]
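
The covariate-adjusted version of the same equivalence can be sketched as follows. The data are invented for illustration, and the small `ols` solver (normal equations with Gauss-Jordan elimination) is an assumption of this sketch that stands in for whatever regression routine you prefer.

```python
def ols(rows, y):
    """OLS coefficients of y on the columns in rows (put a 1 in each row for the constant)."""
    k = len(rows[0])
    # Augmented normal equations [X'X | X'y], solved by Gauss-Jordan elimination.
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):
        pivot_row = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


# Made-up data for illustration: twins dummy z, maternal age a,
# family size d, and schooling y.
z = [0, 0, 0, 0, 1, 1, 1, 1]
a = [25, 30, 35, 28, 27, 33, 38, 31]
d = [2, 3, 2, 4, 3, 4, 3, 5]
y = [12, 11, 13, 10, 12, 11, 12, 10]

exog = [[1, zi, ai] for zi, ai in zip(z, a)]
_, rho, _ = ols(exog, y)            # reduced form: coefficient on z
alpha1, phi, gamma1 = ols(exog, d)  # first stage: coefficient on z

d_hat = [alpha1 + phi * zi + gamma1 * ai for zi, ai in zip(z, a)]
second = [[1, dh, ai] for dh, ai in zip(d_hat, a)]
_, lam_2sls, _ = ols(second, y)     # second stage: coefficient on fitted d

print(rho / phi, lam_2sls)
```

The two printed numbers agree up to floating-point error, because the control appears in the reduced form, the first stage, and the second stage alike.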

What about our second family-size instrument, a dummy for same-sex sibships? Call this Wi (where Wi = 1 indicates two girls or two boys, and Wi = 0 otherwise). Here, too, control variables are called for, in particular, the sex of the first-born, which we code as a dummy, Bi, indicating first-born boys (as a rule, boys are born slightly more often than girls, so the probability of a same-sex pair is slightly higher when the firstborn is male). With two instruments, Wi and Zi, and the extra control variable, Bi, the 2SLS first stage becomes

$$D_i = \alpha_1 + \phi_t Z_i + \phi_s W_i + \gamma_1 A_i + \delta_1 B_i + e_{1i}. \qquad (3.10)$$
The first-stage effects of the twins and sex-mix instruments are distinguished by subscripts t for twins and s for sex-mix: we write these as ϕt and ϕs. Both instruments appear with similarly subscripted coefficients in the corresponding reduced form as well:

$$Y_i = \alpha_0 + \rho_t Z_i + \rho_s W_i + \gamma_0 A_i + \delta_0 B_i + e_{0i}.$$

With these ingredients at hand, it’s time to cook!
Second-stage estimates with two instruments and two covariates are generated by the regression equation

$$Y_i = \alpha_2 + \lambda_{2SLS} \hat{D}_i + \gamma_2 A_i + \delta_2 B_i + e_{2i}, \qquad (3.11)$$

where the fitted values, D̂i, come from first-stage equation (3.10). Note that the covariates appear at every turn: in the first and second stages, and in the reduced form. Equation (3.11) produces a weighted average of the estimates we’d get using the instruments Zi and Wi one at a time, while controlling for covariates Ai and Bi. When the instruments generate similar results when used one at a time, the 2SLS weighted average is typically a more precise estimate of this common causal effect.
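
A sketch of the pooled procedure, with two instruments and two covariates, is given below. Everything in it is an assumption for illustration: the ten made-up observations, the helper names `ols` and `two_sls`, and the choice to solve the normal equations directly. It computes the estimates one instrument at a time and pooled, and then checks a defining property of the 2SLS solution: residuals built from the actual treatment are uncorrelated with the first-stage fits.

```python
def ols(rows, y):
    """OLS coefficients of y on the columns in rows."""
    k = len(rows[0])
    m = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
         + [sum(r[i] * yi for r, yi in zip(rows, y))] for i in range(k)]
    for col in range(k):  # Gauss-Jordan elimination with partial pivoting
        pivot_row = max(range(col, k), key=lambda r: abs(m[r][col]))
        m[col], m[pivot_row] = m[pivot_row], m[col]
        pivot = m[col][col]
        m[col] = [v / pivot for v in m[col]]
        for r in range(k):
            if r != col:
                factor = m[r][col]
                m[r] = [vr - factor * vc for vr, vc in zip(m[r], m[col])]
    return [m[i][k] for i in range(k)]


# Made-up data: twins z, same-sex w, maternal age a, first-born boy b,
# family size d, schooling y.
z = [0, 0, 0, 0, 0, 0, 1, 1, 0, 0]
w = [0, 1, 0, 1, 1, 0, 0, 1, 1, 0]
a = [25, 30, 35, 28, 27, 33, 38, 31, 29, 26]
b = [1, 0, 1, 1, 0, 0, 1, 0, 1, 0]
d = [2, 3, 2, 4, 3, 2, 4, 5, 3, 2]
y = [12, 11, 13, 10, 12, 13, 12, 10, 11, 12]
controls = [[1, ai, bi] for ai, bi in zip(a, b)]


def two_sls(instruments):
    """Manual 2SLS: return second-stage coefficients and first-stage fits."""
    first_x = [list(inst) + ctrl for inst, ctrl in zip(zip(*instruments), controls)]
    fs = ols(first_x, d)
    d_hat = [sum(c * v for c, v in zip(fs, row)) for row in first_x]
    second_x = [[dh] + ctrl for dh, ctrl in zip(d_hat, controls)]
    return ols(second_x, y), d_hat


twins_only, _ = two_sls([z])
sex_mix_only, _ = two_sls([w])
pooled, d_hat = two_sls([z, w])

lam, const, g_a, g_b = pooled
resid = [yi - lam * di - const - g_a * ai - g_b * bi
         for yi, di, ai, bi in zip(y, d, a, b)]
orthogonality = sum(dh * e for dh, e in zip(d_hat, resid))
print(twins_only[0], sex_mix_only[0], pooled[0], orthogonality)
```

With so few invented observations the three estimates need not be close to one another; the point is only the mechanics of pooling.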

TABLE 3.4
Quantity-quality first stages

Notes: This table reports coefficients from a regression of the number of children on instruments and covariates. The sample size is 89,445. Standard errors are reported in parentheses.

2SLS offers a wonderfully flexible framework for IV estimation. In addition to incorporating control variables and using multiple instruments efficiently, the framework accommodates instruments of all shapes and sizes, not just dummy variables. In practice, however, masters use special-purpose statistical software to calculate 2SLS estimates instead of estimating regressions on fitted values like (3.11). Estimation of this equation, known as “manual 2SLS,” doesn’t produce the correct standard errors needed to measure sampling variance. The chapter appendix explains why.
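
One way to see the problem, sketched here on made-up data, is to compare the residuals a manual second stage produces with the residuals built from the actual treatment variable, which are the ones 2SLS standard errors are based on. The data and the helper `ols` are illustrative assumptions.

```python
def ols(x, y):
    """Intercept and slope from a regression of y on a constant and x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return mean_y - slope * mean_x, slope


# Made-up data for illustration: instrument z, treatment d, outcome y.
z = [0, 0, 0, 0, 1, 1, 1, 1]
d = [0, 0, 1, 0, 1, 1, 0, 1]
y = [3, 5, 6, 4, 8, 7, 5, 9]

alpha1, phi = ols(z, d)
d_hat = [alpha1 + phi * zi for zi in z]
alpha2, lam = ols(d_hat, y)  # manual second stage

# Residuals the manual second stage reports: built from the fitted values.
manual_resid = [yi - alpha2 - lam * dh for yi, dh in zip(y, d_hat)]
# Residuals based on the actual treatment, as proper 2SLS software uses.
correct_resid = [yi - alpha2 - lam * di for yi, di in zip(y, d)]

ss_manual = sum(e ** 2 for e in manual_resid)
ss_correct = sum(e ** 2 for e in correct_resid)
print(ss_manual, ss_correct)
```

The coefficient is the same either way, but the two residual sums of squares differ, so a standard error computed from the manual second stage rests on the wrong residual variance.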
Estimates of twins and sex-mix first stages with and without covariates appear in Table 3.4. The estimate from a first-stage model with controls, reported in column (2) of the table, shows that first-born Israeli adults whose second-born siblings were twins were raised in families with about .44 more children than those raised in families where the second birth was a singleton. This first-stage estimate is larger than the estimate of .32 computed without controls (reported in column (1)). The OVB formula therefore tells us that twin births are associated with factors that reduce family size, like older maternal age. Adjusting for maternal age and other possible confounding factors boosts the twins first stage. On the other hand, the same-sex first stage of .073 generated by a model with covariates is close to the uncontrolled estimate of .079, since sex mix is essentially unrelated to the included controls (these estimates can be seen in columns (3) and (4)). The fact that the first-born is male also has little effect on the size of his family. This can be seen in the small, marginally significant male coefficients reported in the last row (this is the only covariate coefficient reported in the table, though the presence of other controls is indicated in the bottom row).[22]
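
The OVB reasoning applied to the twins first stage can be written out as a short worked equation. The notation is a sketch introduced here, not the text’s own: ϕ^short and ϕ^long denote the twins coefficient without and with controls, and the OVB term stands for the combined product of the controls’ effects on family size and their association with twinning.

```latex
\phi^{short} = \phi^{long} + \text{OVB}
\quad\Longrightarrow\quad
\text{OVB} = .32 - .44 = -.12 < 0
```

A negative OVB term is what we would expect if characteristics that are more common among mothers of twins, such as older maternal age, go along with smaller families.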

Second-stage estimates of the quantity-quality trade-off are reported in Table 3.5, along with the corresponding estimates from a conventional (that is, uninstrumented) OLS regression of the form

$$Y_i = \alpha_3 + \beta D_i + \gamma_3 A_i + \delta_3 B_i + e_{3i}.$$

The conventional regression estimates in column (1) show a strong negative relation between family size and education outcomes, even after adjusting for family background variables related to ethnicity and mother’s age at birth. By contrast, the 2SLS estimates generated by twins instruments, reported in column (2) of the table, mostly go the other way, though the 2SLS estimates in this case are not significantly different from zero. Estimation using sex-composition instruments reinforces the twins findings. The 2SLS estimates in column (3) show uniformly positive effects of family size on education (though only one of these is significantly different from zero).

TABLE 3.5
OLS and 2SLS estimates of the quantity-quality trade-off

Notes: This table reports OLS and 2SLS estimates of the effect of family size on schooling. OLS estimates appear in column (1). Columns (2), (3), and (4) show 2SLS estimates constructed using the instruments indicated in column headings. Sample sizes are 89,445 for rows (1) and (2); 50,561 for row (3); and 50,535 for row (4). Standard errors are reported in parentheses.

An important feature of both the twins and sex-composition second stages is their precision, or lack thereof. IV methods discard all variation in fertility except that generated by the instrument. This can leave too little variation for statistically conclusive findings. We can increase precision, however, by pooling multiple instruments, especially if, when taken one at a time, the instruments generate similar findings (in this case, both twins and sex-composition instruments show little evidence of a quantity-quality trade-off). The resulting pooled first-stage estimates appear in column (5) of Table 3.4, while the corresponding second-stage results are reported in column (4) of Table 3.5.
The pooled second-stage estimates are not very different from those
generated using the instruments one at a time, but the standard errors
are appreciably smaller. For example, the estimated effect of family size
on highest grade completed using both instruments is .24, with a
standard error of .13, a marked drop from the standard errors of about
.17 and .21 using twins and same-sex instruments one at a time.
Importantly, the regression estimate in column (1), a very precise −.15
for highest grade completed, lies well outside the confidence interval
associated with the 2SLS estimate in column (4).23 This suggests that the
strong negative association between family size and schooling is driven
in large part and perhaps entirely by selection bias.
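
The interval comparison behind this claim can be checked with a few
lines of arithmetic. The sketch below uses the unrounded figures given
in note 23 (−.145, .237, and .128) and the usual "estimate plus or minus
two standard errors" rule; the variable names are ours, chosen for
illustration.

```python
# Figures quoted in note 23 (highest grade completed, Table 3.5).
ols_estimate = -0.145   # conventional OLS estimate, column (1)
tsls_estimate = 0.237   # pooled 2SLS estimate, column (4)
tsls_se = 0.128         # standard error of the pooled 2SLS estimate

# Approximate 95% confidence interval: estimate +/- 2 standard errors.
lower = tsls_estimate - 2 * tsls_se
upper = tsls_estimate + 2 * tsls_se

# Does the OLS estimate fall inside the 2SLS interval?
ols_inside = lower <= ols_estimate <= upper

print(f"2SLS interval: [{lower:.2f}, {upper:.2f}]")
print(f"OLS estimate inside the interval: {ols_inside}")
```

The OLS estimate sits below the lower end of the interval, which is the
sense in which it "lies well outside."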

MASTER JOSHWAY: Build the house of IV, Grasshopper.
GRASSHOPPER: The foundation has three layers: (i) the first-stage
requires instruments that affect the causal channel of interest; (ii)
the independence assumption requires instruments to be as good as
randomly assigned; (iii) the exclusion restriction asserts that a single
causal channel connects instruments with outcomes.
MASTER JOSHWAY: Can these assumptions be checked?
GRASSHOPPER: Check the first stage by looking for a strong
relationship between instruments and the proposed causal channel;
check independence by checking covariate balance with the
instrument switched off and on, as in a randomized trial.
MASTER JOSHWAY: And exclusion?
GRASSHOPPER: The exclusion restriction is not easily verified.
Sometimes, however, we may find a sample where the first stage is
very small. Exclusion implies such samples should generate small
reduced-form estimates, since the hypothesized causal channel is
absent.
MASTER JOSHWAY: How are IV estimates computed?
GRASSHOPPER: Statistical software computes two-stage least squares
estimates for us. This allows us to add covariates and use more than
one instrument at a time. But we look at the first-stage and
reduced-form estimates as well.

Masters of ’Metrics: The Remarkable Wrights

The IV method was invented by economist Philip G. Wright, assisted by
his son, Sewall, a geneticist. Philip wrote frequently about agricultural
markets. In 1928, he published The Tariff on Animal and Vegetable
Oils.24 Most of this book is concerned with the question of whether the
steep tariffs on farm products imposed in the early 1920s benefited
domestic producers. A 1929 reviewer noted that “Whatever the practical
value of the intricate computation of elasticity of demand and supply as
applied particularly to butter in this chapter, the discussion has high
theoretical value.”25

In competitive markets, shifting supply and demand curves
simultaneously generate equilibrium prices and quantities. The path
from these observed equilibrium prices and quantities to the underlying
supply and demand curves that generate them is unclear. The challenge
of how to derive supply and demand elasticities from the observed
relationship between prices and quantities is called an identification
problem. At the time Philip was writing, econometric identification was
poorly understood. Economists knew for sure only that the observed
relationship between price and quantity fails to capture either supply or
demand, and is somehow determined by both.
Appendix B of The Tariff on Animal and Vegetable Oils begins with an
elegant statement of the identification problem in simultaneous
equations models. The appendix then goes on to explain how variables
present in one equation but excluded from another solve the
identification problem. Philip referred to such excluded variables as
“external factors,” because, by shifting the equation in which they
appear, they trace out the equation from which they’re omitted (that is,
to which they are external). Today we call such shifters instruments.
Philip derived and then used IV to estimate supply and demand curves in
markets for butter and flaxseed (flaxseed is used to make linseed oil, an
ingredient in paint). Philip’s analysis of the flaxseed market uses prices
of substitutes as demand shifters, while farm yields per acre, mostly
driven by weather conditions, shift supply.
Appendix B was a major breakthrough in ’metrics thought, remarkable
and unexpected, so much so that some have wondered whether Philip
really wrote it. Perhaps Appendix B was written by Sewall, a
distinguished scholar in his own right. Like ’metrics masters Galton and
Fisher, profiled at the end of Chapters 1 and 2, Sewall was a geneticist
and statistician. Well before the appearance of Appendix B, Sewall had
developed a statistical method called “path analysis” that was meant to
solve problems related to omitted variables bias. Today we recognize
path analysis as an application of the multivariate regression methods
discussed in Chapter 2; it doesn’t solve the identification problem raised
by simultaneous equations models. Some of Appendix B references
Sewall’s idea of “path coefficients,” but Philip’s method of external
factors was entirely new.
Masters James Stock and Francesco Trebbi investigated the case for
Sewall’s authorship using stylometrics.26 Stylometrics identifies authors
by the statistical regularities in their word usage and sentence structure.
Stylometrics confirms Philip’s authorship of Appendix B. Recently,
however, Stock and his student Kerry Clark uncovered letters between
father and son that show the ideas in Appendix B developing jointly in a
self-effacing give and take. In this exchange, Philip describes the power
and simplicity of IV. But he wasn’t naive about the ease with which the
method could be applied. In a March 1926 letter to Sewall, writing on
the prospect of finding external factors, Philip commented: “Such
factors, I fear, especially in the case of demand conditions, are not easy
to find.”27 The search for identification has not gotten easier in the
intervening decades.



Philip’s journey was personal as well as intellectual. He worked for
many years as a teacher at obscure Lombard College in Galesburg,
Illinois. Lombard College failed to survive the Great Depression, but
Philip’s time there bore impressive fruit. At Lombard, he mentored
young Carl Sandburg, whose loosely structured and evocative poetry
later made him an American icon. Here’s Sandburg’s description of the
path blazed by experience:28

THIS morning I looked at the map of the day
And said to myself, “This is the way! This is the way I will go;
Thus shall I range on the roads of achievement,
The way is so clear—it shall all be a joy on the lines marked out.”
And then as I went came a place that was strange,—
’Twas a place not down on the map!
And I stumbled and fell and lay in the weeds,
And looked on the day with rue.

I am learning a little—never to be sure—
To be positive only with what is past,
And to peer sometimes at the things to come
As a wanderer treading the night
When the mazy stars neither point nor beckon,
And of all the roads, no road is sure.

I see those men with maps and talk
Who tell how to go and where and why;
I hear with my ears the words of their mouths,
As they finger with ease the marks on the maps;
And only as one looks robust, lonely, and querulous,
As if he had gone to a country far
And made for himself a map,
Do I cry to him, “I would see your map!
I would heed that map you have!”

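Appendix: IV Theory

IV, LATE, and 2SLS
We first refresh notation for an IV setup with one instrument and no
covariates. The first stage links instrument and treatment:

$$D_i = \alpha_1 + \phi Z_i + e_{1i}.$$

The reduced form links instrument and outcomes:

$$Y_i = \alpha_0 + \rho Z_i + e_{0i}.$$

The 2SLS second stage is the regression of outcomes on first-stage fitted
values:

$$Y_i = \alpha_2 + \lambda_{2SLS} \hat{D}_i + e_{2i}.$$

Note that the LATE formula (3.2) can be written in terms of first-stage
and reduced-form regression coefficients as

$$\lambda = \frac{\rho}{\phi} = \frac{C(Y_i, Z_i)/V(Z_i)}{C(D_i, Z_i)/V(Z_i)} = \frac{C(Y_i, Z_i)}{C(D_i, Z_i)}. \qquad (3.12)$$

Here, we’ve used the fact that the differences in means on the top and
bottom of equation (3.2) are the same as the regression coefficients,
$\rho$ and $\phi$, respectively. Written this way, that is, as a ratio of
covariances, $\lambda$ is called the IV formula. Its sample analogue is
the IV estimator.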
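In this simple setup, the regression of $Y_i$ on $\hat{D}_i$ (the 2SLS
second step) is the same as equation (3.12). This is apparent once we
write out the 2SLS second stage:

$$\lambda_{2SLS} = \frac{C(Y_i, \hat{D}_i)}{V(\hat{D}_i)} = \frac{C(Y_i, \alpha_1 + \phi Z_i)}{V(\alpha_1 + \phi Z_i)} = \frac{\phi C(Y_i, Z_i)}{\phi^2 V(Z_i)} = \frac{\rho}{\phi} = \lambda.$$

In deriving this, we’ve used the rules for variances and covariances
detailed in the appendix to Chapter 2.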
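
The equivalence can be checked numerically. The sketch below is only an
illustration on simulated data: the data-generating process, the seed,
and every coefficient value are assumptions of ours, not anything taken
from the book. It computes the ratio of reduced form to first stage, the
ratio of covariances, and a by-hand second stage, which should agree up
to floating-point error.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    """Sample covariance (divisor n); cov(x, x) is the variance."""
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

# Hypothetical data: a dummy instrument z, an unobserved confounder u,
# a treatment d shifted by z, and an outcome y with true effect 2.0.
random.seed(0)
n = 5000
z = [random.randint(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
d = [0.5 * zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * di + 1.5 * ui + random.gauss(0, 1) for di, ui in zip(d, u)]

phi = cov(d, z) / cov(z, z)      # first-stage coefficient
rho = cov(y, z) / cov(z, z)      # reduced-form coefficient
iv_ratio = rho / phi             # ratio of reduced form to first stage
iv_cov = cov(y, z) / cov(d, z)   # IV formula as a ratio of covariances

# With a dummy instrument, rho is also a difference in means.
y1 = mean([yi for yi, zi in zip(y, z) if zi == 1])
y0 = mean([yi for yi, zi in zip(y, z) if zi == 0])

# 2SLS by hand: regress y on the first-stage fitted values.
alpha1 = mean(d) - phi * mean(z)
d_hat = [alpha1 + phi * zi for zi in z]
tsls = cov(y, d_hat) / cov(d_hat, d_hat)

ols = cov(y, d) / cov(d, d)      # uninstrumented slope, for comparison

print(f"IV (ratio) {iv_ratio:.4f}  IV (cov) {iv_cov:.4f}  2SLS {tsls:.4f}")
print(f"OLS {ols:.4f}")
```

Because the confounder enters both treatment and outcome in this
simulated design, the OLS slope need not match the IV estimates.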
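With covariates included in the first and second stage (say, the
variable $A_i$, as in our investigation of the population bomb), the 2SLS
second stage is equation (3.9). Here, too, 2SLS and the IV formula are
equivalent, with the latter again given by the ratio of reduced-form to
first-stage coefficients. In this case, these coefficients are estimated
with $A_i$ included, as in equations (3.7) and (3.8):

$$\lambda_{2SLS} = \frac{\rho}{\phi} = \frac{C(Y_i, \tilde{Z}_i)/V(\tilde{Z}_i)}{C(D_i, \tilde{Z}_i)/V(\tilde{Z}_i)} = \frac{C(Y_i, \tilde{Z}_i)}{C(D_i, \tilde{Z}_i)},$$

where $\tilde{Z}_i$ is the residual from a regression of $Z_i$ on $A_i$
(this we know from regression anatomy). The details behind the second
equals sign are left for you to fill in.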
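
For readers who prefer to see the covariate case numerically, here is a
hedged sketch on simulated data (the data-generating process and all
coefficient values are our own assumptions). It builds the residualized
instrument, forms the ratio of covariances, and compares it with a
by-hand second stage that controls for the covariate. Regression anatomy
is used throughout so that only bivariate regressions are needed.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def slope(ys, xs):
    """Bivariate OLS slope from a regression of ys on xs."""
    return cov(ys, xs) / cov(xs, xs)

def residual(ys, xs):
    """Residuals from a bivariate OLS regression of ys on xs."""
    b = slope(ys, xs)
    a0 = mean(ys) - b * mean(xs)
    return [y - a0 - b * x for x, y in zip(xs, ys)]

# Hypothetical data: covariate a, an instrument z correlated with a,
# confounder u, treatment d, and outcome y (true effect 1.0).
random.seed(1)
n = 4000
a = [random.gauss(0, 1) for _ in range(n)]
z = [1 if random.random() < 0.3 + 0.2 * (ai > 0) else 0 for ai in a]
u = [random.gauss(0, 1) for _ in range(n)]
d = [0.6 * zi + 0.5 * ai + 0.7 * ui + random.gauss(0, 1)
     for zi, ai, ui in zip(z, a, u)]
y = [1.0 * di + 0.8 * ai + 1.2 * ui + random.gauss(0, 1)
     for di, ai, ui in zip(d, a, u)]

# IV formula with covariates: use the instrument residualized on a.
z_tilde = residual(z, a)
iv_anatomy = cov(y, z_tilde) / cov(d, z_tilde)

# First stage with a included (coefficients via regression anatomy).
phi = cov(d, z_tilde) / cov(z_tilde, z_tilde)
gamma1 = slope([di - phi * zi for di, zi in zip(d, z)], a)
alpha1 = mean(d) - phi * mean(z) - gamma1 * mean(a)
d_hat = [alpha1 + phi * zi + gamma1 * ai for zi, ai in zip(z, a)]

# Second stage: coefficient on d_hat in a regression that controls for a.
d_hat_tilde = residual(d_hat, a)
tsls = cov(y, d_hat_tilde) / cov(d_hat_tilde, d_hat_tilde)

print(f"ratio of covariances: {iv_anatomy:.4f}")
print(f"by-hand second stage: {tsls:.4f}")
```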

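2SLS Standard Errors
Just as with sample means and regression estimates, we expect IV and
2SLS estimates to vary from one sample to another. We must gauge the
extent of sampling variability in any particular set of estimates as we
decide whether they’re meaningful. The sampling variance of 2SLS
estimates is quantified by the appropriate standard errors.
2SLS standard errors for a model that uses $Z_i$ to instrument $D_i$,
while controlling for $A_i$, are computed as follows. First, the 2SLS
residual is constructed using

$$\eta_i = Y_i - \alpha_2 - \lambda_{2SLS} D_i - \gamma_2 A_i.$$

The standard error for $\hat{\lambda}_{2SLS}$ is then given by

$$SE(\hat{\lambda}_{2SLS}) = \frac{\sigma_\eta}{\sqrt{n}} \times \frac{1}{\sigma_{\hat{D}}}, \qquad (3.13)$$

where $\sigma_\eta$ is the standard deviation of $\eta_i$, and
$\sigma_{\hat{D}}$ is the standard deviation of the first-stage fitted
values, $\hat{D}_i = \alpha_1 + \phi Z_i + \gamma_1 A_i$.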
It’s important to note that $\eta_i$ is not the residual generated by
manual estimation of the 2SLS second stage, equation (3.9). This
incorrect residual is

$$e_{2i} = Y_i - \alpha_2 - \lambda_{2SLS} \hat{D}_i - \gamma_2 A_i.$$

The variance of $e_{2i}$ plays no role in equation (3.13), so a manual
2SLS second stage generates incorrect standard errors. The moral is
clear: explore freely in the privacy of your own computer, but when it
comes to the estimates and standard errors you plan to report in public,
let professional software do the work.
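
The gap between the two residuals is easy to see in a small simulation.
The sketch below is an illustration under our own assumptions: simulated
data, one instrument, and no covariates (so the covariate term drops out
of both residuals). It applies the standard-error formula once with the
correct residual and once with the manual second-stage residual.

```python
import math
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def sd(xs):
    return math.sqrt(cov(xs, xs))

# Hypothetical data: dummy instrument z, confounder u, treatment d,
# outcome y. All coefficient values are made up for illustration.
random.seed(0)
n = 5000
z = [random.randint(0, 1) for _ in range(n)]
u = [random.gauss(0, 1) for _ in range(n)]
d = [0.5 * zi + 0.8 * ui + random.gauss(0, 1) for zi, ui in zip(z, u)]
y = [2.0 * di + 1.5 * ui + random.gauss(0, 1) for di, ui in zip(d, u)]

# First-stage fitted values and the IV (2SLS) estimate.
phi = cov(d, z) / cov(z, z)
alpha1 = mean(d) - phi * mean(z)
d_hat = [alpha1 + phi * zi for zi in z]
lam = cov(y, z) / cov(d, z)
alpha2 = mean(y) - lam * mean(d)

# Correct 2SLS residual: built with the actual treatment d.
eta = [yi - alpha2 - lam * di for yi, di in zip(y, d)]
# Residual from a manual second stage: built with fitted values d_hat.
e2 = [yi - alpha2 - lam * dh for yi, dh in zip(y, d_hat)]

se_correct = sd(eta) / (math.sqrt(n) * sd(d_hat))
se_manual = sd(e2) / (math.sqrt(n) * sd(d_hat))

print(f"2SLS estimate: {lam:.3f}")
print(f"standard error from the correct residual: {se_correct:.3f}")
print(f"standard error from the manual residual:  {se_manual:.3f}")
```

In this simulated design the manual residual is noisier than the correct
one, so the manual standard error comes out too large; the direction of
the error can differ in other settings.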

2SLS Bias
IV is a powerful and flexible tool, but masters use their most powerful
tools wisely. As we’ve seen, 2SLS combines multiple instruments in an
effort to generate precise estimates of a single causal effect. Typically,
a researcher blessed with many instruments knows that some produce a
stronger first stage than others. The temptation is to use them all
anyway (econometrics software doesn’t charge more for this). The risk
here is that 2SLS estimates with many weak instruments can be
misleading. A weak instrument is one that isn’t highly correlated with
the regressor being instrumented, so the first-stage coefficient
associated with this instrument is small or imprecisely estimated. 2SLS
estimates with many such instruments tend to be similar to OLS estimates
of the same model. When 2SLS is close to OLS, it’s natural to conclude
you needn’t worry about selection bias in the latter, but this conclusion
may be unwarranted. Because of finite sample bias, 2SLS estimates in a
many-weak IV scenario tell you little about the causal relationship of
interest.
When is finite sample bias worth worrying about? Masters often focus
on the first-stage F-statistic testing the joint hypothesis that all
first-stage coefficients in a many-instrument setup are zero (an
F-statistic extends the t-statistic to tests of multiple hypotheses at
once). A popular rule of thumb requires an F value of at least 10 to put
many-weak fears to rest. An alternative to 2SLS, called the limited
information maximum likelihood estimator (LIML for short), is less
affected by finite sample bias. You’d like LIML estimates and 2SLS
estimates to be close to one another, since the former are unlikely to be
biased even with many weak instruments (though LIML estimates typically
have larger standard errors than do the corresponding 2SLS estimates).
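
A first-stage F-statistic can be computed from two residual sums of
squares: one from a regression of the treatment on a constant alone, and
one from a regression that adds the instruments. The sketch below is a
minimal standard-library version run on simulated data; the helper names
(`solve`, `rss`, `first_stage_f`) and both simulated designs are our own
assumptions, and it uses the textbook homoskedastic F formula rather
than any robust variant.

```python
import random

def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            f = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= f * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        tail = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - tail) / m[r][r]
    return x

def rss(rows, y):
    """Residual sum of squares from OLS of y on rows plus an intercept."""
    X = [[1.0] + list(r) for r in rows]
    k = len(X[0])
    xtx = [[sum(x[i] * x[j] for x in X) for j in range(k)] for i in range(k)]
    xty = [sum(x[i] * yi for x, yi in zip(X, y)) for i in range(k)]
    beta = solve(xtx, xty)
    return sum((yi - sum(b * xi for b, xi in zip(beta, x))) ** 2
               for x, yi in zip(X, y))

def first_stage_f(instruments, d):
    """F-statistic for the hypothesis that all instrument coefficients are 0."""
    n, k = len(d), len(instruments[0])
    rss_u = rss(instruments, d)          # unrestricted: with instruments
    rss_r = rss([[] for _ in d], d)      # restricted: intercept only
    return ((rss_r - rss_u) / k) / (rss_u / (n - k - 1))

random.seed(2)
n = 1000
# Two strong instruments (hypothetical first-stage coefficients of 0.5).
strong = [[random.gauss(0, 1), random.gauss(0, 1)] for _ in range(n)]
d_strong = [0.5 * r[0] + 0.5 * r[1] + random.gauss(0, 1) for r in strong]
# Ten weak instruments (hypothetical coefficients of 0.02 each).
weak = [[random.gauss(0, 1) for _ in range(10)] for _ in range(n)]
d_weak = [sum(0.02 * v for v in r) + random.gauss(0, 1) for r in weak]

f_strong = first_stage_f(strong, d_strong)
f_weak = first_stage_f(weak, d_weak)
print(f"first-stage F, two strong instruments: {f_strong:.1f}")
print(f"first-stage F, ten weak instruments:   {f_weak:.1f}")
```

Under the rule of thumb described above, the first design would pass and
the second would raise many-weak concerns.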
The many-weak instruments problem loses its sting when you use a
single instrument to estimate a single causal effect. Estimates of the
quantity-quality trade-off using either a single dummy for multiple
births or a single dummy for same-sex sibships as an instrument for
family size are therefore unlikely to be plagued by finite sample bias.
Such estimates appear in columns (2) and (3) of Table 3.5. Finally,
reduced-form estimates are always worth a careful look, since these are
OLS estimates, unaffected by finite sample bias. Reduced-form estimates
that are small and not significantly different from zero provide a strong
and unbiased hint that the causal relationship of interest is weak or
nonexistent as well, at least in the data at hand (multiple reduced-form
coefficients are also tested together using an F-test). We always tell
our students: If you can’t see it in the reduced form, it ain’t there.

1 Jay Mathews’ book, Work Hard. Be Nice, Algonquin Books, 2009, details
the history of KIPP. In 2012, Teach for America was the largest single
employer of graduating seniors on 55 American college campuses, ranging
from Arizona State to Yale.

2 Martin Carnoy, Rebecca Jacobsen, Lawrence Mishel, and Richard
Rothstein, The Charter School Dust-Up: Examining Evidence on Student
Achievement, Economic Policy Institute Press, 2005, p. 58.

3 Joshua D. Angrist et al., “Inputs and Impacts in Charter Schools: KIPP
Lynn,” American Economic Review Papers and Proceedings, vol. 100, no. 2,
May 2010, pages 239–243, and Joshua D. Angrist et al., “Who Benefits
from KIPP?” Journal of Policy Analysis and Management, vol. 31, no. 4,
Fall 2012, pages 837–860.

4 As noted in Chapter 1, attrition (missing data) is a concern even in
randomized trials. The key to the integrity of a randomized design with
missing data is an equal probability that data are missing in treatment
and control groups. In the KIPP sample used to construct Table 3.1,
winners and losers are indeed about equally likely to have complete
data.

5 Section 3.3 details the role of covariates in IV estimation.

6 This theorem comes from Guido W. Imbens and Joshua D. Angrist,
“Identification and Estimation of Local Average Treatment Effects,”
Econometrica, vol. 62, no. 2, March 1994, pages 467–475. The distinction
between compliers, always-takers, and never-takers is detailed in Joshua
D. Angrist, Guido W. Imbens, and Donald B. Rubin, “Identification of
Causal Effects Using Instrumental Variables,” Journal of the American
Statistical Association, vol. 91, no. 434, June 1996, pages 444–455.

7 Simpson was acquitted of murder in a criminal trial but was held
responsible for the deaths in a civil trial. He later authored a book
titled If I Did It: Confessions of the Killer, Beaufort Books, 2007. Our
account of repeated police visits to Simpson’s home is based on Sara
Rimer, “The Simpson Case: The Marriage; Handling of 1989 Wife-Beating
Case Was a ‘Terrible Joke,’ Prosecutor Says,” The New York Times, June
18, 1994.

8 The original analysis of the MDVE appears in Lawrence W. Sherman and
Richard A. Berk, “The Specific Deterrent Effects of Arrest for Domestic
Assault,” American Sociological Review, vol. 49, no. 2, April 1984,
pages 261–272.

9 Our IV analysis of the MDVE is based on Joshua D. Angrist,
“Instrumental Variables Methods in Experimental Criminological Research:
What, Why and How,” Journal of Experimental Criminology, vol. 2, no. 1,
April 2006, pages 23–44.

10 This theoretical result originates with Howard S. Bloom, “Accounting
for No-Shows in Experimental Evaluation Designs,” Evaluation Review,
vol. 8, no. 2, April 1984, pages 225–246. The LATE interpretation of the
Bloom result appears in Imbens and Angrist, “Identification and
Estimation,” Econometrica, 1994. See also Section 4.4.3 in Joshua D.
Angrist and Jörn-Steffen Pischke, Mostly Harmless Econometrics: An
Empiricist’s Companion, Princeton University Press, 2009. An example
from our field of labor economics is the Job Training Partnership Act
(JTPA). The JTPA experiment randomly assigned the opportunity to
participate in a federally funded job-training program. About 60% of
those offered training received JTPA services, but no controls got JTPA
training. An IV analysis of the JTPA using treatment assigned as an
instrument for treatment delivered captures the effect of training on
trainees. For details, see Larry L. Orr et al., Does Training for the
Disadvantaged Work? Evidence from the National JTPA Study, Urban
Institute Press, 1996.



11 See David Lam, “How the World Survived the Population Bomb: Lessons
from 50 Years of Extraordinary Demographic History,” Demography, vol.
48, no. 4, November 2011, pages 1231–1262, and Wolfgang Lutz, Warren
Sanderson, and Sergei Scherbov, “The End of World Population Growth,”
Nature, vol. 412, no. 6846, August 2, 2001, pages 543–545.

12 Just how much Indian living standards have risen is debated. Still,
scholars generally agree that conditions have improved dramatically
since 1970 (see, for example, Angus Deaton, The Great Escape: Health,
Wealth, and the Origins of Inequality, Princeton University Press,
2013).

13 Gary S. Becker and H. Gregg Lewis, “On the Interaction between the
Quantity and Quality of Children,” Journal of Political Economy, vol.
81, no. 2, part 2, March/April 1973, pages S279–288, and Gary S. Becker
and Nigel Tomes, “Child Endowments and the Quantity and Quality of
Children,” Journal of Political Economy, vol. 84, no. 4, part 2, August
1976, pages S143–S162.

14 John Bongaarts, “The Impact of Population Policies: Comment,”
Population and Development Review, vol. 20, no. 3, September 1994, pages
616–620.

15 You might think this is true only of societies with access to modern
contraceptive methods, such as the pill or the penny (held between the
knees as needed). But demographers have shown that even without access
to modern contraceptives, potential parents exert a remarkable degree of
fertility control. For example, in an extensive body of work, Ansley
Coale documented the dramatic decline in marital fertility in
nineteenth- and twentieth-century Europe (see
http://opr.princeton.edu/archive/pefp/). This pattern, since repeated in
most of the world, is called the demographic transition.

16 Mark R. Rosenzweig and Kenneth I. Wolpin, “Testing the
Quantity-Quality Fertility Model: The Use of Twins as a Natural
Experiment,” Econometrica, vol. 48, no. 1, January 1980, pages 227–240.

17 Joshua D. Angrist, Victor Lavy, and Analia Schlosser, “Multiple
Experiments for the Causal Link between the Quantity and Quality of
Children,” Journal of Labor Economics, vol. 28, no. 4, October 2010,
pages 773–824.

18 In more recent samples, twins instruments are also compromised by the
proliferation of in vitro fertilization, a treatment for infertility.
Mothers who turn to in vitro fertilization, which increases twin birth
rates sharply, tend to be older and more educated than other mothers.

19 Joshua D. Angrist and William Evans, “Children and Their Parents’
Labor Supply: Evidence from Exogenous Variation in Family Size,”
American Economic Review, vol. 88, no. 3, June 1998, pages 450–477.

20 We’ve seen a version of IV with covariates already. The KIPP offer
effects reported in column (3) of Table 3.1 come from regression models
for the first stage and reduced form that include covariates in the form
of dummies for application risk sets.

21 Alert readers will have noticed that the treatment variable here,
family size, is not a dummy variable like KIPP enrollment, but rather an
ordered treatment that counts children. You might wonder whether it’s OK
to describe 2SLS estimates of the effects of variables like family size
as LATE. Although the details differ, 2SLS estimates can still be said
to capture average causal effects on compliers in this context. The
extension of LATE to ordered treatments is developed in Joshua D.
Angrist and Guido W. Imbens, “Two Stage Least Squares Estimation of
Average Causal Effects in Models with Variable Treatment Intensity,”
Journal of the American Statistical Association, vol. 90, no. 430, June
1995, pages 431–442. Along the same lines, 2SLS easily accommodates
instruments that aren’t dummies. We’ll see an example of this in Chapter
6.

22 In addition to the male dummy, other covariates include indicators
for census year, parents’ ethnicity, age, missing month of birth,
mother’s age, mother’s age at first birth, and mother’s age at
immigration (where relevant). See the Empirical Notes section for
details.

23 Specifically, the regression estimate of −.145 lies outside the
multi-instrument 2SLS confidence interval of .237 ± (2 × .128) = [−.02,
.49]. You can, in some cases, have too many instruments, especially if
they have little explanatory power in the first stage. The chapter
appendix elaborates on this point.

24 Philip G. Wright, The Tariff on Animal and Vegetable Oils, Macmillan
Company, 1928.

25 G. O. Virtue, “The Tariff on Animal and Vegetable Oils by Philip G.
Wright,” American Economic Review, vol. 19, no. 1, March 1929, pages
152–156. The quote is from page 155.

26 James H. Stock and Francesco Trebbi, “Who Invented Instrumental
Variables Regression?” Journal of Economic Perspectives, vol. 17, no. 3,
Summer 2003, pages 177–194.

27 This quote and the one in the sketch are from unpublished letters,
uncovered by James H. Stock and Kerry Clark. See “Philip Wright, the
Identification Problem in Econometrics, and Its Solution,” presented at
the Tufts University Department of Economics Special Event in honor of
Philip Green Wright, October 2011
(http://ase.tufts.edu/econ/news/documents/wrightPhilipAndSewall.pdf),
and Kerry Clark’s 2012 Harvard senior thesis, “The Invention and
Reinvention of Instrumental Variables Regression.”

28 “Experience.” From In Reckless Ecstasy, Asgard Press, 1904, edited
and with a foreword by Philip Green Wright.


Chapter 4

Regression Discontinuity Designs

YOUNG CAINE: Master, may we speak further on the forces of destiny?

MASTER PO: Speak.

CAINE: As we stand with two roads before us, how shall we know
whether the left road or the right road will lead us to our destiny?

MASTER PO: You spoke of chance, Grasshopper. As if such a thing were
certain to exist. In the matter you speak of, destiny, there is no such
thing as chance.
Kung Fu, Season 3, Episode 62

Our Path

Human behavior is constrained by rules. The State of California limits
elementary school class size to 32 students; 33 is one too many. The
Social Security Administration won’t pay you a penny in retirement
benefits until you’ve reached age 62. Potential armed forces recruits
with test scores in the lower deciles are ineligible for American
military service. Although many of these rules seem arbitrary, with
little grounding in science or experience, we say: bring ’em on! For
rules that constrain the role of chance in human affairs often generate
interesting experiments. Masters of ’metrics exploit these experiments
with a tool called the regression discontinuity (RD) design. RD doesn’t
work for all causal questions, but it works for many. And when it does,
the results have almost the same causal force as those from a randomized
trial.

4.1 Birthdays and Funerals

KATY: Is this really what you’re gonna do for the rest of your life?

BOON: What do you mean?

KATY: I mean hanging around with a bunch of animals getting drunk
every weekend.

BOON: No! After I graduate, I’m gonna get drunk every night.
Animal House, 1978 … of course

Your twenty-first birthday is an important milestone. American over-21s
can drink legally, “at last,” some would say. Of course, those underage
drink as well. As we learn from the exploits of Boon and his fraternity
brothers, not all underage drinking is in moderation. In an effort to
address the social and public health problems associated with underage
drinking, a group of American college presidents have lobbied states to
return the minimum legal drinking age (MLDA) to the Vietnam-era
threshold of 18. The theory behind this effort (known as the Amethyst
Initiative) is that legal drinking at age 18 discourages binge drinking
and promotes a culture of mature alcohol consumption. This contrasts with
the traditional view that the age-21 MLDA, while a blunt and imperfect
tool, reduces youth access to alcohol, thereby preventing some harm.
Fortunately, the history of the MLDA generates two natural
experiments that can be used for a sober assessment of alcohol policy.
We discuss the first experiment in this chapter and the second in the
next.1 The first MLDA experiment emerges from the fact that a small
change in age (measured in months or even days) generates a big change
in legal access. The difference a day makes can be seen in Figure 4.1,
which plots the relationship between birthdays and funerals. This figure
shows the number of deaths among Americans aged 20–22 between
1997 and 2003. Deaths here are plotted by day, relative to birthdays,
which are labeled as day 0. For example, someone who was born on
September 18, 1990, and died on September 19, 2012, is counted among
deaths of 22-year-olds occurring on day 1.
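
The day-counting rule in this example can be written down directly. The
sketch below is a simplification of ours, not the procedure used to
build Figure 4.1: the function name `age_and_day` is hypothetical, it
counts only days on or after the most recent birthday, and it ignores
February 29 birthdays.

```python
from datetime import date

def age_and_day(born, died):
    """Return (age in years at death, days since the most recent birthday)."""
    # Subtract one year if the birthday has not yet occurred in the death year.
    before_birthday = (died.month, died.day) < (born.month, born.day)
    age = died.year - born.year - before_birthday
    last_birthday = date(born.year + age, born.month, born.day)
    return age, (died - last_birthday).days

# The example from the text: born September 18, 1990; died September 19, 2012.
age, day = age_and_day(date(1990, 9, 18), date(2012, 9, 19))
print(f"age {age}, day {day}")
```

A death on the birthday itself is assigned day 0, matching the labeling
described above.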

FIGURE 4.1
Birthdays and funerals

Mortality risk shoots up on and immediately following a twenty-first
birthday, a fact visible in the pronounced spike in daily deaths on these
days. This spike adds about 100 deaths to a baseline level of about 150
per day. The age-21 spike doesn’t seem to be a generic party-hardy
birthday effect. If this spike reflects birthday partying alone, we
should expect to see deaths shoot up after the twentieth and
twenty-second birthdays as well, but that doesn’t happen. There’s
something special about the twenty-first birthday. It remains to be seen,
however, whether the age-21 effect can be attributed to the MLDA, and
whether the elevated mortality risk seen in Figure 4.1 lasts long enough
to be worth worrying about.

FIGURE 4.2
A sharp RD estimate of MLDA mortality effects

Notes: This figure plots death rates from all causes against age in
months. The lines in the figure show fitted values from a regression of
death rates on an over-21 dummy and age in months (the vertical dashed
line indicates the minimum legal drinking age (MLDA) cutoff).

Sharp RD
The story linking the MLDA with a sharp and sustained rise in death
rates is told in Figure 4.2. This figure plots death rates (measured as
deaths per 100,000 persons per year) by month of age (defined as 30-day
intervals), centered around the twenty-first birthday. The X-axis
extends 2 years in either direction, and each dot in the figure is the
death rate in one monthly interval. Death rates fluctuate from month to
month, but few rates to the left of the age-21 cutoff are above 95. At
ages over 21, however, death rates shift up, and few of those to the
right of the age-21 cutoff are below 95.
Happily, the odds a young person dies decrease with age, a fact that
can be seen in the downward-sloping lines fit to the death rates plotted
in Figure 4.2. But extrapolating the trend line drawn to the left of the
cutoff, we might have expected an age-21 death rate of about 92, while
the trend line to the right of 21 starts markedly higher, at around 99.



The jump in trend lines at age 21 illustrates the subject of this
chapter, regression discontinuity designs (RD designs for short). RD is
based on the seemingly paradoxical idea that rigid rules, which at first
appear to reduce or even eliminate the scope for randomness, create
valuable experiments.
The causal question addressed by Figure 4.2 is the effect of legal
access to alcohol on death rates. The treatment variable in this case can
be written $D_a$, where $D_a = 1$ indicates legal drinking and is 0
otherwise. $D_a$ is a function of age, $a$: the MLDA transforms
21-year-olds from underage minors to legal alcohol consumers. We capture
this transformation in mathematical notation by writing

$$D_a = \begin{cases} 1 & \text{if } a \geq 21 \\ 0 & \text{if } a < 21. \end{cases} \qquad (4.1)$$
This representation highlights two signal features of RD designs:

▪ Treatment status is a deterministic function of $a$, so that once we
know $a$, we know $D_a$.

▪ Treatment status is a discontinuous function of $a$, because no
matter how close $a$ gets to the cutoff, $D_a$ remains unchanged
until the cutoff is reached.

The variable that determines treatment, age in this case, is called the
running variable. Running variables play a central role in the RD story.
In sharp RD designs, treatment switches cleanly off or on as the running
variable passes a cutoff. The MLDA is a sharp function of age, so an
investigation of MLDA effects on mortality is a sharp RD study. The
second half of the chapter discusses a second RD scenario, known as
fuzzy RD, in which the probability or intensity of treatment jumps at a
cutoff.
Mortality clearly changes with the running variable, $a$, for reasons
unrelated to the MLDA. Death rates from disease-related causes like
cancer (known to epidemiologists as internal causes) are low but
increasing for those in their late teens and early 20s, while deaths from
external causes, primarily car accidents, homicides, and suicides, fall.
To separate this trend variation from any possible MLDA effects, an RD
analysis controls for smooth variation in death rates generated by $a$.
RD gets its name from the practice of using regression models to
implement this control.
A simple RD analysis of the MLDA estimates causal effects using a
regression like

$$\bar{M}_a = \alpha + \rho D_a + \gamma a + e_a, \qquad (4.2)$$

where $\bar{M}_a$ is the death rate in month $a$ (again, month is defined
as a 30-day interval counting from the twenty-first birthday). Equation
(4.2) includes the treatment dummy, $D_a$, as well as a linear control
for age in months. Fitted values from equation (4.2) produce the lines
drawn in Figure 4.2. The negative slope, captured by $\gamma$, reflects
smoothly declining death rates among young people as they mature. The
parameter $\rho$ captures the jump in deaths at age 21. Regression (4.2)
generates an estimate of $\rho$ equal to 7.7. When cast against average
death rates of around 95, this estimate indicates a substantial increase
in risk at the MLDA cutoff.
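
To make the mechanics of a regression like (4.2) concrete, here is a
minimal sketch run on simulated data. Nothing below uses the actual
mortality data: the simulated death rates, the noise level, the seed,
and the function name `sharp_rd_linear` are all our own assumptions, and
the true jump is set to 7.7 only to echo the magnitude quoted in the
text. The jump coefficient is obtained by regression anatomy (first
residualize the over-21 dummy on age), which avoids any matrix algebra.

```python
import random

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def residualize(ys, xs):
    """Residuals from a bivariate OLS regression of ys on xs."""
    b = cov(ys, xs) / cov(xs, xs)
    a0 = mean(ys) - b * mean(xs)
    return [y - a0 - b * x for x, y in zip(xs, ys)]

def sharp_rd_linear(ages, rates, cutoff):
    """Coefficient on the over-cutoff dummy, controlling linearly for age."""
    d = [1.0 if a >= cutoff else 0.0 for a in ages]
    d_res = residualize(d, ages)   # dummy purged of its linear age trend
    return cov(rates, d_res) / cov(d_res, d_res)

# Simulated monthly cells spanning ages 19 to 23 (48 cells, age in years).
random.seed(4)
ages = [19 + (m + 0.5) / 12 for m in range(48)]
rates = [95 - 3 * (a - 21) + 7.7 * (a >= 21) + random.gauss(0, 1)
         for a in ages]

rho_hat = sharp_rd_linear(ages, rates, cutoff=21)
print(f"estimated jump at the cutoff: {rho_hat:.2f}")
```

With noise-free data the function recovers the simulated jump exactly;
with noise, the estimate varies around it from one sample to another.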

Is this a credible estimate of the causal effect of the MLDA? Should we
not control for other things? The OVB formula tells us that the
difference between the estimate of $\rho$ in this short regression and
the results any longer regression might produce depends on the
correlation between variables added to the long regression and $D_a$.
But equation (4.1) tells us that $D_a$ is determined solely by $a$.
Assuming that the effect of $a$ on death rates is captured by a linear
function, we can be sure that no OVB afflicts this short regression.
The lack of OVB in equation (4.2) is the payoff to inside information:
although treatment isn’t randomly assigned, we know where it comes
from. Specifically, treatment is determined by the running variable, an
implication of the deterministic link noted above. The question of
causality therefore turns on whether the relationship between the
running variable and outcomes has indeed been nailed by a regression
with a linear control for age.
Although RD uses regression methods to estimate causal effects, RD
designs are best seen as a distinct tool that differs importantly from
the regression methods discussed in Chapter 2. In Chapter 2, we compared
treatment and control outcomes at particular values of the control
variables, in the hope that treatment is as good as randomly assigned
after conditioning on controls. Here, there is no value of the running
variable at which we get to observe both treatment and control
observations. Whoa, Grasshopper! Unlike the matching and regression
strategies discussed in Chapter 2, which are based on treatment-control
comparisons conditional on covariate values, the validity of RD turns on
our willingness to extrapolate across values of the running variable, at
least for values in the neighborhood of the cutoff at which treatment
switches on.
The local nature of such neighborly comparisons is apparent in Figure
4.2. The jump in trend lines at the MLDA cutoff implicitly compares
death rates for people on either side of (but close to) a twenty-first
birthday. In other words, the notional experiment here involves changes
in access to alcohol for young people, in a world where alcohol is freely
available to adults. The results from this experiment, though relevant
for contemporary discussions of alcohol policy, need not tell us much
about the consequences of more dramatic policy changes, such as
Prohibition.

RD Specifics
RD tools aren’t guaranteed to produce reliable causal estimates. Figure
4.3 shows why not. In panel A, the relationship between the running
variable ($X$) and the outcome ($Y$) is linear, with a clear jump in
$E[Y|X]$ at the cutoff value of one-half. Panel B looks similar, except
that the relationship between average $Y$ and $X$ is nonlinear. Still,
the jump at $X = .5$ is plain to see. Panel C of Figure 4.3 highlights
the challenge RD designers face. Here, the figure exhibits a baroque
nonlinear trend, with sharp turns to the left and right of the cutoff,
but no discontinuity. Estimates constructed using a linear model like
equation (4.2) mistake this nonlinearity for a discontinuity.

FIGURE 4.3
RD in action, three ways

Notes: Panel A shows RD with a linear model for $E[Y_i|X_i]$; panel B
adds some curvature. Panel C shows nonlinearity mistaken for a
discontinuity. The vertical dashed line indicates a hypothetical RD
cutoff.



Two strategies reduce the likelihood of RD mistakes, though neither
provides perfect insurance. The first models nonlinearities directly,
while the second focuses solely on observations near the cutoff. We
start with the nonlinear modeling strategy, briefly taking up the second
approach at the end of this section.
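
The panel C problem can be reproduced with a toy example. In the sketch
below, the outcome is a smooth, steep S-shaped function of the running
variable with no jump at all; the specific function, the grid, and the
window width are assumptions of ours, chosen only to mimic the sharp
turns described above. A linear RD model fit to the full range reports a
sizable "jump," while the same model fit to a narrow window around the
cutoff reports almost none.

```python
import math

def mean(xs):
    return sum(xs) / len(xs)

def cov(xs, ys):
    mx, my = mean(xs), mean(ys)
    return sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / len(xs)

def residualize(ys, xs):
    """Residuals from a bivariate OLS regression of ys on xs."""
    b = cov(ys, xs) / cov(xs, xs)
    a0 = mean(ys) - b * mean(xs)
    return [y - a0 - b * x for x, y in zip(xs, ys)]

def linear_rd_jump(xs, ys, cutoff):
    """Jump estimate from a regression of ys on a cutoff dummy and xs."""
    d = [1.0 if x >= cutoff else 0.0 for x in xs]
    d_res = residualize(d, xs)
    return cov(ys, d_res) / cov(d_res, d_res)

def smooth_outcome(x):
    """A continuous S-shaped curve: steep near 0.5, but with no jump."""
    return 1.0 / (1.0 + math.exp(-25.0 * (x - 0.5)))

cutoff = 0.5
xs = [(i + 0.5) / 200 for i in range(200)]   # grid on (0, 1)
ys = [smooth_outcome(x) for x in xs]

# Linear RD on the full range of the running variable.
wide = linear_rd_jump(xs, ys, cutoff)

# Same model, restricted to observations within 0.05 of the cutoff.
near = [(x, y) for x, y in zip(xs, ys) if abs(x - cutoff) <= 0.05]
narrow = linear_rd_jump([x for x, _ in near], [y for _, y in near], cutoff)

print(f"estimated jump, full range:    {wide:.3f}")
print(f"estimated jump, narrow window: {narrow:.3f}")
```

The true discontinuity here is zero by construction, so any estimated
jump is produced by the misspecified trend rather than by treatment.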
Nonlinearities in an RD framework are typically modeled using
polynomial functions of the running variable. Ideally, the results that
emerge from this approach are insensitive to the degree of nonlinearity
the model allows. Sometimes, however, as in the case of panel C of
Figure 4.3, they are not. The question of how much nonlinearity is
enough requires a judgment call. A risk here is that you’ll pick the
model that produces the results that seem most appealing, perhaps
favoring those that conform most closely to your prejudices. RD
practitioners therefore owe their readers a report on how their RD
estimates change as the details of the regression model used to
construct them change.
Figure 4.2 suggests the possibility of mild curvature in the
relationship between $\bar{M}_a$ and $a$, at least for the points to the
right of the cutoff. A simple extension that captures this curvature
uses quadratic instead of linear control for the running variable. The RD
model with quadratic running variable control becomes

$$\bar{M}_a = \alpha + \rho D_a + \gamma_1 a + \gamma_2 a^2 + e_a,$$

where $\gamma_1 a + \gamma_2 a^2$ is a quadratic function of age, and
the $\gamma$s are parameters to be estimated.
A related modification allows for different running variable
coefficients to the left and right of the cutoff. This modification
generates models that interact $a$ with $D_a$. To make the model with
interactions easier to interpret, we center the running variable by
subtracting the cutoff, $a_0$. Replacing $a$ by $a - a_0$ (here,
$a_0 = 21$), and adding an interaction term, $(a - a_0)D_a$, the RD
model becomes

$$\bar{M}_a = \alpha + \rho D_a + \gamma (a - a_0) + \delta [(a - a_0) D_a] + e_a. \qquad (4.3)$$

Centering the running variable ensures that $\rho$ in equation (4.3) is
still the jump in average outcomes at the cutoff (as can be seen by
setting $a = a_0$ in the equation).
Why should the trend relationship between age and death rates
change at the cutoff? Data to the left of the cutoff reflect the
relationship between age and death rates for a sample whose drinking
behavior is restricted by the MLDA. In this sample, we might expect
steadily declining death rates as young people mature and take fewer
risks. After age 21, however, unrestricted access to alcohol might
change this process, perhaps slowing a declining trend. On the other
hand, if the college presidents who back the Amethyst Initiative are
right, responsible legal drinking accelerates the development of mature
behavior. The direction of such a change in slopes is merely a
hypothesis; the main point is that equation (4.3) allows for slope
changes either way.
A subtle implication of the model with interaction terms is that away
from the a0 cutoff, the MLDA treatment effect is given by ρ + δ(a − a0).
This can be seen by subtracting the regression line fit to observations
where Da is switched off from the line fit to observations where Da is
switched on:

$$[\alpha + \rho + (\gamma + \delta)(a - a_0)] - [\alpha + \gamma (a - a_0)] = \rho + \delta (a - a_0).$$

Estimates away from the cutoff constitute a bold extrapolation, however,
and should be consumed with a slice of lime and a shaker of salt. There
is no data on counterfactual death rates in a world where drinking at
ages substantially older than 21 is forbidden. Likewise, far to the left
of the cutoff, it’s hard to say what death rates would be in a world
where drinking at very young ages is allowed. By contrast, it seems
reasonable to say that those just under 21 provide a good counterfactual
comparison for those just over 21. This leads us to see estimates of the
parameter ρ (the causal effect right at the cutoff) as most reliable,
even when the model used for estimation implicitly tells us more than
that.

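Nonlinear trends and changes in slope at the cutoff can also be
combined in a model that looks like

$$\bar{M}_a = \alpha + \rho D_a + \gamma_1 (a - a_0) + \gamma_2 (a - a_0)^2 + \delta_1 [(a - a_0) D_a] + \delta_2 [(a - a_0)^2 D_a] + e_a. \tag{4.4}$$

In this setup, both the linear and quadratic terms change as we cross
the cutoff. As before, the jump in death rates at the MLDA cutoff is
captured by the MLDA treatment effect, ρ. The treatment effect away from
the cutoff is now ρ + δ1(a − a0) + δ2(a − a0)², though again the causal
interpretation of this quantity is more speculative than the causal
interpretation of ρ itself.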
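
The interacted quadratic model can be sketched in a few lines. The example below builds the six regressors (constant, dummy, centered age, its square, and the two interactions), solves the least-squares normal equations with a small hand-written solver, and evaluates the implied effect ρ + δ1(a − a0) + δ2(a − a0)² at age 23. Everything here is an illustration under stated assumptions: the data are invented and noise-free, the coefficient values are made up, and ols, design_row, and effect_at are hypothetical helper names.

```python
def ols(rows, y):
    """Solve the normal equations (X'X) b = X'y by Gauss-Jordan elimination."""
    k = len(rows[0])
    # Augmented matrix [X'X | X'y].
    aug = [[sum(r[i] * r[j] for r in rows) for j in range(k)]
           + [sum(r[i] * yi for r, yi in zip(rows, y))]
           for i in range(k)]
    for col in range(k):
        # Partial pivoting keeps the elimination numerically stable.
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(k):
            if r != col:
                factor = aug[r][col] / aug[col][col]
                aug[r] = [v - factor * p for v, p in zip(aug[r], aug[col])]
    return [aug[i][k] / aug[i][i] for i in range(k)]


def design_row(age, cutoff):
    """Regressors of the interacted quadratic model: 1, D, c, c^2, c*D, c^2*D."""
    c = age - cutoff
    d = 1.0 if c >= 0 else 0.0
    return [1.0, d, c, c * c, c * d, c * c * d]


def effect_at(age, cutoff, coefs):
    """Implied treatment effect rho + delta1*c + delta2*c^2 at a given age."""
    _, rho, _, _, delta1, delta2 = coefs
    c = age - cutoff
    return rho + delta1 * c + delta2 * c * c


# Hypothetical noise-free data: age in years, one point per month, ages 19-22.
CUTOFF = 21.0
ages = [19.0 + m / 12 for m in range(48)]
true_coefs = [92.0, 9.5, -1.0, -0.5, -3.0, 1.0]  # alpha, rho, g1, g2, d1, d2
outcomes = [sum(b * x for b, x in zip(true_coefs, design_row(a, CUTOFF)))
            for a in ages]

coefs = ols([design_row(a, CUTOFF) for a in ages], outcomes)
effect_23 = effect_at(23.0, CUTOFF, coefs)
print(f"rho = {coefs[1]:.3f}; implied effect at age 23 = {effect_23:.3f}")
```

The effect at age 23 is an extrapolation of the fitted curves, so the caution in the text applies to it with full force; only the coefficient on the dummy is pinned down by comparisons at the cutoff.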
Figure 4.4 shows that the estimated trend function generated by
equation (4.4) has some curvature, mildly concave to the left of age 21
and markedly convex thereafter. This model generates a larger estimate
of the MLDA effect at the cutoff than does a linear model, equal to
about 9.5 deaths per 100,000. Figure 4.4 also shows the linear trend
line generated by equation (4.2). The more elaborate model seems to give
a better fit than the simple model: Death rates jump sharply at age 21,
but then recover somewhat in the first few months after a twenty-first
birthday. This echoes the spike in daily death rates on or around the
twenty-first birthday seen in Figure 4.1. Unlike Boon and his fraternity
brothers, many newly legalized drinkers seem eventually to tire of
getting trashed every night. Specification (4.4) captures this jump and
decline nicely, though at the cost of some technical fanciness.
Which model is better, fancy or simple? There are no general rules
here, and no substitute for a thoughtful look at the data. We’re
especially fortunate when the results are not highly sensitive to the
details of our modeling choices, as appears true in Figure 4.4. The
simple RD model seems flexible enough to capture effects right at the
cutoff, in this case around a twenty-first birthday. The fancier version
fits the spike in death rates near twenty-first birthdays, while also
capturing the subsequent partial recovery in death rates.



Effects at the cutoff need not be the most important. Suppose we raise
the drinking age to 22. In a world where excess alcohol deaths are due
entirely to MLDA birthday parties, such a change might extend some
lives by a year but otherwise have little effect. The sustained increase
in death rates apparent in Figure 4.4 is therefore important, since this
suggests restricted alcohol access has lasting benefits. We commented
above that evidence for effects away from the cutoff is more speculative
than the evidence found in a jump near the cutoff. On the other hand,
when the trend relationship between running variable and outcomes is
approximately linear, limited extrapolation seems justified. The jump in
death rates at the cutoff shows that drinking behavior responds to
alcohol access in a manner that is reflected in death rates, an
important point of principle, while the MLDA treatment effect
extrapolated as far out as age 23 still looks substantial and seems
believable, on the order of 5 extra deaths per 100,000. This pattern
highlights the value of “visual RD,” that is, careful assessment of
plots like Figure 4.4.

FIGURE 4.4
Quadratic control in an RD design

Notes: This figure plots death rates from all causes against age in
months. Dashed lines in the figure show fitted values from a regression
of death rates on an over-21 dummy and age in months. The solid lines
plot fitted values from a regression of mortality on an over-21 dummy
and a quadratic in age, interacted with the over-21 dummy (the vertical
dashed line indicates the minimum legal drinking age [MLDA] cutoff).

How convincing is the argument that the jump in Figure 4.4 is indeed
due to drinking? Data on death rates by cause of death help us make the
case. Although alcohol is poisonous, few people die from alcohol
poisoning alone, and deaths from alcohol-related diseases occur only at
older ages. But alcohol is closely tied to motor vehicle accidents
(MVA), the number-one killer of young people. If drunk driving is the
primary alcohol-related cause of deaths, we should see a large jump in
motor vehicle fatalities alongside little change in death rates due to
internal causes. Like the balancing tests reported for the RAND HIE
experiment in Table 1.3 and for the KIPP offer instrument in panel A of
Table 3.1, zero effects on outcomes that should be unchanged by
treatment raise our confidence in the causal effects we are after.

As a benchmark for results related to specific causes of death, the
first row of Table 4.1 shows estimates for all deaths, constructed using
both simple RD equation (4.2) and fancy RD equation (4.4). These are
displayed in columns (1) and (2). The second row of Table 4.1 reveals
strong effects of legal drinking on MVA fatalities, effects large enough
to account for most of the excess deaths related to the MLDA. The
estimates here are largely insensitive to whether the fancy or simple
model is used to construct them. Other causes of death we might expect
to see affected by drinking are suicide and other external causes, which
include accidents other than car crashes. Indeed, estimated effects on
suicide and deaths from other external causes (excluding homicide) also
show small but statistically significant increases at the MLDA cutoff.
Importantly, the estimates reported in columns (1) and (2) for deaths
from all internal causes (these include deaths from cancer and other
diseases) are small and not significantly different from zero. As the
last row in the table shows, effects from direct alcohol poisoning also
appear to be modest and of roughly the same magnitude as those from
internal causes, though the estimated jump in deaths from alcohol
poisoning is significantly different from zero. On balance, therefore,
Table 4.1 supports the MLDA story, showing clear effects for causes most
likely attributable to alcohol but little evidence of an increase due to
internal causes.
Also in support of this conclusion, Figure 4.5 plots fitted values for
MVA fatalities, constructed using the model that generates the estimates
in column (2) of Table 4.1. The figure shows a clear break at the MLDA
cutoff, with no evidence of potentially misleading nonlinear trends. At
the same time, there isn’t much of a jump in deaths due to internal
causes, while the standard errors in Table 4.1 suggest that the small
jump in internal deaths seen in the figure is likely due to chance.

TABLE 4.1
Sharp RD estimates of MLDA effects on mortality

Notes: This table reports coefficients on an over-21 dummy from
regressions of month-of-age-specific death rates by cause on an over-21
dummy and linear or interacted quadratic age controls. Standard errors
are reported in parentheses.

In addition to straightforward regression estimation, an approach that
masters refer to as parametric RD, a second RD strategy exploits the
fact that the problem of distinguishing jumps from nonlinear trends
grows less vexing as we zero in on points close to the cutoff. For the
small set of points close to the boundary, nonlinear trends need not
concern us at all. This suggests an approach that compares averages in a
narrow window just to the left and just to the right of the cutoff. A
drawback here is that if the window is very narrow, there are few
observations left, meaning the resulting estimates are likely to be too
imprecise to be useful. Still, we should be able to trade the reduction
in bias near the boundary against the increased variance suffered by
throwing data away, generating some kind of optimal window size.
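
The window comparison described above can be sketched as a difference in average outcomes just right and just left of the cutoff. In the invented, noise-free data below, outcomes trend downward on both sides of a true jump of 8, so the raw difference in means mixes the jump with the trend; the mix-up shrinks as the window narrows, while the number of points left to average shrinks too. The function name window_mean_difference and all numbers are assumptions made for illustration.

```python
def window_mean_difference(centered_ages, outcomes, bandwidth):
    """Mean outcome just right of the cutoff minus mean outcome just left of it.

    Returns the difference and the number of observations used.
    """
    left = [y for c, y in zip(centered_ages, outcomes) if -bandwidth <= c < 0]
    right = [y for c, y in zip(centered_ages, outcomes) if 0 <= c < bandwidth]
    difference = sum(right) / len(right) - sum(left) / len(left)
    return difference, len(left) + len(right)


# Hypothetical noise-free data: months relative to the cutoff, a true jump of 8,
# and a downward linear trend of 0.25 per month on both sides.
months = list(range(-24, 24))
outcomes = [95.0 + 8.0 * (c >= 0) - 0.25 * c for c in months]

results = {}
for b in (24, 12, 3):
    estimate, n_obs = window_mean_difference(months, outcomes, b)
    results[b] = (estimate, n_obs)
    print(f"bandwidth {b:2d} months: estimate {estimate:.2f} from {n_obs} points")
```

With real data the narrow-window estimate would also be noisier, which is the variance side of the trade-off; the noise-free example shows only the bias side and the shrinking sample.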

FIGURE 4.5
RD estimates of MLDA effects on mortality by cause of death

Notes: This figure plots death rates from motor vehicle accidents and
internal causes against age in months. Lines in the figure plot fitted
values from regressions of mortality by cause on an over-21 dummy and a
quadratic function of age in months, interacted with the dummy (the
vertical dashed line indicates the minimum legal drinking age [MLDA]
cutoff).



The econometric procedure that makes this trade-off is nonparametric
RD. Nonparametric RD amounts to estimating equation (4.2) in a narrow
window around the cutoff. That is, we estimate

$$\bar{M}_a = \alpha + \rho D_a + \gamma a + e_a \quad \text{in a sample such that} \quad a_0 - b \le a \le a_0 + b. \tag{4.5}$$

The parameter b describes the width of the window and is called a
bandwidth. The results in Table 4.1 can be seen as nonparametric RD with
a bandwidth equal to 2 years of age for the estimates reported in
columns (1) and (2) and a bandwidth half as large (that is, including
only ages 20–21 instead of 19–22) for the estimates shown in columns (3)
and (4). The choice of the simple model in equation (4.5) vs. the
fancier equation (4.4) should matter little when both are estimated in
narrower age windows around the cutoff. The results in Table 4.1 support
this conjecture, though there is some wobbliness in the estimates across
columns that we might reasonably attribute to sampling variance.²
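
A rough sketch of estimation inside a bandwidth follows. It fits a separate weighted least-squares line on each side of the cutoff and reads the jump off the gap between the two fitted values at the cutoff. This is a common variant that lets the slope differ across the cutoff, not a literal copy of equation (4.5), which uses a single slope. Equal weights inside the window correspond to the uniform kernel described in the footnote to this passage; the triangular weighting function, the invented data (curved only to the right of the cutoff), and all function names are assumptions made for illustration.

```python
def weighted_intercept(xs, ys, ws):
    """Weighted least-squares line through (xs, ys); returns the fit at x = 0."""
    total = sum(ws)
    mean_x = sum(w * x for w, x in zip(ws, xs)) / total
    mean_y = sum(w * y for w, y in zip(ws, ys)) / total
    sxx = sum(w * (x - mean_x) ** 2 for w, x in zip(ws, xs))
    sxy = sum(w * (x - mean_x) * (y - mean_y) for w, x, y in zip(ws, xs, ys))
    return mean_y - (sxy / sxx) * mean_x


def uniform_kernel(u):
    """Equal weight for every point within one bandwidth of the cutoff."""
    return 1.0 if abs(u) <= 1 else 0.0


def triangular_kernel(u):
    """Weight that falls linearly from 1 at the cutoff to 0 at the bandwidth."""
    return max(0.0, 1.0 - abs(u))


def local_linear_rd(centered, outcomes, bandwidth, kernel):
    """Jump at the cutoff from separate kernel-weighted lines on each side."""
    sides = {False: ([], [], []), True: ([], [], [])}
    for c, y in zip(centered, outcomes):
        w = kernel(c / bandwidth)
        if w > 0:
            xs, ys, ws = sides[c >= 0]
            xs.append(c)
            ys.append(y)
            ws.append(w)
    return weighted_intercept(*sides[True]) - weighted_intercept(*sides[False])


# Hypothetical noise-free data: months relative to the cutoff, a true jump of 8,
# a linear trend on the left, and extra curvature only on the right.
months = list(range(-24, 24))
outcomes = [95.0 + 8.0 * (c >= 0) - 0.25 * c + 0.01 * c * c * (c >= 0)
            for c in months]

uniform_wide = local_linear_rd(months, outcomes, 24, uniform_kernel)
uniform_narrow = local_linear_rd(months, outcomes, 6, uniform_kernel)
triangular_wide = local_linear_rd(months, outcomes, 24, triangular_kernel)
print(f"uniform, b=24: {uniform_wide:.3f}")
print(f"uniform, b=6: {uniform_narrow:.3f}")
print(f"triangular, b=24: {triangular_wide:.3f}")
```

In this made-up example the curvature pulls the wide uniform-kernel estimate below the true jump; narrowing the window, or putting more weight on points near the cutoff, brings the estimate closer to it.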

Simple enough! But how shall we pick the bandwidth? On one hand, to
obviate concerns about polynomial choice, we’d like to work with data
close to the cutoff. On the other hand, less data means less precision.
For starters, therefore, the bandwidth should vary as a function of the
sample size. The more information available about outcomes in the
neighborhood of an RD cutoff, the narrower we can set the bandwidth
while still hoping to generate estimates precise enough to be useful.
Theoretical econometricians have proposed sophisticated strategies for
making such bias-variance trade-offs efficiently, though here too, the
bandwidth selection algorithm is not completely data-dependent and
requires researchers to choose certain parameters.³ In practice,
bandwidth choice, like the choice of polynomial in parametric models,
requires a judgment call. The goal here is not so much to find the one
perfect bandwidth as to show that the findings generated by any
particular choice of bandwidth are not a fluke.



In this spirit, the studies upon which our investigation of the MLDA is
based appear to have been written in RD heaven (perhaps a reward for
their authors’ temperance). The RD estimates generated by parametric
models with alternative polynomial controls come out similar to one
another and close to a corresponding set of nonparametric estimates.
These nonparametric estimates are largely insensitive to the choice of
bandwidth over a wide range.⁴ This alignment of results suggests the
findings generated by an RD analysis of the MLDA capture real causal
effects. Some young people appear to pay the ultimate price for the
privilege of downing a legal drink.

4.2 The Elite Illusion

KWAI CHANG CAINE: I seek not to know the answers, but to understand
the questions.
Kung Fu, Season 1, Episode 14

The Boston and New York City public school systems include a handful of
selective exam schools. Unlike most other American public schools, exam
schools screen applicants on the basis of a competitive admissions test.
Just as many American high school seniors compete to enroll in the
country’s most selective colleges and universities, younger students and
their parents in a few cities aspire to coveted seats at top exam
schools. Fewer than half of Boston’s exam school applicants win a seat
at the John D. O’Bryant School, Boston Latin Academy, or the Boston
Latin School (BLS); only one-sixth of New York applicants are offered a
seat at one of the three original exam schools in the Big Apple
(Stuyvesant, Bronx Science, and Brooklyn Tech).
At first blush, the intense competition for exam school seats is
understandable. Many exam school students go on to distinguished careers
in science, the arts, and politics. By any measure, exam school students
are well ahead of other public school students. It’s easy to see why
some parents would give a kidney (perhaps a liver!) to place their
children in such schools. Economists and other social scientists are
also interested in the consequences of the exam school treatment. For
one thing, exam schools bring high-ability students together. Surely
that’s a good thing: bright students learn as much from their peers as
from their teachers, or so we say at highly selective institutions like
MIT and the London School of Economics.
The case for an exam school advantage is easy to make, but it’s also
clear that at least some of the achievement difference associated with
exam school attendance reflects these schools’ selective admissions
policies. When schools admit only high achievers, then the students who
go there are necessarily high achievers, regardless of whether the
school itself adds value. This sounds like a case of selection bias, and
it is. Taking a cue from the far-sighted Oregon Health Authority and its
health insurance lottery, we might hope to convince Stuyvesant and
Boston Latin to admit students at random, instead of on the basis of a
test. We could then use the resulting experimental data to learn whether
exam schools add value. Or could we? For if exam schools were to admit
students randomly, then they wouldn’t be exam schools after all.
If selective admissions are a necessary part of what it means to be an
exam school, how can we hope to design an experiment that reveals exam
school effectiveness? Necessity is the mother of invention, as revered
philosophers Plato and Frank Zappa remind us. The discrete nature of
exam school admissions policies creates a natural experiment. Among
applicants with scores close to admissions cutoffs, whether an applicant
falls to the right or left of the cutoff might be as good as randomly
assigned. In this case, however, the experiment is subtle: rather than a
simple on-off switch, it’s the nature of the exam school experience that
changes discontinuously at the cutoff, since some admitted students
choose to go elsewhere while many of those rejected at one exam school
end up at another. When discontinuities change treatment probabilities
or average characteristics (treatment intensity, for short), instead of
flicking a simple on-off switch, the resulting RD design is said to be
fuzzy.

Fuzzy RD
Just what is the exam school treatment? Figures 4.6–4.8, which focus on
applicants to BLS, help us craft an answer. BLS applicants, like all who
aspire to an exam school seat in Boston, take the Independent Schools
Entrance Exam (ISEE for short). The sample used to construct these
figures consists of applicants with ISEE scores near the BLS entrance
cutoff. The dots in the figures are averages of the variable on the
Y-axis calculated for applicants with ISEE scores in bins one point
wide, while the line through the dots shows a fit obtained by smoothing
these data in a manner explained in a footnote.⁵ Figure 4.6 shows that
most but not all qualifying applicants enroll at BLS.

FIGURE 4.6
Enrollment at BLS

Notes: This figure plots enrollment rates at Boston Latin School (BLS),
conditional on admissions test scores, for BLS applicants scoring near
the BLS admissions cutoff. Solid lines show fitted values from a local
linear regression estimated separately on either side of the cutoff
(indicated by the vertical dashed line).

FIGURE 4.7
Enrollment at any Boston exam school

Notes: This figure plots enrollment rates at any Boston exam school,
conditional on admissions test scores, for Boston Latin School (BLS)
applicants scoring near the BLS admissions cutoff. Solid lines show
fitted values from a local linear regression, estimated separately on
either side of the cutoff (indicated by the vertical dashed line).

BLS is the most prestigious exam school in Boston. Where do applicants
who miss the BLS cutoff go? Most go to Boston Latin Academy, a venerable
institution that’s one school down in the Boston exam school hierarchy.
This enrollment shift is documented in Figure 4.7, which plots
enrollment rates at any Boston exam school around the BLS cutoff. Figure
4.7 shows that most students who miss the BLS cutoff indeed end up at
another exam school, so that the odds of enrolling at some exam school
are virtually unchanged at the BLS cutoff. It would seem, therefore,
that we have to settle for a parochial-sounding experiment comparing
highly selective BLS to the somewhat less selective Boston Latin
Academy, instead of a more interesting evaluation of the whole exam
school idea.

FIGURE 4.8
Peer quality around the BLS cutoff

Notes: This figure plots average seventh-grade peer quality for
applicants to Boston Latin School (BLS), conditional on admissions test
scores, for BLS applicants scoring near the admissions cutoff. Peer
quality is measured by seventh-grade schoolmates’ fourth-grade math
scores. Solid lines show fitted values from a local linear regression,
estimated separately on either side of the cutoff (indicated by the
vertical dashed line).



Or do we? One of the most controversial questions in education research
is the nature of peer effects; that is, whether the ability of your
classmates has a causal effect on your learning. If you’re lucky enough
to attend high school with other good students, this may contribute to
your success. On the other hand, if you’re relegated to a school where
most students do poorly, this may hold you back. Peer effects are
important for policies related to school assignment, that is, the rules
and regulations that determine where children attend school. In many
American cities, for example, students attend schools near their homes.
Because poor, nonwhite, and low-achieving students tend to live far from
well-to-do, high-achieving students in mostly white neighborhoods,
school assignment by neighborhood may reduce poor minority children’s
chances to excel. Many school districts therefore bus children to
schools far from where they live in an effort to increase the mixing of
children from different backgrounds and races.
Exam schools induce a dramatic experiment in peer quality.
Specifically, applicants who qualify for admission at one of Boston’s
exam schools attend school with much higher-achieving peers than do
applicants who just miss the cut, even when the alternative is another
exam school. Figure 4.8 documents this for BLS applicants. Here, peer
achievement is measured by the math score of applicants’ schoolmates on
a test they took in fourth grade (2 years before they applied to exam
schools). As in the charter school investigation discussed in Chapter 3,
test scores in this figure are measured in standard deviation units,
where one standard deviation is written in Greek as 1σ. Successful
applicants to BLS study with much higher-scoring schoolmates, enjoying a
jump in peer math achievement of .8σ, equivalent to the difference in
average peer quality between inner city Boston and its wealthy suburbs.
Such dramatic variation in treatment intensity lies at the heart of any
fuzzy RD research design. The difference between fuzzy and sharp designs
is that, with fuzzy, applicants who cross a threshold are exposed to a
more intense treatment, while in a sharp design treatment switches
cleanly on or off at the cutoff.



Fuzzy RD Is IV
In a regression rite of passage, social scientists around the world
link student achievement to the average ability of their schoolmates.
Such regressions reliably reveal a strong association between the
performance of students and the achievement of their peers. Among all
Boston exam school applicants, a regression of students’ seventh-grade
math scores on the average fourth-grade scores of their seventh-grade
classmates generates a coefficient of about one-quarter. This putative
peer effect comes from the regression model

$$Y_i = \theta_0 + \theta_1 \bar{X}_{(i)} + \theta_2 X_i + u_i, \tag{4.6}$$

where Yi is student i’s seventh-grade math score, Xi is i’s fourth-grade
math score, and $\bar{X}_{(i)}$ is the average fourth-grade math score
of i’s seventh-grade classmates (the subscript “(i)” reminds us that
student i is not included when calculating the average achievement of
his or her peers). The estimated coefficient on peer quality (θ1) is
around .25, meaning that a one standard deviation increase in the
ability of middle school peers, as measured by their elementary school
scores and controlling for a student’s own elementary school
performance, is associated with a .25σ gain in middle school
achievement.
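
The peer average in this regression leaves out the student's own score. A minimal sketch of that leave-one-out calculation follows; the function name, the idea of grouping by a school identifier, and the toy scores are all assumptions made for illustration, not the construction used in the underlying study.

```python
from collections import defaultdict


def leave_one_out_peer_means(group_ids, scores):
    """Average score of each student's groupmates, excluding the student.

    Returns None for a student with no groupmates.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for g, x in zip(group_ids, scores):
        totals[g] += x
        counts[g] += 1
    peer_means = []
    for g, x in zip(group_ids, scores):
        if counts[g] < 2:
            peer_means.append(None)
        else:
            # Subtract the student's own score before averaging.
            peer_means.append((totals[g] - x) / (counts[g] - 1))
    return peer_means


# Hypothetical fourth-grade scores (in standard deviation units) by school.
schools = ["A", "A", "A", "B", "B"]
fourth_grade_scores = [1.0, 0.0, -1.0, 0.5, 1.5]
peer_quality = leave_one_out_peer_means(schools, fourth_grade_scores)
print(peer_quality)
```

Leaving the student out keeps the regressor from containing the student's own past achievement, which already enters the model separately.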
Parents and teachers have a powerful intuition that “peers matter,” so
the strong positive association between the achievement of students and
their classmates rings true. But this naive peer regression is unlikely
to have a causal interpretation for the simple reason that students
educated together tend to be similar for many reasons. Your authors’
four children, for example, precocious high-achievers like their
parents, have been fortunate to attend schools attended by many children
from similar families. Because family background is not held fixed in
regressions like equation (4.6), the observed association between
students and their classmates undoubtedly reflects some of these shared
influences. To break the resulting causal deadlock, we’d like to
randomly assign students to a range of different peer groups.



Exam schools to the rescue! Figure 4.8 documents the remarkable
difference in peer ability that BLS admission produces, with a jump of
four-fifths of a standard deviation in peer quality at the BLS cutoff.
The jump in peer quality at exam school admissions cutoffs arises, by
design, from the mix of students enrolled in selective schools. This is
just what the econometrician ordered by way of an ideal peer experiment
(this improvement in peer quality also makes many parents hope and dream
of an exam school seat for their children). Moreover, while peer quality
jumps at the cutoff, cross-cutoff comparisons of variables related to
applicants’ own abilities, motivation, and family background (the
sources of selection bias we usually worry about) show no similar jumps.
For example, there’s no jump in applicants’ own elementary school
scores. Peers change discontinuously at admissions cutoffs, but exam
school applicants’ own characteristics do not.⁶

Hopes, dreams, and the results from our naive peer regression
(equation (4.6)) notwithstanding, the exam school experiment casts doubt
on the notion of a causal peer effect on the achievement of Boston exam
school applicants. The seeds of doubt are planted by Figure 4.9, which
plots seventh- and eighth-grade math scores (on tests taken after 1 or 2
years of middle school) against ISEE scores (the exam school running
variable) for applicants scoring near the BLS cutoff. Admitted
applicants are exposed to a much stronger peer group, but this exposure
generates no parallel jump in applicants’ middle school achievement.
As in equation (4.2), the size of the jump in Figure 4.9 can be
estimated by fitting an equation like

$$Y_i = \alpha_0 + \rho D_i + \beta_0 R_i + e_{0i}. \tag{4.7}$$

Here, Di is a dummy variable indicating applicants who qualify, while Ri
is the running variable that determines qualification. In a sample of
seventh-grade applicants to BLS, where Yi is a middle school math score
as in the figures, this regression produces an estimate of −.02 with a
standard error of .10, a statistical zero in our book.



FIGURE 4.9
Math scores around the BLS cutoff

Notes: This figure plots seventh- and eighth-grade math scores for
applicants to the Boston Latin School (BLS), conditional on admissions
test scores, for BLS applicants scoring near the admissions cutoff.
Solid lines show fitted values from a local linear regression, estimated
separately on either side of the cutoff (indicated by the vertical
dashed line).

How should we interpret this estimate of ρ? Through the lens of the
corresponding first stage, of course! Equation (4.7) is the reduced form
for a 2SLS setup where the endogenous variable is average peer quality,
$\bar{X}_{(i)}$. The first-stage equation that goes with this reduced
form is

$$\bar{X}_{(i)} = \alpha_1 + \phi D_i + \beta_1 R_i + e_{1i}, \tag{4.8}$$

where the parameter ϕ captures the jump in mean peer quality induced by
an exam school offer. This is the jump shown in Figure 4.8, a precisely
estimated .80σ.
The last piece of our 2SLS setup is the causal relationship of
interest, the 2SLS second stage. In this case, the second stage captures
the effect of peer quality on seventh- and eighth-grade math scores. As
always, the second stage includes the same control variables as appear
in the first stage. This leads to a second-stage equation that can be
written

$$Y_i = \alpha_2 + \lambda \hat{X}_{(i)} + \beta_2 R_i + e_{2i}, \tag{4.9}$$

where λ is the causal effect of peer quality, and the variable
$\hat{X}_{(i)}$ is the first-stage fitted value produced by estimating
equation (4.8).
Note that equation (4.9) inherits a covariate from the first stage and
reduced form, the running variable, Ri. On the other hand, the jump
dummy, Di, is excluded from the second stage, since this is the
instrument that makes the 2SLS engine run. Substantively, we’ve assumed
that in the neighborhood of admissions cutoffs, after adjusting for
running variable effects with a linear control, exam school
qualification has no direct effect on test scores, but rather influences
achievement, if at all, solely through peer quality. This assumption is
the all-important IV exclusion restriction in this context.
The 2SLS estimate of λ in equation (4.9) is −.023 with a standard error
of .132.⁷ Since the reduced-form estimate is close to and not
significantly different from zero, so is the corresponding 2SLS
estimate. This estimate is also far from the estimate of .25σ generated
by OLS estimation of the naive peer effects regression, equation (4.6).
On the other hand, who’s to say that the only thing that matters about
an exam school education is peer quality? The exclusion restriction
requires us to commit to a specific causal channel. But the assumed
channel need not be the only one that matters in practice.
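
As a rough check on the arithmetic, recall that with a single instrument and the same covariates in every equation, the 2SLS estimate equals the reduced-form jump divided by the first-stage jump. Plugging in the rounded figures quoted above gives a number close to the reported −.023. An exact match should not be expected, since the inputs are rounded and the reduced-form figure is quoted for seventh-grade applicants, while the second stage is described as covering seventh- and eighth-grade scores.

```latex
\lambda = \frac{\rho}{\phi} \approx \frac{-.02}{.80} = -.025
```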
A distinctive feature of the exam school environment besides peer
achievement is racial composition. In Boston’s mostly minority public
schools, exam schools offer the opportunity to go to school with a more
diverse population, where diversity means more white classmates. The
court-mandated dismantling of segregated American school systems was
motivated by an effort to improve educational outcomes. In 1954, the
U.S. Supreme Court famously declared: “Separate educational facilities
are inherently unequal,” laying the framework for court-ordered busing
to increase racial balance in public schools. Does increasing racial
balance indeed boost achievement? Exam schools are relevant to the
debate over racial integration because exam school admission sharply
increases exposure to white peers. At the same time, we know that if we
replace peer quality, $\bar{X}_{(i)}$, with peer proportion white, this
too will produce a zero second-stage coefficient, a consequence of the
fact that the underlying reduced form is unchanged by the choice of
causal channel.
Exam schools might differ in other ways as well, perhaps attracting
better teachers or offering more Advanced Placement (college-level)
courses than nonselective public schools. Importantly, however, school
resources and other features of the school environment that might change
at exam school admissions cutoffs seem likely to be beneficial. This in
turn suggests that any omitted variables bias associated with 2SLS
estimates of exam school peer effects is positive. This claim echoes
that made in Chapter 2 regarding the likely direction of OVB in our
evaluation of selective colleges. Because omitted variables with
positive effects are probably positively correlated with exam school
offers, the 2SLS estimate using exam school qualification as an
instrument for peer quality is, if anything, too big relative to the
pure peer effect we’re after. All the more surprising, then, that this
estimate turns out to be zero.
As with any IV story, fuzzy RD requires tough judgments about the
causal channels through which instruments affect outcomes. In practice,
multiple channels might mediate causal effects, in which case we explore
alternatives. Likewise, the channels we measure readily need not be the
only ones that matter. The causal journey never ends; new questions
emerge continuously. But the fuzzy framework that uses RD to generate
instruments is no less useful for all that.

MASTER STEVEFU: Summarize RD for me, Grasshopper.
GRASSHOPPER: The RD design exploits abrupt changes in treatment status
that arise when treatment is determined by a cutoff.
MASTER STEVEFU: Is RD as good as a randomized trial?
GRASSHOPPER: RD requires us to know the relationship between the
running variable and potential outcomes in the absence of treatment. We
must control for this relationship when using discontinuities to
identify causal effects. Randomized trials require no such control.
MASTER STEVEFU: How can you know that your control strategy is
adequate?
GRASSHOPPER: One can’t be sure, Master. But our confidence in causal
conclusions increases when RD estimates remain similar as we change
details of the RD model.
MASTER STEVEFU: And sharp versus fuzzy?
GRASSHOPPER: Sharp is when treatment itself switches on or off at a
cutoff. Fuzzy is when the probability or intensity of treatment jumps.
In fuzzy designs, a dummy for clearing the cutoff becomes an instrument;
the fuzzy design is analyzed by 2SLS.
MASTER STEVEFU: You approach the threshold for mastery, Grasshopper.

Masters of ’Metrics: Donald Campbell

The RD story was first told by psychologists Donald L. Thistlethwaite
and Donald T. Campbell, who used RD in 1960 to evaluate the impact of
National Merit Scholarship awards on awardees’ careers and attitudes.⁸
As many of our readers will know, the American National Merit
Scholarship program is a multi-round process, at the end of which a few
thousand high-achieving high school seniors are awarded a college
scholarship. Selection is based on applicants’ scores on the PSAT and
SAT tests, the college entrance exams taken by most U.S. college
applicants.
Successful candidates in the National Merit competition have PSAT
scores above a cutoff (and have their PSAT scores validated by doing
well on the SAT, taken later). Among these, a few are awarded
scholarships by the National Merit screening committee, while the rest
get a Certificate of Merit. Students receiving this certificate, known
as National Merit finalists, are justifiably pleased: in recognition of
this accomplishment, their names are distributed to colleges,
universities, and to other scholarship sponsors. Colleges with many
National Merit finalists in their incoming classes also like to
advertise this fact. Thistlethwaite and Campbell asked whether
recognition as a National Merit finalist has any lasting consequences
for those so recognized.
In earlier work relying on matching methods (of the sort described in
Chapter 2), Thistlethwaite estimated that applicants who were awarded a
Certificate of Merit were 4 percentage points more likely to plan to
become college teachers or researchers than they otherwise would have
been.⁹ But an RD design exploiting discontinuities at the PSAT cutoff
for a Certificate of Merit generated a statistically insignificant
estimate of only about 2 points for this outcome. The plot that goes
with this finding is reproduced here as Figure 4.10. Public recognition
by itself seems to have little effect on career choice or plans for
graduate study.

FIGURE 4.10
Thistlethwaite and Campbell’s Visual RD

Notes: This figure plots PSAT test takers’ plans for graduate study
(line I–I′) and a measure of test takers’ career plans (line J–J′)
against the running variable that determines National Merit recognition.

Donald Campbell is remembered not just for inventing RD but also for
his 1963 essay, “Experimental and Quasi-Experimental Designs for
Research on Teaching,” written with Julian C. Stanley and later released
in book form. The Campbell and Stanley essay was a pioneering
exploration of the ’metrics methods discussed in this and the following
chapter of our book. A subsequent update written with Thomas D. Cook
remains an important reference to this day.¹⁰

1 Our MLDA discussion draws on Christopher Carpenter and Carlos Dobkin,
“The Effect of Alcohol Consumption on Mortality: Regression
Discontinuity Evidence from the Minimum Drinking Age,” American Economic
Journal: Applied Economics, vol. 1, no. 1, January 2009, pages 164–182,
and “The Minimum Legal Drinking Age and Public Health,” Journal of
Economic Perspectives, vol. 25, no. 2, Spring 2011, pages 133–156.

2 Nonparametric RD mavens typically estimate models like equation (4.2)
using weighted least squares. This is a procedure that puts the most
weight on observations right at the cutoff and less weight on
observations farther away. The weighting function used for this purpose
is called a kernel. The estimates in Table 4.1 implicitly use a uniform
kernel; that is, they weight observations inside the bandwidth equally.

3 See Guido W. Imbens and Karthik Kalyanaraman, “Optimal Bandwidth
Choice for the Regression Discontinuity Estimator,” Review of Economic
Studies, vol. 79, no. 3, July 2012, pages 933–959.

4 A comparison of parametric and nonparametric estimates appears in
Tables 4 and 5 of Carpenter and Dobkin, “The Effect of Alcohol
Consumption,” American Economic Journal: Applied Economics, 2009.
Sensitivity to choice of bandwidth is explored in their online appendix
(DOI: 10.1257/app.1.1.164). The 2009 study analyzes mortality by exact
day of birth, while here we work with monthly data.
5 The variable that determines admissions in these figures is a
weighted average of each applicant’s ISEE score and GPA, but we refer to
this running variable as the ISEE score for short. The dots here come
from a smoothing method known as local linear regression, which works by
fitting regressions to small samples defined by a bandwidth around each
point. Smoothed values are the fitted values generated by this
procedure. For details, see the study on which our discussion here is
based: Atila Abdulkadiroglu, Joshua D. Angrist, and Parag Pathak, “The
Elite Illusion: Achievement Effects at Boston and New York Exam
Schools,” Econometrica, vol. 81, no. 1, January 2014, pages 137–196.

6 This is documented in Abdulkadiroglu et al., “The Elite Illusion,”
Econometrica, 2014.

7 This standard error is clustered by applicant. As explained in the
appendix to Chapter 5, we use clustered standard errors to adjust for
the fact that the data contain correlated observations (in this case,
the seventh- and eighth-grade test scores for each BLS applicant are
correlated).
8 Donald L. Thistlethwaite and Donald T. Campbell,
“Regression-Discontinuity Analysis: An Alternative to the Ex Post Facto
Experiment,” Journal of Educational Psychology, vol. 51, no. 6, December
1960, pages 309–317.

9 Donald L. Thistlethwaite, “Effects of Social Recognition upon the
Educational Motivation of Talented Youths,” Journal of Educational
Psychology, vol. 50, no. 3, 1959, pages 111–116.

10 Donald T. Campbell and Julian C. Stanley, “Experimental and
Quasi-Experimental Designs for Research on Teaching,” Chapter 5 in
Nathaniel L. Gage (ed.), Handbook of Research on Teaching, Rand McNally,
1963; and Donald T. Campbell and Thomas D. Cook, Quasi-Experimentation:
Design and Analysis Issues for Field Settings, Houghton Mifflin, 1979.



Chapter 5

Differences-in-Differences

MASTER KAN: If while building a house, a carpenter strikes a nail and it proves faulty by bending, does the carpenter lose faith in all nails and stop building? So it is with empirical work.
Kung Fu, Season 1, Episode 7

Our Path

Credible instrumental variables and dramatic policy discontinuities can be hard to find; you’ll need other ’metrics tools in your kit too. The differences-in-differences (DD) method recognizes that in the absence of random assignment, treatment and control groups are likely to differ for many reasons. Sometimes, however, treatment and control outcomes move in parallel in the absence of treatment. When they do, the divergence of a post-treatment path from the trend established by a comparison group may signal a treatment effect. We demonstrate DD with a study of the effects of monetary policy on bank failures during the Great Depression. We also revisit the MLDA.

5.1 A Mississippi Experiment

On the eve of the largest economic downturn in American history, the Great Depression, economic spirits ran high in the halls of high finance. Caldwell and Company’s slogan “We Bank on the South” reflected the confidence of a regional financial empire. Based in Nashville, Caldwell ran the largest Southern banking chain in the 1920s, and owned many nonbanking businesses as well. Rogers Caldwell, known as the J. P. Morgan of the South, lived large on an estate that housed his stable of prize-winning thoroughbreds. Alas, in November of 1930, mismanagement and fallout from the stock market crash of October 1929 brought the Caldwell empire down. Within days, Caldwell’s collapse felled closely tied banking networks in Tennessee, Arkansas, Illinois, and North Carolina. The Caldwell crisis was a harbinger of a surge in bank failures across the country.

Banking is a business built on confidence and trust. Banks lend to businesses and property owners in the expectation that most loans will be paid off when they come due. Depositors trust they’ll be able to withdraw their funds on demand. This confidence notwithstanding, banks hold less cash than needed to pay all depositors, because most deposits are out on loan. The resulting maturity mismatch poses no problem in normal times, when few depositors make withdrawals on any given day.

If confidence falters, the banking system breaks down. In the 1930s, when your bank went out of business, your savings very likely disappeared with it. Even if your bank’s mortgage and loan portfolios looked safe, you wouldn’t have wanted to be the last depositor to try to get your money out. Once other depositors are seen to withdraw in panic, you’d do well to panic too. That’s how a bank run starts.

Caldwell’s demise shook depositor confidence throughout the American South and precipitated a run on Mississippi banks in December 1930. Deposits in Mississippi fell slowly at first, but on December 19, the floodgates opened when savers panicked. On that day, the Mississippi State Banking Department closed three banks. Two more banks ceased operations the day after, and another 29 folded in the next six months. The regional panic of 1930 was one of many more to come. In 1933, the year Depression-era bank failures peaked, more than 4,000 banks failed nationwide.



Economists have long sought to understand whether and how monetary policy contributed to the Great Depression, and whether more aggressive monetary intervention might have stemmed the financial collapse and economic free fall seen in those dark days. Depression-era lessons may help us understand the present. Although financial markets today are more sophisticated, the pillars of finance remain much as they were: banks borrow and lend, typically at different maturities, and bet on being able to raise the cash (known in banking jargon as “liquidity”) needed to cover liabilities as they come due.
We’re unlucky enough to live in economically interesting times. The year 2008 saw the U.S. financial system shaken by a collapse in the market for mortgage-backed securities, followed by a European sovereign debt crisis beginning in late 2009. Carmen Reinhart and Kenneth Rogoff have recently chronicled financial crises since the fourteenth century, arguing they share a common anatomy. The apparent similarity of such episodes makes you wonder whether they can be avoided, or at least whether their effects can be mitigated. In their masterful 1963 monetary history of the United States, Milton Friedman and Anna Schwartz convinced many economists that a proper understanding of the effects of monetary policy is the key to answering this question.1

One Mississippi, Two Mississippi

Policymakers facing a bank run can open the flow of credit or turn off the tap. Friedman and Schwartz argued that the Federal Reserve (America’s central bank) foolishly restricted credit as the Great Depression unfolded. Easy money might have allowed banks to meet increasingly urgent withdrawal demands, staving off depositor panic. By lending to troubled banks freely, the central bank has the power to stem a liquidity crisis and obviate the need for a bailout in the first place.
But who’s to say when a crisis is merely a crisis of confidence? Some crises are real. Bank balance sheets may be so sickened by bad debts that no amount of temporary liquidity support will cure ’em. After all, banks don’t lose their liquidity by random assignment. Rather, bank managers make loans that either fail or are fruitful. Injecting central bank funds into bad banks may throw good money after bad. Better in such cases to declare bankruptcy and hope for an orderly distribution of any remaining assets.
Support for bad banks also raises the specter of what economists call moral hazard. If bankers know that the central bank will lend cheaply when liquidity runs dry, they needn’t take care to avoid crises in the first place. In 1873, The Economist’s editor-in-chief Walter Bagehot described the danger this way:

If the banks are bad, they will certainly continue bad and will probably become worse if the Government sustains and encourages them. The cardinal maxim is, that any aid to a present bad Bank is the surest mode of preventing the establishment of a future good Bank.2

Bagehot was a professed Social Darwinist, believing that evolutionary principles applied in social affairs just as in biology. Which policy stance is more likely to speed a happy ending to an economic downturn, liquidity backstopping or survival of banking’s fittest? As always, masters of ’metrics would like to settle this question with a randomized trial. We have a grant proposal to fund such a bank liquidity experiment under review; we’ll surely blog the results if it comes through. In the meantime, we must learn about the effects of monetary policy from the history of banking crises and policy responses to them.
Fortunately for this research agenda, the U.S. Federal Reserve System is organized into 12 districts, each run by a regional Federal Reserve Bank. Depression-era heads of the regional Feds had considerable policy independence. The Atlanta Fed, running the Sixth District, favored lending to troubled banks. By contrast, the St. Louis Fed ran the Eighth District according to a philosophy known as the Real Bills Doctrine, which holds that the central bank should restrict credit in a recession. Especially happily for research on monetary policy, the border between the Sixth and Eighth Districts runs east-west smack through the middle of the state of Mississippi (District borders were determined by population size in 1913, at the birth of the Federal Reserve System). This border defines a within-state natural experiment from which we can profit.
Masters Gary Richardson and William Troost analyzed Mississippi’s monetary two-step.3 As we might expect from their differing approaches to monetary policy, the Atlanta and St. Louis Feds reacted very differently to the Caldwell crisis. Within 4 weeks of Caldwell’s collapse, the Atlanta Fed had increased bank lending by about 40% in the Sixth District. In the same period, bank lending by the St. Louis Fed in the Eighth District fell almost 10%.
The Richardson and Troost policy experiment imagines the Eighth District as a control group, where policy was to do little or even restrict lending, while the Sixth District is a treatment group, where policy was to increase lending. A first-line outcome is the number of banks still operating in each District on July 1, 1931, about 8 months after the beginning of the crisis. On that day, 132 banks were open in the Eighth District and 121 were open in the Sixth District, a deficit of 11 banks in the Sixth District. This suggests easy money was counterproductive. But look again: the Sixth and Eighth Districts were similar but not identical. We see this in the fact that the number of banks operating differed markedly across the two districts on July 1, 1930, well before the Caldwell crisis, with 135 banks open in the Sixth District and 165 banks open in the Eighth. To adjust for this difference across districts in the pre-treatment period, we analyze the Mississippi experiment using a tool called differences-in-differences, or DD for short.

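Parallel Worlds

Let $Y_{dt}$ denote the number of banks open in District $d$ in year $t$, where the subscript $d$ tells us whether we’re looking at data from the Sixth or Eighth District and the subscript $t$ tells us whether we’re looking at data from 1930 (before the Caldwell crisis) or 1931 (after). The DD estimate ($\delta_{DD}$) of the effect of easy money in the Sixth District is

$$\delta_{DD} = (Y_{6,1931} - Y_{6,1930}) - (Y_{8,1931} - Y_{8,1930}) = (121 - 135) - (132 - 165) = -14 - (-33) = 19. \tag{5.1}$$

Instead of comparing the number of banks open in the Sixth and Eighth Districts after Caldwell, DD contrasts the change in the number of banks operating in the two districts.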
Comparing changes instead of levels adjusts for the fact that in the pre-treatment period, the Eighth District had more banks open than the Sixth. To see this, note that we can produce the same DD bottom line this way:

$$\delta_{DD} = (Y_{6,1931} - Y_{8,1931}) - (Y_{6,1930} - Y_{8,1930}) = (121 - 132) - (135 - 165) = -11 - (-30) = 19. \tag{5.2}$$

This version of the DD calculation subtracts the pre-treatment difference between the Sixth and Eighth Districts from the post-treatment difference, thereby adjusting for the fact that the two districts weren’t the same initially. DD estimates suggest that lending to troubled banks kept many of them open. Specifically, the Atlanta Fed appears to have saved 19 banks, more than 10% of those operating in Mississippi’s Sixth District in 1930.
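
The two versions of the four-number DD calculation described above can be checked in a few lines. The sketch below is an illustration added here, not code from the underlying study: it uses only the bank counts quoted in the text, and the dictionary layout and function names are hypothetical choices.

```python
# Mississippi banks open on July 1 of each year, as quoted in the text.
banks = {
    ("sixth", 1930): 135,
    ("sixth", 1931): 121,
    ("eighth", 1930): 165,
    ("eighth", 1931): 132,
}


def dd_changes(y):
    """Change in the Sixth District minus change in the Eighth District."""
    change_sixth = y[("sixth", 1931)] - y[("sixth", 1930)]
    change_eighth = y[("eighth", 1931)] - y[("eighth", 1930)]
    return change_sixth - change_eighth


def dd_levels(y):
    """Post-treatment gap between districts minus pre-treatment gap."""
    gap_1931 = y[("sixth", 1931)] - y[("eighth", 1931)]
    gap_1930 = y[("sixth", 1930)] - y[("eighth", 1930)]
    return gap_1931 - gap_1930


print(dd_changes(banks), dd_levels(banks))
```

Both functions rearrange the same four numbers, so they must agree; the difference is only in which subtraction is done first.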
DD logic is depicted in Figure 5.1, which plots the number of banks in the Sixth and Eighth Districts in 1930 and 1931, with data from the two periods connected by solid lines. Figure 5.1 highlights the fact that while banks failed in both Federal Reserve Districts, they did so much more sharply in the Eighth.

FIGURE 5.1
Bank failures in the Sixth and Eighth Federal Reserve Districts

Notes: This figure shows the number of banks in operation in Mississippi in the Sixth and Eighth Federal Reserve Districts in 1930 and 1931. The dashed line depicts the counterfactual evolution of the number of banks in the Sixth District if the same number of banks had failed in that district in this period as did in the Eighth.

The DD tool amounts to a comparison of slopes or trends across districts. The dashed line in Figure 5.1 is the counterfactual outcome that lies at the heart of the DD research design: this line tells us what would have happened in the Sixth District had everything evolved as it did in the Eighth. The fact that the solid line for the Sixth District declines much more gradually than this counterfactual line is evidence for the effectiveness of easy money. The gap of 19 banks uncovered by our DD calculation is the difference between what really happened and what would have happened had bank activity in the two districts unfolded in parallel.
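
The counterfactual line can also be written out as arithmetic. This is a minimal sketch under the common trends assumption, using only the four bank counts quoted earlier; the variable and function names are hypothetical.

```python
# Bank counts quoted in the text (July 1 of each year).
sixth_1930, sixth_1931 = 135, 121
eighth_1930, eighth_1931 = 165, 132


def counterfactual_sixth_1931():
    """Sixth District count in 1931 had it lost as many banks as the Eighth."""
    eighth_change = eighth_1931 - eighth_1930
    return sixth_1930 + eighth_change


def banks_saved():
    """Actual minus counterfactual number of Sixth District banks in 1931."""
    return sixth_1931 - counterfactual_sixth_1931()


print(counterfactual_sixth_1931(), banks_saved())
```

The endpoint of the counterfactual line is simply the Sixth District's 1930 level shifted by the Eighth District's change, and the distance from that endpoint to the observed 1931 value is the DD estimate.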
The DD counterfactual comes from a strong but easily stated assumption: common trends. In the Mississippi experiment, DD presumes that, absent any policy differences, the Eighth District trend is what we should have expected to see in the Sixth. Although strong, the common trends assumption seems like a reasonable starting point, one that takes account of pre-treatment differences in levels. With more data, the assumption can also be probed, tested, and relaxed.

FIGURE 5.2
Trends in bank failures in the Sixth and Eighth Federal Reserve Districts

Note: This figure shows the number of banks in operation in Mississippi in the Sixth and Eighth Federal Reserve Districts between 1929 and 1934.

Figure 5.2 provides evidence on the common trends assumption for Mississippi’s Federal Reserve Districts. The evidence comes in the form of a longer time series on bank activity. Before 1931, the Great Depression had not yet hit Mississippi hard. Regional Fed policies in the two districts were also similar in this more relaxed period. The fact that bank failures moved almost in parallel in the two districts between 1929 and 1930, with the number of banks declining slightly in both districts, is therefore consistent with the common trends hypothesis for untreated periods. Figure 5.3 adds the Sixth District counterfactual implied by extrapolating Eighth District trends to the Sixth District for years after 1930. The gap between actual and counterfactual Sixth District banking activity changed little through 1934.

FIGURE 5.3
Trends in bank failures in the Sixth and Eighth Federal Reserve Districts, and the Sixth District’s DD counterfactual

Notes: This figure adds DD counterfactual outcomes to the banking data plotted in Figure 5.2. The dashed line depicts the counterfactual evolution of the number of banks in the Sixth District if the same number of banks had failed in that district after 1930 as did in the Eighth.

As in Figure 5.1, the relatively steep fall-off in bank activity in the Eighth District after the Caldwell collapse emerges clearly in Figures 5.2 and 5.3. But these figures document something further. Beginning in July 1931, the St. Louis Fed abandoned tight money and started lending to troubled banks freely. In other words, after 1931, Federal Reserve policy in the two districts was again similar, with both regional Feds willing to provide liquidity with a free hand. Moreover, while the Depression was far from over in 1932, the Caldwell crisis had petered out and withdrawals had returned to pre-crisis levels. Given the two regional Feds’ common readiness to lend as the need arose, trends in bank activity should again have been common after 1931. The 1931–1934 data line up well with this hypothesis.

Just DDo It: A Depression Regression

The simplest DD calculation involves only four numbers, as in equations (5.1) and (5.2). In practice, however, the DD recipe is best cooked with regression models fit to samples of more than four data points, such as the 12 points plotted in Figure 5.2. In addition to allowing for more than two periods, regression DD neatly incorporates data on more than two cross-sectional units, as we’ll see in a multistate analysis of the MLDA in Section 5.2. Equally important, regression DD facilitates statistical inference, often a tricky matter in a DD setup (for details, see the appendix to this chapter).
The regression DD recipe associated with Figure 5.2 has three ingredients:

(i) A dummy for the treatment district, written $\mathrm{TREAT}_d$, where the subscript $d$ reminds us that this varies across districts; $\mathrm{TREAT}_d$ controls for fixed differences between the units being compared.

(ii) A dummy for post-treatment periods, written $\mathrm{POST}_t$, where the subscript $t$ reminds us that this varies over time; $\mathrm{POST}_t$ controls for the fact that conditions change over time for everyone, whether treated or not.

(iii) The interaction term, $\mathrm{TREAT}_d \times \mathrm{POST}_t$, generated by multiplying these two dummies; the coefficient on this term is the DD causal effect.

We think of the Caldwell-era experimental treatment as provision of easy credit in the face of a liquidity crisis, so $\mathrm{TREAT}_d$ equals one for data points from the Sixth District and zero otherwise. The bank failure rate slowed after 1931 as the Caldwell crisis subsided. In the 1930s, however, there were no zombie banks: dead banks were gone for good. The Caldwell-era failures resulted in fewer banks open in the years 1932–1934 as well, even though the St. Louis Fed had by then begun to lend freely. We therefore code $\mathrm{POST}_t$ to indicate all the observations from 1931 onward. Finally, the interaction term, $\mathrm{TREAT}_d \times \mathrm{POST}_t$, indicates observations in the Sixth District in the post-treatment period. More precisely, $\mathrm{TREAT}_d \times \mathrm{POST}_t$ indicates observations from the Sixth District in periods when the Atlanta Fed’s response to Caldwell mattered for the number of active banks.
Regression DD for the Mississippi experiment puts these pieces together by estimating

$$Y_{dt} = \alpha + \beta\,\mathrm{TREAT}_d + \gamma\,\mathrm{POST}_t + \delta_{rDD}\,(\mathrm{TREAT}_d \times \mathrm{POST}_t) + e_{dt} \tag{5.3}$$

in a sample of size 12. This sample is constructed by stacking observations from both districts and all available years (6 years for each district). The coefficient on the interaction term, $\delta_{rDD}$, is the causal effect of interest. With only two periods, as in Figure 5.1, estimates of $\delta_{DD}$ and $\delta_{rDD}$ coincide (a consequence of the properties of dummy variable regression outlined in the appendix to Chapter 2). With more than two periods, as in Figure 5.2, estimates based on equation (5.3) should be more precise and provide a more reliable picture of policy effects than the simple four-number DD recipe.4
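
The claim that the regression and four-number recipes coincide with two periods can be verified directly. The sketch below is an illustration, not the authors' code: it regresses the four bank counts quoted in the text on a constant, a treatment dummy, a post dummy, and their interaction, using NumPy least squares. The full 12-observation sample behind Figure 5.2 is not reproduced in the text, so it is not used here.

```python
import numpy as np

# (district, year, banks open) for the two-period Mississippi comparison.
rows = [
    ("sixth", 1930, 135),
    ("sixth", 1931, 121),
    ("eighth", 1930, 165),
    ("eighth", 1931, 132),
]

# TREAT = 1 for the Sixth District; POST = 1 from 1931 onward.
treat = np.array([1.0 if d == "sixth" else 0.0 for d, _, _ in rows])
post = np.array([1.0 if yr >= 1931 else 0.0 for _, yr, _ in rows])
y = np.array([float(n) for _, _, n in rows])

# Design matrix: constant, TREAT, POST, TREAT x POST.
X = np.column_stack([np.ones_like(y), treat, post, treat * post])
coef, *_ = np.linalg.lstsq(X, y, rcond=None)
alpha, beta, gamma, delta_rdd = coef

print(np.round(coef, 6))
```

With four observations and four coefficients the fit is exact: the constant is the Eighth District's 1930 count, the TREAT and POST coefficients are the pre-treatment gap and the Eighth District's change, and the interaction coefficient reproduces the four-number DD estimate.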

Fitting equation (5.3) to the 12 observations plotted in Figure 5.2 generates the following estimates (with standard errors shown in parentheses):

These results suggest that roughly 21 banks were kept alive by Sixth District lending. This estimate is close to the estimate of 19 banks saved using the four-number DD recipe. The standard error for the estimated $\delta_{rDD}$ is about 11, so 21 is a marginally significant result, the best we can hope for with such a small sample.



Let’s Get Real

The Atlanta Fed very likely saved many Sixth District banks from failure. But banks are not valued for their own sakes. Did the Atlanta Fed’s policy of easy money support real economic activity, that is, non-bank businesses and jobs? Statistics on business activity within states are scarce for this period. Still, the few numbers available suggest the Atlanta Fed’s bank liquidity backstopping generated real economic benefits. This is documented in Table 5.1, which lists the ingredients for a simple DD analysis of Federal Reserve liquidity effects on the number of active wholesalers and their sales.

TABLE 5.1
Wholesale firm failures and sales in 1929 and 1933

Notes: This table presents a DD analysis of Federal Reserve liquidity effects on the number of wholesale firms and the dollar value of their sales, paralleling the DD analysis of liquidity effects on bank activity in Figure 5.1.

DD estimates for Mississippi wholesalers parallel those for Mississippi banks. Between 1929 and 1933, the number of wholesale firms and their sales fell in both the Sixth and Eighth Districts, with a much sharper drop in the Eighth District, where more banks failed. In the 1920s and 1930s, wholesalers relied heavily on bank credit to finance inventories. The estimates in Table 5.1 suggest that the reduction in bank credit in the Eighth District in the wake of Caldwell brought wholesale business activity down as well, with a likely ripple effect throughout the local economy. Sixth District wholesalers were more likely to have been spared this fate. Cooked with only a four-number DD recipe, however, the evidence for a liquidity treatment effect in Table 5.1 is weaker than that produced by the larger sample for bank activity.
The Caldwell experiment offers a hard-won lesson in how to nip a banking crisis in the bud. Perhaps the governor of the St. Louis Fed, seeing a more modest collapse in the Sixth District than in the Eighth, had absorbed the Caldwell lesson by the time he reversed course in 1931. But the palliative power of monetary policy in a financial crisis was understood by national authorities only much later. In their memoirs, Milton Friedman and his wife Rose famously recounted:

Instead of using its powers to offset the Depression, [the Federal Reserve Board in Washington, D.C.] presided over a decline in the quantity of money by one-third from 1929 to 1933. If it had operated as its founders intended, it would have prevented that decline and, indeed, converted it into the rise that was called for to accommodate the normal growth in the economy.5

Which isn’t to say that the problem of financial crisis management has since been nailed. Today’s complex financial markets run off the rails for many reasons, not all of which can be contained by the Fed and its printing presses. That hard lesson is being learned by the monetary authorities of our day.

5.2 Drink, Drank, …

SHEN: Are you willing to die to find the truth?
PO: You bet I am! … Although, I’d prefer not to.
Kung Fu Panda 2

With the repeal of federal alcohol Prohibition in 1933, U.S. states were free to regulate alcohol. Most instituted an MLDA of 21, but Kansas, New York, and North Carolina, among others, allowed drinking at 18. Following the Twenty-sixth Amendment to the Constitution in 1971, which lowered the voting age to 18 in response to agitation sparked by the Vietnam War, many states reduced the MLDA. But not all: Arkansas, California, and Pennsylvania are among the states that held the line at 21. In 1984, the National Minimum Drinking Age Act punished youthful intemperance by withholding federal aid for highway construction from states with an age-18 MLDA. By 1988, all 50 states and the District of Columbia had opted for an MLDA of 21, though some had taken the federal highway hint more quickly than others.
As with much American policymaking, the interaction of federal and state law produces a colorful and oft-changing quilt of legal standards. This policy variation is a boon to masters of ’metrics: variation in state MLDA laws is easily exploited in a DD framework. In efforts to uncover effects of alcohol policy, this framework provides an alternative to the RD approach detailed in Chapter 4.6

Patterns from Patchwork

Alabama lowered its MLDA to 19 in 1975, but alphabetically and geographically proximate Arkansas has had an MLDA of 21 since Prohibition’s repeal. Did Alabama’s indulgence of its youthful drinkers cost some of them their lives? We tackle this question by fitting a regression DD model to data on the death rates of 18–20-year-olds from 1970 to 1983. The dependent variable is denoted $Y_{st}$, for death rates in state $s$ and year $t$. With a sample including only Alabama and Arkansas, the regression DD model for $Y_{st}$ takes the form

$$Y_{st} = \alpha + \beta\,\mathrm{TREAT}_s + \gamma\,\mathrm{POST}_t + \delta_{rDD}\,(\mathrm{TREAT}_s \times \mathrm{POST}_t) + e_{st}, \tag{5.4}$$

where $\mathrm{TREAT}_s$ is a dummy variable indicating Alabama, $\mathrm{POST}_t$ is a dummy indicating years from 1975 onward, and the interaction term $\mathrm{TREAT}_s \times \mathrm{POST}_t$ indicates Alabama observations from low-drinking-age years. The coefficient $\delta_{rDD}$ captures the effect of an age-19 MLDA on death rates.
Equation (5.4) parallels the regression DD model for Mississippi’s two Federal Reserve Districts. But why look only at Alabama and Arkansas? There’s more than one MLDA experiment in the legislative record. For example, Tennessee’s MLDA fell to 18 in 1971, then rose to 19 in 1979. A complicating but manageable consequence of differences in the timing of MLDA reductions in Alabama and Tennessee is the absence of a common post-treatment period. When combining multiple MLDA experiments in a DD framework, we swap the single $\mathrm{POST}_t$ dummy for a set of dummies indicating each year in the sample, with one omitted as a reference group. The coefficients on these dummies, known as time effects, capture temporal changes in death rates that are common to all states.7



Our multi-MLDA regression DD procedure should also reflect the fact that there are many states driving causal comparisons. Instead of controlling only for the difference between, say, the Sixth and Eighth Federal Reserve Districts as in the Mississippi experiment of Section 5.1, or the difference between Alabama and Arkansas in the example above, the multistate setup controls for the differing death rates in each of many states. This is accomplished by introducing state effects, a set of dummies for every state in the sample, except for one, which is omitted as a reference group. A regression DD analysis of data from Alabama, Arkansas, and Tennessee, for example, includes two state effects. State effects replace the single $\mathrm{TREAT}_s$ dummy included in a two-state (or two-group) analysis.
A final complication in this scenario is the absence of a common treatment variable that discretely switches off and on. The MLDA runs from age 18 to age 21, generating treatment effects for legal drinking at ages 18, 19, or 20. Masters of ’metrics simplify such things by reducing them to a single measure of exposure to the policy of interest, in this case, access to alcohol. Our simplification strategy replaces $\mathrm{TREAT}_s \times \mathrm{POST}_t$ with a variable we’ll call $\mathrm{LEGAL}_{st}$. This variable measures the proportion of 18–20-year-olds allowed to drink in state $s$ and year $t$. In some states, no one under 21 is allowed to drink, while in states with an age-19 MLDA, roughly two-thirds of 18–20-year-olds can drink, and in states with an age-18 MLDA, all 18–20-year-olds can drink. Our definition of $\mathrm{LEGAL}_{st}$ also captures variation due to within-year timing. For example, Alabama’s age-19 MLDA came into effect in July 1975. $\mathrm{LEGAL}_{AL,1975}$ is therefore scaled to reflect the fact that Alabama’s 19–20-year-olds were free to drink for only half that year.
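
A rough sketch of how an exposure measure of this kind might be built follows. It is not the authors' construction: it assumes the three ages 18, 19, and 20 are equally represented in the population and that a mid-year law change is blended linearly by the fraction of the year the new law was in force. The function names are hypothetical.

```python
def share_legal(mlda):
    """Share of 18-20-year-olds allowed to drink under a given MLDA.

    Assumes ages 18, 19, and 20 are equally represented (an approximation).
    """
    if mlda <= 18:
        return 1.0
    if mlda >= 21:
        return 0.0
    return (21 - mlda) / 3.0


def legal_st(old_mlda, new_mlda, fraction_of_year_under_new):
    """Blend the old and new legal shares by time spent under each law."""
    f = fraction_of_year_under_new
    return (1.0 - f) * share_legal(old_mlda) + f * share_legal(new_mlda)


# Alabama 1975: MLDA moved from 21 to 19 in July, so about half the year.
alabama_1975 = legal_st(old_mlda=21, new_mlda=19, fraction_of_year_under_new=0.5)
print(round(alabama_1975, 3))
```

Under these assumptions an age-19 MLDA in force all year gives a value of two-thirds, matching the text, and the half-year Alabama case gives half of that.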
The multistate regression DD model looks like

$$Y_{st} = \alpha + \delta_{rDD}\,\mathrm{LEGAL}_{st} + \sum_{k} \beta_k\,\mathrm{STATE}_{ks} + \sum_{j} \gamma_j\,\mathrm{YEAR}_{jt} + e_{st}. \tag{5.5}$$

Don’t let the big sums in this equation scare you. This notation describes models with many dummy variables compactly, just as in the models with college selectivity group dummies in Chapter 2. Here every state but one (the reference state) gets its own dummy variable, indexed by the subscript $k$ for state $k$. The index $s$ keeps track of the state supplying the observations. The $k$th state dummy, $\mathrm{STATE}_{ks}$, equals one when an observation is from state $k$, meaning $s = k$, and is zero otherwise. Observations from California, for example, have $\mathrm{STATE}_{CA,s}$ switched on, and all other state dummies switched off.
The state effects, $\beta_k$, are the coefficients on the state dummies. For example, the California state effect, $\beta_{CA}$, is the coefficient on $\mathrm{STATE}_{CA,s}$. Every state except the reference state, the one omitted when constructing state dummies, has a state effect in equation (5.5). Because there are so many of these, we use summation notation, $\sum_{k} \beta_k\,\mathrm{STATE}_{ks}$, to save writing them all out. The time effects, $\gamma_j$, are similarly coefficients on the year dummies, $\mathrm{YEAR}_{jt}$. These switch on when observations in the data come from year $j$, that is, when $t = j$. We therefore also call them year effects. The 1975 year effect, $\gamma_{1975}$, is the coefficient on $\mathrm{YEAR}_{1975,t}$. Here, too, every year in the sample except the reference year has a year effect, so we use summation notation to write these out compactly.8

Our multistate MLDA analysis uses a data set with 14 years and 51 states (including the District of Columbia), for a total of 714 observations. This data structure is called a state-year panel. The state effects in equation (5.5) control for fixed differences between states (for example, fatal car accidents are more frequent, on average, in rural states with high average travel speeds). The time (year) effects in this equation control for trends in death rates that are common to all states (due, for example, to national trends in drinking or vehicle safety). Equation (5.5) attributes changes in mortality within states to changes in $\mathrm{LEGAL}_{st}$. As we’ll see shortly, this causal attribution turns on a common trends assumption, just as in our analysis of Caldwell-induced bank failures in the previous section.
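
To make the mechanics of state and year effects concrete, here is a minimal sketch on a tiny made-up panel. Everything numeric in it is hypothetical: four states, six years, invented state and year effects, a LEGAL pattern chosen by hand, and a "true" effect of 11 planted in noise-free data so that least squares should recover it exactly. It is not the 714-observation data set used in the text.

```python
import numpy as np


def dummies(codes, n_levels):
    """One dummy per level except level 0, the omitted reference group."""
    return np.column_stack(
        [(codes == k).astype(float) for k in range(1, n_levels)]
    )


def two_way_dd(y, legal, state, year, n_states, n_years):
    """Coefficient on LEGAL from a regression with state and year effects."""
    X = np.column_stack([
        np.ones_like(y),
        legal,
        dummies(state, n_states),
        dummies(year, n_years),
    ])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


# Hypothetical noise-free state-year panel, stacked state by state.
n_states, n_years = 4, 6
state = np.repeat(np.arange(n_states), n_years)
year = np.tile(np.arange(n_years), n_states)
legal = np.array([
    [0, 0, 0, 0, 0, 0],              # never lowers its MLDA
    [0, 0, 0, 2 / 3, 2 / 3, 2 / 3],  # moves to an age-19 MLDA
    [1, 1, 1, 1, 0, 0],              # raises an age-18 MLDA to 21
    [0, 0, 0, 0, 0, 0],              # never lowers its MLDA
], dtype=float).ravel()

state_effect = np.array([60.0, 75.0, 55.0, 90.0])         # made up
year_effect = np.array([0.0, 2.0, 3.0, 1.0, -1.0, -4.0])  # made up
true_delta = 11.0                                         # made up
y = state_effect[state] + year_effect[year] + true_delta * legal

delta_hat = two_way_dd(y, legal, state, year, n_states, n_years)
print(round(delta_hat, 6))
```

Because the outcome is built to satisfy common trends exactly, the state dummies soak up level differences, the year dummies soak up the shared time pattern, and what is left identifies the planted effect.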
Estimates of $\delta_{rDD}$ in equation (5.5) suggest that legal alcohol access caused about 11 additional deaths per 100,000 18–20-year-olds, of which seven or eight deaths were the result of motor vehicle accidents. These results, reported in the first column of Table 5.2, are somewhat larger than but still broadly consistent with the RD estimates reported in Table 4.1 in Chapter 4. The MVA estimates in Table 5.2 are also reasonably precise, with standard errors of about 2.5. Importantly, as with the RD estimates, this regression DD model generates little evidence of an effect of legal drinking on deaths from internal causes. The regression DD evidence for an effect on suicide is weaker than the corresponding RD evidence in Table 4.1. At the same time, both strategies suggest any increase in numbers of suicides is smaller than for MVA deaths.

TABLE 5.2
Regression DD estimates of MLDA effects on death rates

Notes: This table reports regression DD estimates of minimum legal drinking age (MLDA) effects on the death rates (per 100,000) of 18–20-year-olds. The table shows coefficients on the proportion of legal drinkers by state and year from models controlling for state and year effects. The models used to construct the estimates in columns (2) and (4) include state-specific linear time trends. Columns (3) and (4) show weighted least squares estimates, weighting by state population. The sample size is 714. Standard errors are reported in parentheses.



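Probing DD Assumptions

Samples that include many states and years allow us to relax the common trends assumption, that is, to introduce a degree of nonparallel evolution in outcomes between states in the absence of a treatment effect. A regression DD model with controls for state-specific trends looks like

$$Y_{st} = \alpha + \delta_{rDD}\,\mathrm{LEGAL}_{st} + \sum_{k} \beta_k\,\mathrm{STATE}_{ks} + \sum_{j} \gamma_j\,\mathrm{YEAR}_{jt} + \sum_{k} \theta_k\,(\mathrm{STATE}_{ks} \times t) + e_{st}. \tag{5.6}$$

This model presumes that in the absence of a treatment effect, death rates in state $k$ deviate from common year effects by following the linear trend captured by the coefficient $\theta_k$.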
Heretofore and hitherto we’ve been sayin’ that DD is all about common trends. How can it be, then, that we’re now entertaining models like equation (5.6), which relax the key common trends assumption? To see how such models work, consider a sample of two states: The first, Allatsea, reduced the MLDA to 18 in 1975, while neighboring Alabaster held the line at 21. As a baseline, Figure 5.4 sketches the common trends story. Deaths per 100,000 move in parallel until 1975 (most things got worse in the 1970s, so we show death rates increasing). Death rates also jump above trend in Allatsea in 1975, when that state lowered its MLDA. Given the parallelism and the timing, it seems fair to blame Allatsea’s lower MLDA for this jump.
Figure 5.5 sketches a scenario with a steeper trend in Allatsea than in Alabaster. As with the data plotted in the previous figure, simple regression DD estimation in this case generates estimates implicating the MLDA (the post-minus-pre contrast in Allatsea is larger than the post-minus-pre contrast in Alabaster). In this case, however, the resulting DD estimate is spurious: the difference in state trends predates Allatsea’s MLDA liberalization and must therefore be unrelated to it.
Luckily, such differences in trend can be captured by the state-specific trend parameters, $\theta_k$, in equation (5.6). In models that control for state-specific trends, evidence for MLDA effects comes from sharp deviations from otherwise smooth trends, even where the trends are not common. Figure 5.6 shows how regression DD captures treatment effects in the face of uncommon trends. Death rates in Allatsea increase more steeply than in Alabaster throughout the sample period. But the Allatsea increase is especially steep from 1974 to 1975, when Allatsea lowered its MLDA. The coefficient on $\mathrm{LEGAL}_{st}$ in equation (5.6) picks this up, while the model allows for the fact that death rates in different states were on different trajectories from the get-go.
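
The Allatsea and Alabaster stories can be mimicked with made-up numbers. In the sketch below, every level, slope, and effect size is hypothetical: Alabaster's death rate rises by 1 per year, Allatsea's by 2 per year, and Allatsea's rate jumps by a chosen amount from 1975 onward. The simple DD regression and a regression with year effects plus an Allatsea-specific linear trend are then compared on noise-free data.

```python
import numpy as np

YEARS = np.arange(1970, 1984)  # 14 years, 1970 through 1983


def make_panel(true_effect):
    """Stack Allatsea (treated) on top of Alabaster (control)."""
    t = (YEARS - 1970).astype(float)
    legal = (YEARS >= 1975).astype(float)  # low MLDA from 1975 onward
    allatsea = 55.0 + 2.0 * t + true_effect * legal  # steeper trend
    alabaster = 50.0 + 1.0 * t
    y = np.concatenate([allatsea, alabaster])
    treat = np.concatenate([np.ones_like(t), np.zeros_like(t)])
    post = np.concatenate([legal, legal])
    trend = np.concatenate([t, t])
    return y, treat, post, trend


def simple_dd(y, treat, post):
    """Interaction coefficient from the TREAT, POST, TREAT x POST model."""
    X = np.column_stack([np.ones_like(y), treat, post, treat * post])
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[3]


def dd_with_state_trend(y, treat, post, trend):
    """LEGAL coefficient with a state effect, year effects, and a state trend."""
    year_dummies = np.column_stack(
        [(trend == j).astype(float) for j in range(1, len(YEARS))]
    )
    X = np.column_stack(
        [np.ones_like(y), treat * post, treat, year_dummies, treat * trend]
    )
    coef, *_ = np.linalg.lstsq(X, y, rcond=None)
    return coef[1]


# No true effect: simple DD is fooled by the trend gap, the trend model is not.
y0, treat, post, trend = make_panel(true_effect=0.0)
spurious = simple_dd(y0, treat, post)
trend_adjusted_null = dd_with_state_trend(y0, treat, post, trend)

# A true jump of 5: the trend model recovers it, simple DD overstates it.
y5, treat, post, trend = make_panel(true_effect=5.0)
overstated = simple_dd(y5, treat, post)
trend_adjusted_real = dd_with_state_trend(y5, treat, post, trend)

print(spurious, trend_adjusted_null, overstated, trend_adjusted_real)
```

In this construction the simple DD estimate picks up the difference in slopes multiplied by the gap between average post and pre years, while the trend-adjusted model isolates the break at 1975. Real data are noisier, so the clean separation seen here should be read as a best case.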

FIGURE 5.4
An MLDA effect in states with parallel trends

FIGURE 5.5
A spurious MLDA effect in states where trends are not parallel

FIGURE 5.6
A real MLDA effect, visible even though trends are not parallel

Models with state-specific linear trends provide an important check on the causal interpretation of any set of regression DD estimates using multiperiod data. In practice, however, empirical reality may be considerably mushier and harder to interpret than the stylized examples laid out in Figures 5.4–5.6. The findings generated by a regression model like equation (5.6) are often imprecise. The sharper the deviation from trend induced by a causal effect, the more likely we are to be able to uncover it. On the other hand, if treatment effects emerge only gradually, estimates of equations like (5.6) may fail to distinguish treatment effects from differential trends, with the end result being an imprecise and therefore inconclusive set of findings.
Happily for a coherent causal DD analysis of MLDA effects, introduction of state-specific trends has little effect on our regression DD estimates. This can be seen in column (2) of Table 5.2, which reports regression DD estimates of MLDA effects from the model described by equation (5.6). The addition of trends increases standard errors a little, but the loss of precision here is modest. The findings in column (2) support a causal interpretation of the more precise MLDA effects reported in column (1) of the table.
State policymaking is a messy business, with frequent changes on many fronts. DD estimates of MLDA effects, with or without state-specific trends, may be biased by contemporaneous policy changes in other areas. An important consideration in research on alcohol, for example, is the price of a drink. Taxes are the most powerful tool the government uses to affect the price of your favorite beverage. Many states levy a heavy tax on beer, which we measure in dollars per gallon of alcohol content. Beer taxes range from just pennies per gallon to more than a dollar per gallon in some Southern states. Beer taxes change from time to time, mostly increasing, much to the dismay of the Beer Institute (with a tax rate of 2 cents per gallon since 1935, Wyoming is beer bliss). It stands to reason that states might raise tax rates at the same time that they increase their MLDA, perhaps as a part of a broader effort to reduce drinking. If so, we should control for time-varying state tax rates when estimating MLDA effects.
Regression DD models that include controls for state beer taxes generate MLDA estimates similar to those without such controls. This can be seen in Table 5.3, which reports both the estimated coefficients on $\mathrm{LEGAL}_{st}$ and the estimated coefficients on state beer taxes in models for the four death rates examined in Table 5.2. Columns (1) and (2) of Table 5.3 show beer tax and MLDA effects estimated using a single regression without controls for state-specific trends, while those in columns (3) and (4) come from another regression including controls for state-specific trends. Beer tax effects are estimated less precisely than MLDA effects, most likely because beer taxes change less often than the MLDA. The beer tax estimates from models that include state trends are especially noisy. Still, the Beer Institute will be pleased to learn that these results don’t speak in favor of further beer tax increases. We’re likewise pleased to know that our MLDA estimates are robust to the inclusion of a beer tax control; we’ll share a beer to celebrate!

TABLE 5.3
Regression DD estimates of MLDA effects controlling for beer taxes

Notes: This table reports regression DD estimates of minimum legal drinking age (MLDA) effects on the death rates (per 100,000) of 18–20-year-olds, controlling for state beer taxes. The table shows coefficients on the proportion of legal drinkers by state and year and the beer tax by state and year, from models controlling for state and year effects. The fraction legal and beer tax variables are included in a single regression model, estimated without trends to produce the estimates in columns (1) and (2) and estimated with state-specific linear trends to produce the estimates in columns (3) and (4). The sample size is 700. Standard errors are reported in parentheses.

What Are You Weighting For?
The estimates of equations (5.5) and (5.6) in columns (1) and (2) of Table 5.2 give all observations equal weight, as if data from each state were equally valuable. States are not created equal, however, in at least one important respect: some, like Texas and California, are bigger than most countries, while others, like Vermont and Wyoming, have populations smaller than those of many American cities. We may prefer estimates that reflect this fact by giving more populous states more weight. The regression procedure that does this is called weighted least squares (WLS). The standard OLS estimator fits a line by minimizing the sample average of squared residuals, with each squared residual getting equal weight in the sum.⁹ Just as the name suggests, WLS weights each term in the residual sum of squares by population size or some other researcher-chosen weight.
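To make the contrast between OLS and WLS concrete, here is a minimal sketch in Python with NumPy. It is an illustration under stated assumptions, not the procedure used for Table 5.2: the data are simulated, the "populations" are made up, and the helper name `wls` is ours. The sketch relies on a standard fact: scaling each observation by the square root of its weight turns the weighted problem into an ordinary least squares problem.

```python
import numpy as np

def wls(X, y, w):
    """Weighted least squares: minimize sum_i w_i * (y_i - x_i'b)^2."""
    sw = np.sqrt(w)
    # Scaling each row by sqrt(w_i) turns the weighted problem into plain OLS.
    coef, *_ = np.linalg.lstsq(X * sw[:, None], y * sw, rcond=None)
    return coef

# Hypothetical cells: a constant and one regressor (all values simulated).
rng = np.random.default_rng(0)
n = 50
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = 1.0 + 2.0 * x + rng.normal(size=n)
pop = rng.uniform(0.5, 30.0, size=n)  # made-up populations, in millions

b_ols = wls(X, y, np.ones(n))  # equal weights reproduce OLS
b_wls = wls(X, y, pop)         # population weights

print("OLS:", b_ols)
print("WLS:", b_wls)
```

Setting every weight to one reproduces OLS, which is a handy check that the weighting step does what it claims.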
Population weighting has two consequences. First, as noted in Chapter 2, regression models of treatment effects capture a weighted average of effects for the groups or cells represented in our data. In a state-year panel, these groups are states. OLS estimates of models for state-year panels produce estimates of average causal effects that ignore population size, so the resulting estimates are averages over states, not over people. Population weighting generates a people-weighted average, in which causal effects for states like Texas get more weight than those for states like Vermont. People-weighting may sound appealing, but it need not be. The typical citizen is more likely to live in Texas than Vermont, but changes in the Vermont MLDA provide variation that may be just as useful as changes in Texas. You should hope, therefore, that regression estimates from your state-year panel are not highly sensitive to weighting.

Population weighting may also increase the precision of regression estimates. With far fewer drivers in Vermont than in Texas, MVA death rates in Vermont are likely to be more variable from year to year than those in Texas (this reflects the sampling variation discussed in the appendix to Chapter 1). In a statistical sense, the data from Texas are more reliable and therefore, perhaps, worthy of higher weight. Here too, however, the case for weighting is not open and shut. As a matter of econometric theory, masters of ’metrics can claim that weighted estimates are more precise than unweighted estimates only when a number of restrictive technical conditions are met.¹⁰ Once again, the best scenario is a set of findings (that is, estimates and standard errors) that are reasonably insensitive to weighting.

Columns (3) and (4) in Table 5.2 report WLS estimates of equations (5.5) and (5.6). These correspond to the OLS estimates shown in columns (1) and (2) of the table, but the WLS estimator weights each observation by state population aged 18–20. Happily for our understanding of MLDA effects, weighting here matters little. It would seem once again that teetotaling masters have been rewarded for their temperance.

MASTER STEVEFU: Wrap it up for me, Grasshopper.
GRASSHOPPER: Treatment and control groups may differ in the absence of treatment, yet move in parallel. This pattern opens the door to DD estimation of causal effects.
MASTER STEVEFU: Why is DD better than simple two-group comparisons?
GRASSHOPPER: Comparing changes instead of levels, we eliminate fixed differences between groups that might otherwise generate omitted variables bias.
MASTER STEVEFU: How is DD executed with multiple comparison groups and multiple years?
GRASSHOPPER: I have seen the power and flexibility of regression DD, Master. In a state-year panel, for example, with time-varying state policies like the MLDA, we need only control for state and year effects.
MASTER STEVEFU: On what does the fate of DD estimates turn?
GRASSHOPPER: Parallel trends, the claim that in the absence of treatment, treatment and control group outcomes would indeed move in parallel. DD lives and dies by this. Though we can allow for state-specific linear trends when a panel is long enough, masters hope for results that are unchanged by their inclusion.
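Grasshopper's recipe, a regression with state and year effects, can be sketched in a few lines of Python. Everything here is hypothetical: the panel is simulated, the policy timing is invented, and the outcome is built with no noise so that parallel trends hold exactly and the regression returns the assumed effect. Real data will not be this kind.

```python
import numpy as np

n_states, n_years = 6, 8
true_effect = -2.0  # hypothetical treatment effect

rng = np.random.default_rng(1)
state_fx = rng.normal(size=n_states)  # fixed state differences
year_fx = rng.normal(size=n_years)    # common year shocks

s_idx = np.repeat(np.arange(n_states), n_years)
t_idx = np.tile(np.arange(n_years), n_states)
# Invented policy timing: states 0-2 are treated from year 4 on; others never.
treated = ((s_idx < 3) & (t_idx >= 4)).astype(float)

# Outcome obeys parallel trends exactly (no noise added).
y = state_fx[s_idx] + year_fx[t_idx] + true_effect * treated

# State and year dummies, omitting one of each to avoid collinearity
# with the constant.
state_d = (s_idx[:, None] == np.arange(1, n_states)).astype(float)
year_d = (t_idx[:, None] == np.arange(1, n_years)).astype(float)
X = np.column_stack([np.ones(y.size), treated, state_d, year_d])

coef, *_ = np.linalg.lstsq(X, y, rcond=None)
dd_estimate = coef[1]
print("regression DD estimate:", dd_estimate)
```

Adding noise, or a state-specific trend that violates parallel trends, is a useful exercise for seeing when the estimate drifts away from the assumed effect.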

Masters of ’Metrics: John Snow

British physician John Snow was one of the fathers of modern epidemiology, the study of how illness moves through a population. Studying an outbreak of cholera in London in 1849, Snow challenged the conventional wisdom that the disease is caused by bad air. He thought cholera might be caused by bad water instead, an idea he first laid out in his 1849 essay On the Mode of Communication of Cholera.

A further cholera outbreak in 1853 and 1854 claimed many lives in the London neighborhood of Soho. Snow attributed the Soho epidemic to water from a pump on Broad Street. Not afraid to give a natural experiment a helping hand, he convinced the local parish council to remove the handle of the Broad Street pump. Cholera deaths in Soho subsided soon after, though Snow noted that death rates in his Broad Street treatment zone were already declining, and that this made the data from his natural experiment hard to interpret. DD was as fickle at birth as it is today.

Snow was a meticulous data grubber, setting a standard we still aspire to meet. In an 1855 revision of his essay, Snow reported death rates by district and water source for various parts of London. He noted that many of the high-death-rate districts in South London were supplied by one of two companies, the Southwark and Vauxhall Company or the Lambeth Company. In 1849, both companies drew water from the contaminated Thames in central London. Starting in 1852, however, the Lambeth Company drew from the river at Thames Ditton, an uncontaminated water source upstream. Snow showed that between 1849 and 1854 deaths from cholera fell in the area supplied by the Lambeth Company but rose in that supplied by the Southwark and Vauxhall Company. Our Figure 5.7 reproduces Table 12 from Snow’s 1855 essay.¹¹ This table contains the ingredients for Snow’s two-period DD analysis of death rates by water source.

Appendix: Standard Errors for Regression DD

Regression DD is a special case of estimation with panel data. A state-year panel consists of repeated observations on states over time. The repetitive structure of such data sets raises special statistical problems. Economic data of this sort typically exhibit a property called serial correlation (that’s serial as in “murder,” not “breakfast”). Serially correlated data are persistent, meaning the values of variables for nearby periods are likely to be similar.

We expect serial correlation in time series data like annual unemployment rates. When a state’s unemployment rate is higher than average in one year, it’s likely to be higher than average in the next. Because panel data sets combine repeated observations for individual states (in our MLDA example) or regions (in our Mississippi experiment), such data are often serially correlated. When the dependent variable in a regression is serially correlated, the residuals from any regression model explaining this variable are often serially correlated as well. A combination of serially correlated residuals and serially correlated regressors changes the formula required to calculate standard errors.

If we ignore serial correlation and use the simple standard error formula, equation (2.15), the resulting statistical conclusions are likely to be misleading. The penalty for ignoring serial correlation is that you exaggerate the precision of regression estimates. This is because the sampling theory for regression inference laid out in the appendix to Chapter 1 presumes the data at hand come from random samples. Serial correlation is a deviation from randomness, with the important consequence that each new observation in a serially correlated time series contains less information than would be the case if the sample were random.

FIGURE 5.7
John Snow’s DD recipe



Just as the robust standard errors discussed in the appendix to Chapter 1 correct for heteroskedasticity, there’s a modified standard error formula that answers the serial correlation challenge. The appropriate formula in this case is known as a clustered standard error. The formula for clustered standard errors is more complicated than the formula for robust standard errors given in equation (2.16); we won’t ask you to learn it for the test. The important thing is that clustering (an option in most regression software) allows for correlated data within researcher-defined clusters. In contrast with the assumption that all data are randomly sampled, the formula for clustered standard errors requires only that clusters be sampled randomly, with no random sampling assumption invoked for what’s inside them.
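For readers curious about what the clustering option does under the hood, the following Python sketch implements one common textbook version of the cluster-robust "sandwich" formula. It is offered as an assumption-laden illustration: the function name `ols_clustered` is ours, the panel is simulated with a state-level component that persists across years, and the code omits the finite-sample adjustments that regression packages commonly apply, so its numbers need not match any particular software exactly.

```python
import numpy as np

def ols_clustered(X, y, cluster):
    """OLS coefficients with cluster-robust standard errors
    (basic sandwich formula, no small-sample correction)."""
    XtX_inv = np.linalg.inv(X.T @ X)
    beta = XtX_inv @ (X.T @ y)
    resid = y - X @ beta
    k = X.shape[1]
    meat = np.zeros((k, k))
    for g in np.unique(cluster):
        rows = cluster == g
        # Sum of regressor-times-residual within cluster g; squaring the sum
        # lets residuals in the same cluster be correlated.
        score = X[rows].T @ resid[rows]
        meat += np.outer(score, score)
    cov = XtX_inv @ meat @ XtX_inv
    return beta, np.sqrt(np.diag(cov))

# Simulated state-year panel: regressor and error both contain a
# state-level component that persists across years.
rng = np.random.default_rng(5)
n_states, n_years = 40, 10
state = np.repeat(np.arange(n_states), n_years)
x = np.repeat(rng.normal(size=n_states), n_years) + 0.3 * rng.normal(size=state.size)
u = np.repeat(rng.normal(size=n_states), n_years) + 0.3 * rng.normal(size=state.size)
y = 1.0 + 0.5 * x + u
X = np.column_stack([np.ones(state.size), x])

beta, se_cluster = ols_clustered(X, y, state)
# Treating every observation as its own cluster gives the usual
# heteroskedasticity-robust standard errors.
_, se_robust = ols_clustered(X, y, np.arange(state.size))

print("slope:", beta[1])
print("clustered SE:", se_cluster[1], " robust SE:", se_robust[1])
```

With persistent data like these, the clustered standard error will often come out noticeably larger than the robust one, which is the exaggerated precision the text warns about.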
In the MLDA example discussed in this chapter, states are clusters. Often, it’s individual people who appear in our samples repeatedly. Participants in the RAND HIE contributed up to five annual observations on their health-care use in the sample used to construct Table 1.4, and children appear in two separate grades in the sample used to estimate the peer effects model, equation (4.9). In these examples, we adjust for the fact that repeated outcomes for the same person tend to be correlated by clustering on individual.

In the Mississippi experiment, clusters are Federal Reserve Districts. There are only two of these, an important caution. Serial correlation might not be a problem in the Mississippi experiment, but if it is, we’ll need more data before we can say anything conclusive about the effects of liquidity on bank survival. Once you start clustering, the formal theory behind statistical inference presumes you have many clusters instead of (or in addition to) many individual observations within clusters. In practice, “many” might be only a few dozen, as with American states. That’s probably OK, but a pair or a handful of clusters may not be enough.¹²

Clustered standard errors are appropriate for a wide variety of settings, not only for panel data. In principle, clustering solves any sort of dependence problem in your data (though you might not like the large standard errors that result). For example, data from achievement tests taken by schoolchildren are likely to be correlated within classrooms if children in the same classes share a teacher and have similar family backgrounds. When reporting estimates of the effects of educational interventions like peer effects in equation (4.6) or the effects of private universities in Chapter 2, masters cluster their standard errors on class, school, or university.




1 Carmen Reinhart and Kenneth Rogoff, This Time Is Different: Eight Centuries of Financial Folly, Princeton University Press, 2009; and Milton Friedman and Anna Schwartz, A Monetary History of the United States, 1867–1960, Princeton University Press, 1963.
2 From Chapter IV.4 in Walter Bagehot, Lombard Street: A Description of the Money Market, Henry S. King and Co., 1873.
3 Gary Richardson and William Troost, “Monetary Intervention Mitigated Banking Panics during the Great Depression: Quasi-Experimental Evidence from a Federal Reserve District Border, 1929–1933,” Journal of Political Economy, vol. 117, no. 6, December 2009, pages 1031–1073. Numbers in this section are our tabulations from the Richardson and Troost data.
4 In fact, as we explain in the chapter appendix, it’s hard to gauge the precision of a DD estimate constructed from only two cross-sectional units and two periods.
5 Milton Friedman and Rose D. Friedman, Two Lucky People: Memoirs, University of Chicago Press, 1998, page 233.
6 Carpenter and Dobkin, “The Minimum Legal Drinking Age,” Journal of Economic Perspectives, 2011, analyzed the MLDA in a DD framework.
7 We include one less time effect than there are years in our data. Time effects measure temporal changes relative to a starting point, usually the first year in the sample.
8 Here’s another way to see how the notation works. Consider an observation for s = NY. Then we have

so the sum of all possible state dummies picks up the New York state effect, \(\beta_{NY}\), when observations are from New York. All the other dummies in the sum are zero. Likewise, if t = 1980, then we have

so the sum picks up the 1980 year effect when observations are from 1980.
9 Regression residuals, defined in the appendix to Chapter 2, are the differences between the fitted values generated by the model we’re estimating and the dependent variable in this model.
10 One requirement is that the underlying CEF be linear. The appendix to Chapter 2 notes, however, that many regression models are only linear approximations to the CEF.
11 John Snow, On the Mode of Communication of Cholera, John Churchill, second edition, 1855.
12 For a more detailed discussion of this point, see our book, Mostly Harmless Econometrics, Princeton University Press, 2009. In an analysis of hundreds of counties on either side of Federal Reserve District borders, Andrew Jalil adds clusters to the Mississippi experiment. See “Monetary Intervention Really Did Mitigate Banking Panics during the Great Depression: Evidence along the Atlanta Federal Reserve District Border,” Journal of Economic History, vol. 74, no. 1, March 2014, pages 259–273.



Chapter 6

The Wages of Schooling

Legend tells of a legendary econometrician whose econometric skills were the stuff of legend.

Masters at Work

This chapter completes our exploration of paths from cause to effect with a multifaceted investigation of the causal effect of schooling on wages. Good questions are the foundation of our work, and the question of whether increased education really increases earnings is a classic. Masters have tackled the schooling question with all tools in hand, except, ironically, random assignment. The answers they’ve fashioned are no less interesting for being incomplete.

6.1 Schooling, Experience, and Earnings

British World War II veteran Bertie Gladwin dropped out of secondary school at age 14, though he still found work as a radio communication engineer in the British intelligence service. In his sixties, Bertie returned to school, completing a BA in psychology. Later, Bertie earned a BSc in microbiology, before embarking on a Master’s degree in military intelligence, completed at the age of 91. Bertie has since been considering study for a PhD.¹

It’s never too late to learn something new. Unlike Bertie Gladwin, however, most students complete their studies before establishing a career. College students spend years buried in books and tuition bills, while many of their high school friends who didn’t go to college may have started work and gained a measure of financial independence. In return for the time-consuming toil and expense of college, college graduates hope to be rewarded with higher earnings down the road. Hopes and dreams are one thing; life follows many paths. Are the forgone earnings and tuition costs associated with a college degree worthwhile? That’s a million dollar question, and our interest in it is more than personal. Taxpayers subsidize college attendance for students around the world, a policy motivated in part by the view that college is the key to economic success.

Economists call the causal effect of education on earnings the returns to schooling. This term invokes the notion that schooling is an investment in human capital, with a monetary payoff similar to that of a financial investment. Generations of masters have estimated the economic returns to schooling. Their efforts illustrate four of our tools: regression, DD, IV, and RD.
’Metrics master Jacob Mincer pioneered efforts to quantify the return to schooling using regression.² Working with U.S. census data, Mincer ran regressions like

\[ \ln Y_i = \alpha + \rho S_i + \beta_1 X_i + \beta_2 X_i^2 + e_i, \tag{6.1} \]

where \(\ln Y_i\) is the log annual earnings of man \(i\), \(S_i\) is his schooling (measured as years spent studying), and \(X_i\) is his years of work experience. Mincer defined the latter as age minus years of schooling minus 6, a calculation that counts all years since graduation as years of work. Masters call \(X_i\) calculated in this way potential experience. It’s customary to control for a quadratic function of potential experience to allow for the fact that, although earnings increase with experience, they do so at a decreasing rate, eventually flattening out in middle age.
Mincer’s estimates of equation (6.1) for a sample of about 31,000 nonfarm white men in the 1960 Census look like

With no controls, \(\rho\) = .07. This estimate comes from a model built with logs, so \(\rho\) = .07 implies average earnings rise by about 7% with each additional year of schooling (the appendix to Chapter 2 discusses regression models with logs on the left-hand side). With potential experience included as a control variable, the estimated returns increase to about .11.
The model with potential experience controls for the fact that those with more schooling typically have fewer years of work experience, since educated men usually start full-time work later (that is, after their schooling is completed). Because \(S_i\) and \(X_i\) are negatively correlated, the OVB formula tells us that omitting experience, which has a positive effect on earnings, leads to a lower estimate of the returns to schooling than we can expect in long regressions that include experience controls. Mincer’s estimates imply that white men with a given level of experience enjoy an 11% earnings advantage for each additional year of education. It remains to be seen, however, whether this is a causal effect.³

Of Singers, Fencers, and PhDs: Ability Bias
Equation (6.1) compares men with more and fewer years of schooling, while holding their years of work experience fixed. Is control for potential experience sufficient for ceteris to be paribus? In other words, at a given experience level, are more- and less-educated workers equally able and diligent? Do they have the same family connections that might offer a leg up in the labor market? Such claims seem hard to swallow. Like other masters, we’re pretty highly educated ourselves. And we’re smarter, harder working, and better bred than most of those who didn’t stick it out in the schooling department, or so we tell ourselves. The good qualities that we imagine we share with other highly educated workers are also associated with higher earnings, complicating the causal interpretation of regression estimates like those in equation (6.2).
We can hope to improve on these simple regression estimates by controlling for attributes correlated with schooling, variables we’ll call \(A_i\) (short for “ability”). Ignoring the experience term for now and focusing on other sources of OVB, the resulting long regression can be written as

\[ \ln Y_i = \alpha^l + \rho^l S_i + \gamma A_i + e_i^l. \tag{6.3} \]

The OVB formula tells us that the short regression slope from a model with no controls, \(\rho^s\), is related to the long regression slope in model (6.3) by the formula

\[ \rho^s = \rho^l + \delta_{AS}\,\gamma, \]

where \(\delta_{AS}\) is the slope from a bivariate regression of \(A_i\) on \(S_i\). As always, short (\(\rho^s\)) equals long (\(\rho^l\)) plus the regression of omitted (from short) on included (\(\delta_{AS}\)) times the effect of omitted in long (\(\gamma\)). In this context, the difference between short and long is called ability bias since the omitted variable is ability.
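The short-equals-long-plus-bias relationship is an exact property of OLS estimates in any sample, which makes it easy to check numerically. The Python sketch below does so on simulated data; the coefficient values, sample size, and variable names are all assumptions chosen for illustration and have nothing to do with Mincer's or anyone else's estimates.

```python
import numpy as np

def slope(y, x):
    """Slope from a bivariate regression of y on x (constant included)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

# Simulated workers: ability raises both schooling and earnings (assumed values).
rng = np.random.default_rng(6)
n = 1_000
ability = rng.normal(size=n)
school = 12 + 1.5 * ability + rng.normal(size=n)
log_earn = 1.0 + 0.08 * school + 0.05 * ability + rng.normal(scale=0.3, size=n)

# Long regression: log earnings on schooling and ability.
X_long = np.column_stack([np.ones(n), school, ability])
coef, *_ = np.linalg.lstsq(X_long, log_earn, rcond=None)
rho_long, gamma = coef[1], coef[2]

# Short regression and the regression of omitted on included.
rho_short = slope(log_earn, school)
delta_AS = slope(ability, school)

print("short:", rho_short)
print("long + delta_AS * gamma:", rho_long + delta_AS * gamma)
```

The two printed numbers agree up to floating-point error. Flipping the sign of the ability term in the schooling equation makes `delta_AS` negative and pushes the short regression below the long one.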
Which way does ability bias go? We’ve defined \(A_i\) so that \(\gamma\) in the long regression is positive (otherwise, we’d call \(A_i\) dis-ability). Surely \(\delta_{AS}\) is positive as well, implying upward ability bias: we expect the short regression \(\rho^s\) to exceed the more controlled \(\rho^l\). After all, our London School of Economics and MIT students tend to be high ability, at least in the sense of having high test scores and good grades in high school. On the other hand, some people cut their schooling short so as to pursue more immediately lucrative activities. Sir Mick Jagger abandoned his pursuit of a degree at the London School of Economics in 1963 to play with an outfit known as the Rolling Stones. Jagger got no satisfaction, and he certainly never graduated from college, but he earned plenty as a singer in a rock and roll band. No less impressive, Swedish épée fencer Johan Harmenberg left MIT after 2 years of study in 1979, winning a gold medal at the 1980 Moscow Olympics, instead of earning an MIT diploma. Harmenberg went on to become a biotech executive and successful researcher. These examples illustrate how people with high ability (musical, athletic, entrepreneurial, or otherwise) may be economically successful without the benefit of an education. This suggests that \(\delta_{AS}\), and hence ability bias, can be negative as easily as positive.

The Measure of Men: Controlling Ability

Here’s an easy work-around for the ability bias roadblock: collect information on \(A_i\) and use it as a control in regressions like equation (6.3). In an effort to tackle OVB in estimates of the returns to schooling, ’metrics master Zvi Griliches used IQ as an ability control.⁴ Without IQ in the model, Griliches’ estimate of \(\rho^s\) in a model controlling for potential experience is .068. Griliches’ estimated short regression schooling coefficient is well below Mincer’s estimate of about 11%, probably due to differences in samples and dependent variables (Griliches looked at effects on hourly wages instead of annual earnings). Importantly, the addition of an IQ control knocks Griliches’ estimate down to \(\rho^l\) = .059, a consequence of the facts that IQ and schooling are strongly positively correlated and that higher IQ people earn more (so the effect of omitted ability in long is indeed positive).

Although intriguing, it’s hard to see Griliches’ findings as conclusive. IQ doesn’t capture Mick Jagger’s charisma or Johan Harmenberg’s perseverance, dimensions of ability that are rarely measured in statistical samples. The relevant notion of ability here is an individual’s earnings potential, a concept reminiscent of the potential outcomes we use to describe causal effects throughout the book. The problem with potential outcomes, as always, is that we can never see them all; we see only the one associated with the road taken. For example, we see only the “highly educated” potential outcome in a sample of college graduates. We can’t know how such people would have fared if they’d followed Johan and Mick out of college. Attempts to summarize potential earnings with a single test score are probably inadequate. Moreover, for reasons explained in Section 6.2 and detailed further in the appendix to this chapter, when schooling is mismeasured (as we think it often is), estimates with ability controls can be misleadingly small.

Beware Bad Control
Perhaps more controls are the answer. Why not control for occupation, for example? Many data sets that report earnings also classify workers’ jobs, such as manager or laborer. Surely occupation is a strong predictor of both schooling and earnings, possibly capturing traits that distinguish Mick and Johan from more average Joes. By the logic of OVB, therefore, we should control for occupation, a matter easily accomplished by including dummy variables to indicate the types of jobs held.

Although occupation is strongly correlated with both schooling and wages, occupation dummies are bad controls in regressions meant to capture causal effects of schooling on wages. The fact that Master Joshway works today as a professor and not as a nurse’s aide (as he once did) is in part a reward for his extravagant schooling. It’s a mistake to eliminate this benefit from our calculation by comparing only professors or nurse’s aides when attempting to quantify the economic value of schooling. Even in a world where all professors earn a uniform $1 million a year (may it soon come to pass) and all nurse’s aides earn a uniform $10,000, an experiment that randomly assigns schooling would show that schooling raises wages. The channel by which wages are increased in this notional experiment is the shift from lowly nurse’s aide to elevated professorness.

There’s a second, more subtle, confounding force here: bad controls create selection bias. To illustrate, suppose we’re interested in the effects of a college degree and that college completion is randomly assigned. People can work in one of two occupations, white collar and blue collar, and a college degree naturally makes white collar work more likely. Because college changes occupation for some, comparisons of wages by college degree status conditional on occupation are no longer well balanced, even when college degrees are randomly assigned and unconditional comparisons are apples-to-apples.
This troubling phenomenon is a composition effect. By virtue of random assignment, those who do and don’t have a college degree are similar in every way, at least on average. Most importantly, they have the same average \(Y_{0i}\), that is, the same average earnings potential. Suppose, however, that we limit the comparison to those who have white collar jobs. The noncollege control group in this case consists entirely of especially bright workers who manage to land a white collar job without the benefit of a college education. But the white collar group that graduates from college includes these always-white-collar guys plus a weaker group that lands a white collar job by virtue of completing college but not otherwise.

We can see the consequences of this compositional difference by imagining three equal-sized groups of workers. The first group works a blue collar job with or without college (Always Blue, or AB). A second group works a white collar job irrespective of their education (Always White, or AW). Members of a third group, Blue White (BW), get a white collar job only with a college degree. These potential occupations are described in the first two columns of Table 6.1, which lists jobs obtained by those in each group in scenarios with and without a college degree.

In spite of the fact that college is randomly assigned, and simple comparisons of college and noncollege workers reveal causal effects, within-occupation comparisons are misleading. Suppose, for the sake of argument, the value of college is the same $500 per week for all three groups. Although the three types of workers enjoy the same gains from a college education, their potential earnings (that is, their \(Y_{0i}\) values) are likely to differ. To be concrete, suppose the AW group earns $3,000 per week without a college degree, the AB group earns only $1,000 per week without a college degree, and the BWs earn something in the middle, say, $2,000 per week without a college degree. Columns (3) and (4) of Table 6.1 summarize these facts.

TABLE 6.1
How bad control creates selection bias

Limiting the college/noncollege comparison to those who have white collar jobs, the average earnings of college graduates is given by the average of the $3,500 earned by the AWs with a college degree and the $2,500 earned by the BWs, while the average for noncollege graduates is the constant $3,000 earned by the AWs without a college degree. Because the average of $3,500 and $2,500 also equals $3,000, the conditional-on-white-collar comparison by college graduation status is zero, a misleading estimate of the returns to college, which is $500 for everyone. The comparison of earnings by graduation status among blue collar workers is an equally misleading zero. Although random assignment of college ensures equal proportions of apples and oranges (types or groups) in the college and noncollege barrels, conditioning on white collar employment, an outcome determined in part by college graduation, distorts this balance.
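The arithmetic behind this example is short enough to check by machine. The Python sketch below encodes the three equal-sized groups with the weekly earnings given in the text (the with-college figures follow from adding the common 500-per-week gain) and compares college and noncollege averages overall and within each occupation. The dictionary layout and function names are ours.

```python
# group: (job without college, job with college,
#         weekly pay without college, weekly pay with college)
groups = {
    "AB": ("blue", "blue", 1000, 1500),    # Always Blue
    "AW": ("white", "white", 3000, 3500),  # Always White
    "BW": ("blue", "white", 2000, 2500),   # white collar only with college
}

def mean(values):
    values = list(values)
    return sum(values) / len(values)

def avg_pay(college, job=None):
    """Average weekly pay by college status, optionally within one occupation.
    Groups are equal-sized and college is randomly assigned, so a simple
    average over the groups present in each cell is the right comparison."""
    job_col, pay_col = (1, 3) if college else (0, 2)
    return mean(g[pay_col] for g in groups.values()
                if job is None or g[job_col] == job)

overall_gap = avg_pay(True) - avg_pay(False)
white_gap = avg_pay(True, "white") - avg_pay(False, "white")
blue_gap = avg_pay(True, "blue") - avg_pay(False, "blue")

print("unconditional gap:", overall_gap)  # 500.0
print("white collar gap: ", white_gap)    # 0.0
print("blue collar gap:  ", blue_gap)     # 0.0
```

The unconditional comparison recovers the 500-per-week return, while both within-occupation comparisons come out at zero, because conditioning on occupation changes who is being compared with whom.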
The moral of the bad control story is that timing matters. Variables measured before the treatment variable was determined are generally good controls, because they can’t be changed by the treatment. By contrast, control variables that are measured later may have been determined in part by the treatment, in which case they aren’t controls at all, they are outcomes. Occupation in a regression model for the causal effect of schooling is a case in point. Ability controls (such as test scores) may also have this problem, especially if test scores come from tests taken by those who have completed most of their schooling. (Schooling probably boosts test scores.) This is one more reason to question empirical strategies that rely on test scores to remove ability bias from econometric estimates of the returns to schooling.⁵

6.2 Twins Double the Fun

Twinsburg, Ohio, near Cleveland, was founded as Millsville in the early nineteenth century. Prosperous Millsville businessmen Moses and Aaron Wilcox were identical twins whom few could distinguish. Moses and Aaron were generous to Millsville in their success, a fact recognized when Millsville was renamed Twinsburg in the early nineteenth century. Since 1976, Twinsburg has embraced its zygotic heritage in the form of a summer festival celebrating twins. Twinsburg’s annual Twins Days attract not only twins reveling in their similarities but also researchers looking for well-controlled comparisons.

Twin siblings indeed have much in common: most grow up in the same family at the same time, while identical twins even share genes. Twins might therefore be said to have the same ability as well. Perhaps the fact that one twin gets more schooling than his or her twin sibling is due mostly to the sort of serendipitous forces discussed in Chapter 2. The notion that one twin provides a good control for the other motivates a pair of studies by masters Orley Ashenfelter, Alan Krueger, and Cecilia Rouse.⁶ The key idea behind this work, as in many other studies using twins, is that if ability is common to a pair of twin siblings, we can remove it from the equation by subtracting one twin’s data from the other’s and working only with the differences between them.



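The long regression that motivates a twins analysis of the returns to schooling can be written as

\[ \ln Y_{if} = \alpha^l + \rho^l S_{if} + \lambda A_{if} + e^l_{if}. \tag{6.4} \]

Here, subscript f stands for family, while subscript i = 1, 2 indexes twin siblings, say, Karen and Sharon or Ronald and Donald. When Ronald and Donald have the same ability, we can simplify by writing \(A_{if} = A_f\). This in turn implies that we can model their earnings as

\[ \ln Y_{1,f} = \alpha^l + \rho^l S_{1,f} + \lambda A_f + e^l_{1,f}, \]
\[ \ln Y_{2,f} = \alpha^l + \rho^l S_{2,f} + \lambda A_f + e^l_{2,f}. \]

Subtracting the equation for Donald from that for Ronald gives

\[ \ln Y_{1,f} - \ln Y_{2,f} = \rho^l \left(S_{1,f} - S_{2,f}\right) + e^l_{1,f} - e^l_{2,f}, \tag{6.5} \]

an equation from which ability disappears.⁷ From this we learn that when ability is constant within twin pairs, a short regression of the difference in twins’ earnings on the difference in their schooling recovers the long regression coefficient, \(\rho^l\).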
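A small simulation shows the differencing trick at work. The Python sketch below is hypothetical throughout: the number of families, the return to schooling, and the ability coefficient are invented, and the earnings error is switched off so that the within-pair regression recovers the assumed return exactly instead of only approximately.

```python
import numpy as np

def slope(y, x):
    """Slope from a bivariate regression of y on x (constant included)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(2)
n_families = 500
rho, ability_coef = 0.10, 0.05  # assumed return to schooling and ability effect

ability = rng.normal(size=n_families)  # shared by both twins in a family
# Schooling rises with family ability, plus a twin-specific component.
school = 12 + 2 * ability[:, None] + rng.normal(size=(n_families, 2))
# Log earnings for twins 1 and 2 (earnings error omitted for an exact check).
log_earn = 1.0 + rho * school + ability_coef * ability[:, None]

# Levels regression pools both twins and ignores ability.
levels_slope = slope(log_earn.ravel(), school.ravel())
# Differenced regression: twin 1 minus twin 2, so family ability drops out.
diff_slope = slope(log_earn[:, 0] - log_earn[:, 1],
                   school[:, 0] - school[:, 1])

print("levels estimate:     ", levels_slope)
print("differenced estimate:", diff_slope)
```

In this setup the levels estimate exceeds the assumed return because ability and schooling move together across families, while the differenced estimate does not.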
Regression estimates constructed without differencing in the twins sample generate a schooling return of about 11%, remarkably similar to Mincer’s. This can be seen in the first column of Table 6.2. The model that produces the estimates in column (1) includes age, age squared, a dummy for women, and a dummy for whites. White twins earn less than black twins, an unusual result in the realm of earnings comparisons by race, though the gap here is not significantly different from zero.

The differenced equation (6.5) generates a schooling return of about 6%, a result shown in column (2) of Table 6.2. This is substantially below the short regression estimate in column (1). This decline may reflect ability bias in the short model. Yet, once again, more subtle forces may also be at work.



Twin Reports from Twinsburg
Twins are similar in many ways, including, alas, their schooling. Of 340 twin pairs interviewed for the Twinsburg schooling studies, about half report identical educational attainment. Schooling differences, \(S_{1,f} - S_{2,f}\), vary much less than schooling levels, \(S_{if}\). If most twins really have the same schooling, then a fair number of the nonzero differences in reported schooling may reflect mistaken reports by at least one of them. Erroneous reports, called measurement error, tend to reduce estimates of \(\rho^l\) in equation (6.5), a fact that may account for the decline in the estimated returns to schooling after differencing. A few people reporting their schooling incorrectly sounds unimportant, yet the consequences of such measurement error can be major.

To see why mistakes matter, imagine that twins from the same family always have the same schooling. In this scenario, the only reason \(S_{1,f} - S_{2,f}\) isn’t zero for everyone is because schooling is sometimes misreported. Suppose such erroneous reports are due to random forgetfulness or inattention rather than something systematic. The coefficient from a regression of earnings differences on schooling differences that are no more than random mistakes should be zero since random mistakes are unrelated to wages. In an intermediate case, where some but not all of the variation in observed schooling is due to misreporting, the coefficient in equation (6.5) is smaller than it would be if schooling were reported correctly. The bias generated by this sort of measurement error in regressors is called attenuation bias. The mathematical formula for attenuation bias is derived in the chapter appendix.
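Attenuation bias is easy to see in a simulation. The Python sketch below assumes classical measurement error, meaning random mistakes unrelated to true schooling and to earnings, and uses invented parameter values. Under that assumption, a standard result says the regression slope shrinks by the share of the variance in reported schooling that comes from true schooling; the sketch compares the simulated slope with that benchmark.

```python
import numpy as np

def slope(y, x):
    """Slope from a bivariate regression of y on x (constant included)."""
    return np.cov(x, y)[0, 1] / np.var(x, ddof=1)

rng = np.random.default_rng(3)
n = 200_000
rho = 0.10  # assumed return to schooling

true_school = rng.normal(12, 2, size=n)  # variance 4
mistake = rng.normal(0, 1, size=n)       # random reporting error, variance 1
reported = true_school + mistake
log_earn = 1.0 + rho * true_school + rng.normal(0, 0.5, size=n)

slope_true = slope(log_earn, true_school)
slope_reported = slope(log_earn, reported)
# Share of reported-schooling variance that is real: 4 / (4 + 1).
reliability = 4 / (4 + 1)

print("slope using true schooling:    ", slope_true)
print("slope using reported schooling:", slope_reported)
print("benchmark rho * reliability:   ", rho * reliability)
```

Raising the variance of `mistake` relative to that of `true_school` mimics the move from levels to twin differences, where much of the real variation has been differenced away and the mistakes loom larger.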

TABLE 6.2
Returns to schooling for Twinsburg twins

Notes: This table reports estimates of the returns to schooling for Twinsburg twins. Column (1) shows OLS estimates from models estimated in levels. OLS estimates of models for cross-twin differences appear in column (2). Column (3) reports 2SLS estimates of a levels regression using sibling reports as instruments for schooling. Column (4) reports 2SLS estimates using the difference in sibling reports to instrument the cross-twin difference in schooling. Standard errors appear in parentheses.

Misreported schooling attenuates the levels regression estimates shown in column (1) of Table 6.2, but less so than the differenced estimates in column (2). This difference in the extent of attenuation bias is also illustrated by the hypothetical scenario where all twins share the same schooling but schooling levels differ across families. When twins in the same family really have the same schooling, all variation in within-family differences in reported schooling comes from mistakes. By contrast, most of the cross-family variation in reported schooling reflects real differences in education. Real variation in schooling is related to earnings, a fact that moderates attenuation bias in estimates of the model for levels, equation (6.4). This reflects a general point about the consequences of covariates for models with mismeasured regressors (additional controls make attenuation bias worse), a point detailed in the chapter appendix.

Measurement error raises an important challenge for the Twinsburg analysis, since measurement error alone may explain the pattern of results seen in columns (1) and (2) of Table 6.2. Moving from the levels to the differenced regression accentuates attenuation bias, probably more than a little. The decline in schooling coefficients across columns may therefore have little to do with ability bias. Fortunately, seasoned masters Ashenfelter, Krueger, and Rouse anticipated the attenuation problem. They asked each twin to report not only their own schooling but also that of their sibling. As a result, the Twinsburg data sets contain two measures of schooling for each twin, one self-report and one sibling report. The sibling reports provide leverage to reduce, and perhaps even eliminate, attenuation bias.
The key tool in this case, as with many of the other problems we’ve encountered, is IV. Karen and Sharon make mistakes when reporting each other’s schooling as well as when reporting their own. As long as the mistakes in Karen’s report of her sister’s schooling are unrelated to mistakes in her sister’s self-report, and vice versa, Karen’s report of Sharon’s schooling can be used as an instrument for Sharon’s self-report, and vice versa. IV eliminates attenuation bias in the levels regression as well as in estimates of the differenced model (though the levels regression is still more likely than the differenced regression to suffer from ability bias).
As always, an IV estimate is the ratio of reduced-form estimates to first-stage estimates. When instrumenting the levels equation, the reduced-form estimate is the effect of Karen’s report of Sharon’s schooling on Sharon’s earnings. The corresponding first-stage estimate is the effect of Karen’s report of Sharon’s schooling on Sharon’s self-reported schooling. Reduced-form and first-stage results are still subject to attenuation bias. But when we divide one by the other, these biases cancel out, leaving us with an unattenuated IV estimate.
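
The cancellation described here can be checked with a small simulation. The sketch below is illustrative only: the sample size, the true return of 0.10, and the variances of schooling and of the reporting mistakes are hypothetical values chosen for the example, not numbers from the Twinsburg data. It assumes classical, mutually independent reporting errors.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 200_000
beta = 0.10  # hypothetical true return to schooling

# True schooling plus two noisy reports with independent mistakes (assumed).
true_schooling = rng.normal(12.0, 2.0, n)
self_report = true_schooling + rng.normal(0.0, 1.0, n)
sibling_report = true_schooling + rng.normal(0.0, 1.0, n)
log_wage = 1.0 + beta * true_schooling + rng.normal(0.0, 0.5, n)

def slope(y, x):
    """Bivariate regression slope of y on x."""
    return np.cov(y, x)[0, 1] / np.var(x, ddof=1)

ols = slope(log_wage, self_report)               # attenuated toward zero
reduced_form = slope(log_wage, sibling_report)   # also attenuated
first_stage = slope(self_report, sibling_report) # attenuated by the same factor
iv = reduced_form / first_stage                  # the attenuation cancels

print(f"OLS {ols:.3f} | reduced form {reduced_form:.3f} | "
      f"first stage {first_stage:.3f} | IV {iv:.3f}")
```

With these made-up variances the reliability of each report is 4/(4+1) = 0.8, so the OLS slope, the reduced form, and the first stage are all scaled down by roughly that factor, while their ratio recovers a value close to the true coefficient.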
IV works similarly in the first differenced model. The instrument for within-family differences in schooling is the difference in the cross-sibling reports. Provided that measurement errors in own- and cross-sibling schooling reports are uncorrelated, IV produces the no-OVB, unattenuated long-regression return to schooling, $\rho^l$, that we set out to obtain. Uncorrelatedness of reporting errors across siblings is a strong assumption, but a natural starting point for any exploration of bias from measurement error.
IV estimates of the levels equation appear in column (3) of Table 6.2 (as always, we execute this IV procedure by running 2SLS, which works no less well with instruments that are not dummy variables). Instrumenting self-reported schooling with cross-sibling reported schooling increases the estimated return to schooling only a little, from .110 to .116. This result is consistent with the notion that there’s little measurement error in the level of schooling. By contrast, instrumenting the differenced equation boosts the estimated return to schooling from .062 to .108. This result, reported in column (4) of Table 6.2, points to considerable measurement error in the differenced data. At the same time, the differenced IV estimate of .108 is not far below the cross-sectional estimate of .116, suggesting the problem we set out to solve, ability bias in estimates of the returns to schooling, isn’t such a big deal after all.
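
One rough way to read these four numbers is as implied reliabilities. If measurement error is classical and IV removes it entirely, the ratio of the OLS estimate to the IV estimate approximates the share of variation in reported schooling that is real. The sketch below does only that arithmetic; the function name is a hypothetical helper, and the interpretation rests on those assumptions rather than on anything reported in Table 6.2 itself.

```python
def implied_reliability(ols_estimate: float, iv_estimate: float) -> float:
    """Ratio OLS/IV, read as reliability under classical measurement error (assumed)."""
    return ols_estimate / iv_estimate

# Estimates quoted in the text for Table 6.2.
levels = implied_reliability(0.110, 0.116)       # columns (1) and (3)
differences = implied_reliability(0.062, 0.108)  # columns (2) and (4)

print(f"levels: {levels:.2f}, differences: {differences:.2f}")
```

Under this reading, reported schooling in levels is about 95% signal, while the within-twin difference is closer to 57% signal, which fits the claim that differencing removes real variation and leaves the mistakes behind.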

6.3 Econometricians Are Known by Their … Instruments

It’s the Law
Economists think people make important choices such as those related to schooling by comparing anticipated costs with expected benefits. The cost of staying in secondary school is determined partly by compulsory schooling laws, which punish those who leave school too soon. Since you avoid punishment by staying in school, compulsory schooling laws make extra schooling seem cheaper relative to the alternative, dropping out. This generates a causal chain reaction leading from compulsory schooling laws to schooling choices to earnings that might reveal the economic returns to schooling. The ’metrics methods behind this idea are those of Chapters 3 and 5: instrumental variables and differences-in-differences.
As always, IV begins with the first stage. One hundred years ago, there were few compulsory attendance laws, while today most American states keep students in school until at least age 16. Many states also forbid school-aged children from working, or require school authorities to give permission for a child to work. Assuming that some students would otherwise drop out if not for such laws, stricter compulsory school requirements should increase average schooling. Provided changes in state compulsory attendance laws are also unrelated to the potential earnings of residents in each state (as determined by things like family background, the states’ industrial structure, or other policy changes), these laws create valid instruments for schooling in equations like (6.1).
But compulsory attendance laws probably are related to potential earnings. In the early twentieth century, for example, agricultural Southern states had few compulsory attendance requirements, while compulsory schooling laws were stricter in the more industrial North. Simple comparisons of earnings across U.S. regions typically reveal vast differences in earnings, but these are mostly unrelated to the North’s more rigorous schooling requirements. Compulsory schooling requirements also grew stricter over time, but here, too, simple comparisons are misleading. Many features of the American economy changed as the twentieth century progressed; compulsory schooling laws are but a small part of this ever-evolving economic story.
A creative combination of DD and IV offers a possible way around OVB roadblocks in this context. Compulsory schooling requirements expanded and tightened most dramatically in the first half of the twentieth century. Masters Joshway and Daron Acemoglu collected state-by-year information on the compulsory schooling laws applicable to those who might have been in school at this time.8 These laws include child labor provisions as well as compulsory attendance requirements. Child labor laws that require a certain amount of schooling be completed before children are allowed to work seem to have increased schooling more than attendance requirements. A useful simplification in this context uses the laws in effect in census respondents’ states of birth at the time they were 14 years old to identify states and years in which 7, 8, and 9 or more years of schooling were required before work was allowed. The resulting set of instrumental variables consists of dummies for each of these three categories; the omitted category consists of states and years in which 6 or fewer years of schooling were required before work was allowed.
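
The coding of these instruments can be sketched in a few lines. The function and the dummy names `cl7`, `cl8`, and `cl9` below are hypothetical labels chosen for illustration; the text specifies only the three categories and the omitted group.

```python
def child_labor_dummies(required_years: int) -> dict:
    """Map years of schooling required before work to three indicator variables.

    The omitted category (all zeros) is 6 or fewer required years.
    """
    return {
        "cl7": int(required_years == 7),
        "cl8": int(required_years == 8),
        "cl9": int(required_years >= 9),
    }

# A few illustrative law values, keyed by required years of schooling.
examples = {years: child_labor_dummies(years) for years in (6, 7, 8, 9, 10)}
for years, dummies in examples.items():
    print(years, dummies)
```

Each respondent would be assigned the dummies for the law in force in his state of birth when he was 14, so the instruments vary by state and by year of birth.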
Because child labor instruments vary with both state and year of birth, they can be used to estimate a first-stage equation that controls for possible time effects through the inclusion of year-of-birth dummies, while controlling for state characteristics through the inclusion of state-of-birth dummies. Control for state effects should mitigate bias from regional differences that are correlated with compulsory schooling provisions, while the inclusion of year-of-birth effects should mitigate bias from the fact that earnings differ across birth cohorts for many reasons besides compulsory schooling laws. The resulting first-stage equation looks like the Chapter 5 regression DD model (described by equation (5.5)) used to estimate the effect of state and year changes in the MLDA on death rates. Here, however, year-of-birth dummies replace dummies for calendar time.
The Acemoglu and Angrist compulsory schooling first-stage equation was estimated with an extract of men in their forties, drawn from each of the U.S. census samples available every decade from 1950 to 1990. Stacking these five censuses produces a single large data set in which different censuses contribute different cohorts. For example, men in their forties observed in the 1950 Census were born from 1900 to 1909 and subject to laws in effect in the 1910s and 1920s, while men in their forties observed in the 1960 Census were born from 1910 to 1919 and subject to laws in effect in the 1920s and 1930s.
The first-stage estimates reported in column (1) of Table 6.3 suggest that child labor laws requiring 7 or 8 years of schooling before work was allowed increased schooling (measured as highest grade completed) by about two-tenths of a year. Laws requiring 9 or more years of schooling before work was allowed had an effect twice as large. A parallel set of reduced-form estimates appears in column (3) of the table. These come from regression models similar to those used to construct the first-stage estimates reported in column (1), with the log weekly wage replacing years of schooling as the dependent variable. Laws requiring 7 or 8 years of schooling before work was allowed appear to have raised wages by about 1%, while laws requiring 9 or more years of schooling before work increased earnings by almost 5%, though only the latter estimate is significant. The 2SLS estimate generated by these estimates is .124 (with an estimated standard error of .036).
A 12% wage gain for each additional year of schooling is impressive, all the more so since the schooling increase in question is involuntary. Stronger compulsory schooling laws appear to raise schooling, and this in turn produces higher wages for the men constrained by these laws (compulsory schooling compliers, in this case). Especially interesting is the fact that the 2SLS estimate of the returns to schooling generated by compulsory schooling instruments exceeds the corresponding OLS estimate of .075. This finding weighs against the notion of upward ability bias in the OLS estimate.

TABLE 6.3
Returns to schooling using child labor law instruments

Notes: This table shows 2SLS estimates of the returns to schooling using as instruments three dummies indicating the years of schooling required by child labor laws as a condition for employment. Panel A reports first-stage and reduced-form estimates controlling for year and state of birth effects and for census year dummies. Columns (2) and (4) show the results of adding state-specific linear trends to the list of controls. Panel B shows the 2SLS estimates of the returns to schooling generated by the first-stage and reduced-form estimates in panel A. Sample size is 722,343. Standard errors are reported in parentheses.

Before declaring mission accomplished, a master looks for threats to validity. The variation in schooling generated by compulsory schooling laws produces a DD-style first stage and reduced form. As discussed in Chapter 5, the principal threat to validity in this context is omitted state-specific trends. Specifically, we must worry that states in which compulsory schooling laws grew stricter simultaneously experienced unusually large wage growth across cohorts for reasons unrelated to schooling. Perhaps wage growth and changes in schooling laws are both driven by some third variable, say, changes in industrial structure.
The case for omitted variables bias in this context grows even stronger once we recognize that most of the action in the compulsory schooling research design comes from comparisons of Northern and Southern states. Southern states saw enormous economic growth in the twentieth century, while at the same time, social legislation in these states proliferated. The relative growth in earnings in Southern states might have been caused in part by more restrictive compulsory attendance provisions. But it might not.
Chapter 5 explains that a simple check for state-specific trends adds a linear time trend for each state to the model of interest. In this case, the relevant time dimension is year of birth, so the model with state-specific trends includes a separate linear year-of-birth variable for each state of birth in the sample (the regression model with year-of-birth trends looks like equation (5.6)).
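
A toy simulation shows why this check matters. Everything in the sketch below is hypothetical: a 20-by-20 panel of state and birth-cohort cells, an invented adoption rule under which faster-growing states tighten their laws earlier, and an outcome in which the law has no true effect at all. It is meant only to illustrate the mechanics of adding state-specific linear trends, not to reproduce the Acemoglu and Angrist specification.

```python
import numpy as np

rng = np.random.default_rng(0)
n_states, n_cohorts = 20, 20
state = np.repeat(np.arange(n_states), n_cohorts)
cohort = np.tile(np.arange(n_cohorts), n_states)

# Hypothetical adoption rule: states with steeper trends adopt the law earlier.
law = (state + cohort >= 20).astype(float)

# Outcome with state effects, cohort effects, state-specific trends, and NO law effect.
trend_slope = 0.02 * state
y = (0.5 * state + 0.1 * cohort + trend_slope * cohort
     + rng.normal(0.0, 0.002, state.size))

# Full set of state dummies (these absorb the intercept); first cohort omitted.
state_dummies = (state[:, None] == np.arange(n_states)).astype(float)
cohort_dummies = (cohort[:, None] == np.arange(1, n_cohorts)).astype(float)
state_trends = state_dummies * cohort[:, None]  # one linear cohort trend per state

def law_coefficient(controls):
    """Least-squares coefficient on the law dummy, given a list of control blocks."""
    design = np.column_stack([law] + controls)
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef[0]

without_trends = law_coefficient([state_dummies, cohort_dummies])
with_trends = law_coefficient([state_dummies, cohort_dummies, state_trends])

print(f"law effect without trends: {without_trends:.3f}")
print(f"law effect with state trends: {with_trends:.3f}")
```

In this made-up panel the two-way fixed effects model credits the law with an effect that is really a difference in trends; once each state gets its own linear cohort trend, the estimated law effect is close to zero. The trend columns are collinear with the cohort dummies in one direction, which `lstsq` tolerates without affecting the law coefficient.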
Columns (2) and (4) in Table 6.3 report the results of this addition. The estimates in these columns offer little evidence that compulsory schooling laws matter for either schooling or wages. First-stage and reduced-form estimates both fall precipitously in the model with trends, and none are significantly different from zero. Importantly, the first-stage estimates in column (2) are more precise (that is, have smaller standard errors) than those estimated without state-specific trends. Lack of statistical significance therefore comes from the fact that the estimates with trends are much smaller and not from reduced precision. The reduced-form estimates in column (4) similarly offer little evidence of a link between compulsory school laws and earnings. The 2SLS estimate generated by columns (2) and (4) comes out at an implausibly large .399, but with a standard error almost as large. Sad to say for Master Joshway, Table 6.3 reveals a failed research design.

To Everything There Is a Season (of Birth)
MASTER OOGWAY: Yesterday is history, tomorrow is a mystery, but today is a gift. That is why it is called the present.
Kung Fu Panda

You get presents on your birthday, but some birthdates are better than others. A birthday that falls near Christmas might reduce your windfall if gift givers try to make one present do double duty. On the other hand, many Americans born late in the year get surprise gifts in the form of higher schooling and higher earnings.
The path leading from late-year births to increased schooling and earnings starts in kindergarten. In most states, children enter kindergarten in the year they turn 5, whether or not they’ve had a fifth birthday by the time school starts in early September. Jae, born on January 1st, was well on the way toward his sixth birthday when he started school. By contrast, Dante, born on December 1st, was not even 5 when he started. Such birthday-based differences in school-starting age are life changing for some.
The life-changing nature of school-starting age is an unintended consequence of American compulsory attendance laws. By the middle of the twentieth century, most states were allowing students to leave school (that is, to drop out of high school) only after they’d turned 16 (some states require attendance until 17 or 18). Most compulsory attendance laws allow you to quit school once you’ve reached the dropout age, without finishing the school year. Jae, having started school at the ripe old age of 5 years and 8 months, turned 16 in January ten years later, early in his tenth-grade year. Dante, having started school at the tender age of 4 years and 9 months, turned 16 in December eleven years later, after finishing tenth grade and starting eleventh. Both were itching to leave school as soon as they were allowed, and each dropped out immediately on turning 16. But Dante, having started school younger, was forced by accident of birth to complete one more grade than Jae.
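
The starting ages in this story follow directly from the entry rule. The sketch below assumes school begins on September 1 of the year a child turns 5 and uses 1930 as an arbitrary, hypothetical birth year; the helper function is illustrative, not part of any source.

```python
from datetime import date

def age_at_school_start(birth: date, start_month: int = 9, start_day: int = 1):
    """Return (years, months) of age on the first day of kindergarten.

    Assumes entry in the calendar year the child turns 5 (hypothetical rule).
    """
    start = date(birth.year + 5, start_month, start_day)
    months = (start.year - birth.year) * 12 + (start.month - birth.month)
    if start.day < birth.day:
        months -= 1  # the current month of age is not yet complete
    return divmod(months, 12)

jae = age_at_school_start(date(1930, 1, 1))     # born January 1st
dante = age_at_school_start(date(1930, 12, 1))  # born December 1st

print("Jae:", jae, "Dante:", dante)
```

The eleven-month gap in starting age is what leaves Dante one grade further along when both boys reach the dropout age of 16.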
You can’t pick your birthday. Even your parents probably found your birthday hard to fix. Ultimately, birth timing has a good deal of randomness to it, mimicking experimental random assignment. By virtue of the partly random nature of birthdates, men like Jae and Dante, born at different times of the year, are likely to have similar family backgrounds and talents, even though they have very different educational attainment. This sounds like a promising scenario for IV, and it is.
Masters Joshway and Alan Krueger used differences in schooling generated by quarter of birth (QOB) to construct IV estimates of the economic returns to compulsory schooling.9 Angrist and Krueger analyzed large publicly available samples from the 1970 and 1980 U.S. Censuses, samples similar to those used by Acemoglu and Angrist. Somewhat unusually for publicly available data sets, these census files contain information on respondents’ QOB.
The QOB first stage for 1980 Census respondents appears in Figure 6.1. This figure plots average schooling by year and QOB for men born in the 1930s. Most men in these cohorts finished high school, so their average highest grade completed ranges from 12 to 13 years. Figure 6.1 exhibits a surprising sawtooth pattern: Men born earlier in the year tend to have lower average schooling than those born later. The teeth of the saw have an amplitude of about .15. This may not seem like much, but it’s consistent with the story of Jae and Dante. Among men born in the 1930s, about 20% left school in grade 10 or sooner. Late-quarter births impose about .75 of a grade’s worth of extra schooling on this 20%. The calculation .2 × .75 = .15 accounts for the ups and downs in Figure 6.1.
As always, IV is the ratio of the reduced form to the corresponding first stage. The QOB reduced form is plotted in Figure 6.2. The flatness of earnings from year to year seen in this figure isn’t surprising. Earnings initially increase sharply with age, but the age-earnings profile tends to flatten out for men in their forties. Importantly, however, the QOB sawtooth in schooling is paralleled by a similar QOB sawtooth in average earnings. Men born later in the year not only get more schooling than those born earlier, they have higher earnings as well. IV logic attributes the sawtooth pattern in average earnings by QOB to the sawtooth pattern in average schooling by QOB.

FIGURE 6.1
The quarter of birth first stage

Notes: This figure plots average schooling by quarter of birth for men born in 1930–1939 in the 1980 U.S. Census. Quarters are labeled 1–4, and symbols for the fourth quarter are filled in.

FIGURE 6.2
The quarter of birth reduced form

Notes: This figure plots average log weekly wages by quarter of birth for men born in 1930–1939 in the 1980 U.S. Census. Quarters are labeled 1–4, and symbols for the fourth quarter are filled in.

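A simple QOB-based IV estimate compares the schooling and earnings of men born in the fourth quarter to the schooling and earnings of men born in earlier quarters. Table 6.4 organizes the ingredients for this IV recipe using the same sample as was used to construct Figure 6.1. Men born in the fourth quarter earn a little more than those born earlier, a difference of about .7%. Fourth-quarter births also have higher average educational attainment; here, the difference is about .09 years. Dividing the first difference by the second, we have

$$\text{Effect of schooling on wages} = \frac{\text{Effect of QOB on wages}}{\text{Effect of QOB on schooling}} \approx \frac{.007}{.09} \approx .078$$

(computed here from the rounded differences quoted above).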

TABLE 6.4
IV recipe for an estimate of the returns to schooling using a single quarter of birth instrument

Notes: Sample size is 329,509. Standard errors are reported in parentheses.

TABLE 6.5
Returns to schooling using alternative quarter of birth instruments

Notes: This table reports OLS and 2SLS estimates of the returns to schooling using quarter of birth instruments. The estimates in columns (3)–(5) are from models controlling for year of birth. Columns (1) and (3) show OLS estimates. Columns (2), (4), and (5) show 2SLS estimates using the instruments indicated in the third row of the table. F-tests for the joint significance of the instruments in the corresponding first-stage regression are reported in the second row. Sample size is 329,509. Standard errors are reported in parentheses.

By way of comparison, the bivariate regression of log weekly wages on schooling comes out remarkably close, at .071. These simple OLS and IV estimates are repeated in the first two columns of Table 6.5. The columns containing IV estimates are labeled “2SLS” because, as always, that’s how we do IV.
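
The single-instrument IV recipe is a ratio of two differences in means, which a few lines of arithmetic make concrete. The inputs below are the rounded differences quoted in the text (about .7% in wages and about .09 years of schooling), so the result is only an approximation; the unrounded entries in Table 6.4 would give a slightly different figure. The function name is a hypothetical helper.

```python
def wald_estimate(reduced_form: float, first_stage: float) -> float:
    """IV estimate with one dummy instrument: reduced form over first stage."""
    if first_stage == 0:
        raise ZeroDivisionError("first stage is zero; the instrument is irrelevant")
    return reduced_form / first_stage

wage_gap = 0.007      # fourth-quarter minus earlier-quarter log weekly wage (rounded)
schooling_gap = 0.09  # fourth-quarter minus earlier-quarter years of schooling (rounded)

iv = wald_estimate(wage_gap, schooling_gap)
print(f"IV estimate of the return to schooling: {iv:.3f}")
```

Even with rounded inputs, the ratio lands in the same neighborhood as the OLS estimate of .071.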
As with the IV estimates of the effects of family size discussed in Chapter 3, we can use 2SLS to add covariates and additional instruments to the QOB IV story. OLS and 2SLS estimates of models including year of birth dummies (a control for age in our 1980 cross section) appear in columns (3) and (4) of Table 6.5. These results are almost indistinguishable from those in columns (1) and (2). Adding dummies for first and second quarters of birth to the instrument list, however, leads to a noteworthy gain in precision. The three-instrument estimate, reported in column (5), is larger than single-instrument estimates reported in columns (2) and (4), with a standard error that falls from .028 to .020.
What’s required for 2SLS estimates using QOB instruments to capture the causal effect of education on earnings? First, the instruments must predict the regressor of interest (in this case, schooling). Second, the instruments should be as good as randomly assigned in the sense of being independent of omitted variables (in this case, variables like family background and ability). Finally, QOB should affect outcomes solely through the channel we’ve chosen as the variable to be instrumented (in this case, schooling). Other channels must be excluded. It’s worth asking how QOB instruments measure up to these first-stage, independence, and exclusion restriction requirements.
We’ve seen that QOB produces a clear sawtooth pattern in highest grade completed. This is a compelling visual representation of a strong first stage, confirmed by the large F-statistics in Table 6.5. As discussed in the appendix to Chapter 3, a large first-stage F-statistic suggests bias from weak instruments is unlikely to be a problem in this context.
Is QOB independent of maternal characteristics? Birthdays aren’t literally randomly assigned, of course. Researchers have long documented season of birth patterns in mothers’ socioeconomic background. A recent study by Kasey Buckles and Daniel Hungerman explores these patterns further.10 Buckles and Hungerman find that maternal schooling, a good measure of family background, peaks for mothers who give birth in the second quarter. This suggests that family background cannot account for the seasonal pattern in schooling and wages seen in Figures 6.1 and 6.2, both of which exhibit third- and fourth-quarter peaks. In fact, average maternal schooling by QOB is slightly negatively correlated with average offspring schooling by QOB. Not surprisingly, therefore, control for average maternal characteristics moderately increases IV estimates of schooling returns using QOB instruments. Season of birth variation in family background, though not zero, does not follow a pattern that changes QOB-based 2SLS estimates substantially.
Finally, what of exclusion? The QOB first stage is generated by the fact that later-born students enter school younger than those born earlier in the year, and therefore complete more schooling before they’re allowed to drop out. But what if school-starting age itself matters? The most commonly told entry-age story is that the youngest children in a first-grade class are at a disadvantage, while children who are a little older than their classmates tend to do better. Here too, the circumstantial evidence for QOB instruments is encouraging. The crux of the QOB-compulsory schooling story is that younger entrants ultimately come out ahead, and this is what the data show.11

Empirical strategies are never perfect. Weak nails bend, but the house of ’metrics needn’t collapse. We can’t prove that a particular IV strategy satisfies the assumptions required for a causal interpretation. The econometrician’s position is necessarily defensive. As we’ve seen, however, key assumptions can be probed and checked in a variety of ways, and so they must be. Masters routinely check their own work and assumptions, while carefully evaluating results reported by others.
On the substantive side, IV estimates using QOB instruments come out similar to or larger than the corresponding OLS estimates of the economic return to schooling. Modest measurement error in the schooling variable might explain the gap between 2SLS and OLS estimates, much as in the twins data. These results suggest downward bias from mismeasured schooling matters as much or more than any ability bias that causes us to overestimate the economic value of education. The earnings gain generated by an additional grade completed seems to be about 7–10%. Bertie Gladwin might have accomplished even more had he finished his schooling sooner.



6.4 Rustling Sheepskin in the Lone Star State

Schooling means many things, and every educational experience is different. But economists look at diverse educational experiences and see them all as creating human capital: a costly investment in skills from which we also expect to see a return. Some students, like Bertie Gladwin, enjoy school for its own sake and show little interest in economic returns. But many more probably see their schooling as stressful, tiring, and expensive. In addition to tuition costs, time spent in school could have been spent working. Many college students spend relatively little on tuition, but all full-time students pay an opportunity cost. This notion, that a large part of the costs of acquiring an education comes in the form of forgone earnings, leads us to expect each year of additional schooling to generate about the same economic return, whether it’s the tenth, twelfth, or twentieth year at the books. The simple human capital view of schooling embodies this idea.
Of course, people who have not had the benefit of economics training probably don’t think about education like this. Most measure their educational attainment in terms of degrees instead of years. Few job applicants describe themselves as having completed “17 years of schooling.” Rather, applicants list the schools from which they graduated and the dates of degrees received. To an economist, however, degrees are just pieces of paper that should have little or no real value. Master Stevefu is a case in point: though he spent many years in college, attending Susquehanna University in central Pennsylvania (among other fine institutions), he has yet to earn his bachelor’s degree. Reflecting this dismissive view of the value of certification, economists refer to the hypothesis that degrees matter as “sheepskin effects,” after the material on which diplomas were originally inscribed.
The search for sheepskin effects led Masters Damon Clark and Paco Martorell to a clever fuzzy RD research design.12 They exploit the fact that in Texas, as in many other states, receipt of a high school diploma is conditional on satisfactory completion of an exit exam in addition to state-required coursework. Students first take this exam in tenth or eleventh grade, with retests scheduled periodically for those who fail. A last-chance exit exam for those who have failed previously is administered at the end of twelfth grade. In truth this isn’t the last chance for a Texas senior to earn a diploma; it’s possible to try again later. Still, for many who take it, the last-chance exam is decisive.
The decisive nature of the last-chance exit exam for many Texas high school seniors is documented in Figure 6.3, which plots the probability of diploma receipt against last-chance exam scores, centered at the passing threshold. The figure, which plots averages conditional on each score value along with fitted values from a fourth-order polynomial estimated separately on either side of the passing cutoff, shows diploma award rates close to .5 for students who miss the cutoff. For those whose scores clear the cutoff, however, diploma award rates jump above 90%. This change is discontinuous and unambiguous: Figure 6.3 documents a fuzzy RD first stage of nearly .5 for the effects of exit exam passage on diploma receipt.
Many of those who earn a diploma go on to college, in which case their earnings stay low until this additional schooling is also completed. It’s therefore important to look far enough down the road for any sheepskin effect in earnings to emerge. Clark and Martorell used data from the Texas unemployment insurance system, which records longitudinal information on the earnings of most workers in the state, to follow the earnings of those taking the last-chance exam for up to 11 years.
Earnings data for a period ranging from 7–11 years after students sat for their last-chance exit exam show no evidence of sheepskin effects. This can be seen in Figure 6.4, which plots average annual earnings against exam scores in a format paralleling that of Figure 6.3 (earnings here are in dollars and not in logs, and the averages include zeros for people who aren’t working). Figure 6.4 is a picture of the reduced form in a fuzzy RD design that uses a dummy for passing the exit exam as an instrumental variable for the effect of diploma receipt on earnings. As always, when the reduced form is zero (in this case, no jump appears in Figure 6.4), we know that the corresponding 2SLS estimate is zero as well.

FIGURE 6.3
Last-chance exam scores and Texas sheepskin

Notes: Last-chance exam scores are normalized relative to passing thresholds. Dots show average diploma receipt conditional on each score value. The solid lines are fitted values from a fourth-order polynomial, estimated separately on either side of the passing cutoff (indicated by the vertical dashed line).

FIGURE 6.4
The effect of last-chance exam scores on earnings

Notes: Last-chance exam scores are normalized relative to passing thresholds. Dots show average earnings conditional on each score value, including zeros for nonworkers. The solid lines are fitted values from a fourth-order polynomial, estimated separately on either side of the passing cutoff (indicated by the vertical dashed line).

The 2SLS estimates generated by dividing the reduced-form discontinuity seen in Figure 6.4 by the first-stage discontinuity seen in Figure 6.3 show a diploma effect of $52 (with a standard error of about $630). This amounts to less than half a percent of average earnings, which are about $13,000. These are small effects indeed, weighing against the sheepskin hypothesis. On the other hand, the associated confidence intervals also include earnings effects of nearly 10%.
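
The two percentages quoted here follow from simple arithmetic on the estimate and its standard error. The sketch below assumes a conventional 95% confidence interval built as the estimate plus or minus 1.96 standard errors; the text does not state which interval the authors used, so that choice is an assumption.

```python
estimate = 52.0           # 2SLS diploma effect, in dollars
std_error = 630.0         # approximate standard error, in dollars
mean_earnings = 13_000.0  # approximate average annual earnings

z = 1.96  # assumed critical value for a 95% interval
lower = estimate - z * std_error
upper = estimate + z * std_error

point_share = estimate / mean_earnings  # effect as a share of average earnings
upper_share = upper / mean_earnings     # largest effect inside the interval

print(f"point estimate: {point_share:.1%} of average earnings")
print(f"95% interval: [{lower:.0f}, {upper:.0f}] dollars, "
      f"upper bound {upper_share:.1%} of average earnings")
```

The point estimate is about 0.4% of average earnings, while the upper end of the interval is just under 10%, which is why the result cannot rule out a meaningful sheepskin effect.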
Large standard errors leave us with the possibility of some sheepskin effects, so the search for evidence on this point will surely continue. Masters know the search for econometric truth never ends, and that what is good today will be bettered tomorrow. Our students teach us this.

MASTER STEVEFU: Time for you to leave, Grasshopper. You must continue your journey alone. Remember, when you follow the ’metrics path, anything is possible.
MASTER JOSHWAY: Anything is possible, Grasshopper. Even so, always take the measure of the evidence.

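Appendix: Bias from Measurement Error

You’ve dreamed of running the regression

$$Y_i = \alpha + \beta S_i^* + e_i, \qquad (6.6)$$

but data on $S_i^*$, the regressor of your dreams, are unavailable. You see only a mismeasured version, $S_i$. Write the relationship between observed and desired regressors as

$$S_i = S_i^* + m_i, \qquad (6.7)$$

where $m_i$ is the measurement error in $S_i$. To simplify, assume errors average to zero and are uncorrelated with $S_i^*$ and the residual, $e_i$. Then we have

$$E[m_i] = 0, \qquad C(S_i^*, m_i) = C(e_i, m_i) = 0.$$

These assumptions describe classical measurement error (jazzier forms of measurement error may rock your regression coefficients even more).
The regression coefficient you’re after, $\beta$ in equation (6.6), is given by

$$\beta = \frac{C(Y_i, S_i^*)}{V(S_i^*)}.$$

Using the mismeasured regressor, $S_i$, instead of $S_i^*$, you get

$$\beta_b = \frac{C(Y_i, S_i)}{V(S_i)}, \qquad (6.8)$$

where $\beta_b$ has a subscript “b” as a reminder that this coefficient is biased.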
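To see why $\beta_b$ is a biased version of the coefficient you’re after, use equations (6.6) and (6.7) to substitute for $Y_i$ and $S_i$ in the numerator of equation (6.8):

$$\beta_b = \frac{C(Y_i, S_i)}{V(S_i)} = \frac{C(\alpha + \beta S_i^* + e_i,\; S_i^* + m_i)}{V(S_i)} = \frac{C(\alpha + \beta S_i^* + e_i,\; S_i^*)}{V(S_i)} = \beta\,\frac{V(S_i^*)}{V(S_i)}.$$

The next-to-last equals sign here uses the assumption that measurement error, $m_i$, is uncorrelated with $S_i^*$ and $e_i$; the last equals sign uses the fact that $S_i^*$ is uncorrelated with a constant and with $e_i$, since the latter is a residual from a regression on $S_i^*$. We’ve also used the fact that the covariance of $S_i^*$ with itself is its variance (see the appendix to Chapter 2 for an explanation of these and related properties of variance and covariance).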
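We’ve assumed that $m_i$ is uncorrelated with $S_i^*$. Because the variance of the sum of uncorrelated variables is the sum of their variances, this implies

$$V(S_i) = V(S_i^*) + V(m_i),$$

which means we can write

$$\beta_b = r\beta, \qquad (6.9)$$

where

$$r = \frac{V(S_i^*)}{V(S_i)} = \frac{V(S_i^*)}{V(S_i^*) + V(m_i)}$$

is a number between zero and one.
The fraction $r$ describes the proportion of variation in $S_i$ that is unrelated to mistakes and is called the reliability of $S_i$. Reliability determines the extent to which measurement error attenuates $\beta_b$. The attenuation bias in $\beta_b$ is

$$\beta_b - \beta = -(1 - r)\beta,$$

so that $\beta_b$ is smaller than (a positive) $\beta$ unless $r = 1$, and there’s no measurement error after all.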
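
The relationship between the biased coefficient and reliability can be checked numerically. The simulation below is a sketch under classical measurement error; the true coefficient of 0.10, the variance of true schooling, and the three noise levels are hypothetical values picked for illustration.

```python
import numpy as np

def attenuated_slope(noise_sd, beta=0.10, n=500_000, seed=1):
    """Return (biased slope, sample reliability) for a given measurement-error s.d."""
    rng = np.random.default_rng(seed)
    s_star = rng.normal(12.0, 2.0, n)               # correctly measured regressor
    s_obs = s_star + rng.normal(0.0, noise_sd, n)   # observed, mismeasured regressor
    y = 1.0 + beta * s_star + rng.normal(0.0, 0.5, n)
    beta_b = np.cov(y, s_obs)[0, 1] / np.var(s_obs, ddof=1)
    r = np.var(s_star, ddof=1) / np.var(s_obs, ddof=1)
    return beta_b, r

results = {sd: attenuated_slope(sd) for sd in (0.0, 1.0, 2.0)}
for sd, (beta_b, r) in results.items():
    print(f"noise s.d. {sd}: slope {beta_b:.3f}, reliability {r:.2f}, "
          f"r * beta {r * 0.10:.3f}")
```

As the noise grows, reliability falls from 1 toward 0.5 and the estimated slope shrinks in step with it, tracking the product of reliability and the true coefficient.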

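Adding Covariates
In Section 6.1, we noted that the addition of covariates to a model with mismeasured regressors tends to exacerbate attenuation bias. The Twinsburg story told in Section 6.2 is a special case of this, where the covariates are dummies for families in samples of twins. To see why covariates increase attenuation bias, suppose the regression of interest is

$$Y_i = \alpha + \beta S_i^* + \gamma X_i + e_i, \qquad (6.10)$$

where $X_i$ is a control variable, perhaps IQ or another test score. We know from regression anatomy that the coefficient on $S_i^*$ in this model is given by

$$\beta = \frac{C(Y_i, \tilde{S}_i^*)}{V(\tilde{S}_i^*)},$$

where $\tilde{S}_i^*$ is the residual from a regression of $S_i^*$ on $X_i$. Likewise, replacing $S_i^*$ with $S_i$, the coefficient on $S_i$ becomes

$$\beta_b = \frac{C(Y_i, \tilde{S}_i)}{V(\tilde{S}_i)},$$

where $\tilde{S}_i$ is the residual from a regression of $S_i$ on $X_i$.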
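Add the (classical) assumption that measurement error, $m_i$, is uncorrelated with the covariate, $X_i$. Then the coefficient from a regression of mismeasured $S_i$ on $X_i$ is the same as the coefficient from a regression of $S_i^*$ on $X_i$ (use the properties of covariance and the definition of a regression coefficient to see this). This in turn implies that

$$\tilde{S}_i = \tilde{S}_i^* + m_i,$$

where $m_i$ and $\tilde{S}_i^*$ are uncorrelated. We therefore have

$$V(\tilde{S}_i) = V(\tilde{S}_i^*) + V(m_i).$$

Applying the logic used to establish equation (6.9), we get

$$\beta_b = \frac{C(Y_i, \tilde{S}_i)}{V(\tilde{S}_i)} = \frac{V(\tilde{S}_i^*)}{V(\tilde{S}_i^*) + V(m_i)}\,\beta = \tilde{r}\beta, \qquad (6.11)$$

where

$$\tilde{r} = \frac{V(\tilde{S}_i^*)}{V(\tilde{S}_i^*) + V(m_i)}.$$

Like $r$, this lies between zero and one.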
What’s new here? The variance of $\tilde{S}_i^*$ is necessarily reduced relative to that of $S_i^*$, because the variance of $\tilde{S}_i^*$ is the variance of a residual from a regression model in which $S_i^*$ is the dependent variable. Since $V(\tilde{S}_i^*) < V(S_i^*)$, we also have

$$\tilde{r} = \frac{V(\tilde{S}_i^*)}{V(\tilde{S}_i^*) + V(m_i)} < r = \frac{V(S_i^*)}{V(S_i^*) + V(m_i)}.$$

This explains why adding covariates to a model with mismeasured schooling aggravates attenuation bias in estimates of the returns to schooling. Intuitively, this aggravation is a consequence of the fact that covariates are correlated with accurately measured schooling while being unrelated to mistakes. The regression-anatomy operation that removes the influence of covariates therefore reduces the information content of a mismeasured regressor while leaving the noise component, the mistakes, unchanged (test your understanding of the formal argument here by deriving equation (6.11)). This argument carries over to the differencing operation used to purge ability from equation (6.4): differencing across twins removes some of the signal in schooling, while leaving the variance of the noise unchanged.
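
The claim that a covariate lowers reliability can also be illustrated by simulation. The sketch below uses regression anatomy directly: it residualizes schooling on a control and compares the two reliabilities. All parameter values (the coefficients, the strength of the link between the control and schooling, and the noise variance) are hypothetical, and measurement error is assumed classical and unrelated to the control.

```python
import numpy as np

rng = np.random.default_rng(2)
n = 500_000
beta, gamma = 0.10, 0.05  # hypothetical coefficients on schooling and the control

x = rng.normal(0.0, 1.0, n)                                   # control, e.g. a test score
s_star = 12.0 + 1.5 * x + rng.normal(0.0, np.sqrt(1.75), n)   # true schooling, variance 4
m = rng.normal(0.0, 1.0, n)                                   # classical measurement error
s_obs = s_star + m
y = 1.0 + beta * s_star + gamma * x + rng.normal(0.0, 0.5, n)

def residualize(v, control):
    """Residual from a bivariate regression of v on control (with intercept)."""
    b = np.cov(v, control)[0, 1] / np.var(control, ddof=1)
    return v - v.mean() - b * (control - control.mean())

s_star_tilde = residualize(s_star, x)
s_obs_tilde = residualize(s_obs, x)

var_m = np.var(m, ddof=1)
r = np.var(s_star, ddof=1) / (np.var(s_star, ddof=1) + var_m)
r_tilde = np.var(s_star_tilde, ddof=1) / (np.var(s_star_tilde, ddof=1) + var_m)

# Coefficient on mismeasured schooling in the regression that includes the control.
beta_b_with_x = np.cov(y, s_obs_tilde)[0, 1] / np.var(s_obs_tilde, ddof=1)

print(f"r = {r:.3f}, r_tilde = {r_tilde:.3f}")
print(f"coefficient with control: {beta_b_with_x:.3f} (r_tilde * beta = {r_tilde * beta:.3f})")
```

In this setup the control removes more than half of the real variation in schooling but none of the noise, so reliability drops from about 0.80 to about 0.64 and the coefficient on schooling shrinks accordingly.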

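IV Clears Our Path
Without covariates, the IV formula for the coefficient on $S_i$ in a bivariate regression is

$$\beta_{IV} = \frac{C(Y_i, Z_i)}{C(S_i, Z_i)}, \qquad (6.12)$$

where $Z_i$ is the instrument. In Section 6.2, for example, we used cross-sibling reports to instrument for possibly mismeasured self-reported schooling. Provided the instrument is uncorrelated with the measurement error and the residual, $e_i$, in equations like (6.6), IV eliminates the bias due to mismeasured $S_i$.
To see why IV works in this context, use equations (6.6) and (6.7) to substitute for $Y_i$ and $S_i$ in equation (6.12):

$$\beta_{IV} = \frac{C(Y_i, Z_i)}{C(S_i, Z_i)} = \frac{C(\alpha + \beta S_i^* + e_i,\; Z_i)}{C(S_i^* + m_i,\; Z_i)} = \frac{\beta\,C(S_i^*, Z_i) + C(e_i, Z_i)}{C(S_i^*, Z_i) + C(m_i, Z_i)}.$$

Our discussion of the mistakes in Karen and Sharon’s reports of one another’s schooling assumes that $C(e_i, Z_i) = C(m_i, Z_i) = 0$. This in turn implies that

$$\beta_{IV} = \beta\,\frac{C(S_i^*, Z_i)}{C(S_i^*, Z_i)} = \beta.$$

This happy conclusion comes from our assumption that the only reason $Z_i$ is correlated with wages is because it’s correlated with $S_i^*$. Since $S_i = S_i^* + m_i$, and $m_i$ is unrelated to $Z_i$, the usual IV miracle goes through.

PO: That is severely cool.
Kung Fu Panda 2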

1 See “‘I’m Just a Late Bloomer’: Britain’s Oldest Student Graduates with a Degree in Military Intelligence Aged 91,” The Daily Mail, May 21, 2012.
2 Mincer’s work appears in his landmark book, Schooling, Experience, and Earnings, Columbia University Press and the National Bureau of Economic Research, 1974.
3 The relationship between experience and earnings described by these
estimates reflects a gradual decline in earnings growth with age. To
see this, suppose we increase $X_i$ from a value $x$ to $x + 1$. The
term $X_i$ increases by 1, while $X_i^2$ increases by

$$(x + 1)^2 - x^2 = 2x + 1.$$

The net effect of a 1-year experience increase is therefore

$$.081 - .0012(2x + 1) = .0798 - .0024x.$$

The first year of experience is therefore estimated to boost earnings
by almost 8% while the tenth year of experience increases earnings by
only about 5.6%. In fact, the experience profile, as this relationship
is called, flattens out completely after about 30 years of experience.
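
The arithmetic in this footnote can be checked with a few lines of code. The coefficients below (.081 on experience and −.0012 on its square) are assumptions: they are not stated in this passage and were chosen because they reproduce the 8% and 5.6% figures quoted above.

```python
# Assumed coefficients on experience and experience squared (hypothetical here).
B_X = 0.081
B_X2 = -0.0012


def marginal_effect(x):
    """Change in predicted log earnings when experience rises from x to x + 1."""
    return B_X * 1 + B_X2 * ((x + 1) ** 2 - x ** 2)


first_year = marginal_effect(0)     # moving from 0 to 1 year of experience
later_year = marginal_effect(10)    # moving from 10 to 11 years
flat_point = -B_X / (2 * B_X2)      # experience level where the quadratic peaks

print(f"{first_year:.4f} {later_year:.4f} {flat_point:.2f}")
```

With these assumed values, the first year adds roughly 0.08 log points, the step from 10 to 11 years adds roughly 0.056, and the profile peaks a little past 30 years.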
4 Zvi Griliches, “Estimating the Returns to Schooling: Some
Econometric Problems,” Econometrica, vol. 45, no. 1, January 1977,
pages 1–22.
5 Attentive readers will notice that potential experience, itself a
downstream consequence of schooling, also falls under the category of
bad control. In principle, the bias here can be removed by using age
and its square to instrument potential experience and its square. As
in the studies referenced in the rest of this chapter, we might also
simply replace the experience control with age, thereby targeting a
net schooling effect that does not adjust for differences in potential
experience.
6 Orley Ashenfelter and Alan B. Krueger, “Estimates of the Economic
Returns to Schooling from a New Sample of Twins,” American Economic
Review, vol. 84, no. 5, December 1994, pages 1157–1173, and Orley
Ashenfelter and Cecilia Rouse, “Income, Schooling, and Ability:
Evidence from a New Sample of Identical Twins,” Quarterly Journal of
Economics, vol. 113, no. 1, February 1998, pages 253–284.

7 Estimates of this differenced model can also be obtained by adding a
dummy for each family to an undifferenced model fit in a sample that
includes both twins. Family dummies are like selectivity-group dummies
in equation (2.2) in Chapter 2 and state dummies in equation (5.5) in
Section 5.2. With only two observations per family, models estimated
after differencing across twins within families to produce a single
observation per family generate estimates of the returns to schooling
identical to those generated by “dummying out” each family in a pooled
sample that includes both twins.
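
The equivalence described in this footnote is easy to verify numerically. The sketch below uses simulated data with hypothetical parameter values. It compares the slope from the differenced regression (one observation per family, no intercept) with the schooling coefficient from a pooled regression that includes a dummy for each family.

```python
import numpy as np

rng = np.random.default_rng(2)
n_fam = 500

# Hypothetical data: ability is shared by the two twins in each family.
ability = rng.normal(0.0, 1.0, n_fam)
s = 12 + ability[:, None] + rng.normal(0.0, 1.5, (n_fam, 2))
y = 0.08 * s + ability[:, None] + rng.normal(0.0, 0.3, (n_fam, 2))

# (a) Difference across twins, then regress without an intercept.
ds = s[:, 0] - s[:, 1]
dy = y[:, 0] - y[:, 1]
beta_diff = (ds @ dy) / (ds @ ds)

# (b) Pooled sample of both twins with one dummy per family.
fam_id = np.repeat(np.arange(n_fam), 2)   # matches the row order of s.reshape(-1)
dummies = (fam_id[:, None] == np.arange(n_fam)[None, :]).astype(float)
X = np.column_stack([s.reshape(-1), dummies])
coef, *_ = np.linalg.lstsq(X, y.reshape(-1), rcond=None)
beta_dummies = coef[0]

print(beta_diff, beta_dummies)
```

The two estimates agree up to floating-point error, which holds here because each family contributes exactly two observations.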
8 Daron Acemoglu and Joshua D. Angrist, “How Large Are Human-Capital
Externalities? Evidence from Compulsory-Schooling Laws,” in Ben S.
Bernanke and Kenneth Rogoff (editors), NBER Macroeconomics Annual
2000, vol. 15, MIT Press, 2001, pages 9–59.
9 Joshua D. Angrist and Alan B. Krueger, “Does Compulsory School
Attendance Affect Schooling and Earnings?” Quarterly Journal of
Economics, vol. 106, no. 4, November 1991, pages 979–1014.

10 Kasey Buckles and Daniel M. Hungerman, “Season of Birth and Later
Outcomes: Old Questions, New Answers,” NBER Working Paper 14573,
National Bureau of Economic Research, December 2008. See also John
Bound, David A. Jaeger, and Regina M. Baker, who were the first to
caution that IV estimates using QOB instruments might not have a
causal interpretation in “Problems with Instrumental Variables
Estimation When the Correlation between the Instruments and the
Endogeneous Explanatory Variable Is Weak,” Journal of the American
Statistical Association, vol. 90, no. 430, June 1995, pages 443–450.
11 For more on this point, see Joshua D. Angrist and Alan B. Krueger,
“The Effect of Age at School Entry on Educational Attainment: An
Application of Instrumental Variables with Moments from Two Samples,”
Journal of the American Statistical Association, vol. 87, no. 418,
June 1992, pages 328–336.
12 Damon Clark and Paco Martorell, “The Signaling Value of a High
School Diploma,” Journal of Political Economy, vol. 122, no. 2, April
2014, pages 282–318.

ABBREVIATIONS AND ACRONYMS

Abbreviations and acronyms are introduced on the page indicated in
parentheses.


2SLS two-stage least squares, an instrumental variables estimator that
replaces the regressor being instrumented with fitted values from the
first stage (p. 132)

ALS a study by Joshua D. Angrist, Victor Lavy, and Analia Schlosser on
the causal link between quantity and quality of children in Israeli
families (p. 127)

BLS Boston Latin School, the top school in the Boston exam school
hierarchy (p. 164)

C&B College and Beyond, a data set (p. 52)

CEF conditional expectation function, the population average of $Y_i$
with $X_i$ held fixed (p. 82)

CLT Central Limit Theorem, a theorem which says that almost any sample
average is approximately normally distributed, with the accuracy of
the approximation increasing as the sample size increases (p. 39)

DD differences-in-differences, an econometric tool that compares
changes over time in treatment and control groups (p. 178)

HIE Health Insurance Experiment, a large randomized trial conducted by
the RAND Corporation that provided treated families with different
types of health insurance coverage (p. 16)

ITT intention-to-treat effect, the average causal effect of an offer
of treatment (p. 119)

IV instrumental variables, an econometric tool used to eliminate
omitted variables bias or attenuation bias due to measurement error
(p. 98)

JTPA Job Training Partnership Act, an American training program that
included a randomized evaluation (p. 122)

KIPP Knowledge Is Power Program, a network of charter schools in the
United States (p. 99)

LATE local average treatment effect, the average causal effect of
treatment on compliers (p. 109)

LIML limited information maximum likelihood estimator, an alternative
to two-stage least squares with less bias (p. 145)

LLN Law of Large Numbers, a statistical law according to which sample
averages approach the corresponding population average (expectation)
as the sample size grows (p. 13)

MDVE Minneapolis Domestic Violence Experiment, a randomized evaluation
of policing strategies to combat domestic violence (p. 116)

MLDA minimum legal drinking age (p. 148)

MVA motor vehicle accidents (p. 159)

NHIS National Health Interview Survey, a data set (p. 3)

OHP Oregon Health Plan, the Oregon version of Medicaid, for which
eligibility was partly determined by a lottery (p. 25)

OLS ordinary least squares, the sample analog of population regression
coefficients; we use OLS to estimate regression models (p. 58)

OVB omitted variables bias, the relationship between regression
coefficients in models with different sets of covariates (p. 69)

QOB quarter of birth (p. 229)



RD regression discontinuity design, an econometric tool used when
treatment, the probability of treatment, or average treatment
intensity is a known, discontinuous function of a covariate (p. 147)

RSS residual sum of squares, the expected (population average of)
squared residuals in regression analysis (p. 86)

TOT treatment effect on the treated, the average causal effect of
treatment in the treated population (p. 114)

WLS weighted least squares, a regression estimator that weights
observations summed in the RSS (p. 202)



EMPIRICAL NOTES

Tables

Table 1.1 Health and demographic characteristics of insured and
uninsured couples in the NHIS

Data source. The 2009 NHIS data are from the Integrated Health
Interview Series (IHIS) and are available at www.ihis.us/ihis/.

Sample. The sample used to construct this table consists of husbands
and wives aged 26–59, with at least one spouse working.

Variable definitions. Insurance status is determined by the IHIS
variable UNINSURED. The health index is on a five-point scale, where
1 = poor, 2 = fair, 3 = good, 4 = very good, 5 = excellent; this comes
from the variable HEALTH. Education is constructed from the variable
EDUC and measures completed years of schooling. High school graduates
and GED holders are assigned 12 years of schooling. People with some
college but no degree, and those with an associate’s degree, are
assigned 14 years of schooling. Bachelor’s degree holders are assigned
16 years of schooling, and holders of higher degrees are assigned 18
years of schooling. Employed individuals are those “working for pay”
or “with job but not at work” as indicated by the variable EMPSTAT.
Family income is constructed by assigning to each bracket of the IHIS
income variable (INCFAM07ON) the average household income for that
bracket based on data from the 2010 Current Population Survey (CPS)
March supplement (using the CPS variable FTOTVAL). The CPS sample used
for this purpose omits observations with nonpositive household income
as well as observations with negative weights. CPS income is censored
at the 98th percentile; values above the 98th percentile are assigned
1.5 times the 98th percentile value.
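
The censoring rule just described can be sketched in a few lines. This is an illustration only: the function name `topcode_98` is hypothetical, and the percentile here is unweighted and computed with NumPy's default interpolation, which may differ from how the CPS figures were actually processed.

```python
import numpy as np


def topcode_98(income):
    """Replace values above the 98th percentile with 1.5 times that percentile.

    Assumes an unweighted percentile; the notes do not say how it was computed.
    """
    income = np.asarray(income, dtype=float)
    cutoff = np.percentile(income, 98)
    return np.where(income > cutoff, 1.5 * cutoff, income)


# Toy incomes 1, 2, ..., 100: only the two largest values exceed the cutoff.
toy = np.arange(1, 101)
adjusted = topcode_98(toy)
print(adjusted[-3:])
```

In the toy example the two top values are replaced by the same number, 1.5 times the 98th percentile, while all other values are left unchanged.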

Additional table notes. All calculations are weighted using the
variable PERWEIGHT. Robust standard errors are shown in parentheses.

Table 1.3 Demographic characteristics and baseline health in the RAND
HIE

Data source. The RAND HIE data are from Joseph P. Newhouse, “RAND
Health Insurance Experiment [in Metropolitan and Non-Metropolitan
Areas of the United States], 1974–1982,” ICPSR06439-v1,
Inter-University Consortium for Political and Social Research, 1999.
This data set is available at http://doi.org/10.3886/ICPSR06439.v1.

Sample. The sample used to construct this table consists of adult
participants (14 years old and older) with valid enrollment,
expenditure, and study exit data.

Variable definitions. The demographic variables in panel A and the
health characteristics in panel B are measured at the experimental
baseline. The general health index rates the participant’s perception
of his or her general health at the time of enrollment. Higher values
indicate more favorable self-ratings of health; less health-related
worry; and greater perceived resistance to illness. The mental health
index rates the participant’s mental health, combining measures of
anxiety, depression, and psychological well-being. Higher values
indicate better mental health. The education variable measures number
of years of completed education and is only defined for individuals 16
years and older. Family income is in constant 1991 dollars.

Additional table notes. Standard errors in parentheses are clustered
at the family level.


Table 1.4 Health expenditure and health outcomes in the RAND HIE

Data source. See note for Table 1.3.

Sample. See note for Table 1.3. The panel A sample contains multiple
observations for the same person from different follow-up years.

Variable definitions. See notes for Table 1.3. Variables in panel A
are constructed from administrative claims data for each year, and
variables in panel B are measured upon exit from the experiment.
Face-to-face visits counts the number of face-to-face visits with
health professionals that were covered by insurance (excluding dental,
psychotherapy, and radiology/anaesthesiology/pathology-only visits).
Hospital admissions indicates the total number of covered participant
hospitalizations, including admissions for reasons of mental health.
The expenditure variables are in constant 1991 dollars.

Additional table notes. Standard errors in parentheses are clustered
at the family level.

Table 1.5 OHP effects on insurance coverage and health-care use

Sources. The numbers in columns (1) and (2) are from Amy N. Finkelstein
et al., “The Oregon Health Insurance Experiment: Evidence from the
First Year,” Quarterly Journal of Economics, vol. 127, no. 3, August
2012, pages 1057–1106. Our numbers come from the original as follows:

▪ row (1) in panel A from row (1), columns (1) and (2) in Table III;

▪ row (2) in panel A from row (1), columns (1) and (2) in Table IV;

▪ row (1) in panel B from row (2), columns (5) and (6) in Table V; and

▪ row (2) in panel B from row (1), columns (1) and (2) in Table V.

The numbers reported in columns (3) and (4) are from Sarah L. Taubman
et al., “Medicaid Increases Emergency-Department Use: Evidence from
Oregon’s Health Insurance Experiment,” Science, vol. 343, no. 6168,
January 17, 2014, pages 263–268. Our numbers come from the original as
follows:

▪ row (1) from row (1), columns (1) and (2) in Table S7;

▪ row (3) from row (1), columns (3) and (4) in Table S2;

▪ row (4) from row (1), columns (7) and (8) in Table S2.

Samples. Columns (1) and (2) in panel A use the full sample analyzed
in the hospital discharge and mortality data in Finkelstein et al.
(2012). Columns (3) and (4) in panel A are drawn from the emergency
department records of 12 Portland area emergency departments for
visits occurring between March 10, 2008 and September 30, 2009. Panel
B uses the follow-up survey data analyzed in Finkelstein et al.
(2012).

Variable definitions. The variable in row (1) in panel A is a dummy
for Medicaid enrollment in the study period (from lottery notification
through the end of September 2009), obtained from Medicaid
administrative data. The variable in row (2) in panel A is a dummy
equal to 1 if the respondent had a non-childbirth hospitalization from
notification until the end of August 2009. The variables in rows (3)
and (4) in panel A indicate any emergency department visit and count
the number of such visits. The variable in row (1) in panel B measures
the number of non-childbirth-related outpatient visits in the past 6
months. The variable in row (2) in panel B is a dummy for whether the
patient had a prescription drug at the time of the survey.

Additional table notes. Standard errors in parentheses are clustered
at the household level.

Table 1.6 OHP effects on health indicators and financial health

Sources. See notes for Table 1.5. The numbers in row (1) in panel A in
this table are obtained from row (2), columns (1) and (2) in Table IX
in Finkelstein et al. (2012). The numbers reported in columns (3) and
(4) are from Katherine Baicker et al., “The Oregon Experiment: Effects
of Medicaid on Clinical Outcomes,” New England Journal of Medicine,
vol. 368, no. 18, May 2, 2013, pages 1713–1722.
The numbers in columns (3) and (4) come from columns (1) and (2) in
the original as follows:

▪ row (2) in panel A from row (3) in Table S2;
▪ row (3) in panel A from row (2) in Table S2;
▪ row (4) in panel A from row (6) in Table S1;
▪ row (5) in panel A from row (1) in Table S1;
▪ row (1) in panel B from row (3) in Table S3; and
▪ row (2) in panel B from row (4) in Table S3.

We thank Amy Finkelstein and Allyson Barnett for providing unpublished
standard errors for estimates from Baicker et al. (2013).

Samples. Columns (1) and (2) use the sample from the (first) follow-up
survey analyzed in Finkelstein et al. (2012). Columns (3) and (4) use
the sample from the (second) follow-up survey analyzed in Baicker et
al. (2013).

Variable definitions. The variable in row (1) in panel A is a dummy
for whether the respondent rated his or her health as good, very good,
or excellent (as compared to fair or poor). Rows (2) and (3) in panel
A contain the SF-8 physical and mental component scores. Higher SF-8
scores indicate better health. The scale is normalized to have a mean
of 50 and standard deviation of 10 in the U.S. population; the range
is 0 to 100. See pages 14–16 of the appendix of Baicker et al. (2013)
for descriptions of the subjective and clinical measures of health
used in rows (2)–(5). The variable in row (1) in panel B is a dummy
for whether health expenditures surpassed 30% of total income in the
past 12 months. The variable in row (2) in panel B is a dummy for
whether the respondent had any medical debt at the time of the survey.

Additional table notes. Standard errors in parentheses are clustered
at the household level.

Table 2.2 Private school effects: Barron’s matches

Data sources. The data used to construct this table are described in
Stacy Berg Dale and Alan B. Krueger, “Estimating the Payoff to
Attending a More Selective College: An Application of Selection on
Observables and Unobservables,” Quarterly Journal of Economics, vol.
117, no. 4, November 2002, pages 1491–1527.
These data are from the College and Beyond (C&B) survey linked to a
survey administered by Mathematica Policy Research, Inc., in 1995–1997
and to files provided by the College Entrance Examination Board and
the Higher Education Research Institute (HERI) at the University of
California, Los Angeles. The college selectivity category is as
determined by Barron’s Profiles of American Colleges 1978, Barron’s
Educational Series, 1978.

Sample. The sample consists of people from the 1976 college entering
cohort who appear in the C&B survey and who were full-time workers in
1995. The analysis excludes students from historically black
universities (Howard University, Morehouse College, Spelman College,
and Xavier University; see pages 1500–1501 in Dale and Krueger (2002)
for details). The sample is further restricted to applicant
selectivity groups containing some students who attended public
universities and some students who attended private universities.

Variable definitions. The dependent variable is the log of pretax
annual earnings in 1995. The question in the C&B survey has 10 income
brackets; see footnote 8 on pages 1501–1502 in Dale and Krueger (2002)
for exact construction of the earnings variable. The applicant group
variable is formed by matching students according to the list of
categories of schools where they applied and were accepted or rejected
(from the C&B survey), where school categories are based on the
Barron’s college selectivity measure (see pages 1502–1503 in Dale and
Krueger (2002) for more on this). The variable own SAT score/100
measures the respondent’s SAT score divided by 100. See page 1508 in
Dale and Krueger (2002) for the definition of the parental income
variable (this is imputed using parental occupation and schooling).
Variables female, black, Hispanic, Asian, other/missing race, high
school top 10%, high school rank missing, and athlete are dummies.

Additional table notes. Regressions are weighted to make the sample
representative of the population of students at C&B institutions (see
page 1501 in Dale and Krueger (2002) for details). Standard errors in
parentheses are clustered at the level of school attended.

Table 2.3 Private school effects: Average SAT score controls

Data sources. See notes for Table 2.2.

Sample. See notes for Table 2.2. The sample used to construct this
table contains all C&B students and not just those with Barron’s
selectivity group matches.

Variable definitions. See notes for Table 2.2. The variable average
SAT score of schools applied to/100 is constructed as follows: the
average SAT score (divided by 100) is computed for each university
using HERI data and then averaged over the universities where each
respondent applied.

Additional table notes. Regressions are weighted to make the sample
representative of the population of students at C&B institutions.
Standard errors in parentheses are clustered at the university level.

Table 2.4 School selectivity effects: Average SAT score controls

Data sources. See notes for Table 2.2.

Sample. See notes for Table 2.3.

Variable definitions. See notes for Table 2.3. The variable school
average SAT score/100 is the average SAT score (divided by 100) of the
students at the school the respondent attended.

Additional table notes. See notes for Table 2.3.

Table 2.5 Private school effects: Omitted variables bias

Data sources. See notes for Table 2.2.

Sample, variable definitions, and additional table notes. See notes
for Table 2.3.

Table 3.1 Analysis of KIPP lotteries

Data sources. Demographic information on students in Lynn public
schools is from the Massachusetts Student Information Management
System. Demographic and lottery information for KIPP applicants is
from KIPP Lynn school records. Scores are from the Massachusetts
Comprehensive Assessment System (MCAS) tests in math and English
language arts. For details, see Joshua D. Angrist et al., “Who
Benefits from KIPP?” Journal of Policy Analysis and Management, vol.
31, no. 4, Fall 2012, pages 837–860.

Sample. The sample in column (1) contains students who attended fifth
grade in Lynn public schools between fall 2005 and spring 2008. The
samples in columns (2)–(5) are drawn from the set of KIPP Lynn
applicants for fifth- and sixth-grade entry in the same period.
Applicants with siblings already enrolled in KIPP or who went directly
onto the waiting list are excluded (see footnote 14 in Angrist et al.
(2012)). Lottery comparisons are limited to the 371 applicants with
follow-up data.

Variable definitions. Hispanic, black, female, free/reduced-price
lunch, and enrolled at KIPP are dummy variables. The math and verbal
scores for students in a given grade are standardized with respect to
the reference population of all students in Massachusetts in that
grade. Baseline scores are from fourth-grade tests. Outcome scores are
from the grades following the application grade, specifically,
fifth-grade scores for those who applied to KIPP when they were in
fourth grade and sixth-grade scores for those who applied to KIPP
while in fifth.

Additional table notes. Robust standard errors are reported in
parentheses.

Table 3.3 Assigned and delivered treatments in the MDVE

Data sources. The numbers reported in this table are from Table 1 in
Lawrence W. Sherman and Richard A. Berk, “The Specific Deterrent
Effects of Arrest for Domestic Assault,” American Sociological Review,
vol. 49, no. 2, April 1984, pages 261–272.

Table 3.4 Quantity-quality first stages

Data sources. The data used to construct this table are from the 20%
public-use microdata samples from the 1983 and 1995 Israeli Censuses,
linked with nonpublic information on parents and siblings from the
population registry. For details, see Joshua D. Angrist, Victor Lavy,
and Analia Schlosser, “Multiple Experiments for the Causal Link
between the Quantity and Quality of Children,” Journal of Labor
Economics, vol. 28, no. 4, October 2010, pages 773–824.

Sample. The sample includes Jewish, first-born non-twins aged 18–60.
The sample is restricted to individuals whose mothers were born after
1930 and who had their first birth between the ages of 15 and 45.

Variable definitions. The twins instrument (second-born twins) is a
dummy variable equal to 1 in families where the second birth produces
twins. The sex-mix instrument (same sex) is a dummy variable equal to
1 if the second and first born are same-sex.

Additional table notes. In addition to a dummy for males, additional
covariates are dummies for census year, parents’ ethnicities (Asian or
African origin, from the former Soviet Union, from Europe or America),
and missing month of birth; age, mother’s age, mother’s age at first
birth, and mother’s age at immigration (where relevant).
The first stages in this table go with the second-stage estimates in
the first two rows of Table 3.5. Robust standard errors are reported
in parentheses.

Table 3.5 OLS and 2SLS estimates of the quantity-quality trade-off

Data sources. See notes for Table 3.4.

Sample. See notes for Table 3.4. Estimates in the third and fourth
rows of the table are limited to subjects aged 24–60 at the time of
the census. The college graduation outcome has a few additional
missing values.

Variable definitions. See notes for Table 3.4. The dependent variables
in the second, third, and fourth rows are dummy variables.

Additional table notes. Covariates are listed in the notes for Table
3.4.

Table 4.1 Sharp RD estimates of MLDA effects on mortality

Data sources. Mortality data are from the National Center for Health
Statistics (NCHS) confidential mortality detail files for 1997–2004.
These data are derived from death certificates and cover all deaths in
the United States in the study period. Population estimates in the
denominator are from the 1970–1990 U.S. Censuses. For details, see
pages 166–169 of Christopher Carpenter and Carlos Dobkin, “The Effect
of Alcohol Consumption on Mortality: Regression Discontinuity Evidence
from the Minimum Drinking Age,” American Economic Journal: Applied
Economics, vol. 1, no. 1, January 2009, pages 164–182.

Sample. The sample is restricted to fatalities of young adults aged
19–22. The data used here consist of averages in 48 cells defined by
age in 30-day intervals.

Variable definitions. Cause of death is reported on death certificates
in the NCHS data. Causes are divided into internal and external, with
the latter split into mutually exclusive subcategories: homicide,
suicide, motor vehicle accidents, and other external causes. A
separate category for alcohol-related causes covers all deaths for
which alcohol was mentioned on the death certificate. Outcomes are
mortality rates per 100,000, where the denominator comes from census
population estimates.

Additional table notes. Robust standard errors are reported in
parentheses.

Table 5.1 Wholesale firm failures and sales in 1929 and 1933

Source. Numbers in this table are from Table 8 (page 1066) in Gary
Richardson and William Troost, “Monetary Intervention Mitigated
Banking Panics during the Great Depression: Quasi-Experimental
Evidence from a Federal Reserve District Border, 1929–1933,” Journal
of Political Economy, vol. 117, no. 6, December 2009, pages 1031–1073.

Data sources. Data are from the 1935 Census of American Business, as
compiled by Richardson and Troost (2009).

Table 5.2 Regression DD estimates of MLDA effects on death rates

Data sources. MLDA provisions by state and year are from “Minimum
Purchase Age by State and Beverage, 1933–Present,” DISCUS (Distilled
Spirits Council of the US), 1996; Alexander C. Wagenaar, “Legal
Minimum Drinking Age Changes in the United States: 1970–1981,” Alcohol
Health and Research World, vol. 6, no. 2, Winter 1981–1982, pages
21–26; and William Du Mouchel, Allan F. Williams, and Paul Zador,
“Raising the Alcohol Purchase Age: Its Effects on Fatal Motor Vehicle
Crashes in Twenty-Six States,” Journal of Legal Studies, vol. 16, no.
1, January 1987, pages 249–266. We follow the coding of these laws
implemented in Karen E. Norberg, Laura J. Bierut, and Richard A.
Grucza, “Long-Term Effects of Minimum Drinking Age Laws on Past-Year
Alcohol and Drug Use Disorders,” Alcoholism: Clinical and Experimental
Research, vol. 33, no. 12, September 2009, pages 2180–2190, correcting
minor coding errors.
Mortality information comes from the Multiple Cause-of-Death Mortality
Data available from the National Vital Statistics System of the
National Center for Health Statistics, obtained from
www.nber.org/data/mortality-data.html. Population data are from the
U.S. Census Bureau’s intercensal population estimates available
online. See:

▪ http://www.census.gov/popest/data/state/asrh/pre-1980/tables/e7080sta.txt;

▪ http://www.census.gov/popest/data/state/asrh/1980s/80s_st_age_sex.html
and

▪ http://www.census.gov/popest/data/state/asrh/1990s/st_age_sex.html

Sample. The data set used to construct these estimates contains death
rates of 18–20-year-olds between 1970 and 1983 by state and year.

Variable definitions. The mortality rate measures the number of
18–20-year-olds who died in a given state and year (per 100,000), by
cause of death (all deaths, motor vehicle accidents, suicide, and all
internal causes). The MLDA regressor measures the fraction of
18–20-year-olds who are legal drinkers in a given state and year. This
fraction is calculated using MLDA change dates in each state and
accounts for grandfathering clauses. The calculation assumes that
births are distributed uniformly throughout the year.

Additional table notes. Regressions in columns (3) and (4) are
weighted by state population aged 18–20. Standard errors in
parentheses are clustered at the state level.

Table 5.3 Regression DD estimates of MLDA effects controlling for beer
taxes

Data sources. See notes for Table 5.2. Beer tax data are from Norberg
et al., “Long-Term Effects,” Alcoholism: Clinical and Experimental
Research, 2009.

Sample. See notes for Table 5.2.

Variable definitions. See notes for Table 5.2. The beer tax is
measured in constant 1982 dollars per gallon.

Additional table notes. See notes for Table 5.2.

Table 6.2 Returns to schooling for Twinsburg twins

Data sources. The twins data are detailed in Orley Ashenfelter and
Cecilia Rouse, “Income, Schooling, and Ability: Evidence from a New
Sample of Identical Twins,” Quarterly Journal of Economics, vol. 113,
no. 1, February 1998, pages 253–284. These data are available at
http://dataspace.princeton.edu/jspui/handle/88435/dsp01xg94hp567
This includes data used in Orley Ashenfelter and Alan B. Krueger,
“Estimates of the Economic Returns to Schooling from a New Sample of
Twins,” American Economic Review, vol. 84, no. 5, December 1994, pages
1157–1173.

Sample. The sample consists of 680 twins who were interviewed at the
Twinsburg Twins Festival in 1991, 1992, and 1993. The sample is
restricted to U.S.-resident twins who have been employed in the 2
years preceding the interview.

Variable definitions. Estimates in this table were constructed using
self-reported years of education and sibling reports, defined as an
individual’s report of the number of years of education attained by
his or her twin sibling.

Additional table notes. Robust standard errors are reported in
parentheses.

Table 6.3 Returns to schooling using child labor law instruments

Data sources. The data used to construct this table are detailed in
Daron Acemoglu and Joshua D. Angrist, “How Large Are Human-Capital
Externalities? Evidence from Compulsory-Schooling Laws,” in Ben S.
Bernanke and Kenneth Rogoff (editors), NBER Macroeconomics Annual
2000, vol. 15, MIT Press, 2001, pages 9–59.

Sample. The sample consists of U.S.-born white men aged 40–49,
interviewed in U.S. censuses from 1950 through 1990. The sample was
drawn from the integrated public use micro data samples (IPUMS) for
these censuses.

Variable definitions. The dependent variable is the log weekly wage.
The schooling variable is top-coded at 17. The 1990 Census schooling
variable is partly imputed using categorical means from other sources.
The child labor law instruments are dummies indicating the schooling
required before work was allowed in the respondent’s state of birth,
according to laws in place at the time the respondent was 14 years
old. For details, see pages 22–28 and Appendix B in Acemoglu and
Angrist (2001).

Additional table notes. All regressions are weighted using the IPUMS
weighting variable. Standard errors in parentheses are clustered at
the state level.

Table 6.4 IV recipe for an estimate of the returns to schooling using
a single quarter of birth instrument

Data sources. The data used to construct this table are detailed in
Joshua D. Angrist and Alan B. Krueger, “Does Compulsory School
Attendance Affect Schooling and Earnings?” Quarterly Journal of
Economics, vol. 106, no. 4, November 1991, pages 979–1014.

Sample. The sample consists of men born between 1930 and 1939 in the
1980 U.S. Census 5% public use sample. Observations with allocated
values were excluded from the analysis, as were respondents who
reported no wage income or no weeks worked in 1979. See pages
1011–1012 in Appendix 1 in Angrist and Krueger (1991).

Variable definitions. Log weekly wages in 1979 are computed by
dividing annual earnings by weeks worked. The schooling variable is
the highest grade completed.

Additional table notes. Robust standard errors are reported in
parentheses.

Table 6.5 Returns to schooling using alternative quarter of birth
instruments

Data sources, sample, variable definitions, and additional table
notes. See notes for Table 6.4.

Figures

Figure 2.1 The CEF and the regression line

Source. This is Figure 3.1.2 on page 39 in Joshua D. Angrist and
Jörn-Steffen Pischke, Mostly Harmless Econometrics: An Empiricist’s
Companion, Princeton University Press, 2009.

Sample. See notes for Table 6.4.

Variable definitions. The dependent variable is the log weekly wage.
The schooling variable is the highest grade completed.

Figure 3.1 Application and enrollment data from KIPP Lynn lotteries

Data sources. See notes for Table 3.1.

Sample. The KIPP data set analyzed here contains first-time applicants
for fifth- and sixth-grade seats in 2005–2008. This sample contains
446 applicants and includes some applicants without follow-up data.

Figure 3.2 IV in school: the effect of KIPP attendance on math scores

Data sources. See notes for Table 3.1.

Sample. The sample here matches that in column (3) of Table 3.1.

Figure 4.1 Birthdays and funerals

Source. This figure is from Appendix A of Christopher Carpenter and
Carlos Dobkin, “The Effect of Alcohol Consumption on Mortality:
Regression Discontinuity Evidence from the Minimum Drinking Age,”
American Economic Journal: Applied Economics, vol. 1, no. 1, January
2009, pages 164–182.

Additional figure notes. The figure plots the number of deaths in the
United States between 1997 and 2003 by age in days measured relative
to birthdays.

Figure 4.2 A sharp RD estimate of MLDA mortality effects

Data sources and sample. See notes for Table 4.1.

Variable definitions. See notes for Table 4.1. The Y-axis measures
mortality (per 100,000) from all causes. Averages in the figure are
for 48 cells defined by age in 30-day intervals.

Figure 4.4 Quadratic control in an RD design

Data sources, sample, and variable definitions. See notes for Table
4.1.

Additional figure notes. See notes for Figure 4.2.

Figure 4.5 RD estimates of MLDA effects on mortality by cause of death

Data sources and sample. See notes for Table 4.1.

Variable definitions. See notes for Table 4.1. The Y-axis measures
mortality rates per 100,000 population by cause of death. These are
averages for 48 cells defined by age in 30-day intervals.

Additional figure notes. See notes for Figure 4.2.

Figure 4.6 Enrollment at BLS

Data sources. This figure uses Boston Public Schools (BPS) data on
exam school applications, including information on Independent School
Entrance Exam (ISEE) scores, school enrollment status between 1999 and
2008, and MCAS scores from school years 1999/2000 through 2008/2009.
For details, see pages 142–143 and appendix C in the supplement to
Atila Abdulkadiroglu, Joshua D. Angrist, and Parag Pathak, “The Elite
Illusion: Achievement Effects at Boston and New York Exam Schools,”
Econometrica, vol. 81, no. 1, January 2014, pages 137–196. The
supplement is available at
http://www.econometricsociety.org/ecta/supmat/10266_data_description.pdf

Sample. The sample includes BPS-enrolled students who applied to
Boston Latin School (BLS) for seventh grade seats from 1999 to 2008.
The sample is restricted to students for whom BLS is either a first
choice or a top choice after eliminating schools where the student
didn’t qualify.

Variable definitions. The running variable, labeled “entrance exam
score” in the figure, is a weighted average of applicants’ ISEE total
score and GPA. Exam school enrollment is measured using data from the
school year following application.

Additional figure notes. Running variable values in the figure were
normalized by subtracting the lowest score offered a seat at BLS in a
given year, so that the cutoff for each year is 0. The smoothed lines
in the figures are fitted values from regression models estimated with
data near each point. These models regress the dependent variable on
the running variable for observations with values inside a
nonparametric bandwidth. See Abdulkadiroglu et al. (2014) for details.
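
The normalization of the running variable described in these notes can be sketched as follows. The function name, the argument layout, and the toy scores are hypothetical, and the sketch assumes every application year has at least one applicant who was offered a seat.

```python
def normalize_running_variable(scores, offered, years):
    """Subtract each year's cutoff (lowest score offered a seat) from each score.

    scores, offered, and years are parallel lists with one entry per applicant.
    """
    cutoffs = {}
    for score, got_offer, year in zip(scores, offered, years):
        if got_offer:
            # Track the lowest offered score seen so far in this year.
            cutoffs[year] = min(score, cutoffs.get(year, score))
    return [score - cutoffs[year] for score, year in zip(scores, years)]


# Toy data: two application years with different cutoffs.
scores = [70, 80, 90, 60, 75]
offered = [False, True, True, False, True]
years = [2000, 2000, 2000, 2001, 2001]
centered = normalize_running_variable(scores, offered, years)
print(centered)
```

After centering, the marginal admitted applicant in each year sits at 0, so years with different raw cutoffs can be pooled on a common scale.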

Figure 4.7 Enrollment at any Boston exam school

Data sources, sample, and additional figure notes. See notes for
Figure 4.6.

Variable definitions. See notes for Figure 4.6. Enrollment at any exam
school indicates whether an applicant enrolled at Boston Latin
School, Boston Latin Academy, or the John D. O’Bryant High School
of Mathematics and Science.

Figure 4.8 Peer quality around the BLS cutoff

Data sources, sample, and additional figure notes. See notes for
Figure 4.6.

Variable definitions. See notes for Figure 4.6. For each exam school
applicant, peer quality is the average of the fourth-grade MCAS
math scores of his or her schoolmates in seventh grade, at any
school he or she attended in that grade.
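
One way to read this definition is as a leave-one-out school average. The sketch below works under that assumption (the notes do not say whether the applicant's own score is excluded); the tuple layout, the function name `peer_quality`, and the toy data are hypothetical.

```python
from collections import defaultdict


def peer_quality(students):
    """Average fourth-grade math score of each student's seventh-grade schoolmates.

    `students` is a list of (student_id, school, grade4_math_score) tuples.
    The student's own score is excluded (an assumption of this sketch).
    Returns a dict mapping student_id to the peer average, or None when
    the student has no schoolmates in the data.
    """
    totals = defaultdict(float)
    counts = defaultdict(int)
    for _, school, score in students:
        totals[school] += score
        counts[school] += 1

    result = {}
    for student_id, school, score in students:
        n_peers = counts[school] - 1
        if n_peers > 0:
            result[student_id] = (totals[school] - score) / n_peers
        else:
            result[student_id] = None
    return result


# Toy data, invented for illustration only.
students = [("a", "X", 1.0), ("b", "X", 3.0), ("c", "X", 5.0), ("d", "Y", 2.0)]
print(peer_quality(students))
```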

Figure 4.9 Math scores around the BLS cutoff

Data sources, sample, and additional figure notes. See notes for
Figure 4.6.

Variable definitions. See notes for Figure 4.6. The variable on the
Y-axis here is the average of seventh- and eighth-grade MCAS math
scores.

Figure 4.10 Thistlethwaite and Campbell’s Visual RD

Source. This is Figure 3 in Donald L. Thistlethwaite and Donald T.
Campbell, “Regression-Discontinuity Analysis: An Alternative to the
ex post facto Experiment,” Journal of Educational Psychology, vol. 51,
no. 6, December 1960, pages 309–317.

Sample. The sample contains 5,126 near winners and 2,848 near losers
of a Certificate of Merit in the 1957 National Merit Scholarship
competition. The running variable is the score on the College
Entrance Examination Board’s Scholarship Qualifying Test, now
known as the PSAT. The two outcome measures come from a survey
administered to all students in the sample approximately 6 months
after awards were announced.

Variable definitions. The two outcome variables are dummies for
whether a student plans to do 3 or more years of graduate study
(plotted as line I–I′), and whether a student plans to be a college
teacher or a scientific researcher (plotted as line J–J′).

Figure 5.1 Bank failures in the Sixth and Eighth Federal Reserve
Districts

Data sources. Daily data on the number of banks operating in
Mississippi were compiled by Gary Richardson and William Troost
and are described on pages 1034–1038 of Gary Richardson and
William Troost, “Monetary Intervention Mitigated Banking Panics
during the Great Depression: Quasi-Experimental Evidence from a
Federal Reserve District Border, 1929–1933,” Journal of Political
Economy, vol. 117, no. 6, December 2009, pages 1031–1073.

Sample. The bank operations data count all national and state chartered
banks in Mississippi, summed within Federal Reserve Districts and
in operation on July 1, 1930, and July 1, 1931.

Variable definitions. The Y-axis shows the number of banks open for
business on July 1 of a given year in a given district.

Figure 5.2 Trends in bank failures in the Sixth and Eighth Federal
Reserve Districts

Data sources. See notes for Figure 5.1.

Sample. The bank operations data count all national and state chartered
banks in Mississippi, summed within Federal Reserve Districts, in
operation between July 1929 and July 1934.

Variable definitions. See notes for Figure 5.1.

Figure 5.3 Trends in bank failures in the Sixth and Eighth Federal
Reserve Districts, and the Sixth District’s DD counterfactual

Data sources and variable definitions. See notes for Figure 5.1.

Sample. See notes for Figure 5.2.

Figure 5.7 John Snow’s DD recipe

Source. This is Table XII (on page 90) in John Snow, On the Mode of
Communication of Cholera, second edition, John Churchill, 1855.

Figure 6.1 The quarter of birth first stage

Data sources, sample, and variable definitions. See notes for Table
6.4.

Figure 6.2 The quarter of birth reduced form

Data sources, sample, and variable definitions. See notes for Table
6.4.

Figure 6.3 Last-chance exam scores and Texas sheepskin

Data sources. This figure was constructed using a data set linking
administrative high school records, administrative post-secondary
schooling records, and unemployment insurance earnings records
from Texas. These data are detailed on pages 288–289 of Damon
Clark and Paco Martorell, “The Signaling Value of a High School
Diploma,” Journal of Political Economy, vol. 122, no. 2, April 2014,
pages 282–318.

Sample. The sample consists of five cohorts of seniors taking their
last-chance high school exit exam in spring 1993–1997. Earnings data
are available through 2004, namely, for a period running from 7 to
11 years after the time of the last-chance exam.

Variable definitions. The running variable on the X-axis measures the
score on the last-chance exam, centered around the passing score.
Because the exit exam tests multiple subjects and students must pass
all to graduate, scores are normalized relative to passing thresholds
and the running variable is given by the minimum of these
normalized scores. The Y-axis plots the probability of diploma
receipt conditional on each score value.
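
The construction of this running variable can be sketched as below. This is an illustration under stated assumptions, not the procedure in Clark and Martorell (2014): normalization is taken to be a simple subtraction of each subject's passing threshold, a score equal to the threshold is taken to count as a pass, and the subject names and numbers are invented.

```python
def running_variable(scores, passing):
    """Minimum across subjects of (score - passing threshold).

    `scores` and `passing` are dicts keyed by subject name. Because a
    student must pass every subject, the smallest normalized score
    decides whether the student passes the exam as a whole.
    """
    return min(scores[subject] - passing[subject] for subject in passing)


def passes_exam(scores, passing):
    """True when every subject meets its threshold (running variable >= 0)."""
    return running_variable(scores, passing) >= 0


# Toy example, invented for illustration only.
passing = {"math": 70, "reading": 70, "writing": 70}
student = {"math": 72, "reading": 65, "writing": 80}
print(running_variable(student, passing))
print(passes_exam(student, passing))
```

Taking the minimum collapses several subject scores into one number with a single cutoff at zero, which is what allows the exam to be plotted on one X-axis.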

Figure 6.4 The effect of last-chance exam scores on earnings

Data sources and sample. See notes for Figure 6.3.

Variable definitions. The running variable on the X-axis is as in Figure
6.3. The Y-axis measures average annual earnings including zeros
for those not working conditional on each score value.



ACKNOWLEDGMENTS

Georg Graetz, Kyle Greenberg, Christian Perez, Miikka Rokkanen,
Daisy Sun, Chris Walters, and Alicia Xiong provided expert research
assistance. Noam Angrist, A. J. Bostian, Stephanie Cheng, Don Cox,
Dan Fetter, Yi Jie Gwee, Samuel Huang, Ayrat Maksyutov, Thomas
Pischke, and Melvyn Weeks gave us detailed readings and written
comments. Special thanks go to Gabriel Kreindler, who painstakingly
compiled and drafted the empirical appendix, and to Mayara Silva for
insightful proofreading and invaluable organization of the final
manuscript. The empirical applications were developed with the
assistance of indulgent masters Kitt Carpenter, Damon Clark, Stacy
Dale, Carlos Dobkin, Amy Finkelstein, Karen Norberg, Gary
Richardson, and Analia Schlosser, whom we thank for their help as
well as their data. Grateful thanks also go to our editor, Seth Ditchik
at Princeton University Press, for encouraging and guiding this
project; to our skilled and disciplined production editors, Princeton
Editorial Associates and Terri O’Prey at the Press; and to Garrett
Scafani and Yeti Technologies for awesome original artwork.
In this endeavor as elsewhere, those we love light our path.



INDEX

Page numbers for entries occurring in figures are followed by an f;
those for entries in notes, by an n; and those for entries in tables,
by a t.

2SLS. See two-stage least squares

Abdulkadiroglu, Atila, 167n, 171n
ability bias, 212–14
ACA (Affordable Care Act), 1, 24
accidents. See motor vehicle accidents
Acemoglu, Daron, 224, 224n, 225, 229
achievement gap, 100, 169
Affordable Care Act (ACA), 1, 24
Alabama, minimum legal drinking age in, 192–93, 194
alcohol: drink prices, 200; mortality and, 148–63, 191–203; motor vehicle accidents and, 158–160, 160t, 161f, 195, 196t; taxes on, 200–201, 201t. See also minimum legal drinking age
Altonji, Joseph, 78n
always-takers, 111–14, 121–22, 130
Amethyst Initiative, 148, 156
Angrist, Joshua D., 59n, 80, 84n, 102n, 113n, 118n, 122n, 127, 127n, 130, 130n, 134n, 167n, 208n, 224, 224n, 225, 229, 229n, 234n. See also Master Joshway
Animal House, 148
Appendix B in The Tariff on Animal and Vegetable Oils (Wright), 139–40
Arkansas, minimum legal drinking age in, 192
Aron-Dine, Aviva, 17n, 18n
Ashenfelter, Orley, 218, 218n, 221
Atlanta Federal Reserve Bank, 181–91
attenuation bias, 221–22, 242–43
attrition, 18n, 105n
average. See mathematical expectation; mean; sample mean
average causal effect, 8–10; intention-to-treat effect, 119–20; local average treatment effect, 109–15; in the RD design, 156–58; treatment effect on the treated, 114, 121–22

bad control, 214–17, 216t
Bagehot, Walter, 181, 181n
Baicker, Katherine, 25n
Baker, Regina M., 233n
balance, checking for, 16, 19–22, 103
bandwidth, 162, 164
Banerjee, Abhijit, 18n
banks: failures in Great Depression, 178–79, 182–89, 184f, 185f, 186f; liquidity of, 180–81
Barron’s selectivity categories, 60, 60n
Becker, Gary S., 124, 124n
Beer Institute, 200, 201
beer tax, 200–201, 201t
Berk, Richard A., 116n
birthdays: mortality rates at twenty-first, 148–52, 149f, 150f, 153, 156–63; quarter of birth, 228–34, 230f
birth rates. See family planning; family size
bivariate regression, 86–87, 86n, 88–89
Bloom, Howard S., 122n
BLS. See Boston Latin School
Bongaarts, John, 125n
Boston, selective exam schools in, 164–65, 166–68, 167f, 173
Boston Latin Academy, 164, 167–68
Boston Latin School (BLS), 164, 166–68, 166f, 168f, 169, 170–71
Bound, John, 233n
British Navy, 31
Brook, Robert H., 17n, 24n
Buckles, Kasey, 233, 233n
busing, 174

C&B (College and Beyond data set), 52, 59–60, 69
Caldwell, Rogers, 178–79
Caldwell and Company, 178–79, 182, 186
Campbell, Donald T., 175–77, 176n, 177n
Carnoy, Martin, 100n
Carpenter, Christopher, 148n, 164n, 192n
causal effect, 6–8. See also average causal effect; intention-to-treat effect; local average treatment effect; treatment effect on the treated
causal inference, 30, 47, 49
CEF (conditional expectation function), 82–85, 83f
central bank. See Federal Reserve
Central Limit Theorem (CLT), 39–41
ceteris paribus comparison: causal inference and, xii–xv; in regression, 48–51, 68–74. See also selection bias
charter school lotteries, 101–2; instrumental variables analysis of, 101–15, 104t
charter schools: debates on, 99–101; definition of, 99; instructional approaches of, 99–100, 115; KIPP, 99–115; test scores in, 103–6, 104t, 108–9, 108f
checking for balance, 16, 19–22, 103
Chen, Shaohua, 124n
child labor laws, 224, 225, 226t
children: school starting ages of, 228–29, 234; sibling sex composition of, 129–31, 134–38, 135t, 137t. See also charter schools; education; family size; minority students; twins; types of children
China, One Child Policy in, 124
cholera, 204–5, 206f
Clark, Damon, 235–36, 235n
Clark, Kelly, 140–41, 141n
CLT (Central Limit Theorem), 39–41
clustered standard error, 207–8
Coale, Ansley, 125n
College and Beyond (C&B) data set, 52, 59–60, 69
college: financial aid for, xiii, 49; private vs. public, 47–78. See also education
common trends assumption, 184–87; relaxing, 196–200
comparison group. See control group
compliance, in randomized trials, 118–22
compliers, 102, 111–14
compulsory schooling laws, 223–28
conditional expectation function (CEF), 82–85, 83f
conditional expectation, 15–16, 82
confidence interval, 43, 45, 46
constant-effects assumption, 10
control group: checking for balance, 16, 19–22, 103; definition of, 3. See also random assignment
control variable: bad, 214–17; definition of, 56–57; good, 217; in two-stage least squares, 133–34. See also omitted variables bias
Cook, Thomas D., 177, 177n
correlation, serial, 205–7
covariance, 86–87. See also variance
covariates, 89–93, 221, 232, 242–43

Dale, Stacy Berg, 51, 51n, 52, 68, 68n
Daniel, Book of, 30–31
Darwin, Charles, 79–80
DD. See differences-in-differences
death rate. See mortality rate
defiers, 112–13
demand curve, 139–40
demographic characteristics, 19–21, 20t
demographic transition, 125n. See also family size
dependent variable: definition of, 56–57; fitted values, 87–88; logged, 60, 93–94. See also outcome
differences-in-differences (DD): bank failures example, 180–91; common trends assumption in, 184–86, 196–97; counterfactual, 184–86, 186f; minimum legal drinking age example, 192–201, 196t; monetary policy example, 180–91; regression models for, 187–89, 192–201; returns to education estimates, 224–27, 226t; standard errors for, 205–8; state effects in, 193–95; with state-specific trends, 196–200, 198f, 199f; time effects in, 193, 193n
distribution: standard normal, 39, 40f, 41; of variables, 36
Dobkin, Carlos, 148n, 164n, 192n
domestic violence. See Minneapolis Domestic Violence Experiment; spousal abuse
drinking. See alcohol; minimum legal drinking age
drunk driving. See motor vehicle accidents
Duflo, Esther, 18n
dummy variable, 9, 15, 57, 61, 62, 88–89, 90–91

earnings: gender differences in, 50; potential, 214, 223; years of work experience and, 211, 211n. See also education, returns to
economic growth, 123–24
education: achievement gap in, 100, 169; charter schools, 99–101; college quality, 47–68; desegregation of, 173–74; family size and, 125–38; National Merit Scholarship program, 175–76, 177f; peer quality in, 168–73, 168f, 172f; school assignment policies, 169; selective exam schools, 164–69, 170–74; student debt, xii–xiii; university tuition, 47–48, 49. See also education, returns to
education, returns to: ability bias and, 212–14, 217–19; compulsory schooling laws and, 223–27, 226t; conditional expectation function, 82, 83f, 84–85; controlling for ability, 212–14, 217; controlling for occupation, 214–17, 216t; controlling for work experience, 210, 211, 217n; degree effects in, 235–38, 237f; differences-in-differences estimates of, 224–27, 226t; instrumental variables estimates of, 223–34, 226t, 231t, 232t; measurement error and, 219–22, 234; opportunity costs and, 235; panel estimates of, 217–22, 220t; quarter of birth and, 228–34, 230f, 231t, 232t; regression estimates of, 210–17, 218–21, 225–26, 232; sheepskin effects in, 235–38, 237f; for twins, 217–22, 220t
educational attainment: health status and, 4–6; high school graduation, 236–38, 237f; of mothers, 233–34; by quarter of birth, 229, 230f, 231, 231t; sheepskin effects, 235–38, 237f; sibling sex composition and, 130; of women, 125, 129n, 233–34
Ehrlich, Paul, 123, 124
Einav, Liran, 17n, 18n
Elder, Todd, 78n
Ellement, John R., 51n
employment. See earnings; occupations
English Poor Laws, 81
epidemiology, 204–5
error. See measurement error; standard error
error term. See residuals
estimated standard error, 38–39, 45. See also standard error
estimator: definition of, 35; unbiased, 35
eugenics, 32, 79–80
Evans, William, 130, 130n
exclusion restriction, 101–2, 107, 120, 130–131
expectation: conditional, 15–16, 82; mathematical, 14, 34–35
external validity, 114–15

family planning, 124–25, 125n
family size: ALS study, 127–30, 135–36; children’s human capital and, 125–38; college choice and, 69–74; living standards and, 124–25; reductions in, 124–25, 125n; sibling sex composition and, 129–31, 134–38, 135t, 137t
Federal Reserve: Eighth District, 181–91, 184f, 185f, 186f; monetary policy of, 180–91; Sixth District, 181–91, 184f, 185f, 186f
Federal Reserve Banks: Atlanta, 181–91; St. Louis, 181–91
fertility. See family planning; family size
field experiment: instrumental variables analysis of, 118–22; Minneapolis Domestic Violence Experiment, 116–22, 117t; RAND Health Insurance Experiment, 16–24, 18n, 20t, 23t, 29–30, 33–34. See also randomized trial
financial aid, xiii, 49
financial crisis: moral hazard in, 181; similarities among, 180. See also banks
finite sample bias of 2SLS, 145–46
Finkelstein, Amy, 17n, 18n, 25, 25n
first stage, instrumental variables, 103f, 106–7, 109, 110, 112, 113, 118–19, 129; in fuzzy regression discontinuity design, 171–73; for returns to schooling example, 222, 224–25, 226t, 229, 230f, 236–38; in two-stage least squares, 131, 132–34, 135t, 142–43
Fisher, Ronald A., 32–33, 32n, 140
fitted value, 58, 87–88
Friedman, Milton, 180, 180n, 191, 191n
Friedman, Rose D., 191, 191n
Frost, Robert, 6n; “The Road Not Taken,” 2–3, 6
F-statistic, 145–46, 232t, 233
fuzzy regression discontinuity design, 166–74, 235–38

Gage, Nathaniel L., 177n
Gallagher, Hugh, 68
Galton, Francis, 32, 79–81, 80n, 140
gender differences in earnings, 50
Gladwin, Bertie, 209, 234, 235
Goldman, Ronald, 115, 122–23
Great Depression: bank failures in, 178–79, 182–89, 184f, 185f, 186f; monetary policy and, 179–91
Griliches, Zvi, 213–14, 213n
Gruber, Jonathan, 2n

Haldane, J. B. S., 32
Harmenberg, Johan, 213, 214
health care: expenditures on, 1, 22, 23t, 28t, 29; price elasticity of demand for, 17
health insurance: Affordable Care Act, 1, 24; financial benefits of, 28t, 29–30; health-care use and, 22, 23t, 26–27, 27t, 29; Medicaid, 2, 24–30; Medicare, 2; national, 6; randomized trials of effects of, 12–13, 16–24, 25–30; relationship to health, 1–11, 5t, 22–24, 23t, 27–29, 28t; uninsured individuals, 2, 4–6, 5t, 7–8, 24
Health Insurance Experiment (HIE), 16–24, 18n, 20t, 23t, 29–30, 33–34
height, 80
heredity, 80
heteroskedasticity, 97
HIE. See Health Insurance Experiment
higher education. See education
high schools: exit exams, 236–38, 237f; selective exam schools, 164–69, 170–74. See also education
homoskedasticity, 97
human capital. See education
Hungerman, Daniel M., 233, 233n
hypothesis, null, 39

identification problem, 139–40
Imbens, Guido W., 113n, 122n, 134n, 162n
income. See earnings; education, returns to
independence assumption, 106–7
Independent Schools Entrance Exam (ISEE), 166–67, 171
India: family planning in, 124–25; living standards in, 124, 124n
instrument, definition of, 106
instrumental variables (IV): causal inference based on, 102, 106–7, 109–10, 113, 131; as a chain reaction, 107, 109–10, 113; charter school lottery example, 101–9, 111–15; estimator, 98, 106, 143; family size example, 128–30; in field experiments, 118–22; invention of method, 139–41; local average treatment effect, 109–14; measurement error and, 243–44; Minneapolis Domestic Violence Experiment example, 116–22; selection bias eliminated by, 121; standard errors, 144. See also education, returns to; first stage; fuzzy regression discontinuity design; reduced form; two-stage least squares
insurance. See health insurance
intention-to-treat (ITT) effect, 119–20
interaction term, 155–57, 187–88
IQ, returns to education and, 213–14
ISEE (Independent Schools Entrance Exam), 166–67, 171
Israel: ALS study, 127–30, 135–36; demographics of, 127
ITT. See intention-to-treat effect
IV. See instrumental variables

Jacobsen, Rebecca, 100n
Jaeger, David A., 233n
Jagger, Mick, 213, 214
Jalil, Andrew, 208n
Jan, Tracy, 51n
Jastrow, Joseph, 31, 31n
Job Training Partnership Act (JTPA), 122n

Kalyanaraman, Karthik, 162n
kernel weighting, 162n
Knowledge Is Power Program (KIPP) charter schools, 99–101, 102–9, 111, 114–15
Krueger, Alan B., 51, 51n, 52, 68, 68n, 218, 218n, 221, 229, 229n, 234n
Kung Fu, xi–xii, 1, 24, 33, 47, 98, 147, 164, 178
Kung Fu Panda, xiv, 228
Kung Fu Panda 2, 191, 244

Lam, David, 123n
LATE. See local average treatment effect
Lavy, Victor, 127, 127n
Law of Large Numbers (LLN), 13–16
Lewis, H. Gregg, 124n
limited information maximum likelihood estimator (LIML), 145
Lind, James, 31, 31n
liquor. See alcohol; minimum legal drinking age
living standards, family size and, 124–25. See also economic growth; poverty
LLN (Law of Large Numbers), 13–16
local average treatment effect (LATE), 109–14; definition of, 109–10; estimating, 110; external validity of, 114–15
local linear regression, 167n
log point, 94
Lombard College, 141
London, cholera outbreaks in, 204–5, 206f
lotteries, Oregon Health Plan, 25–30, 27t, 28t; charter school, 101–9
Lutz, Wolfgang, 123n
Lynn, Massachusetts, 101, 109. See also Knowledge Is Power Program charter schools

Malthus, Thomas, 123
Martorell, Paco, 235–36, 235n
Marx, Groucho, 48
Master Joshway, 30, 79, 102, 122, 123, 127, 130, 138, 214, 227, 239
Master Stevefu, 78, 79, 175, 203, 204, 235, 239
matching, 50–51, 52–56, 53t
mathematical expectation, 14, 34–35
Mathews, Jay, 100n
Matthau, Walter, 12
MDVE (Minneapolis Domestic Violence Experiment), 116–22, 117t
mean: 8–9, 34–35; difference in, 9–11, 43–45; population, 35. See also regression to the mean; sample mean
measurement error, 219–22, 234, 240–44
Medicaid, 2, 24–30, 28t
Medicare, 2
Mincer, Jacob, 210–11, 210n, 219
minimum legal drinking age (MLDA): differences-in-differences analysis of, 192–203, 196t; federal and state laws on, 191–92, 200; mortality risk and, 148–52, 157–63, 192–203, 196t, 201t; regression discontinuity analysis of, 150–63
Minneapolis Domestic Violence Experiment (MDVE), 116–22, 117t
minority students: achievement gap of, 100, 169; in charter schools, 99, 100, 101; desegregation and, 173–74; in selective exam schools, 173–74
Mishel, Lawrence, 100n
missing data. See attrition
Mississippi: bank failures in, 179, 182–89, 184f, 185f, 186f; Federal Reserve district boundary in, 182; wholesale firms in, 190, 190t
MLDA. See minimum legal drinking age
monetary policy: economic activity and, 189–91, 190t; in Great Depression, 179–91; Real Bills Doctrine, 181–82
monotonicity, 113
moral hazard, 181
Morris, Carl, 18n
mortality rate: by cause of death, 158–62, 160t, 161f; from cholera, 204–5, 206f; minimum legal drinking age and, 148–52, 157–63, 192–203, 196t, 201t; at twenty-first birthday, 148–52, 149f, 150f, 157–63
mothers: educational attainment of, 129n, 233–34; in vitro fertilization of, 129n; of twins, 128–29, 129n, 131–32. See also family size
motor vehicle accidents (MVA), deaths in alcohol-related, 158–62, 160t, 161f, 195, 196t
multivariate regression, 86n, 89–93. See also omitted variables bias
MVA. See motor vehicle accidents

National Health Interview Survey (NHIS), 3–6, 5t, 11
National Merit Scholarship program, 175–76, 177f
National Minimum Drinking Age Act, 191–92
natural experiment, 148, 182, 204
Nebuchadnezzar, King, 30–31
never-takers, 111, 112
Newhouse, Joseph P., 22n
New York City, selective exam schools in, 164–65
NHIS (National Health Interview Survey), 3–6, 5t, 11
nonparametric regression discontinuity design, 161–63
null hypothesis, 39

occupation, returns to education and, 214–17, 216t
OHP. See Oregon Health Plan
OLS (ordinary least squares), 58, 58n, 145–46
omitted variables bias (OVB), 69–78, 76t, 91–93, 132, 152, 174, 212–13. See also selection bias
omitted variables bias (OVB) formula, 70–74, 75–77, 91–93, 140, 212
opportunity cost of education, 235
ordinary least squares (OLS), 58, 58n, 145–46
Oregon Health Plan (OHP) lottery, 25–30, 27t, 28t
Orr, Larry L., 122n
outcome: definition of, 3; observed, 6–8; potential, 6–8, 214; pretreatment, 21–22; variable, 56, 70, 109. See also dependent variable
OVB. See omitted variables bias

panel data, 195, 205
parallel trends. See common trends assumption
parameter, 10, 34–35
parametric regression discontinuity design, 160
Pathak, Parag, 167n
path analysis, 140
Pearson, Karl, 80
peer effects in education, 65, 68, 165, 168–73, 168f, 172f
Peirce, Charles S., 31, 31n
Pingle, Robert, 115
Pischke, Jörn-Steffen, 59n, 80, 84n, 122n, 208n. See also Master Stevefu
Plato, 165
police. See Minneapolis Domestic Violence Experiment
pooled sample standard deviation, 45
population average. See mathematical expectation
population growth, 123–24. See also family size
population mean. See mathematical expectation
population parameter, 10, 34–35
population standard deviation, 36
population variance, 36, 37
population weighting, 202–3
potential experience, 210, 211, 217n
potential outcome, 6–8, 214
poverty: children in, 100, 101; English Poor Laws, 81; residential segregation by, 169. See also living standards; Medicaid
price elasticity of demand: for health care, 17; identification, 139
probability, 34
public education. See charter schools; education

QOB. See quarter of birth
quantity-quality trade-off. See family size
quarter of birth (QOB), returns to schooling estimates using, 228–34, 230f, 231t, 232t

RAND Health Insurance Experiment. See Health Insurance Experiment
random assignment: of control and treatment groups, xiii, 12–16; distinction from random sampling, 34; selection bias eliminated by, 15–16. See also lotteries
randomized trial: advantages of, xiii–xiv, 12, 14–16; analysis of, 12–16; of health insurance effects, 12–13, 16–24, 25–30; history, 30–33; with imperfect compliance, 116–22; police responses to domestic violence, 116–22; samples for, 14–16
random sampling, 14–16, 34–35, 37
Ravallion, Martin, 124n
RD design. See regression discontinuity design
Real Bills Doctrine, 181–82
reduced form, instrumental variables, 109, 110, 113, 119–20; in fuzzy regression discontinuity design, 171–74; in returns to schooling example, 222, 225, 226t, 229, 230f, 236–38; in two-stage least squares, 131, 132–34, 143, 145–46
regression: 55–59, 83–97; anatomy, 89–91; bivariate, 86–87, 86n, 88–89; coefficients, 57; conditional expectation function and, 82–85, 83f; dependent variable, 56–57; dummy variables in, 57, 88–89, 90–91; fitted values, 87–88; Galton’s use of, 80–81; local linear, 167n; with logs, 93–94; long, 69–70, 71, 72–74, 75–77, 79; and matching, 55–56, 58–59; measurement error in, 240–44; multivariate, 86n, 89–93; omitted variables bias in, 69–78, 76t, 91–93; residuals, 58, 87–88; sensitivity analysis, 74–78; short, 70, 72–73, 74, 79; standard errors, 95–97; weighted, 201–3. See also control variables
regression differences-in-differences (DD): bank failures example, 187–89; minimum legal drinking age example, 192–201; standard errors for, 205–8
regression discontinuity (RD) design: 150–53; bandwidth, 162, 164; centering running variable, 155; compared to regression, 153; fuzzy, 165–74, 235–38; with interaction terms, 155, 156; nonlinearities in, 153–57, 154f; nonparametric, 161–66; parametric, 160; with quadratic running variable control, 155–57, 158f; running variable, 151, 152, 153; sharp, 150–63, 150f, 169; visual, 158
regression to the mean, 80
regression-weighted average, 55, 56, 58–59
Reinhart, Carmen, 180, 180n
reliability, 241
residuals: definition of, 58; properties of, 87–88; serial correlation of, 205; squared, 58; in two-stage least squares, 144
residual sum of squares (RSS), 86, 202
returns to schooling. See education, returns to
Richardson, Gary, 182, 182n
Rimer, Sara, 115n
risk set, 106
robust standard error, 97
Rogoff, Kenneth, 180, 180n
Rosenzweig, Mark R., 127, 127n
Rothstein, Richard, 100n
Rouse, Cecilia, 218, 218n, 221
RSS (residual sum of squares), 86, 202
Rubin, Donald B., 113n
running variable, 151, 152, 153

St. Louis Federal Reserve Bank, 181–91
Salter, James, 3
sample average. See sample means
sample means: difference in, 9–11, 43–45; estimating population means from, 35; sampling distribution of, 39–43, 41f, 42f; sampling variance of, 37–38; standard errors of, 38–39; t-statistics for, 39–43, 41f, 42f; unbiasedness of, 35
sample size: Law of Large Numbers and, 13–16; random assignment and, 13–14; sampling distributions and, 40–41, 41f, 42f; sampling variance and, 38; standard error and, 38, 95; unbiased estimators and, 35
sample standard deviation, 36, 38
sample statistic: definition of, 35; standard errors of, 38
sample variance, 36, 36n, 44–45
sampling, random, 14–16, 34–35, 37
sampling variance, 21, 37–39, 44–46
sampling variation. See sampling variance
Sandburg, Carl, 141–42
Sanderson, Warren, 123n
SAT scores, 48, 50, 50n, 52, 64, 75–77, 176
Scherbov, Sergei, 123n
Schlosser, Analia, 127, 127n
schools. See charter schools; education; high schools
Schwartz, Anna, 180, 180n
scurvy, 31
selection bias: 8–11; in charter school attendance, 105, 108–9; in college choice, 48, 55, 54, 68, 69; definition of, 8; differences in means and, 10–11; due to bad control, 215–17, 216t; in Minneapolis Domestic Violence Experiment, 118, 120–21; in returns to education estimates, 211–13; in two-stage least squares, 145–46. See also omitted variables bias
selective exam schools: admissions cutoffs of, 165–68, 166f, 167f; admissions tests for, 164–67, 171; differences from nonselective public schools, 173–74; peer quality in, 165, 168–69, 168f, 170–73, 172f; racial composition of, 173–74
self-revelation model, 65–66, 66t, 68, 77
serial correlation, 205–7
sharp regression discontinuity design, 150–63, 150f, 169
sheepskin effects, 235–38, 237f
Sherman, Lawrence W., 116n
siblings, sex composition of, 129–31, 134–38, 135t, 137t. See also twins
significance. See statistical significance
Simpson, Nicole Brown, 115, 122–23
Simpson, O. J., 115, 115n, 122
simultaneous equations models, 139–40
smoking, 32–33
Snow, John, 204–5, 205n, 206f
spousal abuse: Minneapolis Domestic Violence Experiment, 116–22, 117t; Simpson case, 115
standard deviation: pooled sample, 45; population, 36; sample, 36, 38
standard error: clustered, 207–8; for comparison of means, 21, 44–45; definition of, 38; for differences-in-differences, 205–8; estimated, 38–39, 45; instrumental variables, 144; regression, 62–63, 95–97; robust, 97; sample size and, 95–96; statistical significance and, 21, 41–43; two-stage least squares, 144
standardized test score: in charter schools, 103–6, 104t, 108–9, 108f; definition of, 103; on high school exit exams, 236–38, 237f; SAT, 48, 50, 50n, 52, 64, 75–77, 176; on selective exam school admissions tests, 164–68, 171
standard normal distribution, 39, 40f, 41
Stanley, Julian C., 177, 177n
state effects, 193, 194, 195, 224–27
states: child labor laws in, 224, 225, 226t; compulsory schooling laws in, 223–27, 228; Medicaid expansion by, 24–30. See also minimum legal drinking age
state-year panel, 195, 202, 205
statistical independence, 37
statistical inference, 33
statistical significance, 21, 41–43, 44, 46
Stock, James H., 140–41, 140n, 141n
students. See education; minority students
Stylometrics, 140
suicide, 159, 160t, 195–96
supply curve, 139–40

Taber, Christopher, 78n
Taubman, Sarah, 25n
taxes, alcohol, 200–201, 201t
Teach for America, 100, 100n
Tennessee, minimum legal drinking age in, 193
Texas, high school exit exams in, 236–38, 237f
Thistlethwaite, Donald L., 175–76, 176n
Thomas, Duncan P., 31n
time effects, 193, 193n, 194–95, 195n
Tomes, Nigel, 124n
treatment, definition of, 3
treatment effect. See average causal effect; intention-to-treat effect; local average treatment effect; treatment effect on the treated
treatment effect on the treated (TOT), 114, 121–22
treatment group: checking for balance, 16, 19–22; definition of, 3. See also random assignment
treatment variable, definition, 3; in differences-in-differences, 193; in instrumental variables analysis, 109; in regression, 56, 57; in the regression discontinuity design, 151, 166–69
Trebbi, Francesco, 140, 140n
Troost, William, 182, 182n
t-statistic: for comparison of means, 45; definition of, 39; for sample mean, 39; sampling distribution of, 39–43, 41f, 42f
twins: as instrument for family size, 127–29, 135–38; returns to education for, 217–22, 220t
Twinsburg, Ohio, 217–22
two-stage least squares (2SLS): 131–38, 142–46; control variables in, 133; family size example, 132–37, 137t; first stage, 132, 134, 135–36, 135t, 142–43; many-weak instruments problem in, 145–46; reduced form, 131, 132–34, 143, 145–46; second stage, 133, 134–35, 136, 137, 143–44; standard errors, 135, 144
types of children, 111, 112

unbiased estimator, 35
uniform kernel, 162n
universities. See college; education
U.S. Supreme Court, 174

validity, external, 114–15
variability, measuring, 35–39
variable: dependent, 56–57, 60, 87–88, 93–94; distribution of, 36; dummy, 9, 15, 57, 61, 62, 88–89, 90–91; running, 151, 152, 153; treatment, 3, 56, 57, 109, 166–69, 193. See also control variable; instrumental variables; outcome
variance: definition of, 35–36; descriptive, 37; differences in, 44–45, 97; population, 36, 37; residual, 95–97; sample, 36, 36n, 44–45; sampling, 33, 37–39, 44, 46, 95–97. See also covariance
Virtue, G. O., 139n

wages. See earnings; education, returns to
Waiting for Superman, 99
weighted least squares (WLS), 162n, 202–3
Wheeler, Adam, 51n
wholesale firms, 190, 190t
Wilcox, Moses and Aaron, 217
WLS. See weighted least squares
Wolpin, Kenneth I., 127, 127n
women: earnings of, 50; educational attainment of, 125, 129n, 233–34. See also mothers
working hypothesis, 39
Wright, Philip G., 139–41, 139n, 142n
Wright, Sewall, 32, 139, 140–41

year effects. See time effects
Yule, George Udny, 80–81, 81n

Zappa, Frank. See Plato

Contents
Figures
List of Tables
Introduction
Other Things Equal
1. Randomized Trials
1.1 In Sickness and in Health (Insurance)
Fruitless and Fruitful Comparisons
Randomized Results
1.2 The Oregon Trail
Masters of ’Metrics: From Daniel to R. A. Fisher
Appendix: Mastering Inference
A World without Bias
Measuring Variability
The t-Statistic and the Central Limit Theorem
Pairing Off
Notes
2. Regression
2.1 A Tale of Two Colleges
Matchmaker, Matchmaker
2.2 Make Me a Match, Run Me a Regression
Public-Private Face-Off
Regressions Run
2.3 Ceteris Paribus?
Regression Sensitivity Analysis
Masters of ’Metrics: Galton and Yule
Appendix: Regression Theory
Conditional Expectation Functions
Regression and the CEF
Bivariate Regression and Covariance
Fits and Residuals
Regression for Dummies
Regression Anatomy and the OVB Formula
Building Models with Logs
Regression Standard Errors and Confidence Intervals
Notes
3. Instrumental Variables
Our Path
3.1 The Charter Conundrum
Playing the Lottery
LATE for Charter School
3.2 Abuse Busters
When LATE Is the Effect on the Treated
3.3 The Population Bomb
One-Stop Shopping with Two-Stage Least Squares
Masters of ’Metrics: The Remarkable Wrights
Appendix: IV Theory
IV, LATE, and 2SLS
2SLS Standard Errors
2SLS Bias
Notes
4. Regression Discontinuity Designs
Our Path
4.1 Birthdays and Funerals
Sharp RD
RD Specifics
4.2 The Elite Illusion
Fuzzy RD
Fuzzy RD Is IV
Masters of ’Metrics: Donald Campbell
Notes
5. Differences-in-Differences
Our Path
5.1 A Mississippi Experiment
One Mississippi, Two Mississippi
Parallel Worlds
Just DDo It: A Depression Regression
5.2 Drink, Drank, …
Patterns from Patchwork
Probing DD Assumptions
What Are You Weighting For?
Masters of ’Metrics: John Snow
Appendix: Standard Errors for Regression DD
Notes
6. The Wages of Schooling
Masters at Work
6.1 Schooling, Experience, and Earnings
Of Singers, Fencers, and PhDs: Ability Bias
The Measure of Men: Controlling Ability
Beware Bad Control
6.2 Twins Double the Fun
Twin Reports from Twinsburg
6.3 Econometricians Are Known by Their … Instruments
It's the Law
To Everything There Is a Season (of Birth)
6.4 Rustling Sheepskin in the Lone Star State
Appendix: Bias from Measurement Error
Adding Covariates
IV Clears Our Path
Notes
Abbreviations and Acronyms
Empirical Notes
Acknowledgments
Index

Journal of Criminal Law and Criminology
Volume 83
Issue 1 Spring

Article 5

Spring 1992

The Variable Effects of Arrest on Criminal Careers:
The Milwaukee Domestic Violence Experiment
Lawrence W. Sherman

Janell D. Schmidt

Dennis P. Rogan

Douglas A. Smith


Recommended Citation
Lawrence W. Sherman, Janell D. Schmidt, Dennis P. Rogan,
Douglas A. Smith, The Variable Effects of Arrest on Criminal
Careers: The
Milwaukee Domestic Violence Experiment, 83 J. Crim. L. &
Criminology 137 (1992-1993)



0091-4169/92/8301-0137
THE JOURNAL OF CRIMINAL LAW & CRIMINOLOGY Vol. 83, No. 1
Copyright © 1992 by Northwestern University, School of Law
Printed in U.S.A.

THE VARIABLE EFFECTS OF ARREST ON
CRIMINAL CAREERS: THE MILWAUKEE

DOMESTIC VIOLENCE EXPERIMENT

LAWRENCE W. SHERMAN, JANELL D. SCHMIDT,
DENNIS P. ROGAN, DOUGLAS A. SMITH,

PATRICK R. GARTIN, ELLEN G. COHN,
DEAN J. COLLINS, and ANTHONY R. BACICH*

I. INTRODUCTION

The jurisprudence of the criminal sanction has long recognized
diverse objectives: deterrence, justly deserved punishment,
incapacitation and perhaps rehabilitation.1 Yet it has rarely
recognized arrest as a form of sanctioning, despite the widely
acknowledged use of arrest for that purpose.2 While the Supreme
Court has held that pre-trial detention does not legally constitute
punishment,3 the jurisprudence of arrest must nonetheless confront
problems of potential conflict among the diverse objectives of
arrest as a sanction. Perhaps the most perplexing problem involves
empirical evidence of conditions under which arrests increase,
rather than reduce, the frequency of repeat offending by arrested
individuals. This problem is particularly challenging for
misdemeanor offenses that rarely result in prosecution and for
which arrest may be the only criminal sanction ever applied.

This research was supported by grant 86IJCXK043 to the Crime
Control Institute from the National Institute of Justice. The points
of view or opinions stated herein are those of the authors and do
not necessarily represent the official views of the U.S. Department
of Justice or the Milwaukee Police Department. We are deeply
indebted to former Milwaukee Police Chief Robert J. Ziarnik for his
support of this research and to Chief Philip Arreola for continuing
that support. We also thank Kathleen Stolpman and the staff of the
Sojourner Truth House for their maintenance of hotline records of
repeat violence. This article also reflects the work of many Crime
Control Institute staff members and interviewers and the advice of
Drs. Joel Garner, Albert J. Reiss, Jr., Robert Boruch, and Kinley
Larntz, as well as Lucy Friedman and Allen Andrews. Most important
were the officers who carried out the experiment, all of whom
entered eligible cases into it: Joseph Vukovich, Zygmunt Lipski,
Kenneth Jones, Michael Dubis, Lawrence Roberts, Thomas Skovera,
Alan Singer, Frederick Birts, Thomas Bohl, Daniel Halbur, Peter
Panasiuk, Scott Rinderle, Michael Braunreiter, Timothy Koceja, John
Bogues, Edgar Bullock, Kim Stack, "Mick" Heinrich, Jerome Sims,
John Wallace, Wayne Armon, Rosalie Gallegos, Edward Prah, Dean
Schubert, Richard Thompson, Debra Glass, Cheryl Switzer, Robert
Eckert, Daniel Kent, Tracy Becker, Steven Fyfe, Jeffery Watts,
Gregory Blumenberg, Mark Hilt, and Kathleen Borkowski.

* Lawrence W. Sherman is Professor of Criminology, University of
Maryland and President, Crime Control Institute. Ph.D., Yale
University, 1976. Janell D. Schmidt is Director, Milwaukee Office,
Crime Control Institute. M.S., University of Wisconsin at
Milwaukee, 1985. Dennis P. Rogan is Vice President, Crime Control
Institute. Ph.D., University of Maryland, 1988. Douglas A. Smith is
Professor of Criminology, University of Maryland. Ph.D., Indiana
University, 1982. Patrick R. Gartin is Associate in Criminology,
University of Florida. Ph.D., University of Maryland, 1992. Ellen
G. Cohn is Visiting Assistant Professor of Criminal Justice,
Indiana University at Indianapolis. Ph.D., Cambridge University,
1991. Dean J. Collins is Deputy Inspector, Milwaukee Police
Department. M.S., University of Wisconsin at Milwaukee, 1973.
Anthony R. Bacich is Captain, Milwaukee Police Department. M.S.,
University of Wisconsin at Milwaukee, 1986.

Mandatory arrest laws for misdemeanor domestic battery have become
the leading example of this problem in the jurisprudence of arrest.
Enacted by some fifteen state legislatures4 despite implicit
knowledge that few arrests are ever prosecuted,5 mandatory arrest
was widely viewed as a criminal sanction that produced a specific
deterrent effect.6 This view was supported by the findings of the
pioneering Minneapolis, Minnesota domestic violence arrest
experiment (also called the Minneapolis Spouse Abuse Experiment),
the first controlled experiment in the use of arrest for any
offense, which found a substantial specific deterrent effect in a
sample of 314 cases.7 But as the authors of that experiment pointed
out, the sample size precluded thorough testing of an important
possibility: "that for some kinds of people, arrest may only make
matters worse." They went on to recommend that "until subsequent
research addresses that issue more thoroughly, it would be
premature for state legislatures to pass laws requiring arrests in
all misdemeanor domestic assaults."8

1 See HERBERT L. PACKER, LIMITS OF THE CRIMINAL SANCTION (1968).

2 WAYNE R. LAFAVE, ARREST: THE DECISION TO TAKE A SUSPECT INTO
CUSTODY 437 (1965).

3 Bell v. Wolfish, 441 U.S. 520 (1979).

4 NAT'L CENTER ON WOMEN & FAMILY LAW, INC., MANDATORY ARREST
SUMMARY CHART (1991). See ARIZ. REV. STAT. ANN. § 13-3601B (1991);
CONN. GEN. STAT. ANN. § 46b-38b(a) (West 1990); D.C. CODE ANN.
§ 16-1031(a) (1991); HAW. REV. STAT. § 709-906(4) (1991); 1991 IOWA
ACTS 2160; ME. REV. STAT. ANN. tit. 19, § 770 (West 1991); MO. REV.
STAT. § 455.085 (1990); NEV. REV. STAT. § 171.137 (1991); N.J. REV.
STAT. § 2C:25-5a (1991); OR. REV. STAT. § 133.055 (1989); R.I. GEN.
LAWS § 12-29-3(B) (1991); S.D. CODIFIED LAWS ANN. § 23A-3-21
(1989); UTAH CODE ANN. § 30-6-8(2) (1991); WASH. REV. CODE ANN.
§ 10.31.100(2) (West 1991); WIS. STAT. ANN. § 968.075(2) (West
1990).

5 In Milwaukee, Wisconsin, for example, the prosecution rate for
misdemeanor domestic battery was about ten percent at the time the
Wisconsin State legislature enacted mandatory arrest for probable
cause cases of that offense. In the Milwaukee experiment reported
in this article, the prosecution rate was under five percent of all
arrests.

6 U.S. ATTORNEY GENERAL'S TASK FORCE ON FAMILY VIOLENCE, REPORT
(1984); Lawrence W. Sherman & Ellen G. Cohn, The Impact of Research
on Legal Policy: The Minneapolis Domestic Violence Experiment, 23
LAW & SOC'Y REV. 117 (1989) [hereinafter Sherman & Cohn, The Impact
of Research].

7 See Richard A. Berk & Lawrence W. Sherman, Police Responses to
Family Violence Incidents: An Analysis of An Experimental Design
With Incomplete Randomization, 83 J. AM. STAT. ASS'N 70 (1988);
Lawrence W. Sherman & Richard A. Berk, The Specific Deterrent
Effects of Arrest for Domestic Assault, 49 AM. SOC. REV. 261
(1984).

This article reports subsequent research that has now addressed the
issue more thoroughly. Just as the Minneapolis study's authors
feared, the Milwaukee, Wisconsin domestic violence arrest
experiment provides substantial evidence that arrest makes some
kinds of people more frequently violent against their cohabitants.
This evidence creates a philosophical conflict between the
objectives of punishment and deterrence, a problem with little
previous commentary in the jurisprudence of sanctions. The evidence
shows that, while arrest deters repeat domestic violence in the
short run, arrests with brief custody increase the frequency of
domestic violence in the long run among offenders in general. The
evidence also shows that, among cases predominantly reported from
Milwaukee's black urban poverty ghetto, different kinds of
offenders react differently to arrest: some become much more
frequently violent, while others become somewhat less frequently
violent.

These variable effects of arrests on criminal careers raise
important questions about whether crime prevention or just deserts
is to be the paramount goal of the criminal sanction.9 The
longstanding jurisprudential premise that punishment always deters,
or at least never backfires,10 can no longer be accepted. Such a
serious claim requires substantial documentation. This article
expands upon findings presented elsewhere, providing considerably
more detail than has been previously reported11 about the results
of the Milwaukee domestic violence experiment.

8 LAWRENCE W. SHERMAN & RICHARD A. BERK, THE MINNEAPOLIS DOMESTIC
VIOLENCE EXPERIMENT 7 (1984). See also Lawrence W. Sherman,
Experiments in Police Discretion: Scientific Boon or Dangerous
Knowledge? 47 LAW & CONTEMP. PROBS. 61 (1984) [hereinafter Sherman,
Experiments in Police Discretion].

9 This question is not new, even though it lacks systematic
treatment in modern jurisprudence. In 1764, Cesare Beccaria argued
that when the infliction of punishment produces no effect, then
punishment is not morally justified and violates the social
contract. CESARE BECCARIA, ON CRIMES AND PUNISHMENTS 14 (Henry
Paolucci trans., 1963). A century later, Sir Arthur Conan Doyle
answered the question this way: "To revenge crime is important, but
to prevent it is more so." 2 THE ANNOTATED SHERLOCK HOLMES 672
(William S. Baring-Gould ed., 1967).

10 von Hirsch, for example, has observed that

    When one seeks to justify the criminal sanction by reference to
    its deterrent utility, desert is called for to explain why that
    utility may justly be pursued at the offenders' expense. When
    one seeks to justify punishment as deserved, deterrence is
    needed to deal with the countervailing concern about the
    suffering inflicted. The interdependence of these two concepts
    suggests that the criminal sanction rests, ultimately, on both.

ANDREW VON HIRSCH, DOING JUSTICE: CHOICE OF PUNISHMENTS 55 (1976).

II. THE CRIMINOLOGY AND JURISPRUDENCE OF POLICE
DISCRETION

The factual premise of mandatory arrest advocates has been that
police discriminate against victims of domestic violence, largely
because most police officers are men.12 The indisputable evidence
cited in support of this premise is that police often fail to make
arrests in cases of misdemeanor domestic battery, a claim supported
by repeated field observation studies of police decisionmaking.13

On occasion, police have even failed to make arrests for domestic
violence felonies committed in their presence. For example, in
1983, Torrington, Connecticut police officer Frederick Petrovits
stood and watched as Charles Thurman, holding a bloody knife,
kicked his wife Tracey in the head.14 She was already bleeding from
knife wounds in the chest, neck and throat. Petrovits did nothing
as Mr. Thurman went into the house, grabbed his three-year-old son,
came back out and kicked his wife in the head again. Three other
officers arrived and also did nothing but call for an ambulance
while Mr. Thurman wandered around, continuing to threaten his wife.
Only when he approached his wife again as she was lying on a
stretcher did the police finally arrest Mr. Thurman, a short-order
cook at a cafe frequented by local police officers.15

The evidence for police discrimination against women domestic
battery victims is bolstered by incidents of police officers
committing battery against their own wives. In the City of Chicago
in 1988, for example, at least four of the city's 12,000 police
officers killed their wives and then killed themselves.16 A 1991
lawsuit filed against the Chicago Police Department claimed it had
a continuing pattern of covering up police violence against
spouses.17 The plaintiff, a police officer's wife, claimed to have
been beaten for years, with no help from police supervisors to whom
she complained or from officers who responded after beating
incidents. After the plaintiff obtained a court order of
protection, her husband stopped her on the street while she was
driving her son in her car. Her husband was in uniform, in a marked
squad car, with his uniformed partner sitting in the car. The
husband beat his wife in full view of both his partner and his son.
The partner did nothing to intervene, even though he knew there was
a valid order of protection being violated. The officer was later
tried and convicted on battery charges but was not immediately
dismissed from the police force.18

11 See LAWRENCE W. SHERMAN, POLICING DOMESTIC VIOLENCE: EXPERIMENTS
AND DILEMMAS (1992) [hereinafter SHERMAN, POLICING DOMESTIC
VIOLENCE]; Lawrence W. Sherman et al., From Initial Deterrence to
Long-Term Escalation: Short-Custody Arrest for Poverty Ghetto
Domestic Violence, 29 CRIMINOLOGY 821 (1991) [hereinafter Sherman,
From Initial Deterrence]; Lawrence W. Sherman & Douglas A. Smith,
Crime, Punishment and Stake in Conformity: Legal and Extralegal
Control of Domestic Violence (forthcoming 1992 in AM. SOC. REV.)
[hereinafter Sherman & Smith, Crime, Punishment and Stake in
Conformity].

12 SHERMAN, POLICING DOMESTIC VIOLENCE, supra note 11, ch. 2.

13 DONALD J. BLACK, THE MANNERS AND CUSTOMS OF THE POLICE 94
(1980), reports that in a 1966 study of high-crime area policing in
Boston, Washington and Chicago, arrests were made in only
forty-seven percent of all misdemeanors involving family members.
Nan Oppenlander, Coping or Copping Out, 20 CRIMINOLOGY 449, 455
(1982), reports similar results from a 1977 observation study of
policing in twenty-four agencies in three metropolitan areas
(Tampa, Rochester, N.Y., and St. Louis): arrests were made in only
twenty-two percent of all family assault cases. See also Delbert S.
Elliott, Criminal Justice Procedures in Family Violence Crimes, in
FAMILY VIOLENCE 427 (Lloyd F. Ohlin & Michael H. Tonry, eds. 1989).

14 Thurman v. City of Torrington, 595 F. Supp. 1521, 1526 (1984).

15 Id.

The problem with the use of these facts as evidence of
discrimination against women victims of domestic violence is that
they are silent about disparity. If one assumes full enforcement of
laws against other offenses, then the evidence of under-enforcement
of this offense is sufficient. But if full enforcement is only a
myth, then the question becomes how much difference there is
between the probability of arrest (given an opportunity) for
domestic violence and that for other offenses. That this is the
appropriate question is clear. Full enforcement is indeed a myth,
and American police practice "aggressive" under-enforcement of most
offenses.19 One study found that even with the suspect present and
with legally sufficient evidence, police made arrests in only
forty-four percent of all reported misdemeanors and fifty-eight
percent of all reported felonies.20 Other studies reach similar
findings.21 For a wide variety of reasons, police ignore most
opportunities to make arrests.22

Criminological study of police discretion has established little
consistent explanation of the causes of police behavior,23 but one
nearly universal finding is that police attend to the demeanor or
overall "moral worth" of the suspect and victim. If police are
clearly not blind "ministerial" agents automatically carrying out
the legislature's commands, a more accurate description seems to be
that they are judicial officials administering their own
conceptions of just deserts.24 As sociologist William Westley
observed in the first systematic field study of an American police
department (Gary, Indiana in 1949), police do not enforce the law
so much as their own morality.25 Police routinely speak of suspects
who "fail the attitude test," or who are guilty of "contempt of
cop," or who are just plain bad people, denoted by the widespread
police use of the label "asshole."26 The importance of police "gut"
reactions to people and situations has so shaped our understanding
of what police do that one scholar makes it part of the very
definition of policing: "a mechanism for the distribution of
non-negotiably coercive force employed in accordance with the
dictates of an intuitive grasp of situational exigencies."27 Much
of this intuition goes beyond the "craft" of how to accomplish a
goal in a particular situation28 to a moral judgment about what
that goal should be.

16 Jacob R. Clark, Policing's Dirty Little Secret?, LAW ENFORCEMENT
NEWS, April 15, 1991, at 1, 10.

17 Id. at 1.

18 Id.

19 See Harold E. Pepinsky, Better Living Through Police Discretion,
47 LAW & CONTEMP. PROBS. 249 (1984).

20 BLACK, supra note 13, at 90.

21 See Douglas A. Smith & Christy A. Visher, Street-Level Justice:
Situational Determinants of Police Arrest Decisions, 29 SOC. PROBS.
167 (1981).

22 This fact has stimulated extensive social science theorizing and
commentary. See, e.g., MICHAEL P. BANTON, THE POLICEMAN IN THE
COMMUNITY (1964); BLACK, supra note 13; MICHAEL K. BROWN, WORKING
THE STREET (1981); ALBERT J. REISS, THE POLICE AND THE PUBLIC
(1971); JEROME H. SKOLNICK, JUSTICE WITHOUT TRIAL (1966); JAMES Q.
WILSON, VARIETIES OF POLICE BEHAVIOR (1968).

The "police justice" model of discretion has a clear consequence
for domestic violence: leading police to arrest the "unemployed,
unmarried, nonchurchgoing riffraff," while letting the more
respectable (and deferential) suspects they encounter go free.29
This practice is clearly supported by a just deserts view of police
as judicial officials and a free will conception of human behavior.
It falls down, however, on a premise of deterrence and determinism.
Criminological theory for the past half century has suggested that
persons most likely to be arrested for domestic violence are the
persons least likely to be deterred by an arrest.30 That was one
reason why a controlled experiment was necessary to test the
effects of arrest, even on people whom police would normally not
arrest. The more important reason, though, was to determine how
police could best prevent future domestic violence, regardless of
any "intuitive" grasp of the justice of the situation.

23 See Lawrence W. Sherman, Causes of Police Behavior: The Current
State of Quantitative Research, 17 J. RES. CRIME & DELINQ. 69
(1980).

24 See Sherman, Experiments in Police Discretion, supra note 8.

25 See WILLIAM A. WESTLEY, VIOLENCE AND THE POLICE (1970).

26 See John Van Maanen, The Asshole, in POLICING: THE VIEW FROM THE
STREET (Peter K. Manning & John Van Maanen eds. 1978).

27 EGON BITTNER, THE FUNCTIONS OF THE POLICE IN MODERN SOCIETY 46
(1970).

28 See WILSON, supra note 22.

29 See Sherman, Experiments in Police Discretion, supra note 8, at
78.

30 See generally TRAVIS HIRSCHI, CAUSES OF DELINQUENCY (1969); JOHN
LOFLAND, DEVIANCE AND IDENTITY (1969).

The results of the pioneering Minneapolis experiment helped
proponents of mandatory arrest to try to eliminate the police
justice model and restore the ministerial model, for that one
offense.31 To our knowledge, no other type of offense has ever been
subjected to an offense-specific mandatory arrest statute by any
state legislature. While field research suggests that police may
easily evade such mandates,32 the laws have at least increased
substantially the chances of suspects' being arrested for domestic
violence.33 They may even have created the closest approximation of
full enforcement ever achieved by American police. Whether or not
this approach can ever eliminate discretion is less important than
the content of the mandate: to arrest everyone, regardless of the
likely effects of the arrest on future violence.

An alternative to the ministerial approach is to take the likely
consequences of arrest into account in exercising police
discretion. The key criterion for deciding to arrest in any
specific case would be the probable effect of the arrest on the
suspect's future conduct, based on predictions derived from
controlled experiments in arrest. This "professional crime control"
model poses enormous difficulties in finding legally and ethically
acceptable guidelines for when arrests should and should not be
made.34 Yet the difficulties may be no greater than the inequitable
consequences resulting from a mandatory arrest policy. Equal
protection for suspects may produce unequal protection for victims.

The choice between "justice" and "crime control" models of police
discretion, up to now, has been moot. As long as criminology merely
raised questions of justice by documenting the inequities of police
discretion, the choice was limited to legislative versus police
conceptions of justice. This choice attracted relatively little
public concern outside the scholarly community of criminal law and
criminology, allowing the jurisprudence of arrest to lie dormant in
recent years. But if the evidence presented below is at all
persuasive, it demonstrates the need for a new approach to police
discretionary rule-making: one that confronts the variable effects
of arrest on criminal careers.

31 See Sherman & Cohn, The Impact of Research, supra note 6. See
also James W. Meeker & Arnold Binder, Experiments as Reforms: The
Impact of the 'Minneapolis Experiment' on Police Policy, 17 J.
POLICE SCI. & ADMIN. 147 (1990).

32 See Kathleen J. Ferraro, Policing Woman Battering, 36 SOC.
PROBS. 61 (1989).

33 See SHERMAN, POLICING DOMESTIC VIOLENCE, supra note 11.

34 Sherman, Experiments in Police Discretion, supra note 8, at 76.


III. RESEARCH DESIGN

From April 7, 1987 to August 8, 1988, the Milwaukee Police
Department conducted a controlled experiment in the use of arrest
for misdemeanor domestic battery.35 A controlled experiment is a
research design which attempts to isolate a cause and effect
relationship between two variables;36 in this case, police
decisions to arrest or not to arrest and subsequent domestic
violence by the suspects. The essential logic of a controlled
experiment is to make two or more groups virtually identical in all
respects except one: the treatment to be evaluated (in this case,
arrest). The elimination of rival hypotheses allows a very strong
inference of cause and effect to be made about differences in the
groups observed.

The method by which pre-existing differences between the groups are
minimized or almost eliminated is called random assignment, a
lottery method giving each suspect an equal probability of
receiving each treatment.37 Thus, whether a suspect is arrested or
not is purely a matter of chance, regardless of police officers'
intuitive grasp of the circumstances. This method of evaluating
legal practices has been endorsed by an advisory committee of the
Chief Justice of the United States,38 and it has not been subject
to legal challenge in the arrest experiments conducted to date. The
equal probability of arrest and no arrest in the Milwaukee
experiment was produced by a computer-generated sequence of police
responses (treatments) in advance of the experiment. This sequence
was sealed and kept secret from all participants in the experiment
until the actual occurrence of each of the 1200 cases eligible for
entry into the experiment.
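
The idea of a treatment sequence generated in advance can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the procedure the authors used: the article does not say how the sequence was produced, so the independent equal-probability draw, the seed value, and the helper name generate_assignment_sequence are all hypothetical. The codes 1, 2 and 3 stand for the three police responses the article defines in its section on random assignment and treatments.

```python
import random
from collections import Counter

# Treatment codes as defined in the article's random-assignment section:
# 1 = full arrest, 2 = short arrest, 3 = warning.
TREATMENT_CODES = (1, 2, 3)


def generate_assignment_sequence(n_cases, seed):
    """Pre-generate one treatment code per future case.

    Hypothetical helper: each case independently receives each code with
    equal probability (simple, unblocked randomization is an assumption).
    """
    rng = random.Random(seed)  # fixed seed: the sequence can be regenerated and audited
    return [rng.choice(TREATMENT_CODES) for _ in range(n_cases)]


# The whole sequence exists before the first case arrives, so an officer's
# reading of a situation cannot influence which treatment a case receives.
sequence = generate_assignment_sequence(n_cases=1200, seed=1987)
counts = Counter(sequence)
print(counts)
```

Under simple randomization the three groups come out close to, but usually not exactly, equal in size; a blocked design would be needed to force exactly equal groups.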

Unlike the earlier Minneapolis experiment (and all of its other
replications), the Milwaukee experiment was conducted well after
the May 1, 1986 implementation of a citywide policy of mandatory
arrest. Thus it had the effect of reducing the severity of police
response in the control group, rather than increasing it in the
experimental group. While that effect improved the ethical posture
of the experiment,39 it is unclear what effect it may have had on
the results. The effect of giving a "break" to the control group
may be different from "cracking down" on the experimental group.
The low level of awareness of the mandatory arrest policy among the
victims and suspects in the sample, however, suggests that the
prior existence of mandatory arrest had little effect on the
results.40

35 See SHERMAN, POLICING DOMESTIC VIOLENCE, supra note 11, at app.
2; Sherman, From Initial Deterrence, supra note 11, at 826.

36 See SOCIAL EXPERIMENTATION (Henry W. Riecken & Robert F. Boruch,
eds., 1974).

37 See STUART J. POCOCK, CLINICAL TRIALS: A PRACTICAL APPROACH
(1983).

38 ADVISORY COMMITTEE ON EXPERIMENTATION IN THE LAW, FED. JUDICIAL
CTR., REPORT (1981).

39 See Norval Morris, Impediments to Penal Reform, 33 U. CHI. L.
REV. 627 (1966).

A. SAMPLE

The experiment was conducted in four of the six police patrol
districts in Milwaukee. While the districts were racially and
economically diverse, most of the cases in the experiment came from
poor black neighborhoods. This is consistent with the
often-observed pattern of greater frequency of requests for police
intervention in domestic disturbances in such areas than in
predominantly white working class and middle-class
neighborhoods.41 The resulting sample of suspects was ninety-one
percent male, seventy-six percent black, sixty-four percent never
married to the victim, fifty-five percent unemployed, thirty-one
percent high school graduates, forty-two percent intoxicated at the
time police arrived, and fifty percent with a prior arrest record,
consisting of thirty-two percent with a prior arrest for domestic
battery against anyone and twenty-six percent with a prior arrest
for a battery against the same victim as in the presenting case.
These characteristics of the 1200 eligible cases were not very
different from the 854 ineligible cases encountered by the
thirty-five specially selected officers who participated in the
experiment; the most frequent reason for ineligibility was the
absence of the offender from the scene (Table 1).

40 Twenty-four percent of the victims and nineteen percent of the
suspects interviewed correctly identified the city's policy of
mandatory arrest. Sherman, From Initial Deterrence, supra note 11,
at 845.

41 See BLACK, supra note 13, ch. 6; M. P. Baumgartner, Law and the
Middle Class: Evidence From a Suburban Town, 9 LAW & HUM. BEHAV. 3
(1985).

TABLE 1
CASE ACTIVITY AND INELIGIBILITY REASONS BY DISTRICT

                              District 2  District 3  District 5  District 7    Total
Number of Ineligible Cases      119         363         157         215           854

Primary Ineligible Reason         N    %      N    %      N    %      N    %        N
Suspect Not On Scene             63   53    211   58     63   40    144   67      481
Open Warrants, Commitments       12   10     36   10     24   15     11    5       83
Imminent Danger To Victim        12   10     24    7      9    6      8    4       53
Serious Injury To Victim          6    5     19    5     10    6     11    5       46
Both Parties Arrested             2    2     13    4      5    3      5    2       25
Officer Decision                  6    5      6    2      7    4      4    2       23
Valid Restraining Order           1    1      5    1      6    4      4    2       16
Victim Insists On Arrest                                                           13
Officer Assaulted                                                                   9
Victim Assaulted At Scene                                                           7
Other                                                                              98

The 1200 eligible cases encountered by the thirty-five
experimenting officers constituted twenty-five percent of all
domestic violence incidents reported by all police in those four
districts during the eight-hour shift (7:00 p.m. to 3:00 a.m.) in
which the experiment was conducted. There is good reason to believe
that the experimental cases were typical of all cases citywide,
since all officers in those four districts in those eight hours
produced forty percent of all domestic batteries citywide,
twenty-four hours a day. Dispatchers were instructed to refer cases
to the experimental officers whenever they were available,
regardless of the area of the district in which the case was
located. Other officers also frequently referred cases to the
experimental officers, especially when they judged the cases to be
eligible: suspect and victim both present; probable cause to
arrest; victim and suspect currently or formerly married,
cohabiting, or parents of a child in common; no valid restraining
order in effect; no outstanding arrest warrants against either
party; one party only eligible for arrest; no serious injury; no
apparent threat of immediate violence after police leave; and a
victim who did not insist upon an arrest being made.

These restrictions created some limitations on the generalizability
of the results but apparently allowed about half of all mandatory
arrest situations into the experiment (with fifty-eight percent of
the cases the experimenting officers encountered). Moreover,
inspection of the cases deemed ineligible in each of the four
districts shows a fairly high level of consistency (Table 1).
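
The fifty-eight percent figure can be checked against the case counts reported in the text and in the totals column of Table 1. The short Python check below is a sketch of that arithmetic; it assumes that the 1200 eligible and 854 ineligible cases together make up everything the experimenting officers encountered, which is how the text reads but is not stated as a formula in the article. The variable names are illustrative.

```python
# Eligible cases, as reported in the text.
eligible_cases = 1200

# Ineligible cases by primary reason (totals column of Table 1).
ineligible_by_reason = {
    "Suspect Not On Scene": 481,
    "Open Warrants, Commitments": 83,
    "Imminent Danger To Victim": 53,
    "Serious Injury To Victim": 46,
    "Both Parties Arrested": 25,
    "Officer Decision": 23,
    "Valid Restraining Order": 16,
    "Victim Insists On Arrest": 13,
    "Officer Assaulted": 9,
    "Victim Assaulted At Scene": 7,
    "Other": 98,
}

ineligible_cases = sum(ineligible_by_reason.values())
encountered = eligible_cases + ineligible_cases

# Share of all encountered cases that entered the experiment.
eligible_share = eligible_cases / encountered
print(ineligible_cases, encountered, f"{eligible_share:.1%}")
```

The reasons sum to the 854 ineligible cases given in Table 1, and 1200 of 2054 encountered cases is a little over fifty-eight percent, matching the text.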

B. RANDOM ASSIGNMENT AND TREATMENTS

If the case was deemed eligible, participating officers agreed to
radio headquarters for a warrant check. If no warrants were
outstanding, they were to radio or phone the Crime Control
Institute (CCI) office with the names and dates of birth of the
suspect and victim, as well as the officer's payroll number. The
CCI staff would then open a wax-sealed envelope (prepared in
Washington, D.C.) in a pre-arranged sequence, containing a piece of
paper marked "1," "2," or "3." The numbers were codes for police
actions (treatments):

Code 1: Standard arrest under mandatory arrest policy; suspect
eligible for release on $250 bail, cash or credit card.

Code 2: Suspect to be arrested in the same way, but to be released
on personal recognizance as quickly as possible after arrival at
central booking, preferably within two hours.

Code 3: Suspect not to be arrested, but police to read a standard
warning of arrest if police had to return that evening.

The labels "1," "2" and "3" as well as "Full Arrest," "Short
Arrest" and "Warning" are used below in both text and tables as
shorthand for these three treatments.

The purpose of comparing two lengths of time in custody was to
determine whether differences across police agencies in average
dosage of custody time affected the results of arrest. The earlier
Minneapolis experiment had been conducted with a night in jail as
the minimum dosage, while other agencies around the country were
reportedly releasing arrested suspects within two hours.42

This screening process was to be undertaken regardless of prior
contact with the experiment, just as the earlier Minneapolis
experiment had done. The one exception was for prior Code 3 cases
on the same night. If the officers had to return again, they were
instructed to abort the random assignment and make an arrest,
consistent with the warning delivered on the first encounter.
Handling each event as the unit of analysis for separate
randomization (rather than consistent application of the same
treatment once a suspect had been randomized as the unit of
analysis) was a major difference between Milwaukee and several of
the other replications of the pioneering Minneapolis experiment,
such as in Omaha, Nebraska.43

An even greater difference between the Milwaukee and Minneapolis experiments was the high degree of compliance with the randomized design achieved by the Milwaukee officers. As Table 2 shows, in over ninety-eight percent of the cases, the treatments actually delivered were the same as the randomly assigned treatments contained in the envelopes. This includes repeat randomization of some couples, for a total of 1,112 couples across the 1200 cases. 44

42 Our own survey of fifteen Wisconsin police departments found that eight of them released domestic violence suspects in less than three hours. Sherman, From Initial Deterrence, supra note 11, at 824.

43 See Franklyn W. Dunford et al., The Role of Arrest in Domestic Assault: The Omaha Police Experiment, 28 CRIMINOLOGY 183 (1990).

1992]

SHERMAN ET AL.

TABLE 2
TREATMENTS AS RANDOMIZED AND DELIVERED

                            Treatments as Randomized
Treatments as Delivered     Arrest/Hold   Arrest/Release   Warn    Total
Arrest/Hold                 400           13               1       414
Arrest/Release              1             384              1       386
Warn                        3             1                396     400
Total                       404           398              398     1200
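The compliance figure and the repeat-couple share can be checked directly from the counts in Table 2 and the text. The sketch below is only an arithmetic check; the variable and function names are illustrative assumptions and do not come from the original study.

```python
# Table 2 counts: rows are treatments as delivered,
# columns are treatments as randomized (Arrest/Hold, Arrest/Release, Warn).
delivered_by_randomized = [
    [400, 13, 1],   # delivered Arrest/Hold
    [1, 384, 1],    # delivered Arrest/Release
    [3, 1, 396],    # delivered Warn
]


def compliance_rate(table):
    """Share of cases whose delivered treatment matched the randomized one."""
    total = sum(sum(row) for row in table)
    matched = sum(table[i][i] for i in range(len(table)))
    return matched / total


total_cases = sum(sum(row) for row in delivered_by_randomized)
matched_cases = sum(delivered_by_randomized[i][i] for i in range(3))
misassigned = total_cases - matched_cases
rate = compliance_rate(delivered_by_randomized)

# Repeat couples, from the text: 1,112 couples across 1,200 cases.
repeat_cases = 1200 - 1112
repeat_share = repeat_cases / 1200

print(f"cases={total_cases}, misassigned={misassigned}, compliance={rate:.1%}")
print(f"repeat cases={repeat_cases} ({repeat_share:.1%})")
```

The diagonal of the table holds the cases delivered as assigned, so the off-diagonal total is the count of "treatment failures" discussed in the text.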

Most of the twenty "treatment failures," as we trained police to think of them, were cases randomly assigned to arrest and release which had to be misassigned to arrest and hold. Most of those, in turn, were due to failures of information systems supporting police in the field. The most common problem (6 of the 20) was incorrect field information about whether the suspect was wanted on a warrant. When the arrest/release suspects were brought to headquarters for booking, they were subjected to a second warrant check. Three of those suspects were found to have given false names in the field, and three were found to have had a warrant that the original, radio-transmitted warrant check had not found. A seventh case was barred for early release by the booking officers because of outstanding municipal warrants, in violation of the official orders for the experiment.

The remaining thirteen reasons for misassignments reveal the human limitations on random assignment in these circumstances. Seven of those cases were caused by unpredictable events after the envelope was opened. Two of those cases were changed from arrest/release to arrest/hold after the suspect became violent in the booking area. Two cases were changed because the suspects were hospitalized and could be neither booked nor released on recognizance. Two cases were changed to arrest and hold after evidence of additional crimes was discovered at the scene (theft in one case, drug possession in another). One case was changed to arrest and hold due to an escalation of danger at the scene after the envelope was opened. The last six misassignments were due to simple officer error.

The three treatments produced substantially different experiences for both victims and suspects. 45 Perhaps the most important difference was the special processing needed to get the short arrest suspects out of custody within the two-hour goal. The result of their being taken to the head of the line at most stages of the booking process was an average time in custody of about three hours, compared to an estimated eleven hours or more for the suspects randomly assigned to full arrest. 46 Whether this experience is comparable to speedy booking for everyone in smaller police agencies remains an unanswered question.

44 This means that 7.3% (88 of 1200) of the randomized cases were repeat couples, almost identical to the 7.5% (25 of 330) in the earlier Minneapolis study.

[Vol. 83

MILWAUKEE EXPERIMENT

C. OUTCOME MEASURES

Four outcome measures were used to estimate the prevalence and frequency of repeat violence by the sample suspects. The most comprehensive and precise was the "hotline" reports called in by all police citywide to the battered women's shelter whenever they encountered a case of domestic battery, whether or not they could make an arrest. These reports encompassed most, but not all, of the second and third data sources: arrests of the suspects for repeat violence (against any victim, including the same one as in the presenting incident), and offense reports of repeat violence by the same suspect against the same victim. All three of these "official" sources were available for 100% of the cases.

The fourth data source was up to two face-to-face interviews conducted with the victim in each randomized case. One interview was attempted shortly after the presenting incident, for the first 900 of the 1200 cases. A separate interview was attempted in all 1200 cases six to twelve months after the presenting incident. The initial interviews were suspended after 900 cases to test for any possible influence of the interviews on the rate of repeat violence. 47 Response rates for both interviews were fairly high, at seventy-eight percent for the initial interviews and seventy-seven percent for the long-term follow-ups.

IV. MAIN EFFECTS

The analysis of the Milwaukee experiment proceeded in two stages. The first stage was the analysis of the "main effects" of the randomized experiments, or the differences (or lack of them) in outcome measures between the three treatment groups. The second stage analyzed differences in treatment effects within various subgroups of the sample. Of the two, the main effects are more statistically powerful and more straightforwardly interpretable. Their analysis begins with an examination of the effects of the treatments on the amount of time each couple spent together during the follow-up period. Answering this question is a necessary first step in determining whether any differential incapacitation effect has occurred which might obscure or falsely portray any deterrent effects. 48

45 Sherman, From Initial Deterrence, supra note 11, tbl. 2, at 831.

46 Id.

47 No differences in effects of arrest were found between the last 300 cases and the first 900 cases.

A. TIME-AT-RISK

One possibility is that making an arrest might be more likely than failing to make an arrest to break up a couple; the arrested suspect may simply never return home after the arrest, whereas the warned suspect was never taken away. This did not happen very often, however. Among the Milwaukee arrest group couples, seventy-four percent had been together again by the time of the initial interview. By the time of the six month interview, forty-one percent of all victims said they were living with the suspect then, and another thirty-one percent said they had lived with the suspect for at least part of the time since the randomized police response. Among those who were living together, seventy-two percent had cohabited all of the time since the randomized response.

The key question for our analysis is whether time-at-risk varied by treatment group. One way to answer that question is by analyzing the set of interviews that were done consistently near the six month anniversary of the randomized response, namely the 563 follow-up interviews completed between case 473 and case 1200. These data have the least amount of error in estimating time-at-risk due to variations in the amount of time since the presenting incident. They show that there were only slight differences in the extent of cohabitation across the three treatment groups.

Among those interviewed close to six months in all three treatment groups (N = 563, see supra), the majority of couples were no longer cohabiting: only forty-five percent of the full arrest cases, forty-four percent of the short arrest cases, and thirty-eight percent of the warning cases were cohabiting at the time of the interview. Of those still cohabiting at six months, the proportions who had cohabited the entire time since the presenting incident were seventy percent, seventy-one percent, and eighty-five percent, respectively. Among couples not cohabiting at the time of the interview, thirty-one percent, thirty-two percent and twenty percent, respectively, had cohabited for some portion of the six month follow-up period. Note that the differences between the arrest and warning groups are not always consistent in direction, although they do show lower prevalence of any cohabitation in the warning group compared to the arrest group.

48 See Albert J. Reiss, Some Failures in Designing Data Collection That Distort Results, in COLLECTING EVALUATION DATA: PROBLEMS AND SOLUTIONS 161 (Leigh Burstein et al., eds., 1985).

TABLE 3
COHABITATION DAYS TO FOLLOW-UP INTERVIEW BY TREATMENT GROUP

Treatment       N of          Mean Days to   Mean Days      Standard    Cohabitation
Group           Interviews    Interview      Cohabitation   Deviation   Ratio
Full Arrest     315           292            136            151         .47
Short Arrest    280           279            115            139         .41
Warning         287           295            121            147         .41

t tests
Full arrest vs. Short arrest    t = 1.79,   df = 593,  p = .074
Full arrest vs. Warning         t = 1.27,   df = 600,  p = .205
Short arrest vs. Warning        t = -.49,   df = 565,  p = .623
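The cohabitation ratios and t statistics in Table 3 can be approximately reproduced from the published summary statistics. The sketch below assumes that the Standard Deviation column refers to days of cohabitation and that the tests were pooled-variance two-sample t tests, which is inferred from the reported degrees of freedom (n1 + n2 - 2) rather than stated in the article. Because the published means are rounded, the recomputed t values differ slightly from the printed ones.

```python
import math

# Table 3 summary statistics per group:
# (N interviews, mean days to interview, mean days cohabitation, standard deviation)
table3 = {
    "Full Arrest": (315, 292, 136, 151),
    "Short Arrest": (280, 279, 115, 139),
    "Warning": (287, 295, 121, 147),
}


def cohabitation_ratio(group):
    """Mean days cohabiting divided by mean days to interview."""
    _, days_to_interview, days_cohabiting, _ = table3[group]
    return days_cohabiting / days_to_interview


def pooled_t(group_a, group_b):
    """Pooled-variance two-sample t statistic; returns (t, degrees of freedom)."""
    n1, _, m1, s1 = table3[group_a]
    n2, _, m2, s2 = table3[group_b]
    df = n1 + n2 - 2
    pooled_var = ((n1 - 1) * s1 ** 2 + (n2 - 1) * s2 ** 2) / df
    standard_error = math.sqrt(pooled_var * (1 / n1 + 1 / n2))
    return (m1 - m2) / standard_error, df


for group in table3:
    print(f"{group}: ratio = {cohabitation_ratio(group):.2f}")

for a, b in [("Full Arrest", "Short Arrest"),
             ("Full Arrest", "Warning"),
             ("Short Arrest", "Warning")]:
    t, df = pooled_t(a, b)
    print(f"{a} vs. {b}: t = {t:.2f}, df = {df}")
```

None of the three recomputed statistics reaches the conventional two-tailed .05 threshold, which is consistent with the p values printed in the table.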

Another test for differences in time-at-risk is to estimate the total number of days of cohabitation reported by the victims at all follow-up interviews, regardless of when the interviews were done (N = 882, Table 3). This procedure required distinguishing four categories from among the interview data: (1) those who cohabited continuously; (2) those who had not cohabited continuously but were cohabiting on the date of the interview; (3) those who were not cohabiting on the date of the interview but had cohabited some of the time since the presenting incident; and (4) those who had not cohabited at all since the presenting incident. Precise estimates of the number of days of cohabitation were available for the first and fourth categories from the dates of the presenting incident and the interview. The two middle categories, however, provide only victim recall, in days, weeks or months, to estimate the days of cohabitation. 49

Table 3 presents the results of our estimates (expressed as Mean Days Cohabitation) of actual days at risk, for each treatment group. It shows that there were no greater differences in time-at-risk than we would expect by chance variation (p = .05). It also shows that, on average, all three treatment groups were cohabiting less than half the time from the presenting incident to the interview. We do not know whether this represents a before-after decrease in the cohabitation ratio (days cohabiting divided by total days). The relationships could have been just as intermittent and variable in level of cohabitation in the period before the presenting incident as in the period after. We do know, however, that ninety percent of the 1200 police reports and seventy-four percent of the 900 initial victim interviews reported that the couples were cohabiting on the date of the presenting incident. This compares to only forty-one percent of the total follow-up interviews reporting cohabitation since the presenting incident. Moreover, thirty-six percent (114) of the full arrest group's victims, forty-two percent (119) of the short arrest victims, and forty-seven percent (135) of the warning group's victims reported zero days of cohabitation since the presenting incident. The evidence suggests, then, that there was a reduction in the prevalence of cohabitation (as a percentage of all couples), even if there might not have been an overall reduction in couple-days at risk.

49 Victim recall of the days of cohabitation is far from perfect. In five cases, for example, the victim's estimates for groups 2 and 3 were in excess of the time between the presenting incident and the interview. In twenty-two other cases, the victim said they had lived together some of the time but provided no estimate for how much time. For reasons like this, we treated 39 of the 921 interviews as missing data, without examining their treatment groups. That left 882 interviews across all three treatment groups.

We conclude two things from these findings. First, the differences in contact across treatment groups are not great enough to affect the findings presented below about differences in repeat violence between the groups. Whatever differences in recidivism we find are more likely attributable to deterrence or escalation than to differential time-at-risk. Second, the potential differences in deterrence within the relationship are greatly attenuated by the low time-at-risk overall and by reduced prevalence of cohabitation. This is significant from a policy standpoint, since it suggests that no matter which of the three responses police provided, one major sequel was a tendency for the couple to split up. This fact alone may help explain, for example, the finding that women who called police about domestic violence in the late 1970s (when police did not usually make arrests) were half as likely to suffer repeat violence as those who did not call police, an effect possibly due entirely to reduced time-at-risk. 50

B. INITIAL DETERRENCE

We have reported elsewhere a clear initial deterrent effect of both short and full arrest treatments in comparison to the warning treatment. 51 For thirty days or more after the presenting incidents, the prevalence (proportion of cases with one or more instances) of repeat violence reported in victim interviews is substantially lower for the arrest groups. For short arrest only, the frequency (average number of instances per case) of violence reported to the hotline is significantly lower than for the warning group. Other official measures (arrest and offense reports) show no evidence of initial deterrence, either in frequency or prevalence.

50 See PATRICK A. LANGAN & CHRISTOPHER A. INNES, BUREAU OF JUSTICE STATISTICS, PREVENTING DOMESTIC VIOLENCE AGAINST WOMEN (1986).

FIGURE 1

Survival Functions by Arrest or Warning

[Figure not reproduced. Horizontal axis: Number of Months, 2 to 18.]

Here, we display the initial deterrent effect of both types of arrest combined, a procedure recommended by some commentators. 52 Figure 1 shows the "survival" trend in the prevalence of repeat violence over time, with an obviously clear advantage for the arrested suspects in the early days. At about seven to nine months after the presenting incidents, however, the arrest and non-arrest curves cross over, and from there on out the arrest group does worse. 53

51 Sherman, From Initial Deterrence, supra note 11, at 836.

52 See, e.g., Arnold Binder & James W. Meeker, Experiments as Reforms, 16 J. CRIM. JUST. 347 (1988).

TABLE 4
LONG TERM PREVALENCE OF SAME-VICTIM REPEAT VIOLENCE
(during period up to follow-up interview date)

                                     Treatment                     P Value of Pair Differences*
                                     Full      Short
Measure                              Arrest    Arrest    Warning   1&2     1&3     2&3
Sample N = 921                       324       300       297
All Victim Interviews                                              n.s.    n.s.    n.s.
  Repeat Violence N                  113       89        92
  Prevalence Ratio                   35%       30%       31%
Hotlines to Interview Date                                         n.s.    n.s.    n.s.
  Repeat Violence N                  88        80        78
  Prevalence Ratio                   27%       27%       26%
Arrests to Interview Date                                          n.s.    n.s.    n.s.
  Repeat Violence N                  66        62        69
  Prevalence Ratio                   20%       21%       23%
Offenses to Interview Date                                         n.s.    n.s.    n.s.
  Repeat Violence N                  86        75        75
  Prevalence Ratio                   27%       25%       25%
Any Measure to Interview Date                                      n.s.    n.s.    n.s.
  Repeat Violence N                  148       131       131
  Prevalence Ratio                   46%       44%       44%

* P < .05, two tailed tests. n.s. means non-significant.
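The prevalence ratios in Table 4 are the repeat-violence counts divided by the number of interviewed cases in each treatment group. The sketch below recomputes them as a consistency check. It assumes the column assignment that the flattened source layout implies (in particular, that the isolated values 78 and 69 belong to the Warning column); the dictionary and function names are illustrative only.

```python
# Interviewed cases per treatment group (Table 4, Sample N = 921),
# in the order Full Arrest, Short Arrest, Warning.
sample_n = (324, 300, 297)

# Cases with any same-victim repeat violence, by measure.
repeat_violence_n = {
    "All Victim Interviews": (113, 89, 92),
    "Hotlines to Interview Date": (88, 80, 78),
    "Arrests to Interview Date": (66, 62, 69),
    "Offenses to Interview Date": (86, 75, 75),
    "Any Measure to Interview Date": (148, 131, 131),
}


def prevalence_percent(measure):
    """Whole-number percentage of cases with repeat violence, per group."""
    counts = repeat_violence_n[measure]
    return tuple(round(100 * n / total) for n, total in zip(counts, sample_n))


for measure in repeat_violence_n:
    full, short, warning = prevalence_percent(measure)
    print(f"{measure}: {full}% / {short}% / {warning}%")
```

Under that column assignment every recomputed percentage agrees with the printed table, which supports the reconstruction of the Warning column.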

C. LONG-TERM ESCALATION

Whatever the initial effects may be, there is clearly no long-term deterrence from arrest in the Milwaukee experiment. Tables 4 and 5 show no reductions in either the prevalence of same-victim violence or the frequency of any-victim violence in the arrest groups compared to the non-arrest (warning) group. The only significant differences, in fact, are those showing arrest increasing the risk of violence. These differences are not consistent enough across measures for us to draw a conclusion that arrest backfired, and the magnitude of the increased risk from arrest is generally small. But the direction of the difference is fairly consistent across measures in favor of warnings yielding lower long-term risks of repeat violence.

The problem with Tables 4 and 5 is that they suffer from "truncation," as statisticians call it, in their long term effects. The follow-up period is cut off arbitrarily, and the truncation is inconsistent across cases. This raises various problems of interpretation, and also violates important assumptions necessary to use the tests of statistical significance we have employed here. In order to deal with the truncation problem and to take full advantage of the maximum period of observation completed after each randomized case, we computed the mean number of days to the first repeat incident of domestic violence among the thirty-six percent of all cases with any repeat violence at any time, during a period of up to twenty-two months after the randomized police response. 54 This comparison of arrest and warning yields a statistically significant escalation effect for the arrest treatment. At a mean of 124 days to first repeat violence, the combined arrest group recidivated twenty-three percent sooner than the warning group, which averaged 160 days to first failure.

53 In order to make them comparable, Figures 1 and 2 are limited to the 1,133 cases for which employment data are available.

TABLE 5
LONG-TERM FREQUENCY OF ANY-VICTIM REPEAT VIOLENCE
(unrestricted follow-up period)

                                     Treatment                     P Value of Pair Differences*
                                     Full      Short
Measure                              Arrest    Arrest    Warning   1&2     1&3     2&3
Sample N = 1200                      404       398       398
Hotlines                                                           n.s.    .02     .00
  Repeat Violence N                  296       301       261
  Mean Events Per Suspect            .73       .76       .66
Arrests                                                            n.s.    n.s.    n.s.
  Repeat Violence N                  146       157       151
  Mean Events Per Suspect            .36       .39       .38
Offenses                                                           .04     n.s.    n.s.
  Repeat Violence N                  200       168       179
  Mean Events Per Suspect            .49       .42       .45
Offenses Without Arrests                                           .00     .02     n.s.
  Repeat Violence N                  134       84        101
  Mean Events Per Suspect            .33       .21       .25

* P < .05, two tailed tests. n.s. means non-significant.
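Two of the figures in this passage are simple ratios and can be checked by hand or with a few lines of code: the mean events per suspect in Table 5 (events divided by group size) and the "twenty-three percent sooner" comparison of mean days to first repeat violence. The sketch below is only an arithmetic check; the names are illustrative assumptions, and the printed table values agree with the recomputed ones to within about .01.

```python
# Randomized cases per group (Table 5): Full Arrest, Short Arrest, Warning.
group_n = (404, 398, 398)

# Total repeat-violence events by measure, in the same group order.
repeat_events = {
    "Hotlines": (296, 301, 261),
    "Arrests": (146, 157, 151),
    "Offenses": (200, 168, 179),
    "Offenses Without Arrests": (134, 84, 101),
}


def mean_events_per_suspect(measure):
    """Events divided by the number of randomized cases in each group."""
    return tuple(e / n for e, n in zip(repeat_events[measure], group_n))


# Mean days to first repeat violence among cases with any repeat violence.
days_combined_arrest = 124
days_warning = 160
share_sooner = (days_warning - days_combined_arrest) / days_warning

for measure in repeat_events:
    means = mean_events_per_suspect(measure)
    print(measure, " ".join(f"{m:.2f}" for m in means))
print(f"arrest group recidivated {share_sooner:.1%} sooner")
```

The time-to-failure ratio comes to 22.5 percent, which the text reports as twenty-three percent.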

The time to failure measures, however, have great limitations for policy research on violence. Originally designed to analyze the permanent "failures" of light bulbs burning out or medical patients dying, the models lose the important information on what happens after the initial failure. The question of total repeat violence, and not just whether there has been any, is also an important one to the police officers who conducted the experiment. As they told us in our last meeting of the experiment, their primary concern was the reduction of calls to police about domestic violence citywide. This concern requires that effects on high-rate offenders be weighted more heavily than effects on low-rate offenders, with analyses that take total numbers of violent events into account.

54 This computation is also restricted to the 1,133 cases used in Figures 1 and 2.

If that is the case, then the victim interview data must be cast aside, given the difficulty of obtaining a precise count of events in the victim interviews. (They were also set aside, of course, from the time-to-failure analyses in Minneapolis and Omaha because victims also have difficulty in giving a precise date for even one offense.) The hotline data, however, are ideally suited to the task of providing exact counts. And as reported elsewhere, the hotline data also show a statistically significant long-term escalation effect from arrest. 55 The effect is limited to the short arrest treatment only, but that fact may have broad policy significance for the many police agencies releasing domestic violence arrestees within three hours of arrest.

In sum, the main effects analysis shows some evidence of initial deterrent effects, no evidence of long-term deterrent effects, and some evidence of long-term escalation in both the timing and frequency of violence against any victim. While the large number of tests showing statistical nonsignificance may make some readers suspect that some of the effects occurred by chance, there is little doubt that the main effects of the Milwaukee experiment fail to replicate the strong specific deterrence showing of the earlier Minneapolis experiment.

V. VARIABLE EFFECTS

The large sample size of the Milwaukee experiment was explicitly designed to go beyond the main effects and to explore the possibility that arrest may have different effects on different kinds of people. Four years before the experiment began, the central hypothesis was described at a Duke Law School conference on police discretion: that more socially marginal people, as indicated by such characteristics as unemployment and unmarried cohabitation, would be less deterrable than less marginal people. 56 The experiment collected data on both those indicators of marginality, as well as several others, including high school graduation, length of prior cohabitation, and race (because of its effect on employment rates). The hypothesis was not necessarily that arrest would backfire for the more marginal groups, although that is generally what was found with respect to frequency of repeat violence and less so with respect to its prevalence.

55 Sherman, From Initial Deterrence, supra note 11, at 837.

56 Sherman, Experiments in Police Discretion, supra note 8, at 78.

A. TWO CAUTIONS

1. Experimental vs. Correlational Results

The whole purpose of doing experiments is to reduce the uncertainty associated with correlational analysis. The endless number of possible correlations to test always leaves researchers uncertain whether the correlations found are true "causes" or mere coattails to a hidden truly causal factor. By randomizing, experimenters virtually eliminate such unknown rival hypotheses. That is why the main effects analysis is more straightforward.

A problem arises when one begins to explore how different subgroups within an experiment react to the experimental treatment. The strongest way to examine that question is to plan those explorations in advance, building them into the design. By "blocking," or assigning police responses under a separate random schedule for each subgroup, one would still eliminate rival hypotheses for the apparent effects of the treatments within each group. 57 If we had done that in Milwaukee, for example, we would have had a separate set of pre-randomized envelopes for black and white suspects, or for employed and unemployed suspects. We could even have used separate sets of envelopes for some combinations of such factors (called factorial designs), such as employed unmarried suspects, unemployed married suspects, employed married suspects, etc.
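To make the idea of blocking concrete, the following is a minimal sketch of a blocked schedule: each subgroup gets its own independently shuffled stack of treatment "envelopes." It is purely hypothetical. The Milwaukee experiment did not block its randomization, and the function name, the seed, the block labels, and the choice to balance treatments equally within each block are all assumptions made for illustration.

```python
import random

TREATMENTS = ("Full Arrest", "Short Arrest", "Warning")


def blocked_schedule(blocks, cases_per_block, seed=0):
    """Build a separate, independently shuffled envelope stack per block.

    Hypothetical helper: within each block the three treatments appear
    equally often, so cases_per_block must be a multiple of three.
    """
    if cases_per_block % len(TREATMENTS) != 0:
        raise ValueError("cases_per_block must be a multiple of the number of treatments")
    rng = random.Random(seed)  # fixed seed so the schedule can be audited
    schedule = {}
    for block in blocks:
        envelopes = list(TREATMENTS) * (cases_per_block // len(TREATMENTS))
        rng.shuffle(envelopes)
        schedule[block] = envelopes
    return schedule


# Two blocks on one factor; a factorial design would instead use one block per
# combination of factors, e.g. ("employed", "married"), ("employed", "unmarried").
schedule = blocked_schedule(["employed", "unemployed"], cases_per_block=6)
for block, envelopes in schedule.items():
    print(block, envelopes)
```

In use, an officer would first classify the suspect into a block and then open the next envelope from that block's stack, which is where the practical and political complications discussed in the text arise.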

Just contemplating such a design, however, shows how complicated it can become. It can also raise major political problems in the selection of factors for blocking randomization within the separate lists. When this experiment was negotiated in 1986, the use of randomized experiments in arrest was still a very fragile idea, with only one precedent. Blocking randomization in advance on individual suspect or victim factors could have caused enough controversy to kill the whole venture and so was eschewed.

Many analysts have advocated examining the underlying structure of main effects in randomized experiments, much as surveys analyze demographic patterns in attitudes and reported behavior. These "post-hoc" analyses of experiments can strongly suggest causal relationships for some kinds of people. But what post-hoc analysis cannot do is prove that there is a causal interaction effect between a randomized treatment and a correlated characteristic. The second stage analysis results of differences in treatment effects within subgroups, which are reported in this article, are couched in strong language because we believe the findings to be theoretically coherent and very likely to represent truly causal relationships. But without a randomized design within each of the subgroups, we cannot be nearly as certain of the interaction effects as we are of the "main effects" of no difference across treatments.

57 See POCOCK, supra note 37.

2. Replicated vs. Unreplicated Results

A second caution is also in order. The earlier Minneapolis experiment had a broad-ranging policy impact long before any attempt was made to replicate it. This fact has been the subject of considerable discussion 58 and criticism. 59 While it is arguably better to make policy on an unreplicated finding than on no finding at all, it is important to know the difference.

The Milwaukee findings are not unreplicated. At the time of this writing, they have been replicated on two out of two attempts, as reported below. But we must remind the reader that these three experiments are just snapshots of three cities at three times. Not enough is yet known about how social experiments generalize to other times and places to be certain the thrice-observed effects will hold true. This is true no matter how often a finding is replicated. Nonetheless, the replication of the findings increases our confidence about their generalizability.

B. ESCALATION AMONG MARGINAL SUSPECTS

1. Prevalence of Repeat Violence

Table 6 presents the differences in the prevalence rates of each treatment group within a uniform six month (183-day) period following the presenting randomized incident, controlling for various indicators of individual characteristics. These rates show, in effect, the odds of any given individual suspect committing at least one new act of domestic violence. The relative (as distinct from absolute) percentage differences in those odds, calculated using the warning group rate as the base of one hundred percent, all show that arrest versus non-arrest treatment has very different effects for different kinds of people.

The most consistent prevalence effect is that those with high stakes in social conformity experience a deterrent effect from both versions of arrest, while those with low stakes in conformity show no such effect. Those who are employed, high school graduates, white, or married and those who have cohabited for over two years all show substantially lower prevalence rates of repeat violence when randomly assigned to arrest than when warned. Yet their opposites (unemployed, dropouts, etc.) show little difference in prevalence of recidivism between being arrested or warned.

58 See Sherman & Cohn, The Impact of Research, supra note 6.

59 See Richard O. Lempert, Humility is a Virtue: On the Publicization of Policy-Relevant Research, 23 LAW & SOC'Y REV. 145 (1989).

TABLE 6
PREVALENCE OF REPEAT HOTLINE REPORTS PER 10,000 SUSPECTS
(during a six-month period)

Individual           Full      Short                 % Difference
Characteristic       Arrest    Arrest    Warning     1&3       2&3
Prior                3,341     3,950     3,846       -13.3     02.7
                     (137)     (119)     (130)
No Prior             1,873     1,828     2,089       -10.3     -12.5
                     (267)     (279)     (268)
Blacks               2,656     2,721     2,633       00.8      03.3
                     (305)     (305)     (300)
Whites               1,481     1,538     2,436       -39.2     -36.8
                     (81)      (78)      (78)
Employed             2,011     1,702     2,766       -27.3     -38.5
                     (189)     (188)     (141)
Unemployed           2,775     3,140     2,629       05.5      19.4
                     (209)     (207)     (251)
High School          2,278     2,958     3,235       -29.6     -08.6
                     (158)     (142)     (102)
Less than H.S.       2,327     2,000     2,466       -05.6     -18.9
                     (202)     (220)     (296)
Married              1,700     1,509     2,564       -33.7     -41.1
                     (147)     (106)     (117)
Not Married          2,813     2,808     2,734       02.9      02.7
                     (256)     (292)     (278)
Yrs. Cohabit > 2     2,438     2,795     2,895       -15.8     -03.5
                     (201)     (161)     (152)
Yrs. Cohabit < 2     2,456     2,185     2,444       00.5      -10.6
                     (114)     (119)     (135)

N for each prevalence rate is shown in parentheses.
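The "% Difference" columns in Table 6 are relative differences that take the warning group's rate as the base of one hundred percent. The sketch below recomputes them for a few rows as an illustration. The function name and the selection of rows are assumptions for the example, and small discrepancies from the printed figures (a few hundredths of a point here, about two tenths in the Prior row, which is not included) presumably reflect rounding in the published rates.

```python
# Table 6 rates per 10,000 suspects: (Full Arrest, Short Arrest, Warning).
table6 = {
    "Blacks": (2656, 2721, 2633),
    "Whites": (1481, 1538, 2436),
    "Employed": (2011, 1702, 2766),
    "Unemployed": (2775, 3140, 2629),
    "Married": (1700, 1509, 2564),
    "Not Married": (2813, 2808, 2734),
}


def percent_difference(rate, base):
    """Relative difference of a rate from the warning-group base, in percent."""
    return 100 * (rate - base) / base


for characteristic, (full, short, warning) in table6.items():
    full_vs_warning = percent_difference(full, warning)    # column "1&3"
    short_vs_warning = percent_difference(short, warning)  # column "2&3"
    print(f"{characteristic}: {full_vs_warning:+.1f} {short_vs_warning:+.1f}")
```

A negative value means a lower prevalence of repeat hotline reports under arrest than under warning; a positive value means a higher one.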

Looking solely at rates of prevalence of repeat violence, there is apparently good reason to adopt a policy of mandatory arrest. Arrest has strong deterrent effects for some groups, with up to one-third fewer suspects repeating their violence in the next six months. Its failure to deter others does not, at least, cause any harm. This apparent conclusion, however, demonstrates the importance of looking beyond prevalence to a robust examination of the frequency of repeat violence. In the Milwaukee experiment, where frequency was well measured, prevalence alone as an outcome measure would be a very misleading basis for policy implications.

ANY-VICTIM


2. Frequency of Repeat Violence

More important from a policy standpoint are the frequency rates, which show the effects of a mandatory arrest policy on the total incidence of violence in the community. The group (not individual) frequency results per days at risk in the Milwaukee experiment are shown in Table 7. This table shows that over an unrestricted follow-up period, arrest not only deters some groups; it also escalates other groups into far higher frequency of domestic violence. The magnitude of the percentage differences (again using the warning group as the base of one hundred percent) in effects across subgroups is quite large by the normal standards of social research and statistics. The table consistently shows arrest to make those with less stake in conformity more violent, and those with more stake in conformity less violent.

The difference in reaction to full arrest between blacks and whites is startling. The fact that 10,000 arrested whites produce 2,504 (= 5,212 - 2,708) fewer acts of domestic violence a year than warned whites, while 10,000 arrested blacks produce 1,803 (= 7,296 - 5,493) more acts of violence per year than warned blacks, is a far larger magnitude than we ever expected. If three times as many blacks as whites are arrested in a city like Milwaukee, which is a fair approximation, then an across-the-board policy of mandatory arrests prevents 2,504 acts of violence against primarily white women at the price of 5,409 acts of violence against primarily black women. While one explanation is that this effect is mostly due to racial differences in unemployment rates, the differential impact by race is just as morally troublesome whatever the underlying cause.
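The arithmetic behind this comparison can be laid out step by step. The sketch below uses only the Table 7 annual rates per 10,000 suspects and the text's approximation that three times as many blacks as whites are arrested; the variable names are illustrative assumptions.

```python
# Table 7 annual hotline reports per 10,000 suspects.
annual_rate = {
    "Whites": {"full_arrest": 2708, "warning": 5212},
    "Blacks": {"full_arrest": 7296, "warning": 5493},
}

# Approximation from the text: three times as many blacks as whites arrested.
BLACK_TO_WHITE_ARREST_RATIO = 3


def change_from_arrest(group):
    """Acts per 10,000 suspects under full arrest minus acts under warning."""
    rates = annual_rate[group]
    return rates["full_arrest"] - rates["warning"]


prevented_white = -change_from_arrest("Whites")   # acts prevented per 10,000 whites
added_black_per_10k = change_from_arrest("Blacks")  # acts added per 10,000 blacks
added_black_total = BLACK_TO_WHITE_ARREST_RATIO * added_black_per_10k

print(f"prevented per 10,000 arrested whites: {prevented_white}")
print(f"added per 10,000 arrested blacks: {added_black_per_10k}")
print(f"added for 30,000 arrested blacks: {added_black_total}")
```

Scaling the per-10,000 increase for black suspects by the arrest ratio is what yields the text's figure of 5,409 acts of violence.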

There is even less reduced-violence benefit due to full arrest by employed suspects at the price of increased violence by unemployed suspects. With 958 (= 5,991 - 5,033) fewer acts of violence committed against victims of 10,000 employed suspects who had been arrested than of those who had been warned, the price equals 2,274 (= 7,504 - 5,230) more acts of violence per 10,000 unemployed suspects who had been arrested than if they had only been warned. Some might reason that since most people are employed, this policy seems to be reasonable as a utilitarian tradeoff. But wherever the majority of the domestic violence incidents police respond to involve unemployed suspects, as they do in Milwaukee, mandatory arrest fails to produce the greatest good for the greatest number. The fact that this is not evident in the main effects reflects the relatively even splits of most of the three treatment groups on most of the characteristics presented in the table. Figure 2 displays the differences in survival curves over time among the four groups divided by arrest and employment status.

TABLE 7
AFTER-ONLY MEAN FREQUENCY OF HOTLINE REPORTS PER ANNUM PER 10,000 SUSPECTS
(unrestricted follow-up period)

Individual           Full      Short                 % Difference
Characteristic       Arrest    Arrest    Warning     1 & 3     2 & 3
Prior                10,771    11,318    8,403       28.2      34.0
                     (137)     (119)     (130)
No Prior             4,204     4,899     4,179       00.5      17.2
                     (267)     (279)     (268)
Blacks               7,296     7,410     5,493       32.8      34.9
                     (305)     (305)     (300)
Whites               2,708     4,942     5,212       -48.0     -05.2
                     (81)      (78)      (78)
Employed             5,033     4,842     5,991       -16.0     -19.2
                     (189)     (188)     (141)
Unemployed           7,504     8,428     5,230       43.5      61.1
                     (209)     (207)     (251)
High School          5,869     7,355     6,367       -07.8     15.5
                     (158)     (142)     (102)
Less than H.S.       6,360     6,211     5,106       24.6      21.6
                     (202)     (220)     (296)
Married              4,720     4,774     5,386       -12.3     -11.3
                     (147)     (106)     (117)
Not Married          7,222     7,441     5,545       30.2      34.2
                     (256)     (292)     (278)
Yrs. Cohabit > 2     7,048     4,774     5,386       30.9      -11.4
                     (201)     (161)     (152)
Yrs. Cohabit < 2     6,195     5,666     5,661       09.4      00.0
                     (114)     (119)     (135)

N for each mean is shown in parentheses.
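The employment tradeoff described here can be sketched numerically from the Table 7 rates. The per-10,000 differences below restate the article's own subtraction. The weighted net change and the break-even share of unemployed suspects are an added illustration, not a calculation from the article, and they assume that the group rates apply uniformly to every suspect in a group. All names are hypothetical.

```python
# Table 7 annual hotline reports per 10,000 suspects:
# (Full Arrest, Short Arrest, Warning).
table7 = {
    "Employed": (5033, 4842, 5991),
    "Unemployed": (7504, 8428, 5230),
}


def change_per_10k(group):
    """Acts per 10,000 suspects under full arrest minus acts under warning."""
    full, _, warning = table7[group]
    return full - warning


def net_change(share_unemployed):
    """Net change per 10,000 suspects for a given unemployed share (0 to 1)."""
    return (share_unemployed * change_per_10k("Unemployed")
            + (1 - share_unemployed) * change_per_10k("Employed"))


# Unemployed share at which the added and prevented acts exactly offset.
break_even_share = -change_per_10k("Employed") / (
    change_per_10k("Unemployed") - change_per_10k("Employed")
)

print(f"employed: {change_per_10k('Employed')}, unemployed: {change_per_10k('Unemployed')}")
print(f"break-even unemployed share: {break_even_share:.1%}")
print(f"net change at an even split: {net_change(0.5):+.0f} acts per 10,000")
```

Under these assumptions, full arrest produces a net increase in violent acts whenever unemployed suspects make up more than roughly three in ten of those arrested, which is consistent with the text's point about cities where most incidents involve unemployed suspects.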

FIGURE 2

Survival Functions by Arrest and Employment Status

[Figure not reproduced. Curves: Warning/No Job, Arrest/No Job, Warning/Employed, Arrest/Employed. Horizontal axis: Number of Months, 0 to 18.]

It is particularly interesting that the worst escalation effect in Table 7 is found among unemployed suspects who received the short arrest treatment. While the unemployed have the largest percentage increase in violent acts per 10,000 of any group (61.1%), the employed have one of the largest deterrent effects from short arrest (-19.2%), even slightly larger than the effect of full arrest (-16.0%). This may suggest that the employed react to getting a break, or to getting out early enough to go to work, by avoiding a second chance to lose their job.
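
The percent differences quoted from Table 7 appear to be the arrest-group rate relative to the warning-group rate. The following sketch recomputes them for a few rows; the dictionary layout and function name are illustrative choices, and the formula is inferred from the table values rather than stated by the authors.

```python
# Hotline reports per annum per 10,000 suspects, copied from Table 7.
TABLE_7 = {
    "Employed":   {"full": 5033, "short": 4842, "warning": 5991},
    "Unemployed": {"full": 7504, "short": 8428, "warning": 5230},
    "Blacks":     {"full": 7296, "short": 7410, "warning": 5493},
    "Whites":     {"full": 2708, "short": 4942, "warning": 5212},
}


def pct_difference(arrest_rate, warning_rate):
    """Percent change of an arrest group's rate relative to the warning group."""
    return 100.0 * (arrest_rate - warning_rate) / warning_rate


for group, rates in TABLE_7.items():
    full = pct_difference(rates["full"], rates["warning"])
    short = pct_difference(rates["short"], rates["warning"])
    print(f"{group:<11} 1 & 3: {full:6.1f}   2 & 3: {short:6.1f}")
```

For the rows shown, the recomputed values round to the percentages printed in the table, including the 61.1% escalation and the -19.2% and -16.0% deterrent effects discussed above.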

An important point about the employment data in an industrial town like Milwaukee is that the suspects' jobs were primarily at lower levels of occupational prestige. The first ten most often listed occupations in the sample, for example, were general assistance (a part-time workfare program), factory maintenance, security guard, retail stock handler, grocery store meat wrapper, grocery cashier, car wash attendant, and valet shop clerk. A review of all of the suspects' listed occupations shows a total of six mid-level prestige jobs: one teacher, one child's counselor, one editor, one retail sales manager, one insurance salesman, and one bank executive. The difference between the working class and the underclass is often forgotten by the middle class. The middle class is concerned with having any great job to lose or career to ruin, as opposed to the underclass, which is concerned with working at all.

A high school education predicts a fairly weak deterrent effect of arrest, but lack of a high school education predicts a fairly strong criminogenic effect of arrest. Marriage is more powerful than education, with a marriage license enhancing the deterrent effect and its absence aggravating the adverse reaction to arrest. Contrary to our expectations, length of cohabitation goes the other way, although it is not inversely correlated with marriage. Arrest appears to make suspects more violent if they have lived with the victim for over two years than if they have not.

3. Are the Interactions Significant?

The next question is whether these differences are due to chance. Table 8 presents the results of a Poisson regression model for both main effects and two-way interactions. Only prior domestic hotline reports show a main effect that is not due to chance, and they strongly predict the number of after-treatment incidents. But the interaction effects are generally significant and are all in the theoretically predicted direction. Our interpretation of the model is that being black rather than white increases the recidivism frequency rate for the arrest group by sixty-two percent, while having a job reduces it by fifty-eight percent and being a high school graduate reduces it by forty-three percent. The interaction effects with marriage and length of cohabitation are weaker and not significant.
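
The percentages in the paragraph above correspond to the Table 8 interaction coefficients read directly as proportional changes (.624, -.585, -.430). Under the log link used in Poisson regression, a coefficient b multiplies the expected count by exp(b), so the exact multiplicative effect is somewhat different. The sketch below shows that conversion as a standard property of the model; it is not the authors' own computation, and the names are illustrative.

```python
import math

# Two-way interaction coefficients copied from Table 8.
INTERACTIONS = {
    "Arrest & Black": 0.624443,
    "Arrest & Employed": -0.584834,
    "Arrest & High School": -0.430113,
}


def rate_ratio(coef):
    """Multiplicative effect on the expected count under a log link."""
    return math.exp(coef)


def pct_change_in_rate(coef):
    """Exact percent change in the expected count implied by a coefficient."""
    return 100.0 * (rate_ratio(coef) - 1.0)


for name, coef in INTERACTIONS.items():
    print(f"{name:<22} coef {coef:+.3f}  "
          f"rate ratio {rate_ratio(coef):.3f}  "
          f"exact change {pct_change_in_rate(coef):+.1f}%")
```

Reading a coefficient as a percent change is a close approximation only when the coefficient is near zero; for coefficients of this size the two readings diverge noticeably, although the direction of each effect is unchanged.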

The key indicators of marginal social status, then, fairly consistently show that arrest increases the frequency of violence among marginal suspects. The deterrent effects of arrest on persons with higher stakes in conformity are not consistently strong, possibly because there are so few middle class persons in the sample. The deterrent effects could, for example, become stronger as the value of the suspect's stake in conformity increased. The most startling interaction effect is the strongly opposite directions of the effects of arrest for whites and blacks.

4. Why Do Prevalence and Frequency Results Conflict?

Accounting for the differences between Tables 6 and 7 is an important policy question. They differ both in follow-up period covered and in treatment effects. In order to ensure that the difference in treatment effects was not due simply to the difference between the six-month restriction in Table 6 and the individually varying, longer-term follow-up (up to twenty-two months) in Table 7, we recalculated Table 7 with a six-month (183-day) restriction.60 The results reveal little difference. We can therefore disregard an assertion that prevalence and frequency results differ due to differences in follow-up periods.

TABLE 8
POISSON REGRESSION COEFFICIENTS: MAIN EFFECTS AND TWO-WAY INTERACTIONS (CONTROLLING PRIOR OFFENDING AND DAYS AT RISK) FOR ANY VICTIM AFTER-ONLY FREQUENCY OF HOTLINE REPORTS

Variable                 Coefficient   Std. Error   T-ratio   Prob |t| > x

Main Effects
Intercept                  -8.34405     1.02519     -8.139     .00000
Arrest                     -.113133     .347519      -.326     .74477
Black                       .066061     .189910       .348     .72795
Employed                    .248127     .156097      1.590     .11193
High School                 .226241     .152576      1.483     .13812
Married                    -.185799     .171293     -1.085     .27806
Cohabit. > 2                .114660     .154494       .742     .45799
Prior                       .842037     .102621      8.205     .00000*
LogADAYS                    1.21703     .165801      7.340     .00000

Two-Way Interactions
Arrest & Black              .624443     .297136      2.102     .03559*
Arrest & Employed          -.584834     .214761     -2.723     .00647*
Arrest & High School       -.430113     .209724     -2.051     .04028*
Arrest & Married           -.070437     .230941      -.305     .76037
Arrest & Cohabit.           .177514     .217008       .818     .41336

* P < .05
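
The columns of Table 8 are internally related: each T-ratio is the coefficient divided by its standard error, and the reported probabilities are closely reproduced by a two-sided test against the standard normal distribution. The sketch below checks this for the three significant interactions. The use of the normal distribution is an inference from the printed values, not something the article states, and the helper names are illustrative.

```python
import math

# (coefficient, standard error) pairs copied from Table 8.
ROWS = {
    "Arrest & Black": (0.624443, 0.297136),
    "Arrest & Employed": (-0.584834, 0.214761),
    "Arrest & High School": (-0.430113, 0.209724),
}


def t_ratio(coef, std_error):
    """Test statistic: coefficient measured in standard-error units."""
    return coef / std_error


def two_sided_p_normal(t):
    """Two-sided tail probability under a standard normal distribution."""
    return math.erfc(abs(t) / math.sqrt(2.0))


for name, (coef, se) in ROWS.items():
    t = t_ratio(coef, se)
    print(f"{name:<22} t = {t:+.3f}   p = {two_sided_p_normal(t):.5f}")
```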

Given a clear difference between frequency and prevalence rates, the next question is how the higher frequency rates are distributed across individuals. The two extreme alternatives would be:

1) similar frequency rates for most offenders, with a small number of extremely arrest-reactive offenders driving the overall group frequency rate upward, and

2) generally higher frequency rates among all subgroupings of the arrest-reactive offenders, such as greater percentages of suspects with two events regardless of subgroup membership.

These alternatives have major policy significance, since a mandatory arrest policy with a few exceptions might become an alternative to a more generally discretionary policy. Table 9 examines the frequency distribution of repeat events by treatment group and the three key individual predictors of treatment effects: employment, education, and race. It shows that between two-thirds and three-quarters of all recidivist events in the first six months are concentrated among offenders who had only one repeat event. Only whites randomized to full arrest pose an exception to that pattern, with ninety-two percent of the recidivism committed by offenders with only one repeat event. Similarly, the concentrations of offenders with three or more subsequent events in six months vary little, with the exception of whites.

60 The results are available on request; ask the first author for Table 8-12.

These data falsify the hypothesis that the difference in frequency rates by subgroup characteristics is due to a small number of highly arrest "allergic" offenders. Rather, the data are consistent with the conclusion that arrest produces generally higher frequency rates among more socially marginal persons.

C. REPLICATIONS

The generalizability of these findings is greatly enhanced by their having been replicated in experiments in two other cities: Omaha, Nebraska and Colorado Springs, Colorado.61 Statistically significant interaction effects might be obtained within any one experiment just by chance, if enough tests are performed. It is highly unlikely, however, that similar interaction effects in three independent experiments are due merely to chance.
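
The chance argument above can be illustrated numerically. The sketch below assumes independent tests at a 5 percent significance level, counts five interaction tests (the number of two-way interactions in Table 8), and treats the three experiments as independent. These are simplifying assumptions made here for illustration, and the function names are hypothetical.

```python
def familywise_error(alpha, n_tests):
    """Chance of at least one false positive among n independent tests."""
    return 1.0 - (1.0 - alpha) ** n_tests


def chance_of_repeated_false_positive(alpha, n_experiments):
    """Chance that one specific null effect tests significant in every experiment."""
    return alpha ** n_experiments


ALPHA = 0.05

print(f"at least one false positive in 5 tests: {familywise_error(ALPHA, 5):.3f}")
print(f"same false positive in 3 experiments:   "
      f"{chance_of_repeated_false_positive(ALPHA, 3):.6f}")
```

Under these assumptions a spurious interaction somewhere in a single experiment is fairly likely, while the same spurious interaction recurring across three independent experiments is very unlikely, which is the logic of the replication argument.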

The most complete replication is found in Omaha. Our own reanalysis of the Omaha data allowed us to examine both the prevalence and frequency data among different subgroups within that sample. Both results are consistent with the Milwaukee results. Consistent with the less severe concentration of urban problems in Omaha, the benefits of mandatory arrest there appear to outweigh the risks. The frequency data, reported elsewhere,62 show a stronger deterrent effect among the employed, and a weaker escalation effect among the unemployed, than in Milwaukee. Nonetheless, the directions of the interactions are consistent with the Milwaukee results.

The same is generally true for the Omaha prevalence data reported here in Table 10. With the exception of marriage, the differences in prevalence of officially measured repeat violence (any rearrest or new complaint, combined) go in the same directions as in Milwaukee. Three out of four indicators of marginality are associated with less deterrence and generally with some escalation.

TABLE 10
OMAHA, NEBRASKA
PREVALENCE OF REPEAT OFFICIAL VIOLENCE BY RANDOMIZED POLICE TREATMENT AND SOCIAL STATUS
(401 day maximum follow-up period)

Social Status            Arrested   Not Arrested

Employed                   19%          28%
Unemployed                 57%          53%

Married                    29%          18%
Unmarried                  35%          48%

High School Graduate       24%          34%
High School Dropout        48%          32%

Whites                     17%          27%
Blacks                     55%          47%

61 The interactions for Colorado Springs are reported in this issue. Richard A. Berk et al., A Bayesian Analysis of the Colorado Springs Spouse Abuse Experiment, 83 J. CRIM. L. & CRIMINOLOGY 170 (1992). The interactions for Omaha are reported in SHERMAN, POLICING DOMESTIC VIOLENCE, supra note 11, and in Sherman & Smith, Crime, Punishment and Stake in Conformity, supra note 11.

62 Sherman & Smith, Crime, Punishment and Stake in Conformity, supra note 11.
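
The "three out of four" statement can be checked directly against the Table 10 percentages. The sketch below computes the arrested minus not-arrested difference for each group; which groups count as "marginal" (unemployed, unmarried, dropout, black) follows the article's usage, while the data layout and names are illustrative.

```python
# Prevalence of repeat official violence (%), copied from Table 10:
# (arrested, not arrested)
TABLE_10 = {
    "Employed": (19, 28),
    "Unemployed": (57, 53),
    "Married": (29, 18),
    "Unmarried": (35, 48),
    "High School Graduate": (24, 34),
    "High School Dropout": (48, 32),
    "Whites": (17, 27),
    "Blacks": (55, 47),
}

MARGINAL_GROUPS = ("Unemployed", "Unmarried", "High School Dropout", "Blacks")


def arrest_effect(group):
    """Percentage-point difference: positive means more repeat violence after arrest."""
    arrested, not_arrested = TABLE_10[group]
    return arrested - not_arrested


escalating = [g for g in MARGINAL_GROUPS if arrest_effect(g) > 0]

for group in TABLE_10:
    print(f"{group:<22} {arrest_effect(group):+d} points")
print(f"marginal groups with escalation: {len(escalating)} of {len(MARGINAL_GROUPS)}")
```

Marriage is the exception: in Omaha the unmarried show less repeat violence after arrest and the married show more, the reverse of the Milwaukee pattern.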

The Bayesian analysis of the Colorado Springs experiment is confined to prevalence only. Absent a mandatory arrest policy and a strong custom of reporting all domestic battery misdemeanors, the Colorado Springs experiment does not offer a very robust test of differences in frequency. The prevalence results, however, are consistent with the Milwaukee results, showing clear deterrence of persons with higher stakes in conformity and much weaker evidence of escalation effects of arrest for less marginal people. As the Milwaukee results suggest (but with only Omaha as a replication), the analysis of prevalence as the only outcome may obscure important consequences of mandatory arrest policies on the total amount of domestic violence in a community. Frequency rates more clearly show the escalation effects of arrest.

VI. CONCLUSIONS AND IMPLICATIONS

The Milwaukee domestic violence experiment finds no evidence of an overall long-term deterrent effect of arrest. The initial deterrent effects observed for up to thirty days quickly disappear. By one year later, short arrest alone, and short and full arrest combined, produce an escalation effect. The first reported act of repeat violence following combined arrest treatments occurs an average of twenty percent sooner than it does following the warning treatment.

The Milwaukee experiment does find strong evidence that arrest has different effects on different kinds of people. Employed, married, high school graduate and white suspects are all less likely to have any incident of repeat violence reported to the domestic violence hotline if they are arrested than if they are not. Unemployed, unmarried, high school dropouts and black suspects, on average, are reported much more frequently to the domestic violence hotline if they are arrested than if they are not. The magnitudes of the increased domestic violence associated with arrest of the latter groups are substantial, ranging up to sixty percent. The Milwaukee findings are replicated clearly in Omaha, as well as by a more limited data set in Colorado Springs.

These results strongly suggest that arrest has variable effects on criminal careers, depending upon the social marginality of the offenders. At least for the offense of misdemeanor domestic battery (or harassment, in the case of Colorado Springs), arrest appears to deter less marginal persons and to escalate the frequency of violence among more marginal persons. Whether this pattern applies to other types of offenses is still unknown, but it is certainly plausible. As one leading labeling theorist observed two decades ago, arrest probably serves to keep the large majority of people in line, even while it causes a small group of social outcasts to become more criminal.63

The accumulating empirical support for the proposition of variable effects of arrest on criminal careers raises major questions for criminology, jurisprudence, and public policy. The question for criminology is how theory can clearly account for these differential reactions to a relatively minor application of the criminal sanction. Competing theoretical perspectives, such as shaming, control, and power, might all account for these facts. Future experiments can now be more closely focused on comparative tests of competing theoretical perspectives.

The question for jurisprudence is whether the ministerial approach to police discretion is proper, especially with a mandatory arrest statute. Previous jurisprudence has rejected the proposition that punishment should be made more severe than just deserts allow in order to increase a deterrent effect. But it has never before considered the proposition that punishment should be made less severe in order to reduce an escalation effect. Indeed, jurisprudence seems to have hardly considered the problem of escalation effects at all. Andrew von Hirsch, for example, suggests that

    the disposition of convicted offenders should be commensurate with the seriousness of their offenses, even if greater or lesser severity would promote other goals. For the principle, we have argued, is a requirement of justice, whereas deterrence, incapacitation and rehabilitation are essentially strategies for controlling crime. The priority of the principle follows from the assumptions we stated at the outset: the requirement of justice ought to constrain the pursuit of crime prevention.64

Yet, he concedes elsewhere that all punishment depends upon the assumption of deterrence for its moral justification.65 How punishment can be justified when it escalates violence is not at all clear. Yet how the punishment of some, and the failure to punish others, could be justified is equally unclear. The conflict between justice and crime control seems never to have been framed so baldly.

63 LOFLAND, supra note 30.

The short-term implications of this dilemma for public policy are daunting. At the least, it suggests a need for other approaches to the control of domestic violence among marginal persons, such as greater investment in battered women's shelters. At best, it suggests a serious and thoughtful debate about the effects of domestic violence on our society, as well as the current inequities in police discretion that have been tolerated for years. The price of reduced violence may be changing the nature of the inequities and making the fact of inequity explicit. Whether we are willing to pay that price is a matter for every citizen to consider.

64 VON HIRSCH, supra note 10, at 74-75.
65 Id. at 55.

Journal of Criminal Law and Criminology
Spring 1992
The Variable Effects of Arrest on Criminal Careers: The Milwaukee Domestic Violence Experiment
Lawrence W. Sherman
Janell D. Schmidt
Dennis P. Rogan
Douglas A. Smith
Recommended Citation
Variable Effects of Arrest on Criminal Careers: The Milwaukee Domestic Violence Experiment, The


The Variable Effects of Arrest on Criminal Careers: The
Milwaukee Domestic Violence Experiment (MilDVE)
Sherman, L. W. et al. (1992)
1




Quick Review of the 1984 Minneapolis Domestic Violence
Experiment (MDVE)
2


MDVE: Minneapolis Domestic Violence Experiment
The MDVE (Sherman and Berk 1984) field experiment
(conducted during 1981-1982) found arrest to be an effective
deterrent against repeat domestic violence.

Main Independent Variables:
Group assignment – assigned to advise, 8 hour separation,
or arrest

Police Action (delivered treatment) – assigned to advise, 8
hour separation,
or arrest

Main Dependent Variable:
Repeat Violence Over Six Months

3


Sherman & Berk (1984) Caveat: Treatment Dilution and Treatment Migration
According to Angrist (2006), treatment dilution occurs when subjects assigned to the treatment group do not receive the treatment

Treatment migration, according to Angrist (2006), occurs when control group members receive the treatment

Both phenomena pose a threat to the internal validity of an experimental design

E.g., in the MDVE, non-random crossovers (suspects who were supposed to be advised but ended up being arrested) may cause the arrested group to lose comparability with the advised group – treatment migration

4






Perfect Compliance Cases: 91 + 84 + 83 = 258

Treatment (Arrest) Dilution Case: 1
Simple Treatment Migration Cases: 19 + 26 = 45
5


Non-compliance with Random Assignment Attenuated the Treatment Effect of Arrest
Due to non-compliance with random assignment, significant selection bias was introduced

However, when the selection bias is controlled for, the deterrent effect of arrest is even stronger than previously believed (Angrist, 2006, p. 39).

6



Source: Berk and Sherman, 1984, p. 6
7




Milwaukee Domestic Violence Experiment (MilDVE)
8


MilDVE (1987 to 1989): A Modified Replication of MDVE
The MilDVE field experiment represents a modified replication
of the Minneapolis Domestic Violence Experiment (MDVE)

The two key purposes of the MilDVE study were
(1) to examine the possible differences in reactions to arrest,
and
(2) to compare the effects of short and long
incarceration associated
with arrest.

9


Sherman et al. (1992): Variable Effects of Arrest on Criminal
Careers (MilDVE)
Sherman and Berk (1984) were concerned that the MDVE sample did not allow for the evaluation of the possibility “that for some kinds of people, arrest may only make matters worse” – e.g., encourage domestic violence recidivism (Sherman et al. 1992, p.139)

The current article addresses this concern; based on the data, Sherman et al. (1992) observed that

“…[t]he evidence shows that, while arrest deters repeat
domestic violence in the short run, arrests with brief custody
increase the frequency of domestic violence in the long run
among offenders in general. The evidence also shows that,
among cases predominantly reported from Milwaukee’s black
urban poverty ghetto, different kinds of offenders react
differently to arrest: some become much more frequently
violent, while others become somewhat less frequently violent.”
(p. 139)

10


MilDVE Funding
United States Department of Justice. Office of Justice
Programs. National Institute of Justice (86-IJ-CX-K043)

11




Sampling
12


Sample Design
The sample was a non-probability purposive sample of calls
regarding misdemeanor domestic assault cases

Unit of observation is the individual

Calls received by the Milwaukee Police regarding misdemeanor

domestic assault were screened by police officers to establish
eligibility for the experiment. Eligible calls were referred to the
Crime Control Institute staff, who randomly assigned one of
three treatments. Selection of cases continued until 1,200
eligible cases were obtained out of 2,054 cases (ICPSR)

13


Select Sample Characteristics (n = 1,200)
About 91% of the suspects were male (Sherman et al. 1992, p. 145)

Blacks comprised 76% of suspects (Sherman et al. 1992, p. 145)

Around 55% of the suspects were unemployed (Sherman et al.
1992, p. 145)

31% were high school graduates (Sherman et al. 1992, p. 145)

42% were intoxicated at the time police arrived (Sherman et al.
1992, p. 145)

55% had a prior arrest record and 26% with a prior arrest for a
battery against the same victim as in the presenting case

(Sherman et al. 1992, p. 145)

The majority of the suspects (64%) were never married to the domestic violence survivor (Sherman et al. 1992, p. 145)





14


Modal Reason for Ineligibility (n = 854): Absence of the
Offender from the Scene (56% of the ineligibles)
15







Experiment Protocol
16

Random Assignment: Each Offender had Approximately a 1/3 Chance for Any One of 3 Treatments
Research protocol involved 35 patrol officers in four Milwaukee
police districts screening domestic violence cases for eligibility,
then calling police headquarters to request a randomly-assigned
disposition (ICPSR, 1994).

The three possible randomly assigned dispositions were (ICPSR,
1994):
(1) Code 1, which consisted of arrest and at least one night in
jail, unless the suspect posted bond,
(2) Code 2, which consisted of arrest and immediate release on
recognizance from the booking
area at police headquarters, or as soon as possible,
and
(3) Code 3, which consisted of a standard Miranda-style script
warning read by police to both suspect and victim.

Each domestic violence case had approximately equal likelihood
of being assigned to any one of the three codes

Whether the groups were balanced on background variables was
not discussed in this paper
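
As a rough illustration of equal-probability assignment to the three codes, the sketch below draws one disposition per case with a seeded random number generator. It is a hypothetical stand-in: the actual randomization procedure used at police headquarters is not described here, and the function name, labels, and seed are assumptions.

```python
import random
from collections import Counter

# The three dispositions described on this slide.
CODES = (
    "Code 1: arrest, at least one night in jail",
    "Code 2: arrest, immediate release",
    "Code 3: warning",
)


def assign_dispositions(n_cases, seed=0):
    """Draw one of the three codes for each case, each with probability 1/3."""
    rng = random.Random(seed)  # seeded so the draw can be reproduced
    return [rng.choice(CODES) for _ in range(n_cases)]


counts = Counter(assign_dispositions(1200))
for code in CODES:
    print(f"{code:<45} {counts[code]}")
```

With 1,200 cases, simple randomization of this kind yields group sizes close to, but not exactly, 400 each.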
17

MilDVE Exhibits High Degree of Experimental Protocol
Compliance





18


Perfect Compliance Cases: 400 + 384 + 396 = 1,180 (98.3%)
Sherman et al. (1992), p. 148
7.3% (or 88 of 1,200) of the randomized cases were repeat couples




Data Collection
19


Data Collection: Reporting and Interviews

A battered women's shelter hotline system provided the primary
measurement of the frequency of violence by the same suspects
both before and after each case leading to a randomized police
action (ICPSR, 1994).

Initial victim interviews were attempted within one month after
the first 900 incidents were compiled.

A second victim interview was attempted six months after the
incident for all 1,200 cases (Sherman et al. 1992, p.149).

The data collected consists of personal interviews and police
records (Sherman et al. 1992, p.149).


20





Variables
21

Outcome Measures: Recidivism
The main outcome measure involved recidivism, namely, the
prevalence and frequency of repeat violence by sample suspects
(Sherman et al. 1992)

First, recidivism was measured by calls recorded by a
Milwaukee battered women's shelter hotline system

Second, according to Sherman et al. (1992) “arrests of the
suspects for repeat violence (against any victim, including the
same one as in the presenting incident)”, p. 149

Third, “offense reports of repeat violence by the same suspect
against the same victim” (Sherman et al., 1992, p. 149)

The above three outcome variables are from official sources and
no cases are missing data

The fourth data source on recidivism was based on interviews: “up to two face-to-face interviews conducted with the victim in each randomized case” (Sherman et al., 1992, p. 149)

22


Independent Variables
Treatment Group: Full Arrest, Short Arrest and Warning

Time-at-risk refers to time during which recidivism (the
outcome) could have occurred – measured as length of
cohabitation after initial police encounter (cf, Sherman et al.
1992, p.150-152)

Initial deterrence: initial disinclination of the offender to
reoffend after experimental police treatment (time and
frequency)

Long-term escalation: prevalence of same-victim violence or the
frequency of any-victim violence
23


Other Covariates
24
Main Effects
Arrest
Black
Employed
High School
Married
Cohabit. > 2 (over two years of cohabiting)
Prior
LogADAYS

Two-Way Interactions
Arrest & Black
Arrest & Employed
Arrest & High School
Arrest & Married
Arrest & Cohabit




Analysis
25


Generic Research Question and Hypotheses We Need to Keep in
Mind
Generic Research Question:
What is the ith main effect’s impact on the outcome measure
between the three treatment groups?

Generic Null Hypothesis:
The ith main effect has the same impact on the outcome
measure for each of the three treatment groups.

Generic Research Hypothesis:
The ith main effect has a differential impact on the outcome
measure for at least one of the three treatment groups.

26





Did Time-at-Risk Vary by Treatment Group?
More Specifically: “What are the Effects of Treatment on the
Amount of Time Each Couple Spent together During the
Follow-up Period?”

Using n = 882 follow-up interviews, it is observed that “there
were no greater differences in time-at-risk than” would be
expected by chance variation: i.e., all p-values > 0.05

So the groups did not statistically differ in time-at-risk

27

Aside: Survival function
28


Survival Function Basics
In our context, the survival function gives the probability that an offender will survive (not recidivate) beyond any specified time

The graph of a survival function consists of (Wikipedia):
x-axis is time.
y-axis indicates the proportion of offenders surviving.
The survival function graph shows the probability that an offender will survive (not reoffend) beyond time t

The survival function graph is provided by the equation below:

$$\hat{S}(t) = \prod_{i:\, t_i \le t} \frac{n_i - d_i}{n_i}$$

where: $n_i$ is the number of offenders at the beginning of the period
$d_i$ is the number of offenders reoffending at time $t_i$
$t_i$ is the ith time period
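
The quantities listed above (offenders at the beginning of each period and offenders reoffending in it) are the ingredients of the product-limit (Kaplan-Meier) estimator, which multiplies the per-period survival fractions (n_i - d_i) / n_i. The sketch below assumes that estimator and uses made-up counts; the function name and the numbers are hypothetical and are not data from the experiment.

```python
def survival_curve(at_risk, reoffended):
    """Product-limit estimate of the proportion surviving after each period.

    at_risk[i]    -- offenders who have not yet reoffended at the start of period i
    reoffended[i] -- offenders who reoffend during period i
    """
    surviving = 1.0
    curve = []
    for n_i, d_i in zip(at_risk, reoffended):
        surviving *= (n_i - d_i) / n_i  # fraction that gets through period i
        curve.append(surviving)
    return curve


# Hypothetical counts for three successive periods.
at_risk = [200, 180, 171]
reoffended = [20, 9, 0]

for period, s in enumerate(survival_curve(at_risk, reoffended), start=1):
    print(f"period {period}: {s:.3f} surviving")
```

Plotting these values against time gives the step-shaped survival curves shown in the article's figures, with a lower curve meaning earlier or more widespread reoffending.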

29


30


3
~86% of offenders who were initially arrested do not reoffend
for more than 3 months
~81% of offenders who were initially warned do not reoffend
for more than 3 months



The prevalence of repeat violence over time is displayed for the warning treatment and the combined arrest (short/full) treatment for the n = 1,133 persons with employment data available

According to Sherman et al. (1992), “[f]igure 1 shows the
"survival" trend in the prevalence of repeat violence over time,
with an obviously clear advantage for the arrested suspects in
the early days. At about seven to nine months after the
presenting incidents, however, the arrest and non-arrest curves
cross over, and from there on out the arrest group does worse”

(p. 153-154)

31
What are the Deterrent Effects of the Respective Treatments?




32





33
Sherman et al. (1992) Key Findings So Far (p. 156)


34


Any-Victim Prevalence of Repeat Violence per 10,000
Full Arrested employed suspects

per 10,000 suspects












35

10,000 arrested employed suspects produced 958 (= 5,991 –
5,033) fewer acts of domestic violence a year than warned
employed suspects

36


Findings So Far
“A high school education predicts a fairly weak deterrent effect
of arrest, but lack of a high school education predicts a fairly
strong criminogenic effect of arrest. Marriage is more powerful
than education, with a marriage license enhancing the deterrent
effect and its absence aggravating the adverse reaction to arrest.
Contrary to our expectations, length of cohabitation goes the
other way, although it is not inversely correlated with marriage.
Arrest appears to make suspects more violent if they have lived
with the victim for over two years than if they have not.”
(Sherman et al. 1992, p.163)
37





Aside: Poisson Regression

38


Poisson Regression
Poisson regression is a form of regression in which the response variable takes non-negative integer values (e.g., count data) and is modeled as having a Poisson distribution.

The probability mass function of the Poisson distribution with mean µ is

$$P(Y = y) = \frac{e^{-\mu}\,\mu^{y}}{y!}, \quad \text{for } y = 0, 1, 2, \ldots$$

The explanatory variables model the mean of the response variable, µ.

Since the mean must be positive but the linear combination can take on any value, we need to use a link function for the parameter µ. The standard link function is the natural logarithm,

$$\log(\mu) = \beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k$$

so that

$$\mu = \exp(\beta_0 + \beta_1 x_1 + \cdots + \beta_k x_k)$$
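
To make the two pieces of Poisson regression concrete, the sketch below implements the Poisson probability mass function and the log-link mean, exp of a linear combination of explanatory variables. It is a minimal illustration using only the standard library; the function names and the example coefficients are hypothetical and unrelated to the fitted model in the article.

```python
import math


def poisson_pmf(y, mu):
    """P(Y = y) for a Poisson distribution with mean mu, y = 0, 1, 2, ..."""
    return math.exp(-mu) * mu ** y / math.factorial(y)


def poisson_mean(coefs, xs):
    """Mean implied by the log link: exp of the linear combination.

    coefs and xs are equal-length sequences; include a leading 1.0 in xs
    if the first coefficient is an intercept.
    """
    linear = sum(b * x for b, x in zip(coefs, xs))
    return math.exp(linear)


# Hypothetical model: intercept log(2) plus one binary predictor with coefficient 0.5.
coefs = [math.log(2.0), 0.5]
mu_without = poisson_mean(coefs, [1.0, 0.0])
mu_with = poisson_mean(coefs, [1.0, 1.0])

print(f"mean count without predictor: {mu_without:.3f}")
print(f"mean count with predictor:    {mu_with:.3f}")
print(f"P(Y = 0) without predictor:   {poisson_pmf(0, mu_without):.3f}")
```

Because of the exponential, the mean is always positive, and switching the predictor on multiplies the mean by exp(0.5) regardless of the intercept.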
39



40
Outcome or Dependent Variable: Number of Subsequent Violent
Incidents




Poisson Regression
Alternatively, having a prior DV hotline report rather than not
having such a report increases recidivism frequency by 84%
controlling for other variables in the model

“Our interpretation of the model is that being black rather than
white increases the recidivism frequency rate for the arrest
group by sixty-two percent, while having a job reduces it by
fifty-eight percent and being a high school graduate reduces it
by forty-three percent.” (Sherman et al. 1992, p. 163)

41


42

“between two-thirds and three-quarters of all recidivist events
in the first six months are concentrated among offenders who
had only one repeat event” (Sherman et al. 1992, p. 165)






43


Omaha is the Most Complete Replication of MilDVE
“…the Milwaukee results suggest-but with only Omaha as a replication-the analysis of prevalence as the only outcome may obscure important consequences of mandatory arrest policies on the total amount of domestic violence in a community. Frequency rates more clearly show the escalation effects of arrest.” (Sherman et al. 1992, p. 167)

However, the frequency and prevalence results among different
subgroups in the sample are consistent with the Milwaukee
results

“With the exception of marriage, the differences in prevalence
of officially measured repeat violence (any rearrest or new
complaint, combined) go in the same directions as in
Milwaukee. Three out of four indicators of marginality are
associated with less deterrence and generally with some
escalation.” (Sherman et al. 1992, p. 167)
44





Conclusion
45


Findings
“The Milwaukee domestic violence experiment finds no

evidence of an overall long-term deterrent effect of arrest. The
initial deterrent effects observed for up to thirty days quickly
disappear. By one year later, short arrest alone, and short and
full arrest combined, produce an escalation effect. The first
reported act of repeat violence following combined arrest
treatments occurs an average of twenty percent sooner than it
does following the warning treatment.” (Sherman et al. 1992, p.
167)

“The Milwaukee experiment does find strong evidence that
arrest has different effects on different kinds of people.
Employed, married, high school graduate and white suspects are
all less likely to have any incident of repeat violence reported
to the domestic violence hotline if they are arrested than if they
are not. Unemployed, unmarried, high school dropouts and
black suspects, on average, are reported much more frequently
to the domestic violence hotline if they are arrested than if they
are not. The magnitudes of the increased domestic violence
associated with arrest of the latter groups are substantial,
ranging up to sixty percent. The Milwaukee findings are
replicated clearly in Omaha, as well as by a more limited data
set in Colorado Springs.” (Sherman et al. 1992, p. 167-168)

Sherman et al. (1992) suggest a need for other approaches to the
control of domestic violence among marginalized groups (e.g.,

the unemployed), such as greater investment in battered
women’s shelters (p. 169).


46


References
Angrist (2006). Instrumental Variable Methods in Experimental
Criminological Research: What, Why, and How. Journal of
Experimental Criminology: 23, p. 23-44.

Berk, R. & Sherman, L. W. (1984). The Minneapolis Domestic Violence Experiment. The Police Foundation Reports. April, p. 1-13.

Sherman, L. W. et al. (1992). The Variable Effects of Arrest on Criminal Careers: The Milwaukee Domestic Violence Experiment. Journal of Criminal Law and Criminology, Vol. 83, Issue 1, p. 137-169.

Sherman, L. W., Schmidt, J. D., Rogan, D. P. (2000). User Guide to the Machine-Readable Files and Documentation With Codebooks (ICPSR 9966). Inter-university Consortium for Political and Social Research.


47