
AN INTRODUCTION TO PROBABILITY
AND STATISTICS

WILEY SERIES IN PROBABILITY AND STATISTICS
Established by WALTER A. SHEWHART and SAMUEL S. WILKS
Editors: David J. Balding, Noel A. C. Cressie, Garrett M. Fitzmaurice,
Geof H. Givens, Harvey Goldstein, Geert Molenberghs, David W. Scott,
Adrian F. M. Smith, Ruey S. Tsay, Sanford Weisberg
Editors Emeriti: J. Stuart Hunter, Iain M. Johnstone, Joseph B. Kadane,
Jozef L. Teugels
A complete list of the titles in this series appears at the end of this volume.

AN INTRODUCTION TO PROBABILITY AND STATISTICS
Third Edition
VIJAY K. ROHATGI
A. K. Md. EHSANES SALEH

Copyright © 2015 by John Wiley & Sons, Inc. All rights reserved
Published by John Wiley & Sons, Inc., Hoboken, New Jersey
Published simultaneously in Canada
No part of this publication may be reproduced, stored in a retrieval system, or transmitted in any form or by
any means, electronic, mechanical, photocopying, recording, scanning, or otherwise, except as permitted under
Section 107 or 108 of the 1976 United States Copyright Act, without either the prior written permission of the
Publisher, or authorization through payment of the appropriate per-copy fee to the Copyright Clearance Center,
Inc., 222 Rosewood Drive, Danvers, MA 01923, (978) 750-8400, fax (978) 750-4470, or on the web at
www.copyright.com. Requests to the Publisher for permission should be addressed to the Permissions
Department, John Wiley & Sons, Inc., 111 River Street, Hoboken, NJ 07030, (201) 748-6011, fax (201)
748-6008, or online at http://www.wiley.com/go/permissions.
Limit of Liability/Disclaimer of Warranty: While the publisher and author have used their best efforts in
preparing this book, they make no representations or warranties with respect to the accuracy or completeness
of the contents of this book and specifically disclaim any implied warranties of merchantability or fitness for a
particular purpose. No warranty may be created or extended by sales representatives or written sales materials.
The advice and strategies contained herein may not be suitable for your situation. You should consult with a
professional where appropriate. Neither the publisher nor author shall be liable for any loss of profit or any
other commercial damages, including but not limited to special, incidental, consequential, or other damages.
For general information on our other products and services or for technical support, please contact our
Customer Care Department within the United States at (800) 762-2974, outside the United States at (317)
572-3993 or fax (317) 572-4002.
Wiley also publishes its books in a variety of electronic formats. Some content that appears in print may not be
available in electronic formats. For more information about Wiley products, visit our web site at
www.wiley.com.
Library of Congress Cataloging-in-Publication Data:
Rohatgi, V. K., 1939-
An introduction to probability theory and mathematical statistics / Vijay K. Rohatgi and A. K. Md. Ehsanes
Saleh. – 3rd edition.
pages cm
Includes index.
ISBN 978-1-118-79964-2 (cloth)
1. Probabilities. 2. Mathematical statistics. I. Saleh, A. K. Md. Ehsanes. II. Title.
QA273.R56 2015
519.5–dc23
2015004848
Set in 10/12pts Times Lt Std by SPi Global, Pondicherry, India
Printed in the United States of America
10 9 8 7 6 5 4 3 2 1
3 2015

To Bina and Shahidara.

CONTENTS
PREFACE TO THE THIRD EDITION xiii
PREFACE TO THE SECOND EDITION xv
PREFACE TO THE FIRST EDITION xvii
ACKNOWLEDGMENTS xix
ENUMERATION OF THEOREMS AND REFERENCES xxi
1 Probability 1
1.1 Introduction, 1
1.2 Sample Space, 2
1.3 Probability Axioms, 7
1.4 Combinatorics: Probability on Finite Sample Spaces, 20
1.5 Conditional Probability and Bayes Theorem, 26
1.6 Independence of Events, 31
2 Random Variables and Their Probability Distributions 39
2.1 Introduction, 39
2.2 Random Variables, 39
2.3 Probability Distribution of a Random Variable, 42
2.4 Discrete and Continuous Random Variables, 47
2.5 Functions of a Random Variable, 55

3 Moments and Generating Functions 67
3.1 Introduction, 67
3.2 Moments of a Distribution Function, 67
3.3 Generating Functions, 83
3.4 Some Moment Inequalities, 93
4 Multiple Random Variables 99
4.1 Introduction, 99
4.2 Multiple Random Variables, 99
4.3 Independent Random Variables, 114
4.4 Functions of Several Random Variables, 123
4.5 Covariance, Correlation and Moments, 143
4.6 Conditional Expectation, 157
4.7 Order Statistics and Their Distributions, 164
5 Some Special Distributions 173
5.1 Introduction, 173
5.2 Some Discrete Distributions, 173
5.2.1 Degenerate Distribution, 173
5.2.2 Two-Point Distribution, 174
5.2.3 Uniform Distribution on n Points, 175
5.2.4 Binomial Distribution, 176
5.2.5 Negative Binomial Distribution (Pascal or Waiting Time
Distribution), 178
5.2.6 Hypergeometric Distribution, 183
5.2.7 Negative Hypergeometric Distribution, 185
5.2.8 Poisson Distribution, 186
5.2.9 Multinomial Distribution, 189
5.2.10 Multivariate Hypergeometric Distribution, 192
5.2.11 Multivariate Negative Binomial Distribution, 192
5.3 Some Continuous Distributions, 196
5.3.1 Uniform Distribution (Rectangular Distribution), 199
5.3.2 Gamma Distribution, 202
5.3.3 Beta Distribution, 210
5.3.4 Cauchy Distribution, 213
5.3.5 Normal Distribution (the Gaussian Law), 216
5.3.6 Some Other Continuous Distributions, 222
5.4 Bivariate and Multivariate Normal Distributions, 228
5.5 Exponential Family of Distributions, 240
6 Sample Statistics and Their Distributions 245
6.1 Introduction, 245
6.2 Random Sampling, 246
6.3 Sample Characteristics and Their Distributions, 249

6.4 Chi-Square, t-, and F-Distributions: Exact Sampling Distributions, 262
6.5 Distribution of (X̄, S^2) in Sampling from a Normal Population, 271
6.6 Sampling from a Bivariate Normal Distribution, 276
7 Basic Asymptotics: Large Sample Theory 285
7.1 Introduction, 285
7.2 Modes of Convergence, 285
7.3 Weak Law of Large Numbers, 302
7.4 Strong Law of Large Numbers, 308
7.5 Limiting Moment Generating Functions, 316
7.6 Central Limit Theorem, 321
7.7 Large Sample Theory, 331
8 Parametric Point Estimation 337
8.1 Introduction, 337
8.2 Problem of Point Estimation, 338
8.3 Sufficiency, Completeness and Ancillarity, 342
8.4 Unbiased Estimation, 359
8.5 Unbiased Estimation (Continued): A Lower Bound for the Variance of an Estimator, 372
8.6 Substitution Principle (Method of Moments), 386
8.7 Maximum Likelihood Estimators, 388
8.8 Bayes and Minimax Estimation, 401
8.9 Principle of Equivariance, 418
9 Neyman–Pearson Theory of Testing of Hypotheses 429
9.1 Introduction, 429
9.2 Some Fundamental Notions of Hypotheses Testing, 429
9.3 Neyman–Pearson Lemma, 438
9.4 Families with Monotone Likelihood Ratio, 446
9.5 Unbiased and Invariant Tests, 453
9.6 Locally Most Powerful Tests, 459
10 Some Further Results on Hypotheses Testing 463
10.1 Introduction, 463
10.2 Generalized Likelihood Ratio Tests, 463
10.3 Chi-Square Tests, 472
10.4 t-Tests, 484
10.5 F-Tests, 489
10.6 Bayes and Minimax Procedures, 491

11 Confidence Estimation 499
11.1 Introduction, 499
11.2 Some Fundamental Notions of Confidence Estimation, 499
11.3 Methods of Finding Confidence Intervals, 504
11.4 Shortest-Length Confidence Intervals, 517
11.5 Unbiased and Equivariant Confidence Intervals, 523
11.6 Resampling: Bootstrap Method, 530
12 General Linear Hypothesis 535
12.1 Introduction, 535
12.2 General Linear Hypothesis, 535
12.3 Regression Analysis, 543
12.3.1 Multiple Linear Regression, 543
12.3.2 Logistic and Poisson Regression, 551
12.4 One-Way Analysis of Variance, 554
12.5 Two-Way Analysis of Variance with One Observation Per Cell, 560
12.6 Two-Way Analysis of Variance with Interaction, 566
13 Nonparametric Statistical Inference 575
13.1 Introduction, 575
13.2 U-Statistics, 576
13.3 Some Single-Sample Problems, 584
13.3.1 Goodness-of-Fit Problem, 584
13.3.2 Problem of Location, 590
13.4 Some Two-Sample Problems, 599
13.4.1 Median Test, 601
13.4.2 Kolmogorov–Smirnov Test, 602
13.4.3 The Mann–Whitney–Wilcoxon Test, 604
13.5 Tests of Independence, 608
13.5.1 Chi-square Test of Independence—Contingency Tables, 608
13.5.2 Kendall’s Tau, 611
13.5.3 Spearman’s Rank Correlation Coefficient, 614
13.6 Some Applications of Order Statistics, 619
13.7 Robustness, 625
13.7.1 Effect of Deviations from Model Assumptions on Some
Parametric Procedures, 625
13.7.2 Some Robust Procedures, 631
FREQUENTLY USED SYMBOLS AND ABBREVIATIONS 637
REFERENCES 641

STATISTICAL TABLES 647
ANSWERS TO SELECTED PROBLEMS 667
AUTHOR INDEX 677
SUBJECT INDEX 679

PREFACE TO THE THIRD EDITION
The Third Edition contains some new material. More specifically, the chapter on large sam-
ple theory has been reorganized, repositioned, and re-titled in recognition of the growing
role of asymptotic statistics. In Chapter 12 on General Linear Hypothesis, the section on
regression analysis has been greatly expanded to include multiple regression and logistic
and Poisson regression.
Some more problems and remarks have been added to illustrate the material covered.
The basic character of the book, however, remains the same as enunciated in the Preface to
the first edition. It remains a solid introduction to first-year graduate students or advanced
seniors in mathematics and statistics as well as a reference to students and researchers in
other sciences.
We are grateful to the readers for their comments on this book over the past 40 years
and would welcome any questions, comments, and suggestions. You can communi-
cate with Vijay K. Rohatgi at [email protected] and with A. K. Md. Ehsanes Saleh at
[email protected].
Vijay K. Rohatgi, Solana Beach, CA
A. K. Md. Ehsanes Saleh, Ottawa, Canada

PREFACE TO THE SECOND EDITION
There is a lot that is different about this second edition. First, there is a co-author without
whose help this revision would not have been possible. Second, we have benefited from
countless letters from readers and colleagues who have pointed out errors and omissions
and have made valuable suggestions over the past 25 years. These communications make
this revision worth the effort. Third, we have tried to update the content of the book while
striving to preserve the character and spirit of the first edition.
Here are some of the numerous changes that have been made.
1. The Introduction section has been removed. We have also removed Chapter 14 on
sequential statistical inference.
2. Many parts of the book have undergone substantial rewriting. For example, Chapter 4 has
many changes, such as inclusion of exchangeability. In Chapter 3, an introduction to
characteristic functions has been added. In Chapter 5 some new distributions have
been added while in Chapter 6 there have been many changes in proofs.
3. The statistical inference part of the book (Chapters 8 to 13) has been updated.
Thus in Chapter 8 we have expanded the coverage of invariance and have included
discussions of ancillary statistics and conjugate prior distributions.
4. Similar changes have been made in Chapter 9. A new section on locally most
powerful tests has been added.
5. Chapter 11 has been greatly revised and a discussion of invariant confidence
intervals has been added.
6. Chapter 13 has been completely rewritten in the light of increased emphasis on
nonparametric inference. We have expanded the discussion of U-statistics. Later
sections show the connection between commonly used tests and U-statistics.
7. In Chapter 12, the notation has been changed to conform to the current convention.
8. Many problems and examples have been added.
9. More figures have been added to illustrate examples and proofs.
10. Answers to selected problems have been provided.
We are truly grateful to the readers of the first edition for countless comments and
suggestions and hope we will continue to hear from them about this edition.
Special thanks are due Ms. Gillian Murray for her superb word processing of the
manuscript, and Dr. Indar Bhatia for figures that appear in the text. Dr. Bhatia spent count-
less hours preparing the diagrams for publication. We also acknowledge the assistance of
Dr. K. Selvavel.
Vijay K. Rohatgi
A. K. Md. Ehsanes Saleh

PREFACE TO THE FIRST EDITION
This book on probability theory and mathematical statistics is designed for a three-quarter
course meeting 4 hours per week or a two-semester course meeting 3 hours per week. It is
designed primarily for advanced seniors and beginning graduate students in mathematics,
but it can also be used by students in physics and engineering with strong mathematical
backgrounds. Let me emphasize that this is a mathematics text and not a “cookbook.” It
should not be used as a text for service courses.
The mathematics prerequisites for this book are modest. It is assumed that the reader has
had basic courses in set theory and linear algebra and a solid course in advanced calculus.
No prior knowledge of probability and/or statistics is assumed.
My aim is to provide a solid and well-balanced introduction to probability theory and
mathematical statistics. It is assumed that students who wish to do graduate work in prob-
ability theory and mathematical statistics will be taking, concurrently with this course, a
measure-theoretic course in analysis if they have not already had one. These students can
go on to take advanced-level courses in probability theory or mathematical statistics after
completing this course.
This book consists of essentially three parts, although no such formal divisions are des-
ignated in the text. The first part consists of Chapters 1 through 6, which form the core of
the probability portion of the course. The second part, Chapters 7 through 11, covers the
foundations of statistical inference. The third part consists of the remaining three chapters
on special topics. For course sequences that separate probability and mathematical statis-
tics, the first part of the book can be used for a course in probability theory, followed by
a course in mathematical statistics based on the second part and, possibly, one or more
chapters on special topics.
The reader will find here a wealth of material. Although the topics covered are fairly
conventional, the discussions and special topics included are not. Many presentations give
far more depth than is usually the case in a book at this level. Some special features of the
book are the following:
1. A well-referenced chapter on the preliminaries.
2. About 550 problems, over 350 worked-out examples, about 200 remarks, and about
150 references.
3. An advance warning to the reader wherever the details become too involved. They can
skip the later portion of the section in question on first reading without destroying
the continuity in any way.
4. Many results on characterizations of distributions (Chapter 5).
5. Proof of the central limit theorem by the method of operators and proof of the
strong law of large numbers (Chapter 6).
6. A section on minimal sufficient statistics (Chapter 8).
7. A chapter on special tests (Chapter 10).
8. A careful presentation of the theory of confidence intervals, including Bayesian
intervals and shortest-length confidence intervals (Chapter 11).
9. A chapter on the general linear hypothesis, which carries linear models through to
their use in basic analysis of variance (Chapter 12).
10. Sections on nonparametric estimation and robustness (Chapter 13).
11. Two sections on sequential estimation (Chapter 14).
The contents of this book were used in a 1-year (two-semester) course that I taught three
times at the Catholic University of America and once in a three-quarter course at Bowling
Green State University. In the fall of 1973 my colleague, Professor Eugene Lukacs, taught
the first quarter of this same course on the basis of my notes, which eventually became
this book. I have always been able to cover this book (with few omissions) in a 1-year
course, lecturing 3 hours a week. An hour-long problem session every week is conducted
by a senior graduate student.
In a book of this size there are bound to be some misprints, errors, and ambiguities of
presentation. I shall be grateful to any reader who brings these to my attention.
V. K. Rohatgi, Bowling Green, Ohio
February 1975

ACKNOWLEDGMENTS
We take this opportunity to thank many correspondents whose comments and criticisms
led to improvements in the Third Edition. The list below is far from complete since it
does not include the names of countless students whose reactions to the book as a text
helped the authors in this revised edition. We apologize to those whose names may have
been inadvertently omitted from the list because we were not diligent enough to keep
a complete record of all the correspondence. For the third edition we wish to thank
Professors Yue-Cune Chang, Anirban Das Gupta, A. G. Pathak, Arno Weiershauser, and
many other readers who sent their questions and comments. We also wish to acknowl-
edge the assistance of Dr. Pooplasingam Sivakumar in preparation of the manuscript.
For the second edition: Barry Arnold, Lennart Bondesson, Harry Cohn, Frank Connonito,
Emad El-Neweihi, Ulrich Faigle, Pier Alda Ferrari, Martin Feuerrnan, Xavier Fernando,
Z. Govindarajulu, Arjun Gupta, Hassein Hamedani, Thomas Hem, Jin-Sheng Huang, Bill
Hudson, Barthel Huff, V. S. Huzurbazar, B. K. Kale, Sam Kotz, Bansi Lal, Sri Gopal
Mohanty, M. V. Moorthy, True Nguyen, Tom O’Connor, A. G. Pathak, Edsel Pena,
S. Perng, Madan Puri, Prem Puri, J. S. Rao, Bill Raser, Andrew Rukhin, K. Selvavel,
Rajinder Singh, R. J. Tomkins; for the first edition, Ralph Baty, Ralph Bradley, Eugene
Lukacs, Kae Lea Main, Tom and Carol O’Connor, M. S. Scott Jr., J. Sethuraman, Beatrice
Shube, Jeff Spielman, and Robert Tortora.
We thank the publishers of the American Mathematical Monthly, the SIAM Review,
and the American Statistician for permission to include many examples and problems that
appeared in these journals. Thanks are also due to the following for permission to include
tables: Professors E. S. Pearson and L. R. Verdooren (Table ST11), Harvard University
Press (Table ST1), Hafner Press (Table ST3), Iowa State University Press (Table ST5),
Rand Corporation (Table ST6), the American Statistical Association (Tables ST7 and
ST10), the Institute of Mathematical Statistics (Tables ST8 and ST9), Charles Griffin &
Co., Ltd. (Tables ST12 and ST13), and John Wiley & Sons (Tables ST1, ST2, ST4, ST10,
and ST11).

ENUMERATION OF THEOREMS
AND REFERENCES
This book is divided into 13 chapters, numbered 1 through 13. Each chapter is divided
into several sections. Lemmas, theorems, equations, definitions, remarks, figures, and so
on, are numbered consecutively within each section. Thus Theorem i.j.k refers to the kth
theorem in Section j of Chapter i, Section i.j refers to the jth section of Chapter i, and
so on. Theorem j refers to the jth theorem of the section in which it appears. A similar
convention is used for equations except that equation numbers are enclosed in parenthe-
ses. Each section is followed by a set of problems for which the same numbering system
is used.
References are given at the end of the book and are denoted in the text by numbers
enclosed in square brackets, [ ]. If a citation is to a book, the notation ([i, p. j]) refers to
the jth page of the reference numbered [i].
A word about the proofs of results stated without proof in this book. If a reference
appears immediately following or preceding the statement of a result, it generally means
that the proof is beyond the scope of this text. If no reference is given, it indicates that the
proof is left to the reader. Sometimes the reader is asked to supply the proof as a problem.

1
PROBABILITY
1.1 INTRODUCTION
The theory of probability had its origin in gambling and games of chance. It owes much
to the curiosity of gamblers who pestered their friends in the mathematical world with all
sorts of questions. Unfortunately this association with gambling contributed to a very slow
and sporadic growth of probability theory as a mathematical discipline. The mathemati-
cians of the day took little or no interest in the development of any theory but looked only
at the combinatorial reasoning involved in each problem.
The first attempt at some mathematical rigor is credited to Laplace. In his monumental work, Théorie analytique des probabilités (1812), Laplace gave the classical definition of the probability of an event that can occur only in a finite number of ways as the proportion of the number of favorable outcomes to the total number of all possible outcomes, provided that all the outcomes are equally likely. According to this definition, the computation of the probability of events was reduced to combinatorial counting problems. Even in those days, this definition was found inadequate. In addition to being circular and restrictive, it did not answer the question of what probability is; it only gave a practical method of computing the probabilities of some simple events.
An extension of the classical definition of Laplace was used to evaluate the probabilities of sets of events with infinite outcomes. The notion of equal likelihood of certain events played a key role in this development. According to this extension, if Ω is some region with a well-defined measure (length, area, volume, etc.), the probability that a point chosen at random lies in a subregion A of Ω is the ratio measure(A)/measure(Ω). Many problems of geometric probability were solved using this extension. The trouble is that one can
define “at random” in any way one pleases, and different definitions therefore lead to dif-
ferent answers. Joseph Bertrand, for example, in his book Calcul des probabilités (Paris,
1889) cited a number of problems in geometric probability where the result depended
on the method of solution. In Example 9 we will discuss the famous Bertrand paradox
and show that in reality there is nothing paradoxical about Bertrand’s paradoxes; once
we define “probability spaces” carefully, the paradox is resolved. Nevertheless difficul-
ties encountered in the field of geometric probability have been largely responsible for
the slow growth of probability theory and its tardy acceptance by mathematicians as a
mathematical discipline.
The mathematical theory of probability, as we know it today, is of comparatively recent
origin. It was A. N. Kolmogorov who axiomatized probability in his fundamental work,
Foundations of the Theory of Probability (Berlin), in 1933. According to this development,
random events are represented by sets and probability is just a normed measure defined on
these sets. This measure-theoretic development not only provided a logically consistent
foundation for probability theory but also, at the same time, joined it to the mainstream of
modern mathematics.
In this book we follow Kolmogorov’s axiomatic development. In Section 1.2 we intro-
duce the notion of a sample space. In Section 1.3 we state Kolmogorov’s axioms of
probability and study some simple consequences of these axioms. Section 1.4 is devoted to
the computation of probability on finite sample spaces. Section 1.5 deals with conditional
probability and Bayes’s rule while Section 1.6 examines the independence of events.
1.2 SAMPLE SPACE
In most branches of knowledge, experiments are a way of life. In probability and statis-
tics, too, we concern ourselves with special types of experiments. Consider the following
examples.
Example 1. A coin is tossed. Assuming that the coin does not land on its side, there are two possible outcomes of the experiment: heads and tails. On any performance of this experiment one does not know what the outcome will be. The coin can be tossed as many times as desired.
Example 2. A roulette wheel is a circular disk divided into 38 equal sectors numbered from 0 to 36 and 00. A ball is rolled on the edge of the wheel, and the wheel is rolled in the opposite direction. One bets on any of the 38 numbers or some combinations of them. One can also bet on a color, red or black. If the ball lands in the sector numbered 32, say, anybody who bet on 32 or combinations including 32 wins, and so on. In this experiment, all possible outcomes are known in advance, namely 00, 0, 1, 2, ..., 36, but on any performance of the experiment there is uncertainty as to what the outcome will be, provided, of course, that the wheel is not rigged in any manner. Clearly, the wheel can be rolled any number of times.
Example 3. A manufacturer produces footrules. The experiment consists in measuring the length of a footrule produced by the manufacturer as accurately as possible. Because of errors in the production process one does not know what the true length of the footrule
selected will be. It is clear, however, that the length will be, say, between 11 and 13 in.,
or, if one wants to be safe, between 6 and 18 in.
Example 4. The length of life of a light bulb produced by a certain manufacturer is recorded. In this case one does not know what the length of life will be for the light bulb selected, but clearly one is aware in advance that it will be some number between 0 and ∞ hours.
The experiments described above have certain common features. For each experiment,
we know in advance all possible outcomes, that is, there are no surprises in store after the
performance of any experiment. On any performance of the experiment, however, we do
not know what the specific outcome will be, that is, there is uncertainty about the outcome
on any performance of the experiment. Moreover, the experiment can be repeated under
identical conditions. These features describe a random (or a statistical) experiment.
Definition 1. A random (or a statistical) experiment is an experiment in which
(a) all outcomes of the experiment are known in advance,
(b) any performance of the experiment results in an outcome that is not known in
advance, and
(c) the experiment can be repeated under identical conditions.
In probability theory we study this uncertainty of a random experiment. It is convenient
to associate with each such experiment a set Ω, the set of all possible outcomes of the experiment. To engage in any meaningful discussion about the experiment, we associate with Ω a σ-field S of subsets of Ω. We recall that a σ-field is a nonempty class of subsets of Ω that is closed under the formation of countable unions and complements and contains the null set Φ.
Definition 2. The sample space of a statistical experiment is a pair (Ω, S), where
(a) Ω is the set of all possible outcomes of the experiment and
(b) S is a σ-field of subsets of Ω.
The elements of Ω are called sample points. Any set A ∈ S is known as an event. Clearly A is a collection of sample points. We say that an event A happens if the outcome of the experiment corresponds to a point in A. Each one-point set is known as a simple or an elementary event. If the set Ω contains only a finite number of points, we say that (Ω, S) is a finite sample space. If Ω contains at most a countable number of points, we call (Ω, S) a discrete sample space. If, however, Ω contains uncountably many points, we say that (Ω, S) is an uncountable sample space. In particular, if Ω = R^k or some rectangle in R^k, we call it a continuous sample space.
Remark 1. The choice of S is an important one, and some remarks are in order. If Ω contains at most a countable number of points, we can always take S to be the class of all subsets of Ω. This is certainly a σ-field. Each one-point set is a member of S and is the fundamental object of interest. Every subset of Ω is an event. If Ω has uncountably many points, the class of all subsets of Ω is still a σ-field, but it is much too large a class of sets to be of interest. It may not be possible to choose the class of all subsets of Ω as S. One of the most important examples of an uncountable sample space is the case in which Ω = R or Ω is an interval in R. In this case we would like all one-point subsets of Ω and all intervals (closed, open, or semiclosed) to be events. We use our knowledge of analysis to specify S. We will not go into details here except to recall that the class of all semiclosed intervals (a, b] generates a class B_1 which is a σ-field on R. This class contains all one-point sets and all intervals (finite or infinite). We take S = B_1. Since we will be dealing mostly with the one-dimensional case, we will write B instead of B_1. There are many subsets of R that are not in B_1, but we will not demonstrate this fact here. We refer the reader to Halmos [42], Royden [96], or Kolmogorov and Fomin [54] for further details.
Example 5. Let us toss a coin. The set Ω is the set of symbols H and T, where H denotes head and T represents tail. Also, S is the class of all subsets of Ω, namely, {{H}, {T}, {H, T}, Φ}. If the coin is tossed two times, then
Ω = {(H, H), (H, T), (T, H), (T, T)},
S = {∅, {(H, H)}, {(H, T)}, {(T, H)}, {(T, T)}, {(H, H), (H, T)}, {(H, H), (T, H)}, {(H, H), (T, T)}, {(H, T), (T, H)}, {(T, T), (T, H)}, {(T, T), (H, T)}, {(H, H), (H, T), (T, H)}, {(H, H), (H, T), (T, T)}, {(H, H), (T, H), (T, T)}, {(H, T), (T, H), (T, T)}, Ω},
where the first element of a pair denotes the outcome of the first toss and the second element, the outcome of the second toss. The event at least one head consists of sample points (H, H), (H, T), (T, H). The event at most one head is the collection of sample points (H, T), (T, H), (T, T).
Example 6. A die is rolled n times. The sample space is the pair (Ω, S), where Ω is the set of all n-tuples (x_1, x_2, ..., x_n), x_i ∈ {1, 2, 3, 4, 5, 6}, i = 1, 2, ..., n, and S is the class of all subsets of Ω. Ω contains 6^n elementary events. The event A that 1 shows at least once is the set
A = {(x_1, x_2, ..., x_n) : at least one of the x_i's is 1}
  = Ω − {(x_1, x_2, ..., x_n) : none of the x_i's is 1}
  = Ω − {(x_1, x_2, ..., x_n) : x_i ∈ {2, 3, 4, 5, 6}, i = 1, 2, ..., n}.
Example 7. A coin is tossed until the first head appears. Then
Ω = {H, (T, H), (T, T, H), (T, T, T, H), ...},
and S is the class of all subsets of Ω. An equivalent way of writing Ω would be to look at the number of tosses required for the first head. Clearly, this number can take values 1, 2, 3, ..., so that Ω is the set of all positive integers and S is the class of all subsets of the positive integers.
Example 8. Consider a pointer that is free to spin about the center of a circle. If the pointer is spun by an impulse, it will finally come to rest at some point. On the assumption that the mechanism is not rigged in any manner, each point on the circumference is a possible outcome of the experiment. The set Ω consists of all points 0 ≤ x < 2πr, where r is the radius of the circle. Every one-point set {x} is a simple event, namely, that the pointer will come to rest at x. The events of interest are those in which the pointer stops at a point belonging to a specified arc. Here S is taken to be the Borel σ-field of subsets of [0, 2πr).
Example 9. A rod of length l is thrown onto a flat table, which is ruled with parallel lines at distance 2l. The experiment consists in noting whether the rod intersects one of the ruled lines.
Let r denote the distance from the center of the rod to the nearest ruled line, and let θ be the angle that the axis of the rod makes with this line (Fig. 1). Every outcome of this experiment corresponds to a point (r, θ) in the plane. As Ω we take the set of all points (r, θ) in {(r, θ) : 0 ≤ r ≤ l, 0 ≤ θ < π}. For S we take the Borel σ-field, B_2, of subsets of Ω, that is, the smallest σ-field generated by rectangles of the form
{(x, y) : a < x ≤ b, c < y ≤ d, 0 ≤ a < b ≤ l, 0 ≤ c < d < π}.
Clearly the rod will intersect a ruled line if and only if the center of the rod lies in the area enclosed by the locus of the center of the rod (while one end touches the nearest line) and the nearest line (shaded area in Fig. 2).
Fig. 1 [figure: a rod of length l and the nearest of the parallel ruled lines, which are 2l apart; r is the distance from the rod's center to that line]
Fig. 2 [figure: the rectangle 0 ≤ θ < π, 0 ≤ r ≤ l, with the region below the curve r = (l/2) sin θ shaded]
Remark 2. From the discussion above it should be clear that in the discrete case there is really no problem. Every one-point set is also an event, and S is the class of all subsets of Ω. The problem, if there is any, arises only in regard to uncountable sample spaces. The reader has to remember only that in this case not all subsets of Ω are events. The case of most interest is the one in which Ω = R^k. In this case, roughly all sets that have a well-defined volume (or area or length) are events. Not every set has the property in question, but sets that lack it are not easy to find and one does not encounter them in practice.
PROBLEMS 1.2
1. A club has five members A, B, C, D, and E. It is required to select a chairman and a secretary. Assuming that one member cannot occupy both positions, write the sample space associated with these selections. What is the event that member A is an office holder?
2. In each of the following experiments, what is the sample space?
(a) In a survey of families with three children, the sexes of the children are recorded
in increasing order of age.
(b) The experiment consists of selecting four items from a manufacturer’s output
and observing whether or not each item is defective.
(c) A given book is opened to any page, and the number of misprints is counted.
(d) Two cards are drawn (i) with replacement and (ii) without replacement from an
ordinary deck of cards.
3. Let A, B, C be three arbitrary events on a sample space (Ω, S). What is the event that only A occurs? What is the event that at least two of A, B, C occur? What is the event that both A and C, but not B, occur? What is the event that at most one of A, B, C occurs?
1.3 PROBABILITY AXIOMS
Let (Ω, S) be the sample space associated with a statistical experiment. In this section we
define a probability set function and study some of its properties.
Definition 1. Let (Ω, S) be a sample space. A set function P defined on S is called a probability measure (or simply probability) if it satisfies the following conditions:
(i) P(A) ≥ 0 for all A ∈ S.
(ii) P(Ω) = 1.
(iii) Let {A_j}, A_j ∈ S, j = 1, 2, ..., be a disjoint sequence of sets, that is, A_j ∩ A_k = Φ for j ≠ k, where Φ is the null set. Then
P(⋃_{j=1}^∞ A_j) = ∑_{j=1}^∞ P(A_j),   (1)
where we have used the notation ∑_j A_j to denote the union of disjoint sets A_j.
We call P(A) the probability of event A. If there is no confusion, we will write PA instead of P(A). Property (iii) is called countable additivity. That PΦ = 0 and P is also finitely additive follows from it.
Remark 1. If Ω is discrete and contains at most n (< ∞) points, each single-point set {ω_j}, j = 1, 2, ..., n, is an elementary event, and it is sufficient to assign probability to each {ω_j}. Then, if A ∈ S, where S is the class of all subsets of Ω, PA = ∑_{ω∈A} P{ω}. One such assignment is the equally likely assignment or the assignment of uniform probabilities. According to this assignment, P{ω_j} = 1/n, j = 1, 2, ..., n. Thus PA = m/n if A contains m elementary events, 1 ≤ m ≤ n.
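The uniform assignment PA = m/n of Remark 1 is easy to compute directly. The following minimal Python sketch is not part of the text; the sample space (two rolls of a die) and the event are invented for illustration.

```python
# Equally likely assignment on a finite sample space:
# P{omega} = 1/n for each point, so PA = m/n with m = |A| and n = |Omega|.
omega = {(i, j) for i in range(1, 7) for j in range(1, 7)}   # two rolls of a die, n = 36
A = {(i, j) for (i, j) in omega if i + j == 7}               # the event "the sum is 7", m = 6

def P(event):
    return len(event) / len(omega)

print(P(A))   # 6/36 = 0.1666...
```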
Remark 2. If Ω is discrete and contains a countable number of points, one cannot make an equally likely assignment of probabilities. It suffices to make the assignment for each elementary event. If A ∈ S, where S is the class of all subsets of Ω, define PA = ∑_{ω∈A} P{ω}.
Remark 3. If Ω contains uncountably many points, each one-point set is an elementary event, and again one cannot make an equally likely assignment of probabilities. Indeed, one cannot assign positive probability to each elementary event without violating the axiom PΩ = 1. In this case one assigns probabilities to compound events consisting of intervals. For example, if Ω = [0, 1] and S is the Borel σ-field of subsets of Ω, the assignment P[I] = length of I, where I is a subinterval of Ω, defines a probability.

Definition 2. The triple (Ω, S, P) is called a probability space.
Definition 3. Let A ∈ S. We say that the odds for A are a to b if PA = a/(a + b), and then the odds against A are b to a.
In many games of chance, probability is often stated in terms of odds against an event.
Thus in horse racing a two dollar bet on a horse to win with odds of 2 to 1 (against) pays
approximately six dollars if the horse wins the race. In this case the probability of winning
is 1/3.
Example 1. Let us toss a coin. The sample space is (Ω, S), where Ω = {H, T}, and S is the σ-field of all subsets of Ω. Let us define P on S as follows:
P{H} = 1/2, P{T} = 1/2.
Then P clearly defines a probability. Similarly, P{H} = 2/3, P{T} = 1/3, and P{H} = 1, P{T} = 0 are probabilities defined on S. Indeed,
P{H} = p and P{T} = 1 − p (0 ≤ p ≤ 1)
defines a probability on (Ω, S).
Example 2. Let Ω = {1, 2, 3, ...} be the set of positive integers, and let S be the class of all subsets of Ω. Define P on S as follows:
P{i} = 1/2^i, i = 1, 2, ....
Then ∑_{i=1}^∞ P{i} = 1, and P defines a probability.
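A quick numerical check of Example 2 (a Python sketch, not part of the text): the partial sums of ∑ 1/2^i approach 1, and the probability of an event is obtained by summing P{i} over its points.

```python
# Partial sums of P{i} = 1/2**i approach 1, so P is a probability on the positive integers.
partial = sum(0.5**i for i in range(1, 60))
print(partial)                                   # ~ 1.0 (the tail beyond i = 59 is negligible)

# Probability of the event "i is even": the sum of (1/4)**k over k >= 1, i.e., 1/3.
p_even = sum(0.5**i for i in range(2, 60, 2))
print(p_even)                                    # ~ 0.3333
```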
Example 3. Let Ω = (0, ∞) and S = B, the Borel σ-field on Ω. Define P as follows: for each interval I ⊆ Ω,
PI = ∫_I e^{−x} dx.
Clearly PI ≥ 0, PΩ = 1, and P is countably additive by properties of integrals.
Theorem 1. P is monotone and subtractive; that is, if A, B ∈ S and A ⊆ B, then PA ≤ PB and P(B − A) = PB − PA, where B − A = B ∩ A^c, A^c being the complement of the event A.
Proof. If A ⊆ B, then
B = (A ∩ B) + (B − A) = A + (B − A),
and it follows that PB = PA + P(B − A).
Corollary. For all A ∈ S, 0 ≤ PA ≤ 1.

Remark 4. We wish to emphasize that, if PA = 0 for some A ∈ S, we call A an event with zero probability or a null event. However, it does not follow that A = Φ. Similarly, if PB = 1 for some B ∈ S, we call B a certain event, but it does not follow that B = Ω.
Theorem 2 (The Addition Rule). If A, B ∈ S, then
P(A ∪ B) = PA + PB − P(A ∩ B).   (2)
Proof. Clearly
A ∪ B = (A − B) + (B − A) + (A ∩ B)
and
A = (A ∩ B) + (A − B),  B = (A ∩ B) + (B − A).
The result follows by countable additivity of P.
Corollary 1. P is subadditive, that is, if A, B ∈ S, then
P(A ∪ B) ≤ PA + PB.   (3)
Corollary 1 can be extended to an arbitrary number of events A_j:
P(⋃_j A_j) ≤ ∑_j PA_j.   (4)
Corollary 2. If B = A^c, then A and B are disjoint and
PA = 1 − PA^c.   (5)
The following generalization of (2) is left as an exercise.
Theorem 3 (The Principle of Inclusion–Exclusion). Let A_1, A_2, ..., A_n ∈ S. Then
P(⋃_{k=1}^n A_k) = ∑_{k=1}^n PA_k − ∑_{k_1<k_2} P(A_{k_1} ∩ A_{k_2}) + ∑_{k_1<k_2<k_3} P(A_{k_1} ∩ A_{k_2} ∩ A_{k_3}) − ··· + (−1)^{n+1} P(⋂_{k=1}^n A_k).   (6)
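Formula (6) can be checked by brute force on a small finite space. The Python sketch below is not from the text; the 12-point sample space and the three events are arbitrary choices for illustration.

```python
from itertools import combinations

omega = set(range(12))                                    # 12 equally likely points
events = [set(range(0, 6)), set(range(4, 9)), {0, 3, 8, 10}]

def P(E):
    return len(E) / len(omega)

lhs = P(events[0] | events[1] | events[2])
rhs = sum((-1) ** (r + 1) *
          sum(P(set.intersection(*c)) for c in combinations(events, r))
          for r in range(1, len(events) + 1))
print(lhs, rhs)                                           # the two sides agree
```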

Example 4. A die is rolled twice. Let all the elementary events in Ω = {(i, j) : i, j = 1, 2, ..., 6} be assigned the same probability. Let A be the event that the first throw shows a number ≤ 2, and B, the event that the second throw shows at least 5. Then
A = {(i, j) : 1 ≤ i ≤ 2, j = 1, 2, ..., 6},
B = {(i, j) : 5 ≤ j ≤ 6, i = 1, 2, ..., 6},
A ∩ B = {(1, 5), (1, 6), (2, 5), (2, 6)};
P(A ∪ B) = PA + PB − P(A ∩ B) = 1/3 + 1/3 − 4/36 = 5/9.
Example 5. A coin is tossed three times. Let us assign equal probability to each of the 2^3 elementary events in Ω. Let A be the event that at least one head shows up in three throws. Then
P(A) = 1 − P(A^c) = 1 − P(no heads) = 1 − P(TTT) = 7/8.
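Examples 4 and 5 can be verified by enumerating the equally likely sample points. This is a quick Python sketch, not part of the text.

```python
from itertools import product

# Example 4: two rolls of a die; A = first roll <= 2, B = second roll >= 5.
omega = list(product(range(1, 7), repeat=2))
A = {(i, j) for (i, j) in omega if i <= 2}
B = {(i, j) for (i, j) in omega if j >= 5}
print(len(A | B) / len(omega))                       # 20/36 = 5/9

# Example 5: three tosses of a coin; A = at least one head.
tosses = list(product("HT", repeat=3))
print(sum("H" in w for w in tosses) / len(tosses))   # 7/8
```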
We next derive two useful inequalities.
Theorem 4 (Bonferroni's Inequality). Given n (> 1) events A_1, A_2, ..., A_n,
∑_{i=1}^n PA_i − ∑_{i<j} P(A_i ∩ A_j) ≤ P(⋃_{i=1}^n A_i) ≤ ∑_{i=1}^n PA_i.   (7)
Proof. In view of (4) it suffices to prove the left side of (7). The proof is by induction. The inequality on the left is true for n = 2 since
PA_1 + PA_2 − P(A_1 ∩ A_2) = P(A_1 ∪ A_2).
For n = 3,
P(⋃_{i=1}^3 A_i) = ∑_{i=1}^3 PA_i − ∑_{i<j} P(A_i ∩ A_j) + P(A_1 ∩ A_2 ∩ A_3),
and the result holds. Assuming that (7) holds for 3 < m ≤ n − 1, we show that it holds also for m + 1:
P(⋃_{i=1}^{m+1} A_i) = P((⋃_{i=1}^m A_i) ∪ A_{m+1})
= P(⋃_{i=1}^m A_i) + PA_{m+1} − P(A_{m+1} ∩ ⋃_{i=1}^m A_i)
≥ ∑_{i=1}^{m+1} PA_i − ∑_{i<j≤m} P(A_i ∩ A_j) − P(⋃_{i=1}^m (A_i ∩ A_{m+1}))
≥ ∑_{i=1}^{m+1} PA_i − ∑_{i<j≤m} P(A_i ∩ A_j) − ∑_{i=1}^m P(A_i ∩ A_{m+1})
= ∑_{i=1}^{m+1} PA_i − ∑_{i<j≤m+1} P(A_i ∩ A_j).
Theorem 5 (Boole's Inequality). For any two events A and B,
P(A ∩ B) ≥ 1 − PA^c − PB^c.   (8)
Corollary 1. Let {A_j}, j = 1, 2, ..., be a countable sequence of events; then
P(⋂_j A_j) ≥ 1 − ∑_j P(A_j^c).   (9)
Proof. Take
B = ⋂_{j=2}^∞ A_j and A = A_1
in (8).
Corollary 2 (The Implication Rule). If A, B, C ∈ S and A and B imply C, then
PC^c ≤ PA^c + PB^c.   (10)
Let {A_n} be a sequence of sets. The set of all points ω ∈ Ω that belong to A_n for infinitely many values of n is known as the limit superior of the sequence and is denoted by
limsup_{n→∞} A_n.
The set of all points that belong to A_n for all but a finite number of values of n is known as the limit inferior of the sequence {A_n} and is denoted by
liminf_{n→∞} A_n.
If
liminf_{n→∞} A_n = limsup_{n→∞} A_n,
we say that the limit exists and write lim_{n→∞} A_n for the common set and call it the limit set.

We have
liminf_{n→∞} A_n = ⋃_{n=1}^∞ ⋂_{k=n}^∞ A_k ⊆ ⋂_{n=1}^∞ ⋃_{k=n}^∞ A_k = limsup_{n→∞} A_n.
If the sequence {A_n} is such that A_n ⊆ A_{n+1} for n = 1, 2, ..., it is called nondecreasing; if A_n ⊇ A_{n+1}, n = 1, 2, ..., it is called nonincreasing. If the sequence A_n is nondecreasing, we write A_n ↑; if A_n is nonincreasing, we write A_n ↓. Clearly, if A_n ↑ or A_n ↓, the limit exists and we have
lim_n A_n = ⋃_{n=1}^∞ A_n if A_n ↑
and
lim_n A_n = ⋂_{n=1}^∞ A_n if A_n ↓.
Theorem 6. Let {A_n} be a nondecreasing sequence of events in S, that is, A_n ∈ S, n = 1, 2, ..., and A_n ⊇ A_{n−1}, n = 2, 3, .... Then
lim_{n→∞} PA_n = P(lim_{n→∞} A_n) = P(⋃_{n=1}^∞ A_n).   (11)
Proof. Let
A = ⋃_{j=1}^∞ A_j.
Then
A = A_n + ∑_{j=n}^∞ (A_{j+1} − A_j).
By countable additivity we have
PA = PA_n + ∑_{j=n}^∞ P(A_{j+1} − A_j),
and letting n → ∞, we see that
PA = lim_{n→∞} PA_n + lim_{n→∞} ∑_{j=n}^∞ P(A_{j+1} − A_j).
The second term on the right tends to 0 as n → ∞ since the sum
∑_{j=1}^∞ P(A_{j+1} − A_j) ≤ 1
and each summand is nonnegative. The result follows.
Corollary. Let {A_n} be a nonincreasing sequence of events in S. Then
lim_{n→∞} PA_n = P(lim_{n→∞} A_n) = P(⋂_{n=1}^∞ A_n).   (12)
Proof. Consider the nondecreasing sequence of events {A_n^c}. Then
lim_{n→∞} A_n^c = ⋃_{j=1}^∞ A_j^c = A^c.
It follows from Theorem 6 that
lim_{n→∞} PA_n^c = P(lim_{n→∞} A_n^c) = P(⋃_{j=1}^∞ A_j^c) = P(A^c).
In other words,
lim_{n→∞} (1 − PA_n) = 1 − PA,
as asserted.
Remark 5. Theorem 6 and its corollary will be used quite frequently in subsequent chapters. Property (11) is called the continuity of P from below, and (12) is known as the continuity of P from above. Thus Theorem 6 and its corollary assure us that the set function P is continuous from above and below.
We conclude this section with some remarks concerning the use of the word "random" in this book. In probability theory "random" has essentially three meanings. First, in sampling from a finite population a sample is said to be a random sample if at each draw all members available for selection have the same probability of being included. We will discuss sampling from a finite population in Section 1.4. Second, we speak of a random sample from a probability distribution. This notion is formalized in Section 6.2. The third meaning arises in the context of geometric probability, where statements such as "a point is randomly chosen from the interval (a, b)" and "a point is picked randomly from a unit square" are frequently encountered. Once we have studied random variables and their distributions, problems involving geometric probabilities may be formulated in terms of problems involving independent uniformly distributed random variables, and these statements can be given appropriate interpretations.
Roughly speaking, these statements involve a certain assignment of probability. The word "random" expresses our desire to assign equal probability to sets of equal lengths, areas, or volumes. Let Ω ⊆ R^n be a given set, and A be a subset of Ω. We are interested in the probability that a "randomly chosen point" in Ω falls in A. Here "randomly chosen"
means that the point may be any point of Ω and that the probability of its falling in some subset A of Ω is proportional to the measure of A (independently of the location and shape of A). Assuming that both A and Ω have well-defined finite measures (length, area, volume, etc.), we define
PA = measure(A)/measure(Ω).
(In the language of measure theory we are assuming that Ω is a measurable subset of R^n that has a finite, positive Lebesgue measure. If A is any measurable set, PA = μ(A)/μ(Ω), where μ is the n-dimensional Lebesgue measure.) Thus, if a point is chosen at random from the interval (a, b), the probability that it lies in the interval (c, d), a ≤ c < d ≤ b, is (d − c)/(b − a). Moreover, the probability that the randomly selected point lies in any interval of length (d − c) is the same.
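The uniform assignment PA = measure(A)/measure(Ω) on an interval can be illustrated by simulation. The following Monte Carlo sketch is not from the text; the endpoints are arbitrary.

```python
import random

random.seed(1)
a, b = 0.0, 10.0                  # Omega = (a, b)
c, d = 2.5, 5.0                   # A = (c, d), so PA should be (d - c)/(b - a) = 0.25
n = 200_000
hits = sum(1 for _ in range(n) if c < random.uniform(a, b) < d)
print(hits / n, (d - c) / (b - a))   # the estimate is close to 0.25
```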
We present some examples.
Example 6. A point is picked "at random" from a unit square. Let Ω = {(x, y) : 0 ≤ x ≤ 1, 0 ≤ y ≤ 1}. It is clear that all rectangles and their unions must be in S. So too should all circles in the unit square, since the area of a circle is also well defined. Indeed, every set that has a well-defined area has to be in S. We choose S = B_2, the Borel σ-field generated by rectangles in Ω. As for the probability assignment, if A ∈ S, we assign PA to A, where PA is the area of the set A. If A = {(x, y) : 0 ≤ x ≤ 1/2, 1/2 ≤ y ≤ 1}, then PA = 1/4. If B is a circle with center (1/2, 1/2) and radius 1/2, then PB = π(1/2)^2 = π/4. If C is the set of all points which are at most a unit distance from the origin, then PC = π/4 (see Figs. 1–3).
Fig. 1  A = {(x, y) : 0 ≤ x ≤ 1/2, 1/2 ≤ y ≤ 1}.
Fig. 2  B = {(x, y) : (x − 1/2)^2 + (y − 1/2)^2 ≤ 1/4}.
Fig. 3  C = {(x, y) : x^2 + y^2 ≤ 1}.
Example 7 (Buffon's Needle Problem). We return to Example 1.2.9. A needle (rod) of length l is tossed at random on a plane that is ruled with a series of parallel lines at distance 2l apart. We wish to find the probability that the needle will intersect one of the lines.
Denoting by r the distance from the center of the needle to the closest line and by θ the angle that the needle forms with this line, we see that a necessary and sufficient condition for the needle to intersect the line is that r ≤ (l/2) sin θ. The needle will intersect the nearest line if and only if its center falls in the shaded region in Fig. 1.2.2. We assign probability to an event A as follows:
PA = (area of the set A)/(lπ).
Thus the required probability is
(1/(lπ)) ∫_0^π (l/2) sin θ dθ = 1/π.
Here we have interpreted "at random" to mean that the position of the needle is characterized by a point (r, θ) which lies in the rectangle 0 ≤ r ≤ l, 0 ≤ θ ≤ π. We have assumed that the probability that the point (r, θ) lies in any arbitrary subset of this rectangle is proportional to the area of this set. Roughly, this means that "all positions of the midpoint of the needle are assigned the same weight and all directions of the needle are assigned the same weight."
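The 1/π answer in Example 7 can be checked by simulation: draw (r, θ) uniformly from the rectangle 0 ≤ r ≤ l, 0 ≤ θ ≤ π and apply the intersection condition r ≤ (l/2) sin θ. This Monte Carlo sketch is not part of the text.

```python
import math
import random

random.seed(0)
l = 1.0
n = 500_000
hits = 0
for _ in range(n):
    r = random.uniform(0.0, l)              # distance from the needle's center to the nearest line
    theta = random.uniform(0.0, math.pi)    # angle between the needle and the lines
    if r <= (l / 2.0) * math.sin(theta):    # intersection condition from the text
        hits += 1
print(hits / n, 1.0 / math.pi)              # both close to 0.318
```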
Example 8. An interval of length 1, say (0, 1), is divided into three intervals by choosing two points at random. What is the probability that the three line segments form a triangle?
It is clear that a necessary and sufficient condition for the three segments to form a triangle is that the length of any one of the segments be less than the sum of the other two. Let x, y be the abscissas of the two points chosen at random. Then we must have either
0 < x < 1/2 < y < 1 and y − x < 1/2
or
0 < y < 1/2 < x < 1 and x − y < 1/2.
This is precisely the shaded area in Fig. 4. It follows that the required probability is 1/4.
Fig. 4  {(x, y) : 0 < x < 1/2 < y < 1 and (y − x) < 1/2, or 0 < y < 1/2 < x < 1 and (x − y) < 1/2}.
If it is specified in advance that the point x is chosen at random from (0, 1/2), and the point y at random from (1/2, 1), we must have
0 < x < 1/2, 1/2 < y < 1,
and
y − x < x + 1 − y, or 2(y − x) < 1.
In this case the area bounded by these lines is the shaded area in Fig. 5, and it follows that the required probability is 1/2.
Fig. 5  {(x, y) : 0 < x < 1/2, 1/2 < y < 1, and 2(y − x) < 1}.
Note the difference in sample spaces in the two computations made above.
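The first computation in Example 8 can be checked by a Monte Carlo sketch (not from the text): break (0, 1) at two uniformly chosen points and test the triangle inequality on the three pieces.

```python
import random

random.seed(0)
n = 500_000
count = 0
for _ in range(n):
    x, y = sorted((random.random(), random.random()))
    a, b, c = x, y - x, 1.0 - y             # lengths of the three segments
    if a < b + c and b < a + c and c < a + b:
        count += 1
print(count / n)                            # close to 1/4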
Example 9 (Bertrand's Paradox). A chord is drawn at random in the unit circle. What is the probability that the chord is longer than the side of the equilateral triangle inscribed in the circle?
We present here three solutions to this problem, depending on how we interpret the phrase "at random." The paradox is resolved once we define the probability spaces carefully.
Solution 1. Since the length of a chord is uniquely determined by the position of its midpoint, choose a point C at random in the circle and draw a line through C and O, the center of the circle (Fig. 6). Draw the chord through C perpendicular to the line OC. If l_1 is the length of the chord with C as midpoint, l_1 > √3 if and only if C lies inside the circle with center O and radius 1/2. Thus PA = π(1/2)^2/π = 1/4.
Fig. 6
In this case Ω is the circle with center O and radius 1, and the event A is the concentric circle with center O and radius 1/2. S is the usual Borel σ-field of subsets of Ω.
Solution 2. Because of symmetry, we may fix one end point of the chord at some point P and then choose the other end point P_1 at random. Let the probability that P_1 lies on an arbitrary arc of the circle be proportional to the length of this arc. Now the inscribed equilateral triangle having P as one of its vertices divides the circumference into three equal parts. A chord drawn through P will be longer than the side of the triangle if and only if the other end point P_1 (Fig. 7) of the chord lies on that one third of the circumference that is opposite to P. It follows that the required probability is 1/3. In this case Ω = [0, 2π], S = B_1 ∩ Ω, and A = [2π/3, 4π/3].
Fig. 7
Solution 3. Note that the length of a chord is uniquely determined by the distance of its midpoint from the center of the circle. Due to the symmetry of the circle, we assume that the midpoint of the chord lies on a fixed radius, OM, of the circle (Fig. 8). The probability that the midpoint M lies in a given segment of the radius through M is then proportional to the length of this segment. Clearly, the chord will be longer than the side of the inscribed equilateral triangle if the length of OM is less than radius/2. It follows that the required probability is 1/2.
Fig. 8
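The three answers can be reproduced numerically. The Python sketch below is an illustration only, not from the book; all function names are ours. It samples chords under each of the three interpretations of "at random" and estimates the probability that the chord exceeds √3, the side of the inscribed equilateral triangle.

```python
# Simulation of Bertrand's three interpretations (illustrative only).
import math, random

SIDE = math.sqrt(3)
N = 200_000

def chord_from_midpoint():
    # midpoint uniform in the unit disk: its distance from O has density 2d
    d = math.sqrt(random.random())
    return 2 * math.sqrt(1 - d * d)

def chord_from_endpoints():
    # one endpoint fixed, the other uniform on the circumference
    theta = random.uniform(0, 2 * math.pi)
    return 2 * math.sin(theta / 2)

def chord_from_radius():
    # midpoint uniform on a fixed radius
    d = random.random()
    return 2 * math.sqrt(1 - d * d)

for draw in (chord_from_midpoint, chord_from_endpoints, chord_from_radius):
    p = sum(draw() > SIDE for _ in range(N)) / N
    print(draw.__name__, round(p, 3))       # about 0.25, 0.333, 0.5
```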
PROBLEMS 1.3
1. Let Ω be the set of all nonnegative integers and S the class of all subsets of Ω. In each of the following cases does P define a probability on (Ω, S)?
(a) For A ∈ S, let PA = Σ_{x∈A} e^{−λ} λ^x / x!, λ > 0.
(b) For A ∈ S, let PA = Σ_{x∈A} p(1−p)^x, 0 < p < 1.
(c) For A ∈ S, let PA = 1 if A has a finite number of elements, and PA = 0 otherwise.
2. Let Ω = R and S = B. In each of the following cases does P define a probability on (Ω, S)?
(a) For each interval I, let PI = ∫_I (1/π) · 1/(1+x^2) dx.
(b) For each interval I, let PI = 1 if I is an interval of finite length and PI = 0 if I is an infinite interval.
(c) For each interval I, let PI = 0 if I ⊆ (−∞, 1) and PI = ∫_I (1/2) dx if I ⊆ [1, ∞). (If I = I_1 + I_2, where I_1 ⊆ (−∞, 1) and I_2 ⊆ [1, ∞), then PI = PI_2.)
3. Let A and B be two events such that B ⊇ A. What is P(A∪B)? What is P(A∩B)? What is P(A−B)?
4.In Problems 1(a) and (b), letA={all integers>2},B={all nonnegative
integers<3}, andC={all integersx,3<x<6}.FindPA,PB,PC,P(A∩B),
P(A∪B),P(B∪C),P(A∩C), andP(B∩C).
5.In Problem 2(a) letAbe the eventA={x:x≥0}.FindPA. Also findP{x:x>0}.
6.A box contains 1000 light bulbs. The probability that there is at least 1 defective bulb
in the box is 0.1, and the probability that there are at least 2 defective bulbs is 0.05.
Find the probability in each of the following cases:
(a) The box contains no defective bulbs.
(b) The box contains exactly 1 defective bulb.
(c) The box contains at most 1 defective bulb.
7.Two points are chosen at random on a line of unit length. Find the probability that
each of the three line segments so formed will have a length>1/4.
8.Find the probability that the sum of two randomly chosen positive numbers (both
≤1) will not exceed 1 and that their product will be≤2/9.
9.Prove Theorem 3.
10. Let {A_n} be a sequence of events such that A_n → A as n → ∞. Show that PA_n → PA as n → ∞.
11. The base and the altitude of a right triangle are obtained by picking points randomly from [0, a] and [0, b], respectively. Show that the probability that the area of the triangle so formed will be less than ab/4 is (1 + ln 2)/2.
12.A pointXis chosen at random on a line segmentAB. (i) Show that the probability
that the ratio of lengthsAX/BXis smaller thana(a>0)isa/(1+a). (ii) Show that
the probability that the ratio of the length of the shorter segment to that of the larger
segment is less than 1/3 is 1/2.
1.4 COMBINATORICS: PROBABILITY ON FINITE SAMPLE SPACES
In this section we restrict attention to sample spaces that have at most a finite number of points. Let Ω = {ω_1, ω_2, ..., ω_n} and S be the σ-field of all subsets of Ω. For any A ∈ S,
PA = Σ_{ω_j ∈ A} P{ω_j}.
Definition 1. An assignment of probability is said to be equally likely (or uniform) if each elementary event in Ω is assigned the same probability. Thus, if Ω contains n points ω_j, P{ω_j} = 1/n, j = 1, 2, ..., n.
With this assignment
PA = (number of elementary events in A)/(total number of elementary events in Ω). (1)
Example 1.A coin is tossed twice. The sample space consists of four points. Under the
uniform assignment, each of four elementary events is assigned probability 1/4.
Example 2. Three dice are rolled. The sample space consists of 6^3 points. Each one-point set is assigned probability 1/6^3.
In games of chance we usually deal with finite sample spaces where uniform proba-
bility is assigned to all simple events. The same is the case in sampling schemes. In such
instances the computation of the probability of an eventAreduces to a combinatorial
counting problem. We therefore consider some rules of counting.
Rule 1. Given a collection of n_1 elements a_11, a_12, ..., a_1n_1, n_2 elements a_21, a_22, ..., a_2n_2, and so on, up to n_k elements a_k1, a_k2, ..., a_kn_k, it is possible to form n_1 · n_2 ··· n_k ordered k-tuples (a_1j_1, a_2j_2, ..., a_kj_k) containing one element of each kind, 1 ≤ j_i ≤ n_i, i = 1, 2, ..., k.
Example 3. Here r distinguishable balls are to be placed in n cells. This amounts to choosing one cell for each ball. The sample space consists of n^r r-tuples (i_1, i_2, ..., i_r), where i_j is the cell number of the jth ball, j = 1, 2, ..., r (1 ≤ i_j ≤ n).
Consider r tossings with a coin. There are 2^r possible outcomes. The probability that no heads will show up in r throws is (1/2)^r. Similarly, the probability that no 6 will turn up in r throws of a die is (5/6)^r.
Rule 2 is concerned with ordered samples. Consider a set of n elements a_1, a_2, ..., a_n. Any ordered arrangement (a_i1, a_i2, ..., a_ir) of r of these n symbols is called an ordered sample of size r. If elements are selected one by one, there are two possibilities:
1. Sampling with replacement. In this case repetitions are permitted, and we can draw samples of an arbitrary size. Clearly there are n^r samples of size r.
2. Sampling without replacement. In this case an element once chosen is not replaced, so that there can be no repetitions. Clearly the sample size cannot exceed n, the size of the population. There are n(n−1)···(n−r+1) = nP_r, say, possible samples of size r. Clearly nP_r = 0 for integers r > n. If r = n, then nP_n = n!.
Rule 2. If ordered samples of size r are drawn from a population of n elements, there are n^r different samples with replacement and nP_r samples without replacement.
Corollary. The number of permutations of n objects is n!.
Remark 1. We will frequently use the term "random sample" in this book to describe the equal assignment of probability to all possible samples in sampling from a finite population. Thus, when we speak of a random sample of size r from a population of n elements, it means that each of the n^r samples, in sampling with replacement, has the same probability 1/n^r, or that each of the nP_r samples, in sampling without replacement, is assigned probability 1/nP_r.
Example 4. Consider a set of n elements. A sample of size r is drawn at random with replacement. Then the probability that no element appears more than once is clearly nP_r/n^r.
Thus, if n balls are to be randomly placed in n cells, the probability that each cell will be occupied is n!/n^n.
Example 5. Consider a class of r students. The birthdays of these r students form a sample of size r from the 365 days in the year. Then the probability that all r birthdays are different is 365P_r/(365)^r. One can show that this probability is < 1/2 if r = 23.
The following table gives the values of q_r = 365P_r/(365)^r for some selected values of r.
r      20     23     25     30     35     60
q_r    0.589  0.493  0.431  0.294  0.186  0.006
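The tabulated values are easy to reproduce. The following minimal Python computation of q_r = 365P_r/(365)^r is offered only as an illustration and is not part of the original text.

```python
# q_r = probability that r people all have different birthdays (illustrative).
def q(r):
    prob = 1.0
    for j in range(r):
        prob *= (365 - j) / 365        # j-th person avoids the j previous birthdays
    return prob

for r in (20, 23, 25, 30, 35, 60):
    print(r, round(q(r), 3))           # 0.589, 0.493, 0.431, 0.294, 0.186, 0.006
```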
Next suppose that each of the r students is asked for his birth date in order, with the instruction that as soon as a student hears his birth date he is to raise his hand. Let us compute the probability that a hand is first raised when the kth (k = 1, 2, ..., r) student is asked his birth date. Let p_k be the probability that the procedure terminates at the kth student. Then
p_1 = 1 − (364/365)^{r−1}
and
p_k = [365P_{k−1}/(365)^{k−1}] · [1 − (k−1)/365]^{r−k+1} · [1 − ((365−k)/(365−k+1))^{r−k}], k = 2, 3, ..., r.
Example 6. Let Ω be the set of all permutations of n objects. Let A_i be the set of all permutations that leave the ith object unchanged. Then the set ∪_{i=1}^n A_i is the set of permutations with at least one fixed point. Clearly
PA_i = (n−1)!/n!, i = 1, 2, ..., n,
P(A_i ∩ A_j) = (n−2)!/n!, i < j; i, j = 1, 2, ..., n, etc.
By Theorem 1.3.3 we have
P(∪_{i=1}^n A_i) = 1 − 1/2! + 1/3! − ··· ± 1/n!.
As an application consider an absent-minded secretary who places n letters in n envelopes at random. Then the probability that she will misplace every letter is
1 − (1 − 1/2! + 1/3! − ··· ± 1/n!).
It is easy to see that this last probability → e^{−1} ≈ 0.3679 as n → ∞.
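As a numerical illustration (not part of the original text), the inclusion–exclusion sum can be evaluated directly; the probability of no fixed point converges rapidly to e^{−1}.

```python
# Probability that a random permutation of n objects has no fixed point
# (illustrative check of Example 6).
import math

def no_fixed_point(n):
    # 1 - (1 - 1/2! + 1/3! - ... ± 1/n!)  ==  sum_{k=0}^{n} (-1)^k / k!
    return sum((-1) ** k / math.factorial(k) for k in range(n + 1))

for n in (3, 5, 10, 20):
    print(n, round(no_fixed_point(n), 6))
print(round(math.exp(-1), 6))              # 0.367879
```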
Rule 3. There are (n choose r) different subpopulations of size r ≤ n from a population of n elements, where
(n choose r) = n!/(r!(n−r)!). (2)
Example 7. Consider the random distribution of r balls in n cells. Let A_k be the event that a specified cell has exactly k balls, k = 0, 1, 2, ..., r; the k balls can be chosen in (r choose k) ways. We place k balls in the specified cell and distribute the remaining r−k balls in the n−1 cells in (n−1)^{r−k} ways. Thus
PA_k = (r choose k)(n−1)^{r−k}/n^r = (r choose k)(1/n)^k (1 − 1/n)^{r−k}.
Example 8. There are (52 choose 13) = 635,013,559,600 different hands at bridge, and (52 choose 5) = 2,598,960 hands at poker.
The probability that all 13 cards in a bridge hand have different face values is 4^13/(52 choose 13).
The probability that a hand at poker contains five different face values is (13 choose 5) 4^5/(52 choose 5).
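These counts can be verified with the standard-library function math.comb; the snippet below is an illustrative check, not part of the text.

```python
# Counts and probabilities of Example 8 (illustrative).
from math import comb

print(comb(52, 13))                              # 635013559600 bridge hands
print(comb(52, 5))                               # 2598960 poker hands
print(4 ** 13 / comb(52, 13))                    # 13 different face values at bridge
print(comb(13, 5) * 4 ** 5 / comb(52, 5))        # 5 different face values at poker
```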
Rule 4. Consider a population of n elements. The number of ways in which the population can be partitioned into k subpopulations of sizes r_1, r_2, ..., r_k, respectively, r_1 + r_2 + ··· + r_k = n, 0 ≤ r_i ≤ n, is given by
(n choose r_1, r_2, ..., r_k) = n!/(r_1! r_2! ··· r_k!). (3)
The numbers defined in (3) are known as multinomial coefficients.
Proof. For the proof of Rule 4 one uses Rule 3 repeatedly. Note that
(n choose r_1, r_2, ..., r_k) = (n choose r_1)(n−r_1 choose r_2) ··· (n−r_1−···−r_{k−2} choose r_{k−1}). (4)
Example 9. In a game of bridge the probability that a hand of 13 cards contains 2 spades, 7 hearts, 3 diamonds, and 1 club is
(13 choose 2)(13 choose 7)(13 choose 3)(13 choose 1) / (52 choose 13).
Example 10. An urn contains 5 red, 3 green, 2 blue, and 4 white balls. A sample of size 8 is selected at random without replacement. The probability that the sample contains 2 red, 2 green, 1 blue, and 3 white balls is
(5 choose 2)(3 choose 2)(2 choose 1)(4 choose 3) / (14 choose 8).
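As an illustrative check (not from the book), this probability can be evaluated with math.comb.

```python
# Example 10 evaluated numerically (illustrative).
from math import comb

p = comb(5, 2) * comb(3, 2) * comb(2, 1) * comb(4, 3) / comb(14, 8)
print(p)        # 240/3003, about 0.0799
```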
PROBLEMS 1.4
1.How many different words can be formed by permuting letters of the word “Missis-
sippi”? How many of these start with the letters “Mi”?
2. An urn contains R red and W white marbles. Marbles are drawn from the urn one after another without replacement. Let A_k be the event that a red marble is drawn for the first time on the kth draw. Show that
PA_k = [R/(R+W−k+1)] ∏_{j=1}^{k−1} [1 − R/(R+W−j+1)].
Let p be the proportion of red marbles in the urn before the first draw. Show that PA_k → p(1−p)^{k−1} as R+W → ∞. Is this to be expected?
3.In a population ofNelements,Rare red andW=N−Rare white. A group ofn
elements is selected at random. Find the probability that the group so chosen will contain exactlyrred elements.
4.Each permutation of the digits 1, 2, 3, 4, 5, 6 determines a six-digit number. If the
numbers corresponding to all possible permutations are listed in increasing order of
magnitude, find the 319th number on this list.
5.The numbers 1, 2,...,nare arranged in random order. Find the probability that the
digits 1,2,...,k(k<n) appear as neighbors in that order.
6.A pin table has seven holes through which a ball can drop. Five balls are played.
Assuming that at each play a ball is equally likely to go down any one of the seven
holes, find the probability that more than one ball goes down at least one of the holes.
7.If 2nboys are divided into two equal subgroups find the probability that the two
tallest boys will be (a) in different subgroups and (b) in the same subgroup.
8.In a movie theater that can accommodaten+kpeople,npeople are seated. What is
the probability thatr≤ngiven seats are occupied?
9.Waiting in line for a Saturday morning movie show are 2n children. Tickets are
priced at a quarter each. Find the probability that nobody will have to wait for change
if, before a ticket is sold to the first customer, the cashier has 2k (k<n) quarters.
Assume that it is equally likely that each ticket is paid for with a quarter or a half-
dollar coin.
10. Each box of a certain brand of breakfast cereal contains a small charm, with k distinct charms forming a set. Assuming that the chance of drawing any particular charm is equal to that of drawing any other charm, show that the probability of finding at least one complete set of charms in a random purchase of N ≥ k boxes equals
1 − (k choose 1)((k−1)/k)^N + (k choose 2)((k−2)/k)^N − (k choose 3)((k−3)/k)^N + ··· + (−1)^{k−1}(k choose k−1)(1/k)^N.
[Hint: Use (1.3.6).]
11.Prove Rules 1–4.
12.In a five-card poker game, find the probability that a hand will have:
(a) A royal flush (ace, king, queen, jack, and 10 of the same suit).
(b) A straight flush (five cards in a sequence, all of the same suit; ace is high but A,
2, 3, 4, 5 is also a sequence) excluding a royal flush.
(c) Four of a kind (four cards of the same face value).
(d) A full house (three cards of the same face valuexand two cards of the same face
valuey).
(e) A flush (five cards of the same suit excluding cards in a sequence).
(f) A straight (five cards in a sequence).
(g) Three of a kind (three cards of the same face value and two cards of different
face values).
(h) Two pairs.
(i) A single pair.
13.(a) A married couple and four of their friends enter a row of seats in a concert hall.
What is the probability that the wife will sit next to her husband if all possible
seating arrangements are equally likely?
(b) In part (a), suppose the six people go to a restaurant after the concert and sit at
a round table. What is the probability that the wife will sit next to her husband?
14.Consider a town withNpeople. A person sends two letters to two separate people,
each of whom is asked to repeat the procedure. Thus for each letter received, two
letters are sent out to separate persons chosen at random (irrespective of what hap-
pened in the past). What is the probability that in the firstnstages the person who
started the chain letter game will not receive a letter?
15.Consider a town withNpeople. A person tells a rumor to a second person, who in
turn repeats it to a third person, and so on. Suppose that at each stage the recipient
of the rumor is chosen at random from the remainingN−1 people. What is the
probability that the rumor will be repeatedntimes
(a) Without being repeated to any person.
(b) Without being repeated to the originator.
16.There were four accidents in a town during a seven-day period. Would you be sur-
prised if all four occurred on the same day? Each of the four occurred on a different
day?
17. While Rules 1 and 2 of counting deal with ordered samples with or without replacement, Rule 3 concerns unordered sampling without replacement. The most difficult rule of counting deals with unordered sampling with replacement. Show that there are (n+r−1 choose r) possible unordered samples of size r from a population of n elements when sampled with replacement.
1.5 CONDITIONAL PROBABILITY AND BAYES THEOREM
So far, we have computed probabilities of events on the assumption that no information
was available about the experiment other than the sample space. Sometimes, however,
it is known that an eventHhas happened. How do we use this information in mak-
ing a statement concerning the outcome of another eventA? Consider the following
examples.
Example 1. Let urn 1 contain one white and two black balls, and urn 2, one black and two white balls. A fair coin is tossed. If a head turns up, a ball is drawn at random from urn 1; otherwise, from urn 2. Let E be the event that the ball drawn is black. The sample space is Ω = {Hb_11, Hb_12, Hw_11, Tb_21, Tw_21, Tw_22}, where H denotes head, T denotes tail, b_ij denotes the jth black ball in the ith urn, i = 1, 2, and so on. Then
PE = P{Hb_11, Hb_12, Tb_21} = 3/6 = 1/2.
If, however, it is known that the coin showed a head, the ball could not have been drawn from urn 2. Thus, the probability of E, conditional on information H, is 2/3. Note that this probability equals the ratio P{head and ball drawn black}/P{head}.
Example 2. Let us toss two fair coins. Then the sample space of the experiment is Ω = {HH, HT, TH, TT}. Let event A = {both coins show same face} and B = {at least one coin shows H}. Then PA = 2/4. If B is known to have happened, this information assures that TT cannot happen, and
P{A conditional on the information that B has happened} = 1/3 = (1/4)/(3/4) = P(A∩B)/PB.
Definition 1. Let (Ω, S, P) be a probability space, and let H ∈ S with PH > 0. For an arbitrary A ∈ S we shall write
P{A | H} = P(A∩H)/PH (1)
and call the quantity so defined the conditional probability of A, given H. Conditional probability remains undefined when PH = 0.
Theorem 1. Let (Ω, S, P) be a probability space, and let H ∈ S with PH > 0. Then (Ω, S, P_H), where P_H(A) = P{A | H} for all A ∈ S, is a probability space.
Proof. Clearly P_H(A) = P{A | H} ≥ 0 for all A ∈ S. Also, P_H(Ω) = P(Ω∩H)/PH = 1. If A_1, A_2, ... is a disjoint sequence of sets in S, then
P_H(∪_{i=1}^∞ A_i) = P{∪_{i=1}^∞ A_i | H} = P{(∪_{i=1}^∞ A_i) ∩ H}/PH = Σ_{i=1}^∞ P(A_i ∩ H)/PH = Σ_{i=1}^∞ P_H(A_i).
Remark 1. What we have done is to consider a new sample space consisting of the basic set H and the σ-field S_H = S∩H of subsets A∩H, A ∈ S, of H. On this space we have defined a set function P_H by multiplying the probability of each event by (PH)^{−1}. Indeed, (H, S_H, P_H) is a probability space.
Let A and B be two events with PA > 0, PB > 0. Then it follows from (1) that
P(A∩B) = PA · P{B | A},   P(A∩B) = PB · P{A | B}. (2)
Equation (2) may be generalized to any number of events. Let A_1, A_2, ..., A_n ∈ S, n ≥ 2, and assume that P(∩_{j=1}^{n−1} A_j) > 0. Since
A_1 ⊃ (A_1∩A_2) ⊃ (A_1∩A_2∩A_3) ⊃ ··· ⊃ (∩_{j=1}^{n−2} A_j) ⊃ (∩_{j=1}^{n−1} A_j),
we see that
PA_1 > 0, P(A_1∩A_2) > 0, ..., P(∩_{j=1}^{n−2} A_j) > 0.
It follows that the conditional probabilities P{A_k | ∩_{j=1}^{k−1} A_j} are well defined for k = 2, 3, ..., n.
Theorem 2 (The Multiplication Rule). Let (Ω, S, P) be a probability space and A_1, A_2, ..., A_n ∈ S, with P(∩_{j=1}^{n−1} A_j) > 0. Then
P(∩_{j=1}^n A_j) = P(A_1) P{A_2 | A_1} P{A_3 | A_1∩A_2} ··· P{A_n | ∩_{j=1}^{n−1} A_j}. (3)
Proof. The proof is simple.
Let us suppose that {H_j} is a countable collection of events in S such that H_j ∩ H_k = Φ, j ≠ k, and ∪_{j=1}^∞ H_j = Ω. Suppose that PH_j > 0 for all j. Then
PB = Σ_{j=1}^∞ P(H_j) P{B | H_j} for all B ∈ S. (4)
For the proof we note that
B = ∪_{j=1}^∞ (B ∩ H_j),
and the result follows. Equation (4) is called the total probability rule.
Example 3. Consider a hand of five cards in a game of poker. If the cards are dealt at random, there are (52 choose 5) possible hands of five cards each. Let A = {at least 3 cards of spades} and B = {all 5 cards of spades}. Then
P(A∩B) = P{all 5 cards of spades} = (13 choose 5)/(52 choose 5)
and
P{B | A} = P(A∩B)/PA = [(13 choose 5)/(52 choose 5)] / {[(13 choose 3)(39 choose 2) + (13 choose 4)(39 choose 1) + (13 choose 5)]/(52 choose 5)}.
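A quick numerical check of this conditional probability, offered only as an illustration and not part of the text, follows.

```python
# Example 3: P{all five spades | at least three spades} (illustrative).
from math import comb

total = comb(52, 5)
p_all_five = comb(13, 5) / total
p_at_least_three = (comb(13, 3) * comb(39, 2)
                    + comb(13, 4) * comb(39, 1)
                    + comb(13, 5)) / total
print(p_all_five / p_at_least_three)     # about 0.0053
```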
Example 4. Urn 1 contains one white and two black marbles, urn 2 contains one black and two white marbles, and urn 3 contains three black and three white marbles. A die is rolled. If a 1, 2, or 3 shows up, urn 1 is selected; if a 4 shows up, urn 2 is selected; and if a 5 or 6 shows up, urn 3 is selected. A marble is then drawn at random from the selected urn. Let A be the event that the marble drawn is white. If U, V, W, respectively, denote the events that the urn selected is 1, 2, 3, then
A = (A∩U) + (A∩V) + (A∩W),
P(A∩U) = P(U) · P{A | U} = (3/6) · (1/3),
P(A∩V) = P(V) · P{A | V} = (1/6) · (2/3),
P(A∩W) = P(W) · P{A | W} = (2/6) · (3/6).
It follows that
PA = 1/6 + 1/9 + 1/6 = 4/9.
A simple consequence of the total probability rule is the Bayes rule, which we now prove.
Theorem 3 (Bayes Rule). Let {H_n} be a disjoint sequence of events such that PH_n > 0, n = 1, 2, ..., and ∪_{n=1}^∞ H_n = Ω. Let B ∈ S with PB > 0. Then
P{H_j | B} = P(H_j) P{B | H_j} / Σ_{i=1}^∞ P(H_i) P{B | H_i}, j = 1, 2, .... (5)
Proof. From (2)
P{B∩H_j} = P(B) P{H_j | B} = PH_j P{B | H_j},
and it follows that
P{H_j | B} = PH_j P{B | H_j} / PB.
The result now follows on using (4).
The result now follows on using (4).
Remark 2. Suppose that H_1, H_2, ... are all the "causes" that lead to the outcome of a random experiment. Let H_j be the set of outcomes corresponding to the jth cause. Assume that the probabilities PH_j, j = 1, 2, ..., called the prior probabilities, can be assigned. Now suppose that the experiment results in an event B of positive probability. This information leads to a reassessment of the prior probabilities. The conditional probabilities P{H_j | B} are called the posterior probabilities. Formula (5) can be interpreted as a rule giving the probability that the observed event B was due to cause or hypothesis H_j.
Example 5. In Example 4 let us compute the conditional probability P{V | A}. We have
P{V | A} = PV P{A | V} / (PU P{A | U} + PV P{A | V} + PW P{A | W})
= [(1/6)·(2/3)] / [(3/6)·(1/3) + (1/6)·(2/3) + (2/6)·(3/6)]
= (1/9)/(4/9) = 1/4.
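The computations of Examples 4 and 5 can be reproduced in a few lines; the following sketch is illustrative only, and the dictionary names are ours.

```python
# Total probability and Bayes rule for the urn example (illustrative).
priors = {"U": 3/6, "V": 1/6, "W": 2/6}       # P(urn selected)
p_white = {"U": 1/3, "V": 2/3, "W": 3/6}      # P(white | urn)

pa = sum(priors[h] * p_white[h] for h in priors)      # total probability: 4/9
print(pa)
print(priors["V"] * p_white["V"] / pa)                # posterior P{V | A} = 1/4
```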
PROBLEMS 1.5
1. Let A and B be two events such that PA = p_1 > 0, PB = p_2 > 0, and p_1 + p_2 > 1. Show that P{B | A} ≥ 1 − [(1−p_2)/p_1].
2.Two digits are chosen at random without replacement from the set of integers
{1,2,3,4,5,6,7,8}.
(a) Find the probability that both digits are greater than 5.
(b) Show that the probability that the sum of the digits will be equal to 5 is the same
as the probability that their sum will exceed 13.
3. The probability of a family chosen at random having exactly k children is αp^k, 0 < p < 1. Suppose that the probability that any child has blue eyes is b, 0 < b < 1, independently of others. What is the probability that a family chosen at random has exactly r (r ≥ 0) children with blue eyes?
4. In Problem 3 let us write
p_k = probability of a randomly chosen family having exactly k children = αp^k, k = 1, 2, ...,
p_0 = 1 − αp/(1−p).
Suppose that all sex distributions of k children are equally likely. Find the probability that a family has exactly r boys, r ≥ 1. Find the conditional probability that a family has at least two boys, given that it has at least one boy.
5.Each of(N+1)identical urns marked 0, 1,2,...,NcontainsNballs. Thekth urn
containskblack andN−kwhite balls,k=0,1,2,...,N. An urn is chosen at random,
andnrandom drawings are made from it, the ball drawn being always replaced. If
all thendraws result in black balls, find the probability that the(n+1)th draw will
also produce a black ball. How does this probability behave asN→∞?
6.Each ofnurns contains four white and six black balls, while another urn contains
five white and five black balls. An urn is chosen at random from the(n+1)urns,
and two balls are drawn from it, both being black. The probability that five white
and three black balls remain in the chosen urn is 1/7. Findn.
7.In answering a question on a multiple choice test, a candidate either knows the
answer with probabilityp(0≤p<1) or does not know the answer with probability
1−p. If he knows the answer, he puts down the correct answer with probability 0.99,
whereas if he guesses, the probability of his putting down the correct result is 1/k
(kchoices to the answer). Find the conditional probability that the candidate knew
the answer to a question, given that he has made the correct answer. Show that this
probability tends to 1 ask→∞.
8.An urn contains five white and four black balls. Four balls are transferred to a sec-
ond urn. A ball is then drawn from this urn, and it happens to be black. Find the
probability of drawing a white ball from among the remaining three.
9.Prove Theorem 2.
10. An urn contains r red and g green marbles. A marble is drawn at random and its color noted. Then the marble drawn, together with c > 0 marbles of the same color, is returned to the urn. Suppose n such draws are made from the urn. Find the probability of selecting a red marble at any draw.
11. Consider a bicyclist who leaves a point P (see Fig. 1), choosing one of the roads PR_1, PR_2, PR_3 at random. At each subsequent crossroad he again chooses a road at random.
(a) What is the probability that he will arrive at point A?
(b) What is the conditional probability that he will arrive at A via road PR_3?
Fig. 1 Map for Problem 11.
12. Five percent of patients suffering from a certain disease are selected to undergo a new treatment that is believed to increase the recovery rate from 30 percent to 50 percent. A person is randomly selected from these patients after the completion of the treatment and is found to have recovered. What is the probability that the patient
received the new treatment?
13.Four roads lead away from the county jail. A prisoner has escaped from the jail and
selects a road at random. If road I is selected, the probability of escaping is 1/8;
if road II is selected, the probability of success is 1/6; if road III is selected, the
probability of escaping is 1/4; and if road IV is selected, the probability of success
is 9/10.
(a) What is the probability that the prisoner will succeed in escaping?
(b) If the prisoner succeeds, what is the probability that the prisoner escaped by
using road IV? Road I?
14.A diagnostic test for a certain disease is 95 percent accurate; in that if a person has
the disease, it will detect it with a probability of 0.95, and if a person does not have
the disease, it will give a negative result with a probability of 0.95. Suppose only 0.5
percent of the population has the disease in question. A person is chosen at random
from this population. The test indicates that this person has the disease. What is the
(conditional) probability that he or she does have the disease?
1.6 INDEPENDENCE OF EVENTS
Let (Ω, S, P) be a probability space, and let A, B ∈ S, with PB > 0. By the multiplication rule we have
P(A∩B) = P(B) P{A | B}.
In many experiments the information provided byBdoes not affect the probability of
eventA, that is,P{A|B}=P{A}.
Example 1. Let two fair coins be tossed, and let A = {head on the second throw}, B = {head on the first throw}. Then
P(A) = P{HH, TH} = 1/2,   P(B) = P{HH, HT} = 1/2,
and
P{A | B} = P(A∩B)/P(B) = (1/4)/(1/2) = 1/2 = P(A).
Thus
P(A∩B) = P(A) P(B).
In the following, we will write A∩B = AB.
Definition 1.Two events,AandB, are said to be independent if and only if
P(AB)= P(A)P(B). (1)
Note that we have not placed any restriction onP(A)orP(B). Thus conditional prob-
ability is not defined whenP(A)orP(B)= 0 but independence is. Clearly, ifP(A)= 0,
thenAis independent of everyE∈S. Also, any eventA∈Sis independent ofΦandΩ.
Theorem 1.IfAandBare independent events, then
P{A|B}=P(A) ifP(B)>0
and
P{B|A}=P(B) ifP(A)>0.
Theorem 2. If A and B are independent, so are A and B^c, A^c and B, and A^c and B^c.
Proof.
P(A^c B) = P(B − (A∩B))
= P(B) − P(A∩B) since B ⊇ (A∩B)
= P(B){1 − P(A)}
= P(A^c) P(B).
Similarly, one proves that (i) A^c and B^c and (ii) A and B^c are independent.
We wish to emphasize that independence of events is not to be confused with disjoint
or mutually exclusive events. If two events, each with nonzero probability, are mutually
exclusive, they are obviously dependent since the occurrence of one will automatically
preclude the occurrence of the other. Similarly, ifAandBare independent andPA>0,
PB>0, thenAandBcannot be mutually exclusive.
Example 2. A card is chosen at random from a deck of 52 cards. Let A be the event that the card is an ace and B, the event that it is a club. Then
P(A) = 4/52 = 1/13,   P(B) = 13/52 = 1/4,
P(AB) = P{ace of clubs} = 1/52,
so that A and B are independent.
Example 3. Consider families with two children, and assume that all four possible distributions of sex—BB, BG, GB, GG, where B stands for boy and G for girl—are equally likely. Let E be the event that a randomly chosen family has at most one girl and F, the event that the family has children of both sexes. Then
P(E) = 3/4,   P(F) = 1/2,   and   P(EF) = 1/2,
so that E and F are not independent.
Now consider families with three children. Assuming that each of the eight possible sex distributions is equally likely, we have
P(E) = 4/8,   P(F) = 6/8,   P(EF) = 3/8,
so that E and F are independent.
An obvious extension of the concept of independence between two eventsAandBto a
given collectionUof events is to require that any two distinct events inUbe independent.
Definition 2.LetUbe a family of events fromS. We say that the eventsUare pairwise
independent if and only if, for every pair of distinct eventsA,B∈U,
P(AB)= PA PB.
A much stronger and more useful concept ismutualorcomplete independence.
Definition 3. A family of events U is said to be a mutually or completely independent family if and only if, for every finite subcollection {A_{i_1}, A_{i_2}, ..., A_{i_k}} of U, the following relation holds:
P(A_{i_1} ∩ A_{i_2} ∩ ··· ∩ A_{i_k}) = ∏_{j=1}^k PA_{i_j}. (2)
In what follows we will omit the adjective "mutual" or "complete" and speak of independent events. It is clear from Definition 3 that in order to check the independence of n events A_1, A_2, ..., A_n ∈ S we must check the following 2^n − n − 1 relations:
P(A_i A_j) = PA_i PA_j, i ≠ j; i, j = 1, 2, ..., n,
P(A_i A_j A_k) = PA_i PA_j PA_k, i ≠ j ≠ k; i, j, k = 1, 2, ..., n,
...
P(A_1 A_2 ··· A_n) = PA_1 PA_2 ··· PA_n.
The first of these requirements is pairwise independence. Independence therefore implies pairwise independence, but not conversely.
Example 4 (Wong [120]). Take four identical marbles. On the first, write the symbols A_1A_2A_3. On each of the other three, write A_1, A_2, A_3, respectively. Put the four marbles in an urn and draw one at random. Let E_i denote the event that the symbol A_i appears on the drawn marble. Then
P(E_1) = P(E_2) = P(E_3) = 1/2,
P(E_1E_2) = P(E_2E_3) = P(E_1E_3) = 1/4,
and
P(E_1E_2E_3) = 1/4. (3)
It follows that although events E_1, E_2, E_3 are not independent, they are pairwise independent.
Example 5 (Kac [48], pp. 22–23). In this example P(E_1E_2E_3) = P(E_1)P(E_2)P(E_3), but E_1, E_2, E_3 are not pairwise independent and hence not independent. Let Ω = {1, 2, 3, 4}, and let p_i be the probability assigned to {i}, i = 1, 2, 3, 4. Let p_1 = √2/2 − 1/4, p_2 = 1/4, p_3 = 3/4 − √2/2, p_4 = 1/4. Let E_1 = {1, 3}, E_2 = {2, 3}, E_3 = {3, 4}. Then
P(E_1E_2E_3) = P{3} = 3/4 − √2/2 = (1/2)(1 − √2/2)(1 − √2/2) = (p_1+p_3)(p_2+p_3)(p_3+p_4) = P(E_1)P(E_2)P(E_3).
But P(E_1E_2) = 3/4 − √2/2 ≠ PE_1 PE_2, and it follows that E_1, E_2, E_3 are not independent.
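The distinction between pairwise and mutual independence in Example 4 can be checked by direct enumeration. The following Python sketch is an illustration only and is not part of the text; the helper prob is ours.

```python
# Example 4 (Wong): pairwise but not mutually independent events (illustrative).
from itertools import combinations
from fractions import Fraction

marbles = [{"A1", "A2", "A3"}, {"A1"}, {"A2"}, {"A3"}]
p = Fraction(1, 4)                        # each marble drawn with probability 1/4

def prob(*symbols):
    # probability that every listed symbol appears on the drawn marble
    return sum(p for m in marbles if set(symbols) <= m)

for s, t in combinations(("A1", "A2", "A3"), 2):
    print(s, t, prob(s, t) == prob(s) * prob(t))        # True, True, True
print(prob("A1", "A2", "A3") == prob("A1") * prob("A2") * prob("A3"))  # False
```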
Example 6. A die is rolled repeatedly until a 6 turns up. We will show that the event A, that "a 6 will eventually show up," is certain to occur. Let A_k be the event that a 6 shows up for the first time on the kth throw, and let A = ∪_{k=1}^∞ A_k. Then
PA_k = (1/6)(5/6)^{k−1}, k = 1, 2, ...,
and
PA = (1/6) Σ_{k=1}^∞ (5/6)^{k−1} = (1/6) · 1/(1 − 5/6) = 1.
Alternatively, we can use the corollary to Theorem 1.3.6. Let B_n be the event that a 6 does not show up on the first n trials. Clearly B_{n+1} ⊆ B_n, and we have A^c = ∩_{n=1}^∞ B_n. Thus
1 − PA = PA^c = P(∩_{n=1}^∞ B_n) = lim_{n→∞} P(B_n) = lim_{n→∞} (5/6)^n = 0.
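As a small numerical illustration (not part of the text), the partial sums of the geometric series above indeed approach 1 and agree with 1 − (5/6)^n.

```python
# Partial sums of P(6 first appears on throw k) = (1/6)(5/6)**(k-1) (illustrative).
for n in (10, 50, 100):
    partial = sum((1/6) * (5/6) ** (k - 1) for k in range(1, n + 1))
    print(n, partial, 1 - (5/6) ** n)      # the two expressions agree
```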
Example 7. A slip of paper is given to person A, who marks it with either a plus or a minus sign; the probability of her writing a plus sign is 1/3. A passes the slip to B, who may either leave it alone or change the sign before passing it to C. Next, C passes the slip to D after perhaps changing the sign; finally, D passes it to a referee after perhaps changing the sign. The referee sees a plus sign on the slip. It is known that B, C, and D each change the sign with probability 2/3. We shall compute the probability that A originally wrote a plus.
Let N be the event that A wrote a plus sign, and M, the event that she wrote a minus sign. Let E be the event that the referee saw a plus sign on the slip. We have
P{N | E} = P(N) P{E | N} / [P(M) P{E | M} + P(N) P{E | N}].
Now
P{E | N} = P{the plus sign was either not changed or changed exactly twice} = (1/3)^3 + 3(2/3)^2(1/3)
and
P{E | M} = P{the minus sign was changed either once or three times} = 3(2/3)(1/3)^2 + (2/3)^3.
It follows that
P{N | E} = (1/3)[(1/3)^3 + 3(2/3)^2(1/3)] / {(1/3)[(1/3)^3 + 3(2/3)^2(1/3)] + (2/3)[3(2/3)(1/3)^2 + (2/3)^3]} = (13/81)/(41/81) = 13/41.
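The answer 13/41 can be confirmed by enumerating all sign histories; the code below is an illustrative check, not part of the original text.

```python
# Brute-force verification of Example 7 (illustrative).
from itertools import product
from fractions import Fraction

p_plus = Fraction(1, 3)        # A writes a plus
p_flip = Fraction(2, 3)        # each of B, C, D changes the sign

num = den = Fraction(0)
for first in ("+", "-"):
    p_first = p_plus if first == "+" else 1 - p_plus
    for flips in product((True, False), repeat=3):       # B, C, D
        prob, sign = p_first, first
        for f in flips:
            prob *= p_flip if f else 1 - p_flip
            if f:
                sign = "-" if sign == "+" else "+"
        if sign == "+":                                   # referee sees a plus
            den += prob
            if first == "+":
                num += prob
print(num / den)                                          # 13/41
```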
PROBLEMS 1.6
1.A biased coin is tossed until a head appears for the first time. Letpbe the probability
of a head, 0<p<1. What is the probability that the number of tosses required is
odd? Even?
2. Let A and B be two independent events defined on some probability space, and let PA = 1/3, PB = 3/4. Find (a) P(A∪B), (b) P{A | A∪B}, and (c) P{B | A∪B}.
3. Let A_1, A_2, and A_3 be three independent events. Show that A_1^c, A_2^c, and A_3^c are independent.
4.A biased coin with probabilityp,0<p<1, of success (heads) is tossed until for the
first time the same result occurs three times in succession (i.e., three heads or three
tails in succession). Find the probability that the game will end at the seventh throw.
5.A box contains 20 black and 30 green balls. One ball at a time is drawn at random,
its color is noted, and the ball is then replaced in the box for the next draw.
(a) Find the probability that the first green ball is drawn on the fourth draw.
(b) Find the probability that the third and fourth green balls are drawn on the sixth
and ninth draws, respectively.
(c) LetNbe the trial at which the fifth green ball is drawn. Find the probability that
the fifth green ball is drawn on thenth draw. (Note thatNtake values 5, 6,7,....)
6.An urn contains four red and four black balls. A sample of two balls is drawn at
random. If both balls drawn are of the same color, these balls are set aside and a new
sample is drawn. If the two balls drawn are of different colors, they are returned to
the urn and another sample is drawn. Assume that the draws are independent and
that the same sampling plan is pursued at each stage until all balls are drawn.
(a) Find the probability that at leastnsamples are drawn before two balls of the
same color appear.
(b) Find the probability that after the first two samples are drawn four balls are left,
two black and two red.
7.LetA,B, andCbe three boxes with three, four, and five cells, respectively. There are
three yellow balls numbered 1 to 3, four green balls numbered 1 to 4, and five red
balls numbered 1 to 5. The yellow balls are placed at random in boxA, the green in
B, and the red inC, with no cell receiving more than one ball. Find the probability
that only one of the boxes will show no matches.
8.A pond contains red and golden fish. There are 3000 red and 7000 golden fish,
of which 200 and 500, respectively, are tagged. Find the probability that a random
sample of 100 red and 200 golden fish will show 15 and 20 tagged fish, respectively.
9. Let (Ω, S, P) be a probability space. Let A, B, C ∈ S with PB and PC > 0. If B and C are independent show that
P{A | B} = P{A | B∩C} PC + P{A | B∩C^c} PC^c.
Conversely, if this relation holds, P{A | BC} ≠ P{A | B}, and PA > 0, then B and C are independent (Strait [111]).
10.Show that the converse of Theorem 2 also holds. ThusAandBare independent if,
and only if,AandB
c
are independent, and so on.
11. A lot of five identical batteries is life tested. The probability assignment is assumed to be
P(A) = ∫_A (1/λ) e^{−x/λ} dx
for any event A ⊆ [0, ∞), where λ > 0 is a known constant. Thus the probability that a battery fails after time t is given by
P(t, ∞) = ∫_t^∞ (1/λ) e^{−x/λ} dx, t ≥ 0.
If the times to failure of the batteries are independent, what is the probability that at least one battery will be operating after t_0 hours?
12.OnΩ=(a,b),−∞<a<b<∞, each subinterval is assigned a probability propor-
tional to the length of the interval. Find a necessary and sufficient condition for two
events to be independent.
13.A game of craps is played with a pair of fair dice as follows. A player rolls the dice.
If a sum of 7 or 11 shows up, the player wins; if a sum of 2, 3, or 12 shows up, the
player loses. Otherwise the player continues to roll the pair of dice until the sum is
either 7 or the first number rolled. In the former case the player loses and in the latter
the player wins.
(a) Find the probability that the player wins on thenth roll.
(b) Find the probability that the player wins the game.
(c) What is the probability that the game ends on: (i) the first roll, (ii) second roll,
and (iii) third roll?
2 RANDOM VARIABLES AND THEIR PROBABILITY DISTRIBUTIONS
2.1 INTRODUCTION
In Chapter 1 we dealt essentially with random experiments which can be described by
finite sample spaces. We studied the assignment and computation of probabilities of
events. In practice, one observes a function defined on the space of outcomes. Thus, if
a coin is tossed n times, one is not interested in knowing which of the 2^n n-tuples in the sample space has occurred. Rather, one would like to know the number of heads in n tosses.
In games of chance one is interested in the net gain or loss of a certain player. Actually, in
Chapter 1 we were concerned with such functions without defining the termrandom vari-
able. Here we study the notion of a random variable and examine some of its properties.
In Section 2.2 we define a random variable, while in Section 2.3 we study the notion
of probability distribution of a random variable. Section 2.4 deals with some special types
of random variables, and Section 2.5 considers functions of a random variable and their
induced distributions.
The fundamental difference between a random variable and a real-valued function of a
real variable is the associated notion of a probability distribution. Nevertheless our knowl-
edge of advanced calculus or real analysis is the basic tool in the study of random variables
and their probability distributions.
2.2 RANDOM VARIABLES
In Chapter 1 we studied properties of a set functionPdefined on a sample space(Ω,S).
SincePis a set function, it is not very easy to handle; we cannot perform arithmetic or
algebraic operations on sets. Moreover, in practice one frequently observes some function
of elementary events. When a coin is tossed repeatedly, which replication resulted in heads
is not of much interest. Rather one is interested in the number of heads, and consequently
the number of tails, that appear in, say,ntossings of the coin. It is therefore desirable to
introduce a point function on the sample space. We can then use our knowledge of calculus
or real analysis to study properties ofP.
Definition 1. Let (Ω, S) be a sample space. A finite, single-valued function X which maps Ω into R is called a random variable (RV) if the inverse images under X of all Borel sets in R are events, that is, if
X^{−1}(B) = {ω : X(ω) ∈ B} ∈ S for all B ∈ B. (1)
In order to verify whether a real-valued function on (Ω, S) is an RV, it is not necessary to check that (1) holds for all Borel sets B ∈ B. It suffices to verify (1) for any class A of subsets of R which generates B. By taking A to be the class of semiclosed intervals (−∞, x], x ∈ R, we get the following result.
Theorem 1. X is an RV if and only if for each x ∈ R
{ω : X(ω) ≤ x} = {X ≤ x} ∈ S. (2)
Remark 1. Note that the notion of probability does not enter into the definition of an RV.
Remark 2. If X is an RV, the sets {X = x}, {a < X ≤ b}, {X < x}, {a ≤ X < b}, {a < X < b}, {a ≤ X ≤ b} are all events. Moreover, we could have used any of these intervals to define an RV. For example, we could have used the following equivalent definition: X is an RV if and only if
{ω : X(ω) < x} ∈ S for all x ∈ R. (3)
We have
{X < x} = ∪_{n=1}^∞ {X ≤ x − 1/n} (4)
and
{X ≤ x} = ∩_{n=1}^∞ {X < x + 1/n}. (5)
Remark 3. In practice (1) or (2) is a technical condition in the definition of an RV which the reader may ignore and think of RVs simply as real-valued functions defined on Ω. It should be emphasized, though, that there do exist subsets of R which do not belong to B and hence there exist real-valued functions defined on Ω which are not RVs, but the reader will not encounter them in practical applications.
Example 1. For any set A ⊆ Ω, define
I_A(ω) = 0 if ω ∉ A, and I_A(ω) = 1 if ω ∈ A.
I_A(ω) is called the indicator function of set A. I_A is an RV if and only if A ∈ S.
Example 2. Let Ω = {H, T} and S be the class of all subsets of Ω. Define X by X(H) = 1, X(T) = 0. Then
X^{−1}(−∞, x] = φ if x < 0; {T} if 0 ≤ x < 1; {H, T} if 1 ≤ x,
and we see that X is an RV.
Example 3. Let Ω = {HH, TT, HT, TH} and S be the class of all subsets of Ω. Define X by
X(ω) = number of H's in ω.
Then X(HH) = 2, X(HT) = X(TH) = 1, and X(TT) = 0. Also
X^{−1}(−∞, x] = φ if x < 0; {TT} if 0 ≤ x < 1; {TT, HT, TH} if 1 ≤ x < 2; Ω if 2 ≤ x.
Thus X is an RV.
Remark 4.Let(Ω,S)be a discrete sample space; that is, letΩbe a countable set of points
andSbe the class of all subsets ofΩ. Then every numerical valued function defined on
(Ω,S)is an RV.
Example 4.LetΩ=[0,1]andS=B∩[0,1]be theσ-field of Borel sets on[0,1].Define
XonΩby
X(ω)=ω, ω ∈[0,1].
ClearlyXis an RV. Any Borel subset ofΩis an event.
Remark 5. Let X be an RV defined on (Ω, S) and a, b be constants. Then aX + b is also an RV on (Ω, S). Moreover, X^2 is an RV and so also is 1/X, provided that {X = 0} = φ. For a general result see Theorem 2.5.1.
PROBLEMS 2.2
1.LetXbe the number of heads in three tosses of a coin. What isΩ? What are the values
thatXassigns to points ofΩ? What are the events{X≤2.75}, {0.5≤X≤1.72}?
2.A die is tossed two times. LetXbe the sum of face values on the two tosses andY
be the absolute value of the difference in face values. What isΩ? What values doX
andYassign to points ofΩ? Check to see whetherXandYare random variables.
3. Let X be an RV. Is |X| also an RV? If X is an RV that takes only nonnegative values, is √X also an RV?
4. A die is rolled five times. Let X be the sum of face values. Write the events {X = 4}, {X = 6}, {X = 30}, {X ≥ 29}.
5. Let Ω = [0, 1] and S be the Borel σ-field of subsets of Ω. Define X on Ω as follows: X(ω) = ω if 0 ≤ ω ≤ 1/2, and X(ω) = ω − 1/2 if 1/2 < ω ≤ 1. Is X an RV? If so, what is the event {ω : X(ω) ∈ (1/4, 1/2)}?
6. Let A be a class of subsets of R which generates B. Show that X is an RV on Ω if and only if X^{−1}(A) ∈ S for all A ∈ A.
2.3 PROBABILITY DISTRIBUTION OF A RANDOM VARIABLE
In Section 2.2 we introduced the concept of an RV and noted that the concept of proba-
bility on the sample space was not used in this definition. In practice, however, random
variables are of interest only when they are defined on a probability space. Let(Ω,S,P)
be a probability space, and letXbe an RV defined on it.
Theorem 1. The RV X defined on the probability space (Ω, S, P) induces a probability space (R, B, Q) by means of the correspondence
Q(B) = P{X^{−1}(B)} = P{ω : X(ω) ∈ B} for all B ∈ B. (1)
We write Q = PX^{−1} and call Q or PX^{−1} the (probability) distribution of X.
Proof. Clearly Q(B) ≥ 0 for all B ∈ B, and also Q(R) = P{X ∈ R} = P(Ω) = 1.
Let B_i ∈ B, i = 1, 2, ..., with B_i ∩ B_j = φ, i ≠ j. Since the inverse image of a disjoint union of Borel sets is the disjoint union of their inverse images, we have
Q(∪_{i=1}^∞ B_i) = P{X^{−1}(∪_{i=1}^∞ B_i)} = P{∪_{i=1}^∞ X^{−1}(B_i)} = Σ_{i=1}^∞ PX^{−1}(B_i) = Σ_{i=1}^∞ Q(B_i).
It follows that (R, B, Q) is a probability space, and the proof is complete.
We note that Q is a set function, and set functions are not easy to handle. It is therefore more practical to use (2.2.2), since then Q(−∞, x] is a point function. Let us first introduce and study some properties of a special point function on R.
Definition 1. A real-valued function F defined on (−∞, ∞) that is nondecreasing, right continuous, and satisfies
F(−∞) = 0 and F(+∞) = 1
is called a distribution function (DF).
Remark 1. Recall that if F is a nondecreasing function on R, then F(x−) = lim_{t↑x} F(t) and F(x+) = lim_{t↓x} F(t) exist and are finite. Also, F(+∞) and F(−∞) exist as lim_{t↑+∞} F(t) and lim_{t↓−∞} F(t), respectively. In general,
F(x−) ≤ F(x) ≤ F(x+),
and x is a jump point of F if and only if F(x+) and F(x−) exist but are unequal. Thus a nondecreasing function F has only jump discontinuities. If we define
F*(x) = F(x+) for all x,
we see that F* is nondecreasing and right continuous on R. Thus in Definition 1 the nondecreasing part is very important. Some authors demand left continuity in the definition of a DF instead of right continuity.
Theorem 2. The set of discontinuity points of a DF F is at most countable.
Proof. Let (a, b] be a finite interval with at least n discontinuity points:
a < x_1 < x_2 < ··· < x_n ≤ b.
Then
F(a) ≤ F(x_1−) < F(x_1) ≤ ··· ≤ F(x_n−) < F(x_n) ≤ F(b).
Let p_k = F(x_k) − F(x_k−), k = 1, 2, ..., n. Clearly,
Σ_{k=1}^n p_k ≤ F(b) − F(a),
and it follows that the number of points x in (a, b] with jump p(x) > ε > 0 is at most ε^{−1}{F(b) − F(a)}. Thus, for every integer N, the number of discontinuity points with jump greater than 1/N is finite. It follows that there are no more than a countable number of discontinuity points in every finite interval (a, b]. Since R is a countable union of such intervals, the proof is complete.
Definition 2. Let X be an RV defined on (Ω, S, P). Define a point function F(·) on R by using (1), namely,
F(x) = Q(−∞, x] = P{ω : X(ω) ≤ x} for all x ∈ R. (2)
The function F is called the distribution function of RV X. If there is no confusion, we will write F(x) = P{X ≤ x}.
The following result justifies our calling F as defined by (2) a DF.
Theorem 3. The function F defined in (2) is indeed a DF.
Proof. Let x_1 < x_2. Then (−∞, x_1] ⊂ (−∞, x_2], and we have
F(x_1) = P{X ≤ x_1} ≤ P{X ≤ x_2} = F(x_2).
Since F is nondecreasing, it is sufficient to show that for any sequence of numbers x_n ↓ x, x_1 > x_2 > ··· > x_n > ··· > x, F(x_n) → F(x). Let A_k = {ω : X(ω) ∈ (x, x_k]}. Then A_k ∈ S and {A_k} is nonincreasing. Also,
lim_{k→∞} A_k = ∩_{k=1}^∞ A_k = φ,
since none of the intervals (x, x_k] contains x. It follows that lim_{k→∞} P(A_k) = 0. But
P(A_k) = P{X ≤ x_k} − P{X ≤ x} = F(x_k) − F(x),
so that
lim_{k→∞} F(x_k) = F(x)
and F is right continuous.
Finally, let {x_n} be a sequence of numbers decreasing to −∞. Then
{X ≤ x_n} ⊇ {X ≤ x_{n+1}} for each n
and
lim_{n→∞} {X ≤ x_n} = ∩_{n=1}^∞ {X ≤ x_n} = φ.
Therefore,
F(−∞) = lim_{n→∞} P{X ≤ x_n} = P(lim_{n→∞} {X ≤ x_n}) = 0.
Similarly,
F(+∞) = lim_{x_n→∞} P{X ≤ x_n} = 1,
and the proof is complete.
The next result, stated without proof, establishes a correspondence between the induced
probabilityQon(R,B)and a point functionFdefined onR.
Theorem 4.Given a probabilityQon(R,B), there exists a distribution functionF
satisfying
Q(−∞,x]=F(x)for allx∈R, (3)
and, conversely, given a DFF, there exists a unique probabilityQdefined on(R,B)that
satisfies (3).
For proof see Chung [15, pp. 23–24].
Theorem 5.Every DF is the DF of an RV on some probability space.
Proof.LetFbe a DF. From Theorem 4 it follows that there exists a unique probabilityQ
defined onRthat satisfies
Q(−∞,x]=F(x)for allx∈R.
Let(R,B,Q)be the probability space on which we define
X(ω)=ω, ω ∈R.
Then
Q{ω:X(ω)≤x}=Q(−∞,x]=F(x),
andFis the DF of RVX.
Remark 2.IfXis an RV on(Ω,S,P), we have seen (Theorem 3) thatF(x)=P{X≤x}is a
DF associated withX. Theorem 5 assures us that to every DFFwe can associate some RV.
Thus, given an RV, there exists a DF, and conversely. In this book, when we speak of
an RV we will assume that it is defined on some probability space.
Example 1. Let X be defined on (Ω, S, P) by
X(ω) = c for all ω ∈ Ω.
Then
P{X = c} = 1,
F(x) = Q(−∞, x] = P{X^{−1}(−∞, x]} = 0 if x < c,
and
F(x) = 1 if x ≥ c.
Example 2. Let Ω = {H, T} and X be defined by
X(H) = 1, X(T) = 0.
If P assigns equal mass to {H} and {T}, then
P{X = 0} = 1/2 = P{X = 1}
and
F(x) = Q(−∞, x] = 0 if x < 0; 1/2 if 0 ≤ x < 1; 1 if 1 ≤ x.
Example 3. Let Ω = {(i, j) : i, j ∈ {1, 2, 3, 4, 5, 6}} and S be the set of all subsets of Ω. Let P{(i, j)} = 1/6^2 for all 6^2 pairs (i, j) in Ω. Define
X(i, j) = i + j, 1 ≤ i, j ≤ 6.
Then
F(x) = Q(−∞, x] = P{X ≤ x} = 0 if x < 2; 1/36 if 2 ≤ x < 3; 3/36 if 3 ≤ x < 4; 6/36 if 4 ≤ x < 5; ...; 35/36 if 11 ≤ x < 12; 1 if 12 ≤ x.
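The DF above can be generated by enumeration; the following sketch is offered only as an illustration and is not part of the text.

```python
# DF of the sum of two fair dice (Example 3), built by enumeration (illustrative).
from fractions import Fraction
from itertools import product

pmf = {}
for i, j in product(range(1, 7), repeat=2):
    pmf[i + j] = pmf.get(i + j, Fraction(0)) + Fraction(1, 36)

def F(x):
    # F(x) = P{X <= x}
    return sum(p for v, p in pmf.items() if v <= x)

for x in (1.5, 2, 3, 4, 11.9, 12):
    print(x, F(x))          # 0, 1/36, 3/36, 6/36, 35/36, 1
```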
Example 4. We return to Example 2.2.4. For every subinterval I of [0, 1] let P(I) be the length of the interval. Then (Ω, S, P) is a probability space, and the DF of the RV X(ω) = ω, ω ∈ Ω, is given by F(x) = 0 if x < 0, F(x) = P{ω : X(ω) ≤ x} = P([0, x]) = x if x ∈ [0, 1], and F(x) = 1 if x ≥ 1.
PROBLEMS 2.3
1.Write the DF of RVXdefined in Problem 2.2.1, assuming that the coin is fair.
2.What is the DF of RVYdefined in Problem 2.2.2, assuming that the die is not loaded?
3. Do the following functions define DFs?
(a) F(x) = 0 if x < 0, = x if 0 ≤ x < 1/2, and = 1 if x ≥ 1/2.
(b) F(x) = (1/π) tan^{−1} x, −∞ < x < ∞.
(c) F(x) = 0 if x ≤ 1, and = 1 − (1/x) if 1 < x.
(d) F(x) = 1 − e^{−x} if x ≥ 0, and = 0 if x < 0.
4. Let X be an RV with DF F.
(a) If F is the DF defined in Problem 3(a), find P{X > 1/4}, P{1/3 < X ≤ 3/8}.
(b) If F is the DF defined in Problem 3(d), find P{−∞ < X < 2}.
2.4 DISCRETE AND CONTINUOUS RANDOM VARIABLES
LetXbe an RV defined on some fixed, but otherwise arbitrary, probability space(Ω,S,P),
and letFbe the DF ofX. In this book, we shall restrict ourselves mainly to two cases,
namely, the case in which the RV assumes at most a countable number of values and
hence its DF is a step function and that in which the DFFis (absolutely) continuous.
Definition 1. An RV X defined on (Ω, S, P) is said to be of the discrete type, or simply discrete, if there exists a countable set E ⊆ R such that P{X ∈ E} = 1. The points of E which have positive mass are called jump points or points of increase of the DF of X, and their probabilities are called jumps of the DF.
Note that E ∈ B since every one-point set is in B. Indeed, if x ∈ R, then
{x} = ∩_{n=1}^∞ (x − 1/n, x + 1/n]. (1)
Thus {X ∈ E} is an event. Let X take on the value x_i with probability p_i (i = 1, 2, ...). We have
P{ω : X(ω) = x_i} = p_i, i = 1, 2, ..., p_i ≥ 0 for all i.
Then Σ_{i=1}^∞ p_i = 1.
Definition 2. The collection of numbers {p_i} satisfying P{X = x_i} = p_i ≥ 0 for all i and Σ_{i=1}^∞ p_i = 1 is called the probability mass function (PMF) of RV X.
The DF F of X is given by
F(x) = P{X ≤ x} = Σ_{x_i ≤ x} p_i. (2)
If I_A denotes the indicator function of the set A, we may write
X(ω) = Σ_{i=1}^∞ x_i I_{[X=x_i]}(ω). (3)
Let us define a function ε(x) as follows:
ε(x) = 1 if x ≥ 0, and ε(x) = 0 if x < 0.
Then we have
F(x) = Σ_{i=1}^∞ p_i ε(x − x_i). (4)
Example 1. The simplest example is that of an RV X degenerate at c, P{X = c} = 1:
F(x) = ε(x − c) = 0 if x < c, and = 1 if x ≥ c.
Example 2. A box contains good and defective items. If an item drawn is good, we assign the number 1 to the drawing; otherwise, the number 0. Let p be the probability of drawing at random a good item. Then
P{X = 0} = 1 − p, P{X = 1} = p,
and
F(x) = P{X ≤ x} = 0 if x < 0; 1 − p if 0 ≤ x < 1; 1 if 1 ≤ x.
Example 3. Let X be an RV with PMF
P{X = k} = (6/π^2) · (1/k^2), k = 1, 2, ....
Then
F(x) = (6/π^2) Σ_{k=1}^∞ (1/k^2) ε(x − k).
Theorem 1. Let {p_k} be a collection of nonnegative real numbers such that Σ_{k=1}^∞ p_k = 1. Then {p_k} is the PMF of some RV X.
We next consider RVs associated with DFs that have no jump points. The DF of such
an RV is continuous. We shall restrict our attention to a special subclass of such RVs.
Definition 3. Let X be an RV defined on (Ω, S, P) with DF F. Then X is said to be of the continuous type (or, simply, continuous) if F is absolutely continuous, that is, if there exists a nonnegative function f(x) such that for every real number x we have
F(x) = ∫_{−∞}^x f(t) dt. (5)
The function f is called the probability density function (PDF) of the RV X.
Note that f ≥ 0 and satisfies lim_{x→+∞} F(x) = F(+∞) = ∫_{−∞}^∞ f(t) dt = 1. Let a and b be any two real numbers with a < b. Then
P{a < X ≤ b} = F(b) − F(a) = ∫_a^b f(t) dt.
In view of the remarks following Definition 2.2.1, the following result holds.
Theorem 2. Let X be an RV of the continuous type with PDF f. Then for every Borel set B ∈ B,
P(B) = ∫_B f(t) dt. (6)
If F is absolutely continuous and f is continuous at x, we have
F′(x) = dF(x)/dx = f(x). (7)
Theorem 3. Every nonnegative real function f that is integrable over R and satisfies ∫_{−∞}^∞ f(x) dx = 1 is the PDF of some continuous type RV X.
Proof. In view of Theorem 2.3.5 it suffices to show that there corresponds a DF F to f. Define
F(x) = ∫_{−∞}^x f(t) dt, x ∈ R.
Then F(−∞) = 0, F(+∞) = 1, and, if x_2 > x_1,
F(x_2) = (∫_{−∞}^{x_1} + ∫_{x_1}^{x_2}) f(t) dt ≥ ∫_{−∞}^{x_1} f(t) dt = F(x_1).
Finally, F is (absolutely) continuous and hence continuous from the right.
Remark 1.In the discrete case,P{X=a}is the probability thatXtakes the valuea.Inthe
continuous case,f(a)is not the probability thatXtakes the valuea. Indeed, ifXis of the
continuous type, it assumes every value with probability 0.
Theorem 4. Let X be any RV. Then
P{X = a} = lim_{t→a, t<a} P{t < X ≤ a}. (8)
Proof. Let t_1 < t_2 < ··· < a, t_n → a, and write
A_n = {t_n < X ≤ a}.
Then A_n is a nonincreasing sequence of events which converges to ∩_{n=1}^∞ A_n = {X = a}. It follows that lim_{n→∞} PA_n = P{X = a}.
Remark 2. Since P{t < X ≤ a} = F(a) − F(t), it follows that
lim_{t→a, t<a} P{t < X ≤ a} = P{X = a} = F(a) − lim_{t→a, t<a} F(t) = F(a) − F(a−).
Thus F has a jump discontinuity at a if and only if P{X = a} > 0; that is, F is continuous at a if and only if P{X = a} = 0. If X is an RV of the continuous type, P{X = a} = 0 for all a ∈ R. Moreover,
P{X ∈ R − {a}} = 1.
This justifies Remark 1.3.4.
Remark 3. The set of real numbers x for which a DF F increases is called the support of F. Let X be the RV with DF F, and let S be the support of F. Then P(X ∈ S) = 1 and P(X ∈ S^c) = 0. The set of positive integers is the support of the DF in Example 3, and the open interval (0, 1) is the support of F in Example 4 below.
Example 4. Let X be an RV with DF F given by (Fig. 1)
F(x) = 0 if x ≤ 0; x if 0 < x ≤ 1; 1 if 1 < x.
Fig. 1
Fig. 2
Differentiating F with respect to x at continuity points of f, we get
f(x) = F′(x) = 0 if x < 0 or x > 1; 1 if 0 < x < 1.
The function f is not continuous at x = 0 or at x = 1 (Fig. 2). We may define f(0) and f(1) in any manner. Choosing f(0) = f(1) = 0, we have
f(x) = 1 if 0 < x < 1, and = 0 otherwise.
Then
P{0.4 < X ≤ 0.6} = F(0.6) − F(0.4) = 0.2.
Example 5. Let X have the triangular PDF (Fig. 3)
f(x) = x if 0 < x ≤ 1; 2 − x if 1 ≤ x ≤ 2; 0 otherwise.
It is easy to check that f is a PDF. For the DF F of X we have (Fig. 4)
F(x) = 0 if x ≤ 0,
F(x) = ∫_0^x t dt = x^2/2 if 0 < x ≤ 1,
F(x) = ∫_0^1 t dt + ∫_1^x (2 − t) dt = 2x − x^2/2 − 1 if 1 < x ≤ 2,
and
F(x) = 1 if x ≥ 2.
Then
P{0.3 < X ≤ 1.5} = P{X ≤ 1.5} − P{X ≤ 0.3} = 0.83.
Fig. 3 Graph of f.
Fig. 4 Graph of F.
Example 6. Let k > 0 be a constant, and
f(x) = kx(1 − x) if 0 < x < 1, and = 0 otherwise.
Then ∫_0^1 f(x) dx = k/6. It follows that f(x) defines a PDF if k = 6. We have
P{X > 0.3} = 1 − 6 ∫_0^{0.3} x(1 − x) dx = 0.784.
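Both probabilities can be confirmed numerically; the sketch below is illustrative only and uses a crude midpoint rule written for this purpose.

```python
# Numerical check of Examples 5 and 6 (illustrative).
def integrate(f, a, b, n=100_000):
    h = (b - a) / n
    return sum(f(a + (k + 0.5) * h) for k in range(n)) * h

tri = lambda x: x if 0 < x <= 1 else (2 - x if 1 < x <= 2 else 0.0)
print(round(integrate(tri, 0.3, 1.5), 3))          # P{0.3 < X <= 1.5} ≈ 0.83

quad = lambda x: 6 * x * (1 - x) if 0 < x < 1 else 0.0
print(round(1 - integrate(quad, 0, 0.3), 3))       # P{X > 0.3} ≈ 0.784
```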
We conclude this discussion by emphasizing that the two types of RVs considered above form only a part of the class of all RVs. These two classes, however, contain practically all the random variables that arise in practice. We note without proof (see Chung [15, p. 9]) that every DF F can be decomposed into two parts according to
F(x) = aF_d(x) + (1 − a)F_c(x). (9)
Here F_d and F_c are both DFs; F_d is the DF of a discrete RV, while F_c is a continuous (not necessarily absolutely continuous) DF. In fact, F_c can be further decomposed, but we will not go into that (see Chung [15, p. 11]).
Example 7. Let X be an RV with DF
F(x) = 0 if x < 0; 1/2 if x = 0; 1/2 + x/2 if 0 < x < 1; 1 if 1 ≤ x.
Note that the DF F has a jump at x = 0 and F is continuous (in fact, absolutely continuous) in the interval (0, 1). F is the DF of an RV X that is neither discrete nor continuous. We can write
F(x) = (1/2) F_d(x) + (1/2) F_c(x),
where
F_d(x) = 0 if x < 0, and = 1 if x ≥ 0,
and
F_c(x) = 0 if x ≤ 0; x if 0 < x < 1; 1 if 1 ≤ x.
Here F_d(x) is the DF of the RV degenerate at x = 0, and F_c(x) is the DF with PDF
f_c(x) = 1 if 0 < x < 1, and = 0 otherwise.
PROBLEMS 2.4
1. Let
p_k = p(1 − p)^k, k = 0, 1, 2, ..., 0 < p < 1.
Does {p_k} define the PMF of some RV? What is the DF of this RV? If X is an RV with PMF {p_k}, what is P{n ≤ X ≤ N}, where n, N (N > n) are positive integers?
2. In Problem 2.3.3, find the PDF associated with the DFs of parts (b), (c), and (d).
3. Does the function f_θ(x) = θ^2 x e^{−θx} if x > 0, and = 0 if x ≤ 0, where θ > 0, define a PDF? Find the DF associated with f_θ(x); if X is an RV with PDF f_θ(x), find P{X ≥ 1}.
4. Does the function f_θ(x) = {(x+1)/[θ(θ+1)]} e^{−x/θ} if x > 0, and = 0 otherwise, where θ > 0, define a PDF? Find the corresponding DF.
5. For what values of K do the following functions define the PMF of some RV?
(a) f(x) = K(λ^x/x!), x = 0, 1, 2, ..., λ > 0.
(b) f(x) = K/N, x = 1, 2, ..., N.
6. Show that the function
f(x) = (1/2) e^{−|x|}, −∞ < x < ∞,
is a PDF. Find its DF.
7. For the PDF f(x) = x if 0 ≤ x < 1, and = 2 − x if 1 ≤ x < 2, find P{1/6 < X ≤ 7/4}.
8. Which of the following functions are density functions:
(a) f(x) = x(2 − x), 0 < x < 2, and 0 elsewhere.
(b) f(x) = x(2x − 1), 0 < x < 2, and 0 elsewhere.
(c) f(x) = (1/λ) exp{−(x − θ)/λ}, x > θ, and 0 elsewhere, λ > 0.
(d) f(x) = sin x, 0 < x < π/2, and 0 elsewhere.
(e) f(x) = 0 for x < 0, = (x+1)/9 for 0 ≤ x < 1, = 2(2x−1)/9 for 1 ≤ x < 3/2, = 2(5−2x)/9 for 3/2 ≤ x < 2, = 4/27 for 2 ≤ x < 5, and 0 elsewhere.
(f) f(x) = 1/[π(1 + x^2)], x ∈ R.
9. Are the following functions distribution functions? If so, find the corresponding density or probability functions.
(a) F(x) = 0 for x ≤ 0, = x/2 for 0 ≤ x < 1, = 1/2 for 1 ≤ x < 2, = x/4 for 2 ≤ x < 4, and = 1 for x ≥ 4.
(b) F(x) = 0 if x < −θ, = (1/2)(x/θ + 1) if |x| ≤ θ, and 1 for x > θ, where θ > 0.
(c) F(x) = 0 if x < 0, and = 1 − (1 + x) exp(−x) if x ≥ 0.
(d) F(x) = 0 if x < 1, = (x − 1)^2/8 if 1 ≤ x < 3, and 1 for x ≥ 3.
(e) F(x) = 0 if x < 0, and = 1 − e^{−x^2} if x ≥ 0.
10. Suppose P(X ≥ x) is given for a random variable X (of the continuous type) for all x. How will you find the corresponding density function? In particular find the density function in each of the following cases:
(a) P(X ≥ x) = 1 if x ≤ 0, and P(X ≥ x) = e^{−λx} for x > 0, λ > 0 is a constant.
(b) P(X ≥ x) = 1 if x < 0, and = (1 + x/λ)^{−λ} for x ≥ 0, λ > 0 is a constant.
(c) P(X ≥ x) = 1 if x ≤ 0, and = 3/(1 + x)^2 − 2/(1 + x)^3 if x > 0.
(d) P(X > x) = 1 if x ≤ x_0, and = (x_0/x)^α if x > x_0; x_0 > 0 and α > 0 are constants.
2.5 FUNCTIONS OF A RANDOM VARIABLE
LetXbe an RV with a known distribution, and letgbe a function defined on the real line.
We seek the distribution ofY=g(X), provided thatYis also an RV. We first prove the
following result.
Theorem 1.LetXbe an RV defined on(Ω,S,P). Also, letgbe a Borel-measurable
function onR. Theng(X)is also an RV.
Proof. For y \in R, we have
\[
\{g(X) \le y\} = \{X \in g^{-1}(-\infty, y]\},
\]
and since g is Borel-measurable, g^{-1}(-\infty, y] is a Borel set. It follows that \{g(X) \le y\} \in S,
and the proof is complete.
Theorem 2. Given an RV X with a known DF, the distribution of the RV Y = g(X), where
g is a Borel-measurable function, is determined.
Proof. Indeed, for all y \in R,
\[
P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}. \tag{1}
\]
In what follows, we will always assume that the functions under consideration are
Borel-measurable.
Example 1. Let X be an RV with DF F. Then |X|, aX + b (where a \ne 0 and b are constants),
X^k (where k \ge 0 is an integer), and |X|^\alpha (\alpha > 0) are all RVs. Define
\[
X^+ = \begin{cases} X, & X \ge 0, \\ 0, & X < 0, \end{cases}
\qquad \text{and} \qquad
X^- = \begin{cases} X, & X \le 0, \\ 0, & X > 0. \end{cases}
\]
Then X^+, X^- are also RVs. We have
\[
P\{|X| \le y\} = P\{-y \le X \le y\} = P\{X \le y\} - P\{X < -y\}
= F(y) - F(-y) + P\{X = -y\}, \quad y > 0;
\]
\[
P\{aX + b \le y\} = P\{aX \le y - b\} =
\begin{cases} P\{X \le (y-b)/a\} & \text{if } a > 0, \\ P\{X \ge (y-b)/a\} & \text{if } a < 0; \end{cases}
\]
\[
P\{X^+ \le y\} = \begin{cases} 0 & \text{if } y < 0, \\ P\{X \le 0\} & \text{if } y = 0, \\ P\{X < 0\} + P\{0 \le X \le y\} & \text{if } y > 0. \end{cases}
\]
Similarly,
\[
P\{X^- \le y\} = \begin{cases} 1 & \text{if } y \ge 0, \\ P\{X \le y\} & \text{if } y < 0. \end{cases}
\]
Let X be an RV of the discrete type, and A be the countable set such that P\{X \in A\} = 1
and P\{X = x\} > 0 for x \in A. Let Y = g(X) be a one-to-one mapping from A onto some
set B. Then the inverse map, g^{-1}, is a single-valued function of y. To find P\{Y = y\}, we
note that
\[
P\{Y = y\} = P\{g(X) = y\} = P\{X = g^{-1}(y)\}, \quad y \in B,
\]
and P\{Y = y\} = 0 for y \in B^c.

Example 2. Let X be a Poisson RV with PMF
\[
P\{X = k\} = \begin{cases} \dfrac{e^{-\lambda}\lambda^k}{k!}, & k = 0,1,2,\ldots; \ \lambda > 0, \\ 0, & \text{otherwise.} \end{cases}
\]
Let Y = X^2 + 3. Then y = x^2 + 3 maps A = \{0,1,2,\ldots\} onto B = \{3,4,7,12,19,28,\ldots\}.
The inverse map is x = \sqrt{y - 3}, and since there are no negative values in A we take the
positive square root of y - 3. We have
\[
P\{Y = y\} = P\{X = \sqrt{y - 3}\} = \frac{e^{-\lambda}\lambda^{\sqrt{y-3}}}{(\sqrt{y-3})!}, \quad y \in B,
\]
and P\{Y = y\} = 0 elsewhere.
Actually the restriction to a single-valued inverse on g is not necessary. If g has a
finite (or even a countable) number of inverses for each y, from countable additivity of P
we have
\[
P\{Y = y\} = P\{g(X) = y\} = P\Big\{\bigcup_a [X = a, g(a) = y]\Big\} = \sum_a P\{X = a, g(a) = y\}.
\]
Example 3. Let X be an RV with PMF
\[
P\{X = -2\} = \tfrac{1}{5}, \quad P\{X = -1\} = \tfrac{1}{6}, \quad P\{X = 0\} = \tfrac{1}{5}, \quad
P\{X = 1\} = \tfrac{1}{15}, \quad P\{X = 2\} = \tfrac{11}{30}.
\]
Let Y = X^2. Then A = \{-2,-1,0,1,2\} and B = \{0,1,4\}. We have
\[
P\{Y = y\} = \begin{cases} \tfrac{1}{5}, & y = 0, \\ \tfrac{1}{6} + \tfrac{1}{15} = \tfrac{7}{30}, & y = 1, \\ \tfrac{1}{5} + \tfrac{11}{30} = \tfrac{17}{30}, & y = 4. \end{cases}
\]
The case in whichXis an RV of the continuous type is not as simple. First we note that
ifXis a continuous type RV andgis some Borel-measurable function,Y=g(X)may not
be an RV of the continuous type.

Example 4. Let X be an RV with uniform distribution on [-1,1], that is, the PDF of X is
f(x) = 1/2, -1 \le x \le 1, and = 0 elsewhere. Let Y = X^+. Then, from Example 1,
\[
P\{Y \le y\} = \begin{cases} 0, & y < 0, \\ \tfrac{1}{2}, & y = 0, \\ \tfrac{1}{2} + \tfrac{1}{2}y, & 0 < y \le 1, \\ 1, & y > 1. \end{cases}
\]
We see that the DF of Y has a jump at y = 0 and that Y is neither discrete nor continuous.
Note that all we require is that P\{X < 0\} > 0 for X^+ to be of the mixed type.
Example 4 shows that we need some conditions ongto ensure thatg(X)is also an RV
of the continuous type wheneverXis continuous. This is the case whengis a continuous
monotonic function. A sufficient condition is given in the following theorem.
Theorem 3. Let X be an RV of the continuous type with PDF f. Let y = g(x) be differen-
tiable for all x and either g'(x) > 0 for all x or g'(x) < 0 for all x. Then Y = g(X) is also
an RV of the continuous type with PDF given by
\[
h(y) = \begin{cases} f[g^{-1}(y)] \left|\dfrac{d}{dy} g^{-1}(y)\right|, & \alpha < y < \beta, \\ 0, & \text{otherwise,} \end{cases} \tag{2}
\]
where \alpha = \min\{g(-\infty), g(+\infty)\} and \beta = \max\{g(-\infty), g(+\infty)\}.
Proof. If g is differentiable for all x and g'(x) > 0 for all x, then g is continuous and strictly
increasing, the limits \alpha, \beta exist (may be infinite), and the inverse function x = g^{-1}(y)
exists, is strictly increasing, and is differentiable. The DF of Y for \alpha < y < \beta is given by
\[
P\{Y \le y\} = P\{X \le g^{-1}(y)\}.
\]
The PDF of Y is obtained on differentiation. We have
\[
h(y) = \frac{d}{dy} P\{Y \le y\} = f[g^{-1}(y)] \frac{d}{dy} g^{-1}(y).
\]
Similarly, if g' < 0, then g is strictly decreasing and we have
\[
P\{Y \le y\} = P\{X \ge g^{-1}(y)\} = 1 - P\{X \le g^{-1}(y)\} \quad (X \text{ is a continuous type RV}),
\]
so that
\[
h(y) = -f[g^{-1}(y)] \cdot \frac{d}{dy} g^{-1}(y).
\]
Since g and g^{-1} are both strictly decreasing, (d/dy)\,g^{-1}(y) is negative and (2) follows.
Note that
\[
\frac{d}{dy} g^{-1}(y) = \frac{1}{dg(x)/dx}\bigg|_{x = g^{-1}(y)},
\]
so that (2) may be rewritten as
\[
h(y) = \frac{f(x)}{|dg(x)/dx|}\bigg|_{x = g^{-1}(y)}, \quad \alpha < y < \beta. \tag{3}
\]
Remark 1. The key to computation of the induced distribution of Y = g(X) from the dis-
tribution of X is (1). If the conditions of Theorem 3 are satisfied, we are able to identify
the set \{X \in g^{-1}(-\infty, y]\} as \{X \le g^{-1}(y)\} or \{X \ge g^{-1}(y)\}, according to whether g is
increasing or decreasing. In practice Theorem 3 is quite useful, but whenever the condi-
tions are violated one should return to (1) to compute the induced distribution. This is the
case, for example, in Examples 7 and 8 and Theorem 4 below.
Remark 2. If the PDF f of X vanishes outside an interval [a,b] of finite length, we need
only to assume that g is differentiable in (a,b) and either g'(x) > 0 or g'(x) < 0 throughout
the interval. Then we take
\[
\alpha = \min\{g(a), g(b)\} \quad \text{and} \quad \beta = \max\{g(a), g(b)\}
\]
in Theorem 3.
Example 5. Let X have the density f(x) = 1, 0 < x < 1, and = 0 otherwise. Let Y = e^X.
Then X = \log Y, and we have
\[
h(y) = \frac{1}{y} \cdot 1, \quad 0 < \log y < 1,
\]
that is,
\[
h(y) = \begin{cases} \dfrac{1}{y}, & 1 < y < e, \\ 0, & \text{otherwise.} \end{cases}
\]
If y = -2\log x, then x = e^{-y/2} and
\[
h(y) = \left|\frac{d}{dy} e^{-y/2}\right| \cdot 1 = \tfrac{1}{2} e^{-y/2} \quad \text{for } 0 < e^{-y/2} < 1,
\]
that is,
\[
h(y) = \begin{cases} \tfrac{1}{2} e^{-y/2}, & 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases}
\]
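A quick simulation (not from the text, assuming NumPy is available) illustrates both transformations: exp(X) should follow the density 1/y on (1, e), and -2 log X should be exponential with mean 2. A minimal sketch:

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.uniform(0.0, 1.0, size=200_000)    # X ~ uniform(0,1)

# Y = exp(X): empirical density should be close to 1/y on (1, e)
y1 = np.exp(x)
hist, edges = np.histogram(y1, bins=20, range=(1.0, np.e), density=True)
mid = 0.5 * (edges[:-1] + edges[1:])
print(np.max(np.abs(hist - 1.0 / mid)))    # close to 0 up to sampling noise

# Y = -2 log X: density (1/2) e^{-y/2}, i.e. exponential with mean 2
y2 = -2.0 * np.log(x)
print(y2.mean())                            # close to 2
```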

Example 6. Let X be a nonnegative RV of the continuous type with PDF f, and let \alpha > 0.
Let Y = X^\alpha. Then
\[
P\{X^\alpha \le y\} = \begin{cases} P\{X \le y^{1/\alpha}\} & \text{if } y \ge 0, \\ 0 & \text{if } y < 0. \end{cases}
\]
The PDF of Y is given by
\[
h(y) = f(y^{1/\alpha}) \left|\frac{d}{dy} y^{1/\alpha}\right|
= \begin{cases} \dfrac{1}{\alpha} y^{1/\alpha - 1} f(y^{1/\alpha}), & y > 0, \\ 0, & y \le 0. \end{cases}
\]
Example 7. Let X be an RV with PDF
\[
f(x) = \frac{1}{\sqrt{2\pi}} e^{-x^2/2}, \quad -\infty < x < \infty.
\]
Let Y = X^2. In this case g'(x) = 2x, which is > 0 for x > 0 and < 0 for x < 0, so that the
conditions of Theorem 3 are not satisfied. But for y > 0
\[
P\{Y \le y\} = P\{-\sqrt{y} \le X \le \sqrt{y}\} = F(\sqrt{y}) - F(-\sqrt{y}),
\]
where F is the DF of X. Thus the PDF of Y is given by
\[
h(y) = \begin{cases} \dfrac{1}{2\sqrt{y}}\{f(\sqrt{y}) + f(-\sqrt{y})\}, & y > 0, \\ 0, & y \le 0. \end{cases}
\]
Thus
\[
h(y) = \begin{cases} \dfrac{1}{\sqrt{2\pi y}} e^{-y/2}, & y > 0, \\ 0, & y \le 0. \end{cases}
\]
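As a side check (not from the text, assuming NumPy is available), the squared standard normal can be simulated and its empirical CDF compared with the integral of h(y), which works out to erf(\sqrt{y/2}). A minimal sketch:

```python
import numpy as np
from math import erf, sqrt

rng = np.random.default_rng(1)
y = rng.standard_normal(500_000) ** 2          # Y = X^2 with X standard normal

# Integrating h(t) = e^{-t/2} / sqrt(2*pi*t) from 0 to q gives erf(sqrt(q/2));
# compare with the empirical CDF of the simulated Y.
for q in (0.5, 1.0, 2.0):
    print((y <= q).mean(), erf(sqrt(q / 2)))   # the two columns agree closely
```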
Example 8. Let X be an RV with PDF
\[
f(x) = \begin{cases} \dfrac{2x}{\pi^2}, & 0 < x < \pi, \\ 0, & \text{otherwise.} \end{cases}
\]
Let Y = \sin X. In this case g'(x) = \cos x > 0 for x in (0, \pi/2) and < 0 for x in (\pi/2, \pi),
so that the conditions of Theorem 3 are not satisfied. To compute the PDF of Y we return
to (1) and see that (Fig. 1) the DF of Y is given by

Fig. 1  y = sin x, 0 \le x \le \pi.

\[
P\{Y \le y\} = P\{\sin X \le y\} = P\{[0 \le X \le x_1] \cup [x_2 \le X \le \pi]\}, \quad 0 < y < 1,
\]
where x_1 = \sin^{-1} y and x_2 = \pi - \sin^{-1} y. Thus
\[
P\{Y \le y\} = \int_0^{x_1} f(x)\,dx + \int_{x_2}^{\pi} f(x)\,dx
= \left(\frac{x_1}{\pi}\right)^2 + 1 - \left(\frac{x_2}{\pi}\right)^2,
\]
and the PDF of Y is given by
\[
h(y) = \frac{d}{dy}\left(\frac{\sin^{-1} y}{\pi}\right)^2 + \frac{d}{dy}\left[1 - \left(\frac{\pi - \sin^{-1} y}{\pi}\right)^2\right]
= \begin{cases} \dfrac{2}{\pi\sqrt{1 - y^2}}, & 0 < y < 1, \\ 0, & \text{otherwise.} \end{cases}
\]
In Examples 7 and 8 the functiony=g(x)can be written as the sum of two mono-
tone functions. We applied Theorem 3 to each of these monotonic summands. These two
examples are special cases of the following result.
Theorem 4. Let X be an RV of the continuous type with PDF f. Let y = g(x) be differen-
tiable for all x, and assume that g'(x) is continuous and nonzero at all but a finite number
of values of x. Then, for every real number y,
(a) there exist a positive integer n = n(y) and real numbers (inverses) x_1(y), x_2(y), \ldots,
x_n(y) such that
\[
g[x_k(y)] = y, \quad g'[x_k(y)] \ne 0, \quad k = 1,2,\ldots,n(y),
\]
or
(b) there does not exist any x such that g(x) = y, g'(x) \ne 0, in which case we write
n(y) = 0.
Then Y is a continuous RV with PDF given by
\[
h(y) = \begin{cases} \displaystyle\sum_{k=1}^{n} f[x_k(y)]\,|g'[x_k(y)]|^{-1} & \text{if } n > 0, \\ 0 & \text{if } n = 0. \end{cases}
\]
Example 9. Let X be an RV with PDF f, and let Y = |X|. Here n(y) = 2, x_1(y) = y,
x_2(y) = -y for y > 0, and
\[
h(y) = \begin{cases} f(y) + f(-y), & y > 0, \\ 0, & y \le 0. \end{cases}
\]
Thus, if f(x) = 1/2, -1 \le x \le 1, and = 0 otherwise, then
\[
h(y) = \begin{cases} 1, & 0 \le y \le 1, \\ 0, & \text{otherwise.} \end{cases}
\]
If f(x) = (1/\sqrt{2\pi}) e^{-x^2/2}, -\infty < x < \infty, then
\[
h(y) = \begin{cases} \sqrt{\dfrac{2}{\pi}}\, e^{-y^2/2}, & y > 0, \\ 0, & \text{otherwise.} \end{cases}
\]
Example 10. Let X be an RV of the continuous type with PDF f, and let Y = X^{2m}, where
m is a positive integer. In this case g(x) = x^{2m}, g'(x) = 2m x^{2m-1} > 0 for x > 0, and g'(x) < 0
for x < 0. Writing n = 2m, we see that, for any y > 0, n(y) = 2, x_1(y) = -y^{1/n}, x_2(y) = y^{1/n}.
It follows that
\[
h(y) = f[x_1(y)] \cdot \frac{1}{n y^{1-1/n}} + f[x_2(y)] \cdot \frac{1}{n y^{1-1/n}}
= \begin{cases} \dfrac{1}{n y^{1-1/n}}\{f(y^{1/n}) + f(-y^{1/n})\} & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}
\]
In particular, if f is the PDF given in Example 7, then
\[
h(y) = \begin{cases} \dfrac{2}{\sqrt{2\pi}\, n y^{1-1/n}} \exp\left(-\dfrac{y^{2/n}}{2}\right) & \text{if } y > 0, \\ 0 & \text{if } y \le 0. \end{cases}
\]
Remark 3. The basic formula (1) and the countable additivity of probability allow us to
compute the distribution of Y = g(X) in some instances even if g has a countable number
of inverses. Let A \subseteq R and let g map A into B \subseteq R. Suppose that A can be represented as a
countable union of disjoint sets A_k, k = 1,2,\ldots. Then the DF of Y is given by
\[
P\{Y \le y\} = P\{X \in g^{-1}(-\infty, y]\}
= P\Big\{X \in \bigcup_{k=1}^{\infty} [\{g^{-1}(-\infty, y]\} \cap A_k]\Big\}
= \sum_{k=1}^{\infty} P\big\{X \in A_k \cap \{g^{-1}(-\infty, y]\}\big\}.
\]
If the conditions of Theorem 3 are satisfied by the restriction of g to each A_k, we may
obtain the PDF of Y on differentiating the DF of Y. We remind the reader that term-by-term
differentiation is permissible if the differentiated series is uniformly convergent.
Example 11. Let X be an RV with PDF
\[
f(x) = \begin{cases} \theta e^{-\theta x}, & x > 0, \\ 0, & x \le 0, \end{cases} \qquad \theta > 0.
\]
Let Y = \sin X, and let \sin^{-1} y be the principal value. Then (Fig. 2), for 0 < y < 1,
\[
P\{\sin X \le y\}
= P\{0 < X \le \sin^{-1} y \ \text{or} \ (2n-1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y \ \text{for all integers } n \ge 1\}
\]
\[
= P\{0 < X \le \sin^{-1} y\} + \sum_{n=1}^{\infty} P\{(2n-1)\pi - \sin^{-1} y \le X \le 2n\pi + \sin^{-1} y\}
\]
\[
= 1 - e^{-\theta\sin^{-1} y} + \sum_{n=1}^{\infty} \left[e^{-\theta[(2n-1)\pi - \sin^{-1} y]} - e^{-\theta(2n\pi + \sin^{-1} y)}\right]
\]
\[
= 1 - e^{-\theta\sin^{-1} y} + \left(e^{\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}\right)\sum_{n=1}^{\infty} e^{-(2\theta\pi)n}
= 1 - e^{-\theta\sin^{-1} y} + \left(e^{\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}\right)\frac{e^{-2\theta\pi}}{1 - e^{-2\theta\pi}}
\]
\[
= 1 + \frac{e^{-\theta\pi + \theta\sin^{-1} y} - e^{-\theta\sin^{-1} y}}{1 - e^{-2\pi\theta}}.
\]

Fig. 2  y = sin x, x \ge 0.

A similar computation can be made for y < 0. It follows that the PDF of Y is given by
\[
h(y) = \begin{cases}
\theta e^{-\theta\pi}(1 - e^{-2\theta\pi})^{-1}(1 - y^2)^{-1/2}\left[e^{\theta\sin^{-1} y} + e^{-\theta\pi - \theta\sin^{-1} y}\right] & \text{if } -1 < y < 0, \\
\theta(1 - e^{-2\theta\pi})^{-1}(1 - y^2)^{-1/2}\left[e^{-\theta\sin^{-1} y} + e^{-\theta\pi + \theta\sin^{-1} y}\right] & \text{if } 0 < y < 1, \\
0 & \text{otherwise.}
\end{cases}
\]
PROBLEMS 2.5
1. Let X be a random variable with probability mass function
   P\{X = r\} = \binom{n}{r} p^r (1 - p)^{n-r}, r = 0,1,2,\ldots,n, 0 \le p \le 1.
   Find the PMFs of the RVs (a) Y = aX + b, (b) Y = X^2, and (c) Y = \sqrt{X}.
2. Let X be an RV with PDF
   f(x) = 0 if x \le 0, = 1/2 if 0 < x \le 1, and = 1/(2x^2) if 1 < x < \infty.
   Find the PDF of the RV 1/X.
3. Let X be a positive RV of the continuous type with PDF f(\cdot). Find the PDF of the
   RV U = X/(1 + X). If, in particular, X has the PDF f(x) = 1, 0 \le x \le 1, and = 0 otherwise,
   what is the PDF of U?
4. Let X be an RV with PDF f defined by Example 11. Let Y = \cos X and Z = \tan X.
   Find the DFs and PDFs of Y and Z.
5. Let X be an RV with PDF f_\theta(x) = \theta e^{-\theta x} if x \ge 0, and = 0 otherwise, where \theta > 0.
   Let Y = [X - 1/\theta]^2. Find the PDF of Y.
6. A point is chosen at random on the circumference of a circle of radius r with center
   at the origin, that is, the polar angle \theta of the point chosen has the PDF
   f(\theta) = 1/(2\pi), \theta \in (-\pi, \pi).
   Find the PDF of the abscissa of the point selected.
7. For the RV X of Example 7 find the PDF of the following RVs: (a) Y_1 = e^X, (b) Y_2 =
   2X^2 + 1, and (c) Y_3 = g(X), where g(x) = 1 if x > 0, = 1/2 if x = 0, and = -1 if x < 0.
8. Suppose that a projectile is fired at an angle \theta above the earth with a velocity V.
   Assuming that \theta is an RV with PDF
   f(\theta) = 12/\pi if \pi/6 < \theta < \pi/4, and = 0 otherwise,
   find the PDF of the range R of the projectile, where R = V^2 \sin 2\theta / g, g being the
   gravitational constant.
9. Let X be an RV with PDF f(x) = 1/(2\pi) if 0 < x < 2\pi, and = 0 otherwise. Let
   Y = \sin X. Find the DF and PDF of Y.
10. Let X be an RV with PDF f(x) = 1/3 if -1 < x < 2, and = 0 otherwise. Let Y = |X|.
    Find the PDF of Y.
11. Let X be an RV with PDF f(x) = 1/(2\theta) if -\theta \le x \le \theta, and = 0 otherwise. Let
    Y = 1/X^2. Find the PDF of Y.
12. Let X be an RV of the continuous type, and let Y = g(X) be defined as follows:
    (a) g(x) = 1 if x > 0, and = -1 if x \le 0.
    (b) g(x) = b if x \ge b, = x if |x| < b, and = -b if x \le -b.
    (c) g(x) = x if |x| \ge b, and = 0 if |x| < b.
    Find the distribution of Y in each case.

3
MOMENTS AND GENERATING
FUNCTIONS
3.1 INTRODUCTION
The study of probability distributions of a random variable is essentially the study of some
numerical characteristics associated with them. These so-calledparameters of the distribu-
tionplay a key role in mathematical statistics. In Section 3.2 we introduce some of these
parameters, namely, moments and order parameters, and investigate their properties. In
Section 3.3 the idea of generating functions is introduced. In particular, we study prob-
ability generating functions, moment generating functions, and characteristic functions.
Section 3.4 deals with some moment inequalities.
3.2 MOMENTS OF A DISTRIBUTION FUNCTION
In this section we investigate some numerical characteristics, calledparameters, associ-
ated with the distribution of an RVX. These parameters are (a)momentsand their functions
and (b)order parameters. We will concentrate mainly on moments and their properties.
Let X be a random variable of the discrete type with probability mass function
p_k = P\{X = x_k\}, k = 1,2,\ldots. If
\[
\sum_{k=1}^{\infty} |x_k| p_k < \infty, \tag{1}
\]
we say that the expected value (or the mean or the mathematical expectation) of X exists
and write
\[
\mu = EX = \sum_{k=1}^{\infty} x_k p_k. \tag{2}
\]
Note that the series \sum_{k=1}^{\infty} x_k p_k may converge but the series \sum_{k=1}^{\infty} |x_k| p_k may not. In that
case we say that EX does not exist.
Example 1. Let X have the PMF given by
\[
p_j = P\left\{X = (-1)^{j+1}\frac{3^j}{j}\right\} = \frac{2}{3^j}, \quad j = 1,2,\ldots.
\]
Then
\[
\sum_{j=1}^{\infty} |x_j| p_j = \sum_{j=1}^{\infty} \frac{2}{j} = \infty,
\]
and EX does not exist, although the series
\[
\sum_{j=1}^{\infty} x_j p_j = \sum_{j=1}^{\infty} (-1)^{j+1}\frac{2}{j}
\]
is convergent.
If X is of the continuous type and has PDF f, we say that EX exists and equals \int x f(x)\,dx,
provided that \int |x| f(x)\,dx < \infty.
A similar definition is given for the mean of any Borel-measurable function h(X) of X.
Thus, if X is of the continuous type and has PDF f, we say that Eh(X) exists and equals
\int h(x) f(x)\,dx, provided that \int |h(x)| f(x)\,dx < \infty.
We emphasize that the condition \int |x| f(x)\,dx < \infty must be checked before it can be
concluded that EX exists and equals \int x f(x)\,dx. Moreover, it is worthwhile to recall at
this point that the integral \int_{-\infty}^{\infty} \varphi(x)\,dx exists, provided that the limit \lim_{a\to\infty,\ b\to\infty}\int_{-b}^{a} \varphi(x)\,dx
exists. It is quite possible for the limit \lim_{a\to\infty}\int_{-a}^{a} \varphi(x)\,dx to exist without the existence
of \int_{-\infty}^{\infty} \varphi(x)\,dx. As an example, consider the Cauchy PDF:
\[
f(x) = \frac{1}{\pi}\frac{1}{1 + x^2}, \quad -\infty < x < \infty.
\]
Clearly
\[
\lim_{a\to\infty}\int_{-a}^{a} \frac{x}{\pi}\frac{1}{1 + x^2}\,dx = 0.
\]
However, EX does not exist since the integral (1/\pi)\int_{-\infty}^{\infty} |x|/(1 + x^2)\,dx diverges.
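A small numerical illustration of this point (not from the text, assuming NumPy is available): the symmetric truncated integral of x f(x) stays at 0, while the truncated integral of |x| f(x), which equals \log(1 + a^2)/\pi in closed form, grows without bound.

```python
import numpy as np

# Truncated Cauchy integrals: the signed integral over (-a, a) vanishes by
# symmetry, while the absolute integral equals log(1 + a^2)/pi and diverges
# as a grows, so EX does not exist.
for a in (10.0, 1e3, 1e6):
    x = np.linspace(-a, a, 2_000_001)
    fx = 1.0 / (np.pi * (1.0 + x**2))
    dx = x[1] - x[0]
    signed = np.sum(x * fx) * dx            # ~ 0 for every a
    absolute = np.log(1.0 + a**2) / np.pi   # closed form of the |x| integral
    print(a, signed, absolute)
```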
Remark 1.LetX(ω)=I
A(ω)for someA∈S. ThenEX=P(A).
Remark 2.If we writeh(X)=|X|, we see thatEXexists if and only ifE|X|does.
Remark 3.We say that an RVXissymmetricabout a pointαif
P{X≥α+x}=P{X≤α−x}for allx.
In terms of DFFofX, this means that, if
F(α−x)=1−F(α+x)+P{X =α+x}
holds for allx∈R, we say that the DFF(or the RVX) is symmetric withαas thecenter
ofsymmetry.Ifα=0, then for everyx
F(−x)=1−F(x)+P{X =x}.
In particular, ifXis an RV of the continuous type,Xis symmetric with centerαif and
only if the PDFfofXsatisfies
f(α−x)=f(α+x)for allx.
Ifα=0, we will say simply thatXis symmetric (or thatFis symmetric).
As an immediate consequence of this definition we see that, ifXis symmetric withα
as the center of symmetry andE|X|<∞, thenEX=α. A simple example of a symmetric
distribution is the Cauchy PDF considered above (before Remark 1). We will encounter
many such distributions later.
Remark 4.Ifaandbare constants andXis an RV withE|X|<∞, thenE|aX+b|<∞
andE{aX+b}=aEX+b. In particular,E{X−μ}=0, a fact that should not come as a
surprise.
Remark 5.IfXis bounded, that is, ifP{|X|<M}=1, 0<M<∞, thenEXexists.
Remark 6. If P\{X \ge 0\} = 1 and EX exists, then EX \ge 0.
Theorem 1. Let X be an RV, and g be a Borel-measurable function on R. Let Y = g(X).
If X is of discrete type then
\[
EY = \sum_{j=1}^{\infty} g(x_j) P\{X = x_j\} \tag{3}
\]
in the sense that, if either side of (3) exists, so does the other, and then the two are equal.
If X is of continuous type with PDF f then EY = \int g(x) f(x)\,dx in the sense that, if either
of the two integrals converges absolutely, so does the other, and the two are equal.
Remark 7. Let X be a discrete RV. Then Theorem 1 says that
\[
\sum_{j=1}^{\infty} g(x_j) P\{X = x_j\} = \sum_{k=1}^{\infty} y_k P\{Y = y_k\}
\]
in the sense that, if either of the two series converges absolutely, so does the other, and
the two sums are equal. If X is of the continuous type with PDF f, let h(y) be the PDF of
Y = g(X). Then, according to Theorem 1,
\[
\int g(x) f(x)\,dx = \int y\, h(y)\,dy,
\]
provided that E|g(X)| < \infty.
Proof of Theorem 1. In the discrete case, suppose that P\{X \in A\} = 1. If y = g(x) is a
one-to-one mapping of A onto some set B, then
\[
P\{Y = y\} = P\{X = g^{-1}(y)\}, \quad y \in B.
\]
We have
\[
\sum_{x \in A} g(x) P\{X = x\} = \sum_{y \in B} y\, P\{Y = y\}.
\]
In the continuous case, suppose g satisfies the conditions of Theorem 2.5.3. Then
\[
\int g(x) f(x)\,dx = \int_{\alpha}^{\beta} y\, f[g^{-1}(y)]\left|\frac{d}{dy} g^{-1}(y)\right| dy
\]
by changing the variable to y = g(x). Thus
\[
\int g(x) f(x)\,dx = \int_{\alpha}^{\beta} y\, h(y)\,dy.
\]
The functions h(x) = x^n, where n is a positive integer, and h(x) = |x|^\alpha, where \alpha is a pos-
itive real number, are of special importance. If EX^n exists for some positive integer n, we
call EX^n the nth moment of (the distribution function of) X about the origin. If E|X|^\alpha < \infty
for some positive real number \alpha, we call E|X|^\alpha the \alpha th absolute moment of X. We shall
use the following notation:
\[
m_n = EX^n, \qquad \beta_\alpha = E|X|^\alpha, \tag{4}
\]
whenever the expectations exist.

Example 2. Let X have the uniform distribution on the first N natural numbers, that is, let
\[
P\{X = k\} = \frac{1}{N}, \quad k = 1,2,\ldots,N.
\]
Clearly, moments of all order exist:
\[
EX = \sum_{k=1}^{N} k \cdot \frac{1}{N} = \frac{N+1}{2},
\qquad
EX^2 = \sum_{k=1}^{N} k^2 \cdot \frac{1}{N} = \frac{(N+1)(2N+1)}{6}.
\]
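A one-line check of these two formulas (not from the text; N = 10 is an arbitrary illustration):

```python
import numpy as np

N = 10
k = np.arange(1, N + 1)
print(k.mean(), (N + 1) / 2)                     # 5.5  5.5
print((k**2).mean(), (N + 1) * (2*N + 1) / 6)    # 38.5  38.5
```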
Example 3. Let X be an RV with PDF
\[
f(x) = \begin{cases} \dfrac{2}{x^3}, & x \ge 1, \\ 0, & x < 1. \end{cases}
\]
Then
\[
EX = \int_1^{\infty} \frac{2}{x^2}\,dx = 2.
\]
But
\[
EX^2 = \int_1^{\infty} \frac{2}{x}\,dx
\]
does not exist. Indeed, it is easy to construct examples of random variables for
which all moments up to a specified order exist but no higher-order moments do.
Example 4. Two players, A and B, play a coin-tossing game. A gives B one dollar if a
head turns up; otherwise B pays A one dollar. If the probability that the coin shows a head
is p, find the expected gain of A.
Let X denote the gain of A. Then
\[
P\{X = 1\} = P\{\text{Tails}\} = 1 - p, \qquad P\{X = -1\} = p,
\]
and
\[
EX = (1 - p) - p = 1 - 2p
\begin{cases} > 0 & \text{if and only if } p < \tfrac{1}{2}, \\ = 0 & \text{if and only if } p = \tfrac{1}{2}. \end{cases}
\]
Thus EX = 0 if and only if the coin is fair.

Theorem 2. If the moment of order t exists for an RV X, moments of order 0 < s < t exist.
Proof. Let X be of the continuous type with PDF f. We have
\[
E|X|^s = \int_{|x|^s \le 1} |x|^s f(x)\,dx + \int_{|x|^s > 1} |x|^s f(x)\,dx
\le P\{|X|^s \le 1\} + E|X|^t < \infty.
\]
A similar proof can be given when X is a discrete RV.
Theorem 3. Let X be an RV on a probability space (\Omega, S, P). Let E|X|^k < \infty for some
k > 0. Then
\[
n^k P\{|X| > n\} \to 0 \quad \text{as } n \to \infty.
\]
Proof. We provide the proof for the case in which X is of the continuous type with
density f. We have
\[
\infty > \int |x|^k f(x)\,dx = \lim_{n\to\infty}\int_{|x| \le n} |x|^k f(x)\,dx.
\]
It follows that
\[
\int_{|x| > n} |x|^k f(x)\,dx \to 0 \quad \text{as } n \to \infty.
\]
But
\[
\int_{|x| > n} |x|^k f(x)\,dx \ge n^k P\{|X| > n\},
\]
completing the proof.
Remark 8. Probabilities of the type P\{|X| > n\}, or either of its components, P\{X > n\} or
P\{X < -n\}, are called tail probabilities. The result of Theorem 3, therefore, gives the rate
at which P\{|X| > n\} converges to 0 as n \to \infty.
Remark 9. The converse of Theorem 3 does not hold in general; that is,
\[
n^k P\{|X| > n\} \to 0 \quad \text{as } n \to \infty \text{ for some } k
\]
does not necessarily imply that E|X|^k < \infty. Consider the RV with
\[
P\{X = n\} = \frac{c}{n^2 \log n}, \quad n = 2,3,\ldots,
\]
where c is a constant determined from \sum_{n=2}^{\infty} c/(n^2 \log n) = 1. We have
\[
P\{X > n\} \approx c\int_n^{\infty}\frac{1}{x^2\log x}\,dx \approx c\, n^{-1}(\log n)^{-1},
\]
and nP\{X > n\} \to 0 as n \to \infty. (Here and subsequently \approx means that the ratio of the two
sides \to 1 as n \to \infty.) But
\[
EX = \sum \frac{c}{n\log n} = \infty.
\]
In fact, we need
\[
n^{k+\delta} P\{|X| > n\} \to 0 \quad \text{as } n \to \infty
\]
for some \delta > 0 to ensure that E|X|^k < \infty. A condition such as this is called a moment
condition.
For the proof we need the following lemma.
Lemma 1. Let X be a nonnegative RV with distribution function F. Then
\[
EX = \int_0^{\infty} [1 - F(x)]\,dx, \tag{5}
\]
in the sense that, if either side exists, so does the other and the two are equal.
Proof. If X is of the continuous type with density f and EX < \infty, then
\[
EX = \int_0^{\infty} x f(x)\,dx = \lim_{n\to\infty}\int_0^n x f(x)\,dx.
\]
On integration by parts we obtain
\[
\int_0^n x f(x)\,dx = nF(n) - \int_0^n F(x)\,dx = -n[1 - F(n)] + \int_0^n [1 - F(x)]\,dx.
\]
But
\[
n[1 - F(n)] = n\int_n^{\infty} f(x)\,dx < \int_n^{\infty} x f(x)\,dx,
\]
and, since E|X| < \infty, it follows that n[1 - F(n)] \to 0 as n \to \infty. We have
\[
EX = \lim_{n\to\infty}\int_0^n x f(x)\,dx = \lim_{n\to\infty}\int_0^n [1 - F(x)]\,dx = \int_0^{\infty} [1 - F(x)]\,dx.
\]
If \int_0^{\infty} [1 - F(x)]\,dx < \infty, then
\[
\int_0^n x f(x)\,dx \le \int_0^n [1 - F(x)]\,dx \le \int_0^{\infty} [1 - F(x)]\,dx,
\]
and it follows that E|X| < \infty.
We leave the reader to complete the proof in the discrete case.
Corollary. For any RV X, E|X| < \infty if and only if the integrals \int_{-\infty}^{0} P\{X \le x\}\,dx and
\int_0^{\infty} P\{X > x\}\,dx both converge, and in that case
\[
EX = \int_0^{\infty} P\{X > x\}\,dx - \int_{-\infty}^{0} P\{X \le x\}\,dx.
\]
Actually we can get a little more out of Lemma 1 than the above corollary. In fact,
\[
E|X|^\alpha = \int_0^{\infty} P\{|X|^\alpha > x\}\,dx = \alpha\int_0^{\infty} x^{\alpha-1} P\{|X| > x\}\,dx,
\]
and we see that an RV X possesses an absolute moment of order \alpha > 0 if and only if
|x|^{\alpha-1} P\{|X| > x\} is integrable over (0,\infty).
A simple application of the integral test leads to the following moments lemma.

Lemma 2.
\[
E|X|^\alpha < \infty \iff \sum_{n=1}^{\infty} P\{|X| > n^{1/\alpha}\} < \infty. \tag{6}
\]
Note that an immediate consequence of Lemma 2 is Theorem 3. We are now ready to
prove the following result.
prove the following result.
Theorem 4.LetXbe an RV with a distribution satisfyingn
α
P{|X|>n}→0asn →∞
for someα>0. ThenE|X|
β
<∞for 0<β<α.
Proof.Givenε>0, we can choose anN=N(ε)such that
P{|X|>n}<
ε
n
α
for alln≥N.
It follows that for 0<β<α
E|X|
β


N
0
x
β−1
P{|X|>x}dx+β


N
x
β−1
P{|X|>x}dx
≤N
β
+βε


N
x
β−α−1
dx
<∞.
Remark 10. Using Theorems 3 and 4, we demonstrate the existence of random variables
for which moments of any order do not exist, that is, for which E|X|^\alpha = \infty for every \alpha > 0.
For such an RV, n^\alpha P\{|X| > n\} does not converge to 0 as n \to \infty for any \alpha > 0. Consider, for
example, the RV X with PDF
\[
f(x) = \begin{cases} \dfrac{1}{2|x|(\log|x|)^2} & \text{for } |x| > e, \\ 0 & \text{otherwise.} \end{cases}
\]
The DF of X is given by
\[
F(x) = \begin{cases} \dfrac{1}{2\log|x|} & \text{if } x \le -e, \\ \dfrac{1}{2} & \text{if } -e < x < e, \\ 1 - \dfrac{1}{2\log x} & \text{if } x \ge e. \end{cases}
\]
Then for x > e
\[
P\{|X| > x\} = 1 - F(x) + F(-x) = \frac{1}{\log x},
\]
and x^\alpha P\{|X| > x\} \to \infty as x \to \infty for any \alpha > 0. It follows that E|X|^\alpha = \infty for every
\alpha > 0. In this example we see that P\{|X| > cx\}/P\{|X| > x\} \to 1 as x \to \infty for every
c > 0. A positive function L(\cdot) defined on (0,\infty) is said to be a function of slow variation
if and only if L(cx)/L(x) \to 1 as x \to \infty for every c > 0. For such a function x^\alpha L(x) \to \infty
for every \alpha > 0 (see Feller [26, pp. 275–279]). It follows that, if P\{|X| > x\} is slowly
varying, E|X|^\alpha = \infty for every \alpha > 0. Functions of slow variation play an important role
in the theory of probability.
Random variables for which P\{|X| > x\} is slowly varying are clearly excluded from
the domain of the following result.
Theorem 5. Let X be an RV satisfying
\[
\frac{P\{|X| > cx\}}{P\{|X| > x\}} \to 0 \quad \text{as } x \to \infty \text{ for all } c > 1; \tag{7}
\]
then X possesses moments of all orders. (Note that, if c = 1, the limit in (7) is 1, whereas
if c < 1, the limit will not go to 0 since P\{|X| > cx\} \ge P\{|X| > x\}.)
Proof. Let \varepsilon > 0 (we will choose \varepsilon later), choose x_0 so large that
\[
\frac{P\{|X| > cx\}}{P\{|X| > x\}} < \varepsilon \quad \text{for all } x \ge x_0, \tag{8}
\]
and choose x_1 so large that
\[
P\{|X| > x\} < \varepsilon \quad \text{for all } x \ge x_1. \tag{9}
\]
Let N = \max(x_0, x_1). We have, for a fixed positive integer r,
\[
\frac{P\{|X| > c^r x\}}{P\{|X| > x\}} = \prod_{p=1}^{r}\frac{P\{|X| > c^p x\}}{P\{|X| > c^{p-1} x\}} \le \varepsilon^r \tag{10}
\]
for x \ge N. Thus for x \ge N we have, in view of (9),
\[
P\{|X| > c^r x\} \le \varepsilon^{r+1}. \tag{11}
\]
Next note that, for any fixed positive integer n,
\[
E|X|^n = n\int_0^{\infty} x^{n-1} P\{|X| > x\}\,dx
= n\int_0^N x^{n-1} P\{|X| > x\}\,dx + n\int_N^{\infty} x^{n-1} P\{|X| > x\}\,dx. \tag{12}
\]
Since the first integral in (12) is finite, we need only show that the second integral is also
finite. We have
\[
\int_N^{\infty} x^{n-1} P\{|X| > x\}\,dx = \sum_{r=1}^{\infty}\int_{c^{r-1}N}^{c^r N} x^{n-1} P\{|X| > x\}\,dx
\le \sum_{r=1}^{\infty} (c^r N)^{n-1}\varepsilon^r \cdot 2c^r N
= 2N^n\sum_{r=1}^{\infty} (\varepsilon c^n)^r = \frac{2N^n\varepsilon c^n}{1 - \varepsilon c^n} < \infty,
\]
provided that we choose \varepsilon such that \varepsilon c^n < 1. It follows that E|X|^n < \infty for n = 1,2,\ldots.
Actually we have shown that (7) implies E|X|^\delta < \infty for all \delta > 0.
Theorem 6. If h_1, h_2, \ldots, h_n are Borel-measurable functions of an RV X and Eh_i(X) exists
for i = 1,2,\ldots,n, then E\left[\sum_{i=1}^{n} h_i(X)\right] exists and equals \sum_{i=1}^{n} Eh_i(X).
Definition 1. Let k be a positive integer and c be a constant. If E(X - c)^k exists, we call
it the moment of order k about the point c. If we take c = EX = \mu, which exists since
E|X| < \infty, we call E(X - \mu)^k the central moment of order k or the moment of order k
about the mean. We shall write
\[
\mu_k = E\{X - \mu\}^k.
\]
If we know m_1, m_2, \ldots, m_k, we can compute \mu_1, \mu_2, \ldots, \mu_k, and conversely. We have
\[
\mu_k = E\{X - \mu\}^k = m_k - \binom{k}{1}\mu\, m_{k-1} + \binom{k}{2}\mu^2 m_{k-2} - \cdots + (-1)^k\mu^k \tag{13}
\]
and
\[
m_k = E\{X - \mu + \mu\}^k = \mu_k + \binom{k}{1}\mu\,\mu_{k-1} + \binom{k}{2}\mu^2\mu_{k-2} + \cdots + \mu^k. \tag{14}
\]
The casek=2 is of special importance.
Definition 2. If EX^2 exists, we call E\{X - \mu\}^2 the variance of X, and we write
\sigma^2 = var(X) = E\{X - \mu\}^2. The quantity \sigma is called the standard deviation (SD) of X.
From Theorem 6 we see that
\[
\sigma^2 = \mu_2 = EX^2 - (EX)^2. \tag{15}
\]
Variance has some important properties.

Theorem 7. Var(X) = 0 if and only if X is degenerate.
Theorem 8. Var(X) < E(X - c)^2 for any c \ne EX.
Proof. We have
\[
E(X - c)^2 = E\{X - \mu\}^2 + (\mu - c)^2 = var(X) + (\mu - c)^2,
\]
which exceeds var(X) unless c = \mu.
Note that
\[
var(aX + b) = a^2\, var(X).
\]
Let E|X|^2 < \infty. Then we define
\[
Z = \frac{X - EX}{\sqrt{var(X)}} = \frac{X - \mu}{\sigma} \tag{16}
\]
and see that EZ = 0 and var(Z) = 1. We call Z a standardized RV.
Example 5. Let X be an RV with binomial PMF
\[
P\{X = k\} = \binom{n}{k} p^k (1 - p)^{n-k}, \quad k = 0,1,2,\ldots,n; \ 0 < p < 1.
\]
Then
\[
EX = \sum_{k=0}^{n} k\binom{n}{k} p^k (1 - p)^{n-k} = np\sum_{k=1}^{n}\binom{n-1}{k-1} p^{k-1}(1 - p)^{n-k} = np;
\]
\[
EX^2 = E\{X(X-1) + X\} = \sum_{k} k(k-1)\binom{n}{k} p^k (1 - p)^{n-k} + np = n(n-1)p^2 + np;
\]
\[
var(X) = n(n-1)p^2 + np - n^2 p^2 = np(1 - p);
\]
\[
EX^3 = E\{X(X-1)(X-2) + 3X(X-1) + X\} = n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np;
\]
\[
\mu_3 = m_3 - 3\mu m_2 + 2\mu^3
= n(n-1)(n-2)p^3 + 3n(n-1)p^2 + np - 3np[n(n-1)p^2 + np] + 2n^3 p^3
= np(1 - p)(1 - 2p).
\]
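A quick enumeration check of these binomial moment formulas (not from the text; the values n = 7 and p = 0.3 are arbitrary choices for illustration):

```python
from math import comb

n, p = 7, 0.3
pmf = [comb(n, k) * p**k * (1 - p)**(n - k) for k in range(n + 1)]

mean = sum(k * pk for k, pk in enumerate(pmf))
var = sum((k - mean)**2 * pk for k, pk in enumerate(pmf))
mu3 = sum((k - mean)**3 * pk for k, pk in enumerate(pmf))

print(mean, n * p)                           # both 2.1
print(var, n * p * (1 - p))                  # both 1.47
print(mu3, n * p * (1 - p) * (1 - 2 * p))    # both 0.588
```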

Fig. 1  Quantile of order p.
In the above example we computedfactorial moments EX(X−1)(X−2)···(X−k+1)
for various values ofk. For some discrete integer-valued RVs whose PMF contains facto-
rials or binomial coefficients it may be more convenient to compute factorial moments.
We have seen that for some distributions even the mean does not exist. We next consider
some parameters, calledorder parameters, which always exist.
Definition 3. A number x (Fig. 1) satisfying
\[
P\{X \le x\} \ge p, \qquad P\{X \ge x\} \ge 1 - p, \qquad 0 < p < 1, \tag{17}
\]
is called a quantile of order p [or (100p)th percentile] for the RV X (or for the DF F of X).
We write z_p(X) for a quantile of order p for the RV X.
If x is a quantile of order p for an RV X with DF F, then
\[
p \le F(x) \le p + P\{X = x\}. \tag{18}
\]
If P\{X = x\} = 0, as is the case, in particular, if X is of the continuous type, a quantile
of order p is a solution of the equation
\[
F(x) = p. \tag{19}
\]
If F is strictly increasing, (19) has a unique solution. Otherwise (Fig. 2) there may be
many (even uncountably many) solutions of (19), each of which is then called a quantile
of order p. Quantiles are of a great deal of interest in testing of hypotheses.
Definition 4. Let X be an RV with DF F. A number x satisfying
\[
\tfrac{1}{2} \le F(x) \le \tfrac{1}{2} + P\{X = x\} \tag{20}
\]
or, equivalently,
\[
P\{X \le x\} \ge \tfrac{1}{2} \quad \text{and} \quad P\{X \ge x\} \ge \tfrac{1}{2} \tag{21}
\]
is called a median of X (or F).

Fig. 2  (a) Unique quantile and (b) infinitely many solutions of F(x) = p.

Again we note that there may be many values that satisfy (20) or (21). Thus a median
is not necessarily unique.
If F is a symmetric DF, the center of symmetry is clearly the median of the DF F.
The median is an important centering constant, especially in cases where the mean of the
distribution does not exist.
Example 6. Let X be an RV with Cauchy PDF
\[
f(x) = \frac{1}{\pi}\frac{1}{1 + x^2}, \quad -\infty < x < \infty.
\]
Then E|X| is not finite but E|X|^\delta < \infty for 0 < \delta < 1. The median of the RV X is
clearly x = 0.
Example 7. Let X be an RV with PMF
\[
P\{X = -2\} = P\{X = 0\} = \tfrac{1}{4}, \quad P\{X = 1\} = \tfrac{1}{3}, \quad P\{X = 2\} = \tfrac{1}{6}.
\]
Then
\[
P\{X \le 0\} = \tfrac{1}{2} \quad \text{and} \quad P\{X \ge 0\} = \tfrac{3}{4} > \tfrac{1}{2}.
\]
In fact, if x is any number such that 0 < x < 1, then
\[
P\{X \le x\} = P\{X = -2\} + P\{X = 0\} = \tfrac{1}{2}
\quad \text{and} \quad
P\{X \ge x\} = P\{X = 1\} + P\{X = 2\} = \tfrac{1}{2},
\]
and it follows that every x, 0 \le x < 1, is a median of the RV X.
If p = 0.2, the quantile of order p is x = -2, since
\[
P\{X \le -2\} = \tfrac{1}{4} > p \quad \text{and} \quad P\{X \ge -2\} = 1 > 1 - p.
\]
PROBLEMS 3.2
1.Find the expected number of throws of a fair die until a 6 is obtained.
2.From a box containingNidentical tickets numbered 1 throughN,ntickets are
drawn with replacement. LetXbe the largest number drawn. FindEX.
3.LetXbe an RV with PDF
f(x)=
c
(1+x
2
)
m
,−∞<x<∞,m≥1,
wherec=Γ(m)/[Γ(1/2)Γ(m −1/2)]. Show thatEX
2r
exists if and only if 2r<
2m−1. What isEX
2r
if 2r<2m−1?
4.LetXbe an RV with PDF
f(x)=



ka
k
(x+a)
k+1
ifx≥0,
0 otherwise (a>0).
Show thatE|X|
α
<∞forα<k. Find the quantile of orderpfor the RVX.

82 MOMENTS AND GENERATING FUNCTIONS
5.LetXbe an RV such thatE|X|<∞. Show thatE|X−c|is minimized if we choose
cequal to the median of the distribution ofX.
6.Pareto’s distributionwith parametersαandβ(bothαandβpositive) is defined
by the PDF
f(x)=





βα
β
x
β+1
ifx≥α,
0if x<α.
Show that the moment of ordernexists if and only ifn<β.Letβ> 2. Find the
mean and the variance of the distribution.
7.For an RVXwith PDF
f(x)=







1
2
x if 0≤x<1,
1
2
if 1<x≤2,
1
2
(3−x)if 2<x≤3,
show that moments of all order exist. Find the mean and the variance ofX.
8.For the PMF of Example 5 show that
EX
4
=np+7n(n−1)p
2
+6n(n−1)(n−2)p
3
+n(n−1)(n−2)(n−3)p
4
and
μ
4=3(npq)
2
+npq(1 −6pq),
where 0≤p≤1,q=1−p.
9.For thePoissonRVXwith PMF
P{X=x}=e
−λ
λ
x
x!
,x=0,1,2,...,
show thatEX=λ,EX
2
=λ+λ
2
,EX
3
=λ+3λ
2

3
,EX
4
=λ+7λ
2
+6λ
3

4
,
andμ
2=μ3=λ,μ 4=λ+3λ
2
.
10.For any RVXwithE|X|
4
<∞define
α
3=
μ
3
(μ2)
3/2
,α 4=
μ
4
μ
2
2
.
Hereα
3is known as thecoefficient of skewnessand is sometimes used as a measure
of asymmetry, andα
4is known askurtosisand is used to measure the peakedness
(“flatness of the top”) of a distribution.
Computeα
3andα 4for the PMFs of Problems 8 and 9.
11.For a positive RVXdefine the negative moment of ordernbyEX
−n
, wheren>0
is an integer. FindE{1/(X +1)}for the PMFs of Example 5 and Problem 9.

GENERATING FUNCTIONS 83
12.Prove Theorem 6.
13.Prove Theorem 7.
14.In each of the following cases, computeEX,var(X), andEX
n
(forn≥0, an integer)
whenever they exist.
(a)f(x)=1,−1/2≤x≤1/2, and 0 elsewhere.
(b)f(x)=e
−x
,x≥0, and 0 elsewhere.
(c)f(x)=(k−1)/x
k
,x≥1, and 0 elsewhere;k>1 is a constant.
(d)f(x)=1/[π (1+x
2
)],−∞<x<∞.
(e)f(x)=6x (1−x),0<x<1, and 0 elsewhere.
(f)f(x)=xe
−x
,x≥0, and 0 elsewhere.
(g)P(X=x)=p(1−p)
x−1
,x=1,2,..., and 0 elsewhere: 0<p<1.
15.Find the quantile of orderp(0<p<1)for the following distributions.
(a)f(x)=1/x
2
,x≥1, and 0 elsewhere.
(b)f(x)=2x exp(−x
2
),x≥0, and 0 otherwise.
(c)f(x)=1/θ ,0≤x≤θ, and 0 elsewhere.
(d)P(X=x)=θ(1−θ)
x−1
,x=1,2,..., and 0 otherwise; 0<θ<1.
(e)f(x)=(1/β
2
)xexp(− x/β),x>0, and 0 otherwise;β>0.
(f)f(x)=(3/b
3
)(b−x)
2
,0<x<b, and 0 elsewhere.
3.3 GENERATING FUNCTIONS
In this section we consider some functions that generate probabilities or moments of an
RV. The simplest type of generating function in probability theory is the one associated
with integer-valued RVs. Let X be an RV, and let
\[
p_k = P\{X = k\}, \quad k = 0,1,2,\ldots, \qquad \text{with} \quad \sum_{k=0}^{\infty} p_k = 1.
\]
Definition 1. The function defined by
\[
P(s) = \sum_{k=0}^{\infty} p_k s^k, \tag{1}
\]
which surely converges for |s| \le 1, is called the probability generating function (PGF)
of X.
Example 1. Consider the Poisson RV with PMF
\[
P\{X = k\} = e^{-\lambda}\frac{\lambda^k}{k!}, \quad k = 0,1,2,\ldots.
\]
We have
\[
P(s) = \sum_{k=0}^{\infty}\frac{(s\lambda)^k e^{-\lambda}}{k!} = e^{-\lambda} e^{s\lambda} = e^{-\lambda(1-s)}, \quad \text{for all } s.
\]
Example 2. Let X be an RV with geometric distribution, that is, let
\[
P\{X = k\} = pq^k, \quad k = 0,1,2,\ldots; \ 0 < p < 1, \ q = 1 - p.
\]
Then
\[
P(s) = \sum_{k=0}^{\infty} s^k pq^k = \frac{p}{1 - sq}, \quad |s| \le 1.
\]
Remark 1. Since P(1) = 1, series (1) is uniformly and absolutely convergent in |s| \le 1,
and the PGF P is a continuous function of s. Since P(s) can be represented in a unique
manner as a power series, the PGF determines the PMF \{p_k\} uniquely.
Remark 2. Since a power series with radius of convergence r can be differentiated
termwise any number of times in (-r, r), it follows that
\[
P^{(k)}(s) = \sum_{n=k}^{\infty} n(n-1)\cdots(n-k+1)\, P(X = n)\, s^{n-k},
\]
where P^{(k)} is the kth derivative of P. The series converges at least for -1 < s < 1. For s = 1
the right side reduces formally to E\{X(X-1)\cdots(X-k+1)\}, which is the kth factorial
moment of X whenever it exists. In particular, if EX < \infty then P'(1) = EX, and if EX^2 < \infty
then P''(1) = EX(X-1) and Var(X) = EX^2 - (EX)^2 = P''(1) - [P'(1)]^2 + P'(1).
Example 3. In Example 1 we found that P(s) = e^{-\lambda(1-s)}, |s| \le 1, for a Poisson RV. Thus
\[
P'(s) = \lambda e^{-\lambda(1-s)}, \qquad P''(s) = \lambda^2 e^{-\lambda(1-s)}.
\]
Also, EX = \lambda, E\{X^2 - X\} = \lambda^2, so that var(X) = EX^2 - (EX)^2 = \lambda^2 + \lambda - \lambda^2 = \lambda.
In Example 2 we computed P(s) = p/(1 - sq), so that
\[
P'(s) = \frac{pq}{(1 - sq)^2} \qquad \text{and} \qquad P''(s) = \frac{2pq^2}{(1 - sq)^3}.
\]
Thus
\[
EX = \frac{q}{p}, \qquad EX^2 = \frac{q}{p} + \frac{2pq^2}{p^3}, \qquad var(X) = \frac{q^2}{p^2} + \frac{q}{p} = \frac{q}{p^2}.
\]
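The factorial-moment relations P'(1) = EX and Var(X) = P''(1) + P'(1) - [P'(1)]^2 can be checked symbolically for the Poisson and geometric PGFs above; a minimal sketch, assuming SymPy is available:

```python
import sympy as sp

s, lam, p = sp.symbols('s lambda p', positive=True)
q = 1 - p

P_pois = sp.exp(-lam * (1 - s))   # Poisson PGF (Example 1)
P_geom = p / (1 - s * q)          # geometric PGF (Example 2)

for P in (P_pois, P_geom):
    mean = sp.diff(P, s).subs(s, 1)                        # P'(1) = EX
    fact2 = sp.diff(P, s, 2).subs(s, 1)                    # P''(1) = E[X(X-1)]
    var = sp.simplify(fact2 + mean - mean**2)              # Var(X)
    print(sp.simplify(mean), var)
# prints: lambda lambda   and   (1 - p)/p (1 - p)/p**2
```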
Example 4. Consider the PGF
\[
P(s) = [(1 + s)/2]^n, \quad -\infty < s < \infty.
\]
Expanding the right side into a power series we get
\[
P(s) = \sum_{k=0}^{n}\frac{1}{2^n}\binom{n}{k} s^{n-k} = \sum_{k=0}^{n} p_k s^k,
\]
and it follows that
\[
P(X = k) = p_k = \frac{\binom{n}{k}}{2^n}, \quad k = 0,1,\ldots,n.
\]
We note that the PGF, being defined only for discrete integer-valued RVS, has limited
utility. We next consider a generating function which is quite useful in probability and
statistics.
Definition 2. Let X be an RV defined on (\Omega, S, P). The function
\[
M(s) = Ee^{sX} \tag{2}
\]
is known as the moment generating function (MGF) of the RV X if the expectation on the
right side of (2) exists in some neighborhood of the origin.
Example 5. Let X have the PMF
\[
P\{X = k\} = \begin{cases} \dfrac{6}{\pi^2}\cdot\dfrac{1}{k^2}, & k = 1,2,\ldots, \\ 0, & \text{otherwise.} \end{cases}
\]
Then (6/\pi^2)\sum_{k=1}^{\infty} e^{sk}/k^2 is infinite for every s > 0. We see that the MGF of X does not
exist. In fact, EX = \infty.
Example 6. Let X have the PDF
\[
f(x) = \begin{cases} \tfrac{1}{2} e^{-x/2}, & x > 0, \\ 0, & \text{otherwise.} \end{cases}
\]
Then
\[
M(s) = \frac{1}{2}\int_0^{\infty} e^{(s - 1/2)x}\,dx = \frac{1}{1 - 2s}, \quad s < \frac{1}{2}.
\]
Example 7. Let X have the PMF
\[
P\{X = k\} = \begin{cases} \dfrac{e^{-\lambda}\lambda^k}{k!}, & k = 0,1,2,\ldots, \\ 0, & \text{otherwise.} \end{cases}
\]
Then
\[
M(s) = Ee^{sX} = e^{-\lambda}\sum_{k=0}^{\infty}\frac{e^{sk}\lambda^k}{k!} = e^{-\lambda(1 - e^s)} \quad \text{for all } s.
\]
The following result will be quite useful subsequently.
Theorem 1.The MGF uniquely determines a DF and, conversely, if the MGF exists, it is
unique.
For the proof we refer the reader to Widder [117, p. 460], or Curtiss [19]. Theorem 2
explains why we callM(s)an MGF.
Theorem 2. If the MGF M(s) of an RV X exists for s in (-s_0, s_0), say, s_0 > 0, the
derivatives of all order exist at s = 0 and can be evaluated under the integral sign, that is,
\[
M^{(k)}(s)\big|_{s=0} = EX^k \quad \text{for positive integral } k. \tag{3}
\]
For the proof of Theorem 2 we refer to Widder [117, pp. 446–447]. See also Problem 9.
Remark 3. Alternatively, if the MGF M(s) exists for s in (-s_0, s_0), say s_0 > 0, one can
express M(s) (uniquely) in a Maclaurin series expansion:
\[
M(s) = M(0) + \frac{M'(0)}{1!}s + \frac{M''(0)}{2!}s^2 + \cdots, \tag{4}
\]
so that EX^k is the coefficient of s^k/k! in expansion (4).
Example 8. Let X be an RV with PDF f(x) = (1/2)e^{-x/2}, x > 0. From Example 6, M(s) =
1/(1 - 2s) for s < 1/2. Thus
\[
M'(s) = \frac{2}{(1 - 2s)^2} \qquad \text{and} \qquad M''(s) = \frac{4\cdot 2}{(1 - 2s)^3}, \quad s < \frac{1}{2}.
\]
It follows that
\[
EX = 2, \qquad EX^2 = 8, \qquad var(X) = 4.
\]
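These derivative evaluations can be reproduced symbolically (not from the text, assuming SymPy is available):

```python
import sympy as sp

s = sp.symbols('s')
M = 1 / (1 - 2 * s)                # MGF of Example 8, valid for s < 1/2

EX = sp.diff(M, s).subs(s, 0)      # first moment
EX2 = sp.diff(M, s, 2).subs(s, 0)  # second moment
print(EX, EX2, EX2 - EX**2)        # 2 8 4
```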
Example 9. Let X be an RV with PDF f(x) = 1, 0 \le x \le 1, and = 0 otherwise. Then
\[
M(s) = \int_0^1 e^{sx}\,dx = \frac{e^s - 1}{s}, \quad \text{all } s,
\]
\[
M'(s) = \frac{e^s\cdot s - (e^s - 1)}{s^2},
\]
\[
EX = M'(0) = \lim_{s\to 0}\frac{se^s - e^s + 1}{s^2} = \frac{1}{2}.
\]

We emphasize that the expectation Ee^{sX} does not exist unless s is carefully restricted.
In fact, the requirement that M(s) exists in a neighborhood of zero is a very strong require-
ment that is not satisfied by some common distributions. We next consider a generating
function which exists for all distributions.
Definition 3. Let X be an RV. The complex-valued function \varphi defined on R by
\[
\varphi(t) = E(e^{itX}) = E(\cos tX) + iE(\sin tX), \quad t \in R,
\]
where i = \sqrt{-1} is the imaginary unit, is called the characteristic function (CF) of RV X.
Clearly
\[
\varphi(t) = \sum_{k}(\cos tk + i\sin tk)\, P(X = k)
\]
in the discrete case, and
\[
\varphi(t) = \int_{-\infty}^{\infty}\cos tx\, f(x)\,dx + i\int_{-\infty}^{\infty}\sin tx\, f(x)\,dx
\]
in the continuous case.
Example 10. Let X be a normal RV with PDF
\[
f(x) = \frac{1}{\sqrt{2\pi}}\exp\left(-\frac{x^2}{2}\right), \quad x \in R.
\]
Then
\[
\varphi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\cos tx\, e^{-x^2/2}\,dx + \frac{i}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\sin tx\, e^{-x^2/2}\,dx.
\]
Note that \sin tx is an odd function and so also is \sin tx\, e^{-x^2/2}. Thus the second integral on
the right side vanishes and we have
\[
\varphi(t) = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}\cos tx\, e^{-x^2/2}\,dx
= \sqrt{\frac{2}{\pi}}\int_0^{\infty}\cos tx\, e^{-x^2/2}\,dx = e^{-t^2/2}, \quad t \in R.
\]
Remark 4. Unlike an MGF, which may not exist for some distributions, a CF always exists,
which makes it a much more convenient tool. In fact, it is easy to see that \varphi is continuous
on R, |\varphi(t)| \le 1 for all t, and \varphi(-t) = \overline{\varphi(t)}, where \overline{\varphi} is the complex conjugate of \varphi. Thus
\overline{\varphi} is the CF of -X. Moreover, \varphi uniquely determines the DF of RV X. For these and
many other properties of characteristic functions we need a comprehensive knowledge
of complex variable theory, well beyond the scope of this book. We refer the reader to
Lukacs [69].
Finally, we consider the problem of characterizing a distribution from its moments.
Given a set of constants{μ
0=1,μ 1,μ2,...}the problem of moments asks if they can be
moments of a distribution functionF. At this point it will be worthwhile to take note of
some facts.
First, we have seen that if theM(s)=Ee
sX
exists for someXforsin some neighborhood
of zero, thenE|X|
n
<∞for alln≥1. Suppose, however, thatE|X|
n
<∞for alln≥1. It
does not follow that the MGF ofXexists.
Example 11.LetXbe an RV with PDF
f(x)=ce
−|x|
α
,0<α<1,−∞<x<∞,
wherecis a constant determined from
c


−∞
e
−|x|
α
dx=1.
Lets>0. Then


0
e
sx
e
−x
α
dx=


0
e
x(s−x
α−1
)
dx
and sinceα−1<0,


0
s
sx
e
−x
α
dxis not finite for anys>0. Hence the MGF does not
exist. But
E|X|
n
=c


−∞
|x|
n
e
−|x|
α
dx=2c


0
x
n
e
−x
α
dx<∞ for eachn,
as is easily checked by substitutingy=x
α
.
Second, two (or more) RVs may have the same set of moments.
Example 12.LetXhavelognormalPDF
f(x)=(x

2π)
−1
e
−(logx)
2
/2
,x>0,
andf(x)=0forx≤0. LetX
ε,|ε|≤1, have PDF
f
ε(x)=f(x)[1+εsin(2πlogx)],x∈R.

GENERATING FUNCTIONS 89
(Note thatf ε≥0 for allε,|ε|≤1, and
ε


fε(x)dx=1, sof εis a PDF.) Since, however,
ν

0
x
k
f(x)sin(2πlogx)dx=

1


ν

−∞
e
−(t
2
/2)+kt
sin(2π t)dt
=

1



e
k
2
/2
ν

−∞
e
−y
2
/2
sin(2π y)dy
=0,
we see that
ν

0
x
k
f(x)dx=
ν

0
x
k
fε(x)dx
for allε,|ε|≤1, andk=0,1,2,....Butf(x) =f
ε(x).
Third, moments of any RVXnecessarily satisfy certain conditions. For example, if
β
ν=E|X|
ν
, we will see (Theorem 3.4.3) that(β ν)
1/ν
is an increasing function ofν.
Similarly, the quadratic form
E

n
β
i=1
X
αi
ti

2
≥0
yields a relation between moments of various orders ofX.
The following result, which we will not prove here, gives a sufficient condition for
unique determination ofFfrom its moments.
Theorem 3. Let \{m_k\} be the moment sequence of an RV X. If the series
\[
\sum_{k=1}^{\infty}\frac{m_k}{k!}s^k \tag{5}
\]
converges absolutely for some s > 0, then \{m_k\} uniquely determines the DF F of X.
Example 13. Suppose X has PDF
\[
f(x) = e^{-x} \ \text{for } x \ge 0, \quad \text{and} \ = 0 \ \text{for } x < 0.
\]
Then EX^k = \int_0^{\infty} x^k e^{-x}\,dx = k!, and from Theorem 3
\[
\sum_{k=1}^{\infty}\frac{m_k}{k!}s^k = \sum_{k=1}^{\infty} s^k = \frac{s}{1 - s}
\]
for 0 < s < 1, so that \{m_k\} determines F uniquely. In fact, from Remark 3
\[
M(s) = \sum_{k=0}^{\infty}\frac{m_k s^k}{k!} = \sum_{k=0}^{\infty} s^k = \frac{1}{1 - s},
\]
0 < s < 1, which is the MGF of X.
In particular, if for some constant c
\[
|m_k| \le c^k, \quad k = 1,2,\ldots,
\]
then
\[
\sum_{k=1}^{\infty}\frac{|m_k|}{k!}s^k \le \sum_{k=1}^{\infty}\frac{(cs)^k}{k!} < e^{cs} \quad \text{for } s > 0,
\]
and the DF of X is uniquely determined. Thus if P\{|X| \le c\} = 1 for some c > 0, then all
moments of X exist, satisfying |m_k| \le c^k, k \ge 1, and the DF of X is uniquely determined
from its moments.
Finally, we mention some sufficient conditions for a moment sequence to determine a
unique DF:
(i) The range of the RV is finite.
(ii) (Carleman) \sum_{k=1}^{\infty}(m_{2k})^{-1/2k} = \infty when the range of the RV is (-\infty,\infty). If the
range is (0,\infty), a sufficient condition is \sum_{k=1}^{\infty}(m_k)^{-1/2k} = \infty.
(iii) \lim_{n\to\infty}\{(m_{2n})^{1/2n}/2n\} is finite.
PROBLEMS 3.3
1.Find the PGF of the RVs with the following PMFs:
(a)P{X=k}=

n
k

p
k
(1−p)
n−k
,k=0,1,2,...,0≤p≤1.
(b)P{X=k}=[e
−λ
/(1−e
−λ
)](λ
k
/k!),k=1,2,...;λ>0.
(c)P{X=k}=pq
k
(1−q
N+1
)
−1
,k=0,1,2,...,N;0<p<1,q=1−p.
2.LetXbe an integer-valued RV with PGFP(s).Letαandβbe nonnegative integers,
and writeY=αX+β. Find the PGF ofY.
3.LetXbe an integer-valued RV with PGFP(s), and suppose that the mgfM(s)exists
fors∈(−s
0,s0),s0>0. How areM(s)andP(s)related? UsingM
(k)
(s)|s=0=EX
k
for positive integralk,findEX
k
in terms of the derivatives ofP(s)for values of
k=1,2,3,4.
4.For the Cauchy PDF
f(x)=
1
π
1
1+x
2
,−∞<x<∞,
does the MGF exist?

GENERATING FUNCTIONS 91
5.LetXbe an RV with PMF
P{X=j}=p
j,j=0,1,2,....
SetP{X>j}=q
j,j=0,1,2,....Clearlyq j=pj+1+pj+2+···,j≥0. WriteQ(s)=


j=0
qjs
j
. Then the series forQ(s)converges in|s|<1. Show that
Q(s)=
1−P(s)
1−s
for|s|<1,
whereP(s)is the PGF ofX. Find the mean and the variance ofX(when they exist)
in terms ofQand its derivatives.
6.For the PMF
P{X=j}=
a

j
f(θ)
,j=0,1,2,..., θ >0,
wherea
j≥0 andf(θ)=


j=0
ajθ
j
, find the PGF and the MGF in terms off.
7.For theLaplacePDF
f(x)=
1

e
−|x−μ|/λ
,−∞<x<∞;λ>0, −∞<μ<∞,
show that the MGF exists and equals
M(t)=(1−λ
2
t
2
)
−1
e
μt
,|t|<
1
λ
.
8.For any integer-valued RVX, show that


n=0
s
n
P{X≤n}=(1−s)
−1
P(s),
wherePis the PGF ofX.
9.LetXbe an RV with MGFM(t), which exists fort∈(−t
0,t0),t0>0. Show that
E|X|
n
<n!s
−n
[M(s)+M (−s)]
for any fixeds,0<s<t
0, and for each integern≥1. Expandinge
tx
in a power
series, show that, fort∈(−s,s),0<s<t
0,
M(t)=


n=0
t
n
EX
n
n!
.

92 MOMENTS AND GENERATING FUNCTIONS
(Since a power series can be differentiated term by term within the interval of
convergence, it follows that for|t|<s,
M
(k)
(t)|t=0=EX
k
for each integerk≥1.) (Roy, LePage, and Moore [95])
10.LetXbe an integer-valued random variable with
E{X(X−1)···(X−k+1)}=





k!

n
k

ifk=0,1,2,...,n
0if k>n.
Show thatXmust be degenerate atn.
[Hint:Prove and use the fact that ifEX
k
<∞for allk, then
P(s)=


k=0
(s−1)
k
k!
E{X(X−1)···(X−k+1)}.
WriteP(s)as
P(s)=


k=0
P(X=k)s
k
=


k=0
P(X=k)
k

i=0
(s−1)
i
=


i=0
(s−1)
i


k=i

k
i

P(X=k).]
11.Letp(n,k)=f(n,k)/n!wheref(n,k)is given by
f(n+1,k)=f(n,k)+f(n,k−1)+···+f(n,k−n),
fork=0,1,...,

n
2

and
f(n,k)=0fork<0,f(1,0)=1,f(1,k)=0 otherwise.
Let
P
n(s)=
1
n!


k=0
s
k
f(n,k)
be the probability generating function ofp(n,k). Show that
P
n(s)=(n!)
−1
n

k=2
1−s
k
1−s
|s|<1.
(P
nis the generating function of Kendall’sτ-statistic.)

SOME MOMENT INEQUALITIES 93
12.Fork=0,1,...,

n
2

letu n(k)be defined recursively by
u
n(k)=u n−1(k−n)+u n−1(k)
withu
0(0)=1,u 0(k)=0 otherwise, andu n(k)=0fork<0. LetP n(s)=
Γ

k=0
s
k
un(k)be the generating function of{u n}. Show that
P
n(s)=
n

j=1
(1+s
j
)for|s|<1.
Ifp
n(k)=u n(k)/2
n
,find{p n(k)}forn=2,3,4. (P nis the generating function of
one-sample Wilcoxon test statistic.)
3.4 SOME MOMENT INEQUALITIES
In this section we derive some inequalities for moments of an RV. The main result of this
section is Theorem 1 (and its corollary), which gives a bound for tail probability in terms
of some moment of the random variable.
Theorem 1. Let h(X) be a nonnegative Borel-measurable function of an RV X. If Eh(X)
exists, then, for every \varepsilon > 0,
\[
P\{h(X) \ge \varepsilon\} \le \frac{Eh(X)}{\varepsilon}. \tag{1}
\]
Proof. We prove the result when X is discrete. Let P\{X = x_k\} = p_k, k = 1,2,\ldots. Then
\[
Eh(X) = \sum_{k} h(x_k) p_k = \Big(\sum_{A} + \sum_{A^c}\Big) h(x_k) p_k,
\]
where A = \{k : h(x_k) \ge \varepsilon\}. Then
\[
Eh(X) \ge \sum_{A} h(x_k) p_k \ge \varepsilon\sum_{A} p_k = \varepsilon P\{h(X) \ge \varepsilon\}.
\]
Corollary. Let h(X) = |X|^r and \varepsilon = K^r, where r > 0 and K > 0. Then
\[
P\{|X| \ge K\} \le \frac{E|X|^r}{K^r}, \tag{2}
\]
which is Markov's inequality. In particular, if we take h(X) = (X - \mu)^2 and \varepsilon = K^2\sigma^2, we get the
Chebychev–Bienayme inequality:
\[
P\{|X - \mu| \ge K\sigma\} \le \frac{1}{K^2}, \tag{3}
\]
where EX = \mu and var(X) = \sigma^2.
Remark 1. The inequality (3) is generally attributed to Chebychev, although recent
research has shown that credit should also go to I. J. Bienayme.
Remark 2. If we wish to be consistent with our definition of a DF as F_X(x) = P(X \le x),
then we may want to reformulate (1) in the form P\{h(X) > \varepsilon\} \le Eh(X)/\varepsilon.
For RVs with finite second-order moments one cannot do better than the inequality in (3).
Example 1. Let K > 1 be a constant and
\[
P\{X = 0\} = 1 - \frac{1}{K^2}, \qquad P\{X = \pm 1\} = \frac{1}{2K^2}.
\]
Then
\[
EX = 0, \qquad EX^2 = \frac{1}{K^2}, \qquad \sigma = \frac{1}{K},
\]
and
\[
P\{|X| \ge K\sigma\} = P\{|X| \ge 1\} = \frac{1}{K^2},
\]
so that equality is achieved in (3).
Example 2. Let X be distributed with PDF f(x) = 1 if 0 < x < 1, and = 0 otherwise. Then
\[
EX = \frac{1}{2}, \qquad EX^2 = \frac{1}{3}, \qquad var(X) = \frac{1}{3} - \frac{1}{4} = \frac{1}{12},
\]
\[
P\left\{\left|X - \tfrac{1}{2}\right| < 2\sqrt{\tfrac{1}{12}}\right\} = P\left\{\tfrac{1}{2} - \tfrac{1}{\sqrt{3}} < X < \tfrac{1}{2} + \tfrac{1}{\sqrt{3}}\right\} = 1.
\]
From Chebychev's inequality
\[
P\left\{\left|X - \tfrac{1}{2}\right| < 2\sqrt{\tfrac{1}{12}}\right\} \ge 1 - \frac{1}{4} = 0.75.
\]
In Fig. 1 we compare the upper bound for P\{|X - 1/2| \ge k/\sqrt{12}\} with the exact
probability.
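The comparison behind Fig. 1 can be reproduced numerically (not from the text, assuming NumPy is available); the exact tail of the uniform distribution is 1 − min(1, 2k·sd), which the Chebychev bound 1/k² always dominates:

```python
import numpy as np

# X uniform on (0,1): mean 1/2, sd 1/sqrt(12).  Compare P{|X - 1/2| >= k*sd}
# with the Chebychev bound 1/k^2 over a few values of k.
sd = 1 / np.sqrt(12)
for k in (1.0, 1.5, 2.0, 3.0):
    exact = max(0.0, 1.0 - 2 * k * sd)
    bound = 1.0 / k**2
    print(k, exact, bound)   # the bound is never smaller than the exact value
```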
It is possible to improve upon Chebychev’s inequality, at least in some cases, if we
assume the existence of higher order moments. We need the following lemma.

Fig. 1  Chebychev upper bound versus exact probability.
Lemma 1. Let X be an RV with EX = 0 and var(X) = \sigma^2. Then
\[
P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2} \quad \text{if } x > 0, \tag{4}
\]
\[
P\{X \ge x\} \ge \frac{x^2}{\sigma^2 + x^2} \quad \text{if } x < 0. \tag{5}
\]
Proof. Let h(t) = (t + c)^2, c > 0. Then h(t) \ge 0 for all t and
\[
h(t) \ge (x + c)^2 \quad \text{for } t \ge x > 0.
\]
It follows that
\[
P\{X \ge x\} \le P\{h(X) \ge (x + c)^2\} \le \frac{E(X + c)^2}{(x + c)^2} \quad \text{for all } c > 0,\ x > 0. \tag{6}
\]
Since EX = 0 and EX^2 = \sigma^2, the right side of (6) is minimized when c = \sigma^2/x, and we have
\[
P\{X \ge x\} \le \frac{\sigma^2}{\sigma^2 + x^2}, \quad x > 0.
\]
A similar proof holds for (5).
Remark 3.Inequalities (4) and (5) cannot be improved (Problem 3).
Theorem 2. Let E|X|^4 < \infty, and let EX = 0, EX^2 = \sigma^2. Then
\[
P\{|X| \ge K\sigma\} \le \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2\sigma^4} \quad \text{for } K > 1, \tag{7}
\]
where \mu_4 = EX^4.
Proof. For the proof let us substitute (X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2) for X and take x = 1 in (4).
Then
\[
P\{X^2 - \sigma^2 \ge K^2\sigma^2 - \sigma^2\}
\le \frac{var\{(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)\}}{1 + var\{(X^2 - \sigma^2)/(K^2\sigma^2 - \sigma^2)\}}
= \frac{\mu_4 - \sigma^4}{\sigma^4(K^2 - 1)^2 + \mu_4 - \sigma^4}
= \frac{\mu_4 - \sigma^4}{\mu_4 + \sigma^4 K^4 - 2K^2\sigma^4}, \quad K > 1,
\]
as asserted.
Remark 4. Bound (7) is better than bound (3) if K^2 \ge \mu_4/\sigma^4 and worse if 1 \le K^2 < \mu_4/\sigma^4
(Problem 5).
Example 3. Let X have the uniform density
\[
f(x) = \begin{cases} 1 & \text{if } 0 < x < 1, \\ 0 & \text{otherwise.} \end{cases}
\]
Then
\[
EX = \frac{1}{2}, \qquad var(X) = \frac{1}{12}, \qquad \mu_4 = E\left(X - \tfrac{1}{2}\right)^4 = \frac{1}{80},
\]
and, by (7) with K = 2,
\[
P\left\{\left|X - \tfrac{1}{2}\right| \ge 2\sqrt{\tfrac{1}{12}}\right\}
\le \frac{\tfrac{1}{80} - \tfrac{1}{144}}{\tfrac{1}{80} + \tfrac{1}{144}\cdot 16 - 8\cdot\tfrac{1}{144}} = \frac{4}{49},
\]
that is,
\[
P\left\{\left|X - \tfrac{1}{2}\right| < 2\sqrt{\tfrac{1}{12}}\right\} \ge \frac{45}{49} \approx 0.92,
\]
which is much better than the bound given by Chebychev's inequality (Example 2).
Theorem 3 (Lyapunov Inequality). Let \beta_n = E|X|^n < \infty. Then for arbitrary k, 2 \le k \le n,
we have
\[
\beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}. \tag{8}
\]
Proof. Consider the quadratic form
\[
Q(u,v) = \int_{-\infty}^{\infty}\left(u|x|^{(k-1)/2} + v|x|^{(k+1)/2}\right)^2 f(x)\,dx,
\]
where we have assumed that X is continuous with PDF f. We have
\[
Q(u,v) = u^2\beta_{k-1} + 2uv\,\beta_k + \beta_{k+1}v^2.
\]
Clearly Q \ge 0 for all u, v real. It follows that
\[
\begin{vmatrix} \beta_{k-1} & \beta_k \\ \beta_k & \beta_{k+1} \end{vmatrix} \ge 0,
\]
implying that
\[
\beta_k^2 \le \beta_{k-1}\beta_{k+1}, \quad \text{and hence} \quad \beta_k^{2k} \le \beta_{k-1}^{k}\,\beta_{k+1}^{k}.
\]
Thus
\[
\beta_1^2 \le \beta_0\beta_2, \quad \beta_2^4 \le \beta_1^2\beta_3^2, \quad \ldots, \quad \beta_{n-1}^{2(n-1)} \le \beta_{n-2}^{n-1}\,\beta_n^{n-1},
\]
where \beta_0 = 1. Multiplying successively the first k - 1 of these, we have
\[
\beta_{k-1}^{k} \le \beta_k^{k-1}, \quad \text{or} \quad \beta_{k-1}^{1/(k-1)} \le \beta_k^{1/k}.
\]
It follows that
\[
\beta_1 \le \beta_2^{1/2} \le \beta_3^{1/3} \le \cdots \le \beta_n^{1/n}.
\]
The equality holds if and only if
\[
\beta_k^{1/k} = \beta_{k+1}^{1/(k+1)} \quad \text{for } k = 1,2,\ldots,
\]
that is, \{\beta_k^{1/k}\} is a constant sequence of numbers, which happens if and only if |X| is
degenerate, that is, for some c, P\{|X| = c\} = 1.
PROBLEMS 3.4
1.For the RV with PDF
f(x;λ)=
e
−x
x
λ
λ!
,x>0,
whereλ≥0 is an integer, show that
P{0<X<2(λ+1)}>
λ
λ+1
.
2.LetXbe any RV, and suppose that the MGF ofX,M(t)=Ee
tx
, exists for everyt>0.
Then for anyt>0
P{tX>s
2
+logM(t)}<e
−s
2
.

98 MOMENTS AND GENERATING FUNCTIONS
3.Construct an example to show that inequalities (4) and (5) cannot be improved.
4.Letg(.)be a function satisfyingg(x)>0forx >0,g(x)increasing forx>0, and
E|g(X)|<∞. Show that
P{|X|>ε}<
Eg(|X|)
g(ε)
for everyε>0.
5.LetXbe an RV withEX=0,var(X)=σ
2
, andEX
4
=μ4.LetK be any positive real
number. Show that
P{|X|≥Kσ}≤







1i fK
2
<1,
1
K
2 if 1≤K
2
<
μ4
σ
4,
μ
4−σ
4
μ4+σ
4
K
4
−2K
2
σ
4
ifK
2

μ4
σ
4.
In other words, show that bound (7) is better than bound (3) ifK
2
≥μ4/σ
4
and worse
if 1≤K
2
<μ4/σ
4
. Construct an example to show that the last inequalities cannot
be improved.
6.Use Chebychev’s inequality to show that for anyk>1,e
k+1
≥k
2
.
7.For any RVX, show that
P{X≥0}≤inf{ϕ(t):t≥0}≤1,
whereϕ(t)=Ee
tX
,0<ϕ(t)≤∞.
8.LetXbe an RV such thatP(a≤X≤b)=1 where−∞<a<b<∞. Show that
var(X)≤(b−a)
2
/4.

4
MULTIPLE RANDOM VARIABLES
4.1 INTRODUCTION
In many experiments an observation is expressible, not as a single numerical quantity, but
as a family of several separate numerical quantities. Thus, for example, if a pair of distin-
guishable dice is tossed, the outcome is a pair(x,y), wherexdenotes the face value on the
first die, andy, the face value on the second die. Similarly, to record the height and weight
of every person in a certain community we need a pair(x,y), where the components repre-
sent, respectively, the height and weight of a particular individual. To be able to describe
such experiments mathematically we must study themultidimensional random variables.
In Section 4.2 we introduce the basic notations involved and study joint, marginal,
and conditional distributions. In Section 4.3 we examine independent random variables
and investigate some consequences of independence. Section 4.4 deals with functions of
several random variables and their induced distributions. Section 4.5 considers moments,
covariance, and correlation, and in Section 4.6 we study conditional expectation. The last
section deals with ordered observations.
4.2 MULTIPLE RANDOM VARIABLES
In this section we study multidimensional RVs. Let(Ω,S,P)be a fixed but otherwise
arbitrary probability space.

Definition 1. The collection X = (X_1, X_2, \ldots, X_n) defined on (\Omega, S, P) into R_n by
\[
X(\omega) = (X_1(\omega), X_2(\omega), \ldots, X_n(\omega)), \quad \omega \in \Omega,
\]
is called an n-dimensional RV if the inverse image of every n-dimensional interval
\[
I = \{(x_1, x_2, \ldots, x_n) : -\infty < x_i \le a_i,\ a_i \in R,\ i = 1,2,\ldots,n\}
\]
is also in S, that is, if
\[
X^{-1}(I) = \{\omega : X_1(\omega) \le a_1, \ldots, X_n(\omega) \le a_n\} \in S \quad \text{for } a_i \in R.
\]
Theorem 1. Let X_1, X_2, \ldots, X_n be n RVs on (\Omega, S, P). Then X = (X_1, X_2, \ldots, X_n) is an
n-dimensional RV on (\Omega, S, P).
Proof. Let I = \{(x_1, x_2, \ldots, x_n) : -\infty < x_i \le a_i,\ i = 1,2,\ldots,n\}. Then
\[
\{(X_1, X_2, \ldots, X_n) \in I\} = \{\omega : X_1(\omega) \le a_1, X_2(\omega) \le a_2, \ldots, X_n(\omega) \le a_n\}
= \bigcap_{k=1}^{n}\{\omega : X_k(\omega) \le a_k\} \in S,
\]
as asserted.
From now on we will restrict attention mainly to two-dimensional random variables.
The discussion for then-dimensional (n>2) case is similar except when indicated. The
development follows closely the one-dimensional case.
Definition 2. The function F(\cdot,\cdot), defined by
\[
F(x,y) = P\{X \le x, Y \le y\}, \quad \text{all } (x,y) \in R_2, \tag{1}
\]
is known as the DF of the RV (X,Y).
Following the discussion in Section 2.3, it is easily shown that
(i) F(x,y) is nondecreasing and continuous from the right with respect to each
coordinate, and
(ii) \lim_{x\to+\infty,\ y\to+\infty} F(x,y) = F(+\infty,+\infty) = 1,
\lim_{y\to-\infty} F(x,y) = F(x,-\infty) = 0 for all x,
\lim_{x\to-\infty} F(x,y) = F(-\infty,y) = 0 for all y.
But (i) and (ii) are not sufficient conditions to make any function F(\cdot,\cdot) a DF.
Example 1. Let F be a function (Fig. 1) of two variables defined by
\[
F(x,y) = \begin{cases} 0, & x < 0 \text{ or } x + y < 1 \text{ or } y < 0, \\ 1, & \text{otherwise.} \end{cases}
\]

Fig. 1

Then F satisfies both (i) and (ii) above. However, F is not a DF since
\[
P\{\tfrac{1}{3} < X \le 1, \tfrac{1}{3} < Y \le 1\} = F(1,1) + F(\tfrac{1}{3},\tfrac{1}{3}) - F(1,\tfrac{1}{3}) - F(\tfrac{1}{3},1)
= 1 + 0 - 1 - 1 = -1 < 0.
\]
Let x_1 < x_2 and y_1 < y_2. We have
\[
P\{x_1 < X \le x_2, y_1 < Y \le y_2\} = P\{X \le x_2, Y \le y_2\} + P\{X \le x_1, Y \le y_1\}
- P\{X \le x_1, Y \le y_2\} - P\{X \le x_2, Y \le y_1\}
\]
\[
= F(x_2,y_2) + F(x_1,y_1) - F(x_1,y_2) - F(x_2,y_1) \ge 0
\]
for all pairs (x_1,y_1), (x_2,y_2) with x_1 < x_2, y_1 < y_2 (see Fig. 2).
Theorem 2. A function F of two variables is a DF of some two-dimensional RV if and
only if it satisfies the following conditions:
(i) F is nondecreasing and right continuous with respect to both arguments;
(ii) F(-\infty,y) = F(x,-\infty) = 0 and F(+\infty,+\infty) = 1; and
(iii) for every (x_1,y_1), (x_2,y_2) with x_1 < x_2 and y_1 < y_2 the inequality
\[
F(x_2,y_2) - F(x_2,y_1) + F(x_1,y_1) - F(x_1,y_2) \ge 0 \tag{2}
\]
holds.
Theorem 2 can be generalized to then-dimensional case in the following manner.

102 MULTIPLE RANDOM VARIABLES
y
x0
(x
1,y
1)( x
2,y
1)
(x
2,y
2)(x
1,y
2)
Fig. 2{x 1<x<x 2,y1<y≤y 2}.
Theorem 3.A functionF(x 1,x2,...,x n)is the joint DF of somen-dimensional RV if and
only ifFis nondecreasing and continuous from the right with respect to all the arguments
x
1,x2,...,x nand satisfies the following conditions:
(i)
F(−∞,x
2,...,x n)=F(x 1,−∞,x 3,...,x n)···
=F(x
1,...,x n−1,−∞)=0,
F(+∞, +∞,...,+∞)=1.
(ii) For every(x
1,x2,...,x n)∈R nand allε i>0(i=1,2,...,n)the inequality
F(x
1+ε1,x2+ε2,...,x n+εn)

n
Γ
i=1
F(x1+ε1,...,x i−1+εi−1,xi,xi+1+εi+1,...,x n+εn)
+
n
Γ
i,j=1
i<j
F(x1+ε1,...,x i−1+εi−1,xi,xi+1+εi+1,...,x j−1+εj−1,
x
j,xj+1+εj+1,...,x n+εn)
+···
+(−1)
n
F(x1,x2,...,x n)≥0( 3)
holds.
We restrict ourselves here to two-dimensional RVs of the discrete or the continuous
type, which we now define.

Definition 3. A two-dimensional (or bivariate) RV (X,Y) is said to be of the discrete type
if it takes on pairs of values belonging to a countable set of pairs A with probability 1. We
call every pair (x_i, y_j) that is assumed with positive probability p_{ij} a jump point of the DF
of (X,Y) and call p_{ij} the jump at (x_i, y_j). Here A is the support of the distribution of (X,Y).
Clearly \sum_{i,j} p_{ij} = 1. As for the DF of (X,Y), we have
\[
F(x,y) = \sum_{B} p_{ij},
\]
where B = \{(i,j) : x_i \le x, y_j \le y\}.
Definition 4. Let (X,Y) be an RV of the discrete type that takes on pairs of values
(x_i, y_j), i = 1,2,\ldots, and j = 1,2,\ldots. We call
\[
p_{ij} = P\{X = x_i, Y = y_j\}, \quad i = 1,2,\ldots,\ j = 1,2,\ldots,
\]
the joint probability mass function (PMF) of (X,Y).
Example 2. A fair die is rolled, and a fair coin is tossed independently. Let X be the face
value on the die, and let Y = 0 if a tail turns up and Y = 1 if a head turns up. Then
\[
A = \{(1,0),(2,0),\ldots,(6,0),(1,1),(2,1),\ldots,(6,1)\}, \qquad
p_{ij} = \frac{1}{12} \quad \text{for } i = 1,2,\ldots,6;\ j = 0,1.
\]
The DF of (X,Y) is given by
\[
F(x,y) = \begin{cases}
0, & x < 1, -\infty < y < \infty; \ \text{or } -\infty < x < \infty, y < 0, \\
\tfrac{1}{12}, & 1 \le x < 2,\ 0 \le y < 1, \\
\tfrac{1}{6}, & 2 \le x < 3,\ 0 \le y < 1; \ \text{or } 1 \le x < 2,\ 1 \le y, \\
\tfrac{1}{4}, & 3 \le x < 4,\ 0 \le y < 1, \\
\tfrac{1}{3}, & 4 \le x < 5,\ 0 \le y < 1; \ \text{or } 2 \le x < 3,\ 1 \le y, \\
\tfrac{5}{12}, & 5 \le x < 6,\ 0 \le y < 1, \\
\tfrac{1}{2}, & 6 \le x,\ 0 \le y < 1; \ \text{or } 3 \le x < 4,\ 1 \le y, \\
\tfrac{2}{3}, & 4 \le x < 5,\ 1 \le y, \\
\tfrac{5}{6}, & 5 \le x < 6,\ 1 \le y, \\
1, & 6 \le x,\ 1 \le y.
\end{cases}
\]
Theorem 4. A collection of nonnegative numbers \{p_{ij} : i = 1,2,\ldots;\ j = 1,2,\ldots\} satisfy-
ing \sum_{i,j=1}^{\infty} p_{ij} = 1 is the PMF of some RV.
Proof. The proof of Theorem 4 is easy to construct with the help of Theorem 2.
Definition 5. A two-dimensional RV (X,Y) is said to be of the continuous type if there
exists a nonnegative function f(\cdot,\cdot) such that for every pair (x,y) \in R_2 we have
\[
F(x,y) = \int_{-\infty}^{x}\left[\int_{-\infty}^{y} f(u,v)\,dv\right]du, \tag{4}
\]
where F is the DF of (X,Y). The function f is called the (joint) PDF of (X,Y).
Clearly,
\[
F(+\infty,+\infty) = \lim_{x\to+\infty,\ y\to+\infty}\int_{-\infty}^{x}\int_{-\infty}^{y} f(u,v)\,dv\,du
= \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(u,v)\,dv\,du = 1.
\]
If f is continuous at (x,y), then
\[
\frac{\partial^2 F(x,y)}{\partial x\,\partial y} = f(x,y). \tag{5}
\]
Example 3. Let (X,Y) be an RV with joint PDF (Fig. 3) given by
\[
f(x,y) = \begin{cases} e^{-(x+y)}, & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases}
\]
Then
\[
F(x,y) = \begin{cases} (1 - e^{-x})(1 - e^{-y}), & 0 < x < \infty,\ 0 < y < \infty, \\ 0, & \text{otherwise.} \end{cases}
\]
Theorem 5. If f is a nonnegative function satisfying \int_{-\infty}^{\infty}\int_{-\infty}^{\infty} f(x,y)\,dx\,dy = 1, then f is
the joint density function of some RV.
Proof. For the proof define
\[
F(x,y) = \int_{-\infty}^{x}\left[\int_{-\infty}^{y} f(u,v)\,dv\right]du
\]
and use Theorem 2.

Fig. 3  f(x,y) = exp{-(x+y)}, x > 0, y > 0.

Let (X,Y) be a two-dimensional RV with PMF p_{ij} = P\{X = x_i, Y = y_j\}. Then
\[
\sum_{i=1}^{\infty} p_{ij} = \sum_{i=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{Y = y_j\} \tag{6}
\]
and
\[
\sum_{j=1}^{\infty} p_{ij} = \sum_{j=1}^{\infty} P\{X = x_i, Y = y_j\} = P\{X = x_i\}. \tag{7}
\]
Let us write
\[
p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij} \qquad \text{and} \qquad p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}. \tag{8}
\]
Then p_{i\cdot} \ge 0 and \sum_{i=1}^{\infty} p_{i\cdot} = 1, p_{\cdot j} \ge 0 and \sum_{j=1}^{\infty} p_{\cdot j} = 1, and \{p_{i\cdot}\}, \{p_{\cdot j}\} represent PMFs.
Definition 6. The collection of numbers \{p_{i\cdot}\} is called the marginal PMF of X, and the
collection \{p_{\cdot j}\}, the marginal PMF of Y.

Example 4. A fair coin is tossed three times. Let X = number of heads in three tossings,
and Y = difference, in absolute value, between number of heads and number of tails. The
joint PMF of (X,Y) is given in the following table:

                 X
   Y        0     1     2     3   | P{Y = y}
   1        0    3/8   3/8    0   |   6/8
   3       1/8    0     0    1/8  |   2/8
  ---------------------------------|--------
  P{X=x}   1/8   3/8   3/8   1/8  |    1

The marginal PMF of Y is shown in the column representing row totals, and the marginal
PMF of X, in the row representing column totals.
If (X,Y) is an RV of the continuous type with PDF f, then
\[
f_1(x) = \int_{-\infty}^{\infty} f(x,y)\,dy \tag{9}
\]
and
\[
f_2(y) = \int_{-\infty}^{\infty} f(x,y)\,dx \tag{10}
\]
satisfy f_1(x) \ge 0, f_2(y) \ge 0, and \int_{-\infty}^{\infty} f_1(x)\,dx = 1, \int_{-\infty}^{\infty} f_2(y)\,dy = 1. It follows that f_1(x)
and f_2(y) are PDFs.
Definition 7. The functions f_1(x) and f_2(y), defined in (9) and (10), are called the marginal
PDF of X and the marginal PDF of Y, respectively.
Example 5. Let (X,Y) be jointly distributed with PDF f(x,y) = 2, 0 < x < y < 1, and = 0
otherwise (Fig. 4). Then
\[
f_1(x) = \int_x^1 2\,dy = \begin{cases} 2 - 2x, & 0 < x < 1, \\ 0, & \text{otherwise,} \end{cases}
\qquad
f_2(y) = \int_0^y 2\,dx = \begin{cases} 2y, & 0 < y < 1, \\ 0, & \text{otherwise,} \end{cases}
\]
are the two marginal density functions.
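These two marginals can be recovered symbolically (not from the text, assuming SymPy is available):

```python
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f = 2  # joint density of Example 5 on the triangle 0 < x < y < 1

f1 = sp.integrate(f, (y, x, 1))   # marginal of X: 2 - 2x on (0, 1)
f2 = sp.integrate(f, (x, 0, y))   # marginal of Y: 2y on (0, 1)
print(f1, f2)
print(sp.integrate(f1, (x, 0, 1)), sp.integrate(f2, (y, 0, 1)))  # both 1
```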

Fig. 4  f(x,y) = 2, 0 < x < y < 1.

Definition 8. Let (X,Y) be an RV with DF F. Then the marginal DF of X is defined by
\[
F_1(x) = F(x,\infty) = \lim_{y\to\infty} F(x,y) \tag{11}
= \begin{cases} \displaystyle\sum_{x_i \le x} p_{i\cdot} & \text{if } (X,Y) \text{ is discrete,} \\ \displaystyle\int_{-\infty}^{x} f_1(t)\,dt & \text{if } (X,Y) \text{ is continuous.} \end{cases}
\]
A similar definition is given for the marginal DF of Y.
In general, given a DF F(x_1, x_2, \ldots, x_n) of an n-dimensional RV (X_1, X_2, \ldots, X_n), one
can obtain any k-dimensional (1 \le k \le n-1) marginal DF from it. Thus the marginal DF
of (X_{i_1}, X_{i_2}, \ldots, X_{i_k}), where 1 \le i_1 < i_2 < \cdots < i_k \le n, is given by
\[
\lim_{x_i\to\infty,\ i \ne i_1, i_2, \ldots, i_k} F(x_1, x_2, \ldots, x_n)
= F(+\infty, \ldots, +\infty, x_{i_1}, +\infty, \ldots, +\infty, \ldots, x_{i_k}, +\infty, \ldots, +\infty).
\]
We now consider the concept of conditional distributions. Let (X,Y) be an RV of the
discrete type with PMF p_{ij} = P\{X = x_i, Y = y_j\}. The marginal PMFs are p_{i\cdot} = \sum_{j=1}^{\infty} p_{ij}
and p_{\cdot j} = \sum_{i=1}^{\infty} p_{ij}. Recall that, if A, B \in S and PB > 0, the conditional probability of A, given
B, is defined by
\[
P\{A \mid B\} = \frac{P(AB)}{P(B)}.
\]
Take A = \{X = x_i\} = \{(x_i,y) : -\infty < y < \infty\} and B = \{Y = y_j\} = \{(x,y_j) : -\infty < x < \infty\},
and assume that PB = P\{Y = y_j\} = p_{\cdot j} > 0. Then A \cap B = \{X = x_i, Y = y_j\} and
\[
P\{A \mid B\} = P\{X = x_i \mid Y = y_j\} = \frac{p_{ij}}{p_{\cdot j}}.
\]
For fixed j, the function P\{X = x_i \mid Y = y_j\} \ge 0 and \sum_{i=1}^{\infty} P\{X = x_i \mid Y = y_j\} = 1. Thus
P\{X = x_i \mid Y = y_j\}, for fixed j, defines a PMF.
Definition 9. Let (X,Y) be an RV of the discrete type. If P\{Y = y_j\} > 0, the function
\[
P\{X = x_i \mid Y = y_j\} = \frac{P\{X = x_i, Y = y_j\}}{P\{Y = y_j\}} \tag{12}
\]
for fixed j is known as the conditional PMF of X, given Y = y_j. A similar definition
is given for P\{Y = y_j \mid X = x_i\}, the conditional PMF of Y, given X = x_i, provided that
P\{X = x_i\} > 0.
Example 6. For the joint PMF of Example 4, we have for $Y=1$
$$P\{X=i\mid Y=1\}=\begin{cases}0, & i=0,3,\\ \tfrac12, & i=1,2.\end{cases}$$
Similarly,
$$P\{X=i\mid Y=3\}=\begin{cases}\tfrac12, & i=0,3,\\ 0, & i=1,2,\end{cases}
\qquad
P\{Y=j\mid X=0\}=\begin{cases}0, & j=1,\\ 1, & j=3,\end{cases}$$
and so on.
Next suppose that(X,Y)is an RV of the continuous type with joint PDFf. Since
P{X=x}=0,P{Y=y}=0 for anyx,y, the probabilityP{X≤x|Y=y}orP{Y≤
y|X=x}is not defined. Letε>0, and suppose thatP{y−ε<Y ≤y+ε}>0. For
everyxand every interval(y−ε,y+ε], consider the conditional probability of the event
$\{X\le x\}$, given that $Y\in(y-\varepsilon,y+\varepsilon]$. We have
$$P\{X\le x\mid y-\varepsilon<Y\le y+\varepsilon\}=\frac{P\{X\le x,\ y-\varepsilon<Y\le y+\varepsilon\}}{P\{Y\in(y-\varepsilon,y+\varepsilon]\}}.$$
For any fixed interval $(y-\varepsilon,y+\varepsilon]$, the above expression defines the conditional DF of $X$, given that $Y\in(y-\varepsilon,y+\varepsilon]$, provided that $P\{Y\in(y-\varepsilon,y+\varepsilon]\}>0$. We shall be interested in the case where the limit
$$\lim_{\varepsilon\to 0^{+}}P\{X\le x\mid Y\in(y-\varepsilon,y+\varepsilon]\}$$
exists.

Definition 10. The conditional DF of an RV $X$, given $Y=y$, is defined as the limit
$$\lim_{\varepsilon\to 0^{+}}P\{X\le x\mid Y\in(y-\varepsilon,y+\varepsilon]\}, \qquad (13)$$
provided that the limit exists. If the limit exists, we denote it by $F_{X|Y}(x\mid y)$ and define the conditional density function of $X$, given $Y=y$, $f_{X|Y}(x\mid y)$, as a nonnegative function satisfying
$$F_{X|Y}(x\mid y)=\int_{-\infty}^{x}f_{X|Y}(t\mid y)\,dt\quad\text{for all } x\in\mathbb{R}. \qquad (14)$$
For fixed $y$, we see that $f_{X|Y}(x\mid y)\ge 0$ and $\int_{-\infty}^{\infty}f_{X|Y}(x\mid y)\,dx=1$. Thus $f_{X|Y}(x\mid y)$ is a PDF for fixed $y$.
Suppose that $(X,Y)$ is an RV of the continuous type with PDF $f$. At every point $(x,y)$ where $f$ is continuous and the marginal PDF $f_2(y)>0$ and is continuous, we have
$$F_{X|Y}(x\mid y)=\lim_{\varepsilon\to 0^{+}}\frac{P\{X\le x,\ Y\in(y-\varepsilon,y+\varepsilon]\}}{P\{Y\in(y-\varepsilon,y+\varepsilon]\}}
=\lim_{\varepsilon\to 0^{+}}\frac{\displaystyle\int_{-\infty}^{x}\left\{\int_{y-\varepsilon}^{y+\varepsilon}f(u,v)\,dv\right\}du}{\displaystyle\int_{y-\varepsilon}^{y+\varepsilon}f_2(v)\,dv}.$$
Dividing numerator and denominator by $2\varepsilon$ and passing to the limit as $\varepsilon\to 0^{+}$, we have
$$F_{X|Y}(x\mid y)=\frac{\int_{-\infty}^{x}f(u,y)\,du}{f_2(y)}=\int_{-\infty}^{x}\frac{f(u,y)}{f_2(y)}\,du.$$
It follows that there exists a conditional PDF of $X$, given $Y=y$, that is expressed by
$$f_{X|Y}(x\mid y)=\frac{f(x,y)}{f_2(y)},\qquad f_2(y)>0.$$
We have thus proved the following theorem.
Theorem 6. Let $f$ be the PDF of an RV $(X,Y)$ of the continuous type, and let $f_2$ be the marginal PDF of $Y$. At every point $(x,y)$ at which $f$ is continuous and $f_2(y)>0$ and is continuous, the conditional PDF of $X$, given $Y=y$, exists and is expressed by
$$f_{X|Y}(x\mid y)=\frac{f(x,y)}{f_2(y)}. \qquad (15)$$
Note that
$$\int_{-\infty}^{x}f(u,y)\,du=f_2(y)\,F_{X|Y}(x\mid y),$$
so that
$$F_1(x)=\int_{-\infty}^{\infty}\left\{\int_{-\infty}^{x}f(u,y)\,du\right\}dy=\int_{-\infty}^{\infty}f_2(y)\,F_{X|Y}(x\mid y)\,dy, \qquad (16)$$
where $F_1$ is the marginal DF of $X$.
It is clear that similar definitions may be made for the conditional DF and conditional
PDF of the RVY,givenX=x, and an analog of Theorem 6 holds.
In the general case, let $(X_1,X_2,\dots,X_n)$ be an $n$-dimensional RV of the continuous type with PDF $f_{X_1,X_2,\dots,X_n}(x_1,x_2,\dots,x_n)$. Also, let $\{i_1<i_2<\cdots<i_k,\ j_1<j_2<\cdots<j_l\}$ be a subset of $\{1,2,\dots,n\}$. Then
$$F(x_{i_1},\dots,x_{i_k}\mid x_{j_1},\dots,x_{j_l})
=\frac{\displaystyle\int_{-\infty}^{x_{i_1}}\!\!\cdots\!\int_{-\infty}^{x_{i_k}} f_{X_{i_1},\dots,X_{i_k},X_{j_1},\dots,X_{j_l}}(u_{i_1},\dots,u_{i_k},x_{j_1},\dots,x_{j_l})\prod_{p=1}^{k}du_{i_p}}
{\displaystyle\int_{-\infty}^{\infty}\!\!\cdots\!\int_{-\infty}^{\infty} f_{X_{i_1},\dots,X_{i_k},X_{j_1},\dots,X_{j_l}}(u_{i_1},\dots,u_{i_k},x_{j_1},\dots,x_{j_l})\prod_{p=1}^{k}du_{i_p}}, \qquad (17)$$
provided that the denominator exceeds 0. Here $f_{X_{i_1},\dots,X_{i_k},X_{j_1},\dots,X_{j_l}}$ is the joint marginal PDF of $(X_{i_1},\dots,X_{i_k},X_{j_1},\dots,X_{j_l})$. The conditional densities are obtained in a similar manner.

The case in which $(X_1,X_2,\dots,X_n)$ is of the discrete type is similarly treated.
Example 7. For the joint PDF of Example 5 we have
$$f_{Y|X}(y\mid x)=\frac{f(x,y)}{f_1(x)}=\frac{1}{1-x},\qquad x<y<1,$$
so that the conditional PDF $f_{Y|X}$ is uniform on $(x,1)$. Also,
$$f_{X|Y}(x\mid y)=\frac{1}{y},\qquad 0<x<y,$$
which is uniform on $(0,y)$. Thus
$$P\left\{Y\ge\tfrac12\ \Big|\ X=\tfrac12\right\}=\int_{1/2}^{1}\frac{1}{1-\tfrac12}\,dy=1,
\qquad
P\left\{X\ge\tfrac13\ \Big|\ Y=\tfrac23\right\}=\int_{1/3}^{2/3}\frac{1}{2/3}\,dx=\frac12.$$
We conclude this section with a discussion of a technique called truncation. We consider two types of truncation, each with a different objective. In probabilistic modeling we use truncated distributions when sampling from an incomplete population.

Definition 11. Let $X$ be an RV on $(\Omega,\mathcal{S},P)$, and $T\in\mathcal{B}$ such that $0<P\{X\in T\}<1$. Then the conditional distribution $P\{X\le x\mid X\in T\}$, defined for any real $x$, is called the truncated distribution of $X$.

If $X$ is a discrete RV with PMF $p_i=P\{X=x_i\}$, $i=1,2,\dots$, the truncated distribution of $X$ is given by
$$P\{X=x_i\mid X\in T\}=\frac{P\{X=x_i,\,X\in T\}}{P\{X\in T\}}=\begin{cases}\dfrac{p_i}{\sum_{x_j\in T}p_j} & \text{if } x_i\in T,\\[1ex] 0 & \text{otherwise.}\end{cases} \qquad (18)$$
If $X$ is of the continuous type with PDF $f$, then
$$P\{X\le x\mid X\in T\}=\frac{P\{X\le x,\,X\in T\}}{P\{X\in T\}}=\frac{\int_{(-\infty,x]\cap T}f(y)\,dy}{\int_{T}f(y)\,dy}. \qquad (19)$$
The PDF of the truncated distribution is given by
$$h(x)=\begin{cases}\dfrac{f(x)}{\int_{T}f(y)\,dy}, & x\in T,\\[1ex] 0, & x\notin T.\end{cases} \qquad (20)$$
Here $T$ is not necessarily a bounded set of real numbers. If we write $Y$ for the RV with distribution function $P\{X\le x\mid X\in T\}$, then $Y$ has support $T$.
Example 8. Let $X$ be an RV with standard normal PDF
$$f(x)=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2}.$$
Let $T=(-\infty,0]$. Then $P\{X\in T\}=1/2$, since $X$ is symmetric and continuous. For the truncated PDF, we have
$$h(x)=\begin{cases}2f(x), & -\infty<x\le 0,\\ 0, & x>0.\end{cases}$$
Some other examples are the truncated Poisson distribution
$$P\{X=k\}=\frac{e^{-\lambda}}{1-e^{-\lambda}}\,\frac{\lambda^{k}}{k!},\qquad k=1,2,\dots,$$
where $T=\{X\ge 1\}$, and the truncated uniform distribution
$$f(x)=1/\theta,\quad 0<x<\theta,\ \text{and}\ =0\ \text{otherwise},$$
where $T=\{X<\theta\}$, $\theta>0$.
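A small simulation sketch of the truncated standard normal in Example 8 (NumPy/SciPy assumed; the sample size and probabilities checked are illustrative): draws outside $T=(-\infty,0]$ are rejected, and an empirical probability is compared with the exact one under the truncated PDF $h(x)=2f(x)$.

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)

# Rejection sampling from N(0,1) truncated to T = (-inf, 0]: keep draws that fall in T.
draws = rng.standard_normal(200_000)
truncated = draws[draws <= 0.0]

# Under the truncated law, P{Y <= -1} = Phi(-1)/Phi(0); compare with the empirical value.
print(np.mean(truncated <= -1.0))
print(norm.cdf(-1.0) / norm.cdf(0.0))   # about 0.3173
```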
The second type of truncation is very useful in probability limit theory, especially when the DF $F$ in question does not have a finite mean. Let $a<b$ be finite real numbers. Define the RV $X^{*}$ by
$$X^{*}=\begin{cases}X, & \text{if } a\le X\le b,\\ 0, & \text{if } X<a \text{ or } X>b.\end{cases}$$
This method produces an RV for which $P(a\le X^{*}\le b)=1$, so that $X^{*}$ has moments of all orders. The special case when $b=c>0$ and $a=-c$ is quite useful in probability limit theory when we wish to approximate $X$ through bounded RVs. We say that $X_c$ is $X$ truncated at $c$ if $X_c=X$ for $|X|\le c$, and $=0$ for $|X|>c$. Then $E|X_c|^{k}\le c^{k}$. Moreover,
$$P\{X\ne X_c\}=P\{|X|>c\},$$
so that $c$ can be selected sufficiently large to make $P\{|X|>c\}$ arbitrarily small. For example, if $E|X|^{2}<\infty$ then
$$P\{|X|>c\}\le E|X|^{2}/c^{2},$$
and given $\varepsilon>0$, we can choose $c$ such that $E|X|^{2}/c^{2}<\varepsilon$.

The distribution of $X_c$ is no longer the truncated distribution $P\{X\le x\mid |X|\le c\}$. In fact,
$$F_c(y)=\begin{cases}0, & y\le -c,\\ F(y)-F(-c), & -c<y<0,\\ 1-F(c)+F(y), & 0\le y<c,\\ 1, & y\ge c,\end{cases}$$
where $F$ is the DF of $X$ and $F_c$ that of $X_c$.
A third type of truncation, sometimes called Winsorization, sets
$$X^{*}=\begin{cases}a, & X\le a,\\ X, & a<X<b,\\ b, & X\ge b.\end{cases}$$
This method also produces an RV for which $P(a\le X^{*}\le b)=1$, and moments of all orders of $X^{*}$ exist, but its DF is given by
$$F^{*}(y)=\begin{cases}0, & y<a,\\ F(y), & a\le y<b,\\ 1, & y\ge b.\end{cases}$$
PROBLEMS 4.2
1.LetF(x,y)=1ifx+2y≥1, and=0ifx+2y<1. DoesFdefine a DF in the
plane?
2.LetTbe a closed triangle in the plane with vertices(0,0),(0,

2), and(

2,

2).
LetF(x,y)denote the elementary area of the intersection ofTwith{(x
1,x2):x1≤
x,x
2≤y}. Show thatFdefines a DF in the plane, and find its marginal DFs.
3.Let(X,Y)have the joint PDFfdefined byf(x,y)=1/2 inside the square with
corners at the points(1,0),(0,1),(−1,0), and(0,−1)in the(x,y)-plane, and=0
otherwise. Find the marginal PDFs ofXandYand the two conditional PDFs.
4.Letf(x,y,z)=e
−x−y−z
,x>0,y>0,z>0, and=0 otherwise, be the joint PDF
of(X,Y,Z). ComputeP{X<Y<Z}andP{X=Y<Z}.
5.Let(X,Y)have the joint PDFf(x,y)=
4
3
[xy+(x
2
/2)]if 0<x<1, 0<y<2, and
=0 otherwise. FindP{Y<1|X<1/2}.

6.For DFsF,F 1,F2,...,F nshow that
1−
n

i=1
{1−F i(xi)}≤F (x 1,x2,...,x n)≤min
1≤i≤n
Fi(xi)
for all real numbersx
1,x2,...,x nif and only ifF i’s are marginal DFs ofF.
7.For thebivariate negative binomialdistribution
P{X=x,Y=y}=
(x+y+k−1)!
x!y!(k−1)!
p
x
1
p
y
2
(1−p 1−p2)
k
,
wherex,y=0,1,2,...,k≥1 is an integer, 0<p
1<1, 0<p 2<1, andp 1+p2<1,
find the marginal PMFs ofXandYand the conditional distributions.
In Problems 8–10 the bivariate distributions considered are not unique generalizations
of the corresponding univariate distributions.
8.For thebivariate CauchyRV(X,Y)with PDF
f(x,y)=
c

(c
2
+x
2
+y
2
)
−3/2
,−∞<x<∞,−∞<y<∞,c>0,
find the marginal PDFs ofXandY. Find the conditional PDF ofYgivenX=x.
9.For thebivariate betaRV(X,Y)with PDF
f(x,y)=
Γ(p
1+p2+p3)
Γ(p1)Γ(p 2)Γ(p 3)
x
p1−1
y
p2−1
(1−x−y)
p3−1
,
x≥0,y≥0,x+y≤1,
wherep
1,p2,p3are positive real numbers, find the marginal PDFs ofXandYand
the conditional PDFs. Find also the conditional PDF ofY/(1−X),givenX =x.
10.For thebivariate gammaRV(X,Y)with PDF
f(x,y)=
β
α+γ
Γ(α)Γ(γ)
x
α−1
(y−x)
γ−1
e
−βy
,0<x<y;α,β,γ >0,
find the marginal PDFs ofXandYand the conditional PDFs. Also, find the con-
ditional PDF ofY−X,givenX=x, and the conditional distribution ofX/Y,given
Y=y.
11.For thebivariate hypergeometricRV(X,Y)with PMF
P{X=x,Y=y}=

N
n

−1
Np
1
x

Np
2
y

N−Np
1−Np2
n−x−y

,
x,y=0,1,2,...,n,
wherex≤Np
1,y≤Np 2,n−x−y≤N(1−p 1−p2),N,nintegers withn≤N, and
0<p
1<1,0<p 2<1sothatp 1+p2≤1, find the marginal PMFs ofXandYand
the conditional PMFs.

12.LetXbe an RV with PDFf(x)=1if0≤x≤1, and=0 otherwise. LetT=
{x:1/3<x≤1/2}. Find the PDF of the truncated distribution ofX, its means,
and its variance.
13.LetXbe an RV with PMF
P{X=x}=e
−λ
λ
x
x!
,x=0,1,2,...,λ>0.
Suppose that the valuex=0 cannot be observed. Find the PMF of the truncated
RV, its mean, and its variance.
14.Is the function
f(x,y,z,u)=

exp(−u),0<x<y<z<u<∞
0 elsewhere
a joint density function? If so, findP(X≤7), where(X,Y,Z,U)is a random
variable with densityf.
15.Show that the function defined by
f(x,y,z,u)=
24
(1+x+y+z+u)
5
,x>0,y>0,z>0,u>0
and 0 elsewhere is a joint density function.
(a) FindP(X>Y<Z>U).
(b) FindP(X+Y+Z+U≥1).
16.Let(X,Y)have joint density functionfand joint distribution functionF. Suppose
that
f(x
1,y1)f(x2,y2)≤f(x 1,y2)f(x2,y1)
holds forx
1≤a≤x 2andy 1≤b≤y 2. Show that
F(a,b)≤F
1(a)F2(b).
17.Suppose(X,Y,Z)are jointly distributed with density
f(x,y,z)=

g(x)g(y)g(z), x>0,y>0,z>0
0 elsewhere.
FindP(X>Y>Z). Hence find the probability that(x,y,z)Θ∈{X>Y>Z}or
{X<Y<Z}.(Heregis density function onR.)
4.3 INDEPENDENT RANDOM VARIABLES
We recall that the joint distribution of a multiple RV uniquely determines the marginal
distributions of the component random variables, but, in general, knowledge of marginal
distributions is not enough to determine the joint distribution. Indeed, it is quite possible
to have an infinite collection of joint densitiesf
αwith given marginal densities.

Example 1. (Gumbel [39]). Let $f_1,f_2,f_3$ be three PDFs with corresponding DFs $F_1,F_2,F_3$, and let $\alpha$ be a constant, $|\alpha|\le 1$. Define
$$f_\alpha(x_1,x_2,x_3)=f_1(x_1)f_2(x_2)f_3(x_3)\{1+\alpha[2F_1(x_1)-1][2F_2(x_2)-1][2F_3(x_3)-1]\}.$$
We show that $f_\alpha$ is a PDF for each $\alpha$ in $[-1,1]$ and that the collection of densities $\{f_\alpha;\ -1\le\alpha\le 1\}$ has the same marginal densities $f_1,f_2,f_3$. First note that
$$|[2F_1(x_1)-1][2F_2(x_2)-1][2F_3(x_3)-1]|\le 1,$$
so that
$$1+\alpha[2F_1(x_1)-1][2F_2(x_2)-1][2F_3(x_3)-1]\ge 0.$$
Also,
$$\iiint f_\alpha(x_1,x_2,x_3)\,dx_1\,dx_2\,dx_3
=1+\alpha\left\{\int[2F_1(x_1)-1]f_1(x_1)\,dx_1\right\}\left\{\int[2F_2(x_2)-1]f_2(x_2)\,dx_2\right\}\left\{\int[2F_3(x_3)-1]f_3(x_3)\,dx_3\right\}$$
$$=1+\alpha\left\{\Big[F_1^{2}(x_1)\Big]_{-\infty}^{\infty}-1\right\}\left\{\Big[F_2^{2}(x_2)\Big]_{-\infty}^{\infty}-1\right\}\left\{\Big[F_3^{2}(x_3)\Big]_{-\infty}^{\infty}-1\right\}=1.$$
It follows that $f_\alpha$ is a density function. That $f_1,f_2,f_3$ are the marginal densities of $f_\alpha$ follows similarly.
In this section we deal with a very special class of distributions in which the marginal
distributions uniquely determine the joint distribution of a multiple RV. First we consider
the bivariate case.
Let $F(x,y)$ and $F_1(x),F_2(y)$, respectively, be the joint DF of $(X,Y)$ and the marginal DFs of $X$ and $Y$.

Definition 1. We say that $X$ and $Y$ are independent if and only if
$$F(x,y)=F_1(x)F_2(y)\quad\text{for all }(x,y)\in\mathbb{R}^{2}. \qquad (1)$$
Lemma 1. If $X$ and $Y$ are independent and $a<c$, $b<d$ are real numbers, then
$$P\{a<X\le c,\ b<Y\le d\}=P\{a<X\le c\}\,P\{b<Y\le d\}. \qquad (2)$$
Theorem 1. (a) A necessary and sufficient condition for RVs $X,Y$ of the discrete type to be independent is that
$$P\{X=x_i,\ Y=y_j\}=P\{X=x_i\}\,P\{Y=y_j\} \qquad (3)$$
for all pairs $(x_i,y_j)$. (b) Two RVs $X$ and $Y$ of the continuous type are independent if and only if
$$f(x,y)=f_1(x)f_2(y)\quad\text{for all }(x,y)\in\mathbb{R}^{2}, \qquad (4)$$
where $f,f_1,f_2$, respectively, are the joint and marginal densities of $X$ and $Y$, and $f$ is everywhere continuous.

Proof. (a) Let $X,Y$ be independent. Then from Lemma 1, letting $a\to c$ and $b\to d$, we get
$$P\{X=c,\ Y=d\}=P\{X=c\}\,P\{Y=d\}.$$
Conversely,
$$F(x,y)=\sum_{B}P\{X=x_i,\ Y=y_j\},\qquad\text{where } B=\{(i,j): x_i\le x,\ y_j\le y\}.$$
Then
$$F(x,y)=\sum_{B}P\{X=x_i\}P\{Y=y_j\}=\sum_{x_i\le x}\Big[\sum_{y_j\le y}P\{Y=y_j\}\Big]P\{X=x_i\}=F_1(x)F_2(y).$$
The proof of part (b) is left as an exercise.
Corollary. Let $X$ and $Y$ be independent RVs. Then $F_{Y|X}(y\mid x)=F_Y(y)$ for all $y$, and $F_{X|Y}(x\mid y)=F_X(x)$ for all $x$.

Theorem 2. The RVs $X$ and $Y$ are independent if and only if
$$P\{X\in A_1,\ Y\in A_2\}=P\{X\in A_1\}\,P\{Y\in A_2\} \qquad (5)$$
for all Borel sets $A_1$ on the $x$-axis and $A_2$ on the $y$-axis.

Theorem 3. Let $X$ and $Y$ be independent RVs and $f$ and $g$ be Borel-measurable functions. Then $f(X)$ and $g(Y)$ are also independent.

Proof. We have
$$P\{f(X)\le x,\ g(Y)\le y\}=P\{X\in f^{-1}(-\infty,x],\ Y\in g^{-1}(-\infty,y]\}
=P\{X\in f^{-1}(-\infty,x]\}\,P\{Y\in g^{-1}(-\infty,y]\}
=P\{f(X)\le x\}\,P\{g(Y)\le y\}.$$
Note that a degenerate RV is independent of any RV.
Example 2. Let $X$ and $Y$ be jointly distributed with PDF
$$f(x,y)=\begin{cases}\dfrac{1+xy}{4}, & |x|<1,\ |y|<1,\\ 0, & \text{otherwise.}\end{cases}$$
Then $X$ and $Y$ are not independent since $f_1(x)=1/2$, $|x|<1$, and $f_2(y)=1/2$, $|y|<1$, are the marginal densities of $X$ and $Y$, respectively. However, the RVs $X^{2}$ and $Y^{2}$ are independent. Indeed,
$$P\{X^{2}\le u,\ Y^{2}\le v\}=\int_{-\sqrt{v}}^{\sqrt{v}}\int_{-\sqrt{u}}^{\sqrt{u}}f(x,y)\,dx\,dy
=\frac14\int_{-\sqrt{v}}^{\sqrt{v}}\left\{\int_{-\sqrt{u}}^{\sqrt{u}}(1+xy)\,dx\right\}dy
=\sqrt{u}\,\sqrt{v}=P\{X^{2}\le u\}\,P\{Y^{2}\le v\}.$$
Note that $\varphi(X^{2})$ and $\psi(Y^{2})$ are independent, where $\varphi$ and $\psi$ are Borel-measurable functions. But $X$ is not a Borel-measurable function of $X^{2}$.
Example 3. We return to Buffon's needle problem, discussed in Examples 1.2.9 and 1.3.7. Suppose that the RV $R$, which represents the distance from the center of the needle to the nearest line, is uniformly distributed on $(0,l]$. Suppose further that $\Theta$, the angle that the needle forms with this line, is uniformly distributed on $[0,\pi)$. If $R$ and $\Theta$ are assumed to be independent, the joint PDF is given by
$$f_{R,\Theta}(r,\theta)=f_R(r)f_\Theta(\theta)=\begin{cases}\dfrac1l\cdot\dfrac1\pi & \text{if } 0<r\le l,\ 0\le\theta<\pi,\\ 0 & \text{otherwise.}\end{cases}$$
The needle will intersect the nearest line if and only if
$$\frac{l}{2}\sin\Theta\ge R.$$
Therefore, the required probability is given by
$$P\left\{\sin\Theta\ge\frac{2R}{l}\right\}=\int_{0}^{\pi}\int_{0}^{(l/2)\sin\theta}f_{R,\Theta}(r,\theta)\,dr\,d\theta=\frac{1}{l\pi}\int_{0}^{\pi}\frac{l}{2}\sin\theta\,d\theta=\frac{1}{\pi}.$$
Definition 2. A collection of jointly distributed RVs $X_1,X_2,\dots,X_n$ is said to be mutually or completely independent if and only if
$$F(x_1,x_2,\dots,x_n)=\prod_{i=1}^{n}F_i(x_i)\quad\text{for all }(x_1,x_2,\dots,x_n)\in\mathbb{R}^{n}, \qquad (6)$$
where $F$ is the joint DF of $(X_1,X_2,\dots,X_n)$, and $F_i$ $(i=1,2,\dots,n)$ is the marginal DF of $X_i$. The RVs $X_1,\dots,X_n$ are said to be pairwise independent if and only if every pair of them are independent.

It is clear that an analog of Theorem 1 holds, but we leave the reader to construct it.

Example 4. In Example 1 we cannot write
$$f_\alpha(x_1,x_2,x_3)=f_1(x_1)f_2(x_2)f_3(x_3)$$
except when $\alpha=0$. It follows that $X_1$, $X_2$, and $X_3$ are not independent except when $\alpha=0$.
The following result is easy to prove.
Theorem 4.IfX
1,X2,...,X nare independent, every subcollectionX i1
,Xi2
,...,X ik
of
X
1,X2,...,X nis also independent.
Remark 1. It is quite possible for RVs $X_1,X_2,\dots,X_n$ to be pairwise independent without being mutually independent. Let $(X,Y,Z)$ have the joint PMF defined by
$$P\{X=x,Y=y,Z=z\}=\begin{cases}\dfrac{3}{16} & \text{if }(x,y,z)\in\{(0,0,0),(0,1,1),(1,0,1),(1,1,0)\},\\[1ex] \dfrac{1}{16} & \text{if }(x,y,z)\in\{(0,0,1),(0,1,0),(1,0,0),(1,1,1)\}.\end{cases}$$
Clearly, $X,Y,Z$ are not independent (why?). We have
$$P\{X=x,Y=y\}=P\{Y=y,Z=z\}=P\{X=x,Z=z\}=\tfrac14$$
for every pair in $\{(0,0),(0,1),(1,0),(1,1)\}$, and
$$P\{X=x\}=P\{Y=y\}=P\{Z=z\}=\tfrac12,\qquad x,y,z\in\{0,1\}.$$
It follows that $X$ and $Y$, $Y$ and $Z$, and $X$ and $Z$ are pairwise independent.
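A brief enumeration check of Remark 1 (a sketch; the helper `marg` is introduced only for this illustration): every pair of variables factorizes, but the triple does not.

```python
from fractions import Fraction
from itertools import product

# Joint PMF from Remark 1: 3/16 on one set of triples, 1/16 on the complementary set.
heavy = {(0, 0, 0), (0, 1, 1), (1, 0, 1), (1, 1, 0)}
pmf = {t: Fraction(3, 16) if t in heavy else Fraction(1, 16)
       for t in product((0, 1), repeat=3)}

def marg(idx):
    # marginal PMF over the coordinates listed in idx
    out = {}
    for t, p in pmf.items():
        key = tuple(t[i] for i in idx)
        out[key] = out.get(key, Fraction(0)) + p
    return out

pX, pY, pZ, pXY = marg([0]), marg([1]), marg([2]), marg([0, 1])

print(all(pXY[(x, y)] == pX[(x,)] * pY[(y,)] for x, y in pXY))            # True: pairwise
print(all(pmf[(x, y, z)] == pX[(x,)] * pY[(y,)] * pZ[(z,)]
          for x, y, z in pmf))                                            # False: not mutual
```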
Definition 3.A sequence{X
n}of RVs is said to be independent if for everyn=2,3,4,...
the RVsX
1,X2,...,X nare independent.
Similarly, one can speak of an independent family of RVs.
Definition 4.We say that RVsXandYareidentically distributedifXandYhave the
same DF, that is,
F
X(x)=F Y(x)for allx∈R
whereF
XandF Yare the DF’s ofXandY, respectively.
Definition 5.We say that{X
n}is a sequence ofindependent, identically distributed
(iid) RVs with common lawL(X)if{X
n}is an independent sequence of RVs and the
distribution ofX
n(n=1,2...)is the same as that ofX.
According to Definition 4,XandYare identically distributed if and only if they have
the same distribution. It does not follow thatX=Ywith probability 1 (see Problem 7). If
P{X=Y}=1, we say thatXandYareequivalentRVs. All Definition 4 says is thatX
andYare identically distributed if and only if
P{X∈A}=P{Y∈A} for allA∈B.
Nothing is said about the equality of events{X∈A}and{Y∈A}.
Definition 6. Two multiple RVs $(X_1,X_2,\dots,X_m)$ and $(Y_1,Y_2,\dots,Y_n)$ are said to be independent if
$$F(x_1,\dots,x_m,y_1,\dots,y_n)=F_1(x_1,\dots,x_m)\,F_2(y_1,\dots,y_n) \qquad (7)$$
for all $(x_1,\dots,x_m,y_1,\dots,y_n)\in\mathbb{R}^{m+n}$, where $F,F_1,F_2$ are the joint distribution functions of $(X_1,\dots,X_m,Y_1,\dots,Y_n)$, $(X_1,\dots,X_m)$, and $(Y_1,\dots,Y_n)$, respectively.

Of course, the independence of $(X_1,\dots,X_m)$ and $(Y_1,\dots,Y_n)$ does not imply the independence of the components $X_1,\dots,X_m$ of $X$ or the components $Y_1,\dots,Y_n$ of $Y$.

Theorem 5. Let $X=(X_1,\dots,X_m)$ and $Y=(Y_1,\dots,Y_n)$ be independent RVs. Then the component $X_j$ of $X$ $(j=1,2,\dots,m)$ and the component $Y_k$ of $Y$ $(k=1,2,\dots,n)$ are independent RVs. If $h$ and $g$ are Borel-measurable functions, $h(X_1,\dots,X_m)$ and $g(Y_1,\dots,Y_n)$ are independent.

Remark 2. It is possible that an RV $X$ may be independent of $Y$ and also of $Z$, but $X$ may not be independent of the random vector $(Y,Z)$. See the example in Remark 1.

Let $X_1,X_2,\dots,X_n$ be independent and identically distributed RVs with common DF $F$. Then the joint DF $G$ of $(X_1,\dots,X_n)$ is given by
$$G(x_1,x_2,\dots,x_n)=\prod_{j=1}^{n}F(x_j).$$
We note that for any of the $n!$ permutations $(x_{i_1},\dots,x_{i_n})$ of $(x_1,\dots,x_n)$
$$G(x_1,x_2,\dots,x_n)=\prod_{j=1}^{n}F(x_{i_j})=G(x_{i_1},x_{i_2},\dots,x_{i_n}),$$
so that $G$ is a symmetric function of $x_1,\dots,x_n$. Thus $(X_1,\dots,X_n)\overset{d}{=}(X_{i_1},\dots,X_{i_n})$, where $X\overset{d}{=}Y$ means that $X$ and $Y$ are identically distributed RVs.

Definition 7. The RVs $X_1,X_2,\dots,X_n$ are said to be exchangeable if
$$(X_1,X_2,\dots,X_n)\overset{d}{=}(X_{i_1},X_{i_2},\dots,X_{i_n})$$
for all $n!$ permutations $(i_1,\dots,i_n)$ of $(1,2,\dots,n)$. The RVs in the sequence $\{X_n\}$ are said to be exchangeable if $X_1,\dots,X_n$ are exchangeable for each $n$.

Clearly, if $X_1,\dots,X_n$ are exchangeable, then the $X_i$ are identically distributed but not necessarily independent.

Example 5. Suppose $X,Y,Z$ have joint PDF
$$f(x,y,z)=\begin{cases}\tfrac23(x+y+z), & 0<x<1,\ 0<y<1,\ 0<z<1,\\ 0, & \text{otherwise.}\end{cases}$$
Then $X,Y,Z$ are exchangeable but not independent.

Example 6. Let $X_1,X_2,\dots,X_n$ be iid RVs. Let $S_n=\sum_{j=1}^{n}X_j$, $n=1,2,\dots$, and $Y_k=X_k-S_n/n$, $k=1,2,\dots,n-1$. Then $Y_1,Y_2,\dots,Y_{n-1}$ are exchangeable.

Theorem 6. Let $X,Y$ be exchangeable RVs. Then $X-Y$ has a symmetric distribution.

Definition 8. Let $X$ be an RV, and let $X'$ be an RV that is independent of $X$ with $X'\overset{d}{=}X$. We call the RV
$$X^{s}=X-X'$$
the symmetrized $X$.

In view of Theorem 6, $X^{s}$ is symmetric about 0, so that
$$P\{X^{s}\ge 0\}\ge\tfrac12\qquad\text{and}\qquad P\{X^{s}\le 0\}\ge\tfrac12.$$
If $E|X|<\infty$, then $E|X^{s}|\le 2E|X|<\infty$, and $EX^{s}=0$.

The technique of symmetrization is an important tool in the study of probability limit theorems. We will need the following result later. The proof is left to the reader.

Theorem 7. For $\varepsilon>0$,
(a) $P\{|X^{s}|>\varepsilon\}\le 2P\{|X|>\varepsilon/2\}$.
(b) If $a\ge 0$ is such that $P\{X\ge a\}\le 1-p$ and $P\{X\le -a\}\le 1-p$, then
$$P\{|X^{s}|\ge\varepsilon\}\ge p\,P\{|X|>a+\varepsilon\}$$
for $\varepsilon>0$.
PROBLEMS 4.3
1.LetAbe a set ofknumbers, andΩbe the set of all ordered samples of sizenfrom
Awith replacement. Also, letSbe the set of all subsets ofΩ, andPbe a probability
defined onS.LetX
1,X2,...,X nbe RVs defined on(Ω,S,P)by setting
X
i(a1,a2,...,a n)=a i,(i=1,2,...,n).
Show thatX
1,X2,...,X nare independent if and only if each sample point is equally
likely.
2.LetX
1,X2be iid RVs with common PMF
P{X=±1}=
1
2
.
WriteX
3=X1X2. Show thatX 1,X2,X3are pairwise independent but not
independent.

3.Let(X 1,X2,X3)be an RV with joint PMF
f(x
1,x2,x3)=
1
4
if(x
1,x2,x3)∈A,
=0 otherwise,
where
A={(1,0,0),(0,1,0),(0,0,1),(1,1,1)}.
AreX
1,X2,X3independent? AreX 1,X2,X3pairwise independent? AreX 1+X2and
X
3independent?
4.LetXandYbe independent RVs such thatXYis degenerate atcΘ=0. That is,
P(XY=c)=1. Show thatXandYare also degenerate.
5.Let(Ω,S,P)be a probability space andA,B∈S.DefineXandYso that
X(ω)=I
A(ω), Y(ω)=I B(ω)for allω∈Ω.
Show thatXandYare independent if and only ifAandBare independent.
6.LetX
1,X2,...,X nbe a set of exchangeable RVs. Then
E

X
1+X2+···+X k
X1+X2+···+X n

=
k
n
,1≤k≤n.
7.LetXandYbe identically distributed. Construct an example to show thatXandY
need not be equal, that is,P{X=Y}need not equal 1.
8.Prove Lemma 1.
9.LetX
1,X2,...,X nbe RVs with joint PDFf, and letf jbe the marginal PDF ofX j(j=
1,2,...,n). Show thatX
1,X2,...,X nare independent if and only if
f(x
1,x2,...,x n)=
n

j=1
fj(xj)for all(x 1,x2,...x n)∈R n.
10.Suppose two buses, A and B, operate on a route. A person arrives at a certain bus
stop on this route at time 0. LetXandYbe the arrival times of buses A and B,
respectively, at this bus stop. SupposeXandYare independent and have density
functions given, respectively, by
f
1(x)=
1
a
,0≤x≤a,and 0 elsewhere,
f
2(y)=
1
b
,0≤y≤b,and 0 otherwise.
What is the probability that bus A will arrive before bus B?
11. Consider two batteries, one of Brand A and the other of Brand B. Brand A batteries have a length of life with density function
$$f(x)=3\lambda x^{2}\exp(-\lambda x^{3}),\quad x>0,\ \text{and } 0 \text{ elsewhere},$$
whereas Brand B batteries have a length of life with density function
$$g(y)=3\mu y^{2}\exp(-\mu y^{3}),\quad y>0,\ \text{and } 0 \text{ elsewhere}.$$
Brand A and Brand B batteries operate independently and are put to a test. What is the probability that the Brand B battery will outlast the Brand A battery? In particular, what is the probability if $\lambda=\mu$?
12.(a) Let(X,Y)have joint densityf. Show thatXandYare independent if and only
if for some constantk>0 and nonnegative functionsf
1andf 2
f(x,y)=kf 1(x)f2(y)
for allx,y∈R.
(b) LetA={f
X(x)>0},B={f Y(y)>0}, andf X,fYare marginal densities ofX
andY, respectively. Show that ifXandYare independent then{f>0}=A×B.
13.Ifφis the CF ofX, show that the CF ofX
s
is real and even.
14.LetX,Ybe jointly distributed with PDFf(x,y)=(1−x
3
y)/4for|x |<1,|y|<1,
and=0 otherwise. Show thatX
d
=YandX−Yhas a symmetric distribution.
4.4 FUNCTIONS OF SEVERAL RANDOM VARIABLES
LetX
1,X2,...,X nbe RVs defined on a probability space(Ω,S,P). In practice we deal with
functions ofX
1,X2,...,X nsuch asX 1+X2,X1−X2,X1X2,min(X 1,...,X n), and so on. Are
these also RVs? If so, how do we compute their distribution given the joint distribution of
X
1,X2,...,X n?
What functions of $(X_1,X_2,\dots,X_n)$ are RVs?

Theorem 1. Let $g:\mathbb{R}^{n}\to\mathbb{R}^{m}$ be a Borel-measurable function, that is, if $B\in\mathcal{B}_m$, then $g^{-1}(B)\in\mathcal{B}_n$. If $X=(X_1,X_2,\dots,X_n)$ is an $n$-dimensional RV $(n\ge 1)$, then $g(X)$ is an $m$-dimensional RV.

Proof. For $B\in\mathcal{B}_m$,
$$\{g(X_1,\dots,X_n)\in B\}=\{(X_1,\dots,X_n)\in g^{-1}(B)\},$$
and, since $g^{-1}(B)\in\mathcal{B}_n$, it follows that $\{(X_1,\dots,X_n)\in g^{-1}(B)\}\in\mathcal{S}$, which concludes the proof.

In particular, if $g:\mathbb{R}^{n}\to\mathbb{R}^{m}$ is a continuous function, then $g(X_1,X_2,\dots,X_n)$ is an RV.
How do we compute the distribution of $g(X_1,X_2,\dots,X_n)$? There are several ways to go about it. We first consider the method of distribution functions. Suppose that $Y=g(X_1,\dots,X_n)$ is real-valued, and let $y\in\mathbb{R}$. Then
$$P\{Y\le y\}=P\{g(X_1,\dots,X_n)\le y\}
=\begin{cases}\displaystyle\sum_{\{(x_1,\dots,x_n):\,g(x_1,\dots,x_n)\le y\}}P\{X_1=x_1,\dots,X_n=x_n\} & \text{in the discrete case,}\\[2ex]
\displaystyle\int\!\cdots\!\int_{\{(x_1,\dots,x_n):\,g(x_1,\dots,x_n)\le y\}}f(x_1,\dots,x_n)\,dx_1\cdots dx_n & \text{in the continuous case,}\end{cases}$$
where in the continuous case $f$ is the joint PDF of $(X_1,\dots,X_n)$.

In the continuous case we can obtain the PDF of $Y=g(X_1,\dots,X_n)$ by differentiating the DF $P\{Y\le y\}$ with respect to $y$, provided that $Y$ is also of the continuous type. In the discrete case it is easier to compute $P\{g(X_1,\dots,X_n)=y\}$ directly.

We take a few examples.
Example 1. Consider the bivariate negative binomial distribution with PMF
$$P\{X=x,Y=y\}=\frac{(x+y+k-1)!}{x!\,y!\,(k-1)!}\,p_1^{x}p_2^{y}(1-p_1-p_2)^{k},$$
where $x,y=0,1,2,\dots$; $k\ge 1$ is an integer; $p_1,p_2\in(0,1)$; and $p_1+p_2<1$. Let us find the PMF of $U=X+Y$. We introduce an RV $V=Y$ (see Remark 1 below) so that $u=x+y$, $v=y$ represents a one-to-one mapping of $A=\{(x,y): x,y=0,1,2,\dots\}$ onto the set $B=\{(u,v): v=0,1,\dots,u;\ u=0,1,2,\dots\}$ with inverse mapping $x=u-v$, $y=v$. It follows that the joint PMF of $(U,V)$ is given by
$$P\{U=u,V=v\}=\begin{cases}\dfrac{(u+k-1)!}{(u-v)!\,v!\,(k-1)!}\,p_1^{u-v}p_2^{v}(1-p_1-p_2)^{k} & \text{for }(u,v)\in B,\\[1ex] 0 & \text{otherwise.}\end{cases}$$
The marginal PMF of $U$ is given by
$$P\{U=u\}=\frac{(u+k-1)!\,(1-p_1-p_2)^{k}}{(k-1)!\,u!}\sum_{v=0}^{u}\binom{u}{v}p_1^{u-v}p_2^{v}
=\frac{(u+k-1)!\,(1-p_1-p_2)^{k}}{(k-1)!\,u!}(p_1+p_2)^{u}
=\binom{u+k-1}{u}(p_1+p_2)^{u}(1-p_1-p_2)^{k},\qquad u=0,1,2,\dots.$$
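A numerical spot-check of Example 1 (the parameter values `k, p1, p2` are illustrative assumptions): summing the joint PMF over $v$ reproduces the negative binomial PMF with success parameter $p_1+p_2$.

```python
from math import comb

k, p1, p2 = 3, 0.2, 0.3          # assumed parameters for the check
q = 1 - p1 - p2

def joint(x, y):
    # bivariate negative binomial PMF from Example 1
    return comb(x + y + k - 1, x + y) * comb(x + y, x) * p1**x * p2**y * q**k

for u in range(6):
    lhs = sum(joint(u - v, v) for v in range(u + 1))     # P{X + Y = u} by summation
    rhs = comb(u + k - 1, u) * (p1 + p2)**u * q**k       # negative binomial with p1 + p2
    print(u, round(lhs, 10), round(rhs, 10))
```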
Example 2. Let $(X_1,X_2)$ have the uniform distribution on the triangle $\{0\le x_1\le x_2\le 1\}$, that is, $(X_1,X_2)$ has joint density function
$$f(x_1,x_2)=\begin{cases}2, & 0\le x_1\le x_2\le 1,\\ 0, & \text{elsewhere.}\end{cases}$$
Let $Y=X_1+X_2$. Then for $y<0$, $P(Y\le y)=0$, and for $y>2$, $P(Y\le y)=1$. For $0\le y\le 2$, we have
$$P(Y\le y)=P(X_1+X_2\le y)=\iint_{\substack{0\le x_1\le x_2\le 1\\ x_1+x_2\le y}}f(x_1,x_2)\,dx_1\,dx_2.$$

Fig. 1  (a) $\{x_1+x_2\le y,\ 0<x_1\le x_2\le 1,\ 0<y\le 1\}$ and (b) $\{x_1+x_2\le y,\ 0\le x_1\le x_2\le 1,\ 1\le y\le 2\}$.

There are two cases to consider according to whether $0\le y\le 1$ or $1\le y\le 2$ (Fig. 1a and 1b). In the former case,
$$P(Y\le y)=\int_{x_1=0}^{y/2}\left\{\int_{x_2=x_1}^{y-x_1}2\,dx_2\right\}dx_1=2\int_{0}^{y/2}(y-2x_1)\,dx_1=y^{2}/2,$$
and in the latter case,
$$P(Y\le y)=1-P(Y>y)=1-\int_{x_2=y/2}^{1}\left\{\int_{x_1=y-x_2}^{x_2}2\,dx_1\right\}dx_2=1-2\int_{y/2}^{1}(2x_2-y)\,dx_2=1-\frac{(y-2)^{2}}{2}.$$

Hence the density function of $Y$ is given by
$$f_Y(y)=\begin{cases}y, & 0\le y\le 1,\\ 2-y, & 1\le y\le 2,\\ 0, & \text{elsewhere.}\end{cases}$$
The method of distribution functions can also be used in the case when $g$ takes values in $\mathbb{R}^{m}$, $1\le m\le n$, but the integration becomes more involved.
Example 3. Let $X_1$ be the time that a customer takes from getting in line at a service desk in a bank to completion of service, and let $X_2$ be the time she waits in line before she reaches the service desk. Then $X_1\ge X_2$ and $X_1-X_2$ is the service time of the customer. Suppose the joint density of $(X_1,X_2)$ is given by
$$f(x_1,x_2)=\begin{cases}e^{-x_1}, & 0\le x_2\le x_1<\infty,\\ 0, & \text{elsewhere.}\end{cases}$$
Let $Y_1=X_1+X_2$ and $Y_2=X_1-X_2$. Then the joint distribution of $(Y_1,Y_2)$ is given by
$$P(Y_1\le y_1,\ Y_2\le y_2)=\iint_{A}f(x_1,x_2)\,dx_1\,dx_2,$$
where $A=\{(x_1,x_2): x_1+x_2\le y_1,\ x_1-x_2\le y_2,\ 0\le x_2\le x_1<\infty\}$. Clearly, $x_1+x_2\ge x_1-x_2$, so that the set $A$ is as shown in Fig. 2. It follows that

Fig. 2  $\{x_1+x_2\le y_1,\ x_1-x_2\le y_2,\ 0\le x_2\le x_1<\infty\}$.

$$P(Y_1\le y_1,\ Y_2\le y_2)=\int_{x_2=0}^{(y_1-y_2)/2}\left\{\int_{x_1=x_2}^{x_2+y_2}e^{-x_1}\,dx_1\right\}dx_2
+\int_{x_2=(y_1-y_2)/2}^{y_1/2}\left\{\int_{x_1=x_2}^{y_1-x_2}e^{-x_1}\,dx_1\right\}dx_2$$
$$=\int_{0}^{(y_1-y_2)/2}e^{-x_2}(1-e^{-y_2})\,dx_2+\int_{(y_1-y_2)/2}^{y_1/2}\left(e^{-x_2}-e^{-y_1+x_2}\right)dx_2$$
$$=(1-e^{-y_2})\left(1-e^{-(y_1-y_2)/2}\right)+\left(e^{-(y_1-y_2)/2}-e^{-y_1/2}\right)-e^{-y_1}\left(e^{y_1/2}-e^{(y_1-y_2)/2}\right)$$
$$=1-e^{-y_2}-2e^{-y_1/2}+2e^{-(y_1+y_2)/2}.$$
Hence the joint density of $Y_1,Y_2$ is given by
$$f_{Y_1,Y_2}(y_1,y_2)=\begin{cases}\tfrac12 e^{-(y_1+y_2)/2}, & 0\le y_2\le y_1<\infty,\\ 0, & \text{elsewhere.}\end{cases}$$
The marginal densities of $Y_1,Y_2$ are easily obtained as
$$f_{Y_1}(y_1)=e^{-y_1/2}\left(1-e^{-y_1/2}\right)\ \text{for } y_1\ge 0,\ \text{and } 0 \text{ elsewhere};$$
$$f_{Y_2}(y_2)=e^{-y_2}\ \text{for } y_2\ge 0,\ \text{and } 0 \text{ elsewhere}.$$
We next consider the method of transformations. Let $(X_1,\dots,X_n)$ be jointly distributed with continuous PDF $f(x_1,x_2,\dots,x_n)$, and let $\mathbf{y}=g(x_1,\dots,x_n)=(y_1,\dots,y_n)$, where
$$y_i=g_i(x_1,x_2,\dots,x_n),\qquad i=1,2,\dots,n,$$
be a mapping of $\mathbb{R}^{n}$ to $\mathbb{R}^{n}$. Then
$$P\{(Y_1,\dots,Y_n)\in B\}=P\{(X_1,\dots,X_n)\in g^{-1}(B)\}=\int\!\cdots\!\int_{g^{-1}(B)}f(x_1,\dots,x_n)\prod_{i=1}^{n}dx_i,$$
where $g^{-1}(B)=\{\mathbf{x}=(x_1,\dots,x_n)\in\mathbb{R}^{n}: g(\mathbf{x})\in B\}$. Let us choose $B$ to be the $n$-dimensional interval
$$B=B_{\mathbf{y}}=\{(y_1',y_2',\dots,y_n'): -\infty<y_i'\le y_i,\ i=1,2,\dots,n\}.$$
Then the joint DF of $Y$ is given by
$$P\{Y\in B_{\mathbf{y}}\}=G_Y(\mathbf{y})=P\{g_1(X)\le y_1,\dots,g_n(X)\le y_n\}=\int\!\cdots\!\int_{g^{-1}(B_{\mathbf{y}})}f(x_1,\dots,x_n)\prod_{i=1}^{n}dx_i,$$
and (if $G_Y$ is absolutely continuous) the PDF of $Y$ is given by
$$w(\mathbf{y})=\frac{\partial^{n}G_Y(\mathbf{y})}{\partial y_1\,\partial y_2\cdots\partial y_n}$$
at every continuity point $\mathbf{y}$ of $w$. Under certain conditions it is possible to write $w$ in terms of $f$ by making a change of variable in the multiple integral.
Theorem 2. Let $(X_1,X_2,\dots,X_n)$ be an $n$-dimensional RV of the continuous type with PDF $f(x_1,x_2,\dots,x_n)$.
(a) Let
$$y_1=g_1(x_1,\dots,x_n),\quad y_2=g_2(x_1,\dots,x_n),\quad\dots,\quad y_n=g_n(x_1,\dots,x_n)$$
be a one-to-one mapping of $\mathbb{R}^{n}$ into itself, that is, there exists the inverse transformation
$$x_1=h_1(y_1,\dots,y_n),\quad x_2=h_2(y_1,\dots,y_n),\quad\dots,\quad x_n=h_n(y_1,\dots,y_n)$$
defined over the range of the transformation.
(b) Assume that both the mapping and its inverse are continuous.
(c) Assume that the partial derivatives $\partial x_i/\partial y_j$, $1\le i\le n$, $1\le j\le n$, exist and are continuous.
(d) Assume that the Jacobian $J$ of the inverse transformation
$$J=\frac{\partial(x_1,\dots,x_n)}{\partial(y_1,\dots,y_n)}=\begin{vmatrix}\dfrac{\partial x_1}{\partial y_1} & \dfrac{\partial x_1}{\partial y_2} & \cdots & \dfrac{\partial x_1}{\partial y_n}\\ \vdots & \vdots & & \vdots\\ \dfrac{\partial x_n}{\partial y_1} & \dfrac{\partial x_n}{\partial y_2} & \cdots & \dfrac{\partial x_n}{\partial y_n}\end{vmatrix}$$
is different from 0 for $(y_1,y_2,\dots,y_n)$ in the range of the transformation.

Then $(Y_1,Y_2,\dots,Y_n)$ has a joint absolutely continuous DF with PDF given by
$$w(y_1,y_2,\dots,y_n)=|J|\,f\big(h_1(y_1,\dots,y_n),\dots,h_n(y_1,\dots,y_n)\big). \qquad (1)$$
Proof. For $(y_1,y_2,\dots,y_n)\in\mathbb{R}^{n}$, let
$$B=\{(y_1',y_2',\dots,y_n')\in\mathbb{R}^{n}: -\infty<y_i'\le y_i,\ i=1,2,\dots,n\}.$$
Then
$$g^{-1}(B)=\{\mathbf{x}\in\mathbb{R}^{n}: g(\mathbf{x})\in B\}=\{(x_1,\dots,x_n): g_i(\mathbf{x})\le y_i,\ i=1,2,\dots,n\}$$
and
$$G_Y(\mathbf{y})=P\{Y\in B\}=P\{X\in g^{-1}(B)\}=\int\!\cdots\!\int_{g^{-1}(B)}f(x_1,\dots,x_n)\,dx_1\cdots dx_n
=\int_{-\infty}^{y_1}\!\!\cdots\!\int_{-\infty}^{y_n}f\big(h_1(\mathbf{y}),\dots,h_n(\mathbf{y})\big)\left|\frac{\partial(x_1,\dots,x_n)}{\partial(y_1,\dots,y_n)}\right|dy_1\cdots dy_n.$$
Result (1) now follows on differentiation of the DF $G_Y$.
Remark 1. In actual applications we will not know the mapping from $x_1,\dots,x_n$ to $y_1,\dots,y_n$ completely, but one or more of the functions $g_i$ will be known. If only $k$, $1\le k<n$, of the $g_i$'s are known, we introduce arbitrarily $n-k$ functions such that the conditions of the theorem are satisfied. To find the joint marginal density of these $k$ variables we simply integrate the $w$ function over all the $n-k$ variables that were arbitrarily introduced.
Remark 2. An analog of Theorem 2.5.4 holds, which we state without proof.
Let $X=(X_1,X_2,\dots,X_n)$ be an RV of the continuous type with joint PDF $f$, and let $y_i=g_i(x_1,\dots,x_n)$, $i=1,2,\dots,n$, be a mapping of $\mathbb{R}^{n}$ into itself. Suppose that for each $\mathbf{y}$ the transformation $g$ has a finite number $k=k(\mathbf{y})$ of inverses. Suppose further that $\mathbb{R}^{n}$ can be partitioned into $k$ disjoint sets $A_1,A_2,\dots,A_k$ such that the transformation $g$ from $A_i$ $(i=1,2,\dots,k)$ into $\mathbb{R}^{n}$ is one-to-one with inverse transformation
$$x_1=h_{1i}(y_1,\dots,y_n),\quad\dots,\quad x_n=h_{ni}(y_1,\dots,y_n),\qquad i=1,2,\dots,k.$$
Suppose that the first partial derivatives are continuous and that each Jacobian
$$J_i=\begin{vmatrix}\dfrac{\partial h_{1i}}{\partial y_1} & \cdots & \dfrac{\partial h_{1i}}{\partial y_n}\\ \vdots & & \vdots\\ \dfrac{\partial h_{ni}}{\partial y_1} & \cdots & \dfrac{\partial h_{ni}}{\partial y_n}\end{vmatrix}$$
is different from 0 in the range of the transformation. Then the joint PDF of $Y$ is given by
$$w(y_1,y_2,\dots,y_n)=\sum_{i=1}^{k}|J_i|\,f\big(h_{1i}(y_1,\dots,y_n),\dots,h_{ni}(y_1,\dots,y_n)\big).$$
Example 4. Let $X_1,X_2,X_3$ be iid RVs with common exponential density function
$$f(x)=\begin{cases}e^{-x} & \text{if } x>0,\\ 0 & \text{otherwise.}\end{cases}$$
Also, let
$$Y_1=X_1+X_2+X_3,\qquad Y_2=\frac{X_1+X_2}{X_1+X_2+X_3},\qquad Y_3=\frac{X_1}{X_1+X_2}.$$
Then
$$x_1=y_1y_2y_3,\qquad x_2=y_1y_2-x_1=y_1y_2(1-y_3),\qquad x_3=y_1-y_1y_2=y_1(1-y_2).$$
The Jacobian of the transformation is given by
$$J=\begin{vmatrix}y_2y_3 & y_1y_3 & y_1y_2\\ y_2(1-y_3) & y_1(1-y_3) & -y_1y_2\\ 1-y_2 & -y_1 & 0\end{vmatrix}=-y_1^{2}y_2.$$
Note that $0<y_1<\infty$, $0<y_2<1$, and $0<y_3<1$. Thus the joint PDF of $Y_1,Y_2,Y_3$ is given by
$$w(y_1,y_2,y_3)=y_1^{2}y_2\,e^{-y_1}=(2y_2)\left(\tfrac12 y_1^{2}e^{-y_1}\right)(1),\qquad 0<y_1<\infty,\ 0<y_2,y_3<1.$$
It follows that $Y_1,Y_2$, and $Y_3$ are independent.
Example 5. Let $X_1,X_2$ be independent RVs with common density given by
$$f(x)=\begin{cases}1 & \text{if } 0<x<1,\\ 0 & \text{otherwise.}\end{cases}$$
Let $Y_1=X_1+X_2$, $Y_2=X_1-X_2$. Then the Jacobian of the transformation is given by
$$J=\begin{vmatrix}\tfrac12 & \tfrac12\\[0.5ex] \tfrac12 & -\tfrac12\end{vmatrix}=-\frac12,$$

Fig. 3  $\{0<y_1+y_2<2,\ 0<y_1-y_2<2\}$.

and the joint density of $Y_1,Y_2$ (Fig. 3) is given by
$$f_{Y_1,Y_2}(y_1,y_2)=\frac12\,f\!\left(\frac{y_1+y_2}{2}\right)f\!\left(\frac{y_1-y_2}{2}\right)\quad\text{if } 0<\frac{y_1+y_2}{2}<1,\ 0<\frac{y_1-y_2}{2}<1,$$
that is, $f_{Y_1,Y_2}(y_1,y_2)=\tfrac12$ for $(y_1,y_2)\in\{0<y_1+y_2<2,\ 0<y_1-y_2<2\}$.
The marginal PDFs of $Y_1$ and $Y_2$ are given by
$$f_{Y_1}(y_1)=\begin{cases}\displaystyle\int_{-y_1}^{y_1}\tfrac12\,dy_2=y_1, & 0<y_1\le 1,\\[1ex] \displaystyle\int_{y_1-2}^{2-y_1}\tfrac12\,dy_2=2-y_1, & 1<y_1<2,\\[1ex] 0, & \text{otherwise;}\end{cases}
\qquad
f_{Y_2}(y_2)=\begin{cases}\displaystyle\int_{-y_2}^{y_2+2}\tfrac12\,dy_1=y_2+1, & -1<y_2\le 0,\\[1ex] \displaystyle\int_{y_2}^{2-y_2}\tfrac12\,dy_1=1-y_2, & 0<y_2<1,\\[1ex] 0, & \text{otherwise.}\end{cases}$$
Example 6. Let $X_1,X_2,X_3$ be iid RVs with common PDF
$$f(x)=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2},\qquad -\infty<x<\infty.$$
Let $Y_1=(X_1-X_2)/\sqrt2$, $Y_2=(X_1+X_2-2X_3)/\sqrt6$, and $Y_3=(X_1+X_2+X_3)/\sqrt3$. Then
$$x_1=\frac{y_1}{\sqrt2}+\frac{y_2}{\sqrt6}+\frac{y_3}{\sqrt3},\qquad
x_2=-\frac{y_1}{\sqrt2}+\frac{y_2}{\sqrt6}+\frac{y_3}{\sqrt3},\qquad
x_3=-\frac{\sqrt2\,y_2}{\sqrt3}+\frac{y_3}{\sqrt3}.$$
The Jacobian of the transformation is given by
$$J=\begin{vmatrix}\dfrac{1}{\sqrt2} & \dfrac{1}{\sqrt6} & \dfrac{1}{\sqrt3}\\[1ex] -\dfrac{1}{\sqrt2} & \dfrac{1}{\sqrt6} & \dfrac{1}{\sqrt3}\\[1ex] 0 & -\dfrac{\sqrt2}{\sqrt3} & \dfrac{1}{\sqrt3}\end{vmatrix}=1.$$
The joint PDF of $X_1,X_2,X_3$ is given by
$$g(x_1,x_2,x_3)=\frac{1}{(\sqrt{2\pi})^{3}}\exp\left\{-\frac{x_1^{2}+x_2^{2}+x_3^{2}}{2}\right\},\qquad x_1,x_2,x_3\in\mathbb{R}.$$
It is easily checked that
$$x_1^{2}+x_2^{2}+x_3^{2}=y_1^{2}+y_2^{2}+y_3^{2},$$
so that the joint PDF of $Y_1,Y_2,Y_3$ is given by
$$w(y_1,y_2,y_3)=\frac{1}{(\sqrt{2\pi})^{3}}\exp\left\{-\frac{y_1^{2}+y_2^{2}+y_3^{2}}{2}\right\}.$$
It follows that $Y_1,Y_2,Y_3$ are also iid RVs with common PDF $f$.

In Example 6 the transformation used is orthogonal and is known as Helmert's transformation. In fact, we will show in Section 6.5 that under orthogonal transformations iid RVs with the PDF $f$ defined above are transformed into iid RVs with the same PDF.
It is easily verified that
$$y_1^{2}+y_2^{2}=\sum_{j=1}^{3}\left(x_j-\frac{x_1+x_2+x_3}{3}\right)^{2}.$$
We have therefore proved that $(X_1+X_2+X_3)$ is independent of $\sum_{j=1}^{3}\{X_j-[(X_1+X_2+X_3)/3]\}^{2}$. This is a very important result in mathematical statistics, and we will return to it in Section 7.7.
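A short numerical check of the Helmert transformation in Example 6 (sample size and seed are assumptions): the matrix is orthogonal, the sum of squares is preserved, and $Y_1^{2}+Y_2^{2}$ equals the centered sum of squares of the $X_j$.

```python
import numpy as np

# Rows of H give Y1, Y2, Y3 of Example 6.
H = np.array([[1/np.sqrt(2), -1/np.sqrt(2), 0.0],
              [1/np.sqrt(6),  1/np.sqrt(6), -2/np.sqrt(6)],
              [1/np.sqrt(3),  1/np.sqrt(3),  1/np.sqrt(3)]])

print(np.allclose(H @ H.T, np.eye(3)))   # orthogonality: H H^T = I

rng = np.random.default_rng(6)
x = rng.standard_normal((200_000, 3))
y = x @ H.T

print(np.allclose((x**2).sum(1), (y**2).sum(1)))            # sum of squares preserved
ss_centered = ((x - x.mean(axis=1, keepdims=True))**2).sum(axis=1)
print(np.allclose(y[:, 0]**2 + y[:, 1]**2, ss_centered))     # Y1^2 + Y2^2 = centered SS
```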

Example 7. Let $(X,Y)$ be a bivariate normal RV with joint PDF
$$f(x,y)=\frac{1}{2\pi\sigma_1\sigma_2(1-\rho^{2})^{1/2}}\exp\left\{-\frac{1}{2(1-\rho^{2})}\left[\frac{(x-\mu_1)^{2}}{\sigma_1^{2}}-\frac{2\rho(x-\mu_1)(y-\mu_2)}{\sigma_1\sigma_2}+\frac{(y-\mu_2)^{2}}{\sigma_2^{2}}\right]\right\},$$
$-\infty<x<\infty$, $-\infty<y<\infty$; $\mu_1\in\mathbb{R}$, $\mu_2\in\mathbb{R}$; and $\sigma_1>0$, $\sigma_2>0$, $|\rho|<1$.
Let
$$U_1=\sqrt{X^{2}+Y^{2}},\qquad U_2=\frac{X}{Y}.$$
For $u_1>0$,
$$\sqrt{x^{2}+y^{2}}=u_1\quad\text{and}\quad\frac{x}{y}=u_2$$
have two solutions:
$$x_1=\frac{u_1u_2}{\sqrt{1+u_2^{2}}},\quad y_1=\frac{u_1}{\sqrt{1+u_2^{2}}},\qquad\text{and}\qquad x_2=-x_1,\ y_2=-y_1,$$
for any $u_2\in\mathbb{R}$. The Jacobians are given by
$$J_1=J_2=\begin{vmatrix}\dfrac{u_2}{\sqrt{1+u_2^{2}}} & \dfrac{u_1}{(1+u_2^{2})^{3/2}}\\[1ex] \dfrac{1}{\sqrt{1+u_2^{2}}} & -\dfrac{u_1u_2}{(1+u_2^{2})^{3/2}}\end{vmatrix}=-\frac{u_1}{1+u_2^{2}}.$$
It follows from the result in Remark 2 that the joint PDF of $(U_1,U_2)$ is given by
$$w(u_1,u_2)=\begin{cases}\dfrac{u_1}{1+u_2^{2}}\left[f\!\left(\dfrac{u_1u_2}{\sqrt{1+u_2^{2}}},\dfrac{u_1}{\sqrt{1+u_2^{2}}}\right)+f\!\left(\dfrac{-u_1u_2}{\sqrt{1+u_2^{2}}},\dfrac{-u_1}{\sqrt{1+u_2^{2}}}\right)\right] & \text{if } u_1>0,\ u_2\in\mathbb{R},\\[1ex] 0 & \text{otherwise.}\end{cases}$$
In the special case where $\mu_1=\mu_2=0$, $\rho=0$, and $\sigma_1=\sigma_2=\sigma$, we have
$$f(x,y)=\frac{1}{2\pi\sigma^{2}}e^{-(x^{2}+y^{2})/2\sigma^{2}},$$
so that $X$ and $Y$ are independent. Moreover, $f(x,y)=f(-x,-y)$,
and it follows that when $X$ and $Y$ are independent
$$w(u_1,u_2)=\begin{cases}\dfrac{1}{2\pi\sigma^{2}}\,\dfrac{2u_1}{1+u_2^{2}}\,e^{-u_1^{2}/2\sigma^{2}}, & u_1>0,\ -\infty<u_2<\infty,\\[1ex] 0, & \text{otherwise.}\end{cases}$$
Since
$$w(u_1,u_2)=\frac{1}{\pi(1+u_2^{2})}\cdot\frac{u_1}{\sigma^{2}}e^{-u_1^{2}/2\sigma^{2}},$$
it follows that $U_1$ and $U_2$ are independent with marginal PDFs given by
$$w_1(u_1)=\begin{cases}\dfrac{u_1}{\sigma^{2}}e^{-u_1^{2}/2\sigma^{2}}, & u_1>0,\\ 0, & u_1\le 0,\end{cases}
\qquad\text{and}\qquad
w_2(u_2)=\frac{1}{\pi(1+u_2^{2})},\quad -\infty<u_2<\infty,$$
respectively.
An important application of the result in Remark 2 will appear in Theorem 4.7.2.
Theorem 3. Let $(X,Y)$ be an RV of the continuous type with PDF $f$. Let
$$Z=X+Y,\quad U=X-Y,\quad V=XY,\quad\text{and}\quad W=X/Y.$$
Then the PDFs of $Z,U,V$, and $W$ are, respectively, given by
$$f_Z(z)=\int_{-\infty}^{\infty}f(x,z-x)\,dx, \qquad (2)$$
$$f_U(u)=\int_{-\infty}^{\infty}f(u+y,y)\,dy, \qquad (3)$$
$$f_V(v)=\int_{-\infty}^{\infty}f\!\left(x,\frac{v}{x}\right)\frac{1}{|x|}\,dx, \qquad (4)$$
$$f_W(w)=\int_{-\infty}^{\infty}f(xw,x)\,|x|\,dx. \qquad (5)$$
Proof. The proof is left as an exercise.

Corollary. If $X$ and $Y$ are independent with PDFs $f_1$ and $f_2$, respectively, then
$$f_Z(z)=\int_{-\infty}^{\infty}f_1(x)f_2(z-x)\,dx, \qquad (6)$$
$$f_U(u)=\int_{-\infty}^{\infty}f_1(u+y)f_2(y)\,dy, \qquad (7)$$
$$f_V(v)=\int_{-\infty}^{\infty}f_1(x)f_2\!\left(\frac{v}{x}\right)\frac{1}{|x|}\,dx, \qquad (8)$$
$$f_W(w)=\int_{-\infty}^{\infty}f_1(xw)f_2(x)\,|x|\,dx. \qquad (9)$$
Remark 3. Let $F$ and $G$ be two absolutely continuous DFs; then
$$H(x)=\int_{-\infty}^{\infty}F(x-y)\,G'(y)\,dy=\int_{-\infty}^{\infty}G(x-y)\,F'(y)\,dy$$
is also an absolutely continuous DF with PDF
$$H'(x)=\int_{-\infty}^{\infty}F'(x-y)\,G'(y)\,dy=\int_{-\infty}^{\infty}G'(x-y)\,F'(y)\,dy.$$
If
$$F(x)=\sum_{k}p_k\,\varepsilon(x-x_k)\qquad\text{and}\qquad G(x)=\sum_{j}q_j\,\varepsilon(x-y_j)$$
are two DFs, then
$$H(x)=\sum_{k}\sum_{j}p_kq_j\,\varepsilon(x-x_k-y_j)$$
is also a DF of an RV of the discrete type. The DF $H$ is called the convolution of $F$ and $G$, and we write $H=F*G$. Clearly, the operation is commutative and associative; that is, if $F_1,F_2,F_3$ are DFs, $F_1*F_2=F_2*F_1$ and $(F_1*F_2)*F_3=F_1*(F_2*F_3)$. In this terminology, if $X$ and $Y$ are independent RVs with DFs $F$ and $G$, respectively, $X+Y$ has the convolution DF $H=F*G$. Extension to an arbitrary number of independent RVs is obvious.
Finally, we consider a technique based on the MGF or CF which can be used in certain situations to determine the distribution of a function $g(X_1,X_2,\dots,X_n)$ of $X_1,X_2,\dots,X_n$. Let $(X_1,X_2,\dots,X_n)$ be an $n$-variate RV, and $g$ be a Borel-measurable function from $\mathbb{R}^{n}$ to $\mathbb{R}^{1}$.

Definition 1. If $(X_1,\dots,X_n)$ is of the discrete type and $\sum_{x_1,\dots,x_n}|g(x_1,\dots,x_n)|\,P\{X_1=x_1,\dots,X_n=x_n\}<\infty$, then the series
$$Eg(X_1,\dots,X_n)=\sum_{x_1,\dots,x_n}g(x_1,\dots,x_n)\,P\{X_1=x_1,\dots,X_n=x_n\}$$
is called the expected value of $g(X_1,\dots,X_n)$. If $(X_1,\dots,X_n)$ is a continuous type RV with joint PDF $f$, and if
$$\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}|g(x_1,\dots,x_n)|\,f(x_1,\dots,x_n)\prod_{i=1}^{n}dx_i<\infty,$$
then
$$Eg(X_1,\dots,X_n)=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}g(x_1,\dots,x_n)\,f(x_1,\dots,x_n)\prod_{i=1}^{n}dx_i$$
is called the expected value of $g(X_1,\dots,X_n)$.

Let $Y=g(X_1,\dots,X_n)$, and let $h(y)$ be its PDF. If $E|Y|<\infty$, then
$$EY=\int_{-\infty}^{\infty}y\,h(y)\,dy.$$
An analog of Theorem 3.2.1 holds. That is,
$$\int_{-\infty}^{\infty}y\,h(y)\,dy=\int_{-\infty}^{\infty}\cdots\int_{-\infty}^{\infty}g(x_1,\dots,x_n)\,f(x_1,\dots,x_n)\prod_{i=1}^{n}dx_i,$$
in the sense that if either integral exists so does the other, and the two are equal. The result also holds in the discrete case.

Some special functions of interest are $\sum_{j=1}^{n}x_j$, $\prod_{j=1}^{n}x_j^{k_j}$, where $k_1,k_2,\dots,k_n$ are nonnegative integers, $e^{\sum_{j=1}^{n}t_jx_j}$, where $t_1,t_2,\dots,t_n$ are real numbers, and $e^{i\sum_{j=1}^{n}t_jx_j}$, where $i=\sqrt{-1}$.
Definition 2. Let $X_1,X_2,\dots,X_n$ be jointly distributed. If $Ee^{\sum_{j=1}^{n}t_jX_j}$ exists for $|t_j|\le h_j$, $j=1,2,\dots,n$, for some $h_j>0$, $j=1,2,\dots,n$, we write
$$M(t_1,t_2,\dots,t_n)=E\left(e^{t_1X_1+t_2X_2+\cdots+t_nX_n}\right) \qquad (10)$$
and call it the MGF of the joint distribution of $(X_1,X_2,\dots,X_n)$ or, simply, the MGF of $(X_1,X_2,\dots,X_n)$.

Definition 3. Let $t_1,t_2,\dots,t_n$ be real numbers and $i=\sqrt{-1}$. Then the CF of $(X_1,X_2,\dots,X_n)$ is defined by
$$\varphi(t_1,t_2,\dots,t_n)=E\left[\exp\left(i\sum_{j=1}^{n}t_jX_j\right)\right]=E\left[\cos\left(\sum_{j=1}^{n}t_jX_j\right)\right]+iE\left[\sin\left(\sum_{j=1}^{n}t_jX_j\right)\right]. \qquad (11)$$
As in the univariate case, $\varphi(t_1,t_2,\dots,t_n)$ always exists.

We will mostly deal with the MGF even though the condition that it exist for $|t_j|\le h_j$, $j=1,2,\dots,n$, restricts its application considerably. The multivariate MGF (CF) has properties similar to the univariate MGF discussed earlier. We state some of these without proof. For notational convenience we restrict ourselves to the bivariate case.
Theorem 4. The MGF $M(t_1,t_2)$ uniquely determines the joint distribution of $(X,Y)$; conversely, if the MGF exists it is unique.

Corollary. The MGF $M(t_1,t_2)$ completely determines the marginal distributions of $X$ and $Y$. Indeed,
$$M(t_1,0)=E(e^{t_1X})=M_X(t_1), \qquad (12)$$
$$M(0,t_2)=E(e^{t_2Y})=M_Y(t_2). \qquad (13)$$
Theorem 5. If $M(t_1,t_2)$ exists, the moments of all orders of $(X,Y)$ exist and may be obtained from
$$\left.\frac{\partial^{m+n}M(t_1,t_2)}{\partial t_1^{m}\,\partial t_2^{n}}\right|_{t_1=t_2=0}=E(X^{m}Y^{n}). \qquad (14)$$
Thus,
$$\frac{\partial M(0,0)}{\partial t_1}=EX,\quad \frac{\partial M(0,0)}{\partial t_2}=EY,\quad \frac{\partial^{2}M(0,0)}{\partial t_1^{2}}=EX^{2},\quad \frac{\partial^{2}M(0,0)}{\partial t_2^{2}}=EY^{2},\quad \frac{\partial^{2}M(0,0)}{\partial t_1\,\partial t_2}=E(XY),$$
and so on.
A formal definition of moments in the multivariate case will be given in Section 4.5.
Theorem 6. $X$ and $Y$ are independent RVs if and only if
$$M(t_1,t_2)=M(t_1,0)\,M(0,t_2)\quad\text{for all } t_1,t_2\in\mathbb{R}. \qquad (15)$$
Proof. Let $X$ and $Y$ be independent. Then
$$M(t_1,t_2)=E\{e^{t_1X+t_2Y}\}=E(e^{t_1X})E(e^{t_2Y})=M(t_1,0)\,M(0,t_2).$$
Conversely, if $M(t_1,t_2)=M(t_1,0)M(0,t_2)$, then, in the continuous case,
$$\iint e^{t_1x+t_2y}f(x,y)\,dx\,dy=\left\{\int e^{t_1x}f_1(x)\,dx\right\}\left\{\int e^{t_2y}f_2(y)\,dy\right\},$$
that is,
$$\iint e^{t_1x+t_2y}f(x,y)\,dx\,dy=\iint e^{t_1x+t_2y}f_1(x)f_2(y)\,dx\,dy.$$
By the uniqueness of the MGF (Theorem 4) we must have
$$f(x,y)=f_1(x)f_2(y)\quad\text{for all }(x,y)\in\mathbb{R}^{2}.$$
It follows that $X$ and $Y$ are independent. A similar proof is given in the case where $(X,Y)$ is of the discrete type.
The MGF technique uses the uniqueness property of Theorem 4. In order to find the distribution (DF, PDF, or PMF) of $Y=g(X_1,X_2,\dots,X_n)$ we compute the MGF of $Y$ using the definition. If this MGF is of a known kind, then $Y$ must have that kind of distribution. Although the technique applies to the case when $Y$ is an $m$-dimensional RV, $1\le m\le n$, we will mostly use it for the $m=1$ case.
Example 8. Let us first consider a simple case where $X$ has the normal PDF
$$f(x)=\frac{1}{\sqrt{2\pi}}e^{-x^{2}/2},\qquad -\infty<x<\infty.$$
Let $Y=X^{2}$. Then
$$M_Y(s)=Ee^{sX^{2}}=\frac{1}{\sqrt{2\pi}}\int_{-\infty}^{\infty}e^{-\frac12(1-2s)x^{2}}\,dx=\frac{1}{\sqrt{1-2s}},\qquad\text{for } s<1/2.$$
It follows (see Section 5.3 and also Example 2.5.7) that $Y$ has a chi-square PDF
$$w(y)=\frac{1}{\sqrt{2\pi y}}\,e^{-y/2},\qquad y>0.$$
Example 9. Suppose $X_1$ and $X_2$ are independent with the common PDF $f$ of Example 8. Let $Y_1=X_1-X_2$. There are three equivalent ways to use the MGF technique here. Let $Y_2=X_2$. Then rather than compute
$$M(s_1,s_2)=Ee^{s_1Y_1+s_2Y_2},$$
it is simpler to recognize that $Y_1$ is univariate, so
$$M_{Y_1}(s)=Ee^{s(X_1-X_2)}=\left(Ee^{sX_1}\right)\left(Ee^{-sX_2}\right)=e^{s^{2}/2}\,e^{s^{2}/2}=e^{s^{2}}.$$
It follows that $Y_1$ has PDF
$$f_{Y_1}(x)=\frac{1}{\sqrt{4\pi}}\,e^{-x^{2}/4},\qquad -\infty<x<\infty.$$
Note that $M_{Y_1}(s)=M(s,0)$.
Let $Y_3=X_1+X_2$. Let us find the joint distribution of $Y_1$ and $Y_3$. Indeed,
$$E\left(e^{s_1Y_1+s_2Y_3}\right)=E\left(e^{(s_1+s_2)X_1}\,e^{(s_2-s_1)X_2}\right)=E\left(e^{(s_1+s_2)X_1}\right)E\left(e^{(s_2-s_1)X_2}\right)=e^{(s_1+s_2)^{2}/2}\,e^{(s_2-s_1)^{2}/2}=e^{s_1^{2}}\,e^{s_2^{2}},$$
and it follows that $Y_1$ and $Y_3$ are independent RVs with the common PDF $f_{Y_1}$ defined above.
The following result has many applications, as we will see. Example 9 is a special case.

Theorem 7. Let $X_1,X_2,\dots,X_n$ be independent RVs with respective MGFs $M_i(s)$, $i=1,2,\dots,n$. Then the MGF of $Y=\sum_{i=1}^{n}a_iX_i$ for real numbers $a_1,a_2,\dots,a_n$ is given by
$$M_Y(s)=\prod_{i=1}^{n}M_i(a_is).$$
Proof. If $M_i$ exists for $|s|\le h_i$, $h_i>0$, then $M_Y$ exists for $|s|\le\min(h_1,\dots,h_n)$ and
$$M_Y(s)=Ee^{s\sum_{i=1}^{n}a_iX_i}=\prod_{i=1}^{n}Ee^{sa_iX_i}=\prod_{i=1}^{n}M_i(a_is).$$
Corollary. If the $X_i$'s are iid, then the MGF of $Y=\sum_{1}^{n}X_i$ is given by $M_Y(s)=[M(s)]^{n}$.

Remark 4. The converse of Theorem 7 does not hold. We leave the reader to construct an example illustrating this fact.
Example 10. Let $X_1,X_2,\dots,X_m$ be iid RVs with common PMF
$$P\{X=k\}=\binom{n}{k}p^{k}(1-p)^{n-k},\qquad k=0,1,2,\dots,n;\ 0<p<1.$$
Then the MGF of $X_i$ is given by
$$M(t)=(1-p+pe^{t})^{n}.$$
It follows that the MGF of $S_m=X_1+X_2+\cdots+X_m$ is
$$M_{S_m}(t)=\prod_{1}^{m}(1-p+pe^{t})^{n}=(1-p+pe^{t})^{nm},$$
and we see that $S_m$ has the PMF
$$P\{S_m=s\}=\binom{mn}{s}p^{s}(1-p)^{mn-s},\qquad s=0,1,2,\dots,mn.$$
From these examples it is clear that to use this technique effectively one must be able
to recognize the MGF of the function under consideration. In Chapter 5 we will study a
number of commonly occurring probability distributions and derive their MGFs (whenever
they exist). We will have occasion to use Theorem 7 quite frequently.
For integer-valued RVs one can sometimes use PGFs to compute the distribution of
certain functions of a multiple RV.
We emphasize the fact that a CF always exists and analogs of Theorems 4–7 can be
stated in terms of CFs.
PROBLEMS 4.4
1.LetFbe a DF andεbe a positive real number. Show that
Ψ
1(x)=
1
ε

x+ε
x
F(x)dx
and
Ψ
2(x)=
1


x+ε
x−ε
F(x)dx
are also distribution functions.
2.LetX,Ybe iid RVs with common PDF
f(x)=

e
−x
ifx>0,
0ifx≤0.
(a) Find the PDF of RVsX+Y,X−Y,XY,X/Y,min{X,Y},max{X,Y},
min{X,Y}/max{X,Y}, andX/(X+Y).

(b) LetU=X+YandV=X−Y. Find the conditional PDF ofV,givenU =u,
for some fixedu>0.
(c) Show thatUandZ=X/(X+Y)are independent.
3.LetXandYbe independent RVs defined on the space(Ω,S,P).LetXbe uniformly
distributed on(−a,a),a>0, andYbe an RV of the continuous type with densityf,
wherefis continuous and positive onR.LetFbe the DF ofY.Ifu
0∈(−a,a)is a
fixed number, show that
f
Y|X+Y(y|u 0)=



f(y)
F(u0+a)−F(u 0−a)
ifu
0−a<y<u 0+a,
0 otherwise,
wheref
Y|X+Y(y|u 0)is the conditional density function ofY,givenX+Y=u 0.
4.LetXandYbe iid RVs with common PDF
f(x)=

1if0≤x≤1,
0 otherwise.
Find the PDFs of RVsXY,X/Y,min{X,Y},max{X,Y},min{X,Y}/max{X,Y}.
5.LetX
1,X2,X3be iid RVs with common density function
f(x)=

1if0≤x≤1;
0 otherwise.
Show that the PDF ofU=X
1+X2+X3is given by
g(u)=

















u
2
2
, 0≤u<1,
3u−u
2

32
,1≤u<2,
(u−3)
2
2
, 2≤u≤3,
0, elsewhere.
An extension to then-variate case holds.
6.LetXandYbe independent RVs with common geometric PMF
P{X=k}=π(1−π)
k
,k=0,1,2,...;0<π<1.
Also, letM=max{X,Y}. Find the joint distribution ofMandX, the marginal
distribution ofM, and the conditional distribution ofX,givenM.
7.LetXbe a nonnegative RV of the continuous type. The integral part,Y,ofXis dis-
tributed with PMFP{Y=k}=λ
k
e
−λ
/k!,k=0,1,2,...,λ>0; and the fractional
part,Z,ofXhas PDFf
z(z)=1if0≤z≤1, and=0 otherwise. Find the PDF of
X, assuming thatYandZare independent.

8.LetXandYbe independent RVs. If at least one ofXandYis of the continuous
type, show thatX+Yis also continuous. What ifXandYare not independent?
9.LetXandYbe independent integral RVs. Show that
P(t)=P
X(t)PY(t),
whereP,P
X, andP Y, respectively, are the PGFs ofX+Y,X, andY.
10.LetXandYbe independent nonnegative RVs of the continuous type with PDFsf
andg, respectively. Letf(x)=e
−x
ifx>0, and=0ifx≤0, and letgbe arbitrary.
Show that the MGFM(t)ofY, which is assumed to exist, has the property that the
DF ofX/Yis 1−M(−t).
11.LetX,Y,Zhave the joint PDF
f(x,y,z)=

6(1+x+y+z)
−4
if 0<x,0<y,0<z,
0 otherwise.
Find the PDF ofU=X+Y+Z.
12.LetXandYbe iid RVs with common PDF
f(x)=

(x

2π)
−1
e
−(1/2)(logx)
2
,x>0,
0, x≤0.
Find the PDF ofZ=XY.
13.LetXandYbe iid RVs with common PDFfdefined in Example 8. Find the joint
PDF ofUandVin the following cases:
(a)U=

X
2
+Y
2
,V=tan
−1
(X/Y),−(π/2)<V≤(π/2).
(b)U=(X+Y)/2,V=(X−Y)
2
/2.
14.Construct an example to show that even when the MGF ofX+Ycan be written as
a product of the MGF ofXand the MGF ofY,XandYneed not be independent.
15.LetX
1,X2,...,X nbe iid with common PDF
f(x)=
1
(b−a)
,a<x<b,=0otherwise.
Using the distribution function technique show that
(a) The joint PDF ofX
(n)=max(X 1,X2,...,X n), andX
(1)=min(X 1,X2,...,X n)
is given by
u(x,y)=
n(n−1)(x−y)
n−2
(b−a)
n
,a<y<x<b,
and=0 otherwise.
(b) The PDF of $X_{(n)}$ is given by
$$g(z)=\frac{n(z-a)^{n-1}}{(b-a)^{n}},\qquad a<z<b,\ \text{and}\ =0\ \text{otherwise},$$
and that of $X_{(1)}$ by
$$h(z)=\frac{n(b-z)^{n-1}}{(b-a)^{n}},\qquad a<z<b,\ \text{and}\ =0\ \text{otherwise}.$$
16.LetX
1,X2be iid with common Poisson PMF
P(X
i=x)=e
−λ
λ
x
x!
,x=0,1,2,...,i=1,2,
whereλ>0 is a constant. LetX
(2)=max(X 1,X2)andX
(1)=min(X 1,X2).Find
the PMF ofX
(2).
17.LetXhave the binomial PMF
P(X=k)=

n
k

p
k
(1−p)
n−k
,k=0,1,...,n;0<p<1.
LetYbe independent ofXandY
d
=X.FindPMFofU =X+YandW=X−Y.
4.5 COVARIANCE, CORRELATION AND MOMENTS
LetXandYbe jointly distributed on(Ω,S,P). In Section 4.4 we definedEg(X,Y)for
Borel functionsgonR
2. Functions of the formg(x,y)=x
j
y
k
wherejandkare nonnegative
integers are of interest in probability and statistics.
Definition 1.IfE|X
j
Y
k
|<∞for nonnegative integersjandk, we callE(X
j
Y
k
)amoment
of order(j+k)of(X,Y)and write
m
jk=E(X
j
Y
k
). (1)
Clearly,
m
10=EX,m 01=EY
m
20=EX
2
,m 11=EXY,m 02=EY
2
.
'
(2)
Definition 2.IfE


(X−EX)
j
(Y−EY)
k


<∞for nonnegative integersjandk, we call
E
(
(X−EX)
j
(Y−EY)
k
)
acentral moment of order(j+k)and write
μ
jk=E
(
(X−EX)
j
(Y−EY)
k
)
. (3)
Clearly,
μ
10=μ01=0,μ 20=var(X),μ 02=var(Y),
μ
11=E{(X−m 10)(Y−m 01)}.
'
(4)
We see easily that
μ
11=E(XY)−EX EY. (5)

144 MULTIPLE RANDOM VARIABLES
Note that if $X$ and $Y$ increase (or decrease) together, then $(X-EX)(Y-EY)$ should be positive, whereas if $X$ decreases while $Y$ increases (and conversely), then the product should be negative. Hence the average value of $(X-EX)(Y-EY)$, namely $\mu_{11}$, provides a measure of association or joint variation between $X$ and $Y$.

Definition 3. If $E\{(X-EX)(Y-EY)\}$ exists, we call it the covariance between $X$ and $Y$ and write
$$\operatorname{cov}(X,Y)=E\{(X-EX)(Y-EY)\}=E(XY)-EX\,EY. \qquad (6)$$
Recall (Theorem 3.2.8) that $E\{Y-a\}^{2}$ is minimized when we choose $a=EY$, so that $EY$ may be interpreted as the best constant predictor of $Y$. If, instead, we choose to predict $Y$ by a linear function of $X$, say $aX+b$, and measure the error in this prediction by $E\{Y-aX-b\}^{2}$, then we should choose $a$ and $b$ to minimize this so-called mean square error. Clearly, $E(Y-aX-b)^{2}$ is minimized, for any $a$, by choosing $b=E(Y-aX)=EY-aEX$. With this choice of $b$, we find $a$ such that
$$E(Y-aX-b)^{2}=E\{(Y-EY)-a(X-EX)\}^{2}=\sigma_Y^{2}-2a\mu_{11}+a^{2}\sigma_X^{2}$$
is minimum. An easy computation shows that the minimum occurs if we choose
$$a=\frac{\mu_{11}}{\sigma_X^{2}}, \qquad (7)$$
provided $\sigma_X^{2}>0$. Moreover,
$$\min_{a,b}E(Y-aX-b)^{2}=\min_{a}\left(\sigma_Y^{2}-2a\mu_{11}+a^{2}\sigma_X^{2}\right)=\sigma_Y^{2}-\frac{\mu_{11}^{2}}{\sigma_X^{2}}=\sigma_Y^{2}\left[1-\left(\frac{\mu_{11}}{\sigma_X\sigma_Y}\right)^{2}\right]. \qquad (8)$$
Let us write
$$\rho=\frac{\mu_{11}}{\sigma_X\sigma_Y}. \qquad (9)$$
Then (8) shows that predicting $Y$ by a linear function of $X$ reduces the prediction error from $\sigma_Y^{2}$ to $\sigma_Y^{2}(1-\rho^{2})$. We may therefore think of $\rho$ as a measure of the linear dependence between the RVs $X$ and $Y$.

Definition 4. If $EX^{2}$, $EY^{2}$ exist, we define the correlation coefficient between $X$ and $Y$ as
$$\rho=\frac{\operatorname{cov}(X,Y)}{\mathrm{SD}(X)\,\mathrm{SD}(Y)}=\frac{E(XY)-EX\,EY}{\sqrt{EX^{2}-(EX)^{2}}\,\sqrt{EY^{2}-(EY)^{2}}}, \qquad (10)$$
where $\mathrm{SD}(X)$ denotes the standard deviation of the RV $X$.
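A small numerical illustration of (7)-(9) on synthetic data (the data-generating model, sample size, and seed are assumptions): the least-squares slope equals $\operatorname{cov}(X,Y)/\operatorname{var}(X)$, and the resulting mean square error matches $\operatorname{var}(Y)(1-\rho^{2})$.

```python
import numpy as np

rng = np.random.default_rng(7)
x = rng.normal(0, 1, 100_000)
y = 2.0 * x + rng.normal(0, 1, 100_000)      # a synthetic linearly related pair

a = np.cov(x, y, bias=True)[0, 1] / x.var()  # slope from (7)
b = y.mean() - a * x.mean()
rho = np.corrcoef(x, y)[0, 1]

mse = np.mean((y - a * x - b) ** 2)
print(a, b)                                  # close to 2 and 0
print(mse, y.var() * (1 - rho**2))           # the two agree, as in (8)
```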

We note that for any two real numbers $a$ and $b$,
$$|ab|\le\frac{a^{2}+b^{2}}{2},$$
so that $E|XY|<\infty$ if $EX^{2}<\infty$ and $EY^{2}<\infty$.

Definition 5. We say that the RVs $X$ and $Y$ are uncorrelated if $\rho=0$, or equivalently, $\operatorname{cov}(X,Y)=0$.

If $X$ and $Y$ are independent, then from (5) $\operatorname{cov}(X,Y)=0$, and $X$ and $Y$ are uncorrelated. If, however, $\rho=0$, then $X$ and $Y$ are not necessarily independent.
Example 1. Let $U$ and $V$ be two RVs with common mean and common variance. Let $X=U+V$ and $Y=U-V$. Then
$$\operatorname{cov}(X,Y)=E(U^{2}-V^{2})-E(U+V)E(U-V)=0,$$
so that $X$ and $Y$ are uncorrelated but not necessarily independent. See Example 4.4.9.
Let us now study some properties of the correlation coefficient. From the definition we see that $\rho$ (and also $\operatorname{cov}(X,Y)$) is symmetric in $X$ and $Y$.

Theorem 1.
(a) The correlation coefficient $\rho$ between two RVs $X$ and $Y$ satisfies
$$|\rho|\le 1. \qquad (11)$$
(b) The equality $|\rho|=1$ holds if and only if there exist constants $a\ne 0$ and $b$ such that $P\{Y=aX+b\}=1$.

Proof. From (8), since $E(Y-aX-b)^{2}\ge 0$, we must have $1-\rho^{2}\ge 0$, or equivalently, (11) holds.
Equality in (11) holds if and only if $\rho^{2}=1$, or equivalently, $E(Y-aX-b)^{2}=0$ holds. This implies and is implied by $P(Y=aX+b)=1$. Here $a\ne 0$.

Remark 1. From (7) and (9) we note that the signs of $a$ and $\rho$ are the same, so if $\rho=1$ then $P(Y=aX+b)=1$ with $a>0$, and if $\rho=-1$ then $a<0$.

Theorem 2. Let $EX^{2}<\infty$, $EY^{2}<\infty$, and let $U=aX+b$, $V=cY+d$. Then
$$\rho_{X,Y}=\pm\rho_{U,V},$$
where $\rho_{X,Y}$ and $\rho_{U,V}$, respectively, are the correlation coefficients between $X$ and $Y$ and between $U$ and $V$.

Proof. The proof is simple and is left as an exercise.
Example 2. Let $X,Y$ be identically distributed with common PMF
$$P\{X=k\}=\frac1N,\qquad k=1,2,\dots,N\ (N>1).$$
Then
$$EX=EY=\frac{N+1}{2},\qquad EX^{2}=EY^{2}=\frac{(N+1)(2N+1)}{6},$$
so that
$$\operatorname{var}(X)=\operatorname{var}(Y)=\frac{N^{2}-1}{12}.$$
Also,
$$E(XY)=\frac12\left\{EX^{2}+EY^{2}-E(X-Y)^{2}\right\}=\frac{(N+1)(2N+1)}{6}-\frac{E(X-Y)^{2}}{2}.$$
Thus,
$$\operatorname{cov}(X,Y)=\frac{(N+1)(2N+1)}{6}-\frac{E(X-Y)^{2}}{2}-\frac{(N+1)^{2}}{4}=\frac{(N+1)(N-1)}{12}-\frac12E(X-Y)^{2}$$
and
$$\rho_{X,Y}=\frac{(N^{2}-1)/12-E(X-Y)^{2}/2}{(N^{2}-1)/12}=1-\frac{6E(X-Y)^{2}}{N^{2}-1}.$$
If $P\{X=Y\}=1$, then $\rho=1$, and conversely. If $P\{Y=N+1-X\}=1$, then
$$E(X-Y)^{2}=E(2X-N-1)^{2}=4\,\frac{(N+1)(2N+1)}{6}-4\,\frac{(N+1)^{2}}{2}+(N+1)^{2},$$
and it follows that $\rho_{X,Y}=-1$. Conversely, if $\rho_{X,Y}=-1$, from Remark 1 it follows that $Y=-aX+b$ with probability 1 for some $a>0$ and some real number $b$. To find $a$ and $b$, we note that $EY=-aEX+b$, so that $b=[(N+1)/2](1+a)$. Also $EY^{2}=E(b-aX)^{2}$, which yields
$$(1-a^{2})EX^{2}+2abEX-b^{2}=0.$$
Substituting for $b$ in terms of $a$ and the values of $EX^{2}$ and $EX$, we see that $a^{2}=1$, so that $a=1$. Hence $b=N+1$, and it follows that $Y=N+1-X$ with probability 1.

Example 3. Let $(X,Y)$ be jointly distributed with density function
$$f(x,y)=\begin{cases}x+y, & 0<x<1,\ 0<y<1,\\ 0, & \text{otherwise.}\end{cases}$$
Then, for positive integers $l$ and $m$,
$$E(X^{l}Y^{m})=\int_{0}^{1}\!\!\int_{0}^{1}x^{l}y^{m}(x+y)\,dx\,dy=\int_{0}^{1}\!\!\int_{0}^{1}x^{l+1}y^{m}\,dx\,dy+\int_{0}^{1}\!\!\int_{0}^{1}x^{l}y^{m+1}\,dx\,dy=\frac{1}{(l+2)(m+1)}+\frac{1}{(l+1)(m+2)}.$$
Thus
$$EX=EY=\frac{7}{12},\qquad EX^{2}=EY^{2}=\frac{5}{12},\qquad \operatorname{var}(X)=\operatorname{var}(Y)=\frac{5}{12}-\frac{49}{144}=\frac{11}{144},$$
$$\operatorname{cov}(X,Y)=\frac13-\frac{49}{144}=-\frac{1}{144},\qquad \rho=-\frac{1}{11}.$$
Theorem 3. Let $X_1,X_2,\dots,X_n$ be RVs such that $E|X_i|<\infty$, $i=1,2,\dots,n$. Let $a_1,a_2,\dots,a_n$ be real numbers, and write
$$S=a_1X_1+a_2X_2+\cdots+a_nX_n.$$
Then $ES$ exists, and we have
$$ES=\sum_{j=1}^{n}a_jEX_j. \qquad (12)$$
Proof. If $(X_1,X_2,\dots,X_n)$ is of the discrete type, then
$$ES=\sum_{i_1,i_2,\dots,i_n}(a_1x_{i_1}+a_2x_{i_2}+\cdots+a_nx_{i_n})\,P\{X_1=x_{i_1},X_2=x_{i_2},\dots,X_n=x_{i_n}\}$$
$$=a_1\sum_{i_1}x_{i_1}\sum_{i_2,\dots,i_n}P\{X_1=x_{i_1},\dots,X_n=x_{i_n}\}+\cdots+a_n\sum_{i_n}x_{i_n}\sum_{i_1,\dots,i_{n-1}}P\{X_1=x_{i_1},\dots,X_n=x_{i_n}\}$$
$$=a_1\sum_{i_1}x_{i_1}P\{X_1=x_{i_1}\}+\cdots+a_n\sum_{i_n}x_{i_n}P\{X_n=x_{i_n}\}=a_1EX_1+\cdots+a_nEX_n.$$
The existence of $ES$ follows easily by replacing each $a_j$ by $|a_j|$ and each $x_{i_j}$ by $|x_{i_j}|$ and remembering that $E|X_j|<\infty$, $j=1,2,\dots,n$. The case of continuous type $(X_1,X_2,\dots,X_n)$ is similarly treated.
Corollary. Take $a_1=a_2=\cdots=a_n=1/n$. Then
$$E\left(\frac{X_1+X_2+\cdots+X_n}{n}\right)=\frac1n\sum_{i=1}^{n}EX_i,$$
and if $EX_1=EX_2=\cdots=EX_n=\mu$, then
$$E\left(\frac{X_1+X_2+\cdots+X_n}{n}\right)=\mu.$$
Theorem 4. Let $X_1,X_2,\dots,X_n$ be independent RVs such that $E|X_i|<\infty$, $i=1,2,\dots,n$. Then $E\left(\prod_{i=1}^{n}X_i\right)$ exists and
$$E\left(\prod_{i=1}^{n}X_i\right)=\prod_{i=1}^{n}EX_i. \qquad (13)$$
Let $X$ and $Y$ be independent, and $g_1(\cdot)$ and $g_2(\cdot)$ be Borel-measurable functions. Then we know (Theorem 4.3.3) that $g_1(X)$ and $g_2(Y)$ are independent. If $E\{g_1(X)\}$, $E\{g_2(Y)\}$, and $E\{g_1(X)g_2(Y)\}$ exist, it follows from Theorem 4 that
$$E\{g_1(X)g_2(Y)\}=E\{g_1(X)\}\,E\{g_2(Y)\}. \qquad (14)$$
Conversely, if for any Borel sets $A_1$ and $A_2$ we take $g_1(X)=1$ if $X\in A_1$, and $=0$ otherwise, and $g_2(Y)=1$ if $Y\in A_2$, and $=0$ otherwise, then
$$E\{g_1(X)g_2(Y)\}=P\{X\in A_1,\ Y\in A_2\}$$
and $E\{g_1(X)\}=P\{X\in A_1\}$, $E\{g_2(Y)\}=P\{Y\in A_2\}$. Relation (14) implies that for any Borel sets $A_1$ and $A_2$ of real numbers
$$P\{X\in A_1,\ Y\in A_2\}=P\{X\in A_1\}\,P\{Y\in A_2\}.$$
It follows that $X$ and $Y$ are independent if (14) holds. We have thus proved the following theorem.

Theorem 5. Two RVs $X$ and $Y$ are independent if and only if for every pair of Borel-measurable functions $g_1$ and $g_2$ the relation
$$E\{g_1(X)g_2(Y)\}=E\{g_1(X)\}\,E\{g_2(Y)\} \qquad (15)$$
holds, provided that the expectations on both sides of (15) exist.

Theorem 6. Let $X_1,X_2,\dots,X_n$ be RVs with $E|X_i|^{2}<\infty$ for $i=1,2,\dots,n$. Let $a_1,a_2,\dots,a_n$ be real numbers and write $S=\sum_{i=1}^{n}a_iX_i$. Then the variance of $S$ exists and is given by
$$\operatorname{var}(S)=\sum_{i=1}^{n}a_i^{2}\operatorname{var}(X_i)+\sum_{i=1}^{n}\sum_{\substack{j=1\\ j\ne i}}^{n}a_ia_j\operatorname{cov}(X_i,X_j). \qquad (16)$$
If, in particular, $X_1,\dots,X_n$ are such that $\operatorname{cov}(X_i,X_j)=0$ for $i,j=1,2,\dots,n$, $i\ne j$, then
$$\operatorname{var}(S)=\sum_{i=1}^{n}a_i^{2}\operatorname{var}(X_i). \qquad (17)$$
Proof. We have
$$\operatorname{var}(S)=E\left\{\sum_{i=1}^{n}a_iX_i-\sum_{i=1}^{n}a_iEX_i\right\}^{2}
=E\left\{\sum_{i=1}^{n}a_i^{2}(X_i-EX_i)^{2}+\sum_{i\ne j}a_ia_j(X_i-EX_i)(X_j-EX_j)\right\}$$
$$=\sum_{i=1}^{n}a_i^{2}E(X_i-EX_i)^{2}+\sum_{i\ne j}a_ia_jE\{(X_i-EX_i)(X_j-EX_j)\}.$$
If the $X_i$'s satisfy $\operatorname{cov}(X_i,X_j)=0$ for $i,j=1,2,\dots,n$, $i\ne j$, the second term on the right side of (16) vanishes, and we have (17).
Corollary 1. Let $X_1,X_2,\dots,X_n$ be exchangeable RVs with $\operatorname{var}(X_i)=\sigma^{2}$, $i=1,2,\dots,n$. Then
$$\operatorname{var}\left(\sum_{i=1}^{n}a_iX_i\right)=\sigma^{2}\sum_{i=1}^{n}a_i^{2}+\rho\sigma^{2}\sum_{i\ne j}a_ia_j,$$
where $\rho$ is the correlation coefficient between $X_i$ and $X_j$, $i\ne j$. In particular,
$$\operatorname{var}\left(\frac{\sum_{i=1}^{n}X_i}{n}\right)=\frac{\sigma^{2}}{n}+\frac{n-1}{n}\,\rho\sigma^{2}.$$
Corollary 2. If $X_1,X_2,\dots,X_n$ are exchangeable and uncorrelated, then
$$\operatorname{var}\left(\sum_{i=1}^{n}a_iX_i\right)=\sigma^{2}\sum_{i=1}^{n}a_i^{2},
\qquad\text{and}\qquad
\operatorname{var}\left(\frac{\sum_{i=1}^{n}X_i}{n}\right)=\frac{\sigma^{2}}{n}.$$
Theorem 7. Let $X_1,X_2,\dots,X_n$ be iid RVs with common variance $\sigma^{2}$. Also, let $a_1,a_2,\dots,a_n$ be real numbers such that $\sum_{1}^{n}a_i=1$, and let $S=\sum_{i=1}^{n}a_iX_i$. Then the variance of $S$ is least if we choose $a_i=1/n$, $i=1,2,\dots,n$.

Proof. We have
$$\operatorname{var}(S)=\sigma^{2}\sum_{i=1}^{n}a_i^{2},$$
which is least if and only if we choose the $a_i$'s so that $\sum_{i=1}^{n}a_i^{2}$ is smallest, subject to the condition $\sum_{i=1}^{n}a_i=1$. We have
$$\sum_{i=1}^{n}a_i^{2}=\sum_{i=1}^{n}\left(a_i-\frac1n+\frac1n\right)^{2}=\sum_{i=1}^{n}\left(a_i-\frac1n\right)^{2}+\frac2n\sum_{i=1}^{n}\left(a_i-\frac1n\right)+\frac1n=\sum_{i=1}^{n}\left(a_i-\frac1n\right)^{2}+\frac1n,$$
which is minimized for the choice $a_i=1/n$, $i=1,2,\dots,n$.

Note that the result holds if we replace independence by the condition that the $X_i$'s are exchangeable and uncorrelated.
Example 4. Suppose that $r$ balls are drawn one at a time without replacement from a bag containing $n$ white and $m$ black balls. Let $S_r$ be the number of black balls drawn.
Let us define RVs $X_k$ as follows:
$$X_k=\begin{cases}1 & \text{if the $k$th ball drawn is black,}\\ 0 & \text{if the $k$th ball drawn is white,}\end{cases}\qquad k=1,2,\dots,r.$$
Then
$$S_r=X_1+X_2+\cdots+X_r.$$
Also
$$P\{X_k=1\}=\frac{m}{m+n},\qquad P\{X_k=0\}=\frac{n}{m+n}. \qquad (18)$$
Thus $EX_k=m/(m+n)$ and
$$\operatorname{var}(X_k)=\frac{m}{m+n}-\frac{m^{2}}{(m+n)^{2}}=\frac{mn}{(m+n)^{2}}.$$
To compute $\operatorname{cov}(X_j,X_k)$, $j\ne k$, note that the RV $X_jX_k=1$ if the $j$th and the $k$th balls drawn are black, and $=0$ otherwise. Thus
$$E(X_jX_k)=P\{X_j=1,\ X_k=1\}=\frac{m}{m+n}\cdot\frac{m-1}{m+n-1} \qquad (19)$$
and
$$\operatorname{cov}(X_j,X_k)=-\frac{mn}{(m+n)^{2}(m+n-1)}.$$
Thus
$$ES_r=\sum_{k=1}^{r}EX_k=\frac{mr}{m+n}$$
and
$$\operatorname{var}(S_r)=r\,\frac{mn}{(m+n)^{2}}-r(r-1)\,\frac{mn}{(m+n)^{2}(m+n-1)}=\frac{mnr}{(m+n)^{2}(m+n-1)}\,(m+n-r).$$
The reader is asked to satisfy himself that (18) and (19) hold.
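A simulation sketch of Example 4 (the ball counts, number of draws, and seed are illustrative assumptions): the sample mean and variance of the simulated $S_r$ agree with the formulas just derived.

```python
import numpy as np

rng = np.random.default_rng(8)
m, n, r = 5, 7, 4                       # m black balls, n white balls, r draws w/o replacement
N = m + n

# Simulate S_r, the number of black balls among the first r positions of a shuffled bag.
bag = np.array([1] * m + [0] * n)
sims = np.array([rng.permutation(bag)[:r].sum() for _ in range(200_000)])

exact_mean = m * r / N
exact_var = m * n * r * (N - r) / (N**2 * (N - 1))
print(sims.mean(), exact_mean)          # both approximately 1.667
print(sims.var(), exact_var)            # both approximately 0.707
```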
Example 5. Let $X_1,X_2,\dots,X_n$ be independent, and $a_1,a_2,\dots,a_n$ be real numbers such that $\sum a_i=1$. Assume that $EX_i^{2}<\infty$, $i=1,2,\dots,n$, and let $\operatorname{var}(X_i)=\sigma_i^{2}$, $i=1,2,\dots,n$. Write $S=\sum_{i=1}^{n}a_iX_i$. Then $\operatorname{var}(S)=\sum_{i=1}^{n}a_i^{2}\sigma_i^{2}=\sigma$, say. To find weights $a_i$ such that $\sigma$ is minimum, we write
$$\sigma=a_1^{2}\sigma_1^{2}+a_2^{2}\sigma_2^{2}+\cdots+(1-a_1-a_2-\cdots-a_{n-1})^{2}\sigma_n^{2}$$
and differentiate partially with respect to $a_1,a_2,\dots,a_{n-1}$, respectively. We get
$$\frac{\partial\sigma}{\partial a_1}=2a_1\sigma_1^{2}-2(1-a_1-a_2-\cdots-a_{n-1})\sigma_n^{2}=0,\quad\dots,\quad
\frac{\partial\sigma}{\partial a_{n-1}}=2a_{n-1}\sigma_{n-1}^{2}-2(1-a_1-a_2-\cdots-a_{n-1})\sigma_n^{2}=0.$$
It follows that
$$a_j\sigma_j^{2}=a_n\sigma_n^{2},\qquad j=1,2,\dots,n-1,$$
that is, the weights $a_j$, $j=1,2,\dots,n$, should be chosen proportional to $1/\sigma_j^{2}$. The minimum value of $\sigma$ is then
$$\sigma_{\min}=\sum_{i=1}^{n}\frac{k^{2}}{\sigma_i^{4}}\,\sigma_i^{2}=k^{2}\sum_{i=1}^{n}\frac{1}{\sigma_i^{2}},$$
where $k$ is given by $\sum_{j=1}^{n}(k/\sigma_j^{2})=1$. Thus
$$\sigma_{\min}=\frac{1}{\sum_{j=1}^{n}(1/\sigma_j^{2})}=\frac{H}{n},$$
where $H$ is the harmonic mean of the $\sigma_j^{2}$.
We conclude this section with some important moment inequalities. We begin with the simple inequality
$$|a+b|^{r}\le c_r\left(|a|^{r}+|b|^{r}\right), \qquad (20)$$
where $c_r=1$ for $0\le r\le 1$ and $c_r=2^{r-1}$ for $r>1$. For $r=0$ and $r=1$, (20) is trivially true. First note that it is sufficient to prove (20) when $0<a\le b$. Let $0<a\le b$, and write $x=a/b$. Then
$$\frac{(a+b)^{r}}{a^{r}+b^{r}}=\frac{(1+x)^{r}}{1+x^{r}}.$$
Writing $f(x)=(1+x)^{r}/(1+x^{r})$, we see that
$$f'(x)=\frac{r(1+x)^{r-1}}{(1+x^{r})^{2}}\left(1-x^{r-1}\right),$$
where $0<x\le 1$. It follows that $f'(x)>0$ if $r>1$, $=0$ if $r=1$, and $<0$ if $r<1$. Thus
$$\max_{0\le x\le 1}f(x)=f(0)=1\quad\text{if } r\le 1,
\qquad\text{while}\qquad
\max_{0\le x\le 1}f(x)=f(1)=2^{r-1}\quad\text{if } r\ge 1.$$
Note that $|a+b|^{r}\le 2^{r}(|a|^{r}+|b|^{r})$ is trivially true since $|a+b|\le\max(2|a|,2|b|)$.
An immediate application of (20) is the following result.

Theorem 8. Let $X$ and $Y$ be RVs and $r>0$ be a fixed number. If $E|X|^{r}$, $E|Y|^{r}$ are both finite, so also is $E|X+Y|^{r}$.

Proof. Let $a=X$ and $b=Y$ in (20). Taking the expectation on both sides, we see that
$$E|X+Y|^{r}\le c_r\left(E|X|^{r}+E|Y|^{r}\right),$$
where $c_r=1$ if $0<r\le 1$ and $c_r=2^{r-1}$ if $r>1$.
Next we establish Hölder's inequality,

    |xy| ≤ |x|^p/p + |y|^q/q,    (21)

where p and q are positive real numbers such that p > 1 and 1/p + 1/q = 1. Note that for x > 0 the function w = log x is concave. It follows that for x_1, x_2 > 0

    log[tx_1 + (1−t)x_2] ≥ t log x_1 + (1−t) log x_2.

Taking antilogarithms, we get

    x_1^t x_2^{1−t} ≤ tx_1 + (1−t)x_2.

Now we choose x_1 = |x|^p, x_2 = |y|^q, t = 1/p, 1−t = 1/q, where p > 1 and 1/p + 1/q = 1, to get (21).
Theorem 9. Let p > 1, q > 1 so that 1/p + 1/q = 1. Then

    E|XY| ≤ (E|X|^p)^{1/p}(E|Y|^q)^{1/q}.    (22)

Proof. By Hölder's inequality, letting x = X{E|X|^p}^{−1/p}, y = Y{E|Y|^q}^{−1/q}, we get

    |XY| ≤ p^{−1}|X|^p{E|X|^p}^{1/p−1}{E|Y|^q}^{1/q} + q^{−1}|Y|^q{E|Y|^q}^{1/q−1}{E|X|^p}^{1/p}.

Taking the expectation on both sides leads to (22).

Corollary. Taking p = q = 2, we obtain the Cauchy–Schwarz inequality,

    E|XY| ≤ E^{1/2}|X|² · E^{1/2}|Y|².
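A Monte Carlo illustration of (22) may be helpful. The Python sketch below (illustrative only; the dependence between X and Y and the exponents chosen are arbitrary assumptions) estimates both sides of Hölder's inequality, including the Cauchy–Schwarz case p = q = 2.

import numpy as np

rng = np.random.default_rng(1)
n = 500_000
x = rng.normal(size=n)
y = 0.6 * x + rng.normal(size=n)      # dependent X, Y (illustrative construction)

for p in (1.5, 2.0, 3.0):
    q = p / (p - 1)                   # conjugate exponent, 1/p + 1/q = 1
    lhs = np.mean(np.abs(x * y))
    rhs = np.mean(np.abs(x) ** p) ** (1 / p) * np.mean(np.abs(y) ** q) ** (1 / q)
    print(p, lhs <= rhs, lhs, rhs)    # lhs <= rhs in every case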
The final result of this section is an inequality due to Minkowski.
Theorem 10. For p ≥ 1,

    {E|X+Y|^p}^{1/p} ≤ {E|X|^p}^{1/p} + {E|Y|^p}^{1/p}.    (23)

Proof. We have, for p > 1,

    |X+Y|^p ≤ |X||X+Y|^{p−1} + |Y||X+Y|^{p−1}.

Taking expectations and using Hölder's inequality with Y replaced by |X+Y|^{p−1} (p > 1), we have

    E|X+Y|^p ≤ {E|X|^p}^{1/p}{E|X+Y|^{(p−1)q}}^{1/q} + {E|Y|^p}^{1/p}{E|X+Y|^{(p−1)q}}^{1/q}
             = [{E|X|^p}^{1/p} + {E|Y|^p}^{1/p}]·{E|X+Y|^{(p−1)q}}^{1/q}.

Excluding the trivial case in which E|X+Y|^p = 0, and noting that (p−1)q = p, we have, after dividing both sides of the last inequality by {E|X+Y|^p}^{1/q},

    {E|X+Y|^p}^{1/p} ≤ {E|X|^p}^{1/p} + {E|Y|^p}^{1/p},    p > 1.

The case p = 1 being trivial, this establishes (23).
PROBLEMS 4.5
1. Suppose that the RV (X, Y) is uniformly distributed over the region R = {(x, y): 0 < x < y < 1}. Find the covariance between X and Y.
2. Let (X, Y) have the joint PDF given by
    f(x, y) = x² + xy/3 if 0 < x < 1, 0 < y < 2, and = 0 otherwise.
Find all moments of order 2.
3. Let (X, Y) be distributed with joint density
    f(x, y) = (1/4)[1 + xy(x² − y²)] if |x| ≤ 1, |y| ≤ 1, and = 0 otherwise.
Find the MGF of (X, Y). Are X, Y independent? If not, find the covariance between X and Y.
4. For a positive RV X with finite first moment show that (1) E√X ≤ √(EX) and (2) E{1/X} ≥ 1/EX.
5. If X is a nondegenerate RV with finite expectation and such that X ≥ a > 0, then
    E{√(X² − a²)} < √((EX)² − a²).
(Kruskal [56])
6. Show that for x > 0
    ( ∫_x^∞ t e^{−t²/2} dt )² ≤ ( ∫_x^∞ e^{−t²/2} dt )( ∫_x^∞ t² e^{−t²/2} dt ),
and hence that
    ∫_x^∞ e^{−t²/2} dt ≥ (1/2)[(4 + x²)^{1/2} − x] e^{−x²/2}.
7. Given a PDF f that is nondecreasing in the interval a ≤ x ≤ b, show that for any s > 0
    ∫_a^b x^{2s} f(x) dx ≥ [(b^{2s+1} − a^{2s+1})/((2s+1)(b−a))] ∫_a^b f(x) dx,
with the inequality reversed if f is nonincreasing.
8. Derive the Lyapunov inequality (Theorem 3.4.3)
    {E|X|^r}^{1/r} ≤ {E|X|^s}^{1/s},    1 < r < s < ∞,
from Hölder's inequality (22).
9. Let X be an RV with E|X|^r < ∞ for r > 0. Show that the function log E|X|^r is a convex function of r.
10. Show with the help of an example that Theorem 9 is not true for p < 1.
11. Show that the converse of Theorem 8 also holds for independent RVs, that is, if E|X+Y|^r < ∞ for some r > 0 and X and Y are independent, then E|X|^r < ∞, E|Y|^r < ∞.
[Hint: Without loss of generality assume that the median of both X and Y is 0. Show that, for any t > 0, P{|X+Y| > t} > (1/2)P{|X| > t}. Now use the remarks preceding Lemma 3.2.2 to conclude that E|X|^r < ∞.]
12. Let (Ω, S, P) be a probability space, and A_1, A_2, ..., A_n be events in S such that P(∪_{k=1}^n A_k) > 0. Show that
    2 ∑_{1≤j<k≤n} P(A_jA_k) ≥ (∑_{k=1}^n PA_k)² / P(∪_{k=1}^n A_k) − ∑_{k=1}^n PA_k.
(Chung and Erdös [14])
[Hint: Let X_k be the indicator function of A_k, k = 1, 2, ..., n. Use the Cauchy–Schwarz inequality.]
13. Let (Ω, S, P) be a probability space, and A, B ∈ S with 0 < PA < 1, 0 < PB < 1. Define ρ(A, B) by ρ(A, B) = correlation coefficient between the RVs I_A and I_B, where I_A, I_B are the indicator functions of A and B, respectively. Express ρ(A, B) in terms of PA, PB, and P(AB), and conclude that ρ(A, B) = 0 if and only if A and B are independent. What happens if A = B or if A = B^c?
(a) Show that
    ρ(A, B) > 0 ⇔ P{A|B} > PA ⇔ P{B|A} > PB
and
    ρ(A, B) < 0 ⇔ P{A|B} < PA ⇔ P{B|A} < PB.
(b) Show that
    ρ(A, B) = [P(AB)P(A^cB^c) − P(AB^c)P(A^cB)] / (PA·PA^c·PB·PB^c)^{1/2}.
14. Let X_1, X_2, ..., X_n be iid RVs and define
    X̄ = ∑_{i=1}^n X_i/n,    S² = ∑_{i=1}^n (X_i − X̄)²/(n−1).
Suppose that the common distribution is symmetric. Assuming the existence of moments of appropriate order, show that cov(X̄, S²) = 0.
15. Let X, Y be iid RVs with common standard normal density
    f(x) = (1/√(2π)) e^{−x²/2},    −∞ < x < ∞.
Let U = X + Y and V = X² + Y². Find the MGF of the random variable (U, V). Also, find the correlation coefficient between U and V. Are U and V independent?
16. Let X and Y be two discrete RVs:
    P{X = x_1} = p_1,  P{X = x_2} = 1 − p_1;   and   P{Y = y_1} = p_2,  P{Y = y_2} = 1 − p_2.
Show that X and Y are independent if and only if the correlation coefficient between X and Y is 0.
17. Let X and Y be dependent RVs with common means 0, variances 1, and correlation coefficient ρ. Show that
    E{max(X², Y²)} ≤ 1 + √(1 − ρ²).
18. Let X_1, X_2 be independent normal RVs with density functions
    f_i(x) = (1/(σ_i√(2π))) exp{ −(1/2)((x − μ_i)/σ_i)² },    −∞ < x < ∞;  i = 1, 2.
Also let
    Z = X_1 cos θ + X_2 sin θ   and   W = X_2 cos θ − X_1 sin θ.
Find the correlation coefficient between Z and W and show that
    0 ≤ ρ² ≤ [(σ_1² − σ_2²)/(σ_1² + σ_2²)]²,
where ρ denotes the correlation coefficient between Z and W.
19. Let (X_1, X_2, ..., X_n) be an RV such that the correlation coefficient between each pair X_i, X_j, i ≠ j, is ρ. Show that −(n−1)^{−1} ≤ ρ ≤ 1.
20. Let X_1, X_2, ..., X_{m+n} be iid RVs with finite second moment. Let S_k = ∑_{j=1}^k X_j, k = 1, 2, ..., m+n. Find the correlation coefficient between S_n and S_{m+n} − S_m, where n > m.
21. Let f be the PDF of a positive RV, and write
    g(x, y) = f(x+y)/(x+y) if x > 0, y > 0, and = 0 otherwise.
Show that g is a density function in the plane. If the mth moment of f exists for some positive integer m, find EX^m. Compute the means and variances of X and Y and the correlation coefficient between X and Y in terms of moments of f. (Adapted from Feller [26, p. 100].)
22. A die is thrown n+2 times. After each throw a + sign is recorded for 4, 5, or 6, and a − sign for 1, 2, or 3, the signs forming an ordered sequence. Each sign, except the first and the last, is attached to a characteristic RV that assumes the value 1 if both the neighboring signs differ from the one between them and 0 otherwise. Let X_1, X_2, ..., X_n be these characteristic RVs, where X_i corresponds to the (i+1)st sign (i = 1, 2, ..., n) in the sequence. Show that
    E( ∑_{i=1}^n X_i ) = n/4   and   var( ∑_{i=1}^n X_i ) = (5n−2)/16.
23. Let (X, Y) be jointly distributed with PDF f defined by f(x, y) = 1/2 inside the square with corners at the points (0,1), (1,0), (−1,0), (0,−1) in the (x, y)-plane, and f(x, y) = 0 otherwise. Are X, Y independent? Are they uncorrelated?
4.6 CONDITIONAL EXPECTATION

In Section 4.2 we defined the conditional distribution of an RV X, given Y. We showed that, if (X, Y) is of the discrete type, the conditional PMF of X, given Y = y_j, where P{Y = y_j} > 0, is a PMF when considered as a function of the x_i's (for fixed y_j). Similarly, if (X, Y) is an RV of the continuous type with PDF f(x, y) and marginal densities f_1 and f_2, respectively, then, at every point (x, y) at which f is continuous and at which f_2(y) > 0 and is continuous, a conditional density function of X, given Y, exists and may be defined by

    f_{X|Y}(x|y) = f(x, y)/f_2(y).

We also showed that f_{X|Y}(x|y), for fixed y, when considered as a function of x is a PDF in its own right. Therefore, we can (and do) consider the moments of this conditional distribution.

Definition 1. Let X and Y be RVs defined on a probability space (Ω, S, P), and let h be a Borel-measurable function. Then the conditional expectation of h(X), given Y, written as E{h(X)|Y}, is an RV that takes the value E{h(X)|y}, defined by

    E{h(X)|y} = ∑_x h(x)P{X = x|Y = y}          if (X, Y) is of the discrete type and P{Y = y} > 0,
              = ∫_{−∞}^∞ h(x)f_{X|Y}(x|y) dx    if (X, Y) is of the continuous type and f_2(y) > 0,    (1)

when the RV Y assumes the value y.

Needless to say, a similar definition may be given for the conditional expectation E{h(Y)|X}.

It is immediate that E{h(X)|Y} satisfies the usual properties of an expectation provided we remember that E{h(X)|Y} is not a constant but an RV. The following results are easy to prove. We assume existence of the indicated expectations.

    E{c|Y} = c    for any constant c,    (2)
    E{[a_1g_1(X) + a_2g_2(X)]|Y} = a_1E{g_1(X)|Y} + a_2E{g_2(X)|Y}    (3)

for any Borel functions g_1, g_2,

    P(X ≥ 0) = 1 ⟹ E{X|Y} ≥ 0,    (4)
    P(X_1 ≥ X_2) = 1 ⟹ E{X_1|Y} ≥ E{X_2|Y}.    (5)

The statements in (3), (4), and (5) should be understood to hold with probability 1.

    E{X|Y} = E(X),    E{Y|X} = E(Y)    (6)

for independent RVs X and Y.

If φ(X, Y) is a function of X and Y, then

    E{φ(X, Y)|y} = E{φ(X, y)|y},    (7)
    E{ψ(X)φ(X, Y)|X} = ψ(X)E{φ(X, Y)|X}    (8)

for any Borel functions ψ and φ.

Again (8) should be understood as holding with probability 1. Relation (7) is useful as a computational device. See Example 3 below.

The moments of a conditional distribution are defined in the usual manner. Thus, for r ≥ 0, E{X^r|Y} defines the rth moment of the conditional distribution. We can define the central moments of the conditional distribution and, in particular, the variance. There is no difficulty in generalizing these concepts for n-dimensional distributions when n > 2. We leave the reader to furnish the details.
Example 1. An urn contains three red and two green balls. A random sample of two balls is drawn (a) with replacement and (b) without replacement. Let X = 0 if the first ball drawn is green, = 1 if the first ball drawn is red, and let Y = 0 if the second ball drawn is green, = 1 if the second ball drawn is red.

The joint PMF of (X, Y) is given in the following tables:

    (a) With replacement                 (b) Without replacement

             Y = 0   Y = 1                        Y = 0   Y = 1
    X = 0    4/25    6/25    2/5          X = 0   2/20    6/20    2/5
    X = 1    6/25    9/25    3/5          X = 1   6/20    6/20    3/5
             2/5     3/5     1                    2/5     3/5     1

The conditional PMFs and the conditional expectations are as follows:

(a) P{X = 0|Y = 0} = 2/5, P{X = 1|Y = 0} = 3/5; P{X = 0|Y = 1} = 2/5, P{X = 1|Y = 1} = 3/5; and likewise P{Y = 0|X = x} = 2/5, P{Y = 1|X = x} = 3/5 for x = 0, 1. Hence

    E{X|Y = 0} = E{X|Y = 1} = 3/5   and   E{Y|X = 0} = E{Y|X = 1} = 3/5.

(b) P{X = 0|Y = 0} = 1/4, P{X = 1|Y = 0} = 3/4; P{X = 0|Y = 1} = 1/2, P{X = 1|Y = 1} = 1/2; P{Y = 0|X = 0} = 1/4, P{Y = 1|X = 0} = 3/4; P{Y = 0|X = 1} = 1/2, P{Y = 1|X = 1} = 1/2. Hence

    E{X|Y = 0} = 3/4,  E{X|Y = 1} = 1/2,  E{Y|X = 0} = 3/4,  E{Y|X = 1} = 1/2.
Example 2. For the RV (X, Y) considered in Examples 4.2.5 and 4.2.7,

    E{Y|x} = ∫_x^1 y f_{Y|X}(y|x) dy = (1/2)(1 − x²)/(1 − x) = (1 + x)/2,    0 < x < 1,

and

    E{X|y} = ∫_0^y x f_{X|Y}(x|y) dx = y/2,    0 < y < 1.

Also,

    E{X²|y} = ∫_0^y x²(1/y) dx = y²/3,    0 < y < 1,

and

    var{X|y} = E{X²|y} − [E{X|y}]² = y²/3 − y²/4 = y²/12,    0 < y < 1.
Theorem 1. Let Eh(X) exist. Then

    Eh(X) = E{E{h(X)|Y}}.    (9)

Proof. Let (X, Y) be of the discrete type. Then

    E{E{h(X)|Y}} = ∑_y [ ∑_x h(x)P{X = x|Y = y} ] P{Y = y}
                 = ∑_y ∑_x h(x)P{X = x, Y = y}
                 = ∑_x h(x) ∑_y P{X = x, Y = y}
                 = Eh(X).

The proof in the continuous case is similar.

Theorem 1 is quite useful in the computation of Eh(X) in many applications.
Example 3. Let X and Y be independent continuous type RVs with respective PDFs f and g, and DFs F and G. Then P{X < Y} is of interest in many statistical applications. In view of Theorem 1,

    P(X < Y) = E I_{X<Y} = E( E( I_{X<Y} | Y ) ),

where I_A is the indicator function of event A. Now

    E( I_{X<Y} | Y = y ) = E( I_{[X<y]} | y ) = E( I_{[X<y]} ) = F(y),

and it follows that

    P{X < Y} = E{F(Y)} = ∫_{−∞}^∞ F(y)g(y) dy.

If, in particular, X and Y have the same distribution, then

    P{X < Y} = ∫_{−∞}^∞ F(y)f(y) dy = 1/2.

More generally,

    P{X − Y ≤ z} = E( E( I_{X−Y≤z} | Y ) ) = E{F(Y + z)} = ∫_{−∞}^∞ F(y + z)g(y) dy

gives the DF of Z = X − Y as computed in the corollary to Theorem 4.4.3.
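The identity P{X < Y} = E F(Y) is easily checked by simulation. The sketch below (illustrative only; the exponential rates a and b are arbitrary assumptions, for which the exact answer is a/(a+b)) estimates P{X < Y} directly and by conditioning on Y as in the example.

import numpy as np

rng = np.random.default_rng(2)
n = 500_000
a, b = 1.0, 2.0                         # rates of X and Y (assumed values)
x = rng.exponential(1 / a, n)
y = rng.exponential(1 / b, n)

p_sim = np.mean(x < y)                  # direct Monte Carlo estimate
p_cond = np.mean(1 - np.exp(-a * y))    # E F(Y), conditioning on Y as in the text
print(p_sim, p_cond, a / (a + b))       # all approximately 1/3

# When X and Y have the same distribution, P{X < Y} = 1/2:
print(np.mean(rng.exponential(1, n) < rng.exponential(1, n)))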
Example 4. Consider the joint PDF

    f(x, y) = xe^{−x(1+y)},    x ≥ 0, y ≥ 0,  and 0 otherwise,

of (X, Y). Then

    f_X(x) = e^{−x},  x ≥ 0,  and 0 otherwise,
    f_Y(y) = 1/(1 + y)²,  y ≥ 0,  and 0 otherwise.

Clearly, EY does not exist, but

    E{Y|x} = ∫_0^∞ y·xe^{−xy} dy = 1/x.
Theorem 2. If EX² < ∞, then

    var(X) = var(E{X|Y}) + E(var{X|Y}).    (10)

Proof. The right-hand side of (10) equals, by definition,

    {E(E{X|Y})² − [E(E{X|Y})]²} + E( E{X²|Y} − (E{X|Y})² )
        = {E(E{X|Y})² − (EX)²} + EX² − E(E{X|Y})²
        = var(X).

Corollary. If EX² < ∞, then

    var(X) ≥ var(E{X|Y})    (11)

with equality if and only if X is a function of Y.

Equation (11) follows immediately from (10). The equality in (11) holds if and only if

    E(var{X|Y}) = E(X − E{X|Y})² = 0,

which holds if and only if, with probability 1,

    X = E{X|Y}.    (12)
Example 5. Let X_1, X_2, ... be iid RVs and let N be a positive integer-valued RV. Let S_N = ∑_{k=1}^N X_k and suppose that the X's and N are independent. Then

    E(S_N) = E{E{S_N|N}}.

Now

    E{S_N|N = n} = E{S_n|N = n} = nEX_1,

so that

    E(S_N) = E{N·EX_1} = (EN)(EX_1).

Again, we have assumed above and below that all indicated expectations exist. Also,

    var(S_N) = var(E{S_N|N}) + E(var{S_N|N}).

First,

    var(E{S_N|N}) = var(N·EX_1) = (EX_1)² var(N).

Second,

    var{S_N|N = n} = n var(X_1),

so

    E(var{S_N|N}) = (EN) var(X_1).

It follows that

    var(S_N) = (EX_1)² var(N) + (EN) var(X_1).
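The two formulas for the random sum S_N are easy to verify numerically. The Python sketch below (illustrative only; the Poisson parameter and the normal mean and variance are arbitrary assumptions) simulates S_N and compares its sample mean and variance with (EN)(EX_1) and (EX_1)²var(N) + (EN)var(X_1).

import numpy as np

rng = np.random.default_rng(3)
lam = 4.0                         # N ~ Poisson(lam), so EN = var(N) = lam (assumed choice)
mu, sig2 = 2.0, 9.0               # X_i ~ Normal(mu, sig2), independent of N

reps = 100_000
n_vals = rng.poisson(lam, reps)
s = np.array([rng.normal(mu, np.sqrt(sig2), k).sum() for k in n_vals])

print(s.mean(), lam * mu)                     # E S_N
print(s.var(), mu ** 2 * lam + lam * sig2)    # var S_N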
PROBLEMS 4.6
1. Let X be an RV with PDF given by
    f(x) = (1/(σ√(2π))) exp{ −(x − μ)²/(2σ²) },    −∞ < x < ∞, −∞ < μ < ∞, σ > 0.
Find E{X|a < X < b}, where a and b are constants.
2. (a) Let (X, Y) be jointly distributed with density
    f(x, y) = y(1 + x)^{−4} e^{−y(1+x)^{−1}},  x, y ≥ 0,  and = 0 otherwise.
Find E{Y|X}.
(b) Do the same for the joint density
    f(x, y) = (4/5)(x + 3y)e^{−x−2y},  x, y ≥ 0,  and = 0 otherwise.
3. Let (X, Y) be jointly distributed with bivariate normal density
    f(x, y) = [1/(2πσ_1σ_2√(1−ρ²))] exp{ −[1/(2(1−ρ²))][ ((x−μ_1)/σ_1)² − 2ρ((x−μ_1)/σ_1)((y−μ_2)/σ_2) + ((y−μ_2)/σ_2)² ] }.
Find E{X|y} and E{Y|x}. (Here μ_1, μ_2 ∈ R, σ_1, σ_2 > 0, and |ρ| < 1.)
4. Find E{Y − E{Y|X}}².
5. Show that E(Y − φ(X))² is minimized by choosing φ(X) = E{Y|X}.
6. Let X have PMF
    P_λ(X = x) = λ^x e^{−λ}/x!,    x = 0, 1, 2, ...,
and suppose that λ is a realization of an RV Λ with PDF
    f(λ) = e^{−λ},    λ > 0.
Find E{e^{−Λ}|X = 1}.
7. Find E(XY) by conditioning on X or Y for the following cases:
(a) f(x, y) = xe^{−x(1+y)},  x > 0, y > 0, and 0 otherwise.
(b) f(x, y) = 2,  0 ≤ y ≤ x ≤ 1, and zero otherwise.
8. Suppose X has uniform PDF f(x) = 1, 0 ≤ x ≤ 1, and 0 otherwise. Let Y be chosen from the interval (0, X] according to the PDF
    g(y|x) = 1/x,    0 < y ≤ x,  and 0 otherwise.
Find E{Y^k|X} and EY^k for any fixed constant k > 0.
4.7 ORDER STATISTICS AND THEIR DISTRIBUTIONS
Let (X_1, X_2, ..., X_n) be an n-dimensional random variable and (x_1, x_2, ..., x_n) be an n-tuple assumed by (X_1, X_2, ..., X_n). Arrange (x_1, x_2, ..., x_n) in increasing order of magnitude so that

    x_(1) ≤ x_(2) ≤ ··· ≤ x_(n),

where x_(1) = min(x_1, x_2, ..., x_n), x_(2) is the second smallest value in x_1, x_2, ..., x_n, and so on, x_(n) = max(x_1, x_2, ..., x_n). If any two x_i, x_j are equal, their order does not matter.

Definition 1. The function X_(k) of (X_1, X_2, ..., X_n) that takes on the value x_(k) in each possible sequence (x_1, x_2, ..., x_n) of values assumed by (X_1, X_2, ..., X_n) is known as the kth order statistic or statistic of order k. {X_(1), X_(2), ..., X_(n)} is called the set of order statistics for (X_1, X_2, ..., X_n).
Example 1.LetX
1,X2,X3be three RVs of the discrete type. Also, letX 1,X3take on values
0, 1, andX
2take on values 1, 2, 3. Then the RV(X 1,X2,X3)assumes these triplets of values:
(0,1,0),(0,2,0),(0,3,0),(0,1,1),(0,2,1),(0,3,1),(1,1,0),(1,2,0),(1,3,0),(1,1,1),
(1,2,1),(1,3,1);X
(1)takes on values 0, 1;X
(2)takes on values 0, 1; andX(3)takes on
values 1, 2, 3.
Theorem 1.Let(X
1,X2,...,X n)be ann-dimensional RV. LetX
(k),1≤k≤n, be the order
statistic of orderk. ThenX
(k)is also an RV.
Statistical considerations such as sufficiency, completeness, invariance, and ancillarity
(Chapter 8) lead to the consideration of order statistics in problems of statistical inference.
Order statistics are particularly useful in nonparametric statistics (Chapter 13) where, for
example, many test procedures are based on ranks of observations. Many of these methods
require the distribution of the ordered observations which we now study.
In the following we assume that X_1, X_2, ..., X_n are iid RVs. In the discrete case there is no magic formula to compute the distribution of any X_(j) or any of the joint distributions. A direct computation is the best course of action.

Example 2. Suppose the X_n's are iid with geometric PMF

    p_k = P(X = k) = pq^{k−1},    k = 1, 2, ...,  0 < p < 1, q = 1 − p.

Then for any integers x ≥ 1 and r ≥ 1,

    P{X_(r) = x} = P{X_(r) ≤ x} − P{X_(r) ≤ x − 1}.

Now

    P{X_(r) ≤ x} = P{at least r of the X's are ≤ x} = ∑_{i=r}^n C(n, i)[P(X_1 ≤ x)]^i [P(X_1 > x)]^{n−i}

and

    P{X_1 ≥ x} = ∑_{k=x}^∞ pq^{k−1} = (1 − p)^{x−1}.

It follows that

    P{X_(r) = x} = ∑_{i=r}^n C(n, i) q^{(x−1)(n−i)} ( q^{n−i}[1 − q^x]^i − [1 − q^{x−1}]^i ),

x = 1, 2, .... In particular, let n = r = 2. Then

    P{X_(2) = x} = pq^{x−1}{ pq^{x−1} + 2 − 2q^{x−1} },    x ≥ 1.

Also, for integers x, y ≥ 1 we have

    P( X_(1) = x, X_(2) − X_(1) = y ) = P{X_(1) = x, X_(2) = x + y}
        = P{X_1 = x, X_2 = x + y} + P{X_1 = x + y, X_2 = x}
        = 2pq^{x−1}·pq^{x+y−1} = 2pq^{2x−2}·pq^y,

and

    P{X_(1) = 1, X_(2) − X_(1) = 0} = P{X_(1) = X_(2) = 1} = p².

It follows that X_(1) and X_(2) − X_(1) are independent RVs.
In the following we assume that X_1, X_2, ..., X_n are iid RVs of the continuous type with PDF f. Let {X_(1), X_(2), ..., X_(n)} be the set of order statistics for X_1, X_2, ..., X_n. Since the X_i are all continuous type RVs, it follows with probability 1 that

    X_(1) < X_(2) < ··· < X_(n).

Theorem 2. The joint PDF of (X_(1), X_(2), ..., X_(n)) is given by

    g(x_(1), x_(2), ..., x_(n)) = n! ∏_{i=1}^n f(x_(i))  if x_(1) < x_(2) < ··· < x_(n),
                                = 0 otherwise.    (1)

Proof. The transformation from (X_1, X_2, ..., X_n) to (X_(1), X_(2), ..., X_(n)) is not one-to-one. In fact, there are n! possible arrangements of x_1, x_2, ..., x_n in increasing order of magnitude. Thus there are n! inverses to the transformation. For example, one of the n! permutations might be

    x_4 < x_1 < x_{n−1} < x_3 < ··· < x_n < x_2,

then the corresponding inverse is

    x_4 = x_(1), x_1 = x_(2), x_{n−1} = x_(3), x_3 = x_(4), ..., x_n = x_(n−1), x_2 = x_(n).

The Jacobian of this transformation is the determinant of an n×n identity matrix with rows rearranged, since each x_(i) equals one and only one of x_1, x_2, ..., x_n. Therefore J = ±1 and

    g(x_(2), x_(n), x_(4), x_(1), ..., x_(3), x_(n−1)) = |J| ∏_{i=1}^n f(x_(i)),    x_(1) < x_(2) < ··· < x_(n).

The same expression holds for each of the n! arrangements. It follows (see Remark 2) that

    g(x_(1), x_(2), ..., x_(n)) = ∑_{all n! inverses} ∏_{i=1}^n f(x_(i))
        = n! f(x_(1))f(x_(2))···f(x_(n))  if x_(1) < x_(2) < ··· < x_(n),
          0 otherwise.
Example 3. Let X_1, X_2, X_3, X_4 be iid RVs with PDF f. The joint PDF of X_(1), X_(2), X_(3), X_(4) is

    g(y_1, y_2, y_3, y_4) = 4! f(y_1)f(y_2)f(y_3)f(y_4),  y_1 < y_2 < y_3 < y_4,  and = 0 otherwise.

Let us compute the marginal PDF of X_(2). We have

    g_2(y_2) = 4! ∫∫∫ f(y_1)f(y_2)f(y_3)f(y_4) dy_1 dy_3 dy_4
             = 4! f(y_2) ∫_{−∞}^{y_2} { ∫_{y_2}^∞ [ ∫_{y_3}^∞ f(y_4) dy_4 ] f(y_3) dy_3 } f(y_1) dy_1
             = 4! f(y_2) ∫_{−∞}^{y_2} { ∫_{y_2}^∞ [1 − F(y_3)] f(y_3) dy_3 } f(y_1) dy_1
             = 4! f(y_2) ∫_{−∞}^{y_2} ( [1 − F(y_2)]²/2 ) f(y_1) dy_1
             = 4! f(y_2) ( [1 − F(y_2)]²/2! ) F(y_2),    y_2 ∈ R.

The procedure for computing the marginal PDF of X_(r), the rth-order statistic of X_1, X_2, ..., X_n, is similar. The following theorem summarizes the result.

Theorem 3. The marginal PDF of X_(r) is given by

    g_r(y_r) = [n!/((r−1)!(n−r)!)] [F(y_r)]^{r−1}[1 − F(y_r)]^{n−r} f(y_r),    (2)

where F is the common DF of X_1, X_2, ..., X_n.

Proof.

    g_r(y_r) = n! f(y_r) ∫_{−∞}^{y_r} ∫_{−∞}^{y_{r−1}} ··· ∫_{−∞}^{y_2} ∫_{y_r}^∞ ∫_{y_{r+1}}^∞ ··· ∫_{y_{n−1}}^∞ ∏_{i≠r} f(y_i) dy_n ··· dy_{r+1} dy_1 ··· dy_{r−1}
             = n! f(y_r) { [1 − F(y_r)]^{n−r}/(n−r)! } ∫_{−∞}^{y_2} ··· ∫_{−∞}^{y_r} ∏_{i=1}^{r−1} [f(y_i) dy_i]
             = n! f(y_r) { [1 − F(y_r)]^{n−r}/(n−r)! } { [F(y_r)]^{r−1}/(r−1)! },

as asserted.
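Formula (2) can be compared with a simulated histogram. The following Python sketch (illustrative only; the exponential parent distribution, the values of n and r, and the grid are arbitrary choices) sorts simulated samples and compares the empirical density of X_(r) with (2).

import numpy as np
from math import comb

rng = np.random.default_rng(4)
n, r = 5, 2
samples = np.sort(rng.exponential(1.0, size=(200_000, n)), axis=1)[:, r - 1]

def g_r(y):
    F = 1 - np.exp(-y)           # DF of Exp(1)
    f = np.exp(-y)               # PDF of Exp(1)
    # n!/((r-1)!(n-r)!) = n * C(n-1, r-1)
    return n * comb(n - 1, r - 1) * F ** (r - 1) * (1 - F) ** (n - r) * f

hist, edges = np.histogram(samples, bins=200, range=(0, 5), density=True)
centers = 0.5 * (edges[:-1] + edges[1:])
for y in (0.1, 0.5, 1.0, 1.5, 2.0):
    i = np.argmin(np.abs(centers - y))
    print(round(y, 2), round(hist[i], 3), round(float(g_r(y)), 3))   # empirical vs formula (2)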
We now compute the joint PDF of X_(j) and X_(k), 1 ≤ j < k ≤ n.

Theorem 4. The joint PDF of X_(j) and X_(k) is given by

    g_{jk}(y_j, y_k) = [n!/((j−1)!(k−j−1)!(n−k)!)] F^{j−1}(y_j)[F(y_k) − F(y_j)]^{k−j−1}[1 − F(y_k)]^{n−k} f(y_j)f(y_k)  if y_j < y_k,
                     = 0 otherwise.    (3)

Proof.

    g_{jk}(y_j, y_k) = ∫_{−∞}^{y_j} ··· ∫_{−∞}^{y_2} ∫_{y_j}^{y_k} ··· ∫_{y_{k−2}}^{y_k} ∫_{y_k}^∞ ··· ∫_{y_{n−1}}^∞ n! f(y_1)···f(y_n)
                           · dy_n ··· dy_{k+1} dy_{k−1} ··· dy_{j+1} dy_1 ··· dy_{j−1}
        = n! ∫_{−∞}^{y_j} ··· ∫_{−∞}^{y_2} ∫_{y_j}^{y_k} ··· ∫_{y_{k−2}}^{y_k} { [1 − F(y_k)]^{n−k}/(n−k)! } f(y_1)f(y_2)···f(y_k)
              · dy_{k−1} ··· dy_{j+1} dy_1 ··· dy_{j−1}
        = n! { [1 − F(y_k)]^{n−k}/(n−k)! } f(y_k) ∫_{−∞}^{y_j} ··· ∫_{−∞}^{y_2} { [F(y_k) − F(y_j)]^{k−j−1}/(k−j−1)! } f(y_1)f(y_2)···f(y_j) dy_1 ··· dy_{j−1}
        = [n!/((n−k)!(k−j−1)!)] [1 − F(y_k)]^{n−k}[F(y_k) − F(y_j)]^{k−j−1} f(y_k)f(y_j) { [F(y_j)]^{j−1}/(j−1)! },    y_j < y_k,

as asserted.

In a similar manner we can show that the joint PDF of X_(j_1), ..., X_(j_k), 1 ≤ j_1 < j_2 < ··· < j_k ≤ n, 1 ≤ k ≤ n, is given by

    g_{j_1, j_2, ..., j_k}(y_1, y_2, ..., y_k) = [n!/((j_1−1)!(j_2−j_1−1)!···(n−j_k)!)]
        · F^{j_1−1}(y_1)f(y_1)[F(y_2) − F(y_1)]^{j_2−j_1−1} f(y_2) ··· [1 − F(y_k)]^{n−j_k} f(y_k)

for y_1 < y_2 < ··· < y_k, and = 0 otherwise.
Example 4. Let X_1, X_2, ..., X_n be iid RVs with common PDF

    f(x) = 1 if 0 < x < 1, and = 0 otherwise.

Then

    g_r(y_r) = [n!/((r−1)!(n−r)!)] y_r^{r−1}(1 − y_r)^{n−r},  0 < y_r < 1  (1 ≤ r ≤ n),  and = 0 otherwise.

The joint distribution of X_(j) and X_(k) is given by

    g_{jk}(y_j, y_k) = [n!/((j−1)!(k−j−1)!(n−k)!)] y_j^{j−1}(y_k − y_j)^{k−j−1}(1 − y_k)^{n−k},  0 < y_j < y_k < 1,  and = 0 otherwise,

where 1 ≤ j < k ≤ n.

The joint PDF of X_(1) and X_(n) is given by

    g_{1n}(y_1, y_n) = n(n−1)(y_n − y_1)^{n−2},    0 < y_1 < y_n < 1,

and that of the range R_n = X_(n) − X_(1) by

    g_{R_n}(w) = n(n−1)w^{n−2}(1 − w),  0 < w < 1,  and = 0 otherwise.
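The density g_r above is the Beta(r, n−r+1) density, so its mean and variance are r/(n+1) and r(n−r+1)/((n+1)²(n+2)); similarly, the range density implies ER_n = (n−1)/(n+1). The sketch below (illustrative only; n, r, and the sample size are arbitrary choices) checks these facts by simulation.

import numpy as np

rng = np.random.default_rng(5)
n, r = 7, 3
u = np.sort(rng.uniform(size=(300_000, n)), axis=1)

x_r = u[:, r - 1]                                  # rth order statistic
print(x_r.mean(), r / (n + 1))                     # Beta(r, n-r+1) mean
print(x_r.var(), r * (n - r + 1) / ((n + 1) ** 2 * (n + 2)))   # Beta variance

rng_range = u[:, -1] - u[:, 0]                     # the range R_n
print(rng_range.mean(), (n - 1) / (n + 1))         # mean of n(n-1)w^(n-2)(1-w)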

Example 5. Let X_(1), X_(2), X_(3) be the order statistics of iid RVs X_1, X_2, X_3 with common PDF

    f(x) = βe^{−βx},  x > 0,  and = 0 otherwise  (β > 0).

Let Y_1 = X_(3) − X_(2) and Y_2 = X_(2). We show that Y_1 and Y_2 are independent. The joint PDF of X_(2) and X_(3) is given by

    g_{23}(x, y) = [3!/(1!0!0!)](1 − e^{−βx})βe^{−βx}βe^{−βy},  x < y,  and = 0 otherwise.

The PDF of (Y_1, Y_2) is

    f(y_1, y_2) = 3!β²(1 − e^{−βy_2})e^{−βy_2}e^{−(y_1+y_2)β}
                = { 3!βe^{−2βy_2}(1 − e^{−βy_2}) }{ βe^{−βy_1} },  0 < y_1 < ∞, 0 < y_2 < ∞,  and = 0 otherwise.

It follows that Y_1 and Y_2 are independent.
Finally, we consider the moments, namely, the means, variances, and covariances of order statistics. Suppose X_1, X_2, ..., X_n are iid RVs with common DF F. Let g be a Borel function on R such that E|g(X)| < ∞, where X has DF F. Then for 1 ≤ r ≤ n,

    | ∫_{−∞}^∞ g(x) [n!/((n−r)!(r−1)!)][F(x)]^{r−1}[1 − F(x)]^{n−r} f(x) dx |
        ≤ n C(n−1, r−1) ∫_{−∞}^∞ |g(x)| f(x) dx    (since 0 ≤ F ≤ 1)
        < ∞,

and we write

    Eg(X_(r)) = ∫_{−∞}^∞ g(y) g_r(y) dy,

for r = 1, 2, ..., n. The converse also holds. Suppose E|g(X_(r))| < ∞ for r = 1, 2, ..., n. Then

    n C(n−1, r−1) ∫_{−∞}^∞ |g(x)| F^{r−1}(x)[1 − F(x)]^{n−r} f(x) dx < ∞,

for r = 1, 2, ..., n, and hence

    n ∫_{−∞}^∞ { ∑_{r=1}^n C(n−1, r−1) F^{r−1}(x)[1 − F(x)]^{n−r} } |g(x)| f(x) dx = n ∫_{−∞}^∞ |g(x)| f(x) dx < ∞.

Moreover, it also follows that

    ∑_{r=1}^n Eg(X_(r)) = n Eg(X).

As a consequence of the above remarks we note that if E|g(X_(r))| = ∞ for some r, 1 ≤ r ≤ n, then E|g(X)| = ∞ and conversely, if E|g(X)| = ∞ then E|g(X_(r))| = ∞ for some r, 1 ≤ r ≤ n.
Example 6. Let X_1, X_2, ..., X_n be iid with Pareto PDF f(x) = 1/x² if x ≥ 1, and = 0 otherwise. Then EX = ∞. Now for 1 ≤ r ≤ n,

    EX_(r) = n C(n−1, r−1) ∫_1^∞ x (1 − 1/x)^{r−1}(1/x)^{n−r} dx/x²
           = n C(n−1, r−1) ∫_0^1 y^{r−1}(1 − y)^{n−r−1} dy.

Since the integral on the right side converges for 1 ≤ r ≤ n−1 and diverges for r = n, we see that EX_(r) < ∞ for r = 1, 2, ..., n−1, while EX_(n) = ∞.
PROBLEMS 4.7
1. Let X_(1), X_(2), ..., X_(n) be the set of order statistics of independent RVs X_1, X_2, ..., X_n with common PDF
    f(x) = βe^{−βx} if x ≥ 0, and = 0 otherwise.
(a) Show that X_(r) and X_(s) − X_(r) are independent for any s > r.
(b) Find the PDF of X_(r+1) − X_(r).
(c) Let Z_1 = nX_(1), Z_2 = (n−1)(X_(2) − X_(1)), Z_3 = (n−2)(X_(3) − X_(2)), ..., Z_n = X_(n) − X_(n−1). Show that (Z_1, Z_2, ..., Z_n) and (X_1, X_2, ..., X_n) are identically distributed.
2. Let X_1, X_2, ..., X_n be iid from the PMF
    p_k = 1/N,    k = 1, 2, ..., N.
Find the marginal distributions of X_(1), X_(n), and their joint PMF.
3. Let X_1, X_2, ..., X_n be iid with DF
    F(y) = y^α,  0 < y < 1  (α > 0).
Show that X_(i)/X_(n), i = 1, 2, ..., n−1, and X_(n) are independent.
4. Let X_1, X_2, ..., X_n be iid RVs with common Pareto PDF f(x) = ασ^α/x^{α+1}, x > σ, where α > 0, σ > 0. Show that
(a) X_(1) and (X_(2)/X_(1), ..., X_(n)/X_(1)) are independent,
(b) X_(1) has a Pareto(σ, nα) distribution, and
(c) ∑_{j=1}^n ln(X_(j)/X_(1)) has PDF
    f(x) = α^{n−1}x^{n−2}e^{−αx}/(n−2)!,    x > 0.
5. Let X_1, X_2, ..., X_n be iid nonnegative RVs of the continuous type. If E|X| < ∞, show that E|X_(r)| < ∞. Write M_n = X_(n) = max(X_1, X_2, ..., X_n). Show that
    EM_n = EM_{n−1} + ∫_0^∞ F^{n−1}(x)[1 − F(x)] dx,    n = 2, 3, ....
Find EM_n in each of the following cases:
(a) X_i have the common DF F(x) = 1 − e^{−βx}, x ≥ 0.
(b) X_i have the common DF F(x) = x, 0 < x < 1.
6. Let X_(1), X_(2), ..., X_(n) be the order statistics of n independent RVs X_1, X_2, ..., X_n with common PDF f(x) = 1 if 0 < x < 1, and = 0 otherwise. Show that Y_1 = X_(1)/X_(2), Y_2 = X_(2)/X_(3), ..., Y_{n−1} = X_(n−1)/X_(n), and Y_n = X_(n) are independent. Find the PDFs of Y_1, Y_2, ..., Y_n.
7. For the PDF in Problem 4 find EX_(r).
8. An urn contains N identical marbles numbered 1 through N. From the urn n marbles are drawn and let X_(n) be the largest number drawn. Show that P(X_(n) = k) = C(k−1, n−1)/C(N, n), k = n, n+1, ..., N, and EX_(n) = n(N+1)/(n+1).

5
SOME SPECIAL DISTRIBUTIONS
5.1 INTRODUCTION
In preceding chapters we studied probability distributions in general. In this chapter we
will study some commonly occurring probability distributions and investigate their basic
properties. The results of this chapter will be of considerable use in theoretical as well
as practical applications. We begin with some discrete distributions in Section 5.2 and
follow with some continuous models in Section 5.3. Section 5.4 deals with bivariate and
multivariate normal distributions and in Section 5.5 we discuss the exponential family of
distributions.
5.2 SOME DISCRETE DISTRIBUTIONS
In this section we study some well-known univariate and multivariate discrete distributions
and describe their important properties.
5.2.1 Degenerate Distribution
The simplest distribution is that of an RV X degenerate at point k, that is, P{X = k} = 1 and = 0 elsewhere. If we define

    ε(x) = 0 if x < 0,
         = 1 if x ≥ 0,    (1)
the DF of the RV X is ε(x − k). Clearly, EX^l = k^l, l = 1, 2, ..., and M(t) = e^{tk}. In particular, var(X) = 0. This property characterizes a degenerate RV. As we shall see, the degenerate RV plays an important role in the study of limit theorems.
5.2.2 Two-Point Distribution
We say that an RV X has a two-point distribution if it takes two values, x_1 and x_2, with probabilities

    P{X = x_1} = p,    P{X = x_2} = 1 − p,    0 < p < 1.

We may write

    X = x_1 I_[X=x_1] + x_2 I_[X=x_2],    (2)

where I_A is the indicator function of A. The DF of X is given by

    F(x) = pε(x − x_1) + (1 − p)ε(x − x_2).    (3)

Also

    EX^k = px_1^k + (1 − p)x_2^k,    k = 1, 2, ...,    (4)
    M(t) = pe^{tx_1} + (1 − p)e^{tx_2}    for all t.    (5)

In particular,

    EX = px_1 + (1 − p)x_2    (6)

and

    var(X) = p(1 − p)(x_1 − x_2)².    (7)

If x_1 = 1, x_2 = 0, we get the important Bernoulli RV:

    P{X = 1} = p,    P{X = 0} = 1 − p,    0 < p < 1.    (8)

For a Bernoulli RV X with parameter p, we write X ∼ b(1, p) and have

    EX = p,    var(X) = p(1 − p),    M(t) = 1 + p(e^t − 1),  all t.    (9)

Bernoulli RVs occur in practice, for example, in coin-tossing experiments. Suppose that P{H} = p, 0 < p < 1, and P{T} = 1 − p. Define the RV X so that X(H) = 1 and X(T) = 0. Then P{X = 1} = p and P{X = 0} = 1 − p. Each repetition of the experiment will be called a trial. More generally, any nontrivial experiment can be dichotomized to yield a Bernoulli model. Let (Ω, S, P) be the sample space of an experiment, and let A ∈ S with P(A) = p > 0. Then P(A^c) = 1 − p. Each performance of the experiment is a Bernoulli trial. It will be convenient to call the occurrence of event A a success and the occurrence of A^c, a failure.

Example 1 (Sabharwal [97]). In a sequence of n Bernoulli trials with constant probability p of success (S), and 1 − p of failure (F), let Y_n denote the number of times the combination SF occurs. To find EY_n and var(Y_n), let X_i represent the event that occurs on the ith trial, and define RVs

    f(X_i, X_{i+1}) = 1 if X_i = S, X_{i+1} = F,
                    = 0 otherwise    (i = 1, 2, ..., n−1).

Then

    Y_n = ∑_{i=1}^{n−1} f(X_i, X_{i+1})

and

    EY_n = (n−1)p(1−p).

Also,

    EY_n² = E[ ∑_{i=1}^{n−1} f²(X_i, X_{i+1}) ] + E[ ∑∑_{i≠j} f(X_i, X_{i+1})f(X_j, X_{j+1}) ]
          = (n−1)p(1−p) + (n−2)(n−3)p²(1−p)²,

so that

    var(Y_n) = (n−1)p(1−p) − (3n−5)p²(1−p)².

If p = 1/2, then

    EY_n = (n−1)/4   and   var(Y_n) = (n+1)/16.
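A short simulation makes these moments easy to check. The Python sketch below (illustrative only; n, p, and the number of repetitions are arbitrary choices) counts SF patterns in simulated Bernoulli sequences and compares the sample mean and variance of Y_n with the formulas above.

import numpy as np

rng = np.random.default_rng(6)
n, p = 10, 0.3
trials = rng.random((300_000, n)) < p              # True = success (S)

sf = trials[:, :-1] & ~trials[:, 1:]               # S at position i, F at position i+1
y = sf.sum(axis=1)

q = 1 - p
print(y.mean(), (n - 1) * p * q)                                  # E Y_n
print(y.var(), (n - 1) * p * q - (3 * n - 5) * p ** 2 * q ** 2)   # var Y_n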
5.2.3 Uniform Distribution on n Points
X is said to have a uniform distribution on n points {x_1, x_2, ..., x_n} if its PMF is of the form

    P{X = x_i} = 1/n,    i = 1, 2, ..., n.    (10)

Thus we may write

    X = ∑_{i=1}^n x_i I_[X=x_i]   and   F(x) = (1/n) ∑_{i=1}^n ε(x − x_i),

    EX = (1/n) ∑_{i=1}^n x_i,    (11)

    EX^l = (1/n) ∑_{i=1}^n x_i^l,    l = 1, 2, ...,    (12)

and

    var(X) = (1/n) ∑_{i=1}^n x_i² − ( (1/n) ∑ x_i )² = (1/n) ∑_{i=1}^n (x_i − x̄)²    (13)

if we write x̄ = ∑_{i=1}^n x_i/n. Also,

    M(t) = (1/n) ∑_{i=1}^n e^{tx_i}    for all t.    (14)

If, in particular, x_i = i, i = 1, 2, ..., n,

    EX = (n+1)/2,    EX² = (n+1)(2n+1)/6,    (15)

    var(X) = (n² − 1)/12.    (16)
Example 2. A box contains tickets numbered 1 to N. Let X be the largest number drawn in n random drawings with replacement.

Then P{X ≤ k} = (k/N)^n, so that

    P{X = k} = P{X ≤ k} − P{X ≤ k−1} = (k/N)^n − ((k−1)/N)^n.

Also,

    EX = N^{−n} ∑_{k=1}^N [ k^{n+1} − (k−1)^{n+1} − (k−1)^n ]
       = N^{−n} { N^{n+1} − ∑_{k=1}^N (k−1)^n }.
5.2.4 Binomial Distribution
We say that X has a binomial distribution with parameter p if its PMF is given by

    p_k = P{X = k} = C(n, k) p^k(1−p)^{n−k},    k = 0, 1, 2, ..., n;  0 ≤ p ≤ 1.    (17)

Since ∑_{k=0}^n p_k = [p + (1−p)]^n = 1, the p_k's indeed define a PMF. If X has PMF (17), we will write X ∼ b(n, p). This is consistent with the notation for a Bernoulli RV. We have

    F(x) = ∑_{k=0}^n C(n, k) p^k(1−p)^{n−k} ε(x − k).

In Example 3.2.5 we showed that

    EX = np,    (18)
    EX² = n(n−1)p² + np,    (19)

and

    var(X) = np(1−p) = npq,    (20)

where q = 1−p. Also,

    M(t) = ∑_{k=0}^n e^{tk} C(n, k) p^k(1−p)^{n−k} = (q + pe^t)^n    for all t.    (21)

The PGF of X ∼ b(n, p) is given by P(s) = {1 − p(1−s)}^n, |s| ≤ 1.

The binomial distribution can also be considered as the distribution of the sum of n independent, identically distributed b(1, p) random variables. If we toss a coin, with constant probability p of heads and 1−p of tails, n times, the distribution of the number of heads is given by (17). Alternatively, if we write

    X_k = 1 if the kth toss results in a head,
        = 0 otherwise,

the number of heads in n trials is the sum S_n = X_1 + X_2 + ··· + X_n. Also

    P{X_k = 1} = p,    P{X_k = 0} = 1−p,    k = 1, 2, ..., n.

Thus

    ES_n = ∑_{i=1}^n EX_i = np,    var(S_n) = ∑_{i=1}^n var(X_i) = np(1−p),

and

    M(t) = ∏_{i=1}^n Ee^{tX_i} = (q + pe^t)^n.

Theorem 1. Let X_i (i = 1, 2, ..., k) be independent RVs with X_i ∼ b(n_i, p). Then S_k = ∑_{i=1}^k X_i has a b(n_1 + n_2 + ··· + n_k, p) distribution.

Corollary. If X_i (i = 1, 2, ..., k) are iid RVs with common PMF b(n, p), then S_k has a b(nk, p) distribution.

Actually, the additive property described in Theorem 1 characterizes the binomial distribution in the following sense. Let X and Y be two independent, nonnegative, finite integer-valued RVs and let Z = X + Y. Then Z is a binomial RV with parameter p if and only if X and Y are binomial RVs with the same parameter p. The "only if" part is due to Shanbhag and Basawa [103] and will not be proved here.

Example 3. A fair die is rolled n times. The probability of obtaining exactly one 6 is n(1/6)(5/6)^{n−1}, the probability of obtaining no 6 is (5/6)^n, and the probability of obtaining at least one 6 is 1 − (5/6)^n.

The number of trials needed for the probability of at least one 6 to be ≥ 1/2 is given by the smallest integer n such that

    1 − (5/6)^n ≥ 1/2,

so that

    n ≥ log 2/log 1.2 ≈ 3.8.

Example 4. Here r balls are distributed in n cells so that each of n^r possible arrangements has probability n^{−r}. We are interested in the probability p_k that a specified cell has exactly k balls (k = 0, 1, 2, ..., r). The distribution of each ball may be considered as a trial. A success results if the ball goes to the specified cell (with probability 1/n); otherwise the trial results in a failure (with probability 1 − 1/n). Let X denote the number of successes in r trials. Then

    p_k = P{X = k} = C(r, k)(1/n)^k(1 − 1/n)^{r−k},    k = 0, 1, 2, ..., r.
5.2.5 Negative Binomial Distribution (Pascal or Waiting Time Distribution)
Let (Ω, S, P) be a probability space of a given statistical experiment, and let A ∈ S with P(A) = p. On any performance of the experiment, if A happens we call it a success, otherwise a failure. Consider a succession of trials of this experiment, and let us compute the probability of observing exactly r successes, where r ≥ 1 is a fixed integer. If X denotes the number of failures that precede the rth success, X + r is the total number of replications needed to produce r successes. This will happen if and only if the last trial results in a success and among the previous (r + X − 1) trials there are exactly X failures. It follows by independence that

    P{X = x} = C(x+r−1, x) p^r (1−p)^x,    x = 0, 1, 2, ....    (22)

Rewriting (22) in the form

    P{X = x} = C(−r, x) p^r (−q)^x,    x = 0, 1, 2, ...;  q = 1−p,    (23)

we see that

    ∑_{x=0}^∞ C(−r, x)(−q)^x = (1 − q)^{−r} = p^{−r}.    (24)

It follows that

    ∑_{x=0}^∞ P{X = x} = 1.

Definition 1. For a fixed positive integer r ≥ 1 and 0 < p < 1, an RV with PMF given by (22) is said to have a negative binomial distribution. We will use the notation X ∼ NB(r; p) to denote that X has a negative binomial distribution.
We may write

    X = ∑_{x=0}^∞ x I_[X=x]   and   F(x) = ∑_{k=0}^∞ C(k+r−1, k) p^r (1−p)^k ε(x − k).

For the MGF of X we have

    M(t) = ∑_{x=0}^∞ C(x+r−1, x) p^r (1−p)^x e^{tx}
         = p^r ∑_{x=0}^∞ (qe^t)^x C(x+r−1, x)    (q = 1−p)
         = p^r(1 − qe^t)^{−r}    for qe^t < 1.    (25)

The PGF is given by P(s) = p^r(1 − sq)^{−r}, |s| ≤ 1. Also,

    EX = ∑_{x=0}^∞ x C(x+r−1, x) p^r q^x = rp^r ∑_{x=0}^∞ C(x+r, x) q^{x+1}
       = rp^r q(1−q)^{−r−1} = rq/p.    (26)

Similarly, we can show that

    var(X) = rq/p².    (27)

If, however, we are interested in the distribution of the number of trials required to get r successes, we have, writing Y = X + r,

    P{Y = y} = C(y−1, r−1) p^r (1−p)^{y−r},    y = r, r+1, ...,    (28)

    EY = EX + r = r/p,    var(Y) = var(X) = rq/p²,    (29)

and

    M_Y(t) = (pe^t)^r(1 − qe^t)^{−r}    for qe^t < 1.    (30)

Let X be a b(n, p) RV, and let Y be the RV defined in (28). If there are r or more successes in the first n trials, at most n trials were required to obtain the first r of these successes. We have

    P{X ≥ r} = P{Y ≤ n}    (31)

and also

    P{X < r} = P{Y > n}.    (32)

In the special case when r = 1, the distribution of X is given by

    P{X = x} = pq^x,    x = 0, 1, 2, ....    (33)

An RV X with PMF (33) is said to have a geometric distribution. Clearly, for the geometric distribution, we have

    M(t) = p(1 − qe^t)^{−1},    EX = q/p,    var(X) = q/p².    (34)
Example 5 (Banach's matchbox problem). A mathematician carries one matchbox each in his right and left pockets. When he wants a match, he selects the left pocket with probability p and the right pocket with probability 1−p. Suppose that initially each box contains N matches. Consider the moment when the mathematician discovers that a box is empty. At that time the other box may contain 0, 1, 2, ..., N matches. Let us identify success with the choice of the left pocket. The left-pocket box will be empty at the moment when the right-pocket box contains exactly r matches if and only if exactly N − r failures precede the (N+1)st success. A similar argument applies to the right pocket, and we have

    p_r = probability that the mathematician discovers a box empty while the other contains r matches
        = C(2N−r, N−r) p^{N+1} q^{N−r} + C(2N−r, N−r) q^{N+1} p^{N−r}.
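The formula for p_r can be checked by direct simulation of the two matchboxes. The Python sketch below (illustrative only; N, p, the repetition count, and the values of r printed are arbitrary choices) simulates the process until an empty box is discovered and compares relative frequencies with p_r.

import random
from math import comb

N, p = 20, 0.5

def one_run():
    left = right = N
    while True:
        if random.random() < p:         # choose the left pocket
            if left == 0:
                return right            # left box found empty; r matches remain on the right
            left -= 1
        else:
            if right == 0:
                return left
            right -= 1

reps = 200_000
counts = [0] * (N + 1)
for _ in range(reps):
    counts[one_run()] += 1

q = 1 - p
for r in (0, 5, 10):
    p_r = comb(2 * N - r, N - r) * (p ** (N + 1) * q ** (N - r) + q ** (N + 1) * p ** (N - r))
    print(r, counts[r] / reps, round(p_r, 5))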
Example 6. A fair die is rolled repeatedly. Let us compute the probability of the event A that a 2 will show up before a 5. Let A_j be the event that a 2 shows up on the jth trial (j = 1, 2, ...) for the first time, and a 5 does not show up on the previous j−1 trials. Then PA = ∑_{j=1}^∞ PA_j, where PA_j = (1/6)(4/6)^{j−1}. It follows that

    P(A) = ∑_{j=1}^∞ (1/6)(4/6)^{j−1} = 1/2.

Similarly the probability that a 2 will show up before a 5 or a 6 is 1/3, and so on.
Theorem 2. Let X_1, X_2, ..., X_k be independent NB(r_i; p) RVs, i = 1, 2, ..., k, respectively. Then S_k = ∑_{i=1}^k X_i is distributed as NB(r_1 + r_2 + ··· + r_k; p).

Corollary. If X_1, X_2, ..., X_k are iid geometric RVs, then S_k is an NB(k; p) RV.

Theorem 3. Let X and Y be independent RVs with PMFs NB(r_1; p) and NB(r_2; p), respectively. Then the conditional PMF of X, given X + Y = t, is expressed by

    P{X = x|X+Y = t} = C(x+r_1−1, x) C(t+r_2−x−1, t−x) / C(t+r_1+r_2−1, t).

If, in particular, r_1 = r_2 = 1, the conditional distribution is uniform on t+1 points.

Proof. By Theorem 2, X + Y is an NB(r_1 + r_2; p) RV. Thus

    P{X = x|X+Y = t} = P{X = x, Y = t−x}/P{X+Y = t}
        = [ C(x+r_1−1, x)p^{r_1}(1−p)^x · C(t−x+r_2−1, t−x)p^{r_2}(1−p)^{t−x} ] / [ C(t+r_1+r_2−1, t)p^{r_1+r_2}(1−p)^t ]
        = C(x+r_1−1, x) C(t+r_2−x−1, t−x) / C(t+r_1+r_2−1, t),    t = 0, 1, 2, ....

If r_1 = r_2 = 1, that is, if X and Y are independent geometric RVs, then

    P{X = x|X+Y = t} = 1/(t+1),    x = 0, 1, 2, ..., t;  t = 0, 1, 2, ....    (35)
Theorem 4 (Chatterji [13]). Let X and Y be iid RVs, and let

    P{X = k} = p_k > 0,    k = 0, 1, 2, ....

If

    P{X = t|X+Y = t} = P{X = t−1|X+Y = t} = 1/(t+1),    t ≥ 0,    (36)

then X and Y are geometric RVs.

Proof. We have

    P{X = t|X+Y = t} = p_t p_0 / ∑_{k=0}^t p_k p_{t−k} = 1/(t+1)    (37)

and

    P{X = t−1|X+Y = t} = p_{t−1} p_1 / ∑_{k=0}^t p_k p_{t−k} = 1/(t+1).    (38)

It follows that

    p_t/p_{t−1} = p_1/p_0

and by iteration p_t = (p_1/p_0)^t p_0. Since ∑_{t=0}^∞ p_t = 1, we must have (p_1/p_0) < 1. Moreover,

    1 = p_0 · 1/(1 − (p_1/p_0)),

so that (p_1/p_0) = 1 − p_0, and the proof is complete.
Theorem 5. If X has a geometric distribution, then, for any two nonnegative integers m and n,

    P{X > m+n|X > m} = P{X ≥ n}.    (39)

Proof. The proof is left as an exercise.

Remark 1. Theorem 5 says that the geometric distribution has no memory, that is, the information of no successes in m trials is forgotten in subsequent calculations.

The converse of Theorem 5 is also true.

Theorem 6. Let X be a nonnegative integer-valued RV satisfying

    P{X > m+1|X > m} = P{X ≥ 1}

for any nonnegative integer m. Then X must have a geometric distribution.

Proof. Let the PMF of X be written as

    P{X = k} = p_k,    k = 0, 1, 2, ....

Then

    P{X ≥ n} = ∑_{k=n}^∞ p_k

and

    P{X > m} = ∑_{k=m+1}^∞ p_k = q_m, say,

    P{X > m+1|X > m} = P{X > m+1}/P{X > m} = q_{m+1}/q_m.

Thus

    q_{m+1} = q_m q_0,

where q_0 = P{X > 0} = p_1 + p_2 + ··· = 1 − p_0. It follows that q_k = (1 − p_0)^{k+1}, and hence p_k = q_{k−1} − q_k = (1 − p_0)^k p_0, as asserted.
Theorem 7. Let X_1, X_2, ..., X_n be independent geometric RVs with parameters p_1, p_2, ..., p_n, respectively. Then X_(1) = min(X_1, X_2, ..., X_n) is also a geometric RV with parameter

    p = 1 − ∏_{i=1}^n (1 − p_i).

Proof. The proof is left as an exercise.

Corollary. Iid RVs X_1, X_2, ..., X_n are NB(1; p) if and only if X_(1) is a geometric RV with parameter 1 − (1−p)^n.

Proof. The necessity follows from Theorem 7. For the sufficiency part of the proof let

    P{X_(1) ≤ k} = 1 − P{X_(1) > k} = 1 − (1−p)^{n(k+1)}.

But

    P{X_(1) ≤ k} = 1 − P{X_1 > k, X_2 > k, ..., X_n > k} = 1 − [1 − F(k)]^n,

where F is the common DF of X_1, X_2, ..., X_n. It follows that

    [1 − F(k)] = (1 − p)^{k+1},

so that P{X_1 > k} = (1−p)^{k+1}, which completes the proof.
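Theorem 7 is easy to illustrate numerically. The Python sketch below (illustrative only; the success probabilities and sample size are arbitrary choices) simulates the minimum of independent geometric RVs on {0, 1, 2, ...}, as in (33), and compares its PMF with p(1−p)^k for p = 1 − ∏(1 − p_i). Note that numpy's geometric sampler counts trials, so 1 is subtracted to match the failure count in (33).

import numpy as np

rng = np.random.default_rng(7)
p = np.array([0.2, 0.3, 0.5])                  # illustrative success probabilities
x = np.stack([rng.geometric(pi, 400_000) - 1 for pi in p])   # geometric on {0,1,2,...}
m = x.min(axis=0)

p_min = 1 - np.prod(1 - p)                     # parameter from Theorem 7
for k in range(4):
    print(k, np.mean(m == k), p_min * (1 - p_min) ** k)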
5.2.6 Hypergeometric Distribution
A box contains N marbles. Of these, M are drawn at random, marked, and returned to the box. The contents of the box are then thoroughly mixed. Next, n marbles are drawn at random from the box, and the marked marbles are counted. If X denotes the number of marked marbles, then

    P{X = x} = C(N, n)^{−1} C(M, x) C(N−M, n−x).    (40)

Since x cannot exceed M or n, we must have

    x ≤ min(M, n).    (41)

Also x ≥ 0 and N − M ≥ n − x, so that

    x ≥ max(0, M + n − N).    (42)

Note that

    ∑_{k=0}^n C(a, k) C(b, n−k) = C(a+b, n)

for arbitrary numbers a, b and positive integer n. It follows that

    ∑_x P{X = x} = C(N, n)^{−1} ∑_x C(M, x) C(N−M, n−x) = 1.

Definition 2. An RV X with PMF given by (40) is called a hypergeometric RV.

It is easy to check that

    EX = (n/N)M,    (43)

    EX² = [M(M−1)/(N(N−1))] n(n−1) + nM/N,    (44)

and

    var(X) = [nM/(N²(N−1))](N−M)(N−n).    (45)
Example 7. A lot consisting of 50 bulbs is inspected by taking at random 10 bulbs and testing them. If the number of defective bulbs is at most 1, the lot is accepted; otherwise, it is rejected. If there are, in fact, 10 defective bulbs in the lot, the probability of accepting the lot is

    [ C(10, 1)C(40, 9) + C(40, 10) ] / C(50, 10) = .3487.
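The acceptance probability is simple to compute exactly with binomial coefficients. The Python sketch below (illustrative; it just evaluates the expression above with the lot size, number of defectives, and sample size of this example) reproduces the value .3487.

from math import comb

N, D, n = 50, 10, 10                # lot size, defectives, sample size
accept = sum(comb(D, x) * comb(N - D, n - x) for x in (0, 1)) / comb(N, n)
print(round(accept, 4))             # 0.3487, as in the text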
Example 8. Suppose that an urn contains b white and c black balls, b + c = N. A ball is drawn at random, and before drawing the next ball, s + 1 balls of the same color are added to the urn. The procedure is repeated n times. Let X be the number of white balls drawn in n draws, X = 0, 1, 2, ..., n. We shall find the PMF of X.

First note that the probability of drawing k white balls in successive draws is

    (b/N)((b+s)/(N+s))((b+2s)/(N+2s)) ··· ((b+(k−1)s)/(N+(k−1)s)),

and the probability of drawing k white balls in the first k draws and then n−k black balls in the next n−k draws is

    p_k = (b/N)((b+s)/(N+s)) ··· ((b+(k−1)s)/(N+(k−1)s)) · (c/(N+ks))((c+s)/(N+(k+1)s)) ··· ((c+(n−k−1)s)/(N+(n−1)s)).    (46)

Here p_k also gives the probability of drawing k white and n−k black balls in any given order. It follows that

    P{X = k} = C(n, k) p_k.    (47)

An RV X with PMF given by (47) is said to have a Polya distribution. Let us write

    Np = b,    N(1−p) = c,    and    Nα = s.

Then with q = 1−p, we have

    P{X = k} = C(n, k) [ p(p+α)···(p+(k−1)α) · q(q+α)···(q+(n−k−1)α) ] / [ 1(1+α)···(1+(n−1)α) ].

Let us take s = −1. This means that the ball drawn at each draw is not replaced in the urn before drawing the next ball. In this case α = −1/N, and we have

    P{X = k} = C(n, k) [ Np(Np−1)···(Np−(k−1)) · c(c−1)···(c−(n−k−1)) ] / [ N(N−1)···(N−(n−1)) ]
             = C(Np, k) C(Nq, n−k) / C(N, n),    (48)

which is a hypergeometric distribution. Here

    max(0, n − Nq) ≤ k ≤ min(n, Np).    (49)

Theorem 8. Let X and Y be independent RVs with PMFs b(m, p) and b(n, p), respectively. Then the conditional distribution of X, given X + Y, is hypergeometric.
5.2.7 Negative Hypergeometric Distribution
Consider the model of Section 5.2.6. A box contains N marbles, M of these are marked (or say defective) and N − M are unmarked. A sample of size n is taken and let X denote the number of defective marbles in the sample. If the sample is drawn without replacement we saw that X has a hypergeometric distribution with PMF (40). If, on the other hand, the sample is drawn with replacement then X ∼ b(n, p) where p = M/N.

Let Y denote the number of draws needed to draw the rth defective marble. If the draws are made with replacement then Y has the negative binomial distribution given in (28) with p = M/N. What if the draws are made without replacement? In that case in order that the kth draw (k ≥ r) be the rth defective marble drawn, the kth draw must produce a defective marble, whereas the previous k−1 draws must produce r−1 defectives. It follows that

    P(Y = k) = [ C(M, r−1) C(N−M, k−r) / C(N, k−1) ] · (M−r+1)/(N−k+1)

for k = r, r+1, ..., N. Rewriting we see that

    P(Y = k) = C(k−1, r−1) C(N−k, M−r) / C(N, M).    (50)

An RV Y with PMF (50) is said to have a negative hypergeometric distribution.

It is easy to see that

    EY = r(N+1)/(M+1),    EY(Y+1) = r(r+1)(N+1)(N+2)/((M+1)(M+2)),

and

    var(Y) = r(N−M)(N+1)(M+1−r)/((M+1)²(M+2)).

Also, if r/N → 0 and k/N → 0 as N → ∞, then

    C(k−1, r−1) C(N−k, M−r) / C(N, M) → C(k−1, r−1)(M/N)^r(1 − M/N)^{k−r},

which is (28).
5.2.8 Poisson Distribution
Definition 3. An RV X is said to be a Poisson RV with parameter λ > 0 if its PMF is given by

    P{X = k} = e^{−λ}λ^k/k!,    k = 0, 1, 2, ....    (51)

We first check to see that (51) indeed defines a PMF. We have

    ∑_{k=0}^∞ P{X = k} = e^{−λ} ∑_{k=0}^∞ λ^k/k! = e^{−λ}e^{λ} = 1.

If X has the PMF given by (51), we will write X ∼ P(λ). Clearly,

    X = ∑_{k=0}^∞ k I_[X=k]

and

    F(x) = ∑_{k=0}^∞ (e^{−λ}λ^k/k!) ε(x − k).

The mean and the variance are given by (see Problem 3.2.9)

    EX = λ,    EX² = λ + λ²,    (52)

and

    var(X) = λ.    (53)

The MGF of X is given by (see Example 3.3.7)

    Ee^{tX} = exp{λ(e^t − 1)},    (54)

and the PGF by P(s) = e^{−λ(1−s)}, |s| ≤ 1.
Theorem 9. Let X_1, X_2, ..., X_n be independent Poisson RVs with X_k ∼ P(λ_k), k = 1, 2, ..., n. Then S_n = X_1 + X_2 + ··· + X_n is a P(λ_1 + λ_2 + ··· + λ_n) RV.

The converse of Theorem 9 is also true. Indeed, Raikov [84] showed that if X_1, X_2, ..., X_n are independent and S_n = ∑_{i=1}^n X_i has a Poisson distribution, each of the RVs X_1, X_2, ..., X_n has a Poisson distribution.
Example 9. The number of female insects in a given region follows a Poisson distribution with mean λ. The number of eggs laid by each insect is a P(μ) RV. We are interested in the probability distribution of the number of eggs in the region.

Let F be the number of female insects in the given region. Then

    P{F = f} = e^{−λ}λ^f/f!,    f = 0, 1, 2, ....

Let Y be the total number of eggs laid in the region. Then

    P{Y = y, F = f} = P{F = f}·P{Y = y|F = f} = (e^{−λ}λ^f/f!)·((fμ)^y e^{−μf}/y!).

Thus

    P{Y = y} = (e^{−λ}μ^y/y!) ∑_{f=0}^∞ (λe^{−μ})^f f^y/f!.

The MGF of Y is given by

    M(t) = ∑_{f=0}^∞ (λ^f e^{−λ}/f!) ∑_{y=0}^∞ e^{yt}(fμ)^y e^{−μf}/y!
         = ∑_{f=0}^∞ (λ^f e^{−λ}/f!) exp{fμ(e^t − 1)}
         = e^{−λ} ∑_{f=0}^∞ { λe^{μ(e^t−1)} }^f / f!
         = e^{−λ} exp{ λe^{μ(e^t−1)} }.
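This compound structure is easy to simulate. The Python sketch below (illustrative only; λ, μ, the evaluation point t, and the number of repetitions are arbitrary assumptions) draws F ∼ P(λ), then Y given F = f as P(fμ), and compares the sample mean of Y with λμ and the empirical value of Ee^{tY} with the MGF just derived.

import numpy as np

rng = np.random.default_rng(8)
lam, mu = 3.0, 2.0
reps = 300_000
f = rng.poisson(lam, reps)                 # number of female insects
y = rng.poisson(mu * f)                    # total eggs: Y | F=f ~ Poisson(f*mu)

print(y.mean(), lam * mu)                  # E Y = lam * mu
t = 0.1
print(np.mean(np.exp(t * y)),
      np.exp(-lam) * np.exp(lam * np.exp(mu * (np.exp(t) - 1))))   # empirical vs derived MGF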
Theorem 10. Let X and Y be independent RVs with PMFs P(λ_1) and P(λ_2), respectively. Then the conditional distribution of X, given X + Y, is binomial.

Proof. For nonnegative integers m and n, m ≤ n, we have

    P{X = m|X+Y = n} = P{X = m, Y = n−m}/P{X+Y = n}
        = [ e^{−λ_1}(λ_1^m/m!) · e^{−λ_2}(λ_2^{n−m}/(n−m)!) ] / [ e^{−(λ_1+λ_2)}(λ_1+λ_2)^n/n! ]
        = C(n, m) λ_1^m λ_2^{n−m}/(λ_1+λ_2)^n
        = C(n, m) (λ_1/(λ_1+λ_2))^m (1 − λ_1/(λ_1+λ_2))^{n−m},    m = 0, 1, 2, ..., n,

and the proof is complete.
Remark 2.The converse of this result is also true in the following sense. IfXandYare
independent nonnegative integer-valued RVs such thatP{X=k}>0,P{Y=k}>0, for
k=0,1,2,...,and the conditional distribution ofX,givenX+Y, is binomial, bothXand
Yare Poisson. This result is due to Chatterji [13]. For the proof see Problem 13.
Theorem 11.IfX∼P(λ)and the conditional distribution ofY,givenX=x,isb(x,p),
thenYis aP(λp)RV.
Example 10 (Lamperti and Kruskal [60]). Let N be a nonnegative integer-valued RV. Independently of each other, N balls are placed either in urn A with probability p (0 < p < 1) or in urn B with probability 1−p, resulting in N_A balls in urn A and N_B = N − N_A balls in urn B. We will show that the RVs N_A and N_B are independent if and only if N has a Poisson distribution. We have

    P{N_A = a and N_B = b | N = a+b} = C(a+b, a) p^a(1−p)^b,

where a, b are integers ≥ 0. Thus

    P{N_A = a, N_B = b} = C(a+b, a) p^a q^b P{N = n},    q = 1−p, n = a+b.

If N has a Poisson(λ) distribution, then

    P{N_A = a, N_B = b} = [(a+b)!/(a!b!)] p^a q^b e^{−λ}λ^{a+b}/(a+b)!
                        = [ p^aλ^a e^{−λ/2}/a! ] · [ q^bλ^b e^{−λ/2}/b! ],

so that N_A and N_B are independent.

Conversely, if N_A and N_B are independent, then

    P{N = n} n! = f(a)g(b)

for some functions f and g. Clearly, f(0) ≠ 0, g(0) ≠ 0 because P{N_A = 0, N_B = 0} > 0. Thus there is a function h such that h(a+b) = f(a)g(b) for all nonnegative integers a, b. It follows that

    h(1) = f(1)g(0) = f(0)g(1),
    h(2) = f(2)g(0) = f(1)g(1) = f(0)g(2),

and so on. By induction,

    f(a) = f(1)[g(1)/g(0)]^{a−1},    g(b) = g(1)[f(1)/f(0)]^{b−1}.

We may write, for some α_1, α_2, λ,

    f(a) = α_1e^{−aλ},    g(b) = α_2e^{−bλ},
    P{N = n} = α_1α_2 e^{−λ(a+b)}/(a+b)!,

so that N is a Poisson RV.
5.2.9 Multinomial Distribution
The binomial distribution is generalized in the following natural fashion. Suppose that an
experiment is repeatedntimes. Each replication of the experiment terminates in one ofk
mutually exclusive and exhaustive eventsA
1,A2,...,A k.Letp jbe the probability that the
experiment terminates inA
j,j=1,2,...,k, and suppose thatp j(j=1,2,...,k) remains
constant for allnreplications. We assume that thenreplications are independent.

Let x_1, x_2, ..., x_{k−1} be nonnegative integers such that x_1 + x_2 + ··· + x_{k−1} ≤ n. Then the probability that exactly x_i trials terminate in A_i, i = 1, 2, ..., k−1, and hence that x_k = n − (x_1 + x_2 + ··· + x_{k−1}) trials terminate in A_k, is clearly

    [n!/(x_1!x_2!···x_k!)] p_1^{x_1}p_2^{x_2}···p_k^{x_k}.

If (X_1, X_2, ..., X_k) is a random vector such that X_j = x_j means that event A_j has occurred x_j times, x_j = 0, 1, 2, ..., n, the joint PMF of (X_1, X_2, ..., X_k) is given by

    P{X_1 = x_1, X_2 = x_2, ..., X_k = x_k}    (55)
        = [n!/(x_1!x_2!···x_k!)] p_1^{x_1}p_2^{x_2}···p_k^{x_k}  if n = ∑_{i=1}^k x_i,
        = 0 otherwise.

Definition 4. An RV (X_1, X_2, ..., X_{k−1}) with joint PMF given by

    P{X_1 = x_1, X_2 = x_2, ..., X_{k−1} = x_{k−1}}    (56)
        = [n!/(x_1!x_2!···(n−x_1−···−x_{k−1})!)] p_1^{x_1}p_2^{x_2}···p_k^{n−x_1−···−x_{k−1}}  if x_1 + x_2 + ··· + x_{k−1} ≤ n,
        = 0 otherwise,

is said to have a multinomial distribution.
For the MGF of (X_1, X_2, ..., X_{k−1}) we have

    M(t_1, t_2, ..., t_{k−1}) = Ee^{t_1X_1 + t_2X_2 + ··· + t_{k−1}X_{k−1}}
        = ∑_{x_1,...,x_{k−1}=0; x_1+···+x_{k−1}≤n}^n e^{t_1x_1+···+t_{k−1}x_{k−1}} [n! p_1^{x_1}p_2^{x_2}···p_k^{x_k} / (x_1!x_2!···x_k!)]
        = ∑ [n!/(x_1!x_2!···x_k!)] (p_1e^{t_1})^{x_1}(p_2e^{t_2})^{x_2} ··· (p_{k−1}e^{t_{k−1}})^{x_{k−1}} p_k^{x_k}
        = (p_1e^{t_1} + p_2e^{t_2} + ··· + p_{k−1}e^{t_{k−1}} + p_k)^n    (57)

for all t_1, t_2, ..., t_{k−1} ∈ R.

Clearly,

    M(t_1, 0, 0, ..., 0) = (p_1e^{t_1} + p_2 + ··· + p_k)^n = (1 − p_1 + p_1e^{t_1})^n,

which is binomial. Indeed, the marginal PMF of each X_i, i = 1, 2, ..., k−1, is binomial. Similarly, the joint MGF of X_i, X_j, i, j = 1, 2, ..., k−1 (i ≠ j), is

    M(0, ..., 0, t_i, 0, ..., 0, t_j, 0, ..., 0) = [p_ie^{t_i} + p_je^{t_j} + (1 − p_i − p_j)]^n,

which is the MGF of a trinomial distribution with PMF

    f(x_i, x_j) = [n!/(x_i!x_j!(n−x_i−x_j)!)] p_i^{x_i}p_j^{x_j}p_k^{n−x_i−x_j},    p_k = 1 − p_i − p_j.    (58)

Note that the RVs X_1, X_2, ..., X_{k−1} are dependent.

From the MGF of (X_1, X_2, ..., X_{k−1}) or directly from the marginal PMFs we can compute the moments. Thus

    EX_j = np_j   and   var(X_j) = np_j(1 − p_j),    j = 1, 2, ..., k−1,    (59)

and for j = 1, 2, ..., k−1, and i ≠ j,

    cov(X_i, X_j) = E{(X_i − np_i)(X_j − np_j)} = −np_ip_j.    (60)

It follows that the correlation coefficient between X_i and X_j is given by

    ρ_{ij} = −[ p_ip_j/((1−p_i)(1−p_j)) ]^{1/2},    i, j = 1, 2, ..., k−1 (i ≠ j).    (61)
Example 11. Consider the trinomial distribution with PMF

    P{X = x, Y = y} = [n!/(x!y!(n−x−y)!)] p_1^x p_2^y p_3^{n−x−y},

where x, y are nonnegative integers such that x + y ≤ n, and p_1, p_2, p_3 > 0 with p_1 + p_2 + p_3 = 1. The marginal PMF of X is given by

    P{X = x} = C(n, x) p_1^x(1 − p_1)^{n−x},    x = 0, 1, 2, ..., n.

It follows that

    P{Y = y|X = x} = [(n−x)!/(y!(n−x−y)!)] (p_2/(1−p_1))^y (p_3/(1−p_1))^{n−x−y}  if y = 0, 1, 2, ..., n−x,
                   = 0 otherwise,    (62)

which is b(n−x, p_2/(1−p_1)). Thus

    E{Y|x} = (n−x) p_2/(1−p_1).    (63)

Similarly,

    E{X|y} = (n−y) p_1/(1−p_2).    (64)

Finally, we note that, if X = (X_1, X_2, ..., X_k) and Y = (Y_1, Y_2, ..., Y_k) are two independent multinomial RVs with common parameter (p_1, p_2, ..., p_k), then Z = X + Y is also a multinomial RV with probabilities (p_1, p_2, ..., p_k). This follows easily if one employs the MGF technique, using (57). Actually this property characterizes the multinomial distribution. If X and Y are k-dimensional, nonnegative, independent random vectors, and if Z = X + Y is a multinomial random vector with parameter (p_1, p_2, ..., p_k), then X and Y also have multinomial distributions with the same parameter. This result is due to Shanbhag and Basawa [103] and will not be proved here.
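The moments (59)-(61) are easy to verify against a multinomial sampler. The Python sketch below (illustrative only; n, the probability vector, and the sample size are arbitrary choices) checks the marginal means and variances and the covariance −np_ip_j.

import numpy as np

rng = np.random.default_rng(9)
n, p = 20, np.array([0.5, 0.3, 0.2])
x = rng.multinomial(n, p, size=300_000)

print(x.mean(axis=0), n * p)                              # E X_j = n p_j
print(x.var(axis=0), n * p * (1 - p))                     # var X_j = n p_j (1 - p_j)
c01 = np.mean((x[:, 0] - n * p[0]) * (x[:, 1] - n * p[1]))
print(c01, -n * p[0] * p[1])                              # cov(X_1, X_2) = -n p_1 p_2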
5.2.10 Multivariate Hypergeometric Distribution
Consider an urn containing N items divided into k categories containing n_1, n_2, ..., n_k items, where ∑_{j=1}^k n_j = N. A random sample, without replacement, of size n is taken from the urn. Let X_i = number of items in the sample of type i. Then

    P{X_1 = x_1, X_2 = x_2, ..., X_k = x_k} = ∏_{j=1}^k C(n_j, x_j) / C(N, n),    (65)

where x_j = 0, 1, ..., min(n, n_j) and ∑_{j=1}^k x_j = n.

We say that (X_1, X_2, ..., X_{k−1}) has a multivariate hypergeometric distribution if its joint PMF is given by (65). It is clear that each X_j has a marginal hypergeometric distribution. Moreover, the conditional distributions are also hypergeometric. Thus

    P{X_i = x_i|X_j = x_j} = C(n_i, x_i) C(N−n_i−n_j, n−x_i−x_j) / C(N−n_j, n−x_j),

and

    P{X_i = x_i|X_j = x_j, X_ℓ = x_ℓ} = C(n_i, x_i) C(N−n_i−n_j−n_ℓ, n−x_i−x_j−x_ℓ) / C(N−n_j−n_ℓ, n−x_j−x_ℓ),

and so on. It is therefore easy to write down the marginal and conditional means and variances. We leave the reader to show that

    EX_j = n·n_j/N,

    var(X_j) = n·(n_j/N)·((N−n_j)/N)·((N−n)/(N−1)),

and

    cov(X_i, X_j) = −((N−n)/(N−1))·n·(n_in_j/N²).
5.2.11 Multivariate Negative Binomial Distribution
Consider the setup of Section 5.2.9 where each replication of an experiment terminates in one of k mutually exclusive and exhaustive events A_1, A_2, ..., A_k. Let p_j = P(A_j), j = 1, 2, ..., k. Suppose the experiment is repeated until event A_k is observed for the rth time, r ≥ 1. Then

    P(X_1 = x_1, X_2 = x_2, ..., X_{k−1} = x_{k−1}, X_k = r)
        = [ (x_1 + x_2 + ··· + x_{k−1} + r − 1)! / ( (∏_{j=1}^{k−1} x_j!)(r−1)! ) ] p_k^r ∏_{j=1}^{k−1} p_j^{x_j},    (66)

for x_i = 0, 1, 2, ... (i = 1, 2, ..., k−1), 1 ≤ r < ∞, 0 < p_i < 1, ∑_{i=1}^{k−1} p_i < 1, and p_k = 1 − ∑_{j=1}^{k−1} p_j.

We say that (X_1, X_2, ..., X_{k−1}) has a multivariate negative binomial (or negative multinomial) distribution if its joint PMF is given by (66).

It is easy to see that the marginal PMF of any subset of {X_1, X_2, ..., X_{k−1}} is negative multinomial. In particular, each X_j has a negative binomial distribution.

We will leave the reader to show that

    M(s_1, s_2, ..., s_{k−1}) = Ee^{∑_{j=1}^{k−1} s_jX_j} = p_k^r ( 1 − ∑_{j=1}^{k−1} p_je^{s_j} )^{−r},    (67)

and

    cov(X_i, X_j) = rp_ip_j/p_k².    (68)
PROBLEMS 5.2
1. (a) Let us write
    b(k; n, p) = C(n, k) p^k(1−p)^{n−k},    k = 0, 1, 2, ..., n.
Show that, as k goes from 0 to n, b(k; n, p) first increases monotonically and then decreases monotonically. The greatest value is assumed when k = m, where m is an integer such that
    (n+1)p − 1 < m ≤ (n+1)p,
except that b(m−1; n, p) = b(m; n, p) when m = (n+1)p.
(b) If k ≥ np, then
    P{X ≥ k} ≤ b(k; n, p)·(k+1)(1−p)/(k+1−(n+1)p);
and if k ≤ np, then
    P{X ≤ k} ≤ b(k; n, p)·(n−k+1)p/((n+1)p−k).
2. Generalize the result in Theorem 10 to n independent Poisson RVs, that is, if X_1, X_2, ..., X_n are independent RVs with X_i ∼ P(λ_i), i = 1, 2, ..., n, the conditional distribution of X_1, X_2, ..., X_n, given ∑_{i=1}^n X_i = t, is multinomial with parameters t, λ_1/∑_{i=1}^n λ_i, ..., λ_n/∑_{i=1}^n λ_i.
3. Let X_1, X_2 be independent RVs with X_i ∼ b(n_i, 1/2), i = 1, 2. What is the PMF of X_1 − X_2 + n_2?
4. A box contains N identical balls numbered 1 through N. Of these balls, n are drawn at a time. Let X_1, X_2, ..., X_n denote the numbers on the n balls drawn. Let S_n = ∑_{i=1}^n X_i. Find var(S_n).
5. From a box containing N identical balls marked 1 through N, M balls are drawn one after another without replacement. Let X_i denote the number on the ith ball drawn, i = 1, 2, ..., M, 1 ≤ M ≤ N. Let Y = max(X_1, X_2, ..., X_M). Find the DF and the PMF of Y. Also find the conditional distribution of X_1, X_2, ..., X_M, given Y = y. Find EY and var(Y).
6. Let f(x; r, p), x = 0, 1, 2, ..., denote the PMF of an NB(r; p) RV. Show that the terms f(x; r, p) first increase monotonically and then decrease monotonically. When is the greatest value assumed?
7. Show that the terms
    P_λ{X = k} = e^{−λ}λ^k/k!,    k = 0, 1, 2, ...,
of the Poisson PMF reach their maxima when k is the largest integer ≤ λ and at (λ−1) and λ if λ is an integer.
8. Show that
    C(n, k) p^k(1−p)^{n−k} → e^{−λ}λ^k/k!
as n → ∞ and p → 0, so that np = λ remains constant.
[Hint: Use Stirling's approximation, namely, n! ≈ √(2π) n^{n+1/2} e^{−n} as n → ∞.]
9. A biased coin is tossed indefinitely. Let p (0 < p < 1) be the probability of success (heads). Let Y_1 denote the length of the first run, and Y_2 the length of the second run. Find the PMFs of Y_1 and Y_2 and show that EY_1 = q/p + p/q, EY_2 = 2. If Y_n denotes the length of the nth run, n ≥ 1, what is the PMF of Y_n? Find EY_n.
10. Show that
    C(N, n)^{−1} C(Np, k) C(N(1−p), n−k) → C(n, k) p^k(1−p)^{n−k}
as N → ∞.
11. Show that
    C(r+k−1, k) p^r(1−p)^k → e^{−λ}λ^k/k!
as p → 1 and r → ∞ in such a way that r(1−p) = λ remains fixed.
12. Let X and Y be independent geometric RVs. Show that min(X, Y) and X − Y are independent.
13. Let X and Y be independent RVs with PMFs P{X = k} = p_k, P{Y = k} = q_k, k = 0, 1, 2, ..., where p_k, q_k > 0 and ∑_{k=0}^∞ p_k = ∑_{k=0}^∞ q_k = 1. Let
    P{X = k|X+Y = t} = C(t, k) α_t^k(1 − α_t)^{t−k},    0 ≤ k ≤ t.
Then α_t = α for all t, and
    p_k = e^{−θβ}(θβ)^k/k!,    q_k = e^{−θ}θ^k/k!,
where β = α/(1−α), and θ > 0 is arbitrary. (Chatterji [13])
14. Generalize the result of Example 10 to the case of k urns, k ≥ 3.
15. Let (X_1, X_2, ..., X_{k−1}) have a multinomial distribution with parameters n, p_1, p_2, ..., p_{k−1}. Write
    Y = ∑_{i=1}^k (X_i − np_i)²/(np_i),
where p_k = 1 − p_1 − ··· − p_{k−1}, and X_k = n − X_1 − ··· − X_{k−1}. Find EY and var(Y).
16. Let X_1, X_2 be iid RVs with common DF F, having positive mass at 0, 1, 2, .... Also, let U = max(X_1, X_2) and V = X_1 − X_2. Then
    P{U = j, V = 0} = P{U = j}P{V = 0}
for all j if and only if F is a geometric distribution. (Srivastava [109])
17. Let X and Y be mutually independent RVs, taking nonnegative integer values. Then
    P{X ≤ n} − P{X+Y ≤ n} = αP{X+Y = n}
holds for n = 0, 1, 2, ... and some α > 0 if and only if
    P{Y = n} = (1/(1+α))(α/(1+α))^n,    n = 0, 1, 2, ....
[Hint: Use Problem 3.3.8.] (Puri [83])
18. Let X_1, X_2, ... be a sequence of independent b(1, p) RVs with 0 < p < 1. Also, let Z_N = ∑_{i=1}^N X_i, where N is a P(λ) RV which is independent of the X_i's. Show that Z_N and N − Z_N are independent.
19. Prove Theorems 5, 7, 8, and 11.
20. In Example 2 show that
(a) P{X_(1) = k} = pq^{2k−2}(1+q),    k = 1, 2, ....
(b) P{X_(2) − X_(1) = k} = p/(1+q) for k = 0, and = 2pq^k/(1+q) for k = 1, 2, ....
5.3 SOME CONTINUOUS DISTRIBUTIONS
In this section we study some most frequently used absolutely continuous distributions and
describe their important properties. Before we introduce specific distributions it should
be remarked that associated with each PDFfthere is anindexor aparameterθ(may be
multidimensional) which takes values in an index setΘ. For any particular choice ofθ∈Θ
we obtain a specific PDFf
θfrom the family of PDFs{f θ,θ∈Θ}.
LetXbe an RV with PDFf
θ(x), whereθis a real-valued parameter. We say thatθis
alocationparameter and{f
θ}is alocation familyifX−θhas PDFf(x)which does not
depend onθ. The parameterθis said to be ascaleparameter and{f
θ}is ascale familyof
PDFs ifX/θhas PDFf(x)which is free ofθ.Ifθ=(μ,σ)is two-dimensional, we say that
θis alocation-scaleparameter if the PDF of(X−μ)/σis free ofμandσ. In that case
{f
θ}is known as alocation-scalefamily.
It is easily seen that θ is a location parameter if and only if f_θ(x) = f(x−θ), a scale parameter if and only if f_θ(x) = (1/θ)f(x/θ), and a location-scale parameter if f_θ(x) = (1/σ)f((x−μ)/σ), σ > 0, for some PDF f. The density f is called the standard PDF for the family {f_θ, θ ∈ Θ}.

A location parameter simply relocates or shifts the graph of the PDF f without changing its shape. A scale parameter stretches (if θ > 1) or contracts (if θ < 1) the graph of f. A location-scale parameter, on the other hand, stretches or contracts the graph of f with the scale parameter and then shifts the graph to locate at μ (see Fig. 1).

Some PDFs also have a shape parameter. Changing its value alters the shape of the graph. For the Poisson distribution λ is a shape parameter.

For the following PDF

    f(x; μ, β, α) = [1/(βΓ(α))]((x−μ)/β)^{α−1} exp{−(x−μ)/β},    x > μ,

and = 0 otherwise, μ is a location, β a scale, and α a shape parameter. The standard density for this location-scale family is

    f(x) = [1/Γ(α)] x^{α−1}e^{−x},    x > 0,

and = 0 otherwise. For the standard PDF f, α is a shape parameter.

Fig. 1  (a) Exponential location family; (b) exponential scale family; (c) normal location-scale family; and (d) shape parameter family f_θ(x) = θx^{θ−1}.
5.3.1 Uniform Distribution (Rectangular Distribution)
Definition 1. An RV X is said to have a uniform distribution on the interval [a, b], −∞ < a < b < ∞, if its PDF is given by

    f(x) = 1/(b−a),  a ≤ x ≤ b,  and = 0 otherwise.    (1)

We will write X ∼ U[a, b] if X has a uniform distribution on [a, b].

The end point a or b or both may be excluded. Clearly, ∫_{−∞}^∞ f(x) dx = 1, so that (1) indeed defines a PDF. The DF of X is given by

    F(x) = 0,  x < a,
         = (x−a)/(b−a),  a ≤ x < b,
         = 1,  b ≤ x;    (2)

    EX = (a+b)/2,    EX^k = (b^{k+1} − a^{k+1})/((k+1)(b−a)),  k > 0 an integer;    (3)

    var(X) = (b−a)²/12;    (4)

    M(t) = (e^{tb} − e^{ta})/(t(b−a)),    t ≠ 0.    (5)
Example 1. Let X have PDF given by

f(x) = λe^{−λx},  0 < x < ∞, λ > 0,  and  = 0 otherwise.

Then

F(x) = 0 for x ≤ 0,  and  = 1 − e^{−λx} for x > 0.

Let Y = F(X) = 1 − e^{−λX}. The PDF of Y is given by

f_Y(y) = (1/λ) · [1/(1 − y)] · λ exp{−λ(−(1/λ) log(1 − y))} = 1,  0 ≤ y < 1.

Let us define f_Y(y) = 1 at y = 1. Then we see that Y has density function

f_Y(y) = 1,  0 ≤ y ≤ 1,  and  = 0 otherwise,

which is the U[0, 1] distribution. That this is not a mere coincidence is shown in the following theorem.
Theorem 1 (Probability Integral Transformation). Let X be an RV with a continuous DF F. Then F(X) has the uniform distribution on [0, 1].

Proof. The proof is left as an exercise.

The reader is asked to consider what happens in the case where F is the DF of a discrete RV. In the converse direction the following result holds.

Theorem 2. Let F be any DF, and let X be a U[0, 1] RV. Then there exists a function h such that h(X) has DF F, that is,

P{h(X) ≤ x} = F(x)  for all x ∈ (−∞, ∞).   (6)

Proof. If F is the DF of a discrete RV Y, let

P{Y = y_k} = p_k,  k = 1, 2, ....

Define h as follows:

h(x) = y_1 if 0 ≤ x < p_1,  = y_2 if p_1 ≤ x < p_1 + p_2,  and so on.

Then

P{h(X) = y_1} = P{0 ≤ X < p_1} = p_1,
P{h(X) = y_2} = P{p_1 ≤ X < p_1 + p_2} = p_2,

and, in general,

P{h(X) = y_k} = p_k,  k = 1, 2, ....

Thus h(X) is a discrete RV with DF F.

If F is continuous and strictly increasing, F^{−1} is well defined, and we take h(X) = F^{−1}(X). We have

P{h(X) ≤ x} = P{F^{−1}(X) ≤ x} = P{X ≤ F(x)} = F(x),

as asserted.

In general, define

F^{−1}(y) = inf{x : F(x) ≥ y},   (7)

and let h(X) = F^{−1}(X). Then we have

{F^{−1}(y) ≤ x} = {y ≤ F(x)}.   (8)

Indeed, F^{−1}(y) ≤ x implies that, for every ε > 0, y ≤ F(x + ε). Since ε > 0 is arbitrary and F is continuous on the right, we let ε → 0 and conclude that y ≤ F(x). Since y ≤ F(x) implies F^{−1}(y) ≤ x by definition (7), it follows that (8) holds generally. Thus

P{F^{−1}(X) ≤ x} = P{X ≤ F(x)} = F(x).
Theorem 2 is quite useful in generating samples with the help of the uniform distribution.

Example 2. Let F be the DF defined by

F(x) = 0 for x ≤ 0,  and  = 1 − e^{−x} for x > 0.

Then the inverse to y = 1 − e^{−x}, x > 0, is x = −log(1 − y), 0 < y < 1. Thus

h(y) = −log(1 − y),

and −log(1 − X) has the required distribution, where X is a U[0, 1] RV.
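The inverse-transform recipe of Theorem 2 and Example 2 is easy to try numerically. The Python sketch below (our own illustration, not part of the text) draws U[0, 1] variates and maps them through h(y) = −log(1 − y); the sample mean should be close to 1, the mean of the standard exponential DF of Example 2.

```python
import math
import random

def inverse_transform_exponential(n, seed=0):
    """Draw n exponential(1) variates via h(U) = -log(1 - U), U ~ U[0, 1]."""
    rng = random.Random(seed)
    return [-math.log(1.0 - rng.random()) for _ in range(n)]

sample = inverse_transform_exponential(100_000)
print(sum(sample) / len(sample))  # should be close to EX = 1
```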
Theorem 3. Let X be an RV defined on [0, 1]. If P{x < X ≤ y} depends only on y − x for all 0 ≤ x ≤ y ≤ 1, then X is U[0, 1].

Proof. Let P{x < X ≤ y} = f(y − x). Then f(x + y) = P{0 < X ≤ x + y} = P{0 < X ≤ x} + P{x < X ≤ x + y} = f(x) + f(y). Note that f is continuous from the right. We have

f(x) = f(x) + f(0),

so that f(0) = 0.

We will show that f(x) = cx for some constant c. It suffices to prove the result for positive x. Let m be an integer; then

f(mx) = f(x) + ··· + f(x) = mf(x).

Letting x = n/m, we get

f(n) = f(m · (n/m)) = m f(n/m),

so that

f(n/m) = (1/m) f(n) = (n/m) f(1),

for positive integers n and m. Letting f(1) = c, we have proved that

f(x) = cx

for rational numbers x.

To complete the proof we consider the case where x is a positive irrational number. Then we can find a decreasing sequence of positive rationals x_1, x_2, ... such that x_n → x. Since f is right continuous,

f(x) = lim_{x_n ↓ x} f(x_n) = lim_{x_n ↓ x} cx_n = cx.

Now, for 0 ≤ x ≤ 1,

F(x) = P{X ≤ 0} + P{0 < X ≤ x} = F(0) + P{0 < X ≤ x} = f(x) = cx.

Since F(1) = 1, we must have c = 1, so that

F(x) = x,  0 ≤ x ≤ 1.

This completes the proof.
5.3.2 Gamma Distribution
The integral

Γ(α) = ∫_{0+}^{∞} x^{α−1} e^{−x} dx   (9)

converges or diverges according as α > 0 or α ≤ 0. For α > 0 the integral in (9) is called the gamma function. In particular, if α = 1, Γ(1) = 1. If α > 1, integration by parts yields

Γ(α) = (α − 1) ∫_{0}^{∞} x^{α−2} e^{−x} dx = (α − 1)Γ(α − 1).   (10)

If α = n is a positive integer, then

Γ(n) = (n − 1)!.   (11)

Also, writing x = y²/2 in Γ(1/2) we see that

Γ(1/2) = √2 ∫_{0}^{∞} e^{−y²/2} dy.

Now consider the integral I = ∫_{0}^{∞} e^{−y²/2} dy. We have

I² = ∫_{0}^{∞} ∫_{0}^{∞} exp{−(x² + y²)/2} dx dy,

and changing to polar coordinates we get

I² = ∫_{0}^{π/2} ∫_{0}^{∞} r exp(−r²/2) dr dθ = π/2.

It follows that Γ(1/2) = √π.

Let us write x = y/β, β > 0, in the integral in (9). Then

Γ(α) = ∫_{0}^{∞} (y^{α−1}/β^{α}) e^{−y/β} dy,   (12)

so that

∫_{0}^{∞} [1/(Γ(α)β^{α})] y^{α−1} e^{−y/β} dy = 1.   (13)

Since the integrand in (13) is positive for y > 0, it follows that the function

f(y) = [1/(Γ(α)β^{α})] y^{α−1} e^{−y/β},  0 < y < ∞,  and  = 0 for y ≤ 0,   (14)

defines a PDF for α > 0, β > 0.
Definition 2. An RV X with PDF defined by (14) is said to have a gamma distribution with parameters α and β. We will write X ∼ G(α, β).

Figure 2 gives graphs of some gamma PDFs.

The DF of a G(α, β) RV is given by

F(x) = 0 for x ≤ 0,  and  = [1/(Γ(α)β^{α})] ∫_{0}^{x} y^{α−1} e^{−y/β} dy for x > 0.   (15)

Fig. 2 Gamma density functions (panels for various values of α and β).
The MGF of X is easily computed. We have

M(t) = [1/(Γ(α)β^{α})] ∫_{0}^{∞} e^{x(t−1/β)} x^{α−1} dx
     = (1 − βt)^{−α} ∫_{0}^{∞} [y^{α−1} e^{−y}/Γ(α)] dy,  t < 1/β
     = (1 − βt)^{−α},  t < 1/β.   (16)

It follows that

EX = M′(t)|_{t=0} = αβ,   (17)
EX² = M″(t)|_{t=0} = α(α + 1)β²,   (18)

so that

var(X) = αβ².   (19)

Indeed, we can compute the moment of order n such that α + n > 0 directly from the density. We have

EX^n = [1/(Γ(α)β^{α})] ∫_{0}^{∞} e^{−x/β} x^{α+n−1} dx
     = β^n Γ(α + n)/Γ(α) = β^n (α + n − 1)(α + n − 2)···α.   (20)

The special case α = 1 leads to the exponential distribution with parameter β. The PDF of an exponentially distributed RV is therefore

f(x) = β^{−1} e^{−x/β},  x > 0,  and  = 0 otherwise.   (21)

Note that we can also speak of the exponential distribution on (−∞, 0). The PDF of such an RV is

f(x) = β^{−1} e^{x/β},  x < 0,  and  = 0 for x ≥ 0.   (22)

Clearly, if X ∼ G(1, β), we have

EX^n = n! β^n,   (23)
EX = β  and  var(X) = β²,   (24)
M(t) = (1 − βt)^{−1}  for t < β^{−1}.   (25)
Another special case of importance is α = n/2, n > 0 an integer, and β = 2.

Definition 3. An RV X is said to have a chi-square distribution (χ²-distribution) with n degrees of freedom, where n is a positive integer, if its PDF is given by

f(x) = [1/(Γ(n/2) 2^{n/2})] e^{−x/2} x^{n/2−1},  0 < x < ∞,  and  = 0 for x ≤ 0.   (26)

We will write X ∼ χ²(n) for a χ² RV with n degrees of freedom (d.f.). [Note the difference in the abbreviations of distribution function (DF) and degrees of freedom (d.f.).]

If X ∼ χ²(n), then

EX = n,  var(X) = 2n,   (27)
EX^k = 2^k Γ[(n/2) + k]/Γ(n/2),   (28)

and

M(t) = (1 − 2t)^{−n/2}  for t < 1/2.   (29)
Theorem 4. Let X_1, X_2, ..., X_n be independent RVs such that X_j ∼ G(α_j, β), j = 1, 2, ..., n. Then S_n = Σ_{k=1}^{n} X_k is a G(Σ_{j=1}^{n} α_j, β) RV.

Corollary 1. Let X_1, X_2, ..., X_n be iid RVs, each with an exponential distribution with parameter β. Then S_n is a G(n, β) RV.

Corollary 2. If X_1, X_2, ..., X_n are independent RVs such that X_j ∼ χ²(r_j), j = 1, 2, ..., n, then S_n is a χ²(Σ_{j=1}^{n} r_j) RV.

Theorem 5. Let X ∼ U(0, 1). Then Y = −2 log X is χ²(2).

Corollary. Let X_1, X_2, ..., X_n be iid RVs with common distribution U(0, 1). Then −2 Σ_{i=1}^{n} log X_i = 2 log(1/∏_{i=1}^{n} X_i) is χ²(2n).

Theorem 6. Let X ∼ G(α_1, β) and Y ∼ G(α_2, β) be independent RVs. Then X + Y and X/Y are independent.

Corollary. Let X ∼ G(α_1, β) and Y ∼ G(α_2, β) be independent RVs. Then X + Y and X/(X + Y) are independent.

The converse of Theorem 6 is also true. The result is due to Lukacs [68], and we state it without proof.

Theorem 7. Let X and Y be two nondegenerate RVs that take only positive values. Suppose that U = X + Y and V = X/Y are independent. Then X and Y have gamma distributions with the same parameter β.

Theorem 8. Let X ∼ G(1, β). Then the RV X has "no memory," that is,

P{X > r + s | X > s} = P{X > r},   (30)

for any two positive real numbers r and s.

Proof. The proof is left as an exercise.

The converse of Theorem 8 is also true in the following sense.

Theorem 9. Let F be a DF such that F(x) = 0 if x < 0, F(x) < 1 if x > 0, and

[1 − F(x + y)]/[1 − F(y)] = 1 − F(x)  for all x, y > 0.   (31)

Then there exists a constant β > 0 such that

1 − F(x) = e^{−βx},  x > 0.   (32)

Proof. Equation (31) is equivalent to

g(x + y) = g(x) + g(y)

if we write g(x) = log{1 − F(x)}. From the proof of Theorem 3 it is clear that the only right continuous solution is g(x) = cx. Hence F(x) = 1 − e^{cx}, x ≥ 0. Since F(x) → 1 as x → ∞, it follows that c < 0 and the proof is complete.

Theorem 10. Let X_1, X_2, ..., X_n be iid RVs. Then X_i ∼ G(1, nβ), i = 1, 2, ..., n, if and only if X_(1) is G(1, β).

Note that if X_1, X_2, ..., X_n are independent with X_i ∼ G(1, β_i), i = 1, 2, ..., n, then X_(1) is a G(1, 1/Σ_{i=1}^{n} β_i^{−1}) RV.

The following result describes the relationship between exponential and Poisson RVs.

Theorem 11. Let X_1, X_2, ... be a sequence of iid RVs having common exponential density with parameter β > 0. Let S_n = Σ_{k=1}^{n} X_k be the nth partial sum, n = 1, 2, ..., and suppose that t > 0. If Y = the number of S_n ∈ [0, t], then Y is a P(t/β) RV.

Proof. We have

P{Y = 0} = P{S_1 > t} = (1/β) ∫_{t}^{∞} e^{−x/β} dx = e^{−t/β},

so that the assertion holds for Y = 0. Let n be a positive integer. Since the X_i's are nonnegative, S_n is nondecreasing, and

P{Y = n} = P{S_n ≤ t, S_{n+1} > t}.   (33)

Now

P{S_n ≤ t} = P{S_n ≤ t, S_{n+1} > t} + P{S_{n+1} ≤ t}.   (34)

It follows that

P{Y = n} = P{S_n ≤ t} − P{S_{n+1} ≤ t},   (35)

and, since S_n ∼ G(n, β), we have

P{Y = n} = ∫_{0}^{t} [1/(Γ(n)β^{n})] x^{n−1} e^{−x/β} dx − ∫_{0}^{t} [1/(Γ(n + 1)β^{n+1})] x^{n} e^{−x/β} dx
         = t^{n} e^{−t/β}/(β^{n} n!),

as asserted.
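Theorem 11 is the familiar fact that exponential interarrival times produce Poisson counts. A small Python simulation (an illustrative sketch only; the function name is ours) can be used to check it: count how many partial sums of exponential variates with mean β fall in [0, t] and compare the empirical mean of the count with t/β.

```python
import random

def poisson_count_via_exponentials(t, beta, rng):
    """Count how many partial sums S_n of iid exponential(mean beta) RVs lie in [0, t]."""
    total, count = 0.0, 0
    while True:
        total += rng.expovariate(1.0 / beta)  # expovariate takes the rate 1/beta
        if total > t:
            return count
        count += 1

rng = random.Random(1)
t, beta = 5.0, 2.0
counts = [poisson_count_via_exponentials(t, beta, rng) for _ in range(50_000)]
print(sum(counts) / len(counts), "vs t/beta =", t / beta)  # both about 2.5
```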
Theorem 12. If X and Y are independent exponential RVs with parameter β, then Z = X/(X + Y) has a U(0, 1) distribution.

Note that, in view of Theorem 7, Theorem 12 characterizes the exponential distribution in the following sense. Let X and Y be independent RVs that are nondegenerate and take only positive values, and suppose that X + Y and X/Y are independent. If X/(X + Y) is U(0, 1), then X and Y both have the exponential distribution with parameter β. This follows since, by Theorem 7, X and Y must have gamma distributions with the same parameter β. Thus X/(X + Y) must have (see Theorem 14) the PDF

f(x) = [Γ(α_1 + α_2)/(Γ(α_1)Γ(α_2))] x^{α_1−1} (1 − x)^{α_2−1},  0 < x < 1,

and this is the uniform density on (0, 1) if and only if α_1 = α_2 = 1. Thus X and Y both have the G(1, β) distribution.
Theorem 13. Let X be a P(λ) RV. Then

P{X ≤ K} = (1/K!) ∫_{λ}^{∞} e^{−x} x^{K} dx   (36)

expresses the DF of X in terms of an incomplete gamma function.

Proof. We have

(d/dλ) P{X ≤ K} = Σ_{j=0}^{K} (1/j!) {j e^{−λ} λ^{j−1} − λ^{j} e^{−λ}} = −λ^{K} e^{−λ}/K!,

and it follows on integration that

P{X ≤ K} = (1/K!) ∫_{λ}^{∞} e^{−x} x^{K} dx,

as asserted.

An alternative way of writing (36) is the following:

P{X ≤ K} = P{Y ≥ 2λ},

where X ∼ P(λ) and Y ∼ χ²(2K + 2).
5.3.3 Beta Distribution
The integral

B(α, β) = ∫_{0+}^{1−} x^{α−1} (1 − x)^{β−1} dx   (37)

converges for α > 0 and β > 0 and is called a beta function. For α ≤ 0 or β ≤ 0 the integral in (37) diverges. It is easy to see that for α > 0 and β > 0

B(α, β) = B(β, α),   (38)

B(α, β) = ∫_{0+}^{∞} x^{α−1} (1 + x)^{−α−β} dx,   (39)

and

B(α, β) = Γ(α)Γ(β)/Γ(α + β).   (40)

It follows that

f(x) = x^{α−1} (1 − x)^{β−1}/B(α, β),  0 < x < 1,  and  = 0 otherwise,   (41)

defines a PDF.

Definition 4. An RV X with PDF given by (41) is said to have a beta distribution with parameters α and β, α > 0 and β > 0. We will write X ∼ B(α, β) for a beta variable with density (41).

Figure 3 gives graphs of some beta PDFs.

The DF of a B(α, β) RV is given by

F(x) = 0 for x ≤ 0,  = [B(α, β)]^{−1} ∫_{0+}^{x} y^{α−1}(1 − y)^{β−1} dy for 0 < x < 1,  and  = 1 for x ≥ 1.   (42)

Fig. 3 Beta density functions (for (α, β) = (0.5, 0.5), (1, 1), (1, 2), (2, 1), (3, 3), (9, 2)).
If n is a positive number, then

EX^n = [1/B(α, β)] ∫_{0}^{1} x^{n+α−1} (1 − x)^{β−1} dx = B(n + α, β)/B(α, β)
     = Γ(n + α)Γ(α + β)/[Γ(α)Γ(n + α + β)],   (43)

using (40). In particular,

EX = α/(α + β)   (44)

and

var(X) = αβ/[(α + β)²(α + β + 1)].   (45)

For the MGF of X ∼ B(α, β), we have

M(t) = [1/B(α, β)] ∫_{0}^{1} e^{tx} x^{α−1} (1 − x)^{β−1} dx.   (46)

Since moments of all orders exist, and E|X|^j < 1 for all j, we have

M(t) = Σ_{j=0}^{∞} (t^j/j!) EX^j = Σ_{j=0}^{∞} [t^j/Γ(j + 1)] Γ(α + j)Γ(α + β)/[Γ(α + β + j)Γ(α)].   (47)

Remark 1. Note that in the special case where α = β = 1 we get the uniform distribution on (0, 1).

Remark 2. If X is a beta RV with parameters α and β, then 1 − X is a beta variate with parameters β and α. In particular, X is B(α, α) if and only if 1 − X is B(α, α). A special case is the uniform distribution on (0, 1). If X and 1 − X have the same distribution, it does not follow that X has to be B(α, α). All this entails is that the PDF satisfies

f(x) = f(1 − x),  0 < x < 1.

Take, for example,

f(x) = [1/(B(α, β) + B(β, α))] [x^{α−1}(1 − x)^{β−1} + (1 − x)^{α−1} x^{β−1}],  0 < x < 1.
Example 3. Let X be distributed with PDF

f(x) = 12 x² (1 − x),  0 < x < 1,  and  = 0 otherwise.

Then X ∼ B(3, 2) and

EX^n = Γ(n + 3)Γ(5)/[Γ(3)Γ(n + 5)] = (4!/2!) · (n + 2)!/(n + 4)! = 12/[(n + 4)(n + 3)],

EX = 12/20,   var(X) = 6/(5² · 6) = 1/25,

M(t) = Σ_{j=0}^{∞} (t^j/j!) · [(j + 2)!/(j + 4)!] · (4!/2!) = Σ_{j=0}^{∞} [12/((j + 4)(j + 3))] · (t^j/j!),

and

P{0.2 < X < 0.5} = 12 ∫_{0.2}^{0.5} (x² − x³) dx ≈ 0.285.
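The probability at the end of Example 3 is easy to confirm numerically. The short Python check below (our own illustration) integrates the B(3, 2) density 12x²(1 − x) over (0.2, 0.5) with a simple midpoint rule.

```python
def beta32_pdf(x):
    """PDF of B(3, 2): 12 x^2 (1 - x) on (0, 1)."""
    return 12.0 * x * x * (1.0 - x)

def midpoint_integral(f, a, b, n=100_000):
    h = (b - a) / n
    return h * sum(f(a + (i + 0.5) * h) for i in range(n))

print(midpoint_integral(beta32_pdf, 0.2, 0.5))   # about 0.285
print(midpoint_integral(beta32_pdf, 0.0, 1.0))   # about 1.0, a sanity check
```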

Theorem 14. Let X and Y be independent G(α_1, β) and G(α_2, β) RVs, respectively. Then X/(X + Y) is a B(α_1, α_2) RV.

Let X_1, X_2, ..., X_n be iid RVs with the uniform distribution on [0, 1]. Let X_(k) be the kth-order statistic.

Theorem 15. The RV X_(k) has a beta distribution with parameters α = k and β = n − k + 1.

Proof. Let X be the number of X_i's that lie in [0, t]. Then X is b(n, t). We have

P{X_(k) ≤ t} = P{X ≥ k} = Σ_{j=k}^{n} C(n, j) t^j (1 − t)^{n−j}.

Also

(d/dt) P{X ≥ k} = Σ_{j=k}^{n} C(n, j) {j t^{j−1}(1 − t)^{n−j} − (n − j) t^j (1 − t)^{n−j−1}}
                = Σ_{j=k}^{n} {n C(n − 1, j − 1) t^{j−1}(1 − t)^{n−j} − n C(n − 1, j) t^j (1 − t)^{n−j−1}}
                = n C(n − 1, k − 1) t^{k−1} (1 − t)^{n−k}.

On integration, we get

P{X_(k) ≤ t} = n C(n − 1, k − 1) ∫_{0}^{t} x^{k−1} (1 − x)^{n−k} dx,

as asserted.
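Theorem 15 lends itself to a quick empirical check. The Python sketch below (illustrative only) simulates the kth-order statistic of n iid U[0, 1] variates and compares its sample mean with the B(k, n − k + 1) mean k/(n + 1).

```python
import random

def kth_order_statistic_mean(n, k, trials=50_000, seed=2):
    """Average the kth smallest of n iid U[0, 1] draws over many trials."""
    rng = random.Random(seed)
    total = 0.0
    for _ in range(trials):
        total += sorted(rng.random() for _ in range(n))[k - 1]
    return total / trials

n, k = 10, 3
print(kth_order_statistic_mean(n, k))   # empirical mean of X_(3)
print(k / (n + 1))                      # Beta(3, 8) mean = 3/11, about 0.273
```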
Remark 3. Note that we have shown that if X is b(n, p), then

1 − P{X < k} = n C(n − 1, k − 1) ∫_{0}^{p} x^{k−1} (1 − x)^{n−k} dx,   (48)

which expresses the DF of X in terms of the DF of a B(k, n − k + 1) RV.

Theorem 16. Let X_1, X_2, ..., X_n be independent RVs. Then X_1, X_2, ..., X_n are iid B(α, 1) RVs if and only if X_(n) ∼ B(αn, 1).
5.3.4 Cauchy Distribution
Definition 5. An RV X is said to have a Cauchy distribution with parameters μ and θ if its PDF is given by

f(x) = (μ/π) · 1/[μ² + (x − θ)²],  −∞ < x < ∞, μ > 0.   (49)

Fig. 4 Cauchy density function.

We will write X ∼ C(μ, θ) for a Cauchy RV with density (49). Figure 4 gives the graph of a Cauchy PDF.
We first check that (49) in fact defines a PDF. Substituting y = (x − θ)/μ, we get

∫_{−∞}^{∞} f(x) dx = (1/π) ∫_{−∞}^{∞} dy/(1 + y²) = (2/π) [tan^{−1} y]_{0}^{∞} = 1.

The DF of a C(1, 0) RV is given by

F(x) = 1/2 + (1/π) tan^{−1} x,  −∞ < x < ∞.   (50)

Theorem 17. Let X be a Cauchy RV with parameters μ and θ. The moments of order < 1 exist, but the moments of order ≥ 1 do not exist for the RV X.

Proof. It suffices to consider the PDF

f(x) = (1/π) · 1/(1 + x²),  −∞ < x < ∞.

We have

E|X|^α = (2/π) ∫_{0}^{∞} x^α/(1 + x²) dx,

and letting z = 1/(1 + x²) in the integral, we get

E|X|^α = (1/π) ∫_{0}^{1} z^{(1−α)/2 − 1} (1 − z)^{[(α+1)/2] − 1} dz,

which converges for α < 1 and diverges for α ≥ 1. This completes the proof of the theorem.
It follows from Theorem 17 that the MGF of a Cauchy RV does not exist. This creates some manipulative problems. We note, however, that the CF of X ∼ C(μ, 0) is given by

φ(t) = e^{−μ|t|}.   (51)

Theorem 18. Let X ∼ C(μ_1, θ_1) and Y ∼ C(μ_2, θ_2) be independent RVs. Then X + Y is a C(μ_1 + μ_2, θ_1 + θ_2) RV.

Proof. For notational convenience we will prove the result in the special case where μ_1 = μ_2 = 1 and θ_1 = θ_2 = 0, that is, where X and Y have the common PDF

f(x) = (1/π) · 1/(1 + x²),  −∞ < x < ∞.

The proof in the general case follows along the same lines. If Z = X + Y, the PDF of Z is given by

f_Z(z) = (1/π²) ∫_{−∞}^{∞} [1/(1 + x²)] · [1/(1 + (z − x)²)] dx.

Now

1/[(1 + x²)(1 + (z − x)²)]
  = [1/(z²(z² + 4))] [ 2zx/(1 + x²) + z²/(1 + x²) + (2z² − 2zx)/(1 + (z − x)²) + z²/(1 + (z − x)²) ],

so that

f_Z(z) = (1/π²) [1/(z²(z² + 4))] [ z log{(1 + x²)/(1 + (z − x)²)} + z² tan^{−1} x + z² tan^{−1}(x − z) ]_{−∞}^{∞}
       = (1/π) · 2/(z² + 2²),  −∞ < z < ∞.

It follows that, if X and Y are iid C(1, 0) RVs, then X + Y is a C(2, 0) RV. We note that the result follows effortlessly from (51).
Corollary. Let X_1, X_2, ..., X_n be independent Cauchy RVs, X_k ∼ C(μ_k, θ_k), k = 1, 2, ..., n. Then S_n = Σ_{1}^{n} X_k is a C(Σ_{1}^{n} μ_k, Σ_{1}^{n} θ_k) RV.

In particular, if X_1, X_2, ..., X_n are iid C(1, 0) RVs, n^{−1}S_n is also a C(1, 0) RV. This is a remarkable result, the importance of which will become clear in Chapter 7. Actually, this property uniquely characterizes the Cauchy distribution. If F is a nondegenerate DF with the property that n^{−1}S_n also has DF F, then F must be a Cauchy distribution (see Thompson [113, p. 112]).

The proof of the following result is simple.

Theorem 19. Let X be C(μ, 0). Then λ/X, where λ is a constant, is a C(|λ|/μ, 0) RV.

Corollary. X is C(1, 0) if and only if 1/X is C(1, 0).

We emphasize that if X and 1/X have the same PDF on (−∞, ∞), it does not follow that X is C(1, 0), for let X be an RV with PDF

f(x) = 1/4  if |x| ≤ 1,  and  = 1/(4x²)  if |x| > 1.

Then X and 1/X have the same PDF, as can be easily checked. (Menon [73] has shown that we need the condition that both X and 1/X be stable to conclude that X is Cauchy. A nondegenerate distribution function F is said to be stable if, for two iid RVs X_1, X_2 with common DF F, and given constants a_1, a_2 > 0, we can find α > 0 and β(a_1, a_2) such that the RV X_3 = α^{−1}(a_1X_1 + a_2X_2 − β) again has the same distribution F. Examples are the Cauchy (see the corollary to Theorem 18) and normal (discussed in Section 5.3.5) distributions.)

Theorem 20. Let X be a U(−π/2, π/2) RV. Then Y = tan X is a Cauchy RV.

Many important properties of the Cauchy distribution can be derived from this result (see Pitman and Williams [80]).
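Theorem 20 gives a convenient way to simulate Cauchy variates, and the corollary to Theorem 18 explains their notorious behavior: the sample mean of C(1, 0) observations is again C(1, 0) and never settles down. The Python sketch below (ours, for illustration) generates Cauchy values as tan of a U(−π/2, π/2) variate and prints running means.

```python
import math
import random

def cauchy_via_tan(rng):
    """C(1, 0) variate as tan(U), U uniform on (-pi/2, pi/2) (Theorem 20)."""
    return math.tan(rng.uniform(-math.pi / 2, math.pi / 2))

rng = random.Random(3)
total = 0.0
for n in range(1, 100_001):
    total += cauchy_via_tan(rng)
    if n in (10, 100, 1_000, 10_000, 100_000):
        # the running mean keeps fluctuating: it is itself C(1, 0)
        print(n, total / n)
```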
5.3.5 Normal Distribution (the Gaussian Law)
One of the most important distributions in the study of probability and mathematical statistics is the normal distribution, which we will examine presently.

Definition 6. An RV X is said to have a standard normal distribution if its PDF is given by

φ(x) = (1/√(2π)) e^{−x²/2},  −∞ < x < ∞.   (52)

We first check that φ defines a PDF. Let

I = ∫_{−∞}^{∞} e^{−x²/2} dx.

Then

0 < e^{−x²/2} < e^{−|x|+1},  −∞ < x < ∞,   and   ∫_{−∞}^{∞} e^{−|x|+1} dx = 2e,

and it follows that I exists. We have

I = ∫_{0}^{∞} y^{−1/2} e^{−y/2} dy = Γ(1/2) 2^{1/2} = √(2π).

Thus ∫_{−∞}^{∞} φ(x) dx = 1, as required.
Let us write Y = σX + μ, where σ > 0. Then the PDF of Y is given by

ψ(y) = (1/σ) φ((y − μ)/σ) = [1/(σ√(2π))] e^{−(y−μ)²/(2σ²)},  −∞ < y < ∞; σ > 0, −∞ < μ < ∞.   (53)

Definition 7. An RV X is said to have a normal distribution with parameters μ (−∞ < μ < ∞) and σ (> 0) if its PDF is given by (53).

If X is a normally distributed RV with parameters μ and σ, we will write X ∼ N(μ, σ²). In this notation φ defined by (52) is the PDF of an N(0, 1) RV. The DF of an N(0, 1) RV will be denoted by Φ(x), where

Φ(x) = (1/√(2π)) ∫_{−∞}^{x} e^{−u²/2} du.   (54)

Clearly, if X ∼ N(μ, σ²), then Z = (X − μ)/σ ∼ N(0, 1). Z is called a standard normal RV.

For the MGF of an N(μ, σ²) RV, we have

M(t) = [1/(√(2π)σ)] ∫_{−∞}^{∞} exp{ −x²/(2σ²) + x(tσ² + μ)/σ² − μ²/(2σ²) } dx
     = [1/(√(2π)σ)] ∫_{−∞}^{∞} exp{ −(x − μ − σ²t)²/(2σ²) + μt + σ²t²/2 } dx
     = exp{ μt + σ²t²/2 },   (55)

for all real values of t. Moments of all orders exist and may be computed from the MGF. Thus

EX = M′(t)|_{t=0} = (μ + σ²t)M(t)|_{t=0} = μ   (56)

and

EX² = M″(t)|_{t=0} = [M(t)σ² + (μ + σ²t)² M(t)]|_{t=0} = σ² + μ².   (57)

Thus

var(X) = σ².   (58)

Clearly, the central moments of odd order are all 0. The central moments of even order are as follows (n a positive integer):

E(X − μ)^{2n} = [1/(σ√(2π))] ∫_{−∞}^{∞} x^{2n} e^{−x²/(2σ²)} dx
             = [σ^{2n}/√(2π)] 2^{n+1/2} Γ(n + 1/2)
             = [(2n − 1)(2n − 3)···3·1] σ^{2n}.   (59)

As for the absolute moment of order α, for a standard normal RV Z we have

E|Z|^α = (2/√(2π)) ∫_{0}^{∞} z^α e^{−z²/2} dz
       = (1/√(2π)) ∫_{0}^{∞} y^{[(α+1)/2] − 1} e^{−y/2} dy
       = Γ[(α + 1)/2] 2^{α/2}/√π.   (60)

As remarked earlier, the normal distribution is one of the most important distributions in probability and statistics, and for this reason the standard normal distribution is available in tabular form. Table ST2 at the end of the book gives the probability P{Z > z} for various values of z (> 0) in the tail of an N(0, 1) RV. In this book we will write z_α for the value of Z that satisfies α = P{Z > z_α}, 0 ≤ α ≤ 1.
Example 4. By Chebychev's inequality, if E|X|² < ∞, EX = μ, and var(X) = σ², then

P{|X − μ| ≥ Kσ} ≤ 1/K².

For K = 2 we get P{|X − μ| ≥ Kσ} ≤ 0.25, and for K = 3 we have P{|X − μ| ≥ Kσ} ≤ 1/9. If X is, in particular, N(μ, σ²), then

P{|X − μ| ≥ Kσ} = P{|Z| ≥ K},

where Z is N(0, 1). From Table ST2,

P{|Z| ≥ 1} = 0.318,   P{|Z| ≥ 2} = 0.046,   and   P{|Z| ≥ 3} = 0.002.

Thus practically all the distribution is concentrated within three standard deviations of the mean.

Example 5. Let X ∼ N(3, 4). Then

P{2 < X ≤ 5} = P{(2 − 3)/2 < (X − 3)/2 ≤ (5 − 3)/2}
             = P{−0.5 < Z ≤ 1}
             = P{Z ≤ 1} − P{Z ≤ −0.5}
             = 0.841 − P{Z ≥ 0.5}
             = 0.841 − 0.309 = 0.532.
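The probability in Example 5 can be reproduced from the standard normal DF, Φ(x) = (1 + erf(x/√2))/2. A short Python check (ours) follows; it should print a value near 0.532.

```python
import math

def std_normal_cdf(x):
    """Phi(x) for the standard normal distribution."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))

mu, sigma = 3.0, 2.0            # X ~ N(3, 4)
p = std_normal_cdf((5 - mu) / sigma) - std_normal_cdf((2 - mu) / sigma)
print(round(p, 3))              # 0.533 (the table values quoted in the text are rounded)
```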
Theorem 21 (Feller [25, p. 175]). Let Z be a standard normal RV. Then

P{Z > x} ≈ [1/(√(2π) x)] e^{−x²/2}   as x → ∞.   (61)

More precisely, for every x > 0,

(1/√(2π)) e^{−x²/2} (1/x − 1/x³) < P{Z > x} < [1/(√(2π) x)] e^{−x²/2}.   (62)

Proof. We have

(1/√(2π)) ∫_{x}^{∞} e^{−y²/2} (1 − 3/y⁴) dy = (1/√(2π)) e^{−x²/2} (1/x − 1/x³)   (63)

and

(1/√(2π)) ∫_{x}^{∞} e^{−y²/2} (1 + 1/y²) dy = (1/√(2π)) e^{−x²/2} (1/x),   (64)

as can be checked by differentiation. Since 1 − 3/y⁴ < 1 < 1 + 1/y² for y > 0, inequality (62), and hence approximation (61), follow immediately.
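The bounds in (62) are easy to check numerically. The Python snippet below (illustrative) compares the upper and lower bounds with the tail probability computed from erfc for a few values of x.

```python
import math

def normal_tail(x):
    """P{Z > x} for a standard normal RV."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))

for x in (1.0, 2.0, 3.0, 4.0):
    phi = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
    lower = phi * (1.0 / x - 1.0 / x**3)
    upper = phi / x
    print(x, lower, normal_tail(x), upper)  # lower < tail < upper for every x > 0
```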
Theorem 22. Let X_1, X_2, ..., X_n be independent RVs with X_k ∼ N(μ_k, σ_k²), k = 1, 2, ..., n. Then S_n = Σ_{k=1}^{n} X_k is an N(Σ_{k=1}^{n} μ_k, Σ_{1}^{n} σ_k²) RV.

Corollary 3. If X_1, X_2, ..., X_n are iid N(μ, σ²) RVs, then S_n is an N(nμ, nσ²) RV and n^{−1}S_n is an N(μ, σ²/n) RV.

Corollary 4. If X_1, X_2, ..., X_n are iid N(0, 1) RVs, then n^{−1/2}S_n is also an N(0, 1) RV.

We remark that if X_1, X_2, ..., X_n are iid RVs with EX = 0, EX² = 1 such that n^{−1/2}S_n also has the same distribution for each n = 1, 2, ..., that distribution can only be N(0, 1). This characterization of the normal distribution will become clear when we study the central limit theorem in Chapter 7.

Theorem 23. Let X and Y be independent RVs. Then X + Y is normally distributed if and only if X and Y are both normal.

If X and Y are independent normal RVs, X + Y is normal by Theorem 22. The converse is due to Cramér [16] and will not be proved here.

Theorem 24. Let X and Y be independent RVs with common N(0, 1) distribution. Then X + Y and X − Y are independent.

The converse is due to Bernstein [4] and is stated here without proof.

Theorem 25. If X and Y are independent RVs with the same distribution and if Z_1 = X + Y and Z_2 = X − Y are independent, then all the RVs X, Y, Z_1, and Z_2 are normally distributed.

The following result generalizes Theorem 24.

Theorem 26. If X_1, X_2, ..., X_n are independent normal RVs and Σ_{i=1}^{n} a_ib_i var(X_i) = 0, then L_1 = Σ_{i=1}^{n} a_iX_i and L_2 = Σ_{i=1}^{n} b_iX_i are independent. Here a_1, a_2, ..., a_n and b_1, b_2, ..., b_n are fixed (nonzero) real numbers.
Proof. Let var(X_i) = σ_i², and assume without loss of generality that EX_i = 0, i = 1, 2, ..., n. For any real numbers α, β, and t,

E e^{(αL_1 + βL_2)t} = E exp{ t Σ_{i=1}^{n} (αa_i + βb_i)X_i }
  = ∏_{i=1}^{n} exp{ (t²/2)(αa_i + βb_i)² σ_i² }
  = exp{ (α²t²/2) Σ_{1}^{n} a_i²σ_i² + (β²t²/2) Σ_{1}^{n} b_i²σ_i² }   (since Σ_{i} a_ib_iσ_i² = 0)
  = ∏_{i=1}^{n} exp{ (t²α²/2) a_i²σ_i² } · ∏_{i=1}^{n} exp{ (t²β²/2) b_i²σ_i² }
  = ∏_{1}^{n} E e^{tαa_iX_i} · ∏_{1}^{n} E e^{tβb_iX_i}
  = E exp{ tα Σ_{1}^{n} a_iX_i } · E exp{ tβ Σ_{1}^{n} b_iX_i }
  = E e^{αtL_1} E e^{βtL_2}.

Thus we have shown that

M(αt, βt) = M(αt, 0) M(0, βt)  for all α, β, t.

It follows that L_1 and L_2 are independent.

Corollary. If X_1, X_2 are independent N(μ_1, σ²) and N(μ_2, σ²) RVs, then X_1 − X_2 and X_1 + X_2 are independent. (This gives Theorem 24.)

Darmois [20] and Skitovitch [106] provided the converse of Theorem 26, which we state without proof.

Theorem 27. If X_1, X_2, ..., X_n are independent RVs, a_1, a_2, ..., a_n, b_1, b_2, ..., b_n are real numbers none of which equals 0, and if the linear forms

L_1 = Σ_{i=1}^{n} a_iX_i,   L_2 = Σ_{i=1}^{n} b_iX_i

are independent, then all the RVs are normally distributed.

Corollary. If X and Y are independent RVs such that X + Y and X − Y are independent, then X, Y, X + Y, and X − Y are all normal.

Yet another result of this type is the following theorem.

Theorem 28. Let X_1, X_2, ..., X_n be iid RVs. Then the common distribution is normal if and only if

S_n = Σ_{k=1}^{n} X_k   and   Y_n = Σ_{i=1}^{n} (X_i − n^{−1}S_n)²

are independent.

In Chapter 6 we will prove the necessity part of this result, which is basic to the theory of t-tests in statistics (Chapter 10; see also Example 4.4.6). The sufficiency part was proved by Lukacs [67], and we will not prove it here.

Theorem 29. X ∼ N(0, 1) ⇒ X² ∼ χ²(1).

Proof. See Example 2.5.7 for the proof.
Corollary 1. If X ∼ N(μ, σ²), the RV Z² = (X − μ)²/σ² is χ²(1).

Corollary 2. If X_1, X_2, ..., X_n are independent RVs and X_k ∼ N(μ_k, σ_k²), k = 1, 2, ..., n, then Σ_{k=1}^{n} (X_k − μ_k)²/σ_k² is χ²(n).

Theorem 30. Let X and Y be iid N(0, σ²) RVs. Then X/Y is C(1, 0).

Proof. For the proof see Example 2.5.7.

We remark that the converse of this result does not hold; that is, if Z = X/Y is the quotient of two iid RVs and Z has a C(1, 0) distribution, it does not follow that X and Y are normal. For take X and Y to be iid with PDF

f(x) = (√2/π) · 1/(1 + x⁴),  −∞ < x < ∞.

We leave the reader to verify that Z = X/Y is C(1, 0).
5.3.6 Some Other Continuous Distributions
Several other distributions related to the distributions studied earlier also arise in practice. We record briefly some of these and their important characteristics. We will use these distributions infrequently. We say that X has a lognormal distribution if Y = ln X has a normal distribution. The PDF of X is then

f(x) = [1/(xσ√(2π))] exp{ −(log x − μ)²/(2σ²) },  x > 0,   (65)

and f(x) = 0 for x ≤ 0, where −∞ < μ < ∞, σ > 0. In fact, for x > 0,

P(X ≤ x) = P(ln X ≤ ln x) = P(Y ≤ ln x) = P((Y − μ)/σ ≤ (ln x − μ)/σ) = Φ((ln x − μ)/σ),

where Φ is the DF of an N(0, 1) RV, which leads to (65). It is easily seen that for n ≥ 0

EX^n = exp{ nμ + n²σ²/2 },
EX = exp{ μ + σ²/2 },   var(X) = exp(2μ + 2σ²) − exp(2μ + σ²).   (66)

The MGF of X does not exist.
We say that the RV X has a Pareto distribution with parameters θ > 0 and α > 0 if its PDF is given by

f(x) = αθ^α/(x + θ)^{α+1},  x > 0,   (67)

and 0 otherwise. Here θ is a scale parameter and α is a shape parameter. It is easy to check that

F(x) = P(X ≤ x) = 1 − θ^α/(θ + x)^α,  x > 0,
EX = θ/(α − 1)  for α > 1,   and   var(X) = αθ²/[(α − 2)(α − 1)²]   (68)

for α > 2. The MGF of X does not exist, since X does not possess moments of all orders.

Suppose X has a Pareto distribution with parameters θ and α. Writing Y = ln(X/θ) we see that Y has PDF

f_Y(y) = αe^y/(1 + e^y)^{α+1},  −∞ < y < ∞,   (69)

and DF

F_Y(y) = 1 − (1 + e^y)^{−α},  for all y.

The PDF in (69) is known as a logistic distribution. We introduce location and scale parameters μ and σ by writing Z = μ + σY, taking α = 1, and then the PDF of Z is easily seen to be

f_Z(z) = (1/σ) exp{(z − μ)/σ}/{1 + exp[(z − μ)/σ]}²   (70)

for all real z. This is the PDF of a logistic RV with location–scale parameters μ and σ. We leave the reader to check that

F_Z(z) = exp{(z − μ)/σ} {1 + exp[(z − μ)/σ]}^{−1},
EZ = μ,   var(Z) = π²σ²/3,
M_Z(t) = exp(μt) Γ(1 − σt) Γ(1 + σt),   |t| < 1/σ.   (71)
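Since F_Z in (71) is explicit, the inverse-transform method of Theorem 2 (Section 5.3.1) applies directly: Z = μ + σ log(U/(1 − U)) with U ∼ U(0, 1) has the logistic DF. A quick Python check (our own illustration) of EZ = μ and var(Z) = π²σ²/3:

```python
import math
import random

def logistic_variate(mu, sigma, rng):
    """Inverse-transform draw from the logistic DF in (71)."""
    u = rng.random()
    return mu + sigma * math.log(u / (1.0 - u))

rng = random.Random(5)
mu, sigma = 1.0, 0.5
zs = [logistic_variate(mu, sigma, rng) for _ in range(200_000)]
m = sum(zs) / len(zs)
v = sum((z - m) ** 2 for z in zs) / len(zs)
print(m, v, math.pi ** 2 * sigma ** 2 / 3)   # mean near 1; both variances near 0.822
```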
The Pareto distribution is also related to the exponential distribution. Let X have Pareto PDF of the form

f_X(x) = ασ^α/x^{α+1},  x > σ,   (72)

and 0 otherwise. A simple transformation leads to PDF (72) from (67). Then it is easily seen that Y = ln(X/σ) has an exponential distribution with mean 1/α. Thus some properties of the exponential distribution which are preserved under monotone transformations can be derived for the Pareto PDF (72) by using the logarithmic transformation.

Some other distributions are related to the gamma distribution. Suppose X ∼ G(1, β). Let Y = X^{1/α}, α > 0. Then Y has PDF

f_Y(y) = (α/β) y^{α−1} exp{−y^α/β},  y > 0,   (73)

and 0 otherwise. The RV Y is said to have a Weibull distribution. We leave the reader to show that

F_Y(y) = 1 − exp{−y^α/β},  y > 0,
EY^n = β^{n/α} Γ(1 + n/α),   EY = β^{1/α} Γ(1 + 1/α),
var(Y) = β^{2/α} { Γ(1 + 2/α) − Γ²(1 + 1/α) }.   (74)

The MGF of Y exists only for α ≥ 1, but for α > 1 it does not have a form useful in applications. The special case α = 2 and β = θ² is known as a Rayleigh distribution.

Suppose X has a Weibull distribution with PDF (73). Let Y = ln X. Then Y has DF

F_Y(y) = 1 − exp{ −(1/β) e^{αy} },  −∞ < y < ∞.

Setting θ = (1/α) ln β and σ = 1/α we get

F_Y(y) = 1 − exp{ −exp[(y − θ)/σ] }   (75)

with PDF

f_Y(y) = (1/σ) exp{ (y − θ)/σ − exp[(y − θ)/σ] },   (76)

for −∞ < y < ∞ and σ > 0. An RV with PDF (76) is called an extreme value distribution with location–scale parameters θ and σ. It can be shown that

EY = θ − γσ,   var(Y) = π²σ²/6,   and   M_Y(t) = e^{θt} Γ(1 + σt),   (77)

where γ ≈ 0.577216 is the Euler constant.
The final distribution we consider is also related to a G(1, β) RV. Let f_1 be the PDF of G(1, β) and f_2 the PDF

f_2(x) = (1/β) exp{x/β},  x < 0,  and  = 0 otherwise.

Clearly f_2 is also an exponential PDF, defined on (−∞, 0). Consider the mixture PDF

f(x) = (1/2)[f_1(x) + f_2(x)],  −∞ < x < ∞.   (78)

Clearly,

f(x) = [1/(2β)] exp{−|x|/β},  −∞ < x < ∞,   (79)

and the PDF f defined in (79) is called a Laplace or double exponential PDF. It is convenient to introduce a location parameter μ and consider instead the PDF

f(x) = [1/(2β)] exp{−|x − μ|/β},  −∞ < x < ∞,   (80)

where −∞ < μ < ∞, β > 0. It is easy to see that for an RV X with PDF (80) we have

EX = μ,   var(X) = 2β²,   and   M(t) = e^{μt}[1 − (βt)²]^{−1},   (81)

for |t| < 1/β.

For completeness let us define a mixture PDF (PMF). Let g(x|θ) be a PDF and let h(θ) be a mixing PDF. Then the PDF

f(x) = ∫ g(x|θ)h(θ) dθ   (82)

is called a mixture density function. In case h is a PMF with support set {θ_1, θ_2, ..., θ_k}, then (82) reduces to a finite mixture density function

f(x) = Σ_{i=1}^{k} g(x|θ_i)h(θ_i).   (83)

The quantities h(θ_i) are called mixing proportions. The PDF (78) is an example with k = 2, h(θ_1) = h(θ_2) = 1/2, g(x|θ_1) = f_1(x), and g(x|θ_2) = f_2(x).
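A finite mixture such as (78) can be sampled in two stages: pick a component with the mixing proportions h(θ_i), then draw from that component's density. The Python sketch below (ours, for illustration) does this for the Laplace mixture with β = 1 and checks EX ≈ 0 and var(X) ≈ 2β².

```python
import random

def laplace_via_mixture(beta, rng):
    """Draw from the Laplace(0, beta) PDF as a 50/50 mixture of +/- exponential(beta)."""
    x = rng.expovariate(1.0 / beta)          # exponential with mean beta
    return x if rng.random() < 0.5 else -x   # choose the mixture component

rng = random.Random(7)
beta = 1.0
sample = [laplace_via_mixture(beta, rng) for _ in range(200_000)]
mean = sum(sample) / len(sample)
var = sum((x - mean) ** 2 for x in sample) / len(sample)
print(mean, var)   # near 0 and near 2*beta**2 = 2
```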
PROBLEMS 5.3
1. Prove Theorem 1.
2. Let X be an RV with PMF p_k = P{X = k} given below. If F is the corresponding DF, find the distribution of F(X) in the following cases:
(a) p_k = C(n, k) p^k (1 − p)^{n−k},  k = 0, 1, 2, ..., n; 0 < p < 1.
(b) p_k = e^{−λ} λ^k/k!,  k = 0, 1, 2, ...; λ > 0.
3. Let Y_1 ∼ U[0, 1], Y_2 ∼ U[0, Y_1], ..., Y_n ∼ U[0, Y_{n−1}]. Show that
Y_1 ∼ X_1, Y_2 ∼ X_1X_2, ..., Y_n ∼ X_1X_2···X_n,
where X_1, X_2, ..., X_n are iid U[0, 1] RVs. If U is the number of Y_1, Y_2, ..., Y_n in [t, 1], where 0 < t < 1, show that U has a Poisson distribution with parameter −log t.
4. Let X_1, X_2, ..., X_n be iid U[0, 1] RVs. Prove by induction or otherwise that S_n = Σ_{k=1}^{n} X_k has the PDF
f_n(x) = [(n − 1)!]^{−1} Σ_{k=0}^{n} (−1)^k C(n, k) ε(x − k)(x − k)^{n−1},
where ε(x) = 1 if x ≥ 0, and = 0 if x < 0.
5. (a) Let X be an RV with PMF p_j = P(X = x_j), j = 0, 1, 2, ..., and let F be the DF of X. Show that

E F(X) = (1/2) (1 + Σ_{j=0}^{∞} p_j²),
var F(X) = Σ_{j=0}^{∞} p_j q_{j+1}² − (1/4) (1 − Σ_{j=0}^{∞} p_j²)²,

where q_{j+1} = Σ_{i=j+1}^{∞} p_i.
(b) Let p_j > 0 for j = 0, 1, ..., N and Σ_{j=0}^{N} p_j = 1. Show that

E F(X) ≥ (N + 2)/[2(N + 1)]

with equality if and only if p_j = 1/(N + 1) for all j.
(Rohatgi [91])
6. Prove (a) Theorem 6 and its corollary, and (b) Theorem 10.
7. Let X be a nonnegative RV of the continuous type, and let Y ∼ U(0, X). Also, let Z = X − Y. Then the RVs Y and Z are independent if and only if X is G(2, 1/λ) for some λ > 0. (Lamperti [59])
8. Let X and Y be independent RVs with common PDF f(x) = β^{−α} αx^{α−1} if 0 < x < β, and = 0 otherwise; α ≥ 1. Let U = min(X, Y) and V = max(X, Y). Find the joint PDF of U and V and the PDF of U + V. Show that U/V and V are independent.
9. Prove Theorem 14.
10. Prove Theorem 8.
11. Prove Theorems 19 and 20.
12. Let X_1, X_2, ..., X_n be independent RVs with X_i ∼ C(μ_i, λ_i), i = 1, 2, ..., n. Show that the RV X = 1/(Σ_{i=1}^{n} X_i^{−1}) is also a Cauchy RV with parameters μ/(λ² + μ²) and λ/(λ² + μ²), where
λ = Σ_{i=1}^{n} λ_i/(λ_i² + μ_i²)   and   μ = Σ_{i=1}^{n} μ_i/(λ_i² + μ_i²).
13. Let X_1, X_2, ..., X_n be iid C(1, 0) RVs and a_i ≠ 0, b_i, i = 1, 2, ..., n, be any real numbers. Find the distribution of Σ_{i=1}^{n} 1/(a_iX_i + b_i).
14. Suppose that the load of an airplane wing is a random variable X with N(1000, 14400) distribution. The maximum load that the wing can withstand is an RV Y, which is N(1260, 2500). If X and Y are independent, find the probability that the load encountered by the wing is less than its critical load.
15. Let X ∼ N(0, 1). Find the PDF of Z = 1/X². If X and Y are iid N(0, 1), deduce that U = XY/√(X² + Y²) is N(0, 1/4).
16. In Problem 15 let X and Y be independent normal RVs with zero means. Show that U = XY/√(X² + Y²) is normal. If, in addition, var(X) = var(Y), show that V = (X² − Y²)/√(X² + Y²) is also normal. Moreover, U and V are independent. (Shepp [104])
17. Let X_1, X_2, X_3, X_4 be independent N(0, 1). Show that Y = X_1X_2 + X_3X_4 has the PDF
f(y) = (1/2) e^{−|y|},  −∞ < y < ∞.
18. Let X ∼ N(15, 16). Find (a) P{X ≤ 12}, (b) P{10 ≤ X ≤ 17}, (c) P{10 ≤ X ≤ 19 | X ≤ 17}, and (d) P{|X − 15| ≥ 0.5}.
19. Let X ∼ N(−1, 9). Find x such that P{X > x} = 0.38. Also find x such that P{|X + 1| < x} = 0.4.

20. Let X be an RV such that log(X − a) is N(μ, σ²). Show that X has PDF

f(x) = [1/(σ(x − a)√(2π))] exp{ −[log(x − a) − μ]²/(2σ²) }  if x > a,  and  = 0 if x ≤ a.

If m_1, m_2 are the first two moments of this distribution and α_3 = μ_3/μ_2^{3/2} is the coefficient of skewness, show that a, μ, σ are given by

a = m_1 − √(m_2 − m_1²)/η,   σ² = log(1 + η²),

and

μ = log(m_1 − a) − (1/2)σ²,

where η is the real root of the equation η³ + 3η − α_3 = 0.
21. Let X ∼ G(α, β) and let Y ∼ U(0, X).
(a) Find the PDF of Y.
(b) Find the conditional PDF of X given Y = y.
(c) Find P(X + Y ≤ 2).
22. Let X and Y be iid N(0, 1) RVs. Find the PDF of X/|Y|. Also, find the PDF of |X|/|Y|.
23. It is known that X ∼ B(α, β) and P(X < 0.2) = 0.22. If α + β = 26, find α and β. [Hint: Use Table ST1.]
24. Let X_1, X_2, ..., X_n be iid N(μ, σ²) RVs. Find the distribution of

Y_n = (Σ_{k=1}^{n} kX_k − μ Σ_{k=1}^{n} k) / (Σ_{k=1}^{n} k²)^{1/2}.
25. Let F_1, F_2, ..., F_n be n DFs. Show that min[F_1(x_1), F_2(x_2), ..., F_n(x_n)] is an n-dimensional DF with marginal DFs F_1, F_2, ..., F_n. (Kemp [50])
26. Let X ∼ NB(1; p) and Y ∼ G(1, 1/λ). Show that X and Y are related by the equation

P{X ≤ x} = P{Y ≤ [x]}  for x > 0,  λ = log(1/(1 − p)),

where [x] is the largest integer ≤ x. Equivalently, show that

P{Y ∈ (n, n + 1]} = P_θ{X = n},

where θ = 1 − e^{−λ}. (Prochaska [82])
27. Let T be an RV with DF F and write S(t) = 1 − F(t) = P(T > t). The function S is called the survival (or reliability) function of T (or of the DF F). The function λ(t) = f(t)/S(t) is called the hazard (or failure-rate) function. For the following PDFs find the hazard function:
(a) Rayleigh: f(t) = (t/α²) exp{−t²/(2α²)},  t > 0.
(b) Lognormal: f(t) = [1/(tσ√(2π))] exp{−(log t − μ)²/(2σ²)}.
(c) Pareto: f(t) = αθ^α/t^{α+1},  t > θ,  and = 0 otherwise.
(d) Weibull: f(t) = (α/β) t^{α−1} exp(−t^α/β),  t > 0.
(e) Logistic: f(t) = (1/β) exp{−(t − μ)/β}[1 + exp{−(t − μ)/β}]^{−2},  −∞ < t < ∞.
28. Consider the PDF

f(x) = (λ/(2πx³))^{1/2} exp{ −λ(x − μ)²/(2μ²x) },  x > 0,

and = 0 otherwise. An RV X with PDF f is said to have an inverse Gaussian distribution with parameters μ and λ, both positive. Show that

EX = μ,   var(X) = μ³/λ,   and
M(t) = E exp(tX) = exp{ (λ/μ)[1 − (1 − 2tμ²/λ)^{1/2}] }.

29. Let f be the PDF of an N(μ, σ²) RV:
(a) For what value of c is the function cf^n, n > 0, a PDF?
(b) Let Φ be the DF of Z ∼ N(0, 1). Find E{ZΦ(Z)} and E{Z²Φ(Z)}.
5.4 BIVARIATE AND MULTIVARIATE NORMAL DISTRIBUTIONS
In this section we introduce the bivariate and multivariate normal distributions and investigate some of their important properties. We note that bivariate analogs of other PDFs are known, but they are not always uniquely identified. For example, there are several versions of bivariate exponential PDFs, so called because each has exponential marginals. We will not encounter any of these bivariate PDFs in this book.

Definition 1. A two-dimensional RV (X, Y) is said to have a bivariate normal distribution if the joint PDF is of the form

f(x, y) = [1/(2πσ_1σ_2√(1 − ρ²))] e^{−Q(x,y)/2},  −∞ < x < ∞, −∞ < y < ∞,   (1)

where σ_1 > 0, σ_2 > 0, |ρ| < 1, and Q is the positive definite quadratic form

Q(x, y) = [1/(1 − ρ²)] { ((x − μ_1)/σ_1)² − 2ρ((x − μ_1)/σ_1)((y − μ_2)/σ_2) + ((y − μ_2)/σ_2)² }.   (2)

Figure 1 gives graphs of the bivariate normal PDF for selected values of ρ.

Fig. 1 Bivariate normal PDF with μ_1 = μ_2 = 0, σ_1 = σ_2 = 1, and ρ = −0.9, −0.5, 0.5, 0.9.
We first show that (1) indeed defines a joint PDF. In fact, we prove the following result.

Theorem 1. The function defined by (1) and (2) with σ_1 > 0, σ_2 > 0, |ρ| < 1 is a joint PDF. The marginal PDFs of X and Y are, respectively, N(μ_1, σ_1²) and N(μ_2, σ_2²), and ρ is the correlation coefficient between X and Y.

Proof. Let f_1(x) = ∫_{−∞}^{∞} f(x, y) dy. Note that

(1 − ρ²)Q(x, y) = [ (y − μ_2)/σ_2 − ρ(x − μ_1)/σ_1 ]² + (1 − ρ²)[ (x − μ_1)/σ_1 ]²
               = [ (y − {μ_2 + ρ(σ_2/σ_1)(x − μ_1)})/σ_2 ]² + (1 − ρ²)[ (x − μ_1)/σ_1 ]².

It follows that

f_1(x) = [1/(σ_1√(2π))] exp{ −(x − μ_1)²/(2σ_1²) } ∫_{−∞}^{∞} exp{ −(y − β_x)²/[2σ_2²(1 − ρ²)] } / [σ_2√(1 − ρ²)√(2π)] dy,   (3)

where we have written

β_x = μ_2 + ρ(σ_2/σ_1)(x − μ_1).   (4)

The integrand is the PDF of an N(β_x, σ_2²(1 − ρ²)) RV, so that

f_1(x) = [1/(σ_1√(2π))] exp{ −(1/2)[(x − μ_1)/σ_1]² },  −∞ < x < ∞.

Thus

∫_{−∞}^{∞} [ ∫_{−∞}^{∞} f(x, y) dy ] dx = ∫_{−∞}^{∞} f_1(x) dx = 1,

and f(x, y) is a joint PDF of two RVs of the continuous type. It also follows that f_1 is the marginal PDF of X, so that X is N(μ_1, σ_1²). In a similar manner we can show that Y is N(μ_2, σ_2²).

Furthermore, we have

f(x, y)/f_1(x) = [1/(σ_2√(1 − ρ²)√(2π))] exp{ −(y − β_x)²/[2σ_2²(1 − ρ²)] },   (5)

where β_x is given by (4). It is clear, then, that the conditional PDF f_{Y|X}(y|x) given by (5) is also normal, with parameters β_x and σ_2²(1 − ρ²). We have

E{Y | x} = β_x = μ_2 + ρ(σ_2/σ_1)(x − μ_1)   (6)

and

var{Y | x} = σ_2²(1 − ρ²).   (7)
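Equations (6) and (7) give a direct recipe for simulating a bivariate normal pair: draw X ∼ N(μ_1, σ_1²), then draw Y from its conditional N(β_x, σ_2²(1 − ρ²)). The Python sketch below (our illustration) does this and checks the empirical correlation against ρ.

```python
import random

def bivariate_normal(mu1, mu2, s1, s2, rho, n, seed=4):
    """Sample (X, Y) pairs using the marginal of X and the conditional of Y given X."""
    rng = random.Random(seed)
    pairs = []
    for _ in range(n):
        x = rng.gauss(mu1, s1)
        beta_x = mu2 + rho * (s2 / s1) * (x - mu1)        # E{Y | x}, eq. (6)
        y = rng.gauss(beta_x, s2 * (1 - rho**2) ** 0.5)   # sd from var{Y | x}, eq. (7)
        pairs.append((x, y))
    return pairs

pairs = bivariate_normal(0.0, 0.0, 1.0, 2.0, 0.7, 100_000)
n = len(pairs)
mx = sum(x for x, _ in pairs) / n
my = sum(y for _, y in pairs) / n
cov = sum((x - mx) * (y - my) for x, y in pairs) / n
sx = (sum((x - mx) ** 2 for x, _ in pairs) / n) ** 0.5
sy = (sum((y - my) ** 2 for _, y in pairs) / n) ** 0.5
print(cov / (sx * sy))   # close to rho = 0.7
```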

In order to show that ρ is the correlation coefficient between X and Y, it suffices to show that cov(X, Y) = ρσ_1σ_2. We have from (6)

E(XY) = E{E{XY | X}} = E{ X[μ_2 + ρ(σ_2/σ_1)(X − μ_1)] } = μ_1μ_2 + ρ(σ_2/σ_1)σ_1².

It follows that

cov(X, Y) = E(XY) − μ_1μ_2 = ρσ_1σ_2.
Remark 1. If ρ² = 1, then (1) becomes meaningless. But in that case we know (Theorem 4.5.1) that there exist constants a and b such that P{Y = aX + b} = 1. We thus have a univariate distribution, which is called the bivariate degenerate (or singular) normal distribution. The bivariate degenerate normal distribution does not have a PDF but corresponds to an RV (X, Y) whose marginal distributions are normal or degenerate and are such that (X, Y) falls on a fixed line with probability 1. It is for this reason that degenerate distributions are considered as normal distributions with variance 0.
Next we compute the MGF M(t_1, t_2) of a bivariate normal RV (X, Y). We have, if f(x, y) is the PDF given in (1) and f_1 is the marginal PDF of X,

M(t_1, t_2) = ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{t_1x + t_2y} f(x, y) dx dy
            = ∫_{−∞}^{∞} [ ∫_{−∞}^{∞} f_{Y|X}(y|x) e^{t_2y} dy ] e^{t_1x} f_1(x) dx
            = ∫_{−∞}^{∞} e^{t_1x} f_1(x) exp{ (1/2)σ_2²t_2²(1 − ρ²) + t_2[μ_2 + ρ(σ_2/σ_1)(x − μ_1)] } dx
            = exp{ (1/2)σ_2²t_2²(1 − ρ²) + t_2μ_2 − ρt_2(σ_2/σ_1)μ_1 } ∫_{−∞}^{∞} e^{t_1x} e^{(ρσ_2/σ_1)xt_2} f_1(x) dx.

Now

∫_{−∞}^{∞} e^{(t_1 + ρt_2σ_2/σ_1)x} f_1(x) dx = exp{ μ_1(t_1 + ρ(σ_2/σ_1)t_2) + (1/2)σ_1²(t_1 + ρt_2σ_2/σ_1)² }.

Therefore,

M(t_1, t_2) = exp{ μ_1t_1 + μ_2t_2 + (σ_1²t_1² + σ_2²t_2² + 2ρσ_1σ_2t_1t_2)/2 }.   (8)

The following result is an immediate consequence of (8).

Theorem 2. If (X, Y) has a bivariate normal distribution, X and Y are independent if and only if ρ = 0.

Remark 2. It is quite possible for an RV (X, Y) to have a bivariate density such that the marginal densities of X and Y are normal and the correlation coefficient is 0, yet X and Y are not independent. Indeed, if the marginal densities of X and Y are normal, it does not follow that the joint density of (X, Y) is bivariate normal. Let

f(x, y) = (1/2) { [1/(2π(1 − ρ²)^{1/2})] exp[ −(x² − 2ρxy + y²)/(2(1 − ρ²)) ]
        + [1/(2π(1 − ρ²)^{1/2})] exp[ −(x² + 2ρxy + y²)/(2(1 − ρ²)) ] }.   (9)

Here f(x, y) is a joint PDF such that both marginal densities are normal, f(x, y) is not bivariate normal, and X and Y have zero correlation. But X and Y are not independent. We have

f_1(x) = (1/√(2π)) e^{−x²/2},  −∞ < x < ∞,
f_2(y) = (1/√(2π)) e^{−y²/2},  −∞ < y < ∞,
EXY = 0.
Example 1 (Rosenberg [93]). Let f and g be PDFs with corresponding DFs F and G. Also, let

h(x, y) = f(x)g(y)[1 + α(2F(x) − 1)(2G(y) − 1)],   (10)

where |α| ≤ 1 is a constant. It was shown in Example 4.3.1 that h is a bivariate density function with given marginal densities f and g.

In particular, take f and g to be the PDF of N(0, 1), that is,

f(x) = g(x) = (1/√(2π)) e^{−x²/2},  −∞ < x < ∞,   (11)

and let (X, Y) have the joint PDF h(x, y). We will show that X + Y is not normal except in the trivial case α = 0, when X and Y are independent.

Let Z = X + Y. Then

EZ = 0,   var(Z) = var(X) + var(Y) + 2 cov(X, Y).

It is easy to show (Problem 2) that cov(X, Y) = α/π, so that var(Z) = 2[1 + (α/π)]. If Z is normal, its MGF must be

M_Z(t) = e^{t²[1 + (α/π)]}.   (12)

Next we compute the MGF of Z directly from the joint PDF (10). We have

M_1(t) = E{e^{tX+tY}}
       = e^{t²} + α ∫_{−∞}^{∞} ∫_{−∞}^{∞} e^{tx+ty}[2F(x) − 1][2F(y) − 1] f(x)f(y) dx dy
       = e^{t²} + α [ ∫_{−∞}^{∞} e^{tx}[2F(x) − 1] f(x) dx ]².

Now

∫_{−∞}^{∞} e^{tx}[2F(x) − 1]f(x) dx
  = −2 ∫_{−∞}^{∞} e^{tx}[1 − F(x)]f(x) dx + e^{t²/2}
  = e^{t²/2} − 2 ∫_{−∞}^{∞} ∫_{x}^{∞} [1/(2π)] exp{ −(x² + u² − 2tx)/2 } du dx
  = e^{t²/2} − 2 ∫_{−∞}^{∞} ∫_{0}^{∞} [1/(2π)] exp{ −[x² + (v + x)² − 2tx]/2 } dv dx
  = e^{t²/2} − 2 ∫_{0}^{∞} [exp{−v²/2 + (v − t)²/4}/(2√π)] { ∫_{−∞}^{∞} exp{−[x + (v − t)/2]²}/√π dx } dv
  = e^{t²/2} − 2 e^{t²/2} P{ Z_1 > t/√2 },   (13)

where Z_1 is an N(0, 1) RV.

It follows that

M_1(t) = e^{t²} + α [ e^{t²/2} − 2e^{t²/2} P{Z_1 > t/√2} ]²
       = e^{t²} { 1 + α [ 1 − 2P{Z_1 > t/√2} ]² }.   (14)

If Z were normally distributed, we must have M_Z(t) = M_1(t) for all t and all |α| ≤ 1, that is,

e^{t²} e^{(α/π)t²} = e^{t²} { 1 + α [ 1 − 2P{Z_1 > t/√2} ]² }.   (15)

For α = 0 the equality clearly holds. The expression within the braces on the right side of (15) is bounded by 1 + α, whereas the expression e^{(α/π)t²} is unbounded, so the equality cannot hold for all t and α.
Next we investigate the multivariate normal distribution of dimension n, n ≥ 2. Let M be an n × n real, symmetric, and positive definite matrix. Let x denote the n × 1 column vector of real numbers (x_1, x_2, ..., x_n)′ and let μ denote the column vector (μ_1, μ_2, ..., μ_n)′, where the μ_i (i = 1, 2, ..., n) are real constants.

Theorem 3. The nonnegative function

f(x) = c exp{ −(x − μ)′M(x − μ)/2 },  −∞ < x_i < ∞, i = 1, 2, ..., n,   (16)

defines the joint PDF of some random vector X = (X_1, X_2, ..., X_n)′, provided that the constant c is chosen appropriately. The MGF of X exists and is given by

M(t_1, t_2, ..., t_n) = exp{ t′μ + t′M^{−1}t/2 },   (17)

where t = (t_1, t_2, ..., t_n)′ and t_1, t_2, ..., t_n are arbitrary real numbers.

Proof. Let

I = c ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} exp{ t′x − (x − μ)′M(x − μ)/2 } ∏_{i=1}^{n} dx_i.   (18)

Changing the variables of integration to y_1, y_2, ..., y_n by writing x_i − μ_i = y_i, i = 1, 2, ..., n, and y = (y_1, y_2, ..., y_n)′, we have x − μ = y and

I = c exp(t′μ) ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} exp{ t′y − y′My/2 } ∏_{i=1}^{n} dy_i.   (19)

Since M is positive definite, it follows that all the n characteristic roots of M, say m_1, m_2, ..., m_n, are positive. Moreover, since M is symmetric there exists an n × n orthogonal matrix L such that L′ML is a diagonal matrix with diagonal elements m_1, m_2, ..., m_n. Let us change the variables to z_1, z_2, ..., z_n by writing y = Lz, where z′ = (z_1, z_2, ..., z_n), and note that the Jacobian of this orthogonal transformation is |L| = ±1, so its absolute value is 1. Since L′L = I_n, where I_n is an n × n unit matrix, we have

I = c exp(t′μ) ∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} exp{ t′Lz − z′L′MLz/2 } ∏_{i=1}^{n} dz_i.   (20)

If we write t′L = u′ = (u_1, u_2, ..., u_n), then t′Lz = Σ_{i=1}^{n} u_iz_i. Also L′ML = diag(m_1, m_2, ..., m_n), so that z′L′MLz = Σ_{i=1}^{n} m_iz_i². The integral in (20) can therefore be written as

∏_{i=1}^{n} ∫_{−∞}^{∞} exp{ u_iz_i − m_iz_i²/2 } dz_i = ∏_{i=1}^{n} √(2π/m_i) exp{ u_i²/(2m_i) }.

It follows that

I = c exp(t′μ) [(2π)^{n/2}/(m_1m_2···m_n)^{1/2}] exp{ Σ_{i=1}^{n} u_i²/(2m_i) }.   (21)

Setting t_1 = t_2 = ··· = t_n = 0, we see from (18) and (21) that

∫_{−∞}^{∞} ··· ∫_{−∞}^{∞} f(x_1, x_2, ..., x_n) dx_1 dx_2 ··· dx_n = c(2π)^{n/2}/(m_1m_2···m_n)^{1/2}.

By choosing

c = (m_1m_2···m_n)^{1/2}/(2π)^{n/2}   (22)

we see that f is a joint PDF of some random vector X, as asserted.

Finally, since

(L′ML)^{−1} = diag(m_1^{−1}, m_2^{−1}, ..., m_n^{−1}),

we have

Σ_{i=1}^{n} u_i²/m_i = u′(L′M^{−1}L)u = t′M^{−1}t.

Also

|M^{−1}| = |L′M^{−1}L| = (m_1m_2···m_n)^{−1}.

It follows from (21) and (22) that the MGF of X is given by (17), and we may write

c = 1/{(2π)^{n} |M^{−1}|}^{1/2}.   (23)

This completes the proof of Theorem 3.
Let us write M^{−1} = (σ_{ij}), i, j = 1, 2, ..., n. Then

M(0, 0, ..., 0, t_i, 0, ..., 0) = exp{ t_iμ_i + σ_{ii}t_i²/2 }

is the MGF of X_i, i = 1, 2, ..., n. Thus each X_i is N(μ_i, σ_{ii}), i = 1, 2, ..., n. For i ≠ j, we have for the MGF of X_i and X_j

M(0, 0, ..., 0, t_i, 0, ..., 0, t_j, 0, ..., 0) = exp{ t_iμ_i + t_jμ_j + (σ_{ii}t_i² + 2σ_{ij}t_it_j + σ_{jj}t_j²)/2 }.

This is the MGF of a bivariate normal distribution with means μ_i, μ_j, variances σ_{ii}, σ_{jj}, and covariance σ_{ij}. Thus we see that

μ′ = (μ_1, μ_2, ..., μ_n)   (24)

is the mean vector of X′ = (X_1, ..., X_n),

σ_{ii} = σ_i² = var(X_i),  i = 1, 2, ..., n,   (25)

and

σ_{ij} = ρ_{ij}σ_iσ_j,  i ≠ j; i, j = 1, 2, ..., n.   (26)

The matrix M^{−1} is called the dispersion (variance–covariance) matrix of the multivariate normal distribution.

If σ_{ij} = 0 for i ≠ j, the matrix M^{−1} is a diagonal matrix, and it follows that the RVs X_1, X_2, ..., X_n are independent. Thus we have the following analog of Theorem 2.

Theorem 4. The components X_1, X_2, ..., X_n of a jointly normally distributed RV X are independent if and only if the covariances σ_{ij} = 0 for all i ≠ j (i, j = 1, 2, ..., n).

The following result is stated without proof. The proof is similar to the two-variate case except that now we consider the quadratic form in n variables: E{Σ_{i=1}^{n} t_i(X_i − μ_i)}² ≥ 0.

Theorem 5. The probability that the RVs X_1, X_2, ..., X_n with finite variances satisfy at least one linear relationship is 1 if and only if |M| = 0.

Accordingly, if |M| = 0, all the probability mass is concentrated on a hyperplane of dimension < n.
Theorem 6. Let (X_1, X_2, ..., X_n) be an n-dimensional RV with a normal distribution. Let Y_1, Y_2, ..., Y_k, k ≤ n, be linear functions of the X_j (j = 1, 2, ..., n). Then (Y_1, Y_2, ..., Y_k) also has a multivariate normal distribution.

Proof. Without loss of generality let us assume that EX_i = 0, i = 1, 2, ..., n. Let

Y_p = Σ_{j=1}^{n} A_{pj}X_j,  p = 1, 2, ..., k; k ≤ n.   (27)

Then EY_p = 0, p = 1, 2, ..., k, and

cov(Y_p, Y_q) = Σ_{i,j=1}^{n} A_{pi}A_{qj}σ_{ij},   (28)

where E(X_iX_j) = σ_{ij}, i, j = 1, 2, ..., n.

The MGF of (Y_1, Y_2, ..., Y_k) is given by

M*(t_1, t_2, ..., t_k) = E exp{ t_1 Σ_{j=1}^{n} A_{1j}X_j + ··· + t_k Σ_{j=1}^{n} A_{kj}X_j }.

Writing u_j = Σ_{p=1}^{k} t_pA_{pj}, j = 1, 2, ..., n, we have

M*(t_1, t_2, ..., t_k) = E exp{ Σ_{i=1}^{n} u_iX_i }
  = exp{ (1/2) Σ_{i,j=1}^{n} σ_{ij}u_iu_j }   [by (17)]
  = exp{ (1/2) Σ_{i,j=1}^{n} σ_{ij} Σ_{l,m=1}^{k} t_lt_mA_{li}A_{mj} }
  = exp{ (1/2) Σ_{l,m=1}^{k} t_lt_m Σ_{i,j=1}^{n} A_{li}A_{mj}σ_{ij} }
  = exp{ (1/2) Σ_{l,m=1}^{k} t_lt_m cov(Y_l, Y_m) }.   (29)

When (17) and (29) are compared, the result follows.
Corollary 1. Every marginal distribution of an n-dimensional normal distribution is univariate normal. Moreover, any linear function of X_1, X_2, ..., X_n is univariate normal.

Corollary 2. If X_1, X_2, ..., X_n are iid N(μ, σ²) and A is an n × n orthogonal transformation matrix, the components Y_1, Y_2, ..., Y_n of Y = AX, where X = (X_1, ..., X_n)′, are independent RVs, each normally distributed with the same variance σ².

We have from (27) and (28)

cov(Y_p, Y_q) = Σ_{i=1}^{n} A_{pi}A_{qi}σ_{ii} + Σ_{i≠j} A_{pi}A_{qj}σ_{ij}
             = 0 if p ≠ q,  and  = σ² if p = q,

since Σ_{i=1}^{n} A_{pi}A_{qi} = 0 for p ≠ q and Σ_{j=1}^{n} A_{pj}² = 1. It follows that

M*(t_1, t_2, ..., t_n) = exp{ (1/2) Σ_{l=1}^{n} t_l²σ² }

and Corollary 2 follows.

Theorem 7. Let X = (X_1, X_2, ..., X_n)′. Then X has an n-dimensional normal distribution if and only if every linear function of X,

X′t = t_1X_1 + t_2X_2 + ··· + t_nX_n,

has a univariate normal distribution.

Proof. Suppose that X′t is normal for any t. Then the MGF of X′t is given by

M(s) = exp{ bs + σ²s²/2 }.   (30)

Here b = E{X′t} = Σ_{i=1}^{n} t_iμ_i = t′μ, where μ′ = (μ_1, ..., μ_n), and σ² = var(X′t) = var(Σ t_iX_i) = t′M^{−1}t, where M^{−1} is the dispersion matrix of X. Thus

M(s) = exp{ t′μ s + (1/2) t′M^{−1}t s² }.   (31)

Let s = 1; then

M(1) = exp{ t′μ + (1/2) t′M^{−1}t },   (32)

and since the MGF is unique, it follows that X has a multivariate normal distribution. The converse follows from Corollary 1 to Theorem 6.

Many characterization results for the multivariate normal distribution are now available. We refer the reader to Lukacs and Laha [70, p. 79].
PROBLEMS 5.4
1. Let (X, Y) have joint PDF

f(x, y) = [1/(6π√7)] exp{ −(8/7)[ x²/16 − (31/32)x + xy/8 + y²/9 − (4/3)y + 71/16 ] },

for −∞ < x < ∞, −∞ < y < ∞.
(a) Find the means and variances of X and Y. Also find ρ.
(b) Find the conditional PDF of Y given X = x and E{Y|x}, var{Y|x}.
(c) Find P{4 ≤ Y ≤ 6 | X = 4}.
2. In Example 1 show that cov(X, Y) = α/π.
3. Let (X, Y) be a bivariate normal RV with parameters μ_1, μ_2, σ_1², σ_2², and ρ. What is the distribution of X + Y? Compare your result with that of Example 1.
4. Let (X, Y) be a bivariate normal RV with parameters μ_1, μ_2, σ_1², σ_2², and ρ, and let U = aX + b, a ≠ 0, and V = cY + d, c ≠ 0. Find the joint distribution of (U, V).
5. Let (X, Y) be a bivariate normal RV with parameters μ_1 = 5, μ_2 = 8, σ_1² = 16, σ_2² = 9, and ρ = 0.6. Find P{5 < Y < 11 | X = 2}.
6. Let X and Y be jointly normal with means 0. Also, let
W = X cos θ + Y sin θ,   Z = X cos θ − Y sin θ.
Find θ such that W and Z are independent.
7. Let (X, Y) be a normal RV with parameters μ_1, μ_2, σ_1², σ_2², and ρ. Find a necessary and sufficient condition for X + Y and X − Y to be independent.
8. For a bivariate normal RV with parameters μ_1, μ_2, σ_1, σ_2, and ρ show that
P(X > μ_1, Y > μ_2) = 1/4 + [1/(2π)] tan^{−1}[ ρ/√(1 − ρ²) ].
[Hint: The required probability is P{(X − μ_1)/σ_1 > 0, (Y − μ_2)/σ_2 > 0}. Change to polar coordinates and integrate.]
9. Show that every variance–covariance matrix is symmetric positive semidefinite and conversely. If the variance–covariance matrix is not positive definite, then with probability 1 the random (column) vector X lies in some hyperplane c′X = a with c ≠ 0.
10. Let (X, Y) be a bivariate normal RV with EX = EY = 0, var(X) = var(Y) = 1, and cov(X, Y) = ρ. Show that the RV Z = Y/X has a Cauchy distribution.
11. (a) Show that

f(x) = [1/(2π)^{n/2}] exp{ −Σ x_i²/2 } [ 1 + ∏_{i=1}^{n} ( x_i e^{−x_i²/2} ) ]

is a joint PDF on R_n.
(b) Let (X_1, X_2, ..., X_n) have the PDF f given in (a). Show that the RVs in any proper subset of {X_1, X_2, ..., X_n} containing two or more elements are independent standard normal RVs.
5.5 EXPONENTIAL FAMILY OF DISTRIBUTIONS
Most of the distributions that we have so far encountered belong to a general family of distributions that we now study. Let Θ be an interval on the real line, and let {f_θ : θ ∈ Θ} be a family of PDFs (PMFs). Here and in what follows we write x = (x_1, x_2, ..., x_n) unless otherwise specified.

Definition 1. If there exist real-valued functions Q(θ) and D(θ) on Θ and Borel-measurable functions T(x_1, x_2, ..., x_n) and S(x_1, x_2, ..., x_n) on R_n such that

f_θ(x_1, x_2, ..., x_n) = exp{Q(θ)T(x) + D(θ) + S(x)},   (1)

we say that the family {f_θ, θ ∈ Θ} is a one-parameter exponential family.

Let X_1, X_2, ..., X_m be iid with PMF (PDF) f_θ. Then the joint distribution of X = (X_1, X_2, ..., X_m) is given by

g_θ(x) = ∏_{i=1}^{m} f_θ(x_i) = ∏_{i=1}^{m} exp{Q(θ)T(x_i) + D(θ) + S(x_i)}
       = exp{ Q(θ) Σ_{i=1}^{m} T(x_i) + mD(θ) + Σ_{i=1}^{m} S(x_i) },

where x = (x_1, x_2, ..., x_m), x_j = (x_{j1}, x_{j2}, ..., x_{jn}), j = 1, 2, ..., m, and it follows that {g_θ : θ ∈ Θ} is again a one-parameter exponential family.
Example 1. Let X ∼ N(μ_0, σ²), where μ_0 is known and σ² unknown. Then

f_{σ²}(x) = [1/(σ√(2π))] exp{ −(x − μ_0)²/(2σ²) } = exp{ −log(σ√(2π)) − (x − μ_0)²/(2σ²) }

is a one-parameter exponential family with

Q(σ²) = −1/(2σ²),   T(x) = (x − μ_0)²,   S(x) = 0,   and   D(σ²) = −log(σ√(2π)).

If X ∼ N(μ, σ_0²), where σ_0 is known but μ is unknown, then

f_μ(x) = [1/(σ_0√(2π))] exp{ −(x − μ)²/(2σ_0²) } = [1/(σ_0√(2π))] exp{ −x²/(2σ_0²) + μx/σ_0² − μ²/(2σ_0²) }

is a one-parameter exponential family with

Q(μ) = μ/σ_0²,   D(μ) = −μ²/(2σ_0²),   T(x) = x,

and

S(x) = −[ x²/(2σ_0²) + (1/2) log(2πσ_0²) ].

Example 2. Let X ∼ P(λ), λ > 0 unknown. Then

P_λ{X = x} = e^{−λ} λ^x/x! = exp{−λ + x log λ − log(x!)},

and we see that the family of Poisson PMFs with parameter λ is a one-parameter exponential family.

Some other important examples of one-parameter exponential families are the binomial, G(α, β) (provided that one of α, β is fixed), B(α, β) (provided that one of α, β is fixed), negative binomial, and geometric families. The Cauchy family of densities and the uniform distribution on [0, θ] do not belong to this class.

Theorem 1. Let {f_θ : θ ∈ Θ} be a one-parameter exponential family of PDFs (PMFs) given in (1). Then the family of distributions of T(X) is also a one-parameter exponential family of PDFs (PMFs), given by

g_θ(t) = exp{tQ(θ) + D(θ) + S*(t)}

for suitable S*(t).

Proof. The proof of Theorem 1 is a simple application of the transformation of variables technique studied in Section 4.4 and is left as an exercise, at least for the cases considered in Section 4.4. For the general case we refer to Lehmann [64, p. 58].

Let us now consider the k-parameter exponential family, k ≥ 2. Let Θ ⊆ R_k be a k-dimensional interval.

Definition 2. If there exist real-valued functions Q_1, Q_2, ..., Q_k, D defined on Θ, and Borel-measurable functions T_1, T_2, ..., T_k, S on R_n such that

f_θ(x) = exp{ Σ_{i=1}^{k} Q_i(θ)T_i(x) + D(θ) + S(x) },   (2)

we say that the family {f_θ, θ ∈ Θ} is a k-parameter exponential family.

Once again, if X = (X_1, X_2, ..., X_m) and the X_j are iid with common distribution (2), the joint distributions of X form a k-parameter exponential family. An analog of Theorem 1 also holds for the k-parameter exponential family.
Example 3. The most important example of a k-parameter exponential family is N(μ, σ²) when both μ and σ² are unknown. We have

θ = (μ, σ²),   Θ = {(μ, σ²) : −∞ < μ < ∞, σ² > 0},

and

f_θ(x) = [1/(σ√(2π))] exp{ −(x² − 2μx + μ²)/(2σ²) }
       = exp{ −x²/(2σ²) + (μ/σ²)x − (1/2)[ μ²/σ² + log(2πσ²) ] }.

It follows that f_θ is a two-parameter exponential family with

Q_1(θ) = −1/(2σ²),   Q_2(θ) = μ/σ²,   T_1(x) = x²,   T_2(x) = x,
D(θ) = −(1/2)[ μ²/σ² + log(2πσ²) ],   and   S(x) = 0.

Other examples are the G(α, β) and B(α, β) distributions when both α, β are unknown, and the multinomial distribution. U[α, β] does not belong to this family, nor does C(α, β).

Some general properties of exponential families will be studied in Chapter 8, and the importance of these families will then become evident.

Remark 1. The form in (2) is not unique, as is easily seen by substituting αQ_i for Q_i and (1/α)T_i for T_i. This, however, is not going to be a problem in statistical considerations.

Remark 2. The integer k in Definition 2 is also not unique, since the family {1, Q_1, ..., Q_k} or {1, T_1, ..., T_k} may be linearly dependent. In general, k need not be the dimension of Θ.

Remark 3. The support {x : f_θ(x) > 0} does not depend on θ.

Remark 4. In (2) one can change parameters to η_i = Q_i(θ), i = 1, 2, ..., k, so that

f_η(x) = exp{ Σ_{i=1}^{k} η_iT_i(x) + D(η) + S(x) },   (3)

where the parameters η = (η_1, η_2, ..., η_k) are called natural parameters. Again the η_i may be linearly dependent, so one of the η_i may be eliminated.
PROBLEMS 5.5
1. Show that the following families of distributions are one-parameter exponential families:
(a) X ∼ b(n, p).
(b) X ∼ G(α, β), (i) if α is known and (ii) if β is known.
(c) X ∼ B(α, β), (i) if α is known and (ii) if β is known.
(d) X ∼ NB(r; p), where r is known, p unknown.
2. Let X ∼ C(1, θ). Show that the family of distributions of X is not a one-parameter exponential family.
3. Let X ∼ U[0, θ], θ ∈ [0, ∞). Show that the family of distributions of X is not an exponential family.
4. Is the family of PDFs
f_θ(x) = (1/2) e^{−|x−θ|},  −∞ < x < ∞, θ ∈ (−∞, ∞),
an exponential family?
5. Show that the following families of distributions are two-parameter exponential families:
(a) X ∼ G(α, β), both α and β unknown.
(b) X ∼ B(α, β), both α and β unknown.
6. Show that the families of distributions U[α, β] and C(α, β) do not belong to the exponential families.
7. Show that the multinomial distributions form an exponential family.

6
SAMPLE STATISTICS AND THEIR
DISTRIBUTIONS
6.1 INTRODUCTION
In the preceding chapters we discussed fundamental ideas and techniques of probability
theory. In this development we created a mathematical model of a random experiment by
associating with it a sample space in which random events correspond to sets of a certain
σ-field. The notion of probability defined on thisσ-field corresponds to the notion of
uncertainty in the outcome on any performance of the random experiment.
In this chapter we begin the study of some problems of mathematical statistics. The
methods of probability theory learned in preceding chapters will be used extensively in
this study.
Suppose that we seek information about some numerical characteristics of a collection of elements called a population. For reasons of time or cost we may not wish, or be able, to study each individual element of the population. Our object is to draw conclusions about the unknown population characteristics on the basis of information on some characteristics of a suitably selected sample. Formally, let X be a random variable which describes the population under investigation, and let F be the DF of X. There are two possibilities. Either X has a DF F_θ with a known functional form (except perhaps for the parameter θ, which may be a vector), or X has a DF F about which we know nothing (except perhaps that F is, say, absolutely continuous). In the former case let Θ be the set of possible values of the unknown parameter θ. Then the job of a statistician is to decide, on the basis of a suitably selected sample, which member or members of the family {F_θ, θ ∈ Θ} can represent the DF of X. Problems of this type are called problems of parametric statistical inference and will be the subject of investigation in Chapters 8 through 12. The case in which nothing is known about the functional form of the DF F of X is clearly much more difficult. Inference problems of this type fall into the domain of nonparametric statistics and will be discussed in Chapter 13.
To be sure, the scope of statistical methods is much wider than the statistical inference
problems discussed in this book. Statisticians, for example, deal with problems of plan-
ning and designing experiments, of collecting information, and of deciding how best the
collected information should be used. However, here we concern ourselves only with the
best methods of making inferences about probability distributions.
In Section 6.2 of this chapter we introduce the notions of (simple)random sampleand
sample statistics. In Section 6.3 we study sample moments and their exact distributions. In
Section 6.4 we consider some important distributions that arise in sampling from a normal
population. Sections 6.5 and 6.6 are devoted to the study of sampling from univariate and
bivariate normal distributions.
6.2 RANDOM SAMPLING
Consider a statistical experiment that culminates in outcomes x, which are the values assumed by an RV X. Let F be the DF of X. In practice, F will not be completely known, that is, one or more parameters associated with F will be unknown. The job of a statistician is to estimate these unknown parameters or to test the validity of certain statements about them. She can obtain n independent observations on X. This means that she observes n values x_1, x_2, ..., x_n assumed by the RV X. Each x_i can be regarded as the value assumed by an RV X_i, i = 1, 2, ..., n, where X_1, X_2, ..., X_n are independent RVs with common DF F. The observed values (x_1, x_2, ..., x_n) are then values assumed by (X_1, X_2, ..., X_n). The set {X_1, X_2, ..., X_n} is then a sample of size n taken from a population distribution F. The set of n values x_1, x_2, ..., x_n is called a realization of the sample. Note that the possible values of the RV (X_1, X_2, ..., X_n) can be regarded as points in R^n, which may be called the sample space. In practice one observes not x_1, x_2, ..., x_n but some function f(x_1, x_2, ..., x_n). Then f(x_1, x_2, ..., x_n) are values assumed by the RV f(X_1, X_2, ..., X_n).
Let us now formalize these concepts.
Definition 1. Let X be an RV with DF F, and let X_1, X_2, ..., X_n be iid RVs with common DF F. Then the collection X_1, X_2, ..., X_n is known as a random sample of size n from the DF F or simply as n independent observations on X.

If X_1, X_2, ..., X_n is a random sample from F, their joint DF is given by

F*(x_1, x_2, ..., x_n) = ∏_{i=1}^n F(x_i).   (1)
Definition 2. Let X_1, X_2, ..., X_n be n independent observations on an RV X, and let f: R^n → R^k be a Borel-measurable function. Then the RV f(X_1, X_2, ..., X_n) is called a (sample) statistic provided that it is not a function of any unknown parameter(s).
Two of the most commonly used statistics are defined as follows.

Definition 3. Let X_1, X_2, ..., X_n be a random sample from a distribution function F. Then the statistic

X̄ = n^{-1} S_n = Σ_{i=1}^n X_i / n   (2)

is called the sample mean, and the statistic

S² = Σ_{i=1}^n (X_i − X̄)² / (n − 1) = (Σ_{i=1}^n X_i² − n X̄²) / (n − 1)   (3)

is called the sample variance, and S is called the sample standard deviation.
Remark 1.Whenever the word “sample” is used subsequently, it will mean “random
sample.”
Remark 2. Sampling from a probability distribution (Definition 1) is sometimes referred to as sampling from an infinite population, since one can obtain samples of any size one desires even if the population is finite (by sampling with replacement).
Remark 3. In sampling without replacement from a finite population, the independence condition of Definition 1 is not satisfied. Suppose a sample of size 2 is taken from a finite population (a_1, a_2, ..., a_N) without replacement. Let X_i be the outcome on the ith draw. Then P{X_1 = a_1} = 1/N, P{X_2 = a_2 | X_1 = a_1} = 1/(N−1), and P{X_2 = a_2 | X_1 = a_2} = 0. Thus the PMF of X_2 depends on the outcome of the first draw (that is, on the value of X_1), and X_1 and X_2 are not independent. Note, however, that

P{X_2 = a_2} = Σ_{j=1}^N P{X_1 = a_j} P{X_2 = a_2 | X_1 = a_j} = Σ_{j ≠ 2} P{X_1 = a_j} P{X_2 = a_2 | X_1 = a_j} = 1/N,

and X_1 and X_2 have the same distribution. A similar argument can be used to show that X_1, X_2, ..., X_n all have the same distribution but they are not independent. In fact, X_1, X_2, ..., X_n are exchangeable RVs. Sampling without replacement from a finite population is often referred to as simple random sampling.
Remark 4. It should be remembered that sample statistics X̄, S² (and others that we will define later on) are random variables, while the population parameters μ, σ², and so on are fixed constants that may be unknown.

Remark 5. In (3) we divide by n − 1 rather than n. The reason for this will become clear in the next section.

Remark 6. Other frequently occurring examples of statistics are the sample order statistics X_(1), X_(2), ..., X_(n) and their functions, as well as sample moments, which will be studied in the next section.
Example 1. Let X ∼ b(1, p), where p is possibly unknown. The DF of X is given by

F(x) = p ε(x − 1) + (1 − p) ε(x),  x ∈ R.

Suppose that five independent observations on X are 0, 1, 1, 1, 0. Then 0, 1, 1, 1, 0 is a realization of the sample X_1, X_2, ..., X_5. The sample mean is

x̄ = (0 + 1 + 1 + 1 + 0)/5 = 0.6,

which is the value assumed by the RV X̄. The sample variance is

s² = Σ_{i=1}^5 (x_i − x̄)² / (5 − 1) = [2(0.6)² + 3(0.4)²]/4 = 0.3,

which is the value assumed by the RV S². Also s = √0.3 = 0.55.
Example 2. Let X ∼ N(μ, σ²), where μ is known but σ² is unknown. Let X_1, X_2, ..., X_n be a sample from N(μ, σ²). Then, according to our definition, Σ_{i=1}^n X_i/σ² is not a statistic.

Suppose that five observations on X are −0.864, 0.561, 2.355, 0.582, −0.774. Then the sample mean is 0.372, and the sample variance is 1.648.
PROBLEMS 6.2
1. Let X be a b(1, 1/2) RV, and consider all possible random samples of size 3 on X. Compute X̄ and S² for each of the eight samples, and also compute the PMFs of X̄ and S².
2. A fair die is rolled. Let X be the face value that turns up, and X_1, X_2 be two independent observations on X. Compute the PMF of X̄.
3. Let X_1, X_2, ..., X_n be a sample from some population. Show that

max_{1≤i≤n} |X_i − X̄| < (n−1)S/√n

unless either all the n observations are equal or exactly n−1 of the X_j's are equal. (Samuelson [99])
4. Let x_1, x_2, ..., x_n be real numbers, and let x_(n) = max{x_1, x_2, ..., x_n}, x_(1) = min{x_1, x_2, ..., x_n}. Show that for any set of real numbers a_1, a_2, ..., a_n such that Σ_{i=1}^n a_i = 0 the following inequality holds:

|Σ_{i=1}^n a_i x_i| ≤ (1/2)(x_(n) − x_(1)) Σ_{i=1}^n |a_i|.

5. For any set of real numbers x_1, x_2, ..., x_n show that the fraction of x_1, x_2, ..., x_n included in the interval (x̄ − ks, x̄ + ks) for k ≥ 1 is at least 1 − 1/k². Here x̄ is the mean and s the standard deviation of the x's.
6.3 SAMPLE CHARACTERISTICS AND THEIR DISTRIBUTIONS
Let X_1, X_2, ..., X_n be a sample from a population DF F. In this section we consider some commonly used sample characteristics and their distributions.
Definition 1. Let F*_n(x) = n^{-1} Σ_{j=1}^n ε(x − X_j). Then nF*_n(x) is the number of X_k's (1 ≤ k ≤ n) that are ≤ x. F*_n(x) is called the sample (or empirical) distribution function.

We note that 0 ≤ F*_n(x) ≤ 1 for all x, and, moreover, that F*_n is right continuous, nondecreasing, and F*_n(−∞) = 0, F*_n(∞) = 1. Thus F*_n is a DF.
If X_(1), X_(2), ..., X_(n) is the order statistic for X_1, X_2, ..., X_n, then clearly

F*_n(x) = 0 if x < X_(1);  = k/n if X_(k) ≤ x < X_(k+1) (k = 1, 2, ..., n−1);  = 1 if x ≥ X_(n).   (1)

For fixed but otherwise arbitrary x ∈ R, F*_n(x) itself is an RV of the discrete type. The following result is immediate.
Theorem 1. The RV F*_n(x) has the probability function

P{F*_n(x) = j/n} = (n choose j) [F(x)]^j [1 − F(x)]^{n−j},  j = 0, 1, ..., n,   (2)

with mean

E F*_n(x) = F(x)   (3)

and variance

var(F*_n(x)) = F(x)[1 − F(x)]/n.   (4)

Proof. Since ε(x − X_j), j = 1, 2, ..., n, are iid RVs, each with PMF

P{ε(x − X_j) = 1} = P{x − X_j ≥ 0} = F(x)  and  P{ε(x − X_j) = 0} = 1 − F(x),

their sum nF*_n(x) is a b(n, p) RV, where p = F(x). Relations (2), (3), and (4) follow immediately.
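Theorem 1 is easy to check by simulation. The following sketch (illustrative only; numpy is assumed to be available) computes F*_n(x) for repeated U(0, 1) samples and compares the empirical mean and variance with (3) and (4):

```python
import numpy as np

def empirical_df(sample, x):
    """F*_n(x) = (number of observations <= x) / n."""
    return np.mean(np.asarray(sample) <= x)

rng = np.random.default_rng(0)
n, x0 = 20, 0.3                       # for U(0,1) data, F(x0) = x0
reps = np.array([empirical_df(rng.uniform(0, 1, n), x0) for _ in range(5000)])
print(reps.mean())                    # close to F(x0) = 0.3, as in (3)
print(reps.var())                     # close to F(x0)(1 - F(x0))/n = 0.0105, as in (4)
```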

We next consider some typical values of the DF F*_n(x), called sample statistics. Since F*_n(x) has jump points X_j, j = 1, 2, ..., n, it is clear that all moments of F*_n(x) exist. Let us write

a_k = n^{-1} Σ_{j=1}^n X_j^k   (5)

for the moment of order k about 0. Here a_k will be called the sample moment of order k. In this notation

a_1 = n^{-1} Σ_{j=1}^n X_j = X̄.   (6)
The sample central moment is defined by

b_k = n^{-1} Σ_{j=1}^n (X_j − a_1)^k = n^{-1} Σ_{j=1}^n (X_j − X̄)^k.   (7)

Clearly,

b_1 = 0  and  b_2 = ((n−1)/n) S².

As mentioned earlier, we do not call b_2 the sample variance. S² will be referred to as the sample variance for reasons that will subsequently become clear. We have

b_2 = a_2 − a_1².   (8)
For the MGF of the DF F*_n(x), we have

M*(t) = n^{-1} Σ_{j=1}^n e^{tX_j}.   (9)
Similar definitions are made for sample moments of bivariate and multivariate distributions. For example, if (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) is a sample from a bivariate distribution, we write

X̄ = n^{-1} Σ_{j=1}^n X_j  and  Ȳ = n^{-1} Σ_{j=1}^n Y_j   (10)

for the two sample means, and for the second-order sample central moments we write

b_20 = n^{-1} Σ_{j=1}^n (X_j − X̄)²,  b_02 = n^{-1} Σ_{j=1}^n (Y_j − Ȳ)²,   (11)
b_11 = n^{-1} Σ_{j=1}^n (X_j − X̄)(Y_j − Ȳ).

Once again we write

S_1² = (n−1)^{-1} Σ_{j=1}^n (X_j − X̄)²  and  S_2² = (n−1)^{-1} Σ_{j=1}^n (Y_j − Ȳ)²   (12)

for the two sample variances, and for the sample covariance we use the quantity

S_11 = (n−1)^{-1} Σ_{j=1}^n (X_j − X̄)(Y_j − Ȳ).   (13)

In particular, the sample correlation coefficient is defined by

R = b_11/√(b_20 b_02) = S_11/(S_1 S_2).   (14)

It can be shown (Problem 4) that |R| ≤ 1; the extreme values ±1 can occur only when all sample points (X_1, Y_1), ..., (X_n, Y_n) lie on a straight line.
The sample quantiles are defined in a similar manner. Thus, if 0 < p < 1, the sample quantile of order p, denoted by Z_p, is the order statistic X_(r), where

r = np if np is an integer, and r = [np + 1] if np is not an integer.

As usual, [x] is the largest integer ≤ x. Note that, if np is an integer, we can take any value between X_(np) and X_(np+1) as the pth sample quantile. Thus, if p = 1/2 and n is even, we can take any value between X_(n/2) and X_((n/2)+1), the two middle values, as the median. It is customary to take the average. Thus the sample median is defined as

Z_{1/2} = X_((n+1)/2) if n is odd,  and  Z_{1/2} = [X_(n/2) + X_((n/2)+1)]/2 if n is even.   (15)

Note that [n/2 + 1] = [(n+1)/2] if n is odd.
Example 1. A random sample of 25 observations is taken from the interval (0, 1):
0.50 0.24 0.89 0.54 0.34 0.89 0.92 0.17 0.32 0.80
0.06 0.21 0.58 0.07 0.56 0.20 0.31 0.17 0.41 0.38
0.88 0.61 0.35 0.06 0.90

In order to compute F*_25, the first step is to order the observations from smallest to largest. The ordered sample is
The ordered sample is
0.06, 0.06, 0.07, 0.17, 0.17, 0.20, 0.21, 0.24, 0.31, 0.32, 0.34,
0.35, 0.38, 0.41, 0.50, 0.54, 0.56, 0.58, 0.61, 0.80, 0.88, 0.89,
0.89, 0.90, 0.92
Then the empirical DF is given by

F*_25(x) = 0 for x < 0.06;  2/25 for 0.06 ≤ x < 0.07;  3/25 for 0.07 ≤ x < 0.17;  5/25 for 0.17 ≤ x < 0.20;  ... ;  24/25 for 0.90 ≤ x < 0.92;  1 for x ≥ 0.92.
A plot of F*_25 is shown in Fig. 1. The sample mean and variance are

x̄ = 0.45,  s² = 0.084,  and  s = 0.29.

Also the sample median is the 13th observation in the ordered sample, namely, z_{1/2} = 0.38, and if p = 0.2 then np = 5 and z_{0.2} = 0.17.
Fig. 1. Empirical DF for the data of Example 1.
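The order-statistic rule for Z_p can be coded directly; the sketch below (an illustration added here, not part of the text) applies it to the 25 observations of Example 1:

```python
# Sample quantile of order p: X_(r) with r = np if np is an integer, else [np] + 1.
data = [0.50, 0.24, 0.89, 0.54, 0.34, 0.89, 0.92, 0.17, 0.32, 0.80,
        0.06, 0.21, 0.58, 0.07, 0.56, 0.20, 0.31, 0.17, 0.41, 0.38,
        0.88, 0.61, 0.35, 0.06, 0.90]

def sample_quantile(xs, p):
    xs = sorted(xs)
    n = len(xs)
    r = int(n * p) if n * p == int(n * p) else int(n * p) + 1
    return xs[r - 1]                       # X_(r), counting order statistics from 1

print(sample_quantile(data, 0.5))          # 0.38, the sample median (n = 25 is odd)
print(sample_quantile(data, 0.2))          # 0.17, since np = 5 is an integer
```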

Next we consider the moments of sample characteristics. In the following we write EX^k = m_k and E(X − μ)^k = μ_k for the kth-order population moments. Wherever we use m_k (or μ_k), it will be assumed to exist. Also, σ² represents the population variance.
Theorem 2. Let X_1, X_2, ..., X_n be a sample from a population with DF F. Then

E X̄ = μ,   (16)

var(X̄) = σ²/n,   (17)

E(X̄)³ = [m_3 + 3(n−1) m_2 μ + (n−1)(n−2) μ³] / n²,   (18)

and

E(X̄)⁴ = [m_4 + 4(n−1) m_3 μ + 6(n−1)(n−2) m_2 μ² + 3(n−1) m_2²] / n³ + (n−1)(n−2)(n−3) μ⁴ / n³.   (19)
Proof. In view of Theorems 4.5.3 and 4.5.7, it suffices to prove (18) and (19). We have

(Σ_{j=1}^n X_j)³ = Σ_{j=1}^n X_j³ + 3 Σ_{j≠k} X_j² X_k + Σ_{j≠k≠l} X_j X_k X_l,

and (18) follows. Similarly,

(Σ_{i=1}^n X_i)⁴ = (Σ_{i=1}^n X_i)(Σ_{j=1}^n X_j³ + 3 Σ_{j≠k} X_j² X_k + Σ_{j≠k≠l} X_j X_k X_l)
= Σ_{i=1}^n X_i⁴ + 4 Σ_{j≠k} X_j X_k³ + 3 Σ_{j≠k} X_j² X_k² + 6 Σ_{i≠j≠k} X_i² X_j X_k + Σ_{i≠j≠k≠l} X_i X_j X_k X_l,

and (19) follows.
Theorem 3. For the third and fourth central moments of X̄, we have

μ_3(X̄) = μ_3/n²   (20)

and

μ_4(X̄) = μ_4/n³ + 3(n−1)μ_2²/n³.   (21)

Proof. We have

μ_3(X̄) = E(X̄ − μ)³ = (1/n³) E[Σ_{i=1}^n (X_i − μ)]³ = (1/n³) Σ_{i=1}^n E(X_i − μ)³ = μ_3/n²,

and

μ_4(X̄) = E(X̄ − μ)⁴ = (1/n⁴) E[Σ_{i=1}^n (X_i − μ)]⁴
= (1/n⁴) Σ_{i=1}^n E(X_i − μ)⁴ + (4 choose 2)(1/n⁴) Σ_{i<j} E{(X_i − μ)²(X_j − μ)²}
= μ_4/n³ + 3(n−1)μ_2²/n³.
Theorem 4. For the moments of b_2, we have

E(b_2) = (n−1)σ²/n,   (22)

var(b_2) = (μ_4 − μ_2²)/n − 2(μ_4 − 2μ_2²)/n² + (μ_4 − 3μ_2²)/n³,   (23)

E(b_3) = (n−1)(n−2) μ_3 / n²,   (24)

and

E(b_4) = (n−1)(n² − 3n + 3) μ_4 / n³ + 3(n−1)(2n−3) μ_2² / n³.   (25)
Proof. We have

E b_2 = (1/n) E[Σ_{i=1}^n (X_i − μ + μ − X̄)²] = (1/n) E[Σ_{i=1}^n (X_i − μ)² − n(X̄ − μ)²]
= (1/n)(nσ² − σ²) = ((n−1)/n) σ².

Now

n² b_2² = [Σ_{i=1}^n (X_i − μ)² − n(X̄ − μ)²]².

Writing Y_i = X_i − μ, we see that EY_i = 0, var(Y_i) = σ², and EY_i⁴ = μ_4. We have

n² E b_2² = E[Σ_1^n Y_i² − n Ȳ²]²
= E[ Σ_{i=1}^n Y_i⁴ + Σ_{i≠j} Y_i² Y_j² − (2/n)(Σ_{i≠j} Y_i² Y_j² + Σ_{j=1}^n Y_j⁴) + (1/n²)(3 Σ_{i≠j} Y_i² Y_j² + Σ_1^n Y_j⁴) ].

It follows that

n² E b_2² = nμ_4 + n(n−1)σ⁴ − (2/n)[n(n−1)σ⁴ + nμ_4] + (1/n²)[3n(n−1)σ⁴ + nμ_4]
= (n − 2 + 1/n) μ_4 + (n − 2 + 3/n)(n−1) μ_2²   (μ_2 = σ²).

Therefore,

var(b_2) = E b_2² − (E b_2)²
= (n − 2 + 1/n) μ_4/n² + (n−1)(n − 2 + 3/n) μ_2²/n² − ((n−1)/n)² μ_2²
= (n − 2 + 1/n) μ_4/n² + (n−1)(3 − n) μ_2²/n³,

as asserted.
Relations (24) and (25) can be proved similarly.
Corollary 1. E S² = σ².

This is precisely the reason why we call S², and not b_2, the sample variance.
Corollary 2. var(S²) = μ_4/n + [(3 − n)/(n(n−1))] μ_2².
Remark 1. The results of Theorems 2 to 4 can easily be modified and stated for the case when the X_i's are exchangeable RVs. Thus (16) holds, and (17) has to be modified to

var(X̄) = σ²/n + ((n−1)/n) ρσ²,   (17′)

where ρ is the correlation coefficient between X_i and X_j. The expressions for (ΣX_j)³ and (ΣX_j)⁴ in the proof of Theorem 2 still hold, but both (18) and (19) need appropriate modification. For example, (18) changes to

E X̄³ = [m_3 + 3(n−1) E(X_j² X_k) + (n−1)(n−2) E(X_j X_k X_l)] / n².   (18′)
Let us show how Corollary 1 changes for exchangeable RVs. Clearly,

(n−1)S² = Σ_{i=1}^n (X_i − μ)² − n(X̄ − μ)²,

so that

(n−1) E S² = nσ² − n E(X̄ − μ)² = nσ² − [σ² + (n−1)ρσ²]

in view of (17′). It follows that

E S² = σ²(1 − ρ).

We note that E(S² − σ²) = −ρσ² and, moreover, from Problem 4.5.19 (or from (17′)) we note that ρ ≥ −1/(n−1), so that 1 − ρ ≤ n/(n−1) and hence

0 ≤ E S² ≤ (n/(n−1)) σ².
Remark 2. In simple random sampling from a (finite) population of size N we note that when n = N, X̄ = μ, which is a constant, so that (17′) reduces to

0 = σ²/N + ((N−1)/N) ρσ²,

so that ρ = −1/(N−1). It follows that

var(X̄) = (σ²/n)[1 − (n−1)/(N−1)] = [(N−n)/(N−1)] σ²/n.   (17′′)

The factor (N−n)/(N−1) in (17′′) is called the finite population correction factor. As N → ∞, with n fixed, (N−n)/(N−1) → 1, so that the expression for var(X̄) in (17′′) approaches that in (17).
Remark 3. In view of (17′), if the X_i's are uncorrelated, that is, if ρ = 0, then var(X̄) = σ²/n, and the SD of X̄ is σ/√n. The SD of X̄ is sometimes called the standard error (SE), although if σ is unknown, S/√n is most commonly referred to as the SE of X̄.
The following result provides a justification for our definition of sample covariance.

Theorem 5. Let (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) be a sample from a bivariate population with variances σ_1², σ_2² and covariance ρσ_1σ_2. Then

E S_1² = σ_1²,  E S_2² = σ_2²,  and  E S_11 = ρσ_1σ_2,   (26)

where S_1², S_2², and S_11 are defined in (12) and (13).

Proof. It follows from Corollary 1 to Theorem 4 that E S_1² = σ_1² and E S_2² = σ_2². To prove that E S_11 = ρσ_1σ_2 we note that X_i is independent of X_j (i ≠ j) and of Y_j (i ≠ j). We have

(n−1) E S_11 = E[ Σ_{j=1}^n (X_j − X̄)(Y_j − Ȳ) ].

Now

E{(X_j − X̄)(Y_j − Ȳ)} = E[ X_j Y_j − X_j (Σ_{i=1}^n Y_i)/n − Y_j (Σ_{i=1}^n X_i)/n + (Σ X_i)(Σ Y_i)/n² ]
= E XY − (1/n)[E XY + (n−1) EX EY] − (1/n)[E XY + (n−1) EX EY] + (1/n²)[n E XY + n(n−1) EX EY]
= ((n−1)/n)(E XY − EX EY),

and it follows that

(n−1) E S_11 = n ((n−1)/n)(E XY − EX EY),

that is,

E S_11 = E XY − EX EY = cov(X, Y) = ρσ_1σ_2,

as asserted.
We next turn our attention to the distributions of sample characteristics. Several possibilities exist. If the exact sampling distribution is required, the method of transformation described in Section 4.4 can be used. Sometimes the technique of MGF or CF can be applied. Thus, if X_1, X_2, ..., X_n is a random sample from a population distribution for which the MGF exists, the MGF of the sample mean X̄ is given by

M_X̄(t) = ∏_{i=1}^n E e^{tX_i/n} = [M(t/n)]^n,   (27)

where M is the MGF of the population distribution. If M_X̄(t) has one of the known forms, it is possible to write the PDF of X̄. Although this method has the obvious drawback that it applies only to distributions for which all moments exist, we will see in Section 6.5 its effectiveness in the important case of sampling from a normal population where this condition is satisfied. An analog of (27) holds for CFs without any condition on the existence of moments. Indeed,

φ_X̄(t) = ∏_{j=1}^n E e^{itX_j/n} = [φ(t/n)]^n,   (28)

where φ is the CF of X_j.

Example 2. Let X_1, X_2, ..., X_n be a sample from a G(α, 1) distribution. We will compute the PDF of X̄. We have

M_X̄(t) = [M(t/n)]^n = 1/(1 − t/n)^{αn},  t/n < 1,

so that X̄ is a G(αn, 1/n) variate.
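A quick Monte Carlo check of this example is sketched below (illustrative only; it assumes numpy's shape–scale gamma parametrization corresponds to G(shape, scale) here, so that G(αn, 1/n) has mean α and variance α/n):

```python
import numpy as np

rng = np.random.default_rng(1)
alpha, n = 2.0, 5
# Means of 100,000 samples of size n drawn from G(alpha, 1) (shape alpha, scale 1).
xbars = rng.gamma(shape=alpha, scale=1.0, size=(100_000, n)).mean(axis=1)

print(xbars.mean(), alpha)          # ~2.0, the mean of G(alpha*n, 1/n)
print(xbars.var(), alpha / n)       # ~0.4, the variance of G(alpha*n, 1/n)
```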
Example 3. Let X_1, X_2, ..., X_n be a random sample from a uniform distribution on (0, 1). Consider the geometric mean

Y_n = (∏_{i=1}^n X_i)^{1/n}.

We have log Y_n = (1/n) Σ_{i=1}^n log X_i, so that log Y_n is the mean of log X_1, ..., log X_n. The common PDF of log X_1, ..., log X_n is

f(x) = e^x if x < 0, and 0 otherwise,

which is the negative exponential distribution with parameter β = 1. We see that the MGF of log Y_n is given by

M(t) = ∏_{i=1}^n E e^{t log X_i/n} = 1/(1 + t/n)^n,

and the PDF of log Y_n is given by

f*(x) = (n^n/Γ(n)) (−x)^{n−1} e^{nx} for −∞ < x < 0, and 0 otherwise.

It follows that Y_n has PDF

f_{Y_n}(y) = (n^n/Γ(n)) y^{n−1} (−log y)^{n−1} for 0 < y < 1, and 0 otherwise.
Example 4 (Hogben [46]). Let X_1, X_2, ..., X_n be a random sample from a Bernoulli distribution with parameter p, 0 < p < 1. Let X̄ be the sample mean and S² the sample variance. We will find the PMF of S². Note that S_n = Σ_{i=1}^n X_i = Σ_{i=1}^n X_i² and that S_n is b(n, p). Since

(n−1)S² = Σ_{i=1}^n X_i² − n(X̄)² = S_n(n − S_n)/n,

S² only assumes values of the form

t = i(n−i)/(n(n−1)),  i = 0, 1, 2, ..., [n/2],

where [x] is the largest integer ≤ x. Thus

P{S² = t} = P{n S_n − S_n² = i(n−i)}
= P{(S_n − n/2)² = (i − n/2)²}
= P{S_n = i or S_n = n − i}
= (n choose i) p^i (1−p)^{n−i} + (n choose i) p^{n−i} (1−p)^i
= (n choose i) p^i (1−p)^i {(1−p)^{n−2i} + p^{n−2i}},  i < n/2.
If n is even, n = 2m, say, where m ≥ 1 is an integer, and i = m, then the two events coincide and

P{S² = m/(2(2m−1))} = (2m choose m) p^m (1−p)^m.

In particular, if n = 7, S² takes the values 0, 1/7, 5/21, and 2/7 with probabilities {p⁷ + (1−p)⁷}, 7p(1−p){p⁵ + (1−p)⁵}, 21p²(1−p)²{p³ + (1−p)³}, and 35p³(1−p)³, respectively. If n = 6, then S² takes the values 0, 1/6, 4/15, and 3/10 with probabilities {p⁶ + (1−p)⁶}, 6p(1−p){p⁴ + (1−p)⁴}, 15p²(1−p)²{p² + (1−p)²}, and 20p³(1−p)³, respectively.
We have already considered the distribution of the sample quantiles in Section 4.7 and the distribution of the range X_(n) − X_(1) in Example 4.7.4. It can be shown, without much difficulty, that the distribution of the sample median is given by

f_r(y) = [n!/((r−1)!(n−r)!)] [F(y)]^{r−1} [1 − F(y)]^{n−r} f(y)  if r = (n+1)/2,   (29)

where F and f are the population DF and PDF, respectively. If n = 2m and the median is taken as the average of X_(m) and X_(m+1), then

f_r(y) = [2(2m)!/((m−1)!)²] ∫_y^∞ [F(2y − v)]^{m−1} [1 − F(v)]^{m−1} f(2y − v) f(v) dv.   (30)
Example 5. Let X_1, X_2, ..., X_n be a random sample from U(0, 1). Then the integrand in (30) is positive on the intersection of the regions 0 < 2y − v < 1 and 0 < v < 1. This gives v/2 < y < (v+1)/2, y < v, and 0 < v < 1. The shaded area in Fig. 2 gives the limits on the integral as

y < v < 2y  if 0 < y ≤ 1/2
Fig. 2. Region of integration: {y < v ≤ 2y, 0 < y ≤ 1/2} and {y < v < 1, 1/2 < y ≤ 1}.
and

y < v < 1  if 1/2 < y < 1.

In particular, if m = 2, the PDF of the median, (X_(2) + X_(3))/2, is given by

f_r(y) = 8y²(3 − 4y) if 0 < y < 1/2;  8(4y³ − 9y² + 6y − 1) if 1/2 < y < 1;  0 otherwise.
The method of MGF (or CF) introduced in this section is particularly effective in computing distributions of commonly used statistics in sampling from a univariate or bivariate normal distribution, as we shall see in the next two sections. However, when sampling from nonnormal populations these methods may not be very fruitful in determining the exact distribution of the statistic under consideration. Often the statistic itself may be too intractable. Then we have other alternatives at our disposal. One may be able to use the asymptotic distribution of the statistic, or one may resort to simulation methods. In Chapter 7 we study some of these procedures.
PROBLEMS 6.3
1. Let X_1, X_2, ..., X_n be a random sample from a DF F, and let F*_n(x) be the sample distribution function. Find cov(F*_n(x), F*_n(y)) for fixed real numbers x, y.
2. Let F*_n be the empirical DF of a random sample from DF F. Show that

P{|F*_n(x) − F(x)| ≥ ε/(2√n)} ≤ 1/ε²  for all ε > 0.

3. For the data of Example 6.2.2 compute the sample distribution function.
4. (a) Show that the sample correlation coefficient R satisfies |R| ≤ 1, with equality if and only if all sample points lie on a straight line.
(b) If we write U_i = aX_i + b (a ≠ 0) and V_i = cY_i + d (c ≠ 0), what is the sample correlation coefficient between the U's and the V's?
5. (a) A sample of size 2 is taken from the PDF f(x) = 1, 0 ≤ x ≤ 1, and = 0 otherwise. Find P(X̄ ≥ 0.9).
(b) A sample of size 2 is taken from b(1, p):
(i) Find P(X̄ ≤ p). (ii) Find P(S² ≥ 0.5).
6. Let X_1, X_2, ..., X_n be a random sample from N(μ, σ²). Compute the first four sample moments of X̄ about the origin and about the mean. Also compute the first four sample moments of S² about the mean.
7. Derive the PDF of the median given in (29) and (30).
8. Let U_(1), U_(2), ..., U_(n) be the order statistic of a sample of size n from U(0, 1). Compute EU_(r)^k for any 1 ≤ r ≤ n and integer k (> 0). In particular, show that

EU_(r) = r/(n+1)  and  var(U_(r)) = r(n−r+1)/((n+1)²(n+2)).

Show also that the correlation coefficient between U_(r) and U_(s) for 1 ≤ r < s ≤ n is given by [r(n−s+1)/(s(n−r+1))]^{1/2}.
9. Let X_1, X_2, ..., X_n be n independent observations on X. Find the sampling distribution of X̄, the sample mean, if (a) X ∼ P(λ), (b) X ∼ C(1, 0), and (c) X ∼ χ²(m).
10. Let X_1, X_2, ..., X_n be a random sample from G(α, β). Let us write Y_n = (X̄ − αβ)/(β√(α/n)), n = 1, 2, ....
(a) Compute the first four moments of Y_n, and compare them with the first four moments of the standard normal distribution.
(b) Compute the coefficients of skewness α_3 and of kurtosis α_4 for the RVs Y_n. (For definitions of α_3, α_4 see Problem 3.2.10.)
11. Let X_1, X_2, ..., X_n be a random sample from U[0, 1]. Also let Z_n = (X̄ − 0.5)/√(1/(12n)). Repeat Problem 10 for the sequence Z_n.
12. Let X_1, X_2, ..., X_n be a random sample from P(λ). Find var(S²), and compare it with var(X̄). Note that EX̄ = λ = ES². [Hint: Use Problem 3.2.9.]
13. Prove (24) and (25).
14. Multiple RVs X_1, X_2, ..., X_n are exchangeable if the n! permutations (X_{i_1}, X_{i_2}, ..., X_{i_n}) have the same n-dimensional distribution. Consider the special case when the X's are two-dimensional. Find an analog of Theorem 5 for exchangeable bivariate RVs (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n).

15. Let X_1, X_2, ..., X_n be a random sample from a distribution with finite third moment. Show that cov(X̄, S²) = μ_3/n.
6.4 CHI-SQUARE, t-, AND F-DISTRIBUTIONS: EXACT SAMPLING DISTRIBUTIONS
In this section we investigate certain distributions that arise in sampling from a normal population. Let X_1, X_2, ..., X_n be a sample from N(μ, σ²). Then we know that X̄ ∼ N(μ, σ²/n). Also, {√n(X̄ − μ)/σ}² is χ²(1). We will determine the distribution of S² in the next section. Here we mainly define the chi-square, t-, and F-distributions and study their properties. Their importance will become evident in the next section and later in the testing of statistical hypotheses (Chapter 10).

The first distribution of interest is the chi-square distribution, defined in Chapter 5 as a special case of the gamma distribution. Let n > 0 be an integer. Then G(n/2, 2) is a χ²(n) RV. In view of Theorem 5.3.29 and Corollary 2 to Theorem 5.3.4, the following result holds.
Theorem 1. Let X_1, X_2, ..., X_n be iid RVs, and let S_n = Σ_{k=1}^n X_k. Then

(a) S_n ∼ χ²(n) ⟺ X_1 ∼ χ²(1)

and

(b) X_1 ∼ N(0, 1) ⟹ Σ_{k=1}^n X_k² ∼ χ²(n).
If X has a chi-square distribution with n d.f., we write X ∼ χ²(n). We recall that, if X ∼ χ²(n), its PDF is given by

f(x) = x^{n/2−1} e^{−x/2} / (2^{n/2} Γ(n/2)) if x ≥ 0, and 0 if x < 0,   (1)

the MGF by

M(t) = (1 − 2t)^{−n/2}  for t < 1/2,   (2)

and the mean and the variance by

EX = n,  var(X) = 2n.   (3)
The χ²(n) distribution is tabulated for values of n = 1, 2, .... Tables usually go up to n = 30, since for n > 30 it is possible to use a normal approximation. In Fig. 1 we plot the PDF (1) for selected values of n.

Fig. 1. Chi-square densities for n = 1, 10, 20, 40.
We will write χ²_{n,α} for the upper α percent point of the χ²(n) distribution, that is,

P{χ²(n) > χ²_{n,α}} = α.   (4)

Table ST3 at the end of the book gives the values of χ²_{n,α} for some selected values of n and α.
Example 1. Let n = 25. Then, from Table ST3,

P{χ²(25) ≤ 34.382} = 0.90.

Let us approximate this probability using the CLT. We see that Eχ²(25) = 25 and var χ²(25) = 50, so that

P{χ²(25) ≤ 34.382} = P{ (χ²(25) − 25)/√50 ≤ (34.382 − 25)/(5√2) } ≈ P{Z ≤ 1.32} = 0.9066.
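The same comparison can be made numerically; the sketch below (illustrative only, assuming scipy is available) evaluates the exact chi-square CDF and the normal approximation:

```python
from scipy.stats import chi2, norm

n, x = 25, 34.382
exact = chi2.cdf(x, df=n)                      # the tabled value, about 0.90
approx = norm.cdf((x - n) / (2 * n) ** 0.5)    # CLT: mean n, variance 2n
print(exact, approx)                           # ~0.900 versus ~0.908
```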

Definition 1. Let X_1, X_2, ..., X_n be independent normal RVs with EX_i = μ_i and var(X_i) = σ², i = 1, 2, ..., n. Also, let Y = Σ_{i=1}^n X_i²/σ². The RV Y is said to be a noncentral chi-square RV with noncentrality parameter Σ_{i=1}^n μ_i²/σ² and n d.f. We will write Y ∼ χ²(n, δ), where δ = Σ_{i=1}^n μ_i²/σ².
Although the PDF of a χ²(n, δ) RV is hard to compute (see Problem 16), its MGF is easily evaluated. We have

M(t) = E exp{ t Σ_1^n X_i²/σ² } = ∏_1^n E exp{ t X_i²/σ² },

where X_i ∼ N(μ_i, σ²). Thus

E e^{tX_i²/σ²} = ∫_{−∞}^{∞} (1/(σ√(2π))) exp{ t x_i²/σ² − (x_i − μ_i)²/(2σ²) } dx_i,

where the integral exists for t < 1/2. In the integrand we complete squares, and after some simple algebra we obtain

E e^{tX_i²/σ²} = (1/√(1−2t)) exp{ t μ_i² / (σ²(1−2t)) },  t < 1/2.
It follows that

M(t) = (1−2t)^{−n/2} exp{ (t/(1−2t)) Σ μ_i²/σ² },  t < 1/2,   (5)

and the MGF of a χ²(n, δ) RV is therefore

M(t) = (1−2t)^{−n/2} exp{ (t/(1−2t)) δ },  t < 1/2.   (6)

It is immediate that, if X_1, X_2, ..., X_k are independent, X_i ∼ χ²(n_i, δ_i), i = 1, 2, ..., k, then Σ_{i=1}^k X_i is χ²(Σ_{i=1}^k n_i, Σ_{i=1}^k δ_i).
The mean and variance of χ²(n, δ) are easy to calculate. We have

EY = Σ_1^n EX_i²/σ² = Σ_1^n [var(X_i) + (EX_i)²]/σ² = (nσ² + Σ_1^n μ_i²)/σ² = n + δ,

and

var(Y) = var( Σ_1^n X_i²/σ² ) = (1/σ⁴) Σ_{i=1}^n var(X_i²)
= (1/σ⁴) [ Σ_{i=1}^n EX_i⁴ − Σ_{i=1}^n (EX_i²)² ]
= (1/σ⁴) [ Σ_{i=1}^n (3σ⁴ + 6σ²μ_i² + μ_i⁴) − Σ_{i=1}^n (σ² + μ_i²)² ]
= (1/σ⁴)(2nσ⁴ + 4σ² Σ μ_i²) = 2n + 4δ.
We next turn our attention to Student's t-statistic, which arises quite naturally in sampling from a normal population.

Definition 2. Let X ∼ N(0, 1) and Y ∼ χ²(n), and let X and Y be independent. Then the statistic

T = X/√(Y/n)   (7)

is said to have a t-distribution with n d.f., and we write T ∼ t(n).
Theorem 2. The PDF of T defined in (7) is given by

f_n(t) = [Γ((n+1)/2)/(√(nπ) Γ(n/2))] (1 + t²/n)^{−(n+1)/2},  −∞ < t < ∞.   (8)

Proof. The proof is left as an exercise.
Remark 1. For n = 1, T is a Cauchy RV. We will therefore assume that n > 1. For each n, we have a different PDF. In Fig. 2 we plot f_n(t) for some selected values of n. Like the normal distribution, the t-distribution is important in the theory of statistics and hence is tabulated (Table ST4).

Fig. 2. Student's t-densities for n = 1, 10, 20, 40.

Fig. 3. The t(n) density with two-sided tail points ±t(n, α/2), each tail carrying probability α/2.
Remark 2. The PDF f_n(t) is symmetric in t, and f_n(t) → 0 as t → ±∞. For large n, the t-distribution is close to the normal distribution. Indeed, (1 + t²/n)^{−(n+1)/2} → e^{−t²/2} as n → ∞. Moreover, as t → ∞ or t → −∞, the tails of f_n(t) → 0 much more slowly than do the tails of the N(0, 1) PDF. Thus for small n and large t_0

P{|T| > t_0} ≥ P{|Z| > t_0},  Z ∼ N(0, 1),

that is, there is more probability in the tail of the t-distribution than in the tail of the standard normal. In what follows we will write t_{n,α/2} for the value (Fig. 3) of T for which

P{|T| > t_{n,α/2}} = α.   (9)

In Table ST4 positive values of t_{n,α} are tabulated for some selected values of n and α. Negative values may be obtained from symmetry, t_{n,1−α} = −t_{n,α}.
Example 2. Let n = 5. Then from Table ST4 we get t_{5,0.025} = 2.571 and t_{5,0.05} = 2.015. The corresponding values under the N(0, 1) distribution are z_{0.025} = 1.96 and z_{0.05} = 1.65. For n = 30,

t_{30,0.05} = 1.697  and  z_{0.05} = 1.65.
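These percent points are easy to reproduce numerically; the sketch below (illustrative only, assuming scipy is available) computes t_{n,α} and z_α as upper-tail points via the quantile function:

```python
from scipy.stats import t, norm

for n in (5, 30):
    print(n, t.ppf(0.975, df=n), t.ppf(0.95, df=n))   # t_{n,0.025}, t_{n,0.05}
print(norm.ppf(0.975), norm.ppf(0.95))                # z_{0.025}, z_{0.05}
# n = 5 gives 2.571 and 2.015; n = 30 gives 2.042 and 1.697; the normal gives 1.960 and 1.645.
```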

Theorem 3. Let X ∼ t(n), n > 1. Then EX^r exists for r < n. In particular, if r < n is odd,

EX^r = 0,   (10)

and if r < n is even,

EX^r = n^{r/2} Γ[(r+1)/2] Γ[(n−r)/2] / (Γ(1/2) Γ(n/2)).   (11)

Corollary. If n > 2, EX = 0 and EX² = var(X) = n/(n−2).
Remark 3. If in Definition 2 we take X ∼ N(μ, σ²), Y/σ² ∼ χ²(n), and X and Y independent, then

T = X/√(Y/n)

is said to have a noncentral t-distribution with parameter (also called noncentrality parameter) δ = μ/σ and d.f. n. Various moments of the noncentral t-distribution may be computed by using the fact that the expectation of a product of independent RVs is the product of their expectations.

We leave the reader to show (Problem 3) that, if T has a noncentral t-distribution with n d.f. and noncentrality parameter δ, then

ET = δ [Γ((n−1)/2)/Γ(n/2)] √(n/2),  n > 1,   (12)

and

var(T) = n(1 + δ²)/(n−2) − δ² (n/2) [Γ((n−1)/2)/Γ(n/2)]²,  n > 2.   (13)
Definition 3. Let X and Y be independent χ² RVs with m and n d.f., respectively. The RV

F = (X/m)/(Y/n)   (14)

is said to have an F-distribution with (m, n) d.f., and we write F ∼ F(m, n).
Theorem 4. The PDF of the F-statistic defined in (14) is given by

g(f) = [Γ((m+n)/2)/(Γ(m/2)Γ(n/2))] (m/n) (mf/n)^{(m/2)−1} [1 + (m/n)f]^{−(m+n)/2} for f > 0, and 0 for f ≤ 0.   (15)

Proof. The proof is left as an exercise.

Remark 4. If X ∼ F(m, n), then 1/X ∼ F(n, m). If we take m = 1, then F = [t(n)]², so that F(1, n) and t²(n) have the same distribution. It also follows that, if Z is C(1, 0) [which is the same as t(1)], Z² is F(1, 1).
Remark 5. As usual, we write F_{m,n,α} for the upper α percent point of the F(m, n) distribution, that is,

P{F(m, n) > F_{m,n,α}} = α.   (16)

From Remark 4, we have the following relation:

F_{m,n,1−α} = 1/F_{n,m,α}.   (17)

It therefore suffices to tabulate values of F that are ≥ 1. This is done in Table ST5, where values of F_{m,n,α} are listed for some selected values of m, n, and α. See Fig. 4 for a plot of g(f).
Theorem 5. Let X ∼ F(m, n). Then, for k > 0 integral,

EX^k = (n/m)^k Γ[k + (m/2)] Γ[(n/2) − k] / (Γ(m/2) Γ(n/2))  for n > 2k.   (18)

Fig. 4. F densities for (m, n) = (5, 5), (5, 10), (5, 15).

In particular,

EX = n/(n−2),  n > 2,   (19)

and

var(X) = 2n²(m+n−2) / (m(n−2)²(n−4)),  n > 4.   (20)
Proof. We have, for a positive integer k,

∫_0^∞ f^k f^{m/2−1} [1 + (m/n)f]^{−(m+n)/2} df = (n/m)^{k+(m/2)} ∫_0^1 x^{k+(m/2)−1} (1−x)^{(n/2)−k−1} dx,   (21)

where we have changed the variable to x = (m/n)f [1 + (m/n)f]^{−1}. The integral on the right side of (21) converges for (n/2) − k > 0 and diverges for (n/2) − k ≤ 0. We have

EX^k = [Γ((m+n)/2)/(Γ(m/2)Γ(n/2))] (m/n)^{m/2} (n/m)^{k+(m/2)} B(k + m/2, n/2 − k),

as asserted.

For k = 1, we get

EX = (n/m) (m/2)/((n/2) − 1) = n/(n−2),  n > 2.

Also,

EX² = (n/m)² (m/2)[(m/2)+1] / {[(n/2)−1][(n/2)−2]} = (n/m)² m(m+2)/((n−2)(n−4)),  n > 4,

and

var(X) = (n/m)² m(m+2)/((n−2)(n−4)) − [n/(n−2)]² = 2n²(m+n−2)/(m(n−2)²(n−4)),  n > 4.
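A small simulation check of (19) and (20), built directly from Definition 3 (an illustrative sketch, with arbitrary m, n and numpy assumed available):

```python
import numpy as np

rng = np.random.default_rng(2)
m, n = 5, 12
# F(m, n) variates constructed from independent chi-square RVs, as in Definition 3.
f = (rng.chisquare(m, 200_000) / m) / (rng.chisquare(n, 200_000) / n)

print(f.mean(), n / (n - 2))                                         # ~1.2
print(f.var(), 2 * n**2 * (m + n - 2) / (m * (n - 2)**2 * (n - 4)))  # ~1.08
```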
Theorem 6. If X ∼ F(m, n), then Y = 1/[1 + (m/n)X] is B(n/2, m/2). Consequently, for each x > 0,

F_X(x) = 1 − F_Y( 1/(1 + (m/n)x) ).

If in Definition 3 we take X to be a noncentral χ² RV with m d.f. and noncentrality parameter δ, we get a noncentral F RV.

Definition 4. Let X ∼ χ²(m, δ) and Y ∼ χ²(n), and let X and Y be independent. Then the RV

F = (X/m)/(Y/n)   (22)

is said to have a noncentral F-distribution with noncentrality parameter δ.

It is shown in Problem 2 that if F has a noncentral F-distribution with (m, n) d.f. and noncentrality parameter δ, then

EF = n(m+δ)/(m(n−2)),  n > 2,

and

var(F) = [2n²/(m²(n−4)(n−2)²)] [(m+δ)² + (n−2)(m+2δ)],  n > 4.
PROBLEMS 6.4
1. Let

P_x = [Γ(n/2) 2^{n/2}]^{−1} ∫_0^x ω^{(n−2)/2} e^{−ω/2} dω,  x > 0.

Show that x < n/(1 − P_x).
2. Let X ∼ F(m, n, δ). Find EX and var(X).
3. Let T be a noncentral t-statistic with n d.f. and noncentrality parameter δ. Find ET and var(T).
4. Let F ∼ F(m, n). Then

Y = [1 + (m/n)F]^{−1} ∼ B(n/2, m/2).

Deduce that for x > 0

P{F ≤ x} = 1 − P{ Y ≤ [1 + (m/n)x]^{−1} }.

5. Derive the PDF of an F-statistic with (m, n) d.f.
6. Show that the square of a noncentral t-statistic is a noncentral F-statistic.
7. A sample of size 16 showed a variance of 5.76. Find c such that P{|X̄ − μ| < c} = 0.95, where X̄ is the sample mean and μ is the population mean. Assume that the sample comes from a normal population.

8. A sample from a normal population produced variance 4.0. Find the size of the sample if the sample mean deviates from the population mean by no more than 2.0 with a probability of at least 0.95.
9. Let X_1, X_2, X_3, X_4, X_5 be a sample from N(0, 4). Find P{Σ_{i=1}^5 X_i² ≥ 5.75}.
10. Let X ∼ χ²(61). Find P{X > 50}.
11. Let F ∼ F(m, n). The random variable Z = (1/2) log F is known as Fisher's Z statistic. Find the PDF of Z.
12. Prove Theorem 1.
13. Prove Theorem 2.
14. Prove Theorem 3.
15. Prove Theorem 4.
16. (a) Let f_1, f_2, ... be PDFs with corresponding MGFs M_1, M_2, ..., respectively. Let α_j (0 < α_j < 1) be constants such that Σ_{j=1}^∞ α_j = 1. Then f = Σ_1^∞ α_j f_j is a PDF with MGF M = Σ_{j=1}^∞ α_j M_j.
(b) Write the MGF of a χ²(n, δ) RV in (6) as

M(t) = Σ_{j=0}^∞ α_j M_j(t),

where M_j(t) = (1−2t)^{−(2j+n)/2} is the MGF of a χ²(2j+n) RV and α_j = e^{−δ/2}(δ/2)^j/j! is the PMF of a P(δ/2) RV. Conclude that the PDF of Y ∼ χ²(n, δ) is the weighted sum of PDFs of χ²(2j+n) RVs, j = 0, 1, 2, ..., with Poisson weights and hence

f_Y(y) = Σ_{j=0}^∞ [e^{−δ/2}(δ/2)^j/j!] · y^{(2j+n)/2−1} exp(−y/2) / (2^{(2j+n)/2} Γ((2j+n)/2)).
6.5 DISTRIBUTION OF (X̄, S²) IN SAMPLING FROM A NORMAL POPULATION
Let X_1, X_2, ..., X_n be a sample from N(μ, σ²), and write X̄ = n^{-1} Σ_{i=1}^n X_i and S² = (n−1)^{-1} Σ_{i=1}^n (X_i − X̄)². In this section we show that X̄ and S² are independent and derive the distribution of S². More precisely, we prove the following important result.
Theorem 1. Let X_1, X_2, ..., X_n be iid N(μ, σ²) RVs. Then X̄ and (X_1 − X̄, X_2 − X̄, ..., X_n − X̄) are independent.
Proof. We compute the joint MGF of X̄ and (X_1 − X̄, X_2 − X̄, ..., X_n − X̄) as follows:

M(t, t_1, t_2, ..., t_n) = E exp{ t X̄ + t_1(X_1 − X̄) + t_2(X_2 − X̄) + ··· + t_n(X_n − X̄) }
= E exp{ Σ_{i=1}^n t_i X_i − (Σ_{i=1}^n t_i − t) X̄ }
= E exp{ Σ_{i=1}^n X_i [ t_i − (t_1 + t_2 + ··· + t_n − t)/n ] }
= E ∏_{i=1}^n exp{ X_i (n t_i − n t̄ + t)/n }   (where t̄ = n^{-1} Σ_{i=1}^n t_i)
= ∏_{i=1}^n E exp{ X_i [t + n(t_i − t̄)]/n }
= ∏_{i=1}^n exp{ μ[t + n(t_i − t̄)]/n + (σ²/2)(1/n²)[t + n(t_i − t̄)]² }
= exp{ (μ/n)[ nt + n Σ_{i=1}^n (t_i − t̄) ] + (σ²/(2n²)) Σ_{i=1}^n [t + n(t_i − t̄)]² }
= exp(μt) exp{ (σ²/(2n²)) [ n t² + n² Σ_{i=1}^n (t_i − t̄)² ] }
= exp{ μt + (σ²/(2n)) t² } exp{ (σ²/2) Σ_{i=1}^n (t_i − t̄)² }
= M_X̄(t) M_{X_1−X̄, ..., X_n−X̄}(t_1, t_2, ..., t_n)
= M(t, 0, 0, ..., 0) M(0, t_1, t_2, ..., t_n).
Corollary 1. X̄ and S² are independent.

Corollary 2. (n−1)S²/σ² is χ²(n−1).
Since

Σ_{i=1}^n (X_i − μ)²/σ² ∼ χ²(n),  n[(X̄ − μ)/σ]² ∼ χ²(1),

and X̄ and S² are independent, it follows from

Σ_1^n (X_i − μ)²/σ² = n[(X̄ − μ)/σ]² + (n−1)S²/σ²

that

E exp{ t Σ_1^n (X_i − μ)²/σ² } = E exp{ n[(X̄ − μ)/σ]² t + (n−1)(S²/σ²) t }
= E exp{ n[(X̄ − μ)/σ]² t } · E exp{ (n−1)(S²/σ²) t },

that is,

(1−2t)^{−n/2} = (1−2t)^{−1/2} E exp{ (n−1)(S²/σ²) t },  t < 1/2,

and it follows that

E exp{ (n−1)(S²/σ²) t } = (1−2t)^{−(n−1)/2},  t < 1/2.

By the uniqueness of the MGF it follows that (n−1)S²/σ² is χ²(n−1).
Corollary 3. The distribution of √n(X̄ − μ)/S is t(n−1).

Proof. Since √n(X̄ − μ)/σ is N(0, 1), (n−1)S²/σ² ∼ χ²(n−1), and X̄ and S² are independent,

[√n(X̄ − μ)/σ] / √{ [(n−1)S²/σ²]/(n−1) } = √n(X̄ − μ)/S  is t(n−1).
Corollary 4. If X_1, X_2, ..., X_m are iid N(μ_1, σ_1²) RVs, Y_1, Y_2, ..., Y_n are iid N(μ_2, σ_2²) RVs, and the two samples are independently taken, then (S_1²/σ_1²)/(S_2²/σ_2²) is F(m−1, n−1). If, in particular, σ_1 = σ_2, then S_1²/S_2² is F(m−1, n−1).
Corollary 5. Let X_1, X_2, ..., X_m and Y_1, Y_2, ..., Y_n, respectively, be independent samples from N(μ_1, σ_1²) and N(μ_2, σ_2²). Then

[X̄ − Ȳ − (μ_1 − μ_2)] / {[(m−1)S_1²/σ_1²] + [(n−1)S_2²/σ_2²]}^{1/2} · √{ (m+n−2)/(σ_1²/m + σ_2²/n) } ∼ t(m+n−2).

In particular, if σ_1 = σ_2, then

[X̄ − Ȳ − (μ_1 − μ_2)] / √{ (m−1)S_1² + (n−1)S_2² } · √{ mn(m+n−2)/(m+n) } ∼ t(m+n−2).

Corollary 5 follows since

X̄ − Ȳ ∼ N( μ_1 − μ_2, σ_1²/m + σ_2²/n )  and  (m−1)S_1²/σ_1² + (n−1)S_2²/σ_2² ∼ χ²(m+n−2),

and the two statistics are independent.

Remark 1.The converse of Corollary 1 also holds. See Theorem 5.3.28.
Remark 2. In sampling from a symmetric distribution, X̄ and S² are uncorrelated. See Problem 4.5.14.
Remark 3. Alternatively, Corollary 1 could have been derived from Corollary 2 to Theorem 5.4.6 by using the Helmert orthogonal matrix:

A = ( 1/√n          1/√n          1/√n          ···   1/√n
     −1/√2          1/√2          0             ···   0
     −1/√6         −1/√6          2/√6          ···   0
      ···
     −1/√(n(n−1))  −1/√(n(n−1))  −1/√(n(n−1))   ···   (n−1)/√(n(n−1)) ).

For the case of n = 3 this was done in Example 4.4.6. In Problem 7 the reader is asked to work out the details in the general case.
Remark 4. An analytic approach to the development of the distribution of X̄ and S² is as follows. Assuming without loss of generality that X_i is N(0, 1), we have as the joint PDF of (X_1, X_2, ..., X_n)

f(x_1, x_2, ..., x_n) = (2π)^{−n/2} exp{ −(1/2) Σ_{j=1}^n x_j² } = (2π)^{−n/2} exp{ −[(n−1)s² + n x̄²]/2 }.

Changing the variables to y_1, y_2, ..., y_n by using the transformation y_k = (x_k − x̄)/s, we see that

Σ_{k=1}^n y_k = 0  and  Σ_{k=1}^n y_k² = n − 1.

It follows that two of the y_k's, say y_{n−1} and y_n, are functions of the remaining y_k. Thus either

y_{n−1} = (α+β)/2,  y_n = (α−β)/2,

or

y_{n−1} = (α−β)/2,  y_n = (α+β)/2,

where

α = −Σ_{k=1}^{n−2} y_k  and  β = √{ 2(n−1) − 2 Σ_{k=1}^{n−2} y_k² − (Σ_{k=1}^{n−2} y_k)² }.

We leave the reader to derive the joint PDF of (Y_1, Y_2, ..., Y_{n−2}, X̄, S²), using the result described in Remark 2, and to show that the RVs X̄, S², and (Y_1, Y_2, ..., Y_{n−2}) are independent.
PROBLEMS 6.5
1. Let X_1, X_2, ..., X_n be a random sample from N(μ, σ²) and X̄ and S², respectively, be the sample mean and the sample variance. Let X_{n+1} ∼ N(μ, σ²), and assume that X_1, X_2, ..., X_n, X_{n+1} are independent. Find the sampling distribution of [(X_{n+1} − X̄)/S] √(n/(n+1)).
2. Let X_1, X_2, ..., X_m and Y_1, Y_2, ..., Y_n be independent random samples from N(μ_1, σ²) and N(μ_2, σ²), respectively. Also, let α, β be two fixed real numbers. If X̄, Ȳ denote the corresponding sample means, what is the sampling distribution of

[α(X̄ − μ_1) + β(Ȳ − μ_2)] / { √[ ((m−1)S_1² + (n−1)S_2²)/(m+n−2) ] √(α²/m + β²/n) },

where S_1² and S_2², respectively, denote the sample variances of the X's and the Y's?
3. Let X_1, X_2, ..., X_n be a random sample from N(μ, σ²), and k be a positive integer. Find E(S^{2k}). In particular, find E(S²) and var(S²).
4. A random sample of 5 is taken from a normal population with mean 2.5 and variance σ² = 36.
(a) Find the probability that the sample variance lies between 30 and 44.
(b) Find the probability that the sample mean lies between 1.3 and 3.5, while the sample variance lies between 30 and 44.
5. The mean life of a sample of 10 light bulbs was observed to be 1327 hours with a standard deviation of 425 hours. A second sample of 6 bulbs chosen from a different batch showed a mean life of 1215 hours with a standard deviation of 375 hours. If the means of the two batches are assumed to be the same, how probable is the observed difference between the two sample means?
6. Let S_1² and S_2² be the sample variances from two independent samples of sizes n_1 = 5 and n_2 = 4 from two populations having the same unknown variance σ². Find (approximately) the probability that S_1²/S_2² < 1/5.2 or > 6.25.
7. Let X_1, X_2, ..., X_n be a sample from N(μ, σ²). By using the Helmert orthogonal transformation defined in Remark 3, show that X̄ and S² are independent.
8. Derive the joint PDF of X̄ and S² by using the transformation described in Remark 4.

6.6 SAMPLING FROM A BIVARIATE NORMAL DISTRIBUTION
Let (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) be a sample from a bivariate normal population with parameters μ_1, μ_2, ρ, σ_1², σ_2². Let us write

X̄ = n^{-1} Σ_{i=1}^n X_i,  Ȳ = n^{-1} Σ_{i=1}^n Y_i,
S_1² = (n−1)^{-1} Σ_{i=1}^n (X_i − X̄)²,  S_2² = (n−1)^{-1} Σ_{i=1}^n (Y_i − Ȳ)²,

and

S_11 = (n−1)^{-1} Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ).

In this section we show that (X̄, Ȳ) is independent of (S_1², S_11, S_2²) and obtain the distribution of the sample correlation coefficient and regression coefficients (at least in the special case where ρ = 0).
Theorem 1. The random vectors (X̄, Ȳ) and (X_1 − X̄, X_2 − X̄, ..., X_n − X̄, Y_1 − Ȳ, Y_2 − Ȳ, ..., Y_n − Ȳ) are independent. The joint distribution of (X̄, Ȳ) is bivariate normal with parameters μ_1, μ_2, ρ, σ_1²/n, σ_2²/n.
Proof. The proof follows along the lines of the proof of Theorem 1 of Section 6.5. The MGF of (X̄, Ȳ, X_1 − X̄, ..., X_n − X̄, Y_1 − Ȳ, ..., Y_n − Ȳ) is given by

M* = M(u, v, t_1, t_2, ..., t_n, s_1, s_2, ..., s_n)
= E exp{ u X̄ + v Ȳ + Σ_{i=1}^n t_i(X_i − X̄) + Σ_{i=1}^n s_i(Y_i − Ȳ) }
= E exp{ Σ_{i=1}^n X_i (u/n + t_i − t̄) + Σ_{i=1}^n Y_i (v/n + s_i − s̄) },

where t̄ = n^{-1} Σ_{i=1}^n t_i and s̄ = n^{-1} Σ_{i=1}^n s_i. Therefore,

M* = ∏_{i=1}^n E exp{ (u/n + t_i − t̄) X_i + (v/n + s_i − s̄) Y_i }
= ∏_{i=1}^n exp{ (u/n + t_i − t̄) μ_1 + (v/n + s_i − s̄) μ_2
  + [ σ_1²(u/n + t_i − t̄)² + 2ρσ_1σ_2(u/n + t_i − t̄)(v/n + s_i − s̄) + σ_2²(v/n + s_i − s̄)² ]/2 }
= exp{ μ_1 u + μ_2 v + (u²σ_1² + 2ρσ_1σ_2 uv + v²σ_2²)/(2n) }
  · exp{ (σ_1²/2) Σ_{i=1}^n (t_i − t̄)² + ρσ_1σ_2 Σ_{i=1}^n (t_i − t̄)(s_i − s̄) + (σ_2²/2) Σ_{i=1}^n (s_i − s̄)² }
= M_1(u, v) M_2(t_1, t_2, ..., t_n, s_1, s_2, ..., s_n)

for all real u, v, t_1, t_2, ..., t_n, s_1, s_2, ..., s_n, where M_1 is the MGF of (X̄, Ȳ) and M_2 is the MGF of (X_1 − X̄, ..., X_n − X̄, Y_1 − Ȳ, ..., Y_n − Ȳ). Also, M_1 is the MGF of a bivariate normal distribution. This completes the proof.
Corollary. The sample mean vector (X̄, Ȳ) is independent of the sample variance–covariance matrix

( S_1²   S_11
  S_11   S_2² )

in sampling from a bivariate normal population.
Remark 1. The result of Theorem 1 can be generalized to the case of sampling from a k-variate normal population. We do not propose to do so here.

Remark 2. Unfortunately the method of proof of Theorem 1 does not lead to the distribution of the variance–covariance matrix. The distribution of (X̄, Ȳ, S_1², S_11, S_2²) was found by Fisher [30] and Romanovsky [92]. The general case is due to Wishart [119], who determined the distribution of the sample variance–covariance matrix in sampling from a k-dimensional normal distribution. The distribution is named after him.
We will next compute the distribution of the sample correlation coefficient:

R = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / [ Σ_{i=1}^n (X_i − X̄)² Σ_{i=1}^n (Y_i − Ȳ)² ]^{1/2} = S_11/(S_1 S_2).   (1)
It is convenient to introduce the so-called sample regression coefficient of Y on X,

B_{Y|X} = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / Σ_{i=1}^n (X_i − X̄)² = S_11/S_1² = R S_2/S_1.   (2)

Since we will need only the distribution of R and B_{Y|X} whenever ρ = 0, we will make this simplifying assumption in what follows. The general case is computationally quite complicated. We refer the reader to Cramér [17] for details.

We note that

R = Σ_{i=1}^n Y_i(X_i − X̄) / [(n−1) S_1 S_2]   (3)

and

B_{Y|X} = Σ_{i=1}^n Y_i(X_i − X̄) / [(n−1) S_1²].   (4)

Moreover,

R² = B_{Y|X}² S_1²/S_2².   (5)

In the following we write B = B_{Y|X}.
Theorem 2. Let (X_1, Y_1), ..., (X_n, Y_n), n ≥ 2, be a sample from a bivariate normal population with parameters EX = μ_1, EY = μ_2, var(X) = σ_1², var(Y) = σ_2², and cov(X, Y) = 0. In other words, let X_1, X_2, ..., X_n be iid N(μ_1, σ_1²) RVs, and Y_1, Y_2, ..., Y_n be iid N(μ_2, σ_2²) RVs, and suppose that the X's and Y's are independent. Then the PDF of R is given by

f_1(r) = [Γ((n−1)/2)/(Γ(1/2)Γ((n−2)/2))] (1 − r²)^{(n−4)/2} for −1 ≤ r ≤ 1, and 0 otherwise;   (6)

and the PDF of B is given by

h_1(b) = [Γ(n/2)/(Γ(1/2)Γ((n−1)/2))] σ_1 σ_2^{n−1} / (σ_2² + σ_1² b²)^{n/2},  −∞ < b < ∞.   (7)
Proof. Without any loss of generality, we assume that μ_1 = μ_2 = 0 and σ_1² = σ_2² = 1, for we can always define

X_i' = (X_i − μ_1)/σ_1  and  Y_i' = (Y_i − μ_2)/σ_2.   (8)
Now note that the conditional distribution of Y_i, given X_1, X_2, ..., X_n, is N(0, 1), and Y_1, Y_2, ..., Y_n, given X_1, X_2, ..., X_n, are mutually independent. Let us define the following orthogonal transformation:

u_i = Σ_{j=1}^n c_{ij} y_j,  i = 1, 2, ..., n,   (9)

where (c_{ij}), i, j = 1, 2, ..., n, is an orthogonal matrix with the first two rows

c_{1j} = 1/√n,  j = 1, 2, ..., n,   (10)
c_{2j} = (x_j − x̄) / [ Σ_{i=1}^n (x_i − x̄)² ]^{1/2},  j = 1, 2, ..., n.   (11)

It follows from orthogonality that for any i ≥ 2

Σ_{j=1}^n c_{ij} = √n Σ_{j=1}^n c_{ij} (1/√n) = √n Σ_{j=1}^n c_{ij} c_{1j} = 0   (12)

and

Σ_{i=1}^n u_i² = Σ_{i=1}^n ( Σ_{j=1}^n c_{ij} y_j )( Σ_{j'=1}^n c_{ij'} y_{j'} ) = Σ_{j=1}^n Σ_{j'=1}^n ( Σ_{i=1}^n c_{ij} c_{ij'} ) y_j y_{j'} = Σ_{j=1}^n y_j².   (13)
Moreover,

u_1 = √n ȳ   (14)

and

u_2 = b [ Σ (x_i − x̄)² ]^{1/2},   (15)

where b is a value assumed by the RV B. Also U_1, U_2, ..., U_n, given X_1, X_2, ..., X_n, are normal RVs (being linear combinations of the Y's). Thus

E{U_i | X_1, X_2, ..., X_n} = Σ_{j=1}^n c_{ij} E{Y_j | X_1, X_2, ..., X_n} = 0   (16)

and

cov{U_i, U_k | X_1, X_2, ..., X_n} = cov{ Σ_{j=1}^n c_{ij} Y_j, Σ_{p=1}^n c_{kp} Y_p | X_1, X_2, ..., X_n }
= Σ_{j=1}^n Σ_{p=1}^n c_{ij} c_{kp} cov{Y_j, Y_p | X_1, X_2, ..., X_n}
= Σ_{j=1}^n c_{ij} c_{kj}.

This last equality follows since cov{Y_j, Y_p | X_1, X_2, ..., X_n} = 0 for j ≠ p and = 1 for j = p.

From orthogonality, we have

cov{U_i, U_k | X_1, X_2, ..., X_n} = 0 if i ≠ k, and = 1 if i = k,   (17)

and it follows that the RVs U_1, U_2, ..., U_n, given X_1, X_2, ..., X_n, are mutually independent N(0, 1). Now

Σ_{j=1}^n (y_j − ȳ)² = Σ_{i=1}^n y_i² − n ȳ² = Σ_{j=1}^n u_j² − u_1² = Σ_{j=2}^n u_j².   (18)
Thus

R² = U_2² / Σ_{i=2}^n U_i² = U_2² / (U_2² + Σ_{i=3}^n U_i²).   (19)
Writing U = U_2² and W = Σ_{i=3}^n U_i², we see that the conditional distribution of U, given X_1, X_2, ..., X_n, is χ²(1), and that of W, given X_1, X_2, ..., X_n, is χ²(n−2). Moreover, U and W are independent. Since these conditional distributions do not involve the X's, we see that U and W are unconditionally independent with χ²(1) and χ²(n−2) distributions, respectively. The joint PDF of U and W is

f(u, w) = [1/(Γ(1/2)√2)] u^{1/2−1} e^{−u/2} · [1/(Γ((n−2)/2) 2^{(n−2)/2})] w^{(n−2)/2−1} e^{−w/2}.
Let u + w = z; then u = r²z and w = z(1 − r²). The Jacobian of this transformation is z, so that the joint PDF of R² and Z is given by

f*(r², z) = [1/(Γ(1/2)Γ((n−2)/2) 2^{(n−1)/2})] z^{n/2−3/2} e^{−z/2} (r²)^{−1/2} (1 − r²)^{n/2−2}.

The marginal PDF of R² is easily computed as

f_1*(r²) = [Γ((n−1)/2)/(Γ(1/2)Γ((n−2)/2))] (r²)^{−1/2} (1 − r²)^{n/2−2},  0 ≤ r² ≤ 1.   (20)

Finally, using Theorem 2.5.4, we get the PDF of R as

f_1(r) = [Γ((n−1)/2)/(Γ(1/2)Γ((n−2)/2))] (1 − r²)^{n/2−2},  −1 ≤ r ≤ 1.

As for the distribution of B, note that the conditional PDF of U_2 = √(n−1) B S_1, given X_1, X_2, ..., X_n, is N(0, 1), so that the conditional PDF of B, given X_1, X_2, ..., X_n, is N(0, 1/Σ(x_i − x̄)²). Let us write Λ = (n−1)S_1². Then the PDF of the RV Λ is that of a χ²(n−1) RV. Thus the joint PDF of B and Λ is given by

h(b, λ) = g(b | λ) h_2(λ),   (21)

where g(b | λ) is N(0, 1/λ), and h_2(λ) is χ²(n−1). We have

h_1(b) = ∫_0^∞ h(b, λ) dλ
= [1/(2^{n/2} Γ(1/2) Γ((n−1)/2))] ∫_0^∞ λ^{n/2−1} e^{−λ(1+b²)/2} dλ
= [Γ(n/2)/(Γ(1/2)Γ((n−1)/2))] 1/(1 + b²)^{n/2},  −∞ < b < ∞.   (22)
To complete the proof let us write

X_i = μ_1 + X_i' σ_1  and  Y_i = μ_2 + Y_i' σ_2,

where X_i' ∼ N(0, 1) and Y_i' ∼ N(0, 1). Then X_i ∼ N(μ_1, σ_1²), Y_i ∼ N(μ_2, σ_2²), and

R = Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / √{ Σ_{i=1}^n (X_i − X̄)² Σ_{i=1}^n (Y_i − Ȳ)² } = R',   (23)

so that the PDF of R is the same as derived above. Also

B = σ_1σ_2 Σ_{i=1}^n (X_i' − X̄')(Y_i' − Ȳ') / [ σ_1² Σ_{i=1}^n (X_i' − X̄')² ] = (σ_2/σ_1) B',   (24)

where the PDF of B' is given by (22). Relations (22) and (24) are used to find the PDF of B. We leave the reader to carry out these simple details.
Remark 3. In view of (23), namely the invariance of R under translation and (positive) scale changes, we note that for fixed n the sampling distribution of R, under ρ = 0, does not depend on μ_1, μ_2, σ_1, and σ_2. In the general case when ρ ≠ 0, one can show that for fixed n the distribution of R depends only on ρ but not on μ_1, μ_2, σ_1, and σ_2 (see, for example, Cramér [17], p. 398).
Remark 4. Let us change the variable to

T = R √(n−2) / √(1 − R²).   (25)

Then

1 − R² = [1 + T²/(n−2)]^{−1},

and the PDF of T is given by

p(t) = [1/√(n−2)] [1/B((n−2)/2, 1/2)] [1 + t²/(n−2)]^{−(n−1)/2},   (26)

which is the PDF of a t-statistic with n−2 d.f. Thus T defined by (25) has a t(n−2) distribution, provided that ρ = 0. This result facilitates the computation of probabilities under the PDF of R when ρ = 0.
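For instance, the tail probability asked for in Problem 2 of this section (n = 11, observed r = 0.40) can be computed through (25); the sketch below is illustrative only and assumes scipy is available:

```python
from scipy.stats import t

n, r = 11, 0.40
T = r * (n - 2) ** 0.5 / (1 - r * r) ** 0.5   # T = R*sqrt(n-2)/sqrt(1-R^2), as in (25)
print(t.sf(T, df=n - 2))                      # P{R > 0.40} under rho = 0, about 0.11
```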
Remark 5. To compute the PDF of B_{X|Y} = R(S_1/S_2), the so-called sample regression coefficient of X on Y, all we need to do is to interchange σ_1 and σ_2 in (7).
Remark 6. From (7) we can compute the mean and variance of B. For n > 2, clearly

EB = 0,

and for n > 3, we can show that

EB² = var(B) = (σ_2²/σ_1²) · 1/(n−3).

Similarly, we can use (6) to compute the mean and variance of R. We have, for n > 4, under ρ = 0,

ER = 0  and  ER² = var(R) = 1/(n−1).
PROBLEMS 6.6
1. Let (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) be a random sample from a bivariate normal population with EX = μ_1, EY = μ_2, var(X) = var(Y) = σ², and cov(X, Y) = ρσ². Let X̄, Ȳ denote the corresponding sample means, S_1², S_2² the corresponding sample variances, and S_11 the sample covariance. Write R = 2S_11/(S_1² + S_2²). Show that the PDF of R is given by

f(r) = [Γ(n/2)/(√π Γ((n−1)/2))] (1 − ρ²)^{(n−1)/2} (1 − ρr)^{−(n−1)} (1 − r²)^{(n−3)/2},  |r| < 1.

(Rastogi [89])
[Hint: Let U = (X + Y)/2 and V = (X − Y)/2, and observe that the random vector (U, V) is also bivariate normal. In fact, U and V are independent.]

2. Let X and Y be independent normal RVs. A sample of n = 11 observations on (X, Y) produces sample correlation coefficient r = 0.40. Find the probability of obtaining a value of R that exceeds the observed value.
3. Let X_1, X_2 be jointly normally distributed with zero means, unit variances, and correlation coefficient ρ. Let S be a χ²(n) RV that is independent of (X_1, X_2). Then the joint distribution of Y_1 = X_1/√(S/n) and Y_2 = X_2/√(S/n) is known as a central bivariate t-distribution. Find the joint PDF of (Y_1, Y_2) and the marginal PDFs of Y_1 and Y_2, respectively.
4. Let (X_1, Y_1), ..., (X_n, Y_n) be a sample from a bivariate normal distribution with parameters EX_i = μ_1, EY_i = μ_2, var(X_i) = var(Y_i) = σ², and cov(X_i, Y_i) = ρσ², i = 1, 2, ..., n. Find the distribution of the statistic

T(X, Y) = √n [ (X̄ − μ_1) − (Ȳ − μ_2) ] / √{ Σ_{i=1}^n (X_i − Y_i − X̄ + Ȳ)² }.

7 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
7.1 INTRODUCTION
In Chapter 6 we described some methods of finding exact distributions of sample statistics and their moments. While these methods work in some cases, such as sampling from a normal population when the sample statistic of interest is X̄ or S², often the statistic of interest, say T_n = T(X_1, ..., X_n), is either too complicated or its exact distribution is not simple to work with. In such cases we are interested in the convergence properties of T_n. We want to know what happens when the sample size is large. What is the limiting distribution of T_n? When the exact distribution of T_n (and its moments) is unknown or too complicated, we will often use asymptotic approximations when n is large.
In this chapter, we discuss some basic elements of statistical asymptotics. In Section 7.2 we discuss various modes of convergence of a sequence of random variables. In Sections 7.3 and 7.4 the laws of large numbers are discussed. Section 7.5 deals with limiting moment generating functions, and in Section 7.6 we discuss one of the most fundamental theorems of classical statistics, the central limit theorem. In Section 7.7 we consider some statistical applications of these methods.

The reader may find some parts of this chapter a bit difficult on first reading. Such a discussion has been indicated with a *.
7.2 MODES OF CONVERGENCE
In this section we consider several modes of convergence and investigate their interrelationships. We begin with the weakest mode of convergence.
Definition 1. Let {F_n} be a sequence of distribution functions. If there exists a DF F such that, as n → ∞,

F_n(x) → F(x)   (1)

at every point x at which F is continuous, we say that F_n converges in law (or weakly) to F, and we write F_n →w F.

If {X_n} is a sequence of RVs and {F_n} is the corresponding sequence of DFs, we say that X_n converges in distribution (or law) to X if there exists an RV X with DF F such that F_n →w F. We write X_n →L X.
It must be remembered that it is quite possible for a given sequence of DFs to converge to a function that is not a DF.
Example 1. Consider the sequence of DFs

F_n(x) = 0 for x < n, and 1 for x ≥ n.

Here F_n(x) is the DF of the RV X_n degenerate at x = n. We see that F_n(x) converges to a function F that is identically equal to 0, and hence the limit is not a DF.
Example 2. Let X_1, X_2, ..., X_n be iid RVs with common density function

f(x) = 1/θ for 0 < x < θ (0 < θ < ∞), and 0 otherwise.

Let X_(n) = max(X_1, X_2, ..., X_n). Then the density function of X_(n) is

f_n(x) = n x^{n−1}/θ^n for 0 < x < θ, and 0 otherwise,

and the DF of X_(n) is

F_n(x) = 0 for x < 0;  (x/θ)^n for 0 ≤ x < θ;  1 for x ≥ θ.

We see that, as n → ∞,

F_n(x) → F(x) = 0 for x < θ, and 1 for x ≥ θ,

which is a DF. Thus F_n →w F.
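The convergence in this example can be seen numerically; the short sketch below (illustrative only, with θ and x chosen arbitrarily) evaluates F_n(x) = (x/θ)^n for growing n:

```python
theta, x = 1.0, 0.95
for n in (5, 20, 100, 500):
    print(n, (x / theta) ** n)   # F_n(x) = (x/theta)^n for 0 <= x < theta
# The values tend to 0: for any x < theta, F_n(x) -> 0, while F_n(x) = 1 for x >= theta,
# so F_n converges weakly to the DF degenerate at theta.
```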

Example 3. Let F_n be a sequence of DFs defined by

F_n(x) = 0 for x < 0;  1 − 1/n for 0 ≤ x < n;  1 for n ≤ x.

Clearly F_n →w F, where F is the DF given by

F(x) = 0 for x < 0, and 1 for x ≥ 0.

Note that F_n is the DF of the RV X_n with PMF

P{X_n = 0} = 1 − 1/n,  P{X_n = n} = 1/n,

and F is the DF of the RV X degenerate at 0. We have

E X_n^k = n^k (1/n) = n^{k−1},

where k is a positive integer. Also E X^k = 0, so that

E X_n^k ↛ E X^k for any k ≥ 1.
We next give an example to show that weak convergence of distribution functions does not imply the convergence of the corresponding PMFs or PDFs.

Example 4. Let {X_n} be a sequence of RVs with PMF

f_n(x) = P{X_n = x} = 1 if x = 2 + 1/n, and 0 otherwise.

Note that none of the f_n's assigns any probability to the point x = 2. It follows that

f_n(x) → f(x) as n → ∞,

where f(x) = 0 for all x. However, the sequence of DFs {F_n} of the RVs X_n converges to the function

F(x) = 0 for x < 2, and 1 for x ≥ 2,

at all continuity points of F. Since F is the DF of the RV degenerate at x = 2, F_n →w F.
The following result is easy to prove.

Theorem 1. Let X_n be a sequence of integer-valued RVs. Also, let f_n(k) = P{X_n = k}, k = 0, 1, 2, ..., be the PMF of X_n, n = 1, 2, ..., and f(k) = P{X = k} be the PMF of X. Then

f_n(x) → f(x) for all x ⟺ X_n →L X.
In the continuous case we state the following result of Scheffé [100] without proof.

Theorem 2. Let X_n, n = 1, 2, ..., and X be continuous RVs such that

f_n(x) → f(x) for (almost) all x as n → ∞.

Here, f_n and f are the PDFs of X_n and X, respectively. Then X_n →L X.
The following result is easy to establish.

Theorem 3. Let {X_n} be a sequence of RVs such that X_n →L X, and let c be a constant. Then

(a) X_n + c →L X + c,
(b) cX_n →L cX,  c ≠ 0.
A slightly stronger concept of convergence is defined by convergence in probability.

Definition 2. Let {X_n} be a sequence of RVs defined on some probability space (Ω, S, P). We say that the sequence {X_n} converges in probability to the RV X if, for every ε > 0,

P{|X_n − X| > ε} → 0 as n → ∞.   (2)

We write X_n →P X.

Remark 1. We emphasize that the definition says nothing about the convergence of the RVs X_n to the RV X in the sense in which it is understood in real analysis. Thus X_n →P X does not imply that, given ε > 0, we can find an N such that |X_n − X| < ε for n ≥ N. Definition 2 speaks only of the convergence of the sequence of probabilities P{|X_n − X| > ε} to 0.
Example 5. Let {X_n} be a sequence of RVs with PMF

P{X_n = 1} = 1/n,  P{X_n = 0} = 1 − 1/n.

Then

P{|X_n| > ε} = P{X_n = 1} = 1/n if 0 < ε < 1, and 0 if ε ≥ 1.

It follows that P{|X_n| > ε} → 0 as n → ∞, and we conclude that X_n →P 0.
The following statements can be verified.

1. X_n →P X ⟺ X_n − X →P 0.
2. X_n →P X, X_n →P Y ⟹ P{X = Y} = 1, for P{|X − Y| > c} ≤ P{|X_n − X| > c/2} + P{|X_n − Y| > c/2}, and it follows that P{|X − Y| > c} = 0 for every c > 0.
3. X_n →P X ⟹ X_n − X_m →P 0 as n, m → ∞, for

P{|X_n − X_m| > ε} ≤ P{|X_n − X| > ε/2} + P{|X_m − X| > ε/2}.

4. X_n →P X, Y_n →P Y ⟹ X_n ± Y_n →P X ± Y.
5. X_n →P X, k constant ⟹ kX_n →P kX.
6. X_n →P k ⟹ X_n² →P k².
7. X_n →P a, Y_n →P b, a, b constants ⟹ X_n Y_n →P ab, for

X_n Y_n = [(X_n + Y_n)² − (X_n − Y_n)²]/4 →P [(a+b)² − (a−b)²]/4 = ab.

8. X_n →P 1 ⟹ X_n^{−1} →P 1, for

P{|1/X_n − 1| ≥ ε} = P{1/X_n ≥ 1 + ε} + P{1/X_n ≤ 1 − ε}
= P{1/X_n ≥ 1 + ε} + P{1/X_n ≤ 0} + P{0 < 1/X_n ≤ 1 − ε},

and each of the three terms on the right goes to 0 as n → ∞.
9. X_n →P a, Y_n →P b, a, b constants, b ≠ 0 ⟹ X_n Y_n^{−1} →P a b^{−1}.
10. X_n →P X, and Y an RV ⟹ X_n Y →P XY.
Note that Y is an RV, so that, given δ > 0, there exists a k > 0 such that P{|Y| > k} < δ/2. Thus

P{|X_n Y − XY| > ε} = P{|X_n − X||Y| > ε, |Y| > k} + P{|X_n − X||Y| > ε, |Y| ≤ k}
< δ/2 + P{|X_n − X| > ε/k}.

11. X_n →P X, Y_n →P Y ⟹ X_n Y_n →P XY, for

(X_n − X)(Y_n − Y) →P 0.

The result now follows on multiplication, using result 10. It also follows that X_n →P X ⟹ X_n² →P X².
Theorem 4. Let X_n →P X and let g be a continuous function defined on R. Then g(X_n) →P g(X) as n → ∞.

Proof. Since X is an RV, we can, given ε > 0, find a constant k = k(ε) such that

P{|X| > k} < ε/2.

Also, g is continuous on R, so that g is uniformly continuous on [−k, k]. It follows that there exists a δ = δ(ε, k) such that

|g(x_n) − g(x)| < ε whenever |x| ≤ k and |x_n − x| < δ.

Let

A = {|X| ≤ k},  B = {|X_n − X| < δ},  C = {|g(X_n) − g(X)| < ε}.

Then ω ∈ A ∩ B ⟹ ω ∈ C, so that A ∩ B ⊆ C. It follows that

P{C^c} ≤ P{A^c} + P{B^c},

that is,

P{|g(X_n) − g(X)| ≥ ε} ≤ P{|X_n − X| ≥ δ} + P{|X| > k} < ε

for n ≥ N(ε, δ, k), where N(ε, δ, k) is chosen so that

P{|X_n − X| ≥ δ} < ε/2 for n ≥ N(ε, δ, k).
Corollary. X_n →P c, where c is a constant ⟹ g(X_n) →P g(c), g being a continuous function.

We remark that a more general result than Theorem 4 is true and state it without proof (see Rao [88, p. 124]): X_n →L X and g continuous on R ⟹ g(X_n) →L g(X).

The following two theorems explain the relationship between weak convergence and convergence in probability.

Theorem 5. X_n →P X ⟹ X_n →L X.

Proof. Let F_n and F, respectively, be the DFs of X_n and X. We have

{ω: X(ω) ≤ x'} = {ω: X_n(ω) ≤ x, X(ω) ≤ x'} ∪ {ω: X_n(ω) > x, X(ω) ≤ x'} ⊆ {X_n ≤ x} ∪ {X_n > x, X ≤ x'}.

It follows that

F(x') ≤ F_n(x) + P{X_n > x, X ≤ x'}.

Since X_n − X →P 0, we have for x' < x

P{X_n > x, X ≤ x'} ≤ P{|X_n − X| > x − x'} → 0 as n → ∞.

Therefore

F(x') ≤ lim inf_{n→∞} F_n(x),  x' < x.

Similarly, by interchanging X and X_n, and x and x', we get

lim sup_{n→∞} F_n(x) ≤ F(x''),  x < x''.

Thus, for x' < x < x'', we have

F(x') ≤ lim inf F_n(x) ≤ lim sup F_n(x) ≤ F(x'').

Since F has only a countable number of discontinuity points, we choose x to be a point of continuity of F, and letting x'' ↓ x and x' ↑ x, we have

F(x) = lim_{n→∞} F_n(x)

at all points of continuity of F.
Theorem 6. Let k be a constant. Then

X_n →L k ⟹ X_n →P k.

Proof. The proof is left as an exercise.

Corollary. Let k be a constant. Then X_n →L k ⟺ X_n →P k.

Remark 2. We emphasize that we cannot improve the above result by replacing k by an RV; that is, X_n →L X in general does not imply X_n →P X. For let X, X_1, X_2, ... be identically distributed RVs, and let the joint distribution of (X_n, X) be as follows:

            X_n = 0   X_n = 1
X = 0         0         1/2       1/2
X = 1        1/2         0        1/2
             1/2        1/2        1

Clearly, X_n →L X. But

P{|X_n − X| > 1/2} = P{|X_n − X| = 1} = P{X_n = 0, X = 1} + P{X_n = 1, X = 0} = 1 ↛ 0.

Hence X_n ↛P X, but X_n →L X.
Remark 3. Example 3 shows that X_n →P X does not imply E X_n^k → E X^k for any integral k > 0.
Definition 3. Let $\{X_n\}$ be a sequence of RVs such that $E|X_n|^r < \infty$ for some $r > 0$. We say that $X_n$ converges in the $r$th mean to an RV $X$ if $E|X|^r < \infty$ and
$$E|X_n - X|^r \to 0 \quad \text{as } n \to \infty, \tag{3}$$
and we write $X_n \xrightarrow{r} X$.
Example 6.Let{X
n}be a sequence of RVs defined by
P{X
n=0}=1−
1
n
,P{X
n=1}=
1
n
,n=1,2,....
Then
E|X
n|
2
=
1
n
→0as n→∞,
and we see thatX
n
2−→X, where RVXis degenerate at 0.
Theorem 7. Let $X_n \xrightarrow{r} X$ for some $r > 0$. Then $X_n \xrightarrow{P} X$.

Proof. The proof is left as an exercise.
Example 7.Let{X
n}be a sequence of RVs defined by
P{X
n=0}=1−
1
n
r
,P{X n=n}=
1
n
r
,r>0,n=1,2,....

MODES OF CONVERGENCE 293
ThenE|X n|
r
=1, so thatX n
r→0. We show thatX n
P−→0.
P{|X
n|>ε}=

P{X
n=n}ifε<n
0if ε>n

→0asn→∞.
Theorem 8.Let{X
n}be a sequence of RVs such thatX n
2−→X. ThenEX n→EXand
EX
2
n
→EX
2
asn→∞.
Proof.We have
|E(X
n−X)|≤E |X n−X|≤E
1/2
|Xn−X|
2
→0as n→∞.
To see thatEX
2
n
→EX
2
(see also Theorem 9), we write
EX
2
n
=E(X n−X)
2
+EX
2
+2E{X(X n−X)}
and note that
|E{X(X
n−X)}| ≤

EX
2
E(Xn−X)
2
by the Cauchy–Schwarz inequality. The result follows on passing to the limits.
We get, in addition, thatX
n
2−→Ximpliesvar(X n)→var(X).
Corollary.Let{X
m},{Y n}be two sequences of RVs such thatX m
2−→X,Y n
2−→Y. Then
E(X
mYn)→E(XY)asm,n→∞.
Proof.The proof is left to the reader.
As a simple consequence of Theorem 8 and its corollary we see thatX
m
2−→X,Y n
2−→Y
together implycov(X
m,Yn)→cov(X ,Y).
Theorem 9.IfX
n
r−→X, thenE|X n|
r
→E|X|
r
.
Proof.Let 0<r≤1. Then
E|X
n|
r
=E|X n−X+X|
r
so that
E|X
n|
r
−E|X|
r
≤E|X n−X|
r
.
InterchangingX
nandX, we get
E|X|
r
−E|X n|
r
≤E|X n−X|
r
.

294 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
It follows that
|E|X|
r
−E|X n|
r
|≤E|X n−X|
r
→0as n→∞.
Forr>1, we use Minkowski’s inequality and obtain
[E|X
n|
r
]
1/r
≤[E|X n−X|
r
]
1/r
+[E|X|
r
]
1/r
and
[E|X|
r
]
1/r
≤[E|X n−X|
r
]
1/r
+[E|X n|
r
]
1/r
.
It follows that
|E
1/r
|Xn|
r
−E
1/r
|X|
r
|≤E
1/r
|Xn−X|
r
→0as n→∞.
This completes the proof.
Theorem 10.Letr>s. ThenX
n
r−→X⇒X n
s−→X.
Proof.From Theorem 3.4.3 it follows that fors<r
E|X
n−X|
s
≤[E|X n−X|
r
]
s/r
→0as n→∞
sinceX
n
r−→X.
Remark 4.Clearly the converse to Theorem 10 cannot hold, sinceE|X|
s
<∞fors<r
does not implyE|X|
r
<∞.
Remark 5.In view of Theorem 9, it follows thatX
n
r−→X⇒E|X n|
s
→E|X|
s
fors≤r.
Definition 4.* Let $\{X_n\}$ be a sequence of RVs. We say that $X_n$ converges almost surely (a.s.) to an RV $X$ if and only if
$$P\{\omega : X_n(\omega) \to X(\omega) \text{ as } n \to \infty\} = 1, \tag{4}$$
and we write $X_n \xrightarrow{a.s.} X$ or $X_n \to X$ with probability 1.
The following result elucidates Definition 4.
Theorem 11.X
n
a.s.−−→Xif and only iflim n→∞P{sup
m≥n|Xm−X|>ε}=0 for allε>0.
Proof.SinceX
n
a.s.−−→X,X n−X
a.s.
−−→0, and it will be sufficient to show the equiva-
lence of

May be omitted on the first reading.

MODES OF CONVERGENCE 295
(a)X n
a.s.−−→0 and
(b)lim
n→∞P{sup
m≥n|Xm|>ε}=0.
Let us suppose that (a) holds. Letε>0, and write
A
n(ε)=

sup
m≥n
|Xm|>ε

andC=

lim
n→∞
Xn=0

.
Also writeB
n(ε)=C ∩A n(ε), and note thatB n+1(ε)⊂B n(ε), and the limit set


n=1
Bn(ε)=φ .Itfollowsthat
lim
n→∞
PBn(ε)=P
θ


n=1
Bn(ε)

=0.
SincePC=1,PC
c
=0, we have
PB
n(ε)−P(A n∩C)=1−P(C
c
∪A
c
n
)
=1−PC
c
−PA
c
n
+P(C
c
∩A
c
n
)
=PA
n+P(C
c
∩A
c
n
)
=PA
n.
It follows that (b) holds.
Conversely, letlim
n→∞PAn(ε)=0, and write
D(ε)=

lim
n→∞
|Xn|>ε>0

.
SinceD(ε)⊂A
n(ε)forn=1,2,...,it follows thatPD(ε)=0. Also,
C
c
=

lim
n→∞
Xnλ=0




k=1

lim|X n|>
1
k

,
so that
1−PC≤


k=1
PD
λ
1
k

=0,
and (a) holds.
Remark 6.ThusX
n
a.s.−−→0 means that, forε>0,η>0 arbitrary, we can find ann 0such
that
P

sup
n≥n0
|Xn|>ε

<η. (5)
Indeed, we can write, equivalently, that
lim
n0→∞
P


n≥n0
{|Xn|>ε}

=0. (6)

Theorem 12. $X_n \xrightarrow{a.s.} X \Rightarrow X_n \xrightarrow{P} X$.
Proof.By Remark 6,X
n
a.s.−−→Ximplies that, for arbitraryε>0,η>0, we can choose an
n
0=n0(ε,η)such that
P



n=n0
{|Xn−X|≤ε}

≥1−η.
Clearly,


n=n0
{|Xn−X|≤ε}⊂{| X n−X|≤ε} forn≥n 0.
It follows that forn≥n
0
P{|X n−X|≤ε}≥P



n=n0
{|Xn−X|≤ε}

≥1−η,
that is
P{|X
n−X|>ε}<η forn≥n 0,
which is the same as sayingX
n
P−→X.
That the converse of Theorem 12 does not hold is shown in the following example.
Example 8. For each positive integer $n$ there exist integers $m$ and $k$ (uniquely determined) such that
$$n = 2^k + m, \quad 0 \le m < 2^k, \quad k = 0, 1, 2, \ldots.$$
Thus, for $n = 1$, $k = 0$ and $m = 0$; for $n = 5$, $k = 2$ and $m = 1$; and so on. Define RVs $X_n$, for $n = 1, 2, \ldots$, on $\Omega = [0,1]$ by
$$X_n(\omega) = \begin{cases} 2^k, & \dfrac{m}{2^k} \le \omega < \dfrac{m+1}{2^k},\\[2pt] 0, & \text{otherwise}.\end{cases}$$
Let the probability distribution of $X_n$ be given by $P\{I\} = $ length of the interval $I \subseteq \Omega$. Thus
$$P\{X_n = 2^k\} = \frac{1}{2^k}, \qquad P\{X_n = 0\} = 1 - \frac{1}{2^k}.$$
The limit $\lim_{n \to \infty} X_n(\omega)$ does not exist for any $\omega \in \Omega$, so that $X_n$ does not converge almost surely. But
$$P\{|X_n| > \varepsilon\} = P\{X_n > \varepsilon\} = \begin{cases} 0, & \varepsilon \ge 2^k,\\[2pt] \dfrac{1}{2^k}, & 0 < \varepsilon < 2^k,\end{cases}$$
and we see that $P\{|X_n| > \varepsilon\} \to 0$ as $n$ (and hence $k$) $\to \infty$.
Theorem 13. Let $\{X_n\}$ be a strictly decreasing sequence of positive RVs, and suppose that $X_n \xrightarrow{P} 0$. Then $X_n \xrightarrow{a.s.} 0$.

Proof. The proof is left as an exercise.
Example 9.Let{X
n}be a sequence of independent RVs defined by
P{X
n=0}=1−
1
n
,P{X
n=1}=
1
n
,n=1,2,....
Then
E|X
n−0|
2
=E|X n|
2
=
1
n
→0as n→∞,
so thatX
n
2−→0. Also
P{X
n=0 for everym≤n≤n 0}
=
n0≈
n=m
η
1−
1
n

=
m−1
n0
,
which diverges to 0 asn
0→∞for all values ofm. ThusX ndoes not converge to 0 with
probability 1.
Example 10.Let{X
n}be independent defined by
P{X
n=0}=1−
1
n
r
,P{X n=n}=
1
n
r
,r≥2,n=1,2,....
Then
P{X
n=0form≤n≤n 0}=
n0≈
n=m
η
1−
1
n
r

.
Asn
0→∞, the infinite product converges to some nonzero quantity, which itself
converges to 1 asm→∞. ThusX
n
a.s.−−→0. However,E|X n|
r
=1 andX n
r→0asn→∞.
Example 11.Let{X
n}be a sequence of RVs withP{X n=±1/n}=
1
2
. ThenE|X n|
r
=
1/n
r
→0asn →∞andX n
r−→0. Forj<k,|X j|>|X k|, so that{|X k|>ε}⊂{|X j|>ε}.It
follows that


j=n
{|Xj|>ε}={|X n|>ε}.

298 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
Choosingn>1/ε, we see that
P



ζ
j=n
{|Xj|>ε}

⎦=P{|X n|>ε}≤P

|X n|>
1
n

=0,
and (6) implies thatX
n
a.s.−−→0.
Remark 7.In Theorem 7.4.3 we prove a result which is sometimes useful in proving a.s.
convergence of a sequence of RVs.
Theorem 14. Let $\{(X_n, Y_n)\}$, $n = 1, 2, \ldots$, be a sequence of RVs. Then
$$|X_n - Y_n| \xrightarrow{P} 0 \text{ and } Y_n \xrightarrow{L} Y \;\Rightarrow\; X_n \xrightarrow{L} Y.$$
Proof.Letxbe a point of continuity of the DF ofYandε>0. Then
P{X
n≤x}=P{Y n≤x+Y n−Xn}
=P{Y
n≤x+Y n−Xn;Yn−Xn≤ε}
+P{Y
n≤x+Y n−Xn;Yn−Xn>ε}
≤P{Y
n≤x+ε}+P{Y n−Xn>ε}.
It follows that
lim
n→∞
P{Xn≤x}≤lim
n→∞
P{Yn≤x+ε}.
Similarly
lim
n→∞
P{Xn≤x}≥lim
n→∞
P{Yn≤x−ε}.
Sinceε>0 is arbitrary andxis a continuity point ofP{Y≤x}, we get the result by
lettingε→0.
Corollary.X
n
P−→X⇒X n
L−→X.
Theorem 15 (Slutsky's Theorem). Let $\{(X_n, Y_n)\}$, $n = 1, 2, \ldots$, be a sequence of pairs of RVs, and let $c$ be a constant. Then

(a) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n + Y_n \xrightarrow{L} X + c$;

(b) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_nY_n \xrightarrow{L} cX$ if $c \ne 0$, and $X_nY_n \xrightarrow{P} 0$ if $c = 0$;

(c) $X_n \xrightarrow{L} X$, $Y_n \xrightarrow{P} c \Rightarrow X_n/Y_n \xrightarrow{L} X/c$ if $c \ne 0$.
Proof.(a)X
n
L−→X⇒X n+c
L
−→X+c(Theorem 3). Also,Y n−c=(Y n+Xn)−(X n+c)
P
−→0.
A simple use of Theorem 14 shows that
X
n+Yn
L−→X+c.
(b) We first consider the case wherec=0. We have, for any fixed numberk>0,
P{|X
nYn|>ε}=P

|X nYn|>ε,|Y n|≤
ε
k

+P

|X
nYn|>ε,|Y n|>
ε
k

≤P{|X
n|>k}+P

|Y n|>
ε k

.
SinceY
n
P−→0 andX n
L−→X, it follows that, for any fixedk>0,
lim
n→∞
P{|X nYn|>ε}≤P{|X|>k}.
Sincekis arbitrary, we can makeP{|X|>k}as small as we please by choosingk
large. It follows that
X
nYn
P−→0.
Now, letcη=0. Then
X
nYn−cXn=Xn(Yn−c)
and, sinceX
n
L−→X,Y n
P−→c,X n(Yn−c)
P
−→0. Using Theorem 14, we get the result
that
X
nYn
L−→cX.
(c)Y
n
P−→c, andcη=0⇒Y
−1
n
P
−→c
−1
. It follows thatX n
L−→X,Y n
P−→c⇒
X
nY
−1
n
L
−→c
−1
X, and the proof of the theorem is complete.
As an application of Theorem 15 we present the following example.

300 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
Example 12.LetX 1,X2,...,be iid RVs with common lawN(0,1). We shall determine
the limiting distribution of the RV
W
n=

n
X
1+X2+···+X n
X
2
1
+X
2
2
+···+X
2
n
.
Let us write
U
n=
1

n
(X
1+X2+···+X n)andV n=
X
2
1
+X
2
2
+···+X
2
n
n
.
Then
W
n=
U
n
Vn
.
For the MGF ofU
nwe have
M
Un
(t)=
n

i=1
Ee
tX
i/

n=
n

i=1
e
t
2
/2n
=e
t
2
/2
,
so thatU
nis anN(0,1)variate (see also Corollary 2 to Theorem 5.3.22). It follows that
U
n
L−→Z, whereZis anN(0,1)RV. As forV n, we note that eachX
2
i
is a chi-square variate
with 1 d.f. Thus
M
Vn
(t)=
n

i=1
η
1
1−2t/n
α
1/2
,t<
n
2
,
=
η
1−
2t
n
α
−n/2
, t<
n
2
,
which is the MGF of a gamma variate with parametersα=n/2 andβ=2/n. Thus the
density function ofV
nis given by
f
Vn
(x)=



1
Γ(n/2)
1
(2/n)
n/2
x
n/2−1
e
−nx/2
,0<x<∞,
0, otherwise.
We will show thatV
n
P−→1. We have, for anyε>0,
P{|V
n−1|>ε}≤
var(V
n)
ε
2
=

n
2

η
2
n
α
2
1
ε
2
→0as n→∞.
We have thus shown that
$$U_n \xrightarrow{L} Z \quad \text{and} \quad V_n \xrightarrow{P} 1.$$
It follows by Theorem 15(c) that $W_n = U_n/V_n \xrightarrow{L} Z$, where $Z$ is an $N(0,1)$ RV.

Later on we will see that the condition that the $X_i$'s be $N(0,1)$ is not needed. All we need is that $E|X_i|^2 < \infty$.
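A Monte Carlo check of Example 12 (an added sketch, not from the text; the sample size, replication count, and seed are arbitrary): for iid $N(0,1)$ data the statistic $W_n = \sqrt{n}\,\sum X_i / \sum X_i^2$ should be approximately $N(0,1)$ for large $n$.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
n, reps = 500, 10_000

x = rng.standard_normal((reps, n))
w = np.sqrt(n) * x.sum(axis=1) / (x**2).sum(axis=1)    # W_n = U_n / V_n of Example 12

# compare a few empirical quantiles of W_n with the N(0,1) quantiles
for q in [0.05, 0.25, 0.50, 0.75, 0.95]:
    print(f"q={q:.2f}  empirical={np.quantile(w, q):+.3f}   N(0,1)={stats.norm.ppf(q):+.3f}")
```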
PROBLEMS 7.2
1.LetX
1,X2,...be a sequence of RVs with corresponding DFs given byF n(x)=0if
x<−n,=(x+n)/2n if−n≤x<n, and=1ifx≥n. DoesF
nconverge to a DF?
2.LetX
1,X2...be iidN(0,1)RVs. Consider the sequence of RVs{
Xn}, whereXn=
n
−1

n
i=1
Xi.LetF nbetheDFof
Xn,n=1,2,....Findlim n→∞Fn(x). Is this limit
aDF?
3.LetX
1,X2,...be iidU(0,θ)RVs. LetX
(1)=min(X 1,X2,···,X n), and consider the
sequenceY
n=nX
(1). DoesY nconverge in distribution to some RVY? If so, find
the DF of RVY.
4.LetX
1,X2,...be iid RVs with common absolutely continuous DFF.LetX
(n)=
max(X
1,X2,...,X n), and consider the sequence of RVsY n=n[1−F(X
(n))].Find
the limiting DF ofY
n.
5.LetX
1,X2,...be a sequence of iid RVs with common PDFf(x)=e
−x+θ
ifx≥θ,
and=0ifx<θ. Write
Xn=n
−1

n
i=1
Xi.
(a) Show that
Xn
P−→1+θ.
(b) Show thatmin{X
1,X2,···,X n}
P
−→θ.
6.LetX
1,X2,...be iidU[0,θ]RVs. Show thatmax{X 1,X2,...,X n}
P
−→θ.
7.Let{X
n}be a sequence of RVs such thatX n
L−→X.Leta nbe a sequence of positive
constants such thata
n→∞asn→∞. Show thata
−1
n
Xn
P−→0.
8.Let{X
n}be a sequence of RVs such thatP{|X n|≤k}=1 for allnand some
constantk>0. Suppose thatX
n
P−→X. Show thatX n
r−→Xfor anyr>0.
9.LetX
1,X2,...,X 2nbe iidN(0,1)RVs. Define
U
n=

X
1
X2
+
X
3
X4
+···+
X
2n−1
X2n

,V
n=X
2
1
+X
2
2
+···+X
2
n
,and
Z
n=
U
n
Vn
.
Find the limiting distribution ofZ
n.
10.Let{X
n}be a sequence of geometric RVs with parameterλ/n,n>λ>0. Also,
letZ
n=Xn/n. Show thatZ n
L−→G(1,1/λ)asn→∞ (Prochaska [82]).
11.LetX
nbe a sequence of RVs such thatX n
a.s.−−→0, and letc nbe a sequence of real
numbers such thatc
n→0asn→∞. Show thatX n+cn
a.s.−−→0.
12.Does convergence almost surely imply convergence of moments?
13.LetX
1,X2,...be a sequence of iid RVs with common DFF, and writeX
(n)=
max{X
1,X2,...,X n},n=1,2,....
(a) Forα>0,lim
x→∞x
α
P{X1>x}=b>0. Find the limiting distribution
of(bn)
−1/α
X
(n). Also, find the PDF corresponding to the limiting DF and
compute its moments.

302 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
(b) IfFsatisfies
lim
x→∞
e
x
[1−F(x)] =b>0,
find the limiting DF ofX
(n)−log(bn) and compute the corresponding PDF and
the MGF.
(c) IfX
iis bounded above byx 0with probability 1, and for someα>0
lim
x→x 0−
(x0−x)
−α
[1−F(x)] =b>0,
find the limiting distribution of(bn)
1/α
{X
(n)−x0}, the corresponding PDF,
and the moments of the limiting distribution.
(The above remarkable result, due to Gnedenko[36],exhausts all limiting
distributions of X
(n)with suitable norming and centering.)
14.Let{F
n}be a sequence of DFs that converges weakly to a DFFwhich is continuous
everywhere. Show thatF
n(x)converges toF(x)uniformly.
15.Prove Theorem 1.
16.Prove Theorem 6.
17.Prove Theorem 13.
18.Prove Corollary 1 to Theorem 8.
19.LetVbe the class of all random variables defined on a probability space with finite
expectations, and forX∈Vdefine
ρ(X)=E

|X|
1+|X|

.
Show the following:
(a)ρ(X+Y)≤ρ(X)+ρ(Y );ρ(σX)≤max(|σ|,1)ρ(X).
(b)d(X,Y)=ρ(X−Y)is a distance function onV(assuming that we identify RVs
that are a.s. equal).
(c)lim
n→∞d(Xn,X)=0⇔X n
P−→X.
20.For the following sequences of RVs{X
n}, investigate convergence in probability
and convergence in rth mean.
(a)X
n∼C(1/n,0).
(b)P(X
n=e
n
)=
1
n
2,P(X n=0)=1−
1
n
2.
7.3 WEAK LAW OF LARGE NUMBERS

Let $\{X_n\}$ be a sequence of RVs. Write $S_n = \sum_{k=1}^n X_k$, $n = 1, 2, \ldots$. In this section we answer the following question in the affirmative: Do there exist sequences of constants $A_n$ and $B_n > 0$, $B_n \to \infty$ as $n \to \infty$, such that the sequence of RVs $B_n^{-1}(S_n - A_n)$ converges in probability to 0 as $n \to \infty$?

Definition 1. Let $\{X_n\}$ be a sequence of RVs, and let $S_n = \sum_{k=1}^n X_k$, $n = 1, 2, \ldots$. We say that $\{X_n\}$ obeys the weak law of large numbers (WLLN) with respect to the sequence of constants $\{B_n\}$, $B_n > 0$, $B_n \uparrow \infty$, if there exists a sequence of real constants $A_n$ such that $B_n^{-1}(S_n - A_n) \xrightarrow{P} 0$ as $n \to \infty$. The $A_n$ are called centering constants and the $B_n$ norming constants.

Theorem 1. Let $\{X_n\}$ be a sequence of pairwise uncorrelated RVs with $EX_i = \mu_i$ and $\mathrm{var}(X_i) = \sigma_i^2$, $i = 1, 2, \ldots$. If $\sum_{i=1}^n \sigma_i^2 \to \infty$ as $n \to \infty$, we can choose $A_n = \sum_{k=1}^n \mu_k$ and $B_n = \sum_{i=1}^n \sigma_i^2$, that is,
$$\frac{\sum_{i=1}^n (X_i - \mu_i)}{\sum_{i=1}^n \sigma_i^2} \xrightarrow{P} 0 \quad \text{as } n \to \infty.$$

Proof. We have, by Chebychev's inequality,
$$P\Big\{\Big|S_n - \sum_{k=1}^n \mu_k\Big| > \varepsilon \sum_{i=1}^n \sigma_i^2\Big\} \le \frac{E\{\sum_{i=1}^n (X_i - \mu_i)\}^2}{\varepsilon^2 (\sum_{i=1}^n \sigma_i^2)^2} = \frac{1}{\varepsilon^2 \sum_{i=1}^n \sigma_i^2} \to 0 \quad \text{as } n \to \infty.$$

Corollary 1. If the $X_n$'s are identically distributed and pairwise uncorrelated with $EX_i = \mu$ and $\mathrm{var}(X_i) = \sigma^2 < \infty$, we can choose $A_n = n\mu$ and $B_n = n\sigma^2$.

Corollary 2. In Theorem 1 we can choose $B_n = n$, provided that $n^{-2}\sum_{i=1}^n \sigma_i^2 \to 0$ as $n \to \infty$.

Corollary 3. In Corollary 1, we can take $A_n = n\mu$ and $B_n = n$, since $n\sigma^2/n^2 \to 0$ as $n \to \infty$. Thus, if $\{X_n\}$ are pairwise-uncorrelated identically distributed RVs with finite variance, $S_n/n \xrightarrow{P} \mu$.

Example 1. Let $X_1, X_2, \ldots$ be iid RVs with common law $b(1,p)$. Then $EX_i = p$, $\mathrm{var}(X_i) = p(1-p)$, and we have
$$\frac{S_n}{n} \xrightarrow{P} p \quad \text{as } n \to \infty.$$
Note that $S_n/n$ is the proportion of successes in $n$ trials. In particular, recall from Section 6.3 that $nF_n^*(x)$ is a $b(n, F(x))$ RV. It follows that for each $x \in \mathcal{R}$,
$$F_n^*(x) \xrightarrow{P} F(x) \quad \text{as } n \to \infty.$$

Hereafter, we shall be interested mainly in the case where $B_n = n$. When we say that $\{X_n\}$ obeys the WLLN, this is so with respect to the sequence $\{n\}$.
n}be any sequence of RVs. WriteY n=n
−1

n
k=1
Xk. A necessary and
sufficient condition for the sequence{X
n}to satisfy the weak law of large numbers is that
E

Y
2
n
1+Y
2
n

→0as n→∞. (1)

304 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
Proof.For any two positive numbersa,b,a≥b>0, we have
η
a
1+a
αη
1+b
b

≥1. (2)
LetA={|Y
n|≥ε}. Then ω∈A⇒|Y n|
2
≥ε
2
>0. Using (2), we see thatω∈Aimplies
Y
2
n
1+Y
2
n
1+ε
2
ε
2
≥1.
It follows that
PA≤P

Y
2
n
1+Y
2
n

ε
2
1+ε
2

≤E
|Y
2
n
/(1+Y
2
n
)|
ε
2
/(1+ε
2
)
by Markov’s inequality
→0as n→∞.
That is,
Y
n
P−→0as n→∞.
Conversely, we will show that for everyε>0
P{|Y
n|≥ε}≥E

Y
2
n
1+Y
2
n

−ε
2
. (3)
We will prove (3) for the case in whichY
nis of the continuous type. The discrete case
being similar, we ask the reader to complete the proof. IfY
nhas PDFf n(y), then


−∞
y
2
1+y
2
fn(y)dy=




|y|>ε
+

|y|≤ε



y
2
1+y
2
fn(y)dy
≤P{|Y
n|>ε}+

ε
−ε
η
1−
1
1+y
2

f
n(y)dy
≤P{|Y
n|>ε}+
ε
2
1+ε
2
≤P{|Y n|>ε}+ε
2
,
which is (3).
Remark 1.Since condition (1) applies not to the individual variables but to their sum, The-
orem 2 is of limited use. We note, however, that all weak laws of large numbers obtained
as corollaries to Theorem 1 follow easily from Theorem 2 (Problem 6).

WEAK LAW OF LARGE NUMBERS 305
Example 2.Let(X 1,X2,...,X n)be jointly normal withEX i=0,EX
2
i
=1 for alli, and
cov(X
i,Xj)=ρif|j−i|=1, and=0 otherwise. ThenS n=

n
k=1
XkisN(0,σ
2
), where
σ
2
=var(S n)=n+2(n−1)ρ,
and
E

Y
2
n
1+Y
2
n

=E

S
2
nn
2
+S
2
n

=
2
σ




0
x
2
n
2
+x
2
e
−x
2
/2σ
2
dx
=
2




0
y
2
[n+2(n−1)ρ]
n
2
+y
2
[n+2(n−1)ρ]
e
−y
2
/2
dy

n+2(n−1)ρ
n
2


0
2


y
2
e
−y
2
/2
dy→0as n→∞.
It follows from Theorem 2 thatn
−1
Sn
P−→0. We invite the reader to compare this result to
that of Problem 7.5.6.
Example 3. Let $X_1, X_2, \ldots$ be iid $C(1,0)$ RVs. We have seen (corollary to Theorem 5.3.18) that $n^{-1}S_n \sim C(1,0)$, so that $n^{-1}S_n$ does not converge in probability to 0. It follows that the WLLN does not hold (see also Problem 10).
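A small simulation (an added sketch, not from the text; distributions and sample sizes chosen arbitrarily) contrasts Example 3 with Khintchine's theorem proved below: sample means of $C(1,0)$ data do not settle down, while sample means of Exp(1) data, which have finite mean $EX = 1$, do.

```python
import numpy as np

rng = np.random.default_rng(4)
for n in [100, 1_000, 10_000, 100_000]:
    cauchy_mean = rng.standard_cauchy(n).mean()   # no law of large numbers here
    expo_mean = rng.exponential(1.0, n).mean()    # finite mean, WLLN applies
    print(f"n={n:>6}  Cauchy mean = {cauchy_mean:>10.3f}   Exp(1) mean = {expo_mean:.4f}")
```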
LetX
1,X2,...be an arbitrary sequence of RVs, and letS n=

n
k=1
Xk,n=1,2,....Let
us truncate eachX
iatc>0, that is, let
X
c
i
=
Ω
X
iif|Xi|≤c
0if|X
i|>c
,i=1,2,...,n.
Write
S
c
n
=
n

i=1
X
c
i
,andm n=
n

i=1
EX
c
i
.
Lemma 1.For anyε>0,
P{|S
n−mn|>ε}≤P{|S
c
n
−mn|>ε}+
n

k=1
P{|X k|>c}. (4)
Proof.We have
P{|S
n−mn|>ε}=P{|S n−mn|>εand|X k|≤cfork=1,2,...,n}
+P{|S
n−mn|>εand|X k|>cfor at least onek,
k=1,2,...,n}

306 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
≤P{|S
c
n
−mn|>ε}+P{|X k|>cfor at least onek,
1≤k≤n}
≤P{|S
c
n
−mn|>ε}+
n

k=1
P{|X k|>c}.
Corollary.IfX
1,X2,...,X nare exchangeable, then
P{|S
n−mn|>ε}≤P{|S
c
n
−mn|>ε}+nP{|X 1|>c}. (5)
If, in addition, the RVsX
1,X2,...,X nare independent, then
P{|S
n−mn|>ε}≤
nE(X
c
1
)
2
ε
2
+nP{|X 1|>c}. (6)
Inequality (6) yields the following important theorem.
Theorem 3. Let $\{X_n\}$ be a sequence of iid RVs with common finite mean $\mu = EX_1$. Then $n^{-1}S_n \xrightarrow{P} \mu$ as $n \to \infty$.
Proof.Let us takec=nin (6) and replaceεbynε; then we have
P{|S
n−mn|>nε}≤
1

2
E(X
n
1
)
2
+nP{|X 1|>n},
whereX
n
1
isX1truncated atn.
First note thatE|X
1|<∞⇒nP{|X 1|>n}→0asn →∞. Now (see remarks following
Lemma 3.2.1)
E(X
n
1
)
2
=2

n
0
xP{|X 1|>x}dx
=2
η
A
0
+

n
A

xP{|X
1|>x}dx,
whereAis chosen sufficiently large that
xP{|X
1|>x}<
δ
2
for allx≥A,δ >0 arbitrary.
Thus
E(X
n
1
)
2
≤c+δ

n
A
dx≤c+nδ,
wherecis a constant. It follows that
1

2
E(X
n
1
)
2

c

2
+
δ
ε
2
,

WEAK LAW OF LARGE NUMBERS 307
and sinceδis arbitrary,(1/nε
2
)E(X
n
1
)
2
can be made arbitrarily small for sufficiently large
n. The proof is now completed by the simple observation that, sinceEX
j=μ,
m
n
n
→μ asn→∞.
We emphasize that in Theorem 3 we require only thatE|X
1|<∞; nothing is said about
the variance. Theorem 3 is due to Khintchine.
Example 4.LetX
1,X2,...be iid RVs withE|X 1|
k
<∞for some positive integerk. Then
n

j=1
X
k
j
n
P
−→EX
k
1
asn→∞.
Thus, ifEX
2
1
<∞, then

n
1
X
2
j
/n
P
−→EX
2
1
, and since(

n
j=1
Xj/n)
2
P
−→(EX 1)
2
it follows
that
ΣX
2
j
n

η
ΣX
j
n
α
2
P
−→var(X 1).
Example 5.LetX
1,X2,...be iid RVs with common PDF
f(x)=



1+δ
x
2+δ
,x≥1
0, x<1
,δ>0.
Then
E|X|=(1+δ)


1
1
x
1+δ
dx
=
1+δ
δ
<∞,
and the law of large numbers holds, that is,
n
−1
Sn
P−→
1+δ
δ
asn→∞.
PROBLEMS 7.3
1.LetX
1,X2,...be a sequence of iid RVs with common uniform distribution on[0,1].
Also, letZ
n=(
#
n
i=1
Xi)
1/n
be the geometric mean ofX 1,X2,...,X n,n=1,2,....
Show thatZ
n
P−→c, wherecis some constant. Findc.

308 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
2.LetX 1,X2,...be iid RVs with finite second moment. Let
Y
n=
2
n(n+1)
n

i=1
iXi.
Show thatY
n
P−→EX 1.
3.LetX
1,X2,...be a sequence of iid RVs withEX i=μandvar(X i)=σ
2
.LetS k=

k
j=1
Xj. Does the sequenceS kobey the WLLN in the sense of Definition 1? If so,
find the centering and the norming constants.
4.Let{X
n}be a sequence of RVs for whichvar(X n)≤Cfor allnandρ ij=
cov(X
i,Xj)→0as|i−j|→∞. Show that the WLLN holds.
5.For the following sequences of independent RVs does the WLLN hold?
(a)P{X
k=±2
k
}=
1
2
.
(b)P{X
k=±k}=1/2

k,P{X k=0}=1−(1/

k).
(c)P{X
k=±2
k
}=1/2
2k+1
,P{X k=0}=1−(1/2
2k
).
(d)P{X
k=±1/k}=1/2.
(e)P{X
k=±

k}=
1
2
.
6.LetX
1,X2,...be a sequence of independent RVs such that var(X k)<∞for
k=1,2,...,and(1/n
2
)

n k=1
var(X k)→0asn →∞. Prove the WLLN, using
Theorem 2.
7.LetX
nbe a sequence of RVs with common finite varianceσ
2
. Suppose that the
correlation coefficient betweenX
iandX jis<0 for alliη=j. Show that the WLLN
holds for the sequence{X
n}.
8.Let{X
n}be a sequence of RVs such thatX kis independent ofX jforjη=k+1or
jη=k−1. Ifvar(X
k)<Cfor allk, whereCis some constant, the WLLN holds
for{X
k}.
9.For any sequence of RVs{X
n}show that
max
1≤k≤n
|Xk|
P
−→0⇒n
−1
Sn
P−→0.
10.LetX
1,X2,...be iidC(1,0)RVs. Use Theorem 2 to show that the weak law of large
numbers does not hold. That is, show that
E
S
2
n
n
2
+S
2
n
θ0as n→∞,whereS n=
n

k=1
Xk,n=1,2,....
11.Let{X
n}be a sequence of iid RVs withP{X n≥0}=1. LetS n=

n
j=1
Xj,n=
1,2,.... Suppose{a
n}is a sequence of constants such thata
−1
n
Sn
P−→1. Show that
(a)a
n→∞asn→∞ and (b)a n+1/an→1.
7.4 STRONG LAW OF LARGE NUMBERS

In this section we obtain a stronger form of the law of large numbers discussed in
Section 7.3. LetX
1,X2,...be a sequence of RVs defined on some probability space
(Ω,S,P).

This section may be omitted on the first reading.

STRONG LAW OF LARGE NUMBERS 309
Definition 1.We say that the sequence{X n}obeys thestrong law of large numbers
(SLLN) with respect to the norming constants{B
n}if there exists a sequence of (centering)
constants{A
n}such that
B
−1
n
(Sn−An)
a.s.
−−→0as n→∞. (1)
HereB
n>0 andB n→∞asn→∞.
We will obtain sufficient conditions for a sequence{X
n}to obey the SLLN. In what fol-
lows, we will be interested mainly in the caseB
n=n. Indeed, when we speak of the SLLN
we will assume that we are speaking of the norming constantsB
n=n, unless specified
otherwise.
We start with theBorel–Cantelli lemma.Let{A
j}be any sequence of events inS.We
recall that
lim
n→∞
An= lim
n→∞


k=n
Ak=


n=1


k=n
Ak. (2)
We will writeA=
limn→∞An. Note thatAis the event thatinfinitely manyof theA noccur.
We will sometimes write
PA=P

lim
n→∞
An

=P(A
ni.o.),
where “i.o.” stands for “infinitely often.” In view of Theorem 7.2.11 and Remark 7.2.6 we
haveX
n
a.s.−−→0 if and only ifP{|X n|>εi.o.}=0 for allε>0.
Theorem 1(Borel–Cantelli Lemma).
(a) Let{A
n}be a sequence of events such that


n=1
PAn<∞. ThenPA=0.
(b) If{A
n}is an independent sequence of events such that


n=1
PAn=∞, then
PA=1.
Proof.
(a)PA=P(lim
n→∞
$

k=n
Ak) = limn→∞P(
$

k=n
Ak)≤lim n→∞


k=n
PAk=0.
(b) We haveA
c
=
$

n=1%

k=n
A
c
k
, so that
PA
c
=P
&
lim
n→∞


k=n
A
c
k
'
= lim
n→∞
P
&


k=n
A
c
k
'
.
Forn
0>n, we see that
%

k=n
A
c
k

%
n0
k=n
A
c
k
, so that
P
&


k=n
A
c
k
'
≤lim
n0→∞
P
&
n0↑
k=n
A
c
k
'
= lim
n0→∞
n
0≈
k=n
(1−PA k),

310 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
because{A n}is an independent sequence of events. Now we use the elementary
inequality
1−exp

⎝−
n0√
j=n
αj

⎠≤1−
n0≈
j=n
(1−α j)≤
n0√
j=n
αj,n0>n,1≥α j≥0,
to conclude that
P
&


k=n
A
c
k
'
≤lim
n0→∞
exp
&

n0√
k=n
PAk
'
.
Since the series


n=1
PAndiverges, it follows thatPA
c
=0orPA =1.
Corollary.Let{A
n}be a sequence of independent events. ThenPAis either 0 or 1.
The corollary follows since


n=1
PAneither converges or diverges.
As a simple application of the Borel–Cantelli lemma, we obtain a version of the SLLN.
Theorem 2.IfX
1,X2,...are iid RVs with common meanμand finite fourth moment,
then
P

lim
n→∞
Sn
n


=1.
Proof.We have
E{Σ(X
i−μ)}
4
=nE(X 1−μ)
4
+6
η
n
2

σ
4
≤Cn
2
.
By Markov’s inequality
P
→(
(
(
(
(
n

1
(X1−μ)
(
(
(
(
(
>nε


E{

n
1
(X1−μ)}
4
(nε)
4

Cn
2
(nε)
4
=
C

n
2
.
Therefore,


n=1
P{|Sn−μn|>nε}<∞,
and it follows by the Borel–Cantelli lemma that with probability 1 only finitely many of
the events{ω:|(S
n/n)−μ|>ε}occur, that is,PA ε=0, where
A
ε= lim
n→∞
sup
(
(
(
(
S
n
n
−μ
( ( ( (


.

STRONG LAW OF LARGE NUMBERS 311
The setsA εincrease, asε→0, to theωset on whichS n/nθμ. Lettingε→0 through a
countable set of values, we have
P

S
n
n
−μθ0

=P
θ
ζ
k
A
1/k

=0.
Corollary.IfX
1,X2,...are iid RVs such thatP{|X n|<K}=1 for alln, whereKis a
positive constant, thenn
−1
Sn
a.s.−−→μ.
Theorem 3.LetX
1,X2,...be a sequence of independent RVs. Then
X
n
a.s.−−→0⇔


n=1
P{|X n|>ε}<∞ for allε>0.
Proof.WritingA
n={|X n|>ε}, we see that{A n}is a sequence of independent events.
SinceX
n
a.s.−−→0,X n→0onasetE
c
withPE=0. A pointω∈E
c
belongs only to a finite
number ofA
n.Itfollowsthat
lim
n→∞
supA n⊂E,
hence,P(A
ni.o.)= 0. By the Borel–Cantelli lemma (Theorem 1(b)) we must have


n=1
PAn<∞. (Otherwise,


n=1
PAn=∞, and thenP(A ni.o.)=1.)
In the other direction, let
A
1/k= limsup
n→∞

|X
n|>
1
k

,
and use the argument in the proof of Theorem 2.
Example 1.We take an application of Borel–Cantelli Lemma to prove a.s. convergence.
Let{X
n}have PMF
P(X
n=0)=1−
1
n
α
,P(X n=±n)=
1
2n
α
.
ThenP(|X
n|>ε)=
1
n
αand it follows that


n=1
P(|Xn|>ε)=


n=1
1
n
α
<∞forα>1.
Thus from Borel–Cantelli lemmaP(A
ni.o.)=0, whereA n={|X n|>ε}. Now using the
argument in the proof of Theorem 2 we can show thatP(X
nλθ0}=0.
We next prove some important lemmas that we will need subsequently.

312 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
Lemma 1(Kolmogorov’s Inequality).LetX 1,X2,...,X nbe independent RVs with com-
mon mean 0 and variancesσ
2
k
,k=1,2,...,n, respectively. Then for anyε>0
P

max
1≤k≤n
|Sk|>ε


n

1
σ
2
i
ε
2
. (3)
Proof.LetA
0=Ω,
A
k=

max
1≤j≤k
|Sj|≤ε

,k=1,2,...,n,
and
B
k=Ak−1∩A
c
k
={|S 1|≤ε,...,|S k−1|≤ε}∩{at least one of|S 1|,...,|S k|is>ε}
={|S
1|≤ε,...,|S k−1|≤ε,|S k|>ε}.
It follows that
A
c
n
=
n

k=1
Bk
and
B
k⊂{|S k−1|≤ε,|S k|>ε}.
As usual, let us writeI
Bk
, for the indicator function of the eventB k. Then
E(S
nIBk
)
2
=E{(S n−Sk)IBk
+SkIBk
}
2
,
=E{(S
n−Sk)
2
IBk
+S
2
k
IBk
+2S k(Sn−Sk)IBk
}.
SinceS
n−Sk=Xk+1+···+X nandS kIBk
are independent, andEX k=0 for allk,itfollows
that
E(S
nIBk
)
2
=E{(S n−Sk)IBk
}
2
+E(S kIBk
)
2
≥E(S kIBk
)
2
≥ε
2
PBk.
The last inequality follows from the fact that, inB
k,|Sk|>ε. Moreover,
n

k=1
E(SnIBk
)
2
=E(S
2
n
IA
c
n)≤E(S
2
n
)=
n

1
σ
2
k
so that
n

1
σ
2
k
≥ε
2
n

1
PBk=ε
2
P(A
c
n
),
as asserted.

STRONG LAW OF LARGE NUMBERS 313
Corollary.Taken=1 then
P{|X
1|>ε}≤
σ
2
1
ε
2
,
which is Chebychev’s inequality.
Lemma 2(Kronecker Lemma).If


n=1
xnconverges tos(finite) andb n↑∞, then
b
−1
n
n

k=1
bkxk→0.
Proof.Writingb
0=0,a k=bk−bk−1, ands n+1=

n
k=1
xk,wehave
1
bn
n

k=1
bkxk=
1
bn
n

k=1
bk(sk+1−sk)
=
1
bn
&
b
nsn+1+
n

1
bk−1sk
'

1
bn
n

k=1
bksk
=sn+1−
1
bn
n

k=1
(bk−bk−1)sk
=sn+1−
1
bn
n

k=1
aksk.
It therefore suffices to show thatb
−1
n

n k=1
aksk→s. Sinces n→s, there exists ann 0=
n
0(ε)such that
|s
n−s|<
ε
2
forn>n
0.
Sinceb
n↑∞,letn 1be an integer>n 0such that
b
−1
n
(
(
(
(
(
n0√
1
(bk−bk−1)(sk−s)
(
(
(
(
(
<
ε
2
forn>n
1.
Writing
r
n=b
−1
n
n

k=1
(bk−bk−1)sk,
we see that
|r
n−s|=
1
bn
( ( ( ( (
n

k=1
(bk−bk−1)(sk−s)
( ( ( ( (
,

314 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
and, choosingn>n 1,wehave
|r
n−s|≤
(
(
(
(
(
1
bn
n
0√
k=1
(bk−bk−1)(sk−s)
( ( (
(
(
+
1
bn
( ( (
(
(
n

k=n0+1
(bk−bk−1)
ε2
( ( (
(
(
<ε.
This completes the proof.
Theorem 4.If


n=1
varX n<∞, then


n=1
(Xn−EXn)converges almost surely.
Proof.Without loss of generality assume thatEX
n=0. By Kolmogorov’s inequality
P

max
1≤k≤n
|Sm+k−Sm|≥ε


1
ε
2
n

k=1
var(X m+k).
Lettingn→∞we have
P

max
k≥1
|Sm+k−Sm|≥ε

=P

max
k≥m+1
|Sk−Sm|≥ε


1
ε
2


k=m+1
var(X k).
It follows that
lim
m→∞
P

max
k>m
|Sk−Sm|<ε

=1,
and sinceε>0isarbitrarywehave
P



lim
m→∞
(
(
(
(
(
(


j=m
Xj
(
(
(
(
(
(
=0



=1.
Consequently,


j=1
Xjconverges a.s.
As a corollary we get a version of the SLLN for nonidentically distributed RVs which
subsumes Theorem 2.
Corollary 1.Let{X
n}be independent RVs. If


k=1
var(X k)
B
2
k
<∞, B n↑∞,
then
S
n−ESn
Bn
a.s.
−−→0.
The corollary follows from Theorem 4 and the Kronecker lemma.

Corollary 2. Every sequence $\{X_n\}$ of independent RVs with uniformly bounded variances obeys the SLLN.

If $\mathrm{var}(X_k) \le A$ for all $k$, and $B_k = k$, then
$$\sum_{k=1}^{\infty} \frac{\sigma_k^2}{k^2} \le A \sum_{k=1}^{\infty} \frac{1}{k^2} < \infty,$$
and it follows that
$$\frac{S_n - ES_n}{n} \xrightarrow{a.s.} 0.$$

Corollary 3 (Borel's Strong Law of Large Numbers). For a sequence of Bernoulli trials with (constant) probability $p$ of success, the SLLN holds (with $B_n = n$ and $A_n = np$).

Since
$$EX_k = p, \qquad \mathrm{var}(X_k) = p(1-p) \le \tfrac{1}{4}, \quad 0 < p < 1,$$
the result follows from Corollary 2.

Corollary 4. Let $\{X_n\}$ be iid RVs with common mean $\mu$ and finite variance $\sigma^2$. Then
$$P\Big\{\lim_{n \to \infty} \frac{S_n}{n} = \mu\Big\} = 1.$$

Remark 1. Kolmogorov's SLLN is much stronger than Corollaries 1 and 4 to Theorem 4. It states that if $\{X_n\}$ is a sequence of iid RVs then
$$n^{-1}S_n \xrightarrow{a.s.} \mu \iff E|X_1| < \infty,$$
and then $\mu = EX_1$. The proof requires more work and will not be given here. We refer the reader to Billingsley [6], Chung [15], Feller [26], or Laha and Rohatgi [58].
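To visualize Borel's SLLN (Corollary 3), the sketch below (an added illustration; $p$, the path length, and the seed are arbitrary) follows a single sequence of Bernoulli($p$) trials and prints the running proportion $S_n/n$ along that one sample path; with probability 1 such a path converges to $p$.

```python
import numpy as np

rng = np.random.default_rng(5)
p, N = 0.25, 1_000_000

trials = rng.binomial(1, p, size=N)
running_mean = np.cumsum(trials) / np.arange(1, N + 1)   # S_n / n along one path

for n in [10, 100, 1_000, 10_000, 100_000, 1_000_000]:
    print(f"n={n:>8}  S_n/n = {running_mean[n - 1]:.5f}   (p = {p})")
```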
PROBLEMS 7.4
1.For the following sequences of independent RVs does the SLLN hold?
(a)P{X
k=±2
k
}=
1
2
.
(b)P{X
k=±k}=1/2

k,P{X k=0}=1−(1/

k).
(c)P{X
k=±2
k
}=1/2
2k+1
,P{X k=0}=1−(1/2
2k
).
2.LetX
1,X2,...be a sequence of independent RVs with


k=1
var(X k)/k
2
<∞. Show
that
1
n
2
n

k=1
var(X k)→0as n→∞.
Does the converse also hold?

316 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
3.For what values ofαdoes the SLLN hold for the sequence
P{X
k=±k
α
}=
1
2
?
4.Let{σ
2
k
}be a sequence of real numbers such that


k=1
σ
2
k
/k
2
=∞. Show that there
exists a sequence of independent RVs{X
k}withvar(X k)=σ
2
k
,k=1,2,...,such that
n
−1

n
k=1
(Xk−EXk)does not converge to 0 almost surely.
[Hint:LetP{X
k=±k}=σ
2
k
/2k
2
,P{X k=0}=1−(σ
2
k
/k
2
)ifσ k/k≤1, andP{X k=
±σ
k}=
1
2
ifσk/k>1. Apply the Borel–Cantelli lemma to{|X n|>n}.]
5.LetX
nbe a sequence of iid RVs withE|X n|=+∞. Show that, for every positive
numberA,P{|X
n|>nAi.o.}=1 andP{|S n|<nAi.o.}=1.
6.Construct an example to show that the converse of Theorem 1(a) does not hold.
7.Investigate a.s. convergence of{X
n}to 0 in each case.
(a)P(X
n=e
n
)=1/n
2
,P(X n=0)=1−1/n
2
.
(b)P(X
n=0)=1−1/n,P(X n=±1)=1/(2n).
(X
n’s are independent in each case.)
7.5 LIMITING MOMENT GENERATING FUNCTIONS
LetX
1,X2,...be a sequence of RVs. LetF nbe the DF ofX n,n=1,2,...,and suppose
that the MGFM
n(t)ofF nexists. What happens toM n(t)asn→∞? If it converges, does
it always converge to an MGF?
Example 1.Let{X
n}be a sequence of RVs with PMFP{X n=−n}=1,n=1,2,....We
have
M
n(t)=Ee
tXn
=e
−tn
→0as n→∞ for allt>0,
M
n(t)→+∞for allt<0,andM n(t)→1att=0.
Thus
M
n(t)→M(t)=





0,t>0
1,t=0asn→∞.
∞,t<0
ButM(t)is not an MGF. Note that ifF
nis the DF ofX nthen
F
n(x)=
θ
0ifx<−n
1ifx≥−n
→F(x)=1 for allx,
andFis not a DF.

LIMITING MOMENT GENERATING FUNCTIONS 317
Next suppose thatX nhas MGFM nandX n
L−→X, whereXis an RV with MGFM. Does
M
n(t)→M(t)asn→∞? The answer to this question is in the negative.
Example 2.(Curtiss [19]). Consider the DF
F
n(x)=





0, x<−n,
1
2
+cntan
−1
(nx),−n≤x<n,
1, x≥n,
wherec
n=1/[2tan
−1
(n
2
)]. Clearly, asn→∞,
F
n(x)→F(x)=

0,x<0,
1,x≥0,
at all points of continuity of the DFF. The MGF associated withF
nis
M
n(t)=

n
−n
cne
tx
n
1+n
2
x
2
dx,
which exists for allt. The MGF corresponding toFisM(t)=1 for allt.ButM
n(t)→M(t),
sinceM
n(t)→∞iftη=0. Indeed
M
n(t)>

n
0
cn
|t|
3
x
3
6
n
1+n
2
x
2
dx.
The following result is a weaker version of the continuity theorem due to Lévy and
Cramér. We refer the reader to Lukacs [69, p. 47], or Curtiss [19], for details of the proof.
Theorem 1(Continuity Theorem).Let{F
n}be a sequence of DFs with corresponding
MGFs{M
n}, and suppose thatM n(t)exists for|t|≤t 0for everyn. If there exists a DF
Fwith corresponding MGFMwhich exists for|t|≤t
1<t0, such thatM n(t)→M(t)as
n→∞for everyt∈[−t
1,t1], thenF n
w−→F.
Example 3.LetX
nbe an RV with PMF
P{X
n=1}=
1
n
and P{X
n=0}=1−
1
n
.
ThenM
n(t)=(1/n)e
t
+[1−(1/n)]exists for allt∈R, andM n(t)→1asn→∞for allt.
HereM(t)=1istheMGFofanRVX degenerate at 0. ThusX
n
L−→X.
Remark 1.The following notation on orders of magnitude is quite useful. We writex
n=
o(r
n)if, givenε>0, there exists anNsuch that|x n/rn|<εfor alln≥Nandx n=O(r n)

318 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
if there exists anNand a constantc>0, such that|x n/rn|<cfor alln≥N. We write
x
n=O(1)to express the fact thatx nis bounded for largen, andx n=o(1)to mean that
x
n→0asn→∞.
This notation is extended to RVs in an obvious manner. Thus
X
n=op(rn)if, for everyε>0 andδ>0, there exists anNsuch thatP(|X n/rn|≤δ)≥
1−εforn≥N, andX
n=Op(rn)if, forε>0, there exists ac>0 and anNsuch that
P(|X
n/rn|≤c) ≥1−ε. We writeX n=op(1)to meanX n
P−→0. This notation can be easily
extended to the case wherer
nitselfisanRV.
The following lemma is quite useful in applications of Theorem 1.
Lemma 1.Let us writef(x)=o(x ),iff(x)/x→0asx→0. We have
lim
n→∞

1+
a
n
+o
η
1
n
ασ
n
=e
a
for every reala.
Proof.By Taylor’s expansion we have
f(x)=f(0)+xf
θ
(θx)
=f(0)+xf
θ
(0)+{ f
θ
(θx)−f
θ
(0)}x,0<θ<1.
Iff
θ
(x)is continuous atx=0, then asx→0
f(x)=f(0)+xf
θ
(0)+o(x ).
Takingf(x)=log(1+x),wehavef
θ
(x)=(1+x)
−1
, which is continuous atx=0, so that
log(1+x)=x+o(x).
Then for sufficiently largen
nlog

1+
a
n
+o
η
1
n
ασ
=n

a
n
+o
η
1
n
α
+o
,
a
n
+o
η
1
n
α-σ
=a+no
η
1
n
α
=a+o(1).
It follows that

1+
a
n
+o
η
1
n
ασ
n
=e
a+o(1)
,
as asserted.
Example 4. Let $X_1, X_2, \ldots$ be iid $b(1,p)$ RVs. Also, let $S_n = \sum_{k=1}^n X_k$, and let $M_n(t)$ be the MGF of $S_n$. Then
$$M_n(t) = (q + pe^t)^n \quad \text{for all } t,$$
where $q = 1 - p$. If we let $n \to \infty$ in such a way that $np$ remains constant at $\lambda$, say, then, by Lemma 1,
$$M_n(t) = \Big(1 - \frac{\lambda}{n} + \frac{\lambda}{n}e^t\Big)^n = \Big(1 + \frac{\lambda}{n}(e^t - 1)\Big)^n \to \exp\{\lambda(e^t - 1)\} \quad \text{for all } t,$$
which is the MGF of a $P(\lambda)$ RV. Thus, the binomial distribution function approaches the Poisson DF, provided that $n \to \infty$ in such a way that $np = \lambda > 0$.
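The limit in Example 4 can be checked numerically. The sketch below (added; $\lambda$, the values of $n$, and the range of $k$ are arbitrary) compares the $b(n, \lambda/n)$ PMF with the $P(\lambda)$ PMF at a few points.

```python
from scipy import stats

lam = 3.0
for n in [10, 50, 500]:
    p = lam / n
    print(f"n = {n}, p = lambda/n = {p:.3f}")
    for k in range(6):
        b = stats.binom.pmf(k, n, p)      # binomial PMF at k
        po = stats.poisson.pmf(k, lam)    # Poisson PMF at k
        print(f"  k={k}:  binomial={b:.4f}   Poisson={po:.4f}")
```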
Example 5.LetX∼P(λ).TheMGFofX is given by
M(t)=exp{λ(e
t
−1)} for allt.
LetY=(X−λ)/

λ. Then the MGF ofYis given by
M
Y(t)=e
−t

λ
M
η
t

λ

.
Also,
logM
Y(t)=−t

λ+logM
η
t

λ

=−t

λ+λ(e
t/

λ
−1)
=−t

λ+λ
η
t

λ
+
t
2

+
t
3
3!λ
3/2
+···

=
t
2
2
+
t
3
3!λ
3/2
+···.
It follows that
logM
Y(t)→
t
2
2
asλ→∞,
so thatM
Y(t)→e
t
2
/2
asλ→∞, which is the MGF of anN(0,1)RV.
For more examples see Section 7.6.
Remark 2.As pointed out earlier working with MGFs has the disadvantage that the exis-
tence of MGFs is a very strong condition. Working with CFs which always exist, on the
other hand, permits a much wider application of the continuity theorem. Letφ
nbe the CF
ofF
n. ThenF n
w−→Fif and only ifφ n→φasn→∞onR, whereφis continuous at
t=0. In this caseφ, the limit function, is the CF of the limit DFF.

320 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
Example 6.LetXbe aC(0,1)RV. Then its CF is given by
Eexp(itX)=
1
π


−∞
costx
1+x
2
dx+i
1
π


−∞
sintx
1+x
2
dx
=
1
π


−∞
costx
1+x
2
dx=e
−|t|
since the second integral on the right side vanishes.
Let{X
n}be iid RVs with common lawL(X)and setY n=

n
j=1
Xj/n.ThentheCFof
Y
nis given by
ϕ
n(t)=Eexp



it
n

j=1
Xj/n



=
n

j=1
exp


|t|
n

= exp(−|t |)
for alln. It followsϕ
nis the CF of aC(1,0)RV. We could not have derived this result
using MGFs. Also ifU
n=

n
j=1
Xj/n
α
forα>1, then
ϕU
n(t)=exp
.
−|t|/n
α−1
/
→1
asn→∞for allt. Sinceϕ(t)=1 is continuous att=0,ϕis the CF of the limit
DFF. ClearlyFis the DF of an RV degenerate at 0. Thus

n
j=1
Xj/n
α
L,P
−→U, where
P(U=0)=1.
PROBLEMS 7.5
1.LetX∼NB(r;p). Show that
2pX
L
−→Y asp→0,
whereY∼χ
2
(2r).
2.LetX
n∼NB(r n;1−p n),n=1,2,....Show thatX n
L−→Xasr n→∞,p n→0, in such
a way thatr
npn→λ, whereX∼P(λ).
3.LetX
1,X2,...be independent RVs with PMF given byP{X n=±1}=
1
2
,n=1,2,....
LetZ
n=

n j=1
Xj/2
j
. Show thatZ n
L−→Z, whereZ∼U[−1,1].
4.Let{X
n}be a sequence of RVs withX n∼G(n,β)whereβ>0 is a constant
(independent ofn). Find the limiting distribution ofX
n/n.
5.LetX
n∼χ
2
(n),n=1,2,....Find the limiting distribution ofX n/n
2
.
6.LetX
1,X2,...,X nbe jointly normal withEX i=0,EX
2
i
=1 for alliandcov(X i,Xj)=
ρ,i,j=1,2,...(iη=j). What is the limiting distribution ofn
−1
Sn, whereS n=

n
k=1
Xk?

7.6 CENTRAL LIMIT THEOREM
LetX
1,X2,...be a sequence of RVs, and letS n=

n
k=1
,Xk,n=1,2,....In Sections 7.3
and 7.4 we investigated the convergence of the sequence of RVsB
−1
n
(Sn−An)to
the degenerate RV. In this section we examine the convergence ofB
−1
n
(Sn−An)to a
nondegenerate RV. Suppose that, for a suitable choice of constantsA
nandB n>0, the RVs
B
−1
n
(Sn−An)
L
−→Y. What are the properties of this limit RVY? The question as posed is
far too general and is not of much interest unless the RVsX
iare suitably restricted. For
example, if we takeX
1with DFFandX 2,X3,...to be 0 with probability 1, choosingA n=0
andB
n=1 leads toFas the limit DF.
We recall (Example 7.5.6) that, ifX
1,X2,...,X nare iid RVs with common lawC(1,0),
thenn
−1
Snis alsoC(1,0).Again,ifX 1,X2,...,X nare iidN(0,1)RVs thenn
−1/2
Snis
alsoN(0,1)(Corollary 2 to Theorem 5.3.22). We note thus that for certain sequences of
RVs there exist sequencesA
nandB n>0,B n→∞, such thatB
−1
n
(Sn−An)
L
−→Y.Inthe
Cauchy caseB
n=n,A n=0, and in the normal caseB n=n
1/2
,An=0. Moreover, we see
that Cauchy and normal distributions appear as limiting distributions—in these two cases,
because of the reproductive nature of the distributions. Cauchy and normal distributions
are examples ofstable distributions.
Definition 1.LetX
1,X2, be iid nondegenerate RVs with common DFF.Leta 1,a2be any
positive constants. We say thatFisstableif there exist constantsAandB(depending on
a
1,a2) such that the RVB
−1
(a1X1+a2X2−A)also has the DFF.
LetX
1,X2,...be iid RVs with common DFF. We remark without proof (see Loève [66,
p. 339]) that only stable distributions occur as limits. To make this statement more precise
we make the following definition.
Definition 2.LetX
1,X2,...be iid RVs with common DFF. We say thatFbelongs to
the domain of attraction of a distributionVif there exist norming constantsB
n>0 and
centering constantsA
nsuch that, asn→∞,
P{B
−1
n
(Sn−An)≤x}→V (x), (1)
at all continuity pointsxofV.
In view of the statement after Definition 1, we see that only stable distributions possess
domains of attraction. From Definition 1 we also note that each stable law belongs to its
own domain of attraction. The study of stable distributions is beyond the scope of this
book. We shall restrict ourselves to seeking conditions under which the limit lawVis the
normal distribution. The importance of the normal distribution in statistics is due largely
to the fact that a wide class of distributionsFbelongs to the domain of attraction of the
normal law. Let us consider some examples.
Example 1.LetX
1,X2,...,X nbe iidb(1,p)RVs. Let
S
n=
n

k=1
Xk,A n=ESn=np, B n=
ϕ
var(S n)=
ϕ
np(1−p).

322 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
Then
M
n(t)=Eexp
Ω
S
n−np

np(1−p)
t

=
n

i=1
Eexp
Ω
X
i−p

np(1−p)
t

=exp
Ω

npt

np(1−p)
Ω
q+pexp

t

np(1−p)
⇐∪
n
,q=1−p,
=

qexp
η

pt

npq

+pexp
η
qt

npq
⇒⊆
n
=
,
1+
t
2
2n
+o
η
1
n
⇒-
n
.
It follows from Lemma 7.5.1 that
M
n(t)→e
t
2
/2
asn→∞,
and sincee
t
2
/2
is the MGF of anN(0,1)RV, we have by the continuity theorem
P

S
n−np

npq
≤x


1



x
−∞
e
t
2
/2
dtfor allx∈R.
In particular, we note that for eachx∈R,F

n
(x)
p
−→F(x)asn→∞and

n[F

n
(x)−F(x)]↓
F(x)(1−F(x))
L
−→Z asn→∞,
whereZisN(0,1). It is possible to make a probability statement simultaneously for allx.
This is the so-called Glivenko–Cantelli theorem:F

n
(x)converges uniformly toF(x).For
a proof, we refer to Fisz [31, p. 391].
Example 2.LetX
1,X2,...,X nbe iidχ
2
(1)RVs. ThenS n∼χ
2
(n),ES n=n, andvar(S n)=
2n.AlsoletZ
n=(Sn−n)/

2nthen
M
n(t)=Ee
tZn
=exp
η
−t
0
n
2
αη
1−
2t

2n

−n/2
,2t<

2n,
=

exp
&
t
0
2
n
'
−t
0
2
n
exp
&
t
0
2
n
'⇐
−n/2
,t<
0
n
2
.
Using Taylor’s approximation, we get
exp
&
t
0
2
n
'
=1+t
0
2
n
+
t
2
2
&0
2
n
'
2
+
1
6
exp(θ
n)
&
t
0
2
n
'
3
,

CENTRAL LIMIT THEOREM 323
where 0<θ n<t
ϕ
(2/n). It follows that
M
n(t)=
η
1−
t
2
n
+
ζ(n)
n
α
−n/2
,
where
ζ(n)=−
0
2
n
t
3
+
&
t
3
3
0
2
n

2t
4
3n
'
exp(θ
n)→0as n→∞,
for every fixedt. We have from Lemma 1 thatM
n(t)→e
t
2
/2
asn→∞for all realt, and
it follows thatZ
n
L−→Z, whereZisN(0,1).
These examples suggest that if we take iid RVs with finite variance and takeA
n=ESn,
B
n=
ϕ
var(S n), thenB
−1
n
(Sn−An)
L
−→Z, whereZisN(0,1). This is the central limit result,
which we now prove. The reader should note that in both Examples 1 and 2 we used more
than just the existence ofE|X|
2
. Indeed, the MGF exists and hence moments of all order
exist. The existence of MGF is not a necessary condition.
Theorem 1 (Lindeberg–Lévy Central Limit Theorem). Let $\{X_n\}$ be a sequence of iid RVs with $0 < \mathrm{var}(X_n) = \sigma^2 < \infty$ and common mean $\mu$. Let $S_n = \sum_{j=1}^n X_j$, $n = 1, 2, \ldots$. Then for every $x \in \mathcal{R}$
$$\lim_{n \to \infty} P\Big\{\frac{S_n - n\mu}{\sigma\sqrt{n}} \le x\Big\} = \lim_{n \to \infty} P\Big\{\frac{\bar{X} - \mu}{\sigma/\sqrt{n}} \le x\Big\} = \frac{1}{\sqrt{2\pi}}\int_{-\infty}^{x} e^{-u^2/2}\,du.$$
Proof.The proof we give here assumes that the MGF ofX
nexists. Without loss of gen-
erality, we also assume thatEX
n=0 andvar(X n)=1. LetMbe the MGF ofX n. Then the
MGF ofS
n/

nis given by
M
n(t)=Eexp(tS n/

n)=[M(t/

n)]
n
and
√nM
n(t)=n√nM(t/

n)=
√nM(t/

n)
1/n
=
L(t/

n)
1/n
,
whereL(t/

n)=√n M(t/

n). ClearlyL(0)=√n(1)= 0, so that asn→∞, the conditions
for L’Hospital’s rule are satisfied. It follows that
lim
n→∞
√nM n(t) = lim
n→∞
L
θ
(t/

n)t
2/

n
and sinceL
θ
(0)=EX=0, we can use L’Hospital’s Rule once again to get
lim
n→∞
√nM n(t) = lim
n→∞
L
θθ
(t/

n)t
2
2
=
t
2
2

324 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
usingL
ΩΩ
(0)=var(X)=1. Thus
M
n(t)−→exp(t
2
/2)=M(t)
whereM(t)is the MGT of aN(0,1)RV.
Remark 1.In the proof above we could have used the Taylor series expansion ofMto
arrive at the same result.
Remark 2. Even though we proved Theorem 1 for the case when the MGF of the $X_n$'s exists, we will use the result whenever $0 < EX_n^2 = \sigma^2 < \infty$. The use of CFs would have provided a complete proof of Theorem 1. Let $\varphi$ be the CF of $X_n$. Assuming again, without loss of generality, that $EX_n = 0$, $\mathrm{var}(X_n) = 1$, we can write
$$\varphi(t) = 1 - \frac{1}{2}t^2 + t^2\,o(1).$$
Thus the CF of $S_n/\sqrt{n}$ is
$$[\varphi(t/\sqrt{n})]^n = \Big[1 - \frac{t^2}{2n} + \frac{t^2}{n}o(1)\Big]^n,$$
which converges to $\exp(-t^2/2)$, the CF of an $N(0,1)$ RV. The devil is in the details of the proof.
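As a numerical companion to Theorem 1 (an added sketch; the exponential population and the sample sizes are arbitrary choices), the code below compares the distribution of $\sqrt{n}(\bar{X} - \mu)/\sigma$ with the standard normal at a few points.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(6)
mu = sigma = 1.0          # Exp(1): mean 1, standard deviation 1
reps = 20_000

for n in [5, 30, 200]:
    x = rng.exponential(1.0, size=(reps, n))
    z = np.sqrt(n) * (x.mean(axis=1) - mu) / sigma
    for t in [-1.64, 0.0, 1.64]:
        est = np.mean(z <= t)
        print(f"n={n:>4}  P(Z_n <= {t:+.2f}) = {est:.4f}   Phi({t:+.2f}) = {stats.norm.cdf(t):.4f}")
```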
The following converse to Theorem 1 holds.
Theorem 2.LetX
1,X2,...,X nbe iid RVs such thatn
−1/2
Snhas the same distribution for
everyn=1,2,....Then, ifEX
i=0,var(X i)=1, the distribution ofX imust beN(0,1).
Proof.LetFbe the DF ofn
−1/2
Sn. By the central limit theorem,
lim
n→∞
P{n
−1/2
Sn≤x}=Φ(x).
Also,P{n
−1/2
Sn≤x}=F(x)for eachn. It follows that we must haveF(x)=Φ(x).
Example 3.LetX
1,X2,...be iid RVs with common PMF
P{X=k}=p(1−p)
k
,k=0,1,2,...,0<p<1,q=1−p.
ThenEX=q/p,var(X)=q/p
2
. By Theorem 1 we see that
P
Ω
S
n−n(q/p)

nq
p≤x

→Φ(x)asn→∞for allx∈R.
Example 4.LetX
1,X2,...be iid RVs with commonB(α,β)distribution. Then
EX=
α
α+β
and var(X)=
αβ
(α+β)
2
(α+β+1)
.

CENTRAL LIMIT THEOREM 325
By the corollary to Theorem 1, it follows that
S
n−n[α/(α+β)]

αβn/[(α+β+1)(α+β)
2
]
L
−→Z,
whereZisN(0,1).
For nonidentically distributed RVs we state, without proof, the following result due to
Lindeberg.
Theorem 3.LetX
1,X2,...be independent RVs with DFsF 1,F2,...,respectively. Let
EX
k=μkandvar(X k)=σ
2
k
, and write
s
2
n
=
n

j=1
σ
2
j
.
If theF
k’s are absolutely continuous with PDFf k, assume that the relation
lim
n→∞
1
s
2
n
n

k=1

|x−μ k|>εs n
(x−μ k)
2
fk(x)dx=0( 2)
holds for allε>0. (A similar condition can be stated for the discrete case.) Then
S

n
=

n
j=1
Xj−

n
j=1
μj
sn
L
−→Z∼N(0,1). (3)
Condition (2) is known as theLindeberg condition.
Feller [24] has shown that condition (2) is necessary as well in the following sense. For
independent RVs{X
k}for which (3) holds and
P

max
1≤k≤n
|Xk−EXk|>ε

var(S n)

→0,
(2) holds for everyε>0.
Example 5.LetX
1,X2,...be independent RVs such thatX kisU(−a k,ak). ThenEX k=0,
var(X
k)=(1/3)a
2
k
. Suppose that|a k|<aand

n
1
a
2
k
→∞asn→∞. Then
1
s
2
n
n

k=1

|x|>εs n
x
2
fk(x)dx≤
1
s
2 n
n

k=1

|x|>εs n
a
2
1
2ak
dx

a
2
s
2
n
n

k=1
P{|X k|>εs n}≤
a
2
s
2 n
n

k=1
var(X k)
ε
2
s
2 n
=
a
2
ε
2
s
2
n
→0as n→∞.

326 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
If


1
a
2
k
<∞, thens
2
n
↑A
2
,say,asn →∞.Forfixedk , we can findε ksuch that
ε
kA<a kand thenP{|X k|>εksn}≥P{|X k|>εkA}>0. Forn≥k,wehave
1
s
2
n
n

j=1

|x|>ε ksn
x
2
fj(x)dx≥
s
2
n
ε
2
k
s
2
n
n

j=1
P{|X j|>εksn}
≥ε
2
k
P{|X k|>εksn}
>0,
so that the Lindeberg condition does not hold. Indeed, ifX
1,X2,...are independent RVs
such that there exists a constantAwithP{|X
n|≤A} =1 for alln, the Lindeberg condi-
tion (2) is satisfied ifs
2
n
→∞asn→∞. To see this, suppose thats
2
n
→∞. Since theX k’s
are uniformly bounded, so are the RVsX
k−EXk. It follows that for everyε>0 we can
find anN
εsuch that, forn≥N ε,P{|X k−EXk|<εs n,k=1,2,...,n}=1. The Lindeberg
condition follows immediately. The converse also holds, for, iflim
n→∞s
2
n
<∞and the
Lindeberg condition holds, there exists a constantA<∞such thats
2
n
→A
2
. For any fixed
j, we can find anε>0 such thatP{|X
j−μj|>εA}>0. Then, forn≥j,
1
s
2
n
n

k=1

|x−μ k|>εs n
(x−μ k)
2
fk(x)dx
≥ε
2
n

k=1
P{|X k−μk|>εs n}
≥ε
2
P{|X j−μj|>εA}
>0,
and the Lindeberg condition does not hold. This contradiction shows thats
2
n
→∞is also
a necessary condition that is, for a sequence of uniformly bounded independent RVs, a
necessary and sufficient condition for the central limit theorem to hold iss
2
n
→∞as
n→∞.
Example 6.LetX
1,X2,...be independent RVs such thatα k=E|X k|
2+δ
<∞for some
δ>0 andα
1+α2+···+α n=o(s
2+δ
n
). Then the Lindeberg condition is satisfied, and the
central limit theorem holds. This result is due to Lyapunov. We have
1
s
2
n
n

k=1

|x|>εs n
x
2
fk(x)dx

1
ε
δ
s
2+δ
n
n

k=1


−∞
|x|
2+δ
fk(x)dx
=

n
k=1
αk
ε
δ
s
2+δ
n
→0as n→∞.
A similar argument applies in the discrete case.

CENTRAL LIMIT THEOREM 327
Remark 3.Both the central limit theorem (CLT) and the (weak) law of large numbers
(WLLN) hold for a large class of sequences of RVs{X
n}.Ifthe{X n}are independent
uniformly bounded RVs, that is, ifP{|X
n|≤M}=1, the WLLN (Theorem 7.3.1) holds;
the CLT holds provided thats
2
n
→∞(Example 5).
If the RVs{X
n}are iid, then the CLT is a stronger result than the WLLN in that the
former provides an estimate of the probabilityP{|S
n−nμ|/n ≥ε}. Indeed,
P{|S
n−nμ|>nε}=P

|S
n−nμ|
σ

n
>
ε
σ

n

≈1−P

|Z|≤
ε
σ

n

,
whereZisN(0,1), and the law of large number follows. On the other hand, we note that
the WLLN does not require the existence of a second moment.
Remark 4.If{X
n}are independent RVs, it is quite possible that the CLT may apply to the
X
n’s, but not the WLLN.
Example 7(Feller [25, p. 255]). Let{X
k}be independent RVs with PMF
P{X
k=k
λ
}=P{X k=−k
λ
}=
1
2
,k=1,2,....
ThenEX
k=0,var(X k)=k

.Alsoletλ>0, then
s
2
n
=
n

k=1
k



n+1
0
x

dx=
(n+1)
2λ+1
2λ+1
It follows that, if 0<λ<
1
2
,sn/n→0, and by Corollary 2 to Theorem 7.3.1 the WLLN
holds. Nowk
λ
<n
λ
, so that the sum

n
k=1
|xkl|>εs n
x
2
kl
pklwill be nonzero ifn
λ
>εsn≈
ε[n
λ+1/2
/

(2λ+1)]. It follows that, as long asn>(2λ+1)ε
−2
,
1
s
2
n
n

k=1

|xkl|>εs n
x
2
kl
pkl=0
and the Lindeberg condition holds. Thus the CLT holds forλ>0. This means that
P

a<
0
2λ+1
n
2λ+1
Sn<b



b
a
e
−t
2
/2
dt√

.
Thus
P

an
λ+1/2−1

2λ+1
<
S
n
n
<
bn
λ+1/2−1

2λ+1



b
a
e
−t
2
/2


dt
and the WLLN cannot hold forλ≥
1
2
.

328 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
We conclude this section with some remarks concerning the application of the CLT.
LetX
1,X2,...be iid RVs with common meanμand varianceσ
2
. Let us write
Z
n=
S
n−nμ
σ

n
,
and letz
1,z2be two arbitrary real numbers withz 1<z2.IfF nis the DF ofZ n, then
lim
n→∞
P{z1<Zn≤z2}= lim
n→∞
[Fn(z2)−F n(z1)]
=
1



z2
z1
e
−t
2
/2
dt,
that is,
lim
n→∞
P{z1σ

n+nμ<S n≤z2σ

n+nμ}=
1



z2
z1
e
−t
2
/2
dt. (4)
It follows that the RVS
n=

n
k=1
Xkisasymptotically normally distributedwith meannμ
and variancenσ
2
. Equivalently, the RVn
−1
Snis asymptoticallyN(μ,σ
2
/n). This result is
of great importance in statistics.
In Fig. 1 we show the distribution of
Xin sampling fromP(λ)andG(1,1).Wehave
also superimposed, in each case, the graph of the corresponding normal approximation.
How large shouldnbe before we apply approximation (4)? Unfortunately the answer
is not simple. Much depends on the underlying distribution, the corresponding speed of
convergence, and the accuracy one desires. There is a vast amount of literature on the
speed of convergence and error bounds. We will content ourselves with some examples.
The reader is referred to Rohatgi [90] for a detailed discussion.
In the discrete case when the underlying distribution is integer-valued, approximation
(4) is improved by applying thecontinuity correction.IfXis integer-valued, then for
integersx
1,x2
P{x1≤X≤x 2}=P{x 1−1/2<X<x 2+1/2},
which amounts to making the discrete space of values ofXcontinuous by considering
intervals of length 1 with midpoints at integers.
Example 8. Let $X_1, X_2, \ldots, X_n$ be iid $b(1,p)$ RVs. Then $ES_n = np$ and $\mathrm{var}(S_n) = np(1-p)$, so $(S_n - np)/\sqrt{np(1-p)}$ is approximately $N(0,1)$.

Suppose $n = 10$, $p = 1/2$. Then from binomial tables $P(X \le 4) = 0.3770$. Using the normal approximation without continuity correction,
$$P(X \le 4) \approx P\Big(Z \le \frac{4 - 5}{\sqrt{2.5}}\Big) = P(Z \le -0.63) = 0.2643.$$
Applying the continuity correction,
$$P(X \le 4) = P(X < 4.5) \approx P(Z \le -0.32) = 0.3745.$$
[Fig. 1. (a) Distribution of $\bar{X}$ for a Poisson RV with mean 3, with the normal approximation superimposed; (b) distribution of $\bar{X}$ for an exponential RV with mean 1, with the normal approximation superimposed.]
Next suppose that $n = 100$, $p = 0.1$. Then from binomial tables $P(X = 7) = 0.0889$. Using the normal approximation without continuity correction,
$$P(X = 7) = P(6.0 < X < 8.0) \approx P(-1.33 < Z < -0.67) = 0.1596,$$
and with continuity correction,
$$P(X = 7) = P(6.5 < X < 7.5) \approx P(-1.17 < Z < -0.83) = 0.0823.$$
The rule of thumb is to use the continuity correction, to use the normal approximation whenever $np(1-p) > 10$, and to use the Poisson approximation with $\lambda = np$ for $p < 0.1$, $\lambda \le 10$.
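The two calculations of Example 8 are easy to reproduce; the sketch below (added for illustration) computes the exact binomial probabilities and the normal approximations with and without the continuity correction. Small differences from the numbers quoted in Example 8 arise because the text rounds $z$ to two decimal places before consulting the normal table.

```python
import numpy as np
from scipy import stats

# n = 10, p = 1/2:  P(X <= 4)
n, p = 10, 0.5
mu, sd = n * p, np.sqrt(n * p * (1 - p))
print("exact           :", stats.binom.cdf(4, n, p))
print("no correction   :", stats.norm.cdf((4.0 - mu) / sd))
print("with correction :", stats.norm.cdf((4.5 - mu) / sd))

# n = 100, p = 0.1:  P(X = 7)
n, p = 100, 0.1
mu, sd = n * p, np.sqrt(n * p * (1 - p))
print("exact           :", stats.binom.pmf(7, n, p))
print("no correction   :", stats.norm.cdf((8.0 - mu) / sd) - stats.norm.cdf((6.0 - mu) / sd))
print("with correction :", stats.norm.cdf((7.5 - mu) / sd) - stats.norm.cdf((6.5 - mu) / sd))
```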
Example 9.LetX
1,X2,...be iidP(λ)RVs. ThenS nhas approximately anN(nλ,nλ)dis-
tribution for largen.Letn =64,λ=0.125. ThenS
n∼P(8)and from Poisson distribution
tablesP(S
n=10)=0.099. Using normal approximation
P(S
n=10)=P(9.5<S n<10.5)≈P(0.53<Z<0.88)
=0.1087.
Ifn=96,λ=0.125, thenS
n∼P(12) and
P(S
n=10)=0.105, exact,
P(S
n=10)≈0.1009, normal approximation.
PROBLEMS 7.6
1.Let{X
n}be a sequence of independent RVs with the following distributions. In each
case, does the Lindeberg condition hold?
(a)P{X
n=±(1/2
n
)}=
1
2
.
(b)P{X
n=±2
n+1
}=1/2
n+3
,P{X n=0}=1−(1/2
n+2
).
(c)P{X
n=±1}=(1−2
−n
)/2,P{X n=±2
−n
}=1/2
n+1
.
(d){X
n}is a sequence of independent Poisson RVs with parameterλ n,n=1,2,...,
such that

n
k=1
λk→∞.
(e)P{X
n=±2
n
}=
1
2
.
2.LetX
1,X2,...be iid RVs with mean 0, variance 1, andEX
4
i
<∞. Find the limiting
distribution of
Z
n=

n
X
1X2+X3X4+···+X 2n−1X2n
X
2
1
+X
2
2
+···+X
2
2n
.
3.LetX
1,X2,...be iid RVs with meanαand varianceσ
2
, and letY 1,Y2,...be iid
RVs with meanβ(η=0) and varianceτ
2
. Find the limiting distribution ofZ n=

n(Xn−α)/Yn, whereXn=n
−1

n i=1
XiandYn=n
−1

n i=1
Yi.

LARGE SAMPLE THEORY 331
4.LetX∼b(n,θ). Use the CLT to findnsuch thatP θ{X>n/2}≥1 −α. In particular,
letα=0.10 andθ=0.45. Calculaten, satisfyingP{X>n/2}≥ 0.90.
5.LetX
1,X2,...be a sequence of iid RVs with common meanμand varianceσ
2
.Also,
let
X=n
−1

n
k=1
XkandS
2
=(n−1)
−1

n
i=1
(Xi−X)
2
. Show that

n(X−μ)/
S
L
−→Z, whereZ∼N(0,1).
6.LetX
1,X2,...,X 100be iid RVs with mean 75 and variance 225. Use Chebychev’s
inequality to calculate the probability that the sample mean will not differ from the
population mean by more than 6. Then use the CLT to calculate the same probability
and compare your results.
7.LetX
1,X2,...,X 100be iidP(λ)RVs, whereλ=0.02. LetS=S 100=

100
i=1
Xi.Use
the central limit result to evaluateP{S≥3}and compare your result to the exact
probability of the eventS≥3.
8.LetX
1,X2,...,X 81be iid RVs with mean 54 and variance 225. Use Chebychev’s
inequality to find the possible difference between the sample mean and the pop-
ulation mean with a probability of at least 0.75. Also use the CLT to do the
same.
9.Use the CLT applied to a Poisson RV to show thatlim
n→∞e
−nt

n−1
k=1
(nt)
k
k!
=1for
0<t<1,=
1
2
ift=1, and 0 ift>1.
10.LetX
1,X2,...be a sequence of iid RVs with meanμand varianceσ
2
, and assume that
EX
4
1
<∞. WriteV n=

n
k=1
(Xk−μ)
2
. Find the centering and norming constantsA n
andB nsuch thatB
−1
n
(Vn−An)
L
−→Z, whereZisN(0,1).
11.From an urn containing 10 identical balls numbered 0 through 9,nballs are drawn
with replacement.
(a) What does the law of large numbers tell you about the appearance of 0’s in the
ndrawings?
(b) How many drawings must be made in order that, with probability at least 0.95,
the relative frequency of the occurrence of 0’s will be between 0.09 and 0.11?
(c) Use the CLT to find the probability that among thennumbers thus chosen
the number 5 will appear between(n−3

n)/10 and(n+3

n)/10 times
(inclusive) if (i)n=25 and (ii)n=100.
12.LetX
1,X2,...,X nbe iid RVs withEX 1=0 andEX
2
1

2
<∞.Let
X=

n
i=1
Xi/n,
and for any positive real numberεletP
n,ε=P{
X≥ε}. Show that
P
n,ε≈
σ
ε

n
1


e
−nε
2
/2σ
2
,asn→∞.
[Hint:Use (5.3.61).]
7.7 LARGE SAMPLE THEORY
In many applications of probability one needs the distribution of a statistic or some func-
tion of it. The methods of Section 7.3 when applicable lead to the exact distribution of the
statistic under consideration. If not, it may be sufficient to approximate this distribution
provided the sample size is large enough.

332 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
Let{X n}be a sequence of RVs which converges in law toN(μ,σ
2
). Then{(X n−μ)/σ)}
converges in law toN(0,1), and conversely. We will say alternatively and equivalently
that{X
n}isasymptotically normalwith meanμand varianceσ
2
. More generally, we
say thatX
nisasymptotically normalwith “mean”μ nand “variance”σ
2
n
, and writeX nis
AN(μ
n,σ
2
n
),ifσ n>0 and asn→∞.
X
n−μn
σn
L
−→N(0,1). (1)
Hereμ
nis not necessarily the mean ofX nandσ
2
n
, not necessarily its variance. In this
case we can approximate, for sufficiently largen,P(X
n≤t)byP

Z≤
t−μn
σn

, whereZis
N(0,1).
The most common method to show thatX
nis AN(μ n,σ
2
n
)is the central limit theorem of
Section 6. Thus, according to Theorem 7.6.1

n(Xn−μ)
L
−→N(0,σ
2
)asn→∞, where
Xnis the sample mean ofniid RVs with meanμand varianceσ
2
. The same result applies
tokth sample moment, providedE|X|
2k
<∞. Thus
n

j=1
X
k
n
/nis AN
η
EX
k
,
var(X
k
)n

.
In many large sample approximations an application of the CLT along with Slutsky’s
theorem suffices.
Example 1.LetX
1,X2,...be iidN(μ,σ
2
). Consider the RV
T
n=

n(X−μ)
S
.
The statisticT
nis well-known for its applications in statistics and in Section 6.5 we deter-
mined its exact distribution. From Example 6.3.4(n−1)S
2
/n
P
−→σ
2
and henceS/σ
P
−→1.
Since

n(X−μ)/σ
L
−→Z∼N(0,1), it follows from Slutsky’s theorem thatT n
L−→Z. Thus
for sufficiently largen(n≥30) we can approximateP(T
n≤t)byP(Z≤t).
Actually, we do not needX’s to be normally distributed (see Problem 7.6.5).
Often we need to approximate the distribution ofg(Y
n)given thatY nis AN(μ,σ
2
).
Theorem 1(Delta Method).SupposeY
nis AN(μ,σ
2
n
), withσ n→0 andμa fixed real
number. Letgbe a real-valued function which is differentiable atx=μ, withg

(μ)η=0.
Then
g(Y
n)isAN
1
g(μ),[g

(μ)]
2
σ
2
n
2
. (2)
Proof.We first show that
[g(Y
n)−g(μ)]
g

(μ)σ n

(Y
n−μ)
σn
P
−→0. (3)

LARGE SAMPLE THEORY 333
Set
h(x)=
Ω
g(x)−g(μ)
x−μ
−g
Ω
(μ),x≤=μ
0, x=μ.
Thenhis continuous atx=μ. Since
Y
n−μ=σ n
,
Y
n−μ
σn
-
L
−→0
by Problem 7.2.7,Y
n−μ
P
−→0, and it follows from Theorem 7.2.4 thath(Y n)
P
−→h(μ)=0.
By Slutsky’s theorem, therefore,
h(Y
n)
Y
n−μ
σn
P
−→0.
That is,
g(Y
n)−g(μ)
σng
Ω
(μ)

Y
n−μ
σn
P
−→0.
It follows again by Slutsky’s theorem that[g(Y
n)−g(μ)]/[g
Ω
(μ)σ n]has the same limit
law as(Y
n−μ)/σ n.
Example 2. We know by the CLT that $Y_n = \bar{X}$ is $AN(\mu, \sigma^2/n)$. Suppose $g(\bar{X}) = \bar{X}(1 - \bar{X})$, where $\bar{X}$ is the sample mean in random sampling from a population with mean $\mu$ and variance $\sigma^2$. Since $g'(\mu) = 1 - 2\mu \ne 0$ for $\mu \ne 1/2$, it follows that for $\mu \ne 1/2$, $\sigma^2 < \infty$, $\bar{X}(1 - \bar{X})$ is $AN(\mu(1-\mu), (1-2\mu)^2\sigma^2/n)$. Thus
$$P(\bar{X}(1-\bar{X}) \le y) = P\Big(\frac{\bar{X}(1-\bar{X}) - \mu(1-\mu)}{|1-2\mu|\,\sigma/\sqrt{n}} \le \frac{y - \mu(1-\mu)}{|1-2\mu|\,\sigma/\sqrt{n}}\Big) \approx \Phi\Big(\frac{y - \mu(1-\mu)}{|1-2\mu|\,\sigma/\sqrt{n}}\Big)$$
for large $n$.
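A simulation sketch (added; Bernoulli data with $p = 0.3$, and the values of $n$ and the replication count, are arbitrary choices) checks the delta-method approximation of Example 2: the standard deviation of $\sqrt{n}[\bar{X}(1-\bar{X}) - \mu(1-\mu)]$ should be close to $|1-2\mu|\sigma$.

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 0.3, 2_000, 20_000
mu, var = p, p * (1 - p)                       # mean and variance of b(1, p)

# sample means of n Bernoulli(p) trials, simulated directly as Binomial(n, p)/n
xbar = rng.binomial(n, p, size=reps) / n
g = xbar * (1 - xbar)
t = np.sqrt(n) * (g - mu * (1 - mu))

print("sample sd of sqrt(n)[g(Xbar) - g(mu)] :", round(t.std(ddof=1), 4))
print("delta-method sd |1 - 2 mu| * sigma    :", round(abs(1 - 2 * mu) * np.sqrt(var), 4))
```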
Remark 1.Supposegin Theorem 1 is differentiablektimes,k≥1, atx=μandg
(i)
(μ)=0
for 1≤i≤k−1,g
(k)
(μ)≤=0. Then a similar argument using Taylor’s theorem shows that
[g(Y
n)−g(μ)]/

1
k!
g
(k)
(μ)σ
k
n

L
−→Z
k
, (4)
whereZis aN(0,1)RV. Thus in Example 2, whenμ=1/2,g
Ω
(1/2)= 0 andg
ΩΩ
(1/2)=
−2≤=0. It follows that
n[
X(1−X)−1/4]
L
−→ −σ
2
χ
2
(1)
sinceZ
2
d

2
(1).

334 BASIC ASYMPTOTICS: LARGE SAMPLE THEORY
Remark 2.Theorem 1 can be extended to the multivariate case but we will not pursue the
development. We refer the reader to Ferguson [29] or Serfling [102].
Remark 3.In general the asymptotic variance[g
Ω
(μ)]
2
σ
2
n
ofg(Y n)will depend on the
parameterμ. In problems of inference it will often be desirable to use transformation
gsuch that the approximate variancevarg(Y
n)is free of the parameter. Such transforma-
tions are calledvariance stabilizing transformations. Let us writeσ
2
n

2
(μ)/n. Then
finding agsuch thatvarg(Y
n)is free ofμis equivalent to finding agsuch that
g
Ω
(μ)=c/σ (μ)
for allμ, wherecis a constant independent ofμ. It follows that
g(x)=c

dx
σ(x)
. (5)
Example 3. In Example 2, $\sigma^2(\mu) = \mu(1-\mu)$. Suppose $X_1, \ldots, X_n$ are iid $b(1,p)$. Then $\sigma^2(p) = p(1-p)$ and (5) reduces to
$$g(x) = c\int \frac{dx}{\sqrt{x(1-x)}} = 2c\arcsin\sqrt{x}.$$
Normalizing so that $g(0) = 0$ and $g(1) = 1$ gives $c = 1/\pi$, and $g(x) = (2/\pi)\arcsin\sqrt{x}$.
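The sketch below (added; the values of $p$, $n$, and the replication count are arbitrary) illustrates the variance-stabilizing property: the sample standard deviation of $(2/\pi)\arcsin\sqrt{\bar{X}}$ stays close to $1/(\pi\sqrt{n})$ whatever the value of $p$, whereas the standard deviation of $\bar{X}$ itself depends strongly on $p$.

```python
import numpy as np

rng = np.random.default_rng(8)
n, reps = 400, 20_000
print("target sd of (2/pi) arcsin(sqrt(Xbar)):", 1 / (np.pi * np.sqrt(n)))

for p in [0.1, 0.3, 0.5, 0.7, 0.9]:
    xbar = rng.binomial(n, p, size=reps) / n          # sample proportions
    g = (2 / np.pi) * np.arcsin(np.sqrt(xbar))        # variance-stabilized statistic
    print(f"p={p:.1f}  sd(Xbar) = {xbar.std(ddof=1):.5f}   sd(g(Xbar)) = {g.std(ddof=1):.5f}")
```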
Remark 4.In Section 6.3 we computed exact moments of some statistics in terms of pop-
ulation parameters. Approximations for moments ofg(X)can also be obtained from series
expansions ofg. Supposegis twice differentiable atx=μ. Then
Eg(X)≈g(μ)+E(X−μ)g
Ω
(μ)+
1
2
g
ΩΩ
(μ)E(X−μ)
2
(6)
and
E[g(X)−g(μ)]
2
≈[g
Ω
(μ)]
2
E(X−μ)
2
, (7)
by dropping remainder terms. The case of most interest is to approximateEg(
X)and
varg(X). In this case, under suitable conditions, one can show that
Eg(X)≈g(μ)+
σ
2
2n
g
ΩΩ
(μ) (8)
and
varg(
X)≈
σ
2
n
[g
Ω
(μ)]
2
, (9)
whereE
X=μandvar(X)=σ
2
.
In Example 2, when the X_i's are iid b(1, p) and g(x) = x(1 − x), g'(x) = 1 − 2x, g''(x) = −2, so that

E g(X̄) = E[X̄(1 − X̄)] ≈ p(1 − p) + (σ²/2n)(−2) = p(1 − p)(n − 1)/n (since σ² = p(1 − p))

and

var g(X̄) ≈ [p(1 − p)/n](1 − 2p)².

In this case we can compute E g(X̄) and var g(X̄) exactly. We have

E g(X̄) = E X̄ − E X̄² = p − [p(1 − p)/n + p²] = p(1 − p)(n − 1)/n,

so that (8) is exact. Also, since X_i^k = X_i, using Theorem 6.3.4 we have

var g(X̄) = var(X̄ − X̄²)
= var X̄ − 2 cov(X̄, X̄²) + E X̄⁴ − (E X̄²)²
= [p(1 − p)/n]{(1 − 2p)² + [2p(1 − p)/(n − 1)][(n − 1)/n]²}.

Thus the error in approximation (9) is

Error = 2p²(1 − p)²(n − 1)/n³.
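The error formula above is easy to confirm numerically. The sketch below (the values of n and p are arbitrary) evaluates the exact variance, approximation (9), and their difference.

```python
# Compare the exact variance of X̄(1 - X̄) for iid b(1, p) samples
# with approximation (9) and the stated error term.
n, p = 25, 0.3
approx = p * (1 - p) * (1 - 2 * p) ** 2 / n                        # formula (9)
exact = (p * (1 - p) / n) * ((1 - 2 * p) ** 2
                             + (2 * p * (1 - p) / (n - 1)) * ((n - 1) / n) ** 2)
error = 2 * p ** 2 * (1 - p) ** 2 * (n - 1) / n ** 3
print(exact - approx, error)   # the two numbers agree
```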
Remark 5. Approximations (6) through (9) do not assert the existence of E g(X) or E g(X̄), or of var g(X) or var g(X̄).

Remark 6. It is possible to extend (6) through (9) to two (or more) variables by using a Taylor series expansion in two (or more) variables.

Finally, we state the following result, which gives the asymptotic distribution of the rth order statistic, 1 ≤ r ≤ n, in sampling from a population with an absolutely continuous DF F with PDF f. For a proof see Problem 4.

Theorem 2. If X_(r) denotes the rth order statistic of a sample X_1, X_2,...,X_n from an absolutely continuous DF F with PDF f, then

[n/(p(1 − p))]^{1/2} f(z_p){X_(r) − z_p} →_L Z as n → ∞,   (10)

in such a way that r/n remains fixed, r/n = p, where Z is N(0,1) and z_p is the unique solution of F(z_p) = p (that is, z_p is the population quantile of order p, assumed unique).

Remark 7. The sample quantile of order p, Z_p, is

AN( z_p, p(1 − p)/(n[f(z_p)]²) ),

where z_p is the corresponding population quantile and f is the PDF of the population distribution function. It also follows that Z_p →_P z_p.
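Remark 7 can be illustrated by simulating the sample median of a standard normal sample: its standard deviation should be close to √(p(1 − p)/n)/f(z_p) with p = 1/2, z_p = 0, f(0) = 1/√(2π). A minimal sketch (sample size and replication count are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(2)
n, reps, p = 401, 20000, 0.5
medians = np.median(rng.standard_normal((reps, n)), axis=1)
f_zp = 1 / np.sqrt(2 * np.pi)                     # N(0,1) density at z_{0.5} = 0
asymptotic_sd = np.sqrt(p * (1 - p) / n) / f_zp
print(medians.std(), asymptotic_sd)               # should be close
```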
PROBLEMS 7.7

1. In sampling from a distribution with mean μ and variance σ², find the asymptotic distribution of (a) X̄², (b) 1/X̄, (c) √n|X̄|², (d) exp(X̄), both when μ ≠ 0 and when μ = 0.

2. Let X ∼ P(λ). Then (X − λ)/√λ →_L N(0,1). Find a transformation g such that g(X) − g(λ) has an asymptotic N(0, c) distribution for large λ, where c is a suitable constant.

3. Let X_1, X_2,...,X_n be a sample from an absolutely continuous DF F with PDF f. Show that

E X_(r) ≈ F^{−1}(r/(n + 1))

and

var(X_(r)) ≈ [r(n − r + 1)]/[(n + 1)²(n + 2)] · 1/{f[F^{−1}(r/(n + 1))]}².

[Hint: Let Y be an RV with mean μ and φ be a Borel function such that Eφ(Y) exists. Expand φ(Y) about the point μ by a Taylor series expansion, and use the fact that F(X_(r)) = U_(r).]

4. Prove Theorem 2. [Hint: For any real μ and σ (> 0) compute the PDF of (U_(r) − μ)/σ and show that the standardized U_(r), that is (U_(r) − μ)/σ, is asymptotically N(0,1) under the conditions of the theorem.]

5. Let X ∼ χ²(n). Then (X − n)/√(2n) is AN(0, 1) and X/n is AN(1, 2/n). Find a transformation g such that the distribution of g(X) − g(n) is AN(0, c).

6. Suppose X is G(1, θ). Find g such that g(X) − g(θ) is AN(0, c).

7. Let X_1, X_2,...,X_n be iid RVs with E|X_1|⁴ < ∞. Let var(X) = σ² and β_2 = μ_4/σ⁴:
(a) Show, using the CLT for iid RVs, that √n(S² − σ²) →_L N(0, μ_4 − σ⁴).
(b) Find a transformation g such that g(S²) has an asymptotic distribution which depends on β_2 alone but not on σ².
8
PARAMETRIC POINT ESTIMATION

8.1 INTRODUCTION

In this chapter we study the theory of point estimation. Suppose, for example, that a random variable X is known to have a normal distribution N(μ, σ²), but we do not know one of the parameters, say μ. Suppose further that a sample X_1, X_2,...,X_n is taken on X. The problem of point estimation is to pick a (one-dimensional) statistic T(X_1, X_2,...,X_n) that best estimates the parameter μ. The numerical value of T when the realization is x_1, x_2,...,x_n is frequently called an estimate of μ, while the statistic T is called an estimator of μ. If both μ and σ² are unknown, we seek a joint statistic T = (U, V) as an estimator of (μ, σ²).
In Section 8.2 we formally describe the problem of parametric point estimation. Since
the class of all estimators in most problems is too large it is not possible to find the “best”
estimator in this class. One narrows the search somewhat by requiring that the estimators
have some specified desirable properties. We describe some of these and also outline some
criteria for comparing estimators.
Section 8.3 deals, in detail, with some important properties of statistics such as suffi-
ciency, completeness, and ancillarity. We use these properties in later sections to facilitate
our search for optimal estimators. Sufficiency, completeness, and ancillarity also have
applications in other branches of statistical inference such as testing of hypotheses and
nonparametric theory.
In Section 8.4 we investigate the criterion of unbiased estimation and study methods for
obtaining optimal estimators in the class of unbiased estimators. In Section 8.5 we derive
two lower bounds for variance of an unbiased estimator. These bounds can sometimes help
in obtaining the “best” unbiased estimator.
In Section 8.6 we describe one of the oldest methods of estimation and in Section 8.7
we study the method of maximum likelihood estimation and its large sample properties.
Section 8.8 is devoted to Bayes and minimax estimation, and Section 8.9 deals with
equivariant estimation.
8.2 PROBLEM OF POINT ESTIMATION
LetXbe an RV defined on a probability space(Ω,S,P). Suppose that the DFFofX
depends on a certain number of parameters, and suppose further that the functional form of
Fis known except perhaps for a finite number of these parameters. Letθ=(θ
1,θ2,...,θk)
be the unknown parameter associated withF.
Definition 1.The set of all admissible values of the parameters of a DFFis called the
parameter space.
Let X = (X_1, X_2,...,X_n) be an RV with DF F_θ, where θ = (θ_1, θ_2,...,θ_k) is a vector of unknown parameters, θ ∈ Θ. Let ψ be a real-valued function on Θ. In this chapter we investigate the problem of approximating ψ(θ) on the basis of the observed value x of X.

Definition 2. Let X = (X_1, X_2,...,X_n) ∼ P_θ, θ ∈ Θ. A statistic δ(X) is said to be a (point) estimator of ψ if δ : X → Θ, where X is the space of values of X.

The problem of point estimation is to find an estimator δ for the unknown parametric function ψ(θ) that has some nice properties. The value δ(x) of δ(X) for the data x is called the estimate of ψ(θ).

In most problems X_1, X_2,...,X_n are iid RVs with common DF F_θ.

Example 1. Let X_1, X_2,...,X_n be iid G(1, θ), where Θ = {θ > 0} and θ is to be estimated. Then X = R_+^n and any map δ : X → (0, ∞) is an estimator of θ. Some typical estimators of θ are X̄ = n^{−1}Σ_{j=1}^n X_j and {2/[n(n + 1)]}Σ_{j=1}^n jX_j.

Example 2. Let X_1, X_2,...,X_n be iid b(1, p) RVs, where p ∈ [0, 1]. Then X̄ is an estimator of p, and so also are δ_1(X) = X_1, δ_2(X) = (X_1 + X_n)/2, and δ_3(X) = Σ_{j=1}^n a_jX_j, where 0 ≤ a_j ≤ 1 and Σ_{j=1}^n a_j = 1.
It is clear that in any given problem of estimation we may have a large, often infinite, class of appropriate estimators to choose from. Clearly we would like the estimator δ to be close to ψ(θ). Since δ is a statistic, the usual measure of closeness |δ(X) − ψ(θ)| is itself an RV, so we interpret "δ close to ψ" to mean "close on the average." Examples of such measures of closeness are

P_θ{|δ(X) − ψ(θ)| < ε}   (1)

for some ε > 0, and

E_θ|δ(X) − ψ(θ)|^r   (2)

for some r > 0. Obviously we want (1) to be large and (2) to be small. For r = 2, the quantity defined in (2) is called the mean square error, and we denote it by

MSE_θ(δ) = E_θ{δ(X) − ψ(θ)}².   (3)

Among all estimators for ψ we would like to choose one, say δ_0, such that

P_θ{|δ_0(X) − ψ(θ)| < ε} ≥ P_θ{|δ(X) − ψ(θ)| < ε}   (4)

for all δ, all ε > 0, and all θ. In case of (2) the requirement is to choose δ_0 such that

MSE_θ(δ_0) ≤ MSE_θ(δ)   (5)

for all δ and all θ ∈ Θ. Estimators satisfying (4) or (5) do not generally exist.

We note that

MSE_θ(δ) = E_θ{δ(X) − E_θδ(X)}² + {E_θδ(X) − ψ(θ)}² = var_θ δ(X) + {b(δ, ψ)}²,   (6)

where

b(δ, ψ) = E_θδ(X) − ψ(θ)   (7)

is called the bias of δ. An estimator that has small MSE has small bias and variance. In order to control the MSE, we need to control both the variance and the bias.

One approach is to restrict attention to estimators which have zero bias, that is,

E_θδ(X) = ψ(θ) for all θ ∈ Θ.   (8)

The condition of unbiasedness (8) ensures that, on the average, the estimator δ has no systematic error; it neither over- nor underestimates ψ on the average. If we restrict attention only to the class of unbiased estimators, then we need to find an estimator δ_0 in this class such that δ_0 has the least variance for all θ ∈ Θ. The theory of unbiased estimation is developed in Section 8.4.

Another approach is to replace |δ − ψ|^r in (2) by a more general function. Let L(θ, δ) measure the loss in estimating ψ by δ. Assume that L, the loss function, satisfies L(θ, δ) ≥ 0 for all θ and δ, and L(θ, ψ(θ)) = 0 for all θ. Measure average loss by the risk function

R(θ, δ) = E_θL(θ, δ(X)).   (9)

Instead of seeking an estimator which minimizes the risk R uniformly in θ, we minimize

∫_Θ R(θ, δ)π(θ)dθ   (10)

for some weight function π on Θ, or we minimize

sup_{θ∈Θ} R(θ, δ).   (11)

The estimator that minimizes the average risk defined in (10) leads to the Bayes estimator, and the estimator that minimizes (11) leads to the minimax estimator. Bayes and minimax estimation are discussed in Section 8.8.
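The decomposition (6) is easy to see numerically. The sketch below (assuming NumPy; the normal parameters and sample size are arbitrary) checks MSE = variance + bias² for the divide-by-n variance estimator, which is biased for σ².

```python
import numpy as np

rng = np.random.default_rng(3)
n, sigma2, reps = 10, 4.0, 50000
x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
delta = x.var(axis=1)                  # divide-by-n estimator of sigma^2 (biased)
mse = np.mean((delta - sigma2) ** 2)
bias = delta.mean() - sigma2
print(mse, delta.var() + bias ** 2)    # equal up to Monte Carlo error
```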
Sometimes there are symmetries in the problem which may be used to restrict attention only to estimators which exhibit the same symmetry. Consider, for example, an experiment in which the length of life of a light bulb is measured. Then an estimator obtained from the measurements expressed in hours must agree with an estimator obtained from the measurements expressed in minutes. If X represents measurements in the original units (hours) and Y represents the corresponding measurements in transformed units (minutes), then Y = cX (here c = 60). If δ(X) is an estimator of the true mean, then we would expect δ(Y), the estimator of the true mean in the new units, to correspond to δ(X) according to the relation δ(Y) = cδ(X). That is, δ(cX) = cδ(X) for all c > 0. This is an example of an equivariant estimator, the topic of extensive discussion in Section 8.9.

Finally, we consider some large sample properties of estimators. As the sample size n → ∞, the data x are practically the whole population, and we should expect δ(X) to approach ψ(θ) in some sense. For example, if δ(X) = X̄, ψ(θ) = E_θX_1, and X_1, X_2,...,X_n are iid RVs with finite mean, then the strong law of large numbers tells us that X̄ → E_θX_1 with probability 1. This property of a sequence of estimators is called consistency.
Definition 3. Let X_1, X_2,... be a sequence of iid RVs with common DF F_θ, θ ∈ Θ. A sequence of point estimators T_n(X_1, X_2,...,X_n) = T_n will be called consistent for ψ(θ) if

T_n →_P ψ(θ) as n → ∞

for each fixed θ ∈ Θ.

Remark 1. Recall that T_n →_P ψ(θ) if and only if P{|T_n − ψ(θ)| > ε} → 0 as n → ∞ for every ε > 0. One can similarly define strong consistency of a sequence of estimators T_n if T_n →_a.s. ψ(θ). Sometimes one speaks of consistency in the rth mean when T_n →_r ψ(θ). In what follows, "consistency" will mean weak consistency of T_n for ψ(θ), that is, T_n →_P ψ(θ).

It is important to remember that consistency is a large sample property. Moreover, we speak of consistency of a sequence of estimators rather than of one point estimator.
Example 3. Let X_1, X_2,... be iid b(1, p) RVs. Then EX_1 = p, and it follows by the WLLN that

Σ_{i=1}^n X_i / n →_P p.

Thus X̄ is consistent for p. Also (Σ_{i=1}^n X_i + 1)/(n + 2) →_P p, so that a consistent estimator need not be unique. Indeed, if T_n →_P p and c_n → 0 as n → ∞, then T_n + c_n →_P p, and if d_n → 1, then d_nT_n →_P p.
Theorem 1. If X_1, X_2,... are iid RVs with common law L(X), and E|X|^p < ∞ for some positive integer p, then

Σ_{i=1}^n X_i^k / n →_P EX^k for 1 ≤ k ≤ p,

and n^{−1}Σ_{i=1}^n X_i^k is consistent for EX^k, 1 ≤ k ≤ p. Moreover, if c_n is any sequence of constants such that c_n → 0 as n → ∞, then {n^{−1}ΣX_i^k + c_n} is also consistent for EX^k, 1 ≤ k ≤ p. Also, if c_n → 1 as n → ∞, then {c_n n^{−1}ΣX_i^k} is consistent for EX^k. This is simply a restatement of the WLLN for iid RVs.
Example 4. Let X_1, X_2,... be iid N(μ, σ²) RVs. If S² is the sample variance, we know that (n − 1)S²/σ² ∼ χ²(n − 1). Thus E(S²/σ²) = 1 and var(S²/σ²) = 2/(n − 1). It follows that

P{|S² − σ²| > ε} ≤ var(S²)/ε² = 2σ⁴/[(n − 1)ε²] → 0 as n → ∞.

Thus S² →_P σ². Actually, this result holds for any sequence of iid RVs with E|X|² < ∞ and can be obtained from Theorem 1.
Example 4 is a particular case of the following theorem.

Theorem 2. If T_n is a sequence of estimators such that ET_n → ψ(θ) and var(T_n) → 0 as n → ∞, then T_n is consistent for ψ(θ).

Proof. We have

P{|T_n − ψ(θ)| > ε} ≤ ε^{−2}E{T_n − ET_n + ET_n − ψ(θ)}² = ε^{−2}{var(T_n) + (ET_n − ψ(θ))²} → 0 as n → ∞.

Other large sample properties of estimators are asymptotic unbiasedness, asymptotic normality, and asymptotic efficiency. A sequence of estimators {T_n} is asymptotically unbiased for ψ(θ) if

lim_{n→∞} E_θT_n(X) = ψ(θ)

for all θ. A consistent sequence of estimators {T_n} is said to be consistent asymptotically normal (CAN) for ψ(θ) if T_n ∼ AN(ψ(θ), v(θ)/n) for all θ ∈ Θ. If v(θ) = 1/I(θ), where I(θ) is the Fisher information (Section 8.7), then {T_n} is known as a best asymptotically normal (BAN) estimator.

Example 5. Let X_1, X_2,...,X_n be iid N(θ, 1) RVs. Then T_n = Σ_{i=1}^n X_i/(n + 1) is asymptotically unbiased for θ and a BAN estimator for θ with v(θ) = 1.

In Section 8.7 we consider large sample properties of maximum likelihood estimators, and in Section 8.5 asymptotic efficiency is introduced.
PROBLEMS 8.2

1. Suppose that T_n is a sequence of estimators for the parameter θ that satisfies the conditions of Theorem 2. Then T_n →_2 θ, that is, T_n is squared-error consistent for θ. If T_n is consistent for θ and |T_n − θ| ≤ A < ∞ for all θ and all (x_1, x_2,...,x_n) ∈ R_n, show that T_n →_2 θ. If, however, |T_n − θ| ≤ A_n < ∞, then show that T_n may not be squared-error consistent for θ.

2. Let X_1, X_2,...,X_n be a sample from U[0, θ], θ ∈ Θ = (0, ∞). Let X_(n) = max{X_1, X_2,...,X_n}. Show that X_(n) →_P θ. Write Y_n = 2X̄. Is Y_n consistent for θ?

3. Let X_1, X_2,...,X_n be iid RVs with EX_i = μ and E|X_i|² < ∞. Show that T(X_1, X_2,...,X_n) = 2[n(n + 1)]^{−1}Σ_{i=1}^n iX_i is a consistent estimator for μ.

4. Let X_1, X_2,...,X_n be a sample from U[0, θ]. Show that T(X_1, X_2,...,X_n) = (Π_{i=1}^n X_i)^{1/n} is a consistent estimator for θe^{−1}.

5. In Problem 2 show that T(X) = X_(n) is asymptotically biased for θ and is not BAN. (Show that n(θ − X_(n)) →_L G(1, θ).)

6. In Problem 5 consider the class of estimators T(X) = cX_(n), c > 0. Show that the estimator T_0(X) = (n + 2)X_(n)/(n + 1) in this class has the least MSE.

7. Let X_1, X_2,...,X_n be iid with PDF f_θ(x) = exp{−(x − θ)}, x > θ. Consider the class of estimators T(X) = X_(1) + b, b ∈ R. Show that the estimator that has the smallest MSE in this class is given by T(X) = X_(1) − 1/n.
8.3 SUFFICIENCY, COMPLETENESS AND ANCILLARITY
After the completion of any experiment, the job of a statistician is to interpret the data she
has collected and to draw some statistically valid conclusions about the population under
investigation. The raw data by themselves, besides being costly to store, are not suitable
for this purpose. Therefore the statistician would like to condense the data by computing
some statistics from them and to base her analysis on these statistics, provided that there is
“no loss of information” in doing so. In many problems of statistical inference a function
of the observations contains as much information about the unknown parameter as do all
the observed values. The following example illustrates this point.
Example 1. Let X_1, X_2,...,X_n be a sample from N(μ, 1), where μ is unknown. Suppose that we transform the variables X_1, X_2,...,X_n to Y_1, Y_2,...,Y_n with the help of an orthogonal transformation so that Y_1 is N(√n μ, 1), Y_2,...,Y_n are iid N(0, 1), and Y_1, Y_2,...,Y_n are independent. (Take y_1 = √n x̄ and, for k = 2,...,n, y_k = [(k − 1)x_k − (x_1 + ··· + x_{k−1})]/√(k(k − 1)).) To estimate μ we can use either the observed values of X_1, X_2,...,X_n or simply the observed value of Y_1 = √n X̄. The RVs Y_2, Y_3,...,Y_n provide no information about μ. Clearly, Y_1 is preferable since one need not keep a record of all the observations; it suffices to cumulate the observations and compute y_1. Any analysis of the data based on y_1 is just as effective as any analysis that could be based on the x_i's. We note that Y_1 takes values in R_1, whereas (X_1, X_2,...,X_n) takes values in R_n.
A rigorous definition of the concept involved in the above discussion requires the notion
of a conditional distribution and is beyond the scope of this book. In view of the discussion
of conditional probability distributions in Section 4.2, the following definition will suffice
for our purposes.
Definition 1. Let X = (X_1, X_2,...,X_n) be a sample from {F_θ : θ ∈ Θ}. A statistic T = T(X) is sufficient for θ, or for the family of distributions {F_θ : θ ∈ Θ}, if and only if the conditional distribution of X, given T = t, does not depend on θ (except perhaps for a null set A, P_θ{T ∈ A} = 0 for all θ).
Remark 1.The outcomeX
1,X2,...,X nis always sufficient, but we will exclude this trivial
statistic from consideration. According to Definition 1, ifTis sufficient forθ, we need
only concentrate onTsince it exhausts all the information that the sample has aboutθ.
In practice, there will be several sufficient statistics for a family of distributions, and the
question arises as to which of these should be used in a given problem. We will return to
this topic in more detail later in this section.
Example 2. We show that the statistic Y_1 in Example 1 is sufficient for μ. By construction Y_2,...,Y_n are iid N(0, 1) RVs that are independent of Y_1. Hence the conditional distribution of Y_2,...,Y_n, given Y_1 = √n x̄, is the same as the unconditional distribution of (Y_2,...,Y_n), which is multivariate normal with mean (0, 0,...,0) and dispersion matrix I_{n−1}. Since this distribution is independent of μ, the conditional distribution of (Y_1, Y_2,...,Y_n), and hence of (X_1, X_2,...,X_n), given Y_1 = y_1, is also independent of μ, and Y_1 is sufficient.
Example 3. Let X_1, X_2,...,X_n be iid b(1, p) RVs. Intuitively, if a loaded coin with probability p of heads is tossed n times, it seems unnecessary to know which toss resulted in a head. To estimate p, it should be sufficient to know the number of heads in n trials. We show that this is consistent with our definition. Let T(X_1, X_2,...,X_n) = Σ_{i=1}^n X_i. Then

P{X_1 = x_1,...,X_n = x_n | Σ_{i=1}^n X_i = t} = P{X_1 = x_1,...,X_n = x_n, T = t} / [C(n, t)p^t(1 − p)^{n−t}]

if Σ_1^n x_i = t, and = 0 otherwise. Thus, for Σ_1^n x_i = t, we have

P{X_1 = x_1,...,X_n = x_n | T = t} = p^{Σ_1^n x_i}(1 − p)^{n − Σ x_i} / [C(n, t)p^t(1 − p)^{n−t}] = 1/C(n, t),

which is independent of p. It is therefore sufficient to concentrate on Σ_1^n X_i.
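This conditional uniformity can be seen in a small simulation: given T = t, every 0–1 sequence with t ones should appear with frequency 1/C(n, t), whatever the value of p. A minimal sketch (n, p, and the replication count are illustrative):

```python
import numpy as np
from math import comb

rng = np.random.default_rng(4)
n, reps = 5, 200000
for p in (0.2, 0.7):
    x = (rng.random((reps, n)) < p).astype(int)   # iid b(1, p) samples
    t = x.sum(axis=1)
    target = np.array([1, 1, 0, 0, 0])            # one particular sequence with t = 2
    hits = (x[t == 2] == target).all(axis=1).mean()
    print(p, hits, 1 / comb(n, 2))                # ≈ 1/10 for both values of p
```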
Example 4.LetX
1,X2be iidP(λ)RVs. ThenX 1+X2is sufficient forλ,for
P{X
1=x1,X2=x2|X1+X2=t}
=



P{X
1=x1,X2=t−x 1}
P{X1+X2=t}
ift=x
1+x2,xi=0,1,2,...,
0 otherwise.
Thus, forx
i=0,1,2,...,i=1,2,x 1+x2=t,wehave
P{X
1=x1,X2=x2|X1+X2=t}=
η
t
x
1
τη
1
2
τ
t
,
which is independent ofλ.
Not every statistic is sufficient.
Example 5. Let X_1, X_2 be iid P(λ) RVs, and consider the statistic T = X_1 + 2X_2. We have

P{X_1 = 0, X_2 = 1 | X_1 + 2X_2 = 2} = P{X_1 = 0, X_2 = 1}/P{X_1 + 2X_2 = 2}
= e^{−λ}(λe^{−λ}) / [P{X_1 = 0, X_2 = 1} + P{X_1 = 2, X_2 = 0}]
= λe^{−2λ} / [λe^{−2λ} + (λ²/2)e^{−2λ}]
= 1/[1 + (λ/2)],

and we see that X_1 + 2X_2 is not sufficient for λ.
Definition 1 is not a constructive definition since it requires that we first guess a statistic
Tand then check to see whetherTis sufficient. Moreover, the procedure for checking that
Tis sufficient is quite time-consuming. We now give a criterion for determining sufficient
statistics.
Theorem 1 (The Factorization Criterion). Let X_1, X_2,...,X_n be discrete RVs with PMF p_θ(x_1, x_2,...,x_n), θ ∈ Θ. Then T(X_1, X_2,...,X_n) is sufficient for θ if and only if we can write

p_θ(x_1, x_2,...,x_n) = h(x_1, x_2,...,x_n)g_θ(T(x_1, x_2,...,x_n)),   (1)

where h is a nonnegative function of x_1, x_2,...,x_n only and does not depend on θ, and g_θ is a nonnegative nonconstant function of θ and T(x_1, x_2,...,x_n) only. The statistic T(X_1,...,X_n) and the parameter θ may be multidimensional.

Proof. Let T be sufficient for θ. Then P{X = x | T = t} is independent of θ, and we may write

P_θ{X = x} = P_θ{X = x, T(X_1, X_2,...,X_n) = t} = P_θ{T = t}P{X = x | T = t},

provided that P{X = x | T = t} is well defined.

For values of x for which P_θ{X = x} = 0 for all θ, let us define h(x_1, x_2,...,x_n) = 0, and for x for which P_θ{X = x} > 0 for some θ, we define

h(x_1, x_2,...,x_n) = P{X_1 = x_1,...,X_n = x_n | T = t}

and

g_θ(T(x_1, x_2,...,x_n)) = P_θ{T(x_1,...,x_n) = t}.

Thus we see that (1) holds.

Conversely, suppose that (1) holds. Then for fixed t_0 we have

P_θ{T = t_0} = Σ_{(x: T(x)=t_0)} P_θ{X = x} = Σ_{(x: T(x)=t_0)} g_θ(T(x))h(x) = g_θ(t_0) Σ_{T(x)=t_0} h(x).

Suppose that P_θ{T = t_0} > 0 for some θ. Then

P_θ{X = x | T = t_0} = P_θ{X = x, T(x) = t_0}/P_θ{T(x) = t_0}
= 0 if T(x) ≠ t_0, and = P_θ{X = x}/P_θ{T(x) = t_0} if T(x) = t_0.

Thus, if T(x) = t_0, then

P_θ{X = x}/P_θ{T(x) = t_0} = g_θ(t_0)h(x) / [g_θ(t_0) Σ_{T(x)=t_0} h(x)],

which is free of θ, as asserted. This completes the proof.
Remark 2.Theorem 1 holds also for the continuous case and, indeed, for quite arbitrary
families of distributions. The general proof is beyond the scope of this book, and we refer
the reader to Halmos and Savage [41] or to Lehmann [64, pp. 53–56]. We will assume that
the result holds for the absolutely continuous case. We leave the reader to write the analog
of (1) and to prove it, at least under the regularity conditions assumed in Theorem 4.4.2.
Remark 3.Theorem 1 (and its analog for the continuous case) holds ifθis a vector of
parameters andTis a multiple RV, and we say thatTisjointly sufficientforθ. We empha-
size that, even ifθis scalar,Tmay be multidimensional (Example 9). IfθandTare of
the same dimension, and ifTis sufficient forθ, it does not follow that thejth component
ofTis sufficient for thejth component ofθ(Example 8). The converse is true under mild
conditions (see Fraser [32, p. 21]).
Remark 4. If T is sufficient for θ, any one-to-one function of T is also sufficient. This follows from Theorem 1: if U = k(T) is a one-to-one function of T, then t = k^{−1}(u) and we can write

f_θ(x) = g_θ(t)h(x) = g_θ(k^{−1}(u))h(x) = g*_θ(u)h(x).

If T_1, T_2 are two distinct sufficient statistics, then

f_θ(x) = g_θ(t_1)h_1(x) = g_θ(t_2)h_2(x),

and it follows that T_1 is a function of T_2. It does not follow, however, that every function of a sufficient statistic is itself sufficient. For example, in sampling from a normal population, X̄ is sufficient for the mean μ but X̄² is not. Note that X̄ is sufficient for μ².
Remark 5.As a rule, Theorem 1 cannot be used to show that a given statisticTis not
sufficient. To do this, one would normally have to use the definition of sufficiency. In
most cases Theorem 1 will lead to a sufficient statistic if it exists.
Remark 6.IfT(X)is sufficient for{F
θ:θ∈Θ}, thenTis sufficient for{F θ:θ∈ω},
whereω⊆Θ. This follows trivially from the definition.
Example 6. Let X_1, X_2,...,X_n be iid b(1, p) RVs. Then T = Σ_{i=1}^n X_i is sufficient. We have

P_p{X_1 = x_1, X_2 = x_2,...,X_n = x_n} = p^{Σ_1^n x_i}(1 − p)^{n − Σ_1^n x_i},

and, taking

h(x_1, x_2,...,x_n) = 1 and g_p(x_1, x_2,...,x_n) = (1 − p)^n [p/(1 − p)]^{Σ_{i=1}^n x_i},

we see that T is sufficient. We note that T_1(X) = (X_1, X_2 + X_3 + ··· + X_n) and T_2(X) = (X_1 + X_2, X_3, X_4 + X_5 + ··· + X_n) are also sufficient for p, although T is preferable to T_1 or T_2.
Example 7. Let X_1, X_2,...,X_n be iid RVs with common PMF

P{X_i = k} = 1/N, k = 1, 2,...,N; i = 1, 2,...,n.

Then

P_N{X_1 = k_1, X_2 = k_2,...,X_n = k_n} = 1/N^n if 1 ≤ k_1,...,k_n ≤ N
= (1/N^n)ϕ(1, min_{1≤i≤n} k_i)ϕ(max_{1≤i≤n} k_i, N),

where ϕ(a, b) = 1 if b ≥ a, and = 0 if b < a. It follows, by taking g_N[max(k_1,...,k_n)] = (1/N^n)ϕ(max_{1≤i≤n} k_i, N) and h = ϕ(1, min k_i), that max(X_1, X_2,...,X_n) is sufficient for the family of joint PMFs P_N.
Example 8. Let X_1, X_2,...,X_n be a sample from N(μ, σ²), where both μ and σ² are unknown. The joint PDF of (X_1, X_2,...,X_n) is

f_{μ,σ²}(x) = [1/(σ√(2π))^n] exp{−Σ(x_i − μ)²/(2σ²)}
= [1/(σ√(2π))^n] exp{−Σ_1^n x_i²/(2σ²) + μΣ_1^n x_i/σ² − nμ²/(2σ²)}.

It follows that the statistic

T(X_1,...,X_n) = (Σ_1^n X_i, Σ_1^n X_i²)

is jointly sufficient for the parameter (μ, σ²). An equivalent sufficient statistic that is frequently used is T_1(X_1,...,X_n) = (X̄, S²). Note that X̄ is not sufficient for μ if σ² is unknown, and S² is not sufficient for σ² if μ is unknown. If, however, σ² is known, X̄ is sufficient for μ. If μ = μ_0 is known, Σ_1^n(X_i − μ_0)² is sufficient for σ².
Example 9. Let X_1, X_2,...,X_n be a sample from the PDF

f_θ(x) = 1/θ for x ∈ [−θ/2, θ/2], θ > 0, and = 0 otherwise.

The joint PDF of X_1, X_2,...,X_n is given by

f_θ(x_1, x_2,...,x_n) = (1/θ^n) I_A(x_1,...,x_n),

where

A = {(x_1, x_2,...,x_n) : −θ/2 ≤ min x_i ≤ max x_i ≤ θ/2}.

It follows that (X_(1), X_(n)) is sufficient for θ.

We note that the order statistic (X_(1), X_(2),...,X_(n)) is also sufficient. Note also that the parameter is one-dimensional, the statistic (X_(1), X_(n)) is two-dimensional, whereas the order statistic is n-dimensional.

In Example 9 we saw that the order statistic is sufficient. This is not a mere coincidence. In fact, if X = (X_1, X_2,...,X_n) are exchangeable, then the joint PDF of X is a symmetric function of its arguments. Thus

f_θ(x_1, x_2,...,x_n) = f_θ(x_(1), x_(2),...,x_(n)),

and it follows that the order statistic is sufficient for f_θ.
The concept of sufficiency is frequently used with another concept, called completeness, which we now define.

Definition 2. Let {f_θ(x), θ ∈ Θ} be a family of PDFs (or PMFs). We say that this family is complete if

E_θg(X) = 0 for all θ ∈ Θ

implies

P_θ{g(X) = 0} = 1 for all θ ∈ Θ.

Definition 3. A statistic T(X) is said to be complete if the family of distributions of T is complete.

In Definition 3, X will usually be a multiple RV. The family of distributions of T is obtained from the family of distributions of X_1, X_2,...,X_n by the usual transformation technique discussed in Section 4.4.
Example 10. Let X_1, X_2,...,X_n be iid b(1, p) RVs. Then T = Σ_1^n X_i is a sufficient statistic. We show that T is also complete, that is, the family of distributions of T, {b(n, p), 0 < p < 1}, is complete.

E_pg(T) = Σ_{t=0}^n g(t)C(n, t)p^t(1 − p)^{n−t} = 0 for all p ∈ (0, 1)

may be rewritten as

(1 − p)^n Σ_{t=0}^n g(t)C(n, t)[p/(1 − p)]^t = 0 for all p ∈ (0, 1).

This is a polynomial in p/(1 − p). Hence the coefficients must vanish, and it follows that g(t) = 0 for t = 0, 1, 2,...,n, as required.
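The polynomial argument can be mirrored numerically: stacking the equations E_p g(T) = 0 for n + 1 distinct values of p gives a linear system whose coefficient matrix is nonsingular, so g(0) = ··· = g(n) = 0 is the only solution. A minimal sketch (the grid of p values is an arbitrary choice):

```python
import numpy as np
from math import comb

n = 6
p_grid = np.linspace(0.1, 0.9, n + 1)             # n + 1 distinct values of p
# M[i, t] = C(n, t) p_i^t (1 - p_i)^(n - t), so M @ g is the vector of E_p g(T)
M = np.array([[comb(n, t) * p ** t * (1 - p) ** (n - t) for t in range(n + 1)]
              for p in p_grid])
print(np.linalg.matrix_rank(M))   # full rank n + 1: only g = 0 solves M @ g = 0
```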
Example 11. Let X be N(0, θ). Then the family of PDFs {N(0, θ), θ > 0} is not complete since EX = 0 and g(x) = x is not identically 0. Note that T(X) = X² is complete, for the PDF of X² ∼ θχ²(1) is given by

f(t) = e^{−t/2θ}/√(2πθt) for t > 0, and = 0 otherwise.

E_θg(T) = (1/√(2πθ)) ∫_0^∞ g(t)t^{−1/2}e^{−t/2θ} dt = 0 for all θ > 0,

which holds if and only if ∫_0^∞ g(t)t^{−1/2}e^{−t/2θ} dt = 0, and using the uniqueness property of Laplace transforms, it follows that

g(t)t^{−1/2} = 0 for all t > 0,

that is, g(t) = 0.
The next example illustrates the existence of a sufficient statistic which is not complete.
Example 12. Let X_1, X_2,...,X_n be a sample from N(θ, θ²). Then T = (Σ_1^n X_i, Σ_1^n X_i²) is sufficient for θ. However, T is not complete, since

E_θ{ 2(Σ_1^n X_i)² − (n + 1)Σ_1^n X_i² } = 0 for all θ,

and the function g(x_1,...,x_n) = 2(Σ_1^n x_i)² − (n + 1)Σ_1^n x_i² is not identically 0.
Example 13. Let X ∼ U(0, θ), θ ∈ (0, ∞). We show that the family of PDFs of X is complete. We need to show that

E_θg(X) = ∫_0^θ (1/θ)g(x) dx = 0 for all θ > 0

if and only if g(x) = 0 for all x. In general, this result follows from Lebesgue integration theory. If g is continuous, we differentiate both sides in

∫_0^θ g(x) dx = 0

to get g(θ) = 0 for all θ > 0.

Now let X_1, X_2,...,X_n be iid U(0, θ) RVs. Then the PDF of X_(n) is given by

f_n(x | θ) = nθ^{−n}x^{n−1} for 0 < x < θ, and = 0 otherwise.

We see by a similar argument that X_(n) is complete, which is the same as saying that {f_n(x | θ); θ > 0} is a complete family of densities. Clearly, X_(n) is sufficient.
Example 14. Let X_1, X_2,...,X_n be a sample from the PMF

P_N(x) = 1/N, x = 1, 2,...,N, and = 0 otherwise.

We first show that the family of PMFs {P_N, N ≥ 1} is complete. We have

E_Ng(X) = (1/N)Σ_{k=1}^N g(k) = 0 for all N ≥ 1,

and this happens if and only if g(k) = 0, k = 1, 2,...,N. Next we consider the family of PMFs of X_(n) = max(X_1,...,X_n). The PMF of X_(n) is given by

P_N^(n)(x) = x^n/N^n − (x − 1)^n/N^n, x = 1, 2,...,N.

Also

E_Ng(X_(n)) = Σ_{k=1}^N g(k)[k^n/N^n − (k − 1)^n/N^n] = 0 for all N ≥ 1.

E_1g(X_(n)) = g(1) = 0

implies g(1) = 0. Again,

E_2g(X_(n)) = g(1)/2^n + g(2)(1 − 1/2^n) = 0,

so that g(2) = 0. Using an induction argument, we conclude that g(1) = g(2) = ··· = g(N) = 0 and hence g(x) = 0. It follows that P_N^(n) is a complete family of distributions, and X_(n) is a complete sufficient statistic.

Now suppose that we exclude the value N = n_0 for some fixed n_0 ≥ 1 from the family {P_N : N ≥ 1}. Let us write P = {P_N : N ≥ 1, N ≠ n_0}. Then P is not complete. We ask the reader to show that the class of all functions g such that E_Pg(X) = 0 for all P ∈ P consists of functions of the form

g(k) = 0 for k = 1, 2,...,n_0 − 1, n_0 + 2, n_0 + 3,...,
g(n_0) = c, g(n_0 + 1) = −c,

where c is a constant, c ≠ 0.
Remark 7.Completeness is a property of a family of distributions. In Remark 6 we saw
that if a statistic is sufficient for a class of distributions it is sufficient for any subclass of
those distributions. Completeness works in the opposite direction. Example 14 shows that
the exclusion of even one member from the family{P
N:N≥1}destroys completeness.
The following result covers a large class of probability distributions for which a
complete sufficient statistic exists.
Theorem 2. Let {f_θ : θ ∈ Θ} be a k-parameter exponential family given by

f_θ(x) = exp{ Σ_{j=1}^k Q_j(θ)T_j(x) + D(θ) + S(x) },   (2)

where θ = (θ_1, θ_2,...,θ_k) ∈ Θ, an interval in R_k, T_1, T_2,...,T_k, and S are defined on R_n, T = (T_1, T_2,...,T_k), and x = (x_1, x_2,...,x_n), k ≤ n. Let Q = (Q_1, Q_2,...,Q_k), and suppose that the range of Q contains an open set in R_k. Then

T = (T_1(X), T_2(X),...,T_k(X))

is a complete sufficient statistic.
Proof. For a complete proof in a general setting we refer the reader to Lehmann [64, pp. 142–143]. Essentially, the unicity of the Laplace transform is used on the probability distribution induced by T. We will content ourselves here with proving the result for the k = 1 case when f_θ is a PMF.

Let us write Q(θ) = θ in (2), and let (α, β) ⊆ Θ. We wish to show that

E_θg(T(X)) = Σ_t g(t)P_θ{T(X) = t} = Σ_t g(t)exp{θt + D(θ) + S*(t)} = 0 for all θ   (3)

implies that g(t) = 0.

Let us write x^+ = x if x ≥ 0, = 0 if x < 0, and x^− = −x if x < 0, = 0 if x ≥ 0. Then g(t) = g^+(t) − g^−(t), and both g^+ and g^− are nonnegative functions. In terms of g^+ and g^−, (3) is the same as

Σ_t g^+(t)e^{θt + S*(t)} = Σ_t g^−(t)e^{θt + S*(t)}   (4)

for all θ.

Let θ_0 ∈ (α, β) be fixed, and write

p^+(t) = g^+(t)e^{θ_0 t + S*(t)} / Σ_t g^+(t)e^{θ_0 t + S*(t)} and p^−(t) = g^−(t)e^{θ_0 t + S*(t)} / Σ_t g^−(t)e^{θ_0 t + S*(t)}.   (5)

Then both p^+ and p^− are PMFs, and it follows from (4) that

Σ_t e^{δt}p^+(t) = Σ_t e^{δt}p^−(t)   (6)

for all δ ∈ (α − θ_0, β − θ_0). By the uniqueness of MGFs, (6) implies that

p^+(t) = p^−(t) for all t,

and hence that g^+(t) = g^−(t) for all t, which is equivalent to g(t) = 0 for all t. Since T is clearly sufficient (by the factorization criterion), it is proved that T is a complete sufficient statistic.
Example 15. Let X_1, X_2,...,X_n be iid N(μ, σ²) RVs, where both μ and σ² are unknown. We know that the family of distributions of X = (X_1,...,X_n) is a two-parameter exponential family with T(X_1,...,X_n) = (Σ_1^n X_i, Σ_1^n X_i²). From Theorem 2 it follows that T is a complete sufficient statistic. Examples 10 and 11 fall in the domain of Theorem 2.

In Examples 6, 8, and 9 we have shown that a given family of probability distributions that admits a nontrivial sufficient statistic usually admits several sufficient statistics. Clearly we would like to be able to choose the sufficient statistic that results in the greatest reduction of data collection. We next study the notion of a minimal sufficient statistic. For this purpose it is convenient to introduce the notion of a sufficient partition. The reader will recall that a partition of a space X is just a collection of disjoint sets E_α such that ∪_α E_α = X. Any statistic T(X_1, X_2,...,X_n) induces a partition of the space of values of (X_1, X_2,...,X_n), that is, T induces a covering of X by a family U of disjoint sets A_t = {(x_1, x_2,...,x_n) ∈ X : T(x_1, x_2,...,x_n) = t}, where t belongs to the range of T. The sets A_t are called partition sets. Conversely, given a partition, any assignment of a number to each set so that no two partition sets have the same number assigned defines a statistic. Clearly this function is not, in general, unique.
Definition 4. Let {F_θ : θ ∈ Θ} be a family of DFs, and let X = (X_1, X_2,...,X_n) be a sample from F_θ. Let U be a partition of the sample space induced by a statistic T = T(X_1, X_2,...,X_n). We say that U = {A_t : t is in the range of T} is a sufficient partition for θ (or for the family {F_θ : θ ∈ Θ}) if the conditional distribution of X, given T = t, does not depend on θ for any A_t, provided that the conditional probability is well defined.

Example 16. Let X_1, X_2,...,X_n be iid b(1, p) RVs. The sample space of values of (X_1, X_2,...,X_n) is the set of n-tuples (x_1, x_2,...,x_n), where each x_i = 0 or 1, and it consists of 2^n points. Let T(X_1, X_2,...,X_n) = Σ_1^n X_i, and consider the partition U = {A_0, A_1,...,A_n}, where x ∈ A_j if and only if Σ_1^n x_i = j, 0 ≤ j ≤ n. Each A_j contains C(n, j) sample points. The conditional probability

P_p{x | A_j} = P_p{x}/P_p(A_j) = C(n, j)^{−1} if x ∈ A_j,

and we see that U is a sufficient partition.
Example 17. Let X_1, X_2,...,X_n be iid U[0, θ] RVs. Consider the statistic T(X) = max_{1≤i≤n} X_i. The space of values of X_1, X_2,...,X_n is the set of points {x : 0 ≤ x_i ≤ θ, i = 1, 2,...,n}. T induces a partition U on this set. The sets of this partition are A_t = {(x_1, x_2,...,x_n) : max(x_1,...,x_n) = t}, t ∈ [0, θ]. We have

f_θ(x | t) = f_θ(x)/f_θ^T(t) if x ∈ A_t,

where f_θ^T(t) is the PDF of T. We have

f_θ(x | t) = (1/θ^n)/(nt^{n−1}/θ^n) = 1/(nt^{n−1}) if x ∈ A_t.

It follows that U = {A_t} defines a sufficient partition.

Remark 8. Clearly a sufficient statistic T for a family of DFs {F_θ : θ ∈ Θ} induces a sufficient partition and, conversely, given a sufficient partition, we can define a sufficient statistic (not necessarily uniquely) for the family.

Remark 9. Two statistics T_1, T_2 that define the same partition must be in one-to-one correspondence, that is, there exists a function h such that T_1 = h(T_2) with a unique inverse, T_2 = h^{−1}(T_1). It follows that if T_1 is sufficient, every one-to-one function of T_1 is also sufficient.

Let U_1, U_2 be two partitions of a space X. We say that U_1 is a subpartition of U_2 if every partition set in U_2 is a union of sets of U_1. We sometimes say also that U_1 is finer than U_2 (U_2 is coarser than U_1), or that U_2 is a reduction of U_1. In this case, a statistic T_2 that defines U_2 must be a function of any statistic T_1 that defines U_1. Clearly, this function need not have a unique inverse unless the two partitions have exactly the same partition sets.

Given a family of distributions {F_θ : θ ∈ Θ} for which a sufficient partition exists, we seek to find a sufficient partition U that is as coarse as possible, that is, any reduction of U leads to a partition that is not sufficient.
Definition 5. A partition U is said to be minimal sufficient if

(i) U is a sufficient partition, and
(ii) if C is any sufficient partition, C is a subpartition of U.

The question of the existence of the minimal partition was settled by Lehmann and Scheffé [65] and, in general, involves measure-theoretic considerations. However, in the cases that we consider, where the sample space is either discrete or a finite-dimensional Euclidean space and the family of distributions of X is defined by a family of PDFs (PMFs) {f_θ, θ ∈ Θ}, such difficulties do not arise. The construction may be described as follows.

Two points x and y in the sample space are said to be likelihood equivalent, and we write x ∼ y, if and only if there exists a k(y, x) ≠ 0 which does not depend on θ such that f_θ(y) = k(y, x)f_θ(x). We leave the reader to check that "∼" is an equivalence relation (that is, it is reflexive, symmetric, and transitive), and hence "∼" defines a partition of the sample space. This partition defines the minimal sufficient partition.
Example 18. Consider again Example 16. Then

f_p(x)/f_p(y) = p^{Σx_i − Σy_i}(1 − p)^{−Σx_i + Σy_i},

and this ratio is independent of p if and only if

Σ_1^n x_i = Σ_1^n y_i,

so that x ∼ y if and only if Σ_1^n x_i = Σ_1^n y_i. It follows that the partition U = {A_0, A_1,...,A_n}, where x ∈ A_j if and only if Σ_1^n x_i = j, introduced in Example 16 is minimal sufficient.
A rigorous proof of the above assertion is beyond the scope of this book. The basic ideas are outlined in the following theorem.

Theorem 3. The relation "∼" defined above induces a minimal sufficient partition.

Proof. If T is a sufficient statistic, we have to show that x ∼ y whenever T(x) = T(y). This will imply that every set of the minimal sufficient partition is a union of sets of the form A_t = {T = t}, proving condition (ii) of Definition 5.

Sufficiency of T means that whenever x ∈ A_t, then

f_θ{x | T = t} = f_θ(x)/f_θ^T(t), x ∈ A_t,

is free of θ. It follows that if both x and y ∈ A_t, then

f_θ(x | t)/f_θ(y | t) = f_θ(x)/f_θ(y)

is independent of θ, and hence x ∼ y.

To prove the sufficiency of the minimal sufficient partition U, let T_1 be an RV that induces U. Then T_1 takes on distinct values over distinct sets of U but remains constant on the same set. If x ∈ {T_1 = t_1}, then

f_θ(x | T_1 = t_1) = f_θ(x)/P_θ{T_1 = t_1}.   (7)

Now

P_θ{T_1 = t_1} = ∫_{(y: T_1(y)=t_1)} f_θ(y) dy or Σ_{(y: T_1(y)=t_1)} f_θ(y),

depending on whether the joint distribution of X is absolutely continuous or discrete. Since f_θ(x)/f_θ(y) is independent of θ whenever x ∼ y, it follows that the ratio on the right-hand side of (7) does not depend on θ. Thus T_1 is sufficient.
Definition 6.A statistic that induces the minimal sufficient partition is called a minimal
sufficient statistic.
In view of Theorem 3 a minimal sufficient statistic is a function of every sufficient
statistic. It follows that ifT
1andT 2are both minimal sufficient, then both must induce the
same minimal sufficient partition and henceT
1andT 2must be equivalent in the sense that
each must be a function of the other (with probability 1).
How does one show that a statistic T is not sufficient for a family of distributions P? Other than using the definition of sufficiency, one can sometimes use a result of Lehmann and Scheffé [65] according to which, if T_1(X) is sufficient for θ, θ ∈ Θ, then T_2(X) is also sufficient if and only if T_1(X) = g(T_2(X)) for some Borel-measurable function g and all x ∈ B, where B is a Borel set with P_θB = 1.

Another way to prove T nonsufficient is to show that there exist x and y for which T(x) = T(y) but x and y are not likelihood equivalent. We refer to Sampson and Spencer [98] for this and other similar results.
The following important result will be proved in the next section.
Theorem 4.A complete sufficient statistic is minimal sufficient.
We emphasize that the converse is not true. A minimal sufficient statistic may not be
complete.
Example 19. Suppose X ∼ U(θ, θ + 1). Then X is a minimal sufficient statistic. However, X is not complete. Take, for example, g(x) = sin 2πx. Then

E g(X) = ∫_θ^{θ+1} sin 2πx dx = ∫_0^1 sin 2πx dx = 0

for all θ, and it follows that X is not complete.

If X_1, X_2,...,X_n is a sample from U(θ, θ + 1), then (X_(1), X_(n)) is minimal sufficient for θ but not complete, since

E_θ(X_(n) − X_(1)) = (n − 1)/(n + 1)

for all θ.
Finally, we consider statistics that have distributions free of the parameter(s) θ and seem to contain no information about θ. We will see (Example 23) that such statistics can sometimes provide useful information about θ.

Definition 7. A statistic A(X) is said to be ancillary if its distribution does not depend on the underlying model parameter θ.

Example 20. Let X_1, X_2,...,X_n be a random sample from N(μ, 1). Then the statistic A(X) = (n − 1)S² = Σ_{i=1}^n(X_i − X̄)² is ancillary since (n − 1)S² ∼ χ²(n − 1), which is free of μ. Some other ancillary statistics are

X_1 − X̄, X_(n) − X_(1), Σ_{i=1}^n |X_i − X̄|.

Also X̄, a complete sufficient statistic (hence minimal sufficient) for μ, is independent of A(X).

Example 21. Let X_1, X_2,...,X_n be a random sample from N(0, σ²). Then A(X) = X̄ follows a N(0, n^{−1}σ²) distribution and is not ancillary with respect to the parameter σ².

Example 22. Let X_(1), X_(2),...,X_(n) be the order statistics of a random sample from the PDF f(x − θ), where θ ∈ R. Then the statistic A(X) = (X_(2) − X_(1),...,X_(n) − X_(1)) is ancillary for θ.

In Example 20 we saw that S² was independent of the minimal sufficient statistic X̄. The following result, due to Basu, shows that this is not a mere coincidence.
Theorem 5. If S(X) is a complete sufficient statistic for θ, then any ancillary statistic A(X) is independent of S.

Proof. If A is ancillary, then P_θ{A(X) ≤ a} is free of θ for all a. Consider the conditional probability g_a(s) = P{A(X) ≤ a | S(X) = s}. Clearly

E_θ[g_a(S(X))] = P_θ{A(X) ≤ a}.

Thus

E_θ(g_a(S) − P{A(X) ≤ a}) = 0

for all θ. By completeness of S it follows that

P_θ{g_a(S) − P{A ≤ a} = 0} = 1,

that is,

P_θ{A(X) ≤ a | S(X) = s} = P{A(X) ≤ a},

with probability 1. Hence A and S are independent.

The converse of Basu's theorem is not true. A statistic S that is independent of every ancillary statistic need not be complete (see, for example, Lehmann [62]).
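Basu's theorem explains, for example, why X̄ and S² are independent in normal sampling with known variance: X̄ is complete sufficient for μ while S² is ancillary for μ. A simulation sketch of this independence (the parameters are arbitrary; zero sample correlation is a necessary consequence of independence and serves here as a quick check):

```python
import numpy as np

rng = np.random.default_rng(5)
n, reps = 10, 50000
x = rng.normal(2.0, 1.0, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)
print(np.corrcoef(xbar, s2)[0, 1])   # close to 0
```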
The following example, due to R.A. Fisher, shows that if there is no sufficient statistic for θ, but there exists a reasonable statistic not independent of an ancillary statistic A(X), then the recovery of information is sometimes helped by the ancillary statistic via a conditional analysis. Unfortunately, the lack of uniqueness of ancillary statistics creates problems with this conditional analysis.

Example 23. Let X_1, X_2,...,X_n be a random sample from an exponential distribution with mean θ, and let Y_1, Y_2,...,Y_n be another random sample from an exponential distribution with mean 1/θ. Assume the X's and Y's are independent and consider the problem of estimating θ based on the observations (X_1, X_2,...,X_n; Y_1, Y_2,...,Y_n). Let S_1(x) = Σ_{i=1}^n x_i and S_2(y) = Σ_{i=1}^n y_i. Then (S_1(X), S_2(Y)) is jointly sufficient for θ. It is easily seen that (S_1, S_2) is a minimal sufficient statistic for θ.

Consider the statistics

S(X, Y) = [S_1(X)/S_2(Y)]^{1/2} and A(X, Y) = S_1(X)S_2(Y).

Then the joint PDF of S and A is given by

{2/[Γ(n)]²} exp{ −A(x, y)[S(x, y)/θ + θ/S(x, y)] } [A(x, y)]^{2n−1}/S(x, y),

and it is clear that S and A are not independent. The marginal distribution of A is given by the PDF

C(x, y)[A(x, y)]^{2n−1},

where C(x, y) is the constant of integration, which depends only on x, y, and n but not on θ. In fact, C(x, y) = 4K_0[2A(x, y)]/[Γ(n)]², where K_0 is the standard form of a Bessel function (Watson [116]). Consequently A is ancillary for θ.

Clearly, the conditional PDF of S given A = a is of the form

{1/[2K_0(2a)S(x, y)]} exp{ −a[S(x, y)/θ + θ/S(x, y)] }.

The amount of information lost by using S(X, Y) alone is a (1/(2n + 1))th part of the total, and this loss of information is recovered by knowledge of the ancillary statistic A(X, Y). These calculations will be discussed in Example 8.5.9.
PROBLEMS 8.3

1. Find a sufficient statistic in each of the following cases, based on a random sample of size n:
(a) X ∼ B(α, β) when (i) α is unknown, β known; (ii) β is unknown, α known; and (iii) α, β are both unknown.
(b) X ∼ G(α, β) when (i) α is unknown, β known; (ii) β is unknown, α known; and (iii) α, β are both unknown.
(c) X ∼ P_{N_1,N_2}(x), where

P_{N_1,N_2}(x) = 1/(N_2 − N_1), x = N_1 + 1, N_1 + 2,...,N_2,

and N_1, N_2 (N_1 < N_2) are integers, when (i) N_1 is known, N_2 unknown; (ii) N_2 known, N_1 unknown; and (iii) N_1, N_2 are both unknown.
(d) X ∼ f_θ(x), where f_θ(x) = e^{−x+θ} if θ < x < ∞, and = 0 otherwise.
(e) X ∼ f(x; μ, σ), where

f(x; μ, σ) = [1/(xσ√(2π))] exp{ −(log x − μ)²/(2σ²) }, x > 0.

(f) X ∼ f_θ(x), where

f_θ(x) = P_θ{X = x} = c(θ)2^{−x/θ}, x = θ, θ + 1,..., θ > 0,

and c(θ) = 2^{1−1/θ}(2^{1/θ} − 1).
(g) X ∼ P_{θ,p}(x), where

P_{θ,p}(x) = (1 − p)p^{x−θ}, x = θ, θ + 1,..., 0 < p < 1,

when (i) p is known, θ unknown; (ii) p is unknown, θ known; and (iii) p, θ are both unknown.
2. Let X = (X_1, X_2,...,X_n) be a sample from N(ασ, σ²), where α is a known real number. Show that the statistic T(X) = (Σ_{i=1}^n X_i, Σ_{i=1}^n X_i²) is sufficient for σ but that the family of distributions of T(X) is not complete.

3. Let X_1, X_2,...,X_n be a sample from N(μ, σ²). Then X = (X_1, X_2,...,X_n) is clearly sufficient for the family N(μ, σ²), μ ∈ R, σ > 0. Is the family of distributions of X complete?

4. Let X_1, X_2,...,X_n be a sample from U(θ − 1/2, θ + 1/2), θ ∈ R. Show that the statistic T(X_1,...,X_n) = (min X_i, max X_i) is sufficient for θ but not complete.

5. If T = g(U) and T is sufficient, then so also is U.

6. In Example 14 show that the class of all functions g for which E_Pg(X) = 0 for all P ∈ P consists of functions of the form

g(k) = 0 for k = 1, 2,...,n_0 − 1, n_0 + 2, n_0 + 3,...,
g(n_0) = c, g(n_0 + 1) = −c,

where c is a constant.

7. For the class {F_{θ_1}, F_{θ_2}} of two DFs, where F_{θ_1} is N(0, 1) and F_{θ_2} is C(1, 0), find a sufficient statistic.

8. Consider the class of hypergeometric probability distributions {P_D : D = 0, 1, 2,...,N}, where

P_D{X = x} = C(N, n)^{−1}C(D, x)C(N − D, n − x), x = 0, 1,...,min{n, D}.

Show that it is a complete class. If P = {P_D : D = 0, 1, 2,...,N, D ≠ d, d integral, 0 ≤ d ≤ N}, is P complete?

9. Is the family of distributions of the order statistic in sampling from a Poisson distribution complete?

10. Let (X_1, X_2,...,X_n) be a random vector of the discrete type. Is the statistic T(X_1,...,X_n) = (X_1,...,X_{n−1}) sufficient?

11. Let X_1, X_2,...,X_n be a random sample from a population with law L(X). Find a minimal sufficient statistic in each of the following cases:
(a) X ∼ P(λ).
(b) X ∼ U[0, θ].
(c) X ∼ NB(1; p).
(d) X ∼ P_N, where P_N{X = k} = 1/N if k = 1, 2,...,N, and = 0 otherwise.
(e) X ∼ N(μ, σ²).
(f) X ∼ G(α, β).
(g) X ∼ B(α, β).
(h) X ∼ f_θ(x), where f_θ(x) = (2/θ²)(θ − x), 0 < x < θ.
12. Let X_1, X_2 be a sample of size 2 from P(λ). Show that the statistic X_1 + αX_2, where α > 1 is an integer, is not sufficient for λ.

13. Let X_1, X_2,...,X_n be a sample from the PDF

f_θ(x) = (x/θ)e^{−x²/2θ} if x > 0, and = 0 if x ≤ 0; θ > 0.

Show that Σ_{i=1}^n X_i² is a minimal sufficient statistic for θ, but Σ_{i=1}^n X_i is not sufficient.

14. Let X_1, X_2,...,X_n be a sample from N(0, σ²). Show that Σ_{i=1}^n X_i² is a minimal sufficient statistic but Σ_{i=1}^n X_i is not sufficient for σ².

15. Let X_1, X_2,...,X_n be a sample from the PDF f_{α,β}(x) = βe^{−β(x−α)} if x > α, and = 0 if x ≤ α. Find a minimal sufficient statistic for (α, β).

16. Let T be a minimal sufficient statistic. Show that a necessary condition for a sufficient statistic U to be complete is that U be minimal.

17. Let X_1, X_2,...,X_n be iid N(μ, σ²). Show that (X̄, S²) is independent of each of (X_(n) − X_(1))/S, (X_(n) − X̄)/S, and Σ_{i=1}^{n−1}(X_{i+1} − X_i)²/S².

18. Let X_1, X_2,...,X_n be iid N(θ, 1). Show that a necessary and sufficient condition for Σ_{i=1}^n a_iX_i and Σ_{i=1}^n X_i to be independent is Σ_{i=1}^n a_i = 0.

19. Let X_1, X_2,...,X_n be a random sample from f_θ(x) = exp{−(x − θ)}, x > θ. Show that X_(1) is a complete sufficient statistic which is independent of S².

20. Let X_1, X_2,...,X_n be iid RVs with common PDF f_θ(x) = (1/θ)exp(−x/θ), x > 0, θ > 0. Show that X̄ must be independent of every scale-invariant statistic such as X_1/Σ_{j=1}^n X_j.

21. Let T_1, T_2 be two statistics with common domain D. Then T_1 is a function of T_2 if and only if

for all x, y ∈ D, T_2(x) = T_2(y) ⟹ T_1(x) = T_1(y).

22. Let S be the support of f_θ, θ ∈ Θ, and let T be a statistic such that for some θ_1, θ_2 ∈ Θ and x, y ∈ S, x ≠ y, T(x) = T(y) but f_{θ_1}(x)f_{θ_2}(y) ≠ f_{θ_2}(x)f_{θ_1}(y). Then show that T is not sufficient for θ.

23. Let X_1, X_2,...,X_n be iid N(θ, 1). Use the result in Problem 22 to show that (Σ_1^n X_i)² is not sufficient for θ.

24. (a) If T is complete, then show that any one-to-one mapping of T is also complete.
(b) Show with the help of an example that a complete statistic is not unique for a family of distributions.
8.4 UNBIASED ESTIMATION

In this section we focus attention on the class of unbiased estimators. We develop a criterion to check whether an unbiased estimator is optimal in this class. Using sufficiency and completeness, we describe a method of constructing uniformly minimum variance unbiased estimators.

Definition 1. Let {F_θ, θ ∈ Θ}, Θ ⊆ R_k, be a nonempty set of probability distributions. Let X = (X_1, X_2,...,X_n) be a multiple RV with DF F_θ and sample space X. Let ψ : Θ → R be a real-valued parametric function. A Borel-measurable function T : X → Θ is said to be unbiased for ψ if

E_θT(X) = ψ(θ) for all θ ∈ Θ.   (1)

Any parametric function ψ for which there exists a T satisfying (1) is called an estimable function. An estimator that is not unbiased is called biased, and the function b(T, ψ), defined by

b(T, ψ) = E_θT(X) − ψ(θ),   (2)

is called the bias of T.

Remark 1. Definition 1, in particular, requires that E_θ|T| < ∞ for all θ ∈ Θ and can be extended to the case where both ψ and T are multidimensional. In most applications we consider Θ ⊆ R_1, ψ(θ) = θ, and X_1, X_2,...,X_n iid RVs.
Example 1. Let X_1, X_2,...,X_n be a random sample from some population with finite mean. Then X̄ is unbiased for the population mean. If the population variance is finite, the sample variance S² is unbiased for the population variance. In general, if the kth population moment m_k exists, the kth sample moment is unbiased for m_k.

Note that S is not, in general, unbiased for σ. If X_1, X_2,...,X_n are iid N(μ, σ²) RVs, we know that (n − 1)S²/σ² is χ²(n − 1). Therefore,

E(S√(n − 1)/σ) = ∫_0^∞ √x · [1/(2^{(n−1)/2}Γ[(n − 1)/2])] x^{(n−1)/2−1}e^{−x/2} dx
= √2 Γ(n/2){Γ[(n − 1)/2]}^{−1},

E_σ(S) = σ√(2/(n − 1)) Γ(n/2){Γ[(n − 1)/2]}^{−1}.

The bias of S is given by

b(S, σ) = σ[ √(2/(n − 1)) Γ(n/2){Γ[(n − 1)/2]}^{−1} − 1 ].

We note that b(S, σ) → 0 as n → ∞, so that S is asymptotically unbiased for σ.

If T is unbiased for θ, g(T) is not, in general, an unbiased estimator of g(θ) unless g is a linear function.
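The exact expression for E_σ(S) can be checked against simulation; the sketch below uses math.lgamma to evaluate the Gamma ratio (the parameter values are arbitrary).

```python
import numpy as np
from math import lgamma, exp, sqrt

rng = np.random.default_rng(6)
n, sigma, reps = 5, 2.0, 200000
s = rng.normal(0.0, sigma, size=(reps, n)).std(axis=1, ddof=1)   # sample SDs
exact = sigma * sqrt(2.0 / (n - 1)) * exp(lgamma(n / 2) - lgamma((n - 1) / 2))
print(s.mean(), exact, sigma)   # simulated mean ≈ exact value, both below sigma
```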
Example 2. Unbiased estimators do not always exist. Consider an RV with PMF b(1, p). Suppose that we wish to estimate ψ(p) = p². Then, in order that T be unbiased for p², we must have

p² = E_pT = pT(1) + (1 − p)T(0), 0 ≤ p ≤ 1,

that is,

p² = p{T(1) − T(0)} + T(0)

must hold for all p in the interval [0, 1], which is impossible. (If a convergent power series vanishes in an open interval, each of the coefficients must be 0. See also Problem 1.)
Example 3. Sometimes an unbiased estimator may be absurd. Let X be P(λ), and ψ(λ) = e^{−3λ}. We show that T(X) = (−2)^X is unbiased for ψ(λ). We have

E_λT(X) = e^{−λ} Σ_{x=0}^∞ (−2)^x λ^x/x! = e^{−λ} Σ_{x=0}^∞ (−2λ)^x/x! = e^{−λ}e^{−2λ} = ψ(λ).

However, T(x) = (−2)^x > 0 if x is even, and < 0 if x is odd, which is absurd since ψ(λ) > 0.
Example 4. Let X_1, X_2,...,X_n be a sample from P(λ). Then X̄ is unbiased for λ and so also is S², since both the mean and the variance are equal to λ. Indeed, αX̄ + (1 − α)S², 0 ≤ α ≤ 1, is unbiased for λ.

Let θ be estimable, and let T be an unbiased estimator of θ. Let T_1 be another unbiased estimator of θ, different from T. This means that there exists at least one θ such that P_θ{T ≠ T_1} > 0. In this case there exist infinitely many unbiased estimators of θ of the form αT + (1 − α)T_1, 0 < α < 1. It is therefore desirable to find a procedure to differentiate among these estimators.
Definition 2. Let θ_0 ∈ Θ and let U(θ_0) be the class of all unbiased estimators T of θ_0 such that E_{θ_0}T² < ∞. Then T_0 ∈ U(θ_0) is called a locally minimum variance unbiased estimator (LMVUE) at θ_0 if

E_{θ_0}(T_0 − θ_0)² ≤ E_{θ_0}(T − θ_0)²   (3)

holds for all T ∈ U(θ_0).

Definition 3. Let U be the set of all unbiased estimators T of θ ∈ Θ such that E_θT² < ∞ for all θ ∈ Θ. An estimator T_0 ∈ U is called a uniformly minimum variance unbiased estimator (UMVUE) of θ if

E_θ(T_0 − θ)² ≤ E_θ(T − θ)²   (4)

for all θ ∈ Θ and every T ∈ U.

Remark 2. Let a_1, a_2,...,a_n be any set of real numbers with Σ_{i=1}^n a_i = 1. Let X_1, X_2,...,X_n be independent RVs with common mean μ and variances σ_k², k = 1, 2,...,n. Then T = Σ_{i=1}^n a_iX_i is an unbiased estimator of μ with variance Σ_{i=1}^n a_i²σ_i² (see Theorem 4.5.6). T is called a linear unbiased estimator of μ. Linear unbiased estimators of μ that have minimum variance (among all linear unbiased estimators) are called best linear unbiased estimators (BLUEs). In Theorem 4.5.6 (Corollary 2) we have shown that, if the X_i are iid RVs with common variance σ², the BLUE of μ is X̄ = n^{−1}Σ_{i=1}^n X_i. If the X_i are independent with common mean μ but different variances σ_i², the BLUE of μ is obtained if we choose a_i proportional to 1/σ_i²; the minimum variance is then H/n, where H is the harmonic mean of σ_1²,...,σ_n² (see Example 4.5.4).
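The weighting rule in Remark 2 is easy to verify numerically: with a_i ∝ 1/σ_i², the variance of Σa_iX_i is 1/Σ(1/σ_i²), which is smaller than that of the unweighted mean whenever the σ_i² differ. A minimal sketch (the variances chosen are arbitrary):

```python
import numpy as np

rng = np.random.default_rng(7)
mu, sig2, reps = 1.0, np.array([1.0, 4.0, 9.0, 16.0]), 100000
x = rng.normal(mu, np.sqrt(sig2), size=(reps, sig2.size))
w = (1 / sig2) / (1 / sig2).sum()          # weights proportional to 1/sigma_i^2
blue = x @ w
plain = x.mean(axis=1)
print(blue.var(), 1 / (1 / sig2).sum())    # matches the theoretical minimum variance
print(plain.var(), sig2.sum() / sig2.size ** 2)
```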
Remark 3. Sometimes the precision of an estimator T of a parameter θ is measured by the so-called mean square error (MSE). We say that an estimator T_0 is at least as good as any other estimator T in the sense of the MSE if

E_θ(T_0 − θ)² ≤ E_θ(T − θ)² for all θ ∈ Θ.   (5)

In general, a particular estimator will be better than another for some values of θ and worse for others. Definitions 2 and 3 are special cases of this concept if we restrict attention only to unbiased estimators.
The following result gives a necessary and sufficient condition for an unbiased estimator to be a UMVUE.

Theorem 1. Let U be the class of all unbiased estimators T of a parameter θ ∈ Θ with E_θT² < ∞ for all θ, and suppose that U is nonempty. Let U_0 be the class of all unbiased estimators v of 0, that is,

U_0 = {v : E_θv = 0, E_θv² < ∞ for all θ ∈ Θ}.

Then T_0 ∈ U is a UMVUE if and only if

E_θ(vT_0) = 0 for all θ and all v ∈ U_0.   (6)

Proof. The conditions of the theorem guarantee the existence of E_θ(vT_0) for all θ and v ∈ U_0. Suppose that T_0 ∈ U is a UMVUE and E_{θ_0}(v_0T_0) ≠ 0 for some θ_0 and some v_0 ∈ U_0. Then T_0 + λv_0 ∈ U for all real λ. If E_{θ_0}v_0² = 0, then E_{θ_0}(v_0T_0) = 0 must hold since P_{θ_0}{v_0 = 0} = 1. Let E_{θ_0}v_0² > 0. Choose λ_0 = −E_{θ_0}(T_0v_0)/E_{θ_0}v_0². Then

E_{θ_0}(T_0 + λ_0v_0)² = E_{θ_0}T_0² − [E_{θ_0}(v_0T_0)]²/E_{θ_0}v_0² < E_{θ_0}T_0².   (7)

Since T_0 + λ_0v_0 ∈ U and T_0 ∈ U, it follows from (7) that

var_{θ_0}(T_0 + λ_0v_0) < var_{θ_0}(T_0),   (8)

which is a contradiction. It follows that (6) holds.

Conversely, let (6) hold for some T_0 ∈ U, all θ ∈ Θ, and all v ∈ U_0, and let T ∈ U. Then T_0 − T ∈ U_0, and for every θ

E_θ{T_0(T_0 − T)} = 0.

We have

E_θT_0² = E_θ(TT_0) ≤ (E_θT_0²)^{1/2}(E_θT²)^{1/2}

by the Cauchy–Schwarz inequality. If E_θT_0² = 0, then P(T_0 = 0) = 1 and there is nothing to prove. Otherwise

(E_θT_0²)^{1/2} ≤ (E_θT²)^{1/2}

or var_θ(T_0) ≤ var_θ(T). Since T is arbitrary, the proof is complete.
Theorem 2. Let U be the nonempty class of unbiased estimators as defined in Theorem 1. Then there exists at most one UMVUE for θ.

Proof. If T and T_0 ∈ U are both UMVUEs, then T − T_0 ∈ U_0 and

E_θ{T_0(T − T_0)} = 0 for all θ ∈ Θ,

that is, E_θT_0² = E_θ(TT_0), and it follows that

cov(T, T_0) = var_θ(T_0) for all θ.

Since T_0 and T are both UMVUEs, var_θ(T) = var_θ(T_0), and it follows that the correlation coefficient between T and T_0 is 1. This implies that P_θ{aT + bT_0 = 0} = 1 for some a, b and all θ ∈ Θ. Since T and T_0 are both unbiased for θ, we must have P_θ{T = T_0} = 1 for all θ.
Remark 4. Both Theorems 1 and 2 have analogs for LMVUEs at θ_0 ∈ Θ, θ_0 fixed.

Theorem 3. If UMVUEs T_i exist for real functions ψ_i, i = 1, 2, of θ, they also exist for λψ_i (λ real), as well as for ψ_1 + ψ_2, and are given by λT_i and T_1 + T_2, respectively.

Theorem 4. Let {T_n} be a sequence of UMVUEs and T be a statistic with E_θT² < ∞ such that E_θ{T_n − T}² → 0 as n → ∞ for all θ ∈ Θ. Then T is also the UMVUE.

Proof. That T is unbiased follows from |E_θT − θ| ≤ E_θ|T − T_n| ≤ [E_θ{T_n − T}²]^{1/2}. For all v ∈ U_0, all θ, and every n = 1, 2,...,

E_θ(T_nv) = 0

by Theorem 1. Therefore,

E_θ(vT) = E_θ(vT) − E_θ(vT_n) = E_θ[v(T − T_n)]

and

|E_θ(vT)| ≤ (E_θv²)^{1/2}[E_θ(T − T_n)²]^{1/2} → 0 as n → ∞

for all θ and all v ∈ U_0. Thus

E_θ(vT) = 0 for all v ∈ U_0 and all θ ∈ Θ,

and, by Theorem 1, T must be the UMVUE.
Example 5. Let X_1, X_2,...,X_n be iid P(λ). Then X̄ is the UMVUE of λ. Surely X̄ is unbiased. Let g be an unbiased estimator of 0. Then T(X) = X̄ + g(X̄) is unbiased for λ. But X̄ is complete. It follows that

E_λg(X̄) = 0 for all λ > 0 ⟹ g(x) = 0 for all x in the support of X̄.

Hence X̄ must be the UMVUE of λ.
Example 6. Sometimes an estimator with larger variance may be preferable.

Let X be a G(1, 1/β) RV. X is usually taken as a good model to describe the time to failure of a piece of equipment. Let X_1, X_2,...,X_n be a sample of n observations on X. Then X̄ is unbiased for EX = 1/β with variance 1/(nβ²). (X̄ is actually the UMVUE for 1/β.) Now consider X_(1) = min(X_1, X_2,...,X_n). Then nX_(1) is unbiased for 1/β with variance 1/β², and it has a larger variance than X̄. However, if the length of time is of importance, nX_(1) may be preferable to X̄, since to observe nX_(1) one needs to wait only until the first piece of equipment fails, whereas to compute X̄ one would have to wait until all the n observations X_1, X_2,...,X_n are available.
Theorem 5. If a sample consists of n independent observations X_1, X_2,...,X_n from the same distribution, the UMVUE, if it exists, is a symmetric function of the X_i's.

Proof. The proof is left as an exercise.

The converse of Theorem 5 is not true. If X_1, X_2,...,X_n are iid P(λ) RVs, λ > 0, both X̄ and S² are unbiased for λ. But X̄ is the UMVUE, whereas S² is not.
We now turn our attention to some methods for finding UMVUEs.

Theorem 6 (Blackwell [10], Rao [87]). Let {F_θ : θ ∈ Θ} be a family of probability DFs and let h be any statistic in U, where U is the (nonempty) class of all unbiased estimators of θ with E_θh² < ∞. Let T be a sufficient statistic for {F_θ, θ ∈ Θ}. Then the conditional expectation E_θ{h | T} is independent of θ and is an unbiased estimator of θ. Moreover,

E_θ(E{h | T} − θ)² ≤ E_θ(h − θ)² for all θ ∈ Θ.   (9)

The equality in (9) holds if and only if h = E{h | T} (that is, P_θ{h = E{h | T}} = 1 for all θ).

Proof. We have

E_θ{E{h | T}} = E_θh = θ.

It is therefore sufficient to show that

E_θ{E{h | T}}² ≤ E_θh² for all θ ∈ Θ.   (10)

But E_θh² = E_θ{E{h² | T}}, so that it will be sufficient to show that

[E{h | T}]² ≤ E{h² | T}.   (11)

By the Cauchy–Schwarz inequality

E²{h | T} ≤ E{h² | T}E{1 | T},

and (11) follows. The equality holds in (9) if and only if

E_θ[E{h | T}]² = E_θh²,   (12)

that is,

E_θ[E{h² | T} − E²{h | T}] = 0,

which is the same as

E_θ{var{h | T}} = 0.

This happens if and only if var{h | T} = 0, that is, if and only if

E{h² | T} = E²{h | T},

as will be the case if and only if h is a function of T. Thus h = E{h | T} with probability 1.
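As an illustration of Theorem 6 not taken from the text: to estimate ψ(λ) = e^{−λ} from iid P(λ) observations, h(X) = I{X_1 = 0} is unbiased, and since X_1 given ΣX_i = t is b(t, 1/n), conditioning on the sufficient statistic T = ΣX_i gives E{h | T} = (1 − 1/n)^T, which is unbiased with a much smaller variance. A simulation sketch (parameters arbitrary):

```python
import numpy as np

rng = np.random.default_rng(8)
n, lam, reps = 10, 2.0, 100000
x = rng.poisson(lam, size=(reps, n))
h = (x[:, 0] == 0).astype(float)            # crude unbiased estimator of exp(-lambda)
t = x.sum(axis=1)
rb = (1 - 1 / n) ** t                       # Rao-Blackwellized estimator E{h | T}
print(np.exp(-lam), h.mean(), rb.mean())    # both means ≈ exp(-lambda)
print(h.var(), rb.var())                    # conditioning reduces the variance
```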
Theorem 6 is applied along with completeness to yield the following result.
Theorem 7.(Lehmann-Scheffé [65]). IfTis a complete sufficient statistic and there exists
an unbiased estimatorhofθ, there exists a unique UMVUE ofθ, which is given by
E{h|T}.
Proof.Ifh
1,h2∈U, thenE{h 1|T}andE{h 2|T}are both unbiased and
E
θ[E{h 1|T}−E{h 2|T}]=0, for allθ∈Θ.
SinceTis a complete sufficient statistic, it follows thatE{h
1|T}=E{h 2|T}.By
Theorem 6E{h|T}is the UMVUE.
Remark 5. According to Theorem 6, we should restrict our search to Borel-measurable functions of a sufficient statistic (whenever it exists). According to Theorem 7, if a complete sufficient statistic $T$ exists, all we need to do is to find a Borel-measurable function of $T$ that is unbiased. If a complete sufficient statistic does not exist, a UMVUE may still exist (see Example 11).
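The variance reduction promised by Theorem 6 is easy to see in simulation. The following sketch is a minimal illustration of my own (the Poisson setup, seed, and variable names are assumptions, not taken from the text): it compares the crude unbiased estimator $X_1$ of $\lambda$ with its Rao–Blackwellized version $E\{X_1\mid\sum X_i\}=\bar X$.

```python
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 3.0, 20, 100_000

x = rng.poisson(lam, size=(reps, n))
crude = x[:, 0]           # X_1: unbiased for lambda but ignores most of the data
rb = x.mean(axis=1)       # E{X_1 | sum X_i} = X-bar, the Rao-Blackwellized version

print(crude.mean(), rb.mean())   # both near lambda = 3
print(crude.var(), rb.var())     # roughly 3 versus 3/n = 0.15
```

Both estimators are unbiased, but conditioning on the sufficient statistic cuts the variance by the factor $n$, exactly as inequality (9) predicts.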
Example 7. Let $X_1,X_2,\ldots,X_n$ be $N(\theta,1)$. $X_1$ is unbiased for $\theta$. However, $\bar X=n^{-1}\sum_1^nX_i$ is a complete sufficient statistic, so that $E\{X_1\mid\bar X\}$ is the UMVUE.
We will show that $E\{X_1\mid\bar X\}=\bar X$. Let $Y=n\bar X$. Then $Y$ is $N(n\theta,n)$, $X_1$ is $N(\theta,1)$, and $(X_1,Y)$ is a bivariate normal RV with variance–covariance matrix
$$\begin{pmatrix}1&1\\1&n\end{pmatrix}.$$
Therefore,
$$E\{X_1\mid y\}=EX_1+\frac{\mathrm{cov}(X_1,Y)}{\mathrm{var}(Y)}(y-EY)=\theta+\frac{1}{n}(y-n\theta)=\frac{y}{n},$$
as asserted.
If we let $\psi(\theta)=\theta^2$, we can show similarly that $\bar X^2-1/n$ is the UMVUE for $\psi(\theta)$. Note that $\bar X^2-1/n$ may occasionally be negative, so that a UMVUE for $\theta^2$ is not very sensible in this case.
Example 8. Let $X_1,X_2,\ldots,X_n$ be iid $b(1,p)$ RVs. Then $T=\sum_1^nX_i$ is a complete sufficient statistic. The UMVUE for $p$ is clearly $\bar X$. To find the UMVUE for $\psi(p)=p(1-p)$, we have $E(nT)=n^2p$ and $ET^2=np+n(n-1)p^2$, so that $E\{nT-T^2\}=n(n-1)p(1-p)$, and it follows that $(nT-T^2)/[n(n-1)]$ is the UMVUE for $\psi(p)=p(1-p)$.
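As a quick sanity check on the algebra above, the following sketch (my own illustration; the parameter values and seed are arbitrary) verifies by simulation that $(nT-T^2)/[n(n-1)]$ is unbiased for $p(1-p)$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, p, reps = 10, 0.3, 200_000

t = rng.binomial(n, p, size=reps)             # T = sum of n Bernoulli(p) variables
est = (n * t - t**2) / (n * (n - 1))          # UMVUE of p(1-p) from Example 8

print(est.mean(), p * (1 - p))                # both close to 0.21
```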
Example 9. Let $X_1,X_2,\ldots,X_n$ be a sample from $N(\mu,\sigma^2)$. Then $(\bar X,S^2)$ is a complete sufficient statistic for $(\mu,\sigma^2)$. $\bar X$ is the UMVUE for $\mu$, and $S^2$ is the UMVUE for $\sigma^2$. Also, $k(n)S$ is the UMVUE for $\sigma$, where $k(n)=\sqrt{(n-1)/2}\,\Gamma[(n-1)/2]/\Gamma(n/2)$. We wish to find the UMVUE for the $p$th quantile $z_p$. We have
$$p=P\{X\le z_p\}=P\left\{Z\le\frac{z_p-\mu}{\sigma}\right\},$$
where $Z$ is $N(0,1)$. Thus $z_p=\sigma z_{1-p}+\mu$, and the UMVUE is
$$T(X_1,X_2,\ldots,X_n)=z_{1-p}\,k(n)S+\bar X.$$
Example 10. (Stigler [110]). We return to Example 14. We have seen that the family $\{P_N^{(n)}:N\ge 1\}$ of PMFs of $X_{(n)}=\max_{1\le i\le n}X_i$ is complete and $X_{(n)}$ is sufficient for $N$. Now $EX_1=(N+1)/2$, so that $T(X_1)=2X_1-1$ is unbiased for $N$. It follows from Theorem 7 that $E\{T(X_1)\mid X_{(n)}\}$ is the UMVUE of $N$. We have
$$P\{X_1=x_1\mid X_{(n)}=y\}=\begin{cases}\dfrac{y^{n-1}-(y-1)^{n-1}}{y^n-(y-1)^n}, & x_1=1,2,\ldots,y-1,\\[2ex] \dfrac{y^{n-1}}{y^n-(y-1)^n}, & x_1=y.\end{cases}$$
Thus
$$E\{T(X_1)\mid X_{(n)}=y\}=\frac{y^{n-1}-(y-1)^{n-1}}{y^n-(y-1)^n}\sum_{x_1=1}^{y-1}(2x_1-1)+(2y-1)\frac{y^{n-1}}{y^n-(y-1)^n}=\frac{y^{n+1}-(y-1)^{n+1}}{y^n-(y-1)^n}$$
is the UMVUE of $N$.
If we consider the family $\mathcal P$ instead, we have seen (Example 8.3.14 and Problem 8.3.6) that $\mathcal P$ is not complete. The UMVUE for the family $\{P_N:N\ge 1\}$ is $T(X_1)=2X_1-1$, which is not the UMVUE for $\mathcal P$. The UMVUE for $\mathcal P$ is, in fact, given by
$$T_1(k)=\begin{cases}2k-1, & k\ne n_0,\ k\ne n_0+1,\\ 2n_0, & k=n_0\ \text{or}\ k=n_0+1.\end{cases}$$
The reader is asked to check that $T_1$ has covariance 0 with all unbiased estimators $g$ of 0 that are of the form described in Example 8.3.14 and Problem 8.3.6, and hence Theorem 1 implies that $T_1$ is the UMVUE. Actually $T_1(X_1)$ is a complete sufficient statistic for $\mathcal P$. Since $E_{n_0}T_1(X_1)=n_0+1/n_0$, $T_1$ is not even unbiased for the family $\{P_N:N\ge 1\}$. The minimum variance is given by
$$\mathrm{var}_N(T_1(X_1))=\begin{cases}\mathrm{var}_N(T(X_1)), & N<n_0,\\ \mathrm{var}_N(T(X_1))-\dfrac{2}{N}, & N>n_0.\end{cases}$$
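To make the UMVUE formula concrete, here is a small sketch of my own (it assumes the discrete uniform model on $\{1,\ldots,N\}$ of Example 10; the values of $N$, $n$, and the seed are arbitrary) that computes $(y^{n+1}-(y-1)^{n+1})/(y^n-(y-1)^n)$ from the sample maximum and checks its unbiasedness.

```python
import numpy as np

rng = np.random.default_rng(2)
N, n, reps = 50, 5, 100_000

x = rng.integers(1, N + 1, size=(reps, n))
y = x.max(axis=1)                                     # X_(n), complete and sufficient
umvue = (y**(n + 1) - (y - 1)**(n + 1)) / (y**n - (y - 1)**n)

print(umvue.mean())                                   # close to N = 50
```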
The following example shows that a UMVUE may exist even though a complete sufficient statistic does not.
Example 11. Let $X$ be an RV with PMF
$$P_\theta(X=-1)=\theta\quad\text{and}\quad P_\theta(X=x)=(1-\theta)^2\theta^x,\quad x=0,1,2,\ldots,$$
where $0<\theta<1$. Let $\psi(\theta)=P_\theta(X=0)=(1-\theta)^2$. Then $X$ is clearly sufficient, in fact minimal sufficient, for $\theta$, but since
$$E_\theta X=(-1)\theta+\sum_{x=0}^{\infty}x(1-\theta)^2\theta^x=-\theta+\theta(1-\theta)^2\frac{d}{d\theta}\sum_{x=1}^{\infty}\theta^x=0,$$
it follows that $X$ is not complete for $\{P_\theta:0<\theta<1\}$. We will use Theorem 1 to check whether a UMVUE for $\psi(\theta)$ exists. Suppose
$$E_\theta h(X)=h(-1)\theta+\sum_{x=0}^{\infty}(1-\theta)^2\theta^xh(x)=0$$
for all $0<\theta<1$. Then, for $0<\theta<1$,
$$0=\theta h(-1)+\sum_{x=0}^{\infty}\theta^xh(x)-2\sum_{x=0}^{\infty}\theta^{x+1}h(x)+\sum_{x=0}^{\infty}\theta^{x+2}h(x)=h(0)+\sum_{x=0}^{\infty}\theta^{x+1}[h(x+1)-2h(x)+h(x-1)],$$
which is a power series in $\theta$.
It follows that $h(0)=0$, $h(1)=-h(-1)$, and for $x\ge 1$, $h(x+1)-2h(x)+h(x-1)=0$. Thus
$$h(2)=2h(1)-h(0)=2h(1),\qquad h(3)=2h(2)-h(1)=3h(1),$$
and so on. Consequently, all unbiased estimators of 0 are of the form $h(X)=cX$. Clearly, $T(X)=1$ if $X=0$, and $=0$ otherwise, is unbiased for $\psi(\theta)$. Moreover, for all $\theta$,
$$E_\theta\{cX\cdot T(X)\}=0,$$
so that $T$ is the UMVUE of $\psi(\theta)$.
We conclude this section with a proof of Theorem 8.3.4.
Theorem 8. (Theorem 8.3.4) A complete sufficient statistic is a minimal sufficient statistic.
Proof. Let $S(X)$ be a complete sufficient statistic for $\{f_\theta:\theta\in\Theta\}$ and let $T$ be any statistic for which $E_\theta|T^2|<\infty$. Writing $h(S)=E_\theta\{T\mid S\}$ we see that $h$ is the UMVUE of $E_\theta T$. Let $S_1(X)$ be another sufficient statistic. We show that $h(S)$ is a function of $S_1$. If not, then $h_1(S_1)=E_\theta\{h(S)\mid S_1\}$ is unbiased for $E_\theta T$ and by the Rao–Blackwell theorem
$$\mathrm{var}_\theta\,h_1(S_1)\le\mathrm{var}_\theta\,h(S),$$
contradicting the fact that $h(S)$ is the UMVUE of $E_\theta T$. It follows that $h(S)$ is a function of $S_1$. Since $h$ and $S_1$ are arbitrary, $S$ must be a function of every sufficient statistic and hence minimal sufficient.
PROBLEMS 8.4
1. Let $X_1,X_2,\ldots,X_n$ ($n\ge 2$) be a sample from $b(1,p)$. Find an unbiased estimator for $\psi(p)=p^2$.
2. Let $X_1,X_2,\ldots,X_n$ ($n\ge 2$) be a sample from $N(\mu,\sigma^2)$. Find an unbiased estimator for $\sigma^p$, where $p+n>1$. Find a minimum MSE estimator of $\sigma^p$.
3. Let $X_1,X_2,\ldots,X_n$ be iid $N(\mu,\sigma^2)$ RVs. Find a minimum MSE estimator of the form $\alpha S^2$ for the parameter $\sigma^2$. Compare the variances of the minimum MSE estimator and the obvious estimator $S^2$.
4. Let $X\sim b(1,\theta^2)$. Does there exist an unbiased estimator of $\theta$?
5. Let $X\sim P(\lambda)$. Does there exist an unbiased estimator of $\psi(\lambda)=\lambda^{-1}$?
6. Let $X_1,X_2,\ldots,X_n$ be a sample from $b(1,p)$, $0<p<1$, and let $0<s<n$ be an integer. Find the UMVUE for (a) $\psi(p)=p^s$ and (b) $\psi(p)=p^s+(1-p)^{n-s}$.
7. Let $X_1,X_2,\ldots,X_n$ be a sample from a population with mean $\theta$ and finite variance, and let $T$ be an estimator of $\theta$ of the form $T(X_1,X_2,\ldots,X_n)=\sum_{i=1}^n\alpha_iX_i$. If $T$ is an unbiased estimator of $\theta$ that has minimum variance and $T'$ is another linear unbiased estimator of $\theta$, then $\mathrm{cov}_\theta(T,T')=\mathrm{var}_\theta(T)$.
8. Let $T_1,T_2$ be two unbiased estimators having common variance $\alpha\sigma^2$ ($\alpha>1$), where $\sigma^2$ is the variance of the UMVUE. Show that the correlation coefficient between $T_1$ and $T_2$ is $\ge(2-\alpha)/\alpha$.
9. Let $X\sim NB(1;\theta)$ and $d(\theta)=P_\theta\{X=0\}$. Let $X_1,X_2,\ldots,X_n$ be a sample on $X$. Find the UMVUE of $d(\theta)$.
10. This example covers most discrete distributions. Let $X_1,X_2,\ldots,X_n$ be a sample from the PMF
$$P_\theta\{X=x\}=\frac{\alpha(x)\theta^x}{f(\theta)},\quad x=0,1,2,\ldots,$$
where $\theta>0$, $\alpha(x)>0$, $f(\theta)=\sum_{x=0}^{\infty}\alpha(x)\theta^x$, $\alpha(0)=1$, and let $T=X_1+X_2+\cdots+X_n$. Write
$$c(t,n)=\sum_{x_1,x_2,\ldots,x_n:\ \sum_{i=1}^nx_i=t}\ \prod_{i=1}^n\alpha(x_i).$$
Show that $T$ is a complete sufficient statistic for $\theta$ and that the UMVUE for $d(\theta)=\theta^r$ ($r>0$ an integer) is given by
$$Y_r(t)=\begin{cases}0, & t<r,\\ \dfrac{c(t-r,n)}{c(t,n)}, & t\ge r.\end{cases}$$
(Roy and Mitra [94])
11. Let $X$ be a hypergeometric RV with PMF
$$P_M\{X=x\}=\binom{N}{n}^{-1}\binom{M}{x}\binom{N-M}{n-x},\quad \max(0,M+n-N)\le x\le\min(M,n).$$
(a) Find the UMVUE for $M$ when $N$ is assumed to be known.
(b) Does there exist an unbiased estimator of $N$ ($M$ known)?
12. Let $X_1,X_2,\ldots,X_n$ be iid $G(1,1/\lambda)$ RVs, $\lambda>0$. Find the UMVUE of $P_\lambda\{X_1\le t_0\}$, where $t_0>0$ is a fixed real number.
13. Let $X_1,X_2,\ldots,X_n$ be a random sample from $P(\lambda)$. Let $\psi(\lambda)=\sum_{k=0}^{\infty}c_k\lambda^k$ be a parametric function. Find the UMVUE for $\psi(\lambda)$. In particular, find the UMVUE for (a) $\psi(\lambda)=1/(1-\lambda)$, (b) $\psi(\lambda)=\lambda^s$ for some fixed integer $s>0$, (c) $\psi(\lambda)=P_\lambda\{X=0\}$, and (d) $\psi(\lambda)=P_\lambda\{X=0\text{ or }1\}$.
14. Let $X_1,X_2,\ldots,X_n$ be a sample from the PMF $P_N(x)=1/N$, $x=1,2,\ldots,N$. Let $\psi(N)$ be some function of $N$. Find the UMVUE of $\psi(N)$.
15. Let $X_1,X_2,\ldots,X_n$ be a random sample from $P(\lambda)$. Find the UMVUE of $\psi(\lambda)=P_\lambda\{X=k\}$, where $k$ is a fixed positive integer.
16. Let $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ be a sample from a bivariate normal population with parameters $\mu_1,\mu_2,\sigma_1^2,\sigma_2^2$, and $\rho$. Assume that $\mu_1=\mu_2=\mu$, and it is required to find an unbiased estimator of $\mu$. Since a complete sufficient statistic does not exist, consider the class of all linear unbiased estimators
$$\hat\mu(\alpha)=\alpha\bar X+(1-\alpha)\bar Y.$$
(a) Find the variance of $\hat\mu$.
(b) Choose $\alpha=\alpha_0$ to minimize $\mathrm{var}(\hat\mu)$ and consider the estimator $\hat\mu_0=\alpha_0\bar X+(1-\alpha_0)\bar Y$. Compute $\mathrm{var}(\hat\mu_0)$. If $\sigma_1=\sigma_2$, the BLUE of $\mu$ (in the sense of minimum variance) is $\hat\mu_1=(\bar X+\bar Y)/2$, irrespective of whether $\sigma_1$ and $\rho$ are known or unknown.
(c) If $\sigma_1\ne\sigma_2$ and $\rho,\sigma_1,\sigma_2$ are unknown, replace these values in $\alpha_0$ by their corresponding estimators. Let
$$\hat\alpha=\frac{S_2^2-S_{11}}{S_1^2+S_2^2-2S_{11}}.$$
Show that $\hat\mu_2=\bar Y+(\bar X-\bar Y)\hat\alpha$ is an unbiased estimator of $\mu$.
17. Let $X_1,X_2,\ldots,X_n$ be iid $N(\theta,1)$. Let $p=\Phi(x-\theta)$, where $\Phi$ is the DF of a $N(0,1)$ RV. Show that the UMVUE of $p$ is given by $\Phi\bigl[(x-\bar x)\sqrt{n/(n-1)}\bigr]$.
18. Prove Theorem 5.
19. In Example 10 show that $T_1$ is the UMVUE for $N$ (restricted to the family $\mathcal P$), and compute the minimum variance.
20. Let $(X_1,Y_1),\ldots,(X_n,Y_n)$ be a sample from a bivariate population with finite variances $\sigma_1^2$ and $\sigma_2^2$, respectively, and covariance $\gamma$. Show that
$$\mathrm{var}(S_{11})=\frac{1}{n}\left[\mu_{22}-\frac{n-2}{n-1}\gamma^2+\frac{\sigma_1^2\sigma_2^2}{n-1}\right],$$
where $\mu_{22}=E[(X-EX)^2(Y-EY)^2]$. It is assumed that moments of the appropriate order exist.
21. Suppose that a random sample is taken on $(X,Y)$ and it is desired to estimate $\gamma$, the unknown covariance between $X$ and $Y$. Suppose that for some reason a set $S$ of $n$ observations is available on both $X$ and $Y$, an additional $n_1-n$ observations are available on $X$ but the corresponding $Y$ values are missing, and an additional $n_2-n$ observations of $Y$ are available for which the $X$ values are missing. Let $S_1$ be the set of all $n_1$ ($\ge n$) $X$ values, and $S_2$ the set of all $n_2$ ($\ge n$) $Y$ values, and write
$$\hat X=\frac{\sum_{j\in S_1}X_j}{n_1},\quad \hat Y=\frac{\sum_{j\in S_2}Y_j}{n_2},\quad \bar X=\frac{\sum_{i\in S}X_i}{n},\quad \bar Y=\frac{\sum_{i\in S}Y_i}{n}.$$
Show that
$$\hat\gamma=\frac{n_1n_2}{n(n_1n_2-n_1-n_2+n)}\sum_{i\in S}(X_i-\hat X)(Y_i-\hat Y)$$
is an unbiased estimator of $\gamma$. Find the variance of $\hat\gamma$, and show that $\mathrm{var}(\hat\gamma)\le\mathrm{var}(S_{11})$, where $S_{11}$ is the usual unbiased estimator of $\gamma$ based on the $n$ observations in $S$ (Boas [11]).
22. Let $X_1,X_2,\ldots,X_n$ be iid with common PDF $f_\theta(x)=\exp(-x+\theta)$, $x>\theta$. Let $x_0$ be a fixed real number. Find the UMVUE of $f_\theta(x_0)$.
23. Let $X_1,X_2,\ldots,X_n$ be iid $N(\mu,1)$ RVs. Let $T(X)=\sum_{i=1}^nX_i$. Show that $\varphi(x;t/n,(n-1)/n)$ is the UMVUE of $\varphi(x;\mu,1)$, where $\varphi(x;\mu,\sigma^2)$ is the PDF of a $N(\mu,\sigma^2)$ RV.
24. Let $X_1,X_2,\ldots,X_n$ be iid $G(1,\theta)$ RVs. Show that the UMVUE of $f(x;\theta)=(1/\theta)\exp(-x/\theta)$, $x>0$, is given by $h(x\mid t)$, the conditional PDF of $X_1$ given $T(X)=\sum_{i=1}^nX_i=t$, where
$$h(x\mid t)=(n-1)(t-x)^{n-2}/t^{n-1}\ \text{for }x<t,\quad\text{and}\quad =0\ \text{for }x>t.$$
25. Let $X_1,X_2,\ldots,X_n$ be iid RVs with common PDF $f_\theta(x)=1/(2\theta)$, $|x|<\theta$, and $=0$ elsewhere. Show that $T(X)=\max\{-X_{(1)},X_{(n)}\}$ is a complete sufficient statistic for $\theta$. Find the UMVU estimator of $\theta^r$.
26. Let $X_1,X_2,\ldots,X_n$ be a random sample from the PDF
$$f_\theta(x)=(1/\sigma)\exp\{-(x-\mu)/\sigma\},\quad x>\mu,\ \sigma>0,$$
where $\theta=(\mu,\sigma)$.
(a) $\bigl(X_{(1)},\ \sum_{j=1}^n(X_j-X_{(1)})\bigr)$ is a complete sufficient statistic for $\theta$.
(b) Show that the UMVUEs of $\mu$ and $\sigma$ are given by
$$\hat\mu=X_{(1)}-\frac{1}{n(n-1)}\sum_{j=1}^n\bigl(X_j-X_{(1)}\bigr),\qquad \hat\sigma=\frac{1}{n-1}\sum_{j=1}^n\bigl(X_j-X_{(1)}\bigr).$$
(c) Find the UMVUE of $\psi(\mu,\sigma)=E_{\mu,\sigma}X_1$.
(d) Show that the UMVUE of $P_\theta(X_1\ge t)$ is given by
$$\hat P(X_1\ge t)=\frac{n-1}{n}\left[1-\frac{t-X_{(1)}}{\sum_1^n(X_j-X_{(1)})}\right]_+^{\,n-2},$$
where $x_+=\max(x,0)$.
8.5 UNBIASED ESTIMATION (CONTINUED): A LOWER BOUND FOR THE VARIANCE OF AN ESTIMATOR
In this section we consider two inequalities, each of which provides a lower bound for the variance of an estimator. These inequalities can sometimes be used to show that an unbiased estimator is the UMVUE. We first consider an inequality due to Fréchet, Cramér, and Rao (the FCR inequality).
Theorem 1. (Cramér [18], Fréchet [34], Rao [86]). Let $\Theta\subseteq\mathbb R$ be an open interval and suppose the family $\{f_\theta:\theta\in\Theta\}$ satisfies the following regularity conditions:
(i) It has common support set $S$. Thus $S=\{x:f_\theta(x)>0\}$ does not depend on $\theta$.
(ii) For $x\in S$ and $\theta\in\Theta$, the derivative $\frac{\partial}{\partial\theta}\log f_\theta(x)$ exists and is finite.
(iii) For any statistic $h$ with $E_\theta|h(X)|<\infty$ for all $\theta$, the operations of integration (summation) and differentiation with respect to $\theta$ can be interchanged in $E_\theta h(X)$. That is,
$$\frac{\partial}{\partial\theta}\int h(x)f_\theta(x)\,dx=\int h(x)\frac{\partial}{\partial\theta}f_\theta(x)\,dx\tag{1}$$
whenever the right-hand side of (1) is finite.
Let $T(X)$ be such that $\mathrm{var}_\theta T(X)<\infty$ for all $\theta$ and set $\psi(\theta)=E_\theta T(X)$. If $I(\theta)=E_\theta\bigl[\frac{\partial}{\partial\theta}\log f_\theta(X)\bigr]^2$ satisfies $0<I(\theta)<\infty$, then
$$\mathrm{var}_\theta T(X)\ge\frac{[\psi'(\theta)]^2}{I(\theta)}.\tag{2}$$

Proof. Since (iii) holds for $h\equiv 1$, we get
$$0=\int_S\frac{\partial}{\partial\theta}f_\theta(x)\,dx=\int_S\left[\frac{\partial}{\partial\theta}\log f_\theta(x)\right]f_\theta(x)\,dx=E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right].\tag{3}$$
Differentiating $\psi(\theta)=E_\theta T(X)$ and using (1) we get
$$\psi'(\theta)=\int_S T(x)\frac{\partial}{\partial\theta}f_\theta(x)\,dx=\int_S T(x)\left[\frac{\partial}{\partial\theta}\log f_\theta(x)\right]f_\theta(x)\,dx=\mathrm{cov}_\theta\left(T(X),\frac{\partial}{\partial\theta}\log f_\theta(X)\right).\tag{4}$$
Also, in view of (3) we have
$$\mathrm{var}_\theta\left(\frac{\partial}{\partial\theta}\log f_\theta(X)\right)=E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2,$$
and using the Cauchy–Schwarz inequality in (4) we get
$$[\psi'(\theta)]^2\le\mathrm{var}_\theta T(X)\,E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2,$$
which proves (2). Practically the same proof may be given when $f_\theta$ is a PMF by replacing $\int$ by $\sum$.
Remark 1. If, in particular, $\psi(\theta)=\theta$, then (2) reduces to
$$\mathrm{var}_\theta(T(X))\ge\frac{1}{I(\theta)}.\tag{5}$$
Remark 2. Let $X_1,X_2,\ldots,X_n$ be iid RVs with common PDF (PMF) $f_\theta(x)$. Then
$$I(\theta)=E_\theta\left[\frac{\partial\log f_\theta(X)}{\partial\theta}\right]^2=\sum_{i=1}^nE_\theta\left[\frac{\partial\log f_\theta(X_i)}{\partial\theta}\right]^2=nE_\theta\left[\frac{\partial\log f_\theta(X_1)}{\partial\theta}\right]^2=nI_1(\theta),$$
where $I_1(\theta)=E_\theta\bigl[\frac{\partial\log f_\theta(X_1)}{\partial\theta}\bigr]^2$. In this case the inequality (2) reduces to
$$\mathrm{var}_\theta(T(X))\ge\frac{[\psi'(\theta)]^2}{nI_1(\theta)}.$$
Definition 1. The quantity
$$I_1(\theta)=E_\theta\left[\frac{\partial\log f_\theta(X_1)}{\partial\theta}\right]^2\tag{6}$$
is called Fisher's information in $X_1$, and
$$I_n(\theta)=E_\theta\left[\frac{\partial\log f_\theta(X)}{\partial\theta}\right]^2=nI_1(\theta)\tag{7}$$
is known as the Fisher information in the random sample $X_1,X_2,\ldots,X_n$.
Remark 3. As $n$ gets larger, the lower bound for $\mathrm{var}_\theta(T(X))$ gets smaller. Thus, as the Fisher information increases, the lower bound decreases and the "best" estimator (one for which equality holds in (2)) will have smaller variance and, consequently, more information about $\theta$.
Remark 4. Regularity condition (i) is unnecessarily restrictive. An examination of the proof shows that it is only necessary that (ii) and (iii) hold for (2) to hold. Condition (i) excludes distributions such as $f_\theta(x)=1/\theta$, $0<x<\theta$, for which (3) fails to hold. It also excludes densities such as $f_\theta(x)=1$, $\theta<x<\theta+1$, or $f_\theta(x)=\frac{2}{\pi}\sin^2(x+\pi)$, $\theta\le x\le\theta+\pi$, each of which satisfies (iii) for $h\equiv 1$ so that (3) holds, but not (1) for all $h$ with $E_\theta|h|<\infty$.
Remark 5. Sufficient conditions for regularity condition (iii) may be found in most calculus textbooks. For example, if (i) and (ii) hold, then (iii) holds provided that for all $h$ with $E_\theta|h|<\infty$ for all $\theta\in\Theta$, both $E_\theta\bigl[h(X)\frac{\partial\log f_\theta(X)}{\partial\theta}\bigr]$ and $E_\theta\bigl[h(X)\frac{\partial f_\theta(X)}{\partial\theta}\bigr]$ are continuous functions of $\theta$. Regularity conditions (i) to (iii) are satisfied for a one-parameter exponential family.
Remark 6. The inequality (2) holds trivially if $I(\theta)=\infty$ (and $\psi'(\theta)$ is finite) or if $\mathrm{var}_\theta(T(X))=\infty$.
Example 1. Let $X\sim b(n,p)$; $\Theta=(0,1)\subset\mathbb R$. Here the Fisher information may be obtained as follows:
$$\log f_p(x)=\log\binom{n}{x}+x\log p+(n-x)\log(1-p),$$
$$\frac{\partial\log f_p(x)}{\partial p}=\frac{x}{p}-\frac{n-x}{1-p},$$
and
$$E_p\left[\frac{\partial\log f_p(X)}{\partial p}\right]^2=\frac{n}{p(1-p)}=I(p).$$
Let $\psi(p)$ be a function of $p$ and $T(X)$ be an unbiased estimator of $\psi(p)$. The only condition that need be checked is differentiability under the summation sign. We have
$$\psi(p)=E_pT(X)=\sum_{x=0}^n\binom{n}{x}T(x)p^x(1-p)^{n-x},$$
which is a polynomial in $p$ and hence can be differentiated with respect to $p$. For any unbiased estimator $T(X)$ of $p$ we have
$$\mathrm{var}_p(T(X))\ge\frac{1}{n}p(1-p)=\frac{1}{I(p)},$$
and since
$$\mathrm{var}\left(\frac{X}{n}\right)=\frac{np(1-p)}{n^2}=\frac{p(1-p)}{n},$$
it follows that the variance of the estimator $X/n$ attains the lower bound of the FCR inequality, and hence $T(X)=X/n$ has least variance among all unbiased estimators of $p$. Thus $X/n$ is the UMVUE for $p$.
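A quick simulation makes the attainment of the bound visible. The sketch below is my own illustration (the choices of $n$, $p$, and the seed are arbitrary): the empirical variance of $X/n$ sits right at $p(1-p)/n=1/I(p)$.

```python
import numpy as np

rng = np.random.default_rng(3)
n, p, reps = 25, 0.4, 200_000

phat = rng.binomial(n, p, size=reps) / n      # the estimator X/n
fcr_bound = p * (1 - p) / n                   # 1/I(p) from Example 1

print(phat.var(), fcr_bound)                  # essentially equal
```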
Example 2. Let $X\sim P(\lambda)$. We leave the reader to check that the regularity conditions are satisfied and
$$\mathrm{var}_\lambda(T(X))\ge\lambda.$$
Since $T(X)=X$ has variance $\lambda$, $X$ is the UMVUE of $\lambda$. Similarly, if we take a sample of size $n$ from $P(\lambda)$, we can show that
$$I_n(\lambda)=\frac{n}{\lambda}\quad\text{and}\quad\mathrm{var}_\lambda(T(X_1,\ldots,X_n))\ge\frac{\lambda}{n},$$
and $\bar X$ is the UMVUE.
Let us next consider the problem of unbiased estimation of $\psi(\lambda)=e^{-\lambda}$ based on a sample of size 1. The estimator
$$\delta(X)=\begin{cases}1 & \text{if }X=0,\\ 0 & \text{if }X\ge 1,\end{cases}$$
is unbiased for $\psi(\lambda)$ since
$$E_\lambda\delta(X)=E_\lambda[\delta(X)]^2=P_\lambda\{X=0\}=e^{-\lambda}.$$
Also,
$$\mathrm{var}_\lambda(\delta(X))=e^{-\lambda}(1-e^{-\lambda}).$$
To compute the FCR lower bound we have
$$\log f_\lambda(x)=x\log\lambda-\lambda-\log x!.$$
This has to be differentiated with respect to $e^{-\lambda}$, since we want a lower bound for an estimator of the parameter $e^{-\lambda}$. Let $\theta=e^{-\lambda}$. Then
$$\log f_\theta(x)=x\log\log\frac{1}{\theta}+\log\theta-\log x!,\qquad \frac{\partial}{\partial\theta}\log f_\theta(x)=\frac{x}{\theta\log\theta}+\frac{1}{\theta},$$
and
$$E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2=\frac{1}{\theta^2}E\left[\frac{X}{\log\theta}+1\right]^2=e^{2\lambda}\left[1-2+\frac{1}{\lambda^2}(\lambda+\lambda^2)\right]=\frac{e^{2\lambda}}{\lambda}=I(e^{-\lambda}),$$
so that
$$\mathrm{var}_\theta T(X)\ge\frac{\lambda}{e^{2\lambda}}=\frac{1}{I(e^{-\lambda})},$$
where $\theta=e^{-\lambda}$.
Since $e^{-\lambda}(1-e^{-\lambda})>\lambda e^{-2\lambda}$ for $\lambda>0$, we see that $\mathrm{var}(\delta(X))$ is greater than the lower bound obtained from the FCR inequality. We show next that $\delta(X)$ is the only unbiased estimator of $\theta$ and hence is the UMVUE.
If $h$ is any unbiased estimator of $\theta$, it must satisfy $E_\theta h(X)=\theta$. That is, for all $\lambda>0$,
$$e^{-\lambda}=\sum_{k=0}^{\infty}h(k)e^{-\lambda}\frac{\lambda^k}{k!}.$$
Equating coefficients of powers of $\lambda$ we see immediately that $h(0)=1$ and $h(k)=0$ for $k=1,2,\ldots$. It follows that $h(X)=\delta(X)$.
The same computation can be carried out when $X_1,X_2,\ldots,X_n$ is a random sample from $P(\lambda)$. We leave the reader to show that the FCR lower bound for any unbiased estimator of $\theta=e^{-\lambda}$ is $\lambda e^{-2\lambda}/n$. The estimator $\sum_{i=1}^n\delta(X_i)/n$ is clearly unbiased for $e^{-\lambda}$ with variance $e^{-\lambda}(1-e^{-\lambda})/n>(\lambda e^{-2\lambda})/n$. The UMVUE of $e^{-\lambda}$ is given by $T_0=\bigl(\frac{n-1}{n}\bigr)^{\sum_{i=1}^nX_i}$ with
$$\mathrm{var}_\lambda(T_0)=e^{-2\lambda}(e^{\lambda/n}-1)>(\lambda e^{-2\lambda})/n\quad\text{for all }\lambda>0.$$
Corollary. Let $X_1,X_2,\ldots,X_n$ be iid with common PDF $f_\theta(x)$. Suppose the family $\{f_\theta:\theta\in\Theta\}$ satisfies the conditions of Theorem 1. Then equality holds in (2) if and only if, for all $\theta\in\Theta$,
$$T(x)-\psi(\theta)=k(\theta)\frac{\partial}{\partial\theta}\log f_\theta(x)\tag{8}$$
for some function $k(\theta)$.
Proof. Recall that we derived (2) by an application of the Cauchy–Schwarz inequality, where equality holds if and only if (8) holds.
Remark 7. Integrating (8) with respect to $\theta$ we get
$$\log f_\theta(x)=Q(\theta)T(x)+S(\theta)+A(x)$$
for some functions $Q$, $S$, and $A$. It follows that $f_\theta$ is a one-parameter exponential family and the statistic $T$ is sufficient for $\theta$.
Remark 8. A result that simplifies computations is the following. If $f_\theta$ is twice differentiable and $E_\theta\bigl[\frac{\partial}{\partial\theta}\log f_\theta(X)\bigr]$ can be differentiated under the expectation sign, then
$$I(\theta)=E_\theta\left[\frac{\partial}{\partial\theta}\log f_\theta(X)\right]^2=-E_\theta\left[\frac{\partial^2}{\partial\theta^2}\log f_\theta(X)\right].\tag{9}$$
For the proof of (9), it is straightforward to check that
$$\frac{\partial^2}{\partial\theta^2}\log f_\theta(x)=\frac{f''_\theta(x)}{f_\theta(x)}-\left[\frac{\partial}{\partial\theta}\log f_\theta(x)\right]^2.$$
Taking expectations on both sides we get (9).
Example 3. Let $X_1,X_2,\ldots,X_n$ be iid $N(\mu,1)$. Then
$$\log f_\mu(x)=-\frac{1}{2}\log(2\pi)-\frac{(x-\mu)^2}{2},\qquad \frac{\partial}{\partial\mu}\log f_\mu(x)=x-\mu,\qquad \frac{\partial^2}{\partial\mu^2}\log f_\mu(x)=-1.$$
Hence $I(\mu)=1$ and $I_n(\mu)=n$.
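Identity (9) is convenient to check by simulation. The sketch below is my own illustration for a Poisson($\lambda$) observation (my own choice of model; in the $N(\mu,1)$ case of Example 3 the identity is immediate): it estimates the Fisher information per observation both as $E[\text{score}^2]$ and as $-E[\partial^2\log f/\partial\lambda^2]$, and both come out near $1/\lambda$.

```python
import numpy as np

rng = np.random.default_rng(5)
lam, reps = 2.0, 500_000

x = rng.poisson(lam, size=reps)
score = x / lam - 1.0            # d/dlam log f_lam(x) for the Poisson PMF
neg_second = x / lam**2          # -d^2/dlam^2 log f_lam(x)

print((score**2).mean(), neg_second.mean(), 1 / lam)   # all near I_1(lam) = 1/lam
```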
We next consider an inequality due to Chapman, Robbins, and Kiefer (the CRK inequality) that gives a lower bound for the variance of an estimator but does not require regularity conditions of the Fréchet–Cramér–Rao type.
Theorem 2 (Chapman and Robbins [12], Kiefer [52]). Let $\Theta\subset\mathbb R$ and $\{f_\theta(x):\theta\in\Theta\}$ be a class of PDFs (PMFs). Let $\psi$ be defined on $\Theta$, and let $T$ be an unbiased estimator of $\psi(\theta)$ with $E_\theta T^2<\infty$ for all $\theta\in\Theta$. If $\theta\ne\varphi$, assume that $f_\theta$ and $f_\varphi$ are different, and assume further that there exists a $\varphi\in\Theta$ such that $\theta\ne\varphi$ and
$$S(\theta)=\{f_\theta(x)>0\}\supset S(\varphi)=\{f_\varphi(x)>0\}.\tag{10}$$
Then
$$\mathrm{var}_\theta(T(X))\ge\sup_{\{\varphi:\,S(\varphi)\subset S(\theta),\,\varphi\ne\theta\}}\frac{[\psi(\varphi)-\psi(\theta)]^2}{\mathrm{var}_\theta\{f_\varphi(X)/f_\theta(X)\}}\tag{11}$$
for all $\theta\in\Theta$.

Proof. Since $T$ is unbiased for $\psi$, $E_\varphi T(X)=\psi(\varphi)$ for all $\varphi\in\Theta$. Hence, for $\varphi\ne\theta$,
$$\int_{S(\theta)}T(x)\frac{f_\varphi(x)-f_\theta(x)}{f_\theta(x)}f_\theta(x)\,dx=\psi(\varphi)-\psi(\theta),\tag{12}$$
which yields
$$\mathrm{cov}_\theta\left(T(X),\frac{f_\varphi(X)}{f_\theta(X)}-1\right)=\psi(\varphi)-\psi(\theta).$$
Using the Cauchy–Schwarz inequality, we get
$$\mathrm{cov}^2_\theta\left(T(X),\frac{f_\varphi(X)}{f_\theta(X)}-1\right)\le\mathrm{var}_\theta(T(X))\,\mathrm{var}_\theta\left(\frac{f_\varphi(X)}{f_\theta(X)}-1\right)=\mathrm{var}_\theta(T(X))\,\mathrm{var}_\theta\left(\frac{f_\varphi(X)}{f_\theta(X)}\right).$$
Thus
$$\mathrm{var}_\theta(T(X))\ge\frac{[\psi(\varphi)-\psi(\theta)]^2}{\mathrm{var}_\theta\{f_\varphi(X)/f_\theta(X)\}},$$
and the result follows. In the discrete case it is necessary only to replace the integral on the left side of (12) by a sum. The rest of the proof needs no change.
Remark 9. Inequality (11) holds without any regularity conditions on $f_\theta$ or $\psi(\theta)$. We will show that it covers some nonregular cases of the FCR inequality. Sometimes (11) is available in an alternative form. Let $\theta$ and $\theta+\delta$ ($\delta\ne 0$) be any two distinct values in $\Theta$ such that $S(\theta+\delta)\subset S(\theta)$, and take $\psi(\theta)=\theta$. Write
$$J=J(\theta,\delta)=\frac{1}{\delta^2}\left\{\left[\frac{f_{\theta+\delta}(X)}{f_\theta(X)}\right]^2-1\right\}.$$
Then (11) can be written as
$$\mathrm{var}_\theta(T(X))\ge\frac{1}{\inf_\delta E_\theta J},\tag{13}$$
where the infimum is taken over all $\delta\ne 0$ such that $S(\theta+\delta)\subset S(\theta)$.

Remark 10. Inequality (11) applies if the parameter space is discrete, but the Fréchet–Cramér–Rao regularity conditions do not hold in that case.
Example 4. Let $X$ be $U[0,\theta]$. The regularity conditions of the FCR inequality do not hold in this case. Let $\psi(\theta)=\theta$. If $\varphi<\theta$, then $S(\varphi)\subset S(\theta)$. Also,
$$E_\theta\left[\frac{f_\varphi(X)}{f_\theta(X)}\right]^2=\int_0^\varphi\left(\frac{\theta}{\varphi}\right)^2\frac{1}{\theta}\,dx=\frac{\theta}{\varphi}.$$
Thus
$$\mathrm{var}_\theta(T(X))\ge\sup_{\varphi:\varphi<\theta}\frac{(\varphi-\theta)^2}{(\theta/\varphi)-1}=\sup_{\varphi:\varphi<\theta}\{\varphi(\theta-\varphi)\}=\frac{\theta^2}{4}$$
for any unbiased estimator $T(X)$ of $\theta$. $X$ is a complete sufficient statistic, and $2X$ is unbiased for $\theta$, so that $T(X)=2X$ is the UMVUE. Also
$$\mathrm{var}_\theta(2X)=4\,\mathrm{var}\,X=\frac{\theta^2}{3}>\frac{\theta^2}{4}.$$
Thus the lower bound of $\theta^2/4$ of the CRK inequality is not achieved by any unbiased estimator of $\theta$.
Example 5. Let $X$ have PMF
$$P_N\{X=k\}=\begin{cases}\dfrac{1}{N}, & k=1,2,\ldots,N,\\ 0, & \text{otherwise.}\end{cases}$$
Let $\Theta=\{N:N\ge M,\ M>1\ \text{given}\}$. Take $\psi(N)=N$. Although the FCR regularity conditions do not hold, (11) is applicable since, for $N\ne N'\in\Theta\subset\mathbb R$,
$$S(N)=\{1,2,\ldots,N\}\supset S(N')=\{1,2,\ldots,N'\}\quad\text{if }N'<N.$$
Also, $P_N$ and $P_{N'}$ are different for $N\ne N'$. Thus
$$\mathrm{var}_N(T)\ge\sup_{N'<N}\frac{(N-N')^2}{\mathrm{var}_N\{P_{N'}/P_N\}}.$$
Now
$$\frac{P_{N'}}{P_N}(x)=\frac{P_{N'}(x)}{P_N(x)}=\begin{cases}\dfrac{N}{N'}, & x=1,2,\ldots,N',\ N'<N,\\ 0, & \text{otherwise,}\end{cases}$$
$$E_N\left[\frac{P_{N'}(X)}{P_N(X)}\right]^2=\frac{1}{N}\sum_{x=1}^{N'}\left(\frac{N}{N'}\right)^2=\frac{N}{N'},$$
and
$$\mathrm{var}_N\left(\frac{P_{N'}(X)}{P_N(X)}\right)=\frac{N}{N'}-1>0\quad\text{for }N>N'.$$
It follows that
$$\mathrm{var}_N(T(X))\ge\sup_{N'<N}\frac{(N-N')^2}{(N-N')/N'}=\sup_{N'<N}N'(N-N').$$
Now
$$\frac{k(N-k)}{(k-1)(N-k+1)}>1\quad\text{if and only if}\quad k<\frac{N+1}{2},$$
so that $N'(N-N')$ increases as long as $N'<(N+1)/2$ and decreases if $N'>(N+1)/2$. The maximum is achieved at $N'=[(N+1)/2]$ if $M\le(N+1)/2$ and at $N'=M$ if $M>(N+1)/2$, where $[x]$ is the largest integer $\le x$. Therefore,
$$\mathrm{var}_N(T(X))\ge\left[\frac{N+1}{2}\right]\left(N-\left[\frac{N+1}{2}\right]\right)\quad\text{if }M\le(N+1)/2$$
and
$$\mathrm{var}_N(T(X))\ge M(N-M)\quad\text{if }M>(N+1)/2.$$
Example 6. Let $X\sim N(0,\sigma^2)$ and let $X_1,X_2,\ldots,X_n$ be a sample on $X$. Let us compute $J$ (see Remark 9) for $\delta\ne 0$:
$$J=\frac{1}{\delta^2}\left\{\left[\frac{f_{\sigma+\delta}(X)}{f_\sigma(X)}\right]^2-1\right\}=\frac{1}{\delta^2}\left\{\frac{\sigma^{2n}}{(\sigma+\delta)^{2n}}\exp\left[-\frac{\sum X_i^2}{(\sigma+\delta)^2}+\frac{\sum X_i^2}{\sigma^2}\right]-1\right\}=\frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma+\delta}\right)^{2n}\exp\left[\frac{\sum X_i^2(\delta^2+2\sigma\delta)}{\sigma^2(\sigma+\delta)^2}\right]-1\right\},$$
$$E_\sigma J=\frac{1}{\delta^2}\left(\frac{\sigma}{\sigma+\delta}\right)^{2n}E_\sigma\left[\exp\left(c\,\frac{\sum X_i^2}{\sigma^2}\right)\right]-\frac{1}{\delta^2},$$
where $c=(\delta^2+2\sigma\delta)/(\sigma+\delta)^2$. Since $\sum X_i^2/\sigma^2\sim\chi^2(n)$,
$$E_\sigma J=\frac{1}{\delta^2}\left\{\left(\frac{\sigma}{\sigma+\delta}\right)^{2n}\frac{1}{(1-2c)^{n/2}}-1\right\}\quad\text{for }c<\frac12.$$
Let $k=\delta/\sigma$; then
$$c=\frac{2k+k^2}{(1+k)^2},\qquad 1-2c=\frac{1-2k-k^2}{(1+k)^2},\qquad E_\sigma J=\frac{1}{k^2\sigma^2}\left[(1+k)^{-n}(1-2k-k^2)^{-n/2}-1\right].$$
Here $1+k>0$ and $1-2c>0$, so that $1-2k-k^2>0$, implying $-\sqrt2<k+1<\sqrt2$ and also $k>-1$. Thus $-1<k<\sqrt2-1$ and $k\ne 0$. Also,
$$\lim_{k\to 0}E_\sigma J=\lim_{k\to 0}\frac{(1+k)^{-n}(1-2k-k^2)^{-n/2}-1}{k^2\sigma^2}=\frac{2n}{\sigma^2}$$
by L'Hospital's rule. We leave the reader to check that this limit is the Fisher information $I_n(\sigma)$, so its reciprocal $\sigma^2/(2n)$ is the FCR lower bound for $\mathrm{var}_\sigma(T(X))$. But the minimum value of $E_\sigma J$ is not achieved in the neighborhood of $k=0$, so that the CRK inequality is sharper than the FCR inequality. Next, we show that for $n=2$ we can do better with the CRK inequality. We have
$$E_\sigma J=\frac{1}{k^2\sigma^2}\left[\frac{1}{(1-2k-k^2)(1+k)^2}-1\right]=\frac{(k+2)^2}{\sigma^2(1+k)^2(1-2k-k^2)},\quad -1<k<\sqrt2-1,\ k\ne 0.$$
For $k=-0.1607$ we achieve the lower bound as $(E_\sigma J)^{-1}=0.2698\sigma^2$, so that
$$\mathrm{var}_\sigma(T(X))\ge 0.2698\sigma^2>\sigma^2/4.$$
Finally, we show that this bound is by no means the best available; it is possible to improve on the Chapman–Robbins–Kiefer bounds too in some cases. Take
$$T(X_1,X_2,\ldots,X_n)=\frac{\Gamma(n/2)}{\Gamma[(n+1)/2]}\,\frac{\sigma}{\sqrt2}\sqrt{\frac{\sum_1^nX_i^2}{\sigma^2}}$$
to be an estimate of $\sigma$. Now $E_\sigma T=\sigma$ and
$$E_\sigma T^2=\frac{\sigma^2}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n+1)/2]}\right)^2E\left[\frac{\sum_1^nX_i^2}{\sigma^2}\right]=\frac{n\sigma^2}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n+1)/2]}\right)^2,$$
so that
$$\mathrm{var}_\sigma(T)=\sigma^2\left[\frac{n}{2}\left(\frac{\Gamma(n/2)}{\Gamma[(n+1)/2]}\right)^2-1\right].$$
For $n=2$,
$$\mathrm{var}_\sigma(T)=\sigma^2\left(\frac{4}{\pi}-1\right)=0.2732\sigma^2,$$
which is $>0.2698\sigma^2$, the CRK bound. Note that $T$ is the UMVUE.
Remark 11. In general the CRK inequality is as sharp as the FCR inequality. See Chapman and Robbins [12, pp. 584–585] for details.
We next introduce the concept of efficiency.

Definition 2. Let $T_1,T_2$ be two unbiased estimators for a parameter $\theta$. Suppose that $E_\theta T_1^2<\infty$ and $E_\theta T_2^2<\infty$. We define the efficiency of $T_1$ relative to $T_2$ by
$$\mathrm{eff}_\theta(T_1\mid T_2)=\frac{\mathrm{var}_\theta(T_2)}{\mathrm{var}_\theta(T_1)}\tag{14}$$
and say that $T_1$ is more efficient than $T_2$ if
$$\mathrm{eff}_\theta(T_1\mid T_2)>1.\tag{15}$$
It is usual to judge the performance of an unbiased estimator by comparing its variance with the lower bound given by the FCR inequality.
Definition 3. Assume that the regularity conditions of the FCR inequality are satisfied by the family of DFs $\{F_\theta,\theta\in\Theta\}$, $\Theta\subseteq\mathbb R$. We say that an unbiased estimator $T$ for the parameter $\theta$ is most efficient for the family $\{F_\theta\}$ if
$$\mathrm{var}_\theta(T)=\left\{E_\theta\left[\frac{\partial\log f_\theta(X)}{\partial\theta}\right]^2\right\}^{-1}=1/I_n(\theta).\tag{16}$$
Definition 4. Let $T$ be the most efficient estimator for the regular family of DFs $\{F_\theta,\theta\in\Theta\}$. Then the efficiency of any unbiased estimator $T_1$ of $\theta$ is defined as
$$\mathrm{eff}_\theta(T_1)=\mathrm{eff}_\theta(T_1\mid T)=\frac{\mathrm{var}_\theta(T)}{\mathrm{var}_\theta(T_1)}=\frac{1}{I_n(\theta)\,\mathrm{var}_\theta(T_1)}.\tag{17}$$
Clearly, the efficiency of the most efficient estimator is 1, and the efficiency of any unbiased estimator $T_1$ is $\le 1$.
Definition 5. We say that an estimator $T_1$ is asymptotically (most) efficient if
$$\lim_{n\to\infty}\mathrm{eff}_\theta(T_1)=1\tag{18}$$
and $T_1$ is at least asymptotically unbiased in the sense that $\lim_{n\to\infty}E_\theta T_1=\theta$. Here $n$ is the sample size.
Remark 12. Definition 3, although in common usage, has many drawbacks. We have already seen cases in which the regularity conditions are not satisfied and yet UMVUEs exist. The definition does not cover such cases. Moreover, in many cases where the regularity conditions are satisfied and UMVUEs exist, the UMVUE is not most efficient, since the variance of the best estimator (the UMVUE) does not achieve the lower bound of the FCR inequality.
Example 7. Let $X\sim b(n,p)$. Then we have seen in Example 1 that $X/n$ is the UMVUE since its variance achieves the lower bound of the FCR inequality. It follows that $X/n$ is most efficient.
Example 8. Let $X_1,X_2,\ldots,X_n$ be iid $P(\lambda)$ RVs and suppose $\psi(\lambda)=P_\lambda(X=0)=e^{-\lambda}$. From Example 2, the UMVUE of $\psi$ is given by $T_0=\bigl(\frac{n-1}{n}\bigr)^{\sum_{i=1}^nX_i}$ with
$$\mathrm{var}_\lambda(T_0)=e^{-2\lambda}(e^{\lambda/n}-1).$$
Also $I_n(\lambda)=n/(\lambda e^{-2\lambda})$. It follows that
$$\mathrm{eff}_\lambda(T_0)=\frac{(\lambda e^{-2\lambda})/n}{e^{-2\lambda}(e^{\lambda/n}-1)}<\frac{\lambda e^{-2\lambda}/n}{e^{-2\lambda}(\lambda/n)}=1,$$
since $e^x-1>x$ for $x>0$. Thus $T_0$ is not most efficient. However, since $\mathrm{eff}_\lambda(T_0)\to 1$ as $n\to\infty$, $T_0$ is asymptotically efficient.
In view of Remarks 6 and 7, the following result describes the relationship between most efficient unbiased estimators and UMVUEs.
Theorem 3. A necessary and sufficient condition for an unbiased estimator $T$ of $\psi$ to be most efficient is that $T$ be sufficient and that relation (8) hold for some function $k(\theta)$.
Clearly, an estimator $T$ satisfying the conditions of Theorem 3 will be the UMVUE, and the two estimators coincide. We emphasize that we have assumed the regularity conditions of the FCR inequality in making this statement.
Example 9. Let $(X,Y)$ be jointly distributed with PDF
$$f_\theta(x,y)=\exp\left\{-\left(\frac{x}{\theta}+\theta y\right)\right\},\quad x>0,\ y>0.$$
For a sample $(x,y)$ of size 1, we have
$$\frac{\partial}{\partial\theta}\log f_\theta(x,y)=-\frac{\partial}{\partial\theta}\left(\frac{x}{\theta}+\theta y\right)=\frac{x}{\theta^2}-y.$$
Hence, the information for this sample is
$$I(\theta)=E_\theta\left(Y-\frac{X}{\theta^2}\right)^2=E_\theta(Y^2)+\frac{E(X^2)}{\theta^4}-\frac{2E(XY)}{\theta^2}.$$
Now
$$E_\theta(Y^2)=\frac{2}{\theta^2},\quad E_\theta(X^2)=2\theta^2,\quad\text{and}\quad E(XY)=1,$$
so that
$$I(\theta)=\frac{2}{\theta^2}+\frac{2}{\theta^2}-\frac{2}{\theta^2}=\frac{2}{\theta^2}.$$
Therefore, the amount of Fisher information in a sample of $n$ pairs is $2n/\theta^2$.
We return to Example 8.3.23, where $X_1,X_2,\ldots,X_n$ are iid $G(1,\theta)$, $Y_1,Y_2,\ldots,Y_n$ are iid $G(1,1/\theta)$, and the $X$'s and $Y$'s are independent. Then $(X_1,Y_1)$ has the common PDF $f_\theta(x,y)$ given above. We will compute Fisher's information for $\theta$ in the family of PDFs of $S(X,Y)=\bigl(\sum X_i/\sum Y_i\bigr)^{1/2}$. Using the PDFs of $\sum X_i\sim G(n,\theta)$ and $\sum Y_i\sim G(n,1/\theta)$ and the transformation technique, it is easy to see that $S(X,Y)$ has PDF
$$g_\theta(s)=\frac{2\Gamma(2n)}{[\Gamma(n)]^2}s^{-1}\left(\frac{s}{\theta}+\frac{\theta}{s}\right)^{-2n},\quad s>0.$$
Thus
$$\frac{\partial\log g_\theta(s)}{\partial\theta}=-2n\left(-\frac{s}{\theta^2}+\frac{1}{s}\right)\left(\frac{s}{\theta}+\frac{\theta}{s}\right)^{-1}.$$
It follows that
$$E_\theta\left[\frac{\partial}{\partial\theta}\log g_\theta(S)\right]^2=\frac{4n^2}{\theta^2}E\left[1-4\left(\frac{S}{\theta}+\frac{\theta}{S}\right)^{-2}\right]=\frac{4n^2}{\theta^2}\left[1-4\frac{n}{2(2n+1)}\right]=\frac{2n}{\theta^2}\left(\frac{2n}{2n+1}\right)<\frac{2n}{\theta^2}.$$
That is, the information about $\theta$ in $S$ is smaller than that in the sample.
The Fisher information in the conditional PDF of $S$ given $A=a$, where $A(X,Y)=S_1(X)S_2(Y)$, can be shown (Problem 12) to equal
$$\frac{2a}{\theta^2}\frac{K_1(2a)}{K_0(2a)},$$
where $K_0$ and $K_1$ are Bessel functions of order 0 and 1, respectively. Averaging over all values of $A$, one can show that the information is $2n/\theta^2$, which is the total Fisher information in the sample of $n$ pairs $(x_j,y_j)$.
PROBLEMS 8.5
1. Are the following families of distributions regular in the sense of Fréchet, Cramér, and Rao? If so, find the lower bound for the variance of an unbiased estimator based on a sample of size $n$.
(a) $f_\theta(x)=\theta^{-1}e^{-x/\theta}$ if $x>0$, and $=0$ otherwise; $\theta>0$.
(b) $f_\theta(x)=e^{-(x-\theta)}$ if $\theta<x<\infty$, and $=0$ otherwise.
(c) $f_\theta(x)=\theta(1-\theta)^x$, $x=0,1,2,\ldots$; $0<\theta<1$.
(d) $f(x;\sigma^2)=(1/\sigma\sqrt{2\pi})e^{-x^2/2\sigma^2}$, $-\infty<x<\infty$; $\sigma^2>0$.
2. Find the CRK lower bound for the variance of an unbiased estimator of $\theta$, based on a sample of size $n$ from the PDF of Problem 1(b).
3. Find the CRK bound for the variance of an unbiased estimator of $\theta$ in sampling from $N(\theta,1)$.
4. In Problem 1 check to see whether there exists a most efficient estimator in each case.
5. Let $X_1,X_2,\ldots,X_n$ be a sample from a three-point distribution:
$$P\{X=y_1\}=\frac{1-\theta}{2},\quad P\{X=y_2\}=\frac12,\quad P\{X=y_3\}=\frac{\theta}{2},$$
where $0<\theta<1$. Does the FCR inequality apply in this case? If so, what is the lower bound for the variance of an unbiased estimator of $\theta$?
6. Let $X_1,X_2,\ldots,X_n$ be iid RVs with mean $\mu$ and finite variance. What is the efficiency of the unbiased (and consistent) estimator $[2/n(n+1)]\sum_{i=1}^n iX_i$ relative to $\bar X$?
7. When does equality hold in the CRK inequality?
8. Let $X_1,X_2,\ldots,X_n$ be a sample from $N(\mu,1)$, and let $d(\mu)=\mu^2$:
(a) Show that the minimum variance of any estimator of $\mu^2$ from the FCR inequality is $4\mu^2/n$.
(b) Show that $T(X_1,X_2,\ldots,X_n)=\bar X^2-(1/n)$ is the UMVUE of $\mu^2$ with variance $4\mu^2/n+2/n^2$.
9. Let $X_1,X_2,\ldots,X_n$ be iid $G(1,1/\alpha)$ RVs:
(a) Show that the estimator $T(X_1,X_2,\ldots,X_n)=(n-1)/(n\bar X)$ is the UMVUE for $\alpha$ with variance $\alpha^2/(n-2)$.
(b) Show that the minimum variance from the FCR inequality is $\alpha^2/n$.
10. In Problem 8.4.16 compute the relative efficiency of $\hat\mu_0$ with respect to $\hat\mu_1$.
11. Let $X_1,X_2,\ldots,X_n$ and $Y_1,Y_2,\ldots,Y_m$ be independent samples from $N(\mu,\sigma_1^2)$ and $N(\mu,\sigma_2^2)$, respectively, where $\mu,\sigma_1^2,\sigma_2^2$ are unknown. Let $\rho=\sigma_2^2/\sigma_1^2$ and $\theta=m/n$, and consider the problem of unbiased estimation of $\mu$:
(a) If $\rho$ is known, show that $\hat\mu_0=\alpha\bar X+(1-\alpha)\bar Y$, where $\alpha=\rho/(\rho+\theta)$, is the BLUE of $\mu$. Compute $\mathrm{var}(\hat\mu_0)$.
(b) If $\rho$ is unknown, the unbiased estimator
$$\bar\mu=\frac{\bar X+\theta\bar Y}{1+\theta}$$
is optimum in the neighborhood of $\rho=1$. Find the variance of $\bar\mu$.
(c) Compute the efficiency of $\bar\mu$ relative to $\hat\mu_0$.
(d) Another unbiased estimator of $\mu$ is
$$\hat\mu=\frac{\rho F\bar X+\theta\bar Y}{\theta+\rho F},$$
where $F=S_2^2/(\rho S_1^2)$ is an $F(m-1,n-1)$ RV.
12. Show that the Fisher information on $\theta$ based on the PDF
$$\frac{1}{2K_0(2a)\,s}\exp\left\{-a\left(\frac{s}{\theta}+\frac{\theta}{s}\right)\right\}$$
for fixed $a$ equals $\dfrac{2a}{\theta^2}\dfrac{K_1(2a)}{K_0(2a)}$, where $K_0(2a)$ and $K_1(2a)$ are Bessel functions of order 0 and 1, respectively.

8.6 SUBSTITUTION PRINCIPLE (METHOD OF MOMENTS)
One of the simplest and oldest methods of estimation is the substitution principle: let $\psi(\theta)$, $\theta\in\Theta$, be a parametric function to be estimated on the basis of a random sample $X_1,X_2,\ldots,X_n$ from a population DF $F$. Suppose we can write $\psi(\theta)=h(F)$ for some known function $h$. Then the substitution principle estimator of $\psi(\theta)$ is $h(F_n^*)$, where $F_n^*$ is the sample distribution function. Accordingly, we estimate $\mu=\mu(F)$ by $\mu(F_n^*)=\bar X$, $m_k=E_FX^k$ by $\sum_{j=1}^nX_j^k/n$, and so on. The method of moments is a special case in which we need to estimate some known function of a finite number of unknown moments. Let us suppose that we are interested in estimating
$$\theta=h(m_1,m_2,\ldots,m_k),\tag{1}$$
where $h$ is some known numerical function and $m_j$ is the $j$th-order moment of the population distribution, which is known to exist for $1\le j\le k$.
Definition 1. The method of moments consists in estimating $\theta$ by the statistic
$$T(X_1,\ldots,X_n)=h\left(n^{-1}\sum_1^nX_i,\ n^{-1}\sum_1^nX_i^2,\ \ldots,\ n^{-1}\sum_1^nX_i^k\right).\tag{2}$$
To make sure that $T$ is a statistic, we will assume that $h:\mathbb R^k\to\mathbb R$ is a Borel-measurable function.
Remark 1. It is easy to extend the method to the estimation of joint moments. Thus we use $n^{-1}\sum_1^nX_iY_i$ to estimate $E(XY)$, and so on.
Remark 2. From the WLLN, $n^{-1}\sum_{i=1}^nX_i^j\xrightarrow{P}EX^j$. Thus, if one is interested in estimating the population moments, the method of moments leads to consistent and unbiased estimators. Moreover, the method of moments estimators in this case are asymptotically normally distributed (see Section 7.5).
Again, if one estimates parameters of the type $\theta$ defined in (1) and $h$ is a continuous function, the estimators $T(X_1,X_2,\ldots,X_n)$ defined in (2) are consistent for $\theta$ (see Problem 1). Under some mild conditions on $h$, the estimator $T$ is also asymptotically normal (see Cramér [17, pp. 386–387]).
Example 1. Let $X_1,X_2,\ldots,X_n$ be iid RVs with common mean $\mu$ and variance $\sigma^2$. Then $\sigma=\sqrt{m_2-m_1^2}$, and the method of moments estimator for $\sigma$ is given by
$$T(X_1,\ldots,X_n)=\sqrt{\frac{1}{n}\sum_1^nX_i^2-\frac{\bigl(\sum X_i\bigr)^2}{n^2}}.$$
Although $T$ is consistent and asymptotically normal for $\sigma$, it is not unbiased.
In particular, if $X_1,X_2,\ldots,X_n$ are iid $P(\lambda)$ RVs, we know that $EX_1=\lambda$ and $\mathrm{var}(X_1)=\lambda$. The method of moments leads to using either $\bar X$ or $\sum_1^n(X_i-\bar X)^2/n$ as an estimator of $\lambda$. To avoid this kind of ambiguity we take the estimator involving the lowest-order sample moment.

Example 2. Let $X_1,X_2,\ldots,X_n$ be a sample from
$$f(x)=\begin{cases}\dfrac{1}{b-a}, & a\le x\le b,\\ 0, & \text{otherwise.}\end{cases}$$
Then
$$EX=\frac{a+b}{2}\quad\text{and}\quad\mathrm{var}(X)=\frac{(b-a)^2}{12}.$$
The method of moments leads to estimating $EX$ by $\bar X$ and $\mathrm{var}(X)$ by $\sum_1^n(X_i-\bar X)^2/n$, so that the estimators for $a$ and $b$, respectively, are
$$T_1(X_1,\ldots,X_n)=\bar X-\sqrt{\frac{3\sum_1^n(X_i-\bar X)^2}{n}}\quad\text{and}\quad T_2(X_1,\ldots,X_n)=\bar X+\sqrt{\frac{3\sum_1^n(X_i-\bar X)^2}{n}}.$$
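The estimators $T_1$ and $T_2$ are easy to compute in practice. The sketch below is my own illustration (the true endpoints, sample size, and seed are arbitrary): it applies the method of moments to a simulated uniform sample.

```python
import numpy as np

rng = np.random.default_rng(6)
a, b, n = 2.0, 7.0, 5_000

x = rng.uniform(a, b, size=n)
m1 = x.mean()
v = ((x - m1) ** 2).mean()            # second central sample moment (divisor n)

t1 = m1 - np.sqrt(3 * v)              # method of moments estimate of a
t2 = m1 + np.sqrt(3 * v)              # method of moments estimate of b
print(t1, t2)                          # close to (2, 7)
```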
Example 3. Let $X_1,X_2,\ldots,X_N$ be iid $b(n,p)$ RVs, where both $n$ and $p$ are unknown. The method of moments estimators of $p$ and $n$ are obtained from the equations
$$\bar X=EX=np$$
and
$$\frac{1}{N}\sum_1^NX_i^2=EX^2=np(1-p)+n^2p^2.$$
Solving for $n$ and $p$, we get the estimator for $p$ as
$$T_1(X_1,\ldots,X_N)=\frac{\bar X}{T_2(X_1,\ldots,X_N)},$$
where $T_2(X_1,\ldots,X_N)$ is the estimator for $n$, given by
$$T_2(X_1,X_2,\ldots,X_N)=\frac{\bar X^2}{\bar X+\bar X^2-\sum_1^NX_i^2/N}.$$
Note that $\bar X\xrightarrow{P}np$ and $\sum_1^NX_i^2/N\xrightarrow{P}np(1-p)+n^2p^2$, so that both $T_1$ and $T_2$ are consistent estimators.
The method of moments may lead to absurd estimators. The reader is asked to compute estimators of $\theta$ in $N(\theta,\theta)$ or $N(\theta,\theta^2)$ by the method of moments and verify this assertion.

PROBLEMS 8.6
1. Let $X_n\xrightarrow{P}a$ and $Y_n\xrightarrow{P}b$, where $a$ and $b$ are constants. Let $h:\mathbb R^2\to\mathbb R$ be a continuous function. Show that $h(X_n,Y_n)\xrightarrow{P}h(a,b)$.
2. Let $X_1,X_2,\ldots,X_n$ be a sample from $G(\alpha,\beta)$. Find the method of moments estimator for $(\alpha,\beta)$.
3. Let $X_1,X_2,\ldots,X_n$ be a sample from $N(\mu,\sigma^2)$. Find the method of moments estimator for $(\mu,\sigma^2)$.
4. Let $X_1,X_2,\ldots,X_n$ be a sample from $B(\alpha,\beta)$. Find the method of moments estimator for $(\alpha,\beta)$.
5. A random sample of size $n$ is taken from the lognormal PDF
$$f(x;\mu,\sigma)=(\sigma\sqrt{2\pi})^{-1}x^{-1}\exp\left\{-\frac{1}{2\sigma^2}(\log x-\mu)^2\right\},\quad x>0.$$
Find the method of moments estimators for $\mu$ and $\sigma^2$.
8.7 MAXIMUM LIKELIHOOD ESTIMATORS
In this section we study a frequently used method of estimation, namely, the method of maximum likelihood estimation. Consider the following example.
Example 1. Let $X\sim b(n,p)$. One observation on $X$ is available, and it is known that $n$ is either 2 or 3 and $p=\tfrac12$ or $\tfrac13$. Our objective is to estimate the pair $(n,p)$. The following table gives the probability that $X=x$ for each possible pair $(n,p)$:

  x    (2, 1/2)   (2, 1/3)   (3, 1/2)   (3, 1/3)   Maximum Probability
  0      1/4        4/9        1/8        8/27           4/9
  1      1/2        4/9        3/8       12/27           1/2
  2      1/4        1/9        3/8        6/27           3/8
  3       0          0         1/8        1/27           1/8

The last column gives the maximum probability in each row, that is, for each value that $X$ assumes. If the value $x=1$, say, is observed, it is more probable that it came from the distribution $b(2,\tfrac12)$ than from any of the other distributions, and so on. The following estimator is, therefore, reasonable in that it maximizes the probability of the observed value:
$$(\hat n,\hat p)(x)=\begin{cases}(2,\tfrac13) & \text{if }x=0,\\ (2,\tfrac12) & \text{if }x=1,\\ (3,\tfrac12) & \text{if }x=2,\\ (3,\tfrac12) & \text{if }x=3.\end{cases}$$

The principle of maximum likelihood essentially assumes that the sample is representative of the population and chooses as the estimator that value of the parameter which maximizes the PDF (PMF) $f_\theta(x)$.
Definition 1. Let $(X_1,X_2,\ldots,X_n)$ be a random vector with PDF (PMF) $f_\theta(x_1,x_2,\ldots,x_n)$, $\theta\in\Theta$. The function
$$L(\theta;x_1,x_2,\ldots,x_n)=f_\theta(x_1,x_2,\ldots,x_n),\tag{1}$$
considered as a function of $\theta$, is called the likelihood function.
Usually $\theta$ will be a multiple parameter. If $X_1,X_2,\ldots,X_n$ are iid with PDF (PMF) $f_\theta(x)$, the likelihood function is
$$L(\theta;x_1,x_2,\ldots,x_n)=\prod_{i=1}^nf_\theta(x_i).\tag{2}$$
Let $\Theta\subseteq\mathbb R^k$ and $X=(X_1,X_2,\ldots,X_n)$.
Definition 2. The principle of maximum likelihood estimation consists of choosing as an estimator of $\theta$ a $\hat\theta(X)$ that maximizes $L(\theta;x_1,x_2,\ldots,x_n)$, that is, a mapping $\hat\theta$ of $\mathbb R^n\to\mathbb R^k$ that satisfies
$$L(\hat\theta;x_1,x_2,\ldots,x_n)=\sup_{\theta\in\Theta}L(\theta;x_1,x_2,\ldots,x_n).\tag{3}$$
(Constants are not admissible as estimators.)
If a $\hat\theta$ satisfying (3) exists, we call it a maximum likelihood estimator (MLE).
It is convenient to work with the logarithm of the likelihood function. Since $\log$ is a monotone function,
$$\log L(\hat\theta;x_1,\ldots,x_n)=\sup_{\theta\in\Theta}\log L(\theta;x_1,\ldots,x_n).\tag{4}$$
Let $\Theta$ be an open subset of $\mathbb R^k$, and suppose that $f_\theta(x)$ is a positive, differentiable function of $\theta$ (that is, the first-order partial derivatives exist in the components of $\theta$). If a supremum $\hat\theta$ exists, it must satisfy the likelihood equations
$$\frac{\partial\log L(\hat\theta;x_1,\ldots,x_n)}{\partial\theta_j}=0,\quad j=1,2,\ldots,k,\quad \theta=(\theta_1,\ldots,\theta_k).\tag{5}$$
Any nontrivial root of the likelihood equations (5) is called an MLE in the loose sense. A parameter value that provides the absolute maximum of the likelihood function is called an MLE in the strict sense or, simply, an MLE.
Remark 1. If $\Theta\subseteq\mathbb R$, there may still be many problems. Often the likelihood equation $\partial L/\partial\theta=0$ has more than one root, or the likelihood function is not differentiable everywhere in $\Theta$, or $\hat\theta$ may be a terminal value. Sometimes the likelihood equation may be quite complicated and difficult to solve explicitly. In that case one may have to resort to some numerical procedure to obtain the estimator. Similar remarks apply to the multiparameter case.
Example 2. Let $X_1,X_2,\ldots,X_n$ be a sample from $N(\mu,\sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Here $\Theta=\{(\mu,\sigma^2):-\infty<\mu<\infty,\ \sigma^2>0\}$. The likelihood function is
$$L(\mu,\sigma^2;x_1,\ldots,x_n)=\frac{1}{\sigma^n(2\pi)^{n/2}}\exp\left\{-\frac{\sum_{i=1}^n(x_i-\mu)^2}{2\sigma^2}\right\}$$
and
$$\log L(\mu,\sigma^2;x)=-\frac{n}{2}\log\sigma^2-\frac{n}{2}\log(2\pi)-\frac{\sum_1^n(x_i-\mu)^2}{2\sigma^2}.$$
The likelihood equations are
$$\frac{1}{\sigma^2}\sum_{i=1}^n(x_i-\mu)=0\quad\text{and}\quad -\frac{n}{2}\frac{1}{\sigma^2}+\frac{1}{2\sigma^4}\sum_{i=1}^n(x_i-\mu)^2=0.$$
Solving the first of these equations for $\mu$, we get $\hat\mu=\bar X$ and, substituting into the second, $\hat\sigma^2=\sum_{i=1}^n(X_i-\bar X)^2/n$. We see that $(\hat\mu,\hat\sigma^2)\in\Theta$ with probability 1. We show that $(\hat\mu,\hat\sigma^2)$ maximizes the likelihood function. First note that $\bar x$ maximizes $L(\mu,\sigma^2;x)$ for every fixed $\sigma^2$, since $L(\mu,\sigma^2;x)\to 0$ as $|\mu|\to\infty$; moreover $L(\hat\mu,\sigma^2;x)\to 0$ as $\sigma^2\to 0$ or $\infty$, so the root $\hat\theta=(\hat\mu,\hat\sigma^2)\in\Theta$ of the likelihood equations indeed maximizes the likelihood.
Note that $\hat\sigma^2$ is not unbiased for $\sigma^2$. Indeed, $E\hat\sigma^2=[(n-1)/n]\sigma^2$. But $n\hat\sigma^2/(n-1)=S^2$ is unbiased, as we already know. Also, $\hat\mu$ is unbiased, and both $\hat\mu$ and $\hat\sigma^2$ are consistent. In addition, $\hat\mu$ and $\hat\sigma^2$ are the method of moments estimators for $\mu$ and $\sigma^2$, and $(\hat\mu,\hat\sigma^2)$ is jointly sufficient.
Finally, note that $\hat\mu$ is the MLE of $\mu$ if $\sigma^2$ is known; but if $\mu$ is known, the MLE of $\sigma^2$ is not $\hat\sigma^2$ but $\sum_1^n(X_i-\mu)^2/n$.
Example 3. Let $X_1,X_2,\ldots,X_n$ be a sample from the PMF
$$P_N(k)=\begin{cases}\dfrac{1}{N}, & k=1,2,\ldots,N,\\ 0, & \text{otherwise.}\end{cases}$$
The likelihood function is
$$L(N;k_1,k_2,\ldots,k_n)=\begin{cases}\dfrac{1}{N^n}, & 1\le\max(k_1,\ldots,k_n)\le N,\\ 0, & \text{otherwise.}\end{cases}$$
Clearly the MLE of $N$ is given by
$$\hat N(X_1,X_2,\ldots,X_n)=\max(X_1,X_2,\ldots,X_n);$$
if we take any $\hat\alpha<\hat N$ as the MLE, then $P_{\hat\alpha}(k_1,k_2,\ldots,k_n)=0$; and if we take any $\hat\beta>\hat N$ as the MLE, then $P_{\hat\beta}(k_1,k_2,\ldots,k_n)=1/\hat\beta^{\,n}<1/\hat N^{\,n}=P_{\hat N}(k_1,k_2,\ldots,k_n)$.
We see that the MLE $\hat N$ is consistent, sufficient, and complete, but not unbiased.
Example 4. Consider the hypergeometric PMF
$$P_N(x)=\begin{cases}\dfrac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}}, & \max(0,n-N+M)\le x\le\min(n,M),\\ 0, & \text{otherwise.}\end{cases}$$
To find the MLE $\hat N=\hat N(X)$ of $N$, consider the ratio
$$R(N)=\frac{P_N(x)}{P_{N-1}(x)}=\frac{N-n}{N}\cdot\frac{N-M}{N-M-n+x}.$$
For values of $N$ for which $R(N)>1$, $P_N(x)$ increases with $N$, and for values of $N$ for which $R(N)<1$, $P_N(x)$ is a decreasing function of $N$:
$$R(N)>1\ \text{if and only if}\ N<\frac{nM}{x}\quad\text{and}\quad R(N)<1\ \text{if and only if}\ N>\frac{nM}{x}.$$
It follows that $P_N(x)$ reaches its maximum value where $N\approx nM/x$. Thus $\hat N(X)=[nM/X]$, where $[x]$ denotes the largest integer $\le x$.
Example 5. Let $X_1,X_2,\ldots,X_n$ be a sample from $U[\theta-\tfrac12,\theta+\tfrac12]$. The likelihood function is
$$L(\theta;x_1,x_2,\ldots,x_n)=\begin{cases}1, & \theta-\tfrac12\le\min(x_1,\ldots,x_n)\le\max(x_1,\ldots,x_n)\le\theta+\tfrac12,\\ 0, & \text{otherwise.}\end{cases}$$
Thus $L(\theta;x)$ attains its maximum provided that
$$\theta-\tfrac12\le\min(x_1,\ldots,x_n)\quad\text{and}\quad\theta+\tfrac12\ge\max(x_1,\ldots,x_n),$$
or when
$$\theta\le\min(x_1,\ldots,x_n)+\tfrac12\quad\text{and}\quad\theta\ge\max(x_1,\ldots,x_n)-\tfrac12.$$
It follows that every statistic $T(X_1,X_2,\ldots,X_n)$ such that
$$\max_{1\le i\le n}X_i-\tfrac12\le T(X_1,X_2,\ldots,X_n)\le\min_{1\le i\le n}X_i+\tfrac12\tag{6}$$
is an MLE of $\theta$. Indeed, for $0<\alpha<1$,
$$T_\alpha(X_1,\ldots,X_n)=\max_{1\le i\le n}X_i-\tfrac12+\alpha\Bigl(1+\min_{1\le i\le n}X_i-\max_{1\le i\le n}X_i\Bigr)$$
lies in the interval (6), and hence for each $\alpha$, $0<\alpha<1$, $T_\alpha(X_1,\ldots,X_n)$ is an MLE of $\theta$. In particular, if $\alpha=\tfrac12$,
$$T_{1/2}(X_1,\ldots,X_n)=\frac{\min X_i+\max X_i}{2}$$
is an MLE of $\theta$.
Example 6. Let $X\sim b(1,p)$, $p\in[\tfrac14,\tfrac34]$. In this case $L(p;x)=p^x(1-p)^{1-x}$, $x=0,1$, and we cannot differentiate $L(p;x)$ to get the MLE of $p$, since that would lead to $\hat p=x$, a value that does not lie in $\Theta=[\tfrac14,\tfrac34]$. We have
$$L(p;x)=\begin{cases}p, & x=1,\\ 1-p, & x=0,\end{cases}$$
which is maximized if we choose $\hat p(x)=\tfrac14$ if $x=0$, and $=\tfrac34$ if $x=1$. Thus the MLE of $p$ is given by
$$\hat p(X)=\frac{2X+1}{4}.$$
Note that $E_p\hat p(X)=(2p+1)/4$, so that $\hat p$ is biased. Also, the mean square error for $\hat p$ is
$$E_p(\hat p(X)-p)^2=\frac{1}{16}E_p(2X+1-4p)^2=\frac{1}{16}.$$
In the sense of the MSE, the MLE is worse than the trivial estimator $\delta(X)=\tfrac12$, for
$$E_p\left(\tfrac12-p\right)^2=\left(\tfrac12-p\right)^2\le\frac{1}{16}\quad\text{for }p\in[\tfrac14,\tfrac34].$$
Example 7. Let $X_1,X_2,\ldots,X_n$ be iid $b(1,p)$ RVs, and suppose that $p\in(0,1)$. If $(0,0,\ldots,0)$ (or $(1,1,\ldots,1)$) is observed, $\bar X=0$ ($\bar X=1$) is the MLE, which is not an admissible value of $p$. Hence an MLE does not exist.
Example 8. (Oliver [78]). This example illustrates a distribution for which an MLE is necessarily an actual observation, but not necessarily any particular observation. Let $X_1,X_2,\ldots,X_n$ be a sample from the PDF
$$f_\theta(x)=\begin{cases}\dfrac{2}{\alpha}\cdot\dfrac{x}{\theta}, & 0\le x\le\theta,\\[1ex] \dfrac{2}{\alpha}\cdot\dfrac{\alpha-x}{\alpha-\theta}, & \theta\le x\le\alpha,\\[1ex] 0, & \text{otherwise,}\end{cases}$$
where $\alpha>0$ is a (known) constant. The likelihood function is
$$L(\theta;x_1,x_2,\ldots,x_n)=\left(\frac{2}{\alpha}\right)^n\prod_{x_i\le\theta}\frac{x_i}{\theta}\prod_{x_i>\theta}\frac{\alpha-x_i}{\alpha-\theta},$$
where we have assumed that the observations are arranged in increasing order of magnitude, $0\le x_1<x_2<\cdots<x_n\le\alpha$. Clearly $L$ is continuous in $\theta$ (even for $\theta=$ some $x_i$) and differentiable for values of $\theta$ between any two $x_i$'s. Thus, for $x_j<\theta<x_{j+1}$, we have
$$L(\theta)=\left(\frac{2}{\alpha}\right)^n\theta^{-j}(\alpha-\theta)^{-(n-j)}\prod_{i=1}^jx_i\prod_{i=j+1}^n(\alpha-x_i),$$
$$\frac{\partial\log L}{\partial\theta}=-\frac{j}{\theta}+\frac{n-j}{\alpha-\theta},\quad\text{and}\quad\frac{\partial^2\log L}{\partial\theta^2}=\frac{j}{\theta^2}+\frac{n-j}{(\alpha-\theta)^2}>0.$$
It follows that any stationary value that exists must be a minimum, so that there can be no maximum in any range $x_j<\theta<x_{j+1}$. Moreover, there can be no maximum in $0\le\theta<x_1$ or $x_n<\theta\le\alpha$. This follows since, for $0\le\theta<x_1$,
$$L(\theta)=\left(\frac{2}{\alpha}\right)^n(\alpha-\theta)^{-n}\prod_{i=1}^n(\alpha-x_i)$$
is a strictly increasing function of $\theta$. By symmetry, $L(\theta)$ is a strictly decreasing function of $\theta$ in $x_n<\theta\le\alpha$. We conclude that an MLE has to be one of the observations.
In particular, let $\alpha=5$ and $n=3$, and suppose that the observations, arranged in increasing order of magnitude, are 1, 2, 4. In this case the MLE can be shown to be $\hat\theta=1$, which corresponds to the first order statistic. If the sample values are 2, 3, 4, the third order statistic is the MLE.
Example 9. Let $X_1,X_2,\ldots,X_n$ be a sample from $G(r,1/\beta)$; $\beta>0$ and $r>0$ are both unknown. The likelihood function is
$$L(\beta,r;x_1,x_2,\ldots,x_n)=\begin{cases}\dfrac{\beta^{nr}}{\{\Gamma(r)\}^n}\displaystyle\prod_{i=1}^nx_i^{r-1}\exp\Bigl(-\beta\sum_{i=1}^nx_i\Bigr), & x_i\ge 0,\\ 0, & \text{otherwise.}\end{cases}$$
Then
$$\log L(\beta,r)=nr\log\beta-n\log\Gamma(r)+(r-1)\sum_{i=1}^n\log x_i-\beta\sum_{i=1}^nx_i,$$
$$\frac{\partial\log L(\beta,r)}{\partial\beta}=\frac{nr}{\beta}-\sum_{i=1}^nx_i=0,\qquad \frac{\partial\log L(\beta,r)}{\partial r}=n\log\beta-n\frac{\Gamma'(r)}{\Gamma(r)}+\sum_{i=1}^n\log x_i=0.$$
The first of the likelihood equations yields $\hat\beta(x_1,x_2,\ldots,x_n)=\hat r/\bar x$, while the second gives
$$n\log\frac{r}{\bar x}+\sum_{i=1}^n\log x_i-n\frac{\Gamma'(r)}{\Gamma(r)}=0,$$
that is,
$$\log r-\frac{\Gamma'(r)}{\Gamma(r)}=\log\bar x-\frac{1}{n}\sum_{i=1}^n\log x_i,$$
which is to be solved for $\hat r$. In this case the likelihood equation is not easily solvable, and it is necessary to resort to numerical methods, using tables for $\Gamma'(r)/\Gamma(r)$.
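In practice the equation for $\hat r$ is solved numerically rather than from tables. The sketch below is my own illustration (it assumes SciPy's digamma function and brentq root finder are acceptable tools; the parameter values and seed are arbitrary): it solves $\log r-\Gamma'(r)/\Gamma(r)=\log\bar x-(1/n)\sum\log x_i$ for $\hat r$ and then sets $\hat\beta=\hat r/\bar x$.

```python
import numpy as np
from scipy.special import digamma
from scipy.optimize import brentq

rng = np.random.default_rng(7)
r_true, beta_true, n = 3.0, 2.0, 2_000

x = rng.gamma(shape=r_true, scale=1.0 / beta_true, size=n)
c = np.log(x.mean()) - np.log(x).mean()           # log xbar - (1/n) sum log x_i

# log r - digamma(r) is strictly decreasing in r, so a wide bracket is enough
r_hat = brentq(lambda r: np.log(r) - digamma(r) - c, 1e-6, 1e6)
beta_hat = r_hat / x.mean()

print(r_hat, beta_hat)                             # near (3, 2)
```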
Remark 2. We have seen that MLEs may not be unique, although frequently they are. Also, they are not necessarily unbiased even if a unique MLE exists. In terms of MSE, an MLE may be worthless. Moreover, MLEs may not even exist. We have also seen that MLEs are functions of sufficient statistics. This is a general result, which we now prove.
Theorem 1. Let $T$ be a sufficient statistic for the family of PDFs (PMFs) $\{f_\theta:\theta\in\Theta\}$. If a unique MLE of $\theta$ exists, then it is a (nonconstant) function of $T$. If an MLE of $\theta$ exists but is not unique, then one can find an MLE that is a function of $T$.
Proof. Since $T$ is sufficient, we can write
$$L(\theta)=f_\theta(x)=h(x)g_\theta(T(x)),$$
for all $x$, all $\theta$, and some $h$ and $g_\theta$. If a unique MLE $\hat\theta$ exists that maximizes $L(\theta)$, it also maximizes $g_\theta(T(x))$ and hence $\hat\theta$ is a function of $T$. If an MLE of $\theta$ exists but is not unique, we choose a particular MLE $\hat\theta$ from the set of all MLEs which is a function of $T$.
Example 10. Let $X_1,X_2,\ldots,X_n$ be a random sample from $U[\theta-1,\theta+1]$, $\theta\in\mathbb R$. Then the likelihood function is given by
$$L(\theta;x)=\left(\frac12\right)^nI_{[\theta-1\le x_{(1)}\le x_{(n)}\le\theta+1]}(x).$$
We note that $T(X)=(X_{(1)},X_{(n)})$ is jointly sufficient for $\theta$, and any $\theta$ satisfying
$$\theta-1\le x_{(1)}\le x_{(n)}\le\theta+1,$$
or, equivalently,
$$x_{(n)}-1\le\theta\le x_{(1)}+1,$$
maximizes the likelihood and hence is an MLE for $\theta$. Thus, for $0\le\alpha\le 1$,
$$\hat\theta_\alpha=\alpha(X_{(n)}-1)+(1-\alpha)(X_{(1)}+1)$$
is an MLE of $\theta$. If $\alpha$ is a constant independent of the $X$'s, then $\hat\theta_\alpha$ is a function of $T$. If, on the other hand, $\alpha$ depends on the $X$'s, then $\hat\theta_\alpha$ may not be a function of $T$ alone. For example,
$$\hat\theta_\alpha=(\sin^2X_1)(X_{(n)}-1)+(\cos^2X_1)(X_{(1)}+1)$$
is an MLE of $\theta$ but not a function of $T$ alone.

Theorem 2. Suppose that the regularity conditions of the FCR inequality are satisfied and $\theta$ belongs to an open interval on the real line. If an estimator $\hat\theta$ of $\theta$ attains the FCR lower bound for the variance, then the likelihood equation has a unique solution $\hat\theta$ that maximizes the likelihood.
Proof. If $\hat\theta$ attains the FCR lower bound, we have [see (8.5.8)]
$$\frac{\partial\log f_\theta(X)}{\partial\theta}=[k(\theta)]^{-1}[\hat\theta(X)-\theta]$$
with probability 1, and the likelihood equation has the unique solution $\theta=\hat\theta$.
Let us write $A(\theta)=[k(\theta)]^{-1}$. Then
$$\frac{\partial^2\log f_\theta(X)}{\partial\theta^2}=A'(\theta)(\hat\theta-\theta)-A(\theta),$$
so that
$$\left.\frac{\partial^2\log f_\theta(X)}{\partial\theta^2}\right|_{\theta=\hat\theta}=-A(\hat\theta).$$
We need only show that $A(\theta)>0$.
Recall from (8.5.4) with $\psi(\theta)=\theta$ that
$$E_\theta\left\{[T(X)-\theta]\frac{\partial\log f_\theta(X)}{\partial\theta}\right\}=1,$$
and substituting $T(X)-\theta=k(\theta)\frac{\partial\log f_\theta(X)}{\partial\theta}$ we get
$$k(\theta)E_\theta\left[\frac{\partial\log f_\theta(X)}{\partial\theta}\right]^2=1.$$
That is,
$$A(\theta)=E_\theta\left[\frac{\partial\log f_\theta(X)}{\partial\theta}\right]^2>0,$$
and the proof is complete.
Remark 3. In Theorem 2 we assumed the differentiability of $A(\theta)$ and the existence of the second-order partial derivative $\partial^2\log f_\theta/\partial\theta^2$. If the conditions of Theorem 2 are satisfied, the most efficient estimator is necessarily the MLE. It does not follow, however, that every MLE is most efficient. For example, in sampling from a normal population, $\hat\sigma^2=\sum_1^n(X_i-\bar X)^2/n$ is the MLE of $\sigma^2$, but it is not most efficient. Since $\sum(X_i-\bar X)^2/\sigma^2$ is $\chi^2(n-1)$, we see that $\mathrm{var}(\hat\sigma^2)=2(n-1)\sigma^4/n^2$, which is not equal to the FCR lower bound, $2\sigma^4/n$. Note that $\hat\sigma^2$ is not even an unbiased estimator of $\sigma^2$.
We next consider an important property of MLEs that is not shared by other methods of estimation. Often the parameter of interest is not $\theta$ but some function $h(\theta)$. If $\hat\theta$ is the MLE of $\theta$, what is the MLE of $h(\theta)$? If $\lambda=h(\theta)$ is a one-to-one function of $\theta$, then the inverse function $h^{-1}(\lambda)=\theta$ is well defined and we can write the likelihood function as a function of $\lambda$. We have
$$L^*(\lambda;x)=L(h^{-1}(\lambda);x),$$
so that
$$\sup_\lambda L^*(\lambda;x)=\sup_\lambda L(h^{-1}(\lambda);x)=\sup_\theta L(\theta;x).$$
It follows that the supremum of $L^*$ is achieved at $\lambda=h(\hat\theta)$. Thus $h(\hat\theta)$ is the MLE of $h(\theta)$.
In many applications $\lambda=h(\theta)$ is not one-to-one. It is still tempting to take $\hat\lambda=h(\hat\theta)$ as the MLE of $\lambda$. The following result provides a justification.
Theorem 3 (Zehna [122]). Let $\{f_\theta:\theta\in\Theta\}$ be a family of PDFs (PMFs), and let $L(\theta)$ be the likelihood function. Suppose that $\Theta\subseteq\mathbb R^k$, $k\ge 1$. Let $h:\Theta\to\Lambda$ be a mapping of $\Theta$ onto $\Lambda$, where $\Lambda$ is an interval in $\mathbb R^p$ ($1\le p\le k$). If $\hat\theta$ is an MLE of $\theta$, then $h(\hat\theta)$ is an MLE of $h(\theta)$.
Proof. For each $\lambda\in\Lambda$, let us define
$$\Theta_\lambda=\{\theta:\theta\in\Theta,\ h(\theta)=\lambda\}\quad\text{and}\quad M(\lambda;x)=\sup_{\theta\in\Theta_\lambda}L(\theta;x).$$
Then $M$, defined on $\Lambda$, is called the likelihood function induced by $h$. If $\hat\theta$ is any MLE of $\theta$, then $\hat\theta$ belongs to one and only one set $\Theta_{\hat\lambda}$. Since $\hat\theta\in\Theta_{\hat\lambda}$, $\hat\lambda=h(\hat\theta)$. Now
$$M(\hat\lambda;x)=\sup_{\theta\in\Theta_{\hat\lambda}}L(\theta;x)\ge L(\hat\theta;x)$$
and $\hat\lambda$ maximizes $M$, since
$$M(\hat\lambda;x)\le\sup_{\lambda\in\Lambda}M(\lambda;x)=\sup_{\theta\in\Theta}L(\theta;x)=L(\hat\theta;x),$$
so that $M(\hat\lambda;x)=\sup_{\lambda\in\Lambda}M(\lambda;x)$. It follows that $\hat\lambda$ is an MLE of $h(\theta)$, where $\hat\lambda=h(\hat\theta)$.
Example 11. Let $X\sim b(1,p)$, $0\le p\le 1$, and let $h(p)=\mathrm{var}(X)=p(1-p)$. We wish to find the MLE of $h(p)$. Note that $\Lambda=[0,\tfrac14]$. The function $h$ is not one-to-one. The MLE of $p$ based on a sample of size $n$ is $\hat p(X_1,\ldots,X_n)=\bar X$. Hence the MLE of the parameter $h(p)$ is $h(\bar X)=\bar X(1-\bar X)$.
Example 12. Consider a random sample from $G(1,\beta)$. It is required to find the MLE of $\beta$ in the following manner. A sample of size $n$ is taken, and it is known only that $k$, $0\le k\le n$, of these observations are $\le M$, where $M$ is a fixed positive number.
Let $p=P\{X_i\le M\}=1-e^{-M/\beta}$, so that $-M/\beta=\log(1-p)$ and $\beta=M/\log[1/(1-p)]$. Therefore, the MLE of $\beta$ is $M/\log[1/(1-\hat p)]$, where $\hat p$ is the MLE of $p$. To compute the MLE of $p$ we have
$$L(p;x_1,x_2,\ldots,x_n)=p^k(1-p)^{n-k},$$
so that the MLE of $p$ is $\hat p=k/n$. Thus the MLE of $\beta$ is
$$\hat\beta=\frac{M}{\log[n/(n-k)]}.$$
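The invariance argument in Example 12 translates directly into code. The sketch below is my own illustration (the values of $\beta$, $M$, $n$, and the seed are arbitrary): it simulates exponential lifetimes, records only the count $k$ of observations not exceeding $M$, and forms $\hat\beta=M/\log[n/(n-k)]$.

```python
import numpy as np

rng = np.random.default_rng(8)
beta, M, n = 4.0, 3.0, 10_000

x = rng.exponential(scale=beta, size=n)   # G(1, beta) lifetimes
k = int((x <= M).sum())                   # only this count is observed

p_hat = k / n                              # MLE of p = P{X <= M}
beta_hat = M / np.log(n / (n - k))         # MLE of beta by invariance
print(beta_hat)                            # close to 4
```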
Finally we consider some important large-sample properties of MLEs. In the following we assume that $\{f_\theta,\theta\in\Theta\}$ is a family of PDFs (PMFs), where $\Theta$ is an open interval on $\mathbb R$. The conditions listed below are stated for the case where $f_\theta$ is a PDF. Modifications for the case where $f_\theta$ is a PMF are obvious and will be left to the reader.
(i) $\partial\log f_\theta/\partial\theta$, $\partial^2\log f_\theta/\partial\theta^2$, $\partial^3\log f_\theta/\partial\theta^3$ exist for all $\theta\in\Theta$ and every $x$. Also,
$$\int_{-\infty}^{\infty}\frac{\partial f_\theta(x)}{\partial\theta}\,dx=E_\theta\frac{\partial\log f_\theta(X)}{\partial\theta}=0\quad\text{for all }\theta\in\Theta.$$
(ii) $\displaystyle\int_{-\infty}^{\infty}\frac{\partial^2f_\theta(x)}{\partial\theta^2}\,dx=0$ for all $\theta\in\Theta$.
(iii) $\displaystyle-\infty<\int_{-\infty}^{\infty}\frac{\partial^2\log f_\theta(x)}{\partial\theta^2}f_\theta(x)\,dx<0$ for all $\theta$.
(iv) There exists a function $H(x)$ such that for all $\theta\in\Theta$
$$\left|\frac{\partial^3\log f_\theta(x)}{\partial\theta^3}\right|<H(x)\quad\text{and}\quad\int_{-\infty}^{\infty}H(x)f_\theta(x)\,dx=M(\theta)<\infty.$$
(v) There exists a function $g(\theta)$ that is positive and twice differentiable for every $\theta\in\Theta$, and a function $H(x)$, such that for all $\theta$
$$\left|\frac{\partial^2}{\partial\theta^2}\left(g(\theta)\frac{\partial\log f_\theta}{\partial\theta}\right)\right|<H(x)\quad\text{and}\quad\int_{-\infty}^{\infty}H(x)f_\theta(x)\,dx<\infty.$$
Note that condition (v) is equivalent to condition (iv) with the added qualification that $g(\theta)=1$.
We state the following results without proof.
Theorem 4 (Cramér [17]).
(a) Conditions (i), (iii), and (iv) imply that, with probability approaching 1 as $n\to\infty$, the likelihood equation has a consistent solution.
(b) Conditions (i) through (iv) imply that a consistent solution $\hat\theta_n$ of the likelihood equation is asymptotically normal, that is,
$$\sigma^{-1}\sqrt n(\hat\theta_n-\theta)\xrightarrow{L}Z,$$
where $Z$ is $N(0,1)$ and
$$\sigma^2=\left\{E_\theta\left[\frac{\partial\log f_\theta(X)}{\partial\theta}\right]^2\right\}^{-1}.$$
On occasion one encounters examples where the conditions of Theorem 4 are not satisfied and yet a solution of the likelihood equation is consistent and asymptotically normal.
Example 13 (Kulldorf [57]). Let $X\sim N(0,\theta)$, $\theta>0$. Let $X_1,X_2,\ldots,X_n$ be $n$ independent observations on $X$. The solution of the likelihood equation is $\hat\theta_n=\sum_{i=1}^nX_i^2/n$. Also, $EX^2=\theta$, $\mathrm{var}(X^2)=2\theta^2$, and
$$E_\theta\left[\frac{\partial\log f_\theta(X)}{\partial\theta}\right]^2=\frac{1}{2\theta^2}.$$
We note that $\hat\theta_n\xrightarrow{a.s.}\theta$ and
$$\sqrt n(\hat\theta_n-\theta)=\theta\sqrt2\,\frac{\sum_1^nX_i^2-n\theta}{\sqrt{2n}\,\theta}\xrightarrow{L}N(0,2\theta^2).$$
However,
$$\frac{\partial^3\log f_\theta}{\partial\theta^3}=-\frac{1}{\theta^3}+\frac{3x^2}{\theta^4}\to\infty\quad\text{as }\theta\to 0$$
and is not bounded in $0<\theta<\infty$. Thus condition (iv) does not hold.
The following theorem covers such cases also.
Theorem 5 (Kulldorf [57]).
(a) Conditions (i), (iii), and (v) imply that, with probability approaching 1 as $n\to\infty$, the likelihood equation has a solution.
(b) Conditions (i), (ii), (iii), and (v) imply that a consistent solution of the likelihood equation is asymptotically normal.
Proof of Theorems 4 and 5. For proofs we refer to Cramér [17, p. 500] and Kulldorf [57].
Remark 4. It is important to note that the results in Theorems 4 and 5 establish the consistency of some root of the likelihood equation but not necessarily that of the MLE when the likelihood equation has several roots. Huzurbazar [47] has shown that under certain conditions the likelihood equation has at most one consistent solution and that the likelihood function has a relative maximum for such a solution. Since there may be several solutions for which the likelihood function has relative maxima, Cramér's and Huzurbazar's results still do not imply that a solution of the likelihood equation that makes the likelihood function an absolute maximum is necessarily consistent.
Wald [115] has shown that under certain conditions the MLE is strongly consistent. It is important to note that Wald does not make any differentiability assumptions.
In any event, if the MLE is a unique solution of the likelihood equation, we can use Theorems 4 and 5 to conclude that it is consistent and asymptotically normal. Note that the asymptotic variance is the same as the lower bound of the FCR inequality.
Example 14. Consider $X_1,X_2,\ldots,X_n$ iid $P(\lambda)$ RVs, $\lambda\in\Theta=(0,\infty)$. The likelihood equation has a unique solution, $\hat\lambda(x_1,\ldots,x_n)=\bar x$, which maximizes the likelihood function. We leave the reader to check that the conditions of Theorem 4 hold and that the MLE $\bar X$ is consistent and asymptotically normal with mean $\lambda$ and variance $\lambda/n$, a result that is immediate otherwise.
We leave the reader to check that in Example 13 the conditions of Theorem 5 are satisfied.
Remark 5. The invariance and the large-sample properties of MLEs permit us to find MLEs of parametric functions and their limiting distributions. The delta method introduced in Section 7.5 (Theorem 1) comes in handy in these applications. Suppose in Example 13 we wish to estimate $\psi(\theta)=\theta^2$. By invariance of MLEs, the MLE of $\psi(\theta)$ is $\psi(\hat\theta_n)$, where $\hat\theta_n=\sum_1^nX_i^2/n$ is the MLE of $\theta$. Applying Theorem 7.5.1 we see that $\psi(\hat\theta_n)$ is $AN(\theta^2,8\theta^4/n)$.
In Example 14, suppose we wish to estimate $\psi(\lambda)=P_\lambda(X=0)=e^{-\lambda}$. Then $\psi(\hat\lambda)=e^{-\bar X}$ is the MLE of $\psi(\lambda)$ and, in view of Theorem 7.5.1, $\psi(\hat\lambda)\sim AN(e^{-\lambda},\lambda e^{-2\lambda}/n)$.
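The delta-method approximation in Remark 5 can be checked by simulation. The sketch below is my own illustration for the Poisson case of Example 14 (sample size, $\lambda$, and seed are arbitrary choices): it compares the empirical variance of $e^{-\bar X}$ with $\lambda e^{-2\lambda}/n$.

```python
import numpy as np

rng = np.random.default_rng(9)
lam, n, reps = 2.0, 50, 200_000

xbar = rng.poisson(lam, size=(reps, n)).mean(axis=1)
psi_hat = np.exp(-xbar)                      # MLE of exp(-lambda) by invariance

delta_var = lam * np.exp(-2 * lam) / n       # asymptotic variance from Remark 5
print(psi_hat.var(), delta_var)              # close for moderate n
```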
Remark 6. Neither Theorem 4 nor Theorem 5 guarantees asymptotic normality for a unique MLE. Consider, for example, a random sample from $U(0,\theta]$. Then $X_{(n)}$ is the unique MLE for $\theta$, and in Problem 8.2.5 we asked the reader to show that $n(\theta-X_{(n)})\xrightarrow{L}G(1,\theta)$.
PROBLEMS 8.7
1. Let $X_1,X_2,\ldots,X_n$ be iid RVs with common PMF (PDF) $f_\theta(x)$. Find an MLE for $\theta$ in each of the following cases:
(a) $f_\theta(x)=\frac12e^{-|x-\theta|}$, $-\infty<x<\infty$.
(b) $f_\theta(x)=e^{-x+\theta}$, $\theta\le x<\infty$.
(c) $f_\theta(x)=(\theta\alpha)x^{\alpha-1}e^{-\theta x^\alpha}$, $x>0$, and $\alpha$ known.
(d) $f_\theta(x)=\theta(1-x)^{\theta-1}$, $0\le x\le 1$, $\theta>1$.
2. Find an MLE, if it exists, in each of the following cases:
(a) $X\sim b(n,\theta)$: both $n$ and $\theta\in[0,1]$ are unknown, and one observation is available.
(b) $X_1,X_2,\ldots,X_n\sim b(1,\theta)$, $\theta\in[\tfrac12,\tfrac34]$.
(c) $X_1,X_2,\ldots,X_n\sim N(\theta,\theta^2)$, $\theta\in\mathbb R$.
(d) $X_1,X_2,\ldots,X_n$ is a sample from
$$P\{X=y_1\}=\frac{1-\theta}{2},\quad P\{X=y_2\}=\frac12,\quad P\{X=y_3\}=\frac{\theta}{2}\quad(0<\theta<1).$$
(e) $X_1,X_2,\ldots,X_n\sim N(\theta,\theta)$, $0<\theta<\infty$.
(f) $X\sim C(\theta,0)$.
3. Suppose that $n$ observations are taken on an RV $X$ with distribution $N(\mu,1)$, but instead of recording all the observations one notes only whether or not each observation is less than 0. If $\{X<0\}$ occurs $m$ ($<n$) times, find the MLE of $\mu$.
4. Let $X_1,X_2,\ldots,X_n$ be a random sample from the PDF
$$f(x;\alpha,\beta)=\beta^{-1}e^{-\beta^{-1}(x-\alpha)},\quad \alpha<x<\infty,\ -\infty<\alpha<\infty,\ \beta>0.$$
(a) Find the MLE of $(\alpha,\beta)$.
(b) Find the MLE of $P_{\alpha,\beta}\{X_1\ge 1\}$.
5. Let $X_1,X_2,\ldots,X_n$ be a sample from the exponential density $f_\theta(x)=\theta e^{-\theta x}$, $x\ge 0$, $\theta>0$. Find the MLE of $\theta$, and show that it is consistent and asymptotically normal.
6. For Problem 8.6.5 find the MLE for $(\mu,\sigma^2)$.
7. For a sample of size 1 taken from $N(\mu,\sigma^2)$, show that no MLE of $(\mu,\sigma^2)$ exists.
8. For Problem 8.6.5 suppose that we wish to estimate $N$ on the basis of observations $X_1,X_2,\ldots,X_M$:
(a) Find the UMVUE of $N$.
(b) Find the MLE of $N$.
(c) Compare the MSEs of the UMVUE and the MLE.
9. Let $X_{ij}$ ($i=1,2,\ldots,s$; $j=1,2,\ldots,n$) be independent RVs where $X_{ij}\sim N(\mu_i,\sigma^2)$, $i=1,2,\ldots,s$. Find MLEs for $\mu_1,\mu_2,\ldots,\mu_s$ and $\sigma^2$. Show that the MLE for $\sigma^2$ is not consistent as $s\to\infty$ ($n$ fixed) (Neyman and Scott [77]).
10. Let $(X,Y)$ have a bivariate normal distribution with parameters $\mu_1,\mu_2,\sigma_1^2,\sigma_2^2$, and $\rho$. Suppose that $n$ observations are made on the pair $(X,Y)$, and $N-n$ observations on $X$; that is, $N-n$ observations on $Y$ are missing. Find the MLEs of $\mu_1,\mu_2,\sigma_1^2,\sigma_2^2$, and $\rho$ (Anderson [2]).
[Hint: If $f(x,y;\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho)$ is the joint PDF of $(X,Y)$, write
$$f(x,y;\mu_1,\mu_2,\sigma_1^2,\sigma_2^2,\rho)=f_1(x;\mu_1,\sigma_1^2)\,f_{Y|X}\bigl(y\mid\beta_x,\sigma_2^2(1-\rho^2)\bigr),$$
where $f_1$ is the marginal (normal) PDF of $X$, and $f_{Y|X}$ is the conditional (normal) PDF of $Y$, given $x$, with mean
$$\beta_x=\left(\mu_2-\rho\frac{\sigma_2}{\sigma_1}\mu_1\right)+\rho\frac{\sigma_2}{\sigma_1}x$$
and variance $\sigma_2^2(1-\rho^2)$. Maximize the likelihood function first with respect to $\mu_1$ and $\sigma_1^2$ and then with respect to $\mu_2-\rho(\sigma_2/\sigma_1)\mu_1$, $\rho\sigma_2/\sigma_1$, and $\sigma_2^2(1-\rho^2)$.]
11. In Problem 5, let $\hat\theta$ denote the MLE of $\theta$. Find the MLE of $\mu=EX_1=1/\theta$ and its asymptotic distribution.
12. In Problem 1(d), find the asymptotic distribution of the MLE of $\theta$.
13. In Problem 2(a), find the MLE of $d(\theta)=\theta^2$ and its asymptotic distribution.
14. Let $X_1,X_2,\ldots,X_n$ be a random sample from some DF $F$ on the real line. Suppose we observe $x_1,x_2,\ldots,x_n$, which are all different. Show that the MLE of $F$ is $F_n^*$, the empirical DF of the sample.
15. Let $X_1,X_2,\ldots,X_n$ be iid $N(\mu,1)$. Suppose $\Theta=\{\mu\ge 0\}$. Find the MLE of $\mu$.
16. Let $(X_1,X_2,\ldots,X_{k-1})$ have a multinomial distribution with parameters $n,p_1,\ldots,p_{k-1}$, $0\le p_1,p_2,\ldots,p_{k-1}\le 1$, $\sum_1^{k-1}p_j\le 1$, where $n$ is known. Find the MLE of $(p_1,p_2,\ldots,p_{k-1})$.
17. Consider the one-parameter exponential density introduced in Section 5.5 in its natural form with PDF
$$f_\theta(x)=\exp\{\eta T(x)+D(\eta)+S(x)\}.$$
(a) Show that the MGF of $T(X)$ is given by
$$M(t)=\exp\{D(\eta)-D(\eta+t)\}$$
for $t$ in some neighborhood of the origin. Moreover, $E_\eta T(X)=-D'(\eta)$ and $\mathrm{var}(T(X))=-D''(\eta)$.
(b) If the equation $E_\eta T(X)=T(x)$ has a solution, it must be the unique MLE of $\eta$.
18. In Problem 1(b) show that the unique MLE of $\theta$ is consistent. Is it asymptotically normal?
8.8 BAYES AND MINIMAX ESTIMATION
In this section we consider the problem of point estimation in a decision-theoretic setting.
We will consider here Bayes and minimax estimation.
Let{f
θ:θ∈Θ}be a family of PDFs (PMFs) andX 1,X2,...,X nbe a sample from this
distribution. Once the sample point(x
1,x2,...,x n)is observed, the statistician takes an
actionon the basis of these data. Let us denote byAthe set of allactionsordecisions
open to the statistician.
Definition 1.A decision functionδis a statistic that takes values inA, that is,δis a
Borel-measurable function that mapsR
nintoA.
IfX=xis observed, the statistician takes actionδ(X)∈A.
Example 1.LetA={a
1,a2}. Then any decision functionδpartitions the space of values
of(X
1,...,X n), namely,R n,intoasetC and its complementC
c
, such that ifx∈Cwe
take actiona
1, and ifx∈C
c
actiona 2is taken. This is the problem of testing hypotheses,
which we will discuss in Chapter 9.
Example 2.LetA=Θ. In this case we face the problem of estimation.

Another element of decision theory is the specification of a loss function, which measures the loss incurred when we take a decision.
Definition 2. Let A be an arbitrary space of actions. A nonnegative function L that maps Θ × A into R is called a loss function.
The value L(θ,a) is the loss to the statistician if he takes action a when θ is the true parameter value. If we use the decision function δ(X) and loss function L and θ is the true parameter value, then the loss is the RV L(θ, δ(X)). (As always, we will assume that L is a Borel-measurable function.)
Definition 3. Let D be a class of decision functions that map R^n into A, and let L be a loss function on Θ × A. The function R defined on Θ × D by
R(θ,δ) = E_θ L(θ, δ(X))   (1)
is known as the risk function associated with δ at θ.
Example 3. Let A = Θ ⊆ R, L(θ,a) = |θ − a|². Then
R(θ,δ) = E_θ L(θ,δ(X)) = E_θ{δ(X) − θ}²,
which is just the MSE. If we restrict attention to estimators that are unbiased, the risk is just the variance of the estimator.
The basic problem of decision theory is the following: Given a space of actions A and a loss function L(θ,a), find a decision function δ in D such that the risk R(θ,δ) is "minimum" in some sense for all θ ∈ Θ. We first need to specify some criterion for comparing the decision functions δ.
Definition 4. The principle of minimax is to choose δ* ∈ D so that
max_θ R(θ,δ*) ≤ max_θ R(θ,δ)   (2)
for all δ in D. Such a rule δ*, if it exists, is called a minimax (decision) rule.
If the problem is one of estimation, that is, if A = Θ, we call a δ* satisfying (2) a minimax estimator of θ.
Example 4. Let X ∼ b(1,p), p ∈ Θ = {1/4, 1/2}, and A = {a_1, a_2}. Let the loss function be defined as follows:

              a_1    a_2
p_1 = 1/4      1      4
p_2 = 1/2      3      2

The set of decision rules includes four functions δ_1, δ_2, δ_3, δ_4, defined by δ_1(0) = δ_1(1) = a_1; δ_2(0) = a_1, δ_2(1) = a_2; δ_3(0) = a_2, δ_3(1) = a_1; and δ_4(0) = δ_4(1) = a_2. The risk function takes the following values:

i    R(p_1, δ_i)    R(p_2, δ_i)    max_{p_1,p_2} R(p, δ_i)    min_i max_{p_1,p_2} R(p, δ_i)
1        1              3                3
2        7/4            5/2              5/2                          5/2
3        13/4           5/2              13/4
4        4              2                4

Thus the minimax solution is δ_2(x) = a_1 if x = 0 and = a_2 if x = 1.
The computation of minimax estimators is facilitated by the use of theBayes estimation
method. So far, we have consideredθas a fixed constant andf
θ(x)has represented the PDF
(PMF) of the RVX. In Bayesian estimation we treatθas a random variable distributed
according to PDF (PMF)π(θ)onΘ.Also,π is called thea priori distribution.Nowf (x|θ)
represents the conditional probability density (or mass) function of RVX, given thatθ∈Θ
is held fixed. Sinceπis the distribution ofθ, it follows that the joint density (PMF) ofθ
andXis given by
f(x,θ)=π(θ)f(x|θ). (3)
In this frameworkR(θ,δ)is the conditional average loss,E{L(θ,δ(X))|θ}, given thatθ
is held fixed. (Note that we are using the same symbol to denote the RVθand a value
assumed by it.)
Definition 5. The Bayes risk of a decision function δ is defined by
R(π,δ) = E_π R(θ,δ).   (4)
If θ is a continuous RV and X is of the continuous type, then
R(π,δ) = ∫ R(θ,δ) π(θ) dθ = ∫∫ L(θ,δ(x)) f(x|θ) π(θ) dx dθ = ∫∫ L(θ,δ(x)) f(x,θ) dx dθ.   (5)
If θ is discrete with PMF π and X is of the discrete type, then
R(π,δ) = Σ_θ Σ_x L(θ,δ(x)) f(x,θ).   (6)
Similar expressions may be written in the other two cases.

Definition 6. A decision function δ* is known as a Bayes rule (procedure) if it minimizes the Bayes risk, that is, if
R(π,δ*) = inf_δ R(π,δ).   (7)
Definition 7. The conditional distribution of the RV θ, given X = x, is called the a posteriori probability distribution of θ, given the sample.
Let the joint PDF (PMF) be expressed in the form
f(x,θ) = g(x) h(θ|x),   (8)
where g denotes the joint marginal density (PMF) of X. The a priori PDF (PMF) π(θ) gives the distribution of θ before the sample is taken, and the a posteriori PDF (PMF) h(θ|x) gives the distribution of θ after sampling. In terms of h(θ|x) we may write
R(π,δ) = ∫ g(x) [∫ L(θ,δ(x)) h(θ|x) dθ] dx   (9)
or
R(π,δ) = Σ_x g(x) [Σ_θ L(θ,δ(x)) h(θ|x)],   (10)
depending on whether f and π are both continuous or both discrete. Similar expressions may be written if only one of f and π is discrete.
Theorem 1. Consider the problem of estimation of a parameter θ ∈ Θ ⊆ R with respect to the quadratic loss function L(θ,δ) = (θ − δ)². A Bayes solution is given by
δ(x) = E{θ | X = x}   (11)
(δ(x) defined by (11) is called the Bayes estimator).
Proof. In the continuous case, if π is the prior PDF of θ, then
R(π,δ) = ∫ g(x) [∫ [θ − δ(x)]² h(θ|x) dθ] dx,
where g is the marginal PDF of X, and h is the conditional PDF of θ, given x. The Bayes rule is a function δ that minimizes R(π,δ). Minimization of R(π,δ) is the same as minimization of
∫ [θ − δ(x)]² h(θ|x) dθ,
which is minimum if and only if
δ(x) = E{θ|x}.
The proof for the remaining cases is similar.

Remark 1. The argument used in Theorem 1 shows that a Bayes estimator is one which minimizes E{L(θ,δ(X))|X}. Theorem 1 is a special case which says that if L(θ,δ(X)) = [θ − δ(X)]², the function
δ(x) = ∫ θ h(θ|x) dθ
is the Bayes estimator for θ with respect to π, the a priori distribution on Θ.
Remark 2. Suppose T(X) is sufficient for the parameter θ. Then it is easily seen that the posterior distribution of θ given x depends on x only through T, and it follows that the Bayes estimator of θ is a function of T.
Example 5. Let X ∼ b(n,p) and L(p,δ(x)) = [p − δ(x)]². Let π(p) = 1 for 0 < p < 1 be the a priori PDF of p. Then
h(p|x) = \binom{n}{x} p^x (1−p)^{n−x} / ∫_0^1 \binom{n}{x} p^x (1−p)^{n−x} dp.
It follows that
E{p|x} = ∫_0^1 p h(p|x) dp = (x+1)/(n+2).
Hence the Bayes estimator is
δ*(X) = (X+1)/(n+2).
The Bayes risk is
R(π,δ*) = ∫ π(p) Σ_{x=0}^{n} [δ*(x) − p]² f(x|p) dp
        = ∫_0^1 E{[(X+1)/(n+2) − p]² | p} dp
        = [1/(n+2)²] ∫_0^1 [np(1−p) + (1−2p)²] dp
        = 1/[6(n+2)].
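The identities in Example 5 are easy to confirm numerically. The sketch below is an editorial illustration (not part of the text) and assumes Python with NumPy: it computes the posterior mean by quadrature and the Bayes risk of δ*(X) = (X+1)/(n+2) by averaging its MSE over the uniform prior.

import numpy as np
from math import comb

n = 10
p_grid = np.linspace(0.0, 1.0, 20001)          # grid over the prior U(0,1)

# Posterior mean E{p | x}: should equal (x + 1)/(n + 2) for every x.
for x in range(n + 1):
    like = comb(n, x) * p_grid**x * (1 - p_grid)**(n - x)
    post_mean = np.trapz(p_grid * like, p_grid) / np.trapz(like, p_grid)
    assert abs(post_mean - (x + 1) / (n + 2)) < 1e-5

# Bayes risk of delta*(X) = (X + 1)/(n + 2): average the MSE over the prior.
def mse(p):
    return sum(comb(n, x) * p**x * (1 - p)**(n - x) * ((x + 1) / (n + 2) - p)**2
               for x in range(n + 1))

bayes_risk = np.trapz([mse(p) for p in p_grid], p_grid)
print(bayes_risk, 1 / (6 * (n + 2)))           # both approximately 0.01389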
Example 6. Let X_1, X_2, ..., X_n be iid N(μ,1) RVs, and let the a priori PDF of μ be N(0,1). Also, let L(μ,δ) = [μ − δ(X)]². Then
h(μ|x) = f(x,μ)/g(x) = π(μ)f(x|μ)/g(x),

where
g(x) = ∫ f(x,μ) dμ
     = (2π)^{−(n+1)/2} exp{−(1/2) Σ_1^n x_i²} ∫_{−∞}^{∞} exp{−[(n+1)/2][μ² − 2μ n x̄/(n+1)]} dμ
     = (n+1)^{−1/2} (2π)^{−n/2} exp{−(1/2) Σ x_i² + n² x̄²/[2(n+1)]}.
It follows that
h(μ|x) = [1/√(2π/(n+1))] exp{−[(n+1)/2][μ − n x̄/(n+1)]²},
and the Bayes estimator is
δ*(x) = E{μ|x} = n x̄/(n+1) = Σ_1^n x_i/(n+1).
The Bayes risk is
R(π,δ*) = ∫ π(μ) ∫ [δ*(x) − μ]² f(x|μ) dx dμ
        = ∫_{−∞}^{∞} E{[n X̄/(n+1) − μ]² | μ} π(μ) dμ
        = ∫_{−∞}^{∞} (n+1)^{−2}(n + μ²) π(μ) dμ
        = 1/(n+1).
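A short simulation makes the last computation concrete. This is an editorial sketch (not from the text), assuming Python with NumPy: draw μ from the N(0,1) prior, draw the sample, and average the squared error of the Bayes estimator Σx_i/(n+1).

import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 200_000
mu = rng.standard_normal(reps)                                  # mu ~ N(0,1)
x_sum = rng.normal(loc=mu[:, None], scale=1.0, size=(reps, n)).sum(axis=1)
bayes_est = x_sum / (n + 1)
print(np.mean((bayes_est - mu) ** 2), 1 / (n + 1))              # both approximately 0.1667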
The quadratic loss function used in Theorem 1 is but one example of a loss function in frequent use. Some of many other loss functions that may be used are
|θ − δ(X)|,   |θ − δ(X)|²/|θ|,   |θ − δ|⁴,   and   [|θ − δ(X)|/(|θ| + 1)]^{1/2}.
Example 7. Let X_1, X_2, ..., X_n be iid N(μ,σ²) RVs. It is required to find a Bayes estimator of μ of the form δ(x_1,...,x_n) = δ(x̄), where x̄ = Σ_1^n x_i/n, using the loss function L(μ,δ) = |μ − δ(x̄)|. From the argument used in the proof of Theorem 1 (or by Remark 1), the Bayes estimator is one that minimizes the integral ∫ |μ − δ(x̄)| h(μ|x̄) dμ. This will be the case if we choose δ to be the median of the conditional distribution (see Problem 3.2.5).
Let the a priori distribution of μ be N(θ,τ²). Since X̄ ∼ N(μ, σ²/n), we have
f(x̄,μ) = [√n/(2πστ)] exp{−(μ−θ)²/(2τ²) − n(x̄−μ)²/(2σ²)}.

Writing
(x̄ − μ)² = (x̄ − θ + θ − μ)² = (x̄ − θ)² − 2(x̄ − θ)(μ − θ) + (μ − θ)²,
we see that the exponent in f(x̄,μ) is
−(1/2){(μ − θ)²[1/τ² + n/σ²] − 2n(x̄ − θ)(μ − θ)/σ² + n(x̄ − θ)²/σ²}.
It follows that the joint PDF of μ and X̄ is bivariate normal with means θ, θ, variances τ², τ² + (σ²/n), and correlation coefficient τ/√[τ² + (σ²/n)]. The marginal of X̄ is N(θ, τ² + (σ²/n)), and the conditional distribution of μ, given X̄, is normal with mean
θ + {τ/√[τ² + (σ²/n)]}{τ/√[τ² + (σ²/n)]}(x̄ − θ) = [θ(σ²/n) + x̄τ²]/[τ² + (σ²/n)]
and variance
τ²{1 − τ²/[τ² + (σ²/n)]} = τ²(σ²/n)/[τ² + (σ²/n)]
(see the proof of Theorem 1). The Bayes estimator is therefore the median of this conditional distribution, and since the distribution is symmetric about the mean,
δ*(x̄) = [θ(σ²/n) + x̄τ²]/[τ² + (σ²/n)]
is the Bayes estimator of μ.
Clearly δ* is also the Bayes estimator under the quadratic loss function L(μ,δ) = [μ − δ(X)]².
Key to the derivation of the Bayes estimator is the a posteriori distribution h(θ|x). The derivation of h(θ|x), however, is a three-step process:
1. Find the joint distribution of X and θ, given by π(θ)f(x|θ).
2. Find the marginal distribution, with PDF (PMF) g(x), by integrating (summing) over θ ∈ Θ.
3. Divide the joint PDF (PMF) by g(x).
It is not always easy to go through these steps in practice. It may not be possible to obtain h(θ|x) in closed form.
Example 8. Let X ∼ N(μ,1) and let the prior PDF of μ be given by
π(μ) = e^{−(μ−θ)}/[1 + e^{−(μ−θ)}]²,

where θ is a location parameter. Then the joint PDF of X and μ is given by
f(x,μ) = [1/√(2π)] e^{−(x−μ)²/2} e^{−(μ−θ)}/[1 + e^{−(μ−θ)}]²,
so that the marginal PDF of X is
g(x) = [e^{θ}/√(2π)] ∫_{−∞}^{∞} e^{−(x−μ)²/2} e^{−μ}/[1 + e^{−(μ−θ)}]² dμ.
A closed form for g is not known.
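Even without a closed form for g, the three steps listed above can be carried out numerically. The following editorial sketch (not from the text; it assumes Python with NumPy, θ = 0, and a single observation x) computes the posterior on a grid.

import numpy as np

theta, x = 0.0, 1.3
mu = np.linspace(-10, 10, 20001)
prior = np.exp(-(mu - theta)) / (1 + np.exp(-(mu - theta)))**2
like = np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)

joint = prior * like                     # step 1: pi(mu) f(x|mu)
g = np.trapz(joint, mu)                  # step 2: marginal g(x) by quadrature
posterior = joint / g                    # step 3: h(mu|x)
print(np.trapz(mu * posterior, mu))      # posterior mean, the Bayes estimator under squared error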
To avoid problems of integration such as that in Example 8, statisticians use so-called conjugate prior distributions. Often there is a natural parametric family of distributions such that the posterior distributions also belong to the same family. These priors make the computations much easier.
Definition 8. Let X ∼ f(x|θ) and let π(θ) be the prior distribution on Θ. Then π is said to belong to a conjugate prior family if the corresponding posterior distribution h(θ|x) also belongs to the same family as π(θ).
Example 9. Consider Example 6, where π(μ) is N(0,1) and h(μ|x) is N(nx̄/(n+1), 1/(n+1)), so that both h and π belong to the same family. Hence N(0,1) is a conjugate prior for μ.
Example 10. Let X ∼ b(n,p), 0 < p < 1, and let π(p) be the beta PDF with parameters (α,β). Then
h(p|x) = p^{x+α−1}(1−p)^{n−x+β−1} / ∫_0^1 p^{x+α−1}(1−p)^{n−x+β−1} dp = p^{x+α−1}(1−p)^{n−x+β−1}/B(x+α, n−x+β),
which is again a beta density. Thus the family of beta distributions is a conjugate family of priors for p.
Conjugate priors are popular because whenever the prior family is parametric the posterior distributions are always computable, h(θ|x) being an updated parametric version of π(θ). One no longer needs to go through a computation of g, the marginal PDF (PMF) of X. Once h(θ|x) is known, g, if needed, is easily determined from
g(x) = π(θ)f(x|θ)/h(θ|x).
Thus in Example 10 we see easily that g(x) = \binom{n}{x} B(x+α, n−x+β)/B(α,β), while in Example 6, g is given by
g(x) = (n+1)^{−1/2}(2π)^{−n/2} exp{−(1/2) Σ_{i=1}^{n} x_i² + n²x̄²/[2(n+1)]}.

Conjugate priors are usually associated with a wide class of sampling distributions, namely, the exponential family of distributions.

Natural Conjugate Priors
Sampling PDF (PMF), f(x|θ)      Prior π(θ)     Posterior h(θ|x)
N(θ, σ²)                         N(μ, τ²)       N((σ²μ + xτ²)/(σ² + τ²), σ²τ²/(σ² + τ²))
G(ν, β)                          G(α, β)        G(α + ν, β + x)
b(n, p)                          B(α, β)        B(α + x, β + n − x)
P(λ)                             G(α, β)        G(α + x, β + 1)
NB(r; p)                         B(α, β)        B(α + r, β + x)
G(γ, 1/θ)                        G(α, β)        G(α + ν, β + x)
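Two of the rows in this table are illustrated by the editorial sketch below (not from the text; the helper functions and the single-observation convention are assumptions of the sketch, matching the parameterizations shown in the table).

def normal_update(x, sigma2, mu, tau2):
    """Posterior N(mean, var) for theta given one observation x ~ N(theta, sigma2)."""
    mean = (sigma2 * mu + x * tau2) / (sigma2 + tau2)
    var = sigma2 * tau2 / (sigma2 + tau2)
    return mean, var

def beta_binomial_update(x, n, alpha, beta):
    """Posterior Beta parameters for p given x successes in n Bernoulli trials."""
    return alpha + x, beta + n - x

print(normal_update(x=1.2, sigma2=1.0, mu=0.0, tau2=4.0))   # (0.96, 0.8)
print(beta_binomial_update(x=7, n=10, alpha=2, beta=2))     # (9, 5)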
Another easy way is to use a noninformative prior π(θ), though one then needs some integration to obtain g(x).
Definition 9. A PDF π(θ) is said to be a noninformative prior if it contains no information about θ, that is, the distribution does not favor any value of θ over others.
Example 11. Some simple examples of noninformative priors are π(θ) = 1, π(θ) = 1/θ, and π(θ) = √(I(θ)). These may quite often have infinite mass, so that the PDF is improper (that is, does not integrate to 1).
Calculation of h(θ|x) becomes easier, by-passing the calculation of g(x), when f(x|θ) is invariant under a group G of transformations, following Fraser's [33] structural theory.
Let G be a group of Borel-measurable functions on R^n onto itself. The group operation is composition, that is, if g_1 and g_2 are mappings from R^n onto R^n, g_2 g_1 is defined by g_2 g_1(x) = g_2(g_1(x)). Also, G is closed under composition and inverse, so that all maps in G are one-to-one. We define the group G of affine linear transformations g = {a,b} by
gx = a + bx,   a ∈ R, b > 0.
The inverse of {a,b} is
{a,b}⁻¹ = {−a/b, 1/b},
and the composition of {a,b} and {c,d} ∈ G is given by
{a,b}{c,d}(x) = {a,b}(c + dx) = a + b(c + dx) = (a + bc) + bdx = {a + bc, bd}(x).
In particular,
{a,b}{a,b}⁻¹ = {a,b}{−a/b, 1/b} = {0,1} = e.
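These composition and inverse rules are easy to exercise in code. The sketch below is an editorial illustration (not from the text) of the affine maps {a,b}: x ↦ a + bx, b > 0.

def compose(g1, g2):
    """{a,b}{c,d} = {a + b*c, b*d}."""
    (a, b), (c, d) = g1, g2
    return (a + b * c, b * d)

def inverse(g):
    """{a,b}^{-1} = {-a/b, 1/b}."""
    a, b = g
    return (-a / b, 1 / b)

def act(g, x):
    """{a,b} x = a + b*x."""
    a, b = g
    return a + b * x

g = (2.0, 3.0)
print(compose(g, inverse(g)))   # (0.0, 1.0), the identity e = {0, 1}
print(act(compose(g, (1.0, 4.0)), 0.5) == act(g, act((1.0, 4.0), 0.5)))  # True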

Example 12. Let X ∼ N(μ,1) and let G be the group of translations G = {{b,1}, −∞ < b < ∞}. Let X_1, ..., X_n be a sample from N(μ,1). Then we may write
X_i = {μ,1}Z_i,   i = 1, ..., n,
where Z_1, ..., Z_n are iid N(0,1). It is clear that Z̄ ∼ N(0, 1/n) with PDF
√(n/(2π)) exp{−(n/2) z̄²},
and there is a one-to-one correspondence between values of {z̄,1} and {μ,1} given by
{x̄,1} = {μ,1}{z̄,1} = {μ + z̄, 1}.
Thus x̄ = μ + z̄ with inverse map z̄ = x̄ − μ. We fix x̄ and consider the variation in z̄ as a function of μ. Changing the PDF element of z̄ to μ we get
√(n/(2π)) exp{−(n/2)(μ − x̄)²}
as the posterior of μ given x̄ with prior π(μ) = 1.
Example 13. Let X ∼ N(0,σ²) and consider the scale group G = {{0,c}, c > 0}. Let X_1, X_2, ..., X_n be iid N(0,σ²). Write
X_i = {0,σ}Z_i,   i = 1, 2, ..., n,
where the Z_i are iid N(0,1) RVs. Then the RV nS_z² = Σ_{i=1}^n Z_i² ∼ χ²(n) with PDF
[1/(2^{n/2} Γ(n/2))] exp{−ns_z²/2} (ns_z²)^{n/2−1}.
The values of {0, s_z} are in one-to-one correspondence with those of {0, σ} through
{0, s_x} = {0, σ}{0, s_z},
where nS_x² = Σ_{i=1}^n X_i², so that s_x = σ s_z. Considering the variation in s_z as a function of σ for fixed s_x we see that ds_z = (s_x/σ²) dσ. Changing the PDF element of s_z to σ we get the PDF of σ as
[1/(2^{n/2} Γ(n/2))] exp{−ns_x²/(2σ²)} (ns_x²/σ²)^{n/2−1},
which is the same as the posterior of σ given s_x with prior π(σ) = 1/σ.
Example 14. Let X_1, ..., X_n be a sample from N(μ,σ²) and consider the affine linear group G = {{a,b}, −∞ < a < ∞, b > 0}. Then
X_i = {μ,σ}Z_i,   i = 1, ..., n,

where the Z_i's are iid N(0,1). We know that the joint distribution of (Z̄, S_z²) is given by the PDF element
√(n/(2π)) exp{−nz̄²/2} dz̄ · [1/Γ((n−1)/2)] [(n−1)s_z²/2]^{(n−1)/2−1} exp{−(n−1)s_z²/2} d[(n−1)s_z²/2].
Further, the values of {z̄, s_z} are in one-to-one correspondence with the values of {μ, σ} through
{x̄, s_x} = {μ,σ}{z̄, s_z} = {μ + σz̄, σs_z}  ⟹  z̄ = (x̄ − μ)/σ  and  s_z = s_x/σ.
Consider the variation of (z̄, s_z) as a function of (μ,σ) for fixed (x̄, s_x). The Jacobian of the transformation from {z̄, s_z} to {μ,σ} is given by
J = |det [ −1/σ   −(x̄−μ)/σ² ; 0   −s_x/σ² ]| = s_x/σ³.
Hence, the joint PDF of (μ,σ), given (x̄, s_x), is
√(n/(2π)) exp{−n(μ−x̄)²/(2σ²)} · [1/Γ((n−1)/2)] [(n−1)s_x²/(2σ²)]^{(n−1)/2−1} exp{−(n−1)s_x²/(2σ²)} [(n−1)s_x²/σ³].
This is the PDF one obtains if π(μ) = 1 and π(σ) = 1/σ and μ and σ are independent RVs.
The following theorem provides a method for determining minimax estimators.
Theorem 2. Let {f_θ : θ ∈ Θ} be a family of PDFs (PMFs), and suppose that an estimator δ* of θ is a Bayes estimator corresponding to an a priori distribution π on Θ. If the risk function R(θ,δ*) is constant on Θ, then δ* is a minimax estimator for θ.
Proof. Since δ* is the Bayes estimator of θ with constant risk r* (free of θ), we have
r* = R(π,δ*) = ∫ R(θ,δ*) π(θ) dθ = inf_{δ∈D} ∫ R(θ,δ) π(θ) dθ ≤ inf_{δ∈D} sup_{θ∈Θ} R(θ,δ).

Similarly, since r* = R(θ,δ*) for all θ ∈ Θ, we have
r* = sup_{θ∈Θ} R(θ,δ*) ≥ inf_{δ∈D} sup_{θ∈Θ} R(θ,δ).
Together we then have
sup_{θ∈Θ} R(θ,δ*) = inf_{δ∈D} sup_{θ∈Θ} R(θ,δ),
which means δ* is minimax.
The following examples show how to obtain constant-risk estimators and the suitable prior distribution.
Example 15. (Hodges and Lehmann [43]). Let X ∼ b(n,p), 0 ≤ p ≤ 1. We seek a minimax estimator of p of the form αX + β, using the squared error loss function. We have
R(p,δ) = E_p{αX + β − p}² = E_p{α(X − np) + β + (αn − 1)p}²
       = [(αn−1)² − α²n]p² + [α²n + 2β(αn−1)]p + β²,
which is a quadratic in p. To find α and β such that R(p,δ) is constant for all p ∈ Θ, we set the coefficients of p² and p equal to 0 to get
(αn − 1)² − α²n = 0   and   α²n + 2β(αn − 1) = 0.
It follows that
α = 1/[√n(1+√n)]   or   1/[√n(√n−1)],
and
β = 1/[2(1+√n)]   or   −1/[2(√n−1)].
Since 0 ≤ p ≤ 1, we discard the second set of roots for both α and β, and then the estimator is of the form
δ*(X) = X/[√n(1+√n)] + 1/[2(1+√n)].
It remains to show that δ* is Bayes against some a priori PDF π.
Consider the natural conjugate a priori PDF
π(p) = [B(α′, β′)]⁻¹ p^{α′−1}(1−p)^{β′−1},   0 ≤ p ≤ 1,  α′, β′ > 0.
The a posteriori PDF of p, given x, is expressed by
h(p|x) = p^{x+α′−1}(1−p)^{n−x+β′−1}/B(x+α′, n−x+β′).

It follows that
E{p|x} = B(x+α′+1, n−x+β′)/B(x+α′, n−x+β′) = (x+α′)/(n+α′+β′),
which is the Bayes estimator for squared error loss. For this to be of the form δ*, we must have
1/[√n(1+√n)] = 1/(n+α′+β′)   and   1/[2(1+√n)] = α′/(n+α′+β′),
giving α′ = β′ = √n/2. It follows that the estimator δ*(X) is minimax with constant risk
R(p,δ*) = 1/[4(1+√n)²]   for all p ∈ [0,1].
Note that the UMVUE (which is also the MLE) is δ(X) = X/n with risk R(p,δ) = p(1−p)/n. Comparing the two risks (Figs. 1 and 2), we see that
p(1−p)/n ≤ 1/[4(1+√n)²]   if and only if   |p − 1/2| ≥ √(1 + 2√n)/[2(1+√n)],
so that R(p,δ*) < R(p,δ) in the interval (1/2 − a_n, 1/2 + a_n), where a_n → 0 as n → ∞. Moreover,
sup_p R(p,δ) / sup_p R(p,δ*) = (1/4n) / {1/[4(1+√n)²]} = (n + 2√n + 1)/n → 1   as n → ∞.
Clearly, we would prefer the minimax estimator if n is small and would prefer the UMVUE, because of its simplicity, if n is large.
Fig. 1 Comparison of R(p,δ) and R(p,δ*), n = 1.
Fig. 2 Comparison of R(p,δ) and R(p,δ*), n = 9.
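The comparison behind Figures 1 and 2 can be reproduced numerically. The sketch below is an editorial illustration (not from the text; it assumes Python with NumPy) for n = 9, where the constant minimax risk is 1/64.

import numpy as np

n = 9
p = np.linspace(0, 1, 1001)
risk_minimax = np.full_like(p, 1 / (4 * (1 + np.sqrt(n))**2))
risk_umvue = p * (1 - p) / n

a_n = np.sqrt(1 + 2 * np.sqrt(n)) / (2 * (1 + np.sqrt(n)))
print(1 / (4 * (1 + np.sqrt(n))**2))        # 1/64 for n = 9
print(0.5 - a_n, 0.5 + a_n)                 # interval where the minimax estimator wins
inside = np.abs(p - 0.5) < a_n
print(np.all(risk_minimax[inside] < risk_umvue[inside]))   # True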
Example 16. (Hodges and Lehmann [43]). A lot contains N elements, of which D are defective. A random sample of size n produces X defectives. We wish to estimate D. Clearly,
P_D{X = k} = \binom{D}{k}\binom{N−D}{n−k} / \binom{N}{n},
E_D X = nD/N,   and   σ_D² = nD(N−n)(N−D)/[N²(N−1)].
Proceeding as in Example 15, we find a linear function of X with constant risk. Indeed,
E_D(αX + β − D)² = β²   when
α = N/[n + √(n(N−n)/(N−1))]   and   β = (N/2)(1 − αn/N).

We show that αX + β is the Bayes estimator corresponding to the a priori PMF
P{D = d} = c ∫_0^1 \binom{N}{d} p^d (1−p)^{N−d} p^{a−1}(1−p)^{b−1} dp,
where a, b > 0 and c = Γ(a+b)/[Γ(a)Γ(b)]. First note that Σ_{d=0}^N P{D = d} = 1, so that
Σ_{d=0}^{N} \binom{N}{d} [Γ(a+b)/(Γ(a)Γ(b))] Γ(a+d)Γ(N+b−d)/Γ(N+a+b) = 1.
The Bayes estimator is given by
δ*(k) = Σ_{d=k}^{N−n+k} d \binom{d}{k}\binom{N−d}{n−k}\binom{N}{d} Γ(a+d)Γ(N+b−d) / Σ_{d=k}^{N−n+k} \binom{d}{k}\binom{N−d}{n−k}\binom{N}{d} Γ(a+d)Γ(N+b−d).
A little simplification, writing d = (d + a) − a and using
\binom{d}{k}\binom{N−d}{n−k}\binom{N}{d} = \binom{N−n}{d−k}\binom{N}{n}\binom{n}{k},
yields
δ*(k) = Σ_{i=0}^{N−n} \binom{N−n}{i} Γ(d+a+1)Γ(N+b−d) / Σ_{i=0}^{N−n} \binom{N−n}{i} Γ(d+a)Γ(N+b−d)  −  a
      = k(a+b+N)/(a+b+n) + a(N−n)/(a+b+n),
where d = i + k in the sums. Now putting
α = (a+b+N)/(a+b+n)   and   β = a(N−n)/(a+b+n)
and solving for a and b, we get
a = β/(α−1)   and   b = (N − αn − β)/(α−1).
For a > 0 we need β > 0, and for b > 0 we need N > αn + β. Moreover, α > 1 if N > n + 1. If N = n + 1, the result is obtained if we give D a binomial distribution with parameter p = 1/2. If N = n, the result is immediate.
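The constant-risk claim for αX + β is easy to check numerically. The following editorial sketch (not from the text) evaluates E_D(αX + β − D)² over the hypergeometric distribution for several D; every value equals β².

from math import comb, sqrt

N, n = 20, 5
alpha = N / (n + sqrt(n * (N - n) / (N - 1)))
beta = (N / 2) * (1 - alpha * n / N)

def risk(D):
    # E_D(alpha*X + beta - D)^2; math.comb returns 0 for impossible k, so the sum is safe.
    return sum(comb(D, k) * comb(N - D, n - k) / comb(N, n) *
               (alpha * k + beta - D) ** 2 for k in range(n + 1))

print({D: round(risk(D), 6) for D in range(0, N + 1, 5)})   # all entries equal beta**2
print(beta ** 2)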
The following theorem which is an extension of Theorem 2 is of considerable help to
prove minimaxity of various estimators.

Theorem 3. Let {π_k(θ); k ≥ 1} be a sequence of prior distributions on Θ and let {δ_k*} be the corresponding sequence of Bayes estimators with Bayes risks R(π_k; δ_k*). If
limsup_{k→∞} R(π_k; δ_k*) = r*
and there exists an estimator δ* for which
sup_{θ∈Θ} R(θ,δ*) ≤ r*,
then δ* is minimax.
Proof. Suppose δ* is not minimax. Then there exists an estimator δ̃ such that
sup_{θ∈Θ} R(θ,δ̃) < sup_{θ∈Θ} R(θ,δ*).
On the other hand, consider the Bayes estimators {δ_k*} corresponding to the priors {π_k(θ)}. We obtain
R(π_k, δ_k*) = ∫ R(θ,δ_k*) π_k(θ) dθ   (12)
             ≤ ∫ R(θ,δ̃) π_k(θ) dθ   (13)
             ≤ sup_{θ∈Θ} R(θ,δ̃),   (14)
which contradicts sup_{θ∈Θ} R(θ,δ*) ≤ r*. Hence δ* is minimax.
Example 17. Let X_1, ..., X_n be a sample of size n from N(μ,1). Then the MLE of μ is X̄ with variance 1/n. We show that X̄ is minimax. Let μ ∼ N(0,τ²). Then the Bayes estimator of μ is X̄nτ²/(1 + nτ²). The Bayes risk of this estimator is R(π,δ_{τ²}) = (1/n)·nτ²/(1 + nτ²). Now, as τ² → ∞,
R(π,δ_{τ²}) → 1/n,
which is the risk of X̄. Hence X̄ is minimax (by Theorem 3).
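The limiting behaviour is immediate to tabulate; the lines below are an editorial sketch (not from the text) of the Bayes risk τ²/(1 + nτ²) approaching 1/n.

n = 4
for tau2 in (1, 10, 100, 1000):
    print(tau2, tau2 / (1 + n * tau2))   # tends to 1/n = 0.25 as tau2 grows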
Definition 10. A decision rule δ is inadmissible if there exists a δ* ∈ D such that R(θ,δ*) ≤ R(θ,δ) for all θ ∈ Θ, with strict inequality for some θ ∈ Θ; otherwise δ is admissible.
Theorem 4. If X_1, ..., X_n is a sample from N(θ,1), then X̄ is an admissible estimator of θ under squared error loss L(θ,a) = (θ − a)².
Proof. Clearly, X̄ ∼ N(θ, 1/n). Suppose X̄ is not admissible; then there exists another rule δ*(x) such that R(θ,δ*) ≤ R(θ,X̄) for all θ, with strict inequality for some θ = θ_0 (say). Now, the risk R(θ,δ*) is a continuous function of θ, and hence there exists an ε > 0 such that R(θ,δ*) < R(θ,X̄) − ε for |θ − θ_0| < ε.
Now consider the prior N(0,τ²). Then the Bayes estimator is δ_{τ²}(X) = X̄[1 + 1/(nτ²)]⁻¹ with risk (1/n)·nτ²/(1 + nτ²). Thus,
R(π,X̄) − R(π,δ_{τ²}) = (1/n)·1/(1 + nτ²).

However,
τ[R(π,δ*) − R(π,X̄)] = τ ∫ [R(θ,δ*) − R(θ,X̄)] [1/(√(2π)τ)] exp{−θ²/(2τ²)} dθ
                     ≤ −(ε/√(2π)) ∫_{θ_0−ε}^{θ_0+ε} exp{−θ²/(2τ²)} dθ.
We get
0 ≤ τ[R(π,δ*) − R(π,X̄)] + τ[R(π,X̄) − R(π,δ_{τ²})]
  ≤ −(ε/√(2π)) ∫_{θ_0−ε}^{θ_0+ε} exp{−θ²/(2τ²)} dθ + (τ/n)·1/(1 + nτ²).
The right-hand side tends to −2ε²/√(2π) < 0 as τ → ∞, a contradiction. Hence X̄ is admissible under squared error loss.
Thus we have proved that X̄ is an admissible minimax estimator of the mean of a normal distribution N(θ,1).
PROBLEMS 8.8
1. It rains quite often in Bowling Green, Ohio. On a rainy day a teacher has essentially three choices: (1) to take an umbrella and face the possible prospect of carrying it around in the sunshine; (2) to leave the umbrella at home and perhaps get drenched; or (3) to just give up the lecture and stay at home. Let Θ = {θ_1, θ_2}, where θ_1 corresponds to rain and θ_2 to no rain. Let A = {a_1, a_2, a_3}, where a_i corresponds to choice i, i = 1, 2, 3. Suppose that the following table gives the losses for the decision problem:

        θ_1    θ_2
a_1      1      2
a_2      4      0
a_3      5      5

The teacher has to make a decision on the basis of a weather report that depends on θ as follows:

                θ_1     θ_2
W_1 (Rain)      0.7     0.2
W_2 (No rain)   0.3     0.8

Find the minimax rule to help the teacher reach a decision.

2. Let X_1, X_2, ..., X_n be a random sample from P(λ). For estimating λ, using the quadratic error loss function, an a priori distribution over Θ, given by the PDF
π(λ) = e^{−λ} if λ > 0,  = 0 otherwise,
is used:
(a) Find the Bayes estimator of λ.
(b) If it is required to estimate φ(λ) = e^{−λ} with the same loss function and same a priori PDF, find the Bayes estimator of φ(λ).
3. Let X_1, X_2, ..., X_n be a sample from b(1,θ). Consider the class of decision rules δ of the form δ(x_1, x_2, ..., x_n) = n⁻¹ Σ_{i=1}^n x_i + α, where α is a constant to be determined. Find α according to the minimax principle, using the loss function (θ − δ)², where δ is an estimator of θ.
4. Let δ* be a minimax estimator of ψ(θ) with respect to the squared error loss function. Show that aδ* + b (a, b constants) is a minimax estimator of aψ(θ) + b.
5. Let X ∼ b(n,θ), and suppose that the a priori PDF of θ is U(0,1). Find the Bayes estimator of θ, using the loss function L(θ,δ) = (θ − δ)²/[θ(1−θ)]. Find a minimax estimator of θ.
6. In Example 5 find the Bayes estimator of p².
7. Let X_1, X_2, ..., X_n be a random sample from G(1, 1/λ). To estimate λ, let the a priori PDF on λ be π(λ) = e^{−λ}, λ > 0, and let the loss function be squared error. Find the Bayes estimator of λ.
8. Let X_1, X_2, ..., X_n be iid U(0,θ) RVs. Suppose the prior distribution of θ is a Pareto PDF, π(θ) = αa^α/θ^{α+1} for θ ≥ a, = 0 for θ < a. Using the quadratic loss function, find the Bayes estimator of θ.
9. Let T be the unique Bayes estimator of θ with respect to the prior density π. Show that T is admissible.
10. Let X_1, X_2, ..., X_n be iid with PDF f_θ(x) = exp{−(x − θ)}, x > θ. Take π(θ) = e^{−θ}, θ > 0. Find the Bayes estimator of θ under quadratic loss.
11. For the PDF of Problem 10 consider the estimation of θ under quadratic loss. Consider the class of estimators a[X_(1) − 1/n] for all a > 0. Show that X_(1) − 1/n is minimax in this class.
8.9 PRINCIPLE OF EQUIVARIANCE
Let P = {P_θ : θ ∈ Θ} be a family of distributions of some RV X. Let X ⊆ R^n be the sample space of values of X. In Section 8.8 we saw that statistical decision theory revolves around the following four basic elements: the parameter space Θ, the action space A, the sample space X, and the loss function L(θ,a).
Let G be a group of transformations which map X onto itself. We say that P is invariant under G if for each g ∈ G and every θ ∈ Θ there is a unique θ′ = ḡθ ∈ Θ such that g(X) ∼ P_{ḡθ} whenever X ∼ P_θ. Accordingly,
P_θ{g(X) ∈ A} = P_{ḡθ}{X ∈ A}   (1)

for all Borel subsets A of R^n. We note that the invariance of P under G does not change the class of distributions we begin with; it only changes the parameter or index θ to ḡθ. The group G induces Ḡ, a group of transformations ḡ on Θ onto itself.
Example 1. Let X ∼ b(n,p), 0 ≤ p ≤ 1. Let G = {g, e}, where g(x) = n − x and e(x) = x. Then gg⁻¹ = e. Clearly, g(X) ∼ b(n, 1−p), so that ḡp = 1 − p and ēp = p. The group G leaves {b(n,p); 0 ≤ p ≤ 1} invariant.
Example 2. Let X_1, X_2, ..., X_n be iid N(μ,σ²) RVs. Consider the affine group of transformations G = {{a,b}, a ∈ R, b > 0} on X. The joint PDF of {a,b}X = (a + bX_1, ..., a + bX_n) is given by
f(x_1, x_2, ..., x_n) = [1/(bσ√(2π))^n] exp{−[1/(2b²σ²)] Σ_{i=1}^n (x_i − a − bμ)²},
and we see that
ḡ(μ,σ) = (a + bμ, bσ) = {a,b}{μ,σ}.
Clearly G leaves the family of joint PDFs of X invariant.
In order to apply invariance considerations to a decision problem we need also to ensure
that the loss function is invariant.
Definition 1.A decision problem is said to be invariant under a groupGif
(i)Pis invariant underGand
(ii) the loss functionLis invariant in the sense that for everyg∈Ganda∈Athere is
a uniquea

∈Asuch that
L(θ,a)=L(¯gθ,a

)for allθ.
Thea

∈Ain Definition 1 is uniquely determined bygand may be denoted by˜ g(a).
One can show that
˜
G={˜ g:g∈G}is a group of transformations ofAinto itself.
Example 3. Consider the estimation of μ in sampling from N(μ,1). In Example 8.9.2 we have shown that the normal family is invariant under the location group G = {{b,1}, −∞ < b < ∞}. Consider the quadratic loss function
L(μ,a) = (μ − a)².
Then {b,1}a = b + a and {b,1}{μ,1} = {b + μ, 1}. Hence,
L({b,1}μ, {b,1}a) = [(b + μ) − (b + a)]² = (μ − a)² = L(μ,a).
Thus L(μ,a) is invariant under G and the problem of estimation of μ is invariant under the group G.

Example 4. Consider the normal family N(0,σ²), which is invariant under the scale group G = {{0,c}, c > 0}. Let the loss function be
L(σ²,a) = (1/σ⁴)(σ² − a)².
Now {0,c}a = ca and {0,c}{0,σ²} = {0, cσ²}, and
L[{0,c}σ², {0,c}a] = [1/(c²σ⁴)](cσ² − ca)² = (1/σ⁴)(σ² − a)² = L(σ²,a).
Thus the loss function L(σ²,a) is invariant under G = {{0,c}, c > 0} and the problem of estimation of σ² is invariant.
Example 5. Consider the loss function
L(σ²,a) = a/σ² − 1 − log(a/σ²)
for the estimation of σ² from the normal family N(0,σ²). We show that this loss function is invariant under the scale group. Since
{0,c}σ² = {0, cσ²}   and   {0,c}{0,a} = {0, ca},
we have
L[{0,c}σ², {0,c}a] = ca/(cσ²) − 1 − log[ca/(cσ²)] = L(σ²,a).
Let us now return to the problem of estimation of a parametric functionψ:Θ→ R.
For convenience let us takeΘ⊆Randψ(θ)=θ. ThenA=Θand
˜
G=
¯
G.
Supposeθis the mean of PDFf
θ,G={{b,1},b∈R}, and{f θ}is invariant under
G. Consider the estimator∂(X)=
X. What we want in an estimator∂

ofθis that it
changes in the same prescribed way as the data are changed. In our case, sinceXchanges
to{b,1}X=X+bwe would likeXto transform to{b,1}X=X+b.
Definition 2.An estimatorδ(X)ofθis said to be equivariant, underG,if
δ(gX)= ¯gδ(X) for allg∈G, (2)
where we have writtengXforg(X)for convenience.
IndeedgonSinduces¯ gonΘ. Thus ifX∼f
θ, thengX∼f ¯gθso ifδ(X)estimates
θthenδ(gX)should estimate¯ gθ.Theprinciple of equivariancerequires that we restrict
attention to equivariant estimators and select the “best” estimator in this class in a sense
to be described later in this section.

Example 6. In Example 3, consider the estimators ∂_1(X) = X̄, ∂_2(X) = (X_(1) + X_(n))/2, and ∂_3(X) = αX̄, α a fixed real number. Then G = {{b,1}, −∞ < b < ∞} induces Ḡ = G on Θ and both ∂_1, ∂_2 are equivariant under G. The estimator ∂_3 is not equivariant unless α = 1. In Example 8.9.1, ∂(X) = X/n is an equivariant estimator of p.
In Example 6 consider the statistic ∂_4(X) = S². Note that under the translation group {b,1}X = X + b and ∂_4({b,1}X) = ∂_4(X). That is, for every g ∈ G, ∂_4(gX) = ∂_4(X). A statistic ∂ is said to be invariant under a group of transformations G if ∂(gX) = ∂(X) for all g ∈ G. When G is the translation group, an invariant statistic (function) under G is called location-invariant. Similarly, if G is the scale group, we call ∂ scale-invariant, and if G is the location-scale group, we call ∂ location-scale invariant. In Example 6, ∂_4(X) = S² is location-invariant but not equivariant, and ∂_2(X) and ∂_3(X) are not location-invariant.
A very important property of equivariant estimators is that their risk function is constant on orbits of θ.
Theorem 1. Suppose ∂ is an equivariant estimator of θ in a problem which is invariant under G. Then the risk function of ∂ satisfies
R(ḡθ, ∂) = R(θ, ∂)   (3)
for all θ ∈ Θ and g ∈ G. If, in particular, Ḡ is transitive over Θ, then R(θ,∂) is independent of θ.
Proof. We have, for θ ∈ Θ and g ∈ G,
R(θ,∂(X)) = E_θ L(θ,∂(X))
          = E_θ L(ḡθ, ḡ∂(X))   (invariance of L)
          = E_θ L(ḡθ, ∂(g(X)))   (equivariance of ∂)
          = E_{ḡθ} L(ḡθ, ∂(X))   (invariance of {P_θ})
          = R(ḡθ, ∂(X)).
In the special case when Ḡ is transitive over Θ, then for any θ_1, θ_2 ∈ Θ there exists a ḡ ∈ Ḡ such that θ_2 = ḡθ_1. It follows that
R(θ_2,∂) = R(ḡθ_1,∂) = R(θ_1,∂),
so that R is independent of θ.
Remark 1. When the risk function of every equivariant estimator is constant, an estimator (in the class of equivariant estimators) obtained by minimizing the constant is called the minimum risk equivariant (MRE) estimator.
Example 7. Let X_1, X_2, ..., X_n be iid RVs with common PDF
f(x,θ) = exp{−(x − θ)}, x ≥ θ, and = 0 if x < θ.

Consider the location group G = {{b,1}, −∞ < b < ∞}, which induces Ḡ on Θ where Ḡ = G. Clearly Ḡ is transitive. Let L(θ,∂) = (θ − ∂)². Then the problem of estimation of θ is invariant and, according to Theorem 1, the risk of every equivariant estimator is free of θ. The estimator ∂_0(X) = X_(1) − 1/n is equivariant under G since
∂_0({b,1}X) = min_{1≤i≤n}(X_i + b) − 1/n = b + X_(1) − 1/n = b + ∂_0(X).
We leave the reader to check that
R(θ,∂_0) = E_θ[X_(1) − 1/n − θ]² = 1/n²,
and it will be seen later that ∂_0 is the MRE estimator of θ.
Example 8. In this example we consider sampling from a normal PDF. Let us first consider estimation of μ when σ = 1. Let G = {{b,1}, −∞ < b < ∞}. Then ∂(X) = X̄ is equivariant under G and it has the smallest risk, 1/n. Note that {x̄,1}⁻¹ = {−x̄,1} may be used to designate x on its orbits:
{x̄,1}⁻¹x = (x_1 − x̄, ..., x_n − x̄) = A(x).
Clearly A(x) is invariant under G and A(X) is ancillary to μ. By Basu's theorem, A(X) and X̄ are independent.
Next consider estimation of σ² with μ = 0 and G = {{0,c}, c > 0}. Then S_x² = Σ_1^n X_i² is an equivariant estimator of σ². Note that {0, s_x}⁻¹ may be used to designate x on its orbits:
{0, s_x}⁻¹x = (x_1/s_x, ..., x_n/s_x) = A(x).
Again A(x) is invariant under G and A(X) is ancillary to σ². Moreover, S_x² and A(X) are independent.
Finally, we consider estimation of (μ,σ²) when G = {{b,c}, −∞ < b < ∞, c > 0}. Then (X̄, S_x²), where S_x² = Σ_1^n (X_i − X̄)², is an equivariant estimator of (μ,σ²). Also {x̄, s_x}⁻¹ may be used to designate x on its orbits:
{x̄, s_x}⁻¹x = ((x_1 − x̄)/s_x, ..., (x_n − x̄)/s_x) = A(x).
Note that the statistic A(X) defined in each of the three cases considered in Example 8 is constant on its orbits. A statistic A is said to be maximal invariant if
(i) A is invariant, and
(ii) A is maximal, that is, A(x_1) = A(x_2) ⟹ x_1 = g(x_2) for some g ∈ G.
We now derive an explicit expression for the MRE estimator of a location parameter. Let X_1, X_2, ..., X_n be iid with common PDF f_θ(x) = f(x − θ), −∞ < θ < ∞. Then {f_θ : θ ∈ Θ} is invariant under G = {{b,1}, −∞ < b < ∞}, and an estimator of θ is equivariant if
∂({b,1}X) = ∂(X) + b
for all real b.

Lemma 1. An estimator ∂ is equivariant for θ if and only if
∂(X) = X_1 + q(X_2 − X_1, ..., X_n − X_1),   (4)
for some function q.
Proof. If (4) holds then
∂({b,1}x) = b + x_1 + q(x_2 − x_1, ..., x_n − x_1) = b + ∂(x).
Conversely,
∂(x) = ∂(x_1 + x_1 − x_1, x_1 + x_2 − x_1, ..., x_1 + x_n − x_1) = x_1 + ∂(0, x_2 − x_1, ..., x_n − x_1),
which is (4) with q(x_2 − x_1, ..., x_n − x_1) = ∂(0, x_2 − x_1, ..., x_n − x_1).
From Theorem 1 the risk function of an equivariant estimator ∂ is constant, with risk
R(θ,∂) = R(0,∂) = E_0[∂(X)]²,  for all θ,
where the expectation is with respect to the PDF f_0(x) = f(x). Consequently, among all equivariant estimators ∂ of θ, the MRE estimator is ∂_0 satisfying
R(0,∂_0) = min_∂ R(0,∂).
Thus we only need to choose the function q in (4).
Let L(θ,∂) be the loss function. Invariance considerations require that
L(θ,∂) = L(ḡθ, ḡ∂) = L(θ + b, ∂ + b)
for all real b, so that L(θ,∂) must be some function w of ∂ − θ.
Let Y_i = X_i − X_1, i = 2, ..., n, Y = (Y_2, ..., Y_n), and let g(y) be the joint PDF of Y under θ = 0. Let h(x_1|y) be the conditional density, under θ = 0, of X_1 given Y = y. Then
R(0,∂) = E_0[w(X_1 − q(Y))] = ∫ [∫ w(x_1 − q(y)) h(x_1|y) dx_1] g(y) dy.   (5)
Then R(0,∂) will be minimized by choosing, for each fixed y, q(y) to be that value of c which minimizes
∫ w(u − c) h(u|y) du.   (6)

Necessarily q depends on y. In the special case w(d − θ) = (d − θ)², the integral in (6) is minimized when c is chosen to be the mean of the conditional distribution. Thus the unique MRE estimator of θ is given by
∂_0(x) = x_1 − E_0{X_1 | Y = y}.   (7)
This is the so-called Pitman estimator. Let us simplify it a little more by computing E_0{x_1 − X_1 | Y = y}.
First we need to compute h(u|y). When θ = 0, the joint PDF of X_1, Y_2, ..., Y_n is easily seen to be
f(x_1) f(x_1 + y_2) ··· f(x_1 + y_n),
so the joint PDF of (Y_2, ..., Y_n) is given by
∫_{−∞}^{∞} f(u) f(u + y_2) ··· f(u + y_n) du.
It follows that
h(u|y) = f(u) f(u + y_2) ··· f(u + y_n) / ∫_{−∞}^{∞} f(u) f(u + y_2) ··· f(u + y_n) du.   (8)
Now let Z = x_1 − X_1. Then the conditional PDF of Z given y is h(x_1 − z | y). It follows from (8) that
∂_0(x) = E_0{Z | y} = ∫_{−∞}^{∞} z h(x_1 − z | y) dz = ∫_{−∞}^{∞} z ∏_{j=1}^{n} f(x_j − z) dz / ∫_{−∞}^{∞} ∏_{j=1}^{n} f(x_j − z) dz.   (9)
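Formula (9) lends itself to direct numerical evaluation. The following editorial sketch (not from the text; it assumes Python with NumPy) evaluates (9) for the shifted exponential density of Example 7 and compares it with the closed form X_(1) − 1/n.

import numpy as np

rng = np.random.default_rng(1)
theta, n = 2.0, 6
x = theta + rng.exponential(size=n)

z = np.linspace(x.min() - 30.0, x.min(), 200001)       # prod_j f(x_j - z) = 0 for z > x_(1)
dens = np.exp(-(x[:, None] - z)).prod(axis=0)          # prod_j f(x_j - z), f(u) = e^{-u}, u >= 0
pitman = np.trapz(z * dens, z) / np.trapz(dens, z)
print(pitman, x.min() - 1 / n)                         # nearly identical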
Remark 2. Since the joint PDF of X_1, X_2, ..., X_n is ∏_{j=1}^{n} f_θ(x_j) = ∏_{j=1}^{n} f(x_j − θ), the joint PDF of θ and X when θ has prior π(θ) is π(θ) ∏_{j=1}^{n} f(x_j − θ). The joint marginal of X is ∫_{−∞}^{∞} π(θ) ∏_{j=1}^{n} f(x_j − θ) dθ. It follows that the conditional PDF of θ given X = x is given by
π(θ) ∏_{j=1}^{n} f(x_j − θ) / ∫_{−∞}^{∞} π(θ) ∏_{j=1}^{n} f(x_j − θ) dθ.
Taking π(θ) = 1, the improper uniform prior on Θ, we see from (9) that ∂_0(x) is the Bayes estimator of θ under squared error loss and prior π(θ) = 1. Since the risk of ∂_0 is constant, it follows that ∂_0 is also a minimax estimator of θ.

Remark 3. Suppose S is sufficient for θ. Then ∏_{j=1}^{n} f_θ(x_j) = g_θ(s) h(x), so that the Pitman estimator of θ can be rewritten as
∂_0(x) = ∫_{−∞}^{∞} θ ∏_{j=1}^{n} f_θ(x_j) dθ / ∫_{−∞}^{∞} ∏_{j=1}^{n} f_θ(x_j) dθ
       = ∫_{−∞}^{∞} θ g_θ(s) h(x) dθ / ∫_{−∞}^{∞} g_θ(s) h(x) dθ
       = ∫_{−∞}^{∞} θ g_θ(s) dθ / ∫_{−∞}^{∞} g_θ(s) dθ,
which is a function of s alone.
Examples 7 and 8 (continued). A direct computation using (9) shows that X_(1) − 1/n is the Pitman MRE estimator of θ in Example 7 and X̄ is the MRE estimator of μ in Example 8 (when σ = 1). The results can also be obtained by using sufficiency reduction. In Example 7, X_(1) is the minimal sufficient statistic for θ. Every (translation) equivariant function based on X_(1) must be of the form ∂_c(X) = X_(1) + c, where c is a real number. Then
R(θ,∂_c) = E_θ{X_(1) + c − θ}² = E_θ{X_(1) − 1/n − θ + (c + 1/n)}² = R(θ,∂_0) + (c + 1/n)² = (1/n)² + (c + 1/n)²,
which is minimized for c = −1/n. In Example 8, X̄ is the minimal sufficient statistic, so every equivariant function of X̄ must be of the form ∂_c(X) = X̄ + c, where c is a real constant. Then
R(μ,∂_c) = E_μ(X̄ + c − μ)² = 1/n + c²,
which is minimized for c = 0.
Example 9. Let X_1, X_2, ..., X_n be iid U(θ − 1/2, θ + 1/2). Then (X_(1), X_(n)) is jointly sufficient for θ. Clearly,
f(x_1 − θ, ..., x_n − θ) = 1 if x_(n) − 1/2 < θ < x_(1) + 1/2, and = 0 otherwise,
so that the Pitman estimator of θ is given by
∂_0(x) = ∫_{x_(n)−1/2}^{x_(1)+1/2} θ dθ / ∫_{x_(n)−1/2}^{x_(1)+1/2} dθ = (x_(1) + x_(n))/2.

We now consider, briefly, the Pitman estimator of a scale parameter. Let X have joint PDF
f_σ(x) = (1/σ^n) f(x_1/σ, ..., x_n/σ),
where f is known and σ > 0 is a scale parameter. The family {f_σ : σ > 0} remains invariant under G = {{0,c}, c > 0}, which induces Ḡ = G on Θ. Then, for estimation of σ^k, a loss function L(σ,a) is invariant under these transformations if and only if L(σ,a) = w(a/σ^k).
An estimator ∂ of σ^k is equivariant under G if
∂({0,c}X) = c^k ∂(X)   for all c > 0.
Some simple examples of scale-equivariant estimators of σ are the mean deviation Σ_1^n |X_i − X̄|/n and the standard deviation √[Σ_1^n (X_i − X̄)²/(n−1)]. We note that the group Ḡ over Θ is transitive, so according to Theorem 1 the risk of any equivariant estimator of σ^k is free of σ, and an MRE estimator minimizes this risk over the class of all equivariant estimators of σ^k. Using the loss function L(σ,a) = w(a/σ^k) = (a − σ^k)²/σ^{2k}, it can be shown that the MRE estimator of σ^k, also known as the Pitman estimate of σ^k, is given by
∂_0(x) = ∫_0^{∞} v^{n+k−1} f(vx_1, ..., vx_n) dv / ∫_0^{∞} v^{n+2k−1} f(vx_1, ..., vx_n) dv.
Just as in the location case, one can show that ∂_0 is a function of the minimal sufficient statistic and that ∂_0 is the Bayes estimator of σ^k with improper prior π(σ) = 1/σ^{2k+1}. Consequently, ∂_0 is minimax.
Example 8 (continued). In Example 8, the Pitman estimator of σ^k is easily shown to be
∂_0(X) = [Γ((n+k)/2)/Γ((n+2k)/2)] (Σ_1^n X_i²/2)^{k/2}.
Thus the MRE estimator of σ is given by
[Γ((n+1)/2)/Γ((n+2)/2)] √(Σ_1^n X_i²/2),
and that of σ² by Σ_1^n X_i²/(n+2).
Example 10. Let X_1, X_2, ..., X_n be iid U(0,θ). The Pitman estimator of θ is given by
∂_0(X) = ∫_0^{1/X_(n)} v^n dv / ∫_0^{1/X_(n)} v^{n+1} dv = [(n+2)/(n+1)] X_(n).
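The scale Pitman formula can be checked numerically in the same spirit as (9). The editorial sketch below (not from the text; it assumes Python with NumPy) evaluates the two integrals for a U(0,θ) sample and compares with (n+2)X_(n)/(n+1).

import numpy as np

rng = np.random.default_rng(2)
theta, n = 3.0, 5
x = rng.uniform(0, theta, size=n)

v = np.linspace(0, 1 / x.max(), 200001)     # f(v x_1, ..., v x_n) = 1 iff v <= 1/x_(n)
num = np.trapz(v**n, v)                     # integral of v^{n+k-1} with k = 1
den = np.trapz(v**(n + 1), v)               # integral of v^{n+2k-1} with k = 1
print(num / den, (n + 2) / (n + 1) * x.max())   # nearly identical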
PROBLEMS 8.9
In all problems assume that X_1, X_2, ..., X_n is a random sample from the distribution under consideration.
1. Show that the following statistics are equivariant under the translation group:
(a) Median(X_i).
(b) (X_(1) + X_(n))/2.

(c) X_([np]+1), the quantile of order p, 0 < p < 1.
(d) [X_(r) + X_(r+1) + ··· + X_(n−r)]/(n − 2r).
(e) X̄ + Ȳ, where Ȳ is the mean of a sample of size m, m ≠ n.
2. Show that the following statistics are invariant under the location or scale or location-scale group:
(a) X̄ − median(X_i).
(b) X_(n+1−k) − X_(k).
(c) Σ_{i=1}^n |X_i − X̄|/n.
(d) Σ_{i=1}^n (X_i − X̄)(Y_i − Ȳ) / {Σ_{i=1}^n (X_i − X̄)² Σ_{i=1}^n (Y_i − Ȳ)²}^{1/2}, where (X_1,Y_1), ..., (X_n,Y_n) is a random sample from a bivariate distribution.
3. Let the common distribution be G(α,σ), where α (> 0) is known and σ > 0 is unknown. Find the MRE estimator of σ under loss L(σ,a) = (1 − a/σ)².
4. Let the common PDF be the folded normal distribution
√(2/π) exp{−(1/2)(x − μ)²} I_{[μ,∞)}(x).
Verify that the best equivariant estimator of μ under quadratic loss is given by
μ̂ = X̄ − exp{−(n/2)(X_(1) − X̄)²} / (√(2nπ) {1/2 + ∫_0^{√n(X_(1)−X̄)} [1/√(2π)] exp(−z²/2) dz}).
5. Let X ∼ U(θ, 2θ).
(a) Show that (X_(1), X_(n)) is a jointly sufficient statistic for θ.
(b) Verify whether or not X_(n) − X_(1) is an unbiased estimator of θ. Find an ancillary statistic.
(c) Determine the best invariant estimator of θ under the loss function L(θ,a) = (1 − a/θ)².
6. Let
f_θ(x) = (1/2) exp{−|x − θ|}.
Find the Pitman estimator of θ.
7. Let f_θ(x) = exp{−(x − θ)}·[1 + exp{−(x − θ)}]⁻², for x ∈ R, θ ∈ R. Find the Pitman estimator of θ.
8. Show that an estimator ∂ is (location) equivariant if and only if
∂(x) = ∂_0(x) + φ(x),
where ∂_0 is any equivariant estimator and φ is an invariant function.

9. Let X_1, X_2 be iid with PDF
f_σ(x) = (2/σ)(1 − x/σ), 0 < x < σ, and = 0 otherwise.
Find, explicitly, the Pitman estimator of σ^r.
10. Let X_1, X_2, ..., X_n be iid with PDF
f_θ(x) = (1/θ) exp(−x/θ), x > 0, and = 0 otherwise.
Find the Pitman estimator of θ^k.

9
NEYMAN–PEARSON THEORY OF TESTING OF HYPOTHESES
9.1 INTRODUCTION
LetX
1,X2,...,X nbe a random sample from a population distributionF θ,θ∈Θ, where the
functional form ofF
θis known except, perhaps, for the parameterθ. Thus, for example, the
X
i’s may be a random sample fromN(θ,1), whereθ∈Ris not known. In many practical
problems the experimenter is interested in testing the validity of an assertion about the
unknown parameterθ. For example, in a coin-tossing experiment it is of interest to test,
in some sense, whether the (unknown) probability of headspequals a given numberp
0,
0<p
0<1. Similarly, it is of interest to check the claim of a car manufacturer about
the average mileage per gallon of gasoline achieved by a particular model. A problem of
this type is usually referred to as a problem oftesting of hypothesesand is the subject of
discussion in this chapter. We will develop the fundamentals of Neyman–Pearson theory.
In Section 9.2 we introduce the various concepts involved. In Section 9.3 the fundamental
Neyman–Pearson lemma is proved, and Sections 9.4 and 9.5 deal with some basic results
in the testing of composite hypotheses. Section 9.6 deals with locally optimal tests.
9.2 SOME FUNDAMENTAL NOTIONS OF HYPOTHESES TESTING
In Chapter 8 we discussed the problem of point estimation in sampling from a popula-
tion whose distribution is known except for a finite number of unknown parameters. Here
we consider another important problem in statistical inference, the testing of statistical
hypotheses. We begin by considering the following examples.

Example 1. In coin-tossing experiments one frequently assumes that the coin is fair, that is, the probability of getting heads or tails is the same: 1/2. How does one test whether the coin is fair (unbiased) or loaded (biased)? If one is guided by intuition, a reasonable procedure would be to toss the coin n times and count the number of heads. If the proportion of heads observed does not deviate "too much" from p = 1/2, one would tend to conclude that the coin is fair.
Example 2.It is usual for manufacturers to make quantitative assertions about their prod-
ucts. For example, a manufacturer of 12-volt batteries may claim that a certain brand of his
batteries lasts forNhours. How does one go about checking the truth of this assertion? A
reasonable procedure suggests itself: Take a random sample ofnbatteries of the brand in
question and note their length of life under more or less identical conditions. If the average
length of life is “much smaller” thanN, one would tend to doubt the manufacturer’s claim.
To fix ideas, let us define formally the concepts involved. As usual,X=(X
1,X2,...,X n)
and letX∼F
θ,θ∈Θ⊆R k. It will be assumed that the functional form ofF θis known
except for the parameterθ. Also, we assume thatΘcontains at least two points.
Definition 1.A parametric hypothesis is an assertion about the unknown parameterθ.
It is usually referred to as thenull hypothesis,H
0:θ∈Θ 0⊂Θ. The statementH 1:θ∈
Θ
1=Θ−Θ 0is usually referred to as the alternative hypothesis.
Usually the null hypothesis is chosen to correspond to the smaller or simpler subsetΘ
0
ofΘand is a statement of “no difference,” whereas the alternative represents change.
Definition 2.IfΘ
0(Θ1)contains only one point, we say thatΘ 0(Θ1)issimple; otherwise,
composite. Thus, if a hypothesis is simple, the probability distribution ofXis completely
specified under that hypothesis.
Example 3. Let X ∼ N(μ,σ²). If both μ and σ² are unknown, Θ = {(μ,σ²) : −∞ < μ < ∞, σ² > 0}. The hypothesis H_0: μ ≤ μ_0, σ² > 0, where μ_0 is a known constant, is a composite null hypothesis. The alternative hypothesis is H_1: μ > μ_0, σ² > 0, which is also composite. Similarly, the null hypothesis μ = μ_0, σ² > 0 is also composite.
If σ² = σ_0² is known, the hypothesis H_0: μ = μ_0 is a simple hypothesis.
Example 4. Let X_1, X_2, ..., X_n be iid b(1,p) RVs. Some hypotheses of interest are p = 1/2, p ≤ 1/2, p ≥ 1/2 or, quite generally, p = p_0, p ≤ p_0, p ≥ p_0, where p_0 is a known number, 0 < p_0 < 1.
The problem of testing of hypotheses may be described as follows: Given the sample
pointx=(x
1,x2,...,x n), find a decision rule (function) that will lead to a decision to reject
or fail to reject the null hypothesis. In other words, partition the sample space into two
disjoint setsCandC
c
such that, ifx∈C, we rejectH 0, and ifx∈C
c
, we fail to rejectH 0.
In the following we will write acceptH
0when we fail to rejectH 0. We emphasize that when
the sample pointx∈C
c
and we fail to rejectH 0, it does not mean thatH 0gets our stamp
of approval. It simply means that the sample does not have enough evidence againstH
0.

Definition 3. Let X ∼ F_θ, θ ∈ Θ. A subset C of R^n such that if x ∈ C, then H_0 is rejected (with probability 1), is called the critical region (set):
C = {x ∈ R^n : H_0 is rejected if x ∈ C}.
There are two types of errors that can be made if one uses such a procedure. One may reject H_0 when in fact it is true, called a type I error, or accept H_0 when it is false, called a type II error.

                       True
                 H_0              H_1
Accept H_0      Correct          Type II error
Accept H_1      Type I error     Correct

If C is the critical region of a rule, P_θ(C), θ ∈ Θ_0, is a probability of type I error, and P_θ(C^c), θ ∈ Θ_1, is a probability of type II error. Ideally, one would like to find a critical region for which both these probabilities are 0. This will be the case if we can find a subset S ⊆ R^n such that P_θ(S) = 1 for every θ ∈ Θ_0 and P_θ(S) = 0 for every θ ∈ Θ_1. Unfortunately, situations such as this do not arise in practice, although they are conceivable. For example, let X ∼ C(1,θ) under H_0 and X ∼ P(θ) under H_1. Usually, if a critical region is such that the probability of type I error is 0, it will be of the form "do not reject H_0" and the probability of type II error will then be 1.
The procedure used in practice is to limit the probability of type I error to some preassigned level α (usually 0.01 or 0.05) that is small and to minimize the probability of type II error. To restate our problem in terms of this requirement, let us formulate these notions.
Definition 4. Every Borel-measurable mapping φ of R^n → [0,1] is known as a test function.
Some simple examples of test functions are φ(x) = 1 for all x ∈ R^n, φ(x) = 0 for all x ∈ R^n, or φ(x) = α, 0 ≤ α ≤ 1, for all x ∈ R^n. In fact, Definition 4 includes Definition 3 in the sense that, whenever φ is the indicator function of some Borel subset A of R^n, A is called the critical region (of the test φ).
Definition 5. The mapping φ is said to be a test of the hypothesis H_0: θ ∈ Θ_0 against the alternatives H_1: θ ∈ Θ_1 with error probability α (also called level of significance or, simply, level) if
E_θ φ(X) ≤ α   for all θ ∈ Θ_0.   (1)
We shall say, in short, that φ is a test for the problem (α, Θ_0, Θ_1).

Let us write β_φ(θ) = E_θ φ(X). Our objective, in practice, will be to seek a test φ for a given α, 0 ≤ α ≤ 1, such that
sup_{θ∈Θ_0} β_φ(θ) ≤ α.   (2)
The left-hand side of (2) is usually known as the size of the test φ. Condition (1) therefore restricts attention to tests whose size does not exceed a given level of significance α.
The following interpretation may be given to all tests φ satisfying β_φ(θ) ≤ α for all θ ∈ Θ_0. To every x ∈ R^n we assign a number φ(x), 0 ≤ φ(x) ≤ 1, which is the probability of rejecting H_0 that X ∼ f_θ, θ ∈ Θ_0, if x is observed. The restriction β_φ(θ) ≤ α for θ ∈ Θ_0 then says that, if H_0 were true, φ rejects it with probability ≤ α. We will call such a test φ a randomized test function. If φ(x) = I_A(x), φ will be called a nonrandomized test. If x ∈ A, we reject H_0 with probability 1; and if x ∉ A, this probability is 0. Needless to say, A ∈ B_n.
We next turn our attention to the type II error.
Definition 6. Let φ be a test function for the problem (α, Θ_0, Θ_1). For every θ ∈ Θ define
β_φ(θ) = E_θ φ(X) = P_θ{Reject H_0}.   (3)
As a function of θ, β_φ(θ) is called the power function of the test φ. For any θ ∈ Θ_1, β_φ(θ) is called the power of φ against the alternative θ.
In view of Definitions 5 and 6 the problem of testing of hypotheses may now be reformulated. Let X ∼ f_θ, θ ∈ Θ ⊆ R^k, Θ = Θ_0 + Θ_1. Also, let 0 ≤ α ≤ 1 be given. Given a sample point x, find a test φ(x) such that β_φ(θ) ≤ α for θ ∈ Θ_0, and β_φ(θ) is a maximum for θ ∈ Θ_1.
Definition 7. Let Φ_α be the class of all tests for the problem (α, Θ_0, Θ_1). A test φ_0 ∈ Φ_α is said to be a most powerful (MP) test against an alternative θ ∈ Θ_1 if
β_{φ_0}(θ) ≥ β_φ(θ)   for all φ ∈ Φ_α.   (4)
If Θ_1 contains only one point, this definition suffices. If, on the other hand, Θ_1 contains at least two points, as will usually be the case, we will have an MP test corresponding to each θ ∈ Θ_1.
Definition 8. A test φ_0 ∈ Φ_α for the problem (α, Θ_0, Θ_1) is said to be a uniformly most powerful (UMP) test if
β_{φ_0}(θ) ≥ β_φ(θ)   for all φ ∈ Φ_α, uniformly in θ ∈ Θ_1.   (5)
Thus, if Θ_0 and Θ_1 are both composite, the problem is to find a UMP test φ for the problem (α, Θ_0, Θ_1). We will see that UMP tests very frequently do not exist, and we will have to place further restrictions on the class of all tests, Φ_α.

Note that if φ_1, φ_2 are two tests and λ is a real number, 0 < λ < 1, then λφ_1 + (1−λ)φ_2 is also a test function, and it follows that the class of all test functions Φ_α is convex.
Example 5. Let X_1, X_2, ..., X_n be iid N(μ,1) RVs, where μ is unknown but it is known that μ ∈ Θ = {μ_0, μ_1}, μ_0 < μ_1. Let H_0: X_i ∼ N(μ_0,1), H_1: X_i ∼ N(μ_1,1). Both H_0 and H_1 are simple hypotheses. Intuitively, one would accept H_0 if the sample mean X̄ is "closer" to μ_0 than to μ_1; that is to say, one would reject H_0 if X̄ > k, and accept H_0 otherwise. The constant k is determined from the level requirements. Note that, under H_0, X̄ ∼ N(μ_0, 1/n), and, under H_1, X̄ ∼ N(μ_1, 1/n). Given 0 < α < 1, we have
P_{μ_0}{X̄ > k} = P{(X̄ − μ_0)/(1/√n) > (k − μ_0)/(1/√n)} = P{Type I error} = α,
so that k = μ_0 + z_α/√n. The test, therefore, is (Fig. 1)
φ(x) = 1 if x̄ > μ_0 + z_α/√n,  0 otherwise.
Fig. 1 Rejection region of H_0 in Example 5: accept H_0 for x̄ ≤ μ_0 + z_α/√n, reject H_0 for x̄ > μ_0 + z_α/√n.
Here X̄ is known as a test statistic, and the test φ is nonrandomized with critical region C = {x : x̄ > μ_0 + z_α/√n}. Note that in this case the continuity of X̄ (that is, the absolute continuity of the DF of X̄) allows us to achieve any size α, 0 < α < 1.
The power of the test at μ_1 is given by
E_{μ_1}φ(X) = P_{μ_1}{X̄ > μ_0 + z_α/√n} = P{(X̄ − μ_1)/(1/√n) > (μ_0 − μ_1)√n + z_α} = P{Z > z_α − √n(μ_1 − μ_0)},
where Z ∼ N(0,1). In particular, E_{μ_1}φ(X) > α since μ_1 > μ_0. The probability of type II error is given by
P{Type II error} = 1 − E_{μ_1}φ(X) = P{Z ≤ z_α − √n(μ_1 − μ_0)}.
Figure 2 gives a graph of the power function β_φ(μ) of φ for μ > 0 when μ_0 = 0 and H_1: μ > 0.
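The power formula is easy to evaluate directly. The sketch below is an editorial illustration (not from the text; it assumes Python, α = 0.05 so z_α ≈ 1.645, and n = 25).

from math import erf, sqrt

def Phi(x):
    """Standard normal DF via the error function."""
    return 0.5 * (1.0 + erf(x / sqrt(2.0)))

z_alpha = 1.645
n, mu0 = 25, 0.0
for mu1 in (0.1, 0.3, 0.5):
    power = 1.0 - Phi(z_alpha - sqrt(n) * (mu1 - mu0))
    print(mu1, round(power, 3))    # power increases with mu_1 and always exceeds alpha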
Example 6. Let X_1, X_2, X_3, X_4, X_5 be a sample from b(1,p), where p is unknown and 0 ≤ p ≤ 1. Consider the simple null hypothesis H_0: X_i ∼ b(1, 1/2), that is, under H_0, p = 1/2. Then H_1: X_i ∼ b(1,p), p ≠ 1/2. A reasonable procedure would be to compute the average number of 1's, namely X̄ = Σ_1^5 X_i/5, and to accept H_0 if |X̄ − 1/2| ≤ c, where c is to be determined. Let α = 0.10. Then we would like to choose c such that the size of our test is α, that is,
0.10 = P_{p=1/2}{|X̄ − 1/2| > c},
or
0.90 = P_{p=1/2}{−5c ≤ Σ_1^5 X_i − 5/2 ≤ 5c} = P_{p=1/2}{−k ≤ Σ_1^5 X_i − 5/2 ≤ k},   (6)
Fig. 2 Power function of φ in Example 5.

where k = 5c. Now Σ_1^5 X_i ∼ b(5, 1/2) under H_0, so that the PMF of Σ_1^5 X_i − 5/2 is given in the following table.

Σ x_i    Σ x_i − 5/2    P_{p=1/2}{Σ X_i = Σ x_i}
0         −2.5           0.03125
1         −1.5           0.15625
2         −0.5           0.31250
3          0.5           0.31250
4          1.5           0.15625
5          2.5           0.03125

Note that we cannot choose any k to satisfy (6) exactly. It is clear that we have to reject H_0 when k = ±2.5, that is, when we observe ΣX_i = 0 or 5. The resulting size if we use this test is α = 0.03125 + 0.03125 = 0.0625 < 0.10. A second procedure would be to reject H_0 if k = ±1.5 or ±2.5 (ΣX_i = 0, 1, 4, 5), in which case the resulting size is α = 0.0625 + 2(0.15625) = 0.375, which is considerably larger than 0.10. A third alternative, if we insist on achieving α = 0.10, is to randomize on the boundary. Instead of accepting or rejecting H_0 with probability 1 when ΣX_i = 1 or 4, we reject H_0 with probability γ, where
0.10 = P_{p=1/2}{Σ_1^5 X_i = 0 or 5} + γ P_{p=1/2}{Σ_1^5 X_i = 1 or 4}.
Thus
γ = 0.0375/0.3125 = 0.12.
A randomized test of size α = 0.10 is therefore given by
φ(x) = 1 if Σ_1^5 x_i = 0 or 5,
       0.12 if Σ_1^5 x_i = 1 or 4,
       0 otherwise.
The power of this test is
E_p φ(X) = P_p{Σ_1^5 X_i = 0 or 5} + 0.12 P_p{Σ_1^5 X_i = 1 or 4},
where p ≠ 1/2, and can be computed for any value of p. Figure 3 gives a graph of β_φ(p).
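The size and power of this randomized test are easy to tabulate. The editorial sketch below (not from the text) assumes Python; it reproduces the size 0.10 and evaluates β_φ(p) at a few alternatives.

from math import comb

def beta(p, gamma=0.12, n=5):
    pmf = [comb(n, s) * p**s * (1 - p)**(n - s) for s in range(n + 1)]
    return (pmf[0] + pmf[5]) + gamma * (pmf[1] + pmf[4])

print(round(beta(0.5), 4))            # size: 0.0625 + 0.12 * 0.3125 = 0.10
for p in (0.1, 0.3, 0.7, 0.9):
    print(p, round(beta(p), 3))       # power at alternatives p != 1/2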

Fig. 3 Power function of φ in Example 6.
We conclude this section with the following remarks.
Remark 1. The problem of testing of hypotheses may be considered as a special case of the general decision problem described in Section 8.8. Let A = {a_0, a_1}, where a_0 represents the decision to accept H_0: θ ∈ Θ_0 and a_1 represents the decision to reject H_0. A decision function δ is a mapping of R^n into A. Let us introduce the following loss functions:
L_1(θ,a_1) = 1 if θ ∈ Θ_0, 0 if θ ∈ Θ_1,   and   L_1(θ,a_0) = 0 for all θ,
and
L_2(θ,a_0) = 0 if θ ∈ Θ_0, 1 if θ ∈ Θ_1,   and   L_2(θ,a_1) = 0 for all θ.
Then the minimization of E_θ L_2(θ,δ(X)) subject to E_θ L_1(θ,δ(X)) ≤ α is the hypothesis-testing problem discussed above. We have
E_θ L_2(θ,δ(X)) = P_θ{δ(X) = a_0}, θ ∈ Θ_1, = P_θ{Accept H_0 | H_1 true},
and
E_θ L_1(θ,δ(X)) = P_θ{δ(X) = a_1}, θ ∈ Θ_0, = P_θ{Reject H_0 | θ ∈ Θ_0 true}.
Remark 2.In Example 6 we saw that the chosen sizeαis often unattainable. The choice
of a specific value ofαis completely arbitrary and is determined by nonstatistical

considerations such as the possible consequences of rejecting H_0 falsely, and the economic and practical implications of the decision to reject H_0. An alternative, and somewhat subjective, approach wherever possible is to report the so-called P-value of the observed test statistic. This is the smallest level α at which the observed sample statistic is significant. In Example 6, let S = Σ_{i=1}^5 X_i. If S = 0 is observed, then P_{H_0}(S = 0) = P_0(S = 0) = 0.03125. By symmetry, if we reject H_0 for S = 0 we should do so also for S = 5, so the probability of interest is P_0(S = 0 or 5) = 0.0625, which is the P-value. If S = 1 is observed and we decide to reject H_0, then we would do so also for S = 0 because S = 0 is more extreme than S = 1. By symmetry considerations,
P-value = P_0(S ≤ 1 or S ≥ 4) = 2(0.03125 + 0.15625) = 0.375.
This discussion motivates Definition 9 below. Suppose the appropriate critical region for testing H_0 against H_1 is one-sided; that is, suppose C is either of the form {T ≥ c_1} or {T ≤ c_2}, where T is the test statistic.
Definition 9. The probability of observing under H_0 a sample outcome at least as extreme as the one observed is called the P-value. The smaller the P-value, the more extreme the outcome and the stronger the evidence against H_0.
If α is given, then we reject H_0 if P ≤ α and do not reject H_0 if P > α. In the two-sided case, when the critical region is of the form C = {|T(X)| > k}, the one-sided P-value is doubled to obtain the P-value. If the distribution of T is not symmetric, then the P-value is not well defined in the two-sided case, although many authors recommend doubling the one-sided P-value.
PROBLEMS 9.2
1.A sample of size 1 is taken from a population distributionP(λ).TotestH
0:λ=1
againstH
1:λ=2, consider the nonrandomized testϕ(x)=1ifx>3, and=0if
x≤3. Find the probabilities of type I and type II errors and the power of the test
againstλ=2. If it is required to achieve a size equal to 0.05, how should one modify
the testϕ?
2. Let X_1, X_2, ..., X_n be a sample from a population with finite mean μ and finite variance σ². Suppose that μ is not known, but σ is known, and it is required to test μ = μ_0 against μ = μ_1 (μ_1 > μ_0). Let n be sufficiently large so that the central limit theorem holds, and consider the test
φ(x_1, x_2, ..., x_n) = 1 if x̄ > k,  = 0 if x̄ ≤ k,
where x̄ = n⁻¹ Σ_{i=1}^n x_i. Find k such that the test has (approximately) size α. What is the power of this test at μ = μ_1? If the probabilities of type I and type II errors are fixed at α and β, respectively, find the smallest sample size needed.
3. In Problem 2, if σ is not known, find k such that the test φ has size α.

4. Let X_1, X_2, ..., X_n be a sample from N(μ,1). For testing μ ≤ μ_0 against μ > μ_0 consider the test function
φ(x_1, x_2, ..., x_n) = 1 if x̄ > μ_0 + z_α/√n,  0 if x̄ ≤ μ_0 + z_α/√n.
Show that the power function of φ is a nondecreasing function of μ. What is the size of the test?
5. A sample of size 1 is taken from an exponential PDF with parameter θ, that is, X ∼ G(1,θ). To test H_0: θ = 1 against H_1: θ > 1, the test to be used is the nonrandomized test
φ(x) = 1 if x > 2,  = 0 if x ≤ 2.
Find the size of the test. What is the power function?
6. Let X_1, X_2, ..., X_n be a sample from N(0,σ²). To test H_0: σ = σ_0 against H_1: σ ≠ σ_0, it is suggested that the test
φ(x_1, x_2, ..., x_n) = 1 if Σ x_i² > c_1 or Σ x_i² < c_2,  0 if c_2 ≤ Σ x_i² ≤ c_1,
be used. How will you find c_1 and c_2 such that the size of φ is a preassigned number α, 0 < α < 1? What is the power function of this test?
7. An urn contains 10 marbles, of which M are white and 10 − M are black. To test that M = 5 against the alternative hypothesis that M = 6, one draws 3 marbles from the urn without replacement. The null hypothesis is rejected if the sample contains 2 or 3 white marbles; otherwise it is accepted. Find the size of the test and its power.
9.3 NEYMAN–PEARSON LEMMA
In this section we prove the fundamental lemma due to Neyman and Pearson [76], which gives a general method for finding a best (most powerful) test of a simple hypothesis against a simple alternative. Let {f_θ, θ ∈ Θ}, where Θ = {θ_0, θ_1}, be a family of possible distributions of X. Also, f_θ represents the PDF of X if X is a continuous type RV, and the PMF of X if X is of the discrete type. Let us write f_0(x) = f_{θ_0}(x) and f_1(x) = f_{θ_1}(x) for convenience.
Theorem 1 (The Neyman–Pearson Fundamental Lemma).
(a) Any test φ of the form
φ(x) = 1 if f_1(x) > k f_0(x),
       γ(x) if f_1(x) = k f_0(x),
       0 if f_1(x) < k f_0(x),   (1)

for somek≥0 and 0≤γ(x)≤1, is most powerful of its size for testingH 0:θ=θ 0
againstH 1:θ=θ 1.Ifk=∞,thetest
ϕ(x)=1if f
0(x)=0, (2)
=0iff
0(x)>0,
is most powerful of size 0 for testingH
0againstH 1.
(b) Givenα,0≤α≤1, there exists a test of form (1) or (2) withγ(x)=γ(a constant),
for whichE
θ0
ϕ(X)= α.
Proof. Let $\varphi$ be a test satisfying (1), and $\varphi^*$ be any test with $E_{\theta_0}\varphi^*(\mathbf X)\le E_{\theta_0}\varphi(\mathbf X)$. In the continuous case
$$\int(\varphi(\mathbf x)-\varphi^*(\mathbf x))(f_1(\mathbf x)-kf_0(\mathbf x))\,d\mathbf x=\left(\int_{f_1>kf_0}+\int_{f_1<kf_0}\right)(\varphi(\mathbf x)-\varphi^*(\mathbf x))(f_1(\mathbf x)-kf_0(\mathbf x))\,d\mathbf x.$$
For any $\mathbf x\in\{f_1(\mathbf x)>kf_0(\mathbf x)\}$, $\varphi(\mathbf x)-\varphi^*(\mathbf x)=1-\varphi^*(\mathbf x)\ge 0$, so that the integrand is $\ge 0$. For $\mathbf x\in\{f_1(\mathbf x)<kf_0(\mathbf x)\}$, $\varphi(\mathbf x)-\varphi^*(\mathbf x)=-\varphi^*(\mathbf x)\le 0$, so that the integrand is again $\ge 0$. It follows that
$$\int(\varphi(\mathbf x)-\varphi^*(\mathbf x))(f_1(\mathbf x)-kf_0(\mathbf x))\,d\mathbf x=E_{\theta_1}\varphi(\mathbf X)-E_{\theta_1}\varphi^*(\mathbf X)-k\bigl(E_{\theta_0}\varphi(\mathbf X)-E_{\theta_0}\varphi^*(\mathbf X)\bigr)\ge 0,$$
which implies
$$E_{\theta_1}\varphi(\mathbf X)-E_{\theta_1}\varphi^*(\mathbf X)\ge k\bigl(E_{\theta_0}\varphi(\mathbf X)-E_{\theta_0}\varphi^*(\mathbf X)\bigr)\ge 0,$$
since $E_{\theta_0}\varphi^*(\mathbf X)\le E_{\theta_0}\varphi(\mathbf X)$.
If $k=\infty$, any test $\varphi^*$ of size 0 must vanish on the set $\{f_0(\mathbf x)>0\}$. We have
$$E_{\theta_1}\varphi(\mathbf X)-E_{\theta_1}\varphi^*(\mathbf X)=\int_{\{f_0(\mathbf x)=0\}}(1-\varphi^*(\mathbf x))f_1(\mathbf x)\,d\mathbf x\ge 0.$$
The proof for the discrete case requires only the usual change of integrals to sums throughout.
To prove (b) we need to restrict ourselves to the case where $0<\alpha\le 1$, since the MP size 0 test is given by (2). Let $\gamma(\mathbf x)=\gamma$, and let us compute the size of a test of form (1). We have
$$E_{\theta_0}\varphi(\mathbf X)=P_{\theta_0}\{f_1(\mathbf X)>kf_0(\mathbf X)\}+\gamma P_{\theta_0}\{f_1(\mathbf X)=kf_0(\mathbf X)\}=1-P_{\theta_0}\{f_1(\mathbf X)\le kf_0(\mathbf X)\}+\gamma P_{\theta_0}\{f_1(\mathbf X)=kf_0(\mathbf X)\}.$$
Since $P_{\theta_0}\{f_0(\mathbf X)=0\}=0$, we may rewrite $E_{\theta_0}\varphi(\mathbf X)$ as
$$E_{\theta_0}\varphi(\mathbf X)=1-P_{\theta_0}\left\{\frac{f_1(\mathbf X)}{f_0(\mathbf X)}\le k\right\}+\gamma P_{\theta_0}\left\{\frac{f_1(\mathbf X)}{f_0(\mathbf X)}=k\right\}.\qquad(3)$$
Given $0<\alpha\le 1$, we wish to find $k$ and $\gamma$ such that $E_{\theta_0}\varphi(\mathbf X)=\alpha$, that is,
$$P_{\theta_0}\left\{\frac{f_1(\mathbf X)}{f_0(\mathbf X)}\le k\right\}-\gamma P_{\theta_0}\left\{\frac{f_1(\mathbf X)}{f_0(\mathbf X)}=k\right\}=1-\alpha.\qquad(4)$$
Note that
$$P_{\theta_0}\left\{\frac{f_1(\mathbf X)}{f_0(\mathbf X)}\le k\right\}$$
is a DF in $k$, so that it is a nondecreasing and right continuous function of $k$. If there exists a $k_0$ such that
$$P_{\theta_0}\left\{\frac{f_1(\mathbf X)}{f_0(\mathbf X)}\le k_0\right\}=1-\alpha,$$
we choose $\gamma=0$ and $k=k_0$. Otherwise there exists a $k_0$ such that
$$P_{\theta_0}\left\{\frac{f_1(\mathbf X)}{f_0(\mathbf X)}<k_0\right\}\le 1-\alpha<P_{\theta_0}\left\{\frac{f_1(\mathbf X)}{f_0(\mathbf X)}\le k_0\right\},\qquad(5)$$
that is, there is a jump at $k_0$ (see Fig. 1). In this case we choose $k=k_0$ and
$$\gamma=\frac{P_{\theta_0}\{f_1(\mathbf X)/f_0(\mathbf X)\le k_0\}-(1-\alpha)}{P_{\theta_0}\{f_1(\mathbf X)/f_0(\mathbf X)=k_0\}}.\qquad(6)$$
Since $\gamma$ given by (6) satisfies (4), and $0\le\gamma\le 1$, the proof is complete.
Remark 1. It is possible to show (see Problem 6) that the test given by (1) or (2) is unique (except on a null set), that is, if $\varphi$ is an MP test of size $\alpha$ of $H_0$ against $H_1$, it must have form (1) or (2), except perhaps for a set $A$ with $P_{\theta_0}(A)=P_{\theta_1}(A)=0$.
Remark 2. An analysis of the proof of part (a) of Theorem 1 shows that test (1) is MP even if $f_1$ and $f_0$ are not necessarily densities.
Theorem 2. If a sufficient statistic $T$ exists for the family $\{f_\theta\colon\theta\in\Theta\}$, $\Theta=\{\theta_0,\theta_1\}$, the Neyman–Pearson MP test is a function of $T$.
Proof. The proof of this result is left as an exercise.
[Fig. 1. The DF $P_{\theta_0}\{f_1(\mathbf X)/f_0(\mathbf X)\le k\}$ as a function of $k$, showing a jump at $k_0$ that straddles the level $1-\alpha$.]
Remark 3. If the family $\{f_\theta\colon\theta\in\Theta\}$ admits a sufficient statistic, one can restrict attention to tests based on the sufficient statistic, that is, to tests that are functions of the sufficient statistic. If $\varphi$ is a test function and $T$ is a sufficient statistic, $E\{\varphi(\mathbf X)\mid T\}$ is itself a test function, $0\le E\{\varphi(\mathbf X)\mid T\}\le 1$, and
$$E_\theta\bigl[E\{\varphi(\mathbf X)\mid T\}\bigr]=E_\theta\varphi(\mathbf X),$$
so that $\varphi$ and $E\{\varphi\mid T\}$ have the same power function.
Example 1. Let $X$ be an RV with PMF under $H_0$ and $H_1$ given by

x        1      2      3      4      5      6
f0(x)    0.01   0.01   0.01   0.01   0.01   0.95
f1(x)    0.05   0.04   0.03   0.02   0.01   0.85

Then $\lambda(x)=f_1(x)/f_0(x)$ is given by

x        1      2      3      4      5      6
λ(x)     5      4      3      2      1      0.89

If $\alpha=0.03$, for example, then the Neyman–Pearson MP size 0.03 test rejects $H_0$ if $\lambda(X)\ge 3$, that is, if $X\le 3$, and has power
$$P_1(X\le 3)=0.05+0.04+0.03=0.12,$$
with $P(\text{Type II error})=1-0.12=0.88$.
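The construction used in Example 1 (order the sample points by $\lambda(x)$, add points to the rejection region until the size budget $\alpha$ is exhausted, and randomize on the boundary point if needed) is easy to carry out numerically. The following sketch is illustrative only; the function name and interface are our own, not the text's, and ties in $\lambda$ are handled by randomizing on whichever boundary point is reached first (still a test of form (1)).

```python
import numpy as np

def np_test_discrete(f0, f1, alpha):
    """Neyman-Pearson MP test on a finite sample space.

    f0, f1 : null / alternative probability vectors.
    Returns (reject, gamma, power): indicator of sure-rejection points,
    the randomization probability on the boundary point, and the power.
    """
    f0, f1 = np.asarray(f0, float), np.asarray(f1, float)
    lam = f1 / f0                      # likelihood ratio lambda(x)
    order = np.argsort(-lam)           # most extreme points first
    reject = np.zeros(len(f0), bool)
    used = 0.0
    for i in order:
        if used + f0[i] <= alpha + 1e-12:
            reject[i] = True           # reject with probability 1 at this point
            used += f0[i]
        else:
            gamma = (alpha - used) / f0[i]          # randomize on the boundary
            power = f1[reject].sum() + gamma * f1[i]
            return reject, gamma, power
    return reject, 0.0, f1[reject].sum()

# Example 1: alpha = 0.03 gives rejection region {1, 2, 3}, gamma = 0, power 0.12.
f0 = [0.01, 0.01, 0.01, 0.01, 0.01, 0.95]
f1 = [0.05, 0.04, 0.03, 0.02, 0.01, 0.85]
print(np_test_discrete(f0, f1, 0.03))
```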

Example 2. Let $X\sim N(0,1)$ under $H_0$ and $X\sim C(1,0)$ under $H_1$. To find an MP size $\alpha$ test of $H_0$ against $H_1$, write
$$\lambda(x)=\frac{f_1(x)}{f_0(x)}=\frac{(1/\pi)[1/(1+x^2)]}{(1/\sqrt{2\pi})e^{-x^2/2}}=\sqrt{\frac{2}{\pi}}\,\frac{e^{x^2/2}}{1+x^2}.$$
Figure 2 gives a graph of $\lambda(x)$; $\lambda$ has a maximum at $x=0$ and two minima at $x=\pm 1$. Note that $\lambda(0)=0.7979$ and $\lambda(\pm 1)=0.6578$, so for $k\in(0.6578,0.7979)$ the line $\lambda(x)=k$ intersects the graph at four points and the critical region is of the form $|X|\le k_1$ or $|X|\ge k_2$, where $k_1$ and $k_2$ are solutions of $\lambda(x)=k$. For $k=0.7979$, the critical region is of the form $|X|\ge k_0$, where $k_0$ is the positive solution of $e^{k_0^2/2}=1+k_0^2$, so that $k_0\approx 1.59$ with $\alpha=0.1118$. For $k<0.6578$, $\alpha=1$, and for $k=0.6578$ the critical region is $|X|\ge 1$ with $\alpha=0.3413$. For the traditional level $\alpha=0.05$, the critical region is of the form $|X|\ge 1.96$.
[Fig. 2. Graph of $\lambda(x)=(2/\pi)^{1/2}e^{x^2/2}/(1+x^2)$, with $\lambda(0)=0.7979$ and $\lambda(\pm 1)=0.6578$.]
Example 3. Let $X_1,X_2,\ldots,X_n$ be iid $b(1,p)$ RVs, and let $H_0\colon p=p_0$, $H_1\colon p=p_1$, $p_1>p_0$. The MP size $\alpha$ test of $H_0$ against $H_1$ is of the form
$$\varphi(x_1,x_2,\ldots,x_n)=\begin{cases}1, & \lambda(\mathbf x)=\dfrac{p_1^{\sum x_i}(1-p_1)^{n-\sum x_i}}{p_0^{\sum x_i}(1-p_0)^{n-\sum x_i}}>k,\\[1ex] \gamma, & \lambda(\mathbf x)=k,\\ 0, & \lambda(\mathbf x)<k,\end{cases}$$
where $k$ and $\gamma$ are determined from $E_{p_0}\varphi(\mathbf X)=\alpha$. Now
$$\lambda(\mathbf x)=\left(\frac{p_1}{p_0}\right)^{\sum x_i}\left(\frac{1-p_1}{1-p_0}\right)^{n-\sum x_i},$$
and since $p_1>p_0$, $\lambda(\mathbf x)$ is an increasing function of $\sum x_i$. It follows that $\lambda(\mathbf x)>k$ if and only if $\sum x_i>k_1$, where $k_1$ is some constant. Thus the MP size $\alpha$ test is of the form
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }\sum x_i>k_1,\\ \gamma & \text{if }\sum x_i=k_1,\\ 0 & \text{otherwise.}\end{cases}$$
Also, $k_1$ and $\gamma$ are determined from
$$\alpha=E_{p_0}\varphi(\mathbf X)=P_{p_0}\left\{\sum_{1}^{n}X_i>k_1\right\}+\gamma P_{p_0}\left\{\sum_{1}^{n}X_i=k_1\right\}=\sum_{r=k_1+1}^{n}\binom{n}{r}p_0^r(1-p_0)^{n-r}+\gamma\binom{n}{k_1}p_0^{k_1}(1-p_0)^{n-k_1}.$$
Note that the MP size $\alpha$ test is independent of $p_1$ as long as $p_1>p_0$, that is, it remains an MP size $\alpha$ test against any $p>p_0$ and is therefore a UMP test of $p=p_0$ against $p>p_0$.
In particular, let $n=5$, $p_0=\tfrac12$, $p_1=\tfrac34$, and $\alpha=0.05$. Then the MP test is given by
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }\sum x_i>k,\\ \gamma & \text{if }\sum x_i=k,\\ 0 & \text{if }\sum x_i<k,\end{cases}$$
where $k$ and $\gamma$ are determined from
$$0.05=\alpha=\sum_{r=k+1}^{5}\binom{5}{r}\left(\frac12\right)^5+\gamma\binom{5}{k}\left(\frac12\right)^5.$$
It follows that $k=4$ and $\gamma=0.12$. Thus the MP size $\alpha=0.05$ test is to reject $p=\tfrac12$ in favor of $p=\tfrac34$ if $\sum_{1}^{n}X_i=5$ and to reject $p=\tfrac12$ with probability 0.12 if $\sum_{1}^{n}X_i=4$.
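Under the stated values ($n=5$, $p_0=1/2$, $\alpha=0.05$), the cutoff $k$ and randomization probability $\gamma$ can be checked numerically from the binomial PMF. A minimal sketch (the function name is ours, not the text's):

```python
from scipy.stats import binom

def randomized_binomial_test(n, p0, alpha):
    """Find (k, gamma) with P_{p0}(S > k) + gamma * P_{p0}(S = k) = alpha,
    where S = sum of the X_i ~ b(n, p0)."""
    # smallest integer k with P(S > k) <= alpha
    k = min(j for j in range(n + 1) if binom.sf(j, n, p0) <= alpha)
    gamma = (alpha - binom.sf(k, n, p0)) / binom.pmf(k, n, p0)
    return k, gamma

k, gamma = randomized_binomial_test(5, 0.5, 0.05)
print(k, round(gamma, 3))                       # 4, 0.12
# Power against p1 = 3/4:
p1 = 0.75
print(binom.sf(k, 5, p1) + gamma * binom.pmf(k, 5, p1))
```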
It is simply a matter of reversing inequalities to see that the MP size $\alpha$ test of $H_0\colon p=p_0$ against $H_1\colon p=p_1$ ($p_1<p_0$) is given by
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }\sum x_i<k,\\ \gamma & \text{if }\sum x_i=k,\\ 0 & \text{if }\sum x_i>k,\end{cases}$$
where $\gamma$ and $k$ are determined from $E_{p_0}\varphi(\mathbf X)=\alpha$.
We note that $T(\mathbf X)=\sum X_i$ is minimal sufficient for $p$ so that, in view of Remark 3, we could have considered tests based only on $T$. Since $T\sim b(n,p)$,
$$\lambda(t)=\frac{f_1(t)}{f_0(t)}=\frac{\binom{n}{t}p_1^t(1-p_1)^{n-t}}{\binom{n}{t}p_0^t(1-p_0)^{n-t}}=\left(\frac{p_1}{p_0}\right)^t\left(\frac{1-p_1}{1-p_0}\right)^{n-t},$$
so that an MP test is of the same form as above but the computation is somewhat simpler.
We remark that in both cases ($p_1>p_0$, $p_1<p_0$) the MP test is quite intuitive. We would tend to accept the larger probability if a larger number of “successes” showed up, and the smaller probability if a smaller number of “successes” were observed. See, however, Example 2.
Example 4. Let $X_1,X_2,\ldots,X_n$ be iid $N(\mu,\sigma^2)$ RVs where both $\mu$ and $\sigma^2$ are unknown. We wish to test the null hypothesis $H_0\colon\mu=\mu_0,\ \sigma^2=\sigma_0^2$ against the alternative $H_1\colon\mu=\mu_1,\ \sigma^2=\sigma_0^2$. The fundamental lemma leads to the following MP test:
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }\lambda(\mathbf x)>k,\\ 0 & \text{if }\lambda(\mathbf x)<k,\end{cases}$$
where
$$\lambda(\mathbf x)=\frac{(1/\sigma_0\sqrt{2\pi})^n\exp\{-\sum(x_i-\mu_1)^2/2\sigma_0^2\}}{(1/\sigma_0\sqrt{2\pi})^n\exp\{-\sum(x_i-\mu_0)^2/2\sigma_0^2\}},$$
and $k$ is determined from $E_{\mu_0,\sigma_0}\varphi(\mathbf X)=\alpha$. We have
$$\lambda(\mathbf x)=\exp\left\{\sum x_i\left(\frac{\mu_1}{\sigma_0^2}-\frac{\mu_0}{\sigma_0^2}\right)+n\left(\frac{\mu_0^2}{2\sigma_0^2}-\frac{\mu_1^2}{2\sigma_0^2}\right)\right\}.$$
If $\mu_1>\mu_0$, then
$$\lambda(\mathbf x)>k\quad\text{if and only if}\quad\sum_{i=1}^{n}x_i>k',$$
where $k'$ is determined from
$$\alpha=P_{\mu_0,\sigma_0}\left\{\sum_{i=1}^{n}X_i>k'\right\}=P\left\{\frac{\sum X_i-n\mu_0}{\sqrt n\,\sigma_0}>\frac{k'-n\mu_0}{\sqrt n\,\sigma_0}\right\},$$
giving $k'=z_\alpha\sqrt n\,\sigma_0+n\mu_0$. The case $\mu_1<\mu_0$ is treated similarly. If $\sigma_0$ is known, the test determined above is independent of $\mu_1$ as long as $\mu_1>\mu_0$, and it follows that the test is UMP against $H_1'\colon\mu>\mu_0,\ \sigma^2=\sigma_0^2$. If, however, $\sigma_0$ is not known, that is, the null hypothesis is a composite hypothesis $H_0''\colon\mu=\mu_0,\ \sigma^2>0$ to be tested against the alternatives $H_1''\colon\mu=\mu_1,\ \sigma^2>0$ ($\mu_1>\mu_0$), then the MP test determined above depends on $\sigma^2$. In other words, an MP test against the alternative $(\mu_1,\sigma_0^2)$ will not be MP against $(\mu_1,\sigma_1^2)$, where $\sigma_1^2\ne\sigma_0^2$.
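As a quick numerical check on the one-sided normal test just derived, the critical value $k'=z_\alpha\sqrt n\,\sigma_0+n\mu_0$ and the resulting power are easy to compute. A sketch under assumed values $n=25$, $\mu_0=0$, $\sigma_0=1$, $\alpha=0.05$ (these numbers are ours, not the text's):

```python
import numpy as np
from scipy.stats import norm

n, mu0, sigma0, alpha = 25, 0.0, 1.0, 0.05
z_alpha = norm.ppf(1 - alpha)                        # upper alpha quantile
k_prime = z_alpha * np.sqrt(n) * sigma0 + n * mu0    # reject H0 when sum(x) > k'

def power(mu):
    # P_mu(sum X_i > k'), with sum X_i ~ N(n*mu, n*sigma0^2)
    return norm.sf((k_prime - n * mu) / (np.sqrt(n) * sigma0))

print(round(power(mu0), 3))   # 0.05, the size of the test
print(round(power(0.5), 3))   # power at the alternative mu1 = 0.5
```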

PROBLEMS 9.3
1. A sample of size 1 is taken from the PDF
$$f_\theta(x)=\begin{cases}\dfrac{2}{\theta^2}(\theta-x) & \text{if }0<x<\theta,\\ 0 & \text{otherwise.}\end{cases}$$
Find an MP test of $H_0\colon\theta=\theta_0$ against $H_1\colon\theta=\theta_1$ ($\theta_1<\theta_0$).
2. Find the Neyman–Pearson size $\alpha$ test of $H_0\colon\theta=\theta_0$ against $H_1\colon\theta=\theta_1$ ($\theta_1<\theta_0$) based on a sample of size 1 from the PDF
$$f_\theta(x)=2\theta x+2(1-\theta)(1-x),\quad 0<x<1,\ \theta\in[0,1].$$
3. Find the Neyman–Pearson size $\alpha$ test of $H_0\colon\beta=1$ against $H_1\colon\beta=\beta_1\ (>1)$ based on a sample of size 1 from
$$f(x;\beta)=\begin{cases}\beta x^{\beta-1}, & 0<x<1,\\ 0, & \text{otherwise.}\end{cases}$$
4. Find an MP size $\alpha$ test of $H_0\colon X\sim f_0(x)$, where $f_0(x)=(2\pi)^{-1/2}e^{-x^2/2}$, $-\infty<x<\infty$, against $H_1\colon X\sim f_1(x)$, where $f_1(x)=2^{-1}e^{-|x|}$, $-\infty<x<\infty$, based on a sample of size 1.
5. For the PDF $f_\theta(x)=e^{-(x-\theta)}$, $x\ge\theta$, find an MP size $\alpha$ test of $\theta=\theta_0$ against $\theta=\theta_1$ ($>\theta_0$), based on a sample of size $n$.
6. If $\varphi^*$ is an MP size $\alpha$ test of $H_0\colon X\sim f_0(x)$ against $H_1\colon X\sim f_1(x)$, show that it has to be either of form (1) or form (2) (except for a set of $x$ that has probability 0 under $H_0$ and $H_1$).
7. Let $\varphi^*$ be an MP size $\alpha$ ($0<\alpha\le 1$) test of $H_0$ against $H_1$, and let $k(\alpha)$ denote the value of $k$ in (1). Show that if $\alpha_1<\alpha_2$, then $k(\alpha_2)\le k(\alpha_1)$.
8. For the family of Neyman–Pearson tests show that the larger the $\alpha$, the smaller the $\beta\ (=P[\text{Type II error}])$.
9. Let $1-\beta$ be the power of an MP size $\alpha$ test, where $0<\alpha<1$. Show that $\alpha<1-\beta$ unless $P_{\theta_0}=P_{\theta_1}$.
10. Let $\alpha$ be a real number, $0<\alpha<1$, and $\varphi^*$ be an MP size $\alpha$ test of $H_0$ against $H_1$. Also, let $\beta=E_{H_1}\varphi^*(X)<1$. Show that $1-\varphi^*$ is an MP test for testing $H_1$ against $H_0$ at level $1-\beta$.
11. Let $X_1,X_2,\ldots,X_n$ be a random sample from the PDF
$$f_\theta(x)=\frac{\theta}{x^2}\quad\text{if }0<\theta\le x<\infty.$$
Find an MP test of $\theta=\theta_0$ against $\theta=\theta_1$ ($\ne\theta_0$).
12. Let $X$ be an observation in $(0,1)$. Find an MP size $\alpha$ test of $H_0\colon X\sim f(x)=4x$ if $0<x<\tfrac12$, and $=4-4x$ if $\tfrac12\le x<1$, against $H_1\colon X\sim f(x)=1$ if $0<x<1$. Find the power of your test.
13. In each of the following cases of simple versus simple hypotheses $H_0\colon X\sim f_0$, $H_1\colon X\sim f_1$, draw a graph of the ratio $\lambda(x)=f_1(x)/f_0(x)$ and find the form of the Neyman–Pearson test:
(a) $f_0(x)=(1/2)\exp\{-|x+1|\}$; $f_1(x)=(1/2)\exp\{-|x-1|\}$.
(b) $f_0(x)=(1/2)\exp\{-|x|\}$; $f_1(x)=1/[\pi(1+x^2)]$.
(c) $f_0(x)=(1/\pi)\{1+(1+x)^2\}^{-1}$; $f_1(x)=(1/\pi)\{1+(1-x)^2\}^{-1}$.
14. Let $X_1,X_2,\ldots,X_n$ be a random sample with common PDF
$$f_\theta(x)=\frac{1}{2\theta}\exp\{-|x|/\theta\},\quad x\in\mathbb R,\ \theta>0.$$
Find a size $\alpha$ MP test for testing $H_0\colon\theta=\theta_0$ versus $H_1\colon\theta=\theta_1$ ($>\theta_0$).
15. Let $X\sim f_j$, $j=0,1$, where

x        1     2     3     4     5
f0(x)    1/5   1/5   1/5   1/5   1/5
f1(x)    1/6   1/4   1/6   1/4   1/6

(a) Find the form of the MP test of its size.
(b) Find the size and the power of your test for various values of the cutoff point.
(c) Consider now a random sample of size $n$ from $f_0$ under $H_0$ or $f_1$ under $H_1$. Find the form of the MP test of its size.
9.4 FAMILIES WITH MONOTONE LIKELIHOOD RATIO
In this section we consider the problem of testing one-sided hypotheses on a single real-valued parameter. Let $\{f_\theta,\theta\in\Theta\}$ be a family of PDFs (PMFs), $\Theta\subseteq\mathbb R$, and suppose that we wish to test $H_0\colon\theta\le\theta_0$ against the alternatives $H_1\colon\theta>\theta_0$, or its dual, $H_0'\colon\theta\ge\theta_0$ against $H_1'\colon\theta<\theta_0$. In general, it is not possible to find a UMP test for this problem. The MP test of $H_0\colon\theta\le\theta_0$, say, against the alternative $\theta=\theta_1\ (>\theta_0)$ depends on $\theta_1$ and cannot be UMP. Here we consider a special class of distributions that is large enough to include the one-parameter exponential family, for which a UMP test of a one-sided hypothesis exists.
Definition 1. Let $\{f_\theta,\theta\in\Theta\}$ be a family of PDFs (PMFs), $\Theta\subseteq\mathbb R$. We say that $\{f_\theta\}$ has a monotone likelihood ratio (MLR) in statistic $T(\mathbf x)$ if for $\theta_1<\theta_2$, whenever $f_{\theta_1}$ and $f_{\theta_2}$ are distinct, the ratio $f_{\theta_2}(\mathbf x)/f_{\theta_1}(\mathbf x)$ is a nondecreasing function of $T(\mathbf x)$ on the set of values $\mathbf x$ for which at least one of $f_{\theta_1}$ and $f_{\theta_2}$ is $>0$.
It is also possible to define families of densities with nonincreasing MLR in $T(\mathbf x)$, but such families can be treated by symmetry.
Example 1. Let $X_1,X_2,\ldots,X_n\sim U[0,\theta]$, $\theta>0$. The joint PDF of $X_1,\ldots,X_n$ is
$$f_\theta(\mathbf x)=\begin{cases}\dfrac{1}{\theta^n}, & 0\le\max x_i\le\theta,\\ 0, & \text{otherwise.}\end{cases}$$
Let $\theta_2>\theta_1$ and consider the ratio
$$\frac{f_{\theta_2}(\mathbf x)}{f_{\theta_1}(\mathbf x)}=\frac{(1/\theta_2^n)\,I_{[\max x_i\le\theta_2]}}{(1/\theta_1^n)\,I_{[\max x_i\le\theta_1]}}=\left(\frac{\theta_1}{\theta_2}\right)^n\frac{I_{[\max x_i\le\theta_2]}}{I_{[\max x_i\le\theta_1]}}.$$
Let
$$R(\mathbf x)=\frac{I_{[\max x_i\le\theta_2]}}{I_{[\max x_i\le\theta_1]}}=\begin{cases}1, & \max x_i\in[0,\theta_1],\\ \infty, & \max x_i\in(\theta_1,\theta_2].\end{cases}$$
Define $R(\mathbf x)=\infty$ if $\max x_i>\theta_2$. It follows that $f_{\theta_2}/f_{\theta_1}$ is a nondecreasing function of $\max_{1\le i\le n}x_i$, and the family of uniform densities on $[0,\theta]$ has an MLR in $\max_{1\le i\le n}x_i$.
Theorem 1. The one-parameter exponential family
$$f_\theta(x)=\exp\{Q(\theta)T(x)+S(x)+D(\theta)\},\qquad(1)$$
where $Q(\theta)$ is nondecreasing, has an MLR in $T(x)$.
Proof. The proof is left as an exercise.
Remark 1. The nondecreasingness of $Q(\theta)$ can be obtained by a reparametrization, putting $\vartheta=Q(\theta)$, if necessary.
Theorem 1 includes the normal, binomial, Poisson, gamma (one parameter fixed), beta (one parameter fixed), and so on. In Example 1 we have already seen that $U[0,\theta]$, which is not an exponential family, has an MLR.
Example 2. Let $X\sim C(1,\theta)$. Then
$$\frac{f_{\theta_2}(x)}{f_{\theta_1}(x)}=\frac{1+(x-\theta_1)^2}{1+(x-\theta_2)^2}\to 1\quad\text{as }x\to\pm\infty,$$
and we see that $C(1,\theta)$ does not have an MLR.
Theorem 2. Let $X\sim f_\theta$, $\theta\in\Theta$, where $\{f_\theta\}$ has an MLR in $T(\mathbf x)$. For testing $H_0\colon\theta\le\theta_0$ against $H_1\colon\theta>\theta_0$, $\theta_0\in\Theta$, any test of the form
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }T(\mathbf x)>t_0,\\ \gamma & \text{if }T(\mathbf x)=t_0,\\ 0 & \text{if }T(\mathbf x)<t_0,\end{cases}\qquad(2)$$
has a nondecreasing power function and is UMP of its size $E_{\theta_0}\varphi(\mathbf X)=\alpha$ (provided that the size is not 0).
Moreover, for every $0\le\alpha\le 1$ and every $\theta_0\in\Theta$, there exist a $t_0$, $-\infty\le t_0\le\infty$, and $0\le\gamma\le 1$ such that the test described in (2) is the UMP size $\alpha$ test of $H_0$ against $H_1$.
Proof. Let $\theta_1,\theta_2\in\Theta$, $\theta_1<\theta_2$. By the fundamental lemma any test of the form
$$\varphi(\mathbf x)=\begin{cases}1, & \lambda(\mathbf x)>k,\\ \gamma(\mathbf x), & \lambda(\mathbf x)=k,\\ 0, & \lambda(\mathbf x)<k,\end{cases}\qquad(3)$$
where $\lambda(\mathbf x)=f_{\theta_2}(\mathbf x)/f_{\theta_1}(\mathbf x)$, is MP of its size for testing $\theta=\theta_1$ against $\theta=\theta_2$, provided that $0\le k<\infty$; and if $k=\infty$, the test
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }f_{\theta_1}(\mathbf x)=0,\\ 0 & \text{if }f_{\theta_1}(\mathbf x)>0,\end{cases}\qquad(4)$$
is MP of size 0. Since $f_\theta$ has an MLR in $T$, it follows that any test of form (2) is also of form (3), provided that $E_{\theta_1}\varphi(\mathbf X)>0$, that is, provided that its size is $>0$. The trivial test $\varphi'(\mathbf x)\equiv\alpha$ has size $\alpha$ and power $\alpha$, so that the power of any test (2) is at least $\alpha$, that is,
$$E_{\theta_2}\varphi(\mathbf X)\ge E_{\theta_2}\varphi'(\mathbf X)=\alpha=E_{\theta_1}\varphi(\mathbf X).$$
It follows that, if $\theta_1<\theta_2$ and $E_{\theta_1}\varphi(\mathbf X)>0$, then $E_{\theta_1}\varphi(\mathbf X)\le E_{\theta_2}\varphi(\mathbf X)$, as asserted.
Let $\theta_1=\theta_0$ and $\theta_2>\theta_0$, as above. We know that (2) is an MP test of its size $E_{\theta_0}\varphi(\mathbf X)$ for testing $\theta=\theta_0$ against $\theta=\theta_2$ ($\theta_2>\theta_0$), provided that $E_{\theta_0}\varphi(\mathbf X)>0$. Since the power function of $\varphi$ is nondecreasing,
$$E_\theta\varphi(\mathbf X)\le E_{\theta_0}\varphi(\mathbf X)=\alpha_0\quad\text{for all }\theta\le\theta_0.\qquad(5)$$
Since, however, $\varphi$ does not depend on $\theta_2$ (it depends only on the constants $k$ and $\gamma$), it follows that $\varphi$ is the UMP size $\alpha_0$ test for testing $\theta=\theta_0$ against $\theta>\theta_0$. Thus $\varphi$ is UMP among the class of tests $\varphi^*$ for which
$$E_{\theta_0}\varphi^*(\mathbf X)\le E_{\theta_0}\varphi(\mathbf X)=\alpha_0.\qquad(6)$$
Now the class of tests satisfying (5) is contained in the class of tests satisfying (6) [there are more restrictions in (5)]. It follows that $\varphi$, which is UMP in the larger class satisfying (6), must also be UMP in the smaller class satisfying (5). Thus, provided that $\alpha_0>0$, $\varphi$ is the UMP size $\alpha_0$ test for $\theta\le\theta_0$ against $\theta>\theta_0$.
We ask the reader to complete the proof of the final part of the theorem, using the fundamental lemma.

Remark 2. By interchanging inequalities throughout in Theorem 2, we see that this theorem also provides a solution of the dual problem $H_0'\colon\theta\ge\theta_0$ against $H_1'\colon\theta<\theta_0$.
Example 3. Let $X$ have the hypergeometric PMF
$$P_M\{X=x\}=\frac{\binom{M}{x}\binom{N-M}{n-x}}{\binom{N}{n}},\quad x=0,1,2,\ldots,M.$$
Since
$$\frac{P_{M+1}\{X=x\}}{P_M\{X=x\}}=\frac{M+1}{N-M}\cdot\frac{N-M-n+x}{M+1-x},$$
we see that $\{P_M\}$ has an MLR in $x$ ($P_{M_2}/P_{M_1}$, where $M_2>M_1$, is just a product of such ratios). It follows that there exists a UMP test of $H_0\colon M\le M_0$ against $H_1\colon M>M_0$, which rejects $H_0$ when $X$ is too large; that is, the UMP size $\alpha$ test is given by
$$\varphi(x)=\begin{cases}1, & x>k,\\ \gamma, & x=k,\\ 0, & x<k,\end{cases}$$
where the (integer) $k$ and $\gamma$ are determined from $E_{M_0}\varphi(X)=\alpha$.
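For concrete numbers, $k$ and $\gamma$ can be read off the hypergeometric distribution under $M_0$. A small illustrative sketch; the values $N=50$, $n=10$, $M_0=20$, $\alpha=0.05$ and the alternative $M=30$ are assumptions of ours, not the text's:

```python
from scipy.stats import hypergeom

N, n, M0, alpha = 50, 10, 20, 0.05
# scipy's convention: hypergeom(total population, number of "white", sample size)
dist = hypergeom(N, M0, n)
# smallest integer k with P_{M0}(X > k) <= alpha
k = min(j for j in range(n + 1) if dist.sf(j) <= alpha)
gamma = (alpha - dist.sf(k)) / dist.pmf(k)
print(k, round(gamma, 3))

# Power at M = 30: P_M(X > k) + gamma * P_M(X = k)
alt = hypergeom(N, 30, n)
print(round(alt.sf(k) + gamma * alt.pmf(k), 3))
```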
For the one-parameter exponential family UMP tests exist also for some two-sided hypotheses of the form
$$H_0\colon\theta\le\theta_1\ \text{or}\ \theta\ge\theta_2\quad(\theta_1<\theta_2).\qquad(7)$$
We state the following result without proof.
Theorem 3. For the one-parameter exponential family (1), there exists a UMP test of the hypothesis $H_0\colon\theta\le\theta_1$ or $\theta\ge\theta_2$ ($\theta_1<\theta_2$) against $H_1\colon\theta_1<\theta<\theta_2$ that is of the form
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }c_1<T(\mathbf x)<c_2,\\ \gamma_i & \text{if }T(\mathbf x)=c_i,\ i=1,2\ (c_1<c_2),\\ 0 & \text{if }T(\mathbf x)<c_1\ \text{or}\ >c_2,\end{cases}\qquad(8)$$
where the $c$'s and the $\gamma$'s are given by
$$E_{\theta_1}\varphi(\mathbf X)=E_{\theta_2}\varphi(\mathbf X)=\alpha.\qquad(9)$$
See Lehmann [64, pp. 101–103] for the proof.

Example 4. Let $X_1,X_2,\ldots,X_n$ be iid $N(\mu,1)$ RVs. To test $H_0\colon\mu\le\mu_0$ or $\mu\ge\mu_1$ ($\mu_1>\mu_0$) against $H_1\colon\mu_0<\mu<\mu_1$, the UMP test is given by
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }c_1<\sum_{1}^{n}x_i<c_2,\\ \gamma_i & \text{if }\sum x_i=c_1\ \text{or}\ c_2,\\ 0 & \text{if }\sum x_i<c_1\ \text{or}\ >c_2,\end{cases}$$
where we determine $c_1,c_2$ from
$$\alpha=P_{\mu_0}\Bigl\{c_1<\textstyle\sum X_i<c_2\Bigr\}=P_{\mu_1}\Bigl\{c_1<\textstyle\sum X_i<c_2\Bigr\}$$
and $\gamma_1=\gamma_2=0$. Thus
$$\alpha=P\left\{\frac{c_1-n\mu_0}{\sqrt n}<\frac{\sum X_i-n\mu_0}{\sqrt n}<\frac{c_2-n\mu_0}{\sqrt n}\right\}=P\left\{\frac{c_1-n\mu_1}{\sqrt n}<\frac{\sum X_i-n\mu_1}{\sqrt n}<\frac{c_2-n\mu_1}{\sqrt n}\right\},$$
that is,
$$\alpha=P\left\{\frac{c_1-n\mu_0}{\sqrt n}<Z<\frac{c_2-n\mu_0}{\sqrt n}\right\}=P\left\{\frac{c_1-n\mu_1}{\sqrt n}<Z<\frac{c_2-n\mu_1}{\sqrt n}\right\},$$
where $Z$ is $N(0,1)$. Given $\alpha$, $n$, $\mu_0$, and $\mu_1$, we can solve for $c_1$ and $c_2$ from the simultaneous equations
$$\Phi\left(\frac{c_2-n\mu_0}{\sqrt n}\right)-\Phi\left(\frac{c_1-n\mu_0}{\sqrt n}\right)=\alpha,\qquad\Phi\left(\frac{c_2-n\mu_1}{\sqrt n}\right)-\Phi\left(\frac{c_1-n\mu_1}{\sqrt n}\right)=\alpha,$$
where $\Phi$ is the DF of $Z$.
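The two simultaneous $\Phi$ equations can be solved numerically with a standard root finder on the pair $(c_1,c_2)$. A minimal sketch under assumed values $n=25$, $\mu_0=0$, $\mu_1=0.5$, $\alpha=0.05$ (our choices, not the text's); convergence depends on a reasonable starting guess:

```python
import numpy as np
from scipy.stats import norm
from scipy.optimize import fsolve

n, mu0, mu1, alpha = 25, 0.0, 0.5, 0.05
rt_n = np.sqrt(n)

def equations(c):
    c1, c2 = c
    eq0 = norm.cdf((c2 - n * mu0) / rt_n) - norm.cdf((c1 - n * mu0) / rt_n) - alpha
    eq1 = norm.cdf((c2 - n * mu1) / rt_n) - norm.cdf((c1 - n * mu1) / rt_n) - alpha
    return [eq0, eq1]

# initial guess: an interval symmetric about the midpoint n*(mu0 + mu1)/2
mid = n * (mu0 + mu1) / 2
c1, c2 = fsolve(equations, [mid - 1.0, mid + 1.0])
print(round(c1, 3), round(c2, 3))
print(equations([c1, c2]))   # both residuals should be ~ 0
```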
Remark 3. We caution the reader that UMP tests for testing $H_0\colon\theta_1\le\theta\le\theta_2$ and $H_0'\colon\theta=\theta_0$ for the one-parameter exponential family do not exist. An example will suffice.
Example 5. Let $X_1,X_2,\ldots,X_n$ be a sample from $N(0,\sigma^2)$. Since the family of joint PDFs of $\mathbf X=(X_1,\ldots,X_n)$ has an MLR in $T(\mathbf X)=\sum_{1}^{n}X_i^2$, it follows that UMP tests exist for the one-sided hypotheses $\sigma\ge\sigma_0$ and $\sigma\le\sigma_0$.
Consider now the null hypothesis $H_0\colon\sigma=\sigma_0$ against the alternative $H_1\colon\sigma\ne\sigma_0$. We will show that a UMP test of $H_0$ does not exist. For testing $\sigma=\sigma_0$ against $\sigma>\sigma_0$, a test of the form
$$\varphi_1(\mathbf x)=\begin{cases}1, & \sum x_i^2>c_1,\\ 0, & \text{otherwise}\end{cases}$$
is UMP, and for testing $\sigma=\sigma_0$ against $\sigma<\sigma_0$, a test of the form
$$\varphi_2(\mathbf x)=\begin{cases}1, & \sum x_i^2<c_2,\\ 0, & \text{otherwise}\end{cases}$$
is UMP. If the size is chosen as $\alpha$, then $c_1=\sigma_0^2\chi^2_{n,\alpha}$ and $c_2=\sigma_0^2\chi^2_{n,1-\alpha}$. Clearly, neither $\varphi_1$ nor $\varphi_2$ is UMP for $H_0$ against $H_1\colon\sigma\ne\sigma_0$. The power of any test of $H_0$ for values $\sigma>\sigma_0$ cannot exceed that of $\varphi_1$, and for values of $\sigma<\sigma_0$ it cannot exceed the power of test $\varphi_2$. Hence no test of $H_0$ can be UMP (see Fig. 1).
[Fig. 1. Power functions of the chi-square tests of $H_0\colon\sigma=\sigma_0$ against the one-sided alternatives $\sigma>1$, $\sigma<1$ and the two-sided alternative $\sigma\ne 1$.]
PROBLEMS 9.4
1. For the following families of PMFs (PDFs) $f_\theta(x)$, $\theta\in\Theta\subseteq\mathbb R$, find a UMP size $\alpha$ test of $H_0\colon\theta\le\theta_0$ against $H_1\colon\theta>\theta_0$, based on a sample of $n$ observations.
(a) $f_\theta(x)=\theta^x(1-\theta)^{1-x}$, $x=0,1$; $0<\theta<1$.
(b) $f_\theta(x)=(1/\sqrt{2\pi})\exp\{-(x-\theta)^2/2\}$, $-\infty<x<\infty$, $-\infty<\theta<\infty$.
(c) $f_\theta(x)=e^{-\theta}\theta^x/x!$, $x=0,1,2,\ldots$; $\theta>0$.
(d) $f_\theta(x)=(1/\theta)e^{-x/\theta}$, $x>0$, $\theta>0$.
(e) $f_\theta(x)=[1/\Gamma(\theta)]x^{\theta-1}e^{-x}$, $x>0$, $\theta>0$.
(f) $f_\theta(x)=\theta x^{\theta-1}$, $0<x<1$, $\theta>0$.

2. Let $X_1,X_2,\ldots,X_n$ be a sample of size $n$ from the PMF
$$P_N(x)=\frac1N,\quad x=1,2,\ldots,N;\ N\in\{1,2,\ldots\}.$$
(a) Show that the test
$$\varphi(x_1,x_2,\ldots,x_n)=\begin{cases}1 & \text{if }\max(x_1,x_2,\ldots,x_n)>N_0,\\ \alpha & \text{if }\max(x_1,x_2,\ldots,x_n)\le N_0\end{cases}$$
is UMP size $\alpha$ for testing $H_0\colon N\le N_0$ against $H_1\colon N>N_0$.
(b) Show that
$$\varphi(x_1,x_2,\ldots,x_n)=\begin{cases}1 & \text{if }\max(x_1,x_2,\ldots,x_n)>N_0\ \text{or}\ \max(x_1,x_2,\ldots,x_n)\le\alpha^{1/n}N_0,\\ 0 & \text{otherwise,}\end{cases}$$
is a UMP size $\alpha$ test of $H_0'\colon N=N_0$ against $H_1'\colon N\ne N_0$.
3. Let $X_1,X_2,\ldots,X_n$ be a sample of size $n$ from $U(0,\theta)$, $\theta>0$. Show that the test
$$\varphi_1(x_1,x_2,\ldots,x_n)=\begin{cases}1 & \text{if }\max(x_1,\ldots,x_n)>\theta_0,\\ \alpha & \text{if }\max(x_1,x_2,\ldots,x_n)\le\theta_0\end{cases}$$
is UMP size $\alpha$ for testing $H_0\colon\theta\le\theta_0$ against $H_1\colon\theta>\theta_0$, and that the test
$$\varphi_2(x_1,x_2,\ldots,x_n)=\begin{cases}1 & \text{if }\max(x_1,\ldots,x_n)>\theta_0\ \text{or}\ \max(x_1,x_2,\ldots,x_n)\le\theta_0\alpha^{1/n},\\ 0 & \text{otherwise}\end{cases}$$
is UMP size $\alpha$ for $H_0'\colon\theta=\theta_0$ against $H_1'\colon\theta\ne\theta_0$.
4. Does the Laplace family of PDFs
$$f_\theta(x)=\frac12\exp\{-|x-\theta|\},\quad-\infty<x<\infty,\ \theta\in\mathbb R,$$
possess an MLR?
5. Let $X$ have the logistic distribution with PDF
$$f_\theta(x)=e^{-(x-\theta)}\{1+e^{-(x-\theta)}\}^{-2},\quad x\in\mathbb R.$$
Does $\{f_\theta\}$ belong to the exponential family? Does $\{f_\theta\}$ have MLR?
6. (a) Let $f_\theta$ be the PDF of a $N(\theta,\theta)$ RV. Does $\{f_\theta\}$ have MLR?
(b) Do the same as in (a) if $X\sim N(\theta,\theta^2)$.

9.5 UNBIASED AND INVARIANT TESTS
We have seen that, if we restrict ourselves to the class $\Phi_\alpha$ of all size $\alpha$ tests, there do not exist UMP tests for many important hypotheses. This suggests that we reduce the class of tests under consideration by imposing certain restrictions.
Definition 1. A size $\alpha$ test $\varphi$ of $H_0\colon\theta\in\Theta_0$ against the alternatives $H_1\colon\theta\in\Theta_1$ is said to be unbiased if
$$E_\theta\varphi(\mathbf X)\ge\alpha\quad\text{for all }\theta\in\Theta_1.\qquad(1)$$
It follows that a test $\varphi$ is unbiased if and only if its power function $\beta_\varphi(\theta)$ satisfies
$$\beta_\varphi(\theta)\le\alpha\quad\text{for }\theta\in\Theta_0\qquad(2)$$
and
$$\beta_\varphi(\theta)\ge\alpha\quad\text{for }\theta\in\Theta_1.\qquad(3)$$
This seems to be a reasonable requirement to place on a test. An unbiased test rejects a false $H_0$ more often than a true $H_0$.
Definition 2. Let $U_\alpha$ be the class of all unbiased size $\alpha$ tests of $H_0$. If there exists a test $\varphi\in U_\alpha$ that has maximum power at each $\theta\in\Theta_1$, we call $\varphi$ a UMP unbiased size $\alpha$ test.
Clearly $U_\alpha\subset\Phi_\alpha$. If a UMP test exists in $\Phi_\alpha$, it is UMP in $U_\alpha$. This follows by comparing the power of the UMP test with that of the trivial test $\varphi(\mathbf x)\equiv\alpha$. It is convenient to introduce another class of tests.
Definition 3. A test $\varphi$ is said to be $\alpha$-similar on a subset $\Theta^*$ of $\Theta$ if
$$\beta_\varphi(\theta)=E_\theta\varphi(\mathbf X)=\alpha\quad\text{for }\theta\in\Theta^*.\qquad(4)$$
A test is said to be similar on a set $\Theta^*\subseteq\Theta$ if it is $\alpha$-similar on $\Theta^*$ for some $\alpha$, $0\le\alpha\le 1$.
It is clear that there exists at least one similar test on every $\Theta^*$, namely, $\varphi(\mathbf x)\equiv\alpha$, $0\le\alpha\le 1$.
Theorem 1. Let $\beta_\varphi(\theta)$ be continuous in $\theta$ for any $\varphi$. If $\varphi$ is an unbiased size $\alpha$ test of $H_0\colon\theta\in\Theta_0$ against $H_1\colon\theta\in\Theta_1$, it is $\alpha$-similar on the boundary $\Lambda=\overline\Theta_0\cap\overline\Theta_1$. (Here $\overline A$ is the closure of the set $A$.)
Proof. Let $\theta\in\Lambda$. Then there exists a sequence $\{\theta_n\}$, $\theta_n\in\Theta_0$, such that $\theta_n\to\theta$. Since $\beta_\varphi(\theta)$ is continuous, $\beta_\varphi(\theta_n)\to\beta_\varphi(\theta)$; and since $\beta_\varphi(\theta_n)\le\alpha$ for $\theta_n\in\Theta_0$, $\beta_\varphi(\theta)\le\alpha$. Similarly, there exists a sequence $\{\theta_n'\}$, $\theta_n'\in\Theta_1$, such that $\beta_\varphi(\theta_n')\ge\alpha$ ($\varphi$ is unbiased) and $\theta_n'\to\theta$. Thus $\beta_\varphi(\theta_n')\to\beta_\varphi(\theta)$, and it follows that $\beta_\varphi(\theta)\ge\alpha$. Hence $\beta_\varphi(\theta)=\alpha$ for $\theta\in\Lambda$, and $\varphi$ is $\alpha$-similar on $\Lambda$.
Remark 1. Thus, if $\beta_\varphi(\theta)$ is continuous in $\theta$ for any $\varphi$, an unbiased size $\alpha$ test of $H_0$ against $H_1$ is also $\alpha$-similar for the PDFs (PMFs) of $\Lambda$, that is, for $\{f_\theta,\theta\in\Lambda\}$. If we can find an MP similar test of $H_0\colon\theta\in\Lambda$ against $H_1$, and if this test is unbiased size $\alpha$, then necessarily it is MP in the smaller class.
Definition 4. A test $\varphi$ that is UMP among all $\alpha$-similar tests on the boundary $\Lambda=\overline\Theta_0\cap\overline\Theta_1$ is said to be a UMP $\alpha$-similar test.
It is frequently easier to find a UMP $\alpha$-similar test. Moreover, tests that are UMP similar on the boundary are often UMP unbiased.
Theorem 2. Let the power function of every test $\varphi$ of $H_0\colon\theta\in\Theta_0$ against $H_1\colon\theta\in\Theta_1$ be continuous in $\theta$. Then a UMP $\alpha$-similar test is UMP unbiased, provided that its size is $\alpha$ for testing $H_0$ against $H_1$.
Proof. Let $\varphi_0$ be UMP $\alpha$-similar. Then $E_\theta\varphi_0(\mathbf X)\le\alpha$ for $\theta\in\Theta_0$. Comparing its power with that of the trivial similar test $\varphi(\mathbf x)\equiv\alpha$, we see that $\varphi_0$ is unbiased also. By the continuity of $\beta_\varphi(\theta)$ we see that the class of all unbiased size $\alpha$ tests is a subclass of the class of all $\alpha$-similar tests. It follows that $\varphi_0$ is a UMP unbiased size $\alpha$ test.
Remark 2. The continuity of the power function $\beta_\varphi(\theta)$ is not always easy to check, but sufficient conditions may be found in most advanced calculus texts. See, for example, Widder [117, p. 356]. If the family of PDFs (PMFs) $f_\theta$ is an exponential family, then a proof is given in Lehmann [64, p. 59].
Example 1. Let $X_1,X_2,\ldots,X_n$ be a sample from $N(\mu,1)$. We wish to test $H_0\colon\mu\le 0$ against $H_1\colon\mu>0$. Since the family of densities has an MLR in $\sum_{1}^{n}X_i$, we can use Theorem 9.4.2 to conclude that a UMP test rejects $H_0$ if $\sum_{1}^{n}X_i>c$. This test is also UMP unbiased. Nevertheless we use this example to illustrate the concepts introduced above.
Here $\Theta_0=\{\mu\le 0\}$, $\Theta_1=\{\mu>0\}$, and $\Lambda=\overline\Theta_0\cap\overline\Theta_1=\{\mu=0\}$. Since $T(\mathbf X)=\sum_{i=1}^{n}X_i$ is sufficient, we restrict attention to tests based on $T$ alone. Note that $T\sim N(n\mu,n)$, which is one-parameter exponential. Thus the power function of any test $\varphi$ based on $T$ is continuous in $\mu$. It follows that any unbiased size $\alpha$ test of $H_0$ has the property $\beta_\varphi(0)=\alpha$ of similarity over $\Lambda$. In order to use Theorem 2, we find a UMP test of $H_0'\colon\mu\in\Lambda$ against $H_1$. Let $\mu_1>0$. By the fundamental lemma an MP test of $\mu=0$ against $\mu=\mu_1>0$ is given by
$$\varphi(t)=\begin{cases}1 & \text{if }\exp\left\{\dfrac{t^2}{2n}-\dfrac{(t-n\mu_1)^2}{2n}\right\}>k',\\ 0 & \text{otherwise,}\end{cases}\qquad=\begin{cases}1 & \text{if }t>k,\\ 0 & \text{if }t\le k,\end{cases}$$
where $k$ is determined from
$$\alpha=P_0\{T>k\}=P\left\{Z>\frac{k}{\sqrt n}\right\}.$$
Thus $k=\sqrt n\,z_\alpha$. Since $\varphi$ is independent of $\mu_1$ as long as $\mu_1>0$, we see that the test
$$\varphi(t)=\begin{cases}1, & t>\sqrt n\,z_\alpha,\\ 0, & \text{otherwise,}\end{cases}$$
is UMP $\alpha$-similar. We need only check that $\varphi$ is of the right size for testing $H_0$ against $H_1$. We have, for $\mu\le 0$,
$$E_\mu\varphi(T)=P_\mu\{T>\sqrt n\,z_\alpha\}=P\left\{\frac{T-n\mu}{\sqrt n}>z_\alpha-\sqrt n\,\mu\right\}\le P\{Z>z_\alpha\},$$
since $-\sqrt n\,\mu\ge 0$. Here $Z$ is $N(0,1)$. It follows that
$$E_\mu\varphi(T)\le\alpha\quad\text{for }\mu\le 0,$$
hence $\varphi$ is UMP unbiased.
Theorem 2 can be used only if it is possible to find a UMP $\alpha$-similar test. Unfortunately this requires heavy use of conditional expectation, and we will not pursue the subject any further. We refer to Lehmann [64, chapters 4 and 5] and Ferguson [28, pp. 224–233] for further details.
Yet another reduction is obtained if we apply the principle of invariance to hypothesis testing problems. We recall that a class of distributions is invariant under a group of transformations $\mathcal G$ if for every $g\in\mathcal G$ and every $\theta\in\Theta$ there exists a unique $\theta'\in\Theta$ such that $g(\mathbf X)$ has distribution $P_{\theta'}$ whenever $\mathbf X\sim P_\theta$. We write $\theta'=\bar g\theta$.
In a hypothesis testing problem we need to reformulate the principle of invariance. First, we need to ensure that under the transformations $\mathcal G$ not only does $\mathcal P=\{P_\theta\colon\theta\in\Theta\}$ remain invariant but also the problem of testing $H_0\colon\theta\in\Theta_0$ against $H_1\colon\theta\in\Theta_1$ remains invariant. Second, since the problem has not changed by application of $\mathcal G$, the decision also must not change.
Definition 5. A group $\mathcal G$ of transformations on the space of values of $\mathbf X$ leaves a hypothesis testing problem invariant if $\mathcal G$ leaves both $\{P_\theta\colon\theta\in\Theta_0\}$ and $\{P_\theta\colon\theta\in\Theta_1\}$ invariant.
Definition 6. We say that $\varphi$ is invariant under $\mathcal G$ if
$$\varphi(g(\mathbf x))=\varphi(\mathbf x)\quad\text{for all }\mathbf x\text{ and all }g\in\mathcal G.$$
Definition 7. Let $\mathcal G$ be a group of transformations on the space of values of the RV $\mathbf X$. We say that a statistic $T(\mathbf x)$ is maximal invariant under $\mathcal G$ if (a) $T$ is invariant, and (b) $T$ is maximal, that is, $T(\mathbf x_1)=T(\mathbf x_2)\Rightarrow\mathbf x_1=g(\mathbf x_2)$ for some $g\in\mathcal G$.

Example 2. Let $\mathbf x=(x_1,x_2,\ldots,x_n)$, and let $\mathcal G$ be the group of translations
$$g_c(\mathbf x)=(x_1+c,\ldots,x_n+c),\quad-\infty<c<\infty.$$
Here the space of values of $\mathbf X$ is $\mathbb R^n$. Consider the statistic
$$T(\mathbf x)=(x_n-x_1,\ldots,x_n-x_{n-1}).$$
Clearly,
$$T(g_c(\mathbf x))=(x_n-x_1,\ldots,x_n-x_{n-1})=T(\mathbf x).$$
If $T(\mathbf x)=T(\mathbf x')$, then $x_n-x_i=x_n'-x_i'$, $i=1,2,\ldots,n-1$, and we have $x_i-x_i'=x_n-x_n'=c$ ($i=1,2,\ldots,n-1$), that is, $g_c(\mathbf x')=(x_1'+c,\ldots,x_n'+c)=\mathbf x$, and $T$ is maximal invariant.
Next consider the group of scale changes
$$g_c(\mathbf x)=(cx_1,\ldots,cx_n),\quad c>0.$$
Then
$$T(\mathbf x)=\begin{cases}0 & \text{if all }x_i=0,\\ \left(\dfrac{x_1}{z},\ldots,\dfrac{x_n}{z}\right) & \text{if at least one }x_i\ne 0,\quad z=\left(\sum_{1}^{n}x_i^2\right)^{1/2},\end{cases}$$
is maximal invariant; for
$$T(g_c(\mathbf x))=T(cx_1,\ldots,cx_n)=T(\mathbf x),$$
and if $T(\mathbf x)=T(\mathbf x')$, then either $T(\mathbf x)=T(\mathbf x')=0$, in which case $x_i=x_i'=0$, or $T(\mathbf x)=T(\mathbf x')\ne 0$, in which case $x_i/z=x_i'/z'$, implying $x_i'=(z'/z)x_i=cx_i$, and $T$ is maximal.
Finally, if we consider the group of translation and scale changes,
$$g(\mathbf x)=(ax_1+b,\ldots,ax_n+b),\quad a>0,\ -\infty<b<\infty,$$
a maximal invariant is
$$T(\mathbf x)=\begin{cases}0 & \text{if }s=0,\\ \left(\dfrac{x_1-\bar x}{s},\dfrac{x_2-\bar x}{s},\ldots,\dfrac{x_n-\bar x}{s}\right) & \text{if }s\ne 0,\end{cases}$$
where $\bar x=n^{-1}\sum_{1}^{n}x_i$ and $s^2=n^{-1}\sum_{1}^{n}(x_i-\bar x)^2$.
Definition 8. Let $I_\alpha$ denote the class of all invariant size $\alpha$ tests of $H_0\colon\theta\in\Theta_0$ against $H_1\colon\theta\in\Theta_1$. If there exists a UMP member in $I_\alpha$, we call the test a UMP invariant test of $H_0$ against $H_1$.
The search for UMP invariant tests is greatly facilitated by the use of the following result.
Theorem 3. Let $T(\mathbf x)$ be maximal invariant with respect to $\mathcal G$. Then $\varphi$ is invariant under $\mathcal G$ if and only if $\varphi$ is a function of $T$.
Proof. Let $\varphi$ be invariant. We have to show that $T(\mathbf x_1)=T(\mathbf x_2)\Rightarrow\varphi(\mathbf x_1)=\varphi(\mathbf x_2)$. If $T(\mathbf x_1)=T(\mathbf x_2)$, there is a $g\in\mathcal G$ such that $\mathbf x_1=g(\mathbf x_2)$, so that $\varphi(\mathbf x_1)=\varphi(g(\mathbf x_2))=\varphi(\mathbf x_2)$. Conversely, if $\varphi$ is a function of $T$, $\varphi(\mathbf x)=h[T(\mathbf x)]$, then
$$\varphi(g(\mathbf x))=h[T(g(\mathbf x))]=h[T(\mathbf x)]=\varphi(\mathbf x),$$
and $\varphi$ is invariant.
Remark 3. The use of Theorem 3 is obvious. If a hypothesis testing problem is invariant under a group $\mathcal G$, the principle of invariance restricts attention to invariant tests. According to Theorem 3, it suffices to restrict attention to test functions that are functions of the maximal invariant $T$.
Example 3. Let $X_1,X_2,\ldots,X_n$ be a sample from $N(\mu,\sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. We wish to test $H_0\colon\sigma\ge\sigma_0$, $-\infty<\mu<\infty$, against $H_1\colon\sigma<\sigma_0$, $-\infty<\mu<\infty$. The family $\{N(\mu,\sigma^2)\}$ remains invariant under translations $x_i'=x_i+c$, $-\infty<c<\infty$. Moreover, since $\mathrm{var}(X+c)=\mathrm{var}(X)$, the hypothesis testing problem remains invariant under the group of translations, that is, both $\{N(\mu,\sigma^2)\colon\sigma^2\ge\sigma_0^2\}$ and $\{N(\mu,\sigma^2)\colon\sigma^2<\sigma_0^2\}$ remain invariant. The joint sufficient statistic is $(\bar X,\sum(X_i-\bar X)^2)$, which is transformed to $(\bar X+c,\sum(X_i-\bar X)^2)$ under translations. A maximal invariant is $\sum(X_i-\bar X)^2$. It follows that the class of invariant tests consists of tests that are functions of $\sum(X_i-\bar X)^2$.
Now $\sum(X_i-\bar X)^2/\sigma^2\sim\chi^2(n-1)$, so that the PDF of $Z=\sum(X_i-\bar X)^2$ is given by
$$f_{\sigma^2}(z)=\frac{\sigma^{-(n-1)}}{\Gamma[(n-1)/2]\,2^{(n-1)/2}}\,z^{(n-3)/2}e^{-z/2\sigma^2},\quad z>0.$$
The family of densities $\{f_{\sigma^2}\colon\sigma^2>0\}$ has an MLR in $z$, and it follows that a UMP test is to reject $H_0\colon\sigma^2\ge\sigma_0^2$ if $z\le k$; that is, a UMP invariant test is given by
$$\varphi(\mathbf x)=\begin{cases}1 & \text{if }\sum(x_i-\bar x)^2\le k,\\ 0 & \text{if }\sum(x_i-\bar x)^2>k,\end{cases}$$
where $k$ is determined from the size restriction
$$\alpha=P_{\sigma_0}\left\{\sum(X_i-\bar X)^2\le k\right\}=P\left\{\frac{\sum(X_i-\bar X)^2}{\sigma_0^2}\le\frac{k}{\sigma_0^2}\right\},$$
that is,
$$k=\sigma_0^2\chi^2_{n-1,1-\alpha}.$$

Example 4. Let $\mathbf X$ have PDF $f_i(x_1-\theta,\ldots,x_n-\theta)$ under $H_i$ ($i=0,1$), $-\infty<\theta<\infty$. Let $\mathcal G$ be the group of translations
$$g_c(\mathbf x)=(x_1+c,\ldots,x_n+c),\quad-\infty<c<\infty,\ n\ge 2.$$
Clearly, $g$ induces $\bar g$ on $\Theta$, where $\bar g\theta=\theta+c$. The hypothesis testing problem remains invariant under $\mathcal G$. A maximal invariant under $\mathcal G$ is $T(\mathbf X)=(X_1-X_n,\ldots,X_{n-1}-X_n)=(T_1,T_2,\ldots,T_{n-1})$. The class of invariant tests coincides with the class of tests that are functions of $T$. The PDF of $T$ under $H_i$ is independent of $\theta$ and is given by $\int_{-\infty}^{\infty}f_i(t_1+z,\ldots,t_{n-1}+z,z)\,dz$. The problem is thus reduced to testing a simple hypothesis against a simple alternative. By the fundamental lemma the MP test
$$\varphi(t_1,t_2,\ldots,t_{n-1})=\begin{cases}1 & \text{if }\lambda(\mathbf t)>c,\\ 0 & \text{if }\lambda(\mathbf t)<c,\end{cases}$$
where $\mathbf t=(t_1,t_2,\ldots,t_{n-1})$ and
$$\lambda(\mathbf t)=\frac{\int_{-\infty}^{\infty}f_1(t_1+z,\ldots,t_{n-1}+z,z)\,dz}{\int_{-\infty}^{\infty}f_0(t_1+z,\ldots,t_{n-1}+z,z)\,dz},$$
is UMP invariant.
A particular case of Example 4 would be, for instance, to test $H_0\colon X\sim N(\theta,1)$ against $H_1\colon X\sim C(1,\theta)$, $\theta\in\mathbb R$. See Problem 1.
Example 5. Suppose $(X,Y)$ has joint PDF
$$f_\theta(x,y)=\lambda\mu\exp\{-\lambda x-\mu y\},\quad x>0,\ y>0,$$
and $=0$ elsewhere, where $\theta=(\lambda,\mu)$, $\lambda>0$, $\mu>0$. Consider the scale group $\mathcal G=\{\{0,c\},c>0\}$, which leaves $\{f_\theta\}$ invariant. Suppose we wish to test $H_0\colon\mu\ge\lambda$ against $H_1\colon\mu<\lambda$. It is easy to see that $\bar{\mathcal G}\Theta_0=\Theta_0$, so that $\mathcal G$ leaves $(\alpha,\Theta_0,\Theta_1)$ invariant, and $T=Y/X$ is maximal invariant. The PDF of $T$ is given by
$$f^T_\theta(t)=\frac{\lambda\mu}{(\lambda+\mu t)^2},\quad t>0,\qquad=0\ \text{for }t<0.$$
The family $\{f^T_\theta\}$ has MLR in $T$, and hence a UMP invariant test of $H_0$ is of the form
$$\varphi(t)=\begin{cases}1, & t>c(\alpha),\\ \gamma, & t=c(\alpha),\\ 0, & t<c(\alpha),\end{cases}$$
where
$$\alpha=\int_{c(\alpha)}^{\infty}\frac{1}{(1+t)^2}\,dt\ \Rightarrow\ c(\alpha)=\frac{1-\alpha}{\alpha}.$$
PROBLEMS 9.5
1. To test $H_0\colon X\sim N(\theta,1)$ against $H_1\colon X\sim C(1,\theta)$, a sample of size 2 is available on $X$. Find a UMP invariant test of $H_0$ against $H_1$.
2. Let $X_1,X_2,\ldots,X_n$ be a sample from $P(\lambda)$. Find a UMP unbiased size $\alpha$ test for the null hypothesis $H_0\colon\lambda\le\lambda_0$ against the alternatives $\lambda>\lambda_0$ by the methods of this section.
3. Let $X\sim NB(1;\theta)$. By the methods of this section find a UMP unbiased size $\alpha$ test of $H_0\colon\theta\ge\theta_0$ against $H_1\colon\theta<\theta_0$.
4. Let $X_1,X_2,\ldots,X_n$ be iid $N(\mu,\sigma^2)$ RVs. Consider the problem of testing $H_0\colon\mu\le 0$ against $H_1\colon\mu>0$:
(a) It suffices to restrict attention to the sufficient statistic $(U,V)$, where $U=\bar X$ and $V=S^2$. Show that the problem of testing $H_0$ is invariant under $\mathcal G=\{\{a,1\},a\in\mathbb R\}$ and that a maximal invariant is $T=U/\sqrt V$.
(b) Show that the distribution of $T$ has MLR and that a UMP invariant test rejects $H_0$ when $T>c$.
5. Let $X_1,X_2,\ldots,X_n$ be iid RVs and let $H_0$ be that $X_i\sim N(\theta,1)$, and $H_1$ be that the common PDF is $f_\theta(x)=(1/2)\exp\{-|x-\theta|\}$. Find the form of the UMP invariant test of $H_0$ against $H_1$.
6. Let $X_1,X_2,\ldots,X_n$ be iid RVs and suppose $H_0\colon X_i\sim N(0,1)$ and $H_1\colon X_i\sim f_1(x)=\exp\{-|x|\}/2$:
(a) Show that the problem of testing $H_0$ against $H_1$ is invariant under the scale changes $g_c(x)=cx$, $c>0$, and that a maximal invariant is $T(\mathbf X)=(X_1/X_n,\ldots,X_{n-1}/X_n)$.
(b) Show that the MP invariant test rejects $H_0$ when
$$\frac{\left(1+\sum_{i=1}^{n-1}Y_i^2\right)^{n/2}}{\left(1+\sum_{i=1}^{n-1}|Y_i|\right)^{n}}>k,\quad\text{where }Y_j=X_j/X_n,\ j=1,2,\ldots,n-1,$$
or equivalently when
$$\frac{\sum_{j=1}^{n}|X_j|}{\left(\sum_{j=1}^{n}X_j^2\right)^{1/2}}<k'.$$
9.6 LOCALLY MOST POWERFUL TESTS
In the previous section we argued that whenever a UMP test does not exist, we restrict the class of tests under consideration and then find a UMP test in the subclass. Yet another approach when no UMP test exists is to restrict the parameter set to a subset of $\Theta_1$. In most problems, the parameter values that are close to the null hypothesis are the hardest to detect. Tests that have good power properties for “local alternatives” may also retain good power properties for “nonlocal” alternatives.

Definition 1. Let $\Theta\subseteq\mathbb R$. Then a test $\varphi_0$ with power function $\beta_{\varphi_0}(\theta)=E_\theta\varphi_0(\mathbf X)$ is said to be a locally most powerful (LMP) test of $H_0\colon\theta\le\theta_0$ against $H_1\colon\theta>\theta_0$ if there exists a $\Delta>0$ such that for any other test $\varphi$ with
$$\beta_\varphi(\theta_0)=\beta_{\varphi_0}(\theta_0)=\int\varphi(\mathbf x)f_{\theta_0}(\mathbf x)\,d\mathbf x\qquad(1)$$
we have
$$\beta_{\varphi_0}(\theta)\ge\beta_\varphi(\theta)\quad\text{for every }\theta\in(\theta_0,\theta_0+\Delta].\qquad(2)$$
We assume that the tests under consideration have continuously differentiable power functions at $\theta=\theta_0$ and that the derivative may be taken under the integral sign. In that case, an LMP test maximizes
$$\frac{\partial}{\partial\theta}\beta_\varphi(\theta)\Big|_{\theta=\theta_0}=\beta_\varphi'(\theta_0)=\int\varphi(\mathbf x)\,\frac{\partial}{\partial\theta}f_\theta(\mathbf x)\Big|_{\theta=\theta_0}\,d\mathbf x\qquad(3)$$
subject to the size constraint (1). A slight extension of the Neyman–Pearson lemma (Remark 9.3.2) implies that a test satisfying (1) and given by
$$\varphi_0(\mathbf x)=\begin{cases}1 & \text{if }\dfrac{\partial}{\partial\theta}f_\theta(\mathbf x)\Big|_{\theta_0}>k\,f_{\theta_0}(\mathbf x),\\[1ex] \gamma & \text{if }\dfrac{\partial}{\partial\theta}f_\theta(\mathbf x)\Big|_{\theta_0}=k\,f_{\theta_0}(\mathbf x),\\[1ex] 0 & \text{if }\dfrac{\partial}{\partial\theta}f_\theta(\mathbf x)\Big|_{\theta_0}<k\,f_{\theta_0}(\mathbf x)\end{cases}\qquad(4)$$
will maximize $\beta_\varphi'(\theta_0)$. It is possible that a test that maximizes $\beta_\varphi'(\theta_0)$ is not LMP, but if the test maximizes $\beta'(\theta_0)$ and is unique, then it must be the LMP test (see Kallenberg et al. [49, p. 290] and Lehmann [64, p. 528]).
Note that for $\mathbf x$ for which $f_{\theta_0}(\mathbf x)\ne 0$ we can write
$$\frac{\dfrac{\partial}{\partial\theta}f_\theta(\mathbf x)\Big|_{\theta_0}}{f_{\theta_0}(\mathbf x)}=\frac{\partial}{\partial\theta}\log f_\theta(\mathbf x)\Big|_{\theta_0},$$
and then
$$\varphi_0(\mathbf x)=\begin{cases}1 & \text{if }\dfrac{\partial}{\partial\theta}\log f_\theta(\mathbf x)\Big|_{\theta_0}>k,\\[1ex] \gamma & \text{if }\dfrac{\partial}{\partial\theta}\log f_\theta(\mathbf x)\Big|_{\theta_0}=k,\\[1ex] 0 & \text{if }\dfrac{\partial}{\partial\theta}\log f_\theta(\mathbf x)\Big|_{\theta_0}<k.\end{cases}\qquad(5)$$
Example 1. Let $X_1,X_2,\ldots,X_n$ be iid with common normal PDF with mean $\mu$ and variance $\sigma^2$. If one of these parameters is unknown while the other is known, the family of PDFs has MLR and UMP tests exist for one-sided hypotheses on the unknown parameter. Let us derive the LMP test in each case.
First consider the case when $\sigma^2$ is known, say $\sigma^2=1$, and $H_0\colon\mu\le 0$, $H_1\colon\mu>0$. An easy computation shows that an LMP test is of the form
$$\varphi_0(\mathbf x)=\begin{cases}1 & \text{if }\bar x>k,\\ 0 & \text{if }\bar x\le k,\end{cases}$$
which, of course, is the form of the UMP test obtained in Problem 9.4.1 by an application of Theorem 9.4.2.
Next consider the case when $\mu$ is known, say $\mu=0$, and $H_0\colon\sigma\le\sigma_0$, $H_1\colon\sigma>\sigma_0$. Using (5) we see that an LMP test is of the form
$$\varphi_1(\mathbf x)=\begin{cases}1 & \text{if }\sum_{i=1}^{n}x_i^2>k,\\ 0 & \text{if }\sum_{i=1}^{n}x_i^2\le k,\end{cases}$$
which coincides with the UMP test.
In each case the power function is differentiable and the derivative may be taken inside the integral sign because the PDF is a one-parameter exponential type PDF.
Example 2. Let $X_1,X_2,\ldots,X_n$ be iid RVs with common PDF
$$f_\theta(x)=\frac{1}{\pi}\,\frac{1}{1+(x-\theta)^2},\quad x\in\mathbb R,$$
and consider the problem of testing $H_0\colon\theta\le 0$ against $H_1\colon\theta>0$.
In this case $\{f_\theta\}$ does not have MLR. A direct computation using the Neyman–Pearson lemma shows that an MP test of $\theta=0$ against $\theta=\theta_1$, $\theta_1>0$, depends on $\theta_1$ and hence cannot be MP for testing $\theta=0$ against $\theta=\theta_2$, $\theta_2\ne\theta_1$. Hence a UMP test of $H_0$ against $H_1$ does not exist. An LMP test of $H_0$ against $H_1$ is of the form
$$\varphi_0(\mathbf x)=\begin{cases}1 & \text{if }\displaystyle\sum_{i=1}^{n}\frac{2x_i}{1+x_i^2}>k,\\ 0 & \text{otherwise,}\end{cases}$$
where $k$ is chosen so that the size of $\varphi_0$ is $\alpha$. For small $n$ it is hard to compute $k$, but for large $n$ it is easy to compute $k$ using the central limit theorem. Indeed $\{X_i/(1+X_i^2)\}$ are iid RVs with mean 0 and finite variance ($=1/8$, by Problem 1), so that $k=z_\alpha\sqrt{n/2}$ will give an (approximately) level $\alpha$ test for large $n$.
The test $\varphi_0$ is good at detecting small departures from $\theta\le 0$, but it is quite unsatisfactory in detecting values of $\theta$ away from 0. In fact, for $\alpha<1/2$, $\beta_{\varphi_0}(\theta)\to 0$ as $\theta\to\infty$.
This procedure for finding locally best tests has applications in nonparametric statistics. We refer the reader to Randles and Wolfe [85, section 9.1] for details.
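A quick way to see how the large-sample cutoff $k=z_\alpha\sqrt{n/2}$ behaves is to simulate the LMP statistic under $\theta=0$. A rough sketch (sample size, seed, replication count, and the alternative $\theta=0.3$ are arbitrary choices of ours):

```python
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
n, alpha, reps = 50, 0.05, 20000
k = norm.ppf(1 - alpha) * np.sqrt(n / 2)       # approximate critical value

def lmp_stat(x):
    # LMP score statistic for the Cauchy location family at theta = 0
    return np.sum(2 * x / (1 + x ** 2))

# Monte Carlo size under theta = 0 (standard Cauchy data)
stats0 = np.array([lmp_stat(rng.standard_cauchy(n)) for _ in range(reps)])
print("empirical size:", np.mean(stats0 > k))            # should be near 0.05

# Empirical power at a nearby alternative
stats1 = np.array([lmp_stat(rng.standard_cauchy(n) + 0.3) for _ in range(reps)])
print("empirical power at theta = 0.3:", np.mean(stats1 > k))
```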

PROBLEMS 9.6
1. Let $X_1,X_2,\ldots,X_n$ be iid $C(1,\theta)$ RVs. Show that $E_0(1+X_1^2)^{-k}=(1/\pi)B(k+1/2,1/2)$. Hence or otherwise show that
$$E_0\left[\frac{X_1^2}{(1+X_1^2)^2}\right]=\mathrm{var}\left(\frac{X_1}{1+X_1^2}\right)=\frac18.$$
2. Let $X_1,X_2,\ldots,X_n$ be a random sample from the logistic PDF
$$f_\theta(x)=\frac{1}{2[1+\cosh(x-\theta)]}=\frac{e^{x-\theta}}{\{1+e^{x-\theta}\}^2}.$$
Show that the LMP test of $H_0\colon\theta=0$ against $H_1\colon\theta>0$ rejects $H_0$ if
$$\sum_{i=1}^{n}\tanh\left(\frac{x_i}{2}\right)>k.$$
3. Let $X_1,X_2,\ldots,X_n$ be iid RVs with common Laplace PDF
$$f_\theta(x)=(1/2)\exp\{-|x-\theta|\}.$$
For $n\ge 2$ show that a UMP size $\alpha$ ($0<\alpha<1$) test of $H_0\colon\theta\le 0$ against $H_1\colon\theta>0$ does not exist. Find the form of the LMP test.

10 SOME FURTHER RESULTS ON HYPOTHESES TESTING
10.1 INTRODUCTION
In this chapter we study some commonly used procedures in the theory of testing of hypotheses. In Section 10.2 we describe the classical procedure for constructing tests based on likelihood ratios. This method is sufficiently general to apply to multiparameter problems and is especially useful in the presence of nuisance parameters. These are unknown parameters in the model which are of no inferential interest. Most of the normal theory tests described in Sections 10.3 to 10.5 and those in Chapter 12 can be derived by using the methods of Section 10.2. In Sections 10.3 to 10.5 we list some commonly used normal-theory-based tests. In Section 10.3 we also deal with goodness-of-fit tests. In Section 10.6 we look at the hypothesis testing problem from a decision-theoretic viewpoint and describe Bayes and minimax tests.
10.2 GENERALIZED LIKELIHOOD RATIO TESTS
In Chapter 9 we saw that UMP tests do not exist for some problems of hypothesis testing. It was suggested that we restrict attention to smaller classes of tests and seek UMP tests in these subclasses or, alternatively, seek tests which are optimal against local alternatives. Unfortunately, some of the reductions suggested in Chapter 9, such as invariance, do not apply to all families of distributions.
In this section we consider a classical procedure for constructing tests that has some intuitive appeal and that frequently, though not necessarily, leads to optimal tests. Also, the procedure leads to tests that have some desirable large-sample properties.
Recall that for testing $H_0\colon X\sim f_0$ against $H_1\colon X\sim f_1$, the Neyman–Pearson MP test is based on the ratio $f_1(\mathbf x)/f_0(\mathbf x)$. If we interpret the numerator as the best possible explanation of $\mathbf x$ under $H_1$, and the denominator as the best possible explanation of $\mathbf x$ under $H_0$, then it is reasonable to consider the ratio
$$r(\mathbf x)=\frac{\sup_{\theta\in\Theta_1}L(\theta;\mathbf x)}{\sup_{\theta\in\Theta_0}L(\theta;\mathbf x)}=\frac{\sup_{\theta\in\Theta_1}f_\theta(\mathbf x)}{\sup_{\theta\in\Theta_0}f_\theta(\mathbf x)}$$
as a test statistic for testing $H_0\colon\theta\in\Theta_0$ against $H_1\colon\theta\in\Theta_1$. Here $L(\theta;\mathbf x)$ is the likelihood function of $\mathbf x$. Note that for each $\mathbf x$ for which the MLEs of $\theta$ under $\Theta_1$ and $\Theta_0$ exist, the ratio is well defined and free of $\theta$ and can be used as a test statistic. Clearly we should reject $H_0$ if $r(\mathbf x)>c$.
The statistic $r$ can be awkward to work with, since one of the two suprema in the ratio may not be attained.
Let $\theta\in\Theta\subseteq\mathbb R^k$ be a vector of parameters, and let $\mathbf X$ be a random vector with PDF (PMF) $f_\theta$. Consider the problem of testing the null hypothesis $H_0\colon\mathbf X\sim f_\theta$, $\theta\in\Theta_0$, against the alternative $H_1\colon\mathbf X\sim f_\theta$, $\theta\in\Theta_1$.
Definition 1. For testing $H_0$ against $H_1$, a test of the form "reject $H_0$ if and only if $\lambda(\mathbf x)<c$," where $c$ is some constant and
$$\lambda(\mathbf x)=\frac{\sup_{\theta\in\Theta_0}f_\theta(x_1,x_2,\ldots,x_n)}{\sup_{\theta\in\Theta}f_\theta(x_1,x_2,\ldots,x_n)},$$
is called a generalized likelihood ratio (GLR) test.
We leave the reader to show that the statistics $\lambda(\mathbf X)$ and $r(\mathbf X)$ lead to the same criterion for rejecting $H_0$.
The numerator of the likelihood ratio $\lambda$ is the best explanation of $\mathbf X$ (in the sense of maximum likelihood) that the null hypothesis $H_0$ can provide, and the denominator is the best possible explanation of $\mathbf X$. $H_0$ is rejected if there is a much better explanation of $\mathbf X$ than the best one provided by $H_0$.
It is clear that $0\le\lambda\le 1$. The constant $c$ is determined from the size restriction
$$\sup_{\theta\in\Theta_0}P_\theta\{\lambda(\mathbf X)<c\}=\alpha.$$
If the distribution of $\lambda$ is continuous (that is, the DF is absolutely continuous), any size $\alpha$ is attainable. If, however, $\lambda(\mathbf X)$ is a discrete RV, it may not be possible to find a likelihood ratio test whose size exactly equals $\alpha$. This problem arises because of the nonrandomized nature of the likelihood ratio test and can be handled by randomization. The following result holds.
Theorem 1. If for given $\alpha$, $0\le\alpha\le 1$, nonrandomized Neyman–Pearson and likelihood ratio tests of a simple hypothesis against a simple alternative exist, they are equivalent.
Proof. The proof is left as an exercise.
Theorem 2. For testing $\theta\in\Theta_0$ against $\theta\in\Theta_1$, the likelihood ratio test is a function of every sufficient statistic for $\theta$.
Theorem 2 follows from the factorization theorem for sufficient statistics.
Example 1. Let $X\sim b(n,p)$, and suppose we seek a level $\alpha$ likelihood ratio test of $H_0\colon p\le p_0$ against $H_1\colon p>p_0$:
$$\lambda(x)=\frac{\sup_{p\le p_0}\binom{n}{x}p^x(1-p)^{n-x}}{\sup_{0\le p\le 1}\binom{n}{x}p^x(1-p)^{n-x}}.$$
Now
$$\sup_{0\le p\le 1}p^x(1-p)^{n-x}=\left(\frac{x}{n}\right)^x\left(1-\frac{x}{n}\right)^{n-x}.$$
The function $p^x(1-p)^{n-x}$ first increases, then achieves its maximum at $p=x/n$, and finally decreases, so that
$$\sup_{p\le p_0}p^x(1-p)^{n-x}=\begin{cases}p_0^x(1-p_0)^{n-x} & \text{if }p_0<\dfrac{x}{n},\\[1ex] \left(\dfrac{x}{n}\right)^x\left(1-\dfrac{x}{n}\right)^{n-x} & \text{if }\dfrac{x}{n}\le p_0.\end{cases}$$
It follows that
$$\lambda(x)=\begin{cases}\dfrac{p_0^x(1-p_0)^{n-x}}{(x/n)^x[1-(x/n)]^{n-x}} & \text{if }p_0<\dfrac{x}{n},\\[1ex] 1 & \text{if }\dfrac{x}{n}\le p_0.\end{cases}$$
Note that $\lambda(x)\le 1$ for $np_0<x$ and $\lambda(x)=1$ if $x\le np_0$, and it follows that $\lambda(x)$ is a nonincreasing function of $x$. Thus $\lambda(x)<c$ if and only if $x>c'$, and the GLR test rejects $H_0$ if $x>c'$.
The GLR test is of the type obtained in Section 9.4 for families with an MLR, except for the boundary $\lambda(x)=c$. In other words, if the size of the test happens to be exactly $\alpha$, the likelihood ratio test is a UMP level $\alpha$ test. Since $X$ is a discrete RV, however, to obtain size $\alpha$ may not be possible. We have
$$\alpha=\sup_{p\le p_0}P_p\{X>c'\}=P_{p_0}\{X>c'\}.$$
If such a $c'$ does not exist, we choose an integer $c'$ such that
$$P_{p_0}\{X>c'\}\le\alpha\quad\text{and}\quad P_{p_0}\{X>c'-1\}>\alpha.$$
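The integer cutoff $c'$ satisfying $P_{p_0}\{X>c'\}\le\alpha<P_{p_0}\{X>c'-1\}$ is easy to compute from the binomial survival function. A short sketch under assumed values $n=20$, $p_0=0.3$, $\alpha=0.05$ (our choices, not the text's):

```python
from scipy.stats import binom

n, p0, alpha = 20, 0.3, 0.05
# smallest integer c' with P_{p0}(X > c') <= alpha
c_prime = min(c for c in range(n + 1) if binom.sf(c, n, p0) <= alpha)
print(c_prime, binom.sf(c_prime, n, p0), binom.sf(c_prime - 1, n, p0))
# The attained size binom.sf(c_prime, n, p0) is <= alpha; the test rejects H0 when x > c_prime.
```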
The situation in Example 1 is not unique. For the one-parameter exponential family it can be shown (Birkes [7]) that a GLR test of $H_0\colon\theta\le\theta_0$ against $H_1\colon\theta>\theta_0$ is UMP of its size. The result holds also for the dual $H_0'\colon\theta\ge\theta_0$ and, in fact, for a much wider class of one-parameter families of distributions.
The GLR test is especially useful when $\theta$ is a multiparameter and we wish to test a hypothesis concerning one of the parameters. The remaining parameters act as nuisance parameters.
Example 2. Consider the problem of testing $\mu=\mu_0$ against $\mu\ne\mu_0$ in sampling from $N(\mu,\sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. In this case $\Theta_0=\{(\mu_0,\sigma^2)\colon\sigma^2>0\}$ and $\Theta=\{(\mu,\sigma^2)\colon-\infty<\mu<\infty,\ \sigma^2>0\}$. We write $\theta=(\mu,\sigma^2)$:
$$\sup_{\theta\in\Theta_0}f_\theta(\mathbf x)=\sup_{\sigma^2>0}\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_{1}^{n}(x_i-\mu_0)^2}{2\sigma^2}\right\}=f_{\hat\sigma_0^2}(\mathbf x),$$
where $\hat\sigma_0^2$ is the MLE, $\hat\sigma_0^2=(1/n)\sum_{i=1}^{n}(x_i-\mu_0)^2$. Thus
$$\sup_{\theta\in\Theta_0}f_\theta(\mathbf x)=\frac{e^{-n/2}}{(2\pi/n)^{n/2}\left[\sum_{1}^{n}(x_i-\mu_0)^2\right]^{n/2}}.$$
The MLE of $\theta=(\mu,\sigma^2)$ when both $\mu$ and $\sigma^2$ are unknown is $\left(\sum_{1}^{n}x_i/n,\ \sum_{1}^{n}(x_i-\bar x)^2/n\right)$. It follows that
$$\sup_{\theta\in\Theta}f_\theta(\mathbf x)=\sup_{\mu,\sigma^2}\frac{1}{(\sigma\sqrt{2\pi})^n}\exp\left\{-\frac{\sum_{1}^{n}(x_i-\mu)^2}{2\sigma^2}\right\}=\frac{e^{-n/2}}{(2\pi/n)^{n/2}\left[\sum_{1}^{n}(x_i-\bar x)^2\right]^{n/2}}.$$
Thus
$$\lambda(\mathbf x)=\left[\frac{\sum_{1}^{n}(x_i-\bar x)^2}{\sum_{1}^{n}(x_i-\mu_0)^2}\right]^{n/2}=\left\{\frac{1}{1+\bigl[n(\bar x-\mu_0)^2/\sum_{1}^{n}(x_i-\bar x)^2\bigr]}\right\}^{n/2}.$$
The GLR test rejects $H_0$ if $\lambda(\mathbf x)<c$, and since $\lambda(\mathbf x)$ is a decreasing function of $n(\bar x-\mu_0)^2/\sum_{1}^{n}(x_i-\bar x)^2$, we reject $H_0$ if
$$\left|\frac{\bar x-\mu_0}{\sqrt{\sum_{1}^{n}(x_i-\bar x)^2}}\right|>c',$$
that is, if
$$\left|\frac{\sqrt n(\bar x-\mu_0)}{s}\right|>c'',$$
where $s^2=(n-1)^{-1}\sum_{1}^{n}(x_i-\bar x)^2$. The statistic
$$t(\mathbf X)=\frac{\sqrt n(\bar X-\mu_0)}{S}$$
has a $t$-distribution with $n-1$ d.f. Under $H_0\colon\mu=\mu_0$, $t(\mathbf X)$ has a central $t(n-1)$ distribution, but under $H_1\colon\mu\ne\mu_0$, $t(\mathbf X)$ has a noncentral $t$-distribution with $n-1$ d.f. and noncentrality parameter $\delta=(\mu-\mu_0)/\sigma$. We choose $c''=t_{n-1,\alpha/2}$ in accordance with the distribution of $t(\mathbf X)$ under $H_0$. Note that the two-sided $t$-test obtained here is UMP unbiased. Similarly one can obtain the one-sided $t$-tests as likelihood ratio tests.
The computations in Example 2 could be slightly simplified by using Theorem 2. Indeed $T(\mathbf X)=(\bar X,S^2)$ is a minimal sufficient statistic for $\theta$, and since $\bar X$ and $S^2$ are independent the likelihood is the product of the PDFs of $\bar X$ and $S^2$. We note that $\bar X\sim N(\mu,\sigma^2/n)$ and $S^2\sim\dfrac{\sigma^2}{n-1}\chi^2_{n-1}$. We leave it to the reader to carry out the details.
Example 3. Let $X_1,X_2,\ldots,X_m$ and $Y_1,Y_2,\ldots,Y_n$ be independent random samples from $N(\mu_1,\sigma_1^2)$ and $N(\mu_2,\sigma_2^2)$, respectively. We wish to test the null hypothesis $H_0\colon\sigma_1^2=\sigma_2^2$ against $H_1\colon\sigma_1^2\ne\sigma_2^2$. Here
$$\Theta=\{(\mu_1,\sigma_1^2,\mu_2,\sigma_2^2)\colon-\infty<\mu_i<\infty,\ \sigma_i^2>0,\ i=1,2\}$$
and
$$\Theta_0=\{(\mu_1,\sigma_1^2,\mu_2,\sigma_2^2)\colon-\infty<\mu_i<\infty,\ i=1,2,\ \sigma_1^2=\sigma_2^2>0\}.$$
Let $\theta=(\mu_1,\sigma_1^2,\mu_2,\sigma_2^2)$. Then the joint PDF is
$$f_\theta(\mathbf x,\mathbf y)=\frac{1}{(2\pi)^{(m+n)/2}\sigma_1^m\sigma_2^n}\exp\left\{-\frac{1}{2\sigma_1^2}\sum_{1}^{m}(x_i-\mu_1)^2-\frac{1}{2\sigma_2^2}\sum_{1}^{n}(y_i-\mu_2)^2\right\}.$$
Also,
$$\log f_\theta(\mathbf x,\mathbf y)=-\frac{m+n}{2}\log 2\pi-\frac m2\log\sigma_1^2-\frac n2\log\sigma_2^2-\frac{\sum_{1}^{m}(x_i-\mu_1)^2}{2\sigma_1^2}-\frac{\sum_{1}^{n}(y_i-\mu_2)^2}{2\sigma_2^2}.$$
Differentiating with respect to $\mu_1$ and $\mu_2$, we obtain the MLEs
$$\hat\mu_1=\bar x\quad\text{and}\quad\hat\mu_2=\bar y.$$
Differentiating with respect to $\sigma_1^2$ and $\sigma_2^2$, we obtain the MLEs
$$\hat\sigma_1^2=\frac1m\sum_{1}^{m}(x_i-\bar x)^2\quad\text{and}\quad\hat\sigma_2^2=\frac1n\sum_{1}^{n}(y_i-\bar y)^2.$$
If, however, $\sigma_1^2=\sigma_2^2=\sigma^2$, the MLE of $\sigma^2$ is
$$\hat\sigma^2=\frac{\sum_{1}^{m}(x_i-\bar x)^2+\sum_{1}^{n}(y_i-\bar y)^2}{m+n}.$$
Thus
$$\sup_{\theta\in\Theta_0}f_\theta(\mathbf x,\mathbf y)=\frac{e^{-(m+n)/2}}{[2\pi/(m+n)]^{(m+n)/2}\left[\sum_{1}^{m}(x_i-\bar x)^2+\sum_{1}^{n}(y_i-\bar y)^2\right]^{(m+n)/2}}$$
and
$$\sup_{\theta\in\Theta}f_\theta(\mathbf x,\mathbf y)=\frac{e^{-(m+n)/2}}{(2\pi/m)^{m/2}(2\pi/n)^{n/2}\left[\sum_{1}^{m}(x_i-\bar x)^2\right]^{m/2}\left[\sum_{1}^{n}(y_i-\bar y)^2\right]^{n/2}},$$
so that
$$\lambda(\mathbf x,\mathbf y)=\left(\frac{m}{m+n}\right)^{m/2}\left(\frac{n}{m+n}\right)^{n/2}\frac{\left[\sum_{1}^{m}(x_i-\bar x)^2\right]^{m/2}\left[\sum_{1}^{n}(y_i-\bar y)^2\right]^{n/2}}{\left[\sum_{1}^{m}(x_i-\bar x)^2+\sum_{1}^{n}(y_i-\bar y)^2\right]^{(m+n)/2}}.$$
Now
$$\frac{\left[\sum_{1}^{m}(x_i-\bar x)^2\right]^{m/2}\left[\sum_{1}^{n}(y_i-\bar y)^2\right]^{n/2}}{\left[\sum_{1}^{m}(x_i-\bar x)^2+\sum_{1}^{n}(y_i-\bar y)^2\right]^{(m+n)/2}}=\frac{1}{\left[1+\dfrac{\sum_{1}^{m}(x_i-\bar x)^2}{\sum_{1}^{n}(y_i-\bar y)^2}\right]^{n/2}\left[1+\dfrac{\sum_{1}^{n}(y_i-\bar y)^2}{\sum_{1}^{m}(x_i-\bar x)^2}\right]^{m/2}}.$$
Writing
$$f=\frac{\sum_{1}^{m}(x_i-\bar x)^2/(m-1)}{\sum_{1}^{n}(y_i-\bar y)^2/(n-1)},$$
we have
$$\lambda(\mathbf x,\mathbf y)=\left(\frac{m}{m+n}\right)^{m/2}\left(\frac{n}{m+n}\right)^{n/2}\cdot\frac{1}{\{1+[(m-1)/(n-1)]f\}^{n/2}\{1+[(n-1)/(m-1)](1/f)\}^{m/2}}.$$
We leave the reader to check that $\lambda(\mathbf x,\mathbf y)<c$ is equivalent to $f<c_1$ or $f>c_2$. (Take logarithms and use properties of convex functions. Alternatively, differentiate $\log\lambda$.)
Under $H_0$, the statistic
$$F=\frac{\sum_{1}^{m}(X_i-\bar X)^2/(m-1)}{\sum_{1}^{n}(Y_i-\bar Y)^2/(n-1)}$$
has an $F(m-1,n-1)$ distribution, so that $c_1,c_2$ can be selected. It is usual to take
$$P\{F\le c_1\}=P\{F\ge c_2\}=\frac\alpha2.$$
Under $H_1$, $(\sigma_2^2/\sigma_1^2)F$ has an $F(m-1,n-1)$ distribution.
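The equal-tails choice $P\{F\le c_1\}=P\{F\ge c_2\}=\alpha/2$ is straightforward to compute from the $F(m-1,n-1)$ quantiles. A minimal sketch (sample sizes and simulated data are illustrative assumptions, not from the text):

```python
import numpy as np
from scipy.stats import f

def variance_ratio_test(x, y, alpha=0.05):
    """Two-sided F test of sigma1^2 = sigma2^2 for two independent normal samples."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    m, n = len(x), len(y)
    F = x.var(ddof=1) / y.var(ddof=1)              # ratio of sample variances
    c1 = f.ppf(alpha / 2, m - 1, n - 1)            # lower alpha/2 cutoff
    c2 = f.ppf(1 - alpha / 2, m - 1, n - 1)        # upper alpha/2 cutoff
    return F, (c1, c2), (F < c1) or (F > c2)

rng = np.random.default_rng(1)
x = rng.normal(0.0, 1.0, size=15)
y = rng.normal(0.0, 1.5, size=20)
print(variance_ratio_test(x, y))
```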
In Example 3 we can obtain the same GLR test by focusing attention on the joint sufficient statistic $(\bar X,\bar Y,S_X^2,S_Y^2)$, where $S_X^2$ and $S_Y^2$ are the sample variances of the $X$'s and the $Y$'s, respectively. In order to write down the likelihood function we note that $\bar X,\bar Y,S_X^2,S_Y^2$ are independent RVs. The distributions of $\bar X$ and $S_X^2$ are the same as in Example 2 except that $m$ is the sample size. The distributions of $\bar Y$ and $S_Y^2$ require the appropriate modifications. We leave the reader to carry out the details. It turns out that the GLR test coincides with the UMP unbiased test in this case.
In certain situations the GLR test does not perform well. We reproduce here an example due to Stein and Rubin.
Example 4. Let $X$ be a discrete RV with PMF
$$P_{p=0}\{X=x\}=\begin{cases}\dfrac\alpha2 & \text{if }x=\pm 2,\\[1ex] \dfrac{1-2\alpha}{2} & \text{if }x=\pm 1,\\[1ex] \alpha & \text{if }x=0,\end{cases}$$
under the null hypothesis $H_0\colon p=0$, and
$$P_p\{X=x\}=\begin{cases}pc & \text{if }x=-2,\\ \dfrac{1-c}{1-\alpha}\left(\dfrac12-\alpha\right) & \text{if }x=\pm 1,\\[1ex] \alpha\left(\dfrac{1-c}{1-\alpha}\right) & \text{if }x=0,\\ (1-p)c & \text{if }x=2,\end{cases}$$
under the alternative $H_1\colon p\in(0,1)$, where $\alpha$ and $c$ are constants with
$$0<\alpha<\frac12\quad\text{and}\quad\frac{\alpha}{2-\alpha}<c<\alpha.$$
To test the simple null hypothesis against the composite alternative at the level of significance $\alpha$, let us compute the likelihood ratio $\lambda$. We have
$$\lambda(2)=\frac{P_0\{X=2\}}{\sup_{0\le p<1}P_p\{X=2\}}=\frac{\alpha/2}{c}=\frac{\alpha}{2c}$$
since $\alpha/2<c$. Similarly $\lambda(-2)=\alpha/(2c)$. Also,
$$\lambda(1)=\lambda(-1)=\frac{\frac12-\alpha}{[(1-c)/(1-\alpha)]\left(\frac12-\alpha\right)}=\frac{1-\alpha}{1-c},\quad\alpha<\frac12,$$
and
$$\lambda(0)=\frac{1-\alpha}{1-c}.$$
The GLR test rejects $H_0$ if $\lambda(x)<k$, where $k$ is to be determined so that the level is $\alpha$. We see that
$$P_0\left\{\lambda(X)<\frac{1-\alpha}{1-c}\right\}=P_0\{X=\pm 2\}=\alpha,$$
provided that $\alpha/2c<(1-\alpha)/(1-c)$. But $\alpha/(2-\alpha)<c<\alpha$ implies $\alpha<2c-c\alpha$, so that $\alpha-c\alpha<2c-2c\alpha$, or $\alpha(1-c)<2c(1-\alpha)$, as required. Thus the GLR size $\alpha$ test is to reject $H_0$ if $X=\pm 2$. The power of the GLR test is
$$P_p\left\{\lambda(X)<\frac{1-\alpha}{1-c}\right\}=P_p\{X=\pm 2\}=pc+(1-p)c=c<\alpha$$
for all $p\in(0,1)$. The test is not unbiased and is even worse than the trivial test $\varphi(x)\equiv\alpha$.
Another test that is better than the trivial test is to reject $H_0$ whenever $x=0$ (this is opposite to what the likelihood ratio test says). Then
$$P_0\{X=0\}=\alpha,\qquad P_p\{X=0\}=\alpha\,\frac{1-c}{1-\alpha}>\alpha\quad(\text{since }c<\alpha),$$
for all $p\in(0,1)$, and the test is unbiased.
We will use the generalized likelihood ratio procedure quite frequently hereafter because of its simplicity and wide applicability. The exact distribution of the test statistic under $H_0$ is generally difficult to obtain (despite what we saw in Examples 1 to 3 above), and evaluation of the power function is also not possible in many problems. Recall, however, that under certain conditions the asymptotic distribution of the MLE is normal. This result can be used to prove the following large-sample property of the GLR under $H_0$, which solves the problem of computation of the cut-off point $c$, at least when the sample size is large.
Theorem 3. Under some regularity conditions on $f_\theta(\mathbf x)$, the random variable $-2\log\lambda(\mathbf X)$ under $H_0$ is asymptotically distributed as a chi-square RV with degrees of freedom equal to the difference between the number of independent parameters in $\Theta$ and the number in $\Theta_0$.
We will not prove this result here; the reader is referred to Wilks [118, p. 419]. The regularity conditions are essentially the ones associated with Theorem 8.7.4. In Example 2 the number of parameters unspecified under $H_0$ is one (namely, $\sigma^2$), and under $H_1$ two parameters are unspecified ($\mu$ and $\sigma^2$), so that the asymptotic chi-square distribution will have 1 d.f. Similarly, in Example 3, the d.f. $=4-3=1$.
Example 5. In Example 2 we showed that, in sampling from a normal population with unknown mean $\mu$ and unknown variance $\sigma^2$, the likelihood ratio for testing $H_0\colon\mu=\mu_0$ against $H_1\colon\mu\ne\mu_0$ is
$$\lambda(\mathbf x)=\left[1+\frac{n(\bar x-\mu_0)^2}{\sum_{i=1}^{n}(x_i-\bar x)^2}\right]^{-n/2}.$$
Thus
$$-2\log\lambda(\mathbf X)=n\log\left[1+\frac{n(\bar X-\mu_0)^2}{\sum_{1}^{n}(X_i-\bar X)^2}\right].$$
Under $H_0$, $\sqrt n(\bar X-\mu_0)/\sigma\sim N(0,1)$ and $\sum_{1}^{n}(X_i-\bar X)^2/\sigma^2\sim\chi^2(n-1)$. Also $\sum_{i=1}^{n}(X_i-\bar X)^2/[(n-1)\sigma^2]\xrightarrow{P}1$. It follows that if $Z\sim N(0,1)$, then $-2\log\lambda(\mathbf X)$ has the same limiting distribution as $n\log\left[1+\dfrac{Z^2}{n-1}\right]$. Moreover,
$$\left[1+\frac{Z^2}{n-1}\right]^{n}\xrightarrow{L}\exp\{Z^2\},$$
and since the logarithm is a continuous function we see that
$$n\log\left[1+\frac{Z^2}{n-1}\right]\xrightarrow{L}Z^2.$$
Thus $-2\log\lambda(\mathbf X)\xrightarrow{L}Y$, where $Y\sim\chi^2(1)$. This result is consistent with Theorem 3.
PROBLEMS 10.2
1. Prove Theorems 1 and 2.
2. A random sample of size $n$ is taken from the PMF $P(X_j=x_j)=p_j$, $j=1,2,3,4$, $0<p_j<1$, $\sum_{j=1}^{4}p_j=1$. Find the form of the GLR test of $H_0\colon p_1=p_2=p_3=p_4=1/4$ against $H_1\colon p_1=p_2=p/2,\ p_3=p_4=(1-p)/2$, $0<p<1$.
3. Find the GLR test of $H_0\colon p=p_0$ against $H_1\colon p\ne p_0$, based on a sample of size 1 from $b(n,p)$.
4. Let $X_1,X_2,\ldots,X_n$ be a sample from $N(\mu,\sigma^2)$, where both $\mu$ and $\sigma^2$ are unknown. Find the GLR test of $H_0\colon\sigma=\sigma_0$ against $H_1\colon\sigma\ne\sigma_0$.
5. Let $X_1,X_2,\ldots,X_k$ be a sample from the PMF
$$P_N\{X=j\}=\frac1N,\quad j=1,2,\ldots,N,\ N\ge 1\text{ an integer.}$$
(a) Find the GLR test of $H_0\colon N\le N_0$ against $H_1\colon N>N_0$.
(b) Find the GLR test of $H_0\colon N=N_0$ against $H_1\colon N\ne N_0$.
6. For a sample of size 1 from the PDF
$$f_\theta(x)=\frac{2}{\theta^2}(\theta-x),\quad 0<x<\theta,$$
find the GLR test of $\theta=\theta_0$ against $\theta\ne\theta_0$.
7. Let $X_1,X_2,\ldots,X_n$ be a sample from $G(1,\beta)$:
(a) Find the GLR test of $\beta=\beta_0$ against $\beta\ne\beta_0$.
(b) Find the GLR test of $\beta\le\beta_0$ against $\beta>\beta_0$.
8. Let $(X_1,Y_1),(X_2,Y_2),\ldots,(X_n,Y_n)$ be a random sample from a bivariate normal population with $EX_i=\mu_1$, $EY_i=\mu_2$, $\mathrm{var}(X_i)=\sigma^2$, $\mathrm{var}(Y_i)=\sigma^2$, and $\mathrm{cov}(X_i,Y_i)=\rho\sigma^2$. Show that the likelihood ratio test of the null hypothesis $H_0\colon\rho=0$ against $H_1\colon\rho\ne 0$ reduces to rejecting $H_0$ if $|R|>c$, where $R=2S_{11}/(S_1^2+S_2^2)$, with $S_{11}$, $S_1^2$, and $S_2^2$ the sample covariance and the sample variances, respectively. (For the PDF of the test statistic $R$, see Problem 7.7.1.)
9. Let $X_1,X_2,\ldots,X_m$ be iid $G(1,\theta)$ RVs and let $Y_1,Y_2,\ldots,Y_n$ be iid $G(1,\mu)$ RVs, where $\theta$ and $\mu$ are unknown positive real numbers. Assume that the $X$'s and the $Y$'s are independent. Develop an $\alpha$-level GLR test for testing $H_0\colon\theta=\mu$ against $H_1\colon\theta\ne\mu$.
10. A die is tossed 60 times in order to test $H_0\colon P\{j\}=1/6$, $j=1,2,\ldots,6$ (die is fair) against $H_1\colon P\{2\}=P\{4\}=P\{6\}=2/9,\ P\{1\}=P\{3\}=P\{5\}=1/9$. Find the GLR test.
11. Let $X_1,X_2,\ldots,X_n$ be iid with common PDF $f_\theta(x)=\exp\{-(x-\theta)\}$, $x>\theta$, and $=0$ otherwise. Find the level $\alpha$ GLR test for testing $H_0\colon\theta\le\theta_0$ against $H_1\colon\theta>\theta_0$.
12. Let $X_1,X_2,\ldots,X_n$ be iid RVs with common Pareto PDF $f_\theta(x)=\theta/x^2$ for $x>\theta$, and $=0$ elsewhere. Show that the family of joint PDFs has MLR in $X_{(1)}$ and find a size $\alpha$ test of $H_0\colon\theta=\theta_0$ against $H_1\colon\theta>\theta_0$. Show that the GLR test coincides with the UMP test.
10.3 CHI-SQUARE TESTS
In this section we consider a variety of tests where the test statistic has an exact or a limiting chi-square distribution. Chi-square tests are also used for testing some nonparametric hypotheses and will be taken up again in Chapter 13.
We begin with tests concerning variances in sampling from a normal population. Let $X_1,X_2,\ldots,X_n$ be iid $N(\mu,\sigma^2)$ RVs where $\sigma^2$ is unknown. We wish to test a hypothesis of the type $\sigma^2\ge\sigma_0^2$, $\sigma^2\le\sigma_0^2$, or $\sigma^2=\sigma_0^2$, where $\sigma_0$ is some given positive number. We summarize the tests in the following table.
Reject $H_0$ at level $\alpha$ if:

       H0        H1        μ known                                                    μ unknown
I.     σ ≥ σ0    σ < σ0    Σ(xi − μ)² ≤ σ0² χ²_{n,1−α}                                s² ≤ [σ0²/(n−1)] χ²_{n−1,1−α}
II.    σ ≤ σ0    σ > σ0    Σ(xi − μ)² ≥ σ0² χ²_{n,α}                                  s² ≥ [σ0²/(n−1)] χ²_{n−1,α}
III.   σ = σ0    σ ≠ σ0    Σ(xi − μ)² ≤ σ0² χ²_{n,1−α/2} or ≥ σ0² χ²_{n,α/2}          s² ≤ [σ0²/(n−1)] χ²_{n−1,1−α/2} or ≥ [σ0²/(n−1)] χ²_{n−1,α/2}

Remark 1. All these tests can be derived by the standard likelihood ratio procedure. If $\mu$ is unknown, tests I and II are UMP unbiased (and UMP invariant). If $\mu$ is known, tests I and II are UMP (see Example 9.4.5). For tests III we have chosen the constants $c_1,c_2$ so that each tail has probability $\alpha/2$. This is the customary procedure, even though it destroys the unbiasedness property of the tests, at least for small samples.
Example 1. A manufacturer claims that the lifetime of a certain brand of batteries produced by his factory has a variance of 5000 (hours)². A sample of size 26 has a variance of 7200 (hours)². Assuming that it is reasonable to treat these data as a random sample from a normal population, let us test the manufacturer's claim at the $\alpha=0.02$ level. Here $H_0\colon\sigma^2=5000$ is to be tested against $H_1\colon\sigma^2\ne 5000$. We reject $H_0$ if either
$$s^2=7200\le\frac{\sigma_0^2}{n-1}\chi^2_{n-1,1-\alpha/2}\quad\text{or}\quad s^2\ge\frac{\sigma_0^2}{n-1}\chi^2_{n-1,\alpha/2}.$$
We have
$$\frac{\sigma_0^2}{n-1}\chi^2_{n-1,1-\alpha/2}=\frac{5000}{25}\times 11.524=2304.8,\qquad\frac{\sigma_0^2}{n-1}\chi^2_{n-1,\alpha/2}=\frac{5000}{25}\times 44.314=8862.8.$$
Since $s^2$ is neither $\le 2304.8$ nor $\ge 8862.8$, we cannot reject the manufacturer's claim at level 0.02.
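The two cutoffs in Example 1 come directly from the $\chi^2_{25}$ quantiles and are easy to reproduce. A quick sketch (note that `chi2.ppf` returns lower-tail quantiles, so the book's upper-tail notation $\chi^2_{n-1,\alpha/2}$ corresponds to `ppf(1 - alpha/2)`):

```python
from scipy.stats import chi2

n, sigma0_sq, s_sq, alpha = 26, 5000.0, 7200.0, 0.02
df = n - 1
lower = sigma0_sq / df * chi2.ppf(alpha / 2, df)        # 5000/25 * 11.524 ~ 2304.8
upper = sigma0_sq / df * chi2.ppf(1 - alpha / 2, df)    # 5000/25 * 44.314 ~ 8862.8
print(round(lower, 1), round(upper, 1))
print("reject H0:", s_sq <= lower or s_sq >= upper)     # False at level 0.02
```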
A test based on a chi-square statistic is also used for testing the equality of several proportions. Let $X_1,X_2,\ldots,X_k$ be independent RVs with $X_i\sim b(n_i,p_i)$, $i=1,2,\ldots,k$, $k\ge 2$.
Theorem 1. The RV $\sum_{i=1}^{k}\{(X_i-n_ip_i)/\sqrt{n_ip_i(1-p_i)}\}^2$ converges in distribution to the $\chi^2(k)$ RV as $n_1,n_2,\ldots,n_k\to\infty$.
Proof. The proof is left as an exercise.
If $n_1,n_2,\ldots,n_k$ are large, we can use Theorem 1 to test $H_0\colon p_1=p_2=\cdots=p_k=p$ against all alternatives. If $p$ is known, we compute
$$y=\sum_{1}^{k}\left(\frac{x_i-n_ip}{\sqrt{n_ip(1-p)}}\right)^2,$$
and if $y\ge\chi^2_{k,\alpha}$, we reject $H_0$. In practice $p$ will be unknown. Let $\mathbf p=(p_1,p_2,\ldots,p_k)$. Then the likelihood function is
$$L(\mathbf p;x_1,\ldots,x_k)=\prod_{1}^{k}\binom{n_i}{x_i}p_i^{x_i}(1-p_i)^{n_i-x_i},$$
so that
$$\log L(\mathbf p;\mathbf x)=\sum_{i=1}^{k}\log\binom{n_i}{x_i}+\sum_{i=1}^{k}x_i\log p_i+\sum_{i=1}^{k}(n_i-x_i)\log(1-p_i).$$
The MLE $\hat p$ of $p$ under $H_0$ is therefore given by
$$\frac{\sum_{1}^{k}x_i}{p}-\frac{\sum_{1}^{k}(n_i-x_i)}{1-p}=0,$$
that is,
$$\hat p=\frac{x_1+x_2+\cdots+x_k}{n_1+n_2+\cdots+n_k}.$$
Under certain regularity assumptions (see Cramér [17, pp. 426–427]) it can be shown that the statistic
$$Y_1=\sum_{1}^{k}\frac{(X_i-n_i\hat p)^2}{n_i\hat p(1-\hat p)}\qquad(1)$$
is asymptotically $\chi^2(k-1)$. Thus the test rejects $H_0\colon p_1=p_2=\cdots=p_k=p$, $p$ unknown, at level $\alpha$ if $y_1\ge\chi^2_{k-1,\alpha}$.
It should be remembered that the tests based on Theorem 1 are all large-sample tests and hence not exact, in contrast to the tests concerning the variance discussed above, which are all exact tests. In the case $k=1$, UMP tests of $p\ge p_0$ and $p\le p_0$ exist and can be obtained by the MLR method described in Section 9.4. For testing $p=p_0$, the usual test is UMP unbiased.
In the case $k=2$, if $n_1$ and $n_2$ are large, a test based on the normal distribution can be used instead of Theorem 1. In this case the statistic
$$Z=\frac{X_1/n_1-X_2/n_2}{\sqrt{\hat p(1-\hat p)(1/n_1+1/n_2)}},\qquad(2)$$
where $\hat p=(X_1+X_2)/(n_1+n_2)$, is asymptotically $N(0,1)$ under $H_0\colon p_1=p_2=p$. If $p$ is known, one uses $p$ instead of $\hat p$. It is not too difficult to show that $Z^2$ is equal to $Y_1$, so that the two tests are equivalent.
For small samples the so-called Fisher–Irwin test is commonly used; it is based on the conditional distribution of $X_1$ given $T=X_1+X_2$. Let $\rho=[p_1(1-p_2)]/[p_2(1-p_1)]$. Then
$$P(X_1+X_2=t)=\sum_{j=0}^{t}\binom{n_1}{j}p_1^j(1-p_1)^{n_1-j}\binom{n_2}{t-j}p_2^{t-j}(1-p_2)^{n_2-t+j}=\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^j\,a(n_1,n_2),$$
where
$$a(n_1,n_2)=(1-p_1)^{n_1}(1-p_2)^{n_2}\{p_2/(1-p_2)\}^t.$$
It follows that
$$P\{X_1=x\mid X_1+X_2=t\}=\frac{\binom{n_1}{x}p_1^x(1-p_1)^{n_1-x}\binom{n_2}{t-x}p_2^{t-x}(1-p_2)^{n_2-t+x}}{a(n_1,n_2)\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^j}=\frac{\binom{n_1}{x}\binom{n_2}{t-x}\rho^x}{\sum_{j=0}^{t}\binom{n_1}{j}\binom{n_2}{t-j}\rho^j}.$$
On the boundary of any of the hypotheses $p_1=p_2$, $p_1\le p_2$, or $p_1\ge p_2$ we note that $\rho=1$, so that
$$P\{X_1=x\mid X_1+X_2=t\}=\frac{\binom{n_1}{x}\binom{n_2}{t-x}}{\binom{n_1+n_2}{t}},$$
which is a hypergeometric distribution. For testing $H_0\colon p_1\ge p_2$ this conditional test rejects if $X_1\le k(t)$, where $k(t)$ is the largest integer for which $P\{X_1\le k(t)\mid T=t\}\le\alpha$. Obvious modifications yield critical regions for testing $p_1=p_2$ and $p_1\le p_2$ against the corresponding alternatives.
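Since the null conditional distribution is hypergeometric, the conditional (Fisher–Irwin) test is a one-line computation once $t=x_1+x_2$ is fixed. A sketch with made-up counts (the data values are ours, not from the text):

```python
from scipy.stats import hypergeom

def fisher_irwin_lower(x1, n1, x2, n2):
    """One-sided conditional p-value P(X1 <= x1 | T = x1 + x2) under p1 = p2."""
    t = x1 + x2
    # Conditionally, X1 | T = t ~ hypergeometric(population n1 + n2, n1 of type 1, draw t)
    return hypergeom(n1 + n2, n1, t).cdf(x1)

# Illustrative data: 3 successes out of 12 in the first sample, 9 out of 11 in the second
print(round(fisher_irwin_lower(3, 12, 9, 11), 4))
```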
In applications a wide variety of problems can be reduced to the multinomial distribution model. We therefore consider the problem of testing the parameters of a multinomial distribution. Let $(X_1,X_2,\ldots,X_{k-1})$ be a sample from a multinomial distribution with parameters $n,p_1,p_2,\ldots,p_{k-1}$, and let us write $X_k=n-X_1-\cdots-X_{k-1}$ and $p_k=1-p_1-\cdots-p_{k-1}$. The difference between the model of Theorem 1 and the multinomial model is the independence of the $X_i$'s.
Theorem 2. Let $(X_1,X_2,\ldots,X_{k-1})$ be a multinomial RV with parameters $n,p_1,p_2,\ldots,p_{k-1}$. Then the RV
$$U_k=\sum_{i=1}^{k}\frac{(X_i-np_i)^2}{np_i}\qquad(3)$$
is asymptotically distributed as a $\chi^2(k-1)$ RV (as $n\to\infty$).
Proof. For the general proof we refer the reader to Cramér [17, pp. 417–419] or Ferguson [29, p. 61]. We will consider here the $k=2$ case to make the result a little more plausible. We have
$$U_2=\frac{(X_1-np_1)^2}{np_1}+\frac{(X_2-np_2)^2}{np_2}=\frac{(X_1-np_1)^2}{np_1}+\frac{[n-X_1-n(1-p_1)]^2}{n(1-p_1)}=(X_1-np_1)^2\left[\frac{1}{np_1}+\frac{1}{n(1-p_1)}\right]=\frac{(X_1-np_1)^2}{np_1(1-p_1)}.$$
It follows from Theorem 1 that $U_2\xrightarrow{L}Y$ as $n\to\infty$, where $Y\sim\chi^2(1)$.
To use Theorem 2 to test H_0: p_1 = p_1^0, \ldots, p_k = p_k^0, we need only compute the quantity

u = \sum_{i=1}^{k} \frac{(x_i - np_i^0)^2}{np_i^0}

from the sample; if n is large, we reject H_0 if u > \chi^2_{k-1,\alpha}.
Example 2. A die is rolled 120 times with the following results:

Face value:   1    2    3    4    5    6
Frequency:   20   30   20   25   15   10

Let us test the hypothesis that the die is fair at level \alpha = 0.05. The null hypothesis is H_0: p_i = 1/6, i = 1, 2, \ldots, 6, where p_i is the probability that the face value is i, 1 \le i \le 6. By Theorem 2 we reject H_0 if

u = \sum_{i=1}^{6} \frac{[x_i - 120(1/6)]^2}{120(1/6)} \ge \chi^2_{5,0.05}.

We have

u = 0 + \frac{10^2}{20} + 0 + \frac{5^2}{20} + \frac{5^2}{20} + \frac{10^2}{20} = 12.5.

Since \chi^2_{5,0.05} = 11.07, we reject H_0. Note that, if we choose \alpha = 0.025, then \chi^2_{5,0.025} = 12.8, and we cannot reject at this level.
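The same computation can be reproduced in Python (scipy assumed); scipy.stats.chisquare returns u = 12.5 together with the asymptotic P-value.

import numpy as np
from scipy.stats import chisquare, chi2

observed = np.array([20, 30, 20, 25, 15, 10])
expected = np.full(6, 120 / 6)                # 20 in each cell under H0: p_i = 1/6

u, p_value = chisquare(observed, expected)    # u = 12.5
print(u, p_value)
print(chi2.ppf(0.95, df=5), chi2.ppf(0.975, df=5))   # upper 5% and 2.5% points, 11.07 and 12.8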
Theorem 2 has much wider applicability, and we will later study its application to contingency tables. Here we consider the application of Theorem 2 to testing the null hypothesis that the DF of an RV X has a specified form.

Theorem 3. Let X_1, X_2, \ldots, X_n be a random sample on X. Also, let H_0: X \sim F, where the functional form of the DF F is known completely. Consider a collection of disjoint Borel sets A_1, A_2, \ldots, A_k that form a partition of the real line. Let P\{X \in A_i\} = p_i, i = 1, 2, \ldots, k, and assume that p_i > 0 for each i. Let Y_j = number of X_i's in A_j, j = 1, 2, \ldots, k. Then the joint distribution of (Y_1, Y_2, \ldots, Y_{k-1}) is multinomial with parameters n, p_1, p_2, \ldots, p_{k-1}. Clearly, Y_k = n - Y_1 - \cdots - Y_{k-1} and p_k = 1 - p_1 - \cdots - p_{k-1}.

The proof of Theorem 3 is obvious. One frequently selects A_1, A_2, \ldots, A_k as disjoint intervals. Theorem 3 is especially useful when one or more of the parameters associated with the DF F are unknown. In that case the following result is useful.
Theorem 4. Let H_0: X \sim F_\theta, where \theta = (\theta_1, \theta_2, \ldots, \theta_r) is unknown. Let X_1, X_2, \ldots, X_n be independent observations on X, and suppose that the MLEs of \theta_1, \theta_2, \ldots, \theta_r exist and are, respectively, \hat{\theta}_1, \hat{\theta}_2, \ldots, \hat{\theta}_r. Let A_1, A_2, \ldots, A_k be a collection of disjoint Borel sets that cover the real line, and let

\hat{p}_i = P_{\hat{\theta}}\{X \in A_i\} > 0,  i = 1, 2, \ldots, k,

where \hat{\theta} = (\hat{\theta}_1, \ldots, \hat{\theta}_r), and P_\theta is the probability distribution associated with F_\theta. Let Y_1, Y_2, \ldots, Y_k be the RVs defined as follows: Y_i = number of X_1, X_2, \ldots, X_n in A_i, i = 1, 2, \ldots, k. Then the RV

V_k = \sum_{i=1}^{k} \frac{(Y_i - n\hat{p}_i)^2}{n\hat{p}_i}

is asymptotically distributed as a \chi^2(k-r-1) RV (as n \to \infty).

The proof of Theorem 4 and some regularity conditions required on F_\theta are given in Rao [88, pp. 391–392].
To test H_0: X \sim F, where F is completely specified, we reject H_0 if

u = \sum_{i=1}^{k} \frac{(y_i - np_i)^2}{np_i} \ge \chi^2_{k-1,\alpha},

provided that n is sufficiently large. If the null hypothesis is H_0: X \sim F_\theta, where F_\theta is known except for the parameter \theta, we use Theorem 4 and reject H_0 if

v = \sum_{i=1}^{k} \frac{(y_i - n\hat{p}_i)^2}{n\hat{p}_i} \ge \chi^2_{k-r-1,\alpha},

where r is the number of parameters estimated.
Example 3. The following data were obtained from a table of random numbers from a normal distribution with mean 0 and variance 1:

 0.464   0.137   2.455  −0.323  −0.068
 0.906  −0.513  −0.525   0.595   0.881
−0.482   1.678  −0.057  −1.229  −0.486
−1.787  −0.261   1.237   1.046  −0.508

We want to test the null hypothesis that the DF F from which the data came is normal with mean 0 and variance 1. Here F is completely specified. Let us choose three intervals (−\infty, −0.5], (−0.5, 0.5], and (0.5, \infty). We see that Y_1 = 5, Y_2 = 8, and Y_3 = 7. Also, if Z is N(0,1), then p_1 = 0.3085, p_2 = 0.3830, and p_3 = 0.3085. Thus

u = \sum_{i=1}^{3} \frac{(y_i - np_i)^2}{np_i}
  = \frac{(5 - 20 \times 0.3085)^2}{6.17} + \frac{(8 - 20 \times 0.383)^2}{7.66} + \frac{(7 - 20 \times 0.3085)^2}{6.17} < 1.

Also, \chi^2_{2,0.05} = 5.99, so we cannot reject H_0 at level 0.05.
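A sketch of the same cell-count computation (scipy assumed); the cell probabilities are taken from the standard normal DF.

import numpy as np
from scipy.stats import norm, chi2

data = np.array([0.464, 0.137, 2.455, -0.323, -0.068,
                 0.906, -0.513, -0.525, 0.595, 0.881,
                 -0.482, 1.678, -0.057, -1.229, -0.486,
                 -1.787, -0.261, 1.237, 1.046, -0.508])

edges = [-np.inf, -0.5, 0.5, np.inf]
observed = np.histogram(data, bins=edges)[0]   # cell counts Y_1, Y_2, Y_3 = 5, 8, 7
p = np.diff(norm.cdf(edges))                   # 0.3085, 0.3830, 0.3085
expected = len(data) * p

u = np.sum((observed - expected) ** 2 / expected)
print(observed, u, chi2.ppf(0.95, df=2))       # u < 1 < 5.99, so do not reject H0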
Example 4. In a 72-hour period on a long holiday weekend there was a total of 306 fatal automobile accidents. The data are as follows:

Number of Fatal Accidents per Hour:   0 or 1    2    3    4    5    6    7    8 or more
Number of Hours:                          4    10   15   12   12    6    6        7

Let us test the hypothesis that the number of accidents per hour is a Poisson RV. Since the mean of the Poisson RV is not given, we estimate it by

\hat{\lambda} = \bar{x} = \frac{306}{72} = 4.25.

Let us now estimate \hat{p}_i = P_{\hat{\lambda}}\{X = i\}, i = 0, 1, 2, \ldots; \hat{p}_0 = e^{-\hat{\lambda}} = 0.0143. Note that

\frac{P_{\hat{\lambda}}\{X = x+1\}}{P_{\hat{\lambda}}\{X = x\}} = \frac{\hat{\lambda}}{x+1},

so that \hat{p}_{i+1} = [\hat{\lambda}/(i+1)]\hat{p}_i. Thus

\hat{p}_1 = 0.0606,  \hat{p}_2 = 0.1288,  \hat{p}_3 = 0.1825,  \hat{p}_4 = 0.1939,
\hat{p}_5 = 0.1648,  \hat{p}_6 = 0.1167,  \hat{p}_7 = 0.0709,  \hat{p}_8 = 1 - 0.9325 = 0.0675.

The observed and expected frequencies are as follows:

i:                                    0 or 1     2      3      4      5     6     7    8 or more
Observed frequency, o_i:                  4     10     15     12     12     6     6        7
Expected frequency, e_i = 72\hat{p}_i:  5.38   9.28  13.14  13.96  11.87  8.41  5.10     4.86
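A sketch of the Poisson fit above (scipy assumed); the classes "0 or 1" and "8 or more" are pooled exactly as in the text, and one degree of freedom is subtracted for the estimated mean.

import numpy as np
from scipy.stats import poisson, chi2

observed = np.array([4, 10, 15, 12, 12, 6, 6, 7])   # classes 0-1, 2, ..., 7, 8+
hours, accidents = 72, 306
lam = accidents / hours                              # MLE 4.25

probs = [poisson.pmf(0, lam) + poisson.pmf(1, lam)]  # class "0 or 1"
probs += [poisson.pmf(i, lam) for i in range(2, 8)]
probs.append(1 - sum(probs))                         # class "8 or more"
expected = hours * np.array(probs)

u = np.sum((observed - expected) ** 2 / expected)    # about 2.7
df = len(observed) - 1 - 1                           # k - r - 1 = 6
print(u, chi2.ppf(0.95, df))                         # cutoff about 12.6; do not reject H0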
Thus

u = \sum_{i=1}^{8} \frac{(o_i - e_i)^2}{e_i} = 2.74.

Since we estimated one parameter, the number of degrees of freedom is k - r - 1 = 8 - 1 - 1 = 6. From Table ST3, \chi^2_{6,0.05} = 12.6, and since 2.74 < 12.6, we cannot reject the null hypothesis.

Remark 2. Any application of Theorem 3 or 4 requires that we choose sets A_1, A_2, \ldots, A_k, and frequently these are chosen to be disjoint intervals. As a rule of thumb, we choose the length of each interval in such a way that the probability P\{X \in A_i\} under H_0 is approximately 1/k. Moreover, it is desirable to have n/k \ge 5 or, rather, e_i \ge 5 for each i. If any of the e_i's is < 5, the corresponding interval is pooled with one or more adjoining intervals to make the cell frequency at least 5. The number of degrees of freedom, if any pooling is done, is the number of classes after pooling, minus 1, minus the number of parameters estimated.
Finally, we consider a test of homogeneity of several multinomial distributions. Suppose we have c samples of sizes n_1, n_2, \ldots, n_c from c multinomial distributions. Let the probabilities associated with the jth population be (p_{1j}, p_{2j}, \ldots, p_{rj}), where \sum_{i=1}^{r} p_{ij} = 1, j = 1, 2, \ldots, c. Given observations N_{ij}, i = 1, 2, \ldots, r, j = 1, 2, \ldots, c, with \sum_{i=1}^{r} N_{ij} = n_j, j = 1, 2, \ldots, c, we wish to test H_0: p_{ij} = p_i for j = 1, 2, \ldots, c, i = 1, 2, \ldots, r-1. The case c = 1 is covered by Theorem 2. By Theorem 2, for each j,

U_r = \sum_{i=1}^{r} \frac{(N_{ij} - n_j p_i)^2}{n_j p_i}

has a limiting \chi^2_{r-1} distribution. Since the samples are independent, the statistic

U_{rc} = \sum_{j=1}^{c} \sum_{i=1}^{r} \frac{(N_{ij} - n_j p_i)^2}{n_j p_i}

has a limiting \chi^2_{c(r-1)} distribution. If the p_i's are unknown, we use the MLEs

\hat{p}_i = \frac{\sum_{j=1}^{c} N_{ij}}{\sum_{j=1}^{c} n_j},  i = 1, 2, \ldots, r-1,

for p_i, and we see that the statistic

V_{rc} = \sum_{j=1}^{c} \sum_{i=1}^{r} \frac{(N_{ij} - n_j \hat{p}_i)^2}{n_j \hat{p}_i}

has a limiting chi-square distribution with c(r-1) - (r-1) = (c-1)(r-1) d.f. We reject H_0 at (approximate) level \alpha if v_{rc} > \chi^2_{(r-1)(c-1),\alpha}.
Example 5. A market analyst believes that there is no difference in preferences of television viewers among the four Ohio cities of Toledo, Columbus, Cleveland, and Cincinnati. In order to test this belief, independent random samples of 150, 200, 250, and 200 persons were selected from the four cities and asked, "What type of program do you prefer most: Mystery, Soap, Comedy, or News Documentary?" The following responses were recorded.

Program Type    Toledo   Columbus   Cleveland   Cincinnati
Mystery            50        70         85          60
Soap               45        50         58          40
Comedy             35        50         72          67
News               20        30         35          33
Sample Size       150       200        250         200

Under the null hypothesis that the proportions of viewers who prefer the four types of programs are the same in each city, the maximum likelihood estimates of p_i, i = 1, 2, 3, 4, are given by

\hat{p}_1 = \frac{50+70+85+60}{150+200+250+200} = \frac{265}{800} = 0.33,   \hat{p}_2 = \frac{45+50+58+40}{800} = \frac{193}{800} = 0.24,
\hat{p}_3 = \frac{35+50+72+67}{800} = \frac{224}{800} = 0.28,   \hat{p}_4 = \frac{20+30+35+33}{800} = \frac{118}{800} = 0.15.

Here p_1 = proportion of people who prefer mystery, and so on. The following table gives the expected frequencies under H_0.

Expected Number of Responses Under H_0
Program Type    Toledo              Columbus            Cleveland            Cincinnati
Mystery         150 × 0.33 = 49.5   200 × 0.33 = 66     250 × 0.33 = 82.5    200 × 0.33 = 66
Soap            150 × 0.24 = 36     200 × 0.24 = 48     250 × 0.24 = 60      200 × 0.24 = 48
Comedy          150 × 0.28 = 42     200 × 0.28 = 56     250 × 0.28 = 70      200 × 0.28 = 56
News            150 × 0.15 = 22.5   200 × 0.15 = 30     250 × 0.15 = 37.5    200 × 0.15 = 30
Sample Size     150                 200                 250                  200

It follows that

v_{44} = \frac{(50-49.5)^2}{49.5} + \frac{(45-36)^2}{36} + \frac{(35-42)^2}{42} + \frac{(20-22.5)^2}{22.5}
       + \frac{(70-66)^2}{66} + \frac{(50-48)^2}{48} + \frac{(50-56)^2}{56} + \frac{(30-30)^2}{30}
       + \frac{(85-82.5)^2}{82.5} + \frac{(58-60)^2}{60} + \frac{(72-70)^2}{70} + \frac{(35-37.5)^2}{37.5}
       + \frac{(60-66)^2}{66} + \frac{(40-48)^2}{48} + \frac{(67-56)^2}{56} + \frac{(33-30)^2}{30}
       = 9.37.

Since c = 4 and r = 4, the number of degrees of freedom is (4-1)(4-1) = 9, and we note that under H_0

0.30 < P(V_{44} \ge 9.37) < 0.50.

With such a large P-value we can hardly reject H_0. The data do not offer any evidence to conclude that the proportions in the four cities are different.
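For this sampling design the homogeneity statistic V_{rc} coincides with the familiar chi-square statistic for the 4 × 4 table, so scipy.stats.chi2_contingency (scipy assumed available) can be used as a check. It uses the exact estimates 265/800, 193/800, etc., rather than the rounded values above, so it returns roughly 9.3 rather than 9.37.

import numpy as np
from scipy.stats import chi2_contingency

counts = np.array([[50, 70, 85, 60],    # Mystery
                   [45, 50, 58, 40],    # Soap
                   [35, 50, 72, 67],    # Comedy
                   [20, 30, 35, 33]])   # News

stat, p_value, df, expected = chi2_contingency(counts, correction=False)
print(stat, df, p_value)   # about 9.3 with 9 d.f.; the P-value is large, so H0 is not rejected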
PROBLEMS 10.3
1.The standard deviation of capacity for batteries of a standard type is known to be 1.66
ampere-hours. The following capacities (ampere-hours) were recorded for 10 bat-
teries of a new type: 146, 141, 135, 142, 140, 143, 138, 137, 142, 136. Does the
new battery differ from the standard type with respect to variability of capacity
(Natrella [75, p. 4-1])?
2.A manufacturer recorded the cut-off bias (volts) of a sample of 10 tubes as follows:
12.1, 12.3, 11.8, 12.0, 12.4, 12.0, 12.1, 11.9, 12.2, 12.2. The variability of cut-off
bias for tubes of a standard type as measured by the standard deviation is 0.208
volts. Is the variability of the new tube, with respect to cut-off bias less than that of
the standard type (Natrella [75, p. 4-5])?
3.Approximately equal numbers of four different types of meters are in service and
all types are believed to be equally likely to break down. The actual numbers of
breakdowns reported are as follows:
Type of Meter:                        1    2    3    4
Number of Breakdowns Reported:       30   40   33   47

Is there evidence to conclude that the chances of failure of the four types are not equal (Natrella [75, p. 9-4])?
4.Every clinical thermometer is classified into one of four categories,A,B,C,D,on
the basis of inspection and test. From past experience it is known that thermometers
produced by a certain manufacturer are distributed among the four categories in the
following proportions:

Category:     A      B      C      D
Proportion:  0.87   0.09   0.03   0.01
A new lot of 1336 thermometers is submitted by the manufacturer for inspection and
test, and the following distribution into the four categories results:
Category:                              A      B     C     D
Number of Thermometers Reported:     1188    91    47    10

Does this new lot of thermometers differ from the previous experience with regard to the proportion of thermometers in each category (Natrella [75, p. 9-2])?
5.A computer program is written to generate random numbers,X, uniformly in the
interval 0≤X<10. From 250 consecutive values the following data are obtained:
X-value 0–1.99 2–3.99 4–5.99 6–7.99 8–9.99
Frequency 38 55 54 41 62
Do these data offer any evidence that the program is not written properly?
6.A machine working correctly cuts pieces of wire to a mean length of 10.5 cm with a standard deviation of 0.15 cm. Sixteen samples of wire were drawn at random from a
production batch and measured with the following results (centimeters): 10.4, 10.6,
10.1, 10.3, 10.2, 10.9, 10.5, 10.8, 10.6, 10.5, 10.7, 10.2, 10.7, 10.3, 10.4, 10.5. Test
the hypothesis that the machine is working correctly.
7.An experiment consists in tossing a coin until the first head shows up. One hun-
dred repetitions of this experiment are performed. The frequency distribution of the
number of trials required for the first head is as follows:
Number of trials:    1    2    3    4    5 or more
Frequency:          40   32   15    7    6
Can we conclude that the coin is fair?
8.Fit a binomial distribution to the following data:
x:            0    1    2    3    4
Frequency:    8   46   55   40   11
9. Prove Theorem 1.

10. Three dice are rolled independently 360 times each with the following results.

Face Value    Die 1   Die 2   Die 3
1               50      62      38
2               48      55      60
3               69      61      64
4               45      54      58
5               71      78      73
6               77      50      67
Sample Size    360     360     360

Are all the dice equally loaded? That is, test the hypothesis H_0: p_{i1} = p_{i2} = p_{i3}, i = 1, 2, \ldots, 6, where p_{i1} is the probability of getting an i with die 1, and so on.
11.Independent random samples of 250 Democrats, 150 Republicans, and 100 Indepen-
dent voters were selected 1 week before a nonpartisan election for mayor of a large
city. Their preference for candidates Albert, Basu, and Chatfield were recorded as
follows.
Party Affiliation
Preference Democrat Republican Independent
Albert 160 70 90
Basu 32 45 25
Chatfield 30 23 15
Undecided 28 12 20
Sample Size 250 150 150
Are the proportions of voters in favor of Albert, Basu, and Chatfield the same within
each political affiliation?
12.Of 25 income tax returns audited in a small town, 10 were from low- and middle-
income families and 15 from high-income families. Two of the low-income families
and four of the high-income families were found to have underpaid their taxes. Are
the two proportions of families who underpaid taxes the same?
13.A candidate for a congressional seat checks her progress by taking a random sample
of 20 voters each week. Last week, six reported to be in her favor. This week nine
reported to be in her favor. Is there evidence to suggest that her campaign is working?
14. Let \{X_{11}, X_{21}, \ldots, X_{r1}\}, \ldots, \{X_{1c}, X_{2c}, \ldots, X_{rc}\} be independent multinomial RVs with parameters (n_1, p_{11}, p_{21}, \ldots, p_{r1}), \ldots, (n_c, p_{1c}, p_{2c}, \ldots, p_{rc}), respectively. Let X_{i\cdot} = \sum_{j=1}^{c} X_{ij} and \sum_{j=1}^{c} n_j = n. Show that the GLR test for testing H_0: p_{ij} = p_i, for j = 1, 2, \ldots, c, i = 1, 2, \ldots, r-1, where the p_i's are unknown, against all alternatives can be based on the statistic

\lambda(X) = \frac{\prod_{i=1}^{r} (X_{i\cdot}/n)^{X_{i\cdot}}}{\prod_{i=1}^{r} \prod_{j=1}^{c} (X_{ij}/n_j)^{X_{ij}}}.
10.4 t-TESTS

In this section we investigate one of the most frequently used types of tests in statistics, the tests based on a t-statistic. Let X_1, X_2, \ldots, X_n be a random sample from N(\mu, \sigma^2), and, as usual, let us write

\bar{X} = n^{-1} \sum_{i=1}^{n} X_i,   S^2 = (n-1)^{-1} \sum_{i=1}^{n} (X_i - \bar{X})^2.

The tests for the usual null hypotheses about the mean can be derived using the GLR method. In the following table we summarize the results.

                                      Reject H_0 at level \alpha if
H_0                H_1              \sigma^2 known                                        \sigma^2 unknown
I.   \mu \le \mu_0   \mu > \mu_0    \bar{x} \ge \mu_0 + (\sigma/\sqrt{n}) z_\alpha         \bar{x} \ge \mu_0 + (s/\sqrt{n}) t_{n-1,\alpha}
II.  \mu \ge \mu_0   \mu < \mu_0    \bar{x} \le \mu_0 + (\sigma/\sqrt{n}) z_{1-\alpha}     \bar{x} \le \mu_0 + (s/\sqrt{n}) t_{n-1,1-\alpha}
III. \mu = \mu_0     \mu \ne \mu_0  |\bar{x} - \mu_0| \ge (\sigma/\sqrt{n}) z_{\alpha/2}   |\bar{x} - \mu_0| \ge (s/\sqrt{n}) t_{n-1,\alpha/2}

Remark 1. A test based on a t-statistic is called a t-test. The t-tests in I and II are called one-tailed tests; the t-test in III, a two-tailed test.

Remark 2. If \sigma^2 is known, tests I and II are UMP and test III is UMP unbiased. If \sigma^2 is unknown, the t-tests are UMP unbiased and UMP invariant.

Remark 3. If n is large we may use normal tables instead of t-tables. The assumption of normality may also be dropped because of the central limit theorem. For small samples care is required in applying the proper test, since the tail probabilities under the normal distribution and the t-distribution differ significantly for small n (see Remark 6.4.2).

Example 1. Nine determinations of copper in a certain solution yielded a sample mean of 8.3 percent with a standard deviation of 0.025 percent. Let \mu be the mean of the population of such determinations. Let us test H_0: \mu = 8.42 against H_1: \mu < 8.42 at level \alpha = 0.05.

Here n = 9, \bar{x} = 8.3, s = 0.025, \mu_0 = 8.42, and t_{n-1,1-\alpha} = -t_{8,0.05} = -1.860. Thus

\mu_0 + \frac{s}{\sqrt{n}} t_{n-1,1-\alpha} = 8.42 - \frac{0.025}{3}(1.86) = 8.4045.

We reject H_0 since 8.3 < 8.4045.
We next consider the two-sample case. Let X_1, X_2, \ldots, X_m and Y_1, Y_2, \ldots, Y_n be independent random samples from N(\mu_1, \sigma_1^2) and N(\mu_2, \sigma_2^2), respectively. Let us write

\bar{X} = m^{-1} \sum_{i=1}^{m} X_i,   \bar{Y} = n^{-1} \sum_{i=1}^{n} Y_i,
S_1^2 = (m-1)^{-1} \sum_{i=1}^{m} (X_i - \bar{X})^2,   S_2^2 = (n-1)^{-1} \sum_{i=1}^{n} (Y_i - \bar{Y})^2,

and

S_p^2 = \frac{(m-1)S_1^2 + (n-1)S_2^2}{m+n-2}.

S_p^2 is sometimes called the pooled sample variance. The following table summarizes the two-sample tests comparing \mu_1 and \mu_2 (\delta = known constant):

                                                       Reject H_0 at level \alpha if
H_0                        H_1                       \sigma_1^2, \sigma_2^2 known                                           \sigma_1^2, \sigma_2^2 unknown, \sigma_1 = \sigma_2
I.   \mu_1-\mu_2 \le \delta   \mu_1-\mu_2 > \delta   \bar{x}-\bar{y} \ge \delta + z_\alpha \sqrt{\sigma_1^2/m + \sigma_2^2/n}        \bar{x}-\bar{y} \ge \delta + t_{m+n-2,\alpha}\, s_p \sqrt{1/m + 1/n}
II.  \mu_1-\mu_2 \ge \delta   \mu_1-\mu_2 < \delta   \bar{x}-\bar{y} \le \delta - z_\alpha \sqrt{\sigma_1^2/m + \sigma_2^2/n}        \bar{x}-\bar{y} \le \delta - t_{m+n-2,\alpha}\, s_p \sqrt{1/m + 1/n}
III. \mu_1-\mu_2 = \delta     \mu_1-\mu_2 \ne \delta |\bar{x}-\bar{y}-\delta| \ge z_{\alpha/2} \sqrt{\sigma_1^2/m + \sigma_2^2/n}    |\bar{x}-\bar{y}-\delta| \ge t_{m+n-2,\alpha/2}\, s_p \sqrt{1/m + 1/n}
Remark 4. The case of most interest is that in which \delta = 0. If \sigma_1^2, \sigma_2^2 are unknown and \sigma_1^2 = \sigma_2^2 = \sigma^2, say, then S_p^2 is an unbiased estimate of \sigma^2. In this case all the two-sample t-tests are UMP unbiased and UMP invariant. Before applying the t-test, one should first make sure that \sigma_1^2 = \sigma_2^2, both unknown. This means applying another test on the data. We will consider this test in the next section.

Remark 5. If m + n is large, we use normal tables; if both m and n are large, we can drop the assumption of normality, using the CLT.

Remark 6. The problem of equality of means in sampling from several populations will be considered in Chapter 12.
Remark 7. The two-sample problem when \sigma_1 \ne \sigma_2, both unknown, is commonly referred to as the Behrens–Fisher problem. The Welch approximate t-test of H_0: \mu_1 = \mu_2 is based on a random number of d.f. f given by

f = \left\{ \left(\frac{R}{1+R}\right)^2 \frac{1}{m-1} + \left(\frac{1}{1+R}\right)^2 \frac{1}{n-1} \right\}^{-1},

where

R = \frac{S_1^2/m}{S_2^2/n},

and the t-statistic

T = \frac{(\bar{X}-\bar{Y}) - (\mu_1-\mu_2)}{\sqrt{S_1^2/m + S_2^2/n}}

with f d.f. This approximation has been found to be quite good even for small samples. The formula for f generally leads to noninteger d.f. Linear interpolation in the t-table can be used to obtain the required percentiles for f d.f.
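A minimal sketch of Welch's approximate d.f. f and the corresponding two-sided test from summary statistics (scipy assumed); when the raw data are available, scipy.stats.ttest_ind(..., equal_var=False) applies the same approximation.

from math import sqrt
from scipy.stats import t

def welch_test(xbar, s1, m, ybar, s2, n):
    # R and f exactly as in Remark 7
    R = (s1 ** 2 / m) / (s2 ** 2 / n)
    f = 1.0 / ((R / (1 + R)) ** 2 / (m - 1) + (1 / (1 + R)) ** 2 / (n - 1))
    T = (xbar - ybar) / sqrt(s1 ** 2 / m + s2 ** 2 / n)
    p_value = 2 * t.sf(abs(T), df=f)    # scipy's t allows noninteger d.f.
    return T, f, p_value

print(welch_test(1309, 420, 9, 1205, 390, 16))   # the light-bulb data of Example 2 below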
Example 2. The mean life of a sample of 9 light bulbs was observed to be 1309 hours with a standard deviation of 420 hours. A second sample of 16 bulbs chosen from a different batch showed a mean life of 1205 hours with a standard deviation of 390 hours. Let us test to see whether there is a significant difference between the means of the two batches, assuming that the population variances are the same (see also Example 10.5.1).

Here H_0: \mu_1 = \mu_2, H_1: \mu_1 \ne \mu_2, m = 9, n = 16,

\bar{x} = 1309,  s_1 = 420,  \bar{y} = 1205,  s_2 = 390,

and let us take \alpha = 0.05. We have

s_p = \sqrt{\frac{8(420)^2 + 15(390)^2}{23}},

so that

t_{m+n-2,\alpha/2}\, s_p \sqrt{\frac{1}{m} + \frac{1}{n}} = t_{23,0.025} \sqrt{\frac{8(420)^2 + 15(390)^2}{23}} \sqrt{\frac{1}{9} + \frac{1}{16}} = 345.44.

Since |\bar{x}-\bar{y}| = |1309 - 1205| = 104 < 345.44, we cannot reject H_0 at level \alpha = 0.05.
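A sketch of the pooled two-sample computation in Example 2 from the summary statistics (scipy assumed).

from math import sqrt
from scipy.stats import t

m, xbar, s1 = 9, 1309, 420
n, ybar, s2 = 16, 1205, 390
alpha = 0.05

sp = sqrt(((m - 1) * s1 ** 2 + (n - 1) * s2 ** 2) / (m + n - 2))   # pooled SD
margin = t.ppf(1 - alpha / 2, df=m + n - 2) * sp * sqrt(1 / m + 1 / n)
print(margin, abs(xbar - ybar) >= margin)   # about 345.4; |104| < 345.4, do not reject H0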
Quite frequently one samples from a bivariate normal population with means \mu_1, \mu_2, variances \sigma_1^2, \sigma_2^2, and correlation coefficient \rho, the hypothesis of interest being \mu_1 = \mu_2. Let (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n) be a sample from a bivariate normal distribution with parameters \mu_1, \mu_2, \sigma_1^2, \sigma_2^2, and \rho. Then X_j - Y_j is N(\mu_1 - \mu_2, \sigma^2), where \sigma^2 = \sigma_1^2 + \sigma_2^2 - 2\rho\sigma_1\sigma_2. We can therefore treat D_j = X_j - Y_j, j = 1, 2, \ldots, n, as a sample from a normal population. Let us write

\bar{d} = \frac{\sum_{i=1}^{n} d_i}{n}   and   s_d^2 = \frac{\sum_{i=1}^{n} (d_i - \bar{d})^2}{n-1}.

The following table summarizes the resulting tests (d_0 = known constant):

H_0                       H_1                      Reject H_0 at level \alpha if
I.   \mu_1-\mu_2 \ge d_0    \mu_1-\mu_2 < d_0      \bar{d} \le d_0 + (s_d/\sqrt{n}) t_{n-1,1-\alpha}
II.  \mu_1-\mu_2 \le d_0    \mu_1-\mu_2 > d_0      \bar{d} \ge d_0 + (s_d/\sqrt{n}) t_{n-1,\alpha}
III. \mu_1-\mu_2 = d_0      \mu_1-\mu_2 \ne d_0    |\bar{d} - d_0| \ge (s_d/\sqrt{n}) t_{n-1,\alpha/2}

Remark 8. The case of most importance is that in which d_0 = 0. All the t-tests based on the D_j's are UMP unbiased and UMP invariant. If \sigma is known, one can base the test on a standardized normal RV, but in practice such an assumption is quite unrealistic. If n is large, one can replace t-values by the corresponding critical values under the normal distribution.

Remark 9. Clearly, it is not necessary to assume that (X_1, Y_1), \ldots, (X_n, Y_n) is a sample from a bivariate normal population. It suffices to assume that the differences D_i form a sample from a normal population.
Example 3. Nine adults agreed to test the efficacy of a new diet program. Their weights (pounds) were measured before and after the program and found to be as follows:

Participant:   1    2    3    4    5    6    7    8    9
Before:      132  139  126  114  122  132  142  119  126
After:       124  141  118  116  114  132  145  123  121

Let us test the null hypothesis that the diet is not effective, H_0: \mu_1 - \mu_2 = 0, against the alternative, H_1: \mu_1 - \mu_2 > 0, that it is effective, at level \alpha = 0.01. We compute

\bar{d} = \frac{8 - 2 + 8 - 2 + 8 + 0 - 3 - 4 + 5}{9} = \frac{18}{9} = 2,
s_d^2 = 26.75,   s_d = 5.17.

Thus

d_0 + \frac{s_d}{\sqrt{n}} t_{n-1,\alpha} = 0 + \frac{5.17}{\sqrt{9}} t_{8,0.01} = \frac{5.17}{3} \times 2.896 = 4.99.

Since \bar{d} = 2 < 4.99, we cannot reject the hypothesis H_0 that the diet is not very effective.
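The paired computation of Example 3 in Python (scipy assumed); scipy.stats.ttest_rel would give the same t-statistic directly from the two columns.

import numpy as np
from scipy.stats import t

before = np.array([132, 139, 126, 114, 122, 132, 142, 119, 126])
after  = np.array([124, 141, 118, 116, 114, 132, 145, 123, 121])

d = before - after
n = len(d)
dbar, sd = d.mean(), d.std(ddof=1)                   # 2 and about 5.17
cutoff = (sd / np.sqrt(n)) * t.ppf(0.99, df=n - 1)   # d0 = 0, alpha = 0.01
print(dbar, cutoff, dbar >= cutoff)                  # 2 < 4.99, do not reject H0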
PROBLEMS 10.4
1.The manufacturer of a certain subcompact car claims that the average mileage of
this model is 30 miles per gallon of regular gasoline. For nine cars of this model
driven in an identical manner, using 1 gallon of regular gasoline, the mean distance
traveled was 26 miles with a standard deviation of 2.8 miles. Test the manufacturer’s
claim if you are willing to reject a true claim no more than twice in 100.
2.The nicotine contents of five cigarettes of a certain brand showed a mean of 21.2
milligrams with a standard deviation of 2.05 milligrams. Test the hypothesis that the
average nicotine content of this brand of cigarettes does not exceed 19.7 milligrams.
Useα=0.05.
3.The additional hours of sleep gained by eight patients in an experiment with a certain
drug were recorded as follows:
Patient:         1     2     3    4    5    6     7    8
Hours Gained:   0.7  −1.1   3.4  0.8  2.0  0.1  −0.2  3.0

Assuming that these patients form a random sample from a population of such patients and that the number of additional hours gained from the drug is a normal random variable, test the hypothesis that the drug has no effect at level \alpha = 0.10.
4.The mean life of a sample of 8 light bulbs was found to be 1432 hours with a standard deviation of 436 hours. A second sample of 19 bulbs chosen from a different batch
produced a mean life of 1310 hours with a standard deviation of 382 hours. Making
appropriate assumptions, test the hypothesis that the two samples came from the
same population of light bulbs at levelα=0.05.
5.A sample of 25 observations has a mean of 57.6 and a variance of 1.8. A further
sample of 20 values has a mean of 55.4 and a variance of 2.5. Test the hypothesis
that the two samples came from the same normal population.
6. Two methods were used in a study of the latent heat of fusion of ice. Both method A and method B were conducted with the specimens cooled to −0.72°C. The following data represent the change in total heat from −0.72°C to water at 0°C, in calories per gram of mass:

Method A: 79.98, 80.04, 80.02, 80.04, 80.03, 80.03, 80.04, 79.97, 80.05, 80.03, 80.02, 80.00, 80.02
Method B: 80.02, 79.74, 79.98, 79.97, 79.97, 80.03, 79.95, 79.97

Perform a test at level 0.05 to see whether the two methods differ with regard to their average performance (Natrella [75, p. 3-23]).
7. In Problem 6, if it is known from past experience that the standard deviations of the two methods are \sigma_A = 0.024 and \sigma_B = 0.033, test the hypothesis that the methods are the same with regard to their average performance at level \alpha = 0.05.
8.During World War II bacterial polysaccharides were investigated as blood plasma
extenders. Sixteen samples of hydrolyzed polysaccharides supplied by various man-
ufacturers in order to assess two chemical methods for determining the average
molecular weight yielded the following results:
MethodA:62,700;29,100;44,400;47,800;36,300;40,000;43,400;35,800;
33,900;44,200;34,300;31,300;38,400;47,100;42,100;42,200
MethodB:56,400;27,500;42,200;46,800;33,300;37,100;37,300;36,200;
35,200;38,000;32,200;27,300;36,100;43,100;38,400;39,900
Perform an appropriate test of the hypothesis that the two averages are the same
against a one-sided alternative that the average of MethodAexceeds that of
MethodB.Useα=0.05. (Natrella [75, p. 3-38]).
9.The following grade-point averages were collected over a period of 7 years to
determine whether membership in a fraternity is beneficial or detrimental to grades:
Year
1234567
Fraternity 2.4 2.0 2.3 2.1 2.1 2.0 2.0
Nonfraternity 2.4 2.2 2.5 2.4 2.3 1.8 1.9
Assuming that the populations were normal, test at the 0.025 level of significance
whether membership in a fraternity is detrimental to grades.
10. Consider the two-sample t-statistic T = (\bar{X} - \bar{Y})/[S_p \sqrt{1/m + 1/n}], where S_p^2 = [(m-1)S_1^2 + (n-1)S_2^2]/(m+n-2). Suppose \sigma_1 \ne \sigma_2. Let m, n \to \infty such that m/(m+n) \to \rho. Show that, under \mu_1 = \mu_2, T \xrightarrow{L} U, where U \sim N(0, \tau^2), with

\tau^2 = \frac{(1-\rho)\sigma_1^2 + \rho\sigma_2^2}{\rho\sigma_1^2 + (1-\rho)\sigma_2^2}.

Thus when m \approx n, \rho \approx 1/2 and \tau^2 \approx 1, and T is approximately N(0,1) as m (\approx n) \to \infty. In this case, a t-test based on T will have approximately the right level.
10.5 F-TESTS

The term F-tests refers to tests based on an F-statistic. Let X_1, X_2, \ldots, X_m and Y_1, Y_2, \ldots, Y_n be independent samples from N(\mu_1, \sigma_1^2) and N(\mu_2, \sigma_2^2), respectively. We recall that \sum_{i=1}^{m} (X_i - \bar{X})^2/\sigma_1^2 \sim \chi^2(m-1) and \sum_{i=1}^{n} (Y_i - \bar{Y})^2/\sigma_2^2 \sim \chi^2(n-1) are independent RVs, so that the RV

F(X, Y) = \frac{\sum_{i=1}^{m} (X_i - \bar{X})^2}{\sum_{i=1}^{n} (Y_i - \bar{Y})^2} \cdot \frac{\sigma_2^2 (n-1)}{\sigma_1^2 (m-1)} = \frac{\sigma_2^2}{\sigma_1^2} \frac{S_1^2}{S_2^2}

is distributed as F(m-1, n-1).
The following table summarizes the F-tests:

                                                  Reject H_0 at level \alpha if
H_0                            H_1                           \mu_1, \mu_2 known                                                                      \mu_1, \mu_2 unknown
I.   \sigma_1^2 \le \sigma_2^2   \sigma_1^2 > \sigma_2^2     \sum_{1}^{m}(x_i-\mu_1)^2 / \sum_{1}^{n}(y_i-\mu_2)^2 \ge (m/n) F_{m,n,\alpha}                                      s_1^2/s_2^2 \ge F_{m-1,n-1,\alpha}
II.  \sigma_1^2 \ge \sigma_2^2   \sigma_1^2 < \sigma_2^2     \sum_{1}^{n}(y_i-\mu_2)^2 / \sum_{1}^{m}(x_i-\mu_1)^2 \ge (n/m) F_{n,m,\alpha}                                      s_2^2/s_1^2 \ge F_{n-1,m-1,\alpha}
III. \sigma_1^2 = \sigma_2^2     \sigma_1^2 \ne \sigma_2^2   \sum_{1}^{m}(x_i-\mu_1)^2 / \sum_{1}^{n}(y_i-\mu_2)^2 \ge (m/n) F_{m,n,\alpha/2}  or  \le (m/n) F_{m,n,1-\alpha/2}   s_1^2/s_2^2 \ge F_{m-1,n-1,\alpha/2}  or  \le F_{m-1,n-1,1-\alpha/2}
Remark 1. Recall (Remark 6.4.5) that

F_{m,n,1-\alpha} = \{F_{n,m,\alpha}\}^{-1}.

Remark 2. The tests described above can be easily obtained from the likelihood ratio procedure. Moreover, in the important case where \mu_1, \mu_2 are unknown, tests I and II are UMP unbiased and UMP invariant. For test III we have chosen equal tails, as is customarily done for convenience, even though the unbiasedness property of the test is thereby destroyed.

Example 1 (Example 10.4.2 continued). In Example 10.4.2 let us test the validity of the assumption on which the t-test was based, namely, that the two populations have the same variance, at level 0.05. We compute s_1^2/s_2^2 = (420/390)^2 = 196/169 = 1.16. Since F_{m-1,n-1,\alpha/2} = F_{8,15,0.025} = 3.20, we cannot reject H_0: \sigma_1 = \sigma_2.
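A sketch of the variance-ratio check in Example 1 from the summary statistics (scipy assumed); the lower cutoff uses the reciprocal relation of Remark 1.

from scipy.stats import f

m, s1 = 9, 420
n, s2 = 16, 390
alpha = 0.05

ratio = (s1 / s2) ** 2                                    # 1.16
upper = f.ppf(1 - alpha / 2, dfn=m - 1, dfd=n - 1)        # F_{8,15,0.025} = 3.20
lower = 1 / f.ppf(1 - alpha / 2, dfn=n - 1, dfd=m - 1)    # F_{8,15,0.975} = 1/F_{15,8,0.025}
print(ratio, lower, upper, lower <= ratio <= upper)       # inside the interval, do not reject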
An important application of the F-test involves the case where one is testing the equality of means of two normal populations under the assumption that the variances are the same, that is, testing whether the two samples come from the same population. Let X_1, X_2, \ldots, X_m and Y_1, Y_2, \ldots, Y_n be independent samples from N(\mu_1, \sigma_1^2) and N(\mu_2, \sigma_2^2), respectively. If \sigma_1^2 = \sigma_2^2 but is unknown, the t-test rejects H_0: \mu_1 = \mu_2 if |T| > c, where c is selected so that \alpha_2 = P\{|T| > c \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\}, that is, c = t_{m+n-2,\alpha_2/2}\, s_p \sqrt{1/m + 1/n}, where

s_p^2 = \frac{(m-1)s_1^2 + (n-1)s_2^2}{m+n-2},

s_1^2, s_2^2 being the sample variances. If first an F-test is performed to test \sigma_1 = \sigma_2, and then a t-test to test \mu_1 = \mu_2, at levels \alpha_1 and \alpha_2, respectively, the probability of accepting both hypotheses when they are true is

P\{|T| \le c, c_1 < F < c_2 \mid \mu_1 = \mu_2, \sigma_1 = \sigma_2\};

and if F is independent of T, this probability is (1-\alpha_1)(1-\alpha_2). It follows that the combined test has a significance level \alpha = 1 - (1-\alpha_1)(1-\alpha_2). We see that

\alpha = \alpha_1 + \alpha_2 - \alpha_1\alpha_2 \le \alpha_1 + \alpha_2

and \alpha \ge \max(\alpha_1, \alpha_2). In fact, \alpha will be closer to \alpha_1 + \alpha_2, since for small \alpha_1 and \alpha_2, \alpha_1\alpha_2 will be close to 0.

We show that F is independent of T whenever \sigma_1 = \sigma_2. The statistic V = (\bar{X}, \bar{Y}, \sum_{i=1}^{m} (X_i - \bar{X})^2 + \sum_{i=1}^{n} (Y_i - \bar{Y})^2) is a complete sufficient statistic for the parameter (\mu_1, \mu_2, \sigma_1 = \sigma_2) (see Theorem 8.3.2). Since the distribution of F does not depend on \mu_1, \mu_2, and \sigma_1 = \sigma_2, it follows (Problem 5) that F is independent of V whenever \sigma_1 = \sigma_2. But T is a function of V alone, so that F must be independent of T also.

In Example 1, the combined test has a significance level of

\alpha = 1 - (0.95)(0.95) = 1 - 0.9025 = 0.0975.
PROBLEMS 10.5
1. For the data of Problem 10.4.4, is the assumption of equality of variances, on which the t-test is based, valid?

2. Answer the same question for Problems 10.4.5 and 10.4.6.
3.The performance of each of two different dive bombing methods is measured a dozen
times. The sample variances for the two methods are computed to be 5545 and 4073,
respectively. Do the two methods differ in variability?
4.In Problem 3 does the variability of the first method exceed that of the second
method?
5. Let X = (X_1, X_2, \ldots, X_n) be a random sample from a distribution with PDF (PMF) f(x; \theta), \theta \in \Theta, where \Theta is an interval in R^k. Let T(X) be a complete sufficient statistic for the family \{f(x; \theta): \theta \in \Theta\}. If U(X) is a statistic (not a function of T alone) whose distribution does not depend on \theta, show that U is independent of T.
10.6 BAYES AND MINIMAX PROCEDURES

Let X_1, X_2, \ldots, X_n be a sample from a probability distribution with PDF (PMF) f_\theta, \theta \in \Theta. In Section 8.8 we described the general decision problem, namely, once the statistician observes x, she has a set A of options available. The problem is to find a decision function \delta that minimizes the risk R(\theta, \delta) = E_\theta L(\theta, \delta) in some sense. Thus a minimax solution requires the minimization of \max_\theta R(\theta, \delta), while a Bayes solution requires the minimization of R(\pi, \delta) = E R(\theta, \delta), where \pi is the a priori distribution on \Theta. In Remark 9.2.1 we considered the problem of hypothesis testing as a special case of the general decision problem. The set A contains two points, a_0 and a_1; a_0 corresponds to the acceptance of H_0: \theta \in \Theta_0, and a_1 corresponds to the rejection of H_0. Suppose that the loss function is defined by

L(\theta, a_0) = a(\theta) if \theta \in \Theta_1, a(\theta) > 0,
L(\theta, a_1) = b(\theta) if \theta \in \Theta_0, b(\theta) > 0,
L(\theta, a_0) = 0 if \theta \in \Theta_0,
L(\theta, a_1) = 0 if \theta \in \Theta_1.    (1)

Then

R(\theta, \delta(X)) = L(\theta, a_0) P_\theta\{\delta(X) = a_0\} + L(\theta, a_1) P_\theta\{\delta(X) = a_1\}    (2)
                     = a(\theta) P_\theta\{\delta(X) = a_0\} if \theta \in \Theta_1,  and  b(\theta) P_\theta\{\delta(X) = a_1\} if \theta \in \Theta_0.    (3)

A minimax solution to the problem of testing H_0: \theta \in \Theta_0 against H_1: \theta \in \Theta_1, where \Theta = \Theta_0 + \Theta_1, is to find a rule \delta that minimizes

\max_\theta [a(\theta) P_\theta\{\delta(X) = a_0\}, b(\theta) P_\theta\{\delta(X) = a_1\}].

We will consider here only the special case of testing H_0: \theta = \theta_0 against H_1: \theta = \theta_1. In that case we want to find a rule \delta which minimizes

\max[a P_{\theta_1}\{\delta(X) = a_0\},\, b P_{\theta_0}\{\delta(X) = a_1\}].    (4)
We will show that the solution is to reject H_0 if

\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge k,    (5)

provided that the constant k is chosen so that

R(\theta_0, \delta(X)) = R(\theta_1, \delta(X)),    (6)

where \delta is the rule defined in (5); that is, the minimax rule \delta is obtained if we choose k in (5) so that

a P_{\theta_1}\{\delta(X) = a_0\} = b P_{\theta_0}\{\delta(X) = a_1\},    (7)

or, equivalently, we choose k so that

a P_{\theta_1}\left\{ \frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} < k \right\} = b P_{\theta_0}\left\{ \frac{f_{\theta_1}(X)}{f_{\theta_0}(X)} \ge k \right\}.    (8)

Let \delta' be any other rule. If R(\theta_0, \delta) < R(\theta_0, \delta'), then \max[R(\theta_0, \delta), R(\theta_1, \delta)] = R(\theta_1, \delta) = R(\theta_0, \delta) < \max[R(\theta_0, \delta'), R(\theta_1, \delta')], and \delta' cannot improve on \delta. Thus suppose that R(\theta_0, \delta) \ge R(\theta_0, \delta'), which means that

P_{\theta_0}\{\delta'(X) = a_1\} \le P_{\theta_0}\{\delta(X) = a_1\} = P\{\text{reject } H_0 \mid H_0 \text{ true}\}.    (9)

By the Neyman–Pearson lemma, the rule \delta is the most powerful test of its size, so that its power must be at least that of \delta', that is,

P_{\theta_1}\{\delta(X) = a_1\} \ge P_{\theta_1}\{\delta'(X) = a_1\},

so that

P_{\theta_1}\{\delta(X) = a_0\} \le P_{\theta_1}\{\delta'(X) = a_0\}.

It follows that

a P_{\theta_1}\{\delta(X) = a_0\} \le a P_{\theta_1}\{\delta'(X) = a_0\},

and hence that

R(\theta_1, \delta) \le R(\theta_1, \delta').    (10)

This means that

\max[R(\theta_0, \delta), R(\theta_1, \delta)] = R(\theta_1, \delta) \le R(\theta_1, \delta')

and thus

\max[R(\theta_0, \delta), R(\theta_1, \delta)] \le \max[R(\theta_0, \delta'), R(\theta_1, \delta')].

Note that in the discrete case one may need some randomization procedure in order to achieve equality in (8).
Example 1. Let X_1, X_2, \ldots, X_n be iid N(\mu, 1) RVs. To test H_0: \mu = \mu_0 against H_1: \mu = \mu_1 (> \mu_0), we should choose k so that (8) is satisfied. This is the same as choosing c, and thus k, so that

a P_{\mu_1}\{\bar{X} < c\} = b P_{\mu_0}\{\bar{X} \ge c\}

or

a P_{\mu_1}\left\{ \frac{\bar{X}-\mu_1}{1/\sqrt{n}} < \frac{c-\mu_1}{1/\sqrt{n}} \right\} = b P_{\mu_0}\left\{ \frac{\bar{X}-\mu_0}{1/\sqrt{n}} \ge \frac{c-\mu_0}{1/\sqrt{n}} \right\}.

Thus

a \Phi[\sqrt{n}(c-\mu_1)] = b\{1 - \Phi[\sqrt{n}(c-\mu_0)]\},

where \Phi is the DF of an N(0,1) RV. This can easily be accomplished with the help of normal tables once we know a, b, \mu_0, \mu_1, and n.
We next consider the problem of testing H_0: \theta \in \Theta_0 against H_1: \theta \in \Theta_1 from a Bayesian point of view. Let \pi(\theta) be the a priori probability distribution on \Theta. Then

R(\pi, \delta) = E_\theta R(\theta, \delta(X))
             = \int_\Theta R(\theta, \delta)\pi(\theta)\,d\theta if \pi is a PDF (a sum over \Theta if \pi is a PMF)
             = \int_{\Theta_0} b(\theta)\pi(\theta) P_\theta\{\delta(X) = a_1\}\,d\theta + \int_{\Theta_1} a(\theta)\pi(\theta) P_\theta\{\delta(X) = a_0\}\,d\theta,    (11)

with the integrals replaced by sums when \pi is a PMF.

The Bayes solution is a decision rule that minimizes R(\pi, \delta). In what follows we restrict our attention to the case where both H_0 and H_1 have exactly one point each, that is, \Theta_0 = \{\theta_0\}, \Theta_1 = \{\theta_1\}. Let \pi(\theta_0) = \pi_0 and \pi(\theta_1) = 1 - \pi_0 = \pi_1. Then

R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(X) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(X) = a_0\},    (12)

where b(\theta_0) = b, a(\theta_1) = a (a, b > 0).

Theorem 1. Let X = (X_1, X_2, \ldots, X_n) be an RV of the discrete (continuous) type with PMF (PDF) f_\theta, \theta \in \Theta = \{\theta_0, \theta_1\}. Let \pi(\theta_0) = \pi_0, \pi(\theta_1) = 1 - \pi_0 = \pi_1 be the a priori probability mass function on \Theta. A Bayes solution for testing H_0: X \sim f_{\theta_0} against H_1: X \sim f_{\theta_1}, using the loss function (1), is to reject H_0 if

\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge \frac{b\pi_0}{a\pi_1}.    (13)
Proof. We wish to find \delta which minimizes

R(\pi, \delta) = b\pi_0 P_{\theta_0}\{\delta(X) = a_1\} + a\pi_1 P_{\theta_1}\{\delta(X) = a_0\}.

Now

R(\pi, \delta) = E_\theta R(\theta, \delta) = E\{E_\theta\{L(\theta, \delta) \mid X\}\},

so it suffices to minimize E_\theta\{L(\theta, \delta) \mid X\}. The a posteriori distribution of \theta is given by

h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{\sum_\theta f_\theta(x)\pi(\theta)} = \frac{\pi(\theta) f_\theta(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)}
 = \frac{\pi_0 f_{\theta_0}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} if \theta = \theta_0,  and  \frac{\pi_1 f_{\theta_1}(x)}{\pi_0 f_{\theta_0}(x) + \pi_1 f_{\theta_1}(x)} if \theta = \theta_1.    (14)

Thus

E_\theta\{L(\theta, \delta(X)) \mid X = x\} = b\,h(\theta_0 \mid x) if \delta(x) = a_1,  and  a\,h(\theta_1 \mid x) if \delta(x) = a_0.

It follows that we reject H_0, that is, \delta(x) = a_1, if

b\,h(\theta_0 \mid x) \le a\,h(\theta_1 \mid x),

which is the case if and only if

b\pi_0 f_{\theta_0}(x) \le a\pi_1 f_{\theta_1}(x),

as asserted.
Remark 1. In the Neyman–Pearson lemma we fixed P_{\theta_0}\{\delta(X) = a_1\}, the probability of rejecting H_0 when it is true, and minimized P_{\theta_1}\{\delta(X) = a_0\}, the probability of accepting H_0 when it is false. Here we no longer have a fixed level \alpha for P_{\theta_0}\{\delta(X) = a_1\}. Instead we allow it to assume any value as long as R(\pi, \delta), defined in (12), is minimized.

Remark 2. It is easy to generalize Theorem 1 to the case of multiple decisions. Let X be an RV with PDF (PMF) f_\theta, where \theta can take any of the k values \theta_1, \theta_2, \ldots, \theta_k. The problem is to observe x and decide which of the \theta_i's is the correct value of \theta. Let us write H_i: \theta = \theta_i, i = 1, 2, \ldots, k, and assume that \pi(\theta_i) = \pi_i, i = 1, 2, \ldots, k, \sum_{i=1}^{k} \pi_i = 1, is the prior probability distribution on \Theta = \{\theta_1, \theta_2, \ldots, \theta_k\}. Let

L(\theta_i, \delta) = 1 if \delta chooses \theta_j, j \ne i;  0 if \delta chooses \theta_i.

The problem is to find a rule \delta that minimizes R(\pi, \delta). We leave the reader to show that a Bayes solution is to accept H_i: \theta = \theta_i (i = 1, 2, \ldots, k) if

\pi_i f_{\theta_i}(x) \ge \pi_j f_{\theta_j}(x) for all j \ne i, j = 1, 2, \ldots, k,    (15)

where any point lying in more than one such region is assigned to any one of them.
Example 2. Let X_1, X_2, \ldots, X_n be iid N(\mu, 1) RVs. To test H_0: \mu = \mu_0 against H_1: \mu = \mu_1 (> \mu_0), let us take a = b in the loss function (1). Then Theorem 1 says that the Bayes rule is one that rejects H_0 if

\frac{f_{\theta_1}(x)}{f_{\theta_0}(x)} \ge \frac{\pi_0}{1-\pi_0},

that is,

\exp\left\{ -\sum_{i=1}^{n} \frac{(x_i-\mu_1)^2}{2} + \sum_{i=1}^{n} \frac{(x_i-\mu_0)^2}{2} \right\} \ge \frac{\pi_0}{1-\pi_0},

\exp\left\{ (\mu_1-\mu_0) \sum_{i=1}^{n} x_i + \frac{n(\mu_0^2 - \mu_1^2)}{2} \right\} \ge \frac{\pi_0}{1-\pi_0}.

This happens if and only if

\frac{1}{n} \sum_{i=1}^{n} x_i \ge \frac{1}{n} \frac{\log[\pi_0/(1-\pi_0)]}{\mu_1-\mu_0} + \frac{\mu_0+\mu_1}{2},

where the logarithm is to the base e. It follows that, if \pi_0 = \frac{1}{2}, the rejection region consists of

\bar{x} \ge \frac{\mu_0+\mu_1}{2}.
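A small numerical sketch of the Bayes cutoff in Example 2; the prior \pi_0, the two means, and the simulated data are all hypothetical.

import numpy as np

rng = np.random.default_rng(0)
mu0, mu1, n, pi0 = 0.0, 1.0, 25, 0.7       # hypothetical values, mu1 > mu0

x = rng.normal(mu1, 1.0, size=n)            # pretend data
cutoff = np.log(pi0 / (1 - pi0)) / (n * (mu1 - mu0)) + (mu0 + mu1) / 2
print(x.mean(), cutoff, x.mean() >= cutoff)  # reject H0: mu = mu0 if xbar >= cutoff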
Example 3. This example illustrates the result described in Remark 2. Let X_1, X_2, \ldots, X_n be a sample from N(\mu, 1), and suppose that \mu can take any one of the three values \mu_1, \mu_2, or \mu_3. Let \mu_1 < \mu_2 < \mu_3. Assume, for simplicity, that \pi_1 = \pi_2 = \pi_3. Then we accept H_i: \mu = \mu_i, i = 1, 2, 3, if

\pi_i \exp\left\{ -\sum_{k=1}^{n} \frac{(x_k-\mu_i)^2}{2} \right\} \ge \pi_j \exp\left\{ -\sum_{k=1}^{n} \frac{(x_k-\mu_j)^2}{2} \right\}  for each j \ne i, j = 1, 2, 3.

It follows that we accept H_i if

(\mu_i - \mu_j)\bar{x} + \frac{\mu_j^2 - \mu_i^2}{2} \ge 0,  j = 1, 2, 3 (j \ne i),

that is,

\bar{x}(\mu_i - \mu_j) \ge \frac{(\mu_i - \mu_j)(\mu_i + \mu_j)}{2},  j = 1, 2, 3 (j \ne i).

Thus, the acceptance region of H_1 is given by

\bar{x} \le \frac{\mu_1 + \mu_2}{2}  and  \bar{x} \le \frac{\mu_1 + \mu_3}{2}.

Also, the acceptance region of H_2 is given by

\bar{x} \ge \frac{\mu_1 + \mu_2}{2}  and  \bar{x} \le \frac{\mu_2 + \mu_3}{2},

and that of H_3 by

\bar{x} \ge \frac{\mu_1 + \mu_3}{2}  and  \bar{x} \ge \frac{\mu_2 + \mu_3}{2}.

In particular, if \mu_1 = 0, \mu_2 = 2, \mu_3 = 4, we accept H_1 if \bar{x} \le 1, H_2 if 1 \le \bar{x} \le 3, and H_3 if \bar{x} \ge 3. In this case, the boundary points 1 and 3 have zero probability, and it does not matter where we include them.
PROBLEMS 10.6
1. In Example 1 let n = 15, \mu_0 = 4.7, and \mu_1 = 5.2, and choose a = b > 0. Find the minimax test, and compute its power at \mu = 4.7 and \mu = 5.2.

2. A sample of five observations is taken on a b(1, \theta) RV to test H_0: \theta = 1/2 against H_1: \theta = 3/4.
(a) Find the most powerful test of size \alpha = 0.05.
(b) If L(1/2, 1/2) = L(3/4, 3/4) = 0, L(1/2, 3/4) = 1, and L(3/4, 1/2) = 2, find the minimax rule.
(c) If the prior probabilities of \theta = 1/2 and \theta = 3/4 are \pi_0 = 1/3 and \pi_1 = 2/3, respectively, find the Bayes rule.

3. A sample of size n is to be used from the PDF

f_\theta(x) = \theta e^{-\theta x},  x > 0,

to test H_0: \theta = 1 against H_1: \theta = 2. If the a priori distribution on \theta is \pi_0 = 2/3, \pi_1 = 1/3, and a = b, find the Bayes solution. Find the power of the test at \theta = 1 and \theta = 2.

4. Given two normal densities with variances 1 and with means −1 and 1, respectively, find the Bayes solution based on a single observation when a = b and (a) \pi_0 = \pi_1 = 1/2, and (b) \pi_0 = 1/4, \pi_1 = 3/4.

5. Given three normal densities with variances 1 and with means −1, 0, 1, respectively, find the Bayes solution to the multiple decision problem based on a single observation when \pi_1 = 2/5, \pi_2 = 2/5, \pi_3 = 1/5.

6. For the multiple decision problem described in Remark 2 show that a Bayes solution is to accept H_i: \theta = \theta_i (i = 1, 2, \ldots, k) if (15) holds.

11
CONFIDENCE ESTIMATION

11.1 INTRODUCTION

In many problems of statistical inference the experimenter is interested in constructing a family of sets that contain the true (unknown) parameter value with a specified (high) probability. If X, for example, represents the length of life of a piece of equipment, the experimenter is interested in a lower bound \underline{\theta} for the mean \theta of X. Since \underline{\theta} = \underline{\theta}(X) will be a function of the observations, one cannot ensure with probability 1 that \underline{\theta}(X) \le \theta. All that one can do is to choose a number 1-\alpha that is close to 1 so that P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1-\alpha for all \theta. Problems of this type are called problems of confidence estimation. In this chapter we restrict ourselves mostly to the case where \Theta \subseteq R and consider the problem of setting confidence limits for the parameter \theta.

In Section 11.2 we introduce the basic ideas of confidence estimation. Section 11.3 deals with various methods of finding confidence intervals, and Section 11.4 deals with shortest-length confidence intervals. In Section 11.5 we study unbiased and equivariant confidence intervals.

11.2 SOME FUNDAMENTAL NOTIONS OF CONFIDENCE ESTIMATION

So far we have considered a random variable or some function of it as the basic observable quantity. Let X be an RV, and let a, b be two given positive real numbers with a < b. Then
P\{a < X < b\} = P\{a < X and X < b\} = P\{X < b < bX/a\},

and if we know the distribution of X and a, b, we can determine the probability P\{a < X < b\}. Consider the interval I(X) = (X, bX/a). This is an interval whose end points are functions of the RV X, and hence it takes the value (x, bx/a) when X takes the value x. In other words, I(X) assumes the value I(x) whenever X assumes the value x. Thus I(X) is a random quantity and is an example of a random interval. Note that I(X) includes the value b with a certain fixed probability. For example, if b = 1, a = 1/2, and X is U(0,1), the interval (X, 2X) includes the point 1 with probability 1/2. We note that I(X) is a family of intervals with associated coverage probability P(I(X) \ni 1) = 1/2. It has (random) length \ell(I(X)) = 2X - X = X. In general, the larger the length of the interval, the larger the coverage probability. Let us formalize these notions.
Definition 1. Let P_\theta, \theta \in \Theta \subseteq R^k, be the set of probability distributions of an RV X. A family of subsets S(x) of \Theta, where S(x) depends on the observation x but not on \theta, is called a family of random sets. If, in particular, \Theta \subseteq R and S(x) is an interval (\underline{\theta}(x), \bar{\theta}(x)), where \underline{\theta}(x) and \bar{\theta}(x) are functions of x alone (and not \theta), we call S(X) a random interval with \underline{\theta}(X) and \bar{\theta}(X) as lower and upper bounds, respectively; \underline{\theta}(X) may be -\infty, and \bar{\theta}(X) may be +\infty.

In a wide variety of inference problems one is not interested in estimating the parameter or testing some hypothesis concerning it. Rather, one wishes to establish a lower or an upper bound, or both, for the real-valued parameter. For example, if X is the time to failure of a piece of equipment, one may be interested in a lower bound for the mean of X. If the RV X measures the toxicity of a drug, the concern is to find an upper bound for the mean. Similarly, if the RV X measures the nicotine content of a certain brand of cigarettes, one may be interested in determining an upper and a lower bound for the average nicotine content of these cigarettes.

In this chapter we are interested in the problem of confidence estimation, namely, that of finding a family of random sets S(x) for a parameter \theta such that, for a given \alpha, 0 < \alpha < 1 (usually small),

P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha  for all \theta \in \Theta.    (1)

We restrict our attention mainly to the case where \theta \in \Theta \subseteq R.
Definition 2. Let \theta \in \Theta \subseteq R and 0 < \alpha < 1. A statistic \underline{\theta}(X) satisfying

P_\theta\{\underline{\theta}(X) \le \theta\} \ge 1 - \alpha  for all \theta    (2)

is called a lower confidence bound for \theta at confidence level 1 - \alpha. The quantity

\inf_{\theta \in \Theta} P_\theta\{\underline{\theta}(X) \le \theta\}    (3)

is called the confidence coefficient.

Definition 3. A statistic \underline{\theta} that minimizes

P_\theta\{\underline{\theta}(X) \le \theta'\}  for all \theta' < \theta    (4)

subject to (2) is known as a uniformly most accurate (UMA) lower confidence bound for \theta at confidence level 1 - \alpha.

Remark 1. Suppose X \sim P_\theta and (2) holds. Then the smallest probability of true coverage, P_\theta\{\underline{\theta}(X) \le \theta\} = P_\theta\{[\underline{\theta}(X), \infty) \ni \theta\}, is 1 - \alpha. The probability of false (or incorrect) coverage is P_\theta\{[\underline{\theta}(X), \infty) \ni \theta'\} = P_\theta\{\underline{\theta}(X) \le \theta'\} for \theta' < \theta. According to Definition 3, among the class of all lower confidence bounds satisfying (2), a UMA lower confidence bound has the smallest probability of false coverage.

Similar definitions are given for an upper confidence bound for \theta and a UMA upper confidence bound.

Definition 4. A family of subsets S(x) of \Theta \subseteq R^k is said to constitute a family of confidence sets at confidence level 1 - \alpha if

P_\theta\{S(X) \ni \theta\} \ge 1 - \alpha  for all \theta \in \Theta,    (5)

that is, the random set S(X) covers the true parameter value \theta with probability \ge 1 - \alpha.

A lower confidence bound corresponds to the special case where k = 1 and

S(x) = \{\theta : \underline{\theta}(x) \le \theta < \infty\};    (6)

and an upper confidence bound, to the case where

S(x) = \{\theta : \bar{\theta}(x) \ge \theta > -\infty\}.    (7)

If S(x) is of the form

S(x) = (\underline{\theta}(x), \bar{\theta}(x)),    (8)

we will call it a confidence interval at confidence level 1 - \alpha, provided that

P_\theta\{\underline{\theta}(X) < \theta < \bar{\theta}(X)\} \ge 1 - \alpha  for all \theta,    (9)

and the quantity

\inf_\theta P_\theta\{\underline{\theta}(X) < \theta < \bar{\theta}(X)\}    (10)

will be referred to as the confidence coefficient associated with the random interval.

Remark 2. We write S(X) \ni \theta to indicate that X, and hence S(X), is random here and not \theta, so the probability distribution referred to is that of X.

Remark 3. When X = x is the realization, the confidence interval (set) S(x) is a fixed subset of R^k. No probability is attached to S(x) itself, since neither \theta nor S(x) has a probability distribution. In fact either S(x) covers \theta or it does not, and we will never know which, since \theta is unknown. One can give a relative frequency interpretation: if (1-\alpha)-level confidence sets for \theta were computed a large number of times, then a fraction (approximately) 1-\alpha of these would contain the true (but unknown) parameter value.

Definition 5. A family of (1-\alpha)-level confidence sets \{S(x)\} is said to be a UMA family of confidence sets at level 1-\alpha if

P_\theta\{S(X) contains \theta'\} \le P_\theta\{S'(X) contains \theta'\}

for all \theta \ne \theta' and any (1-\alpha)-level family of confidence sets S'(X).
Example 1. Let X_1, X_2, \ldots, X_n be iid RVs, X_i \sim N(\mu, \sigma^2). Consider the interval (\bar{X} - c_1, \bar{X} + c_2). In order for this to be a (1-\alpha)-level confidence interval, we must have

P\{\bar{X} - c_1 < \mu < \bar{X} + c_2\} \ge 1 - \alpha,

which is the same as

P\{\mu - c_2 < \bar{X} < \mu + c_1\} \ge 1 - \alpha.

Thus

P\left\{ -\frac{c_2\sqrt{n}}{\sigma} < \frac{(\bar{X}-\mu)\sqrt{n}}{\sigma} < \frac{c_1\sqrt{n}}{\sigma} \right\} \ge 1 - \alpha.

Since \sqrt{n}(\bar{X}-\mu)/\sigma \sim N(0,1), we can choose c_1 and c_2 to have equality, namely,

P\left\{ -\frac{c_2\sqrt{n}}{\sigma} < \frac{(\bar{X}-\mu)\sqrt{n}}{\sigma} < \frac{c_1\sqrt{n}}{\sigma} \right\} = 1 - \alpha,

provided that \sigma is known. There are infinitely many such pairs of values (c_1, c_2). In particular, an intuitively reasonable choice is c_1 = c_2 = c, say. In that case

\frac{c\sqrt{n}}{\sigma} = z_{\alpha/2},

and the confidence interval is (\bar{X} - (\sigma/\sqrt{n}) z_{\alpha/2}, \bar{X} + (\sigma/\sqrt{n}) z_{\alpha/2}). The length of this interval is (2\sigma/\sqrt{n}) z_{\alpha/2}. Given \sigma and \alpha, we can choose n to get a confidence interval of a fixed length.

If \sigma is not known, we have from

P\{-c_2 < \bar{X} - \mu < c_1\} \ge 1 - \alpha

that

P\left\{ -\frac{c_2\sqrt{n}}{S} < \frac{\bar{X}-\mu}{S/\sqrt{n}} < \frac{c_1\sqrt{n}}{S} \right\} \ge 1 - \alpha,

and once again we can choose pairs of values (c_1, c_2), using a t-distribution with n-1 d.f., such that

P\left\{ -\frac{c_2\sqrt{n}}{S} < \frac{(\bar{X}-\mu)\sqrt{n}}{S} < \frac{c_1\sqrt{n}}{S} \right\} = 1 - \alpha.

In particular, if we take c_1 = c_2 = c, say, then

\frac{c\sqrt{n}}{S} = t_{n-1,\alpha/2},

and (\bar{X} - (S/\sqrt{n}) t_{n-1,\alpha/2}, \bar{X} + (S/\sqrt{n}) t_{n-1,\alpha/2}) is a (1-\alpha)-level confidence interval for \mu. The length of this interval is (2S/\sqrt{n}) t_{n-1,\alpha/2}, which is no longer constant. Therefore we cannot choose n to get a fixed-width confidence interval of level 1-\alpha. Indeed, the length of this interval can be quite large if \sigma is large. Its expected length is

\frac{2}{\sqrt{n}} t_{n-1,\alpha/2} E_\sigma S = \frac{2}{\sqrt{n}} t_{n-1,\alpha/2} \sqrt{\frac{2}{n-1}} \frac{\Gamma(n/2)}{\Gamma[(n-1)/2]} \sigma,

which can be made as small as we please by choosing n large enough.
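A sketch of both intervals in Example 1 for a hypothetical sample (scipy assumed): the fixed-width z-interval when \sigma is known and the t-interval when it is not.

import numpy as np
from scipy.stats import norm, t

rng = np.random.default_rng(1)
sigma, alpha = 2.0, 0.05
x = rng.normal(10.0, sigma, size=20)          # hypothetical N(mu, sigma^2) sample
n, xbar, s = len(x), x.mean(), x.std(ddof=1)

z = norm.ppf(1 - alpha / 2)
ci_known = (xbar - z * sigma / np.sqrt(n), xbar + z * sigma / np.sqrt(n))

tq = t.ppf(1 - alpha / 2, df=n - 1)
ci_unknown = (xbar - tq * s / np.sqrt(n), xbar + tq * s / np.sqrt(n))
print(ci_known, ci_unknown)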
Example 2. In Example 1, suppose that we wish to find a confidence interval for \sigma^2 instead, when \mu is unknown. Consider the interval (c_1 S^2, c_2 S^2), c_1, c_2 > 0. We have

P\{c_1 S^2 < \sigma^2 < c_2 S^2\} \ge 1 - \alpha,

so that

P\{c_2^{-1} < S^2/\sigma^2 < c_1^{-1}\} \ge 1 - \alpha.

Since (n-1)S^2/\sigma^2 is \chi^2(n-1), we can choose pairs of values (c_1, c_2) from the tables of the chi-square distribution. In particular, we can choose c_1, c_2 so that

P\left\{ \frac{S^2}{\sigma^2} \ge \frac{1}{c_1} \right\} = \frac{\alpha}{2} = P\left\{ \frac{S^2}{\sigma^2} \le \frac{1}{c_2} \right\}.

Then

\frac{n-1}{c_1} = \chi^2_{n-1,\alpha/2}  and  \frac{n-1}{c_2} = \chi^2_{n-1,1-\alpha/2}.

Thus

\left( \frac{(n-1)S^2}{\chi^2_{n-1,\alpha/2}}, \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha/2}} \right)

is a (1-\alpha)-level confidence interval for \sigma^2 whenever \mu is unknown. If \mu is known, then

\sum_{i=1}^{n} \frac{(X_i - \mu)^2}{\sigma^2} \sim \chi^2(n).

Thus we can base the confidence interval on \sum_{i=1}^{n} (X_i - \mu)^2. Proceeding similarly, we get a (1-\alpha)-level confidence interval as

\left( \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\chi^2_{n,\alpha/2}}, \frac{\sum_{i=1}^{n} (X_i - \mu)^2}{\chi^2_{n,1-\alpha/2}} \right).

Next suppose that both \mu and \sigma^2 are unknown and that we want a confidence set for (\mu, \sigma^2). We have from Boole's inequality

P\left\{ \bar{X} - \frac{S}{\sqrt{n}} t_{n-1,\alpha_1/2} < \mu < \bar{X} + \frac{S}{\sqrt{n}} t_{n-1,\alpha_1/2},\ \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} < \sigma^2 < \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}} \right\}
\ge 1 - P\left\{ \bar{X} + \frac{S}{\sqrt{n}} t_{n-1,\alpha_1/2} \le \mu \text{ or } \bar{X} - \frac{S}{\sqrt{n}} t_{n-1,\alpha_1/2} \ge \mu \right\}
  - P\left\{ \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}} \le \sigma^2 \text{ or } \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}} \ge \sigma^2 \right\}
= 1 - \alpha_1 - \alpha_2,

so that the Cartesian product

S(X) = \left( \bar{X} - \frac{S}{\sqrt{n}} t_{n-1,\alpha_1/2}, \bar{X} + \frac{S}{\sqrt{n}} t_{n-1,\alpha_1/2} \right) \times \left( \frac{(n-1)S^2}{\chi^2_{n-1,\alpha_2/2}}, \frac{(n-1)S^2}{\chi^2_{n-1,1-\alpha_2/2}} \right)

is a (1 - \alpha_1 - \alpha_2)-level confidence set for (\mu, \sigma^2).
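A sketch of the equal-tails interval for \sigma^2 in Example 2 from hypothetical summary statistics (scipy assumed); note that \chi^2_{n-1,\alpha/2} in the text denotes the upper \alpha/2 point, i.e. chi2.ppf(1 - alpha/2, n - 1).

from scipy.stats import chi2

n, s2, alpha = 20, 4.3, 0.05     # hypothetical sample size and sample variance S^2
lo = (n - 1) * s2 / chi2.ppf(1 - alpha / 2, df=n - 1)   # (n-1)S^2 / chi^2_{n-1, alpha/2}
hi = (n - 1) * s2 / chi2.ppf(alpha / 2, df=n - 1)       # (n-1)S^2 / chi^2_{n-1, 1-alpha/2}
print(lo, hi)                    # (1 - alpha)-level confidence interval for sigma^2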
11.3 METHODS OF FINDING CONFIDENCE INTERVALS

We now consider some common methods of constructing confidence sets. The most common of these is the method of pivots.

Definition 1. Let X \sim P_\theta. A random variable T(X, \theta) is known as a pivot if the distribution of T(X, \theta) does not depend on \theta.

In many problems, especially in location and scale problems, pivots are easily found. For example, in sampling from f(x - \theta), X_{(n)} - \theta is a pivot and so is \bar{X} - \theta. In sampling from (1/\sigma)f(x/\sigma), a scale family, X_{(n)}/\sigma is a pivot and so is X_{(1)}/\sigma, and in sampling from (1/\sigma)f((x-\theta)/\sigma), a location-scale family, (\bar{X}-\theta)/S is a pivot, and so is (X_{(2)} + X_{(1)} - 2\theta)/S.

If the DF F_\theta of X_i is continuous, then F_\theta(X_i) \sim U[0,1] and, in the case of random sampling, we can take

T(X, \theta) = \prod_{i=1}^{n} F_\theta(X_i),

or

-\log T(X, \theta) = -\sum_{i=1}^{n} \log F_\theta(X_i),

as a pivot. Since F_\theta(X_i) \sim U[0,1], -\log F_\theta(X_i) \sim G(1,1) and -\sum_{i=1}^{n} \log F_\theta(X_i) \sim G(n,1). It follows that -\sum_{i=1}^{n} \log F_\theta(X_i) is a pivot.
The following result gives a simple sufficient condition for a pivot to yield a confidence interval for a real-valued parameter \theta.

Theorem 1. Let T(X, \theta) be a pivot such that for each \theta, T(X, \theta) is a statistic, and as a function of \theta, T is either strictly increasing or decreasing at each x \in R^n. Let \Lambda \subseteq R be the range of T, and for every \lambda \in \Lambda and x \in R^n let the equation \lambda = T(x, \theta) be solvable. Then one can construct a confidence interval for \theta at any level.

Proof. Let 0 < \alpha < 1. Then we can choose a pair of numbers \lambda_1(\alpha) and \lambda_2(\alpha) in \Lambda, not necessarily unique, such that

P_\theta\{\lambda_1(\alpha) < T(X, \theta) < \lambda_2(\alpha)\} \ge 1 - \alpha  for all \theta.    (1)

Since the distribution of T is independent of \theta, it is clear that \lambda_1 and \lambda_2 are independent of \theta. Since, moreover, T is monotone in \theta, we can solve the equations

T(x, \theta) = \lambda_1(\alpha)  and  T(x, \theta) = \lambda_2(\alpha)    (2)

for every x uniquely for \theta. We have

P_\theta\{\underline{\theta}(X) < \theta < \bar{\theta}(X)\} \ge 1 - \alpha  for all \theta,    (3)

where \underline{\theta}(X) < \bar{\theta}(X) are RVs. This completes the proof.

Remark 1. The condition that \lambda = T(x, \theta) be solvable will be satisfied if, for example, T is continuous and strictly increasing or decreasing as a function of \theta in \Theta.

Note that in the continuous case (that is, when the DF of T is continuous) we can find a confidence interval with equality on the right side of (1). In the discrete case, however, this is usually not possible.

Remark 2. Relation (1) is valid even when the assumption of monotonicity of T in the theorem is dropped. In that case inversion of the inequalities may yield a set of intervals (a random set) S(X) in \Theta instead of a confidence interval.

Remark 3. The argument used in Theorem 1 can be extended to cover the multiparameter case, and the method will determine a confidence set for all the parameters of a distribution.
Example 1. Let X_1, X_2, \ldots, X_n \sim N(\mu, \sigma^2), where \sigma is unknown and we seek a (1-\alpha)-level confidence interval for \mu. Let us choose

T(X, \mu) = \frac{(\bar{X} - \mu)\sqrt{n}}{S},

where \bar{X}, S^2 are the usual sample statistics. The RV T(X, \mu) has Student's t-distribution with n-1 d.f., which is independent of \mu, and T(X, \mu), as a function of \mu, is monotone. We can clearly choose \lambda_1(\alpha), \lambda_2(\alpha) (not necessarily uniquely) so that

P\{\lambda_1(\alpha) < T(X, \mu) < \lambda_2(\alpha)\} = 1 - \alpha  for all \mu.

Solving

\lambda_i(\alpha) = \frac{(\bar{X} - \mu)\sqrt{n}}{S},  i = 1, 2,

for \mu, we get

\underline{\mu}(X) = \bar{X} - \frac{S}{\sqrt{n}} \lambda_2(\alpha),   \bar{\mu}(X) = \bar{X} - \frac{S}{\sqrt{n}} \lambda_1(\alpha),

and the (1-\alpha)-level confidence interval is

\left( \bar{X} - \frac{S}{\sqrt{n}} \lambda_2(\alpha), \bar{X} - \frac{S}{\sqrt{n}} \lambda_1(\alpha) \right).

In practice, one chooses \lambda_2(\alpha) = -\lambda_1(\alpha) = t_{n-1,\alpha/2}.
Example 2. Let X_1, X_2, \ldots, X_n be iid with common PDF

f_\theta(x) = \exp\{-(x-\theta)\}, x > \theta, and 0 elsewhere.

Then the joint PDF of X is

f(x; \theta) = \exp\left\{ -\sum_{i=1}^{n} x_i + n\theta \right\} I[x_{(1)} > \theta].

Clearly, T(X, \theta) = X_{(1)} - \theta is a pivot. We can choose \lambda_1(\alpha), \lambda_2(\alpha) such that

P_\theta\{\lambda_1(\alpha) < X_{(1)} - \theta < \lambda_2(\alpha)\} = 1 - \alpha  for all \theta,

which yields (X_{(1)} - \lambda_2(\alpha), X_{(1)} - \lambda_1(\alpha)) as a (1-\alpha)-level confidence interval for \theta.

Remark 4. In Example 1 we chose \lambda_2 = -\lambda_1, whereas in Example 2 we did not indicate how to choose the pair (\lambda_1, \lambda_2) from an infinite set of solutions to P_\theta\{\lambda_1(\alpha) < T(X, \theta) < \lambda_2(\alpha)\} = 1 - \alpha. One choice is the equal-tails confidence interval, which is arrived at by assigning probability \alpha/2 to each tail of the distribution of T. This means that we solve

\alpha/2 = P_\theta\{T(X, \theta) < \lambda_1\} = P_\theta\{T(X, \theta) > \lambda_2\}.

In Example 1 symmetry of the distribution leads to the indicated choice. In Example 2, Y = X_{(1)} - \theta has PDF

g(y) = n \exp(-ny)  for y > 0,

so we choose (\lambda_1, \lambda_2) from

P_\theta\{X_{(1)} - \theta < \lambda_1\} = \alpha/2 = P_\theta\{X_{(1)} - \theta > \lambda_2\},

giving \lambda_2(\alpha) = -(1/n)\ln(\alpha/2) and \lambda_1(\alpha) = -(1/n)\ln(1 - \alpha/2). Yet another method is to choose \lambda_1, \lambda_2 in such a way that the resulting confidence interval has smallest length. We will discuss this method in Section 11.4.
We next consider the method of test inversion and explore the relationship between a test of hypothesis for a parameter \theta and a confidence interval for \theta. Consider the following example.

Example 3. Let X_1, X_2, \ldots, X_n be a sample from N(\mu, \sigma_0^2), where \sigma_0 is known. In Example 11.2.1 we showed that

\left( \bar{X} - \frac{1}{\sqrt{n}} z_{\alpha/2}\sigma_0, \bar{X} + \frac{1}{\sqrt{n}} z_{\alpha/2}\sigma_0 \right)

is a (1-\alpha)-level confidence interval for \mu. If we define a test \varphi that rejects a value \mu = \mu_0 if and only if \mu_0 lies outside this interval, that is, if and only if

\frac{\sqrt{n}\,|\bar{X} - \mu_0|}{\sigma_0} \ge z_{\alpha/2},

then

P_{\mu_0}\left\{ \sqrt{n}\,\frac{|\bar{X} - \mu_0|}{\sigma_0} \ge z_{\alpha/2} \right\} = \alpha,

and the test \varphi is a size \alpha test of \mu = \mu_0 against the alternatives \mu \ne \mu_0.

Conversely, a family of \alpha-level tests for the hypothesis \mu = \mu_0 generates a family of confidence intervals for \mu by simply taking, as the confidence interval for \mu, the set of those \mu_0 for which one cannot reject \mu = \mu_0.

Similarly, we can generate a family of \alpha-level tests from a (1-\alpha)-level lower (or upper) confidence bound. Suppose that we start with the (1-\alpha)-level lower confidence bound \bar{X} - z_\alpha(\sigma_0/\sqrt{n}) for \mu. Then, by defining a test \varphi(X) that rejects \mu \le \mu_0 if and only if \mu_0 < \bar{X} - z_\alpha(\sigma_0/\sqrt{n}), we get an \alpha-level test for a hypothesis of the form \mu \le \mu_0.
Example 3 is a special case of the duality principle proved in Theorem 2 below. In the following we restrict attention to the case in which the rejection (acceptance) region of the test is the indicator function of a (Borel-measurable) set, that is, we consider only nonrandomized tests (and confidence intervals). For notational convenience we write H_0(\theta_0) for the hypothesis H_0: \theta = \theta_0 and H_1(\theta_0) for the alternative hypothesis, which may be one- or two-sided.

Theorem 2. Let A(\theta_0), \theta_0 \in \Theta, denote the region of acceptance of an \alpha-level test of H_0(\theta_0). For each observation x = (x_1, x_2, \ldots, x_n) let S(x) denote the set

S(x) = \{\theta : x \in A(\theta), \theta \in \Theta\}.    (4)

Then S(x) is a family of confidence sets for \theta at confidence level 1-\alpha. If, moreover, A(\theta_0) is UMP for the problem (\alpha, H_0(\theta_0), H_1(\theta_0)), then S(X) minimizes

P_\theta\{S(X) \ni \theta'\}  for all \theta \in H_1(\theta')    (5)

among all (1-\alpha)-level families of confidence sets. That is, S(X) is UMA.

Proof. We have

S(x) \ni \theta if and only if x \in A(\theta),    (6)

so that

P_\theta\{S(X) \ni \theta\} = P_\theta\{X \in A(\theta)\} \ge 1 - \alpha,

as asserted.

If S'(X) is any other family of (1-\alpha)-level confidence sets, let A'(\theta) = \{x : S'(x) \ni \theta\}. Then

P_\theta\{X \in A'(\theta)\} = P_\theta\{S'(X) \ni \theta\} \ge 1 - \alpha,

and since A(\theta_0) is UMP for (\alpha, H_0(\theta_0), H_1(\theta_0)), it follows that

P_\theta\{X \in A'(\theta_0)\} \ge P_\theta\{X \in A(\theta_0)\}  for any \theta \in H_1(\theta_0).

Hence

P_\theta\{S'(X) \ni \theta_0\} \ge P_\theta\{X \in A(\theta_0)\} = P_\theta\{S(X) \ni \theta_0\}

for all \theta \in H_1(\theta_0). This completes the proof.

Example 4. Let X be an RV of the continuous type with one-parameter exponential PDF, given by

f_\theta(x) = \exp\{Q(\theta)T(x) + S'(x) + D(\theta)\},

where Q(\theta) is a nondecreasing function of \theta. Let H_0: \theta = \theta_0 and H_1: \theta < \theta_0. Then the acceptance region of a UMP size \alpha test of H_0 is of the form

A(\theta_0) = \{x : T(x) > c(\theta_0)\}.

Since, for \theta \ge \theta',

P_{\theta'}\{T(X) \le c(\theta')\} = \alpha = P_\theta\{T(X) \le c(\theta)\} \le P_{\theta'}\{T(X) \le c(\theta)\},

c(\theta) may be chosen to be nondecreasing. (The last inequality follows because the power of the UMP test is at least \alpha, its size.) We have

S(x) = \{\theta : x \in A(\theta)\},

so that S(x) is of the form (-\infty, c^{-1}(T(x))) or (-\infty, c^{-1}(T(x))], where c^{-1} is defined by

c^{-1}(T(x)) = \sup_\theta \{\theta : c(\theta) \le T(x)\}.

In particular, if X_1, X_2, \ldots, X_n is a sample from

f_\theta(x) = (1/\theta)e^{-x/\theta}, x > 0, and 0 otherwise,

then T(x) = \sum_{i=1}^{n} x_i, and for testing H_0: \theta = \theta_0 against H_1: \theta < \theta_0, the UMP acceptance region is of the form

A(\theta_0) = \left\{ x : \sum_{i=1}^{n} x_i \ge c(\theta_0) \right\},

where c(\theta_0) is the unique solution of

\int_{c(\theta_0)/\theta_0}^{\infty} \frac{y^{n-1}}{(n-1)!} e^{-y}\,dy = 1 - \alpha,  0 < \alpha < 1.

The UMA family of (1-\alpha)-level confidence sets is of the form

S(x) = \{\theta : x \in A(\theta)\}.

In the case n = 1, c(\theta_0) = \theta_0 \log\frac{1}{1-\alpha} and S(x) = \left(0, \frac{x}{-\log(1-\alpha)}\right).
Example 5. Let X_1, X_2, \ldots, X_n be iid U(0, \theta) RVs. In Problem 9.4.3 we asked the reader to show that the test

\varphi(x) = 1 if x_{(n)} > \theta_0 or x_{(n)} < \theta_0\alpha^{1/n}, and 0 otherwise,

is a UMP size \alpha test of \theta = \theta_0 against \theta \ne \theta_0. Then

A(\theta_0) = \{x : \theta_0\alpha^{1/n} \le x_{(n)} \le \theta_0\},

and it follows that [x_{(n)}, x_{(n)}\alpha^{-1/n}] is a (1-\alpha)-level UMA confidence interval for \theta.
The third method we consider is based on Bayesian analysis, where we take into account any prior knowledge that the experimenter has about \theta. This is reflected in the specification of the prior distribution \pi(\theta) on \Theta. Under this setup the claims of probability of coverage are based not on the distribution of X but on the conditional distribution of \theta given X = x, the posterior distribution of \theta.

Let \Theta be the parameter set, and let the observable RV X have PDF (PMF) f_\theta(x). Suppose that we consider \theta as an RV with distribution \pi(\theta) on \Theta. Then f_\theta(x) can be considered as the conditional PDF (PMF) of X, given that the RV \theta takes the value \theta. Note that we are using the same symbol for the RV \theta and the value that it assumes. We can determine the joint distribution of X and \theta, the marginal distribution of X, and also the conditional distribution of \theta, given X = x, as usual. Thus the joint distribution is given by

f(x, \theta) = \pi(\theta) f_\theta(x),    (7)

and the marginal distribution of X by

g(x) = \sum_\theta \pi(\theta) f_\theta(x) if \pi is a PMF,  and  \int \pi(\theta) f_\theta(x)\,d\theta if \pi is a PDF.    (8)

The conditional distribution of \theta, given that x is observed, is given by

h(\theta \mid x) = \frac{\pi(\theta) f_\theta(x)}{g(x)},  g(x) > 0.    (9)

Given h(\theta \mid x), it is easy to find functions l(x), u(x) such that

P\{l(X) < \theta < u(X)\} \ge 1 - \alpha,

where

P\{l(X) < \theta < u(X) \mid X = x\} = \int_{l}^{u} h(\theta \mid x)\,d\theta  or  \sum_{l < \theta < u} h(\theta \mid x),    (10)

depending on whether h is a PDF or a PMF.

Definition 2. An interval (l(x), u(x)) that has probability at least 1-\alpha of including \theta is called a (1-\alpha)-level Bayes interval for \theta. Also, l(x) and u(x) are called the lower and upper limits of the interval.

One can similarly define one-sided Bayes intervals or (1-\alpha)-level lower and upper Bayes limits.

Remark 5. We note that, under the Bayesian setup, we can speak of the probability that \theta lies in the interval (l(x), u(x)) with probability 1-\alpha because l and u are computed based on the posterior distribution of \theta given x. In order to emphasize this distinction between Bayesian and classical analysis, some authors prefer the term credible sets for Bayesian confidence sets.
Example 6. Let X_1, X_2, \ldots, X_n be iid N(\mu, 1), \mu \in R, and let the a priori distribution of \mu be N(0, 1). Then from Example 8.8.6 we know that h(\mu \mid x) is

N\left( \frac{\sum_{i=1}^{n} x_i}{n+1}, \frac{1}{n+1} \right).

Thus a (1-\alpha)-level Bayesian confidence interval is

\left( \frac{n\bar{x}}{n+1} - \frac{z_{\alpha/2}}{\sqrt{n+1}}, \frac{n\bar{x}}{n+1} + \frac{z_{\alpha/2}}{\sqrt{n+1}} \right).

A (1-\alpha)-level confidence interval for \mu (treating \mu as fixed) is a random interval with value

\left( \bar{x} - \frac{z_{\alpha/2}}{\sqrt{n}}, \bar{x} + \frac{z_{\alpha/2}}{\sqrt{n}} \right).

Thus the Bayesian interval is somewhat shorter in length. This is to be expected since we assumed more in the Bayesian case.
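A sketch comparing the two intervals in Example 6 for a hypothetical sample (scipy assumed).

import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(2)
n, alpha = 15, 0.05
x = rng.normal(0.8, 1.0, size=n)        # hypothetical N(mu, 1) data
z = norm.ppf(1 - alpha / 2)

post_mean = x.sum() / (n + 1)           # posterior is N(sum x_i/(n+1), 1/(n+1)) under the N(0,1) prior
bayes = (post_mean - z / np.sqrt(n + 1), post_mean + z / np.sqrt(n + 1))
classical = (x.mean() - z / np.sqrt(n), x.mean() + z / np.sqrt(n))
print(bayes, classical)                 # the Bayes interval is slightly shorter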
Example 7. Let X_1, X_2, \ldots, X_n be iid b(1, p) RVs, and let the prior distribution on \Theta = (0,1) be U(0,1). A simple computation shows that the posterior PDF of p, given x, is

h(p \mid x) = \frac{p^{\sum_{i=1}^{n} x_i} (1-p)^{n - \sum_{i=1}^{n} x_i}}{B\left(\sum_{i=1}^{n} x_i + 1, n - \sum_{i=1}^{n} x_i + 1\right)},  0 < p < 1, and 0 otherwise.

Given a table of incomplete beta integrals and the observed value of \sum_{i=1}^{n} x_i, one can easily construct a Bayesian confidence interval for p.
Finally, we consider some large-sample methods of constructing confidence intervals. Suppose T(X) \sim AN(\theta, v(\theta)/n). Then

\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{v(\theta)}} \xrightarrow{L} Z,

where Z \sim N(0,1). Suppose further that there is a statistic S(X) such that S(X) \xrightarrow{P} v(\theta). Then, by Slutsky's theorem,

\sqrt{n}\,\frac{T(X) - \theta}{\sqrt{S(X)}} \xrightarrow{L} Z,

and we can obtain an (approximate) (1-\alpha)-level confidence interval for \theta by inverting the inequality

\left| \sqrt{n}\,\frac{T(X) - \theta}{\sqrt{S(X)}} \right| \le z_{\alpha/2}.
Example 8. Let X_1, X_2, ..., X_n be iid RVs with finite variance. Also, let EX_i = μ and E(X_i − μ)² = σ². From the CLT it follows that

    (X̄ − μ)/(σ/√n)  →_L  Z,

where Z ∼ N(0, 1). Suppose that we want a (1 − α)-level confidence interval for μ when σ is not known. Since S →_P σ, for large n the quantity √n (X̄ − μ)/S is approximately normally distributed with mean 0 and variance 1. Hence, for large n, we can find constants c_1, c_2 such that

    P{ c_1 < √n (X̄ − μ)/S < c_2 } = 1 − α.

In particular, we can choose −c_1 = c_2 = z_{α/2} to give

    ( x̄ − s z_{α/2}/√n,  x̄ + s z_{α/2}/√n )

as an approximate (1 − α)-level confidence interval for μ.
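A minimal sketch of this large-sample interval, assuming NumPy and SciPy (the function name is ours):

```python
import numpy as np
from scipy.stats import norm

def large_sample_ci_mean(x, alpha=0.05):
    """Approximate (1-alpha)-level interval of Example 8: xbar +/- z_{alpha/2} s/sqrt(n)."""
    x = np.asarray(x, float)
    n = len(x)
    half = norm.ppf(1 - alpha / 2) * x.std(ddof=1) / np.sqrt(n)
    return x.mean() - half, x.mean() + half
```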
Recall that if θ̂ is the MLE of θ and the conditions of Theorem 8.7.4 or 8.7.5 are satisfied (caution: see Remark 8.7.4), then

    √n (θ̂ − θ)/σ  →_L  N(0, 1)   as n → ∞,

where

    σ² = { E_θ[ ∂ log f_θ(X)/∂θ ]² }^{−1} = 1/I(θ).

Then we can invert the statement

    P_θ{ −z_{α/2} < √n (θ̂ − θ)/σ < z_{α/2} } ≥ 1 − α

to give an approximate (1 − α)-level confidence interval for θ.

Yet another possible procedure has universal applicability and hence can be used for large or small samples. Unfortunately, however, this procedure usually yields confidence intervals that are much too large in length. The method employs the well-known Chebychev inequality (see Section 3.4):

    P{ |X − EX| < ε √var(X) } > 1 − 1/ε².

If θ̂ is an estimate of θ (not necessarily unbiased) with finite variance σ²(θ), then by Chebychev's inequality

    P{ |θ̂ − θ| < ε √E(θ̂ − θ)² } > 1 − 1/ε².

It follows that

    ( θ̂ − ε √E(θ̂ − θ)²,  θ̂ + ε √E(θ̂ − θ)² )

is a [1 − (1/ε²)]-level confidence interval for θ. Under some mild consistency conditions one can replace the normalizing constant √E(θ̂ − θ)², which will be some function λ(θ) of θ, by λ(θ̂).
Note that the estimator θ̂ need not have a limiting normal law.
Example 9. Let X_1, X_2, ..., X_n be iid b(1, p) RVs, and suppose it is required to find a confidence interval for p. We know that E X̄ = p and

    var(X̄) = var(X_1)/n = p(1 − p)/n.

It follows that

    P{ |X̄ − p| < ε √( p(1 − p)/n ) } > 1 − 1/ε².

Since p(1 − p) ≤ 1/4, we have

    P{ X̄ − ε/(2√n) < p < X̄ + ε/(2√n) } > 1 − 1/ε².

One can now choose ε and n or, if n is kept constant at a given number, ε to get the desired level.
Actually the confidence interval obtained above can be improved somewhat. We note that

    P{ |X̄ − p| < ε √( p(1 − p)/n ) } > 1 − 1/ε²,

so that

    P{ |X̄ − p|² < ε² p(1 − p)/n } > 1 − 1/ε².

Now

    |X̄ − p|² < (ε²/n) p(1 − p)

if and only if

    (1 + ε²/n) p² − (2X̄ + ε²/n) p + X̄² < 0.

This last inequality holds if and only if p lies between the two roots of the quadratic equation

    (1 + ε²/n) p² − (2X̄ + ε²/n) p + X̄² = 0.

The two roots are

    p_1 = { 2X̄ + (ε²/n) − √( [2X̄ + (ε²/n)]² − 4[1 + (ε²/n)] X̄² ) } / { 2[1 + (ε²/n)] }
        = X̄/(1 + ε²/n) + { (ε²/n) − √( 4(ε²/n) X̄(1 − X̄) + ε⁴/n² ) } / { 2[1 + (ε²/n)] }

and

    p_2 = { 2X̄ + (ε²/n) + √( [2X̄ + (ε²/n)]² − 4[1 + (ε²/n)] X̄² ) } / { 2[1 + (ε²/n)] }
        = X̄/(1 + ε²/n) + { (ε²/n) + √( 4(ε²/n) X̄(1 − X̄) + ε⁴/n² ) } / { 2[1 + (ε²/n)] }.

It follows that

    P{ p_1 < p < p_2 } > 1 − 1/ε².

Note that when n is large,

    p_1 ≈ X̄ − ε √( X̄(1 − X̄)/n )   and   p_2 ≈ X̄ + ε √( X̄(1 − X̄)/n ),

as one should expect in view of the fact that X̄ → p with probability 1 and √[X̄(1 − X̄)/n] estimates √[p(1 − p)/n]. Alternatively, we could have used the CLT (or the large-sample property of the MLE) to arrive at the same result but with ε replaced by z_{α/2}.
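A minimal sketch of the improved interval, assuming NumPy (the function name is ours); it computes the roots p_1, p_2 above with ε chosen as 1/√α so that the Chebychev bound gives level at least 1 − α:

```python
import numpy as np

def chebyshev_binomial_interval(x_bar, n, alpha=0.05):
    """Roots p1, p2 of the quadratic (1 + e2/n)p^2 - (2*xbar + e2/n)p + xbar^2 = 0,
    with epsilon = 1/sqrt(alpha), so that P{p1 < p < p2} > 1 - alpha."""
    eps2_n = (1.0 / alpha) / n                   # epsilon^2 / n
    a = 1 + eps2_n
    b = -(2 * x_bar + eps2_n)
    c = x_bar ** 2
    disc = np.sqrt(b ** 2 - 4 * a * c)
    return (-b - disc) / (2 * a), (-b + disc) / (2 * a)
```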
Example 10. Let X_1, X_2, ..., X_n be a sample from U(0, θ). We seek a confidence interval for the parameter θ. The estimator θ̂ = X_(n) is the MLE of θ, which is also sufficient for θ. From Example 5, [X_(n), α^{−1/n} X_(n)] is a (1 − α)-level UMA confidence interval for θ.
Let us now apply the method of Chebychev's inequality to the same problem. We have

    E_θ X_(n) = nθ/(n + 1)   and   E_θ (X_(n) − θ)² = 2θ²/[(n + 1)(n + 2)].

Thus

    P{ |X_(n) − θ| < εθ √( 2/[(n + 1)(n + 2)] ) } > 1 − 1/ε².

Since X_(n) →_P θ, we replace θ by X_(n) in the bound, and, for moderately large n,

    P{ |X_(n) − θ| < εX_(n) √( 2/[(n + 1)(n + 2)] ) } > 1 − 1/ε².

It follows that

    ( X_(n) − εX_(n) √( 2/[(n + 1)(n + 2)] ),  X_(n) + εX_(n) √( 2/[(n + 1)(n + 2)] ) )

is a [1 − (1/ε²)]-level confidence interval for θ. Choosing 1 − (1/ε²) = 1 − α, or ε = 1/√α, noting that 1/√[(n + 1)(n + 2)] ≈ 1/n for large n, and using the fact that with probability 1, X_(n) ≤ θ, we can use the approximate confidence interval

    ( X_(n),  X_(n)[ 1 + (1/n) √(2/α) ] )

for θ.
In the examples given above we see that, for a given confidence level 1 − α, a wide choice of confidence intervals is available. Clearly, the larger the interval, the better the chance of trapping a true parameter value. Thus the interval (−∞, +∞), which ignores the data completely, will include the real-valued parameter θ with confidence level 1. However, the larger the confidence interval, the less meaningful it is. Therefore, for a given confidence level 1 − α, it is desirable to choose the shortest possible confidence interval. Since the length of the interval, θ̄(X) − θ(X), is in general a random variable, one can show that a confidence interval of level 1 − α with uniformly minimum length among all such intervals does not exist in most cases. The alternative, to minimize E_θ{θ̄(X) − θ(X)}, is also quite unsatisfactory. In the next section we consider the problem of finding the shortest-length confidence interval based on some suitable statistic.
PROBLEMS 11.3
1. A sample of size 25 from a normal population with variance 81 produced a mean of 81.2. Find a 0.95-level confidence interval for the mean μ.
2. Let X̄ be the mean of a random sample of size n from N(μ, 16). Find the smallest sample size n such that (X̄ − 1, X̄ + 1) is a 0.90-level confidence interval for μ.
3. Let X_1, X_2, ..., X_m and Y_1, Y_2, ..., Y_n be independent random samples from N(μ_1, σ²) and N(μ_2, σ²), respectively. Find a confidence interval for μ_1 − μ_2 at confidence level 1 − α when (a) σ is known and (b) σ is unknown.
4. Two independent samples, each of size 7, from normal populations with common unknown variance σ² produced sample means 4.8 and 5.4 and sample variances 8.38 and 7.62, respectively. Find a 0.95-level confidence interval for μ_1 − μ_2, the difference between the two population means.
5. In Problem 3 suppose that the first population has variance σ_1² and the second population has variance σ_2², where both σ_1² and σ_2² are known. Find a (1 − α)-level confidence interval for μ_1 − μ_2. What happens if both σ_1² and σ_2² are unknown and unequal?
6. In Problem 5 find a confidence interval for the ratio σ_2²/σ_1², both when μ_1, μ_2 are known and when μ_1, μ_2 are unknown. What happens if either μ_1 or μ_2 is unknown but the other is known?
7. Let X_1, X_2, ..., X_n be a sample from a G(1, β) distribution. Find a confidence interval for the parameter β with confidence level 1 − α.
8. (a) Use the large-sample properties of the MLE to construct a (1 − α)-level confidence interval for the parameter θ in each of the following cases: (i) X_1, X_2, ..., X_n is a sample from G(1, 1/θ), and (ii) X_1, X_2, ..., X_n is a sample from P(θ).
(b) In part (a) use Chebychev's inequality to do the same.
9. For a sample of size 1 from the population

    f_θ(x) = (2/θ²)(θ − x),   0 < x < θ,

find a (1 − α)-level confidence interval for θ.
10. Let X_1, X_2, ..., X_n be a sample from the uniform distribution on N points. Find an upper (1 − α)-level confidence bound for N, based on max(X_1, X_2, ..., X_n).
11. In Example 10 find the smallest n such that the length of the (1 − α)-level confidence interval (X_(n), α^{−1/n} X_(n)) is < d, provided it is known that θ ≤ a, where a is a known constant.

12. Let X and Y be independent RVs with PDFs λe^{−λx} (x > 0) and μe^{−μy} (y > 0), respectively. Find a (1 − α)-level confidence region for (λ, μ) of the form {(λ, μ): λX + μY ≤ k}.
13. Let X_1, X_2, ..., X_n be a sample from N(μ, σ²), where σ² is known. Find a UMA (1 − α)-level upper confidence bound for μ.
14. Let X_1, X_2, ..., X_n be a sample from a Poisson distribution with unknown parameter λ. Assuming that λ is a value assumed by a G(α, β) RV, find a Bayesian confidence interval for λ.
15. Let X_1, X_2, ..., X_n be a sample from a geometric distribution with parameter θ. Assuming that θ has a priori PDF given by the density of a B(α, β) RV, find a Bayesian confidence interval for θ.
16. Let X_1, X_2, ..., X_n be a sample from N(μ, 1), and suppose that the a priori PDF for μ is U(−1, 1). Find a Bayesian confidence interval for μ.
11.4 SHORTEST-LENGTH CONFIDENCE INTERVALS

We have already remarked that we can increase the confidence level by simply taking a larger-length confidence interval. Indeed, the worthless interval −∞ < θ < ∞, which simply says that θ is a point on the real line, has confidence level 1. In practice, one would like to set the level at a given fixed number 1 − α (0 < α < 1) and, if possible, construct an interval as short in length as possible among all confidence intervals with the same level. Such an interval is desirable since it is more informative. We have already remarked that shortest-length confidence intervals do not always exist. In this section we will investigate the possibility of constructing shortest-length confidence intervals based on simple RVs. The discussion here is based on Guenther [37]. Theorem 11.3.1 is really the key to the following discussion.
Let X_1, X_2, ..., X_n be a sample from a PDF f_θ(x), and let T(X_1, X_2, ..., X_n, θ) = T_θ be a pivot for θ. Also, let λ_1 = λ_1(α), λ_2 = λ_2(α) be chosen so that

    P{λ_1 < T_θ < λ_2} = 1 − α,                                        (1)

and suppose that (1) can be rewritten as

    P{θ(X) < θ < θ̄(X)} = 1 − α.                                        (2)

For every T_θ, λ_1 and λ_2 can be chosen in many ways. We would like to choose λ_1 and λ_2 so that the length θ̄(X) − θ(X) is minimum. Such an interval is a (1 − α)-level shortest-length confidence interval based on T_θ. It may be possible, however, to find another RV T'_θ that yields an even shorter interval. Therefore we are not asserting that the procedure, if it succeeds, will lead to a (1 − α)-level confidence interval that has shortest length among all intervals of this level. For T_θ we use the simplest RV that is a function of a sufficient statistic and θ.

Remark 1. An alternative to minimizing the length of the confidence interval is to minimize the expected length E_θ{θ̄(X) − θ(X)}. Unfortunately, this also is quite unsatisfactory since, in general, there does not exist a member of the class of all (1 − α)-level confidence intervals that minimizes E_θ{θ̄(X) − θ(X)} for all θ. The procedures applied in finding the shortest-length confidence interval based on a pivot are also applicable in finding an interval that minimizes the expected length. We remark here that the restriction to unbiased confidence intervals is natural if we wish to minimize E_θ{θ̄(X) − θ(X)}. See Section 11.5 for definitions and further details.
Example 1. Let X_1, X_2, ..., X_n be a sample from N(μ, σ²), where σ² is known. Then X̄ is sufficient for μ, and we take

    T_μ(X) = (X̄ − μ)/(σ/√n).

Then

    1 − α = P{ a < √n (X̄ − μ)/σ < b } = P{ X̄ − b σ/√n < μ < X̄ − a σ/√n }.

The length of this confidence interval is (σ/√n)(b − a). We wish to minimize L = (σ/√n)(b − a) subject to

    Φ(b) − Φ(a) = (1/√(2π)) ∫_a^b e^{−x²/2} dx = ∫_a^b φ(x) dx = 1 − α.

Here φ and Φ, respectively, are the PDF and DF of an N(0, 1) RV. Thus

    dL/da = (σ/√n)(db/da − 1)   and   φ(b) db/da − φ(a) = 0,

giving

    dL/da = (σ/√n)[ φ(a)/φ(b) − 1 ].

The minimum occurs when φ(a) = φ(b), that is, when a = b or a = −b. Since a = b does not satisfy

    ∫_a^b φ(t) dt = 1 − α,

we choose a = −b. The shortest confidence interval based on T_μ is therefore the equal-tails interval

    ( X̄ + z_{1−α/2} σ/√n,  X̄ + z_{α/2} σ/√n )   or   ( X̄ − z_{α/2} σ/√n,  X̄ + z_{α/2} σ/√n ).

The length of this interval is 2 z_{α/2} (σ/√n). In this case we can plan our experiment to give a prescribed confidence level and a prescribed length for the interval. To have level 1 − α and length ≤ 2d, we choose the smallest n such that

    d ≥ z_{α/2} σ/√n,   or   n ≥ z²_{α/2} σ²/d².

This can also be interpreted as follows. If we estimate μ by X̄, taking a sample of size n ≥ z²_{α/2}(σ²/d²), we are 100(1 − α) percent confident that the error in our estimate is at most d.
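The sample-size rule n ≥ z²_{α/2}σ²/d² is easily evaluated; a minimal sketch assuming SciPy (the function name is ours):

```python
import math
from scipy.stats import norm

def sample_size_for_half_width(sigma, d, alpha=0.05):
    """Smallest n with z_{alpha/2}*sigma/sqrt(n) <= d, i.e., n >= z^2 sigma^2 / d^2."""
    z = norm.ppf(1 - alpha / 2)
    return math.ceil((z * sigma / d) ** 2)
```

For example, sample_size_for_half_width(3, 1) returns 35, since 1.96² · 9 ≈ 34.6.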
Example 2. In Example 1, suppose that σ is unknown. In that case we use

    T_μ(X) = √n (X̄ − μ)/S

as a pivot. T_μ has Student's t-distribution with n − 1 d.f. Thus

    1 − α = P{ a < √n (X̄ − μ)/S < b } = P{ X̄ − b S/√n < μ < X̄ − a S/√n }.

We wish to minimize

    L = (b − a) S/√n   subject to   ∫_a^b f_{n−1}(t) dt = 1 − α,

where f_{n−1}(t) is the PDF of T_μ. We have

    dL/da = (db/da − 1) S/√n   and   f_{n−1}(b) db/da − f_{n−1}(a) = 0,

giving

    dL/da = [ f_{n−1}(a)/f_{n−1}(b) − 1 ] S/√n.

It follows that the minimum occurs at a = −b (the other solution, a = b, is not admissible). The shortest-length confidence interval based on T_μ is the equal-tails interval

    ( X̄ − t_{n−1,α/2} S/√n,  X̄ + t_{n−1,α/2} S/√n ).

The length of this interval is 2 t_{n−1,α/2}(S/√n), which, being random, may be arbitrarily large. Note that the same confidence interval minimizes the expected length of the interval, namely, EL = (b − a) c_n (σ/√n), where c_n is a constant determined from ES = c_n σ, and the minimum expected length is 2 t_{n−1,α/2} c_n (σ/√n).
Example 3. Let X_1, X_2, ..., X_n be iid N(μ, σ²) RVs. Suppose that μ is known and we want a confidence interval for σ². The obvious choice for a pivot T_{σ²} is given by

    T_{σ²}(X) = Σ_1^n (X_i − μ)²/σ²,

which has a chi-square distribution with n d.f. Now

    P{ a < Σ_1^n (X_i − μ)²/σ² < b } = 1 − α,

so that

    P{ Σ_1^n (X_i − μ)²/b < σ² < Σ_1^n (X_i − μ)²/a } = 1 − α.

We wish to minimize

    L = (1/a − 1/b) Σ_1^n (X_i − μ)²   subject to   ∫_a^b f_n(t) dt = 1 − α,

where f_n is the PDF of a chi-square RV with n d.f. We have

    dL/da = [ −1/a² + (1/b²)(db/da) ] Σ_1^n (X_i − μ)²   and   db/da = f_n(a)/f_n(b),

so that

    dL/da = −[ 1/a² − (1/b²) f_n(a)/f_n(b) ] Σ_1^n (X_i − μ)²,

which vanishes if

    1/a² = (1/b²) f_n(a)/f_n(b),   that is,   a² f_n(a) = b² f_n(b).

Numerical results giving values of a and b to four significant places of decimals are available (see Tate and Klett [112]). In practice, the simpler equal-tails interval,

    ( Σ_{i=1}^n (X_i − μ)²/χ²_{n,α/2},  Σ_{i=1}^n (X_i − μ)²/χ²_{n,1−α/2} ),

may be used.
If μ is unknown, we use

    T_{σ²}(X) = Σ_1^n (X_i − X̄)²/σ² = (n − 1) S²/σ²

as a pivot. T_{σ²} has a χ²(n − 1) distribution. Proceeding as above, we can show that the shortest-length confidence interval based on T_{σ²} is ((n − 1)S²/b, (n − 1)S²/a); here a and b are a solution of

    P{ a < χ²(n − 1) < b } = 1 − α   and   a² f_{n−1}(a) = b² f_{n−1}(b),

where f_{n−1} is the PDF of a χ²(n − 1) RV. Numerical solutions due to Tate and Klett [112] may be used, but, in practice, the simpler equal-tails confidence interval,

    ( (n − 1)S²/χ²_{n−1,α/2},  (n − 1)S²/χ²_{n−1,1−α/2} ),

is employed.
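The equal-tails interval just mentioned is easy to compute; a minimal sketch assuming NumPy and SciPy (the function name is ours):

```python
import numpy as np
from scipy.stats import chi2

def equal_tails_variance_ci(x, alpha=0.05):
    """Equal-tails (1-alpha)-level interval for sigma^2 based on (n-1)S^2/sigma^2 ~ chi2(n-1)."""
    x = np.asarray(x, float)
    n = len(x)
    ss = (n - 1) * x.var(ddof=1)
    return ss / chi2.ppf(1 - alpha / 2, n - 1), ss / chi2.ppf(alpha / 2, n - 1)
```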
Example 4. Let X_1, X_2, ..., X_n be a sample from U(0, θ). Then X_(n) is sufficient for θ with density

    f_n(y) = n y^{n−1}/θ^n,   0 < y < θ.

The RV T_θ = X_(n)/θ has PDF

    h(t) = n t^{n−1},   0 < t < 1.

Using T_θ as a pivot, we see that the confidence interval is (X_(n)/b, X_(n)/a) with length L = X_(n)(1/a − 1/b). We minimize L subject to

    ∫_a^b n t^{n−1} dt = b^n − a^n = 1 − α.

Now

    (1 − α)^{1/n} < b ≤ 1

and

    dL/db = X_(n)[ −(1/a²)(da/db) + 1/b² ] = X_(n) (a^{n+1} − b^{n+1})/(b² a^{n+1}) < 0,

so that the minimum occurs at b = 1. The shortest interval is therefore (X_(n), X_(n)/α^{1/n}).
Note that

    EL = (1/a − 1/b) E X_(n) = [nθ/(n + 1)] (1/a − 1/b),

which is minimized subject to b^n − a^n = 1 − α when b = 1 and a = α^{1/n}. The expected length of the interval that minimizes EL is [(1/α^{1/n}) − 1][nθ/(n + 1)], which is also the expected length of the shortest confidence interval based on X_(n).
Note that the length of the interval (X_(n), α^{−1/n} X_(n)) goes to 0 as n → ∞.
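The shortest interval based on X_(n) is a one-liner in code; a minimal sketch (the function name is ours):

```python
def shortest_uniform_interval(x, alpha=0.05):
    """Shortest (1-alpha)-level interval of Example 4: (X_(n), X_(n)/alpha^(1/n))."""
    xn = max(x)
    return xn, xn / alpha ** (1.0 / len(x))
```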
For some results on asymptotically shortest-length confidence intervals, we refer the
reader to Wilks [118, pp. 374–376].
PROBLEMS 11.4
1. Let X_1, X_2, ..., X_n be a sample from

    f_θ(x) = e^{−(x−θ)} if x > θ,  and 0 otherwise.

Find the shortest-length confidence interval for θ at level 1 − α based on a sufficient statistic for θ.
2. Let X_1, X_2, ..., X_n be a sample from G(1, θ). Find the shortest-length confidence interval for θ at level 1 − α, based on a sufficient statistic for θ.
3. In Problem 11.3.9 how will you find the shortest-length confidence interval for θ at level 1 − α, based on the statistic X/θ?
4. Let T(X, θ) be a pivot of the form T(X, θ) = T_1(X) − θ. Show how one can construct a confidence interval for θ with fixed width d and maximum possible confidence coefficient. In particular, construct a confidence interval that has fixed width d and maximum possible confidence coefficient for the mean μ of a normal population with variance 1. Find the smallest size n for which this confidence interval has a confidence coefficient ≥ 1 − α. Repeat the above in sampling from an exponential PDF

    f_μ(x) = e^{μ−x} for x > μ,  and f_μ(x) = 0 for x ≤ μ.

(Desu [21])
5. Let X_1, X_2, ..., X_n be a random sample from

    f_θ(x) = (1/(2θ)) exp{−|x|/θ},   x ∈ R, θ > 0.

Find the shortest-length (1 − α)-level confidence interval for θ, based on the sufficient statistic Σ_{i=1}^n |X_i|.
6. In Example 4, let R = X_(n) − X_(1). Find a (1 − α)-level confidence interval for θ of the form (R, R/c). Compare the expected length of this interval to the one computed in Example 4.
7. Let X_1, X_2, ..., X_n be a random sample from a Pareto PDF f_θ(x) = θ/x², x > θ, and = 0 for x ≤ θ. Show that the shortest-length confidence interval for θ based on X_(1) is (X_(1) α^{1/n}, X_(1)). (Use θ/X_(1) as a pivot.)
8. Let X_1, X_2, ..., X_n be a sample from PDF f_θ(x) = 1/(θ_2 − θ_1), θ_1 ≤ x ≤ θ_2, θ_1 < θ_2, and = 0 otherwise. Let R = X_(n) − X_(1). Using R/(θ_2 − θ_1) as a pivot for estimating θ_2 − θ_1, show that the shortest-length confidence interval is of the form (R, R/c), where c is determined from the level as a solution of c^{n−1}{(n − 1)c − n} + α = 0 (Ferentinos [27]).
11.5 UNBIASED AND EQUIVARIANT CONFIDENCE INTERVALS

In Section 11.3 we studied test inversion as one of the methods of constructing confidence intervals. We showed that UMP tests lead to UMA confidence intervals. In Chapter 9 we saw that UMP tests generally do not exist. In such situations we either restrict consideration to smaller subclasses of tests by requiring that the test functions have some desirable properties, or we restrict the class of alternatives to those near the null parameter values. In this section we will follow a similar approach in constructing confidence intervals.

Definition 1. A family {S(x)} of confidence sets for a parameter θ is said to be unbiased at confidence level 1 − α if

    P_θ{S(X) contains θ} ≥ 1 − α                                       (1)

and

    P_θ{S(X) contains θ'} ≤ 1 − α   for all θ, θ' ∈ Θ, θ ≠ θ'.         (2)

If S(X) is an interval satisfying (1) and (2), we call it a (1 − α)-level unbiased confidence interval. If a family of unbiased confidence sets at level 1 − α is UMA in the class of all (1 − α)-level unbiased confidence sets, we call it a UMA unbiased (UMAU) family of confidence sets at level 1 − α. In other words, if S*(x) satisfies (1) and (2) and minimizes

    P_θ{S(X) contains θ'}   for θ, θ' ∈ Θ, θ ≠ θ',

among all unbiased families of confidence sets S(X) at level 1 − α, then S*(X) is a UMAU family of confidence sets at level 1 − α.

Remark 1. Definition 1 says that a family S(X) of confidence sets for a parameter θ is unbiased at level 1 − α if the probability of true coverage is at least 1 − α and that of false coverage is at most 1 − α. In other words, S(X) traps a true parameter value more often than it does a false one.
Theorem 1. Let A(θ_0) be the acceptance region of a UMP unbiased size α test of H_0(θ_0): θ = θ_0 against H_1(θ_0): θ ≠ θ_0 for each θ_0. Then S(x) = {θ: x ∈ A(θ)} is a UMA unbiased family of confidence sets at level 1 − α.

Proof. To see that S(x) is unbiased, we note that, since A(θ') is the acceptance region of an unbiased test,

    P_θ{S(X) contains θ'} = P_θ{X ∈ A(θ')} ≤ 1 − α.

We next show that S(X) is UMA. Let S'(x) be any other unbiased (1 − α)-level family of confidence sets, and write A'(θ) = {x: S'(x) contains θ}. Then P_θ{X ∈ A'(θ')} = P_θ{S'(X) contains θ'} ≤ 1 − α, and it follows that A'(θ') is the acceptance region of an unbiased size α test. Hence

    P_θ{S'(X) contains θ'} = P_θ{X ∈ A'(θ')}
                           ≥ P_θ{X ∈ A(θ')}
                           = P_θ{S(X) contains θ'}.

The inequality follows since A(θ') is the acceptance region of a UMP unbiased test. This completes the proof.
Example 1. Let X_1, X_2, ..., X_n be a sample from N(μ, σ²) where both μ and σ² are unknown. For testing H_0: μ = μ_0 against H_1: μ ≠ μ_0, it is known (Ferguson [28, p. 232]) that the t-test

    φ(x) = 1  if  |√n (x̄ − μ_0)|/s > c,   and   φ(x) = 0 otherwise,

where x̄ = Σ x_i/n and s² = (n − 1)^{−1} Σ (x_i − x̄)², is UMP unbiased. We choose c from the size requirement

    α = P_{μ=μ_0}{ |√n (X̄ − μ_0)|/S > c },

so that c = t_{n−1,α/2}. Thus

    A(μ_0) = { x: |√n (x̄ − μ_0)|/s ≤ t_{n−1,α/2} }

is the acceptance region of a UMP unbiased size α test of H_0: μ = μ_0 against H_1: μ ≠ μ_0. By Theorem 1, it follows that

    S(x) = {μ: x ∈ A(μ)} = { μ: x̄ − (s/√n) t_{n−1,α/2} ≤ μ ≤ x̄ + (s/√n) t_{n−1,α/2} }

is a UMA unbiased family of confidence sets at level 1 − α.
If the measure of precision of a confidence interval is its expected length, one is naturally led to a consideration of unbiased confidence intervals. Pratt [81] has shown that the expected length of a confidence interval is the average of false coverage probabilities.

Theorem 2. Let Θ be an interval on the real line and f_θ be the PDF of X. Let S(X) be a family of (1 − α)-level confidence intervals of finite length, that is, let S(X) = (θ(X), θ̄(X)), and suppose that θ̄(X) − θ(X) is (random) finite. Then

    ∫ (θ̄(x) − θ(x)) f_θ(x) dx = ∫_{θ' ∈ Θ, θ' ≠ θ} P_θ{S(X) contains θ'} dθ'       (3)

for all θ ∈ Θ.

Proof. We have

    θ̄ − θ = ∫_θ^{θ̄} dθ'.

Thus, for all θ ∈ Θ,

    E_θ{θ̄(X) − θ(X)} = E_θ ∫_{θ(X)}^{θ̄(X)} dθ'
                      = ∫ f_θ(x) [ ∫_{θ(x)}^{θ̄(x)} dθ' ] dx
                      = ∫ [ ∫_{θ(x) < θ' < θ̄(x)} f_θ(x) dx ] dθ'
                      = ∫ P_θ{S(X) contains θ'} dθ'
                      = ∫_{θ' ∈ Θ, θ' ≠ θ} P_θ{S(X) contains θ'} dθ'.

Remark 2. If S(X) is a family of UMAU (1 − α)-level confidence intervals, the expected length of S(X) is minimal. This follows since the left-hand side of (3) is the expected length of S(X) if θ is the true value, and P_θ{S(X) contains θ'} is minimal [because S(X) is UMAU], by Theorem 1, with respect to all families of (1 − α)-level unbiased confidence intervals, uniformly in θ' (θ' ≠ θ).
Since a reasonably complete discussion of UMP unbiased tests (see Section 9.5) is beyond the scope of this text, the following procedure for determining unbiased confidence intervals is sometimes quite useful (see Guenther [38]). Let X_1, X_2, ..., X_n be a sample from an absolutely continuous DF with PDF f_θ(x), and suppose that we seek an unbiased confidence interval for θ. Following the discussion in Section 11.4, suppose that

    T(X_1, X_2, ..., X_n, θ) = T(X, θ) = T_θ

is a pivot, and suppose that the statement

    P{λ_1(α) < T_θ < λ_2(α)} = 1 − α

can be converted to

    P_θ{θ(X) < θ < θ̄(X)} = 1 − α.

In order for (θ, θ̄) to be unbiased, we must have

    P(θ, θ') = P_θ{θ(X) < θ' < θ̄(X)} = 1 − α   if θ' = θ               (4)

and

    P(θ, θ') < 1 − α   if θ' ≠ θ.                                       (5)

If P(θ, θ') depends only on a function γ of θ, θ', we may write

    P(γ) = 1 − α  if θ' = θ,   and   P(γ) < 1 − α  if θ' ≠ θ,           (6)

and it follows that P(γ) has a maximum at θ' = θ.
Example 2. Let X_1, X_2, ..., X_n be iid N(μ, σ²) RVs, and suppose that we desire an unbiased confidence interval for σ². Then

    T(X, σ²) = (n − 1) S²/σ² = T_σ

has a χ²(n − 1) distribution, and we have

    P{ λ_1 < (n − 1) S²/σ² < λ_2 } = 1 − α,

so that

    P{ (n − 1) S²/λ_2 < σ² < (n − 1) S²/λ_1 } = 1 − α.

Then

    P(σ², σ'²) = P_{σ²}{ (n − 1) S²/λ_2 < σ'² < (n − 1) S²/λ_1 } = P{ T_σ/λ_2 < γ < T_σ/λ_1 },

where γ = σ'²/σ² and T_σ ∼ χ²(n − 1). Thus

    P(γ) = P{ λ_1 γ < T_σ < λ_2 γ }.

Then P(1) = 1 − α, and we require P(γ) < 1 − α for γ ≠ 1. Thus we need λ_1, λ_2 such that

    P(1) = 1 − α                                                        (7)

and

    dP(γ)/dγ |_{γ=1} = λ_2 f_{n−1}(λ_2) − λ_1 f_{n−1}(λ_1) = 0,         (8)

where f_{n−1} is the PDF of T_σ. Equations (7) and (8) have been solved numerically for λ_1, λ_2 by several authors (see, for example, Tate and Klett [112]). Having obtained λ_1, λ_2 from (7) and (8), we have as the unbiased (1 − α)-level confidence interval

    ( (n − 1) S²/λ_2,  (n − 1) S²/λ_1 ).                                (9)

Note that in this case the shortest-length confidence interval (based on T_σ) derived in Example 11.4.3, the usual equal-tails confidence interval, and (9) are all different. The length of the confidence interval (9), however, can be considerably greater than that of the shortest interval of Example 11.4.3. For large n all three sets of intervals are approximately the same.
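Instead of consulting the Tate and Klett tables, equations (7) and (8) can be solved numerically. A sketch assuming SciPy (the function name is ours; the equal-tails quantiles serve as starting values and fsolve is assumed to converge from them):

```python
import numpy as np
from scipy import stats, optimize

def unbiased_chi2_limits(n, alpha=0.05):
    """Solve (7) and (8) of Example 2 for lambda_1, lambda_2 with df = n - 1."""
    df = n - 1
    def equations(lam):
        l1, l2 = lam
        coverage = stats.chi2.cdf(l2, df) - stats.chi2.cdf(l1, df) - (1 - alpha)   # (7)
        unbiased = l2 * stats.chi2.pdf(l2, df) - l1 * stats.chi2.pdf(l1, df)       # (8)
        return [coverage, unbiased]
    start = [stats.chi2.ppf(alpha / 2, df), stats.chi2.ppf(1 - alpha / 2, df)]
    l1, l2 = optimize.fsolve(equations, start)
    return l1, l2

# The unbiased interval (9) is then ((n-1)*s2/l2, (n-1)*s2/l1) for observed s2.
```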
Finally, let us briefly investigate how invariance considerations apply to confidence estimation. Let X = (X_1, X_2, ..., X_n) ∼ f_θ, θ ∈ Θ ⊆ R. Let G be a group of transformations on X which leaves P = {f_θ: θ ∈ Θ} invariant. Let S(X) be a (1 − α)-level confidence set for θ.

Definition 2. Let P be invariant under G and let S(x) be a confidence set for θ. Then S is equivariant under G if, for every x ∈ X, θ ∈ Θ and g ∈ G,

    S(x) contains θ ⇔ S(g(x)) contains ḡθ.                              (10)

Example 3. Let X_1, X_2, ..., X_n be a sample from PDF

    f_θ(x) = exp{−(x − θ)},   x > θ,

and = 0 if x ≤ θ. Let G = {{a, 1}: a ∈ R}, where {a, 1}x = (x_1 + a, x_2 + a, ..., x_n + a), and G induces Ḡ = G on Θ = R. The family {f_θ} remains invariant under G. Consider a confidence interval of the form

    S(x) = {θ: x̄ − c_1 ≤ θ ≤ x̄ + c_2},

where c_1, c_2 are constants. Then

    S({a, 1}x) = {θ: x̄ + a − c_1 ≤ θ ≤ x̄ + a + c_2}.

Clearly,

    S(x) contains θ ⇔ x̄ + a − c_1 ≤ θ + a ≤ x̄ + a + c_2 ⇔ S({a, 1}x) contains ḡθ = θ + a,

and it follows that S(x) is an equivariant confidence interval.
The most useful method of constructing equivariant confidence intervals is test inversion. Inverting the acceptance region of invariant tests often leads to equivariant confidence intervals under certain conditions. Recall that a group G of transformations leaves a hypothesis-testing problem invariant if G leaves both Θ_0 and Θ_1 invariant. For each H_0: θ = θ_0, θ_0 ∈ Θ, we have a different group of transformations, G_{θ_0}, which leaves the problem of testing θ = θ_0 invariant. The equivariant confidence interval, on the other hand, must be equivariant with respect to G, which is a much larger group since G ⊃ G_{θ_0} for all θ_0. The relationship between an equivariant confidence set and invariant tests is more complicated when the family P has a nuisance parameter τ.
Under certain conditions there is a relationship between equivariant confidence sets and associated invariant tests. Rather than pursue this relationship, we refer the reader to Ferguson [28, p. 262]; it is generally easier to check directly that (10) holds for a given confidence interval S to show that S is equivariant. The following example illustrates this point.

Example 4. Let X_1, X_2, ..., X_n be iid N(μ, σ²) RVs where both μ and σ² are unknown. In Example 9.5.3 we showed that the test

    φ(x) = 1  if  Σ_1^n (x_i − x̄)² ≤ σ_0² χ²_{n−1,1−α},   and   φ(x) = 0 otherwise,

is UMP invariant under the translation group for testing H_0: σ² ≥ σ_0² against H_1: σ² < σ_0². Then the acceptance region of φ is

    A(σ_0²) = { x: Σ_1^n (x_i − x̄)² > σ_0² χ²_{n−1,1−α} }.

Clearly,

    x ∈ A(σ_0²) ⇔ σ_0² < (n − 1)s²/χ²_{n−1,1−α},

and it follows that

    S(x) = { σ²: σ² < (n − 1)s²/χ²_{n−1,1−α} }

is a (1 − α)-level confidence interval (upper confidence bound) for σ². We show that S is equivariant with respect to the scale group. In fact,

    S({0, c}x) = { σ²: σ² < c²(n − 1)s²/χ²_{n−1,1−α} },

and

    σ² < (n − 1)s²/χ²_{n−1,1−α} ⇔ c²σ² < c²(n − 1)s²/χ²_{n−1,1−α} ⇔ S({0, c}x) contains ḡσ² = c²σ²,

and it follows that S(x) is an equivariant confidence interval for σ².
PROBLEMS 11.5
1. Let X_1, X_2, ..., X_n be a sample from U(0, θ). Show that the unbiased confidence interval for θ based on the pivot max X_i/θ coincides with the shortest-length confidence interval based on the same pivot.
2. Let X_1, X_2, ..., X_n be a sample from G(1, θ). Find the unbiased confidence interval for θ based on the pivot 2 Σ_{i=1}^n X_i/θ.
3. Let X_1, X_2, ..., X_n be a sample from PDF

    f_θ(x) = e^{−(x−θ)} if x > θ,  and 0 otherwise.

Find the unbiased confidence interval based on the pivot 2n[min X_i − θ].
4. Let X_1, X_2, ..., X_n be iid N(μ, σ²) RVs where both μ and σ² are unknown. Using the pivot T_{μ,σ} = √n (X̄ − μ)/S, show that the shortest-length unbiased (1 − α)-level confidence interval for μ is the equal-tails interval (X̄ − t_{n−1,α/2} S/√n, X̄ + t_{n−1,α/2} S/√n).
5. Let X_1, X_2, ..., X_n be iid with PDF f_θ(x) = θ/x², x ≥ θ, and = 0 otherwise. Find the shortest-length (1 − α)-level unbiased confidence interval for θ based on the pivot θ/X_(1).
6. Let X_1, X_2, ..., X_n be a random sample from a location family P = {f_θ(x) = f(x − θ); θ ∈ R}. Show that a confidence interval of the form S(x) = {θ: T(x) − c_1 ≤ θ ≤ T(x) + c_2}, where T(x) is an equivariant estimate under the location group, is an equivariant confidence interval.
7. Let X_1, X_2, ..., X_n be iid RVs with common scale PDF f_σ(x) = (1/σ) f(x/σ), σ > 0. Consider the scale group G = {{0, b}: b > 0}. If T(x) is an equivariant estimate of σ, show that a confidence interval of the form

    S(x) = { σ: c_1 ≤ T(x)/σ ≤ c_2 }

is equivariant.
8. Let X_1, X_2, ..., X_n be iid RVs with PDF f_θ(x) = exp{−(x − θ)}, x > θ, and = 0 otherwise. For testing H_0: θ = θ_0 against H_1: θ > θ_0, consider the (UMP) test φ(x) = 1 if X_(1) ≥ θ_0 − (ln α)/n, and = 0 otherwise. Is the acceptance region of this α-level test an equivariant (1 − α)-level confidence interval (lower bound) for θ with respect to the location group?
11.6 RESAMPLING: BOOTSTRAP METHOD

In many applications of statistical inference the investigator has a random sample from a population distribution DF F which may or may not be completely specified. Indeed, the empirical data may not even fit any known distribution. The inference is typically based on some statistic such as X̄, S², a percentile, or some much more complicated statistic such as the sample correlation coefficient or an odds ratio. For this purpose we need to know the distribution of the statistic being used and/or its moments. Except for the simple situations such as those described in Chapter 6 this is not easy. And even if we are able to get a handle on it, it may be inconvenient to deal with. Often, when the sample is large enough, one can resort to the asymptotic approximations considered in Chapter 7. Alternatively, one can use computer-intensive techniques which have become quite popular in the last 25 years due to the availability of fast home or office laptops and desktops.
Suppose x_1, x_2, ..., x_n is a random sample from a distribution F with unknown parameter θ(F), and let θ̃ be an estimate of θ(F). What is the bias of θ̃ and its SE? Resampling refers to sampling from x_1, x_2, ..., x_n and using these samples to estimate the statistical properties of θ̃. The jackknife is one such method, where one uses subsets of the sample obtained by excluding one or more observations at a time. For each of these subsamples an estimate θ̃_j of θ is computed, and these estimates are then used to investigate the statistical properties of θ̃.
The most commonly used resampling method is the bootstrap, introduced by Efron [22], where one draws random samples of size n, with replacement, from x_1, x_2, ..., x_n. This allows us to generate a large number of bootstrap samples and hence bootstrap estimates θ̂_b of θ. This bootstrap distribution of θ̂_b is then used to study the statistical properties of θ̃.
Let X*_{b1}, X*_{b2}, ..., X*_{bn}, b = 1, 2, ..., B, be iid RVs with common DF F*_n, the empirical DF corresponding to the sample x_1, x_2, ..., x_n. Then (X*_{b1}, X*_{b2}, ..., X*_{bn}) is called a bootstrap sample. Let θ be the parameter of interest associated with DF F, and suppose we have chosen θ̂ to be an estimate of θ based on the sample x_1, x_2, ..., x_n. For each bootstrap sample let θ̂_b, b = 1, 2, ..., B, be the corresponding bootstrap estimate of θ. We can now study the statistical properties of θ̂ based on the distribution of the values θ̂_b, b = 1, 2, ..., B. Let θ̄* = Σ_{b=1}^B θ̂_b/B. Then the variance of θ̂ is estimated by the bootstrap variance

    var_bs(θ̂) = var(θ̂_b) = (1/(B − 1)) Σ_{b=1}^B (θ̂_b − θ̄*)².          (1)

Similarly, the bias of θ̂, b(θ) = E(θ̂) − θ, is estimated by

    bias_bs(θ̂) = θ̄* − θ̂.                                                (2)

Arranging the values of θ̂_b, b = 1, 2, ..., B, in increasing order of magnitude and then excluding the 100α/2 percent smallest and largest values, we get a (1 − α)-level confidence interval for θ. This is the so-called percentile confidence interval. One can also use this confidence interval to test hypotheses concerning θ.
Example 1. For this example we took a random sample of size 20 from a distribution on (0.25, 1.25) with the following results:

0.75 0.49 1.14 0.79 0.59 1.14 1.17 0.42 0.57 1.05
0.31 0.46 0.73 0.32 0.81 0.45 0.56 0.42 0.66 0.63

Suppose we wish to estimate the mean θ of the population distribution. For the sake of this illustration we use θ̂ = x̄ and use the bootstrap to estimate the SE of θ̂.
We took 1000 random samples, with replacement, of size 20 each from this sample, with the following distribution of θ̂_b:

Interval     Frequency
0.49–0.53        6
0.53–0.57       29
0.57–0.61      109
0.61–0.65      200
0.65–0.69      234
0.69–0.73      229
0.73–0.77      123
0.77–0.81       59
0.81–0.85       10
0.85–0.89        2

The bootstrap estimate of θ is θ̄* = 0.677, and the bootstrap estimate of the SE of θ̂ is 0.061. By excluding the smallest and the largest twenty-five values of θ̂_b, a 95 percent confidence interval for θ is given by (0.564, 0.793). (We note that x̄ = 0.673 and s = 0.273, so that SE(x̄) = s/√20 = 0.061.) Figure 1 shows the frequency distribution of the bootstrap statistic θ̂_b.
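A computation of this kind is short in code. A minimal sketch assuming NumPy (the function name and the fixed seed are ours); results will differ slightly from the table above because the resampling is random:

```python
import numpy as np

rng = np.random.default_rng(0)

def bootstrap_se_and_percentile_ci(x, stat=np.mean, B=1000, alpha=0.05):
    """Bootstrap SE, bias, and percentile interval as in Section 11.6:
    resample x with replacement B times and apply `stat` to each resample."""
    x = np.asarray(x, float)
    n = len(x)
    boot = np.array([stat(rng.choice(x, size=n, replace=True)) for _ in range(B)])
    se = boot.std(ddof=1)                   # square root of (1)
    bias = boot.mean() - stat(x)            # equation (2)
    lo, hi = np.quantile(boot, [alpha / 2, 1 - alpha / 2])
    return se, bias, (lo, hi)

x = [0.75, 0.49, 1.14, 0.79, 0.59, 1.14, 1.17, 0.42, 0.57, 1.05,
     0.31, 0.46, 0.73, 0.32, 0.81, 0.45, 0.56, 0.42, 0.66, 0.63]
print(bootstrap_se_and_percentile_ci(x))
```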
It is natural to ask how well the distribution of the bootstrap statistic θ̂_b approximates the distribution of θ̂. The bootstrap approximation is often better when applied to an appropriately centered θ̂. Thus, to estimate the population mean θ, the bootstrap is applied to the centered sample mean x̄ − θ. The corresponding bootstrapped version will then be x̄_b − x̄, where x̄_b is the sample mean of the bth bootstrap sample. Similarly, if θ̂ = Z_{1/2} = med(X_1, X_2, ..., X_n), then the bootstrap approximation will be applied to the centered sample median Z_{1/2} − F^{−1}(0.5). The bootstrap version will then be med(X*_{b1}, X*_{b2}, ..., X*_{bn}) − Z_{1/2}. Similarly, in estimating the distribution of the sample variance S², the bootstrap version will be applied to the ratio S²/σ², where σ² is the variance of the DF F.
We have already considered the percentile method of constructing confidence intervals. Let us denote the αth percentile of the distribution of θ̂_b, b = 1, 2, ..., B, by B_α. Suppose that the sampling distribution of θ̂ − θ is approximated by the bootstrap distribution of θ̂_b − θ̂. Then the probability that θ̂ − θ is covered by the interval (B_{α/2} − θ̂, B_{1−α/2} − θ̂) is approximately 1 − α. This is called a (1 − α)-level centered bootstrap percentile confidence interval for θ.
Recall that in sampling from a normal distribution when both the mean and the variance are unknown, a (1 − α)-level confidence interval for the mean θ is based on the t-statistic and is given by (x̄ − t_{n−1,α/2} s/√n, x̄ + t_{n−1,α/2} s/√n). For nonnormal distributions the bootstrap analog of Student's t-statistic is the statistic (θ̂ − θ)/(σ̂/√n). The bootstrap version is the statistic T_b = (θ̂_b − θ̂)/SE_b, where SE_b is the SE computed from the bth bootstrap sample. A (1 − α)-level confidence interval is now easily constructed.
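A sketch of such a bootstrap-t interval for the mean, assuming NumPy (the function name and seed are ours); each replicate is studentized by its own SE and the bootstrap quantiles of T_b are then inverted:

```python
import numpy as np

rng = np.random.default_rng(1)

def bootstrap_t_ci_mean(x, B=1000, alpha=0.05):
    """Bootstrap-t interval for the mean based on T_b = (theta_b - theta_hat)/SE_b."""
    x = np.asarray(x, float)
    n = len(x)
    xbar, se = x.mean(), x.std(ddof=1) / np.sqrt(n)
    t_b = []
    for _ in range(B):
        xs = rng.choice(x, size=n, replace=True)
        se_b = xs.std(ddof=1) / np.sqrt(n)
        if se_b > 0:
            t_b.append((xs.mean() - xbar) / se_b)
    q_lo, q_hi = np.quantile(t_b, [alpha / 2, 1 - alpha / 2])
    # Invert P(q_lo < (theta_hat - theta)/SE < q_hi) ~ 1 - alpha
    return xbar - q_hi * se, xbar - q_lo * se
```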
Fig. 1  Frequency distribution of the bootstrap statistic θ̂_b (intervals 0.49–0.89 on the horizontal axis; frequency 0–250 on the vertical axis).

In our discussion above we have assumed that F is completely unspecified. What if we know F except for the parameter θ? In that case we take bootstrap samples from the fitted distribution F(θ̂) (the parametric bootstrap).
We refer the reader to Efron and Tibshirani [23] for further details.
PROBLEMS 11.6
1. (a) Show that there are (2n−1 choose n) distinct bootstrap samples of size n. [Hint: Problem 1.4.17.]
(b) What is the probability that a bootstrap sample is identical to the original sample?
(c) What is the most likely bootstrap sample to be drawn?
(d) What is the mean number of times that x_i appears in a bootstrap sample?
2. Let x_1, x_2, ..., x_n be a random sample. Then μ̂ = x̄ is an estimate of the unknown mean μ. Consider the leave-one-out jackknife sample, and let μ̃_i be the mean of the remaining (n − 1) observations when x_i is excluded:
(a) Show that x_i = n μ̂ − (n − 1) μ̃_i.
(b) Now suppose we need to estimate a parameter θ and choose θ̂ to be an estimate from the sample. Imitating the jackknife procedure for estimating μ, we set θ*_i = n θ̂ − (n − 1) θ̃_i. What is the jackknife estimate of θ? What are the jackknife estimates of the bias of θ̂ and of its variance?
3. Let x_1, x_2, ..., x_n be a random sample from N(θ, 1) and suppose that x̄ is an estimate of θ. Let X*_1, X*_2, ..., X*_n be a bootstrap sample from N(x̄, 1). Show that both X̄ − θ and X̄* − x̄ have the same N(0, 1/n) distribution.
4. Consider the data set 2, 5, 3, 9. Let x*_1, x*_2, x*_3, x*_4 be a bootstrap sample from this data set:
(a) Find the probability that the bootstrap mean equals 2.
(b) Find the probability that the maximum value of the bootstrap sample is 9.
(c) Find the probability that the bootstrap sample mean is 4.

12
GENERAL LINEAR HYPOTHESIS
12.1 INTRODUCTION
This chapter deals with the general linear hypothesis. In a wide variety of problems the
experimenter is interested in making inferences about a vector parameter. For example,
he may wish to estimate the mean of a multivariate normal or to test some hypotheses
concerning the mean vector. The problem of estimation can be solved, for example, by
resorting to the method of maximum likelihood estimation, discussed in Section 8.7. In this
chapter we restrict ourselves to the so-called linear model problems and concern ourselves
mainly with problems of hypothesis testing.
In Section 12.2 we formally describe the general model and derive a test in complete
generality. In the next four sections we demonstrate the power of this test by solving
four important testing problems. We will need a considerable amount of linear algebra
in Section 12.2.
12.2 GENERAL LINEAR HYPOTHESIS
A wide variety of problems of hypothesis testing can be treated under a general setup. In
this section we state the general problem and derive the test statistic and its distribution.
Consider the following examples.
Example 1. Let Y_1, Y_2, ..., Y_k be independent RVs with EY_i = μ_i, i = 1, 2, ..., k, and common variance σ². Also, n_i observations are taken on Y_i, i = 1, 2, ..., k, and Σ_{i=1}^k n_i = n.
It is required to test H_0: μ_1 = μ_2 = ··· = μ_k. The case k = 2 has already been treated in Section 10.4. Problems of this nature arise quite naturally, for example, in agricultural experiments where one is interested in comparing the average yield when k fertilizers are available.
Example 2. An experimenter observes the velocity of a particle moving along a line. He takes observations at given times t_1, t_2, ..., t_n. Let β_1 be the initial velocity of the particle and β_2 the acceleration; then the velocity at time t is given by y = β_1 + β_2 t + ε, where ε is an RV that is nonobservable (like an error in measurement). In practice, the experimenter does not know β_1 and β_2 and has to use the random observations Y_1, Y_2, ..., Y_n made at times t_1, t_2, ..., t_n, respectively, to obtain some information about the unknown parameters β_1, β_2.
A similar example is the case when the relation between y and t is governed by

    y = β_0 + β_1 t + β_2 t² + ε,

where t is a mathematical variable, β_0, β_1, β_2 are unknown parameters, and ε is a nonobservable RV. The experimenter takes observations Y_1, Y_2, ..., Y_n at predetermined values t_1, t_2, ..., t_n, respectively, and is interested in testing the hypothesis that the relation is in fact linear, that is, β_2 = 0.
Examples of the type discussed above and their much more complicated variants can all be treated under a general setup. To fix ideas, let us first make the following definition.

Definition 1. Let Y = (Y_1, Y_2, ..., Y_n)' be a random column vector and X be an n × k matrix, k < n, of known constants x_{ij}, i = 1, 2, ..., n; j = 1, 2, ..., k. We say that the distribution of Y satisfies a linear model if

    EY = Xβ,                                                           (1)

where β = (β_1, β_2, ..., β_k)' is a vector of unknown (scalar) parameters β_1, β_2, ..., β_k. It is convenient to write

    Y = Xβ + ε,                                                        (2)

where ε = (ε_1, ε_2, ..., ε_n)' is a vector of nonobservable RVs with Eε_j = 0, j = 1, 2, ..., n.
Relation (2) is known as a linear model. The general linear hypothesis concerns β, namely, that β satisfies H_0: Hβ = 0, where H is a known r × k matrix with r ≤ k.
In what follows we will assume that ε_1, ε_2, ..., ε_n are independent normal RVs with common variance σ² and Eε_j = 0, j = 1, 2, ..., n. In view of (2), it follows that Y_1, Y_2, ..., Y_n are independent normal RVs with

    EY_i = Σ_{j=1}^k x_{ij} β_j   and   var(Y_i) = σ²,   i = 1, 2, ..., n.   (3)

We will assume that H is a matrix of full rank r, r ≤ k, and X is a matrix of full rank k < n. Some remarks are in order.

Remark 1. Clearly, Y satisfies a linear model if the vector of means EY = (EY_1, EY_2, ..., EY_n)' lies in a k-dimensional subspace generated by the linearly independent column vectors x_1, x_2, ..., x_k of the matrix X. Indeed, (1) states that EY is a linear combination of the known vectors x_1, ..., x_k. The general linear hypothesis H_0: Hβ = 0 states that the parameters β_1, β_2, ..., β_k satisfy r independent homogeneous linear restrictions. It follows that, under H_0, EY lies in a (k − r)-dimensional subspace of the k-space generated by x_1, ..., x_k.
Remark 2. The assumption of normality, which is conventional, is made to compute the likelihood ratio test statistic of H_0 and its distribution. If the problem is to estimate β, no such assumption is needed. One can use the principle of least squares and estimate β by minimizing the sum of squares

    Σ_{i=1}^n ε_i² = ε'ε = (Y − Xβ)'(Y − Xβ).                          (4)

The minimizing value β̂(y) is known as a least square estimate of β. This is not a difficult problem, and we will not discuss it here in any detail but will mention only that any solution of the so-called normal equations

    X'Xβ = X'Y                                                         (5)

is a least square estimator. If the rank of X is k (< n), then X'X, which has the same rank as X, is a nonsingular matrix that can be inverted to give a unique least square estimator

    β̂ = (X'X)^{−1} X'Y.                                               (6)

If the rank of X is < k, then X'X is singular and the normal equations do not have a unique solution. One can show, for example, that β̂ is unbiased for β, and, if the Y_i's are uncorrelated with common variance σ², the variance–covariance matrix of the β̂_i's is given by

    E[ (β̂ − β)(β̂ − β)' ] = σ²(X'X)^{−1}.                             (7)

Remark 3. One can similarly compute the so-called restricted least square estimator of β by the usual method of Lagrange multipliers. For example, under H_0: Hβ = 0 one simply minimizes (Y − Xβ)'(Y − Xβ) subject to Hβ = 0 to get the restricted least square estimator β̃. The important point is that, if ε is assumed to be a multivariate normal RV with mean vector 0 and dispersion matrix σ²I_n, the MLE of β is the same as the least square estimator. In fact, one can show that β̂_i is the UMVUE of β_i, i = 1, 2, ..., k, by the usual methods.
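A minimal sketch of the least square estimator (6), assuming NumPy (the function name is ours); solving the normal equations (5) directly avoids forming the inverse explicitly:

```python
import numpy as np

def least_squares(X, Y):
    """Least square estimator beta_hat = (X'X)^{-1} X'Y, computed by solving X'X b = X'Y."""
    X, Y = np.asarray(X, float), np.asarray(Y, float)
    return np.linalg.solve(X.T @ X, X.T @ Y)
```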
Example 3. Suppose that a random variable Y is linearly related to a mathematical variable x that is not random (see Example 2). Let Y_1, Y_2, ..., Y_n be observations made at different known values x_1, x_2, ..., x_n of x. For example, x_1, x_2, ..., x_n may represent different levels of fertilizer, and Y_1, Y_2, ..., Y_n, respectively, the corresponding yields of a crop. Also, ε_1, ε_2, ..., ε_n represent unobservable RVs that may be errors of measurement. Then

    Y_i = β_0 + β_1 x_i + ε_i,   i = 1, 2, ..., n,

and we wish to test whether β_1 = 0, that is, whether the fertilizer levels do not affect the yield. Here

    X = [ 1  x_1 ]
        [ 1  x_2 ]
        [ ⋮   ⋮  ]
        [ 1  x_n ],    β = (β_0, β_1)',    ε = (ε_1, ε_2, ..., ε_n)'.

The hypothesis to be tested is H_0: β_1 = 0, so that, with H = (0, 1), the null hypothesis can be written as H_0: Hβ = 0. This is a problem of linear regression.
Similarly, we may assume that the regression of Y on x is quadratic:

    Y = β_0 + β_1 x + β_2 x² + ε,

and we may wish to test that a linear function will be sufficient to describe the relationship, that is, β_2 = 0. Here X is the n × 3 matrix

    X = [ 1  x_1  x_1² ]
        [ 1  x_2  x_2² ]
        [ ⋮   ⋮    ⋮   ]
        [ 1  x_n  x_n² ],    β = (β_0, β_1, β_2)',    ε = (ε_1, ε_2, ..., ε_n)',

and H is the 1 × 3 matrix (0, 0, 1).
In another example of regression, the Y's can be written as

    Y = β_1 x_1 + β_2 x_2 + β_3 x_3 + ε,

and we wish to test the hypothesis that β_1 = β_2 = β_3. In this case, X is the matrix

    X = [ x_11  x_12  x_13 ]
        [ x_21  x_22  x_23 ]
        [  ⋮     ⋮     ⋮   ]
        [ x_n1  x_n2  x_n3 ],

and H may be chosen to be the 2 × 3 matrix

    H = [ 1   0  −1 ]
        [ 1  −1   0 ].

Example 4. Another important example of the general linear hypothesis involves the analysis of variance. We have already derived tests of hypotheses regarding the equality of the means of two normal populations when the variances are equal. In practice, one is frequently interested in the equality of several means when the variances are the same; that is, one has k samples from N(μ_1, σ²), ..., N(μ_k, σ²), where σ² is unknown, and one wants to test H_0: μ_1 = μ_2 = ··· = μ_k (see Example 1). Such a situation is of common occurrence in agricultural experiments. Suppose that k treatments are applied to experimental units (plots), the ith treatment is applied to n_i randomly chosen units, i = 1, 2, ..., k, Σ_{i=1}^k n_i = n, and the observation y_{ij} represents some numerical characteristic (yield) of the jth experimental unit under the ith treatment. Suppose also that

    Y_{ij} = μ_i + ε_{ij},   j = 1, 2, ..., n_i;  i = 1, 2, ..., k,

where the ε_{ij} are iid N(0, σ²) RVs. We are interested in testing H_0: μ_1 = μ_2 = ··· = μ_k. We write

    Y = (Y_11, Y_12, ..., Y_{1n_1}, Y_21, Y_22, ..., Y_{2n_2}, ..., Y_{k1}, Y_{k2}, ..., Y_{kn_k})',
    β = (μ_1, μ_2, ..., μ_k)',

    X = [ 1_{n_1}    0      ···     0     ]
        [    0     1_{n_2}  ···     0     ]
        [    ⋮        ⋮     ···     ⋮     ]
        [    0        0     ···  1_{n_k}  ],

where 1_{n_i} = (1, 1, ..., 1)' is the n_i-vector (i = 1, 2, ..., k), each of whose elements is unity. Thus X is n × k. We can choose

    H = [ 1  −1   0  ···   0 ]
        [ 1   0  −1  ···   0 ]
        [ ⋮   ⋮   ⋮  ···   ⋮ ]
        [ 1   0   0  ···  −1 ],

so that H_0: μ_1 = μ_2 = ··· = μ_k is of the form Hβ = 0. Here H is a (k − 1) × k matrix.
The model described in this example is frequently referred to as a one-way analysis of variance model. This is a very simple example of an analysis of variance model. Note that the matrix X is of a very special type, namely, the elements of X are either 0 or 1. X is known as a design matrix.
Returning to our general model

    Y = Xβ + ε,

we wish to test the null hypothesis H_0: Hβ = 0. We will compute the likelihood ratio test and the distribution of the test statistic. In order to do so, we assume that ε has a multivariate normal distribution with mean vector 0 and variance–covariance matrix σ²I_n, where σ² is unknown and I_n is the n × n identity matrix. This means that Y has an n-variate normal distribution with mean Xβ and dispersion matrix σ²I_n for some β and some σ², both unknown. Here the parameter space Θ is the set of (k + 1)-tuples (β', σ²)' = (β_1, β_2, ..., β_k, σ²)', and the joint PDF of the Y's is given by

    f_{β,σ²}(y_1, y_2, ..., y_n) = (2π)^{−n/2} σ^{−n} exp{ −(1/(2σ²)) Σ_{i=1}^n (y_i − β_1 x_{i1} − ··· − β_k x_{ik})² }   (8)
                                 = (2π)^{−n/2} σ^{−n} exp{ −(1/(2σ²)) (y − Xβ)'(y − Xβ) }.
Theorem 1. Consider the linear model

    Y = Xβ + ε,

where X is an n × k matrix ((x_{ij})), i = 1, 2, ..., n, j = 1, 2, ..., k, of known constants and full rank k < n, β is a vector of unknown parameters β_1, β_2, ..., β_k, and ε = (ε_1, ε_2, ..., ε_n)' is a vector of nonobservable independent normal RVs with common variance σ² and mean Eε = 0. The likelihood ratio test for testing the linear hypothesis H_0: Hβ = 0, where H is an r × k matrix of full rank r ≤ k, is to reject H_0 at level α if F ≥ F_α, where P_{H_0}{F ≥ F_α} = α and F is the RV given by

    F = [ (Y − Xβ̃)'(Y − Xβ̃) − (Y − Xβ̂)'(Y − Xβ̂) ] / (Y − Xβ̂)'(Y − Xβ̂).         (9)

In (9), β̂ and β̃ are the MLEs of β under Θ and Θ_0, respectively. Moreover, the RV [(n − k)/r]F has an F-distribution with (r, n − k) d.f. under H_0.
Proof. The likelihood ratio test of H_0: Hβ = 0 is to reject H_0 if and only if λ(y) < c, where

    λ(y) = sup_{θ ∈ Θ_0} f_{β,σ²}(y) / sup_{θ ∈ Θ} f_{β,σ²}(y),                    (10)

θ = (β', σ²)', and Θ_0 = {(β', σ²)': Hβ = 0}. Let θ̂ = (β̂', σ̂²)' be the MLE of θ ∈ Θ, and θ̃ = (β̃', σ̃²)' be the MLE of θ under H_0, that is, when Hβ = 0. It is easily seen that β̂ is the value of β that minimizes (y − Xβ)'(y − Xβ), and

    σ̂² = n^{−1} (y − Xβ̂)'(y − Xβ̂).                                               (11)

Similarly, β̃ is the value of β that minimizes (y − Xβ)'(y − Xβ) subject to Hβ = 0, and

    σ̃² = n^{−1} (y − Xβ̃)'(y − Xβ̃).                                               (12)

It follows that

    λ(y) = (σ̂²/σ̃²)^{n/2}.                                                         (13)

The critical region λ(y) < c is equivalent to the region {λ(y)}^{−2/n} > {c}^{−2/n}, which is of the form

    σ̃²/σ̂² > c_1.                                                                   (14)

This may be written as

    (y − Xβ̃)'(y − Xβ̃) / (y − Xβ̂)'(y − Xβ̂) > c_1                                  (15)

or, equivalently, as

    [ (y − Xβ̃)'(y − Xβ̃) − (y − Xβ̂)'(y − Xβ̂) ] / (y − Xβ̂)'(y − Xβ̂) > c_1 − 1.    (16)
It remains to determine the distribution of the test statistic. For this purpose it is convenient to reduce the problem to the canonical form. Let V_n be the vector space of the observation vector Y, V_k be the subspace of V_n generated by the column vectors x_1, x_2, ..., x_k of X, and V_{k−r} be the subspace of V_k in which EY is postulated to lie under H_0. We change variables from Y_1, Y_2, ..., Y_n to Z_1, Z_2, ..., Z_n, where Z_1, Z_2, ..., Z_n are independent normal RVs with common variance σ² and means EZ_i = ω_i, i = 1, 2, ..., k, EZ_i = 0, i = k + 1, ..., n. This is done as follows. Let us choose an orthonormal basis of k − r column vectors {α_i} for V_{k−r}, say {α_{r+1}, α_{r+2}, ..., α_k}. We extend this to an orthonormal basis {α_1, α_2, ..., α_r, α_{r+1}, ..., α_k} for V_k, and then extend once again to an orthonormal basis {α_1, α_2, ..., α_k, α_{k+1}, ..., α_n} for V_n. This is always possible.
Let z_1, z_2, ..., z_n be the coordinates of y relative to the basis {α_1, α_2, ..., α_n}. Then z_i = α_i'y and Z = PY, where P is an orthogonal matrix with ith row α_i'. Thus EZ_i = Eα_i'Y = α_i'Xβ, and EZ = PXβ. Since Xβ ∈ V_k (Remark 1), it follows that α_i'Xβ = 0 for i > k. Similarly, under H_0, Xβ ∈ V_{k−r} ⊂ V_k, so that α_i'Xβ = 0 for i ≤ r. Let us write ω = PXβ. Then ω_{k+1} = ω_{k+2} = ··· = ω_n = 0, and under H_0, ω_1 = ω_2 = ··· = ω_r = 0. Finally, from Corollary 2 of Theorem 6 it follows that Z_1, Z_2, ..., Z_n are independent normal RVs with the same variance σ² and EZ_i = ω_i, i = 1, 2, ..., n. We have thus transformed the problem to the following simpler canonical form:

    Ω:   Z_i are independent N(ω_i, σ²), i = 1, 2, ..., n,  with ω_{k+1} = ω_{k+2} = ··· = ω_n = 0;
    H_0: ω_1 = ω_2 = ··· = ω_r = 0.                                                (17)
Now

    (y − Xβ)'(y − Xβ) = (P'z − P'ω)'(P'z − P'ω)                                    (18)
                      = (z − ω)'(z − ω)
                      = Σ_{i=1}^k (z_i − ω_i)² + Σ_{i=k+1}^n z_i².

The quantity (y − Xβ)'(y − Xβ) is minimized if we choose ω̂_i = z_i, i = 1, 2, ..., k, so that

    (y − Xβ̂)'(y − Xβ̂) = Σ_{i=k+1}^n z_i².                                         (19)

Under H_0, ω_1 = ω_2 = ··· = ω_r = 0, so that (y − Xβ)'(y − Xβ) will be minimized if we choose ω̃_i = z_i, i = r + 1, ..., k. Thus

    (y − Xβ̃)'(y − Xβ̃) = Σ_{i=1}^r z_i² + Σ_{i=k+1}^n z_i².                        (20)

It follows that

    F = Σ_{i=1}^r Z_i² / Σ_{i=k+1}^n Z_i².
Now Σ_{i=k+1}^n Z_i²/σ² has a χ²(n − k) distribution, and, under H_0, Σ_{i=1}^r Z_i²/σ² has a χ²(r) distribution. Since Σ_{i=1}^r Z_i² and Σ_{i=k+1}^n Z_i² are independent, we see that [(n − k)/r]F is distributed as F(r, n − k) under H_0, as asserted. This completes the proof of the theorem.
Remark 4. In practice, one does not need to find a transformation that reduces the problem to the canonical form. As will be done in the following sections, one simply computes the estimators θ̂ and θ̃ and then computes the test statistic in any of the equivalent forms (14), (15), or (16) to apply the F-test.

Remark 5. The computation of β̂ and β̃ is greatly facilitated, in view of Remark 3, by using the principle of least squares. Indeed, this was done in the proof of Theorem 1 when we reduced the problem of maximum likelihood estimation to that of minimization of the sum of squares (y − Xβ)'(y − Xβ).
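The recipe of Remarks 4 and 5 can be carried out directly in code. The sketch below (assuming NumPy and SciPy; the function name is ours) computes β̂, a restricted estimator via the standard restricted-least-squares formula β̃ = β̂ − (X'X)^{−1}H'[H(X'X)^{−1}H']^{−1}Hβ̂ (the expression asked for in Problem 5), and the scaled F statistic of (9):

```python
import numpy as np
from scipy.stats import f as f_dist

def glh_f_test(Y, X, H, alpha=0.05):
    """Likelihood ratio F-test of H0: H beta = 0 in Y = X beta + eps (Theorem 1)."""
    Y, X, H = np.asarray(Y, float), np.asarray(X, float), np.asarray(H, float)
    n, k = X.shape
    r = H.shape[0]
    XtX_inv = np.linalg.inv(X.T @ X)
    beta_hat = XtX_inv @ X.T @ Y
    # restricted least squares / MLE under H0
    beta_tilde = beta_hat - XtX_inv @ H.T @ np.linalg.solve(H @ XtX_inv @ H.T, H @ beta_hat)
    rss_full = np.sum((Y - X @ beta_hat) ** 2)
    rss_restricted = np.sum((Y - X @ beta_tilde) ** 2)
    F = ((rss_restricted - rss_full) / r) / (rss_full / (n - k))   # [(n-k)/r]*F of (9)
    p_value = f_dist.sf(F, r, n - k)
    return F, p_value, p_value < alpha
```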
Remark 6. The distribution of the test statistic under H_1 is easily determined. We note that Z_i/σ ∼ N(ω_i/σ, 1) for i = 1, 2, ..., r, so that Σ_{i=1}^r Z_i²/σ² has a noncentral chi-square distribution with r d.f. and noncentrality parameter δ = Σ_{i=1}^r ω_i²/σ². It follows that [(n − k)/r]F has a noncentral F-distribution with d.f. (r, n − k) and noncentrality parameter δ. Under H_0, δ = 0, so that [(n − k)/r]F has a central F(r, n − k) distribution. Since Σ_{i=1}^r ω_i² = Σ_{i=1}^r (EZ_i)², it follows from (19) and (20) that if we replace each observation Y_i by its expected value in the numerator of (16), we get σ²δ.

Remark 7. The general linear hypothesis makes use of the assumption of common variance. For instance, in Example 4, Y_{ij} ∼ N(μ_i, σ²), i = 1, 2, ..., k. Let us suppose instead that Y_{ij} ∼ N(μ_i, σ_i²), i = 1, 2, ..., k. Then we need to test that σ_1 = σ_2 = ··· = σ_k before we can apply Theorem 1. The case k = 2 has already been considered in Section 10.3. For the case where k > 2 one can show that a UMP unbiased test does not exist. A large-sample approximation is described in Lehmann [64, pp. 376–377]. It is beyond the scope of this book to consider the effects of departures from the underlying assumptions. We refer the reader to Scheffé [101, Chapter 10] for a discussion of this topic.

Remark 8. The general linear model (GLM) is widely used in the social sciences, where Y is often referred to as the response (or dependent) variable and X as the explanatory (or independent) variable. In this language the GLM "predicts" a response variable from a linear combination of one or more explanatory variables. It should be noted that dependent and independent in this context do not have the same meaning as in Chapter 4. Moreover, dependence does not imply causality.
PROBLEMS 12.2
1. Show that any solution of the normal equations (5) minimizes the sum of squares (Y − Xβ)'(Y − Xβ).
2. Show that the least square estimator given in (6) is an unbiased estimator of β. If the RVs Y_i are uncorrelated with common variance σ², show that the covariance matrix of the β̂_i's is given by (7).
3. Under the assumption that ε [in model (2)] has a multivariate normal distribution with mean 0 and dispersion matrix σ²I_n, show that the least square estimators and the MLEs of β coincide.
4. Prove statements (11) and (12).
5. Determine the expression for the least squares estimator of β subject to Hβ = 0.
12.3 REGRESSION ANALYSIS

In this section we study regression analysis, which is a tool to investigate the interrelationship between two or more variables. Typically, in its simplest form, a response random variable Y is hypothesized to be related to one or more explanatory nonrandom variables x_i. Regression analysis with a single explanatory variable is known as simple regression, and if, in addition, the relationship is thought to be linear, it is called simple linear regression (Example 12.2.3). In the case where several explanatory variables x_i are involved, the regression is referred to as multiple linear regression. Regression analysis is widely used in forecasting and prediction. Again, this is a special case of the GLM.
This section is divided into three subsections. The first subsection deals with multiple linear regression, where the RV Y is of the continuous type. In the next two subsections we study the case where Y is either Bernoulli or a count variable.

12.3.1 Multiple Linear Regression

It is convenient to write the GLM in the form

    Y = β_0 1_n + Xβ + ε,                                              (1)
where Y, X, ε, and β are as in Equation (12.2.1), and 1_n is the n × 1 column vector (1, 1, ..., 1)'. The parameter β_0 is usually referred to as the intercept, whereas β is known as the slope vector with k parameters. The least squares estimators (LSEs) of β_0 and β are easily obtained by minimizing

    Σ_{i=1}^n (y_i − β_0 − x_i'β)²,    x_i = (x_{i1}, x_{i2}, ..., x_{ik})',  i = 1, 2, ..., n,        (2)

resulting in k + 1 normal equations. Writing

    x̄ = Σ_{i=1}^n x_i/n,    S_xx = Σ_{i=1}^n (x_i − x̄)(x_i − x̄)',    S_xy = Σ_{i=1}^n (x_i − x̄) y_i,   (3)

the normal equations give

    ȳ = β̂_0 + β̂'x̄,    S_xx β̂ = S_xy,   or   β̂ = S_xx^{−1} S_xy   and   β̂_0 = ȳ − β̂'x̄.            (4)

Moreover, E(β̂_0) = β_0, E(β̂) = β, and

    Cov( (β̂_0, β̂')' ) = (σ²/n) [ 1 + n x̄' S_xx^{−1} x̄    −n x̄' S_xx^{−1} ]
                                 [ −n S_xx^{−1} x̄            n S_xx^{−1}    ].                          (5)

An unbiased estimate of σ² is given by

    σ̂² = (1/(n − k − 1)) (Y − β̂_0 1_n − Xβ̂)'(Y − β̂_0 1_n − Xβ̂).                                      (6)
Let us now consider the simple linear regression model

    Y = β_0 1_n + Xβ_1 + ε,                                            (7)

where X = (x_1, x_2, ..., x_n)' and β_1 is a scalar. The LSEs of (β_0, β_1)' are given by

    β̂_1 = Σ (x_i − x̄) y_i / Σ (x_i − x̄)²,    β̂_0 = ȳ − β̂_1 x̄,        (8)

and

    σ̂² = (1/(n − 2)) Σ_{i=1}^n [ y_i − ȳ − β̂_1(x_i − x̄) ]².            (9)

The covariance matrix is given by

    Cov( (β̂_0, β̂_1)' ) = (σ²/n) [ 1 + n x̄²/s_n²    −n x̄/s_n² ]
                                  [ −n x̄/s_n²           n/s_n²  ],       (10)

where s_n² = Σ_{i=1}^n (x_i − x̄)².
Let us now verify these results using the maximum likelihood method.
Clearly, Y_1, Y_2, ..., Y_n are independent normal RVs with EY_i = β_0 + β_1 x_i and var(Y_i) = σ², i = 1, 2, ..., n, and Y is an n-variate normal random vector with mean vector having ith component β_0 + β_1 x_i and variance σ²I_n. The joint PDF of Y is given by

    f(y; β_0, β_1, σ²) = (2π)^{−n/2} σ^{−n} exp{ −(1/(2σ²)) Σ_{i=1}^n (y_i − β_0 − β_1 x_i)² }.        (11)

It easily follows that the MLEs for β_0, β_1, and σ² are given by

    β̂_0 = Σ_{i=1}^n Y_i/n − β̂_1 x̄,                                     (12)

    β̂_1 = Σ_{i=1}^n (x_i − x̄)(Y_i − Ȳ) / Σ_{i=1}^n (x_i − x̄)²,         (13)

and

    σ̂² = (1/n) Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 x_i)²,                       (14)

where x̄ = n^{−1} Σ_{i=1}^n x_i.
If we wish to test H_0: β_1 = 0, we take H = (0, 1), so that the model is a special case of the general linear hypothesis with k = 2, r = 1. Under H_0 the MLEs are

    β̃_0 = Ȳ = Σ_{i=1}^n Y_i/n                                          (15)

and

    σ̃² = (1/n) Σ_{i=1}^n (Y_i − Ȳ)².                                    (16)

Thus

    F = [ Σ_{i=1}^n (Y_i − Ȳ)² − Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)² ] / Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)²     (17)
      = β̂_1² Σ_{i=1}^n (x_i − x̄)² / Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)².

From Theorem 12.2.1, the statistic [(n − 2)/1]F has a central F(1, n − 2) distribution under H_0. Since F(1, n − 2) is the square of a t(n − 2), the likelihood ratio test rejects H_0 if

    |β̂_1| [ (n − 2) Σ_{i=1}^n (x_i − x̄)² / Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)² ]^{1/2} > c_0,     (18)

where c_0 is computed from t-tables for n − 2 d.f.
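A minimal sketch of the fit and the t-test (18), assuming NumPy and SciPy (the function name is ours); the statistic below is algebraically the same as (18) since s² = Σ residuals²/(n − 2):

```python
import numpy as np
from scipy import stats

def simple_linear_regression_test(x, y, alpha=0.05):
    """Fit y = b0 + b1*x by least squares and test H0: beta_1 = 0 at level alpha."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    n = len(x)
    sxx = np.sum((x - x.mean()) ** 2)
    b1 = np.sum((x - x.mean()) * (y - y.mean())) / sxx      # equation (13)
    b0 = y.mean() - b1 * x.mean()                           # equation (12)
    resid = y - b0 - b1 * x
    s2 = np.sum(resid ** 2) / (n - 2)                       # residual variance, n-2 d.f.
    t_stat = b1 / np.sqrt(s2 / sxx)                         # equivalent to (18)
    p_value = 2 * stats.t.sf(abs(t_stat), df=n - 2)
    reject = abs(t_stat) > stats.t.ppf(1 - alpha / 2, df=n - 2)
    return b0, b1, t_stat, p_value, reject
```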
For testing H_0: β_0 = 0, we choose H = (1, 0) so that the model is again a special case of the general linear hypothesis. In this case

    β̃_1 = Σ_{i=1}^n x_i Y_i / Σ_{i=1}^n x_i²   and   σ̃² = (1/n) Σ_{i=1}^n (Y_i − β̃_1 x_i)².          (19)

It follows that

    F = [ Σ_{i=1}^n (Y_i − β̃_1 x_i)² − Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)² ] / Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)²,   (20)

and since

    β̃_1 = Σ_{i=1}^n x_i Y_i / Σ_{i=1}^n x_i²
         = [ Σ_{i=1}^n (x_i − x̄)(Y_i − Ȳ) + n x̄ Ȳ ] / Σ_{i=1}^n x_i²
         = [ β̂_1 Σ_{i=1}^n (x_i − x̄)² + n x̄ (β̂_0 + β̂_1 x̄) ] / Σ_{i=1}^n x_i²
         = β̂_1 + n β̂_0 x̄ / Σ_{i=1}^n x_i²,                                                            (21)
we can write the numerator of F as

    Σ_{i=1}^n (Y_i − β̃_1 x_i)² − Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)²                              (22)
      = Σ_{i=1}^n [ (Y_i − β̂_1 x_i + β̂_1 x̄ − Ȳ) + (Ȳ − β̂_1 x̄ − (n β̂_0 x̄/Σ_{j=1}^n x_j²) x_i) ]²
          − Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)²
      = Σ_{i=1}^n [ Ȳ − β̂_1 x̄ − (n β̂_0 x̄/Σ_{j=1}^n x_j²) x_i ]²
          + 2 Σ_{i=1}^n (Y_i − β̂_1 x_i + β̂_1 x̄ − Ȳ) [ Ȳ − β̂_1 x̄ − (n β̂_0 x̄/Σ_{j=1}^n x_j²) x_i ]
      = β̂_0² n Σ_{i=1}^n (x_i − x̄)² / Σ_{i=1}^n x_i².
It follows from Theorem 12.2.1 that the statistic

    β̂_0 √( n Σ_{i=1}^n (x_i − x̄)² / Σ_{i=1}^n x_i² ) / √( Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)²/(n − 2) )   (23)

has a central t-distribution with n − 2 d.f. under H_0: β_0 = 0. The rejection region is therefore given by

    |β̂_0| √( n Σ_{i=1}^n (x_i − x̄)² / Σ_{i=1}^n x_i² ) / √( Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 x_i)²/(n − 2) ) > c_0,   (24)

where c_0 is determined from the tables of the t(n − 2) distribution for a given level of significance α.
For testing H_0: β_0 = β_1 = 0, we choose H = I_2, the 2 × 2 identity matrix, so that the model is again a special case of the general linear hypothesis with r = 2. In this case

    σ̃² = (1/n) Σ_{i=1}^n Y_i²                                           (25)

and

    F = [ Σ_{i=1}^n Y_i² − Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)² ] / Σ_{i=1}^n (Y_i − Ȳ + β̂_1 x̄ − β̂_1 x_i)²   (26)
      = [ n Ȳ² + β̂_1² Σ_{i=1}^n (x_i − x̄)² ] / Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 x_i)²
      = [ n(β̂_0 + β̂_1 x̄)² + β̂_1² Σ_{i=1}^n (x_i − x̄)² ] / Σ_{i=1}^n (Y_i − β̂_0 − β̂_1 x_i)².

From Theorem 12.2.1, the statistic [(n − 2)/2]F has a central F(2, n − 2) distribution under H_0: β_0 = β_1 = 0. It follows that the α-level rejection region for H_0 is given by

    [(n − 2)/2] F > c_0,                                                (27)

where F is given by (26), and c_0 is the upper α percent point of the F(2, n − 2) distribution.

Remark 1. It is quite easy to modify the analysis above to obtain tests of the null hypotheses
\beta_0 = \beta_0', \beta_1 = \beta_1', and (\beta_0, \beta_1)' = (\beta_0', \beta_1')', where \beta_0', \beta_1' are given real numbers
(Problem 4).
Remark 2. The confidence intervals for \beta_0, \beta_1 are also easily obtained. One can show that
a (1-\alpha)-level confidence interval for \beta_0 is given by

\left(\hat\beta_0 - t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n} x_i^2\,\sum_{i=1}^{n}(Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}
{n(n-2)\sum_{i=1}^{n}(x_i - \bar x)^2}},\;
\hat\beta_0 + t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n} x_i^2\,\sum_{i=1}^{n}(Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}
{n(n-2)\sum_{i=1}^{n}(x_i - \bar x)^2}}\right),   (28)

and that for \beta_1 is given by

\left(\hat\beta_1 - t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}
{(n-2)\sum_{i=1}^{n}(x_i - \bar x)^2}},\;
\hat\beta_1 + t_{n-2,\alpha/2}\sqrt{\frac{\sum_{i=1}^{n}(Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}
{(n-2)\sum_{i=1}^{n}(x_i - \bar x)^2}}\right).   (29)
Similarly, one can obtain confidence sets for (\beta_0, \beta_1)' from the likelihood ratio test of
(\beta_0, \beta_1)' = (\beta_0', \beta_1')'. It can be shown that the collection of sets of points (\beta_0, \beta_1)' satisfying

\frac{(n-2)\bigl[n(\hat\beta_0 - \beta_0)^2 + 2n\bar x(\hat\beta_0 - \beta_0)(\hat\beta_1 - \beta_1)
+ \sum_{i=1}^{n} x_i^2(\hat\beta_1 - \beta_1)^2\bigr]}
{2\sum_{i=1}^{n}(Y_i - \hat\beta_0 - \hat\beta_1 x_i)^2} \le F_{2,n-2,\alpha}   (30)

is a (1-\alpha)-level collection of confidence sets (ellipsoids) for (\beta_0, \beta_1)' centered at
(\hat\beta_0, \hat\beta_1)'.
Remark 3. Sometimes interest lies in constructing a confidence interval for the unknown
linear regression function E\{Y|x_0\} = \beta_0 + \beta_1 x_0 for a given value of x, or for a value
of Y, given x = x_0. We assume that x_0 is a value of x distinct from x_1, x_2, \ldots, x_n. Clearly,
\hat\beta_0 + \hat\beta_1 x_0 is the maximum likelihood estimator of \beta_0 + \beta_1 x_0. This is also the best linear
unbiased estimator. Let us write \hat E\{Y|x_0\} = \hat\beta_0 + \hat\beta_1 x_0. Then

\hat E\{Y|x_0\} = \bar Y - \hat\beta_1\bar x + \hat\beta_1 x_0
= \bar Y + (x_0 - \bar x)\frac{\sum_{i=1}^{n}(x_i - \bar x)(Y_i - \bar Y)}{\sum_{i=1}^{n}(x_i - \bar x)^2},

which is clearly a linear function of the normal RVs Y_i. It follows that \hat E\{Y|x_0\} is also
normally distributed with mean E(\hat\beta_0 + \hat\beta_1 x_0) = \beta_0 + \beta_1 x_0 and variance

var\{\hat E\{Y|x_0\}\} = E\{\hat\beta_0 - \beta_0 + \hat\beta_1 x_0 - \beta_1 x_0\}^2   (31)
= var(\hat\beta_0) + x_0^2\,var(\hat\beta_1) + 2x_0\,cov(\hat\beta_0, \hat\beta_1)
= \sigma^2\Bigl[\frac{1}{n} + \frac{(\bar x - x_0)^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\Bigr].
(See Problem 6.) It follows that

\frac{\hat\beta_0 + \hat\beta_1 x_0 - \beta_0 - \beta_1 x_0}
{\sigma\{(1/n) + [(\bar x - x_0)^2/\sum_{i=1}^{n}(x_i - \bar x)^2]\}^{1/2}}   (32)

is N(0, 1). But \sigma is not known, so that we cannot use (32) to construct a confidence interval
for E\{Y|x_0\}. Since n\hat\sigma^2/\sigma^2 is a \chi^2(n-2) RV and n\hat\sigma^2/\sigma^2 is independent of \hat\beta_0 + \hat\beta_1 x_0
(why?), it follows that

\sqrt{n-2}\,\frac{\hat\beta_0 + \hat\beta_1 x_0 - \beta_0 - \beta_1 x_0}
{\hat\sigma\{1 + n[(\bar x - x_0)^2/\sum_{i=1}^{n}(x_i - \bar x)^2]\}^{1/2}}   (33)

has a t(n-2) distribution. Thus, a (1-\alpha)-level confidence interval for \beta_0 + \beta_1 x_0 is
given by

\left(\hat\beta_0 + \hat\beta_1 x_0 - t_{n-2,\alpha/2}\,\hat\sigma
\sqrt{\frac{n}{n-2}\Bigl[\frac{1}{n} + \frac{(\bar x - x_0)^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\Bigr]},\;
\hat\beta_0 + \hat\beta_1 x_0 + t_{n-2,\alpha/2}\,\hat\sigma
\sqrt{\frac{n}{n-2}\Bigl[\frac{1}{n} + \frac{(\bar x - x_0)^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\Bigr]}\right).   (34)
In a similar manner one can show (Problem 7) that

\left(\hat\beta_0 + \hat\beta_1 x_0 - t_{n-2,\alpha/2}\,\hat\sigma
\sqrt{\frac{n}{n-2}\Bigl[\frac{n+1}{n} + \frac{(\bar x - x_0)^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\Bigr]},\;
\hat\beta_0 + \hat\beta_1 x_0 + t_{n-2,\alpha/2}\,\hat\sigma
\sqrt{\frac{n}{n-2}\Bigl[\frac{n+1}{n} + \frac{(\bar x - x_0)^2}{\sum_{i=1}^{n}(x_i - \bar x)^2}\Bigr]}\right)   (35)

is a (1-\alpha)-level confidence interval for Y_0 = \beta_0 + \beta_1 x_0 + \varepsilon, that is, for the estimated value
Y_0 of Y at x_0.
Remark 4. The simple regression model (2) considered above can be generalized in many
directions. Thus we may consider EY as a polynomial in x of degree higher than 1, or
we may regard EY as a function of several variables. Some of these generalizations will
be taken up in the problems.
Remark 5. Let (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n) be a sample from a bivariate normal population
with parameters EX = \mu_1, EY = \mu_2, var(X) = \sigma_1^2, var(Y) = \sigma_2^2, and correlation coefficient \rho.
In Section 6.6 we computed the PDF of the sample correlation coefficient R and showed
(Remark 6.6.4) that the statistic

T = R\sqrt{\frac{n-2}{1-R^2}}   (36)

has a t(n-2) distribution, provided that \rho = 0. If we wish to test \rho = 0, that is, the
independence of two jointly normal RVs, we can base a test on the statistic T. Essentially,
we are testing that the population covariance is 0, which implies that the population
regression coefficients are 0. Thus we are testing, in particular, that \beta_1 = 0. It is therefore
not surprising that (36) is identical with (18). We emphasize that (36) was derived
for a bivariate normal population, whereas (18) was derived by taking the X's as fixed and
the distribution of the Y's as normal. Note that for a bivariate normal population E\{Y|x\} =
\mu_2 + \rho(\sigma_2/\sigma_1)(x - \mu_1) is linear, in consistency with our model (1) or (2).
Example 1. Let us assume that the following data satisfy the linear regression model
Y_i = \beta_0 + \beta_1 x_i + \varepsilon_i:

x:  0      1      2      3       4       5
y:  0.475  1.007  0.838  -0.618  1.0378  0.943

Let us test the null hypothesis that \beta_1 = 0. We have

\bar x = 2.5, \quad \sum_{i=0}^{5}(x_i - \bar x)^2 = 17.5, \quad \bar y = 0.671, \quad
\sum_{i=0}^{5}(x_i - \bar x)(y_i - \bar y) = 0.9985,
\hat\beta_1 = 0.0571, \quad \hat\beta_0 = \bar y - \hat\beta_1\bar x = 0.5279, \quad
\sum_{i=0}^{5}(y_i - \hat\beta_0 - \hat\beta_1 x_i)^2 = 2.3571,

and

|\hat\beta_1|\sqrt{\frac{(n-2)\sum(x_i - \bar x)^2}{\sum(y_i - \hat\beta_0 - \hat\beta_1 x_i)^2}} = 0.3106.

Since t_{n-2,\alpha/2} = t_{4,0.025} = 2.776 > 0.3106, we accept H_0 at level \alpha = 0.05.

Let us next find a 95 percent confidence interval for E\{Y|x = 7\}. This is given by (34).
We have

t_{n-2,\alpha/2}\,\hat\sigma\sqrt{\frac{n}{n-2}\Bigl[\frac{1}{n} + \frac{(\bar x - x_0)^2}{\sum(x_i - \bar x)^2}\Bigr]}
= 2.776\sqrt{\frac{2.3571}{6}\cdot\frac{6}{4}\Bigl[\frac{1}{6} + \frac{20.25}{17.5}\Bigr]} = 2.3707,

\hat\beta_0 + \hat\beta_1 x_0 = 0.5279 + 0.0571 \times 7 = 0.9276,

so that the 95 percent confidence interval is (-1.4431, 3.2983).

(The data were produced from Table ST6 of random numbers with \mu = 0, \sigma = 1, by
letting \beta_0 = 1 and \beta_1 = 0, so that E\{Y|x\} = \beta_0 + \beta_1 x = 1, which surely lies in the interval.)
12.3.2 Logistic and Poisson Regression

In the regression models considered above Y is a continuous type RV. However, in a wide
variety of problems Y is either binary or a count variable. Thus in a medical study Y
may be the presence or absence of a disease such as diabetes. How do we modify the linear
regression model to apply in this case? The idea is to choose a function f of E(Y) so
that, in the notation of Section 12.3.1,

f(E(Y)) = X\beta.

This can be accomplished by choosing the function f to be the logarithm of the odds ratio,

f(p) = \log\Bigl(\frac{p}{1-p}\Bigr),   (37)

where p = P(Y = 1), so that E(Y) = p. It follows that

p = E(Y) = P(Y = 1) = \frac{\exp(X\beta)}{1 + \exp(X\beta)},   (38)

so that logistic regression models the logarithm of the odds ratio as a linear function of
the covariates x_i. The term logistic regression derives from the fact that the function
e^x/(1 + e^x) is known as the logistic function.

For simplicity we will only consider the simple linear regression case, so that

E(Y_i) = \pi_i(\beta_0 + \beta x_i), \quad i = 1, 2, \ldots, n, \quad 0 < \pi_i(\beta_0 + \beta x_i) < 1.   (39)

Choosing the logistic DF, so that

\pi_i = \pi_i(\beta_0 + \beta x_i) = \frac{\exp(\beta_0 + \beta x_i)}{1 + \exp(\beta_0 + \beta x_i)},   (40)

let Y_1, Y_2, \ldots, Y_n be independent binary RVs taking values 0 or 1. Then the joint PMF of
Y_1, Y_2, \ldots, Y_n is given by

L(\beta_0, \beta \mid x) = \prod_{i=1}^{n}\bigl[\pi_i^{y_i}(1 - \pi_i)^{1-y_i}\bigr]
= \Bigl[\prod_{i=1}^{n}(1 - \pi_i)\Bigr]\exp\Bigl\{\sum_{i=1}^{n} y_i\log\Bigl(\frac{\pi_i}{1 - \pi_i}\Bigr)\Bigr\}   (41)
and the log likelihood function by

\log L(\beta_0, \beta \mid x) = n\bar y\,\beta_0 + \beta\sum_{i=1}^{n} x_i y_i
- \sum_{i=1}^{n}\log\{1 + \exp(\beta_0 + \beta x_i)\}.   (42)

It is easy to see that

\frac{\partial\log L}{\partial\beta_0} = n\bar y - \sum_{i=1}^{n}\pi_i = 0, \qquad
\frac{\partial\log L}{\partial\beta} = \sum_{i=1}^{n} x_i y_i - \sum_{i=1}^{n} x_i\pi_i = 0.   (43)

Since the likelihood equations are nonlinear in the parameters, the MLEs of \beta_0 and \beta are
obtained numerically by using the Newton-Raphson method.
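The following is a minimal Newton-Raphson sketch for solving the likelihood equations (43). The function name and arguments are illustrative; x is the covariate vector and y the 0/1 response vector, and convergence is assumed (it can fail, for example, for perfectly separated data).

```python
# Newton-Raphson for the simple logistic regression likelihood equations (43).
import numpy as np

def logistic_mle(x, y, iters=25):
    beta0, beta = 0.0, 0.0
    X = np.column_stack([np.ones_like(x), x])   # design matrix with columns (1, x_i)
    for _ in range(iters):
        pi = 1.0 / (1.0 + np.exp(-(beta0 + beta * x)))
        score = X.T @ (y - pi)                  # left-hand sides of (43)
        W = pi * (1.0 - pi)
        info = X.T @ (X * W[:, None])           # information matrix
        step = np.linalg.solve(info, score)
        beta0, beta = beta0 + step[0], beta + step[1]
    se_beta = np.sqrt(np.linalg.inv(info)[1, 1])  # SE(beta_hat) for the Wald statistic
    return beta0, beta, se_beta
```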
Let \hat\beta_0 and \hat\beta be the MLEs of \beta_0 and \beta, respectively. From Section 8.7 we note that the
(large-sample) variance of \hat\beta is given by

var(\hat\beta) = \Bigl[\sum_{i=1}^{n} x_i^2\,\pi_i(1 - \pi_i)\Bigr]^{-1},   (44)

so that the standard error (SE) of \hat\beta is the square root of this quantity. For large n, the so-called Wald
statistic Z = \hat\beta/SE(\hat\beta) has an approximate N(0, 1) distribution under H_0: \beta = 0. Thus we
reject H_0 at level \alpha if |z| > z_{\alpha/2}. One can use \hat\beta \pm z_{\alpha/2}\,SE(\hat\beta) as a (1-\alpha)-level confidence
interval for \beta.

Yet another choice for testing H_0 is to use the LRT statistic -2\log\lambda (see Theorem
10.2.3). Under H_0, -2\log\lambda has a chi-square distribution with 1 d.f. Here

\lambda = \frac{L(\hat{\hat\beta}_0, 0 \mid x)}{L(\hat\beta_0, \hat\beta \mid x)},   (45)

where \hat{\hat\beta}_0 is the MLE of \beta_0 under H_0.

In (40) we chose the DF of a logistic RV. We could instead choose some other DF, such
as \Phi(x), the DF of an N(0, 1) RV. In that case \pi_i = \Phi(\beta_0 + \beta x_i), and the resulting model
is called probit regression.
We finally consider the case where the RV Y is a count of rare events and has a Poisson
distribution with parameter \lambda. Clearly, the GLM above is not directly applicable. Again we only
consider the simple linear regression case. Let Y_i, i = 1, 2, \ldots, n, be independent P(\lambda_i) RVs,
where \lambda_i = \exp(\beta_0 + x_i\beta_1), so that

\theta_i = \log\lambda_i = \beta_0 + x_i\beta_1.

The log likelihood function is given by

\log L(\beta_0, \beta_1; y_1, \ldots, y_n) = \sum_{i=1}^{n}\bigl[y_i\theta_i - e^{\theta_i} - \log(y_i!)\bigr].   (46)

In order to find the MLEs of \beta_0 and \beta_1 we need to solve the likelihood equations

\frac{\partial\log L}{\partial\beta_0} = \sum_{i=1}^{n}\{y_i - e^{\theta_i}\} = 0, \qquad
\frac{\partial\log L}{\partial\beta_1} = \sum_{i=1}^{n}\{x_i y_i - x_i e^{\theta_i}\} = 0,   (47)

which are nonlinear in \beta_0 and \beta_1. The most common method of obtaining the MLEs is to
apply the iteratively weighted least squares algorithm.
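As one simple way to implement the iterative fitting mentioned above, here is a Newton-type sketch for the likelihood equations (47); names are illustrative and convergence is assumed.

```python
# Newton iteration for simple Poisson regression, likelihood equations (47).
import numpy as np

def poisson_mle(x, y, iters=25):
    X = np.column_stack([np.ones_like(x), x])
    b = np.zeros(2)                                  # (beta0, beta1)
    for _ in range(iters):
        lam = np.exp(X @ b)                          # lambda_i = exp(beta0 + beta1 x_i)
        score = X.T @ (y - lam)                      # left-hand sides of (47)
        info = X.T @ (X * lam[:, None])              # information matrix
        b = b + np.linalg.solve(info, score)
    return b, np.sqrt(np.diag(np.linalg.inv(info)))  # MLEs and their SEs
```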
Once the MLEs of \beta_0 and \beta_1 are computed, one can compute the SEs of the estimates
by using the methods of Section 8.7. Using SE(\hat\beta_1), for example, one can test hypotheses
concerning \beta_1 or construct a (1-\alpha)-level confidence interval for \beta_1.

For a detailed discussion of logistic and Poisson regression we refer to Agresti [1].
A wide variety of software is available to carry out the required computations.
PROBLEMS 12.3

1. Prove statements (12), (13), and (14).
2. Prove statements (15) and (16).
3. Prove statement (19).
4. Obtain tests of the null hypotheses \beta_0 = \beta_0', \beta_1 = \beta_1', and (\beta_0, \beta_1)' = (\beta_0', \beta_1')', where
\beta_0', \beta_1' are given real numbers.
5. Obtain the confidence intervals for \beta_0 and \beta_1 as given in (28) and (29), respectively.
6. Derive the expression for var\{\hat E\{Y|x_0\}\} as given in (31).
7. Show that the interval given in (35) is a (1-\alpha)-level confidence interval for Y_0 =
\beta_0 + \beta_1 x_0 + \varepsilon, the estimated value of Y at x_0.
8. Suppose that the regression of Y on the (mathematical) variable x is a quadratic

Y_i = \beta_0 + \beta_1 x_i + \beta_2 x_i^2 + \varepsilon_i,

where \beta_0, \beta_1, \beta_2 are unknown parameters, x_1, x_2, \ldots, x_n are known values of x, and
\varepsilon_1, \varepsilon_2, \ldots, \varepsilon_n are unobservable RVs that are assumed to be independently normally
distributed with common mean 0 and common variance \sigma^2 (see Example 12.2.3).
Assume that the coefficient vectors (x_1^k, x_2^k, \ldots, x_n^k), k = 0, 1, 2, are linearly independent.
Write the normal equations for estimating the \beta's and derive the generalized
likelihood ratio test of \beta_2 = 0.
9. Suppose that the Y's can be written as

Y_i = \beta_1 x_{i1} + \beta_2 x_{i2} + \beta_3 x_{i3} + \varepsilon_i,

where x_{i1}, x_{i2}, x_{i3} are three mathematical variables, and \varepsilon_i are iid N(0, 1) RVs.
Assuming that the matrix X (see Example 3) is of full rank, write the normal
equations and derive the likelihood ratio test of the null hypothesis H_0: \beta_1 = \beta_2 = \beta_3.
10. The following table gives the weight Y (grams) of a crystal suspended in a saturated
solution against the time suspended T (days):

Time T:    0    1    2    3    4    5    6
Weight Y:  0.4  0.7  1.1  1.6  1.9  2.3  2.6

(a) Find the linear regression line of Y on T.
(b) Test the hypothesis that \beta_0 = 0 in the linear regression model Y_i = \beta_0 + \beta_1 T_i + \varepsilon_i.
(c) Obtain a 0.95-level confidence interval for \beta_0.
11. Let o_i = \pi_i/(1 - \pi_i) be the odds corresponding to x_i, i = 1, 2, \ldots, n. By considering
the ratio o_{i+1}/o_i, how will you interpret the value of the slope parameter \beta_1?
12. Do the same for the parameter \beta_1 in the Poisson regression model by considering the
ratio \lambda_{i+1}/\lambda_i.
12.4 ONE-WAY ANALYSIS OF VARIANCE

In this section we return to the problem of one-way analysis of variance considered in
Examples 12.2.1 and 12.2.4. Consider the model

Y_{ij} = \mu_i + \varepsilon_{ij}, \quad j = 1, 2, \ldots, n_i; \; i = 1, 2, \ldots, k,   (1)

as described in Example 12.2.4. In matrix notation we write

Y = X\beta + \varepsilon,   (2)

where

Y = (Y_{11}, Y_{12}, \ldots, Y_{1n_1}, Y_{21}, Y_{22}, \ldots, Y_{2n_2}, \ldots, Y_{k1}, Y_{k2}, \ldots, Y_{kn_k})',
\beta = (\mu_1, \mu_2, \ldots, \mu_k)',

X = \begin{pmatrix} 1_{n_1} & 0 & \cdots & 0 \\ \vdots & \vdots & \ddots & \vdots \\ 0 & 0 & \cdots & 1_{n_k} \end{pmatrix},

\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1n_1}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2n_2}, \ldots, \varepsilon_{k1}, \varepsilon_{k2}, \ldots, \varepsilon_{kn_k})'.

As in Example 12.2.4, Y is a vector of n observations (n = \sum_{i=1}^{k} n_i), whose components
Y_{ij} are subject to random errors \varepsilon_{ij} \sim N(0, \sigma^2), \beta is a vector of k unknown parameters,
and X is a design matrix. We wish to find a test of H_0: \mu_1 = \mu_2 = \cdots = \mu_k against all
alternatives. We may write H_0 in the form H\beta = 0, where H is a (k-1) \times k matrix of
rank (k-1), which can be chosen to be

H = \begin{pmatrix} 1 & -1 & 0 & \cdots & 0 \\ 1 & 0 & -1 & \cdots & 0 \\ \vdots & \vdots & \vdots & \ddots & \vdots \\ 1 & 0 & 0 & \cdots & -1 \end{pmatrix}.
Let us write \mu_1 = \mu_2 = \cdots = \mu_k = \mu under H_0. The joint PDF of Y is given by

f(y; \mu_1, \mu_2, \ldots, \mu_k, \sigma^2) = \Bigl(\frac{1}{2\pi\sigma^2}\Bigr)^{n/2}
\exp\Bigl\{-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \mu_i)^2\Bigr\},   (3)

and, under H_0, by

f(y; \mu, \sigma^2) = \Bigl(\frac{1}{2\pi\sigma^2}\Bigr)^{n/2}
\exp\Bigl\{-\frac{1}{2\sigma^2}\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \mu)^2\Bigr\}.   (4)

It is easy to check that the MLEs are

\hat\mu_i = \frac{\sum_{j=1}^{n_i} y_{ij}}{n_i} = \bar y_{i.}, \quad i = 1, 2, \ldots, k,   (5)

\hat\sigma^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar y_{i.})^2}{n},   (6)

\hat{\hat\mu} = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i} y_{ij}}{n} = \bar y,   (7)

and

\hat{\hat\sigma}^2 = \frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(y_{ij} - \bar y)^2}{n}.   (8)
By Theorem 12.2.1, the likelihood ratio test is to reject H_0 if

\frac{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar Y)^2 - \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar Y_{i.})^2}
{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar Y_{i.})^2}\cdot\frac{n-k}{k-1} \ge F_0,   (9)

where F_0 is the upper \alpha percent point of the F(k-1, n-k) distribution. Since

\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar Y)^2
= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar Y_{i.} + \bar Y_{i.} - \bar Y)^2   (10)
= \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar Y_{i.})^2 + \sum_{i=1}^{k} n_i(\bar Y_{i.} - \bar Y)^2,

we may rewrite (9) as

\frac{\sum_{i=1}^{k} n_i(\bar Y_{i.} - \bar Y)^2/(k-1)}
{\sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar Y_{i.})^2/(n-k)} \ge F_0.   (11)

It is usual to call the sum of squares in the numerator of (11) the between sum of squares
(BSS) and the sum of squares in the denominator of (11) the within sum of squares (WSS).
The results are conveniently displayed in a so-called analysis of variance table in the
following form.

One-Way Analysis of Variance

Source of Variation   Sum of Squares                                          Degrees of Freedom   Mean Sum of Squares   F-Ratio
Between               BSS = \sum_{i=1}^{k} n_i(\bar Y_{i.} - \bar Y)^2        k-1                  BSS/(k-1)             [BSS/(k-1)]/[WSS/(n-k)]
Within                WSS = \sum_{i=1}^{k}\sum_{j=1}^{n_i}(Y_{ij} - \bar Y_{i.})^2   n-k          WSS/(n-k)
Mean                  n\bar Y^2                                               1
Total                 TSS = \sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2           n

The third row, designated "Mean," has been included to make the entries of the second
column add up to the total sum of squares (TSS), \sum_{i=1}^{k}\sum_{j=1}^{n_i} Y_{ij}^2.
Example 1. The lifetimes (in hours) of samples from three different brands of batteries
were recorded with the following results:

Brand 1:  40, 30, 50, 50, 30
Brand 2:  60, 40, 55, 65
Brand 3:  60, 50, 70, 65, 75, 40

We wish to test whether the three brands have different average lifetimes. We will assume
that the three samples come from normal populations with common (unknown) standard
deviation \sigma.

From the data n_1 = 5, n_2 = 4, n_3 = 6, n = 15, and

\bar y_1 = 200/5 = 40, \quad \bar y_2 = 220/4 = 55, \quad \bar y_3 = 360/6 = 60,
\sum_{j=1}^{5}(y_{1j} - \bar y_1)^2 = 400, \quad \sum_{j=1}^{4}(y_{2j} - \bar y_2)^2 = 350, \quad
\sum_{j=1}^{6}(y_{3j} - \bar y_3)^2 = 850.

Also, the grand mean is

\bar y = \frac{200 + 220 + 360}{15} = \frac{780}{15} = 52.

Thus

BSS = 5(40-52)^2 + 4(55-52)^2 + 6(60-52)^2 = 1140,
WSS = 400 + 350 + 850 = 1600.

Analysis of Variance
Source    SS     d.f.   MSS      F-Ratio
Between   1140    2     570      570/133.33 = 4.28
Within    1600   12     133.33

Choosing \alpha = 0.05, we see that F_0 = F_{2,12,0.05} = 3.89. Thus we reject H_0: \mu_1 = \mu_2 = \mu_3
at level \alpha = 0.05.
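The F-ratio of Example 1 can be checked numerically; the sketch below computes BSS, WSS, and the F-ratio from the raw data and cross-checks them with scipy.

```python
# One-way ANOVA for the battery lifetimes of Example 1.
import numpy as np
from scipy import stats

groups = [np.array([40, 30, 50, 50, 30], dtype=float),
          np.array([60, 40, 55, 65], dtype=float),
          np.array([60, 50, 70, 65, 75, 40], dtype=float)]
n = sum(len(g) for g in groups)
k = len(groups)
grand = np.concatenate(groups).mean()
bss = sum(len(g) * (g.mean() - grand) ** 2 for g in groups)   # 1140
wss = sum(((g - g.mean()) ** 2).sum() for g in groups)        # 1600
F = (bss / (k - 1)) / (wss / (n - k))                         # about 4.28
print(F, stats.f.ppf(0.95, k - 1, n - k))                     # compare with F_{2,12,0.05} = 3.89
print(stats.f_oneway(*groups))                                # cross-check
```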
Example 2. Three sections of the same elementary statistics course were taught by three
instructors. The final grades of the students were recorded as follows:

Instructor I:    95, 33, 48, 76, 89, 82, 60, 77
Instructor II:   88, 78, 91, 51, 85, 77, 31, 62, 96, 81
Instructor III:  68, 79, 91, 71, 87, 68, 79, 16, 35

Let us test the hypothesis that the average grades given by the three instructors are the
same at level \alpha = 0.05.

From the data n_1 = 8, n_2 = 10, n_3 = 9, n = 27,

\bar y_1 = 70, \quad \bar y_2 = 74, \quad \bar y_3 = 66,
\sum_{j=1}^{8}(y_{1j} - \bar y_1)^2 = 3168, \quad \sum_{j=1}^{10}(y_{2j} - \bar y_2)^2 = 3686, \quad
\sum_{j=1}^{9}(y_{3j} - \bar y_3)^2 = 4898.

Also, the grand mean is

\bar y = \frac{560 + 740 + 594}{27} = \frac{1894}{27} = 70.15.

Thus

BSS = 8(0.15)^2 + 10(3.85)^2 + 9(4.15)^2 = 303.41,
WSS = 3168 + 3686 + 4898 = 11,752.

Analysis of Variance
Source    SS          d.f.   MSS      F-Ratio
Between   303.41       2     151.70   151.70/489.67 = 0.31
Within    11,752.00   24     489.67

Since 0.31 is far below the F(2, 24) critical value at \alpha = 0.05, we cannot reject the null
hypothesis that the average grades given by the three instructors are the same.
PROBLEMS 12.4

1. Prove statements (5), (6), (7), and (8).
2. The following are the coded values of the amounts of corn (in bushels per acre)
obtained from four varieties, using unequal numbers of plots for the different
varieties:

A: 2, 1, 3, 2
B: 3, 4, 2, 3, 4, 2
C: 6, 4, 8
D: 7, 6, 7, 4

Test whether there is a significant difference between the yields of the varieties.
3. A consumer interested in buying a new car has reduced his search to six different
brands: D, F, G, P, V, T. He would like to buy the brand that gives the highest
mileage per gallon of regular gasoline. One of his friends advises him that he should
use some other method of selection, since the average mileages of the six brands are
the same, and offers the following data in support of her assertion.

Distance Traveled (Miles) per Gallon of Gasoline

Car    D   F   G   P   V   T
1      42  38  28  32  30  25
2      35  33  32  36  35  32
3      37  28  35  27  25  24
4      37  37  26  30
5      28  30
6      19

Should the consumer accept his friend's advice?
4. The following data give the ages of entering freshmen in independent random
samples from three different universities.

University
A    B    C
17   16   21
19   16   23
20   19   22
21   20
18   19

Test the hypothesis that the average ages of entering freshmen at these universities are the same.
5. Five cigarette manufacturers claim that their product has low tar content. Independent
random samples of cigarettes are taken from each manufacturer and the
following tar levels (in milligrams) are recorded.

Brand   Tar Level (mg)
A       4.2, 4.8, 4.6, 4.0, 4.4
B       4.9, 4.8, 4.7, 5.0, 4.9, 5.2
C       5.4, 5.3, 5.4, 5.2, 5.5
D       5.8, 5.6, 5.5, 5.4, 5.6, 5.8
E       5.9, 6.2, 6.2, 6.8, 6.4, 6.3

Can the differences among the sample means be attributed to chance?
6.The quantity of oxygen dissolved in water is used as a measure of water pollution.
Samples are taken at four locations in a lake and the quantity of dissolved oxygen is
recorded as follows (lower reading corresponds to greater pollution):
Location Quantity of Dissolved Oxygen (%)
A 7.8, 6.4, 8.2, 6.9
B 6.7, 6.8, 7.1, 6.9, 7.3
C 7.2, 7.4, 6.9, 6.4, 6.5
D 6.0, 7.4, 6.5, 6.9, 7.2, 6.8
Do the data indicate a significant difference in the average amount of dissolved
oxygen for the four locations?
12.5 TWO-WAY ANALYSIS OF VARIANCE WITH ONE OBSERVATION PER CELL

In many practical problems one is interested in investigating the effects of two factors that
influence an outcome. For example, the variety of grain and the type of fertilizer used both
affect the yield of a plot; similarly, the score on a standard examination is influenced by the size
of the class and the instructor.

Let us suppose that two factors affect the outcome of an experiment. Suppose also
that one observation is available at each of a number of levels of these two factors. Let
Y_{ij} (i = 1, 2, \ldots, a; j = 1, 2, \ldots, b) be the observation when the first factor is at the ith level,
and the second factor at the jth level. Assume that

Y_{ij} = \mu + \alpha_i + \beta_j + \varepsilon_{ij}, \quad i = 1, 2, \ldots, a; \; j = 1, 2, \ldots, b,   (1)

where \alpha_i is the effect of the ith level of the first factor, \beta_j is the effect of the jth level of the
second factor, and \varepsilon_{ij} is the random error, which is assumed to be normally distributed with
mean 0 and variance \sigma^2. We will assume that the \varepsilon_{ij}'s are independent. It follows that the Y_{ij}
are independent normal RVs with means \mu + \alpha_i + \beta_j and variance \sigma^2. There is no loss of
generality in assuming that \sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = 0, for, if \mu_{ij} = \mu' + \alpha_i' + \beta_j', we can write

\mu_{ij} = (\mu' + \bar\alpha' + \bar\beta') + (\alpha_i' - \bar\alpha') + (\beta_j' - \bar\beta') = \mu + \alpha_i + \beta_j

and \sum_{i=1}^{a}\alpha_i = 0, \sum_{j=1}^{b}\beta_j = 0. Here we have written \bar\alpha' and \bar\beta' for the means of the
\alpha_i''s and \beta_j''s, respectively. Thus Y_{ij} may denote the yield from the use of the ith variety of
some grain and the jth type of some fertilizer. The two hypotheses of interest are

\alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 \quad\text{and}\quad \beta_1 = \beta_2 = \cdots = \beta_b = 0.

The first of these, for example, says that the first factor has no effect on the outcome of
the experiment.

In view of the fact that \sum_{i=1}^{a}\alpha_i = 0 and \sum_{j=1}^{b}\beta_j = 0, we have
\alpha_a = -\sum_{i=1}^{a-1}\alpha_i and \beta_b = -\sum_{j=1}^{b-1}\beta_j,
and we can write our model in matrix notation as

Y = X\beta + \varepsilon,   (2)

where

Y = (Y_{11}, Y_{12}, \ldots, Y_{1b}, Y_{21}, Y_{22}, \ldots, Y_{2b}, \ldots, Y_{a1}, Y_{a2}, \ldots, Y_{ab})',
\beta = (\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1})',
\varepsilon = (\varepsilon_{11}, \varepsilon_{12}, \ldots, \varepsilon_{1b}, \varepsilon_{21}, \varepsilon_{22}, \ldots, \varepsilon_{2b}, \ldots, \varepsilon_{a1}, \varepsilon_{a2}, \ldots, \varepsilon_{ab})',

and X is the ab \times (a+b-1) design matrix whose columns correspond, in order, to
\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1}. The rows of X come in a blocks of b rows each, one block
per level of the first factor. In the ith block (i = 1, 2, \ldots, a-1) every row has a 1 in the \mu column,
a 1 in the \alpha_i column and 0 in the remaining \alpha columns; in the last block the \alpha part of every row is
(-1, -1, \ldots, -1). Within each block, the \beta part of the jth row is (0, \ldots, 0, 1, 0, \ldots, 0) with the 1 in the
jth position for j = 1, 2, \ldots, b-1, and is (-1, -1, \ldots, -1) for j = b. Schematically,

X = \begin{pmatrix}
1 & 1 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
1 & 1 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & & & & & \vdots \\
1 & 1 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1 \\
1 & 1 & 0 & \cdots & 0 & -1 & -1 & \cdots & -1 \\
1 & 0 & 1 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
\vdots & & & & & & & & \vdots \\
1 & -1 & -1 & \cdots & -1 & 1 & 0 & \cdots & 0 \\
1 & -1 & -1 & \cdots & -1 & 0 & 1 & \cdots & 0 \\
\vdots & & & & & & & & \vdots \\
1 & -1 & -1 & \cdots & -1 & 0 & 0 & \cdots & 1 \\
1 & -1 & -1 & \cdots & -1 & -1 & -1 & \cdots & -1
\end{pmatrix}.
The vector of unknown parameters \beta is (a+b-1) \times 1 and the matrix X is ab \times (a+b-1)
(a blocks of b rows each). We leave the reader to check that X is of full rank,
a+b-1. The hypothesis H_\alpha: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0 or H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0
can easily be put into the form H\beta = 0. For example, for H_\beta we can choose H to be the
(b-1) \times (a+b-1) matrix of full rank b-1, with columns again indexed by
\mu, \alpha_1, \alpha_2, \ldots, \alpha_{a-1}, \beta_1, \beta_2, \ldots, \beta_{b-1}, given by

H = \begin{pmatrix}
0 & 0 & 0 & \cdots & 0 & 1 & 0 & \cdots & 0 \\
0 & 0 & 0 & \cdots & 0 & 0 & 1 & \cdots & 0 \\
\vdots & & & & & & & \ddots & \vdots \\
0 & 0 & 0 & \cdots & 0 & 0 & 0 & \cdots & 1
\end{pmatrix}.
Clearly, the model described above is a special case of the general linear hypothesis, and
we can use Theorem 12.2.1 to test H_\beta.

To apply Theorem 12.2.1 we need the estimators \hat\mu_{ij} and \hat{\hat\mu}_{ij}. It is easily checked that

\hat\mu = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b} y_{ij}}{ab} = \bar y   (3)

and

\hat\alpha_i = \bar y_{i.} - \bar y, \qquad \hat\beta_j = \bar y_{.j} - \bar y,   (4)

where \bar y_{i.} = \sum_{j=1}^{b} y_{ij}/b and \bar y_{.j} = \sum_{i=1}^{a} y_{ij}/a. Also, under H_\beta, for example,

\hat{\hat\mu} = \bar y \quad\text{and}\quad \hat{\hat\alpha}_i = \bar y_{i.} - \bar y.   (5)

In the notation of Theorem 12.2.1, n = ab, k = a+b-1, r = b-1, so that n-k =
ab-a-b+1 = (a-1)(b-1), and

F = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.})^2
- \sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y)^2}
{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y)^2}.   (6)

Since

\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.})^2
= \sum_{i=1}^{a}\sum_{j=1}^{b}\{(Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y) + (\bar Y_{.j} - \bar Y)\}^2   (7)
= \sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y)^2 + a\sum_{j=1}^{b}(\bar Y_{.j} - \bar Y)^2,

we may write

F = \frac{a\sum_{j=1}^{b}(\bar Y_{.j} - \bar Y)^2}
{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y)^2}.   (8)

It follows that, under H_\beta, (a-1)F has a central F(b-1, (a-1)(b-1)) distribution.
The numerator of F in (8) measures the variability between the column means \bar Y_{.j}, and the
denominator measures the variability that remains once the effects due to the two factors
have been subtracted.

If H_\alpha is the null hypothesis to be tested, one can show that under H_\alpha the MLEs are

\hat{\hat\mu} = \bar y \quad\text{and}\quad \hat{\hat\beta}_j = \bar y_{.j} - \bar y.   (9)

As before, n = ab, k = a+b-1, but r = a-1. Also,

F = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{.j})^2
- \sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y)^2}
{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y)^2},   (10)

which may be rewritten as

F = \frac{b\sum_{i=1}^{a}(\bar Y_{i.} - \bar Y)^2}
{\sum_{i=1}^{a}\sum_{j=1}^{b}(Y_{ij} - \bar Y_{i.} - \bar Y_{.j} + \bar Y)^2}.   (11)

It follows that, under H_\alpha, (b-1)F has a central F(a-1, (a-1)(b-1)) distribution. The
numerator of F in (11) measures the variability between the row means \bar Y_{i.}.

If the data are put into the following form:

                               Levels of Factor 2
                         1        2        \cdots   b        Row Means
Levels of Factor 1   1   Y_{11}   Y_{12}   \cdots   Y_{1b}   \bar Y_{1.}
                     2   Y_{21}   Y_{22}   \cdots   Y_{2b}   \bar Y_{2.}
                     \vdots
                     a   Y_{a1}   Y_{a2}   \cdots   Y_{ab}   \bar Y_{a.}
Column Means             \bar Y_{.1}  \bar Y_{.2}  \cdots  \bar Y_{.b}   \bar Y

so that the rows represent the various levels of factor 1, and the columns the levels of factor 2,
one can write

between sum of squares for rows = b\sum_{i=1}^{a}(\bar Y_{i.} - \bar Y)^2 = sum of squares for factor 1 = SS_1.
Similarly,

between sum of squares for columns = a\sum_{j=1}^{b}(\bar Y_{.j} - \bar Y)^2 = sum of squares for factor 2 = SS_2.

It is usual to write SSE (error or residual sum of squares) for the denominator of (8) or (11).
These results are conveniently presented in an analysis of variance table as follows.

Two-Way Analysis of Variance Table with One Observation per Cell

Source of Variation   Sum of Squares                              Degrees of Freedom   Mean Square                 F-Ratio
Rows                  SS_1                                        a-1                  MS_1 = SS_1/(a-1)           MS_1/MSE
Columns               SS_2                                        b-1                  MS_2 = SS_2/(b-1)           MS_2/MSE
Error                 SSE                                         (a-1)(b-1)           MSE = SSE/[(a-1)(b-1)]
Mean                  ab\bar Y^2                                  1                    ab\bar Y^2
Total                 \sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij}^2       ab                   \sum_{i=1}^{a}\sum_{j=1}^{b} Y_{ij}^2/ab
Example 1. The following table gives the yield (pounds per plot) of three varieties of
wheat, obtained with four different kinds of fertilizer.

              Variety of Wheat
Fertilizer     A    B    C
\alpha          8    3    7
\beta          10    4    8
\gamma          6    5    6
\delta          8    4    7

Let us test the hypothesis of equality of the average yields of the three varieties of wheat
and the null hypothesis that the four fertilizers are equally effective.

In our notation, b = 3, a = 4,

\bar y_{1.} = 6, \bar y_{2.} = 7.33, \bar y_{3.} = 5.67, \bar y_{4.} = 6.33,
\bar y_{.1} = 8, \bar y_{.2} = 4, \bar y_{.3} = 7, \bar y = 6.33.

Also,

SS_1 = sum of squares due to fertilizer = 3[(0.33)^2 + 1^2 + (0.66)^2 + 0^2] = 4.67;
SS_2 = sum of squares due to variety of wheat = 4[(1.67)^2 + (2.33)^2 + (0.67)^2] = 34.67;
SSE = \sum_{i=1}^{4}\sum_{j=1}^{3}(y_{ij} - \bar y_{i.} - \bar y_{.j} + \bar y)^2 = 7.33.

The results are shown in the following table:

Analysis of Variance
Source             SS       d.f.   MS       F-Ratio
Variety of wheat   34.67     2     17.33    14.2
Fertilizer          4.67     3      1.56     1.28
Error               7.33     6      1.22
Mean              481.33     1    481.33
Total             528.00    12     44.00

Now F_{2,6,0.05} = 5.14 and F_{3,6,0.05} = 4.76. Since 14.2 > 5.14, we reject H_\beta, that there is
equality in the average yields of the three varieties; but, since 1.28 < 4.76, we accept H_\alpha,
that the four fertilizers are equally effective.
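The sums of squares and F-ratios of Example 1 are easy to verify numerically; here is a short sketch in which rows are fertilizers and columns are wheat varieties.

```python
# Two-way ANOVA with one observation per cell for the wheat/fertilizer data.
import numpy as np

Y = np.array([[8, 3, 7],
              [10, 4, 8],
              [6, 5, 6],
              [8, 4, 7]], dtype=float)
a, b = Y.shape
row = Y.mean(axis=1, keepdims=True)                  # fertilizer means
col = Y.mean(axis=0, keepdims=True)                  # variety means
grand = Y.mean()
ss1 = b * ((row - grand) ** 2).sum()                 # fertilizer SS, about 4.67
ss2 = a * ((col - grand) ** 2).sum()                 # variety SS, about 34.67
sse = ((Y - row - col + grand) ** 2).sum()           # error SS, about 7.33
mse = sse / ((a - 1) * (b - 1))
print(ss2 / (b - 1) / mse, ss1 / (a - 1) / mse)      # F-ratios, about 14.2 and 1.28
```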
PROBLEMS 12.5

1. Show that the matrix X for the model defined in (2) is of full rank, a+b-1.
2. Prove statements (3), (4), (5), and (9).
3. The following data represent the units of production per day turned out by four
different brands of machines used by four machinists:

           Machinist
Machine    A_1   A_2   A_3   A_4
B_1         15    14    19    18
B_2         17    12    20    16
B_3         16    18    16    17
B_4         16    16    15    15

Test whether the differences in the performances of the machinists are significant and
also whether the differences in the performances of the four brands of machines
are significant. Use \alpha = 0.05.
4. Students were classified into four ability groups, and three different teaching
methods were employed. The following table gives the means for the four groups:

                 Teaching Method
Ability Group    A    B    C
1               15   19   14
2               18   17   12
3               22   25   17
4               17   21   19

Test the hypothesis that the teaching methods yield the same results, that is, that the
teaching methods are equally effective.
5. The following table shows the yield (pounds per plot) of four varieties of wheat,
obtained with three different kinds of fertilizer.

             Variety of Wheat
Fertilizer    A    B    C    D
\alpha         8    3    6    7
\beta         10    4    5    8
\gamma         8    4    6    7

Test the hypotheses that the four varieties of wheat have the same average yield and
that the three fertilizers are equally effective.
12.6 TWO-WAY ANALYSIS OF VARIANCE WITH INTERACTION

The model described in Section 12.5 assumes that the two factors act independently, that
is, are additive. In practice this is an assumption that needs testing. In this section we allow
for the possibility that the two factors might jointly affect the outcome, that is, that there might
be so-called interactions. More precisely, if Y_{ij} is the observation in the (i, j)th cell, we
will consider the model

Y_{ij} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ij},   (1)

where \alpha_i (i = 1, 2, \ldots, a) represent row effects (or effects due to factor 1), \beta_j (j = 1, 2, \ldots, b)
represent column effects (or effects due to factor 2), and \gamma_{ij} represent interactions or joint
effects. We will assume that the \varepsilon_{ij} are independently N(0, \sigma^2). We will further assume that

\sum_{i=1}^{a}\alpha_i = 0 = \sum_{j=1}^{b}\beta_j, \qquad
\sum_{j=1}^{b}\gamma_{ij} = 0 \text{ for all } i, \qquad
\sum_{i=1}^{a}\gamma_{ij} = 0 \text{ for all } j.   (2)
The hypothesis of interest is

H_0: \gamma_{ij} = 0 \text{ for all } i, j.   (3)

One may also be interested in testing that all the \alpha's are 0 or that all the \beta's are 0 in the presence
of the interactions \gamma_{ij}.

We first note that (2) is not restrictive, since we can write

Y_{ij} = \mu' + \alpha_i' + \beta_j' + \gamma_{ij}' + \varepsilon_{ij},

where the \alpha_i', \beta_j', and \gamma_{ij}' do not satisfy (2), as

Y_{ij} = (\mu' + \bar\alpha' + \bar\beta' + \bar\gamma')
+ (\alpha_i' - \bar\alpha' + \bar\gamma_{i.}' - \bar\gamma')
+ (\beta_j' - \bar\beta' + \bar\gamma_{.j}' - \bar\gamma')
+ (\gamma_{ij}' - \bar\gamma_{i.}' - \bar\gamma_{.j}' + \bar\gamma') + \varepsilon_{ij},

and then (2) is satisfied by choosing

\mu = \mu' + \bar\alpha' + \bar\beta' + \bar\gamma',
\alpha_i = \alpha_i' - \bar\alpha' + \bar\gamma_{i.}' - \bar\gamma',
\beta_j = \beta_j' - \bar\beta' + \bar\gamma_{.j}' - \bar\gamma',
\gamma_{ij} = \gamma_{ij}' - \bar\gamma_{i.}' - \bar\gamma_{.j}' + \bar\gamma'.

Here

\bar\alpha' = a^{-1}\sum_{i=1}^{a}\alpha_i', \quad \bar\beta' = b^{-1}\sum_{j=1}^{b}\beta_j', \quad
\bar\gamma_{i.}' = b^{-1}\sum_{j=1}^{b}\gamma_{ij}', \quad
\bar\gamma_{.j}' = a^{-1}\sum_{i=1}^{a}\gamma_{ij}', \quad\text{and}\quad
\bar\gamma' = (ab)^{-1}\sum_{i=1}^{a}\sum_{j=1}^{b}\gamma_{ij}'.
Next note that, unless we replicate, that is, take more than one observation per cell,
there are no degrees of freedom left to estimate the error SS (see Remark 1).
Let Y_{ijs} be the sth observation when the first factor is at the ith level and the second
factor at the jth level, i = 1, 2, \ldots, a, j = 1, 2, \ldots, b, s = 1, 2, \ldots, m (> 1). The data may then be
laid out as follows, with the m observations y_{ij1}, \ldots, y_{ijm} in the (i, j)th cell:

                          Levels of Factor 2
Levels of Factor 1   1                      2                      \cdots   b
1                    y_{111}, \ldots, y_{11m}   y_{121}, \ldots, y_{12m}   \cdots   y_{1b1}, \ldots, y_{1bm}
2                    y_{211}, \ldots, y_{21m}   y_{221}, \ldots, y_{22m}   \cdots   y_{2b1}, \ldots, y_{2bm}
\vdots
a                    y_{a11}, \ldots, y_{a1m}   y_{a21}, \ldots, y_{a2m}   \cdots   y_{ab1}, \ldots, y_{abm}

The model becomes

Y_{ijs} = \mu + \alpha_i + \beta_j + \gamma_{ij} + \varepsilon_{ijs},   (4)

i = 1, 2, \ldots, a, j = 1, 2, \ldots, b, and s = 1, 2, \ldots, m, where the \varepsilon_{ijs} are independent N(0, \sigma^2).
We assume that \sum_{i=1}^{a}\alpha_i = \sum_{j=1}^{b}\beta_j = \sum_{i=1}^{a}\gamma_{ij} = \sum_{j=1}^{b}\gamma_{ij} = 0.
Suppose that we wish to test H_\alpha: \alpha_1 = \alpha_2 = \cdots = \alpha_a = 0. We leave the reader to check that model (4) is then a
special case of the general linear hypothesis with n = abm, k = ab, r = a-1, and n-k = ab(m-1).

Let us write

\bar Y = \frac{\sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}}{n}, \quad
\bar Y_{ij.} = \frac{\sum_{s=1}^{m} Y_{ijs}}{m}, \quad
\bar Y_{i..} = \frac{\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}}{mb}, \quad
\bar Y_{.j.} = \frac{\sum_{i=1}^{a}\sum_{s=1}^{m} Y_{ijs}}{am}.   (5)

Then it can easily be checked that

\hat\mu = \hat{\hat\mu} = \bar Y, \quad
\hat\alpha_i = \bar Y_{i..} - \bar Y, \quad
\hat\beta_j = \hat{\hat\beta}_j = \bar Y_{.j.} - \bar Y, \quad
\hat\gamma_{ij} = \hat{\hat\gamma}_{ij} = \bar Y_{ij.} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y.   (6)

It follows from Theorem 12.2.1 that

F = \frac{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.} + \bar Y_{i..} - \bar Y)^2
- \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2}
{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2}.   (7)

Since

\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.} + \bar Y_{i..} - \bar Y)^2
= \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2 + \sum_i\sum_j\sum_s (\bar Y_{i..} - \bar Y)^2,

we can write (7) as

F = \frac{bm\sum_i (\bar Y_{i..} - \bar Y)^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2}.   (8)

Under H_\alpha the statistic [ab(m-1)/(a-1)]F has the central F(a-1, ab(m-1)) distribution,
so that the likelihood ratio test rejects H_\alpha if

\frac{ab(m-1)}{a-1}\cdot\frac{mb\sum_i (\bar Y_{i..} - \bar Y)^2}{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2} > c.   (9)

A similar analysis holds for testing H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0.
Next consider the test of the hypothesis H_\gamma: \gamma_{ij} = 0 for all i, j, that is, that the two factors are
independent and the effects are additive. In this case n = abm, k = ab, r = (a-1)(b-1),
and n-k = ab(m-1). It can be shown that

\hat{\hat\mu} = \bar Y, \quad \hat{\hat\alpha}_i = \bar Y_{i..} - \bar Y, \quad\text{and}\quad
\hat{\hat\beta}_j = \bar Y_{.j.} - \bar Y.   (10)

Thus

F = \frac{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y)^2
- \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2}
{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2}.   (11)
Now

\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y)^2
= \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.} + \bar Y_{ij.} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y)^2
= \sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2
+ \sum_i\sum_j\sum_s (\bar Y_{ij.} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y)^2,

so that we may write

F = \frac{\sum_i\sum_j\sum_s (\bar Y_{ij.} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y)^2}
{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2}.   (12)

Under H_\gamma, the statistic \{(m-1)ab/[(a-1)(b-1)]\}F has the F((a-1)(b-1), ab(m-1))
distribution. The likelihood ratio test rejects H_\gamma if

\frac{(m-1)ab}{(a-1)(b-1)}\cdot
\frac{m\sum_i\sum_j (\bar Y_{ij.} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y)^2}
{\sum_i\sum_j\sum_s (Y_{ijs} - \bar Y_{ij.})^2} > c.   (13)

Let us write

SS_1 = sum of squares due to factor 1 (row sum of squares) = bm\sum_{i=1}^{a}(\bar Y_{i..} - \bar Y)^2,
SS_2 = sum of squares due to factor 2 (column sum of squares) = am\sum_{j=1}^{b}(\bar Y_{.j.} - \bar Y)^2,
SSI = sum of squares due to interaction = m\sum_{i=1}^{a}\sum_{j=1}^{b}(\bar Y_{ij.} - \bar Y_{i..} - \bar Y_{.j.} + \bar Y)^2,

and

SSE = sum of squares due to error (residual sum of squares) = \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m}(Y_{ijs} - \bar Y_{ij.})^2.

Then we may summarize the above results in the following table.

Two-Way Analysis of Variance Table with Interaction

Source of Variation   Sum of Squares                                         Degrees of Freedom   Mean Square                    F-Ratio
Rows                  SS_1                                                   a-1                  MS_1 = SS_1/(a-1)              MS_1/MSE
Columns               SS_2                                                   b-1                  MS_2 = SS_2/(b-1)              MS_2/MSE
Interaction           SSI                                                    (a-1)(b-1)           MSI = SSI/[(a-1)(b-1)]         MSI/MSE
Error                 SSE                                                    ab(m-1)              MSE = SSE/[ab(m-1)]
Mean                  abm\bar Y^2                                            1                    abm\bar Y^2
Total                 \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}^2   abm                  \sum_{i=1}^{a}\sum_{j=1}^{b}\sum_{s=1}^{m} Y_{ijs}^2/(abm)
Remark 1. Note that, if m = 1, there are no d.f. associated with the SSE. Indeed, SSE = 0
if m = 1. Hence, we cannot make tests of hypotheses when m = 1, and for this reason we
assume m > 1.

Example 1. To test the effectiveness of three different teaching methods, three instructors
were each randomly assigned 12 students. The students were then randomly assigned to
the different teaching methods and were taught exactly the same material. At the conclusion
of the experiment, identical examinations were given to the students, with the
following results in regard to grades.

                     Instructor
Teaching Method      I    II   III
1                    95   60   86
                     85   90   77
                     74   80   75
                     74   70   70
2                    90   89   83
                     80   90   70
                     92   91   75
                     82   86   72
3                    70   68   74
                     80   73   86
                     85   78   91
                     85   93   89

From the data the table of cell means \bar y_{ij.} and row means \bar y_{i..} is as follows:

                  \bar y_{ij.}           \bar y_{i..}
Method 1:      82     75     77          78.0
Method 2:      86     89     75          83.3
Method 3:      80     78     85          81.0
\bar y_{.j.}:  82.7   80.7   79.0        \bar y = 80.8
Then

SS_1 = sum of squares due to methods = bm\sum_{i=1}^{3}(\bar y_{i..} - \bar y)^2 = 3 \times 4 \times 14.13 = 169.56,
SS_2 = sum of squares due to instructors = am\sum_{j=1}^{3}(\bar y_{.j.} - \bar y)^2 = 3 \times 4 \times 6.86 = 82.32,
SSI = sum of squares due to interaction = m\sum_{i=1}^{3}\sum_{j=1}^{3}(\bar y_{ij.} - \bar y_{i..} - \bar y_{.j.} + \bar y)^2
    = 4 \times 140.45 = 561.80,
SSE = residual sum of squares = \sum_{i=1}^{3}\sum_{j=1}^{3}\sum_{s=1}^{4}(y_{ijs} - \bar y_{ij.})^2 = 1830.00.

Analysis of Variance
Source         SS        d.f.   MSS      F-Ratio
Methods        169.56     2      84.78   1.25
Instructors     82.32     2      41.16   0.61
Interactions   561.80     4     140.45   2.07
Error         1830.00    27      67.78

With \alpha = 0.05, we see from the tables that F_{2,27,0.05} = 3.35 and F_{4,27,0.05} = 2.73, so that
we cannot reject any of the three hypotheses: that the three methods are equally effective,
that the three instructors are equally effective, and that the interactions are all 0.
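The table above can be verified numerically; the sketch below computes the four sums of squares from the raw grades (agreement with the text is up to the rounding used there). Here Y[i, j, :] holds the m = 4 grades for teaching method i and instructor j.

```python
# Two-way ANOVA with interaction for the teaching-methods data of Example 1.
import numpy as np

Y = np.array([[[95, 85, 74, 74], [60, 90, 80, 70], [86, 77, 75, 70]],
              [[90, 80, 92, 82], [89, 90, 91, 86], [83, 70, 75, 72]],
              [[70, 80, 85, 85], [68, 73, 78, 93], [74, 86, 91, 89]]], dtype=float)
a, b, m = Y.shape
cell = Y.mean(axis=2)                       # cell means Ybar_{ij.}
rowm = Y.mean(axis=(1, 2))                  # Ybar_{i..}
colm = Y.mean(axis=(0, 2))                  # Ybar_{.j.}
grand = Y.mean()
ss1 = b * m * ((rowm - grand) ** 2).sum()                               # methods
ss2 = a * m * ((colm - grand) ** 2).sum()                               # instructors
ssi = m * ((cell - rowm[:, None] - colm[None, :] + grand) ** 2).sum()   # interaction
sse = ((Y - cell[:, :, None]) ** 2).sum()                               # error (1830)
mse = sse / (a * b * (m - 1))
print(ss1 / (a - 1) / mse, ss2 / (b - 1) / mse, ssi / ((a - 1) * (b - 1)) / mse)
```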
PROBLEMS 12.6

1. Prove statement (6).
2. Obtain the likelihood ratio test of the null hypothesis H_\beta: \beta_1 = \beta_2 = \cdots = \beta_b = 0.
3. Prove statement (10).
4. Suppose that the following data represent the units of production turned out each day
by three different machinists, each working on the same machine for three different
days:

           Machinist
Machine    A            B            C
B_1        15, 15, 17   19, 19, 16   16, 18, 21
B_2        17, 17, 17   15, 15, 15   19, 22, 22
B_3        15, 17, 16   18, 17, 16   18, 18, 18
B_4        18, 20, 22   15, 16, 17   17, 17, 17

Using a 0.05 level of significance, test whether (a) the differences among the machinists
are significant, (b) the differences among the machines are significant, and (c)
the interactions are significant.
5. In an experiment to determine whether four different makes of automobiles average
the same gasoline mileage, a random sample of two cars of each make was taken
from each of four cities. Each car was then test run on 5 gallons of gasoline of the
same brand. The following table gives the number of miles traveled.

                        Automobile Make
Cities          A             B             C               D
Cleveland       92.3, 104.1   90.4, 103.8   110.2, 115.0    120.0, 125.4
Detroit         96.2, 98.6    91.8, 100.4   112.3, 111.7    124.1, 121.1
San Francisco   90.8, 96.2    90.3, 89.1    107.2, 103.8    118.4, 115.6
Denver          98.5, 97.3    96.8, 98.8    115.2, 110.2    126.2, 120.4

Construct the analysis of variance table. Test the hypotheses of no automobile effect,
no city effect, and no interactions. Use \alpha = 0.05.

13
NONPARAMETRIC STATISTICAL
INFERENCE
13.1 INTRODUCTION
In all the problems of statistical inference considered so far, we assumed that the distribution
of the random variable being sampled is known except, perhaps, for some parameters.
In practice, however, the functional form of the distribution is seldom, if ever, known.
It is therefore desirable to devise methods that are free of this assumption concerning
distribution. In this chapter we study some procedures that are commonly referred to as
distribution-free or nonparametric methods. The term "distribution-free" refers to the fact
that no assumptions are made about the underlying distribution except that the distribution
function being sampled is absolutely continuous. The term "nonparametric" refers to the
fact that there are no parameters involved in the traditional sense of the term "parameter"
used thus far. To be sure, there is a parameter which indexes the family of absolutely continuous
DFs, but it is not numerical, and hence the parameter set cannot be represented as a
subset of R^n for any n \ge 1. The restriction to absolutely continuous distribution functions
is a simplifying assumption that allows us to use the probability integral transformation
(Theorem 5.3.1) and the fact that ties occur with probability 0.

Section 13.2 is devoted to the problem of unbiased (nonparametric) estimation. We
develop the theory of U-statistics since many estimators and test statistics may be viewed
as U-statistics. Sections 13.3 through 13.5 deal with some common hypothesis testing
problems. In Section 13.6 we investigate applications of order statistics in nonparametric
methods. Section 13.7 considers underlying assumptions in some common parametric
problems and the effect of relaxing these assumptions.

13.2 U-STATISTICS

In Chapter 6 we encountered several nonparametric estimators. For example, the empirical
DF defined in Section 6.3 as an estimator of the population DF is distribution-free,
and so also are the sample moments as estimators of the population moments. These are
examples of what are known as U-statistics, which lead to unbiased estimators of population
characteristics. In this section we study the general theory of U-statistics. Although
the thrust of this investigation is unbiased estimation, many of the U-statistics defined in
this section may be used as test statistics.

Let X_1, X_2, \ldots, X_n be iid RVs with common law L(X), and let P be the class of all possible
distributions of X that consists of the absolutely continuous or discrete distributions,
or subclasses of these.

Definition 1. A statistic T(X) is sufficient for the family of distributions P if the
conditional distribution of X, given T = t, is the same whatever the true F \in P.

Example 1. Let X_1, X_2, \ldots, X_n be a random sample from an absolutely continuous DF, and
let T = (X_{(1)}, \ldots, X_{(n)}) be the order statistic. Then

f(x \mid T = t) = (n!)^{-1},

and we see that T is sufficient for the family of absolutely continuous distributions on R.

Definition 2. A family of distributions P is complete if the only unbiased estimator of 0
is the zero function itself, that is,

E_F h(X) = 0 \text{ for all } F \in P \Longrightarrow h(x) = 0

for all x (except for a null set with respect to each F \in P).

Definition 3. A statistic T(X) is said to be complete in relation to a class of distributions
P if the class of induced distributions of T is complete.

We have already encountered many examples of complete statistics or complete
families of distributions in Chapter 8. The following result is stated without proof. For the
proof we refer to Fraser [32, pp. 27-30, 139-142].

Theorem 1. The order statistic (X_{(1)}, X_{(2)}, \ldots, X_{(n)}) is a complete sufficient statistic
provided that the iid RVs X_1, X_2, \ldots, X_n are of either the discrete or the continuous type.

Definition 4. A real-valued parameter g(F) is said to be estimable if it has an unbiased
estimator, that is, if there exists a statistic T(X) such that

E_F T(X) = g(F) \text{ for all } F \in P.   (1)

Example 2. If P is the class of all distributions for which the second moment exists, \bar X is
an unbiased estimator of \mu(F), the population mean. Similarly, \mu_2(F) = var_F(X) is also
estimable, and an unbiased estimator is S^2 = \sum_{1}^{n}(X_i - \bar X)^2/(n-1). We would like to know
whether \bar X and S^2 are UMVUEs. Similarly, F(x) and P_F(X_1 + X_2 > 0) are estimable for
F \in P.
Definition 5. The degree m (m \ge 1) of an estimable parameter g(F) is the smallest sample
size for which the parameter is estimable, that is, it is the smallest m such that there exists
an unbiased estimator T(X_1, X_2, \ldots, X_m) with

E_F T = g(F) \text{ for all } F \in P.

Example 3. The parameter g(F) = P_F\{X > c\}, where c is a known constant, has degree 1.
Also, \mu(F) is estimable with degree 1 (we assume that there is at least one F \in P such that
\mu(F) \ne 0), and \mu_2(F) is estimable with degree m = 2, since \mu_2(F) cannot be estimated
(unbiasedly) by one observation only; at least two observations are needed. Similarly,
\mu^2(F) has degree 2, and P(X_1 + X_2 > 0) also is of degree 2.

Definition 6. An unbiased estimator of a parameter based on the smallest sample size
(equal to the degree m) is called a kernel.

Example 4. Clearly X_i, 1 \le i \le n, is a kernel of \mu(F); T(X_i) = 1 if X_i > c, and = 0 if
X_i \le c, is a kernel of P(X > c). Similarly, T(X_i, X_j) = 1 if X_i + X_j > 0, and = 0 otherwise,
is a kernel of P(X_i + X_j > 0); X_iX_j is a kernel of \mu^2(F); and X_i^2 - X_iX_j is a kernel of \mu_2(F).

Lemma 1. There exists a symmetric kernel for every estimable parameter.

Proof. If T(X_1, X_2, \ldots, X_m) is a kernel of g(F), so also is

T_s(X_1, X_2, \ldots, X_m) = \frac{1}{m!}\sum_{P} T(X_{i_1}, X_{i_2}, \ldots, X_{i_m}),   (2)

where the summation P is over all m! permutations of \{1, 2, \ldots, m\}.

Example 5. A symmetric kernel for \mu_2(F) is

T_s(X_i, X_j) = \frac{1}{2}\{T(X_i, X_j) + T(X_j, X_i)\} = \frac{1}{2}(X_i - X_j)^2, \quad i, j = 1, 2, \ldots, n \; (i \ne j).

Definition 7. Let g(F) be an estimable parameter of degree m, and let X_1, X_2, \ldots, X_n be a
sample of size n, n \ge m. Corresponding to any kernel T(X_{i_1}, \ldots, X_{i_m}) of g(F), we define a
U-statistic for the sample by

U(X_1, X_2, \ldots, X_n) = \binom{n}{m}^{-1}\sum_{C} T_s(X_{i_1}, \ldots, X_{i_m}),   (3)

where the summation C is over all \binom{n}{m} combinations of m integers (i_1, i_2, \ldots, i_m) chosen
from \{1, 2, \ldots, n\}, and T_s is the symmetric kernel defined in (2).

Clearly, the U-statistic defined in (3) is symmetric in the X_i's, and

E_F U(X) = g(F) \text{ for all } F.   (4)

Moreover, U(X) is a function of the complete sufficient statistic (X_{(1)}, X_{(2)}, \ldots, X_{(n)}). It
follows from Theorem 8.4.6 that it is the UMVUE of its expected value.

Example 6. For estimating \mu(F), the U-statistic is n^{-1}\sum_{1}^{n} X_i. For estimating \mu_2(F),
a symmetric kernel is

T_s(X_{i_1}, X_{i_2}) = \frac{1}{2}(X_{i_1} - X_{i_2})^2, \quad i_1, i_2 = 1, 2, \ldots, n \; (i_1 \ne i_2),

so that the corresponding U-statistic is

U(X) = \binom{n}{2}^{-1}\sum_{i_1 < i_2}\frac{1}{2}(X_{i_1} - X_{i_2})^2 = \frac{1}{n-1}\sum_{1}^{n}(X_i - \bar X)^2 = S^2.

Similarly, for estimating \mu^2(F), a symmetric kernel is T_s(X_{i_1}, X_{i_2}) = X_{i_1}X_{i_2}, and the
corresponding U-statistic is

U(X) = \frac{1}{\binom{n}{2}}\sum_{i<j} X_iX_j = \frac{1}{n(n-1)}\sum_{i \ne j} X_iX_j.

For estimating \mu^3(F), a symmetric kernel is T_s(X_{i_1}, X_{i_2}, X_{i_3}) = X_{i_1}X_{i_2}X_{i_3}, so that the
corresponding U-statistic is

U(X) = \binom{n}{3}^{-1}\sum_{i<j<k} X_iX_jX_k = \frac{1}{n(n-1)(n-2)}\sum_{i \ne j \ne k} X_iX_jX_k.

For estimating F(x) a symmetric kernel is I_{[X_i \le x]}, so the corresponding U-statistic is

U(X) = \frac{1}{n}\sum_{i=1}^{n} I_{[X_i \le x]} = F_n^*(x),

and for estimating P(X > 0) the U-statistic is

U(X) = \frac{1}{n}\sum_{i=1}^{n} I_{[X_i > 0]} = 1 - F_n^*(0).

Finally, for estimating P(X_1 + X_2 > 0) the U-statistic is

U(X) = \frac{1}{\binom{n}{2}}\sum_{i<j} I_{[X_i + X_j > 0]}.

Theorem 2. The variance of the U-statistic defined in (3) is given by

var U(X) = \frac{1}{\binom{n}{m}}\sum_{c=1}^{m}\binom{m}{c}\binom{n-m}{m-c}\zeta_c,   (5)

where

\zeta_c = cov_F\{T_s(X_{i_1}, \ldots, X_{i_m}), T_s(X_{j_1}, \ldots, X_{j_m})\},

m is the degree of g(F), and c is the number of integers common to the sets \{i_1, \ldots, i_m\}
and \{j_1, \ldots, j_m\}. (For c = 0, the two statistics T_s(X_{i_1}, \ldots, X_{i_m}) and T_s(X_{j_1}, \ldots, X_{j_m}) are
independent and have zero covariance.)

Proof. Clearly

var U(X) = \frac{1}{\binom{n}{m}^2}\sum\sum
E_F[\{T_s(X_{i_1}, \ldots, X_{i_m}) - g(F)\}\{T_s(X_{j_1}, \ldots, X_{j_m}) - g(F)\}].

Let c be the number of common integers in \{i_1, i_2, \ldots, i_m\} and \{j_1, j_2, \ldots, j_m\}. Then c takes
values 0, 1, \ldots, m, and for c = 0, T_s(X_{i_1}, \ldots, X_{i_m}) and T_s(X_{j_1}, \ldots, X_{j_m}) are independent. It
follows that

var U(X) = \frac{1}{\binom{n}{m}^2}\sum_{c=1}^{m}\binom{n}{m}\binom{m}{c}\binom{n-m}{m-c}\zeta_c,   (6)

which is (5). The counting argument leading to (6) is as follows. First we select the integers
\{i_1, \ldots, i_m\} from \{1, 2, \ldots, n\} in \binom{n}{m} ways. Next we select the integers in \{j_1, \ldots, j_m\}. This
is done by selecting first the c integers that will also be in \{i_1, \ldots, i_m\} (hence common to both
sets) and then the m-c integers, from the n-m integers not in \{i_1, \ldots, i_m\}, which complete
\{j_1, \ldots, j_m\}. Note that \zeta_0 = 0 from independence.

Example 7. Consider the U-statistic estimator \bar X of g(F) = \mu(F) in Example 6. Here
m = 1, T(x) = x, and \zeta_1 = var(X_1) = \sigma^2, so that var(\bar X) = \sigma^2/n.

For the parameter g(F) = \mu_2(F), U(X) = S^2. In this case, m = 2, T_s(X_{i_1}, X_{i_2}) =
(X_{i_1} - X_{i_2})^2/2, so

var U(X) = \frac{1}{\binom{n}{2}}\{2(n-2)\zeta_1 + \zeta_2\},

where

\zeta_2 = E_F\Bigl[\frac{1}{4}(X_{i_1} - X_{i_2})^4\Bigr] - \sigma^4 = \frac{\mu_4 + \sigma^4}{2}

and

\zeta_1 = cov\Bigl\{\frac{1}{2}(X_{i_1} - X_{i_2})^2, \frac{1}{2}(X_{i_1} - X_{j_2})^2\Bigr\},

where i_2 \ne j_2. Then

\zeta_1 = \frac{\mu_4 - \sigma^4}{4}

and

var U(X) = var(S^2) = \frac{2}{n(n-1)}\Bigl\{(n-2)\frac{\mu_4 - \sigma^4}{2} + \frac{\mu_4 + \sigma^4}{2}\Bigr\}
= \frac{1}{n}\Bigl(\mu_4 - \frac{n-3}{n-1}\sigma^4\Bigr),

which agrees with Corollary 2 to Theorem 6.3.4.

For the parameter g(F) = F(x), var U(X) = F(x)(1 - F(x))/n, and for g(F) =
P_F(X_1 + X_2 > 0),

var U(X) = \frac{1}{n(n-1)}\{4(n-2)\zeta_1 + 2\zeta_2\},

where

\zeta_1 = P_F(X_1 + X_2 > 0, X_1 + X_3 > 0) - P_F^2(X_1 + X_2 > 0)

and

\zeta_2 = P_F(X_1 + X_2 > 0) - P_F^2(X_1 + X_2 > 0)
= P_F(X_1 + X_2 > 0)\,P_F(X_1 + X_2 \le 0).
Corollary to Theorem 2. Let U be the U-statistic for a symmetric kernel
T_s(X_1, X_2, \ldots, X_m). Suppose E_F[T_s(X_1, \ldots, X_m)]^2 < \infty. Then

\lim_{n\to\infty}\{n\,var U(X)\} = m^2\zeta_1.   (7)

Proof. It is easily shown that 0 \le \zeta_c \le \zeta_m for 1 \le c \le m. It follows from the hypothesis
\zeta_m = var[T_s(X_1, \ldots, X_m)] < \infty and (5) that var U(X) < \infty. Now

\frac{n\binom{m}{c}\binom{n-m}{m-c}}{\binom{n}{m}}\zeta_c
= \frac{(m!)^2 n}{c![(m-c)!]^2}\cdot\frac{[(n-m)!]^2}{n!(n-2m+c)!}\zeta_c
= \frac{(m!)^2}{c![(m-c)!]^2}\cdot\frac{n(n-m)(n-m-1)\cdots(n-2m+c+1)}{n(n-1)\cdots(n-m+1)}\zeta_c.

Now note that the numerator has m-c+1 factors involving n, while the denominator has
m such factors, so that for c > 1 the ratio involving n goes to 0 as n \to \infty. For c = 1, this
ratio \to 1 and

n\,var U(X) \longrightarrow \frac{(m!)^2}{[(m-1)!]^2}\zeta_1 = m^2\zeta_1

as n \to \infty.

Example 8. In Example 7, n\,var(\bar X) \equiv \sigma^2 and

n\,var(S^2) \longrightarrow 2^2\zeta_1 = \mu_4 - \sigma^4

as n \to \infty.
Finally we state, without proof, the following result due to Hoeffding [45], which establishes
the asymptotic normality of a suitably centered and normed U-statistic. For a proof
we refer to Lehmann [61, pp. 364-365] or Randles and Wolfe [85, p. 82].

Theorem 3. Let X_1, X_2, \ldots, X_n be a random sample from a DF F, and let g(F) be an
estimable parameter of degree m with symmetric kernel T_s(X_1, X_2, \ldots, X_m).
If E_F\{T_s(X_1, X_2, \ldots, X_m)\}^2 < \infty and U is the U-statistic for g (as defined in (3)), then

\sqrt{n}\,(U(X) - g(F)) \xrightarrow{L} N(0, m^2\zeta_1),

provided

\zeta_1 = cov_F\{T_s(X_{i_1}, \ldots, X_{i_m}), T_s(X_{j_1}, \ldots, X_{j_m})\} > 0,

the two index sets having exactly one integer in common. In view of the corollary to Theorem 2,
it follows that (U - g(F))/\sqrt{var(U)} \xrightarrow{L} N(0, 1), provided \zeta_1 > 0.

Example 9 (Example 7 continued). Clearly, \sqrt{n}(\bar X - \mu)/\sigma \xrightarrow{L} N(0, 1) as n \to \infty, since
\zeta_1 = \sigma^2 > 0.

For the parameter g(F) = \mu_2(F), var U(X) = var(S^2) = \frac{1}{n}\bigl(\mu_4 - \frac{n-3}{n-1}\sigma^4\bigr)
and \zeta_1 = (\mu_4 - \sigma^4)/4 > 0, so it follows from Theorem 3 that

\sqrt{n}\,(S^2 - \sigma^2) \xrightarrow{L} N(0, \mu_4 - \sigma^4).
The concept of U-statistics can be extended to multiple random samples. We will
restrict ourselves to the case of two samples. Let X_1, X_2, \ldots, X_{n_1} and Y_1, Y_2, \ldots, Y_{n_2} be two
independent random samples from DFs F and G, respectively.

Definition 8. A parameter g(F, G) is estimable of degree (m_1, m_2) if m_1 and m_2 are the
smallest sample sizes for which there exists a statistic T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) such that

E_{F,G} T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) = g(F, G)   (8)

for all F, G \in P.

The statistic T in Definition 8 is called a kernel of g, and a symmetrized version T_s of T
is called a symmetric kernel of g. Without loss of generality, therefore, we assume that
the two-sample kernel T in (9) is a symmetric kernel.

Definition 9. Let g(F, G), F, G \in P, be an estimable parameter of degree (m_1, m_2). Then
a (two-sample) U-statistic estimator of g is defined by

U(X; Y) = \binom{n_1}{m_1}^{-1}\binom{n_2}{m_2}^{-1}
\sum_{i \in A}\sum_{j \in B} T\bigl(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}}\bigr),   (9)

where A and B are the collections of all subsets of m_1 and m_2 integers chosen without
replacement from the sets \{1, 2, \ldots, n_1\} and \{1, 2, \ldots, n_2\}, respectively.

Example 10. Let X_1, X_2, \ldots, X_{n_1} and Y_1, Y_2, \ldots, Y_{n_2} be two independent samples
from DFs F and G, respectively. Let

g(F, G) = P(X < Y) = \int_{-\infty}^{\infty} F(x)g(x)\,dx = \int_{-\infty}^{\infty} P(Y > y)f(y)\,dy,

where f and g are the respective PDFs of F and G. Then

T(X_i; Y_j) = \begin{cases} 1, & \text{if } X_i < Y_j, \\ 0, & \text{if } X_i \ge Y_j \end{cases}

is an unbiased estimator of g. Clearly, g has degree (1, 1), and the two-sample U-statistic
is given by

U(X; Y) = \frac{1}{n_1n_2}\sum_{i=1}^{n_1}\sum_{j=1}^{n_2} T(X_i; Y_j).
Theorem 4. The variance of the two-sample U-statistic defined in (9) is given by

var U(X; Y) = \frac{1}{\binom{n_1}{m_1}\binom{n_2}{m_2}}
\sum_{c=0}^{m_1}\sum_{d=0}^{m_2}\binom{m_1}{c}\binom{n_1-m_1}{m_1-c}\binom{m_2}{d}\binom{n_2-m_2}{m_2-d}\zeta_{c,d},   (10)

where \zeta_{c,d} is the covariance between T\bigl(X_{i_1}, \ldots, X_{i_{m_1}}; Y_{j_1}, \ldots, Y_{j_{m_2}}\bigr) and
T\bigl(X_{k_1}, \ldots, X_{k_{m_1}}; Y_{\ell_1}, \ldots, Y_{\ell_{m_2}}\bigr) with exactly c X's and d Y's in common.

Corollary. Suppose E_{F,G} T^2(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) < \infty for all F, G \in P. Let N = n_1 +
n_2 and suppose n_1, n_2, N \to \infty such that n_1/N \to \lambda, n_2/N \to 1-\lambda. Then

\lim_{N\to\infty} N\,var U(X; Y) = \frac{m_1^2}{\lambda}\zeta_{1,0} + \frac{m_2^2}{1-\lambda}\zeta_{0,1}.   (11)

The proofs of Theorem 4 and its corollary parallel those of Theorem 2 and its corollary
and are left to the reader.

Example 11. For the U-statistic in Example 10,

E_{F,G} U^2(X; Y) = \frac{1}{n_1^2 n_2^2}\sum\sum\sum\sum E_{F,G}\{T(X_i; Y_j)T(X_k; Y_\ell)\}.

Now

E_{F,G}\{T(X_i; Y_j)T(X_k; Y_\ell)\} = P(X_i < Y_j, X_k < Y_\ell)
= \begin{cases}
\int_{-\infty}^{\infty} F(x)g(x)\,dx & \text{for } i = k, \; j = \ell, \\
\int_{-\infty}^{\infty} [1 - G(x)]^2 f(x)\,dx & \text{for } i = k, \; j \ne \ell, \\
\int_{-\infty}^{\infty} F^2(x)g(x)\,dx & \text{for } i \ne k, \; j = \ell, \\
\Bigl[\int_{-\infty}^{\infty} F(x)g(x)\,dx\Bigr]^2 & \text{for } i \ne k, \; j \ne \ell,
\end{cases}

where f and g are the PDFs of F and G, respectively. Moreover,

\zeta_{1,0} = \int_{-\infty}^{\infty} [1 - G(x)]^2 f(x)\,dx - [g(F,G)]^2

and

\zeta_{0,1} = \int_{-\infty}^{\infty} F^2(x)g(x)\,dx - [g(F,G)]^2.

It follows that

var U(X; Y) = \frac{1}{n_1n_2}\{g(F,G)[1 - g(F,G)] + (n_2 - 1)\zeta_{1,0} + (n_1 - 1)\zeta_{0,1}\}.

In the special case when F = G, g(F,G) = 1/2, \zeta_{1,0} = \zeta_{0,1} = 1/3 - 1/4 = 1/12, and
var U = (n_1 + n_2 + 1)/[12 n_1 n_2].
Finally we state, without proof, the two-sample analog of Theorem 3, which establishes
the asymptotic normality of the two-sample U-statistic defined in (9).

Theorem 5. Let X_1, X_2, \ldots, X_{n_1} and Y_1, Y_2, \ldots, Y_{n_2} be independent random samples from
DFs F and G, respectively, and let g(F, G) be an estimable parameter of degree (m_1, m_2).
Let T(X_1, \ldots, X_{m_1}; Y_1, \ldots, Y_{m_2}) be a symmetric kernel for g such that ET^2 < \infty. Then

\sqrt{n_1 + n_2}\,\{U(X; Y) - g(F, G)\} \xrightarrow{L} N(0, \sigma^2),

where \sigma^2 = \frac{m_1^2\zeta_{1,0}}{\lambda} + \frac{m_2^2\zeta_{0,1}}{1-\lambda}, provided \sigma^2 > 0 and
0 < \lambda = \lim_{N\to\infty}(n_1/N) < 1, N = n_1 + n_2.

We see that (U - g)/\sqrt{var U} \xrightarrow{L} N(0, 1), provided \sigma^2 > 0.
For the proof of Theorem 5 we refer to Lehmann [61, p. 364] or Randles and Wolfe [85, p. 92].

Example 11 (Continued). In Example 11 we saw that in the special case when F = G,
\zeta_{1,0} = \zeta_{0,1} = 1/12 and var U = (n_1 + n_2 + 1)/[12 n_1 n_2]. It follows that

\frac{U(X; Y) - (1/2)}{\sqrt{(n_1 + n_2 + 1)/[12 n_1 n_2]}} \xrightarrow{L} N(0, 1).

PROBLEMS 13.2

1. Let (R, B, P_\theta) be a probability space, and let P = \{P_\theta : \theta \in \Theta\}. Let A be a Borel
subset of R, and consider the parameter d(\theta) = P_\theta(A). Is d estimable? If so, what is
the degree? Find the UMVUE for d, based on a sample of size n, assuming that P is
the class of all continuous distributions.
2. Let X_1, X_2, \ldots, X_m and Y_1, Y_2, \ldots, Y_n be independent random samples from two
absolutely continuous DFs. Find the UMVUEs of (a) E\{XY\} and (b) var(X + Y).
3. Let (X_1, Y_1), (X_2, Y_2), \ldots, (X_n, Y_n) be a random sample from an absolutely continuous
distribution. Find the UMVUEs of (a) E(XY) and (b) var(X + Y).
4. Let T(X_1, X_2, \ldots, X_n) be a statistic that is symmetric in the observations. Show that
T can be written as a function of the order statistic. Conversely, if T(X_1, X_2, \ldots, X_n)
can be written as a function of the order statistic, then T is symmetric in the observations.
5. Let X_1, X_2, \ldots, X_n be a random sample from an absolutely continuous DF F, F \in P.
Find U-statistics for g_1(F) = \mu^3(F) and g_2(F) = \mu_3(F). Find the corresponding
expressions for the variance of the U-statistic in each case.
6. In Example 3, show that \mu_2(F) is not estimable with one observation; that is, show
that the degree of \mu_2(F), where F \in P, the class of all distributions with finite second
moment, is 2.
7. Show that for c = 1, 2, \ldots, m, 0 \le \zeta_c \le \zeta_m.
8. Let X_1, X_2, \ldots, X_n be a random sample from an absolutely continuous DF F, F \in P. Let

g(F) = E_F|X_1 - X_2|.

Find the U-statistic estimator of g(F) and its variance.
13.3 SOME SINGLE-SAMPLE PROBLEMS

Let X_1, X_2, \ldots, X_n be a random sample from a DF F. In Section 13.2 we studied properties
of U-statistics as nonparametric estimators of parameters g(F). In this section we consider
some nonparametric tests of hypotheses. Often the test statistic may be viewed as a
function of a U-statistic.

13.3.1 Goodness-of-Fit Problem

The problem of fit is to test the hypothesis that the sample comes from a specified DF
F_0 against the alternative that it is from some other DF F, where F(x) \ne F_0(x) for some
x \in R. In Section 10.3 we studied the chi-square test of goodness of fit for testing H_0:
X_i \sim F_0. Here we consider the Kolmogorov-Smirnov test of H_0. Since H_0 concerns the
underlying DF of the X's, it is natural to compare the U-statistic estimator of g(F) =
F(x) with the specified DF F_0 under H_0. The U-statistic for g(F) = F(x) is the empirical
DF F_n^*(x).

Definition 1. Let X_1, X_2, \ldots, X_n be a sample from a DF F, and let F_n^* be the corresponding
empirical DF. The statistic

D_n = \sup_x |F_n^*(x) - F(x)|   (1)

is called the (two-sided) Kolmogorov-Smirnov statistic. We write

D_n^+ = \sup_x [F_n^*(x) - F(x)]   (2)

and

D_n^- = \sup_x [F(x) - F_n^*(x)]   (3)

and call D_n^+, D_n^- the one-sided Kolmogorov-Smirnov statistics.

Theorem 1. The statistics D_n, D_n^-, D_n^+ are distribution-free for any continuous DF F.

Proof. Clearly, D_n = \max(D_n^+, D_n^-). Let X_{(1)} \le X_{(2)} \le \cdots \le X_{(n)} be the order statistics
of X_1, X_2, \ldots, X_n, and define X_{(0)} = -\infty, X_{(n+1)} = +\infty. Then

F_n^*(x) = \frac{i}{n} \quad\text{for } X_{(i)} \le x < X_{(i+1)}, \; i = 0, 1, 2, \ldots, n,

and we have

D_n^+ = \max_{0\le i\le n}\sup_{X_{(i)}\le x<X_{(i+1)}}\Bigl[\frac{i}{n} - F(x)\Bigr]
= \max_{0\le i\le n}\Bigl[\frac{i}{n} - \inf_{X_{(i)}\le x<X_{(i+1)}} F(x)\Bigr]
= \max_{0\le i\le n}\Bigl[\frac{i}{n} - F(X_{(i)})\Bigr]
= \max\Bigl\{\max_{1\le i\le n}\Bigl[\frac{i}{n} - F(X_{(i)})\Bigr], 0\Bigr\}.

Since F(X_{(i)}) is the ith-order statistic of a sample from U(0, 1) irrespective of what F is, as
long as it is continuous, we see that the distribution of D_n^+ is independent of F. Similarly,

D_n^- = \max\Bigl\{\max_{1\le i\le n}\Bigl[F(X_{(i)}) - \frac{i-1}{n}\Bigr], 0\Bigr\},

and the result follows.

Without loss of generality, therefore, we assume that F is the DF of a U(0, 1) RV.

Theorem 2. If F is continuous, then

P\Bigl\{D_n \le v + \frac{1}{2n}\Bigr\} =
\begin{cases}
0 & \text{if } v \le 0, \\[4pt]
\displaystyle\int_{(1/2n)-v}^{v+(1/2n)}\int_{(3/2n)-v}^{v+(3/2n)}\cdots\int_{[(2n-1)/2n]-v}^{v+[(2n-1)/2n]}
f(u_1, u_2, \ldots, u_n)\prod_{i=1}^{n} du_i & \text{if } 0 < v < \dfrac{2n-1}{2n}, \\[6pt]
1 & \text{if } v \ge \dfrac{2n-1}{2n},
\end{cases}   (4)

where

f(u_1, u_2, \ldots, u_n) = \begin{cases} n!, & 0 < u_1 < \cdots < u_n < 1, \\ 0, & \text{otherwise}, \end{cases}   (5)

is the joint PDF of the set of order statistics of a sample of size n from U(0, 1).
We will not prove this result here. Let D_{n,\alpha} be the upper \alpha-percent point of the distribution
of D_n, that is, P\{D_n > D_{n,\alpha}\} \le \alpha. The exact distribution of D_n for selected values of n
and \alpha has been tabulated by Miller [74], Owen [79], and Birnbaum [9]. The large-sample
distribution of D_n was derived by Kolmogorov [53], and we state it without proof.

Theorem 3. Let F be any continuous DF. Then for every z \ge 0

\lim_{n\to\infty} P\{D_n \le z n^{-1/2}\} = L(z),   (6)

where

L(z) = 1 - 2\sum_{i=1}^{\infty}(-1)^{i-1}e^{-2i^2z^2}.   (7)

Theorem 3 can be used to find d_\alpha such that \lim_{n\to\infty} P\{\sqrt{n}D_n \le d_\alpha\} = 1 - \alpha. Tables
of d_\alpha for various values of \alpha are also available in Owen [79].
The statistics D_n^+ and D_n^- have the same distribution because of symmetry, and their
common distribution is given by the following theorem.

Theorem 4. Let F be a continuous DF. Then

P\{D_n^+ \le z\} =
\begin{cases}
0 & \text{if } z \le 0, \\[4pt]
\displaystyle\int_{1-z}^{1}\int_{[(n-1)/n]-z}^{u_n}\cdots\int_{(2/n)-z}^{u_3}\int_{(1/n)-z}^{u_2}
f(u_1, u_2, \ldots, u_n)\prod_{i=1}^{n} du_i & \text{if } 0 < z < 1, \\[6pt]
1 & \text{if } z \ge 1,
\end{cases}   (8)

where f is given by (5).
Proof. We leave the proof of Theorem 4 to the reader.

Tables for the critical values D_{n,\alpha}^+, where P\{D_n^+ > D_{n,\alpha}^+\} \le \alpha, are also available for
selected values of n and \alpha; see Birnbaum and Tingey [8]. Table ST7 at the end of this
book gives D_{n,\alpha}^+ and D_{n,\alpha} for some selected values of n and \alpha. For large samples Smirnov
[108] showed that

\lim_{n\to\infty} P\{\sqrt{n}D_n^+ \le z\} = 1 - e^{-2z^2}, \quad z \ge 0.   (9)

In fact, in view of (9), the statistic V_n = 4nD_n^{+2} has a limiting \chi^2(2) distribution, for
4nD_n^{+2} \le 4z^2 if and only if \sqrt{n}D_n^+ \le z, z \ge 0, and the result follows since

\lim_{n\to\infty} P\{V_n \le 4z^2\} = 1 - e^{-2z^2}, \quad z \ge 0,

so that

\lim_{n\to\infty} P\{V_n \le x\} = 1 - e^{-x/2}, \quad x \ge 0,

which is the DF of a \chi^2(2) RV.

Example 1. Let \alpha = 0.01, and let us approximate D_{n,0.01}^+. We have \chi^2_{2,0.01} = 9.21. Thus
V_n = 4n\,D_{n,0.01}^{+2} \approx 9.21, yielding

D_{n,0.01}^+ = \sqrt{\frac{9.21}{4n}} = \frac{3.03}{2\sqrt{n}}.

If, for example, n = 9, then D_{9,0.01}^+ = 3.03/6 = 0.50. Of course, the approximation is better
for large n.
The statistic D_n and its one-sided analogs can be used for testing H_0: X \sim F_0 against
H_1: X \sim F, where F_0(x) \ne F(x) for some x.

Definition 2. To test H_0: F(x) = F_0(x) for all x at level \alpha, the Kolmogorov-Smirnov test
rejects H_0 if D_n > D_{n,\alpha}. Similarly, it rejects F(x) \ge F_0(x) for all x if D_n^- > D_{n,\alpha}^+, and rejects
F(x) \le F_0(x) for all x at level \alpha if D_n^+ > D_{n,\alpha}^+.

For large samples we can use Theorem 3 or (9) to obtain an approximate \alpha-level test.
Example 2. Let us consider the data in Example 10.3.3 and apply the Kolmogorov-Smirnov
test to determine the goodness of the fit. Rearranging the data in increasing order
of magnitude, we have the following result:

x         F_0(x)   F_{20}^*(x)   i/20 - F_0(x_{(i)})   F_0(x_{(i)}) - (i-1)/20
-1.787    0.0367    1/20           0.0133                 0.0367
-1.229    0.1093    2/20          -0.0093                 0.0593
-0.525    0.2998    3/20          -0.1498                 0.1998
-0.513    0.3050    4/20          -0.1050                 0.1550
-0.508    0.3050    5/20          -0.0550                 0.1050
-0.486    0.3121    6/20          -0.0121                 0.0621
-0.482    0.3156    7/20           0.0344                 0.0156
-0.323    0.3745    8/20           0.0255                 0.0245
-0.261    0.3974    9/20           0.0526                -0.0026
-0.068    0.4721   10/20           0.0279                 0.0221
-0.057    0.4761   11/20           0.0739                -0.0239
 0.137    0.5557   12/20           0.0443                 0.0057
 0.464    0.6772   13/20          -0.0272                 0.0772
 0.595    0.7257   14/20          -0.0257                 0.0757
 0.881    0.8106   15/20          -0.0606                 0.1106
 0.906    0.8186   16/20          -0.0186                 0.0686
 1.046    0.8531   17/20          -0.0031                 0.0531
 1.237    0.8925   18/20           0.0075                 0.0425
 1.678    0.9535   19/20          -0.0035                 0.0535
 2.455    0.9931   20/20           0.0069                 0.0431

From Theorem 1,

D_{20}^- = 0.1998, \quad D_{20}^+ = 0.0739, \quad\text{and}\quad D_{20} = \max(D_{20}^+, D_{20}^-) = 0.1998.

Let us take \alpha = 0.05. Then D_{20,0.05} = 0.294. Since 0.1998 < 0.294, we accept H_0 at the
0.05 level of significance.
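The one-sided and two-sided statistics can be computed from sorted data exactly as in the proof of Theorem 1; the sketch below does this for a standard normal null F_0 = \Phi with illustrative simulated data and cross-checks against scipy.

```python
# Kolmogorov-Smirnov statistics D_n^+, D_n^- and D_n from the order statistics.
import numpy as np
from scipy import stats

def ks_statistics(data, cdf):
    x = np.sort(data)
    n = len(x)
    F0 = cdf(x)
    d_plus = max(np.max(np.arange(1, n + 1) / n - F0), 0.0)
    d_minus = max(np.max(F0 - np.arange(0, n) / n), 0.0)
    return d_plus, d_minus, max(d_plus, d_minus)

sample = stats.norm.rvs(size=20, random_state=0)
print(ks_statistics(sample, stats.norm.cdf))
print(stats.kstest(sample, "norm"))   # scipy's D_n as a cross-check
```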
It is worthwhile to compare the chi-square test of goodness of fit and the Kolmogorov–
Smirnov test. The latter treats individual observations directly, whereas the former
discretizes the data and sometimes loses information through grouping. Moreover, the
Kolmogorov–Smirnov test is applicable even in the case of very small samples, but the
chi-square test is essentially for large samples.
The chi-square test can be applied when the data are discrete or continuous, but the Kolmogorov–Smirnov test assumes continuity of the DF. This means that the latter test provides a more refined analysis of the data. If the distribution is actually discontinuous, the Kolmogorov–Smirnov test is conservative in that it favors H_0.
We next turn our attention to some other uses of the Kolmogorov–Smirnov statistic. Let X_1, X_2, ..., X_n be a sample from a DF F, and let F*_n be the sample DF. The estimate F*_n of F for large n should be close to F. Indeed,

    P{ |F*_n(x) − F(x)| ≤ λ √(F(x)[1 − F(x)]) / √n } ≥ 1 − 1/λ^2,    (10)

and, since F(x)[1 − F(x)] ≤ 1/4, we have

    P{ |F*_n(x) − F(x)| ≤ λ/(2√n) } ≥ 1 − 1/λ^2.    (11)

Thus F*_n can be made close to F with high probability by choosing λ and large enough n.
The Kolmogorov–Smirnov statistic enables us to determine the smallest n such that the error in estimation never exceeds a fixed value ε with a large probability 1 − α. Since

    P{D_n ≤ ε} ≥ 1 − α,    (12)

ε = D_{n,α}; and, given ε and α, we can read n from the tables. For large n, we can use the asymptotic distribution of D_n and solve d_α = ε√n for n.
We can also form confidence bounds for F. Given α and n, we first find D_{n,α} such that

    P{D_n > D_{n,α}} ≤ α,    (13)

which is the same as

    P{ sup_x |F*_n(x) − F(x)| ≤ D_{n,α} } ≥ 1 − α.

Thus

    P{ |F*_n(x) − F(x)| ≤ D_{n,α} for all x } ≥ 1 − α.    (14)

Define

    L_n(x) = max{F*_n(x) − D_{n,α}, 0}    (15)

and

    U_n(x) = min{F*_n(x) + D_{n,α}, 1}.    (16)

Then the region between L_n(x) and U_n(x) can be used as a confidence band for F(x) with associated confidence coefficient 1 − α.
Example 3. For the data on the standard normal distribution of Example 2, let us form a 0.90 confidence band for the DF. We have D_{20,0.10} = 0.265. The confidence band is, therefore, F*_20(x) ± 0.265 as long as the band is between 0 and 1.
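The band of (15)-(16) is simple to construct numerically. A sketch, assuming NumPy; the helper name ks_band is ours:

    # Build the (1 - alpha) Kolmogorov-Smirnov confidence band L_n, U_n of (15)-(16)
    # at the order statistics, given the tabulated critical value D_{n,alpha}.
    import numpy as np

    def ks_band(sample, d_alpha):
        x = np.sort(sample)
        n = len(x)
        fn = np.arange(1, n + 1) / n           # F*_n just to the right of each x_(i)
        lower = np.clip(fn - d_alpha, 0.0, 1.0)
        upper = np.clip(fn + d_alpha, 0.0, 1.0)
        return x, lower, upper

    # Example 3: n = 20 and D_{20,0.10} = 0.265 give the band F*_20(x) +/- 0.265.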
13.3.2 Problem of Location
Let X_1, X_2, ..., X_n be a sample of size n from some unknown DF F. Let p be a positive real number, 0 < p < 1, and let z_p(F) denote the quantile of order p for the DF F. In the following analysis we assume that F is absolutely continuous. The problem of location is to test H_0: z_p(F) = z_0, z_0 a given number, against one of the alternatives z_p(F) > z_0, z_p(F) < z_0, and z_p(F) ≠ z_0. The problem of location and symmetry is to test H'_0: z_{0.5}(F) = z_0 and F is symmetric, against H'_1: z_{0.5}(F) ≠ z_0 or F is not symmetric.
We consider two tests of location. First, we describe the sign test.
13.3.2.1 The Sign Test. Let X_1, X_2, ..., X_n be iid RVs with common PDF f. Consider the hypothesis testing problem

    H_0: z_p(f) = z_0  against  H_1: z_p(f) > z_0,    (17)

where z_p(f) is the quantile of order p of PDF f, 0 < p < 1. Let g(F) = P(X_i > z_0) = P(X_i − z_0 > 0). Then the corresponding U-statistic is given by

    nU(X) = R^+(X),

the number of positive elements in X_1 − z_0, X_2 − z_0, ..., X_n − z_0. Clearly, P(X_i = z_0) = 0. Fraser [32, pp. 167–170] has shown that a UMP test of H_0 against H_1 is given by

    φ(x) = 1 if R^+(x) > c,  γ if R^+(x) = c,  0 if R^+(x) < c,    (18)

where c and γ are chosen from the size restriction

    α = ∑_{i=c+1}^{n} (n choose i) (1−p)^i p^{n−i} + γ (n choose c) (1−p)^c p^{n−c}.    (19)

Note that, under H_0, z_p(f) = z_0, so that P_{H_0}(X ≤ z_0) = p and R^+(X) ∼ b(n, 1 − p). The same test is UMP for H_0: z_p(f) ≤ z_0 against H_1: z_p(f) > z_0. For the two-sided case, Fraser [32, p. 171] shows that the two-sided sign test is UMP unbiased.
If, in particular, z_0 is the median of f, then p = 1/2 under H_0. In this case one can also use the sign test to test H_0: med(X) = z_0, F is symmetric.
For large n one can use the normal approximation to the binomial to find c and γ in (19).

Example 4. Entering college freshmen have taken a particular high school achievement test for many years, and the upper quartile (p = 0.75) is well established at a score of 195. A particular high school sent 12 of its graduates to college, where they took the examination and obtained scores of 203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216. Let us test the null hypothesis H_0 that z_{0.75} ≤ 195 against H_1: z_{0.75} > 195 at the α = 0.05 level.
We have to find c and γ such that

    ∑_{i=c+1}^{12} (12 choose i) (1/4)^i (3/4)^{12−i} + γ (12 choose c) (1/4)^c (3/4)^{12−c} = 0.05.

From the table of the cumulative binomial distribution (Table ST1) for n = 12, p = 1/4, we see that c = 6. Then γ is given by

    0.0142 + γ (12 choose 6) (1/4)^6 (3/4)^6 = 0.05.

Thus

    γ = 0.0358/0.0402 = 0.89.

In our case the number of positive signs, x_i − 195, i = 1, 2, ..., 12, is 7, so we reject H_0 that the upper quartile is ≤ 195.
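The binomial tail and randomization constant of Example 4 can be checked directly. A minimal sketch, assuming SciPy:

    # Randomized sign test of H_0: z_{0.75} <= 195 against H_1: z_{0.75} > 195
    # for the Example 4 scores. Under H_0, R^+ ~ b(12, 1/4).
    from scipy.stats import binom

    scores = [203, 168, 187, 235, 197, 163, 214, 233, 179, 185, 197, 216]
    r_plus = sum(s > 195 for s in scores)          # number of positive signs: 7

    n, q, c = 12, 0.25, 6                          # q = 1 - p under H_0
    tail = 1 - binom.cdf(c, n, q)                  # P{R^+ > c} = 0.0142
    gamma = (0.05 - tail) / binom.pmf(c, n, q)     # about 0.89

    # Reject outright when R^+ > c; here r_plus = 7 > 6, so H_0 is rejected.
    print(r_plus, round(tail, 4), round(gamma, 2))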
Example 5. A random sample of size 8 is taken from a normal population with mean 0 and variance 1. The sample values are −0.465, 0.120, −0.238, −0.869, −1.016, 0.417, 0.056, 0.561. Let us test the hypothesis H_0: μ = −1.0 against H_1: μ > −1.0. We should expect to reject H_0 since we know that it is false. The number of observations, x_i − μ_0 = x_i + 1.0, that are ≥ 0 is 7. We have to find c and γ such that

    ∑_{i=c+1}^{8} (8 choose i) (1/2)^8 + γ (8 choose c) (1/2)^8 = 0.05, say,

that is,

    ∑_{i=c+1}^{8} (8 choose i) + γ (8 choose c) = 12.8.

We see that c = 6 and γ = 0.13. Since the number of positive x_i − μ_0 is > 6, we reject H_0.
Let us now apply the parametric test here. We have

    x̄ = −1.434/8 = −0.179.

Since σ = 1, we reject H_0 if

    x̄ > μ_0 + (1/√n) z_α = −1.0 + 1.64/√8 = −0.42.

Since −0.179 > −0.42, we reject H_0.
The single-sample sign test described above can easily be modified to apply to sampling from a bivariate population. Let (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) be a random sample from a bivariate population. Let Z_i = X_i − Y_i, i = 1, 2, ..., n, and assume that Z_i has an absolutely continuous DF. Then one can test hypotheses concerning the order parameters of Z by using the sign test. A hypothesis of interest here is that Z has a given median z_0. Without loss of generality let z_0 = 0. Then H_0: med(Z) = 0, that is, P{Z > 0} = P{Z < 0} = 1/2.
Note that med(Z) is not necessarily equal to med(X) − med(Y), so that H_0 is not that med(X) = med(Y) but that med(Z) = 0. The sign test is UMP against one-sided alternatives and UMP unbiased against two-sided alternatives.
Example 6. We consider an example due to Hahn and Nelson [40], in which two measuring devices take readings on each of 10 test units. Let X and Y, respectively, be the readings on a test unit by the first and second measuring devices. Let X = A + ε_1, Y = A + ε_2, where A, ε_1, ε_2, respectively, are the contributions to the readings due to the test unit and to the first and the second measuring devices. Let A, ε_1, ε_2 be independent with EA = μ, var(A) = σ_a^2, Eε_1 = Eε_2 = 0, var(ε_1) = σ_1^2, var(ε_2) = σ_2^2, so that X and Y have common mean μ and variances σ_1^2 + σ_a^2 and σ_2^2 + σ_a^2, respectively. Also, the covariance between X and Y is σ_a^2. The data are as follows:

    Test unit           1    2    3    4    5    6    7    8    9   10
    First device, X    71  108   72  140   61   97   90  127  101  114
    Second device, Y   77  105   71  152   88  117   93  130  112  105
    Z = X − Y          −6    3    1   −8  −17  −20   −3   −3  −11    9

Let us test the hypothesis H_0: med(Z) = 0. The number of Z_i's > 0 is 3. We have

    P{number of Z_i's > 0 is ≤ 3 | H_0} = ∑_{k=0}^{3} (10 choose k) (1/2)^{10} = 0.172.

Using the two-sided sign test, we cannot reject H_0 at level α = 0.05, since 0.172 > 0.025.
The RVs Z_i can be considered to be distributed normally, so that under H_0 the common mean of the Z_i's is 0. Using a paired comparison t-test on the data, we can show that t = −0.88 for 9 d.f., so we cannot reject the hypothesis of equality of means of X and Y at level α = 0.05.
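The sign-test probability in Example 6 is a binomial tail. A brief sketch, assuming SciPy:

    # Two-sided sign test for H_0: med(Z) = 0 applied to the paired differences
    # of Example 6; under H_0 the number of positive Z_i's is b(10, 1/2).
    from scipy.stats import binom

    z = [-6, 3, 1, -8, -17, -20, -3, -3, -11, 9]
    pos = sum(zi > 0 for zi in z)                  # 3 positive differences

    p_lower = binom.cdf(pos, len(z), 0.5)          # P{# positives <= 3 | H_0} = 0.172
    print(pos, round(p_lower, 3))
    # Since 0.172 > alpha/2 = 0.025, the two-sided sign test does not reject H_0.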
Finally, we consider the Wilcoxon signed-ranks test.
13.3.2.2 The Wilcoxon Signed-Ranks Test. The sign test for median and symmetry loses information since it ignores the magnitude of the difference between the observations and the hypothesized median. The Wilcoxon signed-ranks test provides an alternative test of location (and symmetry) that also takes into account the magnitudes of these differences.
Let X_1, X_2, ..., X_n be iid RVs with common absolutely continuous DF F, which is symmetric about the median z_{1/2}. The problem is to test H_0: z_{1/2} = z_0 against the usual one- or two-sided alternatives. Without loss of generality, we assume that z_0 = 0. Then F(−x) = 1 − F(x) for all x ∈ R. To test H_0: F(0) = 1/2 or z_{1/2} = 0, we first arrange |X_1|, |X_2|, ..., |X_n| in increasing order of magnitude, and assign ranks 1, 2, ..., n, keeping track of the original signs of X_i. For example, if n = 4 and |X_2| < |X_4| < |X_1| < |X_3|, the rank of |X_1| is 3, of |X_2| is 1, of |X_3| is 4, and of |X_4| is 2.
Let

    T^+ = the sum of the ranks of the positive X_i's,
    T^− = the sum of the ranks of the negative X_i's.    (20)

Then, under H_0, we expect T^+ and T^− to be the same. Note that

    T^+ + T^− = ∑_{i=1}^{n} i = n(n+1)/2,    (21)
so that T^+ and T^− are linearly related and offer equivalent criteria. Let us define

    Z_i = 1 if X_i > 0,  0 if X_i < 0,  i = 1, 2, ..., n,    (22)

and write R(|X_i|) = R_i^+ for the rank of |X_i|. Then T^+ = ∑_{i=1}^{n} R_i^+ Z_i and T^− = ∑_{i=1}^{n} (1 − Z_i) R_i^+. Also,

    T^+ − T^− = −∑_{i=1}^{n} R_i^+ + 2 ∑_{i=1}^{n} Z_i R_i^+ = 2 ∑_{i=1}^{n} R_i^+ Z_i − n(n+1)/2.    (23)

The statistic T^+ (or T^−) is known as the Wilcoxon statistic. A large value of T^+ (or, equivalently, a small value of T^−) means that most of the large deviations from 0 are positive, and therefore we reject H_0 in favor of the alternative H_1: z_{1/2} > 0.
A similar analysis applies to the other two alternatives. We record the results as follows:

    H_0            H_1            Reject H_0 if
    z_{1/2} = 0    z_{1/2} > 0    T^+ > c_1
    z_{1/2} = 0    z_{1/2} < 0    T^+ < c_2
    z_{1/2} = 0    z_{1/2} ≠ 0    T^+ < c_3 or T^+ > c_4
We now show how the Wilcoxon signed-ranks test statistic is related to the U-statistic estimate of g_2(F) = P_F(X_1 + X_2 > 0). Recall from Example 13.2.6 that the corresponding U-statistic is

    U_2(X) = (n choose 2)^{-1} ∑_{1≤i<j≤n} I_{[X_i + X_j > 0]}.    (24)

First note that

    ∑_{1≤i≤j≤n} I_{[X_i + X_j > 0]} = ∑_{j=1}^{n} I_{[X_j > 0]} + ∑_{1≤i<j≤n} I_{[X_i + X_j > 0]}.    (25)

Next note that for i < j, X_(i) + X_(j) > 0 if and only if X_(j) > 0 and |X_(i)| < |X_(j)|. It follows that ∑_{i=1}^{j} I_{[X_(i) + X_(j) > 0]} is the signed-rank of X_(j). Consequently,

    T^+ = ∑_{j=1}^{n} ∑_{i=1}^{j} I_{[X_(i) + X_(j) > 0]} = ∑_{1≤i≤j≤n} I_{[X_i + X_j > 0]}
        = ∑_{j=1}^{n} I_{[X_j > 0]} + ∑_{1≤i<j≤n} I_{[X_i + X_j > 0]}
        = n U_1(X) + (n choose 2) U_2(X),    (26)

where U_1 is the U-statistic for g_1(F) = P_F(X_1 > 0).
We next compute the distribution of T^+ for small samples. The distribution of T^+ is tabulated by Kraft and Van Eeden [55, pp. 221–223].
Let

    Z_(i) = 1 if the |X_j| that has rank i is > 0, and 0 otherwise.

Note that T^+ = 0 if all differences have negative signs, and T^+ = n(n+1)/2 if all differences have positive signs. Here a difference means a difference between an observation and the postulated value of the median. T^+ is completely determined by the indicators Z_(i), so that the sample space can be considered as a set of 2^n n-tuples (z_1, z_2, ..., z_n), where each z_i is 0 or 1. Under H_0, z_{1/2} = z_0 and each arrangement is equally likely. Thus

    P_{H_0}{T^+ = t} = {number of ways to assign + or − signs to the integers 1, 2, ..., n so that the sum is t}/2^n = n(t)/2^n, say.    (27)
Note that every assignment has a conjugate assignment with plus and minus signs interchanged, so that for this conjugate T^+ is given by

    ∑_{i=1}^{n} i(1 − Z_(i)) = n(n+1)/2 − ∑_{i=1}^{n} i Z_(i).    (28)

Thus under H_0 the distribution of T^+ is symmetric about the mean n(n+1)/4.
Example 7. Let us compute the null distribution for n = 3. E_{H_0} T^+ = n(n+1)/4 = 3, and T^+ takes values from 0 to n(n+1)/2 = 6:

    Value of T^+    Ranks Associated with Positive Differences    n(t)
    6               1, 2, 3                                        1
    5               2, 3                                           1
    4               1, 3                                           1
    3               1, 2; 3                                        2

so that

    P_{H_0}{T^+ = t} = 1/8 for t = 0, 1, 2, 4, 5, 6;  2/8 for t = 3;  0 otherwise.    (29)

Similarly, for n = 4, one can show that

    P_{H_0}{T^+ = t} = 1/16 for t = 0, 1, 2, 8, 9, 10;  2/16 for t = 3, 4, 5, 6, 7;  0 otherwise.    (30)
An alternative procedure would be to use the MGF technique. Under H_0, the RVs iZ_(i) are independent and have the PMF

    P{iZ_(i) = i} = P{iZ_(i) = 0} = 1/2.

Thus

    M(t) = E e^{tT^+} = ∏_{i=1}^{n} (e^{it} + 1)/2.    (31)
We express M(t) as a sum of terms of the form α_j e^{jt}/2^n. The PMF of T^+ can then be determined by inspection. For example, in the case n = 4, we have

    M(t) = ∏_{i=1}^{4} (e^{it} + 1)/2
         = [(e^t + 1)/2][(e^{2t} + 1)/2][(e^{3t} + 1)/2][(e^{4t} + 1)/2]
         = (1/4)(e^{3t} + e^{2t} + e^t + 1)[(e^{3t} + 1)/2][(e^{4t} + 1)/2]    (32)
         = (1/8)(e^{6t} + e^{5t} + e^{4t} + 2e^{3t} + e^{2t} + e^t + 1)[(e^{4t} + 1)/2]    (33)
         = (1/16)(e^{10t} + e^{9t} + e^{8t} + 2e^{7t} + 2e^{6t} + 2e^{5t} + 2e^{4t} + 2e^{3t} + e^{2t} + e^t + 1).    (34)

This method gives us the PMF of T^+ for n = 2, n = 3, and n = 4 immediately. Quite simply,

    P_{H_0}{T^+ = j} = coefficient of e^{jt} in the expansion of M(t),  j = 0, 1, ..., n(n+1)/2.    (35)

See Problem 3.3.12 for the PGF of T^+.
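The expansion of M(t) is just a repeated convolution of two-point distributions, so the null PMF of T^+ can be enumerated for any n. A short sketch in plain Python (the helper name is ours):

    # Null PMF of the Wilcoxon signed-rank statistic T^+ obtained by multiplying
    # the factors (e^{it} + 1)/2, i.e., convolving the distributions of i*Z_(i).
    def wilcoxon_null_pmf(n):
        coeffs = [1]                                   # coefficients of the empty product
        for i in range(1, n + 1):
            new = [0] * (len(coeffs) + i)
            for t, c in enumerate(coeffs):
                new[t] += c                            # Z_(i) = 0 contributes nothing
                new[t + i] += c                        # Z_(i) = 1 adds rank i
            coeffs = new
        total = 2 ** n
        return [c / total for c in coeffs]

    print(wilcoxon_null_pmf(3))   # [1/8, 1/8, 1/8, 2/8, 1/8, 1/8, 1/8], as in (29)
    print(wilcoxon_null_pmf(4))   # matches (30)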
Example 8. Let us return to the data of Example 5 and test H_0: z_{1/2} = μ = −1.0 against H_1: z_{1/2} > −1.0. Ranking |x_i − z_{1/2}| in increasing order of magnitude, we have

    0.016 < 0.131 < 0.535 < 0.762 < 1.056 < 1.120 < 1.417 < 1.561
      5       4       1       3       7       2       6       8

Thus

    r_1 = 3, r_2 = 6, r_3 = 4, r_4 = 2, r_5 = 1, r_6 = 7, r_7 = 5, r_8 = 8

and

    T^+ = 3 + 6 + 4 + 2 + 7 + 5 + 8 = 35.

From Table ST10, H_0 is rejected at level α = 0.05 if T^+ ≥ 31. Since 35 > 31, we reject H_0.
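In practice the test of Example 8 can be run directly on the shifted observations. A sketch, assuming a recent SciPy (the alternative argument of wilcoxon requires a reasonably new version):

    # Wilcoxon signed-rank test of H_0: z_{1/2} = -1.0 against H_1: z_{1/2} > -1.0
    # for the Example 5 / Example 8 data; scipy reports T^+ and a one-sided p-value.
    from scipy.stats import wilcoxon

    x = [-0.465, 0.120, -0.238, -0.869, -1.016, 0.417, 0.056, 0.561]
    d = [xi - (-1.0) for xi in x]                  # shift by the hypothesized median

    t_plus, pval = wilcoxon(d, alternative="greater")
    print(t_plus, pval)                            # T^+ = 35; p-value below 0.05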
Remark 1. The Wilcoxon test statistic can also be used to test for symmetry. Let X_1, X_2, ..., X_n be iid observations on an RV with absolutely continuous DF F. We set the null hypothesis as

    H_0: z_{1/2} = z_0, and the DF F is symmetric about z_0.

The alternative is

    H_1: z_{1/2} ≠ z_0 and F symmetric, or F asymmetric.

The test is the same since the null distribution of T^+ is the same.
Remark 2. If we have n independent pairs of observations (X_1, Y_1), (X_2, Y_2), ..., (X_n, Y_n) from a bivariate DF, we form the differences Z_i = X_i − Y_i, i = 1, 2, ..., n. Assuming that Z_1, Z_2, ..., Z_n are (independent) observations from a population of differences with absolutely continuous DF F that is symmetric with median z_{1/2}, we can use the Wilcoxon statistic to test H_0: z_{1/2} = z_0.
We present some examples.
Example 9. For the data of Example 10.3.3 let us apply the Wilcoxon statistic to test H_0: z_{1/2} = 0 and F is symmetric, against H_1: z_{1/2} ≠ 0 and F symmetric, or F not symmetric.
The absolute values, when arranged in increasing order of magnitude, are as follows:

    0.057 < 0.068 < 0.137 < 0.261 < 0.323 < 0.464 < 0.482 < 0.486 < 0.508 < 0.513
      13      5       2      17       4       1      11      15      20       7
    0.525 < 0.595 < 0.881 < 0.906 < 1.046 < 1.229 < 1.237 < 1.678 < 1.787 < 2.455
       8       9      10       6      19      14      18      12      16       3

Thus

    r_1 = 6,  r_2 = 3,  r_3 = 20, r_4 = 5,  r_5 = 2,  r_6 = 14,
    r_7 = 10, r_8 = 11, r_9 = 12, r_10 = 13, r_11 = 7, r_12 = 18,
    r_13 = 1, r_14 = 16, r_15 = 8, r_16 = 19, r_17 = 4, r_18 = 17,
    r_19 = 15, r_20 = 9,

and

    T^+ = 6 + 3 + 20 + 14 + 12 + 13 + 18 + 17 + 15 = 118.

From Table ST10 we see that H_0 cannot be rejected even at level α = 0.20.
Example 10. Returning to the data of Example 6, we apply the Wilcoxon test to the differences Z_i = X_i − Y_i. The differences are −6, 3, 1, −8, −17, −20, −3, −3, −11, 9. To test H_0: z_{1/2} = 0 against H_1: z_{1/2} ≠ 0, we rank the absolute values of z_i in increasing order to get

    1 < 3 = 3 = 3 < 6 < 8 < 9 < 11 < 17 < 20

and

    T^+ = 1 + 2 + 7 = 10.

Here we have assigned ranks 2, 3, 4 to observations +3, −3, −3. (If we assign rank 4 to observation +3, then T^+ = 12 without appreciably changing the result.)
From Table ST10, we reject H_0 at α = 0.05 if either T^+ > 46 or T^+ < 9. Since T^+ > 9 and < 46, we accept H_0. Note that hypothesis H_0 was also accepted by the sign test.
For large samples we use the normal approximation. In fact, from (26) we see that

    √n (T^+ − ET^+)/(n choose 2) = [n^{3/2}/(n choose 2)](U_1 − EU_1) + √n (U_2 − EU_2).

Clearly, U_1 − EU_1 →_P 0 and since n^{3/2}/(n choose 2) → 0, the first term → 0 in probability as n → ∞. By Slutsky's theorem (Theorem 7.2.15) it follows that

    [√n/(n choose 2)](T^+ − ET^+)  and  √n (U_2 − EU_2)

have the same limiting distribution. From Theorem 13.2.3 and Example 13.2.7 it follows that √n (U_2 − EU_2), and hence (T^+ − ET^+)√n/(n choose 2), has a limiting normal distribution with mean 0 and variance

    4ζ_1 = 4 P_F(X_1 + X_2 > 0, X_1 + X_3 > 0) − 4 P_F^2(X_1 + X_2 > 0).

Under H_0, the RVs Z_(i) are independent b(1, 1/2) so

    E_{H_0} T^+ = n(n+1)/4  and  var_{H_0} T^+ = (1/2)(1/2) ∑_{i=1}^{n} i^2 = n(n+1)(2n+1)/24.
Also, under H_0, F is continuous and symmetric so

    P_F(X_1 + X_2 > 0) = ∫_{−∞}^{∞} P_F(X_1 > −x) f(x) dx = 1/2

and

    P_F(X_1 + X_2 > 0, X_1 + X_3 > 0) = ∫_{−∞}^{∞} [P_F(X_1 > −x)]^2 f(x) dx = 1/3.

Thus 4ζ_1 = 4/3 − 4/4 = 1/3 so that

    (T^+ − E_{H_0} T^+) / [(n choose 2) √(1/(3n))]  →_L  N(0, 1).

However,

    (var_{H_0} T^+)^{1/2} / [(n choose 2) √(1/(3n))] = [n(n+1)(2n+1)/24]^{1/2} / [(n(n−1)/2) √(1/(3n))] → 1

as n → ∞. Consequently, under H_0,

    T^+ ∼ AN( n(n+1)/4, n(n+1)(2n+1)/24 ).
Thus, for large enough n we can determine the critical values for a test based on T^+ by using the normal approximation.
As an example, take n = 20. From Table ST10 the P-value associated with t^+ = 140 is 0.10. Using the normal approximation,

    P_{H_0}(T^+ > 140) ≈ P( Z > (140 − 105)/27.45 ) = P(Z > 1.28) = 0.1003.
PROBLEMS 13.3
1. Prove Theorem 4.
2. A random sample of size 16 from a continuous DF on [0,1] yields the following data: 0.59, 0.72, 0.47, 0.43, 0.31, 0.56, 0.22, 0.90, 0.96, 0.78, 0.66, 0.18, 0.73, 0.43, 0.58, 0.11. Test the hypothesis that the sample comes from U[0,1].
3. Test the goodness of fit of normality for the data of Problem 10.3.6, using the Kolmogorov–Smirnov test.
4. For the data of Problem 10.3.6 find a 0.95 level confidence band for the distribution function.
5. The following data represent a sample of size 20 from U[0,1]: 0.277, 0.435, 0.130, 0.143, 0.853, 0.889, 0.294, 0.697, 0.940, 0.648, 0.324, 0.482, 0.540, 0.152, 0.477, 0.667, 0.741, 0.882, 0.885, 0.740. Construct a 0.90 level confidence band for F(x).
6. In Problem 5 test the hypothesis that the distribution is U[0,1]. Take α = 0.05.
7. For the data of Example 2 test, by means of the sign test, the null hypothesis H_0: μ = 1.5 against H_1: μ ≠ 1.5.
8. For the data of Problem 5 test the hypothesis that the quantile of order p = 0.20 is 0.20.
9. For the data of Problem 10.4.8 use the sign test to test the hypothesis of no difference between the two averages.
10. Use the sign test for the data of Problem 10.4.9 to test the hypothesis of no difference in grade-point averages.
11. For the data of Problem 5 apply the signed-rank test to test H_0: z_{1/2} = 0.5 against H_1: z_{1/2} ≠ 0.5.
12. For the data of Problems 10.4.8 and 10.4.9 apply the signed-rank test to the differences to test H_0: z_{1/2} = 0 against H_1: z_{1/2} ≠ 0.
13.4 SOME TWO-SAMPLE PROBLEMS
In this section we consider some two-sample tests. Let X_1, X_2, ..., X_m and Y_1, Y_2, ..., Y_n be independent samples from two absolutely continuous distribution functions F_X and F_Y, respectively. The problem is to test the null hypothesis H_0: F_X(x) = F_Y(x) for all x ∈ R against the usual one- and two-sided alternatives.
Tests of H_0 depend on the type of alternative specified. We state some of the alternatives of interest even though we will not consider all of these in this text.

I Location alternative: F_Y(x) = F_X(x − θ), θ ≠ 0.
II Scale alternative: F_Y(x) = F_X(x/σ), σ > 0.
III Lehmann alternative: F_Y(x) = 1 − [1 − F_X(x)]^{θ+1}, θ + 1 > 0.
IV Stochastic alternative: F_Y(x) ≥ F_X(x) for all x, and F_Y(x) > F_X(x) for at least one x.
V General alternative: F_Y(x) ≠ F_X(x) for some x.
Some comments are in order. Clearly I through IV are special cases of V. Alternatives I and II show differences in F_X and F_Y in location and scale, respectively. Alternative III states that P(Y > x) = [P(X > x)]^{θ+1}. In the special case when θ is an integer it states that Y has the same distribution as the smallest of θ + 1 of the X-variables. A similar alternative that is sometimes used is F_Y(x) = [F_X(x)]^α for some α > 0 and all x. When α is an integer, this states that Y is distributed as the largest of α X-variables. Alternative IV refers to the relative magnitudes of the X's and Y's. It states that

    P(Y ≤ x) ≥ P(X ≤ x) for all x,

so that

    P(Y > x) ≤ P(X > x)    (1)

for all x. In other words, the X's tend to be larger than the Y's.

Definition 1. We say that a continuous RV X is stochastically larger than a continuous RV Y if inequality (1) is satisfied for all x with strict inequality for some x.

A similar interpretation may be given to the one-sided alternative F_X > F_Y. In the special case where both X and Y are normal RVs with means μ_1, μ_2 and common variance σ^2, F_X = F_Y corresponds to μ_1 = μ_2 and F_X > F_Y corresponds to μ_1 < μ_2.
In this section we consider some common two-sample tests for location (Case I) and stochastic ordering (Case IV) alternatives. First, note that a test of stochastic ordering may also be used as a test of less restrictive location alternatives since, for example, F_X > F_Y corresponds to larger Y's and hence larger location for Y. Second, we note that the chi-square test of homogeneity described in Section 10.3 can be used to test general alternatives (Case V) H_1: F(x) ≠ G(x) for some x. Briefly, one partitions the real line into Borel sets A_1, A_2, ..., A_k. Let

    p_{i1} = P(X_j ∈ A_i)  and  p_{i2} = P(Y_j ∈ A_i),

i = 1, 2, ..., k. Under H_0: F = G, p_{i1} = p_{i2}, i = 1, 2, ..., k, which is the problem of testing equality of two independent multinomial distributions discussed in Section 10.3.
We first consider a simple test of location. This test, based on the sample median of the combined sample, is a test of the equality of medians of the two DFs. It will tend to accept H_0: F = G even if the shapes of F and G are different as long as their medians are equal.
13.4.1 Median Test
The combined sample X_1, X_2, ..., X_m, Y_1, Y_2, ..., Y_n is ordered and a sample median is found. If m + n is odd, the median is the [(m + n + 1)/2]th value in the ordered arrangement. If m + n is even, the median is any number between the two middle values. Let V be the number of observed values of X that are less than or equal to the sample median for the combined sample. If V is large, it is reasonable to conclude that the actual median of X is smaller than the median of Y. One therefore rejects H_0: F = G in favor of H_1: F(x) ≥ G(x) for all x and F(x) > G(x) for some x if V is too large, that is, if V ≥ c. If, however, the alternative is F(x) ≤ G(x) for all x and F(x) < G(x) for some x, the median test rejects H_0 if V ≤ c.
For the two-sided alternative that F(x) ≠ G(x) for some x, we use the two-sided test.
We next compute the null distribution of the RV V. If m + n = 2p, p a positive integer, then

    P_{H_0}{V = v} = P_{H_0}{exactly v of the X_i's are ≤ combined median}
                   = (m choose v)(n choose p−v) / (m+n choose p),  v = 0, 1, 2, ..., m,
                   and 0 otherwise.    (2)

Here 0 ≤ V ≤ min(m, p). If m + n = 2p + 1, p > 0 an integer, the [(m + n + 1)/2]th value is the median in the combined sample, and

    P_{H_0}{V = v} = P{exactly v of the X_i's are below the (p+1)th value in the ordered arrangement}
                   = (m choose v)(n choose p−v) / (m+n choose p),  v = 0, 1, ..., min(m, p),
                   and 0 otherwise.    (3)
Remark 1. Under H_0 we expect (m + n)/2 observations above the median and (m + n)/2 below the median. One can therefore apply the chi-square test with 1 d.f. to test H_0 against the two-sided alternative.

Example 1. The following data represent lifetimes (hours) of batteries for two different brands:

    Brand A: 40 30 40 45 55 30
    Brand B: 50 50 45 55 60 40

The combined ordered sample is 30, 30, 40, 40, 40, 45, 45, 50, 50, 55, 55, 60. Since m + n = 12 is even, the median is 45. Thus

    v = number of observed values of X that are less than or equal to 45 = 5.

Now

    P_{H_0}{V ≥ 5} = [ (6 choose 5)(6 choose 1) + (6 choose 6)(6 choose 0) ] / (12 choose 6) ≈ 0.04.

Since P_{H_0}{V ≥ 5} > 0.025, we cannot reject H_0 that the two samples come from the same population.
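Since the null distribution (2) is hypergeometric, the tail probability of Example 1 can be evaluated directly. A minimal sketch, assuming SciPy:

    # Median test of Example 1: V = number of X's at or below the combined median.
    # Under H_0, with m + n = 2p, V is hypergeometric as in (2).
    from scipy.stats import hypergeom

    m, n, p = 6, 6, 6                      # brand A size, brand B size, p = (m + n)/2
    v = 5                                  # observed value from the data

    # P_{H_0}{V >= 5} = sum of C(m, k) C(n, p - k) / C(m + n, p) over k = 5, 6
    tail = sum(hypergeom.pmf(k, m + n, m, p) for k in range(v, min(m, p) + 1))
    print(round(tail, 3))                  # about 0.04; larger than 0.025, so keep H_0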
We now consider two tests of the stochastic alternatives. As mentioned earlier they may
also be used as tests of location.
13.4.2 Kolmogorov–Smirnov Test
Let X_1, X_2, ..., X_m and Y_1, Y_2, ..., Y_n be independent random samples from continuous DFs F and G, respectively. Let F*_m and G*_n, respectively, be the empirical DFs of the X's and the Y's. Recall that F*_m is the U-statistic for F and G*_n, that for G. Under H_0: F(x) = G(x) for all x, we expect a reasonable agreement between the two sample DFs. We define

    D_{m,n} = sup_x |F*_m(x) − G*_n(x)|.    (4)

Then D_{m,n} may be used to test H_0 against the two-sided alternative H_1: F(x) ≠ G(x) for some x. The test rejects H_0 at level α if

    D_{m,n} ≥ D_{m,n,α},    (5)

where P_{H_0}{D_{m,n} ≥ D_{m,n,α}} ≤ α.
Similarly, one can define the one-sided statistics

    D^+_{m,n} = sup_x [F*_m(x) − G*_n(x)]    (6)

and

    D^−_{m,n} = sup_x [G*_n(x) − F*_m(x)],    (7)

to be used against the one-sided alternatives

    G(x) ≤ F(x) for all x and G(x) < F(x) for some x, with rejection region D^+_{m,n} ≥ D^+_{m,n,α},    (8)

and

    F(x) ≤ G(x) for all x and F(x) < G(x) for some x, with rejection region D^−_{m,n} ≥ D^−_{m,n,α},    (9)

respectively.
For small samples, tables due to Massey [72] are available. In Table ST9, we give the values of D_{m,n,α} and D^+_{m,n,α} for some selected values of m, n, and α. Table ST8 gives the corresponding values for the m = n case.
For large samples we use the limiting result due to Smirnov [107]. Let N = mn/(m + n). Then

    lim_{m,n→∞} P{√N D^+_{m,n} ≤ λ} = 1 − e^{−2λ^2} for λ > 0, and 0 for λ ≤ 0,    (10)

    lim_{m,n→∞} P{√N D_{m,n} ≤ λ} = ∑_{j=−∞}^{∞} (−1)^j e^{−2j^2 λ^2} for λ > 0, and 0 for λ ≤ 0.    (11)

Relations (10) and (11) give the limiting distributions of D^+_{m,n} and D_{m,n}, respectively, under H_0: F(x) = G(x) for all x ∈ R.
Example 2. Let us apply the test to the data from Example 1. Do the two brands differ with respect to average life?
Let us first apply the Kolmogorov–Smirnov test to test H_0 that the population distribution of length of life for the two brands is the same.

    x     F*_6(x)   G*_6(x)   |F*_6(x) − G*_6(x)|
    30      2/6       0              2/6
    40      4/6       1/6            3/6
    45      5/6       2/6            3/6
    50      5/6       4/6            1/6
    55      1         5/6            1/6
    60      1         1              0

    D_{6,6} = sup_x |F*_6(x) − G*_6(x)| = 3/6.

From Table ST8, the critical value for m = n = 6 at level α = 0.05 is D_{6,6,0.05} = 4/6. Since D_{6,6} does not exceed D_{6,6,0.05}, we accept H_0 that the population distribution for the length of life for the two brands is the same.
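The two-sample statistic is also available in standard software. A sketch, assuming SciPy, for the battery data of Example 1:

    # Two-sample Kolmogorov-Smirnov test for the brand A / brand B lifetimes;
    # D_{6,6} = 3/6 here, below the 5% critical value 4/6.
    from scipy.stats import ks_2samp

    brand_a = [40, 30, 40, 45, 55, 30]
    brand_b = [50, 50, 45, 55, 60, 40]

    d, pval = ks_2samp(brand_a, brand_b)
    print(d, pval)        # d = 0.5; H_0 is not rejected at alpha = 0.05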
Let us next apply the two-sample t-test. We have x̄ = 40, ȳ = 50, s_1^2 = 90, s_2^2 = 50, s_p^2 = 70. Thus

    t = (40 − 50)/√(70(1/6 + 1/6)) = −2.08.

Since t_{10,0.025} = 2.2281, we accept the hypothesis that the two samples come from the same (normal) population.
The second test of stochastic ordering alternatives we consider is the Mann–Whitney–Wilcoxon test, which can be viewed as a test based on a U-statistic.
13.4.3 The Mann–Whitney–Wilcoxon Test
Let X_1, X_2, ..., X_m and Y_1, Y_2, ..., Y_n be independent samples from two continuous DFs, F and G, respectively. As in Example 13.2.10, let

    T(X_i; Y_j) = 1 if X_i < Y_j, and 0 if X_i ≥ Y_j,

for i = 1, 2, ..., m, j = 1, 2, ..., n. Recall that T(X_i; Y_j) is an unbiased estimator of g(F, G) = P_{F,G}(X < Y) and the two-sample U-statistic for g is given by U_1(X; Y) = (mn)^{-1} ∑_{i=1}^{m} ∑_{j=1}^{n} T(X_i; Y_j). For notational convenience, let us write

    U = mn U_1(X; Y) = ∑_{i=1}^{m} ∑_{j=1}^{n} T(X_i; Y_j).    (12)

Then U is the number of values of X_1, X_2, ..., X_m that are smaller than each of Y_1, Y_2, ..., Y_n. The statistic U is called the Mann–Whitney statistic. An alternative equivalent form using Wilcoxon scores is the linear rank statistic given by

    W = ∑_{j=1}^{n} Q_j,    (13)

where Q_j = rank of Y_j among the combined m + n observations. Indeed,

    Q_j = rank of Y_j = (# of X_i's < Y_j) + rank of Y_j among the Y's.

Thus

    W = ∑_{j=1}^{n} Q_j = U + ∑_{j=1}^{n} j = U + n(n+1)/2,    (14)

so that U and W are equivalent test statistics. Hence the name Mann–Whitney–Wilcoxon test. We will restrict attention to U as the test statistic.
Example 3. Let m = 4, n = 3, and suppose that the combined sample when ordered is as follows:

    x_2 < x_1 < y_3 < y_2 < x_4 < y_1 < x_3.

Then U = 7, since there are three values of x < y_1, two values of x < y_2, and two values of x < y_3. Also, W = 13 so U = 13 − 3(4)/2 = 7.

Note that U = 0 if all the X_i's are larger than all the Y_j's and U = mn if all the X_i's are smaller than all the Y_j's, because then there are m X's < Y_1, m X's < Y_2, and so on. Thus 0 ≤ U ≤ mn. If U is large, the values of Y tend to be larger than the values of X (Y is stochastically larger than X), and this supports the alternative F(x) ≥ G(x) for all x and F(x) > G(x) for some x. Similarly, if U is small, the Y values tend to be smaller than the X values, and this supports the alternative F(x) ≤ G(x) for all x and F(x) < G(x) for some x. We summarize these results as follows:

    H_0      H_1      Reject H_0 if
    F = G    F ≥ G    U ≥ c_1
    F = G    F ≤ G    U ≤ c_2
    F = G    F ≠ G    U ≥ c_3 or U ≤ c_4

To compute the critical values we need the null distribution of U. Let

    p_{m,n}(u) = P_{H_0}{U = u}.    (15)

We will set up a difference equation relating p_{m,n} to p_{m−1,n} and p_{m,n−1}. If the observations are arranged in increasing order of magnitude, the largest value can be either an x value or a y value. Under H_0, all m + n values are equally likely, so the probability that the largest value will be an x value is m/(m + n) and that it will be a y value is n/(m + n).
Now, if the largest value is an x, it does not contribute to U, and the remaining m − 1 values of x and n values of y can be arranged to give the observed value U = u with probability p_{m−1,n}(u). If the largest value is a y, this value is larger than all the m x's. Thus, to get U = u, the remaining n − 1 values of y and m values of x contribute U = u − m. It follows that

    p_{m,n}(u) = [m/(m + n)] p_{m−1,n}(u) + [n/(m + n)] p_{m,n−1}(u − m).    (16)

If m = 0, then for n ≥ 1

    p_{0,n}(u) = 1 if u = 0, and 0 if u > 0.    (17)

If n = 0, m ≥ 1, then

    p_{m,0}(u) = 1 if u = 0, and 0 if u > 0,    (18)

and

    p_{m,n}(u) = 0 if u < 0, m ≥ 0, n ≥ 0.    (19)
For small values of m and n one can easily compute the null PMF of U. Thus, if m = n = 1, then

    p_{1,1}(0) = 1/2,  p_{1,1}(1) = 1/2.

If m = 1, n = 2, then

    p_{1,2}(0) = p_{1,2}(1) = p_{1,2}(2) = 1/3.

Tables of critical values are available for small values of m and n, m ≤ n. See, for example, Auble [3] or Mann and Whitney [71]. Table ST11 gives the values of u_α for which P_{H_0}{U > u_α} ≤ α for some selected values of m, n, and α.
If m, n are large we can use the asymptotic normality of U. In Example 13.2.11 we showed that, under H_0,

    [U/(mn) − 1/2] / √((m + n + 1)/(12mn))  →_L  N(0, 1)

as m, n → ∞ such that m/(m + n) → constant. The approximation is fairly good for m, n ≥ 8.
Example 4. Two samples are as follows:

    Values of X_i: 1, 2, 3, 5, 7, 9, 11, 18
    Values of Y_i: 4, 6, 8, 10, 12, 13, 14, 15, 19

Thus m = 8, n = 9, and U = 3 + 4 + 5 + 6 + 7 + 7 + 7 + 7 + 8 = 54. The (exact) p-value P_{H_0}(U ≥ 54) = 0.046, so we reject H_0 at (two-tailed) level α = 0.1. Let us apply the normal approximation. We have

    E_{H_0} U = (8 · 9)/2 = 36,  var_{H_0}(U) = (8 · 9/12)(8 + 9 + 1) = 108,

and

    Z = (54 − 36)/√108 = 18/(6√3) = √3 = 1.732.

We note that P(Z > 1.73) = 0.042.
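SciPy's implementation can reproduce Example 4; note that the statistic convention differs across versions, and recent versions report the U statistic of the first sample passed (counting pairs in which that sample is the larger), so passing the Y sample first matches the book's U. A rough sketch, assuming a recent SciPy:

    # Mann-Whitney-Wilcoxon test for the Example 4 samples. The book's U counts
    # pairs with X < Y, which is the statistic for Y when Y is passed first.
    from scipy.stats import mannwhitneyu

    x = [1, 2, 3, 5, 7, 9, 11, 18]
    y = [4, 6, 8, 10, 12, 13, 14, 15, 19]

    u, pval = mannwhitneyu(y, x, alternative="greater")
    print(u, pval)        # U = 54; one-sided p-value about 0.046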
PROBLEMS 13.4
1.For the data of Example 4 apply the median test.
2.Twelve 4-year-old boys and twelve 4-year-old girls were observed during two
15-minute play sessions, and each child’s play during these two periods was scored
as follows for incidence and degree of aggression:
Boys: 86, 69,72,65,113,65,118,45,141,104,41,50
Girls: 55,40,22,58,16,7,9,16,26,36,20,15
Test the hypothesis that there were sex differences in the amount of aggression
shown, using (a) the median test and (b) the Mann-Whitney-Wilcoxon test (Siegel
[105]).
3.To compare the variability of two brands of tires, the following mileages (1000
miles) were obtained for eight tires of each kind:
BrandA:32.1,20.6,17.8,28.4,19.6,21.4,19.9,30.1
BrandB:19.8,27.6,30.8,27.6,34.1,18.7,16.9,17.9
Test the null hypothesis that the two samples come from the same population, using
the Mann–Whitney–Wilcoxon test.
4.Use the data of Problem 2 to apply the Kolmogorov–Smirnov test.
5.Apply the Kolmogorov–Smirnov test to the data of Problem 3.
6. Yet another test for testing H_0: F = G against general alternatives is the so-called runs test. A run is a succession of one or more identical symbols which are preceded and followed by a different symbol (or no symbol). The length of a run is the number of like symbols in a run. The total number of runs, R, in the combined sample of X's and Y's when arranged in increasing order can be used as a test of H_0. Under H_0 the X and Y symbols are expected to be well-mixed. A small value of R supports H_1: F ≠ G. A test based on R is appropriate only for two-sided (general) alternatives. Tables of critical values are available. For large samples, one uses the normal approximation:

    R ∼ AN( 1 + 2mn/(m + n), 2mn(2mn − m − n)/[(m + n − 1)(m + n)^2] ).

(a) Let R_1 = # of X-runs, R_2 = # of Y-runs, and R = R_1 + R_2. Under H_0, show that

    P(R_1 = r_1, R_2 = r_2) = k (m−1 choose r_1−1)(n−1 choose r_2−1) / (m+n choose m),

where k = 2 if r_1 = r_2, k = 1 if |r_1 − r_2| = 1, r_1 = 1, 2, ..., m and r_2 = 1, 2, ..., n.
(b) Show that

    P_{H_0}(R_1 = r_1) = (m−1 choose r_1−1)(n+1 choose r_1) / (m+n choose m),  0 ≤ r_1 ≤ m.
7.Fifteen 3-year-old boys and 15 3-year-old girls were observed during two sessions
of recess in a nursery school. Each child’s play was scored for incidence and degree
of aggression as follows:
Boys: 96 65 74 78 82 121 68 79 111 48 53 92 81 31 40
Girls: 12 47 32 59 83 14 32 15 17 82 21 34 9 15 51
Is there evidence to suggest that there are sex differences in the incidence and amount
of aggression? Use both Mann–Whitney–Wilcoxon and runs tests.
13.5 TESTS OF INDEPENDENCE
Let X and Y be two RVs with joint DF F(x, y), and let F_1 and F_2, respectively, be the marginal DFs of X and Y. In this section we study some tests of the hypothesis of independence, namely,

    H_0: F(x, y) = F_1(x) F_2(y) for all (x, y) ∈ R_2

against the alternative

    H_1: F(x, y) ≠ F_1(x) F_2(y) for some (x, y).

If the joint distribution function F is bivariate normal, we know that X and Y are independent if and only if the correlation coefficient ρ = 0. In this case, the test of independence is to test H_0: ρ = 0.
In the nonparametric situation the most commonly used test of independence is the chi-square test, which we now study.
13.5.1 Chi-square Test of Independence—Contingency Tables
Let X and Y be two RVs, and suppose that we have n observations on (X, Y). Let us divide the space of values assumed by X (the real line) into r mutually exclusive intervals A_1, A_2, ..., A_r. Similarly, the space of values of Y is divided into c disjoint intervals B_1, B_2, ..., B_c. As a rule of thumb, we choose the length of each interval in such a way that the probability that X (Y) lies in an interval is approximately 1/r (1/c). Moreover, it is desirable to have n/r and n/c at least equal to 5. Let X_{ij} denote the number of pairs (X_k, Y_k), k = 1, 2, ..., n, that lie in A_i × B_j, and let

    p_{ij} = P{(X, Y) ∈ A_i × B_j} = P{X ∈ A_i and Y ∈ B_j},    (1)

where i = 1, 2, ..., r, j = 1, 2, ..., c. If each p_{ij} is known, the quantity

    ∑_{i=1}^{r} ∑_{j=1}^{c} (X_{ij} − n p_{ij})^2 / (n p_{ij})    (2)
has approximately a chi-square distribution with rc − 1 d.f., provided that n is large (see Theorem 10.3.2). If X and Y are independent, P{(X, Y) ∈ A_i × B_j} = P{X ∈ A_i} P{Y ∈ B_j}. Let us write p_{i·} = P{X ∈ A_i} and p_{·j} = P{Y ∈ B_j}. Then under H_0: p_{ij} = p_{i·} p_{·j}, i = 1, 2, ..., r, j = 1, 2, ..., c. In practice, p_{ij} will not be known. We replace the p_{ij} by their estimates. Under H_0, we estimate p_{i·} by

    p̂_{i·} = ∑_{j=1}^{c} X_{ij} / n,  i = 1, 2, ..., r,    (3)

and p_{·j} by

    p̂_{·j} = ∑_{i=1}^{r} X_{ij} / n,  j = 1, 2, ..., c.    (4)

Since ∑_{j=1}^{c} p̂_{·j} = 1 = ∑_{i=1}^{r} p̂_{i·}, we have estimated only r − 1 + c − 1 = r + c − 2 parameters. It follows (see Theorem 10.3.4) that the RV

    U = ∑_{i=1}^{r} ∑_{j=1}^{c} (X_{ij} − n p̂_{i·} p̂_{·j})^2 / (n p̂_{i·} p̂_{·j})    (5)

is asymptotically distributed as χ^2 with rc − 1 − (r + c − 2) = (r − 1)(c − 1) d.f. under H_0. The null hypothesis is rejected if the computed value of U exceeds χ^2_{(r−1)(c−1),α}.
It is frequently convenient to list the observed and expected frequencies of the rc events A_i × B_j in an r × c table, called a contingency table, as follows:

    Observed Frequencies, O_{ij}
            B_1        B_2       ···   B_c
    A_1     X_11       X_12      ···   X_1c       ∑_j X_1j
    A_2     X_21       X_22      ···   X_2c       ∑_j X_2j
    ···     ···        ···             ···        ···
    A_r     X_r1       X_r2      ···   X_rc       ∑_j X_rj
            ∑_i X_i1   ∑_i X_i2  ···   ∑_i X_ic   n

    Expected Frequencies, E_{ij}
            B_1          B_2          ···   B_c
    A_1     n p_1· p_·1  n p_1· p_·2  ···   n p_1· p_·c   n p_1·
    A_2     n p_2· p_·1  n p_2· p_·2  ···   n p_2· p_·c   n p_2·
    ···     ···          ···                ···           ···
    A_r     n p_r· p_·1  n p_r· p_·2  ···   n p_r· p_·c   n p_r·
            n p_·1       n p_·2       ···   n p_·c        n

Note that the X_{ij}'s in the table are frequencies. Once the category A_i × B_j is determined for an observation (X, Y), numerical values of X and Y are irrelevant. Next, we need to compute the expected frequency table. This is done quite simply by multiplying the row
and column totals for each pair (i, j) and dividing the product by n. Then we compute the quantity

    ∑_i ∑_j (E_{ij} − O_{ij})^2 / E_{ij}

and compare it with the tabulated χ^2 value. In this form the test can be applied even to qualitative data. A_1, A_2, ..., A_r and B_1, B_2, ..., B_c represent the two attributes, and the null hypothesis to be tested is that the attributes A and B are independent.
Example 1. The following are the results for a random sample of 400 employed individuals:

    Length of Time (years)              Annual Income (dollars)
    with the Same Company    Less than 40,000   40,000–75,000   More than 75,000   Total
    <5                              50                 75               25           150
    5–10                            25                 50               25           100
    10 or more                      25                 75               50           150
                                   100                200              100           400

If X denotes the length of service with the same company, and Y the annual income, we wish to test the hypothesis that X and Y are independent. The expected frequencies are as follows:

    Expected Frequencies
    Time (years)
    with the Same Company    <40,000   40,000–75,000   ≥75,000   Total
    <5                         37.5          75           37.5     150
    5–10                       25            50           25       100
    ≥10                        37.5          75           37.5     150
                              100           200          100       400

Thus

    U = (12.5)^2/37.5 + 0 + (12.5)^2/37.5 + 0 + 0 + 0 + (12.5)^2/37.5 + 0 + (12.5)^2/37.5 = 16.66.

The number of degrees of freedom is (3 − 1)(3 − 1) = 4, and χ^2_{4,0.05} = 9.488. Since 16.66 > 9.488, we reject H_0 at level 0.05 and conclude that length of service with a company is not independent of annual income.
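The computation of (5) for Example 1 can be checked with SciPy. A minimal sketch:

    # Chi-square test of independence for the 3 x 3 contingency table of Example 1.
    import numpy as np
    from scipy.stats import chi2_contingency

    observed = np.array([[50, 75, 25],
                         [25, 50, 25],
                         [25, 75, 50]])

    u, pval, df, expected = chi2_contingency(observed, correction=False)
    print(round(u, 2), df, pval)   # U about 16.7 with 4 d.f.; p-value well below 0.05
    print(expected)                # matches the expected-frequency table above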
13.5.2 Kendall’s Tau
Let(X
1,Y1),(X2,Y2),...,(X n,Yn)be a sample from a bivariate population.
Definition 1.For any two pairs(X
i,Yi)and(X j,Yj)we say that the relation isperfect
concordance (or agreement)if
X
i<XjwheneverY i<YjorX i>XjwheneverY i>Yj (6)
and that the relation isperfect discordance (or disagreement)if
X
i>XjwheneverY i<YjorX i<XjwheneverY i>Yj. (7)
Writingπ
candπ dfor the probability of perfect concordance and of perfect discordance,
respectively, we have
π
c=P{(X j−Xi)(Yj−Yi)>0} (8)
and
π
d=P{(X j−Xi)(Yj−Yi)<0}, (9)
and, if the marginal distributions ofXandYare continuous,
π
c=[P{Y i<Yj}−P{X i>XjandY i<Yj}]
+[P{Y
i>Yj}−P{X i<XjandY i>Yj}]
=1−π
d. (10)
Definition 2.The measure of association between the RVsXandYdefined by
τ=π
c−πd (11)
is known asKendall’s tau.
If the marginal distributions ofXandYare continuous, we may rewrite (11), in view
of (10), as follows:
τ=1−2π
d=2π c−1. (12)
In particular, ifXandYare independent and continuous RVs, then
P{X
i<Xj}=P{X i>Xj}=
1
2
,
since thenX
i−Xjis a symmetric RV. Then
π
c=P{X i<Xj}P{Y i<Yj}+P{X i>Xj}P{Y i>Yj}
=P{X
i>Xj}P{Y i<Yj}+P{X i<Xj}P{Y i>Yj}

d,
and it follows thatτ=0 for independent continuous RVs.

612 NONPARAMETRIC STATISTICAL INFERENCE
Note that, in general,τ=0 does not imply independence. However, for the bivariate
normal distributionτ=0 if and only if the correlation coefficientρ, betweenXandY,
is 0, so thatτ=0 if and only ifXandYare independent (Problem 6).
Let

    ψ((x_1, y_1), (x_2, y_2)) = 1 if (y_2 − y_1)(x_2 − x_1) > 0, and 0 otherwise.    (13)

Then Eψ((X_1, Y_1), (X_2, Y_2)) = π_c = (1 + τ)/2, and we see that π_c is estimable of degree 2, with symmetric kernel ψ defined in (13). The corresponding one-sample U-statistic is given by

    U((X_1, Y_1), ..., (X_n, Y_n)) = (n choose 2)^{-1} ∑_{1≤i<j≤n} ψ((X_i, Y_i), (X_j, Y_j)).    (14)

Then the corresponding estimator of Kendall's tau is

    T = 2U − 1    (15)

and is called Kendall's sample correlation coefficient.
Note that −1 ≤ T ≤ 1. To test H_0 that X and Y are independent against H_1: X and Y are dependent, we reject H_0 if |T| is large. Under H_0, τ = 0, so that the null distribution of T is symmetric about 0. Thus we reject H_0 at level α if the observed value t of T satisfies |t| > t_{α/2}, where P{|T| ≥ t_{α/2} | H_0} = α.
For small values of n the null distribution can be directly evaluated. Values for 4 ≤ n ≤ 10 are tabulated by Kendall [51]. Table ST12 gives the values of S_α for which P{S > S_α} ≤ α, where S = (n choose 2) T, for selected values of n and α.
For a direct evaluation of the null distribution we note that the numerical value of T is clearly invariant under all order-preserving transformations. It is therefore convenient to order the X and Y values and assign them ranks. If we write the pairs from the smallest to the largest according to, say, the X values, then the number of pairs 1 ≤ i < j ≤ n for which Y_j − Y_i > 0 is the number of concordant pairs, P.

Example 2. Let n = 4, and let us find the null distribution of T. There are 4! different permutations of the ranks of Y:

    Ranks of X values: 1 2 3 4
    Ranks of Y values: a_1 a_2 a_3 a_4

where (a_1, a_2, a_3, a_4) is one of the 24 permutations of 1, 2, 3, 4. Since the distribution is symmetric about 0, we need only compute one half of the distribution.

    P      T       Number of Permutations    P_{H_0}{T = t}
    0    −1.00              1                    1/24
    1    −0.67              3                    3/24
    2    −0.33              5                    5/24
    3     0.00              6                    6/24

Similarly, for n = 3 the distribution of T under H_0 is as follows:

    P      T       Number of Permutations          P_{H_0}{T = t}
    0    −1.00     1: (3,2,1)                          1/6
    1    −0.33     2: (2,3,1), (3,1,2)                 2/6

Example 3. Two judges rank four essays as follows:

             Essay
    Judge    1  2  3  4
    1, X     3  4  2  1
    2, Y     3  1  4  2

To test H_0: the rankings of the two judges are independent, let us arrange the rankings of the first judge from 1 to 4. Then we have:

    Judge 1, X: 1  2  3  4
    Judge 2, Y: 2  4  3  1

P = number of pairs of rankings for Judge 2 such that for j > i, Y_j − Y_i > 0 = 2 [the pairs (2,4) and (2,3)], and

    t = (2 · 2)/(4 choose 2) − 1 = −0.33.

Since

    P_{H_0}{|T| ≥ 0.33} = 18/24 = 0.75,

we cannot reject H_0.
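A sketch of the same computation with SciPy (for small n without ties, kendalltau uses an exact null distribution):

    # Kendall's sample tau for the two judges of Example 3.
    from scipy.stats import kendalltau

    judge1 = [3, 4, 2, 1]
    judge2 = [3, 1, 4, 2]

    t, pval = kendalltau(judge1, judge2)
    print(round(t, 2), round(pval, 2))   # t = -0.33; two-sided p-value 0.75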
For large n we can use an extension of Theorem 13.3.3 to the bivariate case to conclude that √n(U − π_c) →_L N(0, 4ζ_1), where

    ζ_1 = cov{ψ((X_1, Y_1), (X_2, Y_2)), ψ((X_1, Y_1), (X_3, Y_3))}.

Under H_0, it can be shown that

    3 √(n(n−1)) / √(2(2n+5)) · T  →_L  N(0, 1).

See, for example, Kendall [51], Randles and Wolfe [85], or Gibbons [35]. The approximation is good for n ≥ 8.
13.5.3 Spearman’s Rank Correlation Coefficient
Let(X
1,Y1),(X2,Y2),...,(X n,Yn)be a sample from a bivariate population. In Section 6.3
we defined the sample correlation coefficient by
R=
Θ
n
i=1
(Xi−
X)(Y i−Y)
Θ
n i=1
(Xi−X)
2
Θ
n i=1
(Yi−Y)
2
!
1/2
, (16)
where
X=n
−1
n

i=1
Xiand
Y=n
−1
n

i=1
Yi.
If the sample valuesX
1,X2,...,X nandY 1,Y2,...,Y nare each ranked from 1 tonin
increasing order of magnitude separately, and if theX’s andY’s have continuous DFs, we
get a unique set of rankings. The data will then reduce tonpairs of rankings. Let us write
R
i= rank(X i)and S i= rank(Y i)
thenR
iandS i∈{1,2,...,n}.Also,
n

1
Ri=
n

1
Si=
n(n+1)
2
, (17)
R=n
−1
n

1
Ri=
n+1
2
,S=n
−1
n

1
Si=
n+1
2
, (18)
and

    ∑_{i=1}^{n} (R_i − R̄)^2 = ∑_{i=1}^{n} (S_i − S̄)^2 = n(n^2 − 1)/12.    (19)

Substituting in (16), we obtain

    R = 12 ∑_{i=1}^{n} (R_i − R̄)(S_i − S̄) / (n^3 − n) = 12 ∑_{i=1}^{n} R_i S_i / [n(n^2 − 1)] − 3(n+1)/(n−1).    (20)
Writing D_i = R_i − S_i = (R_i − R̄) − (S_i − S̄), we have

    ∑_{i=1}^{n} D_i^2 = ∑_{i=1}^{n} (R_i − R̄)^2 + ∑_{i=1}^{n} (S_i − S̄)^2 − 2 ∑_{i=1}^{n} (R_i − R̄)(S_i − S̄)
                     = (1/6) n(n^2 − 1) − 2 ∑_{i=1}^{n} (R_i − R̄)(S_i − S̄),

and it follows that

    R = 1 − 6 ∑_{i=1}^{n} D_i^2 / [n(n^2 − 1)].    (21)

The statistic R defined in (20) and (21) is called Spearman's rank correlation coefficient (see also Example 4.5.2).
From (20) we see that

    ER = 12/[n(n^2 − 1)] E[∑_{i=1}^{n} R_i S_i] − 3(n+1)/(n−1) = 12/(n^2 − 1) E(R_i S_i) − 3(n+1)/(n−1).    (22)

Under H_0, the RVs X and Y are independent, so that the ranks R_i and S_i are also independent. It follows that

    E_{H_0}(R_i S_i) = ER_i ES_i = [(n+1)/2]^2

and

    E_{H_0} R = 12/(n^2 − 1) [(n+1)/2]^2 − 3(n+1)/(n−1) = 0.    (23)

Thus we should reject H_0 if the absolute value of R is large, that is, reject H_0 if

    |R| > R_α,    (24)

where P_{H_0}{|R| > R_α} ≤ α. To compute R_α we need the null distribution of R. For this purpose it is convenient to assume, without loss of generality, that R_i = i, i = 1, 2, ..., n. Then D_i = i − S_i, i = 1, 2, ..., n. Under H_0, X and Y being independent, the n! arrangements (i, S_i) of ranks are equally likely. It follows that

    P_{H_0}{R = r} = (n!)^{-1} × (number of arrangements for which R = r) = n_r/n!, say.    (25)

Note that −1 ≤ R ≤ 1, and the extreme values can occur only when either the rankings match, that is, R_i = S_i, in which case R = 1, or R_i = n + 1 − S_i, in which case R = −1.
Moreover, one need compute only one half of the distribution, since it is symmetric about 0 (Problem 7).
In the following example we will compute the distribution of R for n = 3 and 4. The exact complete distribution of ∑_{i=1}^{n} D_i^2, and hence R, for n ≤ 10 has been tabulated by Kendall [51]. Table ST13 gives the values of R_α for some selected values of n and α.
Example 4. Let us first enumerate the null distribution of R for n = 3. This is done in the following table:

    (s_1, s_2, s_3)    ∑_{i=1}^{n} i s_i    r = 12 ∑ i s_i / [n(n^2 − 1)] − 3(n+1)/(n−1)
    (1, 2, 3)                 14                            1.0
    (1, 3, 2)                 13                            0.5
    (2, 1, 3)                 13                            0.5

Thus

    P_{H_0}{R = r} = 1/6 for r = 1.0;  2/6 for r = 0.5;  2/6 for r = −0.5;  1/6 for r = −1.0.

Similarly, for n = 4 we have the following:

    (s_1, s_2, s_3, s_4)                              ∑ i s_i    r     n_r   P_{H_0}{R = r}
    (1,2,3,4)                                            30      1.0    1        1/24
    (1,3,2,4), (2,1,3,4), (1,2,4,3)                      29      0.8    3        3/24
    (2,1,4,3)                                            28      0.6    1        1/24
    (1,3,4,2), (1,4,2,3), (2,3,1,4), (3,1,2,4)           27      0.4    4        4/24
    (1,4,3,2), (3,2,1,4)                                 26      0.2    2        2/24
                                                         25      0.0    2        2/24

The last value is obtained from symmetry.

Example 5. In Example 3, we see that

    r = (12 × 23)/(4 × 15) − (3 × 5)/3 = −0.4.
Since P_{H_0}{|R| ≥ 0.4} = 18/24 = 0.75, we cannot reject H_0 at α = 0.05 or α = 0.10.
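A sketch of the same computation with SciPy:

    # Spearman's rank correlation for the two judges of Example 3 / Example 5.
    from scipy.stats import spearmanr

    judge1 = [3, 4, 2, 1]
    judge2 = [3, 1, 4, 2]

    r, pval = spearmanr(judge1, judge2)
    print(round(r, 2))     # r = -0.4, as computed in Example 5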
For large samples it is possible to use a normal approximation. It can be shown (see, e.g., Fraser [32, pp. 247–248]) that under H_0 the RV

    Z = [ 12 ∑_{i=1}^{n} R_i S_i − 3n^3 ] n^{−5/2},

or, equivalently,

    Z = R √(n − 1),

has approximately a standard normal distribution. The approximation is good for n ≥ 10.
PROBLEMS 13.5
1. A sample of 240 men was classified according to characteristics A and B. Characteristic A was subdivided into four classes A_1, A_2, A_3, and A_4, while B was subdivided into three classes B_1, B_2, and B_3, with the following result:

           A_1   A_2   A_3   A_4
    B_1     12    25    32    11     80
    B_2     17    18    22    23     80
    B_3     21    17    16    26     80
            50    60    70    60    240

Is there evidence to support the theory that A and B are independent?
2. The following data represent the blood types and ethnic groups of a sample of Iraqi citizens:

                        Blood Type
    Ethnic Group      O     A     B    AB
    Kurd             531   450   293   226
    Arab             174   150   133    36
    Jew               42    26    26     8
    Turkoman          47    49    22    10
    Ossetian          50    59    26    15

Is there evidence to conclude that blood type is independent of ethnic group?
3. In a public opinion poll, a random sample of 500 American adults across the country was asked the following question: "Do you believe that there was a concerted effort to cover up the Watergate scandal? Answer yes, no, or no opinion." The responses according to political beliefs were as follows:

                             Response
    Political Affiliation   Yes    No    No Opinion
    Republican               45    75        30       150
    Independent              85    45        20       150
    Democrat                140    30        30       200
                            270   150        80       500

Test the hypothesis that attitude toward the Watergate cover-up is independent of political party affiliation.
4. A random sample of 100 families in Bowling Green, Ohio, showed the following distribution of home ownership by family income:

                              Annual Income (dollars)
    Residential Status   Less than 30,000   30,000–50,000   50,000 or Above
    Home Owner                  10                15               30
    Renter                       8                17               20

Is home ownership in Bowling Green independent of family income?
5. In a flower show the judges agreed that five exhibits were outstanding, and these were numbered arbitrarily from 1 to 5. Three judges each arranged these five exhibits in order of merit, giving the following rankings:

    Judge A: 5 3 1 2 4
    Judge B: 3 1 5 4 2
    Judge C: 5 2 3 1 4

Compute the average values of Spearman's rank correlation coefficient R and Kendall's sample tau coefficient T from the three possible pairs of rankings.
6. For the bivariate normally distributed RV (X, Y) show that τ = 0 if and only if X and Y are independent. [Hint: Show that τ = (2/π) sin^{-1} ρ, where ρ is the correlation coefficient between X and Y.]
7. Show that the distribution of Spearman's rank correlation coefficient R is symmetric about 0 under H_0.
8. In Problem 5 test the null hypothesis that the rankings of judge A and judge C are independent. Use both Kendall's tau and Spearman's rank correlation tests.
9. A random sample of 12 couples showed the following distribution of heights:

              Height (in.)                      Height (in.)
    Couple   Husband   Wife       Couple   Husband   Wife
      1         80       72          7        74       68
      2         70       60          8        71       71
      3         73       76          9        63       61
      4         72       62         10        64       65
      5         62       63         11        68       66
      6         65       46         12        67       67

(a) Compute T.
(b) Compute R.
(c) Test the hypothesis that the heights of husband and wife are independent, using T as well as R. In each case use the normal approximation.
13.6 SOME APPLICATIONS OF ORDER STATISTICS
In this section we consider some applications of order statistics. We are mainly interested in three applications, namely, tolerance intervals for distributions, coverages, and confidence interval estimates for quantiles and location parameters.

Definition 1. Let F be a continuous DF. A tolerance interval for F with tolerance coefficient γ is a random interval such that the probability is γ that this random interval covers at least a specific percentage (100p) of the distribution.

Let X_1, X_2, ..., X_n be a sample of size n from F, and let X_(1), X_(2), ..., X_(n) be the corresponding set of order statistics. If the end points of the tolerance interval are two order statistics X_(r), X_(s), r < s, we have

    P{ P{X_(r) < X < X_(s)} ≥ p } = γ.    (1)

Since F is continuous, F(X) is U(0, 1), and we have

    P{X_(r) < X < X_(s)} = P{X < X_(s)} − P{X ≤ X_(r)} = F(X_(s)) − F(X_(r)) = U_(s) − U_(r),    (2)

where U_(r), U_(s) are the order statistics from U(0, 1). Thus (1) reduces to

    P{U_(s) − U_(r) ≥ p} = γ.    (3)
The statistic V = U_(s) − U_(r), 1 ≤ r < s ≤ n, is called the coverage of the interval (X_(r), X_(s)). More precisely, the differences V_k = F(X_(k)) − F(X_(k−1)) = U_(k) − U_(k−1), for k = 1, 2, ..., n + 1, where U_(0) = 0 and U_(n+1) = 1, are called elementary coverages. Since the joint PDF of U_(1), U_(2), ..., U_(n) is given by

    f(u_1, u_2, ..., u_n) = n! for 0 < u_1 < u_2 < ··· < u_n < 1, and 0 otherwise,

the joint PDF of V_1, V_2, ..., V_n is easily seen to be

    h(v_1, v_2, ..., v_n) = n! for v_i ≥ 0, i = 1, 2, ..., n, ∑_{i=1}^{n} v_i ≤ 1, and 0 otherwise.    (4)

Note that h is symmetric in its arguments. Consequently, the V_i's are exchangeable RVs and the distribution of every sum of r, r < n, of these coverages is the same and, in particular, it is the distribution of U_(r) = ∑_{j=1}^{r} V_j, namely,

    g_r(u) = n (n−1 choose r−1) u^{r−1} (1 − u)^{n−r} for 0 < u < 1, and 0 otherwise.    (5)
(5)
The common distribution of elementary coverages is
g
1(u)=n(1−u)
n−1
,0<u<1,=0,otherwise.
ThusEV
i=1/(n+1)and
Θ
r
i=1
EVi=r/(n+1). This may be interpreted as follows:
The order statisticsX
(1),X
(2),...,X
(n)partition the area under the PDF inn+1 parts such
that each part has the same average (expected) area.
The sum of anyrsuccessive elementary coveragesV
i+1,Vi+1,...,V i+ris called an
r-coverage. Clearly
r

j=1
Vi+j=U
(i+r)−U
(i),i+r≤n, (6)
and, in particular,U
(s)−U
(r)=
Θ
s
j=r+1
Vj. SinceV’s are exchangeable it follows that
U
(s)−U
(r)
d=U
(s−r) (7)
with PDF
g
s−r(u)=n
Γ
n−1
s−r−1

u
s−r−1
(1−u)
n−s+r
,0<u<1.
From (3), therefore,
γ=

1
p
gs−r(u)du=
s−r−1

i=0
Γ
n
i

p
i
(1−p)
n−i
, (8)

SOME APPLICATIONS OF ORDER STATISTICS 621
where the last equality follows from (5.3.48). Givenn,p,γit may not always be possible
to finds−rto satisfy (8).
Example 1. Let s = n and r = 1. Then

    γ = ∑_{i=0}^{n−2} (n choose i) p^i (1 − p)^{n−i} = 1 − p^n − n p^{n−1}(1 − p).

If p = 0.8, n = 5, r = 1, then

    γ = 1 − (0.8)^5 − 5(0.8)^4 (0.2) = 0.263.

Thus the interval (X_(1), X_(5)) in this case defines a 26 percent tolerance interval for 0.80 probability under the distribution (of X).
Example 2. Let X_1, X_2, X_3, X_4, X_5 be a sample from a continuous DF F. Let us find r and s, r < s, such that (X_(r), X_(s)) is a 90 percent tolerance interval for 0.50 probability under F. We have

    0.90 = P{U ≥ 1/2} = ∑_{i=0}^{s−r−1} (5 choose i) (1/2)^5.

It follows that, if we choose s − r = 4, then γ = 0.81; and if we choose s − r = 5, then γ = 0.969. In this case, we must settle for an interval with tolerance coefficient 0.969, exceeding the desired value 0.90.
In general, given p, 0 < p < 1, it is possible to choose a sufficiently large sample of size n and a corresponding value of s − r such that with probability ≥ γ an interval of the form (X_(r), X_(s)) covers at least 100p percent of the distribution. If s − r is specified as a function of n, one chooses the smallest sample size n.
Example 3. Let p = 3/4 and γ = 0.75. Suppose that we want to choose the smallest sample size required such that (X_(2), X_(n)) covers at least 75 percent of the distribution. Thus we want the smallest n to satisfy

    0.75 ≤ ∑_{i=0}^{n−3} (n choose i) (3/4)^i (1/4)^{n−i}.

From Table ST1 of binomial distributions we see that n = 14.
We next consider the use of order statistics in constructing confidence intervals for population quantiles. Let X be an RV with a continuous DF F, 0 < p < 1. Then the quantile of order p satisfies

    F(z_p) = p.    (9)

Let X_1, X_2, ..., X_n be n independent observations on X. Then the number of X_i's < z_p is an RV that has a binomial distribution with parameters n and p. Similarly, the number of X_i's that are at least z_p has a binomial distribution with parameters n and 1 − p.
Let X_(1), X_(2), ..., X_(n) be the set of order statistics for the sample. Then

    P{X_(r) ≤ z_p} = P{at least r of the X_i's ≤ z_p} = ∑_{i=r}^{n} (n choose i) p^i (1 − p)^{n−i}.    (10)

Similarly,

    P{X_(s) ≥ z_p} = P{at least n − s + 1 of the X_i's ≥ z_p}
                   = P{at most s − 1 of the X_i's < z_p}
                   = ∑_{i=0}^{s−1} (n choose i) p^i (1 − p)^{n−i}.    (11)
It follows from (10) and (11) that

    P{X_(r) ≤ z_p ≤ X_(s)} = P{X_(s) ≥ z_p} − P{X_(r) > z_p}
                           = P{X_(r) ≤ z_p} − 1 + P{X_(s) ≥ z_p}
                           = ∑_{i=r}^{n} (n choose i) p^i (1−p)^{n−i} + ∑_{i=0}^{s−1} (n choose i) p^i (1−p)^{n−i} − 1
                           = ∑_{i=r}^{s−1} (n choose i) p^i (1−p)^{n−i}.    (12)

It is easy to determine a confidence interval for z_p from (12), once the confidence level is given. In practice, one determines r and s such that s − r is as small as possible, subject to the condition that the level is 1 − α.
Example 4. Suppose that we want a confidence interval for the median (p = 1/2), based on a sample of size 7, with confidence level 0.90. It suffices to find r and s, r < s, such that

    ∑_{i=r}^{s−1} (7 choose i) (1/2)^7 ≥ 0.90.

By trial and error, using the probability distribution b(7, 1/2), we see that we can choose s = 7, r = 2 or r = 1, s = 6; in either case s − r is minimum (= 5), and the confidence level is at least 0.92.

Example 5. Let us compute the number of observations required for (X_(1), X_(n)) to be a 0.95 level confidence interval for the median, that is, we want to find n such that

    P{X_(1) ≤ z_{1/2} ≤ X_(n)} ≥ 0.95.

It suffices to find n such that

    ∑_{i=1}^{n−1} (n choose i) (1/2)^n ≥ 0.95.

It follows from Table ST1 that n = 6.
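The search behind Examples 4 and 5 is a short loop over (12). A sketch, assuming SciPy; the helper name is ours:

    # Search for order-statistic indices (r, s) giving a distribution-free
    # confidence interval (X_(r), X_(s)) for z_p of level at least 1 - alpha, as in (12).
    from scipy.stats import binom

    def quantile_ci_indices(n, p, alpha):
        best = None
        for r in range(1, n + 1):
            for s in range(r + 1, n + 1):
                cover = binom.cdf(s - 1, n, p) - binom.cdf(r - 1, n, p)
                if cover >= 1 - alpha and (best is None or s - r < best[2]):
                    best = (r, s, s - r, cover)
        return best

    print(quantile_ci_indices(7, 0.5, 0.10))   # recovers s - r = 5, as in Example 4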
Finally we consider applications of order statistics to constructing confidence intervals for a location parameter. For this purpose we will use the method of test inversion discussed in Chapter 11. We first consider confidence estimation based on the sign test of location.
Let X_1, X_2, ..., X_n be a random sample from a symmetric, continuous DF F(x − θ) and suppose we wish to find a confidence interval for θ. Let R^+(X − θ_0) = # of X_i's > θ_0 be the sign-test statistic for testing H_0: θ = θ_0 against H_1: θ ≠ θ_0. Clearly, R^+(X − θ_0) ∼ b(n, 1/2) under H_0. The sign test rejects H_0 if

    min{R^+(X − θ_0), R^+(θ_0 − X)} ≤ c    (13)

for some integer c to be determined from the level of the test. Let r = c + 1. Then any value of θ is acceptable provided it is greater than the rth smallest observation and smaller than the rth largest observation, giving as confidence interval

    X_(r) < θ < X_(n+1−r).    (14)

If we want level 1 − α to be associated with (14), we choose c so that the level of test (13) is α.

Example 6. The following 12 observations come from a symmetric, continuous DF F(x − θ):

    −223, −380, −94, −179, 194, 25, −177, −274, −496, −507, −20, 122.

We wish to obtain a 95% confidence interval for θ. The sign test rejects H_0 if R^+(X) ≥ 9 or ≤ 2 at level 0.05. Thus

    P{3 < R^+(X − θ) < 10} = 1 − 2(0.0193) = 0.9614 ≥ 0.95.

It follows that a 95% confidence interval for θ is given by (X_(3), X_(10)) or (−380, 25).
We next consider the Wilcoxon signed-ranks test of H_0: θ = θ_0 to construct a confidence interval for θ. The test statistic in this case is T^+ = the sum of the ranks of the positive (X_i − θ_0)'s in the ordered |X_i − θ_0|'s. From (13.3.4)

    T^+ = ∑_{1≤i≤j≤n} I_{[X_i + X_j > 2θ_0]} = number of (X_i + X_j)/2 > θ_0.

Let T_{ij} = (X_i + X_j)/2, 1 ≤ i ≤ j ≤ n, and order the N = (n+1 choose 2) values T_{ij} in increasing order of magnitude:

    T_(1) < T_(2) < ··· < T_(N).

Then using the argument that converts (13) to (14) we see that a confidence interval for θ is given by

    T_(r) < θ < T_(N+1−r).    (15)

Critical values c are taken from Table ST10.

Example 7. For the data in Example 6, the Wilcoxon signed-rank test rejects H_0: θ = θ_0 at level 0.05 if T^+ > 64 or T^+ < 14. Thus

    P{14 ≤ T^+(X − θ_0) ≤ 64} ≥ 0.95.

It follows that a 95% confidence interval for θ is given by [T_(14), T_(64)] = [−336.5, −20].
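The end points in Example 7 are order statistics of the pairwise averages (often called Walsh averages), so they are easy to compute. A sketch, assuming NumPy:

    # Confidence interval for theta by inverting the Wilcoxon signed-rank test
    # (Example 7): order the N = n(n+1)/2 averages (X_i + X_j)/2, i <= j, and
    # read off the 14th and 64th ordered values given by Table ST10 for n = 12.
    import numpy as np

    x = np.array([-223, -380, -94, -179, 194, 25, -177, -274, -496, -507, -20, 122])
    n = len(x)

    walsh = sorted((x[i] + x[j]) / 2 for i in range(n) for j in range(i, n))
    print(len(walsh))            # N = 78
    print(walsh[13], walsh[63])  # T_(14) = -336.5 and T_(64) = -20.0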
PROBLEMS 13.6
1.Find the smallest values ofnsuch that the intervals (a)(X
(1),X
(n))and
(b)(X
(2),X
(n−1))contain the median with probability≥0.90.
2.Find the smallest sample size required such that(X
(1),X
(n))covers at least 90 percent
of the distribution with probability≥0.98.
3.Find the relation betweennandpsuch that(X
(1),X
(n))covers at least 100ppercent
of the distribution with probability≥1−p.
4.Givenγ,δ,p
0,p1withp 1>p0, find the smallestnsuch that
P{F(X
(s))−F(X
(r))≥p 0}≥γ
and
P{F(X
(s))−F(X
(r))≥p 1}≤δ.
Find alsos−r.
[Hint:Use the normal approximation to the binomial distribution.]
5.In Problem 4 find the smallestnand the associated value ofs−rifγ=0.95,δ=
0.10,p
1=0.75,p 0=0.50.
6.LetX
1,X2,...,X 7be a random sample from a continuous DFF. Compute:
(a)P(X
(1)<z.5<X
(7)).
(b)P(X
(2)<z.3<X
(5)).
(c)P(X
(3)<z.8<X
(6)).
7. Let X_1, X_2, ..., X_n be iid with common continuous DF F.
(a) What is the distribution of F(X_(n−1)) − F(X_(j)) + F(X_(i)) − F(X_(2)) for 2 ≤ i < j ≤ n − 1?
(b) What is the distribution of [F(X_(n)) − F(X_(2))]/[F(X_(n)) − F(X_(1))]?
13.7 ROBUSTNESS
Most of the statistical inference problems treated in this book are parametric in nature. We
have assumed that the functional form of the distribution being sampled is known except
for a finite number of parameters. It is to be expected that any estimator or test of hypothe-
sis concerning the unknown parameter constructed on this assumption will perform better
than the corresponding nonparametric procedure, provided that the underlying assump-
tions are satisfied. It is therefore of interest to know how well the parametric optimal tests
or estimators constructed for one population perform when the basic assumptions are mod-
ified. If we can construct tests or estimators that perform well for a variety of distributions,
for example, there would be little point in using the corresponding nonparametric method
unless the assumptions are seriously violated.
In practice, one makes many assumptions in parametric inference, and any one or all of these may be violated. Thus one seldom has accurate knowledge about the true underlying distribution. Similarly, the assumption of mutual independence or even of identical distribution may not hold. Any test or estimator that performs well under modifications of the underlying assumptions is usually referred to as robust.
In this section we will first consider the effect that slight variations in model assumptions have on some common parametric estimators and tests of hypotheses. Next we will consider some corresponding nonparametric competitors and show that they are quite robust.

13.7.1 Effect of Deviations from Model Assumptions on Some Parametric Procedures
Let us first consider the effect of contamination on the sample mean as an estimator of the population mean.
The most commonly used estimator of the population mean μ is the sample mean X̄. It has the property of unbiasedness for all populations with finite mean. For many parent populations (normal, Poisson, Bernoulli, gamma, etc.) it is a complete sufficient statistic and hence a UMVUE. Moreover, it is consistent and has an asymptotic normal distribution whenever the conditions of the central limit theorem are satisfied. Nevertheless, the sample mean is affected by extreme observations, and a single observation that is either too large or too small may make X̄ worthless as an estimator of μ. Suppose, for example, that X_1, X_2, ..., X_n is a sample from some normal population. Occasionally something happens to the system and a wild observation is obtained; that is, suppose one is sampling from N(μ, σ^2), say, 100α percent of the time and from N(μ, kσ^2), where k > 1, 100(1 − α)
percent of the time. Here both μ and σ^2 are unknown, and one wishes to estimate μ. In this case one is really sampling from the density function

f(x) = \alpha f_0(x) + (1 - \alpha) f_1(x),    (1)

where f_0 is the PDF of N(μ, σ^2), and f_1, the PDF of N(μ, kσ^2). Clearly,

\bar{X} = \frac{1}{n}\sum_{i=1}^{n} X_i    (2)
is still unbiased for μ. If α is nearly 1, there is no problem, since the underlying distribution is nearly N(μ, σ^2), and X̄ is nearly the UMVUE of μ with variance σ^2/n. If 1 − α is large (that is, not nearly 0), then, since one is sampling from f, the variance of X_1 is σ^2 with probability α and is kσ^2 with probability 1 − α, and we have

\mathrm{var}(\bar{X}) = \frac{1}{n}\,\mathrm{var}(X_1) = \frac{\sigma^2}{n}\,[\alpha + (1-\alpha)k].    (3)
If k(1 − α) is large, var(X̄) is large, and we see that even an occasional wild observation makes X̄ subject to a sizable error. The presence of an occasional observation from N(μ, kσ^2) is frequently referred to as contamination. The problem is that we do not know, in practice, the distribution of the wild observations and hence we do not know the PDF f.
It is known that the sample median is a much better estimator than the mean in the presence of extreme values. In the contamination model discussed above, if we use Z_{1/2}, the sample median of the X_i's, as an estimator of μ (which is the population median), then for large n

E(Z_{1/2} - \mu)^2 = \mathrm{var}(Z_{1/2}) \approx \frac{1}{4n}\,\frac{1}{[f(\mu)]^2}.    (4)
(See Theorem 7.5.2 and Remark 7.5.7.) Since

f(\mu) = \alpha f_0(\mu) + (1-\alpha) f_1(\mu) = \frac{\alpha}{\sigma\sqrt{2\pi}} + (1-\alpha)\frac{1}{\sigma\sqrt{2\pi k}} = \left(\alpha + \frac{1-\alpha}{\sqrt{k}}\right)\frac{1}{\sigma\sqrt{2\pi}},

we have

\mathrm{var}(Z_{1/2}) \approx \frac{\pi\sigma^2}{2n}\,\frac{1}{\{\alpha + [(1-\alpha)/\sqrt{k}]\}^2}.    (5)
As k → ∞, var(Z_{1/2}) ≈ πσ^2/(2nα^2). If there is no contamination, α = 1 and var(Z_{1/2}) ≈ πσ^2/2n. Also,

\frac{\pi\sigma^2/(2n\alpha^2)}{\pi\sigma^2/(2n)} = \frac{1}{\alpha^2},

which will be close to 1 if α is close to 1. Thus the estimator Z_{1/2} will not be greatly affected by how large k is, that is, by how wild the observations are. We have

\frac{\mathrm{var}(\bar{X})}{\mathrm{var}(Z_{1/2})} = \frac{2}{\pi}\,[\alpha + (1-\alpha)k]\left[\alpha + \frac{1-\alpha}{\sqrt{k}}\right]^2 \to \infty \quad \text{as } k \to \infty.

Indeed, var(X̄) → ∞ as k → ∞, whereas var(Z_{1/2}) → πσ^2/(2nα^2) as k → ∞. One can check that, when k = 9 and α ≈ 0.915, the two variances are (approximately) equal. As k becomes larger than 9 or α smaller than 0.915, Z_{1/2} becomes a better estimator of μ than X̄.
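The contamination model (1) is easy to simulate, and a rough Monte Carlo check of the mean-versus-median comparison can be run as below (our sketch; the sample size, replication count, and grid of α values are arbitrary choices). With k = 9 the two mean square errors cross near the α value quoted above, and for heavier contamination the median wins.

import random, statistics

def mse(estimator, alpha, k, mu=0.0, sigma=1.0, n=25, reps=20000, seed=1):
    rng = random.Random(seed)
    total = 0.0
    for _ in range(reps):
        # draw from N(mu, sigma^2) with probability alpha, else from N(mu, k*sigma^2)
        x = [rng.gauss(mu, sigma if rng.random() < alpha else sigma * k ** 0.5)
             for _ in range(n)]
        total += (estimator(x) - mu) ** 2
    return total / reps

for alpha in (1.0, 0.95, 0.90, 0.80):
    print(alpha,
          round(mse(statistics.fmean, alpha, k=9), 4),
          round(mse(statistics.median, alpha, k=9), 4))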
There are other flaws as well. Suppose, for example, that X_1, X_2, ..., X_n is a sample from U(0, θ), θ > 0. Then both X̄ and T(X) = (X_(1) + X_(n))/2, where X_(1) = min(X_1, ..., X_n) and X_(n) = max(X_1, ..., X_n), are unbiased for EX = θ/2. Also, var_θ(X̄) = var(X)/n = θ^2/(12n), and one can show that var(T) = θ^2/[2(n+1)(n+2)]. It follows that the efficiency of X̄ relative to that of T is

\mathrm{eff}_\theta(\bar{X}\,|\,T) = \frac{\mathrm{var}_\theta(T)}{\mathrm{var}_\theta(\bar{X})} = \frac{6n}{(n+1)(n+2)} < 1 \quad \text{if } n > 2.

In fact, eff_θ(X̄ | T) → 0 as n → ∞, so that in sampling from a uniform parent X̄ is much worse than T, even for moderately large values of n.
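A quick numerical check (ours) of this uniform-parent comparison; the variances printed should be close to θ^2/(12n) and θ^2/[2(n+1)(n+2)].

import random

def variances(n, theta=1.0, reps=50000, seed=2):
    rng = random.Random(seed)
    mean_sq, mid_sq = 0.0, 0.0
    for _ in range(reps):
        x = [rng.uniform(0, theta) for _ in range(n)]
        mean_sq += (sum(x) / n - theta / 2) ** 2            # sample mean
        mid_sq += ((min(x) + max(x)) / 2 - theta / 2) ** 2  # midrange T(X)
    return mean_sq / reps, mid_sq / reps

print(variances(10))   # roughly 0.0083 vs 0.0038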
Let us next turn our attention to the estimation of the standard deviation. Let X_1, X_2, ..., X_n be a sample from N(μ, σ^2). Then the MLE of σ is

\hat{\sigma} = \left[\frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{n}\right]^{1/2} = \left(\frac{n-1}{n}\right)^{1/2} S.    (6)

Note that the lower bound for the variance of any unbiased estimator for σ is σ^2/2n.
Although σ̂ is not unbiased, the estimator

S_1 = \sqrt{\frac{n}{2}}\,\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\,\hat{\sigma} = \sqrt{\frac{n-1}{2}}\,\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\,S    (7)

is unbiased for σ. Also,

\mathrm{var}(S_1) = \sigma^2\left\{\frac{n-1}{2}\left(\frac{\Gamma[(n-1)/2]}{\Gamma(n/2)}\right)^2 - 1\right\} = \frac{\sigma^2}{2n} + O\!\left(\frac{1}{n^2}\right).    (8)
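Since (7) and (8) involve only gamma functions, the unbiasing constant and var(S_1) are easy to tabulate; the short sketch below (ours) shows var(S_1)/σ^2 approaching the bound 1/(2n) as n grows.

import math

def c(n):
    # unbiasing constant in (7): S_1 = c(n) * S
    return math.sqrt((n - 1) / 2) * math.gamma((n - 1) / 2) / math.gamma(n / 2)

for n in (2, 3, 5, 10, 30):
    # columns: n, c(n), var(S_1)/sigma^2 from (8), and the bound 1/(2n)
    print(n, round(c(n), 4), round(c(n) ** 2 - 1, 4), round(1 / (2 * n), 4))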
Thus the efficiency of S_1 (relative to the estimator with least variance = σ^2/2n) is

\frac{\sigma^2/(2n)}{\mathrm{var}(S_1)} = \frac{1}{1 + O(1/n)} < 1

and → 1 as n → ∞. For small n, the efficiency of S_1 is considerably smaller than 1. Thus, for n = 2, eff(S_1) = 1/[2(π − 2)] = 0.438 and, for n = 3, eff(S_1) = π/[6(4 − π)] = 0.61.
Yet another estimator of σ is the sample mean deviation

S_2 = \frac{1}{n}\sum_{i=1}^{n}|X_i - \bar{X}|.    (9)

Note that

E\left[\sqrt{\frac{\pi}{2}}\,\frac{1}{n}\sum_{i=1}^{n}|X_i - \mu|\right] = \sqrt{\frac{\pi}{2}}\,E|X_i - \mu| = \sigma,

and

\mathrm{var}\left[\sqrt{\frac{\pi}{2}}\,\frac{1}{n}\sum_{i=1}^{n}|X_i - \mu|\right] = \frac{\pi - 2}{2n}\,\sigma^2.    (10)
If n is large enough so that X̄ ≈ μ, we see that S_3 = \sqrt{\pi/2}\,S_2 is nearly unbiased for σ with variance [(π − 2)/2n]σ^2. The efficiency of S_3 is

\frac{\sigma^2/(2n)}{\sigma^2[(\pi - 2)/(2n)]} = \frac{1}{\pi - 2} < 1.

For large n, the efficiency of S_1 relative to S_3 is

\frac{\mathrm{var}(S_3)}{\mathrm{var}(S_1)} = \frac{[(\pi - 2)/(2n)]\,\sigma^2}{\sigma^2/(2n) + O(1/n^2)} = \pi - 2 + O(1/n) > 1.
Now suppose that there is some contamination. As before, let us suppose that for a proportion α of the time we sample from N(μ, σ^2) and for a proportion 1 − α of the time we get a wild observation from N(μ, kσ^2), k > 1. Assuming that both μ and σ^2 are unknown, suppose that we wish to estimate σ. In the notation used above, let

f(x) = \alpha f_0(x) + (1-\alpha) f_1(x),
where f_0 is the PDF of N(μ, σ^2), and f_1, the PDF of N(μ, kσ^2). Let us see how even small contamination can make the maximum likelihood estimate σ̂ of σ quite useless.
If θ̂ is the MLE of θ, and ϕ is a function of θ, then ϕ(θ̂) is the MLE of ϕ(θ). In view of (7.5.7) we get

E(\hat\sigma - \sigma)^2 \approx \left(\frac{1}{2\sigma}\right)^2 E(\hat\sigma^2 - \sigma^2)^2.    (11)
Using Theorem 7.3.5, we see that

E(\hat\sigma^2 - \sigma^2)^2 \approx \frac{\mu_4 - \mu_2^2}{n}    (12)

(dropping the other two terms with n^2 and n^3 in the denominator), so that

E(\hat\sigma - \sigma)^2 \approx \frac{1}{(2\sigma)^2}\,\frac{\mu_4 - \mu_2^2}{n}.    (13)

For the density f, we see that

\mu_4 = 3\sigma^4[\alpha + k^2(1-\alpha)]    (14)

and

\mu_2 = \sigma^2[\alpha + k(1-\alpha)].    (15)

It follows that

E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{4n}\left\{3[\alpha + k^2(1-\alpha)] - [\alpha + k(1-\alpha)]^2\right\}.    (16)
If we are interested in the effect of very small contamination, α ≈ 1 and 1 − α ≈ 0. Assuming that k(1 − α) ≈ 0, we see that

E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{4n}\{3[1 + k^2(1-\alpha)] - 1\} = \frac{\sigma^2}{2n}\left[1 + \frac{3}{2}k^2(1-\alpha)\right].    (17)

In the normal case, μ_4 = 3σ^4 and μ_2^2 = σ^4, so that from (11)

E\{\hat\sigma - \sigma\}^2 \approx \frac{\sigma^2}{2n}.
Thus we see that the mean square error due to a small contamination is now multiplied by a factor [1 + (3/2)k^2(1 − α)]. If, for example, k = 10 and α = 0.99, then 1 + (3/2)k^2(1 − α) = 5/2. If k = 10 and α = 0.98, then 1 + (3/2)k^2(1 − α) = 4, and so on.
A quick comparison with S_3 shows that, although S_1 (or even σ̂) is a better estimator of σ than S_3 if there is no contamination, S_3 becomes a much better estimator in the presence of contamination as k becomes large.
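A small simulation (ours; the choices n = 20, k = 10, and the α values are arbitrary) illustrates this reversal by comparing the mean square errors of σ̂ from (6) and of S_3 under the contamination model.

import random, math

def mse_sigma(alpha, k, sigma=1.0, n=20, reps=20000, seed=3):
    rng = random.Random(seed)
    se_mle, se_s3 = 0.0, 0.0
    for _ in range(reps):
        x = [rng.gauss(0.0, sigma if rng.random() < alpha else sigma * math.sqrt(k))
             for _ in range(n)]
        xbar = sum(x) / n
        mle = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / n)             # sigma-hat of (6)
        s3 = math.sqrt(math.pi / 2) * sum(abs(xi - xbar) for xi in x) / n  # S_3
        se_mle += (mle - sigma) ** 2
        se_s3 += (s3 - sigma) ** 2
    return se_mle / reps, se_s3 / reps

print(mse_sigma(alpha=1.0, k=10))    # no contamination: the MLE does slightly better
print(mse_sigma(alpha=0.95, k=10))   # contaminated: S_3 is typically much less affected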
Next we consider the effect of deviations from model assumptions on tests of hypotheses. One of the most commonly used tests in statistics is Student's t-test for testing the mean of a normal population when the variance is unknown. Let X_1, X_2, ..., X_n be a sample from some population with mean μ and finite variance σ^2. As usual, let X̄ denote the sample mean, and S^2, the sample variance. If the population being sampled is normal, the t-test rejects H_0: μ = μ_0 against H_1: μ ≠ μ_0 at level α if |x̄ − μ_0| > t_{n−1,α/2}(s/\sqrt{n}). If n is large, we replace t_{n−1,α/2} by the corresponding critical value, z_{α/2}, under the standard normal law. If the sample does not come from a normal population, the statistic T = [(X̄ − μ_0)/S]\sqrt{n} is no longer distributed as a t(n − 1) statistic. If, however, n is sufficiently large, we know that T has an asymptotic normal distribution irrespective of the population being sampled, as long as it has a finite variance. Thus, for large n, the distribution of T is independent of the form of the population, and the t-test is stable.

The same considerations apply to testing the difference between two means when the two variances are equal. Although we assumed that n is sufficiently large for Slutsky's result (Theorem 7.2.15) to hold, empirical investigations have shown that the test based on Student's statistic is robust. Thus a significant value of t may not be interpreted to mean a departure from normality of the observations. Let us next consider the effect of departure from independence on the t-distribution. Suppose that the observations X_1, X_2, ..., X_n have a multivariate normal distribution with EX_i = μ, var(X_i) = σ^2, and ρ as the common correlation coefficient between any X_i and X_j, i ≠ j. Then

E\bar{X} = \mu \quad \text{and} \quad \mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}\,[1 + (n-1)\rho],    (18)
and since the X_i's are exchangeable it follows from Remark 6.3.1 that

ES^2 = \sigma^2(1 - \rho).    (19)

For large n, the statistic \sqrt{n}(\bar{X} - \mu_0)/S will be asymptotically distributed as N(0, 1 + nρ/(1 − ρ)), instead of N(0, 1). Under H_0, ρ = 0 and T^2 = n(\bar{X} - \mu_0)^2/S^2 is distributed as F(1, n − 1). Consider the ratio

\frac{nE(\bar{X} - \mu_0)^2}{ES^2} = \frac{\sigma^2[1 + (n-1)\rho]}{\sigma^2(1 - \rho)} = 1 + \frac{n\rho}{1 - \rho}.    (20)

The ratio equals 1 if ρ = 0 but is > 1 for ρ > 0 and → ∞ as ρ → 1. It follows that a large value of T is likely to occur when ρ > 0 and is large, even though μ_0 is the true value of the mean. Thus a significant value of t may be due to departure from independence, and the effect can be serious.
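The effect of positive correlation on the t-test is easy to demonstrate by simulation (our sketch, not the book's). Equicorrelated normal observations can be generated through a shared component, and the rejection rate of the nominal 5% two-sided test is then estimated; the critical value 2.093 = t_{19, 0.025} is taken from Table ST4.

import random, math

def reject_rate(rho, n=20, crit=2.093, reps=20000, seed=4):
    # X_i = sqrt(rho)*Z0 + sqrt(1-rho)*Z_i are equicorrelated N(0,1) with correlation rho
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        z0 = rng.gauss(0, 1)
        x = [math.sqrt(rho) * z0 + math.sqrt(1 - rho) * rng.gauss(0, 1) for _ in range(n)]
        xbar = sum(x) / n
        s = math.sqrt(sum((xi - xbar) ** 2 for xi in x) / (n - 1))
        rejections += abs(math.sqrt(n) * xbar / s) > crit
    return rejections / reps

for rho in (0.0, 0.1, 0.3):
    print(rho, reject_rate(rho))   # the nominal 5% level is badly exceeded for rho > 0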
Next, consider a test of the null hypothesis H_0: σ = σ_0 against H_1: σ ≠ σ_0. Under the usual normality assumptions on the observations X_1, X_2, ..., X_n, the test statistic used is

V = \frac{(n-1)S^2}{\sigma^2} = \frac{\sum_{i=1}^{n}(X_i - \bar{X})^2}{\sigma^2},    (21)

which has a χ^2(n − 1) distribution under H_0. The usual test is to reject H_0 if

V_0 = \frac{(n-1)S^2}{\sigma_0^2} > \chi^2_{n-1,\alpha/2} \quad \text{or} \quad V_0 < \chi^2_{n-1,1-\alpha/2}.    (22)
Let us suppose that X_1, X_2, ..., X_n are not normal. It follows from Corollary 2 of Theorem 7.3.4 that

\mathrm{var}(S^2) = \frac{\mu_4}{n} + \frac{3-n}{n(n-1)}\,\mu_2^2,    (23)

so that

\mathrm{var}\!\left(\frac{S^2}{\sigma^2}\right) = \frac{1}{n}\,\frac{\mu_4}{\sigma^4} + \frac{3-n}{n(n-1)}.    (24)

Writing γ_2 = (μ_4/σ^4) − 3, we have

\mathrm{var}\!\left(\frac{S^2}{\sigma^2}\right) = \frac{\gamma_2}{n} + \frac{2}{n-1}    (25)

when the X_i's are not normal, and

\mathrm{var}\!\left(\frac{S^2}{\sigma^2}\right) = \frac{2}{n-1}    (26)
when the X_i's are normal (γ_2 = 0). Now (n − 1)S^2 = \sum_{i=1}^{n}(X_i - \bar{X})^2 is the sum of n identically distributed but dependent RVs (X_j − X̄)^2, j = 1, 2, ..., n. Using a version of the central limit theorem for dependent RVs (see, e.g., Cramér [17, p. 365]), it follows that

\left(\frac{2}{n-1}\right)^{-1/2}\left(\frac{S^2}{\sigma^2} - 1\right),

under H_0, is asymptotically N(0, 1 + (γ_2/2)), and not N(0, 1) as under the normal theory. As a result the size of the test based on the statistic V_0 will be different from the stated level of significance if γ_2 differs greatly from 0. It is clear that the effect of violation of the normality assumption can be quite serious for inferences about variances, and the chi-square test is not robust.
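The non-robustness of the chi-square test for a variance can be seen the same way (our sketch, not the book's). The critical values 10.851 and 31.410 are the χ^2_{20} points for a two-sided level-0.10 test read from Table ST3; a double exponential parent has γ_2 = 3.

import random

def size_of_chisq_test(sampler, var, n=21, reps=20000, seed=5):
    # two-sided level-0.10 test of H0: sigma^2 = var with n - 1 = 20 d.f. (Table ST3)
    lo, hi = 10.851, 31.410
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        x = [sampler(rng) for _ in range(n)]
        xbar = sum(x) / n
        v0 = sum((xi - xbar) ** 2 for xi in x) / var
        rejections += v0 < lo or v0 > hi
    return rejections / reps

normal = lambda rng: rng.gauss(0, 1)                           # gamma_2 = 0
laplace = lambda rng: rng.expovariate(1) - rng.expovariate(1)  # variance 2, gamma_2 = 3

print(size_of_chisq_test(normal, 1.0))   # close to the nominal 0.10
print(size_of_chisq_test(laplace, 2.0))  # noticeably larger than 0.10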
In the above discussion we have used somewhat crude calculations to investigate the
behavior of the most commonly used estimators and test statistics when one or more of
the underlying assumptions are violated. Our purpose here was to indicate that some tests
or estimators are robust whereas others are not. The moral is clear: One should check
carefully to see that the underlying assumptions are satisfied before using parametric
procedures.
13.7.2 Some Robust Procedures
Let X_1, X_2, ..., X_n be a random sample from a continuous PDF f(x − θ), θ ∈ R, and assume that f is symmetric about θ. We shall be interested in estimation or tests of hypotheses concerning θ. Our objective is to find procedures that perform well for several different types of distributions but do not have to be optimal for any particular distribution. We will call such procedures robust. We first consider estimation of θ.
The estimators fall under one of the following three types:

1. Estimators that are functions of R = (R_1, R_2, ..., R_n), where R_j is the rank of X_j, are known as R-estimators. Hodges and Lehmann [44] devised a method of deriving such estimators from rank tests. These include the sample median X̃ (based on the sign test) and W = med{(X_i + X_j)/2, 1 ≤ i ≤ j ≤ n} based on the Wilcoxon signed-rank test.

2. Estimators of the form \sum_{i=1}^{n} a_i X_{(i)} are called L-estimators, being linear combinations of order statistics. This class includes the median, the mean, and the trimmed mean obtained by dropping a prespecified proportion of extreme observations.

3. Maximum likelihood type estimators obtained as solutions of the equations \sum_{j=1}^{n} \psi(X_j - \theta) = 0 are called M-estimators. The function ψ(t) = −f′(t)/f(t) gives MLEs.
Definition 1. Let k = [nα] be the largest integer ≤ nα, where 0 < α < 1/2. Then the estimator

\bar{X}_\alpha = \frac{\sum_{j=k+1}^{n-k} X_{(j)}}{n - 2k}    (27)

is called a trimmed mean.

Two extreme examples of trimmed means are the sample mean X̄ (α = 0) and the median X̃, when all except the central (n odd) or the two central (n even) observations are excluded.
Example 1. Consider the following sample of size 15 taken from a symmetric distribution:

0.97 0.66 0.73 0.78 1.30 0.58 0.79 0.94
0.52 0.52 0.83 1.25 1.47 0.96 0.71

Suppose α = 0.10. Then k = [nα] = 1 and

\bar{x}_{0.10} = \frac{\sum_{j=2}^{14} x_{(j)}}{15 - 2} = 0.85.

Here x̄ = 0.867 and med_{1≤j≤15} x_j = x_(8) = 0.79.
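For Example 1 the computations are immediate; a short sketch (ours) follows.

import statistics

data = [0.97, 0.66, 0.73, 0.78, 1.30, 0.58, 0.79, 0.94,
        0.52, 0.52, 0.83, 1.25, 1.47, 0.96, 0.71]

def trimmed_mean(x, alpha):
    k = int(len(x) * alpha)          # k = [n*alpha]
    xs = sorted(x)[k:len(x) - k]     # drop the k smallest and k largest observations
    return sum(xs) / len(xs)

print(round(trimmed_mean(data, 0.10), 3))   # 0.848 (the 0.85 of Example 1)
print(round(statistics.fmean(data), 3))     # 0.867
print(statistics.median(data))              # 0.79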
We will limit this discussion to four estimators of location, namely, the sample median, the trimmed mean, the sample mean, and the Hodges–Lehmann type estimator based on the Wilcoxon signed-rank test. In order to compare the performance of two procedures A and B we will use a (large-sample) measure of relative efficiency due to Pitman. Pitman's asymptotic relative efficiency (ARE) of procedure B relative to procedure A is the limit of the ratio of sample sizes n_A/n_B, where n_A, n_B are the sample sizes needed for procedures A and B to perform equivalently with respect to a specified criterion. For example, suppose {T_{n(A)}} and {T_{n(B)}} are two sequences of estimators for ψ(θ) such that

T_{n(A)} \sim AN\!\left(\psi(\theta), \frac{\sigma_A^2(\theta)}{n(A)}\right) \quad \text{and} \quad T_{n(B)} \sim AN\!\left(\psi(\theta), \frac{\sigma_B^2(\theta)}{n(B)}\right).

Suppose further that A and B perform equivalently if their asymptotic variances are the same, that is,

\frac{\sigma_A^2(\theta)}{n(A)} = \frac{\sigma_B^2(\theta)}{n(B)}.

Then

\frac{n(A)}{n(B)} \longrightarrow \frac{\sigma_A^2(\theta)}{\sigma_B^2(\theta)}.

Clearly, different performance measures may lead to different measures of ARE.
Similarly, if procedures A and B lead to two sequences of tests, then the ARE is the limiting ratio of the sample sizes needed by the tests to reach a certain power β_0 against the same alternative and at the same limiting level α.
Accordingly, let e(B, A) denote the ARE of B relative to A. If e(B, A) = 1/2, say, then procedure A requires (approximately) half as many observations as procedure B. We will write e_F(B, A), whenever necessary, to indicate the dependence of the ARE on the underlying DF F.
For a detailed discussion of Pitman efficiency we refer to Lehmann [61, pp. 371–380], Lehmann [63, section 5.2], Serfling [102, chapter 10], Randles and Wolfe [85, chapter 5], and Zacks [121]. The expressions for the AREs of the median and the Hodges–Lehmann estimators of the location parameter θ with respect to the sample mean X̄ are

e_F(\tilde{X}, \bar{X}) = 4\sigma_F^2\,f^2(0),    (28)

e_F(W, \bar{X}) = 12\sigma_F^2\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2,    (29)
where f is the PDF corresponding to F. In order to get e_F(X̃, W) we use the fact that

e_F(\tilde{X}, W) = \frac{e_F(\tilde{X}, \bar{X})}{e_F(W, \bar{X})} = \frac{f^2(0)}{3\left[\int_{-\infty}^{\infty} f^2(x)\,dx\right]^2}.    (30)
Bickel [5] showed that

e_F(\bar{X}_\alpha, \bar{X}) = \frac{\sigma_F^2}{\sigma_\alpha^2},    (31)

where

\sigma_\alpha^2 = \frac{2}{(1-2\alpha)^2}\left[\int_0^{z_{1-\alpha}} t^2 f(t)\,dt + \alpha z_{1-\alpha}^2\right]    (32)

and z_α is the unique αth percentile of F. It is clear from (32) that no closed-form expression for e_F(X̄_α, X̄) is possible for most DFs F.
In the following table we give the AREs for some selected F.

ARE Computations for Selected F

F                                              e(X̃, X̄)         e(W, X̄)        e(X̃, W)
U(−1/2, 1/2)                                   1/3               1               1/3
N(0, 1)                                        2/π = 0.637       3/π = 0.955     2/3
Logistic, f(x) = e^{−x}(1 + e^{−x})^{−2}       π^2/12 = 0.822    1.10            0.748
Double exponential, f(x) = (1/2)exp(−|x|)      2                 1.5             4/3
C(0, 1)                                        ∞                 ∞               4/3

It can be shown that e_F(X̃, X̄) ≥ 1/3 for all symmetric F, so X̃ is quite inefficient compared to X̄ for U(−1/2, 1/2). Even for normal f, X̃ would require 157 observations to achieve the same accuracy that X̄ achieves with 100 observations. For heavier-tailed distributions, however, X̃ provides more protection than X̄.
The values of e(W, X̄), on the other hand, are quite high for most F and, in fact, e_F(W, X̄) ≥ 0.864 for all symmetric F. Even for normal F one loses little (4.5%) in using W instead of X̄. Thus W is more robust as an estimator of θ.
A look at the values of e(X̃, W) shows that X̃ is worse than W for distributions with light tails but does slightly better than W for heavier-tailed F.
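Formulas (28)-(30) can also be checked numerically (our sketch, using crude midpoint integration); the printed values reproduce the normal and double exponential rows of the table above.

import math

def are_values(f, sigma2, lo=-40.0, hi=40.0, steps=400000):
    # numerical integration of int f^2(x) dx, then (28)-(30)
    h = (hi - lo) / steps
    int_f2 = sum(f(lo + (i + 0.5) * h) ** 2 for i in range(steps)) * h
    e_med_mean = 4 * sigma2 * f(0.0) ** 2          # (28)
    e_w_mean = 12 * sigma2 * int_f2 ** 2           # (29)
    return e_med_mean, e_w_mean, e_med_mean / e_w_mean   # (30)

normal = lambda x: math.exp(-x * x / 2) / math.sqrt(2 * math.pi)
double_exp = lambda x: 0.5 * math.exp(-abs(x))     # variance 2

print([round(v, 3) for v in are_values(normal, 1.0)])       # [0.637, 0.955, 0.667]
print([round(v, 3) for v in are_values(double_exp, 2.0)])   # [2.0, 1.5, 1.333]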
Let us now compare the AREs of X̄_α, X̄, and W. The following AREs for selected α are due to Bickel [5].

ARE Comparisons

                              α = 0.01                          α = 0.05
F                      e(X̄_α, X̄)    e(W, X̄_α)         e(X̄_α, X̄)    e(W, X̄_α)
Uniform                0.96           1.04               0.83           1.20
Normal                 0.995          0.96               0.97           0.985
Double exponential     1.06           1.41               1.21           1.24
Cauchy                 ∞              6.72               ∞              2.67

We note that X̄_α performs quite well compared to X̄. In fact, for the normal distribution the efficiency is quite close to 1, so there is little loss in using X̄_α. For heavier-tailed distributions X̄_α is preferable. For small values of α, it should be noted that X̄_α does not differ much from X̄. Nevertheless, X̄_α is more robust; it cannot do much worse than X̄ but can do much better. Compared to the Hodges–Lehmann estimator, X̄_α does not perform as well. The latter (W) provides better protection against outliers (heavy tails) and gives up little in the normal case.
Finally we consider testing H_0: θ = θ_0 against H_1: θ > θ_0. Recall that X_1, X_2, ..., X_n are iid with common continuous symmetric DF F(x − θ), θ ∈ R, and PDF f(x − θ). Suppose σ_F^2 = var(X_1) < ∞. Let S denote the sign test based on the statistic R^+(X) = \sum_{i=1}^{n} I_{[X_i > \theta_0]}, W the Wilcoxon signed-rank test based on the statistic T^+(X) = \sum_{1 \le i \le j \le n} I_{[X_i + X_j > 2\theta_0]}, M the test based on the Z-statistic Z = \sqrt{n}(\bar{X} - \theta_0)/\sigma_F, and t the Student's t-test based on the statistic \sqrt{n}(\bar{X} - \theta_0)/S, where S^2 is the sample variance.
First note that e(t, M) = 1. Next we note that e_F(S, t) = e_F(X̃, X̄) and e_F(W, t) = e_F(W, X̄), so the AREs are the same as given in (28), (29), and (30), and the values of ARE given in the table for various F remain the same for the corresponding tests.
Similar remarks apply as in the case of estimation of θ. The sign test is not as efficient as the Wilcoxon signed-rank test. But for heavier-tailed distributions such as the Cauchy and the double exponential the sign test does better than the Wilcoxon signed-rank test.
PROBLEMS 13.7

1. Let (X_1, X_2, ..., X_n) be jointly normal with EX_i = μ, var(X_i) = σ^2, and cov(X_i, X_j) = ρσ^2 if |i − j| = 1, i ≠ j, and = 0 otherwise.
(a) Show that

\mathrm{var}(\bar{X}) = \frac{\sigma^2}{n}\left[1 + 2\rho\left(1 - \frac{1}{n}\right)\right]

and

E(S^2) = \sigma^2\left(1 - \frac{2\rho}{n}\right).

(b) Show that the t-statistic \sqrt{n}(\bar{X} - \mu)/S is asymptotically normally distributed with mean 0 and variance 1 + 2ρ. Conclude that the significance of t is overestimated for positive values of ρ and underestimated for ρ < 0 in large samples.
(c) For finite n, consider the statistic

T^2 = \frac{n(\bar{X} - \mu)^2}{S^2}.

Compare the expected values of the numerator and the denominator of T^2 and study the effect of ρ ≠ 0 on the interpretation of significant t values (Scheffé [101, p. 338]).
2. Let X_1, X_2, ..., X_n be a random sample from G(α, β), α > 0, β > 0:
(a) Show that μ_4 = 3α(α + 2)/β^4.
(b) Show that

\mathrm{var}\!\left[(n-1)\frac{S^2}{\sigma^2}\right] \approx (n-1)\left(2 + \frac{6}{\alpha}\right).

(c) Show that the large-sample distribution of (n − 1)S^2/σ^2 is normal.
(d) Compare the large-sample test of H_0: σ = σ_0 based on the asymptotic normality of (n − 1)S^2/σ^2 with the large-sample test based on the same statistic when the observations are taken from a normal population. In particular, take α = 2.

3. Let X_1, X_2, ..., X_m and Y_1, Y_2, ..., Y_n be two independent random samples from populations with means μ_1 and μ_2, and variances σ_1^2 and σ_2^2, respectively. Let X̄, Ȳ be the two sample means, and S_1^2, S_2^2 be the two sample variances. Write N = m + n, R = m/n, and θ = σ_1^2/σ_2^2. The usual normal-theory test of H_0: μ_1 − μ_2 = δ_0 is the t-test based on the statistic

T = \frac{\bar{X} - \bar{Y} - \delta_0}{S_p(1/m + 1/n)^{1/2}},

where

S_p^2 = \frac{(m-1)S_1^2 + (n-1)S_2^2}{m + n - 2}.

Under H_0, the statistic T has a t-distribution with N − 2 d.f., provided that σ_1^2 = σ_2^2. Show that the asymptotic distribution of T in the nonnormal case is N(0, (θ + R)(1 + Rθ)^{−1}) for large m and n. Thus, if R = 1, T is asymptotically N(0, 1) as in the normal-theory case assuming equal variances, even though the two samples come from nonnormal populations with unequal variances. Conclude that the test is robust in the case of large, equal sample sizes (Scheffé [101, p. 339]).
4. Verify the computations in the table above using the expressions for ARE in (28), (29), and (30).

5. Suppose F is a G(α, β) RV. Show that

e(W, \bar{X}) = \frac{3\alpha\,\Gamma^2(2\alpha)}{2^{4(\alpha-1)}(2\alpha - 1)^2\{\Gamma(\alpha)\}^4}.

(Note that F is not symmetric.)

6. Suppose F has PDF

f(x) = \frac{\Gamma(m)}{\Gamma(1/2)\,\Gamma(m - 1/2)\,(1 + x^2)^m}, \quad -\infty < x < \infty,

for m ≥ 1. Compute e(X̃, X̄), e(W, X̄), and e(X̃, W). (From Problem 3.2.3, E|X|^k < ∞ if k < 2m − 1.)

FREQUENTLY USED SYMBOLS
AND ABBREVIATIONS
⇒ implies
⇔ implies and is implied by
→ converges to
↑,↓ increasing, decreasing
χα,χλ nonincreasing, nondecreasing
Γ(x) gamma function
lim sup, lim inf, lim   limit superior, limit inferior, limit
R, R_n   real line, n-dimensional Euclidean space
B, B_n   Borel σ-field on R, Borel σ-field on R_n
I_A   indicator function of set A
ε(x)   = 1 if x ≥ 0, and = 0 if x < 0
μ   EX, expected value
m_n   EX^n, n ≥ 0 integral
β_α   E|X|^α, α > 0
μ_k   E(X − EX)^k, k ≥ 0 integral
σ^2   = μ_2, variance
f′, f″, f‴   first, second, third derivative of f
∼ distributed as
≈ asymptotically (or approximately) equal to

→^L   convergence in law
→^P   convergence in probability
→^{a.s.}   convergence almost surely
→^r   convergence in rth mean
RV random variable
DF distribution function
PDF probability density function
PMF probability mass function
PGF probability generating function
MGF moment generating function
d.f. degrees of freedom
BLUE best linear unbiased estimate
MLE maximum likelihood estimate
MVUE minimum variance unbiased estimate
UMA uniformly most accurate
UMVUE uniformly minimum variance unbiased estimate
UMAU uniformly most accurate unbiased
MP most powerful
UMP uniformly most powerful
GLM general linear model
i.o. infinitely often
iid independent, identically distributed
SD standard deviation
SE standard error
MLR monotone likelihood ratio
MSE mean square error
WLLN weak law of large numbers
SLLN strong law of large numbers
CLT central limit theorem
SPRT sequential probability ratio test
b(1,p) Bernoulli with parameterp
b(n,p) binomial with parametersn,p
NB(r;p) negative binomial with parametersr,p
P(λ) Poisson with parameterλ
U[a,b] uniform on[a,b]
G(α,β) gamma with parametersα,β
B(α,β) beta with parametersα,β
χ^2(n)   chi-square with d.f. n
C(μ,θ) Cauchy with parametersμ,θ

N(μ, σ^2)   normal with mean μ, variance σ^2
t(n)   Student's t with n d.f.
F(m, n)   F-distribution with (m, n) d.f.
z_α   100(1 − α)th percentile of N(0, 1)
χ^2_{n,α}   100(1 − α)th percentile of χ^2(n)
t_{n,α}   100(1 − α)th percentile of t(n)
F_{m,n,α}   100(1 − α)th percentile of F(m, n)
AN(μ_n, σ^2_n)   asymptotically normal
GLR generalized likelihood ratio
MRE minimum risk equivariant
ln x   logarithm (to base e) of x
exp(X) exponential
LMP locally most powerful
L(X)   law or distribution of RV X
b(δ,.) bias in estimatorδ

REFERENCES
1. A. Agresti,Categorical Data Analysis, 3rd ed., Wiley, New York, 2012.
2. T. W. Anderson, Maximum likelihood estimates for a multivariate normal distribution when
some observations are missing,J. Am. Stat. Assoc.52 (1957), 200–203.
3. J. D. Auble, Extended tables for the Mann–Whitney statistic,Bull.Inst.Educ.Res.1 (1953),
No. i–iii, 1–39.
4. D. Bernstein, Sur une propriété charactéristique de la loi de Gauss,Trans. Leningrad Polytech.
Inst.3 (1941), 21–22.
5. P. J. Bickel, On some robust estimators of location,Ann. Math. Stat.36 (1965), 847–858.
6. P. Billingsley,Probability and Measure,2nded., Wiley, New York, 1986.
7. D. Birkes, Generalized likelihood ratio tests and uniformly most powerful tests,Am. Stat.44
(1990), 163–166.
8. Z. W. Birnbaum and F. H. Tingey, One-sided confidence contours for probability distribution
functions,Ann. Math. Stat.22 (1951), 592–596.
9. Z. W. Birnbaum, Numerical tabulation of the distribution of Kolmogorov’s statistic for finite
sample size, J. Am. Stat. Assoc. 47 (1952), 425–441.
10. D. Blackwell, Conditional expectation and unbiased sequential estimation,Ann. Math. Stat.
18 (1947), 105–110.
11. J. Boas, A note on the estimation of the covariance between two random variables using extra
information on the separate variables,Stat. Neerl.21 (1967), 291–292.
12. D. G. Chapman and H. Robbins, Minimum variance estimation without regularity assump-
tions,Ann. Math. Stat.22 (1951), 581–586.
13. S. D. Chatterji, Some elementary characterizations of the Poisson distribution,Am. Math. Mon.
70 (1963), 958–964.

14. K. L. Chung and P. Erdös, On the application of the Borel–Cantelli lemma,Trans. Am. Math.
Soc.72 (1952), 179–186.
15. K. L. Chung,A Course in Probability Theory, Harcourt, Brace & World, New York, 1968.
16. H. Cramér, Über eine Eigenschaft der normalen Verteilungsfunktion,Math. Z.41 (1936),
405–414.
17. H. Cramér,Mathematical Methods of Statistics, Princeton University Press, Princeton, N.J.,
1946.
18. H. Cramér, A contribution to the theory of statistical estimation,Skand. Aktuarietidskr.29
(1946), 85–94.
19. J. H. Curtiss, A note on the theory of moment generating functions,Ann. Math. Stat.13 (1942),
430–433.
20. D. A. Darmois, Sur diverses propriétés charactéristique de la loi de probabilité de Laplace-
Gauss,Bull. Int. Stat. Inst.23 (1951), part II, 79–82.
21. M. M. Desu, Optimal confidence intervals of fixed width,Am. Stat.25 (1971), No. 2, 27–29.
22. B. Efron, Bootstrap methods: Another look at the jackknife.Ann. Stat. 7 (1979), 1–26.
23. B. Efron and R. J. Tibshirani,An Introduction to Bootstrap, Chapman Hall, New York, 1993.
24. W. Feller, Über den zentralen Grenzwertsatz der Wahrscheinlichkeitsrechnung, Math. Z. 40
(1935), 521–559; 42 (1937), 301–312.
25. W. Feller,An Introduction to Probability Theory and Its Applications,Vol.1,3rded.,Wiley,
New York, 1968.
26. W. Feller,An Introduction to Probability Theory and Its Applications, Vol. 2, 2nd ed., Wiley,
New York, 1971.
27. K. K. Ferentinos, Shortest confidence intervals for families of distributions involving trunca-
tion parameters,Am. Stat.44 (1990), 40–41.
28. T. S. Ferguson,Mathematical Statistics, Academic Press, New York, 1967.
29. T. S. Ferguson,A Course in Large Sample Theory, Chapman & Hall, London, 1996.
30. R. A. Fisher, On the mathematical foundations of theoretical statistics,Phil. Trans. R. Soc.
A222 (1922), 309–386.
31. M. Fisz,Probability Theory and Mathematical Statistics, 3rd ed., Wiley, 1963.
32. D. A. S. Fraser,Nonparametric Methods in Statistics, Wiley, New York, 1965.
33. D. A. S. Fraser,The Structure of Inference, Wiley, New York, 1968.
34. M. Fréchet, Sur l’extension de certaines evaluations statistiques au cas de petits echantillons,
Rev.Inst.Int.Stat.11 (1943), 182–205.
35. J. D. Gibbons,Nonparametric Statistical Inference, Dekker, New York, 1985.
36. B. V. Gnedenko, Sur la distribution limite du terme maximum d’une série aléatoire,Ann. Math.
44 (1943), 423–453.
37. W. C. Guenther, Shortest confidence intervals,Am. Stat.23 (1969), No. 1, 22–25.
38. W. C. Guenther, Unbiased confidence intervals,Am. Stat.25 (1971), No. 1, 51–53.
39. E. J. Gumbel, Distributions à plusieurs variables dont les marges sont données,C. R. Acad.
Sci. Paris246 (1958), 2717–2720.
40. J. H. Hahn and W. Nelson, A problem in the statistical comparison of measuring devices,
Technometrics12 (1970), 95–102.
41. P. R. Halmos and L. J. Savage, Application of the Radon–Nikodym theorem to the theory of
sufficient statistics,Ann. Math. Stat.20 (1949), 225–241.
42. P. R. Halmos,Measure Theory, Van Nostrand, New York, 1950.

43. J. L. Hodges and E. L. Lehmann, Some problems in minimax point estimation,Ann. Math.
Stat.21 (1950), 182–197.
44. J. L. Hodges, Jr. and E. L. Lehmann, Estimates of location based on rank tests,Ann. Math.
Stat.34 (1963), 598–611.
45. W. Hoeffding, A class of statistics with asymptotically normal distribution,Ann. Math. Stat.
19 (1948), 293–325.
46. D. Hogben, The distribution of the sample variance from a two-point binomial population,
Am. Stat.22 (1968), No. 5, 30.
47. V. S. Huzurbazar, The likelihood equation consistency, and maxima of the likelihood function,
Ann. Eugen.(London) 14 (1948), 185–200.
48. M. Kac, Lectures in Probability, The Mathematical Association of America, Washington, D.C.,
1964–1965.
49. W. C. M. Kallenberg et al.,Testing Statistical Hypotheses: Worked Solutions,CWI,Amster-
dam, 1980.
50. J. F. Kemp, A maximal distribution with prescribed marginals,Am. Math. Mon.80 (1973), 83.
51. M. G. Kendall,Rank Correlation Methods, 3rd ed., Charles Griffin, London, 1962.
52. J. Kiefer, On minimum variance estimators,Ann. Math. Stat.23 (1952), 627–629.
53. A. N. Kolmogorov, Sulla determinazione empirica di una legge di distribuzione,G. Inst. Ital.
Attuari4 (1933), 83–91.
54. A. N. Kolmogorov and S. V. Fomin,Elements of the Theory of Functions and Functional
Analysis, Vol. 2, Graylock Press, Albany, N.Y., 1961.
55. C. H. Kraft and C. Van Eeden,A Nonparametric Introduction to Statistics, Macmillan,
New York, 1968.
56. W. Kruskal, Note on a note by C. S. Pillai,Am. Stat.22 (1968), No. 5, 24–25.
57. G. Kulldorf, On the condition for consistency and asymptotic efficiency of maximum likeli-
hood estimates,Skand. Aktuarietidskr.40 (1957), 129–144.
58. R. G. Laha and V. K. Rohatgi,Probability Theory, Wiley, New York, 1979.
59. J. Lamperti, Density of random variable,Am. Math. Mon.66 (1959), 317.
60. J. Lamperti and W. Kruskal, Solution by W. Kruskal to the problem “Poisson Distribution”
posed by J. Lamperti,Am. Math. Mon.67 (1960), 297–298.
61. E. L. Lehmann,Nonparametrics: Statistical Methods Based on Ranks, Holden-Day, San
Francisco, C.A., 1975.
62. E. L. Lehmann, An interpretation of completeness and Basu’s theorem,J. Am. Stat. Assoc.76
(1981), 335–340.
63. E. L. Lehmann,Theory of Point Estimation, Wiley, New York, 1983.
64. E. L. Lehmann,Testing Statistical Hypotheses, 2nd ed., Wiley, New York, 1986.
65. E. L. Lehmann and H. Scheffé, Completeness, similar regions, and unbiased estimation,
Sankhya, Ser. A, 10 (1950), 305–340.
66. M. Loève,Probability Theory, 4th ed., Springer-Verlag, New York, 1977.
67. E. Lukacs, A characterization of the normal distribution,Ann. Math. Stat.13 (1942), 91–93.
68. E. Lukacs, Characterization of populations by properties of suitable statistics,Proc. Third
Berkeley Symp.2 (1956), 195–214.
69. E. Lukacs,Characteristic Functions, 2nd ed., Hafner, New York, 1970.
70. E. Lukacs and R. G. Laha,Applications of Characteristic Functions, Hafner, New York,
1964.

71. H. B. Mann and D. R. Whitney, On a test whether one of two random variables is stochastically
larger than the other,Ann. Math. Stat.18 (1947), 50–60.
72. F. J. Massey, Distribution table for the deviation between two sample cumulatives,Ann. Math.
Stat.23 (1952), 435–441.
73. M. V. Menon, A characterization of the Cauchy distribution,Ann. Math. Stat.33 (1962),
1267–1271.
74. L. H. Miller, Table of percentage points of Kolmogorov statistics,J. Am. Stat. Assoc.51 (1956),
111–121.
75. M. G. Natrella,Experimental Statistics, Natl. Bur. Stand. Handb. 91, Washington, D.C., 1963.
76. J. Neyman and E. S. Pearson, On the problem of the most efficient tests of statistical
hypotheses,Phil. Trans. R. Soc.A231 (1933), 289–337.
77. J. Neyman and E. L. Scott, Consistent estimates based on partially consistent observations,
Econometrica16 (1948), 1–32.
78. E. H. Oliver, A maximum likelihood oddity,Am. Stat.26 (1972), No. 3, 43–44.
79. D. B. Owen,Handbook of Statistical Tables, Addison-Wesley, Reading, M.A., 1962.
80. E. J. G. Pitman and E. J. Williams, Cauchy-distributed functions of Cauchy variates,Ann.
Math. Stat.38 (1967), 916–918.
81. J. W. Pratt, Length of confidence intervals,J. Am. Stat. Assoc.56 (1961), 260–272.
82. B. J. Prochaska, A note on the relationship between the geometric and exponential distribu-
tions,Am. Stat.27 (1973), 27.
83. P. S. Puri, On a property of exponential and geometric distributions and its relevance to
multivariate failure rate,Sankhya, Ser. A, 35 (1973), 61–68.
84. D. A. Raikov, On the decomposition of Gauss and Poisson laws (in Russian),Izv. Akad. Nauk.
SSSR, Ser. Mat.2 (1938), 91–124.
85. R. R. Randles and D. A. Wolfe,Introduction to the Theory of Nonparametric Statistics,
Krieger, Melbourne, F.L., 1991.
86. C. R. Rao, Information and the accuracy attainable in the estimation of statistical parameters,
Bull. Calcutta Math. Soc.37 (1945), 81–91.
87. C. R. Rao, Sufficient statistics and minimum variance unbiased estimates,Proc. Cambridge
Phil. Soc.45 (1949) 213–218.
88. C. R. Rao,Linear Statistical Inference and Its Applications, 2nd ed., Wiley, New York, 1973.
89. S. C. Rastogi, Note on the distribution of a test statistic,Am. Stat.23 (1969), 40–41.
90. V. K. Rohatgi,Statistical Inference, Wiley, New York, 1984.
91. V. K. Rohatgi, On the moments ofF(X)whenFis discrete,J. Stat. Comp. Simul.29 (1988),
340–343.
92. V. I. Romanovsky, On the moments of the standard deviations and of the correlation coefficient
in samples from a normal population,Metron5 (1925), No. 4, 3–46.
93. L. Rosenberg, Nonnormality of linear combinations of normally distributed random variables,
Am. Math. Mon.72 (1965), 888–890.
94. J. Roy and S. Mitra, Unbiased minimum variance estimation in a class of discrete distributions,
Sankhy¯a18 (1957), 371–378.
95. R. Roy, Y. LePage, and M. Moore, On the power series expansion of the moment generating
function,Am. Stat.28 (1974), 58–59.
96. H. L. Royden,Real Analysis, 2nd ed., Macmillan, New York, 1968.
97. Y. D. Sabharwal, A sequence of symmetric Bernoulli trials,SIAM Rev.11 (1969), 406–409.

98. A. Sampson and B. Spencer, Sufficiency, minimal sufficiency, and lack thereof,Am. Stat.30
(1976) 34–35. Correction, 31 (1977), 54.
99. P. A. Samuelson, How deviant can you be?J. Am. Stat. Assoc.63 (1968), 1522–1525.
100. H. Scheffé, A useful convergence theorem for probability distributions,Ann. Math. Stat.18
(1947), 434–438.
101. H. Scheffé,The Analysis of Variance, Wiley, New York, 1961.
102. R. J. Serfling,Approximation Theorems of Mathematical Statistics, Wiley, New York, 1979.
103. D. N. Shanbhag and I. V. Basawa, On a characterization property of the multinomial
distribution,Ann. Math. Stat.42 (1971), 2200.
104. L. Shepp, Normal functions of normal random variables,SIAM Rev.4 (1962), 255–256.
105. A. E. Siegel, Film-mediated fantasy aggression and strength of aggression drive,Child Dev.
27 (1956), 365–378.
106. V. P. Skitovitch, Linear forms of independent random variables and the normal distribution
law,Izv. Akad. Nauk. SSSR. Ser. Mat.18 (1954), 185–200.
107. N. V. Smirnov, On the estimation of the discrepancy between empirical curves of distributions
for two independent samples (in Russian),Bull. Moscow Univ.2 (1939), 3–16.
108. N. V. Smirnov, Approximate laws of distribution of random variables from empirical data (in
Russian),Usp. Mat. Nauk.10 (1944), 179–206.
109. R. C. Srivastava, Two characterizations of the geometric distribution,J. Am. Stat. Assoc.69
(1974), 267–269.
110. S. M. Stigler, Completeness and unbiased estimation,Am. Stat.26 (1972), 28–29.
111. P. T. Strait, A note on the independence and conditional probabilities,Am. Stat.25 (1971),
No. 2, 17–18.
112. R. F. Tate and G. W. Klett, Optimum confidence intervals for the variance of a normal
distribution,J. Am. Stat. Assoc.54 (1959), 674–682.
113. W. A. Thompson, Jr.,Applied Probability, Holt, Rinehart and Winston, New York, 1969.
114. H. G. Tucker,A Graduate Course in Probability, Academic Press, New York, 1967.
115. A. Wald, Note on the consistency of the maximum likelihood estimate,Ann. Math. Stat.20
(1949), 595–601.
116. G. N. Watson,A Treatise on the Theory of Bessel Functions, 2nd ed., Cambridge University
Press, Cambridge, 1966.
117. D. V. Widder,Advanced Calculus, 2nd ed., Prentice-Hall, Englewood Cliffs, N.J., 1961.
118. S. S. Wilks,Mathematical Statistics, Wiley, New York, 1962.
119. J. Wishart, The generalized product-moment distribution in samples from a normal multivari-
ate population,Biometrika20A (1928), 32–52.
120. C. K. Wong, A note on mutually independent events.Am. Stat.26 (1972), 27.
121. S. Zacks,The Theory of Statistical Inference, Wiley, New York, 1971.
122. P. W. Zehna, Invariance of maximum likelihood estimation,Ann. Math. Stat.37 (1966), 755.

STATISTICAL TABLES
ST1 Cumulative Binomial Probabilities
ST2 Tail Probability Under Standard Normal Distribution
ST3 Critical Values Under Chi-Square Distribution
ST4 Student’st-Distribution
ST5F-Distribution: 5% and 1% Points for the Distribution ofF
ST6 Random Normal Numbers,μ=0 andσ=1
ST7 Critical Values of the Kolmogorov–Smirnov One-Sample Test Statistic
ST8 Critical Values of the Kolmogorov–Smirnov Test Statistics for Two Samples of
Equal Size
ST9 Critical Values of the Kolmogorov–Smirnov Test Statistics for Two Samples of
Unequal Size
ST10 Critical Values of the Wilcoxon Signed–Rank Test Statistic
ST11 Critical Values of the Mann–Whitney–Wilcoxon Test Statistic
ST12 Critical Points of Kendall’s Tau Statistics
ST13 Critical Values of Spearman’s Rank Correlation Statistic

Table ST1. Cumulative Binomial Probabilities, \sum_{x=0}^{r} \binom{n}{x} p^x (1-p)^{n-x}, r = 0, 1, 2, ..., n − 1

                                                  p
n  r   0.01    0.05    0.10    0.20    0.25    0.30    0.333   0.40    0.50
2 0 0.9801 0.9025 0.8100 0.6400 0.5625 0.4900 0.4444 0.3600 0.2500
1 0.9999 0.9975 0.9900 0.9600 0.9375 0.9100 0.8888 0.8400 0.7500
3 0 0.9703 0.8574 0.7290 0.5120 0.4219 0.3430 0.2963 0.2160 0.1250
1 0.9997 0.9928 0.9720 0.8960 0.8438 0.7840 0.7407 0.6480 0.5000
2 1.0000 0.9999 0.9990 0.9920 0.9844 0.9730 0.9629 0.9360 0.8750
4 0 0.9606 0.8145 0.6561 0.4096 0.3164 0.2401 0.1975 0.1296 0.0625
1 0.9994 0.9860 0.9477 0.8192 0.7383 0.6517 0.5926 0.4742 0.3125
2 1.0000 0.9995 0.9963 0.9728 0.9492 0.9163 0.8889 0.8198 0.6875
3 1.0000 0.9999 0.9984 0.9961 0.9919 0.9877 0.9734 0.9375
5 0 0.9510 0.7738 0.5905 0.3277 0.2373 0.1681 0.1317 0.0778 0.0312
1 0.9990 0.9774 0.9185 0.7373 0.6328 0.5283 0.4609 0.3370 0.1874
2 1.0000 0.9988 0.9914 0.9421 0.8965 0.8370 0.7901 0.6826 0.4999
3 0.9999 0.9995 0.9933 0.9844 0.9693 0.9547 0.9130 0.8124
4 1.0000 1.0000 0.9997 0.9990 0.9977 0.9959 0.9898 0.9686
6 0 0.9415 0.7351 0.5314 0.2621 0.1780 0.1176 0.0878 0.0467 0.0156
1 0.9986 0.9672 0.8857 0.6553 0.5340 0.4201 0.3512 0.2333 0.1094
2 1.0000 0.9977 0.9841 0.9011 0.8306 0.7442 0.6804 0.5443 0.3438
3 0.9998 0.9987 0.9830 0.9624 0.9294 0.8999 0.8208 0.6563
4 0.9999 0.9999 0.9984 0.9954 0.9889 0.9822 0.9590 0.8907
5 1.0000 1.0000 0.9999 0.9998 0.9991 0.9987 0.9959 0.9845
7 0 0.9321 0.6983 0.4783 0.2097 0.1335 0.0824 0.0585 0.0280 0.0078
1 0.9980 0.9556 0.6554 0.5767 0.4450 0.3294 0.2633 0.1586 0.0625
2 1.0000 0.9962 0.8503 0.8520 0.7565 0.6471 0.5706 0.4199 0.2266
3 0.9998 0.9743 0.9667 0.9295 0.8740 0.8267 0.7102 0.5000
4 1.0000 0.9973 0.9953 0.9872 0.9712 0.9547 0.9037 0.7734
5 0.9998 0.9996 0.9987 0.9962 0.9931 0.9812 0.9375
6 1.0000 1.0000 0.9999 0.9998 0.9995 0.9984 0.9922
8 0 0.9227 0.6634 0.4305 0.1678 0.1001 0.0576 0.0390 0.0168 0.0039
1 0.9973 0.9427 0.8131 0.5033 0.3671 0.2553 0.1951 0.1064 0.0352
2 0.9999 0.9942 0.9619 0.7969 0.6786 0.5518 0.4682 0.3154 0.1445
3 1.0000 0.9996 0.9950 0.9437 0.8862 0.8059 0.7413 0.5941 0.3633
4 1.0000 0.9996 0.9896 0.9727 0.9420 0.9120 0.8263 0.6367
5 1.0000 0.9988 0.9958 0.9887 0.9803 0.9502 0.8555
6 1.0000 0.9996 0.9987 0.9974 0.9915 0.9648
7 1.0000 0.9999 0.9998 0.9993 0.9961
9 0 0.9135 0.6302 0.3874 0.1342 0.0751 0.0404 0.0260 0.0101 0.0020
1 0.9965 0.9287 0.7748 0.4362 0.3004 0.1960 0.1431 0.0706 0.0196
2 0.9999 0.9916 0.9470 0.7382 0.6007 0.4628 0.3772 0.2318 0.0899
3 1.0000 0.9993 0.9916 0.9144 0.8343 0.7296 0.6503 0.4826 0.2540
4 0.9999 0.9990 0.9805 0.9511 0.9011 0.8551 0.7334 0.5001

5 1.0000 0.9998 0.9970 0.9900 0.9746 0.9575 0.9006 0.7462
6 0.9999 0.9998 0.9987 0.9956 0.9916 0.9749 0.9103
7 1.0000 1.0000 0.9999 0.9995 0.9989 0.9961 0.9806
8 1.0000 0.9999 0.9998 0.9996 0.9982
10 0 0.9044 0.5987 0.3487 0.1074 0.0563 0.0282 0.0173 0.0060 0.0010
1 0.9958 0.9138 0.7361 0.3758 0.2440 0.1493 0.1040 0.0463 0.0108
2 1.0000 0.9884 0.9298 0.6778 0.5256 0.3828 0.2991 0.1672 0.0547
3 0.9989 0.9872 0.8791 0.7759 0.6496 0.5592 0.3812 0.1719
4 0.9999 0.9984 0.9672 0.9219 0.8497 0.7868 0.6320 0.3770
5 1.0000 0.9999 0.9936 0.9803 0.9526 0.9234 0.8327 0.6231
6 1.0000 0.9991 0.9965 0.9894 0.9803 0.9442 0.8282
7 0.9999 0.9996 0.9984 0.9966 0.9867 0.9454
8 1.0000 1.0000 0.9998 0.9996 0.9973 0.9893
9 1.0000 0.9999 0.9999 0.9991
11 0 0.8954 0.5688 0.3138 0.0859 0.0422 0.0198 0.0116 0.0036 0.0005
1 0.9948 0.8981 0.6974 0.3221 0.1971 0.1130 0.0752 0.0320 0.0059
2 0.9998 0.9848 0.9104 0.6174 0.4552 0.3128 0.2341 0.1189 0.0327
3 1.0000 0.9984 0.9815 0.8389 0.7133 0.5696 0.4726 0.2963 0.1133
4 0.9999 0.9972 0.9496 0.8854 0.7897 0.7110 0.5328 0.2744
5 1.0000 0.9997 0.9884 0.9657 0.9218 0.8779 0.7535 0.5000
6 1.0000 0.9981 0.9924 0.9784 0.9614 0.9007 0.7256
7 0.9998 0.9988 0.9947 0.9912 0.9707 0.8867
8 1.0000 0.9999 0.9994 0.9986 0.9941 0.9673
9 1.0000 0.9999 0.9999 0.9993 0.9941
10 1.0000 1.0000 1.0000 0.9995
12 0 0.8864 0.5404 0.2824 0.0687 0.0317 0.0139 0.0077 0.0022 0.0002
1 0.9938 0.8816 0.6590 0.2749 0.1584 0.0850 0.0540 0.0196 0.0032
2 0.9998 0.9804 0.8892 0.5584 0.3907 0.2528 0.1811 0.0835 0.0193
3 1.0000 0.9978 0.9744 0.7946 0.6488 0.4925 0.3931 0.2254 0.0730
4 1.0000 0.9998 0.9957 0.9806 0.8424 0.7237 0.6315 0.4382 0.1939
5 1.0000 1.0000 0.9995 0.9961 0.9456 0.8822 0.8223 0.6652 0.3872
6 1.0000 0.9994 0.9858 0.9614 0.9336 0.8418 0.6128
7 0.9999 0.9972 0.9905 0.9812 0.9427 0.8062
8 1.0000 0.9996 0.9983 0.9962 0.9848 0.9270
9 10000 0.9998 0.9995 0.9972 0.9807
10 1.0000 0.9999 0.9997 0.9968
11 1.0000 1.0000 0.9998
13 0 0.8775 0.5134 0.2542 0.0550 0.0238 0.0097 0.0052 0.0013 0.0000
1 0.9928 0.8746 0.6214 0.2337 0.1267 0.0637 0.0386 0.0126 0.0017
2 0.9997 0.9755 0.8661 0.5017 0.3326 0.2025 0.1388 0.0579 0.0112
3 1.0000 0.9969 0.9659 0.7473 0.5843 0.4206 0.3224 0.1686 0.0462
4 0.9997 0.9936 0.9009 0.7940 0.6543 0.5521 0.3531 0.1334
5 1.0000 0.9991 0.9700 0.9198 0.8346 0.7587 0.5744 0.2905

6 0.9999 0.9930 0.9757 0.9376 0.8965 0.7712 0.5000
7 1.0000 0.9988 0.9944 0.9818 0.9654 0.9024 0.7095
8 0.9998 0.9990 0.9960 0.9912 0.9679 0.8666
9 1.0000 0.9999 0.9994 0.9984 0.9922 0.9539
10 1.0000 0.9999 0.9998 0.9987 0.9888
11 1.0000 1.0000 0.9999 0.9983
12 1.0000 0.9999
14 0 0.8687 0.4877 0.2288 0.0440 0.0178 0.0068 0.0034 0.0008 0.0000
1 0.9916 0.8470 0.5847 0.1979 0.1010 0.0475 0.0274 0.0081 0.0009
2 0.9997 0.9700 0.8416 0.4480 0.2812 0.1608 0.1054 0.0398 0.0065
3 1.0000 0.9958 0.9559 0.6982 0.5214 0.3552 0.2612 0.1243 0.0287
4 0.9996 0.9908 0.8702 0.7416 0.5842 0.4755 0.2793 0.0898
5 1.0000 0.9986 0.9562 0.8884 0.7805 0.6898 0.4859 0.2120
6 0.9998 0.9884 0.9618 0.9067 0.8506 0.6925 0.3953
7 1.0000 0.9976 0.9897 0.9686 0.9424 0.8499 0.6048
8 0.9996 0.9979 0.9917 0.9826 0.9417 0.7880
9 1.0000 0.9997 0.9984 0.9960 0.9825 0.9102
10 1.0000 0.9998 0.9993 0.9961 0.9713
11 1.0000 0.9999 0.9994 0.9936
12 1.0000 0.9999 0.9991
13 0.9999
15 0 0.8601 0.4633 0.2059 0.0352 0.0134 0.0048 0.0023 0.0005 0.0000
1 0.9904 0.8291 0.5491 0.1672 0.0802 0.0353 0.0194 0.0052 0.0005
2 0.9996 0.9638 0.8160 0.3980 0.2361 0.1268 0.0794 0.0271 0.0037
3 1.0000 0.9946 0.9444 0.6482 0.4613 0.2969 0.2092 0.0905 0.0176
4 0.9994 0.9873 0.8358 0.6865 0.5255 0.4041 0.2173 0.0592
5 1.0000 0.9978 0.9390 0.8516 0.7216 0.6184 0.4032 0.1509
6 0.9997 0.9820 0.9434 0.8689 0.7970 0.6098 0.3036
7 1.0000 0.9958 0.9827 0.9500 0.9118 0.7869 0.5000
8 0.9992 0.9958 0.9848 0.9692 0.9050 0.6964
9 0.9999 0.9992 0.9964 0.9915 0.9662 0.8491
10 1.0000 0.9999 0.9993 0.9982 0.9907 0.9408
11 1.0000 0.9999 0.9997 0.9981 0.9824
12 1.0000 1.0000 0.9997 0.9963
13 1.0000 0.9995
14 1.0000
Source:Forn=2 through 10, adapted with permission from E. Parzen,Modern Probability Theory and Its
Applications, John Wiley, New York, 1962. For n=11 through 15, adapted with permission fromTables of
Cumulative Binomial Probability Distribution, Harvard University Press, Cambridge, M.A., 1955.

Table ST2. Tail Probability Under Standard Normal Distribution^a
z0.00 0.01 0.02 0.03 0.04 0.05 0.06 0.07 0.08 0.09
0.0 0.5000 0.4960 0.4920 0.4880 0.4840 0.4801 0.4761 0.4721 0.4681 0.4641
0.1 0.4602 0.4562 0.4522 0.4483 0.4443 0.4404 0.4364 0.4325 0.4286 0.4247
0.2 0.4207 0.4168 0.4129 0.4090 0.4052 0.4013 0.3974 0.3936 0.3897 0.3859
0.3 0.3821 0.3783 0.3745 0.3707 0.3669 0.3632 0.3594 0.3557 0.3520 0.3483
0.4 0.3446 0.3409 0.3372 0.3336 0.3300 0.3264 0.3228 0.3192 0.3156 0.3121
0.5 0.3085 0.3050 0.3015 0.2981 0.2946 0.2912 0.2877 0.2843 0.2810 0.2776
0.6 0.2743 0.2709 0.2676 0.2643 0.2611 0.2578 0.2546 0.2514 0.2483 0.2451
0.7 0.2420 0.2389 0.2358 0.2327 0.2297 0.2266 0.2236 0.2206 0.2177 0.2148
0.8 0.2119 0.2090 0.2061 0.2033 0.2005 0.1977 0.1949 0.1922 0.1894 0.1867
0.9 0.1841 0.1814 0.1788 0.1762 0.1736 0.1711 0.1685 0.1660 0.1635 0.1611
1.0 0.1587 0.1562 0.1539 0.1515 0.1492 0.1469 0.1446 0.1423 0.1401 0.1379
1.1 0.1357 0.1335 0.1314 0.1292 0.1271 0.1251 0.1230 0.1210 0.1190 0.1170
1.2 0.1151 0.1131 0.1112 0.1093 0.1075 0.1056 0.1038 0.1020 0.1003 0.0985
1.3 0.0968 0.0951 0.0934 0.0918 0.0901 0.0885 0.0869 0.0853 0.0838 0.0823
1.4 0.0808 0.0793 0.0778 0.0764 0.0749 0.0735 0.0721 0.0708 0.0694 0.0681
1.5 0.0668 0.0655 0.0643 0.0630 0.0618 0.0606 0.0594 0.0582 0.0571 0.0559
1.6 0.0548 0.0537 0.0526 0.0516 0.0505 0.0495 0.0485 0.0475 0.0465 0.0455
1.7 0.0446 0.0436 0.0427 0.0418 0.0409 0.0401 0.0392 0.0384 0.0375 0.0367
1.8 0.0359 0.0351 0.0344 0.0336 0.0329 0.0322 0.0314 0.0307 0.0301 0.0294
1.9 0.0287 0.0281 0.0274 0.0268 0.0262 0.0256 0.0250 0.0244 0.0239 0.0233
2.0 0.0228 0.0222 0.0217 0.0212 0.0207 0.0202 0.0197 0.0192 0.0188 0.0183
2.1 0.0179 0.0174 0.0170 0.0166 0.0162 0.0158 0.0154 0.0150 0.0146 0.0143
2.2 0.0139 0.0136 0.0132 0.0129 0.0125 0.0122 0.0119 0.0116 0.0113 0.0110
2.3 0.0107 0.0104 0.0102 0.0099 0.0096 0.0094 0.0091 0.0089 0.0087 0.0084
2.4 0.0082 0.0080 0.0078 0.0075 0.0073 0.0071 0.0069 0.0068 0.0066 0.0064
2.5 0.0062 0.0060 0.0059 0.0057 0.0055 0.0054 0.0052 0.0051 0.0049 0.0048
2.6 0.0047 0.0045 0.0044 0.0043 0.0041 0.0040 0.0039 0.0038 0.0037 0.0036
2.7 0.0035 0.0034 0.0033 0.0032 0.0031 0.0030 0.0029 0.0028 0.0027 0.0026
2.8 0.0026 0.0025 0.0024 0.0023 0.0023 0.0022 0.0021 0.0021 0.0020 0.0019
2.9 0.0019 0.0018 0.0018 0.0017 0.0016 0.0016 0.0015 0.0015 0.0014 0.0014
3.0 0.0013 0.0013 0.0013 0.0012 0.0012 0.0011 0.0011 0.0011 0.0010 0.0010
Source:Adapted with permission from P. G. Hoel,Introduction to Mathematical Statistics, 4th ed., Wiley,
New York, 1971, p. 391.
^a This table gives the probability that the standard normal variable Z will exceed a given positive value z, that is, P{Z > z_α} = α. The probabilities for negative values of z are obtained by symmetry.

Table ST3. Critical Values Under Chi-Square Distribution^a
Degrees of                                            α
Freedom    0.99      0.98      0.95     0.90     0.80     0.70     0.50     0.30     0.20     0.10     0.05     0.02     0.01
1   0.000157 0.000628 0.00393 0.0158 0.0642 0.148 0.455 1.074 1.642 2.706 3.841 5.412 6.635
2   0.0201 0.0404 0.103 0.211 0.446 0.713 1.386 2.408 3.219 4.605 5.991 7.824 9.210
3   0.115 0.185 0.352 0.584 1.005 1.424 2.366 3.665 4.642 6.251 7.815 9.837 11.341
4   0.297 0.429 0.711 1.064 1.649 2.195 3.357 4.878 5.989 7.779 9.488 11.668 13.277
5   0.554 0.752 1.145 1.610 2.343 3.000 4.351 6.064 7.289 9.236 11.070 13.388 15.086
6   0.872 1.134 1.635 2.204 3.070 3.828 5.348 7.231 8.558 10.645 12.592 15.033 16.812
7   1.239 1.564 2.167 2.833 3.822 4.671 6.346 8.383 9.803 12.017 14.067 16.622 18.475
8   1.646 2.032 2.733 3.490 4.594 5.527 7.344 9.524 11.030 13.362 15.507 18.168 20.090
9   2.088 2.532 3.325 4.168 5.380 6.393 8.343 10.656 12.242 14.684 16.919 19.679 21.666
10  2.558 3.059 3.940 4.865 6.179 7.267 9.342 11.781 13.442 15.987 18.307 21.161 23.209
11  3.053 3.609 4.575 5.578 6.989 8.148 10.341 12.899 14.631 17.275 19.675 22.618 24.725
12  3.571 4.178 5.226 6.304 7.807 9.034 11.340 14.011 15.812 18.549 21.026 24.054 26.217
13  4.107 4.765 5.892 7.042 8.634 9.926 12.340 15.119 16.985 19.812 22.362 25.472 27.688
14  4.660 5.368 6.571 7.790 9.467 10.821 13.339 16.222 18.151 21.064 23.685 26.873 29.141
15  5.229 5.985 7.261 8.547 10.307 11.721 14.339 17.322 19.311 22.307 24.996 28.259 30.578
16  5.812 6.614 7.962 9.312 11.152 12.624 15.338 18.418 20.465 23.542 26.296 29.633 32.000
17  6.408 7.255 8.672 10.085 12.002 13.531 16.338 19.511 21.615 24.669 27.587 30.995 33.409
18  7.015 7.906 9.390 10.865 12.857 14.440 17.338 20.601 22.760 25.989 28.869 32.346 34.805
19  7.633 8.567 10.117 11.651 13.716 15.352 18.338 21.689 23.900 27.204 30.144 33.687 36.191
20  8.260 9.237 10.851 12.443 14.578 16.266 19.337 22.775 25.038 28.412 31.410 35.020 37.566
21  8.897 9.915 11.591 13.240 15.445 17.182 20.337 23.858 26.171 29.615 32.671 36.343 38.932
22  9.542 10.600 12.338 14.041 16.314 18.101 21.337 24.939 27.301 30.813 33.924 37.659 40.289
23 10.196 11.293 13.091 14.848 17.187 19.021 22.337 26.018 28.429 32.007 35.172 38.968 41.638
24 10.856 11.992 13.848 15.659 18.062 19.943 23.337 27.096 29.553 33.196 36.415 40.270 42.980
25 11.524 12.697 14.611 16.473 18.940 20.867 24.337 28.172 30.675 34.382 37.652 41.566 44.314
26 12.198 13.409 15.379 17.292 19.820 21.792 25.336 29.246 31.795 35.563 38.885 42.856 45.642
27 12.879 14.125 16.151 18.114 20.703 22.719 26.336 30.319 32.912 36.741 40.113 44.140 46.963
28 13.565 14.847 16.928 18.939 21.588 23.647 27.336 31.391 34.027 37.916 41.337 45.419 48.278
29 14.256 15.574 17.708 19.768 22.475 24.577 28.336 32.461 35.139 39.087 42.557 46.693 49.588
30 14.953 16.306 18.493 20.599 23.364 25.508 29.336 33.530 36.250 40.256 43.773 47.962 50.892
Source: Reproduced from Statistical Methods for Research Workers, 14th ed., 1972, with the permission of the Estate of R. A. Fisher, and Hafner Press.
^a For degrees of freedom greater than 30, the expression \sqrt{2\chi^2} - \sqrt{2n - 1} may be used as a normal deviate with unit variance, where n is the number of degrees of freedom.

Table ST4. Student's t-Distribution^a
α
n 0.10 0.05 0.025 0.01 0.005
1 3.078 6.314 12.706 31.821 63.657
2 1.886 2.920 4.303 6.965 9.925
3 1.638 2.353 3.182 4.541 5.841
4 1.533 2.132 2.776 3.747 4.604
5 1.476 2.015 2.571 3.365 4.032
6 1.440 1.943 2.447 3.143 3.707
7 1.415 1.895 2.365 2.998 3.499
8 1.397 1.860 2.306 2.896 3.355
9 1.383 1.833 2.262 2.821 3.250
10 1.372 1.812 2.228 2.764 3.169
11 1.363 1.796 2.201 2.718 3.106
12 1.356 1.782 2.179 2.681 3.055
13 1.350 1.771 2.160 2.650 3.012
14 1.345 1.761 2.145 2.624 2.977
15 1.341 1.753 2.131 2.602 2.947
16 1.337 1.746 2.120 2.583 2.921
17 1.333 1.740 2.110 2.567 2.898
18 1.330 1.734 2.101 2.552 2.878
19 1.328 1.729 2.093 2.539 2.861
20 1.325 1.725 2.086 2.528 2.845
21 1.323 1.721 2.080 2.518 2.831
22 1.321 1.717 2.074 2.508 2.819
23 1.319 1.714 2.069 2.500 2.807
24 1.318 1.711 2.064 2.492 2.797
25 1.316 1.708 2.060 2.485 2.787
26 1.315 1.706 2.056 2.479 2.779
27 1.314 1.703 2.052 2.473 2.771
28 1.313 1.701 2.048 2.467 2.763
29 1.311 1.699 2.045 2.462 2.756
30 1.310 1.697 2.042 2.457 2.750
40 1.303 1.684 2.021 2.423 2.704
60 1.296 1.671 2.000 2.390 2.660
120 1.289 1.658 1.980 2.358 2.617
∞ 1.282 1.645 1.960 2.326 2.576
Source:P. G. Hoel,Introduction to Mathematical Statistics, 4th ed., Wiley, New York, 1971, p. 393. Reprinted
by permission of John Wiley & Sons, Inc.
^a The first column lists the number of degrees of freedom (n). The headings of the other columns give the probabilities (α) for t to exceed the entry value. Use symmetry for negative t values.

Table ST5. F-Distribution: 5% (First Line) and 1% (Second Line) Points for the Distribution of F
Degrees of Freedom                                    Degrees of Freedom for Numerator (m)
for Denominator (n)   1   2   3   4   5   6   7   8   9   10   11   12   14   16   20   24   30   40   50   75   100   200   500   ∞
1 161 200 216 225 230 234 237 239 241 242 243 244 245 246 248 249 250 251 252 253 253 254 254 254
4052 4999 5403 5625 5764 5859 5928 5981 6022 6056 6082 6106 6142 6169 6208 6234 6258 6286 6302 6323 6334 6352 6361 6366
2 18.51 19.00 19.16 19.25 19.30 19.33 19.36 19.37 19.38 19.39 19.40 19.41 19.42 19.43 19.44 19.45 19.46 19.47 19.47 19.48 19.49 19.49 19.50 19.50
98.49 99.01 99.17 99.25 99.30 99.33 99.34 99.36 99.38 99.40 99.41 99.42 99.43 99.44 99.45 99.46 99.47 99.48 99.48 99.49 99.49 99.49 99.50 99.50
3 10.13 9.55 9.28 9.12 9.01 8.94 8.88 8.84 8.81 8.78 8.76 8.74 8.71 8.69 8.66 8.64 8.62 8.60 8.58 8.57 8.56 8.54 8.54 8.53
34.12 30.81 29.46 28.71 28.24 27.91 27.67 27.49 27.34 27.23 27.13 27.05 26.92 26.83 26.69 26.60 26.50 26.41 26.30 26.27 26.23 26.18 26.14 26.12
4 7.71 6.94 6.59 6.39 6.26 6.16 6.09 6.04 6.00 5.96 5.93 5.91 5.87 5.84 5.80 5.77 5.74 5.71 5.70 5.68 5.66 5.65 5.64 5.63
21.20 18.00 16.69 15.98 15.52 15.21 14.98 14.80 14.66 14.54 14.45 14.37 14.24 14.15 14.02 13.93 13.83 13.74 13.69 13.61 13.57 13.52 13.48 13.46
5 6.61 5.79 5.41 5.19 5.05 4.95 4.88 4.82 4.78 4.74 4.70 4.68 4.64 4.60 4.56 4.53 4.50 4.46 4.44 4.42 4.40 4.38 4.37 4.36
16.26 13.27 12.06 11.39 10.97 10.67 10.45 10.27 10.15 10.05 9.96 9.89 9.77 9.68 9.55 9.47 9.38 9.29 9.24 9.17 9.13 9.07 9.04 9.02
6 5.99 5.14 4.76 4.53 4.39 4.28 4.21 4.15 4.10 4.06 4.03 4.00 3.96 3.92 3.87 3.84 3.81 3.77 3.75 3.72 3.71 3.69 3.68 3.67
13.74 10.92 9.78 9.15 8.75 8.47 8.26 8.10 7.98 7.87 7.79 7.72 7.60 7.52 7.39 7.31 7.23 7.14 7.09 7.02 6.99 6.94 6.90 6.88
7 5.59 4.74 4.35 4.12 3.97 3.87 3.79 3.73 3.68 3.63 3.60 3.57 3.52 3.49 3.44 3.41 3.38 3.34 3.32 3.29 3.28 3.25 3.24 3.23
12.25 9.55 8.45 7.85 7.46 7.19 7.00 6.84 6.71 6.62 6.54 6.47 6.35 6.27 6.15 6.07 5.98 5.90 5.85 5.78 5.75 5.70 5.67 5.65
8 5.32 4.46 4.07 3.84 3.69 3.58 3.50 3.44 3.39 3.34 3.31 3.28 3.23 3.20 3.15 3.12 3.08 3.05 3.03 3.00 2.98 2.96 2.94 2.93
11.26 8.65 7.59 7.01 6.63 6.37 6.19 6.03 5.91 5.82 5.74 5.67 5.56 5.48 5.36 5.28 5.20 5.11 5.06 5.00 4.96 4.91 4.88 4.86
9 5.12 4.26 3.86 3.63 3.48 3.37 3.29 3.23 3.18 3.13 3.10 3.07 3.02 2.98 2.93 2.90 2.86 2.82 2.80 2.77 2.76 2.73 2.72 2.71
10.56 8.02 6.99 6.42 6.06 5.80 5.62 5.47 5.35 5.26 5.18 5.11 5.00 4.92 4.80 4.73 4.64 4.56 4.51 4.45 4.41 4.36 4.33 4.31
10 4.96 4.10 3.71 3.48 3.33 3.22 3.14 3.07 3.02 2.97 2.94 2.91 2.86 2.82 2.77 2.74 2.70 2.67 2.64 2.61 2.59 2.56 2.55 2.54
10.04 7.56 6.55 5.99 5.64 5.39 5.21 5.06 4.95 4.85 4.78 4.71 4.60 4.52 4.41 4.33 4.25 4.17 4.12 4.05 4.01 3.96 3.93 3.91
11 4.84 3.98 3.59 3.36 3.20 3.09 3.01 2.95 2.90 2.96 2.82 2.79 2.74 2.70 2.65 2.61 2.57 2.53 2.50 2.47 2.45 2.42 2.41 2.40
9.65 7.20 6.22 5.67 5.32 5.07 4.88 4.74 4.63 4.54 4.46 4.40 4.29 4.21 4.10 4.02 3.94 3.86 3.80 3.74 3.70 3.66 3.62 3.60
12 4.75 3.88 3.49 3.26 3.11 3.00 2.92 2.85 2.80 2.76 2.72 2.69 2.64 2.60 2.54 2.50 2.46 2.42 2.40 2.36 2.35 2.32 2.31 2.30
9.33 6.93 5.95 5.41 5.06 4.82 4.65 4.50 4.39 4.30 4.22 4.16 4.05 3.98 3.86 3.78 3.70 3.61 3.56 3.49 3.46 3.41 3.38 3.36
13 4.67 3.80 3.41 3.18 3.02 2.92 2.84 2.77 2.72 2.67 2.63 2.60 2.55 2.51 2.46 2.42 2.38 2.34 2.32 2.28 2.26 2.24 2.22 2.21
9.07 6.70 5.74 5.20 4.86 4.62 4.44 4.30 4.19 4.10 4.02 3.96 3.85 3.78 3.67 3.59 3.51 3.42 3.37 3.30 3.27 3.21 3.18 3.16

14 4.60 3.74 3.34 3.11 2.96 2.85 2.77 2.70 2.65 2.60 2.56 2.53 2.48 2.44 2.39 2.35 2.31 2.27 2.24 2.21 2.19 2.16 2.14 2.13
8.86 6.51 5.56 5.03 4.69 4.46 4.28 4.14 4.03 3.94 3.86 3.80 3.70 3.62 3.51 3.43 3.34 3.26 3.21 3.14 3.11 3.06 3.02 3.00
15 4.54 3.68 3.29 3.06 2.90 2.79 2.70 2.64 2.59 2.55 2.51 2.48 2.43 2.39 2.33 2.29 2.25 2.21 2.18 2.15 2.12 2.10 2.08 2.07
8.68 6.36 5.42 4.89 4.56 4.32 4.14 4.00 3.89 3.80 3.73 3.67 3.56 3.48 3.36 3.29 3.20 3.12 3.07 3.00 2.97 2.92 2.89 2.87
16 4.49 3.63 3.24 3.01 2.85 2.74 2.66 2.59 2.54 2.49 2.45 2.42 2.37 2.33 2.28 2.24 2.20 2.16 2.13 2.09 2.07 2.04 2.02 2.01
8.53 6.23 5.29 4.77 4.44 4.20 4.03 3.89 3.78 3.69 3.61 3.55 3.45 3.37 3.25 3.18 3.10 3.01 2.96 2.89 2.86 2.80 2.77 2.75
17 4.45 3.59 3.20 2.96 2.81 2.70 2.62 2.55 2.50 2.45 2.41 2.38 2.33 2.29 2.23 2.19 2.15 2.11 2.08 2.04 2.02 1.99 1.97 1.96
8.40 6.11 5.18 4.67 4.34 4.10 3.93 3.79 3.68 3.59 3.52 3.45 3.35 3.27 3.16 3.08 3.00 2.92 2.86 2.79 2.76 2.70 2.67 2.65
18 4.41 3.55 3.16 2.93 2.77 2.66 2.58 2.51 2.46 2.41 2.37 2.34 2.29 2.25 2.19 2.15 2.11 2.07 2.04 2.00 1.98 1.95 1.93 1.92
8.28 6.01 5.09 4.58 4.25 4.01 3.85 3.71 3.60 3.51 3.44 3.37 3.27 3.19 3.07 3.00 2.91 2.83 2.78 2.71 2.68 2.62 2.59 2.57
19 4.38 3.52 3.13 2.90 2.74 2.63 2.55 2.48 2.43 2.38 2.34 2.31 2.26 2.21 2.15 2.11 2.07 2.02 2.00 1.96 1.94 1.91 1.90 1.88
8.18 5.93 5.01 4.50 4.17 3.94 3.77 3.63 3.52 3.43 3.36 3.30 3.19 3.12 3.00 2.92 2.84 2.76 2.70 2.63 2.60 2.54 2.51 2.49
20 4.35 3.49 3.10 2.87 2.71 2.60 2.52 2.45 2.40 2.35 2.31 2.28 2.23 2.18 2.12 2.08 2.04 1.99 1.96 1.92 1.90 1.87 1.85 1.84
8.10 5.85 4.94 4.43 4.10 3.87 3.71 3.56 3.45 3.37 3.30 3.23 3.13 3.05 2.94 2.86 2.77 2.69 2.63 2.56 2.53 2.47 2.44 2.42
21 4.32 3.47 3.07 2.84 2.68 2.57 2.49 2.42 2.37 2.32 2.28 2.25 2.20 2.15 2.09 2.05 2.00 1.96 1.93 1.89 1.87 1.84 1.82 1.81
8.02 5.78 4.87 4.37 4.04 3.81 3.65 3.51 3.40 3.31 3.24 3.17 3.07 2.99 2.88 2.80 2.72 2.63 2.58 2.51 2.47 2.42 2.38 2.36
22 4.30 3.44 3.05 2.82 2.66 2.55 2.47 2.40 2.35 2.30 2.26 2.23 2.18 2.13 2.07 2.03 1.98 1.93 1.91 1.87 1.84 1.81 1.80 1.78
7.94 5.72 4.82 4.31 3.99 3.76 3.59 3.45 3.35 3.26 3.18 3.12 3.02 2.94 2.83 2.75 2.67 2.58 2.53 2.46 2.42 2.37 2.33 2.31
23 4.28 3.42 3.03 2.80 2.64 2.53 2.45 2.38 2.32 2.28 2.24 2.20 2.14 2.10 2.04 2.00 1.96 1.91 1.88 1.84 1.82 1.79 1.77 1.76
7.88 5.66 4.76 4.26 3.94 3.71 3.54 3.41 3.30 3.21 3.14 3.07 2.97 2.89 2.78 2.70 2.62 2.53 2.48 2.41 2.37 2.32 2.28 2.26
24 4.26 3.40 3.01 2.78 2.62 2.51 2.43 2.36 2.30 2.26 2.22 2.18 2.13 2.09 2.02 1.98 1.94 1.89 1.86 1.82 1.80 1.76 1.74 1.73
7.82 5.61 4.72 4.22 3.90 3.67 3.50 3.36 3.25 3.17 3.09 3.03 2.93 2.85 2.74 2.66 2.58 2.49 2.44 2.36 2.33 2.27 2.23 2.21
25 4.24 3.38 2.99 2.76 2.60 2.49 2.41 2.34 2.28 2.24 2.20 2.16 2.11 2.06 2.00 1.96 1.92 1.87 1.84 1.80 1.77 1.74 1.72 1.71
7.77 5.57 4.68 4.18 3.86 3.63 3.46 3.32 3.21 3.13 3.05 2.99 2.89 2.81 2.70 2.62 2.54 2.45 2.40 2.32 2.29 2.23 2.19 2.17
26 4.22 3.37 2.98 2.74 2.59 2.47 2.39 2.32 2.27 2.22 2.18 2.15 2.10 2.05 1.99 1.95 1.90 1.85 1.82 1.78 1.76 1.72 1.70 1.69
7.72 5.53 4.64 4.14 3.82 3.59 3.42 3.29 3.17 3.09 3.02 2.96 2.86 2.77 2.66 2.58 2.50 2.41 2.36 2.28 2.25 2.19 2.15 2.13

Table ST5. (Continued)
Degrees of Freedom for Denominator (n); Degrees of Freedom for Numerator, m: 1 2 3 4 5 6 7 8 9 10 11 12 14 16 20 24 30 40 50 75 100 200 500 ∞
27 4.21 3.35 2.96 2.73 2.57 2.46 2.37 2.30 2.25 2.20 2.16 2.13 2.08 2.03 1.97 1.93 1.88 1.84 1.80 1.76 1.74 1.71 1.68 1.67
7.68 5.49 4.60 4.11 3.79 3.56 3.39 3.26 3.14 3.06 2.98 2.93 2.83 2.74 2.63 2.55 2.47 2.38 2.33 2.25 2.21 2.16 2.12 2.10
28 4.20 3.34 2.95 2.71 2.56 2.44 2.36 2.29 2.24 2.19 2.15 2.12 2.06 2.02 1.96 1.91 1.87 1.81 1.78 1.75 1.72 1.69 1.67 1.65
7.64 5.45 4.57 4.07 3.76 3.53 3.36 3.23 3.11 3.03 2.95 2.90 2.80 2.71 2.60 2.52 2.44 2.35 2.30 2.22 2.18 2.13 2.09 2.06
29 4.18 3.33 2.93 2.70 2.54 2.43 2.35 2.28 2.22 2.18 2.14 2.10 2.05 2.00 1.94 1.90 1.85 1.80 1.77 1.73 1.71 1.68 1.65 1.64
7.60 5.42 4.54 4.04 3.73 3.50 3.33 3.20 3.08 3.00 2.92 2.87 2.77 2.68 2.57 2.49 2.41 2.32 2.27 2.19 2.15 2.10 2.06 2.03
30 4.17 3.32 2.92 2.69 2.53 2.42 2.34 2.27 2.21 2.16 2.12 2.09 2.04 1.99 1.93 1.89 1.84 1.79 1.76 1.72 1.69 1.66 1.64 1.62
7.56 5.39 4.51 4.02 3.70 3.47 3.30 3.17 3.06 2.98 2.90 2.84 2.74 2.66 2.55 2.47 2.38 2.29 2.24 2.16 2.13 2.07 2.03 2.01
32 4.15 3.30 2.90 2.67 2.51 2.40 2.32 2.25 2.19 2.14 2.10 2.07 2.02 1.97 1.91 1.86 1.82 1.76 1.74 1.69 1.67 1.64 1.61 1.59
7.50 5.34 4.46 3.97 3.66 3.42 3.25 3.12 3.01 2.94 2.86 2.80 2.70 2.62 2.51 2.42 2.34 2.25 2.20 2.12 2.08 2.02 1.98 1.96
34 4.13 3.28 2.88 2.65 2.49 2.38 2.30 2.23 2.17 2.12 2.08 2.05 2.00 1.95 1.89 1.84 1.80 1.74 1.71 1.67 1.64 1.61 1.59 1.57
7.44 5.29 4.42 3.93 3.61 3.38 3.21 3.08 2.97 2.89 2.82 2.76 2.66 2.58 2.47 2.38 2.30 2.21 2.15 2.08 2.04 1.98 1.94 1.91
36 4.11 3.26 2.86 2.63 2.48 2.36 2.28 2.21 2.15 2.10 2.06 2.03 1.98 1.93 1.87 1.82 1.78 1.72 1.69 1.65 1.62 1.59 1.56 1.55
7.39 5.25 4.38 3.89 3.58 3.35 3.18 3.04 2.94 2.86 2.78 2.72 2.62 2.54 2.43 2.35 2.26 2.17 2.12 2.04 2.00 1.94 1.90 1.87
38 4.10 3.25 2.85 2.62 2.46 2.35 2.26 2.19 2.14 2.09 2.05 2.02 1.96 1.92 1.85 1.80 1.76 1.71 1.67 1.63 1.60 1.57 1.54 1.53
7.35 5.21 4.34 3.86 3.54 3.32 3.15 3.02 2.91 2.82 2.75 2.69 2.59 2.51 2.40 2.32 2.22 2.14 2.08 2.00 1.97 1.90 1.86 1.84
40 4.08 3.23 2.84 2.61 2.45 2.34 2.25 2.18 2.12 2.07 2.04 2.00 1.95 1.90 1.84 1.79 1.74 1.69 1.66 1.61 1.59 1.55 1.53 1.51
7.31 5.18 4.31 3.83 3.51 3.29 3.12 2.99 2.88 2.80 2.73 2.66 2.56 2.49 2.37 2.29 2.20 2.11 2.05 1.97 1.94 1.88 1.84 1.81
42 4.07 3.22 2.83 2.59 2.44 2.32 2.24 2.17 2.11 2.06 2.02 1.99 1.94 1.89 1.82 1.78 1.73 1.68 1.64 1.60 1.57 1.54 1.51 1.49
7.27 5.15 4.29 3.80 3.49 3.26 3.10 2.96 2.86 2.77 2.70 2.64 2.54 2.46 2.35 2.26 2.17 2.08 2.02 1.94 1.91 1.85 1.80 1.78
44 4.06 3.21 2.82 2.58 2.43 2.31 2.23 2.16 2.10 2.05 2.01 1.98 1.92 1.88 1.81 1.76 1.72 1.66 1.63 1.58 1.56 1.52 1.50 1.48
7.24 5.12 4.26 3.78 3.46 3.24 3.07 2.94 2.84 2.75 2.68 2.62 2.52 2.44 2.32 2.24 2.15 2.06 2.00 1.92 1.88 1.82 1.78 1.75
46 4.05 3.20 2.81 2.57 2.42 2.30 2.22 2.14 2.09 2.04 2.00 1.97 1.91 1.87 1.80 1.75 1.71 1.65 1.62 1.57 1.54 1.51 1.48 1.46
7.21 5.10 4.24 3.76 3.44 3.22 3.05 2.92 2.82 2.73 2.66 2.60 2.50 2.42 2.30 2.22 2.13 2.04 1.98 1.90 1.86 1.80 1.76 1.72
48 4.04 3.19 2.80 2.56 2.41 2.30 2.21 2.14 2.08 2.03 1.99 1.96 1.90 1.86 1.79 1.74 1.70 1.64 1.61 1.56 1.53 1.50 1.47 1.45
7.19 5.08 4.22 3.74 3.42 3.20 3.04 2.90 2.80 2.71 2.64 2.58 2.48 2.40 2.28 2.20 2.11 2.02 1.96 1.88 1.84 1.78 1.73 1.70

50 4.03 3.18 2.79 2.56 2.40 2.29 2.20 2.13 2.07 2.02 1.98 1.95 1.90 1.85 1.78 1.74 1.69 1.63 1.60 1.55 1.52 1.48 1.46 1.44
7.17 5.06 4.20 3.72 3.41 3.18 3.02 2.88 2.78 2.70 2.62 2.56 2.46 2.39 2.26 2.18 2.10 2.00 1.94 1.86 1.82 1.76 1.71 1.68
55 4.02 3.17 2.78 2.54 2.38 2.27 2.18 2.11 2.05 2.00 1.97 1.93 1.88 1.83 1.76 1.72 1.67 1.61 1.58 1.52 1.50 1.46 1.43 1.41
7.12 5.01 4.16 3.68 3.37 3.15 2.98 2.85 2.75 2.66 2.59 2.53 2.43 2.35 2.23 2.15 2.06 1.96 1.90 1.82 1.78 1.71 1.66 1.64
60 4.00 3.15 2.76 2.52 2.37 2.25 2.17 2.10 2.04 1.99 1.95 1.92 1.86 1.81 1.75 1.70 1.65 1.59 1.56 1.50 1.48 1.44 1.41 1.39
7.08 4.98 4.13 3.65 3.34 3.12 2.95 2.82 2.72 2.63 2.56 2.50 2.40 2.32 2.20 2.12 2.03 1.93 1.87 1.79 1.74 1.68 1.63 1.60
65 3.99 3.14 2.75 2.51 2.36 2.24 2.15 2.08 2.02 1.98 1.94 1.90 1.85 1.80 1.73 1.68 1.63 1.57 1.54 1.49 1.46 1.42 1.39 1.37
7.04 4.95 4.10 3.62 3.31 3.09 2.93 2.79 2.70 2.61 2.54 2.47 2.37 2.30 2.18 2.09 2.00 1.90 1.84 1.76 1.71 1.64 1.60 1.56
70 3.98 3.13 2.74 2.50 2.35 2.23 2.14 2.07 2.01 1.97 1.93 1.89 1.84 1.79 1.72 1.67 1.62 1.56 1.53 1.47 1.45 1.40 1.37 1.35
7.01 4.92 4.08 3.60 3.29 3.07 2.91 2.77 2.67 2.59 2.51 2.45 2.35 2.28 2.15 2.07 1.98 1.88 1.82 1.74 1.69 1.63 1.56 1.53
80 3.96 3.11 2.72 2.48 2.33 2.21 2.12 2.05 1.99 1.95 1.91 1.88 1.82 1.77 1.70 1.65 1.60 1.54 1.51 1.45 1.42 1.38 1.35 1.32
6.96 4.88 4.04 3.56 3.25 3.04 2.87 2.74 2.64 2.55 2.48 2.41 2.32 2.24 2.11 2.03 1.94 1.84 1.78 1.70 1.65 1.57 1.52 1.49
100 3.94 3.09 2.70 2.46 2.30 2.19 2.10 2.03 1.97 1.92 1.88 1.85 1.79 1.75 1.68 1.63 1.57 1.51 1.48 1.42 1.39 1.34 1.30 1.28
6.90 4.82 3.98 3.51 3.20 2.99 2.82 2.69 2.59 2.51 2.43 2.36 2.26 2.19 2.06 1.98 1.89 1.79 1.73 1.64 1.59 1.51 1.46 1.43
125 3.92 3.07 2.68 2.44 2.29 2.17 2.08 2.01 1.95 1.90 1.86 1.83 1.77 1.72 1.65 1.60 1.55 1.49 1.45 1.39 1.36 1.31 1.27 1.25
6.84 4.78 3.94 3.47 3.17 2.95 2.79 2.65 2.56 2.47 2.40 2.33 2.23 2.15 2.03 1.94 1.85 1.75 1.68 1.59 1.54 1.46 1.40 1.37
150 3.91 3.06 2.67 2.43 2.27 2.16 2.07 2.00 1.94 1.89 1.85 1.82 1.76 1.71 1.64 1.59 1.54 1.47 1.44 1.37 1.34 1.29 1.25 1.22
6.81 4.75 3.91 3.44 3.13 2.92 2.76 2.62 2.53 2.44 2.37 2.30 2.20 2.12 2.00 1.91 1.83 1.72 1.66 1.56 1.51 1.43 1.37 1.33
200 3.89 3.04 2.65 2.41 2.26 2.14 2.05 1.98 1.92 1.87 1.83 1.80 1.74 1.69 1.62 1.57 1.52 1.45 1.42 1.35 1.32 1.26 1.22 1.19
6.76 4.71 3.88 3.41 3.11 2.90 2.73 2.60 2.50 2.41 2.34 2.28 2.17 2.09 1.97 1.88 1.79 1.69 1.62 1.53 1.48 1.39 1.33 1.28
400 3.86 3.02 2.62 2.39 2.23 2.12 2.03 1.96 1.90 1.85 1.81 1.78 1.72 1.67 1.60 1.54 1.49 1.42 1.38 1.32 1.28 1.22 1.16 1.13
6.70 4.66 3.83 3.36 3.06 2.85 2.69 2.55 2.46 2.37 2.29 2.23 2.12 2.04 1.92 1.84 1.74 1.64 1.57 1.47 1.42 1.32 1.24 1.19
1000 3.85 3.00 2.61 2.38 2.22 2.10 2.02 1.95 1.89 1.84 1.80 1.76 1.70 1.65 1.58 1.53 1.47 1.41 1.36 1.30 1.26 1.19 1.13 1.08
6.66 4.62 3.80 3.34 3.04 2.82 2.66 2.53 2.43 2.34 2.26 2.20 2.09 2.01 1.89 1.81 1.71 1.61 1.54 1.44 1.38 1.28 1.19 1.11
∞3.84 2.99 2.60 2.37 2.21 2.09 2.01 1.94 1.88 1.83 1.79 1.75 1.69 1.64 1.57 1.52 1.46 1.40 1.35 1.28 1.24 1.17 1.11 1.00
6.64 4.60 3.78 3.32 3.02 2.80 2.64 2.51 2.41 2.32 2.24 2.18 2.07 1.99 1.87 1.79 1.69 1.59 1.52 1.41 1.36 1.25 1.15 1.00
Source: Reprinted by permission from George W. Snedecor and William G. Cochran, Statistical Methods, 6th ed., © 1967 by Iowa State University Press, Ames, IA.
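The two entries tabulated for each n are the upper 5% and 1% points of F(m, n) (e.g., 3.48 and 5.99 for m = 4, n = 10). A minimal sketch for reproducing such values, assuming SciPy is available; it is not part of the text:

```python
# Sketch: upper-tail F critical values like those in Table ST5 (assumes SciPy).
from scipy.stats import f

m, n = 4, 10                      # numerator / denominator degrees of freedom
for alpha in (0.05, 0.01):
    crit = f.ppf(1 - alpha, dfn=m, dfd=n)
    print(f"F_{alpha}({m}, {n}) = {crit:.2f}")   # prints 3.48 and 5.99
```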

Table ST6. Random Normal Numbers, μ = 0 and σ = 1
1 2 3 4 5 6 7 8 9 10
0.464 0.137 2.455 −0.323−0.068 0.290 −0.288 1.298 0.241 −0.957
0.060−2.526−0.531−0.194 0.543 −1.558 0.187 −1.190 0.022 0.525
1.486−0.354−0.634 0.697 0.926 1.375 0.785 −0.963−0.853−1.865
1.022−0.472 1.279 3.521 0.571 −1.851 0.194 1.192 −0.501−0.273
1.394−0.555 0.046 0.321 2.945 1.974 −0.258 0.412 0.439 −0.035
0.906−0.513−0.525 0.595 0.881 −0.934 1.579 0.161 −1.885 0.371
1.179−1.055 0.007 0.769 0.971 0.712 1.090 −0.631−0.255−0.702
−1.501−0.488−0.162−0.136 1.033 0.203 0.448 0.748 −0.423−0.432
−0.690 0.756 −1.618−0.345−0.511−2.051−0.457−0.218 0.857 −0.465
1.372 0.225 0.378 0.761 0.181 −0.736 0.960 −1.530−0.260 0.120
−0.482 1.678−0.057−1.229−0.486 0.856 −0.491−1.983−2.830−0.238
−1.376−0.150 1.356 −0.561−0.256−0.212 0.219 0.779 0.953 −0.869
−1.010 0.598 −0.918 1.598 0.065 0.415 −0.169 0.313 −0.973−1.016
−0.005−0.899 0.012 −0.725 1.147 −0.121 1.096 0.481 −1.691 0.417
1.393 1.163−0.911 1.231 −0.199−0.246 1.239 −2.574−0.558 0.056
−1.787−0.261 1.237 1.046 −0.508−1.630−0.146−0.392−0.627 0.561
−0.105−0.357−1.384 0.360 −0.992−0.116−1.698−2.832−1.108−2.357
−1.339 1.827 −0.959 0.424 0.969 −1.141−1.041 0.362 −1.726 1.956
1.041 0.535 0.731 1.377 0.983 −1.330 1.620 −1.040 0.524 −0.281
0.279−2.056 0.717 −0.873−1.096−1.396 1.047 0.089 −0.573 0.932
−1.805−2.008−1.633 0.542 0.250 −0.166 0.032 0.079 0.471 −1.029
−1.186 1.180 1.114 0.882 1.265 −0.202 0.151 −0.376−0.310 0.479
0.658−1.141 1.151 −1.210 0.927 0.425 0.290 −0.902 0.610 2.709
−0.439 0.358 −1.939 0.891 −0.227 0.602 0.873 −0.437−0.220−0.057
−1.399−0.230 0.385 −0.649−0.577 0.237 −0.289 0.513 0.738 −0.300
0.199 0.208−1.083−0.219−0.291 1.221 1.119 0.004 −2.015−0.594
0.159 0.272−0.313 0.084 −2.828−0.430−0.792−1.275−0.623−1.047
2.273 0.606 0.606 −0.747 0.247 1.291 0.063 −1.793−0.699−1.347
0.041−0.307 0.121 0.790 −0.584 0.541 0.484 −0.986 0.481 0.996
−1.132−2.098 0.921 0.145 0.446 −1.661 1.045 −1.363−0.586−1.023
0.768 0.079−1.473 0.034 −2.127 0.665 0.084 −0.880−0.579 0.551
0.375−1.658−0.851 0.234 −0.656 0.340 −0.086−0.158−0.120 0.418
−0.513−0.344 0.210 −0.736 1.041 0.008 0.427 −0.831 0.191 0.074
0.292−0.521 1.266 −1.206−0.899 0.110 −0.528−0.813 0.071 0.524
1.026 2.990−0.574−0.491−1.114 1.297 −1.433−1.345−3.001 0.479
−1.334 1.278 −0.568−0.109−0.515−0.566 2.923 0.500 0.359 0.326
−0.287−0.144−0.254 0.574 −0.451−1.181−1.190−0.318−0.094 1.114
0.161−0.886−0.921−0.509 1.410 −0.518 0.192 −0.432 1.501 1.068
−1.346 0.193 −1.202 0.394 −1.045 0.843 0.942 1.045 0.031 0.772
1.250−0.199−0.288 1.810 1.378 0.584 1.216 0.733 0.402 0.226
0.630−0.537 0.782 0.060 0.499 −0.431 1.705 1.164 0.884 −0.298
0.375−1.941 0.247 −0.491 0.665 −0.135−0.145−0.498 0.457 1.064

(Continued)
1 2 3 4 5 6 7 8 9 10
−1.420 0.489 −1.711−1.186 0.754 −0.732−0.066 1.006 −0.798 0.162
−0.151−0.243−0.430−0.762 0.298 1.049 1.810 2.885 −0.768−0.129
−0.309 0.531 0.416 −1.541 1.456 2.040 −0.124 0.196 0.023 −1.204
0.424−0.444 0.593 0.993 −0.106 0.116 0.484 −1.272 1.066 1.097
0.593 0.658−1.127−1.407−1.579−1.616 1.458 1.262 0.736 −0.916
0.862−0.885−0.142−0.504 0.532 1.381 0.022 −0.281−0.342 1.222
0.235−0.628−0.023−0.463−0.899−0.394−0.538 1.707 −0.188−1.153
−0.853 0.402 0.777 0.833 0.410 −0.349−1.094 0.580 1.395 1.298
Source:From tables of the RAND Corporation, by permission.
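Tables of random normal deviates such as this one predate routine simulation; comparable values can now be generated directly. A minimal sketch assuming NumPy (the seed and shape are arbitrary choices, not from the text):

```python
# Sketch: standard normal deviates (mu = 0, sigma = 1), as in Table ST6.
import numpy as np

rng = np.random.default_rng(2015)          # arbitrary seed for reproducibility
block = rng.standard_normal(size=(5, 10))  # 5 rows of 10 deviates
print(np.round(block, 3))
```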
Table ST7. Critical Values of the Kolmogorov–Smirnov One-Sample Test Statistic
a
One-Sided Test:
α=0.10 0.05 0.025 0.01 0.005 α= 0.10 0.05 0.025 0.01 0.005
Two-Sided Test:
α=0.20 0.10 0.05 0.02 0.01 α= 0.20 0.10 0.05 0.02 0.01
n=1 0.900 0.950 0.975 0.990 0.995 n=21 0.226 0.259 0.287 0.321 0.344
2 0.684 0.776 0.842 0.900 0.929 22 0.221 0.253 0.281 0.314 0.337
3 0.565 0.636 0.708 0.785 0.829 23 0.216 0.247 0.275 0.307 0.330
4 0.493 0.565 0.624 0.689 0.734 24 0.212 0.242 0.269 0.301 0.323
5 0.447 0.509 0.563 0.627 0.669 25 0.208 0.238 0.264 0.295 0.317
6 0.410 0.468 0.519 0.577 0.617 26 0.204 0.233 0.259 0.290 0.311
7 0.381 0.436 0.483 0.538 0.576 27 0.200 0.229 0.254 0.284 0.305
8 0.358 0.410 0.454 0.507 0.542 28 0.197 0.225 0.250 0.279 0.300
9 0.339 0.387 0.430 0.480 0.513 29 0.193 0.221 0.246 0.275 0.295
10 0.323 0.369 0.409 0.457 0.489 30 0.190 0.218 0.242 0.270 0.290
11 0.308 0.352 0.391 0.437 0.468 31 0.187 0.214 0.238 0.266 0.285
12 0.296 0.338 0.375 0.419 0.449 32 0.184 0.211 0.234 0.262 0.281
13 0.285 0.325 0.361 0.404 0.432 33 0.182 0.208 0.231 0.258 0.277
14 0.275 0.314 0.349 0.390 0.418 34 0.179 0.205 0.227 0.254 0.273
15 0.266 0.304 0.338 0.377 0.404 35 0.177 0.202 0.224 0.251 0.269
16 0.258 0.295 0.327 0.366 0.392 36 0.174 0.199 0.221 0.247 0.265
17 0.250 0.286 0.318 0.355 0.381 37 0.172 0.196 0.218 0.244 0.262
18 0.244 0.279 0.309 0.346 0.371 38 0.170 0.194 0.215 0.241 0.258
19 0.237 0.271 0.301 0.337 0.361 39 0.168 0.191 0.213 0.238 0.255
20 0.232 0.265 0.294 0.329 0.352 40 0.165 0.189 0.210 0.235 0.252
Approximation for n > 40: 1.07/√n  1.22/√n  1.36/√n  1.52/√n  1.63/√n
Source: Adapted by permission from Table 1 of Leslie H. Miller, Table of percentage points of Kolmogorov statistics, J. Am. Stat. Assoc. 51 (1956), 111–121.
a This table gives the values of D_{n,α}^{+} and D_{n,α} for which α ≥ P{D_{n}^{+} > D_{n,α}^{+}} and α ≥ P{D_n > D_{n,α}} for some selected values of n and α.
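The n > 40 row gives two-sided critical values of the form c_α/√n. A small sketch of that approximation, using only the constants printed in the table:

```python
# Sketch: large-sample (n > 40) Kolmogorov-Smirnov critical values from Table ST7.
from math import sqrt

C = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.02: 1.52, 0.01: 1.63}  # two-sided alpha
n = 100
for alpha, c in C.items():
    print(f"alpha = {alpha}: D_crit ~ {c / sqrt(n):.4f}")
```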

Table ST8. Critical Values of the Kolmogorov–Smirnov Test Statistic for Two Samples of
Equal Size
a
One-Sided Test:
α= 0.10 0.05 0.025 0.01 0.005 α=0.10 0.05 0.025 0.01 0.005
Two-Sided Test:
α= 0.20 0.10 0.05 0.02 0.01 α=0.20 0.10 0.05 0.02 0.01
n=3 2/3 2/3 n=20 6/20 7/20 8/20 9/20 10/20
4 3/4 3/4 3/4 21 6/21 7/21 8/21 9/21 10/21
5 3/5 3/5 4/5 4/5 4/5 22 7/22 8/22 8/22 10/22 10/22
6 3/6 4/6 4/6 5/6 5/6 23 7/23 8/23 9/23 10/23 10/23
7 4/7 4/7 5/7 5/7 5/7 24 7/24 8/24 9/24 10/24 11/24
8 4/8 4/8 5/8 5/8 6/8 25 7/25 8/25 9/25 10/25 11/25
9 4/9 5/9 5/9 6/9 6/9 26 7/26 8/26 9/26 10/26 11/26
10 4/10 5/10 6/10 6/10 7/10 27 7/27 8/27 9/27 11/27 11/27
11 5/11 5/11 6/11 7/11 7/11 28 8/28 9/28 10/28 11/28 12/28
12 5/12 5/12 6/12 7/12 7/12 29 8/29 9/29 10/29 11/29 12/29
13 5/13 6/13 6/13 7/13 8/13 30 8/30 9/30 10/30 11/30 12/30
14 5/14 6/14 7/14 7/14 8/14 31 8/31 9/31 10/31 11/31 12/31
15 5/15 6/15 7/15 8/15 8/15 32 8/32 9/32 10/32 12/32 12/32
16 6/16 6/16 7/16 8/16 9/16 34 8/34 10/34 11/34 12/34 13/34
17 6/17 7/17 7/17 8/17 9/17 36 9/36 10/36 11/36 12/36 13/36
18 6/18 7/18 8/18 9/18 9/18 38 9/38 10/38 11/38 13/38 14/38
19 6/19 7/19 8/19 9/19 9/19 40 9/40 10/40 12/40 13/40 14/40
Approximation for n > 40: 1.52/√n  1.73/√n  1.92/√n  2.15/√n  2.30/√n
Source:Adapted by permission from Tables 2 and 3 of Z. W. Birnbaum and R. A. Hall, Small sample distributions
for multisample statistics of the Smirnov type,Ann. Math. Stat. 31 (1960), 710–720.
a This table gives the values of D_{n,n,α}^{+} and D_{n,n,α} for which α ≥ P{D_{n,n}^{+} > D_{n,n,α}^{+}} and α ≥ P{D_{n,n} > D_{n,n,α}} for some selected values of n and α.

Table ST9. Critical Values of the Kolmogorov–Smirnov Test Statistic for Two Samples of
Unequal Size
a
One-Sided Test: α=0.10 0.05 0.025 0.01 0.005
Two-Sided Test: α=0.20 0.10 0.05 0.02 0.01
N1=1 N2=9 17/18
10 9/10
N1=2 N2=3 5/6
4 3/4
5 4/5 4/5
6 5/6 5/6
7 5/7 6/7
8 3/4 7/8 7/8
9 7/9 8/9 8/9
10 7/10 4/5 9/10
N1=3 N2=4 3/4 3/4
5 2/3 4/5 4/5
6 2/3 2/3 5/6
7 2/3 5/7 6/7 6/7
8 5/8 3/4 3/4 7/8
9 2/3 2/3 7/9 8/9 8/9
10 3/5 7/10 4/5 9/10 9/10
12 7/12 2/3 3/4 5/6 11/12
N1=4 N2=5 3/5 3/4 4/5 4/5
6 7/12 2/3 3/4 5/6 5/6
7 17/28 5/7 3/4 6/7 6/7
8 5/8 5/8 3/4 7/8 7/8
9 5/9 2/3 3/4 7/9 8/9
10 11/20 13/20 7/10 4/5 4/5
12 7/12 2/3 2/3 3/4 5/6
16 9/16 5/8 11/16 3/4 13/16
N1=5 N2=6 3/5 2/3 2/3 5/6 5/6
7 4/7 23/35 5/7 29/35 6/7
8 11/20 5/8 27/40 4/5 4/5
9 5/9 3/5 31/45 7/9 4/5
10 1/2 3/5 7/10 7/10 4/5
15 8/15 3/5 2/3 11/15 11/15
20 1/2 11/20 3/5 7/10 3/4

Table ST9. (Continued)
One-Sided Test: α=0.10 0.05 0.025 0.01 0.005
Two-Sided Test: α=0.20 0.10 0.05 0.02 0.01
N1=6 N2=7 23/42 4/7 29/42 5/7 5/6
8 1/2 7/12 2/3 3/4 3/4
9 1/2 5/9 2/3 13/18 7/9
10 1/2 17/30 19/30 7/10 11/15
12 1/2 7/12 7/12 2/3 3/4
18 4/9 5/9 11/18 2/3 13/18
24 11/24 1/2 7/12 5/8 2/3
N1=7 N2=8 27/56 33/56 5/8 41/56 3/4
9 31/63 5/9 40/63 5/7 47/63
10 33/70 39/70 43/70 7/10 5/7
14 3/7 1/2 4/7 9/14 5/7
28 3/7 13/28 15/28 17/28 9/14
N1=8 N2=9 4/9 13/24 5/8 2/3 3/4
10 19/40 21/40 23/40 27/40 7/10
12 11/24 1/2 7/12 5/8 2/3
16 7/16 1/2 9/16 5/8 5/8
32 13/32 7/16 1/2 9/16 19/32
N1=9 N2=10 7/15 1/2 26/45 2/3 31/45
12 4/9 1/2 5/9 11/18 2/3
15 19/45 22/45 8/15 3/5 29/45
18 7/18 4/9 1/2 5/9 11/18
36 13/36 5/12 17/36 19/36 5/9
N1=10 N2=15 2/5 7/15 1/2 17/30 19/30
20 2/5 9/20 1/2 11/20 3/5
40 7/20 2/5 9/20 1/2
N1=12 N2=15 23/60 9/20 1/2 11/20 7/12
16 3/8 7/16 23/48 13/24 7/12
18 13/36 5/12 17/36 19/36 5/9
20 11/30 5/12 7/15 31/60 17/30
N1=15 N2=20 7/20 2/5 13/30 29/60 31/60
N1=16 N2=20 27/80 31/80 17/40 19/40 41/80
Large-sample approximation: 1.07√((m+n)/mn)  1.22√((m+n)/mn)  1.36√((m+n)/mn)  1.52√((m+n)/mn)  1.63√((m+n)/mn)
Source:Adapted by permission from F. J. Massey, Distribution table for the deviation between two sample
cumulatives,Ann. Math. Stat.23 (1952), 435–441.
a This table gives the values of D_{m,n,α}^{+} and D_{m,n,α} for which α ≥ P{D_{m,n}^{+} > D_{m,n,α}^{+}} and α ≥ P{D_{m,n} > D_{m,n,α}} for some selected values of N1 = smaller sample size, N2 = larger sample size, and α.
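For large samples the two-sample critical value takes the form c_α·√((m+n)/(mn)). A small sketch using the constants printed above (the sample sizes are illustrative):

```python
# Sketch: large-sample two-sample Kolmogorov-Smirnov critical values (Table ST9).
from math import sqrt

C = {0.20: 1.07, 0.10: 1.22, 0.05: 1.36, 0.02: 1.52, 0.01: 1.63}  # two-sided alpha
m, n = 60, 80                                                     # illustrative sizes
for alpha, c in C.items():
    print(f"alpha = {alpha}: D_crit ~ {c * sqrt((m + n) / (m * n)):.4f}")
```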

Table ST10. Critical Values of the Wilcoxon Signed-Ranks Test Statistic
a
α
n 0.01 0.025 0.05 0.10
3 6 6 6 6
4 10 10 10 9
5 15 15 14 12
6 21 20 18 17
7 27 25 24 22
8 34 32 30 27
9 41 39 36 34
10 49 46 44 40
11 58 55 52 48
12 67 64 60 56
13 78 73 69 64
14 89 84 79 73
15 100 94 89 83
16 112 106 100 93
17 125 118 111 104
18 138 130 123 115
19 152 143 136 127
20 166 157 149 140
Source:Adapted by permission from Table 1 of R. L. McCornack, Extended tables of the Wilcoxon matched
pairs signed-rank statistics,J. Am. Stat. Assoc.60 (1965), 864–871.
a This table gives values of t_α for which P{T^{+} > t_α} ≤ α for selected values of n and α. Critical values in the lower tail may be obtained by symmetry from the equation t_{1−α} = n(n+1)/2 − t_α.
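The symmetry relation t_{1−α} = n(n+1)/2 − t_α converts the tabulated upper-tail values into lower-tail critical values; a small sketch using the n = 20 row of the table:

```python
# Sketch: lower-tail Wilcoxon signed-rank critical values via the symmetry relation.
n = 20
upper = {0.01: 166, 0.025: 157, 0.05: 149, 0.10: 140}     # n = 20 row of Table ST10
lower = {alpha: n * (n + 1) // 2 - t for alpha, t in upper.items()}
print(lower)   # {0.01: 44, 0.025: 53, 0.05: 61, 0.10: 70}
```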

Table ST11. Critical Values of the Mann–Whitney–Wilcoxon Test Statistic
a
m   α   n = 2  3  4  5  6  7  8  9  10
2 0.01 4 6 8 10 12 14 16 18 20
0.025 4 6 8 10 12 14 15 17 19
0.05 4 6 8 9 11 13 14 16 18
0.10 4 5 7 8 10 12 13 15 16
3 0.01 9 12 15 18 20 23 25 28
0.025 9 12 14 16 19 21 24 26
0.05 8 11 13 15 18 20 22 25
0.10 7 10 12 14 16 18 21 23
4 0.01 16 19 22 26 29 32 36
0.025 15 18 21 24 27 31 34
0.05 14 17 20 23 26 29 32
0.10 12 15 18 21 24 26 29
5 0.01 23 27 31 35 39 43
0.025 22 26 29 33 37 41
0.05 20 24 28 31 35 38
0.10 19 22 26 29 32 36
6 0.01 32 37 41 46 51
0.025 30 35 39 43 48
0.05 28 33 37 41 45
0.10 26 30 34 38 42
7 0.01 42 48 53 58
0.025 40 45 50 55
0.05 37 42 47 52
0.10 35 39 44 48
8 0.01 54 60 66
0.025 50 56 62
0.05 48 53 59
0.10 44 49 55
9 0.01 66 73
0.025 63 69
0.05 59 65
0.10 55 61
10 0.01 80
0.025 76
0.05 72
0.10 67
Source:Adapted by permission from Table 1 of L. R. Verdooren, Extended tables of critical values for Wilcoxon’s
test statistic,Biometrika50 (1963), 177–186, with the kind permission of Professor E. S. Pearson, the author,
and theBiometrikaTrustees.
a This table gives values of u_α for which P{U > u_α} ≤ α for some selected values of m, n, and α. Critical values in the lower tail may be obtained by symmetry from the equation u_{1−α} = mn − u_α.
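Likewise, u_{1−α} = mn − u_α turns the tabulated upper-tail values into lower-tail ones; a small sketch using the m = n = 10 entries:

```python
# Sketch: lower-tail Mann-Whitney-Wilcoxon critical values via u_{1-alpha} = mn - u_alpha.
m, n = 10, 10
upper = {0.01: 80, 0.025: 76, 0.05: 72, 0.10: 67}   # m = 10, n = 10 entries of Table ST11
lower = {alpha: m * n - u for alpha, u in upper.items()}
print(lower)   # {0.01: 20, 0.025: 24, 0.05: 28, 0.10: 33}
```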

Table ST12. Critical Points of Kendall’s Tau Test Statistic
a
α
n 0.100 0.050 0.025 0.01
3 3 3 3 3
4 4 4 6 6
5 6 6 8 8
6 7 9 11 11
7 9 11 13 15
8 10 14 16 18
9 12 16 18 22
10 15 19 21 25
Source:Adapted by permission from Table 1, p. 173, of M. G. Kendall,Rank Correlation Methods,3rded.,
Griffin, London, 1962. For values ofn≥11, see W. J. Conover,Practical Nonparametric Statistics, John Wiley,
New York, 1971, p. 390.
a This table gives the values of S_α for which P{S > S_α} ≤ α, where S = \binom{n}{2}T, for some selected values of α and n. Values in the lower tail may be obtained by symmetry, S_{1−α} = −S_α.
Table ST13. Critical Values of Spearman’s Rank Correlation Statistic
a
α
n 0.01 0.025 0.05 0.10
3 1.000 1.000 1.000 1.000
4 1.000 1.000 0.800 0.800
5 0.900 0.900 0.800 0.700
6 0.886 0.829 0.771 0.600
7 0.857 0.750 0.679 0.536
8 0.810 0.714 0.619 0.500
9 0.767 0.667 0.583 0.467
10 0.721 0.636 0.552 0.442
Source:Adapted by permission from Table 2, pp. 174–175, of M. G. Kendall,Rank Correlation Methods,3rd
ed., Griffin, London, 1962. For values ofn≥11, see W. J. Conover,Practical Nonparametric Statistics, John
Wiley, New York, 1971, p. 391.
a This table gives the values of R_α for which P{R > R_α} ≤ α for some selected values of n and α. Critical values in the lower tail may be obtained by symmetry, R_{1−α} = −R_α.
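The statistic R referred to here is Spearman's rank correlation, which is easy to compute directly. A minimal sketch assuming SciPy is available; the data are illustrative and not from the text:

```python
# Sketch: Spearman's rank correlation, the statistic whose critical values Table ST13 gives.
from scipy.stats import spearmanr

x = [86, 71, 77, 68, 91, 72, 77, 91, 70, 71]   # illustrative paired observations
y = [88, 77, 76, 64, 96, 72, 65, 90, 65, 80]
rho, p_value = spearmanr(x, y)
print(round(rho, 3), round(p_value, 4))
```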

ANSWERS TO SELECTED PROBLEMS
Problems 1.3
1. (a) Yes; (b) yes; (c) no. 2. (a) Yes; (b) no; (c) no.
6. (a) 0.9; (b) 0.05; (c) 0.95. 7. 1/16. 8. 1/3 + (2/9) ln 2 = 0.487.
Problems 1.4
3. \binom{R}{n}\binom{W}{n−r} / \binom{N}{n}
4. 352146 5. (n−k+1)!/n!
6. 1 − 7P5/7^5. 8. \binom{n+k−r}{n−r} / \binom{n+k}{k}
9. 1 − \sum_{i=1}^{n−k} \binom{2i}{i} / \binom{2n}{n−k}
12. (a) 4/\binom{52}{5}; (b) 9(4)/\binom{52}{5}; (c) 13\binom{48}{1}/\binom{52}{5}; (d) 13\binom{4}{3} 12\binom{4}{2}/\binom{52}{5};
(e) [4\binom{13}{5} − 9(4) − 4]/\binom{52}{5}; (f) [10(4)^5 − 4 − 9(4)]/\binom{52}{5}; (g) 13\binom{12}{2}\binom{4}{3}4^2/\binom{52}{5};
(h) \binom{13}{2}\binom{4}{2}\binom{4}{2}\binom{44}{1}/\binom{52}{5}; (i) \binom{13}{1}\binom{4}{3}\binom{12}{3}4^3/\binom{52}{5}.
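A quick numerical check (not part of the text) of the standard poker-hand counts in parts (a)–(h) above, with math.comb supplying the binomial coefficients:

```python
# Sketch: evaluating the combinatorial answers (a)-(h) to Problem 12.
from math import comb

deck = comb(52, 5)
counts = {
    "(a)": 4,
    "(b)": 9 * 4,
    "(c)": 13 * comb(48, 1),
    "(d)": 13 * comb(4, 3) * 12 * comb(4, 2),
    "(e)": 4 * comb(13, 5) - 9 * 4 - 4,
    "(f)": 10 * 4**5 - 4 - 9 * 4,
    "(g)": 13 * comb(12, 2) * comb(4, 3) * 4**2,
    "(h)": comb(13, 2) * comb(4, 2) * comb(4, 2) * comb(44, 1),
}
for part, c in counts.items():
    print(part, c, round(c / deck, 6))
```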

Problems 1.5
3. (pb)^r \sum_{k=0}^{∞} \binom{r+k}{k}[p(1−b)]^k    4. p/(2−p)
5. \sum_{j=0}^{N}(j/N)^{n+1} / \sum_{j=0}^{N}(j/N)^{n} ≈ (n+1)/(n+2) for large N. 6. n = 4
10.r/(r+g) 11. (a) 1/4; (b) 1/3 12. 0.08
13. (a) 173/480 (b) 108/173; 15/173 14. 0.0872
Problems 1.6
1. 1/(2−p); (1−p)/(2−p). 4. p^2(1−p)^2[3−7p(1−p)]
12. For any two disjoint intervals I_1, I_2 ⊆ (a, b), Φ(I_1)Φ(I_2) = (b−a)Φ(I_1 ∩ I_2), where Φ(I) = length of interval I.
13. (a) p_n = 8/36 if n = 1,
= 2(27/36)^{n−2}(3/36)^2 + 2(26/36)^{n−2}(4/36)^2 + 2(25/36)^{n−2}(5/36)^2, n ≥ 2
(b) 22/45
(c) 12/36; 2(27/36)^{n−2}(9/36)(3/36) + 2(26/36)^{n−2}(10/36)(4/36) + 2(25/36)^{n−2}(11/36)(5/36) for n = 2, 3, ....
Problems 2.2
3. Yes; yes
4.φ;{(1,1,1,1,2),(1,1,1,2,1),(1,1,2,1,1),(1,2,1,1,1),(2,1,1,1,1)};{(6,6,6,6,6)};
{(6,6,6,6,6),(6,6,6,6,5),(6,6,6,5,6),(6,6,5,6,6),(6,5,6,6,6),(5,6,6,6,6)}
5. Yes;(1/4,1/2)∪(3/4,1)
Problems 2.3
1. x: 0 1 2 3; P(X = x): 1/8 3/8 3/8 1/8
F(x) = 0, x < 0; = 1/8, 0 ≤ x < 1; = 1/2, 1 ≤ x < 2; = 7/8, 2 ≤ x < 3; = 1, x ≥ 3
3. (a) Yes; (b) yes; (c) yes; yes
Problems 2.4
1. (1−p)^{n+1} − (1−p)^{N+1}, N ≥ n
2. (b) 1/[π(1+x^2)]; (c) 1/x^2; (d) e^{−x}
3. Yes; F_θ(x) = 0, x ≤ 0; = 1 − e^{−θx} − θxe^{−θx} for x > 0; P(X ≥ 1) = 1 − F_θ(1)
4. Yes; F(x) = 0, x ≤ 0; = 1 − (1 + x/(θ+1))e^{−x/θ} for x > 0
6. F(x) = e^x/2 for x ≤ 0, = 1 − e^{−x}/2 for x > 0
8. (c), (d), and (f)
9. Yes; (a) 1/2 for 0 < x < 1, 1/4 for 2 < x < 4; (b) 1/(2θ), |x| ≤ θ; (c) xe^{−x}, x > 0; (d) (x−1)/4 for 1 ≤ x < 3, and P(X = 3) = 1/2; (e) 2xe^{−x^2}, x > 0
10. If S(x) = 1 − F(x) = P(X > x), then S′(x) = −f(x)

Problems 2.5
2.X
d
=1/X
4.θ[1−exp(− 2πθ)]
ψ
1−y
2

e
−θarc cosy
+e
−2πθ+θ arc cosy

,|y|≤1;

θexp{−θarctanz}[(1+z
2
)(1−e
−θπ
]
−1
,z>0
θexp{−πθ−arctanz}[(1+z
2
)(1−e
−θπ
)]
−1
,z<0
10.f
|X|(y)=2/3for0<y<1,=1/3for1<y<2
12. (a) 0, y<0;F(0)for−1≤y<1, and 1 fory≥1;
(b)=0ify<−b,=F(−b)ify=−b,=F(y)if−b≤y<b,=1ify≥b;
(c)=F(y)ify<−b,=F(−b)if−b≤y<0,=F(b)if 0≤y<b,=F(y).
ify≥b.
Problems 3.2
3.EX
2r
=0if2r<2m−1 is an odd integer,
=
Γ

m−r+
1
2

Γ

r+
1
2

Γ

1
2

Γ

m−1
2
if 2r<2m−1 is an even integer
9. z_p = a(1−v)/v, where v = (1−p)^{1/k}
10. Binomial: α_3 = (q−p)/√(npq), α_4 = 3 + (1−6pq)/(npq)
Poisson: α_3 = λ^{−1/2}, α_4 = 3 + 1/λ.
Problems 3.3
1. (b) e^{−λ}(e^{λs} − 1)/(1 − e^{−λ}); (c) p[1 − (qs)^{N+1}]/[(1 − qs)(1 − q^{N+1})], s < 1/q.
6. f(θs)/f(θ); f(θe^t)/f(θ).
Problems 3.4
3. For any σ^2 > 0 take P(X = x) = σ^2/(σ^2 + x^2), P(X = −σ^2/x) = x^2/(σ^2 + x^2), x ≠ 0.
5.P

X
2
=
σ
4
K
2
−μ4
K
2
σ
2
−σ
2

=
σ
4
[K
2
−1]
2μ4+K
4
σ
4
−2K
2
σ
41<K<

2
P(X
2
=K
2
σ
2
)=
μ4−σ
4
μ4+K
4
σ
4
−2K
2
σ
4.
Problems 4.2
1. No 4. 1/6; 0. 7. Marginals negative binomial, so also conditionals.
8. h(y|x) = (1/2)(c^2 + x^2)/(c^2 + x^2 + y^2)^{3/2}.
9. X ∼ B(p_1, p_2 + p_3); Y/(1−x) ∼ B(p_2, p_3).
10. X ∼ G(α, 1/β), Y ∼ G(α+γ, 1/β), X/y ∼ B(α, γ), Y − x ∼ G(γ, 1/β).
14. P(X ≤ 7) = 1 − e^{−7}. 15. 1/24; 15/16. 17. 1/6.
Problems 4.3
3. No; Yes; No. 10.=1−a/(2b) ifa<b,=b/(2a) ifa>b.
11.λ/(λ+μ);1/2.
Problems 4.4
2. (b)f
V|U(v|u)= 1/(2u), |v|<u,u>0.
6.P(X=x,M=m)=π (1−π)
m
[1−(1−π)
m+1
]ifx=m,=π
2
(1−π)
m+x
ifx<m.P(M=m)=2π(1−π)
m
−π(2−π)(1−π)
2m
,m≥0.

7.fX(x)=λ
k
e
−λ
/k!,k≤x<k+1,k=0,1,2,...
11.f
U(u)=3u
2
/(1+u)
4
,u>0.
13. (a)F
U,V(u,v)=

1−exp


u
2

2

π+2v


ifu>0,|v|≤π/2,
=1−exp[1−u
2
/(2σ
2
)]ifu>0,v>π/2,=0 elsewhere.
(b)f(u,v)=
1

π
e
−u
2
v
1/2− 1
e
−v/2
Γ(1/2)

2
.
Problems 4.5
2.EX
k
Y
Φ
=
2
Φ+1
(k+3)(Φ+1)
+
2
Φ+2
3(k+2)(Φ+2)
.3. cov(X ,Y)=0;X,Ydependent.
15.M
U,V(u,v)=(1−2v)
−1
exp{u
2
/(1−2v)}forv<1/2;ρ(U,V)=0; no.
18.ρ
Z,W=(σ
2
2
−σ
2
1
)sinθcosθ/
ψ
var(Z)var(W).
21. IfUhas PDFf, thenEX
m
=EU
m
/(m+1)form≥0;ρ=
1
2

EU
2
8
3
var(U)+
2
3
(EU)
2.
Problems 4.6
1.μ+σ

f

a−μ
σ

−f

b−μ
σ



b−μ
σ

−Φ

a−μ
σ

whereΦis the standard normal DF.
2. (a) 2(1 +X).3. E{X|y}=μ
1+ρ
σ1
σ2
(y−μ 2).4. E(var{Y|X}).
6. 4/9. 7. (a) 1; (b) 1/4. 8.x
k
/(k+1);1/(1+k)
2
.
Problems 4.7
5. (a)
Φ
n
Δ
j=1
1/j
Γ
/β;(b)
n
n+1
.
Problems 5.2
5.FY(y)=

y
M

π

N
M

,P(Y=y)=

y−1
M−1

π

N
M

,y≥M+1, and
P(Y=M)=1
πγN
M

.P(x
1,...,x m|Y=y)=
(y−m)!
(y−1)!M
,0<x i≤y,
i=1,...,j,x
i =xjfori =j.
9.P(Y
1=x)=qp
x
+pq
x
,x≥1.P(Y 2=x)=p
2
q
x−1
+q
2
p
x−1
,x≥1
P(Y
n=x)=P(Y 1=x)fornodd;=P(Y 2=x)forneven.
Problems 5.3
2. (a)P

F(X)=

x
k=0
n
k

p
k
(1−p)
n−k

=
n
x

p
x
(1−p)
n−x
,x=0,1,...,n.
13.C
Φ

n
i=1
|a i|
a
2
i
+b
2
i,

n
i=1
b
i
a
2
i
+b
2
i
Γ
22.X/|Y|∼C(1,0);(2/π)(1+z
2
)
−1
,0<z<∞.
27. (a)t/α
2
;(c)=0ift≤θ,=α/tift>θ;(d)(α/β)t
α−1
.
29. (b) 1/(2

π);1/2.

Problems 5.4
1. (a)μ 1=4;μ 2=15/4,ρ=−3/4; (b)N

6−
9
16
x,
63
16

; (c) 0.3191.
4.BN(aμ
1+b,cμ 2+d,a
2
σ
2
1
,c
2
σ
2
2
,ρ).6. tan
2
θ=EX
2
/EY
2
.7. σ
2
1

2
2
.
Problems 6.2
1. P(X̄ = 0) = P(X̄ = 1) = 1/8, P(X̄ = 1/3) = P(X̄ = 2/3) = 3/8;
P(S² = 0) = 1/4, P(S² = 1/3) = 3/4.
2. x: 1 1.5 2 2.5 3 3.5 4 4.5 5 5.5 6
p(x): 1/36 2/36 3/36 4/36 5/36 6/36 5/36 4/36 3/36 2/36 1/36.
Problems 6.3
1.{F(min(x ,y))−F(x)F(y)}/n.
6.E(S
2
)
k
=
σ
2
(n−1)
k(n−1)(n+2)···(n+2k−3),k≥1.
9. (a)P(X=t)=e
−nλ
(nλ)
tn
/(tn)!, t=0,1/n,2/n,...;(b)C(1,0);
(c)Γ(nm/2, 2/n). 10. (b) 2/

αn;3+6/(αn).
11. 0,1,0,E(Xn−0.5)
4
/(144n
2
). 12.var(S
2
)=
1
n

λ+
2nλ
2
n−1

>var(X).
Problems 6.4
2.n(m+δ)/[m(n −2)];2n
2
{(m+δ)
2
+(n−2)(m+2δ)}/[m
2
(n−2)
2
(n−4)].
3.δ
ψ
n
2
Γ(
n−1
2
)
Γ(
n
2
)
,n>1;
n
n−2
(1+δ
2
)−

δ
ψ
n
2
Γ(
n−1
2
)
Γ(
n
2
)

2
,n>2.
11. 2m
m/2
n
n/2
(n+me
2z
)
−(m+n)/2
e
zm
/B

m 2
,
n
2

,−∞<z<∞.
Problems 6.5
1.t(n−1)2.t(m+n−2)3.


2
n−1

k
Γ

n−1
2
+k
ρπ
Γ

n−1
2

.
Problems 6.6
3.[2π(1−ρ
2
)]
−1/2

1+
y
2
1
+y
2
2
−2ρy 1y2
n(1−ρ
2
)

−(
n
2
+1)
; both∼t(n).
4.

n−1T∼t(n−1).
Problems 7.2
1. No. 2. Yes
3.Y
n→Y∼F(y)=0ify<0,=1−e
−y/θ
ify≥0.
4.F(y)=0ify≤0,=1−e
−y
ify>0.
9.C(1,0)12. No
13. (a)exp(− x
−α
),x>0;EX
k
=Γ(1−k/α),k<α.
(b)exp(− e
−x
),−∞<x<∞;M(t)=Γ(1−t),t<1.
(c)exp{−(−x)
α
},x<0;EX
k
=(−1)
k
Γ(1+k/α),k>−α.
20. (a) Yes; No (b) Yes; No.
Problems 7.3
3. Yes;A n=n(n+1)μ/2,B n=σ
ψ
n(n+1)(2n+1)/6
5. (a)M
n(t)→0asn→∞; no. (b)M n(t)diverges asn→∞
(c) Yes (d) Yes (e)M
n→e
t
2
/4
; no.

Problems 7.4
1. (a) No; (b) No. 2. No. 3. For α<1/2. 7. (a) Yes; (b) No.
Problems 7.5
4. Degenerate atβ. 5. Degenerate at 0.
6. Forρ≥0,N(0,

ρ), and forρ<0,S n/n
L
−→degenerate.
Problems 7.6
1. (b) No; (c) Yes; (d) No.
2.N(0,1).3.N(0,σ
2

2
). 4. 163. 8. 0.0926; 1.92
Problems 7.7
1. (a)AN(μ
2
,4μ
2
σ
2
n
)forμ =0,
X
2

2
n
L
−→χ
2
(1)forμ=0,σ
2
n

2
/n.
(b) Forμ =0, 1/
X∼AN(1/μ,σ
2
n

4
);forμ =0,σ n/
Xn
L−→1/N(0,1).
(c) Forμ =0,Φn|
X|∼AN(Φn|μ|,σ
2
n

2
);forμ =0,Φn(|
X|/σn)
L
−→Φn|N(0,1)|.
(d)AN(e
μ
,e

σ
2
n
).
2.c=1/2 and

X∼AN(

λ,1/4).
Problems 8.3
2. No. 7.f θ2
(x)/f θ1
(x). 9. No. 10. No.
11. (b)X
(n);(e)(
X,S
2
);(g)
Φ
n

1
Xi,
n

1
(1−X i)
α
(h)X(
(1),X
(2),...,X
(n)).
Problems 8.4
2.

n−1
2



n−1
2

Γ

n+p−1
2
S
p
;

n−1
2

p/2Γ

n+p−1
2

Γ

n+2p−1
2
S.
3.S
2
1
=
n−1
n+1
S
2
;var(S
2
1
)=

n−1
n+1

2
2σ 4
n−1
<var(S
2
)=

4
n−1
; 4. No; 5. No.
6. (a)
Φ
n−s
t−s
α
/
Φ
n
t
α
,0≤s≤t≤n,t=

n
1
xi;(b)=
Φ
s
t
α
π
Φ
n
t
α
if 0≤t<s,
=2/
Φ
n
t
α
ift=s, and
Φ
n−s
t−s
α
π
Φ
n
t
α
ifs+1≤t≤n.
9.
Φ
t+n−2
t
α
π
Φ
t+n−1
t
α
,t=Σx
i. 11. (a)NX/n;(b)No.
12.t=Σ
n
1
xi,1−

1−
t0
t

n−1
ift>t 0, and 1 ift≤t 0.
13. (a) Witht=

n
1
xj,

t
j=0
t!
j!
n
j−t
;(b)
t!
(t−s)!
n
−s
,t≥s(c)(1−1/n)
t
;
(d)(1−1/n)
t−1
[1+
t−1
n
].
14. Witht=x
(n),[t
n
ψ(t)−(t−1)
n
ψ(t−1)]/[t
n
−(t−1)
n
],t≥1.
15. Witht=

n
1
xj,
Φ
t
k
α

1
n

k
1−
1n

t−k
.

Problems 8.5
1. (a), (c), (d) Yes; (b) No. 2. 0.64761/n
2
.
3.n
−1
sup
xθ=0
{x
2
/[e
x
2
−1]}.5.2 θ(1−θ)/n
Problems 8.6
2.
ˆ
β=(n−1)S
2
/(n
X),ˆα=X/
ˆ
β3.ˆμ=X,ˆσ
2
=(n−1)S
2
/n.
4.ˆα=X(X−X
2
)[X
2
−X
2
]
−1
,X
2
=

n
1
X
2
i
/n
ˆ
β=(1−
X)(X−X
2
)[X
2
−X
2
]
−1
.
5.ˆμ=∞n{X
2
/[X
2
]
1/2
},ˆσ
2
=∞n{X
2
/X
2
},X
2
=

n
1
X
2
i
/n.
Problems 8.7
1. (a) med(X j);(b)X
(1);(c)n/

n
1
X
α
j
;(d)−n/

n
1
∞n(1−X j).
2. (a)X/n;(b)
ˆ
θ
n=1/2if
X≤1/2,=Xif 1/2≤X≤3/4,=3/4ifX≥3/4;
(c)
ˆ
θ=

ˆ
θ
0,if
X≥0
ˆ
θ
1,if
X≤0
where
ˆ
θ
0=−
X
2
+

X
2
+(
X
2
)
2
,
ˆ
θ
1=−
X
2


X
2
+(
X
2
)
2
,X
2
=

X
2
1
/n;
(d)
ˆ
θ=
n3
n1+n3
ifn1,n3>0; = any value in (0,1) ifn 1=n3=0;
no mle ifn
1=0,n 3 =0; no mle ifn 1 =0,n 3=0;
(e)
ˆ
θ=−
1
2
+
1
2
ψ
1+4X
2
; (f)
ˆ
θ=X.
3.ˆμ=−Φ
−1
(m/n).
4. (a)ˆα=X
(1),
ˆ
β=

n 1
(Xi−ˆα)/n;(b) Δ=P α,β(X1≥1)=e
(α−1)β
α≤1,
=1,α≥1.
ˆ
Δ=1if ˆα≥1,=exp{(ˆα−1)/
ˆ
β}ifˆα<1.
5.
ˆ
θ=1/
X.6.ˆμ=Σ∞nX i/n,ˆσ
2
=

n 1
(∞nX i−ˆμ)
2
/n.
8. (a)ˆN=
M+1
M
X
(M)−1; (b)X
(M).
9.ˆμ
i=

n j=1
Xij/n=
Xi,i=1,2,...,sˆσ
2
= ΣΣ(X ij−Xi)
2
/(ns).
11.ˆμ=X, 13.d(
ˆ
θ)=(X/n)
2
. 15.ˆμ=max(X,0).
16.ˆp
j=Xj/n,j=1,2,...,k−1.
Problems 8.8
2. (a)(Σx i+1)/(n+1);(b)

n+1
n+2

Σxi+1
.3.X.5. X/n.
6.(X+1)(X+n)/[(n +2)(n+3)].8. (α+n)max(a,X
(n))/(α+n−1).
Problems 8.9
5. (c)(n+2)

(X
(n)/2)
−(n+1)
−(X
(1))
−(n+1)

/{(n+1)[(X
(n)/2)
−(n+2)
−(X
(1))
−(n+2)
]}
10.(ΣX
i)
k
Γ(n+k)/Γ(n +2k)
Problems 9.2
1. 0.019, 0.857. 2.k=μ 0+σz α/

n;1−Φ

z α−
μ1−μ0
σ

n

.
5.exp(− 2);exp(−2/θ),θ≥1.

Problems 9.3
1. φ(x) = 1 if x < θ_0(1 − √(1−α)), = 0 otherwise.
4. φ(x) = 1 if ||x| − 1| > k. 5. φ(x) = 1 if x_(1) > c = θ_0 − ln(α^{1/n}).
11. If θ_0 < θ_1, φ(x) = 1 if x_(1) > θ_0 α^{−1/n}, and if θ_1 < θ_0, then φ(x) = 1 if x_(1) < θ_0(1 − α^{1/n})^{−1}.
12. φ(x) = 1 if x < √(α/2) or > 1 − √(α/2).
Problems 9.4
1. (a), (b), (c), (d) have MLR inΣX j; (e) and (f) in

n
1
Xj
4. Yes. 5. Yes; yes.
Problems 9.5
1.φ(x 1,x2)=1if|x 1−x2|>c,=0 otherwise,c=

2z
α/2.
2.φ(x)=1ifΣx
i>k. Choosekfromα=P λ0

n
1
Xi>k

.
Problems 9.6
3.φ(x)=1 if (no. ofx i’s>0−no. ofx i’s<0)>k.
Problems 10.2
2.Y=#ofx 1,x2in sample,Y<c 1orY>c 2.3. X<c 1or>c 2.
4.S
2
>c1or<c 2.5.(a) X
(n)>N0;(b)X
(n)>N0or<c.
6.|X−θ
0/2|>c.7.(a)
X<c 1or>c 2;(b)X>c.
11.X
(1)>θ0−Φn(α)
1/n
. 12.X
(1)>θ0α
−1/n
.
Problems 10.3
1. Reject atα=0.05. 3. Do not rejectH 0:p1=p2=p3=p4at 0.05 level.
4. RejectH
0atα=0.05. 5. Reject at 0.10 but not at 0.05 level.
7. Do not rejectH
0atα=0.05. 8. Do not rejectH 0atα=0.05.
10.U=15.41. 12. P-value = 0.5447.
Problems 10.4
1.t=−4.3, rejectH 0atα=0.02. 2. t=1.64, do not rejectH 0.
5.t=5.05. 6. RejectH
0atα=0.05. 7. RejectH 0. 8. RejectH 0.
Problems 10.5
1. Do not rejectH 0:σ1=σ2atα=0.10.
3. Do not rejectH
0atα=0.05. 4. Do not rejectH 0.
Problems 10.6
2. (a)φ(x)=1ifΣx i=5,=0.12 ifΣx i=4,=0 otherwise;
(b) Minimax rule rejectsH
0ifΣx i=4 or 5, and with probability 1/16 ifΣx i=3;
(c) Bayes rule rejectsH
0ifΣx i≥2.

3. Reject H_0 if x ≤ (1 − 1/n) ln 2;
β(1) = P(Y ≤ (n−1) ln 2), β(2) = P(Z ≤ (n−1) ln 2), where Y ∼ G(n, 1) and Z ∼ G(n, 1/2)
Problems 11.3
1. (77.7, 84.7). 2.n=42. 7.

2ΣXi
χ
2
2n,α/2,2ΣX i/χ
2
2n,1− α/2

.
9.(2X/(2−λ
1),2X/(2−λ 2)),λ
2
2
−λ
2
1
=4(1−α). 10. [α
1/n
N].
11.n≥
Φn(1/α)
[Φn(1+d/X
(n))]
.
12. Choosekfromα=(k+1)e
−k
. 13.
X+z ασ/

n
14.(ΣX
2
i
/c2,ΣX
2
i
/c1), where
ˆ
c2
c1
χ
2
n
(y)dy=1−αand
ˆ
c2
c1

2
n
(y)dy=n(1−α).
15. PosteriorB(n+α,Σx
i+β−n).
16.h(μ|x)=
ψ
n

exp{−
n
2
(μ−x)
2
}[Φ(

n(1−¯x))−Φ(−

n(1+¯x))], whereΦ
is standard normal DF.
Problems 11.4
1.(X
(1)−χ
2
2,α
/(2n), X
(1)).
2.(2n
X/b,2nX/a), choosea,bfrom
ˆ
b
a
χ
2
2n
(u)du=1−α, anda
2
χ
2
2n
(a)=b
2
χ
2
2n
(b),
whereχ
2
v
(x)is the PDF ofχ
2
(v)RV.
3.(X/(1−b),X/(1−a)), choosea,bfrom 1−α=b
2
−a
2
anda(1−a)
2
=b(1−b)
2
.
4.n=[4z
2
1−α/2
/d
2
]+1;n>(1/α)Φn(1/α ).
Problems 11.5
1.(X
(n),α
−1/n
X
(n)).
2.(2ΣX
i/λ2,2ΣX i/λ1), whereλ 1,λ2are solutions ofλ 1f2nα(λ1)=λ 2f2nα(λ2)and
P(1)= 1−α,f
visχ
2
(v)PDF.
3.(X
(1)−
χ
2
2,α
2n
,X
(1)).5. (α
1/n
X
(1),X
(1)).8.Yes.
Problems 12.3
4. RejectH 0:α0=α

0
if
|ˆα0−α
Φ 0
|
√nΣ(t i−t)
2
/Σt
2
i

Σ(Yi−ˆα0−ˆα1ti)
2
/(n−2)
>c0.
8. Normal equations
ˆ
β
0Σx
k
i
+
ˆ
β1Σx
k+1
i
+
ˆ
β2Σx
k+2
i
=ΣY ix
k
i
,k=0,1,2.
RejectH
0:β2=0if{|
ˆ
β 2|/
ψ
c
2
1
}/

Σ(Yi−
ˆ
β0−
ˆ
β1xi−
ˆ
β2x
2
i
)}>c 0, where
ˆ
β
2=Σc iYiand
ˆ
β 0=
Y−
ˆ
β 1x,
ˆ
β1=Σ(x i−x)(Yi−Y)/Σ(x i−x)
2
.
10. (a)
ˆ
β
0=0.28,
ˆ
β 1=0.411; (b)t=4.41, rejectH 0.
Problems 12.4
2.F=10.8. 3. Reject atα=0.05 but not atα=0.01.
4. BSS = 28.57, WSS = 26, reject atα=0.05 but not at 0.01.
5.F=56.45. 6.F=0.87.
Problems 12.5
4. SS Methods = 50, SS Ability = 64.56, ESS = 25.44; rejectH 0atα=0.05, not at 0.01.
5.F
variety=24.00.

Problems 12.6
2. RejectH 0if
am

b
1
(y
.j.
−y)
ΣΣΣ(y ijs−y
ij.
)
2>c.
4.SS
1(machines) = 2.786, d.f. = 3; SSI = 73.476, d.f. = 6;
SS
2(machines) = 27.054, d.f. = 2; SSE = 41.333, d.f. = 24.
5.Cities 3 227.27 4.22
Auto 3 3695.94 68.66
Interactions 9 9.28 0.06
Error 16 287.08
Problems 13.2
1.dis estimable of degree 1; (number ofx i’s inA)/n.
2. (a)(mn)
−1
ΣXiΣYj;(b)S
2
1
+S
2
2
.
3. (a)ΣX
iYi/n;(b)Σ(X i+Yi−
X−Y)
2
/(n−1).
Problems 13.3
3. Do not rejectH 0. 7. RejectH 0. 10. Do not rejectH 0at 0.05 level.
11.T
+
=133, do not rejectH 0.
12. (Second part)T
+
=9, do not rejectH 0atα=0.05.
Problems 13.4
1. Do not rejectH 0. 2. (a) Reject; (b) Reject.
3.U=29, rejectH
0.5. d=1/4, do not rejectH 0.
7.t=313.5,z=3.73, reject;r=10 or 12, do not reject atα=0.05.
Problems 13.5
1. RejectH 0atα=0.05. 4. Do not rejectH 0atα=0.05.
9. (a)t=1.21; (b)r=0.62; (c) RejectH
0in each case.
Problems 13.6
1. (a) 5; (b) 8. 3.p
n−2
(n+p−np)≤1.
4.n≥(z
1−γ
ψ
p0(1−p 0)−z1−δ
ψ
p1(1−p 1))
2
/(p1−p0)
2
.
Problems 13.7
1. (c)E{n(X−μ)
2
}/ES
2
=1+2ρ(1−2ρ/n)
−1
; ratio = 1 ifρ=0,>1forρ>0.
2. Chi-square test based on (c) is not robust for departures from normality.

AUTHOR INDEX
A. Agresti, 553
T.W. Anderson, 400
J.D. Auble, 606
I.V. Basawa, 178, 192
D. Bernstein, 220
P.J. Bickel, 633, 634
P. Billingsley, 315
D. Birkes, 465
Z.W. Birnbaum, 586, 587
D. Blackwell, 364, 368
J. Boas, 371
D.G. Chapman, 377, 381
S.D. Chatterji, 181, 188, 195
K.L. Chung, 45, 53, 155, 315
W. G. Cochran, 657
W. J. Conover, 665
H. Cramér, 220, 277, 281, 317, 372, 386,
397, 398, 474, 475, 631
J.H. Curtiss, 86, 317
D.A. Darmois, 221
M.M. Desu, 523
B. Efron, 530, 533
P. Erdös, 155
W. Feller, 76, 157, 219, 315, 325, 327
K.K. Ferentinos, 523
T.S. Ferguson, 334, 455, 475, 524, 528
R.A. Fisher, 227, 356, 374, 383
M. Fisz, 322
S.V. Fomin, 4
D.A.S. Fraser, 345, 409, 576, 590, 617
M. Fréchet, 372
J.D. Gibbons, 614
B.V. Gnedenko, 302
W.C. Guenther, 517, 526
E.J. Gumbel, 115
J.H. Hahn, 592
R. A. Hall, 660
P.R. Halmos, 4, 345
J.L. Hodges, 412, 414, 631
W. Hoeffding, 581
P. G. Hoel, 651, 653
D. Hogben, 258
V.S. Huzurbazar, 398

M. Kac, 34
W.C.M. Kallenberg, 460
J.F. Kemp, 227
M.G. Kendall, 612, 614, 616
J. Kiefer, 377, 381
G.W. Klett, 521, 527
A.N. Kolmogorov, 2, 4, 586
C.H. Kraft, 594
W. Kruskal, 154, 188
G. Kulldorf, 398
R.G. Laha, 239, 315
J. Lamperti, 188, 226
E.L. Lehmann, 242, 345, 350, 353, 354,
356, 412, 414, 449, 455, 460, 543, 581,
583, 631, 633
Y. Le Page, 92
M. Loève, 321
E. Lukacs, 88, 207, 221, 239, 317
H.B. Mann, 606
F.J. Massey, 603
M.V. Menon, 216
L.H. Miller, 586
S. Mitra, 369
M. Moore, 92
M.G. Natrella, 481, 482, 489
W. Nelson, 592
J. Neyman, 400, 438
E.H. Oliver, 392
D.B. Owen, 586
E. Parzen, 650
E.S. Pearson, 438
E.J.G. Pitman, 216
J.W. Pratt, 525
B.J. Prochaska, 227, 301
P.S. Puri, 195
D.A. Raikov, 187
R.R. Randles, 461, 581, 583, 614, 633
C.R. Rao, 290, 364, 372
S.C. Rastogi, 282
H. Robbins, 377, 381
V.K. Rohatgi, 226, 315, 328
V.I. Romanovsky, 277
L. Rosenberg, 233
J. Roy, 369
R. Roy, 92
H.L. Royden, 4
Y.D. Sabharwal, 175
A. Sampson, 354
P.A. Samuelson, 248
L. J. Savage, 345
H. Scheffé, 288, 353, 354, 543, 635, 636
E.L. Scott, 400
R.J. Serfling, 334, 633
D.N. Shanbhag, 178, 192
L. Shepp, 226
A.E. Siegel, 607
V.P. Skitovitch, 221
N.V. Smirnov, 587
G. W. Snedecor, 657
B. Spencer, 354
R.C. Srivastava, 195
S.M. Stigler, 366
P.T. Strait, 36
R.F. Tate, 521, 527
R.J. Tibshirani, 533
F.H. Tingey, 587
W.A. Thompson, Jr., 216
H.G. Tucker, 101
C. Van Eeden, 594
L. R. Verdooren, 664
A. Wald, 399
G.N. Watson, 356
D.R. Whitney, 606
D.V. Widder, 86, 454
S.S. Wilks, 470, 522
E.J. Williams, 216
J. Wishart, 277
D.A. Wolfe, 461, 581, 583, 614, 633
C.K. Wong, 32
S. Zacks, 633
P.W. Zehna, 396

SUBJECT INDEX
Absolutely continuous df, 47, 49, 53, 135,
335, 336, 576, 590
Actions, 401, 397
Admissible decision rule, 416
Analysis of variance, 539
one-way, 539, 554
table, 555
two-way, 560
two-way with interaction, 566, 570
Ancillary statistic, 355
Assignment of probability, 7, 13
equally likely, 1, 7, 20
on finite sample spaces, 20
random, 13
uniform, 7, 20
Asymptotic distribution,
ofrth order-statistic, 335
of sample moments, 328
of sample quantile, 336
Asymptotic relative efficiency(Pitman’s),
632
Asymptotically efficient estimator, 382
Asymptotically normal, 332
Asymptotically normal estimator, 332
best, 341
consistent, 341
Asymptotically unbiased estimator, 341
At random, 1, 16
Banach’s matchbox problem, 180
Bayes,
risk, 403
rule, 28, 403
solution, 404
Behrens-Fisher problem, 486
Welch approximation, 486
Bernoulli random variable, 174
Bernoulli trials, 174
Bertrand’s paradox, 17
Best asymptotically normal estimator, 341
Beta distribution, 210
bivariate, 113
MGF, 212
moments, 211
Beta function, 210
Bias of an estimator, 339, 360
Biased estimator, 359
Binomial coefficient, 79
Binomial distribution, 78, 176
bounds for tail probability, 193
central term, 193
characterization, 178

Binomial distribution (cont’d )
coefficient of skewness, 82
generalized to multinomial, 190
Kurtosis, 82
mean, 78
MGF, 177
moments, 78, 82
PGF, 95, 177
relation to negative binomial, 180
tail probability as incomplete beta
function, 213
variance, 78
Blackwell-Rao theorem, 364
Bonferroni’s inequality, 10
Boole’s inequality, 11
Bootstrap,
method, 530
sample, 530
Borel-Cantelli lemma, 309
Borel-measurable functions, of an rv, 55,
69, 117
Buffon’s needle problem, 14
Canonical form, 541
Cauchy distribution, 68, 80, 213
bivariate, 113
characterization, 216
characteristic function, 215, 320
mean does not exist, 215
MGF does not exist, 215
moments, 214
as ratio of two normal, 221
as stable distribution, 216
Cauchy-Schwarz inequality, 153
Central limit theorem, 321
applications of, 327
Chapman, Robbins and Kiefer inequality,
377
for discrete uniform, 378
for normal, 379
for uniform, 378
Characteristic function, 87
of multiple RVs, 136
properties, 136
Chebychev-Bienayme inequality, 94
Chebychev’s inequality, 94
improvement of, 95
Chi-square distribution, central, 206, 261
MGF, 207, 262
moments, 207, 262
as square of normal, 221
noncentral, 264
MGF, 264
moments, 264
Chi-square test(s), 472
as a goodness of fit, 476
for homogeneity, 479
for independence, 608
one-tailed, 472
robustness, 631
for testing equality of proportions, 473
for testing parameters of multinomial,
475
for testing variance, 472
two-tailed, 472
Combinatorics, 20
Complete, family of distributions, 347
Complete families, binomial, 348
chi-square, 348
discrete uniform, 358
hypergeometric, 358
uniform, 348
Complete sufficient statistic, 347, 576
for Bernoulli, 348
for exponential family, 350
for normal, 351
for uniform, 349
Concordance, 611
Conditional, DF, 108
distribution, 107
PDF, 109
PMF, 108
probability, 26
Conditional expectation, 158
properties of, 158
Confidence, bounds, 500
coefficient, 500
estimation problem, 500
Confidence interval, 499
Bayesian, 511
equivariant, 527
expected length of, 517
general method(s) of construction, 504
level of, 500
length of, 500
percentile, 531
for location parameter, 623
for the parameter of, Bernoulli, 513
discrete uniform, 516
exponential, 509

SUBJECT INDEX 681
normal, 502–503
uniform, 509, 515
for quantile of orderp, 621
shortest-length, 516
from tests of hypotheses, 507
UMA family, 502
UMAU family, 524
for normal mean, 524
for normal variance, 526
unbiased, 523
using Chebychev’s inequality, 513
using CLT, 512
using properties of MLE’s, 513
Conjugate prior distribution, 408
natural, 408
Confidence set, 501
for mean and variance of normal,
502
UMA family of, 502
UMAU family of, 524
unbiased, 523
Consistent estimator, 340
asymptotically normal, 341
inrth mean, 340
strong and weak, 340
Contaminated normal, 625
Contingency table, 608
Continuity correction, 328
Continuity theorem, 317
Continuous type distributions, 49
Convergence, a.s., 294
in distribution=weak, 286
in law, 286
of MGFs, 316–317
modes of, 285
of moments, 287
of PDFs, 287
of PMFs, 287–288
in probability, 288
inrth mean, 292
Convolution of DFs, 135
Correlation, 144
Correlation coefficient, 144, 277
properties, 145
Countable additivity, 7
Covariance, 144
sample, 277
Coverage, elementary, 619
r-coverage, 620
probability, 619
Credible sets, 511
Critical region, 431
Decision function, 401
Degenerate RV, 173
Degrees of freedom when pooling classes,
479
Delta method, 332
Density function, probability, 49, 104
Design matrix, 539
Dichotomous trials, 174
Discordance, 611
Discrete distributions, 173
Discrete uniform distribution, 175
Dispersion matrix=variance – covariance
matrix, 328
Distribution, conditional, 107
conjugate prior, 408
of a function of an RV, 55
induced, 59
a posteriori, 404
a priori, 403
of sample mean, 257
of sample median, 259
of sample quantile, 167, 336
of sample range, 162, 326
Distribution function, 43
continuity points of a, 43, 50
of a continuous type RV, 49
convolution, 135
decomposition of a, 53
discontinuity points of a, 43
of a discrete type RV, 47
of a function of an RV, 56
of an RV, 43
of multiple RVs, 100, 102
Domain of attraction, 321
Efficiency of an estimate, 382
relative, 382
Empirical DF=sample DF, 249
Equal likelihood, 1
Equivalent RVs, 119
Estimable function, 360
Estimable parameter, 576, 581
degree, 577, 581
kernel, 577, 582
Estimator, 338
equivariant, 340, 420

Estimator (cont’d )
Hodges-Lehmann, 631
least squares, 537
minimum risk equivariant, 422
Pitman, 424, 426
point, 338
Event, 3
certain, 8
elementary=simple, 3
disjoint=mutually exclusive, 7, 33
independent, 31
null, 3
Exchangeable random variables, 120, 149,
255
Expectation, conditional, 158
properties, 158
Expected value=mean=mathematical
expectation, 68
of a function of RV, 67, 136
of product of RVs, 148
of sum of RVs, 147
Exponential distribution, 206
characterizations, 208
memoryless property of, 207
MGF, 206
moments, 206
Exponential family, 242
k-parameter, 242
natural parameters of, 243
one-parameter, 240
Extreme value distribution, 224
Factorial moments, 79
Factorization criterion, 344
Finite mixture density function, 225
Finite population correction, 256
Fisher Information, 375
Fisher’s Z-statistic, 270
Fitting of distribution, binomial, 482
geometric, 482
normal, 477
Poisson, 478
Fréchet, Cramér, and Rao inequality, 374
Fréchet, Cramér, and Rao lower bound,
375
binomial, 376
exponential, 385
normal, 385
one-parameter exponential family, 377
Poisson, 375
F-distribution, central, 267
moments of, 267
noncentral, 269
moments of, 269
F-test(s), 489
of general linear hypothesis, 540
as generalized likelihood ratio test,
540
for testing equality of variances, 440
Gamma distribution, 203
bivariate, 113
characterizations, 207
MGF, 205
moments, 206
relation with Poisson, 208
Gamma function, 202
General linear hypothesis, 536
canonical form, 541
estimation in, 536
GLR test of, 540
General linear model, 536
Generalized Likelihood ratio test, 464
asymptotic distribution, 470
F-test as, 468
for general linear hypothesis, 540
for parameter of, binomial, 465
for simple vs. simple hypothesis, 464
bivariate normal, 471
discrete uniform, 471
exponential, 472
normal, 466
Generating functions, 83
moment, 85
probability, 83
Geometric distribution, 84, 180
characterizations, 182
memoryless property of, 182
MGF, 180
moments, 180
order statistic, 164
PGF, 84
Glivenko-Cantelli theorem, 322
Goodness-of-fit problem, 584
Hazard(=failure rate) function, 227
Helmert orthogonal matrix, 274
Hodges-Lehmann estimators, 631
Hölder’s inequality, 153

Hypergeometric distribution, 184
bivariate, 113
mean and variance, 184
Hypothesis, tests of, 429
alternative, 430
composite, 430
null, 430
parametric, 430
simple, 430
tests of, 430
Identically distributed RVs, 119
Implication rule, 11
Inadmissible decision rule, 416
Independence and correlation, 145
Independence of events, 115
complete=mutual, 118
pairwise, 118
Independence of RVs, 114–121
complete=mutual, 118
pairwise, 118
Independent, identically distributed rv’s,
119
sequence of, 119
Indicator function, 41
Induced distribution, 59
Infinitely often, 309
Interactions, 566
Invariance, of hypothesis testing problem,
455
principle, 455
Invariant,
decision problem, 419
family of distributions, 418
function, 420, 455
location, 421
location-scale, 421
loss function, 420
maximal, 505
scale, 420
statistic, 420
Invariant, class of distributions, 419
estimators, 420
maximal, 422, 455
tests, 455
Inverse Gaussian PDF, 228
Jackknife, 533
Joint, DF, 100–102
PDF, 104
PMF, 103
Jump, 47, 103
Jump point, of a DF, 47, 103
Kendall’s sample tau, 612
distribution of, 612
generating function, 92
Kendall’s tau coefficient, 611
Kendall’s tau test, 612
Kernel, symmetric, 577, 582
Kolmogorov’s, inequality, 312
strong law of large numbers, 315
Kolmogorov-Smirnov one sample statistic,
584
for confidence bounds of DF, 587
distribution, 585–587
Kolmogorov-Smirnov test, 602
comparison with chi-square test, 588
one-sample, 587
two-sample, 603
Kolmogorov-Smirnov two sample statistic,
601
distribution, 603
Kronecker lemma, 313
Kurtosis, coefficient of, 83
Laplace=double exponential distribution,
91, 224
MGF, 87
Least square estimation, 537
principle, 537
restricted, 537
L’Hospital rule, 323
Likelihood,
equal, 1
equation, 389
equivalent, 353
function, 389
Limit, inferior, 11
set, 11
superior, 11
Lindeberg central limit theorem, 325
Lindeberg-Levy CLT, 323
Lindeberg condition, 324
Linear combinations of RVs, 147
mean and variance, 147, 149
Linear dependence, 145
Linear model, 536

Linear regression model, 538, 543
confidence intervals, 545
estimation, 543
testing of hypotheses, 545–546
Locally most powerful test, 459
Location family, 196
Location-scale family, 196
Logistic distribution, 223
Logistic function, 551
Logistic regression, 550
Lognormal distribution, 88, 222
Loss function, 339, 401
Lower bound for variance, Chapman,
Robbins and Kiefer inequality, 377
Fréchet, Cramér and Rao inequality, 372
Lyapunov condition, 326
Lyapunov inequality, 96
Maclaurin expansion of an mgf, 86
Mann-Whitney statistic, 604
moments, 582
null distribution, 605
Mann-Whitney-Wilcoxon test, 605
Marginal,
DF, 107
PDF, 106
PMF, 105
Markov’s inequality, 94
Maximal invariant statistic, 422, 455
function of, 457
Maximum likelihood estimation, principle
of, 389
Maximum likelihood estimator, 389
asymptotic normality, 397–398
consistency, 397
as a function of sufficient statistic, 394
invariance property, 396
Maximum likelihood estimation method
applied to, Bernoulli, 392
binomial, 399
bivariate normal, 395
Cauchy, 399
discrete uniform, 390
exponential, 396
gamma, 393
hypergeometric, 391
normal, 390
Poisson, 399
uniform, 391, 394
Mean square error, 339, 362
Median, 80, 82
Median test, 600
Memoryless property,
of exponential, 207
of geometric, 182
Method of finding distribution,
CF or MGF, 90, 137
DF, 56, 124
transformations 128
Methods of finding confidence interval
Bayes, 511
for large samples, 511
pivot, 504
test inversion, 507
Method of moments, 386
applied to, beta, 388
binomial, 387
gamma, 388
lognormal, 388
normal, 388
Poisson, 386
uniform, 387
Minimal sufficient statistic, 354
for beta, 358
for gamma, 358
for geometric, 358
for normal, 355
for Poisson, 358
for uniform, 354, 358
Minimax, estimator, 402
principle, 402
solution, 492
Minimax estimation for parameter of,
Bernoulli, 402
binomial, 412
hypergeometric, 414
Minimum mean square error estimator, 339
for variance of normal, 368
Minimum risk equivariant estimator, 421
for location parameter, 424
for scale parameter, 425
Mixing proportions, 225
Minkowski inequality, 153
Mixture density function, 224–225
Moment, about origin, 70
absolute, 70
central, 77
condition, 73
Factorial, 79
of conditional distribution, 158

of DF, 70
of functions of multiple RVs, 136
inequalities, 93
lemma, 74–75
non-existence of order, 75
of sample covariance, 257
of sample mean, 253
of sample variance, 253–254
Moment generating function, 85
continuity theorem for, 317
differentiation, 86
existence, 87
expansion, 86
limiting, 316
of linear combinations, 139
and moments, 86
of multiple RVs, 136
of sample mean, 256
series expansion, 86
of sum of independent RVs, 139
uniqueness, 86
Monotone likelihood ratio, 446
for hypergeometric, 448
for one-parameter exponential family, 447
UMP test for families with, 447
for uniform, 446
Most efficient estimator, 382
asymptotically, 382
as MLE, 395
Most powerful test, 432
for families with MLR, 446
as a function of sufficient statistic, 440
invariant, 456
Neyman-Pearson, 438
similar, 433
unbiased, 432
uniformly, 432
Multidimensional RV=multiple RV, 99
Multinomial coefficient, 23
Multinomial distribution, 190
MGF, 190
moments, 191
Multiple RV, 99
continuous type, 104
discrete type, 103
functions of, 123
Multiple regression, 543
Multiplication rule, 27
Multivariate hypergeometric distribution,
192
Multivariate negative binomial
distribution, 193
Multivariate normal, 234
dispersion matrix, 236
Natural parameters, 243
Negative binomial (=Pascal or waiting
time) distribution, 178–179
bivariate, 113
central term, 194
mean and variance, 179
MGF, 179
Negative hypergeometric distribution, 186
mean and variance, 186
Neyman-Pearson lemma, 438
Neyman-Pearson lemma applied to,
Bernoulli, 442
normal, 444
Noncentral, chi-square distribution, 263
F-distribution, 269
t-distribution, 266
Noncentrality parameter, of chi-square,
263
F-distribution, 269
t-distribution, 266
Noninformative prior, 409
Nonparametric=distribution-free
estimation, 576–577
methods, 576
Nonparametric unbiased estimation, 576
of population mean, 578
of population variance, 578
Normal approximation, to binomial, 328
to Poisson, 330
Normal distribution=Gaussian law,
87, 216
bivariate, 228
characteristic function, 87
characterizations, 219, 221, 238
contaminated, 625, 628
folded, 426
as limit of binomial, 321, 328
as limit of chi-square, 322
as limit of Poisson, 330
MGF, 217
moments, 217–218
multivariate, 234
singular, 232
as stable distribution, 321
standard, 216

Normal distribution=Gaussian law (cont’d )
tail probability, 219
truncated, 111
Normal equations, 537
Odds, 8
Order statistic, 164
is complete and sufficient, 576
joint PDF, 165
joint marginal PDF, 168
kth, 164
marginal PDF, 167
uses, 619
moments, 169
Ordered samples, 21
Orders of magnitude, o and O notation, 318
Parameter(s), of a distribution, 67,
196, 576
estimable, 576
location, 196
location-scale, 196
order, 79
scale, 196
shape, 196
space, 338
Parametric statistical hypothesis, 430
alternative, 430
composite, 430
null, 430
problem of testing, 430
simple, 430
Parametric statistical inference, 245
Pareto distribution, 82, 222
Partition, 351
coarser, 352
finer, 352
minimal sufficient, 353
reduction of a, 352
sets, 351
sub-, 352
sufficient, 351
Percentile confidence interval, 531
centered percentile confidence interval,
532
Permutation, 21
Pitman estimator, 424
location, 426
scale, 426
Pitman’s asymptotic relative efficiency, 632
Pivot, 504
Point estimator, 338, 340
Point estimation, problem of, 338
Poisson DF, as incomplete gamma, 209
Poisson distribution, 57, 83, 186
central term, 194
characterizations, 187
coefficient of skewness, 82
kurtosis, 82
as limit of binomial, 194
as limit of negative binomial, 194
mean and variance, 187
MGF, 187
moments, 82
PGF, 187
truncated, 111
Poisson regression, 553
Polya distribution, 185
Pooled sample variance, 485
Population, 245
Population distribution, 246
Posterior probability, 29
Principle of,
equivariance, 420
inclusion-exclusion, 9
invariance, 456
least squares, 537
Probability, 7
addition rule, 9
axioms, 7
conditional, 26
continuity of, 13
countable additivity of, 7
density function, 49
distribution, 42
equally likely assignment, 7, 21
on finite sample spaces, 20
generating function, 83
geometric, 13
integral transformation, 200
mass function, 47
measure, 7
monotone, 8
multiplication rule, 27
posterior and prior, 29
principle of inclusion-exclusion, 9
space, 8
subadditivity, 9
tail, 72

total, 28
uniform assignment of, 7, 21
Probability integral transformation, 200
Probit regression, 552
Problem,
of location, 590
of location and symmetry, 590
of moments, 88
P-value, 437, 481, 599
Quadratic form, 228
Quantile of orderp=(100p)th percentile,
79
Random, 13
Random experiment=statistical
experiment, 3
Random interval, 500
coverage of, 619
Random sample, 13, 246
from a finite population, 13
from a probability distribution, 13, 246
Random sampling, 246
Random set, family of, 500
Random variable(s), 40
bivariate, 103
continuous type, 49, 104
discrete type, 47
degenerate, 48
equivalent, 119
exchangeable 120, 149, 255
functions of a, 55
multiple=multivariate, 99
standardized, 78
symmetric, 69
symmetrized, 121
truncated, 110
uncorrelated, 145
Range, 168
Rank correlation coefficient, 614
Rayleigh distribution, 224
Realization of a sample, 246
Rectangular distribution, 199
Regression, 543
coefficient, 277
linear, 544
logistic, 551
model, 543
multiple, 543
Poisson, 552
probit, 552
Regularity conditions of FCR inequality,
372
Resampling, 530
Risk function, 339, 402
Robust estimator(s), 631
Robust test(s), 634
Robustness, of chi-square test, 631
of sample mean as an estimator, 628
of sample standard deviation as an
estimator, 628
of Student’st-test, 629
Robust procedure, defined, 625, 631
Rules of counting, 21–24
Run, 607
Run test, 607
Sample, 245–246
correlation coefficient, 251
covariance, 251
DF, 250
mean, 247
median, 251
distribution of, 260
MGF, 251
moments, 250–251
ordered, 21
point, 3
quantile of orderp, 251, 342
random, 246
regression coefficient, 282
space, 3
statistic(s), 246, 249
standard deviation, 248
standard error, 256
variance, 247
Sampling with and without replacement,
21, 247
Sampling from bivariate normal, 276
distribution of sample correlation
coefficient, 277
distribution of sample regression
coefficient, 277
independence of sample mean vector
and dispersion matrix, 277
Sampling from univariate normal, 271
distribution of sample variance, 273
independence of X̄ and S², 273
Scale family, 196

Sequence of events, 11
limit inferior, 11
limit set, 11
limit superior, 11
nondecreasing, 12
nonincreasing, 12
Set function, 7
Shortest-length confidence interval(s), 517
for the mean of normal, 518–519
for the parameter of exponential, 523
for the parameter of uniform, 521
for the variance of normal, 519
σ-field, 3
choice of, 3
generated by a class=smallest, 40
Sign test, 590
Similar tests, 454
Single-sample problem(s), 584
of fit, 584
of location, 590
and symmetry, 590
Skewness, coefficient of, 82
Slow variation, function of, 76
Slutsky’s theorem, 298
Spearman’s rank correlation coefficient, 614
distribution, 615
Stable distribution, 216, 321
Standard deviation, 77
Standard error, 256
Standardized RV, 78
Statistic of orderk, 164
marginal PDF, 167
Stirling’s approximation, 194
Stochastically larger, 600
Strong law of large numbers, 308
Borel’s, 315
Kolmogorov’s, 315
Student’st-distribution, central, 265
bivariate, 282
moments, 267
noncentral, 267
moments, 267
Student’st- statistic, 265
Student’st- test, 484–485
as generalized likelihood ratio test, 467
for paired observations, 486
robustness of, 630
Substitution principle, 386
estimator, 386
Sufficient statistic, 343
factorization criterion, 344
joint, 345
Sufficient statistic for, Bernoulli, 345
beta, 356
discrete uniform, 346
gamma, 356
lognormal, 357
normal, 346
Poisson, 343
uniform, 346
Support, of a DF, 50, 103
Survival function=reliability function,
227
Symmetric DF or RV, 50, 103
Symmetrization, 121
Symmetrized rv, 121
Symmetry, center of, 73
Tail probabilities, 72
Test(s),
α-similar, 453
chi-square, 470
critical=rejection region, 431
critical function, 431
of hypothesis, 431
F-, 489
invariant, 453
level of significance, 431
locally most powerful, 459
most powerful, 432
nonrandomized, 432
one-tailed, 484
power function, 432
randomized, 432
similar, 453
size, 432
statistic, 433
Student’st,506
two tailed, 484
unbiased, 484
uniformly most powerful, 432
Testing the hypothesis of, equality of several
normal means, 539
goodness-of-fit, 482, 584
homogeneity, 479
independence, 608
Tests of hypothesis, Bayes, 507
GLR, 463
minimax, 491
Neyman-Pearson, 438

Tests of location, 590
sign test, 590
Wilcoxon signed-rank, 592
Tolerance coefficient and interval, 619
Total probability rule, 28
Transformation, 55
of continuous type, 58, 124, 128
of discrete type, 58, 135
Helmert, 274
Jacobian of, 128
not one-to-one, 165
one-to-one, 56, 129
Triangular distribution, 52
Trimmed mean, 632
Trinomial distribution, 191
Truncated distribution, 110
Truncated RVs, 110
Truncation, 110
Two-point distribution, 174
Two-sample problems, 599
Types of error in testing hypotheses, 431
Unbiased confidence interval(s), 523
general method of construction, 524
for mean of normal, 524
for parameter of exponential, 529
for parameter of uniform, 529
for variance of normal, 526
Unbiased estimator, 339
best linear, 361
and complete sufficient statistic, 365
LMV, 361
and sufficient statistic, 364
UMV, 361
Unbiased estimation for parameter of,
Bernoulli, 365, 364
bivariate normal, 368
discrete uniform, 369
exponential, 369
hypergeometric, 369
negative binomial, 368
normal, 365
Poisson, 363
Unbiased test, 453
for mean of normal, 454
and similar test, 453
UMP, 453
Uncorrelated RVs, 145
Uniform distribution, 56, 197
characterization, 201
discrete, 72, 175
generating samples, 201
MGF, 199
moments, 199
statistic of orderk, 168, 213
truncated, 111
UMP test(s)
α-similar, 453
invariant, 457
unbiased, 453
U-statistic, 576
for estimating mean and variance, 578
one-sample, 576
two-sample, 581
Variance, 77
properties of, 77
of sum of RVs, 148
Variance stabilizing transformations, 333
Weak law of large numbers, 303, 306
centering and norming constants, 303
Weibull distribution, 223
Welch approximate t-test, 486
Wilcoxon signed-rank test, 592
Wilcoxon statistic, 593
distribution, 594, 597
generating function, 93
moments, 597
Winsorization, 112