Applied Statistical Genetics With R 2nd Edition Andrea S Foulkes


Use R!
Series Editors:
Robert Gentleman, Kurt Hornik, Giovanni Parmigiani
For other titles published in this series, go to
http://www.springer.com/series/6991

Andrea S. Foulkes
Applied Statistical Genetics with R
For Population-based Association Studies

Andrea S. Foulkes
University of Massachusetts
School of Public Health & Health Sciences
404 Arnold House
715 N. Pleasant Street
Amherst, MA 01003
USA
ISBN 978-0-387-89553-6 e-ISBN 978-0-387-89554-3
DOI 10.1007/978-0-387-89554-3
Springer Dordrecht Heidelberg London New York
Library of Congress Control Number: PCN applied for
© Springer Science+Business Media, LLC 2009
All rights reserved. This work may not be translated or copied in whole or in part without the written
permission of the publisher (Springer Science+Business Media, LLC, 233 Spring Street, New York, NY
10013, USA), except for brief excerpts in connection with reviews or scholarly analysis. Use in connection
with any form of information storage and retrieval, electronic adaptation, computer software, or by similar
or dissimilar methodology now known or hereafter developed is forbidden.
The use in this publication of trade names, trademarks, service marks, and similar terms, even if they are
not identified as such, is not to be taken as an expression of opinion as to whether or not they are subject
to proprietary rights.
Printed on acid-free paper
Springer is part of Springer Science+Business Media (www.springer.com)

To Rich, Sophie and Ella

Preface
This book is intended to provide fundamental statistical concepts and tools relevant to the analysis of genetic data arising from population-based association studies. Elementary knowledge of statistical methods at the level of a first course in biostatistics is assumed. Chapters 1 through 3 provide a general overview of the genetic and epidemiological considerations relevant to this setting. Topics covered include: (1) types of investigations, typical data components and features in genetic association studies, and basic genetic vocabulary (Chapter 1); (2) epidemiological principles relevant to population-based studies, including confounding and effect modification (Chapter 2); (3) elementary statistical methods for estimating and testing association (Chapter 2); (4) the overarching analytical challenges inherent in these investigations (Chapter 2); (5) basic genetic concepts, including linkage disequilibrium, Hardy-Weinberg equilibrium, and haplotypic phase (Chapter 3); and (6) quality control methods for assessing genotyping errors and population substructure (Chapter 3).
The remaining chapters are organized as follows. Chapters 4 and 5 deal primarily with methods that aim to identify single genetic polymorphisms or single genes that contribute individually to measures of disease progression or disease status. These include testing concepts and methods for appropriately adjusting for multiple comparisons (Chapter 4) and approaches to the analysis of unobservable haplotypic phase (Chapter 5). Chapters 6 and 7 focus on methods for variable subset selection, particularly methods that simultaneously evaluate a large number of variables to arrive at the best predictive model for the complex disease trait under investigation. Notably, while all of these methods consider multiple polymorphisms concomitantly, some focus on conditional effects of these genetic variables, while others are specifically designed for identifying and testing potential interactions among genetic polymorphisms in their effects on disease phenotypes. This section covers classification and regression trees (Chapter 6), extensions of the tree framework (namely random forests, logic regression and multivariate adaptive regression splines), and a brief introduction to Bayesian variable selection (Chapter 7).
The field of statistical genomics includes a large array of methods for a wide variety of medical and public health applications. While the methods described herein are broadly relevant, this text does not directly address issues specific to family-based studies, evolutionary (population genetic) modeling, or gene expression analysis. Nor does it attempt to provide a comprehensive summary of existing methods in the rapidly expanding field of statistical genomics. Rather, fundamental concepts are presented at the level of an introductory graduate-level course in biostatistics, with the aim of offering students a foundation and framework for understanding more complex methods. Two application areas are considered throughout this text: (1) human genetic investigations in population-based association studies of unrelated individuals and (2) studies aiming to characterize associations between Human Immunodeficiency Virus (HIV) genotypes and phenotypes, as measured by in vitro drug responsiveness. Several publicly available datasets are used for illustration and can be downloaded at the book website (http://people.umass.edu/foulkes/asg.html). While data simulations are not described, emphasis is placed on understanding the implicit modeling assumptions generally required for testing. An overarching theme of this text is that the application of any statistical method aims to characterize a specific relationship among variables. For example, just as an additive model of association can be used to evaluate additive structure, a classification or regression tree aims to characterize conditional associations. The many methods applied to data arising from genetic association studies differ primarily in the types of associations that they are designed to uncover.
This text is also intended to complement the existing literature on statistical genetics and molecular epidemiology in two ways. First, it offers extensive and integrated examples using R, an open-source, publicly available statistical computing environment. These examples are intended both as a pedagogical tool for providing readers with a deeper understanding of the statistical algorithms presented and as a practical tool for applying the approaches described herein. Second, this text provides comprehensive coverage of both genetic concepts, such as linkage disequilibrium and Hardy-Weinberg equilibrium, from a statistical perspective, and fundamental statistical concepts relevant to the analysis of data arising from genetic association studies, such as adjusting for multiplicity and methods for high-dimensional data analysis. Several excellent texts, including Thomas (2004) and Ziegler and Koenig (2007), provide in-depth coverage of genetic data concepts relevant to both population-based and family-based investigations. This text presents these concepts within the context of familiar statistical nomenclature while providing coverage of several additional pertinent epidemiological concepts and statistical methods for characterizing association. The presentation is at a level accessible to readers with a limited background in biostatistics and an interest in public health or biomedical research. More advanced discussions of the underlying theory can be found in alternative texts such as Hastie et al. (2001) and Lange (2002), as well as in the original manuscripts cited throughout this text.
The primary focus of this text is on candidate gene studies that involve the investigation of polymorphisms at several genetic sites within and across one or more genes and their associations with a trait. In the past several years, technological advancements leading to the development and widespread availability of "SNP chips" have led to an explosion of genome-wide association studies (GWAS) involving 500 thousand to 1 million single-nucleotide polymorphisms (SNPs). The methods presented in this text apply equally to candidate gene approaches and to whole and partial GWAS. Notably, however, the latter setting requires additional consideration of the computational burden of the associated analysis, as well as data preprocessing and error checking, as discussed in Section 3.3 and throughout this text. While GWAS have gained a great deal of popularity in recent years, they do not obviate the need for candidate gene studies that further investigate the role of specific genes in disease progression, as well as the potential confounding or modifying roles of traditional risk factors, including both clinical and demographic characteristics. Instead, GWAS provide investigators with a vastly improved body of scientific knowledge to inform the selection of candidate genes for hypothesis-driven research.
The term high-dimensional has taken on many meanings across different fields of research and over the past decade of rapid expansion in these fields. In this text, high-dimensional is defined simply as a large number of potentially correlated variables that may interact, in a statistical or a biological sense, in their association with the outcome under investigation. The term is used loosely to refer to any number of variables for which there is a complex, uncharacterized structure and the usual least squares regression setting may not be easily applicable. High-dimensional data methods, including approaches to multiplicity and to characterizing gene-gene and gene-environment interactions, are addressed within the context of characterizing associations between genetic sequence data and disease traits. In these settings, the predictor variables are SNPs or corresponding amino acids and are categorical. Primary consideration is given to dependent variables that are either continuous measures of disease progression or binary indicators of disease status, though brief mention is also made of methods for multivariate and survival outcomes. Specific attention is given to the potential confounding and mediating roles of individual-level clinical and demographic data.
Implementation of all described methods is demonstrated using the R environment and associated packages, which are publicly available at the Comprehensive R Archive Network (CRAN) website (http://cran.r-project.org/). The decision to use R in this text over alternative programming languages is multifaceted. First, as publicly available software, R is freely accessible to all readers and, importantly, students will continue to have access to R at all future personal and professional venues. As an open-source language, R also lets students view the code used to generate functions, serving as a valuable pedagogical tool for more programmatically minded learners. Another key advantage of R is that investigators who develop new statistical methodology often provide an accompanying R package through the CRAN website, giving users almost immediate access to implementations of the most recently developed approaches. Finally, with the availability of contributed packages, the choice of method to apply rests with the user rather than with what the core development team of the programming language chooses to release.
While strongly preferable for the reasons mentioned above, the use of R in this text has one drawback from a pedagogical perspective: both R itself and its packages are updated frequently. That is, there is a clear trade-off between accessibility and stability. In the process of writing this text, several changes occurred in the packages described herein, resulting in inconsistent outputs. While these inconsistencies have been resolved as of the present date, more are likely to arise over the next several years. The reader is encouraged to visit the textbook website for information on these changes. All of the programming scripts in this text were written and tested for R version 2.7.1. ASCII text files with the complete R code used for the examples in this textbook can be found on the textbook website. The files can be downloaded, or read directly into R using the source() function. For example, to source the code from Example 1.1, we can write the following at the R prompt:
> source("http://people.umass.edu/foulkes/asg/examples/1.1.r")
Additionally specifying print.eval=T in this function call will print the corresponding output. While the programs presented within this text are comprehensive, the novice reader can begin with the appendix for a brief introduction to some fundamental concepts relevant to programming in R. Several more comprehensive introductions to R are available, including Gentleman (2008), Spector (2008) and Dalgaard (2002), and the reader is encouraged to consult them for additional programming tools and background.
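The effect of the print.eval argument to source() can be seen without network access. The sketch below uses only base R; the two-line script it writes to a temporary file is a hypothetical stand-in for one of the downloaded example files, not code from the textbook website. It sources that file twice and captures what each call prints.

```r
# Hypothetical stand-in for a downloaded example script:
# one assignment and one expression whose value is visible.
script <- tempfile(fileext = ".r")
writeLines(c("x <- c(1, 2, 3)", "mean(x)"), script)

# Default call: the code runs, but top-level values are not printed.
quiet <- capture.output(source(script))

# print.eval = TRUE: each visible top-level value is printed as it would be
# at the R prompt. The assignment is invisible, so only mean(x) is shown.
shown <- capture.output(source(script, print.eval = TRUE))
```

Here `quiet` is empty and `shown` holds the single printed line for `mean(x)`. This is why a script read in with a plain source() call appears to produce no output.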
I am grateful for the advice and support I have received in writing this text from many colleagues, students, friends and family members. I would especially like to thank my students and postdoctoral fellows, M. Eliot, X. Li, Y. Liu, Dr. B.A. Nonyane and Dr. K. Au, who spent many hours checking for notational and programming consistency as well as sharing in helpful discussions. I am indebted to all of the students in the fall 2008 semester of public health 690T at the University of Massachusetts, Amherst for their helpful suggestions and for bearing with me in the first run of this text. I am grateful for having a long-term friend and colleague in Dr. R. Balasubramanian, whose support and encouragement were pivotal in my decision to write this text. I am also thankful for the many conversations with Dr. D. Cheng and her willingness to share her extensive knowledge in applied statistics. I am obliged to Dr. M.P. Reilly for an enduring collaboration that has fueled my interest and enhanced my knowledge in applied statistical genetics for medical research. I am grateful to Dr. A.V. Custer, whose dedication to the open-source software community was inspirational to me. Dr. V. De Gruttola's early mentorship continues to shape my research interests, and I am thankful for the passion and deep thinking he brings to our profession. I also value the strong encouragement and intellectual engagement of my early career mentors Dr. E. George and Dr. T. Ten Have. The efforts of Dr. E. Hoffman, Dr. H. Gorski and colleagues in providing the FAMuSS and HGDP data were extraordinary, and their commitment to public access to data resources is truly outstanding. I am also indebted to Dr. R. Shafer and colleagues for their remarkable effort in creating and maintaining the Stanford University HIV Drug Resistance Database, from which the Virco data were downloaded and several additional data sets can be accessed easily. I also greatly appreciate the insightful leadership of the R core development team and the individuals who wrote and maintain the R packages used throughout this text. All figures in this text were generated in R or created using the open-source graphics editor Inkscape (http://www.inkscape.org/). I value the many insightful comments and suggestions of the editors and anonymous reviewers. Support for this text was provided in part by a National Institute of Allergy and Infectious Diseases (NIAID) individual research award (R01AI056983). Finally, thanks to my family for their tremendous love and support.
Andrea S. Foulkes
Amherst, MA
May 2009

Contents
Preface. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . VII
List of Tables. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . XVII
List of Figures. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .XIX
Acronyms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . .XXI
1 Genetic Association Studies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 1
1.1 Overview of population-based investigations . . . . . . . . . . . . . . . . 2
1.1.1 Types of investigations . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 2
1.1.2 Genotype versus gene expression . . . . . . . . . . . . . . . . . . . . 4
1.1.3 Population- versus family-based investigations . . . . . . . . . 6
1.1.4 Association versus population genetics . . . . . . . . . . . . . . . 7
1.2 Data components and terminology . . . . . . . . . . . . . . . . . . . . . . . . . 7
1.2.1 Genetic information . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 8
1.2.2 Traits . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 11
1.2.3 Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3 Data examples . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 12
1.3.1 Complex disease association studies . . . . . . . . . . . . . . . . . . 13
1.3.2 HIV genotype association studies . . . . . . . . . . . . . . . . . . . . 16
1.3.3 Publicly available data used throughout the text . . . . . . 18
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 27
2 Elementary Statistical Principles. . . . . . . . . . . . . . . . . . . . . . . . . . 29
2.1 Background . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 30
2.1.1 Notation and basic probability concepts . . . . . . . . . . . . . . 30
2.1.2 Important epidemiological concepts . . . . . . . . . . . . . . . . . . 33
2.2 Measures and tests of association . . . . . . . . . . . . . . . . . . . . . . . . . . 37
2.2.1 Contingency table analysis for a binary trait . . . . . . . . . . 38
2.2.2 M-sample tests for a quantitative trait . . . . . . . . . . . . . . . 44
2.2.3 Generalized linear model . . . . . . . . . . . . . . . . . . . . . . . . . . . 48
2.3 Analytic challenges . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 55
2.3.1 Multiplicity and high dimensionality . . . . . . . . . . . . . . . . . 55
2.3.2 Missing and unobservable data considerations . . . . . . . . . 58
2.3.3 Race and ethnicity . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 60
2.3.4 Genetic models and models of association . . . . . . . . . . . . 61
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 62
3 Genetic Data Concepts and Tests . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.1 Linkage disequilibrium (LD) . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 65
3.1.1 Measures of LD: D′ and r² . . . . . . . . . . . . . . . . . . . . . . . . . 66
3.1.2 LD blocks and SNP tagging . . . . . . . . . . . . . . . . . . . . . . . . 74
3.1.3 LD and population stratification . . . . . . . . . . . . . . . . . . . . 76
3.2 Hardy-Weinberg equilibrium (HWE) . . . . . . . . . . . . . . . . . . . . . . . 78
3.2.1 Pearson's χ²-test and Fisher's exact test . . . . . . . . . . . . . 78
3.2.2 HWE and population substructure . . . . . . . . . . . . . . . . . . 82
3.3 Quality control and preprocessing . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.3.1 SNP chips . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 86
3.3.2 Genotyping errors . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 88
3.3.3 Identifying population substructure . . . . . . . . . . . . . . . . . . 89
3.3.4 Relatedness . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 92
3.3.5 Accounting for unobservable substructure . . . . . . . . . . . . 94
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 95
4 Multiple Comparison Procedures . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.1 Measures of error . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 97
4.1.1 Family-wise error rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 98
4.1.2 False discovery rate . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 100
4.2 Single-step and step-down adjustments . . . . . . . . . . . . . . . . . . . . . 101
4.2.1 Bonferroni adjustment . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 102
4.2.2 Tukey and Scheffé tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . 105
4.2.3 False discovery rate control . . . . . . . . . . . . . . . . . . . . . . . . . 109
4.2.4 The q-value . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 112
4.3 Resampling-based methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.3.1 Free step-down resampling . . . . . . . . . . . . . . . . . . . . . . . . . . 114
4.3.2 Null unrestricted bootstrap . . . . . . . . . . . . . . . . . . . . . . . . . 120
4.4 Alternative paradigms . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.4.1 Effective number of tests . . . . . . . . . . . . . . . . . . . . . . . . . . . 123
4.4.2 Global tests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 125
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 127

5 Methods for Unobservable Phase . . . . . . . . . . . . . . . . . . . . . . . . . . 129
5.1 Haplotype estimation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 130
5.1.1 An expectation-maximization algorithm . . . . . . . . . . . . . . 130
5.1.2 Bayesian haplotype reconstruction . . . . . . . . . . . . . . . . . . . 137
5.2 Estimating and testing for haplotype-trait association . . . . . . . . 140
5.2.1 Two-stage approaches . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 140
5.2.2 A fully likelihood-based approach . . . . . . . . . . . . . . . . . . . . 145
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 149
Supplemental notes . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 150
Supplemental R scripts . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 155
6 Classification and Regression Trees . . . . . . . . . . . . . . . . . . . . . . . . 157
6.1 Building a tree . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.1.1 Recursive partitioning. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 157
6.1.2 Splitting rules . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 158
6.1.3 Defining inputs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 167
6.2 Optimal trees . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 173
6.2.1 Honest estimates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
6.2.2 Cost-complexity pruning . . . . . . . . . . . . . . . . . . . . . . . . . . . 174
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 179
7 Additional Topics in High-Dimensional Data Analysis . . . . . 181
7.1 Random forests . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 182
7.1.1 Variable importance . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 183
7.1.2 Missing data methods . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 187
7.1.3 Covariates . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.2 Logic regression . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 198
7.3 Multivariate adaptive regression splines . . . . . . . . . . . . . . . . . . . . 205
7.4 Bayesian variable selection . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 209
7.5 Further readings . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 211
Problems . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 212
Appendix R Basics. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
A.1 Getting started. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 213
A.2 Types of data objects . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 216
A.3 Importing data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 220
A.4 Managing data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 221
A.5 Installing packages . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 224
A.6 Additional help . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 225
References. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 227
Glossary of Terms. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 237
Glossary of Select R Packages. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 243

Subject Index. . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 247
Index of R Functions and Packages. . . . . . . . . . . . . . . . . . . . . . . . . . . . 251

List of Tables
1.1 Sample of FAMuSS data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 19
1.2 Sample of HGDP data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 24
1.3 Sample Virco data . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 26
2.1 2 × 3 contingency table for genotype-disease association . . . . . . . 38
2.2 2 × 2 contingency table for genotype-disease association . . . . . . . 39
3.1 Expected allele distributions under independence . . . . . . . . . . . . . 67
3.2 Observed allele distributions under LD . . . . . . . . . . . . . . . . . . . . . . 67
3.3 Genotype counts for two biallelic loci . . . . . . . . . . . . . . . . . . . . . . . 68
3.4 Haplotype distribution assuming linkage equilibrium and
varying allele frequencies . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 76
3.5 Apparent LD in the presence of population stratification . . . . . . 77
3.6 Genotype counts for two homologous chromosomes . . . . . . . . . . . 79
3.7 Example of the effect of population admixture on HWE . . . . . . . 83
3.8 Genotype distributions for varying allele frequencies . . . . . . . . . . 84
3.9 HWD in the presence of population stratification . . . . . . . . . . . . . 85
4.1 Type-1 and type-2 errors in hypothesis testing . . . . . . . . . . . . . . . 98
4.2 Errors for multiple hypothesis tests . . . . . . . . . . . . . . . . . . . . . . . . . 99
6.1 Sample case-control data by genotype indicators . . . . . . . . . . . . . 161

List of Figures
1.1 Marker SNPs . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 3
1.2 Haplotype pairs corresponding to heterozygosity at two
SNP loci . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 10
1.3 Meiosis and recombination . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 15
1.4 HIV life cycle . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 17
2.1 Confounding . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 34
2.2 Effect mediation . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 36
2.3 Effect modification and conditional association . . . . . . . . . . . . . . . 37
2.4 Possible haplotype pairs corresponding to two SNPs . . . . . . . . . . 59
3.1 Map of pairwise LD . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 71
3.2 Illustration of LD blocks and associated tag SNPs . . . . . . . . . . . . 75
3.3 Application of MDS for identifying population substructure . . . . 92
3.4 Application of PCA for identifying population substructure . . . . 93
6.1 Tree structure . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . . 159
6.2 Classification tree for Example 6.2 . . . . . . . . . . . . . . . . . . . . . . . . . . 164
6.3 Cost-complexity pruning for Example 6.5 . . . . . . . . . . . . . . . . . . . . 178
7.1 Ordered variable importance scores from random forest . . . . . . . 186
7.2 Example boolean statement in logic regression . . . . . . . . . . . . . . . 199
7.3 Single logic regression tree from Example 7.5 . . . . . . . . . . . . . . . . 201
7.4 Sum of logic regression trees from Example 7.5 . . . . . . . . . . . . . . . 202
7.5 Monte Carlo logic regression results from Example 7.6 . . . . . . . . 204

Acronyms
AA: Amino acid
AIDS: Acquired immunodeficiency syndrome
ANOVA: Analysis of variance
BART: Bayesian additive regression tree
BSS: Between-group sum of squares
B-H: Benjamini and Hochberg (approach to multiple testing)
B-Y: Benjamini and Yekutieli (approach to multiple testing)
BMI: Body mass index
BVS: Bayesian variable selection
CART: Classification and regression trees
CV: Cross-validation
DNA: Deoxyribonucleic acid
EM: Expectation-maximization
FAMuSS: Functional SNPs Associated with Muscle Size and Strength
FDR: False discovery rate
FSDR: Free step-down resampling
FWEC: Family-wise error under the complete null
FWEP: Family-wise error under a partial null
FWER: Family-wise error rate
GLM: Generalized linear model
GWAS: Genome-wide association study
GWS: Genome-wide scan
HGDP: Human Genome Diversity Project
HTR: Haplotype trend regression
HWD: Hardy-Weinberg disequilibrium
HWE: Hardy-Weinberg equilibrium
IBD: Identical by descent
IBS: Identical by state
IDV: Indinavir
LD: Linkage disequilibrium
LOH: Loss of heterozygosity
LS: Learning sample
MARS: Multivariate adaptive regression splines
MCMC: Markov chain Monte Carlo
MDS: Multidimensional scaling
MI: Multiple imputation
MIRF: Multiple imputation and random forests

MSE: Mean square error
NFV: Nelfinavir
OOB: Out-of-bag
PCA: Principal components analysis
pFDR: Positive false-discovery rate
Pr: Protease
PRD: Positively regression dependent
QTL: Quantitative trait loci
RF: Random forest
RNA: Ribonucleic acid
RT: Reverse transcriptase
SAM: Significance analysis of microarrays
SNP: Single-nucleotide polymorphism
STP: Simultaneous test procedure
WSS: Within-group sum of squares

1
Genetic Association Studies
Recent technological advancements allowing for large-scale sequencing efforts present an exciting opportunity to uncover the genetic underpinnings of complex diseases. In an attempt to characterize these genetic contributors to disease, investigators have embarked in large numbers on what are commonly referred to as population-based genetic association studies. These studies generally aim to relate genetic sequence information derived from unrelated individuals to a measure of disease progression or disease status. The field of genomics spans a wide array of research areas that involve the many stages of processing from genetic sequence information to protein products and ultimately the expression of a trait. The breadth of genomic investigations also includes studies of multiple organisms, ranging from bacteria to viruses to parasites to humans. In this chapter, two settings are described in which population-based genetic association studies have marked potential for uncovering disease etiology while elucidating new approaches for targeted, individualized therapeutic interventions: (1) complex disease association studies in humans; and (2) studies involving the Human Immunodeficiency Virus (HIV).
In both settings, interest lies in characterizing associations between multiple genetic polymorphisms and a measured trait. In addition, these settings share the essential need to account appropriately for patient-level covariates as potential confounders or modifiers of disease progression in order to draw clinically meaningful conclusions. While these two settings are not comprehensive, together they provide a launching point for discussion of quantitative methods that address the challenges inherent in many genetic investigations. This chapter begins by describing types of population-based studies, which represent one class of investigations within the larger field of genomics research. Also discussed are the fundamental features of data arising from these investigations as well as the analytical challenges inherent in this endeavor.
A.S. Foulkes, Applied Statistical Genetics with R: For Population-based Association Studies, Use R, DOI: 10.1007/978-0-387-89554-3_1, © Springer Science+Business Media LLC 2009

1.1 Overview of population-based investigations
Population-based genetic association studies can be divided roughly into four categories of studies: candidate polymorphism, candidate gene, fine mapping and whole or partial genome-wide scans. In the following paragraphs, each of these types of studies is described briefly, followed by a discussion of how population-based genetic investigations fit within a larger context of genomic-based studies. Further discussions of population-based and family-based designs can be found in Thomas (2004) and Balding (2006).
1.1.1 Types of investigations
Candidate polymorphism studies
Investigations of genotype-trait associations for which there is an a priori hypothesis about functionality are called candidate polymorphism studies. Here the term polymorphism is defined simply as a genetic variant at a single location within a gene. Technically, a variation must be present in at least 1% of a population to be classified as a polymorphism. Such a variable site is commonly referred to as a single-nucleotide polymorphism (SNP). Candidate polymorphism studies typically rely on prior scientific evidence suggesting that the set of polymorphisms under investigation is relevant to the disease trait. The aim is to test for the presence of association, and the primary hypothesis is that the variable site under investigation is functional. That is, the goal of candidate polymorphism studies is to determine whether a given SNP or set of SNPs influences the disease trait directly.
Candidate gene studies
Candidate gene studies generally involve multiple SNPs within a single gene. The choice of SNPs depends on defined linkage disequilibrium (LD) blocks and is discussed further in Section 3.1. The underlying premise of these studies is that the SNPs under investigation capture information about the underlying genetic variability of the gene under consideration, though the SNPs may not serve as the true disease-causing variants. That is, the SNPs that are being studied are not necessarily functional. Consider for example a setting in which we want to investigate the association between a gene and disease. A gene comprises a region of deoxyribonucleic acid (DNA), representing a portion of the human genome. This is illustrated by the shaded rectangle in Figure 1.1. In a simple model, we might assume that a mutation at a single site within this region results in disease. In general, the precise location of this disease-causing variant is not known. Instead, investigators measure multiple SNPs that are presumed "close" to this site on the genome. The term "close" can be thought of as physical distance, though precise methods for choosing appropriate SNPs are described in more detail in Section 3.1.

Fig. 1.1. Marker SNPs
These proximate SNPs are commonly referred to as markers since the observed genotype at these locations tends to be associated with the genotype at the true disease-causing locus. The idea underlying this phenomenon is that, over evolutionary time (that is, over many generations of reproduction), the disease allele was inherited alongside variants at these marker loci. This occurs when the probability of a recombination event in the DNA region between the disease locus and the marker locus is small. Thus, capturing variability in these loci will tend to capture variability in the true disease locus. Further discussion of recombination is provided in Section 1.3.1.
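A standard population-genetics approximation, not given in this text, makes the role of a small recombination probability concrete. Assume a large, randomly mating population and let θ denote the per-generation recombination fraction between the disease locus and a marker, and D_g the linkage disequilibrium between them after g generations (notation introduced here for illustration). Ignoring drift, mutation and selection, the association decays geometrically:

```latex
D_g = (1 - \theta)^{g} \, D_0
```

For example, with θ = 0.001 and g = 100 generations, (0.999)^100 ≈ 0.905, so roughly 90% of the original association survives; with θ = 0.1, the same 100 generations leave only (0.9)^100 ≈ 2.7 × 10^-5 of it. Markers tightly linked to the disease locus therefore retain far more of its signal.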
Fine mapping studies
The aims of fine mapping studies tend to differ from those of candidate gene and candidate polymorphism approaches. Fine mapping studies set out to identify, with a high level of precision, the location of a disease-causing variant. That is, these studies aim to determine precisely where on the genome the mutation that causes the disease is positioned. Knowledge about this location can obviate the need for investigations based on marker loci, thus reducing the error and variability in associated tests. Within the context of mapping studies, the term quantitative trait loci (QTL) is used to refer to a chromosomal position that underlies a trait. Methods for mapping and characterizing QTLs based on controlled experiments of inbred mouse lines are described in Chapter 15 of Lynch and Walsh (1998). Mapping studies are not a focal point of this text; however, we note that in some contexts the term "mapping" is used more loosely to refer to association, the topic of this text, in both family- and population-based studies. For comprehensive and advanced coverage of gene mapping methods, the reader is referred to Siegmund and Yakir (2007).
Genome-wide association studies (GWAS)
Similar to candidate gene approaches, studies involving whole and partial genome-wide scans, termed genome-wide association studies (GWAS), aim to identify associations between SNPs and a trait. GWAS, however, tend to be less hypothesis driven and involve the characterization of a much larger number of SNPs. Partial scans generally involve between 100K and 500K SNPs, while whole-genome scans involve from 500K to 1000K SNPs. While the underlying goal of candidate gene studies and GWAS can be similar, the data preprocessing is generally more extensive and the computational burden greater in the context of GWAS, requiring the application of software packages designed specifically to address the high-dimensional nature of the data, as described in Section 3.3. While GWAS have gained in popularity in recent years due to the advent and widespread availability of "SNP chips", they do not obviate the need for candidate gene studies. Candidate gene studies serve to validate findings from GWAS as well as further explore the biological and clinical interactions between genes and more traditional risk factors for complex diseases, such as age, gender, and other patient-level clinical and demographic characteristics. Importantly, the fundamental statistical concepts and methods described throughout this text are broadly relevant to both candidate gene studies and GWAS.
1.1.2 Genotype versus gene expression
The term "association" study has come to refer to studies that consider the relationship between genetic sequence information and a phenotype. Gene expression studies, based on microarray technology, on the other hand, aim to characterize associations among gene products, such as ribonucleic acid (RNA) or proteins, and disease outcomes. While the scientific findings from these investigations will likely lend support to one another, it is important to recognize that the two types of studies focus on different aspects of the cell life cycle. In the context of association studies, the raw genetic information as characterized by the DNA sequence is the primary predictor variable under investigation, and the aim is to understand how polymorphisms in the sequences explain the variability in a disease trait. Gene expression studies, on the other hand, focus on the extent to which a DNA sequence coding for a specific gene is transcribed into RNA (transcriptomics) and then translated into a protein product (proteomics). The former arises from gene chip technology and is commonly referred to as expression data, while the latter is an output of mass spectrometry. Since transcription and translation depend on many internal and external regulation factors, the expression of a gene sequence represents a different phenomenon than the sequence itself.
A fundamental unit of analysis in population association studies is the genotype. As described in Section 1.2, genotype is a categorical variable that takes on values from a predefined set of discrete characters. For example, in humans, most SNPs are biallelic, indicating there are two possible bases at the corresponding site within a gene (e.g., A and a). Furthermore, since humans are diploid, each individual will carry two bases, one on each of two homologous chromosomes. As a result, the possible genotype values in the population are AA, Aa and aa. In studies of gene expression, on the other hand, the basic unit used in analysis is the gene product, which is typically a real-valued positive number. Notably, investigators may subsequently dichotomize this variable, though this additional level of data processing will depend on the scientific questions under consideration and prior knowledge.
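As a minimal Python sketch (the function name `possible_genotypes` is illustrative, not from the text), the possible genotypes at a single diploid locus are the unordered pairs of alleles drawn with replacement; for a biallelic SNP this yields exactly AA, Aa and aa.

```python
from itertools import combinations_with_replacement


def possible_genotypes(alleles=("A", "a")):
    """Return the unordered allele pairs (one allele per homologous chromosome) at one locus."""
    # Order within a pair carries no information (Aa and aA are the same genotype),
    # so combinations with replacement list each genotype exactly once.
    return ["".join(pair) for pair in combinations_with_replacement(alleles, 2)]


print(possible_genotypes())                 # ['AA', 'Aa', 'aa']
print(possible_genotypes(("A", "C", "G")))  # a hypothetical site with three observed bases
```

In general, k alleles give k(k + 1)/2 possible genotypes, so the three-allele call above returns six.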
In both settings, a measure of disease status or disease progress, referred to as the trait in this text, is also collected for analysis. Notably, in population association studies, we generally treat the genotype as the predictor variable and the trait as the dependent variable. In gene expression studies, this may or may not be the case. Consider for example the setting in which investigators aim to uncover the association between breast cancer and gene expression. In this case, the expression of a gene, as measured by how much RNA is produced, may serve as the main dependent variable, with cancer status as the potential predictor. The alternative formulation is also tenable. In this text, since emphasis is on population-based association studies, it is always assumed that genotype precedes the trait in the causal chain.
While careful consideration must be given to the several notable differences in the form as well as the interpretation of the data, many of the statistical methods described herein are equally applicable to gene expression studies. In the context of genotype data, we might for example test the null hypothesis that cholesterol level is the same for individuals with genotype AA and genotype aa. In the expression setting, the null hypothesis may instead be that the gene expression level is the same for individuals with and without cardiovascular disease. In both cases, a two-sample test for equality of means or medians (e.g., the two-sample t-test or Wilcoxon rank sum test) could be performed and similar approaches to account for multiple testing employed. Notably, preprocessing of gene expression data prior to formal statistical analysis also has its unique challenges. Several seminal texts provide discussion of statistical methods for the analysis of gene expression data. See for example Speed (2003), Parmigiani et al. (2003), McLachlan et al. (2004), Gentleman et al. (2005) and Ewens and Grant (2006).
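The two tests mentioned above can be sketched in Python with only the standard library. The cholesterol values are invented for illustration, and the code computes only the test statistics; p-values would normally come from a statistics package (for example, R's `t.test()` and `wilcox.test()`). The t statistic here is the Welch form, which does not assume equal variances; this is one common variant of the two-sample t-test, not necessarily the one used later in the text. Function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(x, y):
    """Welch two-sample t statistic and Welch-Satterthwaite degrees of freedom."""
    vx, vy = variance(x) / len(x), variance(y) / len(y)  # squared standard errors of the means
    t = (mean(x) - mean(y)) / sqrt(vx + vy)
    df = (vx + vy) ** 2 / (vx ** 2 / (len(x) - 1) + vy ** 2 / (len(y) - 1))
    return t, df


def rank_sum(x, y):
    """Wilcoxon rank-sum statistic: sum of the ranks of x in the pooled sample (ties get mid-ranks)."""
    pooled = sorted(x + y)
    ranks = {}
    i = 0
    while i < len(pooled):
        j = i
        while j < len(pooled) and pooled[j] == pooled[i]:
            j += 1
        ranks[pooled[i]] = (i + 1 + j) / 2  # average of positions i+1, ..., j
        i = j
    return sum(ranks[v] for v in x)


# Hypothetical total cholesterol (mg/dL) for individuals with genotypes AA and aa.
chol_AA = [180, 195, 210, 188, 202]
chol_aa = [215, 230, 205, 240, 225]
t, df = welch_t(chol_AA, chol_aa)
print(f"Welch t = {t:.2f} on {df:.1f} df")         # Welch t = -3.50 on 7.8 df
print("rank sum W =", rank_sum(chol_AA, chol_aa))  # rank sum W = 16.0
```

The rank-based statistic uses only the ordering of the values, which is why the Wilcoxon test is often preferred when a trait is skewed or contains outliers.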
Finally, we distinguish between genetic association studies and the rapidly growing field of research in epigenetics. The term epigenetics is used to describe heritable features that control the functioning of genes within an individual cell but do not constitute a physical change in the corresponding DNA sequence. The epigenome, defined literally as "above the genome" and also referred to as the epigenetic code, includes information on methylation and histone patterns, called epigenetic tags, and plays an essential role in controlling the expression of genes. These tags can inhibit and silence genes, contributing to common complex diseases such as cancer. In this text, we consider traditional epidemiological risk factors, such as smoking status and diet, that may play a role in defining an individual's epigenetic makeup; however, we do not address directly the challenges of epigenetic data. For a further discussion of the role of epigenetics in the link between environmental exposures and disease phenotypes, see Jirtle and Skinner (2007).

1.1.3 Population- versus family-based investigations
The term "population"-based is used to refer to investigations involving unrelated individuals, as distinguished from family-based studies. The latter, as the name implies, involve data collected on multiple individuals within the same family unit. The statistical considerations for family-based studies differ from those of population-based investigations in two primary regards. First, individuals within the same family are likely to be more similar to one another than are individuals from different families. This phenomenon is referred to in statistics as clustering and implies a within-family correlation. The idea is that there is something unmeasurable (latent), such as diet or underlying biological makeup, that makes people from the same family more alike than people across families. As a result, the trait under investigation is more highly correlated among individuals within the same family. Accounting for the potential within-cluster correlation in the statistical analysis of family-based data is essential to making valid inference in these settings.
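The within-family correlation described above can be illustrated with a small simulation. The model, in which each family shares an unmeasured latent effect with standard deviation σ_f and each member adds independent noise with standard deviation σ_e, and the function names are assumptions made for this sketch. Under this model the intraclass correlation is σ_f²/(σ_f² + σ_e²), the share of trait variance due to the shared family effect, and the one-way ANOVA estimator below should recover it approximately.

```python
import random
from statistics import mean


def simulate_families(n_families, size, sd_family, sd_indiv, seed=1):
    """Simulate trait = shared latent family effect + individual noise (hypothetical model)."""
    rng = random.Random(seed)
    families = []
    for _ in range(n_families):
        latent = rng.gauss(0.0, sd_family)  # unmeasured factor shared by all family members
        families.append([latent + rng.gauss(0.0, sd_indiv) for _ in range(size)])
    return families


def anova_icc(families):
    """One-way ANOVA estimate of the intraclass correlation (equal family sizes assumed)."""
    n, k = len(families), len(families[0])
    family_means = [mean(f) for f in families]
    grand_mean = mean(family_means)  # equals the overall mean when family sizes are equal
    msb = k * sum((m - grand_mean) ** 2 for m in family_means) / (n - 1)
    msw = sum((v - m) ** 2 for f, m in zip(families, family_means) for v in f) / (n * (k - 1))
    return (msb - msw) / (msb + (k - 1) * msw)


clustered = simulate_families(300, 4, sd_family=1.0, sd_indiv=1.0)
independent = simulate_families(300, 4, sd_family=0.0, sd_indiv=1.0)
print(round(anova_icc(clustered), 2))    # should be near the true value 1 / (1 + 1) = 0.5
print(round(anova_icc(independent), 2))  # should be near 0
```

Treating the clustered observations as independent would typically understate the standard errors of any estimated association.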
In population-based studies, a fundamental assumption is that individuals are unrelated; however, other forms of clustering may exist. For example, individuals may have been recruited across multiple hospitals so that patients from the same hospital are more similar than those across hospitals. This within-cluster correlation can arise particularly if the catchment areas for the hospitals include different socioeconomic statuses or if the standards for patient care are markedly different. Alternatively, we may have repeated measurements of a trait on the same individual. This is another common situation in which the assumption of independence is violated. In all of these cases, analytical methods for correlated data are again warranted and are essential for correctly estimating variance components. In this text, attention is restricted primarily to methods for independent observations, though consideration is given to clustered data methods in Section 4.4.2. Tests for relatedness are also described in Section 3.3. In-depth and comprehensive coverage of correlated data methods can be found in Diggle et al. (1994), Vonesh and Chinchilli (1997), Verbeke and Molenberghs (2000), Pinheiro and Bates (2000), McCulloch and Searle (2001), Fitzmaurice et al. (2004) and Demidenko (2004).
A second notable difference between population- and family-based studies involves what is termed allelic phase, defined as the alignment of nucleotides on a single homolog. Allelic phase is typically unobservable in population-based association studies but can often be determined in the context of family studies. This concept is described in greater detail in Section 1.2 and Chapter 5. As a result of these differences in the data structure, the methods for analysis of family-based association studies tend to differ from those developed in the context of population-based studies. Though some of the methods described herein, particularly adjustments for multiplicity, are applicable to family-based studies, this text focuses on methods specifically relevant to population association studies, including inferring haplotypic phase (Chapter 5). Elaboration on the specific statistical considerations and methods for family-based studies can be found in Khoury et al. (1993), Liu (1998), Lynch and Walsh (1998), Thomas (2004), Siegmund and Yakir (2007) and Ziegler and Koenig (2007).
1.1.4 Association versus population genetics
Finally, we distinguish between population-based association studies (the topic of this text) and population genetic investigations. Population genetics refers generally to the study of changes in the genetic composition of a population that occur over time and under evolutionary pressures. This includes, for example, the study of natural selection and genetic drift. In this text, we instead focus on estimation and inference regarding the association between genetic polymorphisms and a trait. Statistical methods relevant to population genetics are described in a number of texts, including Weir (1996), Gillespie (1998) and Ewens and Grant (2006).
1.2 Data components and terminology
Data arising from population-based genetic association studies generally comprise three components: (1) the genotype of the organism under investigation; (2) a single trait or multiple traits (also referred to as phenotypes) that are associated with disease progression or disease status; and (3) patient-specific covariates, including treatment history and additional clinical and demographic information. The primary aim of many association studies is to characterize the relationship between the first two of these components, the genotype and a trait. Pharmacogenomic investigations aim specifically to analyze how genotypes modify the effects of drug exposure (the third data component) on a trait. That is, these investigations focus on the statistical interaction between treatment and genotype on a disease outcome. While the specific aims of many association studies do not expressly involve the third data component, patient-specific clinical and demographic information, careful consideration of how these factors influence the relationship between the genotype and trait is essential to drawing valid biological and clinical conclusions. In this chapter, we describe each of these data components, all of which are highly relevant to population-based association studies, and introduce some additional terminology. A discussion of the potential interplay among components of the data and important epidemiological principles, including confounding, effect mediation, effect modification and conditional association, is provided in Section 2.1.2. Further elaboration on the concept of phase ambiguity and appropriate statistical approaches to handling this aspect of the data is given in Chapter 5.
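To make the idea of a treatment-by-genotype interaction concrete, the sketch below compares the effect of drug exposure between two genotype groups using invented trait values. A nonzero difference in treatment effects indicates that genotype modifies the drug effect on an additive scale; in a saturated linear model with indicators for treatment and for genotype aa (AA as reference), this contrast equals the coefficient on their product term. The data and function names are hypothetical.

```python
from statistics import mean

# Hypothetical trait values keyed by (genotype, drug exposure).
data = {
    ("AA", "treated"): [12.0, 14.0, 13.0],
    ("AA", "untreated"): [10.0, 11.0, 9.0],
    ("aa", "treated"): [20.0, 22.0, 21.0],
    ("aa", "untreated"): [10.0, 9.0, 11.0],
}


def treatment_effect(data, genotype):
    """Mean trait among treated minus mean among untreated, within one genotype group."""
    return mean(data[(genotype, "treated")]) - mean(data[(genotype, "untreated")])


def interaction_contrast(data, reference="AA", other="aa"):
    """Difference in treatment effects between genotype groups (0 means no additive interaction)."""
    return treatment_effect(data, other) - treatment_effect(data, reference)


print(treatment_effect(data, "AA"))  # 3.0
print(treatment_effect(data, "aa"))  # 11.0
print(interaction_contrast(data))    # 8.0
```

Here the drug appears to help both groups, but its effect is much larger among aa individuals, which is the kind of pattern a pharmacogenomic investigation sets out to detect.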

1.2.1 Genetic information
Throughout this text, the term genotype is defined as the observed genetic sequence information and can be thought of as a categorical variable. The term observed is used here to distinguish genotype information from haplotype data, as described below. Humans carry two homologous copies of each chromosome, defined as segments of deoxyribonucleic acid (DNA), one inherited from each parent, that code for the same traits but may carry different genetic information. Thus, in its rawest form in humans, the genotype is the pair of DNA bases adenine (A), thymine (T), guanine (G) and/or cytosine (C) observed at a location on the organism's genome. This pair includes one base inherited from each of the two parental genomes and should not be confused with the pairing that occurs to form the DNA double helix. These two types of pairing are described further in Section 1.3.1. Genotype data can take different forms across the array of genetic association studies and depend both on the specific organism under investigation and the scientific questions being considered, as we will see throughout this text.
The term nucleotide refers to a single DNA base linked with both a sugar molecule and phosphate and is often used interchangeably with the term DNA base. Genes are defined simply as regions of DNA that are eventually made into proteins or are involved in the regulation of transcription; that is, regions that regulate the production of proteins from other segments of DNA. In candidate gene studies, the set of genes under investigation is chosen based on known biological function. These genes may, for example, be involved in the production of proteins that are important components of one or more pathways to disease. In whole and partial genome-wide association studies (GWAS), segments of DNA across large regions of the genome are considered and may not be accompanied by an a priori hypothesis about the specific pathways to disease.
In population-based association studies, the fundamental unit of analysis is the single-nucleotide polymorphism (SNP). A SNP simply describes a single base pair change that is variable across the general population at a frequency of at least 1%. The term can also be used more loosely to describe the specific location of this variability. The overriding premise of association studies is that there exists variability in DNA sequences across individuals that captures information on a disease trait. Regions of DNA within and across genes are said to have genetic variability if the alleles within the region vary across a population. Conserved regions, on the other hand, exhibit no variability in a population. Take the simple example of a single base pair location within a gene. If the genotype at this site is AA for all individuals within the population, then this site is referred to as conserved. On the other hand, if AA, Aa and aa are observed, then this site is called variable. Here the letters A and a are used to represent the observed nucleotides (A, C, T or G). For example, A may represent adenine (A) and a may represent thymine (T). Further discussion of notation is provided in Section 2.1.1. Highly conserved regions of DNA are less relevant in the context of association studies since they will not be able to capture the variability in the disease trait. Studying highly conserved regions would be tantamount, in a traditional epidemiological investigation, to recruiting only smokers to a study and then trying to assess the impact of smoking on cancer risk. Clearly, multiple levels of the predictor variable, in this case smoking status, are necessary if the goal is to assess the impact of this factor on disease.
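The distinction between conserved and variable sites, together with the 1% frequency convention for calling a SNP, can be sketched as follows. The function assumes genotypes are recorded as two-character strings such as "Aa"; its name and the label "rare variant" for a variable site below the threshold are choices made for this sketch, not terminology from the text.

```python
from collections import Counter


def classify_site(genotypes, threshold=0.01):
    """Label a single-base site from observed genotypes such as ['AA', 'Aa', 'aa'].

    Returns 'conserved' if only one allele is seen, 'polymorphic' if the least
    common allele has frequency >= threshold (the 1% SNP convention), and
    'rare variant' otherwise.
    """
    allele_counts = Counter("".join(genotypes))  # each genotype contributes two alleles
    if len(allele_counts) == 1:
        return "conserved"
    minor_freq = min(allele_counts.values()) / sum(allele_counts.values())
    return "polymorphic" if minor_freq >= threshold else "rare variant"


print(classify_site(["AA"] * 50))           # conserved
print(classify_site(["AA", "Aa", "aa"]))    # polymorphic
print(classify_site(["AA"] * 99 + ["Aa"]))  # rare variant (1 of 200 alleles = 0.5%)
```

In practice the observed frequency is only an estimate of the population frequency, so a site near the 1% boundary may be classified differently in another sample.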
The term multilocus genotype is used to describe the observed genotype across multiple SNPs or genes, though the terms genotype and multilocus genotype are often used interchangeably. A locus or site can refer to the portion of the genome that encodes a single gene or the location of a single nucleotide on the genome. Multilocus genotype data consist of a string of categorical variables, with elements corresponding to the genotype at each of multiple sites on the genome. For example, an individual's multilocus genotype may be given by (Aa, Bb), where Aa is the genotype at one site and Bb is the genotype at a second site. Again the letters A, a, B and b each represent the observed nucleotides (A, C, T or G). Notably, the specific ordering of alleles is noninformative, so, for example, the genotypes Aa and aA are equivalent.
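Because allele order within a genotype carries no information, software that compares genotypes should store them in a canonical order. A minimal sketch, assuming each locus is coded as a two-character string (the helper names are hypothetical). Sorting places uppercase letters before lowercase ones; the convention itself is arbitrary, and only its consistency matters. The order of loci within the tuple, by contrast, is kept, since it identifies which site each genotype belongs to.

```python
def canonical_genotype(genotype):
    """Put the two alleles of a single-locus genotype in a fixed order, so 'aA' -> 'Aa'."""
    return "".join(sorted(genotype))


def canonical_multilocus(genotypes):
    """Canonical form of a multilocus genotype given as one two-allele string per locus."""
    return tuple(canonical_genotype(g) for g in genotypes)


print(canonical_multilocus(("aA", "Bb")))  # ('Aa', 'Bb')
```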
The term multilocus genotype should not be confused with the concept of haplotype. Haplotype refers to the specific combination of alleles that are in alignment on a single homolog, defined as one of the two homologous chromosomes in humans. Suppose again that an individual's multilocus genotype is given by (Aa, Bb). The corresponding pair of haplotypes, also referred to as this individual's diplotype, could be (AB, ab) or (Ab, aB). That is, either the A and B alleles are in alignment on the same homolog, in which case a and b align, or the A and b alleles align, in which case a and B are in alignment. These two possibilities are illustrated in Figure 1.2 and described further in Section 2.3.2. This uncertainty is commonly referred to as ambiguity in allelic phase or more simply phase ambiguity. In general, a multilocus genotype is observable, although missing data can arise from a variety of mechanisms. Haplotype data, on the other hand, are generally unobservable in population-based studies of unrelated individuals and require special consideration for analysis, as described in detail in Chapter 5.
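Phase ambiguity can be made explicit by listing every haplotype pair compatible with an unphased multilocus genotype. The sketch below (function name hypothetical) assigns one allele at each locus to the first homolog and the other allele to the second. With h heterozygous loci (h ≥ 1) this yields 2^(h-1) distinct pairs, so the ambiguity grows quickly with the number of heterozygous sites.

```python
from itertools import product


def compatible_diplotypes(genotype):
    """List the unordered haplotype pairs consistent with an unphased multilocus genotype.

    genotype: one two-allele string per locus, e.g. ("Aa", "Bb").
    """
    pairs = set()
    # For each locus, choose which of its two alleles sits on the first homolog;
    # the remaining allele goes on the second homolog.
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(g[c] for g, c in zip(genotype, choice))
        h2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))  # (h1, h2) and (h2, h1) are the same diplotype
    return sorted(pairs)


print(compatible_diplotypes(("Aa", "Bb")))             # [('AB', 'ab'), ('Ab', 'aB')]
print(len(compatible_diplotypes(("Aa", "Bb", "Cc"))))  # 4
```

A genotype that is homozygous at all but one locus has a single compatible pair, so no phase ambiguity arises in that case.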
This layer of missingness distinguishes population-based association studies from family-based investigations. If parental information were available on the individual above, then it might be possible to resolve the uncertainty in allelic phase. For example, if the maternal genotype is (AA, BB) and the paternal genotype is (aa, bb), then it is clear that A and B align on the same homolog that was inherited from the maternal side and that a and b align on the copy inherited from the paternal side. In population-based studies, family data are generally not available to infer these haplotypes. However, it is possible to draw strength from the population haplotype frequencies to determine the most likely alignment for an individual. This is discussed in greater detail in Chapter 5.
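The parental argument above can also be sketched in code: keep only the child's candidate haplotype pairs in which one haplotype could have come from the mother and the other from the father. As a simplifying assumption of this sketch, a haplotype counts as transmissible from a parent if that parent carries each of its alleles at the corresponding locus. All function names are hypothetical, and the block includes its own helper for enumerating the compatible pairs.

```python
from itertools import product


def compatible_diplotypes(genotype):
    """Unordered haplotype pairs consistent with an unphased multilocus genotype."""
    pairs = set()
    for choice in product((0, 1), repeat=len(genotype)):
        h1 = "".join(g[c] for g, c in zip(genotype, choice))
        h2 = "".join(g[1 - c] for g, c in zip(genotype, choice))
        pairs.add(tuple(sorted((h1, h2))))
    return sorted(pairs)


def could_transmit(haplotype, parent_genotype):
    """True if the parent carries each allele of the haplotype at the corresponding locus."""
    return all(allele in g for allele, g in zip(haplotype, parent_genotype))


def phase_with_parents(child, mother, father):
    """Child diplotypes in which one haplotype can come from each parent."""
    resolved = []
    for h1, h2 in compatible_diplotypes(child):
        if (could_transmit(h1, mother) and could_transmit(h2, father)) or (
            could_transmit(h2, mother) and could_transmit(h1, father)
        ):
            resolved.append((h1, h2))
    return resolved


# The example from the text: child (Aa, Bb), mother (AA, BB), father (aa, bb).
print(phase_with_parents(("Aa", "Bb"), ("AA", "BB"), ("aa", "bb")))  # [('AB', 'ab')]
```

If both parents are themselves heterozygous at both loci, both candidate pairs survive and the phase remains ambiguous even with family data.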

Fig. 1.2. Haplotype pairs corresponding to heterozygosity at two SNP loci
The term zygosity refers to the comparative genetic makeup of two homologous chromosomes. An individual is said to be homozygous at a given SNP locus if the two observed bases are the same. Heterozygosity, on the other hand, refers to the presence of two different alleles at a given site. For example, someone presenting with the AA or aa genotype would be called homozygous, while an individual with the Aa genotype is said to be heterozygous at the corresponding locus. The term loss of heterozygosity (LOH), commonly used in the context of oncology, refers specifically to the loss of function of an allele when the second allele is already inactive, for example because the individual inherited a heterozygous genotype carrying one inactive allele.
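Zygosity is a simple function of the two alleles at a locus. A minimal sketch, assuming genotypes coded as two-character strings (function names hypothetical), that labels one locus and counts the heterozygous loci in a multilocus genotype:

```python
def zygosity(genotype):
    """Return 'homozygous' if both alleles at the locus match, else 'heterozygous'."""
    first, second = genotype
    return "homozygous" if first == second else "heterozygous"


def n_heterozygous(genotypes):
    """Count heterozygous loci in a multilocus genotype such as ('Aa', 'BB')."""
    return sum(zygosity(g) == "heterozygous" for g in genotypes)


print(zygosity("AA"), zygosity("Aa"), zygosity("aa"))  # homozygous heterozygous homozygous
print(n_heterozygous(("Aa", "BB", "Cc")))              # 2
```

The number of heterozygous loci is what determines how many haplotype pairs are compatible with an unphased multilocus genotype.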
The minor allele frequency, also referred to as the variant allele frequency, refers to the frequency of the less common allele at a variable site. Note that here the term frequency is used to refer to a population proportion, while statisticians tend to use the term to refer to a count. The terms homozygous rare and homozygous variant are commonly used to refer to homozygosity with two copies of the minor allele. Consider the simple example of a single variable site for which AA is present in 75% of the population, Aa is present in 20% and aa is present in 5%. The frequency of the A allele is then equal to (75 + 75 + 20)%/2 = 85%, while the frequency of a is (20 + 5 + 5)%/2 = 15%. In this case, the minor allele (a) frequency is equal to 15%. The major allele is the more common allele and is given by A in this example. An example of calculating the minor and major allele frequencies in R is provided in Section 1.3.3.
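As a Python analogue of this calculation (not the R code referred to above), each genotype contributes two alleles, so an allele's frequency is the proportion of homozygotes for it plus half the proportion of heterozygotes. The sketch works from either genotype proportions or a list of observed genotypes; the function names are hypothetical, and the sample of 20 individuals is invented to match the 75%/20%/5% example.

```python
from collections import Counter


def allele_freqs_from_genotype_props(props):
    """Allele frequencies from genotype proportions, e.g. {'AA': 0.75, 'Aa': 0.20, 'aa': 0.05}."""
    freqs = Counter()
    for genotype, prop in props.items():
        for allele in genotype:  # each genotype carries two alleles
            freqs[allele] += prop / 2
    return dict(freqs)


def allele_freqs_from_sample(genotypes):
    """Allele frequencies from observed genotypes: allele count / (2 * number of individuals)."""
    counts = Counter("".join(genotypes))
    total = 2 * len(genotypes)
    return {allele: n / total for allele, n in counts.items()}


def minor_allele(freqs):
    """The less common allele at a biallelic site."""
    return min(freqs, key=freqs.get)


props = {"AA": 0.75, "Aa": 0.20, "aa": 0.05}
freqs = allele_freqs_from_genotype_props(props)
print({a: round(f, 4) for a, f in freqs.items()})  # {'A': 0.85, 'a': 0.15}
print(minor_allele(freqs))                         # a

sample = ["AA"] * 15 + ["Aa"] * 4 + ["aa"] * 1  # 20 hypothetical individuals
print(allele_freqs_from_sample(sample))
```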
1.2.2 Traits
Population-based genetic association studies generally aim to relate genetic information to a clinical outcome or phenotype, both of which are referred to in this text as a trait. The terms quantitative and binary traits refer respectively to continuous and binary variables, where a binary variable is defined as one that can take on two values, such as diseased or not diseased. The term phenotype is defined formally as a physical attribute or the manifestation of a trait and in the context of association studies generally refers to a measure of disease progression. In the context of viral genetic investigations, phenotypes typically refer to an in vitro measure such as the 50% inhibitory concentration (IC50), which is defined as the concentration of drug required to reduce the replication rate of the virus by 50%. The term outcome tends to mean the presence of disease, though it is often used more generally in a statistical sense to refer to any dependent variable in a modeling framework.
Clinical measures such as total cholesterol and triglyceride levels are examples of quantitative traits, while the indicator for a cardiovascular outcome, such as a heart attack, is an example of a binary trait. In a study of breast cancer, the trait may be defined as an indicator for whether or not a patient has breast cancer. In HIV investigations, traits include viral load (VL), defined as the concentration of virus in plasma, and CD4+ cell count, which is a marker for disease progression. In this text, the terms trait, phenotype and outcome are used broadly to refer to both in vitro and in vivo clinical measures of disease progression and disease status. Survival outcomes, such as the time to onset of AIDS, time to a cardiovascular event, or time to death, as well as ordinal outcomes, such as severity of disease, are other examples of traits that are also highly relevant to the study of genetic associations with disease. While this text focuses on continuous and binary traits, alternative formulations apply and the general methodology presented is applicable to a wider array of measures.
Traits can be measured cross-sectionally or over multiple time points spanning
several weeks to several years. Data measured over time are referred to
as longitudinal or multivariate data and provide several advantages from an
analytical perspective, as discussed in detail in several texts, including
Fitzmaurice et al. (2004). The choice of using cross-sectional or longitudinal data
rests primarily on the scientific question at hand. For example, if interest lies
in determining whether genotype affects the change in VL after exposure to
a specific drug, then a longitudinal design with repeated measures of VL is
essential. On the other hand, if the interest is in characterizing VL as a function
of genotype at initiation of therapy, then cross-sectional data may be
sufficient.

12 1 Genetic Association Studies
While longitudinal studies can increase the power to detect association,
they tend to be more costly than cross-sectional studies and are more
susceptible to missing data and the resulting biases. In this text, we focus
on the analysis of cross-sectional studies, though the overarching themes and
concepts, such as multiple testing adjustments and the need to control type I
error rates, are equally applicable to alternative modeling frameworks.
1.2.3 Covariates
In addition to capturing information on the genotype and trait, population-based
studies generally involve the collection of other information on patient-specific
characteristics. For example, in relating genetic polymorphisms to
total cholesterol level among patients at risk for cardiovascular disease,
additional relevant information may include body mass index (BMI), gender,
age and smoking status. The additional data collected tend to be on variables
that have previously been associated with the trait of interest, in this case
cholesterol level, and may include environmental, demographic and clinical
factors. Consideration of additional variables in the context of analysis will
again depend on the scientific question at hand, the biological pathways to
disease and the overarching goal of the analysis. For example, if the aim of
a study is to identify the best predictive model (that is, to determine the
model that can give the most accurate and precise prediction of cholesterol
level for a new individual), then it is generally a good idea to include variables
previously associated with the outcome in the model. If the goal is to characterize
the association between a given gene and the outcome, then including
additional variables, for example self-reported race, may also be warranted if
these variables are associated with both the genotype and the outcome. This
phenomenon is typically referred to as confounding and is discussed in greater
detail in Chapter 2. On the other hand, if a variable such as smoking status
is in the causal pathway to disease (that is, the gene under investigation
influences the smoking status of an individual, which in turn tends to increase
cholesterol levels), then inclusion of smoking status in the analysis may not
be appropriate. In this text, the term covariate is used loosely to refer to
any explanatory variables that are not of specific independent interest in the
present investigation. Covariates are also commonly referred to as independent
or predictor variables.
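The confounding argument can be made concrete with a small simulation. The following R sketch is illustrative only: it invents a binary covariate `pop` (standing in for something like a self-reported population label) that shifts both the minor-allele frequency and cholesterol level, while the genotype `geno` has no direct effect on cholesterol. All variable names and effect sizes are hypothetical. Comparing a crude `lm()` fit with one that adjusts for `pop` shows how omitting a confounder can manufacture an apparent genotype effect.

```r
# Hypothetical simulation of confounding; all variables and effect sizes are invented.
set.seed(2025)
n <- 5000

# Binary confounder, e.g. a self-reported population label
pop <- rbinom(n, size = 1, prob = 0.5)

# Minor-allele count (0, 1 or 2): the allele frequency differs by population
p_allele <- ifelse(pop == 1, 0.6, 0.2)
geno <- rbinom(n, size = 2, prob = p_allele)

# Cholesterol depends on population only; the genotype has no direct effect
chol <- 190 + 20 * pop + rnorm(n, sd = 15)

crude    <- lm(chol ~ geno)        # ignores the confounder
adjusted <- lm(chol ~ geno + pop)  # conditions on the confounder

round(c(crude    = coef(crude)[["geno"]],
        adjusted = coef(adjusted)[["geno"]]), 2)
```

The crude slope is clearly positive even though `geno` plays no role in generating `chol`, whereas the adjusted slope sits near zero.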
1.3 Data examples
Throughout this textbook, we provide examples using publicly available
datasets, including data arising from two human-based investigations and one
study involving HIV. Each of these datasets can be downloaded as ASCII text
files from the textbook website:
http://people.umass.edu/foulkes/asg.html

1.3 Data examples 13
Below we include a summary of each dataset and example code for importing
the data into R. Instructions for downloading R, inputting data and basic data
manipulation strategies are given in the appendix. Additional elementary R
concepts can be found in Gentleman (2008), Spector (2008), Venables and
Smith (2008) and Dalgaard (2002). Complete information on all of the variables
within each dataset can be found in the associated ReadMe files on the
textbook website.
The two settings described in this section, complex disease association
studies in humans and HIV genotype-trait association studies, serve as a
framework for the methods presented throughout the text. While both the
structure of the data and the overarching aims of the two settings are similar,
there are a few notable differences worth mentioning. In both settings, the
underlying premise is that genetic polymorphisms (that is, variability in the genetic
makeup across a population) will inform us about the variability observed in
the occurrence or presentation of disease. Furthermore, this genetic variability
in both HIV and humans is introduced through the process of replication.
The rate at which these two organisms complete one life cycle, however, is
dramatically different. While a single human generation spans many years,
an estimated $10^{9}$ to $10^{10}$ new virions are generated in a single
day within an HIV-infected individual. Furthermore, the replication process
for HIV, described in more detail below, is highly error-prone, resulting in a
mutation rate of approximately $3 \times 10^{-5}$ per base per replication cycle; see,
for example, Robertson et al. (1995).
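The two rates quoted above can be combined in one line of arithmetic. The R sketch below multiplies the per-base mutation rate by the number of new virions produced per day. Under the simplifying assumption (ours, not the source's) that every new virion results from one replication cycle and ignoring selection and non-viable virions, the product is the expected number of new virions per day that carry a fresh mutation at any one fixed base position.

```r
# Back-of-the-envelope arithmetic using the figures quoted in the text
mu      <- 3e-5           # mutations per base per replication cycle
virions <- c(1e9, 1e10)   # lower and upper estimates of new virions per day

# Expected number of new virions per day with a new mutation at one fixed base
per_site_per_day <- mu * virions
per_site_per_day
```

Even at the lower estimate, every single position is expected to be mutated in tens of thousands of virions each day, which is why within-host diversity is so large.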
As a result, there is a tremendous degree of HIV genetic variability within
a single human host. That is, each HIV-infected individual carries an entire
population of viruses, with each viral particle potentially composed of different
genetic material. In addition, the number of viral particles varies across
individuals. Notably, both of these phenomena, genetic variability and the
amount of virus in plasma, are influenced by current and past drug exposures.
In contrast, humans carry two copies of each chromosome (with the
exception of the sex chromosomes in males), one inherited from each parent, and these
tend to remain constant over an individual's lifetime. While relatively rare,
mutations in the human genome do occur within a lifespan as a result of
environmental exposure to mutagens. This process is notably slower in humans
than in HIV and is not a focal point of this text. Additional details on each
of these two settings are provided below.
1.3.1 Complex disease association studies
Characterizing the underpinnings of complex diseases, such as cardiovascular
disease and cancer, is likely to require consideration of multiple genetic and
environmental factors. As described in Section 1.1.1, human genetic investigations
can involve several stages of processing of human genes, from the
DNA sequence to the protein product, and encompass a wide assortment of
study designs. In this text, consideration is given to population-based studies
of unrelated individuals, and the primary unit of genetic analysis is the DNA
sequence. Humans inherit their genetic information from their two parental
genomes through processes termed mitosis and meiosis. All human cells, with
the exception of gametes, contain 46 chromosomes, including 22 homologous
pairs, called autosomes, and 2 sex chromosomes. Each chromosome is composed
of a DNA double helix with two sugar-phosphate backbones connected
by paired bases. In this context, guanine pairs with cytosine (G-C) and adenine
pairs with thymine (A-T). This pairing is distinct from the pairing of
homologous chromosomes that constitutes an individual's genotype. Notably,
the latter pairing is not restricted, so that, for example, genotypes GT and
AC can be observed.
Mitosis is a process of cell division that results in the creation of daughter
cells that carry identical copies of this complete set of 46 chromosomes.
Meiosis is the process by which a germ cell that contains 46 chromosomes,
consisting of one homolog of each pair from each parent, undergoes two cell divisions,
resulting in daughter cells, called gametes, with only 23 chromosomes each. In
turn, a maternal and a paternal gamete from this new generation combine to form
a zygote. A visual representation of meiosis is provided in Figure 1.3. Notably,
prior to the meiotic divisions, each of the two homologous chromosomes is
replicated to form sister chromatids. Subsequently, in the process of meiosis,
crossing over between these maternal and paternal chromatids can occur. This
is referred to as a cross-over or a recombination event and is depicted in
the figure, where we see an exchange of segments of the paternal chromatid
(shaded) and the maternal chromatid (unshaded). Finally, it is important to
note that the 23 chromosome pairs assort independently, so that there are
$2^{23} = 8{,}388{,}608$ possible combinations of chromosomes within a gamete. This
phenomenon is commonly referred to as independent assortment. The reader
is referred to any of a number of excellent textbooks that describe these
processes in greater detail; see, for example, Chapter 19 of Vander et al. (1994)
and Alberts et al. (1994).
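Independent assortment can be mimicked in a few lines of R. The sketch below ignores recombination (a simplifying assumption) and treats each of the 23 chromosome pairs as contributing either its paternal or its maternal homolog to a gamete, independently and with probability 1/2 each; the count of possible combinations is then $2^{23}$.

```r
# Number of distinct parental-origin combinations a gamete can receive
n_combos <- 2^23
n_combos   # [1] 8388608

# Simulate one gamete: for each of the 23 pairs, pick the paternal or maternal homolog
set.seed(1)
gamete <- sample(c("paternal", "maternal"), size = 23, replace = TRUE)
table(gamete)
```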
Fig. 1.3. Meiosis and recombination

Meiosis ensures two things: (1) each offspring carries the same number of
chromosome pairs (23) as its parents; and (2) the genetic makeup of offspring is
not identical to that of their parents. The latter results from both recombination
and independent assortment. An important aspect of meiosis is that whole
portions or segments of DNA within a chromosome tend to be passed from one
generation to another. However, portions of DNA within chromosomes that
are far from one another are less likely to be inherited together, as a result
of recombination events. In the context of candidate gene studies, the SNPs
under investigation can be known functional SNPs or what are referred to as
haplotype tagging SNPs. Functional SNPs affect a trait directly, serving as a
component within the causal pathway to disease. Haplotype tagging SNPs, on
the other hand, are chosen based on their ability to capture overall variability
within the gene under consideration. These SNPs tend to be associated
with functional SNPs but may not be causal themselves. Notably, the length
of a gene region can vary, as can the number of measured base pairs within
each gene. The latter depends on what are called linkage disequilibrium blocks,
which relate to the probability of recombination within a region. This is
described further in Section 3.1.
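Why distant loci are less likely to be inherited together can be seen in a toy simulation. The R sketch below assumes, purely for illustration, exactly one crossover between the two chromatids shown in Figure 1.3, placed uniformly at random along a chromosome rescaled to length 1. Two loci end up on different parental segments whenever the crossover falls between them, so under this toy model the separation probability equals their distance. The helper `prop_separated()` is a hypothetical name introduced here.

```r
# Toy model (an illustrative assumption): exactly one crossover per meiosis,
# located uniformly at random along a chromosome scaled to length 1.
set.seed(42)
n_meioses <- 100000
crossover <- runif(n_meioses)

# Proportion of meioses in which two loci end up on different parental segments
prop_separated <- function(locus1, locus2) {
  lo <- min(locus1, locus2)
  hi <- max(locus1, locus2)
  mean(crossover > lo & crossover < hi)
}

near <- prop_separated(0.40, 0.42)  # loci close together: about 0.02
far  <- prop_separated(0.10, 0.90)  # loci far apart: about 0.80
c(near = near, far = far)
```

Real meioses can involve zero, one or several crossovers, so this is only a caricature of the relationship formalized through linkage disequilibrium in Section 3.1.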
The structure of human genetics data is similar to that in the HIV setting,
with a couple of notable exceptions. First, in human investigations, each
individual has exactly two bases present at each location, one from each of
the two homologous chromosomes. As described below in Section 1.3.2, in the
viral genetics setting, an individual can be infected with multiple strains,
resulting in any number of nucleotides at a given site. A second difference is that
in many population-based association studies, human genetic sequence data
are assumed to remain constant over the study period. One notable exception
is in the context of cancer, in which DNA damage, arising from environmental
exposure to mutagens, accumulates and results in uncontrolled cell proliferation.
In the complex disease association studies described in this text, the
genes under investigation do not vary within the timeframe of study. This is
a marked difference from the viral genetic setting, in which multiple genetic
polymorphisms can occur within a short period of time, typically in response
to treatment pressures. In the following section, we describe the HIV genetic
setting in greater detail.
1.3.2 HIV genotype association studies
The Human Immunodeficiency Virus (HIV) is a retrovirus that causes a weakening
of the immune system in its infected host. This condition, commonly
referred to as Acquired Immunodeficiency Syndrome (AIDS), leaves infected
individuals vulnerable to opportunistic infections and ultimately death. The
World Health Organization estimates that there have been more than 25 million
AIDS-related deaths in the last 25 years, the majority of which occurred
in the developing world. Highly active anti-retroviral therapies (ARTs) have
demonstrated a powerful ability to delay the onset of clinical disease and
death, but unfortunately access to these therapies continues to be severely
limited. Furthermore, drug resistance, which can be characterized by mutations
in the viral genome, reduces and in some cases eliminates their usefulness.
Both vaccine and drug development efforts, as well as treatment allocation
strategies in the context of HIV/AIDS, will inevitably require consideration
of the genetic contributors to the onset and progression of disease. In this
section, the viral life cycle and notable features of the data relevant to these
investigations are described.
A visual representation of the HIV life cycle is given in Figure 1.4. As
a retrovirus, HIV carries its genome as ribonucleic acid (RNA). From the figure,
we see that the virus begins by fusing with the membrane of a CD4+ cell in
the human host and injecting its core, which includes viral RNA, structural
proteins, and enzymes, into the cell. The viral RNA is then reverse transcribed
into DNA using one of these enzymes, reverse transcriptase. Another enzyme,
integrase, then splices this viral DNA into the host cell DNA. The normal cell
mechanisms for transcription and translation then result in the production
of new viral protein. In turn, this protein is cleaved by the protease (Pr)
enzyme and together with additional viral RNA forms a new virion. As this
virion buds from the cell, the infected cell is killed, ultimately leading to the
depletion of CD4+ cells, which are vital to the human immune system. ARTs,
the drugs used to treat HIV-infected individuals, aim to inhibit each of the
enzymes involved in this life cycle.
Reverse transcription of RNA into DNA is a highly error-prone process,
resulting in a mutation rate of approximately $3 \times 10^{-5}$ per base per cycle. This,
coupled with a very fast replication cycle leading to $10^{9}$ to $10^{10}$ new virions
each day, results in a very high level of genetic variability in the viral genome.

Fig. 1.4. HIV life cycle
The resulting viral population within a single human host is commonly
referred to as a quasi-species. While many of these viruses are not viable (that
is, they cannot survive with the resulting mutations), many others do remain.
Notably, evidence suggests that mutated viruses can be transmitted from one
host to another. The composition of a viral quasi-species tends to be highly
influenced by current and past treatment exposures. HIV therapies generally
consist of a combination of two or three anti-retroviral drugs, commonly
referred to as a drug cocktail. There are currently four classes of drugs that each
target a different aspect of the viral life cycle: fusion inhibitors, nucleoside
reverse transcriptase inhibitors (NRTIs), non-nucleoside reverse transcriptase
inhibitors (NNRTIs) and protease inhibitors (PIs). In the presence of these
treatment pressures, viruses that are resistant to the drugs tend to emerge
as the dominant species within a person. As individuals develop resistance to
one therapy, another combination of drugs may be administered and a new
dominant species can emerge. Evidence suggests that a blueprint of drug
exposure history remains in latent reservoirs in the sense that a resistant species
will re-emerge quickly in the presence of a drug to which a patient previously
exhibited resistance.
The genetic composition of HIV is a single strand of RNA consisting of
the four bases adenine (A), cytosine (C), guanine (G) and uracil (U).
In general, and for the purpose of this textbook, the amino acid (AA)
corresponding to three adjacent bases is of interest since AAs serve as the building
blocks for proteins. Notably, there is not a one-to-one correspondence between
base triplets and AAs (the 64 possible triplets encode only 20 AAs plus stop
signals), and thus there are instances in which base information
is more relevant, for example in phylogenetic analyses aimed at characterizing
viral evolution. There are a total of 20 AAs, though between 1 and 5 are
typically observed within a given site on the viral genome across a sample of
individuals.
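The many-to-one mapping from base triplets to AAs is easy to demonstrate. The R sketch below uses only a small, hand-picked subset of the standard RNA genetic code (alanine, leucine, methionine and tryptophan codons), not a complete table, and a hypothetical helper `to_codons()` that cuts an RNA string into consecutive triplets. Two different base sequences translate to the same AA sequence, which is why AA-level and base-level data can tell different stories.

```r
# Partial RNA codon table: a small hand-picked subset of the standard genetic code
codon_to_aa <- c(
  GCU = "A", GCC = "A", GCA = "A", GCG = "A",   # alanine: four codons
  UUA = "L", UUG = "L", CUU = "L", CUC = "L",
  CUA = "L", CUG = "L",                         # leucine: six codons
  AUG = "M",                                    # methionine: one codon
  UGG = "W"                                     # tryptophan: one codon
)

# Split an RNA string into consecutive base triplets
to_codons <- function(rna) {
  starts <- seq(1, nchar(rna) - 2, by = 3)
  substring(rna, starts, starts + 2)
}

# Two different base sequences that encode the same amino acids
seq1 <- "GCUUUAAUG"
seq2 <- "GCGCUGAUG"
aa1 <- unname(codon_to_aa[to_codons(seq1)])
aa2 <- unname(codon_to_aa[to_codons(seq2)])

4^3                  # number of possible base triplets
identical(aa1, aa2)  # same AA sequence from different bases
```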
As described above, the viral genome changes over time and in response to
treatment exposures. Thus, while viral RNA is single stranded, an individual
can carry multiple genotypically distinct viruses, which we refer to as strains,
resulting from multiple infections or quasi-species that developed over time
within the host. Technically, a strain refers to a group of organisms with a
common ancestor; however, here we use the term more loosely to refer to
genetically distinct viral particles. As a result, multiple AAs can be present
at a given site within a single individual. Typically, a frequency of at least
20% within a single host is necessary for standard population sequencing
technology to recognize the presence of an allele. Thus, the number of AAs at
a given location within an individual tends to range between one and three. In
contrast, there are always exactly two alleles present at a given site within an
individual for the human genetic setting, one inherited from each of the two
parental genomes. Regions of the genome are segments of RNA that generally
code for a protein of interest. For example, in the context of studying viral
resistance, the Protease (Pr) and Reverse Transcriptase (RT) regions
are of interest since these code for enzymes that are targeted by ARTs. The
Envelope region, on the other hand, may be relevant to studies of vaccine
efficacy since it is involved in cell entry. Regions are tantamount to genes in
the context of human genetic studies.
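The effect of the detection limit can be sketched in R. Only the 20% threshold comes from the text; the within-host AA frequencies at the protease site below are invented for illustration. An AA is reported only if its frequency reaches the threshold, so minority variants present in the host can go unrecorded.

```r
# Hypothetical within-host frequencies of the AAs present at one protease site
site_freqs <- c(L = 0.55, I = 0.30, V = 0.10, F = 0.05)

# Detection limit of standard population sequencing, as quoted in the text
threshold <- 0.20

# AAs that would appear in the sequencing result for this individual
detected <- names(site_freqs)[site_freqs >= threshold]
detected
```

Because reported frequencies must sum to at most one, no more than five AAs could ever clear a 20% threshold at a single site, consistent with the one to three usually seen in practice.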
1.3.3 Publicly available data used throughout the text
The FAMuSS study
The Functional SNPs Associated with Muscle Size and Strength (FAMuSS)
study was conducted to identify the genetic determinants of skeletal muscle
size and strength before and after exercise training. A total of n = 1397
college student volunteers participated in the study, and data on 225 SNPs
across multiple genes were collected. The exercise training involved students
training their non-dominant arms for 12 weeks. The primary aim of the study
was to identify genes associated with muscle performance and specifically to
understand associations among SNPs and normal variation in volumetric MRI
(muscle, bone, subQ fat), muscle strength, response to training and clinical
markers of metabolic syndrome. Primary findings are given in Thompson et al.
(2004). A complete list of associated publications can be found in the ReadMe
file on the textbook webpage.
The data are contained in a tab-delimited text file entitled FMS_data.txt
and illustrated, in part, in Table 1.1. The file contains information on genotype
across all SNPs as well as an extensive list of clinical and demographic factors
for a subset (n = 1035) of the study participants.

Table 1.1. Sample of FAMuSS data

    fms.id   actn3_r577x  actn3_rs540874  actn3_rs1815739  actn3_1671064  Term  Gender  Age  Race        NDRM.CH  DRM.CH
1   FA-1801  CC           GG              CC               AA             02-1  Female  27   Caucasian     40.00   40.00
2   FA-1802  CT           GA              TC               GA             02-1  Male    36   Caucasian     25.00    0.00
3   FA-1803  CT           GA              TC               GA             02-1  Female  24   Caucasian     40.00    0.00
4   FA-1804  CT           GA              TC               GA             02-1  Female  40   Caucasian    125.00    0.00
5   FA-1805  CC           GG              CC               AA             02-1  Female  32   Caucasian     40.00   20.00
6   FA-1806  CT           GA              TC               GA             02-1  Female  24   Hispanic      75.00    0.00
7   FA-1807  TT           AA              TT               GG             02-1  Female  30   Caucasian    100.00    0.00
8   FA-1808  CT           GA              TC               GA
9   FA-1809  CT           GA              TC               GA             02-1  Female  28   Caucasian     57.10   14.30
10  FA-1810  CC           GG              CC               AA             02-1  Male    27   Hispanic      33.30    0.00
11  FA-1811  CC           GG              CC               AA
12  FA-1812  CT           GA              TC               GA             02-1  Female  30   Caucasian     20.00    0.00
13  FA-1813  CT           GA              TC               GA             02-1  Female  20   Caucasian     25.00   25.00
14  FA-1814  CT           GA              TC               GA             02-1  Female  23   African Am   100.00   25.00
15  FA-1815
16  FA-1816  TT           GA              TC               GA             02-1  Female  24   Caucasian     28.60   12.50
17  FA-1817  CT           GA              TC               GA
18  FA-1818  CT           GA              TC               GA
19  FA-1819  CT           GG              CC               AA             02-3  Male    34   Caucasian      7.10    7.10
20  FA-1820  CC           GA              TC               GA             02-3  Female  31   Caucasian     75.00   20.00

We begin by specifying the web location of the data file as follows:
> fmsURL <- "http://people.umass.edu/foulkes/asg/data/FMS_data.txt"
We then use the read.delim() function to pull the data into R directly from
the textbook website:
> fms <- read.delim(file=fmsURL, header=T, sep="\t")
By specifying header=T, we are indicating that the first row of the text file
contains the variable names. Alternatively, we could have specified header=F,
which assumes that the first line of the file is the first record of data. We also
indicate with the argument sep="\t" that a tab separates each variable within
a line of the data. Common alternative specifications are sep="," and sep=" ",
indicating comma and space delimiters, respectively. As described in the
appendix, other useful functions for reading data into R include read.table()
and read.csv(). The specifications given above are the default values for
read.delim() and need not be written out explicitly. We do so for the
purpose of illustration.
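The effect of the header and sep arguments can be checked without a network connection. The R sketch below writes a tiny tab-delimited sample inline, using values copied from Table 1.1 (the column subset and the object names `sample_txt`, `toy` and `toy_noheader` are our own choices), and passes it to read.delim() through the text argument that read.delim() forwards to read.table().

```r
# A tiny tab-delimited sample written inline (values copied from Table 1.1)
sample_txt <- paste(
  "id\tactn3_rs540874\tGender\tAge",
  "FA-1801\tGG\tFemale\t27",
  "FA-1802\tGA\tMale\t36",
  "FA-1807\tAA\tFemale\t30",
  sep = "\n"
)

# header = TRUE: the first line holds variable names; sep = "\t": tab-delimited
toy <- read.delim(text = sample_txt, header = TRUE, sep = "\t")
str(toy)

# header = FALSE: the first line is read as data and columns get default names
toy_noheader <- read.delim(text = sample_txt, header = FALSE, sep = "\t")
names(toy_noheader)   # "V1" "V2" "V3" "V4"
nrow(toy_noheader)    # 4 rows, including the former header line
```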
A portion of the data on the first 20 individuals in this sample is
displayed in Table 1.1. Included in this table are the genotypes for four SNPs
within the actn3 gene and a few corresponding clinical and demographic
parameters. The variable Term indicates the year and term (1 = spring, 2 =
summer, 3 = fall) of recruitment into the study, and Gender, Age and Race are
all self-declared values of these demographic factors. The percentage changes
in muscle strength before and after exercise training are given by NDRM.CH
for the non-dominant arm and DRM.CH for the dominant arm. Generation of
the LaTeX code for Table 1.1 is done in R using the xtable() function in
the xtable package. The print() function with the floating.environment
option set equal to 'sidewaystable' is used to generate a landscape table.
Alternatively, we can print the table in R as shown below:
> attach(fms)
> data.frame(id, actn3_r577x, actn3_rs540874, actn3_rs1815739,
+ actn3_1671064, Term, Gender, Age, Race, NDRM.CH,DRM.CH)[1:20,]
We use the attach() function so that we can call each variable by its name
without having to indicate the corresponding data frame. For example, after
submitting the command attach(fms), we can call the variable Gender
without reference to fms. Alternatively, we could write fms$Gender, which is valid
whether or not the attach() function was used. A data frame must be
re-attached at the start of a new R session for the corresponding variable names
to be recognized. The numbers 1:20 within the square brackets and before
the comma are used to indicate that row numbers 1 through 20 are to be
printed.
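Because attach() places a data frame's columns on the search path, objects with the same names can mask one another, so many R users prefer the $ operator or with(). The R sketch below contrasts these on a toy data frame built from four rows of Table 1.1. It is not the full FAMuSS data, and the name `fms_toy` is hypothetical.

```r
# Toy data frame with a few rows copied from Table 1.1 (not the full FAMuSS data)
fms_toy <- data.frame(
  id     = c("FA-1801", "FA-1802", "FA-1803", "FA-1804"),
  Gender = c("Female", "Male", "Female", "Female"),
  Age    = c(27, 36, 24, 40)
)

# $ extraction works whether or not attach() has been called
fms_toy$Gender

# with() evaluates an expression inside the data frame without attaching it
with(fms_toy, mean(Age))   # 31.75

# Row and column indexing: rows 1 through 2, two named columns
fms_toy[1:2, c("id", "Age")]
```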
We see from this table that the genotype for id = FA-1801 at the first
recorded SNP (r577x) within the gene actn3 is the pair of bases CC. In most
cases, SNPs are biallelic, which means that two bases are observed within a
site across individuals. For example, for SNP r577x in gene actn3, the letters
C and T are observed, while at rs540874 in gene actn3, the two bases G and
A are observed. This pairing is not restricted (that is, A can be present with
T, C or G within another site), distinguishing this from the pairing of bases
that occurs to form the DNA double helix within a single homolog (in which
A always pairs with T and C with G).
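Because each genotype in Table 1.1 is stored as a two-letter string, the two bases can be pulled apart with substr(). The R sketch below uses the actn3_rs540874 genotypes of the first seven individuals in Table 1.1 to list the distinct bases observed at the site and to flag each individual as homozygous (both bases identical) or heterozygous, terms defined formally in the next paragraph.

```r
# Genotypes at actn3_rs540874 for the first seven individuals in Table 1.1
geno <- c("GG", "GA", "GA", "GA", "GG", "GA", "AA")

# Split each two-letter genotype into its two bases (alleles)
allele1 <- substr(geno, 1, 1)
allele2 <- substr(geno, 2, 2)

# Distinct bases observed at this site across individuals: a biallelic SNP has two
sort(unique(c(allele1, allele2)))

# Homozygous when both bases agree, heterozygous otherwise
ifelse(allele1 == allele2, "homozygous", "heterozygous")
```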
Recall that an individual is said to be homozygous if the two observed
bases (alleles) are the same at a given site and heterozygous if they differ. From
Table 1.1, for example, we see that individual FA-1801 from the FAMuSS
study is homozygous at actn3_rs540874 with the observed genotype equal
to GG. Likewise, individual FA-1807 is homozygous at this site since the
observed genotype is AA. Individuals FA-1802, FA-1803 and FA-1804, on the other
hand, are all heterozygous at actn3_rs540874 since their genotypes contain
both the G and A alleles. Determination of a minor allele and its frequency is
demonstrated in the following example using data from the FAMuSS study.
Example 1.1 (Identifying the minor allele and its frequency). Suppose we are
interested in determining the minor allele for the SNP labeled actn3_rs540874
in the FAMuSS data. To do this, we need to calculate the corresponding allele
frequencies. First we determine the number of observations with each genotype
for this SNP using the following code:
> attach(fms)
> # factor() is needed in R >= 4.0, where read.delim() keeps strings as characters
> GenoCount <- summary(factor(actn3_rs540874))
> GenoCount
AA GA GG NA's
226 595 395 181
The summary() function, applied to a factor, outputs the counts of each level
of the factor given as its argument, followed by the number of missing values
(NA's). In this case, we see n = 226 individuals have
the AA genotype, n = 595 individuals have the GA genotype and n = 395
individuals have the GG genotype. An additional n = 181 individuals are
missing this genotype. For simplicity, we assume that this missingness is
non-informative. That is, we make the strong assumption that our estimates of the
allele frequencies would be the same had we observed the genotypes for these
individuals. To calculate the allele frequencies, we begin by determining our
reduced sample size (that is, the number of individuals with complete data):
> NumbObs <- sum(!is.na(actn3_rs540874))
The genotype frequencies for AA, GA and GG are then given respectively by
> GenoFreq <- as.vector(GenoCount/NumbObs)
> GenoFreq
[1] 0.1858553 0.4893092 0.3248355 0.1488487

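The fourth element of GenoFreq is the NA's count divided by NumbObs
(181/1216); it is not a genotype frequency and is not used below.
The frequencies of the A and G alleles are calculated as follows: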
> FreqA <- (2*GenoFreq[1] + GenoFreq[2])/2
> FreqA
[1] 0.4305099
> FreqG <- (GenoFreq[2] + 2*GenoFreq[3])/2
> FreqG
[1] 0.5694901
Thus, we report that A is the minor allele at this SNP locus, with a frequency
of 0.43. In this case, an individual is said to be homozygous rare at SNP
rs540874 if the observed genotype is AA. Homozygous wildtype, on the other
hand, refers to the state of having two copies of the more common allele, or
the genotype GG in this case.
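The same allele frequencies can be obtained by counting alleles directly, which avoids carrying the NA's element through the calculation. In the R sketch below, the genotype counts are taken from the GenoCount output above, and `allele_freq()` is a hypothetical helper written for illustration: each typed individual contributes two alleles, homozygotes contribute two copies of their allele and heterozygotes one copy of each.

```r
# Allele frequencies from biallelic genotype counts (counts from the output above)
allele_freq <- function(n_AA, n_Aa, n_aa) {
  n_alleles <- 2 * (n_AA + n_Aa + n_aa)   # each typed individual carries two alleles
  p <- (2 * n_AA + n_Aa) / n_alleles      # frequency of the first allele (A here)
  c(p = p, q = 1 - p)                     # q is the frequency of the other allele (G)
}

# Genotypes AA, GA and GG at actn3_rs540874
freqs <- allele_freq(n_AA = 226, n_Aa = 595, n_aa = 395)
round(freqs, 4)   # p = 0.4305, q = 0.5695
```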
Alternatively, we can achieve the same result using the genotype() and
summary() functions within the genetics package. First we install and load
the R package as follows:
> install.packages("genetics")
> library(genetics)
We then create a genotype object and summarize the corresponding genotype
and allele frequencies:
> Geno <- genotype(actn3_rs540874,sep="")
> summary(Geno)
Number of samples typed: 1216 (87%)
Allele Frequency: (2 alleles)
Count Proportion
G 1385 0.57
A 1047 0.43
NA 362 NA
Genotype Frequency:
Count Proportion
G/G 395 0.32
G/A 595 0.49
A/A 226 0.19
NA 181 NA
Heterozygosity (Hu) = 0.4905439
Poly. Inf. Content = 0.3701245
Here we again see that A corresponds to the minor allele at this SNP locus,
with a frequency of 0.43, while G is the major allele, with a greater frequency
of 0.57.
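The Heterozygosity and Poly. Inf. Content lines of this output can be reproduced by hand from the allele counts. The R sketch below assumes the standard unbiased heterozygosity estimator, $H_u = \frac{N}{N-1}\left(1 - \sum_i p_i^2\right)$ with $N$ typed alleles, and the usual polymorphism information content formula, $1 - \sum_i p_i^2 - \sum_{i<j} 2 p_i^2 p_j^2$. We have not inspected the genetics package source; these formulas are our assumption, chosen because by our hand calculation they agree to seven decimal places with the values printed above.

```r
# Allele counts from the summary(Geno) output above
allele_counts <- c(G = 1385, A = 1047)
N <- sum(allele_counts)          # 2432 typed alleles
p <- allele_counts / N           # allele frequencies

# Unbiased heterozygosity: N / (N - 1) * (1 - sum(p^2))
Hu <- N / (N - 1) * (1 - sum(p^2))

# Polymorphism information content:
# 1 - sum(p_i^2) - sum over allele pairs i < j of 2 * p_i^2 * p_j^2
pairs <- combn(length(p), 2)
PIC <- 1 - sum(p^2) - sum(2 * p[pairs[1, ]]^2 * p[pairs[2, ]]^2)

round(c(Hu = Hu, PIC = PIC), 7)
```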

The Human Genome Diversity Project (HGDP)
The Human Genome Diversity Project (HGDP) began in 1991 with the aim
of documenting and characterizing the genetic variation in humans worldwide
(Cann et al., 2002). Genetic and demographic data are recorded on n = 1064
individuals across 27 countries. In this text, we consider genotype information
across four SNPs from the v-akt murine thymoma viral oncogene homolog 1
(AKT1) gene. In addition to genotype information, each individual's country
of origin, gender and ethnicity are recorded. For complete information on
this study, readers are referred to http://www.stanford.edu/group/morrinst/hgdp.html.
Data are contained in the tab-delimited text file HGDP_AKT1.txt
on the textbook website. Again we begin by specifying the location of the
data:
> hgdpURL <- "http://people.umass.edu/foulkes/asg/data/HGDP_AKT1.txt"
Then we apply the read.delim() function to read the data into R:
> hgdp <- read.delim(file=hgdpURL, header=T, sep="\t")
Data on the first 20 observations in this dataset are provided in Table 1.2.
Here the variable Population refers to ethnicity, Geographic.origin is the
country of origin and Geographic.area is a more general description of
location for the individuals in this cohort.
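Since the aim of the HGDP is to characterize how genetic variation differs across the world, a natural first look is a cross-tabulation of genotype by geographic area. The R sketch below uses six rows transcribed from Table 1.2 rather than the full dataset; the data frame name `hgdp_toy` and the column label `AKT1.C0756A` are our own choices and may not match the column names in HGDP_AKT1.txt exactly.

```r
# A few rows transcribed from Table 1.2 (not the full HGDP data)
hgdp_toy <- data.frame(
  Geographic.area = c("Central Africa", "Central Africa", "Northern Africa",
                      "South America", "China", "China"),
  AKT1.C0756A     = c("CA", "CA", "AA", "AA", "AA", "AA")
)

# Cross-tabulate genotype at C0756A by geographic area
geno_by_area <- with(hgdp_toy, table(Geographic.area, AKT1.C0756A))
geno_by_area
```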
The Virco data
Several publicly available datasets that include viral sequence information,
treatment histories and clinical measures of disease progression for
HIV-infected individuals are downloadable at the Stanford Resistance Database:
http://hivdb.stanford.edu/. In this text we consider a data set generated by
Virco™, which includes protease (Pr) sequence information on 1066 viral
isolates and corresponding fold-resistance measures for each of eight Pr
inhibitors. Fold resistance is a comparative measure of responsiveness to a drug,
where the referent value is for a wildtype or consensus virus. The consensus
AA at a site on the viral genome is defined as the AA that is most common
at this site in the general population. The data are comma delimited and
contained in the file Virco_data.csv on the textbook website. We use the
read.csv() function in R to read in the data:
> vircoURL <- "http://people.umass.edu/foulkes/asg/data/Virco_data.csv"
> virco <- read.csv(file=vircoURL, header=T, sep=",")
Note that we now indicate sep="," since the data are comma delimited.
This is the default for the read.csv() function. Complete information on the
variables in the database and associated publications can be found on the
Stanford Resistance Database website. A sample of the data on a select set
of variables is given in Table 1.3.

Table 1.2. Sample of HGDP data

                                                                                       AKT1
    Well  ID         Gender  Population     Geographic.origin         Geographic.area  C0756A  C6024T  G2347T  G2375A
1   B12   HGDP00980  F       Biaka Pygmies  Central African Republic  Central Africa   CA      CT      TT      AA
2   A12   HGDP01406  M       Bantu          Kenya                     Central Africa   CA      CT      TT      AA
3   E5    HGDP01266  M       Mozabite       Algeria (Mzab)            Northern Africa  AA      TT      TT      AA
4   B9    HGDP01006  F       Karitiana      Brazil                    South America    AA      TT      TT      AA
5   E1    HGDP01220  M       Daur           China                     China            AA      TT      TT      AA
6   H2    HGDP01288  M       Han            China                     China            AA      TT      TT      AA
7   G3    HGDP01246  M       Xibo           China                     China            AA      TT      TT      AA
8   H10   HGDP00705  M       Colombian      Colombia                  South America    AA      TT      TT      AA
9   H11   HGDP00706  F       Colombian      Colombia                  South America    AA      TT      TT      AA
10  H12   HGDP00707  F       Colombian      Colombia                  South America    AA      TT      TT      AA
11  A2    HGDP00708  F       Colombian      Colombia                  South America    AA      TT      TT      AA
12  A3    HGDP00709  M       Colombian      Colombia                  South America    AA      TT      TT      AA
13  A4    HGDP00710  M       Colombian      Colombia                  South America    AA      TT      TT      AA
14  F5    HGDP00598  M       Druze          Israel (Carmel)           Israel           AA      TT      TT      AA
15  G11   HGDP00684  F       Palestinian    Israel (Central)          Israel           AA      TT      TT      AA
16  C2    HGDP00667  F       Sardinian      Italy                     Southern Europe  AA      TT      TT      AA
17  E10   HGDP01155  M       North Italian  Italy (Bergamo)           Southern Europe  AA      TT      TT      AA
18  B7    HGDP01415  M       Bantu          Kenya                     Central Africa   AA      TT      TT      AA
19  B8    HGDP01416  M       Bantu          Kenya                     Central Africa   AA      TT      TT      AA
20  G4    HGDP00865  F       Maya           Mexico                    Central America  AA      TT      TT      AA

The variable SeqID is the sequence identifier, and IsolateName is the name
given to the corresponding isolate. The drug-specific fold-resistance variables
are labeled Drug.Fold, so, for example,
Indinavir (IDV) fold resistance is given by the variable IDV.Fold. A higher
fold-resistance value indicates that the corresponding isolate is more resistant
(less sensitive) to the indicated drug than a wildtype sequence based on an in
vitro assay.
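Fold resistance is commonly computed as a ratio of 50% inhibitory concentrations, the IC50 of the isolate divided by the IC50 of the reference virus. The text describes it only as a comparative measure relative to a wildtype referent, so this ratio definition is an assumption here, and the IC50 values below are invented. The R sketch shows why a larger value means a less sensitive isolate and why the log10 scale, on which 0 means no change from the reference, is often used for analysis.

```r
# Hypothetical IC50 values (arbitrary concentration units) for one drug
ic50_reference <- 0.02                              # wildtype / consensus reference virus
ic50_isolates  <- c(iso1 = 0.02, iso2 = 0.10, iso3 = 2.00)

# Fold resistance: isolate IC50 relative to the reference IC50
fold <- ic50_isolates / ic50_reference
fold            # roughly 1, 5 and 100

# On the log10 scale, 0 means no change relative to the reference
log10(fold)
```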
The genotype information is available in two formats. The first
representation is given by the variables with names that begin with the
letter P followed by a number. This number refers to the amino acid position
within the Pr region of the viral sequence. For example, the variable P10
represents the tenth AA position within the Pr region of the viral genome.
A "-" in the data table indicates the presence of the population consensus
AA, while a letter indicates a mutation in the form of the AA corresponding
to this letter. For example, for SeqID == 3852, a variant AA is observed at
site 10 in the form of isoleucine (I). A total of 99 P variables are included
in this dataset, corresponding to the 99 AA sites in the protease region of
the viral genome. An alternative formulation of the data is given by the
variable MutList, which is a list of all the observed mutations. These data
are coded by a letter, followed by a number, followed by one or more letters.
The number is again the AA location, the first letter is the consensus AA at
this site, and the letter(s) following the number give the AA(s) observed at
the corresponding location. For example, L10I indicates that isoleucine (I)
is present in place of leucine (L) at site 10.
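The two genotype representations described above can be illustrated with a short Python sketch. The book's own analyses use R, and Python is used here only for illustration. The helper names `parse_mutation`, `parse_mut_list`, `is_mutated` and `proportion_mutated` are hypothetical. The sketch assumes every MutList entry consists of one consensus letter, a position, and one or more observed letters, as in the rows of Table 1.3.

```python
import re
from typing import NamedTuple

# Hypothetical helpers for illustration; not functions from the Virco data or any package.
MUTATION_CODE = re.compile(r"^([A-Z])(\d+)([A-Z]+)$")


class Mutation(NamedTuple):
    consensus: str  # consensus AA at the site (the first letter)
    position: int   # AA position within the protease (Pr) region
    observed: str   # observed AA(s); two or more letters, as in G73CS, mean several AAs


def parse_mutation(code: str) -> Mutation:
    """Split a MutList-style code such as 'L10I' into its three parts."""
    match = MUTATION_CODE.match(code.strip())
    if match is None:
        raise ValueError(f"unrecognized mutation code: {code!r}")
    consensus, position, observed = match.groups()
    return Mutation(consensus, int(position), observed)


def parse_mut_list(mut_list: str) -> list[Mutation]:
    """Parse a comma-separated list such as 'L10I, M46I, L63P'."""
    return [parse_mutation(code) for code in mut_list.split(",") if code.strip()]


def is_mutated(p_value: str) -> bool:
    """P-variable coding: '-' means the consensus AA; any letter means a variant AA."""
    return p_value.strip() != "-"


def proportion_mutated(p_column: list[str]) -> float:
    """Observed proportion of isolates with a variant AA at one site."""
    return sum(is_mutated(v) for v in p_column) / len(p_column)


# Row 1 of Table 1.3 (SeqID 3852)
print(parse_mut_list("L10I, M46I, L63P, G73CS, V77I, L90M, I93L")[0])
# P82 values for the first five rows of Table 1.3
print(proportion_mutated(["-", "T", "A", "-", "A"]))  # 0.6
```

For SeqID 3852, the first parsed entry, L10I, agrees with the value I recorded for that isolate under P10. The last call computes the kind of per-site proportion asked for in Problem 1.5, here using only five rows.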

Table 1.3. Sample Virco data
    SeqID  IsolateName  IDV.Fold  P10  P63  P71  P82  P90  CompMutList
 1  3852   CA3176          14.20  I    P    -    -    M    L10I, M46I, L63P, G73CS, V77I, L90M, I93L
 2  3865   CA3191          13.50  I    P    V    T    M    L10I, R41K, K45R, M46I, L63P, A71V, G73S, V77I, V82T, I85V, L90M, I93L
 3  7430   CA9998          16.70  I    P    V    A    M    L10I, I15V, K20M, E35D, M36I, I54V, R57K, I62V, L63P, A71V, G73S, V82A, L90M
 4  7459   Hertogs-Pt1      3.00  I    P    T    -    M    L10I, L19Q, E35D, G48V, L63P, H69Y, A71T, L90M, I93L
 5  7460   Hertogs-Pt2      7.00  -    -    -    A    -    K14R, I15V, V32I, M36I, M46I, V82A
 6  7461   Hertogs-Pt3     21.00  I    P    V    A    M    L10I, K20R, M36I, N37D, I54V, R57K, D60E, L63P, A71V, I72V, V82A, L90M, I93L
 7  7462   Hertogs-Pt4      8.00  -    P    -    A    -    M36I, G48V, I54V, D60E, I62V, L63P, V82A
 8  7463   Hertogs-Pt5    100.00  I    -    V    A    M    L10I, I13V, M36I, N37D, G48V, I54V, D60E, Q61E, I62V, I64V, A71V, V82A, L90M, I93L
 9  7464   Hertogs-Pt6     18.00  -    P    -    A    -    V32I, M46I, L63P, V82A, I93L
10  7465   Hertogs-Pt7     15.00  -    I    V    A    M    E34K, R41K, K43R, I54V, I62V, L63I, A71V, T74S, V82A, L90M
11  7466   Hertogs-Pt8      4.00  I    P    -    -    -    L10I, E35D, M36I, G48V, D60E, L63P, H69Y
12  7467   Hertogs-Pt9     45.00  -    P    V    -    -    I13V, K14R, K20M, E35D, M36I, N37D, K45R, L63P, H69X, A71V, I84V, L89X
13  15492  RC-V33778        1.00  X    -    V    -    -    L10X, I15V, I50V, I62V, A71V, I72V, N83Z
14  15493  RC-V213888       1.00  F    A    -    -    -    L10F, I13V, L33F, M46X, I50V, L63A, T74S, V77I, L89M
15  15494  RC-V207648       2.00  F    -    -    -    -    L10F, V32I, M46I, I47V, I62V
16  15495  RC-V022292       3.00  -    P    V    A    M    E34Z, R41K, K43R, I54V, I62V, L63P, A71V, V82A, L90M, I93L
17  15498  RC-V020855       1.00  I    X    X    -    -    L10I, G48V, I54X, L63X, I64V, A71X, I93L
18  15499  RC-V216965       1.00  -    T    V    M    X    L33X, K43Z, M46V, I50V, Q58E, D60E, L63T, I64V, A71V, I72Z, V77I, V82M, L90X
19  15500  RC-V020829       0.50  I    P    -    -    -    L10I, D30N, E35D, M36V, P39Z, L63P, N88D, I93L
20  15501  RC-V020834       1.00  -    P    -    -    M    E35D, M36I, G48V, L63P, H69Z, L90M

Problems
1.1. State the primary analytic considerations that distinguish
population-based and family-based investigations.
1.2. Define and contrast the following terms: (a) genotype, (b) haplotype,
(c) phase, (d) homologous, (e) allele, and (f) zygosity.
1.3. Based on the FAMuSS data, determine the minor allele and its frequency
for the actn3_1671064 SNP. Report these frequencies overall and stratified
by the variable labeled Race. Interpret your findings.
1.4. Using the HGDP data, summarize the genotype frequencies for the SNP
labeled AKT1.C6024T, overall and by geographic area, using the variable
named geographic.area. Interpret the results.
1.5. Report the observed proportion of mutations at sites 1, 10, 30, 71, 82
and 90 in the protease region of the HIV genome for the Virco data using the
variables labeled P1, P10, P30, P71, P82 and P90. Explain your findings.

2
Elementary Statistical Principles
This chapter includes coverage of several statistical and epidemiological
concepts that are broadly relevant to the study of association among multiple
variables and specifically important in the analysis of genetic association in
population-based investigations. The chapter is divided into three sections.
Section 2.1 offers a general background, including the notation used
throughout this text and some elementary probability concepts. This section
also provides a basic overview of several fundamental epidemiological concepts
relevant to any population-based investigation, including confounding, effect
mediation, effect modification and conditional association. The reader is
referred to Rothman and Greenland (1998) for a comprehensive overview of
these and other epidemiological principles. Some of these concepts are very
similar to the genetic data concepts discussed in Chapter 3, though they tend
to have different nomenclature. All of these elements are important to the
discovery and characterization of genotype-trait associations. Section 2.2
describes several simple measures and tests of statistical association,
including correlation analysis, contingency table analysis and simple linear
and logistic regression. Also included in this section is an introduction to
methods for multivariable analysis. Finally, Section 2.3 offers an overview of
the analytic challenges inherent in population-based genetic investigations.
The testing procedures described throughout this chapter can be applied to
each of a set of genotype variables, though they generally require an
adjustment for multiple comparisons, as described in Chapter 4. Further
extensions that allow simultaneous assessment of associations for a group of
SNPs or genes are presented in Chapters 5 through 7.
A.S. Foulkes, Applied Statistical Genetics with R: For Population-based
Association Studies, Use R, DOI: 10.1007/978-0-387-89554-3_2,
© Springer Science+Business Media LLC 2009

2.1 Background
2.1.1 Notation and basic probability concepts
Notation
As described in Section 1.2, data arising from population-based association
studies comprise three primary components: the trait, which in this text is
either quantitative or binary; a single or multilocus genotype; and several
patient-level covariates. Throughout this text, we use $y$ to represent the
trait under study, $x$ to represent the genotype data and $z$ to represent
covariates. For example, we let $y_i$ be the trait for the $i$th individual in
our sample, where $i = 1, \ldots, n$ and $n$ is our total sample size.
Furthermore, $x_{ij}$ is the genotype at the $j$th SNP for individual $i$,
where $j = 1, \ldots, p$ and $p$ is the total number of SNPs under study.
Finally, $z_{ik}$ is the value of the $k$th covariate for individual $i$, where
$k = 1, \ldots, m$ and $m$ is the number of covariates.
Boldface font is used to indicate vectors, and capital letters are used to
represent matrices. For example, we use $\mathbf{x} = (x_1, \ldots, x_n)^T$ to
represent an $n \times 1$ vector of genotypes at a single site on the genome
across all individuals in our sample and $\mathbf{x}_j = (x_{1j}, \ldots, x_{nj})^T$
to represent the genotypes at the $j$th site under study. The notation $T$ is
used to indicate the transpose of a vector. Similarly,
$\mathbf{y} = (y_1, \ldots, y_n)^T$ is a vector with its $i$th element
corresponding to the trait for individual $i$. In several settings, we
additionally use $\mathbf{x}_i = (x_{i1}, \ldots, x_{ip})^T$ to denote the
genotype data for the $i$th individual across $p$ variables. The difference
between $\mathbf{x}_i$ and $\mathbf{x}_j$ will be made clear by the particular
context presented. Recall that $y$ can be quantitative, such as CD4 count or
total cholesterol, in which case it is typically measured with error.
Alternatively, $y$ may be a binary random variable representing disease status.
An $n \times p$ matrix of genotype variables is given by $X$, with the
$(i,j)$-element corresponding to the $j$th genotype for individual $i$.
Similarly, the $n \times m$ matrix $Z$ is used to represent the entire set of
covariates. These may include multiple clinical, demographic and environmental
variables, such as age, sex, weight and second-hand smoke exposures. The
concatenated $n \times (p + m)$ matrix given by $[X \; Z]$ represents all
potential explanatory variables. If dimensions are not indicated, they can
generally be inferred based on the specific model of association under
consideration. Finally, while Roman letters are used to represent data, Greek
symbols are used to represent model parameters. These parameters are
unobservable quantities that we are generally interested in estimating or
making inference about.
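To make this notation concrete, the sketch below stores a tiny, made-up dataset as plain Python lists and then forms [X Z]. The dataset consists of a trait vector y, an n x p genotype matrix X and an n x m covariate matrix Z. The values and the helper names (`column`, `concatenate`) are hypothetical. Note that Python indexes from 0, whereas the text indexes individuals and SNPs from 1.

```python
# Toy data with hypothetical values: n = 3 individuals, p = 2 SNPs, m = 2 covariates.
y = [120.5, 98.0, 143.2]   # y = (y_1, ..., y_n)^T: one trait value per individual

X = [                      # n x p genotype matrix: row i is x_i, column j is x_j
    ["AA", "Aa"],
    ["Aa", "aa"],
    ["AA", "AA"],
]

Z = [                      # n x m covariate matrix: age and sex
    [34, "F"],
    [51, "M"],
    [29, "F"],
]


def column(matrix: list[list], j: int) -> list:
    """Return column j (0-based), e.g. the vector x_j across all individuals."""
    return [row[j] for row in matrix]


def concatenate(X: list[list], Z: list[list]) -> list[list]:
    """Form the n x (p + m) matrix [X Z] by joining each individual's two rows."""
    if len(X) != len(Z):
        raise ValueError("X and Z must have the same number of rows (individuals)")
    return [x_row + z_row for x_row, z_row in zip(X, Z)]


n, p, m = len(X), len(X[0]), len(Z[0])
XZ = concatenate(X, Z)
print(n, p, m)          # 3 2 2
print(X[1][0])          # x_21 in the text's 1-based notation: Aa
print(column(X, 1))     # x_2: ['Aa', 'aa', 'AA']
print(len(XZ[0]))       # p + m = 4
```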
Application of any statistical approach requires first understanding the
potential ways in which the components of our data can be defined. In general,
the genotype for individual $i$ at site $j$, denoted $x_{ij}$, is a categorical
variable that takes on two or more levels. For example, $x_{ij}$ may be defined
as a three-level factor variable taking on the three possible genotypes at a
biallelic site, given by AA, Aa and aa. Alternatively, we can define $x_{ij}$
as a binary variable indicating the presence of at least one variant allele
at a single SNP locus. That is, for example, we can let $x_{ij} = 0$ if the
observed genotype is AA and $x_{ij} = 1$ otherwise. Yet another alternative is
to define $x_{ij}$ as an indicator for the presence of any variant alleles
across multiple SNPs within a given gene. For example, suppose there are two
SNPs within the $j$th gene. We can then let $x_{ij} = 0$ if the multilocus
genotype is (AA, BB) and $x_{ij} = 1$ otherwise.
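The three codings of $x_{ij}$ just described can be written as small functions. This is a minimal sketch using the text's generic allele labels: A/a at one SNP, and B/b at a second SNP in the same gene. The function names are hypothetical and are not part of any package.

```python
# Illustrative coding functions (hypothetical names) for the three options in the text.
# Alleles use the text's generic labels: A/a at the first SNP, B/b at the second.
VALID_A_GENOTYPES = {"AA", "Aa", "aa"}


def as_factor(genotype: str) -> str:
    """Three-level factor: the genotype label itself (AA, Aa or aa)."""
    if genotype not in VALID_A_GENOTYPES:
        raise ValueError(f"unexpected genotype: {genotype!r}")
    return genotype


def any_variant(genotype: str) -> int:
    """Binary coding at one SNP: 0 if AA, 1 if at least one variant allele."""
    return 0 if as_factor(genotype) == "AA" else 1


def any_variant_in_gene(multilocus: tuple[str, str]) -> int:
    """Gene-level indicator for two SNPs: 0 if the genotype is (AA, BB), 1 otherwise."""
    return 0 if multilocus == ("AA", "BB") else 1


genotypes = ["AA", "Aa", "aa", "AA"]
print([as_factor(g) for g in genotypes])        # ['AA', 'Aa', 'aa', 'AA']
print([any_variant(g) for g in genotypes])      # [0, 1, 1, 0]
print(any_variant_in_gene(("AA", "BB")))        # 0
print(any_variant_in_gene(("AA", "Bb")))        # 1
```

The binary and gene-level codings discard information: the first does not distinguish Aa from aa, and the second does not record which SNP carries the variant. Choosing among them is part of defining the model of association.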
Note that throughout this text we offer simple examples in which the letters
A, a, B and b are used to represent the alleles at two biallelic loci. The
two letters A and B are used to indicate different sites, and the capital and
lowercase letters are intended to represent the different alleles at each
corresponding site. For example, if the nucleotides C and G are observed at
the first site, then we may let A = C and a = G. In general, A represents the
major allele and a represents the minor allele. This notation is presented in
several genetics texts, though alternative notation has also been used. Most
notably, we often see the notation $A_1$, $A_2$, $B_1$ and $B_2$, where now the
subscript is used to indicate the corresponding allele. The advantage of this
alternative notation is that we are not restricted to biallelic SNPs. That is,
if more than two alleles are present in a population at a given site, we can
represent these as $A_1, A_2, A_3, \ldots, A_k$ for $k > 2$. In this text, we use
the more common A, a notation since it tends to be less cumbersome within
larger formulas.
Statistical independence
The concept of independence is a cornerstone of statistical inference. In the
context of genetic association studies, we are often interested in testing the
null hypothesis that the trait under study is independent of genotype. That
is, under this hypothesis a summary value of the trait, such as the mean, is
the same for all levels of genotype. As we will see in Chapter 3, assessing
independence of alleles across two or more loci (linkage equilibrium) or
independence of alleles across two homologous chromosomes (Hardy-Weinberg
equilibrium) is also of general interest in genetic association studies. In
probability terms, we say that two events are independent if the joint
probability of their occurrence is the product of the probabilities of each
event occurring on its own.
Consider for example a single SNP with alleles given by A and a. Suppose one
event is defined as the presence of A on one of the two homologous chromosomes
and a second event is defined as the presence of A on the second of the two
homologous chromosomes. Further, let $p_A$ be the allele frequency of A and
$p_{AA}$ be the joint probability of having two copies of A, one on each of the
two homologous chromosomes. We say these two events are independent (that is,
the occurrences of A on each of the two homologous chromosomes are
independent) if $p_{AA} = p_A p_A$. For example, if the probability of A is
$p_A = 0.50$, then, under independence, the probability of the AA genotype is
$p_{AA} = 0.5^2 = 0.25$.
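The following sketch estimates $p_A$ from hypothetical genotype counts. It then compares the observed $p_{AA}$ with $p_A p_A$, which is the value expected if the alleles on the two homologous chromosomes are independent. It uses exact fractions to avoid rounding noise. It compares point estimates only; formal assessment of Hardy-Weinberg equilibrium is the subject of Chapter 3. The counts and function names are illustrative assumptions.

```python
from fractions import Fraction


def allele_frequency_A(n_AA: int, n_Aa: int, n_aa: int) -> Fraction:
    """p_A: fraction of the 2n chromosomes that carry allele A."""
    n = n_AA + n_Aa + n_aa
    return Fraction(2 * n_AA + n_Aa, 2 * n)


def observed_and_independent_pAA(n_AA: int, n_Aa: int, n_aa: int) -> tuple[Fraction, Fraction]:
    """Return (observed p_AA, p_A * p_A), the value p_AA would take under independence."""
    n = n_AA + n_Aa + n_aa
    p_A = allele_frequency_A(n_AA, n_Aa, n_aa)
    return Fraction(n_AA, n), p_A * p_A


# Hypothetical genotype counts (AA, Aa, aa), not taken from any dataset
print(*observed_and_independent_pAA(25, 50, 25))  # 1/4 1/4  (consistent with independence)
print(*observed_and_independent_pAA(40, 20, 40))  # 2/5 1/4  (excess AA homozygotes)
```

Both hypothetical samples have $p_A = 0.50$. Only the first matches the $p_{AA} = 0.25$ worked out above.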
Dependency, on the other hand, refers to the situation in which the
probability of one event depends on the outcome of another event. For example,
suppose one event is defined as having a mutation in the BRCA1 gene, a known
risk factor for breast and ovarian cancers, and a second event is defined as
developing breast cancer. In this case, the probability of developing breast
cancer depends on whether a mutation is present; that is, the second event
depends on whether the first event occurred. Formally, let our two events be
denoted $E_1$ and $E_2$. The conditional probability of the second event given
that the first one has occurred is written $Pr(E_2 \mid E_1)$, where $\mid$ is
read "given". In general, provided $Pr(E_1) > 0$, we have
$Pr(E_2 \mid E_1) = Pr(E_2 \text{ and } E_1)/Pr(E_1)$. If the two events are
independent, then $Pr(E_2 \text{ and } E_1) = Pr(E_2)Pr(E_1)$, so this reduces
to $Pr(E_2 \mid E_1) = Pr(E_2)$, and similarly we have $Pr(E_1 \mid E_2) = Pr(E_1)$.
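To make the definition of conditional probability concrete, the sketch below estimates $Pr(E_2 \mid E_1)$ from a 2 x 2 table of counts. The counts are invented for illustration and are not BRCA1 data. Each probability is estimated by a simple sample proportion. The helper name `prob` is hypothetical.

```python
from fractions import Fraction

# Hypothetical counts for 100 individuals (invented for illustration, not real data).
# E1 = carries the mutation, E2 = develops the disease.
counts = {
    ("mutation", "disease"): 6,
    ("mutation", "no disease"): 4,
    ("no mutation", "disease"): 9,
    ("no mutation", "no disease"): 81,
}
total = sum(counts.values())


def prob(*cells: tuple[str, str]) -> Fraction:
    """Empirical probability of the union of the listed (E1 status, E2 status) cells."""
    return Fraction(sum(counts[c] for c in cells), total)


p_e1 = prob(("mutation", "disease"), ("mutation", "no disease"))
p_e2 = prob(("mutation", "disease"), ("no mutation", "disease"))
p_e1_and_e2 = prob(("mutation", "disease"))

p_e2_given_e1 = p_e1_and_e2 / p_e1           # Pr(E2 | E1) = Pr(E2 and E1) / Pr(E1)
print(p_e2_given_e1, p_e2)                   # 3/5 3/20
print(p_e1_and_e2 == p_e1 * p_e2)            # False: the events are dependent here
```

In this made-up table, knowing that $E_1$ occurred raises the estimated probability of $E_2$ from 3/20 to 3/5. That change is exactly the departure from $Pr(E_2 \mid E_1) = Pr(E_2)$ that defines dependence.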
Expectation
The expectation of a discrete random variable can be thought of simply as a
weighted average of its possible values, where the weights are equal to the
probabilities that the variable takes on the corresponding values. For
example, suppose $Y$ is a random variable that takes on the value 1 with
probability $p$ and the value 0 with probability $(1 - p)$. We say $Y$ is a
Bernoulli random variable and we have the following, where $E[\cdot]$ is used
to denote expectation:
$$E[Y] = 1 \cdot p + 0 \cdot (1 - p) = p \tag{2.1}$$
In the continuous case, the sum is replaced by an integral in which values are
weighted by the probability density function. Notably, if $c$ is a constant,
then $E[c] = c$, and the expectation of a mean 0 random variable is equal to 0.
We will see the use of expectations in Section 2.2.3 and Chapter 5. Emphasis,
however, is placed on the general concept of expectation as a weighted
average, and technical details are not emphasized.
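Equation (2.1) treats expectation as a weighted average. The sketch below computes exactly that for any discrete distribution given as a mapping from values to probabilities. The function name `expectation` and the choice p = 3/10 are illustrative assumptions.

```python
from fractions import Fraction


def expectation(distribution: dict) -> Fraction:
    """Discrete expectation: sum of value * probability over all possible values."""
    if sum(distribution.values()) != 1:
        raise ValueError("probabilities must sum to 1")
    return sum(value * prob for value, prob in distribution.items())


p = Fraction(3, 10)
bernoulli = {1: p, 0: 1 - p}              # Y = 1 with probability p, 0 otherwise
print(expectation(bernoulli))             # 3/10, matching E[Y] = p in (2.1)
print(expectation({7: Fraction(1)}))      # 7: the expectation of a constant is the constant
```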
Likelihood
Maximum likelihood is probably the most widely used approach to finding point
estimates of population-level parameters based on a sample of data. For the
genetic association setting described herein, we may be interested, for
example, in characterizing the effect of having a variant allele at a given
SNP locus on a quantitative trait. Maximum likelihood is one approach, with
several desirable properties, to deriving an estimate of this effect from a
sample of observations. A complete introduction to estimation and associated
concepts is provided in Casella and Berger (2002).
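Briefly, suppose our parameter of interest is denoted $\theta$. The likelihood
function is given by
$$L(\theta \mid \mathbf{y}) = L(\theta \mid y_1, \ldots, y_n) = f(y_1, \ldots, y_n \mid \theta) \tag{2.2}$$
where $n$ is the number of individuals in our sample and
$f(y_1, \ldots, y_n \mid \theta)$ is the joint probability distribution of
$\mathbf{y} = (y_1, \ldots, y_n)$. Under the assumption that our observations are
independent and identically distributed, the joint distribution factors into a
product of the individual distributions, so we have
$$L(\theta \mid \mathbf{y}) = \prod_{i=1}^{n} f(y_i \mid \theta)$$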
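For a binary trait, the i.i.d. likelihood is a product of Bernoulli probabilities, so its logarithm is a sum. The sketch below maximizes that log-likelihood over a grid of candidate values of $\theta$. The data and helper names are hypothetical. A grid search is used only for transparency: for the Bernoulli model, the maximum likelihood estimate is known in closed form to be the sample proportion.

```python
import math


def bernoulli_log_likelihood(theta: float, y: list[int]) -> float:
    """log L(theta | y) = sum_i log f(y_i | theta) for i.i.d. 0/1 observations."""
    if not 0 < theta < 1:
        raise ValueError("theta must lie strictly between 0 and 1")
    return sum(math.log(theta) if yi == 1 else math.log(1 - theta) for yi in y)


def grid_mle(y: list[int], steps: int = 1000) -> float:
    """Crude maximum likelihood estimate: the grid value with the largest log-likelihood."""
    grid = [k / steps for k in range(1, steps)]   # 0.001, 0.002, ..., 0.999
    return max(grid, key=lambda theta: bernoulli_log_likelihood(theta, y))


y = [1, 0, 1, 1]          # hypothetical binary trait values (e.g., disease status)
print(grid_mle(y))        # 0.75, the sample proportion
```

Working on the log scale turns the product in the i.i.d. likelihood into a sum. This is numerically safer and has the same maximizer, because the logarithm is increasing.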

Another Random Scribd Document
with Unrelated Content

[379] A great trade was carried on in those times in dried fish from
the Pontic or Black Sea. See Strabo, p. 320, ed. Casaub.
[380] It was near the end of b.c. 40 that Antonius was roused from
his “sleep and drunken debauch.” He sailed from Alexandria to Tyrus
in Phoenicia, and thence by way of Cyprus and Rhodes to Athens,
where he saw Fulvia, who had escaped thither from Brundusium. He
left her sick at Sikyon, and crossed from Corcyra (Corfu) to Italy.
(Appian, Civil Wars, v. 52-55.) Brundusium shut her gates against him,
on which he commenced the siege of the city. The war was stopped
by the reconciliation that is mentioned in the text, to which the news
of the death of Fulvia greatly contributed. Antonius had left her at
Sikyon without taking leave of her, and vexation and disease put an
end to her turbulent life. (Appian, Civil Wars, v. 59.)
[381] See the Life of Cicero, c. 44, note.
[382] The meeting with, Sextus Pompeius was in b.c. 39, at Cape
Miseno, which is the northern point of the Gulf of Naples.
Sextus was the second son of Pompeius Magnus. He was now master
of a large fleet, and having the command of the sea, he cut off the
supplies from Rome. The consequence was a famine and riots in the
city. (Appian, Civil Wars, v. 67, &c.) Antonius slaughtered many of the
rioters, and their bodies were thrown into the Tiber. This restored
order; “but the famine,” says Appian, “was at its height, and the
people groaned and were quiet.”
[383] P. Ventidius Bassus was what the Romans call a “novus homo,”
the first of his family who distinguished himself at Rome. He had the
courage of a soldier and the talents of a true general. When a child he
was made prisoner with his mother in the Marsian war (Dion Cass.
xliii. 51), and he appeared in the triumphal procession of Pompeius
Strabo (Dion Cass. xlix. 21). The captive lived to figure as the
principal person in his own triumph, b.c. 38. In his youth he supported
himself by a mean occupation. Hoche, when he was a common
soldier, used to embroider waistcoats. Julius Cæsar discovered the
talents of Bassus, and gave him employment suited to his abilities. In
b.c. 43 he was Prætor and in the same year Consul Suffectus.
(Drumann, Antonii, p. 439; Gell. xv. 4.)
[384] Cockfighting pleased a Roman, as it used to do an Englishman.
The Athenians used to fight quails.

[385] The name is written indifferently Hyrodes or Orodes (see the
Life of Crassus, c. 18).
Plutarch, on this as on many other occasions, takes no pains to state
facts with accuracy. Labienus lost his life and the Parthians were
defeated; and that was enough for his purpose. The facts are stated
more circumstantially by Dion Cassius (xlviii. 40, 41).
[386] The president of the gymnastic exercises. Dion Cassius (xlviii.
39) tells us something that is characteristic of Antonius. The fulsome
flattery of the Athenians gave him on this occasion the title of the
young Bacchus, and they betrothed the goddess Minerva to him.
Antonius said he was well content with the match; and to show that
he was in earnest he demanded of them a contribution of one million
drachmæ as a portion with his new wife. He thus fleeced them of
about 2800l. sterling. No doubt Antonius relished the joke as well as
the money.
[387] The sacred olive was in the Erektheium on the Acropolis of
Athens. Pausanias (i. 28) mentions a fountain on the Acropolis near
the Propylæa; and this is probably what Plutarch calls Clepsydra, or a
water-clock. The name Clepsydra is given to a spring in Messenia by
Pausanias (iv. 31). Kaltwasser supposes the name Clepsydra to have
been given because such a spring was intermittent. Such a spring the
younger Pliny describes (Ep. iv. 30).
[388] The defeat of Pacorus (b.c. 38) is told by Dion Cassius (xlix. 19).
The ode of Horace (Carm. iii. 6) in which he mentions Pacorus seems
to have been written before this victory, and after the defeat of
Decidius Saxa (b.c. 40; Dion, xlviii. 25).
[389] Commagene on the west bordered on Cilicia and Cappadocia.
The capital was Samosata, on the Euphrates, afterwards the
birthplace of Lucian. This Antiochus was attacked by Pompeius b.c. 65,
who concluded a peace with him and extended his dominions (Appian
Mithrid. 106, &c.).
[390] C. Sossius was made governor of Syria and Cilicia by Antonius.
He took the island and town of Aradus on the coast of Phoenice (b.c.
38); and captured Antigonus, the son of Aristobulus, in Jerusalem.
[391] P. Canidius Crassus. His campaign against the Iberi of Asia is
described by Dion Cassius (xlix. 24).
[392] Antonius and Cæsar met at Tarentum (Taranto) in the spring of
b.c. 37. The events of this meeting are circumstantially detailed by

Appian (Civil Wars, v. 93, &c.). Dion Cassius (xlviii. 54) says that the
meeting was in the winter.
[393] M. Vipsanius Agrippa, the constant friend of Cæsar, and
afterwards the husband of his daughter Julia. Mæcenas, the patron of
Virgil and Horace.
[394] Μυοπάρωνες are said to be light ships, such as pirates use,
adapted for quick sailing.
[395] Cæsar spent this year in making preparation against Sextus
Pompeius. In b.c. 36 Pompeius was defeated on the coast of Sicily. He
fled into Asia, and was put to death at Miletus by M. Titius, who
commanded under Antonius (Appian, Civil Wars, v. 97-121).
[396] The passage to which Plutarch alludes is in the Phædrus, p.
556.
[397] That is, the Ocean, as opposed to the Internal Sea or the
Mediterranean. Kaltwasser proposes to alter the text to “internal sea,”
for no sufficient reason.
[398] This was the Antigonus who fell into the hands of Sossius, when
he took Jerusalem on the Sabbath, as Pompeius Magnus had done.
(Life of Pompeius, 39; Dion Cassius, xlix. 22, and the notes of
Reimarus.) Antigonus was tied to a stake and whipped before he was
beheaded. The kingdom of Judæa was given to Herodes, the son of
Antipater.
[399] Plutarch probably alludes to some laws of Solon against
bastardy.
[400] A common name of the Parthian kings (see the Life of Crassus,
c. 33). This Parthian war of Antonius took place in b.c. 36.
[401] See Plutarch’s Life of Themistocles, c. 29. It was an eastern
fashion to grant a man a country, or a town and its district, for his
maintenance and to administer. Fidelity to the giver was of course
expected. The gift was a kind of fief.
[402] Among the Persians, and as it here appears among the
Parthians, “to send a right hand” was an offer of peace and friendship
(Xenophon, Anab. ii. 4, who uses the expression “right hands”).
[403] The desert tract in the northern part of Mesopotamia is meant.

[404] There is error as to the number of cavalry of Artavasdes either
here or in c. 50. See the notes of Kaltwasser and Sintenis: and as to
Artavasdes, Life of Crassus, c. 19, 33, and Dion Cassius, xlix. 25.
[405] No doubt Iberians of Spain are meant.
[406] Was the most south-western part of Media, and it
comprehended the chief part of the modern Azerbijan.
[407] Dion Cassius (xlix. 25) names the place Phraaspa or Praaspa,
which may be the right name. The position of the place and the
direction of the march of Antonius are unknown.
[408] Was a king of Pontus: he was ransomed for a large sum of
money. Reimarus says in a note to Dion Cassius (xlix. 25) that
Plutarch states that Polemon was killed. The learned editor must have
read this chapter carelessly.
[409] See Life of Crassus, c. 10.
[410] οἱ γνωριμώτατοι, which Kaltwasser translates “those who were
most acquainted with the Romans;” and his translation may be right.
[411] Cn. Domitius Ahenobarbus, which is the Roman mode of writing
the word. He was the son of Domitius who was taken by Cæsar in
Corfinium (Life of Cæsar, c. 34); and he is the Domitius who deserted
Antonius just before the battle of Actium (c. 63).
[412] The Mardi inhabited a tract on the south coast of the Caspian,
where there was a river Mardus or Amardus.
Plutarch has derived his narrative of the retreat from some account by
an eye-witness, but though it is striking as a picture, it is quite useless
as a military history. The route is not designated any further than this,
that Antonius had to pass through a plain and desert country. It is
certain that he advanced considerably east of the Tigris, and he
experienced the same difficulties that Crassus did in the northern part
of Mesopotamia. (Strabo, p. 523, ed. Casaub. as to the narrative of
Adelphius, and Casaubon’s note.)
[413] These were used by the slingers (funditores) in the Roman
army.
[414] ἐπ’ οὐρὰν, Sintenis: but the MS. reading is ἀπ’ οὐρᾶς, “from the
rear.” See the note of Schaefer, and of Sintenis.

[415] Contrary to Parthian practice. Compare the Life of Crassus, c.
27.
[416] These are the soldiers in full armour. Sintenis refers to the Life
of Crassus, c. 25. See life of Antonius, c. 49, οἱ δὲ ὁπλῖται ... τοῖς
θυρεοῖς.
[417] The Romans called this mode of defence Testudo, or tortoise. It
is described by Dion Cassius (xlix. 30). The testudo was also used in
assaulting a city or wall. A cut of one from the Antonine column is
given in Smith’s Dict. of Antiquities, art. Testudo.
[418] The forty-eighth part of a medimnus. The medimnus is
estimated at 11 gal. 7·1456 pints English. The drachma (Attic) is
reckoned at about 9-3/4d. (Smith’s Dict. of Antiquities.) But the
scarcity is best shown by the fact that barley bread was as dear as
silver. Compare Xenophon (Anab. i. 5, 6) as to the prices in the army
of Cyrus, when it was marching through the desert.
[419] The allusion is to the retreat of the Greeks in the army of Cyrus
from the plain of Cunaxa over the highlands of Armenia to Trapezus
(Trebizond); which is the main subject of the Anabasis of Xenophon.
[420] Salt streams occur on the high lands of Asia. Mannert, quoted
by Kaltwasser, supposes that the stream here spoken of is one that
flows near Tabriz and then joins another river. If this were the only
salt stream that Antonius could meet with on his march, the
conclusion of the German geographer might be admitted.
[421] The modern Aras. The main branch of the river rises in the
same mountain mass in which a branch of the Euphrates rises, about
39° 47’ N. lat., 41° 9’ E. long. It joins the Cyrus or Kur, which comes
from the Caucasus, about thirty miles above the entrance of the
united stream into the Caspian Sea. Mannert, quoted by Kaltwasser,
conjectures that Antonius crossed the river at Julfa (38° 54’ N. lat.). It
is well to call it a conjecture. Any body may make another, with as
much reason. Twenty-seven days’ march (c. 50) brought the Romans
from Phraata to the Araxes, but the point of departure and the point
where the army crossed the Araxes are both unknown.
[422] The second expedition of Antonius into Armenia was in b.c. 34,
when he advanced to the Araxes. After the triumph, Artavasdes was
kept in captivity, and he was put to death by Cleopatra in Egypt after
the battle of Actium, b.c. 30 (Dion Cassius xlix. 41, &c).
[423] Compare Dion Cassius, xlix. 51.

[424] The name is written both Phraates and Phrates in the MSS.
[425] She went to Athens in b.c. 35.
[426] In b.c. 34, Antonius invaded Armenia and got Artavasdes the
king into his power. The Median king with whom Antonius made this
marriage alliance (b.c. 33) was also named Artavasdes. Alexander, the
son of Antonius by Cleopatra, was married to Jotape, a daughter of
this Median king.
[427] This is Plutarch’s word. Its precise meaning is not clear, but it
may be collected from the context. It was something like a piece of
theatrical pomp.
[428] Or Cidaris. (See Life of Pompeius, c. 33.) The Cittaris seems to
be the higher and upright part of the tiara; and sometimes to be used
in the same sense as tiara. The Causia was a Macedonian hat with a
broad brim. (See Smith’s Dict. of Antiquities.)
[429] After the defeat of Sextus Pompeius, Lepidus made a claim to
Sicily and attempted a campaign there against Cæsar. But this feeble
man was compelled to surrender. He was deprived of all power, and
sent to live in Italy. He still retained his office of Pontifex Maximus
(Appian, Civil Wars, v. 126; Dion Cassius, xlix. 11).
[430] This is an emendation of Amiot in place of the corrupt word
Laurians.
[431] The preparation was making in b.c. 32. Antonius spent the
winter of this year at Patræ in Achæa.
[432] An account of these exactions is given by Dion Cassius (l. 10).
They show to what a condition a people can be reduced by tyranny.
[433] Such is the nature of the people. It is hard to rouse them; and
their patience is proved by all the facts of history.
[434] It was usual with the Romans, at least with men of rank, to
deposit their wills with the Vestals for safe keeping.
[435] This great library at Alexandria is said to have been destroyed
during the Alexandrine war. See the Life of Cæsar, c. 49.
[436] The translators are much puzzled to explain this. Kaltwasser
conjectures that Antonius in consequence of losing some wager was
required to do this servile act; and accordingly he translates part of
the Greek text “in consequence of a wager that had been made.”

[437] The only person of the name who is known as an active
partizan at this time was C. Furnius, tribune of the plebs, b.c. 50. He
was a legatus under M. Antonius in Asia in b.c. 35. Here Plutarch
represents him as a partizan of Cæsar. If Plutarch’s Furnius was the
tribune, he must have changed sides already. As to his eloquence,
there is no further evidence of it than what we have here.
[438] C. Calvisius Sabinus, who was consul b.c. 39 with L. Marcius
Censorinus.
[439] The name occurs in Horace, 1 Sat. 5; but the two may be
different persons. As to the Roman Deliciæ see the note of Coraes;
and Suetonius, Augustus, c. 83.
[440] Dion Cassius (1. 4) also states that war was declared only
against Cleopatra, but that Antonius was deprived of all the powers
that had been given to him.
[441] Now Pesaro in Umbria.
[442] See Pausanias, i. 25. 2.
[443] The text of Bryan has, “and Deiotarus, king of the Galatians:”
and Schaefer follows it. But see the note of Sintenis.
[444] Actium is a promontory on the southern side of the entrance of
the Ambraciot Gulf, now the gulf of Arta. It is probably the point of
land now called La Punta. The width of the entrance of the gulf is
about half a mile. Nicopolis, “the city of Victory,” was built by Cæsar
on the northern side of the gulf, a few miles from the site of Prevesa.
The battle of Actium was fought on the 2nd of September, b.c. 31. It
is more minutely described by Dion Cassius (l. 31, &c.; li. 1).

[445] This word means something to stir up a pot with, a ladle or
something of the kind. The joke is as dull as it could be.
[446] Sintenis observes that Plutarch has here omitted to mention the
place of Arruntius, who had the centre of Cæsar’s line (c. 66). C.
Sossius commanded the left of the line of Antonius. Insteius is a
Roman name, as appears from inscriptions. Taurus is T. Statilius
Taurus.
[447] There is some confusion in the text here, but the general
meaning is probably what I have given. See the note of Sintenis.
[448] These were light vessels adapted for quick evolutions. Horace,
Epod. i., alludes to them:—
“Ibis Liburnis inter alta navium,
Amice, propugnacula.”
[449] Is the most southern point of the Peloponnesus, in Laconica.
The modern name of Tænarus is Matapan or “head.”
[450] Dion Cassius (li. 2) gives an account of Cæsar’s behaviour after
the battle. He exacted money from the cities; but Dion does not
mention any particular cities.
[451] By “all the citizens” Plutarch means the citizens of his native
town Chæronea. The people had to carry their burden a considerable
distance, for this Antikyra was on the Corinthian gulf, nearly south of
Delphi. This anecdote, which is supported by undoubted authority, is a
good example of the sufferings of the people during this contest for
power between two men.
[452] This was a town on the coast in the country called Marmarica.
It had a port and was fortified, and thus served as a frontier post to
Egypt against attacks from the west.
[453] See the Life of Brutus, c. 50.
[454] He was L. Pinarius Carpus, who had fought under him at
Philippi. Carpus gave up his troops to Cornelius Gallus, who advanced
upon him from the province Africa (Dion. Cass. 1. 5, where he is
called Scarpus in the text of Reimarus).
[455] Or “Sea that lies off Egypt,” that part of the Mediterranean
which borders on Egypt. The width of the Isthmus is much more than

300 stadia: it is about seventy-two miles. Herodotus (ii. 158) states
the width more correctly at one thousand stadia.
In this passage Plutarch calls the Red Sea both the Arabian gulf and
the Erythra (Red), and in this he agrees with Herodotus. The Arabian
Gulf or modern Red Sea was considered a part of the great Erythræan
Sea or Indian Ocean. Herodotus (ii. 11) says that there is a gulf which
runs into the land from the Erythræan sea; and this gulf he calls (ii.
11, 158) the Arabian gulf, which is now the Red Sea. See Anton, c. 3.
[456] See the Life of Pompeius, c. 41.
[457] The Pharos was an island opposite to Alexandria, and
connected with it by a dike called Heptastadion, the length being
seven stadia.
[458] Shakspere has made a play out of the meagre subject of Timon,
and Lucian has a dialogue entitled “Timon or the Misanthropist.”
(Comp. Strab. 794, ed. Cas.)
[459] This was the second day of the third Dionysiac festival, called
the Anthesteria. The first day was Pithœgia (πιθοιγία) or the tapping
of the jars of wine; and the second day, as the word Choes seems to
import, was the cup day.
[460] This was Herodes I., son of Antipater, sometimes called the
Great. He was not at the battle of Actium, but he sent aid to Antonius
(c. 61).
[461] This was the toga virilis, or dress which denoted that a male
was pubes, fourteen at least, and had attained full legal capacity. The
prætexta, which was worn up to the time of assuming the toga virilis,
had a broad purple border, by which the impubes was at once
distinguished from other persons.
Cleopatra’s son, Cæsarion, was registered as an Alexandrine. The son
of Antonius was treated as a Roman citizen.
[462] This seems to be the sense of the passage. The Greek for asp is
aspis. Some suppose that it is the poisonous snake which the Arabs
call El Haje, which measures from three to five feet in length. But this
is rather too large to be put in a basket of figs.
[463] Conjectured by M. du Soul to be Alexander the Syrian, who has
been mentioned before.
[464] He was a native of Alexandria, and had been carried prisoner to
Rome by Gabinius. He obtained his freedom, and acquired celebrity as
a rhetorician and historian. He was a favourite of Asinius Pollio and of
Augustus; but he was too free-spoken for Augustus, who finally
forbade him his house (Horat. I. Ep. 19; and the note of Orelli). Life
of Pompeius, c. 49.
Dion Cassius (li. 8), who believed every scandalous story, says that
Cæsar made love to Cleopatra through the medium of Thyrsus.
[465] After the battle of Actium, Cæsar crossed over to Samos, where
he spent the winter. He was recalled by the news of a mutiny among
the soldiers, who had not received their promised reward. He returned
to Brundusium, where he stayed twenty-seven days, and he went no
further, for his appearance in Italy stopped the disturbance. He
returned to Asia and marched through Syria to Egypt (Sueton. Aug. c.
17; Dion Cassius, li. 4).
[466] The shout of Bacchanals at the festivals. See the Ode of Horace
(Carm. ii. 19):
Evoe, recenti mens trepidat metu.
[467] The fleet passed over to Cæsar on the 1st of August (Orosius,
vi. 19). The treachery of Cleopatra is not improbable (Dion Cass. li.
10).
[468] Compare Dion Cassius, li. 10.
[469] His name was C. Proculeius. He appears to be the person to
whom Horace alludes (Carm. ii. 2).
[470] Dion Cassius (li. 11) says that Cleopatra communicated to
Cæsar the death of Antonius, which is not so probable as Plutarch’s
narrative.
[471] C. Cornelius Gallus, a Roman Eques, who had advanced from
the province Africa upon Egypt. He was afterwards governor of Egypt;
but he incurred the displeasure of Augustus, and put an end to his life
b.c. 26. Gallus was a poet, and a friend of Virgil and Ovid. The tenth
Eclogue of Virgil is addressed to Gallus.
[472] Said to have been a Stoic, and much admired by Augustus
(Dion Cass. li. 16; Sueton. Aug. 89).
[473] Probably the same that is mentioned in the Life of Cato the
Younger, c. 57.
[474] The circumstances of the death of Antyllus and Cæsarion are
not told in the same way by Dion Cassius (li. 15). Antyllus had been
betrothed to Cæsar’s daughter Julia in b.c. 36.
[475] The words are borrowed from Homer (Iliad, ii. 204):—
Οὐκ ἀγαθὸν πολυκοιρανίη.
There could be no reason for putting Cæsarion to death as a possible
competitor with Cæsar at Rome, for he was not a Roman citizen. As it
was Cæsar’s object to keep Egypt, Cæsarion would have been an
obstacle there.
[476] There were, as usual in such matters, various versions of this
interview: it was a fit subject for embellishment with the writers of
spurious history. The account of Plutarch is much simpler and more
natural than that of Dion Cassius (li. 12), which savours of the
rhetorical.
[477] He was the son of P. Cornelius Dolabella, once the son-in-law of
Cicero, and one of Cæsar’s murderers. His son P. Cornelius Dolabella
was consul A.D. 10.
[478] The word “companions” represents the Roman “comites,” which
has a technical meaning. Young men of rank, who were about the
person of a commander, and formed a kind of staff, were his Comites.
See Horat. I. Ep. 8.
[479] The story of Dion (li. 14) is that Cæsar, after he had seen the
body, sent for the Psylli, serpent charmers, to suck out the poison
(compare Lucan, Pharsal. ix. 925). If a person was not dead, it was
supposed that the Psylli could extract the poison and save the life.
Dion Cassius also states that the true cause of Cleopatra’s death was
unknown. One account was that she punctured her arm with a hair-
pin (βελόνη) which was poisoned. But even as to the punctures on
the arm, Plutarch does not seem to state positively that there were
any. The “hollow comb” is hardly intelligible. Plutarch’s word is
κνηστίς, “a scraping instrument of any kind.” One MS. has κιστίς, “a
small coffer.” Strabo (p. 795, ed. Casaub.) doubts whether she
perished by the bite of a serpent or by puncturing herself with a
poisoned instrument. Propertius (iii. 11, 53) alludes to the image of
Cleopatra, which was carried in the triumph—
Brachia spectavi sacris admorsa colubris
Et trahere occultum membra soporis iter.
An ancient marble at Rome represents Cleopatra with the asp on her
arm. There was also a story of her applying it to the left breast.
Cleopatra was born in b.c. 69, and died in the latter part of b.c. 30.
She was seventeen years of age when her father Ptolemæus Auletes
died: and upon his death she governed jointly with her brother
Ptolemæus, whose wife she was to be. Antonius first saw her when
he was in Egypt with Gabinius, and he had not forgotten the
impression which the young girl then made on him at the time when
she visited him at Tarsus (Appian, Civil Wars, v. 8). Antonius was forty
years old when he saw Cleopatra at Tarsus, b.c. 41, and he would
therefore be in his fifty-second year at the time of his death (Clinton,
Fasti).
[480] Octavia’s care of the children of Antonius is one of the beautiful
traits of her character. She is one of those Roman women whose
virtues command admiration.
Cleopatra, the daughter of Antonius and twin sister of Alexander,
married Juba II., king of Numidia, by whom she had a son
Ptolemæus, who succeeded his father, and a daughter Drusilla, who
married Antonius Felix, the governor of Judæa. The two brothers of
Cleopatra were Alexander and Ptolemæus.
Antonius, the son of Fulvia, was called Iulus Antonius. He married
Marcella, one of the daughters of Octavia. In b.c. 10, Antonius was
consul. He formed an adulterous intercourse with Julia, the daughter
of Augustus, which cost him his life b.c. 2. Antonius was a poet, as it
seems (Horat. Carm. iv. 2, and Orelli’s note).
The elder Antonia, the daughter of Octavia and Antonius, married L.
Domitius Ahenobarbus, the son of Cneius, who deserted to Cæsar just
before the battle of Actium. This Lucius had by Antonia a son, Cn.
Domitius Ahenobarbus, who married Agrippina, the daughter of
Cæsar Germanicus. Agrippina’s son, L. Domitius Ahenobarbus, was
adopted by the emperor Claudius after his marriage with Agrippina,
and Lucius then took the name of Nero Claudius Caesar Drusus. As
the emperor Nero his infamy is imperishable.
The younger Antonia, the daughter of Octavia and Antonius, married
Drusus, the second son of Tiberius Claudius Nero. Tiberius had
divorced his wife Livia in order that Caesar Octavianus might become
her husband. The virtues of Antonia are recorded by Plutarch and
others: her beauty is testified by her handsome face on a medal.
The expression of Plutarch that Caius, by whom he means Caius
Caligula, “ruled with distinction,” has caused the commentators some
difficulty, and they have proposed to read ἐπιμανῶς, “like a madman”
in place of ἐπιφανῶς, “with distinction.” Perhaps Plutarch’s meaning
may be something like what I have given, and he may allude to the
commencement of Caligula’s reign, which gave good hopes, as
Suetonius shows. Some would get over the difficulty by giving to
ἐπιφανῶς a different meaning from the common meaning. See
Kaltwasser’s note.
A portrait of Antonius (see Notes to Brutus, c. 52) would be an idle
impertinence. He is portrayed clear and distinct in this inimitable Life
of Plutarch.
Here ends the Tragedy of Antonius and Cleopatra; and after it begins
the Monarchy, as Plutarch would call it, or the sole rule of Augustus.
See the Preface to the First Volume.
[481] Of Athens.
[482] The various stories about Plato’s slavery are discussed in Grote’s
‘History of Greece,’ part ii. ch. 53.
[483] Aristomache and Arete.
[484] Periodical northerly winds or monsoons.
[485] The ceremony of the libations seems to correspond to our
“grace after meat.” See vol. i. Life of Perikles, ch. 7.
[486] Grote paraphrases this passage as follows:—“A little squadron
was prepared, of no more than five merchantmen, two of them
vessels of thirty oars, &c.” On consulting Liddell and Scott’s Lexicon,
s.v. τριακόντορος, I find a reference to Thuc. iv. 9; where a Messenian
pirate triaconter is spoken of, and for further information the reader is
referred to the article “πεντηκόντορος (sc. ναῦς), ἡ, a ship of burden
with fifty oars,” Pind. P. 4. 436, Eur. I.T. 1124, Thuc. i. 14, &c. But
none of these passages bears out the sense of a “vessel of burden.”
The passage in Pindar merely states that the snake which Jason slew
was as big as or bigger than a πεντηκόντορος. Herod. i. 163 distinctly
says “not ships of burden, but penteconters.” In Eur. I.T. 1124, the
chorus merely remark that Iphigenia will be borne home by a
penteconter, while Thucydides (i. 14) explicitly states that, many
generations after the Trojan war, the chief navies of Greece consisted
of but few triremes, and chiefly of “penteconters or of long ships
equipped like them.” From these passages I am inclined to think that
the true meaning of the passage is the literal one, that the soldiers
were placed on board of two transports, that the two triaconters, or
thirty-oared galleys, were ships of war and acted as convoy to them,
and that the small vessel was intended for Dion and his friends to
escape in if necessary. In Dem. Zen. a πεντηκόντορος undoubtedly is
spoken of as a merchant vessel; but this does not prove that there
were no war penteconters in Dion’s time.
[487] Kerkina and Kerkinitis, two low islands off the north coast of
Africa, in the mouth of the Lesser Syrtis, united by a bridge and
possessing a fine harbour. ‘Dictionary of Antiquities.’
[488] The Greek word is κοντός, which is singularly near in sound to
the East Anglian “quant.”
[489] This seems to be the universally accepted emendation of the
unmeaning words in the original text. Grote remarks: “The statue and
sacred ground of Apollo Temenites was the most remarkable feature
in this portion of Syracuse, and would naturally be selected to furnish
a name for the gate.” ‘Hist. of Greece,’ part ii. ch. lxxxiv. note.
[490] The main street of Achradina is spoken of by Cicero as broad,
straight and long; which was unusual in an ancient Greek city. See
Grote, ad loc.
[491] The citadel of Syracuse was built upon the island of Ortygia,
and was therefore easily cut off by a ditch and palisade across the
narrow isthmus by which it was connected with the mainland.
[492] “He offered them what in modern times would be called a
constitution.” Grote.
[493] On this passage Grote has the following note:—“Plutarch states
that Herakleides brought only seven triremes. But the force stated by
Diodorus (twenty triremes, three transports and 1500 soldiers)
appears more probable. It is difficult otherwise to explain the number
of ships which the Syracusans presently appear as possessing.
Moreover, the great importance which Herakleides steps into, as
opposed to Dion, is more easily accounted for.”
[494] The Syracusan cavalry was celebrated, and “the knights” here
and elsewhere no doubt means Syracusan citizens, though at first this
passage looks as if strangers were meant. See ch. 44, where the
knights and leading citizens are mentioned together.
[495] I conceive that the “atrium” or “cavædium” of the house, that
is, the interior peristyle or court surrounded with columns, is meant,
and that Dion, sitting on one side of this room, saw the apparition
behind the columns on the other. An outside portico was a very
unusual appendage to a Greek house, and Dion’s house is said to
have been especially simple and unpretending, whereas nearly all
houses were built with an inner court or “patio,” with its roof
supported by columns, and into which the other rooms of the house
opened.
[496] L. Junius Brutus, consul b.c. 509, was a Patrician, and his race
was extinct in his two sons (Liv. ii. 1-4; Drumann, Junii, p. 1; Dion
Cassius, xliv. 12; Dionys. Hal. Antiq. Rom. v. 18).
[497] Servilia, the wife of M. Junius Brutus, the father of this Brutus,
was the daughter of Livia, who was the sister of M. Livius Drusus,
tribunus plebis b.c. 91. Livia married for her first husband M. Cato, by
whom she had M. Cato Uticensis; for her second husband she had Q.
Servilius Cæpio, by whom she became the mother of Servilia. M.
Junius Brutus, the father of this Brutus, was the first husband of
Servilia, who had by her second husband, D. Junius Silanus, two
daughters. Her son Brutus was born in the autumn of b.c. 85. He was
adopted by his uncle Q. Servilius Cæpio, whence he is sometimes
called Cæpio, and Q. Cæpio Brutus on coins, public monuments, and
in decrees (Drumann, Junii).
[498] Ahala was Magister Equitum to L. Quinctius Cincinnatus. The
story belongs to b.c. 439; and it is told by Livius, iv. 13, 14. The true
name of Mallius Spurius is Spurius Mælius.
[499] This passage is obscure in the original. The parentage of M.
Junius Brutus, the father of this Brutus, does not appear to be
ascertained.
[500] See the Life of Lucullus, c. 42.
[501] See the Life of Cicero, c. 4. Cicero mentions Ariston, which is
probably the true name, in his Tusculanæ Quæstiones, v. 8.
[502] Nothing more is known of him.
[503] The original is obscure. See Sintenis, note; and Schæfer, note.
Kaltwasser follows the reading πρὸς τὰς ἐξόδους, which he translates
“für den Kriegsdienst” (“for military service”).
[504] See the Life of the Younger Cato, c. 35, &c.
[505] Coræs explains the original (σχολαστής) to mean “one who is
engaged about learning and philosophy.”
[506] The father of this Brutus was of the faction of Marius, and
tribunus plebis b.c. 83. After Sulla’s return he lost all power, and after
Sulla’s death Pompeius (b.c. 77) marched against Brutus, who shut
himself up in Mutina (Modena). A mutiny among his troops compelled
him to open the gates, and Pompeius ordered him to be put to death,
contrary to the promise which he had given (Life of Pompeius, c. 16).
The allusion at the beginning of this chapter is to the outbreak
between Pompeius and Cæsar, b.c. 49.
[507] P. Sextius was governor of Cilicia. In the text of Plutarch Sicilia
stands erroneously in place of Cilicia: this is probably an error of the
copyists, who often confound these names (see Life of Pompeius, c.
61; Cicero, Ad Attic. viii. 14; ix. 7).
[508] Brutus was a great reader and a busy writer. Drumann (Junii, p.
37) gives a sketch of his literary activity. Such a trifle as an epitome of
Polybius was probably intended merely as an occupation to pass the
time. The loss of it is not a matter of regret, except in so far as
it might have supplied some deficiencies in the present text of
Polybius. Bacon (Advancement of Learning) describes epitomes thus:
“As for the corruptions and moths of history, which are epitomes, the
use of them deserveth to be banished, as all men of sound judgment
have confessed; as those that have fretted and corroded the sound
bodies of many excellent histories, and wrought them into base and
unprofitable dregs.”
[509] The story of Cæsar receiving this note is told in the Life of Cato,
c. 24. Cæsar was born on the 12th July, b.c. 100, which is a sufficient
answer to the scandalous tale of his being the father of Brutus. That
he may have had an adulterous commerce with Servilia in and before
b.c. 63, the year of Catiline’s conspiracy, is probable enough.
[510] This was C. Cassius Longinus, who accompanied Crassus in his
Parthian campaign (Life of Crassus, c. 18, &c.). After Cato had retired
to Africa, Cassius made his peace with Cæsar (Dion Cassius, xlii. 13).
[511] Kaltwasser has adopted the correction of Moses du Soul, and
has translated the passage “in Nikaea für den König Deiotarus” (“at
Nicaea for king Deiotarus”). The anecdote clearly refers to king
Deiotarus, as appears from Cicero’s Letters to Atticus (xiv. 1). See
Drumann’s note, Junii, p. 25, note 83. Coræs would read Γαλατῶν for
Λιβύων.
[512] This was the north part of Italy. Cæsar set out for his African
campaign in b.c. 47. Brutus held Gallia in the year b.c. 46. See
Drumann, Junii, p. 26, note 91, on the administration of Gallia by
Brutus.
[513] Plutarch here alludes to the office of Prætor Urbanus, who,
during the year of his office, was the chief person for the
administration of justice. The number of prætors at this time was ten
(Dion Cassius, xlii. 51), to which number they were increased from
eight by Cæsar in b.c. 47. The Prætor Urbanus still held the first rank.
The motive of Cæsar may have been, as Dion Cassius says, to oblige
his dependents by giving them office and rank. Brutus was Prætor
Urbanus in b.c. 44, the year of Cæsar’s assassination.
[514] This anecdote is told in Cæsar’s Life, c. 62.
[515] Q. Fufius Calenus was sent by Cæsar before the battle of
Pharsalus to Greece (Life of Cæsar, c. 43). Megara made strong
resistance to Calenus, and was treated with severity. Dion Cassius
(xlii. 14) says nothing about the lions.
[516] See the Life of Sulla, c. 34, and note to c. 37; and the Life of
Cæsar, c. 53, note.
[517] See the Life of Cæsar, c. 61, and Dion Cassius, xliv. 3, &c.
[518] His name was Quintus. Ligarius fought against Cæsar at the
battle of Thapsus b.c. 46. He was taken prisoner and banished. He
was prosecuted by Q. Ælius Tubero for his conduct in Africa, and
defended by Cicero in an extant speech. Ligarius obtained a pardon
from Cæsar, and he repaid the dictator, like many others, by aiding in
his murder. It seems pretty certain that he lost his life in the
proscriptions of the Triumviri (Appian, Civil Wars, iv. 22, 23).
[519] Compare the Life of the Younger Cato, c. 65, 73; and as to
Favonius, the same life.
[520] Q. Antistius Labeo was one of the hearers of Servius Sulpicius
(Dig. i. tit. 2, s. 2, § 44), and himself a jurist, and the father of a more
distinguished jurist, Antistius Labeo, who lived under Augustus. He
was at the battle of Philippi, and after the defeat he killed himself, and
was buried in a grave in his tent, which he had dug for the purpose
(Appian, Civil Wars, iv. 135).
[521] See the Life of Cæsar, c. 64, and the note.
The signs of Cæsar’s death are mentioned in the Life of Cæsar, c. 63.
[522] Brutus was first married to Claudia, a daughter of Appius
Claudius, consul b.c. 54. It was probably in b.c. 45, and after Cato’s
death, that he put away Claudia, for which he was blamed (Cic. Ad
Attic. xiii. 9), and married Porcia, the daughter of Cato, and widow of
M. Calpurnius Bibulus, the colleague of Cæsar in the consulship b.c.
59. As to the affair of the wound, compare Dion Cassius (xliv. 13 &c.).
[523] This was the great architectural work of Pompeius (Life of
Pompeius, c. 40, note).
[524] The same story is told by Appian (Civil Wars, ii. 115).
[525] The circumstances of Cæsar’s death are told in his Life, c. 66;
where it is incorrectly said that Brutus Albinus engaged Antonius in
conversation. To the authorities referred to in the note to c. 66 of the
Life of Cæsar, add Cicero, Philipp. ii. 14, which is referred to by
Kaltwasser.
[526] L. Munatius Plancus, who had received favours from Cæsar, and
the province of Transalpine Gaul, with the exception of Narbonensis
and Belgica b.c. 44.
As to the arrangement about the provinces after Cæsar’s death, see
the Life of Antonius, c. 14.
[527] Compare the Life of Cæsar, c. 68, and the note.
[528] The allusion is to P. Clodius, who fell in a brawl with T. Annius
Milo b.c. 52. See the Life of Cicero, c. 52.
[529] Compare the Life of Cæsar, c. 68.
[530] Now Porto d’Anzo, on the coast of Latium, thirty miles from
Rome. It is now a poor place, with numerous remains of former
buildings (Westphal, Die Römische Kampagne, and his two maps).
[531] These were the Ludi Apollinares (Dion, xlvii. 20), which Brutus
had to superintend as Prætor Urbanus. The day of celebration was the
fourth of Quintilis or Julius. The games were superintended by C.
Antonius, the brother of Marcus, and the colleague of Brutus.
[532] Compare the Life of Cicero, c. 43, and notes; and the Life of
Antonius, c. 16.
[533] Complaints like these, of the conduct of Cicero, appear in the
sixteenth and seventeenth letters of the book which is entitled ‘M.
Tullii Epistolarum ad Brutum Liber Singularis;’ but the genuineness of
these letters is very doubtful. Plutarch himself (Brutus, 53) did not
fully believe in the genuineness of all the letters attributed to Brutus.
[534] Elea, which the Romans called Velia, was on the coast of
Lucania, in the modern province of Basilicata in the kingdom of
Naples; and the remains are near Castella a mare della Brucca. Velia
is often mentioned by Cicero, who set sail from thence when he
intended to go to Greece (Life of Cicero, c. 43).
[535] The passages in Homer are, Iliad, vi. 429 and 491, the parting
of Hector and Andromache. The old stories of Greece furnished the
painters with excellent subjects, and the simplicity with which they
treated them may be inferred from Plutarch’s description. The poet
was here the real painter. The artist merely gave a sensuous form to
the poet’s conception. The parting of Hector and Andromache is the
subject of one of Schiller’s early poems.
[536] Dion Cassius (xlvii. 20) describes the reception of Brutus at
Athens. The Athenians ordered bronze statues of Brutus and Cassius
to be set up by the side of the statues of Harmodius and Aristogeiton,
who had liberated Athens from the tyranny of the Peisistratidæ.
[537] See the Life of Pompeius, c. 75. Cicero’s son Marcus was
attending the lectures of Cratippus b.c. 44, and also, as it appears, up
to the time when Brutus came to Athens. Horace, who was now at
Athens, also joined the side of Brutus, and was present at the battle
of Philippi.
[538] A town near the southern point of Eubœa. The Roman
commander who gave up the money was the Quæstor M. Appuleius
(Cicero, Philipp. x. 11). Plutarch in the next chapter calls him
Antistius.
[539] These are the dying words of Patroclus (Iliad, xvi. 849). Apollo
is Leto’s son.
[540] See the Life of Cicero, c. 43, note; and Dion Cassius (xlvii. 29,
&c.).
[541] A town in Thessalia.
[542] Q. Hortensius Hortalus, the son of the orator Hortensius, who
held the province of Macedonia (b.c. 44), in which Brutus was to
succeed him. He was put to death by M. Antonius after the battle of
Philippi (c. 28).
[543] This may be an error of Plutarch’s copyists. His name was P.
Vatinius (Dion Cassius, xlvii. 21).
[544] The Greek soldiers suffered in this way in their retreat from
Babylonia over the table-land of Armenia (Xenophon, Anabasis, iv. 5,
7). This bulimy is a different thing from that which modern writers call
by that name, and which they describe as a “canine appetite,
insatiable desire for food.” The nature of the appetite is exemplified by
the instance of a man eating in one day four pounds of raw cow’s
udder, ten pounds of raw beef, two pounds of candles, and drinking
five bottles of porter (Penny Cyclopædia, art. Bulimia). The subject of
Bulimia is discussed by Plutarch (Symposiaca, b. vi. Qu. 8).
[545] Buthrotum, now Butrinto, was on the mainland in the north part of the
channel which divides Corcyra (Corfu) from the continent. It was
made a Colonia by the Romans after their occupation of Epirus.
Atticus, the friend of Cicero, had land in the neighbourhood of
Buthrotum.
As to the events mentioned at the end of this chapter, compare Dion
Cassius, xlvii. 21-23.
[546] Compare Dion Cassius, xlvii. 22.
[547] This was Decimus Brutus Albinus, who fell into the hands of the
soldiers of M. Antonius in North Italy, and was put to death by order
of Antonius b.c. 43. Compare Dion Cassius (xlvi. 53), and the note of
Reimarus.
[548] Brutus passed over into Asia probably about the middle of b.c.
43, while the proscriptions were going on at Rome. As to Cyzicus, see
the Life of Lucullus, c. 9.
[549] Cassius was now in Syria, whence he designed to march to
Egypt to punish Cleopatra for the assistance which she had given to
Dolabella.
[550] The Mediterranean, for which the Romans had no name.
[551] Xanthus stood on a river of the same name, about ten miles
from the mouth. The river is now called Etchen-Chai. Xanthus is first
mentioned by Herodotus (i. 176), who describes its destruction by the
Persian general Harpagus, to which Plutarch afterwards (c. 31)
alludes. Numerous remains have been recently discovered there by
Fellows, and some of them are now in the British Museum (Penny
Cyclop. art. Xanthian Marbles, and the references in that article).
The last sentence of this chapter is very confused in the original.
[552] Compare the Life of Pompeius, c. 77, 80.
[553] Brutus and Cassius met at Sardis in the early part of b.c. 42.
[554] The passage to which Plutarch refers is Iliad, i. 259. The
character of Favonius is well known from the Lives of Pompeius and
Cato the Younger.
[555] Kaltwasser has a note on the Roman practice of an invited
guest taking his shadows (umbræ) with him. Horace alludes to the
practice (i. Ep. 5, 28),
——“locus est et pluribus umbris.”
Plutarch discusses the etiquette as to umbræ in his Symposiaca (book
vii. Qu. 6).
[556] The Romans reclined at table. They placed couches on three
sides of the table and left the fourth open. The central couch or sofa
(lectus medius) was the first place. The other sofas at the adjoining
two sides were respectively lectus summus and imus.
[557] Nothing further seems to be known of him. The name Pella is
probably corrupt. The consequence of his condemnation was Infamia,
for the meaning of which term see Dict. of Greek and Roman
Antiquities, Infamia. This interview between Brutus and Cassius forms
one of the finest scenes in Shakespeare’s play of Julius Cæsar.
[558] The reading here is probably corrupt. See the note of Sintenis.
[559] The ghost story is told also in the Life of Cæsar, c. 69.
[560] Cassius was one of the Romans who had embraced the
doctrines of Epicurus, modified somewhat by the Roman character.
Cicero in a letter to Cassius (Ad Diversos, xv. 16) rallies him about his
opinions; and Cassius (xv. 19) in reply defends them. Cicero says to
Cassius, that he hopes he will tell him whether it is in his power, as
soon as he chooses to think of Cassius, to have his spectrum
(εἴδωλον) present before him, and whether, if he should begin to
think of the island Britannia, the image (spectrum) of Britannia will fly
to his mind.
Lucretius expounded the Epicurean doctrines in his poem De Rerum
Natura. In his fourth book he treats of images (simulacra):
“Quæritur in primis quare quod quoique libido
Venerit, extemplo mens cogitet ejus id ipsum.
Anne voluntatem nostram simulacra tuentur,
Et simulac volumus, nobis occurrit imago?”—iv. 781, &c.
The things on which the mind has been engaged in waking hours,
recur as images during sleep:
“Et quo quisque fere studio defunctus adhæret,
Aut quibus in rebus multum sumus ante moratei
Atque in ea ratione fuit contenta magis mens,
In somnis eadem plerumque videmur obire:
Causidicei causas agere et componere leges,
Induperatores pugnare ac proelia obire,” &c.—iv. 963.
He has observed in a previous passage, that numerous images of
things wander about in all directions, that they are of a subtile nature,
and are easily united when they meet; they are of a much more
subtile nature than the things which affect the sight, for they
penetrate through the pores of bodies, and inwardly move the subtile
nature of the mind. He then adds:
“Centauros itaque et Scyllarum membra videmus,
Cerbereasque canum fauceis simulacraque eorum
Quorum morte obita tellus amplectitur ossa.”—iv. 734, &c.
The doctrine which Lucretius inculcated as to the deities admitted
their existence, but denied that they concerned themselves about
mundane affairs; and they had nothing to do with the creation of the
world. It is one of the main purposes of the poem to free men from all
religious belief, and to show the misery and absurdities that it breeds.
A belief in dæmons would be inconsistent with such doctrines; and as
to the gods, Cassius means to say, that though he did not believe in
their existence, he almost wished that there were gods to aid their
righteous cause.
As to the opinions of Cassius, compare the Life of Cæsar, c. 66.
[561] C. Norbanus Flaccus and L. Decidius Saxa, two legates of
Antonius, who had been sent forward with eight legions, and had
occupied Philippi. The town of Philippi lay near the mountain-range of
Pangæus and Symbolum, which was the name of a place at which
Pangæus joins another mountain, that stretches up into the interior.
Symbolum was between Neapolis (new city) and Philippi. Neapolis
was on the coast opposite to Thasus: Philippi was in the mountain
region, and was built on a hill; west of it was a plain which extended
to the Strymon (Appian, Civil Wars, iv. 105; Dion Cassius, xlvii. 35).
Philippi was originally called Krenides, or the Springs, then Datus, and
lastly Philippi by King Philippus, of Macedonia, who fortified it.
Appian’s description of the position of Philippi is very clear.
[562] A lustration was a solemn ceremony of purification, which was
performed on various occasions, and before a battle: see Livy, xxix.
47.
The omens which preceded the battle are recorded by Dion Cassius,
xlvii. 49.
[563] M. Valerius Messala Corvinus, of a distinguished Roman family,
was a son of Messala who was consul b.c. 53. After the battle of
Philippi he attached himself to M. Antonius, whom he deserted to join
Octavianus Cæsar. He fought on Cæsar’s side at the battle of Actium
(c. 53). He died somewhere between b.c. 3 and A.D. 3. Messala was a
poet and an historian. His history of the Civil Wars, after the death of
the Dictator Cæsar, was used by Plutarch.
[564] See the note of Sintenis, who proposes to read κεκλημένος for
κεκλημένον, to prevent any ambiguity, such as Kaltwasser discovered
in the passage. It was the birthday of Cassius (Appian, Civil Wars, iv.
113).
[565] Plutarch here quotes the Memoirs of Cæsar. It is of no great
importance who saw the dream, and perhaps there was no dream at
all. Cæsar wished to have an excuse for being out of the way of
danger. Dion Cassius (xlvii. 41) says that it was Cæsar’s physician who
had the dream, but he does not mention his name. See the notes of
Reimarus.
[566] The true name may be Briges. The Briges were a Thracian tribe
(Stephan. Byzant., Βρίγες), who are mentioned by Herodotus (vii. 73).
The Macedonian tradition was that they were the same as the
Phrygians; that so long as they lived in Europe with the Macedonians
they kept the name of Briges, and that when they passed over into
Asia they were called Phryges.
[567] Drumann (Geschichte Roms, i. 516, n. 84) assumes that it is P.
Volumnius Eutrapelus, a boon companion of Antonius. Several of
Cicero’s letters to him are extant (Ad Div. vii. 32, 33).
[568] Plutarch has handled the character of Brutus with partiality. He
could not be ignorant of his love of money and of the oppressive
manner in which he treated his unlucky creditors. Drumann (Junii, p.
20, &c.) has collected the evidence on this point. Though Brutus was
an austere man and affected philosophy, his character is not free from
the imputation of ingratitude to Cæsar, love of power, and avarice. He
seems to have been one of those who deceive themselves into a
belief of their own virtues, because they are free from other people’s
vices. The promise of plunder to his soldiers is not excusable because
Antonius and Cæsar did worse than he intended to do. Plutarch here
alludes to many of the Italians being driven out of their lands, which
were given to the soldiers who had fought on the side of Cæsar and
Antonius at Philippi. The misery that was occasioned by this measure
was one of the chief evils of the Civil Wars. The slaughter in war
chiefly affected the soldiers themselves, and if both armies had been
destroyed, the people would only have been the better for it. The
misery that arose from the ejection of the hard-working husbandmen
reached to their wives and children. But a country which has a large
army on foot that is no longer wanted must either pay it out of
taxes and plunder, or have a revolution. Necessity was the excuse for
Cæsar and Antonius, and the same necessity would have been the
excuse of Brutus, if he had been victorious. Defeat saved him from
this necessity.
[569] The ships which were bringing aid to Cæsar from Brundusium
under the command of Domitius Calvinus. They were met and
defeated by L. Statius Murcus.
[570] Nothing seems to be known about him. Of course he is not the
Volumnius mentioned in c. 45.
[571] See the Life of Cato the Younger, c. 73.
[572] See the Life of Antonius, c. 70.
[573] The verse is from the Medea of Euripides (v. 332), in which
Medea is cursing her faithless husband Jason. The educated Romans
were familiar with the Greek dramatists, whom they often quoted.
(Compare the Life of Pompeius, c. 78.) Appian says that Brutus
intended to apply this line to Antonius (Civil Wars, iv. 130).
The other verse, which Volumnius forgot, was remembered by
somebody else, if it be the verse of which Florus (iv. 7) has recorded
the substance, “that virtue is not a reality, but a name.” Dion Cassius
(xlvii. 49, and the note of Reimarus) also has recorded two Greek
verses which Brutus is said to have uttered; but he does not mention
the verse which Plutarch cites. The substance of the two verses cited
by Dion is this:
“Poor virtue, empty name, whom I have serv’d
As a true mistress; thou art fortune’s slave.”
Volumnius might not choose to remember these verses, as Drumann
suggests, in order to save the credit of his friend.
[574] See c. 11, and the Life of the younger Cato, c. 65, 73.
[575] Brutus was forty-three years of age when he died. Velleius (ii.
72) says that he was in his thirty-seventh year, which is a mistake.
The character of Brutus requires a special notice. It is easy enough to
write a character of a man, but not easy to write a true one. Michelet
(Histoire de la Révolution Française, ii. 545), speaking of the chief
actors of the revolution in 1789, ’90, ’91, says: “We have rarely given
a judgment entire, indistinct, no portrait properly speaking; all, almost
all, are unjust; resulting from a mean which is taken between this and
that moment in a person’s life, between the good and the bad,
neutralising the one by the other, and making both false. We have
judged the acts, as they present themselves, day by day, hour by
hour. We have given a date to our judgments; and this has allowed us
often to praise men, whom at a later time we shall have to blame.
Criticism, forgetful and harsh, too often condemns beginnings which
are laudable, having in view the end which it knows, of which it has a
view beforehand. But we do not choose to know this end; whatever
this man may do to-morrow, we note for his advantage the good
which he does to-day: the end will come soon enough.” This is the
true method of writing history; this is the true method of judging
men. Unfortunately we cannot trace the career of many individuals
with that particularity of date and circumstance which would enable
us to do justice. Plutarch does not draw characters in the mass in the
modern way: he gives us both the good and the bad, in detail: but
with little regard sometimes to time and circumstance. He has treated
Brutus with partiality: he finds only one act in his life to condemn
(chap. 46). The great condemnation of Brutus is, that acting in the
name of virtue, he did not know what it was; that fighting for his
country, he was fighting for a party; his Roman republic was a
republic of aristocrats; his people was a fraction of the Roman
citizens; he conceived no scheme for regenerating a whole nation: he
engaged in a death struggle in which we can feel no sympathy. His
name is an idle abused theme for rhetoric; and his portrait must be
drawn, ill or well, that the world may be disabused.
Drumann (Geschichte Roms, Junii, p. 34) has carefully collected the
acts of Brutus; and he has judged him severely, and, I think, truly.
Brutus had moderate abilities, with great industry and much learning:
he had no merit as a general, but he had the courage of a soldier, he
had the reputation of virtue, and he was free from many of the vices
of his contemporaries; he was sober and temperate. Of enlarged
political views he had none; there is not a sign of his being superior in
this respect to the mass of his contemporaries. When the Civil War
broke out, he joined Pompeius, though Pompeius had murdered his
father. If he gave up his private enmity, as Plutarch says, for what he
believed to be the better cause, the sacrifice was honourable: if there
were other motives, and I believe there were, his choice of his party
does him no credit. His conspiracy against Cæsar can only be justified
by those, if there are such, who think that a usurper ought to be got
rid of in any way. But if a man is to be murdered, one does not expect
those to take a part in the act who, after being enemies, have received
favours from him, and professed to be friends. The murderers should
at least be a man’s declared enemies who have just wrongs to
avenge. Though Brutus was dissatisfied with things under Cæsar, he
was not the first mover in the conspiracy. He was worked upon by
others, who knew that his character and personal relation to Cæsar
would in a measure sanctify the deed; and by their persuasion, not his
own resolve, he became an assassin in the name of freedom, which
meant the triumph of his party, and in the name of virtue, which
meant nothing.
The act was bad in Brutus as an act of treachery; and it was bad as
an act of policy. It failed in its object—the success of a party, because
the death of Cæsar was not enough; other victims were necessary,
and Brutus would not have them. He put himself at the head of a plot,
in which there was no plan: he dreamed of success and forgot the
means. He mistook the circumstances of the times and the character
of the men. His conduct after the murder was feeble and uncertain;
and it was also as illegal as the usurpation of Cæsar. “He left Rome as
prætor without the permission of the Senate; he took possession of a
province which, even according to Cicero’s testimony, had been
assigned to another; he arbitrarily passed beyond the boundaries of
his province, and set his effigy on the coins.” (Drumann.) He attacked
the Bessi in order to give his soldiers booty, and he plundered Asia to
get money for the conflict against Cæsar and Antonius, for the
mastery of Rome and Italy. The means that he had at his disposal
show that he robbed without measure and without mercy; and never
was greater tyranny exercised over helpless people in the name of
liberty than the wretched inhabitants of Asia experienced from Brutus
the “Liberator” and Cassius “the last of the Romans.” But all these
great resources were thrown away in an ill-conceived and worse
executed campaign.
Temperance, industry, and unwillingness to shed blood are noble
qualities in a citizen and a soldier; and Brutus possessed them. But
great wealth gotten by ill means is an eternal reproach; and the trade
of money-lending, carried on in the names of others, with unrelenting
greediness, is both avarice and hypocrisy. Cicero, the friend of Brutus,
is the witness for his wealth, and for his unworthy means to increase
it.
Reflecting men in all ages have a philosophy. With the educated
Greeks and Romans, philosophy was religion. The vulgar belief, under
whatever name it may be, is never the belief of those who have
leisure for reflection. The vulgar rich and vulgar poor are immersed in
sense: the man of reflection strives to emerge from it. To him the
things which are seen are only the shadows of the unseen; forms
without substance, but the evidence of the substantial: “for the
invisible things of God from the creation of the world are clearly seen,
being understood by the things that are made” (Epistle to the
Romans, i. 20). Brutus was from his youth up a student of philosophy
and well versed in the systems of the Greeks. Untiring industry and a
strong memory had stored his mind with the thoughts of others, but
he had not capacity enough to draw profit from his intellectual as he
did from his golden treasures. His mind was a barren field on which
no culture could raise an abundant crop. His wisdom was the thoughts
of others, and he had ever ready in his mouth something that others
had said. But to utter other men’s wisdom is not enough: a man must
make it his own by the labour of independent thought. Philosophy and
superstition were blended in his mind, and they formed a chaos in his
bewildered brain, as they always will do; and the product is Gorgons
and Hydras and Chimeras dire. In the still of night phantoms floated
before his wasted strength and wakeful eyes; perhaps the vision of
him, the generous and the brave, who had saved the life of an enemy
in battle, and fell by his hand in the midst of peace. Conscience was
his tormentor, for truth was stronger than the illusions of self-imputed
virtue. Though Brutus had condemned Cato’s death, he died by his
own hand, not with the stubborn resolve of Cato, who would not yield
to a usurper, but merely to escape from his enemies. A Roman might
be pardoned for not choosing to become the prisoner of a Roman, but
his grave should have been the battlefield, and the instrument should
have been the hands of those who were fighting against the cause
which he proclaimed to be righteous and just. Cato’s son bettered his
father’s example: he died on the plain of Philippi by the sword of the
enemy. Brutus died without belief in the existence of that virtue which
he had affected to follow: the triumph of a wrongful cause, as he
conceived it, was a proof that virtue was an empty name. He forgot
the transitory nature of all individual existences, and thought that
justice perished with him. But a true philosopher does not make
himself a central point, nor his own misfortunes a final catastrophe.
He looks both backwards and forwards, to the past and the future,
and views himself as a small link in the great chain of events which
holds all things together. Brutus died in despair, with the courage, but
not with the faith, of a martyr.
When men talk of tyranny and rise against it, the name of Brutus is
invoked; a mere name and nothing else. What single act is there in
the man’s life which promised the regeneration of his country and the
freedom of mankind? Like other Romans, he only thought of
maintaining the supremacy of Rome; his ideas were no larger than
theirs; he had no sympathy for those whom Rome governed and
oppressed. For his country, he had nothing to propose; its worn-out
political constitution he would maintain, not amend; indeed,
amendment was impossible. Probably he dreaded anarchy and the
dissolution of social order, for that would have released his creditors
and confiscated his valuable estates. But Cæsar’s usurpation was not
an anarchy: it was a monarchy, a sole rule; and Brutus, who was
ambitious, could not endure that. It may be said that if the political
views of Brutus were narrow, he was only like most of his
countrymen. But why then is he exalted, and why is his name
invoked? What single title had he to distinction except what Cæsar
gave him? A man of unknown family, the son of a woman whom
Cæsar had debauched, pardoned after fighting against his mother’s
lover, raised by him to the prætorship, and honoured with Cæsar’s
friendship—he has owed his distinction to nothing else than
murdering the man whose genius he could not appreciate, but whose
favours he had enjoyed.
His spurious philosophy has helped to save him from the detestation
which is his due; but the false garb should be stripped off. A stoic, an
ascetic, and nothing more, is a mere negation. The active virtues of
Brutus are not recorded. If he sometimes did an act of public justice
(c. 35), it was not more than many other Romans have done. To
reduce this philosopher to his true level, we ask, what did he say or
do that showed a sympathy with all mankind? Where is the evidence
that he had the feeling of justice which alone can regenerate a
nation? But it may be said, why seek in a Roman of his age what we
cannot expect to find? Why then elevate him above the rest of his age
and consecrate his name? Why make a hero of him who murdered his
benefactor, and then ran away from the city which he was to save—
from we know not what? And why make a virtuous man of him who
was only austere, and who did not believe in the virtues that he
professed? As to statesmanship, nobody has claimed that for him yet.
“The deputy of Arras, poor, and despised even by his own party, won
the confidence of the people by their belief in his probity: and he
deserved it. Fanatical and narrow-minded, he was still a man of
principles. Untiring industry, unshaken faith, and poverty, the
guarantee of his probity, raised him slowly to distinction, and enabled
him to destroy all who stood between him and the realisation of an
unbending theory. Though he had sacrificed the lives of others, he
scorned to save his own by doing what would have contradicted his
principles: he respected the form of legality, when its substance no
longer existed, and refused to sanction force when it would have been
used for his own protection” (Lamartine, Histoire des Girondins, liv.
61, ix.). A great and memorable example of crime, of fanaticism, and
of virtue; of a career commenced in the cause of justice, in truth, faith
and sincerity; of a man who did believe in virtue, and yet spoiled the
cause in which he embarked, and left behind him a name for universal
execration.
Treachery at home, enmity abroad, and misconduct in its own leaders,
made the French Revolution result in anarchy, and then in a tyranny.
The Civil Wars of Rome resulted in a monarchy, and there was nothing
else in which they could end. The Roman monarchy or the Empire
was a natural birth. The French Empire was an abortion. The Roman
Empire was the proper growth of the ages that had preceded it: they
could produce nothing better. In a few years after the battle of
Philippi, Cæsar Octavianus got rid of his partner Antonius; and under
the administration of Augustus the world enjoyed comparative peace,
and the Roman Empire was established and consolidated. The genius
of Augustus, often ill appreciated, is demonstrated by the results of
his policy. He restored order to a distracted state and transmitted his
power to his successors. The huge fabric of Roman greatness resting
on its ancient foundations, only crumbled beneath the assaults that
time and new circumstances make against all political institutions.
[576] Velleius (ii. 71, quoted by Kaltwasser) states that some of the
partisans of Brutus and Cassius wished Messala to put himself at the
head of their party, but he declined to try the fortune of another
contest.
[577] Compare the Life of Antonius, c. 22. Appian (Civil Wars, iv. 135)
makes the same statement as Plutarch about the body of Brutus. It is
not inconsistent with this that his head was cut off in order to be sent
to Rome and thrown at the feet of Cæsar’s statue, as Suetonius says
(Sueton. August. 13). Dion Cassius adds (xlvii. 49) that in the passage
from Dyrrachium a storm came on and the head was thrown into the
sea.
[578] Nikolaus of Damascus, a Peripatetic philosopher, and a friend of
Augustus, wrote a universal history in Greek, in one hundred and
forty-four books, of which a few fragments remain. There is also a
fragment of his Life of Augustus. The best edition is that of J.C. Orelli,
Leipzig, 1804, 8vo.; to which a supplement was published in 1811.
[579] The work of Valerius Maximus is dedicated to the Emperor
Tiberius. The death of Porcia is mentioned in lib. iv. c. 6, 5. Appian
(Civil Wars, iv. 136) and Dion Cassius (xlvii. 49) give the same account
of Porcia’s death.
[580] Plutarch here evidently doubts the genuineness of the letter
attributed to Brutus. The life of Brutus offered good materials for the
falsifiers of history, who worked with them after rhetorical fashion.
There are a few letters in the collection of Cicero which are genuine,
but the single book of letters to Brutus (M. Tullii Ciceronis Epistolarum
ad Brutum Liber Singularis) is condemned as a forgery by the best
critics. It contains letters of Cicero to Brutus, and of Brutus to Cicero;
and a letter of Brutus to Atticus. Genuine letters of Brutus, written day
by day, like those of Cicero, would have formed the best materials
from which we might judge him.
[581] A despatch rolled in a peculiar manner. See vol. ii. Life of
Lysander, ch. 19.
[582] The battle of Kunaxa was fought on the 7th of September 401
b.c.
[583] The title of a great Persian officer of State.
[584] Egypt revolted from Persia b.c. 358. See vol. iii. Life of
Agesilaus, ad. fin.
[585] A people of Media on the Caspian Sea.
[586] See Grote on Epameinondas. “The muscularity, purchased by
excessive nutriment, of the Bœotian pugilist.” (Hist. of Greece, part ii.
ch. lxxvii.)
[587] See vol. iii. Life of Agesilaus, c. 13, note.
[588] Ptolemy, King of Egypt.
[589] The reading Adria is obviously wrong. Droysen suggests Andros;
but Thirlwall much more reasonably conjectures that the word should
be Hydrea, observing that the geographical position of Andros does
not suit the account given in the text. Clough prefers to read Andros,
saying that “Aratus would hardly be thought to have gone from
Hydrea to Eubœa, which is near enough to Andros to make the
supposition in this case not unnatural.” But I think that this argument
makes just the other way, for the object of Aratus’s slaves was to tell
the Macedonian officer that their master was gone to a place so far
away that it would be useless to attempt to follow him.
[590] The word which I have here translated “portraits” generally
means statues, but not necessarily. Probably most of the despots
were commemorated by statues.
[591] Philip of Macedon, the father of Alexander, is, I suppose, meant.
[592] This Alexander was the son of Kraterus, and grandson of
Alexander the Great’s general of that name.
[593] A common precaution against surprise. See above, ch. viii.
[594] This was Demetrius II., the son of Antigonus Gonatas, who
succeeded his father on the throne of Macedonia, b.c. 239.
[595] Apparently the great seal of the league is meant, which we
must suppose was entrusted to the general for the time being.
[596] Il., ii. 607.
[597] Philip’s object in this expedition was to make himself master of
Apollonia and Oricum.
[598] “He was forced to burn his ships and retreat overland, leaving
his baggage, ammunition, and a great part of the arms of his troops
in the enemy’s hands.” (Thirlwall’s History, ch. lxiv).
[599] See Merivale’s ‘History of the Romans under the Empire,’ ch. liii.
vol. vi. page 142, note.
[600] Quintus Catulus Capitolinus.
[601] Nero set a price upon the head of Vindex, whose designs were
speedily revealed to him, and though the forces of the Gaulish
province were disposed to follow their chief, the more powerful
legions of Lower Germany, under Virginius Rufus, were in full march
against them. The armies met at Vesontio, and there Virginius and
Vindex at a private interview agreed to conspire together, but their
troops could come to no such understanding; the Virginians attacked
the soldiers of Vindex, and almost cut them to pieces. Vindex
thereupon, with the haste and levity of his race, threw himself upon
his sword, and the rebellion seemed for a moment to be crushed.
Merivale’s ‘History of the Romans under the Empire,’ vol. vi. ch. lv.
[602] Nero died on the 9th of June, A.D. 68.
[603] The gold ring was presented by the Roman emperors in much
the same way as the insignia of an order of chivalry are given by
modern sovereigns. Under the republic it had been the distinguishing
mark of the equestrian order, and its possession still continued to
raise its recipients to the rank of ‘eques,’ cf. Plin. H.N. 33, 2, and
Paulus i. 5, de jure anul.
[604] Clough well remarks that here we may observe the beginning of
a state-post, which still exists on the continent of Europe, by which all
government couriers, &c., were forwarded free of expense. The
modern terms of “diplomacy,” “diplomatist,” &c., are derived from the
“diplomata,” or folded and sealed dispatches carried by such persons.
[605] Narbonne.
[606] Tacitus sums up the characters of these two men after his
manner. “Titus Vinius and Cornelius Laco, the one the worst, the other
the laziest of men, &c.” Tac. Hist. i. 6.