GENOME-WIDE ASSOCIATION STUDIES ~ GWAS
Zané Lombard; Wits Bioinformatics
Identifying disease genes
Identifying genes that contribute to disease risk is one
of the main objectives of molecular research
Such findings have contributed to improvements in
diagnosis, prognosis and therapy
With the successful identification of disease genes for
many single-gene disorders, the focus has shifted to
diseases with a complex, multifactorial aetiology
What are the approaches available?
How our genome can influence disease
Chromosomal abnormalities
Mutation (direct cause and effect)
Normal red blood cells
Sickle red blood cells
How our genome can influence disease
Multifactorial inheritance
Identifying disease genes – association
Association refers to the co-occurrence of a
genetic variant with a disease trait, more
frequently than can be readily explained by
chance.
Two approaches to association studies:
Candidate gene approach
Genome-wide approach
How is GWAS possible?
Decide to genotype 12 million common SNPs
Collect 1,000 cases and 1,000 controls
Genotype all DNAs for all SNPs
That adds up to 24 billion genotypes
Imagine this approach cost 50 cents a genotype.
That’s R12 billion for each disease/trait –
completely out of the question!
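The arithmetic behind this slide can be checked with a short sketch. The figures come from the list above; the variable names are illustrative only.

```python
# Figures from the slide; variable names are illustrative.
n_snps = 12_000_000            # common SNPs to genotype
n_cases = 1_000
n_controls = 1_000
cost_per_genotype_rand = 0.50  # 50 cents per genotype

n_samples = n_cases + n_controls
n_genotypes = n_snps * n_samples                       # one genotype per SNP per sample
total_cost_rand = n_genotypes * cost_per_genotype_rand

print(f"{n_genotypes:,} genotypes")                    # 24,000,000,000 genotypes
print(f"R{total_cost_rand:,.0f} per disease/trait")    # R12,000,000,000 per disease/trait
```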
IMPORTANT CONCEPTS FACILITATING GWAS
SNP discovery
Most common type of genetic variation
In latest build of dbSNP (138) there are 62,676,337 SNPs
SNPs are scattered throughout the genome
The abundance of SNPs and the ease with which they
can be measured make these genetic variations
significant as markers for association studies
Figure labels: one locus with two alleles (A and C); genotype AC
Common Disease – Common Variant Hypothesis
Common disorders are likely influenced by genetic
variation that is also common in the population.
Common variants have small effects (low penetrance)
Multiple variants contribute to disease susceptibility
HapMap and Linkage Disequilibrium
tagSNPs & Indirect Association
Based on analysis of data from the HapMap project,
>80% of commonly occurring SNPs in populations of
European descent can be captured using a subset of
500,000 to 1,000,000 SNPs scattered across the genome.
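Indirect association works because a genotyped tagSNP is correlated with nearby ungenotyped SNPs. A common measure of that correlation is r². The sketch below computes r² for two biallelic SNPs from phased haplotypes. The function name and the 0/1 coding are assumptions made for illustration, not part of HapMap or of any tagging tool.

```python
def ld_r_squared(haplotypes):
    """r^2 between two biallelic SNPs.

    haplotypes: list of (a, b) pairs, one per phased chromosome, where each
    value is 0 or 1 (1 = an arbitrarily chosen allele at that SNP).
    """
    n = len(haplotypes)
    p_a = sum(a for a, _ in haplotypes) / n          # freq of allele 1 at SNP A
    p_b = sum(b for _, b in haplotypes) / n          # freq of allele 1 at SNP B
    p_ab = sum(1 for a, b in haplotypes if a == 1 and b == 1) / n
    d = p_ab - p_a * p_b                             # LD coefficient D
    denom = p_a * (1 - p_a) * p_b * (1 - p_b)
    if denom == 0:
        raise ValueError("both SNPs must be polymorphic")
    return d * d / denom


# Two SNPs that always travel together: the first perfectly tags the second.
linked = [(1, 1)] * 6 + [(0, 0)] * 4
# Two SNPs whose alleles combine independently.
unlinked = [(1, 1), (1, 0), (0, 1), (0, 0)]

print(ld_r_squared(linked))
print(ld_r_squared(unlinked))
```

An r² close to 1 means genotyping one SNP tells you almost everything about the other; an r² close to 0 means the tagSNP carries no information about it.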
Technology advances
Affymetrix
Illumina
Designing a GWAS
Phenotype measures
Case-control vs Quantitative design
Sample size & Power
Population considerations
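To get a feel for sample size and power, here is a rough sketch using a normal approximation to a two-sided test comparing allele frequencies in cases and controls. The function name, the example frequencies and the default threshold of 5e-8 (a conventional genome-wide significance level, not stated on the slide) are assumptions. A real study design would use dedicated power software.

```python
from math import sqrt
from statistics import NormalDist


def allelic_test_power(p_case, p_control, n_cases, n_controls, alpha=5e-8):
    """Approximate power of a two-sided two-proportion z-test on allele frequencies."""
    norm = NormalDist()
    # Each individual contributes two alleles.
    m_case, m_control = 2 * n_cases, 2 * n_controls
    se = sqrt(p_case * (1 - p_case) / m_case
              + p_control * (1 - p_control) / m_control)
    z_crit = norm.inv_cdf(1 - alpha / 2)        # critical value for the test
    z_effect = abs(p_case - p_control) / se     # expected test statistic
    return norm.cdf(z_effect - z_crit) + norm.cdf(-z_effect - z_crit)


# Same hypothetical allele frequencies, two different sample sizes.
for n in (1_000, 5_000):
    power = allelic_test_power(0.35, 0.30, n_cases=n, n_controls=n)
    print(f"{n} cases + {n} controls: power = {power:.3f}")
```

Under these assumed frequencies, power rises sharply as the sample grows, which is why small-effect common variants need large cohorts.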
Image files (~400 MB per individual)
Data files (~600 MB)
What does the data look like?
Analysis of a GWAS
Quality control
Statistical testing
Covariates
Multiple testing
Quality Control
Batch effects
SNP quality control
Missingness
Hardy-Weinberg equilibrium (HWE)
Minor allele frequency (MAF)
Sample quality control
Missingness
Relatedness!
Population structure
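The three SNP-level checks (missingness, HWE, MAF) can be sketched for a single SNP as follows. The function name and the 0/1/2/None genotype coding are assumptions, and the HWE test shown is the simple 1-df chi-square test; exact tests are often preferred in practice. Filtering thresholds are study-specific and not shown. Relatedness and population structure need genome-wide data and are not covered here.

```python
from math import erfc, sqrt


def snp_qc(genotypes):
    """QC metrics for one SNP.

    genotypes: list with 0, 1 or 2 copies of allele B per sample,
    or None where the genotype call is missing.
    """
    called = [g for g in genotypes if g is not None]
    n = len(called)
    if n == 0:
        raise ValueError("no called genotypes for this SNP")
    missing_rate = 1 - n / len(genotypes)

    n_aa, n_ab, n_bb = called.count(0), called.count(1), called.count(2)
    p_b = (n_ab + 2 * n_bb) / (2 * n)
    p_a = 1 - p_b
    maf = min(p_a, p_b)

    # Expected genotype counts under Hardy-Weinberg equilibrium.
    observed = (n_aa, n_ab, n_bb)
    expected = (n * p_a ** 2, 2 * n * p_a * p_b, n * p_b ** 2)
    chi2 = sum((o - e) ** 2 / e for o, e in zip(observed, expected) if e > 0)
    hwe_p = erfc(sqrt(chi2 / 2))   # upper tail of chi-square with 1 df

    return {"missing_rate": missing_rate, "maf": maf,
            "hwe_chi2": chi2, "hwe_p": hwe_p}


# Hypothetical SNP: 100 called samples in perfect HWE plus 5 missing calls.
example = [0] * 25 + [1] * 50 + [2] * 25 + [None] * 5
print(snp_qc(example))
```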
Statistical testing - Association
Quantitative traits
Generalized linear model (GLM) approaches
Case-control
Logistic regression
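For the case-control design, a single-SNP logistic regression can be sketched in plain Python. This is a minimal illustration under an additive coding (0, 1 or 2 copies of the tested allele), with no covariates, fitted by Newton-Raphson and tested with a Wald test. The function name and the example counts are hypothetical; real analyses use established GWAS software.

```python
from math import erfc, exp, sqrt


def logistic_snp_test(genotypes, status, max_iter=25):
    """Fit logit P(case) = b0 + b1 * genotype; return (odds_ratio, z, p_value).

    genotypes: 0/1/2 copies of the tested allele; status: 1 = case, 0 = control.
    """
    b0 = b1 = 0.0
    for _ in range(max_iter):
        g0 = g1 = h00 = h01 = h11 = 0.0
        for x, y in zip(genotypes, status):
            p = 1.0 / (1.0 + exp(-(b0 + b1 * x)))
            w = p * (1.0 - p)
            g0 += y - p            # gradient of the log-likelihood
            g1 += (y - p) * x
            h00 += w               # information matrix (negative Hessian)
            h01 += w * x
            h11 += w * x * x
        det = h00 * h11 - h01 * h01
        if det == 0:
            raise ValueError("genotype has no variation; model not identifiable")
        # Newton step: beta += inverse(information) * gradient
        step0 = (h11 * g0 - h01 * g1) / det
        step1 = (h00 * g1 - h01 * g0) / det
        b0 += step0
        b1 += step1
        if abs(step0) + abs(step1) < 1e-10:
            break
    se_b1 = sqrt(h00 / det)                   # Wald standard error of b1
    z = b1 / se_b1
    p_value = erfc(abs(z) / sqrt(2))          # two-sided normal p-value
    return exp(b1), z, p_value


# Hypothetical genotype counts (0/1/2 copies): cases 20/50/30, controls 40/45/15.
geno = [0] * 20 + [1] * 50 + [2] * 30 + [0] * 40 + [1] * 45 + [2] * 15
stat = [1] * 100 + [0] * 100
odds_ratio, z, p_value = logistic_snp_test(geno, stat)
print(f"OR per allele = {odds_ratio:.2f}, z = {z:.2f}, p = {p_value:.2g}")
```

In a GWAS this test is repeated for every SNP, usually with covariates added to the model.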
Manhattan plots & Regional plots
Multi-locus Analysis: Allele Scoring
Multiple testing
The cumulative likelihood of finding one or more
false positives over the entire GWAS analysis is
high.
Bonferroni correction:
Adjust α = 0.05 to α = (0.05/k) where k is the
number of statistical tests conducted.
False-discovery rate
Permutation testing
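The first two corrections can be sketched briefly. The code below shows why the uncorrected false-positive risk is high, computes the Bonferroni threshold, and applies the Benjamini-Hochberg step-up procedure as one common way to control the false-discovery rate. Function names, the example p-values and the figure of one million tests are illustrative assumptions. The family-wise error formula assumes independent tests. Permutation testing is not shown.

```python
def familywise_error_rate(alpha, k):
    """Chance of at least one false positive in k independent tests at level alpha."""
    return 1 - (1 - alpha) ** k


def bonferroni_threshold(alpha, k):
    """Per-test significance threshold after Bonferroni correction."""
    return alpha / k


def benjamini_hochberg(p_values, q=0.05):
    """Indices of tests declared significant at false-discovery rate q."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff_rank = 0
    for rank, i in enumerate(order, start=1):
        if p_values[i] <= rank * q / m:
            cutoff_rank = rank        # keep the largest rank that passes
    return sorted(order[:cutoff_rank])


print(familywise_error_rate(0.05, 100))        # already close to 1 for 100 tests
print(bonferroni_threshold(0.05, 1_000_000))   # threshold for a million tests

# Hypothetical p-values from ten tests.
pvals = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
bonf = [i for i, p in enumerate(pvals) if p <= bonferroni_threshold(0.05, len(pvals))]
print(bonf)                          # [0]
print(benjamini_hochberg(pvals))     # [0, 1]
```

Bonferroni is the stricter of the two: on these example values it keeps one test, while the false-discovery-rate procedure keeps two.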
Finishing your GWAS work
Replication
Meta-analysis
Functional Studies