Genome Sequencing
•Goal
Determine the order of nucleotides across a genome
•Problem
Conventional DNA sequencing methods can handle only
short stretches of DNA at once (<1-2 Kbp)
•Solution
Sequence small pieces, then use computers to
assemble them
Genome Sequencing
ACGTGGTAA CGTATACAC TAGGCCATA
GTAATGGCG CACCCTTAG
TGGCGTATA CATA…
ACGTGGTAATGGCGTATACACCCTTAGGCCATA
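The overlapping reads above can be merged by a simple greedy procedure: repeatedly join the two pieces with the longest suffix-prefix overlap. The sketch below is a toy illustration under stated assumptions, not the method used by any real assembler: reads are error-free, all come from the same strand, and the names `overlap`, `greedy_assemble`, and the `min_len` threshold are hypothetical. The truncated read "CATA…" is left out.

```python
def overlap(a, b, min_len=3):
    """Length of the longest suffix of a that equals a prefix of b.

    Returns 0 if the longest match is shorter than min_len.
    """
    for k in range(min(len(a), len(b)), min_len - 1, -1):
        if a.endswith(b[:k]):
            return k
    return 0


def greedy_assemble(reads, min_len=3):
    """Merge reads greedily by largest overlap until no overlap remains."""
    contigs = list(reads)
    while len(contigs) > 1:
        best_k, best_i, best_j = 0, None, None
        for i, a in enumerate(contigs):
            for j, b in enumerate(contigs):
                if i != j:
                    k = overlap(a, b, min_len)
                    if k > best_k:
                        best_k, best_i, best_j = k, i, j
        if best_k == 0:
            break  # remaining contigs do not overlap: gaps stay open
        merged = contigs[best_i] + contigs[best_j][best_k:]
        contigs = [c for idx, c in enumerate(contigs)
                   if idx not in (best_i, best_j)]
        contigs.append(merged)
    return contigs


# The six complete reads from the slide, in the order shown.
reads = ["ACGTGGTAA", "CGTATACAC", "TAGGCCATA",
         "GTAATGGCG", "CACCCTTAG", "TGGCGTATA"]
contigs = greedy_assemble(reads)
print(contigs)
```

For these six reads the sketch ends with a single contig equal to the assembled sequence shown above. Greedy merging can fail on repeats, which is one reason real assemblers use more elaborate strategies.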
Figure: Genome -> short fragments of DNA -> short DNA sequences -> sequenced genome
Technical Breakthrough For
DNA Sequencing
In 1977, two separate methods for the
large-scale sequencing of DNA were
devised:
•Chemical cleavage method
by A. M. Maxam and W. Gilbert
•Enzymatic chain termination method
by F. Sanger et al.
Chemical Cleavage Method
•This method uses double-stranded DNA samples.
•Involves modification of the bases in DNA followed by
chemical base-specific cleavage.
•Sequences DNA fragments up to ~500
nucleotides in length.
Stages:
1.The double-stranded fragment to be
sequenced is isolated and radioactively
labeled at the 5’-ends with ³²P.
2.The fragment is then cut with a restriction
enzyme and thus the label is removed
from one end.
3.The fragment of DNA with one end
labeled is denatured.
4.Four identical samples of these end-labeled
DNA restriction fragments are
subjected to chemical cleavage at
different nucleotides.
5.Four specific sets of chemical
reactions selectively cut the DNA
backbone at G, A+G, C+T, or C residues.
–G only: Dimethyl sulphate (DMS), piperidine
–A+G: Formic acid, piperidine
–C+T: Hydrazine, piperidine
–C only: Hydrazine in high salt (NaCl), piperidine
Figure: Maxam-Gilbert method (continued)
Lodish, H.; Berk, A. et al. (4th ed.);
Mol. Cell Biol.; W. H. Freeman and Co. (2000)
p: 233
6.The reactions are controlled so
that each labeled chain is broken,
on average, only once.
7.The labeled subfragments
created by the four reactions
have
–the ³²P label at one end and
–the chemical cleavage point
at the other end.
8.The reaction products are
separated by size using
polyacrylamide gel electrophoresis.
The smallest fragments migrate
fastest.
Figure: Apparatus for gel electrophoresis
Voet, D.; Voet, J. and Pratt, C. (upgrade ed)
Fundamentals of Biochemistry; John Wiley and Sons, Inc
(2002); p: 58
9. The labeled fragments
in the gel are visualized
by autoradiography.
10. The sequence is read
from bottom to top of
the gel.
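Reading the four lanes from bottom to top can be sketched in code. This is an idealized model, not a description of the actual laboratory readout: it assumes every possible cleavage product appears as a clean band, and it labels each band with the 1-based position of the cleaved base (real fragments are one nucleotide shorter, but the order on the gel is the same). The names `LANES`, `maxam_gilbert_lanes`, and `read_gel` are hypothetical.

```python
# Which bases each of the four reactions cleaves at.
LANES = {"G": "G", "A+G": "AG", "C+T": "CT", "C": "C"}


def maxam_gilbert_lanes(seq):
    """Return, for each lane, the set of band positions it would show."""
    return {
        lane: {pos for pos, base in enumerate(seq, start=1) if base in bases}
        for lane, bases in LANES.items()
    }


def read_gel(lanes):
    """Read the gel from the shortest fragment (bottom) to the longest (top)."""
    positions = sorted(set().union(*lanes.values()))
    sequence = []
    for pos in positions:
        if pos in lanes["G"]:
            sequence.append("G")   # band in both G and A+G lanes
        elif pos in lanes["A+G"]:
            sequence.append("A")   # band in A+G lane only
        elif pos in lanes["C"]:
            sequence.append("C")   # band in both C and C+T lanes
        else:
            sequence.append("T")   # band in C+T lane only
    return "".join(sequence)


lanes = maxam_gilbert_lanes("GATTACA")
print(lanes)
print(read_gel(lanes))
```

The point of the two combined lanes is visible in `read_gel`: A is inferred from a band in A+G that is absent from G, and T from a band in C+T that is absent from C.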
Figure: Maxam-Gilbert method
Lodish, H.; Berk, A. et al. (4th ed.);
Mol. Cell Biol.; W. H. Freeman and Co. (2000)
p: 233
Example of DNA Sequencing by
Chemical Method
http://users.wmin.ac.uk/~redwayk/lectures/sequence.htm
Mechanism of the chemical cleavage
method
Voet, D.; Voet, J. Biochemistry; John Wiley and Sons, Inc (1990); p: 830
Continued
Voet, D.; Voet, J. Biochemistry; John Wiley and Sons, Inc (1990); p: 831
Advantages
•No premature termination, because
no polymerase is used to
synthesize DNA.
•Can sequence stretches of DNA
that cannot be sequenced with
the enzymatic method.
Disadvantages
•Not widely used.
•Uses radioactivity and
toxic chemicals.
http://www.cmb.uab.edu/courses/lectures/scheirer2.pdf
Chain Termination method
•This method uses single-stranded DNA.
•Also known as the dideoxy sequencing method because it involves the
use of analogues of the normal nucleotides, 2’,3’-dideoxynucleoside
triphosphates (ddNTPs). These are chain-terminating nucleotides
lacking a 3’-OH group.
•This method is based upon the incorporation of ddNTPs into a
growing DNA strand to stop chain elongation.
Figure: Structure of NTP, dNTP, and ddNTP
Lodish, H.; Berk, A. et al. (4th ed.);
Mol. Cell Biol.; W. H. Freeman and Co. (2000), p: 233
Stages:
1.The DNA to be sequenced is called the template DNA. It is prepared as single-stranded DNA
after being spliced into M13 vector DNA. Infected E. coli host cells release phage particles
that contain single-stranded recombinant DNA, which includes the sample DNA. This DNA
is then extracted from the phage for sequencing.
2. A synthetic 5’-end-labeled oligodeoxynucleotide is used as the primer.
3. The template DNA is hybridized to the primer.
4. Primer elongation is performed in four separate polymerization reaction mixtures. Each
mixture contains
-the 4 normal deoxynucleotides (dNTPs)
in higher concentration and
-a low concentration of one of
the 4 ddNTPs (a different one in each mixture).
5. DNA synthesis is initiated
by adding the enzyme DNA
polymerase. Because the
enzyme cannot distinguish
between the normal nucleotides
and their analogues, it
incorporates either one.
Figure: Action of DNA polymerase I
Voet, D.; Voet, J. and Pratt, C. (upgrade ed)
Fundamentals of Biochemistry; John Wiley and Sons, Inc (2002); p: 60
6. The strand synthesis continues
until a ddNTP is added. The
chain elongation ceases on the
incorporation of a ddNTP
because it lacks a 3’-OH group
which prevents addition of the
next nucleotide.
7. The result is a mixture of
terminated fragments, all of
different lengths.
8. The DNA fragments are denatured.
9. The four mixtures are run side by
side, in separate lanes, on a
polyacrylamide gel for electrophoresis.
Figure: Sanger method
Lodish, H.; Berk, A. et al. (4th ed.);
Mol. Cell Biol.; W. H. Freeman and Co. (2000)
p: 234
10. The separated
fragments are
then visualized
by autoradiography.
11. From the position
of the bands of the resulting
autoradiogram, the
sequence of the newly synthesized
strand, which is complementary to the
original DNA template strand, can
be read directly.
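The chain termination stages can be sketched as a small simulation. This is a simplified model under stated assumptions: the primer is ignored, every possible termination product is present, and each lane simply lists the lengths at which the corresponding ddNTP ended the new strand. The bands read off the gel give the newly synthesized strand; its reverse complement is the template. The function names are hypothetical.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5'->3'."""
    return seq.translate(COMPLEMENT)[::-1]


def sanger_lanes(template):
    """Fragment lengths seen in each of the four ddNTP lanes.

    The polymerase builds the strand complementary to the template;
    a fragment of length n ends in the n-th base of that new strand.
    """
    new_strand = reverse_complement(template)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        lanes[base].append(length)
    return lanes


def read_autoradiogram(lanes):
    """Read bands from shortest to longest fragment across all lanes."""
    bands = sorted(
        (length, base) for base, lengths in lanes.items() for length in lengths
    )
    return "".join(base for _, base in bands)


template = "GATTACA"
lanes = sanger_lanes(template)
new_strand = read_autoradiogram(lanes)
print(lanes)
print(new_strand, reverse_complement(new_strand))
```

In a real reaction the fragments arise randomly, and the ratio of ddNTP to dNTP controls how far synthesis typically proceeds before termination; the sketch skips that and enumerates the products directly.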
Figure: Chain termination method
Voet, D.; Voet, J. and Pratt, C. (upgrade ed)
Fundamentals of Biochemistry; John Wiley and Sons, Inc (2002);
p: 61
Advantages
•Most popular method.
•Simpler and quicker, allowing
large output. The primer-annealing
and sequencing reactions can be
completed within an hour.
Disadvantages
•Can yield poor results owing
to secondary structure in the
DNA, as DNA polymerases
sometimes terminate chain
elongation prematurely.
•The sequence is obtained not
from the original DNA molecule
but from an enzymatic copy, so
there is a chance of
incorporating wrong bases.
http://www.ich.ucl.ac.uk/cmgs/sequence.htm
Example of DNA Sequencing in
Sanger Method
http://users.wmin.ac.uk/~redwayk/lectures/sequence.htm
Other Improved Approaches and Automated DNA Sequencing
•Updated version of Sanger method
•Fluorescence detection with lasers
•Cycle sequencing
•Shotgun sequencing
http://www.cmb.uab.edu/courses/lectures/scheirer2.pdf
chsfpc5.chem.ncsu.edu/Poznan/ chem_bio/sld026.htm
opbs.okstate.edu/.../ sld015.htm
Assembly: How Much DNA?
•Low coverage:
Input: a few pieces to assemble
Output: many contigs, many gaps
•High coverage:
Input: many pieces to assemble
Output: a few contigs, a few gaps
Lander and Waterman, 1988
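The relation between coverage and the number of contigs can be made concrete with the Lander-Waterman estimates. The sketch below assumes reads of equal length placed uniformly at random on the genome; the function names, the parameter `theta` (minimum detectable overlap as a fraction of read length), and the example genome size are illustrative choices, not values from the slides.

```python
import math


def coverage(num_reads, read_len, genome_len):
    """Average number of reads covering each base: c = N * L / G."""
    return num_reads * read_len / genome_len


def expected_uncovered_fraction(c):
    """Expected fraction of the genome covered by no read: e^(-c)."""
    return math.exp(-c)


def expected_contigs(num_reads, c, theta=0.0):
    """Expected number of contigs: N * e^(-c * (1 - theta))."""
    return num_reads * math.exp(-c * (1.0 - theta))


# Hypothetical example: a 1 Mbp genome sequenced with 500 bp reads.
genome_len, read_len = 1_000_000, 500
for num_reads in (2_000, 20_000):
    c = coverage(num_reads, read_len, genome_len)
    print(num_reads, c, expected_uncovered_fraction(c),
          expected_contigs(num_reads, c))
```

With these illustrative numbers, 1x coverage gives roughly 736 expected contigs with about 37% of the genome uncovered, while 10x coverage gives about one expected contig.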
Sanger Sequencing
•1982: lambda virus,
DNA stretches up to 30-40 Kbp
(Sanger et al.)
•1994: H. influenzae,
1.8 Mbp
(Fleischmann et al.)
•2001: H. sapiens, D. melanogaster,
3 Gbp
(Venter et al.)
•2007: Global Ocean Sampling Expedition,
~3,000 organisms, 7 Gbp
(Venter et al.)
Next Generation Sequencing:
Why Now?
•Motivation: HGP and its derivatives,
personalized medicine
•Short-read applications: (re-)sequencing,
other methods (e.g. gene expression)
•Advancements in technology
High Parallelism is Achieved in
Polony Sequencing
Figure: Polony vs. Sanger
Generation of Polony array:
DNA Beads (454, SOLiD)
DNA Beads are generated using Emulsion PCR
Generation of Polony array:
DNA Beads (454, SOLiD)
DNA Beads are placed in wells
Generation of Polony array:
Bridge-PCR (Solexa)
DNA fragments are attached to array and
used as PCR templates
Sequencing: Pyrosequencing
(454)
Complementary strand elongation: DNA Polymerase
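In pyrosequencing, one nucleotide type is flowed over the template at a time, and the light signal is roughly proportional to the number of bases incorporated in that flow. The sketch below models this as an idealized, noise-free flowgram. The flow order "TACG", the `max_cycles` limit, and the function names are assumptions made for illustration, and the input is the strand being synthesized rather than the template.

```python
def flowgram(new_strand, flow_order="TACG", max_cycles=100):
    """Return (nucleotide, incorporated_count) for each flow.

    new_strand is the complementary strand being built, written 5'->3'.
    A count of 0 means the flowed nucleotide did not match the next base.
    """
    signals = []
    pos = 0
    for _ in range(max_cycles):
        for nuc in flow_order:
            run = 0
            # A homopolymer run is incorporated within a single flow.
            while pos < len(new_strand) and new_strand[pos] == nuc:
                run += 1
                pos += 1
            signals.append((nuc, run))
            if pos == len(new_strand):
                return signals
    return signals


def decode(signals):
    """Rebuild the synthesized sequence from an idealized flowgram."""
    return "".join(nuc * count for nuc, count in signals)


signals = flowgram("TTAGC")
print(signals)
print(decode(signals))
```

Real signals are noisy, so the count for a long homopolymer run is hard to call exactly; the sketch ignores that.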
Sequencing: Fluorescently
Labeled Nucleotides (ABI SOLiD)
5 reading frames, each position is read twice
Single Molecule Sequencing:
HeliScope
•Direct sequencing of DNA molecules: no
amplification stage
•DNA fragments are attached to array
•Potential benefits: higher throughput, fewer
errors
What, When and Why
•Sanger:
Small projects (less than 1Mbp)
•454:
De-novo sequencing, metagenomics
•Solexa, SOLiD, HeliScope:
–Gene expression, protein-DNA interactions
–Resequencing
Applications
Where Do We Go from Here?
•Higher throughput, longer reads (Pacific
BioSciences)
•Computational bottleneck
•Shift to sequencing-based technologies
•Will it help to cure cancer?