DNA SEQUENCING TECHNIQUES AND TYPES OF SEQUENCING
hemantshiv1985
92 slides
Aug 31, 2025
DNA sequencing
“the technology lecture”
•Part 1: Chemistry, instrumentation and data analysis.
•Part 2: Large-scale operations, comparative sequencing.
•Part 3: Sequencing analysis, variation analysis.
•Abbrev 02/03/05, rev 10/17/05
Technology: Importance
•Approaches to research
–hypothesis driven
–discovery driven
–technology development
•Technology drives research drives technology
–paradigm changing
–punctuated versus gradual
•Vision
–personalized medicine
DNA sequencing: Importance
•Basic blueprint for life; Aesthetics.
•Gene and protein
–Function
–Structure
–Evolution
•Genome-based diseases- “inborn errors of metabolism”
–Genetic disorders
–Genetic predispositions to infection
–Diagnostics
–Therapies
DNA sequencing methodologies:
ca. 1977!
•Maxam-Gilbert
–base modification by general and specific chemicals.
–depurination or depyrimidination.
–single-strand excision.
–not amenable to automation
•Sanger
–DNA replication.
–substitution of substrate with chain-terminator chemical.
–more efficient
–automation??
Maxam-Gilbert ‘chemical’ method
versus “bio” based methods
•Sanger
•dideoxynucleotides
DNA chemistry
DNA biochemistry: replication fork
DNA replication: biochemistry
[Figure: chemical structures for DNA replication: nucleotides with purine or pyrimidine bases, phosphate groups and OH groups; 5’ and 3’ ends labeled.]
Now, DNA “sequencing”: Sanger dideoxy method I
[Figure: structure of a dideoxyribonucleoside triphosphate (ddNTP): purine or pyrimidine base, three phosphate groups, and H in place of the 3’-OH.]
DNA sequencing: Sanger II
[Figure: chain termination method: the incorporated dideoxynucleotide carries H in place of the 3’-OH, so the chain cannot be extended.]
DNA sequencing: in practice
template + polymerase + primer, in four reactions:
1. dCTP, dTTP, dGTP, dATP + ddATP
2. dCTP, dTTP, dGTP, dATP + ddGTP
3. dCTP, dTTP, dGTP, dATP + ddTTP
4. dCTP, dTTP, dGTP, dATP + ddCTP
extension, then electrophoresis
Base pairs shown: A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T
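The four-reaction scheme above can be sketched in a few lines of Python. This is a toy model, not a description of any instrument: it assumes every possible termination product is present, counts fragment lengths from the first base after the primer, and uses the hypothetical function names `sanger_fragments` and `read_gel`.

```python
COMPLEMENT = {"A": "T", "T": "A", "G": "C", "C": "G"}


def sanger_fragments(template):
    """Return, for each ddNTP reaction, the lengths at which extension can stop.

    `template` is written 5'->3'; the new strand is its reverse complement.
    """
    new_strand = "".join(COMPLEMENT[base] for base in reversed(template))
    lanes = {"A": [], "C": [], "G": [], "T": []}
    for length, base in enumerate(new_strand, start=1):
        # A ddNTP of this base can terminate the chain at this length.
        lanes[base].append(length)
    return lanes


def read_gel(lanes):
    """Read the new strand 5'->3' by ordering fragments from shortest to longest."""
    by_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(by_length[n] for n in sorted(by_length))


lanes = sanger_fragments("ATGC")
print(lanes)
print(read_gel(lanes))
```

Each lane holds the fragment lengths from one ddNTP reaction; sorting all fragments by length across the four lanes is the software analogue of reading the gel from bottom to top.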
Manual radioactive sequencing
Semi-automated fluorescent DNA sequencing
•Fred Sanger et al., 1977.
•Walter Gilbert et al., 1977.
•Leroy Hood et al., 1986.
•Applied Biosystems, Inc.
•DuPont Company
DNA sequencing: upgrade, second iteration, terminator-label
•Disadvantages of primer-labels:
–four reactions
–tedious
–limited to certain regions, custom oligos or
–limited to cloned inserts behind ‘universal’ priming sites.
•Advantages:
•Solution:
–fluorescent dye terminators
ABI series: 370, 373 and 377
•semi-automated
•“best” pre- and post-operations.
•higher throughput
•bioinformatics limitations, ’scuse me, “opportunities.”
ABI 370s-series screen dump
Bioinformatics part one: pixel
refinement
ABI 377 envelope: 96 lanes
genome sequencing strategies
•Shotgun
•Directed primer walks
•Modified directed primer walks
Sequencing strategies
Whole genome
Also on a smaller scale: 1. “Island walking” and 2. Primer walking.
Rapid re-sequencing of human Ad1: Time trial.
Have sequence of Ad 1.
In theory, have a minimally tiled set of PCR primers to cover entire 36,001 base genome.
In theory, have a minimally tiled set of sequencing primers as well.
Want draft sequence in a minimal time, including primer delivery from a vendor.
In practice design two parallel sets of minimally tiled PCR primers and amplify two sets.
In practice, assume 750 base reads--> 48 primers, one direction.
Compare with consensus: Determine accuracy, timing and evaluate operation.
Genome: bases 1 to 36,001.
PCR fragments (start to end): 115 to 7,315; 7,300 to 14,500; 14,400 to 21,600; 21,500 to 28,700; 28,600 to 35,885.
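A quick check of the tiling can be sketched as follows. It reads the coordinate pairs above as start and end positions of five PCR fragments on the 36,001-base genome, which is an interpretation of the slide; the function names are hypothetical.

```python
GENOME_LENGTH = 36_001
READ_LENGTH = 750

# Coordinate pairs from the slide, read as (start, end) of each PCR fragment.
TILES = [
    (115, 7_315),
    (7_300, 14_500),
    (14_400, 21_600),
    (21_500, 28_700),
    (28_600, 35_885),
]


def tile_overlaps(tiles):
    """Bases shared by each consecutive pair of tiles (negative would mean a gap)."""
    return [
        prev_end - next_start + 1
        for (_, prev_end), (next_start, _) in zip(tiles, tiles[1:])
    ]


def uncovered_ends(tiles, genome_length):
    """Bases left uncovered before the first tile and after the last one."""
    return tiles[0][0] - 1, genome_length - tiles[-1][1]


print("overlaps:", tile_overlaps(TILES))
print("uncovered ends:", uncovered_ends(TILES, GENOME_LENGTH))
print("genome length / read length:", round(GENOME_LENGTH / READ_LENGTH, 2))
```

Under this reading, every pair of neighbouring fragments overlaps, about a hundred bases at each end of the genome fall outside the fragments, and the ratio of genome length to read length is close to the 48 one-direction primers quoted above.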
Custom primer walks and “island” hopping
•Have scaffold of generic genome: related or compiled.
•Have archived “islands of sequences” (lg, med, sm)- from other research interests.
•Generate “in-bound” primers to re-sequence equivalents and known features, e.g., 3’-ITR.
•Use custom “out-bound” primers to walk across “inter-island” sequences (PCR and sequencing).
•Collect “1st +” draft genomic sequence as round 1.
•Iterative walks to complete “2+1” consensus, with error rate 1/10,000 bases.
Target: HAdV4
•For 36,000 bases, need 90 primers for 1x coverage (1st draft) and 270 primers for 3x coverage (finished).
•Have from GenBank: 10 “islands” @ 30%= 10,883 bases,
–calling for 27x2= 54 primers for complementing coverage.
•Theory (if continuous sequence): 36,000-10,883= 25,117 bases.
–At 400 bases per read, need 63 primers for 1x coverage, or 126 for complementing coverage.
•Practice: 10 “islands” @ 30%= 10,883 bases, 80 primers.
•Example: “Island 1” is 149 bases.
–1 fragment at 400 bases/read.
–2 primers for 1x coverage.
–“Terminal island,” need only 1 “outbound” primer.
–Total of (1x2)+1= 3 primers.
•Example: “Island 2” is 2042 bases.
–5 fragments at 400 bases/read.
–“Internal island,” need 2 “outbound” primers.
–Total of (5x2)+2= 12 primers.
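The primer arithmetic above can be reproduced with a short sketch. The rounding rule is an assumption inferred from the slide's numbers (lengths are rounded to the nearest whole read, with a minimum of one); the function names are hypothetical.

```python
READ_LENGTH = 400  # bases per read, as stated on the slide


def fragments(length, read_length=READ_LENGTH):
    """Number of read-sized fragments for a stretch of sequence.

    Assumption: round to the nearest whole read, never fewer than one.
    This reproduces the slide's figures (e.g. 2042 bases -> 5 fragments).
    """
    return max(1, round(length / read_length))


def island_primers(length, terminal):
    """Primers for one island: two per fragment, plus outbound primers.

    A terminal island needs one outbound primer, an internal island two.
    """
    outbound = 1 if terminal else 2
    return fragments(length) * 2 + outbound


print(fragments(36_000))                       # 1x coverage of the whole genome
print(fragments(36_000 - 10_883))              # 1x coverage outside the islands
print(island_primers(149, terminal=True))      # "Island 1"
print(island_primers(2042, terminal=False))    # "Island 2"
```

A stricter rule that always rounds up would give six fragments for “Island 2” rather than the five quoted, so the choice of rounding matters when ordering primers.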
Definition of tiled set of PCR primers: Data.
[Figure: tiled primers A through H and the PCR fragments “B” through “E”.]
DNA sequencing: Computation
•Input from sequencer
–peak intensities
•Output to user
–DNA sequence
1. normalize intensities
2. apply mobility corrections
3. predict bands
4. call bases
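The four steps above can be illustrated with a toy base caller. This is a minimal sketch, not the algorithm used by any sequencer software: it assumes one intensity trace per dye channel, models the mobility correction as a whole-sample shift per channel, and treats local maxima of the summed signal as bands. All function names and the threshold are hypothetical.

```python
def normalize(trace):
    """Step 1: scale a channel so that its highest peak is 1.0."""
    peak = max(trace)
    return [value / peak if peak else 0.0 for value in trace]


def shift(trace, offset):
    """Step 2 (sketch): shift a channel by `offset` samples, padding with zeros."""
    n = len(trace)
    if offset >= 0:
        return [0.0] * offset + trace[: n - offset]
    return trace[-offset:] + [0.0] * (-offset)


def find_bands(total, threshold=0.5):
    """Step 3: indices of local maxima in the summed signal above a threshold."""
    return [
        i
        for i in range(1, len(total) - 1)
        if total[i] >= threshold and total[i] > total[i - 1] and total[i] >= total[i + 1]
    ]


def call_bases(traces, offsets=None):
    """Step 4: at each band, call the channel with the strongest signal."""
    offsets = offsets or {}
    corrected = {
        base: shift(normalize(trace), offsets.get(base, 0))
        for base, trace in traces.items()
    }
    length = len(next(iter(corrected.values())))
    total = [sum(corrected[base][i] for base in corrected) for i in range(length)]
    return "".join(
        max(corrected, key=lambda base: corrected[base][i]) for i in find_bands(total)
    )


example = {
    "A": [0, 9, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 5, 0, 0, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 8, 0, 0, 0],
    "T": [0, 0, 0, 0, 0, 0, 0, 4, 0],
}
print(call_bases(example))
```

Normalizing each channel first keeps a weakly fluorescing dye from being out-voted by a brighter one when the strongest channel is picked at each band.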
Sequence assembly:
“Sequencher”
Applications DNA sequencing
•Whole genome analysis
•Comparative genomics
•Applications to subfields
Shimadzu, Ltd.
•NEW ORLEANS, March 19, 2002. PittCon.
•Faster and more economical DNA sequencer.
•10 times faster and 90 percent cheaper to run than current state-of-the-art.
•GenoMEMS, MA spinoff that has developed a microfabrication technology, based on Whitehead Inst. technology.
•Microelectromechanical system, or MEMS, technology: microfabricated electrical and mechanical components.
•Five million bases per day.
•Read lengths of 800 bases.
•Target 2003.
•TODAY (2005): Solexa, 454, etc. $1,000 genome - $100,000 genome.
Other considerations: automation
Bioinformatics issues in comparative DNA
sequencing
Done! Now what? ex., Ad 1 assembly:
Consensus
Genome characterization
•Align DNA sequence with archived sequences.
•Annotate DNA features, e.g., RE sites, GC sites, replication and
transcription factor binding sites.
•Annotate ORFs.
•Annotate genes and proteins.
•Phylogenetic analyses of genes.
•Whole genome comparisons.
•Phylogenetic analyses of genomes.
•Identify cellular homologues or “ancient history”
-horizontal transfer.
Genome Sequence
Annotation.
•Annotation flowchart.
•Summary of findings.
•Comparison of genome sequences.
From the sequencing projects:
Biological features in sequence?
[Figure: gene features in the sequence: PROMOTER; start codon ATG; stop codons TAG, TAA, TGA; intron boundaries GT ... AG; EXON, INTRON, EXON; POLY A SIGNAL.]
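The start and stop signals named above are enough for a simple open reading frame scan. The sketch below is a minimal illustration, not the method of any annotation tool mentioned in this deck: it looks for ATG followed by the first in-frame TAA, TAG or TGA in all six frames and ignores introns (the GT ... AG signals), promoters and poly A signals. Function names and the `min_codons` cutoff are hypothetical.

```python
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def orfs_in_strand(seq, min_codons=2):
    """Yield (start, end) spans, 0-based and half-open, from ATG through a stop."""
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    yield start, i + 3
                start = None


def six_frame_orfs(seq):
    """Scan both strands; minus-strand coordinates refer to the reverse complement."""
    seq = seq.upper()
    found = [("+", start, end) for start, end in orfs_in_strand(seq)]
    found += [("-", start, end) for start, end in orfs_in_strand(reverse_complement(seq))]
    return found


print(six_frame_orfs("CCATGAAATAGCC"))
```

In practice a much larger `min_codons` would be used, since short ATG-to-stop stretches occur by chance throughout any genome.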
Genome sequence annotation:
(M. Zorn, Berkeley, 2002)
• Extraction, definition and interpretation of features in the genome
sequence by integrating computational tools and biological knowledge.
• “Proofread” the sequence: correct miscalls. Sequence data needs to be
“cleaned up” for chip design.
Adenoviruses:
Non-enveloped icosahedral viruses.
Multiply in the host nucleus.
Linear double-stranded DNA genome, 26-45 kb in size.
Infect most vertebrates, from fish to humans.
Human adenoviruses: genus Mastadenovirus.
51 human serotypes divided into six sub-genera (Groups A-F).
–HAdB1: Ads 3, 7, 16, 21 (respiratory infections).
–HAdB2: Ads 11, 14, 34, 35, 50 (kidney and urinary tract infections, except 11a and 14).
–HAdE: Ad 4 (respiratory infections).
From Stone et al, 2003.
Early
Intermediate
Late
Transcription units
Gene annotation of adenovirus genome: Basic
[Flowchart; boxes as labeled:]
•GLIMMER2, Artemis, RBSFinder
•ORFs
•Start/stop codon verification
•Translated frames
•GenBank non-redundant protein databases
•Sequence alignments: BLASTP
•GENES: name, CDS (splice sites), MW
•Refined ORFs
•Artemis: six-frame translation
•CLUSTALW
Advanced: Detailed annotation of genes
GenBank: E4 Superfamily: regions 1 and 2
Join: region 1, region 2: 117~306 nt in between
GenBank E4 Superfamily:
17 KD, 20 KD, 24 KD, 27 KD
CLUSTALW; Artemis: six-frame translation
Annotated Human type 1 adenovirus E4 genes:
Spliced from 2 to 3 exons
Two annotation approaches to HAdV1
•Based on Ad 2 annotation
•Generic annotation plus advanced
HAdV1 genes
Phylogenetic analysis of genes
Global tools for whole genome analyses
•Databases and data streams “readily” available.
•Data mining opportunities: “added value.”
•Limited tools in tool set, especially whole genome comparisons:
MAP, GeneOrder and CoreGenes.
•Non-available or non-optimal tools: Automated annotation, etc.
•These whole genome analysis tools have value for the EOS project, in
particular the PCR-based assays and the microarray “re-sequencing”
assays.
Genome analysis: Continuation.
GCG SeqWeb Compare: Adenovirus genomes
FLAG: Fast Local Alignment for Gigabases
FLAG Ad 1 vs 2 vs 5
•Ad 1 vs Ad 2
•Ad 1 vs Ad 5
•Ad 2 vs Ad 5
GeneOrder flowchart
1. Get GenBank file from NCBI website.
2. Remove unnecessary information and save.
3. Convert to FASTA format.
4. Convert to database format for BLASTP.
5. Problem during process? Yes: error message, stop. No: continue.
6. Break query file into single queries.
7. Save each query in a temporary file.
8. BLASTP against database.
9. Get BLASTP results based on selected ranges.
10. Extract and print table/graph.
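The step that breaks a query file into single queries and saves each one in a temporary file can be sketched with the Python standard library alone. This is an illustration of the idea, not the GeneOrder source: the function names and the `query_N.fasta` file naming are hypothetical, and the BLASTP step itself is not run here.

```python
import tempfile
from pathlib import Path


def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records, header, chunks = [], None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


def write_single_queries(records, directory):
    """Write one FASTA file per record and return the file paths in order."""
    paths = []
    for index, (header, sequence) in enumerate(records, start=1):
        path = Path(directory) / f"query_{index}.fasta"
        path.write_text(f">{header}\n{sequence}\n")
        paths.append(path)
    return paths


demo = ">protein_a\nMKLV\nTTSA\n>protein_b\nMAAG\n"
with tempfile.TemporaryDirectory() as workdir:
    for path in write_single_queries(parse_fasta(demo), workdir):
        print(path.name, path.read_text().splitlines()[0])
```

Each temporary file would then be passed to BLASTP against the formatted database, one query at a time, before the results are collected into the table or graph.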
GeneOrder analysis: Example
•Manually plot with MS-Excel.
•Each point is a coding gene.
•Co-linear arrangements suggest synteny.
•Several regions of genomic rearrangement events within the genomes of the two chloroplasts.
•Rearrangements include flipping of an entire set of genes.
•Two versions have been developed: GO1 and 2.
•Ongoing work includes recoding for megabase genomes, which have additional value.
[Figure: GeneOrder plots; axes run from 0 to 300, with axis labels Vaccinia and MsEPV.]
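A GeneOrder-style plot puts one point per matched gene, so co-linear runs and flipped segments can also be picked out programmatically. The sketch below is a simplified stand-in for visual inspection, not part of GeneOrder: it assumes matches are given as (position in genome A, position in genome B) pairs, and the function name and `min_genes` cutoff are hypothetical.

```python
LABELS = {1: "forward", -1: "inverted", 0: "single"}


def collinear_blocks(pairs, min_genes=3):
    """Group matched genes into runs whose genome B positions step by +1 or -1.

    `pairs` holds (index_in_A, index_in_B) for each matched gene. Runs are
    built over consecutive matches after sorting by position in genome A.
    Returns (first_pair, last_pair, orientation) for runs of at least `min_genes`.
    """
    blocks, run, direction = [], [], 0
    for pair in sorted(pairs):
        if run:
            step = pair[1] - run[-1][1]
            if step in (1, -1) and direction in (0, step):
                run.append(pair)
                direction = step
                continue
            blocks.append((run, direction))
        run, direction = [pair], 0
    if run:
        blocks.append((run, direction))
    return [
        (genes[0], genes[-1], LABELS[d]) for genes, d in blocks if len(genes) >= min_genes
    ]


matches = [(1, 1), (2, 2), (3, 3), (4, 9), (5, 8), (6, 7), (7, 20)]
for block in collinear_blocks(matches):
    print(block)
```

A forward run corresponds to a diagonal rising from left to right on the plot, and an inverted run to a diagonal falling from left to right, which is how a flipped set of genes appears.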
Poxvirus genomes: GeneOrder2.0 analysis II
GeneOrder2.0 analysis: AmEPV v MsEPV
GeneOrder identifies similar genes in two genomes
Organize common genes in five genomes (genera) as “Alphabet”
Add other genes based on additional information
Use Advanced BLAST to check the Alphabet
Use PSI-BLAST with several iterations to check the Alphabet
Scan entire NCBI protein database using conserved profiles to ensure that all the conserved proteins have been extracted
Compare Alphabet with experimental TS mutant data to determine the essential genes for pox viruses
Conserved genes of poxviruses
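Once similar genes have been grouped, the “Alphabet” of genes common to all genomes is a set intersection. The sketch below assumes each genome has already been reduced to a set of gene-family labels; the genome names and family labels are placeholders, not data from the deck.

```python
def core_alphabet(genomes):
    """Return the gene families present in every genome.

    `genomes` maps a genome name to an iterable of gene-family labels.
    """
    if not genomes:
        return set()
    return set.intersection(*(set(families) for families in genomes.values()))


def accessory_genes(genomes):
    """Return, per genome, the families that are not part of the shared core."""
    core = core_alphabet(genomes)
    return {name: set(families) - core for name, families in genomes.items()}


# Placeholder input: labels are illustrative only.
example = {
    "genome_1": {"fam_A", "fam_B", "fam_C", "fam_D"},
    "genome_2": {"fam_A", "fam_B", "fam_D"},
    "genome_3": {"fam_A", "fam_B", "fam_E"},
}
print(sorted(core_alphabet(example)))
print({name: sorted(extra) for name, extra in accessory_genes(example).items()})
```

The families left over in `accessory_genes` are the candidates to revisit with the additional checks listed above, since a family missing from one genome may reflect a missed match rather than a true absence.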
Orthologous gene locator
•Develop software tool to characterize genomes globally.
•Characterize genomes by identifying orthologous genes.
•Identify paralogs.
•Characterize unknown genes by identifying orthologs.
•Rapid automated comparisons of genomes.
•Identify “alphabet” of essential genes.
•“CoreGenes.”
•In general, a high BLAST score does not by itself establish orthology/homology.
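One common heuristic for separating likely orthologs from merely high-scoring hits is the reciprocal best hit test; it is shown here as an illustration and is not claimed to be what CoreGenes implements. The sketch assumes pairwise scores are already available as dictionaries; the gene names and scores are placeholders.

```python
def best_hits(scores):
    """Map each query to its highest-scoring subject.

    `scores` maps (query, subject) to a similarity score.
    """
    best = {}
    for (query, subject), score in scores.items():
        if query not in best or score > best[query][1]:
            best[query] = (subject, score)
    return {query: subject for query, (subject, _) in best.items()}


def reciprocal_best_hits(a_vs_b, b_vs_a):
    """Return (a, b) pairs where each gene is the other's best hit."""
    forward, reverse = best_hits(a_vs_b), best_hits(b_vs_a)
    return sorted((a, b) for a, b in forward.items() if reverse.get(b) == a)


# Placeholder scores: a2 hits b1 strongly, but b1's best hit is a1.
a_vs_b = {("a1", "b1"): 200, ("a2", "b1"): 180, ("a2", "b2"): 90}
b_vs_a = {("b1", "a1"): 200, ("b2", "a2"): 90}
print(reciprocal_best_hits(a_vs_b, b_vs_a))
```

In the example, `a2` scores highly against `b1` yet is not paired with it, which is the situation a paralog typically produces.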
[Pie chart: share of genes by category]
•Transcription/RNA modification: 34%
•DNA replication/repair: 10%
•Structural: 24%
•Other enzymes: 12%
•Unknown: 20%
Conserved genes of poxviruses
Applications of CoreGenes to EOS Affy chip
design
•HNC, San Diego, has a DTRA contract to build a software tool to determine
sequences common to bacterial pathogens, allowing for identification of probes
and primers: “BugID.”
•HNC has been tasked to reformat “BugID” for examining virus genomes,
which do have “core” genes, conserved at the amino acid but not necessarily at
the nucleotide level. One preliminary exercise is to develop software to
identify essential and related proteins.
•“CoreGenes” from GMU already performs this function. It presents a table of
“core” and presumably essential genes from families of organisms.
•“CoreGenes” is under continued development. One feature is to present tables
of related, slightly related and unrelated genes.
•This has value in identifying probes and primers for assays such as
microarrays.
CoreGenes: Chloroplasts analysis
CoreGenes: Mitochondria analysis
CoreGenes as annotation tool
Annotation of human adenoviruses
Revised annotation HAdV genomes
Automated annotation
Transform newly determined DNA sequence into linear array:
•Input: DNA sequence.
•Discovery: ORFs analysis.
•Discovery: “Gene finder” analyses, e.g., GRAIL, etc.
•Input: Related genomes.
•Discovery: GeneOrder (pairwise); CoreGenes- collect “gaps,” catalog
and re-analyze “gaps” as above.
•Discovery: BLAST (tBLASTx, BLASTP, Advanced BLAST, PSI-BLAST, etc.).
•Input: “Loose” genes, proprietary genes.
•Discovery: Annot. with protein domain, features, pattern etc. dbs.
•Process: Merge newly generated databases.
•Ordering: Order genes with respect to genomic locations.
•Output: Linear array of genes; GeneOrder plots (closest pairs);
CoreGenes genomes table; “loose” genes table; “spliced” genes table.
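The merge and ordering steps can be sketched as follows. This is a minimal illustration under stated assumptions: each discovery step is taken to produce gene records with start and end coordinates, records with identical coordinates are treated as the same gene, and the `Gene` class, function name and example records are all hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Gene:
    name: str
    start: int
    end: int
    source: str  # which discovery step reported the gene


def linear_array(*gene_sets):
    """Merge gene lists from several steps and order them by genomic location.

    Assumption: records with identical (start, end) describe the same gene,
    and the first source to report it is kept.
    """
    merged = {}
    for genes in gene_sets:
        for gene in genes:
            merged.setdefault((gene.start, gene.end), gene)
    return sorted(merged.values(), key=lambda gene: (gene.start, gene.end))


# Placeholder records for illustration only.
orf_scan = [Gene("orf_2", 900, 1500, "ORF analysis"), Gene("orf_1", 100, 400, "ORF analysis")]
blast_hits = [Gene("hit_x", 900, 1500, "BLASTP"), Gene("hit_y", 2000, 2600, "BLASTP")]

for gene in linear_array(orf_scan, blast_hits):
    print(gene.start, gene.end, gene.name, gene.source)
```

Keeping the `source` field on each record makes it possible to build the separate output tables afterwards by filtering the same ordered list.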