DNA SEQUENCING TECHNIQUES AND TYPES OF SEQUENCING

hemantshiv1985 9 views 92 slides Aug 31, 2025


DNA sequencing
“the technology lecture”
•Part 1: Chemistry, instrumentation and data
analysis
•Part 2: Large-scale operations, comparative
sequencing.
•Part 3: Sequencing analysis, variation
analysis.
•Abbrev 02/03/05, rev 10/17/05

Technology: Importance
•Approaches to research
–hypothesis driven
–discovery driven
–technology development
•Technology drives research drives technology
–paradigm changing
–punctuated versus gradual
•Vision
–personalized medicine

DNA sequencing: Importance
•Basic blueprint for life; Aesthetics.
•Gene and protein
–Function
–Structure
–Evolution
•Genome-based diseases- “inborn errors of metabolism”
–Genetic disorders
–Genetic predispositions to infection
–Diagnostics
–Therapies

DNA sequencing methodologies:
ca. 1977!
•Maxam-Gilbert
–base modification by general and specific chemicals.
–depurination or depyrimidination.
–single-strand excision.
–not amenable to automation
•Sanger
–DNA replication.
–substitution of substrate with chain-terminator chemical.
–more efficient
–automation??

Maxam-Gilbert ‘chemical’ method

versus “bio” based methods
•Sanger
•dideoxynucleotides

DNA chemistry

DNA biochemistry: replication
fork

DNA replication: biochemistry
[Figure: nucleoside triphosphate (purine or pyrimidine base, three phosphate groups, 3’-OH) joining a growing strand, 5’ to 3’.]

now, DNA “sequencing:” Sanger
dideoxy method I
[Figure: dideoxyribonucleoside triphosphate (ddNTP): purine or pyrimidine base, three phosphate groups, and H in place of the 3’-OH.]

DNA sequencing: Sanger II
[Figure: chain termination method: the incorporated dideoxynucleotide carries H instead of a 3’-OH, so the chain cannot be extended.]

DNA sequencing: chemistry

DNA sequencing: in practice
template + polymerase + primer, in four reactions:
1. dCTP, dTTP, dGTP, dATP + ddATP
2. dCTP, dTTP, dGTP, dATP + ddGTP
3. dCTP, dTTP, dGTP, dATP + ddTTP
4. dCTP, dTTP, dGTP, dATP + ddCTP
then extension and electrophoresis.
Base pairs shown, in order: A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T
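The four-reaction scheme on this slide can be sketched in a few lines. This is a toy model, not a description of any instrument: it assumes the right-hand bases of the listed pairs are the template in the order the polymerase reads them, that every possible termination product is present, and that fragment length is counted from the end of the primer. The function names are made up for illustration.

```python
# Toy model of four-lane dideoxy sequencing (illustrative assumptions only).
COMPLEMENT = {"A": "T", "C": "G", "G": "C", "T": "A"}

def sanger_fragments(template):
    """Return, per ddNTP lane, the lengths of the terminated fragments.

    `template` is given in the order the polymerase reads it, so the
    newly made strand is simply its base-by-base complement.
    """
    new_strand = "".join(COMPLEMENT[base] for base in template)
    lanes = {base: [] for base in "ACGT"}
    for length, base in enumerate(new_strand, start=1):
        # A chain ending in ddA can only stop where the new strand has an A.
        lanes[base].append(length)
    return lanes

def read_gel(lanes):
    """Read the gel from shortest to longest fragment across the four lanes."""
    lane_of_length = {n: base for base, lengths in lanes.items() for n in lengths}
    return "".join(lane_of_length[n] for n in sorted(lane_of_length))

template = "TCTAGACCTCAAGACT"  # right-hand bases of the pairs on the slide
lanes = sanger_fragments(template)
print(lanes["A"])       # fragment lengths in the ddATP lane
print(read_gel(lanes))  # sequence of the newly synthesized strand
```

Reading the lanes from the shortest fragment upward gives the new strand, which is the complement of the template.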

Manual radioactive sequencing

Semi-automated fluorescent
DNA sequencing
•Fred Sanger et al., 1977.
•Walter Gilbert et al., 1977.
•Leroy Hood et al., 1986.
•Applied Biosystems, Inc.
•DuPont Company

DNA sequencing: upgrade,
second iteration, terminator-label
•Disadvantages of primer-labels:
–four reactions
–tedious
–limited to certain regions, custom oligos or
–limited to cloned inserts behind ‘universal’
priming sites.
•Advantages:
•Solution:
–fluorescent dye terminators

DNA sequencing: chemistry
template + polymerase + dCTP, dTTP, dGTP, dATP + ddATP, ddGTP, ddTTP, ddCTP, in a single reaction,
then extension and electrophoresis.
Base pairs shown, in order: A•T G•C A•T T•A C•G T•A G•C G•C A•T G•C T•A T•A C•G T•A G•C A•T

DNA sequencing: photochemistry

ABI series: 370, 373 and 377
•semi-automated
•“best” pre- and post-
•higher throughput
operations.
•bioinformatics
limitations, ‘scuze me-
“opportunities.’”

ABI 370s-series screen dump

Bioinformatics part one: pixel
refinement

ABI 377 envelope: 96 lanes

genome sequencing strategies
•Shotgun
•Directed primer walks
•Modified directed primer walks

Sequencing strategies
Whole genome
Also on a smaller scale: 1. “Island walking” and 2. Primer walking.

Rapid re-sequencing of human Ad1: Time trial.
Have sequence of Ad 1.
In theory, have a minimally tiled set of PCR primers to cover entire 36,001 base genome.
In theory, have a minimally tiled set of sequencing primers as well.
Want draft sequence in a minimal time, including primer delivery from a vendor.
In practice design two parallel sets of minimally tiled PCR primers and amplify two sets.
In practice, assume 750 base reads--> 48 primers, one direction.
Compare with consensus: Determine accuracy, timing and evaluate operation.
Genome: 1 to 36,001
Tiled fragments (start to end):
115 to 7,315
7,300 to 14,500
14,400 to 21,600
21,500 to 28,700
28,600 to 35,885
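The arithmetic behind the tiled design can be checked with a short script. This is a sketch under assumptions: the five coordinate pairs are taken to be the PCR fragments, the read count is rounded to the nearest whole read to match the 48 quoted on the slide, and the function names are illustrative.

```python
# Check the tiling numbers quoted on the slide (assumed interpretation).
GENOME_LENGTH = 36_001
READ_LENGTH = 750
FRAGMENTS = [
    (115, 7_315),
    (7_300, 14_500),
    (14_400, 21_600),
    (21_500, 28_700),
    (28_600, 35_885),
]

def reads_needed(genome_length, read_length):
    """Number of one-direction reads, rounded to the nearest whole read."""
    return round(genome_length / read_length)

def overlaps(fragments):
    """End of each fragment minus the start of the next one."""
    return [prev_end - next_start
            for (_, prev_end), (next_start, _) in zip(fragments, fragments[1:])]

def is_tiled(fragments):
    """True when every fragment reaches at least to the start of the next."""
    return all(gap >= 0 for gap in overlaps(fragments))

print(reads_needed(GENOME_LENGTH, READ_LENGTH))
print(overlaps(FRAGMENTS))
print(is_tiled(FRAGMENTS))
```

With these coordinates the fragments overlap by 15 to 100 bases, and bases 1 to 114 and 35,886 to 36,001 lie outside the amplified region.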

Custom primer walks and “island” hopping
•Have scaffold of generic genome: related or compiled.
•Have archived “islands of sequences” (lg, med, sm)- from other research interests.
•Generate “in-bound” primers to re-sequence equivalents and known features, e.g., 3’-ITR.
•Use custom “out-bound” primers to walk across “inter-island” sequences (PCR and sequencing).
•Collect “1st +” draft genomic sequence as round 1.
•Iterative walks to complete “2+1” consensus, with error rate 1/10,000 bases.

Target: HAdV4
•For 36,000 bases, need 90 primers for 1x coverage (1st draft) and 270 primers for 3x coverage (finished).
•Have from GenBank: 10 “islands” @ 30%= 10,883 bases,
–calling for 27x2= 54 primers for complementing coverage.
•Theory (if continuous sequence): 36,000-10,883= 25,117 bases.
–At 400 bases per read, need 63 primers for 1x coverage, or 126 for complementing
coverage.
•Practice: 10 “islands” @ 30%= 10,883 bases, 80 primers.
•Example: “Island 1” is 149 bases.
–1 fragment at 400 bases/read.
–2 primers for 1x coverage.
–“Terminal island,” need only 1 “outbound” primer.
–Total of (1x2)+1= 3 primers.
•Example: “Island 2” is 2042 bases.
–5 fragments at 400 bases/read.
–“Internal island,” need 2 “outbound” primers.
–Total of (5x2)+2= 12 primers.
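The primer counts above follow from simple arithmetic, sketched here. The rounding rule is an assumption chosen to reproduce the slide's numbers: fragments are counted to the nearest whole read with a minimum of one, since the slide counts 5 fragments for a 2,042-base island. The function names are illustrative.

```python
# Primer-count arithmetic for "island" walking (rounding rule is assumed).
READ_LENGTH = 400

def fragments(length, read_length=READ_LENGTH):
    """Number of read-length fragments, nearest whole number, at least one."""
    return max(1, round(length / read_length))

def island_primers(length, terminal):
    """Two primers per fragment, plus outbound primers.

    A terminal island needs one outbound primer, an internal island two.
    """
    outbound = 1 if terminal else 2
    return fragments(length) * 2 + outbound

print(fragments(36_000))                       # 1x coverage of the genome
print(3 * fragments(36_000))                   # 3x coverage
print(fragments(36_000 - 10_883))              # gap sequence, 1x coverage
print(island_primers(149, terminal=True))      # "Island 1"
print(island_primers(2_042, terminal=False))   # "Island 2"
```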

Definition of tiled set of PCR primers: Data.
[Figure: primers A to H along the genome; PCR fragments “B,” “C,” “D” and “E.”]

DNA sequencing: Computation
•Input from sequencer
–peak intensities
•Output to user
–DNA sequence
1. normalize intensities
2. apply mobility corrections
3. predict bands
4. call bases
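A minimal sketch of steps 1, 3 and 4, assuming four dye channels sampled at the same scan points. It is not the algorithm of any commercial base caller: mobility correction (step 2) is left out, peaks are taken as simple local maxima, and the threshold and the toy trace values are invented for illustration.

```python
# Toy base caller: normalize, find peaks, call the strongest channel.
def normalize(traces):
    """Scale each dye channel so that its largest value is 1.0."""
    scaled = {}
    for base, values in traces.items():
        top = max(values)
        scaled[base] = [v / top if top > 0 else 0.0 for v in values]
    return scaled

def call_bases(traces, threshold=0.3):
    """Call one base at every local maximum of the strongest channel."""
    norm = normalize(traces)
    n_points = len(next(iter(norm.values())))
    # Strongest channel and its height at each scan point.
    best = [max(norm, key=lambda b: norm[b][i]) for i in range(n_points)]
    height = [norm[best[i]][i] for i in range(n_points)]
    calls = []
    for i in range(1, n_points - 1):
        is_peak = height[i] > height[i - 1] and height[i] >= height[i + 1]
        if is_peak and height[i] >= threshold:
            calls.append(best[i])
    return "".join(calls)

# Hypothetical raw intensities; each channel has a different overall gain.
traces = {
    "A": [0, 10, 0, 0, 0, 0, 0, 0, 0],
    "C": [0, 0, 0, 8, 0, 0, 0, 0, 0],
    "G": [0, 0, 0, 0, 0, 5, 0, 0, 0],
    "T": [0, 0, 0, 0, 0, 0, 0, 20, 0],
}
print(call_bases(traces))
```

Normalizing first matters because the dyes differ in brightness; without it the weakest dye would tend to be under-called.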

DNA sequencing: Computation

DNA sequencing: Computation

Sequence assembly:
“Sequencher”

Applications DNA sequencing
•Whole genome analysis
•Comparative genomics
•Applications to subfields

DNA sequencing
•Higher throughput

DNA sequencing: Equipment

DNA sequencing technology
•Manual.
•ABI 370s series.
•DuPont “Genesis.”
•Capillary array: Hitachi, ABI, Amersham...
•Ultrathin horizontal: GeneSys Tech.
(MJResearch), Whitehead Inst., E. Yeung.
•Thin channel.
•“ABI” 310, 3100, 3710……. (2002)

Capillary electrophoresis

Multi-capillary array

Cap array screen dump

Shimadzu, Ltd.
•NEW ORLEANS, March 19, 2002. PittCon.
•Faster and more economical DNA Sequencer.
•10 times faster and 90 percent cheaper to run than current state-of-the-art.
•GenoMEMS, MA spinoff that has developed a microfabrication technology, based on Whitehead Inst. technology.
•Microelectromechanical system, or MEMS, technology: microfabricated electrical and mechanical components.
•Five million bases per day.
•Readlengths of 800 bases.

Target 2003.
•TODAY (2005) Solexa, 454 etc. $1,000 genome- $100,000
genome

Other considerations: automation

Bioinformatics issues in comparative DNA
sequencing

Done! Now what? ex., Ad 1 assembly:
Consensus

Genome characterization
•Align DNA sequence with archived sequences.
•Annotate DNA features, e.g., RE sites, GC sites, replication and
transcription factor binding sites.
•Annotate ORFs.
•Annotate genes and proteins.
•Phylogenetic analyses of genes.
•Whole genome comparisons.
•Phylogenetic analyses of genomes.
•Identify cellular homologues or “ancient history”
-horizontal transfer.

Genome Sequence
Annotation.
•Annotation flowchart.
•Summary of findings.
•Comparison of genome sequences.

From the sequencing projects:
Biological features in sequence ?
ATG TAG
TAA
TGA
GT AG
PROMOTER
POLY A SIGNAL
EXON
INTRON
EXON
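The start (ATG) and stop (TAG, TAA, TGA) signals on this slide are enough for a simple open reading frame scan. This is a sketch only, not how GLIMMER or Artemis work: it ignores splicing (GT...AG), promoters and poly A signals, and the function names and the minimum length are illustrative.

```python
# Naive six-frame ORF scan using only start and stop codons.
STOPS = {"TAA", "TAG", "TGA"}
COMPLEMENT = str.maketrans("ACGT", "TGCA")

def reverse_complement(seq):
    return seq.translate(COMPLEMENT)[::-1]

def orfs_in_strand(seq, min_codons=2):
    """Return (start, end) of ATG...stop stretches in the three frames."""
    found = []
    for frame in range(3):
        start = None
        for i in range(frame, len(seq) - 2, 3):
            codon = seq[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOPS:
                if (i + 3 - start) // 3 >= min_codons:
                    found.append((start, i + 3))
                start = None
    return found

def six_frame_orfs(seq, min_codons=2):
    """Scan both strands; '-' coordinates refer to the reverse complement."""
    seq = seq.upper()
    hits = [("+", s, e, seq[s:e]) for s, e in orfs_in_strand(seq, min_codons)]
    rc = reverse_complement(seq)
    hits += [("-", s, e, rc[s:e]) for s, e in orfs_in_strand(rc, min_codons)]
    return hits

print(six_frame_orfs("CCATGAAATAGCC"))
```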

Genome sequence annotation:
(M. Zorn, Berkeley, 2002)
• Extraction, definition and interpretation of features in the genome
sequence by integrating computational tools and biological knowledge.
• “Proofread” the sequence: correct miscalls. Sequence data needs to be
“cleaned up” for chip design.

Adenoviruses:
Non-enveloped icosahedral viruses.
Multiply in the host nucleus.
Linear double-stranded DNA genome, 26-45 kbp in size.
Infect most vertebrates from fish to humans.
Human adenoviruses: genus Mastadenovirus.
51 human serotypes divided into six sub-genera (Group A-F).
–HAdB1: Ads 3, 7, 16, 21. (respiratory infections)
–HAdB2: Ads 11, 14, 34, 35, 50. (kidney and UT infections, except 11a and 14)
–HAdE: Ad 4. (respiratory infections)

From Stone et al, 2003.
Early
Intermediate
Late
Transcription units

Gene annotation of adenovirus genome: Basic
GLIMMER2
Artemis
RBSFinder
ORFs
Start/Stop Codon
Verification
Translated frames
GenBank
Non-redundant
Protein databases
Sequence
Alignments
BLASTP
GENES: name, CDS (Splice sites), MW
Refined ORFs
Artemis: six frames
translation
CLUSTALW

Advanced: Detailed annotation of genes
GenBank: E4 Superfamily: regions 1 and 2
Join: region 1, region2:117~306 nt in between
GenBank E4 Superfamily:
17 KD, 20 KD, 24KD, 27 KD
CLUSTALW Artemis:six frame translation
Annotated Human type 1 adenovirus E4 genes:
Spliced from 2 to 3 exons

5’
E4 27K
E4 20K
E4 17K
5’
5’

MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP
V
Had1 27K
1st exon
ALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQS
Had1 27K
2nd exon
VGIAYLLLRQRPALPYWRIIRCCPNVTL
Had1 27K
3rd exon
MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP
VALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQSVGIAYLLLRQRPALPYWRIIRCC
PNVTL
Had2 27K
MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP
VVRQASNV#MFFFVILFCV#CRNPQTCLREKWCLFLWWFRNLPAFICMSMTTMCLLFCARLCLIF*AAPCILYRRPCNKLT+GLRWLA+LRVCVS#SVWVLLSWFLAGKWPRWSVQTCTI
MFSWPCEGTYGIAVFLLMFRF*ILYRSVRNLNFCNHDSLLEAEGGGRSGADFYNGRT#YSGFA+RHIDKVAR*KLFGHG*RCWNVYRGDSP*RV+PLRPLGREGSLPFGSHCATSYKCHY
LFFGCRV*PRHRRGARSLNRSSF*GFG#SFGIKKKKTWFFQLFPLLPCVTRRTNV+VGWVWLILRWWMLSGQRRMKEFT+NPKPGGAWML*ESGYTTTTTQSELSDETGDADLFVTPAPG
FASGNMTTSGVPFGMTLRPTRSRLSRRTPYSRDRLPPFETETRATILEDHPLLPECNTLTMHNVSYVRGLPCSVGFTLIQEWVVPWDMVLTREELVILRKCMHVCLCCANIDIMTSMMIH
GYESWALHCHCSSPGSLQCIAGGQVLASWFRMVVDGAMFNQRFIWYREVVNYNMPKEVMFMSSVFMRGRHLIYLRLWYDGHVGSVVPAMSFGYSALHCGILNNIVVLCCSYCADLSEIRV
RCCARRTRRLMLRAVRIIAEETTAMLYSCRTERRRQQFIRALLQHHRPILMHDYDSTPM
Had1
ORF
MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP
VALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQSVGIAYLLLRQRPALPYWRIIRCC
PNVTL
Had2_27K
MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP
V
Had2 27K
1st exon
Had2
ORFs
ALPDFLSSTLHFISPPMQQAYIGATLVSIAPSMRVIISVGSFVMVPGGEVAALVRADLHDYVQLALRRDLRDRGIFVNVPLLNLIQVCEEPEFLQS
Had2 27K
2nd exon
VGIAYLLLRQRPALPYWRIIRCCPNVTL
Had2 27K
3rd exon
MAAAVEALYVVLEREGAILPRQEGFSGVYVFFSPINFVIPPMGAVMLSLRLRVCIPPGYFGRFLALTDVNQPDVFTESYIMTPDMTEELSVVLFNHGDQFFYGHAGMAVVRLMLIRVVFP
VVRQASNV#MFFFVILFCV#CRNPQTCLREKWCLFLWWFRNLPAFICMSMTTMCLLFCARLCLIF*AAPCILYRRPCNKLT+GLRWLA+LRVCVS#SVWVLLSWFLAGKWPRWSVQTCTI
MFSWPCEGTYGIAVFLLMFRF*ILYRSVRNLNFCNHDSLLEAEGGGRSGADFYNGRT#YSGFA+RHIDKVAR*KLFGHG*RCWNVYRGDSP*RV+PLRPLGREGSLPFGSHCATSYKCHY
LFFGCRV*PRHRRGARSLNRSSF*GFG#SFGIKKKKTWFFQLFPLLPCVTRRTNV+VGWVWLILRWWMLSGQRRMKEFT+NPKPGGAWML*ESGYTTTTTQSELSDETGDADLFVTPAPG
FASGNMTTSGVPFGMTLRPTRSRLSRRTPYSRDRLPPFETETRATILEDHPLLPECNTLTMHNVSYVRGLPCSVGFTLIQEWVVPWDMVLTREELVILRKCMHVCLCCANIDIMTSMMIH
GYESWALHCHCSSPGSLQCIAGGQVLASWFRMVVDGAMFNQRFIWYREVVNYNMPKEVMFMSSVFMRGRHLIYLRLWYDGHVGSVVPAMSFGYSALHCGILNNIVVLCCSYCADLSEIRV
RCCARRTRRLMLRAVRIIAEETTAMLYSCRTERRRQQFIRALLQHHRPILMHDYDSTPM
[Figure: alignment of the Had1 and Had2 27K protein sequences listed above; the two rows are identical as extracted.]

Two annotation approaches to HAdV1
•Based on Ad 2 annotation
•Generic annotation plus advanced

HAdV1 genes

Phylogenetic analysis of genes

Global tools for whole genome analyses
•Databases and data streams “readily” available.
•Data mining opportunities: “added value.”
•Limited tools in tool set, especially whole genome comparisons:
MAP, GeneOrder and CoreGenes.
•Non-available or non-optimal tools: Automated annotation, etc.
•These whole genome analysis tools have value for the EOS project, in
particular the PCR-based assays and the microarray “re-sequencing”
assays.

Genome analysis: Continuation.

GCG SeqWeb Compare: Adenovirus genomes

FLAG: Fast Local Alignment for Gigabases

FLAG Ad 1 vs 2 vs 5
•Ad 1 vs Ad 2
•Ad 1 vs Ad 5
•Ad 2 vs Ad 5

GeneOrder flowchart
1. Get GenBank file from NCBI website.
2. Remove unnecessary information and save.
3. Convert to FASTA format.
4. Convert to database format for BLASTP.
5. Problem during process? Yes: error message, stop. No: continue.
6. Break query file into single queries.
7. Save each query in a temporary file.
8. BLASTP against database.
9. Get BLASTP results based on selected ranges.
10. Extract and print table/graph.
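Two of the flowchart steps, breaking the query file into single queries and keeping BLASTP results within a selected range, can be sketched without running BLAST itself. This is an illustration, not GeneOrder's code: the hit tuples stand in for parsed BLASTP output, and the scores, gene numbers and function names are hypothetical.

```python
# Sketch of "break query file into single queries" and "select ranges".
def split_fasta(text):
    """Split multi-FASTA text into (header, sequence) records."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        else:
            if header is None:
                raise ValueError("sequence data found before any header line")
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records

def select_hits(hits, low, high):
    """Keep (query_gene, subject_gene) pairs whose score lies in [low, high]."""
    return [(q, s) for q, s, score in hits if low <= score <= high]

queries = split_fasta(">gene1 example\nMAA\nAV\n>gene2\nMKT\n")
print(queries)

# Hypothetical parsed hits: (query gene number, subject gene number, score).
hits = [(1, 1, 250.0), (2, 3, 40.0), (3, 2, 120.0)]
print(select_hits(hits, 100, 1000))
```

The selected pairs are the points that would be plotted, one axis per genome.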

GeneOrder analysis: Example
•Manually plot with MS-Excel.
•Each point is a coding gene.
•Co-linear arrangements suggest
synteny.
•Several regions of genomic
rearrangement events within the
genomes of the two chloroplasts.
•Rearrangements include flipping
of entire set of genes.
•Two versions have been
developed: GO1 and 2.
•Ongoing work include recoding
for megabase genomes, which
have additional value.
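The reading of such a plot, co-linear runs versus flipped sets of genes, can be sketched as a grouping of consecutive points. This is an assumption about how one might automate the visual inspection, not GeneOrder's method; the points are invented and only strictly adjacent genes are joined into a run.

```python
# Group gene-order points into co-linear (+1) and inverted (-1) runs.
def order_blocks(points):
    """points: (gene number in genome 1, gene number in genome 2).

    Returns a list of (direction, run), where direction is +1 for a
    co-linear run, -1 for an inverted run and 0 for an isolated point.
    """
    pts = sorted(points)
    if not pts:
        return []
    blocks = []
    run, direction = [pts[0]], 0
    for prev, cur in zip(pts, pts[1:]):
        step = cur[1] - prev[1]
        d = 1 if step == 1 else -1 if step == -1 else 0
        adjacent = cur[0] - prev[0] == 1
        if d != 0 and adjacent and direction in (0, d):
            run.append(cur)
            direction = d
        else:
            blocks.append((direction, run))
            run, direction = [cur], 0
    blocks.append((direction, run))
    return blocks

# Hypothetical points: genes 1-3 co-linear, genes 4-6 flipped as a set.
points = [(1, 1), (2, 2), (3, 3), (4, 6), (5, 5), (6, 4)]
for direction, run in order_blocks(points):
    print(direction, run)
```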

GeneOrder2.0 analysis

Link to GenBank database

Poxvirus genomes: Gene order1.0 analysis I
[Figure: three gene-order plots, each with an axis labeled Vaccinia.]

Poxvirus genomes: Gene order2.0 analysis II
[Figure: three gene-order plots; axes labeled Vaccinia (two panels) and MsEPV (one panel).]

GeneOrder2.0 analysis: AmEPV v MsEPV

Conserved genes of poxviruses
1. GeneOrder identifies similar genes in two genomes.
2. Organize common genes in five genomes (genera) as “Alphabet.”
3. Add other genes based on additional information.
4. Use Advanced BLAST to check the Alphabet.
5. Use PSI-BLAST with several iterations to check the Alphabet.
6. Scan entire NCBI protein database using conserved profiles to ensure that all the conserved proteins have been extracted.
7. Compare Alphabet with experimental TS mutant data to determine the essential genes for poxviruses.

Orthologous gene locator
•Develop software tool to characterize genomes globally.
•Characterize genomes by identifying orthologous genes.
•Identify paralogs.
•Characterize unknown genes by identifying orthologs.
•Rapid automated comparisons of genomes.
•Identify “alphabet” of essential genes.
•“CoreGenes.”
•In general, a high BLAST score does not by itself mean two genes are orthologous or homologous.
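The idea of an “alphabet” of genes shared by every genome can be sketched as a set intersection against a reference genome. This is an assumption about the general approach, not the CoreGenes implementation: the scores and the threshold are invented, and a score cutoff alone does not establish orthology.

```python
# Sketch: genes of a reference genome that have a match in every other genome.
def core_genes(reference_genes, hits_by_genome, min_score=75.0):
    """hits_by_genome maps a genome name to (reference gene, other gene, score)."""
    core = set(reference_genes)
    for hits in hits_by_genome.values():
        matched = {ref for ref, _other, score in hits if score >= min_score}
        core &= matched  # keep only genes matched in this genome as well
    return sorted(core)

# Hypothetical scores; gene names follow the Vaccinia, MCV and FPV naming style.
reference = ["A10L", "A11R", "A16L"]
hits_by_genome = {
    "MCV": [("A10L", "MC113L", 300.0), ("A11R", "MC114R", 90.0), ("A16L", "MC016L", 40.0)],
    "FPV": [("A10L", "FPV174", 280.0), ("A11R", "FPV175", 80.0)],
}
print(core_genes(reference, hits_by_genome))
```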

CoreGenes analysis of poxviruses

Poxvirus alphabet: Compilation of “core”
Protein function | Orthopoxvirus (Vaccinia) | MCV | Fibroma | FPV | MsEPV
1. Major core protein, p4a | A10L 335477 | MC113L 1492056 | gp099L 6578628 | FPV174 7271672 | MSV152 4049715
2. Unknown; TM-C? (Senkevich et al. 1997) | A11R 335481 | MC114R 1492057 | gp100R 6578629 | FPV175 7271673 | MSV151 4049824
3. Potential membrane protein; TM-C, S-S (Senkevich et al. 1997) | A16L 335373 | MC016L 1491959 | gp019L 6578549 | FPV112 7271610 | MSV090 4049680
4. Late transcription factor; VLTF-2 | A1L 335464 | MC103L 1492046 | gp089L 6578619 | FPV049 7271547 | MSV187 4049734
5. NGL-C, RNA helicase (Senkevich et al. 1997) | A18R 335488 | MC123R 1492066 | gp108R 6578637 | FPV183 7271681 | MSV148 4049826

Conserved genes of poxviruses
•Transcription/RNA modification: 34%
•DNA replication/repair: 10%
•Structural: 24%
•Other enzymes: 12%
•Unknown: 20%

Applications of CoreGenes to EOS Affy chip
design
•HNC, San Diego, has a DTRA contract to build a software tool to determine
sequences common to bacterial pathogens, allowing for identification of probes
and primers: “BugID.”
•HNC has been tasked to reformat “BugID” for examining virus genomes,
which do have “core” genes, conserved at the amino acid but not necessarily at
the nucleotide level. One preliminary exercise is to develop software to
identify essential and related proteins.
•“CoreGenes” from GMU already performs this function. It presents a table of
“core” and presumably essential genes from families of organisms.
•“CoreGenes” is under continued development. One feature is to present tables
of related, slightly related and unrelated genes.
•This has value in identifying probes and primers for assays such as
microarrays.

CoreGenes: Chloroplasts analysis

CoreGenes: Mitochondria analysis

CoreGenes as annotation tool

Annotation of human adenoviruses

Revised annotation HAdV genomes

Automated annotation
Transform newly determined DNA sequence into linear array:
•Input: DNA sequence.
•Discovery: ORFs analysis.
•Discovery: “Gene finder” analyses, e. g., GRAIL, etc.
•Input: Related genomes.
•Discovery: GeneOrder (pairwise); CoreGenes- collect “gaps,” catalog
and re-analyze “gaps” as above.
•Discovery: BLAST- tBLASTx, BLASTP, Advanced BLAST, Psi BLAST,
etc.
•Input: “Loose” genes, proprietary genes.
•Discovery: Annot. with protein domain, features, pattern etc. dbs.
•Process: Merge newly generated databases.
•Ordering: Order genes with respect to genomic locations.
•Output: Linear array of genes; GeneOrder plots (closest pairs);
CoreGenes genomes table; “loose” genes table; “spliced” genes table.