Basic Local Alignment Search Tool - PPTs

JMuthukumaran 33 views 49 slides Jul 09, 2024

About This Presentation

BLAST


Slide Content

BLAST, Mega BLAST, MUMmer and
AVID

What is BLAST?
Basic Local Alignment Search Tool
It allows rapid sequence comparison of a query sequence against a database.
The BLAST algorithm is fast, accurate, and web-accessible.
Developed in 1990, with a gapped version released in 1997 (S. Altschul et al.)

Why use BLAST?
BLAST searching is fundamental to understanding the
relatedness of any favorite query sequence to other
known proteins or DNA sequences.
Applications include:
Identifying orthologs and paralogs
Discovering new genes or proteins
Discovering variants of genes or proteins
Investigating expressed sequence tags (ESTs)
Exploring protein structure and function

Four Essential Components of BLAST
(1) Choose the sequence (query)
(2) Select the BLAST program
(3) Choose the database to search
(4) Choose optional parameters
Then click “BLAST”
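
The same four choices apply when BLAST is run from the command line instead of the web form. The sketch below only assembles a BLAST+ argument list and does not run anything. It assumes a local BLAST+ installation; the file names and the database name are hypothetical placeholders, and the E-value threshold and tabular output format stand in for the "optional parameters".

```python
def build_blast_command(query_path, program, database,
                        evalue=1e-5, out_path="results.tsv"):
    """Assemble a BLAST+ command from the four choices: query, program,
    database, and optional parameters. Nothing is executed here."""
    allowed = {"blastn", "blastp", "blastx", "tblastn", "tblastx"}
    if program not in allowed:
        raise ValueError(f"unknown BLAST program: {program}")
    return [
        program,                  # (2) the BLAST program
        "-query", query_path,     # (1) the query sequence file
        "-db", database,          # (3) the database to search
        "-evalue", str(evalue),   # (4) optional: E-value threshold
        "-outfmt", "6",           # (4) optional: tabular output
        "-out", out_path,
    ]


# Hypothetical file and database names, for illustration only.
cmd = build_blast_command("query.fasta", "blastp", "swissprot")
print(" ".join(cmd))
```

The resulting list could be passed to subprocess.run on a machine where BLAST+ and the named database are available.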

BLAST ACCESS
NCBI BLAST
http://www.ncbi.nlm.nih.gov/BLAST/
Canadian Bioinformatics Resource BLAST
http://cbr-rbc.nrc-cnrc.gc.ca/blast/
European Bioinformatics Institute BLAST
http://www.ebi.ac.uk/blastall/
http://www.ebi.ac.uk/blast2/

NCBI BLAST

Canadian Bioinformatics Resource BLAST

European Bioinformatics Institute BLAST

Input Format
Input sequences are in:
FASTA format or
GenBank format.
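
As an illustration of the FASTA layout (a header line starting with ">" followed by one or more sequence lines), here is a minimal parser sketch. The record names and the description text are made up for the example; the two sequences are the G1 and G2 strings used later in this deck.

```python
def parse_fasta(text):
    """Return {record_id: sequence} from FASTA-formatted text.

    The record id is the first word after '>'; sequence lines are joined.
    """
    records = {}
    current = None
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        if line.startswith(">"):
            fields = line[1:].split()
            current = fields[0] if fields else ""
            records[current] = []
        elif current is not None:
            records[current].append(line)
    return {name: "".join(parts) for name, parts in records.items()}


example = """>seq1 hypothetical example record
ACTGATTACGTG
AACTGGATCCA
>seq2
ACTCTAGGTGAAGTGATCCA
"""
sequences = parse_fasta(example)
```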

1. Nucleotide BLAST
BLASTN - Nucleotide BLAST
Search a nucleotide database using a nucleotide query

Input Page

2. Protein BLAST
BLASTP - Protein BLAST
Search a protein database using a protein query

Input Page

3. BLASTX - Translated BLAST
Search a protein database using a translated nucleotide query

Input Page

4. tBLASTN
Search a translated nucleotide database using a protein query

Input Page

5. tBLASTX
Search a translated nucleotide database using a translated nucleotide query
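
The five programs differ only in the type of the query, the type of the database, and which side is translated. A small lookup sketch makes that explicit; the function name and the translate_both flag are hypothetical helpers for this example, not part of any BLAST interface.

```python
def choose_blast_program(query_type, database_type, translate_both=False):
    """Pick one of the five basic BLAST programs.

    query_type / database_type: "nucleotide" or "protein".
    translate_both: for nucleotide vs nucleotide, compare the translations
    of both sides (tBLASTX) instead of the raw nucleotides (BLASTN).
    """
    pair = (query_type, database_type)
    if pair == ("protein", "protein"):
        return "blastp"
    if pair == ("nucleotide", "protein"):
        return "blastx"      # query is translated
    if pair == ("protein", "nucleotide"):
        return "tblastn"     # database is translated
    if pair == ("nucleotide", "nucleotide"):
        return "tblastx" if translate_both else "blastn"
    raise ValueError(f"unsupported combination: {pair}")


print(choose_blast_program("nucleotide", "protein"))
```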

Input Page

What is Genomic BLAST?
Genomic BLAST is a novel graphical tool for simplifying BLAST searches against complete and unfinished genome sequences.
This tool allows the user to compare the query sequence against a virtual database of DNA and/or protein sequences from a selected group of organisms with finished or unfinished genomes.

What is Genomic BLAST?
Currently provides access to over 170 bacterial and archaeal genomes and over 40 eukaryotic genomes.

Specialized BLAST
Primer BLAST - Finding primers
CDS BLAST - Conserved domain sequences
GEO BLAST - Gene expression data
IgBLAST - Immunoglobulin sequences
SNP BLAST - Single nucleotide polymorphisms, etc.

Primer BLAST - Input Page

CDS BLAST - Input Page

GEO BLAST - Input Page

Ig BLAST - Input Page

SNP BLAST - Input Page

Advanced BLAST
PSI-BLAST - Position-Specific Iterated BLAST
PHI-BLAST - Pattern-Hit Initiated BLAST
These two BLAST programs are called “Iterative BLAST”

PSI BLAST
Position-Specific Iterated (PSI)-BLAST is the most sensitive BLAST program, making it useful for finding very distantly related proteins or new members of a protein family.

PHI BLAST
Pattern-Hit Initiated (PHI)-BLAST is designed to search for proteins that contain a pattern specified by the user AND are similar to the query sequence in the vicinity of the pattern.
This dual requirement is intended to reduce the number of database hits that contain the pattern, but are likely to have no true homology to the query.

BLAST2
It utilizes the BLAST algorithm for pairwise DNA-DNA or protein-protein sequence comparison.
It has been very useful for the comparison of homologous genes from complete microbial genomes.
The BLAST 2.0 algorithm generates a gapped alignment by using dynamic programming.

BLAST2
The SEG and DUST programs are used to remove low-complexity regions.
The program is not generally useful for motif-style searching, and aligning megabase-size genomic sequences is not recommended.
The maximum number of characters per sequence that may be accommodated is ~150 kb; the optimal size of the query sequence is about 1 kb.

Output Page

MEGA BLAST
Comparison of large sets of long DNA sequences.
It is much faster than the standard BLASTN.
It uses the greedy algorithm of Webb Miller et al. for nucleotide sequence alignment search.
It is up to ten times faster than more common sequence similarity programs and therefore can be used to quickly compare two large sets of sequences against each other.

Suffix tree
A suffix tree, as the name suggests, is a tree in which every suffix of a string S is represented.
More formally defined, a suffix tree is an automaton which accepts every suffix of a string.

Suffix tree
Example of a suffix tree for the string “ABC”
1, 2 and 3 represent the ends of suffixes starting at positions 1, 2 and 3 respectively. These are the leaf nodes.
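
The “ABC” example can be reproduced with a naive, uncompressed suffix trie. This is a sketch for illustration only: a true suffix tree merges single-child chains and can be built in linear time, whereas this version takes quadratic time and space. It also assumes the "$" character does not occur in the input, since "$" is used here as the leaf marker.

```python
def build_suffix_trie(s):
    """Insert every suffix of s into a trie of nested dicts.

    A leaf is marked with the key "$" holding the 1-based start position
    of the suffix that ends there.
    """
    root = {}
    for start in range(len(s)):
        node = root
        for ch in s[start:]:
            node = node.setdefault(ch, {})
        node["$"] = start + 1
    return root


def is_suffix(trie, candidate):
    """Walk the trie; accept only if the walk ends on a leaf marker."""
    node = trie
    for ch in candidate:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node


trie = build_suffix_trie("ABC")
print(is_suffix(trie, "BC"), is_suffix(trie, "AB"))
```

The second function is the "automaton" view of the structure: it accepts exactly the suffixes of the string and rejects everything else, including prefixes such as “AB”.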

MUMmer - Genome alignment algorithm
Developed by
Dr. Steven Salzberg’s group at TIGR
NAR (1999) 27:2369-2376
NAR (2002) 30:2478-2483
Availability
Free
TIGR (The Institute for Genomic Research) site

Features
The algorithm assumes that sequences are closely related
Can quickly compare millions of bases
Outputs:
Base to base alignment
Highlights the exact matches and differences in the
genomes
Locates
SNPs
Large inserts
Significant repeats
Tandem repeats and reversals

Techniques used in the MUMmer algorithm
Compute suffix trees for every genome
Longest Increasing Subsequence (LIS)
Alignment using the Smith & Waterman algorithm
Integration of these techniques for genome alignment
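
Of the three techniques, Smith & Waterman is the dynamic-programming local alignment step. The sketch below computes only the best local alignment score, without traceback, using a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are assumptions chosen for the example and are not MUMmer's parameters.

```python
def smith_waterman_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (score only).

    Each cell is the maximum of 0 (start a new local alignment), the
    diagonal move (match or mismatch), and the two gap moves.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        cur = [0]
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score = max(0, diag, prev[j] + gap, cur[j - 1] + gap)
            cur.append(score)
            best = max(best, score)
        prev = cur
    return best


print(smith_waterman_score("GATCCA", "GATCCA"))
```

Only two rows of the matrix are kept in memory, which is enough when the score alone is needed.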

Steps
Locating MUMs
Sorting MUMs
Closure with gaps
G1: ACTGATTACGTGAACTGGATCCA
G2: ACTCTAGGTGAAGTGATCCA

ACTGATTACGTGAACTGGATCCA
ACTC--TAGGTGAAGT-GATCCA

What is MUM?
A MUM (maximal unique match) is a subsequence that occurs exactly once in both genomes and is NOT part of any longer matching sequence
The two characters that bound a MUM are always mismatches
GenA:tcgatcGACGATCGCCGCCGTAGATCGAATAACGAGAGAGCATAA cgactta
GenB:gcattaGACGATCGCCGCCGTAGATCGAATAACGAGAGAGCATAA tccagag
Similar to BLAST & FASTA!!
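
The definition can be checked on the GenA/GenB example with a brute-force search. This is a quadratic sketch for short strings only: MUMmer itself locates MUMs with suffix trees, and the min_len threshold here is an assumed parameter for the example.

```python
def count_occurrences(text, pattern):
    """Count occurrences of pattern in text, including overlapping ones."""
    count = 0
    start = text.find(pattern)
    while start != -1:
        count += 1
        start = text.find(pattern, start + 1)
    return count


def find_mums(a, b, min_len=20):
    """Return (pos_in_a, pos_in_b, length) for every maximal unique match."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip pairs that could be extended to the left (not maximal).
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right until a mismatch or the end of a sequence.
            length = 0
            while (i + length < len(a) and j + length < len(b)
                   and a[i + length] == b[j + length]):
                length += 1
            if length < min_len:
                continue
            word = a[i:i + length]
            # Unique: exactly one occurrence in each genome.
            if count_occurrences(a, word) == 1 and count_occurrences(b, word) == 1:
                mums.append((i, j, length))
    return mums


core = "GACGATCGCCGCCGTAGATCGAATAACGAGAGAGCATAA"
gen_a = "TCGATC" + core + "CGACTTA"
gen_b = "GCATTA" + core + "TCCAGAG"
mums = find_mums(gen_a, gen_b)
print(mums)
```

The single reported match is the upper-case region of the example, bounded on both sides by mismatching characters.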

Sorting & ordering MUMs
MUMs are sorted according to their position in Genome A
The order of matching MUMs in Genome B is considered
LIS algorithm to locate the longest set of MUMs which occur in ascending order in both genomes
[Figure labels: MUM5: transposition; MUM3: random match; inexact repeat]
Leads to Global MUM-alignment
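
The ordering step can be sketched as a longest increasing subsequence over the Genome B positions, taken in Genome A order. The seven positions below are invented for illustration, with MUM3 placed as a stray match and MUM5 as a transposition to echo the figure labels. This is the textbook LIS; MUMmer uses a variant that also accounts for MUM lengths and overlaps.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values):
    """Return indices of one longest strictly increasing subsequence."""
    tail_vals = []   # smallest tail value for each run length
    tail_idx = []    # index in values of that tail
    prev = [-1] * len(values)
    for idx, v in enumerate(values):
        k = bisect_left(tail_vals, v)
        if k > 0:
            prev[idx] = tail_idx[k - 1]
        if k == len(tail_vals):
            tail_vals.append(v)
            tail_idx.append(idx)
        else:
            tail_vals[k] = v
            tail_idx[k] = idx
    # Walk back from the tail of the longest run.
    chain = []
    idx = tail_idx[-1] if tail_idx else -1
    while idx != -1:
        chain.append(idx)
        idx = prev[idx]
    return chain[::-1]


# Hypothetical Genome B positions of MUM1..MUM7, listed in Genome A order.
b_positions = [10, 20, 95, 40, 15, 60, 70]
kept = longest_increasing_subsequence(b_positions)
kept_mums = [f"MUM{i + 1}" for i in kept]
print(kept_mums)
```

MUMs left out of the chain (here the stray match and the transposition) are the ones that break the ascending order in Genome B.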

Results: Alignment of M. tuberculosis strains CDC1551 (top) & H37Rv (bottom)
Single green lines indicate SNPs
Blue lines indicate insertions

Comparison of 2 Mycoplasma genomes
cousins that are distantly related
M. genitalium: 580 074 nt
M. pneumoniae: 816 394 nt (+236 320)
Analysis of proteins tells us that all M.g. proteins are present in M.p.
Alignment was carried out using
FASTA (dividing each genome into 1000 bp segments)
All-against-all searches
Fixed length of pattern (25)
Using MUMmer (length = 25)

Comparison of 2 Mycoplasma genomes
[Figure panels: FASTA; fixed-length patterns (25-mers); MUMmer]

AVID - Global alignment algorithm
AVID is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to megabases long.
AVID globally aligns DNA sequences of arbitrary length for the purpose of annotation and biological discovery using syntenic genomic sequences from two organisms.

Key features
Aligns hundreds of kilobases quickly
Highly accurate and able to detect weak homologies
Able to handle one sequence in draft by ordering and orienting the contigs automatically

Key features
Fast alignment of similar sequences is useful for alignments of primate sequences or comparison of assemblies.
It works by recursively finding strong anchors from the collection of maximal matches in the sequences.
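
The recursive anchoring idea can be sketched as follows: take the strongest exact match in the current region as an anchor, then repeat on the regions to its left and right. This is not AVID's implementation. As a stand-in for its anchor selection, the sketch uses the longest common substring reported by Python's difflib, and min_len is an assumed cutoff. The two input strings are the G1 and G2 sequences shown earlier in this deck.

```python
from difflib import SequenceMatcher


def recursive_anchors(a, b, min_len=4):
    """Return anchors (pos_in_a, pos_in_b, length) in left-to-right order."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    anchors = []

    def recurse(alo, ahi, blo, bhi):
        m = matcher.find_longest_match(alo, ahi, blo, bhi)
        if m.size < min_len:
            return  # no strong anchor left in this region
        recurse(alo, m.a, blo, m.b)                       # region to the left
        anchors.append((m.a, m.b, m.size))
        recurse(m.a + m.size, ahi, m.b + m.size, bhi)     # region to the right

    recurse(0, len(a), 0, len(b))
    return anchors


g1 = "ACTGATTACGTGAACTGGATCCA"
g2 = "ACTCTAGGTGAAGTGATCCA"
anchors = recursive_anchors(g1, g2)
print(anchors)
```

Because each recursion is confined to the region between two anchors, the anchors come out collinear in both sequences, and the short regions left between them would be the ones handed to a full alignment step.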