Basic Local Alignment Search Tool - PPTs

BLAST, Mega BLAST, MUMmer and AVID

What is BLAST?
Basic Local Alignment Search Tool
It allows rapid sequence comparison of a query sequence against a database.
The BLAST algorithm is fast, accurate, and web-accessible.
Developed in 1990, with gapped BLAST added in 1997 (S. Altschul et al.)

Why use BLAST?
BLAST searching is fundamental to understanding the
relatedness of any favorite query sequence to other
known proteins or DNA sequences.
Applications include:
Identifying orthologs and paralogs
Discovering new genes or proteins
Discovering variants of genes or proteins
Investigating expressed sequence tags (ESTs)
Exploring protein structure and function

Four Essential Components of BLAST
(1) Choose the sequence (query)
(2) Select the BLAST program
(3) Choose the database to search
(4) Choose optional parameters
Then click “BLAST”

BLAST ACCESS
NCBI BLAST
http://www.ncbi.nlm.nih.gov/BLAST/
Canadian Bioinformatics Resource BLAST
http://cbr-rbc.nrc-cnrc.gc.ca/blast/
European Bioinformatics Institute BLAST
http://www.ebi.ac.uk/blastall/
http://www.ebi.ac.uk/blast2/

Input Format
Input sequences are in:
FASTA format or
GenBank format.
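
To make the FASTA input format concrete, here is a minimal Python sketch of a FASTA reader. The function name parse_fasta and the two example records are illustrative assumptions, not part of any BLAST tool; the sequences are borrowed from the MUMmer example later in this deck.

```python
from typing import Dict, Iterable


def parse_fasta(lines: Iterable[str]) -> Dict[str, str]:
    """Parse FASTA-formatted lines into {identifier: sequence}."""
    records: Dict[str, str] = {}
    current_id = None
    for raw in lines:
        line = raw.strip()
        if not line:
            continue
        if line.startswith(">"):
            # Header line: the identifier is the first word after ">".
            parts = line[1:].split()
            current_id = parts[0] if parts else ""
            records[current_id] = ""
        elif current_id is not None:
            # Sequence lines may be wrapped, so they are concatenated.
            records[current_id] += line.upper()
    return records


# Hypothetical two-record FASTA text used only for illustration.
example = """>query1 hypothetical example record
ACTGATTACGTG
AACTGGATCCA
>query2
ACTCTAGGTGAAGTGATCCA
"""
records = parse_fasta(example.splitlines())
print(records)
```

A header line starts with ">" and every following line up to the next header belongs to that record.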

1. Nucleotide BLAST
BLASTN – Nucleotide BLAST
Search a nucleotide database using a nucleotide query

Input Page

2. Protein BLAST
BLASTP – Protein BLAST
Search a protein database using a protein query

Input Page

3. BLASTX – Translated BLAST
Search a protein database using a translated nucleotide query
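
A "translated nucleotide query" means the DNA query is conceptually translated in all six reading frames before it is compared with proteins. The Python sketch below shows that translation step only, assuming the standard genetic code; the function names are illustrative and this is not BLASTX itself.

```python
# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    b1 + b2 + b3: AMINO_ACIDS[16 * i + 4 * j + k]
    for i, b1 in enumerate(BASES)
    for j, b2 in enumerate(BASES)
    for k, b3 in enumerate(BASES)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons; unknown codons become 'X', stops '*'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(dna: str) -> dict:
    """Return the three forward and three reverse-strand translations."""
    dna = dna.upper()
    rc = reverse_complement(dna)
    frames = {}
    for offset in range(3):
        frames[f"+{offset + 1}"] = translate(dna[offset:])
        frames[f"-{offset + 1}"] = translate(rc[offset:])
    return frames


frames = six_frame_translations("ATGGCCTAA")
print(frames)
```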

Input Page

4. TBLASTN
Search a translated nucleotide database using a protein query

Input Page

5. TBLASTX
Search a translated nucleotide database using a translated nucleotide query
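
The five programs differ only in the type of the query, the type of the database, and whether translation is involved. That mapping can be sketched as a small lookup in Python; the function and parameter names here are hypothetical helpers, not part of BLAST.

```python
def choose_blast_program(query_type: str, database_type: str,
                         compare_as_protein: bool = False) -> str:
    """Pick a BLAST program name from the query and database types.

    query_type and database_type are "nucleotide" or "protein".
    compare_as_protein only matters for nucleotide vs nucleotide searches,
    where it selects the translated comparison (TBLASTX).
    """
    valid = {"nucleotide", "protein"}
    if query_type not in valid or database_type not in valid:
        raise ValueError("types must be 'nucleotide' or 'protein'")
    if query_type == "nucleotide" and database_type == "nucleotide":
        return "tblastx" if compare_as_protein else "blastn"
    if query_type == "protein" and database_type == "protein":
        return "blastp"
    if query_type == "nucleotide":
        return "blastx"   # translated nucleotide query vs protein database
    return "tblastn"      # protein query vs translated nucleotide database


print(choose_blast_program("nucleotide", "protein"))
```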

Input Page

What is Genomic BLAST?
Genomic BLAST is a novel graphical tool for simplifying BLAST searches against complete and unfinished genome sequences.
This tool allows the user to compare the query sequence against a virtual database of DNA and/or protein sequences from a selected group of organisms with finished or unfinished genomes.

What is Genomic BLAST?
Currently provides access to over 170 bacterial and archaeal genomes and over 40 eukaryotic genomes.

Specialized BLAST
Primer BLAST – Finding primers
CDS BLAST – Conserved domain sequences
GEO BLAST – Gene expression data
Ig BLAST – Immunoglobulin sequences
SNP BLAST – Single nucleotide polymorphisms, etc.

Primer BLAST -Input Page

CDS BLAST -Input Page

GEO BLAST -Input Page

Ig BLAST -Input Page

SNP BLAST -Input Page

Advanced BLAST
PSI BLAST – Position-Specific Iterated BLAST
PHI BLAST – Pattern-Hit Initiated BLAST
These two BLAST programs are called “Iterative BLAST”

PSI BLAST
Position-Specific Iterated (PSI)-BLAST is the most sensitive BLAST program, making it useful for finding very distantly related proteins or new members of a protein family.

PHI BLAST
Pattern-Hit Initiated (PHI)-BLAST is designed to search for proteins that contain a pattern specified by the user AND are similar to the query sequence in the vicinity of the pattern.
This dual requirement is intended to reduce the number of database hits that contain the pattern, but are likely to have no true homology to the query.
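
The first half of that dual requirement, finding sequences that contain the pattern, can be sketched in Python. This sketch assumes the pattern is written in a PROSITE-style syntax and converts it to a regular expression; the function names and the toy database are hypothetical, and the second requirement (similarity to the query near the pattern) is not implemented here.

```python
import re

# One pattern element: x, a residue, [allowed], or {excluded},
# optionally followed by a repeat count such as (3) or (2,4).
ELEMENT = re.compile(r"(x|[A-Z]|\[[A-Z]+\]|\{[A-Z]+\})(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern: str) -> str:
    """Convert a PROSITE-style pattern such as C-x(2,4)-C to a regex."""
    parts = []
    for element in pattern.split("-"):
        match = ELEMENT.fullmatch(element)
        if match is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = match.groups()
        if core == "x":
            regex = "."                        # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"    # any residue except these
        else:
            regex = core                       # literal residue or [set]
        if low:
            regex += "{" + low + ("," + high if high else "") + "}"
        parts.append(regex)
    return "".join(parts)


def pattern_hits(pattern: str, database: dict) -> dict:
    """Return {sequence id: [start positions]} for sequences with a hit."""
    regex = re.compile(prosite_to_regex(pattern))
    hits = {}
    for seq_id, seq in database.items():
        starts = [m.start() for m in regex.finditer(seq)]
        if starts:
            hits[seq_id] = starts
    return hits


# Hypothetical toy protein database.
toy_db = {"p1": "MKCAACAAALGG", "p2": "MKAAAA"}
hits = pattern_hits("C-x(2,4)-C-x(3)-[LIVMFYWC]", toy_db)
print(hits)
```

Only sequences reported here would go on to the similarity check, which is why the pattern alone acts as a fast first filter.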

BLAST2
It utilizes the BLAST algorithm for pairwise DNA-DNA or protein-protein sequence comparison.
It has been very useful for the comparison of homologous genes from complete microbial genomes.
BLAST 2.0 algorithm generates a gapped alignment by using dynamic programming.
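
As an illustration of how dynamic programming yields a gapped alignment, here is a minimal Python sketch of Smith-Waterman local alignment with a linear gap penalty. The scoring values (match 2, mismatch -1, gap -2) are arbitrary assumptions, and this is a textbook version, not the optimized procedure BLAST itself uses.

```python
def smith_waterman(a: str, b: str, match: int = 2, mismatch: int = -1,
                   gap: int = -2):
    """Return (score, aligned_a, aligned_b) for the best local alignment."""
    rows, cols = len(a) + 1, len(b) + 1
    H = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    # Fill the score matrix; a cell never drops below zero (local alignment).
    for i in range(1, rows):
        for j in range(1, cols):
            s = match if a[i - 1] == b[j - 1] else mismatch
            H[i][j] = max(0, H[i - 1][j - 1] + s,
                          H[i - 1][j] + gap, H[i][j - 1] + gap)
            if H[i][j] > best:
                best, best_pos = H[i][j], (i, j)
    # Trace back from the best cell until a zero score is reached.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and H[i][j] > 0:
        s = match if a[i - 1] == b[j - 1] else mismatch
        if H[i][j] == H[i - 1][j - 1] + s:
            top.append(a[i - 1]); bottom.append(b[j - 1]); i -= 1; j -= 1
        elif H[i][j] == H[i - 1][j] + gap:
            top.append(a[i - 1]); bottom.append("-"); i -= 1
        else:
            top.append("-"); bottom.append(b[j - 1]); j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, top, bottom = smith_waterman("AAACCCGGG", "AAACCGGG")
print(score)
print(top)
print(bottom)
```

The "-" characters in the output mark the gap that the dynamic programming step introduced to keep the flanking matches aligned.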

BLAST2
SEG and DUST programs are used to remove the low-complexity regions.
The program is not generally useful for motif-style searching, and aligning megabase-size genomic sequences is not recommended.
The maximum number of characters per sequence that may be accommodated is ~150 kb; the optimal size of the query sequence is about 1 kb.
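
To give a feel for what a low-complexity region is, the Python sketch below masks windows whose Shannon entropy falls below a threshold. This is a simplified stand-in and not the actual SEG or DUST algorithm; the window size, the threshold, and the function names are assumptions chosen for illustration.

```python
from collections import Counter
from math import log2


def shannon_entropy(window: str) -> float:
    """Entropy in bits of the character composition of a window."""
    counts = Counter(window)
    total = len(window)
    return -sum((c / total) * log2(c / total) for c in counts.values())


def mask_low_complexity(seq: str, window: int = 12, threshold: float = 1.0,
                        mask_char: str = "N") -> str:
    """Replace every position covered by a low-entropy window."""
    masked = list(seq)
    for start in range(len(seq) - window + 1):
        if shannon_entropy(seq[start:start + window]) < threshold:
            for k in range(start, start + window):
                masked[k] = mask_char
    return "".join(masked)


print(mask_low_complexity("ACGTACGTACGT"))   # balanced composition
print(mask_low_complexity("AAAAAAAAAAAA"))   # single-letter run
```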

Output Page

MEGA BLAST
Comparison of large sets of long DNA sequences.
It's much faster than the standard BLASTN.
It uses the greedy algorithm of Webb Miller et al. for nucleotide sequence alignment search.
It is up to ten times faster than more common sequence similarity programs and therefore can be used to quickly compare two large sets of sequences against each other.

Suffix tree
Suffix tree, as the name suggests, is a tree in which every suffix of a string S is represented.
More formally defined, a suffix tree is an automaton which accepts every suffix of a string.

Suffix tree
Example of suffix tree for the string “ABC”
1, 2 and 3 represent the ends of suffixes starting at positions 1, 2 and 3 respectively. These are the leaf nodes.
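
The “ABC” example can be reproduced with a naive Python sketch. It builds an uncompressed suffix trie from nested dictionaries, which takes quadratic time and space; real suffix trees compress single-child paths and can be built in linear time. The "$" leaf marker and the function names are assumptions of this sketch, and the text is assumed not to contain "$".

```python
def build_suffix_trie(text: str) -> dict:
    """Insert every suffix of text into a trie of nested dicts."""
    root: dict = {}
    for start in range(len(text)):
        node = root
        for ch in text[start:]:
            node = node.setdefault(ch, {})
        # Leaf marker: 1-based position where this suffix starts.
        node["$"] = start + 1
    return root


def accepts_suffix(trie: dict, candidate: str) -> bool:
    """True if candidate is exactly a suffix of the indexed text."""
    node = trie
    for ch in candidate:
        if ch not in node:
            return False
        node = node[ch]
    return "$" in node


trie = build_suffix_trie("ABC")
print(trie)
print(accepts_suffix(trie, "BC"), accepts_suffix(trie, "AB"))
```

The leaves labelled 1, 2 and 3 correspond to the suffixes "ABC", "BC" and "C".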

MUMmer – Genome alignment algorithm
Developed by
Dr. Steven Salzberg’s group at TIGR
NAR (1999) 27:2369-2376
NAR (2002) 30:2478-2483
Availability
Free
TIGR (The Institute for Genomic Research) site

Features
The algorithm assumes that sequences are closely related
Can quickly compare millions of bases
Outputs:
Base to base alignment
Highlights the exact matches and differences in the
genomes
Locates
SNPs
Large inserts
Significant repeats
Tandem repeats and reversals

Techniques used in the MUMmer algorithm
Compute suffix trees for every genome
Longest Increasing Subsequence (LIS)
Alignment using the Smith-Waterman algorithm
Integration of these techniques for genome alignment

Steps
Locating MUMs
Sorting MUMs
Closure with gaps
Genome1: ACTGATTACGTGAACTGGATCCA
Genome2: ACTCTAGGTGAAGTGATCCA
Alignment after closure with gaps:
ACTGATTACGTGAACTGGATCCA
ACTC--TAGGTGAAGT-GATCCA

What is MUM?
A MUM (maximal unique match) is a subsequence that occurs exactly once in both genomes and is NOT part of any longer matching sequence
The two characters that bound a MUM are always mismatches
GenA:tcgatcGACGATCGCCGCCGTAGATCGAATAACGAGAGAGCATAA cgactta
GenB:gcattaGACGATCGCCGCCGTAGATCGAATAACGAGAGAGCATAA tccagag
Similar to BLAST & FASTA!!
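
The MUM definition can be checked on the GenA/GenB example with a brute-force Python sketch. MUMmer itself finds MUMs with suffix trees; the quadratic scan below is only an illustration, and the function names and the min_len parameter are assumptions.

```python
def count_occurrences(text: str, pattern: str) -> int:
    """Count occurrences of pattern in text, including overlapping ones."""
    count, pos = 0, text.find(pattern)
    while pos != -1:
        count += 1
        pos = text.find(pattern, pos + 1)
    return count


def find_mums(a: str, b: str, min_len: int) -> list:
    """Return (start_in_a, start_in_b, length) for each MUM >= min_len."""
    mums = []
    for i in range(len(a)):
        for j in range(len(b)):
            if a[i] != b[j]:
                continue
            # Skip starts that could be extended to the left (not maximal).
            if i > 0 and j > 0 and a[i - 1] == b[j - 1]:
                continue
            # Extend to the right as far as the characters agree.
            n = 0
            while i + n < len(a) and j + n < len(b) and a[i + n] == b[j + n]:
                n += 1
            if n < min_len:
                continue
            match = a[i:i + n]
            # Unique: the matched string occurs exactly once in each genome.
            if count_occurrences(a, match) == 1 and count_occurrences(b, match) == 1:
                mums.append((i, j, n))
    return sorted(mums)


gen_a = "tcgatcGACGATCGCCGCCGTAGATCGAATAACGAGAGAGCATAAcgactta".upper()
gen_b = "gcattaGACGATCGCCGCCGTAGATCGAATAACGAGAGAGCATAAtccagag".upper()
mums = find_mums(gen_a, gen_b, min_len=20)
print(mums)
```

The single reported match starts after the six-base flank in both sequences, and the bases on either side of it differ between the two genomes.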

Sorting & ordering MUMs
MUMs are sorted according to their position in
Genome A
The order of matching MUMs in Genome B is
considered
LIS algorithm to locate longest set of MUMs which
occur in ascending order in both genomes
[Figure labels: MUM5: transposition; MUM3: random match; inexact repeat]
Leads to global MUM-alignment
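
The ordering step can be sketched in Python as a longest increasing subsequence over the Genome B positions, taken in Genome A order. The seven MUM positions below are hypothetical values chosen so that MUM 5 is out of order (a transposition) and MUM 3 lands far from its neighbours (a random match); the function name is also an assumption.

```python
from bisect import bisect_left


def longest_increasing_subsequence(values: list) -> list:
    """Return the indices of one longest strictly increasing subsequence."""
    tails = []        # tails[k]: index of the smallest tail of length k + 1
    tail_values = []  # values[tails[k]], kept sorted for binary search
    prev = [-1] * len(values)
    for idx, v in enumerate(values):
        k = bisect_left(tail_values, v)
        if k == len(tails):
            tails.append(idx)
            tail_values.append(v)
        else:
            tails[k] = idx
            tail_values[k] = v
        prev[idx] = tails[k - 1] if k > 0 else -1
    # Walk the predecessor links back from the end of the longest chain.
    result = []
    idx = tails[-1] if tails else -1
    while idx != -1:
        result.append(idx)
        idx = prev[idx]
    return result[::-1]


# Hypothetical MUMs 1..7, already sorted by their position in Genome A.
positions_in_b = [12, 22, 90, 35, 5, 55, 65]
kept = [i + 1 for i in longest_increasing_subsequence(positions_in_b)]
print(kept)
```

MUMs 3 and 5 are dropped because keeping either would break the ascending order in Genome B; the remaining MUMs form the backbone of the global alignment.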

Results: Alignment of M. tuberculosis strains
CDC1551 (Top) & H37Rv (bottom)
Single green lines
indicate SNPs
Blue lines
indicate insertions

Comparison of 2 Mycoplasma genomes
cousins that are distantly related
M. genitalium: 580 074 nt
M. pneumoniae: 816 394 nt (+226 000)
Analysis of proteins tells us that all M.g. proteins are
present in M.p.
Alignment was carried out using
FASTA (dividing each genome into 1000 bp)
All-against-all searches
Fixed length of pattern (25)
Using MUMmer (length = 25)

Comparison of 2 Mycoplasma genomes
[Figure panels: using FASTA; fixed-length patterns (25-mers); MUMmer]

AVID –Global alignment algorithm
AVID is designed to be fast, memory efficient, and practical for sequence alignments of large genomic regions up to megabases long.
AVID globally aligns DNA sequences of arbitrary length for the purpose of annotation and biological discovery using syntenic genomic sequences from two organisms.

Key features
Aligns hundreds of kilobases quickly
Highly accurate and able to detect weak homologies
Able to handle one sequence in draft by ordering and orienting the contigs automatically

Key features
Fast alignment of similar sequences is useful for alignments of primate sequences or comparison of assemblies.
It works by recursively finding strong anchors from the collection of maximal matches in the sequences.
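
The idea of recursive anchoring can be sketched in Python using the standard library's difflib. This is not AVID's implementation: it simply takes the longest exact match as the anchor, then recurses on the regions to its left and right. The function name and the min_len cutoff are assumptions of the sketch.

```python
from difflib import SequenceMatcher


def recursive_anchors(a: str, b: str, min_len: int = 4) -> list:
    """Return ordered, non-overlapping anchors as (pos_a, pos_b, length)."""
    matcher = SequenceMatcher(None, a, b, autojunk=False)
    anchors = []

    def recurse(alo: int, ahi: int, blo: int, bhi: int) -> None:
        m = matcher.find_longest_match(alo, ahi, blo, bhi)
        if m.size < min_len:
            return  # no sufficiently strong anchor in this region
        recurse(alo, m.a, blo, m.b)                    # region to the left
        anchors.append((m.a, m.b, m.size))
        recurse(m.a + m.size, ahi, m.b + m.size, bhi)  # region to the right

    recurse(0, len(a), 0, len(b))
    return anchors


g1 = "ACTGATTACGTGAACTGGATCCA"
g2 = "ACTCTAGGTGAAGTGATCCA"
anchors = recursive_anchors(g1, g2, min_len=4)
for pos_a, pos_b, size in anchors:
    print(pos_a, pos_b, g1[pos_a:pos_a + size])
```

The stretches between consecutive anchors are the only parts that still need a detailed alignment, which is what keeps this style of method fast on long sequences.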
