BLAST

1,370 views 28 slides Nov 03, 2019

About This Presentation

BLAST Types, Algorithm, Output


Slide Content

BLAST

Contents
Definition
Background
Types of BLAST Program
Algorithm
BLAST Input-Output
BLAST Search
BLAST Function
Objectives of BLAST

Definition
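The Basic Local Alignment Search Tool
(BLAST) is a tool for comparing gene and
protein sequences against others in public
databases.
BLAST is a set of sequence comparison
algorithms used to search databases for
high-scoring local alignments to a query;
as a heuristic, it is not guaranteed to find
the optimal one.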

Definition
It breaks the query and database
sequences into fragments ("words") and
seeks matches between them.
Before BLAST, nucleic acid and protein
alignments were time consuming because
they were computed as full alignments
with dynamic programming. BLAST is
about 50 times faster than dynamic
programming.
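
To make the cost of full dynamic programming concrete, here is a minimal sketch of Smith-Waterman local alignment. The scoring values (match=2, mismatch=-1, linear gap=-2) are illustrative assumptions, not parameters taken from the slides or from BLAST itself.

```python
def smith_waterman(a, b, match=2, mismatch=-1, gap=-2):
    """Return (best_score, aligned_a, aligned_b) for the best local alignment.

    Scoring values are illustrative assumptions (linear gap penalty).
    """
    rows, cols = len(a) + 1, len(b) + 1
    # score[i][j] = best score of a local alignment ending at a[i-1], b[j-1]
    score = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = score[i - 1][j] + gap
            left = score[i][j - 1] + gap
            score[i][j] = max(0, diag, up, left)
            if score[i][j] > best:
                best, best_pos = score[i][j], (i, j)

    # Trace back from the best cell until a zero-scoring cell is reached.
    i, j = best_pos
    out_a, out_b = [], []
    while i > 0 and j > 0 and score[i][j] > 0:
        s = score[i][j]
        if s == score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch):
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i, j = i - 1, j - 1
        elif s == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return best, "".join(reversed(out_a)), "".join(reversed(out_b))


print(smith_waterman("TTTACGTTT", "GGACGGG"))
```

Every cell of the table is filled, so the work grows with the product of the two sequence lengths. Repeating that for every sequence in a large database is what made full alignments slow.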

Background
Since the 1970s, scientists have
accumulated DNA and protein
sequence data at an exponential rate; in
fact, researchers currently have
approximately 97 billion bases
sequenced and over 93 million records.
Amazingly, this sequence data doubles
every 18 months!

Background
Today, one of the most commonly used
tools to examine DNA and protein
sequences is the Basic Local Alignment
Search Tool, also known as BLAST.
BLAST is a computer algorithm that is
available for use online at the National
Center for Biotechnology Information
(NCBI) website and many other sites.

Types of BLAST
Nucleotide-nucleotide BLAST (blastn)
- This program, given a DNA query,
returns the most similar DNA sequences from
the DNA database that the user specifies.
Protein-protein BLAST (blastp)
- This program, given a protein query,
returns the most similar protein sequences from
the protein database that the user specifies.
Position-Specific Iterative BLAST (PSI-BLAST)
(blastpgp in legacy BLAST, psiblast in BLAST+)
- This program is used to find distant
relatives of a protein.

Types of BLAST
Nucleotide 6-frame translation-protein
(blastx)
- This program compares the six-frame
conceptual translation products of a
nucleotide query sequence (both strands)
against a protein sequence database.
Nucleotide 6-frame translation-nucleotide
6-frame translation (tblastx)
- The purpose of tblastx is to find very distant
relationships between nucleotide sequences.
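
The six-frame conceptual translation used by blastx and tblastx can be sketched as follows. This is a minimal illustration that assumes the standard genetic code and an unambiguous DNA alphabet; the function names are hypothetical and do not come from any BLAST package.

```python
from itertools import product

# Standard genetic code, codons enumerated in TCAG order (assumed table).
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}


def reverse_complement(dna):
    """Return the reverse complement of an upper-case DNA string."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons only; unknown codons become 'X'."""
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translations(dna):
    """Return a dict mapping frame labels (+1..+3, -1..-3) to protein strings."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


for label, protein in six_frame_translations("ATGGCCTAA").items():
    print(label, protein)
```

Three frames come from the query strand and three from its reverse complement, which is why both strands are covered. Stop codons are shown as `*`.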

Types of BLAST
Protein-nucleotide 6-frame translation
(tblastn)
- This program compares a protein query against
all six reading frames of a nucleotide
sequence database.
Large numbers of query sequences
(megablast)
- When comparing large numbers of input
sequences via the command-line BLAST,
"megablast" is much faster than running BLAST
multiple times.

Types of BLAST
Of these programs, blastn and blastp are
the most commonly used because they use
direct comparisons and do not require
translations.
However, since protein sequences are better
conserved evolutionarily than nucleotide
sequences, tblastn, tblastx, and blastx
produce more reliable and accurate results
when dealing with coding DNA.

BLAST Algorithm
The BLAST algorithm is fast, accurate, and
web-accessible.
It is fast relative to other sequence
similarity search tools.
The algorithm is complex: it requires
multiple steps and many parameters.

BLASTAlgorithm
An overview of the
BLAST algorithm (a
protein to protein
search) is as follows:
Remove low-
complexity region or
sequence repeats in
the query sequence.
Make a k-letter word
list of the query
sequence -Take k=3 for
example, we list the words of length 3
in the query protein sequence (k is
usually 11 for a DNA sequence)
"sequentially", until the last letter of
the query sequence is included.
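
The word list can be sketched in a few lines. The query string below is an arbitrary example, and `word_list` is a hypothetical helper name.

```python
def word_list(query, k=3):
    """Return every overlapping k-letter word with its 0-based start offset."""
    return [(i, query[i:i + k]) for i in range(len(query) - k + 1)]


# Slide the window one letter at a time until the last letter is included.
for offset, word in word_list("PQGEFG", k=3):
    print(offset, word)
```

A query of length L yields L - k + 1 overlapping words, so the six-letter example gives four words of length 3.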

BLAST Algorithm
3. List the possible matching words: score
every possible k-letter word against the query
word with a substitution matrix and keep only
those that score at least a threshold T.
4. Organize the remaining high-scoring words into an
efficient search tree.
5. Repeat steps 3 to 4 for each k-letter word in the
query sequence.
6. Scan the database sequences for exact matches
with the remaining high-scoring words.
7. Extend the exact matches to high-scoring segment
pairs (HSPs).
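
The scan-and-extend idea can be sketched for a DNA query as follows. This is a simplified illustration under stated assumptions: the words are the query's own k-letter words (no neighborhood words), a plain dictionary stands in for the search tree, extension is ungapped and stops by a simple drop-off rule, and the scores (match=1, mismatch=-3, x_drop=5, min_score=6) are made up for the example. All function names are hypothetical.

```python
from collections import defaultdict


def index_words(seq, k):
    """Map each k-letter word of seq to the positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def extend_ungapped(query, subject, q, s, k, match=1, mismatch=-3, x_drop=5):
    """Extend the word hit query[q:q+k] / subject[s:s+k] in both directions.

    A direction stops once the running score falls more than x_drop below
    the best score seen so far. Returns (score, q_start, s_start, length).
    """
    def walk(qi, si, step):
        best = run = 0
        best_len = n = 0
        while 0 <= qi < len(query) and 0 <= si < len(subject):
            run += match if query[qi] == subject[si] else mismatch
            n += 1
            if run > best:
                best, best_len = run, n
            elif best - run > x_drop:
                break
            qi += step
            si += step
        return best, best_len

    right_score, right_len = walk(q + k, s + k, +1)
    left_score, left_len = walk(q - 1, s - 1, -1)
    score = k * match + left_score + right_score
    return score, q - left_len, s - left_len, left_len + k + right_len


def find_hsps(query, subject, k=4, min_score=6):
    """Return ungapped HSPs scoring at least min_score, best first."""
    index = index_words(subject, k)
    hsps = set()  # several seeds on one diagonal can extend to the same HSP
    for q in range(len(query) - k + 1):
        for s in index.get(query[q:q + k], []):
            hsp = extend_ungapped(query, subject, q, s, k)
            if hsp[0] >= min_score:
                hsps.add(hsp)
    return sorted(hsps, reverse=True)


print(find_hsps("ACGTACGTGG", "TTTACGTACGTCCC"))
```

Only positions that share an exact word are ever examined, and that is where the speed-up over filling a full dynamic-programming table comes from.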

BLAST Algorithm
8. List all of the HSPs in the database whose score
is high enough to be considered.
9. Evaluate the significance of the HSP score.
10. Combine two or more HSP regions into a longer
alignment.
11. Show the gapped Smith-Waterman local
alignments of the query and each of the matched
database sequences.
12. Report every match whose expect value (E-value)
is lower than a threshold parameter E.

BLAST Input-Output
Input
Input sequences in FASTA or GenBank format.
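
For readers unfamiliar with FASTA, each record is a header line starting with ">" followed by one or more sequence lines. A minimal reader might look like the sketch below; `parse_fasta` is a hypothetical helper, and the two records are made-up examples.

```python
def parse_fasta(text):
    """Yield (header, sequence) pairs from FASTA-formatted text."""
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if line.startswith(">"):
            if header is not None:
                yield header, "".join(chunks)
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence lines may be wrapped
    if header is not None:
        yield header, "".join(chunks)


example = """>query1 made-up DNA record
ACGTACGT
GGCC
>query2 made-up protein record
MKVLA
"""
for name, seq in parse_fasta(example):
    print(name, len(seq))
```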
Output
BLAST output can be delivered in a variety of
formats, including HTML, plain text, and XML.
On NCBI's web page, the default output format
is HTML.
The output starts with an introduction that
tells where the search occurred and which
database and query were compared.

BLAST Output
A list of the sequences in the database
containing segment pairs whose scores were
least likely to occur by chance.
Alignments of the high-scoring segment
pairs showing identical and similar residues.
A complete list of the parameter settings
used for the search.

BLAST Output
E-value (expectation value)
The Expect value (E) is a parameter that
describes the number of hits one can "expect"
to see by chance when searching a database of
a particular size.
It decreases exponentially as the Score (S) of
the match increases.
Essentially, the E value describes the random
background noise.
In general terms, the smaller E is, the more
likely the match is significant.
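
The exponential dependence on the score can be illustrated with the Karlin-Altschul relation E = K * m * n * exp(-lambda * S), where m is the query length and n the database length. The sketch below uses lambda = 0.318 and K = 0.134 purely as illustrative assumptions (values often quoted for ungapped BLOSUM62 scoring), and it ignores the effective-length corrections that BLAST applies.

```python
import math


def expect_value(raw_score, query_len, db_len, lam=0.318, k=0.134):
    """Expected number of chance hits scoring at least raw_score.

    lam and k are assumed, illustrative statistical parameters.
    """
    return k * query_len * db_len * math.exp(-lam * raw_score)


# Same query and database, increasing score: E shrinks exponentially.
for s in (40, 50, 60):
    print(s, expect_value(s, query_len=250, db_len=50_000_000))
```

Because m and n appear as a product, the same score gives a larger E value against a larger database.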

BLAST Output
The default E value for blastn, blastp, blastx,
and tblastn is 10.
At this setting, 10 hits with scores equal to
or better than the defined alignment score,
S, are expected to occur by chance. The E-
value can be increased or decreased to
alter the stringency of the search.
Increase the E value when searching with
a short query, since it is likely to be found
many times by chance in a given database.

BLAST Output
Bit Score
A bit score is another prominent statistical
indicator used in addition to the E value in
a BLAST output.
The bit score measures sequence
similarity independent of query sequence
length and database size and is normalized
based on the raw pairwise alignment score.
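
The normalization can be sketched with the standard relation S' = (lambda * S - ln K) / ln 2, after which E = m * n * 2^(-S'). The lambda and K values passed in below are illustrative assumptions, not values read from any particular BLAST run.

```python
import math


def bit_score(raw_score, lam, k):
    """Convert a raw alignment score to a bit score."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def expect_from_bits(bits, query_len, db_len):
    """E value from a bit score; the scoring system is already folded in."""
    return query_len * db_len * 2.0 ** (-bits)


bits = bit_score(60, lam=0.318, k=0.134)  # assumed parameters
print(round(bits, 1), expect_from_bits(bits, query_len=250, db_len=50_000_000))
```

Once the score is in bits, lambda and K no longer appear in the E-value formula. That is why bit scores from searches with different scoring systems can be compared directly.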

BLAST Search
• Go to http://www.ncbi.nlm.nih.gov/
• Select a BLAST program

BLAST Search
Selecting the BLAST Database

BLAST Search
Entering sequence
Submitting search

BLAST Function
BLAST can be used for several purposes.
These include identifying species, locating
domains, establishing phylogeny, DNA
mapping, and comparison.
Identifying species
- With the use of BLAST, we can often
identify a species correctly or find homologous
species. This can be useful, for example, when
we are working with a DNA sequence from an
unknown species.

BLAST Function
Locating domains
- When working with a protein sequence,
you can input it into BLAST to locate
known domains within the sequence of
interest.
Establishing phylogeny
- Using the results received through BLAST,
we can create a phylogenetic tree using
the BLAST web page.

BLAST Function
DNA mapping
- When working with a known species and
looking to sequence a gene at an unknown
location, BLAST can compare the
chromosomal position of the sequence of
interest to relevant sequences in the
database.
Comparison
- When working with genes, BLAST can locate
common genes in two related species, and
can be used to map annotations from one
organism to another.

Objectives of BLAST
It is one of the most popular programs for
sequence analysis.
It enables a researcher to compare a
query sequence with a library or database
of sequences.
It identifies library sequences that resemble
the query sequence above a certain
threshold.
The objective is to find high-scoring
ungapped segments among related
sequences.

Objectives of BLAST
Alignments of the high-scoring segment pairs
showing identical and similar residues.
A complete list of the parameter settings used for the
search.

THANK YOU