WELCOME TO OUR PRESENTATION Daffodil International University Presented by "HIGH-5"
PRESENTED TO: Name: Mr. Shaon Bhatta Shuvo. Designation: Lecturer. Department: Department of Computer Science and Engineering, Daffodil International University
Contents: Definition, Background, Types of BLAST Program, Algorithm, BLAST Input-Output, BLAST Search, BLAST Function, Objectives of BLAST
Definition The Basic Local Alignment Search Tool (BLAST) is a tool for comparing gene and protein sequences against others in public databases. BLAST is a set of sequence comparison algorithms used to search databases for optimal local alignments to a query.
Basic Local Alignment Search Tool BLAST finds regions of local similarity between biological sequences. The program compares nucleotide or protein sequences to sequence databases and calculates the statistical significance of matches. BLAST can be used to infer functional and evolutionary relationships between sequences as well as help identify members of gene families.
Background Since the 1970s, scientists have accumulated DNA and protein sequence data at an exponential rate; in fact, researchers currently have approximately 97 billion bases sequenced and over 93 million records. Amazingly, this sequence data doubles every 18 months!
Background Today, one of the most commonly used tools to examine DNA and protein sequences is the Basic Local Alignment Search Tool, also known as BLAST. BLAST is a computer algorithm that is available for use online at the National Center for Biotechnology Information (NCBI) website and many other sites.
Types of BLAST Nucleotide-nucleotide BLAST (blastn) - This program, given a DNA query, returns the most similar DNA sequences from the DNA database that the user specifies. Protein-protein BLAST (blastp) - This program, given a protein query, returns the most similar protein sequences from the protein database that the user specifies. Position-Specific Iterative BLAST (PSI-BLAST) (blastpgp) - This program is used to find distant relatives of a protein.
Types of BLAST Nucleotide 6-frame translation-protein (blastx) - This program compares the six-frame conceptual translation products of a nucleotide query sequence (both strands) against a protein sequence database. Nucleotide 6-frame translation-nucleotide 6-frame translation (tblastx) - The purpose of tblastx is to find very distant relationships between nucleotide sequences.
Types of BLAST Protein-nucleotide 6-frame translation (tblastn) - This program compares a protein query against all six reading frames of a nucleotide sequence database. Large numbers of query sequences (megablast) - When comparing large numbers of input sequences via the command-line BLAST, "megablast" is much faster than running BLAST multiple times.
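The six-frame conceptual translation used by blastx, tblastn, and tblastx can be illustrated with a short Python sketch. This is not how BLAST itself is implemented; it only shows the idea of translating three reading frames on each strand. It assumes the standard genetic code and a DNA input containing only A, C, G, and T; the function names are hypothetical.

```python
from itertools import product

# Standard genetic code, with codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa
    for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Return the reverse complement of a DNA string."""
    return dna.upper().translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon from position 0; unknown codons become 'X'."""
    dna = dna.upper()
    return "".join(
        CODON_TABLE.get(dna[i:i + 3], "X")
        for i in range(0, len(dna) - 2, 3)
    )


def six_frame_translation(dna):
    """Return the translations of frames +1..+3 and -1..-3 as a dict."""
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


if __name__ == "__main__":
    for name, protein in six_frame_translation("ATGGCCTAA").items():
        print(name, protein)
```

Each of the six protein strings would then be compared against a protein database (blastx) or against the six-frame translations of a nucleotide database (tblastx).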
Types of BLAST Of these programs, blastn and blastp are the most commonly used because they use direct comparisons and do not require translations. However, since protein sequences are better conserved evolutionarily than nucleotide sequences, tblastn, tblastx, and blastx can produce more reliable and accurate results when dealing with coding DNA.
BLAST Algorithm The BLAST algorithm is fast, accurate, and web-accessible. It is faster than most other sequence similarity search tools. The algorithm is complex: it requires multiple steps and many parameters.
BLAST Algorithm An overview of the BLAST algorithm (a protein-to-protein search) is as follows: (1) Remove low-complexity regions or sequence repeats in the query sequence. (2) Make a k-letter word list of the query sequence: taking k = 3 as an example, list the words of length 3 in the query protein sequence sequentially, moving one letter at a time, until the last letter of the query sequence is included (k is usually 11 for a DNA sequence).
BLAST Algorithm (3) List the possible matching words: score candidate words against each query word with a substitution matrix and keep only those that score at or above a threshold. (4) Organize the remaining high-scoring words into an efficient search tree. (5) Repeat steps 3 to 4 for each k-letter word in the query sequence. (6) Scan the database sequences for exact matches with the remaining high-scoring words. (7) Extend the exact matches to high-scoring segment pairs (HSPs).
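The word-list, scan, and extend steps above can be sketched in Python. This is a deliberately simplified illustration, not the BLAST implementation. It assumes that only the exact query words are used as seeds (no substitution-matrix neighbors), that a dictionary stands in for the search tree, and that scoring is a flat +1 for a match and -1 for a mismatch. All function names and default values are hypothetical.

```python
from collections import defaultdict


def word_list(query, k=3):
    """Return every overlapping k-letter word in the query with its offset."""
    return [(query[i:i + k], i) for i in range(len(query) - k + 1)]


def build_index(query, k=3):
    """Map each query word to the offsets where it occurs in the query."""
    index = defaultdict(list)
    for word, offset in word_list(query, k):
        index[word].append(offset)
    return index


def extend_hit(query, subject, q_pos, s_pos, k, match=1, mismatch=-1, x_drop=2):
    """Extend an exact word hit in both directions without gaps.

    Extension in one direction stops when the running score falls more than
    x_drop below the best score seen so far.
    Returns (score, query_start, subject_start, length).
    """
    best = score = k * match
    best_right = best_left = 0

    # Extend to the right of the seed word.
    i = 0
    while q_pos + k + i < len(query) and s_pos + k + i < len(subject):
        same = query[q_pos + k + i] == subject[s_pos + k + i]
        score += match if same else mismatch
        i += 1
        if score > best:
            best, best_right = score, i
        elif best - score > x_drop:
            break

    # Extend to the left, starting from the best right-extended score.
    score = best
    i = 0
    while q_pos - i - 1 >= 0 and s_pos - i - 1 >= 0:
        same = query[q_pos - i - 1] == subject[s_pos - i - 1]
        score += match if same else mismatch
        i += 1
        if score > best:
            best, best_left = score, i
        elif best - score > x_drop:
            break

    length = best_left + k + best_right
    return best, q_pos - best_left, s_pos - best_left, length


def find_hsps(query, subject, k=3, min_score=5):
    """Scan the subject for seed words and return the distinct extended hits."""
    index = build_index(query, k)
    hsps = set()
    for s_pos in range(len(subject) - k + 1):
        for q_pos in index.get(subject[s_pos:s_pos + k], ()):
            hsp = extend_hit(query, subject, q_pos, s_pos, k)
            if hsp[0] >= min_score:
                hsps.add(hsp)
    return sorted(hsps, reverse=True)


if __name__ == "__main__":
    print(find_hsps("PQGEFG", "AAPQGEFGAA"))
```

In this toy run every seed extends to the same six-letter segment pair, so the set keeps a single HSP. A real search would also score neighboring words with a matrix such as BLOSUM62 and evaluate the statistical significance of each HSP.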
BLAST Input-Output Input: sequences in FASTA or GenBank format. Output: BLAST output can be delivered in a variety of formats, including HTML, plain text, and XML. On NCBI's web page, the default output format is HTML.
BLAST Output A BLAST report contains: an introduction that tells where the search occurred and which database and query were compared; a list of the sequences in the database containing segment pairs whose scores were least likely to occur by chance; alignments of the high-scoring segment pairs showing identical and similar residues; and a complete list of the parameter settings used for the search.
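On the input side, a FASTA file is a sequence of records, each with a header line starting with ">" followed by one or more lines of sequence. A minimal Python sketch of reading that format is shown here; the function name is hypothetical, and the sketch ignores the less common variations of the format.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            # A new header closes the previous record, if there was one.
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


if __name__ == "__main__":
    example = ">seq1 example\nATGC\nGGTA\n>seq2\nTTAA\n"
    for name, sequence in parse_fasta(example):
        print(name, sequence)
```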
BLAST Output Bit Score: A bit score is another prominent statistical indicator used in addition to the E value (the number of alignments with an equal or better score expected by chance) in a BLAST output. The bit score measures sequence similarity independent of query sequence length and database size and is normalized based on the raw pairwise alignment score.
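The relationship between the raw score, the bit score, and the E value can be written out as a short Python sketch. It assumes the usual Karlin-Altschul form, S' = (lambda * S - ln K) / ln 2 and E = m * n * 2^(-S'), where lambda and K are statistical parameters that depend on the scoring system, m is the query length, and n is the database length. The function names are hypothetical and the numbers in the usage example are illustrative only, not taken from a real search.

```python
import math


def bit_score(raw_score, lam, k):
    """Normalize a raw alignment score S into a bit score S'."""
    return (lam * raw_score - math.log(k)) / math.log(2)


def e_value(bits, query_len, db_len):
    """Expected number of chance alignments scoring at least `bits`."""
    return query_len * db_len * 2.0 ** (-bits)


if __name__ == "__main__":
    # Illustrative parameter values (assumptions, not from a real report).
    bits = bit_score(raw_score=100, lam=0.267, k=0.041)
    print(round(bits, 1), e_value(bits, query_len=250, db_len=50_000_000))
```

Because the bit score already absorbs lambda and K, two bit scores from searches with different scoring systems can be compared directly, whereas the E value still depends on the query and database sizes.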
BLAST Search Go to http://www.ncbi.nlm.nih.gov/ and select a BLAST program.
BLAST Search Selecting the BLAST Database
BLAST Search Entering the sequence and submitting the search
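The same three choices made on the web page (program, database, query sequence) can also be expressed as a programmatic request. The Python sketch below only builds the request and does not send it. It assumes the parameter names of NCBI's BLAST URL API (CMD, PROGRAM, DATABASE, QUERY) and the Blast.cgi endpoint; these should be checked against NCBI's current documentation before use. The function name is hypothetical.

```python
from urllib.parse import urlencode

# Assumed endpoint of the NCBI BLAST URL API; verify before use.
BLAST_URL = "https://blast.ncbi.nlm.nih.gov/Blast.cgi"


def build_submission(program, database, query):
    """Return (url, body) for a search submission; nothing is sent."""
    params = {
        "CMD": "Put",          # submit a new search
        "PROGRAM": program,    # e.g. "blastn" or "blastp"
        "DATABASE": database,  # e.g. "nt" or "nr"
        "QUERY": query,        # sequence text, for example in FASTA format
    }
    return BLAST_URL, urlencode(params).encode("ascii")


if __name__ == "__main__":
    url, body = build_submission("blastn", "nt", "ATGGCCTAA")
    print(url)
    print(body.decode("ascii"))
```

Actually sending the request, waiting for the search to finish, and retrieving the results are left out here because they depend on NCBI's usage guidelines.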
BLAST Function Locating domains - When working with a protein sequence, you can input it into BLAST to locate known domains within the sequence of interest. Establishing phylogeny - Using the results received through BLAST, we can create a phylogenetic tree on the BLAST web page.
BLAST Function DNA mapping - When working with a known species and looking to sequence a gene at an unknown location, BLAST can compare the chromosomal position of the sequence of interest to relevant sequences in the database. Comparison - When working with genes, BLAST can locate common genes in two related species and can be used to map annotations from one organism to another.
Objectives of BLAST BLAST is one of the most popular programs for sequence analysis. It enables a researcher to compare a query sequence with a library or database of sequences. It identifies library sequences that resemble the query sequence above a certain threshold. The objective is to find high-scoring ungapped segments among related sequences.