Blast and fasta

27 slides, Oct 06, 2018

About This Presentation

BLAST and FASTA are two bioinformatics programs; BLAST is commonly used for sequence similarity searching.


Slide Content

BLAST & FASTA
By Allie N U, MSc Biotechnology

Introduction
BLAST and FASTA are used to find the local similarity, or alignment, shared by two sequences. The method of finding similarity is called alignment, and it can be of two types. Global alignment aligns the entire sequence, using as many characters as possible. Local alignment focuses only on regions of similarity in parts of the sequences.

Alignment of two sequences is performed by the following methods: dot matrix analysis, dynamic programming, and the word or k-tuple method (used by the FASTA and BLAST programs).
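As a rough illustration of the first method, dot matrix analysis, the Python sketch below places one sequence along the rows and the other along the columns and marks each cell where the two residues are identical; runs of marks along a diagonal indicate regions of similarity. The function names `dot_matrix` and `render_dot_matrix` are hypothetical, and the sketch uses plain identity with no window or threshold, which real dot-plot tools usually add.

```python
def dot_matrix(seq1, seq2):
    """Return a 0/1 matrix: rows follow seq1, columns follow seq2."""
    return [[1 if a == b else 0 for b in seq2] for a in seq1]


def render_dot_matrix(seq1, seq2):
    """Draw the matrix as text, with '*' for an identical pair and '.' otherwise."""
    matrix = dot_matrix(seq1, seq2)
    lines = [" " + seq2]  # header row: seq2 along the top
    for residue, row in zip(seq1, matrix):
        lines.append(residue + "".join("*" if cell else "." for cell in row))
    return "\n".join(lines)


print(render_dot_matrix("GATTACA", "GATCACA"))
```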

Word or k-tuple method
This method aligns two sequences very quickly, first by searching for identical short stretches of sequence called words or k-tuples, and then by joining these words into an alignment using dynamic programming. BLAST and FASTA are heuristic methods.
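A minimal sketch of the word (k-tuple) search step, assuming exact word matches only: the query's words of length k are indexed in a dictionary, and the database sequence is scanned once, reporting every (query position, database position) pair that shares a word. The names `build_word_index` and `find_word_hits` are hypothetical; BLAST and FASTA implement this step with their own, more elaborate lookup structures.

```python
from collections import defaultdict


def build_word_index(seq, k):
    """Map every word (k-tuple) in seq to the list of positions where it starts."""
    index = defaultdict(list)
    for i in range(len(seq) - k + 1):
        index[seq[i:i + k]].append(i)
    return index


def find_word_hits(query, subject, k):
    """Return sorted (query_pos, subject_pos) pairs where an identical word occurs."""
    index = build_word_index(query, k)
    hits = []
    for j in range(len(subject) - k + 1):
        # .get() avoids inserting empty lists into the defaultdict for absent words
        for i in index.get(subject[j:j + k], []):
            hits.append((i, j))
    return sorted(hits)


print(find_word_hits("GATTACA", "TTACG", 3))  # [(2, 0), (3, 1)]
```

Both hits lie on the same diagonal (query position minus database position is 2 for each), which is the kind of pattern the joining step builds on.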

BLAST: introduction
The Basic Local Alignment Search Tool (BLAST) is a popular, user-friendly tool for searching all the major sequence databases. It is used to find homologs of a query sequence, which helps predict the query's identity, function, and 3D structure. It gives better results for protein sequences than for nucleotide sequences.

Salient features
Local alignment: BLAST tries to find patches of regional similarity rather than a global fit between the query and the database sequence. BLAST works under the assumption that high-scoring alignments are likely to contain short stretches of identical or nearly identical letters, called words.

Overview
BLAST is extremely fast. The program can be run locally, or queries can be submitted to the NCBI server. It is not guaranteed to find the best alignment between the query and a database sequence, and it may miss matches. This is because its heuristic strategy is designed to find most matches, sacrificing complete sensitivity to gain speed.

Working (brief)
BLAST searches in two phases. First, it looks for short subsequences that are likely to have significant matches. Then it tries to extend these matched regions on both sides to obtain the maximum sequence similarity.

General working of BLAST

Substitution matrix
A substitution matrix is a scoring method used when aligning one residue against another. Margaret Dayhoff and her co-workers developed the first substitution matrices for comparing protein sequences in evolutionary terms; these are commonly called PAM matrices. As an alternative to PAM, Steven Henikoff and his co-workers developed the BLOSUM matrices.

Substitution matrix: PAM and BLOSUM
Percent accepted mutation (PAM): PAM matrices are based on global alignments of closely related proteins. The number accompanying PAM refers to evolutionary distance, and larger numbers represent greater evolutionary distance. PAM250 is widely used.
BLOSUM (blocks substitution matrix): BLOSUM matrices are based on local alignments. Smaller numbers correspond to more distantly related sequences. BLOSUM62 is widely used.
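To show how a substitution matrix is used, the sketch below scores an ungapped alignment by summing the matrix value for each aligned residue pair. Only a small excerpt of BLOSUM62 is typed in (pairs among I, L, V, K, R, C and W), so this is an illustration rather than a usable scoring table; the dictionary name and helper functions are hypothetical.

```python
# Small excerpt of BLOSUM62; the full matrix covers all 20 amino acids.
# Only one orientation of each symmetric pair is stored.
BLOSUM62_EXCERPT = {
    ("I", "I"): 4, ("L", "L"): 4, ("V", "V"): 4,
    ("K", "K"): 5, ("R", "R"): 5, ("C", "C"): 9, ("W", "W"): 11,
    ("I", "L"): 2, ("I", "V"): 3, ("L", "V"): 1, ("K", "R"): 2,
}


def pair_score(a, b, matrix):
    """Look up a residue pair in either order (substitution matrices are symmetric)."""
    if (a, b) in matrix:
        return matrix[(a, b)]
    return matrix[(b, a)]  # raises KeyError if the pair is not in the excerpt


def ungapped_score(seq1, seq2, matrix):
    """Sum pair scores column by column for two equal-length, gap-free sequences."""
    if len(seq1) != len(seq2):
        raise ValueError("ungapped alignment needs equal-length sequences")
    return sum(pair_score(a, b, matrix) for a, b in zip(seq1, seq2))


print(ungapped_score("ILKV", "VIRV", BLOSUM62_EXCERPT))  # 3 + 2 + 2 + 4 = 11
```

Conservative substitutions such as I/V or K/R still score positively, which is why a matrix-based score separates related from unrelated sequences better than simple identity counting.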

Steps involved
Pre-processing of the query: quickly locate ungapped similarity between the query sequence and sequences from the database. All words of length W in the query are compared with the database sequences.
Generation of hits: a hit is made of one or several successive pairs of similar words and is characterised by its position in each of the two sequences. All possible hits between the query and the database are calculated.

Extension of the hits: every hit is now extended, without gaps, to determine whether it may be part of a larger segment of similarity. Every extended segment pair that scores the same as or better than S (set as a parameter of the program) is kept and is called an HSP (high-scoring segment pair).
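The extension step can be sketched as below, assuming a simple nucleotide scoring scheme (+1 for a match, -1 for a mismatch) instead of a full substitution matrix. Each hit of word length k is extended to the right and then to the left without gaps, and extension in a direction stops once the running score falls more than `x_drop` below the best score seen so far (the X-drop rule used by BLAST). Extended pairs scoring at least `min_score`, standing in for S, are kept as HSPs. The function names, default scores and `x_drop` value are illustrative assumptions, not BLAST's defaults.

```python
def extend_hit(query, subject, i, j, k, match=1, mismatch=-1, x_drop=3):
    """Ungapped X-drop extension of a word hit at query[i:i+k] / subject[j:j+k].

    Returns (score, query_start, subject_start, length) of the best segment pair.
    """
    def pair(a, b):
        return match if a == b else mismatch

    best = sum(pair(query[i + t], subject[j + t]) for t in range(k))  # seed score

    # Extend to the right of the word.
    score, best_right, t = best, 0, 0
    while i + k + t < len(query) and j + k + t < len(subject):
        score += pair(query[i + k + t], subject[j + k + t])
        t += 1
        if score > best:
            best, best_right = score, t
        elif best - score > x_drop:
            break  # fell too far below the best score: stop extending

    # Extend to the left, starting from the best right-hand extension.
    score, best_left, t = best, 0, 0
    while i - t - 1 >= 0 and j - t - 1 >= 0:
        score += pair(query[i - t - 1], subject[j - t - 1])
        t += 1
        if score > best:
            best, best_left = score, t
        elif best - score > x_drop:
            break

    return best, i - best_left, j - best_left, best_left + k + best_right


def find_hsps(query, subject, hits, k, min_score):
    """Extend every hit and keep distinct segment pairs scoring >= min_score."""
    extended = {extend_hit(query, subject, i, j, k) for i, j in hits}
    return sorted(hsp for hsp in extended if hsp[0] >= min_score)


query, subject = "AAGATTACAGG", "CCGATTACATT"
print(find_hsps(query, subject, [(4, 4), (3, 3)], k=3, min_score=5))
# [(7, 2, 2, 7)]: query[2:9] and subject[2:9] are both "GATTACA"
```

The two seeds sit in the same region, so they extend to the same segment pair and the set keeps it only once.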

Types of BLAST
There are five standard types of BLAST: BLASTp, BLASTn, BLASTx, tBLASTn, and tBLASTx. Other classes include MegaBLAST, PSI-BLAST, and PHI-BLAST.

BLASTp: compares an amino acid query sequence against a protein sequence database.
BLASTn: compares a nucleotide query sequence against a nucleotide sequence database.
BLASTx: compares the six-frame translation products of a nucleotide query sequence against a protein database.
tBLASTn: compares a protein query sequence against a nucleotide database translated in all six reading frames.
tBLASTx: compares the six-frame translations of a nucleotide query sequence against the six-frame translations of a nucleotide database.
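BLASTx, tBLASTn and tBLASTx all rely on translating DNA in six reading frames: three on the given strand and three on its reverse complement. The sketch below shows that translation using the standard genetic code (NCBI translation table 1), written compactly as a 64-letter string in TCAG codon order; stop codons appear as '*'. The function names are hypothetical, alternative genetic codes are not supported, and codons containing ambiguous bases are simply mapped to 'X'.

```python
from itertools import product

# Standard genetic code: codons enumerated in TCAG order (TTT, TTC, TTA, TTG, TCT, ...).
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)}

COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna):
    """Translate codon by codon; 1-2 trailing bases are dropped, unknown codons become 'X'."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Frames +1, +2, +3 on the given strand, then the same three on the reverse complement."""
    dna = dna.upper()
    strands = (dna, reverse_complement(dna))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


print(six_frame_translations("ATGGCC"))  # ['MA', 'W', 'G', 'GH', 'A', 'P']
```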

MegaBLAST: a program optimized for aligning long sequences; it works only with DNA sequences.
PSI-BLAST: position-specific iterated BLAST, useful for protein similarity searches. It builds a position-specific scoring matrix from the hits of each round and searches again with it, which helps detect distantly related proteins.
PHI-BLAST: pattern-hit initiated BLAST, used to search for a specific pattern or motif.
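PHI-BLAST takes a pattern in addition to the query. Only the pattern-matching part is sketched here: a small converter from a PROSITE-style pattern (x for any residue, [..] for allowed residues, {..} for forbidden residues, (n) or (n,m) for repeats, elements separated by '-') to a Python regular expression, plus a scan of a sequence for matches. It handles only that assumed subset of the syntax, the function names and example pattern are hypothetical, and it does not perform the similarity search that PHI-BLAST combines with the pattern.

```python
import re

# One pattern element: a residue, x, [allowed] or {forbidden}, with optional (n) or (n,m).
ELEMENT = re.compile(r"(\[[A-Z]+\]|\{[A-Z]+\}|[A-Z]|x)(?:\((\d+)(?:,(\d+))?\))?")


def prosite_to_regex(pattern):
    """Convert a PROSITE-style pattern such as 'C-x(2,4)-C' into a regex string."""
    parts = []
    for element in pattern.rstrip(".").split("-"):
        m = ELEMENT.fullmatch(element)
        if m is None:
            raise ValueError(f"unsupported pattern element: {element!r}")
        core, low, high = m.groups()
        if core == "x":
            regex = "."                      # any residue
        elif core.startswith("{"):
            regex = "[^" + core[1:-1] + "]"  # any residue except those listed
        else:
            regex = core                     # single residue or [allowed] class
        if high is not None:
            regex += "{" + low + "," + high + "}"
        elif low is not None:
            regex += "{" + low + "}"
        parts.append(regex)
    return "".join(parts)


def find_motif_positions(sequence, pattern):
    """Return 0-based start positions of non-overlapping matches of the pattern."""
    return [m.start() for m in re.finditer(prosite_to_regex(pattern), sequence)]


print(prosite_to_regex("C-x(2,4)-C-[LIVM]-{P}"))  # C.{2,4}C[LIVM][^P]
print(find_motif_positions("AACAACLSTT", "C-x(2,4)-C-[LIVM]-{P}"))  # [2]
```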

FASTA
FASTA is a sequence analysis tool similar to BLAST. It was developed by W. R. Pearson and D. J. Lipman, and the program can be accessed from the EBI website. FASTA gives better results for nucleotide sequences than for protein sequences; FASTP is the version for protein sequences.

Working (brief)
FASTA finds regions of similarity by first breaking the sequence into short subsequences (words) and then searching for the diagonals with the highest density of matching words. The alignment along these diagonals is then refined. It is fast but is not guaranteed to find the best alignment, and it may miss matches.

Steps involved
First, FASTA prepares a list of words from the pair of sequences to be matched. Words can be 3 to 6 nucleotides or 1 to 2 amino acids long. It matches the words between the two sequences and counts the matches. It then builds word diagonals and finds the highest-scoring match; this score is labeled init1. Only if this score is sizable does FASTA proceed to the second level. In the second level, for each of the best word hits it looks for neighboring approximate hits and, if the score is good, joins them into a larger dot-matrix diagonal region.
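The diagonal idea can be sketched as follows: every word hit at query position i and database position j lies on diagonal i - j, so counting hits per diagonal and taking the most populated one approximates the first-level search. This is a simplified illustration under stated assumptions (exact word matches, every word position indexed, no substitution-matrix rescoring), so it does not compute FASTA's actual init1 or initn scores; the function names are hypothetical.

```python
from collections import Counter


def diagonal_counts(query, subject, k):
    """Count identical k-tuple matches on each diagonal (query_pos - subject_pos)."""
    positions = {}
    for i in range(len(query) - k + 1):
        positions.setdefault(query[i:i + k], []).append(i)
    counts = Counter()
    for j in range(len(subject) - k + 1):
        for i in positions.get(subject[j:j + k], []):
            counts[i - j] += 1
    return counts


def densest_diagonal(query, subject, k):
    """Return (diagonal, number_of_word_matches) for the best diagonal, or None."""
    counts = diagonal_counts(query, subject, k)
    if not counts:
        return None
    return counts.most_common(1)[0]


print(densest_diagonal("ACGTACGT", "CGTACGTT", 2))  # (1, 6)
```

Here the database sequence is essentially the query shifted by one position, so six word matches fall on diagonal 1; the weaker diagonals 5 and -3 only collect matches caused by the repeated ACGT unit.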

The best score from this second level of scoring is called initn. The initn scores are saved for each comparison of the query sequence with a database sequence.

Types
Programs in the FASTA package include:
FASTP: compares protein sequences.
TFASTA: compares a query protein sequence against a DNA sequence database.
FASTF: compares a set of ordered peptide fragments against a protein database; the fragments are obtained by cleaving and sequencing a protein from a band resolved by electrophoresis.
TFASTF: compares a set of ordered peptide fragments against a DNA database.

Thank you