Sequence alignment tool and definitions

Soumyajitdey27 40 views 15 slides Sep 26, 2024


Sequence Alignment

Gaps in an Alignment
– Gap opening penalty
– Gap extension penalty
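A minimal sketch of how the two penalties are commonly combined into an affine gap cost. The slide only names the two terms, so the formula, the function name `affine_gap_cost`, and the default values 10 and 1 are assumptions for illustration. Some tools instead charge the extension penalty for every gap position, including the first.

```python
def affine_gap_cost(gap_length: int, gap_open: float = 10.0, gap_extend: float = 1.0) -> float:
    """Cost of one contiguous gap under an assumed affine scheme.

    Assumption: the opening penalty is charged once for the first gap
    position, and the extension penalty for each additional position.
    """
    if gap_length <= 0:
        return 0.0
    return gap_open + gap_extend * (gap_length - 1)


if __name__ == "__main__":
    # One long gap is cheaper than several short gaps of the same total length.
    print(affine_gap_cost(3))      # a single gap of length 3
    print(3 * affine_gap_cost(1))  # three separate gaps of length 1
```

Under this scheme a single long gap costs less than several short ones of the same total length, which is why the opening penalty is usually set higher than the extension penalty.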

Scoring Matrices

Scoring matrices assign a score to each comparison of a pair of characters. The scores are integers: identical or similar character pairs receive a positive score, and dissimilar pairs a negative one. The matrices were constructed by analysing known families of proteins.

Scoring Matrices: BLOSUM versus PAM

The PAM family
– PAM matrices are based on global alignments of closely related proteins.
– PAM1 is the matrix calculated from comparisons of sequences with no more than 1% divergence; the other PAM matrices are extrapolated from PAM1.
– Developed by Margaret Dayhoff and co-workers.

The BLOSUM family
– BLOSUM matrices are based on local alignments (blocks).
– All BLOSUM matrices are based on observed alignments rather than extrapolation. BLOSUM62, for example, is calculated from blocks in which sequences sharing 62% identity or more are clustered together and counted as one.

Naming
– Higher numbers in the PAM naming scheme denote larger evolutionary distance; in BLOSUM it is the opposite.
– For alignment of distant proteins, use PAM150 instead of PAM100, or BLOSUM50 instead of BLOSUM62.

Scoring Matrices: which one to use

For global alignments, use PAM matrices:
– Lower PAM matrices find short alignments of highly similar regions.
– Higher PAM matrices find weaker, longer alignments.

For local alignments, use BLOSUM matrices:
– BLOSUM matrices with high numbers are better for similar sequences.
– BLOSUM matrices with low numbers are better for distant sequences.

Assignment 2: Introduction to BLAST (Basic Local Alignment Search Tool)

BLAST Results
– Max score: the score of the highest-scoring HSP from that database sequence.
– Total score: the total score of all HSPs from that database sequence.
– Query coverage: the percentage of the query length that is covered.
– Max identity: the maximal percent identity of the HSPs.
– HSP (High-scoring Segment Pair): a local alignment that achieves one of the highest alignment scores in a given search.
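As a rough illustration of how these four columns relate to the HSPs of one database sequence, the sketch below derives them from a list of HSP records. The `HSP` dataclass, its field names, and the `summarize_hits` function are hypothetical and are not BLAST's own data model; query coverage is assumed to be the union of the query intervals divided by the query length.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class HSP:
    """Hypothetical record for one high-scoring segment pair."""
    score: float
    query_start: int   # 1-based, inclusive
    query_end: int     # 1-based, inclusive
    identities: int    # number of identical aligned positions
    align_length: int  # number of alignment columns


def summarize_hits(hsps: List[HSP], query_length: int) -> Dict[str, float]:
    """Compute max score, total score, query coverage and max identity."""
    # Merge overlapping or adjacent query intervals so no position is counted twice.
    intervals = sorted((h.query_start, h.query_end) for h in hsps)
    covered = 0
    cur_start, cur_end = intervals[0]
    for start, end in intervals[1:]:
        if start <= cur_end + 1:
            cur_end = max(cur_end, end)
        else:
            covered += cur_end - cur_start + 1
            cur_start, cur_end = start, end
    covered += cur_end - cur_start + 1

    return {
        "max_score": max(h.score for h in hsps),
        "total_score": sum(h.score for h in hsps),
        "query_coverage_pct": 100.0 * covered / query_length,
        "max_identity_pct": max(100.0 * h.identities / h.align_length for h in hsps),
    }


if __name__ == "__main__":
    hits = [HSP(100, 1, 50, 45, 50), HSP(40, 41, 80, 30, 40)]
    print(summarize_hits(hits, query_length=100))
```

The two example HSPs overlap on query positions 41 to 50, so those positions count only once towards coverage.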

Query: the sequence used for the search.
Subject: a sequence that was found to match the similarity criteria.

Steps for searching a protein sequence database with a query protein sequence include the following.

Example: searching with the word PQG
– The score of the word matched to itself is taken from the BLOSUM62 matrix as the sum of the log-odds scores of a P-P match, a Q-Q match and a G-G match: 7 + 5 + 6 = 18.
– Similarly, matching PQG against other words gives PEG = 15, PRG = 14, PSG = 13 and PQA = 12.
– If the cutoff score T is 13, the possible matches to PQG include PEG (15) but not PQA (12).
– The above procedure is repeated for each three-letter word in the query sequence.
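The word-scoring arithmetic above can be reproduced with a short sketch. Only the BLOSUM62 entries needed for this example are hard-coded; the names `BLOSUM62_SUBSET`, `word_score` and `neighborhood` are illustrative, and a real search would use the full matrix and enumerate all possible three-letter words.

```python
# Only the BLOSUM62 entries needed for the PQG example (not the full matrix).
BLOSUM62_SUBSET = {
    ("P", "P"): 7, ("Q", "Q"): 5, ("G", "G"): 6,
    ("Q", "E"): 2, ("Q", "R"): 1, ("Q", "S"): 0, ("G", "A"): 0,
}


def pair_score(a: str, b: str) -> int:
    """Look up a residue pair; the matrix is symmetric."""
    if (a, b) in BLOSUM62_SUBSET:
        return BLOSUM62_SUBSET[(a, b)]
    return BLOSUM62_SUBSET[(b, a)]


def word_score(query_word: str, candidate: str) -> int:
    """Sum the pairwise scores of two equal-length words, position by position."""
    return sum(pair_score(a, b) for a, b in zip(query_word, candidate))


def neighborhood(query_word: str, candidates: list, threshold: int) -> list:
    """Keep the candidate words whose score reaches the cutoff T."""
    scored = [(word, word_score(query_word, word)) for word in candidates]
    return [(word, score) for word, score in scored if score >= threshold]


if __name__ == "__main__":
    words = ["PQG", "PEG", "PRG", "PSG", "PQA"]
    for word, score in neighborhood("PQG", words, threshold=13):
        print(word, score)
```

With T = 13 the sketch keeps PQG, PEG, PRG and PSG and drops PQA, matching the numbers on the slide.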

Is the similarity significant or could it have arisen by chance? If the score of the alignment observed is no better than might be expected from a random permutation of the sequence, then it is likely to have arisen by chance. The alignment is unlikely to be significant, if the randomized sequences score as well as the original one.

Significance of BLAST results: Z-score and P-value

Z-score: reflects the extent to which the original result is an outlier from the scores of the randomized sequences. A Z-score of 0 means the observed similarity is no better than the average of random permutations of the sequence, and might well have arisen by chance.

P-value: another measure of significance. It is the probability that the observed match could have happened by chance.
– P <= 10^-100: exact match
– P in the range 10^-100 to 10^-50: sequences very nearly identical
– P in the range 10^-50 to 10^-10: closely related sequences, homology certain
– P in the range 10^-5 to 10^-1: usually distant relatives
– P > 10^-1: match probably insignificant
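The Z-score idea can be sketched by shuffling one sequence many times and comparing the observed score with the distribution of shuffled scores. This is an illustration under stated assumptions, not how BLAST computes significance: the scoring scheme (match +2, mismatch -1, linear gap -2), the function names, and the number of shuffles are all chosen for the example.

```python
import random
import statistics


def local_score(a: str, b: str, match: int = 2, mismatch: int = -1, gap: int = -2) -> int:
    """Best local alignment score (Smith-Waterman with a linear gap penalty)."""
    prev = [0] * (len(b) + 1)
    best = 0
    for i in range(1, len(a) + 1):
        curr = [0] * (len(b) + 1)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            curr[j] = max(0, diag, prev[j] + gap, curr[j - 1] + gap)
            best = max(best, curr[j])
        prev = curr
    return best


def z_score(observed: float, random_scores: list) -> float:
    """Distance of the observed score from the random mean, in standard deviations."""
    mean = statistics.mean(random_scores)
    sd = statistics.pstdev(random_scores)
    if sd == 0:
        return 0.0  # no spread in the random scores; treat as not distinguishable
    return (observed - mean) / sd


def permutation_z(query: str, subject: str, n_shuffles: int = 200, seed: int = 0) -> float:
    """Z-score of the real alignment against alignments to shuffled subjects."""
    rng = random.Random(seed)
    observed = local_score(query, subject)
    letters = list(subject)
    shuffled_scores = []
    for _ in range(n_shuffles):
        rng.shuffle(letters)
        shuffled_scores.append(local_score(query, "".join(letters)))
    return z_score(observed, shuffled_scores)


if __name__ == "__main__":
    print(permutation_z("ACDEFGHIKLMN", "ACDEFGHIKLMN"))
```

Shuffling preserves the composition of the sequence, so a high Z-score indicates that the order of residues, not just their frequencies, drives the similarity.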

Significance of BLAST results: E-value

The E-value of an alignment is the expected number of sequences that give the same Z-score or better if the database is probed with a random sequence. E is found by multiplying the value of P by the size of the database probed. E-values range between 0 and the number of sequences in the database searched.
– E <= 0.02: sequences probably homologous
– E between 0.02 and 1: homology cannot be ruled out
– E > 1: a match this good is expected by chance
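A small sketch of the relationship stated above, E = P × database size, together with the slide's rule-of-thumb thresholds. The function names and the example numbers are illustrative only.

```python
def e_value(p_value: float, database_size: int) -> float:
    """E = P multiplied by the number of sequences in the database probed."""
    return p_value * database_size


def interpret_e(e: float) -> str:
    """Apply the rule-of-thumb thresholds from the slide."""
    if e <= 0.02:
        return "sequences probably homologous"
    if e <= 1:
        return "homology cannot be ruled out"
    return "a match this good is expected by chance"


if __name__ == "__main__":
    # Hypothetical P-values against a database of 500,000 sequences.
    for p in (1e-9, 1e-6, 1e-4):
        e = e_value(p, 500_000)
        print(p, e, interpret_e(e))
```

The same P-value becomes less convincing as the database grows, because more sequences offer more opportunities for a chance match.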

Sequence types used in BLAST may differ.

Algorithms may also differ:
– PSI-BLAST
– PHI-BLAST
