Scoring schemes in bioinformatics (BLOSUM)

6,888 views 12 slides Jan 12, 2020

About This Presentation

BLOSUM SCORING MATRIX


Slide Content

SCORING SCHEMES IN BIOINFORMATICS (BLOSUM)

CONTENT
- INTRODUCTION TO BLOSUM
- ALGORITHM
- BLOSUM-62 MATRIX
- THE BLOSUM SCORE
- COMPARISON BETWEEN PAM AND BLOSUM
- SIGNIFICANCE OF SCORING MATRICES

INTRODUCTION TO BLOSUM
BLOSUM (BLOcks SUbstitution Matrix) is based on blocks built from protein groups identified by PROSITE signatures (signatures are short expressions like C-X-X-C-X-X-X-C). In short, the BLOSUM approach is as follows:
- A series of amino acid substitution matrices is derived by directly observing every possible amino acid substitution in multiple sequence alignments.
- The matrices were constructed from more than 2000 conserved amino acid patterns (each locally aligned to give a "block") representing 500 groups of protein sequences.
- Blocks are locally conserved regions: ungapped alignments of fewer than sixty amino acid residues.
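The signature notation mentioned above can be made concrete with a minimal sketch. It assumes only the subset of the notation that appears on the slide (residues separated by "-", with X standing for any residue); real PROSITE syntax has more features that are not handled here. The helper name `prosite_to_regex` and the test sequence are hypothetical, made up for illustration.

```python
import re


def prosite_to_regex(pattern: str) -> str:
    """Translate a simple signature such as C-X-X-C-X-X-X-C into a regex.

    Assumption: elements are single residues separated by '-', and 'X'
    means "any residue". Ranges and alternatives are not supported.
    """
    parts = []
    for element in pattern.split("-"):
        parts.append("." if element.upper() == "X" else re.escape(element))
    return "".join(parts)


regex = prosite_to_regex("C-X-X-C-X-X-X-C")

# Hypothetical sequence, invented for this example.
sequence = "MKACPECGKSCFQ"
match = re.search(regex, sequence)
if match:
    print(f"signature found at position {match.start()}: {match.group()}")
```

The matched stretch is the kind of locally conserved region around which an ungapped block would be aligned.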

INTRODUCTION TO BLOSUM (cont.)
- More constrained regions are likely to be related to structure or function.
- Blocks contain sequences at all evolutionary distances and may be highly biased (e.g. many identical sequences).
- The frequencies of amino acid substitutions in these blocks are calculated to produce a numerical table, the block substitution matrix. The construction procedure deals with both bias and distance.

ALGORITHM
The algorithm is as follows:
- Cluster all sequences that share at least X% identity.
- Each cluster counts as one sequence.
- If X is 100%, this simply removes identical sequences; if X is less than 100%, it reduces the weight of closely related sequences.
- Calculate the substitution frequencies and the log-odds matrix.
This gives a BLOSUM X table.
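The clustering and counting steps can be sketched roughly as follows. This is an illustration under stated assumptions, not the original implementation: sequences at or above the identity threshold are grouped (consistent with the BLOSUM 62 example in the deck), clusters are formed by single linkage, and each cluster contributes a total weight of one sequence. The function names and the four-sequence toy block are hypothetical.

```python
from collections import defaultdict
from itertools import combinations


def percent_identity(a: str, b: str) -> float:
    """Percentage of identical positions in two equal-length ungapped rows."""
    same = sum(x == y for x, y in zip(a, b))
    return 100.0 * same / len(a)


def cluster_block(block, threshold):
    """Single-linkage clustering (an assumption): a sequence joins every
    cluster that holds a member at least `threshold` percent identical."""
    clusters = []
    for seq in block:
        merged, unlinked = [seq], []
        for cluster in clusters:
            if any(percent_identity(seq, m) >= threshold for m in cluster):
                merged.extend(cluster)
            else:
                unlinked.append(cluster)
        clusters = unlinked + [merged]
    return clusters


def weighted_pair_counts(clusters):
    """Count residue pairs between clusters only, so that each cluster
    carries the weight of a single sequence."""
    counts = defaultdict(float)
    width = len(clusters[0][0])
    for ca, cb in combinations(clusters, 2):
        weight = 1.0 / (len(ca) * len(cb))
        for s in ca:
            for t in cb:
                for col in range(width):
                    pair = tuple(sorted((s[col], t[col])))
                    counts[pair] += weight
    return dict(counts)


# Toy block, invented for illustration: four ungapped rows of width 4.
block = ["ACDE", "ACDE", "ACDQ", "GCNE"]
clusters = cluster_block(block, threshold=62)
counts = weighted_pair_counts(clusters)
for pair, n in sorted(counts.items()):
    print(pair, round(n, 3))
```

With a threshold of 62 the first three rows collapse into one cluster, so the near-duplicates no longer dominate the counts; with a threshold of 100 only the two identical rows are merged.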

ALGORITHM (cont.)
The BLOSUM numbers are the actual percentage identity thresholds used when selecting sequences to construct each matrix. In the reverse of the PAM numbering system, the lower the BLOSUM number, the more divergent the sequences it represents. For example, in BLOSUM 62 sequences that are 62% or more identical are clustered, and in BLOSUM 80 sequences that are 80% or more identical are clustered.

BLOSUM-62 MATRIX

THE BLOSUM SCORE
The BLOSUM score for a particular residue pair is derived from the log ratio of the observed substitution frequency of that pair to the frequency expected by chance from the background frequencies of the two residues. The log odds is taken to base 2 (instead of base 10 as in the PAM matrices). The resulting value is rounded to the nearest integer and entered into the substitution matrix. A positive score corresponds to a substitution that occurs more frequently than expected by chance among evolutionarily conserved replacements; a negative score corresponds to one that occurs less frequently than expected.
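As a rough sketch of the log-odds calculation, the code below turns pair counts into integer scores. Two things are assumptions rather than statements from the slide: the expected frequency of a pair is taken as p_i * p_i for identical residues and 2 * p_i * p_j for different ones, and the log ratio is multiplied by 2 (half-bit units, the scaling commonly quoted for the published BLOSUM 62 table). The function name and the toy counts are hypothetical.

```python
import math


def log_odds_scores(pair_counts, scale=2.0):
    """Convert unordered residue-pair counts into rounded log-odds scores.

    `scale=2.0` (half-bit units) is an assumption; the slide only states
    that the logarithm is taken to base 2 and the result is rounded.
    """
    total = sum(pair_counts.values())
    observed = {pair: n / total for pair, n in pair_counts.items()}

    # Background frequency of each residue, derived from the pair frequencies.
    background = {}
    for (a, b), freq in observed.items():
        if a == b:
            background[a] = background.get(a, 0.0) + freq
        else:
            background[a] = background.get(a, 0.0) + freq / 2
            background[b] = background.get(b, 0.0) + freq / 2

    scores = {}
    for (a, b), freq in observed.items():
        if a == b:
            expected = background[a] * background[b]
        else:
            expected = 2 * background[a] * background[b]
        scores[(a, b)] = round(scale * math.log2(freq / expected))
    return scores


# Toy counts for a two-letter alphabet, invented for illustration.
toy_counts = {("A", "A"): 60, ("A", "S"): 20, ("S", "S"): 20}
scores = log_odds_scores(toy_counts)
print(scores)
```

In this toy case the A/S pair is seen less often than chance predicts and gets a negative score, while both identities get positive scores.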

COMPARISON BETWEEN PAM AND BLOSUM
There are several differences between the two families of matrices:
- The main difference is that, except for PAM1, the PAM matrices are derived from an evolutionary model, whereas the BLOSUM matrices consist entirely of direct observations. BLOSUM matrices may therefore have less evolutionary meaning than PAM matrices, which is why PAM matrices are used for building phylogenetic trees.
- Because the PAM matrices rely on mathematical extrapolation, PAM values may be less realistic for divergent sequences.
- The BLOSUM matrices are derived entirely from local alignments of conserved sequence blocks, whereas the PAM1 matrix is based on global alignments of full-length sequences containing both conserved and nonconserved regions. This is why the BLOSUM matrices are more advantageous for searching databases and finding conserved domains in proteins.
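The extrapolation behind the higher-numbered PAM matrices can be illustrated with a toy example: a one-step mutation probability matrix is multiplied by itself n times to model n steps of change. The two-letter alphabet and the probabilities below are invented for illustration and are not real PAM values; the helper names are hypothetical.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_power(m, n):
    """Raise a square matrix to the n-th power by repeated multiplication."""
    size = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(size)]
              for i in range(size)]
    for _ in range(n):
        result = mat_mul(result, m)
    return result


# Toy "PAM1-like" matrix: each residue has a 1% chance of changing per step.
toy_pam1 = [[0.99, 0.01],
            [0.01, 0.99]]

toy_pam2 = mat_power(toy_pam1, 2)
toy_pam250 = mat_power(toy_pam1, 250)

print("2 steps:  ", [round(x, 4) for x in toy_pam2[0]])
print("250 steps:", [round(x, 4) for x in toy_pam250[0]])
```

Every entry of the 250-step matrix is computed from the one-step matrix rather than observed in distantly related sequences, which is the reason given above for PAM values being less realistic at large divergence.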

COMPARISON BETWEEN PAM AND BLOSUM (cont.)
- Several empirical tests have shown that the BLOSUM matrices outperform the PAM matrices in the accuracy of local alignment. This could be largely because the BLOSUM matrices are derived from a much larger and more representative dataset than the one used for the PAM matrices, which makes the BLOSUM values more reliable.
- Newer matrices have been derived using the same approach as PAM but with much larger datasets, to compensate for deficiencies in the PAM system. These include the Gonnet matrices and the Jones-Taylor-Thornton matrices. They have been shown to perform as well as BLOSUM in regular alignment and are robust in phylogenetic tree construction.

SIGNIFICANCE OF SCORING MATRICES
- Bioinformatics is largely concerned with detecting evolutionary relationships between sequences. Scoring matrices extend the ability to detect distant relationships far beyond what can be found with the identity matrix.
- For distant relationships, including sequences with less than 30% identical residues, it is preferable to compare protein sequences rather than nucleic acid sequences. Because a scoring matrix encodes the changes that protein structure tolerates, protein sequence alignment can reveal much more distant evolutionary relationships than a naive comparison of nucleic acid sequences.
- Scoring matrices appear in every analysis that involves pairwise comparison, and the choice of matrix can strongly influence the outcome of the analysis.
- Each scoring matrix implicitly represents a particular theory of evolution. Understanding the theory underlying a given matrix helps in choosing the right one.
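The gap between identity scoring and substitution-matrix scoring can be seen in a small sketch. The two four-residue peptides are invented for illustration, and the dictionary holds only a four-entry excerpt of BLOSUM 62 scores (the pairs needed here), not the full matrix; the helper names are hypothetical.

```python
# Excerpt of BLOSUM 62 scores for the residue pairs used below.
BLOSUM62_EXCERPT = {
    ("I", "L"): 2,
    ("K", "R"): 2,
    ("D", "E"): 2,
    ("I", "V"): 3,
}


def pair_score(a: str, b: str) -> int:
    """Look up a pair in either order (the matrix is symmetric)."""
    if (a, b) in BLOSUM62_EXCERPT:
        return BLOSUM62_EXCERPT[(a, b)]
    return BLOSUM62_EXCERPT[(b, a)]


def identity_score(s: str, t: str) -> int:
    """Identity matrix scoring: 1 for a match, 0 for a mismatch."""
    return sum(1 for a, b in zip(s, t) if a == b)


def matrix_score(s: str, t: str) -> int:
    """Sum of substitution scores over an ungapped alignment."""
    return sum(pair_score(a, b) for a, b in zip(s, t))


# Hypothetical peptides with no identical positions but similar chemistry.
seq_a = "LKDI"
seq_b = "IREV"

print("identity score:", identity_score(seq_a, seq_b))
print("BLOSUM 62 score:", matrix_score(seq_a, seq_b))
```

Identity scoring sees no relationship between the two peptides, while the substitution scores reward every position as a conservative replacement.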
