Limitations of identity scoring
Identity scoring assigns 1 to a match and 0 to a mismatch. Its diagnostic power is poor because every identical match carries the same weight and every mismatch is treated as equally bad. Moving from DNA to protein alignment, the alphabet grows from 4 to 20 letters, so the scoring matrix becomes more complicated for proteins than for DNA. Different amino acids also partially match in their chemical properties, which a 1/0 identity score cannot capture.
Positive mismatches
In this example, a hydrophobic interaction between Met and Ile stabilizes binding, and this interaction is essential to cell survival. A Met → Leu substitution does not alter the hydrophobic interaction, whereas a Met → Arg substitution does. So in the course of evolution, a Met → Leu substitution is more likely to occur than a Met → Arg substitution. A unit matrix scheme, which scores both mismatches as 0, is not justified in this case.
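The weakness described above can be seen in a few lines of Python. This is only a minimal sketch: the function name `identity_score` and the four-residue peptides are hypothetical, chosen to mirror the Met → Leu and Met → Arg example.

```python
def identity_score(seq_a, seq_b):
    """Score an ungapped alignment: 1 per identical position, 0 otherwise."""
    if len(seq_a) != len(seq_b):
        raise ValueError("sequences must have the same length")
    return sum(1 for a, b in zip(seq_a, seq_b) if a == b)


# Hypothetical peptides used only for illustration.
reference = "MKVI"
conservative = "LKVI"  # Met -> Leu, hydrophobic character preserved
radical = "RKVI"       # Met -> Arg, hydrophobic character lost

print(identity_score(reference, conservative))
print(identity_score(reference, radical))
```

Both comparisons receive the same score, even though the first substitution is far more likely to be tolerated. That is the gap a substitution matrix is meant to close.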
PAM: Point Accepted Mutation
Introduced by Margaret Dayhoff (1978), based on evolutionary distances obtained from alignments of 71 groups of closely related protein sequences. An accepted point mutation is a change in a single amino acid (a substitution) that has been accepted by natural selection: a mutation in the gene region coding for one amino acid produces a different amino acid, and that mutation becomes the predominant form in a species. 1 PAM means one accepted point mutation per 100 amino acids. The model is based on global alignment (alignment of the entire sequences).
PAM: Point Accepted Mutation
The model makes the Markovian assumption that each amino acid change at a site is independent of previous changes at that site. We can therefore cover as much evolutionary divergence as we need (higher PAM units) by applying the same PAM1 matrix again and again. 1 PAM is denoted PAM1, and PAM1 × PAM1 = PAM2. In general, PAMx = (PAM1)^x, that is, PAM1 multiplied by itself x times. PAM250 = (PAM1)^250 underlies one of the most widely used scoring matrices.
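The extrapolation PAMx = (PAM1)^x is repeated matrix multiplication. The sketch below uses a hypothetical 3-letter alphabet and made-up probabilities (not Dayhoff's data) so that the matrices stay readable; each column holds the probabilities that the column's residue is replaced by each row's residue, so every column sums to 1.

```python
def mat_mul(a, b):
    """Multiply two square matrices given as lists of rows."""
    n = len(a)
    return [[sum(a[i][k] * b[k][j] for k in range(n)) for j in range(n)]
            for i in range(n)]


def mat_pow(m, x):
    """Return m multiplied by itself x times; x = 0 gives the identity (PAM0)."""
    n = len(m)
    result = [[1.0 if i == j else 0.0 for j in range(n)] for i in range(n)]
    for _ in range(x):
        result = mat_mul(result, m)
    return result


# Hypothetical "PAM1" for a 3-letter alphabet (illustrative values only).
toy_pam1 = [
    [0.990, 0.004, 0.002],
    [0.006, 0.990, 0.008],
    [0.004, 0.006, 0.990],
]

toy_pam2 = mat_pow(toy_pam1, 2)
toy_pam250 = mat_pow(toy_pam1, 250)

for row in toy_pam250:
    print([round(value, 3) for value in row])
```

With growing x the diagonal shrinks and the off-diagonal entries grow, while each column still sums to 1.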
PAM: Point Accepted Mutation
PAM100 does not mean that after 100 PAM of evolution every residue will have changed. Some residues may have mutated several times, some may have returned to their original state, and some may not have changed at all.
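A rough calculation shows why many residues survive 100 PAM untouched. The sketch assumes, as a simplification that is not part of Dayhoff's model, that every site has the same 1% chance of substitution per PAM step and that steps are independent; the function name is hypothetical.

```python
def fraction_never_substituted(pam_units, per_step_change=0.01):
    """Fraction of sites with no substitution at all after pam_units steps,
    assuming an identical, independent change probability at every step."""
    return (1.0 - per_step_change) ** pam_units


print(round(fraction_never_substituted(100), 3))
print(round(fraction_never_substituted(250), 3))
```

Under this assumption roughly 37% of sites have never been substituted after 100 PAM. The fraction of sites that look unchanged is somewhat higher still, because some substituted sites return to their original residue.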
PAM matrix origin
The matrix is based on 71 groups of closely related proteins. Accepted point mutations are inferred from the types of changes observed in these proteins and tabulated. The relative mutability of each amino acid is also calculated. These two data sets are combined into the mutation probability matrix. The elements of this matrix give the probability that the amino acid in one column will be replaced by the amino acid in some row after a given evolutionary interval. The 0 PAM matrix has ones on the main diagonal and zeroes elsewhere. Dayhoff et al., 1978
Numbers of accepted point mutations (×10) accumulated from closely related sequences. Dayhoff et al., 1978
Computation of relative mutability Dayhoff et al., 1978
PAM matrix origin
The values in the mutation probability matrix M are as follows.
Non-diagonal elements have the value
$$M_{ij} = \frac{\lambda \, m_j \, A_{ij}}{\sum_{i} A_{ij}}$$
where A_ij is an element of the accepted point mutation matrix, λ is a proportionality constant, and m_j is the mutability of the j-th amino acid.
Diagonal elements have the value
$$M_{jj} = 1 - \lambda \, m_j$$
Dayhoff et al., 1978
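The construction can be sketched in Python under stated assumptions. The off-diagonal rule M_ij = λ m_j A_ij / Σ_i A_ij and the diagonal rule M_jj = 1 − λ m_j follow Dayhoff et al. (1978). Choosing λ so that the frequency-weighted chance of change is 1% (1 PAM) is the usual convention and is assumed here. The counts, mutabilities, and frequencies are hypothetical values for a 3-letter alphabet.

```python
def mutation_probability_matrix(counts, mutability, freqs, expected_change=0.01):
    """Build M where M[i][j] is the probability that residue j is replaced by i."""
    n = len(counts)
    # Scale so that sum_j freqs[j] * (1 - M[j][j]) equals expected_change.
    lam = expected_change / sum(f * m for f, m in zip(freqs, mutability))
    matrix = [[0.0] * n for _ in range(n)]
    for j in range(n):
        column_total = sum(counts[i][j] for i in range(n) if i != j)
        for i in range(n):
            if i == j:
                matrix[i][j] = 1.0 - lam * mutability[j]
            else:
                matrix[i][j] = lam * mutability[j] * counts[i][j] / column_total
    return matrix


# Hypothetical inputs (not Dayhoff's data).
toy_counts = [[0, 30, 10], [30, 0, 20], [10, 20, 0]]  # accepted point mutations A
toy_mutability = [1.0, 0.5, 0.8]                      # relative mutabilities m_j
toy_freqs = [0.5, 0.3, 0.2]                           # residue frequencies f_j

toy_m = mutation_probability_matrix(toy_counts, toy_mutability, toy_freqs)
for row in toy_m:
    print([round(value, 5) for value in row])
```

Each column of the result sums to 1, and the frequency-weighted average of the diagonal is 0.99, so the matrix corresponds to 1 PAM.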
Mutation probability matrix for an evolutionary distance of 1 PAM (for readability, elements are multiplied by 10,000). Dayhoff et al., 1978
Mutation probability matrix for an evolutionary distance of 250 PAM (for readability, elements are multiplied by 100), obtained as PAMx = (PAM1)^x with x = 250. Dayhoff et al., 1978
PAM matrix origin
Relatedness odds matrix (odds score matrix):
$$R_{ij} = \frac{M_{ij}}{f_i}$$
where f_i is the observed frequency of amino acid A_i. Dayhoff et al., 1978
When two sequences are compared position by position, the odds for each position must be multiplied to get the odds score for the whole protein. The logarithm of the odds matrix (multiplied by 10) is therefore more convenient, because the total score of all substitutions is obtained by summation. It is used to build the final log-odds matrix (lod score):
$$S_{ij} = 10 \log_{10} R_{ij}$$
Reciprocal substitutions are treated as equivalent (A → B = B → A), so each entry of the lod score matrix is filled with the average of the lod scores of the two alternative substitutions.
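The step from probabilities to scores can be sketched as follows, assuming R_ij = M_ij / f_i and a lod score of 10 · log10(R_ij), averaged over the two reciprocal substitutions and rounded to an integer. The 3-letter alphabet, matrix, and frequencies are hypothetical, and the function names are illustrative.

```python
import math


def log_odds_matrix(m, freqs):
    """Convert mutation probabilities into symmetric, rounded lod scores."""
    n = len(m)
    raw = [[10.0 * math.log10(m[i][j] / freqs[i]) for j in range(n)]
           for i in range(n)]
    # Average the two reciprocal substitutions so that S[i][j] == S[j][i].
    return [[round((raw[i][j] + raw[j][i]) / 2.0) for j in range(n)]
            for i in range(n)]


def alignment_score(seq_a, seq_b, scores, alphabet):
    """Sum per-position lod scores over an ungapped alignment."""
    index = {letter: k for k, letter in enumerate(alphabet)}
    return sum(scores[index[a]][index[b]] for a, b in zip(seq_a, seq_b))


# Hypothetical long-distance mutation probabilities; columns sum to 1.
toy_alphabet = "ABC"
toy_m = [
    [0.60, 0.30, 0.20],
    [0.25, 0.50, 0.20],
    [0.15, 0.20, 0.60],
]
toy_freqs = [0.4, 0.3, 0.3]

toy_scores = log_odds_matrix(toy_m, toy_freqs)
print(toy_scores)
print(alignment_score("ABCA", "ABAC", toy_scores, toy_alphabet))
```

Because the scores are logarithms, adding them along the alignment corresponds to multiplying the per-position odds.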
Figure: The PAM250 log-odds matrix. It is the most widely used PAM matrix and corresponds to sequences that retain about 20% identity. Matrix values: > 0 → likely mutation; = 0 → neutral or random; < 0 → unlikely mutation. Opperdoes et al.
Correspondence between observed differences between proteins and their evolutionary distance
A greater PAM value means a greater evolutionary distance, and vice versa.
References
Dayhoff, M., Schwartz, R., & Orcutt, B. (1978). Chapter 22: A model of evolutionary change in proteins. In Atlas of protein sequence and structure (Vol. 5, pp. 345-352).
PAM matrices. (n.d.). Retrieved May 30, 2020, from http://www.cs.tau.ac.il/~rshamir/algmb/98/scribe/html/lec03/node9.html
Bioinformatics tutorial: Construction of substitution matrices part II: PAM matrices 2020. (n.d.). Retrieved June 1, 2020, from https://bioinformaticshome.com/bioinformatics_tutorials/sequence_alignment/substitution_matrices_page2.html
Opperdoes, F., & Lemey, P. (2018). Phylogenetic analysis using protein sequences.
PAM Matrices. (n.d.). Retrieved June 2, 2020, from http://www.quretec.com/u/vilo/edu/2002-03/Tekstialgoritmid_I/Loengud/Loeng3_Edit_Distance/pam.html