Global alignment

13 slides, Jul 17, 2012

About This Presentation

Global Alignment algorithm with example and applications


Slide Content

Global Alignment, by Pinky Sheetal V (M.Tech Bioinformatics)

Contents: Sequence Alignment, Dynamic Programming Algorithm, Global Alignment

Sequence Alignment: the result of inserting gaps into the strings so that, afterwards, as many positions as possible coincide. Example: X: AGGCTATCA, Y: TAGCTATCA
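To make "as many positions as possible coincide" concrete, here is a small Python sketch that counts coinciding columns for X and Y. It checks them first without gaps and then with one hypothetical gap placement chosen by hand, which is not claimed to be optimal. The gap symbol '-' and the function name are assumptions made for illustration.

```python
def coinciding_positions(a: str, b: str) -> int:
    """Count columns where two equal-length (possibly gapped) strings agree.

    Gap symbol assumed to be '-'; a column of two gaps is not counted.
    """
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    return sum(1 for p, q in zip(a, b) if p == q and p != "-")


X, Y = "AGGCTATCA", "TAGCTATCA"
print(coinciding_positions(X, Y))  # no gaps: 7

# Hypothetical gap placement, chosen by hand for illustration only.
x_gapped, y_gapped = "-AGGCTATCA", "TA-GCTATCA"
# Removing the gaps must give back the original strings.
assert x_gapped.replace("-", "") == X and y_gapped.replace("-", "") == Y
print(coinciding_positions(x_gapped, y_gapped))  # with two gaps: 8
```

Inserting two gaps shifts the leading "AG" of X so that eight columns coincide instead of seven.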

Scoring weights: for a match, +m; for a mismatch, -s; for a gap, -d.
Alignment score: F = (#matches) × m - (#mismatches) × s - (#gaps) × d
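A minimal Python sketch of the score F for two already-aligned strings. It treats '-' as the gap symbol, which is an assumption because the slides do not fix a gap character, and it counts each gap column once, as the formula does. The weights m = 1, s = 1, d = 2 and the gapped pair "-AGGCTATCA" / "TA-GCTATCA" in the usage lines are hypothetical. The last line uses the worked example's weights (match +2, mismatch 0, gap -1, i.e. m = 2, s = 0, d = 1).

```python
def alignment_score(a: str, b: str, m: int, s: int, d: int) -> int:
    """F = (#matches) * m - (#mismatches) * s - (#gaps) * d for an aligned pair."""
    if len(a) != len(b):
        raise ValueError("aligned strings must have the same length")
    matches = mismatches = gaps = 0
    for p, q in zip(a, b):
        if p == "-" and q == "-":
            raise ValueError("a column cannot contain two gaps")
        if p == "-" or q == "-":
            gaps += 1          # one gap character costs d (linear gap cost)
        elif p == q:
            matches += 1
        else:
            mismatches += 1
    return matches * m - mismatches * s - gaps * d


# Hypothetical weights m=1, s=1, d=2 on X/Y, without and with gaps.
print(alignment_score("AGGCTATCA", "TAGCTATCA", 1, 1, 2))    # 7 - 2 = 5
print(alignment_score("-AGGCTATCA", "TA-GCTATCA", 1, 1, 2))  # 8 - 4 = 4
# Worked-example weights: m=2, s=0, d=1.
print(alignment_score("ATTATCT-", "-TT-TCTA", 2, 0, 1))      # 10 - 3 = 7
```

With these hypothetical weights, the gapped pair has more coinciding positions but a lower score. The gap penalty d decides whether adding a gap is worth it.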

Dynamic Programming Algorithm: a complex problem is broken into subproblems (sub-problems 1, 2 and 3). Each subproblem is solved (solutions 1, 2 and 3), and these solutions are combined into the complete solution.
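Applied to alignment, each subproblem is the best score for a pair of prefixes, and the answer for longer prefixes is built from shorter ones. The following top-down Python sketch memoizes those subproblems with `functools.lru_cache`. The function and parameter names are illustrative, and the default scores are the ones used in the worked example (match +2, mismatch 0, gap -1).

```python
from functools import lru_cache


def best_score(s: str, t: str, match: int = 2, mismatch: int = 0, gap: int = -1) -> int:
    """Top-down DP: solve(i, j) is the best global score of s[:i] against t[:j]."""

    @lru_cache(maxsize=None)  # each (i, j) subproblem is solved once, then reused
    def solve(i: int, j: int) -> int:
        if i == 0:
            return j * gap  # t[:j] aligned against gaps only
        if j == 0:
            return i * gap  # s[:i] aligned against gaps only
        pair = match if s[i - 1] == t[j - 1] else mismatch
        return max(
            solve(i - 1, j - 1) + pair,  # align s[i-1] with t[j-1]
            solve(i - 1, j) + gap,       # s[i-1] against a gap
            solve(i, j - 1) + gap,       # t[j-1] against a gap
        )

    return solve(len(s), len(t))


print(best_score("ATTATCT", "TTTCTA"))  # 7
```

Recursion depth grows with len(s) + len(t), so this form suits short inputs. The matrix-filling form in the Needleman-Wunsch method computes the same values bottom-up without recursion.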

Global Alignment

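The global alignment algorithm proposed by Needleman and Wunsch obtains the optimal alignment under a linear gap cost by assigning a score to each position of the aligned sequences. It is based on the dynamic programming technique. For two sequences of lengths m and n, we define a matrix of dimensions (m+1) × (n+1). The first row and first column hold cumulative gap penalties, and each remaining cell holds the best score for aligning the corresponding prefixes of the two sequences.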
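One way to write the fill rule uses the weights m, s and d from the scoring slide. Note that m here is the match weight, not a sequence length. The notation x_i, y_j for the i-th and j-th characters of the two sequences is assumed for this sketch:

```latex
\begin{aligned}
F(0,0) &= 0, \qquad F(i,0) = -i\,d, \qquad F(0,j) = -j\,d,\\
F(i,j) &= \max\bigl\{\, F(i-1,j-1) + \sigma(x_i, y_j),\; F(i-1,j) - d,\; F(i,j-1) - d \,\bigr\},\\
\sigma(x_i, y_j) &= \begin{cases} +m & \text{if } x_i = y_j,\\ -s & \text{otherwise.} \end{cases}
\end{aligned}
```

Summing the column scores of any alignment gives (#matches) × m - (#mismatches) × s - (#gaps) × d, which is the same F as on the scoring slide.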

Termination condition: the optimal score for the two sequences is the value in the last cell of the matrix, at the last row and last column.

Sequences: S = ATTATCT, T = TTTCTA
Match score = +2, mismatch score = 0, gap penalty = -1
T runs along the top of the matrix and S down the side. Each cell (i, j) is computed from its neighbours (i-1, j-1), (i-1, j) and (i, j-1).

       _   T   T   T   C   T   A
   _   0  -1  -2  -3  -4  -5  -6
   A  -1   0  -1  -2  -3  -4  -3
   T  -2   1   2   1   0  -1  -2
   T  -3   0   3   4   3   2   1
   A  -4  -1   2   3   4   3   4
   T  -5  -2   1   4   3   6   5
   C  -6  -3   0   3   6   5   6
   T  -7  -4  -1   2   5   8   7
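The following minimal Python sketch fills the same matrix bottom-up for the worked example and prints it with S down the side and T along the top. The function name `nw_matrix` and its parameter names are illustrative, not taken from the slides.

```python
def nw_matrix(s: str, t: str, match: int = 2, mismatch: int = 0, gap: int = -1) -> list[list[int]]:
    """Fill the (len(s)+1) x (len(t)+1) Needleman-Wunsch score matrix."""
    rows, cols = len(s) + 1, len(t) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap  # s[:i] aligned against gaps only
    for j in range(1, cols):
        F[0][j] = j * gap  # t[:j] aligned against gaps only
    for i in range(1, rows):
        for j in range(1, cols):
            diag = F[i - 1][j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            up = F[i - 1][j] + gap    # s[i-1] against a gap
            left = F[i][j - 1] + gap  # t[j-1] against a gap
            F[i][j] = max(diag, up, left)
    return F


S, T = "ATTATCT", "TTTCTA"
F = nw_matrix(S, T)
print("    " + "".join(f"{c:>4}" for c in "_" + T))
for label, row in zip("_" + S, F):
    print(f"{label:>4}" + "".join(f"{v:>4}" for v in row))
print("optimal score:", F[-1][-1])  # 7
```

Filling row by row guarantees that the three neighbours of each cell are already known when that cell is computed.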


Optimal alignment:
S: A T T A T C T -
T: - T T - T C T A
Number of matches = 5, number of mismatches = 0, number of gaps = 3
Score F = (5 × 2) - (0 × 0) - (3 × 1) = 7
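The following Python sketch fills the matrix and then traces back from the bottom-right cell to recover one optimal alignment. It then recounts matches, mismatches and gaps. The tie-breaking order (diagonal, then up, then left) is an assumption of this sketch. Along the path for this example, each cell has exactly one predecessor, so any order gives the alignment shown above.

```python
def global_align(s: str, t: str, match: int = 2, mismatch: int = 0, gap: int = -1):
    """Fill the Needleman-Wunsch matrix, then trace back one optimal alignment."""

    def pair(a: str, b: str) -> int:
        return match if a == b else mismatch

    rows, cols = len(s) + 1, len(t) + 1
    F = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        F[i][0] = i * gap
    for j in range(1, cols):
        F[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            F[i][j] = max(F[i - 1][j - 1] + pair(s[i - 1], t[j - 1]),
                          F[i - 1][j] + gap,
                          F[i][j - 1] + gap)

    # Traceback: step to the neighbour whose value produced F[i][j].
    top, bottom = [], []
    i, j = len(s), len(t)
    while i > 0 or j > 0:
        if i > 0 and j > 0 and F[i][j] == F[i - 1][j - 1] + pair(s[i - 1], t[j - 1]):
            top.append(s[i - 1])
            bottom.append(t[j - 1])
            i, j = i - 1, j - 1
        elif i > 0 and F[i][j] == F[i - 1][j] + gap:
            top.append(s[i - 1])      # character of s against a gap
            bottom.append("-")
            i -= 1
        else:
            top.append("-")           # character of t against a gap
            bottom.append(t[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), F[-1][-1]


a, b, score = global_align("ATTATCT", "TTTCTA")
matches = sum(p == q for p, q in zip(a, b))  # no column holds two gaps
gaps = a.count("-") + b.count("-")
mismatches = len(a) - matches - gaps
print(a)  # ATTATCT-
print(b)  # -TT-TCTA
print(matches, mismatches, gaps, score)  # 5 0 3 7
```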

Tools that use the global alignment algorithm: EMBOSS Needle, EMBOSS Stretcher
Applications: identifying conserved interaction pathways and complexes [Brian P. Kelley et al., 2003]; functional orthology detection [Rohit Singh et al., 2008]
Advantage: the sequences are aligned end to end, so similar regions are kept in the same order and orientation.
Disadvantages: slow and memory-intensive, since the full matrix needs O(mn) time and space; it cannot be applied to genome-sized sequences.
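Much of the memory cost comes from storing all (m+1) × (n+1) cells. If only the optimal score is needed, each row depends only on the row above it, so two rows are enough. The following Python sketch (names illustrative) shows this. Time stays O(mn). Recovering the alignment itself in linear space needs a divide-and-conquer refinement such as Hirschberg's algorithm, which is not shown here.

```python
def nw_score_two_rows(s: str, t: str, match: int = 2, mismatch: int = 0, gap: int = -1) -> int:
    """Needleman-Wunsch score only, keeping two matrix rows instead of the full table."""
    prev = [j * gap for j in range(len(t) + 1)]  # row 0: t[:j] against gaps
    for i in range(1, len(s) + 1):
        curr = [i * gap] + [0] * len(t)          # column 0: s[:i] against gaps
        for j in range(1, len(t) + 1):
            diag = prev[j - 1] + (match if s[i - 1] == t[j - 1] else mismatch)
            curr[j] = max(diag, prev[j] + gap, curr[j - 1] + gap)
        prev = curr                               # row i-1 is no longer needed
    return prev[-1]


print(nw_score_two_rows("ATTATCT", "TTTCTA"))  # 7
```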

Thank you