Global and local alignment (bioinformatics)

44,063 views 18 slides Mar 22, 2017
Slide 1
Slide 1 of 18
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18

About This Presentation

A general global alignment technique is the Needleman–Wunsch algorithm, which is based on dynamic programming. Local alignments are more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context.


Slide Content

11/9/2016 Introduction to Bioinformatics 1

11/9/2016 Introduction to Bioinformatics 2 Group Member 151-15-255 151-15-453 151-15-240 151-15-245

11/9/2016 Introduction to Bioinformatics 3

11/9/2016 Introduction to Bioinformatics 4 Wh y align sequences? Useful for discovering Functional Structura l and E volutionary relationship F or example To find whethe r two (o r more ) gene s o r protein s are evolutionarily related to each other Two proteins with similar sequences will probably be structurally or functionally similar

11/9/2016 Introduction to Bioinformatics 5 Global Vs Local Alignment Global Alignment A general global alignment technique is the Needleman– Wunsch algorithm, which is based on dynamic programming. Attempts to align the maximum of the entire sequence Suitable for similar and equal length sequences Local Alignment – Local alignments are more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. Stretches of sequences with highest density of matches are aligned Suitable for partially similar, different length and conserved region containing sequences

11/9/2016 Introduction to Bioinformatics 6

11/9/2016 Introduction to Bioinformatics 7 Allows obtaining the optimal alignment with linear gap cost has been proposed by Needleman and Wunsch by providing a score, for each position of the aligned sequences. Based on the dynamic programming technique. For two sequences of length m and n we define a matrix of dimensions m+1 and n+1. Global Alignment

8 Global Alignment Three steps in dynamic programming Initialization Matrix fill (scoring) Traceback (alignment) Smith–Waterman algorithm

11/9/2016 Introduction to Bioinformatics 9 Sequences: S: ATTATCT T: TTTCTA T S _ A T T A T C T _ T T T C T A -1 -2 -3 -4 -5 -6 -7 -1 -2 -3 -4 -5 -6 -1 -2 -3 -4 -5 1 2 1 -1 -2 3 4 3 2 1 -1 2 3 4 3 4 -2 1 4 3 6 5 -3 3 6 5 6 -4 -1 2 5 8 7 Match Score = +2 Mismatch Score = 0 Gap Penalty = -1 i-1, j-1 i-1, j I, j-1 I, j

11/9/2016 Introduction to Bioinformatics 10 _ A T T A T C T _ T T T C T A -1 -2 -3 -4 -5 -6 -7 -1 -2 -3 -4 -5 -6 -1 -2 -3 -4 -5 1 2 1 -1 -2 3 4 3 2 1 -1 2 3 4 3 4 -2 1 4 3 6 5 -3 3 6 5 6 -4 -1 2 5 8 7 T S

11/9/2016 Introduction to Bioinformatics 11 Optimal Alignment: S T No: of matches = 5 No: of mismatches = 3 (5 x 2) – (3 x -1) = 7 A T T A T C T – - T T – T C T A

11/9/2016 Introduction to Bioinformatics 12

13 S/T A T G A T G T A G G A G A T G T G C 0 + 2 +-2 0 + -2 2 + 2 = 2 0 + -2 = 0 0 + -2 = 0 Match : 2, Mismatch : -1, Gap : -2 0 + -1 0 + -2 0 + -2 + 2 = 0 + -2 = 0 0 + -2 = 0 Matrix fill (scoring )

14 S/T A T G A T G T A G G 2 2 A 2 4 2 2 G 1 2 2 3 4 2 4 A 2 4 2 2 2 4 2 T 4 2 2 6 4 4 2 3 G 2 6 4 4 8 6 4 4 T 2 4 4 6 6 10 8 6 G 4 3 4 8 8 9 10 C 2 3 1 6 7 7 8 Match : 2, Mismatch : -1, Gap : -2 Matrix fill (scoring )

Trace back 15 S/T A T G A T G T A G G 2 2 A 2 4 2 2 G 1 2 2 3 4 2 4 A 2 4 2 2 2 4 2 T 4 2 2 6 4 4 2 3 G 2 6 4 4 8 6 4 4 T 2 4 4 6 6 10 8 6 G 4 3 4 8 8 9 10 C 2 3 1 6 7 7 8 Match : 2, Mismatch : -1, Gap : -2

Alignment 16 G A T G T A G | | | | | | | G A T G T - G 2 2 2 2 2 -2 2 6 X 2 = 12 1 X -2 = -2 10 G A T G T | | | | | G A T G T 2 2 2 2 2 5 X 2 = 10 10

11/9/2016 Introduction to Bioinformatics 17

11/9/2016 Introduction to Bioinformatics 18