Sequence alignment global vs. local

38,286 views 21 slides Sep 14, 2019

About This Presentation

It delivers the details of global and local sequence alignment in bioinformatics.


Slide Content

Sequence alignment: global vs. local alignment
Presented by Fathima Hameed

Outline
- Introduction
- Principle
- Types of alignment: global alignment, local alignment, semi-global alignment
- Difference between global and local
- Dynamic programming method
- Advantages
- Disadvantages
- References

Introduction
A sequence alignment is a way of arranging the primary sequences of DNA, RNA, or protein to identify regions of similarity that may be a consequence of functional, structural, or evolutionary relationships between the sequences. An alignment is made between a known sequence and an unknown sequence, or between two unknown sequences. The known sequence is called the reference sequence; the unknown sequence is called the query sequence.

Principle
- Alignment can reveal homology between sequences.
- Similarity is a descriptive term for the degree of match between two sequences.
- Sequence similarity does not always imply a common function.
- Conserved function does not always imply similarity at the sequence level.
- Convergent evolution: sequences can be highly similar without being homologous.

Types of alignment
Based on completeness, alignments are classified into three types:
1. Global alignment
2. Local alignment
3. Semi-global alignment

Global alignment
Global alignment matches the residues of two sequences across their entire length. Because it attempts to align every residue in every sequence, it is most useful when the sequences in the query set are similar and of roughly equal size. A general global alignment technique is the Needleman-Wunsch algorithm, which is based on dynamic programming.

Local alignment
Local alignment matches the regions of two sequences that are most similar to each other. It is more useful for dissimilar sequences that are suspected to contain regions of similarity or similar sequence motifs within their larger sequence context. The Smith-Waterman algorithm is a general local alignment method, also based on dynamic programming.

Semi-global alignment
This is a hybrid method, known as semi-global or "glocal" alignment. It finds the best possible alignment that includes the start and end of one or the other sequence. This can be especially useful when the downstream part of one sequence overlaps with the upstream part of the other sequence.
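The overlap case described above can be sketched in code. This is a minimal illustration, not part of the original slides: the function name `semi_global_score`, the scoring values, and the two test strings are assumptions chosen for the example. It treats gaps at either end of either sequence as free, which is one common way to define a semi-global (ends-free) alignment.

```python
def semi_global_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best ends-free alignment score of a (rows) against b (columns).

    Scoring values are illustrative assumptions, not taken from the slides.
    """
    rows, cols = len(a) + 1, len(b) + 1
    # Leading gaps are free, so the first row and first column stay at zero.
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(
                score[i - 1][j - 1] + pair,  # align a[i-1] with b[j-1]
                score[i - 1][j] + gap,       # gap in b
                score[i][j - 1] + gap,       # gap in a
            )
    # Trailing gaps are free, so the alignment may end anywhere
    # on the last row or in the last column.
    last_row = score[-1]
    last_col = [row[-1] for row in score]
    return max(last_row + last_col)


# The end of the first sequence (CGT) overlaps the start of the second.
print(semi_global_score("AAACGT", "CGTTT"))  # prints 3
```

The only differences from a global alignment are the zero-filled borders and the choice of end cell; the fill rule inside the matrix is unchanged.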

Global sequence alignment | Local sequence alignment
Aligns the entire sequence | Finds local regions of similarity
Contains all letters from both the query and target sequences | Aligns a substring of the query sequence to a substring of the target sequence
Best when the sequences have about the same length and are quite similar | Finds stretches of sequence with a high level of matches
Suitable for aligning two closely related sequences | Suitable for aligning more distantly related sequences
Usually done for comparing homologous genes | Used for finding conserved patterns in DNA
Technique: Needleman-Wunsch algorithm | Technique: Smith-Waterman algorithm
Examples: EMBOSS Needle; Needleman-Wunsch Global Align Nucleotide Sequences (specialized BLAST) | Examples: BLAST; EMBOSS Water; LALIGN

Dynamic programming in bioinformatics
Dynamic programming is widely used in bioinformatics for tasks such as sequence alignment, protein folding, RNA structure prediction, and protein-DNA binding. Needleman and Wunsch described a general algorithm for sequence alignment. It maximizes a similarity score to give the maximum match, where the maximum match is the largest number of nucleotides of one sequence that can be matched with those of the other. The aim is to quantify the similarity between two sequences.

Dynamic programming method
- It was introduced by Richard Bellman in the 1950s.
- The word "programming" here denotes finding an acceptable plan of action, not computer programming.
- It is useful for aligning the nucleotide sequences of DNA and the amino acid sequences of the proteins coded by that DNA.
- It solves a complex problem by breaking it into simpler subproblems.
Dynamic programming is a three-step process:
1. Initialization
2. Matrix filling (scoring)
3. Traceback and alignment

Dynamic programming in sequence alignment
1. Initialization: the first step in the global alignment dynamic programming approach is to create a matrix with M+1 columns and N+1 rows, where M and N correspond to the lengths of the sequences to be aligned.
2. Matrix filling: fill each cell of the matrix with the highest possible score. A diagonal move aligns the next residue of each sequence; an off-diagonal (horizontal or vertical) move requires inserting a gap in the corresponding sequence.
3. Traceback and alignment: start from the last (bottom-right) corner and follow the arrows back.

Global alignment via dynamic programming
- The first row and first column correspond to a gap (no residue). Fill the first cell with zero, then fill the rest of the first row and first column with multiples of the gap penalty.
- While filling the matrix, each cell has three candidate values:
  horizontal: score of the cell to the left + gap penalty
  vertical: score of the cell above + gap penalty
  diagonal: score of the upper-left cell + match or mismatch score
- Write the maximum of these three values in the cell.
Let match = +1, mismatch = -1, gap penalty = -2.
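The initialization and fill rules above can be sketched as follows. This is a minimal illustration rather than a reference implementation; the function name `nw_fill` and the choice to put the first sequence along the columns are assumptions made for the example. The default scores are the ones given on the slide.

```python
def nw_fill(a, b, match=1, mismatch=-1, gap=-2):
    """Fill a Needleman-Wunsch score matrix; a runs along columns, b along rows."""
    rows, cols = len(b) + 1, len(a) + 1
    score = [[0] * cols for _ in range(rows)]
    # First row and first column: multiples of the gap penalty.
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[j - 1] == b[i - 1] else mismatch
            diagonal = score[i - 1][j - 1] + pair
            vertical = score[i - 1][j] + gap
            horizontal = score[i][j - 1] + gap
            score[i][j] = max(diagonal, vertical, horizontal)
    return score


for row in nw_fill("AAAC", "AGC"):
    print(row)
```

For the sequences AAAC and AGC, the bottom-right cell holds the final global alignment score, which is -1 with these scores.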

Let the sequences be AAAC (columns) and AGC (rows).

          A    A    A    C
     0   -2   -4   -6   -8
A   -2    1   -1   -3   -5
G   -4   -1    0   -2   -4
C   -6   -3   -2   -1   -1

Backward tracking
In backward tracking we start from the last cell (the bottom-right corner) and follow the arrows, moving each time to the cell from which the current cell's value was derived. The path gives the alignment, according to two rules:
1. If the value came from the diagonal cell, write the residues of both sequences.
2. If the value came from the horizontal or vertical cell, write the residue of the sequence along that direction and add a gap in the other sequence.
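The two traceback rules can be sketched as below. This is an illustrative sketch under stated assumptions: the function name `global_align` is hypothetical, arrows are not stored but recomputed from the scores, and ties are broken in favour of the diagonal move, so other equally scoring alignments may exist. The default scores are match = +1, mismatch = -1, gap = -2.

```python
def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch with traceback; returns (aligned_a, aligned_b, score)."""
    rows, cols = len(b) + 1, len(a) + 1
    score = [[0] * cols for _ in range(rows)]
    for j in range(1, cols):
        score[0][j] = j * gap
    for i in range(1, rows):
        score[i][0] = i * gap
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[j - 1] == b[i - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + pair,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Traceback: start at the bottom-right cell and walk back to the origin.
    i, j = len(b), len(a)
    top, bottom = [], []
    while i > 0 or j > 0:
        if i > 0 and j > 0:
            pair = match if a[j - 1] == b[i - 1] else mismatch
            if score[i][j] == score[i - 1][j - 1] + pair:
                # Rule 1: diagonal move, write both residues.
                top.append(a[j - 1])
                bottom.append(b[i - 1])
                i, j = i - 1, j - 1
                continue
        if i > 0 and score[i][j] == score[i - 1][j] + gap:
            # Rule 2 (vertical): residue of b against a gap in a.
            top.append("-")
            bottom.append(b[i - 1])
            i -= 1
        else:
            # Rule 2 (horizontal): residue of a against a gap in b.
            top.append(a[j - 1])
            bottom.append("-")
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[-1][-1]


aligned_a, aligned_b, total = global_align("AAAC", "AGC")
print(aligned_a)  # AAAC
print(aligned_b)  # -AGC
print(total)      # -1
```

Recomputing the source of each cell during traceback avoids keeping a second matrix of arrows, at the cost of repeating the three comparisons.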

Local alignment via dynamic programming
The algorithm is the same as for global alignment, with a few changes:
- The first row and first column are filled with zeros.
- Any value that would be negative is replaced by zero.
- Backtracking starts from the cell with the maximum value.
Let match = 1, mismatch = 0, gap penalty = 0.

Let the sequences be GAATTCAGTTA (columns) and GGATCGA (rows).

        G  A  A  T  T  C  A  G  T  T  A
     0  0  0  0  0  0  0  0  0  0  0  0
G    0  1  1  1  1  1  1  1  1  1  1  1
G    0  1  1  1  1  1  1  1  2  2  2  2
A    0  1  2  2  2  2  2  2  2  2  2  3
T    0  1  2  2  3  3  3  3  3  3  3  3
C    0  1  2  2  3  3  4  4  4  4  4  4
G    0  1  2  2  3  3  4  4  5  5  5  5
A    0  1  2  3  3  3  4  5  5  5  5  6

Backtracking
After the matrix fill step, the maximum alignment score for the two test sequences is 6. The traceback step determines the actual alignment that results in the maximum score. The rules are the same as for global alignment.

Seq 1: GAATTCAGTTA
Seq 2: GGA-TC-G--A

In this way we align the sequences using dynamic programming.
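The local fill and traceback can be sketched as follows. This is an illustrative sketch, not the presenter's code: the function name `local_align` is hypothetical, ties are broken in favour of the diagonal move, and the defaults are the slide's scores (match = 1, mismatch = 0, gap = 0). With those scores no cell can go negative, so the reset to zero never triggers; Smith-Waterman is usually run with negative mismatch and gap scores, which can be passed in as arguments.

```python
def local_align(a, b, match=1, mismatch=0, gap=0):
    """Smith-Waterman style local alignment; returns (score, aligned_a, aligned_b)."""
    rows, cols = len(b) + 1, len(a) + 1
    # First row and first column are zeros.
    h = [[0] * cols for _ in range(rows)]
    best, best_pos = 0, (0, 0)
    for i in range(1, rows):
        for j in range(1, cols):
            pair = match if a[j - 1] == b[i - 1] else mismatch
            # Negative values are replaced by zero.
            h[i][j] = max(0,
                          h[i - 1][j - 1] + pair,
                          h[i - 1][j] + gap,
                          h[i][j - 1] + gap)
            if h[i][j] > best:
                best, best_pos = h[i][j], (i, j)

    # Traceback starts at the maximum cell and stops at a zero cell or a border.
    i, j = best_pos
    top, bottom = [], []
    while i > 0 and j > 0 and h[i][j] > 0:
        pair = match if a[j - 1] == b[i - 1] else mismatch
        if h[i][j] == h[i - 1][j - 1] + pair:
            top.append(a[j - 1])
            bottom.append(b[i - 1])
            i, j = i - 1, j - 1
        elif h[i][j] == h[i - 1][j] + gap:
            top.append("-")
            bottom.append(b[i - 1])
            i -= 1
        else:
            top.append(a[j - 1])
            bottom.append("-")
            j -= 1
    return best, "".join(reversed(top)), "".join(reversed(bottom))


score, seq1, seq2 = local_align("GAATTCAGTTA", "GGATCGA")
print(score)  # 6
print(seq1)   # GAATTCAGTTA
print(seq2)   # GGA-TC-G--A
```

Because several paths can give the same score, a different tie-breaking order may return a different alignment with the same score of 6.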

Uses of sequencing
- It can be used to find genes, the segments of DNA that code for a specific protein or phenotype.
- If a region of DNA has been sequenced, it can be screened for characteristic features of genes.

Advantages of global alignment:
- Easy to understand; the complete sequences appear in the output.
- Checking minor differences between two sequences.
- Finding polymorphisms between two sequences.

Advantages of local alignment:
- Comparing mRNA with genomic DNA (introns and exons).
- Genes and proteins are modular.
- Finding repeat elements within one sequence.
- Possible to determine E-values.


Thank you