Dot matrix seminar

2,012 views 44 slides Sep 29, 2019

About This Presentation

It is about dot plot analysis.


Slide Content

Dot matrix m.Sri aravind lal b841018

Introduction In computational biology, a dot plot is a graphical method for comparing two biological sequences and identifying regions of close similarity. It is a type of recurrence plot (a graph with a horizontal and a vertical axis).

History Dot plots were introduced by Gibbs and McIntyre in 1970. They are two-dimensional matrices with the sequences of the proteins being compared along the vertical and horizontal axes. An individual cell in the matrix is shaded black if the residues are identical, so matching sequence segments appear as runs of diagonal lines across the matrix.

Principle The principle used to generate the dot plot is: the top (x) and left (y) axes of a rectangular array represent the two sequences to be compared. Calculation: matrix columns = residues of sequence 1; rows = residues of sequence 2.

Example Seq 1: TWILIGHTZONE Seq 2: MIDNIGHTZONE Matrix = 12 × 12. A dot is plotted at every coordinate where the residues of the two sequences are identical.
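The example above can be sketched in a few lines of Python. This is a minimal illustration, not the method of any particular tool; the function names `dot_matrix` and `render` are made up for this sketch. Columns follow sequence 1 and rows follow sequence 2, as on the principle slide.

```python
def dot_matrix(seq1, seq2):
    """Return a list of rows of booleans.

    Rows follow seq2 and columns follow seq1; a cell is True where
    the two residues are identical.
    """
    return [[a == b for a in seq1] for b in seq2]


def render(seq1, seq2):
    """Render the dot matrix as plain text, with '.' marking identity."""
    matrix = dot_matrix(seq1, seq2)
    lines = ["  " + " ".join(seq1)]  # header row: sequence 1 across the top
    for residue, row in zip(seq2, matrix):
        cells = " ".join("." if hit else " " for hit in row)
        lines.append(residue + " " + cells)
    return "\n".join(lines)


if __name__ == "__main__":
    print(render("TWILIGHTZONE", "MIDNIGHTZONE"))
```

Printing the rendered matrix shows the shared suffix IGHTZONE as an unbroken run on the main diagonal, with scattered single dots elsewhere.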

Dot plot interpretation Seq1: ATGATAT Seq2: ATGATAT

Simple plot terms Window: size of the sequence block used for comparison. Example: window = 1. Stringency: number of matches required within the window to score positive. Example: stringency = 1 (an exact match is required).
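How window and stringency interact can be shown with a small sketch. The function name `windowed_dots` and the choice to report each dot at the start position of its window are assumptions made for illustration; real tools may place the dot at the window centre instead.

```python
def windowed_dots(seq1, seq2, window=1, stringency=1):
    """Return (i, j) start positions where the two windows agree enough.

    i indexes seq1 (columns) and j indexes seq2 (rows). A dot is
    reported when at least `stringency` of the `window` aligned
    positions are identical.
    """
    dots = []
    for j in range(len(seq2) - window + 1):
        for i in range(len(seq1) - window + 1):
            matches = sum(seq1[i + k] == seq2[j + k] for k in range(window))
            if matches >= stringency:
                dots.append((i, j))
    return dots


if __name__ == "__main__":
    seq = "ATGATAT"
    print(len(windowed_dots(seq, seq, window=1, stringency=1)))
    print(windowed_dots(seq, seq, window=3, stringency=3))
```

With window = 1 and stringency = 1 every single identical residue gives a dot. Raising both to 3 on the ATGATAT self-comparison leaves only the main diagonal, which is how a larger window suppresses noise.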

DotPlot scoring Dotplot: a matrix with one sequence across the top and the other down the side. Put a dot, or 1, wherever there is identity.

  G A T C T
G .
A   .
T     .   .
C       .
T     .   .

Intragenic Comparison Rat Groucho Gene Groucho is a family of transcriptional co-repressor proteins.

Intergenic Comparison Rat and Drosophila Groucho Gene

Intergenic comparison The nucleotide sequence contains three domains.
50 - 350: strong conservation
Indel places the comparison out of register
450 - 1300: slightly weaker conservation
1300 - 2400: strong conservation

Analysis of dot plot matrix The principal diagonal shows identical sequence. Both global and local alignments are visible. Multiple parallel diagonals indicate repeats. A reverse diagonal (perpendicular to the main diagonal) indicates an inversion. A reverse diagonal crossing the main diagonal (an X shape) indicates a palindrome. A filled box indicates a low-complexity region.
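Reading diagonals by eye can be approximated in code. The sketch below, with the hypothetical function name `diagonal_runs` and an arbitrary minimum length, lists every maximal run of identical residues along a forward diagonal. In a self-comparison, runs away from the main diagonal correspond to direct repeats.

```python
def diagonal_runs(seq1, seq2, min_len=3):
    """Return (i, j, length) for each maximal forward-diagonal run.

    i is the start in seq1, j the start in seq2. Only runs of at
    least `min_len` identical residues are reported.
    """
    runs = []
    for i in range(len(seq1)):
        for j in range(len(seq2)):
            if seq1[i] != seq2[j]:
                continue
            # Skip cells that merely continue a run started earlier.
            if i > 0 and j > 0 and seq1[i - 1] == seq2[j - 1]:
                continue
            length = 0
            while (i + length < len(seq1) and j + length < len(seq2)
                   and seq1[i + length] == seq2[j + length]):
                length += 1
            if length >= min_len:
                runs.append((i, j, length))
    return runs


if __name__ == "__main__":
    seq = "ABCDXYABCD"  # made-up sequence with a direct repeat of ABCD
    for run in diagonal_runs(seq, seq, min_len=4):
        print(run)
```

For the made-up sequence ABCDXYABCD the output contains the full main diagonal plus two shorter off-diagonal runs, one on each side, which is the pattern a direct repeat produces.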

Direct repeat

Palindromic sequence A palindromic sequence is a nucleic acid sequence (DNA or RNA) that is the same whether read 5' to 3' on one strand or 5' to 3' on the complementary strand with which it forms a double helix.
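The definition amounts to saying that the sequence equals its own reverse complement. A minimal sketch for DNA follows; the function names are illustrative and only the four unambiguous bases are handled.

```python
# Translation table for the four unambiguous DNA bases.
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def is_palindromic(seq):
    """True if the sequence reads the same on both strands, 5' to 3'."""
    return seq == reverse_complement(seq)


if __name__ == "__main__":
    print(is_palindromic("GAATTC"))   # EcoRI recognition site
    print(is_palindromic("GATTACA"))
```

GAATTC, the EcoRI recognition site, passes the check; GATTACA does not.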

Inverted repeat An inverted repeat is a sequence of nucleotides followed downstream by its reverse complement. Inverted repeat: abcd ee dcba fghijklmno
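A brute-force search for inverted repeats can be sketched as follows. The function name `find_inverted_repeats`, the fixed arm length, and the maximum spacer length are all assumptions chosen for illustration, and the test sequence is made up.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(seq):
    """Return the complementary strand read 5' to 3'."""
    return seq.translate(COMPLEMENT)[::-1]


def find_inverted_repeats(seq, arm=4, max_gap=10):
    """Return (left_start, right_start) pairs of inverted-repeat arms.

    The right arm must equal the reverse complement of the left arm
    and begin at most `max_gap` bases after the left arm ends.
    """
    hits = []
    for i in range(len(seq) - 2 * arm + 1):
        target = reverse_complement(seq[i:i + arm])
        for gap in range(max_gap + 1):
            j = i + arm + gap
            if j + arm > len(seq):
                break
            if seq[j:j + arm] == target:
                hits.append((i, j))
    return hits


if __name__ == "__main__":
    # GGAC ... GTCC: the second arm is the reverse complement of the first.
    print(find_inverted_repeats("AAGGACTTTGTCCAA"))
```

In a dot plot of the sequence against its own reverse complement, each such pair would show up as a short diagonal.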

Low-complexity regions Low-complexity regions appear in a dot plot as regions around the diagonal that all obtain a high score. They are calculated from the redundancy of amino acids within a limited region.
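One common way to quantify redundancy within a limited region is Shannon entropy over a sliding window. The slide does not say which measure is meant, so entropy is an assumption here, as are the function names, the window size, and the threshold.

```python
import math
from collections import Counter


def window_entropy(window):
    """Shannon entropy in bits of the residue composition of a window."""
    counts = Counter(window)
    n = len(window)
    return -sum((c / n) * math.log2(c / n) for c in counts.values())


def low_complexity_windows(seq, size=8, threshold=1.0):
    """Return start positions of windows whose entropy is below threshold."""
    return [i for i in range(len(seq) - size + 1)
            if window_entropy(seq[i:i + size]) < threshold]


if __name__ == "__main__":
    # Made-up sequence with a run of A in the middle.
    print(low_complexity_windows("ACGTAAAAAAAAACGT"))
```

Windows dominated by a single residue have entropy near zero and are flagged. These are the same stretches that fill in as a box in the dot plot.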

Dot plot software The EMBOSS package provides the following dot plot programs: Dotmatcher, Dotpath, Polydot, Dottup (http://emboss.bioinformatics.nl/cgi-bin/emboss/dottup)

journals

Application Shows all possible alignments between two nucleic acid or amino acid sequences. Helps to recognise large regions of similarity. An excellent approach for finding sequence transpositions. Used to find the location of genes between two genomes. Used to find non-sequential alignments.

Limitation For longer sequences, the memory required for the graphical representation is very high, so long sequences cannot be aligned. Only two sequences can be compared at a time. Many insignificant matches make the plot noisy (many off-diagonal dots appear). The time required to compare two sequences is proportional to the product of the lengths of the sequences times the size of the search window, so the method is not very quick: it is efficient for short sequences and inefficient for long ones.
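The scaling described above can be made concrete with a back-of-the-envelope estimate. The function `dotplot_cost` is a hypothetical helper; it counts matrix cells and residue comparisons for a naive windowed comparison and ignores any optimisation a real program might apply.

```python
def dotplot_cost(len1, len2, window=1):
    """Estimate the work for a naive windowed dot plot.

    Returns the number of matrix cells and the number of residue
    comparisons (window positions times window size).
    """
    positions = max(len1 - window + 1, 0) * max(len2 - window + 1, 0)
    return {"cells": len1 * len2, "comparisons": positions * window}


if __name__ == "__main__":
    print(dotplot_cost(12, 12))            # the TWILIGHTZONE example
    print(dotplot_cost(1000, 1000, 10))    # two 1 kb sequences
```

Two 12-residue sequences need 144 comparisons, while two 1,000-residue sequences with a window of 10 already need close to ten million.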

Gap penalty A gap penalty is a method of scoring alignments of two or more sequences. Inserting a gap into a sequence can let it match more residues than the same sequence without the gap, but too many gaps can make an alignment meaningless. Types of gap penalty: constant, linear, affine.

Scoring schemes

Types of gap penalty Constant This is the simplest type of gap penalty: a fixed negative score is given to every gap, regardless of its length.
ATTGACCTGA
AT---CCTGA
Each match = 1, whole gap = -1. Score: 7 − 1 = 6.

Types of gap penalty Linear The linear gap penalty takes into account the length (L) of each insertion/deletion in the gap.
ATTGACCTGA
AT---CCTGA
Each match = 1, each gap position = -1. The score here is 7 − 3 = 4.
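The constant and linear schemes can be compared on the ATTGACCTGA example with a short sketch. The function names are illustrative, and the three `-` characters in the second sequence are an assumption: they mark the three gap positions implied by the scores 7 − 1 and 7 − 3.

```python
def gap_runs(aligned):
    """Return the lengths of consecutive '-' runs in an aligned sequence."""
    runs, current = [], 0
    for ch in aligned:
        if ch == "-":
            current += 1
        else:
            if current:
                runs.append(current)
            current = 0
    if current:
        runs.append(current)
    return runs


def alignment_score(ref, aligned, scheme, match=1, gap=1):
    """Score matches minus a constant or linear gap penalty."""
    matches = sum(a == b for a, b in zip(ref, aligned) if b != "-")
    runs = gap_runs(aligned)
    if scheme == "constant":
        penalty = gap * len(runs)   # one charge per gap, whatever its length
    elif scheme == "linear":
        penalty = gap * sum(runs)   # one charge per gap position
    else:
        raise ValueError("scheme must be 'constant' or 'linear'")
    return match * matches - penalty


if __name__ == "__main__":
    ref, aligned = "ATTGACCTGA", "AT---CCTGA"
    print(alignment_score(ref, aligned, "constant"))
    print(alignment_score(ref, aligned, "linear"))
```

The two calls give 6 and 4, the values from the constant and linear examples.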

Types of gap penalty Affine This is the most widely used gap penalty; it combines the constant and linear gap penalties. The penalty has the form A + B·L, where A is the gap opening penalty, B the gap extension penalty, and L the length of the gap. Gap opening refers to the cost required to open a gap of any length, and gap extension to the cost of extending the length of an existing gap by 1.
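The A + B·L form can be sketched directly. The values A = 2 and B = 1 below are arbitrary choices for illustration, and the gapped sequence `AT---CCTGA` is assumed from the earlier gap-penalty examples. Some references charge A + B·(L − 1) instead, so the convention of a given tool should be checked.

```python
def affine_gap_penalty(length, gap_open=2.0, gap_extend=1.0):
    """Penalty A + B*L for a gap of `length` positions (0 if no gap)."""
    if length <= 0:
        return 0.0
    return gap_open + gap_extend * length


def affine_score(ref, aligned, match=1.0, gap_open=2.0, gap_extend=1.0):
    """Score matches minus an affine penalty for each gap run in `aligned`."""
    score, run = 0.0, 0
    for a, b in zip(ref, aligned):
        if b == "-":
            run += 1
            continue
        score -= affine_gap_penalty(run, gap_open, gap_extend)  # close any gap
        run = 0
        if a == b:
            score += match
    score -= affine_gap_penalty(run, gap_open, gap_extend)  # trailing gap
    return score


if __name__ == "__main__":
    print(affine_gap_penalty(3))
    print(affine_score("ATTGACCTGA", "AT---CCTGA"))
```

With these assumed values the three-position gap costs 2 + 1·3 = 5, so the example alignment scores 7 − 5 = 2.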

VALUE IS 26

VALUE IS 7

References
Bioinformatics: Concepts, Skills & Applications, second edition, by S. C. Rastogi, Namita Mendiratta, Parag Rastogi
http://en.wikipedia.org/wiki/Dot_plot_%28bioinformatics%29
http://lectures.molgen.mpg.de/Pairwise/DotPlots/
https://ugene.unipro.ru/wiki/pages/viewpage.action?pageId=4227426
http://www.clcsupport.com/clcgenomicsworkbench/650/Examples_interpretations_dot_plots.html

EMBOSS Dotpath