Clustal

12,958 views 24 slides Jan 27, 2019

About This Presentation

This presentation describes Clustal, ClustalW, and the algorithms behind them.


Slide Content

CLUSTAL
BY,
BENITTA BENNY
S2BIOINFORMATICS

CLUSTAL
•Clustal is a family of computer programs used in bioinformatics for multiple sequence alignment.
•Many versions of Clustal have appeared over the development of the algorithm. The operating systems listed for Clustal reflect the software's overall availability and may not be supported by every current version of the Clustal tools.
•Clustal Omega supports the widest variety of operating systems of all the Clustal tools.

ClustalW
•ClustalW, like the other Clustal tools, is used for aligning multiple nucleotide or protein sequences efficiently.
•It uses progressive alignment methods: the most similar sequences are aligned first, and the alignment works its way down to the least similar sequences until a global alignment is created.
•ClustalW is a matrix-based algorithm, whereas tools like T-Coffee and Dialign are consistency-based.
•ClustalW is a fairly efficient algorithm that competes well against other software.
•The program requires three or more sequences to calculate a global alignment; for pairwise sequence alignment (2 sequences), use tools such as EMBOSS or LALIGN.

Algorithm
•ClustalW uses progressive alignment methods: the sequences with the best alignment score are aligned first, then progressively more distant groups of sequences are aligned.
•This heuristic approach is necessary because of the time and memory demands of finding the globally optimal solution.
•The first step of the algorithm is computing a rough distance matrix from pairwise alignments of every pair of sequences.
•The next step is a neighbor-joining method that uses midpoint rooting to create an overall guide tree.
•The guide tree is then used as a rough template to generate a global alignment.
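
The distance-matrix step can be illustrated with a minimal sketch. It assumes a plain Needleman-Wunsch alignment with made-up match, mismatch and gap scores, and defines distance as one minus the fraction of identical aligned positions. The function names and scoring values are hypothetical and are not taken from ClustalW itself; the neighbor-joining step is not shown.

```python
from itertools import combinations


def global_align(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch global alignment; returns the two gapped strings."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)
    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1]); out_b.append(b[j - 1]); i -= 1; j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1]); out_b.append('-'); i -= 1
        else:
            out_a.append('-'); out_b.append(b[j - 1]); j -= 1
    return ''.join(reversed(out_a)), ''.join(reversed(out_b))


def distance(a, b):
    """1 - fraction of identical residues over the ungapped aligned positions."""
    x, y = global_align(a, b)
    pairs = [(p, q) for p, q in zip(x, y) if p != '-' and q != '-']
    if not pairs:
        return 1.0
    return 1.0 - sum(p == q for p, q in pairs) / len(pairs)


def distance_matrix(seqs):
    """seqs: dict name -> sequence. Returns a symmetric nested dict of distances."""
    matrix = {name: {name: 0.0} for name in seqs}
    for a, b in combinations(seqs, 2):
        matrix[a][b] = matrix[b][a] = distance(seqs[a], seqs[b])
    return matrix
```

A matrix like this would then be the input to the guide-tree step.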

Multiple Alignment Method
The steps are summarized as follows:
•Compare all sequences pairwise.
•Perform cluster analysis on the pairwise data to generate a hierarchy for alignment. This may take the form of a binary tree or a simple ordering.
•Build the multiple alignment by first aligning the most similar pair of sequences, then the next most similar pair, and so on. Once an alignment of two sequences has been made, it is fixed. Thus, for a set of sequences A, B, C, D, having aligned A with C and B with D, the alignment of A, B, C, D is obtained by comparing the alignment of A and C with that of B and D, using averaged scores at each aligned position.
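
The last step, aligning two fixed alignments using averaged scores, can be sketched as follows. This is an illustration under simple assumptions: the match, mismatch and gap values are made up, and the helper names (`pair_score`, `column_score`, `align_alignments`) are hypothetical rather than part of any Clustal program.

```python
def pair_score(x, y, match=1, mismatch=-1, gap=-1):
    """Score one residue against another; gaps already inside a fixed alignment are penalised."""
    if x == '-' or y == '-':
        return 0 if x == y else gap
    return match if x == y else mismatch


def column_score(col_a, col_b):
    """Average of all residue-vs-residue scores between two alignment columns."""
    total = sum(pair_score(x, y) for x in col_a for y in col_b)
    return total / (len(col_a) * len(col_b))


def align_alignments(aln_a, aln_b, gap=-2.0):
    """Align two fixed alignments (lists of equal-length strings) column by column."""
    cols_a, cols_b = list(zip(*aln_a)), list(zip(*aln_b))
    n, m = len(cols_a), len(cols_b)
    score = [[0.0] * (m + 1) for _ in range(n + 1)]
    move = [[None] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0], move[i][0] = i * gap, 'up'
    for j in range(1, m + 1):
        score[0][j], move[0][j] = j * gap, 'left'
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            options = [
                (score[i - 1][j - 1] + column_score(cols_a[i - 1], cols_b[j - 1]), 'diag'),
                (score[i - 1][j] + gap, 'up'),
                (score[i][j - 1] + gap, 'left'),
            ]
            score[i][j], move[i][j] = max(options, key=lambda t: t[0])
    # Trace back; a gap is inserted into every row of a group at once,
    # so the earlier alignments inside each group stay fixed.
    gap_a, gap_b = ('-',) * len(aln_a), ('-',) * len(aln_b)
    merged, i, j = [], n, m
    while i > 0 or j > 0:
        if move[i][j] == 'diag':
            merged.append(cols_a[i - 1] + cols_b[j - 1]); i -= 1; j -= 1
        elif move[i][j] == 'up':
            merged.append(cols_a[i - 1] + gap_b); i -= 1
        else:
            merged.append(gap_a + cols_b[j - 1]); j -= 1
    merged.reverse()
    return [''.join(row) for row in zip(*merged)]
```

For example, with A and C already aligned as `["ACGT", "ACGT"]` and B and D as `["AGT", "AGT"]`, `align_alignments` places a single gap column into both B and D.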

ClustalW- for multiple alignment
•ClustalW is a general-purpose multiple alignment program for DNA or proteins.
•ClustalW was produced by Julie D. Thompson and Toby Gibson of the European Molecular Biology Laboratory, Germany, and Desmond Higgins of the European Bioinformatics Institute, Cambridge, UK.
•ClustalW is cited as: Thompson J.D., Higgins D.G., Gibson T.J. (1994). CLUSTAL W: improving the sensitivity of progressive multiple sequence alignment through sequence weighting, position-specific gap penalties and weight matrix choice. Nucleic Acids Research, 22:4673-4680.

ClustalW can create multiple alignments,
manipulate existing alignments, do profile
analysis and create phylogenetic trees.
Alignment can be done by 2 methods:
- slow/accurate
- fast/approximate

ClustalW - Input
http://www.ebi.ac.uk/Tools/clustalw2/index.html
Form fields: input sequences, gap scoring, scoring matrix, email address, output format

ClustalW - Output
Match strength in decreasing order: * : .

Output of ClustalW
CLUSTAL W (1.7) multiple sequence alignment
HSTNFR    GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------GCAG
SYNTNFTRP GGGAAGAG---TTCCCCAGGGACCTCTCTCTAATCAGCCCTCTGGCCCAG------GCAG
CFTNFA    -------------------------------------------TGTCCAG------ACAG
CATTNFAA  GGGAAGAG---CTCCCACATGGCCTGCAACTAATCAACCCTCTGCCCCAG------ACAC
RABTNFM   AGGAGGAAGAGTCCCCAAACAACCTCCATCTAGTCAACCCTGTGGCCCAGATGGTCACCC
RNTNFAA   AGGAGGAGAAGTTCCCAAATGGGCTCCCTCTCATCAGTTCCATGGCCCAGACCCTCACAC
OATNFA1   GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG------ACAC
OATNFAR   GGGAAGAGCAGTCCCCAGCTGGCCCCTCCTTCAACAGGCCTCTGGTTCAG------ACAC
BSPTNFA   GGGAAGAGCAGTCCCCAGGTGGCCCCTCCATCAACAGCCCTCTGGTTCAA------ACAC
CEU14683  GGGAAGAGCAATCCCCAACTGGCCTCTCCATCAACAGCCCTCTGGTTCAG------ACCC
** *
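
The line of `*` marks under the alignment flags columns where every sequence carries the same residue. A minimal sketch of how such a line could be derived is shown below. It handles only the `*` (fully conserved) case; the `:` and `.` marks depend on residue similarity groups that are not given here, so they are left out. The function name is hypothetical.

```python
def conservation_line(rows):
    """Return a string with '*' under every column that is identical in all rows."""
    marks = []
    for column in zip(*rows):
        residues = set(column)
        # A column counts as conserved only if it holds one residue and no gap.
        conserved = len(residues) == 1 and '-' not in residues
        marks.append('*' if conserved else ' ')
    return ''.join(marks)
```

The input is the list of aligned sequence strings, all of the same length, without their name prefixes.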

Clustal X - Multiple Sequence
Alignment Program
•Clustal X provides a new window-based user interface to the ClustalW program.
•It uses the Vibrant multi-platform user interface development library, developed by the National Center for Biotechnology Information (Bldg 38A, NIH, 8600 Rockville Pike, Bethesda, MD 20894) as part of their NCBI Software Development Toolkit.

CLUSTAL OMEGA
•Fast and scalable program written in C and C++, used for multiple sequence alignment.
•It uses seeded guide trees and a new HMM engine that focuses on two profiles to generate these alignments.
•The program requires three or more sequences to calculate the multiple sequence alignment; for two sequences, use pairwise sequence alignment tools (EMBOSS, LALIGN).
•Clustal Omega is consistency-based and is widely viewed as one of the fastest online implementations of all multiple sequence alignment tools. It still ranks high in accuracy among both consistency-based and matrix-based algorithms.

ALGORITHM
The structure of a profile HMM used in the implementation of Clustal Omega is shown here.
Clustal Omega has five main steps.
The first is producing a pairwise alignment using the k-tuple method, also known as the word method. This is a heuristic method that is not guaranteed to find an optimal alignment, but it is significantly more efficient than the dynamic programming method of alignment.
After that, the sequences are clustered using the modified mBed method, which calculates pairwise distances using sequence embedding. This step is followed by the k-means clustering method.
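
The idea behind the k-tuple (word) method can be sketched as follows: instead of running a full dynamic-programming alignment, two sequences are compared by the short words they share. The distance formula and function names below are assumptions made for illustration and are not the exact measure used by Clustal Omega.

```python
from collections import Counter


def ktuples(seq, k):
    """Count every overlapping word of length k in the sequence."""
    return Counter(seq[i:i + k] for i in range(len(seq) - k + 1))


def ktuple_distance(a, b, k=2):
    """1 - (shared k-tuples / k-tuples in the shorter sequence)."""
    words_a, words_b = ktuples(a, k), ktuples(b, k)
    shared = sum((words_a & words_b).values())      # multiset intersection
    possible = min(sum(words_a.values()), sum(words_b.values()))
    if possible == 0:
        return 1.0                                  # sequence shorter than k
    return 1.0 - shared / possible
```

Counting words takes time roughly linear in the sequence length, which is why this kind of comparison is much cheaper than dynamic programming, at the cost of not guaranteeing an optimal alignment.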

Next, the guide tree is constructed using the UPGMA method. The flowchart shows this as multiple guide tree steps leading into one final guide tree construction because of the way the UPGMA algorithm works.
At each step (each diamond in the flowchart), the nearest two clusters are combined, and this is repeated until the final tree can be assessed.
In the final step, the multiple sequence alignment is produced using the HHAlign package from the HH-Suite, which uses two profile HMMs.
A profile HMM is a linear state machine consisting of a series of nodes, each of which corresponds roughly to a position (column) in the alignment from which it was built.
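
The UPGMA step described above, repeatedly combining the nearest two clusters, can be sketched in a few lines. This is a generic textbook version that returns only the tree topology as nested tuples; it is not the Clustal Omega implementation, and the function signature is an assumption.

```python
from itertools import combinations


def upgma(names, dist):
    """names: list of leaf labels; dist[a][b]: distance between leaves a and b.

    Returns the guide tree as nested tuples, e.g. (('A', 'C'), ('B', 'D')).
    """
    clusters = list(names)                 # leaf names or nested tuples
    size = {c: 1 for c in clusters}        # number of leaves in each cluster
    d = {}
    for a, b in combinations(clusters, 2):
        d[a, b] = d[b, a] = dist[a][b]
    while len(clusters) > 1:
        # Combine the nearest two clusters.
        a, b = min(combinations(clusters, 2), key=lambda pair: d[pair])
        merged = (a, b)
        clusters = [c for c in clusters if c not in (a, b)]
        for c in clusters:
            # Size-weighted mean, i.e. the average over all leaf pairs.
            d[merged, c] = d[c, merged] = (
                size[a] * d[a, c] + size[b] * d[b, c]
            ) / (size[a] + size[b])
        size[merged] = size[a] + size[b]
        clusters.append(merged)
    return clusters[0]
```

With four sequences A, B, C, D where A is closest to C and B is closest to D, the result groups A with C and B with D before joining the two pairs.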