Multiple Sequence Alignment Tool Using NCBI COBALT

MohsinRazaSaharan 2,524 views 34 slides Feb 23, 2018

Slide Content

MSAT
MULTIPLE SEQUENCE ALIGNMENT TOOL
BY
GROUP 2
2/22/2018
1

OVERVIEW
Sequence alignment
Types of sequence alignments
Multiple sequence alignment
Purpose of MSA
Types of MSA
Progressive alignment
Pros & Cons

SEQUENCE ALIGNMENT
In bioinformatics, a sequence alignment is a way of arranging the sequences of DNA, RNA,
or protein to identify regions of similarity that may be a consequence of functional,
structural, or evolutionary relationships between the sequences.
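
One simple way to quantify the "regions of similarity" in an alignment is percent identity. The following is a minimal sketch, assuming the two sequences are already aligned to the same length and that gaps are written as "-"; the function name and the choice to skip gapped columns are illustrative assumptions, not something the slides specify.

```python
def percent_identity(aligned_a: str, aligned_b: str, gap: str = "-") -> float:
    """Percent of gap-free alignment columns in which both residues match."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have equal length")
    compared = 0
    matches = 0
    for a, b in zip(aligned_a, aligned_b):
        if a == gap or b == gap:
            continue  # assumption: columns containing a gap are not counted
        compared += 1
        if a == b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Two toy DNA sequences, already aligned (the last column holds a gap).
print(round(percent_identity("GATTACA-", "GACTATAG"), 1))
```

Other conventions exist (for example, dividing by the full alignment length), so the number depends on the definition chosen.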

SEQUENCE ALIGNMENT
Sequences often contain highly conserved regions
These regions can be used for initial alignments

TYPES OF SEQUENCE ALIGNMENTS
Pair-wise alignment
  Dot matrix method
  Dynamic programming
  Word methods
Multiple sequence alignment
  Dynamic programming
  Progressive methods
  Iterative methods
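
Of the pair-wise methods listed above, the dot matrix method is the easiest to sketch. The version below is a minimal illustration, assuming exact single-residue matches with no window or threshold filtering; the function names `dot_matrix` and `render` are made up for this example.

```python
def dot_matrix(seq_a: str, seq_b: str) -> list[list[bool]]:
    """Cell [i][j] is True when residue i of seq_a equals residue j of seq_b."""
    return [[a == b for b in seq_b] for a in seq_a]


def render(seq_a: str, seq_b: str) -> str:
    """Draw the matrix as text: '*' marks a match, '.' a mismatch."""
    lines = ["  " + seq_b]
    for a, row in zip(seq_a, dot_matrix(seq_a, seq_b)):
        lines.append(a + " " + "".join("*" if hit else "." for hit in row))
    return "\n".join(lines)


print(render("GATTACA", "GACTACA"))
```

Diagonal runs of matches in the printed grid correspond to regions where the two sequences align without gaps.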

MULTIPLE SEQUENCE ALIGNMENT
A multiple sequence alignment tool that simultaneously aligns multiple protein
sequences, automatically utilizes information about protein domains, and offers a good
compromise between speed and accuracy will have practical advantages over
current tools.
The principle is that multiple alignments are achieved by successive application of
pairwise methods.

PURPOSE OF MSA
To characterize protein families by identifying shared regions of homology in a
multiple sequence alignment
To determine the consensus sequence of several aligned sequences
Consensus sequences can help to develop a sequence “fingerprint” that allows
the identification of members of a distantly related protein family (motifs)
MSA can help to reveal biological facts about proteins, for example through
analysis of the secondary/tertiary structure
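
The consensus sequence mentioned above can be sketched as a per-column majority vote. This is a minimal illustration, assuming equal-length aligned rows, "-" as the gap character, and that gaps are ignored when voting; real tools often apply thresholds or ambiguity codes, which are left out here.

```python
from collections import Counter


def consensus(aligned: list[str], gap: str = "-") -> str:
    """Most frequent non-gap residue in each alignment column."""
    if not aligned:
        return ""
    width = len(aligned[0])
    if any(len(row) != width for row in aligned):
        raise ValueError("all aligned sequences must have the same length")
    result = []
    for column in zip(*aligned):
        counts = Counter(residue for residue in column if residue != gap)
        # A column made up only of gaps stays a gap in the consensus.
        # Ties go to the residue seen first in the column.
        result.append(counts.most_common(1)[0][0] if counts else gap)
    return "".join(result)


toy_alignment = ["ACGT", "ACGA", "AC-T"]  # hypothetical aligned rows
print(consensus(toy_alignment))
```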

TYPES OF MSA
Dynamic programming approach
Computes an optimal alignment for a given score function. Because of its high
running time, it is not typically used in practice.
Progressive method
This approach repeatedly aligns two sequences, two alignments, or a sequence
with an alignment.
Iterative method
Works similarly to progressive methods but repeatedly realigns the initial
sequences as well as adding new sequences to the growing MSA.
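
The dynamic programming approach can be illustrated for the two-sequence case with Needleman-Wunsch global alignment scoring. The sketch below is a hedged example: the match, mismatch and gap values are arbitrary assumptions, and only the optimal score is returned, not the alignment itself.

```python
def global_alignment_score(a: str, b: str, match: int = 1,
                           mismatch: int = -1, gap: int = -2) -> int:
    """Optimal global alignment score of a and b under a linear gap penalty."""
    # prev[j] holds the best score for aligning the processed prefix of a with b[:j].
    prev = [j * gap for j in range(len(b) + 1)]
    for i in range(1, len(a) + 1):
        curr = [i * gap]  # aligning a[:i] against an empty prefix of b
        for j in range(1, len(b) + 1):
            diagonal = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            up = prev[j] + gap        # gap inserted in b
            left = curr[j - 1] + gap  # gap inserted in a
            curr.append(max(diagonal, up, left))
        prev = curr
    return prev[-1]


print(global_alignment_score("GATTACA", "GACTACA"))
```

For two sequences the table has len(a) x len(b) cells. Applying the same recurrence to N sequences needs an N-dimensional table, which is why the running time becomes impractical for exact multiple alignment.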

PROGRESSIVE ALIGNMENT
The most widely used approach
Builds up a final MSA by combining pairwise alignments, beginning with the
most similar pair and progressing to the most distantly related
Progressive alignment methods require two stages:
- First stage, in which the relationships between the sequences are represented as a
tree, called a guide tree
- Second stage, in which the MSA is built by adding the sequences sequentially to
the growing MSA according to the guide tree
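
The first stage can be sketched as follows. This is an illustration under stated assumptions, not the procedure of any particular tool: distances come from a crude k-mer comparison (the hypothetical helper `kmer_distance`), and the tree is built by UPGMA-style average-linkage clustering. The returned merge order is the order in which a progressive aligner would combine sequences and sub-alignments in the second stage.

```python
from itertools import combinations


def kmer_distance(a: str, b: str, k: int = 2) -> float:
    """1 minus the Jaccard similarity of the two sequences' k-mer sets."""
    kmers_a = {a[i:i + k] for i in range(len(a) - k + 1)}
    kmers_b = {b[i:i + k] for i in range(len(b) - k + 1)}
    union = kmers_a | kmers_b
    return 1.0 - len(kmers_a & kmers_b) / len(union) if union else 0.0


def guide_tree(seqs: dict[str, str]):
    """Return (tree, merges): a nested-tuple tree and the list of merge steps.

    Expects at least one named sequence.
    """
    dist = {frozenset(pair): kmer_distance(seqs[pair[0]], seqs[pair[1]])
            for pair in combinations(seqs, 2)}
    clusters = [(name, [name]) for name in seqs]  # (subtree, leaf names)
    merges = []

    def average_distance(x, y):
        values = [dist[frozenset((m, n))] for m in x[1] for n in y[1]]
        return sum(values) / len(values)

    while len(clusters) > 1:
        # Pick the two clusters that are closest on average (most similar first).
        i, j = min(combinations(range(len(clusters)), 2),
                   key=lambda ij: average_distance(clusters[ij[0]], clusters[ij[1]]))
        left, right = clusters[i], clusters[j]
        merges.append((left[0], right[0]))
        merged = ((left[0], right[0]), left[1] + right[1])
        clusters = [c for n, c in enumerate(clusters) if n not in (i, j)] + [merged]
    return clusters[0][0], merges


toy = {"a": "ACGTACGT", "b": "ACGTACGA", "c": "TTTTGGGG"}  # hypothetical input
tree, merges = guide_tree(toy)
print(tree)
print(merges)
```

In this toy input, "a" and "b" share most of their 2-mers, so they are joined first and "c" is added last, mirroring the "most similar pair first" rule described above.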

USING COBALT NCBI
COBALT is a constraint-based alignment tool that implements a general framework for
multiple alignment of protein sequences.
COBALT finds a collection of pairwise constraints derived from database
searches, sequence similarity and user input, combines these pairwise
constraints, and then incorporates them into a progressive multiple
alignment.
COBALT has reasonable runtime performance and alignment accuracy
comparable to or exceeding that of other tools for a broad range of
problems.

USING COBALT NCBI
COBALT has a general framework that uses progressive multiple alignment
to combine pairwise constraints from different sources into a multiple
alignment
When the same domain matches to multiple sequences, we can infer
several potential pairwise constraints based on these domain matches
CDD ( Conserved Domains Database ) also contains auxiliary information
that allows COBALT to create partial profiles for input sequences before
progressive alignment begins, and this avoids computationally expensive
procedures for building profiles
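
The idea of inferring pairwise constraints from shared domain matches can be sketched as below. This is a simplified illustration, not COBALT's implementation: the `DomainHit` and `Constraint` types, the coordinate convention, and the domain labels are all assumptions made for the example.

```python
from collections import defaultdict
from itertools import combinations
from typing import NamedTuple


class DomainHit(NamedTuple):
    seq_id: str
    domain_id: str
    start: int  # 0-based, inclusive (assumed convention)
    end: int    # exclusive


class Constraint(NamedTuple):
    seq_a: str
    range_a: tuple[int, int]
    seq_b: str
    range_b: tuple[int, int]
    domain_id: str


def pairwise_constraints(hits: list[DomainHit]) -> list[Constraint]:
    """Pair up regions of different sequences that matched the same domain."""
    by_domain = defaultdict(list)
    for hit in hits:
        by_domain[hit.domain_id].append(hit)
    constraints = []
    for domain_id, group in by_domain.items():
        for a, b in combinations(group, 2):
            if a.seq_id == b.seq_id:
                continue  # repeats inside one sequence do not link two sequences
            constraints.append(Constraint(a.seq_id, (a.start, a.end),
                                          b.seq_id, (b.start, b.end), domain_id))
    return constraints


# Hypothetical hits: three sequences share "domA"; only s1 has "domB".
hits = [DomainHit("s1", "domA", 10, 140), DomainHit("s2", "domA", 25, 150),
        DomainHit("s3", "domA", 5, 130), DomainHit("s1", "domB", 0, 8)]
for constraint in pairwise_constraints(hits):
    print(constraint)
```

Three sequences matching one domain yield three candidate pairs, which illustrates how a single domain match to multiple sequences produces several potential constraints.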

RUNTIME OF COBALT
The runtime performance of COBALT is highly data driven
COBALT is about five times faster than ProbCons
COBALT is included in the NCBI C++ Toolkit
Numerous auxiliary programs were written in C, C++ and Perl to automate
testing and summarize results

AVAILABILITY
COBALT is included in the NCBI C++ Toolkit. A Linux executable for COBALT,
along with the CDD and PROSITE data used, is available at:
https://www.ncbi.nlm.nih.gov/tools/cobalt/re_cobalt.cgi
Contact: [email protected]

STEP 1
Go to https://www.ncbi.nlm.nih.gov/

STEP 2
The Swiss-Prot protein sequence accession for Schizosaccharomyces pombe Clr4 is
O60016.2.
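
The slides look this sequence up through the NCBI website. As an optional alternative, the same record can be requested in FASTA format through the NCBI E-utilities EFetch endpoint. The sketch below is an assumption about how one might script that lookup and is not part of the original walkthrough; the helper names `fasta_url` and `fetch_fasta` are made up for this example.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

EFETCH = "https://eutils.ncbi.nlm.nih.gov/entrez/eutils/efetch.fcgi"


def fasta_url(accession: str) -> str:
    """Build an EFetch URL that asks for a protein record as FASTA text."""
    params = {"db": "protein", "id": accession,
              "rettype": "fasta", "retmode": "text"}
    return f"{EFETCH}?{urlencode(params)}"


def fetch_fasta(accession: str) -> str:
    """Download the FASTA record (needs network access)."""
    with urlopen(fasta_url(accession), timeout=30) as response:
        return response.read().decode("utf-8")


print(fasta_url("O60016.2"))
```

Calling `fetch_fasta("O60016.2")` should return the FASTA text that can then be pasted into the COBALT input box, provided the service is reachable.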

STEP 3

STEP 4

STEP 5

STEP 6

Your patience is greatly appreciated….

RESULT

STEP 6
Select First 11

STEP 7

Your patience is greatly appreciated….

RESULTS

RESULTS

STEP 8
Notice that the above multiple alignment can't be edited directly. Use the “Edit and
Resubmit” link at the top of the COBALT results to remove the undesired
protein, then search again.

STEP 8 (a)

STEP 8 (b)

PROS AND CONS OF PROGRESSIVE
METHOD OF ALIGNMENT
PROS:
Efficient enough to implement on a large scale for
many (100s to 1000s) sequences.
Progressive alignment services are commonly available
on publicly accessible web servers, so users need not
locally install the applications of interest.
Most widely used method of multiple sequence
alignment because of speed and accuracy.

CONS:
Progressive alignments are not guaranteed to be
globally optimal.
The primary problem is that when errors are made at
any stage in growing the MSA, these errors are then
propagated through to the final result.
Performance is also particularly bad when all of the
sequences in the set are rather distantly related

REFERENCES
https://insidescienceresources.wordpress.com/2017/05/15/ncbi-bioinformatics-tools-protein-blast-cobalt-and-cn3d-structure-viewer/
https://academic.oup.com/bioinformatics/article/23/9/1073/272774
https://www.ncbi.nlm.nih.gov/