Bioinformatics

3,218 views 94 slides Jan 12, 2014

How to Apply Bioinformatics in Proteomics. Seyed Mohammad Motevalli, December 2013

Outline: Introduction to bioinformatics; Biological databases; Sequence alignment and their algorithms; Structural prediction; Web-based tools; Stand-alone software

Introduction to bioinformatics: What is bioinformatics? Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science.

Introduction to bioinformatics: What are the differences between bioinformatics and informatics? What are the differences between bioinformatics and computational biology? What is an algorithm?

What is proteomics?

Biological databases. Database: a database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria. Entry: each record should contain a number of fields that hold the actual data items. Value: a particular piece of information. Making a query: to retrieve a particular record from the database, a user can specify a value to be found in a particular field and expect the computer to retrieve the whole data record.
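
The entry, field, value, and query vocabulary above can be illustrated with a minimal sketch. The record layout, the field names (`entry_id`, `organism`, `length`), and the values are all made up for illustration; no real database is organized exactly this way.

```python
# Toy "database": each entry (record) is a dict of fields holding values.
# All identifiers and values below are hypothetical placeholders.
RECORDS = [
    {"entry_id": "ENTRY_A", "organism": "Homo sapiens", "length": 120},
    {"entry_id": "ENTRY_B", "organism": "Mus musculus", "length": 118},
    {"entry_id": "ENTRY_C", "organism": "Homo sapiens", "length": 342},
]


def query(database, field, value):
    """Return every whole record whose `field` holds exactly `value`."""
    return [entry for entry in database if entry.get(field) == value]


human_entries = query(RECORDS, "organism", "Homo sapiens")
print([entry["entry_id"] for entry in human_entries])
```

The query names a single field and value, yet each hit comes back as the complete record, which is the behavior the slide describes.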

Biological databases. Primary databases: GenBank (NCBI) www.ncbi.nlm.nih.gov; EMBL www.ebi.ac.uk/embl/index.html; DDBJ www.ddbj.nig.ac.jp. Secondary databases: ExPASy http://web.expasy.org; PIR http://pir.georgetown.edu/pirwww/pirhome3.shtml; Swiss-Prot www.ebi.ac.uk/swissprot/access.html

Biological databases Interconnection between Biological Databases

Biological databases. Pitfalls of biological databases: the causes of redundancy include repeated submission of identical or overlapping sequences by the same or different authors, revision of annotations, and dumping of expressed sequence tag (EST) data. Redundant sequences; non-redundant sequences (RefSeq).

Biological databases. Further databases: NCBI www.ncbi.nlm.nih.gov; UniProt http://www.uniprot.org; ExPASy http://web.expasy.org; PIR http://pir.georgetown.edu/; Swiss-Prot http://swissmodel.expasy.org/; PDB http://www.rcsb.org/pdb/home/home.do; Enzyme structure http://www.ebi.ac.uk/thornton-srv/databases/enzymes

Biological databases NCBI www.ncbi.nlm.nih.gov

Biological databases Uniprot http://www.uniprot.org

Biological databases ExPASy http://web.expasy.org

Biological databases PIR http://pir.georgetown.edu/

Biological databases Swiss-Prot http://swissmodel.expasy.org/

Biological databases PDB http://www.rcsb.org/pdb/home/home.do

Biological databases Enzyme structure http://www.ebi.ac.uk/thornton-srv/databases/enzymes

Sequence alignment and their algorithms. Pairwise sequence alignment: pairwise sequence alignment is the process of aligning two sequences and is the basis of database similarity searching and multiple sequence alignment. Sequence similarity versus sequence homology: when two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship or to share homology. A related but different term is sequence similarity, which is the percentage of aligned residues that are similar in physicochemical properties such as size, charge, and hydrophobicity. Sequence similarity versus sequence identity: in a protein sequence alignment, sequence identity refers to the percentage of matches of the same amino acid residues between two aligned sequences. Similarity refers to the percentage of aligned residues that have similar physicochemical characteristics and can be more readily substituted for each other.
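
The difference between identity and similarity can be made concrete with a small sketch. Two things here are assumptions, not part of the slides: the grouping of residues into physicochemical classes (`SIMILAR_GROUPS`) is a simplified, hypothetical one, and percentages are computed over aligned columns that contain no gap. Real tools usually decide similarity from a substitution matrix.

```python
# Hypothetical, simplified physicochemical classes (an assumption for this sketch).
SIMILAR_GROUPS = [set("ILVM"), set("FWY"), set("KRH"), set("DE"), set("STNQ"), set("AG")]


def are_similar(x, y):
    """True if two residues are identical or fall in the same class."""
    return x == y or any(x in group and y in group for group in SIMILAR_GROUPS)


def identity_and_similarity(aligned_a, aligned_b, gap="-"):
    """Return (percent identity, percent similarity) over gap-free columns."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    columns = [(x, y) for x, y in zip(aligned_a, aligned_b) if gap not in (x, y)]
    if not columns:
        raise ValueError("no gap-free aligned columns")
    identical = sum(x == y for x, y in columns)
    similar = sum(are_similar(x, y) for x, y in columns)
    return 100 * identical / len(columns), 100 * similar / len(columns)


print(identity_and_similarity("ILKDA", "IVRDG"))
```

In this toy pair only two of five columns are identical, but every column holds residues of the same class, so similarity is higher than identity.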

Sequence alignment and their algorithms. Sequence alignment strategies. Global alignment: in global alignment, the two sequences to be aligned are assumed to be generally similar over their entire length. Alignment is carried out from beginning to end of both sequences to find the best possible alignment across the entire length of the two sequences. Local alignment: local alignment does not assume that the two sequences in question have similarity over the entire length. It only finds local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequence regions.

Sequence alignment and their algorithms

Sequence alignment and their algorithms. Linear gap penalty: the costs for creating and extending a gap are the same, W(l) = g × l, where g is the cost per gap position and l is the gap length. Affine gap penalty: different costs for creation and extension, W(l) = g_open + g_ext × (l − 1), where opening a gap typically costs more than extending it (g_open > g_ext).
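
The two penalty schemes can be compared directly with a short sketch. The function names and the example costs (10 to open, 1 to extend) are illustrative assumptions, with penalties expressed as positive costs.

```python
def linear_gap_penalty(length, g):
    """Linear scheme: every gap position costs the same amount g."""
    return g * length


def affine_gap_penalty(length, g_open, g_ext):
    """Affine scheme: the first gap position costs g_open, each further one g_ext."""
    if length <= 0:
        return 0
    return g_open + g_ext * (length - 1)


# Example costs are arbitrary choices for illustration.
for gap_length in (1, 3, 10):
    print(gap_length,
          linear_gap_penalty(gap_length, g=10),
          affine_gap_penalty(gap_length, g_open=10, g_ext=1))
```

With these example costs a gap of length 10 costs 100 under the linear scheme but only 19 under the affine scheme, so the affine scheme favors one long gap over many short ones.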

Sequence alignment and their algorithms. Alignment algorithms and methods: the dot matrix method; the word method; the dynamic programming method

Sequence alignment and their algorithms. Alignment algorithms: the dot matrix method. The most basic sequence alignment method is the dot matrix method, also known as the dot plot method.
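
A dot matrix can be sketched in a few lines: one sequence labels the rows, the other the columns, and a dot marks every pair of identical characters. This minimal version compares single characters with no window or filtering, which is a simplifying assumption; the function names are illustrative.

```python
def dot_matrix(seq_a, seq_b):
    """Return a grid with 1 where seq_a[i] == seq_b[j] and 0 elsewhere."""
    return [[1 if a == b else 0 for b in seq_b] for a in seq_a]


def render(matrix, seq_a, seq_b):
    """Draw the matrix as text, with seq_b across the top and seq_a down the side."""
    lines = ["  " + " ".join(seq_b)]
    for residue, row in zip(seq_a, matrix):
        lines.append(residue + " " + " ".join("*" if cell else "." for cell in row))
    return "\n".join(lines)


matrix = dot_matrix("GATTACA", "GATTACA")
print(render(matrix, "GATTACA", "GATTACA"))
```

Regions of similarity show up as diagonal runs of dots; comparing a sequence with itself gives an unbroken main diagonal.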

Sequence alignment and their algorithms. Alignment algorithms: the word method. It works by finding short stretches of identical or nearly identical letters in two sequences. These short strings of characters are called words, which are similar to the windows used in the dot matrix method.
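
The first step of the word method, locating shared words, can be sketched as follows. This version only finds exactly identical words of a fixed length `k`; handling "nearly identical" words and extending hits into alignments are left out, and the function name is an assumption.

```python
def word_hits(query, subject, k=3):
    """Return (query_pos, subject_pos, word) for every identical word of length k."""
    # Index every word of the subject by its start position.
    index = {}
    for j in range(len(subject) - k + 1):
        index.setdefault(subject[j:j + k], []).append(j)

    # Look up each word of the query in that index.
    hits = []
    for i in range(len(query) - k + 1):
        word = query[i:i + k]
        for j in index.get(word, []):
            hits.append((i, j, word))
    return hits


print(word_hits("ACGTAC", "TTACGT"))
```

Indexing one sequence first means each query word is found with a dictionary lookup instead of a scan, which is the source of the speed of word-based methods.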

Sequence alignment and their algorithms Alignment Algorithms The word method

Sequence alignment and their algorithms. Alignment algorithms: the dynamic programming method. Dynamic programming is a method that determines the optimal alignment by matching two sequences for all possible pairs of characters between the two sequences.

Sequence alignment and their algorithms. Alignment algorithms: the dynamic programming method. Global alignment: the classical global pairwise alignment algorithm using dynamic programming is the Needleman–Wunsch algorithm. In this algorithm, an optimal alignment is obtained over the entire lengths of the two sequences. Local alignment: the first application of dynamic programming in local alignment is the Smith–Waterman algorithm. In this algorithm, matching residues receive positive scores, while mismatches and gaps are penalized; any cell of the scoring matrix whose value would become negative is reset to zero, so no negative scores appear in the matrix.
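
The two dynamic programming algorithms differ in only a few lines, which a score-only sketch makes visible. The scoring values (match +1, mismatch -1, linear gap -2) are arbitrary assumptions, and traceback to recover the alignment itself is omitted to keep the example short.

```python
def needleman_wunsch_score(a, b, match=1, mismatch=-1, gap=-2):
    """Optimal global alignment score over the full length of both sequences."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]
    for i in range(1, rows):          # leading gaps in b
        score[i][0] = i * gap
    for j in range(1, cols):          # leading gaps in a
        score[0][j] = j * gap
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
    return score[-1][-1]


def smith_waterman_score(a, b, match=1, mismatch=-1, gap=-2):
    """Best local alignment score; cells are floored at zero."""
    rows, cols = len(a) + 1, len(b) + 1
    score = [[0] * cols for _ in range(rows)]   # first row and column stay 0
    best = 0
    for i in range(1, rows):
        for j in range(1, cols):
            diag = score[i - 1][j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            score[i][j] = max(0, diag, score[i - 1][j] + gap, score[i][j - 1] + gap)
            best = max(best, score[i][j])
    return best


print(needleman_wunsch_score("TTTGATTACATTT", "CCCGATTACACCC"))
print(smith_waterman_score("TTTGATTACATTT", "CCCGATTACACCC"))
```

The global score is dragged down by the unrelated flanks, while the local score reflects only the shared core, because the zero floor lets an alignment start and end anywhere.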

Sequence alignment and their algorithms. Substitution matrix: PAM matrices (point accepted mutation). The PAM matrices were derived based on the evolutionary divergence between sequences of the same cluster. One PAM unit is defined as 1% of the amino acid positions that have been changed. Because of the use of very closely related homologs, the observed mutations were not expected to significantly change the common function of the proteins.

Sequence alignment and their algorithms. Substitution matrix: PAM matrices (point accepted mutation)

Sequence alignment and their algorithms. Substitution matrix: BLOSUM matrices. This is the series of blocks amino acid substitution matrices (BLOSUM), all of which are derived based on direct observation of every possible amino acid substitution in multiple sequence alignments.

Sequence alignment and their algorithms. Substitution matrix: BLOSUM matrices

Sequence alignment and their algorithms. Which matrices should be used, and when?

Comparison: PAM is based on an evolutionary model using phylogenetic trees. BLOSUM assumes no evolutionary model, but rather relies on conserved "blocks" of proteins.

Sequence alignment and their algorithms. Heuristic database searching: the heuristic algorithms perform faster searches because they examine only a fraction of the possible alignments examined in regular dynamic programming. BLAST (basic local alignment search tool): BLAST uses heuristics to align a query sequence with all sequences in a database.

Sequence alignment and their algorithms BLAST (basic local alignment search tool)

Sequence alignment and their algorithms. Minimum score (S). Neighborhood score threshold (T). Threshold for stopping extension (X). Negative scores from the scoring matrix. If the extension stops after crossing X, the alignment is called a high-scoring segment pair (HSP). 6. Finishing

Sequence alignment and their algorithms. Suggested BLAST cutoffs. For nucleotide-based searches: hits with E values of 10^-6 or less and sequence identity of 70% or more. For protein-based searches: hits with E values of 10^-3 or less and sequence identity of 25% or more. Chance matches are more likely in a nucleotide database than in a protein database. Identity in proteins is more informative than in nucleic acids.
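
The suggested cutoffs can be applied as a simple filter over search hits. The sketch below assumes each hit has already been reduced to an E value and a percent identity; the function name and the hit tuples are hypothetical and do not correspond to any BLAST output format.

```python
def passes_cutoff(e_value, percent_identity, molecule):
    """Apply the suggested cutoffs for 'nucleotide' or 'protein' searches."""
    if molecule == "nucleotide":
        return e_value <= 1e-6 and percent_identity >= 70
    if molecule == "protein":
        return e_value <= 1e-3 and percent_identity >= 25
    raise ValueError("molecule must be 'nucleotide' or 'protein'")


# Made-up hits for illustration: (label, E value, percent identity).
hits = [("hit1", 1e-30, 82.0), ("hit2", 1e-4, 91.0), ("hit3", 1e-9, 55.0)]
kept = [label for label, e, ident in hits if passes_cutoff(e, ident, "nucleotide")]
print(kept)
```

Both conditions must hold, so a hit with high identity but a weak E value (or the reverse) is discarded.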

Sequence alignment and their algorithms. BLAST (basic local alignment search tool). BLASTN queries nucleotide sequences with a nucleotide sequence database. BLASTP uses protein sequences as queries to search against a protein sequence database. BLASTX uses nucleotide sequences as queries and translates them in all six reading frames to produce translated protein sequences, which are used to query a protein sequence database. TBLASTN queries protein sequences to a nucleotide sequence database with the sequences translated in all six reading frames. TBLASTX uses nucleotide sequences, which are translated in all six frames, to search against a nucleotide sequence database that has all the sequences translated in six frames.

Sequence alignment and their algorithms. PSI-BLAST: position-specific iterated BLAST (PSI-BLAST) builds profiles and performs database searches in an iterative fashion. The main feature of PSI-BLAST is that profiles are constructed automatically and are fine-tuned in each successive cycle.
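
The six-frame translation used by BLASTX, TBLASTN, and TBLASTX can be sketched with the standard genetic code. The frame labels (`+1` to `-3`) and the choice to write unknown codons as `X` are conventions assumed for this example, and trailing bases that do not fill a codon are ignored.

```python
# Standard genetic code, codons enumerated in T, C, A, G order.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    a + b + c: AMINO[16 * i + 4 * j + k]
    for i, a in enumerate(BASES)
    for j, b in enumerate(BASES)
    for k, c in enumerate(BASES)
}


def reverse_complement(dna):
    """Complement each base, then reverse the strand."""
    return dna.translate(str.maketrans("ACGT", "TGCA"))[::-1]


def translate(dna):
    """Translate complete codons from position 0; '*' marks a stop codon."""
    return "".join(CODON_TABLE.get(dna[i:i + 3], "X") for i in range(0, len(dna) - 2, 3))


def six_frame_translations(dna):
    """Return translations for frames +1, +2, +3 and -1, -2, -3."""
    dna = dna.upper()
    frames = {}
    for strand, seq in (("+", dna), ("-", reverse_complement(dna))):
        for offset in range(3):
            frames[f"{strand}{offset + 1}"] = translate(seq[offset:])
    return frames


print(six_frame_translations("ATGGCCTAA"))
```

Three frames come from shifting the start by 0, 1, and 2 bases on the given strand, and the other three from doing the same on its reverse complement.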
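
The idea of refining a profile over successive cycles can be illustrated with a toy loop. This is not PSI-BLAST: it assumes ungapped sequences of equal length, scores a sequence by the average column frequency of its residues, and uses an arbitrary threshold of 0.6. All names and values are assumptions for illustration.

```python
from collections import Counter


def build_profile(seqs):
    """Per-column residue frequencies for equal-length, ungapped sequences."""
    return [{res: n / len(col) for res, n in Counter(col).items()} for col in zip(*seqs)]


def profile_score(profile, seq):
    """Average frequency the profile assigns to each residue of seq."""
    return sum(col.get(res, 0.0) for col, res in zip(profile, seq)) / len(profile)


def iterative_search(query, database, threshold=0.6, max_rounds=5):
    """Rebuild the profile from the current hits until the hit set stops changing."""
    members = [query]
    for _ in range(max_rounds):
        profile = build_profile(members)
        hits = [s for s in database if profile_score(profile, s) >= threshold]
        new_members = [query] + [s for s in hits if s != query]
        if set(new_members) == set(members):
            break
        members = new_members
    return members


print(iterative_search("ACDE", ["ACDF", "ACGF", "WWWW"]))
```

In this toy run the first cycle only recruits `ACDF`; the profile rebuilt from both sequences then scores `ACGF` above the threshold, which shows how each cycle can reach more distant relatives.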

Sequence alignment and their algorithms PSI-BLAST

Sequence alignment and their algorithms Multiple sequence alignment

Sequence alignment and their algorithms. Multiple sequence alignment. Exhaustive algorithms: the exhaustive alignment method involves examining all possible aligned positions simultaneously. Heuristic algorithms: because the use of dynamic programming is not feasible for routine multiple sequence alignment, faster heuristic algorithms have been developed. A heuristic is a computational strategy to find a near-optimal solution by using rules of thumb. Essentially, this strategy takes shortcuts by reducing the search space according to certain criteria.

Sequence alignment and their algorithms. Multiple sequence alignment: heuristic algorithms. Progressive alignment: progressive alignment depends on the stepwise assembly of multiple alignment and is heuristic in nature. Clustal: it is a progressive multiple alignment program available either as a stand-alone or on-line program. T-Coffee: T-Coffee performs progressive sequence alignments as in Clustal. The main difference is that, in processing a query, T-Coffee performs both global and local pairwise alignment for all possible pairs involved. The global pairwise alignment is performed using the Clustal program.

Sequence alignment and their algorithms. Multiple sequence alignment: heuristic algorithms. Iterative alignment: the iterative approach is based on the idea that an optimal solution can be found by repeatedly modifying existing suboptimal solutions.

Sequence alignment and their algorithms. Multiple sequence alignment: heuristic algorithms. Block-based alignment: the strategy identifies a block of ungapped alignment shared by all the sequences; hence the name block-based local alignment strategy.

Structural prediction. Structural prediction methods. Ab initio prediction: computational prediction based on first principles or using the most elementary information. Threading: method of predicting the most likely protein structural fold based on secondary structure similarity with database structures and assessment of energies of the potential fold. The term has been used interchangeably with fold recognition. Homology-based modeling: method for predicting the three-dimensional structure of a protein based on homology by assigning the structure of an unknown protein using an existing homologous protein structure as a template.

Hidden Markov model: statistical model composed of a number of interconnected Markov chains with the capability to generate the probability value of an event by taking into account the influence from hidden variables. Mathematically, it calculates probability values of connected states among the Markov chains to find an optimal path within the network of states. It requires training to obtain the probability values of state transitions. When using a hidden Markov model to represent a multiple sequence alignment, a sequence can be generated through the model by incorporating probability values of match, insertion, and deletion states.

Hidden Markov model
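
Finding the optimal path through the hidden states is commonly done with the Viterbi algorithm, which the slides do not name; the sketch below swaps it in as one standard way to do this. The two-state model (`H`, `L`), the two observation symbols (`h` for hydrophobic, `p` for polar), and every probability are made-up values, not trained parameters.

```python
def viterbi(observations, states, start_p, trans_p, emit_p):
    """Return (most probable state path, its probability) for a non-empty sequence."""
    best = [{s: start_p[s] * emit_p[s][observations[0]] for s in states}]
    back = [{}]
    for obs in observations[1:]:
        scores, pointers = {}, {}
        for s in states:
            # Best predecessor state for s at this position.
            prev, p = max(((r, best[-1][r] * trans_p[r][s]) for r in states),
                          key=lambda item: item[1])
            scores[s] = p * emit_p[s][obs]
            pointers[s] = prev
        best.append(scores)
        back.append(pointers)

    # Trace back from the best final state.
    last = max(states, key=lambda s: best[-1][s])
    path = [last]
    for pointers in reversed(back[1:]):
        path.append(pointers[path[-1]])
    path.reverse()
    return path, best[-1][last]


# Hypothetical toy model: all numbers are assumptions for illustration.
STATES = ["H", "L"]
START = {"H": 0.5, "L": 0.5}
TRANS = {"H": {"H": 0.8, "L": 0.2}, "L": {"H": 0.2, "L": 0.8}}
EMIT = {"H": {"h": 0.9, "p": 0.1}, "L": {"h": 0.2, "p": 0.8}}

print(viterbi("hhpp", STATES, START, TRANS, EMIT))
```

In practice the transition and emission probabilities come from training on known sequences, and products of probabilities are usually replaced by sums of logarithms to avoid numerical underflow on long sequences.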

Neural network algorithm: machine-learning algorithm for pattern recognition. It is composed of input, hidden, and output layers. Units of information in each layer are called nodes. The nodes of different layers are interconnected to form a network analogous to a biological nervous system. Between the nodes are mathematical weight parameters that can be trained with known patterns so they can be used for later predictions. After training, the network is able to recognize correlations between an input and an output.
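
The layered structure described above can be sketched as a forward pass through one hidden layer. The sigmoid activation, the layer sizes, and all weights are assumptions chosen for illustration; training the weights is not shown.

```python
import math


def sigmoid(x):
    """Squash any real number into the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def layer(inputs, weights, biases):
    """Each node sums its weighted inputs, adds a bias, and applies the sigmoid."""
    return [sigmoid(sum(w * x for w, x in zip(row, inputs)) + b)
            for row, b in zip(weights, biases)]


def forward(inputs, hidden_w, hidden_b, output_w, output_b):
    """Input layer -> hidden layer -> output layer."""
    hidden = layer(inputs, hidden_w, hidden_b)
    return layer(hidden, output_w, output_b)


# Untrained, made-up weights: two inputs, two hidden nodes, one output node.
output = forward([1.0, 0.0],
                 hidden_w=[[0.5, -0.4], [0.3, 0.8]], hidden_b=[0.0, 0.1],
                 output_w=[[1.2, -0.7]], output_b=[0.05])
print(output)
```

Training would adjust the weight and bias values so that outputs for known patterns move toward their expected values; the forward pass itself stays the same.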

Neural network algorithm

Web-based tools. Alignment tools. Sequence-based methods: T-Coffee http://tcoffee.crg.cat/apps/tcoffee/do:regular; NCBI http://blast.ncbi.nlm.nih.gov/Blast.cgi; UniProt http://www.uniprot.org; EMBL http://coot.embl.de/Alignment. Structural-based methods: Dali server http://ekhidna.biocenter.helsinki.fi/dali_server; FSSP http://protein.hbu.cn/fssp. Signal peptide resource http://proline.bic.nus.edu.sg/spdb/searchn.html. Active site prediction http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp

Web-based tools T-coffee http://tcoffee.crg.cat/apps/tcoffee/do:regular

Web-based tools NCBI http://blast.ncbi.nlm.nih.gov/Blast.cgi

Web-based tools Uniprot http://www.uniprot.org

Web-based tools EMBL http://coot.embl.de/Alignment

Web-based tools Dali server http://ekhidna.biocenter.helsinki.fi/dali_server

Web-based tools FSSP http://protein.hbu.cn/fssp

Web-based tools. Secondary structure prediction: Sopma http://npsa-pbil.ibcp.fr/cgibin/npsa_automat.pl?page=npsa_sopma.html; Jpred3 http://www.compbio.dundee.ac.uk/www-jpred; PreSSaPro http://bioinformatica.isa.cnr.it/PRESSAPRO; HMM protein structure prediction http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html; PROF http://www.aber.ac.uk/~phiwww/prof; Software package http://molbiol-tools.ca/Protein_secondary_structure.htm

Web-based tools Sopma http://npsapbil.ibcp.fr/cgibin/npsa_automat.pl?page=npsa_sopma.html

Web-based tools Jpred3 http://www.compbio.dundee.ac.uk/www-jpred

Web-based tools PreSSaPro http://bioinformatica.isa.cnr.it/PRESSAPRO

Web-based tools HMM protein structure prediction http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html

Web-based tools PROF http://www.aber.ac.uk/~phiwww/prof

Web-based tools Software package http://molbiol-tools.ca/Protein_secondary_structure.htm

Web-based tools Signal peptide resource http://proline.bic.nus.edu.sg/spdb/searchn.html

Web-based tools Active site prediction http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp

Web-based tools Tertiary structure prediction Phyre2 http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index

Web-based tools. Biochemical features: Protein calculator http://www.scripps.edu/~cdputnam/protcalc.html; Amino acid calculator http://proteome.gs.washington.edu/cgi-bin/aa_calc.pl; Peptide property calculator https://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi; Peptide property calculator http://www.innovagen.se/custom-peptide-synthesis/peptide-property-calculator/peptide-property-calculator.asp; Physico-chemical profiles http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html; TagIdent tool http://web.expasy.org/tagident/

Web-based tools. Biochemical features: Peptide cutter http://web.expasy.org/peptide_cutter/; Kyte-Doolittle hydropathy plot http://gcat.davidson.edu/DGPB/kd/kyte-doolittle.htm; GRAVY calculator http://www.gravy-calculator.de/index.php; ProtScale http://web.expasy.org/protscale/; ProtParam http://web.expasy.org/protparam/; Prosite http://prosite.expasy.org/prosite.html; InterPro http://www.ebi.ac.uk/interpro/

Web-based tools Protein calculator http://www.scripps.edu/~cdputnam/protcalc.html

Web-based tools Amino acid calculator http://proteome.gs.washington.edu/cgi-bin/aa_calc.pl

Web-based tools Peptide property calculator https://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi

Web-based tools Peptide property calculator http://www.innovagen.se/custom-peptide-synthesis/peptide-property-calculator/peptide-property-calculator.asp

Web-based tools Physico-chemical profiles http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html

Web-based tools TagIdent tool http://web.expasy.org/tagident/

Web-based tools Peptide cutter http://web.expasy.org/peptide_cutter/

Web-based tools Kyte-Doolittle hydropathy plot http://gcat.davidson.edu/DGPB/kd/kyte-doolittle.htm

Web-based tools GRAVY calculator http://www.gravy-calculator.de/index.php

Web-based tools ProtScale http://web.expasy.org/protscale/

Web-based tools ProtParam http://web.expasy.org/protparam/

Web-based tools Prosite http://prosite.expasy.org/prosite.html

Web-based tools Interpro http://www.ebi.ac.uk/interpro/

Stand-alone software MEGA

Stand-alone software CLC Main Workbench

Stand-alone software UGENE

Stand-alone software Spdb viewer

Stand-alone software Pairwise structure alignment

Stand-alone software Cn3D

Stand-alone software BioEdit

Stand-alone software ClustalX