Bioinformatics

How to Apply Bioinformatics In Proteomics Seyed mohammad motevalli December 2013

outline Introduction to bioinformatics Biological databases Sequence alignment and their algorithms Structural prediction Web-based tools Stand-alone software

Introduction to bioinformatics What is the bioinformatics? Bioinformatics is an interdisciplinary research area at the interface between computer science and biological science .

Introduction to bioinformatics What are differences between bioinformatics and informatics? What are differences between bioinformatics and computational biology? What is the algorithm?

What is the proteomics!?

Biological databases Database A database is a computerized archive used to store and organize data in such a way that information can be retrieved easily via a variety of search criteria Entry Each record should contain a number of fields that hold the actual data items Value a particular piece of information Making a query To retrieve a particular record from the database , a user can specify a value to be found in a particular field and expect the computer to retrieve the whole data record

Biological databases Primary databases Gen bank (NCBI) www.ncbi.nlm.nih.gov EMBL www.ebi.ac.uk/embl/index.html DDBJ www.ddbj.nig.ac.jp Secondary databases ExPASY http ://web.expasy.org PIR http://pir.georgetown.edu/pirwww/pirhome3.shtml SWISS- Prot www.ebi.ac.uk/swissprot/access.html

Biological databases Interconnection between Biological Databases

Biological databases Pitfalls of biological databases The causes of redundancy include: repeated submission of identical or overlapping sequences by the same or different authors, revision of annotations, dumping of expressed sequence tags (EST) data Redundant sequences Non-redundant sequences (Ref Seq )

Biological databases Further databases NCBI www.ncbi.nlm.nih.gov Uniprot http :// www.uniprot.org ExPASY http://web.expasy.org PIR http://pir.georgetown.edu / SWISS- Prot http://swissmodel.expasy.org/ PDB http://www.rcsb.org/pdb/home/home.do Enzyme structure http :// www.ebi.ac.uk/thornton-srv/databases/enzymes

Biological databases NCBI www.ncbi.nlm.nih.gov

Biological databases Uniprot http://www.uniprot.org

Biological databases ExPASY http://web.expasy.org

Biological databases PIR http ://pir.georgetown.edu/

Biological databases SWISS- Prot http://swissmodel.expasy.org/

Biological databases PDB http://www.rcsb.org/pdb/home/home.do

Biological databases Enzyme structure http://www.ebi.ac.uk/thornton-srv/databases/enzymes

Sequence alignment and their algorithms Pairwise sequence alignment Pairwise sequence alignment is the process of aligning two sequences and is the basis of database similarity searching and multiple sequence alignment Sequence similarity versus sequence homology When two sequences are descended from a common evolutionary origin, they are said to have a homologous relationship or share homology. A related but different term is sequence similarity , which is the percentage of aligned residues that are similar in physiochemical properties such as size, charge, and hydrophobicity Sequence similarity versus sequence identity In a protein sequence alignment, sequence identity refers to the percentage of matches of the same amino acid residues between two aligned sequences. Similarity refers to the percentage of aligned residues that have similar physicochemical characteristics and can be more readily substituted for each other

Sequence alignment and their algorithms Sequence alignment strategies Global a lignment In global alignment, two sequences to be aligned are assumed to be generally similar over their entire length. Alignment is carried out from beginning to end of both sequences to find the best possible alignment across the entire length between the two sequences Local alignment In local alignment does not assume that the two sequences in question have similarity over the entire length. It only finds local regions with the highest level of similarity between the two sequences and aligns these regions without regard for the alignment of the rest of the sequence regions

Sequence alignment and their algorithms

Sequence alignment and their algorithms Linear gap penalty: The cost for creation and extension of gaps are the same W(I)= gI , g is the cost for each gap and I is the length Affine gap penalty: different cost for creation and extension W(I)=g open + g ext (I-1) and g open < G ext

Sequence alignment and their algorithms Alignment Algorithms And Methodes T he dot matrix method T he word method The dynamic programming method

Alignment Algorithms The dot matrix method The most basic sequence alignment method is the dot matrix method, also known as the dot plot method Sequence alignment and their algorithms

Sequence alignment and their algorithms Alignment Algorithms The word method It works by finding short stretches of identical or nearly identical letters in two sequences. These short strings of characters are called words, which are similar to the windows used in the dot matrix method

Sequence alignment and their algorithms Alignment Algorithms The word method

Alignment Algorithms The dynamic programming method Dynamic programming is a method that determines optimal alignment by matching two sequences for all possible pairs of characters between the two sequences Sequence alignment and their algorithms

Sequence alignment and their algorithms Alignment Algorithms The dynamic programming method Global alignment The classical global pairwise alignment algorithm using dynamic programming is the Needleman– Wunsch algorithm. In this algorithm, an optimal alignment is obtained over the entire lengths of the two sequences Local alignment The first application of dynamic programming in local alignment is the Smith–Waterman algorithm. In this algorithm, positive scores are assigned for matching residues and zeros for mismatches. No negative scores are used

Sequence alignment and their algorithms substitution matrix PAM matrices ( point accepted mutation) The PAM matrices were subsequently derived based on the evolutionary divergence between sequences of the same cluster. One PAM unit is defined as 1% of the amino acid positions that have been changed. Because of the use of very closely related homologs, the observed mutations were not expected to significantly change the common function of the proteins

Sequence alignment and their algorithms substitution matrix PAM matrices (point accepted mutation)

Sequence alignment and their algorithms substitution matrix BLOSUM matrices This is the series of blocks amino acid substitution matrices (BLOSUM), all of which are derived based on direct observation for every possible amino acid substitution in multiple sequence alignments

Sequence alignment and their algorithms substitution matrix BLOSUM matrices

Sequence alignment and their algorithms What Matrices should be used and when?

Comparison PAM is based on an evolutionary model using phylogenetic trees BLOSUM assumes no evolutionary model, but rather conserved “blocks” of proteins

Sequence alignment and their algorithms Heuristic database searching The heuristic algorithms perform faster searches because they examine only a fraction of the possible alignments examined in regular dynamic programming BLAST (basic local alignment search tool) BLAST uses heuristics to align a query sequence with all sequences in a database

Sequence alignment and their algorithms BLAST (basic local alignment search tool)

Sequence alignment and their algorithms Minimum Score (S) Neighborhood Score Threshold (T) Threshold for stopping extension Negative scores from scoring matrix If the extension stopped after crossing the X, the alignment is called High-scoring segment pair (HSP) 6- finishing

Sequence alignment and their algorithms Suggested BLAST Cutoffs For nucleotide-based searches: hits with E values of 10 -6 or less and seq identity 70% or more For protein-based searches: hits with E values of 10 -3 or less and seq. identity of 25% or more. Finding by chance in nucleotide database is more than proteins Identity in proteins is more informative than in the nucleic acids

Sequence alignment and their algorithms BLAST (basic local alignment search tool) BLASTN queries nucleotide sequences with a nucleotide sequence database BLASTP uses protein sequences as queries to search against a protein sequence database BLASTX uses nucleotide sequences as queries and translates them in all six reading frames to produce translated protein sequences, which are used to query a protein sequence database TBLASTN queries protein sequences to a nucleotide sequence database with the sequences translated in all six reading frames TBLASTX uses nucleotide sequences, which are translated in all six frames, to search against a nucleotide sequence database that has all the sequences translated in six frames

Sequence alignment and their algorithms PSI-BLAST Position-specific iterated BLAST (PSI-BLAST) builds profiles and performs database searches in an iterative fashion. The main feature of PSI-BLAST is that profiles are constructed automatically and arefine-tunedin each successive cycle

Sequence alignment and their algorithms PSI-BLAST

Sequence alignment and their algorithms Multiple sequence alignment

Sequence alignment and their algorithms Multiple sequence alignment Exhaustive algorithms The exhaustive alignment method involves examining all possible aligned positions simultaneously Heuristic algorithms Because the use of dynamic programming is not feasible for routine multiple sequence alignment , faster and heuristic algorithms have been developed. computational strategy to find a near-optimal solution by using rules of thumb . Essentially, this strategy takes shortcuts by reducing the search space according to certain criteria

Sequence alignment and their algorithms Multiple sequence alignment Heuristic algorithms Progressive alignment Progressive alignment depends on the stepwise assembly of multiple alignment and is heuristic in nature Clustal It is a progressive multiple alignment program available either as a stand-alone or on-line program T-coffee T-coffee performs progressive sequence alignments as in Clustal . The main difference is that, in processing a query, T-Coffee performs both global and local pairwise alignment for all possible pairs involved. The global pairwise alignment is performed using the Clustal program

Sequence alignment and their algorithms Multiple sequence alignment Heuristic algorithms Iterative alignment The iterative approach is based on the idea that an optimal solution can be found by repeatedly modifying existing suboptimal solutions

Sequence alignment and their algorithms Multiple sequence alignment Heuristic algorithms Block-Based Alignment The strategy identifies a block of ungapped alignment shared by all the sequences, hence , the block-based local alignment strategy

Structural prediction Structural prediction methods Ab -initio prediction Computational prediction based on first principles or using the most elementary information Threading Method of predicting the most likely protein structural fold based on secondary structure similarity with database structures and assessment of energies of the potential fold. The term has been used interchangeably with fold recognition Homology-based modeling Method for predicting the three-dimensional structure of a protein based on homology by assigning the structure of an unknown protein using an existing homologous protein structure as a template

Hidden Markova algorithm Statistical model composed of a number of interconnected. Markov chains with the capability to generate the probability value of an event by taking into account the influence from hidden variables . Mathematically , it calculates probability values of connected states among the Markov chains to find an optimal path within the network of states. It requires training to obtain the probability values of state transitions. When using a hidden Markov model to represent a multiple sequence alignment, a sequence can be generated through the model by incorporating probability values of match, insertion, and deletion states

Hidden Markova algorithm

Neural network algorithm Machine-learning algorithm for pattern recognition. It is composed of input, hidden, and output layers. Units of information in each layer are called nodes. The nodes of different layers are interconnected to form a network analogous to a biological nervous system. Between the nodes are mathematical weight parameters that can be trained with known patterns so they can be used for later predictions. After training, the network is able to recognize correlation between an input and output

Neural network algorithm

Web-based tools Alignment tools Sequence-based methods T-coffee http :// tcoffee.crg.cat/apps/tcoffee/do:regular NCBI http:// blast.ncbi.nlm.nih.gov/Blast.cgi Uniprot http :// www.uniprot.org EMBL http:// coot.embl.de/Alignment Structural-based methods Dali server http://ekhidna.biocenter.helsinki.fi/dali_server FSSP http:// protein.hbu.cn/fssp Signal peptide resource http://proline.bic.nus.edu.sg/spdb/searchn.html Active site prediction http ://www.scfbio-iitd.res.in/dock/ActiveSite.jsp

Web-based tools T-coffee http://tcoffee.crg.cat/apps/tcoffee/do:regular

Web-based tools NCBI http://blast.ncbi.nlm.nih.gov/Blast.cgi

Web-based tools Uniprot http://www.uniprot.org

Web-based tools EMBL http://coot.embl.de/Alignment

Web-based tools Dali server http://ekhidna.biocenter.helsinki.fi/dali_server

Web-based tools FSSP http://protein.hbu.cn/fssp

Web-based tools Secondary structures prediction Sopma http :// npsa- pbil.ibcp.fr/ cgibin / npsa_automat.pl?page =npsa_sopma.html Jpred3 http :// www.compbio.dundee.ac.uk/www-jpred PreSSaPro http :// bioinformatica.isa.cnr.it/PRESSAPRO HMM protein structure prediction http :// compbio.soe.ucsc.edu/SAM_T08/T08-query.html PROF http ://www.aber.ac.uk/~ phiwww/prof Software package http ://molbiol-tools.ca/Protein_secondary_structure.htm

Web-based tools Sopma http://npsapbil.ibcp.fr/cgibin/npsa_automat.pl?page=npsa_sopma.html

Web-based tools Jpred3 http://www.compbio.dundee.ac.uk/www-jpred

Web-based tools PreSSaPro http://bioinformatica.isa.cnr.it/PRESSAPRO

Web-based tools HMM protein structure prediction http://compbio.soe.ucsc.edu/SAM_T08/T08-query.html

Web-based tools PROF http://www.aber.ac.uk/~phiwww/prof

Web-based tools Software package http ://molbiol-tools.ca/Protein_secondary_structure.htm

Web-based tools Signal peptide resource http://proline.bic.nus.edu.sg/spdb/searchn.html

Web-based tools Active site prediction http://www.scfbio-iitd.res.in/dock/ActiveSite.jsp

Web-based tools Tertiary structure prediction Phyre2 http://www.sbg.bio.ic.ac.uk/phyre2/html/page.cgi?id=index

Web-based tools Biochemical features Protein calculator http://www.scripps.edu/~cdputnam/protcalc.html Amino acid calculator http :// proteome.gs.washington.edu/cgi- bin/aa_calc.pl Peptide property calculator https://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi Peptide property calculator http :// www.innovagen.se/custom-peptide-synthesis/peptide-property-calculator/peptide-property-calculator.asp Physico -chemical profiles http :// npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page =/ NPSA/npsa_pcprof.html Tagldent tool http ://web.expasy.org/tagident/

Web-based tools Biochemical features Peptide cutter http://web.expasy.org/peptide_cutter/ Kyte doolittle hydropahty plot http://gcat.davidson.edu/DGPB/kd/kyte-doolittle.htm GRAVY calculator http://www.gravy-calculator.de/index.php ProtScale http://web.expasy.org/protscale/ ProtParam http://web.expasy.org/protparam/ Prosite http://prosite.expasy.org/prosite.html Interpro http://www.ebi.ac.uk/interpro/

Web-based tools Protein calculator http://www.scripps.edu/~cdputnam/protcalc.html

Web-based tools Amino acid calculator http ://proteome.gs.washington.edu/cgi- bin/aa_calc.pl

Web-based tools Peptide property calculator https ://www.genscript.com/ssl-bin/site2/peptide_calculation.cgi

Web-based tools Peptide property calculator http://www.innovagen.se/custom-peptide-synthesis/peptide-property-calculator/peptide-property-calculator.asp

Web-based tools Physico -chemical profiles http://npsa-pbil.ibcp.fr/cgi-bin/npsa_automat.pl?page=/NPSA/npsa_pcprof.html

Web-based tools Tagldent tool http://web.expasy.org/tagident/

Web-based tools Peptide cutter http://web.expasy.org/peptide_cutter/

Web-based tools Kyte doolittle hydropahty plot http://gcat.davidson.edu/DGPB/kd/kyte-doolittle.htm

Web-based tools GRAVY calculator http://www.gravy-calculator.de/index.php

Web-based tools ProtScale http://web.expasy.org/protscale/

Web-based tools ProtParam http://web.expasy.org/protparam/

Web-based tools Prosite http://prosite.expasy.org/prosite.html

Web-based tools Interpro http://www.ebi.ac.uk/interpro/

Stand-alone softwares MEGA

Stand-alone softwares CLC main workbench

Stand-alone softwares UGENE

Stand-alone softwares Spdb viewer

Stand-alone softwares Pairwise structure alignment

Stand-alone softwares Cn3D

Stand-alone software BioEdit

Stand-alone software ClustalX

About This Presentation

Slide Content

Tags

Categories

Download

Quick Actions

Statistics

Related Slideshows

Bioinformatics

About This Presentation

Slide Content

Slide 1

Slide 2

Slide 3

Slide 4

Slide 5

Slide 6

Slide 7

Slide 8

Slide 9

Slide 10

Slide 11

Slide 12

Slide 13

Slide 14

Slide 15

Slide 16

Slide 17

Slide 18

Slide 19

Slide 20

Slide 21

Slide 22

Slide 23

Slide 24

Slide 25

Slide 26

Slide 27

Slide 28

Slide 29

Slide 30

Slide 31

Slide 32

Slide 33

Slide 34

Slide 35

Slide 36

Slide 37

Slide 38

Slide 39

Slide 40

Slide 41

Slide 42

Slide 43

Slide 44

Slide 45

Slide 46

Slide 47

Slide 48

Slide 49

Slide 50

Slide 51

Slide 52

Slide 53

Slide 54

Slide 55

Slide 56

Slide 57

Slide 58

Slide 59

Slide 60

Slide 61

Slide 62

Slide 63

Slide 64

Slide 65

Slide 66

Slide 67

Slide 68

Slide 69

Slide 70

Slide 71

Slide 72

Slide 73

Slide 74

Slide 75

Slide 76

Slide 77