Protein Structure Prediction

CECS 694-02 Introduction to Bioinformatics University of Louisville Spring 2004 Dr. Eric Rouchka
Lecture 14:
Protein Structure Prediction

Review of Proteins
• Proteins: polypeptides with a three-dimensional structure

• Primary structure: sequence of amino
acids constituting the polypeptide chain
• Secondary structure: local organization of the
polypeptide chain into secondary structures
such as α helices and β sheets

Review of Proteins
• Tertiary structure: three-dimensional
arrangement of amino acids as they react to
one another due to polarity and interactions
between side chains
• Quaternary structure: interaction of several
protein subunits

Protein Structure
• Proteins: chains of amino acids joined by
peptide bonds
• Amino Acids:
– Polar (separate positively and negatively charged
regions)
– free C=O group (CARBOXYL), can act as
hydrogen bond acceptor
– free NH group (AMINO), can act as hydrogen
bond donor

Protein Structure
• Many conformations possible due to the
rotation around the alpha carbon (Cα)
atom
• Conformational changes lead to
differences in three-dimensional
structure of protein

Protein Structure
• Polypeptide chain has pattern of N-Cα-C
repeated
• Angle of rotation about the bond between the amino
nitrogen and Cα is the PHI (φ) angle; angle of rotation
about the bond between Cα and the carboxyl carbon is
the PSI (ψ) angle
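
As a rough illustration of how φ and ψ can be computed from coordinates, the sketch below evaluates a torsion (dihedral) angle from four points. It assumes the usual convention that φ is taken over the atoms C(i-1), N(i), Cα(i), C(i) and ψ over N(i), Cα(i), C(i), N(i+1); the function name `dihedral` and the test points are illustrative and not from the lecture.

```python
import math

def dihedral(p0, p1, p2, p3):
    """Torsion angle in degrees about the p1-p2 bond, in the range (-180, 180]."""
    def sub(a, b):
        return tuple(x - y for x, y in zip(a, b))

    def dot(a, b):
        return sum(x * y for x, y in zip(a, b))

    def cross(a, b):
        return (a[1] * b[2] - a[2] * b[1],
                a[2] * b[0] - a[0] * b[2],
                a[0] * b[1] - a[1] * b[0])

    b0 = sub(p1, p0)            # first bond vector
    b1 = sub(p2, p1)            # central bond (rotation axis)
    b2 = sub(p3, p2)            # last bond vector
    n1 = cross(b0, b1)          # normal of the plane p0, p1, p2
    n2 = cross(b1, b2)          # normal of the plane p1, p2, p3
    b1_len = math.sqrt(dot(b1, b1))
    y = dot(cross(n1, n2), b1) / b1_len
    x = dot(n1, n2)
    return math.degrees(math.atan2(y, x))

# Made-up points: the fourth point is rotated a quarter turn about the central bond.
print(dihedral((1, 0, 0), (0, 0, 0), (0, 0, 1), (0, 1, 1)))
```

For φ of residue i one would pass the coordinates of C(i-1), N(i), Cα(i), C(i) in that order.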

Differences between A.A.’s
• Difference between 20 amino acids is the R
side chains
• Amino acids can be separated based on the
chemical properties of the side chains:
– Hydrophobic
– Charged
– Polar

Differences between A.A.’s
• Hydrophobic: Alanine (A), Valine (V),
Phenylalanine (F), Proline (P), Methionine
(M), Isoleucine (I), and Leucine (L)
• Charged: Aspartic acid (D), Glutamic acid
(E), Lysine (K), Arginine (R)
• Polar: Serine (S), Threonine (T), Tyrosine (Y),
Histidine (H), Cysteine (C), Asparagine (N),
Glutamine (Q), Tryptophan (W)
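
The grouping on this slide can be written down as a small lookup table. The sketch below is only an illustration: it uses the standard one-letter code F for phenylalanine, the names `SIDE_CHAIN_CLASS` and `classify` are made up, and glycine (G) is reported as "unclassified" because the slide does not list it.

```python
# Side-chain classes as grouped on the slide (glycine is not listed there).
GROUPS = {
    "hydrophobic": "AVFPMIL",
    "charged": "DEKR",
    "polar": "STYHCNQW",
}

SIDE_CHAIN_CLASS = {aa: cls for cls, letters in GROUPS.items() for aa in letters}

def classify(sequence):
    """Return the side-chain class of each residue in a one-letter sequence."""
    return [SIDE_CHAIN_CLASS.get(aa, "unclassified") for aa in sequence.upper()]

print(classify("VLG"))
```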

Secondary Structure
• Image source: http://www.ebi.ac.uk/microarray/biology_intro.html

Secondary Structures
• Core of each protein made up of regular
secondary structures
• Regular patterns of hydrogen bonds are
formed between neighboring amino acids
• Amino acids in secondary structures have
similar φ and ψ angles

Secondary Structures
• Structures act to neutralize the polar groups
on each amino acid
• Secondary structures tightly packed in the protein
core, in a hydrophobic environment
• Each amino acid side group has a limited
space to occupy, and therefore a limited number
of possible interactions

Types of Secondary
Structures
• α Helices
• β Sheets
• Loops
• Coils

α Helix
• Most abundant secondary
structure
• 3.6 amino acids per turn
• Hydrogen bond formed
between every fourth residue
• Average length: 10 amino
acids, or 3 turns
• Varies from 5 to 40 amino acids
Image sources: http://www.hhmi.princeton.edu/sw/2002/psidelsk/scavengerhunt.htm;
http://www4.ocn.ne.jp/~bio/biology/protein.htm

α Helix
• Normally found on the surface of protein
cores
• Interact with aqueous environment
– Inner-facing side has hydrophobic amino
acids
– Outer-facing side has hydrophilic amino
acids

α Helix
• Every third or fourth amino acid tends to be
hydrophobic (3.6 residues per turn)
• Pattern can be detected computationally
• Rich in alanine (A), glutamic acid (E), leucine
(L), and methionine (M)
• Poor in proline (P), glycine (G), tyrosine (Y),
and serine (S)
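
One common way to detect this periodic pattern computationally is a hydrophobic moment: each residue's hydrophobicity is treated as a vector pointing outward from the helix axis, with successive residues rotated by 100° (360° / 3.6). The sketch below is a minimal version under stated assumptions: it uses the Kyte-Doolittle hydropathy scale, which the lecture does not specify, and the function name and example sequences are illustrative.

```python
import math

# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydrophobic_moment(segment, angle_deg=100.0):
    """Mean hydrophobic moment of a segment, assuming angle_deg per residue."""
    angle = math.radians(angle_deg)
    sx = sum(KD[aa] * math.cos(i * angle) for i, aa in enumerate(segment))
    sy = sum(KD[aa] * math.sin(i * angle) for i, aa in enumerate(segment))
    return math.hypot(sx, sy) / len(segment)

# A made-up amphipathic pattern scores high; a uniform stretch scores near zero.
print(hydrophobic_moment("LKKLLKKLLKLLKKLLKK"))
print(hydrophobic_moment("L" * 18))
```

A high value suggests that hydrophobic residues cluster on one face of a helix, which is the pattern described above.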

β Sheet
Image sources: http://broccoli.mfn.ki.se/pps_course_96/ss_960723_12.html;
http://www4.ocn.ne.jp/~bio/biology/protein.htm

β Sheet
• Hydrogen bonds between 5-10
consecutive amino acids in one portion
of the chain with another 5-10 farther
down the chain
• Interacting regions may be adjacent
with a short loop, or far apart with other
structures in between

β Sheet
• Directions:
– Same: Parallel Sheet
– Opposite: Anti-parallel Sheet
– Mixed: Mixed Sheet
• Pattern of hydrogen bond formation in
parallel and anti-parallel sheets is
different

β Sheet
• Slight counterclockwise rotation
• Alpha carbons (as well as R side
groups) alternate above and below the
sheet
• Prediction difficult, due to wide range of
φ and ψ angles

Interactions in Helices and
Sheets

Loop
• Regions between α helices and β
sheets
• Various lengths and three-dimensional
configurations
• Located on surface of the structure

Loop
• Hairpin loops: complete turn in the
polypeptide chain (anti-parallel β sheets)
• More variable sequence structure
• Tend to have charged and polar amino acids
• Frequently a component of active sites

Coil
• Region of secondary structure that is
not a helix, sheet, or loop

6 Classes of Protein Structure
1) Class α: bundles of α helices connected by
loops on surface of proteins
2) Class β: antiparallel β sheets, usually two
sheets in close contact forming a sandwich
3) Class α/β: mainly parallel β sheets with
intervening α helices; may also have mixed β
sheets (metabolic enzymes)

6 Classes of Protein Structure
4) Class α+β: mainly segregated α helices and
antiparallel β sheets
5) Multidomain (α and β) proteins: more than
one of the above four domains
6) Membrane and cell-surface proteins and
peptides, excluding proteins of the immune
system

α Class Protein (hemoglobin)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=3hhb;page=;pid=&opt=show&size=250

β Class Protein (T-Cell CD8)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1cd8;page=;pid=&opt=show&size=500

α/β Class Protein
(tryptophan synthase)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=2wsy;page=;pid=&opt=show&size=500

α+β Class Protein
(1RNB)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1rnb;page=;pid=&opt=show&size=500

Membrane Protein (1OPF)
• http://www.rcsb.org/pdb/cgi/explore.cgi?job=graphics;pdbId=1opf;page=;pid=&opt=show&size=500

Protein Structure Databases
• Databases of three dimensional structures of
proteins, where structure has been solved
using X-ray crystallography or nuclear
magnetic resonance (NMR) techniques
• Protein Databases:
–PDB
–SCOP
– Swiss-Prot
–PIR

Protein Structure Databases
• Most extensive for 3-D structure is the
Protein Data Bank (PDB)
• Current release of PDB (April 8, 2003)
has 20,622 structures

Partial PDB File
ATOM 1 N VAL A 1 6.452 16.459 4.843 7.00 47.38 3HHB 162
ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163
ATOM 3 C VAL A 1 8.561 17.703 5.038 6.00 37.13 3HHB 164
ATOM 4 O VAL A 1 8.992 17.182 6.072 8.00 36.25 3HHB 165
ATOM 5 CB VAL A 1 6.342 18.738 5.727 6.00 55.13 3HHB 166
ATOM 6 CG1 VAL A 1 7.114 20.033 5.993 6.00 54.30 3HHB 167
ATOM 7 CG2 VAL A 1 4.924 19.032 5.232 6.00 64.75 3HHB 168
ATOM 8 N LEU A 2 9.333 18.209 4.095 7.00 30.18 3HHB 169
ATOM 9 CA LEU A 2 10.785 18.159 4.237 6.00 35.60 3HHB 170
ATOM 10 C LEU A 2 11.247 19.305 5.133 6.00 35.47 3HHB 171
ATOM 11 O LEU A 2 11.017 20.477 4.819 8.00 37.64 3HHB 172
ATOM 12 CB LEU A 2 11.451 18.286 2.866 6.00 35.22 3HHB 173
ATOM 13 CG LEU A 2 11.081 17.137 1.927 6.00 31.04 3HHB 174
ATOM 14 CD1 LEU A 2 11.766 17.306 .570 6.00 39.08 3HHB 175
ATOM 15 CD2 LEU A 2 11.427 15.778 2.539 6.00 38.96 3HHB 176

Description of PDB File
• Second column: atom serial number
• Fourth column: current amino acid
• Sixth column: amino acid position in the
polypeptide chain
• Columns 7, 8, and 9: x, y, and z coordinates
(in angstroms)
• The 11th column: temperature factor, which can be
used as a measurement of uncertainty
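
The column layout can be made concrete with a small parser. This is a sketch for the excerpt shown in these slides only: it splits each line on whitespace, which works here because every field is present, whereas real PDB files are fixed-width and are more safely parsed by character position. The `Atom` tuple and `parse_atom_line` are illustrative names.

```python
from collections import namedtuple

Atom = namedtuple("Atom", "serial name res_name chain res_seq x y z temp_factor")

def parse_atom_line(line):
    """Parse one whitespace-separated ATOM line from the slide excerpt."""
    f = line.split()
    if not f or f[0] != "ATOM":
        return None
    return Atom(
        serial=int(f[1]),        # column 2: atom serial number
        name=f[2],               # column 3: atom name (N, CA, C, O, ...)
        res_name=f[3],           # column 4: amino acid
        chain=f[4],              # column 5: chain identifier
        res_seq=int(f[5]),       # column 6: residue position in the chain
        x=float(f[6]), y=float(f[7]), z=float(f[8]),   # columns 7-9, in angstroms
        temp_factor=float(f[10]),                       # column 11
    )

sample = """\
ATOM 1 N VAL A 1 6.452 16.459 4.843 7.00 47.38 3HHB 162
ATOM 2 CA VAL A 1 7.060 17.792 4.760 6.00 48.47 3HHB 163
ATOM 14 CD1 LEU A 2 11.766 17.306 .570 6.00 39.08 3HHB 175
"""
atoms = [parse_atom_line(line) for line in sample.splitlines()]
for atom in atoms:
    print(atom.res_name, atom.res_seq, atom.name, atom.x, atom.y, atom.z)
```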

Protein Structure
Classification Databases
• Structural Classification of Proteins
(SCOP)
• based on expert definition of structural
similarities
• SCOP classifies by class, fold, superfamily,
and family
• http://scop.mrc-lmb.cam.ac.uk/scop/

Protein Structure
Classification Databases
• Classification by class, architecture,
topology, and homology (CATH)
• Classifies proteins into hierarchical levels by
class
• α/β and α+β are considered to be a single
class
• http://www.biochem.ucl.ac.uk/bsm/cath/

Protein Structure
Classification Databases
• Molecular Modeling Database (MMDB)
• structures from PDB categorized into
structurally related groups using the Vector
Alignment Search Tool (VAST)
• looks for similar arrangements of secondary
structural elements
• http://www.ncbi.nlm.nih.gov/Entrez

Protein Structure
Classification Databases
• Spatial Arrangement of Backbone
Fragments (SARF)
• categorized on structural similarities,
similar to the MMDB
• http://www-lmmb.ncifcrf.gov/~nicka/sarf2.html

Visualization of Proteins
• A number of programs convert atomic
coordinates of 3-D structures into views of the
molecule
• allow the user to manipulate the molecule by
rotation, zooming, etc.
• Critical in drug design: yields insight into
how the protein might interact with ligands at
active sites

Visualization of Proteins
• Most popular program for viewing 3-dimensional
structures is RasMol
RasMol: http://www.umass.edu/microbio/rasmol/
Chime: http://www.umass.edu/microbio/chime/
Cn3D: http://www.ncbi.nlm.nih.gov/Structure/
Mage: http://kinemage.biochem.duke.edu/website/kinhome.html
Swiss 3D viewer: http://www.expasy.ch/spdbv/mainpage.html

Alignment of Protein Structure
• Three-dimensional structure of one protein
compared against three-dimensional
structure of second protein
• Atoms fit together as closely as possible to
minimize the average deviation
• Structural similarity between proteins does
not necessarily mean evolutionary
relationship
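
The "average deviation" being minimized is usually reported as a root-mean-square deviation (RMSD) over paired atoms. The sketch below only evaluates the RMSD for two coordinate sets that are assumed to be already superimposed and paired atom by atom; finding the rotation and translation that minimize it is a separate step that is not shown. The function name is illustrative.

```python
import math

def rmsd(coords_a, coords_b):
    """RMSD between two equal-length lists of (x, y, z) points, paired by index."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("coordinate lists must be non-empty and of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))

# Two made-up three-atom fragments that differ slightly.
a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
b = [(0.0, 0.5, 0.0), (3.8, 0.0, 0.5), (7.6, 0.0, 0.0)]
print(rmsd(a, b))
```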

Alignment of Protein Structure
• Positions of atoms in three-dimensional
structures compared
• Look for positions of secondary
structural elements (helices and
strands) within a protein domain

Alignment of Protein Structure
• Distances between carbon atoms
examined to determine the degree to which
structures may be superimposed
• Side chain information can be
incorporated
–Buried; visible

SSAP
• Sequential Structure Alignment
Program
• Incorporates double dynamic
programming to produce a structural
alignment between two proteins

Steps in SSAP
• 1) Calculate vectors from Cβ of one amino
acid to set of nearby amino acids
– Vectors from two separate proteins compared
– Difference (expressed as an angle) calculated,
and converted to score
• 2) Matrix for scores of vector differences
from one protein to the next is computed.

Steps in SSAP
• 3) Optimal alignment found using
global dynamic programming, with a
constant gap penalty
• 4) Next amino acid residue
considered, optimal path to align this
amino acid to the second sequence
computed

Steps in SSAP
• 5) Alignments transferred to
summary matrix
– If paths cross same matrix position, scores
are summed
– If part of alignment path found in both
matrices, evidence of similarity

Steps in SSAP
• 6) Dynamic programming alignment
is performed for the summary matrix
– Final alignment represents optimal
alignment between the protein structures
– Resulting score converted so it can be
compared to see how closely related two
structures are
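
The final step above (global dynamic programming over a summary matrix) can be sketched as follows. This is not the SSAP implementation: it assumes the summary scores are already available as a plain matrix, charges a fixed penalty per gapped position, and uses made-up names (`align`, `score`, `gap`).

```python
def align(score, gap=1.0):
    """Global alignment over a precomputed score matrix.

    score[i][j] is the score for pairing residue i of protein A with
    residue j of protein B. Returns (total score, list of aligned (i, j) pairs).
    """
    n, m = len(score), len(score[0])
    # best[i][j]: best total for the first i residues of A and first j of B.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = -gap * i
    for j in range(1, m + 1):
        best[0][j] = -gap * j
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(best[i - 1][j - 1] + score[i - 1][j - 1],
                             best[i - 1][j] - gap,
                             best[i][j - 1] - gap)
    # Trace back from the corner to recover the aligned pairs.
    pairs = []
    i, j = n, m
    while i > 0 and j > 0:
        if best[i][j] == best[i - 1][j - 1] + score[i - 1][j - 1]:
            pairs.append((i - 1, j - 1))
            i, j = i - 1, j - 1
        elif best[i][j] == best[i - 1][j] - gap:
            i -= 1
        else:
            j -= 1
    pairs.reverse()
    return best[n][m], pairs

# Made-up summary scores: residue 1 of protein A has no good partner in B.
total, pairs = align([[5, 0], [0, 0], [0, 5]], gap=1.0)
print(total, pairs)
```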

Distance Matrix Approach
• Uses graphical procedure similar to dot
plots
• Identifies atoms that lie most closely
together in three-dimensional structure
• Two sequences with similar structure
can have dot plots superimposed

Distance Matrix Approach
• Values in distance matrix represent distance
between the Cα atoms in the
three-dimensional structure
• positions of closest packing atoms marked
with a dot to highlight regions of interest
• Similar groups superimposed as closely as
possible by minimizing sum of atomic
distances
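
A minimal sketch of building such a distance matrix and marking close pairs is shown below. The 8 Å cutoff, the minimum sequence separation, and the function names are illustrative assumptions rather than values from the lecture; `math.dist` requires Python 3.8 or later.

```python
import math

def distance_matrix(ca_coords):
    """Pairwise distances between Cα coordinates given as (x, y, z) tuples."""
    return [[math.dist(a, b) for b in ca_coords] for a in ca_coords]

def contacts(dmat, cutoff=8.0, min_separation=3):
    """Pairs (i, j) at or within the cutoff, skipping near neighbors in sequence."""
    n = len(dmat)
    return [(i, j)
            for i in range(n)
            for j in range(i + min_separation, n)
            if dmat[i][j] <= cutoff]

# Cα atoms of VAL 1 and LEU 2 from the PDB excerpt shown earlier.
d = distance_matrix([(7.060, 17.792, 4.760), (10.785, 18.159, 4.237)])
print(round(d[0][1], 2))
```

The pairs returned by `contacts` correspond to the positions that would be marked with a dot.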

DALI
• Distance-matrix ALIgnment (DALI)
• Uses distance matrix method to align protein
structures
• Assembly step uses Monte Carlo simulation
to find submatrices that can be aligned
• Existing structures that have been compared
are organized into the FSSP database

Fast Structural Similarity
Search
• Compare types and arrangements of
secondary structures within two proteins
• If elements similarly arranged, three-
dimensional structures are similar
• VAST and SARF are programs that use
these fast methods

Structural Motifs Based on
Sequence Analysis
• Some structural elements can be
determined by looking at sequence
composition
– zinc finger motifs
– leucine zippers
– coiled-coil structures

Zinc Finger Motifs
• Found by looking at
order and spacing of
cysteine and
histidine residues
• Typical zinc finger
motifs are
composed of two
cysteines followed
by two histidines
Image source: www.bmb.psu.edu/faculty/tan/lab/tanlab_gallery_protdna.html
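
Searching for the order and spacing of cysteines and histidines can be expressed as a regular expression. The sketch below uses a simplified C2H2 spacing, C-x(2,4)-C-x(12)-H-x(3,5)-H, which is an assumption for illustration; curated motif databases use more specific patterns. The example sequence is made up.

```python
import re

# Simplified C2H2 zinc finger spacing (an assumed pattern, not from the slides).
C2H2 = re.compile(r"C.{2,4}C.{12}H.{3,5}H")

def find_zinc_fingers(sequence):
    """Return (start index, matched text) for each non-overlapping match."""
    return [(m.start(), m.group()) for m in C2H2.finditer(sequence)]

print(find_zinc_fingers("AACPECGKAFSRSDELTRHIRTHGG"))
```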

Leucine Zippers
• Found by looking for
two parallel alpha
helices held together
• Interactions between
hydrophobic leucine
residues found every
seventh position in helix
Image source: ww2.mcgill.ca/biology/undergra/c200a/sec3-5.htm
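
The "leucine at every seventh position" rule translates directly into a pattern search. The sketch below looks for four leucines spaced seven residues apart; requiring four repeats is an assumption for illustration, a match is only a candidate and not proof of a zipper, and the sequence shown is made up.

```python
import re

# Four leucines, each seven positions after the previous one.
LEUCINE_REPEAT = re.compile(r"L.{6}L.{6}L.{6}L")

def find_leucine_repeats(sequence):
    """Return start indices of non-overlapping heptad leucine repeats."""
    return [m.start() for m in LEUCINE_REPEAT.finditer(sequence)]

print(find_leucine_repeats("M" + "LAAAAAA" * 3 + "LK"))
```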

Transmembrane Proteins
• Traverse the membrane back and forth
through alpha helices
• Typical length: 20-30
residues
• Transmembrane alpha
helices have hydrophobic
residues on the inside-
facing portions, and
hydrophilic residues on the
outside
Image source:
http://www.northwestern.edu/neurobiology/faculty/pinto2/pinto_12big.jpg
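
A simple way to look for candidate membrane-spanning stretches is a sliding-window hydropathy average. The sketch below assumes the Kyte-Doolittle scale with a 19-residue window and a threshold of 1.6; these are conventional choices, not values given in the lecture, and the function names and test sequence are illustrative.

```python
# Kyte-Doolittle hydropathy values (an assumed choice of scale).
KD = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}

def hydropathy_profile(sequence, window=19):
    """Average hydropathy of every full window along the sequence."""
    return [sum(KD[aa] for aa in sequence[i:i + window]) / window
            for i in range(len(sequence) - window + 1)]

def candidate_tm_windows(sequence, window=19, threshold=1.6):
    """Start indices of windows whose average hydropathy reaches the threshold."""
    profile = hydropathy_profile(sequence, window)
    return [i for i, value in enumerate(profile) if value >= threshold]

# Made-up sequence: a 20-residue leucine stretch flanked by lysines.
print(candidate_tm_windows("K" * 10 + "L" * 20 + "K" * 10))
```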

Membrane Prediction
Programs
• PHDhtm: employs neural network approach;
neural network trained to recognize sequence
patterns and variations of helices in
transmembrane proteins of known structures
• TMpred: functions by searching a protein
against a sequence scoring matrix obtained
by aligning the sequences of all known
transmembrane alpha helix regions

Chou-Fasman Method
• based on analyzing frequency of amino acids in
different secondary structures
– A, E, L, and M are strong predictors of alpha helices
– P and G are predictors of a break in a helix
• Table of predictive values created for alpha helices,
beta sheets, and loops
• Structure with the greatest overall prediction value,
if greater than 1, is used to determine the structure
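
The idea of comparing averaged propensities can be sketched as below. This is a heavy simplification: the full Chou-Fasman procedure uses nucleation and extension rules that are omitted here. The propensity values are the commonly tabulated helix and sheet parameters and should be checked against a published table before being relied on; the window size, tie-breaking, and function name are assumptions.

```python
# Commonly tabulated helix and sheet propensities (verify before use).
P_HELIX = {
    "E": 1.51, "M": 1.45, "A": 1.42, "L": 1.21, "K": 1.16, "F": 1.13, "Q": 1.11,
    "W": 1.08, "I": 1.08, "V": 1.06, "D": 1.01, "H": 1.00, "R": 0.98, "T": 0.83,
    "S": 0.77, "C": 0.70, "Y": 0.69, "N": 0.67, "P": 0.57, "G": 0.57,
}
P_SHEET = {
    "V": 1.70, "I": 1.60, "Y": 1.47, "F": 1.38, "W": 1.37, "L": 1.30, "C": 1.19,
    "T": 1.19, "Q": 1.10, "M": 1.05, "R": 0.93, "N": 0.89, "H": 0.87, "A": 0.83,
    "S": 0.75, "G": 0.75, "K": 0.74, "P": 0.55, "D": 0.54, "E": 0.37,
}

def predict(sequence, window=5):
    """Label each residue H (helix), E (sheet), or - (neither) by window average."""
    half = window // 2
    labels = []
    for i in range(len(sequence)):
        segment = sequence[max(0, i - half): i + half + 1]
        helix = sum(P_HELIX[aa] for aa in segment) / len(segment)
        sheet = sum(P_SHEET[aa] for aa in segment) / len(segment)
        if helix > 1.0 and helix >= sheet:
            labels.append("H")
        elif sheet > 1.0:
            labels.append("E")
        else:
            labels.append("-")
    return "".join(labels)

print(predict("AEELLMAE"))
print(predict("VIVYVIV"))
print(predict("GPGSG"))
```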

GOR Method
• Improves upon the Chou-Fasman method
• Assumes amino acids surrounding the central amino
acid influence the secondary structure the central amino acid
is likely to adopt
• Scoring matrices used in GOR method incorporate
information theory and Bayesian statistics
• Mount, pp. 450-451

Neural Network Models
• Programs trained to recognize amino acid
patterns located in known secondary
structures
• distinguish these patterns from patterns not
located in structures
• PHD and NNPREDICT use neural networks

Nearest-neighbor
• machine learning method
• secondary structure conformation of an amino
acid predicted by identifying sequences of
known structure similar to the query, by
looking at the surrounding amino acids
• Nearest-neighbor programs include
PSSP, Simpa96, SOPM, and SOPMA
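
The nearest-neighbor idea can be sketched with a toy example. Everything here is hypothetical: the training windows and their labels are invented for illustration, similarity is a plain count of identical positions (real programs use substitution matrices or profiles), and the function names are made up.

```python
from collections import Counter

# Hypothetical training data: (7-residue window, structure of the central residue).
TRAINING = [
    ("AEELLKK", "H"), ("AEELLKA", "H"), ("EELLKKA", "H"),
    ("VTVYVTV", "E"), ("TVYVTVT", "E"),
    ("GSPGSGP", "L"),
]

def window_similarity(w1, w2):
    """Number of positions at which two equal-length windows agree."""
    return sum(a == b for a, b in zip(w1, w2))

def predict_center(query_window, training=TRAINING, k=3):
    """Majority label among the k training windows most similar to the query."""
    ranked = sorted(training,
                    key=lambda item: window_similarity(query_window, item[0]),
                    reverse=True)
    votes = Counter(label for _, label in ranked[:k])
    return votes.most_common(1)[0][0]

print(predict_center("AEELLKR"))
```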

Prediction of 3-D Structures
• Threading is the most robust technique
• Time consuming
• Requires knowledge of protein structure

Threading
• Searches for structures with similar folds
without sequence similarity
• Threading takes a sequence with unknown
structure and threads it through the
coordinates of a target protein whose
structure has been solved
– X-ray crystallography
–NMR imaging

Threading
• Considered position by position subject
to predetermined constraints
• Thermodynamic calculations made to
determine most energetically favorable
and conformationally stable alignment

Environmental Template
• Environment of each amino acid in each
known structural core is determined
–secondary structure
–area of side chain buried by closeness to
other atoms
–types of nearby side chains

Environmental Template
• Each position classified into one of 18
types
–6 representing increasing levels of residue
burial
–three classes of secondary structure (alpha
helices, beta sheets, and loops).
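
The count of 18 follows from 6 burial levels times 3 secondary structure classes. A minimal sketch of mapping a position to one of the 18 environment classes is shown below; it assumes the burial level has already been binned to an integer from 0 (most exposed) to 5 (most buried), and the numbering scheme and names are illustrative.

```python
SECONDARY = ("helix", "sheet", "loop")
N_BURIAL_LEVELS = 6

def environment_class(burial_level, secondary):
    """Index in 0..17 for a (burial level, secondary structure) combination."""
    if not 0 <= burial_level < N_BURIAL_LEVELS:
        raise ValueError("burial_level must be between 0 and 5")
    return burial_level * len(SECONDARY) + SECONDARY.index(secondary)

all_classes = {environment_class(b, s)
               for b in range(N_BURIAL_LEVELS) for s in SECONDARY}
print(len(all_classes))
```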

Upcoming Seminars
• Topic TBA
–Rafael Irizarry, Johns Hopkins University
• Friday, 4/23/2004
• 8:30 AM –9:30 AM
• LOCATION: K-Building Room 2036 (HSC
Campus)

Presentations
• 4:45 – 5:00 Richard Jones
• 5:00 – 5:15 Steven Xu
• 5:15 – 5:30 Olutola Iyun
• 5:30 – 5:45 Frank Baker
• 5:45 – 6:00 Guanghui Lan
• 6:00 – 6:15 Tim Hardin
• 6:15 – 6:30 Satish Bollimpalli & Ravi
Gundlapalli