Secondary Structure Prediction of proteins

52,013 views 38 slides Dec 21, 2015

About This Presentation

Secondary structure prediction has been around for almost a quarter of a century. The early methods suffered from a lack of data. Predictions were performed on single sequences rather than families of homologous sequences, and there were relatively few known 3D structures from which to derive parame...


Slide Content

Secondary Structure Prediction of Protein
Protein Sequence + Structure
VIJAY

INTRODUCTION

Primary structure (amino acid sequence)

Secondary structure (α-helix, β-sheet)

Tertiary structure (three-dimensional structure formed by assembly of secondary structures)

Quaternary structure (structure formed by more than one polypeptide chain)

Secondary Structure
Defined as the local conformation of the protein backbone
Primary structure → folding → secondary structure
α-helix and β-sheet

Secondary Structure
Regular secondary structure (α-helices, β-sheets)
Irregular secondary structure (tight turns, random coils, bulges)

α-helix

• Common conformation.
• Spiral structure.
• Tightly packed, coiled polypeptide backbone with side chains extending outward.
• Forms spontaneously.
• Stabilized by H-bonding between amide hydrogens and carbonyl oxygens of peptide bonds.
• R-groups lie on the exterior of the helix and perpendicular to its axis.
• One complete turn of the helix spans 3.6 amino acid residues and a distance of 0.54 nm.
e.g. the keratins: entirely α-helical
Myoglobin: 80% helical

• Glycine and proline, bulky amino acids, and charged amino acids favor disruption of the helix.
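
The helix geometry above (3.6 residues per turn, 0.54 nm per turn) implies a rise of about 0.15 nm per residue. A minimal Python sketch of that arithmetic follows; the function name is illustrative and the result describes an ideal helix only.

```python
RESIDUES_PER_TURN = 3.6   # residues in one complete turn (from the slide)
PITCH_NM = 0.54           # axial distance covered by one turn, in nm (from the slide)


def helix_length_nm(n_residues: int) -> float:
    """Approximate axial length of an ideal α-helix with n_residues residues."""
    rise_per_residue = PITCH_NM / RESIDUES_PER_TURN  # about 0.15 nm per residue
    return n_residues * rise_per_residue
```

For example, `helix_length_nm(36)` corresponds to ten full turns of the helix.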

β-sheet

• β-sheets are composed of 2 or more different stretches of the polypeptide, each at least 5-10 amino acids long.
• The folding and alignment of stretches of the polypeptide backbone alongside one another to form β-sheets is stabilized by H-bonding between amide hydrogens and carbonyl oxygens.
• The peptide backbone of the β-sheet is highly extended.
• R groups of adjacent residues point in opposite directions.
• β-sheets are either parallel or antiparallel.

β-sheet (parallel, antiparallel)

What is secondary structure prediction?

• 1st step in the prediction of protein structure.
• Technique concerned with determining the secondary structure of a given polypeptide by locating the coils, alpha helices, and beta strands in the polypeptide.

Given a protein sequence (primary structure):
GHWIATRGQLIREAYEDYRHFSSECPFIP
Predict its secondary structure content (C = coil, H = alpha helix, E = beta strand):
CEEEEECHHHHHHHHHHHCCCHHCCCCCC
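
A per-residue state string like the one above is often easier to read as a list of segments. The sketch below collapses such a string into runs of identical states; the function name and the `(state, start, length)` tuple layout are assumptions chosen for illustration.

```python
from itertools import groupby


def segments(states: str):
    """Collapse a per-residue state string (C/H/E) into (state, start, length) runs."""
    runs = []
    position = 0
    for state, group in groupby(states):
        length = len(list(group))
        runs.append((state, position, length))
        position += length
    return runs
```

Calling `segments` on the predicted string from the slide lists each coil, helix, and strand segment with its 0-based start index and its length.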

Why secondary structure prediction?
• Secondary structure → tertiary structure prediction
• Protein function prediction
• Protein classification
• Predicting structural change
• Detection and alignment of remote homology between proteins
• Detecting transmembrane regions, solvent-accessible residues, and other important features of molecules
• Detection of hydrophobic and hydrophilic regions

Prediction methods
• Statistical methods
    • Chou-Fasman method, GOR I-IV
• Nearest neighbors
    • NNSSP, SSPAL
• Neural networks
    • PHD, Psi-Pred, J-Pred
• Support vector machine (SVM)
• HMM

Chou-Fasman algorithm

Proposed by Chou and Fasman in 1978.

It is based on assigning a set of prediction values to each amino acid residue in the polypeptide and applying an algorithm to the conformational parameters and positional frequencies.

The conformational parameter (also called the preference parameter) for each amino acid is calculated by considering the relative frequency of each of the 20 amino acids in proteins.
From these parameters the states C = coil, H = alpha helix, and E = beta strand are determined.

• A table of prediction values (preference parameters) for each of the 20 amino acids in alpha helix, beta-pleated sheet, and turn has already been calculated and standardised.
• To obtain the prediction value, the frequency of an amino acid (i) in a structure is divided by the frequency of all residues in that structure (s).
• i/s
• The resulting structural parameters p(alpha), p(beta), and p(turn) vary from about 0.5 to 1.5 for the 20 amino acids.
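
The ratio described above can be sketched in a few lines. This is a minimal illustration, assuming the preference parameter is the fraction of one amino acid's occurrences found in a structure divided by the fraction of all residues found in that structure; the function and argument names are hypothetical.

```python
def propensity(n_i_in_s: int, n_i_total: int, n_all_in_s: int, n_all_total: int) -> float:
    """Preference parameter of amino acid i for structure s.

    n_i_in_s    : occurrences of amino acid i inside structure s
    n_i_total   : occurrences of amino acid i in the whole data set
    n_all_in_s  : residues of any type inside structure s
    n_all_total : residues of any type in the whole data set
    """
    f_i = n_i_in_s / n_i_total        # how often i is found in s
    f_all = n_all_in_s / n_all_total  # how often any residue is found in s
    return f_i / f_all
```

A value above 1 means the amino acid occurs in the structure more often than an average residue does; a value below 1 means it occurs less often.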

RULES
A window is scanned to find a short sequence of amino acids that has a high probability of forming one type of structure.
When 4 out of 6 amino acids have a high probability (>1.03): alpha helix.
When 3 out of 5 amino acids have a probability >1.03: beta sheet.

ALGORITHM
• Note the preference parameters for the 20 amino acids in the peptide.
• Scan the window and identify a region where 4 out of 6 contiguous residues have p(alpha helix) > 1.00.
• Continue scanning in both directions until 4 contiguous residues have an average p(alpha helix) < 1.00; this marks the end of the helix.
• If the segment is longer than 5 amino acids and p(alpha helix) > p(beta sheet), the segment is assigned completely as alpha helix.
• Scan the remaining segments and identify further alpha helices in the same way.
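
The nucleation and extension steps above can be sketched as follows. This is a minimal sketch, not the original program: the function names and the handling of sequence ends are assumptions, and the parameter table `p_alpha` must be supplied by the caller (the two-letter table in the usage note is a toy, not published values).

```python
def helix_nuclei(seq, p_alpha, window=6, min_hits=4, threshold=1.00):
    """Return start indices of windows in which at least `min_hits`
    of `window` contiguous residues have p(alpha helix) > threshold."""
    starts = []
    for i in range(len(seq) - window + 1):
        hits = sum(1 for aa in seq[i:i + window] if p_alpha[aa] > threshold)
        if hits >= min_hits:
            starts.append(i)
    return starts


def extend(seq, p_alpha, start, end, probe=4, cutoff=1.00):
    """Grow the half-open region [start, end) one residue at a time in both
    directions until the next `probe` residues average below `cutoff`."""
    def average(lo, hi):
        return sum(p_alpha[aa] for aa in seq[lo:hi]) / (hi - lo)

    while start > 0 and average(max(0, start - probe), start) >= cutoff:
        start -= 1
    while end < len(seq) and average(end, min(len(seq), end + probe)) >= cutoff:
        end += 1
    return start, end
```

With a toy table such as `{"A": 1.4, "G": 0.5}`, `helix_nuclei("GGGGAAAAAAGGGG", toy)` finds the windows rich in `A`, and `extend` then grows one of those windows until the flanking residues stop supporting a helix. The comparison with p(beta sheet) from the slide is left out of this sketch.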

• Identify a region where 3 out of 5 amino acids have p(beta sheet) > 1.00; the region is predicted as beta sheet.
• Continue scanning in both directions until 4 residues have p(beta sheet) < 1.00; this marks the end of the beta sheet.
• If the average p(beta sheet) > 1.05 and p(beta sheet) > p(alpha helix), consider the complete segment a β-pleated sheet.

• If any regions overlap, consider the overlap alpha helix if the average p(alpha helix) > p(beta sheet), or beta sheet if p(alpha helix) < p(beta sheet).

To identify a turn:
p(t) = f(j) × f(j+1) × f(j+2) × f(j+3)
j = residue number
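
The turn formula is a product of four frequencies, one for each of the residues j to j+3. A minimal sketch follows, assuming each factor is looked up in a position-specific frequency table; the table layout `f[k][aa]` and the function name are hypothetical, and real frequency values must be supplied by the caller.

```python
def turn_probability(seq, j, f):
    """p(t) at residue index j (0-based): product of the frequencies
    of residues j, j+1, j+2, j+3.

    f is a list of four dicts; f[k][aa] is the frequency of amino acid
    aa at turn position k (assumed layout, not from the slides).
    """
    p = 1.0
    for k in range(4):
        p *= f[k][seq[j + k]]
    return p
```

Positions with the highest p(t) along the sequence are the candidate turns.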

Result
Accuracy: ~50% to ~60%
Helix formers: alanine, glutamine, leucine, methionine
Helix breakers: proline and glycine
Beta sheet formers: isoleucine, valine, tyrosine
Beta sheet breakers: proline, asparagine, glutamine
Turns contain proline (30%), serine (14%), lysine, asparagine (10%),
glycine (19%), aspartic acid (18%), serine (13%), tyrosine (11%)
http://www.accelrys.com/product/gcg-wisconsin-package/program-list.html

Output of Chou-Fasman

GOR METHOD
• GOR (Garnier, Osguthorpe, Robson), 1978
• The Chou-Fasman method is based on the assumption that each amino acid individually influences the secondary structure of the sequence.
• GOR is based on the assumption that the amino acids flanking the central amino acid also influence the secondary structure.
• Consider a peptide with a central amino acid and side (flanking) amino acids.
• It assumes that amino acids up to 8 residues on either side influence the secondary structure of the central residue.
• 4th version: 64% accurate

ALGORITHM

• It uses a sliding window of 17 amino acids.
• The flanking amino acid sequence and alignment are used to predict the secondary structure of the central residue.
• Better for helices than for sheets, because beta sheets have more inter-sequence hydrogen bonding.
• 36.5% accurate for beta sheets.
• Input: any amino acid sequence.
• Output: the predicted secondary structure.

NEAREST NEIGHBOUR METHOD
• Based on the idea that short homologous sequences of amino acids have the same secondary structure.
• It predicts the secondary structure of a central homologous segment from neighbouring homologous sequences.
• Using a structural database, find sequences of known secondary structure that may be homologous to the target sequence.
• Naturally evolved proteins with 35% identical amino acid sequence will have the same secondary structure.
• Find sequences that match the target sequence.
• Scoring matrix, MSA
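
The idea above can be illustrated with a toy nearest-neighbour vote. This is a minimal sketch under stated assumptions: similarity is plain percent identity rather than a scoring matrix, the database is a hypothetical list of `(segment, central_state)` pairs of equal length, and `k` is an arbitrary choice.

```python
from collections import Counter


def identity(a: str, b: str) -> float:
    """Fraction of identical positions between two equal-length segments."""
    return sum(x == y for x, y in zip(a, b)) / len(a)


def predict_center(segment: str, database, k: int = 3) -> str:
    """Predict the state of the central residue of `segment` by majority
    vote among the k most similar database segments."""
    ranked = sorted(database, key=lambda item: identity(segment, item[0]), reverse=True)
    votes = Counter(state for _, state in ranked[:k])
    return votes.most_common(1)[0][0]
```

Programs in this family replace the identity measure with a scoring matrix and use multiple sequence alignments, as the last bullet of the slide indicates.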

“Singleton” score matrix
        Helix                      Sheet                      Loop
       Buried    Inter  Exposed   Buried    Inter  Exposed   Buried    Inter  Exposed
ALA    -0.578   -0.119   -0.160    0.010    0.583    0.921    0.023    0.218    0.368
ARG     0.997   -0.507   -0.488    1.267   -0.345   -0.580    0.930   -0.005   -0.032
ASN     0.819    0.090   -0.007    0.844    0.221    0.046    0.030   -0.322   -0.487
ASP     1.050    0.172   -0.426    1.145    0.322    0.061    0.308   -0.224   -0.541
CYS    -0.360    0.333    1.831   -0.671    0.003    1.216   -0.690   -0.225    1.216
GLN     1.047   -0.294   -0.939    1.452    0.139   -0.555    1.326    0.486   -0.244
GLU     0.670   -0.313   -0.721    0.999    0.031   -0.494    0.845    0.248   -0.144
GLY     0.414    0.932    0.969    0.177    0.565    0.989   -0.562   -0.299   -0.601
HIS     0.479   -0.223    0.136    0.306   -0.343   -0.014    0.019   -0.285    0.051
ILE    -0.551    0.087    1.248   -0.875   -0.182    0.500   -0.166    0.384    1.336
LEU    -0.744   -0.218    0.940   -0.411    0.179    0.900   -0.205    0.169    1.217
LYS     1.863   -0.045   -0.865    2.109   -0.017   -0.901    1.925    0.474   -0.498
MET    -0.641   -0.183    0.779   -0.269    0.197    0.658   -0.228    0.113    0.714
PHE    -0.491    0.057    1.364   -0.649   -0.200    0.776   -0.375   -0.001    1.251
PRO     1.090    0.705    0.236    1.249    0.695    0.145   -0.412   -0.491   -0.641
SER     0.350    0.260   -0.020    0.303    0.058   -0.075   -0.173   -0.210   -0.228
THR     0.291    0.215    0.304    0.156   -0.382   -0.584   -0.012   -0.103   -0.125
TRP    -0.379   -0.363    1.178   -0.270   -0.477    0.682   -0.220   -0.099    1.267
TYR    -0.111   -0.292    0.942   -0.267   -0.691    0.292   -0.015   -0.176    0.946
VAL    -0.374    0.236    1.144   -0.912   -0.334    0.089   -0.030    0.309    0.998

Neural Network Method

• Prediction is done by utilizing the information in different databases.
• Linear sequence → 3D structure of polypeptide

Neural network
Input signals are summed and turned into zero or one.
[Figure: feed-forward multilayer network with input layer, hidden layer, and output layer neurons; labels J1, J2, J3, J4]
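
A single neuron of the kind described above can be sketched in a few lines. The weights and the threshold of 0 are illustrative assumptions; the slide states only that the summed input is turned into zero or one.

```python
def neuron(inputs, weights, threshold=0.0):
    """Sum the weighted input signals and turn the total into 0 or 1."""
    total = sum(x * w for x, w in zip(inputs, weights))
    return 1 if total > threshold else 0
```

For example, `neuron([1, 0, 1, 1], [0.5, -1.0, 0.25, 0.5])` fires because the weighted sum is positive.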

Neural network training
• Enter sequences
• Compare prediction to reality
• Adjust weights
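
The three training steps can be sketched with a single-neuron (perceptron-style) update rule. This is a swapped-in simplification for illustration: programs such as PHD train multilayer networks, and the learning rate, epoch count, and function names here are assumptions.

```python
def predict(inputs, weights, bias):
    """Threshold neuron: 1 if the weighted sum plus bias is positive, else 0."""
    return 1 if bias + sum(x * w for x, w in zip(inputs, weights)) > 0 else 0


def train(samples, n_inputs, epochs=20, rate=0.1):
    """samples is a list of (inputs, target) pairs with target 0 or 1."""
    weights = [0.0] * n_inputs
    bias = 0.0
    for _ in range(epochs):
        for inputs, target in samples:                    # enter sequences
            error = target - predict(inputs, weights, bias)  # compare prediction to reality
            for i, x in enumerate(inputs):                # adjust weights
                weights[i] += rate * error * x
            bias += rate * error
    return weights, bias
```

When the prediction already matches the known answer, the error is zero and the weights are left unchanged, so training settles once every sample is predicted correctly.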

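Simple neural network with hidden layer

$$\mathrm{out}_i = f\Big(\sum_j J^{2}_{ij}\, f\Big(\sum_k J^{1}_{jk}\, \mathrm{in}_k\Big)\Big)$$

Here the superscripts 1 and 2 label the first and second layer of weights; they are not exponents.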
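
The output of a network with one hidden layer, out_i = f(Σ_j J2[i][j] · f(Σ_k J1[j][k] · in_k)), can be sketched as a forward pass. The sigmoid used for f is an assumption, since the slides do not name the function, and the nested-list weight layout is chosen for illustration.

```python
import math


def sigmoid(x: float) -> float:
    """Smooth squashing function; an assumed choice for f."""
    return 1.0 / (1.0 + math.exp(-x))


def forward(inputs, J1, J2, f=sigmoid):
    """Forward pass through one hidden layer.

    J1[j][k] is the weight from input k to hidden unit j;
    J2[i][j] is the weight from hidden unit j to output unit i.
    """
    hidden = [f(sum(w * x for w, x in zip(row, inputs))) for row in J1]
    return [f(sum(w * h for w, h in zip(row, hidden))) for row in J2]
```

For secondary structure prediction the output layer would have one unit per state, and the state whose unit has the largest value is assigned to the central residue.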

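Neural network for secondary structure
[Figure: network with one input unit per symbol (A C D E F G H I K L M N P Q R S T V W Y .) and output units H, E, L]
Example input window with the state of each residue:
D (L), R (E), Q (E), G (E), F (E), V (E), P (E), A (H), A (H), Y (H), V (E), K (E), K (E)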
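
The input layer shown above takes one unit per symbol for every position in the window. A minimal sketch of that encoding follows, assuming a one-hot (0/1) scheme and treating the `.` symbol as an extra 21st input; the function name is illustrative.

```python
ALPHABET = "ACDEFGHIKLMNPQRSTVWY."  # 20 amino acids plus the '.' symbol from the figure


def one_hot_window(window: str):
    """Encode a window of residues as a flat list of 0/1 inputs,
    with len(ALPHABET) units per window position."""
    vector = []
    for symbol in window:
        units = [0] * len(ALPHABET)
        units[ALPHABET.index(symbol)] = 1  # raises ValueError for unknown symbols
        vector.extend(units)
    return vector
```

The 13-residue window from the slide, `one_hot_window("DRQGFVPAAYVKK")`, therefore produces 13 × 21 input values, exactly 13 of which are set to 1.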

Summary
Introduction
What is secondary structure prediction
Why
Chou-Fasman method
GOR I-IV
Nearest neighbors
Neural network

Reference

Suggested reading:
• Chapter 15 in “Current Topics in Computational Molecular Biology”, edited by Tao Jiang, Ying Xu, and Michael Zhang. MIT Press, 2002.
• Bioinformatics by Cynthia and Per Jambeck
• Bioinformatics by S.C. Rastogi
• Bioinformatics by Andreas

Optional reading:
• Review by Burkhard Rost: http://cubic.bioc.columbia.edu/papers/2003_rev_dekker/paper.html