Prediction of protein structure: homology modeling

Uploaded by siya886600 (40 slides, May 10, 2025)

Aim
Structure prediction tries to build models of the 3D
structures of proteins that can help in
understanding structure-function relationships.

Number of entries per database:
GenBank/EMBL 105,000,000
UniProt 5,200,000
PDB 47,000
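A quick back-of-the-envelope check of the gap these counts imply, using only the numbers listed above (they are a snapshot from the slides, not current figures):

```python
# Entry counts as listed on the slide (snapshot values)
counts = {
    "GenBank/EMBL": 105_000_000,
    "UniProt": 5_200_000,
    "PDB": 47_000,
}

# Sequence entries available per experimentally determined structure
for db in ("GenBank/EMBL", "UniProt"):
    ratio = counts[db] / counts["PDB"]
    print(f"{db}: about {ratio:,.0f} entries per PDB structure")
```

Roughly one structure exists for every hundred or so UniProt sequences, which is the motivation for predicting structures instead of solving them all experimentally.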

The protein folding problem
The information needed to build the 3D structure is encoded in the
protein sequence
Proteins fold into their native structure within seconds
Native structures are both thermodynamically
stable and kinetically accessible

AVVTW...GTTWVR
Ab-initio prediction
Prediction from sequence using first principles

Ab-initio prediction
“In theory”, we should be able to build native
structures from first principles using sequence
information and molecular dynamics
simulations: “ab-initio prediction of structure”
1 μs folding simulation of a model protein
(Duan-Kollman: Science, 277, 1793, 1998).
Reversible folding simulations of peptides (20-200 ns)
(Daura et al., Angew. Chem., 38, 236, 1999).
Distributed folding simulations of villin (36 residues)
(Zagrovic et al., JMB, 323, 927, 2002).

... the bad news ...
Simulations cannot be extended to the
“seconds” range
Simulations are limited to small systems and to fast
folding/unfolding events in known structures
steered dynamics
biased molecular dynamics
simplified systems

Typical shortcuts
Reduce the conformational space:
1-2 atoms per residue
fixed lattices
Statistical force fields derived from known structures:
average distances between residues
interactions
Use building blocks: fragments of 3-9 residues from PDB
structures
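To make the "fixed lattice" and "simplified force field" shortcuts concrete, here is a sketch of the classic 2D HP lattice model. This model is swapped in as an illustration and is not taken from the slides: each residue occupies one point of a square lattice, residues are labeled only hydrophobic (H) or polar (P), and each non-bonded H-H contact scores -1.

```python
# 2D HP lattice model: one lattice point per residue on a square lattice.
# Energy = -1 for every pair of hydrophobic (H) residues that are lattice
# neighbours but not consecutive in the chain.


def is_valid_walk(coords):
    """The chain must be self-avoiding and move one lattice unit per step."""
    if len(set(coords)) != len(coords):
        return False
    for (x1, y1), (x2, y2) in zip(coords, coords[1:]):
        if abs(x1 - x2) + abs(y1 - y2) != 1:
            return False
    return True


def hp_energy(sequence, coords):
    if len(sequence) != len(coords) or not is_valid_walk(coords):
        raise ValueError("invalid conformation")
    position = {c: i for i, c in enumerate(coords)}
    energy = 0
    for i, (x, y) in enumerate(coords):
        if sequence[i] != "H":
            continue
        for nb in ((x + 1, y), (x - 1, y), (x, y + 1), (x, y - 1)):
            j = position.get(nb)
            # j > i + 1: count each contact once and skip chain neighbours
            if j is not None and j > i + 1 and sequence[j] == "H":
                energy -= 1
    return energy


# A 4-residue chain folded into a square: residues 0 and 3 touch.
print(hp_energy("HPPH", [(0, 0), (1, 0), (1, 1), (0, 1)]))  # -1
print(hp_energy("HPPH", [(0, 0), (1, 0), (2, 0), (3, 0)]))  # 0
```

A search over all valid walks for the lowest energy is what such simplified models replace full molecular dynamics with; the search space is still exponential, but each evaluation is cheap.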

Results from ab-initio prediction
A protein from E. coli predicted at 7.6 Å
(CASP3, H. Scheraga)
Average error 5-10 Å
Function cannot be predicted
Long simulations
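Errors quoted in Å, such as the 5-10 Å above, are commonly reported as a root-mean-square deviation (RMSD) between predicted and experimental atom positions; the slides do not say which measure was used, so this is an assumption. A minimal sketch, assuming the two coordinate sets are already superimposed (a real comparison first performs an optimal superposition, for example with the Kabsch algorithm):

```python
import math


def rmsd(coords_a, coords_b):
    """RMSD (same units as the input, e.g. Å) between two equally long lists
    of (x, y, z) points that are assumed to be already superimposed."""
    if len(coords_a) != len(coords_b) or not coords_a:
        raise ValueError("need two non-empty coordinate lists of equal length")
    total = 0.0
    for (xa, ya, za), (xb, yb, zb) in zip(coords_a, coords_b):
        total += (xa - xb) ** 2 + (ya - yb) ** 2 + (za - zb) ** 2
    return math.sqrt(total / len(coords_a))


# Hypothetical model whose three atoms are each displaced 2 Å along z
model = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
native = [(0.0, 0.0, 2.0), (3.8, 0.0, 2.0), (7.6, 0.0, 2.0)]
print(rmsd(model, native))  # 2.0
```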

Comparative modelling
The most efficient way to predict protein
structure is by comparison with known 3D
structures

Protein folds

Basic concept
Among homologous proteins, 3D structure is more
conserved than sequence
Some amino acids are “equivalent” to each other
Evolutionary pressure allows only amino acid
substitutions that leave the 3D structure largely
unaltered
Two proteins with “similar” sequences are expected to have
the “same” 3D structure

Possible scenarios
1. Homology can be recognized using sequence comparison tools or
protein family databases (BLAST, Clustal, Pfam, ...).
Structural and functional predictions are feasible
2. Homology exists but cannot be recognized easily (PSI-BLAST,
threading)
Low resolution fold predictions are possible. No functional
information.
3. No homology
1D predictions. Sequence motifs. Limited functional prediction.
Ab-initio prediction

fold prediction

3D struc. prediction

1D prediction
Prediction is based on averaging amino acid
properties
AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS
Average over a window
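The window-averaging idea can be sketched as below. The property scale is the Kyte-Doolittle hydrophobicity scale, swapped in as an example because the slide does not name one, and the window length of 7 is an arbitrary illustrative choice (transmembrane-helix scans usually use longer windows, around 19 residues).

```python
# Kyte-Doolittle hydrophobicity values (one-letter amino acid codes)
KYTE_DOOLITTLE = {
    "A": 1.8, "R": -4.5, "N": -3.5, "D": -3.5, "C": 2.5,
    "Q": -3.5, "E": -3.5, "G": -0.4, "H": -3.2, "I": 4.5,
    "L": 3.8, "K": -3.9, "M": 1.9, "F": 2.8, "P": -1.6,
    "S": -0.8, "T": -0.7, "W": -0.9, "Y": -1.3, "V": 4.2,
}


def window_average(sequence, scale, window=7):
    """Average a per-residue property over a sliding window.

    Returns (position, average) pairs, one per window, with the value
    assigned to the window's central residue (1-based), so the first and
    last window // 2 residues get no value.
    """
    if window % 2 == 0 or window > len(sequence):
        raise ValueError("window must be odd and no longer than the sequence")
    values = [scale[aa] for aa in sequence]
    half = window // 2
    profile = []
    for centre in range(half, len(sequence) - half):
        chunk = values[centre - half: centre + half + 1]
        profile.append((centre + 1, sum(chunk) / window))
    return profile


seq = "AGGCFHIKLAAGIHLLVILVVKLGFSTRDEEASS"  # example sequence from the slide
for pos, score in window_average(seq, KYTE_DOOLITTLE):
    print(f"{pos:3d} {seq[pos - 1]} {score:+.2f}")
```

For this sequence the profile peaks around the LLVILVV stretch, the kind of hydrophobic segment such scans flag as a possible buried or transmembrane region.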

1D prediction: properties
Secondary structure propensities
Hydrophobicity (transmembrane)
Accessibility
...

Aminoacido P() P() P(turn)
Ala 1.29 0.9 0.78
Cys 1.11 0.74 0.8
Leu 1.3 1.02 0.59
Met 1.47 0.97 0.39
Glu 1.44 0.75 1
Gln 1.27 0.8 0.97
His 1.22 1.08 0.69
Lys 1.23 0.77 0.96
Val 0.91 1.49 0.47
Ile 0.97 1.45 0.51
Phe 1.07 1.32 0.58
Tyr 0.72 1.25 1.05
Trp 0.99 1.14 0.75
Thr 0.82 1.21 1.03
Gly 0.56 0.92 1.64
Ser 0.82 0.95 1.33
Asp 1.04 0.72 1.41
Asn 0.9 0.76 1.23
Pro 0.52 0.64 1.91
Arg 0.96 0.99 0.88
Propensities Chou-Fasman
Biochemistry 17, 4277 1978
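A simplified, Chou-Fasman-style use of the table above can be sketched as follows: average P(α) and P(β) over a short window around each residue and assign helix (H), strand (E) or coil (C). This is only a sketch: the thresholds 1.03 and 1.05 are borrowed from the Chou-Fasman helix and sheet criteria, the 5-residue window is an assumption, the turn column is ignored, and the full method's nucleation and extension rules are omitted.

```python
# (P(alpha), P(beta), P(turn)) from the table above, keyed by one-letter code
PROPENSITY = {
    "A": (1.29, 0.90, 0.78), "C": (1.11, 0.74, 0.80), "L": (1.30, 1.02, 0.59),
    "M": (1.47, 0.97, 0.39), "E": (1.44, 0.75, 1.00), "Q": (1.27, 0.80, 0.97),
    "H": (1.22, 1.08, 0.69), "K": (1.23, 0.77, 0.96), "V": (0.91, 1.49, 0.47),
    "I": (0.97, 1.45, 0.51), "F": (1.07, 1.32, 0.58), "Y": (0.72, 1.25, 1.05),
    "W": (0.99, 1.14, 0.75), "T": (0.82, 1.21, 1.03), "G": (0.56, 0.92, 1.64),
    "S": (0.82, 0.95, 1.33), "D": (1.04, 0.72, 1.41), "N": (0.90, 0.76, 1.23),
    "P": (0.52, 0.64, 1.91), "R": (0.96, 0.99, 0.88),
}


def predict_ss_simplified(sequence, window=5):
    """Assign H/E/C per residue from window-averaged propensities.
    Windows are truncated at the sequence ends."""
    half = window // 2
    states = []
    for i in range(len(sequence)):
        chunk = sequence[max(0, i - half): i + half + 1]
        p_a = sum(PROPENSITY[aa][0] for aa in chunk) / len(chunk)
        p_b = sum(PROPENSITY[aa][1] for aa in chunk) / len(chunk)
        if p_a > 1.03 and p_a >= p_b:
            states.append("H")
        elif p_b > 1.05 and p_b > p_a:
            states.append("E")
        else:
            states.append("C")
    return "".join(states)


print(predict_ss_simplified("AAAAAAAPGPGVVVVVVV"))
```

Single-sequence rules of this kind are the "original methods" whose limited accuracy motivated the profile-based methods discussed later.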


Some programs (www.expasy.org)
BCM PSSP - Baylor College of Medicine
Prof - Cascaded Multiple Classifiers for Secondary Structure Prediction
GOR I (Garnier et al, 1978) [At PBIL or at SBDS]
GOR II (Gibrat et al, 1987)
GOR IV (Garnier et al, 1996)
HNN - Hierarchical Neural Network method (Guermeur, 1997)
Jpred - A consensus method for protein secondary structure prediction
at University of Dundee
nnPredict - University of California at San Francisco (UCSF)
PredictProtein - PHDsec, PHDacc, PHDhtm, PHDtopology, PHDthreader,
MaxHom, EvalSec from Columbia University
PSA - BioMolecular Engineering Research Center (BMERC) / Boston
PSIpred - Various protein structure prediction methods at Brunel
University
SOPM (Geourjon and Deléage, 1994)
SOPMA (Geourjon and Deléage, 1995)
AGADIR - An algorithm to predict the helical content of peptides

1D prediction
Original methods: a single sequence and uniform
parameters (25-30%)
First improvements: parameters specific
to protein classes
Current methods use sequence profiles obtained
from multiple alignments, and neural networks
to extract parameters (70-75%; 98% for
transmembrane helices)
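The percentages above are presumably per-residue three-state accuracies (usually called Q3: the fraction of residues whose predicted state, helix, strand or coil, matches the observed one); the slide does not define them, so this is an assumption. A minimal sketch of that measure:

```python
def q3(predicted, observed):
    """Percentage of positions where the predicted 3-state label (H, E, C)
    equals the observed one."""
    if len(predicted) != len(observed) or not observed:
        raise ValueError("strings must be non-empty and the same length")
    correct = sum(p == o for p, o in zip(predicted, observed))
    return 100.0 * correct / len(observed)


print(q3("HHHHCCEEEC", "HHHCCCEEEE"))  # 80.0
```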

Methods for remote homology
Homology can be recognized using PSI-Blast
Fold prediction is possible using threading
methods
Accurate 3D prediction is not possible: no
structure-function relationship can be inferred
from the models

Threading
The unknown sequence is “folded” onto a number of
known structures
Scoring functions evaluate the fit between
sequence and structure using statistical
functions and sequence comparison

Query (unknown sequence):
Sequence             ATTWV....PRKSCT
Pred. sec. struc.    HHHHH....CCBBBB
Pred. accessibility  eeebb....eeebeb
Templates (known structures), scores 10.5, 5.2, ...; SELECTED HIT:
Sequence   GGTV....ATTW ........... ATTVL....FFRK
Obs. SS    BBBB....CCHH ........... HHHB.....CBCB
Obs. acc.  EEBE.....BBEB ........... BBEBB....EBBE
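The comparison above (the query's predicted secondary structure and accessibility versus each template's observed values) can be sketched as a toy, gapless scoring function. This is a hypothetical illustration: the letter codes come from the figure, the +1-per-match scoring is an assumption, and real threading programs use statistical potentials and gapped alignments computed by dynamic programming.

```python
def toy_threading_score(pred_ss, pred_acc, tmpl_ss, tmpl_acc):
    """Hypothetical score: +1 where the query's predicted secondary structure
    matches the template's observed one, +1 where predicted and observed
    accessibility (e/E = exposed, b/B = buried) agree. Strings are assumed
    to be already aligned position by position, without gaps."""
    if not (len(pred_ss) == len(pred_acc) == len(tmpl_ss) == len(tmpl_acc)):
        raise ValueError("all strings must have the same length")
    ss_matches = sum(p == t for p, t in zip(pred_ss, tmpl_ss))
    acc_matches = sum(p.lower() == t.lower() for p, t in zip(pred_acc, tmpl_acc))
    return ss_matches + acc_matches


def rank_templates(pred_ss, pred_acc, templates):
    """Return (score, name) pairs, best first; the top one is the selected hit."""
    scored = [
        (toy_threading_score(pred_ss, pred_acc, ss, acc), name)
        for name, (ss, acc) in templates.items()
    ]
    return sorted(scored, reverse=True)


# Made-up query and templates using the figure's letter codes
query_ss, query_acc = "HHHHHCCBBBB", "eeebbeeebeb"
templates = {
    "template_1": ("HHHHCCCBBBB", "EEEBBEEEBEB"),
    "template_2": ("BBBBCCHHHHH", "BBEBEBBEEBE"),
}
print(rank_templates(query_ss, query_acc, templates))
# [(21, 'template_1'), (4, 'template_2')]
```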

[Figure: threading accuracy. Y axis: correct hits (0 to 0.35); x axis: % sequence identity (5 to 25)]

Comparative modelling
Works well for sequence identity > 30%
Accuracy is very high for sequence identity > 60%
Reminder
The model must be USEFUL
Only the “interesting” regions of the protein need
to be modelled
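Sequence identity, which drives the thresholds above, can be computed from a pairwise alignment as sketched here. Conventions differ (dividing by aligned positions, by the shorter sequence, or by the alignment length); this sketch divides by the positions where neither sequence has a gap, and the regime labels simply restate the 30% and 60% thresholds from these slides. The aligned sequences are made up for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Identity over aligned positions where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != gap and b != gap]
    if not pairs:
        return 0.0
    identical = sum(a == b for a, b in pairs)
    return 100.0 * identical / len(pairs)


def modelling_regime(identity):
    """Thresholds taken from the slides (rough guides, not hard limits)."""
    if identity > 60:
        return "comparative modelling, high accuracy expected"
    if identity > 30:
        return "comparative modelling feasible"
    return "below comparative-modelling range: consider fold recognition / threading"


query = "ATTWVPRKSCT-LV"
template = "ATSWVPRRSC-GLI"
pid = percent_identity(query, template)
print(f"{pid:.1f}% -> {modelling_regime(pid)}")
# 75.0% -> comparative modelling, high accuracy expected
```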

Expected accuracy
Strongly dependent on the quality of the sequence
alignment
Strongly dependent on the identity with the “template”
structures. Very good structures if identity > 60-70%.
Model quality is better for the backbone than for
side chains
Model quality is better in conserved regions

Quality test
There are no clear energy differences between a correct
and a wrong model
The structure must be “chemically correct” to
be used in quantitative predictions
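One very simple "chemical correctness" check, far cruder than what validation programs such as PROCHECK perform, is to verify that consecutive Cα atoms are about 3.8 Å apart, the typical distance across a trans peptide bond. A sketch assuming Cα coordinates are available as (x, y, z) tuples; the function name and the 0.3 Å tolerance are arbitrary choices made for illustration.

```python
import math


def flag_ca_breaks(ca_coords, expected=3.8, tolerance=0.3):
    """Return (i, distance) for each consecutive Cα pair (i, i + 1) whose
    distance differs from `expected` by more than `tolerance` (Å)."""
    flagged = []
    for i in range(len(ca_coords) - 1):
        d = math.dist(ca_coords[i], ca_coords[i + 1])
        if abs(d - expected) > tolerance:
            flagged.append((i, round(d, 2)))
    return flagged


# Hypothetical trace with a chain break between residues 2 and 3
ca = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0), (12.6, 0.0, 0.0)]
print(flag_ca_breaks(ca))  # [(2, 5.0)]
```

Cis peptide bonds give shorter Cα-Cα distances (around 2.9 Å), so a flag from this check means "inspect this position", not necessarily "error".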

Analysis software
PROCHECK
WHATCHECK
Suite Biotech
PROSA

Prediction software
SwissModel (automatic)
http://www.expasy.org/swissmod/
SwissModel Repository
http://swissmodel.expasy.org/repository/
3D-JIGSAW (M. Sternberg)
http://www.bmm.icnet.uk/servers/3djigsaw/
Modeller (A. Sali)
http://salilab.org/modeller/modeller.html
MODBASE (A. Sali)
http://alto.compbio.ucsf.edu/modbase-cgi/index.cgi

Final test
The model must explain the experimental data (e.g.
differences between the unknown sequence and the
templates) and be useful for understanding function.