Protein struc pred-Ab initio and other methods as a short introduction.ppt
60BT119YAZHINIK
Mar 24, 2024
About This Presentation
Protein struc prediction methods as a gist
Added: Mar 24, 2024
Slides: 37 pages
Slide Content
1
Protein Structure Prediction
Charles Yan
2
Different Levels of Protein Structures
The primary structure is the sequence of
residues in the polypeptide chain.
Secondary structure is a local, regularly
occurring structure in proteins.
Alpha helices
Beta sheets
Loops (coils, turns)
3
Different Levels of Protein Structures
Tertiary structure describes the packing of alpha helices,
beta sheets, and random coils with respect to each other
on the level of one whole polypeptide chain.
4
Different Levels of Protein Structures
Quaternary structure exists only if more than one
polypeptide chain is present in a protein complex.
5
Question
Why and how can a sequence of amino
acids fold into its functional native
structure, given the abundance of
geometrically possible structures?
6
Protein Structure Prediction
Anfinsen’s (1973) thermodynamic hypothesis:
Proteins are not assembled into their native
structures by a biological process, but folding is a
purely physical process that depends only on the
specific amino acid sequence of the protein.
Anfinsen’s hypothesis implies that, in principle, protein
structure can be predicted if a model of the free
energy is available and if the global minimum of this
function can be identified.
7
Protein Structure Prediction
Protein structure prediction remains
extremely complex, since even short amino
acid sequences can form an enormous
number of geometric structures, among
which the free energy minimum has to
be identified.
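To get a feel for the size of this search space, here is a back-of-the-envelope sketch. It assumes, purely for illustration and not from the slides, that every residue independently adopts one of a small fixed number of backbone states. The default of 3 states per residue and the function name are hypothetical choices.

```python
def count_conformations(n_residues: int, states_per_residue: int = 3) -> int:
    """Count chain conformations if each residue picks a state independently.

    Assumption (illustrative only): a fixed number of discrete backbone
    states per residue and no coupling between neighbouring residues.
    """
    if n_residues < 0 or states_per_residue < 1:
        raise ValueError("need n_residues >= 0 and states_per_residue >= 1")
    return states_per_residue ** n_residues


if __name__ == "__main__":
    for n in (10, 50, 100):
        # Scientific notation keeps the very large counts readable.
        print(n, f"{count_conformations(n):.2e}")
```

Even under this crude assumption the count grows exponentially with chain length, which is why exhaustive enumeration is not an option and the free energy minimum has to be searched for.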
8
Structure Prediction Methods
Methods for structure prediction can be
divided into four groups:
Comparative modeling
Fold recognition
Fragment-based methods
Ab initio methods (methods that do not use database
information).
9
Comparative Modeling
The number of protein structures that have
been determined experimentally continues to
grow rapidly. At the end of 2004, the number
of structures freely available from the Protein
Data Bank (Berman et al., 2000) was
approaching 28,000.
The availability of experimental data on
protein structures has inspired the
development of methods for computational
structure prediction that are knowledge-based
rather than physics-based.
10
Comparative Modeling
While such database methods have been
criticized for not helping to obtain a
fundamental understanding of the
mechanisms that drive structure
formation, these knowledge-based
methods can often successfully predict
unknown three-dimensional structures.
11
Comparative Modeling
In comparative modeling, the structure of a
protein is predicted by comparing its amino
acid sequence to sequences for which the
native three-dimensional structure is already
known.
Comparative modeling is based on the
observation that sequence similarity implies
structural similarity.
The accuracy of predictions by comparative
modeling, however, strongly depends on the
degree of sequence similarity.
12
Comparative Modeling
If the target and the template share more
than 50% sequence identity, predictions
usually are of high quality and have been
shown to be as accurate as low-resolution
X-ray structures.
For 30–50% sequence identity, more than
80% of the Cα atoms can be expected to be
within 3.5 Å of their true positions.
For less than 30% sequence identity, the
prediction is likely to contain significant errors.
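The identity thresholds above can be sketched as a small helper. This is a minimal illustration, not a tool from the slides. It assumes a pre-computed pairwise alignment given as two equal-length strings with `-` for gaps, and it assumes identity is measured over columns where both sequences have a residue (other conventions exist). The function names and tier labels are hypothetical.

```python
def percent_identity(aligned_target: str, aligned_template: str, gap: str = "-") -> float:
    """Percent of identical residues over columns with no gap in either sequence."""
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have the same length")
    # Keep only columns where both sequences contribute a residue.
    pairs = [(a, b) for a, b in zip(aligned_target, aligned_template)
             if a != gap and b != gap]
    if not pairs:
        return 0.0
    matches = sum(a == b for a, b in pairs)
    return 100.0 * matches / len(pairs)


def expected_quality(identity: float) -> str:
    """Map percent identity to the rough tiers quoted on the slide."""
    if identity > 50:
        return "high"      # comparable to a low-resolution X-ray structure
    if identity >= 30:
        return "medium"    # most C-alpha atoms within about 3.5 angstroms
    return "low"           # significant errors likely


if __name__ == "__main__":
    ident = percent_identity("ACDE-FG", "ACDQKFG")
    print(round(ident, 1), expected_quality(ident))
```

The tiers are only a rule of thumb. Actual model quality also depends on alignment accuracy and on how much of the target the template covers.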
13
Comparative Modeling
In general, comparative modeling consists of:
Selection of one or more templates from a database.
BLAST (for closely related sequences).
PSI-BLAST (for distantly related sequences).
A single template rarely provides a complete model.
Alternative template structures may provide some additional
structural features.
Alignment to the target sequence.
Requires a correct alignment of the target and template
sequences. This is not trivial, especially when the similarity
is not very high.
Refinement of side-chain geometry and regions of
low sequence identity.
14
Comparative Modeling
Comparative modeling methods hardly differ
with respect to template selection and
alignment.
Little progress has been made in refining
templates. Early hopes that molecular dynamics
methods would allow refinement have not been fulfilled.
Reasons for this are a matter of hot debate
within the field, with three suggested
interrelated explanations: inadequate sampling of
alternative conformations, insufficiently
accurate description of the inter-atomic forces,
and trajectories that are too short.
15
Comparative Modeling
Improved sequence comparison techniques
have broadened the scope of comparative
modeling.
While 30% sequence similarity was once considered
to be the threshold for successful
comparative modeling, predictions for targets
with as low as 17% sequence similarity were
made during the CASP4 experiment and 6%
during CASP5.
16
Comparative Modeling
Challenges
Aligning the target sequence onto the template
structure or structures is challenging, and typically
results in very significant errors.
Generally, a significant fraction of residues in a target
will have no structural equivalent in an available
template. Reliably building regions of the structure not
present in a template remains a challenge.
Side-chain accuracy of these approximate models is
poor.
Refinement remains the principal bottleneck to
progress.
17
Comparative Modeling
The importance of comparative modeling
will continue to grow as the number of
experimentally determined structures
grows steadily and, with it, the
number of sequences that can be
related to a known structure.
19
Fold Recognition
While similar sequence implies similar structure,
the converse is in general not true.
In contrast, similar structures are often found
for proteins for which no sequence similarity to
any known structure can be detected.
As a consequence, the repertoire of different
folds is more limited than suggested by
sequence diversity.
20
Fold Recognition
Fold recognition methods are motivated by
the notion that structure is evolutionarily more
conserved than sequence.
Fold recognition methods are one class of
methods that aim at predicting the
three-dimensional folded structure for amino acid
sequences for which comparative modeling
methods provide no reliable prediction.
21
Fold Recognition
Since the number of sequences is much
larger than the number of folds, fold
recognition methods attempt to identify
a model fold for a given target
sequence among the known folds even
if no sequence similarity can be
detected.
22
Fold Recognition
Do we have all the folds?
According to a recent assessment, the
Protein Data Bank already contains
enough structures to cover small
protein structures up to a length of
about a hundred residues.
23
Fold Recognition
One approach to fold recognition is based on
secondary structure prediction and
comparison.
This subclass of methods is based on the
observation that secondary structure
similarity can exceed 80% for sequences that
exhibit less than 10% sequence similarity.
Clearly, any such approach can only be as
good as the underlying secondary structure
prediction method.
25
Fold Recognition
Secondary structure information is often
combined with other one-dimensional
descriptors in fold recognition methods (e.g.,
with simple scores for solvent accessibility of
each amino acid).
The approach is based on predicting
one-dimensional descriptors for a target, and
identifying a similar fold by comparing these
descriptors to the descriptors of known folds.
26
Fold Recognition
Threading is an important representative
of fold recognition methods.
Threading methods attempt to fit a target
sequence to a known structure in a library
of folds.
Threading-based methods are known to
be computationally expensive.
Globally optimal protein threading is
known to be NP-hard.
27
Fold Recognition
Several threading methods ignore
pairwise interactions between residues.
In doing so, the threading problem is
simplified considerably, and the
simplified problem can be solved with
dynamic programming.
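A minimal sketch of this simplification: once pairwise terms are dropped, each target residue is scored only against the environment of the template position it is placed on, and the best placement follows from a standard global-alignment recurrence. Everything specific here is an illustrative assumption rather than a method from the slides: the two-letter environment alphabet (`B` buried, `E` exposed), the hydrophobic residue set, the +1/-1 scores, and the linear gap penalty.

```python
# Illustrative residue classification; real methods use richer statistics.
HYDROPHOBIC = set("AVLIMFWC")


def position_score(residue: str, environment: str) -> float:
    """Toy singleton score: reward hydrophobic-in-buried and polar-in-exposed."""
    buried = environment == "B"
    hydrophobic = residue in HYDROPHOBIC
    return 1.0 if buried == hydrophobic else -1.0


def thread(sequence: str, environments: str, gap: float = -1.0) -> float:
    """Best global alignment score of a sequence onto a template's 1D environments."""
    n, m = len(sequence), len(environments)
    # best[i][j]: best score for the first i residues vs. the first j positions.
    best = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        best[i][0] = i * gap
    for j in range(1, m + 1):
        best[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            best[i][j] = max(
                # place residue i on template position j
                best[i - 1][j - 1] + position_score(sequence[i - 1], environments[j - 1]),
                # residue i is left unplaced (insertion in the target)
                best[i - 1][j] + gap,
                # template position j is skipped (deletion in the target)
                best[i][j - 1] + gap,
            )
    return best[n][m]


if __name__ == "__main__":
    print(thread("LKLK", "BEBE"))
    print(thread("KLKL", "BEBE"))
```

The table has (n+1) x (m+1) cells and each cell takes constant time to fill, so the cost is quadratic. With pairwise terms, the score of one placement depends on where other residues land, and this recurrence no longer applies.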
28
Fold Recognition
In early methods of this kind, a one-dimensional
string of features was recorded for known folds
and compared to the target sequence.
The recorded features comprise attributes like
buried side-chain area, side-chain area covered
by polar atoms including water, and the local
secondary structure.
In this manner, the three-dimensional structure
of known proteins is converted into a
one-dimensional sequence of descriptors, and fold
recognition is reduced to seeking the most
favorable sequence alignment between the
query sequence and a database of sequences.
29
Fold Recognition
Recent approaches take into account pairwise
residue interaction potentials that describe a
mean force derived from a database of
known structures.
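One common way to derive such a potential, often called the inverse Boltzmann approach, converts observed contact or distance frequencies into energies via E = -kT ln(p_observed / p_reference). The slides do not say which derivation is meant, so the sketch below is an assumption. The bin counts, the pseudocount, and the function name are hypothetical.

```python
import math


def mean_force_potential(observed, reference, kT=1.0, pseudocount=1.0):
    """Per-bin energies E = -kT * ln(p_obs / p_ref) from two count histograms.

    observed:  counts for one residue pair type per distance bin (from a database)
    reference: counts for the reference state per distance bin
    The pseudocount avoids log(0) for empty bins (an illustrative choice).
    """
    if len(observed) != len(reference):
        raise ValueError("histograms must have the same number of bins")
    obs_total = sum(observed) + pseudocount * len(observed)
    ref_total = sum(reference) + pseudocount * len(reference)
    energies = []
    for o, r in zip(observed, reference):
        p_obs = (o + pseudocount) / obs_total
        p_ref = (r + pseudocount) / ref_total
        energies.append(-kT * math.log(p_obs / p_ref))
    return energies


if __name__ == "__main__":
    # Made-up counts: the pair is over-represented in the first bin.
    print(mean_force_potential([30, 10], [20, 20]))
```

Bins where the pair occurs more often than in the reference state receive negative (favorable) energies, and under-represented bins receive positive ones.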
30
Fragment Assembly Methods
These methods do not compare a target to a
known protein, but they compare fragments,
that is, short amino acid subsequences, of a
target to fragments of known structures
obtained from the Protein Data Bank.
Once appropriate fragments have been
identified, they are assembled into a structure.
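The fragment identification step can be sketched as a sliding-window lookup. This is a simplified illustration, not the procedure of any particular tool. It assumes the library is a plain list of fragment sequences (real libraries also carry coordinates) and ranks candidates by residue identity only. The window length of 3 and the function name are hypothetical.

```python
def best_fragments(target: str, library: list[str], length: int = 3, top: int = 1) -> dict[int, list[str]]:
    """For each target window, return the library fragments with most identical residues.

    Keys are window start positions (0-based); values are up to `top` fragments.
    Ties keep the library's original order because Python's sort is stable.
    """
    candidates = [frag for frag in library if len(frag) == length]
    results = {}
    for start in range(len(target) - length + 1):
        window = target[start:start + length]
        ranked = sorted(
            candidates,
            key=lambda frag: sum(a == b for a, b in zip(window, frag)),
            reverse=True,
        )
        results[start] = ranked[:top]
    return results


if __name__ == "__main__":
    print(best_fragments("ACDEF", ["ACD", "CDE", "DEF", "AAA"]))
```

The assembly step, which combines the chosen fragments into a full three-dimensional model, is the hard part and is not covered by this sketch.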
31
Ab Initio Methods
Methods of this type make direct use of
Anfinsen’s thermodynamic hypothesis in
that they attempt to identify the
structure with minimum free energy.
Computationally demanding.
Indispensable complementary approach
to any knowledge-based approach for
several reasons.
32
Ab Initio Methods
First, in some cases, even a remotely related
structural homologue may not be available.
Second, new structures continue to be discovered
which could not have been identified by methods
which rely on comparison to known structures.
Third, knowledge-based methods have been criticized
for predicting protein structures without having to
obtain a fundamental understanding of the
mechanisms and driving forces of structure
formation. Ab initio methods, in contrast, base their
predictions on physical models for these mechanisms.
33
Ab Initio Methods
Advantage: This class of methods can be
applied to any given target sequence
using only physically meaningful
potentials and atom representations.
Disadvantage: These methods are the most
difficult of the protein structure
prediction methods.
34
Ab Initio Methods
Challenges
Energy functions that can reliably
discriminate between native and non-native structures.
Enormous amount of computation.
35
Ab Initio Methods
Ab initio methods have recently received
increased attention in the prediction of loops.
Loops exhibit greater structural variability than
beta sheets and alpha helices.
Loop structure therefore is considerably more
difficult to predict than the structure of the
geometrically highly regular beta sheets and alpha
helices.
Loops are often exposed to the surface of proteins
and contribute to active and binding sites.
Consequently, loops are crucial for protein function.
36
CASP
Progress for all variants of computational protein
structure prediction methods is assessed in the
biennial, community-wide Critical Assessment of
Protein Structure Prediction (CASP) experiments.
In the CASP experiments, research groups are
invited to apply their prediction methods to amino
acid sequences for which the native structure is
not yet known but is about to be determined and
published.
37
CASP
Over 200 prediction teams from 24
countries participated in CASP6.