Brief Methodology
1.Use physics to find the potential energy between all pairs of atoms.
2.Move atoms to the next state.
3.Repeat.
Energy Function
Describes the interaction energies of all atoms and molecules in the
system.
Always an approximation.
Closer to real physics --> more realistic, more computation time (I.e.
smaller time steps and more interactions increase accuracy)
Scale in Simulations
Ηψ = Εψ
F = MA
exp(-ΔE/kT)
domain
quantum
chemistry
molecular
dynamics
Monte Carlo
mesoscale
continuum
Length Scale
10
-10
M 10
-8
M 10
-6
M 10
-4
M
10
-12
S
10
-8
S
10
-6
S
Structure schematic of human
SUMO protein
NMRstructureofSUMO:thebackboneoftheproteinis
representedasaribbon,highlightingsecondarystructure;N-
terminusinblue,C-terminusinred
Function of SUMO
SUMOmodificationofproteinshasmanyfunctions.Amongthe
mostfrequentandbeststudiedareproteinstability,nuclear-
cytosolictransport,andtranscriptionalregulation.
Typically,onlyasmallfractionofagivenproteinisSUMOylated
andthismodificationisrapidlyreversedbytheactionof
deSUMOylatingenzymes.TheSUMO-1modificationofRanGAP1
(thefirstidentifiedSUMOsubstrate)leadstoitstraffickingfrom
cytosoltonuclearporecomplex.
The SUMO modification of protein leads to its movement from
thecentrosome to thenucleus.
Structure
There are 3 confirmed SUMOisoformsin humans;SUMO-1, SUMO-2 and
SUMO-3. SUMO-2/3 show high a high degree of similarity to each other and
are distinct from SUMO-1.
SUMO proteins are small; most are around 88 to100amino acidsin length
and 12kDAinmass. The exact length and mass varies between SUMO family
members and depends on which organism the protein comes from.
Although SUMO has very little homology with Ubiquitin at the amino acid
level, it has a nearly identical structural fold.
SUMO1 as a globular protein with both ends of the amino acid chain sticking
out of the protein's centre. The spherical core consists of analpha helixand
abeta sheet.
The SUMO protein taken for this work was extracted out from Drosophila
melangaster
Giving Dynamics to the protein
Step1:generationofstructures.
Step2:performingmoleculardynamicsoneachofthetopologies.
Step 3: Recording the potential energy changes in protein during
Dynamics.
Step4:Clusteringofthebestminimizedstructures.
Programs/software'sused:
•Cyana
•NAMD/VMD
•VEGAZZ
•GROMACS
Generation of structures using Cyana
For the sake of convenience and ease of dynamics, SUMO protein
was divided in to five fragments.
These fragments were divided based on their propensity to form
secondary structures.
Fragment Residue numbers
Fragment 1 1-12
Fragment 2 11-32
Fragment 3 31-53
Fragment 4 52-72
Fragment 5 71-88
First it was necessary to make 1000 random conformers or topologies
from each of the five different fragments.
The program used for this structure generation was CYANA which is
linux-based program.
For this the sequences of each of the five fragments of SUMO protein
was given to the program and was told to generate 1000 random
topologies.
Dynamics and annealing conditions were applied to give the energy of
these 1000 random structures.
The program was told to select 20 best energy minimized structures.
These 20 different topologies could be viewed using softwareslike
Pymol, Molmol, VMD etc.
The dynamics of the protein was carried out in vacuum without giving
any constraints to them and the dynamics of the protein could be
played using the above mentioned softwaresand are saved.
Files used in Cyana
First need to create .CCO file of the FIVE fragments
.CCO file------------
1 MET H HA 6.7277 3.20E+00
2 SER H HA 6.9968 1.20E+00
3 ASP H HA 6.4720 1.20E+00
4 GLU H HA 6.8444 1.20E+00
5 LYS H HA 6.9625 1.20E+00
.
.
.
.
.
.
.
53 THR H HA 7.5359 2.20E+00
•Init.cya File
.cyais a batch file which contains a set commands.
Rmsdrange := 31.....53
Cyana.lib
Read seqthird.seq
Swap = 0
Batch file
‘Seed’ asks the program to generate 1000 topologies.
The last two commands create the 20 best topologies.
performing molecular dynamics on each of the
topologies
Using
GROMACS………………
HIGHLIGHTS
Generally 3 to 10 times faster than other Molecular Dynamics programs
Very user-friendly: issues clear error messages, no scripting language is
required to run the programs, prints out the progress of the program that
is running, etc.
Allows the trajectory data to be stored in a compact way.
Gromacsprovides a basic trajectory data viewer; xmgror Grace may also
be used to analyze the results.
Files from earlier versions of Gromacsmay be used in the latest Gromacs,
version 3.1.
To run a simulation several things are needed:
1.a file containing the coordinates for all atoms.
2.information on the interactions (bond angles, charges, Van der
Waals).
3.parameters to control the simulation.
The exercise falls apart in four sections,
corresponding to the actual steps in an MD
simulation.
1.Conversion of the pdbstructure file to a Gromacsstructure file,
with the simultaneous generation of a descriptive topology file.
2.Energy minimization of the structure to release strain.
3.Running a full simulations.
4.Analyzing results.
File Formats
PDB file
------format used by Brookhaven Protein DataBank.
Atom
residue
Res.no
X,Y,Z coordinates
Topology (*.top) file (ascii)
---------contains all the forcefieldparameters
Gromacs(*.gro):
molecular structure file in the Gromos87 format
(Gromacsformat)
x, y, and z
position,
in nm
x, y, and z
velocity, in
nm/ps
*.tprFile
contains the starting structure of the simulation, the molecular
topology file and all the simulation parameters; binary format.
*.xvgfile:
file format that is read by Grace (formerly called Xmgr), which is a
plotting tool for the X window system.
Plot of X vsY
*.mdpfile:
allows the user to set up specific parameters for all the
calculations that Gromacsperforms.
Recording every 0.002ps
No of steps for
MD=500000
Coloumbinteractions
within 1.4A
Vander waals
interaction
within 1.4radius
Takes into account
interactions from
all the bonds
em.mdpfile:
sets the parameters for running energy minimizations
integratorIterations
Neighbor list
constraints
Forces and
potential
The exercise falls apart in four sections,
corresponding to the actual steps in an MD
simulation.
1.Conversion of the pdbstructure file to a Gromacsstructure
file, with the simultaneous generation of a descriptive
topology file.
2.Energy minimization of the structure to release strain.
3.Running a full simulations.
4.Analyzing results.
Batch file for executing molecular dynamics
Step 1: Conversion of the PDB File
Each of the topologies saved in the PDB format were used as an input
file for MD simulations with Gromacs.
It is first necessary to convert it to the gromosfile type (*.gro).
Original data in the pdbfile is often incomplete, carbon bound
hydrogens are generally omitted.
The conversion program pdb2gmxwill check every residue in the
structure file against a database and add all hydrogens. In the
conversion process it also creates a topology file, with all the
connections between the atoms listed.
pdb2gmx can be used by simply typing it at the prompt.
Example : we get a list of available options that that this conversion
program can execute ----
pdb2gmx -h
pdb2gmx -h
This program reads a pdbfile, reads some database files, adds
hydrogens to the molecules and generates coordinates in Gromacs
(Gromos) format and a topology in Gromacsformat.
This conversion program contains many in-built force-fields. We have
to select the required force fields.
Option in Pdb2gmx:
Options Description
-f Input
-o Output
-p Output for topology file
-i Output
-n Output
-q output
-ff To assign force field
The program will ask to select a force field:
Select the Force Field:
0:GROMOS9643a1forcefield.
1:GROMOS9643b1vacuumforcefield.
2:GROMOS9643a2forcefield(improvedalkane
dihedrals).
3:OPLS-AA/Lall-atomforcefield(foraminoacid
dihedrals).
4:Gromacsforcefield(gmx)withhydrogensforNMR.
5:Encadall-atomforcefield,usingscaled-down
vacuumcharges.
6:Encadall-atomforcefield,usingfullsolventcharges.
Gmxforce field was used (for NMR)
Outputs produced by this command…………………..
Once the selection of the force field is done, three kinds of output files are produced:
1.PDB files
2.the generated topology (.top) file
3.gromos(.gro) file
•dsmt3.gro:
It looks a lot like the original pdbfile, containing the same information regarding
the positions of the atoms, but the layout is different, hydrogenshave been added
and units have been converted to nm.
•dsmt3.top:
This file contains the information on the atom names, types, masses and charges, as
well as a description of bonds, angles, dihedrals, etc.
Force fields used using during molecular dynamics
force field(also called a forcefield) refers to the functional formand
parameter sets used to describe the potential energyof a system of
particles ( in this case the atoms and the residues).
As protein models consist of hundreds or thousands of atoms the only
feasible methods of computing systems of such size are molecular
mechanics calculations.
A Force-Field is assigned to each atom in the protein. This figure is a
schematic representation of the four key contributions to a molecular
mechanics force field: bond stretching, angle bending, torsional terms
and non-bonded interactions.
Bond Stretching Energy
Bending Energy
Torsion Energy
Non-bonded Energy
Energy=
Stretching Energy
+
Bending Energy
+
Torsion Energy
+
Non-Bonded Interaction Energy
Types of force fields
1.All-atomforce fields -provide parameters for every atom in a system, including
hydrogen.
2.united-atom force fields -treat the hydrogen and carbon atoms in methyl and
methylene groups as a single interaction center.
3.Coarse-grained force fields -which are frequently used in long-time simulations of
proteins.
These equations together with the data (parameters) required to describe the behavior
of different kinds of atoms and bonds, is called a force-field.
Batch file for executing molecular dynamics
Done
Editconf
•editconf puts .gro file into a box
•The box can be modified with options -box, -d and -angles. Both -box and –d will center
the system in the box.
•Option -bt determines the box type: cubic is a rectangular box with all sides equal
dodecahedron represents a rhombic dodecahedron and octahedron is a truncated
octahedron.
•With -d and cubic, dodecahedron or octahedron boxes, the dimensions are set to the
diameter of the system.
Options in Editconf
Option Description
-f Input
-n Output
-o Output
-bt For box type
-d Distance between the solute and the
box
Batch file for executing molecular dynamics
Done
Done
Batch file for executing molecular dynamics
Done
Done
Done
Step 2: Energy Minimization
•The structure is now complete (hydrogens have been added) and a topology
file has been created.
•However, there may be local strain in the protein, due to the generation of
the hydrogens, and bad Van der Waals contacts may exist, caused by
particles that are too close.
•The strain has to be removed by energy minimization of the structure. This
can be done with the program 'mdrun', which is the MD program. Mdrun
uses a single .tpr file as input, which is generated by combining the topology
(aki.top), structure (aki.gro) and parameter files (minim.mdp).
•grompp also reads parameters for the mdrun (eg. number of MD steps, time
step, cut-off).
•To generate the .tpr file the program grompp has to be used.
A description of grompp can be obtained by giving the command:
grompp-h
Options in Grompp
Option Description
-f grompp input file with MD parameters
-po grompp input file with MD parameters
-c Input
-r Input
-n Input
-p Input the topology file
-pp Preprocess and output the toplology file
-o Output
-t Input the trajectory file
-e Input the energy file
-np Generate the status file
Batch file for executing molecular dynamics
Done
Done
Done
Done
Mdrun
•The mdrun program is the main computational chemistry engine within GROMACS.
•It performs Molecular Dynamics simulations, Brownian Dynamics and Langevin
Dynamics as well as Conjugate Gradient or Steepest Descents energy minimization.
Principle
The mdrun program reads the run input file (-s)
Distributes the topology over nodes.
The coordinates are passed around, so that computations can begin.
A neighborlist is made, then the forces are computed.
The forces are globally summed, and positions are updated.
If necessary shake is performed to constrain bond lengths and/or bond
angles.
•Temperature and Pressure can be controlled using weak coupling to a bath.
•Option in Mdrun
Option Description
np Number of nodes used
s Input
o Output
c Output
e Output
g Output
x Output
The energy minimization may take some time, depending on the CPU in and the load
of the computer.
The trajectory file is not very important in energy minimizations, but the generated
structure file (minimized.gro) will serve as input for the simulation.
During the minimization the potential energy decreases. A plot from the energy over
time can be made from the minim_ener.edr file using g_energy command.
Simply make a plot from the .edr file by executing:
This will display something like the following:
g_energy-f dsmt3-em_ener.edr -o dsmt3-em_ener.xvg
Select the property you want by typing the name, e.g. Potential, which
codes for potential energy and then press return and another return to
quit.
The program g_energy produces a .xvg graph, which can be viewed and
edited with xmgrace (prgram which makes graphs in gromacs) :
GRAPH
xmgrace-nxydsmt3-
em_ener.xvg
Position Restrained MD
•molecular dynamics of the water molecules of water molecules are done, and position of the peptide
is kept fixed. This is called position restrained (PR) MD.
•Position Restrained MD keeps the peptide fixed and lets all water molecules equilibrate around the
peptide in order to fill holes, etc., which were not filled by the genboxprogram.
•It is first necessary to pre-process the input files to generate the binary topology. The input files are:
the topology file, the structure file (output of the EM) and a parameter file.
•By default, the system was split into two groups -Protein and SOL(vent), to put position restraints on
all the atoms of the peptide.
•The parameter file (.mdpextension) contains information about the PR-MD such as step size,
number of steps, temperature, etc. This parameter file also tells GROMACS what kind of simulation
should be performed
preprocess Position restrained MD file
Topology file of the
protein
Energy minimized
binary file from
previous step
Position restrained
MD file with energy
minimized protein in
it
•These two commands are used to give full moloeculardynamics, where none of
the systems are fixed.
•Both the proteins and the solvent is subjected to motion until both of them
become stable at a particular point, which is retained as the final output.
•g_filterfrequency filters trajectories, useful for making smooth movies. Many
of the trajectories are filtered and in all only 10 trajectories are kept. These can
be read by pymol(protein visualization software).
Pymolcannot read Gromacsxtctrajectories , and it is better to remove the solvent in
the trajectory to concentrate on the proteins.
This is easy to fix by using another Gromacsprogram to convert the trajectory to PDB
format and only select one group:
An output group is asked to select and protein is selected in that.
The trajectories of protein after MD is visualized in Pymolby giving a command:
Pymoldsmt3-finaltraj.pdb
Acknowledgements………………………
Prof. RamkrishnaHosur(TIFR, Mumbai)
Dinesh Kumar (TIFR, Mumbai)
Dr. GanapathySubramanian (NCL, Pune)
Special thanks to NMR Facility, TIFR, Mumbai