Homology modeling tools

bharathpharmacist, 14 slides, Jan 23, 2015

Homology Modeling

Modeller

• Basic Modeling. Model a sequence with high identity to a template.
This exercise introduces the use of MODELLER in a simple case where template selection and target-template alignment are not a problem.

• Advanced Modeling. Model a sequence based on multiple templates and bound to a ligand.
This exercise introduces the use of multiple templates, ligands and loop refinement in the process of model building with MODELLER.

• Iterative Modeling. Increase the accuracy of the modeling exercise by iterating the four-step process.
This exercise introduces the concept of MOULDING to improve the accuracy of comparative models.

• Difficult Modeling. Model a sequence with low sequence identity to a template.
This exercise uses resources external to MODELLER to select a template for a difficult case of protein structure prediction.

• Modeling with cryo-EM. Model a sequence using both a template and cryo-EM data.
This exercise assesses the quality of generated models and loops by rigid fitting into cryo-EM maps, and improves them with flexible EM fitting.

1. Searching for structures related to TvLDH

First, the target TvLDH sequence must be put into the PIR format readable by MODELLER (file "TvLDH.ali").

>P1;TvLDH
sequence:TvLDH:::::::0.00: 0.00
MSEAAHVLITGAAGQIGYILSHWIASGELYGDRQVYLHLLDIPPAMNRLTALTMELEDCAFPHLAGFVATTDPKA
AFKDIDCAFLVASMPLKPGQVRADLISSNSVIFKNTGEYLSKWAKPSVKVLVIGNPDNTNCEIAMLHAKNLKPEN
FSSLSMLDQNRAYYEVASKLGVDVKDVHDIIVWGNHGESMVADLTQATFTKEGKTQKVVDVLDHDYVFDTFFKKI
GHRAWDILEHRGFTSAASPTKAAIQHMKAWLFGTAPGEVLSMGIPVPEGNPYGIKPGVVFSFPCNVDKEGKIHVV
EGFKVNDWLREKLDFTEKDLFHEKEIALNHLAQGG*

File: TvLDH.ali

The first line contains the sequence code, in the format ">P1;code". The second line, with ten fields separated by colons, generally contains information about the structure file, if applicable. Only two of these fields are used for sequences: "sequence" (indicating that the file contains a sequence without known structure) and "TvLDH" (the model file name). The rest of the file contains the sequence of TvLDH, with "*" marking its end. The standard one-letter amino acid codes are used. (Note that they must be upper case; some lower-case letters are used for non-standard residues. See the file modlib/restyp.lib in the MODELLER distribution for more information.)
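Because the PIR layout is so regular, it is easy to generate or sanity-check in plain Python before handing a file to MODELLER. The sketch below is not MODELLER's own reader or writer; it only assumes the conventions stated above (a ">P1;code" line, a ten-field colon-separated description line, and a sequence terminated by "*"). The function names, the 75-residue line width and the "0.00" placeholders in the last two fields (resolution and R-factor) are illustrative choices.

```python
def format_pir_sequence(code: str, sequence: str, width: int = 75) -> str:
    """Build a PIR entry for a target sequence without known structure."""
    header = f">P1;{code}"
    # Ten colon-separated fields: type, name, start residue/chain, end
    # residue/chain, protein name, source, resolution, R-factor. Only the
    # type and name are needed for a plain sequence; 0.00 is a placeholder.
    fields = ["sequence", code, "", "", "", "", "", "", "0.00", " 0.00"]
    # Residues are written as given: standard residues must already be upper case.
    lines = [sequence[i:i + width] for i in range(0, len(sequence), width)] or [""]
    lines[-1] += "*"  # '*' marks the end of the sequence
    return "\n".join([header, ":".join(fields), *lines]) + "\n"


def parse_pir_entry(text: str) -> dict:
    """Split one PIR entry back into its code, entry type, name and sequence."""
    lines = [line.strip() for line in text.strip().splitlines()]
    if len(lines) < 3 or not lines[0].startswith(">P1;"):
        raise ValueError("expected '>P1;code', a description line and sequence lines")
    fields = lines[1].split(":")
    if len(fields) != 10:
        raise ValueError("the description line must have ten colon-separated fields")
    sequence = "".join(lines[2:])
    if not sequence.endswith("*"):
        raise ValueError("the sequence must end with '*'")
    return {"code": lines[0][4:], "type": fields[0], "name": fields[1],
            "sequence": sequence[:-1]}


entry = format_pir_sequence("TvLDH", "MSEAAHVLIT")
print(entry)
print(parse_pir_entry(entry))
```

Keeping the writer and the checker separate makes it easy to validate hand-edited alignment files as well as generated ones.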

A search for potentially related sequences of known structure can be performed by the profile.build() command of MODELLER. The following script, taken line by line, does the following (see file "build_profile.py"):

1. Initializes the 'environment' for this modeling run by creating a new 'environ' object. Almost all MODELLER scripts require this step, as the new object (which we call here 'env', but you can call it anything you like) is needed to build most other objects.

from modeller import *

log.verbose()
env = environ()

#-- Prepare the input files

#-- Read in the sequence database
sdb = sequence_db(env)
sdb.read(seq_database_file='pdb_95.pir', seq_database_format='PIR',
         chains_list='ALL', minmax_db_seq_len=(30, 4000), clean_sequences=True)

#-- Write the sequence database in binary form
sdb.write(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
          chains_list='ALL')

#-- Now, read in the binary database
sdb.read(seq_database_file='pdb_95.bin', seq_database_format='BINARY',
         chains_list='ALL')

#-- Read in the target sequence/alignment
aln = alignment(env)
aln.append(file='TvLDH.ali', alignment_format='PIR', align_codes='ALL')

#-- Convert the input sequence/alignment into
#   profile format
prf = aln.to_profile()

#-- Scan sequence database to pick up homologous sequences
prf.build(sdb, matrix_offset=-450, rr_file='${LIB}/blosum62.sim.mat',
          gap_penalties_1d=(-500, -50), n_prof_iterations=1,
          check_profile=False, max_aln_evalue=0.01)

#-- Write out the profile in text format
prf.write(file='build_profile.prf', profile_format='TEXT')

#-- Convert the profile back to alignment format
aln = prf.to_alignment()

#-- Write out the alignment file
aln.write(file='build_profile.ali', alignment_format='PIR')

# Number of sequences: 30
# Length of profile: 335
# N_PROF_ITERATIONS: 1
# GAP_PENALTIES_1D: -900.0 -30.0
# MATRIX_OFFSET: 0.0
# RR_FILE: ${MODINSTALL...}/modlib//as1.sim.mat

[Header of "build_profile.prf". The per-sequence rows that follow it (one per sequence found, with lengths, alignment positions and scores) were lost in extraction.]
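To use these summary values in a later script, one option is to read the comment lines at the top of the profile file. This is a minimal sketch, assuming (not verified against every MODELLER version) that each header line has the form "# key : value" and that the header ends at the first line that does not start with "#"; read_prf_header is a hypothetical helper, not part of MODELLER.

```python
from typing import Dict, Iterable


def read_prf_header(lines: Iterable[str]) -> Dict[str, str]:
    """Collect '# key : value' lines at the top of a text profile (assumed layout)."""
    header: Dict[str, str] = {}
    for line in lines:
        if not line.startswith("#"):
            break  # assumed: the header ends at the first non-comment line
        key, sep, value = line[1:].partition(":")
        if sep:  # skip comment lines without a key/value separator
            header[key.strip()] = value.strip()
    return header


# Hypothetical input modelled on the header values quoted above.
sample = [
    "# Number of sequences:     30",
    "# Length of profile  :    335",
    "# N_PROF_ITERATIONS  :      1",
    "(first profile row)",
]
header = read_prf_header(sample)
print(int(header["Number of sequences"]), int(header["Length of profile"]))
```

Because the function accepts any iterable of lines, an open file works directly: `with open("build_profile.prf") as fh: header = read_prf_header(fh)`.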

3. Aligning TvLDH with the template

A good way of aligning the sequence of TvLDH with the structure of 1bdm:A is the align2d() command in MODELLER. Although align2d() is based on a dynamic programming algorithm, it differs from standard sequence-sequence alignment methods because it takes into account structural information from the template when constructing an alignment. This is achieved through a variable gap penalty function that tends to place gaps in solvent-exposed and curved regions, outside secondary structure segments, and between two positions that are close in space. As a result, alignment errors are reduced by approximately one third relative to those that occur with standard sequence alignment techniques. This improvement becomes more important as the similarity between the sequences decreases and the number of gaps increases. In the current example, the template-target similarity is so high that almost any alignment method with reasonable parameters will produce the same alignment. The following MODELLER script aligns the TvLDH sequence in file "TvLDH.ali" with the 1bdm:A structure in the PDB file "1bdm.pdb" (file "align2d.py").
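To make the idea of a position-dependent gap penalty concrete, here is a toy global (Needleman-Wunsch) alignment in which the cost of a gap depends on where it falls in the template. This is not align2d()'s actual scoring: the identity-based substitution scores, the linear (non-affine) gaps and the per-residue weights are simplifying assumptions, and in align2d() the penalty is derived from the template structure rather than supplied by hand. The function name and all parameter values are hypothetical.

```python
from typing import List, Sequence, Tuple


def align_variable_gaps(target: str, template: str, gap_weight: Sequence[float],
                        base_gap: float = 4.0, match: float = 2.0,
                        mismatch: float = -1.0) -> Tuple[str, str, float]:
    """Toy global alignment whose gap cost depends on the template position.

    gap_weight[j] scales the cost of a gap at template residue j (hypothetical
    values: small for exposed loops, large inside helices and strands).
    """
    n, m = len(target), len(template)
    if m == 0 or len(gap_weight) != m:
        raise ValueError("need a non-empty template and one weight per residue")

    def deletion_cost(j: int) -> float:
        # Template residue j (0-based) is left unaligned.
        return base_gap * gap_weight[j]

    def insertion_cost(j: int) -> float:
        # Target residue inserted just before template residue j (or after the last one).
        return base_gap * gap_weight[min(j, m - 1)]

    score: List[List[float]] = [[0.0] * (m + 1) for _ in range(n + 1)]
    move: List[List[str]] = [[""] * (m + 1) for _ in range(n + 1)]
    for i in range(n + 1):
        for j in range(m + 1):
            if i == 0 and j == 0:
                continue
            options = []  # max() keeps the first best option, so pairs win ties
            if i > 0 and j > 0:
                s = match if target[i - 1] == template[j - 1] else mismatch
                options.append((score[i - 1][j - 1] + s, "pair"))
            if i > 0:
                options.append((score[i - 1][j] - insertion_cost(j), "ins"))
            if j > 0:
                options.append((score[i][j - 1] - deletion_cost(j - 1), "del"))
            score[i][j], move[i][j] = max(options, key=lambda o: o[0])

    # Trace back from the bottom-right cell to recover the alignment.
    top, bottom = [], []
    i, j = n, m
    while i > 0 or j > 0:
        step = move[i][j]
        if step == "pair":
            top.append(target[i - 1])
            bottom.append(template[j - 1])
            i, j = i - 1, j - 1
        elif step == "ins":
            top.append(target[i - 1])
            bottom.append("-")
            i -= 1
        else:
            top.append("-")
            bottom.append(template[j - 1])
            j -= 1
    return "".join(reversed(top)), "".join(reversed(bottom)), score[n][m]


weights = [3, 3, 3, 3, 3, 1, 3, 3, 3]  # hypothetical: cheap gap at the third G
aligned_target, aligned_template, total = align_variable_gaps("KLMGGNPQ", "KLMGGGNPQ", weights)
print(aligned_target)
print(aligned_template)
print(total)
```

With uniform weights, the missing glycine could be deleted at any of the three G positions for the same score; lowering the weight at the last G (standing in for an exposed loop) makes the algorithm put the gap there.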

from modeller import *

env = environ()
aln = alignment(env)
mdl = model(env, file='1bdm', model_segment=('FIRST:A', 'LAST:A'))
aln.append_model(mdl, align_codes='1bdmA', atom_files='1bdm.pdb')
aln.append(file='TvLDH.ali', align_codes='TvLDH')
aln.align2d()
aln.write(file='TvLDH-1bdmA.ali', alignment_format='PIR')
aln.write(file='TvLDH-1bdmA.pap', alignment_format='PAP')

File: align2d.py

In this script, we again create an environ object to use as input to later commands. We create an empty alignment 'aln', and then a new protein model 'mdl', into which we read the chain A segment of the 1bdm PDB structure file. The append_model() command transfers the PDB sequence of this model to the alignment and assigns it the name "1bdmA" (align_codes). Then we add the "TvLDH" sequence from file "TvLDH.ali" to the alignment, using the append() command. The align2d() command is then executed to align the two sequences. Finally, the alignment is written out in two formats, PIR ("TvLDH-1bdmA.ali") and PAP ("TvLDH-1bdmA.pap"). The PIR format is used by MODELLER in the subsequent model-building stage, while the PAP alignment format is easier to inspect visually. Due to the high target-template similarity, there are only a few gaps in the alignment. In the PAP format, all identical positions are marked with a "*" (file "TvLDH-1bdmA.pap").
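The "*" markers in a PAP file flag columns where both aligned residues are identical. A minimal sketch of the same idea, plus a percent identity computed over columns where neither sequence has a gap, is below. Percent identity can be normalized in several ways (by aligned positions, by the shorter sequence, and so on), so this sketch makes no claim to reproduce the value MODELLER reports; the aligned strings are hypothetical.

```python
def identity_markers(aligned_a: str, aligned_b: str) -> str:
    """Return a line with '*' under every column where both residues are identical."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    return "".join("*" if a == b and a != "-" else " "
                   for a, b in zip(aligned_a, aligned_b))


def percent_identity(aligned_a: str, aligned_b: str) -> float:
    """Identical residues as a percentage of columns where neither sequence has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    pairs = [(a, b) for a, b in zip(aligned_a, aligned_b) if a != "-" and b != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(a == b for a, b in pairs) / len(pairs)


target = "KLAGG-NPQ"    # hypothetical aligned target
template = "KLMGGGNPQ"  # hypothetical aligned template
print(target)
print(template)
print(identity_markers(target, template))
print(f"{percent_identity(target, template):.1f}% identity")
```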

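from modeller import *
from modeller.automodel import *

env = environ()

a = automodel(env, alnfile='TvLDH-1bdmA.ali',
              knowns='1bdmA', sequence='TvLDH',
              assess_methods=(assess.DOPE, assess.GA341))
a.starting_model = 1
a.ending_model = 5   # build models 1 to 5 (at least two are needed for the evaluation below)
a.make()             # run the model building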

5. Model evaluation

If several models are calculated for the same target, the "best" model can be selected in several ways. For example, you could pick the model with the lowest value of the MODELLER objective function or the DOPE assessment score, or with the highest GA341 assessment score, all of which are reported in the log file. (The objective function, molpdf, is always calculated, and is also reported in a REMARK in each generated PDB file. The DOPE and GA341 scores, or any other assessment scores, are only calculated if you list them in assess_methods.) The molpdf and DOPE scores are not 'absolute' measures, in the sense that they can only be used to rank models calculated from the same alignment. Other scores are transferable. For example, GA341 scores always range from 0.0 (worst) to 1.0 (native-like); however, GA341 is not as good as DOPE at distinguishing 'good' models from 'bad' models.
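Picking the best model can be scripted. The sketch below assumes model records shaped like the entries automodel collects in its outputs list after make() (one dict per model with 'name', 'failure', 'molpdf', 'DOPE score' and, when requested, 'GA341 score', the last one assumed to be a sequence whose first element is the score); check the exact keys against the MODELLER manual for your version. Lower molpdf and DOPE are better, higher GA341 is better, and molpdf and DOPE should only be compared between models built from the same alignment. pick_best_model is a hypothetical helper and the scores are made up.

```python
from typing import Any, Dict, List


def pick_best_model(outputs: List[Dict[str, Any]], key: str = "DOPE score") -> Dict[str, Any]:
    """Return the best successfully built model according to one score.

    Assumes automodel-style records: 'failure' is None for a successful model,
    'molpdf' and 'DOPE score' are numbers (lower is better) and 'GA341 score'
    is a sequence whose first element is the score (higher is better).
    """
    ok = [m for m in outputs if m.get("failure") is None]
    if not ok:
        raise ValueError("no model was built successfully")
    if key == "GA341 score":
        return max(ok, key=lambda m: m[key][0])
    return min(ok, key=lambda m: m[key])


# Made-up scores for illustration only (not results from a real run).
runs = [
    {"name": "TvLDH.B99990001.pdb", "failure": None, "molpdf": 1500.0,
     "DOPE score": -38000.0, "GA341 score": [1.0]},
    {"name": "TvLDH.B99990002.pdb", "failure": None, "molpdf": 1550.0,
     "DOPE score": -38200.0, "GA341 score": [0.98]},
    {"name": "TvLDH.B99990003.pdb", "failure": "hypothetical error"},
]
for score in ("molpdf", "DOPE score", "GA341 score"):
    print(score, "->", pick_best_model(runs, score)["name"])
```

In a MODELLER run this would be called as `pick_best_model(a.outputs)` after `a.make()`, assuming outputs is populated as described. The made-up data also shows that different criteria can disagree on which model is "best".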

Once a final model is selected, it can be further assessed in many ways. Links to programs for model assessment can be found in the MODEL EVALUATION section on

Before any external evaluation of the model, check the log file from the modeling run ("model-single.log") for runtime errors and restraint violations (see the MODELLER manual for details). The file "evaluate_model.py" evaluates an input model with the DOPE potential. (Note that here we arbitrarily picked the second generated model; you may want to try other models.)

from modeller import *
from modeller.scripts import complete_pdb

log.verbose()    # request verbose output
env = environ()

env.libs.topology.read(file='$(LIB)/top_heav.lib')   # read topology
env.libs.parameters.read(file='$(LIB)/par.lib')      # read parameters

# read model file
mdl = complete_pdb(env, 'TvLDH.B99990002.pdb')

# Assess with DOPE:
s = selection(mdl)   # all-atom selection
s.assess_dope(output='ENERGY_PROFILE NO_REPORT', file='TvLDH.profile',
              normalize_profile=True, smoothing_window=15)
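DOPE can also be reported per residue as an energy profile. Such profiles are noisy, so they are usually smoothed with a sliding window before looking for high-energy (potentially problematic) regions, since a lower DOPE value is better. The sketch below works on a plain list of per-residue scores however you obtain them; reading the profile file itself is left out because its column layout is not shown here. The helper names, window size and threshold are illustrative assumptions.

```python
from typing import List, Sequence, Tuple


def smooth_profile(values: Sequence[float], window: int = 15) -> List[float]:
    """Centred moving average; the window shrinks at the chain ends."""
    if window < 1 or window % 2 == 0:
        raise ValueError("window must be a positive odd number")
    half = window // 2
    smoothed = []
    for i in range(len(values)):
        segment = values[max(0, i - half):i + half + 1]
        smoothed.append(sum(segment) / len(segment))
    return smoothed


def high_energy_regions(smoothed: Sequence[float], threshold: float) -> List[Tuple[int, int]]:
    """Return 1-based (first, last) residue ranges whose smoothed score exceeds threshold."""
    regions, start = [], None
    for resnum, value in enumerate(smoothed, start=1):
        if value > threshold and start is None:
            start = resnum
        elif value <= threshold and start is not None:
            regions.append((start, resnum - 1))
            start = None
    if start is not None:
        regions.append((start, len(smoothed)))
    return regions


# Hypothetical per-residue scores with a single high value at residue 4.
scores = [0.0, 0.0, 0.0, 3.0, 0.0, 0.0, 0.0]
smoothed = smooth_profile(scores, window=3)
print(smoothed)
print(high_energy_regions(smoothed, threshold=0.5))
```

Smoothing spreads a single spike over its neighbours, so flagged regions are slightly wider than the raw peak; that is usually what you want when deciding which loop or segment to re-examine.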

DATABASES

CATH S http://www.biochem.ucl.ac.uk/bsm/cath/
DBAli S http://www.salilab.org/DBAli/
GenBank S http://www.ncbi.nlm.nih.gov/Genbank/GenbankSearch.html
GeneCensus S http://bioinfo.mbb.yale.edu/genome
MODBASE S http://salilab.org/modbase/
MSD S http://www.rcsb.org/databases.html
NCBI S http://www.ncbi.nlm.nih.gov/
PDB S http://www.rcsb.org/pdb/
PSI S http://www.nigms.nih.gov/psi/
Sacch3D S http://genome-www.stanford.edu/Sacch3D/
SCOP S http://scop.mrc-lmb.cam.ac.uk/scop/
TIGR S http://www.tigr.org/tdb/mdb/mdbcomplete.html
UniProtKB S http://www.uniprot.org/

FOLD ASSIGNMENT

123D S http://123d.ncifcrf.gov/
3D-PSSM S http://www.sbg.bio.ic.ac.uk/~3dpssm/
BIOINBGU S http://www.cs.bgu.ac.il/~bioinbgu/
BLAST S http://www.ncbi.nlm.nih.gov/BLAST/
DALI S http://www.ebi.ac.uk/dali/
FFAS S http://bioinformatics.burnham-inst.org/FFAS/index.html
FastA S http://www.ebi.ac.uk/fasta3/
FRSVR S http://fold.doe-mbi.ucla.edu/
FUGUE S http://www-cryst.bioc.cam.ac.uk/~fugue/
LOOPP S http://ser-loopp.tc.cornell.edu/cbsu/loopp.htm
PDB-BLAST/FFAS S http://bioinformatics.ljcrf.edu/pdb_blast/
PHD, TOPITS S http://www.predictprotein.org/
PROFIT P http://www.came.sbg.ac.at/
SAM-T99/T98 S http://www.cse.ucsc.edu/research/compbio/HMM-apps/
THREADER S http://bioinf.cs.ucl.ac.uk/threader/
UCLA-DOE FRSVR S http://fold.doe-mbi.ucla.edu/

SEQUENCE ALIGNMENT

BCM SERVER S http://searchlauncher.bcm.tmc.edu/
BLAST2 S http://www.ncbi.nlm.nih.gov/blast/bl2seq/wblast2.cgi
BLOCK MAKER S http://blocks.fhcrc.org/blocks/blockmkr/make_blocks.html
CLUSTALW S http://www2.ebi.ac.uk/clustalw/
FASTA3 S http://www2.ebi.ac.uk/fasta3/
Gerstein Group S http://bioinfo.mbb.yale.edu/align/
MULTALIN S http://prodes.toulouse.inra.fr/multalin/multalin.html

MODELING

3D-JIGSAW S http://www.bmm.icnet.uk/servers/3djigsaw/
CPH-Models S http://www.cbs.dtu.dk/services/CPHmodels/
COMPOSER P http://www-cryst.bioc.cam.ac.uk/
CONGEN P http://www.congenomics.com/congen/congen.html
TIGRFAMs S http://www.tigr.org/TIGRFAMs/index.shtml
ICM (a) P http://www.molsoft.com/
InsightII (b) P http://www.accelrys.com/
MODELLER P http://salilab.org/modeller/modeller.html
ModWeb S http://salilab.org/modweb
QUANTA (b) P http://www.accelrys.com/
SYBYL (c) P http://www.tripos.com/
SCWRL P http://dunbrack.fccc.edu/SCWRL3.php
SDSC1 S http://cl.sdsc.edu/hm.html
SWISS-MODEL S http://www.expasy.ch/swissmod/SWISS-MODEL.html
WHAT IF P http://www.cmbi.kun.nl/whatif/

MODEL EVALUATION

ANOLEA S http://protein.bio.puc.cl/cardex/servers/index.html
BIOTECH S http://biotech.embl-ebi.ac.uk:8400/
CAFASP S http://bioinfo.pl/cafasp/
ERRAT S http://www.doe-mbi.ucla.edu/Services/ERRATv2/
EVA S http://cubic.bioc.columbia.edu/eva/
LiveBench S http://bioinfo.pl/LiveBench/
PROCHECK P http://www.biochem.ucl.ac.uk/~roman/procheck/procheck.html
ProsaII P http://www.came.sbg.ac.at/
PROVE S http://www.ucmb.ulb.ac.be/UCMB/PROVE/
VERIFY3D S http://www.doe-mbi.ucla.edu/Services/Verify_3D/
WHATCHECK P http://www.sander.embl-heidelberg.de/whatcheck/