Intro to in silico drug discovery 2014

leelarcombe 3,828 views 74 slides Nov 17, 2014

About This Presentation

An introduction to in silico methods in drug discovery, covering small molecule drugs and biologics, and considering safety and efficacy.


Slide Content

An Intro to in silico drug Design:
considering safety and efficacy
Dr Lee Larcombe

Lecture Aim
This lecture aims to provide a basic understanding of
the concept of protein and molecular in silico
engineering/design as part of the drug development
process:-
Introducing theory and approaches, drivers, databases
and software – and with a focus on safety and efficacy.

This Lecture Covers
•Drivers for use of computational approaches
•Small molecule drugs
•Getting protein structures
•Simulation of molecular interactions
•Considering safety during design
•Biologics – antibody therapeutics
•Engineering biologics for safety – reducing immunogenicity
•Considering efficacy of biologics
•We will also highlight key software or data sources along
the way

Key Drivers for in silico

Business
Pipeline stages: Target identification, Lead selection, Lead refinement, Pre-Clinical phases
Approaches: Genomics, Proteomics/Metabolomics, Interaction Networks, Molecular modelling, Protein modelling, Chemoinformatics, Data modelling, Systems Biology, In vitro, In vivo
Relative costs are marked on the slide as £ or ££

Ethics Drivers
•Use of animals in research
•3Rs – Refine, Reduce, Replace
•Relevance of animal data for human use
•Extrapolation across species
•Improvement of safety for subsequent trials
•Regulatory requirements and change

Extrapolation of data across
species
How relevant is animal physiology to human physiology ?
Models not available for all diseases
Choice of species can be important
•30% attrition due to no efficacy in man
•10% attrition due to toxicity
For biologics, even more difficult to predict

Part 1: Small Molecule Drugs

Safety and Efficacy of Small Molecule
Drugs
•Safety: safety issues primarily focus on the potential of
the small molecule to have off-target effects,
metabolite/breakdown product toxicity, or buildup/non
clearance
•Efficacy: efficacy issues focus on bioavailability and good
binding kinetics to the right target protein – including
variations of that protein (SNPs/mutants)

First we need a source of molecules:
Chemical Repositories
•Databases with safety information (GRS, CAS)
•Databases with structure and vendor/price information – individual
chemical supply companies, ZINC
•Databases with multiple information types – ChEMBLdb,
PubChem, KEGG

ChEMBLdb
“The ChEMBL database (ChEMBLdb) contains medicinal chemistry bioassay data,
integrated from a wide variety of sources (the literature, deposited data sets, other
bioassay databases). Subsets of ChEMBLdb, relating to particular target classes, or
disease areas, are exported to smaller databases. These separate data sets, and the
entire ChEMBLdb, are available either via ftp downloads, or via bespoke query interfaces,
tailored to the requirements of the scientific communities with a specific interest in these
research areas”
• Targets: 10,579
• Compound records: 1,638,394
• Distinct compounds: 1,411,786
• Activities: 12,843,338
• Publications: 57,156
(release 19)

ChEMBL
www.ebi.ac.uk/chembl/

What can we do with chemical
models?
We can investigate structure and similarities of structure
between molecules
We can map structural characteristics to properties (SARs)
We can study molecular interactions – particularly with
proteins
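
One common way to put a number on structural similarity is the Tanimoto coefficient over sets of structural features. The sketch below assumes each molecule has already been reduced to a set of feature labels; the labels are made up for illustration, and real fingerprints would come from a cheminformatics toolkit.

```python
def tanimoto(features_a, features_b):
    """Tanimoto (Jaccard) similarity between two feature sets, in [0, 1]."""
    a, b = set(features_a), set(features_b)
    if not a and not b:
        return 1.0  # convention chosen here: two empty sets count as identical
    return len(a & b) / len(a | b)


# Hypothetical feature sets, for illustration only
mol_1 = {"aromatic_ring", "carboxylic_acid", "hydroxyl"}
mol_2 = {"aromatic_ring", "carboxylic_acid", "ester"}

# Two shared features out of four distinct features
print(tanimoto(mol_1, mol_2))
```

Molecules scoring close to 1 share most features and, under the SAR assumption, might be expected to behave similarly.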

Interactions – Docking & Screening
•Computation to assess binding affinity
•Looks for conformational and electrostatic "fit" between
proteins and other molecules
•Optimization: Does position and orientation of the two
molecules minimise the total energy? (Computationally
intensive)
•Docking small ligands to proteins is a way to find potential
drugs. Industrially important!
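
As a rough illustration of "minimise the total energy", the sketch below scores a ligand pose with a toy energy function: a Lennard-Jones 12-6 term for shape fit plus a Coulomb term for electrostatic fit. The parameter values (epsilon, sigma) and the atom coordinates and charges are illustrative assumptions, not taken from any particular force field or docking program.

```python
import math


def pair_energy(distance, charge_a, charge_b,
                epsilon=0.2, sigma=3.5, coulomb_k=332.0):
    """Toy pairwise energy: Lennard-Jones 12-6 plus Coulomb (kcal/mol, angstroms)."""
    sr6 = (sigma / distance) ** 6
    lennard_jones = 4.0 * epsilon * (sr6 ** 2 - sr6)
    coulomb = coulomb_k * charge_a * charge_b / distance
    return lennard_jones + coulomb


def pose_energy(ligand_atoms, protein_atoms):
    """Sum the pairwise energy over every ligand-protein atom pair."""
    total = 0.0
    for lig_xyz, lig_charge in ligand_atoms:
        for prot_xyz, prot_charge in protein_atoms:
            distance = math.dist(lig_xyz, prot_xyz)
            total += pair_energy(distance, lig_charge, prot_charge)
    return total


# Hypothetical one-atom "protein" and two candidate ligand poses
protein = [((0.0, 0.0, 0.0), -0.5)]
pose_near = [((4.0, 0.0, 0.0), 0.5)]
pose_far = [((8.0, 0.0, 0.0), 0.5)]

# The pose with the lower energy is the better "fit" under this toy model
print(pose_energy(pose_near, protein), pose_energy(pose_far, protein))
```

The double loop over atom pairs is why the slide calls this computationally intensive: the cost grows with the product of the atom counts, and it has to be repeated for every candidate pose.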

Virtual Screening
•Docking small ligands to proteins is a way to find potential
drugs. Industrially important
•A small region of interest (pharmacophore) can be identified,
reducing computation
•Empirical scoring functions are not universal
•Various search methods:
•Rigid- provides score for whole ligand (accurate)
•Flexible- breaks ligands into pieces and docks them
individually

So – we need protein (target)
structures
http://www.rcsb.org/

The PDB
The PDB was established in 1971 at Brookhaven National
Laboratory and originally contained 7 structures. In 1998,
the Research Collaboratory for Structural Bioinformatics
(RCSB) became responsible for the management of the
PDB.
Last year (2013), 9597 structures were deposited by
scientists from all over the world; so far this year (2014), 8391.
The total now stands at 104,866 structures (as of yesterday).

Entries in database - cumulative and by year
Red = total
Blue = yearly

What if there is no structure available?
Can we predict structures?
Tertiary structure is dependent on ‘folding’ of the protein.
Recognition, characterisation, and assignment of domains
and folds is a major area of structural bioinformatics.
Predicting structure from sequence is one of the biggest
challenges...

Folding is Complex: Is a truly random
approach possible?
Levinthal’s paradox (1969)
100 residues = 99 peptide bonds
therefore 198 different phi and psi bond
angles
3 stable conformations of each bond angle = 3^198
possible conformations
At a nano/picosecond sample rate proteins
would not find the correct structure for a long
time (longer than the age of the Universe!)
Proteins fold on a milli/microsecond timescale – this is the paradox...
(Figure labels: phi, psi)
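
The arithmetic behind the paradox can be checked in a few lines. This is a minimal sketch under stated assumptions: one conformation sampled per picosecond, and an age of the Universe of about 13.8 billion years (a commonly quoted figure, not from the slides).

```python
def levinthal_conformations(n_residues=100, states_per_angle=3):
    """Number of conformations if every phi/psi angle has a few stable states."""
    n_angles = 2 * (n_residues - 1)  # one phi and one psi per peptide bond
    return states_per_angle ** n_angles


def exhaustive_search_seconds(n_conformations, samples_per_second=1e12):
    """Time to visit every conformation once at the given sampling rate."""
    return n_conformations / samples_per_second


# Assumed value: ~13.8 billion years, converted to seconds
AGE_OF_UNIVERSE_S = 13.8e9 * 365.25 * 24 * 3600

conformations = levinthal_conformations()
search_time_s = exhaustive_search_seconds(conformations)
print(f"{conformations:.2e} conformations")
print(f"{search_time_s / AGE_OF_UNIVERSE_S:.2e} times the age of the Universe")
```

Even at the optimistic picosecond rate, the exhaustive search outlasts the Universe by dozens of orders of magnitude, which is the point of the slide.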

Why are folding simulations so difficult?
•Simulations are computationally expensive
•Gross approximations in simulations
•Nature uses tricks such as
•Posttranslational processing
•Chaperones
•Environment change
How does it work at all?
1.proteins do NOT fold from random conformations,
which was an assumption of Levinthal's calculation
2.instead, they fold from denatured states that retain
substantial secondary (2°), and possibly tertiary (3°), structure

Complexity & Diversity –
potential vs reality
If the average protein contains about 300 amino acids, then
there could be a possible 20^300 different proteins
(Apparently) this is more than the number of atoms in the universe!
Yet a human (a complex organism) has only about 30,000 proteins
All proteins so far appear to be represented by between
1000 - 5000 fold types

Two reasons for limited fold space
Convergent evolution
Certain folds are biophysically favourable and may
have arisen in multiple cases
Divergent evolution
The number of folds seen is limited because they have
evolved from a limited number of common ancestor
proteins
Despite the evolutionary limitation of the number of existing folds (fold
space) it is still complex enough to make classification and
comprehension difficult

Why is Folding Difficult to do?
It's amazing that not only do proteins self-assemble -- fold -- but they do
so amazingly quickly: some as fast as a millionth of a second. While this
time is very fast on a person's timescale, it's remarkably long for
computers to simulate.
In fact, it takes about a day to simulate a nanosecond (1/1,000,000,000 of
a second) of dynamics for a reasonably sized protein (e.g. on an Intel Core i7 at
2.66 GHz).
Unfortunately, proteins fold on a timescale of tens of microseconds (10,000
nanoseconds). Thus, it would take 10,000 CPU days to simulate folding,
i.e. roughly 30 CPU years! That's a long time to wait for one
result!
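
The estimate above is simple arithmetic and can be reproduced directly. The sketch assumes the slide's figures of one nanosecond simulated per CPU day and a folding time of 10,000 nanoseconds; the function name is just for illustration.

```python
def cpu_days_to_simulate(folding_time_ns, ns_per_cpu_day=1.0):
    """CPU days needed to simulate a trajectory of the given length."""
    return folding_time_ns / ns_per_cpu_day


cpu_days = cpu_days_to_simulate(10_000)  # tens of microseconds = 10,000 ns
cpu_years = cpu_days / 365.25

print(f"{cpu_days:.0f} CPU days is about {cpu_years:.1f} CPU years")
```

The result comes out a little under 30 years, so the slide's "30 CPU years" is a round-up of the same calculation.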

A compromise: Homology modelling
If there is no structure for your protein, perhaps there is
one for a similar protein.
Sequence alignment tools can be used to compare the similar protein's sequence to
your sequence of unknown structure
Homology searching and sequence alignment are now the
first step in protein structure prediction
If homologous proteins with structures are found, the unknown
can be ‘overlaid’ on them and its structure inferred
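
Whether a template is "similar enough" is often judged first by percent identity over the alignment. The sketch below assumes the two sequences have already been aligned (gaps written as "-") and counts identity over the columns where both have a residue; other definitions of percent identity exist, and the sequences shown are invented for illustration.

```python
def percent_identity(aligned_a, aligned_b, gap="-"):
    """Percent of identical residues over the columns where neither has a gap."""
    if len(aligned_a) != len(aligned_b):
        raise ValueError("aligned sequences must have the same length")
    compared = 0
    matches = 0
    for res_a, res_b in zip(aligned_a, aligned_b):
        if res_a == gap or res_b == gap:
            continue  # skip gapped columns
        compared += 1
        if res_a == res_b:
            matches += 1
    return 100.0 * matches / compared if compared else 0.0


# Hypothetical pre-aligned query and template fragments
query = "MKT-AYIAKQR"
template = "MKTLAYLAKQR"
print(percent_identity(query, template))
```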

Homology Modeling
Based on two assumptions:
1.The structure of a protein is determined by its amino acid
sequence alone
2.With evolution, the structure changes more slowly than
the sequence - similar sequences may adopt the same
structure

Sequence alignment
TEX19 – human protein without a
structure.
PDB 2AAM: Crystal structure of a
putative glycosidase (tm1410) from
Thermotoga maritima

Structure inference/alignment

ExPASy - SwissModel
SwissModel (swissmodel.expasy.org/)

Phyre2
http://www.sbg.bio.ic.ac.uk/phyre2

More annotation
http://genome3d.eu/

Using the Models – Docking/Screening
•Choose and prepare target protein
•Identify binding pocket
•Fit ligand to pocket
•Score
•(for screening – repeat!)

Identify the Binding Pocket
•Could identify this by the location of an existing co-crystallised ligand
•Or use surface sphere clusters
•Or identify it by clustering of solvent molecules (normally water)
•Perhaps identify it by clustering of fragments (Surflex-Dock protomol)

Binding site based on existing
ligand
•Most methods allow you to
specify where the site is –
perhaps by identifying key
residues or based on an
existing ligand
•Could use the ‘hole’ left by the
ligand as a pocket, or use the
‘surface’ of the ligand as a
protomol

Surface Sphere generation
•Generate the surface of the target
– Connolly surface
•‘Rolls’ a sphere with the radius of
a water molecule across the van der Waals
surface of the target
•Each atom’s centre (the centre of its van der Waals radius) acts as a sitepoint for
generating a sphere on the surface; the sphere’s centre lies on the normal to
the surface at the sitepoint.
•Spheres are then clustered – each cluster is a potential pocket
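
The clustering step can be sketched as simple single-linkage clustering: two spheres belong to the same cluster if their centres are within a distance cutoff, directly or through a chain of neighbours. The cutoff and coordinates here are illustrative assumptions, and this is not a description of how any specific docking package clusters its spheres.

```python
import math


def cluster_spheres(centres, cutoff):
    """Group sphere centres so that linked neighbours (<= cutoff apart) share a cluster."""
    unvisited = set(range(len(centres)))
    clusters = []
    while unvisited:
        seed = unvisited.pop()
        cluster = [seed]
        frontier = [seed]
        while frontier:
            current = frontier.pop()
            neighbours = [i for i in unvisited
                          if math.dist(centres[current], centres[i]) <= cutoff]
            for i in neighbours:
                unvisited.remove(i)
            cluster.extend(neighbours)
            frontier.extend(neighbours)
        clusters.append(sorted(cluster))
    # Largest cluster first: the biggest group is the most likely pocket
    return sorted(clusters, key=lambda c: (-len(c), c))


# Hypothetical sphere centres: three close together, two further away
centres = [(0, 0, 0), (1, 0, 0), (2, 0, 0), (10, 0, 0), (11, 0, 0)]
print(cluster_spheres(centres, cutoff=1.5))
```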

Identified pocket

Prepare the ligand
•The ligand needs to be prepared too
•Drawn & minimised
•From a database - & minimised
•Extracted from another/the same binding site
•Hydrogens added etc
•Minimised/optimised – ready to dock

Docking
•Rigid docking -> ligand is fixed conformationally
•Flexible docking –> ligand is conformationally flexible
•Posable -> ligand is rigid, but moved spatially

Rigid Ligand docking
•Centres of spheres representing the binding pocket act as ‘Site Points’
•The atoms of the ligand are matched to the site points
•Once an orientation is found, the interaction may be minimised: receptor kept
rigid and ligand flexible

Alternatives
Flexible Docking
•Rings treated as flexible
•Other bonds treated as flexible/rotamers
•Rings treated as rigid – ligand fragmented
Posable Docking
•Rigid docking, but ligands posed conformationally
•Rotated
•Twisted
•Flipped etc
•And repetitively docked to find best fit
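
The "pose and repeatedly dock" idea can be sketched as a loop over rigid rotations of the ligand, keeping the orientation that best matches the site points. This is a deliberately simplified illustration: it only rotates about one axis, and the fit score (distance from each ligand atom to its nearest site point) is an assumption made for the example, not a real scoring function.

```python
import math


def rotate_z(points, angle_rad):
    """Rotate 3D points rigidly about the z axis."""
    c, s = math.cos(angle_rad), math.sin(angle_rad)
    return [(c * x - s * y, s * x + c * y, z) for x, y, z in points]


def fit_score(ligand_atoms, site_points):
    """Sum of distances from each atom to its nearest site point (lower is better)."""
    return sum(min(math.dist(atom, site) for site in site_points)
               for atom in ligand_atoms)


def best_pose(ligand_atoms, site_points, n_steps=36):
    """Try n_steps evenly spaced rotations and return (score, angle) of the best."""
    angles = (2 * math.pi * k / n_steps for k in range(n_steps))
    scored = [(fit_score(rotate_z(ligand_atoms, angle), site_points), angle)
              for angle in angles]
    return min(scored)


# Hypothetical two-atom ligand and two site points lying along the y axis
ligand = [(1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
site_points = [(0.0, 1.0, 0.0), (0.0, 2.0, 0.0)]

score, angle = best_pose(ligand, site_points)
print(f"best angle = {math.degrees(angle):.0f} degrees, score = {score:.3f}")
```

A fuller treatment would also sample translations, the other rotation axes and flips, which is why the search is repeated many times per ligand.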

Example Interaction – Avidin / Biotin

Virtual Screening
•Docking – but repeated with many potential ligands
•Libraries can come from resources such as
PubChem/ChEMBLdb – vendors – or other in-house
sources
•From specialised databases holding structures suitable for
docking
•It is important to have a diversified library, especially for
rigid docking!
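
In outline, a virtual screen is a loop: score every ligand in the library, then rank. The sketch below uses a stand-in scoring function and made-up scores purely to show the shape of the loop; in a real screen the scoring call would be a docking run against the prepared target.

```python
def screen_library(library, score_fn, top_n=3):
    """Score every ligand and return the top_n best as (name, score), lowest score first."""
    ranked = sorted((score_fn(ligand), name) for name, ligand in library.items())
    return [(name, score) for score, name in ranked[:top_n]]


# Hypothetical library: names mapped to stand-in scores, not real data
library = {"ligand_A": -7.2, "ligand_B": -5.1, "ligand_C": -9.4, "ligand_D": -3.0}

# Stand-in scoring function: simply returns the stored value
hits = screen_library(library, score_fn=lambda stored_score: stored_score, top_n=2)
print(hits)
```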

Considering safety & efficacy – “Drug-
like”
Lipinski rule of 5 (or Pfizer rule)
‘Compounds which violate at least two of the following conditions have
a very low chance of being orally bioavailable’
•MW ≤500 Da
•log P (lipophilicity) ≤5
•number of H bond donors ≤5
•number of H bond acceptors ≤10
Works well once you have descriptors for small molecules – these can be
used as search criteria in databases...
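
As a filter, the rule is only a few comparisons once the four descriptors are known. The sketch below assumes the descriptors have already been calculated elsewhere and treats values exactly on a threshold as passing, as in the usual statement of the rule; the example compounds are hypothetical.

```python
def lipinski_violations(mol_weight, log_p, h_donors, h_acceptors):
    """Return the list of rule-of-5 conditions a compound violates."""
    violations = []
    if mol_weight > 500:
        violations.append("MW > 500 Da")
    if log_p > 5:
        violations.append("log P > 5")
    if h_donors > 5:
        violations.append("H bond donors > 5")
    if h_acceptors > 10:
        violations.append("H bond acceptors > 10")
    return violations


def likely_orally_bioavailable(mol_weight, log_p, h_donors, h_acceptors):
    """Flag compounds with fewer than two violations, following the slide's wording."""
    return len(lipinski_violations(mol_weight, log_p, h_donors, h_acceptors)) < 2


# Hypothetical descriptor values, for illustration only
print(likely_orally_bioavailable(350.4, 2.1, 2, 5))
print(lipinski_violations(720.9, 6.3, 4, 9))
```

The same four comparisons can be expressed as search criteria when querying a compound database.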

ADME / ADME-Tox
•Lipinski rule is really the 1st step in ADME (absorption,
distribution, metabolism, excretion) modelling
•Structure Activity Relationships (SARs) – similar
molecules will behave in similar ways, i.e. have similar
effects.
•Allows for knowledge-based comparative analysis – Tox
databases

ChEMBL SARfari(s)

Knowledge-based
tox in silico
www.dixa-fp7.eu

Toxicogenomics – Open TG-Gates

HeCaToS http://www.hecatos.eu/

Part 2: Biologics

What are Biologics?
Typically biologics are thought of as being either antibody
therapeutics or components of vaccine products.

However... (from FDA CBER)
Biological products include a wide range of products such as vaccines, blood
and blood components, allergenics, somatic cells, gene therapy, tissues, and
recombinant therapeutic proteins. Biologics can be composed of sugars,
proteins, or nucleic acids or complex combinations of these substances, or
may be living entities such as cells and tissues. Biologics are isolated from a
variety of natural sources - human, animal, or microorganism - and may be
produced by biotechnology methods and other cutting-edge technologies.
Gene-based and cellular biologics, for example, often are at the forefront of
biomedical research, and may be used to treat a variety of medical conditions
for which no other treatments are available.
Center for biologics evaluation and research
We will just consider antibodies here...

Safety and Efficacy of Biologics
•Safety: safety issues primarily focus on the potential of
the protein biologic to raise an immune response in the
subject. This could be mild or severe.
•Efficacy: efficacy issues focus on either the raising of
anti-drug antibody responses, or the in vivo half life of the
protein

Making suitable Abs for therapy
Monoclonal antibodies are traditionally made using Mice* – these are
fine for R&D use, but bring problems for use in Humans
When developing Abs for therapeutic use there are very few
requirements for modelling or in silico engineering as most of the work
can be simple molecular biology (gene editing/expression systems)
However, the use of in silico engineering provides further options for
improving or modifying function – particularly considering safety and
efficacy.
*also phage or ribosome display – or now, humanised mice, which can avoid these problems – but are
beyond the scope here

Immune response: B-cell activation
a) "B cell activation" by Fred the Oysteri. Licensed under Public domain via Wikimedia Commons
b) "T-dependent B cell activation" by Altaileopard - Own work. Licensed under Public domain via Wikimedia Commons
(a)
(b)

Antibody structure
By Dan1gia2 (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons

Size relationship
antibody
rhinovirus
DNA and DNA
polymerase
ribosome
rhodopsin
membrane
cyclooxygenase
http://www.rcsb.org/

Engineering:
Chimeric Ab:
Retain the murine variable domains –
splice to Human constant domain.
75% Human*
Humanised Ab:
Retain the murine CDRs – splice to
Human variable framework & constant
domain.
95% Human*
Best to try and ‘humanise’ them as a first
step – helps both Safety and Efficacy
* refers to percentage Human origin. Of
course, being both mammals the mouse
and Human have fairly high antibody
sequence similarity

Targets for engineering
By Dan1gia2 (Own work) [CC-BY-SA-3.0 (http://creativecommons.org/licenses/by-sa/3.0)], via Wikimedia Commons
CDR – tweak to remove unwanted PTM
sites – mitigate immunogenicity (more
later) at human/mouse interface
VL/H – remove unwanted PTMs. If
Chimeric, reduce immunogenicity at C/V
interface
Fc – Select effector functions, remove
unwanted PTMs, enhance function?
Other – Add drug conjugates?
(Beyond the scope of this talk)

What about Fc selections?
Salfeld, J.G., 2007. Isotype selection in antibody engineering. Nature
Biotechnology, 25(12), pp.1369-1372.

Half life
•Proteins & Biologics will be slowly cleared by the system
(either immunologic response or cellular
uptake/destruction)
•Two main strategies to increase serum halflife: increase
the size (pegylation) or exploit (enhance?) natural protein
recycling (via FcRn)

FcRn – neonatal Fc Receptor
Roopenian, D.C. & Akilesh, S., 2007. FcRn : the neonatal Fc receptor comes of age. Nature Reviews,
Immunology, 7, pp.715-725.

FcRn in the adult
Roopenian, D.C. & Akilesh, S., 2007. FcRn : the neonatal Fc receptor comes of age. Nature Reviews,
Immunology, 7, pp.715-725.

IgG : FcRn binding
Roopenian, D.C. & Akilesh, S., 2007. FcRn : the neonatal Fc receptor comes of age. Nature Reviews,
Immunology, 7, pp.715-725.

Deimmunisation & ADA
•If part of the Ab is recognised as foreign – it can stimulate
a T-cell response when the fragment is presented on
MHCII, and...
•If the Ab contains a B-cell epitope (it will), then...
•The immune system will raise antibodies to the biologic
which may be harmful to the patient or at least reduce the
usefulness of the drug
•Engineer to remove the T-cell epitopes (Humanisation +
deimmunisation strategy)

Safety: reducing immunogenicity
a) "T-dependent B cell activation" by Altaileopard - Own work. Licensed under Public domain via Wikimedia Commons
(a)
If the Antibody (antigen) doesn’t have any epitopes that will (a) bind MHC
II or (b) be recognised by a TCR, the B-cell will not be activated, and no
ADA will be raised
We can deal with (a) through engineering - deimmunisation

Predicting T-cell epitopes
http://www.iedb.org/

Sequence-level engineering
PGLVRPSQTLSLTCT = T-cell epitope
PGLVRPSATLSLTCT = weak or non-epitope?
Remove or mitigate the risk – taking into account the
promiscuity of the epitope for HLA types, and population
variation.
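
The two peptides above differ by a single residue, and a short script can locate the change and list the overlapping windows a predictor would need to re-score. The 9-residue window width is an assumption (the commonly used MHC II binding core length), and no binding prediction is attempted here; that would be done with a tool such as those at IEDB.

```python
def substitutions(original, variant):
    """List (position, old, new) for each differing residue, positions 1-based."""
    if len(original) != len(variant):
        raise ValueError("peptides must have the same length")
    return [(i + 1, old, new)
            for i, (old, new) in enumerate(zip(original, variant))
            if old != new]


def windows_covering(peptide, position, width=9):
    """All width-mers of the peptide that include the given 1-based position."""
    index = position - 1
    return [peptide[start:start + width]
            for start in range(len(peptide) - width + 1)
            if start <= index < start + width]


epitope = "PGLVRPSQTLSLTCT"
engineered = "PGLVRPSATLSLTCT"

changes = substitutions(epitope, engineered)
print(changes)

# Every window containing the changed residue would need re-scoring
affected = windows_covering(engineered, changes[0][0])
print(len(affected), affected[0])
```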

MHCII varies
by population,
but so does
IgG...
Jefferis, R. & Lefranc, M.-paule, 2009. Human immunoglobulin allotypes. Possible implications
for immunogenicity. mAbs, 1(4), pp.1-7.

Aggregation & ADA
T-cell epitopes
Aggregation
a) "B cell activation" by Fred the Oysteri. Licensed under Public domain via Wikimedia Commons
(a)
If antigen can cross-link the
B-cell receptor, the cell will
become activated without the
presence of a T-cell
The result is mainly IgM, but
can still be a problematic
response
Aggregated antigen can
cause the cross-linking –
even when as “Human-like”
as possible
This is T-cell Independent
B-cell Activation

Aggregation & ADA
Engineer to remove potential aggregation hotspots
(disorder/hydrophobicity, PTMs and pI shift potential,
hydrophobic patches)
Predicting aggregation is really hard!
Problem – sometimes this is due to formulation!

Final Comments

Remember the Key Drivers for in silico
approaches

Explore the following Software Tools
As well as resources mentioned in the slides!
Homology Modelling
Modeller, Phyre, SwissModel
Model Viewers
Pymol, Jmol, Rasmol
Molecular Simulation etc
Gromacs, Tinker, Amber, NAMD, CHARMM
Docking/Screening
Surflex Dock, Dock, AutoDock, Vina
Graphical Tools/builders/interfaces
Chimera, Maestro, Ghemical, VMD, DeepView
Suites (companies)
Tripos, Accelrys, OpenEye, ChemAxon, Schrödinger, MOE, Yasara
Some are free for
academic use, but cost
for commercial use
Take note and beware!

Workflow example – free vs paid
Commercial suite vs free tools (£££ $$$)
Structure sources: ChEMBL (ligand), PDB (target)
Workflow steps: get structures, preparation, minimisation, dynamics, docking, evaluation
Commercial suite: Discovery Studio
Free tools: Marvin Sketch, Chimera, Gromacs, Dock