ACS Denver 2024: Generative chemistry with deep learning models

aclarkxyz 335 views 12 slides Aug 22, 2024

About This Presentation

Generative deep learning networks can be used to train an encoder/decoder sequence for converting a molecular graph into a numeric vector and back again. Not only do these vectors make very good descriptors for QSAR models and similarity comparisons, but they bring additional capabilities: it is pos...


Slide Content

Alex M. Clark
Some of the doors we unlock by describing molecules using generative models
[email protected]

Generative Models
❖Bottleneck is the latent vector (λ):
✦highly orthogonalized fixed-length vector descriptor
✦any set of values can be decoded to recreate a corresponding molecule
❖Most models use SMILES input & output
✦linear representation is a good fit for the technology
✦ours also uses graph inputs and fingerprint outputs
[Diagram: molecular graph input → encoder → λ → decoder → molecular graph output]

Navigation of Chemical Space
❖λ as a multi-dimensional coordinate
❖e.g. if molecule A has good activity and B has good ADME, is there something good in between?
[Figure: molecules A and B, with "?" marking a point between them]
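The "something in between" question can be sketched as a straight-line walk between two latent vectors. This is a minimal illustration, assuming plain linear interpolation; the slides do not say which path is used, and the function name `interpolate` and the 3-value toy vectors are hypothetical stand-ins for real λ384 vectors.

```python
def interpolate(lam_a, lam_b, steps):
    """Return `steps` evenly spaced latent vectors from lam_a to lam_b, inclusive.

    Assumption: a straight line in latent space; other paths are possible.
    """
    if steps < 2:
        raise ValueError("steps must be at least 2")
    if len(lam_a) != len(lam_b):
        raise ValueError("latent vectors must have the same length")
    return [
        [a + (b - a) * i / (steps - 1) for a, b in zip(lam_a, lam_b)]
        for i in range(steps)
    ]


# Toy 3-dimensional vectors standing in for lambda(A) and lambda(B).
lam_a = [0.0, 1.0, 2.0]
lam_b = [1.0, 1.0, 0.0]
path = interpolate(lam_a, lam_b, 5)
for point in path:
    print(point)
```

Each intermediate vector would then be passed to the decoder to see which molecule, if any, it reconstructs.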

Too Many Dimensions
❖A 384-dimensional latent vector (λ384) is needed to reproduce most of ChEMBL as SMILES & ECFP
❖Many dimensions hard to visualize: reduction from 384 → 2
❖Medium sized datasets (several thousand molecules) work well
❖Very simple neural network construction
❖Transformation to μxy is reversible
[Diagram: encoder → λ → flatten → μxy → unflatten → λ → decoder]
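The flatten/unflatten pair can be illustrated with a toy linear autoencoder that squeezes vectors through a 2D bottleneck. This is a sketch under stated assumptions, not the network from the talk: the slides only say the construction is "very simple", so the linear layers, the 4-dimensional toy data (in place of λ384), the learning rate and all function names here are hypothetical.

```python
import random


def matvec(m, v):
    """Multiply matrix m (a list of rows) by vector v."""
    return [sum(a * b for a, b in zip(row, v)) for row in m]


def flatten(w_enc, lam):
    """Project a latent vector onto 2D (hypothetical linear 'flatten')."""
    return matvec(w_enc, lam)


def unflatten(w_dec, mu):
    """Map a 2D point back to latent space (hypothetical linear 'unflatten')."""
    return matvec(w_dec, mu)


def mean_sq_error(w_enc, w_dec, data):
    """Mean squared reconstruction error of flatten followed by unflatten."""
    total = 0.0
    for lam in data:
        back = unflatten(w_dec, flatten(w_enc, lam))
        total += sum((b - x) ** 2 for b, x in zip(back, lam))
    return total / len(data)


def train(data, dim, steps=500, lr=0.05, seed=0):
    """Full-batch gradient descent on the reconstruction error."""
    rng = random.Random(seed)
    w_enc = [[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(2)]
    w_dec = [[rng.uniform(-0.5, 0.5) for _ in range(2)] for _ in range(dim)]
    n = len(data)
    for _ in range(steps):
        g_enc = [[0.0] * dim for _ in range(2)]
        g_dec = [[0.0] * 2 for _ in range(dim)]
        for lam in data:
            mu = flatten(w_enc, lam)
            resid = [b - x for b, x in zip(unflatten(w_dec, mu), lam)]
            # Residual propagated back to the 2D bottleneck: W_dec^T . resid
            back = [sum(w_dec[i][k] * resid[i] for i in range(dim)) for k in range(2)]
            for i in range(dim):
                for k in range(2):
                    g_dec[i][k] += 2 * resid[i] * mu[k] / n
            for k in range(2):
                for j in range(dim):
                    g_enc[k][j] += 2 * back[k] * lam[j] / n
        for i in range(dim):
            for k in range(2):
                w_dec[i][k] -= lr * g_dec[i][k]
        for k in range(2):
            for j in range(dim):
                w_enc[k][j] -= lr * g_enc[k][j]
    return w_enc, w_dec


# Toy data: 30 points lying on a 2D plane inside a 4D space.
rng = random.Random(1)
u, v = [1.0, 0.0, 1.0, 0.0], [0.0, 1.0, 0.0, -1.0]
data = []
for _ in range(30):
    a, b = rng.uniform(-1, 1), rng.uniform(-1, 1)
    data.append([a * ui + b * vi for ui, vi in zip(u, v)])

w_enc0, w_dec0 = train(data, 4, steps=0)  # untrained weights, same seed
loss_before = mean_sq_error(w_enc0, w_dec0, data)
w_enc, w_dec = train(data, 4)
loss_after = mean_sq_error(w_enc, w_dec, data)
print(loss_before, loss_after)
```

In this toy case the round trip can become nearly lossless because the data really are two-dimensional; for real λ384 vectors the reversibility would only be approximate and would depend on the dataset and the network.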

Flattening Model
❖λ384 vectors reversibly projected onto 2D, displayed interactively
❖Fun fact: deep learning model is trained in-browser, native JavaScript
[Figure: DHFR, 500 molecules]

Activity Contours
❖Create a "gradient boost" model for λ384 → activity (fast to train)
❖For each cell of an empty grid: unflatten μxy → λ384, feed it into the model, and colour the cell by the predicted activity
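The grid-colouring step can be sketched as a loop over 2D coordinates. This is an illustration only: `contour_grid`, `toy_unflatten` and `toy_predict` are hypothetical names, and the two toy functions stand in for the trained unflatten network and the gradient-boost activity model, neither of which is specified in the slides.

```python
def contour_grid(unflatten, predict, x_range, y_range, nx, ny):
    """Evaluate predict(unflatten((x, y))) at every point of an nx-by-ny grid.

    Returns a list of ny rows, each holding nx predicted values.
    """
    if nx < 2 or ny < 2:
        raise ValueError("grid needs at least 2 points per axis")
    x0, x1 = x_range
    y0, y1 = y_range
    grid = []
    for j in range(ny):
        y = y0 + (y1 - y0) * j / (ny - 1)
        row = []
        for i in range(nx):
            x = x0 + (x1 - x0) * i / (nx - 1)
            row.append(predict(unflatten((x, y))))
        grid.append(row)
    return grid


def toy_unflatten(mu):
    """Stand-in for the unflatten network: 2D point -> 3-value 'latent vector'."""
    x, y = mu
    return [x, y, x + y]


def toy_predict(lam):
    """Stand-in for the activity model: any function of the latent vector."""
    return sum(lam)


grid = contour_grid(toy_unflatten, toy_predict, (0.0, 1.0), (0.0, 1.0), 3, 3)
for row in grid:
    print(row)
```

The resulting values would be mapped to colours and drawn behind the projected molecules to give the activity contours.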

Exploring
[Diagram: μxy → unflatten → λ384 → reconstruct]
Nc1ncc(-c2cccc(N(CCC(=O)c3ccccc3)C(=O)C=CCONC(=O)c3ccccc3OCCC(=O)c3nc(N)ncc3Cl)c2)c(N)n1
NC1=NC(NC(=O)c2ccccc2)C(OCCCNC(=O)c2cccc(OCCC(=O)c3ccccc3N)n2)=N1
NC1=NC(NC(=O)c2ccccc2)C(=O)C=C1OCCCNc1cccc(C(=O)NCCC(=O)c2cccc(O)c2ONC(=O)c2ccccc2)n1
NC(N)=Nc1cccc(C(=O)N=C2N=C(N)N(C=CCCCOc3ccccc3-c3cccc(N)n3)C2=O)c1
N=C(N)NCCC=C1N=C(N)C(C=C(C(=O)c2cccc(Cl)c2)c2ccccc2)Oc2c(C(=O)NCCON=C(N)N)cccc21
NC1=NC(NC(=O)c2cccc(OCCCN3C=CC(CCOC(=O)c4ccccc4)N=C3N)c2)=NC1=O

Bio-isosteres
❖Modify the structure to:
✦keep the biological activity
✦re-roll the dice on everything else
❖Conventionally implemented as scaffold replacements
❖Pain points:
✦where to get suitable bio-isostere transforms?
✦what if you need something a bit less cleanly defined?
[Diagram: a known bioisostere transform takes A to B. Encode A, B and C to λa, λb and λc; derive the latent vector offset δab from the known bioisostere; apply the latent vector offset, λc + δab = λd; decode the vector λd to molecule D.]
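The derive/apply steps in the diagram amount to simple vector arithmetic. A minimal sketch, assuming the offset is the plain difference δab = λb − λa (the slides name the offset but do not spell out the formula); `offset` and `apply_offset` are hypothetical names, and the 2-value vectors stand in for λ384.

```python
def offset(lam_a, lam_b):
    """Latent offset taking A to B. Assumption: delta_ab = lambda_b - lambda_a."""
    if len(lam_a) != len(lam_b):
        raise ValueError("latent vectors must have the same length")
    return [b - a for a, b in zip(lam_a, lam_b)]


def apply_offset(lam_c, delta):
    """Shift another molecule's latent vector: lambda_d = lambda_c + delta_ab."""
    if len(lam_c) != len(delta):
        raise ValueError("latent vector and offset must have the same length")
    return [c + d for c, d in zip(lam_c, delta)]


# Toy vectors standing in for the encoded molecules A, B and C.
lam_a = [1.0, 2.0]
lam_b = [1.5, 1.0]
lam_c = [0.0, 0.0]

delta_ab = offset(lam_a, lam_b)
lam_d = apply_offset(lam_c, delta_ab)
print(delta_ab, lam_d)
```

Decoding `lam_d` with the generative model's decoder would give the proposed molecule D.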

Quasi-bioisosteres?
❖Hypothesis: a translation in latent space can achieve similar effects to a scaffold replacement
❖derive latent vector offset from known bioisostere
❖apply latent vector offset
❖decode vector to molecule
[Diagram: A, B and C encoded to λa, λb and λc; δab + λc = λd; λd decoded to D]
indication of how similar transforms are to each other, in latent space
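One way to put a number on how similar two transforms are in latent space is the cosine of the angle between their offset vectors. The slides do not state which measure is used, so cosine similarity is an assumption here, and `cosine_similarity` is a hypothetical helper.

```python
import math


def cosine_similarity(delta_1, delta_2):
    """Cosine of the angle between two latent offsets (1 = same direction)."""
    if len(delta_1) != len(delta_2):
        raise ValueError("offsets must have the same length")
    dot = sum(a * b for a, b in zip(delta_1, delta_2))
    norm_1 = math.sqrt(sum(a * a for a in delta_1))
    norm_2 = math.sqrt(sum(b * b for b in delta_2))
    if norm_1 == 0.0 or norm_2 == 0.0:
        raise ValueError("cosine similarity is undefined for a zero offset")
    return dot / (norm_1 * norm_2)


# Toy offsets standing in for the delta vectors of three known transforms.
same_direction = cosine_similarity([1.0, 0.0], [2.0, 0.0])
unrelated = cosine_similarity([1.0, 0.0], [0.0, 3.0])
opposite = cosine_similarity([1.0, 0.0], [-1.0, 0.0])
print(same_direction, unrelated, opposite)
```

Offsets that point the same way score near 1, unrelated ones near 0, and opposing ones near -1.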

Why?
❖Propose a structural transformation:
✦has bioisostere-like properties
✦but not necessarily a well-defined substructure swap
❖Molecules A and B have good activity, but B has better ADME
❖Define transform δ as λ(A) → λ(B) in latent vector space
❖Apply transform as λ(C) + δ to improve ADME...?
[Figure: example structures, labelled A and C]
Passes the eyeball test, but needs to be developed

Conclusions & Future Work
❖Operating in latent vector space enables new drug design tools...
❖... currently in early prototype stage
❖Generative models: shrink latent vectors? alternatives to SMILES?
❖Exploring multidimensional space via gradient
❖Contours: plot multiple activities and ADME/tox
❖Quasi-bioisosteres: can it enrich datasets?
❖TODO: get some of these tools into the hands of scientists...

Questions?
❖Contact:
✦Alex M. Clark [email protected] (Collaborative Drug Discovery)
❖Thanks to the Vault & Research Informatics teams