Basic Concepts and Areas of Application.pdf

AhmadEweas 0 views 69 slides Oct 02, 2025

About This Presentation

basic science


Slide Content

Chemoinformatics:
Basic Concepts and Areas of Application
Alexandre Varnek
Laboratory of Chemoinformatics, University of Strasbourg

Double diploma UniStra/KFU

Chem(o)informatics
Cheminformatics
Chemical Informatics

Infochimie
Chémoinformatique

Хемоинформатика

Chemoinformatics is a generic term that encompasses the design, creation, organization,
management, retrieval, analysis, dissemination, visualization, and use of chemical
information
G. Paris, 1998
Chemoinformatics - definition

Chemoinformatics is the application of informatics methods
to solve chemical problems
J. Gasteiger, 2004
Chemoinformatics is the mixing of those information resources to transform data into
information and information into knowledge for the intended purpose of making better
decisions faster in the area of drug lead identification and optimization
F.K. Brown, 1998
Chemoinformatics is a field based on the representation of molecules as objects
(graphs or vectors) in a chemical space
A. Varnek & I. Baskin, 2011

Selected books in chemoinformatics

Gallium discovery:
the first QSAR success story
Predicted in 1869 by Dmitry Mendeleev
Discovered in 1875 by Paul Emile Lecoq de Boisbaudran
Density (predicted) ≈ 6.0 g/cm³
Density (experimental) = 4.7 (initial), 5.935 (corrected)

Chemoinformatics:
a new discipline combining several "old" fields
•Chemical databases
•Structure-Activity modeling (QSAR)
•Structure-based drug design
•Computer-aided synthesis design
Peter Willett Michael Lynch
Corwin Hansch Johann Gasteiger
Irwin D. Kuntz
Elias Corey Ivar Ugi
Hans-Joachim Böhm

• Needs for chemoinformatics
• Fundamentals of chemoinformatics
• Chemical Space paradigm
• Virtual screening approaches
• Perspectives
OUTLINE

Needs in Chemoinformatics

Chemical universe
> 100 M compounds are
currently recorded

• How to select useful compounds from this huge dataset?
• How to design new compounds?
• How to synthesize these compounds?

Target Protein
Large libraries
of molecules
High-Throughput Screening
Hit
experiment
computations
Virtual
Screening
Small Library of selected hits

Chemical universe:

• > 10⁸ compounds are currently available
• 10³³ druglike molecules could potentially be synthesised
(see P. Polischuk, T. Madzidov et al., JCAMD, 2013)
Virtual screening is indispensable for analysing the
huge number of protein-ligand combinations
Virtual screening must be very fast and efficient!
Human proteome:

• 5000 druggable proteins
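
The scale argument on this slide can be checked with a few lines of arithmetic. The sketch below multiplies the compound and protein counts quoted above; the throughput of 1000 protein-ligand pairs per second is a purely hypothetical figure chosen for illustration, not a number from the presentation.

```python
# Figures quoted on the slide
available_compounds = 10**8   # "> 10^8 compounds are currently available"
druggable_proteins = 5000     # "5000 druggable proteins"


def pair_count(n_ligands: int, n_targets: int) -> int:
    """Number of distinct protein-ligand pairs to evaluate."""
    return n_ligands * n_targets


def screening_years(n_pairs: int, pairs_per_second: float) -> float:
    """Wall-clock years needed at a given (hypothetical) throughput."""
    seconds = n_pairs / pairs_per_second
    return seconds / (3600 * 24 * 365)


pairs_available = pair_count(available_compounds, druggable_proteins)
print(f"pairs to evaluate: {pairs_available:.1e}")
# Assumed throughput: 1000 pairs per second (illustrative only)
print(f"years at 1000 pairs/s: {screening_years(pairs_available, 1000.0):.1f}")
```

Even for the compounds that already exist, the number of pairs is of the order of 10¹¹, which is why each individual evaluation has to be cheap.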

Ionic Liquids
Ionic liquids are composed of large organic cations and anions.
Cations: nitrogen-based organic cations carrying substituents R1–R4 [structure drawings]
Anions: PF₆⁻, Cl⁻, BF₄⁻, CF₃SO₃⁻, [(CF₃SO₂)₂N]⁻
There exist ~10¹⁸ combinations of ions that could lead to useful ionic liquids

Virtual screening: finding the needle in the haystack

CHEMICAL DATABASE: ~10⁶–10⁹ molecules

Chemoinformatics:
pattern recognition in chemistry
CHEMICAL DATABASE (~10⁶–10⁹ molecules) → model
- Specific structural motifs,
- Selected molecular properties (shape, fields, …),
- Interaction patterns,
- Mathematical equations
Activity = F (structure)

Chemoinformatics: Virtual screening “funnel”
CHEMICAL DATABASE: ~10⁶–10⁹ molecules
VIRTUAL SCREENING (ligand-based and structure-based approaches):
• Similarity search
• Filters
• (Q)SAR
• Docking
• Pharmacophore
HITS: ~10¹–10³ molecules; the remaining molecules are discarded as INACTIVES
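
The funnel idea, a sequence of increasingly selective stages applied to a shrinking set of candidates, can be sketched in a few lines. Everything in the example is hypothetical: the record fields (`mw`, `similarity`, `predicted_activity`), the thresholds and the five molecules are invented for illustration, and real stages such as docking would call external software rather than read a precomputed value.

```python
def run_funnel(library, stages):
    """Apply the stages in order and record how many molecules survive each one."""
    survivors = list(library)
    report = []
    for name, keep in stages:
        survivors = [mol for mol in survivors if keep(mol)]
        report.append((name, len(survivors)))
    return survivors, report


# Hypothetical precomputed values for five molecules
library = [
    {"id": "m1", "mw": 320.0, "similarity": 0.82, "predicted_activity": 7.1},
    {"id": "m2", "mw": 710.0, "similarity": 0.90, "predicted_activity": 8.0},
    {"id": "m3", "mw": 290.0, "similarity": 0.39, "predicted_activity": 6.5},
    {"id": "m4", "mw": 410.0, "similarity": 0.72, "predicted_activity": 5.2},
    {"id": "m5", "mw": 450.0, "similarity": 0.67, "predicted_activity": 6.9},
]

# Stage order and thresholds are assumptions made for this sketch
stages = [
    ("filters", lambda mol: mol["mw"] <= 500.0),
    ("similarity search", lambda mol: mol["similarity"] >= 0.6),
    ("(Q)SAR", lambda mol: mol["predicted_activity"] >= 6.0),
]

hits, report = run_funnel(library, stages)
for stage_name, n_left in report:
    print(f"{stage_name}: {n_left} molecules left")
print("hits:", [mol["id"] for mol in hits])
```

Ordering the stages from cheapest to most expensive means the costly methods only see the few molecules that passed everything before them.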

Chemoinformatics as a
theoretical chemistry discipline

Chemoinformatics is defined as a distinct discipline
characterized by its own molecular model, basic concepts,
major applications and learning approach

Theoretical chemistry
Quantum Chemistry
Force Field Molecular Modelling
Chemoinformatics
- Molecular model
- Basic concepts
- Major applications
- Learning approaches

Molecular Model
Quantum Chemistry: electrons and nuclei
Force Field Molecular Modelling: atoms and bonds
Chemoinformatics: molecular graph, descriptor vector
Chemoinformatics is a field based on the representation of molecules as
objects (graphs or vectors) in a chemical space
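
As a rough illustration of the two representations named above, the sketch below stores a molecule as a graph (atoms as nodes, bonds as edges) and derives a small descriptor vector from it. The choice of descriptors (atom count, bond count, element counts, maximum degree) is an assumption made for this example only; hydrogens are left implicit.

```python
from collections import Counter

# Ethanol, heavy atoms only: C-C-O
atoms = ["C", "C", "O"]
bonds = [(0, 1), (1, 2)]


def adjacency(n_atoms, bonds):
    """Neighbour sets of an undirected molecular graph."""
    neighbours = {i: set() for i in range(n_atoms)}
    for a, b in bonds:
        neighbours[a].add(b)
        neighbours[b].add(a)
    return neighbours


def descriptor_vector(atoms, bonds, elements=("C", "N", "O")):
    """Map a molecular graph to a fixed-length numeric vector."""
    adj = adjacency(len(atoms), bonds)
    counts = Counter(atoms)
    degrees = [len(adj[i]) for i in range(len(atoms))]
    return (
        [len(atoms), len(bonds)]
        + [counts[element] for element in elements]
        + [max(degrees)]
    )


print(descriptor_vector(atoms, bonds))
```

The graph keeps the full connectivity, while the vector keeps only what the chosen descriptors measure; that loss of detail is the price of being able to place molecules in a vector space.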

Chemoinformatics: From Data to Knowledge
measurement or calculation → data
data + context → information
information + generalization → knowledge
Learning: inductive and deductive
Chemoinformatics learns from experimental data!

Basic concepts
Quantum Chemistry: wave/particle dualism
Force Field Molecular Modelling: classical mechanics
Chemoinformatics: chemical space

Chemical space paradigm

Chemical Space representations
graph-based, descriptor-based
SPACE = objects + metric
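
"SPACE = objects + metric" can be made concrete with a minimal descriptor-based example: the objects are descriptor vectors and the metric is a distance function between them. The Euclidean distance and the three vectors below are assumptions chosen for illustration; other metrics would define a different chemical space over the same objects.

```python
import math


def euclidean(u, v):
    """Euclidean distance between two descriptor vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("descriptor vectors must have the same length")
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))


# Hypothetical objects: three molecules described by five descriptors each
chemical_space = {
    "mol_A": [3, 2, 2, 0, 1],
    "mol_B": [4, 3, 3, 0, 1],
    "mol_C": [3, 2, 1, 1, 1],
}


def nearest_neighbour(name, space, metric=euclidean):
    """Closest other object in the space under the given metric."""
    others = (other for other in space if other != name)
    return min(others, key=lambda other: metric(space[name], space[other]))


print(nearest_neighbour("mol_A", chemical_space))
```

Passing the metric as a parameter keeps the two ingredients of the space separate, so the same set of molecules can be explored under different notions of distance.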

Graph-based chemical space

A. Schuffenhauer, P. Ertl, et al. J. Chem. Inf. Model., 2007, 47 (1), 47-58
Scaffold Tree

Natural Product Scaffold Tree
Courtesy of P. Ertl

Descriptors-based chemical space
vectorial space defined by molecular descriptors

Case study: Hansch Analysis
3 types of physicochemical parameters are used:

• Electronic ($\sigma$)
• Steric ($\delta E_s$)
• Hydrophobic ($\log P$)
Biological Activity = f (Physicochemical parameters) + constant
$\mathrm{Activity} = a (\log P)^2 + b \log P + \sigma + \delta E_s + \mathrm{const}$
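
A short worked example may help to read the Hansch equation. The sketch below evaluates the expression in the form given on the slide, treating the electronic and steric contributions as single additive terms, and locates the optimum of the parabola in log P, which follows from setting the derivative 2a·log P + b to zero. All coefficient and parameter values are hypothetical.

```python
def hansch_activity(log_p, electronic_term, steric_term, a, b, const):
    """Activity = a*(logP)^2 + b*logP + electronic term + steric term + const."""
    return a * log_p**2 + b * log_p + electronic_term + steric_term + const


def optimal_log_p(a, b):
    """Stationary point of the parabola in logP; a maximum only when a < 0."""
    if a >= 0:
        raise ValueError("a must be negative for the parabola to have a maximum")
    return -b / (2 * a)


# Hypothetical coefficients and parameter values
a, b, const = -0.5, 2.0, 1.0
electronic_term, steric_term = 0.2, -0.5

best = optimal_log_p(a, b)
print("optimal logP:", best)
print("activity at optimum:", hansch_activity(best, electronic_term, steric_term, a, b, const))
```

The negative quadratic coefficient expresses the usual reading of the model: activity rises with hydrophobicity up to an optimum and falls beyond it.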

Case study: Hansch Analysis
Molecule 1
Molecule 2

Molecular Descriptors:

an ensemble of topological, electronic and geometric parameters calculated directly
from the molecular structure
Molecular graph → Descriptors → Descriptor vector (D₁, D₂, …, Dᵢ, …)
- Topological indices,
- Atomic charges,
- Inductive descriptors,
- Substructural fragments,
- Molecular volume and surface, …
> 5000 types of descriptors are reported

Chemography:

Design and visualization of chemical space

Greenland: 2.2 M km²
Australia: 7.7 M km²
Arabian Peninsula: 3.5 M km²
Dimensionality Reduction problem

Swiss Roll
• GTM relates the latent space to a 2D “rubber sheet” (manifold) inserted into
the high-dimensional data space.

• The visualization plot is obtained by projecting the data points onto the manifold
and then letting the “rubber sheet” relax to its original form.
Generative Topographic Mapping (GTM)
N. Kireeva, I. Baskin, H. Gaspar, D. Horvath, G. Marcou, A. Varnek Mol. Inf. 2012, 31, 301–312

GTM of a dataset containing 10 activities from DUD
Similarity principle:
similar molecules possess similar properties

Chemical Similarity
[Figure: compounds with their similarity to the reference compound: 0.82, 0.39, 0.84, 0.72, 0.67, 0.64, 0.53, 0.56, 0.52]
Similar compounds possess similar properties
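
Similarity values like those in the figure are commonly computed with the Tanimoto coefficient, the number of shared features divided by the number of features present in either molecule. The slide does not say which measure or which fingerprints were used, so the sketch below is only one plausible instance; the feature labels are hypothetical.

```python
def tanimoto(a: set, b: set) -> float:
    """Tanimoto coefficient: shared features / features in either set."""
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(union)


# Hypothetical structural features of a reference compound and a small library
reference = {"aromatic_ring", "carboxyl", "amine", "ethyl"}
library = {
    "cand_1": {"aromatic_ring", "carboxyl", "amine", "propyl"},
    "cand_2": {"aromatic_ring", "hydroxyl"},
    "cand_3": {"thiol", "chloro"},
}

# Similarity search: rank the library by similarity to the reference
ranked = sorted(
    library,
    key=lambda name: tanimoto(reference, library[name]),
    reverse=True,
)
for name in ranked:
    print(name, round(tanimoto(reference, library[name]), 2))
```

Under the similarity principle, the compounds at the top of such a ranking are the ones expected to share the properties of the reference.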

Chemical space representation: Activity Landscapes 

$A_k = \dfrac{\sum_i A_i R_{ik}}{\sum_i R_{ik}}$
Expectation of activity in node k for the training set
Example: logK of Lu³⁺L complexes
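
The expectation of activity in node k is a weighted average of the training activities A_i with weights R_ik. In GTM these weights are usually the responsibilities of node k for molecule i; that reading is assumed here, since the slide does not define R. The activities and the 3 × 2 responsibility matrix below are invented for illustration.

```python
def node_activities(activities, responsibilities):
    """A_k = sum_i(A_i * R_ik) / sum_i(R_ik) for every node k.

    responsibilities[i][k] is the weight of node k for molecule i.
    Nodes with zero total weight get None (no training data nearby).
    """
    n_nodes = len(responsibilities[0])
    landscape = []
    for k in range(n_nodes):
        weight = sum(row[k] for row in responsibilities)
        if weight == 0.0:
            landscape.append(None)
            continue
        weighted = sum(a * row[k] for a, row in zip(activities, responsibilities))
        landscape.append(weighted / weight)
    return landscape


# Hypothetical training set: three molecules, two map nodes
activities = [8.0, 6.0, 4.0]
responsibilities = [
    [0.9, 0.1],
    [0.5, 0.5],
    [0.0, 1.0],
]

print(node_activities(activities, responsibilities))
```

Colouring each node of the map by its A_k value gives the activity landscape.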

Activity landscape of lanthanide binders
Generative Topographic Mapping
of the set of Ln binders
Strong binders / Weak binders

Contours correspond to different
logK values
H. Gaspar, I. Baskin, G. Marcou, A. Varnek, unpublished results

Biopharmaceutics Drug Disposition Classification System
DATASET: 893 drugs
DESCRIPTORS: VolSurf
Case study: classification models for BDDCS classes
Visualization of models’ Applicability Domain

BDDCS classes probability distribution
Panels: CPF ≤ 1, coverage = 100 %; CPF ≤ 5, coverage = 47 %
Colored zones on the maps correspond to the model’s applicability domain
H. Gaspar, G. Marcou, A. Varnek JCIM, 2013
Class Preference Factor: CPF = max P(…|…) / P(…|…), ∀ … ≠ …

Chemoinformatics:
Property prediction

Quantitative Structure-Activity Relationships
(QSAR)
Activity = F (structure)
= F (descriptors)
machine-learning methods
•neural networks, support vector machine,
random forest, naïve Bayes, PLS, …

A. Varnek & I. Baskin Machine Learning Methods in Chemoinformatics: Quo Vadis?
J. Chem. Inf. Model. 2012, 52, 1413−1437
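
To make "Activity = F (descriptors)" concrete, the sketch below uses k-nearest-neighbour regression, a simple method that is not among those listed on the slide and is swapped in here only because it fits in a few lines. The predicted activity of a query molecule is the mean activity of its k closest training molecules in descriptor space. The descriptor values and activities are hypothetical.

```python
import math


def knn_predict(query, training_set, k=2):
    """Mean activity of the k training molecules closest to the query."""
    if k < 1 or k > len(training_set):
        raise ValueError("k must be between 1 and the training set size")
    by_distance = sorted(
        training_set,
        key=lambda item: math.dist(query, item[0]),
    )
    nearest = by_distance[:k]
    return sum(activity for _, activity in nearest) / k


# Hypothetical training set: (descriptor vector, measured activity)
training_set = [
    ([1.0, 0.0], 5.0),
    ([1.2, 0.1], 5.4),
    ([3.0, 1.0], 7.0),
    ([3.1, 0.9], 7.2),
]

print(knn_predict([1.1, 0.0], training_set, k=2))
```

A real QSAR study would also validate the model on compounds not used for training before applying it to screening.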

Chemoinformatics tools in SciFinder:
predictions of > 20 physico-chemical
properties and NMR spectra for
each individual compound

ISIDA virtual screening platform
infochim.u-strasbg.fr/webserv/VSEngine.html

Chemoinformatics:
virtual screening in 3D

Virtual screening: finding the needle in the haystack

CHEMICAL DATABASE: ~10⁶–10⁹ molecules

What do these two molecules have in common?

Arg-Gly-Asp-Phe
Tirofiban
[Structures with positively (+) and negatively (−) charged groups marked]

Pharmacophore model of a ligand complementary to
integrin αIIbβ3

• Positive charge, H-donor (+)
• Negative charge, H-acceptor (−)
• Hydrophobic interactions
Distances marked on the model: 15.5 Å and 5 Å

Molecular Shape similarity analysis
pKi = 7.51
TanimotoCombo = 0.74
pKi = 7.82
TanimotoCombo = 0.67
pKi = 7.82

Molecular fields

Ligand-to-protein docking:
Lock-and-key paradigm (Hermann Emil Fischer)
Lock + Key → Ligand-Protein complex

Selected in silico designed compounds that were synthesized
and successfully tested for bioactivity
G. Schneider J Comput Aided Mol Des (2012) 26:115–120

Chemoinformatics:
areas of application
-Drug design (pharmacodynamics and pharmacokinetics),
-Prediction of physico-chemical properties,
-Materials design,
-Synthesis design,
-Molecular spectra simulations

Chemoinformatics:
perspectives

Assessment of biological activity

Assessment of side effects

See review by D. Rognan, British Journal of Pharmacology (2007), 1–15

Chemoinformatics: Complexity challenge
P. Csermely et al. Pharmacology & Therapeutics, 2012

Day 1: Databases
Veli-Pekka Hyttinen
Timur Madzidov
Gilles Marcou Dragos Horvath
Chemical Databases: Encoding, Storage and Search
of Chemical Structures
SciFinder - The choice for chemistry research
Tutorial with ChemAxon

Day 2: QSAR
Igor Tetko
Igor Baskin
Obtaining, Validation and Application of SAR/QSAR Models
SAR/QSAR Modelling: state of the art
Tutorial with OChem
Alex Tropsha
ADMET Predictions

Day 3: virtual screening in 3D
Conformational Sampling
Pharmacophore and Its Applications
Tutorial with LigandScout
Molecular Docking Methods
Gilles Marcou
Dragos Horvath
Thierry Langer
Sharon Bryant
Gilles Marcou Dragos Horvath
Tutorial with LeadIt

Day 4: Drug Design applications
Konstantin Balakin
Vladimir Poroikov
Computational Mapping Tools for Drug Discovery
Drug Design & Discovery in Academia

staff
Invited Professors
at UniStra
Invited Lecturers
Visiting scientists
Visiting friends