ChEMBL+KNIME

gpapadatos 1,507 views 30 slides Oct 08, 2015

About This Presentation

ChEMBL and KNIME provide an ideal match of open data with open tools. This is a quick overview of how to access ChEMBL data resources and web services (ChEMBL, UniChem, Beaker, myChEMBL, SureChEMBL) via the KNIME platform.


Slide Content

ChEMBL resources and KNIME




George Papadatos
[email protected]

Outline
• ChEMBL data
• ChEMBL nodes
• Web services v2.0
• UniChem
• Cheminformatics utilities
• myChEMBL
• SureChEMBL and Open PHACTS

Bioactivity data
Compound
Assay/Target
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE
RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT
NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT
TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT
THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY
CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF
EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR
WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR
ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA
NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG
PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
3. Insight, tools and resources for translational drug discovery
2. Organization, integration, curation and standardization of pharmacology data
1. Scientific facts
Ki = 4.5 nM
APTT = 11 min.
ChEMBL: Data for drug discovery

KNIME at the EBI
• Access ChEBI and ChEMBL databases via KNIME nodes
• Trusted community nodes
• Algorithms development
• Document classification
• Share example workflows and use cases
• Provide KNIME training to scientists and researchers
• Wellcome Trust drug discovery courses, EMBL courses
• CDK community nodes development
http://tech.knime.org/book/embl-ebi-nodes

ChEMBL nodes

ChEMBL KNIME nodes

Example: All bioactivities for hERG
All bioactivities for hERG
Activity value, assay description, compound, reference
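Outside KNIME, the same retrieval can be sketched against the ChEMBL web services. This is a minimal sketch, not the workflow from the slides: the target ID CHEMBL240 for hERG and the helper names `activity_url` and `activity_fields` are assumptions made for illustration, and no request is sent here.

```python
from urllib.parse import urlencode

CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"

# Assumption: CHEMBL240 is the ChEMBL target ID for hERG; verify before use.
HERG = "CHEMBL240"


def activity_url(target_chembl_id, limit=100, offset=0):
    """Build the URL for one page of activities recorded against a target."""
    query = urlencode({
        "target_chembl_id": target_chembl_id,
        "limit": limit,
        "offset": offset,
    })
    return f"{CHEMBL_API}/activity.json?{query}"


def activity_fields(record):
    """Keep the columns named on the slide: value, assay, compound, reference."""
    return {
        "compound": record.get("molecule_chembl_id"),
        "type": record.get("standard_type"),
        "value": record.get("standard_value"),
        "units": record.get("standard_units"),
        "assay": record.get("assay_description"),
        "reference": record.get("document_chembl_id"),
    }


print(activity_url(HERG, limit=5))
```

Fetching the printed URL and passing each returned record through `activity_fields` would give a table comparable to the node output.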

Example: Compound searching in ChEMBL
Query
List of NNs
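The query-to-nearest-neighbours step could look roughly like this when done directly against the ChEMBL similarity endpoint. The helper names are hypothetical, the aspirin SMILES is only an example query, and the assumed response shape (a `molecules` list whose entries carry a `similarity` score) should be checked against a live response.

```python
from urllib.parse import quote

CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"


def similarity_url(smiles, threshold=70):
    """Build a similarity-search URL; the SMILES must be percent-encoded."""
    return f"{CHEMBL_API}/similarity/{quote(smiles, safe='')}/{threshold}.json"


def nearest_neighbours(payload, k=10):
    """Return (molecule ID, similarity) pairs for the k most similar hits.

    Assumes each hit has 'molecule_chembl_id' and a numeric-looking
    'similarity' field.
    """
    hits = payload.get("molecules", [])
    ranked = sorted(hits, key=lambda hit: float(hit["similarity"]), reverse=True)
    return [(hit["molecule_chembl_id"], float(hit["similarity"])) for hit in ranked[:k]]


print(similarity_url("CC(=O)Oc1ccccc1C(=O)O", threshold=80))
```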

Example: Polypharmacology profile
Compounds
Query
Find NNs
Retrieve bioactivities
Filter, summarise & pivot
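The final "filter, summarise & pivot" step can be illustrated in plain Python. This is a sketch under assumptions: the pChEMBL cutoff of 5.0 and the use of the median as the summary are illustrative choices, not taken from the slides, and the record keys follow the ChEMBL activity fields `molecule_chembl_id`, `target_chembl_id` and `pchembl_value`.

```python
from collections import defaultdict
from statistics import median


def polypharmacology_profile(activities, min_pchembl=5.0):
    """Pivot activity records into {compound: {target: median pChEMBL}}.

    Records without a pChEMBL value, or below the cutoff, are dropped.
    """
    cells = defaultdict(list)
    for act in activities:
        raw = act.get("pchembl_value")
        if raw is None:
            continue
        value = float(raw)
        if value >= min_pchembl:
            key = (act["molecule_chembl_id"], act["target_chembl_id"])
            cells[key].append(value)

    table = defaultdict(dict)
    for (compound, target), values in cells.items():
        table[compound][target] = median(values)
    return {compound: row for compound, row in table.items()}


# Toy records with made-up IDs, only to show the shape of the result.
demo = [
    {"molecule_chembl_id": "MOL_A", "target_chembl_id": "TGT_1", "pchembl_value": "6.0"},
    {"molecule_chembl_id": "MOL_A", "target_chembl_id": "TGT_1", "pchembl_value": "7.0"},
    {"molecule_chembl_id": "MOL_A", "target_chembl_id": "TGT_2", "pchembl_value": "4.0"},
    {"molecule_chembl_id": "MOL_B", "target_chembl_id": "TGT_2", "pchembl_value": "8.0"},
    {"molecule_chembl_id": "MOL_B", "target_chembl_id": "TGT_1", "pchembl_value": None},
]
print(polypharmacology_profile(demo))
```

Each row of the result is one compound and each key within it one target, which mirrors the pivoted table a KNIME Pivoting node would produce.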

Web services v2.0
• Many more entities → granularity
• Pagination, filtering, ordering
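Pagination, filtering and ordering can be combined in a single request loop. The sketch below assumes Django-style filter suffixes such as `pchembl_value__gte`, an `order_by` parameter, and a `page_meta.next` link in each response; the function names are hypothetical, and the fetcher is injectable so the loop can be tried without network access.

```python
import json
from urllib.parse import urlencode, urljoin
from urllib.request import urlopen

BASE = "https://www.ebi.ac.uk"


def first_page_url(resource, filters=None, order_by=None, limit=20):
    """Build the first-page URL with optional filters and ordering."""
    params = dict(filters or {})
    if order_by:
        params["order_by"] = order_by
    params["limit"] = limit
    return f"{BASE}/chembl/api/data/{resource}.json?{urlencode(params)}"


def fetch_json(url):
    """Default fetcher: GET the URL and decode the JSON body."""
    with urlopen(url) as response:
        return json.load(response)


def iter_records(url, list_key, fetch=fetch_json):
    """Yield records page by page, following page_meta.next until it is empty."""
    while url:
        page = fetch(url)
        yield from page[list_key]
        next_link = page["page_meta"]["next"]
        # The next link may be relative, so resolve it against the host.
        url = urljoin(BASE, next_link) if next_link else None


print(first_page_url(
    "activity",
    filters={"target_chembl_id": "CHEMBL240", "pchembl_value__gte": 6},
    order_by="-pchembl_value",
    limit=2,
))
```

A call such as `iter_records(url, "activities")` would then stream every matching record, at the cost of one HTTP request per page.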

UniChem integration

EMBL-EBI chemistry resources
RDF and REST API interfaces
REST API interface: https://www.ebi.ac.uk/unichem/
• Atlas: ligand-induced transcript response (750)
• PDBe: ligand structures from structurally defined protein complexes (15K)
• ChEBI: nomenclature of primary and secondary metabolites; chemical ontology (24K)
• SureChEMBL: chemical structures from patent literature (~17M)
• ChEMBL: bioactivity data from literature and depositions (1.5M)
• 3rd party data: ZINC, PubChem, ThomsonPharma DOTF, IUPHAR, DrugBank, KEGG, NIH NCC, eMolecules, FDA SRS, PharmGKB, Selleck, … (~70M)
UniChem: InChI-based chemical resolver (full + relaxed ‘lenses’), >90M

Novelty checking with UniChem
https://www.ebi.ac.uk/unichem/
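A novelty check reduces to asking whether UniChem knows the structure's InChIKey in any source. The sketch below is an assumption-laden illustration: the `rest/inchikey/` path reflects the UniChem REST interface as it existed around the time of this talk and may have changed since, the response is assumed to be a list of `src_id`/`src_compound_id` pairs, and both function names are hypothetical.

```python
import re

UNICHEM = "https://www.ebi.ac.uk/unichem"

# Standard InChIKey layout: 14 letters, 10 letters, 1 letter.
INCHIKEY_PATTERN = re.compile(r"^[A-Z]{14}-[A-Z]{10}-[A-Z]$")


def unichem_inchikey_url(inchikey):
    """Build a lookup URL (assumed legacy path) for one standard InChIKey."""
    if not INCHIKEY_PATTERN.match(inchikey):
        raise ValueError(f"not a standard InChIKey: {inchikey!r}")
    return f"{UNICHEM}/rest/inchikey/{inchikey}"


def is_novel(hits, ignore_sources=()):
    """A structure counts as novel if no source outside ignore_sources lists it.

    `hits` is the decoded response, assumed to be a list of dicts with a
    'src_id' key.
    """
    remaining = [hit for hit in hits if hit["src_id"] not in ignore_sources]
    return not remaining


# Aspirin's InChIKey, used only as an example query.
print(unichem_inchikey_url("BSYNRYMUTXBXSQ-UHFFFAOYSA-N"))
```

Passing source IDs in `ignore_sources` lets a user ask, for example, whether a compound is absent from every source except their own collection.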

Cheminformatics utilities

Cheminformatics utilities (aka ‘Beaker’)
• Chemical format conversions
• Dynamic image generation
• Image processing (via OSRA)
• Descriptors and property calculations
• Chemical modifications and standardization

https://www.ebi.ac.uk/chembl/api/utils/docs
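As a rough illustration of how a format-conversion call might be addressed, the sketch below builds a GET URL for the utilities service. Both the `smiles2ctab` method name and the convention of passing the argument as URL-safe base64 are assumptions recalled from the service documentation linked above; confirm them there before relying on this. `beaker_get_url` is a hypothetical helper.

```python
import base64

BEAKER = "https://www.ebi.ac.uk/chembl/api/utils"


def beaker_get_url(method, argument):
    """Build a GET URL with the argument encoded as URL-safe base64.

    Assumption: the service expects GET arguments in this encoding.
    """
    token = base64.urlsafe_b64encode(argument.encode("utf-8")).decode("ascii")
    return f"{BEAKER}/{method}/{token}"


# Example: request a molfile (CTAB) for aspirin from its SMILES.
url = beaker_get_url("smiles2ctab", "CC(=O)Oc1ccccc1C(=O)O")
print(url)
```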

Example: Image to Structure
image URL

myChEMBL integration

Accessing local data with myChEMBL

Using KNIME to connect to myChEMBL
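-- Substructure search against the RDKit cartridge in myChEMBL.
-- $${SMolecule}$$ is a KNIME flow variable holding the query structure.
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr
JOIN molecule_dictionary md ON mr.molregno = md.molregno
JOIN compound_properties cp ON md.molregno = cp.molregno
WHERE mr.m @> '$${SMolecule}$$'::qmol;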
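Outside KNIME, the flow-variable substitution would normally be replaced by a bound parameter. The sketch below only prepares the statement and its parameters; it assumes a DB-API driver that uses `%s` placeholders (psycopg2 is one such driver), and `substructure_query` is a hypothetical helper. No database connection is opened.

```python
# Same query as in the KNIME database node, with the flow variable replaced
# by a DB-API placeholder so the driver handles quoting.
SUBSTRUCTURE_SQL = """
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr
JOIN molecule_dictionary md ON mr.molregno = md.molregno
JOIN compound_properties cp ON md.molregno = cp.molregno
WHERE mr.m @> %s::qmol
"""


def substructure_query(pattern):
    """Return (sql, params) ready for cursor.execute(sql, params)."""
    if not pattern or not pattern.strip():
        raise ValueError("a query structure is required")
    return SUBSTRUCTURE_SQL.strip(), (pattern.strip(),)


sql, params = substructure_query("c1ccccc1C(=O)N")
print(sql)
print(params)
```

With an open cursor on a myChEMBL database, `cursor.execute(sql, params)` would run the search; binding the pattern avoids the quoting problems that plain string substitution can cause.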

SureChEMBL and Open PHACTS

SureChEMBL and Open PHACTS
SureChEMBL
SciBite Termite
Open PHACTS API
https://dev.openphacts.org/docs/develop
https://github.com/openphacts/OPS-Knime/

http://rdf.ebi.ac.uk/resource/surechembl/patent/US-8877786-B2
Substituted carbamoylmethylamino acetic acid derivatives as novel NEP inhibitors
US-8877786-B2
Most relevant targets and diseases
MCS scaffold

Most relevant diseases
Most relevant targets
Patent publication date histogram
http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL371804
Foretinib, a kinase inhibitor in clinical phase II
Found in 89 EP, WO and US patents

Summary
• KNIME: democratizes access to data and tools
• Access public domain structure and bioactivity data and services with KNIME
• ChEMBL KNIME Nodes
• UniChem
• Cheminformatics services
• myChEMBL
• SureChEMBL

Publications

Acknowledgements
• Francis Atkinson
• Louisa Bellis
• Jon Chambers
• Michał Nowotka
• Anne Hersey
• Stefan Beisken
• Edmund Duesbury
• Daniela Digles
• Thorsten Meinl
• KNIME
• KNIME community
All workflow examples are available on request.
