ChEMBL and KNIME provide an ideal match of open data with open tools. This is a quick overview of how to access ChEMBL data resources and web services (ChEMBL, UniChem, Beaker, myChEMBL, SureChEMBL) via the KNIME platform.
Outline
• ChEMBL data
• ChEMBL nodes
• Web services v2.0
• UniChem
• Cheminformatics utilities
• myChEMBL
• SureChEMBL and Open PHACTS
Bioactivity data
Compound
Assay/Target
>Thrombin
MAHVRGLQLPGCLALAALCSLVHSQHVFLAPQQARSLLQRVRRANTFLEEVRKGNLE
RECVEETCSYEEAFEALESSTATDVFWAKYTACETARTPRDKLAACLEGNCAEGLGT
NYRGHVNITRSGIECQLWRSRYPHKPEINSTTHPGADLQENFCRNPDSSTTGPWCYT
TDPTVRRQECSIPVCGQDQVTVAMTPRSEGSSVNLSPPLEQCVPDRGQQYQGRLAVT
THGLPCLAWASAQAKALSKHQDFNSAVQLVENFCRNPDGDEEGVWCYVAGKPGDFGY
CDLNYCEEAVEEETGDGLDEDSDRAIEGRTATSEYQTFFNPRTFGSGEADCGLRPLF
EKKSLEDKTERELLESYIDGRIVEGSDAEIGMSPWQVMLFRKSPQELLCGASLISDR
WVLTAAHCLLYPPWDKNFTENDLLVRIGKHSRTRYERNIEKISMLEKIYIHPRYNWR
ENLDRDIALMKLKKPVAFSDYIHPVCLPDRETAASLLQAGYKGRVTGWGNLKETWTA
NVGKGQPSVLQVVNLPIVERPVCKDSTRIRITDNMFCAGYKPDEGKRGDACEGDSGG
PFVMKSPFNNRWYQMGIVSWGEGCDRDGKYGFYTHVFRLKKWIQKVIDQFGE
3. Insight, tools and resources for translational drug discovery
2. Organization, integration, curation and standardization of pharmacology data
1. Scientific facts
Ki = 4.5 nM
APTT = 11 min.
ChEMBL: Data for drug discovery
KNIME at the EBI
• Access ChEBI and ChEMBL databases via KNIME nodes
• Trusted community nodes
• Algorithms development
• Document classification
• Share example workflows and use cases
• Provide KNIME training to scientists and researchers
• Wellcome Trust drug discovery courses, EMBL courses
• CDK community nodes development
http://tech.knime.org/book/embl-ebi-nodes
ChEMBL nodes
ChEMBL KNIME nodes
Example: All bioactivities for hERG
All bioactivities for hERG
Activity value, assay description, compound, reference
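Outside KNIME, the same hERG retrieval can be sketched directly against the ChEMBL web services. This is a minimal sketch, not the implementation of the KNIME nodes: it assumes the `activity` resource of the ChEMBL data API filtered by `target_chembl_id`, uses CHEMBL240 as the hERG target identifier, and the function names are illustrative.

```python
import json
import urllib.request
from urllib.parse import urlencode

CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"
HERG_TARGET_ID = "CHEMBL240"  # ChEMBL target identifier used here for hERG


def activity_url(target_chembl_id, limit=100, offset=0):
    """Build the URL for one page of activities recorded against a target."""
    query = urlencode({
        "target_chembl_id": target_chembl_id,
        "limit": limit,
        "offset": offset,
    })
    return f"{CHEMBL_API}/activity.json?{query}"


def fetch_json(url, timeout=30):
    """Download and decode one JSON page (needs network access)."""
    with urllib.request.urlopen(url, timeout=timeout) as resp:
        return json.load(resp)


def summarise_activities(payload):
    """Keep the four items named on the slide for each activity record."""
    rows = []
    for act in payload.get("activities", []):
        rows.append({
            "compound": act.get("molecule_chembl_id"),
            "activity": (
                act.get("standard_type"),
                act.get("standard_value"),
                act.get("standard_units"),
            ),
            "assay_description": act.get("assay_description"),
            "reference": act.get("document_chembl_id"),
        })
    return rows
```

A call such as `summarise_activities(fetch_json(activity_url(HERG_TARGET_ID)))` would return the first page only; paging through all results means increasing `offset` until a page comes back empty.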
Example: Compound searching in ChEMBL
Query
List of nearest neighbours (NNs)
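The query-to-neighbours step can be sketched against the ChEMBL similarity search service. This is a minimal sketch under assumptions: it uses the `similarity/<SMILES>/<threshold>` resource of the ChEMBL data API, assumes each returned molecule carries a `similarity` score, and the helper names are illustrative. Whether the service accepts every percent-encoded SMILES in the URL path is not verified here.

```python
from urllib.parse import quote

CHEMBL_API = "https://www.ebi.ac.uk/chembl/api/data"


def similarity_url(smiles, threshold=80):
    """Build a similarity-search URL for a query SMILES and a % threshold."""
    # Encode every reserved character so that '/', '#', '=' and
    # parentheses in the SMILES cannot be read as URL syntax.
    encoded = quote(smiles, safe="")
    return f"{CHEMBL_API}/similarity/{encoded}/{int(threshold)}?format=json"


def nearest_neighbours(payload, top_n=10):
    """Rank the returned molecules by similarity, most similar first."""
    hits = [
        (mol["molecule_chembl_id"], float(mol["similarity"]))
        for mol in payload.get("molecules", [])
    ]
    hits.sort(key=lambda hit: hit[1], reverse=True)
    return hits[:top_n]
```

The JSON page fetched from `similarity_url(...)` is passed to `nearest_neighbours`, which returns `(ChEMBL ID, similarity)` pairs in the order a nearest-neighbour list would show them.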
Novelty checking with UniChem
https://www.ebi.ac.uk/unichem/
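A novelty check of this kind reduces to asking UniChem whether a structure's InChIKey is already known to any source database. The sketch below rests on assumptions: the `rest/inchikey/<InChIKey>` path is taken from the legacy UniChem REST interface and may differ in current releases, the lookup is assumed to return a list of records with a `src_id` field, and an empty list is taken to mean "not found". The function names are illustrative.

```python
from urllib.parse import quote

# Assumed legacy REST root; check the UniChem documentation before relying on it.
UNICHEM_REST = "https://www.ebi.ac.uk/unichem/rest"


def inchikey_lookup_url(inchikey):
    """Build the lookup URL for a standard InChIKey."""
    return f"{UNICHEM_REST}/inchikey/{quote(inchikey, safe='')}"


def novelty_report(records):
    """Summarise a lookup result: novel if no source lists the structure."""
    source_ids = sorted({str(rec["src_id"]) for rec in records})
    return {"novel": not source_ids, "source_ids": source_ids}
```

A compound is only "novel" here with respect to the sources UniChem indexes; the check says nothing about structures held in other collections.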
Cheminformatics utilities
Cheminformatics utilities (aka ‘Beaker’)
• Chemical format conversions
• Dynamic image generation
• Image processing (via OSRA)
• Descriptors and property calculations
• Chemical modifications and standardization
https://www.ebi.ac.uk/chembl/api/utils/docs
Example: Image to Structure
image URL
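The image-to-structure example (image URL in, structure out) can be sketched as two HTTP calls. This is a sketch under assumptions, not the workflow from the slide: it assumes the Beaker `image2ctab` method accepts the raw image bytes as a POST body and returns a molfile-style connection table as text, and the function names are illustrative.

```python
import urllib.request

BEAKER_ROOT = "https://www.ebi.ac.uk/chembl/api/utils"


def image2ctab_request(image_bytes):
    """Prepare (but do not send) a POST of raw image bytes to image2ctab."""
    if not image_bytes:
        raise ValueError("image_bytes must not be empty")
    return urllib.request.Request(
        f"{BEAKER_ROOT}/image2ctab",
        data=image_bytes,
        method="POST",
    )


def image_to_structure(image_url, timeout=30):
    """Download an image, then ask Beaker to recognise the structure in it."""
    with urllib.request.urlopen(image_url, timeout=timeout) as resp:
        image_bytes = resp.read()
    request = image2ctab_request(image_bytes)
    with urllib.request.urlopen(request, timeout=timeout) as resp:
        return resp.read().decode("utf-8")
```

Only `image_to_structure` touches the network; keeping request construction separate makes the POST easy to inspect before anything is sent.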
myChEMBL integration
Accessing local data with myChEMBL
Using KNIME to connect to myChEMBL
SELECT mr.*, md.chembl_id,
       cp.full_mwt, cp.alogp
FROM mols_rdkit mr,
     molecule_dictionary md,
     compound_properties cp
WHERE mr.m @> '$${SMolecule}$$'::qmol
  AND mr.molregno = md.molregno
  AND md.molregno = cp.molregno;
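In the KNIME query above, the `$${SMolecule}$$` flow variable is substituted into the SQL text before it is sent to myChEMBL. When running the same substructure search from a script, a bound parameter is the safer equivalent. The sketch below is an assumption-laden illustration: it targets a DB-API driver with `%s` placeholders (psycopg2 is one such driver), and `build_substructure_query` is a hypothetical helper, not part of myChEMBL.

```python
SUBSTRUCTURE_SQL = """
SELECT mr.*, md.chembl_id, cp.full_mwt, cp.alogp
FROM mols_rdkit mr,
     molecule_dictionary md,
     compound_properties cp
WHERE mr.m @> %s::qmol
  AND mr.molregno = md.molregno
  AND md.molregno = cp.molregno
"""


def build_substructure_query(query_pattern):
    """Return (sql, params) for a substructure search via the RDKit cartridge.

    The pattern is passed as a bound parameter rather than pasted into the
    SQL string, so quotes in the pattern cannot alter the statement.
    """
    pattern = query_pattern.strip()
    if not pattern:
        raise ValueError("query_pattern must not be empty")
    return SUBSTRUCTURE_SQL, (pattern,)
```

With an open cursor, `cursor.execute(*build_substructure_query("c1ccccc1"))` would run the search; the connection settings for a local myChEMBL instance are not shown because they depend on the installation.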
SureChEMBL and Open PHACTS
SureChEMBL
SciBite Termite
Open PHACTS API
https://dev.openphacts.org/docs/develop
https://github.com/openphacts/OPS-Knime/
http://rdf.ebi.ac.uk/resource/surechembl/patent/US-8877786-B2
Substituted carbamoylmethylamino acetic acid derivatives as novel NEP inhibitors
US-8877786-B2
Most relevant targets and diseases
MCS scaffold
Most relevant diseases
Most relevant targets
Patent publication date histogram
http://rdf.ebi.ac.uk/resource/surechembl/molecule/SCHEMBL371804
Foretinib, a kinase inhibitor in clinical phase II
Found in 89 EP, WO and US patents
Summary
• KNIME: democratizes access to data and tools
• Access public domain structure and bioactivity data and services with KNIME
• ChEMBL KNIME Nodes
• UniChem
• Cheminformatics services
• myChEMBL
• SureChEMBL
Publications
Acknowledgements
• Francis Atkinson
• Louisa Bellis
• Jon Chambers
• Michał Nowotka
• Anne Hersey
• Stefan Beisken
• Edmund Duesbury
• Daniela Digles
• Thorsten Meinl
• KNIME
• KNIME community
All workflow examples are available on request.