Biomedical Knowledge Graphs for Data Scientists and Bioinformaticians

neo4j 316 views 44 slides Jun 17, 2024

About This Presentation

Dmitrii Kamaev, PhD
Senior Product Owner - QIAGEN


Slide Content

Biomedical KB-HD/AI:
Biomedical Knowledge Graph
for Data Scientists
and Bioinformaticians
Dmitrii Kamaev PhD

Legal disclaimer
QIAGEN products shown here are intended for molecular biology applications. These products are not intended for the diagnosis, prevention or treatment of a disease.
For up-to-date licensing information and product-specific disclaimers, see the respective QIAGEN kit instructions for use or user operator manual.
QIAGEN instructions for use and user manuals are available at www.qiagen.com or can be requested from QIAGEN Technical Services (or your local distributor).

QIAGEN Digital Insights (QDI)
Leading provider of genomic and clinical knowledge, analysis and interpretation tools and services for scientists and clinicians
Powered by the acquisition of:
…one of 3 Business Units within QIAGEN
3,000,000

QIAGEN Discovery Insights: leading provider of expert-curated knowledge
June 14, 2024
Curated research findings
Highlight pathways, map networks, discover mechanisms of action
Curated ‘omics data
Search across diseases and tissues, find comparisons, identify biomarkers
Curated gene variants
Somatic or germline compendiums, observed clinical case distribution

Applications
Quickly and efficiently generate novel, high-quality discoveries through highly flexible data analysis and exploration
Primary application categories:
•Analytics-driven drug discovery: Combine our leading data with your innovative analysis approaches and a wide range of advanced algorithms developed by the industry to power analytics and AI-driven drug discovery
•Build applications: Use the data within your own analysis and data-exploration applications
•Integrate: Integrate the data with other data types and sources, as well as third-party technologies. Can act as a foundational data model.
Most popular applications:
•Biomedical knowledge graph construction and analysis
•Analytics and AI-driven target identification and drug repositioning
•Target, disease and drug intelligence portals
•Disease subtype and biomarker identification based on functional features

QIAGEN Biomedical Knowledge Base
Break knowledge silos to power R&D with data science
Biomedical KB-HD (human-derived)
•Manually curated by expert scientists
•Contains over 24 million biomedical relationships
Biomedical KB-AI (generative AI-derived)
•Curated through advanced AI processes
•Boasts 600 million+ biomedical relationships
•Quarterly updates
•Available as flat files, knowledge graphs, APIs
•FAIR friendly
•Foundational data model that can scale
Saving time and facilitating research with comprehensive databases.

Many ways to access QIAGEN-curated relationships
94,000 diseases, 17,000 drugs, 51,000 functions, 49,000 chemicals, 20 M research findings
•Downloadable flat files
•Python, R, and REST APIs
•Causal analysis and export functions
•Neo4j and SQL database imports
PubMed, TargetScan, BioGRID, UMLS, SnoMed, MeSH, FDA, ClinVar, ClinicalTrials.gov, DrugBank

Tabular representation vs. graph representation
Relationships
Entity metadata

Tabular representation
Tabular representation makes many queries complex to write
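
To illustrate the point, here is a minimal Python sketch of the same multi-hop question asked two ways. The edge rows, entity names, and function names are hypothetical and not taken from the QIAGEN schema. Over a flat edge table every extra hop is another self-join, while over an adjacency map it is one more traversal step.

```python
from collections import defaultdict

# Hypothetical edge table: (source, relationship, target) rows.
EDGES = [
    ("drugA", "inhibits", "gene1"),
    ("gene1", "activates", "gene2"),
    ("gene2", "causes", "diseaseX"),
    ("drugB", "inhibits", "gene3"),
]

def two_hop_tabular(edges, start):
    """Two-hop targets via an explicit self-join over the edge table."""
    return sorted({t2
                   for (s1, _, t1) in edges if s1 == start
                   for (s2, _, t2) in edges if s2 == t1})

def build_adjacency(edges):
    """Index the edge table once so traversal does not rescan it."""
    adj = defaultdict(set)
    for source, _, target in edges:
        adj[source].add(target)
    return adj

def n_hop_graph(adj, start, hops):
    """Nodes reached after exactly `hops` traversal steps."""
    frontier = {start}
    for _ in range(hops):
        frontier = {t for s in frontier for t in adj.get(s, ())}
    return sorted(frontier)
```

Adding a third hop to `two_hop_tabular` means writing a third nested join, whereas `n_hop_graph` only needs a different `hops` argument.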

Schema: simple representation

Real schema and real data

Knowledge Graph Schema Design
Simplicity: better performance, increases adoption
Completeness: supports diverse user needs, scientific thoroughness

Design Choices: Gene Representation

Design Choices: non-directional relationships as a single relationship
Protein-protein interaction has no directionality. How should we represent it?
Storing it as a single relationship avoids the problem of deduplicating relationships afterwards.
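
One way to get a single relationship per undirected interaction is to canonicalize the endpoint order at load time, sketched below in Python. This is an illustration under assumptions, not the presenter's loader: the function names and the `interacts_with` type are hypothetical, and the gene symbols are only example inputs. In Neo4j every stored relationship has a direction, but a pattern can be matched without one, so a single stored edge is enough.

```python
def canonical_edge(a, b, rel_type="interacts_with"):
    """Return one key for an undirected edge, whatever order the endpoints arrive in."""
    first, second = sorted((a, b))
    return (first, rel_type, second)

def load_undirected(pairs):
    """Store each undirected interaction once, even if sources report both orders."""
    return {canonical_edge(a, b) for a, b in pairs}

def neighbors(edges, node):
    """Look up partners while ignoring the stored direction."""
    return sorted({b if a == node else a
                   for a, _, b in edges
                   if node in (a, b)})
```

Because duplicates never enter the store, no deduplication pass is needed after import.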

Design Choices: Clinical Trial Fine-Grained Representation
Shi, X., Du, J. Constructing a finer-grained representation of clinical trial results from ClinicalTrials.gov. Sci Data 11, 41 (2024). https://doi.org/10.1038/s41597-023-02869-7

Design Choices: Clinical Trial Evidence Representation
Evidence
Drug Drug Target Disease
Evidence attributes
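
The labels on this slide suggest that evidence is modeled as its own node linked to a drug, a drug target, and a disease, with attributes stored on the evidence. The Python sketch below shows that pattern under this assumption. The class, field names, relationship names, and sample identifier are all hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Evidence:
    """Hypothetical evidence node; attributes live here, not on the entity nodes."""
    evidence_id: str
    drug: str
    target: str
    disease: str
    attributes: tuple = ()  # e.g. (("phase", "2"), ("status", "completed"))

def to_edges(ev):
    """Expand an evidence node into the edges that connect it to its entities."""
    return [
        (ev.evidence_id, "has_drug", ev.drug),
        (ev.evidence_id, "has_target", ev.target),
        (ev.evidence_id, "has_disease", ev.disease),
    ]

def evidence_for(evidences, drug, disease):
    """All evidence nodes that link a given drug to a given disease."""
    return [e for e in evidences if e.drug == drug and e.disease == disease]
```

Keeping the evidence as a node lets several trials support the same drug and disease pair without overwriting each other's attributes.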

Design choices: relationship aggregation
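
Only the title of this slide survives, so the following is one plausible reading rather than the presenter's method: parallel findings between the same two nodes are collapsed into one edge that keeps a count and the supporting references. The tuple layout and property names are assumptions.

```python
from collections import defaultdict

def aggregate_edges(findings):
    """Collapse parallel (source, type, target) findings into one edge each.

    findings: iterable of (source, rel_type, target, reference) tuples.
    """
    grouped = defaultdict(list)
    for source, rel_type, target, reference in findings:
        grouped[(source, rel_type, target)].append(reference)
    return {
        key: {"finding_count": len(refs), "references": sorted(set(refs))}
        for key, refs in grouped.items()
    }
```

The trade-off is the one the deck returns to later: fewer edges are simpler to query and faster to traverse, at the cost of per-finding detail.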

Design Choices: Roll up of relationships in ontologies
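
A common meaning of roll-up is that a relationship attached to a specific ontology term is also made available on its ancestors. The Python sketch below shows that idea. It assumes an `is_a` hierarchy given as a child-to-parents mapping, and the names and sample terms are hypothetical.

```python
def ancestors(term, parents):
    """All ancestors of `term` following is_a links (iterative and cycle-safe)."""
    seen, stack = set(), [term]
    while stack:
        for parent in parents.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen

def roll_up(annotations, parents):
    """Propagate each (gene, term) annotation to every ancestor of the term."""
    rolled = set(annotations)
    for gene, term in annotations:
        rolled.update((gene, ancestor) for ancestor in ancestors(term, parents))
    return rolled
```

With the roll-up materialized, a query for a broad term finds genes annotated only to its subtypes without walking the hierarchy at query time.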

Graph Customization: Build Your Own Graph
•Custom names of nodes and relationships
•Customization of attributes
•Aggregation of edges
•Subgraph centered around a certain node
•Exclude irrelevant portions of the content
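
Several of the options above can be pictured as one filtering step. The Python sketch below is a hypothetical illustration, not the product's customization tool: it drops excluded relationship types, renames the rest, and keeps only edges whose endpoints lie within a given number of hops of a center node.

```python
from collections import deque

def ego_subgraph(edges, center, radius, exclude_types=(), rename=None):
    """Keep edges within `radius` hops of `center`, dropping and renaming types."""
    rename = rename or {}
    kept = [e for e in edges if e[1] not in exclude_types]
    adj = {}
    for source, _, target in kept:
        adj.setdefault(source, set()).add(target)
        adj.setdefault(target, set()).add(source)  # distance ignores direction
    dist = {center: 0}
    queue = deque([center])
    while queue:  # breadth-first search, stopping at the radius
        node = queue.popleft()
        if dist[node] == radius:
            continue
        for neighbor in adj.get(node, ()):
            if neighbor not in dist:
                dist[neighbor] = dist[node] + 1
                queue.append(neighbor)
    return [(s, rename.get(r, r), t) for s, r, t in kept if s in dist and t in dist]
```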

Good schema design is a balance between simplicity and comprehensiveness

Recursive queries: finding positive feedback loops
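
The query on this slide is not in the transcript. As a stand-in, the Python sketch below shows the underlying idea under stated assumptions: edges carry a sign (+1 for activation, -1 for inhibition, an encoding chosen here for illustration), and a loop counts as positive feedback when its signs multiply to +1. The function name and the `max_len` bound are hypothetical.

```python
def positive_feedback_loops(edges, start, max_len=4):
    """Enumerate simple cycles through `start` whose edge signs multiply to +1.

    edges: dict mapping node -> list of (neighbor, sign) pairs.
    max_len: maximum number of distinct nodes allowed in a cycle.
    """
    loops = []

    def walk(node, path, sign):
        for neighbor, edge_sign in edges.get(node, ()):
            if neighbor == start:
                if sign * edge_sign > 0:  # an even number of inhibitions
                    loops.append(path + [start])
            elif neighbor not in path and len(path) < max_len:
                walk(neighbor, path + [neighbor], sign * edge_sign)

    walk(start, [start], 1)
    return loops
```

Note that two inhibitions in a row also form a positive loop, which is why the check uses the product of signs rather than requiring every edge to be an activation.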

Querying ontologies
Equivalent SQL:

Hop 1 and 2 expression networks of ANO1

Clustering of core pathways in 3D

Library for 3D visualization
https://github.com/vasturiano/3d-force-graph

SemSpect: Data Exploration Plugin for Neo4j

SemSpect: Avoid hairballs in your exploration
Hides complexity in tables

Relationship-Based Constraints

Graph representation enables discovery through exploration within the complex interconnections of biomedical data

What genes cause or correlate with asthma?
Genes, Diseases
// Genes linked to asthma by a C or CO relationship
// (presumably causation and correlation, per the slide title),
// restricted to genes with at least one of the listed subtypes.
match (d:disease {name: 'Asthma'})<-[r:C|CO]-(g0:gene)
where any(
  subtype in g0.node_subtype
  where subtype in [
    'enzyme', 'transcription regulator', 'transporter',
    'kinase', 'G-protein coupled receptor', 'peptidase',
    'transmembrane receptor', 'ion channel', 'phosphatase',
    'translation regulator', 'cytokine', 'growth factor',
    'ligand-dependent nuclear receptor'])
return d, r, g0
Result: 355 nodes, 8309 relationships

How are asthma-related genes functionally linked?
Genes, Tox Functions, Pathways
...
// Walk is_a links up to the ortholog-group-level gene node
optional match (g0:gene)-[:is_a*]->(g1:gene {macromolecule_level: 'ortholog group level'})
// Find the pathways and toxlists those genes belong to
optional match (g1:gene {macromolecule_level: 'ortholog group level'})-[r1:member_of]-(p:pathway|toxlist)
// Keep only pathways and toxlists shared by at least two genes
with p, collect(distinct g1) as genes, collect(r1) as relationships
where size(genes) >= 2
return genes, relationships, p
Result: 281 genes, 426 pathways, 62 toxlists, 3507 relationships
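
The aggregation at the end of the query (collect distinct genes per pathway or toxlist, then keep groups of at least two) can be mirrored in a few lines of Python, which may help when post-processing exported results. This is a sketch with hypothetical names and sample data, not part of the original query.

```python
from collections import defaultdict

def shared_gene_sets(memberships, min_genes=2):
    """Group genes by pathway or toxlist; keep groups with enough distinct genes.

    memberships: iterable of (gene, gene_set_name) pairs, duplicates allowed.
    """
    genes_by_set = defaultdict(set)
    for gene, gene_set in memberships:
        genes_by_set[gene_set].add(gene)
    return {
        name: sorted(genes)
        for name, genes in genes_by_set.items()
        if len(genes) >= min_genes
    }
```

Using a set per group plays the role of `collect(distinct g1)`, so a gene reached through several `is_a` paths is counted once.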

Can we use biological activity to identify functional neighborhoods?
Genes, Tox Functions, Pathways
Louvain neighborhood detection, then filtering by centrality

Can we repurpose drugs to target key intersections?
Genes, Tox Functions, Pathways, Drugs
Drugs known to activate or inhibit
Immunosuppressant approved for atopic dermatitis
Phase two complete for asthma

Link Prediction
Complex Embeddings for Simple Link Prediction, Theo Trouillon et al.
Gene Disease
?
Define train/test split using Neo4j
•Take random gene-disease links
•Mark links to child and parent diseases as excluded
Trained ComplEx embeddings with DGL-KE
Compared to predictions based on node degree
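
The split and the degree baseline described above can be sketched in plain Python. This is an illustration under assumptions, not the presenter's pipeline: the function names are hypothetical, the split runs in memory rather than in Neo4j, and scoring a candidate by the product of the two node degrees is one common choice of degree baseline that the slide does not specify. ComplEx training with DGL-KE is not shown.

```python
import random
from collections import Counter

def split_links(links, disease_relatives, test_fraction=0.2, seed=0):
    """Hold out random gene-disease links and drop training links that leak them.

    disease_relatives maps a disease to its parent and child diseases. A training
    link from the same gene to a relative of a held-out disease is excluded.
    """
    rng = random.Random(seed)
    shuffled = sorted(links)  # sort first so the shuffle is reproducible
    rng.shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_fraction))
    test = set(shuffled[:n_test])
    excluded = {
        (gene, relative)
        for gene, disease in test
        for relative in disease_relatives.get(disease, ())
    }
    train = {link for link in shuffled[n_test:] if link not in excluded}
    return train, test

def degree_scores(train, candidates):
    """Baseline score: gene degree times disease degree in the training links."""
    gene_degree = Counter(gene for gene, _ in train)
    disease_degree = Counter(disease for _, disease in train)
    return {(g, d): gene_degree[g] * disease_degree[d] for g, d in candidates}
```

Excluding parent and child diseases matters because a held-out link such as gene to asthma is nearly given away by a training link from the same gene to a subtype of asthma.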

QIAGEN Biomedical KB-AI: provides the greatest depth and
breadth of knowledge for critical pharmaceutical research
Unstructured relationship sources
Structured relationship sources
Graph enrichment sources
NIH, PMC, PubMed, arXiv, medRxiv, bioRxiv, Google Patents, GWAS Catalog, dbSNP, ChEMBL, RxNav, CPDB, ClinicalTrials.gov, UniChem, PubChem, FDA, HGNC, Reactome, Gene Ontology, MeSH, Open Targets, UniProt, DailyMed
12 billion+ triples; 600 million+ relationships
•335 million+ relationships from scientific research
•9.4 million+ relationships from patents
•14.9 million+ relationships from grants
•4.7 million+ relationships from clinical trials
•279 million+ relationships from structured sources
Discovery: Identify new targets and indications with genetic evidence found across scientific literature
Clinical development: Establish potential biomarkers for diseases
Business development and strategy: Understand the competitive landscape by target, drug, indication and augment scientific due diligence
Data generated using state-of-the-art entity disambiguation, semantically meaningful relationship extraction and causal relationships.

Entities and Relationships in Biomedical KB-AI
Relationships
Semantic: 290 million
Causal: 9 million
Adverse effects: 280 million
Clinical Trials: 4.7 million
GWAS: 2 million

Preclinical Competitive Intelligence
Biomedical KB-AI provides many competitive intelligence sources including:
•Patents
•Clinical trials
•Research papers
•Grant applications
GLP1R patent mentions

Preclinical Competitive Intelligence
Top 20% clinical trial sponsors for GLP1

Indications GLP1 Is Investigated For By Top Pharma Companies

Evidence accumulation for GLP1R interacting drugs
Timeline: top 4 drugs targeting GLP1R
Evidence comes from NIH grants, publications and patents

Rare diseases research
Hypophosphatasia (HPP) and Ehlers-Danlos syndrome (EDS)
•Building a chat interface model for HPP and EDS scientific publications
•Building a model that augments research for HPP and EDS

Graph representation supports complex analyses of biomedical data

Acknowledgments
•Kyle Nilson
•Millie Zhou
•Ivana Grbesa
•Francesco Lamanna
•Andreas Kramer
•Bob Rebres
•Burk Braun
•Swati Mishra
•Bjarke Skjernaa
•Allan Merrild
•Rune Gee Madsen
•Poul Liboriussen
•Thomas Hyldgaard
•Venkatesh Moktali
•Alex Jarasch
•Alexander Erdl
•Vincent Vialard

Thank you for your attention
Trademarks: QIAGEN®, Sample to Insight®, Ingenuity®, IPA® (QIAGEN Group). Registered names, trademarks, etc. used in this document, even when not specifically marked as such, may still be protected by law. PROM-21134-001 © 2022 QIAGEN, all rights reserved.