intermine.bio2rdf.org : A QLever SPARQL endpoint

fbelleau 307 views 16 slides Jul 14, 2024

About This Presentation

intermine.bio2rdf.org : A QLever SPARQL endpoint
for InterMine databases

François Belleau

presented @ BOSC2024


Slide Content

intermine.bio2rdf.org: A QLever SPARQL endpoint for InterMine databases. François Belleau. ISMB 2024 BOSC, 12 July 2024.

Contents: InterMine project; Semantic Web concepts; Data transformation process from REST to RDF; InterMine SPARQL endpoint demo; Concluding remarks.

InterMine project: InterMine is an open source data warehouse system, licensed under the LGPL 2.1. It is used to create databases of biological data accessed by sophisticated web query tools. InterMine includes a user-friendly web interface that works 'out of the box' and can be easily customised. InterMine makes it easy to integrate multiple data sources into a single data warehouse. There are 19 InterMine databases available. (Source: https://en.wikipedia.org/wiki/InterMine)

FlyMine query: pathway identifier and name for the selected gene, run with FlyMine (https://www.flymine.org/flymine).

InterMine: a programming API is available.
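As a rough illustration of what programmatic access can look like, the sketch below builds (but does not send) a request URL for the pathway query from the FlyMine slide. It assumes the usual InterMine conventions of a `/service/query/results` path, a PathQuery XML document passed in the `query` parameter, and a model named `genomic`. The gene symbol `zen` and the helper names are illustrative and do not come from the slides.

```python
from urllib.parse import urlencode
from xml.sax.saxutils import quoteattr

# Assumption: FlyMine exposes its web services under this root.
FLYMINE_SERVICE = "https://www.flymine.org/flymine/service"


def build_path_query(views, root, lookup_value):
    """Return a PathQuery XML string (hypothetical helper, not part of InterMine)."""
    view_attr = quoteattr(" ".join(views))
    return (
        f'<query model="genomic" view={view_attr}>'
        f'<constraint path={quoteattr(root)} op="LOOKUP" value={quoteattr(lookup_value)}/>'
        "</query>"
    )


def build_results_url(service_root, query_xml):
    """Return a GET URL asking for JSON results; nothing is sent over the network."""
    params = urlencode({"query": query_xml, "format": "json"})
    return f"{service_root}/query/results?{params}"


# Illustrative gene symbol; any identifier accepted by LOOKUP could be used.
query_xml = build_path_query(
    ["Gene.pathways.identifier", "Gene.pathways.name"], "Gene", "zen"
)
url = build_results_url(FLYMINE_SERVICE, query_xml)
print(url)
```

The URL can then be fetched with any HTTP client. The official InterMine client libraries wrap the same kind of request behind a query-builder interface.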

Semantic Web concepts: The Semantic Web is an extension of the World Wide Web that creates a web of data in which machines can understand the meaning of things and the relationships between them. RDF provides a standard way to represent information in the form of triples: (subject, predicate, object). SPARQL is a query language designed for retrieving and manipulating data stored in RDF format in a triplestore.
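To make the triple model concrete, here is a minimal in-memory sketch in Python. It is not a triplestore and not SPARQL. It only shows that data is a set of (subject, predicate, object) tuples and that a query is a chain of patterns joined on shared variables. All identifiers with the `ex:` prefix are made up for illustration.

```python
from typing import Iterable, Iterator, Optional, Tuple

Triple = Tuple[str, str, str]

# Toy data: every identifier here is hypothetical.
TRIPLES = [
    ("ex:gene1", "rdf:type", "ex:Gene"),
    ("ex:gene1", "ex:symbol", "zen"),
    ("ex:gene1", "ex:pathway", "ex:pathway1"),
    ("ex:pathway1", "ex:name", "Example pathway"),
]


def match(
    triples: Iterable[Triple],
    s: Optional[str] = None,
    p: Optional[str] = None,
    o: Optional[str] = None,
) -> Iterator[Triple]:
    """Yield triples matching the pattern; None acts like a SPARQL variable."""
    for t in triples:
        if (s is None or t[0] == s) and (p is None or t[1] == p) and (o is None or t[2] == o):
            yield t


def pathway_names(triples, symbol):
    """Join three patterns, roughly:
    ?gene ex:symbol symbol . ?gene ex:pathway ?pw . ?pw ex:name ?name
    """
    names = []
    for gene, _, _ in match(triples, p="ex:symbol", o=symbol):
        for _, _, pathway in match(triples, s=gene, p="ex:pathway"):
            for _, _, name in match(triples, s=pathway, p="ex:name"):
                names.append(name)
    return names


print(pathway_names(TRIPLES, "zen"))
```

A real triplestore such as QLever answers the same kind of pattern join, but over indexed data and with the full SPARQL language.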

20 years of Semantic Web evolution in the life sciences. (Timeline figure; labels: ISMB 2005, ISMB 2008, Linked Data 2005, Linked Data 2009, Linked Data 2014, Linked Data for life science 2024.)

SPARQL endpoints today: UniProt (https://sparql.uniprot.org/), MeSH (https://id.nlm.nih.gov/mesh/query), KEGG (https://www.genome.jp/sparql/linkdb), DO (https://disease-ontology.org/do-kb/sparql), PDB (https://rdfportal.org/dataset/pdbj), Wikidata (https://query.wikidata.org/) ...and many more.

InterMine SPARQL endpoints are missing.

MO-LD project: a first attempt at InterMine RDF conversion (https://github.com/mo-ld). Déraspe, Maxime, Binkley, Gail, Butano, Daniela, et al. Making linked data SPARQL with the InterMine biological data warehouse. In: CEUR Workshop Proceedings. Rheinisch-Westfaelische Technische Hochschule Aachen, Lehrstuhl Informatik V, 2016.

Data transformation process from REST to RDF (pipeline diagram; labels grouped by kind).
Source: Python InterMine REST API. DB collection: FlyMine, WormMine, YeastMine.
Scripts and tools: intermine2linkml.py, linkml2es.py, intermine2os.py, elasticdump, json_gz2nt_gz.py, qlever index.
Named stores: intermine-linkml-classe, intermine-linkml-field, intermine-DB-relation, intermine-DB-object.
Files: linkml-DB.yaml, DB-relation.ndjson.gz, DB-object.ndjson.gz, DB-relation.nt.gz, DB-object.nt.gz.
Services: http://es.kibio.science, https://huggingface.co/datasets/bio2rdf/intermine, QLever triplestore, http://intermine.bio2rdf.org (SPARQL endpoints, QLever UI).
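The JSON-to-RDF step can be pictured with a small sketch. This is not the json_gz2nt_gz.py script named on the slide, whose contents are not shown here. It only illustrates one plausible way a flat JSON record could become N-Triples lines. The base IRI, the IRI layout, and the record fields are all assumptions, and every value is emitted as a plain string literal for simplicity.

```python
import json
from urllib.parse import quote

# Hypothetical base IRI; the real endpoint's IRI scheme may differ.
BASE = "http://example.org/intermine/"


def escape_literal(value: str) -> str:
    """Escape backslash, double quote, and line breaks for an N-Triples literal."""
    return (
        value.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )


def record_to_ntriples(db: str, record: dict) -> list:
    """Turn one flat JSON object into N-Triples lines, one per non-null field."""
    db_part = quote(db, safe="")
    id_part = quote(str(record["id"]), safe="")
    subject = f"<{BASE}{db_part}/object/{id_part}>"
    lines = []
    for key, value in sorted(record.items()):
        if key == "id" or value is None:
            continue  # the id is already in the subject IRI; nulls carry no triple
        predicate = f"<{BASE}vocabulary/{quote(key, safe='')}>"
        lines.append(f'{subject} {predicate} "{escape_literal(str(value))}" .')
    return lines


# One line of an NDJSON file would be parsed like this (made-up record).
record = json.loads('{"id": 1007, "class": "Gene", "symbol": "zen", "length": null}')
for line in record_to_ntriples("flymine", record):
    print(line)
```

A fuller converter would also emit typed literals and turn references between objects into IRIs rather than strings, which is presumably what the separate relation files are for.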

http://intermine.bio2rdf.org:7000

Show the pathway identifier(s) and name for the selected gene. (Figure panels: SPARQL query, InterMine query.)
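The slide's actual SPARQL query is not reproduced in this transcript, so the following is only a sketch of what such a query and its HTTP request might look like. The vocabulary prefix, the predicate names, the gene symbol, and the endpoint URL are placeholders, not the real intermine.bio2rdf.org schema or API path. The request follows the standard SPARQL protocol (a GET with a `query` parameter) and is built but not sent.

```python
from urllib.parse import urlencode
from urllib.request import Request

# Placeholder endpoint; substitute the real SPARQL API URL of the service.
ENDPOINT = "http://example.org/sparql"

# Hypothetical vocabulary: the predicates on the real endpoint may be named differently.
QUERY = """PREFIX ex: <http://example.org/intermine/vocabulary/>
SELECT ?identifier ?name WHERE {
  ?gene ex:symbol "zen" ;
        ex:pathways ?pathway .
  ?pathway ex:identifier ?identifier ;
           ex:name ?name .
}"""


def build_sparql_request(endpoint: str, query: str) -> Request:
    """Build a SPARQL-protocol GET request asking for JSON results (not sent)."""
    url = f"{endpoint}?{urlencode({'query': query})}"
    return Request(url, headers={"Accept": "application/sparql-results+json"})


request = build_sparql_request(ENDPOINT, QUERY)
print(request.full_url)
```

Passing the request to `urllib.request.urlopen` would execute it. The response would be expected to follow the SPARQL JSON results format, with one binding per pathway.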

Concluding remarks: The Semantic Web has evolved with new technologies: JSON-LD (https://json-ld.org/), LinkML (https://linkml.io/), and the QLever triplestore (https://qlever.cs.uni-freiburg.de). Converting JSON from a REST API to RDF is a simple approach. Future work: other InterMine SPARQL endpoints will be added, and we will explore SPARQL query generation with LLMs.

Acknowledgments. Collaborators: Gos Micklem (Cambridge University), Deepak Unni (SIB, Swiss Institute of Bioinformatics), Arnaud Droit (ADLab). Funding: BioHackathon 2023 organizers, BOSC 2024 Organizing Committee.

Try it: http://intermine.bio2rdf.org:7000. Get data and scripts: https://huggingface.co/datasets/bio2rdf/intermine