intermine.bio2rdf.org: A QLever SPARQL endpoint for InterMine databases
François Belleau
Presented at BOSC 2024
Added: Jul 14, 2024
Slides: 16 pages
Slide Content
1 intermine.bio2rdf.org: A QLever SPARQL endpoint for InterMine databases. François Belleau, ISMB 2024 BOSC, 12 July 2024
2 Contents: InterMine project; Semantic Web concepts; Data transformation process from REST to RDF; InterMine SPARQL endpoint demo; Concluding remarks
3 InterMine Project: InterMine is an open source data warehouse system, licensed under the LGPL 2.1. It is used to create databases of biological data that are accessed through sophisticated web queries. InterMine includes a user-friendly web interface that works "out of the box" and can be easily customised. InterMine makes it easy to integrate multiple data sources into a single data warehouse. There are 19 InterMine databases available. (Source: https://en.wikipedia.org/wiki/InterMine)
4 FlyMine query: pathway identifier and name for the selected gene in FlyMine (https://www.flymine.org/flymine)
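The same FlyMine question can be sent to InterMine's web services as a query definition rather than through the web form. Below is a minimal sketch, assuming InterMine's PathQuery XML format and a `/query/results` REST endpoint under the FlyMine service root `https://www.flymine.org/flymine/service`. The view paths (`Gene.pathways.identifier`, `Gene.pathways.name`) and the example gene symbol `zen` are assumptions about the FlyMine data model, not taken from the slides. The code only builds the request URL and sends nothing.

```python
import xml.etree.ElementTree as ET
from urllib.parse import urlencode

# Assumed FlyMine web-service root (InterMine services usually live under /service).
FLYMINE_SERVICE = "https://www.flymine.org/flymine/service"


def pathway_query_xml(gene: str) -> str:
    """Build a PathQuery (XML) returning a gene's pathway identifiers and names.

    The view paths are assumptions about the FlyMine "genomic" data model.
    """
    query = ET.Element(
        "query",
        model="genomic",
        view="Gene.primaryIdentifier Gene.symbol "
             "Gene.pathways.identifier Gene.pathways.name",
    )
    # LOOKUP matches the value against the gene's key fields (identifiers, symbol).
    ET.SubElement(query, "constraint", path="Gene", op="LOOKUP", value=gene)
    return ET.tostring(query, encoding="unicode")


def results_url(service: str, query_xml: str, fmt: str = "json") -> str:
    """Return a GET URL for the assumed /query/results endpoint (nothing is sent)."""
    params = urlencode({"query": query_xml, "format": fmt})
    return f"{service}/query/results?{params}"


xml = pathway_query_xml("zen")  # "zen": an example gene symbol
print(xml)
print(results_url(FLYMINE_SERVICE, xml))
```

Building the XML with `ElementTree` rather than string formatting keeps the gene value correctly escaped if it ever contains quotes or angle brackets.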
5 InterMine: a programming API is available
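InterMine publishes client libraries (for example the `intermine` Python package), but the REST API can also be consumed with the standard library alone. A minimal sketch of the response-handling side, assuming the JSON results carry a `views` list of column paths, a `results` list of rows aligned with it, and a `wasSuccessful` flag. This shape should be checked against the service documentation. The sample payload is made up, and no network call is made.

```python
import json


def rows_to_records(payload: str) -> list[dict]:
    """Turn an InterMine JSON results body into one dict per row.

    Assumed shape: {"views": [...paths...], "results": [[...row...], ...],
    "wasSuccessful": bool, "error": str | None}.
    """
    data = json.loads(payload)
    if not data.get("wasSuccessful", True):
        raise RuntimeError(data.get("error") or "InterMine query failed")
    views = data["views"]
    # Pair each column path with the value in the same position of the row.
    return [dict(zip(views, row)) for row in data["results"]]


# Hypothetical response body, shaped like the assumed format above.
sample = json.dumps({
    "views": ["Gene.symbol", "Gene.pathways.identifier", "Gene.pathways.name"],
    "results": [["geneA", "P001", "Example pathway"]],
    "wasSuccessful": True,
})

for record in rows_to_records(sample):
    print(record["Gene.symbol"], record["Gene.pathways.name"])
```

Keying each value by its full path (`Gene.pathways.name`) keeps rows self-describing. This is also convenient when the same records are later written out as JSON lines.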
6 Semantic Web concepts: The Semantic Web is an extension of the World Wide Web that creates a web of data in which machines can understand the meaning of things and the relationships between them. RDF provides a standard way to represent information as triples: (subject, predicate, object). SPARQL is a query language designed for retrieving and manipulating RDF data stored in a triplestore.
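To make triples and SPARQL matching concrete, here is a toy sketch in plain Python. A graph is a list of (subject, predicate, object) tuples. A SPARQL basic graph pattern is answered by unifying each triple pattern in turn and joining solutions on shared variables. The `ex:` names are invented for illustration. A real triplestore such as QLever answers the same kind of pattern with precomputed indexes rather than this nested scan.

```python
from typing import Dict, Iterator, List, Optional, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# A toy graph; IRIs use made-up "ex:" prefixes, literals keep their quotes.
GRAPH: List[Triple] = [
    ("ex:gene1", "rdf:type", "ex:Gene"),
    ("ex:gene1", "ex:symbol", '"geneA"'),
    ("ex:gene1", "ex:pathway", "ex:pw1"),
    ("ex:pw1", "ex:name", '"Example pathway"'),
]


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Unify one triple pattern (terms starting with '?' are variables) with a triple."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # Bind a fresh variable, or check it agrees with its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None  # constant term does not match
    return result


def solve(patterns: List[Triple], graph: List[Triple],
          binding: Optional[Binding] = None) -> Iterator[Binding]:
    """Yield every binding that satisfies all patterns (a basic graph pattern)."""
    binding = {} if binding is None else binding
    if not patterns:
        yield binding
        return
    first, rest = patterns[0], patterns[1:]
    for triple in graph:
        extended = match(first, triple, binding)
        if extended is not None:
            yield from solve(rest, graph, extended)


# Roughly: SELECT ?sym ?name WHERE { ?g ex:symbol ?sym . ?g ex:pathway ?p . ?p ex:name ?name }
query = [("?g", "ex:symbol", "?sym"), ("?g", "ex:pathway", "?p"), ("?p", "ex:name", "?name")]
for b in solve(query, GRAPH):
    print(b["?sym"], b["?name"])
```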
7 20 years of Semantic Web evolution in the life sciences (timeline figure, 2005 to 2024, with milestones labelled ISMB 2005, ISMB 2008, Linked Data 2009, 2014, and Linked Data for life science 2024)
8 SPARQL endpoints today: KEGG (https://www.genome.jp/sparql/linkdb), Disease Ontology (https://disease-ontology.org/do-kb/sparql), UniProt (https://sparql.uniprot.org/), MeSH (https://id.nlm.nih.gov/mesh/query), Wikidata (https://query.wikidata.org/), PDBj (https://rdfportal.org/dataset/pdbj), and many more.
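SPARQL endpoints generally follow the W3C SPARQL 1.1 Protocol, so one small helper can query many of them. A sketch using only the standard library. It builds a query-via-GET request with the query percent-encoded in the `query` parameter and asks for `application/sparql-results+json`. The UniProt endpoint URL `https://sparql.uniprot.org/sparql` is assumed from the public service rather than taken from the slide, and the request is not sent.

```python
from urllib.parse import quote, urlencode
from urllib.request import Request

SPARQL_JSON = "application/sparql-results+json"


def sparql_get_request(endpoint: str, query: str) -> Request:
    """Build (without sending) a SPARQL 1.1 Protocol query-via-GET request."""
    # quote_via=quote percent-encodes spaces as %20 instead of '+'.
    params = urlencode({"query": query}, quote_via=quote)
    return Request(f"{endpoint}?{params}",
                   headers={"Accept": SPARQL_JSON}, method="GET")


req = sparql_get_request(
    "https://sparql.uniprot.org/sparql",  # assumed endpoint path
    "SELECT * WHERE { ?s ?p ?o } LIMIT 5",
)
print(req.get_method(), req.full_url)
print(req.get_header("Accept"))
```

Passing the request to `urllib.request.urlopen` and decoding the body with `json.load` would then yield the standard SPARQL JSON results document.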
9 InterMine SPARQL endpoints are missing.
10 MO-LD project: a first attempt at converting InterMine to RDF (https://github.com/mo-ld). DÉRASPE, Maxime, BINKLEY, Gail, BUTANO, Daniela, et al. Making linked data SPARQL with the InterMine biological data warehouse. In: CEUR Workshop Proceedings. Rheinisch-Westfaelische Technische Hochschule Aachen, Lehrstuhl Informatik V, 2016.
11 Data transformation process from REST to RDF (pipeline diagram). Source: the InterMine REST API, accessed from Python, for a collection of databases (FlyMine, WormMine, YeastMine). Scripts and tools: intermine2linkml.py, linkml2es.py, intermine2os.py, elasticdump, json_gz2nt_gz.py, qlever index. Files: linkml-DB.yaml, DB-object.ndjson.gz, DB-relation.ndjson.gz, DB-object.nt.gz, DB-relation.nt.gz. Intermediate collections (http://es.kibio.science): intermine-linkml-classe, intermine-linkml-field, intermine-DB-object, intermine-DB-relation. Data and scripts: https://huggingface.co/datasets/bio2rdf/intermine. Result: a QLever triplestore serving SPARQL endpoints through the QLever UI at http://intermine.bio2rdf.org.
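The step that turns `DB-object.ndjson.gz` into `DB-object.nt.gz` is performed by `json_gz2nt_gz.py`, whose code is not shown in the slides. The sketch below is a hypothetical stand-in, not that script. It assumes each NDJSON line is a flat object with `id` and `class` keys and mints IRIs under made-up `example.org` namespaces. For each object it emits one `rdf:type` triple plus one literal triple per scalar field, and it skips nested values.

```python
import gzip
import json
from urllib.parse import quote

# Hypothetical namespaces; the real pipeline's IRIs are not given in the slides.
BASE = "http://example.org/intermine/flymine/"
VOCAB = "http://example.org/intermine/vocab/"
RDF_TYPE = "<http://www.w3.org/1999/02/22-rdf-syntax-ns#type>"


def literal(value) -> str:
    """Serialize a scalar as an N-Triples string literal with the required escapes."""
    text = str(value)
    # Escape backslashes first so later escapes are not doubled.
    text = text.replace("\\", "\\\\").replace('"', '\\"')
    text = text.replace("\n", "\\n").replace("\r", "\\r")
    return f'"{text}"'


def object_to_ntriples(obj: dict) -> list[str]:
    """Map one flat JSON object (assumed keys: 'id', 'class', scalar fields) to triples."""
    subject = f"<{BASE}{quote(str(obj['id']), safe='')}>"
    lines = [f"{subject} {RDF_TYPE} <{VOCAB}{quote(obj['class'], safe='')}> ."]
    for key, value in obj.items():
        if key in ("id", "class") or value is None or isinstance(value, (dict, list)):
            continue  # nested values would need their own mapping rules
        lines.append(f"{subject} <{VOCAB}{quote(key, safe='')}> {literal(value)} .")
    return lines


def convert(src: str, dst: str) -> int:
    """Stream a gzipped NDJSON file into a gzipped N-Triples file; return the triple count."""
    count = 0
    with gzip.open(src, "rt", encoding="utf-8") as fin, \
            gzip.open(dst, "wt", encoding="utf-8") as fout:
        for line in fin:
            if not line.strip():
                continue  # tolerate blank lines
            for triple in object_to_ntriples(json.loads(line)):
                fout.write(triple + "\n")
                count += 1
    return count
```

Streaming line by line keeps memory flat regardless of database size, which matters for whole-InterMine dumps. Files of this kind are what the diagram feeds to the `qlever index` step. A fuller converter would also map the `DB-relation` records to IRI-to-IRI triples and attach datatypes to numeric values.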
12 SPARQL endpoint demo: http://intermine.bio2rdf.org:7000
13 Show the pathway identifier(s) and name for the selected gene: the SPARQL query and the corresponding InterMine query.
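For comparison with the InterMine query, a SPARQL version of the same question might look like the query string below. The endpoint's answer arrives in the standard W3C SPARQL 1.1 Query Results JSON format, which the helper flattens into one dict per solution. The `im:` prefix and property names are placeholders, not the vocabulary actually exposed at intermine.bio2rdf.org. The sample response is made up.

```python
import json

# Hypothetical query: prefix and property names are placeholders, not the
# vocabulary actually generated for intermine.bio2rdf.org.
QUERY = """
PREFIX im: <http://example.org/intermine/vocab/>
SELECT ?pathwayId ?pathwayName WHERE {
  ?gene im:symbol "geneA" ;
        im:pathways ?pathway .
  ?pathway im:identifier ?pathwayId ;
           im:name ?pathwayName .
}
"""


def bindings_to_rows(results_json: str) -> list[dict]:
    """Flatten SPARQL 1.1 JSON results to {variable: value} dicts.

    Variables left unbound in a solution are mapped to None.
    """
    data = json.loads(results_json)
    variables = data["head"]["vars"]
    return [
        {var: (b[var]["value"] if var in b else None) for var in variables}
        for b in data["results"]["bindings"]
    ]


# A made-up response in the standard results format.
sample = json.dumps({
    "head": {"vars": ["pathwayId", "pathwayName"]},
    "results": {"bindings": [
        {"pathwayId": {"type": "literal", "value": "P001"},
         "pathwayName": {"type": "literal", "value": "Example pathway"}},
        {"pathwayId": {"type": "literal", "value": "P002"}},
    ]},
})

for row in bindings_to_rows(sample):
    print(row)
```

Reading the variable list from `head.vars` rather than from the first binding keeps columns stable even when some solutions leave variables unbound, as in the second sample row.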
14 Concluding remarks: The Semantic Web has evolved with new technologies: JSON-LD (https://json-ld.org/), LinkML (https://linkml.io/), and the QLever triplestore (https://qlever.cs.uni-freiburg.de). Converting JSON from a REST API to RDF is a simple approach. Future work: other InterMine SPARQL endpoints will be added, and we will explore SPARQL query generation with LLMs.
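LinkML is one of the technologies named here, and the pipeline derives a `linkml-DB.yaml` schema from each InterMine data model (`intermine2linkml.py`). Below is a hypothetical sketch of that mapping, not the script itself. It assumes the model is available as a dict of classes with `attributes`, `references`, `collections`, and `extends`; the actual shape of InterMine's model description should be checked. Attributes become typed LinkML slots, references become class-ranged slots, collections become multivalued slots, and the first parent becomes `is_a`.

```python
import json

# Java types used in InterMine models mapped to LinkML built-in types (partial, assumed).
TYPE_MAP = {
    "java.lang.String": "string",
    "java.lang.Integer": "integer", "int": "integer",
    "java.lang.Double": "float", "double": "float",
    "java.lang.Boolean": "boolean", "boolean": "boolean",
}


def model_to_linkml(model: dict, schema_id: str) -> dict:
    """Build a LinkML schema dict from an InterMine-style model description."""
    classes = {}
    for name, cls in model["classes"].items():
        attrs = {}
        for attr in cls.get("attributes", {}).values():
            attrs[attr["name"]] = {"range": TYPE_MAP.get(attr["type"], "string")}
        for ref in cls.get("references", {}).values():
            attrs[ref["name"]] = {"range": ref["referencedType"]}
        for coll in cls.get("collections", {}).values():
            attrs[coll["name"]] = {"range": coll["referencedType"], "multivalued": True}
        entry = {"attributes": attrs}
        parents = cls.get("extends", [])
        if parents:
            entry["is_a"] = parents[0]  # LinkML allows a single is_a parent
        classes[name] = entry
    return {
        "id": schema_id,
        "name": model["name"],
        "prefixes": {"linkml": "https://w3id.org/linkml/"},
        "imports": ["linkml:types"],
        "default_range": "string",
        "classes": classes,
    }


# A tiny, made-up model fragment.
toy = {"name": "genomic", "classes": {
    "SequenceFeature": {},
    "Gene": {"extends": ["SequenceFeature"],
             "attributes": {"symbol": {"name": "symbol", "type": "java.lang.String"}},
             "collections": {"pathways": {"name": "pathways", "referencedType": "Pathway"}}},
    "Pathway": {"attributes": {"name": {"name": "name", "type": "java.lang.String"}}},
}}

print(json.dumps(model_to_linkml(toy, "https://example.org/linkml/genomic"), indent=2))
```

Printing JSON keeps the sketch dependency-free. Since JSON is, for practical purposes, valid YAML, the output can be saved directly as a LinkML schema file.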
15 Acknowledgments: Collaborators: Gos Micklem (Cambridge University), Deepak Unni (SIB, Swiss Institute of Bioinformatics), Arnaud Droit (ADLab). Funding: BioHackathon 2023 organizers; BOSC 2024 Organizing Committee.
16 Try it: http://intermine.bio2rdf.org:7000. Get data and scripts: https://huggingface.co/datasets/bio2rdf/intermine.