Semantic MediaWiki - a Linked Open Data Platform

krabina 45 views 39 slides Oct 10, 2024

About This Presentation

Presentation about Semantic MediaWiki as a Linked Open Data platform, given at the 2024 LD4 Conference: Building Community for Linked Open Data


Slide Content

Semantic MediaWiki as Linked Open Data Platform
Bernhard Krabina, KM-A

Introduction
- Managing partner at KM-A Knowledge Management Associates
- Active member of the Semantic MediaWiki community for ~15 years
- Knowledge Graph/Wiki researcher at WU Vienna (Prof. Polleres)
- Knowledge Management lecturer at a university of applied sciences
KM-A: KM consulting, KM training, KM research, open-source SMW stack, professional hosting

Agenda
1. MediaWiki as Knowledge Base/Graph
2. Structured Data in MediaWiki
3. Semantic MediaWiki as LOD platform

Wikipedia – Wikimedia – MediaWiki
- Wikipedia: encyclopedia
- Wikimedia: operator
- MediaWiki: software

The power of Knowledge Graphs
- A knowledge graph represents knowledge in the form of triples: subject – predicate – object (format: RDF)
- This forms a network of nodes and edges
Example graph:
- Hedy Lamarr, is an, actor
- Hedy Lamarr, born in, Vienna
- Vienna, located in, Austria
- actor, subclass of, artist
A query for Austrian artists can retrieve Hedy Lamarr even when this is not tagged on her page!
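The Hedy Lamarr example can be sketched in a few lines of Python. This is only an illustration of how following edges yields an answer that is not stated on the page itself. The four triples are a reading of the slide's diagram, and the function names are hypothetical; this is not how SMW or a triplestore implements inferencing.

```python
# Toy triple store: (subject, predicate, object), following the slide's example.
TRIPLES = {
    ("Hedy Lamarr", "is a", "actor"),
    ("Hedy Lamarr", "born in", "Vienna"),
    ("Vienna", "located in", "Austria"),
    ("actor", "subclass of", "artist"),
}


def objects(subject, predicate):
    """Return every object o for which (subject, predicate, o) is a known triple."""
    return {o for s, p, o in TRIPLES if s == subject and p == predicate}


def classes_of(entity):
    """Direct classes of an entity plus all their superclasses (transitive)."""
    found = set()
    todo = list(objects(entity, "is a"))
    while todo:
        cls = todo.pop()
        if cls not in found:
            found.add(cls)
            todo.extend(objects(cls, "subclass of"))
    return found


def born_in_country(entity, country):
    """True if any birthplace of the entity is located in the given country."""
    return any(
        country in objects(place, "located in")
        for place in objects(entity, "born in")
    )


def austrian_artists():
    """Entities that are artists (directly or via subclass) and born in Austria."""
    subjects = {s for s, _, _ in TRIPLES}
    return sorted(
        s for s in subjects
        if "artist" in classes_of(s) and born_in_country(s, "Austria")
    )


print(austrian_artists())
```

Neither "artist" nor "Austria" appears in a triple about Hedy Lamarr directly; both are reached by walking the "subclass of" and "located in" edges.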

Knowledge Graph vs. Knowledge Base
- The term was made popular by Google: “Introducing the Knowledge Graph: things, not strings”
  https://blog.google/products/search/introducing-knowledge-graph-things-not/
- Trivial definition: Ontology + Instances
- Ontology: actress, woman, person, ??? (linked by “is a”, “subclass of”, “has birthdate”)
- Instances: Hedy Lamarr, Romy Schneider, Christiane Hörbiger, Paula Wessely
  https://en.wikipedia.org/wiki/Category:Actresses_from_Vienna

A scientific definition (Paulheim 2016)
A knowledge graph
- mainly describes real world entities and their interrelations, organized in a graph,
- defines possible classes and relations of entities in a schema,
- allows for potentially interrelating arbitrary entities,
- covers various topical domains.
In MediaWiki:
- real world entities = wiki pages
- classes = categories; relations of entities = properties
- interrelating entities = linking
- topical domains = wiki topic

Structures in MediaWiki
- Formatted text (headings, enumerations, paragraphs, quotes)
- Templates
- Pages and subpages
- Namespaces
- Categories and subcategories (e.g. Category “inflation”)
- Manually curated lists
- No querying of data inside MediaWiki

Knowledge Graphs and Wikipedia vs. custom Knowledge Graphs
- extract structured information from Wikipedia and make this information available on the Web
- free knowledge base that can be read and edited by humans and machines alike…
- central storage for the data that may be accessed by the client Wikipedias
- turns MediaWiki into a powerful and flexible knowledge management system
- lets you store and query data within the wiki's pages
- a set of extensions for MediaWiki


Options

Knowledge Graph?

| | Option 1 | Option 2 | Option 3 | Option 4 |
|---|---|---|---|---|
| Storage of data | MW database, ElasticSearch, TripleStores (incl. Blazegraph) | MW database, Blazegraph | MW database | MW database |
| Properties | flexible | defined before usage, unchangeable | no properties, but table fields | defined through JSON-schema |
| Queries | parser function, API, TripleStore (SPARQL) | API, TripleStore (SPARQL) | parser function, API | parser function, API |
| Linking Data | RDF, importing ontologies | RDF, reusing Wikidata ontology | - | - |
| Knowledge Graph | yes | yes | no, would need RDF or JSON-LD | no, would need RDF or JSON-LD |

Semantic MediaWiki or Wikibase?
https://www.mediawiki.org/wiki/Manual:Managing_data_in_MediaWiki

| Semantic MediaWiki | Wikibase |
|---|---|
| flexible data model | data model of Wikidata |
| properties can be pre-defined or declared by annotating | properties need to be pre-defined |
| properties (and datatypes) can be changed any time | properties cannot be changed! |
| requires extensions for form-based input | comes with a fixed, built-in edit interface |
| internal query language (easier than SPARQL) | no built-in querying of data |

SPARQL only with external triplestore.

Semantic MediaWiki or Wikibase?
https://www.mediawiki.org/wiki/Manual:Managing_data_in_MediaWiki

| Semantic MediaWiki when you | Wikibase when you |
|---|---|
| have data that is not always disputed (reference datatype exists) | need data provenance (complex statements with references for disputed data) |
| want a data structure that defines your (smaller) world | want a data structure that describes the world |
| want to re-use domain-specific vocabularies (schema.org) | want to re-use (part of the) Wikidata ontology |
| want to define your own edit forms | want the edit interface of Wikidata |
| want to interact with your data in wiki pages | want to interact with your data primarily via SPARQL endpoint |


What is Semantic MediaWiki (SMW)?
- open source project: www.semantic-mediawiki.org, https://github.com/SemanticMediaWiki
- the “Swiss army knife” for data and semantics
- built on the MediaWiki ecosystem: the wiki engine that powers Wikipedia
- can be used for much more than just wikis…

MediaWiki + SMW + more extensions
MediaWiki:
- collaborative editing
- version history of every edit
- no backend: everything is a wiki page
- structure via categories and namespaces
- API
- …
SMW:
- structured data (Web database)
- result lists and formats via {{#ask:}} queries
- Semantic Web standards
- triplestore support
- …
More extensions:
- online forms for data entry
- more visualizations
- responsive skin
- authentication
- image annotation
- SPARQL
- …

Linked Data vs. Open Data

Open Data and SMW
- Make your SMW readable by everyone; editing can still be restricted to logged-in users
- Include an open license, see https://www.mediawiki.org/wiki/Manual:Copyright
  $wgRightsIcon = "$wgScriptPath/resources/assets/licenses/cc-by-sa.png";
- Make it easy for users to access your data:
  - SMW automatically puts a link to the RDF representation of individual pages (via Special:ExportRDF) in the HTML, see https://www.semantic-mediawiki.org/wiki/Help:RDF_export
  - create an RDF dump: https://www.semantic-mediawiki.org/wiki/Help:Maintenance_script_dumpRDF.php
  - indicate exporting of data on pages or lists (e.g. vCard, iCalendar, BibTeX, KML)
  - provide export pages with explanations and several result formats (CSV, JSON, RDF)
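For consumers of the data, the per-page RDF export mentioned above can be addressed by URL. The following is a minimal sketch that only builds such a URL. It assumes the wiki serves special pages under a common article path and that the page is given as a subpage of Special:ExportRDF; the base URL `https://wiki.example.org/wiki` and the helper name are hypothetical, so check Help:RDF_export for the parameters your installation supports.

```python
from urllib.parse import quote


def export_rdf_url(wiki_base, page_title):
    """Build a Special:ExportRDF URL for one page (assumed URL layout).

    wiki_base is the article path of the wiki, e.g. "https://wiki.example.org/wiki"
    (a placeholder, not a real site).
    """
    # MediaWiki titles use underscores instead of spaces in URLs.
    title = page_title.replace(" ", "_")
    return f"{wiki_base.rstrip('/')}/Special:ExportRDF/{quote(title, safe=':/')}"


print(export_rdf_url("https://wiki.example.org/wiki", "Hedy Lamarr"))
```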

Linked Data and SMW
- Re-use external vocabularies, most importantly Schema.org
- Add vocabulary definitions to properties, but also to category pages
- Use the page IDs of MediaWiki as unique identifiers
- Use the data type External identifier to link to external sources

Building your Knowledge Base
- the page Vienna can have properties: number of inhabitants, located in, coordinates, WikidataID, …
- properties can have various data types: page, text, number, date, URL, …
- the data type external identifier links to external resources
- re-use external vocabularies: “Coordinates” imported from schema:geo
- a page should be put into a category
- category pages should also re-use vocabularies: {{#set:Imported from=schema:City}}

Unique IDs in MediaWiki and Special:URIResolver in SMW
- Every page has a unique page ID
  - display it with the magic word {{PAGEID}}
  - use it to link to a page without using the page name (which could change over time)
- Use Special:URIResolver
  - https://yourwiki/Special:URIResolver/?curid=34
  - supports content negotiation
- Furthermore: IDs for every version of a page!
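Content negotiation means the client states the representation it wants in the HTTP Accept header. As a rough sketch, a client-side request for the URIResolver URL shown above could be prepared as follows. The helper name is hypothetical, `https://yourwiki` is the slide's placeholder, and the media type `application/rdf+xml` is an assumption about what a given wiki will answer to. The request is only constructed here, not sent.

```python
from urllib.request import Request


def uri_resolver_request(wiki_base, page_id, accept="application/rdf+xml"):
    """Prepare (but do not send) a request to Special:URIResolver by page ID.

    The Accept header asks for an RDF representation; whether the wiki
    honours this media type is an assumption to verify per installation.
    """
    url = f"{wiki_base.rstrip('/')}/Special:URIResolver/?curid={int(page_id)}"
    return Request(url, headers={"Accept": accept})


req = uri_resolver_request("https://yourwiki", 34)
print(req.full_url, req.get_header("Accept"))
```

Passing the prepared request to `urllib.request.urlopen` would perform the actual lookup against a real wiki.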

Using External Vocabularies
- Add/edit a page MediaWiki:Smw import schema
- Instead of local datatype declarations, use {{#set:Imported from=schema:geo}} on the property page (e.g. Property:Coordinates) instead of {{#set:Has type=Geographic coordinates}}
- Add (or remove) vocabulary terms any time…

Linking to external identifiers
- Define a property
- Assign the datatype “External identifier”
- Link to external IDs: {{#set:Has type=External identifier |External formatter uri=http://www.wikidata.org/entity/$1}}
- Look for other identifiers: ORCID (https://orcid.org/), GND, …
- Even better, use Schema.org: {{#set:Imported from=schema:sameAs}}
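The formatter URI works as a template: the stored identifier replaces the `$1` placeholder to produce the outgoing link. A simplified sketch of that substitution is shown below. The function name is hypothetical, the identifier `Q42` is just a sample value, and SMW's own handling (for example any escaping of the identifier) may differ.

```python
def format_external_id(formatter_uri, identifier):
    """Fill the $1 placeholder of a formatter URI with an identifier."""
    if "$1" not in formatter_uri:
        raise ValueError("formatter URI must contain the $1 placeholder")
    return formatter_uri.replace("$1", identifier)


# Sample identifier only; any Wikidata item ID would be handled the same way.
print(format_external_id("http://www.wikidata.org/entity/$1", "Q42"))
```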

https://www.semantic-mediawiki.org/wiki/MediaWiki:Smw_import_schema

Changing data types in SMW https://www.caf-network.eu/MediaWiki:Smw_import_dcterms

Internal query language
{{#ask: [[Category:Practices]] [[Country::Austria]] |?Organisation |?Coordinates |format=table}}

Internal query language
{{#ask: [[Category:Practices]] [[Country::Austria]] |?Organisation |?Coordinates |format=map}}
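The same conditions and printouts can also be sent to the wiki's API instead of being embedded in a page. The sketch below only assembles the request URL for the `ask` API module, using the query from the slides. The endpoint `https://wiki.example.org/api.php` and the helper name are hypothetical, and the available parameters should be checked against the API documentation of the SMW version in use.

```python
from urllib.parse import parse_qs, urlencode, urlsplit


def ask_api_url(api_url, conditions, printouts, limit=None):
    """Assemble a URL for the SMW "ask" API module.

    conditions: list of strings such as "[[Category:Practices]]"
    printouts:  list of property names to return, e.g. "Organisation"
    """
    query = "".join(conditions)
    query += "".join(f"|?{name}" for name in printouts)
    if limit is not None:
        query += f"|limit={int(limit)}"
    params = {"action": "ask", "query": query, "format": "json"}
    return f"{api_url}?{urlencode(params)}"


url = ask_api_url(
    "https://wiki.example.org/api.php",  # placeholder endpoint
    ["[[Category:Practices]]", "[[Country::Austria]]"],
    ["Organisation", "Coordinates"],
)
# Decode the query string again to show what the wiki would receive.
print(parse_qs(urlsplit(url).query)["query"][0])
```

Percent-encoding is left to `urlencode`, so brackets, colons and pipes in the query do not need to be escaped by hand.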

More than 70 result formats, supporting MediaWiki templates
- |format=moderntimeline
- |format=calendar
- |format=median
- |format=D3chart
- |format=gantt
- |format=tagcloud
- |format=json
- |format=rdf
- |format=bibtex
- …

Maintaining your Knowledge Graph
Data/ontology curation: “semantic gardening”
- user rights: admins, curators, users
- property annotation health
- Outdated properties/entities
- Similar properties
- Property uniqueness
- Improper annotations and failed queries
- Missing redirect annotations
https://www.semantic-mediawiki.org/wiki/Semantic_gardening
Inferencing
- subcategories
- subproperties
- equality of pages: redirects
- subqueries
https://www.semantic-mediawiki.org/wiki/Help:Inferencing

Semantic MediaWiki storage options
- SQL Store (default): extra tables in the SQL store of MediaWiki
- ElasticStore: search engine, not a storage backend
- SPARQL/RDF Store: custom, default, Virtuoso, Blazegraph, Fuseki, Sesame, 4store
The options range from easy (to install) to harder to install but more powerful.

Examples
- Vienna History Wiki: https://www.geschichtewiki.wien.gv.at
- Knowledge Management Platform: https://www.wissensmanagement.gv.at
- Austrian Public Sector Award: https://www.verwaltungspreis.gv.at
- FINA Wiki (numismatic research): https://fina.oeaw.ac.at/

https://doi.org/10.1016/j.websem.2022.100771

Hacking Semantic MediaWiki
- Get involved in the SMW community: www.semantic-mediawiki.org
- Join our GitHub account: https://github.com/SemanticMediaWiki/
- Join mailing lists: https://www.semantic-mediawiki.org/wiki/Semantic_MediaWiki_mailing_lists
- Element/Matrix/Telegram chat: https://t.me/joinchat/MCG84k3OMoaYZoFA9yhyMg
- Social media channels (Twitter, Mastodon, LinkedIn, Facebook, YouTube)
- Projects you should look into: https://canasta.wiki, https://www.open-csp.org

SMW sponsorship: donating time https://www.semantic-mediawiki.org/wiki/Sponsorship Self-declaration

Donate money! https://opencollective.com/smw/

Let’s collaborate on the future of SMW!

KMA Knowledge Management Associates | Gersthofer Straße 162 | A-1180 Wien | [email protected] | www.km-a.net
Knowledge Management, Wiki consulting, Semantic MediaWiki, Open Government, Open Data
Bernhard Krabina | +43 676 5103593 | [email protected] | linkedin.com/in/krabina | @krabina