Spinque is a spin-off company from CWI that builds on research into the integration of databases and information retrieval. We build tailor-made search engines over connected datasets. With the Spinque technology we compose a search engine out of building blocks and compile this “search strategy” into an efficient query program. In the talk we explain and demonstrate the Search by Strategy approach. In addition, we discuss our current developments and challenges in searching Linked Data.
Bio: Michiel Hildebrand received his PhD from the University of Amsterdam (at CWI) in 2010 for his research on access to Linked Data. He worked as a researcher at VU University and CWI. In 2014 he joined Spinque to apply the company’s Search by Strategy approach to Linked Data.
Size: 2.49 MB
Language: en
Added: Jan 30, 2015
Slides: 20 pages
Slide Content
Searching Linked Data with Spinque
  Michiel Hildebrand, Wouter Alink, Roberto Cornacchia, Arjen de Vries
  Search Engines Amsterdam, January 30, 2015
Background: Information Retrieval and DB integration
  Cornacchia et al. Flexible and efficient IR using Array Databases. VLDB Journal ’08
  Mühleisen et al. Column Stores as an IR Prototyping Tool. ECIR’14 & SIGIR’14
Concept: Search by Strategy
  Alink et al. Searching CLEF-IP by strategy. CLEF’09
  PatOlympics, 2010 and 2011
Product: Tailored access to connected datasets
  Koninklijke Bibliotheek, Wageningen Universiteit, Beeld&Geluid, Elsevier, Heineken, ...
Heterogeneous Data. Hang Li et al. A new approach to intranet search based on information extraction. CIKM’05. Complex information needs. SQL, CSV, XML, HTML, OAI, JSON
Heterogeneous University Data
  Financial administration (ERP)
  Contract administration (CMS)
  Contract documents (CMS attachments)
  Publication database (Institutional Repository)
  Publication documents (Institutional Repository PDFs)
  Employee database (address lists, ERP + CMS)
  Companies (CMS + ERP + document mentions)
  Subsidy database (CMS)
  Departments (address lists, CMS)
  Web addresses (extracted from documents)
  Topic (assigned to publications)
  Research programmes (dependent on funding scheme)
Complex information needs
  What funding schemes are the primary source of income? Can we move to Europe when Dutch funding dries up?
  Who has active relations with partner X? “Valorisation”; new national funding requirements
  What industry sectors do we depend upon? How many projects in smart cities? Green energy? Cloud computing? Etc.
  How are strategic decisions implemented? Has the objective “move from Telecom toward ICT” been achieved, and how does it develop over time?
Heterogeneous University Data: harvest and link data, model as a graph. Complex information needs: Search by Strategy
Project by topic
  Search in attachments of projects
  Search for project contracts (by metadata)
  Traverse from attachments to projects & combine results
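The "project by topic" strategy above can be illustrated with a small sketch. This is not Spinque's implementation: the toy data, the term-overlap scorer, and the weighted-sum combination are all assumptions made for illustration. It only shows the shape of the strategy: two searches, a traversal from attachments to projects, and a merge of the two ranked lists.

```python
from collections import defaultdict

# Hypothetical toy data; none of these identifiers come from the talk.
ATTACHMENTS = {
    "att1": "smart cities sensor network proposal",
    "att2": "green energy storage contract annex",
}
PROJECT_METADATA = {
    "proj1": "urban mobility smart cities",
    "proj2": "battery research",
}
ATTACHED_TO = {"att1": "proj1", "att2": "proj2"}  # attachment -> project edge


def text_score(query, text):
    """Fraction of query terms found in the text (a stand-in for a real ranker)."""
    terms = query.lower().split()
    words = set(text.lower().split())
    return sum(t in words for t in terms) / len(terms)


def search(query, docs):
    """Return {doc_id: score} for all documents with a non-zero score."""
    scored = {d: text_score(query, t) for d, t in docs.items()}
    return {d: s for d, s in scored.items() if s > 0}


def project_by_topic(query, w_attachment=0.5, w_metadata=0.5):
    # Block 1: search attachment text, then traverse to the owning project,
    # keeping the best attachment score per project.
    via_attachments = defaultdict(float)
    for att, score in search(query, ATTACHMENTS).items():
        proj = ATTACHED_TO[att]
        via_attachments[proj] = max(via_attachments[proj], score)

    # Block 2: search the project (contract) metadata directly.
    via_metadata = search(query, PROJECT_METADATA)

    # Block 3: combine both ranked lists (assumed here: a weighted sum).
    combined = {}
    for proj in set(via_attachments) | set(via_metadata):
        combined[proj] = (w_attachment * via_attachments.get(proj, 0.0)
                          + w_metadata * via_metadata.get(proj, 0.0))
    return sorted(combined.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    print(project_by_topic("smart cities"))
    print(project_by_topic("green energy"))
```

A project that matches only through an attachment (as for "green energy" in the toy data) still appears in the result, but with a lower score than one that matches through both routes.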
Topic expert
  Search objects about topic
  Expand with neighbours in and out
  Return related persons
  Ranked by tf-idf on relations
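A minimal sketch of the "topic expert" strategy follows. The slide does not define how tf-idf is applied to relations, so the weighting here is an assumption: tf is the number of relations between a person and the matching objects, and idf discounts persons who are related to many objects overall. The graph, node names, and exact-match search are hypothetical.

```python
import math
from collections import Counter

# Hypothetical toy graph; identifiers are illustrative only.
OBJECTS = {
    "pub1": "linked data search",
    "pub2": "linked data alignment",
    "pub3": "array databases",
    "proj1": "cloud computing",
}
TYPES = {"alice": "person", "bob": "person"}
EDGES = [  # directed (subject, object) relations
    ("alice", "pub1"), ("alice", "pub2"),
    ("pub3", "bob"), ("bob", "pub1"), ("bob", "proj1"),
]


def neighbours(node):
    """Expand a node along both outgoing and incoming edges."""
    outgoing = [o for s, o in EDGES if s == node]
    incoming = [s for s, o in EDGES if o == node]
    return outgoing + incoming


def topic_experts(topic):
    # Step 1: search objects about the topic (all query terms must occur).
    terms = set(topic.lower().split())
    hits = [o for o, text in OBJECTS.items() if terms <= set(text.split())]

    # Steps 2 and 3: expand to neighbours in and out, keep only persons.
    tf = Counter()
    for obj in hits:
        for node in neighbours(obj):
            if TYPES.get(node) == "person":
                tf[node] += 1

    # Step 4: rank with an assumed tf-idf over relations.
    n_objects = len(OBJECTS)
    scores = {}
    for person, count in tf.items():
        df = len(neighbours(person))  # all relations of this person
        scores[person] = count * math.log(1 + n_objects / df)
    return sorted(scores.items(), key=lambda kv: (-kv[1], kv[0]))


if __name__ == "__main__":
    for person, score in topic_experts("linked data"):
        print(person, round(score, 3))
```

In the toy graph, alice is related to both matching publications and to nothing else, so she ranks above bob, who has one matching relation among three.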
Norbert Fuhr, Thomas Rölleke. A Probabilistic Relational Algebra for the Integration of Information Retrieval and Database Systems (1994)
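To give a feel for the idea behind a probabilistic relational algebra, here is a simplified sketch, assuming every tuple carries a probability and all tuples are independent events. It is not the full algebra of Fuhr and Rölleke, which tracks event expressions to handle dependent tuples; the relation contents and function names are hypothetical.

```python
# A relation is modelled as {tuple: probability}.
TERM_DOC = {
    ("search", "d1"): 0.8,
    ("search", "d2"): 0.5,
    ("data", "d1"): 0.5,
}
DOC_AUTHOR = {
    ("d1", "alice"): 1.0,
    ("d2", "alice"): 1.0,
    ("d2", "bob"): 1.0,
}


def select(relation, predicate):
    """Selection keeps matching tuples and their probabilities unchanged."""
    return {t: p for t, p in relation.items() if predicate(t)}


def join(left, right, left_col, right_col):
    """Join multiplies probabilities (assumes independent tuples)."""
    return {
        l + r: pl * pr
        for l, pl in left.items()
        for r, pr in right.items()
        if l[left_col] == r[right_col]
    }


def project(relation, columns):
    """Projection merges duplicates as a disjunction of independent events."""
    result = {}
    for t, p in relation.items():
        key = tuple(t[c] for c in columns)
        result[key] = 1 - (1 - result.get(key, 0.0)) * (1 - p)
    return result


def authors_for_term(term):
    hits = select(TERM_DOC, lambda t: t[0] == term)
    joined = join(hits, DOC_AUTHOR, 1, 0)  # columns: term, doc, doc, author
    return project(joined, [3])


if __name__ == "__main__":
    print(authors_for_term("search"))
```

For the term "search", alice is reached through two documents, so her probability is 1 - (1 - 0.8)(1 - 0.5) = 0.9, while bob is reached through one document only and gets 0.5. Ranking falls out of ordinary relational operators, which is the point of integrating retrieval with a database.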
[Architecture diagram. Labels: data sources (SQL, CSV, HTML, OAI, XML), indexing pipeline, strategy editor, compiler, API, applications]
Search by Strategy
  (visual) modelling of search processes
Rank. Everything. Always.
  all-round probabilistic search
Many strategies, one data model
  many search engines, one index
Components Supporting the Open Data Exploitation
[Architecture diagram. Labels: data sources (SQL, CSV, HTML, OAI, XML), indexing pipeline, strategy editor, compiler, API, applications]
API Builder for Open Data?
Supporting (search) application developers
  Gregory Grefenstette. Search-based applications. 2010
  Jamie Callan. Search Engine Support For Software Applications. CIKM 2010 Keynote
Who builds search strategies?
  Developers are not IR specialists
  Neither are domain specialists
How to handle the schema mess in a heterogeneous dataspace?
Happy alignments are all alike; every unhappy alignment is unhappy in its own way. Jacco van Ossenbruggen, 2012 (improvisation on Anna Karenina, Leo Tolstoy, 1877)
Alignment strategies
  Interactive vocabulary alignment. Jacco van Ossenbruggen, Michiel Hildebrand, Victor de Boer. TPDL 2011
Coming soon: Spinque Alignment Service
  Beeld&Geluid, Naturalis, Rijksdienst Cultureel Erfgoed (RCE)