Presentation of the European project ECHOES, given on 28 June 2018 in Leiden (Netherlands), where CSUC presented the project's objectives and main features to Dutch technology companies.
Size: 6.89 MB
Language: en
Added: Jul 04, 2018
Slides: 67 pages
Slide Content
Empowering Communities with a
Heritage Open Ecosystem
28th June 2018
ECHOES PROJECT
Technological partner
Agenda
1. About Echoes
2. Analysis
3. Development
4. Conclusions
Project scope
ECHOES tries to
•provide a modular IT architecture
•based on open source
•to heritage collection holders
•that functions as a digital ecosystem for a broad
range of user communities
•allowing them to
– take an active role
– be able to enrich digital collections
Main goals
OPEN
Open your collections
and link them to the
world
MODULAR
Modular and extensible
architecture
INNOVATION
New ways of searching
and displaying information
Design principles
STANDARD
Use EDM data
model as standard
metadata schema
TRIPLES
Data transformed
to LOD/RDF triples
DEVELOPMENT
Agile development
methodologies
USER
User centered
design
MODULAR
Block by block
approach
Interoperability network
Each “echoes hub” can
•manage different collections
•use all or only a part of the functionalities
Analysis phase
Technical architecture
•Study possible technologies
– Data structures
– Interoperability
•Component structure
Study some references
•Europeana
•LoCloud
•Catalan Research Portal
Development proposal for each component
•Scope and functionalities
•Tools and technologies
•Risk analysis
Proposal
STANDARD
EDM metadata schema
as interoperability for
• inputs
• enrichments
TECHNOLOGIES
Proposed technologies and tools:
• DSpace
• Apache Fuseki
• MySQL
• Mint, OntoWiki, Hub3...
• GeoNames, DBpedia...
• Zooniverse
MODULAR
Four main modules
• Data Sources
• Enrichments
• Data Lake
• Data retrieval and
visualization
Proposed architecture
Vale Handen
Agenda
3. Development
3.1. Methodology
3.2. Data sources module
3.3. Data Lake module
3.4. Data retrieval and visualization module
3.5. Enrichment module
Methodology
Agile development
•Develop a prototype with the minimum requirements
•Test the prototype
•Add new features and improvements on each iteration
Main goal
•Developed product is better suited to the needs
19 sprints
7 releases
1 MVP
Project planning
Data source module
INPUTS
Collections from different
sources defined in many
metadata schemas
TOOLS
Mapping and
transformation tools to
prepare data
Data Source
Module
Inputs
INPUTS
Collections from different
sources defined in many
metadata schemas
ELO
•17 collections
•Dublin Core, A2A, EAD, Custom
metadata schema
•Source: OAI, files
Tresoar
•32 collections
•A2A
•Source: OAI
Gencat
•1 collection
•Custom metadata schema
•Source: Excel file
DIBA
•10 collections
•Custom metadata schema
•Source: Access file
Inputs
We put data from the different inputs on the system and... our plan was to work on aggregated visualization
What really happens...
• Too much data, difficult to explore using conventional tools
• Too much heterogeneous data, silos of information
• Poor data quality (date formats, misspellings, different kinds of geolocations)
Inputs
•We need to improve the data quality
•Data profiling
•Define data formats
•Transform the data
•Data cleansing
•Data standardization
•Data validation
→First approach
→Inputs (examples) → direct mapping to EDM
→ Data Lake → chaos
→Second approach
→Define a standard mapping for each format and create validators
→Inputs (examples) → validator → if ok → Data Lake → no chaos :)
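The second approach can be pictured as a gate in front of the Data Lake. The following is a minimal sketch, not the ECHOES validator: the field names (`title`, `date`), the list of accepted date formats and all function names are assumptions made for illustration.

```python
from datetime import datetime

# Hypothetical date formats; the real list used by the project is not given in the slides.
DATE_FORMATS = ("%Y-%m-%d", "%d/%m/%Y", "%d-%m-%Y")

def normalize_date(value):
    """Return the date as ISO 8601 (YYYY-MM-DD), or None if no format matches."""
    for fmt in DATE_FORMATS:
        try:
            return datetime.strptime(value.strip(), fmt).date().isoformat()
        except ValueError:
            continue
    return None

def validate(record):
    """Return (clean_record, errors). An empty error list means the record is ok."""
    errors = []
    clean = dict(record)
    if not (record.get("title") or "").strip():
        errors.append("missing title")
    iso = normalize_date(record.get("date") or "")
    if iso is None:
        errors.append("unrecognized date: %r" % record.get("date"))
    else:
        clean["date"] = iso  # standardized value replaces the original
    return clean, errors

def load(records):
    """Send only valid records to the (here, in-memory) data lake."""
    data_lake, rejected = [], []
    for record in records:
        clean, errors = validate(record)
        if errors:
            rejected.append((record, errors))
        else:
            data_lake.append(clean)
    return data_lake, rejected
```

Rejected records are returned together with their error messages, so they can be sent back to the data provider instead of being dropped silently.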
Tools
TOOLS
Mapping and
transformation tools to
prepare data
Transform inputs to EDM
•Create mapping tool for each
metadata schema
•Look at examples to decide
mapping
Metadata schemas
transformed
•Dublin Core
•A2A
•EAD
•Custom from Memorix
•Custom Catalan metadata schema
•Topx (in progress)
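One mapping per metadata schema could look roughly like the sketch below. It is only an illustration under stated assumptions: the source field names (`titol`, `autor`, `municipi`), the mapping tables and the simplified output structure are hypothetical, and only a handful of EDM properties are shown.

```python
# Hypothetical mapping tables: source field -> property on the EDM ProvidedCHO.
MAPPINGS = {
    "dublin_core": {"title": "dc:title", "creator": "dc:creator", "date": "dc:date"},
    "custom": {"titol": "dc:title", "autor": "dc:creator", "municipi": "dcterms:spatial"},
}

def to_edm(record, schema, data_provider):
    """Map one source record to a simplified EDM-like dictionary."""
    mapping = MAPPINGS[schema]
    cho = {"rdf:type": "edm:ProvidedCHO"}
    unmapped = {}
    for field, value in record.items():
        target = mapping.get(field)
        if target is None:
            unmapped[field] = value  # kept aside so nothing is silently lost
        else:
            cho.setdefault(target, []).append(value)
    aggregation = {"rdf:type": "ore:Aggregation", "edm:dataProvider": data_provider}
    return {"cho": cho, "aggregation": aggregation, "unmapped": unmapped}
```

Keeping the unmapped fields visible helps when looking at examples to decide the mapping, since it shows which source fields still have no EDM target.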
Local Data Lake
Document specification describing metadata schema mapping to EDM
Deduplication inputs challenges
COLLECTION 1
WHERE, WHO, WHAT
Basílica
COLLECTION 2
WHO, WHERE, WHEN
1882-2026?
Desired object to Data Lake
WHERE, WHO, WHAT
Basílica
WHEN
1882-2026?
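The idea of building one desired object from two partial descriptions can be sketched as follows. This is an assumption-laden illustration, not the project's deduplication logic: records are matched by exact equality on the shared fields, which would be too strict for real data with misspellings, and the sample values other than those on the slide are placeholders.

```python
def merge_objects(records, match_fields):
    """Merge records that agree on all match_fields into one object each.

    Fields missing in one record are filled from the others; differing
    values are kept side by side so that a person can review them later.
    """
    merged = {}
    for record in records:
        key = tuple(record.get(field) for field in match_fields)
        target = merged.setdefault(key, {})
        for field, value in record.items():
            values = target.setdefault(field, [])
            if value not in values:
                values.append(value)
    return list(merged.values())

# Placeholder values for WHERE and WHO; WHAT and WHEN come from the slide.
collection_1 = {"where": "place A", "who": "person B", "what": "Basílica"}
collection_2 = {"who": "person B", "where": "place A", "when": "1882-2026?"}
desired = merge_objects([collection_1, collection_2], ("where", "who"))
```

Here `desired` holds a single object that carries WHAT from the first collection and WHEN from the second.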
How to use mapping tool
http://github.com/CSUC • ECHOES 1,2
Mapping and validation tools graphic user interface
Data Lake module
DATA LAKE
Contains data from different sources
in EDM
Data Lake Module
Data Lake stores a large amount of data
Data comes from different sources
All data is in the same format: EDM
Data Lake
Module
EDM metadata schema
Data Lake
The analysis proposed using DSpace
But when we started working on it
•The Dublin Core mapping worked fine
•A2A needs to store relations between data
Technologies
Study and test graph database tool
•Behavior using real data
•Performance tests
•API
Graph database
Blazegraph database
•Standards-based
•High-performance
•Scalable
Open source
Written entirely in Java
Supports
•Blueprints
•RDF/SPARQL 1.1 family of specifications
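To illustrate why a graph model suits data with many relations, the toy example below stores statements as (subject, predicate, object) triples and retrieves them by pattern. It is an in-memory sketch in plain Python and does not use Blazegraph; the `ex:` identifiers and the label value are hypothetical.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return the triples matching a pattern; None acts as a wildcard."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# Hypothetical data: one cultural object linked to one agent.
triples = [
    ("ex:cho1", "rdf:type", "edm:ProvidedCHO"),
    ("ex:cho1", "dc:contributor", "ex:agent1"),
    ("ex:agent1", "rdf:type", "edm:Agent"),
    ("ex:agent1", "skos:prefLabel", "person B"),
]

# Which agents contributed to ex:cho1?
contributors = match(triples, subject="ex:cho1", predicate="dc:contributor")
```

A new kind of relation is just another triple, so no schema change is needed, which is the property a relational layout makes harder to get.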
Data retrieval and visualization module
SPARQL ENDPOINT
Open your collections and link them
to the world
WEB PORTAL
Modular and extensible
architecture
Data retrieval and visualization module
The analysis phase proposed
Web Portal
SPARQL Endpoint
(as Data Lake) (Good integration with DSpace)
But the proposal for the Data Lake was changed
Data retrieval and
visualization Module
Data retrieval and visualization module
SPARQL ENDPOINT
Open your collections and link them
to the world
SPARQL Endpoint
Requirements
•Allow exporting data to the semantic web
•APIs to export information to web pages or widgets
•Triple store database
•SPARQL
Proposal
•YASGUI suite
– Query Editor YASQE
– Result Set Visualizer YASR
SPARQL Endpoint
PREFIX rdaGr2: <http://rdvocab.info/ElementsGr2/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>

# Count agent gender values over contributed objects,
# grouped by gender, data provider and intermediate provider
SELECT ?gender (COUNT(?gender) AS ?Count)
WHERE {
  ?agent a edm:Agent ;
         rdaGr2:gender ?gender .
  ?provided a edm:ProvidedCHO ;
            dc:contributor ?agent .
  ?aggregacio edm:aggregatedCHO ?provided ;
              edm:dataProvider ?institucio ;
              edm:intermediateProvider ?col
}
GROUP BY ?gender ?institucio ?col
ORDER BY DESC(?Count)
EDM AGENT GENDER
http://blazegraph.pre.csuc.cat/echoes/short/Hk_GRIbxf
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>

SELECT DISTINCT ?place WHERE {
  ?s a edm:Place ;
     skos:prefLabel ?place .
}
LIMIT 10
PLACES
http://blazegraph.pre.csuc.cat/echoes/short/S1xkWRZxG
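Queries like the two above could be sent to an endpoint from a script. The sketch below uses only the Python standard library and follows the general SPARQL 1.1 Protocol (query passed as a `query` parameter, results requested as SPARQL JSON); the endpoint URL `http://example.org/sparql` is a placeholder, not the ECHOES endpoint, and the function names are made up for this example.

```python
import json
from urllib.parse import urlencode
from urllib.request import Request

def build_request(endpoint, query):
    """Build a GET request that passes the query as the `query` parameter."""
    url = endpoint + "?" + urlencode({"query": query})
    return Request(url, headers={"Accept": "application/sparql-results+json"})

def rows(results_json):
    """Flatten a SPARQL JSON result document into a list of plain dictionaries."""
    document = json.loads(results_json)
    return [
        {name: cell["value"] for name, cell in binding.items()}
        for binding in document["results"]["bindings"]
    ]

# Placeholder endpoint; sending the request would need urllib.request.urlopen(request).
request = build_request(
    "http://example.org/sparql",
    "SELECT DISTINCT ?place WHERE { ?s ?p ?place } LIMIT 10",
)
```

Separating request building from result parsing keeps both parts testable without network access.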
Data retrieval and visualization module
WEB PORTAL
Modular and extensible
architecture
Web Portal
Requirements
•Browse, access, and search the contents
•Different visualization tools
•Advanced search functionalities
Proposal
•WordPress
– CMS capabilities to manage web pages
– Customizable by creating plugins
– Retrieve data using the Blazegraph API
– Visualization
Visualization focus
Echoes portal
Data visualization
PLACES
EDM: Place
• Metadata
• Relations
Source
• Map
TIME
EDM: TimeSpan
• Metadata
• Relations
Source
• Timeline
• Heat map
CULTURAL OBJECTS
EDM: ProvidedCHO
• Metadata
• Relations
Source
• Map
PEOPLE
EDM: Agent
• Metadata
• Relations
Source
• Graph
Places visualization
Map
Search
•Using keywords
•Also allows selecting
a region
•Filter by type
Tooltip
•information
related to place
•Show more
option
Download
results in JSON,
CSV…
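A download option of this kind can be sketched with the standard library alone. The example assumes the search results are already available as a list of dictionaries; the column names and sample values are hypothetical.

```python
import csv
import io
import json

def to_json(rows):
    """Serialize result rows as JSON, keeping non-ASCII characters readable."""
    return json.dumps(rows, ensure_ascii=False, indent=2)

def to_csv(rows, columns):
    """Serialize result rows as CSV; missing values become empty cells."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=columns, extrasaction="ignore", lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()

# Hypothetical search results.
results = [{"place": "Leiden", "type": "city"}, {"place": "Basílica"}]
csv_text = to_csv(results, ["place", "type"])
```

Passing the column list explicitly fixes the column order and tolerates rows that lack some fields, which is common with heterogeneous sources.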
Cultural object visualization
Graph
Related
information
shown as a
graph
Tooltip
•Detailed
information
•Show more
options
Download results
in JSON, CSV…
Time span visualization
Timeline
Search
•Between dates
•Period
Tooltip
•Information
related to place
•Show more
option
Time span visualization
Timeline
Search
•Between dates
Period shown
under years
Banner
•information
related to place
•Show more
option
•Click to go to next
Time span visualization
Heat map
Search
•Between dates
On mouse over a day box, show the number of related providedCHO and a link to show them
Color darkens depending on the number of occurrences
Selecting a date from the calendar lists the related providedCHO
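The heat map logic (count objects per day, darker color for more occurrences) can be sketched as below. The number of shade levels and the function names are assumptions; the slides do not say how the portal computes its colors.

```python
from collections import Counter

def day_counts(dates):
    """Count how many objects are related to each day (ISO date strings)."""
    return Counter(dates)

def shade(count, max_count, levels=5):
    """Map a count to a shade from 0 (empty) to `levels` (darkest)."""
    if count <= 0 or max_count <= 0:
        return 0
    # Ceiling division, so any non-zero count gets at least shade 1.
    return -(-count * levels // max_count)

# Hypothetical dates attached to three objects.
counts = day_counts(["1882-03-19", "1882-03-19", "1882-03-20"])
darkest = max(counts.values())
shades = {day: shade(n, darkest) for day, n in counts.items()}
```

Scaling against the busiest day keeps the full color range in use whatever the size of the collection.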
Time span visualization
Timespan
Search
•Between dates
Show
providedCHO by
year
•Show more
options
Tooltip
•Detailed
information
Agent visualization
https://echoes.pre.csuc.cat/agents/demo/
• Uses the same library used to graph the relations in the CHO details, with icons, different types of relations and pseudo-hierarchy.
• Tooltip can be included
Graph
Graph to show
agent relations in
a providedCHO
Tooltip
•Detailed
information
Agent visualization
Family tree
Agents shown as
a simple hybrid
graph/tree
hierarchy
depending on their
relation
Colored by gender
Agent visualization
https://echoes.pre.csuc.cat/agents/demo_three/
Family tree
•Information shown as a left-to-right tree
•Additional agent details are shown
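A family tree view needs the agent relations arranged as a hierarchy first. The sketch below turns (parent, child) pairs into a nested structure of the `name`/`children` kind that tree-drawing code commonly consumes; the relation format and agent names are hypothetical, and the slides do not say how the portal prepares its data.

```python
def build_tree(relations, root):
    """Build a nested {"name", "children"} tree from (parent, child) pairs."""
    children = {}
    for parent, child in relations:
        children.setdefault(parent, []).append(child)

    def node(name, seen):
        if name in seen:  # guard against cycles in noisy data
            return {"name": name, "children": []}
        seen = seen | {name}
        return {
            "name": name,
            "children": [node(child, seen) for child in children.get(name, [])],
        }

    return node(root, frozenset())

# Hypothetical agents A to D.
tree = build_tree([("A", "B"), ("A", "C"), ("B", "D")], "A")
```

The cycle guard matters with aggregated genealogical data, where a mistaken record could otherwise make the recursion endless.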
Enrichments module
AUTOMATIC
Predefined processes
include new metadata
related to objects
MANUAL
User empowerment
Enrichments module
AUTOMATIC
Predefined processes
include new metadata
related to objects
Enrichments
Module
Which
•Select metadata fields
When
•New metadata can be
incorporated as a
– preprocess
– post process
How
•Reuse existing fields
•Create new metadata
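As an illustration of an automatic enrichment that both reuses an existing field and creates new metadata, the sketch below standardizes a place name and adds coordinates from a lookup table. The gazetteer is a hypothetical stand-in with approximate values; a real process might query an external source such as GeoNames instead.

```python
# Hypothetical gazetteer with approximate coordinates.
GAZETTEER = {"leiden": {"label": "Leiden", "lat": 52.16, "lon": 4.49}}

def enrich_place(record, gazetteer, field="place"):
    """Return a copy of the record, enriched when the place is known."""
    enriched = dict(record)
    key = str(record.get(field, "")).strip().lower()
    hit = gazetteer.get(key)
    if hit:
        enriched[field] = hit["label"]  # reuse an existing field (standardized label)
        enriched["lat"] = hit["lat"]    # create new metadata
        enriched["lon"] = hit["lon"]
    return enriched

enriched = enrich_place({"place": " LEIDEN "}, GAZETTEER)
```

Because the function takes and returns a plain record, it could run either as a preprocess before loading or as a post process over stored data.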
Enrichments module
MANUAL
User empowerment
Coming soon…
Agenda
1. About Echoes
2. Analysis
3. Development
4. Conclusions
Lessons learned
Challenges in ECHOES development
Sprint 4: A2A relations don’t fit in a relational database
Sprint 11: Data quality vs Data quantity
Sprint 14: Create unique objects using metadata from
many sources
ECHOES architecture
Upcoming challenges
Automatic enrichments
•When
•How
•Sources
Manual enrichments
•Review user proposal
•Load new data in
Data Lake