Echoes Project

CSUC_info 540 views 67 slides Jul 04, 2018

About This Presentation

Presentation of the European ECHOES project given on 28 June 2018 in Leiden (Netherlands), where CSUC presented the project's objectives and main features to Dutch technology companies.


Slide Content

Empowering Communities with a
Heritage Open Ecosystem
28th June 2018
ECHOES PROJECT
Technological partner

Agenda
1. About Echoes
2. Analysis
3. Development
4. Conclusions

Agenda
1. About Echoes
1.1. Scope
1.2. Design principles
1.3. Interoperability network

Project scope
ECHOES tries to
•provide a modular IT architecture
•based on open source
•to heritage collection holders
•that functions as a digital ecosystem for a broad
range of user communities
•allowing them to
– take an active role
– be able to enrich digital collections

Main goals
OPEN
Open your collections
and link them to the
world
MODULAR
Modular and extensible
architecture
INNOVATION
New ways of searching
and displaying information

Design principles
STANDARD
Use EDM data
model as standard
metadata schema
TRIPLES
Data transformed
to LOD/RDF triples
DEVELOPMENT
Agile development
methodologies
USER
User centered
design
MODULAR
Block by block
approach

Interoperability network
Each “echoes hub” can
•manage different collections
•use all or only a part of the functionalities

Agenda
2. Analysis
2.2. Tasks
2.3. Proposal
2.4. Technical architecture

Analysis phase
Technical architecture
•Study possible technologies
– Data structures
– Interoperability
•Component structure
Study some references
•Europeana
•LoCloud
•Catalan Research Portal
Development proposal for each component
•Scope and functionalities
•Tools and technologies
•Risk analysis

Proposal
STANDARD
EDM metadata schema
as interoperability for
• inputs
• enrichments
TECHNOLOGIES
Proposed technologies and tools:
• DSpace
• Apache Fuseki
• MySQL
• MINT, OntoWiki, Hub3...
• GeoNames, DBpedia...
• Zooniverse
MODULAR
Four main modules
• Data Sources
• Enrichments
• Data Lake
• Data retrieval and
visualization

Proposed architecture
Vale Handen

Agenda
3. Development
3.1. Methodology
3.2. Data sources module
3.3. Data Lake module
3.4. Data retrieval and visualization module
3.5. Enrichment module

Methodology
Agile development
•Develop a prototype with the minimum requirements
•Test the prototype
•Add new features and improvements on each iteration
Main goal
•The developed product is better suited to the needs
19 sprints
7 releases
1 MVP

Project planning

Agenda
3. Development
3.1. Methodology
3.2. Data sources module
3.3. Data Lake module
3.4. Data retrieval and visualization module
3.5. Enrichment module

Data source module
INPUTS
Collections from different
sources defined in many
metadata schemas
TOOLS
Mapping and
transformation tools to
prepare data
Data Source
Module

Inputs
INPUTS
Collections from different
sources defined in many
metadata schemas
ELO
•17 collections
•Dublin Core, A2A, EAD, Custom
metadata schema
•Source: OAI, files
Tresoar
•32 collections
•A2A
•Source: OAI
Gencat
•1 collection
•Custom metadata schema
•Source: Excel file
DIBA
•10 collections
•Custom metadata schema
•Source: Access file

Inputs
We put data from the different inputs on the
system and... our plan was to work on
aggregated visualization
What really happens...
• Too much data, difficult to explore using conventional tools
• Too much heterogeneous data, silos of information
• Poor data quality (date formats, misspellings, different kinds
of geolocations)

Inputs
•We need to improve the data quality
•Data profiling
•Define data formats
•Transform the data
•Data cleansing
•Data standardization
•Data validation
→First approach
→Inputs (examples) → directly mapping to EDM
→ Data Lake → chaos
→Second approach
→Define standard mapping on each format and create validators
→Inputs (examples) → validator → if ok → Data Lake → no chaos :)
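The second approach above can be sketched as a small validation gate in front of the Data Lake. This is a minimal illustration, not the ECHOES validators: the required fields (`title`, `date`), the ISO date rule, the sample records, and the function names `validate_record` and `load_if_valid` are all assumptions made for the example.

```python
from datetime import date

# Hypothetical rules for illustration; the real ECHOES validators are not shown in the slides.
REQUIRED_FIELDS = ("title", "date")


def validate_record(record: dict) -> list[str]:
    """Return the problems found in one mapped record (empty list means valid)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not str(record.get(field, "")).strip():
            problems.append(f"missing field: {field}")
    raw_date = str(record.get("date", "")).strip()
    if raw_date:
        try:
            date.fromisoformat(raw_date)  # assumed standard format: YYYY-MM-DD
        except ValueError:
            problems.append(f"bad date format: {raw_date}")
    return problems


def load_if_valid(records: list[dict], data_lake: list[dict]) -> list[tuple[dict, list[str]]]:
    """Append valid records to the data lake; return rejected records with their problems."""
    rejected = []
    for record in records:
        problems = validate_record(record)
        if problems:
            rejected.append((record, problems))
        else:
            data_lake.append(record)
    return rejected


# Sample records for illustration only.
lake: list[dict] = []
rejected = load_if_valid(
    [
        {"title": "Basílica", "date": "1882-03-19"},
        {"title": "Basílica", "date": "19/03/1882"},
    ],
    lake,
)
print(len(lake), "loaded,", len(rejected), "rejected")
```

Rejected records are returned together with their problems, so they could be reported back to the collection holder instead of being silently dropped.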

Tools
TOOLS
Mapping and
transformation tools to
prepare data
Transform inputs to EDM
•Create mapping tool for each
metadata schema
•Look at examples to decide
mapping
Metadata schemas
transformed
•Dublin Core
•A2A
•EAD
•Custom from Memorix
•Custom Catalan metadata schema
•Topx (work in progress)
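A per-schema mapping tool of the kind listed above can be pictured as a lookup table from source field names to EDM property names. The sketch below is only an illustration under stated assumptions: the source field names (`nom`, `autor`, `any`), the `MAPPINGS` table, and the function `to_edm` are hypothetical and do not reproduce the ECHOES mapping specifications.

```python
# Hypothetical field mappings for illustration only; the actual ECHOES mapping
# specifications are not reproduced in the slides.
MAPPINGS = {
    "dublin_core": {"title": "dc:title", "creator": "dc:creator", "date": "dc:date"},
    "custom": {"nom": "dc:title", "autor": "dc:creator", "any": "dc:date"},
}


def to_edm(record: dict, schema: str) -> dict:
    """Rename the fields of one source record to EDM property names."""
    mapping = MAPPINGS[schema]
    edm = {"rdf:type": "edm:ProvidedCHO"}
    for source_field, value in record.items():
        target = mapping.get(source_field)
        if target is not None:  # unmapped fields are dropped in this sketch
            edm[target] = value
    return edm


print(to_edm({"nom": "Basílica", "any": "1882"}, "custom"))
```

Keeping one table per schema means a new input format only needs a new entry, which matches the block-by-block approach of the project.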

Local Data Lake
Document specification describing metadata schema mapping to EDM

Deduplication inputs challenges
[Figure: COLLECTION 1 describes an object with WHERE, WHO and WHAT (Basílica); COLLECTION 2 describes it with WHO, WHERE and WHEN (1882-2026?). The desired object to Data Lake combines WHERE, WHO, WHAT (Basílica) and WHEN (1882-2026?).]
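Once two records are known to describe the same object, building the desired object is a field-by-field merge. The sketch below assumes that matching has already been done (that is the hard part and is not shown); the function `merge_records`, the rule "first non-empty value wins", and the placeholder values `place A` and `agent A` are assumptions for illustration.

```python
def merge_records(records: list[dict]) -> dict:
    """Merge records describing the same object, keeping the first non-empty value per field."""
    merged: dict = {}
    for record in records:
        for field, value in record.items():
            if value and field not in merged:
                merged[field] = value
    return merged


# "Basílica" and "1882-2026?" come from the slide; the other values are placeholders.
collection_1 = {"where": "place A", "who": "agent A", "what": "Basílica"}
collection_2 = {"who": "agent A", "where": "place A", "when": "1882-2026?"}

print(merge_records([collection_1, collection_2]))
```

A real merge would also need a policy for conflicting values, for example when two collections give different dates for the same object.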

How to use mapping tool
http://github.com/CSUC • ECHOES 1,2

Mapping and validation tools graphic user interface

Agenda
3. Development
3.1. Methodology
3.2. Data sources module
3.3. Data Lake module
3.4. Data retrieval and visualization module
3.5. Enrichment module

Data Lake module
DATA LAKE
Contains data from different sources
in EDM

Data Lake Module
The Data Lake stores a large amount of data
Data comes from different sources
All data is in the same format: EDM
Data Lake
Module

EDM metadata schema

Data Lake
The analysis proposed using DSpace
But once we started working on it
•Dublin Core mapping worked fine
•A2A needs to store relations between data

Technologies
Study and test graph database tool
•Behavior using real data
•Performance tests
•API

Graph database
Blazegraph database
•standards-based
•high-performance
•scalable
Open-source
Written entirely in Java
Supports
•Blueprints
•RDF/SPARQL 1.1 family of specifications

Agenda
3. Development
3.1. Methodology
3.2. Data sources module
3.3. Data Lake module
3.4. Data retrieval and visualization module
3.5. Enrichment module

Data retrieval and visualization module
SPARQL ENDPOINT
Open your collections and link them
to the world
WEB PORTAL
Modular and extensible
architecture

Data retrieval and visualization module
The analysis phase proposed
Web Portal
SPARQL Endpoint
(as Data Lake) (Good integration with DSpace)
But the proposal for the Data Lake was changed
Data retrieval and
visualization Module

Data retrieval and visualization module
SPARQL ENDPOINT
Open your collections and link them
to the world

SPARQL Endpoint
Requirements
•Allow exporting data to the semantic web
•APIs to export information to web pages or widgets
•Triple store database
•SPARQL
Proposal
•YASGUI suite
– Query Editor YASQE
– Result Set Visualizer YASR

SPARQL Endpoint
EDM AGENT GENDER
# Count agent contributions by gender, grouped by data provider and
# intermediate provider (only the gender and the count are returned).
PREFIX rdaGr2: <http://rdvocab.info/ElementsGr2/>
PREFIX dc: <http://purl.org/dc/elements/1.1/>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
SELECT ?gender (COUNT(?gender) AS ?Count)
WHERE {
  ?agent a edm:Agent ;
         rdaGr2:gender ?gender .
  ?provided a edm:ProvidedCHO ;
            dc:contributor ?agent .
  ?aggregacio edm:aggregatedCHO ?provided ;
              edm:dataProvider ?institucio ;
              edm:intermediateProvider ?col .
}
GROUP BY ?gender ?institucio ?col
ORDER BY DESC(?Count)
http://blazegraph.pre.csuc.cat/echoes/short/Hk_GRIbxf
PLACES
# List up to 10 distinct preferred labels of places.
PREFIX rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#>
PREFIX rdfs: <http://www.w3.org/2000/01/rdf-schema#>
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?place WHERE {
  ?s a edm:Place ;
     skos:prefLabel ?place .
}
LIMIT 10
http://blazegraph.pre.csuc.cat/echoes/short/S1xkWRZxG
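Queries like these can also be sent by a program rather than through the query editor. The sketch below shows, under assumptions, how a client might build a SPARQL 1.1 Protocol GET URL and read a response in the SPARQL 1.1 JSON results format. The endpoint URL `http://example.org/sparql` is a placeholder (the slides only give short links), the sample response is hand-written rather than real ECHOES data, and the helper names are hypothetical. No network request is made.

```python
import json
from urllib.parse import urlencode

# Placeholder endpoint for illustration; not the real ECHOES endpoint.
ENDPOINT = "http://example.org/sparql"

PLACES_QUERY = """
PREFIX edm: <http://www.europeana.eu/schemas/edm/>
PREFIX skos: <http://www.w3.org/2004/02/skos/core#>
SELECT DISTINCT ?place WHERE {
  ?s a edm:Place ;
     skos:prefLabel ?place .
}
LIMIT 10
"""


def build_query_url(endpoint: str, query: str) -> str:
    """Build a SPARQL 1.1 Protocol GET URL (the query goes in the `query` parameter)."""
    return endpoint + "?" + urlencode({"query": query})


def column(results_json: str, variable: str) -> list[str]:
    """Extract one variable's values from a SPARQL 1.1 JSON results document."""
    bindings = json.loads(results_json)["results"]["bindings"]
    return [row[variable]["value"] for row in bindings if variable in row]


# Hand-written sample in the SPARQL JSON results format, not real ECHOES data.
sample = json.dumps({
    "head": {"vars": ["place"]},
    "results": {"bindings": [
        {"place": {"type": "literal", "value": "Leiden"}},
        {"place": {"type": "literal", "value": "Barcelona"}},
    ]},
})

print(build_query_url(ENDPOINT, PLACES_QUERY))
print(column(sample, "place"))
```

A web page or widget would fetch the built URL with an `Accept: application/sparql-results+json` header and pass the response body to `column`.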

Data retrieval and visualization module
WEB PORTAL
Modular and extensible
architecture

Web Portal
Requirements
•Browse, access, and search the contents
•Different visualization tools
•Advanced search functionalities
Proposal
•WordPress
– CMS capabilities to manage web pages
– Customizable by creating plugins
– Retrieve data using the Blazegraph API
– Visualization

Visualization focus

Echoes portal

Echoes portal

Echoes portal

Data visualization
PLACES
EDM: Place
• Metadata
• Relations
Source
• Map
TIME
EDM: TimeSpan
• Metadata
• Relations
Source
• Timeline
• Heat map
CULTURAL OBJECTS
EDM: ProvidedCHO
• Metadata
• Relations
Source
• Map
PEOPLE
EDM: Agent
• Metadata
• Relations
Source
• Graph

Places visualization
Map
Search
•Using keywords
•Also allows selecting
a region
•Filter by type
Tooltip
•information
related to place
•Show more
option
Download
results in JSON,
CSV…

Cultural object visualization
Graph
Related
information
shown as a
graph
Tooltip
•Detailed
information
•Show more
options
Download results
in JSON, CSV…

Time span visualization
Timeline
Search
•Between dates
•Period
Tooltip
•information
related to place
•Show more
option

Time span visualization
Timeline
Search
•Between dates
Period shown
under years
Banner
•information
related to place
•Show more
option
•Click to go to next

Time span visualization
Heat map
Search
•Between dates
Hovering over a day
box shows the number of
related providedCHO
and a link to show
them
Color darkens
depending on number
of occurrences
Selecting a date from
the calendar lists the
related providedCHO
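The data behind such a heat map is a count of providedCHO per day, scaled to a small number of color levels. The sketch below is one possible way to compute it, not the portal's implementation: the function `shade_level`, the choice of four levels, and the sample dates are assumptions for illustration.

```python
import math
from collections import Counter
from datetime import date


def shade_level(count: int, max_count: int, levels: int = 4) -> int:
    """Map a count to a color level from 0 (empty) to `levels` (darkest)."""
    if count <= 0 or max_count <= 0:
        return 0
    # Round up so that any non-zero count gets at least level 1.
    return min(levels, math.ceil(count * levels / max_count))


# Sample dates for illustration only, one entry per providedCHO.
cho_dates = [date(1900, 1, 1), date(1900, 1, 1), date(1900, 1, 2)]

per_day = Counter(cho_dates)      # number of providedCHO for each day box
busiest = max(per_day.values())   # the darkest color goes to the busiest day
for day, count in sorted(per_day.items()):
    print(day.isoformat(), count, shade_level(count, busiest))
```

Scaling against the busiest day keeps the darkest color in use whatever the size of the selected date range.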

Time span visualization
Timespan
Search
•Between dates
Show
providedCHO by
year
•Show more
options
Tooltip
•Detailed
information

Agent visualization
https://echoes.pre.csuc.cat/agents/demo/
• Using the same library used to graph the relations in the CHO details, with icons, different types of relations and pseudo-hierarchy.
• Tooltip can be included
Graph
Graph to show
agent relations in
a providedCHO
Tooltip
•Detailed
information

Agent visualization
Family tree
Agents shown as
a simple hybrid
graph/tree
hierarchy
depending on their
relation
Colored by gender

Agent visualization
https://echoes.pre.csuc.cat/agents/demo_three/
Family tree
• Information shown as a left-to-right tree
• Additional agent details are shown

Agenda
3. Development
3.1. Methodology
3.2. Data sources module
3.3. Data Lake module
3.4. Data retrieval and visualization module
3.5. Enrichment module

Enrichments module
AUTOMATIC
Predefined processes
include new metadata
related to objects
MANUAL
User empowerment

Enrichments module
AUTOMATIC
Predefined processes
include new metadata
related to objects
Enrichments
Module
Which
•Select metadata fields
When
•New metadata can be
incorporated as a
– pre-process
– post-process
How
•Reuse existing fields
•Create new metadata

Enrichments module
Possible enrichments
• Place: TGN, GeoNames, Pleiades, HPN…
• Agent: VIAF, ULAN, GND, Wikidata...
• TimeSpan: PeriodO, ChronOntology...
• Concepts: LCSH
• General: Getty, Biotechnology Glossary, DBpedia,
EUROVOC, Geopolitical Ontology
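An automatic place enrichment can be pictured as a lookup that adds an identifier next to an existing label. The sketch below is a simplified illustration under assumptions: a real process would query a service such as GeoNames, whereas here a local table stands in for it; the `GAZETTEER` entries use placeholder `example.org` URIs rather than real identifiers, and the field names `place` and `place_uri` are hypothetical.

```python
# Hypothetical gazetteer: a stand-in for an external source such as GeoNames.
# The URIs below are placeholders, not real identifiers.
GAZETTEER = {
    "leiden": "http://example.org/place/leiden",
    "barcelona": "http://example.org/place/barcelona",
}


def enrich_place(record: dict) -> dict:
    """Return a copy of the record with a place URI added when the label is known."""
    enriched = dict(record)
    label = str(record.get("place", "")).strip().lower()
    uri = GAZETTEER.get(label)
    if uri is not None:
        enriched["place_uri"] = uri  # new metadata; the existing field is kept as-is
    return enriched


print(enrich_place({"title": "Basílica", "place": " Barcelona "}))
print(enrich_place({"title": "Unknown", "place": "Atlantis"}))
```

Because the function returns a copy and never overwrites the source label, it could run either as a pre-process before loading or as a post-process over records already in the Data Lake.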

Enrichments module
MANUAL
User empowerment
Coming soon…

Agenda
1. About Echoes
2. Analysis
3. Development
4. Conclusions

Lessons learned

Challenges in ECHOES development
Sprint 4: A2A relations don't fit in a relational database
Sprint 11: Data quality vs Data quantity
Sprint 14: Create unique objects using metadata from
many sources

ECHOES architecture

Upcoming challenges
Automatic enrichments
•When
•How
•Sources
Manual enrichments
•Review user proposal
•Load new data in
Data Lake

Lessons learned
Modular architecture allows us to
•Change technologies
•Add new pieces to modules
•Add new functionalities

Lessons learned
Quality
Homogeneous
Schema
Quantity
Heterogeneous
Examples

Thanks for your
attention
Internal use