web-of-dddddddddddddddddddddddddddddddddata-intro.pdf

testaccount387216 6 views 85 slides Sep 30, 2024

Introduction to Semantic Web Technologies & Linked Data
Oktie Hassanzadeh, University of Toronto
March 2011
CS 443: Database Management Systems, Winter 2011

2
Outline
- Introduction
- Semantic Web Technologies
  - Resource Description Framework (RDF)
  - Querying RDF data (SPARQL)
- Linked Data
  - Linked Data Principles
  - Linking Open Data Community Project
  - Example Data Sources
  - Example Applications

3
Introduction
Web of Documents vs. Web of Data

4
Web of Documents
- Primary objects: documents
- Links between documents (or parts of them)
- Degree of structure in data: fairly low
- Implicit semantics of contents
- Designed for: human consumption
[Figure: four sources A, B, C, D (three HTML, one API/XML) connected by untyped links]
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

5
Web of Documents: Problem
[Figure: sources A, B, C, D (three HTML, one API/XML) joined by untyped links; question marks surround the “things” they mention]
Are two documents talking about the same “thing”?
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

6
Example Query
Elvis Presley
1935-1977
Will there ever be someone like him again?
Based on presentation by Lauw, Schenkel, Suchanek, Theobald and Weikum, available at http://www.mpi-inf.mpg.de/yago-naga/CIKM10-tutorial/

7
Example Query
Another Elvis
Elvis Presley: The Early Years
Elvis spent more weeks at the top of the charts than any other artist.
www.fiftiesweb.com/elvis.htm
Based on presentation by Lauw, Schenkel, Suchanek, Theobald and Weikum, available at http://www.mpi-inf.mpg.de/yago-naga/CIKM10-tutorial/

8
Example Query
Personal relationships of Elvis Presley – Wikipedia
... when Elvis was a young teen. ... another girl whom the singer's mother hoped Presley would .... The writer called Elvis "a hillbilly cat”
en.wikipedia.org/.../Personal_relationships_of_Elvis_Presley
Another singer called Elvis, young
Based on presentation by Lauw, Schenkel, Suchanek, Theobald and Weikum, available at http://www.mpi-inf.mpg.de/yago-naga/CIKM10-tutorial/

9
Example Query
Dear Mr. Page, you don’t understand me. I just...
Elvis Presley - Official page for Elvis Presley
Welcome to the Official Elvis Presley Web Site, home of the undisputed King of Rock 'n' Roll and his beloved Graceland ...
www.elvis.com/
Based on presentation by Lauw, Schenkel, Suchanek, Theobald and Weikum, available at http://www.mpi-inf.mpg.de/yago-naga/CIKM10-tutorial/

10
Example Query
- How about this query:
  - How many romantic comedy Hollywood movies are directed by a person who was born in a city that has an average temperature above 15 degrees?
- You need to:
  - Find reliable sources containing facts about movies (genre & director), birthplaces of famous artists/directors, average temperatures of cities across the world, etc.
  - The result: several lists of thousands of facts
  - Integrate all the data, joining the facts that come from heterogeneous sources
Even if possible, it may take days to answer just a single query!
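
The manual integration described on this slide can be sketched as a join over three small fact lists. Everything below is invented for illustration only (the titles, people, cities and temperatures are hypothetical placeholders); the point is the shape of the join, not the data.

```python
# Hypothetical facts, as if collected by hand from three separate sources.
movies = [  # (title, genre, director)
    ("Movie A", "romantic comedy", "Director X"),
    ("Movie B", "romantic comedy", "Director Y"),
    ("Movie C", "thriller", "Director X"),
]
birthplaces = {"Director X": "City P", "Director Y": "City Q"}  # person -> city
avg_temp_c = {"City P": 18.5, "City Q": 9.0}  # city -> average temperature


def count_matching(movies, birthplaces, avg_temp_c, genre, min_temp):
    """Count movies of `genre` whose director was born in a city warmer than `min_temp`."""
    count = 0
    for _title, movie_genre, director in movies:
        city = birthplaces.get(director)   # join 1: director -> birthplace
        temp = avg_temp_c.get(city)        # join 2: birthplace -> temperature
        if movie_genre == genre and temp is not None and temp > min_temp:
            count += 1
    return count


print(count_matching(movies, birthplaces, avg_temp_c, "romantic comedy", 15))
```

With real sources, the hard part is not this loop but agreeing on keys: the same director or city may be named differently in each list.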

11
Solution: Web of Data
- Primary objects: “things” (or descriptions of things)
- Links between “things”
- Degree of structure: high (based on the RDF data model)
- Explicit semantics of contents and links
- Designed for: both machines and humans
[Figure: data sources A, B, C, D, each containing several “things”, connected by typed links]

12
Web of Data
[Figure: data sources A, B, C, D containing “things” connected by typed links, alongside HTML documents E, F, G]
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

Introduction to Semantic Web Technologies
Resource Description Framework (RDF)

14
Semantic Web Technologies
- A set of technologies and frameworks that enable the Web of Data:
  - Resource Description Framework (RDF)
  - A variety of data interchange formats (e.g. RDF/XML, N3, Turtle, N-Triples)
  - Notations such as RDF Schema (RDFS) and the Web Ontology Language (OWL)
- All are intended to provide a formal description of concepts, terms, and relationships within a given knowledge domain
Source: http://en.wikipedia.org/wiki/Semantic_Web

15
Resource Description Framework (RDF)
- Data model for describing “things” and their interrelations
- Consists of statements about “things” (Web resources) in the form of subject-predicate-object expressions, also known as triples
subject       predicate  object
Renee Miller  Teaches    CSC443
Renee Miller  Lives in   Toronto
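
As a rough illustration of the triple model, the two statements on this slide can be held as plain tuples. The `Triple` class and the variable names are illustrative choices, not part of RDF or of any library.

```python
from typing import NamedTuple


class Triple(NamedTuple):
    """One RDF-style statement: subject, predicate, object."""
    subject: str
    predicate: str
    object: str


triples = [
    Triple("Renee Miller", "Teaches", "CSC443"),
    Triple("Renee Miller", "Lives in", "Toronto"),
]

# Everything stated about one subject, as (predicate, object) pairs.
about_renee = [(t.predicate, t.object) for t in triples if t.subject == "Renee Miller"]
print(about_renee)
```

Plain strings are used here only for readability; in real RDF the subject and predicate are URIs, and the object is a URI or a literal.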

16
Resource Description Framework (RDF)
- Data model for describing “things” and their interrelations
- Consists of statements about “things” (Web resources) in the form of subject-predicate-object expressions, also known as triples
subject: <uri>    predicate: <uri>    object: <uri> or “literal”
<http://cs.toronto.edu/~miller> <http://xmlns.com/foaf/0.1/based_near> <http://dbpedia.org/resource/Toronto>
or
<http://cs.toronto.edu/~miller> <http://xmlns.com/foaf/0.1/based_near> “Toronto”
Uniform Resource Identifier (URI): a string of characters used to identify a name or a resource on the Internet.
http://en.wikipedia.org/wiki/Uniform_Resource_Identifier

17
A simple RDF example
[Graph, written as triples]
bb:renee-j-miller  rdf:type         foaf:Person
bb:renee-j-miller  foaf:name        “Renée J. Miller”
bb:renee-j-miller  foaf:based_near  dbpedia:Toronto
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

18
Data Items Identified with (HTTP) URIs
[Graph: bb:renee-j-miller with rdf:type foaf:Person, foaf:name “Renée J. Miller”, foaf:based_near dbpedia:Toronto]
bb:renee-j-miller = http://data.bibbase.org/author/renee-j-miller/
dbpedia:Toronto = http://dbpedia.org/resource/Toronto
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials
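
The abbreviated names on this slide stand for full HTTP URIs. A minimal sketch of that expansion follows, assuming a simple prefix table; the `expand` helper is hypothetical, and only the `dbpedia` and `rdf` prefixes are included because their namespace URIs are well known.

```python
PREFIXES = {
    "dbpedia": "http://dbpedia.org/resource/",
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
}


def expand(name, prefixes=PREFIXES):
    """Turn 'prefix:local' into a full URI by concatenating namespace and local part."""
    prefix, sep, local = name.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"unknown prefix in {name!r}")
    return prefixes[prefix] + local


print(expand("dbpedia:Toronto"))
print(expand("rdf:type"))
```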

19
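Dereferencing URIs over the Web
[Graph, written as triples; the dbpedia: statements are obtained by dereferencing dbpedia:Toronto]
bb:renee-j-miller  rdf:type            foaf:Person
bb:renee-j-miller  foaf:name           “Renée J. Miller”
bb:renee-j-miller  foaf:based_near     dbpedia:Toronto
dbpedia:Toronto    dp:populationUrban  4,753,120
dbpedia:Toronto    skos:subject        dp:Cities_in_Canada
dbpedia:Ottawa     skos:subject        dp:Cities_in_Canada
dbpedia:Montreal   skos:subject        dp:Cities_in_Canada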
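
Dereferencing is an ordinary HTTP GET in which the client states which representation it would like. The sketch below only prepares such a request with the standard library and does not send it; asking for `text/turtle` is an assumption about what a given server offers, and `build_rdf_request` is a hypothetical helper name.

```python
import urllib.request


def build_rdf_request(uri, accept="text/turtle"):
    """Prepare (but do not send) an HTTP GET that asks for an RDF representation."""
    return urllib.request.Request(uri, headers={"Accept": accept})


req = build_rdf_request("http://dbpedia.org/resource/Toronto")
print(req.get_method(), req.full_url, req.get_header("Accept"))

# To actually fetch the description (needs network access):
#   with urllib.request.urlopen(req) as response:
#       body = response.read()
```

A client that repeats this for every URI it finds in the returned triples is following the "look up the names" idea step by step.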

20
A Simple RDF Example (in RDF/XML)
[Graph: bb:renee-j-miller with rdf:type foaf:Person, foaf:name “Renée J. Miller”, foaf:based_near dbpedia:Toronto]
<?xml version="1.0"?>
<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
         xmlns:foaf="http://xmlns.com/foaf/0.1/"
         xmlns:bb="http://data.bibbase.org/ontology/">
  <rdf:Description rdf:about="http://.../author/renee-j-miller/">
    <rdf:type rdf:resource="http://xmlns.com/foaf/0.1/Person"/>
    <foaf:name xml:lang="en">Renée J. Miller</foaf:name>
    <foaf:based_near rdf:resource="http://dbpedia.org/resource/Toronto"/>
  </rdf:Description>
</rdf:RDF>

21
A Simple RDF Example (in Turtle)
[Graph: bb:renee-j-miller with rdf:type foaf:Person, foaf:name “Renée J. Miller”, foaf:based_near dbpedia:Toronto]
@prefix rdf: <http://www.w3.org/1999/02/22-rdf-syntax-ns#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix bb: <http://data.bibbase.org/ontology/> .
<http://data.bibbase.org/author/renee-j-miller/>
    rdf:type foaf:Person ;
    foaf:name "Renée J. Miller"@en ;
    foaf:based_near <http://dbpedia.org/resource/Toronto> .
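
For comparison with Turtle, the same three statements can be written in N-Triples, where every term is spelled out in full and each line is one triple. The sketch below builds those lines by hand; it uses the published FOAF namespace `http://xmlns.com/foaf/0.1/`, the helper names `iri` and `literal` are illustrative, and the escaping covers only backslashes and double quotes.

```python
RDF = "http://www.w3.org/1999/02/22-rdf-syntax-ns#"
FOAF = "http://xmlns.com/foaf/0.1/"
AUTHOR = "http://data.bibbase.org/author/renee-j-miller/"


def iri(value):
    """Wrap a full URI in angle brackets."""
    return f"<{value}>"


def literal(text, lang=None):
    """Quote a string literal, optionally with a language tag."""
    escaped = text.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"' + (f"@{lang}" if lang else "")


triples = [
    (iri(AUTHOR), iri(RDF + "type"), iri(FOAF + "Person")),
    (iri(AUTHOR), iri(FOAF + "name"), literal("Renée J. Miller", "en")),
    (iri(AUTHOR), iri(FOAF + "based_near"), iri("http://dbpedia.org/resource/Toronto")),
]

# One "subject predicate object ." statement per line.
ntriples = "\n".join(f"{s} {p} {o} ." for s, p, o in triples)
print(ntriples)
```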

22
A Simple RDF Example (in RDFa)
[Graph: bb:renee-j-miller with rdf:type foaf:Person, foaf:name “Renée J. Miller”, foaf:based_near dbpedia:Toronto]

<p about="http://.../author/renee-j-miller">The author
“<span property="foaf:name" lang="en">Renée J. Miller</span>”
lives in the city
“<span rel="foaf:based_near" resource="http://…/Toronto">Toronto</span>”
</p>

Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

Introduction to Semantic Web Technologies
Storing & Querying RDF

24
Storing RDF Data
- Simplest way: store and publish RDF files
  - It’s like saving data to the file system instead of using a DBMS, although it is still data, so it can be retrieved and processed by crawlers (machines)
- RDF store (aka triplestore)
  - Database systems designed for the storage and retrieval of RDF data
    - Some popular RDF stores include Sesame, Jena, Redland, OpenLink Virtuoso
    - A more complete list at http://en.wikipedia.org/wiki/Triplestore
- RDF view over relational data (aka RDB2RDF)
  - A very common approach
  - Topic of our next class

25
Querying RDF graphs
- RDF data can be retrieved and processed by machines, but is that enough?
  - Using Jena (a Semantic Web framework for Java):
// model is a Jena Model, subject a Resource; classes come from Jena's rdf.model package
StmtIterator iter = model.listStatements(subject, (Property) null, (RDFNode) null);
while (iter.hasNext()) {
    Statement st = iter.nextStatement();
    Property p = st.getPredicate();  // predicate of the triple
    RDFNode o = st.getObject();      // object: a resource or a literal
    do_something(p, o);              // placeholder for application logic
}
- In practice, more complex queries into the RDF data are necessary
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
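
The idea behind the Jena loop (fix the subject, leave predicate and object open) can be sketched without any library. This is a plain-Python analogue, not Jena's API: `list_statements` is a hypothetical function in which `None` plays the role of the `null` wildcard, and the graph reuses the example triples from the earlier slides.

```python
def list_statements(graph, s=None, p=None, o=None):
    """Yield every triple that matches the given terms; None acts as a wildcard."""
    for triple in graph:
        if all(want is None or want == got for want, got in zip((s, p, o), triple)):
            yield triple


graph = [
    ("bb:renee-j-miller", "rdf:type", "foaf:Person"),
    ("bb:renee-j-miller", "foaf:name", "Renée J. Miller"),
    ("bb:renee-j-miller", "foaf:based_near", "dbpedia:Toronto"),
    ("dbpedia:Toronto", "dp:populationUrban", "4,753,120"),
]

# All (predicate, object) pairs for one subject, as in the Jena loop.
pairs = [(p, o) for _, p, o in list_statements(graph, s="bb:renee-j-miller")]
print(pairs)
```

A linear scan like this is fine for a handful of triples; real triple stores index the data so that such lookups do not touch every statement.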

26
Analyze the Jena example
StmtIterator iter = model.listStatements(subject, (Property) null, (RDFNode) null);
while (iter.hasNext()) {
    Statement st = iter.nextStatement();
    Property p = st.getPredicate();
    RDFNode o = st.getObject();
    do_something(p, o);
}
[Figure: the subject node with outgoing edges labeled ?p leading to nodes labeled ?o]
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

27
General: graph patterns
- The fundamental idea: use graph patterns
  - The pattern contains unbound symbols
  - By binding the symbols, subgraphs of the RDF graph are selected
  - If there is such a selection, the query returns the bound resources
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
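
Binding unbound symbols against a graph can be sketched in a few lines. This is a naive matcher for illustration only, not how a SPARQL engine is implemented: terms starting with `?` are treated as variables, and the small graph is an assumption based on the example triples from the earlier slides.

```python
def is_var(term):
    """Terms that start with '?' are unbound symbols."""
    return isinstance(term, str) and term.startswith("?")


def match_pattern(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; return None on a conflict."""
    new = dict(binding)
    for term, value in zip(pattern, triple):
        if is_var(term):
            if new.setdefault(term, value) != value:
                return None  # variable already bound to something else
        elif term != value:
            return None      # constant term does not match
    return new


def query(graph, patterns):
    """Return every binding that satisfies all triple patterns together."""
    solutions = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for binding in solutions
            for triple in graph
            if (extended := match_pattern(pattern, triple, binding)) is not None
        ]
    return solutions


graph = [
    ("bb:renee-j-miller", "foaf:based_near", "dbpedia:Toronto"),
    ("bb:renee-j-miller", "foaf:name", "Renée J. Miller"),
    ("dbpedia:Toronto", "skos:subject", "dp:Cities_in_Canada"),
    ("dbpedia:Ottawa", "skos:subject", "dp:Cities_in_Canada"),
]

# Who is based near a city that is filed under dp:Cities_in_Canada?
results = query(graph, [
    ("?person", "foaf:based_near", "?city"),
    ("?city", "skos:subject", "dp:Cities_in_Canada"),
])
print(results)
```

Because `?city` appears in both patterns, the second pattern only keeps cities already bound by the first; that shared variable is what turns two patterns into a join.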

28
Our example in SPARQL
- The triples in WHERE define the graph pattern, where ?p and ?o are “unbound” symbols
- The query returns all (p, o) pairs
SELECT ?p ?o
WHERE { subject ?p ?o }
[Figure: the subject node with outgoing edges labeled ?p leading to nodes labeled ?o]
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

29
Simple SPARQL example
[Figure: RDF graph of two books. <http://…isbn/000651409X> has a:price nodes with (rdf:value 33, p:currency :£) and (rdf:value 50, p:currency :€); <http://…isbn/2020386682> has a:price nodes with (rdf:value 60, p:currency :€) and (rdf:value 78, p:currency :$); an a:author link leads to a node with a:name “Ghosh, Amitav”]
SELECT ?isbn ?price ?currency # note: not ?x!
WHERE { ?isbn a:price ?x .
        ?x rdf:value ?price .
        ?x p:currency ?currency . }
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

30
Simple SPARQL example
[Figure: RDF graph of the two books (ISBN 000651409X and 2020386682), their a:price nodes, and the author “Ghosh, Amitav”]
Returns: [<…409X>,33,:£]
SELECT ?isbn ?price ?currency # note: not ?x!
WHERE { ?isbn a:price ?x .
        ?x rdf:value ?price .
        ?x p:currency ?currency . }
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

31
Simple SPARQL example
[Figure: RDF graph of the two books (ISBN 000651409X and 2020386682), their a:price nodes, and the author “Ghosh, Amitav”]
Returns: [<…409X>,33,:£], [<…409X>,50,:€]
SELECT ?isbn ?price ?currency # note: not ?x!
WHERE { ?isbn a:price ?x .
        ?x rdf:value ?price .
        ?x p:currency ?currency . }
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

32
Simple SPARQL example
[Figure: RDF graph of the two books (ISBN 000651409X and 2020386682), their a:price nodes, and the author “Ghosh, Amitav”]
Returns: [<…409X>,33,:£], [<…409X>,50,:€], [<…6682>,60,:€]
SELECT ?isbn ?price ?currency # note: not ?x!
WHERE { ?isbn a:price ?x .
        ?x rdf:value ?price .
        ?x p:currency ?currency . }
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

33
Simple SPARQL example
[Figure: RDF graph of the two books (ISBN 000651409X and 2020386682), their a:price nodes, and the author “Ghosh, Amitav”]
Returns: [<…409X>,33,:£], [<…409X>,50,:€], [<…6682>,60,:€], [<…6682>,78,:$]
SELECT ?isbn ?price ?currency # note: not ?x!
WHERE { ?isbn a:price ?x .
        ?x rdf:value ?price .
        ?x p:currency ?currency . }
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

34
Pattern constraints
[Figure: RDF graph of the two books (ISBN 000651409X and 2020386682), their a:price nodes, and the author “Ghosh, Amitav”]
Returns: [<…409X>,50,:€], [<…6682>,60,:€]
SELECT ?isbn ?price ?currency # note: not ?x!
WHERE { ?isbn a:price ?x .
        ?x rdf:value ?price .
        ?x p:currency ?currency .
        FILTER(?currency = :€) }
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
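
The effect of the constraint can be mimicked on the solutions themselves: FILTER keeps a solution only if its condition holds. In the sketch below the four rows are copied from the "Returns" lists of the unfiltered query on the previous slides (with the ISBN URIs elided exactly as there); `sparql_filter` is a hypothetical helper, not a SPARQL library call.

```python
# Solutions of the unfiltered query, as listed on the earlier slides.
solutions = [
    {"isbn": "<…409X>", "price": 33, "currency": ":£"},
    {"isbn": "<…409X>", "price": 50, "currency": ":€"},
    {"isbn": "<…6682>", "price": 60, "currency": ":€"},
    {"isbn": "<…6682>", "price": 78, "currency": ":$"},
]


def sparql_filter(rows, condition):
    """Keep only the solutions for which the condition holds."""
    return [row for row in rows if condition(row)]


euro_only = sparql_filter(solutions, lambda row: row["currency"] == ":€")
for row in euro_only:
    print(row["isbn"], row["price"], row["currency"])
```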

35
Many other SPARQL features
- Limit the number of returned results; remove duplicates, sort them, …
- Optional branches: if some part of the pattern does not match, ignore it
- Specify several data sources (via URIs) within the query (essentially, a merge on the fly!)
- Construct a graph using a separate pattern on the query results
- In SPARQL 1.1: updating data, not only querying
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
A good reference: SPARQL By Example
http://www.cambridgesemantics.com/2008/09/sparql-by-example/

36
SPARQL usage in practice
- SPARQL is usually used over the network
  - Separate documents define the protocol and the result format
    - SPARQL Protocol for RDF with HTTP and SOAP bindings
    - SPARQL results in XML or JSON formats
- Big datasets often offer “SPARQL endpoints” using this protocol
  - Typical example: SPARQL endpoint to DBpedia
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/
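
The two halves of the protocol (sending a query over HTTP, reading back a JSON result) can be sketched with the standard library alone. The helper names are hypothetical, nothing is sent over the network, and the JSON document is a hand-written sample in the SPARQL JSON results layout, not real endpoint output.

```python
import json
from urllib.parse import urlencode


def sparql_get_url(endpoint, query):
    """Build the GET URL for a query, passing it in the `query` parameter."""
    return endpoint + "?" + urlencode({"query": query})


def rows_from_json(document):
    """Flatten a SPARQL JSON results document into a list of {variable: value} dicts."""
    data = json.loads(document)
    return [
        {var: cell["value"] for var, cell in binding.items()}
        for binding in data["results"]["bindings"]
    ]


url = sparql_get_url(
    "http://dbpedia.org/sparql",
    "SELECT ?p ?o WHERE { <http://dbpedia.org/resource/Toronto> ?p ?o } LIMIT 1",
)
print(url)

# Hand-written sample response (illustrative values only).
sample = """{"head": {"vars": ["p", "o"]},
 "results": {"bindings": [
   {"p": {"type": "uri", "value": "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"},
    "o": {"type": "uri", "value": "http://example.org/City"}}]}}"""
rows = rows_from_json(sample)
print(rows)
```

To request JSON from a live endpoint, a client would typically also send an `Accept: application/sparql-results+json` header with the GET.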

37
SPARQL as a unifying point
[Figure: an application sits on top of a SPARQL processor, which draws on an RDF graph, on a triple store and an RDF database behind SPARQL endpoints, on a relational database (SQL to RDF mapping), on HTML (RDFa), on XML/XHTML (GRDDL, RDFa), and on unstructured text (NLP techniques); two links are labeled SPARQL Construct]
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

38
Other Semantic Web Technologies
- Web Ontology Language (OWL)
  - A family of knowledge representation languages for authoring ontologies for the Web
- RDF Schema (RDFS)
  - RDF Vocabulary Description Language
    - http://www.w3.org/TR/rdf-schema/
    - How to use RDF to describe RDF vocabularies
- Other RDF Vocabularies
  - Simple Knowledge Organization System (SKOS)
    - Designed for representation of thesauri, classification schemes, taxonomies, subject-heading systems, or any other type of structured controlled vocabulary
  - FOAF (Friend of a friend)
    - A machine-readable ontology describing persons, their activities and their relations to other people and objects.

Linked Data
Linked Data Principles
Linking Open Data Community Project

40
Linked Data
- Linked Data is a way of publishing data on the (Semantic) Web that:
  - Encourages reuse
  - Reduces redundancy
  - Maximises its (real and potential) inter-connectedness
  - Enables network effects to add value to data
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

41
Principles of Linked Data
1. Use URIs as names for things
2. Use HTTP URIs so that people can look up those names
3. When someone looks up a URI, provide useful (RDF) information
4. Include RDF statements that link to other URIs so that they can discover related things
Tim Berners-Lee 2007
http://www.w3.org/DesignIssues/LinkedData.html

42
Linking Open Data Community Project
- A W3C SWEO community effort to
  - Publish existing open license datasets as Linked Data on the Web
  - Interlink things between different data sources
  - Develop clients that consume Linked Data from the Web

43
Linked Open Data Cloud of Data Sets
- Over 500 million RDF triples
- Around 120,000 RDF links between data sources
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

44
LOD Cloud – July 2007
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

45
LOD Cloud – August 2007
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

46
LOD Cloud – November 2007
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

47
LOD Cloud – September 2008
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

48
LOD Cloud – March 2009
Overall Statistics
Triples #: 7.7 billion
Links (external) #: 142 million
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

49
LOD Cloud – September 2010
Overall Statistics
Triples #: ~25 billion
Links (external) #: ~440 million
“Linking Open Data cloud diagram, by Richard Cyganiak and Anja Jentzsch. http://lod-cloud.net/”

50
Properties of Web of Linked Data
- Anyone can publish data on the Web of Linked Data
- Entities are connected by links
  - Creating a global data graph that spans data sources and enables the discovery of new data sources.
- Data is self-describing
  - If an application encounters data represented using an unfamiliar vocabulary, the application can resolve the URIs that identify vocabulary terms in order to find their RDFS or OWL definition.
- The Web of Data is open
  - Meaning that applications can discover new data sources at run-time by following links.
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials
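
Discovering new data sources by following links can be sketched as a breadth-first walk. The `WEB` dictionary below is a hypothetical stand-in for dereferencing URIs over HTTP, and naming a source by its URI prefix is an illustrative convention only.

```python
from collections import deque

# Hypothetical stand-in for the Web: URI -> triples returned when it is dereferenced.
WEB = {
    "bb:renee-j-miller": [("bb:renee-j-miller", "foaf:based_near", "dbpedia:Toronto")],
    "dbpedia:Toronto": [("dbpedia:Toronto", "skos:subject", "dp:Cities_in_Canada")],
    "dp:Cities_in_Canada": [],
}


def source_of(uri):
    """Name the data source by the URI's prefix (illustrative convention only)."""
    return uri.split(":", 1)[0]


def discover_sources(start, web):
    """Follow links breadth-first; return data sources in order of discovery."""
    seen, sources, queue = {start}, [source_of(start)], deque([start])
    while queue:
        uri = queue.popleft()
        for _, _, obj in web.get(uri, []):
            if obj in web and obj not in seen:  # only follow dereferenceable objects
                seen.add(obj)
                queue.append(obj)
                if source_of(obj) not in sources:
                    sources.append(source_of(obj))
    return sources


print(discover_sources("bb:renee-j-miller", WEB))
```

The application starts out knowing only one source; the other two are found at run time purely from the links in the data.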

Linked Data
Example Linked Data Sources
Example Applications

52
DBpedia
- Linked data source created by:
  - Extracting structured information from Wikipedia
    - Using the “infobox” of the articles
  - Establishing links to other external sources

53
From Info Boxes to Linked Data
Based on presentation by Anja Jentzsch, available at http://www.swib09.de/vortraege/20091124_jentzsch.pdf

54
Information Extraction
- Why only info boxes?
  - Extraction from text could be very challenging
    - One of many problems: Entity Disambiguation
[Figure: ambiguous mentions “Penn”, “U Penn”, “Penn State” and “PSU”, with candidate entities University of Pennsylvania, Pennsylvania State University, Pennsylvania (US State), Sean Penn, and Passenger Service Unit]
Based on presentation by Lauw, Schenkel, Suchanek, Theobald and Weikum, available at http://www.mpi-inf.mpg.de/yago-naga/CIKM10-tutorial/

55
DBpedia Data

56
Example App: DBpedia Mobile
Figures from: Christian Becker, Christian Bizer: DBpedia Mobile: A Location-Enabled Linked Data Browser. LDOW 2008

57
DBpedia RelFinder
http://relfinder.dbpedia.org/
- Interactive online relationship discovery

58
Freebase
- Free (open, CC-licensed) web repository of over 12 million things (objects) of almost any type (movies, books, celebrities, locations, companies and more)
- Centralized, community-driven approach to publishing Linked Data
- Each object has a globally unique identifier (GUID)
- Data can be retrieved in RDF or JSON
- Nice APIs, a query language (MQL), and a set of tools to simplify editing, publishing or retrieving the data

59
Freebase Interface
- Freebase snapshot

60
Freebase Data

61
Freebase Data

62
Freebase Applications

63
Popular Freebase Apps
- Google Refine: a power tool for data cleaning and discovery.
- Powerset: a semantic search engine that searched Freebase for answers to natural-language questions (purchased by Microsoft and used in their Bing search engine)
- Freebase genealogy: family-tree viewer
- FMDb: a Freebase IMDB
- Freebase sets: a clone of Google sets using Freebase data
- Parallax data viewer: an alternative UI
- Freebase Schema Explorer: a visualiser for Freebase's ontologies
- 2D Visualiser: a Java app browser
- Thinkbase: a visual graph-based exploration tool
Source: Wikipedia - http://en.wikipedia.org/wiki/Freebase_(database)

64
LinkedMDB
- The first Linked Data source dedicated to movies and movie-related information
- Published by D2R Server (Linked Data interface to relational data)
- Provides links to other linked data sources and movie web sites
  - Metadata about how the links are discovered is published in the form of Linked Data
- Contains ~6 million triples and ~0.5 million external links
- Won the first prize at the Triplification Challenge
  - I-Semantics Conference, October 2008

65
Application Built on top of LinkedMDB
- Automatically generated quiz about your favorite actor or actress
- Web application built in less than 10 KB
  - Uses LinkedMDB’s data via SPARQL
  - Try it out: http://10k.aneventapart.com/Uploads/310/

66
Linked Clinical Trials
- Part of the Linking Open Drug Data Project under W3C’s Health Care and Life Sciences (HCLS) task force
- Won the first prize at the Triplification Challenge
  - I-Semantics Conference, October 2009
- LinkedCT data – http://linkedct.org
- Published by D2R Server
  - from ClinicalTrials.gov data
  - Online registry of clinical trials conducted in the United States and around the world
  - Published in XML
- Links to several external sources about drugs, diseases, locations, etc.
- Actively working with domain experts (W3C’s HCLS members) on increasing the quality of the data and the links based on real use cases
- See: Enabling Tailored Therapeutics with Linked Data, by A. Jentzsch et al., at LDOW 2010

67
BibBase
- Goals
  - Makes it easy for scientists and research groups to maintain publications pages
    - Users maintain a BibTeX file; BibBase does the rest
    - Publishes a good-looking custom HTML page
    - Publishes the data in RDF, with a SPARQL endpoint
    - Links entries to the open linked data cloud on the fly
  - With incentive, scientists are helping us build a high-quality bibliographic database (think DBLP but automated)
  - Invaluable data set for benchmarking duplicate detection and semantic link discovery systems

68
BibBase

69
BibBase
(from http://www.cs.toronto.edu/~miller)

70
BibBase

71
Revyu.com: Review Anything

72
New York Times Data
data.nytimes.com

73
Example App: NYT Articles on University Alumni
Based on presentation by Ivan Herman, available at http://www.w3.org/2010/Talks/0622-SemTech-IH/

74
Dynamic Worldwide Earthquake Map Using 
Linked Data from Data.gov

75
Other Linked Data Applications
- Browsing
  - E.g., Marbles and DBpedia Mobile
- Searching
  - Falcons
- Mashups and more
  - Revyu, BBC Music, Pipe
- Enterprise analytics

76
Browsing Linked Data
- Linked Data Browsers
  - Tabulator Browser (MIT, USA)
  - Marbles (FU Berlin, DE)
  - OpenLink RDF Browser (OpenLink, UK)
  - Zitgist RDF Browser (Zitgist, USA)
  - Humboldt (HP Labs, UK)
  - Disco Hyperdata Browser (FU Berlin, DE)
  - Fenfire (DERI, Ireland)
- Try the LOD Browser Switch
  - http://browse.semanticweb.org/
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

77
Marbles
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

78

79
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

80
DERI Semantic Web Mashup
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

81
Yahoo!, Google & Facebook
- Yahoo! and Google have started to crawl Linked Data in its RDFa serialization as well as Microformats.
- Yahoo!
  - Provides access to crawled data through the Yahoo BOSS API
  - Uses the data within Yahoo Search Monkey to make search results more useful and visually appealing.
- Google
  - Uses crawled RDF data for its Social Graph API
  - Plans to use / uses crawled data to enhance search result snippets for reviews and people.
- Facebook
  - The Open Graph Protocol is based on RDFa: http://ogp.me/
  - Application: Facebook’s “Like” buttons, and much more
Based on presentations by Chris Bizer, Richard Cyganiak, Tom Heath, available at http://linkeddata.org/guides-and-tutorials

82
Linked Open Data in IBM’s Watson?
- Watson: IBM’s supercomputer that defeated two of Jeopardy’s greatest players
  - Check out http://www.ibmwatson.com
- Linked Data used to enhance Natural Language Question Answering
  - Finding/Verifying object types of the possible answers
Note: Watson was not connected to the Internet during the game, so the (linked) data has been used offline.
Source: Chris Welty’s talk at ISWC2010 &
IBM China Research Lab’s DeepQA home page
http://www.research.ibm.com/deepqa/china_research_lab.shtml
Also see Welty’s talk at ISWC 2007:
http://videolectures.net/iswc07_welty_hiwr/

83
Verifying Answer Types in Watson
The category: U.S. cities.
The question: "This U.S. city's largest airport is named for a famous World War II hero, its second largest for a famous World War II battle."
Watson’s answer: What is Toronto???
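
The idea of verifying answer types can be sketched as a simple lookup: candidates whose known type matches the category are preferred. This is only an illustration of the idea and says nothing about how Watson is actually built; the type names and the `TYPES` facts are hypothetical stand-ins for rdf:type statements found in Linked Data sources.

```python
# Hypothetical type facts, of the kind published as rdf:type triples.
TYPES = {
    ("Toronto", "City"),
    ("Toronto", "CanadianCity"),
    ("Chicago", "City"),
    ("Chicago", "USCity"),
}


def has_type(candidate, expected_type, types=TYPES):
    """True if the candidate is known to have the expected type."""
    return (candidate, expected_type) in types


def rank_candidates(candidates, expected_type):
    """Put candidates whose type matches the category first; otherwise keep the order."""
    return sorted(candidates, key=lambda c: not has_type(c, expected_type))


print(rank_candidates(["Toronto", "Chicago"], "USCity"))
```

With such a check in place, "Toronto" would be demoted for the category "U.S. cities" because no source types it as a U.S. city.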

84
Linked Data Application Highlights
- See “Semantic Web Success Stories” from a recent tutorial at the ISWC 2010 conference
  - http://people.csail.mit.edu/pcm/SemWebTutorial.html
- Linked Data published by New York Times, Best Buy, U.S. and U.K. governments
- Linked Open Data used in IBM’s Watson (DeepQA project)
- Yahoo! and Google use RDFa, Freebase data powers Bing
- Healthcare and life sciences applications
  - See W3C’s HCLS group activities http://www.w3.org/2001/sw/hcls/

85
References
- New Book - Linked Data: Evolving the Web into a Global Data Space. By Tom Heath & Christian Bizer. Available online at http://linkeddatabook.com/editions/1.0/
- Linking Open Data Project Wiki http://esw.w3.org/topic/SweoIG/TaskForces/CommunityProjects/LinkingOpenData
- Tutorial on How to Publish Linked Data on the Web http://www4.wiwiss.fu-berlin.de/bizer/pub/LinkedDataTutorial/
http://linkeddata.org