Graph based data models

moumie 1,983 views 88 slides Jan 02, 2017

About This Presentation

This is an introduction to and overview of the latest developments in NoSQL and graph-based data models.


Slide Content

Soulemane Moumie
@moumie.org
Graph Based Data Models

Outline
Introduction
Graph in real world
What is a Graph ?
Data Model
Graph in RDBMS
Graph-based modeling
Graph Databases
Graph Query Languages
Demo with Neo4j and OrientDB
Conclusions

Introduction
We live in a connected world. There are no isolated pieces of information
around us, only rich, connected domains.
The interconnectivity of data is an important aspect.
Early adopters of graph technology re-imagined their businesses around
the value of data relationships.
These companies quickly grew from unknown startups into large
industrial corporations.
Examples: Google, LinkedIn, PayPal, Facebook, Twitter.

Graph in real world
Fraud detection: uncovering fraud ring
Ref: 1

Graph in real world
Realtime recommendation engine
Ref: 1

Graph in real world
Master data management solutions: employee hierarchy data
Ref: 1

Graph in real world
Empowering Network and IT solutions: Troubleshooting
Ref: 1

Graph in real world
Social network
Ref: 1

What is a graph ?
Graph Theory is Boring …
Ref: 2

What is a graph?: History
Ref: 3

What is a graph?: Definition

What is a graph?: Definition

What is a graph?: Definition

What is a graph?: Type

What is a graph?: Density

What is a graph?: Density

What is a graph?: Density

What is a graph?: Graph storage

Graph in RDBMS?
Ref: 6

Data model
Definition: A data model is an abstract model that organizes elements of data and standardizes how they
relate to one another and to properties of the real world.
Ref: 7

Graph in RDBMS : Model
Ref: 4, 5

Graph in RDBMS : Model
Ref: 4, 5

Graph in RDBMS : Model
Ref: 4, 5

Graph in RDBMS : Model
Ref: 4, 5

Graph in RDBMS : Model
Ref: 4, 5

Graph in RDBMS : Model
Ref: 4, 5

Graph in RDBMS ?: Model
Ref: 4, 5

Graph in RDBMS ?: Model
Ref: 4, 5

Graph in RDBMS ?: Model
Ref: 4, 5

Graph in RDBMS ?: Model
Ref: 4, 5

Graph in JSON Database ?
Ref: 8

Graph in XML Database?
Ref: 8

Graph in RDBMS ?: Issues
While storing a graph in a relational
database is simple, querying it,
and particularly traversing it,
can be slow, because each additional hop
typically requires another join
and the queries become complex.
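
The join cost mentioned above can be illustrated with a minimal sketch, assuming a hypothetical edge table edge(src, dst) held in an in-memory SQLite database. The table name, column names and sample data are illustrative assumptions and are not taken from the slides. A fixed-depth traversal needs one self-join per hop, while an arbitrary-depth traversal needs a recursive query.

```python
import sqlite3

# Hypothetical sample data: (source, destination) pairs of a directed graph.
EDGES = [("ann", "bob"), ("bob", "cat"), ("cat", "dan"), ("ann", "eve")]


def make_db(edges):
    """Store the graph as a plain edge table in an in-memory database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
    conn.executemany("INSERT INTO edge VALUES (?, ?)", edges)
    return conn


def two_hops(conn, start):
    """Nodes exactly two hops away: each extra hop adds one self-join."""
    rows = conn.execute(
        "SELECT DISTINCT e2.dst FROM edge e1 "
        "JOIN edge e2 ON e2.src = e1.dst "
        "WHERE e1.src = ? ORDER BY e2.dst",
        (start,),
    )
    return [r[0] for r in rows]


def reachable(conn, start):
    """All nodes reachable from start, using a recursive common table expression."""
    rows = conn.execute(
        "WITH RECURSIVE reach(node) AS ("
        "  SELECT dst FROM edge WHERE src = ? "
        "  UNION "
        "  SELECT e.dst FROM edge e JOIN reach r ON e.src = r.node"
        ") SELECT node FROM reach ORDER BY node",
        (start,),
    )
    return [r[0] for r in rows]


conn = make_db(EDGES)
print(two_hops(conn, "ann"))
print(reachable(conn, "ann"))
```

Storing the edges takes a single table, but the traversal logic has to be spelled out as joins or recursion in every query.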

Graph Database?
If any database can represent a
graph, then what is a graph
database?

NoSQL : Characteristics

NoSQL : History
Ref: 9

NoSQL : Categories
Ref: 9

Graph Database?: Definition
“A graph database is any storage system that
provides index-free adjacency.”
• Each vertex serves as a “mini index” of its adjacent elements.
• No index lookups are necessary.
• The cost of a local step remains constant as the graph grows.
• Cheaper than global indexes.
Ref: 10
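
A minimal in-memory sketch of the idea, assuming a hypothetical Vertex class that holds direct references to its neighbours. The dictionary edge_index only stands in for a global index. In a real store such an index would typically be a tree-like structure consulted on every step, which this sketch does not model.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)
class Vertex:
    name: str
    # Direct references to adjacent Vertex objects: the vertex's own "mini index".
    neighbours: list = field(default_factory=list)


def link(a, b):
    """Add a directed edge a -> b by storing a reference to b inside a."""
    a.neighbours.append(b)


def neighbours_via_adjacency(vertex):
    """Local step: follow the stored references, no separate lookup structure."""
    return [n.name for n in vertex.neighbours]


def neighbours_via_global_index(edge_index, name):
    """Contrast: every step consults a global index keyed by vertex name."""
    return edge_index.get(name, [])


a, b, c = Vertex("a"), Vertex("b"), Vertex("c")
link(a, b)
link(a, c)
edge_index = {"a": ["b", "c"]}  # hypothetical stand-in for a global index

print(neighbours_via_adjacency(a))
print(neighbours_via_global_index(edge_index, "a"))
```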

Graph Database?: Traversal
Ref: 11

Graph Database?: Traversal
Ref: 11

Graph Database?: Definition
“A database that uses graph structures for semantic
queries with nodes,
edges and properties to represent and store data”
This definition is independent of the way the data is stored internally.
It's really the model and the implemented algorithms that matter.
Ref: 12

Graph data model : Representation

Graph data model : Representation

Graph data model : Representation

Graph data model : Representation

Graph data model : Representation

Graph data model : Building blocks
Nodes: entities
Relationships: connect entities and structure the domain
Properties: attributes and metadata
Labels: group nodes by role
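
The four building blocks can be sketched as plain Python data structures. This is only an illustration under assumed names: Node, Relationship, the sample people and the company are hypothetical and do not come from the slides or from any particular database.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    node_id: int
    labels: set = field(default_factory=set)         # labels group nodes by role
    properties: dict = field(default_factory=dict)   # attributes and metadata


@dataclass
class Relationship:
    start: int       # id of the source node
    end: int         # id of the target node
    rel_type: str    # names how the two entities are connected
    properties: dict = field(default_factory=dict)


# Hypothetical sample graph.
nodes = {
    1: Node(1, {"Person"}, {"name": "Alice"}),
    2: Node(2, {"Person"}, {"name": "Bob"}),
    3: Node(3, {"Company"}, {"name": "Acme"}),
}
relationships = [
    Relationship(1, 2, "KNOWS", {"since": 2015}),
    Relationship(1, 3, "WORKS_AT"),
]


def nodes_with_label(label):
    """Names of all nodes carrying the given label."""
    return sorted(n.properties["name"] for n in nodes.values() if label in n.labels)


def outgoing(node_id, rel_type):
    """Names of nodes reached from node_id over relationships of one type."""
    return [nodes[r.end].properties["name"]
            for r in relationships
            if r.start == node_id and r.rel_type == rel_type]


print(nodes_with_label("Person"))
print(outgoing(1, "WORKS_AT"))
```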

Graph data model : Building blocks

Graph data model : ERD example
Ref: 13

Graph data model : ERD example
Ref: 13

Graph data model : Why ?
For applications where 'interconnectivity and topology' matter.
It allows for a more natural modeling of connected data. Graph
structures are visible to the user and they allow a natural way of
handling application data, for example, hypertext or geographic
data.
Queries can refer directly to the graph structure.
So, we can perform specific graph operations such as finding shortest
paths, determining subgraphs, etc.
For implementation, graph databases may provide special graph
storage structures and efficient graph algorithms for realizing
specific operations.
Ref: 14
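
As an illustration of one of the graph operations named above, here is a minimal sketch of a shortest-path search on an unweighted graph using breadth-first search. The adjacency-list dictionary and the node names are assumptions made for this example. A graph database would normally offer such an operation through its own query language or algorithm library.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path from start to goal as a list of nodes, or None."""
    if start == goal:
        return [start]
    parents = {start: None}      # also serves as the set of visited nodes
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nxt in graph.get(node, []):
            if nxt in parents:
                continue
            parents[nxt] = node
            if nxt == goal:
                # Walk the parent links back to the start, then reverse.
                path = [goal]
                while parents[path[-1]] is not None:
                    path.append(parents[path[-1]])
                return path[::-1]
            queue.append(nxt)
    return None


# Hypothetical directed graph as an adjacency list.
graph = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"]}
print(shortest_path(graph, "A", "E"))
```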

Graph data model : Motivation and Application
Criticism of classical DB models: their drawbacks, and the difficulty for users
of seeing data connectivity.
For applications where complexity exceeds the capabilities of
relational databases, e.g. managing transport networks.
Limited expressive power of current query languages.
The appearance of on-line hypertext evidenced the need for
other db-models.
In technological networks, the spatial and geographical aspects
of the structure are dominant.
Ref: 14

Database model : Components

Database model : Notions
Schema:
- A database schema is the skeleton of the database.
- It is designed before the database exists.
- A database schema does not contain any data or information.
Instance:
- It is the state of the operational database, with its data, at a given time.
- It contains a snapshot of the database.
- Database instances tend to change with time.

Graph database model : Definition
Ref: 14

Graph database model : representation
Representation of the database:
flat graph: has many interconnected nodes, not expressive, extendible,
difficult to present the information to the user in a clear way.
hypernode: a set of nested graphs, expressive; it is a graph whose nodes
can themselves be graphs. It offers the ability to represent each real-world
object as a separate database entity.

Graph database model : Data structures
1. Genealogy diagram example
Ref: 14

Graph database model : Data structures
1. Logical Data model
The schema uses two basic type nodes for representing data values (N, L), and two product
type nodes (NL, PP) to establish relations among data values in a relational
style. The instance is a collection of tables, one for each node of the schema.
Ref: 14

Graph database model : Data structures
2. Hypernode Data Model
The schema defines a person as a complex object with the properties name
and lastname of type string, and parent of type person (recursively defined). The instance
shows the relations in the genealogy among different instances of person.
Ref: 14
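
The recursive definition described above can be approximated in Python. This is a rough sketch, not the hypernode model itself: the Person class mirrors the name, lastname and parent properties, the parent property is assumed to be a list named parents, and the sample family is hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class Person:
    """Schema level: a person with two strings and a recursive parent property."""
    name: str
    lastname: str
    parents: list[Person] = field(default_factory=list)  # assumed to be a list


def ancestors(person):
    """Full names of all ancestors, found by following the nested parents."""
    found = []
    for parent in person.parents:
        found.append(f"{parent.name} {parent.lastname}")
        found.extend(ancestors(parent))
    return found


# Instance level: a hypothetical genealogy.
grandfather = Person("George", "Jones")
mother = Person("Julia", "Jones", [grandfather])
child = Person("David", "Deville", [mother])

print(ancestors(child))
```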

Graph database model : Data structures
3. Hypergraph-Based Data Model (GROOVY)
GROOVY: Graphically Represented Object-Oriented data model with Values
The schema level models an object PERSON as a hypergraph that relates the attributes
NAME, LASTNAME and PARENTS.
Ref: 14

Graph database model : Data structures
4. Graph Data Model (GDM)
Ref: 14

Graph database model : Integrity constraints
Integrity constraints are general statements and rules that define the set of consistent
database states, or changes of state. In the case of graph db-models, they include:
Schema-instance integrity: entity types and type checking.
Schema-instance separation: the degree to which schema and instance are different objects
in the database.
Redundancy of data: preserve uniqueness of data.
Object identity and referential integrity: entity integrity assures that each hypernode is a
unique real-world entity identified by its content; referential integrity requires that only
existing entities be referenced.
Ref: 14
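
Referential integrity, as described above, can be checked with a few lines of code. A minimal sketch, assuming entities are identified by string ids and references are (source, target) pairs. The function name and the sample data are hypothetical.

```python
def dangling_references(entities, references):
    """Return the references whose source or target is not an existing entity."""
    return sorted(
        (src, dst)
        for src, dst in references
        if src not in entities or dst not in entities
    )


# Hypothetical database state: "p4" is referenced but does not exist.
entities = {"p1", "p2", "p3"}
references = [("p1", "p2"), ("p2", "p3"), ("p3", "p4")]

violations = dangling_references(entities, references)
print(violations)
```

A state is consistent with respect to this constraint only when the returned list is empty.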

Graph database model : Comparison with other Database Models
Ref: 14

Graph Databases : Criticism
Yes, the graph model is more versatile than the relational model, but that doesn't
make it universal: in some cases, this versatility is a roadblock for
optimizations.
In fact, modern graph databases are a niche solution for a narrow set of
tasks: finding a route from A to B, working with friends in a social
network, information technology in medicine.
For most business applications, relational databases continue to prevail.

Graph Databases : Criticism
Relational databases were designed to aggregate
data, graph databases to find relations.
E.g. in the financial domain, all connections are known;
you only aggregate data by other data to find sums
and so on.

Graph Databases : Criticism
You usually need to learn a new query language such as
Cypher, Gremlin or SPARQL.
You have to use an API.
There are fewer vendors to choose from and a smaller user
base, so it is harder to get support when you run into
issues.

Graph Databases : Criticism
Graph databases are relatively immature
compared to well-established RDBMSs.
They require a conceptual shift.
No standardization.

Graph Databases : Trends
Ref: 15

Graph Databases : The most popular
Ref: 16

Graph Query Language:
Cypher
Extended SQL
Gremlin

Graph Query Language: CQL
Ref: 17

Graph Query Language: Neo4j CQL Commands/Clauses

Graph Query Language: Cypher
Ref: 18

Graph Query Language: Cypher
Ref: 18

Graph Query Language: Cypher
Ref: 18

Graph Query Language: Cypher
Ref: 18

Graph Query Language: Cypher
Ref: 18

Graph Query Language: Cypher
Ref: 18

Graph Query Language : Extended SQL & Gremlin
OrientDB is a 2nd Generation Distributed
Graph Database with the flexibility of
Documents in one product. OrientDB is
another great graph DB tool which also
operates as a document DB or an
object-oriented database. Its query language is
based on SQL to make it 'more familiar to
TSQL developers'. Like Neo4j, there is a
community edition available, and licensing
for enterprise is very reasonable.
Ref: 19

Graph Query Language: Extended SQL & Gremlin
Ref: 20

Graph Query Language: OrientDB SQL: schema
Ref: 20

Graph Query Language: Populate OrientDB

Graph Query Language: Queries

Graph Query Language: Gremlin
Many graph databases support their own custom languages (e.g. Cypher in Neo4j).
These languages are really useful, but they cannot be used on other databases.
Gremlin is a powerful domain-specific traversal language for graph databases.
This language is supported by many popular graph databases.
Learning Gremlin for graph databases is comparable to learning SQL for relational
databases.
Ref: 21
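
To give a feel for the traversal style, the following toy sketch imitates Gremlin-like step chaining in plain Python. It is not the Gremlin API and does not talk to any database: the Traversal class, the V helper and the sample graph are assumptions written for this illustration. The chain loosely mirrors a Gremlin query of the form g.V().has('name','marko').out('knows').values('name').

```python
class Traversal:
    """Toy in-memory imitation of Gremlin-style step chaining (not the Gremlin API)."""

    def __init__(self, graph, current):
        self.graph = graph
        self.current = list(current)   # vertex ids the traversal currently sits on

    def has(self, key, value):
        """Keep only vertices whose property key equals value."""
        keep = [v for v in self.current
                if self.graph["props"][v].get(key) == value]
        return Traversal(self.graph, keep)

    def out(self, label):
        """Move along outgoing edges with the given label."""
        nxt = [dst
               for v in self.current
               for (src, lbl, dst) in self.graph["edges"]
               if src == v and lbl == label]
        return Traversal(self.graph, nxt)

    def values(self, key):
        """Extract one property value from each current vertex."""
        return [self.graph["props"][v][key] for v in self.current]


def V(graph):
    """Start a traversal on all vertices of the graph."""
    return Traversal(graph, graph["props"].keys())


# Hypothetical sample graph: vertex properties plus labelled, directed edges.
graph = {
    "props": {1: {"name": "marko"}, 2: {"name": "vadas"}, 3: {"name": "josh"}},
    "edges": [(1, "knows", 2), (1, "knows", 3)],
}

names = V(graph).has("name", "marko").out("knows").values("name")
print(names)
```

Each step takes the current set of vertices and returns a new traversal, which is what makes the query read as a path through the graph.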

Graph Query Language: Gremlin
Ref: 22,23

References
[1] https://neo4j.com/blog/rdbms-graphs-basics-for-relational-developer/
[2] http://images.google.de/imgres?imgurl=http%3A%2F%2Fcdn2.business2community.com%2Fwp-
content%2Fuploads%2F2014%2F03%2Fistock_000006832296xsmall_small.jpg&imgrefurl=http%3A%2F%2Fwww.business2
community.com%2Fcontent-marketing%2Fbrand-boring-content-marketing-0819786&h=232&w=300&tbnid=GUe7dYIZl9-
29M%3A&docid=jRPxqmLQ1TLKWM&ei=2V1uV4 -eG8yWgAad_baAAQ&tbm=isch&client=firefox-
b&iact=rc&uact=3&dur=226&page=3&start=42&ndsp=27&ved=0ahUKEwjP7pz8_MLNAhVMC8AKHZ2 -
DRAQMwiLASgrMCs&bih=634&biw=1366
[3] http://www.slideshare.net/infinitegraph/an-introduction-to-graph-databases, slide 5
[4] Trees and Hierarchies in SQL for Smarties, Joe Celko, Morgan Kaufmann, ISBN: 1558609202
[5] http://www.slideshare.net/ehildebrandt/trees-and-hierarchies-in-sql
[6] http://www.slideshare.net/navicorevn/hierarchical-data-models-in-relational-databases
[7] https://en.wikipedia.org/wiki/Data_model
[8] http://www.slideshare.net/slidarko/graph-windycitydb2010/25-Representing_a_Graph_in_a
[9] GRAPH DATABASES AND ORIENTDB. INFO-H-415: Advanced Databases (Project). Professor: Esteban Zimányi,
cs.ulb.ac.be/public/_media/teaching/infoh415/student_projects/orientdb.pdf

[10] http://systemg.research.ibm.com/database.html
[11] https://www.youtube.com/watch?v=kpLqfFGubKM
[12] https://www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/
[13] https://neo4j.com/blog/rdbms-vs-graph-data-modeling/
[14] Angles, R., & Gutierrez, C., "Survey of graph database models", ACM Computing Surveys, Vol. 40, No. 1, Article 1, Feb. 2008
[15] http://db-engines.com/en/ranking_trend/graph+dbms
[16] R. Campbell et al., "A performance evaluation of open source graph databases", ACM, PPAA '14, February 16, 2014.
[17] http://www.tutorialspoint.com/neo4j/neo4j_cql_introduction.htm
[18] https://neo4j.com/developer/cypher-query-language/
[19] http://orientdb.com/docs/last/index.html
[20] http://pettergraff.blogspot.de/2014/01/getting-started-with-orientdb.html
[21] http://www.fromdev.com/2013/09/Gremlin-Example-Query-Snippets-Graph-DB.html
[22] http://sql2gremlin.com/
[23] http://gremlindocs.spmallette.documentup.com/

Thank you!