This is an introduction to and overview of the latest developments in NoSQL and graph-based data models.
Size: 2.19 MB
Language: en
Added: Jan 02, 2017
Slides: 88 pages
Slide Content
Soulemane Moumie
@moumie.org
Graph Based Data Models
Outline
Introduction
Graph in real world
What is a Graph ?
Data Model
Graph in RDBMS
Graph-based modeling
Graph Databases
Graph Query Languages
Demo with Neo4J and OrientDB
Conclusions
Introduction
We live in a connected world. There are no isolated pieces of information,
only rich, connected domains all around us.
Interconnectivity of data is an important aspect.
Early adopters of graph technology re-imagined their businesses around
the value of data relationships.
These companies quickly grew from unknown startups into large
industrial corporations.
Google, LinkedIn, PayPal, Facebook, Twitter.
Graph in real world
Fraud detection: uncovering fraud rings
Ref: 1
Graph in real world
Realtime recommendation engine
Ref: 1
Graph in real world
Master data management solutions: employee hierarchy data
Ref: 1
Graph in real world
Empowering Network and IT solutions: Troubleshooting
Ref: 1
Graph in real world
Social network
Ref: 1
What is a graph ?
Graph Theory is Boring …
Ref: 2
What is a graph?: History
Ref: 3
What is a graph?: Definition
What is a graph?: Type
What is a graph?: Density
What is a graph?: Graph storage
Graph in RDBMS?
Ref: 6
Data model
Definition: A data model is an abstract model that organizes elements of data and standardizes how they
relate to one another and to properties of the real world.
Ref: 7
Graph in RDBMS : Model
Ref: 4, 5
Graph in JSON Database ?
Ref: 8
Graph in XML Database?
Ref: 8
Graph in RDBMS ?: Issues
While storing a graph in a relational
database is simple, querying it, and
particularly traversing it,
can be slow: each hop in a traversal
typically becomes another join, and the
queries grow complex.
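To make the join cost concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `edge`, its columns and the sample data are assumptions made for illustration; they do not come from the slides. Each extra hop of a fixed-depth traversal adds one more self-join, and a traversal of unknown depth needs a recursive query.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumed layout: one row per directed edge.
conn.execute("CREATE TABLE edge (src TEXT, dst TEXT)")
conn.executemany(
    "INSERT INTO edge VALUES (?, ?)",
    [("a", "b"), ("b", "c"), ("c", "d"), ("a", "e")],
)

# Two hops from 'a': one self-join of the edge table.
two_hops = conn.execute(
    """
    SELECT DISTINCT e2.dst
    FROM edge AS e1
    JOIN edge AS e2 ON e2.src = e1.dst
    WHERE e1.src = ?
    ORDER BY e2.dst
    """,
    ("a",),
).fetchall()

# Three hops from 'a': a second self-join is needed.
three_hops = conn.execute(
    """
    SELECT DISTINCT e3.dst
    FROM edge AS e1
    JOIN edge AS e2 ON e2.src = e1.dst
    JOIN edge AS e3 ON e3.src = e2.dst
    WHERE e1.src = ?
    ORDER BY e3.dst
    """,
    ("a",),
).fetchall()

# Unknown depth: a recursive common table expression walks the edges.
reachable = conn.execute(
    """
    WITH RECURSIVE reach(node) AS (
        SELECT ?
        UNION
        SELECT edge.dst FROM edge JOIN reach ON edge.src = reach.node
    )
    SELECT node FROM reach ORDER BY node
    """,
    ("a",),
).fetchall()

print(two_hops, three_hops, reachable)
```

The query text grows with the traversal depth, which is the practical meaning of "number of potential joins" here.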
Graph Database?
If any database can represent a
graph, then what is a graph
database?
NoSQL : Characteristics
NoSQL : History
Ref: 9
NoSQL : Categories
Ref: 9
Graph Database?: Definition
“A graph database is any storage system that
provides index-free adjacency.”
• Each vertex serves as a “mini index” of its adjacent elements.
• No index lookups are necessary.
• The cost of the local step remains constant as the graph grows.
• Cheaper than global indexes.
Ref: 10
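Index-free adjacency can be sketched in a few lines of Python. This is an illustrative toy, not how any particular graph database stores data; the `Vertex` class, the `connect` helper and the sample names are all assumptions. Each vertex keeps direct references to its neighbours, so a local step touches only that vertex. The contrast shown is a scan over a global edge list with no index at all.

```python
class Vertex:
    """A vertex that stores direct references to its neighbours."""

    def __init__(self, name):
        self.name = name
        self.neighbors = []  # the vertex's own "mini index"


def connect(a, b):
    """Add an undirected edge by linking the two vertex objects."""
    a.neighbors.append(b)
    b.neighbors.append(a)


alice, bob, carol = Vertex("alice"), Vertex("bob"), Vertex("carol")
connect(alice, bob)
connect(bob, carol)

# Local step: follow the references held by the vertex itself.
# The work depends on bob's degree, not on the size of the graph.
friends_of_bob = [v.name for v in bob.neighbors]

# Contrast: with only a global edge list and no index, the same
# question requires looking at every edge in the graph.
edge_list = [("alice", "bob"), ("bob", "carol")]


def neighbors_by_scan(name):
    found = []
    for src, dst in edge_list:
        if src == name:
            found.append(dst)
        elif dst == name:
            found.append(src)
    return found


print(friends_of_bob, neighbors_by_scan("bob"))
```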
Graph Database?: Traversal
Ref: 11
Graph Database?:Definition
“A database that uses graph structures for semantic
queries with nodes,
edges and properties to represent and store data”
This definition is independent of the way the data is stored internally.
It's really the model and the implemented algorithms that matter.
Ref: 12
Graph data model : Representation
Graph data model : Building blocks
Nodes: entities
Relationships: connect entities and structure the domain
Properties: attributes and metadata
Labels: group nodes by role
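The four building blocks can be sketched as plain Python data classes. The class names, the `WORKS_AT` relationship type and the sample values are assumptions chosen for illustration; real graph databases define their own storage and APIs.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    labels: set                                       # labels group nodes by role
    properties: dict = field(default_factory=dict)    # attributes and metadata


@dataclass
class Relationship:
    start: Node
    type: str                                         # names how the two entities are connected
    end: Node
    properties: dict = field(default_factory=dict)    # relationships can carry properties too


alice = Node({"Person"}, {"name": "Alice"})
acme = Node({"Company"}, {"name": "Acme"})
works_at = Relationship(alice, "WORKS_AT", acme, {"since": 2015})

# Labels make it easy to select all nodes that play a given role.
nodes = [alice, acme]
people = [n.properties["name"] for n in nodes if "Person" in n.labels]
print(people)
```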
Graph data model : ERD example
Ref: 13
Graph data model : Why ?
For applications where interconnectivity and topology matter.
It allows a more natural modeling of connected data. Graph
structures are visible to the user and allow a natural way of
handling application data, for example hypertext or geographic
data.
Queries can refer directly to the graph structure,
so specific graph operations such as finding a shortest path or
determining a subgraph are possible.
For implementation, graph databases may provide special graph
storage structures and efficient graph algorithms for realizing
specific operations.
Ref: 14
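As an example of a graph operation that refers directly to the structure, here is a minimal shortest-path sketch using breadth-first search on an unweighted graph. The adjacency-dictionary representation, the function name `shortest_path` and the sample `roads` graph are assumptions for illustration only.

```python
from collections import deque


def shortest_path(graph, start, goal):
    """Return one shortest path (fewest edges) from start to goal, or None."""
    previous = {start: None}          # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the predecessor links back to the start.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for neighbor in graph.get(node, []):
            if neighbor not in previous:
                previous[neighbor] = node
                queue.append(neighbor)
    return None


roads = {"A": ["B", "C"], "B": ["D"], "C": ["D"], "D": ["E"], "E": []}
path = shortest_path(roads, "A", "E")
print(path)
```

Breadth-first search is sufficient only because the edges are unweighted; weighted graphs would need a different algorithm such as Dijkstra's.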
Graph data model : Motivation and Application
Critique of classical DB models: their drawbacks, and the difficulty for users
to see data connectivity.
Applications whose complexity exceeds the capabilities of
relational databases, e.g. managing transport networks.
Limited expressive power of current query languages.
The appearance of online hypertext showed the need for
other DB models.
In technological networks, the spatial and geographical aspects
of the structure are dominant.
Ref: 14
Database model : Components
Database model : Notions
Schema:
- A database schema is the skeleton of the database.
- It is designed before the database exists.
- A database schema does not contain any data.
Instance:
- It is the state of an operational database, with its data, at a given time.
- It contains a snapshot of the database.
- Database instances tend to change over time.
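The distinction can be sketched in Python. The `schema` dictionary, the `insert` helper and the sample record are assumptions for illustration: the schema describes structure and holds no data, while the instance is whatever data exists at a given moment and changes as records are added.

```python
# Schema: the skeleton. It names types and properties but holds no data.
schema = {"Person": {"name": str, "born": int}}

# Instance: the data stored at a given moment. It starts empty.
instance = []


def insert(node_type, record):
    """Append a record to the instance after checking it against the schema."""
    expected = schema[node_type]
    if set(record) != set(expected):
        raise ValueError(f"{node_type} needs exactly {sorted(expected)}")
    for key, expected_type in expected.items():
        if not isinstance(record[key], expected_type):
            raise TypeError(f"{key} must be {expected_type.__name__}")
    instance.append((node_type, record))


snapshot_before = list(instance)   # state at time t0
insert("Person", {"name": "Ada", "born": 1815})
snapshot_after = list(instance)    # state at time t1

print(snapshot_before, snapshot_after)
```

The schema is identical before and after the insert; only the instance has changed.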
Graph database model : Definition
Ref: 14
Graph database model : representation
Representation of database:
Flat graph: has many interconnected nodes; not expressive, extendible,
difficult to present the information to the user in a clear way.
Hypernode: a graph whose nodes can themselves be graphs, i.e. a set of
nested graphs; expressive. It offers the ability to represent each real-world
object as a separate database entity.
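The nesting idea behind hypernodes can be sketched as a graph class whose nodes may be plain values or other graphs. The `Hypernode` class, its methods and the person/address example are assumptions for illustration and simplify the formal model considerably.

```python
class Hypernode:
    """A graph whose nodes are plain values or other Hypernode objects."""

    def __init__(self, name):
        self.name = name
        self.nodes = []
        self.edges = []   # pairs (source, target) of nodes in this graph

    def add_node(self, node):
        self.nodes.append(node)
        return node

    def add_edge(self, source, target):
        self.edges.append((source, target))

    def depth(self):
        """Nesting depth: 1 for a flat graph, more when graphs are nested."""
        nested = [n.depth() for n in self.nodes if isinstance(n, Hypernode)]
        return 1 + max(nested, default=0)


# A flat graph: every node is a plain value.
address = Hypernode("address")
address.add_edge(address.add_node("street"), address.add_node("city"))

# A nested graph: one of the person's nodes is itself a graph.
person = Hypernode("person")
person.add_edge(person.add_node("name"), person.add_node(address))

print(address.depth(), person.depth())
```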
Graph database model : Data structures
1. Genealogy diagram example
Ref: 14
Graph database model : Data structures
1. Logical Data Model
The schema uses two basic-type nodes for representing data values (N, L), and two product-type
nodes (NL, PP) to establish relations among data values in a relational
style. The instance is a collection of tables, one for each node of the schema.
Ref: 14
Graph database model : Data structures
2. Hypernode Data Model
The schema defines a person as a complex object with the properties name
and lastname of type string, and parent of type person (recursively defined). The instance
shows the relations in the genealogy among different instances of person
Ref: 14
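The recursively defined person type described above can be approximated with a Python data class. The field names `name` and `lastname` follow the slide; modelling the parent property as a list called `parents`, the `ancestors` helper and the sample family are assumptions for illustration.

```python
from dataclasses import dataclass, field


@dataclass
class Person:
    name: str
    lastname: str
    # Recursive definition: a person's parents are themselves persons.
    parents: list["Person"] = field(default_factory=list)


def ancestors(person):
    """Collect the names of all ancestors by following parent links."""
    found = []
    for parent in person.parents:
        found.append(parent.name)
        found.extend(ancestors(parent))
    return found


# Hypothetical instance data, not the genealogy shown on the slide.
rosa = Person("Rosa", "Lee")
ines = Person("Ines", "Lee", [rosa])
tom = Person("Tom", "Lee", [ines])

print(ancestors(tom))
```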
Graph database model : Data structures
3. Hypergraph-Based Data Model (GROOVY)
GROOVY: Graphically Represented Object-Oriented data model with Values
The schema level models an object PERSON as a hypergraph that relates the attributes
NAME, LASTNAME and PARENTS.
Ref: 14
Graph database model : Data structures
4. Graph Data Model (GDM)
Ref: 14
Graph database model : Integrity constraints
Integrity constraints are general statements and rules that define the set of consistent
database states, or changes of state. In the case of graph DB models, they include:
- Schema-instance integrity: entity types and type checking.
- Schema-instance separation: the degree to which schema and instance are different objects
in the database.
- Redundancy of data: preserving uniqueness of data.
- Object identity and referential integrity: entity integrity assures that each hypernode is a
unique real-world entity identified by its content; referential integrity requires that only
existing entities be referenced.
Ref: 14
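Two of these constraints can be checked with a short Python sketch. The dictionary-of-nodes layout, the function names and the sample data are assumptions for illustration: `dangling_references` reports edges that break referential integrity, and `duplicate_content` reports node ids whose content is not unique.

```python
def dangling_references(nodes, edges):
    """Return edges whose source or target id is not an existing node."""
    return [(s, t) for s, t in edges if s not in nodes or t not in nodes]


def duplicate_content(nodes):
    """Return ids of nodes whose content repeats that of an earlier node."""
    seen = []
    repeated = []
    for node_id, content in nodes.items():
        if content in seen:
            repeated.append(node_id)
        else:
            seen.append(content)
    return repeated


nodes = {
    "n1": {"name": "Ana"},
    "n2": {"name": "Bo"},
    "n3": {"name": "Ana"},   # same content as n1
}
edges = [("n1", "n2"), ("n2", "n9")]   # n9 does not exist

violations = dangling_references(nodes, edges)
duplicates = duplicate_content(nodes)
print(violations, duplicates)
```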
Graph database model : Comparison with other Database Models
Ref: 14
Graph Databases : Critics
Yes, the graph model is more versatile than the relational model, but that does not
make it universal: in some cases, this versatility is a roadblock for
optimizations.
In fact, modern graph databases are niche solutions for a narrow set of
tasks: finding a route from A to B, working with friends in a social
network, information technology in medicine.
For most business applications, relational databases continue to prevail.
Graph Databases : Critics
Relational databases were designed to aggregate
data; graph databases were designed to find relations.
E.g. in the financial domain, all connections are known;
you only aggregate data by other data to find sums
and so on.
Graph Databases : Critics
Usually you need to learn a new query language such as
Cypher, Gremlin or SPARQL.
You have to use an API.
Fewer vendors to choose from and a smaller user
base, so it is harder to get support when you run into
issues.
Graph Databases : Critics
Graph databases are relatively immature
compared to well-established RDBMS.
Requires conceptual shift
No standardization
Graph Query Language : Extended SQL & Gremlin
OrientDB is a 2nd Generation Distributed
Graph Database with the flexibility of
Documents in one product. OrientDB is
another great graph DB tool which also
operates as a document DB or an
object-oriented database. Its query language is
based on SQL to make it 'more familiar to
TSQL developers'. Like Neo4j, there is a
community edition available, and licensing
for enterprise is very reasonable.
Ref: 19
Graph Query Language: Gremlin
Many graph databases have their own custom query language (e.g. Cypher in Neo4j).
These languages are useful, but they do not carry over to other databases.
Gremlin is a powerful domain-specific traversal language for graph databases.
It is supported by many popular graph databases.
Learning Gremlin for graph databases is comparable to learning SQL for relational
databases.
Ref: 21
Graph Query Language: Gremlin
Ref: 22,23
References
[1] https://neo4j.com/blog/rdbms-graphs-basics-for-relational-developer/
[2] http://images.google.de/imgres?imgurl=http%3A%2F%2Fcdn2.business2community.com%2Fwp-
content%2Fuploads%2F2014%2F03%2Fistock_000006832296xsmall_small.jpg&imgrefurl=http%3A%2F%2Fwww.business2
community.com%2Fcontent-marketing%2Fbrand-boring-content-marketing-0819786&h=232&w=300&tbnid=GUe7dYIZl9-
29M%3A&docid=jRPxqmLQ1TLKWM&ei=2V1uV4 -eG8yWgAad_baAAQ&tbm=isch&client=firefox-
b&iact=rc&uact=3&dur=226&page=3&start=42&ndsp=27&ved=0ahUKEwjP7pz8_MLNAhVMC8AKHZ2 -
DRAQMwiLASgrMCs&bih=634&biw=1366
[3] http://www.slideshare.net/infinitegraph/an-introduction-to-graph-databases, slide 5
[4] Trees and Hierarchies in SQL for Smarties, Joe Celko, Morgan Kaufmann, ISBN: 1558609202
[5] http://www.slideshare.net/ehildebrandt/trees-and-hierarchies-in-sql
[6] http://www.slideshare.net/navicorevn/hierarchical-data-models-in-relational-databases
[7] https://en.wikipedia.org/wiki/Data_model
[8] http://www.slideshare.net/slidarko/graph-windycitydb2010/25-Representing_a_Graph_in_a
[9] GRAPH DATABASES AND ORIENTDB. INFO-H-415: Advanced Databases (Project). Professor: Esteban Zimányi,
cs.ulb.ac.be/public/_media/teaching/infoh415/student_projects/orientdb.pdf
[10] http://systemg.research.ibm.com/database.html
[11] https://www.youtube.com/watch?v=kpLqfFGubKM
[12] https://www.arangodb.com/2016/04/index-free-adjacency-hybrid-indexes-graph-databases/
[13] https://neo4j.com/blog/rdbms-vs-graph-data-modeling/
[14] Angles, R., & Gutierrez, C., "Survey of graph database models", ACM Computing Surveys, Vol. 40, No. 1, Article 1, Feb. 2008
[15] http://db-engines.com/en/ranking_trend/graph+dbms
[16] R. Campbell et al., "A performance evaluation of open source graph databases", ACM, PPAA '14, February 16, 2014.
[17] http://www.tutorialspoint.com/neo4j/neo4j_cql_introduction.htm
[18] https://neo4j.com/developer/cypher-query-language/
[19] http://orientdb.com/docs/last/index.html
[20] http://pettergraff.blogspot.de/2014/01/getting-started-with-orientdb.html
[21] http://www.fromdev.com/2013/09/Gremlin-Example-Query-Snippets-Graph-DB.html
[22] http://sql2gremlin.com/
[23] http://gremlindocs.spmallette.documentup.com/