Graph Storage: 10x faster & 100x cheaper

chloewilliams62 44 views 16 slides Oct 14, 2024

About This Presentation

Talk by Jakob Pörschmann, AI/ML Customer Engineer, Google Cloud


Slide Content

Jakob Pörschmann - October 2024
Graph Storage:
10x faster, 100x cheaper
Building Beyond the Buzz V

Proprietary + Confidential
Representing data as graphs uncovers
unprecedented insights.

Extractive (local) questions require different
retrieval than aggregative (global) questions.
Text Embedding RAG
Doc -> chunks -> embeddings in a vector DB ([0.2, -0.8, …], [0.5, -0.2, …], [-0.7, -0.2, …]).
User query -> embedding [0.1, -0.7, …] -> find similar chunks.
Prompt: "Given some context: <retrieved chunks>. What is the answer to: <user query>"
Original doc chunks as final answer context.
Great at extractive answers.

Graph RAG
Doc -> entities and edges -> graph DB -> communities (“Topic 1”, “Topic 2”).
User query on “Topic 2” -> prompt: "Given only relevant community reports: Community 2. What is the answer to: <user query>"
Intermediate answer to the user query based on all community reports.
Aggregated topics across docs as final context.
Great at aggregation answers.
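
The embedding side of the comparison above can be sketched in a few lines. This is a minimal illustration, not the speaker's implementation: the two-dimensional vectors reuse the leading values shown on the slide, and `cosine`, `top_k`, `build_prompt` and the chunk texts are hypothetical names chosen for this example.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, store, k=1):
    """Return the k stored chunks most similar to the query embedding."""
    ranked = sorted(store, key=lambda item: cosine(query_vec, item["embedding"]), reverse=True)
    return ranked[:k]

def build_prompt(question, chunks):
    """Assemble the prompt shape shown on the slide from retrieved chunks."""
    context = "\n".join(chunk["text"] for chunk in chunks)
    return f"Given some context:\n{context}\n\nWhat is the answer to:\n{question}"

# Toy vector store: placeholder chunk texts, truncated slide embeddings.
store = [
    {"text": "chunk A", "embedding": [0.2, -0.8]},
    {"text": "chunk B", "embedding": [0.5, -0.2]},
    {"text": "chunk C", "embedding": [-0.7, -0.2]},
]
query_vec = [0.1, -0.7]

best = top_k(query_vec, store, k=1)
prompt = build_prompt("User Query", best)
```

Because only the nearest chunks reach the prompt, this approach suits extractive questions. An aggregative question would need context drawn from across all documents, which is what the community reports on the Graph RAG side provide.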

Where am I going to store this graph stuff…?
Reasons to hesitate about a graph DB:
Learning a query language (Cypher/Gremlin)
Expanding the tech stack
No decent serverless option

Graphs are just a
special form of
semi-structured data

If we only had DBs…

… with flexible schema.

… great at read heavy workloads.

… efficient to filter on attribute level.

Graphs can be stored in a regular NoSQL
DB with the right data model.
graph2nosql
on Github
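
One way to picture such a data model: each node becomes one document that carries its own adjacency lists. The sketch below is an assumption-laden illustration, not the graph2nosql API. The field names `node_uid`, `edges_to` and `edges_from` and the class `DictGraphStore` are hypothetical, and a plain dictionary stands in for a NoSQL collection.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    """One graph node stored as a single document (hypothetical schema)."""
    node_uid: str
    node_title: str
    edges_to: list = field(default_factory=list)    # outgoing neighbor ids
    edges_from: list = field(default_factory=list)  # incoming neighbor ids

class DictGraphStore:
    """In-memory stand-in for a document collection keyed by node_uid."""

    def __init__(self):
        self.docs = {}

    def add_node(self, node):
        self.docs[node.node_uid] = node

    def add_edge(self, source_uid, target_uid, directed=True):
        """Record the edge on both endpoint documents."""
        src, dst = self.docs[source_uid], self.docs[target_uid]
        if target_uid not in src.edges_to:
            src.edges_to.append(target_uid)
        if source_uid not in dst.edges_from:
            dst.edges_from.append(source_uid)
        if not directed:
            # An undirected edge is stored as two directed edges.
            self.add_edge(target_uid, source_uid, directed=True)

graph = DictGraphStore()
graph.add_node(Node("alice", "Alice"))
graph.add_node(Node("bob", "Bob"))
graph.add_node(Node("carol", "Carol"))
graph.add_edge("alice", "bob")                  # directed
graph.add_edge("bob", "carol", directed=False)  # undirected
```

Storing both directions on each document costs an extra write per edge, but it means a node's neighborhood can be read with a single document lookup, which fits read-heavy workloads.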

Wikidata5m

Pricing: Aura (Neo4j) vs. Firestore (graph2nosql)

GCP pricing calculator

Graph databases build on index-free adjacency, Firestore on single-field indexes.
“The graph structure is stored as fixed-length records: one store for nodes and a similar one for relationships. Multiplying the ID of a record by its size in bytes gives you its offset in the corresponding store file. This pattern is known as index-free adjacency [...]”
“If no index exists for a query, most databases crawl through their contents item by item, [...]. Cloud Firestore guarantees high query performance by using indexes for all queries. As a result, query performance depends on the size of the result set and not on the number of items in the database.”
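
The offset arithmetic in the first quote can be illustrated with fixed-length binary records. This is a toy sketch under stated assumptions: the two-integer record layout and the names `RECORD`, `write_store` and `read_record` are hypothetical and do not reflect any database's actual file format.

```python
import struct

# Hypothetical fixed-length record: two little-endian 32-bit integers,
# e.g. (first_relationship_id, first_property_id). Every record is 8 bytes.
RECORD = struct.Struct("<ii")

def write_store(records):
    """Serialize records back to back, so record i starts at i * RECORD.size."""
    return b"".join(RECORD.pack(*record) for record in records)

def read_record(store, record_id):
    """Jump straight to a record: offset = id * record size, no index lookup."""
    offset = record_id * RECORD.size
    return RECORD.unpack_from(store, offset)

store = write_store([(10, 100), (11, 101), (12, 102)])
third = read_record(store, 2)
```

The lookup cost is constant regardless of how many records the store holds. An index-backed document store reaches a similar property differently: the cost follows the size of the result set, as the second quote states.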

Single-field indexes and a data model that allows undirected and directed connections, ftw.
“Who are Bob’s friends?” vs. “Who is friends with Bob?”
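
The two questions differ only in edge direction, and an index on an array field can answer the second without scanning every document. The sketch below emulates that idea in memory. The documents, the `edges_to` field and the helpers `build_array_index`, `friends_of` and `friends_with` are hypothetical; in Firestore the reverse lookup would roughly correspond to an `array_contains` filter on the same field.

```python
from collections import defaultdict

# Toy documents: each node lists the nodes it points to.
docs = {
    "alice": {"edges_to": ["bob"]},
    "bob": {"edges_to": ["carol"]},
    "carol": {"edges_to": ["bob"]},
}

def build_array_index(docs, field_name):
    """Map each array value to the ids of documents containing it."""
    index = defaultdict(set)
    for doc_id, doc in docs.items():
        for value in doc.get(field_name, []):
            index[value].add(doc_id)
    return index

edges_to_index = build_array_index(docs, "edges_to")

def friends_of(uid):
    """'Who are Bob's friends?': read Bob's own document."""
    return sorted(docs[uid]["edges_to"])

def friends_with(uid):
    """'Who is friends with Bob?': index lookup, no full scan."""
    return sorted(edges_to_index.get(uid, set()))

bobs_friends = friends_of("bob")
friends_with_bob = friends_with("bob")
```

For an undirected friendship both answers coincide, since the edge is stored in both directions. For directed edges they can differ, as they do for Bob here.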

The Graph Storage landscape is underdeveloped
and has a lot of space to grow.
OSS focus: Firestore is closed source; a MongoDB interface for graph2nosql is TBD.
Neo4j UI and Cypher: A structured query language is a huge advantage for exploration. The Neo4j UI is great for exploration by non-technical stakeholders.
Cost + latency: Firestore outperforms in simple cases, let’s discuss.
hello-jp.net

That’s a wrap!

Try it out or contribute