Talk from Jakob Pörschmann, AI/ML Customer Engineer, Google Cloud
Added: Oct 14, 2024
Slides: 16 pages
Jakob Pörschmann - October 2024
Graph Storage: 10x faster, 100x cheaper
Building Beyond the Buzz V
Proprietary + Confidential
Representing data as graphs uncovers unprecedented insights.
Extractive (local) questions require different retrieval than aggregative (global) questions.
Text Embedding RAG
Doc → Chunk, Chunk, Chunk → Vector DB: [0.2, -0.8, …], [0.5, -0.2, …], [-0.7, -0.2, …]
User Query [0.1, -0.7, …] → Find Similar
Prompt: “Given some context: [0.2, -0.8, …] What is the answer to: User Query”
Original doc chunks as final answer context.
Great at extractive answers.
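The retrieval step of text embedding RAG can be sketched in a few lines. This is a minimal sketch, not the talk's implementation: it uses only the first two components visible on the slide as toy 2-D embeddings (the real vectors are elided), hypothetical chunk texts, and plain cosine similarity in place of a vector DB.

```python
import math

# Toy 2-D "embeddings": only the first two components visible on the slide.
# The remaining dimensions are elided in the source; chunk texts are hypothetical.
CHUNKS = [
    ("original text of chunk 1", [0.2, -0.8]),
    ("original text of chunk 2", [0.5, -0.2]),
    ("original text of chunk 3", [-0.7, -0.2]),
]


def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def find_similar(query_vec: list[float], k: int = 1) -> list[str]:
    """Return the texts of the k chunks whose embeddings are closest to the query."""
    ranked = sorted(CHUNKS, key=lambda item: cosine(query_vec, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]


def build_prompt(context: list[str], user_query: str) -> str:
    """The original chunk texts, not their vectors, become the answer context."""
    return "Given some context:\n" + "\n".join(context) + f"\nWhat is the answer to:\n{user_query}"


top_chunks = find_similar([0.1, -0.7])
print(build_prompt(top_chunks, "User Query"))
```

With these toy vectors the query lands closest to chunk 1, matching the slide, where the chunk embedded as [0.2, -0.8, …] is the one passed as context.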
Graph RAG
Doc → Entities + Edges → Graph DB, grouped into Community “Topic 1” and Community “Topic 2”
User query about “topic 2”
Prompt: “Given only relevant community reports: Community 2 What is the answer to: User Query”
Intermediate answer to user query based on all community reports.
Aggregated topics across docs as final context.
Great at aggregation answers.
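The Graph RAG query path can be sketched as a small map-reduce over community reports. This is a hypothetical sketch, not the speaker's or any GraphRAG library's code: the community reports are hard-coded strings, relevance is a naive substring match, and `answer_from` is a stub standing in for an LLM call.

```python
# Hypothetical community reports, as produced after grouping the entity graph
# into communities and summarizing each one (that step is not shown here).
COMMUNITY_REPORTS = {
    "Community 1": "Report summarizing the entities and edges about topic 1.",
    "Community 2": "Report summarizing the entities and edges about topic 2.",
}


def relevant_reports(user_query: str) -> dict[str, str]:
    """Naive relevance filter (assumption): keep reports that mention the query text."""
    needle = user_query.lower()
    return {name: text for name, text in COMMUNITY_REPORTS.items() if needle in text.lower()}


def answer_from(report: str, user_query: str) -> str:
    """Stub for an LLM call producing an intermediate answer from one report."""
    return f"Intermediate answer to '{user_query}' based on: {report}"


def graph_rag_answer(user_query: str) -> str:
    """Map: one intermediate answer per relevant report. Reduce: join them as final context."""
    partials = [answer_from(text, user_query) for text in relevant_reports(user_query).values()]
    return "\n".join(partials)


print(graph_rag_answer("topic 2"))
```

In a real pipeline the reduce step would be another LLM call that aggregates the intermediate answers; joining strings keeps the sketch deterministic.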
Where am I going to store this graph stuff…?
Reasons for graph DB hesitation:
Learning a query language (Cypher/Gremlin)
Expanding the tech stack
No decent serverless option
Graphs are just a special form of semi-structured data.
If only we had DBs…
… with a flexible schema.
… that are great at read-heavy workloads.
… that filter efficiently at the attribute level.
Graphs can be stored in a regular NoSQL DB with the right data model.
graph2nosql on GitHub
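One way such a data model could look, as a minimal sketch: every node becomes a document holding its attributes plus two edge arrays, `edges_to` and `edges_from`, and an undirected edge is simply written in both directions. The field names and the `graph_to_documents` helper are illustrative assumptions, not graph2nosql's actual schema, and a plain dict stands in for a NoSQL collection.

```python
import json


def _link(docs: dict[str, dict], source: str, target: str) -> None:
    """Record one directed edge on both endpoint documents."""
    docs[source]["edges_to"].append(target)
    docs[target]["edges_from"].append(source)


def graph_to_documents(
    nodes: dict[str, dict], edges: list[tuple[str, str, bool]]
) -> dict[str, dict]:
    """Turn a node map plus (source, target, directed) edges into one document per node."""
    docs = {
        node_id: {"node_id": node_id, **attrs, "edges_to": [], "edges_from": []}
        for node_id, attrs in nodes.items()
    }
    for source, target, directed in edges:
        _link(docs, source, target)
        if not directed:
            # An undirected edge is stored as two directed ones.
            _link(docs, target, source)
    return docs


nodes = {"alice": {"type": "person"}, "bob": {"type": "person"}, "graphs": {"type": "topic"}}
edges = [("alice", "bob", False), ("bob", "graphs", True)]
docs = graph_to_documents(nodes, edges)
print(json.dumps(docs["bob"]))
# {"node_id": "bob", "type": "person", "edges_to": ["alice", "graphs"], "edges_from": ["alice"]}
```

Storing both directions trades extra writes for cheap reads, which suits the read-heavy workloads mentioned earlier.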
Wikidata5m
Pricing: Aura (Neo4j) vs. Firestore (graph2nosql)
Graph databases build on index-free adjacency, Firestore on single-field indexes.
“The graph structure is stored as fixed-length records: one store for nodes and a similar one for relationships. Multiplying the ID of a record by its size in bytes gives you its offset in the corresponding store file. This pattern is known as index-free adjacency [...]”
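The offset arithmetic in the quote can be sketched with Python's `struct` module. The record layout here is hypothetical (a 9-byte record holding an in-use flag and a first-relationship ID) and is not Neo4j's real on-disk format; the point is only that locating node N takes one multiplication, not an index lookup.

```python
import struct

# Hypothetical fixed-length node record: (in_use flag, ID of first relationship).
# "<" disables alignment padding, so every record is exactly 1 + 8 = 9 bytes.
# Real Neo4j store formats differ; only the offset arithmetic matters here.
NODE_RECORD = struct.Struct("<?q")
RECORD_SIZE = NODE_RECORD.size


def write_node(store: bytearray, node_id: int, first_rel_id: int) -> None:
    """Write a node record at offset node_id * RECORD_SIZE, growing the store if needed."""
    offset = node_id * RECORD_SIZE
    end = offset + RECORD_SIZE
    if len(store) < end:
        store.extend(bytes(end - len(store)))
    NODE_RECORD.pack_into(store, offset, True, first_rel_id)


def read_node(store: bytearray, node_id: int) -> tuple[bool, int]:
    """Jump straight to the record: offset = ID * record size, no index lookup."""
    return NODE_RECORD.unpack_from(store, node_id * RECORD_SIZE)


store = bytearray()
write_node(store, 0, first_rel_id=10)
write_node(store, 2, first_rel_id=42)
print(read_node(store, 2))  # (True, 42)
print(read_node(store, 1))  # (False, 0): slot never written, still zero bytes
```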
“If no index exists for a query, most databases crawl through their contents item by item, [...]. Cloud Firestore guarantees high query performance by using indexes for all queries. As a result, query performance depends on the size of the result set and not on the number of items in the database.”
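A toy model of the behaviour the quote describes, assuming "cost" simply means the number of documents touched: a full scan reads every document, while a lookup in a single-field index (value → document IDs) reads only the matches. `DOCS`, `node_type`, and both helpers are hypothetical; Firestore builds and maintains its real indexes server-side.

```python
from collections import defaultdict

# Hypothetical collection: 1,000 node documents, every 100th one a "person".
DOCS = {f"node{i}": {"node_type": "person" if i % 100 == 0 else "topic"} for i in range(1000)}


def full_scan(field: str, value: str) -> tuple[list[str], int]:
    """No index: inspect every document, so cost grows with the collection size."""
    hits = [doc_id for doc_id, doc in DOCS.items() if doc.get(field) == value]
    return hits, len(DOCS)


def build_single_field_index(field: str) -> dict[str, list[str]]:
    """Map each value of one field to the IDs of the documents that hold it."""
    index: dict[str, list[str]] = defaultdict(list)
    for doc_id, doc in DOCS.items():
        if field in doc:
            index[doc[field]].append(doc_id)
    return index


def indexed_query(index: dict[str, list[str]], value: str) -> tuple[list[str], int]:
    """With an index: read only the matching entries, so cost grows with the result size."""
    hits = list(index.get(value, []))
    return hits, len(hits)


index = build_single_field_index("node_type")
scan_hits, scan_cost = full_scan("node_type", "person")
idx_hits, idx_cost = indexed_query(index, "person")
print(scan_cost, idx_cost)  # 1000 10
```

Both paths return the same ten documents; only the indexed path's cost stays flat as the collection grows.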
Single-field indexes & a data model allowing for both undirected & directed connections ftw.
“Who are Bob’s friends?” vs. “Who is friends with Bob?”
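The two questions differ only in edge direction. A minimal sketch, assuming each node document keeps its outgoing edges in a hypothetical `edges_to` array: the first question is a single read of Bob's document, the second is an array-contains filter over `edges_to`, which Firestore can serve from the single-field index it maintains for array fields. Here a plain dict and a prebuilt reverse map stand in for the database and that index.

```python
from collections import defaultdict

# Hypothetical node documents: each stores only its outgoing "friend" edges.
NODES = {
    "alice": {"edges_to": ["bob"]},
    "bob": {"edges_to": ["alice", "carol"]},
    "carol": {"edges_to": []},
    "dave": {"edges_to": ["bob"]},
}

# Stand-in for an array-contains index on "edges_to":
# maps each array element to the IDs of the documents whose array contains it.
EDGES_TO_INDEX: dict[str, set[str]] = defaultdict(set)
for node_id, doc in NODES.items():
    for target in doc["edges_to"]:
        EDGES_TO_INDEX[target].add(node_id)


def friends_of(node_id: str) -> list[str]:
    """Outgoing ("Who are Bob's friends?"): one document read."""
    return sorted(NODES[node_id]["edges_to"])


def friends_with(node_id: str) -> list[str]:
    """Incoming ("Who is friends with Bob?"): index lookup, edges_to array-contains node_id."""
    return sorted(EDGES_TO_INDEX.get(node_id, set()))


def connected(node_id: str) -> list[str]:
    """Undirected view: union of both directions."""
    return sorted(set(friends_of(node_id)) | set(friends_with(node_id)))


print(friends_of("bob"))    # ['alice', 'carol']
print(friends_with("bob"))  # ['alice', 'dave']
print(connected("bob"))     # ['alice', 'carol', 'dave']
```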
The graph storage landscape is underdeveloped and has a lot of room to grow.
OSS focus: Firestore is closed source; a graph2nosql interface for MongoDB is TBD.
Cypher: A structured query language is a huge advantage for exploration.
Neo4j UI: Great for exploration by non-technical stakeholders.
Cost + latency: Firestore outperforms in simple cases; let’s discuss.
hello-jp.net
That’s a wrap!
Try it out or contribute