Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced RAG.pdf
neo4j
40 slides
May 01, 2024
About This Presentation
These are the slides delivered in a workshop at Data Innovation Summit Stockholm April 2024, by Kristof Neys and Jonas El Reweny.
Slide Content
Best of Both Worlds: Combine KG and Vector search for enhanced RAG
Data Innovation Summit 2024
Jonas El Reweny, Kristof Neys
Neo4j Field Engineering
Agenda
Neo4j Inc. All rights reserved 2023
1. Knowledge Graph
2. Graph Query Language
3. Graph Data Science
4. Vectors
5. Demo Time!
Notebook in Google Colab:
tinyurl.com/disws24
Neo4j Sandbox:
sandbox.neo4j.com
Prerequisites for the workshop:
● Laptop with internet access and no outbound restrictions on ports 80, 443, 7687
● Register an account and log in to https://sandbox.neo4j.com and select the "Blank Sandbox" project
● Register an account and log in to https://colab.research.google.com/
But….
First a word from our
sponsor…
Neo4j: The Graph Database & Analytics Leader
300
1B+ Enterprise
customers
$500M in funding
170+ Global partner ecosystem
250K Community of developers and data pros
100M+ Downloads
The first-ever graph database
The core graph object: a Knowledge Graph
Recap a Knowledge Graph
A knowledge graph is a structured representation of facts, consisting of entities, relationships and semantic descriptions.
From data points to a Knowledge Graph
• Node2Vec
• FastRP
• FastRPExtended
• GraphSAGE
• Synthetic Graph Generation
• Scale Properties
• Collapse Paths
• One Hot Encoding
• Split Relationships
• Graph Export
• Pregel API (write your own algos)
It’s Better with Vectors…
What is a Vector?
● Length
● Direction
● Components have meaning (horizontal, vertical)
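As a small illustration of the three properties above, the following Python sketch computes the length and direction of a 2-D vector from its horizontal and vertical components. The example vector and the helper names `length` and `direction_degrees` are illustrative assumptions, not taken from the slides.

```python
import math

def length(v):
    """Euclidean length (magnitude) of a vector."""
    return math.hypot(*v)

def direction_degrees(v):
    """Angle of a 2-D vector, counter-clockwise from the horizontal axis."""
    horizontal, vertical = v
    return math.degrees(math.atan2(vertical, horizontal))

# Components carry meaning: 3 units horizontal, 4 units vertical.
v = (3.0, 4.0)

print("length:", length(v))
print("direction (degrees):", round(direction_degrees(v), 2))
```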
Vector arithmetic
[Figure, three panels: (1) vectors a and b; (2) vectors a and b; (3) the sum a + b]
Kings and Queens
king − man + woman ≈ queen
[Figure, three panels: (1) king, man, woman; (2) king, man, woman; (3) queen?]
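The "king − man + woman ≈ queen" idea can be sketched with plain component-wise arithmetic. The two-dimensional vectors below are hand-made toy values (an assumed "royalty" and "masculinity" axis), chosen only to make the arithmetic visible; real embeddings are learned by a model and have far more dimensions. The helper names are hypothetical.

```python
import math

# Hand-made toy "embeddings": [royalty, masculinity]. Illustrative values only.
words = {
    "king":   [0.95, 0.90],
    "queen":  [0.93, 0.10],
    "man":    [0.05, 0.92],
    "woman":  [0.04, 0.08],
    "prince": [0.70, 0.85],
}

def add(a, b):
    """Component-wise vector addition."""
    return [x + y for x, y in zip(a, b)]

def sub(a, b):
    """Component-wise vector subtraction."""
    return [x - y for x, y in zip(a, b)]

def nearest(target, vocabulary, exclude=()):
    """Word whose vector has the smallest Euclidean distance to target."""
    candidates = {w: v for w, v in vocabulary.items() if w not in exclude}
    return min(candidates, key=lambda w: math.dist(candidates[w], target))

# king - man + woman
result = add(sub(words["king"], words["man"]), words["woman"])

# The input words are excluded so the search returns a new word.
answer = nearest(result, words, exclude=("king", "man", "woman"))
print(answer)
```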
What are vector embeddings
● Same concepts, just "an arrow"
● 100s or 1000s of dimensions
Finding Similar vectors
● Cosine: direction / angle based
● Euclidean: distance based
[Figure: vector points, a query, and its nearest 4]
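A brute-force sketch of "nearest 4" under both measures is given here, using made-up 2-D points. It compares the query against every stored vector, which is exactly the cost a vector index tries to avoid; the point names and coordinates are assumptions for illustration. Note how point `a` points in the same direction as the query but is far away, so the two measures rank it very differently.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between a and b (1.0 means same direction)."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def nearest_by_cosine(query, points, k):
    """Names of the k points with the highest cosine similarity to query."""
    ranked = sorted(points, key=lambda n: cosine_similarity(query, points[n]),
                    reverse=True)
    return ranked[:k]

def nearest_by_euclidean(query, points, k):
    """Names of the k points with the smallest Euclidean distance to query."""
    ranked = sorted(points, key=lambda n: math.dist(query, points[n]))
    return ranked[:k]

# Made-up stored vectors.
points = {
    "a": (10.0, 10.0),   # same direction as the query, but far away
    "b": (1.2, 0.6),
    "c": (0.5, 1.5),
    "d": (-1.0, 1.0),
    "e": (2.0, 0.0),
    "f": (-1.0, -1.0),
}
query = (1.0, 1.0)

by_cosine = nearest_by_cosine(query, points, k=4)
by_euclidean = nearest_by_euclidean(query, points, k=4)
print("cosine:   ", by_cosine)
print("euclidean:", by_euclidean)
```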
Why a Vector Store?
Why & What is a Vector Index?
● Applied to: embedding vectors of mainly unstructured data, such as text, audio and video, that has been converted using embedding models ("raw" vectors).
● Main purpose: use approximate methods to perform similarity search at lower computational cost.
● Once an embedding vector has been stored as a node property, a vector index can be created across those properties.
● Indexing is an algorithm that maps the original vectors to a data structure that enables faster search.
● Creating a vector index builds a data structure optimized for queries at "store time" (as opposed to GDS similarity search, which is computed at query time).
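A minimal sketch of creating such an index from Python follows. It assumes a recent Neo4j 5 release that supports the `CREATE VECTOR INDEX` Cypher command; the index name `chunk_embeddings`, the `Chunk` label, the `embedding` property and the dimension count 1536 are hypothetical placeholders. The helper accepts any object with a `run(query)` method, such as a session from the official `neo4j` Python driver.

```python
# Cypher statement that creates a vector index over a node property.
# Index name, label, property and dimension count are hypothetical.
CREATE_VECTOR_INDEX = """
CREATE VECTOR INDEX chunk_embeddings IF NOT EXISTS
FOR (c:Chunk)
ON c.embedding
OPTIONS { indexConfig: {
  `vector.dimensions`: 1536,
  `vector.similarity_function`: 'cosine'
}}
"""

def create_vector_index(session):
    """Run the statement on any object exposing run(query),
    for example a session from the neo4j Python driver."""
    session.run(CREATE_VECTOR_INDEX)
```

With the official driver this would typically be called as `with driver.session() as session: create_vector_index(session)`, where `driver` comes from `GraphDatabase.driver(uri, auth=(user, password))`. The dimension count has to match the embedding model that produced the vectors.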
How is search performed?
● The query vector is any piece of unstructured data that has been converted to an embedding vector (the "raw" vector) and is mapped to the index using the same algorithm (i.e. Hierarchical Navigable Small World).
● The "key" vectors are the stored vectors that have been indexed.
● When a search is performed, a similarity function is applied between the query vector and the stored vectors.
● Several similarity measures can be used, including:
○ Cosine similarity
○ Euclidean similarity
○ Dot product
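The three measures are closely related. The sketch below, using made-up vectors and hypothetical helper names, shows that once vectors are normalized to unit length the dot product equals the cosine similarity, and the squared Euclidean distance equals 2 − 2·cosine, so all three produce the same ranking. It works with the raw Euclidean distance; how an index converts that distance into a similarity score is not covered here.

```python
import math

def dot(a, b):
    """Dot product of two vectors."""
    return sum(x * y for x, y in zip(a, b))

def norm(a):
    """Euclidean length of a vector."""
    return math.sqrt(dot(a, a))

def normalize(a):
    """Scale a vector to unit length."""
    n = norm(a)
    return [x / n for x in a]

def cosine_similarity(a, b):
    """Cosine of the angle between a and b."""
    return dot(a, b) / (norm(a) * norm(b))

def euclidean_distance(a, b):
    """Straight-line distance between a and b."""
    return math.dist(a, b)

# Made-up vectors, normalized to unit length.
query = normalize([1.0, 2.0, 2.0])
stored = normalize([2.0, 1.0, 2.0])

cos = cosine_similarity(query, stored)
dp = dot(query, stored)
dist = euclidean_distance(query, stored)

print("cosine similarity:", cos)
print("dot product:      ", dp)
print("distance squared: ", dist ** 2, "vs 2 - 2*cos:", 2 - 2 * cos)
```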
Neo4j and Vector Search
● Vector Similarity Search: Find relevant documents and content for user queries.
● Graph Traversals & Pattern Matching: Find entities associated with content and patterns in connected data.
● Knowledge Graph Inference & ML: Improve search relevance & insights by enhancing a Knowledge Graph. Use graph algorithms and ML to discover new relationships, entities, and groups.
[Diagram labels: Vector Search, Graph Database]
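One way to combine the first two capabilities in a single query is sketched below: a vector index lookup via the `db.index.vector.queryNodes` procedure, followed by a graph traversal from the matching nodes. The graph schema here (`Chunk` nodes with a `text` property, `MENTIONS` relationships, `Entity` nodes with a `name` property) and the index name are hypothetical and would need to match the actual data model. The helper accepts any object with a `run(query, **parameters)` method, such as a `neo4j` driver session.

```python
# Vector search first, then expand into the knowledge graph.
# Labels, relationship type, properties and index name are hypothetical.
HYBRID_QUERY = """
CALL db.index.vector.queryNodes($index_name, $k, $query_vector)
YIELD node AS chunk, score
MATCH (chunk)-[:MENTIONS]->(entity:Entity)
OPTIONAL MATCH (entity)--(related:Entity)
RETURN chunk.text AS text,
       score,
       collect(DISTINCT entity.name) AS entities,
       collect(DISTINCT related.name) AS related_entities
ORDER BY score DESC
"""

def hybrid_search(session, query_vector, index_name="chunk_embeddings", k=4):
    """Return the k most similar chunks together with entities they
    mention and the neighbours of those entities."""
    result = session.run(
        HYBRID_QUERY,
        index_name=index_name,
        k=k,
        query_vector=query_vector,
    )
    return list(result)
```

The query vector must come from the same embedding model that produced the stored vectors. The retrieved text and the connected entities can then both be passed to the language model as context.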
What about Graph Embeddings…?
What are node embeddings?
Node embeddings are representations of nodes as low-dimensional vectors that summarize their graph position, the structure of their local graph neighborhood, and any available node features.
NODE EMBEDDING
4 algorithms…and counting
• FastRP (Fast Random Projection) - Calculates embeddings extremely fast using probabilistic sampling and linear algebra.
• GraphSAGE (Graph SAmple and aggreGatE) - Trains a Graph Neural Network (GNN) to generate embeddings on old and new graph data. Uses batch sampling procedures for scalability.
• Node2Vec - Creates embeddings that represent nodes in similar neighborhoods and/or structural "roles" in the graph using adjustable random walks.
• HashGNN - Quickly generates embeddings on heterogeneous graphs. Like a GNN but much faster and simpler with comparable benchmarked performance. Leverages a clever application of hashing functions rather than training a model.
Graph Data Science Embeddings
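To give a feel for the random-projection idea behind FastRP, here is a heavily simplified pure-Python sketch. It is not the Graph Data Science implementation: it assumes an undirected, unweighted graph, gives every iteration equal weight, and uses an ad hoc sparse random initialization. Each node starts with a random vector, then repeatedly takes the normalized average of its neighbours' vectors, and the final embedding is the sum over iterations. The function name and the example graph are hypothetical.

```python
import math
import random

def l2_normalize(v):
    """Scale v to unit length; zero vectors are returned unchanged."""
    n = math.sqrt(sum(x * x for x in v))
    return [x / n for x in v] if n > 0 else list(v)

def fastrp_like_embeddings(edges, num_nodes, dim=8, iterations=3, seed=42):
    """Simplified random-projection node embeddings (illustrative only)."""
    rng = random.Random(seed)

    # Undirected adjacency lists.
    neighbors = [[] for _ in range(num_nodes)]
    for u, v in edges:
        neighbors[u].append(v)
        neighbors[v].append(u)

    # Sparse random starting vectors with entries in {-1, 0, +1}.
    current = [[rng.choice((-1.0, 0.0, 0.0, 1.0)) for _ in range(dim)]
               for _ in range(num_nodes)]
    result = [[0.0] * dim for _ in range(num_nodes)]

    for _ in range(iterations):
        nxt = []
        for node in range(num_nodes):
            nbrs = neighbors[node]
            if nbrs:
                # Average the neighbours' vectors from the previous round.
                avg = [sum(current[n][d] for n in nbrs) / len(nbrs)
                       for d in range(dim)]
            else:
                avg = [0.0] * dim
            nxt.append(l2_normalize(avg))
        current = nxt
        # Accumulate this round's vectors into the final embedding.
        for node in range(num_nodes):
            for d in range(dim):
                result[node][d] += current[node][d]
    return result

# Two triangles (0-1-2 and 3-4-5) joined by the edge 2-3.
edges = [(0, 1), (1, 2), (0, 2), (2, 3), (3, 4), (4, 5), (3, 5)]
embeddings = fastrp_like_embeddings(edges, num_nodes=6)
print(len(embeddings), "nodes,", len(embeddings[0]), "dimensions each")
```

Because no model is trained, the only cost is a few passes over the relationships. The resulting vectors can be stored as node properties and indexed like any other embedding.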
Notebook Time!