Workshop - Best of Both Worlds_ Combine KG and Vector search for enhanced RAG.pdf

neo4j 209 views 40 slides May 01, 2024

About This Presentation

These are the slides delivered in a workshop at the Data Innovation Summit in Stockholm, April 2024, by Kristof Neys and Jonas El Reweny.


Slide Content

Best of Both Worlds: Combine KG and Vector search for enhanced RAG
Data Innovation Summit 2024


Jonas El Reweny, Kristof Neys
Neo4j Field Engineering

Agenda
Neo4j Inc. All rights reserved 2023
1. Knowledge Graph
2. Graph Query Language
3. Graph Data Science
4. Vectors
5. Demo Time!
Notebook in Google Colab: tinyurl.com/disws24
Neo4j Sandbox: sandbox.neo4j.com
Prerequisites for the workshop:
● Laptop with internet access and no outbound restrictions on ports 80, 443, 7687
● Register an account and log in to https://sandbox.neo4j.com and select the "Blank Sandbox" project
● Register an account and log in to https://colab.research.google.com/

But… first a word from our sponsor…

Neo4j: The Graph Database & Analytics Leader

300
1B+ Enterprise
customers
$500M in funding
170+ global partner ecosystem
250K community of developers and data pros
100M+ downloads
The first-ever graph database

Creator of the market category

Continued market leader

20 / 20 Top US banks
3 / 5 Top Aircraft Manufacturers
7 / 10 Top Telcos
3 / 5 Top Hotel Groups
8 / 10 Top Insurance Companies
10 / 10 Top Automakers
7 / 10 Top Retailers
5 / 5 Top Pharmaceuticals
Trusted by
75 of the

The core graph object: a Knowledge Graph

Recap: a Knowledge Graph
A knowledge graph is a structured representation of facts, consisting of entities, relationships and semantic descriptions.

From data points to a Knowledge Graph

Graphs allow you to make implicit relationships… explicit
[Diagram: a graph of User, Website and IPLocation nodes connected by :VISITED, :USED and :SAME_AS relationships]
And they grow too…?!

…and can then group similar nodes… and create a new graph from the explicit relationships…
[Diagram: the same User / Website / IPLocation graph with additional :SAME_AS relationships and PersonId properties (PersonId: 1, PersonId: 2) on the nodes]
A graph grows organically, gaining insights and enriching your data
Graphs… Grow!
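A minimal Python sketch of the idea on these two slides: turn an implicit link into an explicit relationship, then group the linked nodes. The sample data, the rule "users sharing an IPLocation are the same person" and the helper names are assumptions made for illustration; the slides do not state how :SAME_AS is derived.

```python
from collections import defaultdict
from itertools import combinations

# Hypothetical sample data: (user, ip_location) pairs standing in for :USED relationships.
used = [
    ("user_a", "ip_1"),
    ("user_b", "ip_1"),
    ("user_b", "ip_2"),
    ("user_c", "ip_2"),
    ("user_d", "ip_3"),
]

def infer_same_as(used_pairs):
    """Make the implicit link explicit: users sharing an IPLocation get a SAME_AS edge."""
    users_by_ip = defaultdict(set)
    for user, ip in used_pairs:
        users_by_ip[ip].add(user)
    edges = set()
    for users in users_by_ip.values():
        for u, v in combinations(sorted(users), 2):
            edges.add((u, v))
    return edges

def assign_person_ids(users, same_as_edges):
    """Group users connected by SAME_AS (connected components) under one PersonId."""
    neighbours = defaultdict(set)
    for u, v in same_as_edges:
        neighbours[u].add(v)
        neighbours[v].add(u)
    person_id = {}
    next_id = 1
    for start in sorted(users):
        if start in person_id:
            continue
        stack = [start]
        while stack:
            node = stack.pop()
            if node in person_id:
                continue
            person_id[node] = next_id
            stack.extend(n for n in neighbours[node] if n not in person_id)
        next_id += 1
    return person_id

users = {user for user, _ in used}
same_as = infer_same_as(used)
person_ids = assign_person_ids(users, same_as)
print(same_as)
print(person_ids)
```

In a graph database the same two steps would be a pattern match that writes new relationships, followed by a community or component algorithm over them.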


Cypher (GQL) is how we roll…

Cypher: powerful and expressive query language
MATCH (:Person {name: "Dan"})-[:LOVES]->(:Person {name: "Ann"})
Callout labels on the slide: NODE, LABEL, PROPERTY, RELATIONSHIP, CREATE
[Diagram: Person (name: 'Dan') -LOVES-> Person (name: 'Ann')]

Cypher: powerful and expressive query language
MATCH (p:Person {name: "Dan"})-[:MARRIED_TO]->(spouse)
RETURN p.name AS husband, spouse
Callout labels on the slide: NODE, LABEL, PROPERTY, RELATIONSHIP TYPE, VARIABLE (p and spouse are variables)
[Diagram: Person (name: 'Dan') -MARRIED_TO-> spouse]
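To make the pattern semantics concrete, here is a small Python sketch that mimics what the MATCH … RETURN query does, over an in-memory stand-in for the graph. The data and the function name `match_spouses` are hypothetical; a real workshop run would send the Cypher text to a Neo4j instance instead.

```python
# Hypothetical in-memory graph: nodes keyed by id, relationships as (start, type, end).
nodes = {
    1: {"labels": {"Person"}, "name": "Dan"},
    2: {"labels": {"Person"}, "name": "Ann"},
    3: {"labels": {"Person"}, "name": "Bob"},
}
relationships = [
    (1, "MARRIED_TO", 2),
    (1, "LOVES", 2),
    (3, "LOVES", 2),
]

def match_spouses(person_name):
    """Mimic: MATCH (p:Person {name: $name})-[:MARRIED_TO]->(spouse)
              RETURN p.name AS husband, spouse"""
    rows = []
    for start, rel_type, end in relationships:
        p = nodes[start]
        # The node pattern filters on label and property, the relationship
        # pattern filters on type and direction.
        if rel_type == "MARRIED_TO" and "Person" in p["labels"] and p["name"] == person_name:
            # `spouse` is an unconstrained variable: any node at the other end binds to it.
            rows.append({"husband": p["name"], "spouse": nodes[end]})
    return rows

rows = match_spouses("Dan")
print(rows)
```

Each returned row corresponds to one way the pattern can be bound to the data, which is how Cypher results are produced.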


Enhance your RAG with Graph Data Science

GDS evolution
Local Matching
Learn features in your graph that you don't even know are important yet.
Train in-graph supervised ML models to predict links, labels and missing data.
Global Patterns
Graph Representations
Use unsupervised machine learning techniques to identify associations, anomalies, and trends.
Graph analytics
Graph feature engineering
Find the patterns you're looking for in connected data.
Knowledge graphs




Before we go any further… let's quiz!

Which of the colored nodes would be considered the most 'important'?
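The slide's figure is not reproduced in this text, so the following is only a sketch of why the question has more than one answer: "importance" depends on the measure. It compares in-degree with a plain power-iteration PageRank on a hypothetical five-node graph; the edge list and the damping factor of 0.85 are assumptions chosen for illustration.

```python
from collections import Counter

# Hypothetical directed graph as (source, target) edges.
edges = [
    ("a", "b"), ("b", "a"),
    ("c", "a"), ("d", "a"),
    ("c", "e"), ("d", "e"),
    ("e", "a"),
]

def pagerank(edge_list, damping=0.85, iterations=100):
    """Plain power-iteration PageRank (a sketch, not the GDS implementation)."""
    node_list = sorted({n for edge in edge_list for n in edge})
    out_links = {n: [] for n in node_list}
    for src, dst in edge_list:
        out_links[src].append(dst)
    rank = {n: 1.0 / len(node_list) for n in node_list}
    for _ in range(iterations):
        new_rank = {n: (1.0 - damping) / len(node_list) for n in node_list}
        for src in node_list:
            # Dangling nodes (no outgoing links) spread their rank evenly.
            targets = out_links[src] or node_list
            share = damping * rank[src] / len(targets)
            for dst in targets:
                new_rank[dst] += share
        rank = new_rank
    return rank

in_degree = Counter(dst for _, dst in edges)
rank = pagerank(edges)
print(in_degree)
print(rank)
```

Here node e has more incoming links than node b, yet b ends up with the higher PageRank because its single link comes from the highest-ranked node.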


70+ Graph Data Science Techniques in Neo4j

Pathfinding & Search
• Shortest Path
• Single-Source Shortest Path
• All Pairs Shortest Path
• A* Shortest Path
• Yen's K Shortest Path
• Minimum Weight Spanning Tree
• K-Spanning Tree (MST)
• Random Walk
• Breadth & Depth First Search

Centrality & Importance
• Degree Centrality
• Closeness Centrality
• Harmonic Centrality
• Betweenness Centrality & Approx.
• PageRank
• Personalized PageRank
• ArticleRank
• Eigenvector Centrality
• Hyperlink Induced Topic Search (HITS)
• Influence Maximization (Greedy, CELF)

Community Detection
• Triangle Count
• Local Clustering Coefficient
• Connected Components (Union Find)
• Strongly Connected Components
• Label Propagation
• Louvain Modularity
• K-1 Coloring
• Modularity Optimization
• Speaker Listener Label Propagation

Supervised Machine Learning
• Node Classification
• Link Prediction

Heuristic Link Prediction
• Adamic Adar
• Common Neighbors
• Preferential Attachment
• Resource Allocations
• Same Community
• Total Neighbors

Similarity
• Node Similarity
• K-Nearest Neighbors (KNN)
• Jaccard Similarity
• Cosine Similarity
• Pearson Similarity
• Euclidean Distance
• Approximate Nearest Neighbors (ANN)

Graph Embeddings
• Node2Vec
• FastRP
• FastRPExtended
• GraphSAGE

… and more!
• Synthetic Graph Generation
• Scale Properties
• Collapse Paths
• One Hot Encoding
• Split Relationships
• Graph Export
• Pregel API (write your own algos)
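As one concrete instance from the Similarity group, the sketch below computes Jaccard similarity between users based on the websites they visited. It is plain Python over hypothetical data, not the GDS procedure; the names `visited` and `jaccard` are made up for this example.

```python
from itertools import combinations

# Hypothetical data: the set of websites each user visited.
visited = {
    "user_a": {"site_1", "site_2", "site_3"},
    "user_b": {"site_2", "site_3"},
    "user_c": {"site_4"},
}

def jaccard(a, b):
    """Size of the shared neighbourhood divided by the size of the combined one."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0

similarity = {
    (u, v): jaccard(visited[u], visited[v])
    for u, v in combinations(sorted(visited), 2)
}
print(similarity)
```

Pairs with a high score are candidates for a new explicit relationship between the two users.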

It’s Better with Vectors…

What is a Vector?

What is a vector
● Length
● Direction
● Components have meaning
[Diagram: an arrow with its horizontal and vertical components labelled]

Vector arithmetic
[Diagram, three panels: vectors a and b (panels 1 and 2) and their sum a + b (panel 3)]

Kings and Queens
king − man + woman ≈ queen
[Diagram, three panels: the king, man and woman vectors (panels 1 and 2) and the resulting vector, labelled "queen?" (panel 3)]
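The arithmetic can be tried on toy numbers. The vectors below are hand-picked two-dimensional values, not learned embeddings; the axis meanings (royalty, masculinity) and the extra word "apple" are assumptions added so that the nearest-neighbour step has something to choose between.

```python
import math

# Hand-picked toy vectors (NOT learned embeddings); axes are (royalty, masculinity).
vectors = {
    "king": (0.9, 0.9),
    "man": (0.1, 0.9),
    "woman": (0.1, 0.1),
    "queen": (0.9, 0.1),
    "apple": (0.0, 0.5),
}

def add(a, b):
    return tuple(x + y for x, y in zip(a, b))

def subtract(a, b):
    return tuple(x - y for x, y in zip(a, b))

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

# king - man + woman
target = add(subtract(vectors["king"], vectors["man"]), vectors["woman"])

# Look for the closest remaining word, excluding the three inputs.
candidates = {w: v for w, v in vectors.items() if w not in {"king", "man", "woman"}}
nearest = max(candidates, key=lambda w: cosine(target, candidates[w]))
print(target, nearest)
```

With real embeddings the result is only approximately equal to the target word's vector, which is why the slide writes ≈ rather than =.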

What are vector embeddings
● Same concepts, just "an arrow"
● 100s or 1000s of dimensions

Finding Similar vectors
● Cosine: direction / angle based
● Euclidean: distance based
[Diagram legend: vector point, query, nearest 4]
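The two measures can disagree about which stored vector is "nearest". A short sketch with made-up two-dimensional points shows one vector that wins on angle and another that wins on distance; all names and values here are illustrative.

```python
import math

def cosine_similarity(a, b):
    """Angle based: 1.0 means same direction, regardless of length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def euclidean_distance(a, b):
    """Distance based: 0.0 means the same point."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = (1.0, 1.0)
points = {
    "same_direction_far": (10.0, 10.0),
    "nearby_other_direction": (1.0, 0.5),
}

best_by_cosine = max(points, key=lambda k: cosine_similarity(query, points[k]))
best_by_euclidean = min(points, key=lambda k: euclidean_distance(query, points[k]))
print(best_by_cosine, best_by_euclidean)
```

Which measure is appropriate depends on whether vector length carries meaning for the embedding model in use.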

Why a Vector Store?

Why & What is a Vector Index?
● Applied to: vectors that encode mainly unstructured data such as text, audio or video, produced by embedding models ("raw" vectors).
● Main purpose: use approximate methods to perform similarity search at lower computational cost.

● Once embedding vectors have been stored as a node property, a vector index can be created across that property.
● Indexing is an algorithm that maps each original vector into a data structure that enables faster search.
● Creating a vector index builds a query-optimized data structure at "store time" (as opposed to GDS similarity search, which computes at query time).


How is search performed?

● The query vector is any piece of unstructured data that has been converted to an encoding vector (the "raw" vector) and is mapped to the index using the same algorithm (i.e. Hierarchical Navigable Small World, HNSW).
● The "key" vectors are the stored vectors that have been indexed.
● When a search is performed, a similarity function is applied between the query vector and the stored vectors.
● Several similarity measures can be used, including:
○ Cosine similarity
○ Euclidean similarity
○ Dot product
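The search flow above can be sketched with an exact, brute-force store. This is only a baseline for illustration: it compares the query against every key vector, whereas an HNSW index returns approximately the same neighbours while visiting far fewer vectors. The class name, document ids and vectors are hypothetical.

```python
import heapq
import math

class BruteForceVectorStore:
    """Exact search baseline; an approximate index would speed this up."""

    def __init__(self):
        self._keys = {}  # item id -> unit-length "key" vector

    @staticmethod
    def _normalise(vector):
        norm = math.sqrt(sum(x * x for x in vector))
        return tuple(x / norm for x in vector)

    def add(self, item_id, vector):
        # Work done once, at "store time".
        self._keys[item_id] = self._normalise(vector)

    def search(self, query_vector, k=2):
        # With unit-length vectors the dot product equals cosine similarity.
        q = self._normalise(query_vector)
        scored = (
            (sum(a * b for a, b in zip(q, key)), item_id)
            for item_id, key in self._keys.items()
        )
        return [(item_id, score) for score, item_id in heapq.nlargest(k, scored)]

store = BruteForceVectorStore()
store.add("doc_graphs", (0.9, 0.1, 0.0))
store.add("doc_vectors", (0.1, 0.9, 0.0))
store.add("doc_cooking", (0.0, 0.1, 0.9))

results = store.search((0.8, 0.3, 0.0), k=2)
print(results)
```

The cost of `search` grows linearly with the number of stored vectors, which is the cost an index is meant to avoid.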

Neo4j and Vector Search
Vector Similarity Search: Find relevant documents and content for user queries.
Graph Traversals & Pattern Matching: Find entities associated to content and patterns in connected data.
Knowledge Graph Inference & ML: Improve search relevance & insights by enhancing a Knowledge Graph. Use graph algorithms and ML to discover new relationships, entities, and groups.
Vector Search
Graph Database
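A rough sketch of how the first two capabilities combine for RAG: vector search picks the most relevant chunk, then a graph traversal from that chunk collects connected facts to add to the context. Everything here (chunk ids, toy embeddings, relationship types, function names) is hypothetical and kept in memory; it is not the workshop notebook's code.

```python
import math

# Hypothetical document chunks with toy 2-d embeddings.
chunks = {
    "chunk_1": {"text": "Neo4j supports vector indexes.", "embedding": (0.9, 0.2)},
    "chunk_2": {"text": "PageRank measures node importance.", "embedding": (0.2, 0.9)},
}

# Hypothetical knowledge graph as (source, relationship type, target) triples.
graph = [
    ("chunk_1", "MENTIONS", "Neo4j"),
    ("chunk_1", "MENTIONS", "Vector Index"),
    ("chunk_2", "MENTIONS", "PageRank"),
    ("Neo4j", "PROVIDES", "Graph Data Science"),
]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))

def vector_search(query_embedding, k=1):
    """Step 1: rank chunks by similarity to the query embedding."""
    ranked = sorted(
        chunks,
        key=lambda c: cosine(query_embedding, chunks[c]["embedding"]),
        reverse=True,
    )
    return ranked[:k]

def expand(node, hops=2):
    """Step 2: collect facts reachable within `hops` outgoing relationships."""
    facts, frontier = [], [node]
    for _ in range(hops):
        next_frontier = []
        for src, rel, dst in graph:
            if src in frontier:
                facts.append((src, rel, dst))
                next_frontier.append(dst)
        frontier = next_frontier
    return facts

def retrieve_context(query_embedding):
    """Combine both steps into the context that would be handed to an LLM."""
    return [
        {"text": chunks[chunk_id]["text"], "facts": expand(chunk_id)}
        for chunk_id in vector_search(query_embedding)
    ]

context = retrieve_context((1.0, 0.1))
print(context)
```

The second-hop fact about Graph Data Science is not in the matched text at all; it reaches the context only because the graph connects it to the chunk.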

What about Graph Embeddings…?

What are node embeddings?



Node embeddings represent nodes as low-dimensional vectors that summarize their position in the graph, the structure of their local neighborhood, and any available node features.

NODE EMBEDDING

4 algorithms… and counting

• FastRP (Fast Random Projection) - Calculates embeddings extremely fast using probabilistic sampling and linear algebra.
• GraphSAGE (Graph SAmple and aggreGatE) - Trains a Graph Neural Network (GNN) to generate embeddings on old and new graph data. Uses batch sampling procedures for scalability.
• Node2Vec - Creates embeddings that represent nodes in similar neighborhoods and/or structural "roles" in the graph using adjustable random walks.
• HashGNN - Quickly generates embeddings on heterogeneous graphs. Like a GNN but much faster and simpler with comparable benchmarked performance. Leverages a clever application of hashing functions rather than training a model.
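To give a feel for the random-projection idea, here is a heavily simplified sketch in plain Python: each node starts with a sparse random vector, repeatedly averages its neighbours' vectors, and sums the normalised results. It is written in the spirit of FastRP but is not the GDS implementation; the equal iteration weights, the dimension of 8 and the function name are assumptions.

```python
import math
import random

def random_projection_embeddings(adjacency, dimension=8, iterations=2, seed=42):
    """Simplified random-projection node embeddings (illustrative only)."""
    rng = random.Random(seed)
    # Step 1: a sparse random vector per node (entries in {-1, 0, +1}).
    current = {
        node: [rng.choice((-1.0, 0.0, 0.0, 1.0)) for _ in range(dimension)]
        for node in sorted(adjacency)
    }
    embedding = {node: [0.0] * dimension for node in adjacency}
    for _ in range(iterations):
        # Step 2: each node averages its neighbours' vectors from the previous round.
        averaged = {}
        for node, neighbours in adjacency.items():
            summed = [0.0] * dimension
            for n in sorted(neighbours):
                summed = [s + x for s, x in zip(summed, current[n])]
            averaged[node] = [s / max(len(neighbours), 1) for s in summed]
        current = averaged
        # Step 3: accumulate the L2-normalised intermediate vectors.
        for node, vector in current.items():
            norm = math.sqrt(sum(x * x for x in vector)) or 1.0
            embedding[node] = [e + x / norm for e, x in zip(embedding[node], vector)]
    return embedding

# Hypothetical undirected star graph: b, c and d are each connected only to a.
adjacency = {
    "a": {"b", "c", "d"},
    "b": {"a"},
    "c": {"a"},
    "d": {"a"},
}
embeddings = random_projection_embeddings(adjacency)
print(embeddings["b"] == embeddings["c"])
```

Nodes b and c have identical neighbourhoods, so they receive identical vectors, which is the sense in which an embedding summarizes a node's position in the graph.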
Graph Data Science Embeddings

Notebook Time!

Thank you!
[email protected]