Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan

neo4j, 75 slides, May 16, 2024

About This Presentation

Look beyond the hype and unlock practical techniques to responsibly activate intelligence across your organization’s data with GenAI. Explore how to use knowledge graphs to increase accuracy, transparency, and explainability within generative AI systems. You’ll depart with hands-on experience co...


Slide Content

© 2024 Neo4j, Inc. All rights reserved.

Generative AI
workshop
Marco Bessi - Milano 2024/05/14


1


Your friendly instructor
Marco Bessi (he/him/his)

(Marco) -[:LIVES_IN]-> (Prato,Italy)
(Marco) -[:HAS_PHD]-> (ComputerScience PhD)
(ComputerScience PhD) -[:AT]-> (PoliMI)
(Marco) -[:IS_PART_OF]-> (EMEA PreSaleTeam)
(EMEA PreSaleTeam) -[:AT]-> (Neo4j)
2


Code of Conduct
●Speak up or be quiet forever. Seriously, this is supposed to
be an interactive session (you could have stayed home and
watched the video afterwards otherwise ;-).

●If you are stuck, ask for help. Don't worry too much about it
though, you will get all the materials and can go through
them at your own pace afterwards.

●Bear with us when (not if ;-) technology fails.

●Have fun!


3


Logistics
WIFI Access: WIFI-Name: neo4j | Password: graphsummit24
Restrooms: Out of the room and to the left / the right / straight ahead
In the foyer area
Chargers: under / on top / under the table in first three rows
4


Practical
1. Jupyter Notebook
Browse to https://bit.ly/neo4j-gs-milano

Log on with your attendeeXX, password
attendeeXX (with the XX provided to you)



2. Neo4j Browser
We'll get to that from the Notebook, username
and password will be the same.
5


Why are you in
THIS workshop?
6


Quick poll (by show of hands)
7


AI is critical to the success of my
company over the next 5 years

8


Results
94% of business leaders agreed with that statement in
October 2022 (Deloitte study)!
That actually wasn't the thing about that report that I found
interesting.
9


This is
10


11
Managing AI risk is the biggest barrier to scaling AI initiatives [1]

Widespread Hesitancy: Over half of business leaders
currently discourage adoption of genAI. [2]

Lack of Explainability: Over 80% of executives worry
the non-transparent nature of genAI could result in poor
or unlawful decisions. [2]

Risk of Inaccuracy: Inaccuracy and hallucination are two
of the most-cited risks of adopting genAI technology at all
levels of an organization. [3]

1. Deloitte’s State of AI in the Enterprise
2. BCG’s Digital Acceleration Index Study 2023
3. McKinsey: The state of AI in 2023


12
Retrieval-Augmented Generation
is becoming an industry standard
RAG augments LLMs by
retrieving up-to-date,
contextual external data to
inform responses:

●Reduce hallucinations with
verified data
●Provide domain-specific,
relevant responses
●Enable traceability back to
sources
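
The retrieve-then-augment flow described above can be sketched in a few lines of Python. This is only an illustration under loud assumptions: the facts are made up, word overlap stands in for real retrieval (vector or graph search), and no LLM is called; the sketch stops at the prompt that would be sent.

```python
# Toy "database of truth" (illustrative data, not from the workshop dataset).
FACTS = [
    "Nina Massey bought a black t-shirt in May.",
    "The halter neck top is available in white and blue.",
    "Store opening hours are 9:00 to 18:00.",
]


def retrieve(question, facts, k=2):
    """Rank facts by the number of words they share with the question.

    Word overlap is a naive stand-in for a real retrieval step.
    """
    question_words = set(question.lower().split())
    scored = [(len(question_words & set(fact.lower().split())), fact) for fact in facts]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    # Keep at most k facts and drop those with no overlap at all.
    return [fact for score, fact in scored[:k] if score > 0]


def build_prompt(question, context):
    """Put the retrieved facts in front of the question so the answer can cite them."""
    sources = "\n".join(f"[{i + 1}] {fact}" for i, fact in enumerate(context))
    return f"Answer using only these sources:\n{sources}\n\nQuestion: {question}"


question = "Which t-shirt did Nina buy?"
print(build_prompt(question, retrieve(question, FACTS)))
```

Numbering the sources in the prompt is one simple way to keep answers traceable back to what was retrieved.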

Retrieval Augmented Generation
Database of Truth


13
Why RAG with vector databases
falls short
●Only leverage a fraction of your data: Beyond simple
“metadata”, vector databases alone fail to capture
relationships from structured data
●Miss critical context: Struggle to capture connections across
nuanced facts, making it challenging to answer multi-step,
domain-specific questions
●Vector Similarity ≠ Relevance: Vector search uses an incomplete
measure of similarity. Relying on it solely can result in
irrelevant and duplicative results
●Lack explainability: The black-box nature of vectors lacks
transparency and explainability


RAG with
14
Unify vector search, knowledge graph and data science
capabilities to improve RAG quality and effectiveness
Vector Search: Find similar documents and content
Knowledge Graph: Identify entities associated to content and
patterns in connected data
Graph Data Science: Improve GenAI inferences and insights.
Discover new relationships and entities


15
Today


Today
we are choosing
clothes
16


For cousin Nina
Nina likes t-shirts

17


Traditional approach
●Line up options
●Let Nina pick
●Add some
accessories
18
That works, but what if
Nina isn't there, what if
you want to buy Nina a
present?


The mission - a graph-based AI
shopping/fashion assistant
because Nina is worth it

19
Steps
Knowledge graph building
Semantic search
Personalised search
Recommendation engine
Fashion assistant


Tools
20


Knowledge
graph building
21


Dataset
H&M personalized fashion recommendations

https://www.kaggle.com/competitions/h-and-m-personalized-fashion-recommendations/data

Changes
●Normalized into a graph model (coming right up)
●Added random (!) names
22


Model (labels and types)
23


Model (properties)
24


Hammer Time
●Go to the Jupyter Lab environment and open the
genai_workshop.ipynb file.
We'll run it together, step by step

●For those not familiar with notebooks, when you run a step,
please wait for the [*] to turn into a [<number>], most steps
will also have output to look at.

●If you want to run ahead, please wait at the point where we
will switch to the Neo4j Browser (that's just before Vector
Search).
25


Exploring the graph
In the notebook, click the link to open the Neo4j Browser at
https://browser.neo4j.io

Connect url: neo4j+s://milano-gs-db.graphdatabase.ninja:443
Username: attendeeXX
Password: attendeeXX


26


Executing a browser guide
●Cut and paste the browser guide link (:play included) from
the notebook.
●Execute with the blue arrow.

27


One is one and all alone
Find a single person by id

29
MATCH (c:Customer WHERE c.id =
"daae10780ecd14990ea190a1e9917da33fe96cd8cfa5e80b67b4600171aa77e0")
RETURN c;
001
Find a single person by name

MATCH (c:Customer WHERE c.name = "Nina Massey")
RETURN c;
002


And it's a customer
30
Find a single customer by name

MATCH sg=(c:Customer)-[:PURCHASED]->(:Article)
WHERE c.name = "Nina Massey"
RETURN sg;
003


And that's not without value
It may so far not look interesting, but just with this data we
could determine
●How loyal a customer Nina is
●How big a spender Nina is
●What Nina's favourite colours are
●…

Let's try the colours.
31


Showing your true colours
32
Find Nina's favourite colours

MATCH (c:Customer)-[:PURCHASED]->(a:Article)
WHERE c.name = "Nina Massey"
RETURN a.colour AS colour, count(*) AS occurrences
ORDER BY occurrences DESC;
004


From article to product
33
Buying products

MATCH
sg=(c:Customer)-[:PURCHASED]->(:Article)-[:VARIANT_OF]->(:Product)
WHERE c.name = "Nina Massey"
RETURN sg;
005


More value
It may still not look very exciting but now we can determine
●Which products Nina has bought multiple times
→ and maybe if that coincided with a promotion
●In which order products are bought
→ and maybe if there is a pattern there that we can also
find with other customers
●…


34


Compulsive buying
35
Find Nina's favourite products

MATCH
(c:Customer)-[:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
RETURN p.name AS productname, count(*) AS occurrences
ORDER BY occurrences DESC;
006


Location, location, location
36
Where to find Nina

MATCH
sg=(c:Customer)-[:PURCHASED]->(:Article)-[:LOCATED_IN]->(:Department)
WHERE c.name = "Nina Massey"
RETURN sg;
007


Semantic
search
37


Vector
●an arrow
●direction and magnitude (strength)



38


Vector - array of numbers
●is a way to quantify the direction and magnitude in numbers
●for example in a 2D space, x may indicate the horizontal
direction, y the vertical direction and the magnitude is
calculated by the formula $\lVert v \rVert = \sqrt{x^2 + y^2}$
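
As a small illustration, the magnitude of a 2D vector (x, y) is the square root of x² + y², and the same rule extends to any number of dimensions. The helper below is a sketch with a made-up function name, not code from the workshop notebook.

```python
import math


def magnitude(vector):
    """Euclidean length: square root of the sum of the squared components."""
    return math.sqrt(sum(component * component for component in vector))


# A 2D vector with x = 3 (horizontal) and y = 4 (vertical).
print(magnitude([3.0, 4.0]))  # 5.0
```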



39


Vector - embedding
●represent complex data in a dense numerical form
●embed the data in a high-dimensional space where similar
data is close together

Whilst these vectors can be huge (OpenAI now has a 3072
dimension embedding model), this is nothing compared to the
complexity of what they embed.

An embedding is a - massive - dimension reduction.
40


Vector - similarity
Euclidean - distance based
Cosine - direction based
Figure legend: vector point, query, nearest 4
41
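
To make the difference concrete, here is a minimal Python sketch of both measures. The two example vectors are made up: they point in the same direction but have different lengths, so they are far apart by Euclidean distance while their cosine similarity is (up to rounding) 1.

```python
import math


def euclidean_distance(a, b):
    """Distance based: straight-line distance between the two points."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def cosine_similarity(a, b):
    """Direction based: cosine of the angle between the two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


short_vector = [1.0, 1.0]
long_vector = [3.0, 3.0]  # same direction, three times the magnitude

print(euclidean_distance(short_vector, long_vector))  # clearly non-zero
print(cosine_similarity(short_vector, long_vector))   # approximately 1
```

Which measure fits best depends on whether the magnitude of the embedding carries meaning for the data at hand.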


Vector - index
Neo4j implements the Hierarchical Navigable Small World
algorithm to do efficient k-ANN (approximate nearest
neighbours) searches.
42
Figure labels: Backless blouse, Tie-back shirt, Sleeveless crop
CREATE VECTOR INDEX …


Vector - search
43
“Halter neck top”
CALL db.index.vector.queryNodes …
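
The elided Cypher call is not reproduced here. As a rough mental model, the sketch below does the exact, brute-force version of the search that an approximate index such as HNSW is designed to speed up: score every item against the query and keep the best k. The three-dimensional embeddings, the query vector and the "Wool scarf" item are invented for illustration; real embeddings come from an embedding model and have far more dimensions.

```python
import math


def cosine_similarity(a, b):
    """Cosine of the angle between two vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def top_k(query, items, k):
    """Exact nearest neighbours: compare the query with every item."""
    scored = [(cosine_similarity(query, vector), name) for name, vector in items.items()]
    scored.sort(reverse=True)
    return [name for _, name in scored[:k]]


# Hypothetical embeddings, hand-picked so that the tops cluster together.
PRODUCTS = {
    "Backless blouse": [0.9, 0.1, 0.0],
    "Tie-back shirt": [0.8, 0.3, 0.1],
    "Sleeveless crop": [0.7, 0.2, 0.3],
    "Wool scarf": [0.0, 0.2, 0.9],
}
QUERY = [0.85, 0.15, 0.05]  # stand-in for the embedding of "Halter neck top"

print(top_k(QUERY, PRODUCTS, 3))
```

Brute force costs one comparison per stored vector, which is why an index becomes worthwhile as the catalogue grows.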


Just so it's clear
A full-text index search cannot find King unless King is in the
text. Context is largely ignored.

A vector index search will find Queen, Prince, Baron, Warlord,
<and so on> to be to some degree similar to what you asked
but might also weave in some unexpected results. Context is -
well - King.

44


Hammer Time
●Go back to the Jupyter Lab environment.
We'll continue to run it together, step by step

●If you want to run ahead, please wait at the point where we
will switch to the Neo4j Browser again (that's just after the
Semantic Search with Context heading).
45


Personalised
search
46


Executing a browser guide
●Cut and paste the browser guide link (:play included) from
the notebook.
●Execute with the blue arrow.

47


HisHer story
Purchase history of a single customer

49
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
RETURN c.name AS name,
pc.transactiondate AS transactiondate,
p.name AS product,
p.description AS description
ORDER BY transactiondate DESC;
001


Nina won the lottery
Shopping spree

50
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
RETURN c.name AS name,
pc.transactiondate AS transactiondate,
collect(p.name) AS spree
ORDER BY size(spree) DESC;
002


Double trouble
Eliminate doubles

51
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
RETURN c.name AS name,
pc.transactiondate AS transactiondate,
collect(DISTINCT p.name) AS spree
ORDER BY size(spree) DESC;
003


WITH or without you
Pipeline building

52
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
WITH c.name AS name,
pc.transactiondate AS transactiondate,
collect(DISTINCT p.name) AS spree
ORDER BY size(spree) DESC LIMIT 1
RETURN transactiondate AS dateofinterest;
004
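
For readers more at home in Python than Cypher, the pipeline above can be mirrored on plain data. This is a sketch on invented rows, not the notebook's code: the non-aggregated columns act as grouping keys, collect(DISTINCT ...) becomes a de-duplicated list per date, and ORDER BY size(...) DESC LIMIT 1 becomes picking the date with the longest list.

```python
from collections import defaultdict

# Hypothetical (transaction date, product name) rows for a single customer.
PURCHASES = [
    ("2020-05-01", "Basic tee"),
    ("2020-05-01", "Basic tee"),
    ("2020-05-01", "Denim jacket"),
    ("2020-05-01", "Canvas belt"),
    ("2020-06-12", "Basic tee"),
    ("2020-06-12", "Wool scarf"),
]


def sprees_by_date(rows):
    """Group product names per date, keeping each name once per date."""
    grouped = defaultdict(list)
    for date, product in rows:
        if product not in grouped[date]:
            grouped[date].append(product)
    return dict(grouped)


def date_of_interest(rows):
    """Return the date with the largest number of distinct products."""
    sprees = sprees_by_date(rows)
    return max(sprees, key=lambda date: len(sprees[date]))


print(sprees_by_date(PURCHASES))
print(date_of_interest(PURCHASES))
```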


Similar customers also buy X (part one)

53
// determine the date of interest
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
// carry c forward so the next MATCH starts from the same customer
WITH c,
pc.transactiondate AS dateofinterest,
collect(DISTINCT p.name) AS spree
ORDER BY size(spree) DESC LIMIT 1
...
005


Similar customers also buy X (part two)

54
...
// what other products do customers buy
MATCH
(c)-[pc:PURCHASED]->(:Article)<-[:PURCHASED]-(:Customer)-[:PURCHASED]
->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE pc.transactiondate = dateofinterest
RETURN p.name AS product,
count(*) AS commonPurchaseScore,
p.description AS description
ORDER BY commonPurchaseScore DESC;
005
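
The scoring idea in the query can be sketched in Python on a handful of invented purchase rows: start from the articles the customer bought on the date of interest, find other customers who bought the same article, and count what else those customers bought. All names and rows are hypothetical, and the sketch simplifies by skipping the starting customer entirely.

```python
from collections import Counter

# Hypothetical rows: (customer, article id, product name, transaction date).
PURCHASES = [
    ("Nina", "A1", "Basic tee", "2020-05-01"),
    ("Nina", "A2", "Denim jacket", "2020-05-01"),
    ("Alex", "A1", "Basic tee", "2020-04-20"),
    ("Alex", "A3", "Canvas belt", "2020-04-20"),
    ("Robin", "A2", "Denim jacket", "2020-03-02"),
    ("Robin", "A3", "Canvas belt", "2020-03-02"),
    ("Robin", "A4", "Wool scarf", "2020-03-02"),
    ("Kim", "A5", "Rain coat", "2020-02-11"),
]


def co_purchase_scores(purchases, customer, date):
    """Count products bought by customers who share an article with `customer`."""
    scores = Counter()
    for own_customer, own_article, _, own_date in purchases:
        if own_customer != customer or own_date != date:
            continue
        # Other customers who bought the same article.
        for i, (other_customer, article, _, _) in enumerate(purchases):
            if article != own_article or other_customer == customer:
                continue
            # Everything else that customer bought (skip the linking purchase).
            for j, (buyer, _, product, _) in enumerate(purchases):
                if buyer == other_customer and j != i:
                    scores[product] += 1
    return scores.most_common()


print(co_purchase_scores(PURCHASES, "Nina", "2020-05-01"))
```

Kim shares no article with Nina, so the rain coat never shows up in the scores.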


I know, I know
that a lot of time today is spent on working with the
knowledge graph, with the data.

55
But then there are also a lot of questions along the lines of
●Why should I integrate a knowledge graph with the LLM,
have you seen the results of what <latest model> can do?
→ yes indeed and they are often crap if you dig just below the
surface


I know, I know
that a lot of time today is spent on working with the
knowledge graph, with the data.

56
Or rants along the lines of
●You should provide proof and benchmarks that integrating
a knowledge graph with the LLM provides better results!
→ yes indeed and we do but what's the point if you do not
understand the value of a knowledge graph to start with


Hammer Time
●Go back to the Jupyter Lab environment.
We'll continue to run it together, step by step

●If you want to run ahead, please wait at Augmenting
Semantic Search with Knowledge Graph Inference & ML.
57


Recommendation
engine
58


Graph Transactional
So far we always queried the graph database in the following
fashion
●Find starting point (Nina)
●Walk the graph from the starting point
●Stay local to the starting point

→ This is what is called graph transactional querying, used in
hundreds of real time use cases.
59


Graph Data Science
You can also consider the graph as a whole and there are many
algorithms that do that
●(Larry) PageRank
●(Edsger) Dijkstra's pathfinding
●(Vincent) Blondel's Louvain community detection
●<many others>

→ This is what is called graph data science, used in hundreds
of analytical use cases and to enhance the graph for the real
time use cases.
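
To give a feel for what "considering the graph as a whole" means, here is a toy PageRank in plain Python using power iteration. It is a teaching sketch on a made-up four-node graph, assuming every link target also appears as a key; in the workshop the algorithms are run by the graph data science tooling rather than hand-written like this.

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict of node -> list of linked nodes."""
    nodes = list(links)
    count = len(nodes)
    rank = {node: 1.0 / count for node in nodes}
    for _ in range(iterations):
        # Every node gets a small baseline share (the "random jump").
        new_rank = {node: (1.0 - damping) / count for node in nodes}
        for node, targets in links.items():
            if targets:
                # A node splits its damped rank evenly over its outgoing links.
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank over all nodes.
                for target in nodes:
                    new_rank[target] += damping * rank[node] / count
        rank = new_rank
    return rank


# Made-up graph: A, B and D all point at C, and C points back at A.
LINKS = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

ranks = pagerank(LINKS)
for node in sorted(ranks, key=ranks.get, reverse=True):
    print(node, round(ranks[node], 3))
```

Unlike the queries so far, no starting point is involved: every node's score depends on the structure of the entire graph.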
60


Our overview
61


Remember this?
62


It actually looks like this now
63


And we want it to look like this
64


Hammer Time
●Go back to the Jupyter Lab environment.
We'll continue to run it together, step by step

●If you want to run ahead, please wait at LLM For Generating
Grounded Content.
66


Just so it's clear
There are several possible approaches to create the
CUSTOMERS_ALSO_LIKE relationship. The embedding is
definitely not the easiest ;-), but
●fits the context (earlier on we saw text embeddings, now we
used node embeddings)
●allows the inclusion of properties (even though we didn't do
that)

Important to understand is that the similarity here is based on
what was bought together, not on what the article is!

67


Fashion
Assistant
68


Pulling it all together
So far you have done
a vector search
and enhanced that
with a personalized search
and you also know what you could recommend.


69
Allow me to introduce you to our AI Fashion Assistant Sam,
who will combine all your hard work. Sam is pretty new to this
though and needs a bit of prompting.


Prompting Sam
You are a personal assistant named Sam for a fashion, home, and beauty company called
HRM.
write an email to {customerName}, one of your customers, to promote and summarize
products relevant for them given the current season / time of year: {timeOfYear}.
Please only mention the products listed below. Do not come up with or add any new
products to the list.
Each product comes with an https `url` field. Make sure to provide that https url with
descriptive name text in markdown for each product.
---
# Relevant Products:
{searchProds}

# Customer May Also Be Interested In the following
(pick items from here that pair with the above products well for the current season /
time of year: {timeOfYear}.
prioritize those higher in the list if possible):
{recProds}
---
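
How the placeholders get filled can be sketched with plain Python string formatting. The template below is an abbreviated stand-in that reuses the same four placeholder names; the helper names, product names and example.com URLs are all hypothetical, and the notebook may well use a prompt-template class from an LLM framework instead.

```python
# Abbreviated stand-in for the prompt above, with the same placeholder names.
PROMPT_TEMPLATE = (
    "Write an email to {customerName} to promote products relevant for "
    "the current season / time of year: {timeOfYear}.\n"
    "# Relevant Products:\n{searchProds}\n"
    "# Customer May Also Be Interested In the following:\n{recProds}\n"
)


def format_products(products):
    """Render (name, url) pairs as one markdown link per line."""
    return "\n".join(f"- [{name}]({url})" for name, url in products)


def build_prompt(customer_name, time_of_year, search_products, rec_products):
    """Fill every placeholder; the result is what would be sent to the LLM."""
    return PROMPT_TEMPLATE.format(
        customerName=customer_name,
        timeOfYear=time_of_year,
        searchProds=format_products(search_products),
        recProds=format_products(rec_products),
    )


prompt = build_prompt(
    "Nina Massey",
    "May, 2024",
    [("Halter neck top", "https://example.com/halter-neck-top")],
    [("Canvas belt", "https://example.com/canvas-belt")],
)
print(prompt)
```

Keeping the search results and the recommendations in separate placeholders lets the prompt treat them differently, as the instructions in the template do.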

70


Sam's insides
71


Sam's insides
72
Personalized Search
Recommendations
Personal information
search prompt
customer id
time of year


Hammer Time
●Go back to the Jupyter Lab environment.
We'll continue to run it together to the end, step by step
73


Resources
Original session:
https://github.com/neo4j-product-examples/genai-workshop

Today's session:
https://github.com/tomgeudens/jupyter-notebooks/tree/main/genaihm

Today's data:
https://github.com/tomgeudens/trainingdata/tree/main/genaihm
74


Thank You!

75