Workshop: Enabling GenAI Breakthroughs with Knowledge Graphs - GraphSummit Milan
neo4j
May 16, 2024
About This Presentation
Look beyond the hype and unlock practical techniques to responsibly activate intelligence across your organization’s data with GenAI. Explore how to use knowledge graphs to increase accuracy, transparency, and explainability within generative AI systems. You’ll depart with hands-on experience combining relationships and LLMs for increased domain-specific context and enhanced reasoning.
Code of Conduct
●Speak up or be quiet forever. Seriously, this is supposed to
be an interactive session (you could have stayed home and
watched the video afterwards otherwise ;-).
●If you are stuck, ask for help. Don't worry too much about it
though, you will get all the materials and can go through
them at your own pace afterwards.
Logistics
WIFI Access: WIFI-Name: neo4j | Password: graphsummit24
Restrooms: Out of the room and to the left / the right / straight ahead, in the foyer area
Chargers: under / on top of the table in the first three rows
Results
94% of business leaders agreed with that statement in
October 2022 (Deloitte study)!
That actually wasn't the thing about that report that I found
interesting.
Managing AI risk is the biggest barrier to scaling AI initiatives [1]
●Widespread Hesitancy: Over half of business leaders currently discourage adoption of genAI. [2]
●Lack of Explainability: Over 80% of executives worry the non-transparent nature of genAI could result in poor or unlawful decisions. [2]
●Risk of Inaccuracy: Inaccuracy and hallucination are two of the most-cited risks of adopting genAI technology at all levels of an organization. [3]
1. Deloitte’s State of AI in the Enterprise
2. BCG’s Digital Acceleration Index Study 2023
3. McKinsey: The state of AI in 2023
Retrieval-Augmented Generation
is becoming an industry standard
RAG augments LLMs by
retrieving up-to-date,
contextual external data to
inform responses:
●Reduce hallucinations with
verified data
●Provide domain-specific,
relevant responses
●Enable traceability back to
sources
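To make the retrieve-then-generate loop concrete, here is a minimal sketch in plain Python. It is not the workshop notebook's code: the documents, the word-overlap scoring and the function names are all assumptions made up for illustration, and the final LLM call is left out. It only shows how retrieved, identifiable sources end up in the prompt.

```python
import re

# Hypothetical mini knowledge base: (source id, text) pairs, invented for this sketch.
DOCUMENTS = [
    ("doc-1", "Nina Massey purchased a black tie-back shirt in March."),
    ("doc-2", "The backless blouse is available in white and beige."),
    ("doc-3", "Returns are accepted within 30 days of purchase."),
]


def tokenize(text):
    """Lower-case the text and return its set of word tokens."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def retrieve(question, documents, k=2):
    """Return up to k documents that share the most words with the question."""
    question_tokens = tokenize(question)
    scored = [
        (len(question_tokens & tokenize(text)), doc_id, text)
        for doc_id, text in documents
    ]
    scored.sort(key=lambda item: item[0], reverse=True)
    # Documents without any overlap are dropped instead of padding the context.
    return [(doc_id, text) for score, doc_id, text in scored[:k] if score > 0]


def build_prompt(question, documents, k=2):
    """Augment the question with retrieved context, keeping the source ids."""
    context = retrieve(question, documents, k)
    sources = "\n".join(f"[{doc_id}] {text}" for doc_id, text in context)
    return (
        "Answer using only the sources below and cite their ids.\n"
        f"Sources:\n{sources}\n"
        f"Question: {question}"
    )


prompt = build_prompt("What did Nina Massey purchase?", DOCUMENTS)
print(prompt)
```

Because every source line carries its id, an answer can be traced back to the data it was based on. A real setup would swap the word-overlap scoring for a proper search and send the prompt to an LLM.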
Why RAG with vector databases falls short
Database of Truth
●Only leverage a fraction of your data: Beyond simple “metadata”, vector databases alone fail to capture relationships from structured data
●Miss critical context: Struggle to capture connections across nuanced facts, making it challenging to answer multi-step, domain-specific questions
●Vector Similarity ≠ Relevance: Vector search uses an incomplete measure of similarity. Relying on it solely can result in irrelevant and duplicative results
●Lack explainability: The black-box nature of vectors lacks transparency and explainability
RAG with
Unify vector search, knowledge graph and data science capabilities to improve RAG quality and effectiveness
●Vector Search: Find similar documents and content
●Knowledge Graph: Identify entities associated to content and patterns in connected data
●Graph Data Science: Improve GenAI inferences and insights. Discover new relationships and entities
Traditional approach
●Line up options
●Let Nina pick
●Add some
accessories
That works, but what if
Nina isn't there, what if
you want to buy Nina a
present?
Hammer Time
●Go to the Jupyter Lab environment and open the
genai_workshop.ipynb file.
We'll run it together, step by step
●For those not familiar with notebooks, when you run a step,
please wait for the [*] to turn into a [<number>], most steps
will also have output to look at.
●If you want to run ahead, please wait at the point where we
will switch to the Neo4j Browser (that's just before Vector
Search).
And that's not without value
It may so far not look interesting, but just with this data we
could determine
●How loyal a customer Nina is
●How big a spender Nina is
●What Nina's favourite colours are
●…
Showing your true colours
Find Nina's favourite colours
MATCH (c:Customer)-[:PURCHASED]->(a:Article)
WHERE c.name = "Nina Massey"
RETURN a.colour AS colour, count(*) AS occurrences
ORDER BY occurrences DESC;
004
More value
It may still not look very exciting but now we can determine
●Which products Nina has bought multiple times
→ and maybe if that coincided with a promotion
●In which order products are bought
→ and maybe if there is a pattern there that we can also
find with other customers
●…
MATCH
(c:Customer)-[:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
RETURN p.name AS productname, count(*) AS occurrences
ORDER BY occurrences DESC;
006
Vector - array of numbers
●is a way to quantify the direction and magnitude in numbers
●for example in a 2D space, x may indicate the horizontal
direction, y the vertical direction and the magnitude is
calculated by the formula √(x² + y²)
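For the 2D case the magnitude is the Euclidean length, √(x² + y²), and the direction can be read as an angle. A small sketch using only Python's standard library (the function names are made up for this example):

```python
import math


def magnitude(x, y):
    """Length of the 2D vector (x, y): sqrt(x**2 + y**2)."""
    return math.hypot(x, y)


def direction_degrees(x, y):
    """Angle of the vector relative to the horizontal axis, in degrees."""
    return math.degrees(math.atan2(y, x))


print(magnitude(3, 4))  # → 5.0
print(direction_degrees(1, 1))
```

Embedding vectors work the same way, just with hundreds or thousands of dimensions instead of two.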
Vector - index
Neo4j implements the Hierarchical Navigable Small World
algorithm to do efficient k-ANN (approximate nearest
neighbours) searches.
Backless blouse, Tie-back shirt, Sleeveless crop
CREATE VECTOR INDEX …
Just so it's clear
A full text index search cannot find King unless King is in the
text. Context is largely ignored.
A vector index search will find Queen, Prince, Baron, Warlord,
<and so on> to be to some degree similar to what you asked
but might also weave in some unexpected results. Context is -
well - King.
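The difference can be sketched with a toy example. The texts and the 3-dimensional "embeddings" below are invented for illustration only (real embeddings come from a model and have far more dimensions), and the function names are assumptions, not Neo4j APIs.

```python
import math
import re

# Toy documents: name -> (text, made-up embedding vector).
DOCS = {
    "King": ("The King addressed the court.", [0.9, 0.8, 0.1]),
    "Queen": ("The Queen addressed the court.", [0.9, 0.7, 0.2]),
    "Prince": ("The Prince rode north.", [0.8, 0.6, 0.3]),
    "Blouse": ("A backless blouse in white.", [0.1, 0.2, 0.9]),
}


def full_text_search(term, docs):
    """Return the documents whose text literally contains the term as a word."""
    term = term.lower()
    return [
        name
        for name, (text, _) in docs.items()
        if term in re.findall(r"[a-z]+", text.lower())
    ]


def cosine(a, b):
    """Cosine similarity between two vectors of equal length."""
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.hypot(*a) * math.hypot(*b))


def vector_search(query_vector, docs, k=3):
    """Return the k documents whose vectors are most similar to the query."""
    ranked = sorted(
        docs, key=lambda name: cosine(query_vector, docs[name][1]), reverse=True
    )
    return ranked[:k]


print(full_text_search("king", DOCS))
# Assumption: the query "King" was embedded to the same vector as the King document.
print(vector_search(DOCS["King"][1], DOCS))
```

The full text search only returns the document that contains the word, while the vector search also returns the neighbours, ranked by how close they are.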
Hammer Time
●Go back to the Jupyter Lab environment.
We'll continue to run it together, step by step
●If you want to run ahead, please wait at the point where we
will switch to the Neo4j Browser again (that's just after the
Semantic Search with Context heading).
Her story
Purchase history of a single customer
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
RETURN c.name AS name,
pc.transactiondate AS transactiondate,
p.name AS product,
p.description AS description
ORDER BY transactiondate DESC;
001
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
RETURN c.name AS name,
pc.transactiondate AS transactiondate,
collect(p.name) AS spree
ORDER BY size(spree) DESC;
002
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
RETURN c.name AS name,
pc.transactiondate AS transactiondate,
collect(DISTINCT p.name) AS spree
ORDER BY size(spree) DESC;
003
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
WITH c.name AS name,
pc.transactiondate AS transactiondate,
collect(DISTINCT p.name) AS spree
ORDER BY size(spree) DESC LIMIT 1
RETURN transactiondate AS dateofinterest;
004
// determine the date of interest
MATCH
(c:Customer)-[pc:PURCHASED]->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE c.name = "Nina Massey"
// keep c in scope so the second MATCH starts from the same customer
WITH c,
c.name AS name,
pc.transactiondate AS dateofinterest,
collect(DISTINCT p.name) AS spree
ORDER BY size(spree) DESC LIMIT 1
...
005
...
// what other products do customers buy
MATCH
(c)-[pc:PURCHASED]->(:Article)<-[:PURCHASED]-(:Customer)-[:PURCHASED]
->(:Article)-[:VARIANT_OF]->(p:Product)
WHERE pc.transactiondate = dateofinterest
RETURN p.name AS product,
count(*) AS commonPurchaseScore,
p.description AS description
ORDER BY commonPurchaseScore DESC;
005
I know, I know
that a lot of time today is spent on working with the
knowledge graph, with the data.
But then there are also a lot of questions along the lines of
●Why should I integrate a knowledge graph with the LLM,
have you seen the results of what <latest model> can do?
→ yes indeed and they are often crap if you dig just below the
surface
Or rants along the lines of
●You should provide proof and benchmarks that integrating
a knowledge graph with the LLM provides better results!
→ yes indeed and we do but what's the point if you do not
understand the value of a knowledge graph to start with
Graph Transactional
So far we always queried the graph database in the following
fashion
●Find starting point (Nina)
●Walk the graph from the starting point
●Stay local to the starting point
→ This is what is called graph transactional querying, used in
hundreds of real time use cases.
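The three steps above can be sketched outside the database as well. The tiny graph below is invented for illustration and stored as a plain adjacency dictionary; a graph database does this natively, so treat it only as a picture of the idea.

```python
from collections import deque

# Hypothetical toy graph: node -> nodes it points to.
GRAPH = {
    "Nina": ["Article A", "Article B"],
    "Article A": ["Product X"],
    "Article B": ["Product Y"],
    "Product X": [],
    "Product Y": [],
    "Omar": ["Article C"],
    "Article C": ["Product X"],
}


def local_walk(graph, start, max_hops):
    """Walk outwards from the starting point, at most max_hops away.

    Returns a dict mapping every reached node to its hop distance.
    """
    distance = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if distance[node] == max_hops:
            continue  # stay local: do not expand beyond the hop limit
        for neighbour in graph.get(node, []):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


print(local_walk(GRAPH, "Nina", 2))
```

Only the part of the graph around Nina is touched. Omar and his purchases are never visited, which is why this style of query stays fast regardless of how large the whole graph is.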
Graph Data Science
You can also consider the graph as a whole and there are many
algorithms that do that
●(Larry) PageRank
●(Edsger) Dijkstra's pathfinding
●(Vincent) Blondel's Louvain community detection
●<many others>
→ This is what is called graph data science, used in hundreds
of analytical use cases and to enhance the graph for the real
time use cases.
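As an illustration of an algorithm that looks at the graph as a whole, here is a minimal PageRank sketch using power iteration in plain Python. The graph and parameter values are assumptions for the example, and this is not how the Neo4j Graph Data Science library implements it.

```python
def pagerank(graph, damping=0.85, iterations=100):
    """Compute PageRank scores for a graph given as node -> list of targets.

    Every target must also appear as a key of the graph.
    """
    nodes = list(graph)
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every node receives the same small "teleport" share.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node, targets in graph.items():
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # A node without outgoing links spreads its rank evenly.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank


# Hypothetical toy graph: A, B and D all point to C, C points back to A.
TOY_GRAPH = {"A": ["C"], "B": ["C"], "C": ["A"], "D": ["C"]}

scores = pagerank(TOY_GRAPH)
for node, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(node, round(score, 3))
```

Unlike the local walk from a starting point, every node and every relationship contributes to every score, which is what makes this an analytical rather than a transactional workload.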
Just so it's clear
There are several possible approaches to create the
CUSTOMERS_ALSO_LIKE relationship. The embedding is
definitely not the easiest ;-), but
●fits the context (earlier on we saw text embeddings, now we
used node embeddings)
●allows the inclusion of properties (even though we didn't do
that)
Important to understand is that the similarity here is based on
what was bought together, not on what the article is!
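To illustrate that last point, here is a sketch of one of the simpler alternative approaches, not the node embedding approach used in the workshop: score two articles by the overlap of their buyers (Jaccard similarity). The purchase data, the threshold and the function names are made up for this example.

```python
from itertools import combinations

# Hypothetical purchases: customer -> set of articles bought.
PURCHASES = {
    "Nina": {"tie-back shirt", "backless blouse", "sunglasses"},
    "Omar": {"tie-back shirt", "sunglasses"},
    "Lena": {"backless blouse", "sunglasses"},
    "Raj": {"wool scarf"},
}


def buyers_per_article(purchases):
    """Invert the purchases: article -> set of customers who bought it."""
    buyers = {}
    for customer, articles in purchases.items():
        for article in articles:
            buyers.setdefault(article, set()).add(customer)
    return buyers


def also_like_pairs(purchases, threshold=0.5):
    """Return (article, article, score) for pairs with enough shared buyers."""
    buyers = buyers_per_article(purchases)
    pairs = []
    for a, b in combinations(sorted(buyers), 2):
        shared = len(buyers[a] & buyers[b])
        total = len(buyers[a] | buyers[b])
        score = shared / total  # Jaccard similarity of the two buyer sets
        if score >= threshold:
            pairs.append((a, b, round(score, 2)))
    return sorted(pairs, key=lambda pair: pair[2], reverse=True)


for pair in also_like_pairs(PURCHASES):
    print(pair)
```

Sunglasses end up paired with both the blouse and the shirt although they are a completely different kind of article: the score only looks at who bought what, not at what the article is.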
Pulling it all together
So far you have done
a vector search
and enhanced that
with a personalized search
and you also know what you could recommend.
Allow me to introduce you to our AI Fashion Assistant Sam,
who will combine all your hard work. Sam is pretty new to this
though and needs a bit of prompting.
Prompting Sam
You are a personal assistant named Sam for a fashion, home, and beauty company called
HRM.
write an email to {customerName}, one of your customers, to promote and summarize
products relevant for them given the current season / time of year: {timeOfYear}.
Please only mention the products listed below. Do not come up with or add any new
products to the list.
Each product comes with an https `url` field. Make sure to provide that https url with
descriptive name text in markdown for each product.
---
# Relevant Products:
{searchProds}
# Customer May Also Be Interested In the following
(pick items from here that pair with the above products well for the current season /
time of year: {timeOfYear}.
prioritize those higher in the list if possible):
{recProds}
---
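A minimal sketch of how the placeholders in this prompt could be filled before it is sent to the LLM. The template text is the one above; the product data, the example.com URLs, the helper name and the use of plain `str.format` are assumptions for illustration, and the workshop notebook may do this differently.

```python
PROMPT_TEMPLATE = """You are a personal assistant named Sam for a fashion, home, and beauty company called
HRM.
write an email to {customerName}, one of your customers, to promote and summarize
products relevant for them given the current season / time of year: {timeOfYear}.
Please only mention the products listed below. Do not come up with or add any new
products to the list.
Each product comes with an https `url` field. Make sure to provide that https url with
descriptive name text in markdown for each product.
---
# Relevant Products:
{searchProds}
# Customer May Also Be Interested In the following
(pick items from here that pair with the above products well for the current season /
time of year: {timeOfYear}.
prioritize those higher in the list if possible):
{recProds}
---"""


def format_products(products):
    """Render a list of product dicts as one line per product."""
    return "\n".join(f"- {p['name']} (url: {p['url']})" for p in products)


# Hypothetical search results and recommendations.
search_products = [
    {"name": "Tie-back shirt", "url": "https://example.com/products/1"},
]
recommended_products = [
    {"name": "Sunglasses", "url": "https://example.com/products/2"},
]

email_prompt = PROMPT_TEMPLATE.format(
    customerName="Nina Massey",
    timeOfYear="May, 2024",
    searchProds=format_products(search_products),
    recProds=format_products(recommended_products),
)
print(email_prompt)
```

The search results fill `{searchProds}` and the recommendations fill `{recProds}`, so Sam can only talk about products that actually came out of the graph.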