Introduction to Open Source RAG and RAG Evaluation

chloewilliams62 · 53 slides · May 23, 2024

About This Presentation

You’ve heard that good data matters in Machine Learning, but does it matter for Generative AI applications? Corporate data often differs significantly from the general Internet data used to train most foundation models. Join me for a demo on building an open source RAG (Retrieval Augmented Generation) ...


Slide Content

Slide 1
Speaker
Christy Bergman
Developer Advocate, Zilliz
[email protected]
https://www.linkedin.com/in/christybergman/

https://github.com/milvus-io/milvus
Discord: https://discord.gg/FjCMmaJng6

Slide 2
Image source: https://thedataquarry.com/posts/vector-db-1/

Slide 3
27K+ GitHub Stars · 25M+ Downloads · 250+ Contributors · 2,600+ Forks
Milvus is an open-source vector database for GenAI projects. Pip-install it on your laptop, plug it into popular AI dev tools, and push to production with a single line of code.
Easy Setup: Pip-install to start coding in a notebook within seconds.
Reusable Code: Write once, and deploy with one line of code into the production environment.
Integration: Plug into OpenAI, LangChain, LlamaIndex, and many more.
Feature-rich: Dense & sparse embeddings, filtering, reranking, and beyond. (A minimal quickstart sketch follows.)
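
As a minimal quickstart sketch (not from the deck; the collection name and dimension are illustrative), this is roughly what "pip-install on your laptop" looks like with Milvus Lite:

```python
# pip install pymilvus  (Milvus Lite runs embedded, backed by a local file)
from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")                          # local Milvus Lite
client.create_collection(collection_name="demo", dimension=384)  # 384-dim example vectors
print(client.list_collections())                                 # -> ['demo']
```

Swapping the local file for a server or Zilliz Cloud URI is the "one line" change when moving to production.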

Slide 4
Zilliz Cloud is a fully managed vector database built atop OSS Milvus.
Open Source: Stable Milvus versions are continuously deployed to Zilliz Cloud.
Flexible & Secure Deployment: Enterprise features for production readiness.
Cardinal Search Engine & Use-Case-Optimized Compute: Milvus completely re-engineered to be optimized.
Pipelines, Connectors, Model Library: A streamlined unstructured data platform.

Slide 5
Milvus: Open Source, Self-Managed
github.com/milvus-io/milvus
Getting Started with Vector Databases
Join our community on the Milvus Discord: milvus.io/discord

Slide 6
AGENDA
01 AI Hallucinations and RAG
02 4 Challenges
03 Demo RAG
04 RAG Evaluation Methods
05 Demo Eval

Slide 7
01 AI Hallucinations and RAG

Example AI Hallucination
[Screenshots: a Gemini answer shown next to the Wikipedia source, with the hallucinated answer highlighted.]

Why do models hallucinate?
LLMs hallucinate because they are trained on sequences of words (tokens): they learn to continue token sequences plausibly, and the training data can contain arbitrary, even nonsensical, sequences.
Sample data:
The hamster cabinet …
!!@#%# …
Monkey eats shark …
trees in the moons …
trees in the moons…

Where do Vectors Come From?
Unstructured data is fed through pre-trained deep learning models; the embeddings they output are the vectors stored in a vector database.
[Diagram: Unstructured Data → Embedding model → Vectors → Vector Database. A separate Generator Model or LLM produces the final answer.]
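
As a concrete sketch of the diagram above (not from the deck; the library and model name are example choices), unstructured text becomes vectors like this:

```python
# pip install sentence-transformers
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")   # a pre-trained embedding model
docs = ["Milvus is an open-source vector database.",
        "HNSW organizes vectors in a multi-layered graph."]
vectors = model.encode(docs)                      # one 384-dim vector per document
print(vectors.shape)                              # -> (2, 384)
```

These vectors are what get inserted into the vector database; the generator model only sees the retrieved text, not the vectors.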

Semantic Similarity
Image from Sutor et al.
Toy 2-D word vectors:
Woman = [0.3, 0.4]
Man = [0.5, 0.2]
Queen = [0.3, 0.9]
King = [0.5, 0.7]
Vector arithmetic: Queen - Woman + Man = King
[0.3, 0.9] - [0.3, 0.4] = [0.0, 0.5]
[0.0, 0.5] + [0.5, 0.2] = [0.5, 0.7] = King
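
The same arithmetic, checked in a few lines of NumPy (a minimal verification, not from the deck):

```python
import numpy as np

woman, man = np.array([0.3, 0.4]), np.array([0.5, 0.2])
queen, king = np.array([0.3, 0.9]), np.array([0.5, 0.7])
result = queen - woman + man
print(result)                     # -> [0.5 0.7]
print(np.allclose(result, king))  # -> True: Queen - Woman + Man = King
```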

Slide 15
Retrieval Augmented Generation (RAG)
[Diagram: Your Data → Embedding Model → Vector Database. The Question is embedded for Search; the Question + retrieved Context go to the Gen AI Model, which returns Reliable Answers.]
Example question: "What is the default AUTOINDEX distance metric in Milvus Client?"
Example answer: "The default AUTOINDEX distance metric in Milvus Client is L2."
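
A minimal end-to-end sketch of this pipeline using Milvus Lite (the collection name, embedding model, and the `ask_llm` helper are illustrative assumptions, not the deck's demo code):

```python
from pymilvus import MilvusClient
from sentence_transformers import SentenceTransformer

embedder = SentenceTransformer("all-MiniLM-L6-v2")     # example embedding model (384-dim)
client = MilvusClient("rag_demo.db")                   # Milvus Lite, local file
client.create_collection(collection_name="docs", dimension=384)

# 1) Your Data -> Embedding Model -> Vector Database
chunks = ["The default AUTOINDEX distance metric in Milvus Client is L2."]
client.insert(collection_name="docs",
              data=[{"id": i, "vector": embedder.encode(c).tolist(), "text": c}
                    for i, c in enumerate(chunks)])

# 2) Question -> Search -> Question + Context -> Gen AI Model
question = "What is the default AUTOINDEX distance metric in Milvus Client?"
hits = client.search(collection_name="docs",
                     data=[embedder.encode(question).tolist()],
                     limit=3, output_fields=["text"])
context = "\n".join(hit["entity"]["text"] for hit in hits[0])
prompt = f"Answer using only this context:\n{context}\n\nQuestion: {question}"
# answer = ask_llm(prompt)   # hypothetical call to whatever Gen AI model you use
```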

Slide 16
02 4 Challenges and Lessons Learned

Slide 17
Pain Point #1: Choosing an Embedding Model
https://huggingface.co/spaces/mteb/leaderboard

Slide 18
Pain Point #1: Choosing an Embedding Model

Creator | Model | Embedding Dim | Context Length | Use Case Tasks | Open Source | MTEB Score
OpenAI | text-embedding-3-small | 512-1536 | 8K | Real-time multilingual text chatbots | No | 62 (1536), 62 (512)
OpenAI | text-embedding-3-large | 256-3072 | 8K | Real-time multilingual text chatbots | No | 65 (3072), 62 (256)

Matryoshka Representation Learning: https://arxiv.org/pdf/2205.13147v4.pdf
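
The variable dimensions in the table come from Matryoshka Representation Learning: the model is trained so that a prefix of the embedding is itself a usable embedding. A hedged sketch of requesting a shortened vector (assumes the official `openai` Python client and an OPENAI_API_KEY in the environment):

```python
from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment
resp = client.embeddings.create(
    model="text-embedding-3-small",
    input="Corporate data often differs from general Internet data.",
    dimensions=512,                 # truncate from the native 1536 dims
)
print(len(resp.data[0].embedding))  # -> 512
```

Per the table, dropping text-embedding-3-small from 1536 to 512 dimensions costs essentially no MTEB score (62 vs 62) while cutting storage 3x.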

Slide 19
Pain Point #2: Choosing an Index
https://milvus.io/docs/index.md

Slide 20
Pain Point #2: Choosing an Index
● In-memory
○ Floating-point dense
■ FLAT: an exhaustive, brute-force index that compares the query vector against every single vector in the dataset to find the nearest neighbors. Suitable for small datasets where perfect accuracy is required and search latency is not a concern.
■ IVF_FLAT (Inverted File FLAT): a quantization-based index that divides the vector space into clusters. During indexing, vectors are assigned to the nearest cluster centroid; during search, only the vectors within the clusters closest to the query vector are compared.
■ HNSW: organizes vectors in a hierarchical, multi-layered graph, so search complexity is logarithmic. The basic idea is to separate nearest neighbors into layers, where the top layer is the sparsest and the lowest layer forms the complete graph. Search proceeds from top to bottom. (A sketch of creating an HNSW index follows this list.)
○ Floating-point sparse: SPLADE, BGE-M3
○ Binary
● On-disk: DiskANN, for when your data is too large to fit in memory
● Hardware-optimized: GPU (CAGRA), ARM
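
A minimal sketch of creating an HNSW index with the Milvus client (field and collection names and parameter values are illustrative; assumes pymilvus ≥ 2.4):

```python
from pymilvus import MilvusClient

client = MilvusClient("index_demo.db")
index_params = client.prepare_index_params()
index_params.add_index(
    field_name="vector",
    index_type="HNSW",
    metric_type="L2",
    params={"M": 16,                 # max edges per node in the graph layers
            "efConstruction": 200},  # candidate-list size while building the graph
)
client.create_index(collection_name="docs", index_params=index_params)
```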

Slide 21
Pain Point #2: Choosing an Index
[Diagrams: IVF-Flat and HNSW]
https://arxiv.org/abs/1603.09320

Slide 22
Pain Point #3: Chunking
Different data shapes chunk differently:
● Conversation data
● Documentation data
● Lecture or Q/A data

Slide 23
Pain Point #3: Chunking
● Conversation data: add conversation memory
● Question-answer data: use Q&A pair formatting
● Documentation data

Slide 24
Pain Point #3: Chunks need more context
[Example: a "Tesla Roadster" page with 2018 and 2023 sections. Naive chunking yields Chunk #1 and Chunk #2 containing only the section body text, so each chunk loses the page title that says what it is about.]

Slide 25
Pain Point #3: Chunks need more context
[Example, continued: better chunks prepend the title 1 level above (the year) and the title 2 levels above (the page title), yielding "Tesla Roadster 2018 …" and "Tesla Roadster 2023 …" instead of bare body text.]
Tools that preserve this context: HTMLHeaderTextSplitter and ParentDocumentRetriever (LangChain); HierarchicalNodeParser and AutoMergingRetriever (LlamaIndex). A sketch using the first of these follows.
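
A hedged sketch of the LangChain splitter named above (assumes `pip install langchain-text-splitters`; the HTML is a toy stand-in for the Tesla Roadster page):

```python
from langchain_text_splitters import HTMLHeaderTextSplitter

html = """
<h1>Tesla Roadster</h1>
<h2>2018</h2><p>Lorem ipsum about the 2018 model...</p>
<h2>2023</h2><p>Lorem ipsum about the 2023 model...</p>
"""
splitter = HTMLHeaderTextSplitter(headers_to_split_on=[("h1", "title"), ("h2", "year")])
for doc in splitter.split_text(html):
    print(doc.metadata, "|", doc.page_content)
# Each chunk carries {'title': 'Tesla Roadster', 'year': ...} metadata,
# which can be prepended to the chunk text before embedding.
```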

Slide 26
Example [screenshot]

Slide 27
Example [screenshot]

Slide 28
Pain Point #4: Keyword or Semantic Search?
Keyword search is good for:
● Exact product names
● Jargon words
Example: product name = "2022 RF GT 6MT"

Semantic search is good for:
● Similar meaning, but maybe not an exact match
Examples:
● Similar image search
● Related wiki articles

Slide 29
Pain Point #4: Keyword or Semantic Search? Best of both worlds!
[Diagram: the Prompt & Question feed both searches. Keyword Search runs over sparse vectors (TF-IDF, BM25, SPLADE, Lucene WAND pruning, BGE-M3); Semantic Search runs over dense vectors. Each returns its own candidate list (a Top10 and a Top5), which a reranker (linear combination, cross-encoder, or neural reranker) merges into the final top_k.]
● Rerank the keyword AND semantic top_k together
● Put the reranked results into the prompt context for improved context
A hybrid-search sketch follows.
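
A hedged sketch of the hybrid search above in Milvus (assumes pymilvus ≥ 2.4.4 and a collection with both a dense and a sparse vector field; the query vectors are placeholders):

```python
from pymilvus import MilvusClient, AnnSearchRequest, RRFRanker

client = MilvusClient("hybrid_demo.db")
dense_q = [0.1] * 384            # placeholder dense (semantic) query embedding
sparse_q = {17: 0.8, 4096: 0.3}  # placeholder sparse (keyword-like) embedding {dim: weight}

dense_req = AnnSearchRequest(data=[dense_q], anns_field="dense_vector",
                             param={"metric_type": "IP"}, limit=5)
sparse_req = AnnSearchRequest(data=[sparse_q], anns_field="sparse_vector",
                              param={"metric_type": "IP"}, limit=10)

# Reciprocal Rank Fusion merges the two ranked candidate lists into the final top_k.
hits = client.hybrid_search(collection_name="docs", reqs=[dense_req, sparse_req],
                            ranker=RRFRanker(), limit=5, output_fields=["text"])
```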

Slide 30
Rerankers: when are they computed?
- Plain cosine similarity is called "no interaction"; this is dense-embedding "semantic search".
- BERT was an early-interaction model, meaning the relationship between the question and the documents is pre-computed offline, as part of the embedding model.
- Cross-encoders are ML-model late interaction, calculated at query time. They are too computation-heavy to run in real time except on a small top_k, which they reduce to a smaller list (e.g., a top 2). Cross-encoder reranking adds a classifier over (question, answer) pairs; see the sketch after this list.
- ColBERT v2 is neural-model late interaction calculated offline, before the user asks their question! It claims ~2% increased accuracy, but requires storing extra embeddings.
- Cohere's Rerank 3 claims ~26% improvement over sparse-only retrieval and ~6% over dense.
- Jina AI's Reranker claims ~20% improvement over sparse-only retrieval.
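
A minimal sketch of cross-encoder reranking at query time (the model name is one common public choice, not the deck's):

```python
from sentence_transformers import CrossEncoder

reranker = CrossEncoder("cross-encoder/ms-marco-MiniLM-L-6-v2")
query = "What do the parameters for HNSW mean?"
candidates = ["HNSW's M sets the graph degree; efConstruction sets build-time search width.",
              "IVF_FLAT assigns vectors to the nearest cluster centroid.",
              "FLAT is an exhaustive brute-force comparison."]
scores = reranker.predict([(query, doc) for doc in candidates])  # one score per pair
ranked = sorted(zip(scores, candidates), reverse=True)
top_2 = [doc for _, doc in ranked[:2]]  # shrink the candidate list before prompting
```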

Slide 31
BERT vs ColBERT
[Diagram: the query and the top_k candidates are scored together to produce the final top_k. BERT-style sparse models: SPLADE, BGE-M3.]
https://arxiv.org/pdf/2112.01488.pdf

Slide 32
ColBERT v2 Reranker
https://arxiv.org/pdf/2112.01488.pdf

Slide 33
Slide from Tengyu Ma's April 2024 talk at Unstructured Data (with Milvus metadata filtering added).
[Diagram annotation: metadata filtering (hash)]

Slide 34
BGE M3-Embedding
● "Multi-vec": multi-vector retrieval uses fine-grained interactions between the query's and the passage's embeddings to compute the relevance score; re-ranks the top-200 dense candidates for efficient processing.
● "Dense+Sparse": retrieve the top-1000 candidates with the dense and sparse methods, then re-rank using the sum of the two scores.
● "All": re-rank based on the sum of all three scores.

Multilingual retrieval performance on the MIRACL dev set (measured by nDCG@10).
https://arxiv.org/pdf/2402.03216

Slide 35
https://chat.lmsys.org/?leaderboard
Chart by @maximelabonne

Slide 36

Slide 37
Mixtral 8x22B-Instruct-v0.1 with Anyscale Endpoints
https://console.anyscale.com/v2/playground

Slide 38
Question: What do the parameters for HNSW mean?
[The same prompt run against GPT-3.5-turbo and, via Anyscale Endpoints, Mixtral-8x22B-Instruct-v0.1.]

Slide 39
Is RAG dead?
2023: Lost in the Middle, https://arxiv.org/pdf/2307.03172
2024: Needle-in-a-haystack experiments, https://github.com/gkamradt/LLMTest_NeedleInAHaystack

Slide 40
Is RAG dead?
Needle-in-a-haystack experiments
Slide from Lance Martin, LangChain
https://blog.langchain.dev/multi-needle-in-a-haystack/

Slide 41
03 Demo Custom RAG

Slide 42
04 RAG Evaluation Methods

Where do Vectors Come From? (recap)
[Diagram: Unstructured Data → Embedding model → Vectors; Generator Model or LLM.]

Slide 45
Retrieval Augmented Generation (RAG) (recap)
[Diagram: Your Data → Embedding Model → Vector Database; Question → Search → Question + Context → Gen AI Model → Reliable Answers.]
Example question: "What is the default AUTOINDEX distance metric in Milvus?"
Example answer: "The default AUTOINDEX distance metric in Milvus is L2."

Slide 46
Model Evals vs Production System Evals
[Two panels: "Arena Elo score" (model evals) vs "Your RAG system" (production system evals).]

Slide 47
RAG Evaluation Methods
https://arxiv.org/pdf/2306.05685.pdf
GPT-4 favors itself with a 10% higher win rate; Claude-v1 favors itself with a 25% higher win rate.
The open-weight Prometheus-Eval aligns with human judgments up to 85%, as of May 2024.

Slide 48
Known Problems with LLM-as-Judge
https://www.databricks.com/blog/LLM-auto-eval-best-practices-RAG
GPT-4 matches human judgments on correctness and readability, but it is not a good judge of comprehensiveness.

Slide 49
Known Problems with LLM-as-Judge
https://arxiv.org/pdf/2305.17926
AI judges score the extremes (max/min) higher; human judges score the medians higher.

Slide 50
RAG Evaluation Methods
https://github.com/explodinggradients/ragas
[Diagram: Ragas metrics and what they compare.
context_precision: Query ↔ Context
context_recall: Ground Truth ↔ Context
faithfulness: Context ↔ Response
answer_relevancy: Query ↔ Response
answer_correctness, answer_similarity: Ground Truth Answer ↔ Response]
A sketch of running these metrics follows.
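
A hedged sketch of computing these metrics with the ragas library (assumes ragas' older Dataset-based `evaluate` API and an OpenAI key for the judge LLM; the sample row is invented):

```python
from datasets import Dataset
from ragas import evaluate
from ragas.metrics import (answer_relevancy, context_precision,
                           context_recall, faithfulness)

data = {
    "question": ["What is the default AUTOINDEX distance metric in Milvus?"],
    "contexts": [["AUTOINDEX in Milvus defaults to the L2 distance metric."]],
    "answer": ["The default AUTOINDEX distance metric in Milvus is L2."],
    "ground_truth": ["L2"],
}
result = evaluate(Dataset.from_dict(data),
                  metrics=[faithfulness, answer_relevancy,
                           context_precision, context_recall])
print(result)  # per-metric scores between 0 and 1
```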

Slide 51
05 Demo RAG Eval

Slide 52
THANK YOU
We need your stars! https://github.com/milvus-io/milvus
Join our Discord: https://discord.gg/FjCMmaJng6

Slide 53
Open Source Zilliz Architecture