Tirana Tech Meetup - Agentic RAG with Milvus, Llama3 and Ollama

chloewilliams62 536 views 94 slides Jul 08, 2024

About This Presentation

Build a RAG system - From Beginners to Advanced


Slide Content

1 | © Copyright 8/16/23 Zilliz1 | © Copyright 8/16/23 Zilliz
Stephen Batifol | Zilliz

Tirana Tech Meetup, July 4th
Build a RAG system - From Beginners to Advanced

Stephen Batifol
Developer Advocate, Zilliz/ Milvus
linkedin.com/in/stephen-batifol/
@stephenbtl
Speaker

27K+ GitHub Stars
25M+ Downloads
250+ Contributors
2,600+ Forks
Milvus is an open-source vector database for GenAI projects. pip install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
Easy Setup
pip install pymilvus to start coding in a notebook within seconds.

Reusable Code
Write once, and deploy with one line of code into the production environment.

Integration
Plug into OpenAI, LangChain, LlamaIndex, and many more.

Feature-rich
Dense & sparse embeddings, filtering, reranking and beyond.

Seamless integration with all popular AI toolkits

Well-connected in LLM infrastructure to enable RAG
use cases
Framework
Hardware
Infrastructure
Embedding Models LLMs
Software Infrastructure
Vector Database

01 Introduction to Vector DB and Vector Search

Traditional databases were built on exact search

…which misses context, semantic meaning, and user intent
[Figure: paired examples contrasting exact and semantic matches, with the labels "Apple", "Rising dough", "Change car tire", and "Proofing bread"]

…and cannot process the ever-growing volume of unstructured data
80% of newly generated data in 2025 will be unstructured data (other: 20%)
*Data Source: The Digitization of the World by IDC

Common AI Use Cases
•Retrieval Augmented Generation (RAG): Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
•Recommender System: Match user behavior or content features with other similar ones to make effective recommendations.
•Text/Semantic Search: Search for semantically similar texts across vast amounts of natural language documents.
•Image Similarity Search: Identify and search for visually similar images or objects from a vast collection of image libraries.
•Video Similarity Search: Search for similar videos, scenes, or objects from extensive collections of video libraries.
•Audio Similarity Search: Find similar audio in large datasets for tasks like genre classification or speech recognition.
•Molecular Similarity Search: Search for similar substructures, superstructures, and other structures for a specific molecule.
•Anomaly Detection: Detect data points, events, and observations that deviate significantly from the usual pattern.
•Multimodal Similarity Search: Search over multiple types of data simultaneously, e.g. text and images.

Vector Databases
Where do Vectors Come From?

Embedding Models

Vector Embedding

Vector Space

02 How do Vector Databases Work?

How Similarity Search Works
Unstructured Data: Images, User Generated Content, Video, Documents, Audio
1. Transform into Vectors (Vector Embeddings)
2. Store in Vector Database
3. Perform Query
4. Perform Approximate Nearest Neighbor Similarity Search
5. Get Results
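
The flow on this slide can be sketched end to end in a few lines. This is a toy illustration only: the `embed` function is a hypothetical word-count stand-in for a real embedding model, the "database" is a Python list, and the search is an exhaustive scan rather than an approximate nearest neighbor index.

```python
import math
from collections import Counter

def embed(text, vocab):
    """Toy embedding: word counts over a fixed vocabulary (stand-in for a real model)."""
    counts = Counter(text.lower().split())
    return [float(counts[word]) for word in vocab]

def cosine(a, b):
    """Cosine similarity; returns 0.0 when either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0

docs = [
    "how to proof bread dough",
    "how to change a car tire",
    "apple pie recipe",
]

# Transform into vectors, then store them next to the original text.
vocab = sorted({word for doc in docs for word in doc.lower().split()})
store = [(doc, embed(doc, vocab)) for doc in docs]

def search(query, k=1):
    """Embed the query, rank every stored vector by similarity, return the top k texts."""
    q = embed(query, vocab)
    ranked = sorted(store, key=lambda item: cosine(q, item[1]), reverse=True)
    return [text for text, _ in ranked[:k]]

top = search("bread dough rising")
print(top)
```

A word-count vector only matches shared words, so it would not link "rising dough" to "proofing bread" by meaning. A learned embedding model is what supplies that semantic closeness.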

Example Entry

03 Indexes

Indexing strategies
•Tree based

•Graph based

•Hash based

•Cluster based

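Category | Index | Accuracy | Latency | Throughput | Index Time | Cost
Graph-based | Cagra (GPU) | High | Low | Very High | Fast | Very High
Graph-based | HNSW | High | Low | High | Slow | High
Graph-based | DiskANN | High | High | Mid | Very Slow | Low
Quantization-based or cluster-based | ScaNN | Mid | Mid | High | Mid | Mid
Quantization-based or cluster-based | IVF_FLAT | Mid | Mid | Low | Fast | Mid
Quantization-based or cluster-based | IVF + Quantization | Low | Mid | Mid | Mid | Low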

FLAT

FLAT Index
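
A FLAT index is a brute-force scan: every stored vector is compared with the query, so the result is exact. The sketch below assumes Euclidean distance and a hypothetical `FlatIndex` class; it is not the Milvus implementation.

```python
import heapq
import math

class FlatIndex:
    """Exact nearest-neighbor search by scanning every stored vector."""

    def __init__(self):
        self.vectors = []

    def add(self, vector):
        """Store a vector and return its integer id."""
        self.vectors.append(tuple(vector))
        return len(self.vectors) - 1

    def search(self, query, k):
        """Return the k closest (distance, id) pairs, nearest first."""
        scored = ((math.dist(query, v), i) for i, v in enumerate(self.vectors))
        return heapq.nsmallest(k, scored)

index = FlatIndex()
for vector in [(0.0, 0.0), (1.0, 1.0), (5.0, 5.0), (0.9, 1.2)]:
    index.add(vector)

hits = index.search((1.0, 1.0), k=2)
print([vector_id for _, vector_id in hits])
```

Cost grows linearly with the number of vectors, which is why the other index types trade some recall for speed.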

Inverted File FLAT (IVF-FLAT)

IVF-FLAT Index
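
IVF-FLAT groups vectors into clusters and, at query time, scans only the clusters whose centroids are closest to the query. The sketch below is a simplified illustration: the names `nlist` and `nprobe` follow common usage, the k-means step is a plain Lloyd iteration, and the centroids are seeded from evenly spaced input points, which is an assumption made here to keep the run deterministic.

```python
import math

def nearest(point, centroids):
    """Index of the centroid closest to the point."""
    return min(range(len(centroids)), key=lambda c: math.dist(point, centroids[c]))

def kmeans(points, n_clusters, iters=10):
    """Plain Lloyd iterations, seeded with evenly spaced input points (a simplification)."""
    step = max(1, len(points) // n_clusters)
    centroids = [list(p) for p in points[::step][:n_clusters]]
    for _ in range(iters):
        buckets = [[] for _ in centroids]
        for p in points:
            buckets[nearest(p, centroids)].append(p)
        for c, members in enumerate(buckets):
            if members:  # keep the previous centroid if a cluster is empty
                centroids[c] = [sum(dim) / len(members) for dim in zip(*members)]
    return centroids

class IVFFlat:
    def __init__(self, points, nlist):
        self.centroids = kmeans(points, nlist)
        self.lists = [[] for _ in self.centroids]  # one inverted list per cluster
        for i, p in enumerate(points):
            self.lists[nearest(p, self.centroids)].append((i, p))

    def search(self, query, k, nprobe=1):
        """Scan only the nprobe clusters nearest to the query."""
        order = sorted(range(len(self.centroids)),
                       key=lambda c: math.dist(query, self.centroids[c]))
        candidates = [item for c in order[:nprobe] for item in self.lists[c]]
        candidates.sort(key=lambda item: math.dist(query, item[1]))
        return [i for i, _ in candidates[:k]]

points = [(0.0, 0.1), (0.2, 0.0), (0.1, 0.3), (9.0, 9.1), (9.2, 8.9), (8.8, 9.0)]
ivf = IVFFlat(points, nlist=2)
result = ivf.search((9.0, 9.0), k=2, nprobe=1)
print(result)
```

Raising `nprobe` scans more clusters, which improves recall at the cost of latency. With `nprobe` equal to `nlist` the search degenerates to a FLAT scan.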

Hierarchical Navigable Small World (HNSW)

HNSW - Skip List

HNSW - NSW Graph
•Built by randomly shuffling data points and inserting them one by one, with each point connected to a predefined number of neighbors (M).

⇒ Creates a graph structure that exhibits the "small world" property.

⇒ Any two points are connected through a relatively short path.
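
A minimal sketch of the single-layer NSW construction described on this slide, plus a greedy search over the resulting graph. Two simplifications are assumed here: neighbors are found by brute force at insert time (a real implementation searches the graph built so far), and the search keeps only one current node rather than a candidate list.

```python
import math
import random

def build_nsw(points, m, seed=0):
    """Insert points in shuffled order, linking each to its m nearest already-inserted points."""
    order = list(range(len(points)))
    random.Random(seed).shuffle(order)
    graph = {}
    for idx in order:
        nearest = sorted(graph, key=lambda j: math.dist(points[idx], points[j]))[:m]
        graph[idx] = set(nearest)
        for j in nearest:
            graph[j].add(idx)  # edges are bidirectional
    return graph

def greedy_search(graph, points, query, entry):
    """Move to the neighbor closest to the query until no neighbor improves the distance."""
    current = entry
    while True:
        best = min(graph[current],
                   key=lambda j: math.dist(query, points[j]),
                   default=current)
        if math.dist(query, points[best]) >= math.dist(query, points[current]):
            return current
        current = best

points = [(float(i), 0.0) for i in range(10)]
graph = build_nsw(points, m=2)
found = greedy_search(graph, points, query=(7.2, 0.0), entry=0)
print(found)
```

HNSW stacks several such graphs in layers, in the spirit of a skip list, so a search can take long hops in the sparse upper layers before refining in the dense bottom layer.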

HNSW

Picking an Index
●100% Recall – Use FLAT search if you need 100% accuracy
●10MB < index_size < 2GB – Standard IVF
●2GB < index_size < 20GB – Consider PQ and HNSW
●20GB < index_size < 200GB – Composite Index, IVF_PQ or HNSW_SQ
●Disk-based indexes
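
The rules of thumb above can be written as a small lookup. The thresholds come from the slide; two cases are assumptions made for this sketch because the slide gives no range for them: sizes under 10MB fall back to FLAT, and sizes of 200GB or more map to disk-based indexes. The function name `pick_index` is hypothetical.

```python
MB = 1024 ** 2
GB = 1024 ** 3

def pick_index(index_size_bytes, need_exact_recall=False):
    """Map an estimated index size to the index family suggested on the slide."""
    if need_exact_recall:
        return "FLAT"
    if index_size_bytes < 10 * MB:
        return "FLAT"  # assumption: the slide gives no rule below 10MB
    if index_size_bytes < 2 * GB:
        return "IVF"
    if index_size_bytes < 20 * GB:
        return "PQ or HNSW"
    if index_size_bytes < 200 * GB:
        return "IVF_PQ or HNSW_SQ"
    return "disk-based index"  # assumption: the slide lists this without a range

for size in (500 * MB, 5 * GB, 50 * GB, 500 * GB):
    print(size, pick_index(size))
```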

The GenAI Boom

Three Pillars of GenAI & the opportunities they bring
Models Computation Data
Vector Database
●Data Encryption
●Data ETL
●Data Security
●Data Pipeline
●Data Observability
●Data Compliance

RAG
(Retrieval Augmented Generation)

Basic Idea
Use RAG to force the LLM to work with your data
by injecting it via a vector database like Milvus

Why RAG?
RAG vs. LLM
-LLM knowledge is out of date
-LLMs cannot access your private knowledge
-Helps reduce hallucinations
-Transparency and interpretability

RAG vs. Fine-tuning
-Fine-tuning is expensive
-Fine-tuning takes a long time
-RAG is pluggable

Basic RAG Architecture
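
The basic architecture reduces to three steps: retrieve relevant chunks, put them into the prompt, and call the model. The sketch below is an illustration under loud assumptions: retrieval is a word-overlap score rather than a vector search, and `fake_llm` is a hypothetical placeholder for a real model call.

```python
def score(query, chunk):
    """Toy relevance score: number of lowercase words shared by query and chunk."""
    return len(set(query.lower().split()) & set(chunk.lower().split()))

def retrieve(query, chunks, k=2):
    """Return the k chunks with the highest overlap score."""
    return sorted(chunks, key=lambda chunk: score(query, chunk), reverse=True)[:k]

def build_prompt(query, context):
    """Inject the retrieved chunks into the prompt ahead of the question."""
    joined = "\n".join(f"- {chunk}" for chunk in context)
    return f"Answer using only this context:\n{joined}\n\nQuestion: {query}"

def fake_llm(prompt):
    """Placeholder for a real model call, for example a locally served model."""
    return f"(stub answer; prompt was {len(prompt)} characters)"

def naive_rag(query, chunks, llm=fake_llm, k=2):
    """Single-shot RAG: retrieve once, prompt once, return answer and sources."""
    context = retrieve(query, chunks, k)
    return llm(build_prompt(query, context)), context

chunks = [
    "Milvus is an open-source vector database.",
    "Ollama runs LLMs locally.",
    "Tirana is the capital of Albania.",
]
answer, sources = naive_rag("what is milvus", chunks, k=1)
print(answer)
print(sources)
```

Returning the sources alongside the answer is what gives RAG the transparency listed on the "Why RAG?" slide.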

5-line starter

04 Basic RAG with Milvus

LlamaIndex
•Framework for building LLM Applications
•Focus on retrieving data and integrating with LLMs
•Loading the Data
•Chunk & Chunk Overlap
•Integrations with most popular AI tools

Ollama
•Run LLMs Locally

•Embedding Models

Milvus Lite
pip install pymilvus

05 Embeddings

Embedding Models

Examining Embeddings
Picking a model
What to embed
Metadata

Embedding Strategies
Level 1: Embedding Chunks Directly
Level 2: Embedding Sub and Super Chunks
Level 3: Incorporating Chunking and Non-Chunking Metadata

Metadata Examples
Chunking
-Paragraph position
-Section header
-Larger paragraph
-Sentence Number
-…
Non-Chunking
-Author
-Publisher
-Organization
-Role Based Access Control
-…
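
One way to picture these two kinds of metadata is as fields stored next to each chunk. The sketch below uses a hypothetical `Chunk` record with two chunking fields and two non-chunking fields, and applies a role-based filter before any similarity ranking would run. Field names are illustrative assumptions, not a schema from the talk.

```python
from dataclasses import dataclass, field

@dataclass
class Chunk:
    text: str
    # Chunking metadata: where the chunk sits in its source document.
    section_header: str
    paragraph_position: int
    # Non-chunking metadata: facts about the document as a whole.
    author: str
    allowed_roles: frozenset = field(default_factory=frozenset)

def visible_chunks(chunks, role, author=None):
    """Keep chunks the role may read, optionally restricted to one author."""
    return [
        chunk for chunk in chunks
        if role in chunk.allowed_roles and (author is None or chunk.author == author)
    ]

chunks = [
    Chunk("Install with pip.", "Setup", 1, "alice", frozenset({"engineer", "admin"})),
    Chunk("Salary bands for 2024.", "HR", 3, "bob", frozenset({"admin"})),
]

print([chunk.text for chunk in visible_chunks(chunks, "engineer")])
```

In a vector database this kind of predicate is typically expressed as a metadata filter that runs together with the vector search.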

Takeaway:
Your embedding strategy depends on your accuracy, cost, and use case needs

06 Chunking

Chunking Considerations
Chunk Size
Chunk Overlap
Character Splitters

Chunk Size=50, Overlap=0

Chunk Size=128, Overlap=20

Chunk Size=256, Overlap=50
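
The chunk size and overlap settings on these slides can be illustrated with a fixed-size splitter. The slides do not say whether sizes are counted in characters or tokens; this sketch assumes characters, and `chunk_text` is a hypothetical helper rather than a library function.

```python
def chunk_text(text, chunk_size, overlap):
    """Split text into windows of chunk_size characters that share `overlap` characters."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    chunks = []
    start = 0
    while start < len(text):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reaches the end of the text
        start += step
    return chunks

small = chunk_text("abcdefghij", chunk_size=4, overlap=2)
print(small)

document = "x" * 300
print([len(chunk) for chunk in chunk_text(document, chunk_size=128, overlap=20)])
```

Larger chunks carry more context per hit but dilute the embedding. Overlap reduces the chance that a sentence is cut in half at a boundary.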

SemanticChunker

How Does Your Data Look?
Conversation Data
Documentation Data
Lecture or Q/A Data

Takeaway:
Your chunking strategy depends on what your data looks like and what you need from it.

Demo!

Naive RAG is limited

Naive RAG failure mode
Summarization

Naive RAG failure mode
Implicit data

Naive RAG failure mode
Multi-part questions

RAG is necessary but not sufficient

Good dishes come from good ingredients
•Data collection

•Data cleaning

•Parsing & Chunking

Zilliz Cloud Pipelines
Simplify and streamline the conversion of unstructured data into state-of-the-art vector embeddings, using an intuitive UI and RESTful APIs.

Pipelines: Easy. High-quality. Scalable.
•Easy: Simplify the workflow for developers, from converting unstructured data into searchable vectors to retrieving them from vector databases
•High-quality: Deliver excellence in every phase of vector search pipeline development and deployment, regardless of their expertise
•Scalable: Ensure scalability for managing large datasets and high-throughput queries, maintaining high performance with min. customization or infra changes

Naive RAG Pipeline
⚠ Single-shot
⚠ No query understanding/planning
⚠ No tool use
⚠ No reflection, error correction
⚠ No memory (stateless)

First things first
Measure it before you attempt to improve it!
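
The slide does not name a metric. One common starting point for the retrieval half of a RAG system is recall@k over a small labeled set of queries, sketched here with hypothetical document ids.

```python
def recall_at_k(retrieved, relevant, k):
    """Fraction of the relevant ids that appear in the top k retrieved ids."""
    if not relevant:
        raise ValueError("each query needs at least one relevant id")
    top = set(retrieved[:k])
    return len(top & set(relevant)) / len(relevant)

def mean_recall_at_k(runs, k):
    """Average recall@k over (retrieved, relevant) pairs, one pair per query."""
    return sum(recall_at_k(retrieved, relevant, k) for retrieved, relevant in runs) / len(runs)

# Hypothetical evaluation set: ranked retriever output and labeled relevant ids.
runs = [
    (["d1", "d4", "d2"], {"d1", "d2"}),
    (["d9", "d3"], {"d3"}),
]

print(mean_recall_at_k(runs, k=1))
print(mean_recall_at_k(runs, k=2))
```

Tracking a number like this before and after each change makes it possible to tell whether a new chunking or retrieval strategy actually helped.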

07 Advanced RAG techniques

Types of RAG Enhancement Techniques
●Divide & Conquer
○Query Enhancement: better express or process the query intent.
○Indexing Enhancement: data cleanup, better parser and chunking
○Retriever Enhancement: more retrievers and hybrid search strategy
●Thinking outside the box
○Agents? Tools other than the retriever?
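
As one concrete instance of retriever enhancement, results from several retrievers (for example a dense and a sparse one) can be merged with reciprocal rank fusion. This is a commonly used fusion method chosen here for illustration; the slide does not say which strategy the speaker used, and the constant 60 is the conventional default rather than a tuned value.

```python
from collections import defaultdict

def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked id lists: each list contributes 1 / (k + rank) to an id's score."""
    scores = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    # Highest score first; ties broken by id so the output is deterministic.
    return sorted(scores, key=lambda doc_id: (-scores[doc_id], doc_id))

dense_hits = ["a", "b", "c"]   # hypothetical output of a dense retriever
sparse_hits = ["b", "d", "a"]  # hypothetical output of a sparse retriever

fused = reciprocal_rank_fusion([dense_hits, sparse_hits])
print(fused)
```

Because the method uses only ranks, it needs no score normalization between retrievers whose scores are on different scales.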

a) Query Enhancement

b) Indexing Enhancement

c) Retriever Enhancement

08 Agentic RAG

Agentic RAG
✅ Multi-turn
✅ Query / task planning layer
✅ Tool interface for external environment
✅ Reflection
✅ Memory for personalization
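
The properties listed above can be sketched as a small control loop. Everything here is a hypothetical stand-in: planning is a string split, tool routing is a digit check, reflection only tests for an empty result, and the tools are toy functions. A real agent would delegate each of these decisions to an LLM.

```python
import re

DOCS = {
    "milvus": "Milvus is an open-source vector database.",
    "ollama": "Ollama runs LLMs locally.",
}

def retriever(text):
    """Toy retrieval tool: return the first document whose key appears in the text."""
    lowered = text.lower()
    return next((doc for key, doc in DOCS.items() if key in lowered), "")

def calculator(text):
    """Toy calculator tool: sum every integer found in the text."""
    return str(sum(int(number) for number in re.findall(r"\d+", text)))

class Agent:
    def __init__(self, tools):
        self.tools = tools  # tool interface: name -> callable
        self.memory = []    # conversation memory kept across turns

    def plan(self, question):
        """Planning stub: split a multi-part question on ' and '."""
        parts = (part.strip(" ?") for part in question.split(" and "))
        return [part for part in parts if part]

    def choose_tool(self, step):
        """Routing stub: steps containing digits go to the calculator."""
        return "calculator" if any(ch.isdigit() for ch in step) else "retriever"

    def reflect(self, result):
        """Reflection stub: an empty result counts as a failure."""
        return bool(result)

    def run(self, question):
        answers = []
        for step in self.plan(question):
            result = self.tools[self.choose_tool(step)](step)
            if not self.reflect(result):
                # Error correction: retry retrieval with the full question.
                result = self.tools["retriever"](question)
            answers.append(result or "no answer found")
        self.memory.append((question, answers))
        return answers

agent = Agent({"retriever": retriever, "calculator": calculator})
answers = agent.run("What is Milvus and what is 2 + 3?")
print(answers)
```

The same loop answers a multi-part question, one of the naive RAG failure modes, because each part is planned and routed separately.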

Conversation Memory

Tool Use

09 RAG in action with Milvus Lite

General Ideas

Demo!

milvus.io
github.com/milvus-io/
@milvusio
@stephenbtl


/in/stephen-batifol
Thank you

Milvus Architecture
[Figure: Milvus architecture diagram. Labeled components: SDK; Load Balancer; Access Layer (Proxy); Coordinator Service (Root, Query, Data, Index); Worker Node (Query Node, Data Node, Index Node); Meta Storage (etcd); Message Storage (Log Broker); Object Storage (Minio / S3 / AzureBlob) holding Log Snapshot, Delta File, Index File. Labeled flows: DDL/DCL, DML, NOTIFICATION, CONTROL SIGNAL.]

Driving AI Innovation with leading businesses
