Combining Lexical and Semantic Search with Milvus 2.5

chloewilliams62 158 views 38 slides Feb 27, 2025

About This Presentation

In short, lexical search is a way to search your documents based on the keywords they contain, in contrast to semantic search, which compares the similarity of embeddings. We’ll be covering:

Why, when, and how you should use lexical search

What the BM25 metric is

How exactly d...


Slide Content

Slide 1 | © Copyright 2005 Zilliz
Stefan Webb
Developer Advocate, Zilliz
https://www.linkedin.com/in/stefan-webb
https://x.com/stefan_webb
Unstructured Data Meetup | Host

Slide 2
Meanwhile, in (Open-Source) GenAI…
What VLM did DeepMind release in December?
What are some of its capabilities?

Slide 3
google/paligemma2-3b-mix-448

Slide 4
google/paligemma2-3b-mix-448

Slide 5
google/paligemma2-3b-mix-448

Slide 6
Meanwhile, in (Open-Source) GenAI…
What multi-modal foundation model did Microsoft Research release this week? What is it, and how is it novel?

Slide 7
World and Human Action Model (microsoft/wham)

Slide 8
World and Human Action Model (microsoft/wham)

Slide 9
01
Semantic Search?
Lexical Search?

Slide 10
Why?
A unified solution that supports lexical and semantic search while reducing system complexity and cost
“Elasticsearch is Dead: Long Live Lexical Search”

Slide 11
Semantic Search
“You shall know a word by the company it keeps!”
J.R. Firth, 1957

Slide 12
Semantic Search
Similarity Search
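The similarity-search visual did not survive extraction. Similarity search ranks documents by how close their embeddings are to the query embedding. The sketch below assumes cosine similarity (one common choice; the slide does not say which metric it uses) and hand-made toy vectors standing in for real model embeddings:

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def rank_by_similarity(query: list[float], docs: dict[str, list[float]]) -> list[tuple[str, float]]:
    """Return (doc_id, score) pairs sorted from most to least similar."""
    scores = [(doc_id, cosine_similarity(query, vec)) for doc_id, vec in docs.items()]
    return sorted(scores, key=lambda pair: pair[1], reverse=True)


# Toy 3-dimensional "embeddings" (hypothetical values, not produced by a real model).
docs = {
    "rising dough": [0.9, 0.1, 0.0],
    "proofing bread": [0.7, 0.3, 0.2],
    "tax returns": [0.0, 0.1, 0.9],
}
query = [0.85, 0.15, 0.05]
print(rank_by_similarity(query, docs))
```

Real embeddings have hundreds or thousands of dimensions, and a vector database replaces this brute-force loop with an approximate nearest-neighbor index.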

Slide 13
Lexical Search
“You shall know a word by its relative document frequencies!”
Stefan, today

Slide 14
Lexical Search
Relative frequency of term in document
Relative frequency of term across documents
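The formula on this slide was an image and did not survive extraction. A standard TF-IDF formulation matching the two labels is given below as an assumption, not the slide's exact notation. Here f_{t,d} is the raw count of term t in document d, N = |D| is the number of documents in the collection D, and the idf denominator counts the documents that contain t:

```latex
\mathrm{tf}(t, d) = \frac{f_{t,d}}{\sum_{t' \in d} f_{t',d}}, \qquad
\mathrm{idf}(t, D) = \log \frac{N}{\lvert \{ d' \in D : t \in d' \} \rvert}
```

A term that appears in every document gets idf = log 1 = 0, so it contributes nothing when matching.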

Slide 15
Lexical Search
Similarity between document d and query q
Document in question
Collection of documents
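Combining the two quantities, one common definition (assumed here, since the slide's own formula was an image) scores document d against query q by summing tf(t, d) · idf(t, D) over the query terms t. A minimal pure-Python sketch, assuming lowercase whitespace tokenization and the natural log:

```python
import math
from collections import Counter


def tokenize(text: str) -> list[str]:
    # Naive tokenizer (an assumption): lowercase and split on whitespace.
    # Real analyzers also strip punctuation, stem, drop stop words, etc.
    return text.lower().split()


def tf(term: str, doc_tokens: list[str]) -> float:
    # Relative frequency of the term within one document.
    return Counter(doc_tokens)[term] / len(doc_tokens)


def idf(term: str, corpus_tokens: list[list[str]]) -> float:
    # Log of the inverse fraction of documents that contain the term.
    n_containing = sum(1 for doc in corpus_tokens if term in doc)
    if n_containing == 0:
        return 0.0  # a term absent from the collection contributes nothing
    return math.log(len(corpus_tokens) / n_containing)


def tfidf_score(query: str, doc: str, corpus: list[str]) -> float:
    # Sum of tf * idf over the query terms.
    corpus_tokens = [tokenize(d) for d in corpus]
    doc_tokens = tokenize(doc)
    return sum(tf(t, doc_tokens) * idf(t, corpus_tokens) for t in tokenize(query))


corpus = [
    "the dough is rising",
    "proofing bread takes time",
    "the oven is hot",
]
for doc in corpus:
    print(f"{tfidf_score('rising dough', doc, corpus):.3f}  {doc}")
```

Only the first document shares terms with the query, so it is the only one with a non-zero score; note that "proofing bread" scores zero despite its related meaning.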

Slide 16
Pros / Cons

or
Rising dough
Rising Dough
Proofing Bread
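The visuals accompanying this example were lost. Reading it as a contrast between exact-term matching and meaning (our interpretation), a quick check shows why lexical search treats "Rising Dough" as a match for "Rising dough" once text is lowercased, yet finds nothing in common with "Proofing Bread", a closely related phrase that an embedding model would likely place nearby. The sketch uses Jaccard overlap of lowercase token sets as a simple stand-in for a lexical matcher:

```python
def token_set(text: str) -> set[str]:
    # Lowercase and split on whitespace so that case differences do not matter.
    return set(text.lower().split())


def jaccard(a: str, b: str) -> float:
    # Shared tokens divided by all distinct tokens across both strings.
    sa, sb = token_set(a), token_set(b)
    return len(sa & sb) / len(sa | sb)


query = "Rising dough"
for candidate in ["Rising Dough", "Proofing Bread"]:
    print(candidate, jaccard(query, candidate))
# Rising Dough 1.0
# Proofing Bread 0.0
```

This is the core trade-off: lexical matching is exact and predictable, but it cannot bridge synonyms or paraphrases without help from semantic search.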

Slide 17
Results
Code search on an Anthropic dataset with a Voyage AI embedding model
“Semantic Search vs. Full-Text: Which One Should I Choose with Milvus 2.5?”

Slide 18
02
Lexical and Hybrid Search with Milvus 2.5

Slide 19
About Milvus
Milvus is an open-source vector database
33K
66M
400
2.7K
Easy Setup
Integration
Reusable Code
Feature-rich

Slide 20
Lexical Search

Slide 21
Lexical Search

Slide 26
Hybrid Search
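The hybrid-search material that followed this heading did not survive extraction. Hybrid search runs a lexical (BM25) query and a semantic (dense-vector) query, then fuses the two ranked lists. One widely used fusion rule is reciprocal rank fusion (RRF); pymilvus offers it as `RRFRanker`, alongside `WeightedRanker`. The standalone sketch below reimplements only the RRF rule itself, with k = 60 (the commonly used value) and hypothetical document IDs; it is not the slides' code:

```python
from collections import defaultdict


def reciprocal_rank_fusion(rankings: list[list[str]], k: int = 60) -> list[tuple[str, float]]:
    """Fuse several ranked lists of document IDs.

    Each document receives sum(1 / (k + rank)) over every list it appears in,
    where rank starts at 1. Higher fused scores rank first.
    """
    scores: dict[str, float] = defaultdict(float)
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] += 1.0 / (k + rank)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Hypothetical top-3 results from each retriever.
lexical_hits = ["doc_a", "doc_b", "doc_c"]   # e.g. from a BM25 search
semantic_hits = ["doc_b", "doc_d", "doc_a"]  # e.g. from a dense-vector search

for doc_id, score in reciprocal_rank_fusion([lexical_hits, semantic_hits]):
    print(f"{doc_id}: {score:.5f}")
```

RRF uses only ranks, not raw scores, so it sidesteps the fact that BM25 scores and cosine similarities live on different scales; documents found by both retrievers (here `doc_b` and `doc_a`) rise to the top.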

Slide 34
03
How does it work?

Slide 35
What is the BM25 metric?
Prevent bias towards longer documents, which may contain more instances of a term simply due to their length
Free parameter
Free parameter
Prevent overly high scores for documents with very high term frequencies
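The formula these annotations describe was an image. The sketch below implements the standard Okapi BM25 score, summing IDF(t) · f(t,d)(k1 + 1) / (f(t,d) + k1(1 − b + b·|d|/avgdl)) over query terms t. Presumably the two free parameters are k1 and b: b weights the length normalization |d|/avgdl that counters the bias toward long documents, and k1 controls term-frequency saturation, so very frequent terms cannot inflate the score without bound. The defaults k1 = 1.2 and b = 0.75 are typical values, and the IDF is the Lucene-style variant ln(1 + (N − n + 0.5)/(n + 0.5)); both are assumptions here, and Milvus's internals may differ:

```python
import math
from collections import Counter


def bm25_scores(query: str, corpus: list[str], k1: float = 1.2, b: float = 0.75) -> list[float]:
    """Okapi BM25 score of every document in `corpus` for `query`."""
    docs = [d.lower().split() for d in corpus]  # naive whitespace tokenizer
    n_docs = len(docs)
    avgdl = sum(len(d) for d in docs) / n_docs  # average document length

    # Document frequency: how many documents contain each term.
    df = Counter(term for d in docs for term in set(d))

    def idf(term: str) -> float:
        # Lucene-style IDF; the "1 +" keeps it non-negative for common terms.
        n = df[term]
        return math.log(1 + (n_docs - n + 0.5) / (n + 0.5))

    scores = []
    for d in docs:
        tf = Counter(d)
        length_norm = 1 - b + b * len(d) / avgdl  # b = 0 disables length normalization
        score = 0.0
        for term in query.lower().split():
            f = tf[term]
            # k1 saturates the contribution of very frequent terms.
            score += idf(term) * f * (k1 + 1) / (f + k1 * length_norm)
        scores.append(score)
    return scores


corpus = [
    "milvus supports full text search",
    "milvus is a vector database",
    "bread dough rises overnight",
]
print(bm25_scores("milvus search", corpus))
```

As f(t,d) grows, each term's contribution approaches IDF(t) · (k1 + 1) instead of growing linearly, which is the saturation behavior the last annotation describes.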

Slide 36
Book a free 1:1 session to get help with your production deployment
meetings.hubspot.com/chloe-williams1/milvus-office-hours

Slide 37
Unstructured Data Podcast
Latest Episodes
• Inside the AI Revolution
• Prompt, Score, Repeat: Principled RAG and Agent Design

Slide 38
Workshop with Milvus and OpenAI

Join us for a hands-on session with OpenAI to learn about Agents!

March 20, 2025
⏰ 5:30-8:30 PM
Palo Alto