Multimodal Retrieval Augmented Generation (RAG) with Milvus

chloewilliams62 536 views 26 slides Jun 27, 2024

About This Presentation

We've seen an influx of powerful multimodal capabilities in many LLMs. In this talk, we'll vectorize a dataset of images and texts into the same embedding space, store them in Milvus, retrieve all relevant data using multilingual texts and/or images and input multimodal data as context into ...


Slide Content

Slide 1 | © Copyright 2024 Zilliz
Multimodal RAG with Milvus
Yi Wang @ Zilliz

Slide 2
CONTENTS
01  RAG is the New Search
02  Multimodal Retrieval with Milvus

Slide 3
RAG is the New Search

Slide 4
Retrieval-Augmented Generation

Slide 5
A Typical Search System
Picture Credit: https://web.eecs.umich.edu/~nham/EECS398F19/

Slide 6
Recap of RAG Architecture
[diagram: Indexing → Query → Retrieval → Prompt & Generation]

Slide 7
Recap of RAG Architecture: Offline Indexing
[same diagram, highlighting the offline Indexing path]

Slide 8
Recap of RAG Architecture: Online Serving
[same diagram, highlighting the online Query → Retrieval → Prompt & Generation path]
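The online-serving path above can be sketched as plain function composition. This is a minimal illustrative sketch, not the deck's implementation: retrieve(), build_prompt(), and generate() are hypothetical stand-ins for a vector-store query, prompt templating, and an LLM call.

```python
# Toy RAG pipeline: retrieve -> build prompt -> generate.
def retrieve(query, index, top_k=2):
    # Toy retrieval: rank stored chunks by word overlap with the query.
    # A real system would use vector similarity search (e.g. Milvus).
    scored = sorted(index, key=lambda chunk: -len(set(query.split()) & set(chunk.split())))
    return scored[:top_k]

def build_prompt(query, contexts):
    # Stuff the retrieved chunks into the prompt as grounding context.
    context_block = "\n".join(contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {query}"

def generate(prompt):
    # Placeholder for an LLM call (e.g. an API or local-model request).
    return f"[LLM answer grounded in the prompt: {prompt[:40]}...]"

index = ["milvus stores vectors", "rag retrieves context", "leaves turn brown in autumn"]
query = "why do leaves turn brown"
answer = generate(build_prompt(query, retrieve(query, index)))
```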

Slide 9
How RAG Resembles Search

Slide 10
Multimodal Retrieval with Milvus

Slide 11
Multi-modal Retrieval
● Combining text and image in the search query
● Retrieving multi-modal content for generation
Query = "feuilles brunes pendant la journée" (i.e. "brown leaves during daytime")
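The key idea behind the multilingual query above is a shared embedding space: a multimodal encoder (e.g. a CLIP-style model) maps both texts and images to vectors that can be compared directly. The sketch below is purely illustrative; embed_text and the hand-picked vectors are hypothetical stand-ins for a real encoder.

```python
import math

def cosine(a, b):
    # Cosine similarity between two vectors in the shared space.
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(x * x for x in b)))

# Pretend 3-d space: axis 0 ~ "leaves", axis 1 ~ "brown", axis 2 ~ "daytime".
def embed_text(text):
    vocab = {"leaves": [1, 0, 0], "brown": [0, 1, 0], "daytime": [0, 0, 1]}
    vecs = [vocab[w] for w in text.split() if w in vocab]
    return [sum(col) for col in zip(*vecs)]

# Hypothetical encoder output for a photo of brown leaves in daylight.
image_vector = [0.9, 0.8, 0.7]
query_vector = embed_text("brown leaves daytime")

similarity = cosine(query_vector, image_vector)
```

A real multilingual encoder would place "feuilles brunes" and "brown leaves" near each other too, which is what makes the French text query retrieve English-described images.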

Slide 12

Slide 13
Easy to start with, can even run on edge devices!

Slide 14
Scale-up on Docker

Slide 15
Up to 100 billion vectors with K8s!

Slide 16

Slide 17
Data Preparation
Download the images.zip file directly from:
https://huggingface.co/datasets/unum-cloud/ann-unsplash-25k/tree/main

import glob, time, pprint
import numpy as np
from PIL import Image
import pandas as pd

# Load image file names and their AI-generated descriptions.
image_data = pd.read_csv('images.csv')
print(image_data.shape)
display(image_data.head(2))  # display() assumes a Jupyter notebook

# Lists of image ids and their text descriptions.
image_urls = list(image_data.photo_id)
image_texts = list(image_data.ai_description)
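The insertion snippet on a later slide iterates over batch_texts and batch_urls, which the deck never defines. One plausible shape, sketched here under that assumption, is a simple fixed-size batching helper over the lists loaded above (the sample values are stand-ins for the real photo_id / ai_description columns):

```python
def batched(items, batch_size):
    # Yield successive fixed-size slices of a list.
    for start in range(0, len(items), batch_size):
        yield items[start:start + batch_size]

# Stand-ins for image_data.photo_id and image_data.ai_description.
image_urls = ["photo-1", "photo-2", "photo-3"]
image_texts = ["brown leaves", "sunset over rocks", "calm sea"]

# Pair up url/text batches so each batch can be embedded and inserted together.
batches = list(zip(batched(image_urls, 2), batched(image_texts, 2)))
```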

Slide 18
Create a Milvus Collection

from pymilvus import (
    connections, FieldSchema, CollectionSchema, DataType, Collection,
)

# STEP 1. Connect to Milvus.
connection = connections.connect(
    alias="default",
    host='localhost',
    port='19530'
)

# STEP 2. Create a new collection and build indexes.
EMBEDDING_DIM = 256
MAX_LENGTH = 65535

# Step 2.1 Define the data schema for the new collection.
fields = [
    # Use an auto-generated id as the primary key.
    FieldSchema(name="id", dtype=DataType.INT64, is_primary=True, auto_id=True),
    FieldSchema(name="text_vector", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
    FieldSchema(name="image_vector", dtype=DataType.FLOAT_VECTOR, dim=EMBEDDING_DIM),
    FieldSchema(name="chunk", dtype=DataType.VARCHAR, max_length=MAX_LENGTH),
    FieldSchema(name="image_filepath", dtype=DataType.VARCHAR, max_length=MAX_LENGTH),
]
schema = CollectionSchema(fields, "")

# Step 2.2 Create the collection.
col = Collection("Demo_multimodal", schema)

# Step 2.3 Build an index on both vector columns.
image_index = {"metric_type": "COSINE"}
col.create_index("image_vector", image_index)
text_index = {"metric_type": "COSINE"}
col.create_index("text_vector", text_index)
col.load()

Slide 19
Data Vectorization & Insertion

# STEP 3. Data vectorization (i.e. embedding).
image_embeddings, text_embeddings = embedding_model(
    batch_images=batch_images,
    batch_texts=batch_texts)

# STEP 4. Insert data into Milvus or Zilliz.
# Prepare the data batch.
chunk_dict_list = []
for chunk, img_url, img_embed, text_embed in zip(
        batch_texts,
        batch_urls,
        image_embeddings, text_embeddings):
    # Assemble embedding vectors, original text chunk, and metadata.
    chunk_dict = {
        'chunk': chunk,
        'image_filepath': img_url,
        'text_vector': text_embed,
        'image_vector': img_embed,
    }
    chunk_dict_list.append(chunk_dict)

# Insert the data batch.
# If the data size is large, consider bulk_insert().
col.insert(data=chunk_dict_list)
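The deck treats embedding_model as a black box. Its implied contract is: given a batch of images and a batch of texts, return one image vector and one text vector per row, each of EMBEDDING_DIM (= 256) dimensions so they fit the two FLOAT_VECTOR fields in the collection schema. A stub sketch of that contract, with random vectors standing in for a real multimodal encoder:

```python
import random

EMBEDDING_DIM = 256  # must match the dim of the FLOAT_VECTOR fields

def embedding_model(batch_images, batch_texts):
    # Stub: random vectors stand in for a real multimodal encoder's output.
    rng = random.Random(0)
    image_embeddings = [[rng.random() for _ in range(EMBEDDING_DIM)]
                        for _ in batch_images]
    text_embeddings = [[rng.random() for _ in range(EMBEDDING_DIM)]
                       for _ in batch_texts]
    return image_embeddings, text_embeddings

img_vecs, txt_vecs = embedding_model(["img-a", "img-b"], ["text-a", "text-b"])
```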

Slide 20
Final Step: Search

# STEP 5. hybrid_search() is the API for multimodal search.
results = col.hybrid_search(
    reqs=[image_req, text_req],  # one AnnSearchRequest per vector field
    rerank=RRFRanker(),
    limit=top_k,
    output_fields=output_fields)
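RRFRanker fuses the per-field result lists with Reciprocal Rank Fusion: each document scores the sum of 1/(k + rank) over the lists it appears in (k = 60 is a common default). A pure-Python sketch of that fusion, with hypothetical result ids:

```python
def rrf_fuse(result_lists, k=60):
    # Reciprocal Rank Fusion over several ranked id lists.
    scores = {}
    for results in result_lists:
        for rank, doc_id in enumerate(results, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    # Highest fused score first.
    return sorted(scores, key=scores.get, reverse=True)

image_hits = ["a", "b", "c"]  # ranked ids from the image_vector search
text_hits = ["b", "d", "a"]   # ranked ids from the text_vector search

fused = rrf_fuse([image_hits, text_hits])
```

"b" wins because it ranks high in both lists, even though neither list ranks it first; that is the behavior RRF is chosen for.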

Slide 21
[Multimodal] search with text-only query
Query = "feuilles brunes pendant la journée" (i.e. "brown leaves during daytime")

Slide 22
[Multimodal] search with image-only query
[Query is an image]

Slide 23
[Multimodal] search with text + image query

Query = text + image
1. "silhouette d'une personne assise sur une roche au coucher du soleil"
   (i.e. "silhouette of a person sitting on a rock formation during golden hour")
2. Image below

Result

Slide 24
Q&A

Slide 25

Slide 26
curl --request POST \
  --url "${MILVUS_HOST}:${MILVUS_PORT}/v2/vectordb/entities/advanced_search" \
  --header "Authorization: Bearer ${TOKEN}" \
  --header "accept: application/json" \
  --header "content-type: application/json" \
  -d '{
    "collectionName": "book",
    "search": [
      {
        "field": "book_intro_vector",
        "data": [1, 2, ...]
      },
      {
        "field": "book_cover_vector",
        "data": [2, 3, ...]
      }
    ],
    "rerank": {
      "strategy": "rrf"
    },
    "limit": 10
  }'

Retrieve params: the per-field "search" entries. Re-rank params: the "rerank" block.
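The same request body can be assembled in Python before sending it. The field names follow the slide (the exact REST schema is as shown there, not re-verified against current Milvus docs), and the short vectors are hypothetical placeholders for real query embeddings; note that JSON forbids duplicate keys, so the two per-field requests belong in a list:

```python
import json

# Build the advanced_search request body shown in the curl example.
payload = {
    "collectionName": "book",
    "search": [
        {"field": "book_intro_vector", "data": [0.1, 0.2]},  # placeholder vector
        {"field": "book_cover_vector", "data": [0.2, 0.3]},  # placeholder vector
    ],
    "rerank": {"strategy": "rrf"},
    "limit": 10,
}

body = json.dumps(payload)
# A real call would POST `body` to
# f"{MILVUS_HOST}:{MILVUS_PORT}/v2/vectordb/entities/advanced_search"
# with the Authorization / content-type headers from the curl example.
```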