A recent and exciting development in the world of Generative AI has been the use of language to understand images, video, and sound. One example is multi-modal retrieval, which is the process of using one modality, like text, to search another modality, like images. It is not only useful for search engines across media types, but also for grounding LLMs in factual data and reducing hallucinations. In this talk, I explain how to build a simple but performant multi-modal retrieval pipeline using completely open-source tools and models: the vector database Milvus and HuggingFace libraries for modeling and data. I discuss techniques to use multimodal retrieval most effectively and increase recall, as well as some interesting and diverse industry applications.
Getting Started with Vector Databases
zilliz.com/cloud
Searching the Web with Gen AI
[Slide: example web searches such as "Apple", "Rising dough", or "Change car tire". For the query "Rising dough", the semantically related result "Proofing" is marked ✔ while the keyword-level match "Bread" is marked ❌]
Why is Semantic Search Difficult?
Why is Semantic Search Important?
90% of newly generated data in 2025 will be unstructured; the other 10% is structured.
Data source: The Digitization of the World by IDC
Solution: Deep Learning
Similarity Search
Vector Embedding
Vector Space
Embedding Models
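To make these terms concrete, here is a minimal sketch of embedding-based similarity search using the open-source sentence-transformers library; the model name and example texts are illustrative choices, not ones taken from the talk.

```python
# Minimal sketch: an embedding model maps text into a vector space where
# semantic similarity becomes geometric closeness.
from sentence_transformers import SentenceTransformer, util

model = SentenceTransformer("all-MiniLM-L6-v2")  # illustrative public model

docs = [
    "How to proof bread dough",
    "Apple quarterly earnings report",
    "Replacing a flat car tire",
]
query = "rising dough"

doc_vecs = model.encode(docs)    # one embedding vector per document
query_vec = model.encode(query)  # embedding for the search query

# Cosine similarity ranks the documents; the proofing article should win
# even though it shares no keywords with the query.
scores = util.cos_sim(query_vec, doc_vecs)[0]
for doc, score in sorted(zip(docs, scores.tolist()), key=lambda p: -p[1]):
    print(f"{score:.3f}  {doc}")
```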
New Challenge: Search in Vector Spaces
How to Index and Search?
● High-dimensional: > 1000 dims
How to Scale?
● 10-100 million vectors? Billions? Trillions?
● Billions of users?
Multiple Data Types?
● Text
● Images
● Audio
● Graphs
● …
Milvus is an open-source vector database to store, index, manage, and use the massive number of embedding vectors generated by deep neural networks and LLMs.
400+ contributors | 30K+ stars | 66M+ docker pulls | 2.7K+ forks
Milvus: High-performance, scalable vector database
Common AI Use Cases
● Retrieval Augmented Generation (RAG): expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications
● Recommender System (RecSys): match user behavior or content features with other similar ones to make effective recommendations
● Text/Semantic Similarity Search: search for semantically similar texts across vast amounts of natural language documents
● Molecular Similarity Search: search for similar substructures, superstructures, and other structures for a specific molecule
● Fraud & Anomaly Detection: detect data points, events, and observations that deviate significantly from the usual pattern
● Multimodal Similarity Search: search over multiple types of data simultaneously, e.g. text, audio, images, video
Deployment Options
Milvus Lite
● Locally hosted
● Suitable for prototyping and demos (see the sketch below)
● 10s of millions of vectors
Milvus Standalone
● Single remote/local server
● "Medium" scale
● Simplified setup, maintenance, etc. compared to cluster
● 100s of millions of vectors
Milvus Cluster
● Distributed system
● Many different types of nodes
● 100s of billions of vectors
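As a starting point, here is a minimal sketch of the prototyping-scale option, Milvus Lite, via the pymilvus client; the collection name, dimension, and random vectors are placeholders. Passing a server URI to MilvusClient instead of a local file path targets a Standalone or Cluster deployment.

```python
# Minimal Milvus Lite sketch (pip install pymilvus): a local file path gives
# the embedded, prototyping-scale deployment with no server to run.
import random
from pymilvus import MilvusClient

client = MilvusClient("milvus_demo.db")  # local file = Milvus Lite
client.create_collection(collection_name="demo", dimension=768)

# Insert toy vectors; in practice these come from an embedding model.
rows = [{"id": i, "vector": [random.random() for _ in range(768)]}
        for i in range(1000)]
client.insert(collection_name="demo", data=rows)

# Approximate nearest-neighbor search for the 5 closest vectors.
query = [random.random() for _ in range(768)]
print(client.search(collection_name="demo", data=[query], limit=5))
```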
Why Open-Source?
Cost-effective · Innovation · Community
Why Not Traditional Databases?
Suboptimal Indexing / Search · Scaling · Inadequate Query & Analytics Support
Benchmarks
● 3-20x faster compared with open-source Milvus
● At least 6x faster than other vector databases
https://github.com/zilliztech/VectorDBBench
Multi-Modal Embeddings
[Figure: a text encoder and an image encoder map captions ("the lion sleeps", "a lion roars", "a dog is walked") and their paired images into a shared vector space; ✕ marks the matching (image, text) cells in the similarity matrix]
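The figure above describes exactly the setup of CLIP-style models. Here is a sketch using an off-the-shelf CLIP checkpoint from the HuggingFace transformers library; the checkpoint name and local image path are illustrative.

```python
# Sketch: score the three captions against one image with CLIP.
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

texts = ["the lion sleeps", "a lion roars", "a dog is walked"]
image = Image.open("lion.jpg")  # placeholder local image

# Both modalities are embedded into the same space; logits_per_image holds
# one row of the image-text similarity matrix from the slide.
inputs = processor(text=texts, images=image, return_tensors="pt", padding=True)
outputs = model(**inputs)
print(outputs.logits_per_image.softmax(dim=-1))
```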
Why Multi-Modal?
[Diagram: multi-modal retrieval sits at the intersection of Retrieval / Similarity Search (backed by a vector database) and Foundation Models (a large vision-language model), combined in RAG. Example prompts: "photos and recordings of Iberian lynx"; "what is the user clicking on?" answered with "from the screenshot provided, I see the user is…"; "produce a graph of revenue from 2021-2023"]
How Does it Work?
• Dataset of e.g., (image, text) pairs
• Typically mined from the web, e.g., (<img src>, <img alt>) pairs
• Pre-processing is required for good performance
• Train the encoders so that embeddings for matched (image, text) pairs are close
• Encoders are either initialized from pre-trained models or trained from scratch
• Simultaneously, push apart the embeddings for mismatched (image, random text) pairs
• This is called contrastive learning; a minimal sketch follows
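Here is the sketch referenced above: a generic, CLIP-style symmetric contrastive (InfoNCE) objective in PyTorch. It shows the standard formulation, not any particular model's exact training code; the temperature value is a common default, not a prescribed one.

```python
import torch
import torch.nn.functional as F

def contrastive_loss(img_emb: torch.Tensor, txt_emb: torch.Tensor,
                     temperature: float = 0.07) -> torch.Tensor:
    """img_emb, txt_emb: (batch, dim) embeddings of paired (image, text)."""
    img_emb = F.normalize(img_emb, dim=-1)
    txt_emb = F.normalize(txt_emb, dim=-1)

    # Pairwise similarity matrix: the diagonal holds true (image, text) pairs,
    # off-diagonal cells are the (image, random text) negatives.
    logits = img_emb @ txt_emb.t() / temperature
    labels = torch.arange(logits.size(0), device=logits.device)

    # Cross-entropy pulls matched pairs together and pushes mismatches apart,
    # in both the image-to-text and text-to-image directions.
    return (F.cross_entropy(logits, labels) +
            F.cross_entropy(logits.t(), labels)) / 2
```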
MagicLens: Task
[Figure 1; DeepMind, 2024]
Input:
• a query image, and
• an instruction that specifies a semantic relation
Output:
• matching image(s) from our database (a retrieval sketch follows)
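Here is the retrieval sketch referenced above: one way to serve this task from a vector database. Since MagicLens itself is not invoked here, the composed query embedding uses a simple CLIP feature-addition baseline (summing the image and text embeddings), a common stand-in for composed image retrieval; the collection name, image path, and instruction are placeholders.

```python
import torch
import torch.nn.functional as F
from PIL import Image
from transformers import CLIPModel, CLIPProcessor
from pymilvus import MilvusClient

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def compose_embed(image_path: str, instruction: str) -> list[float]:
    """Baseline composed embedding: normalized sum of CLIP image and text
    features. A stand-in for a trained composed-retrieval model."""
    inputs = processor(text=[instruction], images=Image.open(image_path),
                       return_tensors="pt", padding=True)
    with torch.no_grad():
        img = model.get_image_features(pixel_values=inputs["pixel_values"])
        txt = model.get_text_features(input_ids=inputs["input_ids"],
                                      attention_mask=inputs["attention_mask"])
    return F.normalize(img + txt, dim=-1)[0].tolist()

# Assumes an "images" collection of 512-dim CLIP image embeddings.
client = MilvusClient("milvus_demo.db")
query = compose_embed("golden-gate.jpg", "same bridge at night")  # placeholders
print(client.search(collection_name="images", data=[query], limit=5))
```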
MagicLens: Data
[Figure 2; DeepMind, 2024]
Steps:
• Find image pairs on the same webpages
• With PaLI, use the images + alt text to generate a description of each image and of the relationship between them
• Filter the pairs for sensitive content and for similarity
• With PaLM 2, form an instruction that relates the query to the target
MagicLens: Model
[Figure 4; DeepMind, 2024]
1⃣ initialized to pre-trained models
2⃣ additional layers
3⃣ query image + instruction text
4⃣ target image + empty text
MagicLens: Training
Minimize the contrastive loss, which for one element $i$ of a minibatch $B$ is:

$$\ell_i = -\log \frac{\exp\big(\mathrm{sim}(q_i,\, t_i)/\tau\big)}{\sum_{j \in B} \exp\big(\mathrm{sim}(q_i,\, t_j)/\tau\big) + \exp\big(\mathrm{sim}(q_i,\, \bar{q}_i)/\tau\big)}$$

where $q_i$ embeds (query image, instruction), $t_j$ embeds (target image, ""), $\bar{q}_i$ embeds (query image, ""), and $\tau$ is a temperature.
1⃣ numerator: similarity between (query, instruction) and (target, "")
2⃣ the loss is summed over the minibatch
3⃣ denominator: similarities between (query, instruction) and each (non-target, "")
4⃣ plus the similarity between (query, instruction) and (query, ""), so the model cannot succeed by simply echoing the query image
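The same loss as a PyTorch sketch; the tensor shapes, the L2-normalization assumption, and the temperature value are my additions, not from the slide.

```python
import torch
import torch.nn.functional as F

def magiclens_style_loss(q: torch.Tensor, t: torch.Tensor, q0: torch.Tensor,
                         temperature: float = 0.07) -> torch.Tensor:
    """q:  (batch, dim) embeddings of (query image, instruction)
    t:  (batch, dim) embeddings of (target image, "")
    q0: (batch, dim) embeddings of (query image, "")
    All assumed L2-normalized."""
    logits_t = q @ t.t() / temperature                       # 1⃣/3⃣: targets and non-targets
    logits_q = (q * q0).sum(-1, keepdim=True) / temperature  # 4⃣: the query itself as a negative
    logits = torch.cat([logits_t, logits_q], dim=1)

    # The correct class for row i is its own target t_i; the extra column
    # penalizes simply echoing the query image.
    labels = torch.arange(q.size(0), device=q.device)
    return F.cross_entropy(logits, labels)                   # 2⃣: averaged over the minibatch
```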