Confluent Current 2024 - Multimodal Embeddings

chloewilliams62 56 views 28 slides Sep 20, 2024

About This Presentation

Frank's talk on multimodal embeddings


Slide Content

Multimodal Embeddings
Frank Liu | Head of AI & ML, Zilliz

Frank Liu
Head of AI & ML | Zilliz

Agenda
01 A Quick Refresher
02 Beyond Text and Image Embeddings
03 Multimodal Embeddings
04 Demo Time!

About Milvus
Milvus is an open-source vector database
for GenAI projects. pip install on your laptop,
plug into popular AI dev tools, and push to
production with a single line of code.
29K GitHub Stars
25M Downloads
250 Contributors
2,600 Forks
Easy Setup: Pip-install to start coding in a notebook within seconds
Integration: Plug into OpenAI, LangChain, LlamaIndex, and many more
Reusable Code: Write once, and deploy to production with one line of code
Feature-rich: Dense & sparse embeddings, filtering, reranking, and beyond

A Quick
Refresher

Vectors Unlock Unstructured Data
Knowledge Base (Documents) -> Embedding Models -> Vectors -> Vector Databases
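
As a rough illustration of what a vector database does with those vectors, here is a minimal brute-force similarity search in plain Python. The document IDs and 3-dimensional embeddings are made up for the example; a real embedding model produces hundreds or thousands of dimensions, and a system such as Milvus uses approximate indexes instead of a full scan.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def search(index, query, top_k=2):
    """Score every stored vector against the query and return the best top_k."""
    scored = [(doc_id, cosine_similarity(vec, query)) for doc_id, vec in index.items()]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]

# Hypothetical embeddings, invented for illustration only.
index = {
    "doc_cats": [0.9, 0.1, 0.0],
    "doc_dogs": [0.8, 0.3, 0.1],
    "doc_finance": [0.0, 0.2, 0.9],
}
query = [1.0, 0.2, 0.0]

results = search(index, query)
for doc_id, score in results:
    print(doc_id, round(score, 3))
```

The two pet-related documents rank above the finance document because their vectors point in nearly the same direction as the query.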

Embedding models are the workhorses of AI apps

Beyond Text and Image
Embeddings

Back in the day…

…feature vectors were handcrafted


SIFT
TFIDF
Harris Corner Detector
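
To make "handcrafted" concrete, here is a small sketch of TF-IDF built from word counts alone, with no learned model. It uses one common unsmoothed variant (term frequency times log(N / document frequency)); libraries often apply smoothing, so exact values will differ. The three sample sentences are invented for the example.

```python
import math
from collections import Counter

def tfidf_vectors(docs):
    """Return (vocab, vectors) where each vector holds tf * idf per vocab term."""
    tokenized = [doc.lower().split() for doc in docs]
    vocab = sorted({tok for toks in tokenized for tok in toks})
    n_docs = len(tokenized)
    # Document frequency: in how many documents each term appears.
    doc_freq = Counter(tok for toks in tokenized for tok in set(toks))
    idf = {tok: math.log(n_docs / doc_freq[tok]) for tok in vocab}
    vectors = []
    for toks in tokenized:
        counts = Counter(toks)
        vectors.append([counts[tok] / len(toks) * idf[tok] for tok in vocab])
    return vocab, vectors

docs = ["the cat sat", "the dog barked", "the cat purred"]
vocab, vectors = tfidf_vectors(docs)
for doc, vec in zip(docs, vectors):
    print(doc, [round(v, 3) for v in vec])
```

A word that appears in every document ("the") gets weight zero, while a word unique to one document gets the largest weight. The rule is written by hand rather than learned from data.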

Circa 2012, convnets became immensely popular




Source: CS230 notes

And many people discovered the power of vectors

RNNs were the OG language model







Source: CS230 notes

Vectors are for more than just text and images


Multimodal
Embeddings

Visual + language embeddings (CLIP-like)













Source: CLIP blog
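
A CLIP-like model places images and text in one shared vector space, so an image can be labeled by comparing it against caption embeddings. The sketch below shows that comparison step only: cosine similarity, scaled by a temperature, then softmax. The embeddings are hypothetical stand-ins (a real model would produce them), and the temperature of 0.07 is an assumed illustrative value.

```python
import math

def normalize(v):
    """Scale a vector to unit length."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]

def softmax(scores):
    """Numerically stable softmax over a list of scores."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def zero_shot_classify(image_vec, text_vecs, temperature=0.07):
    """Return a probability per caption from scaled cosine similarities."""
    image_vec = normalize(image_vec)
    labels = list(text_vecs)
    logits = []
    for label in labels:
        text_vec = normalize(text_vecs[label])
        cosine = sum(a * b for a, b in zip(image_vec, text_vec))
        logits.append(cosine / temperature)
    return dict(zip(labels, softmax(logits)))

# Hypothetical embeddings standing in for image-encoder and text-encoder outputs.
image_embedding = [0.7, 0.6, 0.1]
caption_embeddings = {
    "a photo of a dog": [0.8, 0.5, 0.0],
    "a photo of a cat": [0.1, 0.9, 0.3],
    "a photo of a car": [0.0, 0.1, 0.9],
}

probs = zero_shot_classify(image_embedding, caption_embeddings)
best = max(probs, key=probs.get)
print(best)
```

Because both modalities share a space, the same nearest-neighbor machinery used for text search also works for text-to-image and image-to-text search.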

One embedding space, six modalities (ImageBind)





Source: Girdhar, et al.

LLMs are becoming natively multimodal…







Source: Llama Team, AI @ Meta

… and the best embedding models will be too









Original: Fuyu-8B blog

Demo Time

Vanilla RAG is no longer enough…

Diagram: Your Documents -> Embedding Model -> Milvus; Question -> Search -> Question + Context -> Gen AI Model -> Reliable Answers
Question: What is the default AUTOINDEX distance metric in Milvus Client?
Answer: The default AUTOINDEX distance metric in Milvus Client is L2.
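
The retrieve-then-generate flow in this diagram can be sketched in a few lines. This is a toy under stated assumptions: the stored chunks, their 2-dimensional vectors, and the question vector are all hypothetical, retrieval is a brute-force L2 scan rather than a Milvus query, and the final call to a generative model is left out.

```python
import math

def l2_distance(a, b):
    """Euclidean (L2) distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def retrieve(store, query_vec, top_k=1):
    """Return the text of the top_k chunks closest to the query vector."""
    ranked = sorted(store, key=lambda item: l2_distance(item["vector"], query_vec))
    return [item["text"] for item in ranked[:top_k]]

def build_prompt(question, contexts):
    """Combine the question with retrieved context (the "Question + Context" box)."""
    context_block = "\n".join(f"- {c}" for c in contexts)
    return f"Answer using only this context:\n{context_block}\n\nQuestion: {question}"

# Hypothetical document chunks with made-up embeddings.
store = [
    {"text": "(hypothetical docs chunk about index and metric defaults)", "vector": [0.9, 0.1]},
    {"text": "(hypothetical docs chunk about installation)", "vector": [0.1, 0.9]},
]
question = "What is the default AUTOINDEX distance metric in Milvus Client?"
question_vec = [0.8, 0.2]  # hypothetical embedding of the question

contexts = retrieve(store, question_vec)
prompt = build_prompt(question, contexts)
print(prompt)  # this string would be sent to the generative model
```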

… we need multimodal RAG

Diagram: Multimodal Model -> Milvus; Question -> Search -> Question + Context -> Gen AI Model -> Reliable Answers
Question: What kind of music did they play in the pre-show?
Answer: The musician played improvised electronic music.
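
What changes in the multimodal case is that one index holds vectors from several modalities, so a text question can surface an audio clip or an image as context. A minimal sketch, assuming all records were embedded into one shared space by some multimodal model; the file names and vectors are hypothetical, and inner product is used as the similarity score.

```python
def dot(a, b):
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def retrieve_across_modalities(records, query_vec, top_k=2):
    """Rank every record, whatever its modality, against the query."""
    ranked = sorted(records, key=lambda r: dot(r["vector"], query_vec), reverse=True)
    return ranked[:top_k]

# Hypothetical records; vectors are assumed to share one embedding space.
records = [
    {"modality": "audio", "ref": "preshow_clip.wav", "vector": [0.9, 0.1, 0.1]},
    {"modality": "image", "ref": "stage_photo.jpg", "vector": [0.6, 0.7, 0.1]},
    {"modality": "text", "ref": "agenda.txt", "vector": [0.1, 0.2, 0.9]},
]
query_vec = [0.8, 0.3, 0.1]  # hypothetical embedding of the text question

hits = retrieve_across_modalities(records, query_vec)
# References to hand to a multimodal generative model alongside the question.
context = [(h["modality"], h["ref"]) for h in hits]
print(context)
```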

Connect with me









LinkedIn Twitter Demo Notebook