Beginner's Guide to Vector Databases by Tom Yeh

saurabstha07 82 views 47 slides Jun 22, 2024

About This Presentation

Vector database


Slide Content

Beginner’s Guide to Vector Databases
AI by Hand ✍
Prof. Tom Yeh

AI by Hand ✍2024 © Tom Yeh
Roadmap
Database
+ Vector
Retrieval
Dot Product
Word Embedding
Sentence Embedding
Search
Transformer
Q/A

Database

Fun fact
There are ___________________ million dogs in the world!

How to create a table?
SQL:
__________ TABLE _________
( id __________,
  name __________________,
  size _________,
  pop _________)
id | name | size | pop
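A minimal runnable sketch of the table-creation step, assuming SQLite (through Python's built-in `sqlite3` module) as a stand-in database. The column types are the ones that appear filled in on the later "How to create a vector database?" slide.

```python
import sqlite3

# In-memory SQLite database, used here only as a convenient stand-in.
conn = sqlite3.connect(":memory:")
conn.execute(
    """
    CREATE TABLE animals
    ( id INT,
      name VARCHAR(10),
      size INT,
      pop INT )
    """
)

# List the column names of the new table.
columns = [row[1] for row in conn.execute("PRAGMA table_info(animals)")]
print(columns)
```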

How to insert a record?
SQL:
_____________ INTO animals
_____________ (1, 'dog', 2, 900)
id | name | size | pop
1 | dog | 2 | 900
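The insert step can be tried the same way. This is a sketch under the assumption that SQLite stands in for the database. It uses `?` placeholders so the driver handles quoting of the text value.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE animals (id INT, name VARCHAR(10), size INT, pop INT)")

# Placeholders (?) let the driver quote the text value 'dog' correctly.
conn.execute("INSERT INTO animals VALUES (?, ?, ?, ?)", (1, "dog", 2, 900))

rows = conn.execute("SELECT id, name, size, pop FROM animals").fetchall()
print(rows)
```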

Vector Database

How to create a vector database?
SQL:
CREATE TABLE animals
( id INT,
  name VARCHAR(10),
  size INT,
  pop INT,
  emb _________________ not null )
id | name | size | pop

How to insert a record with a vector?
SQL:
INSERT INTO animals
VALUES (1, 'dog', 2, 900, ______________)
id | name | size | pop | emb
1 | dog | 2 | 900 |
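A rough sketch of storing a vector next to ordinary columns. SQLite has no vector column type, so this example stores `emb` as JSON text. That storage choice is an assumption made only for illustration and is not the vector type the slide asks for. The dog embedding `[2, 1, 0]` is the one used on the Retrieval slides.

```python
import json
import sqlite3

conn = sqlite3.connect(":memory:")
# Assumption: emb is kept as JSON text because SQLite lacks a vector type.
conn.execute(
    "CREATE TABLE animals "
    "(id INT, name VARCHAR(10), size INT, pop INT, emb TEXT NOT NULL)"
)
conn.execute(
    "INSERT INTO animals VALUES (?, ?, ?, ?, ?)",
    (1, "dog", 2, 900, json.dumps([2, 1, 0])),
)

# Read the record back and decode the vector.
name, emb_text = conn.execute("SELECT name, emb FROM animals").fetchone()
emb = json.loads(emb_text)
print(name, emb)
```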

Retrieval

Which record is relevant to the query “cat”?
id | name | size | pop | emb
1 | dog | 2 | 900 | [2, 1, 0]
2 | bat | 1 | 10000 | [0, 1, 2]
Query: cat [1, 2, 0]

Draw distance vs. similarity
distance | similarity

Distance vs. similarity on a scale of 1 to 5
similarity: asc or desc
distance: asc or desc

How to retrieve by similarity? (dot product)
________ name, emb <___> [__,__,__] AS score
FROM animals
________ BY ______ ASC | DESC ;

How to retrieve by distance? (Euclidean)
SELECT name, emb <*> [1, 2, 0] AS score
FROM animals
ORDER BY score DESC;
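To see how the sort direction depends on the score, here is a small sketch in plain Python that ranks the two records from the Retrieval slide against the query "cat" `[1, 2, 0]`. The helper names `dot` and `euclidean` are illustrative. Higher dot product means more similar, so that ranking sorts descending. Smaller Euclidean distance means closer, so that ranking sorts ascending.

```python
import math

animals = {"dog": [2, 1, 0], "bat": [0, 1, 2]}
query = [1, 2, 0]  # embedding of "cat"

def dot(u, v):
    """Dot product: multiply element by element, then sum."""
    return sum(a * b for a, b in zip(u, v))

def euclidean(u, v):
    """Euclidean (L2) distance between two vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

# Similarity: the largest score comes first (descending).
by_similarity = sorted(animals, key=lambda n: dot(animals[n], query), reverse=True)
# Distance: the smallest score comes first (ascending).
by_distance = sorted(animals, key=lambda n: euclidean(animals[n], query))
print(by_similarity, by_distance)
```

With this data both rankings put dog ahead of bat.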

Dot Product

How to compute dot product?
Example:
[1, 2, 3] · [2, 2, 0] = 1×2 + 2×2 + 3×0 = 2 + 4 + 0 = 6
dog [2, 1, 0] · cat [1, 2, 0] = ___ + ___ + ___ = ___
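The multiply-then-sum procedure can be checked with a few lines of Python. This sketch returns both the element-wise products and their sum. The function name `dot_product` is only illustrative.

```python
def dot_product(u, v):
    """Return the element-wise products and their sum."""
    products = [a * b for a, b in zip(u, v)]
    return products, sum(products)

# The worked example from the slide.
example_products, example_result = dot_product([1, 2, 3], [2, 2, 0])
# The dog and cat vectors.
dog_cat_products, dog_cat_result = dot_product([2, 1, 0], [1, 2, 0])
print(example_products, example_result)
print(dog_cat_products, dog_cat_result)
```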

How to compute dot product using matrix multiplication?
Example: row [2, 2, 0] × column [1, 2, 3] = 6
cat as a row [1, 2, 0] × dog as a column [2, 1, 0] = ___

How to compute dot product with multiple vectors?
Example: row [2, 2, 0] × columns [1, 2, 3] and [1, 1, 1] = [6, 4]
cat as a row [1, 2, 0] × columns dog [2, 1, 0] and bat [0, 1, 2] = [4, ___]
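Placing several vectors side by side as columns turns many dot products into one matrix multiplication. A minimal pure-Python sketch follows (the helper `row_times_matrix` is an illustrative name, and no external library is assumed):

```python
def row_times_matrix(row, matrix):
    """Multiply a 1 x n row by an n x m matrix given as a list of n rows."""
    n_cols = len(matrix[0])
    return [
        sum(row[i] * matrix[i][j] for i in range(len(row)))
        for j in range(n_cols)
    ]

# Worked example: the columns are [1, 2, 3] and [1, 1, 1].
example = row_times_matrix([2, 2, 0], [[1, 1], [2, 1], [3, 1]])

# cat against dog and bat: the columns are dog [2, 1, 0] and bat [0, 1, 2].
cat = [1, 2, 0]
dog_bat = [[2, 0], [1, 1], [0, 2]]
scores = row_times_matrix(cat, dog_bat)
print(example, scores)
```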

Word Embedding

Where are dog, cat and bat in the “name” space?


Which embedding is better?
Embedding 1
dog | cat | bat
2 | 1 | 0
1 | 2 | 1
0 | 0 | 2
Embedding 2
dog | cat | bat
2 | 0 | 1
1 | 1 | 0
0 | 2 | 2

Which embedding is better?
Desired dot product similarity (H = high, L = low):
    | dog | cat | bat
dog |     | H   | L
cat | H   |     | L
bat | L   | L   |
Embedding 1
dog [2, 1, 0]
cat [1, 2, 0]
bat [0, 1, 2]
Embedding 2
dog [2, 1, 0]
cat [0, 1, 2]
bat [1, 0, 2]
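One way to compare the two embeddings is to compute every pairwise dot product and look at which pair scores highest. The sketch below does that in plain Python. The helper names are illustrative, and the vectors are read off the columns of the two embedding matrices.

```python
from itertools import combinations

embedding_1 = {"dog": [2, 1, 0], "cat": [1, 2, 0], "bat": [0, 1, 2]}
embedding_2 = {"dog": [2, 1, 0], "cat": [0, 1, 2], "bat": [1, 0, 2]}

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def pairwise_similarity(embedding):
    """Dot product for every unordered pair of words."""
    return {
        (a, b): dot(embedding[a], embedding[b])
        for a, b in combinations(embedding, 2)
    }

sims_1 = pairwise_similarity(embedding_1)
sims_2 = pairwise_similarity(embedding_2)
print(sims_1)
print(sims_2)
```

Under Embedding 1 the dog and cat pair gets the highest score, while under Embedding 2 the cat and bat pair does.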

Sentence Embedding

How to embed sentences?
id | comment | user | emb
1 | How are you? | John | ?
2 | Who are you? | Mary | ?

    a   an  the  how  why  who what  are   is   am   be  was  you   we    I they  she   he  she   me  him  her
    0   -1    0    1    0    1    0    0   -1    1    0    0    0    3    1    0   -1    0    0    0   -1    0
    2    0    2    0    0    0   -1    1    0    0    0    2    1    0    2    0    2    0    0    2    0    0
   -1    0   -1    1    2    0    0    1    0    1   -1    0    0   -1    0    3    0    0   -1    0    2   -1
    0    1    0    0    1    0    1    0    1    0    1   -2    0    0    0    1    0    1    0    1    0    1
“How are you” → word embedding vectors
how | are | you

Word vectors → Sentence vector
Method 1: Concatenate
how | are | you
1 | 0 | 0
0 | 1 | 1
1 | 1 | 0
0 | 0 | 0

Word vectors → Sentence vector
Method 2: Average
how | are | you
1 | 0 | 0
0 | 1 | 1
1 | 1 | 0
0 | 0 | 0
id | comment | user | emb
1 | How are you? | John |
2 | Who are you? | Mary |
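Both methods can be sketched in a few lines of Python. The word vectors are the how, are and you columns of the word embedding table. `Fraction` is used so the averages stay exact, and the variable names are illustrative.

```python
from fractions import Fraction

# Columns of the word embedding table for the three words.
word_vectors = {
    "how": [1, 0, 1, 0],
    "are": [0, 1, 1, 0],
    "you": [0, 1, 0, 0],
}
sentence = ["how", "are", "you"]

# Method 1: concatenate the word vectors end to end (length grows with the sentence).
concatenated = [x for word in sentence for x in word_vectors[word]]

# Method 2: average each feature across the words (length stays fixed at 4).
averaged = [
    Fraction(sum(feature), len(sentence))
    for feature in zip(*(word_vectors[word] for word in sentence))
]
print(concatenated)
print(averaged)
```

Averaging keeps every sentence vector the same length regardless of word count, which is what a fixed-width `emb` column needs.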

    a   an  the  how  why  who what  are   is   am   be  was  you   we    I they  she   he  she   me  him  her
    0   -1    0    1    0    1    0    0   -1    1    0    0    0    3    1    0   -1    0    0    0   -1    0
    2    0    2    0    0    0   -1    1    0    0    0    2    1    0    2    0    2    0    0    2    0    0
   -1    0   -1    1    2    0    0    1    0    1   -1    0    0   -1    0    3    0    0   -1    0    2   -1
    0    1    0    0    1    0    1    0    1    0    1   -2    0    0    0    1    0    1    0    1    0    1
“Who are you” → word embedding vectors
who | are | you
    | 0 | 0
    | 1 | 1
    | 1 | 0
    | 0 | 0

Word vectors → Sentence vector
Method 2: Average
who | are | you
1 | 0 | 0
0 | 1 | 1
0 | 1 | 0
0 | 0 | 0
id | comment | user | emb
1 | How are you? | John | [1/3, 2/3, 2/3, 0]
2 | Who are you? | Mary |

How to query by SQL?
________ comment, emb <___> [__,__,__,__] AS score
FROM posts
ORDER BY ______ ASC | DESC ;
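A sketch of the same query logic in plain Python, under two assumptions: the query is the averaged embedding of "who are you" (computed from the who, are and you columns of the word embedding table), and the two posts carry their averaged embeddings. It scores the posts by dot product and by squared Euclidean distance, which shows that the two measures need not agree when vectors are not normalized.

```python
from fractions import Fraction as F

# Averaged sentence embeddings of the two posts.
posts = {
    "How are you?": [F(1, 3), F(2, 3), F(2, 3), F(0)],
    "Who are you?": [F(1, 3), F(2, 3), F(1, 3), F(0)],
}
# Assumption: the query text is "who are you", embedded the same way.
query = posts["Who are you?"]

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

def squared_l2(u, v):
    """Squared Euclidean distance; it orders results the same as the distance."""
    return sum((a - b) ** 2 for a, b in zip(u, v))

# Dot product is a similarity: sort descending.
by_dot = sorted(posts, key=lambda c: dot(posts[c], query), reverse=True)
# Distance: sort ascending.
by_l2 = sorted(posts, key=lambda c: squared_l2(posts[c], query))
print(by_dot)
print(by_l2)
```

Here the dot product ranks "How are you?" first (7/9 against 6/9) because its vector is longer, while the distance ranks the exact match "Who are you?" first.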

How to query using a high-level API?
query = Query(post_index)
    ._________(post)
    ._________(relevance_space.text, Param("________"))
app.query(query, _________ = "who are you?" )
Source: Superlinked.com

Search

K-Nearest Neighbor, K=3, Dot-Product
Database
ID: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
emb: 9-899031-6011313-2615-976-58
Query
{ max | min }

K-Nearest Neighbor, K=3, L2
Database
ID: 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15 16 17 18 19 20
emb: 68991100912152131261597658
Query
{ max | min }
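A brute-force sketch of K-nearest-neighbor search with K=3 under both scores. The database vectors and the query below are hypothetical values chosen for illustration, not the numbers on the slides. With dot product the K largest scores are kept (max), and with L2 distance the K smallest are kept (min).

```python
import heapq
import math

# Hypothetical database: ID -> embedding (illustrative values only).
database = {
    1: [9, -8],
    2: [0, 3],
    3: [1, -6],
    4: [5, 5],
    5: [-2, 6],
    6: [7, 1],
}
query = [1, 1]  # hypothetical query embedding
K = 3

def dot(u, v):
    return sum(a * b for a, b in zip(u, v))

# Dot product is a similarity: take the K largest scores.
knn_dot = heapq.nlargest(K, database, key=lambda i: dot(database[i], query))
# L2 is a distance: take the K smallest scores.
knn_l2 = heapq.nsmallest(K, database, key=lambda i: math.dist(database[i], query))
print(knn_dot, knn_l2)
```

This scans every record, which is fine for a toy table. Production vector databases typically rely on approximate indexes instead.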

Transformer

How to use a Transformer to get a sentence embedding vector?
Word Embedding Vectors
1 0 0
0 1 1
1 1 0
0 0 0
Sentence Embedding Vector

How to combine across positions?
Word embedding vectors (one column per position):
1 0 0
0 1 1
1 1 0
0 0 0
Position weights [1, 0, 1] → ___

How to combine across positions?
Word embedding vectors (one column per position):
1 0 0
0 1 1
1 1 0
0 0 0
Position weights [1, 0, 1] → [1, 1, 1, 0]
Position weights [0, 1, 1] → ___

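How to combine across positions?
Word embedding vectors (one column per position):
1 0 0
0 1 1
1 1 0
0 0 0
Position weights [1, 0, 1] → [1, 1, 1, 0]
Position weights [0, 1, 1] → [0, 2, 1, 0]
Position weights [0, 0, 1] → ___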
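Combining across positions amounts to taking, for each feature row, a weighted sum over the word positions. A minimal sketch with the matrix and the three weight vectors from these slides follows (the function name `combine_positions` is illustrative):

```python
# Rows are features, columns are positions (how, are, you).
X = [
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
]
position_weights = [[1, 0, 1], [0, 1, 1], [0, 0, 1]]

def combine_positions(matrix, weights):
    """Weighted sum across positions, computed separately for each feature row."""
    return [sum(x * w for x, w in zip(row, weights)) for row in matrix]

outputs = [combine_positions(X, w) for w in position_weights]
print(outputs)
```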

How to combine across features?
Word embedding vectors with a row of ones appended:
1 0 0
0 1 1
1 1 0
0 0 0
1 1 1
Feature weights [1, 0, -1, 0, 1] → ___

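How to combine across features?
Word embedding vectors with a row of ones appended:
1 0 0
0 1 1
1 1 0
0 0 0
1 1 1
Feature weights [1, 0, -1, 0, 1] → [1, 0, 1]
Feature weights [0, 1, 1, 0, 0] → ___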
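Combining across features is the mirror image: for each position (column), take a weighted sum over the feature rows. In this sketch the fifth weight multiplies the appended row of ones, so it behaves like a bias term. That reading is an interpretation of the slide layout, and the function name is illustrative.

```python
X = [
    [1, 0, 0],
    [0, 1, 1],
    [1, 1, 0],
    [0, 0, 0],
]
# Append a row of ones so the last weight acts as a bias (interpretation).
X_with_ones = X + [[1, 1, 1]]
feature_weights = [[1, 0, -1, 0, 1], [0, 1, 1, 0, 0]]

def combine_features(weights, matrix):
    """Weighted sum across feature rows, computed separately for each position."""
    n_positions = len(matrix[0])
    return [
        sum(w * row[j] for w, row in zip(weights, matrix))
        for j in range(n_positions)
    ]

outputs = [combine_features(w, X_with_ones) for w in feature_weights]
print(outputs)
```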

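How to combine across positions and features?
Word embedding vectors (one column per position):
1 0 0
0 1 1
1 1 0
0 0 0
Position weights [1, 0, 1] → ___
Position weights [0, 1, 1] → [0, 2, 1, 0]
Position weights [0, 0, 1] → [0, 1, 0, 0]
Row of ones: 1 1 1
Feature weights [1, 0, -1, 0, 1] → ___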

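How to use a Transformer to get a sentence embedding vector?
Word Embedding Vectors
1 0 0
0 1 1
1 1 0
0 0 0
Position weights (one column per output position):
1 0 0
0 1 0
1 1 1
Combined across positions (first column left blank):
_ 0 0
_ 2 1
_ 1 0
_ 0 0
Row of ones: 1 1 1
Feature weights (one row per output feature):
1 0 -1 0 1
0 1 1 0 0
0 0 0 1 1
0 0 1 1 0
Combined across features (first row left blank):
_ _ _
2 3 1
1 1 1
1 1 0
Sentence Embedding Vector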
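The two combination steps can be chained as two matrix multiplications. The sketch below uses the matrices that appear on this slide and reproduces the filled-in numbers. It is a linear toy version only: it includes no attention softmax or nonlinearity, and it stops before the final step that reduces the result to a single sentence vector, since that step is not recoverable from the slide text. The names `X`, `A`, `W` and `matmul` are illustrative.

```python
X = [[1, 0, 0], [0, 1, 1], [1, 1, 0], [0, 0, 0]]  # features x positions
A = [[1, 0, 0], [0, 1, 0], [1, 1, 1]]              # position weights, one column per output position
W = [                                              # feature weights, last column multiplies the ones row
    [1, 0, -1, 0, 1],
    [0, 1, 1, 0, 0],
    [0, 0, 0, 1, 1],
    [0, 0, 1, 1, 0],
]

def matmul(P, Q):
    """Plain matrix product of two lists of rows."""
    return [
        [sum(P[i][k] * Q[k][j] for k in range(len(Q))) for j in range(len(Q[0]))]
        for i in range(len(P))
    ]

# Step 1: combine across positions.
Z = matmul(X, A)
# Step 2: append a row of ones, then combine across features.
Z_with_ones = Z + [[1] * len(Z[0])]
Y = matmul(W, Z_with_ones)
print(Z)
print(Y)
```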

Q/A
