Vector Search @ sw2con for slideshare.pptx

jbellis 155 views 46 slides May 14, 2024
Slide 1
Slide 1 of 46
Slide 1
1
Slide 2
2
Slide 3
3
Slide 4
4
Slide 5
5
Slide 6
6
Slide 7
7
Slide 8
8
Slide 9
9
Slide 10
10
Slide 11
11
Slide 12
12
Slide 13
13
Slide 14
14
Slide 15
15
Slide 16
16
Slide 17
17
Slide 18
18
Slide 19
19
Slide 20
20
Slide 21
21
Slide 22
22
Slide 23
23
Slide 24
24
Slide 25
25
Slide 26
26
Slide 27
27
Slide 28
28
Slide 29
29
Slide 30
30
Slide 31
31
Slide 32
32
Slide 33
33
Slide 34
34
Slide 35
35
Slide 36
36
Slide 37
37
Slide 38
38
Slide 39
39
Slide 40
40
Slide 41
41
Slide 42
42
Slide 43
43
Slide 44
44
Slide 45
45
Slide 46
46

About This Presentation

Modern Vector Search, covering the biggest advances in the past ten years. Featuring HNSW, DiskANN, PQ, LVQ, Fused ADC


Slide Content

Modern Vector Search SW2Con 2024

magic [0.3025549650192261, 0.1912980079650879, 0.04950578138232231, 0.13541743159294128, 0.22033651173114777, 0.3047471046447754, 0.03519149497151375, 0.41724318265914917, 0.46010446548461914, 0.13088607788085938, 0.11903445422649384, 0.30909594893455505, 0.2992345690727234, 0.17327798902988434, 0.02294405922293663, 0.20794396102428436, 0.46378788352012634, 0.16246692836284637, 0.7109631896018982, 0.20986509323120117, 0.1922052949666977, ... 2048 Dimensions

K Nearest Neighbors search (KNN) ‹#›

The curse of dimensionality, KNN edition ‹#›

The curse of dimensionality, KNN edition ‹#›

The curse of dimensionality, KNN edition ‹#› 3D: 0.1%

‹#› ANN (Approximate Nearest Neighbor)

Partitioned and graph indexes ‹#›

Milestones in ANN ‹#› 2015: FastScan 2016: HNSW 2017: Quick ADC 2018: Quicker ADC 2019: DiskANN 2020: SCANN and APQ 2021: NGT QG 2022: SPANN 2023: LVQ 2024: JVector LTM C ompression Graph construction Compression Compression Graph construction + compression Compression Compression Partitioning Compression Graph construction

IVF Partitioning ‹#›

Search ‹#›

KMeans ‹#›

IVF search ‹#› Coarse: O(centroids) Accurate: O(M * N/centroids) Centroid count needs to be relatively high FAISS recommends O(sqrt(N)) = 64K for N in 1..10M

SPANN (MSR, 2022) ‹#› Better partitioning Dynamic pruning during search Hybrid architecture: centroids in memory, postings lists on disk Scales to 1B+ vectors

Vector database index choices ‹#› Astra Lucene Milvus Pinecone Qdrant Weaviate pgVector Graph Graph Graph Graph Graph Graph Partitioned Graph

Partitioning downsides ‹#› KMeans is O(t*k*n*d) Incremental construction is difficult and slow Difficult to handle deletes SOTA is relatively complex

Graph indexes ‹#›

HNSW (Malkov + Yashunin, 2016) ‹#› First modern graph index Still in use in e.g. Lucene Single-pass search, everything in memory

HNSW: diversity heuristic ‹#›

Hierarchical NSW ‹#›

Larger-than-memory HNSW ‹#›

DiskANN (MSR, 2019) ‹#› Single graph layer Coarse + Accurate passes Coarse performed using compressed vectors in memory Accurate reranks coarse results using full resolution vectors from disk Scales to 1B+ vectors

DiskANN single layer design ‹#›

Non-blocking concurrency = linear scaling ‹#›

Product Quantization (PQ) ‹#›

Product Quantization (PQ)

Without reranking ‹#›

PQ with transparent reranking ‹#›

Binary Quantization is very lossy ‹#›

PQ is very, very hard to beat consistently ‹#›

LVQ: better than PQ at small(er) ratios ‹#›

DiskANN performance ‹#› O(log N) coarse search O(topK) rerank Still O(N) memory use

Beyond DiskANN ‹#› 2023: 10M is a big vector index 2024: 1B is a big vector index (and customers are asking when they can have 10B)

Larger-than-memory index construction ‹#› DiskANN (2019) Split dataset into 40 partitions using kmeans Index each partition separately, adding each node to closest 2 partitions Take the union of edges across all partitioned indexes to make one big index 5 days to build Deep1B dataset (350GB)

Larger-than-memory index construction ‹#› JVector (2024) Build the index using two-phase search (PQ in memory, full resolution on disk) 3h to build Cohere-v3-wikipedia (180GB)

Reducing memory footprint from O(N) to O(1) ‹#› Fused ADC First implemented by NGT (Yahoo! Japan, 2021) Apply Quicker ADC to graph indexes PQ lookup tables stored on disk, not in memory

Memory for 10M openai-v3-small vectors ‹#›

‹#› Conclusion

Milestones in ANN ‹#› 2015: FastScan 2016: HNSW 2017: Quick ADC 2018: Quicker ADC 2019: DiskANN 2020: SCANN 2021: NGT QG 2022: SPANN 2023: LVQ 2024: JVector LTM

Not like this ‹#›

What actually matters ‹#› Basic: Support for PQ Support for reranking Advanced: Larger-than-memory index construction O(1) memory footprint for queries Support for LVQ

Further reading ‹#› DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node Quicker ADC : Unlocking the hidden potential of Product Quantization with SIMD Locally-Adaptive Quantization for Streaming Vector Search
Tags