The curse of dimensionality, KNN edition ‹#› [figure; surviving label: "3D: 0.1%"]
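The effect can be reproduced in a few lines of numpy (an illustrative sketch, not from the slides; `distance_contrast` is my own helper): as dimension grows, the gap between the nearest and farthest neighbor shrinks relative to the distances themselves, which is what undermines brute-force KNN intuition.

```python
import numpy as np

def distance_contrast(dim, n=1000, seed=0):
    """Relative contrast (max - min) / min of distances from a random
    query to n random points in the unit hypercube. Values near 0 mean
    'everyone is roughly equally far away'."""
    rng = np.random.default_rng(seed)
    points = rng.random((n, dim))
    query = rng.random(dim)
    d = np.linalg.norm(points - query, axis=1)
    return (d.max() - d.min()) / d.min()

low = distance_contrast(3)      # low dimension: large contrast
high = distance_contrast(1000)  # high dimension: distances concentrate
```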
ANN (Approximate Nearest Neighbor) ‹#›
Partitioned and graph indexes ‹#›
Milestones in ANN ‹#›
2015: FastScan (Compression)
2016: HNSW (Graph construction)
2017: Quick ADC (Compression)
2018: Quicker ADC (Compression)
2019: DiskANN (Graph construction + compression)
2020: SCANN and APQ (Compression)
2021: NGT QG (Compression)
2022: SPANN (Partitioning)
2023: LVQ (Compression)
2024: JVector LTM (Graph construction)
IVF Partitioning ‹#›
Search ‹#›
KMeans ‹#›
IVF search ‹#› Coarse pass: O(centroids). Accurate pass: O(M * N/centroids), where M is the number of probed lists. The centroid count needs to be relatively high: FAISS recommends O(sqrt(N)), e.g. 64K centroids for N in 1M..10M.
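The coarse/accurate split above can be sketched in a few lines of numpy (a minimal illustration with my own function names, not FAISS code): k-means assigns every vector to a posting list; at query time the coarse pass scans only the centroids, and the accurate pass scans only the selected lists.

```python
import numpy as np

def build_ivf(data, n_centroids, n_iters=10, seed=0):
    """Partition data into posting lists via plain Lloyd's k-means."""
    rng = np.random.default_rng(seed)
    centroids = data[rng.choice(len(data), n_centroids, replace=False)].copy()
    for _ in range(n_iters):
        assign = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)
        for c in range(n_centroids):
            members = data[assign == c]
            if len(members):
                centroids[c] = members.mean(axis=0)
    # Final assignment against the updated centroids.
    assign = np.argmin(np.linalg.norm(data[:, None] - centroids[None], axis=2), axis=1)
    lists = [np.flatnonzero(assign == c) for c in range(n_centroids)]
    return centroids, lists

def ivf_search(query, data, centroids, lists, n_probe=2, k=5):
    # Coarse: O(centroids) distance computations to pick n_probe lists.
    order = np.argsort(np.linalg.norm(centroids - query, axis=1))[:n_probe]
    cand = np.concatenate([lists[c] for c in order])
    # Accurate: exact distances over roughly n_probe * N/centroids candidates.
    d = np.linalg.norm(data[cand] - query, axis=1)
    return cand[np.argsort(d)[:k]]
```

With `n_probe` equal to the centroid count this degenerates to exact search; lowering it trades recall for speed.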
SPANN (MSR, 2022) ‹#› Better partitioning. Dynamic pruning during search. Hybrid architecture: centroids in memory, postings lists on disk. Scales to 1B+ vectors.
Partitioning downsides ‹#› KMeans is O(t*k*n*d) for t iterations, k clusters, and n vectors of dimension d. Incremental construction is difficult and slow. Deletes are difficult to handle. SOTA is relatively complex.
Graph indexes ‹#›
HNSW (Malkov and Yashunin, 2016) ‹#› First modern graph index. Still in use in e.g. Lucene. Single-pass search; everything in memory.
HNSW: diversity heuristic ‹#›
Hierarchical NSW ‹#›
Larger-than-memory HNSW ‹#›
DiskANN (MSR, 2019) ‹#› Single graph layer. Coarse + accurate passes: the coarse pass is performed using compressed vectors in memory; the accurate pass reranks coarse results using full-resolution vectors from disk. Scales to 1B+ vectors.
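The coarse-then-rerank idea can be sketched as follows (a simplification of mine, not the actual DiskANN code: it uses 8-bit scalar quantization as a stand-in for the compressed in-memory representation, where DiskANN itself uses PQ, and scans linearly instead of walking a graph):

```python
import numpy as np

def quantize_u8(x, lo, hi):
    """Lossy 8-bit representation that stands in for 'compressed, in memory'."""
    return np.round((x - lo) / (hi - lo) * 255).astype(np.uint8)

def search_with_rerank(query, full, compressed, lo, hi, k=5, rerank=20):
    q = quantize_u8(query, lo, hi).astype(np.float32)
    # Coarse pass: cheap approximate distances on the compressed vectors.
    coarse = np.linalg.norm(compressed.astype(np.float32) - q, axis=1)
    cand = np.argsort(coarse)[:rerank]
    # Accurate pass: exact distances, touching only `rerank` full-resolution
    # vectors (the ones that would be read from disk).
    exact = np.linalg.norm(full[cand] - query, axis=1)
    return cand[np.argsort(exact)[:k]]
```

The key property is that only the short candidate list ever needs full-resolution vectors, so the expensive storage can live on disk.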
DiskANN single layer design ‹#›
Non-blocking concurrency = linear scaling ‹#›
Product Quantization (PQ) ‹#›
Without reranking ‹#›
PQ with transparent reranking ‹#›
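A minimal PQ sketch (illustrative, not FAISS; function names are mine): split each vector into M subvectors, learn a small codebook per subspace, store one code per subvector, and answer queries via an ADC (asymmetric distance computation) lookup table. Transparent reranking then just recomputes exact distances for the best ADC candidates.

```python
import numpy as np

def train_pq(data, M, ksub=16, iters=10, seed=0):
    """One k-means codebook per subspace."""
    rng = np.random.default_rng(seed)
    d_sub = data.shape[1] // M
    codebooks = []
    for m in range(M):
        sub = data[:, m * d_sub:(m + 1) * d_sub]
        cb = sub[rng.choice(len(sub), ksub, replace=False)].copy()
        for _ in range(iters):
            assign = np.argmin(((sub[:, None] - cb[None]) ** 2).sum(-1), axis=1)
            for c in range(ksub):
                if (assign == c).any():
                    cb[c] = sub[assign == c].mean(axis=0)
        codebooks.append(cb)
    return codebooks

def pq_encode(data, codebooks):
    """Each vector becomes M small codebook indexes."""
    M, d_sub = len(codebooks), codebooks[0].shape[1]
    codes = np.empty((len(data), M), dtype=np.uint8)
    for m, cb in enumerate(codebooks):
        sub = data[:, m * d_sub:(m + 1) * d_sub]
        codes[:, m] = np.argmin(((sub[:, None] - cb[None]) ** 2).sum(-1), axis=1)
    return codes

def adc_distances(query, codes, codebooks):
    """Squared-distance estimates: M table lookups per database vector."""
    d_sub = codebooks[0].shape[1]
    # Per-query table: distance from each query subvector to each centroid.
    table = np.stack([((cb - query[m * d_sub:(m + 1) * d_sub]) ** 2).sum(-1)
                      for m, cb in enumerate(codebooks)])
    return table[np.arange(len(codebooks)), codes].sum(axis=1)
```

Note that the ADC estimate is exactly the distance from the query to the *reconstructed* (decoded) vector; the error comes entirely from quantization, which is why reranking with full-resolution vectors recovers accuracy.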
Binary Quantization is very lossy ‹#›
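Binary Quantization keeps only the sign bit of each dimension, a 32x compression over float32. A tiny illustration of the lossiness (my own example, not from the slides): two vectors with enormously different magnitudes collapse to the identical binary code, so their true distance is unrecoverable.

```python
import numpy as np

a = np.full(8, 0.01)   # small positive vector
b = np.full(8, 100.0)  # large positive vector, far away from a
code_a = (a > 0).astype(np.uint8)  # sign bits: all ones
code_b = (b > 0).astype(np.uint8)  # sign bits: also all ones
same_code = np.array_equal(code_a, code_b)  # the codes are identical...
gap = np.linalg.norm(a - b)                 # ...but the true L2 gap is huge
```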
PQ is very, very hard to beat consistently ‹#›
LVQ: better than PQ at small(er) compression ratios ‹#›
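LVQ's core recipe, per the paper's high-level description: center the data globally, then scalar-quantize each vector with its *own* locally-adaptive min/max scale. A simplified sketch (the helper names and details are mine):

```python
import numpy as np

def lvq_encode(data, bits=8):
    """Center globally, then quantize each vector against its own range."""
    mean = data.mean(axis=0)
    centered = data - mean
    lo = centered.min(axis=1, keepdims=True)  # per-vector lower bound
    hi = centered.max(axis=1, keepdims=True)  # per-vector upper bound
    levels = (1 << bits) - 1
    codes = np.round((centered - lo) / (hi - lo) * levels).astype(np.uint8)
    return codes, lo, hi, mean

def lvq_decode(codes, lo, hi, mean, bits=8):
    """Invert the per-vector scaling and re-add the global mean."""
    levels = (1 << bits) - 1
    return codes.astype(np.float64) / levels * (hi - lo) + lo + mean
```

Because each vector gets its own scale, the quantization step adapts to local magnitudes instead of a single global range.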
DiskANN performance ‹#› O(log N) coarse search. O(topK) rerank. Still O(N) memory use.
Beyond DiskANN ‹#› 2023: 10M vectors is a big index. 2024: 1B vectors is a big index (and customers are asking when they can have 10B).
Larger-than-memory index construction ‹#› DiskANN (2019): split the dataset into 40 partitions using kmeans; index each partition separately, adding each node to the closest 2 partitions; take the union of edges across all partitioned indexes to make one big index. 5 days to index the Deep1B dataset (350GB).
Larger-than-memory index construction ‹#› JVector (2024): build the index using two-phase search (PQ in memory, full resolution on disk). 3h to index Cohere-v3-wikipedia (180GB).
Reducing memory footprint from O(N) to O(1) ‹#› Fused ADC: first implemented by NGT (Yahoo! Japan, 2021). Applies Quicker ADC to graph indexes. PQ lookup tables are stored on disk, not in memory.
What actually matters ‹#› Basic: support for PQ; support for reranking. Advanced: larger-than-memory index construction; O(1) memory footprint for queries; support for LVQ.
Further reading ‹#›
DiskANN: Fast Accurate Billion-point Nearest Neighbor Search on a Single Node
Quicker ADC: Unlocking the Hidden Potential of Product Quantization with SIMD
Locally-Adaptive Quantization for Streaming Vector Search