Elevating PostgreSQL: Benchmarking Vector Search Performance

ScyllaDB 306 views 35 slides Oct 15, 2024

About This Presentation

PostgreSQL continues to evolve with vector search extensions like pgvector and pgvecto.rs. We'll explore recent benchmarks comparing vector search performance across various datasets and configurations, highlighting PostgreSQL's adaptability in modern use cases. #PostgreSQL #VectorSearch

Slide Content

A ScyllaDB Community
Elevating PostgreSQL: Benchmarking Vector Search Performance
Daniel Seybold
Co-Founder of benchANT

Daniel Seybold (he/him)

Co-Founder at benchANT
■PhD in benchmarking cloud and database systems
■All about distributed systems, databases and cloud
■Enjoys demystifying the black art of database benchmarking
■Loves every kind of racket sport

Database Hot Topics

Agenda
■The Vector Databases Landscape (Trending topics in the database world)
■Benchmarking Vector Databases
■PostgreSQL Vector Benchmark Results
■Takeaways

Hot Topics in the Database World
Google Trends for the search term “vector database” over the last five years

The Vector Database Landscape
native vector databases
general-purpose databases with vector support

The Elephant in the Room
https://www.timescale.com/blog/postgres-the-birdhorse-of-databases/

PostgreSQL for Vector Search

What is a vector database (I/II)
■stores data as vectors
■stores embeddings (i.e. vectors) together with the original data (e.g. text, images)
■each vector represents a data point using n dimensions
■embeddings are typically created outside of the database
●as a rule of thumb, the higher the number of dimensions, the better the quality
■provides similarity search capabilities via algorithms of the Approximate Nearest Neighbour (ANN) class
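
To make "similarity search" concrete, here is a minimal sketch of the exact (exhaustive) nearest-neighbour search that ANN algorithms approximate. The toy embeddings and their ids are hypothetical. Cosine distance is used here as one common choice; the slides do not state which metric is meant.

```python
import math

def cosine_distance(a, b):
    """Cosine distance: 1 - cos(angle between a and b)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return 1.0 - dot / (norm_a * norm_b)

def exact_top_k(query, vectors, k):
    """Return the ids of the k vectors closest to query (full scan)."""
    ranked = sorted(vectors, key=lambda vec_id: cosine_distance(query, vectors[vec_id]))
    return ranked[:k]

# Hypothetical toy data: id -> 3-dimensional embedding
embeddings = {
    "cat": [0.9, 0.1, 0.0],
    "dog": [0.8, 0.2, 0.1],
    "car": [0.0, 0.1, 0.9],
}
nearest = exact_top_k([1.0, 0.0, 0.0], embeddings, k=2)
print(nearest)  # ['cat', 'dog']
```

A full scan costs one distance computation per stored vector for every query. ANN indexes exist to avoid that cost.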

What is a vector database (II/II)
■vector data needs to be indexed to enable efficient lookups
■various indexing algorithms are available
●Inverted File Index (IVF)
●Hierarchical Navigable Small World (HNSW) graphs
●and many more
■for more details on vector databases see
https://thedataquarry.com/posts/vector-db-1/
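
As an illustration of the IVF idea, the following toy sketch assigns each vector to the inverted list of its nearest centroid and, at query time, scans only the `nprobe` closest lists. It is a simplification, not any extension's implementation. The centroids are hard-coded here as an assumption, whereas a real IVF index would typically learn them with k-means. All names and data are hypothetical.

```python
def squared_l2(a, b):
    """Squared Euclidean distance between two vectors."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def build_ivf(vectors, centroids):
    """Assign every vector id to the inverted list of its nearest centroid."""
    lists = {c: [] for c in range(len(centroids))}
    for vec_id, vec in vectors.items():
        nearest = min(range(len(centroids)), key=lambda c: squared_l2(vec, centroids[c]))
        lists[nearest].append(vec_id)
    return lists

def ivf_search(query, vectors, centroids, lists, k, nprobe):
    """Scan only the nprobe inverted lists whose centroids are closest to query."""
    by_distance = sorted(range(len(centroids)), key=lambda c: squared_l2(query, centroids[c]))
    candidates = [vec_id for c in by_distance[:nprobe] for vec_id in lists[c]]
    candidates.sort(key=lambda vec_id: squared_l2(query, vectors[vec_id]))
    return candidates[:k]

# Hypothetical toy data with two fixed centroids
centroids = [[0.0, 0.0], [1.0, 1.0]]
vectors = {"a": [0.1, 0.0], "b": [0.45, 0.45], "c": [1.0, 0.9], "d": [0.6, 0.6]}
lists = build_ivf(vectors, centroids)
query = [0.55, 0.55]

print(ivf_search(query, vectors, centroids, lists, k=2, nprobe=1))  # ['d', 'c']
print(ivf_search(query, vectors, centroids, lists, k=2, nprobe=2))  # ['d', 'b']
```

With `nprobe=1` the search misses `b`, the true second-nearest neighbour, because it lives in the other list. Probing more lists recovers it at the cost of scanning more candidates, which is the recall versus speed trade-off that such indexes expose.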

PostgreSQL Vector Search Extensions

■pgvector (0.7.4)
■pgvecto.rs (0.3.0)
■pgvectorscale (0.3.0)
■lantern (0.3.2)
■and probably many more

PostgreSQL Vector Search Extensions
extension      index types                      quantization                     additional details
pgvector       IVFFLAT, HNSW                    binary                           most popular vector extension
pgvecto.rs     IVFFLAT, HNSW                    scalar, product                  supports up to 65535 dimensions
pgvectorscale  IVFFLAT, HNSW, StreamingDiskANN  Statistical Binary Quantization  extends pgvector
lantern        HNSW                             scalar, binary, product          index creation outside of the database instance
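
For orientation, this is roughly how the two pgvector index types are declared. The sketch only builds the SQL strings. The table name `items`, the columns `id` and `embedding`, and the parameter values are assumptions for illustration, not the settings used in the benchmarks. It covers pgvector only; the other extensions use their own index syntax.

```python
def hnsw_index_sql(table, column, opclass="vector_cosine_ops", m=16, ef_construction=64):
    """DDL for a pgvector HNSW index (identifiers are not escaped; illustration only)."""
    return (
        f"CREATE INDEX ON {table} USING hnsw ({column} {opclass}) "
        f"WITH (m = {m}, ef_construction = {ef_construction});"
    )

def ivfflat_index_sql(table, column, opclass="vector_cosine_ops", lists=100):
    """DDL for a pgvector IVFFlat index with the given number of inverted lists."""
    return f"CREATE INDEX ON {table} USING ivfflat ({column} {opclass}) WITH (lists = {lists});"

def knn_query_sql(table, column, k):
    """Top-k query; <=> is pgvector's cosine distance operator, %s a driver placeholder."""
    return f"SELECT id FROM {table} ORDER BY {column} <=> %s LIMIT {k};"

# Hypothetical table and column names
print(hnsw_index_sql("items", "embedding"))
print(ivfflat_index_sql("items", "embedding", lists=200))
print(knn_query_sql("items", "embedding", k=10))
```

The operator class has to match the distance operator used in the query (here cosine for both), otherwise the planner cannot use the index.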

PostgreSQL Vector Search Extensions
■Which index type fits best for my target data set?
■Which throughput and latency numbers can be expected?
■Which PostgreSQL extension provides the best performance?

Benchmarking Vector
Databases

Why Benchmark Vector Databases
■Comparing the performance of a native vector database with a general-purpose database with vector support

■Getting a general understanding of the vector search performance for your target data sets

■Exploring the performance impact of resource and database layer knobs for vector search use cases

From a Real-World Application to a Benchmark
■Building a synthetic vector search benchmark is more straightforward than for many OLTP/HTAP/OLAP applications

■Important data set parameters:
●# of vectors
●vector dimensions

■Important query parameters:
●filtering
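
The two data set parameters above largely determine how much raw vector data the database has to hold. A back-of-the-envelope sketch, assuming 4-byte float values plus 8 bytes of per-vector overhead (the figure pgvector documents for its `vector` type) and ignoring index size and row overhead. The 1,000,000 x 768 data set is a hypothetical example.

```python
def raw_vector_bytes(num_vectors, dimensions, bytes_per_value=4, per_vector_overhead=8):
    """Approximate storage for the vectors alone (no index, no row overhead)."""
    return num_vectors * (dimensions * bytes_per_value + per_vector_overhead)

# Hypothetical data set: 1,000,000 vectors with 768 dimensions
size_bytes = raw_vector_bytes(1_000_000, 768)
print(size_bytes)                    # 3080000000
print(round(size_bytes / 2**30, 2))  # 2.87 (GiB)
```

Comparing this number with the available RAM gives a first hint whether a benchmark run will be memory-resident or disk-bound.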

Vector Database Benchmark Suites
■ANN-Benchmark
■Big-ANN-Benchmark
■pgvectorbench
■Qdrant vector-db-benchmark
■VectorDBBench

For a continuously updated list of database benchmarks:
https://benchant.com/blog/benchmarking-suites

Vector Database Benchmark Metrics
■throughput
●ingestion
●search
■latency
■recall — search quality
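
A minimal sketch of how these metrics can be computed from raw measurements. It assumes recall is measured against exact ground-truth neighbours and uses the nearest-rank percentile method; benchmark suites may define the details differently. The sample values are made up.

```python
import math

def recall_at_k(retrieved, ground_truth, k):
    """Fraction of the true top-k neighbours that appear in the retrieved top-k."""
    truth = set(ground_truth[:k])
    if not truth:
        return 0.0
    hits = sum(1 for item in retrieved[:k] if item in truth)
    return hits / len(truth)

def percentile(values, p):
    """Nearest-rank percentile of a list of latency measurements."""
    ordered = sorted(values)
    rank = max(1, math.ceil(p / 100 * len(ordered)))
    return ordered[rank - 1]

def throughput(num_queries, elapsed_seconds):
    """Queries per second over the whole measurement window."""
    return num_queries / elapsed_seconds

# Made-up measurements for illustration
latencies_ms = [12, 15, 11, 40, 13, 14, 12, 13, 90, 12]
print(recall_at_k(["d", "c"], ["d", "b"], k=2))  # 0.5
print(percentile(latencies_ms, 50))              # 13
print(percentile(latencies_ms, 99))              # 90
print(throughput(1000, 4.0))                     # 250.0
```

Throughput and latency figures are only comparable at the same recall, since an index can usually be tuned to answer faster by returning less accurate results.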

PostgreSQL Vector Search
Benchmarks

Benchmarking Objectives
■knobs and bolts to consider when running vector search benchmarks or reading reports (in general and for PostgreSQL)

■performance impact of using different index types on different data sets

■baseline performance numbers for PostgreSQL with pgvector and pgvecto.rs

Out of Scope Objectives (for this Talk)
■comparative benchmarks against other vector databases
■in-depth benchmarking for each PostgreSQL vector extension
●analyzing index-specific parameters
●analyzing extension-specific parameters
■optimizing hardware and PostgreSQL configuration for vector search
●analyzing compute and storage resources
●analyzing PostgreSQL configuration options

Benchmark Methodology
■benchmark suite: VectorDBBench
●benchANT fork with extensions
■benchANT framework for benchmark execution
●automated database benchmarking in the cloud
■benchmark setup: OVH, single node, configuration by DBTune
■data is available on GitHub

Benchmark Setup
■PostgreSQL Version 16
●pgvector 0.7.4
●pgvecto.rs 0.3.0
■IaaS: OVH VM with 16 vCores/64GB RAM
■VectorDBBench
●based on 0.12.0
●benchANT fork https://github.com/benchANT/vectordbbench
■CLI support
■index creation during optimize step

VectorDBBench Workflow
def load():
    # download vector data set
    # create database
    # create index (option I)
    # ingest into database (single threaded)
    ...

def optimize():
    # create index (option II)
    # apply DB-specific tuning
    ...

def query():
    # execute search query with 1..30 threads
    ...
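
The query phase can be pictured as a concurrency sweep: the same query set is replayed with an increasing number of client threads, and throughput is recorded per level. The sketch below is an assumption about the general shape, not VectorDBBench's actual implementation. `fake_search` is a hypothetical stand-in for a database round trip.

```python
import time
from concurrent.futures import ThreadPoolExecutor

def run_queries(search_fn, queries, num_threads):
    """Run all queries on num_threads workers; return (queries per second, latencies)."""
    def timed(query):
        start = time.perf_counter()
        search_fn(query)
        return time.perf_counter() - start

    wall_start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_threads) as pool:
        latencies = list(pool.map(timed, queries))
    elapsed = time.perf_counter() - wall_start
    return len(queries) / elapsed, latencies

def sweep(search_fn, queries, thread_counts):
    """Measure throughput for each concurrency level."""
    return {n: run_queries(search_fn, queries, n)[0] for n in thread_counts}

def fake_search(query):
    """Hypothetical stand-in for a vector search against the database."""
    time.sleep(0.001)
    return []

results = sweep(fake_search, list(range(20)), thread_counts=[1, 4])
for threads, qps in results.items():
    print(f"{threads} threads: {qps:.0f} queries/s")
```

In a real run, `search_fn` would issue the similarity query over a database connection, and each worker would need its own connection.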

Benchmark Results: Index Types

Benchmark Results: Index Creation Timing

pgvecto.rs Index Type and Quantization Performance

pgvector vs. pgvecto.rs Performance 1M Cohere

Takeaways

■The vector database market is rapidly growing
■The PostgreSQL ecosystem provides different vector search extensions
■The vector search extensions differ in their supported indexes, index creation approaches and additional features, but also in their performance
■Benchmarking vector databases supports the selection of the right technology for your use case
■Vector database benchmarking requires the consideration of new workload parameters to create meaningful benchmark results

Thank you! Let’s connect.
Daniel Seybold
https://benchant.org/ — DataScaleFail Newsletter
