Engineering a Low-Latency Vector Search Engine for ScyllaDB by Paweł Pery

ScyllaDB 1 views 35 slides Oct 15, 2025

About This Presentation

Implementing Vector Search in ScyllaDB brings challenges ranging from low latency to predictable performance at scale. Rather than embedding HNSW indexing directly into the core database, we decoupled vector indexing and similarity search into a dedicated Rust engine. Learn about the architectural design de...


Slide Content

A ScyllaDB Community
Engineering a Low-Latency
Vector Search Engine
for ScyllaDB
Paweł Pery
Software Engineer

Agenda
■Architecture of the Vector Search (ScyllaDB + Vector Store)
■Architecture of the Vector Store
■Performance measurement environment
■Issue with Nagle’s algorithm (+40ms latency)
■Issue with CL=LocalQuorum
■Impact of threads versus CPU cores for performance
■Benchmark results

Architecture of ScyllaDB with Vector Search in the Cloud

Architecture of the Vector Store service

Test environment




Cluster setup
Single r7i.8xlarge instance (32 CPUs)
CPUs divided into clusters of 4 CPUs each
3 ScyllaDB nodes (4 CPUs each)
3 Vector Store nodes (4 CPUs each)
1 Benchmarking tool node (4 CPUs)
Local network (using 127.0.x.y addresses)




Dataset
openai_small_50k from VectorDBBench
50k vectors
1536 dimensions
Parameters
m = 16
ef-construction, ef-search = 128
Cosine distance
K (limit) = 100
ScyllaDB’s recall = 95.3%
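
The recall figure above compares the approximate result with the exact nearest neighbours. The slides do not show how it was computed; a minimal sketch of the usual recall@k definition, with the hypothetical function name `recall_at_k` and vector ids assumed to be `u64`, could look like this:

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k neighbours that the ANN search also returned.
/// `exact` and `approximate` hold vector ids, best match first.
fn recall_at_k(exact: &[u64], approximate: &[u64], k: usize) -> f64 {
    let truth: HashSet<u64> = exact.iter().take(k).copied().collect();
    if truth.is_empty() {
        return 0.0;
    }
    let found: HashSet<u64> = approximate.iter().take(k).copied().collect();
    // Count ids present in both sets; order inside the top-k does not matter.
    let hits = truth.intersection(&found).count();
    hits as f64 / truth.len() as f64
}
```

With K = 100, a recall of 95.3% means that on average about 95 of the 100 returned ids are among the true 100 nearest neighbours.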

Benchmark CQL (Table with vectors)
CREATE KEYSPACE {KEYSPACE}
WITH replication = {{
'class': 'NetworkTopologyStrategy',
'replication_factor': '3'
}} AND tablets = {{'enabled': 'false'}}

CREATE TABLE {KEYSPACE}.{TABLE} (
{ID} uuid PRIMARY KEY,
{VECTOR_ID} bigint, // id from dataset
{VECTOR} vector<float, {dimension}>,
)

INSERT INTO {KEYSPACE}.{TABLE} ({ID}, {VECTOR_ID}, {VECTOR}) VALUES (?, ?, ?)

Benchmark CQL (Vector Search index)
CREATE CUSTOM INDEX {INDEX} ON {KEYSPACE}.{TABLE} ({VECTOR})
USING 'vector_index' WITH OPTIONS = {{
'similarity_function': '{metric_type}',
'maximum_node_connections': '{m}',
'construction_beam_width': '{ef_construction}',
'search_beam_width': '{ef_search}'
}}

SELECT {VECTOR_ID} FROM {KEYSPACE}.{TABLE} ORDER BY {VECTOR} ANN OF ? LIMIT ?
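
The index template above maps the HNSW parameters (m, ef-construction, ef-search) onto index options. As an illustration only, the template can be filled in from Rust as sketched below. The `HnswParams` struct and `create_index_cql` function are hypothetical names, and the accepted values for `similarity_function` are not listed in the slides, so the metric is passed through as given.

```rust
/// HNSW parameters as named on the slides (m, ef-construction, ef-search).
struct HnswParams {
    m: u32,
    ef_construction: u32,
    ef_search: u32,
}

/// Renders the CREATE CUSTOM INDEX statement from the slide's template.
/// Identifiers are inserted verbatim, so they must come from trusted input.
fn create_index_cql(
    keyspace: &str,
    table: &str,
    index: &str,
    column: &str,
    metric: &str,
    params: &HnswParams,
) -> String {
    format!(
        "CREATE CUSTOM INDEX {index} ON {keyspace}.{table} ({column}) \
         USING 'vector_index' WITH OPTIONS = {{ \
         'similarity_function': '{metric}', \
         'maximum_node_connections': '{m}', \
         'construction_beam_width': '{efc}', \
         'search_beam_width': '{efs}' }}",
        index = index,
        keyspace = keyspace,
        table = table,
        column = column,
        metric = metric,
        m = params.m,
        efc = params.ef_construction,
        efs = params.ef_search,
    )
}
```

For the 50k benchmark this would be called with `HnswParams { m: 16, ef_construction: 128, ef_search: 128 }`.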

Benchmark test
for i in range(1, max, step=3) {
sleep(duration=2sec);
run_benchmark(
concurrency=i,
limit=?,
duration=10s,
target={scylla|vector-store}
);
}

parse_logs();
create_charts();
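
The pseudocode above sweeps the concurrency level and runs a fixed-duration benchmark at each step. A rough, std-only sketch of the same loop follows. The names `concurrency_levels`, `run_benchmark` and `sweep` are illustrative, the closure `op` stands in for one ANN request, and the real tool's log parsing and charting are not reproduced.

```rust
use std::sync::atomic::{AtomicU64, Ordering};
use std::thread;
use std::time::{Duration, Instant};

/// Concurrency levels 1, 1 + step, ... strictly below `max`.
/// `step` must be at least 1.
fn concurrency_levels(max: usize, step: usize) -> Vec<usize> {
    (1..max).step_by(step).collect()
}

/// Calls `op` in a loop from `concurrency` threads until `duration` has
/// elapsed and returns the number of completed calls.
fn run_benchmark<F>(concurrency: usize, duration: Duration, op: F) -> u64
where
    F: Fn() + Sync,
{
    let completed = AtomicU64::new(0);
    let deadline = Instant::now() + duration;
    thread::scope(|s| {
        for _ in 0..concurrency {
            s.spawn(|| {
                while Instant::now() < deadline {
                    op();
                    completed.fetch_add(1, Ordering::Relaxed);
                }
            });
        }
    });
    completed.load(Ordering::Relaxed)
}

/// Runs one benchmark per concurrency level, pausing before each run,
/// and returns (concurrency, requests per second) pairs.
fn sweep<F>(max: usize, step: usize, pause: Duration, duration: Duration, op: F) -> Vec<(usize, f64)>
where
    F: Fn() + Sync,
{
    concurrency_levels(max, step)
        .into_iter()
        .map(|concurrency| {
            thread::sleep(pause);
            let done = run_benchmark(concurrency, duration, &op);
            (concurrency, done as f64 / duration.as_secs_f64())
        })
        .collect()
}
```

The slide's settings correspond to `pause = Duration::from_secs(2)` and `duration = Duration::from_secs(10)`.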

Dataset 50k, r7i.8xlarge, Scylla 3 x 4 CPUs, Vector Store 3 x 4 CPUs, k=100, recall~=95.3%

Dataset 50k, r7i.8xlarge, Scylla 3 x 4 CPUs, Vector Store 3 x 4 CPUs, k=100, recall~=95.3%
~46ms difference in latency

HTTP communication dump between Scylla and Vector Store
40ms delay between TCP frame and ACK
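
A delay of roughly 40 ms between a small TCP segment and its ACK is the typical signature of Nagle's algorithm interacting with delayed ACKs. The slides do not state which fix was applied; a common remedy is to set TCP_NODELAY on the connection. A minimal sketch with the standard library follows, where `connect_nodelay` is a hypothetical helper name:

```rust
use std::io;
use std::net::{SocketAddr, TcpStream};

/// Connects to `addr` and disables Nagle's algorithm (sets TCP_NODELAY),
/// so small writes are sent immediately instead of waiting for an ACK.
fn connect_nodelay(addr: SocketAddr) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(stream)
}
```

Most HTTP clients and servers expose an equivalent option. The trade-off is more small packets on the wire in exchange for lower per-request latency.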

Dataset 50k, r7i.8xlarge, Scylla 3 x 4 CPUs, Vector Store 3 x 4 CPUs, k=100, recall~=95.3%

Consistency Level
// Default for Rust driver
CONSISTENCY LOCAL_QUORUM;

// Every request sends reads to other nodes and waits for 2 responses
// A quorum of 3 nodes is 2
SELECT {VECTOR_ID} FROM {KEYSPACE}.{TABLE} ORDER BY {VECTOR} ANN OF ? LIMIT ?

CONSISTENCY ONE;

// The node that receives the request answers by itself: with 3 nodes and RF=3 every node holds all the data
SELECT {VECTOR_ID} FROM {KEYSPACE}.{TABLE} ORDER BY {VECTOR} ANN OF ? LIMIT ?
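
The comments above rely on the quorum size for a replication factor of 3. As a small worked example of that arithmetic (the enum and function names are illustrative and not taken from any driver API):

```rust
/// The two consistency levels discussed on the slide.
#[derive(Clone, Copy)]
enum Consistency {
    One,
    LocalQuorum,
}

/// Quorum size for a replication factor: floor(rf / 2) + 1.
fn quorum(replication_factor: u32) -> u32 {
    replication_factor / 2 + 1
}

/// Number of replica responses the coordinator waits for. For LocalQuorum,
/// `replication_factor` is the replication factor of the local datacenter.
fn required_responses(level: Consistency, replication_factor: u32) -> u32 {
    match level {
        Consistency::One => 1,
        Consistency::LocalQuorum => quorum(replication_factor),
    }
}
```

With RF=3, LOCAL_QUORUM waits for 2 replicas, so at least one response always crosses the network, while ONE can be served by the coordinator alone.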

Dataset 50k, r7i.8xlarge, Scylla 3 x 4 CPUs, Vector Store 3 x 4 CPUs, k=100, recall~=95.3%

Possible assignment of threads to CPUs

Dataset 50k, r7i.8xlarge, Vector Store 1 x 4 CPUs, k=100

CPU Usage for a single ANN request
Request received
Response sent
USearch computation

CPU Usage for two ANN requests
Request received
Response sent
USearch computation from the other request

How to “prioritize” tasks in Rust Tokio
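use std::sync::Arc;
use tokio::sync::Semaphore;

// allow at most `num_cpus` USearch computations to run at once
let semaphore = Arc::new(Semaphore::new(num_cpus));
while let Some(msg) = rx.recv().await {
    // wait for a free slot; this fails only if the semaphore is closed
    let permit = Arc::clone(&semaphore).acquire_owned().await.unwrap();
    tokio::spawn(async move {
        // move the current task to the end of the runtime queue
        tokio::yield_now().await;

        do_usearch_computation(msg);
        // hold the slot until the computation has finished
        drop(permit);
    });
}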
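
The permit half of this pattern, capping the number of simultaneous CPU-bound searches, can be illustrated without Tokio. The sketch below is a std-only analogue, not the Vector Store implementation. It uses a hand-written counting semaphore, a short sleep as a stand-in for the USearch computation, and plain threads instead of async tasks, so the `yield_now` reordering is not modelled.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built from a mutex and a condition variable.
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Returns a permit and wakes one waiting thread.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Starts `jobs` threads but lets at most `limit` of them compute at once.
/// Returns the highest number of simultaneous computations observed.
fn max_concurrency(jobs: usize, limit: usize) -> usize {
    let semaphore = Semaphore::new(limit);
    let running = AtomicUsize::new(0);
    let peak = AtomicUsize::new(0);
    thread::scope(|s| {
        for _ in 0..jobs {
            s.spawn(|| {
                semaphore.acquire();
                let now = running.fetch_add(1, Ordering::SeqCst) + 1;
                peak.fetch_max(now, Ordering::SeqCst);
                thread::sleep(Duration::from_millis(2)); // stand-in for the search
                running.fetch_sub(1, Ordering::SeqCst);
                semaphore.release();
            });
        }
    });
    peak.load(Ordering::SeqCst)
}
```

Setting `limit` to the number of CPUs keeps compute work from oversubscribing the cores, which is the role `Semaphore::new(num_cpus)` plays in the Tokio snippet.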

Dataset 50k, r7i.8xlarge, Vector Store 1 x 4 CPUs, k=100

Cloud environment




Cluster setup
3 x Scylla i4i.xlarge instances (4 CPUs)
3 x Vector Store r7i.xlarge instances (4 CPUs)
1 Benchmarking tool r7i.8xlarge instance (32 CPUs)
Same DC and AZ
ScyllaDB 2025.4.0~dev-0.20250907.040d6e224559
Vector Store 0.4.0
Parameters
m = 16
ef-construction, ef-search = 128
Cosine distance
K (limit) = 100
ScyllaDB’s recall = 95.3%
Build time: table ~7 sec, index ~25 sec




Dataset
openai_small_50k from VectorDBBench
50k vectors
1536 dimensions

Dataset 50k, Scylla 3 x i4i.xlarge (4 CPUs), Vector Store 3 x r7i.xlarge (4 CPUs), RF=3

Cloud environment




Cluster setup
3 x Scylla i4i.4xlarge instances (16 CPUs)
3 x Vector Store r7i.8xlarge instances (16 CPUs)
1 Benchmarking tool r7i.8xlarge instance (32 CPUs)
Same DC and AZ
ScyllaDB 2025.4.0~dev-0.20250907.040d6e224559
Vector Store 0.4.0
Parameters
m = 16
ef-construction, ef-search = 128
Cosine distance




Dataset
openai_small_50k from VectorDBBench
50k vectors
1536 dimensions

Dataset 50k, Scylla 3 x i4i.4xlarge (16 CPUs), Vector Store 3 x r7i.8xlarge (16 CPUs), RF=3

Cloud environment




Cluster setup
3 x Scylla i4i.8xlarge instances (32 CPUs)
3 x Vector Store r7i.16xlarge instances (32 CPUs)
1 Benchmarking tool r7i.8xlarge instance (32 CPUs)
Same DC and AZ
ScyllaDB 2025.4.0~dev-0.20250910.ce4592d8fcf8
Vector Store 0.3.0 + task scheduler with yield
Parameters
m = 16
ef-construction, ef-search = 128
Cosine distance





Dataset
openai_small_50k from VectorDBBench
50k vectors
1536 dimensions
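
To put the dataset sizes in perspective, the raw vector payload can be estimated from the vector count and dimensionality. The sketch assumes 4 bytes per component, matching the `vector<float, ...>` column in the benchmark table, and ignores the HNSW graph and any storage overhead. `raw_vector_bytes` is an illustrative name.

```rust
/// Raw size in bytes of `vectors` vectors with `dimensions` f32 components.
/// Excludes index structures and per-row storage overhead.
fn raw_vector_bytes(vectors: u64, dimensions: u64) -> u64 {
    vectors * dimensions * std::mem::size_of::<f32>() as u64
}
```

For 50k vectors of 1536 dimensions this gives 307,200,000 bytes (about 307 MB). For comparison, 100M vectors of 768 dimensions come to 307,200,000,000 bytes (about 307 GB), a thousand times more.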




Dataset
Laion_large_100m from VectorDBBench
100M vectors
768 dimensions
Parameters
m = 64
ef-construction, ef-search = 512
Euclidean distance
Table build time ~43 min
Index build time ~7h
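
The two datasets use different metrics: cosine distance for the 50k set and Euclidean distance for the 100M set. For reference, the textbook definitions are sketched below. This is not the engine's code: index libraries often use optimized variants (for example squared Euclidean distance), and the function names here are illustrative.

```rust
/// Cosine distance: 1 - cos(angle between a and b).
/// Assumes equal lengths and non-zero vectors (a zero vector yields NaN).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|y| y * y).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Euclidean (L2) distance between a and b. Assumes equal lengths.
fn euclidean_distance(a: &[f32], b: &[f32]) -> f32 {
    a.iter()
        .zip(b)
        .map(|(x, y)| (x - y).powi(2))
        .sum::<f32>()
        .sqrt()
}
```

Cosine distance ignores vector length and depends only on direction, while Euclidean distance is sensitive to both.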

Dataset 100M, Scylla 3 x i4i.8xlarge (32 CPUs), Vector Store 3 x r7i.16xlarge (64 CPUs), RF=3

Thank you! Let’s connect.
Paweł Pery
[email protected]