Engineering a Low-Latency Vector Search Engine for ScyllaDB by Pawel Pery
ScyllaDB
35 slides
Oct 15, 2025
About This Presentation
Implementing Vector Search in ScyllaDB raises challenges ranging from low latency to predictable performance at scale. Rather than embedding HNSW indexing directly into the core database, we decoupled vector indexing and similarity search into a dedicated Rust engine. Learn about the architectural design decisions that let us combine ScyllaDB's shard-per-core architecture for real-time operations with high-performance ANN processing via USearch.
Slide Content
A ScyllaDB Community
Engineering a Low-Latency Vector Search Engine for ScyllaDB
Paweł Pery
Software Engineer
Agenda
■ Architecture of Vector Search (ScyllaDB + Vector Store)
■ Architecture of the Vector Store
■ Performance measurement environment
■ Issue with Nagle’s algorithm (+40 ms latency)
■ Issue with CL=LocalQuorum
■ Impact of threads versus CPU cores on performance
■ Benchmark results
Architecture of ScyllaDB with Vector Search in the cloud
Architecture of the Vector Store service
Test environment
Cluster setup
Single r7i.8xlarge instance (32 CPUs)
CPUs divided into clusters of 4 CPUs each
3 ScyllaDB nodes (4 CPUs each)
3 Vector Store nodes (4 CPUs each)
1 Benchmarking tool node (4 CPUs)
Local network (using 127.0.x.y addresses)
Dataset
openai_small_50k from VectorDBBench
50k vectors
1536 dimensions
Parameters
m = 16
ef-construction, ef-search = 128
Cosine distance
K (limit) = 100
ScyllaDB’s recall = 95.3%
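The slides do not say how the 95.3% recall was measured. A minimal sketch of the usual recall@k definition, assuming the ANN result is compared against an exact top-k list for the same query (the function name `recall_at_k` and the `i64` id type are illustrative, not taken from the benchmark tool):

```rust
use std::collections::HashSet;

/// Fraction of the exact top-k neighbours that the ANN search also returned.
/// `ann_ids` is the approximate result, `exact_ids` the ground truth.
fn recall_at_k(ann_ids: &[i64], exact_ids: &[i64]) -> f64 {
    if exact_ids.is_empty() {
        return 0.0;
    }
    let exact: HashSet<i64> = exact_ids.iter().copied().collect();
    // Deduplicate the ANN result so a repeated id is not counted twice.
    let found: HashSet<i64> = ann_ids.iter().copied().collect();
    let hits = found.intersection(&exact).count();
    hits as f64 / exact.len() as f64
}
```

With K = 100, a recall of 95.3% would mean that on average about 95 of the 100 returned ids belong to the true 100 nearest neighbours.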
Benchmark CQL (Table with vectors)
CREATE KEYSPACE {KEYSPACE}
WITH replication = {{
    'class': 'NetworkTopologyStrategy',
    'replication_factor': '3'
}} AND tablets = {{'enabled': 'false'}}

CREATE TABLE {KEYSPACE}.{TABLE} (
    {ID} uuid PRIMARY KEY,
    {VECTOR_ID} bigint, // id from dataset
    {VECTOR} vector<float, {dimension}>,
)

INSERT INTO {KEYSPACE}.{TABLE} ({ID}, {VECTOR_ID}, {VECTOR}) VALUES (?, ?, ?)
Benchmark CQL (Vector Search index)
CREATE CUSTOM INDEX {INDEX} ON {KEYSPACE}.{TABLE} ({VECTOR})
USING 'vector_index' WITH OPTIONS = {{
    'similarity_function': '{metric_type}',
    'maximum_node_connections': '{m}',
    'construction_beam_width': '{ef_construction}',
    'search_beam_width': '{ef_search}'
}}

SELECT {VECTOR_ID} FROM {KEYSPACE}.{TABLE} ORDER BY {VECTOR} ANN OF ? LIMIT ?
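For intuition, the ANN query above approximates what an exact brute-force scan would return. A minimal sketch of that exact baseline, assuming cosine distance is defined as 1 minus cosine similarity (the names `cosine_distance` and `exact_top_k` are hypothetical and not part of ScyllaDB or USearch):

```rust
/// Cosine distance: 1 - (a . b) / (|a| * |b|).
fn cosine_distance(a: &[f32], b: &[f32]) -> f32 {
    let dot: f32 = a.iter().zip(b).map(|(x, y)| x * y).sum();
    let norm_a = a.iter().map(|x| x * x).sum::<f32>().sqrt();
    let norm_b = b.iter().map(|x| x * x).sum::<f32>().sqrt();
    1.0 - dot / (norm_a * norm_b)
}

/// Exact k nearest rows by cosine distance: scans every row.
/// An HNSW index trades this full scan for a graph walk that may
/// miss a few true neighbours, which is where recall below 100% comes from.
fn exact_top_k(query: &[f32], rows: &[(i64, Vec<f32>)], k: usize) -> Vec<i64> {
    let mut scored: Vec<(f32, i64)> = rows
        .iter()
        .map(|(id, v)| (cosine_distance(query, v), *id))
        .collect();
    scored.sort_by(|a, b| a.0.total_cmp(&b.0));
    scored.into_iter().take(k).map(|(_, id)| id).collect()
}
```

A scan like this costs one distance computation per row per query, which is why an index is needed at 50k vectors of 1536 dimensions and even more so at 100M.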
Benchmark test
for i in range(1, max, step=3) {
    sleep(duration=2s);
    run_benchmark(
        concurrency=i,
        limit=?,
        duration=10s,
        target={scylla|vector-store}
    );
}
parse_logs();
create_charts();
Dataset 50k, r7i.8xlarge, Scylla 3 x 4 CPUs, Vector Store 3 x 4 CPUs, k=100, recall~=95.3%
~46ms difference in latency
HTTP communication dump between Scylla and Vector Store
40ms delay between TCP frame and ACK
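A delay of roughly 40 ms between a TCP segment and its ACK is typical of Nagle's algorithm interacting with delayed ACKs. The slides do not show the fix; a common remedy, assumed here, is to set TCP_NODELAY on the connection. A minimal sketch using only the Rust standard library (the helper name `connect_no_delay` is illustrative):

```rust
use std::io;
use std::net::{TcpStream, ToSocketAddrs};

/// Opens a TCP connection with Nagle's algorithm disabled (TCP_NODELAY),
/// so small writes are sent immediately instead of being held back
/// until the previous segment has been acknowledged.
fn connect_no_delay<A: ToSocketAddrs>(addr: A) -> io::Result<TcpStream> {
    let stream = TcpStream::connect(addr)?;
    stream.set_nodelay(true)?;
    Ok(stream)
}
```

HTTP client and server libraries usually expose an equivalent option; which one applies to the Scylla to Vector Store link depends on the stack in use.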
Dataset 50k, r7i.8xlarge, Scylla 3 x 4 CPUs, Vector Store 3 x 4 CPUs, k=100, recall~=95.3%
Consistency Level
// Default for the Rust driver
CONSISTENCY LOCAL_QUORUM;
// Every request sends reads to other nodes and waits for 2 responses,
// because a quorum of 3 replicas is 2
SELECT {VECTOR_ID} FROM {KEYSPACE}.{TABLE} ORDER BY {VECTOR} ANN OF ? LIMIT ?
CONSISTENCY ONE;
// The node that receives the request can answer by itself:
// with 3 nodes and RF=3, every node holds a replica
SELECT {VECTOR_ID} FROM {KEYSPACE}.{TABLE} ORDER BY {VECTOR} ANN OF ? LIMIT ?
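The quorum arithmetic behind the two consistency levels can be written out as a small sketch. The `Consistency` enum and `required_responses` function are illustrative only and are not the driver's API:

```rust
/// The two consistency levels compared in the slides.
#[derive(Clone, Copy)]
enum Consistency {
    One,
    LocalQuorum,
}

/// Number of replica responses the coordinator waits for.
/// For LOCAL_QUORUM this is floor(RF / 2) + 1, where RF is the
/// replication factor in the local datacenter.
fn required_responses(cl: Consistency, replication_factor: u32) -> u32 {
    match cl {
        Consistency::One => 1,
        Consistency::LocalQuorum => replication_factor / 2 + 1,
    }
}
```

With RF=3, LOCAL_QUORUM needs 2 responses, so at least one of them comes from a remote node and adds a network round trip, while ONE needs a single response that the coordinator's own replica can supply.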
Dataset 50k, r7i.8xlarge, Scylla 3 x 4 CPUs, Vector Store 3 x 4 CPUs, k=100, recall~=95.3%
Possible assignment of threads to CPUs
Dataset 50k, r7i.8xlarge, Vector Store 1 x 4 CPUs, k=100
CPU usage for a single ANN request
Request received
Response sent
USearch computation
CPU usage for two ANN requests
Request received
Response sent
USearch computation from other request
How to “prioritize” tasks in Rust Tokio
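use std::sync::Arc;
use tokio::sync::Semaphore;

// Allow at most one in-flight USearch computation per CPU core
let semaphore = Arc::new(Semaphore::new(num_cpus));
while let Some(msg) = rx.recv().await {
    // Wait for a free slot before spawning the next computation
    let permit = Arc::clone(&semaphore).acquire_owned().await.expect("semaphore closed");
    tokio::spawn(async move {
        // move the current task to the end of the runtime queue
        tokio::yield_now().await;
        do_usearch_computation(msg);
        // Hold the permit until the computation is done, then free the slot
        drop(permit);
    });
}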
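The effect of the semaphore (never more CPU-heavy searches in flight than there are cores) can be checked in isolation. The sketch below is a standard-library analogue, not the Vector Store code: it swaps Tokio tasks for OS threads and uses a hand-rolled counting semaphore, and every name in it (`Semaphore`, `run_bounded`) is hypothetical.

```rust
use std::sync::atomic::{AtomicUsize, Ordering};
use std::sync::{Arc, Condvar, Mutex};
use std::thread;
use std::time::Duration;

/// Minimal counting semaphore built on Mutex + Condvar.
struct Semaphore {
    permits: Mutex<usize>,
    available: Condvar,
}

impl Semaphore {
    fn new(permits: usize) -> Self {
        Semaphore { permits: Mutex::new(permits), available: Condvar::new() }
    }

    /// Blocks until a permit is free, then takes it.
    fn acquire(&self) {
        let mut permits = self.permits.lock().unwrap();
        while *permits == 0 {
            permits = self.available.wait(permits).unwrap();
        }
        *permits -= 1;
    }

    /// Returns a permit and wakes one waiter.
    fn release(&self) {
        *self.permits.lock().unwrap() += 1;
        self.available.notify_one();
    }
}

/// Runs `jobs` simulated computations with at most `limit` in flight
/// and returns the highest concurrency that was observed.
fn run_bounded(jobs: usize, limit: usize) -> usize {
    let sem = Arc::new(Semaphore::new(limit));
    let running = Arc::new(AtomicUsize::new(0));
    let peak = Arc::new(AtomicUsize::new(0));
    let mut handles = Vec::new();
    for _ in 0..jobs {
        // The dispatcher waits here until a slot is free.
        sem.acquire();
        let (sem, running, peak) =
            (Arc::clone(&sem), Arc::clone(&running), Arc::clone(&peak));
        handles.push(thread::spawn(move || {
            let now = running.fetch_add(1, Ordering::SeqCst) + 1;
            peak.fetch_max(now, Ordering::SeqCst);
            // Stand-in for the CPU-bound ANN search.
            thread::sleep(Duration::from_millis(5));
            running.fetch_sub(1, Ordering::SeqCst);
            sem.release();
        }));
    }
    for handle in handles {
        handle.join().unwrap();
    }
    peak.load(Ordering::SeqCst)
}
```

Calling `run_bounded(16, 4)` never reports more than 4 concurrent jobs, because the dispatcher takes a permit before starting each job and the job returns it only after its work is done.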
Dataset 50k, r7i.8xlarge, Vector Store 1 x 4 CPUs, k=100
Cloud environment
Cluster setup
3 x Scylla i4i.xlarge instances (4 CPUs)
3 x Vector Store r7i.xlarge instances (4 CPUs)
1 Benchmarking tool r7i.8xlarge instance (32 CPUs)
Same DC and AZ
ScyllaDB
2025.4.0~dev-0.20250907.040d6e224559
Vector Store 0.4.0
Parameters
m = 16
ef-construction, ef-search = 128
Cosine distance
K (limit) = 100
ScyllaDB’s recall = 95.3%
Build time: table ~7 s, index ~25 s
Dataset
openai_small_50k from VectorDBBench
50k vectors
1536 dimensions
Dataset 50k, Scylla 3 x i4i.xlarge (4 CPUs), Vector Store 3 x r7i.xlarge (4 CPUs), RF=3
Cloud environment
Cluster setup
3 x Scylla i4i.4xlarge instances (16 CPUs)
3 x Vector Store r7i.8xlarge instances (16 CPUs)
1 Benchmarking tool r7i.8xlarge instance (32 CPUs)
Same DC and AZ
ScyllaDB
2025.4.0~dev-0.20250907.040d6e224559
Vector Store 0.4.0
Parameters
m = 16
ef-construction, ef-search = 128
Cosine distance
Dataset
openai_small_50k from VectorDBBench
50k vectors
1536 dimensions
Dataset 50k, Scylla 3 x i4i.4xlarge (16 CPUs), Vector Store 3 x r7i.8xlarge (16 CPUs), RF=3
Cloud environment
Cluster setup
3 x Scylla i4i.8xlarge instances (32 CPUs)
3 x Vector Store r7i.16xlarge instances (32 CPUs)
1 Benchmarking tool r7i.8xlarge instance (32 CPUs)
Same DC and AZ
ScyllaDB 2025.4.0~dev-0.20250910.ce4592d8fcf8
Vector Store 0.3.0 + task scheduler with yield
Parameters
m = 16
ef-construction, ef-search = 128
Cosine distance
Dataset
openai_small_50k from VectorDBBench
50k vectors
1536 dimensions
Dataset
Laion_large_100m from VectorDBBench
100M vectors
768 dimensions
Parameters
m = 64
ef-construction, ef-search = 512
Euclidean distance
Table build time ~43 min
Index build time ~7 h
Dataset 100M, Scylla 3 x i4i.8xlarge (32 CPUs), Vector Store 3 x r7i.16xlarge (64 CPUs), RF=3