Vector Search at Scale - Pro Tips - Stephen Batifol
42 slides
Oct 14, 2024
About This Presentation
Have you ever started a project, downloaded a random vector DB, and added some vectors to it, only to realize that it was a struggle to scale? In this talk we'll go through the classic journey of an AI developer who doesn't know much about vector DBs, and we'll cover the mistakes to avoid.
We might have a demo of running Vector Search with 500M vectors 🙈
Size: 5.36 MB
Language: en
Slide Content
Stephen Batifol
Unstructured Data Meetup
Exploring Vector Search at Scale
⇒ It's time to go Distributed!!
The Distributed Dilemma
Sharding: Divide and Conquer
You brainstorm different sharding approaches:
● Random sharding: simple, but may lead to uneven distribution.
● Hash-based sharding: more even distribution, but potentially tricky with updates, since changing the shard count remaps most keys.
● Semantic sharding: groups similar vectors together, which could speed up certain types of queries.
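Hash-based routing is only a few lines. A minimal sketch in plain Python (not a Milvus API; `shard_for` and `NUM_SHARDS` are illustrative names):

```python
import hashlib

NUM_SHARDS = 4

def shard_for(vector_id: str, num_shards: int = NUM_SHARDS) -> int:
    """Map a vector ID to a shard via a stable hash.

    A stable hash spreads keys evenly, but note the pitfall mentioned
    above: changing num_shards remaps almost every key, which is what
    makes updates and resharding tricky.
    """
    digest = hashlib.md5(vector_id.encode()).hexdigest()
    return int(digest, 16) % num_shards

# Distribution check over many keys: counts come out roughly even.
counts = [0] * NUM_SHARDS
for i in range(10_000):
    counts[shard_for(f"vec-{i}")] += 1
print(counts)
```

Consistent hashing (a hash ring) is the usual fix for the resharding problem, at the cost of some extra bookkeeping.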
You sketch out a plan:
1. Incoming data gets routed to the appropriate
language-specific partition.
2. Queries first determine the relevant language(s).
3. The search is executed only on the relevant partition.
This could reduce the search space for many queries,
leading to faster results!
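The three steps of the plan can be sketched end to end. Everything here is illustrative (the toy `partitions` dict and `detect_language` stand in for a real partitioned store and a real language detector):

```python
# Hypothetical language-partitioned store: two partitions, each a list
# of (doc_id, vector) pairs.
partitions = {
    "en": [("doc1", [0.1, 0.9]), ("doc2", [0.8, 0.2])],
    "de": [("doc3", [0.5, 0.5])],
}

def detect_language(query: str) -> str:
    # Stand-in for a real language detector (step 2 of the plan).
    return "de" if any(ch in query for ch in "üßö") else "en"

def search(query: str, query_vec, top_k: int = 1):
    lang = detect_language(query)
    # Step 3: search only the relevant partition, shrinking the search space.
    candidates = partitions[lang]
    scored = sorted(
        candidates,
        key=lambda item: sum((a - b) ** 2 for a, b in zip(item[1], query_vec)),
    )
    return [doc_id for doc_id, _ in scored[:top_k]]

print(search("hello", [0.1, 0.9]))  # only the "en" partition is scanned
```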
Partitioning!
Scaling Horizontally: The Replica Game
●You decide to implement a replica system:
○Each shard has multiple replicas.
○Read requests are load-balanced across replicas.
○Write operations are synchronized across all replicas of a
shard.
This setup allows you to scale out your read capacity simply by
adding more replicas.
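The read/write split above can be sketched in a few lines of plain Python (the `ReplicatedShard` class is illustrative, not how Milvus implements replicas):

```python
import itertools

class ReplicatedShard:
    """Sketch: reads round-robin across replicas, writes fan out to all."""

    def __init__(self, n_replicas: int):
        self.replicas = [dict() for _ in range(n_replicas)]
        self._next = itertools.cycle(range(n_replicas))

    def write(self, key, value):
        # Write operations are synchronized across all replicas of the shard.
        for replica in self.replicas:
            replica[key] = value

    def read(self, key):
        # Read requests are load-balanced: each call hits the next replica.
        return self.replicas[next(self._next)][key]

shard = ReplicatedShard(n_replicas=3)
shard.write("v1", [0.1, 0.2])
print([shard.read("v1") for _ in range(3)])  # each read hits a different replica
```

The synchronous fan-out on writes is exactly where the consistency-vs-availability trade-off of the next slide comes from: waiting for all replicas gives strong consistency but makes writes only as available as the slowest replica.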
Consistency vs Availability
What about Real-Time?!
Users want it now!
●Write Ahead Logs
●Two-tier Systems
●Incremental Indexing
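The first and third ideas combine naturally: append each mutation to a durable log first, then apply it incrementally to the in-memory index. A toy sketch (`TinyWAL` is illustrative; a real system would batch, checkpoint, and compact):

```python
import json, os, tempfile

class TinyWAL:
    """Sketch of a write-ahead log: persist the mutation first,
    then apply it incrementally to the in-memory index."""

    def __init__(self, path):
        self.path = path
        self.index = {}

    def insert(self, key, vector):
        with open(self.path, "a") as f:
            f.write(json.dumps({"op": "insert", "key": key, "vec": vector}) + "\n")
            f.flush()
            os.fsync(f.fileno())  # durable before acknowledging the write
        self.index[key] = vector  # incremental update, no full rebuild

    def recover(self):
        # After a crash, replay the log to rebuild the in-memory state.
        self.index = {}
        with open(self.path) as f:
            for line in f:
                rec = json.loads(line)
                self.index[rec["key"]] = rec["vec"]

path = os.path.join(tempfile.mkdtemp(), "wal.log")
wal = TinyWAL(path)
wal.insert("v1", [0.1, 0.2])
wal.index.clear()  # simulate losing in-memory state
wal.recover()
print(wal.index)   # state restored from the log
```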
And Monitoring?! Keeping the Beast in Check
You set up dashboards to track:
●Query latency across different shards and replicas
●Index update times
●Resource utilization (CPU, memory, disk I/O)
●Shard balance and data distribution
You also implement automated processes for:
●Rebalancing shards when they become uneven
●Adding or removing replicas based on load
●Performing rolling updates to minimize downtime
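The rebalancing trigger in the first automated process can be sketched as a simple skew check plus a greedy move plan (both functions are illustrative; real rebalancers move whole segments, not rows, and throttle the migration):

```python
def needs_rebalance(shard_sizes, tolerance=0.25):
    """Flag a shard set when any shard drifts more than `tolerance`
    from the mean size."""
    mean = sum(shard_sizes) / len(shard_sizes)
    return any(abs(s - mean) / mean > tolerance for s in shard_sizes)

def plan_moves(shard_sizes):
    """Greedy sketch: repeatedly move one row from the fullest shard to
    the emptiest until sizes are within one row of each other."""
    sizes = list(shard_sizes)
    moves = []
    while max(sizes) - min(sizes) > 1:
        src = sizes.index(max(sizes))
        dst = sizes.index(min(sizes))
        sizes[src] -= 1
        sizes[dst] += 1
        moves.append((src, dst))
    return sizes, moves

print(needs_rebalance([100, 105, 98]))  # balanced: no action
print(needs_rebalance([100, 300, 98]))  # skewed: trigger rebalancing
```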
To Host or Not to Host?
As your system grows, you start weighing the pros and cons of cloud-hosted
solutions versus managing your own infrastructure. You consider factors like:
●Scalability and elasticity
●Operational overhead
●Cost predictability
●Data privacy and compliance requirements
Fully Distributed Architecture
[Architecture diagram: the Access Layer (SDK → Load Balancer → Proxies) sends DDL/DCL to the Coordinator Service (Root, Query, Data, and Index coordinators, with etcd as meta storage) and DML to the Log Broker (message storage); notifications and control signals flow to the Worker Nodes (Query Node, Data Node, Index Node), which persist log snapshots, delta files, and index files in Object Storage (MinIO / S3 / Azure Blob). Together these form the vector database.]
29K GitHub Stars · 25M Downloads · 250 Contributors · 2,600+ Forks
Milvus is an open-source vector database for GenAI projects. pip install on your
laptop, plug into popular AI dev tools, and push to production with a single line of
code.
● Easy Setup: pip install pymilvus to start coding in a notebook within seconds.
● Reusable Code: write once, and deploy into the production environment with one line of code.
● Integration: plug into OpenAI, LangChain, LlamaIndex, and many more.
● Feature-rich: dense & sparse embeddings, filtering, reranking, and beyond.
Compute Types
● Designed for various compute targets, such as AVX-512 and Neon for SIMD, quantization, cache-aware optimization, and GPUs.
● Leverages the strengths of each hardware type, ensuring high-speed processing and cost-effective scalability for different application needs.
Search Types
● Supports multiple types such as top-K ANN, range ANN, sparse & dense, multi-vector, grouping, and metadata filtering.
● Enables query flexibility and accuracy, allowing developers to tailor their information retrieval needs.
Multi-tenancy
● Enables multi-tenancy through collection and partition management.
● Allows for efficient resource utilization and customizable data segregation, ensuring secure and isolated data handling for each tenant.
Index Types
● Offers 15 supported index types, including popular ones like HNSW, PQ, Binary, Sparse, DiskANN, and GPU indexes.
● Empowers developers with tailored search optimizations, catering to performance, accuracy, and cost needs.
We’ve built technologies for various types of use
cases
2024
Milvus Lite
● Ideal for prototyping, small-scale experiments.
● Easy to set up and use: pip install pymilvus.
● Scales to ≈1M vectors.
Milvus Standalone
● Single-node deployment.
● Bundled in a single Docker image.
● Supports primary/secondary.
● Scales up to 100M vectors.
Milvus Distributed
● Runs on K8s.
● Load balancer and multi-node management.
● Each component scales independently.
● Scales to 100B vectors.
Ready to scale?
Write your code once, and run it everywhere, at scale!
● API and SDK are the same
Milvus & Open-Source
● MinIO: stores vectors and indexes; enables Milvus' stateless architecture.
● Kafka / Pulsar: handles the data insertion stream, internal component communication, and real-time updates to Milvus.
● Prometheus / Grafana: collects metrics from Milvus and provides real-time monitoring dashboards.
● Kubernetes: Milvus Operator CRDs.
Milvus Data Structures
Shard
●Boost the ingestion rate
Segment
●A single unit of Data in Milvus.
Segment < Partition < Collection
Growing Segment
●Directly retrieves data from the
message queue for rapid service.
Utilizes a brute-force index and
prioritizes data freshness.
Sealed Segment
●An immutable segment that uses indexing methods to guarantee search efficiency.
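The growing/sealed split above can be sketched in plain Python. This is illustrative only: the "index" in the sealed segment is just precomputed vector norms, standing in for a real structure like HNSW or IVF, and the class names are mine, not Milvus's.

```python
import math

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

class GrowingSegment:
    """Mutable and index-free: brute-force scan, maximal data freshness."""
    def __init__(self):
        self.rows = []
    def insert(self, key, vec):
        self.rows.append((key, vec))
    def search(self, q, top_k):
        return sorted(self.rows, key=lambda r: -cosine(r[1], q))[:top_k]

class SealedSegment:
    """Immutable: built once when the segment is sealed. Here the 'index'
    is just precomputed norms, a stand-in for HNSW/IVF/etc."""
    def __init__(self, rows):
        self.rows = [(k, v, math.sqrt(sum(x * x for x in v))) for k, v in rows]
    def search(self, q, top_k):
        qn = math.sqrt(sum(x * x for x in q))
        scored = sorted(
            self.rows,
            key=lambda r: -(sum(x * y for x, y in zip(r[1], q)) / (r[2] * qn)))
        return [(k, v) for k, v, _ in scored[:top_k]]

g = GrowingSegment()
g.insert("a", [1.0, 0.0])
g.insert("b", [0.0, 1.0])
sealed = SealedSegment(g.rows)  # "sealing" the segment freezes it
print(sealed.search([0.9, 0.1], top_k=1)[0][0])
```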
Distributed Architecture

Query Node: Serving Search Requests
● Subscribes to the log broker for real-time querying.
● Converts new data into Growing Segments - temporary in-memory structures for the latest information.
● Accesses Sealed Segments from object storage for comprehensive searches.
● Performs hybrid searches combining vector and scalar data for accurate retrieval.
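The hybrid-search step can be sketched as a scalar pre-filter followed by vector ranking (a plain-Python illustration; the toy `rows` table and `hybrid_search` are mine, not a Milvus API):

```python
import math

# Toy table: each row carries a vector plus a scalar attribute.
rows = [
    {"id": 1, "vec": [1.0, 0.0], "year": 2023},
    {"id": 2, "vec": [0.9, 0.1], "year": 2021},
    {"id": 3, "vec": [0.0, 1.0], "year": 2023},
]

def hybrid_search(query_vec, scalar_filter, top_k=2):
    def cos(a, b):
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(x * x for x in b))
        return dot / (na * nb)
    # Scalar filter first, then rank the survivors by vector similarity.
    candidates = [r for r in rows if scalar_filter(r)]
    ranked = sorted(candidates, key=lambda r: -cos(r["vec"], query_vec))
    return [r["id"] for r in ranked[:top_k]]

# Only 2023 rows are considered, then ranked by cosine similarity.
print(hybrid_search([1.0, 0.0], lambda r: r["year"] == 2023))
```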
Data Node: Processing Data Updates
● Subscribes to the log broker for real-time updates.
● Processes mutation requests for data changes or updates.
● Packs log data into log snapshots - compressed bundles of updates.
● Stores log snapshots in object storage for persistence and scalability.

Index Node: Building Search Indexes
● Builds indexes on the data to facilitate faster search operations.
● Can be implemented using a serverless framework for cost-efficiency and scalability.
Index Building
To avoid frequent index rebuilding on data updates, a collection in Milvus is divided further into segments, each with its own index.
Scalable Search
● Distributed search across shards
● Parallel processing
● Query optimization
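Distributed search across shards is a scatter-gather: query every shard in parallel, then merge the per-shard top-k lists into a global top-k. A minimal sketch in plain Python (illustrative; the toy `shards` data and function names are mine):

```python
import heapq
from concurrent.futures import ThreadPoolExecutor

# Toy shards: lists of (doc_id, similarity score) pairs.
shards = [
    [("a", 0.91), ("b", 0.40)],
    [("c", 0.88), ("d", 0.12)],
    [("e", 0.95)],
]

def search_shard(shard, top_k):
    # Each shard returns its local top-k, highest score first.
    return sorted(shard, key=lambda r: -r[1])[:top_k]

def distributed_search(top_k=2):
    # Scatter: query all shards in parallel.
    with ThreadPoolExecutor() as pool:
        partials = list(pool.map(lambda s: search_shard(s, top_k), shards))
    # Gather: merge the sorted partial results into a global top-k.
    merged = heapq.merge(*partials, key=lambda r: -r[1])
    return [doc for doc, _ in list(merged)[:top_k]]

print(distributed_search())
```

Each shard only needs to return k results, so the merge step touches at most k × num_shards rows regardless of total data size.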
But at what Scale? (2024)
● 10B vectors of 1536 dimensions in a single Milvus/Zilliz Cloud instance.
● 100B vectors in one of the largest deployments, running on K8s.