WMF 2024 - Unlocking the Future of Data Powering Next-Gen AI with Vector Databases

foogaro 62 views 43 slides Jun 14, 2024

About This Presentation

Vector databases are transforming how we handle data, allowing us to search through text, images, and audio by converting them into vectors. Today, we'll dive into the basics of this exciting technology and discuss its potential to revolutionize our next-generation AI applications. We'll exa...


Slide Content

Luigi Fugaro
Senior Solution Architect @ Redis
Unlocking the Future of Data:
Powering Next-Gen AI
with Vector Databases

Agenda

1. Data Review
2. Vector Embeddings
3. Vector Database
4. Demo - Let’s see some code

Data Review
1 of 4

Data Review
Let’s start with a metric



Around 80% of the data generated by organizations is unstructured.
[Figure: growth chart]
IDC Report 2023 - https://www.box.com/resources/unstructured-data-paper

Data Review
Data Types
[Figure: data types vs. growth]
●Unstructured: no inherent structure (~ PDFs, images, audio, video)
●Quasi-Structured: erratic patterns/formats (~ clickstreams)
●Semi-Structured: there's a discernible pattern (~ spreadsheets / XML / JSON)
●Structured: schema/defined data model (~ database)
IDC Report 2023 - https://www.box.com/resources/unstructured-data-paper

How to deal with Unstructured Data?
Common approaches were:

●Labeling
●Tagging
Data Review

Labeling and Tagging
Feature          Value
Frame Color      Green
Tire Color       Brown
Has Rear Rack    Yes
Has Fenders      Yes
Has Safety Bell  No
Has Fat Tires    Yes

Feature          Value
Frame Color      Matte Olive
Tire Color       Orange
Has Rear Rack    Yes
Has Fenders      Yes
Has Safety Bell  Yes
Has Fat Tires    Yes
Data Review

Labeling and Tagging
Feature          Value
Easy Assembly    ⭐⭐⭐⭐⭐
Chain Quality    ⭐⭐⭐
Seat Comfort     ⭐
Gear Smoothness  ⭐⭐⭐⭐
Data Review

How to deal with Unstructured Data?
Labeling and Tagging are
labor intensive,
subjective and error-prone

What’s the new approach?
Data Review

Vector Embeddings
2 of 4

Vector Embeddings
What is a Vector?

Numeric representation of something
in N-dimensional space, using floating-point numbers

Can represent anything:
entire documents, images, video, audio…

Vector Embeddings
How to turn Data into Vectors?

It’s quite a complex process,
based primarily on Neural Networks

Vector Embeddings
How to turn Data into Vectors?

Don’t be scared: Machine Learning and Deep Learning
have leaped forward in the last decade, and we can all
benefit from a huge ecosystem of ready-to-use Models!




Each Model has its own specific task!

Vector Embeddings
Music → Audio Model
Video → Video Model
Images → Vision Model
Faces → Face Detection/Recognition Models
Poses → Vision Model Trained on Poses
Emotions → Sentiment Model Embeddings

Models quantify features of the item
Vector Embeddings
Why vector embeddings?
They are comparable!

Visual representation
Vector Embeddings
Semantic Relationship Syntactic Relationship

Visual representation
Vector Embeddings
https://jalammar.github.io/illustrated-word2vec
“King”

[ 0.50451, 0.68607, -0.59517, -0.022801, 0.60046, -0.13498, -0.08813, 0.47377, -0.61798, -0.31012,
-0.076666, 1.493, -0.034189, -0.98173, 0.68229, 0.81722, -0.51874, -0.31503, -0.55809, 0.66421,
0.1961, -0.13495, -0.11476, -0.30344, 0.41177, -2.223, -1.0756, -1.0783, -0.34354, 0.33505,
1.9927, -0.04234, -0.64319, 0.71125, 0.49159, 0.16754, 0.34344, -0.25663, -0.8523, 0.1661,
0.40102, 1.1685, -1.0137, -0.21585, -0.15155, 0.78321, -0.91241, -1.6106, -0.64426, -0.51042 ]

So, is it all about arithmetic operations?
Vector Embeddings

What else?
There is one main operation that you can do,
and it’s called Similarity Search!

Vector Similarity Search Algorithms
Vector Embeddings

Vector Embeddings
Cosine Similarity
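
The slide names cosine similarity without spelling it out. As a reminder, and using the standard textbook definition rather than anything specific to this deck, the similarity of two N-dimensional vectors a and b is the cosine of the angle between them. The vectors in the worked example are made up for illustration.

```latex
\[
\cos(\theta)
  = \frac{\mathbf{a}\cdot\mathbf{b}}{\lVert\mathbf{a}\rVert\,\lVert\mathbf{b}\rVert}
  = \frac{\sum_{i=1}^{N} a_i b_i}{\sqrt{\sum_{i=1}^{N} a_i^2}\,\sqrt{\sum_{i=1}^{N} b_i^2}}
\]
% Worked example with a = (1, 2, 3) and b = (2, 4, 6), which point the same way:
\[
\cos(\theta)
  = \frac{1\cdot 2 + 2\cdot 4 + 3\cdot 6}{\sqrt{14}\,\sqrt{56}}
  = \frac{28}{28}
  = 1
\]
```

A value of 1 means the vectors point in the same direction, 0 means they are orthogonal, and -1 means they point in opposite directions. Only direction matters, not length.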

Now that we have Vector Embeddings?
Vector Embeddings
We need a database to store them!
Nope, we need a Vector Database!

Vector Database
3 of 4

Vector Database
Music → Audio Model
Video → Video Model
Images → Vision Model
Faces → Face Detection/Recognition Models
Poses → Vision Model Trained on Poses
Emotions → Sentiment Model Embeddings
REDIS

What does a Vector DB need to do?

❏Store data
❏Index data
❏Query data

Does Redis have all of 'em?

You bet, and much more!
Vector Database

Vector indexing algorithms
Redis manages vectors in an index data structure to enable intelligent similarity search that
balances search speed and search quality. Choose from two popular techniques: FLAT (a
brute-force approach) and HNSW (Hierarchical Navigable Small World, a faster but approximate
approach).
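
To make the FLAT idea concrete, here is a minimal plain-Java sketch of a brute-force scan: the query is compared with every stored vector and the k closest are kept. The class and method names (`FlatSearchSketch`, `flatKnn`, `squaredL2`) are hypothetical and this is not how Redis implements it internally. HNSW is not shown, since it builds a layered graph to avoid scanning everything.

```java
import java.util.ArrayList;
import java.util.Comparator;
import java.util.List;

// Illustrative only: a brute-force ("FLAT"-style) k-nearest-neighbour scan.
// Names are hypothetical; this is not Redis code.
final class FlatSearchSketch {

    // Squared Euclidean distance; sufficient for ranking because sqrt preserves order.
    static double squaredL2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Compares the query with every stored vector and returns the indices of the k closest,
    // nearest first.
    static List<Integer> flatKnn(List<float[]> stored, float[] query, int k) {
        List<Integer> ids = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            ids.add(i);
        }
        ids.sort(Comparator.comparingDouble(i -> squaredL2(stored.get(i), query)));
        return new ArrayList<>(ids.subList(0, Math.min(k, ids.size())));
    }

    public static void main(String[] args) {
        List<float[]> stored = List.of(
            new float[] {0f, 0f},
            new float[] {1f, 1f},
            new float[] {5f, 5f});
        System.out.println(flatKnn(stored, new float[] {0.9f, 0.9f}, 2));
    }
}
```

The scan is exact but its cost grows linearly with the number of stored vectors, which is the trade-off an approximate index such as HNSW is designed to avoid.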
Vector search distance metrics
Redis uses a distance metric to measure the similarity between two vectors. Choose from
three popular metrics – Euclidean, Inner Product, and Cosine Similarity – used to calculate
how “close” or “far apart” two vectors are.
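
As a rough illustration of the three metrics, the sketch below writes them out in plain Java using their usual mathematical definitions. The class and method names are hypothetical, and the exact score a given engine reports may differ (for instance, cosine is often reported as a distance, 1 minus the similarity).

```java
// Illustrative only: the three metrics in their textbook form.
// Names are hypothetical; this is not Redis code.
final class DistanceMetricsSketch {

    static void checkSameLength(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
    }

    // Inner (dot) product: larger means more similar.
    static double innerProduct(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            sum += (double) a[i] * b[i];
        }
        return sum;
    }

    // Euclidean (L2) distance: smaller means closer.
    static double euclideanDistance(float[] a, float[] b) {
        checkSameLength(a, b);
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Cosine similarity: dot product of the vectors divided by the product of their lengths.
    static double cosineSimilarity(float[] a, float[] b) {
        double norms = Math.sqrt(innerProduct(a, a) * innerProduct(b, b));
        if (norms == 0.0) {
            throw new IllegalArgumentException("Cosine similarity is undefined for a zero vector");
        }
        return innerProduct(a, b) / norms;
    }

    public static void main(String[] args) {
        float[] a = {1f, 0f};
        float[] b = {0f, 1f};
        System.out.println("inner product:     " + innerProduct(a, b));
        System.out.println("euclidean:         " + euclideanDistance(a, b));
        System.out.println("cosine similarity: " + cosineSimilarity(a, b));
    }
}
```

Note the opposite directions: a higher inner product or cosine similarity means "closer", while a lower Euclidean distance means "closer".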
Powerful hybrid filtering
Take advantage of the full suite of search features available in Redis query and search.
Enhance your workflows by combining the power of vector similarity with more traditional
geo, numeric, text, and tag filters. Incorporate more business logic into queries and simplify
client application code.
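
The idea of hybrid filtering can be sketched in a few lines of plain Java: first narrow the candidates with an ordinary metadata filter, then rank the survivors by vector distance. Everything here (`HybridFilterSketch`, `Item`, `hybridKnn`) is a hypothetical in-memory stand-in, not the Redis query API, and a real engine may choose a different execution strategy.

```java
import java.util.Comparator;
import java.util.List;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Illustrative only: metadata filter + vector ranking in memory.
// Names are hypothetical; this is not Redis code.
final class HybridFilterSketch {

    // Hypothetical item: some metadata plus its embedding.
    static final class Item {
        final String name;
        final int height;
        final float[] embedding;

        Item(String name, int height, float[] embedding) {
            this.name = name;
            this.height = height;
            this.embedding = embedding;
        }
    }

    // Squared Euclidean distance, used only for ranking.
    static double squaredL2(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return sum;
    }

    // Keeps the items that pass the metadata filter, then returns the k nearest to the query.
    static List<Item> hybridKnn(List<Item> items, Predicate<Item> filter, float[] query, int k) {
        return items.stream()
            .filter(filter)
            .sorted(Comparator.comparingDouble((Item item) -> squaredL2(item.embedding, query)))
            .limit(k)
            .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        List<Item> items = List.of(
            new Item("a", 100, new float[] {0f, 0f}),
            new Item("b", 500, new float[] {0.1f, 0.1f}),
            new Item("c", 120, new float[] {1f, 1f}));
        // Numeric filter (height <= 200) combined with a 2-nearest-neighbour ranking.
        for (Item hit : hybridKnn(items, item -> item.height <= 200, new float[] {0f, 0f}, 2)) {
            System.out.println(hit.name);
        }
    }
}
```

In the example, item "b" is the second-closest vector overall but is excluded by the numeric filter, so the business rule takes effect before the similarity ranking.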
Redis as Vector DB
Vector Database

Redis as Vector DB
Real-time updates
Real-time search and recommendation systems generate large volumes of
changing data. New images, text, products, or metadata? Perform updates,
insertions, and deletes to the search index seamlessly as your dataset changes
over time. Redis Enterprise reduces the costly impact of stale data.
Vector range queries
Traditional vector search is performed by finding the “top K” most similar
vectors. As an alternative, Redis Enterprise can also return all relevant content
within a predefined similarity range or threshold, which offers a more
flexible search experience.
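
The difference from top-K can be sketched as follows: instead of a fixed number of results, a range query returns every vector whose distance to the query is within a threshold. This is a plain-Java illustration with hypothetical names (`RangeQuerySketch`, `withinRadius`), using Euclidean distance as an assumed metric. It is not the Redis query syntax.

```java
import java.util.ArrayList;
import java.util.List;

// Illustrative only: a range (radius) query over in-memory vectors.
// Names are hypothetical; this is not Redis code.
final class RangeQuerySketch {

    static double euclideanDistance(float[] a, float[] b) {
        if (a.length != b.length) {
            throw new IllegalArgumentException("Vectors must have the same dimension");
        }
        double sum = 0.0;
        for (int i = 0; i < a.length; i++) {
            double d = (double) a[i] - b[i];
            sum += d * d;
        }
        return Math.sqrt(sum);
    }

    // Returns the indices of all stored vectors within `radius` of the query (inclusive).
    // The result size depends on the data, unlike a top-K query.
    static List<Integer> withinRadius(List<float[]> stored, float[] query, double radius) {
        List<Integer> hits = new ArrayList<>();
        for (int i = 0; i < stored.size(); i++) {
            if (euclideanDistance(stored.get(i), query) <= radius) {
                hits.add(i);
            }
        }
        return hits;
    }

    public static void main(String[] args) {
        List<float[]> stored = List.of(
            new float[] {0f, 0f},
            new float[] {3f, 4f},
            new float[] {6f, 8f});
        System.out.println(withinRadius(stored, new float[] {0f, 0f}, 5.0));
    }
}
```

A range query can return zero results or many, so callers typically handle both cases rather than assuming a fixed result count.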
Vector Database

Let’s see some code
4 of 4

Demo - Plan B!
spring.data.redis.host=35.187.74.111
spring.data.redis.port=12000
spring.data.redis.username=default
spring.data.redis.password=redis
server.port=8080

spring.mvc.hiddenmethod.filter.enabled=true
com.redis.om.vss.useLocalImages=false
com.redis.om.vss.maxLines=300

redis.om.spring.djl.enabled=true

redis.om.spring.djl.image-embedding-model-engine=PyTorch
redis.om.spring.djl.image-embedding-model-model-urls=djl://ai.djl.pytorch/resnet18_embedding

redis.om.spring.djl.sentence-tokenizer-max-length=768
redis.om.spring.djl.sentence-tokenizer-model=sentence-transformers/all-mpnet-base-v2
redis.om.spring.djl.sentence-tokenizer-model-max-length=768

redis.om.spring.djl.face-detection-model-engine=PyTorch
redis.om.spring.djl.face-detection-model-name=retinaface
redis.om.spring.djl.face-detection-model-model-urls=https://resources.djl.ai/test-models/pytorch/retinaface.zip

redis.om.spring.djl.face-embedding-model-engine=PyTorch
redis.om.spring.djl.face-embedding-model-name=face_feature
redis.om.spring.djl.face-embedding-model-model-urls=https://resources.djl.ai/test-models/pytorch/face_feature.zip

Demo - Plan B!
@Document
public class ImageData {
    @Id
    private String id;

    @Indexed
    private String name;

    @Indexed
    private int height;

    @Indexed
    private int width;

    // Vector field: 512-dimensional FLOAT32 embedding, HNSW index, L2 distance
    @Indexed(schemaFieldType = SchemaFieldType.VECTOR,
             algorithm = VectorField.VectorAlgorithm.HNSW,
             type = VectorType.FLOAT32,
             dimension = 512,
             distanceMetric = DistanceMetric.L2,
             initialCapacity = 10)
    private float[] imageEmbedding;

    // The image at imagePath is embedded with the face model; the result goes into imageEmbedding
    @Vectorize(destination = "imageEmbedding", embeddingType = EmbeddingType.FACE)
    private String imagePath;

    // Set by the service from the search score
    @Indexed
    private double score = 0;

    ...

}

Demo - Plan B!
@Service
public class BestOfMatchService {

    @Autowired
    private EntityStream entityStream;
    @Autowired
    public ZooModel<Image, float[]> faceEmbeddingModel;

    // K and floatArrayToByteArray are not shown on this slide; they are defined elsewhere in the demo
    private List<ImageData> matchAll(byte[] image, int limit) {
        List<ImageData> imageDataList = new ArrayList<>();
        try (Predictor<Image, float[]> predictor = faceEmbeddingModel.newPredictor()) {
            // Turn the uploaded image bytes into a face embedding
            ByteArrayInputStream byteArrayInputStream = new ByteArrayInputStream(image);
            Image img = ImageFactory.getInstance().fromInputStream(byteArrayInputStream);
            float[] embedding = predictor.predict(img);
            byte[] embeddingAsByteArray = floatArrayToByteArray(embedding);

            // KNN search on the vector field, smallest distance first
            SearchStream<ImageData> stream = entityStream.of(ImageData.class);
            List<Pair<ImageData, Double>> matchWithScore = stream
                .filter(ImageData$.IMAGE_EMBEDDING.knn(K, embeddingAsByteArray))
                .sorted(ImageData$._IMAGE_EMBEDDING_SCORE, SortedField.SortOrder.ASC)
                .limit(limit)
                .map(Fields.of(ImageData$._THIS, ImageData$._IMAGE_EMBEDDING_SCORE))
                .collect(Collectors.toList());

            // Copy each score onto its entity before returning
            for (Pair<ImageData, Double> pair : matchWithScore) {
                ImageData imageData = pair.getFirst();
                Double score = pair.getSecond();
                imageData.setScore(score);
                imageDataList.add(imageData);
            }
            return imageDataList;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}
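
The service calls `floatArrayToByteArray` without showing its body. One plausible shape for such a helper is sketched below: it packs each float as 4 bytes in little-endian order, which is an assumption about the byte layout the index expects for FLOAT32 vectors. The class name `VectorBytesSketch` and the reverse helper are hypothetical, and the helper actually used in the demo may differ.

```java
import java.nio.ByteBuffer;
import java.nio.ByteOrder;

// Illustrative only: packing a float[] embedding into raw bytes and back.
// Little-endian order is an assumption; the demo's real helper is not shown in the slides.
final class VectorBytesSketch {

    // Writes each float as 4 bytes, little-endian.
    static byte[] floatArrayToByteArray(float[] values) {
        ByteBuffer buffer = ByteBuffer.allocate(values.length * Float.BYTES)
            .order(ByteOrder.LITTLE_ENDIAN);
        for (float value : values) {
            buffer.putFloat(value);
        }
        return buffer.array();
    }

    // Reverse operation, handy for checking the round trip.
    static float[] byteArrayToFloatArray(byte[] bytes) {
        if (bytes.length % Float.BYTES != 0) {
            throw new IllegalArgumentException("Length must be a multiple of 4 bytes");
        }
        ByteBuffer buffer = ByteBuffer.wrap(bytes).order(ByteOrder.LITTLE_ENDIAN);
        float[] values = new float[bytes.length / Float.BYTES];
        for (int i = 0; i < values.length; i++) {
            values[i] = buffer.getFloat();
        }
        return values;
    }

    public static void main(String[] args) {
        byte[] bytes = floatArrayToByteArray(new float[] {1.0f, -2.5f});
        System.out.println(bytes.length + " bytes");
        System.out.println(java.util.Arrays.toString(byteArrayToFloatArray(bytes)));
    }
}
```

A 512-dimensional FLOAT32 embedding, as declared on `imageEmbedding`, would occupy 512 × 4 = 2048 bytes in this layout.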

Wrap-up
1,2,3,4

Wrap up
Unlocking the Future of Data:
Powering Next-Gen AI with Vector Databases
#WMF2024

1. Data
2. Vector Embeddings
3. Vector Database
4. Redis

VOTE FOR THIS TALK ON IBRIDA
Luigi Fugaro
Senior Solution Architect @ Redis

For further information, you can write to us at
[email protected]

www.wemakefuture.it