DBTA Round Table with Zilliz and Airbyte - Unstructured Data Engineering

bunkertor 53 views 15 slides Oct 15, 2024

About This Presentation

https://www.dbta.com/Webinars/2076-Data-Engineering-Best-Practices-for-AI.htm





Data Engineering Best Practices for AI

Data engineering is the backbone of AI systems. After all, the success of AI models heavily depends on...


Slide Content

Introduction to Unstructured Data, Vector Database and Gen AI
Tim Spann @ Zilliz

© Copyright 2024 Zilliz
Tim Spann
Principal Developer Advocate
Zilliz
https://www.linkedin.com/in/timothyspann/
https://x.com/PaaSDev

Vector search is the new paradigm

…powers search across various types of apps

Retrieval Augmented Generation (RAG): Expand LLMs' knowledge by incorporating external data sources into LLMs and your AI applications.
Recommender System (RecSys): Match user behavior or content features with other similar ones to make effective recommendations.
Text/Semantic Similarity Search: Search for semantically similar texts across vast amounts of natural language documents.
Molecular Similarity Search: Search for similar substructures, superstructures, and other structures for a specific molecule.
Fraud & Anomaly Detection: Detect data points, events, and observations that deviate significantly from the usual pattern.
Multimodal Similarity Search: Search over multiple types of data simultaneously, e.g. text, audio, images, video.
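
The use cases above all reduce to the same operation: rank stored vectors by their similarity to a query vector. A minimal sketch of that idea in plain Python follows. The item names and three-dimensional vectors are hypothetical stand-ins for real embeddings, and the brute-force scan is an assumption made for clarity; it is not how a vector database indexes data.

```python
import math

def cosine_similarity(a, b):
    """Cosine of the angle between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def top_k(query, items, k=2):
    """Return the k (name, score) pairs most similar to the query vector."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in items.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]

# Hypothetical 3-dimensional "embeddings"; real models emit far more dimensions.
ITEMS = {
    "cat photo": [0.9, 0.1, 0.0],
    "dog photo": [0.8, 0.3, 0.1],
    "tax form": [0.0, 0.2, 0.9],
}

# The query vector points in roughly the same direction as the two photos.
results = top_k([1.0, 0.2, 0.0], ITEMS, k=2)
print([name for name, _ in results])
```

A brute-force scan like this costs one comparison per stored vector, which is why dedicated indexes matter once collections grow large.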

Mission: Helping organizations make sense of unstructured data.
Founded: 2017
Raised: $113M
Employees: 140
Headquarters: Redwood City, CA

Built by database & AI experts

Zilliz was built by a top-tier team of algorithm and database engineers with a strong pedigree in developing high-performance, scalable, and highly available distributed systems, uniquely tailored for vector search.

30K GitHub Stars
M Downloads
250 Contributors
2,600+ Forks
Milvus is an open-source vector database for GenAI projects. Run docker pull on your laptop, plug into popular AI dev tools, and push to production with a single line of code.
Easy Setup: docker pull to start quickly.
Reusable Code: Write once, and deploy with one line of code into the production environment.
Integration: Plug into OpenAI, LangChain, LlamaIndex, and many more.
Feature-rich: Dense & sparse embeddings, filtering, reranking, and beyond.

The Forrester Wave™: Vector Database Providers, Q3 2024
Zilliz is recognized as the Leader in the Vector DB Space

The world is much more than just text and keywords
90% of newly generated data in 2025 will be unstructured data; the other 10% is everything else.
Data Source: The Digitization of the World by IDC

The Vector Database

Rich functionality
Bulk Import
GPU, Intel & ARM CPU support
Disk Based Index
Tiered Storage
Million+ level tenant support
Hybrid Search: Dense & Sparse
RBAC, TLS, Encryption
Float, Binary, & Sparse Vector
Tag+Vector Optimized Filtering
Dynamic Schema
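
Two items in the list above, hybrid dense and sparse search and tag plus vector filtering, can be illustrated together. The sketch below filters candidates by a scalar tag and then merges a dense ranking with a sparse ranking using reciprocal rank fusion, one common merging strategy. The documents, tags, and rankings are hypothetical, the function names are illustrative, and nothing here is claimed to mirror how Milvus implements either feature.

```python
def reciprocal_rank_fusion(rankings, k=60):
    """Merge ranked lists of ids; each list adds 1 / (k + rank) to an id's score."""
    scores = {}
    for ranking in rankings:
        for rank, doc_id in enumerate(ranking, start=1):
            scores[doc_id] = scores.get(doc_id, 0.0) + 1.0 / (k + rank)
    return sorted(scores, key=lambda doc_id: scores[doc_id], reverse=True)

# Hypothetical documents, each carrying a scalar tag used for filtering.
DOCS = {
    "a": {"tag": "news"},
    "b": {"tag": "blog"},
    "c": {"tag": "news"},
    "d": {"tag": "news"},
}

# Hypothetical best-first rankings from a dense and a sparse retriever.
DENSE_RANKING = ["a", "b", "c", "d"]
SPARSE_RANKING = ["c", "d", "a", "b"]

def hybrid_search(tag, limit=2):
    """Keep only documents with the given tag, then fuse the two rankings."""
    allowed = {doc_id for doc_id, meta in DOCS.items() if meta["tag"] == tag}
    dense = [doc_id for doc_id in DENSE_RANKING if doc_id in allowed]
    sparse = [doc_id for doc_id in SPARSE_RANKING if doc_id in allowed]
    return reciprocal_rank_fusion([dense, sparse])[:limit]

hits = hybrid_search("news")
print(hits)
```

Rank fusion only needs positions, not raw scores, so it sidesteps the problem that dense and sparse scores live on different scales.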

Retrieval-Augmented Generation
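
As a rough illustration of the retrieval step in RAG, the sketch below picks the stored text closest to a question and places it in a prompt. The `embed` function is a toy bag-of-words stand-in for a real embedding model, the two stored sentences paraphrase statements from this deck, and the prompt layout is an assumption. No LLM is called; the output is only the prompt that would be sent.

```python
import math
from collections import Counter

def embed(text):
    """Toy stand-in for an embedding model: lowercase word counts."""
    return Counter(text.lower().split())

def cosine(a, b):
    """Cosine similarity between two word-count vectors."""
    dot = sum(a[word] * b[word] for word in a)
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical knowledge base; a real system would store embeddings in a vector database.
DOCS = [
    "Milvus is an open-source vector database for GenAI projects",
    "Zilliz Cloud is a fully managed SaaS offering",
]

def retrieve(question, k=1):
    """Return the k stored texts most similar to the question."""
    query = embed(question)
    return sorted(DOCS, key=lambda doc: cosine(query, embed(doc)), reverse=True)[:k]

def build_prompt(question):
    """Assemble the retrieved context and the question into one prompt string."""
    context = "\n".join(retrieve(question))
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"

prompt = build_prompt("What is a vector database")
print(prompt)
```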

Real-Time AI Stack Integration
Layers shown: Framework; Embedding Models; LLMs; Vector Database; Software Infrastructure; Hardware Infrastructure

Getting Started with Vector Databases

Milvus: Open Source, Self-Managed
github.com/milvus-io/milvus

Zilliz Cloud: SaaS, Fully-Managed
zilliz.com/cloud

milvus.io
github.com/milvus-io/
@milvusio
@paasDev


/in/timothyspann
Connect with me! Thank you!