Details
This is an in-person event! Registration is required to get in.
Topic: Connecting your unstructured data with Generative LLMs
What we’ll do:
Have some food and refreshments. Hear three exciting talks about unstructured data and generative AI.
5:30 - 6:00 - Welcome/Networking/Registration
6:05 - 6:30 - Tim Spann, Principal DevRel, Zilliz
6:35 - 7:00 - Chris Joynt, Senior PMM, Cloudera
7:05 - 7:30 - Lisa N Cao, Product Manager, Datastrato
7:30 - 8:30 - Networking
Tech talk 1: Unstructured Data Processing From Cloud to Edge
Speaker: Tim Spann, Principal Dev Advocate, Zilliz
In this talk I will present why you should add a cloud-native vector database to your data and AI platform. I will also give a quick introduction to Milvus, vector databases and unstructured data processing. By adding Milvus to your architecture you can scale out and improve AI use cases such as RAG, real-time search, multimodal search, recommendation engines, fraud detection and many more emerging use cases.
As I will show, edge devices even as small and inexpensive as a Raspberry Pi 5 can run machine learning, deep learning and AI workloads and be enhanced with a vector database.
Tech talk 2: RAG Pipelines with Apache NiFi
Speaker: Chris Joynt, Senior PMM, Cloudera
Executing on a RAG architecture is not a set-it-and-forget-it endeavor. Unstructured or multimodal data must be cleansed, parsed, processed, chunked and vectorized before being loaded into knowledge stores and vector DBs. That needs to happen efficiently so that our GenAI always has fresh contextual data. Beyond that, changes will have to be made on an ongoing basis. For example, new data sources must be added, and experimentation will be necessary to find the ideal chunking strategy. Apache NiFi is well suited to building RAG pipelines that stream proprietary and external data into your RAG architectures. Come learn how to use this scalable and incredibly versatile tool to quickly build pipelines to activate your GenAI use case.
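To make the chunking step concrete, here is a minimal Python sketch of fixed-size chunking with overlap. It is not taken from the talk and does not use Apache NiFi; the function name chunk_text and its parameters chunk_size and overlap are hypothetical, chosen only to show the kind of knobs one might experiment with.

```python
def chunk_text(text: str, chunk_size: int = 200, overlap: int = 50) -> list[str]:
    """Split text into character chunks of chunk_size that share `overlap` characters.

    Illustrative only: real pipelines often chunk by sentence, token, or
    document structure rather than by raw character count.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must be >= 0 and smaller than chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        # Stop once the current window has reached the end of the text.
        if start + chunk_size >= len(text):
            break
    return chunks


if __name__ == "__main__":
    for piece in chunk_text("abcdefghij", chunk_size=4, overlap=2):
        print(piece)
```

Changing chunk_size and overlap and re-running retrieval against the same questions is one simple way to compare chunking strategies.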
Tech Talk 3: Metadata Lakes for Next-Gen AI/ML
Speaker: Lisa N Cao, Datastrato
Abstract: As data catalogs evolve to meet the new and growing demands of high-velocity, unstructured data, we see them taking a new shape as an emergent and flexible way to activate metadata for multiple uses. This talk discusses modern uses of metadata at the infrastructure level for AI enablement in RAG pipelines, in response to the new demands of the ecosystem. We will also discuss Apache Gravitino (incubating) and its open source-first approach to data cataloging across multi-cloud and geo-distributed architectures.
Who should attend:
Anyone interested in talking and learning about unstructured data and generative AI apps.
All ideas are welcome
Be present and participate actively in discussions. Ask questions and reach out for help when needed.
Report inappropriate behavior
Inappropriate behavior is not tolerated at this event. Inform a Zilliz team member immediately if you see any behavior you deem inappropriate.
This meetup is for people working with unstructured data. Speakers will present on related topics such as vector databases, LLMs, and managing data at scale. The intended audience of this group includes machine learning engineers, data scientists, data engineers, software engineers, and PMs.
This meetup was formerly Milvus Meetup, and is sponsored by Zilliz, maintainers of Milvus.
Extracting Value from Unstructured Data
Example
• A company has hundreds of thousands of pages of proprietary documentation to enable its staff to service customers.
Problem
• Searching can be slow, inefficient, or lack context.
Solution
• Create an internal chatbot with ChatGPT and a vector database enriched with company documentation to provide direction and support to employees and customers.
https://osschat.io/chat
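The solution above follows the usual retrieve-then-prompt pattern. The sketch below shows that flow with the Python standard library only. Everything in it is an assumption made for illustration: the toy word-count embed function stands in for a real embedding model, the in-memory list stands in for a vector database, and no LLM is actually called; the prompt is only assembled.

```python
import math
from collections import Counter


def embed(text: str) -> Counter:
    """Toy embedding: lowercase word counts (a stand-in for a real embedding model)."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse word-count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(question: str, docs: list[str], top_k: int = 2) -> list[str]:
    """Return the top_k documents most similar to the question."""
    q = embed(question)
    return sorted(docs, key=lambda d: cosine(q, embed(d)), reverse=True)[:top_k]


def build_prompt(question: str, context: list[str]) -> str:
    """Assemble a prompt that asks the model to answer from the retrieved context."""
    joined = "\n".join(f"- {c}" for c in context)
    return f"Answer using only the context below.\n\nContext:\n{joined}\n\nQuestion: {question}"


# Hypothetical company documentation snippets.
docs = [
    "To reset a customer password open the admin console and choose reset",
    "Refunds are processed within five business days",
    "The office is closed on public holidays",
]
question = "How do I reset a password"
context = retrieve(question, docs, top_k=1)
prompt = build_prompt(question, context)

if __name__ == "__main__":
    print(prompt)
```

In a real deployment the documents would be embedded once, stored in the vector database, and only the question would be embedded at query time.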
We provide deployment flexibility for different operational, security and compliance requirements
Zilliz BYOC (BRING YOUR OWN CLOUD)
Enterprise-ready Milvus for private VPCs
Deploy in your virtual private cloud
Zilliz Cloud (FULLY MANAGED SERVICE)
Milvus re-engineered for the cloud
Available on the leading public clouds
Coming Soon!
Milvus (SELF MANAGED SOFTWARE)
Most widely adopted open source vector database
Self-hosted on any machine with community support
Local, Docker, K8s
Multi-replication: Yes
Dynamic segment placement vs. static data sharding: Dynamic segment placement
Cloud-native: Yes
Regarding scalability, Milvus uses worker nodes for each type of action (components to handle connections, data nodes to handle ingestion, index nodes to index, and query nodes to search). Each node has its own assigned CPU and memory resources. Milvus can dynamically allocate new nodes to an action group to speed up operations, or reduce the number of nodes to free resources for other actions. Dynamically allocating nodes allows for easier scaling and resource planning and helps guarantee latency and throughput.
Billion-scale vector support: Yes
Milvus is the fastest regarding search latency and throughput, supporting a billion-scale dataset. In addition, its approach to supporting multiple in-memory indexes and table-level partitions results in the high performance required for real-time information retrieval systems.
Hybrid search: Yes, with scalar filtering and combined sparse and dense vectors
Index types supported: 9 (FLAT, IVF_FLAT, IVF_SQ8, IVF_PQ, HNSW, ANNOY, BIN_FLAT, and BIN_IVF_FLAT)
Purpose-built for vectors: Yes
Database rollback: Yes
Data consistency settings: Yes
Milvus is a fully open source and independent project, maintained by a number of companies and individuals, some of whom also offer commercial services and support. Graduate of LF AI & Data.
License: Apache-2.0 license
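The "scalar filtering" part of hybrid search listed above can be pictured as: keep only the rows whose scalar fields match a condition, then rank what remains by vector distance. The sketch below shows that idea in plain Python. It does not use the Milvus API and covers neither indexes nor sparse vectors; the row layout and the filtered_search helper are hypothetical.

```python
import math


def l2(a: list[float], b: list[float]) -> float:
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def filtered_search(rows, query, predicate, top_k=2):
    """Brute-force search: apply the scalar predicate, then rank by distance.

    A vector database would use an index instead of scanning every row.
    """
    candidates = [r for r in rows if predicate(r)]
    candidates.sort(key=lambda r: l2(r["vector"], query))
    return [r["id"] for r in candidates[:top_k]]


# Hypothetical rows: an id, one scalar field, and a 2-dimensional vector.
rows = [
    {"id": 1, "lang": "en", "vector": [0.0, 0.0]},
    {"id": 2, "lang": "de", "vector": [0.1, 0.0]},
    {"id": 3, "lang": "en", "vector": [1.0, 1.0]},
    {"id": 4, "lang": "en", "vector": [0.2, 0.1]},
]

# Row 2 is the exact match for the query vector but is excluded by the scalar filter.
hits = filtered_search(rows, [0.1, 0.0], lambda r: r["lang"] == "en", top_k=2)

if __name__ == "__main__":
    print(hits)
```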
RAG vs. Fine-tune
- Fine-tuning is expensive
- Fine-tuning takes a long time
- RAG is pluggable
Retrieval-Augmented Generation
…and cannot process increasingly growing
unstructured data
80% of newly generated data in 2025 will be unstructured data (20%: other)
*Data Source: The Digitization of the World by IDC
Product Portfolio
Zilliz Cloud
Optimized Milvus with essential data and security tools for a high-performing vector search platform
VECTOR SEARCH ENGINE
VECTORDB BENCHMARK TOOL
VECTOR DATABASE
SEMANTIC CACHE FOR LLM QUERIES: GPT-Cache
GUI for Milvus
Identifier Strategies
Hierarchical IDs
• Use hierarchical structures for complex data sets.
Example: For a hierarchical document system, use IDs like {"id": "projectA_chapter1_section2"}
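As a rough illustration of the hierarchical ID pattern, the sketch below builds and navigates IDs of the form shown in the example. The helper names make_id, parent_id and is_descendant are hypothetical, and the sketch assumes that "_" is used only as the level separator and never appears inside a level name.

```python
from typing import Optional

SEPARATOR = "_"  # assumption: reserved as the level separator


def make_id(*parts: str) -> str:
    """Join level names into one hierarchical ID, e.g. project, chapter, section."""
    for part in parts:
        if not part or SEPARATOR in part:
            raise ValueError(f"invalid level name: {part!r}")
    return SEPARATOR.join(parts)


def parent_id(doc_id: str) -> Optional[str]:
    """Return the ID one level up, or None for a top-level ID."""
    head, sep, _ = doc_id.rpartition(SEPARATOR)
    return head if sep else None


def is_descendant(doc_id: str, ancestor: str) -> bool:
    """True if doc_id sits somewhere below ancestor in the hierarchy."""
    return doc_id.startswith(ancestor + SEPARATOR)


if __name__ == "__main__":
    section = make_id("projectA", "chapter1", "section2")
    print({"id": section})
    print(parent_id(section))
    print(is_descendant(section, "projectA"))
```

Because the hierarchy is encoded in the ID itself, a prefix match on the ID is enough to select everything under a given project or chapter.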