Advanced Video Search - Leveraging Twelve Labs and Milvus for Semantic Retrieval
chloewilliams62
23 slides
Aug 14, 2024
About This Presentation
This talk will explore the power of Twelve Labs' multimodal embeddings and Milvus' efficient vector database to create a robust video search solution. We'll cover key concepts such as generating multimodal embeddings from videos with Twelve Labs' SOTA foundation model, storing them efficiently in Milvus, and performing similarity searches to retrieve relevant content. With this integration, developers can build applications such as content-based video retrieval, recommendation systems, and sophisticated search engines that understand the nuances of video data.
Size: 3.39 MB
Language: en
Slide Content
Slide 1 (PROPRIETARY)
Advanced Video Search - Leveraging Twelve Labs and Milvus for Semantic Retrieval
Unstructured Data Meetup - South Bay Edition
August 13, 2024
Slide 2
Our Mission
Video most closely resembles the sensory inputs from the real world. We model the world by shipping next-generation multimodal foundation models and pushing the boundaries of video understanding.
Slide 3
What is Video Understanding?
Slide 4
How Video Understanding Has Evolved Over The Years
What is Video Understanding?
Timeline milestones: video invented (1878); first speech-to-text commercialized; first CNN-based image recognition (LeNet-5)
Timeline years: 1878, 1996, 1997, 2022
1. Manual watching: Keep binging till eternity!
2. Transcripts: Awful to read and totally disconnected from visual info
3. Tags: Doesn't capture meaning or context and can create huge discrepancies
Slide 5
The Past, Present, and Future of Video Understanding
What is Video Understanding?
1. The Past: Solving Low-Level Video Perception Tasks (Object Detection, Object Tracking, Action Recognition, Instance Segmentation)
2. The Present: Handling High-Level Video Understanding Tasks (Classification, Retrieval, Question Answering, Captioning)
3. The Future: Going Multimodal with Video Foundation Models -> General-Purpose Video Understanding
Slide 6
Video Foundation Models
What is Video Understanding?
Individual users & enterprises
Video-centric applications: Law Enforcement, Contextual Ads, E-learning, Sports, Creator Economy
End-users will demand video applications to be intelligent from inception.
Applications are built on top of embedding-based gateway APIs (Search, Classify, Generate, and Embed).
Video foundation models
Multimodal embeddings (video-text)
APIs for downstream tasks: Search & Classify APIs, Generate API, Embed API
Video foundation models generate powerful video-text embeddings.
A video embedding is a numerical representation that encodes the conversational and visual semantic information in a video.
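To make the idea of a shared video-text embedding space concrete, here is a minimal sketch that compares a text query vector with two video vectors using cosine similarity. The 4-dimensional vectors are made-up toy values, not real model output, and `cosine_similarity` is an illustrative helper, not part of any Twelve Labs API.

```python
import math


def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Toy embeddings (assumed values for illustration only).
query_text = [0.6, -0.2, 0.3, 0.4]
video_embeddings = {
    "video_a": [0.5, -0.1, 0.4, 0.3],
    "video_b": [-0.4, 0.7, -0.1, 0.0],
}

# Rank videos from most to least similar to the text query.
ranked = sorted(
    video_embeddings,
    key=lambda name: cosine_similarity(query_text, video_embeddings[name]),
    reverse=True,
)
```

Because text and video share one vector space in this setup, the same comparison could in principle rank videos against an image or audio query as well.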
Slide 7
Video Foundation Models
Slide 8
The Magic of Video Embeddings
Video Foundation Models
Source: The Multimodal Evolution of Vector Embeddings
Slide 9
A SOTA Video Foundation Model for Any-to-Any Search
Video Foundation Models
Source: Introducing Marengo-2.6
Slide 10
Twelve Labs Search API
Video Foundation Models
Slide 11
Twelve Labs Classify API
Video Foundation Models
Slide 12
Twelve Labs Embed API
Video Foundation Models
Video
POST Embed
  input_type: video, audio, image, text
  file: video.mp4
GET Embed
  task_id: 61e1127861c43d6d9b736194
Embeddings: [0.6,-0.2,0.3,0.4,...]
Video embeddings (semantic representation)
GET Video-level Embeddings
GET Clip-level Embeddings: [0.6,-0.2,0.3,0.4,...], [0.6,-0.2,0.3,0.4,...], …, [0.6,-0.2,0.3,0.4,...]
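The submit-then-retrieve flow on this slide (a POST that returns a task ID, followed by a GET for the embeddings) can be sketched as a polling loop. The sketch below runs against an in-memory stand-in, not the real Twelve Labs service or SDK: `FakeEmbedService`, `create_task`, `retrieve`, the status strings, and the response fields are all hypothetical names chosen for illustration.

```python
import itertools
import time


class FakeEmbedService:
    """In-memory stand-in for an asynchronous embedding service (hypothetical)."""

    def __init__(self, polls_until_ready=2):
        self._ids = itertools.count(1)
        self._tasks = {}
        self._polls_until_ready = polls_until_ready

    def create_task(self, input_type, file):
        # Mirrors the "POST Embed" step: submit the input, get a task id back.
        task_id = f"task-{next(self._ids)}"
        self._tasks[task_id] = {"polls": 0, "input_type": input_type, "file": file}
        return task_id

    def retrieve(self, task_id):
        # Mirrors the "GET Embed" step: report status, plus embeddings once ready.
        task = self._tasks[task_id]
        task["polls"] += 1
        if task["polls"] < self._polls_until_ready:
            return {"status": "processing"}
        return {
            "status": "ready",
            # Toy values; the field names and clip boundaries are assumptions.
            "video_embedding": [0.6, -0.2, 0.3, 0.4],
            "clip_embeddings": [
                {"start_sec": 0.0, "end_sec": 6.0, "values": [0.6, -0.2, 0.3, 0.4]},
                {"start_sec": 6.0, "end_sec": 12.0, "values": [0.6, -0.2, 0.3, 0.4]},
            ],
        }


def wait_for_embeddings(service, task_id, interval_sec=0.0, max_polls=10):
    """Poll the service until the task is ready or the poll budget runs out."""
    for _ in range(max_polls):
        result = service.retrieve(task_id)
        if result["status"] == "ready":
            return result
        time.sleep(interval_sec)
    raise TimeoutError(f"{task_id} not ready after {max_polls} polls")


service = FakeEmbedService()
task_id = service.create_task(input_type="video", file="video.mp4")
result = wait_for_embeddings(service, task_id)
```

Keeping one vector per clip alongside its time range is what would let a search point to a specific moment in a video instead of only the video as a whole.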
Slide 13
Twelve Labs Research Horizon
Video Foundation Models and Video Language Models
Source: Introducing Video-To-Text and Pegasus-1 (80B)
Slide 14
Twelve Labs Meets Milvus
Slide 15
Advanced Multimodal Embeddings meets Efficient Vector Database
Twelve Labs and Milvus
Source: Advanced Video Search: Leveraging Twelve Labs and Milvus for Semantic Retrieval
Slide 16
Twelve Labs and Milvus
Connecting to Milvus + Creating a Milvus Collection
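As a rough sketch of this step, the snippet below connects to Milvus through `pymilvus` and creates a collection sized for the embedding vectors. The collection name, the local database file name, and the dimension of 1024 are assumptions for illustration; the dimension must match whatever the embedding model actually returns. `pymilvus` is imported lazily so the settings helper can be used without it installed.

```python
def collection_settings(name="twelvelabs_demo_collection", dimension=1024):
    """Build the keyword arguments for creating a collection.

    The default name and dimension are illustrative assumptions.
    """
    if dimension <= 0:
        raise ValueError("dimension must be a positive integer")
    return {"collection_name": name, "dimension": dimension}


def connect_and_create(uri="milvus_twelvelabs_demo.db", **overrides):
    """Connect to Milvus and create the collection if it does not exist yet."""
    # Imported lazily; requires `pip install pymilvus`.
    from pymilvus import MilvusClient

    # A local file path selects Milvus Lite; a server URI such as
    # "http://localhost:19530" would connect to a running Milvus instance.
    client = MilvusClient(uri)
    settings = collection_settings(**overrides)
    if not client.has_collection(collection_name=settings["collection_name"]):
        client.create_collection(**settings)
    return client


# Usage (not executed here, since it needs pymilvus):
# client = connect_and_create()
```

Creating the collection only when it is missing keeps the sketch safe to rerun without discarding vectors that were already inserted.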