Advanced Video Search - Leveraging Twelve Labs and Milvus for Semantic Retrieval

chloewilliams62 160 views 23 slides Aug 14, 2024

About This Presentation

This talk will explore the power of Twelve Labs' multimodal embeddings and Milvus' efficient vector database to create a robust video search solution. We'll cover key concepts such as generating multimodal embeddings from videos with Twelve Labs' SOTA foundation model, storing them e...


Slide Content

Slide 1
PROPRIETARY
Advanced Video Search - Leveraging Twelve Labs and Milvus for Semantic Retrieval
Unstructured Data Meetup - South Bay Edition
August 13, 2024

Slide 2
Our Mission
Video most closely resembles the sensory inputs from the real world. We model the world by shipping next-generation multimodal foundation models and pushing the boundaries of video understanding.

Slide 3
What is Video Understanding?

Slide 4
How Video Understanding Has Evolved Over The Years
What is Video Understanding?
Timeline milestones: video invented (1878), first speech-to-text commercialized, first CNN-based image recognition (LeNet-5).
Timeline years: 1878, 1996, 1997, 2022
1. Manual watching: Keep binging till eternity!
2. Transcripts: Awful to read and totally disconnected from visual info.
3. Tags: Don't capture meaning or context and can create huge discrepancies.

Slide 5
The Past, Present, and Future of Video Understanding
What is Video Understanding?
1. The Past: Solving Low-Level Video Perception Tasks (Object Detection, Object Tracking, Action Recognition, Instance Segmentation)
2. The Present: Handling High-Level Video Understanding Tasks (Classification, Retrieval, Question Answering, Captioning)
3. The Future: Going Multimodal with Video Foundation Models -> General-Purpose Video Understanding

Slide 6
Video Foundation Models
What is Video Understanding?
Individual users & enterprises
Video-centric applications: Law Enforcement, Contextual Ads, E-learning, Sports, Creator Economy
End-users will demand video applications to be intelligent from inception.
Applications built on top of embedding-based gateway APIs (Search, Classify, Generate, and Embed).
APIs for downstream tasks: SEARCH & CLASSIFY APIs, GENERATE API, EMBED API
VIDEO FOUNDATION MODELS: Multimodal embeddings (Video-text)
Video foundation models generate powerful video-text embeddings.
A video embedding is a numerical representation that stores all conversational and visual semantic information from a video.

Slide 7
Video Foundation Models

Slide 8
The Magic of Video Embeddings
Video Foundation Models
Source: The Multimodal Evolution of Vector Embeddings

Slide 9
A SOTA Video Foundation Model for Any-to-Any Search
Video Foundation Models
Source: Introducing Marengo-2.6

Slide 10
Twelve Labs Search API
Video Foundation Models

Slide 11
Twelve Labs Classify API
Video Foundation Models

Slide 12
Twelve Labs Embed API
Video Foundation Models
Video
POST Embed
input_type: video, audio, image, text
file: video.mp4
GET Embed
task_id: 61e1127861c43d6d9b736194
Embeddings: [0.6,-0.2,0.3,0.4,...]
Video embeddings (semantic representation)
GET Video-level Embeddings
GET Clip-level Embeddings
[0.6,-0.2,0.3,0.4,...], [0.6,-0.2,0.3,0.4,...], [0.6,-0.2,0.3,0.4,...]
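
The Embed slide distinguishes video-level from clip-level embeddings returned for one task_id. As a rough illustration only, the sketch below groups the vectors in a task result by scope. The response shape and the field names (`embeddings`, `scope`, `start_sec`, `end_sec`, `vector`) are assumptions made for this example, not the documented Twelve Labs response format, and the 4-value vectors are toy stand-ins for the truncated vectors on the slide.

```python
from dataclasses import dataclass
from typing import Dict, List


@dataclass
class Segment:
    scope: str            # "video" or "clip" (assumed labels, not confirmed API values)
    start_sec: float
    end_sec: float
    vector: List[float]


def split_embeddings(response: dict) -> Dict[str, List[Segment]]:
    """Group the embeddings of a hypothetical Embed task result by scope."""
    grouped: Dict[str, List[Segment]] = {"video": [], "clip": []}
    for item in response.get("embeddings", []):
        segment = Segment(
            scope=item["scope"],
            start_sec=float(item["start_sec"]),
            end_sec=float(item["end_sec"]),
            vector=[float(x) for x in item["vector"]],
        )
        grouped.setdefault(segment.scope, []).append(segment)
    return grouped


# Hypothetical payload: one video-level vector plus two clip-level vectors.
sample_response = {
    "task_id": "61e1127861c43d6d9b736194",
    "embeddings": [
        {"scope": "video", "start_sec": 0, "end_sec": 12, "vector": [0.6, -0.2, 0.3, 0.4]},
        {"scope": "clip", "start_sec": 0, "end_sec": 6, "vector": [0.6, -0.2, 0.3, 0.4]},
        {"scope": "clip", "start_sec": 6, "end_sec": 12, "vector": [0.6, -0.2, 0.3, 0.4]},
    ],
}

grouped = split_embeddings(sample_response)
for scope, segments in grouped.items():
    print(scope, len(segments))
```

Keeping the time offsets next to each clip-level vector is a design choice for this sketch: it lets a later search result point back to a moment in the video rather than to the whole file.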

Slide 13
Twelve Labs Research Horizon
Video Foundation Models and Video Language Models
Source: Introducing Video-To-Text and Pegasus-1 (80B)

Slide 14
Twelve Labs Meets Milvus

Slide 15
Advanced Multimodal Embeddings meets Efficient Vector Database
Twelve Labs and Milvus
Source: Advanced Video Search: Leveraging Twelve Labs and Milvus for Semantic Retrieval

Slide 16
Twelve Labs and Milvus
Connecting to Milvus + Creating a Milvus Collection

Slide 17
Twelve Labs and Milvus
Generating Embeddings + Insert Embeddings + Similarity Search
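
The slide lists three steps: generating embeddings, inserting them, and running a similarity search. The code from the talk is not part of this transcript, so the following is only a minimal sketch of what insert and similarity search do conceptually, using plain Python instead of Milvus. The `InMemoryCollection` class, its method names, and the toy 4-dimensional vectors are hypothetical and are not the pymilvus API; in the setup the talk describes, a Milvus collection would take this role.

```python
import math
from typing import Dict, List, Tuple


def cosine_similarity(a: List[float], b: List[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


class InMemoryCollection:
    """Toy stand-in for a vector collection with a fixed dimension (hypothetical)."""

    def __init__(self, name: str, dimension: int) -> None:
        self.name = name
        self.dimension = dimension
        self.rows: List[Dict] = []

    def insert(self, rows: List[Dict]) -> int:
        # Validate every row first so a bad vector never leaves a partial insert.
        for row in rows:
            if len(row["vector"]) != self.dimension:
                raise ValueError(f"expected {self.dimension} dimensions")
        self.rows.extend(rows)
        return len(rows)

    def search(self, query: List[float], limit: int = 3) -> List[Tuple[float, Dict]]:
        # Brute-force scan: score every stored row, then keep the best `limit`.
        if len(query) != self.dimension:
            raise ValueError(f"expected {self.dimension} dimensions")
        scored = [(cosine_similarity(query, row["vector"]), row) for row in self.rows]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]


# Toy clip-level embeddings; real embeddings would have far more dimensions.
collection = InMemoryCollection("video_clips", dimension=4)
collection.insert([
    {"id": 0, "start_sec": 0, "end_sec": 6, "vector": [0.9, 0.1, 0.0, 0.0]},
    {"id": 1, "start_sec": 6, "end_sec": 12, "vector": [0.0, 0.8, 0.6, 0.0]},
    {"id": 2, "start_sec": 12, "end_sec": 18, "vector": [0.1, 0.0, 0.0, 0.9]},
])

# The query vector stands in for an embedded text query.
hits = collection.search([0.0, 1.0, 0.5, 0.0], limit=2)
for score, row in hits:
    print(row["id"], row["start_sec"], row["end_sec"], round(score, 3))
```

A brute-force scan like this is only workable for small collections. A vector database such as Milvus replaces the linear scan with an index, which is the efficiency the deck attributes to it.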

Slide 18
Developer Resources

Slide 19
Twelve Labs Video Understanding Platform
Developer Resources
Source: Platform Overview

Slide 20
Twelve Labs SDKs
Developer Resources
Source: Twelve Labs SDKs

Slide 21
Jockey - A Conversational Video Agent
Developer Resources
Source: Introducing Jockey

Slide 22
Developer Resources
Join Our Discord For Support

Slide 23
James Le
[email protected]