Advanced Video Search - Leveraging Twelve Labs and Milvus for Semantic Retrieval

chloewilliams62 160 views 23 slides Aug 14, 2024


About This Presentation

This talk will explore the power of Twelve Labs' multimodal embeddings and Milvus' efficient vector database to create a robust video search solution. We'll cover key concepts such as generating multimodal embeddings from videos with Twelve Labs' SOTA foundation model, storing them e...

Size: 3.39 MB

Language: en

Added: Aug 14, 2024

Slides: 23 pages

Slide Content

Slide 1 (PROPRIETARY)
Advanced Video Search - Leveraging Twelve Labs and Milvus for Semantic Retrieval
Unstructured Data Meetup - South Bay Edition
August 13, 2024

Slide 2
Our Mission
Video most closely resembles the sensory inputs from the real world. We model the world by shipping next-generation multimodal foundation models and pushing the boundaries of video understanding.

Slide 3
What is Video Understanding?

Slide 4
How Video Understanding Has Evolved Over The Years
What is Video Understanding?
Timeline milestones: video invented (1878); first speech-to-text commercialized; first CNN-based image recognition (LeNet-5). Remaining year labels on the timeline: 1996, 1997, 2022.
Earlier approaches to understanding video:
1. Manual watching: Keep Binging till eternity!
2. Transcripts: Awful to read and totally disconnected from visual info
3. Tags: Doesn’t capture meaning or context and can create huge discrepancies

Slide 5
The Past, Present, and Future of Video Understanding
What is Video Understanding?
1. The Past: Solving Low-Level Video Perception Tasks (Object Detection, Object Tracking, Action Recognition, Instance Segmentation)
2. The Present: Handling High-Level Video Understanding Tasks (Classification, Retrieval, Question Answering, Captioning)
3. The Future: Going Multimodal with Video Foundation Models -> General-Purpose Video Understanding

Slide 6
Video Foundation Models
What is Video Understanding?
Individual users & enterprises: End-users will demand video applications to be intelligent from inception.
Video-centric applications (Law Enforcement, Contextual Ads, E-learning, Sports, Creator Economy): Applications built on top of embedding-based gateway APIs (Search, Classify, Generate, and Embed).
APIs for downstream tasks: Search & Classify APIs, Generate API, Embed API.
Video foundation models: Multimodal embeddings (video-text). Video foundation models generate powerful video-text embeddings.

A video embedding is a numerical representation that stores all conversational and visual semantic information from a video.
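
Because an embedding is just a vector of numbers, the semantic closeness of two videos (or of a text query and a video) can be measured with a vector similarity such as cosine similarity. The following is a minimal sketch using made-up 4-dimensional vectors; real embeddings have many more dimensions, and the function name `cosine_similarity` is only illustrative.

```python
import math

def cosine_similarity(a, b):
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same dimension")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)

# Toy vectors, invented for illustration only.
clip_a = [0.6, -0.2, 0.3, 0.4]
clip_b = [0.5, -0.1, 0.4, 0.3]
clip_c = [-0.6, 0.2, -0.3, -0.4]  # clip_a with every sign flipped

print(cosine_similarity(clip_a, clip_b))  # close to 1: similar direction
print(cosine_similarity(clip_a, clip_c))  # about -1: opposite direction
```

A score near 1 means the two vectors point in nearly the same direction, which is the sense in which two pieces of content are treated as semantically similar.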

Slide 7
Video Foundation Models

Slide 8
The Magic of Video Embeddings
Video Foundation Models
Source: The Multimodal Evolution of Vector Embeddings

Slide 9
A SOTA Video Foundation Model for Any-to-Any Search
Video Foundation Models
Source: Introducing Marengo-2.6

Slide 10
Twelve Labs Search API
Video Foundation Models

Slide 11
Twelve Labs Classify API
Video Foundation Models

Slide 12
Twelve Labs Embed API
Video Foundation Models
Video
POST Embed
  input_type: video, audio, image, text
  file: video.mp4
GET Embed
  task_id: 61e1127861c43d6d9b736194
  Embeddings: [0.6,-0.2,0.3,0.4,...]
Video embeddings (semantic representation)
GET Video-level Embeddings: [0.6,-0.2,0.3,0.4,...]
GET Clip-level Embeddings: [0.6,-0.2,0.3,0.4,...], [0.6,-0.2,0.3,0.4,...], …, [0.6,-0.2,0.3,0.4,...]
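
The slide suggests a two-step flow: a POST request creates an embedding task, and a GET request with the returned task_id retrieves the embeddings later. A minimal sketch of the polling side follows. It deliberately does not call the real Twelve Labs API: `fetch_task` is a hypothetical callable standing in for the GET request, and the `status`, `video_embedding`, and `clip_embeddings` keys are assumed names, not the documented response schema.

```python
import time

def wait_for_embeddings(fetch_task, task_id, max_attempts=10, delay_seconds=0.0):
    """Poll a task until it reports 'ready', then return its embeddings.

    fetch_task is a hypothetical stand-in for 'GET Embed': it takes a task_id
    and returns a dict with an assumed 'status' key.
    """
    for _ in range(max_attempts):
        task = fetch_task(task_id)
        if task["status"] == "ready":
            return {
                "video": task["video_embedding"],  # one vector for the whole video
                "clips": task["clip_embeddings"],  # one vector per clip
            }
        if task["status"] == "failed":
            raise RuntimeError(f"embedding task {task_id} failed")
        time.sleep(delay_seconds)
    raise TimeoutError(f"embedding task {task_id} not ready after {max_attempts} polls")

def make_fake_fetch(ready_after):
    """Build a fake GET handler that reports 'processing' before it is ready."""
    calls = {"count": 0}

    def fetch(task_id):
        calls["count"] += 1
        if calls["count"] < ready_after:
            return {"status": "processing"}
        return {
            "status": "ready",
            "video_embedding": [0.6, -0.2, 0.3, 0.4],
            "clip_embeddings": [[0.6, -0.2, 0.3, 0.4], [0.6, -0.2, 0.3, 0.4]],
        }

    return fetch

result = wait_for_embeddings(make_fake_fetch(ready_after=3), "61e1127861c43d6d9b736194")
print(len(result["video"]), len(result["clips"]))  # prints: 4 2
```

Passing the fetch function in as an argument keeps the polling logic testable without network access; in real use it would wrap whatever client call retrieves the task.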

Slide 13
Twelve Labs Research Horizon
Video Foundation Models and Video Language Models
Source: Introducing Video-To-Text and Pegasus-1 (80B)

Slide 14
Twelve Labs Meets Milvus

Slide 15
Advanced Multimodal Embeddings meets Efficient Vector Database
Twelve Labs and Milvus
Source: Advanced Video Search: Leveraging Twelve Labs and Milvus for Semantic Retrieval

Slide 16
Twelve Labs and Milvus
Connecting to Milvus + Creating a Milvus Collection

Slide 17
Twelve Labs and Milvus
Generating Embeddings + Insert Embeddings + Similarity Search
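
The insert-then-search workflow named on this slide can be illustrated without a running database. The sketch below uses a small in-memory class as a stand-in for a Milvus collection; `InMemoryClipIndex`, its methods, and the metadata fields (`video`, `start`, `end`) are hypothetical and are not the pymilvus API. It stores clip-level vectors and returns the top matches by cosine similarity, computed by brute force where a vector database would typically use an index.

```python
import math
from dataclasses import dataclass, field

def _normalize(vector):
    """Scale a vector to unit length so a dot product equals cosine similarity."""
    norm = math.sqrt(sum(x * x for x in vector))
    if norm == 0:
        raise ValueError("cannot normalize a zero vector")
    return [x / norm for x in vector]

@dataclass
class InMemoryClipIndex:
    """Hypothetical stand-in for a vector collection with a fixed dimension."""
    dimension: int
    _rows: list = field(default_factory=list)

    def insert(self, vector, metadata):
        if len(vector) != self.dimension:
            raise ValueError("vector dimension does not match the index")
        self._rows.append((_normalize(vector), metadata))

    def search(self, query, limit=3):
        """Return up to `limit` (score, metadata) pairs, best match first."""
        if len(query) != self.dimension:
            raise ValueError("query dimension does not match the index")
        q = _normalize(query)
        scored = [
            (sum(a * b for a, b in zip(q, vec)), meta)
            for vec, meta in self._rows
        ]
        scored.sort(key=lambda pair: pair[0], reverse=True)
        return scored[:limit]

# Toy clip embeddings and time ranges, invented for illustration only.
index = InMemoryClipIndex(dimension=4)
index.insert([0.6, -0.2, 0.3, 0.4], {"video": "video.mp4", "start": 0, "end": 6})
index.insert([-0.5, 0.1, 0.2, -0.3], {"video": "video.mp4", "start": 6, "end": 12})
index.insert([0.5, -0.1, 0.4, 0.3], {"video": "video.mp4", "start": 12, "end": 18})

for score, meta in index.search([0.6, -0.2, 0.3, 0.4], limit=2):
    print(round(score, 3), meta)
```

Storing the clip's time range alongside each vector is what lets a search hit point back to a specific moment in the video rather than to the whole file.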

Slide 18
Developer Resources

Slide 19
Twelve Labs Video Understanding Platform
Developer Resources
Source: Platform Overview

Slide 20
Twelve Labs SDKs
Developer Resources
Source: Twelve Labs SDKs

Slide 21
Jockey - A Conversational Video Agent
Developer Resources
Source: Introducing Jockey
