RGB-X Model Development: Exploring Four Channel ML Workflows

chloewilliams62 147 views 20 slides Sep 19, 2024

About This Presentation

Machine Learning is rapidly becoming multimodal. With many models in Computer Vision expanding to areas like vision and 3D, one area that has also quietly been advancing rapidly is RGB-X data, such as infrared, depth, or normals. In this talk we will cover some of the leading models in this explodin...


Slide Content

RGB-X Model Development
Exploring 4 Channel Model Development
Daniel Gural
Machine Learning & Developer Relations - Voxel51

While Stagnation Fears Rise, Visual AI Marches On



NVDA

The Year of Scaling
> Connecting to cloud
> Securing GPUs
> Building out your stack

RGB-X Models

What Is an RGB-X Model?
> A model designed to take in or produce four-channel images
> Utilized across all industries
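
To make "four-channel image" concrete, here is a minimal sketch of building an RGB-X array by stacking an extra signal (depth, infrared, or similar) onto an RGB image. The helper name `stack_rgbx`, the 8-bit RGB assumption, and the min-max scaling of the extra channel are illustrative choices, not something taken from the talk.

```python
import numpy as np


def stack_rgbx(rgb, x):
    """Stack an (H, W, 3) RGB image and an (H, W) extra channel into (H, W, 4).

    Hypothetical helper for illustration only.
    """
    if rgb.ndim != 3 or rgb.shape[2] != 3:
        raise ValueError("rgb must have shape (H, W, 3)")
    if x.shape != rgb.shape[:2]:
        raise ValueError("x must have shape (H, W) matching rgb")

    # Assumption: rgb holds 8-bit values, so dividing by 255 maps it to [0, 1].
    rgb_f = rgb.astype(np.float32) / 255.0

    # Min-max scale the extra channel to [0, 1]; a constant channel maps to zeros.
    x_f = x.astype(np.float32)
    span = x_f.max() - x_f.min()
    x_f = (x_f - x_f.min()) / span if span > 0 else np.zeros_like(x_f)

    # Append the extra channel as the fourth channel along the last axis.
    return np.concatenate([rgb_f, x_f[..., None]], axis=-1)


rgb = np.zeros((4, 6, 3), dtype=np.uint8)               # placeholder RGB image
depth = np.arange(24, dtype=np.float32).reshape(4, 6)   # placeholder depth map
rgbx = stack_rgbx(rgb, depth)
print(rgbx.shape)
```

The fourth channel could just as well be infrared or a single component of a normal map; only the scaling step would need to change to suit the signal.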

Why Does This Work Matter?
> Take a step back
> Does it matter to answer this today?
> Hold this question for later

How RGB-X Models Are Being Used Today

Tracking Across All Signals
> Extension of a traditional Object Detection Model
> Tracks objects across frames of any RGB-X image
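
One common way to extend an RGB model to four-channel input is to widen its first convolution layer so it accepts an extra channel. The sketch below shows that idea on a raw weight array; it is an assumption about how such an extension might be done, not the method used by the models in this talk. The function name `widen_first_conv` and the choice to initialize the new channel with the mean of the RGB weights are hypothetical.

```python
import numpy as np


def widen_first_conv(weights_rgb):
    """Expand conv weights from (out, 3, kH, kW) to (out, 4, kH, kW).

    Hypothetical helper: the new input channel is initialized with the
    mean of the three RGB filters so pretrained RGB weights are kept intact.
    """
    if weights_rgb.ndim != 4 or weights_rgb.shape[1] != 3:
        raise ValueError("weights_rgb must have shape (out, 3, kH, kW)")

    # Average over the input-channel axis to get one extra filter per output.
    extra = weights_rgb.mean(axis=1, keepdims=True)

    # Keep the original RGB filters and append the new channel's filter.
    return np.concatenate([weights_rgb, extra], axis=1)


rng = np.random.default_rng(0)
w3 = rng.standard_normal((16, 3, 3, 3)).astype(np.float32)  # stand-in for pretrained weights
w4 = widen_first_conv(w3)
print(w4.shape)
```

After widening, the rest of the detector is unchanged, and the new channel's weights would normally be fine-tuned on RGB-X data.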

Surveying Difficult Terrain
> Scan hard to access places
> Create a map of an area just by flying a drone through it
> Can be performed entirely onboard

Sapiens and Beyond

How to Get Started

Revisiting the Why
> In a vacuum, none of these use cases move the needle
> So why is this work important?

Looking to the Future


What Does This Mean for Machine Learning?
1.

What Does This Mean for Machine Learning?
2.

What Does This Mean for Machine Learning?
3.

Try Your Hand at Monocular Depth Estimation
https://docs.voxel51.com/tutorials/monocular_depth_estimation.html

voxel51.com/jobs/
We are hiring!