“Data-efficient and Generalizable: The Domain-specific Small Vision Model Revolution,” a Presentation from Pixel Scientia Labs

embeddedvision · 25 slides · Oct 10, 2024

About This Presentation

For the full video of this presentation, please visit: https://www.edge-ai-vision.com/2024/10/data-efficient-and-generalizable-the-domain-specific-small-vision-model-revolution-a-presentation-from-pixel-scientia-labs/

Heather Couture, Founder and Computer Vision Consultant at Pixel Scientia Labs, p...


Slide Content

Data-Efficient and Generalizable:
The Domain-Specific
Small Vision Model Revolution
Heather D. Couture
Founder and Computer Vision Consultant
Pixel Scientia Labs

From Large Language Models to Large Vision Models
© 2024 Pixel Scientia Labs
Language: GPT, BERT, Claude, Bard, BLOOM, Llama, PaLM
Vision: ViT, BEiT, Swin, MoCo, DINO, CLIP, SAM

Foundation Models: Generality & Adaptability
[Figure: a foundation model pre-trained on broad datasets (ImageNet, Wikipedia, YouTube, GitHub, PubMed, MS-COCO, OpenStreetMap) is fine-tuned for many tasks: classification, regression, detection, segmentation.]

Problem: Unique Imaging Modality
ImageNet vs.:
Histopathology (image credit: Shutterstock)
Fluorescence Microscopy (image credit: Shutterstock)
Multispectral Satellite (image credit: ESA)
Drone (image credit: Pixabay)

Problem: Limited Data
Data collection and labeling can be difficult, time-consuming, and expensive.
ImageNet: 1.2 million images vs. 200 images from a new medical imaging device

Problem: Compute Resource Constraints
Model size   Model name    # parameters (million)   FLOPS per inference (billion)
Small        MobileNetV2      7                        1.2
             ResNet18        12                        1.8
             ResNet50        26                        4.1
             ViT-Small       22                        4.6
             Swin-Tiny       28                        4.5
Medium       ResNet101       45                        7.6
             Swin-Small      50                        8.7
             ViT-Base        87                       17.6
             Swin-Base       88                       15.5
Large        Swin-Large     197                       34.5
             ViT-Large      304                       61.6
             ViT-Giant     1843                     2860
Publicly-available foundation models are getting larger.

Solution: Domain-Specific Foundation Models
Histopathology (image credit: Shutterstock)
Fluorescence Microscopy (image credit: Shutterstock)
Multispectral Satellite (image credit: ESA)
Forestry Drone (image credit: Pixabay)

Solution: Domain-Specific Foundation Models
Each modality's foundation model adapts to multiple downstream tasks: classification, detection, segmentation, and regression.

Pre-Training and Fine-Tuning
Domain-specific image dataset → domain-specific foundation model → downstream tasks

PRE-TRAINING: self-supervised learning using unlabeled images
•Most computationally-intensive step
•Done once
•Select a smaller architecture for improved computation speed (training and inference)
•Uses a pretext task instead of manual labels

FINE-TUNING: supervised learning using labeled images (classification, regression, detection, segmentation)
•For each downstream task
•Less computation
•Could be as simple as a linear model with no fine-tuning
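The "linear model with no fine-tuning" option can be sketched in plain Python: freeze the foundation model, extract embeddings once, and fit a lightweight classifier on top. This is a minimal illustrative sketch, not a real pipeline; `embed` is a hypothetical stand-in for a frozen pre-trained encoder, and a nearest-centroid classifier stands in for the linear probe.

```python
# Minimal sketch of "a linear model with no fine-tuning": freeze the
# foundation model, extract embeddings once, then fit a lightweight
# classifier on top. `embed` is a hypothetical stand-in for a frozen
# pre-trained encoder; a nearest-centroid classifier stands in for
# the linear probe.

def embed(image):
    # Stand-in encoder: a real pipeline would run the frozen
    # pre-trained backbone and return its feature vector.
    return [sum(row) / len(row) for row in image]

def fit_centroids(embeddings, labels):
    # Average the embeddings of each class to form its centroid.
    sums, counts = {}, {}
    for e, y in zip(embeddings, labels):
        acc = sums.setdefault(y, [0.0] * len(e))
        for i, v in enumerate(e):
            acc[i] += v
        counts[y] = counts.get(y, 0) + 1
    return {y: [v / counts[y] for v in acc] for y, acc in sums.items()}

def predict(centroids, e):
    # Assign the class whose centroid is nearest in squared distance.
    def dist(c):
        return sum((a - b) ** 2 for a, b in zip(e, c))
    return min(centroids, key=lambda y: dist(centroids[y]))
```

Because the encoder is frozen, embeddings can be computed once and cached, which is what makes this the cheapest adaptation option on the slide.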

Self-Supervised Pretext Task: Contrastive
Source: https://blog.research.google/2020/04/advancing-self-supervised-and-semi.html
No manual labels needed
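The contrastive idea can be illustrated with a toy NT-Xent-style loss, the form used in SimCLR-style pre-training: the anchor's similarity to the positive (another augmented view of the same image) should dominate its similarity to negatives (views of other images). This is a hedged sketch with toy vectors and an illustrative temperature, not a real implementation.

```python
import math

# Toy NT-Xent-style contrastive loss: pull two views of the same
# image together, push views of other images apart. The vectors and
# temperature are illustrative, not real model outputs.

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def contrastive_loss(anchor, positive, negatives, temperature=0.1):
    # -log( exp(sim_pos/t) / sum_i exp(sim_i/t) ), computed stably.
    logits = [cosine(anchor, positive) / temperature]
    logits += [cosine(anchor, n) / temperature for n in negatives]
    m = max(logits)
    log_denom = m + math.log(sum(math.exp(l - m) for l in logits))
    return log_denom - logits[0]
```

The loss is small when the anchor and positive are aligned and the negatives are not, and large when a negative is more similar than the positive, which is exactly what drives the encoder to learn without manual labels.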

Self-Supervised Pretext Task: Masked Autoencoder
Source: He, Masked Autoencoders Are Scalable Vision Learners, 2021
No manual labels needed
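The masked-autoencoder pretext task from He et al. (2021) can be sketched as: hide a large fraction of image patches (75% in the paper), and score reconstruction only on the hidden ones. Patches are scalars here for brevity; a real model would encode the visible patches with a ViT and decode the masked ones.

```python
import random

# Sketch of the masked-autoencoder pretext task: hide a large
# fraction of patches, score reconstruction only on the hidden ones.
# Patches are scalars for brevity; a real model encodes the visible
# patches and decodes the masked ones.

def mask_patches(patches, mask_ratio=0.75, rng=None):
    rng = rng or random.Random(0)
    n_masked = int(len(patches) * mask_ratio)
    masked_idx = sorted(rng.sample(range(len(patches)), n_masked))
    hidden = set(masked_idx)
    visible = [p for i, p in enumerate(patches) if i not in hidden]
    return visible, masked_idx

def masked_loss(patches, predictions, masked_idx):
    # Mean squared error over masked patches only.
    errors = [(patches[i] - predictions[i]) ** 2 for i in masked_idx]
    return sum(errors) / len(errors)
```

Computing the loss only on masked patches is what makes the task non-trivial: the model cannot simply copy its input.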

Example 1: Satellite
EuroSAT: land cover classification, 27k images, 80/20 pre-train/test

Example 1: Satellite
Pre-training on EuroSAT: little difference between pretext tasks

Example 1: Satellite
Pretext task: little difference
Model size: smaller models sufficient for small training sets; large models best for large training sets

Example 1: Satellite
Small domain-specific models are superior for small training sets
Pre-training on different datasets: the pre-training dataset matters a lot!
ImageNet: people, places, things
SeCo: satellite
EuroSAT: satellite

Example 2: Histopathology
•Domain: H&E colorectal tissue
•Training: 100k image patches
•Test: 7,180 image patches from different hospitals
•Goal: predict 9 tissue classes
•Pre-train on various datasets, followed by linear classifier
Tissue classes: ADI, BACK, DEB, LYM, MUC, MUS, NORM, STR, TUM

Example 2: Histopathology
Problem: color variations from different scanners or staining procedures
Solution: simulate color variations with image augmentation
Source: Kanwal, The Devil is in the Details, 2022
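A minimal way to simulate a staining shift is to scale each RGB channel by one global factor per image. This is an illustrative sketch only; real pipelines (such as the augmentations studied by Kanwal et al.) use richer transforms like HSV jitter or stain-matrix perturbation, and the (r, g, b)-in-[0, 1] pixel format and `strength` value here are assumptions.

```python
import random

# Hedged sketch of simulating scanner/stain color variation: scale
# each RGB channel by one global factor, mimicking a staining shift
# across the whole image. Pixel format (r, g, b) in [0, 1] and the
# `strength` value are illustrative assumptions.

def stain_jitter(image, seed=0, strength=0.1):
    rng = random.Random(seed)
    # One scale factor per channel for the whole image.
    scales = [1.0 + rng.uniform(-strength, strength) for _ in range(3)]
    return [
        [tuple(min(1.0, max(0.0, c * s)) for c, s in zip(px, scales))
         for px in row]
        for row in image
    ]
```

Applying one factor per channel across the whole image mimics a global scanner or stain difference rather than per-pixel noise, which is the kind of variation the model must learn to ignore.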

Example 2: Histopathology
Domain-specific model improves generalizability
Pre-trained on people, places, things vs. pre-trained on histopathology only

Domain-Specific Foundation Model Best Practices
Two-phase pre-training (using an in-domain dataset where possible), followed by fine-tuning for classification, regression, detection, or segmentation.
1) Start with another foundation model to shorten pre-training
2) Use a diverse dataset to capture variations within the domain
3) Simulate additional variations with augmentation

Benefits of Domain-Specific Small Foundation Models
1) Domain-specificity allows for smaller models
2) Reduced computational needs for training and inference
3) Adaptable to multiple downstream tasks
4) Develop a proof of concept quicker
5) Increased accuracy on downstream tasks
6) Less reliance on labeled data
7) Improved generalizability to distribution shifts

Resources
https://pixelscientia.com/embedded2024/
Links to these slides, articles, podcasts, and other resources to guide you on your journey.
Foundation Model ROI Workshop: Wednesday, June 5 @ 12 pm EDT / 9 am PDT. A virtual workshop on how to identify the value and calculate the ROI of a vision foundation model approach.
Computer Vision Insights Newsletter: a biweekly newsletter that often features the latest research in foundation models.
Impact AI Podcast: learn how to build a mission-driven, machine learning-powered company from the innovators and entrepreneurs who are leading the way.

Backup Slides

Self-Supervised Learning: Distillation
Source: https://medium.com/@noureldinalaa93/easily-explained-momentum-contrast-for-unsupervised-visual-representation-learning-moco-c6f00a95c4b2
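The core of MoCo-style distillation is the momentum (EMA) update: the key encoder receives no gradients, and its weights trail the query encoder's as an exponential moving average. A minimal sketch, with weights as plain lists of floats rather than framework tensors:

```python
# Sketch of the momentum (EMA) update behind MoCo-style distillation:
# the key encoder receives no gradients; its weights trail the query
# encoder's as an exponential moving average. Weights are plain lists
# of floats here; a real implementation updates framework tensors
# in place after each training step.

def momentum_update(key_weights, query_weights, m=0.999):
    return [m * k + (1.0 - m) * q
            for k, q in zip(key_weights, query_weights)]
```

A large momentum (e.g. 0.999) keeps the key encoder nearly frozen between steps, which keeps the representations it produces consistent over time.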

Example 3: Imbalanced Data
•Domain: gastrointestinal endoscopy images
•Pre-training: 99k unlabeled gastrointestinal endoscopy images
•Train and test: 2,642 images, 80/20 train/test
•Goal: pathological finding characterization (12 classes)
[Chart: images per class]

Example 3: Imbalanced Data
Domain-specific model better handled class imbalance
Macro = equal weight to each class
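"Macro" averaging can be made concrete with per-class recall: compute recall separately for each class and average with equal weight per class, so a rare class counts as much as a common one. A short sketch (illustrative function name; libraries such as scikit-learn offer the same behavior via an `average='macro'` option):

```python
# Sketch of "macro" averaging: per-class recall averaged with equal
# weight per class, so a rare class counts as much as a common one.
# This is why macro metrics expose class-imbalance problems that
# overall accuracy hides.

def macro_recall(y_true, y_pred):
    classes = sorted(set(y_true))
    recalls = []
    for c in classes:
        idx = [i for i, y in enumerate(y_true) if y == c]
        correct = sum(1 for i in idx if y_pred[i] == c)
        recalls.append(correct / len(idx))
    return sum(recalls) / len(recalls)
```

For example, always predicting the majority class on a 3-to-1 imbalanced set scores 75% plain accuracy but only 0.5 macro recall, which is the failure mode the slide's domain-specific model handles better.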