30B Images and Counting: Scaling Canva's Content-Understanding Pipelines by Kerry Halupka
ScyllaDB
19 slides
Mar 05, 2025
About This Presentation
Scaling content understanding for billions of images is no easy feat. This talk dives into building extreme label classification models, balancing accuracy & speed, and optimizing ML pipelines for scale. You'll learn new ways to tackle real-time performance challenges in massive data environments.
Size: 104.76 MB
Language: en
Added: Mar 05, 2025
Slides: 19 pages
Slide Content
ScyllaDB
Monster SCALE Summit
30B Images and Counting
Scaling Canva's Content-Understanding Pipelines
Dr Kerry Halupka
Principal ML Engineer
Dr Kerry Halupka (she/her)
+ Principal ML Engineer at Canva
+ Experience in AI-driven content enrichment & large-scale ML pipelines
+ Compute: Model needs to be cheap
+ Scalability: Model needs to be fast
+ Performance: Model needs to be accurate
Solving for compute
[Figure: two classification heads compared, Transformer-decoder and ML-decoder. Transformer-decoder: Full Queries and Image Embeddings feed Cross-Attention, then Feed-Forward. ML-decoder: Group Queries and Image Embeddings feed Cross-Attention, then Feed-Forward + Group Fully Connected Pooling. Plot: computational cost against Number of Concepts.]
Source: "ML-Decoder: Scalable and Versatile Classification Head", Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baruch, Asaf Noy, DAMO Academy, Alibaba Group
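The ML-Decoder paper cited on this slide keeps the decoder cheap by sending a fixed number of group queries through cross-attention instead of one query per concept, then expanding each group token into several concept logits with group fully connected pooling. The following is a minimal plain-Python sketch of that pooling step under toy dimensions. The function names, sizes, and random weights are illustrative assumptions, not Canva's implementation or the paper's reference code.

```python
import math
import random


def group_fc_pooling(group_tokens, weights, num_classes):
    """Expand K group tokens (each of length D) into num_classes logits.

    weights[k] is a D x g matrix with g = ceil(num_classes / K); token k
    produces the logits for classes k*g .. k*g + g - 1.
    """
    logits = []
    for token, w in zip(group_tokens, weights):
        g = len(w[0])
        for j in range(g):
            logits.append(sum(token[d] * w[d][j] for d in range(len(token))))
    # Drop the padding logits when K does not divide num_classes evenly.
    return logits[:num_classes]


def decoder_query_count(num_classes, num_groups=None):
    """Queries entering cross-attention: one per class, or a fixed K groups."""
    return num_classes if num_groups is None else min(num_groups, num_classes)


random.seed(0)
NUM_CLASSES, K, D = 10, 4, 8        # toy sizes, chosen for illustration only
G = math.ceil(NUM_CLASSES / K)      # classes handled by each group token

# Stand-ins for the decoder output tokens and the learned pooling weights.
tokens = [[random.gauss(0, 1) for _ in range(D)] for _ in range(K)]
weights = [[[random.gauss(0, 1) for _ in range(G)] for _ in range(D)]
           for _ in range(K)]

logits = group_fc_pooling(tokens, weights, NUM_CLASSES)
print(len(logits))                       # 10 logits from only 4 queries
print(decoder_query_count(100_000))      # per-class queries: 100000
print(decoder_query_count(100_000, 100)) # group queries: 100
```

Because the number of queries stays at K however many concepts are added, the cross-attention cost in this sketch does not grow with the label set; only the pooling weights do.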
Solving for scalability
How do we train a new concept?
e.g. Cottagecore
Hint: Cottagecore is an aesthetic
romanticising a simple, rustic, and
idyllic way of life
Data Labelling
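[Diagram: labelling flow for a new concept, from Unlabelled Data to Labelled Data]
+ Define new concept
+ Find new images containing the concept in Unlabelled Data (diagram note, partly lost: "…ing to find diverse representations")
+ Label images and add them to Labelled Data
+ Check all other images in dataset for occurrences: previously labelled images may contain the new concept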
Finding Examples: Beyond Simple Search
Search queries:
+ "Cottage Core"
+ "A cozy farmhouse kitchen"
+ "A young woman with wildflowers"
+ "A picnic in the woods"
"A cozy farmhouse kitchen"
Finding Diverse Examples
+ Input: Multiple text descriptions
+ Output: Varied images capturing
different facets of the concept
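One way to picture this input/output contract is as retrieval in a shared text-image embedding space (the End-to-End Process slide names CLIP). The sketch below takes several text descriptions, ranks images by cosine similarity to each, and merges the per-description results. The three-dimensional vectors, image ids, and the `retrieve_diverse` helper are hypothetical stand-ins for real embeddings; the talk does not specify how retrieval is implemented.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def retrieve_diverse(text_embeddings, image_embeddings, per_description=1):
    """Take the top images for each description and merge without duplicates."""
    selected = []
    for text_vec in text_embeddings.values():
        ranked = sorted(
            image_embeddings,
            key=lambda image_id: cosine(text_vec, image_embeddings[image_id]),
            reverse=True,
        )
        for image_id in ranked[:per_description]:
            if image_id not in selected:
                selected.append(image_id)
    return selected


# Toy embeddings (assumed): each axis loosely stands for one facet of the concept.
descriptions = {
    "A cozy farmhouse kitchen": [1.0, 0.0, 0.0],
    "A young woman with wildflowers": [0.0, 1.0, 0.0],
    "A picnic in the woods": [0.0, 0.0, 1.0],
}
images = {
    "img_a": [0.9, 0.1, 0.0],
    "img_b": [0.8, 0.2, 0.1],
    "img_c": [0.1, 0.9, 0.1],
    "img_d": [0.0, 0.2, 0.9],
    "img_e": [0.3, 0.3, 0.3],
}

print(retrieve_diverse(descriptions, images))
# ['img_a', 'img_c', 'img_d']

single = {"A cozy farmhouse kitchen": descriptions["A cozy farmhouse kitchen"]}
print(retrieve_diverse(single, images, per_description=3))
# ['img_a', 'img_b', 'img_e']
```

In this toy data a single description returns three images dominated by the kitchen facet, while three descriptions return one image per facet.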
Solving for Performance
Search query: "A cozy farmhouse kitchen"
Improving the Training Dataset
[Diagram: Unlabelled Data -> Concept Classification -> high-recall, low-precision dataset -> Visual Critic Model -> high-recall, high-precision dataset]
Labels on diagram: "A cozy farmhouse kitchen", "Cottagecore"
End-to-End Process
[Diagram: pipeline for adding a new concept]
New concept -> Description generation -> CLIP (with Unlabelled Data) -> Candidate images -> Low-threshold Concept Classification -> high-recall, low-precision dataset -> Visual Critic Model -> high-recall, high-precision dataset
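The last two stages of this flow can be sketched as a two-step filter: a deliberately low classifier threshold keeps almost every true example (high recall, low precision), and a critic then removes the false positives. The code below is a minimal illustration under assumed inputs. The candidate records, the 0.2 threshold, and the two scoring callables are hypothetical; a real pipeline would call the concept classifier and the visual critic model instead.

```python
def build_concept_dataset(candidates, classifier_score, critic_accepts,
                          low_threshold=0.2):
    """Return (high-recall set, high-precision set) for one new concept."""
    # Stage 1: a low threshold favours recall; false positives are expected.
    high_recall = [c for c in candidates if classifier_score(c) >= low_threshold]
    # Stage 2: the critic reviews each survivor and keeps only true examples.
    high_precision = [c for c in high_recall if critic_accepts(c)]
    return high_recall, high_precision


def precision(items):
    """Fraction of items that truly show the concept (toy ground truth)."""
    return sum(1 for c in items if c["on_concept"]) / len(items)


# Toy candidate images (assumed), e.g. retrieved for "Cottagecore".
candidates = [
    {"id": "a", "score": 0.90, "on_concept": True},
    {"id": "b", "score": 0.30, "on_concept": False},
    {"id": "c", "score": 0.25, "on_concept": True},
    {"id": "d", "score": 0.05, "on_concept": False},
]

recall_set, precision_set = build_concept_dataset(
    candidates,
    classifier_score=lambda c: c["score"],     # stand-in for the classifier
    critic_accepts=lambda c: c["on_concept"],  # stand-in for the visual critic
)

print([c["id"] for c in recall_set])     # ['a', 'b', 'c']
print([c["id"] for c in precision_set])  # ['a', 'c']
```

The stand-in critic here is perfect by construction, so the example only shows the shape of the flow, not the accuracy a real critic model would achieve.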
Requirements
+ Compute: Model needs to be cheap
+ Scalability: Model needs to be fast
+ Performance: Model needs to be accurate