30B Images and Counting: Scaling Canva's Content-Understanding Pipelines by Kerry Halupka

ScyllaDB 202 views 19 slides Mar 05, 2025

About This Presentation

Scaling content understanding for billions of images is no easy feat. This talk dives into building extreme label classification models, balancing accuracy & speed, and optimizing ML pipelines for scale. You'll learn new ways to tackle real-time performance challenges in massive data environ...


Slide Content

ScyllaDB
Monster SCALE Summit

40B Images and Counting

Scaling Canva's Content-Understanding Pipelines

Dr Kerry Halupka
Principal ML Engineer

Dr Kerry Halupka (she/her)

+ Principal ML Engineer at Canva

+ Experience in AI-driven content enrichment & large-scale ML pipelines

+ Passionate about mentoring & sharing knowledge

Canva

We have

220 Million+

Monthly Active Users

Across 190 countries, in 100 languages


Humans see nuance

Computers... don’t

Search: "Energetic business meeting"

Bad Results vs. Good Results

Concept Depth

Image Volume

Requirements

+ Compute: Model needs to be cheap
+ Scalability: Model needs to be fast
+ Performance: Model needs to be accurate

Solving for compute

[Figure: two classification heads compared. Transformer-decoder: full queries and image embeddings feed cross-attention, followed by a feed-forward net. ML-Decoder: group queries and image embeddings feed cross-attention, followed by a feed-forward net and group fully connected pooling. Plot: computational cost against number of concepts.]

Source: ML-Decoder: Scalable and Versatile Classification Head. Tal Ridnik, Gilad Sharir, Avi Ben-Cohen, Emanuel Ben-Baruch, Asaf Noy. DAMO Academy, Alibaba Group.
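
The compute saving on this slide comes from using a small number of group queries instead of one query per concept. The following is a minimal NumPy sketch of that idea, not Canva's model or the paper's reference code: it uses a single attention head, omits the learned projections and the feed-forward net, and all names (`group_decoder_logits`, `group_weights`) and sizes are illustrative assumptions.

```python
import math
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def group_decoder_logits(image_tokens, group_queries, group_weights, num_classes):
    """Simplified group-decoding head (illustrative only).

    image_tokens:  (T, D) spatial embeddings from an image backbone
    group_queries: (K, D) one query per group, with K much smaller than num_classes
    group_weights: (K, D, g) per-group projection, g = ceil(num_classes / K)
    """
    d = image_tokens.shape[1]
    # Cross-attention: K queries attend over T tokens, so cost grows with K,
    # not with the number of classes.
    attn = softmax(group_queries @ image_tokens.T / math.sqrt(d), axis=-1)  # (K, T)
    attended = attn @ image_tokens                                          # (K, D)
    # Group fully connected pooling: each group vector expands to g class logits.
    logits = np.einsum("kd,kdg->kg", attended, group_weights)               # (K, g)
    return logits.reshape(-1)[:num_classes]

# Hypothetical sizes chosen only for the demonstration.
rng = np.random.default_rng(0)
num_classes, num_groups, dim, num_tokens = 1000, 100, 64, 49
group_size = math.ceil(num_classes / num_groups)
logits = group_decoder_logits(
    rng.normal(size=(num_tokens, dim)),
    rng.normal(size=(num_groups, dim)),
    rng.normal(size=(num_groups, dim, group_size)),
    num_classes,
)
```

With these sizes the attention step works on 100 queries rather than 1000, while the head still emits one logit per concept.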

Solving for scalability

How do we train a new concept?

e.g. Cottagecore

Hint: Cottagecore is an aesthetic romanticising a simple, rustic, and idyllic way of life

Data Labelling

[Flow diagram] Unlabelled Data -> Define new concept -> Find new [data] containing concept -> Label images -> Check all other images in dataset for occurrences -> Labelled Data

+ ...ing to find diverse representations
+ Previously labelled images may contain the new concept

Finding Examples: Beyond Simple Search

Search: "Cottage Core"
Search: "A cozy farmhouse kitchen"
Search: "A young woman with wildflowers"
Search: "A picnic in the woods"

A cozy farmhouse kitchen

Finding Diverse Examples

+ Input: Multiple text descriptions
+ Output: Varied images capturing different facets of the concept
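
One way to picture the input/output pair above: embed each text description, retrieve the nearest images for each one separately, and merge the results. The sketch below assumes text and image embeddings already live in a shared space (for example from a CLIP-style model); the function names and the toy 2-D vectors are hypothetical stand-ins, not the pipeline from the talk.

```python
import numpy as np

def normalize(v):
    # Scale each row to unit length so dot products become cosine similarities.
    return v / np.linalg.norm(v, axis=-1, keepdims=True)

def diverse_candidates(text_embs, image_embs, k_per_description):
    """Return image indices: the top-k images per description, merged without duplicates."""
    sims = normalize(text_embs) @ normalize(image_embs).T   # (n_descriptions, n_images)
    top = np.argsort(-sims, axis=1)[:, :k_per_description]  # best k per description
    merged, seen = [], set()
    for row in top:
        for idx in row:
            i = int(idx)
            if i not in seen:
                seen.add(i)
                merged.append(i)
    return merged

# Toy data: images 0-1 resemble the first description, images 2-3 the second.
image_embs = np.array([[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9]])
text_embs = np.array([[1.0, 0.0], [0.0, 1.0]])
picked = diverse_candidates(text_embs, image_embs, k_per_description=1)
```

Retrieving per description, rather than averaging the descriptions into a single query, is what keeps each facet of the concept represented in the result.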

Solving for Performance

Search: "A cozy farmhouse kitchen"

Improving the Training Dataset

[Flow diagram] “A cozy farmhouse kitchen” + Unlabelled Data -> Concept Classification -> High-recall, low-precision dataset -> Visual Critic Model -> High-recall, high-precision dataset

“Cottagecore”
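
The effect of the two stages on precision and recall can be checked with a small worked example. Everything below is made up for illustration: the image ids, classifier scores, critic verdicts, and the threshold value are assumptions, not figures from the talk.

```python
def precision_recall(selected, positives):
    # Precision: share of selected items that are true positives.
    # Recall: share of true positives that were selected.
    selected, positives = set(selected), set(positives)
    tp = len(selected & positives)
    precision = tp / len(selected) if selected else 0.0
    recall = tp / len(positives) if positives else 0.0
    return precision, recall

# Hypothetical toy data: image id -> (classifier score, critic verdict, true label)
images = {
    "a": (0.90, True, True),
    "b": (0.40, True, True),
    "c": (0.35, False, False),
    "d": (0.30, False, False),
    "e": (0.10, False, False),
}
positives = [i for i, (_, _, truth) in images.items() if truth]

LOW_THRESHOLD = 0.25  # assumed value; deliberately permissive to favour recall

# Stage 1: a permissive classifier threshold keeps every true positive.
stage1 = [i for i, (score, _, _) in images.items() if score >= LOW_THRESHOLD]
# Stage 2: the critic removes false positives from the stage 1 output.
stage2 = [i for i in stage1 if images[i][1]]

stage1_pr = precision_recall(stage1, positives)  # (0.5, 1.0)
stage2_pr = precision_recall(stage2, positives)  # (1.0, 1.0)
```

In this toy case the low threshold admits two false positives (precision 0.5) without losing any true positive, and the critic removes them while recall stays at 1.0.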

End-to-End Process

[Flow diagram] New concept -> Description generation -> CLIP (over Unlabelled Data) -> Candidate images -> Low-threshold Concept Classification -> High-recall, low-precision dataset -> Visual Critic Model -> High-recall, high-precision dataset
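
The stages in this diagram compose into a single function. The sketch below is a hypothetical skeleton: each stage is passed in as a callable so that real components (a description generator, a CLIP-style retriever, a concept classifier, a critic model) could be substituted. The signatures, the default threshold, and the stub lambdas are assumptions made for the example.

```python
from typing import Callable, Iterable, List, Sequence

def build_concept_dataset(
    concept: str,
    unlabelled: Sequence[str],
    generate_descriptions: Callable[[str], List[str]],
    retrieve: Callable[[str, Sequence[str]], Iterable[str]],
    classify: Callable[[str, str], float],
    critic: Callable[[str, str], bool],
    low_threshold: float = 0.2,  # assumed default, kept low to favour recall
) -> List[str]:
    # 1. Description generation: expand the concept into several text descriptions.
    descriptions = generate_descriptions(concept)
    # 2. Retrieval: gather candidate images per description, dropping duplicates
    #    while preserving first-seen order.
    candidates = list(dict.fromkeys(
        img for d in descriptions for img in retrieve(d, unlabelled)
    ))
    # 3. Low-threshold classification: high recall, low precision.
    high_recall = [img for img in candidates if classify(concept, img) >= low_threshold]
    # 4. Critic: filter out false positives to raise precision.
    return [img for img in high_recall if critic(concept, img)]

# Stub components, purely for demonstration.
pool = ["kitchen.jpg", "picnic.jpg", "office.jpg", "wildflowers.jpg"]
result = build_concept_dataset(
    "Cottagecore",
    pool,
    generate_descriptions=lambda c: ["A cozy farmhouse kitchen", "A picnic in the woods"],
    retrieve=lambda d, imgs: [i for i in imgs if i.split(".")[0] in d.lower()],
    classify=lambda c, i: 0.1 if i == "office.jpg" else 0.6,
    critic=lambda c, i: i != "picnic.jpg",
)
```

Passing the stages as parameters keeps the orchestration independent of any particular model, which is convenient when each stage is tuned or replaced separately.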

Requirements

+ Compute: Model needs to be cheap
+ Scalability: Model needs to be fast
+ Performance: Model needs to be accurate

Stay in Touch

Dr Kerry Halupka
Email: [email protected]

LinkedIn: www.linkedin.com/in/kerry-halupka
