| Number of A100s | Throughput for U-Net @ 256x256 (images/second) | Throughput for U-Net @ 512x512 (images/second) | Throughput for U-Net @ 512x512 with EMA (images/second) | Days to train on MosaicML Cloud | Approx. cost on MosaicML Cloud |
|---|---|---|---|---|---|
| 8 | 1100 | 290 | 290 | 101.04 | $38,800 |
| 16 | 2180 | 585 | 580 | 50.29 | $38,630 |
| 32 | 4080 | 1195 | 1160 | 25.01 | $38,420 |
| 64 | 8530 | 2340 | 2220 | 12.63 | $38,800 |
| 128 | 11600 | 4590 | 3927 | 6.79 | $41,710 |
Table 1: Estimated time and cost to train a Stable Diffusion model on 1.1 billion images at 256x256 resolution, followed by 1.7 billion images at 512x512
resolution. Different rows show different numbers of NVIDIA 40GB A100 GPUs at a global batch size of 2048.
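The cost column follows directly from the training time and the GPU count. As a rough sanity check, the sketch below recomputes the final column from the "days to train" column, assuming a flat rate of about $2 per A100-hour (a figure inferred from the table; actual MosaicML Cloud pricing may differ).

```python
# Recompute approximate training cost from days-to-train and GPU count,
# assuming a flat ~$2 per A100-hour (inferred from the table above; actual
# MosaicML Cloud pricing may differ).
A100_HOURLY_RATE = 2.00  # USD per GPU-hour (assumption)

rows = [
    # (number of A100s, days to train)
    (8, 101.04),
    (16, 50.29),
    (32, 25.01),
    (64, 12.63),
    (128, 6.79),
]

for num_gpus, days in rows:
    gpu_hours = num_gpus * days * 24
    cost = gpu_hours * A100_HOURLY_RATE
    print(f"{num_gpus:>3} x A100: {gpu_hours:,.0f} GPU-hours -> ~${cost:,.0f}")
```

Because scaling is close to linear, the total GPU-hours (and therefore the cost) stays roughly constant as GPUs are added; only at 128 GPUs does throughput scaling fall off enough to raise the price noticeably.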
These optimizations show that training image generation models from scratch is within reach for everyone. For updates on our latest work, join our Community Slack or follow us on Twitter. If your organization wants to start training diffusion models today, please schedule a demo online or email us at [email protected].
1 When training large models with big batches that don't fit in memory in a single pass, each batch is divided into smaller microbatches. On each device, we can do a forward and backward pass for each microbatch and sum the gradients at the end, giving a gradient update equivalent to a single forward and backward pass over the entire batch at once.
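As a minimal sketch of this microbatching scheme (often called gradient accumulation), the PyTorch loop below splits a batch into microbatches, runs a forward and backward pass per microbatch, and steps the optimizer once. The `model`, `optimizer`, and `loss_fn` arguments are placeholders, and the loss is scaled by the number of microbatches so the accumulated gradient matches a single full-batch pass with a mean-reduced loss (assuming the batch divides evenly).

```python
import torch

def train_step(model, optimizer, loss_fn, batch_x, batch_y, num_microbatches):
    """One optimizer step using gradient accumulation over microbatches.

    Equivalent (for a mean-reduced loss) to a single forward/backward pass
    over the whole batch, but only one microbatch's activations are resident
    in memory at a time.
    """
    optimizer.zero_grad()
    micro_xs = torch.chunk(batch_x, num_microbatches)
    micro_ys = torch.chunk(batch_y, num_microbatches)
    for micro_x, micro_y in zip(micro_xs, micro_ys):
        loss = loss_fn(model(micro_x), micro_y)
        # Scale so the summed microbatch gradients equal the full-batch gradient.
        (loss / num_microbatches).backward()  # gradients accumulate in .grad
    optimizer.step()
```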