EfficientNet

MahetaJugal · 24 slides · Nov 24, 2020

About This Presentation

The EfficientNet architecture and a comparison with other ConvNets.


Slide Content

1 Signal Processing and Machine Learning. Gandhi Jugal, IDDP-August-2020, CSIR-CEERI. EfficientNet: Rethinking Model Scaling for Convolutional Neural Networks.

2 Scaling up ConvNets is widely used to achieve better accuracy. ResNet can be scaled from ResNet-18 to ResNet-200 by adding more layers. GPipe achieved 84.3% ImageNet top-1 accuracy by scaling up a baseline model four times larger. The most common way is to scale a ConvNet by its depth, width, or image resolution. In previous work, it is common to scale only one of the three dimensions. Though it is possible to scale two or three dimensions arbitrarily, arbitrary scaling requires tedious manual tuning and still often yields sub-optimal accuracy and efficiency.

3 Deep ConvNets are often over-parameterized. Model compression is a common way to reduce model size by trading accuracy for efficiency. It is also common to handcraft efficient mobile-size ConvNets, such as SqueezeNet, MobileNet, and ShuffleNet. Recently, Neural Architecture Search (NAS) has become increasingly popular for designing efficient mobile-size ConvNets such as MnasNet. However, it is unclear how to apply these techniques to larger models, which have a much larger design space and much more expensive tuning costs.

4 There are many ways to scale a ConvNet for different resource constraints. ResNet can be scaled down (e.g., ResNet-18) or up (e.g., ResNet-200) by adjusting network depth (#layers). WideResNet and MobileNets can be scaled by network width (#channels). It is also well recognized that a bigger input image size helps accuracy at the cost of more FLOPS. Although network depth and width are both important for ConvNets, it remains an open question how to effectively scale a ConvNet to achieve better efficiency and accuracy.

5 Three ways to scale: increase width, add more layers, or raise the input resolution.

6 The three scaling dimensions:
Depth (#layers): a deeper ConvNet can capture richer, more complex features and generalize better to new tasks.
Width (#channels): wider networks tend to capture more fine-grained features and are easier to train, but wide-and-shallow networks have difficulty capturing higher-level features.
Resolution (image size): with larger inputs, a ConvNet can potentially capture more fine-grained patterns.

7 Scaling Dimensions: Depth. The intuition is that a deeper ConvNet can capture richer and more complex features and generalize well on new tasks. However, the accuracy gain of very deep networks diminishes: ResNet-1000 has similar accuracy to ResNet-101 even though it has many more layers. (Figure: ImageNet top-1 accuracy vs. FLOPS for different depth coefficients.)

8 Scaling Dimensions: Width. Scaling network width is commonly used for small models. As discussed previously, wider networks tend to capture more fine-grained features and are easier to train. However, extremely wide but shallow networks tend to have difficulty capturing higher-level features, and accuracy quickly saturates as networks become much wider (larger w).

9 Scaling Dimensions: Resolution. With higher-resolution input images, a ConvNet can potentially capture more fine-grained patterns. Starting from 224x224 in early ConvNets, modern ConvNets tend to use 299x299 or 331x331 for better accuracy. Recently, GPipe achieved state-of-the-art ImageNet accuracy with 480x480 resolution. Higher resolutions improve accuracy, but the gain diminishes at very high resolutions.
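As a concrete illustration, here is a minimal Python sketch of scaling each dimension on its own; the helper names round_filters and round_repeats and the multiple-of-8 rounding are patterned on the reference EfficientNet code, but treat this as a sketch rather than the official implementation:

import math

def round_filters(filters: int, width_coefficient: float, divisor: int = 8) -> int:
    """Width scaling: multiply the channel count by w, then round to a
    multiple of `divisor` (hardware-friendly channel counts)."""
    scaled = filters * width_coefficient
    new_filters = max(divisor, int(scaled + divisor / 2) // divisor * divisor)
    if new_filters < 0.9 * scaled:  # never round down by more than 10%
        new_filters += divisor
    return int(new_filters)

def round_repeats(repeats: int, depth_coefficient: float) -> int:
    """Depth scaling: multiply the number of repeated layers by d, rounding up."""
    return int(math.ceil(depth_coefficient * repeats))

# Scale one stage by w=1.1 and d=1.2, and the input resolution by r=1.15.
print(round_filters(40, 1.1))  # channels: 40 -> 48
print(round_repeats(2, 1.2))   # repeats: 2 -> 3
print(int(224 * 1.15))         # resolution: 224 -> 257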

10 The accuracy gain quickly saturates after reaching 80% top-1, demonstrating the limitation of single-dimension scaling (baseline: EfficientNet-B0). (Figure: accuracy vs. FLOPS for width scaling, depth scaling, and resolution scaling, each starting from the 224x224 baseline.)

11 Compound Scaling. Intuitively, compound scaling makes sense: if the input image is bigger, the network needs more layers to increase the receptive field and more channels to capture more fine-grained patterns in the bigger image. If we only scale network width w without changing depth (d=1.0) and resolution (r=1.0), accuracy saturates quickly. With greater depth (d=2.0) and higher resolution (r=2.0), width scaling achieves much better accuracy under the same FLOPS cost.

12 Compound scaling uses a single coefficient φ to scale all three dimensions together: depth d = α^φ, width w = β^φ, resolution r = γ^φ, subject to α · β² · γ² ≈ 2 (with α ≥ 1, β ≥ 1, γ ≥ 1). α, β, γ are constants that can be determined by a small grid search; φ is a user-specified coefficient that controls how many more resources are available for model scaling.
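The same relations in runnable form; α = 1.2, β = 1.1, γ = 1.15 are the grid-searched values the paper reports for the B0 baseline (the released B1-B7 models use hand-tuned coefficients that only roughly follow these powers):

ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15  # grid-searched on EfficientNet-B0

def compound_scaling(phi: float):
    """Return (depth, width, resolution) multipliers for a given phi.
    alpha * beta**2 * gamma**2 ~= 2, so FLOPS grow roughly by 2**phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

for phi in range(8):  # B0 .. B7
    d, w, r = compound_scaling(phi)
    print(f"phi={phi}: depth x{d:.2f}, width x{w:.2f}, resolution x{r:.2f}")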

13 EfficientDet compound scaling (overview). Backbone network: EfficientNet-B0 through B6. Input resolution: R_input = 512 + φ·128. BiFPN depth (#layers): D_bifpn = 2 + φ; BiFPN width (#channels) also grows with φ. Prediction network width: W_pred = W_bifpn. (Figure: EfficientDet architecture diagram annotated with these scaling rules.)

14 Compound scaling for EfficientDet:
Backbone network: use the same width/depth scaling coefficients as EfficientNet-B0 to B6.
BiFPN (bi-directional feature pyramid network): grow the BiFPN width W_bifpn (#channels) exponentially, but increase the depth D_bifpn (#layers) linearly, since depth must be rounded to a small integer: W_bifpn = 64 · 1.35^φ, D_bifpn = 2 + φ.
Box/class prediction network: fix its width to match the BiFPN (W_pred = W_bifpn), but increase its depth (#layers) linearly: D_box = D_class = 3 + ⌊φ/3⌋.
Input image resolution: since feature levels 3-7 are used in the BiFPN, the input resolution must be divisible by 2^7 = 128: R_input = 512 + φ·128.
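A small sketch that tabulates these rules for φ = 0..6 (EfficientDet-D0 to D6); the formulas are the ones stated on this slide, and note that the published models snap the BiFPN widths to slightly different hardware-friendly values than plain truncation gives:

def efficientdet_config(phi: int) -> dict:
    """Apply the compound-scaling rules stated above for a given phi."""
    return {
        "w_bifpn": int(64 * 1.35 ** phi),  # BiFPN width, grows exponentially
        "d_bifpn": 2 + phi,                # BiFPN depth, grows linearly
        "d_pred": 3 + phi // 3,            # box/class prediction net depth
        "r_input": 512 + phi * 128,        # input resolution, multiple of 128
    }

for phi in range(7):  # EfficientDet-D0 .. D6
    print(f"D{phi}: {efficientdet_config(phi)}")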


17 The EfficientNet Architecture using Neural Architecture Search
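The main building block the search selects is the mobile inverted bottleneck (MBConv) with squeeze-and-excitation. Below is a simplified PyTorch sketch of that block; it omits details such as drop-connect and the expand-ratio-1 variant, and uses SiLU for the swish activation:

import torch
import torch.nn as nn

class SqueezeExcite(nn.Module):
    """Squeeze-and-excitation: gate channels using globally pooled features."""
    def __init__(self, channels: int, reduced: int):
        super().__init__()
        self.fc1 = nn.Conv2d(channels, reduced, 1)
        self.fc2 = nn.Conv2d(reduced, channels, 1)

    def forward(self, x):
        s = x.mean(dim=(2, 3), keepdim=True)  # squeeze: global average pool
        s = torch.sigmoid(self.fc2(torch.relu(self.fc1(s))))
        return x * s                          # excite: rescale channels

class MBConv(nn.Module):
    """Simplified MBConv: 1x1 expand -> depthwise conv -> SE -> 1x1 project."""
    def __init__(self, in_ch, out_ch, expand_ratio=6, kernel=3, stride=1):
        super().__init__()
        mid = in_ch * expand_ratio
        self.use_skip = stride == 1 and in_ch == out_ch  # residual only if shapes match
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, mid, 1, bias=False), nn.BatchNorm2d(mid), nn.SiLU(),
            nn.Conv2d(mid, mid, kernel, stride, kernel // 2, groups=mid, bias=False),
            nn.BatchNorm2d(mid), nn.SiLU(),
            SqueezeExcite(mid, max(1, in_ch // 4)),  # SE reduction from input channels
            nn.Conv2d(mid, out_ch, 1, bias=False), nn.BatchNorm2d(out_ch),
        )

    def forward(self, x):
        y = self.block(x)
        return x + y if self.use_skip else y

x = torch.randn(1, 16, 112, 112)
print(MBConv(16, 16)(x).shape)  # torch.Size([1, 16, 112, 112])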

18 ImageNet Results for EfficientNet

19 ImageNet results for EfficientNet: 84.4% top-1 and 97% top-5 accuracy (EfficientNet-B7).


21 EfficientNet performance on transfer learning datasets. The scaled EfficientNet models achieve new state-of-the-art accuracy on 5 out of 8 datasets, with 9.6x fewer parameters on average.
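A sketch of the typical transfer-learning recipe with a pretrained EfficientNet-B0, here via torchvision (assumes torchvision >= 0.13; num_classes = 10 is a placeholder for the target dataset):

import torch
import torch.nn as nn
from torchvision import models

# Load an ImageNet-pretrained EfficientNet-B0.
weights = models.EfficientNet_B0_Weights.IMAGENET1K_V1
model = models.efficientnet_b0(weights=weights)

# Freeze the backbone; only the new head will be trained.
for p in model.parameters():
    p.requires_grad = False

num_classes = 10  # hypothetical target dataset
model.classifier[1] = nn.Linear(model.classifier[1].in_features, num_classes)

# Only the replaced classifier head is passed to the optimizer.
optimizer = torch.optim.Adam(model.classifier.parameters(), lr=1e-3)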

22 Conclusion. A weighted bi-directional feature pyramid network (BiFPN) and a customized compound scaling method are proposed to improve both accuracy and efficiency. EfficientDet is also up to 3.2x faster on GPUs and 8.1x faster on CPUs.

23 References:
https://ai.googleblog.com/2019/05/efficientnet-improving-accuracy-and.html
https://www.youtube.com/watch?v=3svIm5UC94I&t=11s
https://arxiv.org/abs/1905.11946
https://heartbeat.fritz.ai/reviewing-efficientnet-increasing-the-accuracy-and-robustness-of-cnns-6aaf411fc81d
