EfficientNet.pptx

ssuser2624f71 215 views 10 slides Oct 20, 2023

About This Presentation

EfficientNet


Slide Content

EfficientNet YANJUN WU Network Science Lab Dept. of Artificial Intelligence The Catholic University of Korea E-mail: [email protected]

1. Background Increasing the depth of a network yields richer, more complex features that transfer well to other tasks; however, if the network is too deep, it suffers from vanishing gradients and becomes difficult to train. Increasing the width of a network captures finer-grained features and is easier to train, but wide, shallow networks struggle to learn higher-level features. Increasing the input image resolution can also yield finer-grained features, but the accuracy gain shrinks at very high resolutions, and large images add substantial computation.

1. Background Before EfficientNet, networks were typically scaled along just one of these dimensions at a time. But why is the input resolution fixed at 224? Why that number of channels? Why that particular depth? The only answer was engineering experience. EfficientNet instead models the relationship among all three dimensions.
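Compound scaling is EfficientNet's answer: a single coefficient φ scales depth, width, and resolution together. A minimal sketch using the base coefficients reported in the EfficientNet paper (α = 1.2, β = 1.1, γ = 1.15):

```python
# Compound scaling: depth, width, and resolution all grow with one
# coefficient phi. ALPHA/BETA/GAMMA are the grid-searched base values
# reported in the EfficientNet paper.
ALPHA, BETA, GAMMA = 1.2, 1.1, 1.15

def compound_scale(phi):
    """Return (depth, width, resolution) multipliers for coefficient phi."""
    return ALPHA ** phi, BETA ** phi, GAMMA ** phi

# The paper constrains alpha * beta^2 * gamma^2 ~= 2, so increasing phi
# by 1 roughly doubles the FLOPs of the scaled network.
depth_mult, width_mult, res_mult = compound_scale(1.0)
```

For example, `compound_scale(0)` returns the unscaled B0 multipliers (1, 1, 1), and larger φ values correspond to the larger B1–B7 variants.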

2. Model [Figure: the base model EfficientNet-B0 scaled along a single dimension at a time: width only, depth only, or resolution only.]

2. Model The authors then ran experiments with different depths and image sizes, sweeping the network width in each setting to obtain the four curves shown in the figure. Under the same computational budget, scaling depth and image size together gives the best results.

2. Model Width: the number of channels in each convolutional layer. Depth: the number of layers in the network.
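When the width multiplier is applied, implementations usually round the resulting channel count to a hardware-friendly multiple of 8. A sketch modeled on the channel rounding used in reference EfficientNet implementations (the helper name is illustrative):

```python
def round_filters(filters, width_mult, divisor=8):
    """Scale a channel count by width_mult, rounding to the nearest
    multiple of divisor (8 is the usual choice for GPU efficiency)."""
    filters *= width_mult
    # Round to the nearest multiple of divisor.
    new = max(divisor, int(filters + divisor / 2) // divisor * divisor)
    # Never shrink the channel count by more than 10% through rounding.
    if new < 0.9 * filters:
        new += divisor
    return int(new)
```

For example, a 16-channel layer under a width multiplier of 1.2 becomes 24 channels rather than 19, keeping every layer's width divisible by 8.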

2. Model EfficientNet-B0 is a stack of MBConv blocks. Inside an MBConv block, a 1×1 convolution first increases the number of channels and a final 1×1 convolution reduces it again; k is 3 when the depthwise convolution of the MBConv block is 3×3 and 5 when it is 5×5.
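The expand–reduce channel flow described above can be traced with a small sketch (`mbconv_shapes` is a hypothetical helper; the expand ratio of 6 matches EfficientNet's common MBConv6 setting):

```python
def mbconv_shapes(in_ch, out_ch, expand_ratio=6, kernel=3):
    """Trace the channel count through one MBConv block:
    1x1 expand -> kxk depthwise -> SE -> 1x1 project."""
    mid = in_ch * expand_ratio  # 1x1 conv increases the channel count
    return [
        ("expand 1x1", mid),
        (f"depthwise {kernel}x{kernel}", mid),  # per-channel conv, count unchanged
        ("SE attention", mid),                  # reweights channels, count unchanged
        ("project 1x1", out_ch),                # 1x1 conv reduces the channel count
    ]
```

Tracing a block that maps 16 input channels to 24 output channels shows the width expanding to 96 in the middle before the final projection brings it back down.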

2. Model SE module: [h, w, c] → [1, 1, c] → [1, 1, c/r] → [1, 1, c]. For example, if the input feature map has 8 channels, the first fully connected layer outputs a vector of length 4.
Input: (c1, c2, c3, ..., c8)
Fully connected layer 1: (x1, x2, x3, x4) — this layer learns 4 channel descriptors.
The second fully connected layer takes these 4 descriptors as input and restores the length to 8, producing one attention weight per channel.
Fully connected layer 2: (w1, w2, ..., w8)
These 8 weights wi are then used to scale the corresponding input channels ci.
Output: (w1·c1, w2·c2, ..., w8·c8)
Thus the first fully connected layer encodes the input channels and the second generates the attention weights; working together, the two layers learn an effective channel-wise rescaling. Benefit: important feature channels are amplified and unimportant ones suppressed, improving the model's use of its input features.
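The 8 → 4 → 8 example can be sketched in NumPy; the weights here are random placeholders, since only the shapes and the squeeze/excite structure are the point:

```python
import numpy as np

def squeeze_excite(x, reduction=2):
    """Squeeze-and-Excitation on a feature map x of shape (h, w, c):
    global average pool -> FC(c/r) + ReLU -> FC(c) + sigmoid -> rescale.
    Weights are random here, for illustration only."""
    h, w, c = x.shape
    rng = np.random.default_rng(0)
    fc1 = rng.standard_normal((c, c // reduction))  # e.g. 8 -> 4
    fc2 = rng.standard_normal((c // reduction, c))  # e.g. 4 -> 8
    z = x.mean(axis=(0, 1))            # squeeze: (c,) channel descriptors
    s = np.maximum(z @ fc1, 0)         # excite, step 1: (c/r,) with ReLU
    a = 1 / (1 + np.exp(-(s @ fc2)))   # excite, step 2: (c,) weights in (0, 1)
    return x * a                       # scale each channel by its weight
```

Because the sigmoid keeps every attention weight strictly between 0 and 1, each channel is scaled rather than replaced, which is exactly the "enlarge important, suppress unimportant" behavior described above.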

3. Results

4. Q&A