2. Model: SE module

The SE module takes a feature map of shape [h, w, c], squeezes it to a channel descriptor of shape [1, 1, c], passes it through two fully connected layers that produce an intermediate code of shape [1, 1, x] and channel weights of shape [1, 1, w], and then uses those weights to rescale the input channels.

For example, suppose the input feature map has 8 channels and the first fully connected layer outputs a vector of length 4.

Input: (c1, c2, c3, ..., c8)

The first fully connected layer learns to compress these 8 channels into 4 new channel descriptors x1~x4.

Fully connected layer 1: (x1, x2, x3, x4)

The second fully connected layer takes these 4 descriptors as input and restores the output length to 8; these values are the attention weights of the channels.

Fully connected layer 2: (w1, w2, w3, ..., w8)

Each weight wi is then used to scale the corresponding input channel ci.

Output: (w1c1, w2c2, w3c3, ..., w8c8)

Thus the first fully connected layer encodes the input channels and the second generates the attention weights; the two layers work together to learn an effective feature scaling.

Benefits: important feature channels are amplified and unimportant ones suppressed, which improves the model's ability to exploit the input information and to collect effective features.
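The following is a minimal sketch of this two-layer squeeze-and-excitation pattern in PyTorch, matching the 8-channel example above (FC1 output length 4). The class name SEBlock and the reduction parameter are illustrative, not taken from the original text; the code uses PyTorch's (batch, channels, h, w) layout rather than the [h, w, c] notation used above.

```python
import torch
import torch.nn as nn

class SEBlock(nn.Module):
    """Minimal Squeeze-and-Excitation block sketch (names are illustrative)."""
    def __init__(self, channels: int, reduction: int = 2):
        super().__init__()
        # Squeeze: global average pooling turns each [h, w] channel map into one value.
        self.pool = nn.AdaptiveAvgPool2d(1)
        # Excitation: FC1 compresses c -> c/reduction, FC2 restores c/reduction -> c.
        self.fc1 = nn.Linear(channels, channels // reduction)
        self.fc2 = nn.Linear(channels // reduction, channels)
        self.relu = nn.ReLU(inplace=True)
        self.sigmoid = nn.Sigmoid()

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        b, c, _, _ = x.shape
        s = self.pool(x).view(b, c)      # (b, c): one descriptor per channel
        s = self.relu(self.fc1(s))       # (b, c // reduction): e.g. 8 -> 4
        w = self.sigmoid(self.fc2(s))    # (b, c): attention weight per channel, e.g. 4 -> 8
        return x * w.view(b, c, 1, 1)    # scale each input channel by its weight

# Example: 8-channel feature map, FC1 output length 4 (reduction = 2)
x = torch.randn(1, 8, 32, 32)
y = SEBlock(channels=8, reduction=2)(x)
print(y.shape)  # torch.Size([1, 8, 32, 32])
```

Because the weights come from a sigmoid, each channel is scaled by a value between 0 and 1, so the block can only emphasize or suppress channels, not invert them.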