GoogLeNet Min-Seo Kim Network Science Lab Dept. of Artificial Intelligence The Catholic University of Korea E-mail: [email protected]
Ongoing studies: GoogLeNet
GoogLeNet The GoogLeNet submission to ILSVRC 2014 used 12x fewer parameters than the winning architecture from two years prior (AlexNet, Krizhevsky et al.), yet it was significantly more accurate. Another notable factor is that, with the ongoing traction of mobile and embedded computing, the efficiency of algorithms (especially their power and memory use) gains importance. Introduction
GoogLeNet Convolutional neural networks have typically had a standard structure: stacked convolutional layers (optionally followed by contrast normalization and max-pooling) are followed by one or more fully-connected layers. Network-in-Network uses 1x1 convolutional layers with ReLU activations to increase the representational power of the network. Related Work
GoogLeNet The most straightforward way of improving the performance of deep neural networks is to increase their size. However, this simple solution comes with two major drawbacks. First, bigger size typically means a larger number of parameters, which makes the enlarged network more prone to overfitting; this is a bottleneck because creating high-quality training sets is tricky and expensive. Second, uniformly increasing network size dramatically increases the use of computational resources. Since in practice the computational budget is always finite, an efficient distribution of computing resources is preferable to an indiscriminate increase in size. Motivation and High Level Considerations
GoogLeNet - Architectural Details To extract feature maps at multiple scales, 1x1, 3x3, and 5x5 convolution filters are applied in parallel and their outputs are concatenated. However, this inevitably increases the computational load. Inception module
GoogLeNet - Architectural Details Therefore, to address this issue, 1x1 convolution filters are used. Placing them before the 3x3 and 5x5 filters reduces the channel dimension, which in turn reduces the computational load and introduces additional non-linearity (see the sketch below). Inception module
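Below is a minimal PyTorch sketch of such an Inception module with 1x1 reductions; the class name is illustrative, and the example channel sizes are taken from inception(3a) in the paper.

```python
import torch
import torch.nn as nn

class InceptionModule(nn.Module):
    """Inception module with 1x1 dimension-reduction convolutions.

    Channel arguments follow the paper's notation:
    #1x1, #3x3 reduce, #3x3, #5x5 reduce, #5x5, pool proj.
    """
    def __init__(self, in_ch, n1x1, n3x3_red, n3x3, n5x5_red, n5x5, pool_proj):
        super().__init__()
        self.branch1 = nn.Sequential(
            nn.Conv2d(in_ch, n1x1, kernel_size=1), nn.ReLU(inplace=True))
        self.branch2 = nn.Sequential(
            nn.Conv2d(in_ch, n3x3_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(n3x3_red, n3x3, kernel_size=3, padding=1), nn.ReLU(inplace=True))
        self.branch3 = nn.Sequential(
            nn.Conv2d(in_ch, n5x5_red, kernel_size=1), nn.ReLU(inplace=True),
            nn.Conv2d(n5x5_red, n5x5, kernel_size=5, padding=2), nn.ReLU(inplace=True))
        self.branch4 = nn.Sequential(
            nn.MaxPool2d(kernel_size=3, stride=1, padding=1),
            nn.Conv2d(in_ch, pool_proj, kernel_size=1), nn.ReLU(inplace=True))

    def forward(self, x):
        # Concatenate the four parallel branches along the channel axis.
        return torch.cat([self.branch1(x), self.branch2(x),
                          self.branch3(x), self.branch4(x)], dim=1)

# inception(3a): 192 -> 64 + 128 + 32 + 32 = 256 channels at 28x28.
m = InceptionModule(192, 64, 96, 128, 16, 32, 32)
print(m(torch.randn(1, 192, 28, 28)).shape)  # torch.Size([1, 256, 28, 28])
```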
GoogLeNet - Architectural Details How does the 1x1 conv filter reduce the amount of computation?
Direct 5x5 convolution:
- input tensor = 28x28x192
- convolution filter = 5x5x192, number of filters = 32, padding = 2, stride = 1
- operations: 28x28x32 x 5x5x192 ≈ 120 million
With a 1x1 reduction first:
- input tensor = 28x28x192
- convolution filter = 1x1x192, number of filters = 16
- operations: 28x28x16 x 1x1x192 ≈ 2.4 million
- reduced tensor = 28x28x16
- convolution filter = 5x5x16, number of filters = 32, padding = 2, stride = 1
- operations: 28x28x32 x 5x5x16 ≈ 10 million
Total of about 12.4 million operations: roughly a tenfold decrease, with an additional non-linearity gained from the 1x1 layer.
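The same counting can be reproduced in a few lines of Python; the helper name conv_ops is illustrative, and only multiplications are counted.

```python
def conv_ops(h, w, in_ch, k, out_ch):
    """Multiplications for a k x k convolution producing an h x w x out_ch output."""
    return h * w * out_ch * k * k * in_ch

direct = conv_ops(28, 28, 192, 5, 32)                              # direct 5x5 conv
reduced = conv_ops(28, 28, 192, 1, 16) + conv_ops(28, 28, 16, 5, 32)  # 1x1 reduce, then 5x5
print(f"direct:  {direct:,}")            # 120,422,400
print(f"reduced: {reduced:,}")           # 12,443,648
print(f"speed-up: {direct / reduced:.1f}x")  # ~9.7x
```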
GoogLeNet - Architectural Details This is the parameter calculation for the inception(3a) module inside the actual GoogLeNet. Inception in GoogLeNet (inception 3a)
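As a rough cross-check, the weight count of inception(3a) can be re-derived from the channel sizes given in Table 1 of the paper (#1x1=64, #3x3 reduce=96, #3x3=128, #5x5 reduce=16, #5x5=32, pool proj=32 on a 192-channel input). Biases are ignored here, so the result lands slightly above the ~159K weights quoted in the paper.

```python
# Weight counts per branch of inception(3a), biases omitted.
in_ch = 192
p1   = 1 * 1 * in_ch * 64                        # 1x1 branch
p3   = 1 * 1 * in_ch * 96 + 3 * 3 * 96 * 128     # 3x3 reduce + 3x3
p5   = 1 * 1 * in_ch * 16 + 5 * 5 * 16 * 32      # 5x5 reduce + 5x5
pool = 1 * 1 * in_ch * 32                        # pooling projection
print(p1 + p3 + p5 + pool)                       # 163,328 weights (~160K)
```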
GoogLeNet - Architectural Details This is where the lower layers are located, close to the input image. For efficient memory usage, a plain stack of convolution and pooling layers (a basic CNN stem) is used here; the Inception modules are reserved for the higher layers and are not used in this part. Part 1
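A minimal PyTorch sketch of this lower-layer stem, assuming the Table 1 configuration of the paper; the local response normalization layers are omitted for brevity.

```python
import torch.nn as nn

# Stem of GoogLeNet (Part 1): plain convolutions and pooling, no Inception modules.
stem = nn.Sequential(
    nn.Conv2d(3, 64, kernel_size=7, stride=2, padding=3),    # 224x224x3 -> 112x112x64
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),   # -> 56x56x64
    nn.Conv2d(64, 64, kernel_size=1),                         # 1x1 reduce
    nn.ReLU(inplace=True),
    nn.Conv2d(64, 192, kernel_size=3, padding=1),             # -> 56x56x192
    nn.ReLU(inplace=True),
    nn.MaxPool2d(kernel_size=3, stride=2, ceil_mode=True),   # -> 28x28x192
)
```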
GoogLeNet - Architectural Details To extract various features, the Inception modules described earlier are stacked in this part. Part 2
GoogLeNet - Architectural Details As the model becomes very deep, the vanishing gradient problem can occur even when using the ReLU activation function. Auxiliary classifiers are therefore attached to intermediate layers; they produce intermediate predictions so that additional gradient can be injected during backpropagation. To keep them from having too much influence, the loss of each auxiliary classifier is multiplied by 0.3 and added to the total loss of the network (see the sketch below). At test time, the auxiliary classifiers are removed and only the softmax at the far end is used. Part 3
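A minimal sketch of the training-time loss with the 0.3 weighting, assuming a model that returns the main logits plus two auxiliary outputs (as GoogLeNet does); the function and argument names are illustrative.

```python
import torch.nn.functional as F

def googlenet_loss(main_logits, aux1_logits, aux2_logits, targets):
    """Total training loss: main classifier plus down-weighted auxiliary classifiers."""
    main_loss = F.cross_entropy(main_logits, targets)
    aux_loss = F.cross_entropy(aux1_logits, targets) + F.cross_entropy(aux2_logits, targets)
    # Auxiliary losses are multiplied by 0.3 so they do not dominate training.
    return main_loss + 0.3 * aux_loss
```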
GoogLeNet - Architectural Details This is the end of the model, where the prediction is produced. A global average pooling layer is applied instead of a stack of fully-connected layers, reducing each feature map to a single value without adding any parameters; a single linear layer with softmax then produces the class scores. Part 4
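A minimal PyTorch sketch of this head, assuming the paper's ILSVRC setup (7x7x1024 input to the pooling layer, 40% dropout, 1000 classes).

```python
import torch.nn as nn

# Classifier head (Part 4): global average pooling followed by a single linear layer.
head = nn.Sequential(
    nn.AdaptiveAvgPool2d(1),   # 7x7x1024 -> 1x1x1024, no extra parameters
    nn.Flatten(),
    nn.Dropout(p=0.4),
    nn.Linear(1024, 1000),     # softmax is applied in the loss function
)
```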
GoogLeNet GoogLeNet presented a new methodology, different from existing CNN approaches that simply stack more depth. It won first place in the ILSVRC 2014 classification task, ahead of VGGNet. Conclusions