Inception-v4, Inception-ResNet and the Impact of Residual Connections on Learning. Christian Szegedy, Sergey Ioffe, Vincent Vanhoucke, Alexander A. Alemi (Google Research, 2017)
Problem & Motivation: Deep CNNs drive advances in image recognition. Inception networks are computationally efficient with strong performance; residual connections (ResNets) ease optimization of very deep networks and won ILSVRC 2015. Gap: it was unclear whether Inception architectures benefit from residual connections. Goal: build hybrid Inception-ResNet variants, design a simplified Inception-v4, and compare them.
Method – Key Contributions: Introduced Inception-v4, a simplified, more uniform Inception architecture. Introduced Inception-ResNet-v1/v2, which combine Inception modules with residual connections. Scaling the residuals by a factor of 0.1–0.3 before addition stabilized training of very wide variants (see the sketch below). TensorFlow allowed training full models without partitioning replicas. Training setup: RMSProp, gradient clipping, exponential learning-rate decay.
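To make the residual-scaling trick concrete, here is a minimal TensorFlow/Keras sketch of a residual block whose branch output is multiplied by a small constant before being added to the shortcut, in the spirit of the paper's Inception-ResNet modules. The simplified two-convolution branch, the function name scaled_residual_block, and the default scale of 0.2 are illustrative assumptions, not the paper's exact module.

```python
import tensorflow as tf
from tensorflow.keras import layers

def scaled_residual_block(x, filters, scale=0.2):
    """Residual block with a scaled branch (sketch, not the paper's module).

    The branch here is a simplified stand-in for a full Inception-ResNet
    module; the key point is multiplying its output by a small constant
    (the paper reports 0.1-0.3) before the elementwise addition.
    """
    in_channels = x.shape[-1]
    # Simplified branch (a real Inception-ResNet module uses parallel paths).
    branch = layers.Conv2D(filters, 1, padding="same", activation="relu")(x)
    branch = layers.Conv2D(filters, 3, padding="same", activation="relu")(branch)
    # Linear 1x1 projection back to the input depth so the shapes match.
    branch = layers.Conv2D(in_channels, 1, padding="same", activation=None)(branch)
    # Scale the residual before adding it to the shortcut; this is the
    # stabilization trick for very wide (>1000-filter) variants.
    branch = layers.Rescaling(scale)(branch)
    out = layers.Add()([x, branch])
    return layers.Activation("relu")(out)

# Example usage: stack a few blocks on a dummy input.
inputs = tf.keras.Input(shape=(35, 35, 256))
h = scaled_residual_block(inputs, filters=32, scale=0.2)
h = scaled_residual_block(h, filters=32, scale=0.2)
model = tf.keras.Model(inputs, h)
model.summary()
```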
Results – Training Dynamics: Residual connections → markedly faster training than pure Inception networks. Inception-ResNet slightly outperforms Inception models of similar computational cost. Final accuracy depends mainly on model size, but residuals improve training efficiency (convergence speed).
Strengths: Introduced strong new architectures (Inception-v4, Inception-ResNet-v1/v2). Residual scaling stabilized very wide networks. Residual connections improved convergence speed and optimization. Achieved state-of-the-art ImageNet performance (3.08% top-5 error with an ensemble).
Limitations: Instability: residual variants with more than 1000 filters 'died' early in training unless the residuals were scaled down. Final accuracy depended more on model size than on the presence of residuals. Comparisons were ad hoc: models were matched by rough computational cost rather than systematically optimized. Ensembling gains were smaller than expected.
Takeaways: Problem: can Inception benefit from residuals? Answer: yes; the hybrids train faster and are often slightly more accurate. Results: Inception-v4 and Inception-ResNet-v2 matched or exceeded the prior state of the art. Impact: residual connections were confirmed as broadly useful, inspiring later hybrid architectures.