Network In Network
We propose a novel deep network structure called "Network In Network" (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking multiple of the above described structure. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated the state-of-the-art classification performances with NIN on CIFAR-10 and CIFAR-100, and reasonable performances on SVHN and MNIST datasets.
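The two ideas in the abstract, a small MLP slid over local patches and global average pooling in place of fully connected layers, can be sketched in plain Python. This is a minimal illustration rather than the authors' implementation: the layer sizes, the random weights, the ReLU activation, the 8x8 three-channel input, and all function names (`mlpconv`, `global_average_pool`, `random_layer`) are assumptions chosen for readability.

```python
import math
import random


def relu(values):
    return [max(0.0, v) for v in values]


def dense(values, weights, bias):
    # weights holds one row per output unit.
    return [sum(w * v for w, v in zip(row, values)) + b
            for row, b in zip(weights, bias)]


def random_layer(n_in, n_out, rng):
    # Hypothetical initialisation, for illustration only.
    weights = [[rng.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return weights, [0.0] * n_out


def mlpconv(image, layers, k):
    """Slide a small MLP over every k x k patch (stride 1, no padding).

    image is indexed as image[channel][row][col]; the result has one
    feature map per output unit of the last MLP layer.
    """
    height, width = len(image[0]), len(image[0][0])
    out_h, out_w = height - k + 1, width - k + 1
    n_out = len(layers[-1][0])
    maps = [[[0.0] * out_w for _ in range(out_h)] for _ in range(n_out)]
    for y in range(out_h):
        for x in range(out_w):
            # Flatten the local patch across all input channels.
            v = [image[c][y + i][x + j]
                 for c in range(len(image)) for i in range(k) for j in range(k)]
            # Run the micro network (assumed ReLU after every layer).
            for weights, bias in layers:
                v = relu(dense(v, weights, bias))
            for c in range(n_out):
                maps[c][y][x] = v[c]
    return maps


def global_average_pool(maps):
    # One scalar per feature map: the mean over all spatial positions.
    return [sum(sum(row) for row in fm) / (len(fm) * len(fm[0])) for fm in maps]


def softmax(scores):
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


rng = random.Random(0)
image = [[[rng.random() for _ in range(8)] for _ in range(8)] for _ in range(3)]
# Assumed sizes: 3x3 patches over 3 channels, two hidden layers, 10 classes.
layers = [random_layer(3 * 3 * 3, 8, rng),
          random_layer(8, 8, rng),
          random_layer(8, 10, rng)]
maps = mlpconv(image, layers, k=3)    # 10 feature maps, one per class
pooled = global_average_pool(maps)    # 10 scalars, no fully connected layer
probs = softmax(pooled)
print(len(maps), len(maps[0]), len(maps[0][0]))
print(len(probs))
```

In this sketch the first MLP layer plays the role of an ordinary convolution filter over the patch, while the later layers act on each spatial position independently, which is equivalent to 1x1 convolutions. The pooling step has no trainable parameters, which is consistent with the abstract's remark that it is less prone to overfitting than fully connected layers.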
This paper has not been read by Pith yet.
Forward citations
Cited by 7 Pith papers
- Deep Residual Learning for Image Recognition
  Residual networks reformulate layers to learn residual functions, enabling effective training of up to 152-layer models that achieve 3.57% error on ImageNet and win ILSVRC 2015.
- Latent Space Probing for Adult Content Detection in Video Generative Models
  Latent space probing on CogVideoX achieves 97.29% F1 for adult content detection on a new 11k-clip dataset with 4-6ms overhead.
- Wide Residual Networks
  Wide residual networks achieve higher accuracy and faster training than very deep thin residual networks by increasing width and decreasing depth, setting new state-of-the-art results on CIFAR, SVHN, and ImageNet.
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
  A pruning-quantization-Huffman pipeline compresses deep neural networks 35-49x without accuracy loss.
- Breaking Global Self-Attention Bottlenecks in Transformer-based Spiking Neural Networks with Local Structure-Aware Self-Attention
  LSFormer uses local structure-aware spiking self-attention and spiking response pooling to cut global attention bottlenecks, delivering 4.3% and 8.6% accuracy gains on Tiny-ImageNet and N-CALTECH101 over prior transfo...
- Higher Resolution, Better Generalization: Unlocking Visual Scaling in Deep Reinforcement Learning
  Higher-resolution observations with global-average-pooling encoders improve RL performance and generalization by enabling more localized visual attention, yielding up to 28% gains over standard Impala encoders.
- Parameter-Efficient Architectural Modifications for Translation-Invariant CNNs
  Strategic insertion of Global Average Pooling layers in VGG-16 reduces trainable parameters by 98%, maintains 66.4% ImageNet Top-1 accuracy, doubles translation robustness, and yields superior Spearman correlations in...