Striving for Simplicity: The All Convolutional Net

Jost Tobias Springenberg; Alexey Dosovitskiy; Thomas Brox; Martin Riedmiller

arxiv: 1412.6806 · v3 · pith:H2IXAE26new · submitted 2014-12-21 · cs.LG · cs.CV · cs.NE

keywords: convolutional · recognition · layers · network · object · cnns · finding · max-pooling

Most modern convolutional neural networks (CNNs) used for object recognition are built using the same principles: alternating convolution and max-pooling layers followed by a small number of fully connected layers. We re-evaluate the state of the art for object recognition from small images with convolutional networks, questioning the necessity of different components in the pipeline. We find that max-pooling can simply be replaced by a convolutional layer with increased stride without loss in accuracy on several image recognition benchmarks. Following this finding, and building on other recent work for finding simple network structures, we propose a new architecture that consists solely of convolutional layers and yields competitive or state-of-the-art performance on several object recognition datasets (CIFAR-10, CIFAR-100, ImageNet). To analyze the network we introduce a new variant of the "deconvolution approach" for visualizing features learned by CNNs, which can be applied to a broader range of network structures than existing approaches.
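The central substitution in the abstract, replacing max-pooling with a convolution of increased stride, can be illustrated with a minimal sketch. It assumes a single-channel feature map, no padding, and a 2x2 window; the helper names (`max_pool2d`, `conv2d`, `shape`) and the averaging kernel are hypothetical and stand in for weights a real network would learn. The sketch only shows that both operations downsample to the same output shape; it says nothing about accuracy.

```python
# Illustrative sketch only: pure-Python 2D ops on nested lists,
# single channel, no padding. Not the authors' implementation.

def max_pool2d(x, size=2, stride=2):
    """Take the maximum over size x size windows moved by `stride`."""
    rows = range(0, len(x) - size + 1, stride)
    cols = range(0, len(x[0]) - size + 1, stride)
    return [[max(x[i + di][j + dj]
                 for di in range(size) for dj in range(size))
             for j in cols] for i in rows]

def conv2d(x, kernel, stride=1):
    """Valid cross-correlation of x with a square kernel moved by `stride`."""
    k = len(kernel)
    rows = range(0, len(x) - k + 1, stride)
    cols = range(0, len(x[0]) - k + 1, stride)
    return [[sum(kernel[di][dj] * x[i + di][j + dj]
                 for di in range(k) for dj in range(k))
             for j in cols] for i in rows]

def shape(x):
    """Return (height, width) of a nested-list feature map."""
    return (len(x), len(x[0]))

# Hypothetical 4x4 feature map.
feature_map = [[1, 2, 3, 4],
               [5, 6, 7, 8],
               [9, 10, 11, 12],
               [13, 14, 15, 16]]

# Hypothetical fixed kernel (a plain average); a trained network
# would learn these weights instead.
kernel = [[0.25, 0.25],
          [0.25, 0.25]]

pooled = max_pool2d(feature_map, size=2, stride=2)
strided = conv2d(feature_map, kernel, stride=2)

print(shape(pooled), shape(strided))  # both halve each spatial dimension
```

The design point is that the strided convolution performs the same spatial reduction as pooling while making the reduction itself a learnable operation, which is what allows a network to consist solely of convolutional layers.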

Forward citations

Cited by 18 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Dataset Distillation
cs.LG · 2018-11 · unverdicted · novelty 8.0

Dataset distillation creates a tiny synthetic training set that, when used with a fixed network initialization, produces models whose performance approximates that of models trained on the full original dataset.

Federated Learning: Strategies for Improving Communication Efficiency
cs.LG · 2016-10 · conditional · novelty 8.0

Structured updates (low-rank or masked) and sketched updates (quantized, rotated, subsampled) reduce uplink communication in federated learning by up to two orders of magnitude on convolutional and recurrent networks.

Unsupervised Representation Learning with Deep Convolutional Generative Adversarial Networks
cs.LG · 2015-11 · accept · novelty 8.0

DCGANs with architectural constraints learn a hierarchy of representations from object parts to scenes in both generator and discriminator across image datasets.

Spectral Integrated Gradients for Coarse-to-Fine Feature Attribution
cs.CV · 2026-05 · unverdicted · novelty 7.0

Spectral Integrated Gradients constructs SVD-based integration paths that activate singular components from largest to smallest, producing cleaner attribution maps and better quantitative scores than standard Integrat...

ExPath: Targeted Pathway Inference for Biological Knowledge Bases via Graph Learning and Explanation
cs.LG · 2025-02 · unverdicted · novelty 6.0

ExPath is a subgraph inference framework that classifies bio-networks with experimental data and uses explanations to identify targeted pathways, reporting up to 4.5x higher Fidelity+ and 14x lower Fidelity- than base...

SalUn: Empowering Machine Unlearning via Gradient-based Weight Saliency in Both Image Classification and Generation
cs.LG · 2023-10 · conditional · novelty 6.0

SalUn uses gradient-based weight saliency to achieve effective machine unlearning of data, classes, or concepts in image classification and generation, narrowing the gap to exact retraining.

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
cs.LG · 2021-04 · accept · novelty 6.0

Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

Saliency-driven Word Alignment Interpretation for Neural Machine Translation
cs.CL · 2019-06 · unverdicted · novelty 6.0

Saliency-driven interpretation methods reveal that NMT models learn word alignments of better quality than fast-align under force decoding and consistent with automatic tools under free decoding.

Seeing What Shouldn't Be There: Counterfactual GANs for Medical Image Attribution
cs.CV · 2026-05 · unverdicted · novelty 5.0

A cycle-consistent GAN generates counterfactual medical images to attribute classification decisions more comprehensively than standard saliency methods.

Leveraging Convolutional Sparse Autoencoders for Robust Movement Classification from Low-Density sEMG
cs.LG · 2026-01 · unverdicted · novelty 5.0

Convolutional sparse autoencoder on two-channel sEMG delivers 94.3% multi-subject F1 for six gestures, 92.3% after few-shot transfer to unseen subjects, and 90% after incremental extension to ten classes.

AdaProb: Efficient Machine Unlearning via Adaptive Probability
cs.LG · 2024-11 · unverdicted · novelty 5.0

AdaProb performs machine unlearning by substituting final-layer output probabilities with optimized uniform pseudo-probabilities and updating model weights.

Explaining Graph Neural Networks for Node Similarity on Graphs
cs.LG · 2024-07 · unverdicted · novelty 5.0

Empirical comparison shows gradient-based explanations for GNN node similarities are actionable, consistent, and retain effects when sparsified, unlike mutual information explanations.

Explaining the Explainers in Graph Neural Networks: a Comparative Study
cs.LG · 2022-10 · unverdicted · novelty 5.0

Benchmark study of ten GNN explainers on eight architectures and six datasets that isolates usable components and issues practical recommendations.

Explaining an increase in predicted risk for clinical alerts
cs.LG · 2019-07 · unverdicted · novelty 5.0

Methods are introduced to lift static attribution techniques to dynamical models for explaining risk increases in clinical alert systems.

Out-of-Distribution Detection Using Neural Rendering Generative Models
cs.LG · 2019-07 · unverdicted · novelty 5.0

NRM enables OoD detection by joint latent likelihood, assigning lower values to SVHN than CIFAR-10 (unlike VAEs/flows) and consistent across other OoD sets.

Explainable Human Activity Recognition: A Unified Review of Concepts and Mechanisms
cs.LG · 2026-04 · unverdicted · novelty 4.0

The paper delivers a mechanism-centric taxonomy and unified perspective on explainable human activity recognition methods across sensing modalities.

How Does Overparameterization Affect Machine Unlearning of Deep Neural Networks?
cs.LG · 2025-03 · unverdicted · novelty 4.0

Overparameterized DNNs enable more effective machine unlearning for privacy and bias removal via localized decision-region adjustments, with performance depending on method access to forgotten data.

Image Classification with Hierarchical Multigraph Networks
cs.CV · 2019-07 · unverdicted · novelty 4.0

Hierarchical multigraph GCNs applied to superpixels achieve competitive or superior accuracy to CNNs on standard image classification benchmarks.