OverFeat: Integrated Recognition, Localization and Detection using Convolutional Networks

David Eigen; Michael Mathieu; Pierre Sermanet; Rob Fergus; Xiang Zhang; Yann LeCun

arxiv: 1312.6229 · v4 · pith:3CHUHHJ7new · submitted 2013-12-21 · 💻 cs.CV

keywords: detection, localization, integrated, approach, convolutional, framework, learning, networks

We present an integrated framework for using Convolutional Networks for classification, localization and detection. We show how a multiscale and sliding window approach can be efficiently implemented within a ConvNet. We also introduce a novel deep learning approach to localization by learning to predict object boundaries. Bounding boxes are then accumulated rather than suppressed in order to increase detection confidence. We show that different tasks can be learned simultaneously using a single shared network. This integrated framework is the winner of the localization task of the ImageNet Large Scale Visual Recognition Challenge 2013 (ILSVRC2013) and obtained very competitive results for the detection and classification tasks. In post-competition work, we establish a new state of the art for the detection task. Finally, we release a feature extractor from our best model called OverFeat.
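
The abstract's claim that a sliding window can be implemented efficiently within a ConvNet can be illustrated with a toy example. The sketch below is not the OverFeat model: `conv2d_valid`, `tiny_net`, the kernel sizes and the random weights are all hypothetical, chosen only to show that treating the final classifier layer as a convolution and running the network once on a larger image gives the same outputs as running it separately on every window. The toy network has no pooling, so the effective window stride is 1; a network with subsampling layers would produce a coarser output map.

```python
import numpy as np


def conv2d_valid(x, w):
    """Valid cross-correlation of a 2-D array x with a 2-D kernel w (stride 1)."""
    kh, kw = w.shape
    out_h = x.shape[0] - kh + 1
    out_w = x.shape[1] - kw + 1
    out = np.empty((out_h, out_w))
    for i in range(out_h):
        for j in range(out_w):
            out[i, j] = np.sum(x[i:i + kh, j:j + kw] * w)
    return out


def tiny_net(x, w_conv, w_fc):
    """Hypothetical two-layer net: conv + ReLU, then the classifier layer
    applied as a convolution over the feature map."""
    hidden = np.maximum(conv2d_valid(x, w_conv), 0.0)
    return conv2d_valid(hidden, w_fc)


rng = np.random.default_rng(0)
w_conv = rng.standard_normal((3, 3))    # feature-extraction kernel (assumed size)
w_fc = rng.standard_normal((4, 4))      # classifier over a 4x4 feature map
window = 6                              # input size giving a single 1x1 output
image = rng.standard_normal((10, 10))   # larger than one window

# One pass over the whole image yields a spatial map of outputs.
dense = tiny_net(image, w_conv, w_fc)

# Explicit sliding window: run the net on every 6x6 crop separately.
n_rows = image.shape[0] - window + 1
n_cols = image.shape[1] - window + 1
sliding = np.array([
    [tiny_net(image[i:i + window, j:j + window], w_conv, w_fc)[0, 0]
     for j in range(n_cols)]
    for i in range(n_rows)
])

print(dense.shape, np.allclose(dense, sliding))
```

The single dense pass reuses the intermediate feature map across overlapping windows, whereas the explicit loop recomputes it for each crop; that shared computation is the source of the efficiency.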

Forward citations

Cited by 6 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Differentiable Surrogate for Detector Simulation and Design with Diffusion Models
physics.ins-det 2026-01 unverdicted novelty 7.0

A LoRA-adapted conditional diffusion surrogate for electromagnetic calorimeter showers matches key observables within 2% RMSE and reproduces directional trends in design-utility gradients.
Socratic Models: Composing Zero-Shot Multimodal Reasoning with Language
cs.CV 2022-04 unverdicted novelty 7.0

Socratic Models compose zero-shot multimodal reasoning by prompting pretrained language and vision models to exchange information and enable new capabilities without finetuning.
DETOUR: A Practical Backdoor Attack against Object Detection
cs.CR 2026-04 unverdicted novelty 6.0

DETOUR enables practical backdoor attacks on object detectors by training with rescaled semantic triggers from real-world objects placed at multiple locations to exploit the trigger radiating effect for reliable activ...
Rethinking Atrous Convolution for Semantic Image Segmentation
cs.CV 2017-06 unverdicted novelty 6.0

DeepLabv3 improves semantic segmentation by capturing multi-scale context with cascaded or parallel atrous convolutions and adding global context to ASPP, achieving better results on PASCAL VOC 2012 without DenseCRF p...
A Regularized Convolutional Neural Network for Semantic Image Segmentation
cs.CV 2019-06 unverdicted novelty 3.0

Integrating total variation regularization into U-Net and SegNet yields segmentation results with improved spatial regularity and noise robustness on WBC, CamVid, and SUN-RGBD datasets.
A review on deep learning techniques for 3D sensed data classification
cs.CV 2019-07 unverdicted novelty 1.0

A survey of deep learning architectures for 3D sensed data classification covering RGB-D, multi-view, volumetric and end-to-end methods along with datasets and future directions.