Pith: machine review for the scientific record.

arxiv: 1311.2524 · v5 · submitted 2013-11-11 · 💻 cs.CV

Rich feature hierarchies for accurate object detection and semantic segmentation

classification 💻 cs.CV
keywords detection, r-cnn, cnns, combine, dataset, features, object, overfeat

Object detection performance, as measured on the canonical PASCAL VOC dataset, has plateaued in the last few years. The best-performing methods are complex ensemble systems that typically combine multiple low-level image features with high-level context. In this paper, we propose a simple and scalable detection algorithm that improves mean average precision (mAP) by more than 30% relative to the previous best result on VOC 2012, achieving a mAP of 53.3%. Our approach combines two key insights: (1) one can apply high-capacity convolutional neural networks (CNNs) to bottom-up region proposals in order to localize and segment objects and (2) when labeled training data is scarce, supervised pre-training for an auxiliary task, followed by domain-specific fine-tuning, yields a significant performance boost. Since we combine region proposals with CNNs, we call our method R-CNN: Regions with CNN features. We also compare R-CNN to OverFeat, a recently proposed sliding-window detector based on a similar CNN architecture. We find that R-CNN outperforms OverFeat by a large margin on the 200-class ILSVRC2013 detection dataset. Source code for the complete system is available at http://www.cs.berkeley.edu/~rbg/rcnn.
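
The "more than 30% relative" figure in the abstract is easy to misread as 30 percentage points. As a rough illustration only, the Python sketch below works out what that claim implies about the previous best result, using just the two numbers the abstract states (53.3% mAP and a relative gain above 30%). The function names are hypothetical, and the previous best mAP itself is not given on this page, so the sketch derives only an upper bound for it.

```python
def implied_baseline_upper_bound(new_map: float, min_relative_gain: float) -> float:
    """Largest previous-best mAP consistent with a relative gain of at least
    `min_relative_gain` (e.g. 0.30 for 30%). Hypothetical helper for illustration."""
    return new_map / (1.0 + min_relative_gain)


def relative_gain(new_map: float, old_map: float) -> float:
    """Relative improvement of new_map over old_map, as a fraction."""
    return (new_map - old_map) / old_map


# Numbers taken from the abstract: 53.3% mAP, more than 30% relative improvement.
bound = implied_baseline_upper_bound(53.3, 0.30)
absolute_gain = 53.3 - bound

print(f"previous best mAP must be below {bound:.1f}%")
print(f"absolute gain must exceed {absolute_gain:.1f} percentage points")
```

Under these assumptions, the claim implies a previous best below about 41.0% mAP, which is an absolute gain of more than roughly 12.3 percentage points rather than 30.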

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Intriguing properties of neural networks

    cs.CV 2013-12 accept novelty 8.0

    Deep neural networks exhibit distributed high-level semantic representations and discontinuous input-output mappings vulnerable to transferable adversarial perturbations.

  2. SAM 3D: 3Dfy Anything in Images

    cs.CV 2025-11 unverdicted novelty 6.0

    SAM 3D reconstructs 3D objects from single images with geometry, texture, and pose using human-model annotated data at scale and synthetic-to-real training, achieving 5:1 human preference wins.

  3. Learning to count small and clustered objects with application to bacterial colonies

    cs.CV 2026-04 unverdicted novelty 4.0

    ACFamNet Pro reaches 9.64% mean normalized absolute error on bacterial colony images under 5-fold cross-validation, beating FamNet by 12.71%.