Faster R-CNN: Towards Real-Time Object Detection with Region Proposal Networks
State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features---using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
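As a rough illustration of the mechanism the abstract describes, the RPN scores and regresses a set of reference boxes ("anchors") at every feature-map position. The sketch below generates such anchors in NumPy; the stride, scales, and aspect ratios are illustrative assumptions roughly matching the paper's VGG-16 setup, not a definitive reimplementation.

```python
import numpy as np

def generate_anchors(feat_h, feat_w, stride=16,
                     scales=(128, 256, 512),
                     ratios=(0.5, 1.0, 2.0)):
    """Generate k = len(scales) * len(ratios) anchors per feature-map
    position, centered on the corresponding input-image location.
    Returns an array of shape (feat_h * feat_w * k, 4) as (x1, y1, x2, y2)."""
    base = []
    for scale in scales:
        for ratio in ratios:
            # Keep the anchor area ~= scale**2 while varying aspect ratio h/w.
            w = scale * np.sqrt(1.0 / ratio)
            h = scale * np.sqrt(ratio)
            base.append([-w / 2, -h / 2, w / 2, h / 2])
    base = np.array(base)                          # (k, 4), centered at origin

    # Centers of each feature-map cell, mapped back to input-image coordinates.
    cx = (np.arange(feat_w) + 0.5) * stride
    cy = (np.arange(feat_h) + 0.5) * stride
    cx, cy = np.meshgrid(cx, cy)
    shifts = np.stack([cx, cy, cx, cy], axis=-1).reshape(-1, 1, 4)

    # Broadcast: every cell center gets all k base anchors.
    return (shifts + base).reshape(-1, 4)

anchors = generate_anchors(2, 3)
print(anchors.shape)  # (2 * 3 * 9, 4) = (54, 4)
```

In the full network, a small convolutional head then predicts 2k objectness scores and 4k box offsets per position, and the ~300 top-scoring proposals are handed to the Fast R-CNN detector.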
Forward citations
Cited by 19 Pith papers
- Tri-Modal Fusion Transformers for UAV-based Object Detection
  A dual-stream vision transformer with modality-aware gated exchange and bidirectional token exchange fuses RGB, thermal, and event data to improve UAV vehicle detection over dual-modal baselines on a new 10,489-frame dataset.
- Investigating Anisotropy in Visual Grounding under Controlled Counterfactual Perturbations
  Controlled counterfactual perturbations reveal no correlation between embedding cosine similarity and approximation behavior in two visual grounding models.
- Transferable Physical-World Adversarial Patches Against Pedestrian Detection Models
  TriPatch generates transferable physical adversarial patches via multi-stage triplet loss, appearance consistency, and data augmentation to achieve higher attack success rates on pedestrian detectors than prior methods.
- PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers
  PASTA enables patch-agnostic backdoor activation in ViTs via multi-location trigger insertion during training and bi-level optimization, achieving 99.13% average attack success with large gains in visual/attention ste...
- AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning
  AIM applies modality-specific masks to balance stability and plasticity in asymmetric VLMs, achieving SOTA average performance and reduced forgetting on continual VQA v2 and GQA while preserving generalization to nove...
- DroneScan-YOLO: Redundancy-Aware Lightweight Detection for Tiny Objects in UAV Imagery
  DroneScan-YOLO reaches 55.3% mAP@50 and 35.6% mAP@50-95 on VisDrone2019-DET by combining 1280x1280 input, RPA-Block pruning, MSFD stride-4 branch, and SAL-NWD loss, beating YOLOv8s by 16.6 and 12.3 points with only 4....
- Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization
  VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.
- Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
  Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.
- Unveiling Hidden Lyman Alpha Emitters in the DESI DR1 Data
  A CNN detects 19,685 LAEs at z=2-3.5 in DESI DR1 spectra with 95% purity and completeness.
- Investigation of cardinality classification for bacterial colony counting using explainable artificial intelligence
  XAI analysis identifies high visual similarity across colony cardinality classes as the primary limit on MicrobiaNet performance in bacterial colony counting, revising prior model assessments.
- A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures
  WSA-Net uses partial convolutions, heterogeneous grouping attention, geometric reconstruction, and context anchoring to enhance low-SCR hyperbolic signatures in GPR data, reaching 0.6958 mAP@0.5 at 164 FPS with 2.412M...
- Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning
  A two-stage weakly supervised pipeline pretrains on auto-generated school labels from sparse points and fine-tunes on only 50 manual examples to achieve strong detection performance in aerial imagery.
- Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection
  A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher mod...
- KAYRA: A Microservice Architecture for AI-Assisted Karyotyping with Cloud and On-Premise Deployment
  KAYRA packages a cascade of EfficientNet-B5 + U-Net, Mask R-CNN, and ResNet-18 models into a microservice architecture that supports both cloud and on-premise deployment and reaches 98.91% segmentation accuracy in a p...
- Learning to count small and clustered objects with application to bacterial colonies
  ACFamNet Pro reaches 9.64% mean normalized absolute error on bacterial colony images under 5-fold cross-validation, beating FamNet by 12.71%.
- Virtual KITTI 2
  Virtual KITTI 2 supplies synthetic clones of real KITTI driving sequences with added weather and camera variants and multi-modal ground-truth annotations for autonomous driving vision research.
- YOLOv3: An Incremental Improvement
  YOLOv3 achieves accuracy comparable to SSD and RetinaNet but runs substantially faster, with 28.2 mAP at 320x320 in 22 ms and 57.9 mAP@50 in 51 ms on a Titan X.
- A Comparative Study of Modern Object Detectors for Robust Apple Detection in Orchard Imagery
  YOLO11n achieves the highest mAP@0.5:0.95 of 0.6065 for apple localization, with other detectors showing trade-offs in recall and precision at low confidence thresholds.
- AI Driven Soccer Analysis Using Computer Vision
  A system combining object detection, segmentation, keypoint prediction, and homography transforms soccer video into real-world player positions and tactical statistics.