pith.


Girshick, and Jian Sun

25 Pith papers cite this work. Polarity classification is still being indexed.

abstract

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features; using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
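The abstract says the RPN predicts object bounds and objectness scores at each position, and that detection uses only 300 proposals per image. The sketch below shows one way such raw predictions might be reduced to a short proposal list. It works in three steps:

- Decode each predicted box as offsets relative to a reference ("anchor") box.
- Rank the boxes by objectness.
- Filter them with greedy non-maximum suppression (NMS), keeping at most `top_n`.

Several details are assumptions for illustration and are not stated in this abstract: the center/log-scale offset parameterization, the 0.7 IoU threshold, and the function names `decode`, `iou`, and `select_proposals`. This is not the authors' released code.

```python
import math
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)


def decode(anchor: Box, delta: Tuple[float, float, float, float]) -> Box:
    """Apply predicted offsets (tx, ty, tw, th) to an anchor box.

    Assumed parameterization: center shifts scaled by anchor size,
    width/height scaled by exp(tw), exp(th).
    """
    ax1, ay1, ax2, ay2 = anchor
    wa, ha = ax2 - ax1, ay2 - ay1
    xa, ya = ax1 + wa / 2, ay1 + ha / 2
    tx, ty, tw, th = delta
    x, y = xa + tx * wa, ya + ty * ha
    w, h = wa * math.exp(tw), ha * math.exp(th)
    return (x - w / 2, y - h / 2, x + w / 2, y + h / 2)


def iou(a: Box, b: Box) -> float:
    """Intersection-over-union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def select_proposals(
    anchors: Sequence[Box],
    deltas: Sequence[Tuple[float, float, float, float]],
    scores: Sequence[float],
    iou_thresh: float = 0.7,  # assumed NMS threshold
    top_n: int = 300,  # proposal budget mentioned in the abstract
) -> List[Tuple[Box, float]]:
    """Decode boxes, then run greedy NMS by objectness; keep at most top_n."""
    boxes = [decode(a, d) for a, d in zip(anchors, deltas)]
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep: List[int] = []
    for i in order:
        # Keep a box only if it does not overlap too much with any kept box.
        if all(iou(boxes[i], boxes[k]) <= iou_thresh for k in keep):
            keep.append(i)
            if len(keep) == top_n:
                break
    return [(boxes[i], scores[i]) for i in keep]


if __name__ == "__main__":
    anchors = [(0, 0, 10, 10), (0, 1, 10, 11), (50, 50, 60, 60)]
    deltas = [(0.0, 0.0, 0.0, 0.0)] * 3
    scores = [0.9, 0.8, 0.7]
    # The second box overlaps the first (IoU 90/110, about 0.82) and is suppressed.
    print(select_proposals(anchors, deltas, scores))
```

Greedy NMS is used here because it is the simplest way to turn dense, overlapping per-position predictions into a small, diverse set. The cap of 300 mirrors the per-image proposal count quoted above.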


citation-role summary

background: 2 · dataset: 1 · method: 1

representative citing papers

Tri-Modal Fusion Transformers for UAV-based Object Detection

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A dual-stream vision transformer with modality-aware gated exchange and bidirectional token exchange fuses RGB, thermal, and event data to improve UAV vehicle detection over dual-modal baselines on a new 10,489-frame dataset.

New VVC profiles targeting Feature Coding for Machines

cs.CV · 2025-12-09 · unverdicted · novelty 4.0

Three lightweight VVC profiles for feature coding achieve up to 2.96% BD-Rate gain and 95.6% encoding speedup while preserving downstream task accuracy under the MPEG-AI FCM framework.

Virtual KITTI 2

cs.CV · 2020-01-29 · accept · novelty 4.0

Virtual KITTI 2 supplies synthetic clones of real KITTI driving sequences with added weather and camera variants and multi-modal ground-truth annotations for autonomous driving vision research.

YOLOv3: An Incremental Improvement

cs.CV · 2018-04-08 · accept · novelty 4.0

YOLOv3 achieves accuracy comparable to SSD and RetinaNet but runs substantially faster, with 28.2 mAP at 320x320 in 22 ms and 57.9 mAP@50 in 51 ms on Titan X.
