Shaoqing Ren, Kaiming He, Ross B. Girshick, and Jian Sun · 2015 · cs.CV · arXiv 1506.01497

25 Pith papers cite this work. Polarity classification is still indexing.

abstract

State-of-the-art object detection networks depend on region proposal algorithms to hypothesize object locations. Advances like SPPnet and Fast R-CNN have reduced the running time of these detection networks, exposing region proposal computation as a bottleneck. In this work, we introduce a Region Proposal Network (RPN) that shares full-image convolutional features with the detection network, thus enabling nearly cost-free region proposals. An RPN is a fully convolutional network that simultaneously predicts object bounds and objectness scores at each position. The RPN is trained end-to-end to generate high-quality region proposals, which are used by Fast R-CNN for detection. We further merge RPN and Fast R-CNN into a single network by sharing their convolutional features. Using the recently popular terminology of neural networks with 'attention' mechanisms, the RPN component tells the unified network where to look. For the very deep VGG-16 model, our detection system has a frame rate of 5 fps (including all steps) on a GPU, while achieving state-of-the-art object detection accuracy on PASCAL VOC 2007, 2012, and MS COCO datasets with only 300 proposals per image. In ILSVRC and COCO 2015 competitions, Faster R-CNN and RPN are the foundations of the 1st-place winning entries in several tracks. Code has been made publicly available.
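
The abstract's statement that the RPN "predicts object bounds and objectness scores at each position" can be illustrated with a minimal sketch in plain Python. It is not the authors' code. The abstract does not name anchors; the reference boxes ("anchors"), the stride of 16, the three scales and three aspect ratios, and the offset parameterization are assumptions taken from the usual description of the method. The function names `make_anchors`, `decode`, and `top_proposals` are hypothetical. The scores and offsets would come from the network and are plain inputs here, and non-maximum suppression is omitted for brevity.

```python
import math


def make_anchors(feat_h, feat_w, stride=16,
                 scales=(128, 256, 512), ratios=(0.5, 1.0, 2.0)):
    """Return (cx, cy, w, h) reference boxes for every feature-map position.

    Assumed defaults: stride 16 and 3 scales x 3 aspect ratios (9 per position).
    """
    anchors = []
    for y in range(feat_h):
        for x in range(feat_w):
            # Centre of this feature-map cell in image coordinates.
            cx = (x + 0.5) * stride
            cy = (y + 0.5) * stride
            for s in scales:
                for r in ratios:
                    # Keep the area close to s*s while setting h / w = r.
                    w = s / math.sqrt(r)
                    h = s * math.sqrt(r)
                    anchors.append((cx, cy, w, h))
    return anchors


def decode(anchor, deltas):
    """Apply predicted offsets (tx, ty, tw, th) to one anchor.

    Returns the refined box as corner coordinates (x1, y1, x2, y2).
    """
    cx, cy, w, h = anchor
    tx, ty, tw, th = deltas
    px, py = cx + tx * w, cy + ty * h              # shifted centre
    pw, ph = w * math.exp(tw), h * math.exp(th)    # rescaled size
    return (px - pw / 2, py - ph / 2, px + pw / 2, py + ph / 2)


def top_proposals(anchors, scores, deltas, k=300):
    """Keep the k highest-scoring boxes (k=300 matches the abstract's budget)."""
    order = sorted(range(len(anchors)),
                   key=lambda i: scores[i], reverse=True)[:k]
    return [decode(anchors[i], deltas[i]) for i in order]
```

Because every position reuses the same small set of reference shapes, the number of candidates grows only with the feature-map size, which is one way to read the abstract's "nearly cost-free" claim once the convolutional features are shared.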

citation-role summary

background 2 dataset 1 method 1

citation-polarity summary

background 2 use dataset 1 use method 1

representative citing papers

OD3: Optimization-free Dataset Distillation for Object Detection

cs.CV · 2025-06-02 · unverdicted · novelty 7.0

OD3 presents an optimization-free dataset distillation framework for object detection that reports new state-of-the-art accuracy on COCO and VOC at compression ratios from 0.25% to 5%.

Tri-Modal Fusion Transformers for UAV-based Object Detection

cs.CV · 2026-04-17 · unverdicted · novelty 7.0

A dual-stream vision transformer with modality-aware gated exchange and bidirectional token exchange fuses RGB, thermal, and event data to improve UAV vehicle detection over dual-modal baselines on a new 10,489-frame dataset.

A Multitask Network for Localization and Recognition of Text in Images

cs.CL · 2019-06-21 · unverdicted · novelty 6.0

Presents an end-to-end multitask CNN with FPN, dynamic RoI pooling, and convolutional attention for simultaneous lexicon-free text localization and recognition in complex images.

Investigating Anisotropy in Visual Grounding under Controlled Counterfactual Perturbations

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

Controlled counterfactual perturbations reveal no correlation between embedding cosine similarity and approximation behavior in two visual grounding models.

Transferable Physical-World Adversarial Patches Against Pedestrian Detection Models

cs.CV · 2026-04-24 · unverdicted · novelty 6.0

TriPatch generates transferable physical adversarial patches via multi-stage triplet loss, appearance consistency, and data augmentation to achieve higher attack success rates on pedestrian detectors than prior methods.

PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers

cs.CV · 2026-04-21 · unverdicted · novelty 6.0

PASTA enables patch-agnostic backdoor activation in ViTs via multi-location trigger insertion during training and bi-level optimization, achieving 99.13% average attack success with large gains in visual/attention stealthiness and defense robustness.

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning

cs.CV · 2026-04-16 · unverdicted · novelty 6.0

AIM applies modality-specific masks to balance stability and plasticity in asymmetric VLMs, achieving SOTA average performance and reduced forgetting on continual VQA v2 and GQA while preserving generalization to novel compositions.

DroneScan-YOLO: Redundancy-Aware Lightweight Detection for Tiny Objects in UAV Imagery

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

DroneScan-YOLO reaches 55.3% mAP@50 and 35.6% mAP@50-95 on VisDrone2019-DET by combining 1280x1280 input, RPA-Block pruning, MSFD stride-4 branch, and SAL-NWD loss, beating YOLOv8s by 16.6 and 12.3 points with only 4.1% more parameters.

Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization

cs.CV · 2026-04-13 · unverdicted · novelty 6.0

VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges

cs.LG · 2021-04-27 · accept · novelty 6.0

Geometric deep learning provides a unified mathematical framework based on grids, groups, graphs, geodesics, and gauges to explain and extend neural network architectures by incorporating physical regularities.

Unveiling Hidden Lyman Alpha Emitters in the DESI DR1 Data

astro-ph.GA · 2026-05-12 · unverdicted · novelty 5.0

A CNN detects 19,685 LAEs at z=2-3.5 in DESI DR1 spectra with 95% purity and completeness.

Investigation of cardinality classification for bacterial colony counting using explainable artificial intelligence

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

XAI analysis identifies high visual similarity across colony cardinality classes as the primary limit on MicrobiaNet performance in bacterial colony counting, revising prior model assessments.

A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

WSA-Net uses partial convolutions, heterogeneous grouping attention, geometric reconstruction, and context anchoring to enhance low-SCR hyperbolic signatures in GPR data, reaching 0.6958 mAP@0.5 at 164 FPS with 2.412M parameters on the RTST dataset.

New VVC profiles targeting Feature Coding for Machines

cs.CV · 2025-12-09 · unverdicted · novelty 4.0

Three lightweight VVC profiles for feature coding achieve up to 2.96% BD-Rate gain and 95.6% encoding speedup while preserving downstream task accuracy under the MPEG-AI FCM framework.

GarmNet: Improving Global with Local Perception for Robotic Laundry Folding

cs.RO · 2019-06-30 · unverdicted · novelty 4.0

GarmNet jointly localizes garments and detects grasp landmarks on the CloPeMa dataset, reducing localization error by 24.7% when landmark detection is included.

Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning

cs.CV · 2026-05-05 · unverdicted · novelty 4.0

A two-stage weakly supervised pipeline pretrains on auto-generated school labels from sparse points and fine-tunes on only 50 manual examples to achieve strong detection performance in aerial imagery.

Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection

cs.CV · 2026-05-02 · unverdicted · novelty 4.0

A multi-dataset cross-domain knowledge distillation approach improves unified performance on medical image segmentation, classification, and detection by transferring domain-invariant features from a joint teacher model to task-specific students.

KAYRA: A Microservice Architecture for AI-Assisted Karyotyping with Cloud and On-Premise Deployment

cs.LG · 2026-04-29 · unverdicted · novelty 4.0

KAYRA packages a cascade of EfficientNet-B5 + U-Net, Mask R-CNN, and ResNet-18 models into a microservice architecture that supports both cloud and on-premise deployment and reaches 98.91% segmentation accuracy in a pilot test on 459 chromosomes.

Learning to count small and clustered objects with application to bacterial colonies

cs.CV · 2026-04-21 · unverdicted · novelty 4.0

ACFamNet Pro reaches 9.64% mean normalized absolute error on bacterial colony images under 5-fold cross-validation, beating FamNet by 12.71%.

Virtual KITTI 2

cs.CV · 2020-01-29 · accept · novelty 4.0

Virtual KITTI 2 supplies synthetic clones of real KITTI driving sequences with added weather and camera variants and multi-modal ground-truth annotations for autonomous driving vision research.

YOLOv3: An Incremental Improvement

cs.CV · 2018-04-08 · accept · novelty 4.0

YOLOv3 achieves accuracy comparable to SSD and RetinaNet but runs substantially faster, with 28.2 mAP at 320x320 in 22 ms and 57.9 mAP@50 in 51 ms on Titan X.

CD-TWINSAFE: A ROS-enabled Digital Twin for Scene Understanding and Safety Emerging V2I Technology

cs.CV · 2026-01-18 · unverdicted · novelty 3.0

A ROS-enabled V2I digital twin architecture integrates on-board stereo perception with an Unreal Engine 5 replica to deliver real-time safety monitoring for autonomous vehicles.

A Comparative Study of Modern Object Detectors for Robust Apple Detection in Orchard Imagery

cs.CV · 2026-04-11 · unverdicted · novelty 3.0

YOLO11n achieves the highest mAP@0.5:0.95 of 0.6065 for apple localization, with other detectors showing trade-offs in recall and precision at low confidence thresholds.

RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques

cs.CV · 2019-07-22 · unverdicted · novelty 2.0

A survey of RGB-D object detection from traditional hand-crafted features with machine learning to deep learning techniques.

citing papers explorer

Showing 25 of 25 citing papers.

OD3: Optimization-free Dataset Distillation for Object Detection
cs.CV · 2025-06-02 · unverdicted · none · ref 4 · internal anchor

Tri-Modal Fusion Transformers for UAV-based Object Detection
cs.CV · 2026-04-17 · unverdicted · none · ref 25

A Multitask Network for Localization and Recognition of Text in Images
cs.CL · 2019-06-21 · unverdicted · none · ref 22 · internal anchor

Investigating Anisotropy in Visual Grounding under Controlled Counterfactual Perturbations
cs.CV · 2026-05-09 · unverdicted · none · ref 31

Transferable Physical-World Adversarial Patches Against Pedestrian Detection Models
cs.CV · 2026-04-24 · unverdicted · none · ref 30

PASTA: A Patch-Agnostic Twofold-Stealthy Backdoor Attack on Vision Transformers
cs.CV · 2026-04-21 · unverdicted · none · ref 75

AIM: Asymmetric Information Masking for Visual Question Answering Continual Learning
cs.CV · 2026-04-16 · unverdicted · none · ref 46

DroneScan-YOLO: Redundancy-Aware Lightweight Detection for Tiny Objects in UAV Imagery
cs.CV · 2026-04-14 · unverdicted · none · ref 20

Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization
cs.CV · 2026-04-13 · unverdicted · none · ref 9

Geometric Deep Learning: Grids, Groups, Graphs, Geodesics, and Gauges
cs.LG · 2021-04-27 · accept · none · ref 68

Unveiling Hidden Lyman Alpha Emitters in the DESI DR1 Data
astro-ph.GA · 2026-05-12 · unverdicted · none · ref 66

Investigation of cardinality classification for bacterial colony counting using explainable artificial intelligence
cs.CV · 2026-04-21 · unverdicted · none · ref 47

A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures
cs.CV · 2026-04-07 · unverdicted · none · ref 15

New VVC profiles targeting Feature Coding for Machines
cs.CV · 2025-12-09 · unverdicted · none · ref 12 · internal anchor

GarmNet: Improving Global with Local Perception for Robotic Laundry Folding
cs.RO · 2019-06-30 · unverdicted · none · ref 19 · internal anchor

Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning
cs.CV · 2026-05-05 · unverdicted · none · ref 26

Multi-Dataset Cross-Domain Knowledge Distillation for Unified Medical Image Segmentation, Classification, and Detection
cs.CV · 2026-05-02 · unverdicted · none · ref 63

KAYRA: A Microservice Architecture for AI-Assisted Karyotyping with Cloud and On-Premise Deployment
cs.LG · 2026-04-29 · unverdicted · none · ref 4

Learning to count small and clustered objects with application to bacterial colonies
cs.CV · 2026-04-21 · unverdicted · none · ref 68

Virtual KITTI 2
cs.CV · 2020-01-29 · accept · none · ref 20

YOLOv3: An Incremental Improvement
cs.CV · 2018-04-08 · accept · none · ref 17

CD-TWINSAFE: A ROS-enabled Digital Twin for Scene Understanding and Safety Emerging V2I Technology
cs.CV · 2026-01-18 · unverdicted · none · ref 13 · internal anchor

A Comparative Study of Modern Object Detectors for Robust Apple Detection in Orchard Imagery
cs.CV · 2026-04-11 · unverdicted · none · ref 4

RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques
cs.CV · 2019-07-22 · unverdicted · none · ref 70 · internal anchor

AI Driven Soccer Analysis Using Computer Vision
cs.CV · 2026-04-09 · unverdicted · none · ref 10
A system combining object detection, segmentation, keypoint prediction, and homography transforms soccer video into real-world player positions and tactical statistics.
