Fast R-CNN
This paper proposes a Fast Region-based Convolutional Network method (Fast R-CNN) for object detection. Fast R-CNN builds on previous work to efficiently classify object proposals using deep convolutional networks. Compared to previous work, Fast R-CNN employs several innovations to improve training and testing speed while also increasing detection accuracy. Fast R-CNN trains the very deep VGG16 network 9x faster than R-CNN, is 213x faster at test-time, and achieves a higher mAP on PASCAL VOC 2012. Compared to SPPnet, Fast R-CNN trains VGG16 3x faster, tests 10x faster, and is more accurate. Fast R-CNN is implemented in Python and C++ (using Caffe) and is available under the open-source MIT License at https://github.com/rbgirshick/fast-rcnn.
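The abstract does not spell out how object proposals are classified efficiently. One ingredient the Fast R-CNN paper is known for is region-of-interest (RoI) max pooling: the convolutional feature map is computed once per image, and each proposal is then pooled from it into a fixed-size grid. The sketch below illustrates that idea in plain Python for a single channel. The function name, the argument layout, and the floor/ceil rule for bin boundaries are assumptions made for illustration, not the code released at the URL above.

```python
import math


def roi_max_pool(feature_map, roi, out_h, out_w):
    """Max-pool one region of interest into a fixed out_h x out_w grid.

    feature_map: 2D list (rows x cols) holding a single channel.
    roi: (r0, c0, r1, c1), inclusive cell coordinates on the feature map.
    Hypothetical helper; the bin boundaries use floor/ceil as an assumption.
    """
    r0, c0, r1, c1 = roi
    roi_h = r1 - r0 + 1
    roi_w = c1 - c0 + 1
    pooled = []
    for i in range(out_h):
        # Floor for the start and ceil for the end keep every bin non-empty.
        rs = r0 + math.floor(i * roi_h / out_h)
        re = r0 + math.ceil((i + 1) * roi_h / out_h)
        row = []
        for j in range(out_w):
            cs = c0 + math.floor(j * roi_w / out_w)
            ce = c0 + math.ceil((j + 1) * roi_w / out_w)
            row.append(max(feature_map[r][c]
                           for r in range(rs, re)
                           for c in range(cs, ce)))
        pooled.append(row)
    return pooled


# A 4x4 single-channel feature map with values 0..15 in row-major order.
fmap = [[4 * r + c for c in range(4)] for r in range(4)]

# Two proposals of different sizes both yield a 2x2 output.
whole = roi_max_pool(fmap, (0, 0, 3, 3), 2, 2)
corner = roi_max_pool(fmap, (1, 1, 3, 3), 2, 2)
```

Because both proposals read from the same `fmap`, the expensive convolutional pass is shared across proposals, and the fixed output size lets one classifier head handle regions of any shape.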
Forward citations
Cited by 11 Pith papers
- Pose Estimation for Non-Cooperative Rendezvous Using Neural Networks
  SPN is a CNN that detects a spacecraft bounding box, classifies then regresses attitude, and optimizes position via Gauss-Newton, achieving degree-level attitude and cm-level position errors on real images after train...
- Deep Compression: Compressing Deep Neural Networks with Pruning, Trained Quantization and Huffman Coding
  A pruning-quantization-Huffman pipeline compresses deep neural networks 35-49x without accuracy loss.
- AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI
  AMAR uses a transformer with learnable query embeddings for set-based prediction of concurrent activities from composite Wi-Fi CSI, combined with edge feature extraction and vector quantization for bandwidth-efficient...
- CalibFree: Self-Supervised View Feature Separation for Calibration-Free Multi-Camera Multi-Object Tracking
  CalibFree enables calibration-free multi-camera tracking via self-supervised feature separation through single-view distillation and cross-view reconstruction, reporting 3% higher accuracy and 7.5% better F1 on tested...
- A Multitask Network for Localization and Recognition of Text in Images
  Presents an end-to-end multitask CNN with FPN, dynamic RoI pooling, and convolutional attention for simultaneous lexicon-free text localization and recognition in complex images.
- Efficient Multi-Domain Network Learning by Covariance Normalization
  CovNorm reduces parameters in domain-adaptive layers via two PCAs and a mini-adaptation layer, enabling efficient multi-domain learning with performance close to full fine-tuning.
- Label-Efficient School Detection from Aerial Imagery via Weakly Supervised Pretraining and Fine-Tuning
  A two-stage weakly supervised pipeline pretrains on auto-generated school labels from sparse points and fine-tunes on only 50 manual examples to achieve strong detection performance in aerial imagery.
- Learning to count small and clustered objects with application to bacterial colonies
  ACFamNet Pro reaches 9.64% mean normalized absolute error on bacterial colony images under 5-fold cross-validation, beating FamNet by 12.71%.
- GarmNet: Improving Global with Local Perception for Robotic Laundry Folding
  GarmNet jointly localizes garments and detects grasp landmarks on the CloPeMa dataset, reducing localization error by 24.7% when landmark detection is included.
- RGB-D image-based Object Detection: from Traditional Methods to Deep Learning Techniques
  A survey of RGB-D object detection from traditional hand-crafted features with machine learning to deep learning techniques.
- Understanding Deep Learning Techniques for Image Segmentation
  A 2019 survey that categorizes and intuitively explains major deep learning techniques for image segmentation, progressing from classical methods to modern neural architectures.