RetinaMask: Learning to predict masks improves state-of-the-art single-shot detection for free

Cheng-Yang Fu, Mykhailo Shvets, Alexander C Berg · 2019 · cs.CV · arXiv 1901.03353

3 Pith papers cite this work. Polarity classification is still indexing.

abstract

Recently two-stage detectors have surged ahead of single-shot detectors in the accuracy-vs-speed trade-off. Nevertheless, single-shot detectors are immensely popular in embedded vision applications. This paper brings single-shot detectors up to the same level as current two-stage techniques. We do this by improving training for the state-of-the-art single-shot detector, RetinaNet, in three ways: integrating instance mask prediction for the first time, making the loss function adaptive and more stable, and including additional hard examples in training. We call the resulting augmented network RetinaMask. The detection component of RetinaMask has the same computational cost as the original RetinaNet, but is more accurate. COCO test-dev results are up to 41.4 mAP for RetinaMask-101 vs 39.1 mAP for RetinaNet-101, while the runtime is the same during evaluation. Adding Group Normalization increases the performance of RetinaMask-101 to 41.7 mAP. Code is at: https://github.com/chengyangfu/retinamask

citation-role summary

baseline 1

citation-polarity summary

baseline 1

representative citing papers

DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation

cs.CV · 2025-07-17 · unverdicted · novelty 6.0

DiffClean applies text-guided diffusion to erase makeup from faces, boosting age estimation and verification accuracy over makeup-affected images.

Where are the Masks: Instance Segmentation with Image-level Supervision

cs.CV · 2019-07-02 · unverdicted · novelty 6.0

A two-stage pipeline generates pseudo masks from image-level labels to train Mask R-CNN, achieving state-of-the-art results on PASCAL VOC 2012 for weakly supervised instance segmentation.

YOLOv4: Optimal Speed and Accuracy of Object Detection

cs.CV · 2020-04-23 · unverdicted · novelty 5.0

YOLOv4 achieves 43.5% AP (65.7% AP50) on MS COCO at ~65 FPS on Tesla V100 by integrating WRC, CSP, CmBN, SAT, Mish activation, Mosaic augmentation, DropBlock, and CIoU loss.

citing papers explorer

Showing 3 of 3 citing papers.

DiffClean: Diffusion-based Makeup Removal for Accurate Age Estimation

cs.CV · 2025-07-17 · unverdicted · none · ref 15 · internal anchor

DiffClean applies text-guided diffusion to erase makeup from faces, boosting age estimation and verification accuracy over makeup-affected images.

Where are the Masks: Instance Segmentation with Image-level Supervision

cs.CV · 2019-07-02 · unverdicted · none · ref 13 · internal anchor

A two-stage pipeline generates pseudo masks from image-level labels to train Mask R-CNN, achieving state-of-the-art results on PASCAL VOC 2012 for weakly supervised instance segmentation.

YOLOv4: Optimal Speed and Accuracy of Object Detection

cs.CV · 2020-04-23 · unverdicted · none · ref 14

YOLOv4 achieves 43.5% AP (65.7% AP50) on MS COCO at ~65 FPS on Tesla V100 by integrating WRC, CSP, CmBN, SAT, Mish activation, Mosaic augmentation, DropBlock, and CIoU loss.
