RetinaFace: Single-stage Dense Face Localisation in the Wild

Jiankang Deng , Jia Guo , Yuxiang Zhou , Jinke Yu , Irene Kotsia , Stefanos Zafeiriou

Authors on Pith no claims yet

classification 💻 cs.CV

keywords faceretinafacelocalisationbeendetectionextrafivehard

read the original abstract

Though tremendous strides have been made in uncontrolled face detection, accurate and efficient face localisation in the wild remains an open challenge. This paper presents a robust single-stage face detector, named RetinaFace, which performs pixel-wise face localisation on various scales of faces by taking advantages of joint extra-supervised and self-supervised multi-task learning. Specifically, We make contributions in the following five aspects: (1) We manually annotate five facial landmarks on the WIDER FACE dataset and observe significant improvement in hard face detection with the assistance of this extra supervision signal. (2) We further add a self-supervised mesh decoder branch for predicting a pixel-wise 3D shape face information in parallel with the existing supervised branches. (3) On the WIDER FACE hard test set, RetinaFace outperforms the state of the art average precision (AP) by 1.1% (achieving AP equal to 91.4%). (4) On the IJB-C test set, RetinaFace enables state of the art methods (ArcFace) to improve their results in face verification (TAR=89.59% for FAR=1e-6). (5) By employing light-weight backbone networks, RetinaFace can run real-time on a single CPU core for a VGA-resolution image. Extra annotations and code have been made available at: https://github.com/deepinsight/insightface/tree/master/RetinaFace.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection
cs.CV 2026-04 unverdicted novelty 6.0

LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and p...
Multimodal Ambivalence/Hesitancy Recognition in Videos for Personalized Digital Health Interventions
cs.CV 2026-04 unverdicted novelty 5.0

Multimodal deep learning for ambivalence/hesitancy recognition in videos yields limited results on the BAH dataset, highlighting the need for improved spatio-temporal and cross-modal fusion methods.
Automated Detection of Mutual Gaze and Joint Attention in Dual-Camera Settings via Dual-Stream Transformers
cs.CV 2026-04 unverdicted novelty 4.0

A dual-stream Transformer using frozen GazeLLE backbones and custom token fusion detects mutual gaze and joint attention from dual-camera recordings, outperforming CNN baselines and a multimodal LLM on caregiver-infant data.