Focal Loss for Dense Object Detection

Kaiming He; Piotr Dollár; Priya Goyal; Ross Girshick; Tsung-Yi Lin

arxiv: 1708.02002 · v2 · pith:JZ2WVGOAnew · submitted 2017-08-07 · 💻 cs.CV

Tsung-Yi Lin, Priya Goyal, Ross Girshick, Kaiming He, Piotr Dollár

classification 💻 cs.CV

keywords detectors, loss, dense, object, accuracy, focal, training, two-stage

The highest accuracy object detectors to date are based on a two-stage approach popularized by R-CNN, where a classifier is applied to a sparse set of candidate object locations. In contrast, one-stage detectors that are applied over a regular, dense sampling of possible object locations have the potential to be faster and simpler, but have trailed the accuracy of two-stage detectors thus far. In this paper, we investigate why this is the case. We discover that the extreme foreground-background class imbalance encountered during training of dense detectors is the central cause. We propose to address this class imbalance by reshaping the standard cross entropy loss such that it down-weights the loss assigned to well-classified examples. Our novel Focal Loss focuses training on a sparse set of hard examples and prevents the vast number of easy negatives from overwhelming the detector during training. To evaluate the effectiveness of our loss, we design and train a simple dense detector we call RetinaNet. Our results show that when trained with the focal loss, RetinaNet is able to match the speed of previous one-stage detectors while surpassing the accuracy of all existing state-of-the-art two-stage detectors. Code is at: https://github.com/facebookresearch/Detectron.
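The reshaping described in the abstract can be illustrated with a minimal sketch. It assumes the binary form of the focal loss, FL(p_t) = -alpha_t * (1 - p_t)^gamma * log(p_t), where p_t is the probability the model assigns to the true class. The function names and the example probabilities are hypothetical, and gamma = 2 with alpha = 0.25 are used only as illustrative defaults. This is not the Detectron implementation.

```python
import math


def cross_entropy(p, y):
    """Standard binary cross entropy for one prediction.

    p: predicted foreground probability, strictly between 0 and 1
    y: ground-truth label, 1 for foreground, 0 for background
    """
    p_t = p if y == 1 else 1.0 - p  # probability assigned to the true class
    return -math.log(p_t)


def focal_loss(p, y, gamma=2.0, alpha=0.25):
    """Binary focal loss for one prediction (illustrative sketch).

    gamma and alpha are assumed defaults for this example only.
    """
    p_t = p if y == 1 else 1.0 - p
    alpha_t = alpha if y == 1 else 1.0 - alpha  # class-balancing weight
    # (1 - p_t) ** gamma shrinks toward 0 as the example becomes well classified
    return -alpha_t * (1.0 - p_t) ** gamma * math.log(p_t)


# Hypothetical examples: an easy background location and a hard foreground one.
easy_negative = focal_loss(0.01, 0)
hard_positive = focal_loss(0.10, 1)
print(easy_negative, hard_positive)
```

In this sketch, setting gamma to 0 and alpha to 1 for a foreground example recovers plain cross entropy, while a larger gamma pushes the contribution of confidently classified examples toward zero, so the many easy negatives contribute far less to the total than the few hard examples.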

Forward citations

Cited by 31 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

VitaminP: cross-modal learning enables whole-cell segmentation from routine histology
cs.CV 2026-04 unverdicted novelty 7.0

VitaminP uses paired H&E-mIF data to train a model that transfers molecular boundary information, enabling accurate whole-cell segmentation directly from routine H&E histology across 34 cancer types.
OD3: Optimization-free Dataset Distillation for Object Detection
cs.CV 2025-06 unverdicted novelty 7.0

OD3 presents an optimization-free dataset distillation framework for object detection that reports new state-of-the-art accuracy on COCO and VOC at compression ratios from 0.25% to 5%.
Why SGD is not Brownian Motion: A New Perspective on Stochastic Dynamics
cs.LG 2026-05 unverdicted novelty 6.0

SGD is reformulated via a master equation from discrete updates, producing a discrete Fokker-Planck equation that predicts non-stationary variance growth proportional to learning rate in flat Hessian directions.
SToRe3D: Sparse Token Relevance in ViTs for Efficient Multi-View 3D Object Detection
cs.CV 2026-05 unverdicted novelty 6.0

SToRe3D delivers up to 3x faster inference for multi-view 3D object detection in ViTs by selecting relevant 2D tokens and 3D queries via mutual relevance heads with only marginal accuracy loss.
Spectral Vision Transformer for Efficient Tokenization with Limited Data
cs.CV 2026-05 unverdicted novelty 6.0

A spectral vision transformer achieves comparable or superior performance with fewer parameters than standard ViTs, CNNs, and other models by using spectral projections for tokenization in limited-data medical imaging.
UniISP: A Unified ISP Framework for Both Human and Machine Vision
cs.CV 2026-05 unverdicted novelty 6.0

UniISP unifies ISP processing with a Hybrid Attention Module and Feature Adapter to produce images that are both visually pleasing for humans and informative for computer vision models.
Diversity in Large Language Models under Supervised Fine-Tuning
cs.LG 2026-04 unverdicted novelty 6.0

TOFU loss mitigates the narrowing of generative diversity in LLMs after supervised fine-tuning by addressing neglect of low-frequency patterns and forgetting of prior knowledge.
Component-Adaptive and Lesion-Level Supervision for Improved Small Structure Segmentation in Brain MRI
cs.CV 2026-04 unverdicted novelty 6.0

CATMIL augments nnU-Net with component-adaptive Tversky and MIL-based lesion supervision to raise Dice scores, small-lesion recall, and error control on the MSLesSeg dataset.
LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection
cs.CV 2026-04 unverdicted novelty 6.0

LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and p...
Street-Legal Physical-World Adversarial Rim for License Plates
cs.CV 2026-04 conditional novelty 6.0

SPAR is a street-legal physical rim that cuts modern ALPR accuracy by 60% and reaches 18% targeted impersonation while costing under $100 and requiring no plate modification.
Optimus: A Robust Defense Framework for Mitigating Toxicity while Fine-Tuning Conversational AI
cs.CR 2025-07 unverdicted novelty 6.0

Optimus mitigates toxicity during LLM fine-tuning by combining repurposed LLM safety alignments for detection with synthetic data and DPO alignment, remaining effective even with highly biased classifiers and against attacks.
GigaCheck: Detecting LLM-generated Content via Object-Centric Span Localization
cs.CL 2024-10 unverdicted novelty 6.0

GigaCheck detects LLM-generated text at both document and span levels by combining fine-tuned language-model embeddings with a DETR-like architecture that treats generated intervals as detectable objects.
Replacement Learning: Training Neural Networks with Fewer Parameters
cs.CV 2026-05 unverdicted novelty 5.0

Replacement Learning replaces selected blocks in CNNs and ViTs with learnable parameter-fusion surrogates derived from adjacent layers to reduce full-depth backpropagation redundancy.
Diversity in Large Language Models under Supervised Fine-Tuning
cs.LG 2026-04 unverdicted novelty 5.0

Supervised fine-tuning narrows LLM generative diversity through neglect of low-frequency patterns and knowledge forgetting, but the TOFU loss mitigates this effect across models and benchmarks.
MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling
cs.CV 2026-04 unverdicted novelty 5.0

MapATM improves lane divider AP by 4.6 and mAP by 2.6 on NuScenes by treating actor trajectories as structural priors for road geometry.
Low-Data Supervised Adaptation Outperforms Prompting for Cloud Segmentation Under Domain Shift
cs.CV 2026-04 unverdicted novelty 5.0

Supervised fine-tuning with 0.1% labeled data outperforms all 60 tested prompt variants for CLIPSeg cloud segmentation on satellite imagery under domain shift.
A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures
cs.CV 2026-04 unverdicted novelty 5.0

WSA-Net uses partial convolutions, heterogeneous grouping attention, geometric reconstruction, and context anchoring to enhance low-SCR hyperbolic signatures in GPR data, reaching 0.6958 mAP@0.5 at 164 FPS with 2.412M...
Trust the uncertain teacher: distilling dark knowledge via calibrated uncertainty
cs.LG 2026-02 unverdicted novelty 5.0

CUD reshapes the teacher's predictive distribution before distillation so that students receive calibrated uncertainty signals alongside accuracy, yielding more robust and better-calibrated models on high-cardinality ...
LiLAW: Lightweight Learnable Adaptive Weighting to Learn Sample Difficulty & Improve Noisy Training
cs.LG 2025-09 unverdicted novelty 5.0

LiLAW learns to weight samples as easy, moderate or hard using three global scalars updated by one gradient step on a validation batch to improve noisy training performance.
DeepFWI: Identifying Bug-Sensitive Warnings with Multi-Modal Code-Warning Semantics
cs.SE 2024-03 conditional novelty 5.0

DeepFWI is a multi-modal LSTM model with cross-attention that identifies bug-sensitive warnings at warning granularity, reaching 67.06% F1 on a 280k-warning dataset and surfacing 25 confirmed bugs in four open-source ...
REFNet++: Multi-Task Efficient Fusion of Camera and Radar Sensor Data in Bird's-Eye Polar View
cs.CV 2026-05 unverdicted novelty 4.0

REFNet++ aligns raw camera images and radar range-Doppler data into a shared bird's-eye polar view using variational encoders for multi-task vehicle detection and free space segmentation on the RADIal dataset.
OneSearch-V2: The Latent Reasoning Enhanced Self-distillation Generative Search Framework
cs.IR 2026-03 unverdicted novelty 4.0

OneSearch-V2 improves generative retrieval via latent reasoning and self-distillation, achieving +3.98% item CTR, +2.07% buyer volume, and +2.11% order volume in online A/B tests.
A Resource Efficient Fusion Network for Object Detection in Bird's-Eye View using Camera and Raw Radar Data
cs.CV 2024-11 unverdicted novelty 4.0

Describes a camera-radar fusion network that uses raw RD spectra and BEV-polar camera features for BEV object detection, evaluated for accuracy and compute on the RADIal dataset.
Skywork-Reward: Bag of Tricks for Reward Modeling in LLMs
cs.AI 2024-10 unverdicted novelty 4.0

Data-centric filtering yields an 80K preference dataset and reward models that lead RewardBench while boosting other top entries.
Lung Nodules Detection and Segmentation Using 3D Mask-RCNN
eess.IV 2019-07 unverdicted novelty 4.0

Adapted Mask-RCNN to 3D and applied it to lung nodule detection and segmentation on CT scans, reporting competitive detection results on the LUNA16 dataset.
YOLOv3: An Incremental Improvement
cs.CV 2018-04 accept novelty 4.0

YOLOv3 achieves accuracy comparable to SSD and RetinaNet but runs substantially faster, with 28.2 mAP at 320x320 in 22 ms and 57.9 mAP@50 in 51 ms on Titan X.
Sequential Feature Selection for Efficient Landslide Segmentation from Multi-Spectral Data
cs.LG 2026-05 unverdicted novelty 3.0

Sequential Forward Floating Selection with a U-Net++ proxy identifies an 8-channel subset from multi-spectral and terrain data that matches or exceeds F1 scores of full 30-channel configurations for landslide segmentation.
AI-Driven Security Alert Screening and Alert Fatigue Mitigation in Security Operations Centers: A Comprehensive Survey
cs.CR 2026-05 unverdicted novelty 3.0

A literature survey synthesizes 119 studies on AI-driven alert screening into a four-stage taxonomy of filtering, triage, correlation, and generative augmentation while identifying gaps in deployment realism and robustness.
Duluth at SemEval-2026 Task 6: DeBERTa with LLM-Augmented Data for Unmasking Political Question Evasions
cs.CL 2026-04 unverdicted novelty 3.0

DeBERTa-V3-base with focal loss, discourse features, and LLM-augmented data for minority classes achieves 0.76 Macro F1 on clarity-level classification of political QA pairs, ranking 8th in SemEval-2026 Task 6.
YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling
cs.CL 2026-05 unverdicted novelty 2.0

A heterogeneous ensemble of XLM-RoBERTa-large and mDeBERTa-v3-base with independent task modeling and class weighting is reported as effective for multilingual, multicultural, and multievent online polarization detection.
YEZE at SemEval-2026 Task 9: Detecting Multilingual, Multicultural and Multievent Online Polarization via Heterogeneous Ensembling
cs.CL 2026-05 unverdicted novelty 2.0

Independent task modeling with class weighting outperforms multi-task learning and translation augmentation in a multilingual model ensemble for SemEval-2026 Task 9 polarization detection.