You Only Look Once: Unified, Real-Time Object Detection

Joseph Redmon , Santosh Divvala , Ross Girshick , Ali Farhadi

classification cs.CV

keywords detection, yolo, object, images, network, real-time, bounding, boxes

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.
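The regression framing in the abstract can be made concrete with a small decoding sketch. The abstract itself does not give the output layout; the values below (a 7 x 7 grid, 2 boxes per cell, 20 classes, hence 30 numbers per cell) are those the full paper reports for PASCAL VOC. The function names, the nested-list representation, and the 0.2 score threshold are assumptions made for illustration, not the authors' code, and non-maximum suppression is omitted.

```python
# Illustrative sketch only: decodes a YOLO-style grid prediction in pure Python.
# S, B, C follow the paper's PASCAL VOC setup; everything else is hypothetical.
S, B, C = 7, 2, 20  # grid size, boxes per cell, number of classes


def decode_cell(cell, row, col):
    """Turn one cell's vector (length B*5 + C) into (box, class_id, score) tuples.

    Assumed layout: B groups of (x, y, w, h, confidence), then C class probabilities.
    x, y are offsets inside the cell; w, h are fractions of the whole image.
    """
    class_probs = cell[B * 5:]
    best_class = max(range(C), key=lambda k: class_probs[k])
    detections = []
    for b in range(B):
        x, y, w, h, conf = cell[b * 5:(b + 1) * 5]
        cx = (col + x) / S  # box centre as a fraction of image width
        cy = (row + y) / S  # box centre as a fraction of image height
        score = conf * class_probs[best_class]  # class-specific confidence
        detections.append(((cx, cy, w, h), best_class, score))
    return detections


def decode_grid(pred, threshold=0.2):
    """pred is an S x S nested list of per-cell vectors; keep scores >= threshold."""
    kept = []
    for row in range(S):
        for col in range(S):
            for det in decode_cell(pred[row][col], row, col):
                if det[2] >= threshold:
                    kept.append(det)
    return kept


# Usage: an all-zero prediction with one confident box in the centre cell.
pred = [[[0.0] * (B * 5 + C) for _ in range(S)] for _ in range(S)]
pred[3][3][0:5] = [0.5, 0.5, 0.2, 0.4, 0.9]  # x, y, w, h, confidence
pred[3][3][B * 5 + 11] = 1.0                 # class index 11 gets probability 1
detections = decode_grid(pred)
```

Because every cell's boxes and class probabilities come out of one forward pass, decoding reduces to a single loop over the grid. That is the sense in which the abstract describes detection as one evaluation rather than a classifier applied repeatedly.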

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

A global dataset of continuous urban dashcam driving
cs.CV 2026-04 accept novelty 7.0

CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.
DroneScan-YOLO: Redundancy-Aware Lightweight Detection for Tiny Objects in UAV Imagery
cs.CV 2026-04 unverdicted novelty 6.0

DroneScan-YOLO reaches 55.3% mAP@50 and 35.6% mAP@50-95 on VisDrone2019-DET by combining 1280x1280 input, RPA-Block pruning, MSFD stride-4 branch, and SAL-NWD loss, beating YOLOv8s by 16.6 and 12.3 points with only 4....
CODO: An Automated Compiler for Comprehensive Dataflow Optimization
cs.AR 2026-04 unverdicted novelty 6.0

CODO automates comprehensive dataflow optimization on FPGAs, achieving 1.45x-4.52x speedups on kernels and up to 33.8x on DNN models over state-of-the-art frameworks.
Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization
cs.CV 2026-04 unverdicted novelty 6.0

VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.
Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models
cs.CV 2026-04 unverdicted novelty 6.0

Entropy-gradient grounding uses model uncertainty to retrieve evidence regions in VLMs, improving performance on detail-critical and compositional tasks across multiple architectures.
Unveiling Hidden Lyman Alpha Emitters in the DESI DR1 Data
astro-ph.GA 2026-05 unverdicted novelty 5.0

A CNN detects 19,685 LAEs at z=2-3.5 in DESI DR1 spectra with 95% purity and completeness.
Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior
cs.HC 2026-04 unverdicted novelty 4.0

A pipeline uses OpenPose and Gaze-LLE to extract pose and gaze data from classroom videos, deletes the raw footage, and applies an LLM for zero-shot behavioral analysis of student attention.
Label-efficient underwater species classification with logistic regression on frozen foundation model embeddings
cs.CV 2026-03 accept novelty 4.0

Logistic regression on frozen DINOv3 features achieves 88.5% macro F1 on the AQUA20 marine species benchmark, matching end-to-end supervised models with only 6% of the labels.
Real-Time Cellist Postural Evaluation With On-Device Computer Vision
cs.HC 2026-04 unverdicted novelty 3.0

Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.
AI Driven Soccer Analysis Using Computer Vision
cs.CV 2026-04 unverdicted novelty 2.0

A system combining object detection, segmentation, keypoint prediction, and homography transforms soccer video into real-world player positions and tactical statistics.