You only look once: Unified, real-time object detection

Joseph Redmon, Santosh Divvala, Ross Girshick, Ali Farhadi · 2015 · cs.CV · arXiv 1506.02640

19 Pith papers cite this work. Polarity classification is still indexing.

abstract

We present YOLO, a new approach to object detection. Prior work on object detection repurposes classifiers to perform detection. Instead, we frame object detection as a regression problem to spatially separated bounding boxes and associated class probabilities. A single neural network predicts bounding boxes and class probabilities directly from full images in one evaluation. Since the whole detection pipeline is a single network, it can be optimized end-to-end directly on detection performance. Our unified architecture is extremely fast. Our base YOLO model processes images in real-time at 45 frames per second. A smaller version of the network, Fast YOLO, processes an astounding 155 frames per second while still achieving double the mAP of other real-time detectors. Compared to state-of-the-art detection systems, YOLO makes more localization errors but is far less likely to predict false detections where nothing exists. Finally, YOLO learns very general representations of objects. It outperforms all other detection methods, including DPM and R-CNN, by a wide margin when generalizing from natural images to artwork on both the Picasso Dataset and the People-Art Dataset.

citation-role summary

background: 1 · method: 1

citation-polarity summary

background: 1 · use method: 1

representative citing papers

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

cs.CV · 2026-05-19 · conditional · novelty 7.0

Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis

cs.LG · 2026-05-15 · unverdicted · novelty 7.0

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

A global dataset of continuous urban dashcam driving

cs.CV · 2026-04-01 · accept · novelty 7.0

CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting

cs.CV · 2026-05-18 · unverdicted · novelty 6.0

SAME-Net adds a differentiable soft attention mask embedding module to achieve rectification-free end-to-end scene text spotting with 84.02% H-mean on Total-Text.

AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation

cs.RO · 2025-10-01 · unverdicted · novelty 6.0

AFFORD2ACT distills a minimal set of affordance-guided 2D keypoints from text and a single image to train a 38-dimensional gated transformer policy that achieves 82% success on unseen objects and scenes.

A Multitask Network for Localization and Recognition of Text in Images

cs.CL · 2019-06-21 · unverdicted · novelty 6.0

Presents an end-to-end multitask CNN with FPN, dynamic RoI pooling, and convolutional attention for simultaneous lexicon-free text localization and recognition in complex images.

DroneScan-YOLO: Redundancy-Aware Lightweight Detection for Tiny Objects in UAV Imagery

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

DroneScan-YOLO reaches 55.3% mAP@50 and 35.6% mAP@50-95 on VisDrone2019-DET by combining 1280x1280 input, RPA-Block pruning, MSFD stride-4 branch, and SAL-NWD loss, beating YOLOv8s by 16.6 and 12.3 points with only 4.1% more parameters.

CODO: An Automated Compiler for Comprehensive Dataflow Optimization

cs.AR · 2026-04-14 · unverdicted · novelty 6.0

CODO automates comprehensive dataflow optimization on FPGAs, achieving 1.45x-4.52x speedups on kernels and up to 33.8x on DNN models over state-of-the-art frameworks.

Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization

cs.CV · 2026-04-13 · unverdicted · novelty 6.0

VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.

Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models

cs.CV · 2026-04-09 · unverdicted · novelty 6.0

Entropy-gradient grounding uses model uncertainty to retrieve evidence regions in VLMs, improving performance on detail-critical and compositional tasks across multiple architectures.

OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

cs.CV · 2026-05-21 · accept · novelty 5.0

The OSS Challenge provides benchmarks showing spatiotemporal video models excel at open suturing skill classification and OSATS scoring but struggle with keypoint tracking under occlusion.

Comparative blobs and holes dynamics in a tokamak plasma: deep learning analysis of fast imaging data

nlin.CD · 2026-05-18 · unverdicted · novelty 5.0

Deep learning analysis of fast imaging data indicates most negative fluctuation structures in COMPASS tokamak plasma are artifacts from sliding-median subtraction, while supernumerary negatives exhibit expected hole dynamics.

Unveiling Hidden Lyman Alpha Emitters in the DESI DR1 Data

astro-ph.GA · 2026-05-12 · unverdicted · novelty 5.0

A CNN detects 19,685 LAEs at z=2-3.5 in DESI DR1 spectra with 95% purity and completeness.

RobustTP: End-to-End Trajectory Prediction for Heterogeneous Road-Agents in Dense Traffic with Noisy Sensor Inputs

cs.RO · 2019-07-20 · unverdicted · novelty 4.0

RobustTP uses a non-linear motion model plus instance segmentation to create noisy trajectories, then an LSTM-CNN to predict 5-second future positions of heterogeneous agents in dense traffic, claiming up to 18% ADE and 35.5% FDE gains over prior methods.

GarmNet: Improving Global with Local Perception for Robotic Laundry Folding

cs.RO · 2019-06-30 · unverdicted · novelty 4.0

GarmNet jointly localizes garments and detects grasp landmarks on the CloPeMa dataset, reducing localization error by 24.7% when landmark detection is included.

Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior

cs.HC · 2026-04-03 · unverdicted · novelty 4.0

A pipeline uses OpenPose and Gaze-LLE to extract pose and gaze data from classroom videos, deletes the raw footage, and applies an LLM for zero-shot behavioral analysis of student attention.

Label-efficient underwater species classification with logistic regression on frozen foundation model embeddings

cs.CV · 2026-03-31 · accept · novelty 4.0

Logistic regression on frozen DINOv3 features achieves 88.5% macro F1 on the AQUA20 marine species benchmark, matching end-to-end supervised models with only 6% of the labels.

Real-Time Cellist Postural Evaluation With On-Device Computer Vision

cs.HC · 2026-04-19 · unverdicted · novelty 3.0

Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.

AI Driven Soccer Analysis Using Computer Vision

cs.CV · 2026-04-09 · unverdicted · novelty 2.0

A system combining object detection, segmentation, keypoint prediction, and homography transforms soccer video into real-world player positions and tactical statistics.

citing papers explorer

Showing 19 of 19 citing papers.

Structured Layout Priors for Robust Out-of-Distribution Visual Document Understanding

cs.CV · 2026-05-19 · conditional · none · ref 33 · internal anchor

Injecting pre-computed layout priors from RT-DETR into VLM prompts raises markdown F1 from 0.37 to 0.92 on a 10k-page OOD benchmark and cuts infinite-loop failures across domains.

Characterizing Learning in Deep Neural Networks using Tractable Algorithmic Complexity Analysis

cs.LG · 2026-05-15 · unverdicted · none · ref 77 · internal anchor

QuBD extends algorithmic complexity estimation to quantized DNN weights, revealing that complexity decreases during learning, increases with overfitting, follows grokking patterns, and correlates with generalization.

A global dataset of continuous urban dashcam driving

cs.CV · 2026-04-01 · accept · none · ref 58

CROWD is a new global dataset of 51,753 continuous urban dashcam segments spanning over 20,000 hours from 238 countries, with manual labels and automated object detections for routine driving analysis.

Do You Need Text Rectification? Soft Attention Mask Embedding for Rectification-Free Scene Text Spotting

cs.CV · 2026-05-18 · unverdicted · none · ref 23 · internal anchor

SAME-Net adds a differentiable soft attention mask embedding module to achieve rectification-free end-to-end scene text spotting with 84.02% H-mean on Total-Text.

AFFORD2ACT: Affordance-Guided Automatic Keypoint Selection for Generalizable and Lightweight Robotic Manipulation

cs.RO · 2025-10-01 · unverdicted · none · ref 39 · internal anchor

AFFORD2ACT distills a minimal set of affordance-guided 2D keypoints from text and a single image to train a 38-dimensional gated transformer policy that achieves 82% success on unseen objects and scenes.

A Multitask Network for Localization and Recognition of Text in Images

cs.CL · 2019-06-21 · unverdicted · none · ref 21 · internal anchor

Presents an end-to-end multitask CNN with FPN, dynamic RoI pooling, and convolutional attention for simultaneous lexicon-free text localization and recognition in complex images.

DroneScan-YOLO: Redundancy-Aware Lightweight Detection for Tiny Objects in UAV Imagery

cs.CV · 2026-04-14 · unverdicted · none · ref 13

DroneScan-YOLO reaches 55.3% mAP@50 and 35.6% mAP@50-95 on VisDrone2019-DET by combining 1280x1280 input, RPA-Block pruning, MSFD stride-4 branch, and SAL-NWD loss, beating YOLOv8s by 16.6 and 12.3 points with only 4.1% more parameters.

CODO: An Automated Compiler for Comprehensive Dataflow Optimization

cs.AR · 2026-04-14 · unverdicted · none · ref 36

CODO automates comprehensive dataflow optimization on FPGAs, achieving 1.45x-4.52x speedups on kernels and up to 33.8x on DNN models over state-of-the-art frameworks.

Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization

cs.CV · 2026-04-13 · unverdicted · none · ref 10

VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.

Entropy-Gradient Grounding: Training-Free Evidence Retrieval in Vision-Language Models

cs.CV · 2026-04-09 · unverdicted · none · ref 25

Entropy-gradient grounding uses model uncertainty to retrieve evidence regions in VLMs, improving performance on detail-critical and compositional tasks across multiple architectures.

OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

cs.CV · 2026-05-21 · accept · none · ref 64 · internal anchor

The OSS Challenge provides benchmarks showing spatiotemporal video models excel at open suturing skill classification and OSATS scoring but struggle with keypoint tracking under occlusion.

Comparative blobs and holes dynamics in a tokamak plasma: deep learning analysis of fast imaging data

nlin.CD · 2026-05-18 · unverdicted · none · ref 15 · internal anchor

Deep learning analysis of fast imaging data indicates most negative fluctuation structures in COMPASS tokamak plasma are artifacts from sliding-median subtraction, while supernumerary negatives exhibit expected hole dynamics.

Unveiling Hidden Lyman Alpha Emitters in the DESI DR1 Data

astro-ph.GA · 2026-05-12 · unverdicted · none · ref 65

A CNN detects 19,685 LAEs at z=2-3.5 in DESI DR1 spectra with 95% purity and completeness.

RobustTP: End-to-End Trajectory Prediction for Heterogeneous Road-Agents in Dense Traffic with Noisy Sensor Inputs

cs.RO · 2019-07-20 · unverdicted · none · ref 28 · internal anchor

RobustTP uses a non-linear motion model plus instance segmentation to create noisy trajectories, then an LSTM-CNN to predict 5-second future positions of heterogeneous agents in dense traffic, claiming up to 18% ADE and 35.5% FDE gains over prior methods.

GarmNet: Improving Global with Local Perception for Robotic Laundry Folding

cs.RO · 2019-06-30 · unverdicted · none · ref 17 · internal anchor

GarmNet jointly localizes garments and detects grasp landmarks on the CloPeMa dataset, reducing localization error by 24.7% when landmark detection is included.

Can LLMs Reason About Attention? Towards Zero-Shot Analysis of Multimodal Classroom Behavior

cs.HC · 2026-04-03 · unverdicted · none · ref 10

A pipeline uses OpenPose and Gaze-LLE to extract pose and gaze data from classroom videos, deletes the raw footage, and applies an LLM for zero-shot behavioral analysis of student attention.

Label-efficient underwater species classification with logistic regression on frozen foundation model embeddings

cs.CV · 2026-03-31 · accept · none · ref 21

Logistic regression on frozen DINOv3 features achieves 88.5% macro F1 on the AQUA20 marine species benchmark, matching end-to-end supervised models with only 6% of the labels.

Real-Time Cellist Postural Evaluation With On-Device Computer Vision

cs.HC · 2026-04-19 · unverdicted · none · ref 14

Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.

AI Driven Soccer Analysis Using Computer Vision

cs.CV · 2026-04-09 · unverdicted · none · ref 9

A system combining object detection, segmentation, keypoint prediction, and homography transforms soccer video into real-world player positions and tactical statistics.
