YOLOv12: Attention-Centric Real-Time Object Detectors

Yunjie Tian, Qixiang Ye, David Doermann · 2025 · cs.CV · arXiv 2502.12524

Mixed citation behavior. The most common role is background (44% of classified citations).

35 Pith papers cite it.

abstract

Enhancing the network architecture of the YOLO framework has been crucial for a long time, but has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capabilities. This is because attention-based models cannot match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms. YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming advanced YOLOv10-N / YOLOv11-N by 2.1%/1.2% mAP with a comparable speed. This advantage extends to other model scales. YOLOv12 also surpasses end-to-end real-time detectors that improve DETR, such as RT-DETR / RT-DETRv2: YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters. More comparisons are shown in Figure 1.

citation-role summary

background 4 · method 3 · baseline 2

citation-polarity summary

background 4 · use method 3 · baseline 2

representative citing papers

A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation

cs.CV · 2026-05-18 · conditional · novelty 7.0

MusiCorpus supplies 1,309 pages of real historical handwritten music with transcriptions and annotations, the largest such resource for training optical music recognition systems under realistic conditions.

WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WUTDet is a 100K-image ship detection dataset with benchmarks indicating Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection for complex maritime environments.

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

cs.CV · 2026-04-05 · unverdicted · novelty 7.0

SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.

Descriptor: Parasitoid Wasps and Associated Hymenoptera Dataset (DAPWH)

cs.CV · 2026-02-20 · unverdicted · novelty 7.0

Releases the DAPWH dataset of 3556 wasp images including 1739 COCO-annotated examples to enable AI models for identifying Ichneumonoidea and associated families.

AnyDepth-DETR/-YOLO: Any-depth object detection with a single network

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

A single network achieves any-depth object detection by splitting stages into always-executed essential paths and skippable refinement paths, trained via self-distillation on the full and minimal extremes to maintain stage compatibility.

Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

TunnelMIND recalibrates language-guided defect proposals via dense visual consistency and reconstructs them into structured defect entities with attributes for severity grading and retrieval-grounded engineering reports, reporting F1 scores of 0.68, 0.78, and 0.72 on visible, GPR, and road defect tasks.

Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

UAVGen generates higher-quality synthetic UAV images via visual prototype conditioning and focal region focus in diffusion models, leading to better object detection accuracy than prior methods.

Scale-Gest: Scalable Model-Space Synthesis and Runtime Selection for On-Device Gesture Detection

cs.CV · 2026-03-16 · conditional · novelty 6.0

Scale-Gest creates a runtime-selectable family of tiny-YOLO models with device-calibrated ACE profiles and an ROI gate that cuts per-frame energy by 4x while holding event-level F1 at 0.8-0.9 on a new driving-gesture dataset.

A Self-Evolving Defect Detection Framework for Industrial Photovoltaic Systems

cs.AI · 2026-03-16 · unverdicted · novelty 6.0

SEPDD is a self-evolving defect detection framework for PV modules that achieves 91.4% mAP50 on public data and 49.5% on private data, outperforming autonomous baselines and human experts.

Unified Unsupervised and Sparsely-Supervised 3D Object Detection by Semantic Pseudo-Labeling and Prototype Learning

cs.CV · 2026-02-25 · unverdicted · novelty 6.0

SPL unifies unsupervised and sparsely-supervised 3D object detection via semantic pseudo-labeling that produces bounding boxes and point labels, followed by memory-based prototype learning that mines features from both labeled and unlabeled data.

Edge Assisted Multi-Camera Vehicle Tracking Framework for Real-Time and Scalable Deployment

cs.CV · 2025-11-17 · unverdicted · novelty 6.0

EASE-MCVT is a distributed edge-assisted multi-camera vehicle tracking framework that achieves real-time performance and competitive accuracy on public datasets through edge processing and server-side optimizations.

SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition

cs.CV · 2025-05-21 · unverdicted · novelty 6.0

SoftHGNN introduces differentiable soft hyperedges via learnable prototypes and top-k sparse selection to model high-order visual interactions and improve recognition accuracy.

SpikeDet: Better Firing Patterns for Accurate and Energy-Efficient Object Detection with Spiking Neural Networks

cs.CV · 2025-01-25 · unverdicted · novelty 6.0

SpikeDet reaches 52.2% AP on COCO 2017 with spiking networks by optimizing firing patterns via MDSNet and SMFM, using half the energy of prior SNN detectors.

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

TriBand-BEV introduces a three-band height-aware BEV encoding of LiDAR data to enable single-pass real-time 3D detection of pedestrians, cars, and cyclists with improved KITTI accuracy.

Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation

cs.RO · 2026-05-12 · unverdicted · novelty 5.0

A cooperative humanoid robot fuses camera-based collective perception with V2X messages to detect collision risks at non-line-of-sight intersections and physically stops merging vehicles.

InsHuman: Towards Natural and Identity-Preserving Human Insertion

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

InsHuman proposes Human-Background Adaptive Fusion, Face-to-Face ID-Preserving, and Bidirectional Data Pairing to enable natural human insertion in images without altering identity.

LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People

cs.AI · 2026-04-27 · unverdicted · novelty 5.0

A self-correcting multi-agent LLM pipeline parses floor plans into graphs and generates accessible routes, outperforming single LLM calls with success rates up to 92% on short paths in a real university building.

Caries DETR: Tooth Structure-aware Prior and Lesion-aware Dynamic Loss Refinement for DETR Based Caries Detection

cs.CV · 2026-04-26 · unverdicted · novelty 5.0

Caries-DETR adds tooth-structure query initialization and lesion-aware loss reweighting to DETR, reaching state-of-the-art caries detection on AlphaDent and DentalAI datasets.

StomaD2: An All-in-One System for Intelligent Stomatal Phenotype Analysis via Diffusion-Based Restoration Detection Network

cs.CV · 2026-04-18 · unverdicted · novelty 5.0

StomaD2 integrates diffusion-based image restoration with a specialized rotated detection network to achieve high-accuracy stomatal phenotyping across more than 130 plant species.

A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures

cs.CV · 2026-04-07 · unverdicted · novelty 5.0

WSA-Net uses partial convolutions, heterogeneous grouping attention, geometric reconstruction, and context anchoring to enhance low-SCR hyperbolic signatures in GPR data, reaching 0.6958 mAP@0.5 at 164 FPS with 2.412M parameters on the RTST dataset.

FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection

cs.CV · 2025-09-27 · unverdicted · novelty 5.0

FMC-DETR proposes a frequency-decoupled fusion framework with WeKat backbone, MDFC coordination, and CPF fusion modules that claims state-of-the-art results on remote sensing object detection benchmarks.

3D Reconstruction and Knowledge Distillation to Improve Multi-View Image Models to Explore Spike Volume Estimation in Wheat

cs.CV · 2026-05-20 · unverdicted · novelty 4.0

Knowledge distillation from a rigid-invariant 3D point cloud network into a regulated multi-view Transformer yields lower-error, faster wheat spike volume estimates from 2D images.

ATRACT: A Trustworthy Robotic Autonomous system to support Casualty Triage

cs.HC · 2026-05-16 · unverdicted · novelty 4.0

ATRACT integrates drone video and wearable sensor data with conditional variational autoencoder augmentation to achieve 85.7% accuracy in casualty action classification for remote battlefield triage.

A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization

cs.CV · 2026-05-08 · unverdicted · novelty 4.0

YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and 0.849 mAP50 on the UODM dataset.

citing papers explorer

Showing 35 of 35 citing papers.

A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation cs.CV · 2026-05-18 · conditional · none · ref 53 · internal anchor
MusiCorpus supplies 1,309 pages of real historical handwritten music with transcriptions and annotations, the largest such resource for training optical music recognition systems under realistic conditions.
WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects cs.CV · 2026-04-09 · unverdicted · none · ref 34 · internal anchor
WUTDet is a 100K-image ship detection dataset with benchmarks indicating Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection for complex maritime environments.
SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection cs.CV · 2026-04-05 · unverdicted · none · ref 8 · internal anchor
SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.
Descriptor: Parasitoid Wasps and Associated Hymenoptera Dataset (DAPWH) cs.CV · 2026-02-20 · unverdicted · none · ref 39 · internal anchor
Releases the DAPWH dataset of 3556 wasp images including 1739 COCO-annotated examples to enable AI models for identifying Ichneumonoidea and associated families.
AnyDepth-DETR/-YOLO: Any-depth object detection with a single network cs.CV · 2026-05-10 · unverdicted · none · ref 5 · internal anchor
A single network achieves any-depth object detection by splitting stages into always-executed essential paths and skippable refinement paths, trained via self-distillation on the full and minimal extremes to maintain stage compatibility.
Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction cs.CV · 2026-04-30 · unverdicted · none · ref 17 · internal anchor
TunnelMIND recalibrates language-guided defect proposals via dense visual consistency and reconstructs them into structured defect entities with attributes for severity grading and retrieval-grounded engineering reports, reporting F1 scores of 0.68, 0.78, and 0.72 on visible, GPR, and road defect tasks.
Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection cs.CV · 2026-04-03 · unverdicted · none · ref 47 · internal anchor
UAVGen generates higher-quality synthetic UAV images via visual prototype conditioning and focal region focus in diffusion models, leading to better object detection accuracy than prior methods.
Scale-Gest: Scalable Model-Space Synthesis and Runtime Selection for On-Device Gesture Detection cs.CV · 2026-03-16 · conditional · none · ref 25 · internal anchor
Scale-Gest creates a runtime-selectable family of tiny-YOLO models with device-calibrated ACE profiles and an ROI gate that cuts per-frame energy by 4x while holding event-level F1 at 0.8-0.9 on a new driving-gesture dataset.
A Self-Evolving Defect Detection Framework for Industrial Photovoltaic Systems cs.AI · 2026-03-16 · unverdicted · none · ref 26 · internal anchor
SEPDD is a self-evolving defect detection framework for PV modules that achieves 91.4% mAP50 on public data and 49.5% on private data, outperforming autonomous baselines and human experts.
Unified Unsupervised and Sparsely-Supervised 3D Object Detection by Semantic Pseudo-Labeling and Prototype Learning cs.CV · 2026-02-25 · unverdicted · none · ref 44 · internal anchor
SPL unifies unsupervised and sparsely-supervised 3D object detection via semantic pseudo-labeling that produces bounding boxes and point labels, followed by memory-based prototype learning that mines features from both labeled and unlabeled data.
Edge Assisted Multi-Camera Vehicle Tracking Framework for Real-Time and Scalable Deployment cs.CV · 2025-11-17 · unverdicted · none · ref 11 · internal anchor
EASE-MCVT is a distributed edge-assisted multi-camera vehicle tracking framework that achieves real-time performance and competitive accuracy on public datasets through edge processing and server-side optimizations.
SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition cs.CV · 2025-05-21 · unverdicted · none · ref 56 · internal anchor
SoftHGNN introduces differentiable soft hyperedges via learnable prototypes and top-k sparse selection to model high-order visual interactions and improve recognition accuracy.
SpikeDet: Better Firing Patterns for Accurate and Energy-Efficient Object Detection with Spiking Neural Networks cs.CV · 2025-01-25 · unverdicted · none · ref 55 · internal anchor
SpikeDet reaches 52.2% AP on COCO 2017 with spiking networks by optimizing firing patterns via MDSNet and SMFM, using half the energy of prior SNN detectors.
TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion cs.CV · 2026-05-12 · unverdicted · none · ref 39 · internal anchor
TriBand-BEV introduces a three-band height-aware BEV encoding of LiDAR data to enable single-pass real-time 3D detection of pedestrians, cars, and cyclists with improved KITTI accuracy.
Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation cs.RO · 2026-05-12 · unverdicted · none · ref 27 · internal anchor
A cooperative humanoid robot fuses camera-based collective perception with V2X messages to detect collision risks at non-line-of-sight intersections and physically stops merging vehicles.
InsHuman: Towards Natural and Identity-Preserving Human Insertion cs.CV · 2026-05-08 · unverdicted · none · ref 43 · internal anchor
InsHuman proposes Human-Background Adaptive Fusion, Face-to-Face ID-Preserving, and Bidirectional Data Pairing to enable natural human insertion in images without altering identity.
LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People cs.AI · 2026-04-27 · unverdicted · none · ref 22 · internal anchor
A self-correcting multi-agent LLM pipeline parses floor plans into graphs and generates accessible routes, outperforming single LLM calls with success rates up to 92% on short paths in a real university building.
Caries DETR: Tooth Structure-aware Prior and Lesion-aware Dynamic Loss Refinement for DETR Based Caries Detection cs.CV · 2026-04-26 · unverdicted · none · ref 35 · internal anchor
Caries-DETR adds tooth-structure query initialization and lesion-aware loss reweighting to DETR, reaching state-of-the-art caries detection on AlphaDent and DentalAI datasets.
StomaD2: An All-in-One System for Intelligent Stomatal Phenotype Analysis via Diffusion-Based Restoration Detection Network cs.CV · 2026-04-18 · unverdicted · none · ref 44 · internal anchor
StomaD2 integrates diffusion-based image restoration with a specialized rotated detection network to achieve high-accuracy stomatal phenotyping across more than 130 plant species.
A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures cs.CV · 2026-04-07 · unverdicted · none · ref 17 · internal anchor
WSA-Net uses partial convolutions, heterogeneous grouping attention, geometric reconstruction, and context anchoring to enhance low-SCR hyperbolic signatures in GPR data, reaching 0.6958 mAP@0.5 at 164 FPS with 2.412M parameters on the RTST dataset.
FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection cs.CV · 2025-09-27 · unverdicted · none · ref 20 · internal anchor
FMC-DETR proposes a frequency-decoupled fusion framework with WeKat backbone, MDFC coordination, and CPF fusion modules that claims state-of-the-art results on remote sensing object detection benchmarks.
3D Reconstruction and Knowledge Distillation to Improve Multi-View Image Models to Explore Spike Volume Estimation in Wheat cs.CV · 2026-05-20 · unverdicted · none · ref 21 · internal anchor
Knowledge distillation from a rigid-invariant 3D point cloud network into a regulated multi-view Transformer yields lower-error, faster wheat spike volume estimates from 2D images.
ATRACT: A Trustworthy Robotic Autonomous system to support Casualty Triage cs.HC · 2026-05-16 · unverdicted · none · ref 38 · internal anchor
ATRACT integrates drone video and wearable sensor data with conditional variational autoencoder augmentation to achieve 85.7% accuracy in casualty action classification for remote battlefield triage.
A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization cs.CV · 2026-05-08 · unverdicted · none · ref 25 · internal anchor
YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and 0.849 mAP50 on the UODM dataset.
Resource-Constrained UAV-Based Weed Detection for Site-Specific Management on Edge Devices cs.CV · 2026-04-25 · unverdicted · none · ref 39 · internal anchor
YOLOv11s and RT-DETRv2-R50-M provide the best accuracy-speed trade-off for real-time weed detection on edge UAV systems, with mAP50 up to 79% and low latency.
Early Detection of Acute Myeloid Leukemia (AML) Using YOLOv12 Deep Learning Model cs.CV · 2026-04-17 · unverdicted · none · ref 11 · internal anchor
YOLOv12 with Otsu thresholding on cell-based segmentation classifies AML cells at 99.3% validation and test accuracy.
FSDETR: Frequency-Spatial Feature Enhancement for Small Object Detection cs.CV · 2026-04-16 · unverdicted · none · ref 25 · internal anchor
FSDETR enhances RT-DETR with SHAB, DA-AIFI, and FSFPN blocks to improve small-object detection, reporting 13.9% APS on VisDrone 2019 and 48.95% AP50 on TinyPerson using 14.7M parameters.
Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection cs.CV · 2026-04-09 · unverdicted · none · ref 7 · internal anchor
MDDCNet combines Mamba blocks with deformable dilated convolutions, enhanced feed-forward networks, and an attention-aggregating feature pyramid to achieve better multi-scale traffic object detection than prior detectors.
DAT: Dual-Aware Adaptive Transmission for Efficient Multimodal LLM Inference in Edge-Cloud Systems cs.MM · 2026-04-07 · unverdicted · none · ref 38 · internal anchor
DAT combines a small-large model cascade with fine-tuning and bandwidth-aware multi-stream transmission to deliver high-accuracy event recognition and low-latency alerts for video streams in edge-cloud systems.
WildfireVLM: AI-powered Analysis for Early Wildfire Detection and Risk Assessment Using Satellite Imagery cs.CV · 2026-02-09 · unverdicted · none · ref 30 · internal anchor
WildfireVLM integrates YOLOv12 object detection on satellite imagery with multimodal LLMs to detect wildfires and produce contextual risk assessments and response recommendations.
Depth-Aware Rover: A Study of Edge AI and Monocular Vision for Real-World Implementation cs.CV · 2026-04-24 · unverdicted · none · ref 8 · internal anchor
Monocular depth estimation with UniDepthV2 on Raspberry Pi enables cost-effective rover navigation, proving more robust than stereo vision in real-world tests at 0.1 FPS depth and 10 FPS detection.
Real-Time Cellist Postural Evaluation With On-Device Computer Vision cs.HC · 2026-04-19 · unverdicted · none · ref 17 · internal anchor
Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.
Multi-Agent Object Detection Framework Based on Raspberry Pi YOLO Detector and Slack-Ollama Natural Language Interface cs.CV · 2026-04-14 · unverdicted · none · ref 22 · internal anchor
A local multi-agent framework integrates YOLO object detection with Slack-Ollama natural language control entirely on Raspberry Pi hardware.
TRACE: Evidence Grounding-Guided Multi-Video Event Understanding and Claim Generation cs.CV · 2026-05-16 · unreviewed · ref 11 · internal anchor
Population-Scale Advancing Interface Modeling Reveals How Bacterial Swarms Encode Future Spatial Architecture cond-mat.soft · 2026-02-01 · unreviewed · ref 56 · internal anchor
