YOLOv12: Attention-Centric Real-Time Object Detectors

Yunjie Tian, Qixiang Ye, David Doermann · 2025 · cs.CV · arXiv 2502.12524

Mixed citation behavior. The most common role is background (44% of classified citations).

43 Pith papers citing it

abstract

Enhancing the network architecture of the YOLO framework has been crucial for a long time, but has focused on CNN-based improvements despite the proven superiority of attention mechanisms in modeling capabilities. This is because attention-based models cannot match the speed of CNN-based models. This paper proposes an attention-centric YOLO framework, namely YOLOv12, that matches the speed of previous CNN-based ones while harnessing the performance benefits of attention mechanisms. YOLOv12 surpasses all popular real-time object detectors in accuracy with competitive speed. For example, YOLOv12-N achieves 40.6% mAP with an inference latency of 1.64 ms on a T4 GPU, outperforming advanced YOLOv10-N / YOLOv11-N by 2.1%/1.2% mAP with a comparable speed. This advantage extends to other model scales. YOLOv12 also surpasses end-to-end real-time detectors that improve DETR, such as RT-DETR / RT-DETRv2: YOLOv12-S beats RT-DETR-R18 / RT-DETRv2-R18 while running 42% faster, using only 36% of the computation and 45% of the parameters. More comparisons are shown in Figure 1.
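The comparison figures in the abstract imply baseline scores that it does not state directly. The following is a minimal Python sketch of that arithmetic, assuming the 2.1%/1.2% gaps are absolute mAP percentage points rather than relative gains (the abstract does not say which). The variable names are illustrative only.

```python
# Figures quoted in the abstract.
yolov12_n_map = 40.6  # mAP (%) reported for YOLOv12-N
gaps = {"YOLOv10-N": 2.1, "YOLOv11-N": 1.2}  # reported mAP advantage

# Assumption: the gaps are absolute percentage points, not relative gains.
implied = {name: round(yolov12_n_map - gap, 1) for name, gap in gaps.items()}

# YOLOv12-S is reported to use 36% of the computation and 45% of the
# parameters of the RT-DETR baselines; convert those to reductions.
fractions = {"computation": 0.36, "parameters": 0.45}
savings = {key: round((1 - frac) * 100) for key, frac in fractions.items()}

for name, score in implied.items():
    print(f"{name}: implied mAP {score}%")
for key, pct in savings.items():
    print(f"{key}: {pct}% reduction")
```

Under that assumption the implied baselines are 38.5% and 39.4% mAP, and the reported fractions correspond to reductions of 64% in computation and 55% in parameters.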

citation-role summary

background 4 · method 3 · baseline 2
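The 44% background share quoted for this paper can be reproduced from the role counts. The following is a minimal Python sketch, assuming the three counts listed here make up the complete set of classified citations (the page does not state this explicitly).

```python
# Role counts as listed in the citation-role summary.
role_counts = {"background": 4, "method": 3, "baseline": 2}

# Assumption: these citations are all of the classified ones.
total = sum(role_counts.values())

# Share of each role as a whole-number percentage.
shares = {role: round(100 * n / total) for role, n in role_counts.items()}

for role, pct in shares.items():
    print(f"{role}: {pct}%")
```

With nine classified citations, background comes out at 44%, which matches the headline figure.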

citation-polarity summary

background 4 · use method 3 · baseline 2

representative citing papers

Unlocking the Visual Record of Materials Science: A Large-Scale Multimodal Dataset from Scientific Literature

cs.CV · 2026-06-29 · accept · novelty 8.0

MatMMExtract pipeline creates MatSciFig dataset of 391k annotated materials science figure panels and MaterialScope detection dataset with high accuracy.

A Dataset for the Recognition of Historical and Handwritten Music Scores in Western Notation

cs.CV · 2026-05-18 · conditional · novelty 7.0

MusiCorpus supplies 1,309 pages of real historical handwritten music with transcriptions and annotations, the largest such resource for training optical music recognition systems under realistic conditions.

WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects

cs.CV · 2026-04-09 · unverdicted · novelty 7.0

WUTDet is a 100K-image ship detection dataset with benchmarks indicating Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection for complex maritime environments.

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

cs.CV · 2026-04-05 · unverdicted · novelty 7.0

SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.

Descriptor: Parasitoid Wasps and Associated Hymenoptera Dataset (DAPWH)

cs.CV · 2026-02-20 · unverdicted · novelty 7.0

Releases the DAPWH dataset of 3556 wasp images including 1739 COCO-annotated examples to enable AI models for identifying Ichneumonoidea and associated families.

EpiSAM: Character Segmentation in Challenging Stone Inscriptions

cs.CV · 2026-06-27 · unverdicted · novelty 6.0

EpiSAM introduces neighbor-aware prediction in a prompt-guided transformer for character segmentation on challenging stone inscriptions, plus an expanded annotated dataset.

RefDiffNet: Learning to Expose Subtle PCB Defects Before Detection

cs.CV · 2026-05-30 · unverdicted · novelty 6.0

RefDiffNet is a lightweight input enhancement block that uses reference image comparison to expose PCB defects, delivering up to 18% relative mAP50:95 gains across YOLO, RT-DETR, and Faster R-CNN detectors with 0.004-0.005M extra parameters.

MORI-Seg: Learning Morphological Geometry for Instance Segmentation without Instance Annotations

cs.CV · 2026-05-27 · unverdicted · novelty 6.0

MORI-Seg learns morphology-aware geometric representations from semantic masks to enable instance segmentation without instance-level annotations.

SteelDS: A High-Resolution Video Dataset of E40 Steel Scrap for Object Detection and Instance Segmentation

cs.RO · 2026-05-26 · unverdicted · novelty 6.0

Introduces the SteelDS dataset with 24,297 annotated frames of E40 steel and copper scrap for object detection and instance segmentation to aid industrial sorting.

TRACE: Evidence Grounding-Guided Multi-Video Event Understanding and Claim Generation

cs.CV · 2026-05-16 · unverdicted · novelty 6.0 · 2 refs

TRACE improves multi-video event understanding by grounding evidence in structured timelines before visual reasoning, raising MiRAGE F1 from 0.705 to 0.811 on MAGMaR 2026.

AnyDepth-DETR/-YOLO: Any-depth object detection with a single network

cs.CV · 2026-05-10 · unverdicted · novelty 6.0

A single network achieves any-depth object detection by splitting stages into always-executed essential paths and skippable refinement paths, trained via self-distillation on the full and minimal extremes to maintain stage compatibility.

Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

TunnelMIND recalibrates language-guided defect proposals via dense visual consistency and reconstructs them into structured defect entities with attributes for severity grading and retrieval-grounded engineering reports, reporting F1 scores of 0.68, 0.78, and 0.72 on visible, GPR, and road defect tasks.

Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

UAVGen generates higher-quality synthetic UAV images via visual prototype conditioning and focal region focus in diffusion models, leading to better object detection accuracy than prior methods.

Scale-Gest: Scalable Model-Space Synthesis and Runtime Selection for On-Device Gesture Detection

cs.CV · 2026-03-16 · conditional · novelty 6.0

Scale-Gest creates a runtime-selectable family of tiny-YOLO models with device-calibrated ACE profiles and an ROI gate that cuts per-frame energy by 4x while holding event-level F1 at 0.8-0.9 on a new driving-gesture dataset.

A Self-Evolving Defect Detection Framework for Industrial Photovoltaic Systems

cs.AI · 2026-03-16 · unverdicted · novelty 6.0

SEPDD is a self-evolving defect detection framework for PV modules that achieves 91.4% mAP50 on public data and 49.5% on private data, outperforming autonomous baselines and human experts.

Unified Unsupervised and Sparsely-Supervised 3D Object Detection by Semantic Pseudo-Labeling and Prototype Learning

cs.CV · 2026-02-25 · unverdicted · novelty 6.0

SPL unifies unsupervised and sparsely-supervised 3D object detection via semantic pseudo-labeling that produces bounding boxes and point labels, followed by memory-based prototype learning that mines features from both labeled and unlabeled data.

Edge Assisted Multi-Camera Vehicle Tracking Framework for Real-Time and Scalable Deployment

cs.CV · 2025-11-17 · unverdicted · novelty 6.0

EASE-MCVT is a distributed edge-assisted multi-camera vehicle tracking framework that achieves real-time performance and competitive accuracy on public datasets through edge processing and server-side optimizations.

SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition

cs.CV · 2025-05-21 · unverdicted · novelty 6.0

SoftHGNN introduces differentiable soft hyperedges via learnable prototypes and top-k sparse selection to model high-order visual interactions and improve recognition accuracy.

SpikeDet: Better Firing Patterns for Accurate and Energy-Efficient Object Detection with Spiking Neural Networks

cs.CV · 2025-01-25 · unverdicted · novelty 6.0

SpikeDet reaches 52.2% AP on COCO 2017 with spiking networks by optimizing firing patterns via MDSNet and SMFM, using half the energy of prior SNN detectors.

TinyFormer: Preserving Tiny Objects in YOLO-DETR Hybrid Real-time Detectors

cs.CV · 2026-05-24 · unverdicted · novelty 5.0

TinyFormer adds Parallel Bi-fusion Module and Spatial Semantic Adapter to a YOLO-DETR hybrid, raising small-object AP by 1.6 points to 58.5% on MS COCO while keeping real-time speed.

TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion

cs.CV · 2026-05-12 · unverdicted · novelty 5.0

TriBand-BEV introduces a three-band height-aware BEV encoding of LiDAR data to enable single-pass real-time 3D detection of pedestrians, cars, and cyclists with improved KITTI accuracy.

Cooperative Robotics Reinforced by Collective Perception for Traffic Moderation

cs.RO · 2026-05-12 · unverdicted · novelty 5.0

A cooperative humanoid robot fuses camera-based collective perception with V2X messages to detect collision risks at non-line-of-sight intersections and physically stops merging vehicles.

InsHuman: Towards Natural and Identity-Preserving Human Insertion

cs.CV · 2026-05-08 · unverdicted · novelty 5.0

InsHuman proposes Human-Background Adaptive Fusion, Face-to-Face ID-Preserving, and Bidirectional Data Pairing to enable natural human insertion in images without altering identity.

LLM-Guided Agentic Floor Plan Parsing for Accessible Indoor Navigation of Blind and Low-Vision People

cs.AI · 2026-04-27 · unverdicted · novelty 5.0

A self-correcting multi-agent LLM pipeline parses floor plans into graphs and generates accessible routes, outperforming single LLM calls with success rates up to 92% on short paths in a real university building.

