hub Mixed citations

Yolo26: Key architectural enhancements and performance benchmarking for real-time object detection

2025 · arXiv 2509.25164

Mixed citation behavior. Most common role is method (50%).

22 Pith papers citing it

Method 50% of classified citations

citation-role summary

method 4 · baseline 2

citation-polarity summary

use method 3 · baseline 2 · background 1

representative citing papers

Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models

cs.CV · 2026-05-18 · unverdicted · novelty 8.0

CiF is a large new civil infrastructure segmentation dataset that shows zero-shot foundation models and domain-supervised models plateau at roughly 25% mAP, establishing infrastructure inspection as an open challenge for current visual AI.

WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory

cs.CV · 2026-06-03 · unverdicted · novelty 7.0

WHU-Infra3D is a new large-scale multi-modal dataset and benchmark for 3D roadside infrastructure inventory, providing over 175k 2D boxes, thousands of 3D instances, and 181k annotations across five core tasks while exposing cross-city gaps and long-tailed defect vulnerabilities.

Train the Agent, Not the Expert: Learning to Harness Heterogeneous Experts for Multi-Turn Visual Reasoning

cs.CV · 2026-05-28 · unverdicted · novelty 7.0

VisHarness learns a reinforcement-learned policy to harness specialized visual experts via multi-turn interactions and dynamic visual memory archiving, outperforming general models on four visual reasoning benchmarks.

XWOD: A Real-World Benchmark for Object Detection under Extreme Weather Conditions

cs.CV · 2026-05-12 · accept · novelty 7.0

XWOD is a large-scale real-world benchmark for traffic object detection under seven extreme weather conditions that improves zero-shot generalization to other weather datasets.

OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction

cs.RO · 2026-04-30 · unverdicted · novelty 7.0

A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.

What and Where to Adapt: Structure-Semantics Co-Tuning for Machine Vision Compression via Synergistic Adapters

cs.CV · 2026-04-11 · unverdicted · novelty 7.0

S2-CoT coordinates a Structural Fidelity Adapter in the encoder-decoder with a Semantic Context Adapter in the entropy model to convert potential performance loss into state-of-the-art gains across base codecs while using only a small fraction of parameters.

A Hybrid Vision-Language Architecture for Automated Defect Reasoning and Report Generation in Industrial Inspection

cs.CV · 2026-05-26 · unverdicted · novelty 6.0

A decoupled pipeline with YOLO detection, deterministic prompt encoding, and QLoRA-adapted 1.5B LLM achieves superior structured report generation compared to monolithic VLMs on synthetic maintenance data.

Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.

FruitProM-V2: Robust Probabilistic Maturity Estimation and Detection of Fruits and Vegetables

cs.CV · 2026-04-28 · unverdicted · novelty 6.0

A probabilistic maturity model treats ripeness as continuous rather than discrete classes, improving robustness to annotation noise in fruit detection.

CSA-Graphs: A Privacy-Preserving Structural Dataset for Child Sexual Abuse Research

cs.CV · 2026-04-08 · unverdicted · novelty 6.0

CSA-Graphs provides scene graphs and skeleton graphs as privacy-preserving alternatives to real CSAI images, with experiments showing they support classification and improve when combined.

TinyFormer: Preserving Tiny Objects in YOLO-DETR Hybrid Real-time Detectors

cs.CV · 2026-05-24 · unverdicted · novelty 5.0

TinyFormer adds Parallel Bi-fusion Module and Spatial Semantic Adapter to a YOLO-DETR hybrid, raising small-object AP by 1.6 points to 58.5% on MS COCO while keeping real-time speed.

MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models

cs.LG · 2026-05-19 · unverdicted · novelty 5.0

MiMuon is a hybrid optimizer that achieves a generalization error bound of O(1/N) independent of the small singular-value gap that limits the original Muon bound, while retaining the same O(1/T^{1/4}) convergence rate.

Harnessing Floating Car Data, Traffic Camera Observations, and Network Flow Analysis for Traffic Volume Estimation

eess.SY · 2026-05-11 · unverdicted · novelty 5.0

A CTM-GNN model with EnSRF assimilation and flow-weighted transition matrix fuses floating car data and camera observations to deliver physically consistent, network-wide traffic volume estimates and forecasts, demonstrated with improved accuracy in Manhattan.

Vision-Based Safe Human-Robot Collaboration with Uncertainty Guarantees

cs.RO · 2026-04-16 · unverdicted · novelty 5.0 · 2 refs

Proposes a vision-based human pose estimation and motion prediction pipeline that uses conformal prediction sets to provide valid confidence guarantees for safe human-robot collaboration.

DocRevive: A Unified Pipeline for Document Text Restoration

cs.CV · 2026-04-11 · unverdicted · novelty 5.0 · 2 refs

A unified pipeline using OCR, inpainting, and diffusion models restores text in degraded documents on a new synthetic benchmark dataset, evaluated with the proposed UCSM metric.

Gaze to Insight: A Scalable AI Approach for Detecting Gaze Behaviours in Face-to-Face Collaborative Learning

cs.CV · 2026-04-01 · unverdicted · novelty 5.0

A method combining pretrained YOLO11, YOLOE-26, and Gaze-LLE models detects student gaze targets in collaborative learning videos with F1-score 0.829 without requiring labeled training data.

A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization

cs.CV · 2026-05-08 · unverdicted · novelty 4.0

YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and 0.849 mAP50 on the UODM dataset.

A Hybrid Approach for Closing the Sim2real Appearance Gap in Game Engine Synthetic Datasets

cs.CV · 2026-05-04 · unverdicted · novelty 4.0

Combining a diffusion model and an image-to-image translation model produces more photorealistic game-engine synthetic images than either alone while keeping semantic labels intact.

AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime

cs.SE · 2026-04-16 · unverdicted · novelty 4.0

AIPC uses AI agents to automate PyTorch-to-QNN/SNPE deployment, completing it in 7-20 minutes for regular vision models at low API cost.

Democratising Camera Trap AI: An Open-Source Model for Detecting UK Mammals

cs.CV · 2026-06-09 · accept · novelty 3.0

A YOLO26x object detector for 31 UK camera trap classes reports mAP 0.984 at IoU 0.5 on held-out data from the same sites as training.

YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models

cs.CV · 2026-05-24 · unverdicted · novelty 2.0

Empirical benchmark finds YOLO26 superior on Pascal VOC accuracy and efficiency but YOLOv8 faster on GPU, with both models struggling similarly on VisDrone small-object detection.

Evaluation of Convolutional and Transformer-Based Detectors for Weed Detection in Tomato Plantations

cs.CV · 2026-04-29 · unverdicted · novelty 2.0 · 2 refs

Comparative benchmark finds CNN detectors deliver higher efficiency than transformer detectors for weed detection in tomatoes while transformers capture more context at greater computational cost.

citing papers explorer

Showing 22 of 22 citing papers.

Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models cs.CV · 2026-05-18 · unverdicted · none · ref 42
WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory cs.CV · 2026-06-03 · unverdicted · none · ref 60
Train the Agent, Not the Expert: Learning to Harness Heterogeneous Experts for Multi-Turn Visual Reasoning cs.CV · 2026-05-28 · unverdicted · none · ref 23
XWOD: A Real-World Benchmark for Object Detection under Extreme Weather Conditions cs.CV · 2026-05-12 · accept · none · ref 29
OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction cs.RO · 2026-04-30 · unverdicted · none · ref 40
What and Where to Adapt: Structure-Semantics Co-Tuning for Machine Vision Compression via Synergistic Adapters cs.CV · 2026-04-11 · unverdicted · none · ref 47
A Hybrid Vision-Language Architecture for Automated Defect Reasoning and Report Generation in Industrial Inspection cs.CV · 2026-05-26 · unverdicted · none · ref 12
Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours cs.CV · 2026-05-09 · unverdicted · none · ref 45
FruitProM-V2: Robust Probabilistic Maturity Estimation and Detection of Fruits and Vegetables cs.CV · 2026-04-28 · unverdicted · none · ref 34
CSA-Graphs: A Privacy-Preserving Structural Dataset for Child Sexual Abuse Research cs.CV · 2026-04-08 · unverdicted · none · ref 42
TinyFormer: Preserving Tiny Objects in YOLO-DETR Hybrid Real-time Detectors cs.CV · 2026-05-24 · unverdicted · none · ref 20
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models cs.LG · 2026-05-19 · unverdicted · none · ref 33
Harnessing Floating Car Data, Traffic Camera Observations, and Network Flow Analysis for Traffic Volume Estimation eess.SY · 2026-05-11 · unverdicted · none · ref 27
Vision-Based Safe Human-Robot Collaboration with Uncertainty Guarantees cs.RO · 2026-04-16 · unverdicted · none · ref 30 · 2 links
DocRevive: A Unified Pipeline for Document Text Restoration cs.CV · 2026-04-11 · unverdicted · none · ref 30 · 2 links
Gaze to Insight: A Scalable AI Approach for Detecting Gaze Behaviours in Face-to-Face Collaborative Learning cs.CV · 2026-04-01 · unverdicted · none · ref 16
A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization cs.CV · 2026-05-08 · unverdicted · none · ref 27
A Hybrid Approach for Closing the Sim2real Appearance Gap in Game Engine Synthetic Datasets cs.CV · 2026-05-04 · unverdicted · none · ref 2
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime cs.SE · 2026-04-16 · unverdicted · none · ref 32
Democratising Camera Trap AI: An Open-Source Model for Detecting UK Mammals cs.CV · 2026-06-09 · accept · none · ref 51
YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models cs.CV · 2026-05-24 · unverdicted · none · ref 21
Evaluation of Convolutional and Transformer-Based Detectors for Weed Detection in Tomato Plantations cs.CV · 2026-04-29 · unverdicted · none · ref 19 · 2 links
