Yolo26: Key architectural enhancements and performance benchmarking for real-time object detection
Mixed citation behavior. Most common role is method (50%).
years: 2026 (22 citing papers)
citing papers explorer
- Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models
CiF is a large new civil infrastructure segmentation dataset that shows zero-shot foundation models and domain-supervised models plateau at roughly 25% mAP, establishing infrastructure inspection as an open challenge for current visual AI.
- WHU-Infra3D: A Full-stack Multi-modal Dataset and Benchmark for 3D Roadside Infrastructure Inventory
WHU-Infra3D is a new large-scale multi-modal dataset and benchmark for 3D roadside infrastructure inventory, providing over 175k 2D boxes, thousands of 3D instances, and 181k annotations across five core tasks while exposing cross-city gaps and long-tailed defect vulnerabilities.
- Train the Agent, Not the Expert: Learning to Harness Heterogeneous Experts for Multi-Turn Visual Reasoning
VisHarness learns a reinforcement-learned policy to harness specialized visual experts via multi-turn interactions and dynamic visual memory archiving, outperforming general models on four visual reasoning benchmarks.
- XWOD: A Real-World Benchmark for Object Detection under Extreme Weather Conditions
XWOD is a large-scale real-world benchmark for traffic object detection under seven extreme weather conditions that improves zero-shot generalization to other weather datasets.
- OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction
A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.
- What and Where to Adapt: Structure-Semantics Co-Tuning for Machine Vision Compression via Synergistic Adapters
S2-CoT coordinates a Structural Fidelity Adapter in the encoder-decoder with a Semantic Context Adapter in the entropy model to convert potential performance loss into state-of-the-art gains across base codecs while using only a small fraction of parameters.
- A Hybrid Vision-Language Architecture for Automated Defect Reasoning and Report Generation in Industrial Inspection
A decoupled pipeline with YOLO detection, deterministic prompt encoding, and a QLoRA-adapted 1.5B LLM achieves superior structured report generation compared to monolithic VLMs on synthetic maintenance data.
- Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours
FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.
- FruitProM-V2: Robust Probabilistic Maturity Estimation and Detection of Fruits and Vegetables
A probabilistic maturity model treats ripeness as continuous rather than discrete classes, improving robustness to annotation noise in fruit detection.
- CSA-Graphs: A Privacy-Preserving Structural Dataset for Child Sexual Abuse Research
CSA-Graphs provides scene graphs and skeleton graphs as privacy-preserving alternatives to real CSAI images, with experiments showing they support classification and improve when combined.
- TinyFormer: Preserving Tiny Objects in YOLO-DETR Hybrid Real-time Detectors
TinyFormer adds a Parallel Bi-fusion Module and a Spatial Semantic Adapter to a YOLO-DETR hybrid, raising small-object AP by 1.6 points to 58.5% on MS COCO while keeping real-time speed.
- MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
MiMuon is a hybrid optimizer that achieves a generalization error bound of O(1/N) independent of the small singular-value gap that limits the original Muon bound, while retaining the same O(1/T^{1/4}) convergence rate.
- Harnessing Floating Car Data, Traffic Camera Observations, and Network Flow Analysis for Traffic Volume Estimation
A CTM-GNN model with EnSRF assimilation and a flow-weighted transition matrix fuses floating car data and camera observations to deliver physically consistent, network-wide traffic volume estimates and forecasts, demonstrated with improved accuracy in Manhattan.
- Vision-Based Safe Human-Robot Collaboration with Uncertainty Guarantees
Proposes a vision-based human pose estimation and motion prediction pipeline that uses conformal prediction sets to provide valid confidence guarantees for safe human-robot collaboration.
- DocRevive: A Unified Pipeline for Document Text Restoration
A unified pipeline using OCR, inpainting, and diffusion models restores text in degraded documents on a new synthetic benchmark dataset, evaluated with the proposed UCSM metric.
- Gaze to Insight: A Scalable AI Approach for Detecting Gaze Behaviours in Face-to-Face Collaborative Learning
A method combining pretrained YOLO11, YOLOE-26, and Gaze-LLE models detects student gaze targets in collaborative learning videos with F1-score 0.829 without requiring labeled training data.
- A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization
YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and 0.849 mAP50 on the UODM dataset.
- A Hybrid Approach for Closing the Sim2real Appearance Gap in Game Engine Synthetic Datasets
Combining a diffusion model and an image-to-image translation model produces more photorealistic game-engine synthetic images than either alone while keeping semantic labels intact.
- AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime
AIPC uses AI agents to automate PyTorch-to-QNN/SNPE deployment, completing it in 7-20 minutes for regular vision models at low API cost.
- Democratising Camera Trap AI: An Open-Source Model for Detecting UK Mammals
A YOLO26x object detector for 31 UK camera trap classes reports mAP 0.984 at IoU 0.5 on held-out data from the same sites as training.
- YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models
Empirical benchmark finds YOLO26 superior on Pascal VOC accuracy and efficiency but YOLOv8 faster on GPU, with both models struggling similarly on VisDrone small-object detection.
- Evaluation of Convolutional and Transformer-Based Detectors for Weed Detection in Tomato Plantations
Comparative benchmark finds CNN detectors deliver higher efficiency than transformer detectors for weed detection in tomatoes while transformers capture more context at greater computational cost.