Yolo26: Key architectural enhancements and performance benchmarking for real-time object detection
Mixed citation behavior. The most common citation role is method (50%).
Citing papers by year: 2026 (16)

Representative citing papers
-
Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models
CiF is a large new civil infrastructure segmentation dataset on which both zero-shot foundation models and domain-supervised models plateau at roughly 25% mAP, establishing infrastructure inspection as an open challenge for current visual AI.
-
XWOD: A Real-World Benchmark for Object Detection under Extreme Weather Conditions
XWOD is a large-scale real-world benchmark for traffic object detection under seven extreme weather conditions; training on it improves zero-shot generalization to other weather datasets.
-
OmniRobotHome: A Multi-Camera Platform for Real-Time Multiadic Human-Robot Interaction
A 48-camera residential platform delivers real-time occlusion-robust 3D perception and coordinated actuation for multi-human multi-robot interaction in a shared home workspace.
-
What and Where to Adapt: Structure-Semantics Co-Tuning for Machine Vision Compression via Synergistic Adapters
S2-CoT coordinates a Structural Fidelity Adapter in the encoder-decoder with a Semantic Context Adapter in the entropy model to convert potential performance loss into state-of-the-art gains across base codecs while using only a small fraction of parameters.
-
Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours
FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.
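The summary does not specify FS-FSD's exact contour parameterization. As a minimal sketch, assuming a defect outline is treated as a closed curve of complex points x + iy and stored as its lowest-frequency discrete Fourier coefficients, the code below shows why such a representation is compact: a circle sampled at 64 vertices is captured by a handful of coefficients and can be resampled at any resolution. The functions `fourier_encode` and `fourier_decode` are hypothetical names, and this is not the paper's frequency-supervised regression head.

```python
import cmath
import math


def fourier_encode(points, k):
    """Return DFT coefficients c_f for frequencies -k..k of a closed contour.

    points: list of (x, y) vertices sampled in order along the contour.
    """
    n = len(points)
    z = [complex(x, y) for x, y in points]
    coeffs = {}
    for f in range(-k, k + 1):
        # c_f = (1/n) * sum_t z_t * exp(-2*pi*i*f*t/n)
        coeffs[f] = sum(z[t] * cmath.exp(-2j * math.pi * f * t / n) for t in range(n)) / n
    return coeffs


def fourier_decode(coeffs, m):
    """Resample the contour at m evenly spaced parameter values from truncated coefficients."""
    pts = []
    for t in range(m):
        # z(u) = sum_f c_f * exp(2*pi*i*f*u), with u = t/m in [0, 1)
        s = sum(c * cmath.exp(2j * math.pi * f * t / m) for f, c in coeffs.items())
        pts.append((s.real, s.imag))
    return pts
```

For a circle of radius 2 centred at (1, 1), only c_0 (the centre) and c_1 (the radius) are non-zero, so seven complex coefficients stand in for 64 vertices; defect outlines with more detail need a larger k.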
-
FruitProM-V2: Robust Probabilistic Maturity Estimation and Detection of Fruits and Vegetables
A probabilistic maturity model treats ripeness as continuous rather than discrete classes, improving robustness to annotation noise in fruit detection.
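One common way to model a continuous target together with its uncertainty is heteroscedastic Gaussian regression; whether FruitProM-V2 uses this exact loss is an assumption. In the sketch below, a hypothetical `gaussian_nll` scores a predicted maturity mean and log-variance against a label in [0, 1]. A large predicted variance shrinks the penalty for a disagreeing (possibly mislabeled) target at the cost of the log-variance term, which is one mechanism by which such models can tolerate annotation noise.

```python
import math


def gaussian_nll(mu, log_var, target):
    """Negative log-likelihood of target under N(mu, exp(log_var)), additive constant dropped.

    mu: predicted continuous maturity score (e.g. 0 = unripe, 1 = fully ripe).
    log_var: predicted log-variance, i.e. the model's uncertainty about mu.
    """
    return 0.5 * (log_var + (target - mu) ** 2 / math.exp(log_var))


# A label far from the prediction (0.9 vs 0.2) is penalized much less
# when the model reports high uncertainty.
confident = gaussian_nll(0.2, math.log(0.01), 0.9)
uncertain = gaussian_nll(0.2, math.log(0.25), 0.9)
```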
-
CSA-Graphs: A Privacy-Preserving Structural Dataset for Child Sexual Abuse Research
CSA-Graphs provides scene graphs and skeleton graphs as privacy-preserving alternatives to real CSAI images, with experiments showing they support classification and improve when combined.
-
MiMuon: Mixed Muon Optimizer with Improved Generalization for Large Models
MiMuon is a hybrid optimizer that achieves a generalization error bound of O(1/N) independent of the small singular-value gap that limits the original Muon bound, while retaining the same O(1/T^{1/4}) convergence rate.
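MiMuon's mixing rule is not described in this summary. For context, the original Muon optimizer orthogonalizes each 2-D momentum matrix with a Newton-Schulz iteration before applying the update. Muon's reference implementation uses a tuned quintic polynomial; this sketch swaps in the classic cubic iteration X <- 1.5X - 0.5 X X^T X, which, after normalizing by the Frobenius norm, converges to the orthogonal polar factor U V^T. Plain Python lists keep it dependency-free, and `newton_schulz_orthogonalize` is a hypothetical name.

```python
import math


def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b))) for j in range(len(b[0]))]
            for i in range(len(a))]


def transpose(a):
    return [list(row) for row in zip(*a)]


def newton_schulz_orthogonalize(g, steps=30):
    """Approximate the orthogonal polar factor U V^T of g with cubic Newton-Schulz."""
    norm = math.sqrt(sum(v * v for row in g for v in row))
    # Dividing by the Frobenius norm puts every singular value in (0, 1],
    # inside the iteration's convergence region (0, sqrt(3)).
    x = [[v / norm for v in row] for row in g]
    for _ in range(steps):
        xxtx = matmul(matmul(x, transpose(x)), x)
        # Each singular value s maps to 1.5*s - 0.5*s**3, which is driven toward 1.
        x = [[1.5 * x[i][j] - 0.5 * xxtx[i][j] for j in range(len(x[0]))]
             for i in range(len(x))]
    return x
```

The small singular values grow by roughly 1.5x per step until they approach 1, after which convergence is quadratic; a poorly conditioned matrix therefore needs more steps than the default.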
-
Harnessing Floating Car Data, Traffic Camera Observations, and Network Flow Analysis for Traffic Volume Estimation
A CTM-GNN model with EnSRF assimilation and a flow-weighted transition matrix fuses floating car data and camera observations to deliver physically consistent, network-wide traffic volume estimates and forecasts, demonstrated with improved accuracy in Manhattan.
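Assuming CTM here stands for the Cell Transmission Model, the physically consistent part of such a model is its flow update: the flow between adjacent cells is the minimum of upstream demand and downstream supply, and vehicles are conserved. The sketch below advances a single corridor by one time step with a triangular fundamental diagram. `ctm_step` and all parameter values are hypothetical, and the GNN, EnSRF assimilation, and flow-weighted transition matrix are not modeled.

```python
def ctm_step(density, dt, dx, v_free, w_back, q_max, rho_jam, inflow=0.0):
    """Advance cell densities (veh/km) on a single-lane corridor by one time step.

    Units: dt in hours, dx in km, speeds in km/h, flows in veh/h.
    Triangular fundamental diagram:
      demand(rho) = min(v_free * rho, q_max)
      supply(rho) = min(q_max, w_back * (rho_jam - rho))
    """
    n = len(density)
    demand = [min(v_free * r, q_max) for r in density]
    supply = [min(q_max, w_back * (rho_jam - r)) for r in density]
    # flow[i] enters cell i; flow[n] leaves the last cell unobstructed.
    flow = [min(inflow, supply[0])]
    for i in range(1, n):
        flow.append(min(demand[i - 1], supply[i]))
    flow.append(demand[-1])
    # Conservation: density change = (flow in - flow out) * dt / dx
    return [density[i] + (flow[i] - flow[i + 1]) * dt / dx for i in range(n)]
```

The update is only stable when v_free * dt <= dx (the CFL condition). With 0.5 km cells, a 30 s step (dt = 1/120 h), and a 60 km/h free-flow speed, the condition holds with equality; starting from densities [40, 10, 0] veh/km and no inflow, one step yields [10, 30, 10], and the total vehicle count is unchanged.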
-
Vision-Based Safe Human-Robot Collaboration with Uncertainty Guarantees
Proposes a vision-based human pose estimation and motion prediction pipeline that uses conformal prediction sets to provide valid confidence guarantees for safe human-robot collaboration.
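The summary does not say which conformal procedure the pipeline uses. Split conformal prediction is the standard construction: compute nonconformity scores on a held-out calibration set and use the ceil((n+1)(1 - alpha))-th smallest score as the interval radius, which guarantees marginal coverage of at least 1 - alpha when calibration and test samples are exchangeable. A minimal sketch for one scalar output (for example, one predicted joint coordinate), with hypothetical function names:

```python
import math


def split_conformal_radius(cal_preds, cal_targets, alpha):
    """Radius q such that [pred - q, pred + q] covers a new target with
    probability >= 1 - alpha, assuming calibration and test data are exchangeable."""
    # Nonconformity score: absolute residual on the calibration set.
    scores = sorted(abs(y - p) for p, y in zip(cal_preds, cal_targets))
    n = len(scores)
    rank = math.ceil((n + 1) * (1 - alpha))  # 1-based rank of the conformal quantile
    if rank > n:
        return math.inf  # too few calibration points for this alpha
    return scores[rank - 1]


def prediction_interval(pred, radius):
    return (pred - radius, pred + radius)
```

With seven calibration residuals 1, ..., 7 and alpha = 0.25, the rank is ceil(8 * 0.75) = 6, so the radius is 6; asking for alpha = 0.05 with the same seven points returns an infinite radius, because no finite interval can be certified.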
-
DocRevive: A Unified Pipeline for Document Text Restoration
A unified pipeline using OCR, inpainting, and diffusion models restores text in degraded documents on a new synthetic benchmark dataset, evaluated with the proposed UCSM metric.
-
Gaze to Insight: A Scalable AI Approach for Detecting Gaze Behaviours in Face-to-Face Collaborative Learning
A method combining pretrained YOLO11, YOLOE-26, and Gaze-LLE models detects student gaze targets in collaborative learning videos with an F1-score of 0.829 without requiring labeled training data.
-
A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization
YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and 0.849 mAP50 on the UODM dataset.
-
A Hybrid Approach for Closing the Sim2real Appearance Gap in Game Engine Synthetic Datasets
Combining a diffusion model and an image-to-image translation model produces more photorealistic game-engine synthetic images than either alone while keeping semantic labels intact.
-
AIPC: Agent-Based Automation for AI Model Deployment with Qualcomm AI Runtime
AIPC uses AI agents to automate PyTorch-to-QNN/SNPE deployment, completing it in 7-20 minutes for regular vision models at low API cost.
-
Evaluation of Convolutional and Transformer-Based Detectors for Weed Detection in Tomato Plantations