hub Mixed citations

YOLOv11: An Overview of the Key Architectural Enhancements

Rahima Khanam, Muhammad Hussain · 2024 · cs.CV · arXiv 2410.17725

Mixed citation behavior. Most common role is background (40%).

55 Pith papers citing it

Background 40% of classified citations

abstract

This study presents an architectural analysis of YOLOv11, the latest iteration in the YOLO (You Only Look Once) series of object detection models. We examine the model's architectural innovations, including the introduction of the C3k2 (Cross Stage Partial with kernel size 2) block, SPPF (Spatial Pyramid Pooling - Fast), and C2PSA (Convolutional block with Parallel Spatial Attention) components, which improve the model's performance in several ways, such as enhanced feature extraction. The paper explores YOLOv11's expanded capabilities across various computer vision tasks, including object detection, instance segmentation, pose estimation, and oriented object detection (OBB). We review the model's performance improvements in terms of mean Average Precision (mAP) and computational efficiency compared to its predecessors, with a focus on the trade-off between parameter count and accuracy. Additionally, the study discusses YOLOv11's versatility across different model sizes, from nano to extra-large, catering to diverse application needs from edge devices to high-performance computing environments. Our research provides insights into YOLOv11's position within the broader landscape of object detection and its potential impact on real-time computer vision applications.

citation-role summary

background 4 · baseline 3 · method 2 · other 1
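
The role counts above can be turned into the percentage quoted at the top of the page. The sketch below assumes the 40% figure is computed over the 10 classified citations listed here rather than over all 55 citing papers; the function name `role_shares` is illustrative and not part of any Pith tool.

```python
# Role counts copied from the citation-role summary above.
role_counts = {"background": 4, "baseline": 3, "method": 2, "other": 1}


def role_shares(counts):
    """Return each role's share of classified citations as a percentage."""
    total = sum(counts.values())  # assumption: only classified citations count
    return {role: 100 * n / total for role, n in counts.items()}


shares = role_shares(role_counts)
top_role = max(shares, key=shares.get)  # role with the largest share
print(top_role, shares[top_role])
```

Under that assumption the most common role is background with a 40% share, which matches the headline figure.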

citation-polarity summary

background 4 · baseline 3 · use method 2 · unclear 1

representative citing papers

Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models

cs.CV · 2026-05-18 · unverdicted · novelty 8.0

CiF is a large new civil infrastructure segmentation dataset that shows zero-shot foundation models and domain-supervised models plateau at roughly 25% mAP, establishing infrastructure inspection as an open challenge for current visual AI.

ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision

cs.CV · 2026-02-15 · conditional · novelty 8.0

ScreenParse dataset and ScreenVLM model deliver dense screen parsing that outperforms larger VLMs on PageIoU and transfers to better UI grounding.

Weakly Supervised Cross-Modal Learning for 4D Radar Scene Flow Estimation

cs.CV · 2026-05-18 · unverdicted · novelty 7.0 · 2 refs

A task-specific iterative framework for weakly supervised 4D radar scene flow estimation uses instance-aware self-supervised losses from 2D tracking/segmentation and a rigid static loss from odometry to outperform LiDAR-dependent cross-modal and fully supervised methods on the VoD dataset.

PoseBridge: Bridging the Skeletonization Gap for Zero-Shot Skeleton-Based Action Recognition

cs.CV · 2026-05-12 · unverdicted · novelty 7.0

PoseBridge recovers semantic information lost during skeletonization by extracting pose-anchored cues from human pose estimation and transferring them via skeleton-conditioned bridging and semantic prototype adaptation, yielding 13.3-17.4 point gains on the Kinetics PURLS benchmark.

Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization

cs.CV · 2026-05-07 · unverdicted · novelty 7.0

DPOFusion uses direct preference optimization on property-aligned and preference-controllable latent diffusion models to produce adaptive infrared-visible image fusions aligned with heterogeneous human and machine vision demands.

FluxShard: Motion-Aware Feature Cache Reuse for Collaborative Video Analytics in Mobile Edge Computing

cs.NI · 2026-05-07 · unverdicted · novelty 7.0

FluxShard uses per-block motion vectors and a Receptive Field Alignment Principle to manage feature cache reuse in edge-cloud video analytics, delivering 32.6-83.8% lower latency and 14.9-64.0% lower energy than baselines while preserving accuracy.

Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection

cs.CV · 2026-04-16 · unverdicted · novelty 7.0

HELP uses heatmap-guided positional embeddings and a gradient mask to suppress background noise in queries, enabling efficient small-object detection with fewer decoder layers and parameters.

What and Where to Adapt: Structure-Semantics Co-Tuning for Machine Vision Compression via Synergistic Adapters

cs.CV · 2026-04-11 · unverdicted · novelty 7.0

S2-CoT coordinates a Structural Fidelity Adapter in the encoder-decoder with a Semantic Context Adapter in the entropy model to convert potential performance loss into state-of-the-art gains across base codecs while using only a small fraction of parameters.

GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing

cs.CV · 2026-04-10 · unverdicted · novelty 7.0

GeoMMBench reveals deficiencies in current multimodal LLMs for geoscience tasks while GeoMMAgent demonstrates that tool-integrated agents achieve significantly higher performance.

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

cs.CV · 2026-04-05 · unverdicted · novelty 7.0

SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.

UniSpector: Towards Universal Open-set Defect Recognition via Spectral-Contrastive Visual Prompting

cs.CV · 2026-04-03 · unverdicted · novelty 7.0

UniSpector organizes visual prompt space with spatial-spectral and contrastive encoders to support open-set defect localization, beating baselines by at least 19.7% AP50b and 15.8% AP50m on the new Inspect Anything benchmark.

VTC: DNN Compilation with Virtual Tensors for Data Movement Elimination

cs.DC · 2026-02-11 · unverdicted · novelty 7.0

VTC eliminates unnecessary data movement in DNN compilation using virtual tensors tracked by index mappings, achieving up to 1.93x speedup and 60% memory savings on NVIDIA GPUs.

Gen-n-Val: Agentic Image Data Generation and Validation

cs.CV · 2025-06-05 · conditional · novelty 7.0

Gen-n-Val uses LLM and VLLM agents with Layer Diffusion and TextGrad to generate and validate synthetic instance data, cutting invalid samples from 50% to 7% and improving rare-class performance on LVIS and COCO benchmarks.

Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games

cs.AI · 2025-06-04 · unverdicted · novelty 7.0

Orak is a foundational benchmark providing training data, interfaces, and evaluation tools for LLM agents across diverse video game genres.

EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation

cs.CV · 2026-05-22 · unverdicted · novelty 6.0

EvalVerse is a pipeline-aware benchmark that distills expert cinematic judgments into VLMs to assess 'goodness' metrics like aesthetics and multi-shot coherence alongside basic prompt adherence.

FedADAS: Communication-Efficient Federated Distillation for On-Device Driver Yawn Recognition in Vehicular Networks

cs.DC · 2026-05-19 · unverdicted · novelty 6.0

FedADAS uses federated distillation to support heterogeneous on-device yawn recognition models across vehicles, delivering up to 9974x lower communication cost than standard federated learning while preserving accuracy under extreme data heterogeneity.

Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours

cs.CV · 2026-05-09 · unverdicted · novelty 6.0

FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.

Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity

cs.NI · 2026-05-06 · unverdicted · novelty 6.0

VIBE is a camera-primed hybrid model-based closed-loop learning system for real-time double-directional mmWave beam management in vehicular networks that achieves outage rates as low as 1.1-1.4% and outperforms 5G NR and end-to-end ML baselines.

Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern

cs.CV · 2026-05-06 · unverdicted · novelty 6.0

Non-overlapping RGB-T adversarial patterns on clothing, optimized with spatial discrete-continuous optimization, achieve high attack success rates against multiple RGB-T detector fusion architectures in both digital and physical evaluations.

Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction

cs.CV · 2026-04-30 · unverdicted · novelty 6.0

TunnelMIND recalibrates language-guided defect proposals via dense visual consistency and reconstructs them into structured defect entities with attributes for severity grading and retrieval-grounded engineering reports, reporting F1 scores of 0.68, 0.78, and 0.72 on visible, GPR, and road defect tasks.

Transferable Physical-World Adversarial Patches Against Object Detection in Autonomous Driving

cs.CV · 2026-04-25 · unverdicted · novelty 6.0

AdvAD produces physical-world adversarial patches with improved transferability to unseen object detectors by multi-model optimization, adaptive balancing, and physical variation robustness.

ZoomSpec: A Physics-Guided Coarse-to-Fine Framework for Wideband Spectrum Sensing

cs.CV · 2026-04-15 · unverdicted · novelty 6.0

ZoomSpec achieves 78.1 mAP@0.5:0.95 on the SpaceNet dataset by combining log-space STFT, a coarse proposal net, adaptive heterodyne filtering, and dual-domain fine recognition to improve narrowband visibility in wideband spectrum sensing.

Toward Unified Fine-Grained Vehicle Classification and Automatic License Plate Recognition

cs.CV · 2026-04-07 · accept · novelty 6.0

UFPR-VeSV is a new real-world dataset for fine-grained vehicle classification and automatic license plate recognition collected from Brazilian police cameras, with benchmarks demonstrating its difficulty and the value of joint task use.

Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection

cs.CV · 2026-04-03 · unverdicted · novelty 6.0

UAVGen generates higher-quality synthetic UAV images via visual prototype conditioning and focal region focus in diffusion models, leading to better object detection accuracy than prior methods.

citing papers explorer

Showing 50 of 55 citing papers.

Cracks in the Foundation: A Civil Infrastructure Dataset to Challenge Vision Foundation Models cs.CV · 2026-05-18 · unverdicted · none · ref 22 · internal anchor
CiF is a large new civil infrastructure segmentation dataset that shows zero-shot foundation models and domain-supervised models plateau at roughly 25% mAP, establishing infrastructure inspection as an open challenge for current visual AI.
ScreenParse: Moving Beyond Sparse Grounding with Complete Screen Parsing Supervision cs.CV · 2026-02-15 · conditional · none · ref 4 · internal anchor
ScreenParse dataset and ScreenVLM model deliver dense screen parsing that outperforms larger VLMs on PageIoU and transfers to better UI grounding.
Weakly Supervised Cross-Modal Learning for 4D Radar Scene Flow Estimation cs.CV · 2026-05-18 · unverdicted · none · ref 2 · 2 links · internal anchor
A task-specific iterative framework for weakly supervised 4D radar scene flow estimation uses instance-aware self-supervised losses from 2D tracking/segmentation and a rigid static loss from odometry to outperform LiDAR-dependent cross-modal and fully supervised methods on the VoD dataset.
PoseBridge: Bridging the Skeletonization Gap for Zero-Shot Skeleton-Based Action Recognition cs.CV · 2026-05-12 · unverdicted · none · ref 13 · internal anchor
PoseBridge recovers semantic information lost during skeletonization by extracting pose-anchored cues from human pose estimation and transferring them via skeleton-conditioned bridging and semantic prototype adaptation, yielding 13.3-17.4 point gains on the Kinetics PURLS benchmark.
Fusion in Your Way: Aligning Image Fusion with Heterogeneous Demands via Direct Preference Optimization cs.CV · 2026-05-07 · unverdicted · none · ref 61 · internal anchor
DPOFusion uses direct preference optimization on property-aligned and preference-controllable latent diffusion models to produce adaptive infrared-visible image fusions aligned with heterogeneous human and machine vision demands.
FluxShard: Motion-Aware Feature Cache Reuse for Collaborative Video Analytics in Mobile Edge Computing cs.NI · 2026-05-07 · unverdicted · none · ref 21 · internal anchor
FluxShard uses per-block motion vectors and a Receptive Field Alignment Principle to manage feature cache reuse in edge-cloud video analytics, delivering 32.6-83.8% lower latency and 14.9-64.0% lower energy than baselines while preserving accuracy.
Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection cs.CV · 2026-04-16 · unverdicted · none · ref 68 · internal anchor
HELP uses heatmap-guided positional embeddings and a gradient mask to suppress background noise in queries, enabling efficient small-object detection with fewer decoder layers and parameters.
What and Where to Adapt: Structure-Semantics Co-Tuning for Machine Vision Compression via Synergistic Adapters cs.CV · 2026-04-11 · unverdicted · none · ref 28 · internal anchor
S2-CoT coordinates a Structural Fidelity Adapter in the encoder-decoder with a Semantic Context Adapter in the entropy model to convert potential performance loss into state-of-the-art gains across base codecs while using only a small fraction of parameters.
GeoMMBench and GeoMMAgent: Toward Expert-Level Multimodal Intelligence in Geoscience and Remote Sensing cs.CV · 2026-04-10 · unverdicted · none · ref 21 · internal anchor
GeoMMBench reveals deficiencies in current multimodal LLMs for geoscience tasks while GeoMMAgent demonstrates that tool-integrated agents achieve significantly higher performance.
SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection cs.CV · 2026-04-05 · unverdicted · none · ref 7 · internal anchor
SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.
UniSpector: Towards Universal Open-set Defect Recognition via Spectral-Contrastive Visual Prompting cs.CV · 2026-04-03 · unverdicted · none · ref 16 · internal anchor
UniSpector organizes visual prompt space with spatial-spectral and contrastive encoders to support open-set defect localization, beating baselines by at least 19.7% AP50b and 15.8% AP50m on the new Inspect Anything benchmark.
VTC: DNN Compilation with Virtual Tensors for Data Movement Elimination cs.DC · 2026-02-11 · unverdicted · none · ref 18 · internal anchor
VTC eliminates unnecessary data movement in DNN compilation using virtual tensors tracked by index mappings, achieving up to 1.93x speedup and 60% memory savings on NVIDIA GPUs.
Gen-n-Val: Agentic Image Data Generation and Validation cs.CV · 2025-06-05 · conditional · none · ref 16 · internal anchor
Gen-n-Val uses LLM and VLLM agents with Layer Diffusion and TextGrad to generate and validate synthetic instance data, cutting invalid samples from 50% to 7% and improving rare-class performance on LVIS and COCO benchmarks.
Orak: A Foundational Benchmark for Training and Evaluating LLM Agents on Diverse Video Games cs.AI · 2025-06-04 · unverdicted · none · ref 76 · internal anchor
Orak is a foundational benchmark providing training data, interfaces, and evaluation tools for LLM agents across diverse video game genres.
EvalVerse: Pipeline-Aware and Expert-Calibrated Benchmarking for Professional Cinematic Video Generation cs.CV · 2026-05-22 · unverdicted · none · ref 7 · internal anchor
EvalVerse is a pipeline-aware benchmark that distills expert cinematic judgments into VLMs to assess 'goodness' metrics like aesthetics and multi-shot coherence alongside basic prompt adherence.
FedADAS: Communication-Efficient Federated Distillation for On-Device Driver Yawn Recognition in Vehicular Networks cs.DC · 2026-05-19 · unverdicted · none · ref 12 · internal anchor
FedADAS uses federated distillation to support heterogeneous on-device yawn recognition models across vehicles, delivering up to 9974x lower communication cost than standard federated learning while preserving accuracy under extreme data heterogeneity.
Contour-Native Bridge Defect Detection and Compact Digital Archiving with Frequency-Supervised Fourier Contours cs.CV · 2026-05-09 · unverdicted · none · ref 44 · internal anchor
FS-FSD regresses frequency-supervised Fourier contours for bridge defects, yielding higher polygon accuracy and better geometric quality than box, mask, or contour baselines on 3,767 UAV images with 42,346 instances.
Look Once, Beam Twice: Camera-Primed Real-Time Double-Directional mmWave Beam Management for Vehicular Connectivity cs.NI · 2026-05-06 · unverdicted · none · ref 42 · internal anchor
VIBE is a camera-primed hybrid model-based closed-loop learning system for real-time double-directional mmWave beam management in vehicular networks that achieves outage rates as low as 1.1-1.4% and outperforms 5G NR and end-to-end ML baselines.
Physical Adversarial Clothing Evades Visible-Thermal Detectors via Non-Overlapping RGB-T Pattern cs.CV · 2026-05-06 · unverdicted · none · ref 18 · internal anchor
Non-overlapping RGB-T adversarial patterns on clothing, optimized with spatial discrete-continuous optimization, achieve high attack success rates against multiple RGB-T detector fusion architectures in both digital and physical evaluations.
Training-Free Tunnel Defect Inspection and Engineering Interpretation via Visual Recalibration and Entity Reconstruction cs.CV · 2026-04-30 · unverdicted · none · ref 8 · internal anchor
TunnelMIND recalibrates language-guided defect proposals via dense visual consistency and reconstructs them into structured defect entities with attributes for severity grading and retrieval-grounded engineering reports, reporting F1 scores of 0.68, 0.78, and 0.72 on visible, GPR, and road defect tasks.
Transferable Physical-World Adversarial Patches Against Object Detection in Autonomous Driving cs.CV · 2026-04-25 · unverdicted · none · ref 34 · internal anchor
AdvAD produces physical-world adversarial patches with improved transferability to unseen object detectors by multi-model optimization, adaptive balancing, and physical variation robustness.
ZoomSpec: A Physics-Guided Coarse-to-Fine Framework for Wideband Spectrum Sensing cs.CV · 2026-04-15 · unverdicted · none · ref 24 · internal anchor
ZoomSpec achieves 78.1 mAP@0.5:0.95 on the SpaceNet dataset by combining log-space STFT, a coarse proposal net, adaptive heterodyne filtering, and dual-domain fine recognition to improve narrowband visibility in wideband spectrum sensing.
Toward Unified Fine-Grained Vehicle Classification and Automatic License Plate Recognition cs.CV · 2026-04-07 · accept · none · ref 36 · internal anchor
UFPR-VeSV is a new real-world dataset for fine-grained vehicle classification and automatic license plate recognition collected from Brazilian police cameras, with benchmarks demonstrating its difficulty and the value of joint task use.
Visual Prototype Conditioned Focal Region Generation for UAV-Based Object Detection cs.CV · 2026-04-03 · unverdicted · none · ref 24 · internal anchor
UAVGen generates higher-quality synthetic UAV images via visual prototype conditioning and focal region focus in diffusion models, leading to better object detection accuracy than prior methods.
Chasing Ghosts: A Simulation-to-Real Olfactory Navigation Stack with Optional Vision Augmentation cs.RO · 2026-02-23 · unverdicted · none · ref 30 · internal anchor
A simulation-to-real navigation policy enables a quadrotor to locate an odor source using only basic olfaction sensors and optional vision, validated in indoor real-world flights.
PEPR: Privileged Event-based Predictive Regularization for Domain Generalization cs.CV · 2026-02-04 · unverdicted · none · ref 25 · internal anchor
PEPR reframes learning with privileged event data as predicting latent event features from RGB to improve domain generalization in object detection and segmentation without direct cross-modal alignment.
Clutter-Robust Vision-Language-Action Models through Object-Centric and Geometry Grounding cs.RO · 2025-12-27 · conditional · none · ref 37 · internal anchor
OBEYED-VLA improves VLA robustness in cluttered real-world manipulation by disentangling perception into VLM-based object-centric grounding and geometry-aware stages, then fine-tuning the policy only on single-object demonstrations.
Edge Assisted Multi-Camera Vehicle Tracking Framework for Real-Time and Scalable Deployment cs.CV · 2025-11-17 · unverdicted · none · ref 8 · internal anchor
EASE-MCVT is a distributed edge-assisted multi-camera vehicle tracking framework that achieves real-time performance and competitive accuracy on public datasets through edge processing and server-side optimizations.
Video models are zero-shot learners and reasoners cs.LG · 2025-09-24 · unverdicted · none · ref 14 · internal anchor
Generative video models exhibit emergent zero-shot capabilities across perception, manipulation, and basic reasoning tasks.
Synthetic Data Augmentation for Enhanced Chicken Carcass Instance Segmentation cs.CV · 2025-07-24 · unverdicted · none · ref 62 · internal anchor
Synthetic data augmentation improves instance segmentation performance for chicken carcasses when real annotated data is limited.
A Leaf-Level Dataset for Soybean-Cotton Detection and Segmentation cs.CV · 2025-03-03 · unverdicted · none · ref 29 · internal anchor
A new leaf-instance dataset for soybean-cotton detection and segmentation collected across growth stages and conditions from commercial farms is presented and validated with YOLOv11.
MR2-ByteTrack: CNN and Transformer-based Video Object Detection for AI-augmented Embedded Vision Sensor Nodes cs.CV · 2026-05-14 · conditional · none · ref 40 · internal anchor
MR2-ByteTrack maintains high accuracy in video object detection on MCUs by combining multi-resolution processing, ByteTrack for frame linking, and Rescore for confidence aggregation, achieving up to 55% energy savings and real-time performance for both CNN and Transformer models.
ERPPO: Entropy Regularization-based Proximal Policy Optimization cs.LG · 2026-05-13 · unverdicted · none · ref 79 · internal anchor
ERPPO adds a DSA-based ambiguity estimator to MAPPO and switches between L1 and L2 entropy regularization to improve exploration and stability in non-stationary multi-dimensional observations.
TriBand-BEV: Real-Time LiDAR-Only 3D Pedestrian Detection via Height-Aware BEV and High-Resolution Feature Fusion cs.CV · 2026-05-12 · unverdicted · none · ref 9 · internal anchor
TriBand-BEV introduces a three-band height-aware BEV encoding of LiDAR data to enable single-pass real-time 3D detection of pedestrians, cars, and cyclists with improved KITTI accuracy.
Exploring Clustering Capability of Inpainting Model Embeddings for Pattern-based Individual Identification cs.CV · 2026-05-06 · unverdicted · none · ref 11 · internal anchor
Inpainting auxiliary task improves clustering of embeddings for individual zebrafish identification based on skin patterns.
Echo-α: Large Agentic Multimodal Reasoning Model for Ultrasound Interpretation cs.CV · 2026-04-30 · unverdicted · none · ref 9 · internal anchor
Echo-α integrates organ-specific detectors with global visual context via an invoke-and-reason agentic loop, trained on a nine-task curriculum plus sequential RL, to achieve superior grounding (56.73%/43.78% F1@0.5) and diagnosis (74.90%/49.20% accuracy) on cross-center renal and breast ultrasound.
Edge-Cloud Collaborative Reconstruction via Structure-Aware Latent Diffusion for Downstream Remote Sensing Perception cs.CV · 2026-04-28 · unverdicted · none · ref 23 · internal anchor
SALD decouples remote sensing images into compressed payload plus structural prior at the edge and uses structure-gated diffusion on the cloud to improve super-resolution and downstream detection under extreme bandwidth limits.
DocRevive: A Unified Pipeline for Document Text Restoration cs.CV · 2026-04-11 · unverdicted · none · ref 18 · 2 links · internal anchor
A unified pipeline using OCR, inpainting, and diffusion models restores text in degraded documents on a new synthetic benchmark dataset, evaluated with the proposed UCSM metric.
A Weak-Signal-Aware Framework for Subsurface Defect Detection: Mechanisms for Enhancing Low-SCR Hyperbolic Signatures cs.CV · 2026-04-07 · unverdicted · none · ref 12 · internal anchor
WSA-Net uses partial convolutions, heterogeneous grouping attention, geometric reconstruction, and context anchoring to enhance low-SCR hyperbolic signatures in GPR data, reaching 0.6958 mAP@0.5 at 164 FPS with 2.412M parameters on the RTST dataset.
Human Interaction-Aware 3D Reconstruction from a Single Image cs.CV · 2026-04-07 · unverdicted · none · ref 17 · internal anchor
HUG3D uses group-instance multi-view diffusion and physics-based optimization to create physically plausible 3D reconstructions of interacting people from a single image.
Gaze to Insight: A Scalable AI Approach for Detecting Gaze Behaviours in Face-to-Face Collaborative Learning cs.CV · 2026-04-01 · unverdicted · none · ref 17 · internal anchor
A method combining pretrained YOLO11, YOLOE-26, and Gaze-LLE models detects student gaze targets in collaborative learning videos with F1-score 0.829 without requiring labeled training data.
YawDD+: Frame-level Annotations for Accurate Yawn Prediction cs.CV · 2025-12-12 · conditional · none · ref 5 · internal anchor
YawDD+ frame-level annotations improve yawn classification to 99.34% accuracy and detection to 95.69% mAP on Jetson hardware compared to video-level labels.
A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization cs.CV · 2026-05-08 · unverdicted · none · ref 24 · internal anchor
YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and 0.849 mAP50 on the UODM dataset.
Fringe Projection Based Vision Pipeline for Autonomous Hard Drive Disassembly cs.CV · 2026-04-19 · unverdicted · none · ref 27 · internal anchor
An integrated fringe projection and AI pipeline delivers aligned high-accuracy 3D sensing and instance segmentation for autonomous HDD disassembly at 77.7 FPS.
Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection cs.CV · 2026-04-09 · unverdicted · none · ref 4 · internal anchor
MDDCNet combines Mamba blocks with deformable dilated convolutions, enhanced feed-forward networks, and an attention-aggregating feature pyramid to achieve better multi-scale traffic object detection than prior detectors.
Real-Time Structural Detection for Indoor Navigation from 3D LiDAR Using Bird's-Eye-View Images cs.RO · 2026-03-20 · conditional · none · ref 25 · internal anchor
Projecting 3D LiDAR to BEV images and applying YOLO-OBB with spatiotemporal fusion enables reliable real-time structural detection on resource-constrained robots.
Are vision-language models ready to zero-shot replace supervised classification models in agriculture? cs.CV · 2025-12-17 · unverdicted · none · ref 23 · internal anchor
Zero-shot VLMs reach at most 62% accuracy on agricultural classification tasks while supervised models like YOLO11 perform markedly higher, indicating they are not ready to replace task-specific systems.
StreakMind: AI detection and analysis of satellite streaks in astronomical images with automated database integration astro-ph.IM · 2026-05-05 · unverdicted · none · ref 12 · internal anchor
StreakMind trains a YOLO OBB model on 2335 images to detect satellite streaks in FITS frames with 94% precision and 97% recall, then applies geometric refinement and orbital database matching.
Real-Time Cellist Postural Evaluation With On-Device Computer Vision cs.HC · 2026-04-19 · unverdicted · none · ref 9 · internal anchor
Cello Evaluator is a real-time postural feedback system for cellists running on current Android phones via on-device computer vision, validated as user-friendly by experts.
Cosmos World Foundation Model Platform for Physical AI cs.CV · 2025-01-07 · unverdicted · none · ref 95 · internal anchor
The Cosmos platform supplies open-source pre-trained world models and supporting tools for building fine-tunable digital world simulations to train Physical AI.
