YOLOv13: Real-Time Object Detection with Hypergraph-Enhanced Adaptive Visual Perception
17 Pith papers cite this work. Polarity classification is still indexing.
citing papers explorer
- Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions
  CoExVQA uses a chain-of-explanation to ground DocVQA answers in localized document regions, achieving state-of-the-art explainable performance with a 12% ANLS gain on PFL-DocVQA over prior baselines.
- WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects
  WUTDet is a 100K-image ship detection dataset with benchmarks indicating that Transformer models outperform CNN and Mamba architectures in accuracy and small-object detection in complex maritime environments.
- DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model
  DLEBench is the first benchmark for small-scale object editing in instruction-based image editing models, using 1889 samples, seven instruction types, and a dual-mode evaluation protocol to reveal performance gaps in 10 tested models.
- Hypergraph-Enhanced Training-Free and Language-Free Few-Shot Anomaly Detection
  HyperFSAD uses sparse hypergraph matching on DINOv3 features plus dual-branch scoring to deliver training-free and language-free few-shot anomaly detection that reaches state-of-the-art on six industrial and medical datasets.
- Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE
  Hyper-FEOD fuses RGB and event data via sparse hypergraph cross-modal fusion and region-specialized MoE experts to improve the accuracy-efficiency trade-off in object detection.
- Ψ-Map: Panoptic Surface Integrated Mapping Enables Real2Sim Transfer
  Ψ-Map combines plane-constrained Gaussian surfels from LiDAR with end-to-end panoptic lifting to deliver high-precision geometric and semantic reconstruction in large-scale environments at real-time speeds.
- Hippocampus-DETR: An Explicit Memory Object Detection Framework Based on Hippocampus Modeling
  Hippocampus-DETR integrates a hippocampal memory network (HipNet) into DETR to simulate brain subregions for pattern separation and completion, improving detection accuracy and generalization.
- TinyFormer: Preserving Tiny Objects in YOLO-DETR Hybrid Real-time Detectors
  TinyFormer adds a Parallel Bi-fusion Module and a Spatial Semantic Adapter to a YOLO-DETR hybrid, raising small-object AP by 1.6 points to 58.5% on MS COCO while keeping real-time speed.
- RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting
  RACANet proposes a reliability-aware two-stage fusion network with cross-modal pretraining and local anchor modules that outperforms prior RGB-T crowd counting methods on standard benchmarks.
- Fast-SegSim: Real-Time Open-Vocabulary Segmentation for Robotics in Simulation
  Fast-SegSim achieves real-time 3D-consistent open-vocabulary segmentation by optimizing feature accumulation in 2D Gaussian Splatting with Precise Tile Intersection and Top-K Hard Selection.
- FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection
  FMC-DETR proposes a frequency-decoupled fusion framework with a WeKat backbone and MDFC coordination and CPF fusion modules, and claims state-of-the-art results on remote sensing object detection benchmarks.
- UAVDB: Point-Guided Masks for UAV Detection and Segmentation
  Introduces the UAVDB dataset for UAV detection and segmentation via PIC point-to-box conversion and SAM2 masks, with YOLO baselines showing that PIC+SAM2 outperforms prior annotation methods on IoU.
- A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization
  YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and 0.849 mAP50 on the UODM dataset.
- Resource-Constrained UAV-Based Weed Detection for Site-Specific Management on Edge Devices
  YOLOv11s and RT-DETRv2-R50-M provide the best accuracy-speed trade-off for real-time weed detection on edge UAV systems, with mAP50 up to 79% and low latency.
- Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection
  MDDCNet combines Mamba blocks with deformable dilated convolutions, enhanced feed-forward networks, and an attention-aggregating feature pyramid to achieve better multi-scale traffic object detection than prior detectors.
- YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models
  An empirical benchmark finds YOLO26 superior on Pascal VOC in accuracy and efficiency but YOLOv8 faster on GPU, with both models struggling similarly on VisDrone small-object detection.
- NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report
  The NTIRE 2026 RipDetSeg Challenge evaluated AI methods for rip current detection and segmentation, finding that pretrained general-purpose models with augmentation and post-processing performed well on a diverse multi-country dataset.