arXiv preprint arXiv:2506.17733 (2025)
14 Pith papers cite this work.
Citing papers
-
Towards Self-Explainable Document Visual Question Answering with Chain-of-Explanation Predictions
CoExVQA uses a chain-of-explanation to ground DocVQA answers in localized document regions, achieving state-of-the-art explainable performance with a 12% ANLS gain on PFL-DocVQA over prior baselines.
-
WUTDet: A 100K-Scale Ship Detection Dataset and Benchmarks with Dense Small Objects
WUTDet is a 100K-image ship detection dataset whose benchmarks indicate that Transformer-based models outperform CNN- and Mamba-based architectures in accuracy and small-object detection in complex maritime environments.
-
DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model
DLEBench is the first benchmark for small-scale object editing in instruction-based image editing models. It uses 1,889 samples, seven instruction types, and a dual-mode evaluation protocol to reveal performance gaps across 10 tested models.
-
Hypergraph-Enhanced Training-Free and Language-Free Few-Shot Anomaly Detection
HyperFSAD uses sparse hypergraph matching on DINOv3 features plus dual-branch scoring to deliver training-free and language-free few-shot anomaly detection, reaching state-of-the-art results on six industrial and medical datasets.
-
Sparse Hypergraph-Enhanced Frame-Event Object Detection with Fine-Grained MoE
Hyper-FEOD fuses RGB and event data via sparse hypergraph cross-modal fusion and region-specialized mixture-of-experts (MoE) modules to improve the accuracy-efficiency trade-off in object detection.
-
Ψ-Map: Panoptic Surface Integrated Mapping Enables Real2Sim Transfer
Ψ-Map combines plane-constrained Gaussian surfels from LiDAR with end-to-end panoptic lifting to deliver high-precision geometric and semantic reconstruction in large-scale environments at real-time speeds.
-
RACANet: Reliability-Aware Crowd Anchor Network for RGB-T Crowd Counting
RACANet proposes a reliability-aware two-stage fusion network with cross-modal pretraining and local anchor modules that outperforms prior RGB-T crowd counting methods on standard benchmarks.
-
Fast-SegSim: Real-Time Open-Vocabulary Segmentation for Robotics in Simulation
Fast-SegSim achieves real-time 3D-consistent open-vocabulary segmentation by optimizing feature accumulation in 2D Gaussian Splatting with Precise Tile Intersection and Top-K Hard Selection.
-
FMC-DETR: Frequency-Decoupled Multi-Domain Coordination for Aerial-View Object Detection
FMC-DETR proposes a frequency-decoupled fusion framework with a WeKat backbone, MDFC coordination, and CPF fusion modules, and claims state-of-the-art results on remote sensing object detection benchmarks.
-
UAVDB: Point-Guided Masks for UAV Detection and Segmentation
UAVDB is a dataset for UAV detection and segmentation built with PIC point-to-box conversion and SAM2-generated masks; YOLO baselines show that PIC+SAM2 outperforms prior annotation methods on IoU.
-
A Marine Debris Detection Framework for Ocean Robots via Self-Attention Enhancement and Feature Interaction Optimization
YOLO-MD improves underwater marine debris detection by adding a Dual-Branch Convolutional Enhanced Self-Attention module, a lightweight shift operation, and SFG-Loss for class imbalance, achieving 0.875 precision and 0.849 mAP50 on the UODM dataset.
-
Resource-Constrained UAV-Based Weed Detection for Site-Specific Management on Edge Devices
YOLOv11s and RT-DETRv2-R50-M provide the best accuracy-speed trade-off for real-time weed detection on edge UAV systems, with mAP50 up to 79% and low latency.
-
Beyond Mamba: Enhancing State-space Models with Deformable Dilated Convolutions for Multi-scale Traffic Object Detection
MDDCNet combines Mamba blocks with deformable dilated convolutions, enhanced feed-forward networks, and an attention-aggregating feature pyramid to achieve better multi-scale traffic object detection than prior detectors.
-
NTIRE 2026 Rip Current Detection and Segmentation (RipDetSeg) Challenge Report
The NTIRE 2026 RipDetSeg Challenge evaluated AI methods for rip current detection and segmentation, finding that pretrained general-purpose models with augmentation and post-processing performed well on a diverse multi-country dataset.