{"total":18,"items":[{"citing_arxiv_id":"2605.20738","ref_index":131,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"STAR-IOD: Scale-decoupled Topology Alignment with Pseudo-label Refinement for Remote Sensing Incremental Object Detection","primary_cat":"cs.CV","submitted_at":"2026-05-20T05:43:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"STAR-IOD applies scale-decoupled topology alignment and K-Means-based pseudo-label refinement to reduce catastrophic forgetting in remote sensing incremental object detection, reporting 1.7% and 2.1% mAP gains on new DIOR-IOD and DOTA-IOD datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.16671","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Sustainable Intelligence for the Wild: Democratizing Ecological Monitoring via Knowledge-Adaptive Edge Expert Agents","primary_cat":"cs.AI","submitted_at":"2026-05-15T22:12:02+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"Proposes a knowledge-adaptive edge expert agent architecture for sustainable biodiversity monitoring that separates visual perception from reasoning with an explicit knowledge base.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.10845","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"BabelDOC: Better Layout-Preserving PDF Translation via Intermediate Representation","primary_cat":"cs.CV","submitted_at":"2026-05-11T16:56:44+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"BabelDOC uses an intermediate representation to decouple layout from content for improved layout-preserving PDF translation.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"you need.Advances in Neural Information Process- ing Systems, 30. Ao Wang, Hui Chen, Lihao Liu, Kai Chen, Zi- jia Lin, Jungong Han, and Guiguang Ding. 2024. Yolov10: Real-time end-to-end object detection. arXiv preprint arXiv:2405.14458. ws051682. 2025. feat(translator): Add Ali Qwen MT translator by ws051682 · Pull Request #585 · Byaidu/PDFMathTranslate. [Online; accessed 18. Mar. 2025]. Wybxc. 2025. feat: onnx support by Wybxc · Pull Request #116 · Byaidu/PDFMathTranslate. [Online; accessed 18. Mar. 2025]. Zhiyuan Zhao, Hengrui Kang, Bin Wang, and Conghui He. 2024. DocLayout-YOLO: Enhancing Document Layout Analysis through Diverse Synthetic Data and Global-to-Local Adaptive Perception.Preprint, arXiv:2410.12628. Jinhua Zhu, Yingce Xia, Lijun Wu, Di He, Tao Qin,"},{"citing_arxiv_id":"2605.10496","ref_index":13,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"M$^2$E-UAV: A Benchmark and Analysis for Onboard Motion-on-Motion Event-Based Tiny UAV Detection","primary_cat":"cs.CV","submitted_at":"2026-05-11T12:54:34+00:00","verdict":"CONDITIONAL","verdict_confidence":"LOW","novelty_score":8.0,"formal_verification":"none","one_line_summary":"M²E-UAV is the first benchmark dataset and evaluation protocol for tiny UAV detection from a moving event camera in motion-on-motion conditions.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.04514","ref_index":35,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Deep Learning-Based Computer Vision for Beam Selection and Proactive Blockage Prediction","primary_cat":"eess.SP","submitted_at":"2026-05-06T05:48:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Vision-aided deep learning delivers 98.96% beam prediction accuracy and over 98% proactive blockage prediction for mm-wave links, including the first treatment of simultaneous non-uniform mobility.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.27383","ref_index":43,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Real-time Scale-robust Network for Glottis Segmentation in Nasal Transnasal Intubation","primary_cat":"eess.IV","submitted_at":"2026-04-30T03:51:25+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A scale-robust lightweight CNN for glottis segmentation achieves 92.9% mDice at over 170 FPS with a 19 MB model size on three datasets.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.22014","ref_index":44,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DM$^3$-Nav: Decentralized Multi-Agent Multimodal Multi-Object Semantic Navigation","primary_cat":"cs.MA","submitted_at":"2026-04-23T19:07:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"DM³-Nav delivers decentralized multi-agent semantic navigation for multimodal open-vocabulary multi-object tasks that matches centralized baselines in simulation and succeeds in real-world robot deployments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.20822","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Global Offshore Wind Infrastructure: Deployment and Operational Dynamics from Dense Sentinel-1 Time Series","primary_cat":"cs.CV","submitted_at":"2026-04-22T17:47:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A new global corpus of dense Sentinel-1 SAR time series for 15,606 offshore wind infrastructure locations is released with baseline semantic labels and an expert benchmark, enabling analyses of deployment dynamics.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.10077","ref_index":40,"ref_count":2,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DocRevive: A Unified Pipeline for Document Text Restoration","primary_cat":"cs.CV","submitted_at":"2026-04-11T07:50:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"A unified pipeline using OCR, inpainting, and diffusion models restores text in degraded documents on a new synthetic benchmark dataset, evaluated with the proposed UCSM metric.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.09996","ref_index":1,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"A Comparative Study of Modern Object Detectors for Robust Apple Detection in Orchard Imagery","primary_cat":"cs.CV","submitted_at":"2026-04-11T02:56:42+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"YOLO11n achieves the highest mAP@0.5:0.95 of 0.6065 for apple localization, with other detectors showing trade-offs in recall and precision at low confidence thresholds.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2512.07078","ref_index":70,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DFIR-DETR: Frequency-Domain Iterative Refinement and Dynamic Feature Aggregation for Small Object Detection","primary_cat":"cs.CV","submitted_at":"2025-12-08T01:25:10+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"DFIR-DETR augments RT-DETR with frequency-domain iterative refinement and dynamic feature aggregation, reporting 92.9% mAP50 on NEU-DET and 51.6% on VisDrone at 11.7M parameters and 47.2 GFLOPs.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2507.18967","ref_index":20,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Underwater Waste Detection Using Deep Learning A Performance Comparison of YOLOv7 to 10 and Faster RCNN","primary_cat":"cs.CV","submitted_at":"2025-07-25T05:36:37+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":3.0,"formal_verification":"none","one_line_summary":"YOLOv8 achieves the highest mAP of 80.9% for detecting 15 classes of underwater waste among the tested models.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2505.15325","ref_index":74,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"SoftHGNN: Soft Hypergraph Neural Networks for General Visual Recognition","primary_cat":"cs.CV","submitted_at":"2025-05-21T10:01:30+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SoftHGNN introduces differentiable soft hyperedges via learnable prototypes and top-k sparse selection to model high-order visual interactions and improve recognition accuracy.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2502.12524","ref_index":53,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"YOLOv12: Attention-Centric Real-Time Object Detectors","primary_cat":"cs.CV","submitted_at":"2025-02-18T04:20:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"YOLOv12 is a new attention-based real-time object detector that reports higher accuracy than YOLOv10, YOLOv11, and RT-DETR variants at comparable or better speed and efficiency.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"periments on standard object detection benchmarks follow- ing YOLOv11 [28] without any additional tricks, demon- strating that YOLOv12 provides significant improvements over previous popular models in terms of latency-accuracy and FLOPs-accuracy trade-offs across these scales, as il- lustrated in Figure 1. For example, YOLOv12-N achieves 40.6% mAP, outperforming YOLOv10-N [53] by 2.1% mAP while maintaining a faster inference speed, and YOLOv11-N [28] by 1.2% mAP with a comparable speed. This advantage remains consistent across other scale mod- els. Compared to RT-DETR-R18 [66] / RT-DETRv2- R18 [40], YOLOv12-S is 1.5%/0.1% mAP better, while reports 42%/42% faster latency speed, requiring only 36%/36% of their computations and 45%/45% of their pa-"},{"citing_arxiv_id":"2501.13400","ref_index":4,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"YOLOv8 to YOLO11: A Comprehensive Architecture In-depth Comparative Review","primary_cat":"cs.CV","submitted_at":"2025-01-23T05:57:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Comparative review of YOLOv8 to YOLO11 architectures based on papers, docs, and code inspection, noting incremental improvements and some unchanged blocks.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2410.17725","ref_index":15,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"YOLOv11: An Overview of the Key Architectural Enhancements","primary_cat":"cs.CV","submitted_at":"2024-10-23T09:55:22+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":1.0,"formal_verification":"none","one_line_summary":"YOLOv11 adds blocks such as C3k2, SPPF, and C2PSA to improve feature extraction, mAP, and efficiency while supporting detection, segmentation, pose, and oriented detection across model sizes.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"YOLOv7 [12] 2022 Object Detection, Object Tracking, Instance Segmentation Transformers, E-ELAN reparame- terisation PyTorch YOLOv8 [13] 2023 Object Detection, Instance Segmen- tation, Panoptic Segmentation, Key- point Estimation GANs, anchor-free detection PyTorch YOLOv9 [14] 2024 Object Detection, Instance Segmen- tation PGI and GELAN PyTorch YOLOv10 [15] 2024 Object Detection Consistent dual assignments for NMS-free training PyTorch This evolution showcases the rapid advancement in object detection technologies, with each version introducing novel features and expanding the range of supported tasks. From the original YOLO's groundbreaking single-stage detection to YOLOv10's NMS-free training, the series has consistently pushed the boundaries of real-time object detection."},{"citing_arxiv_id":"2409.18839","ref_index":31,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"MinerU: An Open-Source Solution for Precise Document Content Extraction","primary_cat":"cs.CV","submitted_at":"2024-09-27T15:35:15+00:00","verdict":"CONDITIONAL","verdict_confidence":"MODERATE","novelty_score":4.0,"formal_verification":"none","one_line_summary":"MinerU delivers an open-source pipeline for high-precision document content extraction by integrating specialized models with tuned preprocessing and postprocessing rules.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2409.06490","ref_index":67,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"UAVDB: Point-Guided Masks for UAV Detection and Segmentation","primary_cat":"cs.CV","submitted_at":"2024-09-09T13:27:53+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"Introduces UAVDB dataset for UAV detection/segmentation via PIC point-to-box conversion and SAM2 masks, with YOLO baselines showing PIC+SAM2 outperforms prior annotation methods on IoU.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null}],"limit":50,"offset":0}