{"total":13,"items":[{"citing_arxiv_id":"2607.00889","ref_index":27,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"DeWorldSG: Depth-Aware 3D Semantic Scene Graph Generation via World-Model Priors","primary_cat":"cs.CV","submitted_at":"2026-07-01T12:55:09+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"DeWorldSG improves 3D scene graph generation from RGB-D sequences by using depth-guided 3D Gaussian object nodes and V-JEPA 2 world-model priors for spatiotemporal relation refinement, reporting large recall gains on 3DSSG and ReplicaSSG.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2607.00734","ref_index":11,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ConRTF: Edge-Constrained Boundary Distribution Refinement for Realtime TransFormer Table Structure Recognition","primary_cat":"cs.CV","submitted_at":"2026-07-01T10:18:12+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"ConRTF adds an edge-constrained fine-grained localization loss to a distribution-based real-time detector to improve boundary accuracy in table structure recognition, claiming up to +1.6 GriTS gains on PubTables-1M while remaining data-efficient.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2607.00124","ref_index":23,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Segmenting, Fast and Slow: Real-Time Open-Vocabulary Video Instance Segmentation with Dual-Path Processing","primary_cat":"cs.CV","submitted_at":"2026-06-30T19:59:19+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"SegFS is a dual-path architecture that uses sparse keyframe open-vocabulary predictions to condition a fast feature-space network for efficient temporal instance segmentation in videos.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.10953","ref_index":3,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Architect-Ant: Editable Automatic Furnishing of Architectural Floor Plans","primary_cat":"cs.AI","submitted_at":"2026-06-09T14:55:43+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"Architect-Ant fine-tunes a vision-language model on the new AntPlan-270 dataset using procedural reasoning traces and preference optimization to output editable DSL furniture layouts that can be rendered into images.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.09367","ref_index":10,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"RT-SDGOD: Real-Time Single-Domain Generalized Object Detection","primary_cat":"cs.CV","submitted_at":"2026-06-08T11:39:18+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"RT-SDGDet applies one-to-many supervision, Discriminative Evidence Diversity Learning, and Dual-view Evidence Consistency Learning during training to reduce missed detections in real-time object detectors under unseen domain shifts.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2606.03748","ref_index":47,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Ultralytics YOLO26: Unified Real-Time End-to-End Vision Models","primary_cat":"cs.CV","submitted_at":"2026-06-02T15:01:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"YOLO26 presents a unified real-time vision model family with dual-head end-to-end design, new training components, and task-specific heads that reports improved mAP-latency tradeoffs on COCO and LVIS benchmarks across detection, segmentation, pose, and oriented detection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.30578","ref_index":42,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"AdvScene: Rethinking Adversarial Patch Evaluation Through Scene Robustness","primary_cat":"cs.CR","submitted_at":"2026-05-28T21:11:35+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"AdvScene is a scene-grounded evaluation method using Adversarial Patch-to-Scene Embedding (APSE) to map the operational envelope of physical adversarial patches in reconstructed real environments.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.25046","ref_index":17,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"TinyFormer: Preserving Tiny Objects in YOLO-DETR Hybrid Real-time Detectors","primary_cat":"cs.CV","submitted_at":"2026-05-24T12:42:13+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":5.0,"formal_verification":"none","one_line_summary":"TinyFormer adds Parallel Bi-fusion Module and Spatial Semantic Adapter to a YOLO-DETR hybrid, raising small-object AP by 1.6 points to 58.5% on MS COCO while keeping real-time speed.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.24831","ref_index":16,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"YOLO26 vs. YOLOv8: A Comprehensive Architectural Benchmark of Next-Generation Real-Time Object Detection Models","primary_cat":"cs.CV","submitted_at":"2026-05-24T02:55:33+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":2.0,"formal_verification":"none","one_line_summary":"Empirical benchmark finds YOLO26 superior on Pascal VOC accuracy and efficiency but YOLOv8 faster on GPU, with both models struggling similarly on VisDrone small-object detection.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2605.03784","ref_index":24,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"ReLeaf: Benchmarking Leaf Segmentation across Domains and Species","primary_cat":"cs.CV","submitted_at":"2026-05-05T14:11:52+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":7.0,"formal_verification":"none","one_line_summary":"A YOLO26 model trained on four leaf segmentation datasets reaches 83.9% mean mAP50-95 on their test sets but only 40.2% on a new 23-species benchmark, revealing substantial cross-domain generalization gaps.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.23442","ref_index":25,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Resource-Constrained UAV-Based Weed Detection for Site-Specific Management on Edge Devices","primary_cat":"cs.CV","submitted_at":"2026-04-25T20:58:51+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":4.0,"formal_verification":"none","one_line_summary":"YOLOv11s and RT-DETRv2-R50-M provide the best accuracy-speed trade-off for real-time weed detection on edge UAV systems, with mAP50 up to 79% and low latency.","context_count":0,"top_context_role":null,"top_context_polarity":null,"context_text":null},{"citing_arxiv_id":"2604.11042","ref_index":18,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization","primary_cat":"cs.CV","submitted_at":"2026-04-13T06:14:20+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.","context_count":1,"top_context_role":"background","top_context_polarity":"background","context_text":"this architecture for real-time applications. In the document domain, these detection models have been supported by large-scale benchmarks such as PubLayNet [6], DocBank [7], and DocLayNet [8]. These works show that modern detectors can perform strongly when trained on large datasets with consistent annotations. More recent document layout models such as VGT [18], DocLayout-YOLO [19], and the Docling Heron models [20] further demonstrate the strength of detector-based layout analysis at scale. However, as models are trained on increasingly heterogeneous document corpora, the assumption of consistent supervision becomes increasingly fragile. 2.2 Cross-Dataset Object Detection Adapting object detectors across datasets is commonly framed as a transfer learning or domain"},{"citing_arxiv_id":"2502.12524","ref_index":41,"ref_count":1,"confidence":0.9,"is_internal_anchor":false,"paper_title":"YOLOv12: Attention-Centric Real-Time Object Detectors","primary_cat":"cs.CV","submitted_at":"2025-02-18T04:20:14+00:00","verdict":"UNVERDICTED","verdict_confidence":"LOW","novelty_score":6.0,"formal_verification":"none","one_line_summary":"YOLOv12 is a new attention-based real-time object detector that reports higher accuracy than YOLOv10, YOLOv11, and RT-DETR variants at comparable or better speed and efficiency.","context_count":1,"top_context_role":"baseline","top_context_polarity":"baseline","context_text":"YOLOv8-N [24] 8.7 3.2 37.4 52.6 40.5 1.77 YOLOv10-N [53] 6.7 2.3 38.5 53.8 41.7 1.84 YOLO11-N [28] 6.5 2.6 39.4 55.3 42.8 1.5 YOLOv12-N (Ours) 6.5 2.6 40.6 56.7 43.8 1.64 YOLOv6-3.0-S [32] 45.3 18.5 44.3 61.2 - 3.42 Gold-YOLO-S [54] 46.0 21.5 45.4 62.5 - 3.82 YOLOv8-S [24] 28.6 11.2 45.0 61.8 48.7 2.33 RT-DETR-R18 [66] 60.0 20.0 46.5 63.8 - 4.58 RT-DETRv2-R18 [41] 60.0 20.0 47.9 64.9 - 4.58 YOLOv9-S [58] 26.4 7.1 46.8 63.4 50.7 - YOLOv10-S [53] 21.6 7.2 46.3 63.0 50.4 2.49 YOLO11-S [28] 21.5 9.4 46.9 63.9 50.6 2.5 YOLOv12-S (Ours) 21.4 9.3 48.0 65.0 51.8 2.61 YOLOv6-3.0-M [32] 85.8 34.9 49.1 66.1 - 5.63 Gold-YOLO-M [54] 87.5 41.3 49.8 67.0 - 6.38 YOLOv8-M [24] 78.9 25.9 50.3 67.2 54.7 5.09 RT-DETR-R34 [66] 100."}],"limit":50,"offset":0}