AdvScene is a scene-grounded evaluation method using Adversarial Patch-to-Scene Embedding (APSE) to map the operational envelope of physical adversarial patches in reconstructed real environments.
RT-DETRv2: Improved Baseline with Bag-of-Freebies for Real-Time Detection Transformer
13 Pith papers cite this work.
Representative citing papers:
A YOLO26 model trained on four leaf segmentation datasets reaches 83.9% mean mAP50-95 on their test sets but only 40.2% on a new 23-species benchmark, revealing substantial cross-domain generalization gaps.
SegFS is a dual-path architecture that uses sparse keyframe open-vocabulary predictions to condition a fast feature-space network for efficient temporal instance segmentation in videos.
Architect-Ant fine-tunes a vision-language model on the new AntPlan-270 dataset using procedural reasoning traces and preference optimization to output editable DSL furniture layouts that can be rendered into images.
VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.
YOLOv12 (Attention-Centric Real-Time Object Detectors) is a new attention-based real-time object detector that reports higher accuracy than YOLOv10, YOLOv11, and RT-DETR variants at comparable or better speed and efficiency.
DeWorldSG improves 3D scene graph generation from RGB-D sequences by using depth-guided 3D Gaussian object nodes and V-JEPA 2 world-model priors for spatiotemporal relation refinement, reporting large recall gains on 3DSSG and ReplicaSSG.
TinyFormer adds Parallel Bi-fusion Module and Spatial Semantic Adapter to a YOLO-DETR hybrid, raising small-object AP by 1.6 points to 58.5% on MS COCO while keeping real-time speed.
ConRTF adds an edge-constrained fine-grained localization loss to a distribution-based real-time detector to improve boundary accuracy in table structure recognition, claiming up to +1.6 GriTS gains on PubTables-1M while remaining data-efficient.
RT-SDGDet applies one-to-many supervision, Discriminative Evidence Diversity Learning, and Dual-view Evidence Consistency Learning during training to reduce missed detections in real-time object detectors under unseen domain shifts.
YOLO26 presents a unified real-time vision model family with dual-head end-to-end design, new training components, and task-specific heads that reports improved mAP-latency tradeoffs on COCO and LVIS benchmarks across detection, segmentation, pose, and oriented detection.
YOLOv11s and RT-DETRv2-R50-M provide the best accuracy-speed trade-off for real-time weed detection on edge UAV systems, with mAP50 up to 79% and low latency.
Empirical benchmark finds YOLO26 superior on Pascal VOC accuracy and efficiency but YOLOv8 faster on GPU, with both models struggling similarly on VisDrone small-object detection.