Mixed citations

End-to-End Object Detection with Transformers

Nicolas Carion, Francisco Massa, Gabriel Synnaeve, Nicolas Usunier, Alexander Kirillov, Sergey Zagoruyko · 2020 · arXiv 2005.12872

Mixed citation behavior. Most common role is background (67%).

16 Pith papers citing it

Background 67% of classified citations

citation-role summary

background: 5 · method: 1

citation-polarity summary

background: 4 · unclear: 1 · use method: 1

representative citing papers

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection

cs.CV · 2026-04-05 · unverdicted · novelty 7.0

SARES-DEIM achieves 76.4% mAP50:95 and 93.8% mAP50 on HRSID by routing SAR features through sparse frequency and wavelet experts plus a high-resolution preservation neck, outperforming prior YOLO and SAR detectors.

Transformers Provably Learn Sparse XOR with Polylogarithmic Parameters

cs.LG · 2025-02-11 · unverdicted · novelty 7.0

Single-layer two-head Transformers learn sparse XOR with O(polylog(d)) parameters in one gradient step, breaking the Omega(d) parameter bottleneck of FFNNs.

Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware

cs.RO · 2023-04-23 · conditional · novelty 7.0

Low-cost imprecise robots achieve 80-90% success on six fine bimanual manipulation tasks using imitation learning with a new Action Chunking with Transformers algorithm trained on only 10 minutes of demonstrations.

AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI

eess.SP · 2026-05-20 · unverdicted · novelty 6.0

AMAR uses a transformer with learnable query embeddings for set-based prediction of concurrent activities from composite Wi-Fi CSI, combined with edge feature extraction and vector quantization for bandwidth-efficient deployment.

Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation

cs.RO · 2026-04-30 · unverdicted · novelty 6.0

Lucid-XR uses XR-headset physics simulation and physics-guided video generation to create synthetic data that trains robot policies transferring zero-shot to unseen real-world manipulation tasks.

Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization

cs.CV · 2026-04-13 · unverdicted · novelty 6.0

VLM-based harmonization of inconsistent annotations across two document layout corpora raises detection F-score from 0.860 to 0.883 and table TEDS from 0.750 to 0.814 while tightening embedding clusters.

LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection

cs.CV · 2026-04-05 · unverdicted · novelty 6.0

LAA-X uses multi-task learning with explicit localized artifact attention and blending synthesis to build a deepfake detector that generalizes to high-quality and unseen manipulations after training only on real and pseudo-fake samples.

DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection

cs.CV · 2026-04-03 · unverdicted · novelty 6.0 · 2 refs

DeCo-DETR builds hierarchical semantic prototypes offline and uses decoupled training streams to deliver competitive zero-shot open-vocabulary detection with improved inference speed.

Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

cs.RO · 2026-02-13 · unverdicted · novelty 6.0

Steerable VLAs trained on rich synthetic commands at subtask, motion, and pixel levels enable VLMs to steer robot behavior more effectively, outperforming prior hierarchical baselines on real-world manipulation and generalization tasks.

Learning to Detect and Segment for Open Vocabulary Object Detection

cs.CV · 2022-12-23 · unverdicted · novelty 6.0

CondHead conditionally parameterizes detection heads on semantic embeddings via aggregated expert and dynamically generated streams to improve generalization for novel categories.

Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model

astro-ph.SR · 2026-04-23 · unverdicted · novelty 5.0

A visual transformer model trained on IRIS inversions predicts chromospheric temperature and density from SDO data with correlations around 0.8 on 80% of test cases.

RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery

cs.CV · 2026-04-21 · unverdicted · novelty 5.0

RareSpot+ boosts small-object detection mAP by 0.13 on aerial wildlife data and cuts annotation needs to 1.7% of tiles via consistency losses and spatial priors.

Learning Class Difficulty in Imbalanced Histopathology Segmentation via Dynamic Focal Attention

eess.IV · 2026-04-15 · unverdicted · novelty 5.0

Dynamic Focal Attention learns class-specific difficulty via per-class biases in attention logits, improving Dice and IoU on imbalanced histopathology segmentation benchmarks.

MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling

cs.CV · 2026-04-13 · unverdicted · novelty 5.0

MapATM improves lane divider AP by 4.6 and mAP by 2.6 on NuScenes by treating actor trajectories as structural priors for road geometry.

SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection

cs.CV · 2026-04-29 · unverdicted · novelty 4.0

A generative pipeline creates realistic synthetic pitting defects and other surface flaws that, when added to real training data, yield modest gains in industrial defect detectors without replacing the need for authentic samples.

A Comparative Study of Modern Object Detectors for Robust Apple Detection in Orchard Imagery

cs.CV · 2026-04-11 · unverdicted · novelty 3.0

YOLO11n achieves the highest mAP@0.5:0.95 of 0.6065 for apple localization, with other detectors showing trade-offs in recall and precision at low confidence thresholds.

citing papers explorer

Showing 16 of 16 citing papers.

SARES-DEIM: Sparse Mixture-of-Experts Meets DETR for Robust SAR Ship Detection · cs.CV · 2026-04-05 · unverdicted · none · ref 9
Transformers Provably Learn Sparse XOR with Polylogarithmic Parameters · cs.LG · 2025-02-11 · unverdicted · none · ref 5
Learning Fine-Grained Bimanual Manipulation with Low-Cost Hardware · cs.RO · 2023-04-23 · conditional · none · ref 9
AMAR: Lightweight Attention-Based Multi-User Activity Recognition from Wi-Fi CSI · eess.SP · 2026-05-20 · unverdicted · none · ref 24
Lucid-XR: An Extended-Reality Data Engine for Robotic Manipulation · cs.RO · 2026-04-30 · unverdicted · none · ref 17
Improving Layout Representation Learning Across Inconsistently Annotated Datasets via Agentic Harmonization · cs.CV · 2026-04-13 · unverdicted · none · ref 13
LAA-X: Unified Localized Artifact Attention for Quality-Agnostic and Generalizable Face Forgery Detection · cs.CV · 2026-04-05 · unverdicted · none · ref 59
DeCo-DETR: Decoupled Cognition DETR for efficient Open-Vocabulary Object Detection · cs.CV · 2026-04-03 · unverdicted · none · ref 3 · 2 links
Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control · cs.RO · 2026-02-13 · unverdicted · none · ref 51
Learning to Detect and Segment for Open Vocabulary Object Detection · cs.CV · 2022-12-23 · unverdicted · none · ref 3
Predicting the thermodynamics in the chromosphere from the translation of SDO data into the IRIS$^{2}$ inversion results using a visual transformer model · astro-ph.SR · 2026-04-23 · unverdicted · none · ref 23
RareSpot+: A Benchmark, Model, and Active Learning Framework for Small and Rare Wildlife in Aerial Imagery · cs.CV · 2026-04-21 · unverdicted · none · ref 7
Learning Class Difficulty in Imbalanced Histopathology Segmentation via Dynamic Focal Attention · eess.IV · 2026-04-15 · unverdicted · none · ref 4
MapATM: Enhancing HD Map Construction through Actor Trajectory Modeling · cs.CV · 2026-04-13 · unverdicted · none · ref 2
SynSur: An end-to-end generative pipeline for synthetic industrial surface defect generation and detection · cs.CV · 2026-04-29 · unverdicted · none · ref 8
A Comparative Study of Modern Object Detectors for Robust Apple Detection in Orchard Imagery · cs.CV · 2026-04-11 · unverdicted · none · ref 26
