RT-SDGOD: Real-Time Single-Domain Generalized Object Detection

Fangzhuo Gao; Liang Wan; Ruize Han; Wei Feng; Yupeng Zhang

arxiv: 2606.09367 · v1 · pith:TRNXYEZJnew · submitted 2026-06-08 · 💻 cs.CV

RT-SDGOD: Real-Time Single-Domain Generalized Object Detection

Yupeng Zhang , Fangzhuo Gao , Ruize Han , Wei Feng , Liang Wan This is my paper

Pith reviewed 2026-06-27 16:46 UTC · model grok-4.3

classification 💻 cs.CV

keywords real-time object detectionsingle-domain generalizationdomain shiftDETRmulti-query supervisionmissed detectionstraining-time learningcross-domain robustness

0 comments

The pith

Real-time detectors generalize across domains by training multiple queries to cover stable object evidence without any added inference cost.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Real-time object detectors lose accuracy when weather or camera changes create new domains, mainly because they miss more objects due to weak and inconsistent evidence about what an object looks like. The paper shows that this can be fixed during training alone by making several queries for the same object work together to gather more diverse yet consistent evidence. One-to-many supervision first creates stable groups of queries per object, then two new losses expand the range of evidence and keep it steady under appearance changes. Because nothing changes at inference time, the detector stays fast while performing better on unseen domains. This matters for applications that must run in real time across varying conditions without retraining or extra hardware.

Core claim

The paper establishes that DETR-based real-time detectors degrade under domain shift primarily via increased missed detections caused by limited and unstable object-level discriminative evidence. RT-SDGDet addresses this with a multi-evidence collaborative modeling framework: one-to-many supervision builds stable object-specific query groups, Discriminative Evidence Diversity Learning expands coverage of object evidence, and Dual-view Evidence Consistency Learning improves stability across views. All additions occur only at training time, so inference overhead remains zero while generalization improves on multiple unseen target domains.

What carries the argument

Multi-evidence collaborative modeling framework that uses one-to-many supervision to form stable object-specific query groups, plus DEDL for evidence diversity and DvECL for evidence consistency under perturbations.

If this is right

Real-time detectors achieve higher accuracy on unseen domains while preserving original inference speed.
Missed detections fall because object evidence coverage widens and remains consistent across appearance changes.
The method applies to existing DETR-based real-time architectures without architecture changes at deployment.
Generalization gains hold across multiple distinct target domains without any target data used in training.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same training-only evidence stabilization might transfer to other real-time vision tasks that suffer domain shift, such as segmentation or tracking.
If the query-grouping idea scales, it could reduce reliance on expensive domain adaptation pipelines that require target-domain samples.
Designers of new real-time detectors could adopt these training losses as a default robustness layer when target conditions are unknown.

Load-bearing premise

Domain shift mainly increases missed detections because object evidence is limited and unstable, and training-time multi-query collaboration can expand and stabilize that evidence without harming real-time speed.

What would settle it

Run the detector on shifted domains and count missed detections with and without the training components; if missed detections do not drop or if measured inference latency rises, the claim is false.

Figures

Figures reproduced from arXiv: 2606.09367 by Fangzhuo Gao, Liang Wan, Ruize Han, Wei Feng, Yupeng Zhang.

**Figure 1.** Figure 1: Error pattern analysis of DETR-based detectors under SDGOD. Taking predictions on daytime-clear source images as the reference, we progressively increase environmental perturbations to simulate increasingly severe domain shifts and analyze the resulting error trends. (a) Class Confusion Rate: the proportion of detected objects with incorrect categories; (b) False Positive Rate: the proportion of detectio… view at source ↗

**Figure 2.** Figure 2: Evidence concentration analysis of DETR-based real-time detectors. For a high-confidence prediction, we first visualize the detector response with Grad-CAM and further examine its sensitivity to different regions through local blocking and local blurring. Specifically, CAM-Block and CAM-Blur apply blocking and blurring to the Grad-CAM-highlighted high-response region, while Random-Block and RandomBlur a… view at source ↗

**Figure 3.** Figure 3: Overview of RT-SDGDet. Built on RF-DETR with O2M supervision, RTSDGDet introduces Discriminative Evidence Diversity Learning (DEDL) and Dualview Evidence Consistency Learning (DvECL) during training, without changing the inference pipeline. Specifically, at decoder layer l, we compute a query–target matching quality matrix T(l) ∈ R N×Q following prior O2M matching, where N and Q are the numbers of GT obj… view at source ↗

**Figure 4.** Figure 4: Qualitative comparison under different domain conditions. Comparison of detection results between RF-DETR, MAD and our method. method remains more stable across target domains, preserving more true detections and more complete localization, especially for distant vehicles, boundary objects, and targets degraded by fog, rain, and nighttime illumination. This is consistent with the quantitative results and… view at source ↗

**Figure 5.** Figure 5: Visualization of discriminative evidence regions. Compared with RF-DETR, our method yields broader and more complete responses on target objects, indicating that RT-SDGDet learns more sufficient discriminative evidence rather than relying on a few localized cues. To analyze the effect of our method on discriminative evidence modeling, we visualize the evidence response regions of RF-DETR and RT-SDGDet. … view at source ↗

read the original abstract

In real-world deployment under strict real-time constraints, weather and imaging variations induce significant distribution shifts, severely degrading detectors. Single-Domain Generalized Object Detection aims to mitigate this issue, yet existing methods rarely investigate-at the level of problem formulation-the generalization capability of real-time detectors under such constrained inference budgets. To this end, we introduce Real-Time Single-Domain Generalized Object Detection (RT-SDGOD), which focuses on how real-time detectors can achieve cross-domain generalization under zero extra inference overhead by relying solely on training-time representation learning. We observe that, under domain shift, DETR-based real-time detectors mainly degrade through increased missed detections, rooted in limited and unstable object-level discriminative evidence. Based on this, we propose RT-SDGDet, a multi-evidence collaborative modeling framework for RT-SDGOD. The core idea is to enable multiple queries of the same object to collaboratively cover more sufficient discriminative evidence while maintaining the stability of such evidence modeling across views. Specifically, we use one-to-many (O2M) supervision to construct stable object-specific query groups, and further design Discriminative Evidence Diversity Learning (DEDL) and Dual-view Evidence Consistency Learning (DvECL) to expand object-level evidence coverage and improve evidence stability under appearance perturbations, respectively. Since all components are introduced only during training, our method incurs no extra inference overhead. Extensive experiments show that the proposed method achieves better generalization performance than existing approaches across multiple unseen target domains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper adds training-only modules to real-time DETR detectors for domain generalization with no inference overhead, but the abstract supplies no numbers or error analysis to back the mechanism.

read the letter

The main point is that this work sets up real-time single-domain generalized object detection as a distinct setting and proposes RT-SDGDet with one-to-many supervision plus DEDL and DvECL losses to give object queries more stable and diverse evidence during training only.

The framing around zero extra inference cost is useful for applications that cannot change runtime budgets. The observation that degradation shows up mainly as missed detections rather than other error types is a clear starting assumption, and the collaborative query groups follow logically from it. The method description stays consistent with existing DETR real-time lines of work.

The soft spots are straightforward. The abstract states improved generalization across unseen domains but gives no mAP numbers, no baselines, no error bars, and no breakdown of false negatives versus precision or localization errors on target domains. The stress-test concern holds: without ablations that separate DEDL and DvECL from the O2M supervision itself, or any metric tracking evidence coverage per object, it is not possible to confirm that the modules act through the claimed causal path rather than as generic regularizers. The full paper would need to supply those details for the central claim to land.

Citation patterns look standard for the subfield. No load-bearing contradictions appear in the method outline.

This is for researchers working on practical robust detection under fixed latency, such as driving or monitoring systems. A reader who wants concrete training tricks to try on existing real-time models could extract value from the module descriptions.

It deserves peer review because the problem is practical and the approach is described without obvious internal gaps, even if the current text leaves the empirical case open.

Referee Report

3 major / 2 minor

Summary. The paper introduces the RT-SDGOD setting for real-time object detectors under domain shift and proposes RT-SDGDet, a training-only framework that uses one-to-many (O2M) supervision to form stable query groups, plus Discriminative Evidence Diversity Learning (DEDL) and Dual-view Evidence Consistency Learning (DvECL) to expand and stabilize object-level evidence. It claims that DETR-based real-time detectors degrade primarily via increased missed detections under shift, that the proposed components directly mitigate this without side effects, and that the resulting model outperforms prior approaches on multiple unseen target domains while incurring zero extra inference cost.

Significance. If the mechanistic claims and quantitative gains are substantiated, the work would be significant for practical deployment of real-time detectors in variable conditions, as it targets the inference-budget constraint explicitly and avoids test-time adaptation overhead. The training-only design and focus on evidence coverage are potentially useful contributions if supported by targeted ablations and error breakdowns.

major comments (3)

[Abstract, §1] Abstract and §1: The central observation that degradation occurs 'mainly' through increased missed detections (rather than precision loss or localization error) is stated without any supporting quantitative breakdown of error types (FN vs. FP vs. box regression) on target domains; this is load-bearing for the subsequent design of O2M+DEDL+DvECL.
[§3, §4] §3 and §4: No ablation isolates the contribution of DEDL and DvECL from the O2M supervision itself, nor measures 'evidence coverage' or 'stability' directly (e.g., via query-feature variance or positive-match cardinality); without these, it is unclear whether the reported gains stem from the proposed mechanisms or from generic regularization or training schedule changes.
[§5] §5 (experiments): The abstract asserts 'better generalization performance than existing approaches' but supplies no numerical results, baselines, mAP deltas, or error bars; the full experimental section must provide these with statistical detail to substantiate the claim that the method secures the stated generalization benefit.

minor comments (2)

[Abstract] Abstract contains a typographical issue: 'investigate-at the level' should be 'investigate at the level'.
[§3] Notation for the proposed losses (DEDL, DvECL) and their weighting hyperparameters should be introduced with explicit equations in §3 before being referenced in the experimental protocol.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to strengthen the supporting evidence, ablations, and experimental reporting.

read point-by-point responses

Referee: [Abstract, §1] Abstract and §1: The central observation that degradation occurs 'mainly' through increased missed detections (rather than precision loss or localization error) is stated without any supporting quantitative breakdown of error types (FN vs. FP vs. box regression) on target domains; this is load-bearing for the subsequent design of O2M+DEDL+DvECL.

Authors: We agree that a quantitative breakdown of error types is necessary to support the central claim. In the revised manuscript we will add tables and analysis breaking down false negatives, false positives, and localization errors on the unseen target domains, confirming that missed detections are the dominant source of degradation. revision: yes
Referee: [§3, §4] §3 and §4: No ablation isolates the contribution of DEDL and DvECL from the O2M supervision itself, nor measures 'evidence coverage' or 'stability' directly (e.g., via query-feature variance or positive-match cardinality); without these, it is unclear whether the reported gains stem from the proposed mechanisms or from generic regularization or training schedule changes.

Authors: We acknowledge the need for component-wise isolation and direct mechanism measurements. The revised version will include new ablations that evaluate O2M alone, then add DEDL, then add DvECL, plus quantitative metrics for evidence coverage (e.g., positive-match cardinality) and stability (e.g., query-feature variance across views) to demonstrate that gains arise from the proposed mechanisms rather than generic effects. revision: yes
Referee: [§5] §5 (experiments): The abstract asserts 'better generalization performance than existing approaches' but supplies no numerical results, baselines, mAP deltas, or error bars; the full experimental section must provide these with statistical detail to substantiate the claim that the method secures the stated generalization benefit.

Authors: The full experimental section already reports mAP comparisons against baselines across multiple unseen domains. To fully address the request we will augment the section with explicit mAP deltas, error bars from multiple random seeds, and statistical significance tests in the revised manuscript. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation chain is self-contained

full rationale

The paper states an empirical observation about degradation modes under domain shift and introduces new training-only components (O2M supervision, DEDL, DvECL) to address it. No equations, fitted parameters, or predictions are described that reduce by construction to the inputs or target metrics. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The central claim rests on the proposed training additions and experimental results rather than any definitional or statistical equivalence to prior fitted quantities.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or explicit assumptions; ledger left empty due to insufficient detail.

pith-pipeline@v0.9.1-grok · 5800 in / 1006 out tokens · 17082 ms · 2026-06-27T16:46:57.244691+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 10 canonical work pages · 1 internal anchor

[1]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Vidit, V., Engilberge, M., Salzmann, M.: Clip the gap: A single domain generaliza- tion approach for object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 3219–3229 (2023) 2, 4

2023
[2]

arXiv preprint arXiv:2504.19574 (2025) 2, 4

Hwang, S., Han, D., Jeon, M.: Dg-detr: Toward domain generalized detection trans- former. arXiv preprint arXiv:2504.19574 (2025) 2, 4

work page arXiv 2025
[3]

In: Proceed- ingsoftheAAAIConferenceonArtificialIntelligence.vol.38,pp.5958–5966(2024) 2, 4

Wu, F., Gao, J., Hong, L., Wang, X., Zhou, C., Ye, N.: G-nas: Generalizable neural architecture search for single domain generalization object detection. In: Proceed- ingsoftheAAAIConferenceonArtificialIntelligence.vol.38,pp.5958–5966(2024) 2, 4

2024
[4]

arXiv preprint arXiv:2504.20498 (2025) 2, 4

Han, J., Wang, Y., Chen, L.: Style-adaptive detection transformer for single-source domain generalized object detection. arXiv preprint arXiv:2504.20498 (2025) 2, 4

work page arXiv 2025
[5]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 779–788 (2016) 4

2016
[6]

YOLOv12: Attention-Centric Real-Time Object Detectors

Tian, Y., Ye, Q., Doermann, D.: Yolov12: Attention-centric real-time object detec- tors. arXiv preprint arXiv:2502.12524 (2025) 4, 10

work page internal anchor Pith review Pith/arXiv arXiv 2025
[7]

arXiv preprint arXiv:2506.17733 (2025) 4, 10

Lei, M., Li, S., Wu, Y., Hu, H., Zhou, Y., Zheng, X., Ding, G., Du, S., Wu, Z., Gao, Y.: Yolov13: Real-time object detection with hypergraph-enhanced adaptive visual perception. arXiv preprint arXiv:2506.17733 (2025) 4, 10

work page arXiv 2025
[8]

In: European conference on computer vision

Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End- to-end object detection with transformers. In: European conference on computer vision. pp. 213–229. Springer (2020) 4, 6

2020
[9]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., Chen, J.: Detrs beat yolos on real-time object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16965–16974 (2024) 4

2024
[10]

arXiv:2407.17140 [cs.CV] https://arxiv.org/abs/2407.17140 Paul C

Lv, W., Zhao, Y., Chang, Q., Huang, K., Wang, G., Liu, Y.: Rt-detrv2: Improved baseline with bag-of-freebies for real-time detection transformer. arXiv preprint arXiv:2407.17140 (2024) 4

work page arXiv 2024
[11]

In: 2025 IEEE/CVF Winter Confer- ence on Applications of Computer Vision (WACV)

Wang, S., Xia, C., Lv, F., Shi, Y.: Rt-detrv3: Real-time end-to-end object detection with hierarchical dense positive supervision. In: 2025 IEEE/CVF Winter Confer- ence on Applications of Computer Vision (WACV). pp. 1628–1636. IEEE (2025) 4, 10

2025
[12]

arXiv preprint arXiv:2510.25257 (2025) 4, 10 16 Yupeng Zhang, Fangzhuo Gao, Ruize Han, Wei Feng, and Liang Wan

Liao, Z., Zhao, Y., Shan, X., Yan, Y., Liu, C., Lu, L., Ji, X., Chen, J.: Rt-detrv4: Painlessly furthering real-time object detection with vision foundation models. arXiv preprint arXiv:2510.25257 (2025) 4, 10 16 Yupeng Zhang, Fangzhuo Gao, Ruize Han, Wei Feng, and Liang Wan

work page arXiv 2025
[13]

arXiv preprint arXiv:2406.03459 (2024) 4, 10

Chen, Q., Su, X., Zhang, X., Wang, J., Chen, J., Shen, Y., Han, C., Chen, Z., Xu, W., Li, F., et al.: Lw-detr: A transformer replacement to yolo for real-time detection. arXiv preprint arXiv:2406.03459 (2024) 4, 10

work page arXiv 2024
[14]

arXiv preprint arXiv:2410.13842 (2024) 4, 10

Peng, Y., Li, H., Wu, P., Zhang, Y., Sun, X., Wu, F.: D-fine: Redefine re- gression task in detrs as fine-grained distribution refinement. arXiv preprint arXiv:2410.13842 (2024) 4, 10

work page arXiv 2024
[15]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Huang, S., Lu, Z., Cun, X., Yu, Y., Zhou, X., Shen, X.: Deim: Detr with improved matching for fast convergence. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 15162–15171 (2025) 4, 10

2025
[16]

arXiv preprint arXiv:2509.20787 (2025) 4, 10

Huang, S., Hou, Y., Liu, L., Yu, X., Shen, X.: Real-time object detection meets dinov3. arXiv preprint arXiv:2509.20787 (2025) 4, 10

work page arXiv 2025
[17]

arXiv preprint arXiv:2511.09554 (2025) 4, 6, 10

Robinson, I., Robicheaux, P., Popov, M., Ramanan, D., Peri, N.: Rf-detr: Neural architecture search for real-time detection transformers. arXiv preprint arXiv:2511.09554 (2025) 4, 6, 10

work page arXiv 2025
[18]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Danish, M.S., Khan, M.H., Munir, M.A., Sarfraz, M.S., Ali, M.: Improving sin- gle domain-generalized object detection: A focus on diversification and alignment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17732–17742 (2024) 4

2024
[19]

IEEE Transactions on Neural Networks and Learning Systems (2024) 4

Rao, Z., Guo, J., Tang, L., Huang, Y., Ding, X., Guo, S.: Srcd: Semantic reason- ing with compound domains for single-domain generalized object detection. IEEE Transactions on Neural Networks and Learning Systems (2024) 4

2024
[20]

In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition

Wu, A., Deng, C.: Single-domain generalized object detection in urban scene via cyclic-disentangled self-distillation. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition. pp. 847–856 (2022) 4, 9

2022
[21]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Liu, Y., Zhou, S., Liu, X., Hao, C., Fan, B., Tian, J.: Unbiased faster r-cnn for single-source domain generalized object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 28838– 28847 (2024) 4

2024
[22]

xiao et al

Xiao, A., Yu, W., Yu, H.: Sample-aware randaugment: Search-free automatic data augmentation for effective image recognition: A. xiao et al. International Journal of Computer Vision133(11), 7710–7725 (2025) 5, 10

2025
[23]

Cheng, S., Gokhale, T., Yang, Y.: Adversarial bayesian augmentation for single- sourcedomaingeneralization.In:ProceedingsoftheIEEE/CVFInternationalCon- ference on Computer Vision. pp. 11400–11410 (2023) 5, 10

2023
[24]

Xu, M., Qin, L., Chen, W., Pu, S., Zhang, L.: Multi-view adversarial discriminator: Minethenon-causalfactorsforobjectdetectioninunseendomains.In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8103–8112 (2023) 5, 8, 10

2023
[25]

In: The Eleventh International Conference on Learning Representations (ICLR 2023)

Fan, Q., Segu, M., Tai, Y.W., Yu, F., Tang, C.K., Schiele, B., Dai, D.: Towards robust object detection invariant to real-world domain shifts. In: The Eleventh International Conference on Learning Representations (ICLR 2023). OpenReview (2023) 5, 10

2023
[26]

In: proceedings of the AAAI conference on artificial intelligence

Lee, W., Hong, D., Lim, H., Myung, H.: Object-aware domain generalization for object detection. In: proceedings of the AAAI conference on artificial intelligence. vol. 38, pp. 2947–2955 (2024) 5, 10

2024
[27]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Xu,X.,Yang,J.,Shi,W.,Ding,S.,Luo,L.,Liu,J.:Physaug:Aphysical-guidedand frequency-based data augmentation for single-domain generalized object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 21815–21823 (2025) 5, 10 Abbreviated paper title 17

2025
[28]

In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition

Li, F., Zhang, H., Liu, S., Guo, J., Ni, L.M., Zhang, L.: Dn-detr: Accelerate detr training by introducing query denoising. In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition. pp. 13619–13627 (2022) 6

2022
[29]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 19702–19712 (2023) 6

2023
[30]

In: Proceedings of the IEEE/CVF international conference on com- puter vision

Chen, Q., Chen, X., Wang, J., Zhang, S., Yao, K., Feng, H., Han, J., Ding, E., Zeng, G., Wang, J.: Group detr: Fast detr training with group-wise one-to-many assignment. In: Proceedings of the IEEE/CVF international conference on com- puter vision. pp. 6633–6642 (2023) 6

2023
[31]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhao, C., Sun, Y., Wang, W., Chen, Q., Ding, E., Yang, Y., Wang, J.: Ms-detr: Efficient detr training with mixed supervision. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17027–17036 (2024) 6

2024

[1] [1]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Vidit, V., Engilberge, M., Salzmann, M.: Clip the gap: A single domain generaliza- tion approach for object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 3219–3229 (2023) 2, 4

2023

[2] [2]

arXiv preprint arXiv:2504.19574 (2025) 2, 4

Hwang, S., Han, D., Jeon, M.: Dg-detr: Toward domain generalized detection trans- former. arXiv preprint arXiv:2504.19574 (2025) 2, 4

work page arXiv 2025

[3] [3]

In: Proceed- ingsoftheAAAIConferenceonArtificialIntelligence.vol.38,pp.5958–5966(2024) 2, 4

Wu, F., Gao, J., Hong, L., Wang, X., Zhou, C., Ye, N.: G-nas: Generalizable neural architecture search for single domain generalization object detection. In: Proceed- ingsoftheAAAIConferenceonArtificialIntelligence.vol.38,pp.5958–5966(2024) 2, 4

2024

[4] [4]

arXiv preprint arXiv:2504.20498 (2025) 2, 4

Han, J., Wang, Y., Chen, L.: Style-adaptive detection transformer for single-source domain generalized object detection. arXiv preprint arXiv:2504.20498 (2025) 2, 4

work page arXiv 2025

[5] [5]

In: Proceedings of the IEEE conference on computer vision and pattern recognition

Redmon, J., Divvala, S., Girshick, R., Farhadi, A.: You only look once: Unified, real-time object detection. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 779–788 (2016) 4

2016

[6] [6]

YOLOv12: Attention-Centric Real-Time Object Detectors

Tian, Y., Ye, Q., Doermann, D.: Yolov12: Attention-centric real-time object detec- tors. arXiv preprint arXiv:2502.12524 (2025) 4, 10

work page internal anchor Pith review Pith/arXiv arXiv 2025

[7] [7]

arXiv preprint arXiv:2506.17733 (2025) 4, 10

Lei, M., Li, S., Wu, Y., Hu, H., Zhou, Y., Zheng, X., Ding, G., Du, S., Wu, Z., Gao, Y.: Yolov13: Real-time object detection with hypergraph-enhanced adaptive visual perception. arXiv preprint arXiv:2506.17733 (2025) 4, 10

work page arXiv 2025

[8] [8]

In: European conference on computer vision

Carion, N., Massa, F., Synnaeve, G., Usunier, N., Kirillov, A., Zagoruyko, S.: End- to-end object detection with transformers. In: European conference on computer vision. pp. 213–229. Springer (2020) 4, 6

2020

[9] [9]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhao, Y., Lv, W., Xu, S., Wei, J., Wang, G., Dang, Q., Liu, Y., Chen, J.: Detrs beat yolos on real-time object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 16965–16974 (2024) 4

2024

[10] [10]

arXiv:2407.17140 [cs.CV] https://arxiv.org/abs/2407.17140 Paul C

Lv, W., Zhao, Y., Chang, Q., Huang, K., Wang, G., Liu, Y.: Rt-detrv2: Improved baseline with bag-of-freebies for real-time detection transformer. arXiv preprint arXiv:2407.17140 (2024) 4

work page arXiv 2024

[11] [11]

In: 2025 IEEE/CVF Winter Confer- ence on Applications of Computer Vision (WACV)

Wang, S., Xia, C., Lv, F., Shi, Y.: Rt-detrv3: Real-time end-to-end object detection with hierarchical dense positive supervision. In: 2025 IEEE/CVF Winter Confer- ence on Applications of Computer Vision (WACV). pp. 1628–1636. IEEE (2025) 4, 10

2025

[12] [12]

arXiv preprint arXiv:2510.25257 (2025) 4, 10 16 Yupeng Zhang, Fangzhuo Gao, Ruize Han, Wei Feng, and Liang Wan

Liao, Z., Zhao, Y., Shan, X., Yan, Y., Liu, C., Lu, L., Ji, X., Chen, J.: Rt-detrv4: Painlessly furthering real-time object detection with vision foundation models. arXiv preprint arXiv:2510.25257 (2025) 4, 10 16 Yupeng Zhang, Fangzhuo Gao, Ruize Han, Wei Feng, and Liang Wan

work page arXiv 2025

[13] [13]

arXiv preprint arXiv:2406.03459 (2024) 4, 10

Chen, Q., Su, X., Zhang, X., Wang, J., Chen, J., Shen, Y., Han, C., Chen, Z., Xu, W., Li, F., et al.: Lw-detr: A transformer replacement to yolo for real-time detection. arXiv preprint arXiv:2406.03459 (2024) 4, 10

work page arXiv 2024

[14] [14]

arXiv preprint arXiv:2410.13842 (2024) 4, 10

Peng, Y., Li, H., Wu, P., Zhang, Y., Sun, X., Wu, F.: D-fine: Redefine re- gression task in detrs as fine-grained distribution refinement. arXiv preprint arXiv:2410.13842 (2024) 4, 10

work page arXiv 2024

[15] [15]

In: Proceedings of the Computer Vision and Pattern Recognition Conference

Huang, S., Lu, Z., Cun, X., Yu, Y., Zhou, X., Shen, X.: Deim: Detr with improved matching for fast convergence. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 15162–15171 (2025) 4, 10

2025

[16] [16]

arXiv preprint arXiv:2509.20787 (2025) 4, 10

Huang, S., Hou, Y., Liu, L., Yu, X., Shen, X.: Real-time object detection meets dinov3. arXiv preprint arXiv:2509.20787 (2025) 4, 10

work page arXiv 2025

[17] [17]

arXiv preprint arXiv:2511.09554 (2025) 4, 6, 10

Robinson, I., Robicheaux, P., Popov, M., Ramanan, D., Peri, N.: Rf-detr: Neural architecture search for real-time detection transformers. arXiv preprint arXiv:2511.09554 (2025) 4, 6, 10

work page arXiv 2025

[18] [18]

In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition

Danish, M.S., Khan, M.H., Munir, M.A., Sarfraz, M.S., Ali, M.: Improving sin- gle domain-generalized object detection: A focus on diversification and alignment. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 17732–17742 (2024) 4

2024

[19] [19]

IEEE Transactions on Neural Networks and Learning Systems (2024) 4

Rao, Z., Guo, J., Tang, L., Huang, Y., Ding, X., Guo, S.: Srcd: Semantic reason- ing with compound domains for single-domain generalized object detection. IEEE Transactions on Neural Networks and Learning Systems (2024) 4

2024

[20] [20]

In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition

Wu, A., Deng, C.: Single-domain generalized object detection in urban scene via cyclic-disentangled self-distillation. In: Proceedings of the IEEE/CVF Conference on computer vision and pattern recognition. pp. 847–856 (2022) 4, 9

2022

[21] [21]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Liu, Y., Zhou, S., Liu, X., Hao, C., Fan, B., Tian, J.: Unbiased faster r-cnn for single-source domain generalized object detection. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 28838– 28847 (2024) 4

2024

[22] [22]

xiao et al

Xiao, A., Yu, W., Yu, H.: Sample-aware randaugment: Search-free automatic data augmentation for effective image recognition: A. xiao et al. International Journal of Computer Vision133(11), 7710–7725 (2025) 5, 10

2025

[23] [23]

Cheng, S., Gokhale, T., Yang, Y.: Adversarial bayesian augmentation for single- sourcedomaingeneralization.In:ProceedingsoftheIEEE/CVFInternationalCon- ference on Computer Vision. pp. 11400–11410 (2023) 5, 10

2023

[24] [24]

Xu, M., Qin, L., Chen, W., Pu, S., Zhang, L.: Multi-view adversarial discriminator: Minethenon-causalfactorsforobjectdetectioninunseendomains.In:Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 8103–8112 (2023) 5, 8, 10

2023

[25] [25]

In: The Eleventh International Conference on Learning Representations (ICLR 2023)

Fan, Q., Segu, M., Tai, Y.W., Yu, F., Tang, C.K., Schiele, B., Dai, D.: Towards robust object detection invariant to real-world domain shifts. In: The Eleventh International Conference on Learning Representations (ICLR 2023). OpenReview (2023) 5, 10

2023

[26] [26]

In: proceedings of the AAAI conference on artificial intelligence

Lee, W., Hong, D., Lim, H., Myung, H.: Object-aware domain generalization for object detection. In: proceedings of the AAAI conference on artificial intelligence. vol. 38, pp. 2947–2955 (2024) 5, 10

2024

[27] [27]

In: Proceedings of the AAAI Conference on Artificial Intelligence

Xu,X.,Yang,J.,Shi,W.,Ding,S.,Luo,L.,Liu,J.:Physaug:Aphysical-guidedand frequency-based data augmentation for single-domain generalized object detection. In: Proceedings of the AAAI Conference on Artificial Intelligence. vol. 39, pp. 21815–21823 (2025) 5, 10 Abbreviated paper title 17

2025

[28] [28]

In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition

Li, F., Zhang, H., Liu, S., Guo, J., Ni, L.M., Zhang, L.: Dn-detr: Accelerate detr training by introducing query denoising. In: Proceedings of the IEEE/CVF con- ference on computer vision and pattern recognition. pp. 13619–13627 (2022) 6

2022

[29] [29]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Jia, D., Yuan, Y., He, H., Wu, X., Yu, H., Lin, W., Sun, L., Zhang, C., Hu, H.: Detrs with hybrid matching. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 19702–19712 (2023) 6

2023

[30] [30]

In: Proceedings of the IEEE/CVF international conference on com- puter vision

Chen, Q., Chen, X., Wang, J., Zhang, S., Yao, K., Feng, H., Han, J., Ding, E., Zeng, G., Wang, J.: Group detr: Fast detr training with group-wise one-to-many assignment. In: Proceedings of the IEEE/CVF international conference on com- puter vision. pp. 6633–6642 (2023) 6

2023

[31] [31]

In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition

Zhao, C., Sun, Y., Wang, W., Chen, Q., Ding, E., Yang, Y., Wang, J.: Ms-detr: Efficient detr training with mixed supervision. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 17027–17036 (2024) 6

2024