VFM⁴SDG: Unveiling the Power of VFMs for Single-Domain Generalized Object Detection

Liang Wan; Ningnan Guo; Ruize Han; Song Wang; Wei Feng; Yupeng Zhang

arXiv: 2604.21502 · v2 · pith:S2FQKJNZ · submitted 2026-04-23 · 💻 cs.CV



Pith reviewed 2026-05-25 06:01 UTC · model grok-4.3

classification 💻 cs.CV

keywords single-domain generalized object detection · vision foundation models · DETR detectors · domain shift · relational prior distillation · query enhancement · missed detections · object detection


The pith

Vision foundation models preserve stable relational structures that compensate for missed detections in DETR detectors under domain shifts.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that single-domain generalized object detection fails mainly because domain shifts break encoder relations between objects and background and weaken decoder query binding to objects. It shows that vision foundation models maintain these relational structures and object responses across shifts, allowing them to serve as fixed priors. The method distills stable relations into the encoder and injects semantic prototypes plus global context into queries to restore stability. A reader would care because this turns an existing large model into a source of cross-domain robustness without new labeled data or per-domain retraining.

Core claim

Performance degradation under domain shift is dominated by increasing missed detections that arise from disrupted encoder-side object-background and inter-instance relations plus weakened semantic-spatial binding between decoder queries and objects; vision foundation models preserve stable relational structures and object responses under severe shifts and therefore supply usable cross-domain stability priors when their encoder relations are distilled and their category semantics are injected into queries.

What carries the argument

Dual-prior learning framework that performs Cross-domain Stable Relational Prior Distillation from a frozen VFM into the detector encoder and Semantic-Contextual Prior-based Query Enhancement that adds category semantic prototypes and global object context to decoder queries.
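The review does not reproduce the paper's distillation loss, so the following is only a minimal sketch of the general idea under our own assumptions: features for the same set of N tokens are taken from the frozen VFM (teacher) and from the detector encoder (student), each set is turned into a cosine-similarity relation matrix, and the student is penalized by the mean squared difference between the two matrices. The function names and the squared-error choice are hypothetical. Matching relations rather than raw features means the two models may have different feature widths. A real implementation would run on tensors with autograd so that gradients reach only the encoder; this stdlib version just evaluates the loss value.

```python
import math
from typing import List

Vector = List[float]


def l2_normalize(v: Vector) -> Vector:
    """Scale a feature vector to unit length; zero vectors are returned unchanged."""
    norm = math.sqrt(sum(x * x for x in v))
    return list(v) if norm == 0.0 else [x / norm for x in v]


def relation_matrix(tokens: List[Vector]) -> List[List[float]]:
    """Pairwise cosine similarities between tokens (e.g. object and background regions)."""
    unit = [l2_normalize(t) for t in tokens]
    return [[sum(a * b for a, b in zip(u, w)) for w in unit] for u in unit]


def relational_distillation_loss(student: List[Vector], teacher: List[Vector]) -> float:
    """Mean squared difference between student (encoder) and teacher (frozen VFM) relations.

    Only relations are compared, so the two models may use different feature widths,
    but both lists must describe the same tokens in the same order.
    """
    if len(student) != len(teacher) or not student:
        raise ValueError("student and teacher must describe the same non-empty token set")
    rs, rt = relation_matrix(student), relation_matrix(teacher)
    n = len(rs)
    return sum((rs[i][j] - rt[i][j]) ** 2 for i in range(n) for j in range(n)) / (n * n)
```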

If this is right

VFM4SDG outperforms prior SDGOD methods on standard benchmarks while remaining compatible with two mainstream DETR-based detectors.
Relational stability in the encoder and query-object binding in the decoder are the primary factors that determine cross-domain detection reliability.
A single frozen VFM can serve as a fixed source of priors that compensates for domain-induced degradation without requiring domain-specific adjustments.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same relational-distillation approach could be tested on non-DETR detectors or on tasks such as instance segmentation that also rely on query-object binding.
If the stability priors generalize, the method might lower the amount of data augmentation needed in other single-domain generalization pipelines.
Measuring the preservation of VFM relations across a wider range of imaging degradations would clarify whether the observed stability holds beyond the weather and illumination shifts examined.
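Such a measurement needs degradations whose severity can be dialed up. A minimal sketch, assuming grayscale images stored as flat lists of intensities in [0, 1] with a per-pixel depth map: fog follows the standard atmospheric scattering model I = J·t + A·(1 − t) with transmission t = exp(−β·d), and a night-like condition is approximated by a gamma curve plus a global gain. The parameter names and defaults (beta, airlight, gamma, gain) are illustrative assumptions, not settings from the paper.

```python
import math
from typing import List

Image = List[float]  # flattened grayscale intensities in [0, 1]


def add_fog(img: Image, depth: List[float], beta: float = 1.0, airlight: float = 0.9) -> Image:
    """Atmospheric scattering model: I = J * t + A * (1 - t), with t = exp(-beta * depth)."""
    if len(img) != len(depth):
        raise ValueError("image and depth map must have the same number of pixels")
    out = []
    for j, d in zip(img, depth):
        t = math.exp(-beta * d)  # transmission falls off with distance
        out.append(j * t + airlight * (1.0 - t))
    return out


def darken(img: Image, gamma: float = 2.2, gain: float = 0.5) -> Image:
    """Crude low-light simulation: a gamma curve followed by a global gain."""
    return [gain * (j ** gamma) for j in img]
```

Sweeping beta (or gamma) and feeding each degraded copy to both the VFM and the detector would let relation preservation be plotted against degradation severity.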

Load-bearing premise

The claim that missed detections are the main source of degradation and that distilling from a frozen VFM will restore the broken relations and bindings without creating new instabilities.

What would settle it

A controlled test in which the VFM-derived relations are shown to be as unstable as the detector's own relations under the same shifts, or in which adding the distilled priors fails to reduce the count of missed detections on the standard SDGOD benchmarks.

Figures

Figures reproduced from arXiv: 2604.21502 by Liang Wan, Ningnan Guo, Ruize Han, Song Wang, Wei Feng, Yupeng Zhang.

**Figure 2.** Overall framework of VFM4SDG. Built upon a DETR-based detector, VFM4SDG leverages a frozen VFM as a cross-domain structural visual prior for single-domain generalized object detection. At the encoding stage, Cross-domain Stable Relational Prior Distillation (CSRPD) transfers cross-domain stable inter-instance relational structures from VFM to the encoder, yielding a representation space with enhanced relat…

**Figure 3.** Qualitative comparisons under diverse domain conditions. We compare our detection results with state-of-the-art methods, with different categories…

**Figure 4.** Visualization of Encoder Feature Responses (Layer-2) under Domain Shift. From left to right: source image, Co-DETR encoder features, DINOv3…

**Figure 5.** Representative failure cases of VFM4SDG (Co-DETR-based) under challenging domain conditions. From left to right: GT and VFM4SDG predictions. The examples include scenarios with heavy rain, nighttime illumination, dense fog, motion blur, small-scale objects, and severe occlusion. While the proposed method maintains robust detection performance in most cases, certain missed detections still occur under extr…

Original abstract

Real-world weather, illumination, and imaging variations often induce severe domain shifts, degrading single-source detectors in unseen environments. Existing single-domain generalized object detection (SDGOD) methods mainly rely on data augmentation or domain-invariant learning, while largely overlooking how domain shift disrupts detector prediction stability. Through analytical experiments, we find that performance degradation is mainly dominated by increasing missed detections. Further analysis shows that this phenomenon stems from reduced cross-domain stability in DETR-style detectors: domain shift disrupts encoder-side object-background and inter-instance relations, and further weakens the semantic-spatial binding between decoder queries and real objects. Motivated by this, we find that vision foundation models (VFMs) still preserve stable relational structures and object responses under severe shifts, making them suitable cross-domain stability priors to compensate for detector degradation. To this end, we propose VFM⁴SDG, a dual-prior learning framework for SDGOD, which introduces a frozen VFM into encoder representation learning and decoder query modeling. Specifically, we propose Cross-domain Stable Relational Prior Distillation to distill stable object-background and inter-instance relations from the VFM into the encoder, compensating for relational degradation. Meanwhile, we propose Semantic-Contextual Prior-based Query Enhancement, which injects category semantic prototypes and global object context into queries before they enter the decoder layer, enhancing semantic-spatial query-object binding stability. Extensive experiments show that VFM⁴SDG significantly outperforms existing advanced methods on standard SDGOD benchmarks and two mainstream DETR-based detection frameworks, demonstrating its effectiveness, robustness, and generality.
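The abstract states only that category semantic prototypes and global object context are injected into the queries before the decoder layer. One plausible reading, sketched here under assumptions: each query softly attends over per-class prototype vectors (for example, class-averaged VFM embeddings) with scaled dot-product weights, and the resulting prototype mixture plus a global context vector (here the mean of the encoder tokens) is added residually to the query. The helper names and the mixing weights alpha and beta are hypothetical; the paper's actual fusion may use learned projections or a different aggregation.

```python
import math
from typing import List

Vector = List[float]


def softmax(scores: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def mean_vector(vectors: List[Vector]) -> Vector:
    """Element-wise mean of equally sized vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def enhance_queries(
    queries: List[Vector],
    prototypes: List[Vector],
    memory: List[Vector],
    alpha: float = 1.0,
    beta: float = 1.0,
) -> List[Vector]:
    """Add a prototype mixture (semantic prior) and a global context vector to each query.

    All vectors are assumed to share the query dimension d.
    """
    context = mean_vector(memory)  # global context: average over encoder tokens
    scale = 1.0 / math.sqrt(len(queries[0]))  # scaled dot-product attention
    enhanced = []
    for q in queries:
        weights = softmax([scale * sum(a * b for a, b in zip(q, p)) for p in prototypes])
        semantic = [sum(w * p[k] for w, p in zip(weights, prototypes)) for k in range(len(q))]
        enhanced.append([qk + alpha * sk + beta * ck for qk, sk, ck in zip(q, semantic, context)])
    return enhanced
```

Setting alpha and beta to zero recovers the original queries, which makes the two priors easy to ablate separately.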

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper pins DETR domain-shift failures on missed detections from broken encoder relations and decoder bindings, then injects stable priors from a frozen VFM via two new modules, but lacks separate metrics confirming the stability claim.


The main point is that this work traces single-domain generalization failures in DETR detectors to rising missed detections caused by unstable object-background and inter-instance relations in the encoder plus weakened query-object binding in the decoder. They observe that VFMs keep those structures more stable across shifts and build a dual-prior framework around that observation.

Referee Report

2 major / 2 minor

Summary. The paper claims that domain shift in DETR-style detectors primarily increases missed detections by disrupting encoder-side object-background/inter-instance relations and decoder query-object bindings; vision foundation models (VFMs) preserve stable relational structures and object responses across shifts, so the proposed VFM⁴SDG dual-prior framework distills cross-domain stable relational priors from a frozen VFM into the detector encoder and injects category semantic prototypes plus global context into decoder queries, yielding significant gains over prior SDGOD methods on standard benchmarks and two DETR frameworks.

Significance. If the analytical experiments confirm that VFM relational stability is quantitatively distinct from detector degradation and the performance lift is specifically due to the two priors rather than generic regularization or capacity, the work would establish a concrete mechanism for using frozen VFMs as cross-domain stability anchors in single-source generalized detection, with potential generality across detection architectures.

major comments (2)

[analytical experiments / motivation section] The central motivation rests on the analytical finding that VFMs preserve stable relational structures under shift while detectors do not, yet no explicit quantitative stability metric (e.g., cosine similarity of relation matrices or query-object binding scores) is reported on frozen VFM features versus detector features across source and target domains; without such a table or figure in the analytical experiments section, the claimed cross-domain stability prior cannot be separated from post-hoc performance gains.
[method and experiments] The two proposed modules (Cross-domain Stable Relational Prior Distillation and Semantic-Contextual Prior-based Query Enhancement) are motivated as directly compensating for the identified degradation modes, but the manuscript provides no ablation that measures the reduction in missed detections attributable to each module separately on the target domains, nor any statistical test confirming that the modules address the encoder-relation and decoder-binding issues rather than acting as generic regularization.
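The per-module ablation requested in the second major comment needs a concrete count of missed detections per target domain. A minimal sketch, assuming a common convention rather than the paper's own definition: predictions below a score threshold are discarded, the rest are greedily matched one-to-one to same-class ground-truth boxes in descending score order, and any ground-truth box left without a match at IoU ≥ 0.5 counts as missed. The thresholds and function names are hypothetical.

```python
from typing import List, Tuple

Box = Tuple[float, float, float, float]  # (x1, y1, x2, y2)
GroundTruth = Tuple[str, Box]            # (class label, box)
Prediction = Tuple[str, Box, float]      # (class label, box, confidence score)


def iou(a: Box, b: Box) -> float:
    """Intersection over union of two axis-aligned boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def count_missed(
    gts: List[GroundTruth],
    preds: List[Prediction],
    iou_thr: float = 0.5,
    score_thr: float = 0.3,
) -> Tuple[int, int]:
    """Return (missed, total) ground-truth boxes for one image."""
    kept = sorted((p for p in preds if p[2] >= score_thr), key=lambda p: p[2], reverse=True)
    matched = [False] * len(gts)
    for label, box, _score in kept:
        best_idx, best_iou = -1, iou_thr
        for i, (gt_label, gt_box) in enumerate(gts):
            if matched[i] or gt_label != label:
                continue
            overlap = iou(box, gt_box)
            if overlap >= best_iou:  # must reach the IoU threshold to count as a match
                best_idx, best_iou = i, overlap
        if best_idx >= 0:
            matched[best_idx] = True  # each ground-truth box can be matched only once
    return matched.count(False), len(gts)
```

Summing the returned (missed, total) pairs over all images of a target domain, once per ablated variant, gives the missed-detection rate whose change the referee asks to attribute to each module.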

minor comments (2)

[method] Notation for the two priors and their loss terms should be introduced with explicit equations early in the method section to improve readability.
[figures] Figure captions for the analytical experiments should explicitly state the domains and models compared so that the stability claim can be verified from the visuals alone.
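For illustration only (the review does not reproduce the paper's equations), notation of the requested kind might read as follows, assuming encoder and VFM token features F^enc and F^vfm for the same N regions, query dimension d, C class prototypes p_c, a global context vector g, and a hypothetical loss weight λ_rel:

```latex
\begin{aligned}
R &= \hat{F}\hat{F}^{\top} \in \mathbb{R}^{N \times N}, \quad \hat{F}_n = F_n / \lVert F_n \rVert_2, \\
\mathcal{L}_{\mathrm{rel}} &= \tfrac{1}{N^{2}} \bigl\lVert R^{\mathrm{enc}} - R^{\mathrm{vfm}} \bigr\rVert_F^{2}, \\
\mathcal{L} &= \mathcal{L}_{\mathrm{det}} + \lambda_{\mathrm{rel}}\, \mathcal{L}_{\mathrm{rel}}, \\
\tilde{q}_i &= q_i + \sum_{c=1}^{C} \operatorname{softmax}_c\!\left(\frac{q_i^{\top} p_c}{\sqrt{d}}\right) p_c + g, \quad g = \frac{1}{N} \sum_{n=1}^{N} F^{\mathrm{enc}}_n .
\end{aligned}
```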

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major comment below and commit to revisions that directly strengthen the motivation and experimental validation.


Referee: [analytical experiments / motivation section] The central motivation rests on the analytical finding that VFMs preserve stable relational structures under shift while detectors do not, yet no explicit quantitative stability metric (e.g., cosine similarity of relation matrices or query-object binding scores) is reported on frozen VFM features versus detector features across source and target domains; without such a table or figure in the analytical experiments section, the claimed cross-domain stability prior cannot be separated from post-hoc performance gains.

Authors: We agree that an explicit quantitative comparison would more rigorously separate the claimed stability prior from performance observations. While the current analytical experiments section links missed detections to relational degradation via indirect metrics and visualizations, it does not report direct cross-domain stability scores (e.g., cosine similarity of relation matrices or query-object binding) between frozen VFM and detector features. We will add a dedicated table and accompanying figure in the revised analytical experiments section that computes and reports these metrics on both source and target domains for VFM versus detector features. This addition will be placed before the method section to better ground the motivation. revision: yes
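One way to produce the promised stability scores, sketched under explicit assumptions: pooled features for the same N regions (for example, ground-truth boxes) are extracted from a clean image and from a synthetically shifted copy, so that the regions correspond one-to-one; real target-domain images do not provide such pixel-aligned pairs. Each feature set becomes a cosine relation matrix, and stability is the cosine similarity between the off-diagonal entries of the two matrices, with 1.0 meaning the relations are unchanged. Running the same function on frozen VFM features and on detector encoder features gives the side-by-side comparison the referee asks for. The function names are hypothetical.

```python
import math
from typing import List

Vector = List[float]


def relation_matrix(tokens: List[Vector]) -> List[List[float]]:
    """Cosine-similarity matrix between region features."""
    norms = [math.sqrt(sum(x * x for x in t)) or 1.0 for t in tokens]
    return [
        [sum(a * b for a, b in zip(u, w)) / (nu * nw) for w, nw in zip(tokens, norms)]
        for u, nu in zip(tokens, norms)
    ]


def relation_stability(clean: List[Vector], shifted: List[Vector]) -> float:
    """Cosine similarity of the off-diagonal relations before and after a domain shift."""
    if len(clean) != len(shifted) or len(clean) < 2:
        raise ValueError("need at least two corresponding regions in both conditions")
    rc, rs = relation_matrix(clean), relation_matrix(shifted)
    n = len(rc)
    a = [rc[i][j] for i in range(n) for j in range(n) if i != j]
    b = [rs[i][j] for i in range(n) for j in range(n) if i != j]
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    if na == 0.0 or nb == 0.0:
        return 0.0  # undefined when one side has no relational structure at all
    return sum(x * y for x, y in zip(a, b)) / (na * nb)
```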
Referee: [method and experiments] The two proposed modules (Cross-domain Stable Relational Prior Distillation and Semantic-Contextual Prior-based Query Enhancement) are motivated as directly compensating for the identified degradation modes, but the manuscript provides no ablation that measures the reduction in missed detections attributable to each module separately on the target domains, nor any statistical test confirming that the modules address the encoder-relation and decoder-binding issues rather than acting as generic regularization.

Authors: We concur that module-specific ablations focused on missed-detection reduction and statistical validation would strengthen the causal claims. The existing ablations demonstrate overall gains but do not isolate per-module effects on missed detections or include formal statistical tests. In the revision we will add targeted ablations that report the change in missed-detection rate on target domains when each module is enabled individually. We will also run multiple random seeds and include paired statistical significance tests (e.g., t-tests) comparing the full model against ablated variants to confirm the improvements exceed generic regularization effects. These results will appear in the main experiments section or supplementary material. revision: yes
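A minimal sketch of the paired test, assuming one score per random seed (for example, target-domain mAP or missed-detection rate) for the full model and for an ablated variant trained with the same seeds: the statistic is the mean per-seed difference divided by its standard error. The function name is hypothetical; the p-value comes from a t distribution with n − 1 degrees of freedom, and scipy.stats.ttest_rel computes both quantities if SciPy is available.

```python
import math
import statistics
from typing import Sequence


def paired_t_statistic(full: Sequence[float], ablated: Sequence[float]) -> float:
    """Paired t statistic t = mean(d) / (sd(d) / sqrt(n)) with d = full - ablated per seed."""
    if len(full) != len(ablated) or len(full) < 2:
        raise ValueError("need at least two paired runs with matching seeds")
    diffs = [f - a for f, a in zip(full, ablated)]
    sd = statistics.stdev(diffs)  # sample standard deviation (n - 1 denominator)
    if sd == 0.0:
        raise ValueError("all per-seed differences are identical; t is undefined")
    return statistics.mean(diffs) / (sd / math.sqrt(len(diffs)))
```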

Circularity Check

0 steps flagged

No circularity: empirical motivation and proposed modules are independent of fitted inputs or self-citations.


The paper's chain consists of analytical observations on detector degradation under domain shift, followed by a proposed dual-prior framework (Cross-domain Stable Relational Prior Distillation and Semantic-Contextual Prior-based Query Enhancement) that injects VFM features. No equations, parameter fits, or derivations are presented that reduce the claimed stability priors or performance gains to quantities defined from the same data by construction. No self-citation load-bearing steps or uniqueness theorems from prior author work appear in the provided text. The approach is self-contained against external SDGOD benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Review performed on abstract only; full details on any learned weights, loss coefficients, or prototype construction are unavailable. The central claim rests on two domain assumptions about the source of degradation and the stability properties of VFMs.

axioms (2)

Domain assumption: Performance degradation under domain shift is mainly dominated by increasing missed detections caused by disrupted encoder relations and weakened decoder query-object binding.
Stated as the outcome of the paper's analytical experiments in the abstract.
Domain assumption: Vision foundation models preserve stable relational structures and object responses under severe domain shifts.
Presented as the key motivation for using VFMs as cross-domain stability priors.



Reference graph

Works this paper leans on

55 extracted references · 55 canonical work pages · 7 internal anchors

[1]

Single-domain generalized object detection in urban scene via cyclic-disentangled self-distillation,

A. Wu and C. Deng, “Single-domain generalized object detection in urban scene via cyclic-disentangled self-distillation,” inProceedings of the IEEE/CVF Conference on computer vision and pattern recognition, 2022, pp. 847–856. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 11 GT Daytime ClearDaytime FoggyDusk RainyNight Rainy SA-DETR (DINO) VFM4...

work page 2022
[2]

Dg-detr: Toward domain generalized detection transformer,

S. Hwang, D. Han, and M. Jeon, “Dg-detr: Toward domain generalized detection transformer,”arXiv preprint arXiv:2504.19574, 2025

work page arXiv 2025
[3]

Style-adaptive detection transformer for single-source domain generalized object detection,

J. Han, Y . Wang, and L. Chen, “Style-adaptive detection transformer for single-source domain generalized object detection,”arXiv preprint arXiv:2504.20498, 2025

work page arXiv 2025
[4]

End-to-end object detection with transformers,

N. Carion, F. Massa, G. Synnaeve, N. Usunier, A. Kirillov, and S. Zagoruyko, “End-to-end object detection with transformers,” in European conference on computer vision. Springer, 2020, pp. 213– 229

work page 2020
[5]

Learning to learn single domain gen- eralization,

F. Qiao, L. Zhao, and X. Peng, “Learning to learn single domain gen- eralization,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2020, pp. 12 556–12 565. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 12 Source Image Co- VFM4SDG (Co-DETR)DETR DINOv3 Fig. 4. Visualization of Encoder Feature Responses (Layer...

work page 2020
[6]

Adversarially adaptive normalization for single domain generalization,

X. Fan, Q. Wang, J. Ke, F. Yang, B. Gong, and M. Zhou, “Adversarially adaptive normalization for single domain generalization,” inProceedings of the IEEE/CVF conference on Computer Vision and Pattern Recogni- tion, 2021, pp. 8208–8217

work page 2021
[7]

Progressive domain expansion network for single domain gen- eralization,

L. Li, K. Gao, J. Cao, Z. Huang, Y . Weng, X. Mi, Z. Yu, X. Li, and B. Xia, “Progressive domain expansion network for single domain gen- eralization,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2021, pp. 224–233

work page 2021
[8]

Learning to diversify for single domain generalization,

Z. Wang, Y . Luo, R. Qiu, Z. Huang, and M. Baktashmotlagh, “Learning to diversify for single domain generalization,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 834– 843

work page 2021
[9]

Exact feature distribution matching for arbitrary style transfer and domain generalization,

Y . Zhang, M. Li, R. Li, K. Jia, and L. Zhang, “Exact feature distribution matching for arbitrary style transfer and domain generalization,” in Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 8035–8045

work page 2022
[10]

Out-of-domain generalization from a single source: An uncertainty quantification approach,

X. Peng, F. Qiao, and L. Zhao, “Out-of-domain generalization from a single source: An uncertainty quantification approach,”IEEE Transac- tions on Pattern Analysis and Machine Intelligence, vol. 46, no. 3, pp. 1775–1787, 2022

work page 2022
[11]

Meta convolutional neural networks for single domain gen- eralization,

C. Wan, X. Shen, Y . Zhang, Z. Yin, X. Tian, F. Gao, J. Huang, and X.-S. Hua, “Meta convolutional neural networks for single domain gen- eralization,” inproceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 4682–4691

work page 2022
[12]

Attention consistency on visual corruptions for single-source domain generalization,

I. Cugu, M. Mancini, Y . Chen, and Z. Akata, “Attention consistency on visual corruptions for single-source domain generalization,” inPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022, pp. 4165–4174

work page 2022
[13]

Adversarial source generation for source-free domain adaptation,

C. Cui, F. Meng, C. Zhang, Z. Liu, L. Zhu, S. Gong, and X. Lin, “Adversarial source generation for source-free domain adaptation,”IEEE Transactions on Circuits and Systems for Video Technology, vol. 34, no. 6, pp. 4887–4898, 2023

work page 2023
[14]

Adversarial bayesian augmen- tation for single-source domain generalization,

S. Cheng, T. Gokhale, and Y . Yang, “Adversarial bayesian augmen- tation for single-source domain generalization,” inProceedings of the IEEE/CVF International Conference on Computer Vision, 2023, pp. 11 400–11 410

work page 2023
[15]

Meta-causal learning for single domain generalization,

J. Chen, Z. Gao, X. Wu, and J. Luo, “Meta-causal learning for single domain generalization,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 7683–7692

work page 2023
[16]

Center- aware adversarial augmentation for single domain generalization,

T. Chen, M. Baktashmotlagh, Z. Wang, and M. Salzmann, “Center- aware adversarial augmentation for single domain generalization,” in Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2023, pp. 4157–4165

work page 2023
[17]

Learning class and domain augmentations for single-source open- domain generalization,

P. Bele, V . Bundele, A. Bhattacharya, A. Jha, G. Roig, and B. Banerjee, “Learning class and domain augmentations for single-source open- domain generalization,” inProceedings of the IEEE/CVF Winter Con- ference on Applications of Computer Vision, 2024, pp. 1816–1826

work page 2024
[18]

Progressive diversity generation for single domain generalization,

D. Rui, K. Guo, X. Zhu, Z. Wu, and H. Fang, “Progressive diversity generation for single domain generalization,”IEEE Transactions on Multimedia, vol. 26, pp. 10 200–10 210, 2024

work page 2024
[19]

Single domain generalization via normalised cross- correlation based convolutions,

W. Chuah, R. Tennakoon, R. Hoseinnezhad, D. Suter, and A. Bab- Hadiashar, “Single domain generalization via normalised cross- correlation based convolutions,” inProceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, 2024, pp. 1752–1761

work page 2024
[20]

Wildnet: Learning domain generalized semantic segmentation from the wild,

S. Lee, H. Seong, S. Lee, and E. Kim, “Wildnet: Learning domain generalized semantic segmentation from the wild,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 9936–9946

work page 2022
[21]

Learning generalized knowledge from a single domain on urban-scene segmentation,

X. Li, M. Li, X. Li, and X. Guo, “Learning generalized knowledge from a single domain on urban-scene segmentation,”IEEE Transactions on Multimedia, vol. 25, pp. 7635–7646, 2022

work page 2022
[22]

Style projected clustering for domain generalized semantic segmentation,

W. Huang, C. Chen, Y . Li, J. Li, C. Li, F. Song, Y . Yan, and Z. Xiong, “Style projected clustering for domain generalized semantic segmentation,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 3061–3071

work page 2023
[23]

Adaptive texture filtering for single-domain generalized segmentation,

X. Li, M. Li, Y . Wang, C.-X. Ren, and X. Guo, “Adaptive texture filtering for single-domain generalized segmentation,” inProceedings of the AAAI conference on artificial intelligence, vol. 37, no. 2, 2023, pp. 1442–1450

work page 2023
[24]

Clip the gap: A single domain generalization approach for object detection,

V . Vidit, M. Engilberge, and M. Salzmann, “Clip the gap: A single domain generalization approach for object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2023, pp. 3219–3229

work page 2023
[25]

Learning transferable visual models from natural language supervision,

A. Radford, J. W. Kim, C. Hallacy, A. Ramesh, G. Goh, S. Agarwal, G. Sastry, A. Askell, P. Mishkin, J. Clarket al., “Learning transferable visual models from natural language supervision,” inInternational conference on machine learning. PmLR, 2021, pp. 8748–8763

work page 2021
[26]

Improving single domain-generalized object detection: A focus on diversification and alignment,

M. S. Danish, M. H. Khan, M. A. Munir, M. S. Sarfraz, and M. Ali, “Improving single domain-generalized object detection: A focus on diversification and alignment,” inProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, 2024, pp. 17 732– 17 742

work page 2024
[27]

Towards robust object detection invariant to real-world domain shifts,

Q. Fan, M. Segu, Y .-W. Tai, F. Yu, C.-K. Tang, B. Schiele, and D. Dai, “Towards robust object detection invariant to real-world domain shifts,” inThe Eleventh International Conference on Learning Representations (ICLR 2023). OpenReview, 2023

work page 2023
[28]

Srcd: Se- mantic reasoning with compound domains for single-domain generalized object detection,

Z. Rao, J. Guo, L. Tang, Y . Huang, X. Ding, and S. Guo, “Srcd: Se- mantic reasoning with compound domains for single-domain generalized object detection,”IEEE Transactions on Neural Networks and Learning Systems, 2024

work page 2024
[29]

G-nas: Generalizable neural architecture search for single domain generalization object detection,

F. Wu, J. Gao, L. Hong, X. Wang, C. Zhou, and N. Ye, “G-nas: Generalizable neural architecture search for single domain generalization object detection,” inProceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 6, 2024, pp. 5958–5966

work page 2024
[30]

Unbiased faster r-cnn for single-source domain generalized object detection,

Y . Liu, S. Zhou, X. Liu, C. Hao, B. Fan, and J. Tian, “Unbiased faster r-cnn for single-source domain generalized object detection,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2024, pp. 28 838–28 847

work page 2024
[31]

Deit iii: Revenge of the vit,

H. Touvron, M. Cord, and H. J ´egou, “Deit iii: Revenge of the vit,” in European conference on computer vision. Springer, 2022, pp. 516–533

work page 2022
[32]

Grounded language-image pre-training,

L. H. Li, P. Zhang, H. Zhang, J. Yang, C. Li, Y . Zhong, L. Wang, L. Yuan, L. Zhang, J.-N. Hwanget al., “Grounded language-image pre-training,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 10 965–10 975

work page 2022
[33]

Segment anything,

A. Kirillov, E. Mintun, N. Ravi, H. Mao, C. Rolland, L. Gustafson, T. Xiao, S. Whitehead, A. C. Berg, W.-Y . Loet al., “Segment anything,” inProceedings of the IEEE/CVF international conference on computer vision, 2023, pp. 4015–4026. JOURNAL OF LATEX CLASS FILES, VOL. 14, NO. 8, AUGUST 2021 14

work page 2023
[34]

SAM 2: Segment Anything in Images and Videos

N. Ravi, V . Gabeur, Y .-T. Hu, R. Hu, C. Ryali, T. Ma, H. Khedr, R. R¨adle, C. Rolland, L. Gustafsonet al., “Sam 2: Segment anything in images and videos,”arXiv preprint arXiv:2408.00714, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[35]

SAM 3: Segment Anything with Concepts

N. Carion, L. Gustafson, Y .-T. Hu, S. Debnath, R. Hu, D. Suris, C. Ryali, K. V . Alwala, H. Khedr, A. Huanget al., “Sam 3: Segment anything with concepts,”arXiv preprint arXiv:2511.16719, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[36]

An empirical study of training self- supervised vision transformers,

X. Chen, S. Xie, and K. He, “An empirical study of training self- supervised vision transformers,” inProceedings of the IEEE/CVF in- ternational conference on computer vision, 2021, pp. 9640–9649

work page 2021
[37]

Emerging properties in self-supervised vision transformers,

M. Caron, H. Touvron, I. Misra, H. J ´egou, J. Mairal, P. Bojanowski, and A. Joulin, “Emerging properties in self-supervised vision transformers,” inProceedings of the IEEE/CVF international conference on computer vision, 2021, pp. 9650–9660

work page 2021
[38]

Is imagenet worth 1 video? learning strong image encoders from 1 long unlabelled video,

S. Venkataramanan, M. N. Rizve, J. Carreira, Y . M. Asano, and Y . Avrithis, “Is imagenet worth 1 video? learning strong image encoders from 1 long unlabelled video,”arXiv preprint arXiv:2310.08584, 2023

work page arXiv 2023
[39]

Self-supervised cross- stage regional contrastive learning for object detection,

J. Yan, L. Yang, Y . Gao, and W.-S. Zheng, “Self-supervised cross- stage regional contrastive learning for object detection,” in2023 IEEE International Conference on Multimedia and Expo (ICME). IEEE, 2023, pp. 1044–1049

work page 2023
[40]

Masked au- toencoders are scalable vision learners,

K. He, X. Chen, S. Xie, Y . Li, P. Doll ´ar, and R. Girshick, “Masked au- toencoders are scalable vision learners,” inProceedings of the IEEE/CVF conference on computer vision and pattern recognition, 2022, pp. 16 000–16 009

work page 2022
[41]

BEiT: BERT Pre-Training of Image Transformers

H. Bao, L. Dong, S. Piao, and F. Wei, “Beit: Bert pre-training of image transformers,”arXiv preprint arXiv:2106.08254, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[42]

Image as a foreign language: Beit pretraining for vision and vision-language tasks,

W. Wang, H. Bao, L. Dong, J. Bjorck, Z. Peng, Q. Liu, K. Aggarwal, O. K. Mohammed, S. Singhal, S. Somet al., “Image as a foreign language: Beit pretraining for vision and vision-language tasks,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 19 175–19 186

work page 2023
[43]

Deconstructing denoising diffusion models for self-supervised learning,

X. Chen, Z. Liu, S. Xie, and K. He, “Deconstructing denoising diffusion models for self-supervised learning,”arXiv preprint arXiv:2401.14404, 2024

work page arXiv 2024
[44]

iBOT: Image BERT Pre-Training with Online Tokenizer

J. Zhou, C. Wei, H. Wang, W. Shen, C. Xie, A. Yuille, and T. Kong, “ibot: Image bert pre-training with online tokenizer,”arXiv preprint arXiv:2111.07832, 2021

work page internal anchor Pith review Pith/arXiv arXiv 2021
[45]

DINOv2: Learning Robust Visual Features without Supervision

M. Oquab, T. Darcet, T. Moutakanni, H. V o, M. Szafraniec, V . Khalidov, P. Fernandez, D. Haziza, F. Massa, A. El-Noubyet al., “Dinov2: Learning robust visual features without supervision,”arXiv preprint arXiv:2304.07193, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[46]

DINOv3

O. Sim ´eoni, H. V . V o, M. Seitzer, F. Baldassarre, M. Oquab, C. Jose, V . Khalidov, M. Szafraniec, S. Yi, M. Ramamonjisoaet al., “Dinov3,” arXiv preprint arXiv:2508.10104, 2025

work page internal anchor Pith review Pith/arXiv arXiv 2025
[47]

Vision transformer adapter for dense predictions,

Z. Chen, Y . Duan, W. Wang, J. He, T. Lu, J. Dai, and Y . Qiao, “Vision transformer adapter for dense predictions,”arXiv preprint arXiv:2205.08534, 2022

work page arXiv 2022
[48]

Frozen- detr: Enhancing detr with image understanding from frozen foundation models,

S. Fu, J. Yan, Q. Yang, X. Wei, X. Xie, and W.-S. Zheng, “Frozen- detr: Enhancing detr with image understanding from frozen foundation models,”Advances in Neural Information Processing Systems, vol. 37, pp. 105 949–105 971, 2024

work page 2024
[49]

DINO: DETR with Improved DeNoising Anchor Boxes for End-to-End Object Detection

H. Zhang, F. Li, S. Liu, L. Zhang, H. Su, J. Zhu, L. M. Ni, and H.-Y . Shum, “Dino: Detr with improved denoising anchor boxes for end-to- end object detection,”arXiv preprint arXiv:2203.03605, 2022

work page internal anchor Pith review Pith/arXiv arXiv 2022
[50]

Detrs with collaborative hybrid assign- ments training. arxiv 2022,

Z. Zong, G. Song, and Y . Liu, “Detrs with collaborative hybrid assign- ments training. arxiv 2022,”arXiv preprint arXiv:2211.12860, 2022

work page arXiv 2022
[51]

Rt-detrv4: Painlessly furthering real-time object detection with vision foundation models,

Z. Liao, Y . Zhao, X. Shan, Y . Yan, C. Liu, L. Lu, X. Ji, and J. Chen, “Rt-detrv4: Painlessly furthering real-time object detection with vision foundation models,”arXiv preprint arXiv:2510.25257, 2025

[52]

M. Xu, L. Qin, W. Chen, S. Pu, and L. Zhang, “Multi-view adversarial discriminator: Mine the non-causal factors for object detection in unseen domains,” in Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2023, pp. 8103–8112.

[53]

W. Lee, D. Hong, H. Lim, and H. Myung, “Object-aware domain generalization for object detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 38, no. 4, 2024, pp. 2947–2955.

[54]

X. Xu, J. Yang, W. Shi, S. Ding, L. Luo, and J. Liu, “PhysAug: A physical-guided and frequency-based data augmentation for single-domain generalized object detection,” in Proceedings of the AAAI Conference on Artificial Intelligence, vol. 39, no. 20, 2025, pp. 21815–21823.

[55]

A. Xiao, W. Yu, and H. Yu, “Sample-aware RandAugment: Search-free automatic data augmentation for effective image recognition,” International Journal of Computer Vision, vol. 133, no. 11, pp. 7710–7725, 2025.

