pith. sign in

arxiv: 2606.21736 · v1 · pith:P5S3BYHVnew · submitted 2026-06-19 · 💻 cs.CV

Adversarial Domain Prompt Tuning and Generation for Single Domain Generalization

Pith reviewed 2026-06-26 14:16 UTC · model grok-4.3

classification 💻 cs.CV
keywords single domain generalizationprompt tuningdiffusion modelsadversarial learningdata generationout-of-domain augmentationtext-to-image models
0
0 comments X

The pith

Adversarial tuning of two prompt sets inside a frozen diffusion model generates out-of-domain images that improve single-domain generalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper seeks to establish that a model trained on one domain can generalize to many unseen domains by augmenting its training set with images produced by a pre-trained text-to-image diffusion model. It does this by learning one prompt set that holds category identity and a second prompt set that holds domain style, then optimizing the two sets against each other so that style changes while category stays fixed. This removes the need to write manual descriptions for every possible target domain. If the separation holds, training data can be expanded automatically to cover a wide range of visual styles without collecting new real images.

Core claim

The Progressive Adversarial Prompt Tuning framework learns two sets of abstract prompts as conditions for a pre-trained diffusion model: one set encodes domain-invariant category information while the other encodes domain-specific styles. Adversarial optimization between the two sets lets the diffusion model produce images that vary in style across many domains yet keep the original category features intact, yielding higher accuracy on unseen domains than prior single-domain generalization methods.

What carries the argument

Two adversarially optimized sets of abstract prompts that condition a frozen diffusion model to separate category content from domain style.

If this is right

  • A single-domain training set can be expanded with generated images that cover many unseen visual styles.
  • No hand-written domain descriptions are needed to produce the varied training images.
  • The generated images preserve enough category detail that classifiers trained on them outperform earlier single-domain generalization techniques.
  • The same frozen diffusion model can be reused across different source domains by retuning only the prompt sets.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If prompt separation works reliably, the same idea could be tested on other generative backbones such as autoregressive image models.
  • Applying the method to fine-grained categories might reveal whether the learned prompts can control attributes at the part level rather than whole-object level.
  • Running the generated images through a downstream task that requires precise localization could test whether style variation harms spatial structure.

Load-bearing premise

Two learned prompt sets can be made to carry only category information or only style information inside the diffusion model without the categories becoming unrecognizable.

What would settle it

Train a classifier on images generated by the method and measure its accuracy on a held-out multi-domain test set; if accuracy is no higher than a classifier trained only on the original single domain, the claim is false.

Figures

Figures reproduced from arXiv: 2606.21736 by De Cheng, Dongsheng Li, Nannan Wang, Xinbo Gao, Xinyang Jiang, Zhipeng Xu.

Figure 1
Figure 1. Figure 1: Comparison between our method with existing single [PITH_FULL_IMAGE:figures/full_fig_p001_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: The framework of our proposed method. We first train a set of category prompts [PITH_FULL_IMAGE:figures/full_fig_p003_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: This figure shows the impact of the Progressive Training [PITH_FULL_IMAGE:figures/full_fig_p005_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: This figure shows the training data generated by the T2I model under different input conditions, corresponding to each row in the [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Impact of domain prompt number in Eq. 7 N on PACS and VLCS dataset on SDG setting and multi-source DG setting. mainly introduce diverse styles obtain relatively slight per￾formance gains on VLCS dataset compared to PACS ones. However, our AD significantly improves the model perfor￾mance of 0.28%∼6.40% across each domain and 3.42% on Avg-acc (see 3 rd and 4 th row in Tab. 6). This observation indicates our … view at source ↗
Figure 6
Figure 6. Figure 6: Parameter analysis of λ in Eq. 15 on PACS dataset. 4.6. Parameter Analysis We conduct experiments to analyze the impact of the hyper￾parameters in our method on PACS dataset, including λ in Eq. 15. The results are shown in [PITH_FULL_IMAGE:figures/full_fig_p008_6.png] view at source ↗
read the original abstract

Single domain generalization (SDG) aims to learn a robust model, which could perform well on many unseen domains while there is only one single domain available for training. One of the promising directions for achieving single-domain generalization is to generate out-of-domain (OOD) training data through data augmentation or image generation. Given the rapid advancements in AI-generated content (AIGC), this paper is the first to propose leveraging powerful pre-trained text-to-image (T2I) foundation models to create the training data. However, manually designing textual prompts to generate images for all possible domains is often impractical, and some domain characteristics may be too abstract to describe with words. To address these challenges, we propose a novel Progressive Adversarial Prompt Tuning (PAPT) framework for pre-trained diffusion models. Instead of relying on static textual domains, our approach learns two sets of abstract prompts as conditions for the diffusion model: one that captures domain-invariant category information and another that models domain-specific styles. This adversarial learning mechanism enables the T2I model to generate images in various domain styles while preserving key categorical features. Extensive experiments demonstrate the effectiveness of the proposed method, achieving superior performances to state-of-the-art single-domain generalization approaches.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes the Progressive Adversarial Prompt Tuning (PAPT) framework for single-domain generalization (SDG). It is the first to leverage pre-trained text-to-image diffusion models to synthesize out-of-domain training data by adversarially learning two sets of abstract prompts—one capturing domain-invariant category information and the other modeling domain-specific styles—without manual domain descriptions. The central claim is that this mechanism allows the frozen T2I model to generate diverse domain-styled images while preserving categorical fidelity, yielding superior performance over state-of-the-art SDG methods as shown by extensive experiments.

Significance. If the disentanglement and performance claims hold, the work would advance SDG by automating OOD data generation via foundation models, removing reliance on hand-crafted prompts and potentially improving robustness when only single-domain training data is available.

major comments (2)
  1. [Abstract] Abstract: the claim that adversarial optimization of the two prompt sets 'enables the T2I model to generate images in various domain styles while preserving key categorical features' is load-bearing, yet the abstract supplies no description of the adversarial objective, any category-reconstruction term, contrastive regularizer, or classifier-based fidelity loss that would prevent style prompts from drifting category semantics or invariant prompts from absorbing style leakage.
  2. [Abstract] Abstract: the assertion of 'extensive experiments' demonstrating 'superior performances to state-of-the-art single-domain generalization approaches' is unsupported by any quantitative results, baseline comparisons, ablation details, or error analysis, preventing verification of the central performance claim.
minor comments (1)
  1. The acronym PAPT is introduced in the abstract but the title uses a different phrasing ('Adversarial Domain Prompt Tuning and Generation'); consistent terminology would aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We respond point-by-point below and indicate where revisions will be made.

read point-by-point responses
  1. Referee: [Abstract] Abstract: the claim that adversarial optimization of the two prompt sets 'enables the T2I model to generate images in various domain styles while preserving key categorical features' is load-bearing, yet the abstract supplies no description of the adversarial objective, any category-reconstruction term, contrastive regularizer, or classifier-based fidelity loss that would prevent style prompts from drifting category semantics or invariant prompts from absorbing style leakage.

    Authors: We agree the abstract is concise and omits explicit mention of the loss terms. The full manuscript (Section 3.2) defines the progressive adversarial objective with a category-reconstruction loss, contrastive regularizer, and classifier-based fidelity term to enforce the claimed disentanglement. To address the concern, we will revise the abstract to briefly reference these mechanisms. revision: yes

  2. Referee: [Abstract] Abstract: the assertion of 'extensive experiments' demonstrating 'superior performances to state-of-the-art single-domain generalization approaches' is unsupported by any quantitative results, baseline comparisons, ablation details, or error analysis, preventing verification of the central performance claim.

    Authors: Abstracts are space-constrained and conventionally summarize rather than enumerate numbers or tables; the manuscript provides those details in Section 4 (Tables 1-4, Figures 3-5) with SOTA comparisons, ablations, and error analysis. We will explore adding one high-level quantitative statement if it fits the length limit. revision: partial

Circularity Check

0 steps flagged

No circularity: novel prompt-tuning framework with no self-referential reductions

full rationale

The paper introduces a new Progressive Adversarial Prompt Tuning (PAPT) method that learns two abstract prompt sets inside a frozen T2I diffusion model via adversarial optimization. No equations, fitted parameters, or self-citations appear in the provided text that would reduce the claimed OOD generation or SDG performance to inputs by construction. The derivation chain consists of a proposed mechanism (adversarial prompt learning for category-style separation) whose outputs are evaluated experimentally rather than defined into the inputs. This is a standard non-circular proposal of a new technique.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the method implicitly assumes the diffusion model can be conditioned on learned prompts without further specification.

pith-pipeline@v0.9.1-grok · 5756 in / 1127 out tokens · 15027 ms · 2026-06-26T14:16:00.274122+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

78 extracted references · 13 canonical work pages · 7 internal anchors

  1. [1]

    Invariant Risk Minimization

    Martin Arjovsky, L ´eon Bottou, Ishaan Gulrajani, and David Lopez-Paz. Invariant risk minimization.arXiv preprint arXiv:1907.02893, 2019. 6

  2. [2]

    Recognition in terra incognita

    Sara Beery, Grant Van Horn, and Pietro Perona. Recognition in terra incognita. InProceedings of the European confer- ence on computer vision (ECCV), pages 456–473, 2018. 6

  3. [3]

    Domain generalization by marginal transfer learning.Journal of machine learning re- search, 22(2):1–55, 2021

    Gilles Blanchard, Aniket Anand Deshmukh, Urun Dogan, Gyemin Lee, and Clayton Scott. Domain generalization by marginal transfer learning.Journal of machine learning re- search, 22(2):1–55, 2021. 6

  4. [4]

    Domain generalization by solving jigsaw puzzles

    Fabio M Carlucci, Antonio D’Innocente, Silvia Bucci, Bar- bara Caputo, and Tatiana Tommasi. Domain generalization by solving jigsaw puzzles. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2229–2238, 2019. 2

  5. [5]

    Domain generalization by mutual-information regu- larization with pre-trained models

    Junbum Cha, Kyungjae Lee, Sungrae Park, and Sanghyuk Chun. Domain generalization by mutual-information regu- larization with pre-trained models. InEuropean conference on computer vision, pages 440–457. Springer, 2022. 7

  6. [6]

    Meta-causal learning for single domain generalization

    Jin Chen, Zhi Gao, Xinxiao Wu, and Jiebo Luo. Meta-causal learning for single domain generalization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7683–7692, 2023. 6

  7. [7]

    Improved test-time adaptation for domain generalization

    Liang Chen, Yong Zhang, Yibing Song, Ying Shan, and Lingqiao Liu. Improved test-time adaptation for domain generalization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24172– 24182, 2023. 6

  8. [8]

    Domain generalization via rationale invariance

    Liang Chen, Yong Zhang, Yibing Song, Anton Van Den Hen- gel, and Lingqiao Liu. Domain generalization via rationale invariance. InProceedings of the IEEE/CVF International Conference on Computer Vision, pages 1751–1760, 2023. 6

  9. [9]

    Efficient bilateral cross- modality cluster matching for unsupervised visible-infrared person reid

    De Cheng, Lingfeng He, Nannan Wang, Shizhou Zhang, Zhen Wang, and Xinbo Gao. Efficient bilateral cross- modality cluster matching for unsupervised visible-infrared person reid. InProceedings of the 31st ACM International Conference on Multimedia, pages 1325–1333, 2023. 1

  10. [10]

    Disentangled prompt rep- resentation for domain generalization

    De Cheng, Zhipeng Xu, Xinyang Jiang, Nannan Wang, Dongsheng Li, and Xinbo Gao. Disentangled prompt rep- resentation for domain generalization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 23595–23604, 2024. 1, 2

  11. [11]

    Prompt disentanglement via language guidance and representation alignment for domain generalization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

    De Cheng, Zhipeng Xu, Xinyang Jiang, Dongsheng Li, Nan- nan Wang, and Xinbo Gao. Prompt disentanglement via language guidance and representation alignment for domain generalization.IEEE Transactions on Pattern Analysis and Machine Intelligence, 2026

  12. [12]

    Interference-isolated elastic weight consolidation and knowledge calibration for incremental ob- ject detection

    De Cheng, Mingyue Zeng, Zhipeng Xu, Di Xu, Nannan Wang, and Xinbo Gao. Interference-isolated elastic weight consolidation and knowledge calibration for incremental ob- ject detection. InProceedings of the Fourteenth Interna- tional Conference on Learning Representations, 2026. 1

  13. [13]

    Progressive random con- volutions for single domain generalization

    Seokeon Choi, Debasmit Das, Sungha Choi, Seunghan Yang, Hyunsin Park, and Sungrack Yun. Progressive random con- volutions for single domain generalization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pat- tern Recognition, pages 10312–10322, 2023. 6

  14. [14]

    Directly Fine-Tuning Diffusion Models on Differentiable Rewards

    Kevin Clark, Paul Vicol, Kevin Swersky, and David J Fleet. Directly fine-tuning diffusion models on differentiable re- wards.arXiv preprint arXiv:2309.17400, 2023. 3

  15. [15]

    Attention consistency on visual corruptions for single-source domain generalization

    Ilke Cugu, Massimiliano Mancini, Yanbei Chen, and Zeynep Akata. Attention consistency on visual corruptions for single-source domain generalization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 4165–4174, 2022. 6

  16. [16]

    Imagenet: A large-scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009. 6

  17. [17]

    Cogview: Mastering text-to-image generation via transformers.Advances in neural information processing systems, 34:19822–19835, 2021

    Ming Ding, Zhuoyi Yang, Wenyi Hong, Wendi Zheng, Chang Zhou, Da Yin, Junyang Lin, Xu Zou, Zhou Shao, Hongxia Yang, et al. Cogview: Mastering text-to-image generation via transformers.Advances in neural information processing systems, 34:19822–19835, 2021. 2

  18. [18]

    Domain generalization via model-agnostic learning of semantic features.Advances in neural informa- tion processing systems, 32, 2019

    Qi Dou, Daniel Coelho de Castro, Konstantinos Kamnitsas, and Ben Glocker. Domain generalization via model-agnostic learning of semantic features.Advances in neural informa- tion processing systems, 32, 2019. 2

  19. [19]

    Adversarially adaptive normal- ization for single domain generalization

    Xinjie Fan, Qifei Wang, Junjie Ke, Feng Yang, Boqing Gong, and Mingyuan Zhou. Adversarially adaptive normal- ization for single domain generalization. InProceedings of the IEEE/CVF conference on Computer Vision and Pattern Recognition, pages 8208–8217, 2021. 2

  20. [20]

    An Image is Worth One Word: Personalizing Text-to-Image Generation using Textual Inversion

    Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patash- nik, Amit H Bermano, Gal Chechik, and Daniel Cohen- Or. An image is worth one word: Personalizing text-to- image generation using textual inversion.arXiv preprint arXiv:2208.01618, 2022. 3

  21. [21]

    Domain-adversarial training of neural networks.Journal of machine learning research, 17(59):1–35, 2016

    Yaroslav Ganin, Evgeniya Ustinova, Hana Ajakan, Pas- cal Germain, Hugo Larochelle, Franc ¸ois Laviolette, Mario March, and Victor Lempitsky. Domain-adversarial training of neural networks.Journal of machine learning research, 17(59):1–35, 2016. 1, 2, 6, 7

  22. [22]

    Generative adversarial networks.Commu- nications of the ACM, 63(11):139–144, 2020

    Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks.Commu- nications of the ACM, 63(11):139–144, 2020. 2

  23. [23]

    Domaindrop: Sup- pressing domain-sensitive channels for domain generaliza- tion

    Jintao Guo, Lei Qi, and Yinghuan Shi. Domaindrop: Sup- pressing domain-sensitive channels for domain generaliza- tion. InProceedings of the IEEE/CVF international confer- ence on computer vision, pages 19114–19124, 2023. 7

  24. [24]

    Deep residual learning for image recognition

    Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceed- ings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016. 6

  25. [25]

    Augmix: A simple data processing method to improve robustness and uncertainty.arXiv preprint arXiv:1912.02781, 2019

    Dan Hendrycks, Norman Mu, Ekin D Cubuk, Barret Zoph, Justin Gilmer, and Balaji Lakshminarayanan. Augmix: A simple data processing method to improve robustness and uncertainty.arXiv preprint arXiv:1912.02781, 2019. 6

  26. [26]

    Distilling the Knowledge in a Neural Network

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distill- ing the knowledge in a neural network.arXiv preprint arXiv:1503.02531, 2015. 4

  27. [27]

    Denoising dif- fusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020

    Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising dif- fusion probabilistic models.Advances in neural information processing systems, 33:6840–6851, 2020. 4

  28. [28]

    Representation enhancement-stabilization: Reducing bias- variance of domain generalization

    Wei Huang, Yilei Shi, Zhitong Xiong, and Xiao Xiang Zhu. Representation enhancement-stabilization: Reducing bias- variance of domain generalization. 7

  29. [29]

    Self-challenging improves cross-domain generalization

    Zeyi Huang, Haohan Wang, Eric P Xing, and Dong Huang. Self-challenging improves cross-domain generalization. In Computer vision–ECCV 2020: 16th European conference, Glasgow, UK, August 23–28, 2020, proceedings, part II 16, pages 124–140. Springer, 2020. 6, 7

  30. [30]

    Undoing the dam- age of dataset bias

    Aditya Khosla, Tinghui Zhou, Tomasz Malisiewicz, Alexei A Efros, and Antonio Torralba. Undoing the dam- age of dataset bias. InComputer Vision–ECCV 2012: 12th European Conference on Computer Vision, Florence, Italy, October 7-13, 2012, Proceedings, Part I 12, pages 158–171. Springer, 2012. 2

  31. [31]

    Out-of-distribution general- ization via risk extrapolation (rex)

    David Krueger, Ethan Caballero, Joern-Henrik Jacobsen, Amy Zhang, Jonathan Binas, Dinghuai Zhang, Remi Le Priol, and Aaron Courville. Out-of-distribution general- ization via risk extrapolation (rex). InInternational confer- ence on machine learning, pages 5815–5826. PMLR, 2021. 6, 7

  32. [32]

    Deeper, broader and artier domain generaliza- tion

    Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy M Hospedales. Deeper, broader and artier domain generaliza- tion. InProceedings of the IEEE international conference on computer vision, pages 5542–5550, 2017. 5, 6

  33. [33]

    Learning to generalize: Meta-learning for do- main generalization

    Da Li, Yongxin Yang, Yi-Zhe Song, and Timothy Hospedales. Learning to generalize: Meta-learning for do- main generalization. InProceedings of the AAAI conference on artificial intelligence, 2018. 7

  34. [34]

    Prompt-driven dynamic object-centric learning for single do- main generalization

    Deng Li, Aming Wu, Yaowei Wang, and Yahong Han. Prompt-driven dynamic object-centric learning for single do- main generalization. InProceedings of the IEEE/CVF Con- ference on Computer Vision and Pattern Recognition, pages 17606–17615, 2024. 6

  35. [35]

    Uncertainty model- ing for out-of-distribution generalization.arXiv preprint arXiv:2202.03958, 2022

    Xiaotong Li, Yongxing Dai, Yixiao Ge, Jun Liu, Ying Shan, and Ling-Yu Duan. Uncertainty model- ing for out-of-distribution generalization.arXiv preprint arXiv:2202.03958, 2022. 6

  36. [36]

    Domain generalization via conditional invari- ant representations

    Ya Li, Mingming Gong, Xinmei Tian, Tongliang Liu, and Dacheng Tao. Domain generalization via conditional invari- ant representations. InProceedings of the AAAI conference on artificial intelligence, 2018. 6, 7

  37. [37]

    Transfer feature learning with joint distribution adaptation

    Mingsheng Long, Jianmin Wang, Guiguang Ding, Jiaguang Sun, and Philip S Yu. Transfer feature learning with joint distribution adaptation. InProceedings of the IEEE inter- national conference on computer vision, pages 2200–2207,

  38. [38]

    Grounding stylistic do- main generalization with quantitative domain shift measures and synthetic scene images

    Yiran Luo, Joshua Feinglass, Tejas Gokhale, Kuan-Cheng Lee, Chitta Baral, and Yezhou Yang. Grounding stylistic do- main generalization with quantitative domain shift measures and synthetic scene images. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7303–7313, 2024. 7

  39. [39]

    Reducing domain gap by reduc- ing style bias

    Hyeonseob Nam, HyunJae Lee, Jongchan Park, Wonjun Yoon, and Donggeun Yoo. Reducing domain gap by reduc- ing style bias. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 8690– 8699, 2021. 6, 7

  40. [40]

    GLIDE: Towards Photorealistic Image Generation and Editing with Text-Guided Diffusion Models

    Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models.arXiv preprint arXiv:2112.10741, 2021. 2

  41. [41]

    Permuted adain: Reducing the bias towards global statistics in image clas- sification

    Oren Nuriel, Sagie Benaim, and Lior Wolf. Permuted adain: Reducing the bias towards global statistics in image clas- sification. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9482–9491,

  42. [42]

    Transferrable prototypical networks for unsupervised domain adaptation

    Yingwei Pan, Ting Yao, Yehao Li, Yu Wang, Chong-Wah Ngo, and Tao Mei. Transferrable prototypical networks for unsupervised domain adaptation. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 2239–2247, 2019. 1

  43. [43]

    Moment matching for multi-source domain adaptation

    Xingchao Peng, Qinxun Bai, Xide Xia, Zijun Huang, Kate Saenko, and Bo Wang. Moment matching for multi-source domain adaptation. InProceedings of the IEEE/CVF inter- national conference on computer vision, pages 1406–1415,

  44. [44]

    Domain agnostic learning with disentangled repre- sentations

    Xingchao Peng, Zijun Huang, Ximeng Sun, and Kate Saenko. Domain agnostic learning with disentangled repre- sentations. InInternational conference on machine learning, pages 5102–5112. PMLR, 2019. 2

  45. [45]

    Learning to learn single domain generalization

    Fengchun Qiao, Long Zhao, and Xi Peng. Learning to learn single domain generalization. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 12556–12565, 2020. 2

  46. [46]

    Bmd: A general class-balanced multicen- tric dynamic prototype strategy for source-free domain adap- tation

    Sanqing Qu, Guang Chen, Jing Zhang, Zhijun Li, Wei He, and Dacheng Tao. Bmd: A general class-balanced multicen- tric dynamic prototype strategy for source-free domain adap- tation. InEuropean conference on computer vision, pages 165–182. Springer, 2022. 1

  47. [47]

    Modality-agnostic debiasing for sin- gle domain generalization

    Sanqing Qu, Yingwei Pan, Guang Chen, Ting Yao, Changjun Jiang, and Tao Mei. Modality-agnostic debiasing for sin- gle domain generalization. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 24142–24151, 2023. 5, 6

  48. [48]

    Upcycling models under domain and category shift

    Sanqing Qu, Tianpei Zou, Florian R ¨ohrbein, Cewu Lu, Guang Chen, Dacheng Tao, and Changjun Jiang. Upcycling models under domain and category shift. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 20019–20028, 2023. 1

  49. [49]

    Learning transferable visual models from natural language supervi- sion

    Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervi- sion. InInternational conference on machine learning, pages 8748–8763. PMLR, 2021. 4

  50. [50]

    Fishr: Invariant gradient variances for out-of-distribution generalization

    Alexandre Rame, Corentin Dancette, and Matthieu Cord. Fishr: Invariant gradient variances for out-of-distribution generalization. InInternational Conference on Machine Learning, pages 18347–18377. PMLR, 2022. 6

  51. [51]

    Zero-shot text-to-image generation

    Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea V oss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. InInternational confer- ence on machine learning, pages 8821–8831. Pmlr, 2021. 2

  52. [52]

    High-resolution image synthesis with latent diffusion models

    Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Bj ¨orn Ommer. High-resolution image synthesis with latent diffusion models. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022. 2, 3, 4

  53. [53]

    U- net: Convolutional networks for biomedical image segmen- tation

    Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U- net: Convolutional networks for biomedical image segmen- tation. InMedical image computing and computer-assisted intervention–MICCAI 2015: 18th international conference, Munich, Germany, October 5-9, 2015, proceedings, part III 18, pages 234–241. Springer, 2015. 4

  54. [54]

    Distributionally Robust Neural Networks for Group Shifts: On the Importance of Regularization for Worst-Case Generalization

    Shiori Sagawa, Pang Wei Koh, Tatsunori B Hashimoto, and Percy Liang. Distributionally robust neural networks for group shifts: On the importance of regularization for worst- case generalization.arXiv preprint arXiv:1911.08731, 2019. 6, 7

  55. [55]

    Generalizing Across Domains via Cross-Gradient Training

    Shiv Shankar, Vihari Piratla, Soumen Chakrabarti, Sid- dhartha Chaudhuri, Preethi Jyothi, and Sunita Sarawagi. Generalizing across domains via cross-gradient training. arXiv preprint arXiv:1804.10745, 2018. 2

  56. [56]

    Unknown domain inconsistency minimization for domain generalization

    Seungjae Shin, HeeSun Bae, Byeonghu Na, Yoon-Yeong Kim, and Il-chul Moon. Unknown domain inconsistency minimization for domain generalization. InThe Twelfth In- ternational Conference on Learning Representations. 5, 6

  57. [57]

    Deep coral: Correlation alignment for deep domain adaptation

    Baochen Sun and Kate Saenko. Deep coral: Correlation alignment for deep domain adaptation. InComputer Vision– ECCV 2016 Workshops: Amsterdam, The Netherlands, Oc- tober 8-10 and 15-16, 2016, Proceedings, Part III 14, pages 443–450. Springer, 2016. 6, 7

  58. [58]

    Rethinking multi- domain generalization with a general learning objective

    Zhaorui Tan, Xi Yang, and Kaizhu Huang. Rethinking multi- domain generalization with a general learning objective. In Proceedings of the IEEE/CVF Conference on Computer Vi- sion and Pattern Recognition, pages 23512–23522, 2024. 7

  59. [59]

    Springer science & business media, 2013

    Vladimir Vapnik.The nature of statistical learning theory. Springer science & business media, 2013. 6, 7

  60. [60]

    Deep hashing network for unsupervised domain adaptation

    Hemanth Venkateswara, Jose Eusebio, Shayok Chakraborty, and Sethuraman Panchanathan. Deep hashing network for unsupervised domain adaptation. InProceedings of the IEEE conference on computer vision and pattern recogni- tion, pages 5018–5027, 2017. 6

  61. [61]

    Addressing model vul- nerability to distributional shifts over image transformation sets

    Riccardo V olpi and Vittorio Murino. Addressing model vul- nerability to distributional shifts over image transformation sets. InProceedings of the IEEE/CVF International Confer- ence on Computer Vision, pages 7980–7989, 2019. 2

  62. [62]

    Generalizing to unseen domains via adversarial data augmentation.Ad- vances in neural information processing systems, 31, 2018

    Riccardo V olpi, Hongseok Namkoong, Ozan Sener, John C Duchi, Vittorio Murino, and Silvio Savarese. Generalizing to unseen domains via adversarial data augmentation.Ad- vances in neural information processing systems, 31, 2018. 2

  63. [63]

    Meta-learning for domain generalization in semantic parsing.arXiv preprint arXiv:2010.11988, 2020

    Bailin Wang, Mirella Lapata, and Ivan Titov. Meta-learning for domain generalization in semantic parsing.arXiv preprint arXiv:2010.11988, 2020. 2

  64. [64]

    Stpr: Spatiotemporal preservation and routing for exemplar-free video class-incremental learning

    Huaijie Wang, De Cheng, Guozhang Li, Zhipeng Xu, Lingfeng He, Jie Li, Nannan Wang, and Xinbo Gao. Stpr: Spatiotemporal preservation and routing for exemplar-free video class-incremental learning. InProceedings of the Four- teenth International Conference on Learning Representa- tions, 2026. 1

  65. [65]

    Generalizing to unseen domains: A survey on do- main generalization.IEEE transactions on knowledge and data engineering, 35(8):8052–8072, 2022

    Jindong Wang, Cuiling Lan, Chang Liu, Yidong Ouyang, Tao Qin, Wang Lu, Yiqiang Chen, Wenjun Zeng, and S Yu Philip. Generalizing to unseen domains: A survey on do- main generalization.IEEE transactions on knowledge and data engineering, 35(8):8052–8072, 2022. 1

  66. [66]

    Sharpness-aware gradient matching for domain generaliza- tion

    Pengfei Wang, Zhaoxiang Zhang, Zhen Lei, and Lei Zhang. Sharpness-aware gradient matching for domain generaliza- tion. InProceedings of the IEEE/CVF Conference on Com- puter Vision and Pattern Recognition, pages 3769–3778,

  67. [67]

    Learning to diversify for single do- main generalization

    Zijian Wang, Yadan Luo, Ruihong Qiu, Zi Huang, and Mahsa Baktashmotlagh. Learning to diversify for single do- main generalization. InProceedings of the IEEE/CVF In- ternational Conference on Computer Vision, pages 834–843,

  68. [68]

    Imagere- ward: Learning and evaluating human preferences for text- to-image generation.Advances in Neural Information Pro- cessing Systems, 36, 2024

    Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagere- ward: Learning and evaluating human preferences for text- to-image generation.Advances in Neural Information Pro- cessing Systems, 36, 2024. 3

  69. [69]

    Reasoning-driven multimodal llm for domain generalization

    Zhipeng Xu, Zilong Wang, Xinyang Jiang, Dongsheng Li, De Cheng, and Nannan Wang. Reasoning-driven multimodal llm for domain generalization. InThe Fourteenth Interna- tional Conference on Learning Representations, 2026. 1

  70. [70]

    Improve unsupervised domain adaptation with mixup training.arXiv preprint arXiv:2001.00677, 2020

    Shen Yan, Huan Song, Nanxiang Li, Lincan Zou, and Liu Ren. Improve unsupervised domain adaptation with mixup training.arXiv preprint arXiv:2001.00677, 2020. 6

  71. [71]

    Cutmix: Regu- larization strategy to train strong classifiers with localizable features

    Sangdoo Yun, Dongyoon Han, Seong Joon Oh, Sanghyuk Chun, Junsuk Choe, and Youngjoon Yoo. Cutmix: Regu- larization strategy to train strong classifiers with localizable features. InProceedings of the IEEE/CVF international con- ference on computer vision, pages 6023–6032, 2019. 6

  72. [72]

    mixup: Beyond empirical risk minimiza- tion

    Hongyi Zhang, Moustapha Cisse, Yann N Dauphin, and David Lopez-Paz. mixup: Beyond empirical risk minimiza- tion. InInternational Conference on Learning Representa- tions, 2018. 6

  73. [73]

    Towards principled disentanglement for domain generalization

    Hanlin Zhang, Yi-Fan Zhang, Weiyang Liu, Adrian Weller, Bernhard Sch ¨olkopf, and Eric P Xing. Towards principled disentanglement for domain generalization. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8024–8034, 2022. 1

  74. [74]

    Adaptive risk min- imization: Learning to adapt to domain shift.Advances in Neural Information Processing Systems, 34:23664–23678,

    Marvin Zhang, Henrik Marklund, Nikita Dhawan, Abhishek Gupta, Sergey Levine, and Chelsea Finn. Adaptive risk min- imization: Learning to adapt to domain shift.Advances in Neural Information Processing Systems, 34:23664–23678,

  75. [75]

    Exact feature distribution matching for arbitrary style transfer and domain generalization

    Yabin Zhang, Minghan Li, Ruihuang Li, Kui Jia, and Lei Zhang. Exact feature distribution matching for arbitrary style transfer and domain generalization. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 8035–8045, 2022. 6

  76. [76]

    Maximum-entropy adversarial data augmentation for im- proved generalization and robustness.Advances in Neural Information Processing Systems, 33:14435–14447, 2020

    Long Zhao, Ting Liu, Xi Peng, and Dimitris Metaxas. Maximum-entropy adversarial data augmentation for im- proved generalization and robustness.Advances in Neural Information Processing Systems, 33:14435–14447, 2020. 2

  77. [77]

    Domain generalization with mixstyle.arXiv preprint arXiv:2104.02008, 2021

    Kaiyang Zhou, Yongxin Yang, Yu Qiao, and Tao Xi- ang. Domain generalization with mixstyle.arXiv preprint arXiv:2104.02008, 2021. 2, 6, 7

  78. [78]

    Surrogate gap minimization improves sharpness-aware training.arXiv preprint arXiv:2203.08065,

    Juntang Zhuang, Boqing Gong, Liangzhe Yuan, Yin Cui, Hartwig Adam, Nicha Dvornek, Sekhar Tatikonda, James Duncan, and Ting Liu. Surrogate gap minimization improves sharpness-aware training.arXiv preprint arXiv:2203.08065,