Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation

Jae-Young Sim; Minyoung Oh; Najeong Chae

arxiv: 2605.18836 · v1 · pith:FPIRRQPCnew · submitted 2026-05-13 · 💻 cs.LG · cs.CV

Minyoung Oh, Najeong Chae, Jae-Young Sim

Pith reviewed 2026-05-20 20:36 UTC · model grok-4.3

keywords: dataset distillation · domain generalization · out-of-distribution generalization · spectral gradient · distribution matching · gradient surgery · domain shift

The pith

Spectral Gradient Surgery disentangles class-discriminative and domain-specific information in distilled datasets to enable out-of-distribution generalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to solve the problem of dataset distillation failing when test data comes from different domains than training data. It introduces a new setting called Domain Generalizable Dataset Distillation and proposes Spectral Gradient Surgery as a way to fix the Distribution Matching method by separating useful class features from domain-specific ones using the spectral domain of gradients. If this works, compact synthetic datasets could be used to train models that perform well even on new, unseen distributions without needing lots of real data from those domains.

Core claim

The central discovery is that cross-domain agreement among domain-wise gradients in the spectral domain can identify which components are class-discriminative and shared across domains, versus domain-specific. Based on this, Spectral Gradient Surgery augments the standard update in Distribution Matching with one gradient that reinforces the shared components and another that promotes diversity in the distilled dataset, leading to better OOD generalization.

What carries the argument

Spectral Gradient Surgery (SGS), which analyzes agreement and disagreement of gradients across domains in the spectral domain to create two complementary update terms for the distillation process.
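To make the agreement idea concrete, here is a minimal NumPy sketch. It is not the authors' implementation. It assumes that the "spectral domain" is the 2D Fourier transform of each domain's gradient map, and that agreement is the mean resultant length of the per-frequency phases across domains. The function name `spectral_agreement_split`, the `threshold` parameter, and the hard mask are hypothetical choices made only for illustration.

```python
import numpy as np

def spectral_agreement_split(domain_grads, threshold=0.8):
    """Split the domain-averaged gradient into shared and domain-specific parts.

    domain_grads: real array of shape (D, H, W), one gradient map per source domain.
    Returns (g_shared, g_specific, agreement), each of shape (H, W).
    ASSUMPTION: FFT-based spectrum and phase-agreement mask; illustrative only.
    """
    spectra = np.fft.fft2(domain_grads, axes=(-2, -1))        # complex, (D, H, W)
    magnitude = np.abs(spectra)
    # Unit phasors; frequencies with (near) zero magnitude carry no direction.
    phasors = np.where(magnitude > 1e-12,
                       spectra / np.maximum(magnitude, 1e-12), 0.0)
    # Mean resultant length: 1 if all domains share a phase, near 0 if phases scatter.
    agreement = np.abs(phasors.mean(axis=0))
    mean_spectrum = spectra.mean(axis=0)
    mask = agreement >= threshold
    g_shared = np.fft.ifft2(np.where(mask, mean_spectrum, 0.0)).real
    g_specific = np.fft.ifft2(np.where(mask, 0.0, mean_spectrum)).real
    return g_shared, g_specific, agreement

# Toy usage: a common signal plus independent per-domain noise.
rng = np.random.default_rng(0)
common = rng.normal(size=(8, 8))
grads = np.stack([common + 0.1 * rng.normal(size=(8, 8)) for _ in range(3)])
g_shared, g_specific, agreement = spectral_agreement_split(grads)
```

The mask partitions the frequencies, so the two outputs sum to the domain-averaged gradient. Whether high agreement really marks class-discriminative content is the premise under test, and the sketch does not establish it.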

If this is right

Substantially improves out-of-distribution generalization on various benchmarks of different scales.
Remains compatible as a plug-and-play addition to existing Distribution Matching methods.
Disentangles entangled class and domain information in the synthetic dataset.
Promotes internal diversity within the compact distilled set without extra augmentation overhead.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach might be adaptable to other dataset distillation techniques beyond Distribution Matching.
If successful, it could lead to more efficient ways to create robust training sets for real-world applications with domain shifts.
Future work could test whether the spectral agreement truly corresponds to class discrimination in a wider range of datasets.

Load-bearing premise

That the level of agreement between gradients from different domains in the spectral space accurately distinguishes class-discriminative features from those specific to individual domains.

What would settle it

An experiment where applying the spectral agreement-based gradient modifications fails to improve performance on out-of-distribution test sets compared to standard distillation, or where the agreement metric does not align with actual class discriminability.

Figures

Figures reproduced from arXiv: 2605.18836 by Jae-Young Sim, Minyoung Oh, Najeong Chae.

**Figure 1.** Comparison of ID and OOD performance for each target domain on PACS. [Plot labels: DM, +FACT, +MixStyle, +AdvFreq, +SGS (Ours); axes: Accuracy (%), Training Time]

**Figure 2.** Comparison of average OOD performance and downstream training time on PACS when applying DG methods post-hoc to DM-distilled datasets. We assess the robustness of a DM-distilled dataset with 10 images per class against domain shifts on the PACS [Li et al., 2017] dataset under two evaluation settings: in-distribution (ID), evaluated on source domains, and out-of-distribution (OOD), evaluated on an unseen tar…

**Figure 3.** UMAP visualization of feature statistics on the Digits-DG dataset. Each color indicates a distinct domain. Distribution Matching (DM) [Zhao and Bilen, 2023] updates a synthetic dataset $\hat{D}$ by minimizing the discrepancy between the feature distributions of synthetic and real data, computed per class: $\mathcal{L}_{\mathrm{DM}} = \sum_{c=1}^{C} \dots$

**Figure 4.** Visualization of distilled Digits-DG images (IPC=10).

**Figure 6.** Performance changes according to (a) $\lambda_c$ and (b) $\lambda_d$ on Digits-DG. 4.4 Further Analysis. Visualization of $g$ and $g_{\text{class}}$. To verify whether the proposed spectral decomposition effectively isolates class-discriminative information, we visualize the standard DM gradient $g$ and the class-discriminative signal $g_{\text{class}}$ extracted by SGS in …

**Figure 7.** Visual examples from the three benchmark datasets. For each dataset, rows correspond …

**Figure 8.** Performance changes according to K.

**Figure 9.** Visualization of distilled images on the Digits-DG under the SDG setting (IPC=10).

**Figure 10.** Comparison of average OOD accuracy and downstream training time among various …

**Figure 11.** Distilled images of our proposed SGS upon DM when IPC=10 under the MDG setting. Each …

**Figure 12.** Distilled images of our proposed SGS upon DM when IPC=10 under the SDG setting. Each …
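The Distribution Matching objective quoted under Figure 3 is cut off on this page, so the sketch below uses the commonly stated mean-embedding form of DM: for each class, the squared distance between the mean embedding of real samples and that of synthetic samples. Whether this paper uses exactly that form is an assumption. The function `dm_loss`, the random ReLU embedding `embed`, and the toy data are hypothetical stand-ins.

```python
import numpy as np

def dm_loss(real_x, real_y, syn_x, syn_y, embed):
    """Sum over classes of the squared distance between mean embeddings."""
    loss = 0.0
    for c in np.unique(syn_y):
        real_mean = embed(real_x[real_y == c]).mean(axis=0)
        syn_mean = embed(syn_x[syn_y == c]).mean(axis=0)
        loss += float(np.sum((real_mean - syn_mean) ** 2))
    return loss

rng = np.random.default_rng(0)
W = rng.normal(size=(16, 32))          # hypothetical random embedding weights

def embed(x):
    """Stand-in for a randomly initialised feature extractor (assumption)."""
    return np.maximum(x @ W, 0.0)

# Toy data: 200 real samples over 4 classes, 2 synthetic samples per class.
real_x = rng.normal(size=(200, 16))
real_y = rng.integers(0, 4, size=200)
syn_x = rng.normal(size=(8, 16))
syn_y = np.repeat(np.arange(4), 2)

print(dm_loss(real_x, real_y, syn_x, syn_y, embed))
```

Because only per-class means are matched, nothing in this objective distinguishes class-level from domain-level statistics. That is consistent with the entanglement the paper attributes to DM.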

Original abstract

Dataset Distillation (DD) synthesizes a compact synthetic dataset that preserves the training utility of a full dataset. However, its standard formulation assumes that test data follow the same distribution as training data, an assumption that rarely holds in practice. A straightforward extension, applying post-hoc Domain Generalization (DG) techniques to distilled data, is ill-suited because existing DG methods rely on the natural diversity of real datasets, which compact synthetic sets inherently lack, while also incurring substantial augmentation overhead that conflicts with the efficiency objective of dataset distillation. To address this limitation, we introduce Domain Generalizable Dataset Distillation (DGDD), a new problem setting that explicitly targets out-of-distribution (OOD) generalization of distilled datasets. We study this problem through a widely adopted DD baseline of Distribution Matching (DM). We attribute the OOD vulnerability of DM to the entanglement of class-discriminative and domain-specific information within the compressed synthetic set, and propose Spectral Gradient Surgery (SGS) to disentangle the two. The key insight of SGS is that cross-domain agreement among domain-wise gradients in the spectral domain reveals which gradient components are shared across source domains, and are therefore class-discriminative, and which are domain-specific. Based on this observation, SGS augments the standard DM update with two complementary gradients: one that reinforces cross-domain shared components and another that explicitly promotes diversity within the distilled dataset. Extensive experiments on diverse-scale benchmarks demonstrate that SGS substantially improves OOD generalization while remaining plug-and-play compatible with existing DM methods.
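The abstract describes the SGS update as the DM gradient plus two extra terms. The sketch below shows one way such a combined step could look; it does not reproduce the paper's formulas. The weights `lam_c` and `lam_d` mirror the $\lambda_c$ and $\lambda_d$ named in the Figure 6 caption. The diversity term is an assumed stand-in (the negative mean pairwise squared distance among synthetic samples), and `diversity_grad` and `sgs_step` are hypothetical names.

```python
import numpy as np

def diversity_grad(syn):
    """Gradient of L_div = -(1/n^2) * sum_{i,j} ||x_i - x_j||^2 w.r.t. each x_i.

    ASSUMPTION: this repulsion loss is a placeholder for the paper's diversity term.
    """
    n = syn.shape[0]
    return -(4.0 / n) * (syn - syn.mean(axis=0, keepdims=True))

def sgs_step(syn, g_dm, g_shared, lam_c=1.0, lam_d=0.1, lr=0.01):
    """One gradient-descent step on the synthetic set with three combined terms."""
    update = g_dm + lam_c * g_shared + lam_d * diversity_grad(syn)
    return syn - lr * update

# Toy usage: with zero DM and shared gradients, the step only spreads samples apart.
rng = np.random.default_rng(1)
syn = rng.normal(size=(5, 3))
zeros = np.zeros_like(syn)
spread_out = sgs_step(syn, zeros, zeros)
```

Setting `lam_c` or `lam_d` to zero disables the corresponding term, which is the kind of component-wise ablation the referee report asks for.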

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper carves out DGDD as a new setting and adds a spectral tweak to DM that empirically lifts OOD performance, but the mapping from gradient agreement to class semantics stays intuitive rather than derived.

The paper's main contribution is framing dataset distillation under domain shift as its own problem (DGDD) and then showing how to modify the standard distribution-matching update with a spectral gradient surgery step. SGS looks at agreement and disagreement among domain-wise gradients in the spectral domain, reinforces the shared parts, and adds a term that promotes diversity within the distilled set. This keeps the method compatible with existing DM pipelines without extra data augmentation overhead, which is a practical plus for anyone already running distillation at scale.

The experiments are described as covering diverse benchmarks and delivering clear OOD gains, so the empirical case appears to land if the numbers check out in the full text. What the work does cleanly is identify a real mismatch: standard DG tricks assume rich natural variation that synthetic sets lack, so a distillation-specific fix makes sense.

The soft spot is the central premise. The authors state that spectral agreement isolates class-discriminative components while disagreement flags domain-specific ones, yet the abstract gives no derivation or supporting argument for why that correspondence must hold rather than reflect correlated artifacts or training dynamics. The stress-test note is on target here; if the mapping is off, both added gradient terms could amplify the wrong directions. I would want to see targeted ablations that turn the spectral step on and off and check whether the gains survive when the domains are more subtly shifted.

This is squarely for the dataset-distillation crowd that cares about robustness beyond i.i.d. test sets. It is novel enough and addresses a genuine gap, so it deserves a serious referee even if the justification for the spectral rule needs tightening in revision.

Referee Report

2 major / 2 minor

Summary. The paper introduces Domain Generalizable Dataset Distillation (DGDD) as a new problem setting targeting OOD generalization for distilled datasets. Focusing on the Distribution Matching (DM) baseline, it attributes OOD vulnerability to entanglement of class-discriminative and domain-specific information, and proposes Spectral Gradient Surgery (SGS) that augments the DM update by reinforcing cross-domain shared spectral gradient components (claimed to be class-discriminative) while adding a term to promote diversity in disagreeing components (claimed domain-specific). The method is presented as plug-and-play compatible with existing DM approaches, with experiments on diverse-scale benchmarks claimed to demonstrate substantial OOD improvements.

Significance. If the spectral agreement mapping holds and the reported gains prove robust, this would meaningfully advance dataset distillation toward real-world applicability under domain shifts, offering an efficient alternative to post-hoc DG techniques that depend on natural data diversity.

major comments (2)

[Abstract / Method] Abstract and method description: The central premise that cross-domain agreement among domain-wise gradients in the spectral domain isolates class-discriminative components (while disagreement isolates domain-specific ones) is stated as the key insight but lacks any derivation, proof, or targeted validation showing why this correspondence holds rather than capturing correlated artifacts or optimization biases. This assumption directly justifies the two added gradient terms and is load-bearing for the SGS construction.
[Experiments] Experimental evaluation: The claims of substantial OOD generalization improvements rest on reported benchmark results without error bars, ablation studies isolating the contribution of each SGS term, or implementation details for the spectral decomposition, making it impossible to assess whether the gains are reliable or reproducible.

minor comments (2)

[Introduction] Clarify in the introduction how DGDD differs from simply applying existing DG methods to distilled data, beyond the efficiency argument.
[Notation] Ensure all spectral-domain notation (e.g., gradient components, agreement metrics) is defined consistently before use in equations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating the revisions we will make to strengthen the manuscript.

Referee: [Abstract / Method] Abstract and method description: The central premise that cross-domain agreement among domain-wise gradients in the spectral domain isolates class-discriminative components (while disagreement isolates domain-specific ones) is stated as the key insight but lacks any derivation, proof, or targeted validation showing why this correspondence holds rather than capturing correlated artifacts or optimization biases. This assumption directly justifies the two added gradient terms and is load-bearing for the SGS construction.

Authors: We acknowledge that the central premise is presented primarily as an empirical observation supported by the method's design and downstream OOD gains rather than a formal derivation. The intuition arises from the fact that class-discriminative signals tend to produce consistent gradient directions across domains in the spectral domain, while domain-specific cues produce disagreements; this is consistent with frequency-based analyses in related DG literature. We agree that stronger targeted validation is warranted. In the revised manuscript we will expand the method section with additional gradient spectrum visualizations across domains, include new experiments that quantify class discrimination of the reinforced versus diversified components, and provide a more detailed heuristic justification. A complete theoretical proof of exact isolation is beyond the current empirical scope but we will strengthen the supporting evidence. revision: partial
Referee: [Experiments] Experimental evaluation: The claims of substantial OOD generalization improvements rest on reported benchmark results without error bars, ablation studies isolating the contribution of each SGS term, or implementation details for the spectral decomposition, making it impossible to assess whether the gains are reliable or reproducible.

Authors: We thank the referee for this observation on experimental reporting. The current results are averaged over multiple random seeds but lack explicit error bars and component-wise ablations. In the revision we will add standard error bars to all tables and figures, introduce ablation studies that separately disable the shared-component reinforcement term and the diversity-promotion term, and include pseudocode plus hyperparameter settings for the spectral decomposition (FFT windowing, frequency thresholding, etc.) in the appendix. These changes will improve verifiability and reproducibility. revision: yes
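The error bars promised in this response are usually computed as the mean and standard error over seeds. The sketch below shows that arithmetic only. The function name `mean_and_stderr` and the per-seed values are hypothetical placeholders, not results from the paper.

```python
import numpy as np

def mean_and_stderr(values):
    """Return (mean, standard error of the mean) using the sample std (ddof=1)."""
    values = np.asarray(values, dtype=float)
    return values.mean(), values.std(ddof=1) / np.sqrt(values.size)

# Placeholder per-seed accuracies; NOT taken from the paper.
runs = {"method_a": [1.0, 2.0, 3.0], "method_b": [2.0, 2.5, 3.0]}
for name, accs in runs.items():
    mean, stderr = mean_and_stderr(accs)
    print(f"{name}: {mean:.2f} ± {stderr:.2f}")
```

With only a handful of seeds the standard error is itself noisy, so reporting the number of seeds alongside each figure helps readers judge the gains.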

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained augmentation of baseline

The paper presents SGS as a heuristic augmentation to the existing DM baseline, motivated by an explicit assumption about spectral gradient agreement mapping to class-discriminative vs. domain-specific components. No equations are shown that define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing steps reduce to self-citations or prior author work by construction. The central claim rests on the stated insight and empirical validation rather than tautological reduction, making the derivation independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unproven premise that spectral-domain gradient agreement separates class-discriminative from domain-specific information; no free parameters or invented entities are mentioned in the abstract.

axioms (1)

Domain assumption: Cross-domain agreement among domain-wise gradients in the spectral domain reveals class-discriminative components.
This premise is stated as the key insight that justifies the two added gradient terms.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "The key insight of SGS is that cross-domain agreement among domain-wise gradients in the spectral domain reveals which gradient components are shared across source domains, and are therefore class-discriminative, and which are domain-specific."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
Relation between the paper passage and the cited Recognition theorem: unclear.
Paper passage: "SGS augments the standard DM update with two complementary gradients: one that reinforces cross-domain shared components and another that explicitly promotes diversity within the distilled dataset."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 3 internal anchors

[1] Dataset Distillation. arXiv preprint arXiv:1811.10959.
[2] Dataset Condensation with Gradient Matching. Proceedings of the International Conference on Learning Representations (ICLR).
[3] Dataset Condensation with Differentiable Siamese Augmentation. Proceedings of the International Conference on Machine Learning (ICML).
[4] Dataset Distillation by Matching Training Trajectories. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[5] Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory. Proceedings of the International Conference on Machine Learning (ICML).
[6] Wang, Kai and Zhao, Bo and Peng, Xiangyu and Zhu, Zheng and Yang, Shuo and Wang, Shuo and Huang, Guan and Bilen, Hakan and Wang, Xinchao and You, Yang.
[7] Dataset Condensation with Distribution Matching. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
[8] Hyperbolic Dataset Distillation. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS).
[9] Improved Distribution Matching for Dataset Condensation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[10] Sajedi, Ahmad and Khaki, Samir and Amjadian, Ehsan and Liu, Lucy Z. and Lawryshyn, Yuri A. and Plataniotis, Konstantinos N.
[11] Dataset Distillation with Neural Characteristic Function: A Minmax Perspective. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
[12] Zhang, Hansong and Li, Shikun and Wang, Pengju and Zeng, Dan and Ge, Shiming.
[13] Learning to Generate Novel Domains for Domain Generalization. European Conference on Computer Vision.
[14] Deeper, Broader and Artier Domain Generalization. Proceedings of the IEEE International Conference on Computer Vision.
[15] Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 2002.
[16] Unsupervised Domain Adaptation by Backpropagation. International Conference on Machine Learning, 2015.
[17] Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
[18] Core50: A New Dataset and Benchmark for Continuous Object Recognition. Conference on Robot Learning, 2017.
[19] A Fourier-Based Framework for Domain Generalization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition.
[20] DAM: Domain-Aware Module for Multi-Domain Dataset Condensation. arXiv preprint arXiv:2505.22387.
[21] Domain Generalization with MixStyle. Proceedings of the International Conference on Learning Representations (ICLR).
[22] Towards Combating Frequency Simplicity-Biased Learning for Domain Generalization. Advances in Neural Information Processing Systems.
[23] AdvST: Revisiting Data Augmentations for Single Domain Generalization. Proceedings of the AAAI Conference on Artificial Intelligence.
[24] Herding Dynamical Weights to Learn. Proceedings of the International Conference on Machine Learning (ICML).
[25] UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
[26] Directional Statistics. 2009.
[27] Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation. Proceedings of the International Conference on Learning Representations (ICLR).
[28] Invariant Risk Minimization. arXiv preprint arXiv:1907.02893.
[29] Algorithm AS 136: A K-Means Clustering Algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 1979.
[30] Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization. Proceedings of the International Conference on Machine Learning (ICML).
[31] Gradient Matching for Domain Generalization. Proceedings of the International Conference on Learning Representations (ICLR).
