pith.

arxiv: 2605.18836 · v1 · pith:FPIRRQPC · new · submitted 2026-05-13 · 💻 cs.LG · cs.CV

Spectral Gradient Surgery for Domain-Generalizable Dataset Distillation

Pith reviewed 2026-05-20 20:36 UTC · model grok-4.3

classification 💻 cs.LG cs.CV
keywords dataset distillation · domain generalization · out-of-distribution generalization · spectral gradient · distribution matching · gradient surgery · domain shift

The pith

Spectral Gradient Surgery disentangles class-discriminative and domain-specific information in distilled datasets to enable out-of-distribution generalization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper targets a failure mode of dataset distillation: models trained on distilled data degrade when test data come from domains different from the training data. It introduces a new setting, Domain Generalizable Dataset Distillation (DGDD). It also proposes Spectral Gradient Surgery (SGS), which modifies the Distribution Matching (DM) method by using the spectral domain of gradients to separate class-discriminative features from domain-specific ones. If this works, compact synthetic datasets could train models that perform well on new, unseen distributions without any training data from those domains.

Core claim

The central claim is that cross-domain agreement among domain-wise gradients in the spectral domain can separate two kinds of gradient components. Components shared across source domains are treated as class-discriminative; the rest are treated as domain-specific. Building on this, Spectral Gradient Surgery augments the standard Distribution Matching update with two gradients. One reinforces the shared components, and the other promotes diversity within the distilled dataset. Together they are claimed to improve OOD generalization.
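The core claim can be made concrete with a small sketch. This summary does not specify the agreement measure SGS uses, so the code below assumes one.

- Per-frequency agreement is taken to be the mean resultant length, a directional-statistics quantity, of the unit-normalized Fourier coefficients of the domain-wise gradients.
- Frequencies whose agreement reaches a hypothetical threshold `tau` are treated as shared.
- The function names, `tau`, `eps`, and the (D, H, W) gradient layout are all illustrative assumptions, not the authors' implementation.

```python
import numpy as np


def spectral_agreement(domain_grads, eps=1e-8):
    """Per-frequency agreement of domain-wise gradients (assumed metric).

    domain_grads: array of shape (D, H, W), one gradient map per source domain.
    Returns (agreement, spectra): agreement has shape (H, W) with values in [0, 1];
    spectra holds the complex 2-D FFT of each domain's gradient.
    """
    spectra = np.fft.fft2(domain_grads, axes=(-2, -1))
    # Unit phasors keep only the phase of each Fourier coefficient;
    # eps keeps near-zero coefficients from contributing a spurious phase.
    phasors = spectra / (np.abs(spectra) + eps)
    # Mean resultant length across domains: near 1 when all domains share a phase,
    # near 0 when their phases cancel out.
    agreement = np.abs(phasors.mean(axis=0))
    return agreement, spectra


def split_gradients(domain_grads, tau=0.9, eps=1e-8):
    """Split gradients into a cross-domain shared part and per-domain specific parts."""
    agreement, spectra = spectral_agreement(domain_grads, eps)
    shared_mask = agreement >= tau
    # Shared component: domain-averaged spectrum restricted to agreeing frequencies.
    g_shared = np.fft.ifft2(spectra.mean(axis=0) * shared_mask).real
    # Domain-specific components: each domain's spectrum at the remaining frequencies.
    g_specific = np.fft.ifft2(spectra * ~shared_mask, axes=(-2, -1)).real
    return g_shared, g_specific, shared_mask


if __name__ == "__main__":
    n = 16
    rows, cols = np.meshgrid(np.arange(n), np.arange(n), indexing="ij")
    # Toy "class" pattern shared by every domain (column frequency 2).
    shared = np.cos(2 * np.pi * 2 * cols / n)
    # Toy "domain" patterns: row frequency 5 with evenly spaced phases per domain.
    specific = np.stack(
        [np.cos(2 * np.pi * 5 * rows / n + 2 * np.pi * d / 3) for d in range(3)]
    )
    g_shared, g_specific, mask = split_gradients(shared + specific)
    print(np.allclose(g_shared, shared), np.allclose(g_specific, specific))
    # expected: True True
```

In this toy setting the shared pattern has the same phase in every domain, so its frequencies score an agreement near 1. The domain patterns have phases spread evenly around the circle, so their agreement is near 0. Whether real DM gradients behave this cleanly is exactly the load-bearing premise discussed below.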

What carries the argument

Spectral Gradient Surgery (SGS), which analyzes agreement and disagreement of gradients across domains in the spectral domain to create two complementary update terms for the distillation process.

If this is right

  • Substantially improves out-of-distribution generalization on various benchmarks of different scales.
  • Remains compatible as a plug-and-play addition to existing Distribution Matching methods.
  • Disentangles class-discriminative and domain-specific information in the synthetic dataset.
  • Promotes internal diversity within the compact distilled set without extra augmentation overhead.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This approach might be adaptable to other dataset distillation techniques beyond Distribution Matching.
  • If successful, it could lead to more efficient ways to create robust training sets for real-world applications with domain shifts.
  • Future work could test whether the spectral agreement truly corresponds to class discrimination in a wider range of datasets.

Load-bearing premise

That the level of agreement between gradients from different domains in the spectral space accurately distinguishes class-discriminative features from those specific to individual domains.

What would settle it

An experiment where applying the spectral agreement-based gradient modifications fails to improve performance on out-of-distribution test sets compared to standard distillation, or where the agreement metric does not align with actual class discriminability.

Figures

Figures reproduced from arXiv: 2605.18836 by Jae-Young Sim, Minyoung Oh, Najeong Chae.

Figure 1. Comparison of ID and OOD performance for each target domain on PACS.

Figure 2. Comparison of average OOD performance and downstream training time on PACS when applying DG methods post-hoc to DM-distilled datasets.
  Methods plotted: DM, +FACT, +MixStyle, +AdvFreq, +SGS (Ours).
  Axes: Accuracy (%) and Training Time.
  Training-time labels, in plot order: 17.9s, 37.2s, 21.2s, 124.4s, 17.9s.
  We assess the robustness of DM-distilled dataset with 10 images-per-class against domain shifts on the PACS [Li et al., 2017] dataset under two evaluation settings: in-distribution (ID) (evaluated on source domains) and out-of-distribution (OOD) (evaluated on an unseen tar…

Figure 3. UMAP visualization of feature statistics on the Digits-DG dataset. Each color indicates a distinct domain.
  Distribution Matching (DM) [Zhao and Bilen, 2023] updates a synthetic dataset $\hat{\mathcal{D}}$ by minimizing the discrepancy between the feature distributions of synthetic and real data, computed per class: $\mathcal{L}_{\mathrm{DM}} = \sum_{c=1}^{C} \ldots$

Figure 4. Visualization of distilled Digits-DG images (IPC=10).

Figure 6. Performance changes according to (a) λc and (b) λd on Digits-DG.
  4.4 Further Analysis. Visualization of $g$ and $g^{\mathrm{class}}$. To verify whether the proposed spectral decomposition effectively isolates class-discriminative information, we visualize the standard DM gradient $g$ and the class-discriminative signal $g^{\mathrm{class}}$ extracted by SGS in …

Figure 7. Visual examples from the three benchmark datasets. For each dataset, rows correspond …

Figure 8. Performance changes according to K.

Figure 9. Visualization of distilled images on the Digits-DG under the SDG setting (IPC=10).

Figure 10. Comparison of average OOD accuracy and downstream training time among various …

Figure 11. Distilled Images of our proposed SGS upon DM when IPC=10 under MDG setting. Each …

Figure 12. Distilled Images of our proposed SGS upon DM when IPC=10 under SDG setting. Each …
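The plot residue from the PACS comparison of post-hoc DG methods (Figure 2) lists five methods: DM, +FACT, +MixStyle, +AdvFreq, and +SGS (Ours). It also lists downstream training times of 17.9s, 37.2s, 21.2s, 124.4s, and 17.9s. Assuming the times map to the methods in plot order, the short sketch below expresses each as a multiple of the DM time. This makes the abstract's augmentation-overhead argument concrete. The dictionary and function names are illustrative.

```python
# Downstream training times on PACS, read off the Figure 2 plot labels.
# Assumed mapping: times listed in the same order as the plotted methods.
TIMES_S = {
    "DM": 17.9,
    "+FACT": 37.2,
    "+MixStyle": 21.2,
    "+AdvFreq": 124.4,
    "+SGS (Ours)": 17.9,
}


def overhead_vs_baseline(times, baseline="DM"):
    """Return each method's training time as a multiple of the baseline's time."""
    base = times[baseline]
    return {name: round(t / base, 2) for name, t in times.items()}


if __name__ == "__main__":
    for name, ratio in overhead_vs_baseline(TIMES_S).items():
        print(f"{name}: {ratio}x")
```

Under that assumed mapping, +SGS matches the DM downstream time (1.0x) and +MixStyle adds little (1.18x). +FACT roughly doubles it (2.08x), and +AdvFreq takes about 6.95 times as long.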
Original abstract

Dataset Distillation (DD) synthesizes a compact synthetic dataset that preserves the training utility of a full dataset. However, its standard formulation assumes that test data follow the same distribution as training data, an assumption that rarely holds in practice. A straightforward extension, applying post-hoc Domain Generalization (DG) techniques to distilled data, is ill-suited because existing DG methods rely on the natural diversity of real datasets, which compact synthetic sets inherently lack, while also incurring substantial augmentation overhead that conflicts with the efficiency objective of dataset distillation. To address this limitation, we introduce Domain Generalizable Dataset Distillation (DGDD), a new problem setting that explicitly targets out-of-distribution (OOD) generalization of distilled datasets. We study this problem through a widely adopted DD baseline of Distribution Matching (DM). We attribute the OOD vulnerability of DM to the entanglement of class-discriminative and domain-specific information within the compressed synthetic set, and propose Spectral Gradient Surgery (SGS) to disentangle the two. The key insight of SGS is that cross-domain agreement among domain-wise gradients in the spectral domain reveals which gradient components are shared across source domains, and therefore class-discriminative, and which are domain-specific. Based on this observation, SGS augments the standard DM update with two complementary gradients: one that reinforces cross-domain shared components and another that explicitly promotes diversity within the distilled dataset. Extensive experiments on diverse-scale benchmarks demonstrate that SGS substantially improves OOD generalization while remaining plug-and-play compatible with existing DM methods.
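The abstract builds on the standard DM objective. DM matches the per-class mean embeddings of real and synthetic images under randomly initialized networks. The NumPy sketch below makes two simplifications:

- It assumes a linear embedding ψ(x) = Wx in place of a network, so the gradient with respect to the synthetic samples has a closed form.
- `augmented_update` is a hypothetical illustration of adding two weighted terms to the DM gradient. This summary does not say how SGS computes those terms. The weight names only mirror the λc and λd varied in Figure 6.

```python
import numpy as np


def dm_loss_and_grad(real_x, real_y, syn_x, syn_y, W):
    """Per-class Distribution Matching loss with a linear embedding psi(x) = W @ x.

    real_x: (N, d) real samples, real_y: (N,) integer labels
    syn_x:  (M, d) synthetic samples, syn_y: (M,) integer labels
    W:      (k, d) embedding weights, a stand-in for a randomly initialised network
    Returns the scalar loss and its gradient with respect to syn_x.
    """
    loss = 0.0
    grad = np.zeros_like(syn_x)
    for c in np.unique(syn_y):
        real_c = real_x[real_y == c]
        idx = np.flatnonzero(syn_y == c)
        # Gap between the mean real and mean synthetic embeddings of class c.
        diff = (real_c @ W.T).mean(axis=0) - (syn_x[idx] @ W.T).mean(axis=0)
        loss += float(diff @ diff)
        # d||diff||^2 / ds_i = -(2 / n_c) * W.T @ diff for each synthetic s_i of class c.
        grad[idx] = -(2.0 / len(idx)) * (W.T @ diff)
    return loss, grad


def augmented_update(g_dm, g_class, g_div, lam_c=1.0, lam_d=1.0):
    """Hypothetical SGS-style step direction: DM gradient plus two weighted extra terms."""
    return g_dm + lam_c * g_class + lam_d * g_div
```

With `lam_c = lam_d = 0` the step reduces to plain DM. Zeroing one weight at a time is one way to express, in this sketch, the term-by-term ablations that the rebuttal below proposes.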

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces Domain Generalizable Dataset Distillation (DGDD) as a new problem setting targeting OOD generalization for distilled datasets. Focusing on the Distribution Matching (DM) baseline, it attributes DM's OOD vulnerability to the entanglement of class-discriminative and domain-specific information. It then proposes Spectral Gradient Surgery (SGS), which augments the DM update in two ways. First, it reinforces cross-domain shared spectral gradient components, claimed to be class-discriminative. Second, it adds a term that promotes diversity in the disagreeing components, claimed to be domain-specific. The method is presented as plug-and-play compatible with existing DM approaches, and experiments on diverse-scale benchmarks are claimed to demonstrate substantial OOD improvements.

Significance. If the spectral agreement mapping holds and the reported gains prove robust, this would meaningfully advance dataset distillation toward real-world applicability under domain shifts, offering an efficient alternative to post-hoc DG techniques that depend on natural data diversity.

major comments (2)
  1. [Abstract / Method] Abstract and method description: The central premise that cross-domain agreement among domain-wise gradients in the spectral domain isolates class-discriminative components (while disagreement isolates domain-specific ones) is stated as the key insight but lacks any derivation, proof, or targeted validation showing why this correspondence holds rather than capturing correlated artifacts or optimization biases. This assumption directly justifies the two added gradient terms and is load-bearing for the SGS construction.
  2. [Experiments] Experimental evaluation: The claims of substantial OOD generalization improvements rest on reported benchmark results without error bars, ablation studies isolating the contribution of each SGS term, or implementation details for the spectral decomposition, making it impossible to assess whether the gains are reliable or reproducible.
minor comments (2)
  1. [Introduction] Clarify in the introduction how DGDD differs from simply applying existing DG methods to distilled data, beyond the efficiency argument.
  2. [Notation] Ensure all spectral-domain notation (e.g., gradient components, agreement metrics) is defined consistently before use in equations.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment below, indicating the revisions we will make to strengthen the manuscript.

  1. Referee: [Abstract / Method] Abstract and method description: The central premise that cross-domain agreement among domain-wise gradients in the spectral domain isolates class-discriminative components (while disagreement isolates domain-specific ones) is stated as the key insight but lacks any derivation, proof, or targeted validation showing why this correspondence holds rather than capturing correlated artifacts or optimization biases. This assumption directly justifies the two added gradient terms and is load-bearing for the SGS construction.

    Authors: We acknowledge that the central premise is presented primarily as an empirical observation, supported by the method's design and downstream OOD gains, rather than as a formal derivation. The intuition is that class-discriminative signals tend to produce consistent gradient directions across domains in the spectral domain, while domain-specific cues produce disagreements; this is consistent with frequency-based analyses in related DG literature. We agree that stronger targeted validation is warranted. In the revised manuscript we will expand the method section with additional gradient spectrum visualizations across domains. We will also include new experiments that quantify the class discriminability of the reinforced versus diversified components, and provide a more detailed heuristic justification. A complete theoretical proof of exact isolation is beyond the current empirical scope, but we will strengthen the supporting evidence. revision: partial

  2. Referee: [Experiments] Experimental evaluation: The claims of substantial OOD generalization improvements rest on reported benchmark results without error bars, ablation studies isolating the contribution of each SGS term, or implementation details for the spectral decomposition, making it impossible to assess whether the gains are reliable or reproducible.

    Authors: We thank the referee for this observation on experimental reporting. The current results are averaged over multiple random seeds but lack explicit error bars and component-wise ablations. In the revision we will add standard error bars to all tables and figures, introduce ablation studies that separately disable the shared-component reinforcement term and the diversity-promotion term, and include pseudocode plus hyperparameter settings for the spectral decomposition (FFT windowing, frequency thresholding, etc.) in the appendix. These changes will improve verifiability and reproducibility. revision: yes

Circularity Check

0 steps flagged

No significant circularity: the derivation is a self-contained augmentation of the baseline.


The paper presents SGS as a heuristic augmentation to the existing DM baseline, motivated by an explicit assumption about spectral gradient agreement mapping to class-discriminative vs. domain-specific components. No equations are shown that define a quantity in terms of itself, no fitted parameters are relabeled as predictions, and no load-bearing steps reduce to self-citations or prior author work by construction. The central claim rests on the stated insight and empirical validation rather than tautological reduction, making the derivation independent of its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unproven premise that spectral-domain gradient agreement separates class-discriminative from domain-specific information; no free parameters or invented entities are mentioned in the abstract.

axioms (1)
  • domain assumption: Cross-domain agreement among domain-wise gradients in the spectral domain reveals class-discriminative components.
    This premise is stated as the key insight that justifies the two added gradient terms.

pith-pipeline@v0.9.0 · 5800 in / 1289 out tokens · 46765 ms · 2026-05-20T20:36:54.594677+00:00 · methodology



Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 3 internal anchors

  1. Dataset Distillation. arXiv preprint arXiv:1811.10959.
  2. Dataset Condensation with Gradient Matching. Proceedings of the International Conference on Learning Representations (ICLR).
  3. Dataset Condensation with Differentiable Siamese Augmentation. Proceedings of the International Conference on Machine Learning (ICML).
  4. Dataset Distillation by Matching Training Trajectories. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  5. Scaling Up Dataset Distillation to ImageNet-1K with Constant Memory. Proceedings of the International Conference on Machine Learning (ICML).
  6. Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, and Yang You.
  7. Dataset Condensation with Distribution Matching. Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV).
  8. Hyperbolic Dataset Distillation. Proceedings of the Advances in Neural Information Processing Systems (NeurIPS).
  9. Improved Distribution Matching for Dataset Condensation. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  10. Ahmad Sajedi, Samir Khaki, Ehsan Amjadian, Lucy Z. Liu, Yuri A. Lawryshyn, and Konstantinos N. Plataniotis.
  11. Dataset Distillation with Neural Characteristic Function: A Minmax Perspective. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  12. Hansong Zhang, Shikun Li, Pengju Wang, Dan Zeng, and Shiming Ge.
  13. Learning to Generate Novel Domains for Domain Generalization. European Conference on Computer Vision (ECCV).
  14. Deeper, Broader and Artier Domain Generalization. Proceedings of the IEEE International Conference on Computer Vision (ICCV).
  15. Gradient-Based Learning Applied to Document Recognition. Proceedings of the IEEE, 2002.
  16. Unsupervised Domain Adaptation by Backpropagation. International Conference on Machine Learning (ICML), 2015.
  17. Reading Digits in Natural Images with Unsupervised Feature Learning. NIPS Workshop on Deep Learning and Unsupervised Feature Learning, 2011.
  18. CORe50: A New Dataset and Benchmark for Continuous Object Recognition. Conference on Robot Learning (CoRL), 2017.
  19. A Fourier-Based Framework for Domain Generalization. Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR).
  20. DAM: Domain-Aware Module for Multi-Domain Dataset Condensation. arXiv preprint arXiv:2505.22387.
  21. Domain Generalization with MixStyle. Proceedings of the International Conference on Learning Representations (ICLR).
  22. Towards Combating Frequency Simplicity-Biased Learning for Domain Generalization. Advances in Neural Information Processing Systems (NeurIPS).
  23. AdvST: Revisiting Data Augmentations for Single Domain Generalization. Proceedings of the AAAI Conference on Artificial Intelligence.
  24. Herding Dynamical Weights to Learn. Proceedings of the International Conference on Machine Learning (ICML).
  25. UMAP: Uniform Manifold Approximation and Projection for Dimension Reduction. arXiv preprint arXiv:1802.03426.
  26. Directional Statistics. 2009.
  27. Asymmetric Synthetic Data Update for Domain Incremental Dataset Distillation. Proceedings of the International Conference on Learning Representations (ICLR).
  28. Invariant Risk Minimization. arXiv preprint arXiv:1907.02893.
  29. Algorithm AS 136: A k-Means Clustering Algorithm. Journal of the Royal Statistical Society, Series C (Applied Statistics), 1979.
  30. Fishr: Invariant Gradient Variances for Out-of-Distribution Generalization. Proceedings of the International Conference on Machine Learning (ICML).
  31. Gradient Matching for Domain Generalization. Proceedings of the International Conference on Learning Representations (ICLR).