Pool-Select-Refine for Allocation-Aware Generative Dataset Distillation

Shunsuke Sakai; Tatsuhito Hasegawa; Wenmin Li; Zhongkai Zhao

arxiv: 2606.01920 · v2 · pith:QQIST4GJnew · submitted 2026-06-01 · 💻 cs.CV


Pith reviewed 2026-06-29 05:32 UTC · model grok-4.3


keywords dataset distillation · diffusion models · generative dataset distillation · image classification · synthetic datasets · latent space refinement · budget allocation


The pith

A Pool-Select-Refine framework improves diffusion-based dataset distillation by building an over-complete candidate pool, selecting a budget-sized subset, and refining in latent space.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper argues that current diffusion-based dataset distillation methods waste limited budgets through a rigid generate-and-use approach that directly outputs a fixed number of samples. Instead, the proposed two-stage process first generates more candidates than the final budget allows and selects a compact subset, then refines those samples in latent space using soft-label supervision from a teacher model. This explicit decoupling of generation, selection, and refinement is meant to produce more informative synthetic datasets. Experiments on large-scale and fine-grained image classification benchmarks are reported to show consistent gains over direct-generation baselines. A sympathetic reader would care because the method addresses redundancy and poor allocation without altering the underlying generative prior.

Core claim

The Pool-Select-Refine framework decouples candidate generation from final budget allocation in generative dataset distillation. By first constructing an over-complete candidate pool from a pretrained diffusion model, selecting a compact subset that respects the target images-per-class budget, and then refining the selected samples in latent space with soft-label supervision derived from the teacher model, the method produces synthetic datasets that achieve higher downstream classification accuracy than the standard Generate-and-Use strategy.
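The three stages named in the core claim can be sketched as a small driver function. This is a minimal illustration and not the authors' code: `generate`, `select`, and `refine` are hypothetical callables standing in for diffusion sampling, budget-constrained selection, and latent-space refinement, and `pool_size` and `budget` are assumed names for the per-class pool size and the images-per-class budget.

```python
from typing import Callable, List, Sequence, TypeVar

Latent = TypeVar("Latent")  # placeholder type for whatever a candidate is


def pool_select_refine(
    generate: Callable[[int], List[Latent]],                 # hypothetical sampler
    select: Callable[[Sequence[Latent], int], List[Latent]],  # hypothetical selector
    refine: Callable[[Latent], Latent],                       # hypothetical refiner
    pool_size: int,
    budget: int,
) -> List[Latent]:
    """Run generation, selection, and refinement as three separate steps."""
    if pool_size < budget:
        raise ValueError("pool_size must be at least as large as budget")
    pool = generate(pool_size)      # over-complete candidate pool
    chosen = select(pool, budget)   # subset that respects the budget
    if len(chosen) > budget:
        raise ValueError("selector returned more samples than the budget")
    return [refine(z) for z in chosen]  # refine only what was kept
```

Under this framing, the Generate-and-Use baseline corresponds to `pool_size == budget` with a selector that keeps everything, which is why the two pipelines can be compared at the same final budget.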

What carries the argument

The Pool-Select-Refine process: over-complete candidate pool from diffusion generation, followed by budget-constrained selection of a compact subset, followed by latent-space refinement using soft labels from the teacher model.
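The Figure 3 caption on this page says Stage I scores candidates by teacher-derived reliability, diversity, and uncertainty, but the page gives no formulas. The sketch below is therefore one plausible reading under stated assumptions, not the paper's scoring rule: reliability is taken as the teacher probability of the target class, uncertainty as normalized entropy of the teacher distribution, and diversity as the distance to the nearest already-selected candidate in some feature space. The weights `w_rel`, `w_div`, and `w_unc` and the greedy loop are hypothetical.

```python
import math
from typing import List, Sequence


def normalized_entropy(p: Sequence[float]) -> float:
    """Entropy of p divided by log(num_classes); assumes at least 2 classes."""
    h = -sum(q * math.log(q) for q in p if q > 0.0)
    return h / math.log(len(p))


def select_subset(
    probs: Sequence[Sequence[float]],  # teacher softmax output per candidate
    feats: Sequence[Sequence[float]],  # feature vector per candidate (assumed)
    target: int,                       # class index the pool was generated for
    k: int,                            # images-per-class budget
    w_rel: float = 1.0,
    w_div: float = 1.0,
    w_unc: float = 0.5,
) -> List[int]:
    """Greedily pick k candidate indices by a weighted sum of three signals."""
    rel = [p[target] for p in probs]
    unc = [normalized_entropy(p) for p in probs]
    chosen: List[int] = []
    remaining = list(range(len(probs)))
    while remaining and len(chosen) < k:
        def score(i: int) -> float:
            # Distance to the nearest already-chosen sample; 0 for the first pick.
            div = min((math.dist(feats[i], feats[j]) for j in chosen), default=0.0)
            return w_rel * rel[i] + w_div * div + w_unc * unc[i]
        best = max(remaining, key=score)  # ties resolve to the lowest index
        chosen.append(best)
        remaining.remove(best)
    return chosen


# Toy pool: candidates 0 and 1 are near-duplicates in feature space.
toy_probs = [[0.9, 0.1], [0.9, 0.1], [0.6, 0.4], [0.2, 0.8]]
toy_feats = [[0.0, 0.0], [0.01, 0.0], [1.0, 0.0], [0.0, 1.0]]
picked = select_subset(toy_probs, toy_feats, target=0, k=2)
```

In the toy pool the diversity term keeps the near-duplicate candidate out of the selected pair, which is the kind of redundancy the budget argument is about.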

If this is right

Consistent accuracy gains over diffusion-based baselines on both large-scale and fine-grained image classification benchmarks.
More effective use of a fixed images-per-class budget by reducing redundancy among generated samples.
Improved semantic alignment of the final synthetic set while the generative prior from the diffusion model remains intact.
A curation stage inserted before refinement is sufficient to improve results without changing the underlying diffusion generator.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pool-then-select pattern could be tested on generative models other than diffusion while keeping the refinement step fixed.
The method might allow the same downstream accuracy to be reached with smaller overall budgets than current direct-generation approaches.
Selection criteria used in the pool stage could be made learnable rather than fixed, potentially further reducing bias.

Load-bearing premise

That selecting a subset from an over-complete pool and then refining will reliably yield more informative samples than generating exactly the budgeted number directly, without the selection step introducing biases that degrade the generative prior.

What would settle it

A controlled run of the same classification benchmarks in which the selection stage is removed, so that directly generated samples at the exact budget are used instead, and the resulting accuracy is equal to or higher than the full Pool-Select-Refine pipeline.
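The controlled comparison described above can be organized as a small harness. The sketch assumes each arm is a hypothetical function that maps a random seed to a final student accuracy at the same images-per-class budget; the arm names and the one-standard-deviation separation rule are illustrative choices, not taken from the paper.

```python
import statistics
from typing import Callable, Dict, Sequence, Tuple

Arm = Callable[[int], float]  # seed -> student accuracy (hypothetical interface)


def compare_arms(
    arms: Dict[str, Arm], seeds: Sequence[int]
) -> Dict[str, Tuple[float, float]]:
    """Return (mean, sample std) of accuracy per arm; needs at least 2 seeds."""
    summary = {}
    for name, run in arms.items():
        accs = [run(seed) for seed in seeds]
        summary[name] = (statistics.mean(accs), statistics.stdev(accs))
    return summary


def selection_helps(
    summary: Dict[str, Tuple[float, float]],
    full: str = "full_pipeline",
    control: str = "no_selection",
) -> bool:
    """Crude check: the full pipeline beats the control by more than one
    standard deviation on each side. A proper significance test would be
    preferable when enough seeds are available."""
    mean_full, std_full = summary[full]
    mean_ctrl, std_ctrl = summary[control]
    return mean_full - std_full > mean_ctrl + std_ctrl
```

A third arm that selects uniformly at random from the same pool would separate the effect of the scoring rule from the effect of simply drawing more candidates.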

Figures

Figures reproduced from arXiv: 2606.01920 by Shunsuke Sakai, Tatsuhito Hasegawa, Wenmin Li, Zhongkai Zhao.

**Figure 1.** Performance comparison on ImageWoof under different IPC budgets. Our method consistently improves over the baselines under the same IPC budget, demonstrating the effectiveness of explicit curation and refinement for fine-grained dataset distillation.

**Figure 2.** Pipeline comparison between conventional generate-and-use distillation and our Pool-Select-Refine framework. The conventional pipeline directly trains the student on a fixed set of DiT-generated samples, so redundant or weakly informative samples may occupy scarce IPC slots. Our method first constructs an over-complete candidate pool, selects useful samples, and then refines the selected subset before student training.

**Figure 3.** Detailed workflow of the proposed Pool-Select-Refine framework. Stage I constructs an over-complete candidate pool, evaluates all candidates using teacher-derived reliability, diversity, and uncertainty, and selects a compact subset under the target IPC budget. The teacher soft labels and generation latents of the selected samples are cached. Stage II directly optimizes the cached latent codes under soft-l…

**Figure 4.** Effect of pool-based selection. Selecting K samples from an over-complete candidate pool improves student accuracy compared with directly using a fixed number of generated samples, indicating that budget allocation is a key factor in diffusion-based dataset distillation. Markers are slightly horizontally shifted for better readability. The accompanying text adds that budget waste is not only a generation-quality issue but also a budget-allocation issue, so Stage I focuses on selecting more useful seeds from the candidate pool.

**Figure 5.** Ablation of selection signals under a DiT generative prior. (a) Single-term scoring using reliability, diversity, or uncertainty. (b) Pairwise and full combinations. Pairwise scoring consistently improves over single-term variants, and using all three signals achieves the best or tied-best accuracy across IPC settings, indicating complementary roles in subset curation.

**Figure 6.** Effect of soft-label guidance in latent-space optimization. We compare the final classification accuracy between latent optimization with (LDM + KL) and without KL. Results on both DiT-based (a) and Minimax-based (b) pipelines demonstrate consistent performance improvement from soft-label supervision in the latent space.

**Figure 7.** Comparison of soft-label and hard-label supervision when training models on distilled datasets. We report classification accuracy using three label settings: baseline (from original synthetic data), hard labels, and soft labels. Across both DiT (a) and Minimax (b) pipelines, soft-label training consistently yields superior accuracy, especially under low IPC conditions.

**Figure 9.** Qualitative visualization of distilled samples. We compare images from the Real dataset (top row) against synthetic samples generated by standard baselines (DiT, Minimax) and our proposed method (DiT-Ours, Minimax-Ours). Columns exhibit samples from the Golden Retriever and Church classes. Compared to the direct generation baselines, which occasionally yield ambiguous backgrounds or artifacts, our method s…

**Figure 8.** Complementarity of selection signals under a Minimax generative prior. (a) Single-term scoring using reliability, diversity, or uncertainty. (b) Term combinations. Pairwise combinations generally outperform single-term variants, and combining all three signals yields the best or tied-best performance across IPC settings, indicating complementary roles in subset curation. Markers are slightly horizontally sh…

**Figure 10.** Complementarity of selection signals under a Minimax generative prior. (a) Single-term scoring using reliability, diversity, or uncertainty. (b) Term combinations. Pairwise combinations generally outperform single-term variants, and combining all three signals yields the best or tied-best performance across IPC settings, indicating complementary roles in subset curation.

**Figure 9.** Heatmaps of classification accuracy over combinations of per-class image pool sizes and IPC. Each cell shows the classification accuracy when K samples are selected from a pool of size M per class. Red boxes indicate the best configuration for each target IPC.

**Figure 10.** Qualitative visualization of distilled samples. We compare images from the Real dataset (top row) against synthetic samples generated by standard baselines (DiT, Minimax) and our proposed method (DiT-Ours, Minimax-Ours). Columns exhibit samples from the Golden Retriever and Church classes. Compared to the direct generation baselines, which occasionally yield ambiguous backgrounds or artifacts, our method…
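The heatmap caption above describes a sweep over pool size M and budget K with the best pool size marked per budget. A sweep of that shape could be organized as below. This is only a sketch: `evaluate` is a hypothetical callable that would run the whole pipeline and return student accuracy for one (M, K) pair, and the function and argument names are assumptions.

```python
from typing import Callable, Dict, Iterable, Tuple


def sweep_pool_and_budget(
    evaluate: Callable[[int, int], float],  # (pool size M, budget K) -> accuracy
    pool_sizes: Iterable[int],
    budgets: Iterable[int],
) -> Tuple[Dict[Tuple[int, int], float], Dict[int, int]]:
    """Fill an accuracy grid and record the best pool size for each budget."""
    pool_sizes = list(pool_sizes)
    grid: Dict[Tuple[int, int], float] = {}
    best_pool: Dict[int, int] = {}
    for k in budgets:
        for m in pool_sizes:
            if m < k:
                continue  # a pool smaller than the budget cannot fill it
            grid[(m, k)] = evaluate(m, k)
            if k not in best_pool or grid[(m, k)] > grid[(best_pool[k], k)]:
                best_pool[k] = m
    return grid, best_pool
```

Because each cell implies generating M candidates per class, the cost of the sweep grows with the largest pool size, so the grid is usually kept coarse.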

Original abstract

Diffusion-based dataset distillation has recently emerged as a promising paradigm for condensing large-scale datasets into compact synthetic sets. By leveraging pretrained generative priors, these methods can produce realistic class-conditional samples more efficiently than traditional matching-based approaches. However, most existing diffusion-based methods still adopt a rigid "Generate-and-Use" strategy, where the generated samples are directly treated as the final distilled set under a fixed images-per-class budget. Such a design tightly couples candidate generation with final budget allocation, which may result in redundant waste of the limited budget or insufficiently informative samples. In this paper, we propose "Pool-Select-Refine", a two-stage framework for allocation-aware generative dataset distillation. First, instead of directly using a fixed number of generated samples, we construct an over-complete candidate pool and select a compact subset under the target budget. Second, we refine the selected samples in latent space using soft-label supervision derived from the teacher model, improving semantic alignment while preserving the generative prior. This design explicitly decouples generation, selection, and refinement, enabling more effective use of the distillation budget. Experiments on large-scale and fine-grained image classification benchmarks show that the proposed framework delivers consistent gains over diffusion-based baselines. The results suggest that introducing a curation stage before refinement is a simple yet effective way to improve diffusion-based dataset distillation.
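The second stage in the abstract, refining a latent under teacher soft labels, can be illustrated on a toy problem. The sketch below is an assumption-heavy stand-in and not the paper's objective: a fixed linear map `W` plays the role of the frozen decoder plus teacher classifier, `q` is the cached teacher soft label, and the quadratic pull toward the starting latent `z0` (weight `lam`) is one hypothetical way to stay close to the generative prior. A real implementation would backpropagate through the actual decoder with an autodiff framework.

```python
import math
from typing import List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def kl(q: Sequence[float], p: Sequence[float]) -> float:
    """KL(q || p) for discrete distributions with p > 0 everywhere."""
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p) if qi > 0.0)


def matvec(W: Sequence[Sequence[float]], z: Sequence[float]) -> List[float]:
    return [sum(w * v for w, v in zip(row, z)) for row in W]


def refine_latent(z0, W, q, steps: int = 200, lr: float = 0.1, lam: float = 0.1):
    """Gradient descent on KL(q || softmax(W z)) + lam * ||z - z0||^2."""
    z = list(z0)
    for _ in range(steps):
        p = softmax(matvec(W, z))
        diff = [pi - qi for pi, qi in zip(p, q)]  # dKL/dlogits = p - q
        grad = [
            sum(W[c][d] * diff[c] for c in range(len(W)))  # W^T (p - q)
            for d in range(len(z))
        ]
        z = [zd - lr * (g + 2.0 * lam * (zd - z0d)) for zd, g, z0d in zip(z, grad, z0)]
    return z


# Toy setup: 2-dimensional latent, 3 classes, hand-picked soft label.
W_toy = [[1.0, 0.0], [0.0, 1.0], [-1.0, -1.0]]
q_toy = [0.7, 0.2, 0.1]
z_start = [0.0, 0.0]
z_refined = refine_latent(z_start, W_toy, q_toy)
```

Raising `lam` keeps the refined latent closer to the sampled one at the cost of a weaker match to the soft label, which mirrors the stated trade-off between semantic alignment and preserving the prior.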

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper adds a selection step after over-generating a pool in diffusion dataset distillation, but without details on the selection rule or ablations it's unclear if this beats simpler extra sampling.


The colleague should know two things: the core proposal is a Pool-Select-Refine pipeline that first builds an over-complete candidate set from a diffusion model, then picks a budgeted subset and refines those picks in latent space with teacher soft labels; and the abstract claims this yields consistent gains over prior diffusion baselines on large-scale and fine-grained classification tasks.

What is new is the explicit decoupling of generation, selection, and refinement. Earlier diffusion distillation work mostly generates exactly the target number of images per class and uses them directly. The authors argue this couples generation too tightly to the final budget and can produce redundant or weak samples. Their fix is to generate more candidates first, curate, and then adjust the chosen ones. That separation is a clean design choice not present in the referenced baselines.

The paper does a reasonable job identifying a real inefficiency in the generate-and-use pattern. The refinement step using soft-label supervision is a practical way to improve semantic fit while staying close to the pretrained prior. If the selection criterion turns out to be effective, the overall approach could be a modest but useful engineering improvement for people already running diffusion-based distillation.

The soft spots are in the selection mechanism and the strength of the evidence. The abstract gives no description of how candidates are ranked or filtered from the pool, so it is impossible to judge whether the gains come from intelligent curation or simply from extra generation budget. If selection is weak or introduces its own bias, the refinement stage might be doing most of the work. The claim of consistent gains is stated without numbers, error bars, or ablations that isolate the selection component, which leaves the central empirical point under-supported. The stress-test concern about selection not demonstrably outperforming random or direct generation therefore still stands on the information given.

This is for readers already working on generative dataset distillation in computer vision who want to try a curation stage. It is not a foundational shift. The idea is clear enough and the motivation is honest, so it deserves a serious referee even though the current write-up would need stronger experimental grounding to be convincing.

Referee Report

2 major / 1 minor

Summary. The paper proposes the Pool-Select-Refine framework for diffusion-based dataset distillation. It decouples the process by first generating an over-complete candidate pool, then selecting a compact subset under a fixed images-per-class budget, and finally refining the selected samples in latent space via soft-label supervision from a teacher model. This is positioned as addressing limitations of rigid 'Generate-and-Use' strategies. Experiments on large-scale and fine-grained image classification benchmarks are claimed to show consistent gains over diffusion-based baselines.

Significance. If the selection step is shown to extract higher-value samples without introducing bias or extra cost, and if refinement preserves the generative prior, the approach could improve budget allocation efficiency in generative distillation. The explicit separation of generation, selection, and refinement stages is a clear structural contribution, though its empirical advantage requires verification against matched-budget controls.

major comments (2)

[Abstract] The claim that the framework 'delivers consistent gains' and that 'introducing a curation stage before refinement is a simple yet effective way' is presented without any quantitative results, baselines, error bars, or description of the selection criterion; this leaves the central empirical claim unsupported in the provided text and directly engages the skeptic concern that selection may not outperform random or direct generation under matched budgets.
[Abstract] (weakest assumption paragraph) The design is asserted to 'explicitly decouple generation, selection, and refinement' and to avoid 'redundant waste of the limited budget,' yet no mechanism is given for the selection criterion or any ablation showing that the over-complete pool plus selection adds value beyond simply generating more candidates; without this, the advantage over fixed-budget diffusion generation remains unverified.

minor comments (1)

[Abstract] The phrase 'allocation-aware' is introduced but never defined in operational terms relative to the budget or the selection step.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on the abstract. We address each point below and will revise the abstract to strengthen the presentation of our claims while respecting length constraints.


Referee: [Abstract] The claim that the framework 'delivers consistent gains' and that 'introducing a curation stage before refinement is a simple yet effective way' is presented without any quantitative results, baselines, error bars, or description of the selection criterion; this leaves the central empirical claim unsupported in the provided text and directly engages the skeptic concern that selection may not outperform random or direct generation under matched budgets.

Authors: We agree the abstract, as a concise summary, does not embed the full quantitative results, error bars, or selection details. These are reported in Section 4 with matched-budget baselines and standard deviations across multiple runs. The selection criterion (informativeness-based ranking within the pool) is defined in Section 3.2. We will revise the abstract to include a short clause describing the selection approach and reference the observed gains (e.g., consistent improvements on CIFAR-100 and ImageNet subsets). Full numbers and ablations remain in the main text due to abstract length limits, but the revision will make the central claim more self-contained.

revision: yes

Referee: [Abstract] (weakest assumption paragraph) The design is asserted to 'explicitly decouple generation, selection, and refinement' and to avoid 'redundant waste of the limited budget,' yet no mechanism is given for the selection criterion or any ablation showing that the over-complete pool plus selection adds value beyond simply generating more candidates; without this, the advantage over fixed-budget diffusion generation remains unverified.

Authors: The decoupling of stages and the selection mechanism (constructing an over-complete pool then choosing a budget-constrained subset via a learned or heuristic scorer) are detailed in Sections 3.1–3.2. Ablations comparing the full Pool-Select-Refine pipeline against direct fixed-budget generation and random selection from the same pool appear in Section 4.3. We will revise the abstract to briefly name the selection stage and its purpose, thereby clarifying how the over-complete pool enables better budget allocation without altering the core claims.

revision: yes

Circularity Check

0 steps flagged

No circularity: empirical method proposal with no self-referential derivations or fitted predictions


The paper introduces a Pool-Select-Refine pipeline as a procedural framework for dataset distillation and reports empirical gains on image classification benchmarks. No equations, parameter-fitting steps, uniqueness theorems, or self-citations appear in the provided text that would reduce any claimed result to its own inputs by construction. The central claim is presented as an outcome of the new allocation-aware procedure versus diffusion baselines, making the derivation self-contained against external experimental benchmarks rather than internally forced.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input supplies no explicit free parameters, axioms, or invented entities; the framework assumes standard diffusion priors and teacher-model supervision from prior literature.


Reference graph

Works this paper leans on

48 extracted references · 1 canonical work pages

[1] Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A. Efros. Dataset distillation, 2020.
[2] Ruonan Yu, Songhua Liu, and Xinchao Wang. Dataset distillation: A comprehensive review. IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):150–170, January 2024.
[3] Shijie Ma, Fei Zhu, Zhen Cheng, and Xu-Yao Zhang. Towards trustworthy dataset distillation. Pattern Recognition, 157:110875, 2025.
[4] Hongliang Zhang, Xiaoqi An, Jiawei Lian, Lei Luo, and Jian Yang. Compr: Efficient point cloud dataset condensation via bidirectional matching and point recycling. Pattern Recognition, 172:112494, 2026.
[5] Jiao Zhang, Xiang Ao, Xu-Yao Zhang, and Cheng-Lin Liu. Towards reliable domain generalization: Insights from the pf2hc benchmark and dynamic evaluations. Pattern Recognition, 157:110926, 2025.
[6] Timothy Chieu Nguyen, Zhourong Chen, and Jaehoon Lee. Dataset meta-learning from kernel ridge-regression. In ICLR 2021, 2021.
[7] Yongchao Zhou, Ehsan Nezhadarya, and Jimmy Ba. Dataset distillation using neural feature regression. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[8] Noel Loo, Ramin Hasani, Mathias Lechner, and Daniela Rus. Efficient dataset distillation using random feature approximation. In Advances in Neural Information Processing Systems (NeurIPS), 2022.
[9] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. Dataset condensation with gradient matching. CoRR, abs/2006.05929, 2020.
[10] Bo Zhao and Hakan Bilen. Dataset condensation with differentiable siamese augmentation. In International Conference on Machine Learning (ICML), pages 12674–12685, 2021.
[11] Bo Zhao and Hakan Bilen. Dataset condensation with distribution matching. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision 2023 (WACV), IEEE Winter Conference on Applications of Computer Vision, pages 6503–6512, United States, January 2023. Institute of Electrical and Electronics Engineers.
[12] Kai Wang, Bo Zhao, Xiangyu Peng, Zheng Zhu, Shuo Yang, Shuo Wang, Guan Huang, Hakan Bilen, Xinchao Wang, and Yang You. Cafe: Learning to condense dataset by aligning features. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 12196–12205, 2022.
[13] Ahmad Sajedi, Samir Khaki, Ehsan Amjadian, Lucy Z Liu, Yuri A Lawryshyn, and Konstantinos N Plataniotis. Datadam: Efficient dataset distillation with attention matching. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 17097–17107, 2023.
[14] Yuzhang Shang, Zhihang Yuan, and Yan Yan. Mim4dd: Mutual information maximization for dataset distillation. In Advances in Neural Information Processing Systems (NeurIPS), 2023.
[15] George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, and Jun-Yan Zhu. Dataset distillation by matching training trajectories. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, 2022.
[16] Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang, and Yang You. Towards lossless dataset distillation via difficulty-aligned trajectory matching. In International Conference on Learning Representations (ICLR), 2024.
[17] Jiawei Du, Yidi Jiang, Vincent YF Tan, Joey Tianyi Zhou, and Haizhou Li. Minimizing the accumulated trajectory error to improve dataset distillation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 3749–3758, 2023.
[18] Bo Zhao and Hakan Bilen. Synthesizing informative training samples with GAN. In NeurIPS 2022 Workshop on Synthetic Data for Empowering ML Research, 2022.
[19] George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A. Efros, and Jun-Yan Zhu. Generalizing dataset distillation via deep generative prior. CVPR, 2023.
[20] Xinhao Zhong, Hao Fang, Bin Chen, Xulin Gu, Meikang Qiu, Shuhan Qi, and Shu-Tao Xia. Hierarchical features matter: A deep exploration of gan priors for improved dataset distillation. pages 30462–30471. Computer Vision Foundation / IEEE, 2025.
[21] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems (NeurIPS), volume 33, pages 6840–6851, 2020.
[22] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In International Conference on Learning Representations (ICLR), 2021.
[23] Ian Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial networks. Commun. ACM, 63(11):139–144, October 2020.
[24] Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, and Yiran Chen. Efficient dataset distillation via minimax diffusion. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 15793–15803, June 2024.
[25] Duo Su, Junjie Hou, Weizhi Gao, Yingjie Tian, and Bowen Tang. D4m: Dataset distillation via disentangled diffusion model. In CVPR, pages 5809–5818, 2024.
[26] Mingyang Chen, Jiawei Du, Bo Huang, Yi Wang, Xiaobo Zhang, and Wei Wang. Influence-guided diffusion for dataset distillation. In The Thirteenth International Conference on Learning Representations, 2025.
[27] Jeffrey A Chan Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, and Mubarak Shah. MGD3: Mode-guided dataset distillation using diffusion models. In Forty-second International Conference on Machine Learning, 2025.
[28] Brian Bernhard Moser, Federico Raue, Sebastian Palacio, Stanislav Frolov, and Andreas Dengel. Unlocking dataset distillation with diffusion models. In The Thirty-ninth Annual Conference on Neural Information Processing Systems, 2026.
[29] Linfeng Ye, Shayan Mohajer Hamidi, Guang Li, Takahiro Ogawa, Miki Haseyama, and Konstantinos N. Plataniotis. Information-guided diffusion sampling for dataset distillation. In NeurIPS 2025 Workshop on Structured Probabilistic Inference & Generative Modeling, 2025.
[30] Lexiao Zou, Yanda Chen, et al. Enhancing diffusion-based dataset distillation via adversary-guided curriculum sampling. In IEEE International Conference on Multimedia and Expo (ICME), 2025.
[31] Haoxuan Wang, Zhenghao Zhao, Junyi Wu, Yuzhang Shang, Gaowen Liu, and Yan Yan. Cao2: Rectifying inconsistencies in diffusion-based dataset distillation. In 2025 IEEE/CVF International Conference on Computer Vision (ICCV), pages 4722–4731, 2025.
[32] William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 4195–4205, 2023.
[33] Tian Qin, Zhiwei Deng, and David Alvarez-Melis. A label is worth a thousand images in dataset distillation. In The Thirty-eighth Annual Conference on Neural Information Processing Systems, 2024.
[34] Dan Hendrycks and Kevin Gimpel. A baseline for detecting misclassified and out-of-distribution examples in neural networks. Proceedings of the International Conference on Learning Representations (ICLR), 2017.
[35]

Weinberger

Chuan Guo, GeoffPleiss, Yu Sun, and Kilian Q. Weinberger. On calibration of modern neural networks. In Doina Precup and Yee Whye Teh, editors,Proceed- ings of the 34th International Conference on Machine Learning, volume 70 of Proceedings of Machine Learning Research, pages 1321–1330. PMLR, 06–11 Aug 2017

2017
[36]

Active learning for convolutional neural net- works: A core-set approach

Ozan Sener and Silvio Savarese. Active learning for convolutional neural net- works: A core-set approach. InInternational Conference on Learning Represen- tations (ICLR), 2018

2018
[37]

Active learning literature survey

Burr Settles. Active learning literature survey. Technical Report 1648, University of Wisconsin-Madison, 2009

2009
[38]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. InPro- ceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recogni- tion (CVPR), pages 10684–10695, 2022

2022
[39]

Revisiting confidence estimation: Towards reliable failure prediction.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):3370–3387, 2024

Fei Zhu, Xu-Yao Zhang, Zhen Cheng, and Cheng-Lin Liu. Revisiting confidence estimation: Towards reliable failure prediction.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(5):3370–3387, 2024

2024
[40]

Dataset condensation via efficient synthetic-data parameterization

Jang-Hyun Kim, Jinuk Kim, Seong Joon Oh, Sangdoo Yun, Hwanjun Song, Joonhyun Jeong, Jung-Woo Ha, and Hyun Oh Song. Dataset condensation via efficient synthetic-data parameterization. In Proceedings of the 39th International Conference on Machine Learning (ICML), pages 11102–11118, 2022

2022
[41]

On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm

Peng Sun, Bei Shi, Daiwei Yu, and Tao Lin. On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9390–9399, 2024

2024
[42]

Dynamic few-shot visual learning without forgetting

Spyros Gidaris and Nikos Komodakis. Dynamic few-shot visual learning without forgetting. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 4367–4375, 2018

2018
[43]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 770–778, 2016

2016
[44]

A convnet for the 2020s

Zhuang Liu, Hanzi Mao, Chao-Yuan Wu, Christoph Feichtenhofer, Trevor Darrell, and Saining Xie. A convnet for the 2020s. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 11976–11986, 2022

2022
[45]

Imagenet: A large-scale hierarchical image database

Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large-scale hierarchical image database. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), pages 248–255, 2009

2009
[46]

Imagenette: A smaller subset of 10 easily classified classes from imagenet

Jeremy Howard. Imagenette: A smaller subset of 10 easily classified classes from imagenet. https://github.com/fastai/imagenette, 2019

2019
[47]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

2009
[48]

Tiny imagenet visual recognition challenge

Ya Le and Xuan Yang. Tiny imagenet visual recognition challenge. CS 231N, 7(7):3, 2015

2015
