pith. sign in

arxiv: 2606.18209 · v1 · pith:6NZQAQFXnew · submitted 2026-06-16 · 💻 cs.LG

Rethinking Dataset Distillation for Classification: Do Distilled Sets Outperform Coresets?

Pith reviewed 2026-06-27 01:43 UTC · model grok-4.3

classification 💻 cs.LG
keywords dataset distillationcoreset selectiondata pruningImageNetclassificationdata efficiencymachine learning
0
0 comments X

The pith

Dataset distillation methods match or fall short of coreset selection on large image classification tasks while costing more to build.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether creating synthetic training samples through dataset distillation delivers better results than simply choosing a small number of real examples with coreset selection. It runs this test on ImageNet-1K and related datasets using the same training protocols for both approaches. The findings indicate that top distillation techniques reach accuracy levels comparable to or below those of coresets, with some distillation sets performing no better than random selection. Coresets also show stronger coverage of the original data distribution and require far less computation to produce. This challenges the idea that synthetic samples hold an inherent advantage over selected real data for reducing training sets.

Core claim

Through standardized benchmarks of seven state-of-the-art dataset distillation methods against three coreset strategies on ImageNet-1K, ImageNet100, and ImageNette using three training protocols, the work establishes that leading distillation approaches achieve accuracy comparable to or lower than coresets on large-scale datasets, incur substantially higher construction costs, and deliver inferior representativeness, diversity, and quality compared to coresets, while some distillation methods fail to beat random subsets.

What carries the argument

Direct head-to-head comparison of seven SOTA dataset distillation methods versus three coreset strategies under three fixed training protocols, measured by accuracy, construction cost, and metrics for data distribution coverage.

If this is right

  • Some dataset distillation methods underperform even random subsets of real data.
  • Current leading distillation techniques remain comparable to or worse than coresets in final model accuracy on large datasets.
  • Dataset distillation requires substantially higher computational cost to construct the condensed set than coreset selection.
  • Coresets consistently provide better coverage, diversity, and quality relative to the original data distribution.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the pattern holds across more settings, practitioners may prefer coreset selection over distillation for practical data reduction in classification.
  • The results suggest testing whether the expressiveness advantage claimed for synthetic samples survives when total compute for construction is held fixed.
  • Extending the same standardized comparison to non-image tasks or other modalities could show whether the relative performance is domain-specific.

Load-bearing premise

The three chosen coreset strategies and three training protocols fairly represent typical use cases without favoring one method through implementation details or dataset effects.

What would settle it

A state-of-the-art distillation method achieving clearly higher accuracy than the best coreset at lower total construction cost on ImageNet-1K under identical training conditions would contradict the central result.

Figures

Figures reproduced from arXiv: 2606.18209 by Akshay Mehra, Joshua Kimball, Trisha Mittal.

Figure 1
Figure 1. Figure 1: Coresets consistently outperforms distilled sets in accuracy, data coverage, and construction efficiency. We report results on ImageNet-1K using condensed sets consisting of 50 images per class. (a) Relative accuracy gains of a ResNet-50 trained on coreset identified using AUM [5] in comparison to three SOTA DD methods (FADRM+ [6], ManifoldGD [7], VLCP [8]) for three evaluation protocols. (b) Representativ… view at source ↗
Figure 2
Figure 2. Figure 2: Experiment Design Overview: We start with a full dataset, and create coresets using the CS methods (green) and also distilled datasets using the various DD methods (orange). We then train various student models on these coresets or distilled datasets via three training protocols and benchmark the performance of the trained student models on the test set. We experiment with three training protocols (blue); … view at source ↗
Figure 3
Figure 3. Figure 3: FID comparison for 50 IPC condensed datasets for ImageNet-1K. original data distribution. Diversity is defined as 1 − maxi̸=j cos f T i , fT j  , measuring redundancy within the condensed set. Higher representativeness indicates better coverage, while higher diversity reflects a greater sample spread. We report an average for all the metrics computed class-wise for 50 IPC sets for one representative metho… view at source ↗
Figure 4
Figure 4. Figure 4: Coreset created by AUM exhibits greater intra [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Representativeness–diversity trade-off at 50 IPC for ImageNet-100, and ImageNette. Results show that AUM achieves a more favorable balance, occupying the upper right corner, compared to SOTA DD methods. C.1 Additional results for diversity and representativeness [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Representativeness–diversity trade-off at 10 IPC on ImageNet-1K, ImageNet-100, and ImageNette. Similar to the 50 IPC results presented, AUM achieves a more favorable balance between representativeness and diversity, while DD methods exhibit reduced diversity. 80 90 100 110 120 130 140 150 160 FID RANDOM FADRM MANIFOLDGD VLCP AUM 4.0% (a) ImageNet-100 50 75 100 125 150 175 200 225 250 FID RANDOM FADRM MANIF… view at source ↗
Figure 7
Figure 7. Figure 7: Fréchet Inception Distance (FID) comparison for condensed datasets for [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: 10 images per class sampled from condensed datasets produced by VLCP, ManifoldGD, FADRM+, and AUM for 4 [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: 10 images per class sampled from condensed datasets produced by VLCP, ManifoldGD, FADRM+, and AUM for 4 [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
read the original abstract

Dataset distillation (DD) has emerged as a prominent approach in data centric machine learning, aiming to synthesize compact training sets for efficient training by compressing the information in large datasets into a small number of synthetic samples. However, DD methods are often evaluated under inconsistent evaluation protocols, ranging from standard ERM to single/multi-teacher supervision, making it difficult to isolate the effectiveness of distilled data from evaluation. Moreover, many prior methods claim that DD outperforms data pruning approaches such as coreset selection (CS), based on the assumption that restricting condensed datasets to subsets of real samples fundamentally limits their expressiveness. In this work, we critically evaluate DD methods through large-scale experiments using standardized datasets and evaluation protocols to assess their intrinsic effectiveness. We benchmark seven state-of-the-art (SOTA) DD methods on ImageNet-1K, ImageNet100, and ImageNette, using three widely adopted training protocols against three CS strategies. Our results show that while some DD methods fail to outperform even simple random subsets, the SOTA DD approaches are comparable to or worse than coresets on large-scale datasets and incur a substantially higher cost for construction. Beyond accuracy, we also evaluate the representativeness, diversity, and quality of condensed sets, and find that coresets consistently achieve better coverage of the original data distribution. These findings highlight the limited practical advantages of current DD methods and show that coresets remain competitive and are often a more computationally efficient alternative for data-centric learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript claims that under standardized evaluation protocols, state-of-the-art dataset distillation (DD) methods do not outperform coreset selection (CS) methods on large-scale datasets including ImageNet-1K, ImageNet-100, and ImageNette. Benchmarking seven SOTA DD methods against three CS strategies with three training protocols shows that some DD methods fail to beat random subsets, SOTA DD is comparable to or worse than CS while incurring higher construction costs, and coresets achieve better coverage, representativeness, diversity, and quality of the condensed sets. The work emphasizes inconsistent prior protocols as motivation for the standardized comparison.

Significance. If the empirical results hold, the paper would provide a valuable large-scale re-evaluation of DD's practical advantages over simpler selection methods, potentially redirecting data-centric ML research toward computationally efficient alternatives. Strengths include the scale of experiments on ImageNet-1K, use of multiple protocols and additional metrics beyond accuracy, and direct comparison that addresses prior claims of DD superiority based on expressiveness assumptions.

major comments (1)
  1. [Abstract] Abstract: The central claim that the three training protocols allow assessment of the 'intrinsic effectiveness' of distilled data (and thus that SOTA DD is comparable/worse than CS) requires that these protocols match the supervision regimes (e.g., multi-teacher or custom distillation losses) under which the seven DD methods originally demonstrated superiority. The manuscript acknowledges inconsistent prior protocols but provides no explicit verification or details that the chosen protocols include those regimes; without this, performance gaps may reflect protocol mismatch rather than the distilled sets themselves being less expressive than selected real samples.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the constructive feedback. Below we respond to the major comment.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The central claim that the three training protocols allow assessment of the 'intrinsic effectiveness' of distilled data (and thus that SOTA DD is comparable/worse than CS) requires that these protocols match the supervision regimes (e.g., multi-teacher or custom distillation losses) under which the seven DD methods originally demonstrated superiority. The manuscript acknowledges inconsistent prior protocols but provides no explicit verification or details that the chosen protocols include those regimes; without this, performance gaps may reflect protocol mismatch rather than the distilled sets themselves being less expressive than selected real samples.

    Authors: We appreciate the referee highlighting this point. Our three protocols (standard ERM, single-teacher, and multi-teacher supervision) were selected precisely because they represent the most common evaluation regimes appearing across the DD literature; they are detailed in Section 3.2. The goal of the work is to isolate the quality of the distilled sets themselves under a single, consistent evaluation framework rather than to reproduce each method's original (and mutually inconsistent) training procedure. This standardization is what enables the head-to-head comparison with coreset methods. That said, we agree that an explicit cross-reference would remove any ambiguity. We will add a table in the revised manuscript that maps each of the seven DD methods to the supervision regime(s) used in its original paper and indicates which of our three protocols align with it. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical benchmarking on external public datasets

full rationale

The paper performs direct head-to-head accuracy, diversity, and cost comparisons of seven DD methods against three CS strategies on ImageNet-1K, ImageNet-100, and ImageNette under three fixed training protocols. No equations, parameter fits, or first-principles derivations are present; all reported outcomes are measured quantities from independent runs on public benchmarks. Self-citations, if any, are limited to prior method descriptions and do not bear the central claim. The work is therefore self-contained against external data and scores 0.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard supervised learning evaluation assumptions and the representativeness of the chosen coreset baselines; no free parameters or invented entities are introduced.

axioms (1)
  • domain assumption Standard i.i.d. train/test splits and ERM training assumptions hold for the ImageNet variants used.
    Invoked when applying consistent training protocols across DD and CS sets.

pith-pipeline@v0.9.1-grok · 5797 in / 1175 out tokens · 45607 ms · 2026-06-27T01:43:50.745257+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

48 extracted references · 16 canonical work pages · 5 internal anchors

  1. [1]

    LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timo- thée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

  2. [2]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774, 2023

  3. [3]

    Beyond neural scaling laws: beating power law scaling via data pruning.Advances in Neural Information Processing Systems, 35:19523–19536, 2022

    Ben Sorscher, Robert Geirhos, Shashank Shekhar, Surya Ganguli, and Ari Morcos. Beyond neural scaling laws: beating power law scaling via data pruning.Advances in Neural Information Processing Systems, 35:19523–19536, 2022

  4. [4]

    Datacomp-lm: In search of the next generation of training sets for language models.Advances in Neural Information Processing Systems, 37:14200–14282, 2024

    Jeffrey Li, Alex Fang, Georgios Smyrnis, Maor Ivgi, Matt Jordan, Samir Yitzhak Gadre, Hritik Bansal, Etash Guha, Sedrick Scott Keh, Kushal Arora, et al. Datacomp-lm: In search of the next generation of training sets for language models.Advances in Neural Information Processing Systems, 37:14200–14282, 2024

  5. [5]

    Identifying mislabeled data using the area under the margin ranking

    Geoff Pleiss et al. Identifying mislabeled data using the area under the margin ranking. In NeurIPS, 2020

  6. [6]

    Fadrm: Fast and accurate data residual matching for dataset distillation

    Jiacheng Cui, Xinyue Bi, Yaxin Luo, Xiaohan Zhao, Jiacheng Liu, and Zhiqiang Shen. Fadrm: Fast and accurate data residual matching for dataset distillation. InThe Thirty-ninth Annual Conference on Neural Information Processing Systems

  7. [7]

    Mani- foldgd: Training-free hierarchical manifold guidance for diffusion-based dataset distillation

    Ayush Roy, Wei-Yang Alex Lee, Rudrasis Chakraborty, and Vishnu Suresh Lokhande. Mani- foldgd: Training-free hierarchical manifold guidance for diffusion-based dataset distillation. arXiv preprint arXiv:2602.23295, 2026

  8. [8]

    Dataset distillation via vision-language category prototype

    Yawen Zou, Guang Li, Duo Su, Zi Wang, Jun Yu, and Chao Zhang. Dataset distillation via vision-language category prototype. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025

  9. [9]

    The evolution of dataset distillation: Toward scalable and generalizable solutions.arXiv preprint arXiv:2502.05673, 2025

    Ping Liu and Jiawei Du. The evolution of dataset distillation: Toward scalable and generalizable solutions.arXiv preprint arXiv:2502.05673, 2025

  10. [10]

    Dataset distillation in the era of large-scale data: Methods, analysis, and future directions.Authorea Preprints, 2025

    Xinyi Shang, Peng Sun, Zhiqiang Shen, Tao Lin, and Jing-Hao Xue. Dataset distillation in the era of large-scale data: Methods, analysis, and future directions.Authorea Preprints, 2025

  11. [11]

    Coresets for data-efficient training of machine learning models

    Baharan Mirzasoleiman, Jeff Bilmes, and Jure Leskovec. Coresets for data-efficient training of machine learning models. InInternational Conference on Machine Learning, pages 6950–6960. PMLR, 2020. 10

  12. [12]

    Deepcore: A comprehensive library for coreset selection in deep learning

    Chengcheng Guo, Bo Zhao, and Yanbing Bai. Deepcore: A comprehensive library for coreset selection in deep learning. InInternational Conference on Database and Expert Systems Applications, pages 181–195. Springer, 2022

  13. [13]

    Taming diffusion for dataset distillation with high representativeness

    Lin Zhao, Yushu Wu, Xinru Jiang, Jianyang Gu, Yanzhi Wang, Xiaolin Xu, Pu Zhao, and Xue Lin. Taming diffusion for dataset distillation with high representativeness. InForty-second International Conference on Machine Learning

  14. [14]

    Chan Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, and Mubarak Shah

    Jeffrey A. Chan Santiago, Praveen Tirupattur, Gaurav Kumar Nayak, Gaowen Liu, and Mubarak Shah. MGD 3: Mode-guided dataset distillation using diffusion models. InProceedings of the 42nd International Conference on Machine Learning (ICML), 2025

  15. [15]

    Efficient dataset distillation via minimax diffusion

    Jianyang Gu, Saeed Vahidian, Vyacheslav Kungurtsev, Haonan Wang, Wei Jiang, Yang You, and Yiran Chen. Efficient dataset distillation via minimax diffusion. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), 2024

  16. [16]

    Distilling the Knowledge in a Neural Network

    Geoffrey Hinton, Oriol Vinyals, and Jeff Dean. Distilling the knowledge in a neural network. arXiv preprint arXiv:1503.02531, 2015

  17. [17]

    Coreset selection via llm-based concept bottlenecks.arXiv preprint arXiv:2502.16733, 2025

    Akshay Mehra, Trisha Mittal, Subhadra Gopalakrishnan, and Joshua Kimball. Coreset selection via llm-based concept bottlenecks.arXiv preprint arXiv:2502.16733, 2025

  18. [18]

    On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm, 2024

    Peng Sun, Bei Shi, Daiwei Yu, and Tao Lin. On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm, 2024

  19. [19]

    Elucidating the design space of dataset condensation.arXiv preprint arXiv:2404.13733, 2024

    Shitong Shao, Zikai Zhou, Huanran Chen, and Zhiqiang Shen. Elucidating the design space of dataset condensation.arXiv preprint arXiv:2404.13733, 2024

  20. [20]

    D^4: Dataset distillation via disentangled diffusion model

    Duo Su, Junjie Hou, Weizhi Gao, Yingjie Tian, and Bowen Tang. D^4: Dataset distillation via disentangled diffusion model. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 5809–5818, June 2024

  21. [21]

    Task-specific generative dataset distillation with difficulty-guided sampling

    Mingzhuo Li, Guang Li, Jiafeng Mao, Linfeng Ye, Takahiro Ogawa, and Miki Haseyama. Task-specific generative dataset distillation with difficulty-guided sampling. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV) Workshops, 2025

  22. [22]

    Fast-dit: Fast diffusion models with transformers

    Chuanyang Jin and Saining Xie. Fast-dit: Fast diffusion models with transformers. https: //github.com/chuanyangjin/fast-DiT, 2024

  23. [23]

    Generalized large-scale data condensation via various backbone and statistical matching, 2024

    Shitong Shao, Zeyuan Yin, Muxin Zhou, Xindong Zhang, and Zhiqiang Shen. Generalized large-scale data condensation via various backbone and statistical matching, 2024

  24. [24]

    Squeeze, recover and relabel: Dataset condensation at imagenet scale from a new perspective.Advances in Neural Information Processing Systems, 36:73582–73603, 2023

    Zeyuan Yin, Eric Xing, and Zhiqiang Shen. Squeeze, recover and relabel: Dataset condensation at imagenet scale from a new perspective.Advances in Neural Information Processing Systems, 36:73582–73603, 2023

  25. [25]

    Dataset condensation with gradient matching

    Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. Dataset condensation with gradient matching. InInternational Conference on Learning Representations (ICLR), 2021

  26. [26]

    Dataset distillation by matching training trajectories

    George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A Efros, and Jun-Yan Zhu. Dataset distillation by matching training trajectories. InProceedings of the IEEE/CVF confer- ence on computer vision and pattern recognition, pages 4750–4759, 2022

  27. [27]

    An empirical study of example forgetting during deep neural network learning.arXiv preprint arXiv:1812.05159, 2018

    Mariya Toneva et al. An empirical study of example forgetting during deep neural network learning.arXiv preprint arXiv:1812.05159, 2018

  28. [28]

    Selection via proxy: Efficient data selection for deep learning.arXiv preprint arXiv:1906.11829, 2019

    Cody Coleman et al. Selection via proxy: Efficient data selection for deep learning.arXiv preprint arXiv:1906.11829, 2019

  29. [29]

    A label is worth a thousand images in dataset distillation.Advances in Neural Information Processing Systems, 37:131946–131971, 2024

    Tian Qin, Zhiwei Deng, and David Alvarez-Melis. A label is worth a thousand images in dataset distillation.Advances in Neural Information Processing Systems, 37:131946–131971, 2024

  30. [30]

    Dd-ranking: Rethinking the evaluation of dataset distillation.arXiv preprint arXiv:2505.13300, 2025

    Zekai Li, Xinhao Zhong, Samir Khaki, Zhiyuan Liang, Yuhao Zhou, Mingjia Shi, Ziqiao Wang, Xuanlei Zhao, Wangbo Zhao, Ziheng Qin, et al. Dd-ranking: Rethinking the evaluation of dataset distillation.arXiv preprint arXiv:2505.13300, 2025. 11

  31. [31]

    Rectified decoupled dataset distillation: A closer look for fair and comprehensive evaluation.arXiv preprint arXiv:2509.19743, 2025

    Xinhao Zhong, Shuoyang Sun, Xulin Gu, Chenyang Zhu, Bin Chen, and Yaowei Wang. Rectified decoupled dataset distillation: A closer look for fair and comprehensive evaluation.arXiv preprint arXiv:2509.19743, 2025

  32. [32]

    Rethinking dataset distillation: Hard truths about soft labels

    Priyam Dey, Aditya Sahdev, Sunny Bhati, Konda Reddy Mopuri, and Venkatesh Babu Radhakr- ishnan. Rethinking dataset distillation: Hard truths about soft labels. InProceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 178–187, 2026

  33. [33]

    Unifying Dataset Pruning and Distillation for Efficient Large-scale Compression

    Lingao Xiao, Songhua Liu, Yang He, and Xinchao Wang. Rethinking large-scale dataset compression: Shifting focus from labels to images.arXiv preprint arXiv:2502.06434, 2025

  34. [34]

    Coverage-centric coreset selection for high pruning rates.arXiv preprint arXiv:2210.15809, 2022

    Haizhong Zheng, Rui Liu, Fan Lai, and Atul Prakash. Coverage-centric coreset selection for high pruning rates.arXiv preprint arXiv:2210.15809, 2022

  35. [35]

    Elfs: Enhancing label-free coreset selection via clustering-based pseudo-labeling

    Haizhong Zheng, Elisa Tsai, Yifu Lu, Jiachen Sun, Brian R Bartoldson, Bhavya Kailkhura, and Atul Prakash. Elfs: Enhancing label-free coreset selection via clustering-based pseudo-labeling. arXiv preprint arXiv:2406.04273, 2024

  36. [36]

    Dataset condensation with distribution matching

    Bo Zhao and Hakan Bilen. Dataset condensation with distribution matching. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 6514–6523, 2023

  37. [37]

    Imagenet: A large- scale hierarchical image database

    Jia Deng, Wei Dong, Richard Socher, Li-Jia Li, Kai Li, and Li Fei-Fei. Imagenet: A large- scale hierarchical image database. In2009 IEEE conference on computer vision and pattern recognition, pages 248–255. Ieee, 2009

  38. [38]

    Contrastive multiview coding

    Yonglong Tian, Dilip Krishnan, and Phillip Isola. Contrastive multiview coding. InEuropean conference on computer vision, pages 776–794. Springer, 2020

  39. [39]

    Imagenette: A smaller subset of 10 easily classified classes from imagenet

    Jeremy Howard. Imagenette: A smaller subset of 10 easily classified classes from imagenet. https://github.com/fastai/imagenette, April 2019. Accessed: 2019

  40. [40]

    Scalable diffusion models with transformers

    William Peebles and Saining Xie. Scalable diffusion models with transformers. InProceedings of the IEEE/CVF international conference on computer vision, pages 4195–4205, 2023

  41. [41]

    Synthesizing informative training samples with gan, 2022

    Bo Zhao and Hakan Bilen. Synthesizing informative training samples with gan, 2022

  42. [42]

    Active Learning for Convolutional Neural Networks: A Core-Set Approach

    Ozan Sener and Silvio Savarese. Active learning for convolutional neural networks: A core-set approach.arXiv preprint arXiv:1708.00489, 2017

  43. [43]

    Turning big data into tiny data: Constant-size coresets for k-means.SIAM Journal on Computing, 2020

    Dan Feldman et al. Turning big data into tiny data: Constant-size coresets for k-means.SIAM Journal on Computing, 2020

  44. [44]

    Understanding black-box predictions via influence functions

    Pang Wei Koh and Percy Liang. Understanding black-box predictions via influence functions. InICML, 2017

  45. [45]

    Gradmatch: Gradient matching based data subset selection

    Krishnateja Killamsetty et al. Gradmatch: Gradient matching based data subset selection. In ICML, 2021

  46. [46]

    Milo: Model-agnostic subset selection framework for efficient model training and tuning.arXiv preprint arXiv:2301.13287, 2023

    Krishnateja Killamsetty, Alexandre V Evfimievski, Tejaswini Pedapati, Kiran Kate, Lucian Popa, and Rishabh Iyer. Milo: Model-agnostic subset selection framework for efficient model training and tuning.arXiv preprint arXiv:2301.13287, 2023

  47. [47]

    Provable data subset selection for efficient neural networks training

    Murad Tukan, Samson Zhou, Alaa Maalouf, Daniela Rus, Vladimir Braverman, and Dan Feldman. Provable data subset selection for efficient neural networks training. InInternational Conference on Machine Learning, pages 34533–34555. PMLR, 2023

  48. [48]

    Grad-match: Gradient matching based data subset selection for efficient deep model training

    Krishnateja Killamsetty, Sivasubramanian Durga, Ganesh Ramakrishnan, Abir De, and Rishabh Iyer. Grad-match: Gradient matching based data subset selection for efficient deep model training. InInternational Conference on Machine Learning, pages 5464–5474. PMLR, 2021. 12 Appendix We present an additional details of DD and CS methods, followed by details of t...