arxiv: 2605.18012 · v1 · pith:QIWYXAJHnew · submitted 2026-05-18 · 💻 cs.CV · cs.AI · cs.LG

SAS: Semantic-aware Sampling for Generative Dataset Distillation

Pith reviewed 2026-05-20 12:16 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.LG
keywords dataset distillation, semantic-aware sampling, CLIP, generative dataset distillation, semantic scoring, data efficiency, image classification

The pith

Semantic-aware sampling with CLIP enhances the effectiveness of distilled datasets for model training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to show that adding semantic awareness to the sampling step of generative dataset distillation yields more useful compact datasets. It uses a pretrained CLIP model to score images on how well they represent their class, how separable they are from other classes, and how diverse the selected set is. A two-stage process first filters for discriminative candidates and then selects for diversity, which improves performance on downstream tasks without needing more data. This matters because current distillation methods focus on matching distributions or training statistics and pay limited attention to high-level meaning, so the approach could make efficient training more reliable.

Core claim

By leveraging CLIP as a semantic prior for post-sampling, we develop three semantic scoring functions that measure class relevance, inter-class separability, and intra-set diversity. We then apply a two-stage sampling strategy on generated image pools: first filtering for discriminative samples, then dynamically selecting for diversity. This produces distilled datasets that are compact, semantically discriminative, and diverse, leading to consistent gains in downstream model performance across various datasets and architectures.
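
The three scores can be sketched in a few lines. This is a minimal illustration, not the authors' code. It follows the formulas quoted in the Lean-link passage further down this page (s_rel as negative angular distance to the class text embedding, s_sep as the minimum angular distance to other classes, s_div as an average angular distance within the class). It assumes CLIP embeddings are already computed and passed in as plain lists of floats. The function names, the use of text embeddings as the reference for "other classes", and the use of already selected images as the reference for diversity are all assumptions.

```python
import math

def angular_distance(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    cos = max(-1.0, min(1.0, dot / (norm_u * norm_v)))  # clamp rounding error
    return math.acos(cos)

def relevance(image_emb, text_embs, c):
    """s_rel: negative angular distance to the text embedding of class c."""
    return -angular_distance(image_emb, text_embs[c])

def separability(image_emb, text_embs, c):
    """s_sep: smallest angular distance to any other class (assumed: text embeddings)."""
    return min(
        angular_distance(image_emb, t)
        for k, t in enumerate(text_embs)
        if k != c
    )

def diversity(image_emb, selected_embs):
    """s_div: mean angular distance to already selected same-class images (assumed)."""
    if not selected_embs:
        return 0.0
    total = sum(angular_distance(image_emb, s) for s in selected_embs)
    return total / len(selected_embs)
```

Higher is better for all three under this convention: an image close to its own class prompt, far from every other class prompt, and far from what has already been picked.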

What carries the argument

Three semantic scoring functions based on CLIP embeddings for class relevance, separability, and diversity, combined in a two-stage filtering and dynamic selection process.
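
A rough sketch of how a filter-then-diversify selection might look for a single class is given below. It is a simplified stand-in under stated assumptions, not the paper's procedure: stage one keeps images whose angular margin (distance to the nearest other class minus distance to the own class) clears a hypothetical `margin` threshold, and stage two greedily adds the candidate with the largest mean angular distance to the images already chosen. The paper's "dynamic" selection rule may differ, and `two_stage_select`, `budget`, and `margin` are illustrative names.

```python
import math

def angular_distance(u, v):
    """Angle in radians between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return math.acos(max(-1.0, min(1.0, dot / norm)))

def two_stage_select(image_embs, text_embs, c, budget, margin=0.0):
    """Return indices of up to `budget` images chosen for class c."""
    if budget <= 0:
        return []
    # Stage 1 (assumed form): keep images whose margin clears the threshold.
    candidates = []
    for i, v in enumerate(image_embs):
        own = angular_distance(v, text_embs[c])
        other = min(
            angular_distance(v, t) for k, t in enumerate(text_embs) if k != c
        )
        if other - own >= margin:
            candidates.append((other - own, i))
    if not candidates:
        return []
    # Stage 2 (assumed form): seed with the best margin, then greedily add
    # the candidate farthest on average from everything selected so far.
    candidates.sort(reverse=True)
    selected = [candidates[0][1]]
    remaining = [i for _, i in candidates[1:]]
    while remaining and len(selected) < budget:
        def mean_dist(i):
            dists = [angular_distance(image_embs[i], image_embs[j]) for j in selected]
            return sum(dists) / len(dists)
        best = max(remaining, key=mean_dist)
        selected.append(best)
        remaining.remove(best)
    return selected
```

The greedy second stage recomputes diversity against the growing selection, which is one plausible reading of "dynamic"; a fixed, one-shot diversity ranking would be cheaper but could pick near-duplicates.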

If this is right

  • Distilled datasets preserve more high-level semantic information.
  • Performance improves consistently on image classification tasks using the distilled sets.
  • The method can be applied as a post-processing step to existing distillation techniques.
  • Smaller datasets can achieve higher accuracy by being more semantically informative.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar semantic priors could be used in distillation for other data types like text or audio if appropriate models exist.
  • The approach might allow for even smaller dataset sizes while maintaining accuracy.
  • Integrating task-specific fine-tuning of the semantic model could further boost results on particular applications.

Load-bearing premise

The CLIP model provides a semantic space that accurately reflects class relevance and diversity for the specific datasets and tasks without domain mismatch.
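
One inexpensive way to probe this premise before running the full pipeline is to check how often the nearest class text embedding agrees with the pool label. The sketch below is an editorial suggestion, not something the paper describes. It assumes precomputed image and text embeddings as lists of floats, and `zero_shot_agreement` is a hypothetical helper name.

```python
import math

def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm

def zero_shot_agreement(image_embs, labels, text_embs):
    """Fraction of images whose most similar class text embedding matches the label."""
    hits = 0
    for v, y in zip(image_embs, labels):
        predicted = max(range(len(text_embs)), key=lambda k: cosine(v, text_embs[k]))
        hits += int(predicted == y)
    return hits / len(labels)
```

A low agreement rate on a given dataset would suggest the semantic space is misaligned with the labels there, in which case the relevance and separability scores are less likely to be informative.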

What would settle it

Observing no performance gain or a loss when applying the method to a dataset with significant domain shift from CLIP's training data, such as specialized scientific imagery, would indicate the claim does not hold.

Figures

Figures reproduced from arXiv: 2605.18012 by Guang Li, Jiafeng Mao, Konstantinos N. Plataniotis, Linfeng Ye, Miki Haseyama, Mingzhuo Li, Takahiro Ogawa.

Figures 1 to 5 (images omitted; captions not recovered in this extraction)

Original abstract

Deep neural networks have achieved impressive performance across a wide range of tasks, but this success often comes with substantial computational and storage costs due to large-scale training data. Dataset distillation addresses this challenge by constructing compact yet informative datasets that enable efficient model training while maintaining downstream performance. However, most existing approaches primarily emphasize matching data distributions or downstream training statistics, with limited attention to preserving high-level semantic information in the distilled data. In this work, we introduce a semantic-aware perspective for dataset distillation by leveraging Contrastive Language-Image Pretraining (CLIP) as a semantic prior for post-sampling. Our goal is to obtain distilled datasets that are not only compact but also semantically class-discriminative and diverse. To this end, we design three semantic scoring functions that quantify class relevance, inter-class separability, and intra-set diversity in a pretrained semantic space. Based on image pools generated by existing distillation methods, we further develop a two-stage strategy for effective sampling: the first stage filters semantically discriminative samples to form a reliable candidate set, and the second stage performs a dynamic diversity-aware selection to reduce redundancy while preserving semantic coverage. Extensive experiments across multiple datasets, image pools, and downstream models demonstrate consistent performance gains, highlighting the effectiveness of incorporating semantic information into dataset distillation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript proposes SAS, a semantic-aware sampling method for generative dataset distillation. It uses a pretrained CLIP model to define three scoring functions quantifying class relevance, inter-class separability, and intra-set diversity. These are applied via a two-stage process (filtering semantically discriminative samples from existing distillation pools, followed by dynamic diversity-aware selection) to produce compact datasets that are claimed to be more class-discriminative and diverse. Extensive experiments on datasets including CIFAR-10/100 and Tiny-ImageNet across multiple image pools and downstream models are said to show consistent performance gains over baselines.

Significance. If the central results hold after addressing the noted concerns, the work offers a practical post-processing step that incorporates high-level semantic priors into dataset distillation. This could improve the utility of distilled datasets for efficient model training while better preserving semantic structure, and the two-stage strategy is modular enough to apply atop various existing distillation methods.

major comments (2)
  1. [Method (scoring functions and two-stage strategy)] The central claim that performance gains arise specifically from 'semantic awareness' via CLIP (as stated in the abstract) rests on the assumption that CLIP embeddings provide an unbiased, task-appropriate metric without substantial domain shift or label misalignment on the target distributions. No analysis, ablation, or control experiment (e.g., comparing against non-CLIP heuristics or random selection with matched cardinality) is described to rule out that gains could stem from any non-random filtering rather than the semantic scores themselves.
  2. [Experiments section] The abstract asserts 'consistent performance gains' and 'extensive experiments,' yet supplies no quantitative tables, error bars, statistical tests, or ablation details on the individual contributions of the three scoring functions. Without these, it is impossible to assess robustness or whether improvements are load-bearing for the semantic-aware claim.
minor comments (2)
  1. [Method] Notation for the three scoring functions could be introduced more clearly with explicit equations rather than descriptive text to aid reproducibility.
  2. [Abstract] The abstract would benefit from naming the specific baseline distillation methods and exact dataset splits used in the comparisons.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive comments on our manuscript. We address each of the major comments below and outline the revisions we plan to make to strengthen the paper.

Point-by-point responses
  1. Referee: [Method (scoring functions and two-stage strategy)] The central claim that performance gains arise specifically from 'semantic awareness' via CLIP (as stated in the abstract) rests on the assumption that CLIP embeddings provide an unbiased, task-appropriate metric without substantial domain shift or label misalignment on the target distributions. No analysis, ablation, or control experiment (e.g., comparing against non-CLIP heuristics or random selection with matched cardinality) is described to rule out that gains could stem from any non-random filtering rather than the semantic scores themselves.

    Authors: We agree that additional controls are necessary to more rigorously attribute the performance improvements to the semantic scoring functions rather than the filtering process in general. In the revised manuscript, we will include new experiments comparing our method against random selection from the same image pool with matched cardinality, as well as against non-semantic heuristics such as selecting based on image quality metrics or simple clustering without CLIP. These ablations will help isolate the contribution of the CLIP-based semantic awareness. We will also discuss potential domain shift issues and how the two-stage strategy mitigates them.

    Revision: yes
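
The matched-cardinality random control mentioned in this exchange could be built along the following lines. This is a sketch under assumptions: the pool is represented only by a list of integer class labels, the semantic method's output is a list of pool indices, and `matched_random_baseline` is a hypothetical name. It draws, for each class, the same number of images that the semantic selection kept, uniformly at random from that class's pool.

```python
import random
from collections import Counter, defaultdict

def matched_random_baseline(pool_labels, selected_indices, seed=0):
    """Randomly pick pool indices with the same per-class counts as `selected_indices`."""
    rng = random.Random(seed)
    # How many images the semantic method kept for each class.
    per_class_budget = Counter(pool_labels[i] for i in selected_indices)
    # Pool indices grouped by class label.
    by_class = defaultdict(list)
    for idx, label in enumerate(pool_labels):
        by_class[label].append(idx)
    baseline = []
    for label, k in sorted(per_class_budget.items()):
        baseline.extend(rng.sample(by_class[label], k))  # without replacement
    return baseline
```

Repeating the draw over several seeds and training on each draw gives a distribution of baseline accuracies to compare against, rather than a single lucky or unlucky sample.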

  2. Referee: [Experiments section] The abstract asserts 'consistent performance gains' and 'extensive experiments,' yet supplies no quantitative tables, error bars, statistical tests, or ablation details on the individual contributions of the three scoring functions. Without these, it is impossible to assess robustness or whether improvements are load-bearing for the semantic-aware claim.

    Authors: We apologize for any lack of clarity in the presentation. The full manuscript does contain quantitative results in the form of performance tables across CIFAR-10, CIFAR-100, and Tiny-ImageNet for various image pools and downstream models. However, we acknowledge that error bars from multiple runs, statistical tests for significance, and detailed ablations on the individual scoring functions (class relevance, inter-class separability, and intra-set diversity) are not reported in sufficient detail. In the revision, we will add these: standard deviations over 3-5 runs, p-values or t-tests where appropriate, and ablation studies showing the impact of each scoring function and the two-stage process. This will better support the claims of consistent gains and the importance of semantic awareness.

    Revision: yes
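
For the promised error bars and t-tests, a minimal sketch of the arithmetic is shown below, using only the Python standard library. It assumes each method has a list of test accuracies from independent runs (at least two per method), and it uses Welch's unequal-variance t statistic as one reasonable choice; the rebuttal does not say which test the authors intend. The helper names are hypothetical.

```python
import math
import statistics

def summarize(accs):
    """Mean and sample standard deviation of accuracies over runs."""
    return statistics.mean(accs), statistics.stdev(accs)

def welch_t(a, b):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom."""
    mean_a, mean_b = statistics.mean(a), statistics.mean(b)
    # Squared standard errors of each mean (sample variance / n).
    se_a = statistics.variance(a) / len(a)
    se_b = statistics.variance(b) / len(b)
    t = (mean_a - mean_b) / math.sqrt(se_a + se_b)
    df = (se_a + se_b) ** 2 / (
        se_a ** 2 / (len(a) - 1) + se_b ** 2 / (len(b) - 1)
    )
    return t, df
```

Turning the statistic into a p-value needs the t-distribution CDF, which the standard library does not provide; `scipy.stats.ttest_ind` with `equal_var=False` computes the same test end to end. With only 3 to 5 runs per method the test has little power, so reporting the per-run values alongside it is worth considering.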

Circularity Check

0 steps flagged

No significant circularity; method uses external CLIP embeddings and prior distillation pools as independent inputs.

full rationale

The paper defines semantic scoring functions (class relevance, inter-class separability, intra-set diversity) directly from a pretrained external CLIP model applied to image pools produced by existing distillation methods. These functions and the subsequent two-stage filter-then-diversity selection are not fitted to the target downstream performance inside the paper, nor do they reduce to self-citations or self-defined quantities. Experimental gains are reported against external benchmarks (CIFAR-10/100, Tiny-ImageNet, multiple models), making the derivation self-contained rather than circular by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The method rests on the assumption that CLIP embeddings reliably encode the semantic properties needed for distillation; no free parameters or new entities are introduced in the abstract description.

axioms (1)
  • domain assumption: Pretrained CLIP provides a suitable semantic prior for quantifying class relevance, inter-class separability, and intra-set diversity in the target image domains.
    Invoked when the three scoring functions are defined and applied to image pools.

pith-pipeline@v0.9.0 · 5781 in / 1215 out tokens · 34278 ms · 2026-05-20T12:16:02.064376+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem: unclear.

    Paper passage: "we design three semantic scoring functions that quantify class relevance, inter-class separability, and intra-set diversity in a pretrained semantic space... s_rel(x_i, c) = -d_ang(v_i, t_c), s_sep = min d_ang to other classes, s_div = average d_ang within class; two-stage: margin filtering then dynamic diversity-aware selection"
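
    Written out, the scores in this passage might take the following form. This is a reconstruction from the compressed quote, not a copy of the paper's equations. It assumes v_i is the CLIP image embedding of x_i, t_c is the CLIP text embedding of class c, d_ang is the angle between two embeddings, the minimum in s_sep runs over the text embeddings of the other classes, and S_c is the set of same-class images that x_i is compared against. The paper may normalize d_ang or define the reference sets differently.

```latex
\begin{align*}
d_{\mathrm{ang}}(u, v) &= \arccos\!\left( \frac{u^{\top} v}{\lVert u \rVert \, \lVert v \rVert} \right) \\
s_{\mathrm{rel}}(x_i, c) &= -\, d_{\mathrm{ang}}(v_i, t_c) \\
s_{\mathrm{sep}}(x_i, c) &= \min_{c' \neq c} d_{\mathrm{ang}}(v_i, t_{c'}) \\
s_{\mathrm{div}}(x_i) &= \frac{1}{\lvert S_c \rvert} \sum_{x_j \in S_c} d_{\mathrm{ang}}(v_i, v_j)
\end{align*}
```

    Under this reading, "margin filtering" would keep x_i when s_rel(x_i, c) + s_sep(x_i, c) exceeds a threshold, which is the distance to the nearest other class minus the distance to the own class.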

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
