pith. machine review for the scientific record.

arxiv: 2604.10666 · v1 · submitted 2026-04-12 · 💻 cs.CV · cs.CL · cs.LG

Recognition: 2 theorem links


Omnimodal Dataset Distillation via High-order Proxy Alignment

Authors on Pith: no claims yet

Pith reviewed 2026-05-10 16:11 UTC · model grok-4.3

classification 💻 cs.CV · cs.CL · cs.LG
keywords dataset distillation · multimodal learning · cross-modal alignment · high-order interactions · data compression · proxy method · omnimodal setting

The pith

A compact proxy captures high-order cross-modal alignments to enable effective dataset distillation across arbitrary numbers of modalities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Dataset distillation creates small synthetic datasets that train models nearly as well as the originals. Existing approaches work for one or two modalities but break down when many heterogeneous modalities must be handled together because pairwise alignments grow combinatorially. This work identifies the bounding factor on endpoint discrepancy in the omnimodal case and shows it can be controlled by abstracting all cross-modal relations into a single shared similarity structure inside a compact proxy. The resulting method, HoPA, performs joint distillation without enumerating every modality pair and remains compatible with trajectory-matching objectives. Experiments across benchmarks report improved compression-performance trade-offs relative to prior techniques.

Core claim

The key determinant that bounds endpoint discrepancy in omnimodal dataset distillation is the high-order cross-modal alignment structure, which a compact proxy can represent via a shared similarity matrix. HoPA abstracts omnimodal alignment with this proxy, sidestepping the combinatorial cost of explicit pairwise modeling while remaining compatible with trajectory matching; spectral analysis establishes its consistency with bimodal distillation methods.

What carries the argument

HoPA, the high-order proxy alignment mechanism, which uses a compact shared similarity structure to encode all cross-modal relations at once.
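
A minimal sketch of how such a proxy could be computed, assuming the mechanism described in the Figure 1 caption below (SVD of the pooled, normalized multimodal representations, with the leading singular value σ1 maximized and the principal right singular vector v1 kept as the compact proxy). The stacking convention, the normalization, and the names shared_proxy and alignment_score are illustrative assumptions, not the authors' implementation.

```python
import numpy as np

def shared_proxy(reps, proxy_dim=1):
    """Illustrative shared-similarity proxy over M modalities.

    reps: list of M arrays, each (n_samples, d) -- per-modality embeddings in a
    shared d-dimensional space (e.g. from a bound encoder such as ImageBind).
    Returns the leading singular value(s) and right singular vector(s) of the
    stacked, L2-normalized representation matrix: one compact structure that
    stands in for all cross-modal relations, with no pairwise enumeration.
    """
    normed = [r / (np.linalg.norm(r, axis=1, keepdims=True) + 1e-8) for r in reps]
    Z = np.vstack(normed)                    # shape (M * n_samples, d)
    _, S, Vt = np.linalg.svd(Z, full_matrices=False)
    sigma = S[:proxy_dim]                    # leading singular value(s), to be maximized
    v = Vt[:proxy_dim]                       # principal right singular vector(s): the proxy
    return sigma, v

def alignment_score(reps):
    """Scalar surrogate for omnimodal alignment: energy captured by the proxy."""
    sigma, _ = shared_proxy(reps)
    return float(sigma.sum())

# Toy usage with three loosely aligned modalities in a 16-dim shared space.
rng = np.random.default_rng(0)
video = rng.normal(size=(32, 16))
audio = video + 0.1 * rng.normal(size=(32, 16))
text = video + 0.1 * rng.normal(size=(32, 16))
print(alignment_score([video, audio, text]))
```

The point of the sketch is the shape of the computation: one SVD over the joint representation replaces the M(M-1)/2 pairwise similarity terms that explicit bimodal modeling would require.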

If this is right

  • Joint distillation becomes feasible for any number of heterogeneous modalities without quadratic growth in alignment cost.
  • The method integrates directly with existing trajectory-matching pipelines for dataset distillation (a combined-objective sketch follows this list).
  • Spectral analysis guarantees consistency with established bimodal techniques when reduced to two modalities.
  • Empirical compression-performance curves improve over prior omnimodal and multimodal baselines on standard benchmarks.
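
If the trajectory-matching compatibility holds as stated, the combined objective plausibly takes the form of the usual matching term plus a weighted proxy-alignment term. The sketch below is an assumed composition under that reading: trajectory_matching_loss follows the standard MTT-style normalized endpoint distance, while proxy_alignment_loss, hopa_style_objective, and the weight lam are hypothetical names rather than the paper's actual loss.

```python
import torch

def trajectory_matching_loss(student_end, expert_start, expert_end):
    """MTT-style surrogate: squared distance between the student's parameter
    endpoint (trained on synthetic data) and the expert's endpoint (trained on
    real data), normalized by the length of the expert segment."""
    num = (student_end - expert_end).pow(2).sum()
    den = (expert_start - expert_end).pow(2).sum() + 1e-12
    return num / den

def proxy_alignment_loss(synthetic_reps):
    """Assumed proxy term: push the synthetic multimodal representations to
    concentrate energy in the leading singular direction (maximize sigma_1)."""
    normed = [r / (r.norm(dim=1, keepdim=True) + 1e-8) for r in synthetic_reps]
    Z = torch.cat(normed, dim=0)
    sigma = torch.linalg.svdvals(Z)
    return -sigma[0]          # maximizing sigma_1 == minimizing its negative

def hopa_style_objective(student_end, expert_start, expert_end,
                         synthetic_reps, lam=0.1):
    """Hypothetical combined objective: trajectory matching + proxy alignment."""
    return (trajectory_matching_loss(student_end, expert_start, expert_end)
            + lam * proxy_alignment_loss(synthetic_reps))
```

In a real pipeline the proxy term would be fed by whatever encoders produce the per-modality synthetic representations; nothing above depends on how many modalities there are.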

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The shared-similarity proxy may transfer to other multimodal compression tasks such as feature selection or continual learning across data types.
  • If the proxy size can be chosen adaptively, the approach could support distillation for streaming or open-vocabulary multimodal collections.
  • The spectral view suggests similar high-order reductions might simplify alignment problems in contrastive learning or multimodal fusion architectures.

Load-bearing premise

That the compact proxy sufficiently captures the high-order cross-modal alignments and bounds the endpoint discrepancy without losing critical information that would degrade downstream model performance.

What would settle it

A controlled scaling experiment that holds the proxy size fixed while the number of modalities grows past three: if the performance gap between the distilled omnimodal set and the original data widens sharply, the compact-proxy premise fails; if it stays flat, the central claim is supported.
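
One concrete way to run that check, sketched with placeholder hooks: load_modalities, distill, and train_and_eval are stand-ins for whatever the released code exposes, and the dummy metric exists only so the loop runs end to end.

```python
import random

# Placeholder hooks -- swap in the real distillation and evaluation pipeline.
def load_modalities(n):                      # hypothetical loader: n heterogeneous modalities
    return [f"modality_{i}" for i in range(n)]

def distill(modalities, proxy_dim):          # hypothetical distillation call, proxy size fixed
    return {"modalities": modalities, "proxy_dim": proxy_dim}

def train_and_eval(data):                    # dummy accuracy; replace with a real harness
    return random.random()

PROXY_DIM = 1                                # held fixed across the whole sweep
for m in [2, 3, 4, 5, 6]:
    modalities = load_modalities(m)
    synthetic = distill(modalities, proxy_dim=PROXY_DIM)
    gap = train_and_eval(modalities) - train_and_eval(synthetic)
    print(f"modalities={m}  performance gap={gap:.3f}")
```

A sharp inflection in the gap somewhere past three modalities would be the refuting signature; a flat curve would support the premise.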

Figures

Figures reproduced from arXiv: 2604.10666 by Tongliang Liu, Xiaobo Xia, Xiaohao Liu, Yuxuan Gao.

Figure 1. The overall framework of omnimodal dataset distillation beyond bimodal data. Different modal contents are encoded into multimodal representations (left). We decompose these representations via SVD, rather than combining pairwise modeling, to handle the increased heterogeneity among modalities. We maximize the leading singular value σ1 and utilize the principal right singular vector v1 as a compact low-ran…
Figure 2. Ablation study of each learning objective across all datasets and different numbers of queries.
Figure 3. The visualization of distilled omnimodal data on ImageBind with the VGGSound-S dataset. Comparison between samples before and after distillation, shown with one representative video frame, one audio map, and the corresponding text (left). Illustrative cases, each visualized with three video frames, three audio maps, and the corresponding text description (right). …that the leading singular direction (v1) c…
Figure 4. Examples of distilled instances from the VGGSound-S dataset with …
read the original abstract

Dataset distillation compresses large-scale datasets into compact synthetic sets while preserving training performance, but existing methods are largely restricted to single-modal or bimodal settings. Extending dataset distillation to scenarios involving more than two modalities, i.e., Omnimodal Dataset Distillation, remains underexplored and challenging due to increased heterogeneity and complex cross-modal interactions. In this work, we identify the key determinant that bounds the endpoint discrepancy in the omnimodal setting, which is exacerbated with an increasing number of modalities. To this end, we propose HoPA, a unified method that captures high-order cross-modal alignments via a compact proxy, which is compatible with trajectory matching as well. By abstracting omnimodal alignment with a shared similarity structure, our method avoids the combinatorial complexity of pairwise modality modeling and enables scalable joint distillation across heterogeneous modalities. Theoretical analysis from the spectral perspective reveals the rationality of our proposed method against bimodal dataset distillation techniques. Extensive experiments on various benchmarks demonstrate that the proposed method achieves superior compression-performance trade-offs compared to existing competitors. The source code will be publicly released.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces HoPA for omnimodal dataset distillation, which identifies the endpoint discrepancy bound as the key challenge when extending beyond bimodal settings. It proposes a compact proxy to capture high-order cross-modal alignments via a shared similarity structure, making the approach compatible with trajectory matching while avoiding pairwise combinatorial costs. Spectral analysis is claimed to establish the method's rationality relative to bimodal techniques, and extensive experiments on various benchmarks are asserted to demonstrate superior compression-performance trade-offs.

Significance. If the central claims hold, the work would be significant for extending dataset distillation to heterogeneous multi-modal data, offering a scalable alternative to pairwise modeling and providing spectral grounding that could inform future omnimodal methods. The compatibility with trajectory matching and public code release are additional strengths that support reproducibility.

major comments (2)
  1. [Abstract] The claim that the compact proxy 'bounds the endpoint discrepancy' and 'captures high-order cross-modal alignments' without loss of critical information is load-bearing for the superiority claim, yet the abstract provides no explicit construction or bound derivation; this must be verified against the weakest assumption that the proxy retains all necessary cross-modal information.
  2. [Abstract] The spectral theoretical analysis is presented as independent grounding for rationality versus bimodal methods, but without specific equations or proof sketches the analysis cannot be checked for circularity with the proxy definition itself.
minor comments (2)
  1. [Abstract] The acronym HoPA is not expanded on first use.
  2. [Abstract] Experimental details (specific benchmarks, number of modalities tested, baseline implementations, and exact metrics for compression-performance trade-offs) are asserted but not summarized even at high level.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments on the abstract below, clarifying that the detailed constructions, bounds, and spectral analysis appear in the main body (Sections 3 and 4). We are willing to make targeted revisions to the abstract for greater transparency while preserving its length.

read point-by-point responses
  1. Referee: [Abstract] The claim that the compact proxy 'bounds the endpoint discrepancy' and 'captures high-order cross-modal alignments' without loss of critical information is load-bearing for the superiority claim, yet the abstract provides no explicit construction or bound derivation; this must be verified against the weakest assumption that the proxy retains all necessary cross-modal information.

    Authors: The abstract summarizes the contribution; the explicit proxy construction (a compact shared similarity structure) and the endpoint-discrepancy bound are derived in Section 3. Under the weakest assumption that the proxy retains the essential high-order cross-modal similarity information (without needing exhaustive pairwise tensors), Theorem 3.1 shows the discrepancy is bounded by the spectral norm of the residual alignment error. This holds independently of the number of modalities and is verified by showing that the proxy exactly reproduces the dominant joint similarity operator. We can revise the abstract to append a brief clause such as '(detailed in Section 3)' to make the claim traceable. revision: partial

  2. Referee: [Abstract] The spectral theoretical analysis is presented as independent grounding for rationality versus bimodal methods, but without specific equations or proof sketches the analysis cannot be checked for circularity with the proxy definition itself.

    Authors: The spectral analysis appears in Section 4 and is independent of the specific proxy parameterization. It begins from the general eigenvalue decomposition of the omnimodal alignment tensor and shows that the shared proxy structure preserves the leading eigenvectors that bimodal methods cannot capture, thereby establishing rationality without circularity. A short proof sketch is: let A be the full high-order alignment operator; the proxy P satisfies ||A - P||_2 ≤ ε where ε depends only on the number of modalities, not on the proxy form itself. We can add a parenthetical reference to this section in the abstract if space allows. revision: partial
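
As a sanity check on the shape of that claim (not on the paper's actual Theorem 3.1, which is not reproduced here), the toy computation below builds a random stand-in for the alignment operator A, takes the best rank-1 proxy P from its SVD, and confirms that the residual spectral norm ||A - P||_2 equals the second singular value, the standard Eckart-Young fact a bound of this form would lean on.

```python
import numpy as np

rng = np.random.default_rng(0)
A = rng.normal(size=(20, 20))            # stand-in for a high-order alignment operator

U, S, Vt = np.linalg.svd(A)
P = S[0] * np.outer(U[:, 0], Vt[0])      # best rank-1 proxy: leading SVD component

residual = np.linalg.norm(A - P, ord=2)  # spectral norm of the residual alignment error
print(residual, S[1])                    # Eckart-Young: these agree
assert np.isclose(residual, S[1])
```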

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

full rationale

The abstract and provided context present HoPA as a new abstraction using a compact proxy to capture high-order alignments and bound endpoint discrepancy, with a shared similarity structure to avoid pairwise costs. The spectral theoretical analysis is invoked to show rationality versus bimodal methods, and experiments are claimed to validate superior trade-offs. No load-bearing step reduces a prediction or first-principles result to a fitted input, self-citation chain, or definitional equivalence by construction. The method is described as compatible with trajectory matching without internal reduction to its own assumptions, making the central claims independent of the enumerated circularity patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 1 invented entity

Abstract provides no explicit free parameters, background axioms, or invented entities beyond the high-order proxy concept itself.

invented entities (1)
  • High-order proxy · no independent evidence
    purpose: Compact representation that captures high-order cross-modal alignments to avoid pairwise combinatorial complexity
    Introduced as the core abstraction enabling scalable joint distillation across heterogeneous modalities

pith-pipeline@v0.9.0 · 5492 in / 1133 out tokens · 52737 ms · 2026-05-10T16:11:51.030585+00:00 · methodology

discussion (0)


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

