Omnimodal Dataset Distillation via High-order Proxy Alignment
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 16:11 UTC · model grok-4.3
The pith
A compact proxy captures high-order cross-modal alignments to enable effective dataset distillation across arbitrary numbers of modalities.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The key determinant that bounds endpoint discrepancy in omnimodal dataset distillation is the high-order cross-modal alignment structure, which a compact proxy can represent via a shared similarity matrix. HoPA abstracts omnimodal alignment with this proxy, sidestepping the combinatorial cost of explicit pairwise modeling while remaining compatible with trajectory matching; spectral analysis establishes its consistency with bimodal distillation methods.
What carries the argument
HoPA (High-order Proxy Alignment), which uses a compact shared similarity structure to encode all cross-modal relations at once.
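The abstract does not spell out the proxy's construction. A minimal sketch of one plausible reading, assuming per-modality embeddings of the same $n$ samples and a low-rank proxy taken from the averaged per-modality Gram matrix; the function name, the averaging step, and the rank choice are illustrative assumptions, not the paper's verified construction:

```python
import numpy as np

def shared_similarity_proxy(embeddings, rank=1):
    """Sketch: compress cross-modal alignment into one shared n x n proxy.

    embeddings: list of (n, d_m) arrays, one per modality (d_m may differ).
    Cost grows linearly in the number of modalities M, instead of over
    all O(M^2) modality pairs.
    """
    n = embeddings[0].shape[0]
    G = np.zeros((n, n))
    for Z in embeddings:
        Zn = Z / (np.linalg.norm(Z, axis=1, keepdims=True) + 1e-8)
        G += Zn @ Zn.T              # per-modality cosine similarity
    G /= len(embeddings)            # one shared similarity structure

    vals, vecs = np.linalg.eigh(G)  # G is symmetric; eigenvalues ascending
    U, S = vecs[:, -rank:], vals[-rank:]
    return (U * S) @ U.T            # compact low-rank proxy of G

# Toy usage: three heterogeneous modalities, 32 shared samples.
rng = np.random.default_rng(0)
proxy = shared_similarity_proxy([rng.normal(size=(32, d)) for d in (64, 128, 40)])
print(proxy.shape)                  # (32, 32), regardless of modality count
```

Whatever the paper's exact construction, the load-bearing property is that this single $n \times n$ object, rather than a tensor over modality pairs, carries the alignment signal that distillation must preserve.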
If this is right
- Joint distillation becomes feasible for any number of heterogeneous modalities without quadratic growth in alignment cost.
- The method integrates directly with existing trajectory-matching pipelines for dataset distillation.
- Spectral analysis guarantees consistency with established bimodal techniques when reduced to two modalities.
- Empirical compression-performance curves improve over prior omnimodal and multimodal baselines on standard benchmarks.
Where Pith is reading between the lines
- The shared-similarity proxy may transfer to other multimodal compression tasks such as feature selection or continual learning across data types.
- If the proxy size can be chosen adaptively, the approach could support distillation for streaming or open-vocabulary multimodal collections.
- The spectral view suggests similar high-order reductions might simplify alignment problems in contrastive learning or multimodal fusion architectures.
Load-bearing premise
That the compact proxy sufficiently captures the high-order cross-modal alignments and bounds the endpoint discrepancy without losing critical information that would degrade downstream model performance.
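A hedged rendering of what the premise amounts to, in notation of our choosing rather than the paper's: write $\theta_T(\mathcal{S})$ and $\theta_T(\mathcal{D})$ for the parameters reached after $T$ training steps on the synthetic set and on the original data, $A$ for the full high-order alignment operator, and $P$ for the compact proxy. The premise is that a bound of roughly this shape holds with a modest constant $C$:

```latex
\big\| \theta_T(\mathcal{S}) - \theta_T(\mathcal{D}) \big\|
  \;\le\; C \, \big\| A - P \big\|_{2}
```

If $\|A - P\|_2$ stays small as modalities are added, the proxy suffices; if it grows, the premise fails in exactly the way the scaling test below probes.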
What would settle it
A controlled scaling experiment that holds the proxy size fixed and checks whether the performance gap between the distilled omnimodal set and the original data widens sharply once the number of modalities exceeds three.
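A sketch of how that test could be harnessed, with `distill` and `evaluate` passed in as stand-ins for whatever the released pipeline exposes (both APIs are hypothetical):

```python
def proxy_scaling_gaps(datasets, distill, evaluate, proxy_size=64):
    """Sweep modality count M with the proxy size held fixed.

    datasets: dict mapping M -> (train_set, test_set).
    distill(train_set, proxy_size) -> synthetic set (hypothetical API).
    evaluate(train_set, test_set) -> scalar score (hypothetical API).
    Returns the per-M gap between full-data and distilled performance;
    a sharp widening past M = 3 would undercut the load-bearing premise.
    """
    gaps = {}
    for M, (train, test) in sorted(datasets.items()):  # M = 2, 3, 4, ...
        synthetic = distill(train, proxy_size=proxy_size)
        gaps[M] = evaluate(train, test) - evaluate(synthetic, test)
    return gaps
```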
Original abstract
Dataset distillation compresses large-scale datasets into compact synthetic sets while preserving training performance, but existing methods are largely restricted to single-modal or bimodal settings. Extending dataset distillation to scenarios involving more than two modalities, i.e., Omnimodal Dataset Distillation, remains underexplored and challenging due to increased heterogeneity and complex cross-modal interactions. In this work, we identify the key determinant that bounds the endpoint discrepancy in the omnimodal setting, which is exacerbated with an increasing number of modalities. To this end, we propose HoPA, a unified method that captures high-order cross-modal alignments via a compact proxy, which is compatible with trajectory matching as well. By abstracting omnimodal alignment with a shared similarity structure, our method avoids the combinatorial complexity of pairwise modality modeling and enables scalable joint distillation across heterogeneous modalities. Theoretical analysis from the spectral perspective reveals the rationality of our proposed method against bimodal dataset distillation techniques. Extensive experiments on various benchmarks demonstrate that the proposed method achieves superior compression-performance trade-offs compared to existing competitors. The source code will be publicly released.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces HoPA for omnimodal dataset distillation, which identifies the endpoint discrepancy bound as the key challenge when extending beyond bimodal settings. It proposes a compact proxy to capture high-order cross-modal alignments via a shared similarity structure, making the approach compatible with trajectory matching while avoiding pairwise combinatorial costs. Spectral analysis is claimed to establish the method's rationality relative to bimodal techniques, and extensive experiments on various benchmarks are asserted to demonstrate superior compression-performance trade-offs.
Significance. If the central claims hold, the work would be significant for extending dataset distillation to heterogeneous multi-modal data, offering a scalable alternative to pairwise modeling and providing spectral grounding that could inform future omnimodal methods. The compatibility with trajectory matching and public code release are additional strengths that support reproducibility.
Major comments (2)
- [Abstract] The claim that the compact proxy 'bounds the endpoint discrepancy' and 'captures high-order cross-modal alignments' without loss of critical information is load-bearing for the superiority claim, yet the abstract provides no explicit construction or bound derivation; this must be verified against the weakest assumption that the proxy retains all necessary cross-modal information.
- [Abstract] The spectral theoretical analysis is presented as independent grounding for rationality versus bimodal methods, but without specific equations or proof sketches the analysis cannot be checked for circularity with the proxy definition itself.
Minor comments (2)
- [Abstract] The acronym HoPA is not expanded on first use.
- [Abstract] Experimental details (specific benchmarks, number of modalities tested, baseline implementations, and exact metrics for compression-performance trade-offs) are asserted but not summarized even at high level.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments on the abstract below, clarifying that the detailed constructions, bounds, and spectral analysis appear in the main body (Sections 3 and 4). We are willing to make targeted revisions to the abstract for greater transparency while preserving its length.
Point-by-point responses
-
Referee: [Abstract] The claim that the compact proxy 'bounds the endpoint discrepancy' and 'captures high-order cross-modal alignments' without loss of critical information is load-bearing for the superiority claim, yet the abstract provides no explicit construction or bound derivation; this must be verified against the weakest assumption that the proxy retains all necessary cross-modal information.
Authors: The abstract summarizes the contribution; the explicit proxy construction (a compact shared similarity structure) and the endpoint-discrepancy bound are derived in Section 3. Under the weakest assumption that the proxy retains the essential high-order cross-modal similarity information (without needing exhaustive pairwise tensors), Theorem 3.1 shows the discrepancy is bounded by the spectral norm of the residual alignment error. This holds independently of the number of modalities and is verified by showing that the proxy exactly reproduces the dominant joint similarity operator. We can revise the abstract to append a brief clause such as '(detailed in Section 3)' to make the claim traceable. Revision: partial.
-
Referee: [Abstract] The spectral theoretical analysis is presented as independent grounding for rationality versus bimodal methods, but without specific equations or proof sketches the analysis cannot be checked for circularity with the proxy definition itself.
Authors: The spectral analysis appears in Section 4 and is independent of the specific proxy parameterization. It begins from the general eigenvalue decomposition of the omnimodal alignment tensor and shows that the shared proxy structure preserves the leading eigenvectors that bimodal methods cannot capture, thereby establishing rationality without circularity. A short proof sketch is: let $A$ be the full high-order alignment operator; the proxy $P$ satisfies $\|A - P\|_2 \le \varepsilon$, where $\varepsilon$ depends only on the number of modalities, not on the proxy form itself. We can add a parenthetical reference to this section in the abstract if space allows. Revision: partial.
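One step of this sketch can be checked independently of the paper: the rank-1 optimal-approximation claim quoted below (Lemma 2) matches the Eckart–Young–Mirsky theorem. If $Z = \sum_i \sigma_i u_i v_i^\top$ is a singular value decomposition with $\sigma_1 \ge \sigma_2 \ge \cdots$, then for the Gram matrix $G = ZZ^\top$,

```latex
\min_{\operatorname{rank}(\tilde{G}) \le 1} \big\| G - \tilde{G} \big\|_{2} \;=\; \sigma_2^{2},
\qquad
\tilde{G}^{\star} \;=\; \sigma_1^{2}\, u_1 u_1^{\top}.
```

What this classical fact does not by itself supply is an $\varepsilon$ that 'depends only on the number of modalities', which is the step the referee asks to see derived rather than asserted.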
Circularity Check
No significant circularity; the derivation remains self-contained.
Full rationale
The abstract and provided context present HoPA as a new abstraction using a compact proxy to capture high-order alignments and bound endpoint discrepancy, with a shared similarity structure to avoid pairwise costs. The spectral theoretical analysis is invoked to show rationality versus bimodal methods, and experiments are claimed to validate superior trade-offs. No load-bearing step reduces a prediction or first-principles result to a fitted input, self-citation chain, or definitional equivalence by construction. The method is described as compatible with trajectory matching without internal reduction to its own assumptions, making the central claims independent of the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
Invented entities (1)
- High-order proxy: no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · tag: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: "We derive a compact, rank-1 semantic proxy from the leading singular components of the Gram matrix... $\tilde{s}(x_i, x_j) = v_1(i)^\top v_1(j)$. ... Theorem 1 (Eigenvalue selectivity yields tighter trajectory bounds)"
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · tag: unclear
Relation between the paper passage and the cited Recognition theorem is unclear.
Paper passage: "Lemma 2 (Rank-1 optimal approximation)... $\tilde{G} = \sigma_1^2\, u_1 u_1^\top$ ... spectral selectivity is the critical factor" (a numerical check follows below)
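The quoted Lemma 2 construction is straightforward to check numerically under the assumption $G = ZZ^\top$ for some embedding matrix $Z$ (variable names are ours):

```python
import numpy as np

rng = np.random.default_rng(1)
Z = rng.normal(size=(50, 16))        # embeddings: 50 samples, dim 16
G = Z @ Z.T                          # Gram matrix

U, s, _ = np.linalg.svd(Z, full_matrices=False)
G1 = (s[0] ** 2) * np.outer(U[:, 0], U[:, 0])  # ~G = sigma_1^2 u_1 u_1^T

# Eckart-Young: the spectral-norm error is the next eigenvalue, sigma_2^2.
err = np.linalg.norm(G - G1, ord=2)
print(np.isclose(err, s[1] ** 2))    # True
```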
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.