Rank-Aware Hyperbolic Alignment for Vision-Language Dataset Distillation
Pith reviewed 2026-06-30 07:19 UTC · model grok-4.3
The pith
Rank-aware hyperbolic alignment separates shared image-text semantics from modality-private residuals to improve vision-language dataset distillation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RAHA lifts multimodal representations to hyperbolic space and optimizes distilled pairs with asymmetric objectives that enforce geodesic alignment in the shared range while regularizing the residual subspace to preserve modality-private diversity and improve transfer robustness.
What carries the argument
rank-aware hyperbolic alignment (RAHA), which uses hierarchical hyperbolic geometry together with asymmetric geodesic objectives to enforce alignment only in the dominant shared subspace
If this is right
- Synthetic pairs distilled with RAHA achieve competitive cross-modal retrieval under fixed budgets.
- Transfer performance on downstream tasks improves relative to Euclidean and low-rank factorization methods.
- Modality-private diversity is preserved in the residual subspace, reducing overfitting to shared semantics.
- Contrastive vision-language models can be trained more robustly with smaller synthetic datasets.
Where Pith is reading between the lines
- The same hyperbolic capacity-control idea could be tested on other multimodal tasks where one modality carries hierarchical structure.
- If the rank deficiency assumption holds across datasets, the method might reduce the number of required synthetic pairs further without loss of performance.
- Combining RAHA with trajectory-matching distillation techniques could compound the efficiency gains.
- The approach implies that geometry choice matters more than raw dimensionality reduction when alignment capacity must be explicitly budgeted.
Load-bearing premise
Image-text correlation is rank-deficient, with shared semantics concentrated in a low-dimensional range that hyperbolic lifting and asymmetric objectives can isolate and control more effectively than Euclidean or low-rank baselines.
What would settle it
Demonstrating that a Euclidean low-rank baseline or full-dimensional alignment matches or exceeds RAHA on cross-modal retrieval and transfer metrics under identical budgets and architectures would falsify the claimed advantage.
Figures
read the original abstract
Vision-language dataset distillation (VLDD) compresses a large image-text paired dataset into a small set of synthetic pairs that can efficiently train contrastive vision-language models under strict data and compute budgets. Most existing methods match expert trajectories or cross-modal statistics, yet still enforce full-dimensional alignment in a Euclidean embedding space. This is often overly restrictive due to rank-deficient image--text correlation, with shared semantics concentrated in a low-dimensional range and remaining variation spread across a weakly correlated residual subspace. LoRS relaxes alignment at the similarity level by low-rank factorization, but does not explicitly control dominant alignment capacity and structure in the representation space. We thus propose a rank-aware hyperbolic alignment (RAHA) that combines hierarchical geometry with explicit alignment-capacity control. RAHA lifts multimodal representations to hyperbolic space and optimizes distilled pairs with asymmetric objectives that enforce geodesic alignment in the shared range while regularizing the residual subspace to preserve modality-private diversity and improve transfer robustness. Experiments on benchmarks show that RAHA demonstrates competitive cross-modal retrieval and improved transfer indicators under fixed budgets.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Rank-Aware Hyperbolic Alignment (RAHA) for vision-language dataset distillation (VLDD). It argues that image-text correlations are rank-deficient, with shared semantics in a low-dimensional range and residual variation weakly correlated. Existing methods enforce full-dimensional Euclidean alignment or use low-rank factorization (LoRS) without explicit capacity control. RAHA lifts representations to hyperbolic space and optimizes distilled pairs via asymmetric geodesic objectives that enforce alignment in the shared range while regularizing the residual subspace for modality-private diversity. Experiments claim competitive cross-modal retrieval and improved transfer indicators under fixed budgets.
Significance. If the empirical results and derivations hold, the work offers a geometrically motivated way to relax over-constrained alignment in VLDD while preserving transfer robustness. The combination of hyperbolic lifting with explicit rank-aware regularization could influence dataset distillation and cross-modal representation learning by providing a principled alternative to Euclidean or low-rank baselines, particularly under strict data budgets.
major comments (2)
- [Abstract] Abstract: the central premise that 'image--text correlation is rank-deficient with shared semantics concentrated in a low-dimensional range' is stated without supporting analysis or citation to prior rank analyses of vision-language embeddings; this assumption is load-bearing for the motivation of hyperbolic lifting and asymmetric objectives, yet no evidence or derivation is visible to substantiate it.
- [Abstract] Abstract: the claim that RAHA 'demonstrates competitive cross-modal retrieval and improved transfer indicators' is presented without reference to specific baselines, metrics, datasets, or quantitative deltas; without the experimental section, it is impossible to assess whether the hyperbolic components deliver gains beyond what Euclidean low-rank methods already achieve.
minor comments (2)
- [Abstract] Abstract: the term 'asymmetric objectives' is introduced without a brief definition or contrast to symmetric contrastive losses; a one-sentence clarification would improve readability.
- [Abstract] Abstract: 'hierarchical geometry' is invoked but not linked to any concrete hyperbolic model (e.g., Poincaré ball, Lorentz model) or curvature parameter; specifying the model would help readers anticipate the technical approach.
Simulated Author's Rebuttal
We thank the referee for the comments. We address the two major points on the abstract below.
read point-by-point responses
-
Referee: [Abstract] Abstract: the central premise that 'image--text correlation is rank-deficient with shared semantics concentrated in a low-dimensional range' is stated without supporting analysis or citation to prior rank analyses of vision-language embeddings; this assumption is load-bearing for the motivation of hyperbolic lifting and asymmetric objectives, yet no evidence or derivation is visible to substantiate it.
Authors: The abstract states the premise concisely as motivation. The full manuscript contains an SVD-based rank analysis of cross-modal similarity matrices in Section 3.1 demonstrating rapid singular-value decay. We will add a citation to prior rank analyses of VL embeddings and a one-sentence reference to this analysis in the revised abstract. revision: yes
-
Referee: [Abstract] Abstract: the claim that RAHA 'demonstrates competitive cross-modal retrieval and improved transfer indicators' is presented without reference to specific baselines, metrics, datasets, or quantitative deltas; without the experimental section, it is impossible to assess whether the hyperbolic components deliver gains beyond what Euclidean low-rank methods already achieve.
Authors: Abstracts are space-constrained summaries. Section 4 and Tables 2–4 report the full comparisons (baselines: Euclidean full-alignment and LoRS; metrics: Recall@K and transfer accuracy; datasets: COCO, Flickr30K) with quantitative deltas. We will revise the abstract to name the primary datasets and one key improvement figure. revision: yes
Circularity Check
No significant circularity identified
full rationale
The abstract and description introduce RAHA as a new method combining hyperbolic geometry with asymmetric geodesic objectives for rank-aware alignment in VLDD. No equations, fitting procedures, self-citations, or derivation steps are visible that would reduce any claimed prediction or result to its own inputs by construction. The central premise (rank-deficient correlation addressed via hyperbolic lifting) is presented as a technical choice rather than derived from prior self-referential results. This matches the expectation for a score of 0 when the provided text is self-contained against external benchmarks with no load-bearing circular steps.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Hugging Face Datasets (2023),https://huggin gface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K
Llava-cc3m-pretrain-595k dataset. Hugging Face Datasets (2023),https://huggin gface.co/datasets/liuhaotian/LLaVA-CC3M-Pretrain-595K
2023
-
[2]
Achiam, J., Adler, S., Agarwal, S., Ahmad, L., Akkaya, I., Aleman, F.L., Almeida, D., Altenschmidt, J., Altman, S., Anadkat, S., et al.: Gpt-4 technical report. arXiv preprint arXiv:2303.08774 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[3]
Advances in neural information processing systems35, 23716– 23736 (2022)
Alayrac, J.B., Donahue, J., Luc, P., Miech, A., Barr, I., Hasson, Y., Lenc, K., Mensch, A., Millican, K., Reynolds, M., et al.: Flamingo: a visual language model for few-shot learning. Advances in neural information processing systems35, 23716– 23736 (2022)
2022
-
[4]
Bai, J., Bai, S., Chu, Y., Cui, Z., Dang, K., Deng, X., Fan, Y., Ge, W., Han, Y., Huang, F., et al.: Qwen technical report. arXiv preprint arXiv:2309.16609 (2023)
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[5]
In: Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track (2023)
Birhane, A., Prabhu, V., Han, S., Boddeti, V.N., Luccioni, A.S.: Into the LAIONs den: Investigating hate in multimodal datasets. In: Advances in Neural Information Processing Systems (NeurIPS), Datasets and Benchmarks Track (2023)
2023
-
[6]
In: Neural networks: tricks of the trade: second edition, pp
Bottou, L.: Stochastic gradient descent tricks. In: Neural networks: tricks of the trade: second edition, pp. 421–436. Springer (2012)
2012
-
[7]
In: International conference on machine learning
Brock, A., De, S., Smith, S.L., Simonyan, K.: High-performance large-scale image recognition without normalization. In: International conference on machine learning. pp. 1059–1071. PMLR (2021)
2021
-
[8]
Byeon, M., Park, B., Kim, H., Lee, S., Baek, W., Kim, S.: Coyo-700m: Image-text pair dataset.https://github.com/kakaobrain/coyo-dataset(2022)
2022
-
[9]
In: IEEE Symposium on Security and Privacy (SP) (2024)
Carlini, N., Jagielski, M., Choquette-Choo, C.A., Paleka, D., Pearce, W., Anderson, H., Terzis, A., Thomas, K., Tramèr, F.: Poisoning web-scale training datasets is practical. In: IEEE Symposium on Security and Privacy (SP) (2024)
2024
-
[10]
In: CVPR (2022)
Cazenavette, G., Wang, T., Torralba, A., Efros, A.A., Zhu, J.Y.: Dataset distillation by matching training trajectories. In: CVPR (2022)
2022
-
[11]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Cazenavette, G., Wang, T., Torralba, A., Efros, A.A., Zhu, J.Y.: Generalizing dataset distillation via deep generative prior. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 3739–3748 (2023)
2023
-
[12]
Advances in Neural Information Processing Systems35, 810–822 (2022)
Cui, J., Wang, R., Si, S., Hsieh, C.J.: Dc-bench: Dataset condensation benchmark. Advances in Neural Information Processing Systems35, 810–822 (2022)
2022
-
[13]
In: International Conference on Machine Learning
Cui, J., Wang, R., Si, S., Hsieh, C.J.: Scaling up dataset distillation to imagenet-1k with constant memory. In: International Conference on Machine Learning. pp. 6565–6590. PMLR (2023)
2023
-
[14]
In: Proceedings of the Computer Vision and Pattern Recognition Conference
Cui, X., Qin, Y., Zhou, W., Li, H., Li, H.: Optical: Leveraging optimal transport for contribution allocation in dataset distillation. In: Proceedings of the Computer Vision and Pattern Recognition Conference. pp. 15245–15254 (2025)
2025
-
[15]
Advances in neural information processing systems26(2013)
Cuturi, M.: Sinkhorn distances: Lightspeed computation of optimal transport. Advances in neural information processing systems26(2013)
2013
-
[16]
In: 2009 IEEE conference on computer vision and pattern recognition
Deng, J., Dong, W., Socher, R., Li, L.J., Li, K., Fei-Fei, L.: Imagenet: A large-scale hierarchical image database. In: 2009 IEEE conference on computer vision and pattern recognition. pp. 248–255. Ieee (2009)
2009
-
[17]
In: ICML (2023)
Desai, K., Nickel, M., Rajpurohit, T., Johnson, J., Vedantam, R.: Hyperbolic image-text representations. In: ICML (2023)
2023
-
[18]
In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers)
Devlin, J., Chang, M.W., Lee, K., Toutanova, K.: Bert: Pre-training of deep bidirectional transformers for language understanding. In: Proceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers). pp. 4171–4186 (2019) 34 Jeonget al
2019
-
[19]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Du, J., Jiang, Y., Tan, V.Y., Zhou, J.T., Li, H.: Minimizing the accumulated trajectory error to improve dataset distillation. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 3749–3758 (2023)
2023
-
[20]
Springer Science & Business Media (2009)
Farahani, R.Z., Hekmatfar, M.: Facility location: concepts, models, algorithms and case studies. Springer Science & Business Media (2009)
2009
-
[21]
Advances in neural information processing systems31(2018)
Ganea, O., Bécigneul, G., Hofmann, T.: Hyperbolic neural networks. Advances in neural information processing systems31(2018)
2018
-
[22]
Countering Adversarial Images using Input Transformations
Guo, C., Rana, M., Cisse, M., Van Der Maaten, L.: Countering adversarial images using input transformations. arXiv preprint arXiv:1711.00117 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[23]
arXiv preprint arXiv:2310.05773 (2023)
Guo, Z., Wang, K., Cazenavette, G., Li, H., Zhang, K., You, Y.: Towards loss- less dataset distillation via difficulty-aligned trajectory matching. arXiv preprint arXiv:2310.05773 (2023)
-
[24]
Journal of Artificial Intelligence Research 47, 853–899 (2013)
Hodosh, M., Young, P., Hockenmaier, J.: Framing image description as a ranking task: Data, models and evaluation metrics. Journal of Artificial Intelligence Research 47, 853–899 (2013)
2013
-
[25]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2026)
Jeong, J., Kwon, H., Kim, M., Yoon, K.J.: Multimodal distribution matching for vision-language dataset distillation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2026)
2026
-
[26]
In: Proceedings of the IEEE conference on computer vision and pattern recognition
Karpathy, A., Fei-Fei, L.: Deep visual-semantic alignments for generating image descriptions. In: Proceedings of the IEEE conference on computer vision and pattern recognition. pp. 3128–3137 (2015)
2015
-
[27]
In: ICML (2022)
Kim, J.H., Kim, J., Oh, S.J., Yun, S., Song, H., Jeong, J., Ha, J.W., Song, H.O.: Dataset condensation via efficient synthetic-data parameterization. In: ICML (2022)
2022
-
[28]
In: European Conference on Computer Vision (ECCV) (2024).https://doi.org/10.48550/arXiv.2404.17507
Kim, W., Chun, S., Kim, T., Han, D., Yun, S.: Hype: Hyperbolic entailment filtering for underspecified images and texts. In: European Conference on Computer Vision (ECCV) (2024).https://doi.org/10.48550/arXiv.2404.17507
-
[29]
In: Proceedings of the IEEE international conference on computer vision workshops
Krause, J., Stark, M., Deng, J., Fei-Fei, L.: 3d object representations for fine-grained categorization. In: Proceedings of the IEEE international conference on computer vision workshops. pp. 554–561 (2013)
2013
-
[30]
Krizhevsky, A., Hinton, G., et al.: Learning multiple layers of features from tiny images (2009)
2009
-
[31]
arXiv preprint arXiv:2208.10494 (2022)
Lee, H.B., Lee, D.B., Hwang, S.J.: Dataset condensation with latent space knowledge factorization and sharing. arXiv preprint arXiv:2208.10494 (2022)
-
[32]
arXiv preprint arXiv:2510.18583 (2025)
Lee, Y., Chung, H.W.: Covmatch: Cross-covariance guided multimodal dataset distillation with trainable text encoder. arXiv preprint arXiv:2510.18583 (2025)
-
[33]
IEEE Transactions on Pattern Analysis and Machine Intelligence46(1), 17–32 (2023)
Lei, S., Tao, D.: A comprehensive survey of dataset distillation. IEEE Transactions on Pattern Analysis and Machine Intelligence46(1), 17–32 (2023)
2023
-
[34]
In: International Conference on Machine Learning
Li, J., Li, D., Xiong, C., Hoi, S.: Blip: Bootstrapping language-image pre-training for unified vision-language understanding and generation. In: International Conference on Machine Learning. pp. 12888–12900. PMLR (2022)
2022
-
[35]
In: Advances in Neural Information Processing Systems (NeurIPS) (2025), poster
Li, W., Li, G., Maeda, K., Ogawa, T., Haseyama, M.: Hyperbolic dataset distillation. In: Advances in Neural Information Processing Systems (NeurIPS) (2025), poster
2025
-
[36]
In: Computer Vision– ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13
Lin, T.Y., Maire, M., Belongie, S., Hays, J., Perona, P., Ramanan, D., Dollár, P., Zitnick, C.L.: Microsoft coco: Common objects in context. In: Computer Vision– ECCV 2014: 13th European Conference, Zurich, Switzerland, September 6-12, 2014, Proceedings, Part V 13. pp. 740–755. Springer (2014)
2014
-
[37]
In: European Conference on Computer Vision
Liu, D., Gu, J., Cao, H., Trinitis, C., Schulz, M.: Dataset distillation by automatic training trajectories. In: European Conference on Computer Vision. pp. 334–351. Springer (2024)
2024
-
[38]
Advances in neural information processing systems36, 34892–34916 (2023) RAHA: Appendix 35
Liu, H., Li, C., Wu, Q., Lee, Y.J.: Visual instruction tuning. Advances in neural information processing systems36, 34892–34916 (2023) RAHA: Appendix 35
2023
-
[39]
In: Proceedings of the IEEE/CVF International Conference on Computer Vision
Liu, H., Li, Y., Xing, T., Wang, P., Dalal, V., Li, L., He, J., Wang, H.: Dataset dis- tillation via the wasserstein metric. In: Proceedings of the IEEE/CVF International Conference on Computer Vision. pp. 1205–1215 (2025)
2025
-
[40]
arXiv preprint arXiv:2502.05673 , year=
Liu, P., Du, J.: The evolution of dataset distillation: Toward scalable and generaliz- able solutions. arXiv preprint arXiv:2502.05673 (2025)
-
[41]
In: NeurIPS (2022)
Liu, S., Wang, K., Yang, X., Ye, J., Wang, X.: Dataset distillation via factorization. In: NeurIPS (2022)
2022
-
[42]
arXiv preprint arXiv:2310.16787 (2023)
Longpre, S., Mahari, R., Chen, A., Obeng-Marnu, N., Sileo, D., Brannon, W., Muennighoff, N., Khazam, N., Kabbara, J., Perisetla, K., et al.: The data provenance initiative: A large scale audit of dataset licensing & attribution in AI. arXiv preprint arXiv:2310.16787 (2023)
-
[43]
In: Advances in Neural Information Processing Systems (NeurIPS) (2024)
Longpre, S., Mahari, R., Lee, A., Lund, C., Oderinwale, H., Brannon, W., Saxena, N., Obeng-Marnu, N., South, T., Hunter, C., Klyman, K., et al.: Consent in crisis: The rapid decline of the AI data commons. In: Advances in Neural Information Processing Systems (NeurIPS) (2024)
2024
-
[44]
In: NeurIPS (2022)
Loo, N., Hasani, R., Amini, A., Rus, D.: Efficient dataset distillation using random feature approximation. In: NeurIPS (2022)
2022
-
[45]
arXiv preprint arXiv:2302.06755 (2023)
Loo, N., Hasani, R., Lechner, M., Rus, D.: Dataset distillation with convexified implicit gradients. arXiv preprint arXiv:2302.06755 (2023)
-
[46]
Towards Deep Learning Models Resistant to Adversarial Attacks
Madry, A., Makelov, A., Schmidt, L., Tsipras, D., Vladu, A.: Towards deep learning models resistant to adversarial attacks. arXiv preprint arXiv:1706.06083 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[47]
arXiv preprint arXiv:2011.00050 (2020)
Nguyen, T., Chen, Z., Lee, J.: Dataset meta-learning from kernel ridge-regression. arXiv preprint arXiv:2011.00050 (2020)
-
[48]
In: NeurIPS (2021)
Nguyen, T., Novak, R., Xiao, L., Lee, J.: Dataset distillation with infinitely wide convolutional networks. In: NeurIPS (2021)
2021
-
[49]
Poincar\'e Embeddings for Learning Hierarchical Representations
Nickel, M., Kiela, D.: Poincaré embeddings for learning hierarchical representations. arXiv preprint arXiv:1705.08039 (2017)
work page internal anchor Pith review Pith/arXiv arXiv 2017
-
[50]
In: Proceedings of the 35th International Conference on Machine Learning (ICML)
Nickel, M., Kiela, D.: Learning continuous hierarchies in the Lorentz model of hyperbolic geometry. In: Proceedings of the 35th International Conference on Machine Learning (ICML). Proceedings of Machine Learning Research (PMLR), vol. 80, pp. 3779–3788 (2018)
2018
-
[51]
In: International Conference on Learning Representations (ICLR) (2025), oral
Pal, A., van Spengler, M., D’Amely di Melendugno, G.M., Flaborea, A., Galasso, F., Mettes, P.: Compositional entailment learning for hyperbolic vision-language models. In: International Conference on Learning Representations (ICLR) (2025), oral
2025
-
[52]
arXiv preprint arXiv:2101.04562 (2021)
Peng, W., Varanka, T., Mostafa, A., Shi, H., Zhao, G.: Hyperbolic deep neural networks: A survey. arXiv preprint arXiv:2101.04562 (2021)
-
[53]
Poppi, T., Kasarla, T., Mettes, P., Baraldi, L., Cucchiara, R.: Hyperbolic safety- aware vision-language models. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR) (2025).https://doi.org/10 .48550/arXiv.2503.12127
-
[54]
In: International conference on machine learning
Radford, A., Kim, J.W., Hallacy, C., Ramesh, A., Goh, G., Agarwal, S., Sastry, G., Askell, A., Mishkin, P., Clark, J., et al.: Learning transferable visual models from natural language supervision. In: International conference on machine learning. pp. 8748–8763. PmLR (2021)
2021
-
[55]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)
Ramasinghe, S., Shevchenko, V., Avraham, G., Thalaiyasingam, A.: Accept the modality gap: An exploration in the hyperbolic space. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR). pp. 27263–27272 (June 2024) 36 Jeonget al
2024
-
[56]
In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition
Rombach, R., Blattmann, A., Lorenz, D., Esser, P., Ommer, B.: High-resolution image synthesis with latent diffusion models. In: Proceedings of the IEEE/CVF conference on computer vision and pattern recognition. pp. 10684–10695 (2022)
2022
-
[57]
DistilBERT, a distilled version of BERT: smaller, faster, cheaper and lighter
Sanh, V., Debut, L., Chaumond, J., Wolf, T.: Distilbert, a distilled version of bert: smaller, faster, cheaper and lighter. arXiv preprint arXiv:1910.01108 (2019)
work page internal anchor Pith review Pith/arXiv arXiv 1910
-
[58]
Advances in Neural Information Processing Systems35, 25278–25294 (2022)
Schuhmann, C., Beaumont, R., Vencu, R., Gordon, C., Wightman, R., Cherti, M., Coombes, T., Katta, A., Mullis, C., Wortsman, M., et al.: Laion-5b: An open large-scale dataset for training next generation image-text models. Advances in Neural Information Processing Systems35, 25278–25294 (2022)
2022
-
[59]
arXiv preprint arXiv:2312.16627 (2023)
Shang, Y., Yuan, Z., Yan, Y.: Mim4dd: Mutual information maximization for dataset distillation. arXiv preprint arXiv:2312.16627 (2023)
-
[60]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Su, D., Hou, J., Gao, W., Tian, Y., Tang, B.: D^4m: Dataset distillation via disentangled diffusion model. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 5809–5818 (2024)
2024
-
[61]
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
Team,G., Georgiev, P., Lei,V.I., Burnell, R., Bai, L., Gulati, A., Tanzer, G., Vincent, D., Pan, Z., Wang, S., et al.: Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context. arXiv preprint arXiv:2403.05530 (2024)
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[62]
Thiel, D.: Identifying and eliminating CSAM in generative ML training data and models. Tech. rep., Stanford Internet Observatory (2023).https://doi.org/10.2 5740/kh752sm9123,https://purl.stanford.edu/kh752sm9123
2023
-
[63]
Communications of the ACM59(2), 64–73 (2016)
Thomee, B., Shamma, D.A., Friedland, G., Elizalde, B., Ni, K., Poland, D., Borth, D., Li, L.J.: Yfcc100m: The new data in multimedia research. Communications of the ACM59(2), 64–73 (2016)
2016
-
[64]
An empirical study of example forgetting during deep neural network learning
Toneva, M., Sordoni, A., Combes, R.T.d., Trischler, A., Bengio, Y., Gordon, G.J.: An empirical study of example forgetting during deep neural network learning. arXiv preprint arXiv:1812.05159 (2018)
-
[65]
Wang, H., Zhao, Z., Wu, J., Shang, Y., Liu, G., Yan, Y.: Cao2: Rectifying inconsis- tencies in diffusion-based dataset distillation (2025),https://arxiv.org/abs/25 06.22637
2025
-
[66]
In: CVPR (2022)
Wang, K., Zhao, B., Peng, X., Zhu, Z., Yang, S., Wang, S., Huang, G., Bilen, H., Wang, X., You, Y.: Cafe: Learning to condense dataset by aligning features. In: CVPR (2022)
2022
-
[67]
Wang, T., Zhu, J.Y., Torralba, A., Efros, A.A.: Dataset distillation. arXiv preprint arXiv:1811.10959 (2018)
work page internal anchor Pith review Pith/arXiv arXiv 2018
-
[68]
Welinder, P., Branson, S., Mita, T., Wah, C., Schroff, F., Belongie, S., Perona, P.: Caltech-ucsd birds 200 (2010)
2010
-
[69]
In: Proceedings of the 26th Annual International Conference on Machine Learning
Welling, M.: Herding dynamical weights to learn. In: Proceedings of the 26th Annual International Conference on Machine Learning. pp. 1121–1128 (2009)
2009
-
[70]
Wu, X., Zhang, B., Deng, Z., Russakovsky, O.: Vision-language dataset distillation (2024),https://openreview.net/forum?id=2y8XnaIiB8, tMLR 2024
2024
-
[71]
Xu, W., Evans, D., Qi, Y.: Feature squeezing: Detecting adversarial examples in deep neural networks. In: NDSS (2018).https://doi.org/10.14722/ndss.2018.23295, https://www.ndss-symposium.org/ndss-paper/feature-squeezing-detectin g-adversarial-examples-in-deep-neural-networks/
-
[72]
In: Proceedings of the 41st International Conference on Machine Learning
Xu, Y., Lin, Z., Qiu, Y., Lu, C., Li, Y.L.: Low-rank similarity mining for multimodal dataset distillation. In: Proceedings of the 41st International Conference on Machine Learning. Proceedings of Machine Learning Research, vol. 235, pp. 55144–55161. PMLR (2024),https://proceedings.mlr.press/v235/xu24q.html
2024
-
[73]
Transactions of the association for computational linguistics2, 67–78 (2014) RAHA: Appendix 37
Young, P., Lai, A., Hodosh, M., Hockenmaier, J.: From image descriptions to visual denotations: New similarity metrics for semantic inference over event descriptions. Transactions of the association for computational linguistics2, 67–78 (2014) RAHA: Appendix 37
2014
-
[74]
IEEE transactions on pattern analysis and machine intelligence46(1), 150–170 (2023)
Yu, R., Liu, S., Wang, X.: Dataset distillation: A comprehensive review. IEEE transactions on pattern analysis and machine intelligence46(1), 150–170 (2023)
2023
-
[75]
In: Proceedings of the IEEE/CVF international conference on computer vision
Zhai, X., Mustafa, B., Kolesnikov, A., Beyer, L.: Sigmoid loss for language image pre- training. In: Proceedings of the IEEE/CVF international conference on computer vision. pp. 11975–11986 (2023)
2023
-
[76]
Zhang, X., Zhang, Z., Du, J., Liu, Z., Zhou, J.T.: Beyond modality collapse: Rep- resentations blending for multimodal dataset distillation. arXiv arXiv:2505.14705 (2025)
-
[77]
In: ICML (2021)
Zhao, B., Bilen, H.: Dataset condensation with differentiable siamese augmentation. In: ICML (2021)
2021
-
[78]
In: WACV (2023)
Zhao, B., Bilen, H.: Dataset condensation with distribution matching. In: WACV (2023)
2023
-
[79]
arXiv preprint arXiv:2006.05929 (2020)
Zhao, B., Mopuri, K.R., Bilen, H.: Dataset condensation with gradient matching. arXiv preprint arXiv:2006.05929 (2020)
-
[80]
In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition
Zhao, G., Li, G., Qin, Y., Yu, Y.: Improved distribution matching for dataset condensation. In: Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition. pp. 7856–7865 (2023)
2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.