From Compression to Accountability: Harmless Copyright Protection for Dataset Distillation

Joey Tianyi Zhou; Mengyu Sun; Yan Liang; Yi Zhang; Ziyuan Yang

arxiv: 2605.12942 · v2 · pith:IX4RTJJUnew · submitted 2026-05-13 · 💻 cs.CR

Yan Liang, Ziyuan Yang, Mengyu Sun, Joey Tianyi Zhou, Yi Zhang

Pith reviewed 2026-05-19 14:18 UTC · model grok-4.3

classification 💻 cs.CR

keywords dataset distillation · copyright protection · subpopulation bias · model provenance · black-box verification · harmless marking · data accountability

The pith

SubPopMark protects distilled datasets by embedding class-consistent subpopulation biases that models memorize and reveal during verification.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that distilled datasets can be made accountable for copyright by injecting harmless class-consistent subpopulation biases that trained models will reliably exhibit as prediction patterns. This enables owners to perform black-box verification and user-specific tracing by comparing a suspicious model's outputs against a pre-built reference behavior bank covering standard and shifted data distributions. A sympathetic reader would care because current protection methods for raw data rely on risky backdoors, while this approach avoids security concerns and preserves both the original training trajectory and dataset utility for efficient model training. The method directly addresses the risks of easy copying and redistribution that come with compact synthetic datasets.

Core claim

Deep neural networks memorize subpopulation distributions during training and therefore display systematic prediction biases favoring aligned samples. SubPopMark exploits this property through a two-stage process: the Copyright Verification Marker optimization injects a class-consistent subpopulation bias while keeping the original optimization trajectory intact, after which the User-Specific Tracing Marker stage adds distinguishable perturbations. A reference behavior bank is assembled from model outputs on carefully chosen test sets spanning both standard and subpopulation-shifted distributions, allowing the provenance of any suspicious model to be inferred by matching its output behavior against the bank.
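One way to picture the bank-matching step is as a nearest-signature lookup. The sketch below is a minimal illustration, not the paper's procedure: the signature (per-class accuracy on the standard and shifted test sets), the L2 distance, the function names, and all toy numbers are assumptions made here for illustration.

```python
import numpy as np

def per_class_accuracy(preds, labels, num_classes):
    """Accuracy for each class; classes absent from `labels` get 0.0."""
    acc = np.zeros(num_classes)
    for c in range(num_classes):
        mask = labels == c
        if mask.any():
            acc[c] = np.mean(preds[mask] == c)
    return acc

def behavior_signature(preds_std, labels_std, preds_shift, labels_shift, num_classes):
    """Concatenate per-class accuracy on the standard and shifted test sets."""
    return np.concatenate([
        per_class_accuracy(preds_std, labels_std, num_classes),
        per_class_accuracy(preds_shift, labels_shift, num_classes),
    ])

def match_provenance(signature, bank):
    """Return the bank key with the smallest L2 distance, plus all distances."""
    names = list(bank)
    dists = [float(np.linalg.norm(signature - bank[n])) for n in names]
    return names[int(np.argmin(dists))], dict(zip(names, dists))

# Hypothetical reference bank: [std class 0, std class 1, shifted class 0, shifted class 1].
bank = {
    "unmarked": np.array([0.80, 0.80, 0.60, 0.60]),
    "user_A": np.array([0.80, 0.80, 0.75, 0.62]),
    "user_B": np.array([0.80, 0.80, 0.62, 0.75]),
}

# Toy black-box predictions from a suspicious model.
labels = np.array([0, 0, 0, 0, 1, 1, 1, 1])
preds_std = np.array([0, 0, 0, 1, 1, 1, 1, 0])
preds_shift = np.array([0, 0, 0, 1, 1, 1, 0, 0])

suspect = behavior_signature(preds_std, labels, preds_shift, labels, num_classes=2)
best, distances = match_provenance(suspect, bank)
print(best, distances)
```

With these toy values the suspect signature lands closest to the hypothetical `user_A` entry. A real bank would presumably hold richer output statistics and would need a rejection threshold for models that match no entry well.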

What carries the argument

SubPopMark, a two-stage bias-injection framework that creates detectable yet harmless prediction signatures in models trained on the protected distilled data for subsequent black-box verification and tracing.

If this is right

Owners gain the ability to perform black-box copyright verification of any model suspected to have used the protected distilled dataset.
Individual users or distributors become traceable through the unique perturbations added in the second stage.
Protection is achieved without introducing backdoor triggers or other malicious behaviors that could raise security issues.
The original utility of the distilled dataset for training high-performing models remains fully preserved.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same subpopulation-bias principle could be tested on other forms of synthetic or compressed training data beyond distillation.
Robustness against deliberate attempts to remove or mask the bias during training would be a natural next measurement.
Widespread adoption might support emerging requirements for data provenance tracking in deployed AI systems.

Load-bearing premise

Deep neural networks will reliably memorize and display the injected class-consistent subpopulation bias as detectable prediction patterns after training, while the injection leaves the original optimization trajectory and dataset utility unchanged.

What would settle it

Train multiple independent models on the protected distilled dataset and check whether their accuracy or output distributions on subpopulation-shifted test samples consistently and measurably favor the injected bias direction compared with standard test samples.
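The experiment described above can be sketched as a small significance check. This is an illustrative sketch under assumptions: the accuracy values are made-up placeholders, `accuracy_gap` and `permutation_p_value` are hypothetical helper names, and an exact one-sided permutation test is one reasonable choice of statistic, not something the paper specifies.

```python
from itertools import combinations
from statistics import mean

def accuracy_gap(acc_shifted, acc_standard):
    """Per-model gap: accuracy on shifted samples minus accuracy on standard samples."""
    return [s - t for s, t in zip(acc_shifted, acc_standard)]

def permutation_p_value(marked_gaps, unmarked_gaps):
    """Exact one-sided permutation test on the difference of mean gaps."""
    pooled = list(marked_gaps) + list(unmarked_gaps)
    k = len(marked_gaps)
    observed = mean(marked_gaps) - mean(unmarked_gaps)
    count = 0
    total = 0
    # Enumerate every way to relabel k of the pooled gaps as "marked".
    for idx in combinations(range(len(pooled)), k):
        chosen = set(idx)
        a = [pooled[i] for i in chosen]
        b = [pooled[i] for i in range(len(pooled)) if i not in chosen]
        if mean(a) - mean(b) >= observed - 1e-12:
            count += 1
        total += 1
    return count / total

# Placeholder accuracies for four independently seeded models per condition.
marked_gaps = accuracy_gap([0.71, 0.73, 0.70, 0.72], [0.66, 0.67, 0.66, 0.67])
unmarked_gaps = accuracy_gap([0.64, 0.66, 0.65, 0.63], [0.66, 0.67, 0.66, 0.66])

p_value = permutation_p_value(marked_gaps, unmarked_gaps)
print(f"p = {p_value:.4f}")
```

A small p-value here would only say that the marked runs favor the shifted samples more than the unmarked runs do. It would not by itself show that the effect survives changes of architecture or training recipe.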

Figures

Figures reproduced from arXiv: 2605.12942 by Joey Tianyi Zhou, Mengyu Sun, Yan Liang, Yi Zhang, Ziyuan Yang.

**Figure 2.** The overview of our proposed method.

… where PM and PM denote the manipulation subset and transformed distributions. Since the transformed samples follow a consistent subpopulation structure introduced during training, the infringing model assigns a higher posterior probability to the correct label compared to the reference model. Therefore, we expect $P_{\text{mark}}(y \mid x) > P_{\text{syn}}(y \mid x)$ (9). Consequently, the confid…

**Figure 3.** Performance analysis of SubPopMark across diverse datasets and IPC settings using the DC distillation method. All…

**Figure 4.** Performance analysis of SubPopMark across diverse datasets and IPC settings using the DC distillation method across…

**Figure 5.** Stable reference values of copyright verification performance gaps for reference and infringing models across various…

**Figure 6.** Comparison of prediction distributions between infring…

**Figure 7.** Visualization of the original and our protected distilled…

**Figure 8.** Computational efficiency analysis.

… Transformer-based marker is combined with the ConvNet-based markers, the proposed framework still preserves effective copyright verification on convolutional architectures, indicating that the protection signals remain robust under heterogeneous backbone settings. VII. CONCLUSION. In this paper, we propose SubPopMark, a harmless framework for copyright protection and data…
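The inequality quoted with Figure 2, that a model trained on marked data should assign a higher correct-label posterior than a reference model on transformed samples, can be checked numerically along the following lines. This is a hedged sketch: the logits are toy values, the function names are hypothetical, and averaging the per-sample posterior difference is a choice made here rather than the paper's stated statistic.

```python
import numpy as np

def softmax(logits):
    """Row-wise softmax with max-subtraction for numerical stability."""
    z = logits - logits.max(axis=1, keepdims=True)
    e = np.exp(z)
    return e / e.sum(axis=1, keepdims=True)

def correct_label_posterior(logits, labels):
    """Posterior probability each sample assigns to its true label."""
    probs = softmax(logits)
    return probs[np.arange(len(labels)), labels]

def confidence_gap(logits_suspect, logits_reference, labels):
    """Mean of P_suspect(y | x) - P_reference(y | x) over transformed samples."""
    diff = (correct_label_posterior(logits_suspect, labels)
            - correct_label_posterior(logits_reference, labels))
    return float(np.mean(diff))

# Toy logits on two transformed samples with true labels 0 and 1.
labels = np.array([0, 1])
logits_suspect = np.array([[2.0, 0.0], [0.0, 2.0]])    # stands in for the marked-data model
logits_reference = np.array([[1.0, 0.0], [0.0, 1.0]])  # stands in for the reference model

gap = confidence_gap(logits_suspect, logits_reference, labels)
print(f"mean confidence gap = {gap:.4f}")
```

A positive gap is consistent with the expected inequality, while a gap near zero would suggest the suspect model did not absorb the injected subpopulation structure.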

Original abstract

Large-scale datasets have been a key driving force behind the rapid progress of deep learning, but their storage, computational, and energy costs have become increasingly prohibitive. Dataset distillation (DD) mitigates this problem by synthesizing compact yet informative datasets, thereby enabling efficient model training and storage. However, the ease of copying and distributing distilled datasets introduces serious risks of copyright infringement and data leakage. Existing protection methods are primarily designed for raw datasets rather than distilled datasets, and typically rely on backdoor-triggered malicious behaviors, which may raise security concerns. In this paper, we observe that deep neural networks tend to memorize subpopulation distributions during training, resulting in a systematic prediction bias, where models perform better on samples aligned with memorized subpopulations. Motivated by this observation, we propose SubPopMark, a harmless subpopulation-driven protection framework for distilled datasets. SubPopMark consists of two stages. First, the Copyright Verification Marker (CVM) optimization stage injects a class-consistent subpopulation bias while preserving the original optimization trajectory. Second, the User-Specific Tracing Marker (USTM) optimization stage further introduces user-distinguishable perturbations into the CVM-augmented data. To enable black-box verification and tracing, we construct a reference behavior bank by collecting model outputs over carefully designed test sets that cover both standard and subpopulation-shifted data distributions. The provenance of a suspicious model is then inferred by comparing its output behavior signature with the bank and identifying the most consistent reference behavior pattern.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

SubPopMark marks distilled datasets via injected subpopulation biases for black-box tracing without backdoors, but the approach rests on an untested assumption that those biases will survive training and produce reliable signatures.

The paper's main contribution is a two-stage marking scheme for distilled datasets. CVM adds a class-consistent subpopulation bias during distillation while trying to hold the original optimization path steady, then USTM layers on user-specific perturbations. A reference bank of model outputs on both clean and shifted test sets lets you match a suspicious model's behavior to the right source. This is a direct response to the copying risk that comes with small, shareable distilled sets, and it deliberately avoids the malicious triggers common in earlier protection work on raw data.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces SubPopMark, a two-stage harmless protection framework for distilled datasets. The Copyright Verification Marker (CVM) optimization injects a class-consistent subpopulation bias while preserving the original optimization trajectory; the User-Specific Tracing Marker (USTM) adds user-distinguishable perturbations. A reference behavior bank is built from model outputs on standard and subpopulation-shifted test sets, enabling black-box provenance inference by matching a suspicious model's output signature to the closest reference pattern.

Significance. If the injected biases produce persistent, distinguishable signatures without degrading utility or introducing security risks, the work would offer a practical alternative to backdoor-based dataset protection methods. This addresses an important gap in copyright accountability for dataset distillation, a technique whose adoption is growing due to storage and compute constraints in deep learning.

major comments (2)

[Abstract] The central claim, that provenance can be inferred by matching output behavior signatures, depends on empirical support the abstract does not show: that the CVM-injected subpopulation bias remains measurable and class-consistent after training on the distilled set. No results, ablations, or error analysis are described to confirm that marked models are distinguishable from natural variation or from unmarked training.
[CVM optimization stage] The assumption that artificial subpopulation bias injection will reliably produce a detectable, persistent signature (distinct from training dynamics, random seeds, or hyperparameter changes) lacks any derivation, bound, or preliminary evidence. If the bias is washed out on the small distilled dataset, overlapping signatures will undermine the reference bank matching step.

minor comments (1)

[Abstract] The abstract would benefit from a concise statement of the datasets, models, and quantitative metrics (e.g., verification accuracy, utility drop) used to validate the framework.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive feedback, which highlights important aspects of empirical validation and theoretical grounding in our proposed SubPopMark framework. We address each major comment below and indicate the revisions we will make to the manuscript.

Referee: [Abstract] The central claim, that provenance can be inferred by matching output behavior signatures, depends on empirical support the abstract does not show: that the CVM-injected subpopulation bias remains measurable and class-consistent after training on the distilled set. No results, ablations, or error analysis are described to confirm that marked models are distinguishable from natural variation or from unmarked training.

Authors: We agree that the abstract would benefit from explicitly referencing the supporting experiments. The full manuscript presents results in Sections 4.2 and 4.3, including ablations on signature consistency across multiple distilled datasets and comparisons against unmarked training runs, showing measurable separation from natural variation. In the revision, we will update the abstract to concisely summarize these empirical findings and their implications for provenance inference.
Revision: yes

Referee: [CVM optimization stage] The assumption that artificial subpopulation bias injection will reliably produce a detectable, persistent signature (distinct from training dynamics, random seeds, or hyperparameter changes) lacks any derivation, bound, or preliminary evidence. If the bias is washed out on the small distilled dataset, overlapping signatures will undermine the reference bank matching step.

Authors: We acknowledge the value of additional evidence for robustness. The approach builds on the documented tendency of DNNs to exhibit subpopulation biases during training, as motivated in the introduction. We have added new ablation results in the revised manuscript that evaluate signature persistence under varied random seeds, hyperparameters, and training trajectories, confirming low overlap with unmarked cases and reliable matching in the reference bank. While the phenomenon is primarily empirical rather than derived from a closed-form bound, these experiments directly address concerns about washout on distilled data.
Revision: partial
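The overlap concern raised in this exchange can be quantified with a rank statistic. The sketch below computes the area under the ROC curve (AUROC) between a verification score from marked runs and the same score from unmarked runs. It is illustrative only: the scores are invented placeholders, `auroc` is a hypothetical helper, and the paper may report separation in a different way.

```python
def auroc(positive_scores, negative_scores):
    """Probability that a random positive outscores a random negative (ties count half)."""
    wins = 0.0
    for p in positive_scores:
        for n in negative_scores:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(positive_scores) * len(negative_scores))

# Placeholder verification scores (for example, shifted-minus-standard accuracy gaps)
# collected over different seeds and hyperparameter settings.
marked_scores = [0.05, 0.06, 0.01, 0.04]
unmarked_scores = [-0.02, 0.02, -0.01, 0.00]

separation = auroc(marked_scores, unmarked_scores)
print(f"AUROC = {separation}")
```

An AUROC of 1.0 means the two groups never overlap, and 0.5 means the score carries no information. In this toy case one marked run falls below one unmarked run, which is the kind of washout the referee describes.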

standing simulated objections not resolved

A formal theoretical derivation or bound guaranteeing persistence of the injected subpopulation bias independent of training dynamics.

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper motivates SubPopMark from the stated empirical observation that DNNs exhibit systematic prediction bias on subpopulation-aligned samples due to memorization. The CVM and USTM stages inject bias while preserving optimization trajectory, and provenance is recovered by matching model outputs against a reference behavior bank built from independently designed test sets (standard plus subpopulation-shifted). No equations, fitted parameters, or self-citations are shown that would reduce the signature-matching step to a tautology or to the injected values by construction. The verification procedure remains externally falsifiable on the held-out test distributions and does not collapse into renaming or self-definition.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests primarily on the stated domain assumption about subpopulation memorization; no free parameters or invented entities are explicitly introduced in the provided description.

axioms (1)

domain assumption: Deep neural networks tend to memorize subpopulation distributions during training, resulting in a systematic prediction bias where models perform better on samples aligned with memorized subpopulations.
This observation is presented as the direct motivation for injecting biases and is central to both stages of the method.

pith-pipeline@v0.9.0 · 5795 in / 1180 out tokens · 50790 ms · 2026-05-19T14:18:11.156020+00:00 · methodology

Reference graph

Works this paper leans on

44 extracted references · 44 canonical work pages · 4 internal anchors

[1]

A comprehensive survey of dataset distillation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):17–32, 2023

Shiye Lei and Dacheng Tao. A comprehensive survey of dataset distillation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):17–32, 2023

work page 2023
[2]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171– 4186, 2019

work page 2019
[3]

Randla-net: Efficient semantic segmentation of large-scale point clouds

Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, Niki Trigoni, and Andrew Markham. Randla-net: Efficient semantic segmentation of large-scale point clouds. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11108–11117, 2020

work page 2020
[4]

Self-supervised graph transformer on large- scale molecular data.Advances in neural information processing systems, 33:12559–12571, 2020

Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large- scale molecular data.Advances in neural information processing systems, 33:12559–12571, 2020

work page 2020
[5]

Green ai

Roy Schwartz, Jesse Dodge, Noah A Smith, and Oren Etzioni. Green ai. Communications of the ACM, 63(12):54–63, 2020

work page 2020
[6]

Trust the unreliability: Inward backward dynamic unreliability driven coreset selection for medical image classification, 2026

Yan Liang, Ziyuan Yang, Zhuxin Lei, Mengyu Sun, Yingyu Chen, and Yi Zhang. Trust the unreliability: Inward backward dynamic unreliability driven coreset selection for medical image classification, 2026

work page 2026
[7]

Diversity-driven synthesis: Enhancing dataset distillation through directed weight adjustment.Advances in neural information processing systems, 37:119443–119465, 2024

Jiawei Du, Xin Zhang, Juncheng Hu, Wenxing Huang, and Joey T Zhou. Diversity-driven synthesis: Enhancing dataset distillation through directed weight adjustment.Advances in neural information processing systems, 37:119443–119465, 2024

work page 2024
[8]

On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm

Peng Sun, Bei Shi, Daiwei Yu, and Tao Lin. On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9390–9399, 2024

work page 2024
[9]

Dataset condensation with distribution matching

Bo Zhao and Hakan Bilen. Dataset condensation with distribution matching. InProceedings of the IEEE/CVF winter conference on applications of computer vision, pages 6514–6523, 2023

work page 2023
[10]

Teddy: Efficient large-scale dataset distillation via taylor-approximated matching

Ruonan Yu, Songhua Liu, Jingwen Ye, and Xinchao Wang. Teddy: Efficient large-scale dataset distillation via taylor-approximated matching. InEuropean Conference on Computer Vision, pages 1–17. Springer, 2024

work page 2024
[11]

Poisoned distillation: Injecting backdoors into distilled datasets without raw data access

Ziyuan Yang, Ming Yan, Yi Zhang, and Joey Tianyi Zhou. Poisoned distillation: Injecting backdoors into distilled datasets without raw data access. InProceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 1444–1452, 2026

work page 2026
[12]

Dataset Distillation

Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. Dataset distillation.arXiv preprint arXiv:1811.10959, 2018

work page internal anchor Pith review arXiv 2018
[13]

Backdoor attacks against dataset distillation.arXiv preprint arXiv:2301.01197, 2023

Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, and Yang Zhang. Backdoor attacks against dataset distillation.arXiv preprint arXiv:2301.01197, 2023

work page arXiv 2023
[14]

Efficient dataset distillation using random feature approximation.Advances in Neural Information Processing Systems, 35:13877–13891, 2022

Noel Loo, Ramin Hasani, Alexander Amini, and Daniela Rus. Efficient dataset distillation using random feature approximation.Advances in Neural Information Processing Systems, 35:13877–13891, 2022

work page 2022
[15]

Provable and efficient dataset distillation for kernel ridge regression.Advances in Neural Information Processing Systems, 37:88739–88771, 2024

Yilan Chen, Wei Huang, and Tsui-Wei Weng. Provable and efficient dataset distillation for kernel ridge regression.Advances in Neural Information Processing Systems, 37:88739–88771, 2024

work page 2024
[16]

Dataset condensation with gradient matching

Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. Dataset condensation with gradient matching. InNinth International Conference on Learning Representations 2021, 2021

work page 2021
[17]

Dataset condensation with differentiable siamese augmentation

Bo Zhao and Hakan Bilen. Dataset condensation with differentiable siamese augmentation. InInternational Conference on Machine Learning, pages 12674–12685. PMLR, 2021

work page 2021
[18]

Efficient industrial dataset distillation with textual trajectory matching

Muquan Li, Qian Dong, Dongyang Zhang, Ke Qin, and Guangchun Luo. Efficient industrial dataset distillation with textual trajectory matching. IEEE Transactions on Industrial Informatics, pages 1–12, 2026

work page 2026
[19]

Dataset distillation by matching training trajectories

George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A Efros, and Jun-Yan Zhu. Dataset distillation by matching training trajectories. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4750–4759, 2022

work page 2022
[20]

Scaling up dataset distillation to imagenet-1k with constant memory

Justin Cui, Ruochen Wang, Si Si, and Cho-Jui Hsieh. Scaling up dataset distillation to imagenet-1k with constant memory. InInternational Conference on Machine Learning, pages 6565–6590. PMLR, 2023

work page 2023
[21]

Halftone image watermarking by content aware double-sided embedding error diffusion.IEEE Transactions on Image Processing, 27(7):3387– 3402, 2018

Yuanfang Guo, Oscar C Au, Rui Wang, Lu Fang, and Xiaochun Cao. Halftone image watermarking by content aware double-sided embedding error diffusion.IEEE Transactions on Image Processing, 27(7):3387– 3402, 2018

work page 2018
[22]

Robust digital watermarking techniques for copyright protection of digital data: A survey

Poonam Kadian, Shiafali M Arora, and Nidhi Arora. Robust digital watermarking techniques for copyright protection of digital data: A survey. Wireless Personal Communications, 118(4):3225–3249, 2021

work page 2021
[23]

Identity-based encryption transformation for flexible sharing of encrypted data in public cloud.IEEE Transactions on Information Forensics and Security, 15:3168–3180, 2020

Hua Deng, Zheng Qin, Qianhong Wu, Zhenyu Guan, Robert H Deng, Yujue Wang, and Yunya Zhou. Identity-based encryption transformation for flexible sharing of encrypted data in public cloud.IEEE Transactions on Information Forensics and Security, 15:3168–3180, 2020

work page 2020
[24]

Visual privacy protection via mapping distortion

Yiming Li, Peidong Liu, Yong Jiang, and Shu-Tao Xia. Visual privacy protection via mapping distortion. InICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3740–3744. IEEE, 2021

work page 2021
[25]

Multinomial random forest.Pattern Recognition, 122:108331, 2022

Jiawang Bai, Yiming Li, Jiawei Li, Xue Yang, Yong Jiang, and Shu-Tao Xia. Multinomial random forest.Pattern Recognition, 122:108331, 2022

work page 2022
[26]

Black-box dataset ownership verification via backdoor watermarking

Yiming Li, Mingyan Zhu, Xue Yang, Yong Jiang, Tao Wei, and Shu-Tao Xia. Black-box dataset ownership verification via backdoor watermarking. IEEE Transactions on Information Forensics and Security, 18:2318–2332, 2023

work page 2023
[27]

Did you train on my dataset? towards public dataset protection with cleanlabel backdoor watermarking.ACM SIGKDD Explorations Newsletter, 25(1):43–53, 2023

Ruixiang Tang, Qizhang Feng, Ninghao Liu, Fan Yang, and Xia Hu. Did you train on my dataset? towards public dataset protection with cleanlabel backdoor watermarking.ACM SIGKDD Explorations Newsletter, 25(1):43–53, 2023

work page 2023
[28]

Zero-sacrifice persistent- robustness adversarial defense for pre-trained encoders

Zhuxin Lei, Ziyuan Yang, and Yi Zhang. Zero-sacrifice persistent- robustness adversarial defense for pre-trained encoders. InProceedings of the International Conference on Learning Representations, 2026

work page 2026
[29]

Statistical behavior and consistency of classification methods based on convex risk minimization.The Annals of Statistics, 32(1):56–85, 2004

Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization.The Annals of Statistics, 32(1):56–85, 2004

work page 2004
[30]

Learning transferable visual models from natural language supervision

Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. InInternational conference on machine learning, pages 8748–8763. PmLR, 2021

work page 2021
[31]

Multiscale structural similarity for image quality assessment

Zhou Wang, Eero P Simoncelli, and Alan C Bovik. Multiscale structural similarity for image quality assessment. InThe thrity-seventh asilomar conference on signals, systems & computers, 2003, volume 2, pages 1398–1402. Ieee, 2003

work page 2003
[32]

Secure hash standard (shs).Fips pub, 180(4):2012, 2012

Fips Pub. Secure hash standard (shs).Fips pub, 180(4):2012, 2012

work page 2012
[33]

Adam: A method for stochastic optimization.(No Title), 2014

Kingma Diederik. Adam: A method for stochastic optimization.(No Title), 2014

work page 2014
[34]

Imagenet classification with deep convolutional neural networks.Advances in neural information processing systems, 25, 2012

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks.Advances in neural information processing systems, 25, 2012

work page 2012
[35]

Imagenet classification with deep convolutional neural networks.Communications of the ACM, 60(6):84–90, 2017

Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks.Communications of the ACM, 60(6):84–90, 2017

work page 2017
[36]

Very Deep Convolutional Networks for Large-Scale Image Recognition

Karen Simonyan and Andrew Zisserman. Very deep convolutional net- works for large-scale image recognition.arXiv preprint arXiv:1409.1556, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[37]

Deep residual learning for image recognition

Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. InProceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

work page 2016
[38]

Towards lossless dataset distillation via difficulty-aligned 14 trajectory matching

Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang, and Yang You. Towards lossless dataset distillation via difficulty-aligned 14 trajectory matching. InThe Twelfth International Conference on Learning Representations, 2024

work page 2024
[39]

Learning multiple layers of features from tiny images

Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

work page 2009
[40]

Fashion-MNIST: a Novel Image Dataset for Benchmarking Machine Learning Algorithms

Han Xiao, Kashif Rasul, and Roland V ollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms.arXiv preprint arXiv:1708.07747, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[41]

Reading digits in natural images with unsupervised feature learning

Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Baolin Wu, Andrew Y Ng, et al. Reading digits in natural images with unsupervised feature learning. InNIPS workshop on deep learning and unsupervised feature learning, volume 2011, page 4. Granada, 2011

work page 2011
[42]

An Image is Worth 16x16 Words: Transformers for Image Recognition at Scale

Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weis- senborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale.arXiv preprint arXiv:2010.11929, 2020

work page internal anchor Pith review Pith/arXiv arXiv 2010
[43]

Going deeper with image transformers

Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, and Herv´e J´egou. Going deeper with image transformers. InProceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 32–42, October 2021

work page 2021
[44]

Swin transformer: Hierarchical vision transformer using shifted windows

Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. InProceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021. 15

work page 2021

[1] [1]

A comprehensive survey of dataset distillation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):17–32, 2023

Shiye Lei and Dacheng Tao. A comprehensive survey of dataset distillation.IEEE Transactions on Pattern Analysis and Machine Intelligence, 46(1):17–32, 2023

work page 2023

[2] [2]

Bert: Pre-training of deep bidirectional transformers for language understanding

Jacob Devlin, Ming-Wei Chang, Kenton Lee, and Kristina Toutanova. Bert: Pre-training of deep bidirectional transformers for language understanding. InProceedings of the 2019 conference of the North American chapter of the association for computational linguistics: human language technologies, volume 1 (long and short papers), pages 4171– 4186, 2019

work page 2019

[3] [3]

Randla-net: Efficient semantic segmentation of large-scale point clouds

Qingyong Hu, Bo Yang, Linhai Xie, Stefano Rosa, Yulan Guo, Zhihua Wang, Niki Trigoni, and Andrew Markham. Randla-net: Efficient semantic segmentation of large-scale point clouds. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 11108–11117, 2020

work page 2020

[4] [4]

Self-supervised graph transformer on large- scale molecular data.Advances in neural information processing systems, 33:12559–12571, 2020

Yu Rong, Yatao Bian, Tingyang Xu, Weiyang Xie, Ying Wei, Wenbing Huang, and Junzhou Huang. Self-supervised graph transformer on large- scale molecular data.Advances in neural information processing systems, 33:12559–12571, 2020

work page 2020

[5] [5]

Green ai

Roy Schwartz, Jesse Dodge, Noah A Smith, and Oren Etzioni. Green ai. Communications of the ACM, 63(12):54–63, 2020

work page 2020

[6] [6]

Trust the unreliability: Inward backward dynamic unreliability driven coreset selection for medical image classification, 2026

Yan Liang, Ziyuan Yang, Zhuxin Lei, Mengyu Sun, Yingyu Chen, and Yi Zhang. Trust the unreliability: Inward backward dynamic unreliability driven coreset selection for medical image classification, 2026

work page 2026

[7] [7]

Diversity-driven synthesis: Enhancing dataset distillation through directed weight adjustment.Advances in neural information processing systems, 37:119443–119465, 2024

Jiawei Du, Xin Zhang, Juncheng Hu, Wenxing Huang, and Joey T Zhou. Diversity-driven synthesis: Enhancing dataset distillation through directed weight adjustment.Advances in neural information processing systems, 37:119443–119465, 2024

work page 2024

[8] Peng Sun, Bei Shi, Daiwei Yu, and Tao Lin. On the diversity and realism of distilled dataset: An efficient dataset distillation paradigm. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9390–9399, 2024

[9] Bo Zhao and Hakan Bilen. Dataset condensation with distribution matching. In Proceedings of the IEEE/CVF winter conference on applications of computer vision, pages 6514–6523, 2023

[10] Ruonan Yu, Songhua Liu, Jingwen Ye, and Xinchao Wang. Teddy: Efficient large-scale dataset distillation via taylor-approximated matching. In European Conference on Computer Vision, pages 1–17. Springer, 2024

[11] Ziyuan Yang, Ming Yan, Yi Zhang, and Joey Tianyi Zhou. Poisoned distillation: Injecting backdoors into distilled datasets without raw data access. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 1444–1452, 2026

[12] Tongzhou Wang, Jun-Yan Zhu, Antonio Torralba, and Alexei A Efros. Dataset distillation. arXiv preprint arXiv:1811.10959, 2018

[13] Yugeng Liu, Zheng Li, Michael Backes, Yun Shen, and Yang Zhang. Backdoor attacks against dataset distillation. arXiv preprint arXiv:2301.01197, 2023

[14] Noel Loo, Ramin Hasani, Alexander Amini, and Daniela Rus. Efficient dataset distillation using random feature approximation. Advances in Neural Information Processing Systems, 35:13877–13891, 2022

[15] Yilan Chen, Wei Huang, and Tsui-Wei Weng. Provable and efficient dataset distillation for kernel ridge regression. Advances in Neural Information Processing Systems, 37:88739–88771, 2024

[16] Bo Zhao, Konda Reddy Mopuri, and Hakan Bilen. Dataset condensation with gradient matching. In Ninth International Conference on Learning Representations, 2021

[17] Bo Zhao and Hakan Bilen. Dataset condensation with differentiable siamese augmentation. In International Conference on Machine Learning, pages 12674–12685. PMLR, 2021

[18] Muquan Li, Qian Dong, Dongyang Zhang, Ke Qin, and Guangchun Luo. Efficient industrial dataset distillation with textual trajectory matching. IEEE Transactions on Industrial Informatics, pages 1–12, 2026

[19] George Cazenavette, Tongzhou Wang, Antonio Torralba, Alexei A Efros, and Jun-Yan Zhu. Dataset distillation by matching training trajectories. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 4750–4759, 2022

[20] Justin Cui, Ruochen Wang, Si Si, and Cho-Jui Hsieh. Scaling up dataset distillation to imagenet-1k with constant memory. In International Conference on Machine Learning, pages 6565–6590. PMLR, 2023

[21] Yuanfang Guo, Oscar C Au, Rui Wang, Lu Fang, and Xiaochun Cao. Halftone image watermarking by content aware double-sided embedding error diffusion. IEEE Transactions on Image Processing, 27(7):3387–3402, 2018

[22] Poonam Kadian, Shiafali M Arora, and Nidhi Arora. Robust digital watermarking techniques for copyright protection of digital data: A survey. Wireless Personal Communications, 118(4):3225–3249, 2021

[23] Hua Deng, Zheng Qin, Qianhong Wu, Zhenyu Guan, Robert H Deng, Yujue Wang, and Yunya Zhou. Identity-based encryption transformation for flexible sharing of encrypted data in public cloud. IEEE Transactions on Information Forensics and Security, 15:3168–3180, 2020

[24] Yiming Li, Peidong Liu, Yong Jiang, and Shu-Tao Xia. Visual privacy protection via mapping distortion. In ICASSP 2021-2021 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), pages 3740–3744. IEEE, 2021

[25] Jiawang Bai, Yiming Li, Jiawei Li, Xue Yang, Yong Jiang, and Shu-Tao Xia. Multinomial random forest. Pattern Recognition, 122:108331, 2022

[26] Yiming Li, Mingyan Zhu, Xue Yang, Yong Jiang, Tao Wei, and Shu-Tao Xia. Black-box dataset ownership verification via backdoor watermarking. IEEE Transactions on Information Forensics and Security, 18:2318–2332, 2023

[27] Ruixiang Tang, Qizhang Feng, Ninghao Liu, Fan Yang, and Xia Hu. Did you train on my dataset? towards public dataset protection with clean-label backdoor watermarking. ACM SIGKDD Explorations Newsletter, 25(1):43–53, 2023

[28] Zhuxin Lei, Ziyuan Yang, and Yi Zhang. Zero-sacrifice persistent-robustness adversarial defense for pre-trained encoders. In Proceedings of the International Conference on Learning Representations, 2026

[29] Tong Zhang. Statistical behavior and consistency of classification methods based on convex risk minimization. The Annals of Statistics, 32(1):56–85, 2004

[30] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International conference on machine learning, pages 8748–8763. PMLR, 2021

[31] Zhou Wang, Eero P Simoncelli, and Alan C Bovik. Multiscale structural similarity for image quality assessment. In The thirty-seventh Asilomar conference on signals, systems & computers, 2003, volume 2, pages 1398–1402. IEEE, 2003

[32] FIPS PUB. Secure hash standard (SHS). FIPS PUB, 180(4):2012, 2012

[33] Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

[34] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Advances in neural information processing systems, 25, 2012

[35] Alex Krizhevsky, Ilya Sutskever, and Geoffrey E Hinton. Imagenet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84–90, 2017

[36] Karen Simonyan and Andrew Zisserman. Very deep convolutional networks for large-scale image recognition. arXiv preprint arXiv:1409.1556, 2014

[37] Kaiming He, Xiangyu Zhang, Shaoqing Ren, and Jian Sun. Deep residual learning for image recognition. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 770–778, 2016

[38] Ziyao Guo, Kai Wang, George Cazenavette, Hui Li, Kaipeng Zhang, and Yang You. Towards lossless dataset distillation via difficulty-aligned trajectory matching. In The Twelfth International Conference on Learning Representations, 2024

[39] Alex Krizhevsky, Geoffrey Hinton, et al. Learning multiple layers of features from tiny images. 2009

[40] Han Xiao, Kashif Rasul, and Roland Vollgraf. Fashion-mnist: a novel image dataset for benchmarking machine learning algorithms. arXiv preprint arXiv:1708.07747, 2017

[41] Yuval Netzer, Tao Wang, Adam Coates, Alessandro Bissacco, Baolin Wu, Andrew Y Ng, et al. Reading digits in natural images with unsupervised feature learning. In NIPS workshop on deep learning and unsupervised feature learning, volume 2011, page 4. Granada, 2011

[42] Alexey Dosovitskiy, Lucas Beyer, Alexander Kolesnikov, Dirk Weissenborn, Xiaohua Zhai, Thomas Unterthiner, Mostafa Dehghani, Matthias Minderer, Georg Heigold, Sylvain Gelly, et al. An image is worth 16x16 words: Transformers for image recognition at scale. arXiv preprint arXiv:2010.11929, 2020

[43] Hugo Touvron, Matthieu Cord, Alexandre Sablayrolles, Gabriel Synnaeve, and Hervé Jégou. Going deeper with image transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 32–42, October 2021

[44] Ze Liu, Yutong Lin, Yue Cao, Han Hu, Yixuan Wei, Zheng Zhang, Stephen Lin, and Baining Guo. Swin transformer: Hierarchical vision transformer using shifted windows. In Proceedings of the IEEE/CVF international conference on computer vision, pages 10012–10022, 2021