Distributional Statistics Restore Training Data Auditability in One-step Distilled Diffusion Models

Andy Song; Feng Liu; Guangquan Zhang; Muxing Li; Sharon Li; Zesheng Ye

arxiv: 2502.02970 · v5 · submitted 2025-02-05 · 💻 cs.LG

Distributional Statistics Restore Training Data Auditability in One-step Distilled Diffusion Models

Muxing Li , Zesheng Ye , Sharon Li , Andy Song , Guangquan Zhang , Feng Liu This is my paper

Pith reviewed 2026-05-23 04:25 UTC · model grok-4.3

classification 💻 cs.LG

keywords diffusion modelsmodel distillationdata provenancekernel discrepancyunauthorized training detectiondistributional auditinggenerative model auditing

0 comments

The pith

Distributional memory persists in one-step distilled diffusion models, allowing kernel-based detection of upstream training data.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that distillation removes per-instance memorization but leaves a statistical trace: student outputs remain closer to the teacher's training distribution than to unrelated references. This trace supports a detector that uses kernel discrepancy to test whether a candidate dataset aligns statistically with the student more than with held-out non-training sets. The approach works even when the unauthorized data is only a minority of the candidate set and across multiple distillation setups. It closes the laundering gap created when students train solely on teacher-generated samples rather than original data.

Core claim

A distributional memory chain survives under distillation: the student's output distribution remains closer to the teacher's training distribution than to any non-training reference, even if no single training instance is memorized. A kernel-based discrepancy test exploits this chain to determine whether a candidate dataset is statistically aligned with the student-generated distribution more than with held-out non-training datasets, tracing provenance back to the teacher's training data.

What carries the argument

Kernel-based distribution discrepancy that measures whether student-generated samples align more closely with a candidate dataset than with non-training references.

If this is right

The detector identifies unauthorized data influence even when that data forms only a minority of the candidate set.
Auditing succeeds without requiring any per-instance memorization in the student.
The method applies across standard benchmarks and multiple distillation pipelines.
Distribution-level checks provide a general countermeasure to model laundering in deployed generators.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar distributional tests could be applied to other compressed generative pipelines beyond one-step diffusion.
The approach suggests auditing standards could shift from instance-level checks to distribution-level comparisons.
One could test whether the same discrepancy signature appears under different teacher-student temperature or sampling settings.

Load-bearing premise

The kernel discrepancy isolates the effect of the original training distribution from distillation hyperparameters, model architecture, and generation randomness.

What would settle it

A concrete counter-example would be any distilled student whose output distribution shows equal or greater discrepancy to the teacher's training set than to a held-out non-training set under the same kernel test.

read the original abstract

The proliferation of diffusion models trained on web-scale, provenance-uncertain image collections has made it essential, yet technically unresolved, to determine whether a model has learned from specific copyrighted data without authorization. Current methods primarily rely on the memorization effect, whereby models reconstruct their training images better than unseen ones, to detect unauthorized training data on a per-instance basis. This effect, however, vanishes under distillation, the now-dominant deployment pipeline that compresses compute-intensive teacher diffusion models into efficient {\em student one-step generators} mimicking the teacher's output for real-time user access. As the students train exclusively on teacher-generated outputs and never directly see the teacher's original training data, they carry no per-instance memorization of that upstream data, creating a model laundering loophole that severs the auditable link between a deployed model and its upstream training data. We nonetheless reveal that a distributional memory chain survives under distillation: the student's output distribution remains closer to the teacher's training distribution than to any non-training reference, even if no single training instance is memorized. Exploiting this chain, we develop a distributional unauthorized training data detector, grounded in kernel-based distribution discrepancy, that determines if a candidate dataset of unknown composition is statistically aligned with the student-generated distribution more than held-out non-training datasets, thus tracing provenance back to the teacher's training data. Evaluation across benchmarks and distillation setups confirms reliable detection even when unauthorized data forms a minority of the candidate set, establishing distribution-level auditing as a countermeasure to model laundering and a paradigm for accountable generative AI ecosystems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The distributional auditing claim for distilled models is plausible on paper but the abstract leaves the key controls unaddressed.

read the letter

The central claim is that a kernel discrepancy test on student outputs can still flag whether the upstream teacher saw particular data, even though per-instance memorization disappears after distillation. That is the new angle: prior memorization detectors stop working once you move to one-step students trained only on teacher samples, and the authors correctly flag this as a laundering loophole for copyright and provenance questions. The framing of the problem is clear and the proposed detector is a straightforward extension of existing discrepancy methods to this setting. Credit for identifying that the distributional trace might survive when instance-level traces do not. The soft spot is exactly the one in the stress-test note. Nothing in the abstract shows an experiment that holds the distillation process fixed while varying only the teacher's training data, or that compares against students distilled from teachers with different data but identical architecture and objective. Without those controls it is difficult to rule out that the measured closeness comes from shared model family or generation process rather than the specific training distribution. The claim that detection works even when the unauthorized set is a minority is stated but not backed by numbers or error bars here. This is for readers who care about practical auditing tools for generative models rather than theoretical guarantees. A serious referee should see it to check whether the empirical design actually separates the data signal from the confounders; the idea is worth that step even if the current evidence looks thin.

Referee Report

2 major / 0 minor

Summary. The manuscript claims that one-step distilled diffusion models retain a distributional memory chain from the teacher's training data, such that student outputs remain statistically closer to the teacher's training distribution than to non-training references (even without per-instance memorization). It proposes a kernel-based distribution discrepancy detector to audit whether a candidate dataset aligns more with the student-generated distribution than held-out non-training sets, thereby tracing provenance and countering model laundering.

Significance. If the distributional alignment is robustly shown to originate specifically from the upstream training data (rather than distillation artifacts), the work would introduce a practical paradigm for provenance auditing of deployed one-step generators, with direct relevance to copyright enforcement and accountable generative AI systems.

major comments (2)

[Abstract] Abstract: The central claim that the kernel discrepancy isolates the effect of the teacher's training distribution (versus confounding factors such as distillation hyperparameters, model architecture, or generation stochasticity) is load-bearing, yet the abstract supplies no description of controls that hold distillation fixed while varying only the upstream training data.
[Abstract] Abstract: The assertion of 'reliable detection even when unauthorized data forms a minority of the candidate set' is presented without any quantitative results, error bars, ablation studies, or statistical significance tests, so the soundness of the detector cannot be assessed from the provided text.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We address the two major comments below and will revise the abstract accordingly to improve clarity and completeness while preserving its brevity.

read point-by-point responses

Referee: [Abstract] Abstract: The central claim that the kernel discrepancy isolates the effect of the teacher's training distribution (versus confounding factors such as distillation hyperparameters, model architecture, or generation stochasticity) is load-bearing, yet the abstract supplies no description of controls that hold distillation fixed while varying only the upstream training data.

Authors: We agree the abstract would benefit from an explicit reference to the controls. The full manuscript (Section 4 and Appendix B) fixes the distillation algorithm, student architecture, sampling procedure, and hyperparameters while training separate teachers on disjoint datasets; the kernel discrepancy is then measured on student outputs from these controlled setups. We will add a concise clause to the abstract noting that 'comparisons hold distillation fixed while varying only the teacher's training data.' revision: yes
Referee: [Abstract] Abstract: The assertion of 'reliable detection even when unauthorized data forms a minority of the candidate set' is presented without any quantitative results, error bars, ablation studies, or statistical significance tests, so the soundness of the detector cannot be assessed from the provided text.

Authors: The referee is correct that the abstract itself contains no numbers. The full paper reports quantitative results (accuracy, AUC, error bars over 5 seeds, and Wilcoxon tests) for minority fractions down to 10-20% in Tables 2-4 and Figures 3-5. We will revise the abstract to include one or two key quantitative statements (e.g., 'detection remains above 85% AUC with p<0.01 even at 20% unauthorized data') while staying within length limits. revision: yes

Circularity Check

0 steps flagged

No circularity: detector is a standard comparative statistical test

full rationale

The paper's core method applies kernel-based distribution discrepancy to compare student-generated outputs against a candidate dataset versus held-out non-training references. No equations, derivations, or self-citations are presented that reduce the detector output to a fitted parameter defined from the target data, nor does any claim reduce by construction to its own inputs. The approach is framed as an external statistical comparison, rendering the derivation self-contained against benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the empirical observation of a surviving distributional alignment whose detection is assumed to be feasible with standard kernel discrepancy without additional unstated modeling assumptions.

axioms (1)

domain assumption Kernel-based maximum mean discrepancy reliably distinguishes the student's output distribution from non-training references when the candidate set overlaps the training distribution.
Invoked as the grounding for the detector; no derivation supplied in abstract.

pith-pipeline@v0.9.0 · 5819 in / 1198 out tokens · 38689 ms · 2026-05-23T04:25:00.419960+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

17 extracted references · 17 canonical work pages · 5 internal anchors

[1]

Y. Gu, L. Dong, F. Wei, and M. Huang. Minillm: Knowledge distillation of large language models.arXiv preprint arXiv:2306.08543,

work page internal anchor Pith review Pith/arXiv arXiv
[2]

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948,

work page internal anchor Pith review Pith/arXiv arXiv
[3]

G. Hinton. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531,

work page internal anchor Pith review Pith/arXiv arXiv
[4]

B. Hu, J. Li, L. Xu, M. Lee, A. Jajoo, G.-W. Kim, H. Xu, and A. Akella. Blockllm: Multi-tenant finer-grained serving for large language models.arXiv preprint arXiv:2404.18322,

work page arXiv
[5]

A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437, 2024a. F. Liu, W. Xu, J. Lu, G. Zhang, A. Gretton, and D. J. Sutherland. Learning deep kernels for non-parametric two-sample tests. InICML,

work page internal anchor Pith review Pith/arXiv arXiv
[6]

Y. Liu, J. Cao, C. Liu, K. Ding, and L. Jin. Datasets for large language models: A comprehensive survey.arXiv preprint arXiv:2402.18041, 2024b. W. Luo, T. Hu, S. Zhang, J. Sun, Z. Li, and Z. Zhang. Diff-instruct: A universal approach for transferring knowledge from pre-trained diffusion models.NeurIPS,

work page arXiv
[7]

Y. Tang, Y. Wang, J. Guo, Z. Tu, K. Han, H. Hu, and D. Tao. A survey on transformer compression.arXiv preprint arXiv:2402.05964,

work page arXiv
[8]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971,

work page internal anchor Pith review Pith/arXiv arXiv
[9]

Z. Ye, J. Du, and L. Yao. Adversarially contrastive estimation of conditional neural processes.arXiv preprint arXiv:2303.13004,

work page arXiv
[10]

MMD finds common application in areas such as domain adaptation (Chi et al., 2021; Jiang et al., 2023; Zheng et al.,

and its extension, deep- kernel MMD (Liu et al., 2020). MMD finds common application in areas such as domain adaptation (Chi et al., 2021; Jiang et al., 2023; Zheng et al.,

work page 2020
[11]

MMD and related techniques have been extensively adopted in real-world applications, including healthcare (Guo et al., 2022; Jiang et al., 2016; Zhong et al.,

and the evaluation of generative models, where assessing whether two or more domains are distributionally aligned (Liu et al., 2021; Ye and Yao, 2022; Ye et al., 2023; Zhong et al., 2024), is essential. MMD and related techniques have been extensively adopted in real-world applications, including healthcare (Guo et al., 2022; Jiang et al., 2016; Zhong et al.,

work page 2021
[12]

In this paper, these concepts are used in quantifying the distributional differences in Section 4 and Section

and defense against adversarial attacks (Zhang et al., 2025b, 2023). In this paper, these concepts are used in quantifying the distributional differences in Section 4 and Section

work page 2023
[13]

Maximum Mean Discrepancy (MMD), proposed by Gretton et al

We refer interested readers to the original papers for complete details therein. Maximum Mean Discrepancy (MMD), proposed by Gretton et al. (2012), is a statistical tool for measuring the distance between two Borel probability measures, say P and Q, defined on a separable metric space 𝒳 ⊆R 𝑑. Consider independent random variables 𝑋, 𝑋′ ∼P and 𝑌, 𝑌 ′ ∼Q . ...

work page 2012
[14]

The MMD can then be computed in this learned feature space 𝒵

propose to learn a task-relevant representation 𝜃𝜔 :𝒳 → 𝒵 using a neural network parameterized by 𝑤. The MMD can then be computed in this learned feature space 𝒵. As such, the goal of deep-kernel MMD is to find a representation 𝜃𝜔 that maximizes the MMD, thereby increasing the test power to detect differences between PandQ. Following (Liu et al., 2020), l...

work page 2020
[15]

tomaximize the separationbetween ˜𝒟mem and 𝒟non in the feature space. For ˜𝒟mem, 𝒟non and 𝒟anc, we perform mini-batch training and randomly sample subsets from each dataset, e.g.,ℬ anc ={𝑥 * 𝑏 i.i.d∼ 𝒟 anc}𝐵 𝑏=1, with respect to the optimization objectiveℒ(𝜔)defined as ℒ(𝜔) = [︁ \MMD 2 𝑢(ℬanc, ˜ℬmem;𝑘 𝜔) ]︁ ⏟ ⏞ member discrepancy − [︁ \MMD 2 𝑢(ℬanc,ℬ non;...

work page 2022
[16]

See detailed setup of victim models inApp.A

on commonly studied MIA benchmarks, CIFAR10 (Krizhevsky et al., 2010), FFHQ (Karras, 2019), and AFHQv2 (Choi et al., 2020). See detailed setup of victim models inApp.A. Baseline settings.D-MIA differs from existing MIA methods and attack targets. To ensure fairness, we adapt existing methods to the D-MIA setting for experimentation. Specifically, we apply...

work page 2010
[17]

and 10,000 for candidate dataset detection (Alg. 2). For AFHQv2, we sample 3,000 non-member data points, allocating 1,500 for kernel training and 1,500 for candidate detection. To ensure fairness, we randomly discard 15,000 member data points (3,000 for AFHQv2). To evaluate D-MIA under varying proportions of member data in the candidate datasets, we creat...

work page 2020

[1] [1]

Y. Gu, L. Dong, F. Wei, and M. Huang. Minillm: Knowledge distillation of large language models.arXiv preprint arXiv:2306.08543,

work page internal anchor Pith review Pith/arXiv arXiv

[2] [2]

D. Guo, D. Yang, H. Zhang, J. Song, R. Zhang, R. Xu, Q. Zhu, S. Ma, P. Wang, X. Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning.arXiv preprint arXiv:2501.12948,

work page internal anchor Pith review Pith/arXiv arXiv

[3] [3]

G. Hinton. Distilling the knowledge in a neural network.arXiv preprint arXiv:1503.02531,

work page internal anchor Pith review Pith/arXiv arXiv

[4] [4]

B. Hu, J. Li, L. Xu, M. Lee, A. Jajoo, G.-W. Kim, H. Xu, and A. Akella. Blockllm: Multi-tenant finer-grained serving for large language models.arXiv preprint arXiv:2404.18322,

work page arXiv

[5] [5]

A. Liu, B. Feng, B. Xue, B. Wang, B. Wu, C. Lu, C. Zhao, C. Deng, C. Zhang, C. Ruan, et al. Deepseek-v3 technical report.arXiv preprint arXiv:2412.19437, 2024a. F. Liu, W. Xu, J. Lu, G. Zhang, A. Gretton, and D. J. Sutherland. Learning deep kernels for non-parametric two-sample tests. InICML,

work page internal anchor Pith review Pith/arXiv arXiv

[6] [6]

Y. Liu, J. Cao, C. Liu, K. Ding, and L. Jin. Datasets for large language models: A comprehensive survey.arXiv preprint arXiv:2402.18041, 2024b. W. Luo, T. Hu, S. Zhang, J. Sun, Z. Li, and Z. Zhang. Diff-instruct: A universal approach for transferring knowledge from pre-trained diffusion models.NeurIPS,

work page arXiv

[7] [7]

Y. Tang, Y. Wang, J. Guo, Z. Tu, K. Han, H. Hu, and D. Tao. A survey on transformer compression.arXiv preprint arXiv:2402.05964,

work page arXiv

[8] [8]

LLaMA: Open and Efficient Foundation Language Models

H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar, et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971,

work page internal anchor Pith review Pith/arXiv arXiv

[9] [9]

Z. Ye, J. Du, and L. Yao. Adversarially contrastive estimation of conditional neural processes.arXiv preprint arXiv:2303.13004,

work page arXiv

[10] [10]

MMD finds common application in areas such as domain adaptation (Chi et al., 2021; Jiang et al., 2023; Zheng et al.,

and its extension, deep- kernel MMD (Liu et al., 2020). MMD finds common application in areas such as domain adaptation (Chi et al., 2021; Jiang et al., 2023; Zheng et al.,

work page 2020

[11] [11]

MMD and related techniques have been extensively adopted in real-world applications, including healthcare (Guo et al., 2022; Jiang et al., 2016; Zhong et al.,

and the evaluation of generative models, where assessing whether two or more domains are distributionally aligned (Liu et al., 2021; Ye and Yao, 2022; Ye et al., 2023; Zhong et al., 2024), is essential. MMD and related techniques have been extensively adopted in real-world applications, including healthcare (Guo et al., 2022; Jiang et al., 2016; Zhong et al.,

work page 2021

[12] [12]

In this paper, these concepts are used in quantifying the distributional differences in Section 4 and Section

and defense against adversarial attacks (Zhang et al., 2025b, 2023). In this paper, these concepts are used in quantifying the distributional differences in Section 4 and Section

work page 2023

[13] [13]

Maximum Mean Discrepancy (MMD), proposed by Gretton et al

We refer interested readers to the original papers for complete details therein. Maximum Mean Discrepancy (MMD), proposed by Gretton et al. (2012), is a statistical tool for measuring the distance between two Borel probability measures, say P and Q, defined on a separable metric space 𝒳 ⊆R 𝑑. Consider independent random variables 𝑋, 𝑋′ ∼P and 𝑌, 𝑌 ′ ∼Q . ...

work page 2012

[14] [14]

The MMD can then be computed in this learned feature space 𝒵

propose to learn a task-relevant representation 𝜃𝜔 :𝒳 → 𝒵 using a neural network parameterized by 𝑤. The MMD can then be computed in this learned feature space 𝒵. As such, the goal of deep-kernel MMD is to find a representation 𝜃𝜔 that maximizes the MMD, thereby increasing the test power to detect differences between PandQ. Following (Liu et al., 2020), l...

work page 2020

[15] [15]

tomaximize the separationbetween ˜𝒟mem and 𝒟non in the feature space. For ˜𝒟mem, 𝒟non and 𝒟anc, we perform mini-batch training and randomly sample subsets from each dataset, e.g.,ℬ anc ={𝑥 * 𝑏 i.i.d∼ 𝒟 anc}𝐵 𝑏=1, with respect to the optimization objectiveℒ(𝜔)defined as ℒ(𝜔) = [︁ \MMD 2 𝑢(ℬanc, ˜ℬmem;𝑘 𝜔) ]︁ ⏟ ⏞ member discrepancy − [︁ \MMD 2 𝑢(ℬanc,ℬ non;...

work page 2022

[16] [16]

See detailed setup of victim models inApp.A

on commonly studied MIA benchmarks, CIFAR10 (Krizhevsky et al., 2010), FFHQ (Karras, 2019), and AFHQv2 (Choi et al., 2020). See detailed setup of victim models inApp.A. Baseline settings.D-MIA differs from existing MIA methods and attack targets. To ensure fairness, we adapt existing methods to the D-MIA setting for experimentation. Specifically, we apply...

work page 2010

[17] [17]

and 10,000 for candidate dataset detection (Alg. 2). For AFHQv2, we sample 3,000 non-member data points, allocating 1,500 for kernel training and 1,500 for candidate detection. To ensure fairness, we randomly discard 15,000 member data points (3,000 for AFHQv2). To evaluate D-MIA under varying proportions of member data in the candidate datasets, we creat...

work page 2020