DP-OPD: Differentially Private On-Policy Distillation for Language Models
Recognition: 2 Lean theorem links
Pith reviewed 2026-05-10 19:09 UTC · model grok-4.3
The pith
Differentially private on-policy distillation trains only the student with DP-SGD on its own generated trajectories, guided by a frozen teacher, yielding better perplexity under tight privacy budgets and collapsing the pipeline to a single DP student-training loop.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
DP-OPD enforces privacy solely through DP-SGD on the student while leveraging a frozen teacher to provide dense token-level targets on student-generated trajectories. It instantiates this via private generalized knowledge distillation on continuation tokens. Under a strict privacy budget of ε=2.0, the approach improves perplexity over DP fine-tuning and off-policy DP distillation and outperforms synthesis-based DP distillation on Yelp and BigPatent, while collapsing private compression into a single DP student-training loop by eliminating DP teacher training and offline synthetic text generation.
What carries the argument
Private generalized knowledge distillation on continuation tokens, where the student is trained with DP-SGD using the frozen teacher's predictions as soft targets on trajectories sampled from the student itself.
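The single-loop mechanism can be made concrete with a toy sketch. Everything below is illustrative and assumed rather than taken from the paper: a bigram "language model" over a four-token vocabulary stands in for the student, a fixed stochastic table for the frozen teacher, and all constants (clip norm, noise multiplier, learning rate, rollout length) are arbitrary. The student samples its own rollouts, receives dense per-position targets from the teacher, and is updated with per-example clipping plus Gaussian noise, i.e., DP-SGD applied only to the student:

```python
import math
import random

random.seed(0)

V, L, B = 4, 8, 32            # toy vocab size, rollout length, rollouts per batch
C, SIGMA, LR = 1.0, 0.8, 0.3  # clip norm, noise multiplier, learning rate (all assumed)

def softmax(row):
    m = max(row)
    e = [math.exp(x - m) for x in row]
    s = sum(e)
    return [x / s for x in e]

def kl(q, p):
    return sum(qi * math.log(qi / pi) for qi, pi in zip(q, p))

# Frozen, non-private teacher: a fixed bigram next-token table.
teacher = [softmax([random.gauss(0, 1) for _ in range(V)]) for _ in range(V)]
theta = [[0.0] * V for _ in range(V)]  # student bigram logits, trained with DP-SGD

def mean_kl():
    return sum(kl(teacher[r], softmax(theta[r])) for r in range(V)) / V

kl_before = mean_kl()
for step in range(400):
    p = [softmax(theta[r]) for r in range(V)]
    total = [[0.0] * V for _ in range(V)]
    for _ in range(B):
        # On-policy rollout: sample a trajectory from the student's *current* policy.
        g = [[0.0] * V for _ in range(V)]
        tok = random.randrange(V)
        for _ in range(L):
            # Dense teacher target: the gradient of forward KL(teacher || student)
            # w.r.t. the visited logit row is (student_probs - teacher_probs).
            for j in range(V):
                g[tok][j] += p[tok][j] - teacher[tok][j]
            tok = random.choices(range(V), weights=p[tok])[0]
        # Per-example gradient clipping to l2 norm C, as in DP-SGD.
        norm = math.sqrt(sum(x * x for row in g for x in row))
        scale = min(1.0, C / max(norm, 1e-12))
        for r in range(V):
            for j in range(V):
                total[r][j] += scale * g[r][j]
    # Gaussian noise calibrated to the clip norm; privacy tracked by an accountant.
    for r in range(V):
        for j in range(V):
            theta[r][j] -= LR * (total[r][j] + random.gauss(0, SIGMA * C)) / B

kl_after = mean_kl()
print(kl_before, kl_after)
```

Only the student's update is clipped and noised; the teacher is read-only, which is the structural point behind the single-loop claim.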
If this is right
- Perplexity on sensitive-domain language modeling improves under the same privacy budget compared with DP fine-tuning or off-policy distillation.
- Private model compression reduces to one DP student-training loop without separate teacher privacy costs or offline data synthesis.
- On-policy sampling with teacher guidance mitigates compounding errors from optimization noise along long autoregressive rollouts.
- The framework applies directly to proprietary corpora that require record-level privacy protection.
Where Pith is reading between the lines
- The single-loop structure could allow scaling to larger student models by avoiding the memory and compute overhead of DP-training the teacher.
- If the on-policy mechanism holds, it might extend to other sequence tasks where exposure bias is a concern, such as code generation on private repositories.
- The approach leaves open whether combining it with selective teacher updates could further tighten the privacy-utility curve without reintroducing multi-stage complexity.
Load-bearing premise
That enforcing DP-SGD only on the student using its own generated trajectories preserves formal privacy guarantees without leakage through the frozen teacher or the on-policy sampling process, and that the teacher's dense targets remain effective guides under the added optimization noise.
What would settle it
A direct measurement on the same datasets showing that perplexity under ε=2.0 does not improve relative to the synthesis-based baseline, or an audit revealing private information leakage from the student despite DP-SGD being applied only to it.
Original abstract
Large language models (LLMs) are increasingly adapted to proprietary and domain-specific corpora that contain sensitive information, creating a tension between formal privacy guarantees and efficient deployment through model compression. Differential privacy (DP), typically enforced via DP-SGD, provides record-level protection but often incurs substantial utility loss in autoregressive generation, where optimization noise can amplify exposure bias and compounding errors along long rollouts. Existing approaches to private distillation either apply DP-SGD to both teacher and student, worsening computation and the privacy-utility tradeoff, or rely on DP synthetic text generation from a DP-trained teacher, avoiding DP on the student at the cost of DP-optimizing a large teacher and introducing an offline generation pipeline. We propose Differentially Private On-Policy Distillation (DP-OPD), a synthesis-free framework that enforces privacy solely through DP-SGD on the student while leveraging a frozen teacher to provide dense token-level targets on student-generated trajectories. DP-OPD instantiates this idea via private generalized knowledge distillation on continuation tokens. Under a strict privacy budget (ε = 2.0), DP-OPD improves perplexity over DP fine-tuning and off-policy DP distillation, and outperforms synthesis-based DP distillation (Yelp: 44.15 → 41.68; BigPatent: 32.43 → 30.63), while substantially simplifying the training pipeline. In particular, DP-OPD collapses private compression into a single DP student-training loop by eliminating DP teacher training and offline synthetic text generation. Code will be released upon publication at https://github.com/khademfatemeh/dp_opd.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims to introduce DP-OPD, a differentially private on-policy distillation method for language models. By applying DP-SGD only to the student on trajectories it generates itself, guided by a frozen teacher, it achieves formal privacy at ε=2.0 and better perplexity than DP fine-tuning, off-policy DP distillation, and synthesis-based approaches on Yelp and BigPatent datasets, while simplifying the pipeline to a single DP student-training loop.
Significance. Should the formal privacy hold, this would be a meaningful advance in private LLM adaptation and compression, as it avoids the computational cost of DP-training the teacher and generating synthetic data offline. The concrete perplexity gains and the commitment to release code are strengths that enhance the potential impact.
Major comments (2)
- [Privacy Analysis (Section 3)] The central privacy claim relies on standard DP-SGD applied to the student. However, because the training data consists of on-policy trajectories sampled from the student's current (DP-noisy) parameters, a change in one private record can influence both the gradient and the distribution of sampled sequences. This violates the fixed-dataset assumption in standard DP-SGD analyses and requires additional justification or a modified proof to support the ε=2.0 guarantee.
- [Experimental Setup (Section 4)] The reported perplexity improvements (Yelp: 44.15→41.68; BigPatent: 32.43→30.63) are presented without accompanying details on the number of independent runs, variance or standard errors, hyperparameter search procedures, or precise implementation of the on-policy sampling and baseline methods. These omissions make it difficult to assess the reliability and reproducibility of the empirical claims.
Minor comments (1)
- The abstract and introduction could more explicitly state the assumption that the teacher model has not been trained on the private data, to clarify the privacy boundary.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects of the privacy analysis and experimental reporting that we will address in the revision. Below we respond point by point to the major comments.
Point-by-point responses
Referee: [Privacy Analysis (Section 3)] The central privacy claim relies on standard DP-SGD applied to the student. However, because the training data consists of on-policy trajectories sampled from the student's current (DP-noisy) parameters, a change in one private record can influence both the gradient and the distribution of sampled sequences. This violates the fixed-dataset assumption in standard DP-SGD analyses and requires additional justification or a modified proof to support the ε=2.0 guarantee.
Authors: We thank the referee for identifying this subtlety. The on-policy sampling introduces an adaptive dependence between the model state and the trajectories used for the next update, which is not present in the standard fixed-dataset DP-SGD setting. We agree that additional justification is warranted. In the revised manuscript we will expand Section 3 with a dedicated subsection that extends the analysis: we treat the full training loop as a composition of DP-SGD steps where each sampling distribution is a post-processing of the already-private model parameters. Because post-processing incurs no additional privacy cost and the influence of any single record is controlled by the preceding DP updates, the overall mechanism remains (ε,δ)-DP with the stated budget. A proof sketch will be added to the appendix showing that the adaptive sampling does not amplify the privacy loss beyond the per-step DP-SGD accounting. We believe this addresses the concern while preserving the ε=2.0 claim.
Revision: yes
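The rebuttal's argument can be written out as a two-line composition sketch. This is our paraphrase of the proposed appendix proof, not the paper's actual derivation:

```latex
% Step t: a DP-SGD update on private data D produces parameters \theta_t,
% using trajectories S_t sampled from the previous student state.
\theta_t = \mathrm{DPSGD}_t\bigl(D;\ \theta_{t-1},\ S_t\bigr),
\qquad
S_{t+1} \sim p_{\theta_t}(\cdot)
% The sampler touches D only through \theta_t, so S_{t+1} is post-processing:
% if \theta_t is (\varepsilon_t,\delta_t)-DP, then so is (\theta_t, S_{t+1}).
% Adaptive composition over T steps then charges only the per-step DP-SGD cost:
(\varepsilon, \delta) = \mathrm{Compose}\bigl((\varepsilon_1,\delta_1), \dots, (\varepsilon_T,\delta_T)\bigr)
```

The load-bearing step is that the sampler reads the private data only through the already-privatized parameters, so the post-processing property of differential privacy applies; the referee's worry is precisely whether any path from a record to the sampled trajectories bypasses that channel.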
Referee: [Experimental Setup (Section 4)] The reported perplexity improvements (Yelp: 44.15→41.68; BigPatent: 32.43→30.63) are presented without accompanying details on the number of independent runs, variance or standard errors, hyperparameter search procedures, or precise implementation of the on-policy sampling and baseline methods. These omissions make it difficult to assess the reliability and reproducibility of the empirical claims.
Authors: We agree that the current experimental section lacks sufficient detail for assessing reliability and reproducibility. In the revised version we will expand Section 4 (and add a new appendix) to report: five independent runs with different random seeds for each method and dataset; mean perplexity together with standard error; the hyperparameter search procedure (grid search over learning rate, batch size, noise multiplier, and clipping norm, subject to the fixed ε=2.0 budget); and precise implementation details for on-policy sampling (temperature=1.0, maximum continuation length, and how the off-policy and synthesis baselines were reproduced). These additions will allow readers to evaluate the stability of the reported gains.
Revision: yes
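To see how a fixed budget like ε = 2.0 constrains the noise multiplier and step count in such a grid search, here is a minimal RDP-style accountant for the plain (unsubsampled) Gaussian mechanism. Real DP-SGD accounting uses subsampling amplification and tighter bounds, so this is only an illustrative upper-bound sketch with made-up parameter values:

```python
import math

def eps_from_rdp(sigma, steps, delta, alphas=None):
    """Loose (eps, delta) bound for `steps` Gaussian-mechanism releases with
    noise multiplier `sigma`, ignoring subsampling amplification.
    One step has RDP alpha / (2 * sigma^2) at order alpha; composition adds;
    then convert via eps = eps_RDP(alpha) + log(1/delta)/(alpha - 1),
    minimized over a grid of orders alpha."""
    if alphas is None:
        alphas = [1 + x / 10.0 for x in range(1, 1000)]
    best = math.inf
    for a in alphas:
        rdp = steps * a / (2 * sigma ** 2)
        best = min(best, rdp + math.log(1 / delta) / (a - 1))
    return best

# Hypothetical settings: without subsampling, hitting eps near 2 over 1000
# steps requires a very large noise multiplier.
eps = eps_from_rdp(sigma=80.0, steps=1000, delta=1e-5)
print(round(eps, 2))
```

Subsampled accountants (as used in practice) reach the same budget with far smaller noise multipliers, which is why the choice of accountant belongs in the reproducibility details the referee asks for.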
Circularity Check
No circularity; claims rest on empirical evaluation of a proposed algorithm
Full rationale
The paper proposes DP-OPD as a practical framework that applies DP-SGD only to the student on its own generated trajectories while using a frozen teacher for token targets. All central claims (perplexity improvements under ε=2.0, pipeline simplification) are supported by direct experimental comparisons on Yelp and BigPatent rather than any mathematical derivation. No equations, self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the manuscript. The privacy argument invokes standard DP-SGD without reducing it to a self-referential construction or prior author result that itself lacks independent verification. The derivation chain is therefore self-contained as an algorithmic proposal plus empirical validation.
Axiom & Free-Parameter Ledger
Axioms (2)
- Domain assumption: DP-SGD applied only to student updates provides the stated record-level privacy guarantees even when targets come from a non-private frozen teacher.
- Domain assumption: On-policy trajectories generated by the noisy student still allow effective guidance from the teacher's token-level targets.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "We optimize the per-example objective ℓ_i(θ) using DP-SGD... clip to ℓ2 norm C, add Gaussian noise... track privacy with RDP accountant"
- IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection (unclear)
  Relation between the paper passage and the cited Recognition theorem is unclear.
  Paper passage: "generalized distillation divergence... β=0 forward-KL, β=1 reverse-KL, β=0.5 JSD-like"
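The quoted β-interpolation can be read, in one simple instantiation, as a convex combination of forward and reverse KL; the paper may instead use the generalized Jensen-Shannon form from the on-policy distillation literature, so treat this as a sketch of the endpoints only:

```python
import math

def kl(p, q):
    """KL divergence between two discrete distributions (lists of probabilities)."""
    return sum(pi * math.log(pi / qi) for pi, qi in zip(p, q) if pi > 0)

def d_beta(p_teacher, p_student, beta):
    """Interpolated distillation divergence: beta=0 gives forward KL(teacher||student),
    beta=1 gives reverse KL(student||teacher), beta=0.5 a symmetric, JSD-like blend.
    One simple reading of the quoted endpoints, not necessarily the paper's exact form."""
    return (1 - beta) * kl(p_teacher, p_student) + beta * kl(p_student, p_teacher)

pt = [0.7, 0.2, 0.1]  # toy teacher next-token distribution
ps = [0.5, 0.3, 0.2]  # toy student next-token distribution
print(d_beta(pt, ps, 0.0))  # forward KL
print(d_beta(pt, ps, 1.0))  # reverse KL
```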
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
- [1] Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2016. doi: 10.1145/2976749.2978318.
- [2] Agarwal, R., Vieillard, N., Zhou, Y., Stanczyk, P., Ramos, S., Geist, M., and Bachem, O. On-policy distillation of language models: Learning from self-generated mistakes. arXiv preprint arXiv:2306.13649.
- [3] Bengio, S., Vinyals, O., Jaitly, N., and Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2015.
- [4] doi: 10.1007/11681878
- [5] Flemings, J. and Annavaram, M. Differentially private knowledge distillation via synthetic text generation. In Findings of the Association for Computational Linguistics: ACL 2024. doi: 10.18653/v1/2024.findings-acl.769. URL https://aclanthology.org/2024.findings-acl.769/.
- [6] Hugging Face H4. Unlocking on-policy distillation for any model family. https://huggingface.co/spaces/HuggingFaceH4/on-policy-distillation. Accessed 2026-01-31.
- [7] Keskar, N. S., McCann, B., Varshney, L. R., Xiong, C., and Socher, R. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858.
- [8] Kurakin, A., Ponomareva, N., Syed, U., MacDermed, L., and Terzis, A. Harnessing large-language models to generate private synthetic text. arXiv preprint arXiv:2306.01684.
- [9] Li, X., Tramèr, F., Liang, P., and Hashimoto, T. Large language models can be strong differentially private learners. arXiv preprint arXiv:2110.05679.
- [10] Majmudar, J., Dupuy, C., Peris, C., Smaili, S., Gupta, R., and Zemel, R. Differentially private decoding in large language models. arXiv preprint arXiv:2205.13621.
- [11] Mattern, J., Jin, Z., Weggenmann, B., Schölkopf, B., and Sachan, M. Differentially private language models for secure data sharing. arXiv preprint arXiv:2210.13918.
- [12] Papernot, N., Abadi, M., Erlingsson, Ú., Goodfellow, I., and Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755.
- [13] Ranzato, M., Chopra, S., Auli, M., and Zaremba, W. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732.