pith. machine review for the scientific record.

arxiv: 2604.04461 · v1 · submitted 2026-04-06 · 💻 cs.LG · cs.AI · cs.CL

Recognition: 2 theorem links


DP-OPD: Differentially Private On-Policy Distillation for Language Models


Pith reviewed 2026-05-10 19:09 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CL
keywords differential privacy · on-policy distillation · knowledge distillation · language models · model compression · DP-SGD · privacy-utility tradeoff

The pith

Differentially private on-policy distillation trains only the student with DP-SGD on its own generated trajectories, guided by a frozen teacher, yielding better perplexity under tight privacy budgets and collapsing the pipeline to a single DP student-training loop.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that enforcing differential privacy solely on the student model through DP-SGD, while using dense token-level targets from a frozen teacher on student-generated continuations, eases the tension between record-level privacy and utility in autoregressive language modeling. It demonstrates concrete perplexity gains on domain-specific datasets compared with DP fine-tuning, off-policy distillation, and synthesis-based approaches that require private teacher training. The method matters because it removes the need for a separate DP-trained teacher and offline synthetic data generation, reducing the process to one private student-training loop. A reader would care if this holds because it makes private adaptation of compressed models to sensitive corpora more practical without amplifying exposure bias from optimization noise.

Core claim

DP-OPD enforces privacy solely through DP-SGD on the student while leveraging a frozen teacher to provide dense token-level targets on student-generated trajectories. It instantiates this via private generalized knowledge distillation on continuation tokens. Under a strict privacy budget of ε=2.0, the approach improves perplexity over DP fine-tuning and off-policy DP distillation and outperforms synthesis-based DP distillation on Yelp and BigPatent, while collapsing private compression into a single DP student-training loop by eliminating DP teacher training and offline synthetic text generation.

What carries the argument

Private generalized knowledge distillation on continuation tokens, where the student is trained with DP-SGD using the frozen teacher's predictions as soft targets on trajectories sampled from the student itself.
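To make the mechanism concrete, the following is a minimal sketch of one such training step in PyTorch, written against a HuggingFace-style student and frozen teacher. Every identifier here (sample_continuations, dp_opd_step, prompt_ids) is an illustrative assumption rather than the authors' released code, and the loss shown is the forward-KL (β = 0) end of generalized knowledge distillation, the setting Figure 2 below reports as best.

    import torch
    import torch.nn.functional as F

    @torch.no_grad()
    def sample_continuations(student, prompt_ids, max_new_tokens=64):
        # On-policy rollouts: trajectories come from the student's own current
        # distribution; the frozen teacher is only evaluated, never trained.
        return student.generate(prompt_ids, do_sample=True,
                                max_new_tokens=max_new_tokens)

    def dp_opd_step(student, teacher, dp_optimizer, prompt_ids):
        seqs = sample_continuations(student, prompt_ids)
        prompt_len = prompt_ids.shape[1]

        # Dense token-level soft targets from the frozen teacher, evaluated on
        # the student-sampled sequence.
        with torch.no_grad():
            teacher_logp = F.log_softmax(teacher(seqs).logits, dim=-1)
        student_logp = F.log_softmax(student(seqs).logits, dim=-1)

        # Distill only on continuation tokens: the logits at position t predict
        # token t + 1, so this slice covers exactly the generated tokens.
        loss = F.kl_div(student_logp[:, prompt_len - 1 : -1],
                        teacher_logp[:, prompt_len - 1 : -1],
                        log_target=True, reduction="batchmean")

        # Privacy lives entirely in the optimizer: a DP-SGD optimizer (e.g. one
        # produced by Opacus) clips per-example gradients and adds Gaussian noise.
        dp_optimizer.zero_grad()
        loss.backward()
        dp_optimizer.step()
        return loss.item()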

If this is right

  • Perplexity on sensitive-domain language modeling improves under the same privacy budget compared with DP fine-tuning or off-policy distillation.
  • Private model compression reduces to one DP student-training loop without separate teacher privacy costs or offline data synthesis.
  • On-policy sampling with teacher guidance mitigates compounding errors from optimization noise along long autoregressive rollouts.
  • The framework applies directly to proprietary corpora that require record-level privacy protection.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The single-loop structure could allow scaling to larger student models by avoiding the memory and compute overhead of DP-training the teacher.
  • If the on-policy mechanism holds, it might extend to other sequence tasks where exposure bias is a concern, such as code generation on private repositories.
  • The approach leaves open whether combining it with selective teacher updates could further tighten the privacy-utility curve without reintroducing multi-stage complexity.

Load-bearing premise

That enforcing DP-SGD only on the student using its own generated trajectories preserves formal privacy guarantees without leakage through the frozen teacher or the on-policy sampling process, and that the teacher's dense targets remain effective guides under the added optimization noise.
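The premise becomes easier to interrogate once each component is marked by whether it touches the privacy budget. The schematic below, assuming an Opacus-style PrivacyEngine and the hypothetical dp_opd_step sketched earlier, shows the single-loop structure the premise relies on: sampling reads only already-noised parameters, the teacher stays frozen, and the sole data-dependent update is the clipped, noised DP-SGD step.

    # Schematic of the single DP student-training loop; the Opacus wiring is an
    # assumption for illustration, not taken from the paper.
    from opacus import PrivacyEngine

    engine = PrivacyEngine(accountant="rdp")
    student, dp_optimizer, loader = engine.make_private(
        module=student, optimizer=optimizer, data_loader=private_prompt_loader,
        noise_multiplier=sigma, max_grad_norm=clip_norm)

    for prompt_ids in loader:        # private records enter the loop only here
        # Per the premise: the sampling policy depends only on parameters that
        # are already private by composition of earlier noisy updates
        # (post-processing), and each record's influence on this step is
        # bounded by per-example gradient clipping inside dp_optimizer.
        dp_opd_step(student, teacher, dp_optimizer, prompt_ids)

    eps = engine.get_epsilon(delta=1e-5)  # accountant tracks the spent budget

Whether the post-processing framing fully covers the record-dependence of the sampled trajectories is exactly what the referee report below contests.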

What would settle it

A direct measurement on the same datasets showing that perplexity under ε=2.0 does not improve relative to the synthesis-based baseline, or an audit revealing private information leakage from the student despite DP-SGD being applied only to it.
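One standard instrument for the audit half of that test is a canary-exposure measurement in the style of Carlini et al.'s secret-sharer methodology; the sketch below is a generic audit design, not an experiment the paper reports. A canary sequence is inserted into the private corpus before training, and its post-training likelihood is ranked against held-out candidates; exposure near zero is what the DP guarantee predicts.

    import math
    import torch

    @torch.no_grad()
    def canary_exposure(model, tokenizer, canary, candidates):
        # Secret-sharer style exposure: log2(|candidates| + 1) - log2(rank of
        # the canary). High exposure on the DP-trained student would indicate
        # leakage despite DP-SGD; values near zero are consistent with the
        # guarantee.
        def nll(text):
            ids = tokenizer(text, return_tensors="pt").input_ids
            return model(input_ids=ids, labels=ids).loss.item()

        canary_nll = nll(canary)
        rank = 1 + sum(nll(c) < canary_nll for c in candidates)
        return math.log2(len(candidates) + 1) - math.log2(rank)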

Figures

Figures reproduced from arXiv: 2604.04461 by Fatemeh Khadem, Sajad Mousavi, Yi Fang, Yuhong Liu.

Figure 1. Contrasting DP distillation paradigms: in DP-OPD, teacher targets are evaluated on student-generated trajectories, while privacy is enforced solely via DP-SGD on student updates. view at source ↗
Figure 2. Ablation on BigPatent: test perplexity as a function of the GKD divergence parameter β with λ = 1.0 (on-policy every step). Lower β yields lower PPL in this setting, with the best performance at β = 0; lower β corresponds to a more forward-KL-like alignment that encourages the student to cover the teacher's predictive distribution, which is closely tied to next-token likelihood and thus perplexity. view at source ↗
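For readers tracking the β ablation, the divergence being interpolated is commonly parameterized as a generalized Jensen-Shannon divergence, following Agarwal et al.'s generalized knowledge distillation; the sketch below uses one common convention, and the paper may normalize differently. β = 0 recovers the forward, mode-covering KL the caption associates with lower perplexity, and β = 1 the reverse, mode-seeking KL.

    import math
    import torch
    import torch.nn.functional as F

    def generalized_jsd(teacher_logp, student_logp, beta):
        # One common GKD convention (an assumption, not the paper's code):
        # beta = 0 -> KL(teacher || student), beta = 1 -> KL(student || teacher).
        if beta == 0.0:
            return F.kl_div(student_logp, teacher_logp,
                            log_target=True, reduction="batchmean")
        if beta == 1.0:
            return F.kl_div(teacher_logp, student_logp,
                            log_target=True, reduction="batchmean")
        # Log of the mixture M = beta * teacher + (1 - beta) * student.
        log_m = torch.logsumexp(
            torch.stack([teacher_logp + math.log(beta),
                         student_logp + math.log(1.0 - beta)]), dim=0)
        kl_teacher = F.kl_div(log_m, teacher_logp, log_target=True,
                              reduction="batchmean")  # KL(teacher || M)
        kl_student = F.kl_div(log_m, student_logp, log_target=True,
                              reduction="batchmean")  # KL(student || M)
        return beta * kl_teacher + (1.0 - beta) * kl_student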
read the original abstract

Large language models (LLMs) are increasingly adapted to proprietary and domain-specific corpora that contain sensitive information, creating a tension between formal privacy guarantees and efficient deployment through model compression. Differential privacy (DP), typically enforced via DP-SGD, provides record-level protection but often incurs substantial utility loss in autoregressive generation, where optimization noise can amplify exposure bias and compounding errors along long rollouts. Existing approaches to private distillation either apply DP-SGD to both teacher and student, worsening computation and the privacy--utility tradeoff, or rely on DP synthetic text generation from a DP-trained teacher, avoiding DP on the student at the cost of DP-optimizing a large teacher and introducing an offline generation pipeline. We propose \textbf{Differentially Private On-Policy Distillation (DP-OPD)}, a synthesis-free framework that enforces privacy solely through DP-SGD on the student while leveraging a frozen teacher to provide dense token-level targets on \emph{student-generated} trajectories. DP-OPD instantiates this idea via \emph{private generalized knowledge distillation} on continuation tokens. Under a strict privacy budget ($\varepsilon=2.0$), DP-OPD improves perplexity over DP fine-tuning and off-policy DP distillation, and outperforms synthesis-based DP distillation (Yelp: 44.15$\rightarrow$41.68; BigPatent: 32.43$\rightarrow$30.63), while substantially simplifying the training pipeline. In particular, \textbf{DP-OPD collapses private compression into a single DP student-training loop} by eliminating DP teacher training and offline synthetic text generation. Code will be released upon publication at https://github.com/khademfatemeh/dp_opd.

Editorial analysis

A structured set of objections, weighed in public.

A referee report, a simulated authors' rebuttal, a circularity audit, and an axiom ledger. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims to introduce DP-OPD, a differentially private on-policy distillation method for language models. By applying DP-SGD only to the student on trajectories it generates itself, guided by a frozen teacher, it achieves formal privacy at ε=2.0 and better perplexity than DP fine-tuning, off-policy DP distillation, and synthesis-based approaches on Yelp and BigPatent datasets, while simplifying the pipeline to a single DP student-training loop.

Significance. Should the formal privacy hold, this would be a meaningful advance in private LLM adaptation and compression, as it avoids the computational cost of DP-training the teacher and generating synthetic data offline. The concrete perplexity gains and the commitment to release code are strengths that enhance the potential impact.

major comments (2)
  1. [Privacy Analysis (Section 3)] The central privacy claim relies on standard DP-SGD applied to the student. However, because the training data consists of on-policy trajectories sampled from the student's current (DP-noisy) parameters, a change in one private record can influence both the gradient and the distribution of sampled sequences. This violates the fixed-dataset assumption in standard DP-SGD analyses and requires additional justification or a modified proof to support the ε=2.0 guarantee.
  2. [Experimental Setup (Section 4)] The reported perplexity improvements (Yelp: 44.15→41.68; BigPatent: 32.43→30.63) are presented without accompanying details on the number of independent runs, variance or standard errors, hyperparameter search procedures, or precise implementation of the on-policy sampling and baseline methods. These omissions make it difficult to assess the reliability and reproducibility of the empirical claims.
minor comments (1)
  1. The abstract and introduction could more explicitly state the assumption that the teacher model has not been trained on the private data, to clarify the privacy boundary.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. The comments highlight important aspects of the privacy analysis and experimental reporting that we will address in the revision. Below we respond point by point to the major comments.

read point-by-point responses
  1. Referee: [Privacy Analysis (Section 3)] The central privacy claim relies on standard DP-SGD applied to the student. However, because the training data consists of on-policy trajectories sampled from the student's current (DP-noisy) parameters, a change in one private record can influence both the gradient and the distribution of sampled sequences. This violates the fixed-dataset assumption in standard DP-SGD analyses and requires additional justification or a modified proof to support the ε=2.0 guarantee.

    Authors: We thank the referee for identifying this subtlety. The on-policy sampling introduces an adaptive dependence between the model state and the trajectories used for the next update, which is not present in the standard fixed-dataset DP-SGD setting. We agree that additional justification is warranted. In the revised manuscript we will expand Section 3 with a dedicated subsection that extends the analysis: we treat the full training loop as a composition of DP-SGD steps where each sampling distribution is a post-processing of the already-private model parameters. Because post-processing incurs no additional privacy cost and the influence of any single record is controlled by the preceding DP updates, the overall mechanism remains (ε,δ)-DP with the stated budget. A proof sketch will be added to the appendix showing that the adaptive sampling does not amplify the privacy loss beyond the per-step DP-SGD accounting. We believe this addresses the concern while preserving the ε=2.0 claim. revision: yes

  2. Referee: [Experimental Setup (Section 4)] The reported perplexity improvements (Yelp: 44.15→41.68; BigPatent: 32.43→30.63) are presented without accompanying details on the number of independent runs, variance or standard errors, hyperparameter search procedures, or precise implementation of the on-policy sampling and baseline methods. These omissions make it difficult to assess the reliability and reproducibility of the empirical claims.

    Authors: We agree that the current experimental section lacks sufficient detail for assessing reliability and reproducibility. In the revised version we will expand Section 4 (and add a new appendix) to report: five independent runs with different random seeds for each method and dataset; mean perplexity together with standard error; the hyperparameter search procedure (grid search over learning rate, batch size, noise multiplier, and clipping norm, subject to the fixed ε=2.0 budget); and precise implementation details for on-policy sampling (temperature=1.0, maximum continuation length, and how the off-policy and synthesis baselines were reproduced). These additions will allow readers to evaluate the stability of the reported gains. revision: yes
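A concrete way to run such a search under a fixed budget, assuming Opacus's accounting utilities (the paper does not name its DP tooling, and every numeric value below is a placeholder): calibrate the noise multiplier to ε = 2.0 for each candidate batch size and epoch count, then grid-search the remaining hyperparameters at that σ.

    from opacus.accountants.utils import get_noise_multiplier

    # Placeholder values, not the paper's settings.
    batch_size, dataset_size, epochs = 256, 50_000, 3
    sigma = get_noise_multiplier(
        target_epsilon=2.0,                    # the paper's strict budget
        target_delta=1e-5,                     # delta conventionally ~ 1/N
        sample_rate=batch_size / dataset_size,
        epochs=epochs,
    )
    print(f"noise multiplier meeting eps = 2.0: {sigma:.2f}")

Because ε depends jointly on the sample rate, number of steps, and σ, re-deriving σ at each grid point keeps every compared configuration at exactly the same budget.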

Circularity Check

0 steps flagged

No circularity; claims rest on empirical evaluation of a proposed algorithm

full rationale

The paper proposes DP-OPD as a practical framework that applies DP-SGD only to the student on its own generated trajectories while using a frozen teacher for token targets. All central claims (perplexity improvements under ε=2.0, pipeline simplification) are supported by direct experimental comparisons on Yelp and BigPatent rather than any mathematical derivation. No equations, self-definitions, fitted parameters renamed as predictions, or load-bearing self-citations appear in the manuscript. The privacy argument invokes standard DP-SGD without reducing it to a self-referential construction or prior author result that itself lacks independent verification. The derivation chain is therefore self-contained as an algorithmic proposal plus empirical validation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based solely on the abstract, the approach relies on standard assumptions from differential privacy and knowledge distillation without introducing new free parameters or entities.

axioms (2)
  • domain assumption DP-SGD applied only to student updates provides the stated record-level privacy guarantees even when targets come from a non-private frozen teacher.
    Implicit in the claim that privacy is enforced solely through DP-SGD on the student.
  • domain assumption On-policy trajectories generated by the noisy student still allow effective guidance from the teacher's token-level targets.
    Required for the on-policy distillation to outperform off-policy baselines.

pith-pipeline@v0.9.0 · 5623 in / 1405 out tokens · 58777 ms · 2026-05-10T19:09:22.833185+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.


Reference graph

Works this paper leans on

13 extracted references · 13 canonical work pages

  1. Abadi, M., Chu, A., Goodfellow, I., McMahan, H. B., Mironov, I., Talwar, K., and Zhang, L. Deep learning with differential privacy. In Proceedings of the 2016 ACM SIGSAC Conference on Computer and Communications Security (CCS), 2016. doi: 10.1145/2976749.2978318.
  2. Agarwal, R., Vieillard, N., Zhou, Y., Stanczyk, P., Ramos, S., Geist, M., and Bachem, O. On-policy distillation of language models: Learning from self-generated mistakes. arXiv preprint arXiv:2306.13649, 2023.
  3. Bengio, S., Vinyals, O., Jaitly, N., and Shazeer, N. Scheduled sampling for sequence prediction with recurrent neural networks. In Advances in Neural Information Processing Systems (NeurIPS), 2015.
  4. Dwork, C., McSherry, F., Nissim, K., and Smith, A. Calibrating noise to sensitivity in private data analysis. In Theory of Cryptography Conference (TCC), 2006. doi: 10.1007/11681878.
  5. Flemings, J. and Annavaram, M. Differentially private knowledge distillation via synthetic text generation. In Findings of the Association for Computational Linguistics: ACL 2024. doi: 10.18653/v1/2024.findings-acl.769. URL https://aclanthology.org/2024.findings-acl.769/.
  6. Hugging Face H4. Unlocking on-policy distillation for any model family. https://huggingface.co/spaces/HuggingFaceH4/on-policy-distillation. Accessed 2026-01-31.
  7. Keskar, N. S., McCann, B., Varshney, L. R., Xiong, C., and Socher, R. CTRL: A conditional transformer language model for controllable generation. arXiv preprint arXiv:1909.05858, 2019.
  8. Kurakin, A., Ponomareva, N., Syed, U., MacDermed, L., and Terzis, A. Harnessing large-language models to generate private synthetic text. arXiv preprint arXiv:2306.01684, 2023.
  9. Li, X., Tramèr, F., Liang, P., and Hashimoto, T. Large language models can be strong differentially private learners. arXiv preprint arXiv:2110.05679, 2021.
  10. Majmudar, J., Dupuy, C., Peris, C., Smaili, S., Gupta, R., and Zemel, R. Differentially private decoding in large language models. arXiv preprint arXiv:2205.13621, 2022.
  11. Mattern, J., Jin, Z., Weggenmann, B., Schölkopf, B., and Sachan, M. Differentially private language models for secure data sharing. arXiv preprint arXiv:2210.13918, 2022.
  12. Papernot, N., Abadi, M., Erlingsson, Ú., Goodfellow, I., and Talwar, K. Semi-supervised knowledge transfer for deep learning from private training data. arXiv preprint arXiv:1610.05755, 2016.
  13. Ranzato, M., Chopra, S., Auli, M., and Zaremba, W. Sequence level training with recurrent neural networks. arXiv preprint arXiv:1511.06732, 2015.