PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization
Pith reviewed 2026-05-08 12:28 UTC · model grok-4.3
The pith
PACZero achieves usable fine-tuning performance for large language models at zero mutual-information leakage by sign-quantizing gradients so that candidate subsets frequently agree on the update direction.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
PACZero sign-quantizes subset-aggregated zeroth-order gradients to generate frequent unanimity steps, at which the released sign is independent of which subset is the secret. PACZero-ZPL releases a uniform coin flip on the remaining disagreement steps and thereby achieves I(S^*; Y_{1:T})=0. The paper reports 88.99% accuracy on SST-2 with OPT-1.3B full fine-tuning, close to the 91.1% non-private baseline, and nontrivial SQuAD F1, in a high-privacy regime (ε<1) where it states that no prior private method produces usable utility.
What carries the argument
Sign quantization applied to zeroth-order gradients aggregated over candidate subsets, which induces unanimous update directions and zero conditional mutual information on those steps.
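A minimal sketch of this release rule, assuming each candidate subset has already been reduced to one scalar zeroth-order gradient estimate for the current step. The function name `zpl_release`, the scalar-per-subset interface, and the tie-break `sign(0) = +1` are illustrative assumptions, not taken from the paper.

```python
import random

def zpl_release(subset_grads, rng):
    """One step of a PACZero-ZPL-style release (illustrative sketch).

    subset_grads: one scalar ZO gradient estimate per candidate subset
                  (hypothetical interface).
    Returns (released_sign, was_unanimous).
    """
    # Sign-quantize; mapping 0 to +1 is an assumption made here.
    signs = {1 if g >= 0 else -1 for g in subset_grads}
    if len(signs) == 1:
        # Unanimity: every candidate subset yields the same sign, so the
        # release is identical whichever subset is the secret.
        return signs.pop(), True
    # Disagreement: release a uniform coin flip that ignores the data.
    return rng.choice((-1, 1)), False

rng = random.Random(0)
print(zpl_release([0.4, 1.3, 0.02], rng))   # unanimous step
print(zpl_release([0.4, -1.3, 0.02], rng))  # disagreement step
```

In this sketch the only data-dependent output is the common sign on unanimous steps, which is what the zero conditional mutual information argument relies on.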
If this is right
- Delivers accuracy within 2.1 percentage points of non-private zeroth-order fine-tuning at I=0.
- Provides the first usable utility for language-model fine-tuning in the high-privacy regime (ε < 1 in differential-privacy terms), assuming the paper's claim that prior methods fail there holds.
- Applies effectively to both parameter-efficient LoRA and full-parameter fine-tuning tracks.
- Maintains nontrivial SQuAD F1 on both OPT-1.3B and OPT-6.7B at zero information leakage.
Where Pith is reading between the lines
- The unanimity mechanism could reduce privacy costs in other machine learning tasks where data subsets produce aligned gradient signs.
- Future work might explore adaptive subset sizes to maximize the fraction of unanimous steps.
- This privacy approach might combine with other techniques like differential privacy for even stronger guarantees when needed.
Load-bearing premise
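Sign quantization makes enough gradient updates unanimous across all candidate subsets that training still makes progress. The zero-leakage guarantee itself holds by construction, since disagreement steps release only a data-independent coin flip; what depends on the data is how much useful signal survives.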
What would settle it
Run the method on a dataset where subset gradients frequently disagree in sign, and check whether the coin-flip releases reduce accuracy far below the reported levels, or whether an adversary can infer membership with probability above the prior.
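The utility side of this test can be previewed with a toy simulation. The sketch below is not the paper's experiment: it assumes each candidate subset's scalar gradient is a shared signal plus independent Gaussian noise, and the names `simulate`, `signal`, `noise_std`, and `n_subsets` are hypothetical. It reports how often steps are unanimous and how often the released sign matches the sign of the signal.

```python
import random

def simulate(signal, noise_std, n_subsets=8, steps=2000, seed=0):
    """Toy model: per-subset gradient = signal + Gaussian noise.

    Returns (fraction of unanimous steps,
             fraction of steps whose released sign matches sign(signal)).
    """
    rng = random.Random(seed)
    true_sign = 1 if signal >= 0 else -1
    unanimous = 0
    correct = 0
    for _ in range(steps):
        grads = [signal + rng.gauss(0.0, noise_std) for _ in range(n_subsets)]
        signs = {1 if g >= 0 else -1 for g in grads}
        if len(signs) == 1:
            unanimous += 1
            released = signs.pop()
        else:
            # Disagreement: data-independent coin flip.
            released = rng.choice((-1, 1))
        correct += released == true_sign
    return unanimous / steps, correct / steps

for noise in (0.1, 1.0, 5.0):
    u, c = simulate(signal=1.0, noise_std=noise)
    print(f"noise_std={noise}: unanimous={u:.3f}, correct_sign={c:.3f}")
```

As the noise grows relative to the signal, unanimity becomes rare and the released sign approaches a fair coin, which is the failure mode this section describes.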
Original abstract
We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models. It claims that sign-quantizing subset-aggregated ZO gradients produces frequent unanimity steps where every candidate subset agrees on the update direction, allowing the released sign to incur zero conditional mutual information; on disagreements a data-independent coin flip is used to enforce I(S^*; Y_{1:T})=0. Two variants are proposed (PACZero-MI with budgeted MI and PACZero-ZPL with strict I=0), and empirical results are reported on SST-2 and SQuAD using OPT-1.3B and OPT-6.7B models in both LoRA and full fine-tuning, with PACZero-ZPL achieving 88.99±0.91 accuracy on SST-2 full fine-tuning at I=0 (within 2.1pp of the non-private MeZO baseline).
Significance. If the zero-MI claim and the underlying unanimity mechanism are rigorously validated, the result would be significant: it offers a route to membership-inference resistance at the prior level (matching DP only at ε=0 with infinite noise) while preserving usable utility for LLM fine-tuning, a regime where prior DP-ZO methods reportedly fail. The approach also supplies concrete accuracy numbers and a parameter-free privacy guarantee tied to an external MI definition.
major comments (3)
- [Experiments (SST-2 and SQuAD results)] The central utility claim at I=0 for PACZero-ZPL rests on the assumption that sign-quantized, subset-aggregated ZO gradients produce sufficiently frequent unanimity steps; on disagreement steps a coin flip is released. No measurement or bound on the unanimity frequency (or the fraction of steps using random flips) is reported in the experimental results, leaving open whether the observed 88.99 accuracy arises from the claimed mechanism or from task-specific factors. This is load-bearing for the privacy-utility trade-off.
- [Theoretical Analysis / Privacy Definition] The abstract states that PACZero-ZPL achieves I(S^*; Y_{1:T})=0 via coin flips on disagreements, yet the manuscript provides no explicit derivation or proof sketch showing that the conditional mutual information is exactly zero under the stated release rule. The MI bound is therefore not shown to reduce to the claimed value from the sign-quantization construction.
- [Experimental Protocol] Reported standard deviations (e.g., ±0.91 on SST-2) lack justification for the number of runs, seed selection, or error-bar protocol; without this, it is impossible to assess whether the 2.1pp gap to the MeZO baseline is statistically meaningful.
minor comments (2)
- [Method] Notation for the released sign Y_t and the candidate subsets should be introduced with explicit definitions before the unanimity argument is used.
- [Abstract] The comparison table or figure that matches all DP-ZO baselines at the same MIA posterior level should be referenced in the abstract for clarity.
Simulated Author's Rebuttal
We thank the referee for the insightful comments. We address each major point below and will revise the manuscript to incorporate clarifications and additional details as needed.
-
Referee: [Experiments (SST-2 and SQuAD results)] The central utility claim at I=0 for PACZero-ZPL rests on the assumption that sign-quantized, subset-aggregated ZO gradients produce sufficiently frequent unanimity steps; on disagreement steps a coin flip is released. No measurement or bound on the unanimity frequency (or the fraction of steps using random flips) is reported in the experimental results, leaving open whether the observed 88.99 accuracy arises from the claimed mechanism or from task-specific factors. This is load-bearing for the privacy-utility trade-off.
Authors: We agree that quantifying the unanimity frequency is essential to substantiate the privacy-utility claims. In the revised manuscript, we will report the average and per-task fraction of unanimity steps observed during training for the SST-2 and SQuAD experiments. This additional data will clarify the contribution of the unanimity mechanism to the achieved accuracy. revision: yes
-
Referee: [Theoretical Analysis / Privacy Definition] The abstract states that PACZero-ZPL achieves I(S^*; Y_{1:T})=0 via coin flips on disagreements, yet the manuscript provides no explicit derivation or proof sketch showing that the conditional mutual information is exactly zero under the stated release rule. The MI bound is therefore not shown to reduce to the claimed value from the sign-quantization construction.
Authors: We thank the referee for highlighting this. The zero MI follows directly from the release mechanism: unanimity steps release a sign that is identical across all subsets (hence independent of S^*), while disagreement steps release an independent coin flip (also independent of S^*). We will include a concise proof sketch in the appendix of the revised version to formally derive that I(S^*; Y_{1:T}) = 0. revision: yes
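The argument in this reply can be written out as a short derivation. This is a sketch under stated assumptions, not the paper's proof: the candidate collection is treated as fixed, $S^*$ is drawn from it independently of the mechanism's internal randomness, and the parameters at step $t$ depend on the data only through the earlier releases $Y_{1:t-1}$. The symbols $U_t$ (unanimity indicator), $c_t$ (common sign) and $B_t$ (coin flip) are notation introduced here.

```latex
% U_t = 1 iff all candidate subsets give the same sign at step t.
% U_t and c_t depend on the candidate collection, Y_{1:t-1} and
% data-independent randomness, but not on which candidate is S^*.
Y_t =
\begin{cases}
  c_t, & U_t = 1, \\
  B_t, \quad B_t \sim \mathrm{Unif}\{-1,+1\} \text{ independent of } S^*, & U_t = 0,
\end{cases}
\qquad\Longrightarrow\qquad
P(Y_t \mid S^* = s,\, Y_{1:t-1}) = P(Y_t \mid Y_{1:t-1}) \quad \text{for all } s.

% Chain rule for mutual information: every term vanishes.
I(S^*; Y_{1:T}) = \sum_{t=1}^{T} I(S^*; Y_t \mid Y_{1:t-1}) = 0.
```

The step that needs care in a full proof is the claim that $U_t$ and $c_t$ do not depend on $S^*$, which is where the fixed candidate collection enters.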
-
Referee: [Experimental Protocol] Reported standard deviations (e.g., ±0.91 on SST-2) lack justification for the number of runs, seed selection, or error-bar protocol; without this, it is impossible to assess whether the 2.1pp gap to the MeZO baseline is statistically meaningful.
Authors: The standard deviations are based on 5 independent runs using distinct random seeds for model initialization and data ordering. We will revise the experimental details section to explicitly document the number of runs, seed selection, and how error bars are computed, enabling readers to evaluate the statistical significance of the results. revision: yes
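A small sketch of one common error-bar protocol, for reference. It assumes the reported ± value is the sample standard deviation over per-seed scores, which the source does not confirm; the helper name `mean_and_std` and the toy scores are placeholders, not the paper's numbers.

```python
import statistics

def mean_and_std(scores):
    """Return (mean, sample standard deviation) over per-seed scores."""
    if len(scores) < 2:
        raise ValueError("need at least two runs for a standard deviation")
    # statistics.stdev uses the n-1 (sample) denominator.
    return statistics.mean(scores), statistics.stdev(scores)

# Placeholder accuracies for five seeds; NOT results from the paper.
toy_runs = [88.0, 89.5, 90.0, 88.5, 89.0]
mean, std = mean_and_std(toy_runs)
print(f"{mean:.2f} ± {std:.2f} over {len(toy_runs)} runs")
```

Stating whether the sample or population formula is used matters at five runs, since the two differ by a factor of sqrt(5/4).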
Circularity Check
No significant circularity in derivation chain
The paper's core mechanism defines PACZero-ZPL to release the sign of subset-aggregated ZO gradients on unanimous steps and a data-independent coin flip on disagreements, directly enforcing I(S*;Y)=0 by the definition of conditional mutual information. This construction does not reduce to a fitted parameter renamed as prediction, nor does it rely on a self-citation chain for its uniqueness or load-bearing privacy bound. The observation of 'frequent unanimity' is presented as an empirical property enabling utility rather than a self-definitional assumption, and the abstract ties the zero-MI claim to the external definition of mutual information without importing ansatzes or renaming known results via citation. The derivation remains self-contained.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: PAC privacy bounds the membership-inference posterior success rate exactly at the prior when I(S*;Y)=0
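The reasoning behind this assumption can be sketched in two lines. The notation $\hat{S}$ for a (deterministic) adversary's guess is introduced here for illustration; randomized adversaries are mixtures of deterministic ones and obey the same bound.

```latex
% Zero mutual information is equivalent to independence,
% so the posterior over the secret equals the prior.
I(S^*; Y_{1:T}) = 0
\iff S^* \perp Y_{1:T}
\implies P(S^* = s \mid Y_{1:T} = y) = P(S^* = s) \quad \text{for all } s, y.

% Success probability of any guess \hat{S}(y) is capped by the prior mode.
\Pr\big[\hat{S}(Y_{1:T}) = S^*\big]
  = \sum_{y} P(y)\, P\big(S^* = \hat{S}(y)\big)
  \le \max_{s} P(S^* = s).
```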