PACZero: PAC-Private Fine-Tuning of Language Models via Sign Quantization

Marten van Dijk; Murat Bilgehan Ertan; Phuong Ha Nguyen; Srinivas Devadas; Xiaochen Zhu

arXiv: 2605.06505 · v2 · pith:MQB2XUF2new · submitted 2026-05-07 · 💻 cs.LG · cs.AI · cs.CR

Murat Bilgehan Ertan, Xiaochen Zhu, Phuong Ha Nguyen, Marten van Dijk, Srinivas Devadas

Pith reviewed 2026-05-08 12:28 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.CR

keywords PAC privacy · zeroth-order optimization · sign quantization · language model fine-tuning · mutual information · membership inference

The pith

PACZero achieves usable fine-tuning performance for large language models at zero mutual-information leakage, using sign quantization so that, on many steps, every candidate subset yields the same update direction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops PACZero, a family of PAC-private mechanisms for fine-tuning language models with zeroth-order methods. Its central observation is that sign-quantizing subset-aggregated gradients produces many steps on which every candidate subset agrees on the update direction, so the released sign carries no information about which subset is the secret. A reader would care because at zero mutual information the membership-inference posterior success rate is bounded at the prior, a level that, per the abstract, differential privacy matches only at $\varepsilon=0$ with infinite noise. The reported models retain high accuracy on sentiment classification (SST-2) and nontrivial F1 on question answering (SQuAD) at sizes up to 6.7 billion parameters.

Core claim

PACZero applies sign quantization to subset-aggregated zeroth-order gradients, which produces frequent unanimity steps at which the released sign is independent of which subset is the secret. On disagreement steps PACZero-ZPL releases a uniform coin flip, so it achieves I(S^*; Y_{1:T})=0. This yields 88.99% accuracy on SST-2 with OPT-1.3B full fine-tuning, within 2.1pp of the 91.1% non-private MeZO baseline, and nontrivial SQuAD F1, whereas the abstract states that no prior method produces usable utility in the high-privacy regime ε<1.

What carries the argument

Sign quantization applied to zeroth-order gradients aggregated over candidate subsets, which induces unanimous update directions and zero conditional mutual information on those steps.

If this is right

Delivers SST-2 accuracy within 2.1 percentage points of the non-private zeroth-order (MeZO) baseline at I=0, for OPT-1.3B full fine-tuning.
Provides usable utility for language model fine-tuning in the high-privacy regime (ε<1 for DP methods matched at the MIA posterior level), where the abstract reports that no prior method does.
Applies effectively to both parameter-efficient LoRA and full-parameter fine-tuning tracks.
Maintains nontrivial performance on SQuAD across model sizes at zero information leakage.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The unanimity mechanism could reduce privacy costs in other machine learning tasks where data subsets produce aligned gradient signs.
Future work might explore adaptive subset sizes to maximize the fraction of unanimous steps.
This privacy approach might combine with other techniques like differential privacy for even stronger guarantees when needed.

Load-bearing premise

The sign quantization step ensures that enough gradient updates are identical across all possible secret subsets so that the overall release reveals nothing about the secret.

What would settle it

The claim would be undermined by running the method on a dataset where subset gradients frequently disagree in sign and finding either that the coin-flip releases reduce accuracy far below the reported levels or that an adversary can infer membership with probability above the prior.
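A rough way to probe this question before touching a real model is to simulate how often subset-aggregated signs agree. The sketch below is not the paper's experiment: the Gaussian per-sample scalars, the independent half-inclusion subset sampler, and the function name `unanimity_fraction` are all assumptions made for illustration. It estimates the fraction of steps on which all M candidate subsets produce the same sign.

```python
import numpy as np

def unanimity_fraction(mean, std, n=64, M=128, steps=500, seed=0):
    """Monte Carlo estimate of the fraction of steps on which all M
    subset-aggregated signs agree, for synthetic Gaussian per-sample scalars."""
    rng = np.random.default_rng(seed)
    unanimous = 0
    for _ in range(steps):
        # Synthetic stand-in for per-sample zeroth-order scalars (assumption).
        scalars = rng.normal(mean, std, size=n)
        # Each sample joins each subset with probability 1/2 (assumed sampler).
        subsets = rng.random((M, n)) < 0.5
        sums = subsets.astype(float) @ scalars
        nonneg = sums >= 0
        # Unanimity: every subset sum has the same sign.
        unanimous += int(nonneg.all() or (~nonneg).all())
    return unanimous / steps

# Sweep the signal strength; a weaker mean should give fewer unanimous steps.
for mean in (0.0, 0.2, 1.0):
    print(mean, unanimity_fraction(mean, std=1.0))
```

Under these assumptions, the fraction of coin-flip steps is one minus the returned value, which is the quantity the referee report asks the authors to measure on the real tasks.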

Figures

Figures reproduced from arXiv: 2605.06505 by Marten van Dijk, Murat Bilgehan Ertan, Phuong Ha Nguyen, Srinivas Devadas, Xiaochen Zhu.

**Figure 1.** The PACZero per-step mechanism. Per-sample ZO scalars are aggregated over $M = 128$ random subsets, sign-quantized to $s_m \in \{-1, +1\}$, and released as a single bit identifying the sign of the secret subset. On unanimity ($q_t^+ \in \{0, 1\}$) the released bit is constant on $\operatorname{supp} p_t$ and contributes zero conditional MI. On disagreement, PACZero-MI releases $\operatorname{sign}(s_{j^*} + \mathcal{N}(0, \sigma_t^2))$ with $\sigma_t$ calibrated to a per-step MI bud…
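The per-step logic in the caption can be sketched for the ZPL variant (coin flip on disagreement, as described in the abstract). This is a minimal illustration rather than the authors' code: the function name `paczero_zpl_step`, the boolean-mask representation of subsets, and the tie-breaking rule at zero are assumptions.

```python
import numpy as np

def paczero_zpl_step(zo_scalars, subsets, secret_index, rng):
    """One PACZero-ZPL-style release (illustrative sketch).

    zo_scalars   : (n,) per-sample zeroth-order scalars for this step
    subsets      : (M, n) boolean membership mask, one row per candidate subset
    secret_index : row of `subsets` that plays the role of the secret subset
    rng          : numpy Generator supplying the data-independent coin
    Returns (released_bit, unanimous).
    """
    # Aggregate the per-sample scalars over each candidate subset.
    sums = subsets.astype(float) @ zo_scalars
    # Sign-quantize; a sum of exactly zero is mapped to +1 (an assumption).
    signs = np.where(sums >= 0, 1, -1)
    unanimous = bool(np.all(signs == signs[0]))
    if unanimous:
        # Every candidate agrees, so the bit does not depend on the secret.
        return int(signs[secret_index]), True
    # Disagreement: release a fair coin that ignores the data entirely.
    return int(rng.choice([-1, 1])), False

# Example call with M = 128 random subsets over n = 64 synthetic scalars.
rng = np.random.default_rng(0)
subsets = rng.random((128, 64)) < 0.5
zo = rng.normal(loc=1.0, scale=0.5, size=64)
bit, unanimous = paczero_zpl_step(zo, subsets, secret_index=3, rng=rng)
```

The returned flag makes it easy to log how many steps used the common sign versus the coin, which is the statistic that links the privacy argument to the observed utility.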

Original abstract

We introduce PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models that delivers usable utility at $I(S^*; Y_{1:T})=0$. This privacy regime bounds the membership-inference attack (MIA) posterior success rate at the prior, an MIA-resistance level the DP framework matches only at $\varepsilon=0$ and infinite noise. All DP-ZO comparisons below are matched at the MIA posterior level. The key insight is that PAC Privacy charges mutual information only when the release depends on which candidate subset is the secret. Sign-quantizing subset-aggregated zeroth-order gradients creates frequent unanimity, steps at which every candidate subset agrees on the update direction; at these steps the released sign costs zero conditional mutual information. We propose two variants that span the privacy-utility trade-off: PACZero-MI (budgeted MI via exact calibration on the binary release) and PACZero-ZPL ($I=0$ via a uniform coin flip on disagreement steps). We evaluate on SST-2 and SQuAD with OPT-1.3B and OPT-6.7B in both LoRA and full-parameter tracks. On SST-2 OPT-1.3B full fine-tuning at $I=0$, PACZero-ZPL reaches ${88.99\pm0.91}$, within $2.1$pp of the non-private MeZO baseline ($91.1$ FT). No prior method produces usable utility in the high-privacy regime $\varepsilon<1$, and PACZero-ZPL obtains competitive SST-2 accuracy and nontrivial SQuAD F1 across OPT-1.3B and OPT-6.7B at $I=0$.
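The phrase "exact calibration on the binary release" can be illustrated with a small calculation. The page does not show the paper's calibration procedure, so the following is only one plausible reading, under stated assumptions: the release is $\operatorname{sign}(s + \mathcal{N}(0, \sigma^2))$ with $s \in \{-1, +1\}$, the candidate subsets carry probability mass `q_plus` on sign $+1$, the release depends on the secret only through that sign, and MI is measured in bits. The release is then a binary symmetric channel, and $\sigma$ can be found by bisection. All function names are hypothetical.

```python
import math

def binary_entropy(p):
    """Entropy in bits of a Bernoulli(p) variable."""
    if p <= 0.0 or p >= 1.0:
        return 0.0
    return -p * math.log2(p) - (1 - p) * math.log2(1 - p)

def flip_probability(sigma):
    """P(sign(s + N(0, sigma^2)) != s) for s in {-1, +1}, i.e. Phi(-1/sigma)."""
    return 0.5 * (1.0 + math.erf(-1.0 / (sigma * math.sqrt(2.0))))

def release_mi(q_plus, sigma):
    """MI in bits between the secret's sign and the noisy released sign."""
    p = flip_probability(sigma)
    p_y_plus = q_plus * (1 - p) + (1 - q_plus) * p
    return binary_entropy(p_y_plus) - binary_entropy(p)

def calibrate_sigma(q_plus, budget_bits, lo=1e-3, hi=1e6, iters=200):
    """Smallest sigma in [lo, hi] whose release MI is at most budget_bits.

    Assumes budget_bits > 0; MI is non-increasing in sigma for this channel.
    """
    if release_mi(q_plus, lo) <= budget_bits:
        return lo  # e.g. unanimity (q_plus in {0, 1}) needs no extra noise
    for _ in range(iters):
        mid = math.sqrt(lo * hi)  # bisection on a log scale
        if release_mi(q_plus, mid) > budget_bits:
            lo = mid
        else:
            hi = mid
    return hi

sigma = calibrate_sigma(q_plus=0.5, budget_bits=0.1)
print(sigma, release_mi(0.5, sigma))
```

In this reading, unanimity steps come out as the special case where the unperturbed release already costs zero MI, which matches the abstract's statement that such steps are free.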

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces PACZero, a family of PAC-private zeroth-order mechanisms for fine-tuning large language models. It claims that sign-quantizing subset-aggregated ZO gradients produces frequent unanimity steps where every candidate subset agrees on the update direction, allowing the released sign to incur zero conditional mutual information; on disagreements a data-independent coin flip is used to enforce I(S^*; Y_{1:T})=0. Two variants are proposed (PACZero-MI with budgeted MI and PACZero-ZPL with strict I=0), and empirical results are reported on SST-2 and SQuAD using OPT-1.3B and OPT-6.7B models in both LoRA and full fine-tuning, with PACZero-ZPL achieving 88.99±0.91 accuracy on SST-2 full fine-tuning at I=0 (within 2.1pp of the non-private MeZO baseline).

Significance. If the zero-MI claim and the underlying unanimity mechanism are rigorously validated, the result would be significant: it offers a route to membership-inference resistance at the prior level (matching DP only at ε=0 with infinite noise) while preserving usable utility for LLM fine-tuning, a regime where prior DP-ZO methods reportedly fail. The approach also supplies concrete accuracy numbers and a parameter-free privacy guarantee tied to an external MI definition.

major comments (3)

[Experiments (SST-2 and SQuAD results)] The central utility claim at I=0 for PACZero-ZPL rests on the assumption that sign-quantized, subset-aggregated ZO gradients produce sufficiently frequent unanimity steps; on disagreement steps a coin flip is released. No measurement or bound on the unanimity frequency (or the fraction of steps using random flips) is reported in the experimental results, leaving open whether the observed 88.99 accuracy arises from the claimed mechanism or from task-specific factors. This is load-bearing for the privacy-utility trade-off.
[Theoretical Analysis / Privacy Definition] The abstract states that PACZero-ZPL achieves I(S^*; Y_{1:T})=0 via coin flips on disagreements, yet the manuscript provides no explicit derivation or proof sketch showing that the conditional mutual information is exactly zero under the stated release rule. The MI bound is therefore not shown to reduce to the claimed value from the sign-quantization construction.
[Experimental Protocol] Reported standard deviations (e.g., ±0.91 on SST-2) lack justification for the number of runs, seed selection, or error-bar protocol; without this, it is impossible to assess whether the 2.1pp gap to the MeZO baseline is statistically meaningful.

minor comments (2)

[Method] Notation for the released sign Y_t and the candidate subsets should be introduced with explicit definitions before the unanimity argument is used.
[Abstract] The comparison table or figure that matches all DP-ZO baselines at the same MIA posterior level should be referenced in the abstract for clarity.

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the referee's insightful comments. We address each major point below and will revise the manuscript to incorporate clarifications and additional details as needed.

Referee: [Experiments (SST-2 and SQuAD results)] The central utility claim at I=0 for PACZero-ZPL rests on the assumption that sign-quantized, subset-aggregated ZO gradients produce sufficiently frequent unanimity steps; on disagreement steps a coin flip is released. No measurement or bound on the unanimity frequency (or the fraction of steps using random flips) is reported in the experimental results, leaving open whether the observed 88.99 accuracy arises from the claimed mechanism or from task-specific factors. This is load-bearing for the privacy-utility trade-off.

Authors: We agree that quantifying the unanimity frequency is essential to substantiate the privacy-utility claims. In the revised manuscript, we will report the average and per-task fraction of unanimity steps observed during training for the SST-2 and SQuAD experiments. This additional data will clarify the contribution of the unanimity mechanism to the achieved accuracy. revision: yes
Referee: [Theoretical Analysis / Privacy Definition] The abstract states that PACZero-ZPL achieves I(S^*; Y_{1:T})=0 via coin flips on disagreements, yet the manuscript provides no explicit derivation or proof sketch showing that the conditional mutual information is exactly zero under the stated release rule. The MI bound is therefore not shown to reduce to the claimed value from the sign-quantization construction.

Authors: We thank the referee for highlighting this. The zero MI follows directly from the release mechanism: unanimity steps release a sign that is identical across all subsets (hence independent of S^*), while disagreement steps release an independent coin flip (also independent of S^*). We will include a concise proof sketch in the appendix of the revised version to formally derive that I(S^*; Y_{1:T}) = 0. revision: yes
Referee: [Experimental Protocol] Reported standard deviations (e.g., ±0.91 on SST-2) lack justification for the number of runs, seed selection, or error-bar protocol; without this, it is impossible to assess whether the 2.1pp gap to the MeZO baseline is statistically meaningful.

Authors: The standard deviations are based on 5 independent runs using distinct random seeds for model initialization and data ordering. We will revise the experimental details section to explicitly document the number of runs, seed selection, and how error bars are computed, enabling readers to evaluate the statistical significance of the results. revision: yes
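
The zero-MI argument in the second response can be written out with the chain rule. This is a sketch, not the paper's proof: it assumes the pool of candidate subsets and all perturbation randomness are public, so that whether step $t$ is unanimous is determined by $Y_{1:t-1}$ and public quantities rather than by $S^*$. The symbols $s_t^{\mathrm{common}}$ (the shared sign) and $C_t$ (the coin) are notation introduced here.

```latex
\begin{align*}
I(S^*; Y_{1:T}) &= \sum_{t=1}^{T} I(S^*; Y_t \mid Y_{1:t-1}), \\
Y_t &=
\begin{cases}
s_t^{\mathrm{common}} & \text{if all candidate signs agree at step } t, \\
C_t \sim \mathrm{Unif}\{-1,+1\} & \text{otherwise,}
\end{cases} \\
\Pr[Y_t = y \mid S^* = s,\, Y_{1:t-1}] &= \Pr[Y_t = y \mid Y_{1:t-1}]
\quad \text{for all } s, \\
\Longrightarrow\quad I(S^*; Y_t \mid Y_{1:t-1}) &= 0 \ \text{ for every } t,
\qquad\text{hence}\qquad I(S^*; Y_{1:T}) = 0 .
\end{align*}
```

In the unanimous case the release is a deterministic function of quantities that do not involve $S^*$; in the other case it is a fresh coin. Either way the conditional law of $Y_t$ is the same for every value of the secret, which gives the third line.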

Circularity Check

0 steps flagged

No significant circularity in derivation chain

The paper's core mechanism defines PACZero-ZPL to release the sign of subset-aggregated ZO gradients on unanimous steps and a data-independent coin flip on disagreements, directly enforcing I(S*;Y)=0 by the definition of conditional mutual information. This construction does not reduce to a fitted parameter renamed as prediction, nor does it rely on a self-citation chain for its uniqueness or load-bearing privacy bound. The observation of 'frequent unanimity' is presented as an empirical property enabling utility rather than a self-definitional assumption, and the abstract ties the zero-MI claim to the external definition of mutual information without importing ansatzes or renaming known results via citation. The derivation remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Abstract-only review; the central claim rests on the PAC privacy definition and the assumption that unanimity steps occur frequently enough to dominate the information cost.

axioms (1)

domain assumption: PAC privacy bounds the membership-inference posterior success rate exactly at the prior when I(S^*; Y_{1:T})=0.
Stated directly in the abstract as the target privacy regime.
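
The step from zero mutual information to "posterior success at the prior" is standard and can be sketched as follows, for discrete $Y$ and a deterministic attacker $\hat{S}$ (notation introduced here; a randomized attacker is a mixture of deterministic ones, so the same bound applies).

```latex
\begin{align*}
I(S^*; Y) = 0 \;&\Longleftrightarrow\; S^* \perp Y
\;\Longrightarrow\; \Pr[S^* = s \mid Y = y] = \Pr[S^* = s]
\ \text{ for all } s, y, \\
\Pr[\hat{S}(Y) = S^*] &= \sum_{y} \Pr[Y = y]\, \Pr[S^* = \hat{S}(y) \mid Y = y]
\;\le\; \max_{s} \Pr[S^* = s].
\end{align*}
```

Membership of any individual sample is a function of $S^*$, so it is independent of $Y$ as well, and the same argument bounds a membership-inference attacker at its prior success rate.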
