Quantization-Robust LLM Unlearning via Low-Rank Adaptation
Pith reviewed 2026-05-15 22:14 UTC · model grok-4.3
The pith
Low-rank adapters preserve unlearning effects under 4-bit quantization where full fine-tuning fails.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. Using low-rank adaptation (LoRA) to freeze the base model and concentrate unlearning into trainable adapters preserves the effective update after quantization, leading to higher 4-bit utility and lower privacy leakage on the MUSE dataset for Llama-2-7B while maintaining strong forgetting.
What carries the argument
Low-rank adaptation (LoRA) adapters that concentrate the unlearning updates, allowing larger relative changes to survive quantization rounding.
Load-bearing premise
That the updates concentrated in the low-rank adapters have large enough magnitude to overcome the rounding errors introduced by 4-bit quantization.
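The premise can be made concrete with a toy round-to-nearest quantizer. This is a minimal sketch, not the paper's quantization scheme, which the reviewed text does not specify. The fixed step size `STEP`, the weight values, and the function name `quantize` are all assumptions chosen for illustration.

```python
def quantize(weights, step, bits=4):
    """Round each weight to the nearest multiple of `step`, clamped to signed `bits`-bit codes."""
    qmin = -(2 ** (bits - 1))      # -8 for 4-bit
    qmax = 2 ** (bits - 1) - 1     # +7 for 4-bit
    return [max(qmin, min(qmax, round(w / step))) for w in weights]

STEP = 0.1                               # hypothetical quantization step size
base = [0.62, -0.33, 0.18, 0.02]         # hypothetical pre-unlearning weights

# An update far below half a step leaves every integer code unchanged.
small_update = [w + 0.01 for w in base]
# An update comparable to the step size moves some weights across a rounding boundary.
large_update = [w + 0.06 for w in base]

print(quantize(base, STEP))          # codes before unlearning
print(quantize(small_update, STEP))  # same codes: the update is masked
print(quantize(large_update, STEP))  # different codes: the update survives
```

In this toy setting the small update yields the same 4-bit codes as the base weights, so the quantized model is indistinguishable from the pre-unlearning one, while the larger update changes two of the four codes.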
What would settle it
Observing that a 4-bit quantized model after LoRA unlearning performs identically to the pre-unlearning model on the forgotten knowledge would disprove the robustness claim.
Original abstract
Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. However, aggressive low-bit PTQ can mask unlearning updates, causing quantized models to revert to pre-unlearning behavior. We show that standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. We propose quantization-robust unlearning via low-rank adaptation (LoRA): we freeze the base model and concentrate unlearning into trainable adapters so that the effective update is preserved after quantization. On Llama-2-7B evaluated with MUSE dataset (BOOKS and NEWS), LoRA improves 4-bit utility by up to 7.93 points (NPO+GDR on BOOKS: 50.17 to 58.10) and yields higher 4-bit utility on NEWS for GA+GDR (40.06 to 44.82, increase of 4.76). LoRA also substantially reduces privacy leakage under 4-bit PTQ, e.g., for GA+KLR on BOOKS, PrivLeak moves from -25.68 to -5.86 (closer to ideal 0), while maintaining strong forgetting (VerMem and KnowMem near 0). Thus, using LoRA for Machine Unlearning is beneficial for scenarios where quantization is necessary for model deployment.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that full-parameter unlearning in LLMs produces small parameter deltas that are erased by aggressive 4-bit post-training quantization (PTQ), causing quantized models to revert to pre-unlearning behavior. It proposes concentrating unlearning updates into low-rank adapters (LoRA) while freezing the base model, so that the effective updates survive quantization. On Llama-2-7B with the MUSE dataset (BOOKS and NEWS), LoRA yields up to 7.93-point gains in 4-bit utility (e.g., NPO+GDR on BOOKS: 50.17 to 58.10) and reduces privacy leakage (e.g., GA+KLR on BOOKS PrivLeak from -25.68 to -5.86) while preserving strong forgetting metrics.
Significance. If the central claim holds, the work offers a practical, low-overhead technique for making LLM unlearning compatible with quantized inference, addressing a deployment-relevant gap. The reported numeric improvements on standard benchmarks are concrete and directionally promising; however, the absence of direct mechanistic verification (update magnitudes) and statistical controls limits the strength of the conclusions.
major comments (2)
- [Experimental Results / Evaluation] The manuscript's explanatory claim—that full-parameter unlearning deltas are too small to survive 4-bit rounding while LoRA deltas are large enough to persist—is load-bearing but unsupported by direct evidence. No pre- versus post-quantization L2 or Frobenius norms of the effective weight updates (or per-layer delta magnitudes) are reported or compared between full fine-tuning and LoRA runs.
- [Abstract and §4 (Results)] All reported improvements (e.g., 7.93-point utility gain on BOOKS, 4.76-point gain on NEWS) are presented as single-point estimates without error bars, multiple random seeds, or statistical significance tests. This makes it impossible to assess whether the gains are robust or could arise from optimization trajectory differences rather than magnitude preservation.
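The norm comparison requested in the first major comment could be computed along the following lines. This is a sketch under stated assumptions: per-row absmax round-to-nearest quantization stands in for whatever PTQ method the paper uses, the helper names (`quantize_dequantize`, `update_norms`) are hypothetical, and the matrices are toy values rather than model weights.

```python
import math

def quantize_dequantize(matrix, bits=4):
    """Per-row absmax round-to-nearest quantization, returned as dequantized floats."""
    qmax = 2 ** (bits - 1) - 1
    result = []
    for row in matrix:
        absmax = max(abs(x) for x in row)
        scale = absmax / qmax if absmax > 0 else 1.0
        result.append([max(-qmax, min(qmax, round(x / scale))) * scale for x in row])
    return result

def frobenius_norm(matrix):
    """Square root of the sum of squared entries."""
    return math.sqrt(sum(x * x for row in matrix for x in row))

def elementwise(op, a, b):
    """Apply a binary operation entry by entry to two same-shaped matrices."""
    return [[op(x, y) for x, y in zip(ra, rb)] for ra, rb in zip(a, b)]

def update_norms(base, delta, bits=4):
    """Return (norm of the update before quantization, norm of what survives after)."""
    updated = elementwise(lambda x, y: x + y, base, delta)
    surviving = elementwise(lambda x, y: x - y,
                            quantize_dequantize(updated, bits),
                            quantize_dequantize(base, bits))
    return frobenius_norm(delta), frobenius_norm(surviving)

# Toy one-row "layer" and two hypothetical updates of different magnitude.
base = [[0.70, -0.33, 0.18, 0.02]]
small_delta = [[0.0, 0.01, 0.01, 0.01]]
large_delta = [[0.0, 0.06, 0.06, 0.06]]

pre_small, post_small = update_norms(base, small_delta)
pre_large, post_large = update_norms(base, large_delta)
print(pre_small, post_small)
print(pre_large, post_large)
```

Reporting both numbers per layer, for full fine-tuning and for LoRA, would show directly whether the surviving update differs between the two methods rather than inferring it from downstream metrics.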
minor comments (1)
- [Abstract and Experiments] The abstract and results sections would benefit from an explicit statement of the LoRA rank, alpha, and dropout values used in the reported runs, as well as the exact quantization scheme (e.g., GPTQ, AWQ) and calibration data.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.
Point-by-point responses
- Referee: The manuscript's explanatory claim, that full-parameter unlearning deltas are too small to survive 4-bit rounding while LoRA deltas are large enough to persist, is load-bearing but unsupported by direct evidence. No pre- versus post-quantization L2 or Frobenius norms of the effective weight updates (or per-layer delta magnitudes) are reported or compared between full fine-tuning and LoRA runs.
  Authors: We agree that direct evidence of update magnitudes would provide stronger support for the central claim. Although the performance metrics under quantization offer indirect validation, we will add a new analysis in the revised manuscript. Specifically, we will report the L2 and Frobenius norms of the weight updates pre- and post-quantization for both full-parameter fine-tuning and LoRA, including per-layer breakdowns where relevant. This will be included in Section 4 or a new subsection.
  Revision: yes
- Referee: All reported improvements (e.g., 7.93-point utility gain on BOOKS, 4.76-point gain on NEWS) are presented as single-point estimates without error bars, multiple random seeds, or statistical significance tests. This makes it impossible to assess whether the gains are robust or could arise from optimization trajectory differences rather than magnitude preservation.
  Authors: We acknowledge that reporting results from multiple random seeds with error bars and statistical tests would enhance the reliability of our findings. Due to the substantial computational resources required for fine-tuning and evaluating Llama-2-7B across multiple seeds, our current experiments used a single fixed seed per configuration for reproducibility. In the revision, we will include a discussion of this limitation and provide results from at least three additional seeds for the key experiments if feasible, or at minimum report variance where available from our logs. We will also add significance tests where appropriate.
  Revision: partial
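The multi-seed reporting discussed in the second response could take roughly the following form. This is a sketch only: the per-seed scores are placeholder values invented for illustration and are not results from the paper, and the function names `summarize` and `paired_t_statistic` are hypothetical.

```python
import math
import statistics

def summarize(scores):
    """Mean and sample standard deviation of one metric across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)

def paired_t_statistic(baseline, treatment):
    """t statistic of the per-seed differences (treatment - baseline), seeds paired by position."""
    diffs = [t - b for b, t in zip(baseline, treatment)]
    standard_error = statistics.stdev(diffs) / math.sqrt(len(diffs))
    return statistics.mean(diffs) / standard_error

# Placeholder scores for three seeds; NOT measurements from the paper.
full_finetune = [1.0, 2.0, 3.0]
lora = [2.0, 4.0, 5.0]

print(summarize(full_finetune))
print(summarize(lora))
print(paired_t_statistic(full_finetune, lora))
```

With only a handful of seeds the t statistic has few degrees of freedom (n - 1), so it would need to be compared against the matching Student's t critical value rather than a normal approximation.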
Circularity Check
Empirical results on held-out sets exhibit no circularity
Full rationale
The manuscript presents direct experimental measurements of post-quantization utility (e.g., NPO+GDR on BOOKS: 50.17 to 58.10) and privacy leakage (e.g., GA+KLR PrivLeak from -25.68 to -5.86) after applying LoRA versus full fine-tuning on the MUSE dataset. These outcomes are obtained by running the unlearning procedures, applying 4-bit PTQ, and evaluating on separate test splits; no equations, fitted parameters, or self-citations are invoked to derive the reported deltas. The central premise that LoRA updates survive rounding because they are larger in magnitude is supported only by the observed metric differences, not by any reduction of those metrics to quantities defined inside the paper itself. The work is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: Unlearning objectives (NPO, GA, GDR, KLR) produce parameter updates whose effect can be isolated in low-rank adapters.
- domain assumption: 4-bit PTQ rounding is the dominant source of unlearning degradation.