
arxiv: 2602.13151 · v3 · submitted 2026-02-13 · 💻 cs.LG · cs.CL

Quantization-Robust LLM Unlearning via Low-Rank Adaptation

Pith reviewed 2026-05-15 22:14 UTC · model grok-4.3

classification 💻 cs.LG cs.CL
keywords LLM unlearning · quantization · LoRA · low-rank adaptation · post-training quantization · machine unlearning · privacy leakage

The pith

Low-rank adapters preserve unlearning effects under 4-bit quantization where full fine-tuning fails.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that unlearning updates from full-parameter fine-tuning are often too small to survive the rounding in 4-bit quantization, so the quantized model reverts to its pre-unlearning behavior. Applying unlearning only through small trainable low-rank matrices, with the base model frozen, yields an effective update large enough to persist after quantization. On Llama-2-7B with the MUSE benchmark, this raises the 4-bit utility of the unlearned model by up to 7.93 points and reduces privacy leakage, while the targeted knowledge stays forgotten. The result matters for deploying unlearned LLMs on hardware where quantization is needed for efficient inference.

Core claim

Standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. Using low-rank adaptation (LoRA) to freeze the base model and concentrate unlearning into trainable adapters preserves the effective update after quantization, leading to higher 4-bit utility and lower privacy leakage on the MUSE dataset for Llama-2-7B while maintaining strong forgetting.
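The size argument can be illustrated with a toy simulation. The sketch assumes a simple symmetric absmax round-to-nearest (RTN) quantizer with one scale per group of 128 weights; the page does not say which quantizer the paper uses, and GPTQ- or AWQ-style methods differ in detail. The weights and both delta magnitudes are hypothetical, chosen only to contrast an update far below one quantization step with one larger than a step.

```python
import random


def quantize_group(group, bits=4):
    """Symmetric absmax round-to-nearest (RTN) quantization of one weight group.

    The scale maps the largest-magnitude weight to +/-qmax; returns integer codes.
    """
    qmax = 2 ** (bits - 1) - 1  # 7 for 4-bit
    scale = max(abs(w) for w in group) / qmax
    return [max(-qmax - 1, min(qmax, round(w / scale))) for w in group]


def changed_fraction(base, delta, bits=4, group_size=128):
    """Fraction of weights whose integer code changes once delta is added.

    0.0 means every code is unchanged, so the update survives at most as a
    small shift in the per-group scale.
    """
    changed = 0
    for start in range(0, len(base), group_size):
        group = base[start:start + group_size]
        d_group = delta[start:start + group_size]
        before = quantize_group(group, bits)
        after = quantize_group([w + d for w, d in zip(group, d_group)], bits)
        changed += sum(a != b for a, b in zip(before, after))
    return changed / len(base)


rng = random.Random(0)
base = [rng.uniform(-0.05, 0.05) for _ in range(4096)]  # hypothetical weights
signs = [rng.choice((-1.0, 1.0)) for _ in base]
small_delta = [1e-5 * s for s in signs]  # hypothetical small per-weight update
large_delta = [1e-2 * s for s in signs]  # hypothetical larger update

print(f"small delta: {changed_fraction(base, small_delta):.3f} of codes changed")
print(f"large delta: {changed_fraction(base, large_delta):.3f} of codes changed")
```

With these settings the small update moves only a tiny fraction of the 4-bit codes, while the larger one moves most of them. The toy says nothing about LoRA itself; it only shows why the per-weight update size relative to the quantization step matters.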

What carries the argument

Low-rank adaptation (LoRA) adapters that concentrate the unlearning updates, allowing larger relative changes to survive quantization rounding.
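A minimal sketch of the standard LoRA parameterization: the frozen weight W receives an update ΔW = (alpha / r) · B · A, where B is d_out × r and A is r × d_in, and only A and B are trained. It assumes the adapter is merged into W before post-training quantization; the page does not say whether the paper merges adapters or keeps them in higher precision next to a quantized base. The matrix sizes, alpha, and r are hypothetical.

```python
import random


def matmul(a, b):
    """Plain-Python product of an n x k matrix a and a k x m matrix b."""
    k, m = len(b), len(b[0])
    return [[sum(row[t] * b[t][j] for t in range(k)) for j in range(m)] for row in a]


def lora_delta(B, A, alpha, r):
    """Effective LoRA update Delta W = (alpha / r) * B @ A.

    B is d_out x r and A is r x d_in, so Delta W has rank at most r.
    """
    scale = alpha / r
    return [[scale * v for v in row] for row in matmul(B, A)]


def merge(W, delta):
    """Merged weight W + Delta W; the frozen base W itself is not modified."""
    return [[w + d for w, d in zip(w_row, d_row)] for w_row, d_row in zip(W, delta)]


rng = random.Random(0)
d_out, d_in, r, alpha = 64, 64, 4, 8  # hypothetical sizes and hyperparameters
W = [[rng.gauss(0.0, 0.02) for _ in range(d_in)] for _ in range(d_out)]
A = [[rng.gauss(0.0, 0.01) for _ in range(d_in)] for _ in range(r)]
B = [[rng.gauss(0.0, 0.01) for _ in range(r)] for _ in range(d_out)]  # as if trained

W_merged = merge(W, lora_delta(B, A, alpha, r))
print("trainable adapter params:", r * (d_out + d_in), "vs full:", d_out * d_in)
```

LoRA conventionally initializes B to zero, so ΔW starts at exactly zero and grows only through training of A and B.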

Load-bearing premise

That the updates concentrated in the low-rank adapters have large enough magnitude to overcome the rounding errors introduced by 4-bit quantization.
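The premise can be made quantitative for a simple quantizer. The derivation assumes symmetric absmax round-to-nearest with b bits and a per-group scale s held fixed, and ignores clipping; it illustrates the rounding condition and is not the paper's own analysis.

```latex
% Assumed quantizer: symmetric absmax round-to-nearest, b bits, one group.
s = \frac{\max_i |w_i|}{2^{b-1} - 1}, \qquad
Q(w) = s \cdot \operatorname{clamp}\!\left(\operatorname{round}(w/s),\, -2^{b-1},\, 2^{b-1} - 1\right)

% Holding s fixed and ignoring clipping, an update \delta is erased exactly
% when w and w + \delta round to the same integer:
Q(w + \delta) = Q(w) \iff \operatorname{round}\big((w + \delta)/s\big) = \operatorname{round}(w/s)

% If w/s is uniformly placed within its rounding bin:
\Pr\big[Q(w + \delta) \neq Q(w)\big] = \min\!\left(|\delta|/s,\, 1\right)

% Worked example: b = 4 and a hypothetical group with \max_i |w_i| = 0.07.
s = 0.07 / 7 = 0.01, \qquad |\delta| = 10^{-4} \;\Rightarrow\; \Pr \approx 10^{-4} / 10^{-2} = 0.01
```

Under this approximation, a per-weight update well below s mostly disappears, which is the outcome the premise needs the LoRA update to avoid.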

What would settle it

Observing that a 4-bit quantized model unlearned with LoRA performs identically to the pre-unlearning model on the forgotten knowledge (for example, VerMem and KnowMem returning to their pre-unlearning values) would disprove the robustness claim.
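The strongest form of this observation can be checked before any evaluation: if the 4-bit unlearned model quantizes to exactly the same scales and integer codes as the 4-bit original, its outputs on the forget set are necessarily identical. The sketch assumes a simple symmetric absmax round-to-nearest (RTN) quantizer with one scale per group of 128 weights, applied to flattened per-layer weight lists; the layer name and values are hypothetical. Identical behavior can also arise without identical weights, so a False result still calls for the forget-set evaluation.

```python
def quantize_groups(weights, bits=4, group_size=128):
    """Symmetric absmax RTN: returns one (scale, codes) pair per weight group."""
    qmax = 2 ** (bits - 1) - 1
    groups = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / qmax
        codes = tuple(max(-qmax - 1, min(qmax, round(w / scale))) for w in group)
        groups.append((scale, codes))
    return groups


def quantized_models_identical(original, unlearned, bits=4, group_size=128):
    """True if every layer of both models maps to the same scales and codes.

    Identical quantized weights give identical outputs, so the quantized
    unlearned model would behave exactly like the quantized original on the
    forget set: the full reversion the robustness claim rules out.
    """
    return all(
        quantize_groups(original[name], bits, group_size)
        == quantize_groups(unlearned[name], bits, group_size)
        for name in original
    )


# Hypothetical flattened layer standing in for a real checkpoint.
original = {"layer0.q_proj": [0.03, -0.01, 0.02, 0.005] * 64}
unlearned = {"layer0.q_proj": [w + 0.02 for w in original["layer0.q_proj"]]}
print(quantized_models_identical(original, original))   # True
print(quantized_models_identical(original, unlearned))  # False
```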

Original abstract

Large Language Model (LLM) unlearning aims to remove targeted knowledge from a trained model, but practical deployments often require post-training quantization (PTQ) for efficient inference. However, aggressive low-bit PTQ can mask unlearning updates, causing quantized models to revert to pre-unlearning behavior. We show that standard full-parameter fine-tuning often induces parameter changes that are too small to survive 4-bit quantization. We propose quantization-robust unlearning via low-rank adaptation (LoRA): we freeze the base model and concentrate unlearning into trainable adapters so that the effective update is preserved after quantization. On Llama-2-7B evaluated with MUSE dataset (BOOKS and NEWS), LoRA improves 4-bit utility by up to 7.93 points (NPO+GDR on BOOKS: 50.17 to 58.10) and yields higher 4-bit utility on NEWS for GA+GDR (40.06 to 44.82, increase of 4.76). LoRA also substantially reduces privacy leakage under 4-bit PTQ, e.g., for GA+KLR on BOOKS, PrivLeak moves from -25.68 to -5.86 (closer to ideal 0), while maintaining strong forgetting (VerMem and KnowMem near 0). Thus, using LoRA for Machine Unlearning is beneficial for scenarios where quantization is necessary for model deployment.
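The headline differences in the abstract can be checked directly. This snippet reads the first number in each pair as the full-parameter baseline and the second as the LoRA result, as the abstract's wording ("LoRA improves 4-bit utility ... 50.17 to 58.10") indicates.

```python
# 4-bit results quoted in the abstract, read as (full fine-tuning, LoRA).
utility = {
    "NPO+GDR, BOOKS": (50.17, 58.10),
    "GA+GDR, NEWS": (40.06, 44.82),
}
privleak = {"GA+KLR, BOOKS": (-25.68, -5.86)}  # ideal PrivLeak is 0

for setting, (full_ft, lora) in utility.items():
    print(f"{setting}: utility {full_ft:.2f} -> {lora:.2f} ({lora - full_ft:+.2f})")
for setting, (full_ft, lora) in privleak.items():
    print(f"{setting}: |PrivLeak| {abs(full_ft):.2f} -> {abs(lora):.2f} "
          f"({abs(full_ft) - abs(lora):.2f} closer to 0)")
```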

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper claims that full-parameter unlearning in LLMs produces small parameter deltas that are erased by aggressive 4-bit post-training quantization (PTQ), causing quantized models to revert to pre-unlearning behavior. It proposes concentrating unlearning updates into low-rank adapters (LoRA) while freezing the base model, so that the effective updates survive quantization. On Llama-2-7B with the MUSE dataset (BOOKS and NEWS), LoRA yields up to 7.93-point gains in 4-bit utility (e.g., NPO+GDR on BOOKS: 50.17 to 58.10) and reduces privacy leakage (e.g., GA+KLR on BOOKS PrivLeak from -25.68 to -5.86) while preserving strong forgetting metrics.

Significance. If the central claim holds, the work offers a practical, low-overhead technique for making LLM unlearning compatible with quantized inference, addressing a deployment-relevant gap. The reported numeric improvements on standard benchmarks are concrete and directionally promising; however, the absence of direct mechanistic verification (update magnitudes) and statistical controls limits the strength of the conclusions.

major comments (2)
  1. [Experimental Results / Evaluation] The manuscript's explanatory claim—that full-parameter unlearning deltas are too small to survive 4-bit rounding while LoRA deltas are large enough to persist—is load-bearing but unsupported by direct evidence. No pre- versus post-quantization L2 or Frobenius norms of the effective weight updates (or per-layer delta magnitudes) are reported or compared between full fine-tuning and LoRA runs.
  2. [Abstract and §4 (Results)] All reported improvements (e.g., 7.93-point utility gain on BOOKS, 4.76-point gain on NEWS) are presented as single-point estimates without error bars, multiple random seeds, or statistical significance tests. This makes it impossible to assess whether the gains are robust or could arise from optimization trajectory differences rather than magnitude preservation.
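One way to produce the evidence requested in comment 1 is to compare, layer by layer, the Frobenius norm of the update before PTQ, ||W_u - W_0||_F, with the norm of the difference between the quantized weights, ||Q(W_u) - Q(W_0)||_F. The sketch assumes flattened per-layer weight lists and a simple symmetric absmax round-to-nearest quantizer with 128-weight groups, chosen for illustration; the layer names and values are hypothetical, and a real run would load the two checkpoints and use the paper's actual quantizer.

```python
import math
import random


def fake_quant(weights, bits=4, group_size=128):
    """Quantize-then-dequantize with symmetric absmax RTN, one scale per group."""
    qmax = 2 ** (bits - 1) - 1
    out = []
    for start in range(0, len(weights), group_size):
        group = weights[start:start + group_size]
        scale = max(abs(w) for w in group) / qmax
        out.extend(scale * max(-qmax - 1, min(qmax, round(w / scale))) for w in group)
    return out


def frobenius(values):
    return math.sqrt(sum(v * v for v in values))


def delta_report(original, unlearned, bits=4, group_size=128):
    """Per layer: ||W_u - W_0||_F before PTQ, ||Q(W_u) - Q(W_0)||_F after PTQ,
    and the fraction of dequantized weights that changed at all."""
    report = {}
    for name, w0 in original.items():
        w1 = unlearned[name]
        q0, q1 = fake_quant(w0, bits, group_size), fake_quant(w1, bits, group_size)
        report[name] = {
            "pre_fro": frobenius([b - a for a, b in zip(w0, w1)]),
            "post_fro": frobenius([b - a for a, b in zip(q0, q1)]),
            "changed": sum(a != b for a, b in zip(q0, q1)) / len(w0),
        }
    return report


rng = random.Random(0)
w_a = [rng.uniform(-0.05, 0.05) for _ in range(1024)]
w_b = [rng.uniform(-0.05, 0.05) for _ in range(1024)]
original = {"layer0": w_a, "layer1": w_b}  # hypothetical layers
unlearned = {
    "layer0": list(w_a),  # untouched layer
    "layer1": [w + rng.choice((-1e-2, 1e-2)) for w in w_b],  # hypothetical update
}
for name, row in delta_report(original, unlearned).items():
    print(name, {k: round(v, 4) for k, v in row.items()})
```

The post-PTQ norm alone can mislead: each code that does flip moves by a full quantization step, so ||Q(W_u) - Q(W_0)||_F can exceed ||W_u - W_0||_F even when almost all of the update was rounded away. Reporting the changed fraction alongside both norms avoids that misreading.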
minor comments (1)
  1. [Abstract and Experiments] The abstract and results sections would benefit from an explicit statement of the LoRA rank, alpha, and dropout values used in the reported runs, as well as the exact quantization scheme (e.g., GPTQ, AWQ) and calibration data.
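The request for rank and alpha matters for the magnitude argument: in the standard LoRA parameterization the adapter product B · A is scaled by alpha / rank, so the size of the merged update depends on both values. The hypothetical record below lists the settings the referee asks for; every value in it is a placeholder, not a setting from the paper. The 4096 × 4096 shape matches a Llama-2-7B attention projection (hidden size 4096).

```python
from dataclasses import dataclass


@dataclass
class LoRASetup:
    """Settings a reader needs to reproduce a LoRA-unlearning + PTQ run."""
    rank: int
    alpha: float
    dropout: float
    quant_scheme: str      # e.g. "GPTQ" or "AWQ"
    calibration_data: str  # dataset used to calibrate the quantizer

    @property
    def scaling(self) -> float:
        # Standard LoRA scales B @ A by alpha / rank, so the merged update's
        # size depends on both values, not on the rank alone.
        return self.alpha / self.rank

    def adapter_params(self, d_out: int, d_in: int) -> int:
        # B is d_out x rank and A is rank x d_in.
        return self.rank * (d_out + d_in)


# Placeholder values for illustration only; the page does not report them.
setup = LoRASetup(rank=8, alpha=16, dropout=0.05,
                  quant_scheme="(unreported)", calibration_data="(unreported)")
print(setup.scaling)                     # 2.0
print(setup.adapter_params(4096, 4096))  # 65536, vs 4096 * 4096 = 16777216 full
```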

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive comments on our manuscript. We address each major comment below and outline the revisions we will make to strengthen the paper.

Point-by-point responses
  1. Referee: The manuscript's explanatory claim—that full-parameter unlearning deltas are too small to survive 4-bit rounding while LoRA deltas are large enough to persist—is load-bearing but unsupported by direct evidence. No pre- versus post-quantization L2 or Frobenius norms of the effective weight updates (or per-layer delta magnitudes) are reported or compared between full fine-tuning and LoRA runs.

    Authors: We agree that direct evidence of update magnitudes would provide stronger support for the central claim. Although the performance metrics under quantization offer indirect validation, we will add a new analysis in the revised manuscript. Specifically, we will report the L2 and Frobenius norms of the weight updates pre- and post-quantization for both full-parameter fine-tuning and LoRA, including per-layer breakdowns where relevant. This will be included in Section 4 or a new subsection. revision: yes

  2. Referee: All reported improvements (e.g., 7.93-point utility gain on BOOKS, 4.76-point gain on NEWS) are presented as single-point estimates without error bars, multiple random seeds, or statistical significance tests. This makes it impossible to assess whether the gains are robust or could arise from optimization trajectory differences rather than magnitude preservation.

    Authors: We acknowledge that reporting results from multiple random seeds with error bars and statistical tests would enhance the reliability of our findings. Due to the substantial computational resources required for fine-tuning and evaluating Llama-2-7B across multiple seeds, our current experiments used a single fixed seed per configuration for reproducibility. In the revision, we will include a discussion of this limitation and provide results from at least three additional seeds for the key experiments if feasible, or at minimum report variance where available from our logs. We will also add significance tests where appropriate. revision: partial
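For the multi-seed runs the authors propose, a sketch of how the comparison could be summarized: mean and sample standard deviation per method, plus Welch's t statistic, which assumes neither equal variances nor pairing between runs. The function names are hypothetical and the numbers in the example call are dummy values, not results. With three or four seeds per method the test has little power, which is why the critical values quoted in the docstring are high.

```python
import math
import statistics


def mean_std(scores):
    """Mean and sample standard deviation of one metric across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(full_ft, lora):
    """Welch's t statistic and Welch-Satterthwaite degrees of freedom for
    LoRA minus full fine-tuning, treating runs as independent samples.

    Compare |t| with a two-sided 5% t critical value for the returned df
    (about 4.30 at df = 2, 3.18 at df = 3, 2.78 at df = 4).
    """
    n1, n2 = len(full_ft), len(lora)
    v1 = statistics.variance(full_ft) / n1
    v2 = statistics.variance(lora) / n2
    if v1 + v2 == 0:
        raise ValueError("both groups have zero variance")
    t = (statistics.mean(lora) - statistics.mean(full_ft)) / math.sqrt(v1 + v2)
    df = (v1 + v2) ** 2 / (v1 ** 2 / (n1 - 1) + v2 ** 2 / (n2 - 1))
    return t, df


# Dummy per-seed scores, only to show the call shape.
t, df = welch_t([1.0, 2.0, 3.0], [4.0, 5.0, 6.0])
print(f"t = {t:.2f}, df = {df:.1f}")
```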

Circularity Check

0 steps flagged

Empirical results on held-out sets exhibit no circularity

Full rationale

The manuscript presents direct experimental measurements of post-quantization utility (e.g., NPO+GDR on BOOKS: 50.17 to 58.10) and privacy leakage (e.g., GA+KLR PrivLeak from -25.68 to -5.86) after applying LoRA versus full fine-tuning on the MUSE dataset. These outcomes are obtained by running the unlearning procedures, applying 4-bit PTQ, and evaluating on separate test splits; no equations, fitted parameters, or self-citations are invoked to derive the reported deltas. The central premise, that LoRA updates survive rounding because they are larger in magnitude, is supported only by the observed metric differences, not by any reduction of those metrics to quantities defined inside the paper itself. The results are therefore measured against external benchmarks rather than against the paper's own definitions.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work is empirical and relies on standard assumptions of the unlearning and quantization literature rather than new axioms or invented entities.

axioms (2)
  • Domain assumption: unlearning objectives (NPO, GA, GDR, KLR) produce parameter updates whose effect can be isolated in low-rank adapters.
    Invoked when the paper states that freezing the base and training only adapters preserves the unlearning effect after quantization.
  • Domain assumption: 4-bit PTQ rounding is the dominant source of unlearning degradation.
    Central premise that motivates the LoRA approach; stated in the abstract as the reason full fine-tuning fails.
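The second assumption suggests a simple ablation: if 4-bit rounding is what erases the update, the same update should survive far better at 8 bits, where the symmetric step is 7/127 of the 4-bit step (about 18 times finer). The toy sketch uses hypothetical weights and update size and a simple symmetric absmax round-to-nearest quantizer assumed for illustration; on the actual model, the analogous check would compare 4-bit and 8-bit PTQ of the same unlearned checkpoint.

```python
import random


def changed_fraction(base, delta, bits, group_size=128):
    """Fraction of integer codes that move when delta is added (symmetric absmax RTN)."""
    qmax = 2 ** (bits - 1) - 1

    def codes(ws):
        scale = max(abs(w) for w in ws) / qmax
        return [max(-qmax - 1, min(qmax, round(w / scale))) for w in ws]

    changed = 0
    for start in range(0, len(base), group_size):
        g = base[start:start + group_size]
        d = delta[start:start + group_size]
        shifted = [w + x for w, x in zip(g, d)]
        changed += sum(a != b for a, b in zip(codes(g), codes(shifted)))
    return changed / len(base)


rng = random.Random(1)
base = [rng.uniform(-0.05, 0.05) for _ in range(4096)]  # hypothetical weights
delta = [rng.choice((-1e-4, 1e-4)) for _ in base]       # hypothetical update
for bits in (4, 8):
    print(f"{bits}-bit: {changed_fraction(base, delta, bits):.3f} of codes changed")
```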
