
arxiv: 2604.12526 · v1 · submitted 2026-04-14 · 💻 cs.LG · cs.AI

Orthogonal Subspace Projection for Continual Machine Unlearning via SVD-Based LoRA

Pith reviewed 2026-05-10 15:55 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords continual unlearning · machine unlearning · LoRA · orthogonal subspace projection · SVD · parameter-efficient fine-tuning · task isolation

The pith

SVD-guided orthogonal projection lets LoRA handle thirty sequential unlearning tasks while keeping retained accuracy near baseline.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that each new low-rank update can be trained inside the orthogonal complement of the subspaces occupied by all earlier updates, identified via SVD, so that sequential deletion requests do not interfere with retained knowledge. This matters because real deletion requests arrive over time, and naively adding or fusing adapters erodes the model's performance on everything it should keep. By enforcing the constraint inside the training loop rather than after the fact, the method achieves task isolation without any extra routing logic at inference. In the reported experiments (CIFAR-100 with ResNet-20, and MNIST), retained accuracy stays near 58 percent after thirty tasks, whereas standard static fusion collapses to roughly 13 percent.

Core claim

Constraining each new LoRA update during training so that it lies in the orthogonal complement of the subspaces used by earlier unlearning tasks preserves task isolation without requiring dynamic routing at deployment. After thirty sequential unlearning tasks, state-of-the-art static fusion reduces retained accuracy from 60.39 percent to 12.70 percent, whereas the proposed in-training constrained optimization maintains baseline performance of approximately 58.1 percent while preserving strong unlearning efficacy.

What carries the argument

SVD-based orthogonal subspace projection that forces each new LoRA adapter into the orthogonal complement of subspaces already used by prior unlearning tasks.
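A minimal NumPy sketch of this projection, under assumptions the pith does not spell out: each earlier update is treated as a dense matrix dW_i = B_i A_i, the "subspace used" by a task is taken to be the span of its leading right singular vectors (its input-side directions), and the truncation threshold `energy` is a hypothetical parameter. The helper names are illustrative, not the authors' code.

```python
import numpy as np

def prior_input_basis(prior_updates, d_in, energy=0.99, tol=1e-8):
    """Orthonormal basis (d_in x k) for the input-side subspaces of earlier
    LoRA updates dW_i = B_i @ A_i (hypothetical helper, not the authors' code).

    `energy` is an assumed SVD truncation threshold: for each prior update we
    keep the leading right singular vectors carrying that share of the squared
    singular-value mass.
    """
    cols = []
    for dW in prior_updates:
        _, s, vt = np.linalg.svd(dW, full_matrices=False)
        share = np.cumsum(s**2) / np.sum(s**2)
        k = min(int(np.searchsorted(share, energy)) + 1, len(s))
        cols.append(vt[:k].T)                     # d_in x k right singular vectors
    if not cols:
        return np.zeros((d_in, 0))                # no earlier tasks: nothing to avoid
    # Re-orthonormalise the union across tasks and drop redundant directions.
    u, s, _ = np.linalg.svd(np.concatenate(cols, axis=1), full_matrices=False)
    return u[:, s > tol]

def project_to_complement(dW, U):
    """Remove every component of dW that acts on span(U): dW @ (I - U @ U.T)."""
    return dW - (dW @ U) @ U.T

rng = np.random.default_rng(0)
old = rng.normal(size=(32, 4)) @ rng.normal(size=(4, 64))   # a prior rank-4 update
U = prior_input_basis([old], d_in=64)
new = project_to_complement(rng.normal(size=(32, 64)), U)
print(U.shape, np.abs(new @ U).max())
```

Because `new @ U` vanishes up to round-off, the projected update leaves every input direction recorded in `U` untouched.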

If this is right

  • Retained accuracy remains close to the original baseline across long sequences of deletion requests.
  • Each unlearning task stays isolated from the others without parameter collision.
  • Inference requires no dynamic routing or task identification, keeping deployment cost unchanged.
  • The same constrained training procedure works on both ResNet-20 and simpler architectures.
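One way to read the isolation and no-routing claims, sketched with hypothetical sizes and random adapters: every update is merged into a single static weight, and an input lying in task 1's input subspace is unaffected by task 2's update only when that update has been restricted to the orthogonal complement. Here the projection is applied after the fact purely to isolate the geometric effect; the paper enforces it during training.

```python
import numpy as np

rng = np.random.default_rng(1)
d_out, d_in, r = 16, 32, 2

def lora_delta():
    """Hypothetical rank-r LoRA update B @ A with random factors."""
    return rng.normal(size=(d_out, r)) @ rng.normal(size=(r, d_in))

W0 = rng.normal(size=(d_out, d_in))          # frozen base weight
dW1 = lora_delta()                           # update from unlearning task 1
V1 = np.linalg.svd(dW1)[2][:r].T             # task 1's right singular vectors (d_in x r)
P = np.eye(d_in) - V1 @ V1.T                 # projector onto their orthogonal complement

dW2_naive = lora_delta()                     # task 2 trained without the constraint
dW2_orth = dW2_naive @ P                     # same update restricted to the complement

x = V1 @ rng.normal(size=r)                  # an input inside task 1's subspace
ref = (W0 + dW1) @ x                         # output with only task 1 merged

# Static fusion: all adapters merged into one weight matrix, no routing at inference.
drift_naive = np.linalg.norm((W0 + dW1 + dW2_naive) @ x - ref)
drift_orth = np.linalg.norm((W0 + dW1 + dW2_orth) @ x - ref)
print(f"naive drift {drift_naive:.3f}, orthogonal drift {drift_orth:.1e}")
```

The naive drift is clearly nonzero, while the orthogonal drift is at floating-point round-off: parameter collision in miniature, not a reproduction of the paper's experiment.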

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same isolation idea could be applied to other continual adaptation problems where tasks must not overwrite one another.
  • When subspace capacity is exhausted, increasing LoRA rank or periodically resetting the base model may become necessary.
  • The approach could reduce reliance on full retraining from scratch in privacy-regulated settings.

Load-bearing premise

The remaining orthogonal directions in each layer will always contain enough capacity for an effective new unlearning update without driving the optimizer into poor local minima.

What would settle it

Run the method until the cumulative rank of used subspaces approaches the hidden dimension of a layer and measure whether retained accuracy or unlearning success rate drops sharply on the next task.
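A back-of-the-envelope version of this test: if every task consumes r fresh input directions in a layer of input width d_in, the complement is exhausted after d_in / r tasks, so thirty tasks at rank r need d_in ≥ 30r (for example, 240 at r = 8). The simulation below uses random rank x d_in matrices as stand-ins for LoRA A factors; the setup and function name are hypothetical, not the paper's.

```python
import numpy as np

def remaining_dims(d_in, rank, n_tasks, seed=0):
    """Free input dimensions left after each of n_tasks sequential updates.

    Hypothetical simulation: each task's update is a random rank x d_in matrix
    (a stand-in for a LoRA A factor), projected onto the orthogonal complement
    of all directions consumed by earlier tasks before it is "used".
    """
    rng = np.random.default_rng(seed)
    U = np.zeros((d_in, 0))                          # orthonormal basis of used directions
    free = []
    for _ in range(n_tasks):
        A = rng.normal(size=(rank, d_in))
        A = A - (A @ U) @ U.T                        # restrict to the complement
        _, s, vt = np.linalg.svd(A, full_matrices=False)
        consumed = vt[s > 1e-8].T                    # new directions this task occupies
        U = np.concatenate([U, consumed], axis=1)
        free.append(d_in - U.shape[1])
    return free

print(remaining_dims(d_in=64, rank=4, n_tasks=20))
```

With d_in = 64 and rank 4 the free dimension falls by 4 per task and reaches zero after the sixteenth task; every later update is projected to (numerically) zero, which is the regime this test probes. SVD truncation that keeps fewer than r directions per task would stretch the budget.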

Figures

Figures reproduced from arXiv: 2604.12526 by Juncheng Hu, Nasir Iqbal, Sangarapillai Lambotharan, Yogachandran Rahulamathavan.

Figure 1. High-level architecture comparison of the related works (top) and the proposed approach (bottom).
Figure 2. Operational trade-off trajectories for continual unlearning by sequentially forgetting 3 classes 9, 5 and 3 (…
Figure 3. Correlation between available orthogonal subspace dimensions and …
Original abstract

Continual machine unlearning aims to remove the influence of data that should no longer be retained, while preserving the usefulness of the model on everything else. This setting becomes especially difficult when deletion requests arrive sequentially, because the model must repeatedly adapt without erasing previously retained knowledge. Low-Rank Adaptation (LoRA) offers an efficient way to implement such updates, but naively combining many sequential LoRA modules leads to parameter collision, causing strong interference between tasks. We propose a static alternative based on Singular Value Decomposition (SVD)-guided orthogonal subspace projection. Our method constrains each new LoRA update during training so that it lies in the orthogonal complement of the subspaces used by earlier unlearning tasks. This preserves task isolation without requiring dynamic routing at deployment. Experiments on CIFAR-100 with ResNet-20 and on MNIST show stable behavior across long sequences of unlearning tasks. After thirty sequential unlearning tasks, state-of-the-art static fusion reduces retained accuracy from 60.39% to 12.70%, whereas the proposed in-training constrained optimization maintains baseline performance (~58.1%) while preserving strong unlearning efficacy.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes SVD-based orthogonal subspace projection to constrain LoRA updates during continual machine unlearning. Each new update is forced into the orthogonal complement of subspaces from prior tasks, avoiding parameter collision when statically fusing multiple LoRA modules. On CIFAR-100 (ResNet-20) and MNIST, the method is reported to maintain ~58.1% retained accuracy after 30 sequential unlearning tasks, while state-of-the-art static fusion drops from 60.39% to 12.70%, with claims of preserved unlearning efficacy.

Significance. If the orthogonality constraint successfully preserves sufficient capacity in the complement subspace for new unlearning gradients without forcing poor local minima, the approach would provide a practical static alternative to dynamic routing for sequential unlearning. The concrete long-sequence results on standard benchmarks constitute a strength, offering empirical grounding for the isolation-without-routing claim.

major comments (3)
  1. [Abstract] Abstract: the central performance claim (maintenance of ~58.1% retained accuracy after 30 tasks versus degradation to 12.70%) is presented without error bars, without ablations on LoRA rank or SVD truncation threshold, and without explicit verification that unlearning is measured by membership-inference attacks or by comparison against a model retrained from scratch; these omissions directly affect assessment of whether the shrinking orthogonal complement retains usable capacity for later tasks.
  2. [Method] Method description: the mechanism for enforcing the orthogonality constraint during optimization (how the SVD projection is applied inside the training loop, whether it is a hard constraint or a regularizer, and the precise definition of the complement subspace) is not specified with equations or pseudocode, leaving the load-bearing claim that updates remain inside the complement unverified.
  3. [Experiments] Experiments section: no ablation or analysis is provided on how the allowable subspace dimension decreases with each rank-r SVD block, nor any test confirming that required unlearning gradient directions remain inside the complement rather than being projected away after many tasks; this directly tests the weakest assumption identified in the stress-test note.
minor comments (2)
  1. [Abstract] Abstract: the phrase 'strong unlearning efficacy' should be replaced by a quantitative metric or reference to a standard evaluation protocol.
  2. Consider adding a table or figure that reports retained accuracy, unlearning success rate, and subspace dimension remaining after each block of tasks, with standard deviations.
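Major comment 2 asks how the constraint is enforced. One plausible form of a hard constraint, sketched here as an assumption rather than the authors' procedure, is projected gradient descent on the LoRA factors: the down-projection A starts inside the complement and every update to A is multiplied by the projector P = I - U U^T. The toy least-squares objective, learning rate, and function name are hypothetical stand-ins for the paper's unlearning loss.

```python
import numpy as np

def train_constrained_lora(X, T, U, rank=4, lr=0.02, steps=500, seed=0):
    """Hypothetical projected-gradient loop for one task's LoRA factors.

    X: d_in x n inputs, T: d_out x n targets of a toy least-squares objective
    (a stand-in for the real unlearning loss), U: d_in x k orthonormal basis
    of earlier tasks' right singular vectors.
    """
    rng = np.random.default_rng(seed)
    d_in, n = X.shape
    P = np.eye(d_in) - U @ U.T                           # projector onto the complement
    A = rng.normal(scale=0.1, size=(rank, d_in)) @ P     # start inside the complement
    B = np.zeros((T.shape[0], rank))                     # usual LoRA zero init for B
    for _ in range(steps):
        E = B @ A @ X - T                                # residual of 0.5*||BAX - T||^2 / n
        grad_B = E @ (A @ X).T / n
        grad_A = B.T @ E @ X.T / n
        B -= lr * grad_B
        A -= lr * (grad_A @ P)                           # hard constraint: projected step
    return B @ A                                         # this task's update dW

rng = np.random.default_rng(0)
X = rng.normal(size=(16, 64))                            # 64 toy inputs of width 16
T = 0.5 * rng.normal(size=(8, 16)) @ X                   # toy targets
U = np.linalg.qr(rng.normal(size=(16, 6)))[0]            # 6 directions used earlier
dW = train_constrained_lora(X, T, U)
print(np.abs(dW @ U).max())
```

Projecting only A is enough because dW = BA reads its input through A; once A's rows lie in the complement, dW x = 0 for every x in span(U), whatever B learns. A soft alternative would add a penalty on ||A U|| instead, which only discourages rather than guarantees isolation.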

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment point by point below and indicate the revisions we will make to improve clarity and completeness.

Point-by-point responses
  1. Referee: [Abstract] Abstract: the central performance claim (maintenance of ~58.1% retained accuracy after 30 tasks versus degradation to 12.70%) is presented without error bars, without ablations on LoRA rank or SVD truncation threshold, and without explicit verification that unlearning is measured by membership-inference or retraining attacks; these omissions directly affect assessment of whether the shrinking orthogonal complement retains usable capacity for later tasks.

    Authors: We agree that the abstract would be strengthened by these supporting details. In the revised manuscript we will report error bars on all accuracy figures, add ablations varying LoRA rank and SVD truncation threshold, and explicitly describe the unlearning verification protocol (membership-inference attacks together with retraining-from-scratch baselines). revision: yes

  2. Referee: [Method] Method description: the mechanism for enforcing the orthogonality constraint during optimization (how the SVD projection is applied inside the training loop, whether it is a hard constraint or a regularizer, and the precise definition of the complement subspace) is not specified with equations or pseudocode, leaving the load-bearing claim that updates remain inside the complement unverified.

    Authors: We acknowledge that the current method section lacks the required mathematical precision. The revised version will include explicit equations and pseudocode showing that the SVD projection is applied as a hard constraint at each optimization step, together with the precise definition of the complement subspace as the orthogonal complement of the span of all prior-task right singular vectors. revision: yes

  3. Referee: [Experiments] Experiments section: no ablation or analysis is provided on how the allowable subspace dimension decreases with each rank-r SVD block, nor any test confirming that required unlearning gradient directions remain inside the complement rather than being projected away after many tasks; this directly tests the weakest assumption identified in the stress-test note.

    Authors: We will add both an analysis of the progressive reduction in allowable subspace dimension across the 30 tasks and new experiments that measure the fraction of each unlearning gradient that lies inside the current complement (via projection norms and cosine similarity before/after projection). These results will be reported in the revised experiments section. revision: yes
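The proposed gradient diagnostic can be sketched as follows, assuming U is the orthonormal basis of earlier tasks' input directions; the function name is hypothetical. One detail worth noting: for an orthogonal projection, the cosine similarity between a gradient g and its projection Pg equals the norm ratio ||Pg|| / ||g||, because g · Pg = ||Pg||², so the two proposed measurements carry the same information.

```python
import numpy as np

def complement_diagnostics(grad, U):
    """Norm ratio and cosine similarity between a gradient and its projection
    onto the orthogonal complement of span(U) (hypothetical helper).

    grad: gradient with d_in columns (e.g. for a LoRA A factor or a full weight);
    U: d_in x k orthonormal basis of earlier tasks' input directions.
    """
    projected = grad - (grad @ U) @ U.T
    g, p = grad.ravel(), projected.ravel()
    g_norm, p_norm = np.linalg.norm(g), np.linalg.norm(p)
    if p_norm == 0.0:
        return 0.0, 0.0                              # gradient lies entirely in span(U)
    ratio = p_norm / g_norm                          # share of gradient norm that survives
    cosine = float(g @ p) / (g_norm * p_norm)        # alignment before vs after projection
    return float(ratio), cosine

rng = np.random.default_rng(0)
U = np.linalg.qr(rng.normal(size=(32, 8)))[0]       # 8 directions used by earlier tasks
g = rng.normal(size=(16, 32))                        # a toy unlearning gradient
print(complement_diagnostics(g, U))
```

For a random gradient the ratio sits near sqrt(24/32), since 24 of the 32 input directions remain free; a trained unlearning gradient whose ratio drifts toward zero over the task sequence would be the capacity failure the referee asks about.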

Circularity Check

0 steps flagged

No circularity: method is an algorithmic training constraint with empirical validation

full rationale

The paper proposes an SVD-based orthogonal projection to constrain sequential LoRA updates during training so each new unlearning task occupies the orthogonal complement of prior subspaces. This is a design choice implemented as an in-training optimization constraint, not a post-hoc derivation or prediction. The reported performance numbers (e.g., retained accuracy after 30 tasks) are direct experimental outcomes on CIFAR-100 and MNIST; no equations reduce these results to fitted parameters by construction, nor do any self-citations or ansatzes form a load-bearing loop. The derivation chain is self-contained as an engineering solution whose validity rests on the experiments rather than tautological equivalence to its inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are stated. The approach implicitly assumes that SVD of prior LoRA updates yields stable subspaces and that orthogonality can be maintained without capacity collapse, but these are not formalized.

pith-pipeline@v0.9.0 · 5519 in / 1228 out tokens · 37162 ms · 2026-05-10T15:55:31.772828+00:00 · methodology

