TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning

Alexandra Dragomir; Antonio Barbalau; Florin Brad; Ioana Pintilie; Marius Dragoi

arxiv: 2606.06494 · v1 · pith:I3AHSO3Znew · submitted 2026-06-04 · 💻 cs.LG

TailLoR: Protecting Principal Components in Parameter-Efficient Continual Learning

Marius Dragoi , Ioana Pintilie , Alexandra Dragomir , Antonio Barbalau , Florin Brad This is my paper

Pith reviewed 2026-06-28 01:57 UTC · model grok-4.3

classification 💻 cs.LG

keywords continual learningparameter-efficient fine-tuningsingular value decompositionlow-rank adaptationspectral penaltytask interferenceprincipal components

0 comments

The pith

TailLoR keeps singular vector bases U and V fixed while applying low-rank updates to singular values under a soft penalty on dominant directions to limit task interference.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TailLoR as a parameter-efficient approach to continual learning that treats the singular bases of pre-trained weights as an unchanging reference frame. It learns low-rank adjustments only to the singular value matrix and adds a penalty term that discourages changes along the largest singular directions. This directs adaptation toward the more flexible smaller singular values instead. A sympathetic reader would care because the approach targets catastrophic forgetting in sequential tasks without increasing parameter count or requiring data replay.

Core claim

TailLoR utilizes the singular bases U and V of the pre-trained weights as a fixed reference frame to learn a low-rank update applied to the singular value matrix. A soft spectral penalty discourages updates aligned with dominant singular directions, reducing interference while routing fine-grained adaptation into the highly flexible, long-tail spectral coordinates.

What carries the argument

Fixed singular bases U and V used as reference frame for low-rank updates to the singular value matrix, controlled by a soft spectral penalty that protects dominant directions.

If this is right

Principal components of earlier tasks remain protected, lowering interference on subsequent tasks.
Fine-grained adaptation occurs mainly in the long-tail singular directions that tolerate change more readily.
Parameter count stays low because only low-rank factors update the singular values.
The method supports sequential learning without storing previous task data or full model copies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same fixed-basis idea might extend to other matrix factorizations used in adaptation techniques.
Combining the penalty with replay or regularization methods could further stabilize performance across many tasks.
The approach suggests a way to prioritize spectral coordinates for updates that could apply beyond continual learning to other fine-tuning scenarios.

Load-bearing premise

Fixing the singular vectors and applying a soft penalty only to dominant singular values reduces task interference enough without restricting the model's ability to learn new tasks.

What would settle it

A sequence of tasks where models trained with the spectral penalty show no measurable reduction in forgetting rates compared to the same low-rank method without the penalty would falsify the central claim.

Figures

Figures reproduced from arXiv: 2606.06494 by Alexandra Dragomir, Antonio Barbalau, Florin Brad, Ioana Pintilie, Marius Dragoi.

**Figure 3.** Figure 3: Tail penalty matrix (left) and distribution of [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 2.** Figure 2: Head penalty matrix (left) and distribution of [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

read the original abstract

Parameter-efficient finetuning methods based on spectral decomposition have enabled progress in Continual Learning. In this paper we introduce TailLoR, which utilizes the singular bases U and V of the pre-trained weights as a fixed reference frame to learn a low-rank update applied to the singular value matrix. A soft spectral penalty discourages updates aligned with dominant singular directions, reducing interference while routing fine-grained adaptation into the highly flexible, long-tail spectral coordinates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

TailLoR fixes U and V from the pre-trained SVD and adds a soft penalty on dominant singular values to route low-rank updates into the tail, but the abstract supplies no results or derivations to show this actually cuts interference without hurting new-task capacity.

read the letter

The new piece is the specific construction: keep the singular vectors fixed as a reference frame, apply the low-rank update only to the diagonal of singular values, and use a soft spectral penalty that pushes changes away from the large singular values. This is presented as a way to reduce task interference in parameter-efficient continual learning while still allowing adaptation in the smaller coordinates.

The approach builds directly on prior spectral PEFT work and tries to make the update respect the original weight structure rather than treating all directions equally. That choice is at least coherent on its own terms.

The main weakness is the lack of any supporting evidence in the description. There are no equations shown for the penalty, no experimental results, no comparisons to other continual-learning PEFT baselines, and no discussion of whether the tail coordinates actually carry enough capacity for new tasks. The central assumption—that discouraging updates on dominant directions will protect prior tasks without starving new ones—remains untested in what is provided.

This is the kind of incremental method paper that matters to people already working on low-rank adaptation and lifelong learning. A reader looking for a new penalty idea in that niche could find it useful to examine, but only if the full paper contains reproducible experiments and ablations.

I would send it to peer review so the empirical claims can be checked properly.

Referee Report

2 major / 0 minor

Summary. The paper introduces TailLoR, a parameter-efficient continual learning method that fixes the singular vectors U and V from pre-trained weights as a reference frame, applies a low-rank update exclusively to the singular-value matrix, and employs a soft spectral penalty to discourage updates along dominant singular directions while routing adaptation into long-tail spectral coordinates.

Significance. If the construction demonstrably reduces task interference without sacrificing new-task capacity, the approach would offer a targeted spectral mechanism for protecting principal components in PEFT-based continual learning, extending existing singular-value methods with an explicit penalty on dominant directions.

major comments (2)

[Abstract] Abstract: the central claim that the soft spectral penalty reduces interference while preserving capacity rests on an unverified assumption; no derivation, update rule, or quantitative condition is supplied to show how the penalty interacts with the low-rank update on the singular-value matrix.
[Abstract] No experimental section or results are referenced in the provided description to test whether the long-tail routing actually mitigates forgetting on prior tasks or enables sufficient adaptation on new tasks.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the review. We address the two major comments on the abstract below. Both point to the need for greater specificity in the abstract, which we will address through revision.

read point-by-point responses

Referee: [Abstract] Abstract: the central claim that the soft spectral penalty reduces interference while preserving capacity rests on an unverified assumption; no derivation, update rule, or quantitative condition is supplied to show how the penalty interacts with the low-rank update on the singular-value matrix.

Authors: The full manuscript derives the soft spectral penalty, its interaction with the low-rank update on the singular-value matrix, and the associated update rules and quantitative conditions in Section 3. The abstract summarizes the high-level idea but omits these details due to length limits. We will revise the abstract to briefly note the penalty mechanism and direct readers to Section 3 for the derivation and analysis. revision: yes
Referee: [Abstract] No experimental section or results are referenced in the provided description to test whether the long-tail routing actually mitigates forgetting on prior tasks or enables sufficient adaptation on new tasks.

Authors: The abstract does not reference experiments or results. The full manuscript contains Section 4, which reports experiments on standard continual learning benchmarks showing that the long-tail routing reduces forgetting on prior tasks while supporting adaptation on new tasks. We will revise the abstract to include a concise statement of these empirical findings. revision: yes

Circularity Check

0 steps flagged

No significant circularity in method description

full rationale

The paper presents TailLoR as a direct construction: singular bases U and V are taken as fixed from pre-trained weights, a low-rank update is applied only to the singular-value matrix, and a soft spectral penalty is introduced to discourage dominant-direction updates. No equations, derivations, or quantitative predictions appear in the provided abstract or description that could reduce to their own inputs by construction. There are no fitted parameters renamed as predictions, no self-citation load-bearing steps, and no uniqueness theorems or ansatzes smuggled in. The central claim is the method definition itself, which is self-contained and does not rely on any internal reduction or circular justification.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only input supplies no information on free parameters, axioms, or invented entities; full text would be required to populate this ledger.

pith-pipeline@v0.9.1-grok · 5603 in / 1105 out tokens · 27802 ms · 2026-06-28T01:57:13.443454+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

30 extracted references · 8 canonical work pages · 2 internal anchors

[1]

arXiv preprint arXiv:2504.07097 , year=

Sculpting subspaces: Constrained full fine-tuning in llms for continual learning , author=. arXiv preprint arXiv:2504.07097 , year=

arXiv
[2]

Advances in Neural Information Processing Systems , volume=

Svft: Parameter-efficient fine-tuning with singular vectors , author=. Advances in Neural Information Processing Systems , volume=
[3]

M i L o RA : Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

Wang, Hanqing and Li, Yixia and Wang, Shuo and Chen, Guanhua and Chen, Yun. M i L o RA : Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025. doi:10.1...

work page doi:10.18653/v1/2025.naacl-long.248 2025
[4]

Advances in Neural Information Processing Systems , volume=

Pissa: Principal singular values and singular vectors adaptation of large language models , author=. Advances in Neural Information Processing Systems , volume=
[5]

In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Yan. InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning , booktitle =. 2024 , url =. doi:10.1109/CVPR52733.2024.02231 , timestamp =

work page doi:10.1109/cvpr52733.2024.02231 2024
[6]

ELLA : Efficient Lifelong Learning for Adapters in Large Language Models

Das Biswas, Shristi and Zhang, Yue and Pal, Anwesan and Bhargava, Radhika and Roy, Kaushik. ELLA : Efficient Lifelong Learning for Adapters in Large Language Models. Proceedings of the 19th Conference of the E uropean Chapter of the A ssociation for C omputational L inguistics (Volume 1: Long Papers). 2026. doi:10.18653/v1/2026.eacl-long.84

work page doi:10.18653/v1/2026.eacl-long.84 2026
[7]

Orthogonal Subspace Learning for Language Model Continual Learning

Wang, Xiao and Chen, Tianze and Ge, Qiming and Xia, Han and Bao, Rong and Zheng, Rui and Zhang, Qi and Gui, Tao and Huang, Xuanjing. Orthogonal Subspace Learning for Language Model Continual Learning. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.715

work page doi:10.18653/v1/2023.findings-emnlp.715 2023
[8]

Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen

Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen. LoRA: Low-Rank Adaptation of Large Language Models , booktitle =. 2022 , url =

2022
[9]

Parameter-Efficient Transfer Learning for

Neil Houlsby and Andrei Giurgiu and Stanislaw Jastrzebski and Bruna Morrone and Quentin de Laroussilhe and Andrea Gesmundo and Mona Attariyan and Sylvain Gelly , editor =. Parameter-Efficient Transfer Learning for. Proceedings of the 36th International Conference on Machine Learning,. 2019 , url =

2019
[10]

A Survey of Large Language Models

Wayne Xin Zhao and Kun Zhou and Junyi Li and Tianyi Tang and Xiaolei Wang and Yupeng Hou and Yingqian Min and Beichen Zhang and Junjie Zhang and Zican Dong and Yifan Du and Chen Yang and Yushuo Chen and Zhipeng Chen and Jinhao Jiang and Ruiyang Ren and Yifan Li and Xinyu Tang and Zikang Liu and Peiyu Liu and Jian. A Survey of Large Language Models , journ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.18223 2023
[11]

Large Language Models: A Survey

Shervin Minaee and Tom. Large Language Models:. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2402.06196 , eprinttype =. 2402.06196 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.06196 2024
[12]

Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen

Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen. LoRA: Low-Rank Adaptation of Large Language Models. CoRR , volume =. 2021 , url =. 2106.09685 , timestamp =

Pith/arXiv arXiv 2021
[13]

6th International Conference on Learning Representations (

Chunyuan Li and Heerad Farkhoor and Rosanne Liu and Jason Yosinski , title =. 6th International Conference on Learning Representations (. 2018 , url =

2018
[14]

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning , booktitle =

Armen Aghajanyan and Sonal Gupta and Luke Zettlemoyer , title =. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing,. 2021 , url =. doi:10.18653/V1/2021.ACL-LONG.568 , timestamp =

work page doi:10.18653/v1/2021.acl-long.568 2021
[15]

Psychology of Learning and Motivation , volume=

Catastrophic interference in connectionist networks: The sequential learning problem , author=. Psychology of Learning and Motivation , volume=. 1989 , publisher=

1989
[16]

Trends in Cognitive Sciences , volume=

Catastrophic forgetting in connectionist networks , author=. Trends in Cognitive Sciences , volume=. 1999 , publisher=

1999
[17]

Rabinowitz and Joel Veness and Guillaume Desjardins and Andrei A

James Kirkpatrick and Razvan Pascanu and Neil C. Rabinowitz and Joel Veness and Guillaume Desjardins and Andrei A. Rusu and Kieran Milan and John Quan and Tiago Ramalho and Agnieszka Grabska. Overcoming catastrophic forgetting in neural networks , journal =. 2016 , url =. 1612.00796 , timestamp =

arXiv 2016
[18]

Gradient Episodic Memory for Continual Learning , booktitle =

David Lopez. Gradient Episodic Memory for Continual Learning , booktitle =. 2017 , url =

2017
[19]

CoRR , volume =

Cyprien de Masson d'Autume and Sebastian Ruder and Lingpeng Kong and Dani Yogatama , title =. CoRR , volume =. 2019 , url =. 1906.01076 , timestamp =

arXiv 2019
[20]

7th International Conference on Learning Representations (

Matthew Riemer and Ignacio Cases and Robert Ajemian and Miao Liu and Irina Rish and Yuhai Tu and Gerald Tesauro , title =. 7th International Conference on Learning Representations (. 2019 , url =

2019
[21]

6th International Conference on Learning Representations (

Jaehong Yoon and Eunho Yang and Jeongtae Lee and Sung Ju Hwang , title =. 6th International Conference on Learning Representations (. 2018 , url =

2018
[22]

Proceedings of the 36th International Conference on Machine Learning (

Xilai Li and Yingbo Zhou and Tianfu Wu and Richard Socher and Caiming Xiong , title =. Proceedings of the 36th International Conference on Machine Learning (. 2019 , url =

2019
[23]

Rusu and Neil C

Andrei A. Rusu and Neil C. Rabinowitz and Guillaume Desjardins and Hubert Soyer and James Kirkpatrick and Koray Kavukcuoglu and Razvan Pascanu and Raia Hadsell , title =. CoRR , volume =. 2016 , url =. 1606.04671 , timestamp =

Pith/arXiv arXiv 2016
[24]

2007 15th European signal processing conference , pages=

The effective rank: A measure of effective dimensionality , author=. 2007 15th European signal processing conference , pages=. 2007 , organization=

2007
[25]

International Conference on Learning Representations , volume=

The truth is in there: Improving reasoning in language models with layer-selective rank reduction , author=. International Conference on Learning Representations , volume=
[26]

arXiv preprint arXiv:2602.21919 , year=

Learning in the Null Space: Small Singular Values for Continual Learning , author=. arXiv preprint arXiv:2602.21919 , year=

arXiv
[27]

arXiv preprint arXiv:2310.06762 , year=

Trace: A comprehensive benchmark for continual learning in large language models , author=. arXiv preprint arXiv:2310.06762 , year=

arXiv
[28]

Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015 , pages =

Xiang Zhang and Junbo Jake Zhao and Yann LeCun , title =. Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015 , pages =. 2015 , url =

2015
[29]

The Eleventh International Conference on Learning Representations (

Anastasia Razdaibiedina and Yuning Mao and Rui Hou and Madian Khabsa and Mike Lewis and Amjad Almahairi , title =. The Eleventh International Conference on Learning Representations (. 2023 , url =

2023
[30]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =

Zhicheng Wang and Yufang Liu and Tao Ji and Xiaoling Wang and Yuanbin Wu and Congcong Jiang and Ye Chao and Zhencong Han and Ling Wang and Xu Shao and Wenqiu Zeng , title =. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =. 2023 , url =. doi:10.18653/V1/2023.ACL-LONG.612 , timestamp =

work page doi:10.18653/v1/2023.acl-long.612 2023

[1] [1]

arXiv preprint arXiv:2504.07097 , year=

Sculpting subspaces: Constrained full fine-tuning in llms for continual learning , author=. arXiv preprint arXiv:2504.07097 , year=

arXiv

[2] [2]

Advances in Neural Information Processing Systems , volume=

Svft: Parameter-efficient fine-tuning with singular vectors , author=. Advances in Neural Information Processing Systems , volume=

[3] [3]

M i L o RA : Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning

Wang, Hanqing and Li, Yixia and Wang, Shuo and Chen, Guanhua and Chen, Yun. M i L o RA : Harnessing Minor Singular Components for Parameter-Efficient LLM Finetuning. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2025. doi:10.1...

work page doi:10.18653/v1/2025.naacl-long.248 2025

[4] [4]

Advances in Neural Information Processing Systems , volume=

Pissa: Principal singular values and singular vectors adaptation of large language models , author=. Advances in Neural Information Processing Systems , volume=

[5] [5]

In: 2024 IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR)

Yan. InfLoRA: Interference-Free Low-Rank Adaptation for Continual Learning , booktitle =. 2024 , url =. doi:10.1109/CVPR52733.2024.02231 , timestamp =

work page doi:10.1109/cvpr52733.2024.02231 2024

[6] [6]

ELLA : Efficient Lifelong Learning for Adapters in Large Language Models

Das Biswas, Shristi and Zhang, Yue and Pal, Anwesan and Bhargava, Radhika and Roy, Kaushik. ELLA : Efficient Lifelong Learning for Adapters in Large Language Models. Proceedings of the 19th Conference of the E uropean Chapter of the A ssociation for C omputational L inguistics (Volume 1: Long Papers). 2026. doi:10.18653/v1/2026.eacl-long.84

work page doi:10.18653/v1/2026.eacl-long.84 2026

[7] [7]

Orthogonal Subspace Learning for Language Model Continual Learning

Wang, Xiao and Chen, Tianze and Ge, Qiming and Xia, Han and Bao, Rong and Zheng, Rui and Zhang, Qi and Gui, Tao and Huang, Xuanjing. Orthogonal Subspace Learning for Language Model Continual Learning. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.715

work page doi:10.18653/v1/2023.findings-emnlp.715 2023

[8] [8]

Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen

Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen. LoRA: Low-Rank Adaptation of Large Language Models , booktitle =. 2022 , url =

2022

[9] [9]

Parameter-Efficient Transfer Learning for

Neil Houlsby and Andrei Giurgiu and Stanislaw Jastrzebski and Bruna Morrone and Quentin de Laroussilhe and Andrea Gesmundo and Mona Attariyan and Sylvain Gelly , editor =. Parameter-Efficient Transfer Learning for. Proceedings of the 36th International Conference on Machine Learning,. 2019 , url =

2019

[10] [10]

A Survey of Large Language Models

Wayne Xin Zhao and Kun Zhou and Junyi Li and Tianyi Tang and Xiaolei Wang and Yupeng Hou and Yingqian Min and Beichen Zhang and Junjie Zhang and Zican Dong and Yifan Du and Chen Yang and Yushuo Chen and Zhipeng Chen and Jinhao Jiang and Ruiyang Ren and Yifan Li and Xinyu Tang and Zikang Liu and Peiyu Liu and Jian. A Survey of Large Language Models , journ...

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2303.18223 2023

[11] [11]

Large Language Models: A Survey

Shervin Minaee and Tom. Large Language Models:. CoRR , volume =. 2024 , url =. doi:10.48550/ARXIV.2402.06196 , eprinttype =. 2402.06196 , timestamp =

work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2402.06196 2024

[12] [12]

Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen

Edward J. Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen. LoRA: Low-Rank Adaptation of Large Language Models. CoRR , volume =. 2021 , url =. 2106.09685 , timestamp =

Pith/arXiv arXiv 2021

[13] [13]

6th International Conference on Learning Representations (

Chunyuan Li and Heerad Farkhoor and Rosanne Liu and Jason Yosinski , title =. 6th International Conference on Learning Representations (. 2018 , url =

2018

[14] [14]

Intrinsic Dimensionality Explains the Effectiveness of Language Model Fine-Tuning , booktitle =

Armen Aghajanyan and Sonal Gupta and Luke Zettlemoyer , title =. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing,. 2021 , url =. doi:10.18653/V1/2021.ACL-LONG.568 , timestamp =

work page doi:10.18653/v1/2021.acl-long.568 2021

[15] [15]

Psychology of Learning and Motivation , volume=

Catastrophic interference in connectionist networks: The sequential learning problem , author=. Psychology of Learning and Motivation , volume=. 1989 , publisher=

1989

[16] [16]

Trends in Cognitive Sciences , volume=

Catastrophic forgetting in connectionist networks , author=. Trends in Cognitive Sciences , volume=. 1999 , publisher=

1999

[17] [17]

Rabinowitz and Joel Veness and Guillaume Desjardins and Andrei A

James Kirkpatrick and Razvan Pascanu and Neil C. Rabinowitz and Joel Veness and Guillaume Desjardins and Andrei A. Rusu and Kieran Milan and John Quan and Tiago Ramalho and Agnieszka Grabska. Overcoming catastrophic forgetting in neural networks , journal =. 2016 , url =. 1612.00796 , timestamp =

arXiv 2016

[18] [18]

Gradient Episodic Memory for Continual Learning , booktitle =

David Lopez. Gradient Episodic Memory for Continual Learning , booktitle =. 2017 , url =

2017

[19] [19]

CoRR , volume =

Cyprien de Masson d'Autume and Sebastian Ruder and Lingpeng Kong and Dani Yogatama , title =. CoRR , volume =. 2019 , url =. 1906.01076 , timestamp =

arXiv 2019

[20] [20]

7th International Conference on Learning Representations (

Matthew Riemer and Ignacio Cases and Robert Ajemian and Miao Liu and Irina Rish and Yuhai Tu and Gerald Tesauro , title =. 7th International Conference on Learning Representations (. 2019 , url =

2019

[21] [21]

6th International Conference on Learning Representations (

Jaehong Yoon and Eunho Yang and Jeongtae Lee and Sung Ju Hwang , title =. 6th International Conference on Learning Representations (. 2018 , url =

2018

[22] [22]

Proceedings of the 36th International Conference on Machine Learning (

Xilai Li and Yingbo Zhou and Tianfu Wu and Richard Socher and Caiming Xiong , title =. Proceedings of the 36th International Conference on Machine Learning (. 2019 , url =

2019

[23] [23]

Rusu and Neil C

Andrei A. Rusu and Neil C. Rabinowitz and Guillaume Desjardins and Hubert Soyer and James Kirkpatrick and Koray Kavukcuoglu and Razvan Pascanu and Raia Hadsell , title =. CoRR , volume =. 2016 , url =. 1606.04671 , timestamp =

Pith/arXiv arXiv 2016

[24] [24]

2007 15th European signal processing conference , pages=

The effective rank: A measure of effective dimensionality , author=. 2007 15th European signal processing conference , pages=. 2007 , organization=

2007

[25] [25]

International Conference on Learning Representations , volume=

The truth is in there: Improving reasoning in language models with layer-selective rank reduction , author=. International Conference on Learning Representations , volume=

[26] [26]

arXiv preprint arXiv:2602.21919 , year=

Learning in the Null Space: Small Singular Values for Continual Learning , author=. arXiv preprint arXiv:2602.21919 , year=

arXiv

[27] [27]

arXiv preprint arXiv:2310.06762 , year=

Trace: A comprehensive benchmark for continual learning in large language models , author=. arXiv preprint arXiv:2310.06762 , year=

arXiv

[28] [28]

Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015 , pages =

Xiang Zhang and Junbo Jake Zhao and Yann LeCun , title =. Advances in Neural Information Processing Systems 28: Annual Conference on Neural Information Processing Systems 2015 , pages =. 2015 , url =

2015

[29] [29]

The Eleventh International Conference on Learning Representations (

Anastasia Razdaibiedina and Yuning Mao and Rui Hou and Madian Khabsa and Mike Lewis and Amjad Almahairi , title =. The Eleventh International Conference on Learning Representations (. 2023 , url =

2023

[30] [30]

Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =

Zhicheng Wang and Yufang Liu and Tao Ji and Xiaoling Wang and Yuanbin Wu and Congcong Jiang and Ye Chao and Zhencong Han and Ling Wang and Xu Shao and Wenqiu Zeng , title =. Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages =. 2023 , url =. doi:10.18653/V1/2023.ACL-LONG.612 , timestamp =

work page doi:10.18653/v1/2023.acl-long.612 2023