Repurposing Adversarial Perturbations for Continual Learning: From Defense to Active Alignment

Gang Li; Jianguo Jiang; Ming Liu; Mingqi Liu; Min Yu; Ning Li; Ran Liu; Rongsheng Li; Weiqing Huang; Zhen Xu

arxiv: 2606.02322 · v1 · pith:I2QD6TSNnew · submitted 2026-06-01 · 💻 cs.LG · cs.AI


Pith reviewed 2026-06-28 16:00 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords continual learning · adversarial perturbations · geometric control · catastrophic forgetting · prototype alignment · model robustness · task transfer

The pith

AdvCL repurposes adversarial perturbations into three modules that provide geometric control for continual learning with reduced forgetting.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that adversarial perturbations, typically seen as a threat, can be repurposed as a geometric control signal to stabilize adaptation across tasks in continual learning. It introduces three plug-in modules that promote smoothness, clip excessive alignments, and align representations to prior prototypes. A sympathetic reader would care because continual learning for models like large language models often fails due to forgetting and poor transfer between tasks. If correct, the method offers a flexible way to add geometric stability to many existing continual learning approaches.

Core claim

AdvCL repurposes adversarial perturbations as a geometric control signal for stable continual adaptation. It combines three plug-in modules: Intra-Smooth promotes local smoothness via small adversarial perturbations; Proto-Clip uses similarity clipping to prevent excessive alignment to the current task prototype; and Inter-Align applies directional alignment toward the previous task prototype to reduce representational gaps. Experiments show consistent gains in both standard performance and robustness, with lower forgetting and stronger transfer. The modules provide complementary gains when combined and can be integrated individually into replay, regularization, and dynamic architecture paradigms.
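To make "similarity clipping" concrete, here is a minimal sketch of one way a Proto-Clip-style term could be written. The paper's actual formulation is not given on this page, so the hinge form, the threshold `tau`, and the function names are all assumptions for illustration: the penalty stays at zero until the cosine similarity between a representation and the current task prototype exceeds `tau`, then grows linearly.

```python
import math

def cosine(u, v):
    """Cosine similarity between two equal-length vectors (0.0 if either is zero)."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def proto_clip_penalty(h, proto_cur, tau=0.8):
    """Hypothetical hinge penalty: only similarity above tau is penalized.

    h         -- representation of one sample
    proto_cur -- prototype of the current task
    tau       -- assumed clipping threshold (not taken from the paper)
    """
    return max(0.0, cosine(h, proto_cur) - tau)

proto = [1.0, 0.0]
near = [0.99, 0.05]   # almost collinear with the prototype
far = [0.5, 0.5]      # cosine similarity about 0.707, below tau

print(proto_clip_penalty(near, proto))  # positive: over-aligned sample
print(proto_clip_penalty(far, proto))   # zero: inside the allowed region
```

The design point of a clipped term is that samples already within the allowed similarity band contribute no gradient, so the term only acts on representations that collapse onto the current prototype.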

What carries the argument

The three plug-in modules (Intra-Smooth, Proto-Clip, Inter-Align) that repurpose adversarial perturbations as a geometric control signal.
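As an illustration of "local smoothness via small adversarial perturbations", the sketch below uses a toy logistic scorer and a single sign-gradient step (an FGSM-style perturbation, swapped in here because the page does not say how the paper solves for its perturbation). The model, the budget `eps`, and the squared-difference penalty are assumptions, not the authors' Intra-Smooth objective.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def predict(w, x):
    """Toy model: logistic score of a linear function of the input."""
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)))

def sign(v):
    return (v > 0) - (v < 0)

def intra_smooth_penalty(w, x, eps=0.05):
    """Squared change in the prediction under a worst-case L-infinity step.

    For this toy model d predict / d x_i = p * (1 - p) * w_i, and p * (1 - p)
    is positive, so the gradient sign is sign(w_i). One step of size eps along
    that sign is the largest first-order change inside the eps-ball.
    """
    delta = [eps * sign(wi) for wi in w]
    x_adv = [xi + di for xi, di in zip(x, delta)]
    return (predict(w, x_adv) - predict(w, x)) ** 2

w = [2.0, -1.0]
x = [0.5, 0.5]
for eps in (0.0, 0.05, 0.1):
    print(eps, intra_smooth_penalty(w, x, eps))
```

Minimizing such a penalty alongside the task loss would push the model toward predictions that change little inside a small neighbourhood of each input; the penalty grows with `eps`, which is why sensitivity to the perturbation setting is worth reporting.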

If this is right

Each module can be added individually to replay-based, regularization-based, or dynamic architecture continual learning methods.
Combining the modules yields complementary gains in performance, robustness, forgetting reduction, and transfer.
The approach supplies a geometric control mechanism that works across diverse continual learning paradigms.
Quantified sensitivity of Intra-Smooth to perturbation settings and effects of Inter-Align on task similarity provide analysis tools for geometric distance.
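The plug-in claim in the list above can be pictured as an additive loss in which each module contributes a weighted term and a weight of zero switches the module off. The sketch below is only that picture: the weight names, the additive form, and the numeric values are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass

@dataclass
class ModuleWeights:
    # Hypothetical loss weights; 0.0 disables the corresponding module.
    intra_smooth: float = 0.0
    proto_clip: float = 0.0
    inter_align: float = 0.0

def total_loss(base_loss, penalties, weights):
    """Add enabled module penalties to the host method's own loss.

    base_loss -- loss of the host CL method (replay, regularization, ...)
    penalties -- dict mapping module name to its penalty value for this batch
    weights   -- ModuleWeights controlling which modules are active
    """
    return (base_loss
            + weights.intra_smooth * penalties.get("intra_smooth", 0.0)
            + weights.proto_clip * penalties.get("proto_clip", 0.0)
            + weights.inter_align * penalties.get("inter_align", 0.0))

# Illustrative penalty values only.
penalties = {"intra_smooth": 0.02, "proto_clip": 0.10, "inter_align": 0.30}

baseline = total_loss(1.0, penalties, ModuleWeights())
one_module = total_loss(1.0, penalties, ModuleWeights(proto_clip=0.5))
all_modules = total_loss(1.0, penalties, ModuleWeights(1.0, 0.5, 0.1))
print(baseline, one_module, all_modules)
```

Under this reading, the host method's training loop is unchanged apart from the extra terms, which is what would let the same modules attach to different continual learning paradigms.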

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The geometric alignment technique might extend to non-adversarial perturbations for similar control in sequential learning.
The modules could be tested for interactions with other forms of representation regularization beyond the three described.
Measuring changes in geometric distance between prototypes could become a diagnostic for when to apply alignment in new task sequences.
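The last extension above, using prototype distance as a diagnostic, could be prototyped along these lines. The sketch assumes a prototype is the mean of a task's representations and that a simple cosine threshold decides whether alignment is worth applying; both choices, and the `min_cosine` value, are illustrative assumptions rather than anything stated in the paper.

```python
import math

def prototype(embeddings):
    """Mean vector of a list of equal-length embeddings (assumed prototype definition)."""
    n = len(embeddings)
    return [sum(col) / n for col in zip(*embeddings)]

def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0

def prototype_gap(task_a, task_b):
    """Cosine similarity and Euclidean distance between two task prototypes."""
    pa, pb = prototype(task_a), prototype(task_b)
    return {"cosine": cosine(pa, pb), "euclidean": math.dist(pa, pb)}

def should_align(gap, min_cosine=0.5):
    """Hypothetical rule: apply alignment when prototypes are dissimilar."""
    return gap["cosine"] < min_cosine

# Toy 2-D representations for two tasks.
task_a = [[1.0, 0.0], [0.8, 0.2]]
task_b = [[0.0, 1.0], [0.2, 0.8]]

gap = prototype_gap(task_a, task_b)
print(gap, should_align(gap))
```

Tracking these two numbers after each task would give a cheap signal of how far consecutive tasks sit from each other in representation space.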

Load-bearing premise

That repurposing adversarial perturbations via the three modules will reliably act as a stable geometric control signal that improves continual learning outcomes across paradigms without introducing new instabilities or negative interactions.

What would settle it

An experiment on a standard continual learning benchmark where adding one or more of the modules increases forgetting rates or lowers accuracy relative to the unmodified baseline.
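The check described above can be made mechanical. The sketch below uses the standard average-forgetting measure (for each earlier task, the best accuracy reached before the final stage minus the accuracy after the final stage) and flags a module as harmful if it raises forgetting or lowers final average accuracy. The accuracy matrices are made-up illustrative numbers, not results from the paper, and the function names are hypothetical.

```python
def average_forgetting(acc):
    """acc[i][j]: accuracy on task j measured after training stage i (j <= i).

    Entries with j > i are unused placeholders.
    """
    stages = len(acc)
    if stages < 2:
        return 0.0
    drops = []
    for j in range(stages - 1):
        best_before_end = max(acc[i][j] for i in range(j, stages - 1))
        drops.append(best_before_end - acc[-1][j])
    return sum(drops) / len(drops)

def final_accuracy(acc):
    """Average accuracy over all tasks after the last training stage."""
    return sum(acc[-1]) / len(acc[-1])

def module_hurts(baseline, with_module):
    """True if the module increases forgetting or lowers final accuracy."""
    return (average_forgetting(with_module) > average_forgetting(baseline)
            or final_accuracy(with_module) < final_accuracy(baseline))

# Illustrative numbers only: three tasks learned in sequence.
baseline = [[0.90, 0.00, 0.00],
            [0.70, 0.88, 0.00],
            [0.60, 0.75, 0.86]]
with_module = [[0.90, 0.00, 0.00],
               [0.80, 0.87, 0.00],
               [0.74, 0.80, 0.85]]

print(average_forgetting(baseline), average_forgetting(with_module))
print(module_hurts(baseline, with_module))
```

A single benchmark on which `module_hurts` returns `True` for a well-tuned configuration would be the kind of counterexample the paragraph above asks for.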

Figures

Figures reproduced from arXiv: 2606.02322 by Gang Li, Jianguo Jiang, Ming Liu, Mingqi Liu, Min Yu, Ning Li, Ran Liu, Rongsheng Li, Weiqing Huang, Zhen Xu.

**Figure 2.** Overview of AdvCL. It integrates Intra-Smooth, Proto-Clip, and Inter-Align, and maintains a prototype [PITH_FULL_IMAGE:figures/full_fig_p002_2.png]

**Figure 3.** Cross-task similarity in representation space. [PITH_FULL_IMAGE:figures/full_fig_p012_3.png]

**Figure 4.** Robustness curve under PGD [PITH_FULL_IMAGE:figures/full_fig_p017_4.png]

**Figure 5.** Similarity gain under varying ϵ_inter.

Appendix G, Computational Overhead Analysis (excerpt carried over with the Figure 5 caption): We take one training step without any module enabled as the baseline computation (denoted as 1.0×), which consists of a standard forward and backward update. Below, we approximate additional computation by counting extra forward passes and gradient computations introduced by each module. Intra-Smooth requires solving perturbation Δ⋆_i …

**Figure 6.** Per-task stagewise performance curves across all stages. [PITH_FULL_IMAGE:figures/full_fig_p020_6.png]

**Figure 7.** Comparison between standard average performance and robust average performance under PGD attacks. [PITH_FULL_IMAGE:figures/full_fig_p021_7.png]

Original abstract

In dynamic environments, large language models need to keep adapting to new tasks, but continual learning often suffers from forgetting, limited transfer, and vulnerability to adversarial perturbations. To address this, we present AdvCL, which repurposes adversarial perturbations as a geometric control signal for stable continual adaptation. AdvCL combines three plug-in modules: Intra-Smooth promotes local smoothness via small adversarial perturbations; Proto-Clip uses similarity clipping to prevent excessive alignment to current task prototype; and Inter-Align applies directional alignment toward previous task prototype to reduce representational gaps. Experiments show consistent gains in both standard performance and robustness, with lower forgetting and stronger transfer. We further analyze key mechanisms by quantifying the sensitivity of Intra-Smooth to perturbation settings and the effect of Inter-Align on task similarity and geometric distance. In summary, the modules provide complementary gains when combined, and each can also be integrated individually into diverse CL paradigms, including replay, regularization, and dynamic architectures, thereby offering a geometric control mechanism for continual learning.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The abstract sketches a method to repurpose adversarial perturbations as geometric controls for continual learning via three plug-in modules, but supplies no experimental details to back the claimed gains.

The paper's main move is to take adversarial perturbations, normally treated as a problem to defend against, and turn them into signals for controlling representation geometry during continual adaptation of language models. AdvCL adds Intra-Smooth for local smoothness with small perturbations, Proto-Clip to limit over-alignment to the current task prototype, and Inter-Align to pull toward previous prototypes and shrink gaps between tasks. These are framed as modular additions that can drop into replay, regularization, or architecture-based continual learning setups.
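One way to picture the Inter-Align idea in the paragraph above is as a bounded step in representation space toward the previous task's prototype. The sketch below is an assumption-laden toy, not the paper's method: the unit-direction step, the no-overshoot rule, and the "similarity gain" measure are illustrative, and `eps_inter` simply mirrors the ϵinter symbol that appears in the Figure 5 caption.

```python
import math

def norm(v):
    return math.sqrt(sum(a * a for a in v))

def cosine(u, v):
    nu, nv = norm(u), norm(v)
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (nu * nv) if nu and nv else 0.0

def inter_align_step(h, proto_prev, eps_inter):
    """Move h by at most eps_inter along the unit direction toward proto_prev."""
    diff = [p - a for p, a in zip(proto_prev, h)]
    dist = norm(diff)
    if dist == 0.0:
        return list(h)
    step = min(eps_inter, dist)  # never overshoot the prototype
    return [a + step * d / dist for a, d in zip(h, diff)]

def similarity_gain(h, proto_prev, eps_inter):
    """Change in cosine similarity to the previous prototype after the step."""
    moved = inter_align_step(h, proto_prev, eps_inter)
    return cosine(moved, proto_prev) - cosine(h, proto_prev)

h = [1.0, 0.0]            # current-task representation (toy)
proto_prev = [0.0, 1.0]   # previous-task prototype (toy)
for eps in (0.0, 0.1, 0.3):
    print(eps, similarity_gain(h, proto_prev, eps))
```

In this toy setting the gain grows with the step budget, which is the kind of relationship a sweep over the alignment budget would be expected to expose.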

The approach is new in its explicit repurposing for this purpose and in the claim that the three pieces give complementary improvements when used together. The abstract also mentions some mechanism checks, such as how sensitive Intra-Smooth is to perturbation size and how Inter-Align affects task similarity and geometric distance. That level of follow-up is better than pure performance claims.

The clear limitation is the complete absence of any experimental specifics. No datasets, no baselines, no statistical reporting, and no numbers appear, so the statements about consistent gains in performance, robustness, forgetting, and transfer cannot be checked. The paper's weakest assumption, that the modules will act as stable controls without creating fresh instabilities, remains untested on the information given. The full manuscript is referenced but not supplied here, which leaves the same gap.

This work would interest people already working on continual learning for large language models who are looking for geometric levers. A reader who wants concrete evidence of whether the modules deliver on the promises would not get much from the abstract alone. The paper does not yet show the level of detail needed for a serious referee to evaluate the results or the implementation.

Referee Report

2 major / 1 minor

Summary. The manuscript proposes AdvCL for continual learning in large language models, repurposing adversarial perturbations as a geometric control signal. It introduces three plug-in modules—Intra-Smooth (local smoothness via small perturbations), Proto-Clip (similarity clipping to avoid excessive current-task alignment), and Inter-Align (directional alignment to prior-task prototypes to reduce gaps)—that can be combined or used individually within replay, regularization, or dynamic-architecture CL methods. The central claim is that the approach yields consistent gains in standard performance and robustness, reduced forgetting, and improved transfer, supported by analyses of perturbation sensitivity and geometric effects.

Significance. If the empirical results are reproducible and the modules prove stable across paradigms, the work supplies a practical geometric mechanism that converts an existing defense technique into an active alignment tool. This could be broadly useful for adapting LLMs without catastrophic forgetting while preserving robustness, and the plug-in design lowers the barrier to adoption in existing CL frameworks.

major comments (2)

[Abstract] The claim of 'consistent gains in both standard performance and robustness, with lower forgetting and stronger transfer' is presented without any mention of datasets, baselines, number of runs, or statistical tests, making it impossible to assess whether the data actually support the stated improvements.
[Abstract] The weakest assumption, that the three modules act as a stable geometric control signal without introducing instabilities or negative interactions when combined, is load-bearing for the 'complementary gains' and 'plug-in' claims; the manuscript must supply ablation results quantifying interactions and failure modes under varied task similarities.
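The ablation requested in the second major comment amounts to evaluating every subset of the three modules and checking whether combined gains fall short of, or exceed, the sum of individual gains. The sketch below enumerates the eight configurations and computes a simple pairwise interaction gap; the score values are placeholders for illustration, and the gap statistic is one hypothetical choice among many.

```python
from itertools import combinations

MODULES = ("intra_smooth", "proto_clip", "inter_align")

def ablation_configs(modules=MODULES):
    """All subsets of the modules, from the bare baseline to the full combination."""
    configs = []
    for k in range(len(modules) + 1):
        for subset in combinations(modules, k):
            configs.append(frozenset(subset))
    return configs

def interaction_gap(scores, a, b):
    """Gain of the pair minus the sum of the two individual gains over baseline.

    Negative values suggest overlap or interference; positive values suggest synergy.
    """
    base = scores[frozenset()]
    gain_a = scores[frozenset({a})] - base
    gain_b = scores[frozenset({b})] - base
    gain_ab = scores[frozenset({a, b})] - base
    return gain_ab - (gain_a + gain_b)

# Placeholder accuracies for illustration only (not results from the paper).
scores = {
    frozenset(): 0.70,
    frozenset({"intra_smooth"}): 0.73,
    frozenset({"proto_clip"}): 0.72,
    frozenset({"intra_smooth", "proto_clip"}): 0.74,
}

print(len(ablation_configs()))
print(interaction_gap(scores, "intra_smooth", "proto_clip"))
```

Repeating this grid at several levels of task similarity would show whether any pair of modules turns from complementary to interfering as tasks become more or less alike.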

minor comments (1)

[Abstract] Terms such as 'task prototype' and 'representational gaps' are used without brief definitions, which would aid readability for readers outside the immediate subfield.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below, agreeing that targeted revisions will strengthen the presentation of our claims.

Referee: [Abstract] The claim of 'consistent gains in both standard performance and robustness, with lower forgetting and stronger transfer' is presented without any mention of datasets, baselines, number of runs, or statistical tests, making it impossible to assess whether the data actually support the stated improvements.

Authors: We agree that the abstract would benefit from additional context to support its claims. In the revised manuscript we will update the abstract to briefly reference the primary continual learning benchmarks, note that results are reported as averages over multiple independent runs, and indicate that statistical significance was evaluated in the experiments. The full details on datasets, baselines, run counts, and tests remain in Sections 4 and 5.

Revision: yes

Referee: [Abstract] The weakest assumption, that the three modules act as a stable geometric control signal without introducing instabilities or negative interactions when combined, is load-bearing for the 'complementary gains' and 'plug-in' claims; the manuscript must supply ablation results quantifying interactions and failure modes under varied task similarities.

Authors: The referee correctly highlights the importance of verifying stability. While the manuscript already reports complementary gains from module combinations across replay, regularization, and dynamic-architecture paradigms together with geometric analyses, it does not contain a dedicated ablation on negative interactions or instabilities across task-similarity regimes. We will add such experiments in the revision, systematically varying task similarity and reporting any observed instabilities or failure modes to directly support the stability claim.

Revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical method with no derivation chain

The paper presents AdvCL as an empirical method consisting of three plug-in modules (Intra-Smooth, Proto-Clip, Inter-Align) whose value is demonstrated through experiments on performance, robustness, forgetting, and transfer. No equations, derivations, or first-principles claims appear in the abstract or referenced text. Claims rest on experimental outcomes rather than any mathematical reduction that could be circular. This is a standard non-circular empirical contribution.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract does not specify any free parameters, axioms, or invented entities.





[12] [12]

Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal

Huang, Jianheng and Cui, Leyang and Wang, Ante and Yang, Chengyi and Liao, Xinting and Song, Linfeng and Yao, Junfeng and Su, Jinsong. Mitigating Catastrophic Forgetting in Large Language Models with Self-Synthesized Rehearsal. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2024. doi:10.186...

work page doi:10.18653/v1/2024.acl-long.77 2024

[13] [13]

2016 , eprint=

Progressive Neural Networks , author=. 2016 , eprint=

2016

[14] [14]

Rehearsal-Free Modular and Compositional Continual Learning for Language Models

Wang, Mingyang and Adel, Heike and Lange, Lukas and Str. Rehearsal-Free Modular and Compositional Continual Learning for Language Models. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 2: Short Papers). 2024. doi:10.18653/v1/2024.naacl-short.39

work page doi:10.18653/v1/2024.naacl-short.39 2024

[15] [15]

Revisiting Catastrophic Forgetting in Large Language Model Tuning

Li, Hongyu and Ding, Liang and Fang, Meng and Tao, Dacheng. Revisiting Catastrophic Forgetting in Large Language Model Tuning. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.249

work page doi:10.18653/v1/2024.findings-emnlp.249 2024

[16] [16]

A., Milan, K., Quan, J., Ramalho, T., Grabska-Barwinska, A., Hassabis, D., Clopath, C., Kumaran, D., and Hadsell, R

James Kirkpatrick and Razvan Pascanu and Neil Rabinowitz and Joel Veness and Guillaume Desjardins and Andrei A. Rusu and Kieran Milan and John Quan and Tiago Ramalho and Agnieszka Grabska-Barwinska and Demis Hassabis and Claudia Clopath and Dharshan Kumaran and Raia Hadsell , title =. Proceedings of the National Academy of Sciences , volume =. 2017 , doi ...

work page doi:10.1073/pnas.1611835114 2017

[17] [17]

Orthogonal Subspace Learning for Language Model Continual Learning

Wang, Xiao and Chen, Tianze and Ge, Qiming and Xia, Han and Bao, Rong and Zheng, Rui and Zhang, Qi and Gui, Tao and Huang, Xuanjing. Orthogonal Subspace Learning for Language Model Continual Learning. Findings of the Association for Computational Linguistics: EMNLP 2023. 2023. doi:10.18653/v1/2023.findings-emnlp.715

work page doi:10.18653/v1/2023.findings-emnlp.715 2023

[18] [18]

Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models

Lu, Yuheng and Qian, Bingshuo and Yuan, Caixia and Jiang, Huixing and Wang, Xiaojie. Controlled Low-Rank Adaptation with Subspace Regularization for Continued Training on Large Language Models. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). 2025. doi:10.18653/v1/2025.acl-long.940

work page doi:10.18653/v1/2025.acl-long.940 2025

[19] [19]

Edward J Hu and Yelong Shen and Phillip Wallis and Zeyuan Allen-Zhu and Yuanzhi Li and Shean Wang and Lu Wang and Weizhu Chen , booktitle=. Lo

[20] [20]

SAPT : A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models

Zhao, Weixiang and Wang, Shilong and Hu, Yulin and Zhao, Yanyan and Qin, Bing and Zhang, Xuanyu and Yang, Qing and Xu, Dongliang and Che, Wanxiang. SAPT : A Shared Attention Framework for Parameter-Efficient Continual Learning of Large Language Models. Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long ...

work page doi:10.18653/v1/2024.acl-long.625 2024

[21] [21]

SLIM : Let LLM Learn More and Forget Less with Soft L o RA and Identity Mixture

Han, Jiayi and Du, Liang and Du, Hongwei and Zhou, Xiangguo and Wu, Yiwen and Zhang, Yuanfang and Zheng, Weibo and Han, Donghong. SLIM : Let LLM Learn More and Forget Less with Soft L o RA and Identity Mixture. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technol...

work page doi:10.18653/v1/2025.naacl-long.246 2025

[22] [22]

Parameter-Efficient Transfer Learning for

Houlsby, Neil and Giurgiu, Andrei and Jastrzebski, Stanislaw and Morrone, Bruna and De Laroussilhe, Quentin and Gesmundo, Andrea and Attariyan, Mona and Gelly, Sylvain , booktitle =. Parameter-Efficient Transfer Learning for. 2019 , editor =

2019

[23] [23]

International Conference on Learning Representations , year=

FreeLB: Enhanced Adversarial Training for Natural Language Understanding , author=. International Conference on Learning Representations , year=

[24] [24]

SMART : Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization

Jiang, Haoming and He, Pengcheng and Chen, Weizhu and Liu, Xiaodong and Gao, Jianfeng and Zhao, Tuo. SMART : Robust and Efficient Fine-Tuning for Pre-trained Natural Language Models through Principled Regularized Optimization. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.197

work page doi:10.18653/v1/2020.acl-main.197 2020

[25] [25]

International Conference on Learning Representations , year=

Better Fine-Tuning by Reducing Representational Collapse , author=. International Conference on Learning Representations , year=

[26] [26]

The Twelfth International Conference on Learning Representations , year=

Neel Jain and Ping. The Twelfth International Conference on Learning Representations , year=

[27] [27]

2022 , eprint=

The Effect of Task Ordering in Continual Learning , author=. 2022 , eprint=

2022

[28] [28]

Sentence Embedding Alignment for Lifelong Relation Extraction

Wang, Hong and Xiong, Wenhan and Yu, Mo and Guo, Xiaoxiao and Chang, Shiyu and Wang, William Yang. Sentence Embedding Alignment for Lifelong Relation Extraction. Proceedings of the 2019 Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers). 2019. doi:10.186...

work page doi:10.18653/v1/n19-1086 2019

[29] [29]

Refining Sample Embeddings with Relation Prototypes to Enhance Continual Relation Extraction

Cui, Li and Yang, Deqing and Yu, Jiaxin and Hu, Chengwei and Cheng, Jiayang and Yi, Jingjie and Xiao, Yanghua. Refining Sample Embeddings with Relation Prototypes to Enhance Continual Relation Extraction. Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language ...

work page doi:10.18653/v1/2021.acl-long.20 2021

[30] [30]

Consistent Representation Learning for Continual Relation Extraction

Zhao, Kang and Xu, Hua and Yang, Jiangong and Gao, Kai. Consistent Representation Learning for Continual Relation Extraction. Findings of the Association for Computational Linguistics: ACL 2022. 2022. doi:10.18653/v1/2022.findings-acl.268

work page doi:10.18653/v1/2022.findings-acl.268 2022

[31] [31]

International Conference on Learning Representations , year=

Towards Deep Learning Models Resistant to Adversarial Attacks , author=. International Conference on Learning Representations , year=

[32] [32]

International Conference on Learning Representations , year=

Explaining and Harnessing Adversarial Examples , author=. International Conference on Learning Representations , year=

[33] [33]

Super- N atural I nstructions: Generalization via Declarative Instructions on 1600+ NLP Tasks

Wang, Yizhong and Mishra, Swaroop and Alipoormolabashi, Pegah and Kordi, Yeganeh and Mirzaei, Amirreza and Naik, Atharva and Ashok, Arjun and Dhanasekaran, Arut Selvan and Arunkumar, Anjana and Stap, David and Pathak, Eshaan and Karamanolakis, Giannis and Lai, Haizhi and Purohit, Ishan and Mondal, Ishani and Anderson, Jacob and Kuznia, Kirby and Doshi, Kr...

work page doi:10.18653/v1/2022.emnlp-main.340 2022

[34] [34]

S em E val-2018 Task 1: Affect in Tweets

Mohammad, Saif and Bravo-Marquez, Felipe and Salameh, Mohammad and Kiritchenko, Svetlana. S em E val-2018 Task 1: Affect in Tweets. Proceedings of the 12th International Workshop on Semantic Evaluation. 2018. doi:10.18653/v1/S18-1001

work page doi:10.18653/v1/s18-1001 2018

[35] [35]

CARER : Contextualized Affect Representations for Emotion Recognition

Saravia, Elvis and Liu, Hsien-Chi Toby and Huang, Yen-Hao and Wu, Junlin and Chen, Yi-Shin. CARER : Contextualized Affect Representations for Emotion Recognition. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1404

work page doi:10.18653/v1/d18-1404 2018

[36] [36]

Character-level Convolutional Networks for Text Classification , volume =

Zhang, Xiang and Zhao, Junbo and LeCun, Yann , booktitle =. Character-level Convolutional Networks for Text Classification , volume =

[37] [37]

Adversarial NLI : A New Benchmark for Natural Language Understanding

Nie, Yixin and Williams, Adina and Dinan, Emily and Bansal, Mohit and Weston, Jason and Kiela, Douwe. Adversarial NLI : A New Benchmark for Natural Language Understanding. Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. 2020. doi:10.18653/v1/2020.acl-main.441

work page doi:10.18653/v1/2020.acl-main.441 2020

[38] [38]

Hardt, M., Recht, B., and Singer, Y

Narayan, Shashi and Cohen, Shay B. and Lapata, Mirella. Don ' t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization. Proceedings of the 2018 Conference on Empirical Methods in Natural Language Processing. 2018. doi:10.18653/v1/D18-1206

work page doi:10.18653/v1/d18-1206 2018

[39] [39]

Saiful and Mubasshir, Kazi and Li, Yuan-Fang and Kang, Yong-Bin and Rahman, M

Hasan, Tahmid and Bhattacharjee, Abhik and Islam, Md. Saiful and Mubasshir, Kazi and Li, Yuan-Fang and Kang, Yong-Bin and Rahman, M. Sohel and Shahriyar, Rifat. XL -Sum: Large-Scale Multilingual Abstractive Summarization for 44 Languages. Findings of the Association for Computational Linguistics: ACL-IJCNLP 2021. 2021. doi:10.18653/v1/2021.findings-acl.413

work page doi:10.18653/v1/2021.findings-acl.413 2021

[40] [40]

International Conference on Learning Representations , year=

Outrageously Large Neural Networks: The Sparsely-Gated Mixture-of-Experts Layer , author=. International Conference on Learning Representations , year=

[41] [41]

2024 , eprint=

The Llama 3 Herd of Models , author=. 2024 , eprint=

2024

[42] [42]

T ext A ttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP

Morris, John and Lifland, Eli and Yoo, Jin Yong and Grigsby, Jake and Jin, Di and Qi, Yanjun. T ext A ttack: A Framework for Adversarial Attacks, Data Augmentation, and Adversarial Training in NLP. Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing: System Demonstrations. 2020. doi:10.18653/v1/2020.emnlp-demos.16

work page doi:10.18653/v1/2020.emnlp-demos.16 2020

[43] [43]

2025 , eprint=

Parameter-Efficient Continual Fine-Tuning: A Survey , author=. 2025 , eprint=

2025

[44] [44]

Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models

Araujo, Vladimir and Moens, Marie-Francine and Tuytelaars, Tinne. Learning to Route for Dynamic Adapter Composition in Continual Learning with Language Models. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024. doi:10.18653/v1/2024.findings-emnlp.38

work page doi:10.18653/v1/2024.findings-emnlp.38 2024

[45] [45]

Model Sensitivity Aware Continual Learning , volume =

Wang, Zhenyi and Huang, Heng , booktitle =. Model Sensitivity Aware Continual Learning , volume =. doi:10.52202/079017-4215 , editor =

work page doi:10.52202/079017-4215

[46] [46]

Datasets: A Community Library for Natural Language Processing

Lhoest, Quentin and Villanova del Moral, Albert and Jernite, Yacine and Thakur, Abhishek and von Platen, Patrick and Patil, Suraj and Chaumond, Julien and Drame, Mariama and Plu, Julien and Tunstall, Lewis and Davison, Joe and S a s ko, Mario and Chhablani, Gunjan and Malik, Bhavitvya and Brandeis, Simon and Le Scao, Teven and Sanh, Victor and Xu, Canwen ...

work page doi:10.18653/v1/2021.emnlp-demo.21 2021

[47] [47]

Sourab Mangrulkar and Sylvain Gugger and Lysandre Debut and Younes Belkada and Sayak Paul and Benjamin Bossan and Marian Tietz , howpublished =

[48] [48]

doi: 10.18653/v1/2020.emnlp-demos.6

Wolf, Thomas and Debut, Lysandre and Sanh, Victor and Chaumond, Julien and Delangue, Clement and Moi, Anthony and Cistac, Pierric and Rault, Tim and Louf, Remi and Funtowicz, Morgan and Davison, Joe and Shleifer, Sam and von Platen, Patrick and Ma, Clara and Jernite, Yacine and Plu, Julien and Xu, Canwen and Le Scao, Teven and Gugger, Sylvain and Drame, M...

work page doi:10.18653/v1/2020.emnlp-demos.6 2020

[49] [49]

, journal=

Hunter, John D. , journal=. Matplotlib: A 2D Graphics Environment , year=

[50] [50]

R., Millman, K

Charles R. Harris and K. Jarrod Millman and St. Array programming with. 2020 , month = sep, journal =. doi:10.1038/s41586-020-2649-2 , publisher =

work page doi:10.1038/s41586-020-2649-2 2020

[51] [51]

PyTorch: An Imperative Style, High-Performance Deep Learning Library , volume =

Paszke, Adam and Gross, Sam and Massa, Francisco and Lerer, Adam and Bradbury, James and Chanan, Gregory and Killeen, Trevor and Lin, Zeming and Gimelshein, Natalia and Antiga, Luca and Desmaison, Alban and Kopf, Andreas and Yang, Edward and DeVito, Zachary and Raison, Martin and Tejani, Alykhan and Chilamkurthy, Sasank and Steiner, Benoit and Fang, Lu an...

[52] [52]

Journal of Open Source Software , author =

Waskom, Michael L. , title =. doi:10.21105/joss.03021 , year =

work page doi:10.21105/joss.03021