SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention

Nicholas D. Lane; Nicola Cancedda; William F. Shen; Xinchi Qiu

arXiv: 2506.14387 · v3 · submitted 2025-06-17 · 💻 cs.AI


Pith reviewed 2026-05-19 09:30 UTC · model grok-4.3

classification 💻 cs.AI

keywords epistemic abstention · knowledge adaptation · LLM fine-tuning · sparse tuning · KL regularization · hallucination mitigation · safe model updates


The pith

SEAT lets language models absorb new facts without losing the ability to say they do not know the answer.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a fine-tuning approach called SEAT that updates large language models with new knowledge while keeping their capacity to abstain on questions outside that knowledge. Standard adaptation tends to erode this abstention behavior, leading models to generate confident but incorrect answers instead of acknowledging uncertainty. SEAT strikes the balance in two ways: it updates only a sparse subset of weights, which limits global drift in the model's activations, and it adds an entity-perturbed KL regularization term that keeps the local boundaries between known and unknown information sharp. The method needs no extra alignment data or later corrective steps, and experiments across models and datasets report 18% to 101% gains in human-rated abstention on unknown queries over the strongest baseline, with near-perfect acquisition of the new knowledge.

Core claim

SEAT is a preventive fine-tuning method that preserves epistemic abstention while maintaining strong knowledge acquisition. It combines sparse tuning, which constrains global activation drift, with entity-perturbed KL regularization, which sharpens local epistemic boundaries and prevents spillover to neighboring knowledge. SEAT requires no alignment data, explicit boundary probing, or post-hoc re-alignment.
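To make the sparse-tuning half concrete, here is a minimal sketch of a masked update: a fixed binary mask over the parameter vector decides which coordinates may move, and every other weight keeps its pre-trained value. The extracted text around Figure 1 confirms that a binary mask over θ is used, but the rule for choosing it is cut off, so the random-density rule in `make_random_mask` is a hypothetical stand-in, and plain SGD on a NumPy vector replaces a real LLM optimizer.

```python
import numpy as np

def make_random_mask(d: int, density: float, seed: int = 0) -> np.ndarray:
    """Hypothetical mask rule: mark a fixed random fraction of the d weights as trainable."""
    rng = np.random.default_rng(seed)
    k = int(round(density * d))
    mask = np.zeros(d, dtype=bool)
    mask[rng.choice(d, size=k, replace=False)] = True
    return mask

def masked_sgd_step(theta: np.ndarray, grad: np.ndarray, mask: np.ndarray,
                    lr: float = 0.1) -> np.ndarray:
    """One SGD step; coordinates where mask is False keep their pre-trained values."""
    return theta - lr * grad * mask

# Toy usage: 10 "weights", 30% of them trainable.
theta0 = np.ones(10)
mask = make_random_mask(10, density=0.3)
theta1 = masked_sgd_step(theta0, np.full(10, 2.0), mask)
print(np.flatnonzero(theta1 != theta0))  # indices of the updated weights
```

Fixing the mask before training is what bounds how far the model can drift: no matter how long tuning runs, only the masked coordinates change.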

What carries the argument

SEAT, the combination of sparse tuning to limit global activation drift and entity-perturbed KL regularization to maintain sharp local boundaries around known entities.
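The entity-perturbed KL term can be sketched as follows, under assumptions this page does not confirm: the perturbation is a simple entity-name swap (the hypothetical `perturb_entity`), the penalty is KL(p_base ‖ p_tuned) between the frozen base model's and the tuned model's next-token distributions on perturbed prompts, and `lam` is a hypothetical weight. Logits are passed in as arrays rather than produced by a real model.

```python
import numpy as np

def log_softmax(logits: np.ndarray) -> np.ndarray:
    z = logits - logits.max(axis=-1, keepdims=True)  # subtract max for numerical stability
    return z - np.log(np.exp(z).sum(axis=-1, keepdims=True))

def kl_base_to_tuned(base_logits: np.ndarray, tuned_logits: np.ndarray) -> float:
    """Mean KL(p_base || p_tuned) over a batch of next-token distributions."""
    lp_base = log_softmax(base_logits)
    lp_tuned = log_softmax(tuned_logits)
    return float((np.exp(lp_base) * (lp_base - lp_tuned)).sum(axis=-1).mean())

def perturb_entity(prompt: str, entity: str, replacement: str) -> str:
    """Hypothetical perturbation: swap a known entity for another name."""
    return prompt.replace(entity, replacement)

def regularized_loss(task_loss: float, base_logits, tuned_logits, lam: float = 1.0) -> float:
    """Task loss on the new knowledge plus the KL penalty on perturbed prompts."""
    return task_loss + lam * kl_base_to_tuned(base_logits, tuned_logits)

# Toy usage: random logits stand in for model outputs on perturbed prompts.
rng = np.random.default_rng(0)
base = rng.normal(size=(4, 8))
print(perturb_entity("Who founded Acme Corp?", "Acme Corp", "Zenith Ltd"))
print(regularized_loss(0.5, base, base))  # identical outputs: no KL penalty
print(regularized_loss(0.5, base, base + rng.normal(size=(4, 8))))  # drift is penalized
```

The intuition is that a perturbed prompt asks about a neighboring but unknown entity, so penalizing divergence from the base model there discourages the new knowledge from spilling into answers the model should still decline.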

If this is right

Knowledge updates can be performed without eroding the model's built-in refusal to answer unknowns.
No separate alignment dataset or post-tuning repair step is needed to retain abstention behavior.
Representations of known and unknown queries become more cleanly separated after the procedure.
Downstream task performance remains intact while abstention improves.
Abstention responses become more coherent and context-sensitive rather than generic refusals.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same sparse-plus-regularization pattern could be tested on other safety properties such as refusal of harmful requests.
Frequent incremental updates to deployed models might become feasible without repeated full safety retraining.
The approach may reduce reliance on large curated alignment corpora for maintaining model honesty over time.

Load-bearing premise

Sparse tuning plus entity-focused regularization alone can keep abstention boundaries intact during knowledge updates even without any separate alignment data or fixes.

What would settle it

Apply SEAT to a model, then measure abstention rates on held-out unknown queries; if the rates fall to the same low levels seen with ordinary fine-tuning or if the model begins producing confident answers on those queries, the preservation claim does not hold.
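This test reduces to comparing abstention rates on the same held-out unknown queries. A minimal sketch, assuming a crude keyword detector (the hypothetical `ABSTAIN_MARKERS`) in place of the human judgments the paper uses; the `margin` threshold and the toy responses are illustrative, not values or data from the paper.

```python
ABSTAIN_MARKERS = ("i don't know", "i do not know", "i'm not sure", "no information")  # hypothetical

def is_abstention(response: str) -> bool:
    """Crude stand-in for a human judgment of whether a response abstains."""
    text = response.lower()
    return any(marker in text for marker in ABSTAIN_MARKERS)

def abstention_rate(responses: list[str]) -> float:
    if not responses:
        raise ValueError("need at least one response")
    return sum(is_abstention(r) for r in responses) / len(responses)

def preservation_holds(rate_method: float, rate_plain_ft: float, margin: float = 0.05) -> bool:
    """The claim fails if abstention on held-out unknowns is not clearly
    above the rate seen after ordinary fine-tuning."""
    return rate_method > rate_plain_ft + margin

# Toy responses to the same held-out unknown queries (illustrative only).
plain_ft = ["She was born in 1962 in Lyon.", "I don't know who that is.", "It was founded in 1990."]
method = ["I don't know who that is.", "I'm not sure; I have no information on that.", "It was founded in 1990."]
print(abstention_rate(plain_ft), abstention_rate(method))
print(preservation_holds(abstention_rate(method), abstention_rate(plain_ft)))
```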

Figures

Figures reproduced from arXiv: 2506.14387 by Nicholas D. Lane, Nicola Cancedda, William F. Shen, Xinchi Qiu.

**Figure 1.** PCA visualization of activations (last token position at the last layer) over different datasets, projected onto the principal components of the unverifiable dataset. Plots over all layers can be found in Appendix B.

…where a binary mask $m \in \{0,1\}^d$ is applied to the parameter space $\theta \in \mathbb{R}^d$, controlling which weights are updated during fine-tuning. The mask defines a sparsity pattern such that, for each …

**Figure 2.** Base model: PCA visualization of activations per layer with Llama3-8B-instruct as the base model. Principal components are computed using activations from the unverifiable dataset after each block. Activations of the datasets studied are projected onto the same PCA space.

**Figure 3.** Full FT: PCA visualization of activations per layer with the Llama3-8B-instruct model fine-tuned using the PISTOL dataset. Principal components are computed using activations from the unverifiable dataset after each block. Activations of the datasets studied are projected onto the same PCA space.

**Figure 4.** LoRA FT: PCA visualization of activations per layer with the Llama3-8B-instruct model fine-tuned using the PISTOL dataset. Principal components are computed using activations from the unverifiable dataset after each block. Activations of the datasets studied are projected onto the same PCA space.

**Figure 5.** Sparse FT: PCA visualization of activations per layer with the Llama3-8B-instruct model fine-tuned using the PISTOL dataset. Principal components are computed using activations from the unverifiable dataset after each block. Activations of the datasets studied are projected onto the same PCA space.

**Figure 6.** SEAT: PCA visualization of activations per layer with the Llama3-8B-instruct model fine-tuned using the PISTOL dataset. Principal components are computed using activations from the unverifiable dataset after each block. Activations of the datasets studied are projected onto the same PCA space.
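The projection used in these figures (principal components fitted on the unverifiable dataset, with every other dataset projected into that same space) can be sketched with NumPy's SVD. The activation matrices are assumed to be already extracted, one row per query (for example the last-token hidden state after a given block); the synthetic arrays below are placeholders, not the paper's data.

```python
import numpy as np

def fit_pca(reference: np.ndarray, n_components: int = 2):
    """Principal axes of the reference set (rows = queries, cols = hidden dims)."""
    mean = reference.mean(axis=0)
    _, _, vt = np.linalg.svd(reference - mean, full_matrices=False)
    return mean, vt[:n_components]

def project(acts: np.ndarray, mean: np.ndarray, components: np.ndarray) -> np.ndarray:
    """Project any dataset's activations into the reference PCA space."""
    return (acts - mean) @ components.T

# Synthetic stand-ins for last-token activations (8 hidden dims).
rng = np.random.default_rng(0)
unverifiable = rng.normal(size=(200, 8)) * np.array([5, 2, 1, 1, 1, 1, 1, 1])
known = rng.normal(loc=3.0, size=(50, 8))
mean, comps = fit_pca(unverifiable)
coords = {name: project(x, mean, comps)
          for name, x in {"unverifiable": unverifiable, "known": known}.items()}
```

Fitting on one fixed reference set, rather than on each dataset separately, is what makes the panels for different models and datasets comparable.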

Original abstract

Adapting LLMs with new knowledge is increasingly important, but standard fine-tuning often erodes aligned epistemic abstention: the ability to acknowledge when the model does not know. This failure mode is especially concerning in high-stakes settings, where abstention is a critical safeguard against hallucination. We present SEAT, a preventive fine-tuning method that preserves epistemic abstention while maintaining strong knowledge acquisition. SEAT combines sparse tuning, which constrains global activation drift, with entity-perturbed KL regularization, which sharpens local epistemic boundaries and prevents spillover to neighboring knowledge. Crucially, SEAT requires no alignment data, explicit boundary probing, or post-hoc re-alignment, making it attractive for lightweight and privacy-sensitive adaptation. Across models and datasets, SEAT improves human-evaluated abstention on unknown queries by 18%-101% over the strongest baseline while retaining near-perfect target knowledge acquisition, and produces coherent, context-aware abstentions after tuning. Further analyses show that both components are essential, that SEAT more cleanly separates known from unknown queries in representation space, and that it preserves downstream utility. These results identify preservation of epistemic abstention as a core objective for safe knowledge adaptation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this section is the friction.

Desk Editor's Note (private letter to a colleague)

SEAT pairs sparse tuning with entity-perturbed KL regularization to keep abstention after knowledge updates, but the unknown-query tests may be too closely tied to the same perturbation trick used in training.


The main thing to know is that SEAT tries to stop fine-tuning from erasing an LLM's ability to admit it does not know, by combining sparse parameter updates with a KL penalty on entity-perturbed examples. It claims this works without alignment data or later fixes and delivers large gains in human-rated abstention on unknowns while still absorbing the new facts cleanly. The paper also reports that both pieces are necessary and that the tuned model separates known from unknown inputs more cleanly in representation space. Those are the concrete results on offer.

The evaluation setup is the clearest soft spot. The regularization sharpens boundaries using entity perturbations, yet the headline abstention improvements are measured on human-evaluated unknown queries whose construction is not shown to be independent of similar edits. If the test cases were filtered or generated in a comparable way, the apparent success could be narrower than it looks and might not extend to arbitrary out-of-knowledge inputs. The abstract is also thin on experimental controls, dataset details, and statistical checks, which makes it hard to judge how fairly the baselines were run or how stable the numbers are across models. If the full paper clarifies test-query independence and includes proper ablations, the central claim would stand on firmer ground.

This work is aimed at groups doing practical adaptation of LLMs in settings where hallucinations carry real costs. A reader who needs lightweight methods that avoid extra data collection would find the approach relevant. I would send it to peer review because the problem is real and the method is simple enough to test, even though the current evidence leaves some questions about generalization open.

Referee Report

2 major / 2 minor

Summary. The paper proposes SEAT, a preventive fine-tuning method for LLMs that combines sparse tuning to constrain global activation drift with entity-perturbed KL regularization to sharpen local epistemic boundaries. The central claim is that SEAT enables effective adaptation to new knowledge while preserving epistemic abstention on unknown queries—improving human-evaluated abstention by 18-101% over the strongest baseline—without requiring alignment data, boundary probing, or post-hoc re-alignment, while retaining near-perfect target knowledge acquisition, producing coherent context-aware abstentions, and preserving downstream utility. Analyses reportedly confirm both components are essential and yield cleaner separation of known vs. unknown queries in representation space.

Significance. If the empirical claims hold under rigorous controls, SEAT would offer a practical, lightweight approach to mitigating hallucination risks during knowledge adaptation in high-stakes domains. The absence of reliance on extra alignment data or interventions distinguishes it from prior work and could facilitate safer deployment in privacy-sensitive settings. The reported representation-space separation and ablation results, if substantiated, would strengthen the case for treating abstention preservation as a first-class objective in adaptation pipelines.

major comments (2)

[§4, §4.2] §4 (Experiments) and §4.2 (Unknown-query evaluation): The headline gains in human-evaluated abstention on unknown queries are measured on queries whose construction is not demonstrated to be independent of the entity-perturbation mechanism used in the KL regularization term. If test unknowns are generated or filtered via analogous edits, the cleaner separation and context-aware abstentions could be an in-distribution artifact rather than evidence that the two components suffice for general epistemic abstention on arbitrary out-of-knowledge inputs. This directly affects the central claim that SEAT preserves abstention without alignment data or post-hoc fixes.
[Abstract, §3] Abstract and §3 (Method): The quantitative improvements (18%-101% abstention gains, near-perfect knowledge acquisition) are stated without reference to experimental details, controls, statistical tests, ablation tables, or variance estimates. This prevents verification of whether the data support the claims as stated and makes it impossible to assess whether the reported superiority over baselines is robust.
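One way to supply the variance estimates requested here is a paired bootstrap over queries. The sketch below is an assumed procedure, not the paper's: it takes per-query 0/1 abstention labels for two methods on the same unknown queries and returns the difference in abstention rates with a percentile confidence interval. The function name and defaults are hypothetical.

```python
import numpy as np

def paired_bootstrap_ci(a, b, n_boot=2000, alpha=0.05, seed=0):
    """Difference in abstention rate, mean(a) - mean(b), with a percentile CI.
    a[i] and b[i] are 0/1 labels for the same query i under two methods."""
    a = np.asarray(a, dtype=float)
    b = np.asarray(b, dtype=float)
    if a.shape != b.shape or a.ndim != 1:
        raise ValueError("a and b must be 1-D and paired per query")
    rng = np.random.default_rng(seed)
    n = len(a)
    idx = rng.integers(0, n, size=(n_boot, n))  # resample queries, keeping pairs together
    diffs = a[idx].mean(axis=1) - b[idx].mean(axis=1)
    lo, hi = np.quantile(diffs, [alpha / 2, 1 - alpha / 2])
    return a.mean() - b.mean(), (float(lo), float(hi))

# Toy labels (illustrative only): 1 = abstained on an unknown query.
method = [1, 1, 0, 1, 0, 1, 1, 0, 1, 1]
baseline = [0, 1, 0, 0, 0, 1, 0, 0, 1, 0]
print(paired_bootstrap_ci(method, baseline))
```

Resampling query indices jointly, rather than each method independently, respects the fact that both methods answer the same queries.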

minor comments (2)

[Abstract] The abstract would benefit from a brief sentence on the datasets, model sizes, and human-evaluation protocol to allow readers to gauge the scope of the reported gains.
[§3] Notation for the entity-perturbed KL term and the sparsity mask should be introduced with explicit equations in §3 to improve reproducibility.
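For illustration only, one plausible way to write both components explicitly is shown below. The parameterization $\theta = \theta_0 + m \odot \Delta$, the KL direction, the new-knowledge data $\mathcal{D}_{\text{new}}$, the entity-perturbation distribution $\mathcal{P}(\cdot \mid x)$, and the weight $\lambda$ are assumptions, not the paper's notation.

```latex
\begin{aligned}
\theta &= \theta_0 + m \odot \Delta, \qquad m \in \{0,1\}^d,\ \Delta \in \mathbb{R}^d, \\
\mathcal{L}(\Delta) &= \mathcal{L}_{\text{task}}(\theta_0 + m \odot \Delta)
  + \lambda\, \mathbb{E}_{x \sim \mathcal{D}_{\text{new}},\ \tilde{x} \sim \mathcal{P}(\cdot \mid x)}
  \Big[ \mathrm{KL}\big( p_{\theta_0}(\cdot \mid \tilde{x}) \,\big\|\, p_{\theta_0 + m \odot \Delta}(\cdot \mid \tilde{x}) \big) \Big].
\end{aligned}
```

The first line encodes sparse tuning (only coordinates with $m_i = 1$ can change); the second adds the penalty that keeps the tuned model close to the base model on perturbed, unknown-entity prompts.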

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments raise important points about evaluation independence and the clarity of quantitative claims. We address each major comment below with specific responses and proposed revisions.


Referee: [§4, §4.2] §4 (Experiments) and §4.2 (Unknown-query evaluation): The headline gains in human-evaluated abstention on unknown queries are measured on queries whose construction is not demonstrated to be independent of the entity-perturbation mechanism used in the KL regularization term. If test unknowns are generated or filtered via analogous edits, the cleaner separation and context-aware abstentions could be an in-distribution artifact rather than evidence that the two components suffice for general epistemic abstention on arbitrary out-of-knowledge inputs. This directly affects the central claim that SEAT preserves abstention without alignment data or post-hoc fixes.

Authors: We appreciate the referee's concern regarding potential dependence between the training regularization and test query construction. In the current experiments, unknown queries are drawn from held-out entities and queries that were never subjected to the entity-perturbation procedure; the perturbation is applied exclusively during training on known entities to sharpen local boundaries. Test unknowns are identified solely by their absence from the adaptation knowledge base using dataset partitioning that precedes any perturbation. Nevertheless, to make this independence fully explicit and to rule out artifacts, we will revise §4.2 to include a dedicated subsection detailing the query selection protocol, the temporal and entity-level separation criteria, and an additional control experiment using naturally occurring unknown queries from an external disjoint corpus. This addresses the core validity concern while preserving the central claim.
Revision: partial.
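The protocol described in this response (split entities first, perturb only known entities, keep held-out unknowns out of the perturbation step) can be expressed as a simple check. Everything here is hypothetical: `split_entities`, `check_independence`, and the separate pool of replacement names used by the perturbation are illustrative assumptions, not the authors' code.

```python
import random

def split_entities(entities, held_out_frac=0.2, seed=0):
    """Entity-level split, done before any perturbation is generated."""
    ents = sorted(set(entities))
    random.Random(seed).shuffle(ents)
    n_held = max(1, int(len(ents) * held_out_frac))
    return set(ents[n_held:]), set(ents[:n_held])  # (known, held-out unknown)

def check_independence(known, unknown, perturbation_names):
    """True only if held-out unknowns never enter training or the perturbation step."""
    return known.isdisjoint(unknown) and unknown.isdisjoint(perturbation_names)

entities = [f"entity_{i}" for i in range(10)]
known, unknown = split_entities(entities)
# Hypothetical replacement names used when perturbing known entities during training.
perturbation_names = {"Zenith Ltd", "Quill Press"}
print(check_independence(known, unknown, perturbation_names))
```

A check of this kind directly targets the referee's concern: if any test unknown also appears among the perturbation names, the abstention gains could be in-distribution.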
Referee: [Abstract, §3] Abstract and §3 (Method): The quantitative improvements (18%-101% abstention gains, near-perfect knowledge acquisition) are stated without reference to experimental details, controls, statistical tests, ablation tables, or variance estimates. This prevents verification of whether the data support the claims as stated and makes it impossible to assess whether the reported superiority over baselines is robust.

Authors: We agree that the abstract and §3 would be strengthened by explicit cross-references to the supporting evidence. The reported gains are obtained from the main results table, the component ablations, human evaluation protocol, and statistical tests (including variance and significance) that appear in §4 and the appendix. We will update the abstract to include brief parenthetical references to the relevant tables and sections. In §3 we will add a short paragraph summarizing the evaluation controls, statistical procedures, and ablation design, with direct pointers to the empirical sections. These changes will allow readers to trace each quantitative claim to its supporting data without altering the reported numbers.
Revision: yes.

Circularity Check

0 steps flagged

No significant circularity detected in derivation or evaluation

full rationale

The paper describes SEAT as combining sparse tuning to limit activation drift with entity-perturbed KL regularization to sharpen epistemic boundaries, without alignment data or post-hoc fixes. Claims of 18-101% abstention gains rest on human-evaluated unknown queries and representation-space analyses that are presented as independent empirical outcomes. No equations, self-citations, or construction steps are shown reducing the reported improvements to the training regularization by definition; the method and results remain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract; no explicit free parameters, axioms, or invented entities are identifiable. The approach builds on standard concepts of sparsity and KL divergence but specifics are not provided.

pith-pipeline@v0.9.0 · 5751 in / 1097 out tokens · 39270 ms · 2026-05-19T09:30:39.376446+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "SEAT integrates two key components: (1) sparse training that constrains activation drift, and (2) a novel entity perturbation method with KL-divergence regularization"

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear
Relation between the paper passage and the cited Recognition theorem.
Paper passage: "PCA visualization of activations ... seen (factual) and unseen ... clearly separable"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints
cs.AI 2026-04 unverdicted novelty 6.0

Coupled constraints on weight updates in a safety subspace and regularization of SAE-identified safety features preserve LLM refusal behaviors during fine-tuning better than weight-only or activation-only methods.
