SEAT: Sparse Entity-Aware Tuning for Knowledge Adaptation while Preserving Epistemic Abstention

William F. Shen , Xinchi Qiu , Nicola Cancedda , Nicholas D. Lane

Authors on Pith no claims yet

classification 💻 cs.AI

keywords abstentionknowledgeseatepistemicadaptationtuningwhileacquisition

read the original abstract

Adapting LLMs with new knowledge is increasingly important, but standard fine-tuning often erodes aligned epistemic abstention: the ability to acknowledge when the model does not know. This failure mode is especially concerning in high-stakes settings, where abstention is a critical safeguard against hallucination. We present SEAT, a preventive fine-tuning method that preserves epistemic abstention while maintaining strong knowledge acquisition. SEAT combines sparse tuning, which constrains global activation drift, with entity-perturbed KL regularization, which sharpens local epistemic boundaries and prevents spillover to neighboring knowledge. Crucially, SEAT requires no alignment data, explicit boundary probing, or post-hoc re-alignment, making it attractive for lightweight and privacy-sensitive adaptation. Across models and datasets, SEAT improves human-evaluated abstention on unknown queries by 18%-101% over the strongest baseline while retaining near-perfect target knowledge acquisition, and produces coherent, context-aware abstentions after tuning. Further analyses show that both components are essential, that SEAT more cleanly separates known from unknown queries in representation space, and that it preserves downstream utility. These results identify preservation of epistemic abstention as a core objective for safe knowledge adaptation.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Preventing Safety Drift in Large Language Models via Coupled Weight and Activation Constraints
cs.AI 2026-04 unverdicted novelty 6.0

Coupled constraints on weight updates in a safety subspace and regularization of SAE-identified safety features preserve LLM refusal behaviors during fine-tuning better than weight-only or activation-only methods.