Output vector editing on MLP neurons suppresses memorization in LLMs up to 87.9% on 6831 sequences in OLMo-7B with a 2.7x gap over zero ablation, ensemble covering 96.5%.
The butterfly effect of model editing: Few edits can trigger large language models collapse
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.CL 2years
2026 2representative citing papers
Parameter-based knowledge editing in LLMs induces reasoning collapse via dimensional collapse and is consistently outperformed by a retrieval baseline across varied edit counts, knowledge complexity, and evaluation metrics.
citing papers explorer
-
Output Vector Editing for Memorization Mitigation in Large Language Models
Output vector editing on MLP neurons suppresses memorization in LLMs up to 87.9% on 6831 sequences in OLMo-7B with a 2.7x gap over zero ablation, ensemble covering 96.5%.
-
Revisiting Parameter-Based Knowledge Editing in Large Language Models: Theoretical Limits and Empirical Evidence
Parameter-based knowledge editing in LLMs induces reasoning collapse via dimensional collapse and is consistently outperformed by a retrieval baseline across varied edit counts, knowledge complexity, and evaluation metrics.