Output vector editing on MLP neurons suppresses memorization in LLMs up to 87.9% on 6831 sequences in OLMo-7B with a 2.7x gap over zero ablation, ensemble covering 96.5%.
Do Localization Methods Actually Localize Memorized Data in LLM s? A Tale of Two Benchmarks
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
Empirical benchmarks show distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing strongest protection for OOD cases.
citing papers explorer
-
Output Vector Editing for Memorization Mitigation in Large Language Models
Output vector editing on MLP neurons suppresses memorization in LLMs up to 87.9% on 6831 sequences in OLMo-7B with a 2.7x gap over zero ablation, ensemble covering 96.5%.
-
Benchmarking Empirical Privacy Protection for Adaptations of Large Language Models
Empirical benchmarks show distribution similarity between adaptation and pretraining data increases practical privacy leakage in DP-adapted LLMs at fixed theoretical guarantees, with LoRA providing strongest protection for OOD cases.