Fine-tuning on new knowledge induces propagating hallucinations in LLMs by weakening attention to key entities, with mitigation via reintroducing known knowledge during later training stages.
2 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
- Understanding New-Knowledge-Induced Factual Hallucinations in LLMs: Analysis and Interpretation
Fine-tuning on new knowledge induces propagating hallucinations in LLMs by weakening attention to key entities, with mitigation via reintroducing known knowledge during later training stages.
- Benchmarking and Improving Monitors for Out-Of-Distribution Alignment Failure in LLMs