Activation patching reveals that citation decisions in Llama-3.1-8B RAG are implemented by a distributed attributional ensemble of heads and layers; targeted interventions fix most missed and spurious citations on PopQA.
Annals of Internal Medicine (Jun 2025)
1 Pith paper cites this work. Polarity classification is still indexing.
fields: cs.IR (1)
years: 2026 (1)
verdicts: UNVERDICTED (1)
representative citing papers
How Do LLMs Cite? A Mechanistic Interpretation of Attribution in Retrieval-Augmented Generation