Activation patching reveals that citation decisions in Llama-3.1-8B RAG are implemented by a distributed attributional ensemble of heads and layers; targeted interventions fix most missed and spurious citations on PopQA.
Modi, Bradley D
2 Pith papers cite this work. Polarity classification is still indexing.
Pith papers citing it: 2
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
Literature on system prompts for AI shows fragmented and contradictory claims that complicate policy efforts to use them as reliable governance mechanisms.
citing papers explorer
- How Do LLMs Cite? A Mechanistic Interpretation of Attribution in Retrieval-Augmented Generation
Activation patching reveals that citation decisions in Llama-3.1-8B RAG are implemented by a distributed attributional ensemble of heads and layers; targeted interventions fix most missed and spurious citations on PopQA.