Dominant error in attribution patching arises from downstream non-linearities; a single HVP correction removes the leading error term and matches Integrated Gradients accuracy at lower cost across 124M-9B models.
per-head median over K ∈ {5, 10, 20}
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
When Attribution Patching Lies: Diagnosis and a Second-Order Correction