Explanation biases in feature attribution methods are systematic products of lexical and positional preferences, with observed trade-offs across models and higher bias in anomalous explanations.
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6190–6197, Singapore
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2025 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Explanation Bias is a Product: Revealing the Hidden Lexical and Position Preferences in Post-Hoc Feature Attribution
Explanation biases in feature attribution methods are systematic products of lexical and positional preferences, with observed trade-offs across models and higher bias in anomalous explanations.