Self-explanations from LLMs produce faithful token subsets for correct predictions but align with human rationales only conditionally on text length and task complexity, unlike post-hoc attribution methods that highlight structural tokens.
In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 6907–6920, Singapore
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CL (1)
Years: 2024 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
A Systematic Comparison between Extractive Self-Explanations and Human Rationales in Text Classification