Proposes SCSuff metric for evaluating LLM explanation sufficiency via model-generated alternative inputs, showing explanations are typically insufficient and predictable from hidden states.
G lob E nc: Quantifying Global Token Attribution by Incorporating the Whole Encoder Layer in Transformers
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
fields
cs.LG 2years
2026 2verdicts
UNVERDICTED 2representative citing papers
A unified multi-component causal tracing method that uses soft interventions and a metric transformation to efficiently select critical LLM components for a target performance metric.
citing papers explorer
-
What LLMs explain is not what they believe: Evaluating explanation sufficiency under models' own input beliefs
Proposes SCSuff metric for evaluating LLM explanation sufficiency via model-generated alternative inputs, showing explanations are typically insufficient and predictable from hidden states.
-
Multi-component Causal Tracing in Large Language Models
A unified multi-component causal tracing method that uses soft interventions and a metric transformation to efficiently select critical LLM components for a target performance metric.