On measuring faithfulness or self-consistency of natural language explanations

Letitia Parcalabescu, Anette Frank · 2024 · DOI 10.18653/v1/2024.acl-long.329

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency

cs.CL · 2026-04-17 · unverdicted · novelty 6.0

AtManRL learns an additive attention mask on CoT traces to produce a saliency reward that, when combined with outcome rewards in GRPO, trains LLMs to generate reasoning that genuinely influences final predictions.

Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models

cs.CL · 2026-04-16 · unverdicted · novelty 6.0

VLMs show answer inertia in CoT reasoning and remain influenced by misleading textual cues even with sufficient visual evidence, making CoT an incomplete window into modality reliance.

citing papers explorer

AtManRL: Towards Faithful Reasoning via Differentiable Attention Saliency cs.CL · 2026-04-17 · unverdicted · none · ref 15
Reasoning Dynamics and the Limits of Monitoring Modality Reliance in Vision-Language Models cs.CL · 2026-04-16 · unverdicted · none · ref 22
