Breaking the Chain: A Causal Analysis of LLM Faithfulness to Intermediate Structures

2026 · cs.AI · arXiv 2603.16475

1 Pith paper cites this work.

abstract

In schema-guided reasoning (SGR) pipelines, LLMs produce explicit intermediate structures -- rubrics, checklists, or verification queries -- before committing to a final decision. SGR is increasingly adopted because it promises controllability: practitioners expect to inspect, edit, and override these structures to steer the outcome. But does the promise hold? We introduce a causal evaluation protocol to measure it: by selecting tasks where a deterministic function maps intermediate structures to decisions, every controlled edit implies a unique correct output. Across 12 models and 4 benchmarks, models appear self-consistent with their own intermediate structures but fail to update predictions after intervention -- revealing that apparent faithfulness is fragile once the intermediate structure changes. When derivation of the final decision from the structure is delegated to an external tool, this fragility largely disappears; stronger prompting yields only limited improvements, while preference optimization substantially improves intervention faithfulness. Overall, intermediate structures in schema-guided pipelines function as influential context rather than stable causal mediators.
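The core idea of the protocol can be illustrated with a toy Python sketch. Everything in it is a hypothetical stand-in, not the paper's code or benchmarks. The checklist structure, the `decide` rule (accept only if every item passes), the single-item `flip` edit, and the two simulated models are illustrative assumptions. Scoring only edits that change the implied decision is also our choice. The sketch shows how a model can be fully self-consistent on its own structures yet score zero after intervention, and why delegating the derivation to an external tool makes intervention faithfulness hold by construction.

```python
from typing import Callable, List

# Hypothetical intermediate structure: a checklist of pass/fail verdicts.
Checklist = List[bool]

# A "model" maps (structure it originally generated, structure currently shown)
# to a decision. Passing the original lets us simulate anchoring on it.
Model = Callable[[Checklist, Checklist], str]


def decide(checklist: Checklist) -> str:
    """Hypothetical deterministic rule: accept only if every item passes."""
    return "accept" if all(checklist) else "reject"


def flip(checklist: Checklist, i: int) -> Checklist:
    """Controlled edit: negate item i and leave the others unchanged."""
    edited = list(checklist)
    edited[i] = not edited[i]
    return edited


def self_consistency(model: Model, structures: List[Checklist]) -> float:
    """Fraction of unedited structures where the model agrees with decide()."""
    hits = sum(model(s, s) == decide(s) for s in structures)
    return hits / len(structures)


def intervention_faithfulness(model: Model, structures: List[Checklist]) -> float:
    """Fraction of decision-changing single-item edits after which the model
    outputs the unique decision that decide() implies for the edited structure."""
    hits = total = 0
    for s in structures:
        for i in range(len(s)):
            edited = flip(s, i)
            if decide(edited) == decide(s):
                continue  # skip edits that leave the implied decision unchanged
            total += 1
            hits += model(s, edited) == decide(edited)
    if total == 0:
        raise ValueError("no decision-changing edits to evaluate")
    return hits / total


def sticky_model(original: Checklist, shown: Checklist) -> str:
    """Simulated failure mode: keeps the decision implied by its own
    original structure and ignores the edit."""
    return decide(original)


def tool_delegated(original: Checklist, shown: Checklist) -> str:
    """Derivation delegated to an external tool: decide() on the shown structure."""
    return decide(shown)


structures = [[True, True, True], [True, False, True]]
print(self_consistency(sticky_model, structures))             # 1.0
print(intervention_faithfulness(sticky_model, structures))    # 0.0
print(intervention_faithfulness(tool_delegated, structures))  # 1.0
```

The sticky model scores 1.0 on self-consistency because, on its own unedited structures, anchoring on the original and following the shown structure give the same answer. Only an intervention separates the two behaviours.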

representative citing papers

Faithful by Definition: Emotion Analysis via Natural Semantic Metalanguage Explications

cs.CL · 2026-07-01 · unverdicted · novelty 5.0

An NSM-based explication parser with fixed semantic rules produces emotion labels for events, achieving 0.33 accuracy on held-out crowd-sourced data while shifting empirical risk to an inspectable parser.
