Verify Before You Commit: Towards Faithful Reasoning in LLM Agents via Self-Auditing

· 2026 · cs.AI · arXiv 2604.08401

1 Pith paper cites this work. Polarity classification is still indexing.



abstract

In large language model (LLM) agents, reasoning trajectories are treated as reliable internal beliefs for guiding actions and updating memory. However, coherent reasoning can still violate logical or evidential constraints, allowing unsupported beliefs to be repeatedly stored and propagated across decision steps, which leads to systematic behavioral drift in long-horizon agentic systems. Most existing strategies rely on consensus mechanisms, conflating agreement with faithfulness. In this paper, motivated by the vulnerability that unfaithful intermediate reasoning trajectories create, we propose Self-Audited Verified Reasoning (SAVeR), a novel framework that enforces verification over the agent's internal belief states before action commitment, achieving faithful reasoning. Concretely, we structurally generate diverse persona-based candidate beliefs and select among them in a faithfulness-relevant structure space. To achieve reasoning faithfulness, we perform adversarial auditing to localize violations and repair them through constraint-guided minimal interventions under verifiable acceptance criteria. Extensive experiments on six benchmark datasets demonstrate that our approach consistently improves reasoning faithfulness while preserving competitive end-task performance.

representative citing papers

OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories

cs.AI · 2026-05-28 · unverdicted · novelty 6.0

OpenClawBench annotates 31,264 agent trajectories to show that roughly 9% of task-successful executions contain measurable process anomalies, and a fine-tuned detector reaches F1 0.729 on held-out data.

citing papers explorer

Showing 1 of 1 citing paper.

OpenClawBench: Benchmarking Process-side Anomalies in Real-world Agent Execution Trajectories
cs.AI · 2026-05-28 · unverdicted · none · ref 24 · internal anchor
