STRIDE co-trains generator and verifier on outcome rewards alone to deliver learnable stepwise language feedback that redirects LLM reasoning trajectories and outperforms scalar-reward baselines.
Step back to leap forward: Self-backtracking for boosting reasoning of language models
3 Pith papers cite this work. Polarity classification is still indexing.
years
2026 3representative citing papers
Behavior Cue Reasoning trains LLMs to emit special tokens before behaviors, enabling monitors to cut up to 50% wasted reasoning tokens and recover safe actions from 80% of unsafe traces, more than doubling success rates with no performance cost.
LLMs implement a second-order confidence architecture where the PANL activation encodes both error likelihood and the ability to correct it, beyond verbal confidence or log-probabilities.
citing papers explorer
-
STRIDE: Learnable Stepwise Language Feedback for LLM Reasoning
STRIDE co-trains generator and verifier on outcome rewards alone to deliver learnable stepwise language feedback that redirects LLM reasoning trajectories and outperforms scalar-reward baselines.
-
Behavior Cue Reasoning: Monitorable Reasoning Improves Efficiency and Safety through Oversight
Behavior Cue Reasoning trains LLMs to emit special tokens before behaviors, enabling monitors to cut up to 50% wasted reasoning tokens and recover safe actions from 80% of unsafe traces, more than doubling success rates with no performance cost.
-
How LLMs Detect and Correct Their Own Errors: The Role of Internal Confidence Signals
LLMs implement a second-order confidence architecture where the PANL activation encodes both error likelihood and the ability to correct it, beyond verbal confidence or log-probabilities.