pith. machine review for the scientific record.

arxiv: 2604.22273 · v2 · submitted 2026-04-24 · 💻 cs.AI


Self-Correction as Feedback Control: Error Dynamics, Stability Thresholds, and Prompt Interventions in LLMs


Pith reviewed 2026-05-08 12:15 UTC · model grok-4.3

classification 💻 cs.AI
keywords self-correction · LLMs · error introduction rate · Markov model · prompt intervention · stability threshold · feedback control · error dynamics

The pith

Iterative self-correction in LLMs improves accuracy only when the model's error introduction rate stays below a 0.5 percent threshold.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats repeated self-correction as a closed-loop control process in which the LLM serves as both the controller and the system being controlled. It models the process with a simple two-state Markov chain whose parameters are the rate at which the model introduces new errors and the rate at which it fixes existing ones. From this model the authors derive an explicit stability condition: iteration is worthwhile only when the ratio of the correction rate to the introduction rate exceeds the current accuracy divided by one minus accuracy. Experiments on seven models and three datasets confirm a sharp empirical cutoff near a 0.5 percent error introduction rate that cleanly separates models whose accuracy rises or holds steady from those whose accuracy falls. A targeted prompt change that forces verification before correction is then shown to drive the error introduction rate to zero and reverse a performance drop into a small gain, matching the model's prediction.
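To make the dynamics concrete, the sketch below simulates that two-state chain directly. It is a minimal illustration under the paper's constant-rate assumption; the function name and the specific rates are ours, chosen only to sit on either side of the threshold.

```python
import random

def simulate_self_correction(acc0, eir, ecr, iterations=5, n=10000, seed=0):
    """Monte Carlo sketch of a two-state {Correct, Incorrect} chain.

    Each of n problems starts Correct with probability acc0. At every
    iteration a Correct answer breaks with probability eir and an
    Incorrect one is fixed with probability ecr; both rates are held
    constant, mirroring the paper's time-homogeneity assumption.
    Returns the accuracy trajectory across iterations.
    """
    rng = random.Random(seed)
    correct = [rng.random() < acc0 for _ in range(n)]
    trajectory = [sum(correct) / n]
    for _ in range(iterations):
        correct = [rng.random() >= eir if c else rng.random() < ecr
                   for c in correct]
        trajectory.append(sum(correct) / n)
    return trajectory

# At acc0 = 0.80 the threshold ratio Acc/(1-Acc) is 4.
print(simulate_self_correction(0.80, eir=0.004, ecr=0.10))  # ratio 25: accuracy climbs
print(simulate_self_correction(0.80, eir=0.020, ecr=0.05))  # ratio 2.5: accuracy decays
```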

Core claim

By recasting self-correction as feedback control and parameterizing its dynamics with a two-state Markov chain over correct and incorrect states, the work shows that net performance change is governed by the ratio of error correction rate to error introduction rate. A measurable stability threshold follows directly: iterate only when ECR/EIR exceeds Acc/(1-Acc). Across models, only those whose error introduction rate remains below 0.5 percent exhibit non-degrading or improving behavior under iteration, while higher rates produce consistent degradation. A verify-first prompt intervention supplies causal confirmation by lowering GPT-4o-mini's error introduction rate from 2 percent to 0 percent.

What carries the argument

Two-state Markov model over {Correct, Incorrect} states parameterized by Error Introduction Rate (EIR) and Error Correction Rate (ECR), which directly yields the stability threshold ECR/EIR > Acc/(1-Acc).
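A sketch of the algebra behind that threshold, reconstructed from the stationary-distribution argument quoted in the circularity check below:

```latex
% Stationary error probability of the chain with
% P(Correct -> Incorrect) = EIR and P(Incorrect -> Correct) = ECR:
\[
  p_\infty = \frac{\mathrm{EIR}}{\mathrm{EIR} + \mathrm{ECR}}.
\]
% Iteration is beneficial iff long-run accuracy beats current accuracy:
\[
  1 - p_\infty > \mathrm{Acc}
  \;\Longleftrightarrow\;
  \frac{\mathrm{ECR}}{\mathrm{EIR} + \mathrm{ECR}} > \mathrm{Acc}
  \;\Longleftrightarrow\;
  \mathrm{ECR}\,(1 - \mathrm{Acc}) > \mathrm{EIR}\cdot\mathrm{Acc}
  \;\Longleftrightarrow\;
  \frac{\mathrm{ECR}}{\mathrm{EIR}} > \frac{\mathrm{Acc}}{1 - \mathrm{Acc}}.
\]
```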

If this is right

  • Only the three models whose measured EIR falls below 0.5 percent maintain or increase accuracy under repeated self-correction.
  • The verify-first prompt drives EIR to zero and converts a 6.2-point accuracy loss into a 0.2-point gain for GPT-4o-mini.
  • Adaptive self-consistency stops harmful iteration but incurs a 3.8-point confidence-elicitation cost.
  • Prompt-level reduction of EIR prevents degradation while genuine accuracy gains require improvement of the error correction rate, likely through training.
  • Self-correction should be applied selectively as a control decision rather than as default behavior; a minimal decision gate is sketched after this list.
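Taken together, these points reduce to a one-line control decision. A hypothetical sketch of that gate (the function name and the handling of the EIR = 0 edge case are ours, not the paper's):

```python
def should_iterate(acc: float, eir: float, ecr: float) -> bool:
    """Stability gate from the paper: iterate only when ECR/EIR > Acc/(1-Acc).

    acc: current accuracy estimate on the task, strictly in (0, 1).
    eir: measured error introduction rate; eir == 0 means iteration
         cannot break anything, so it helps whenever anything gets fixed.
    ecr: measured error correction rate.
    """
    if not 0.0 < acc < 1.0:
        raise ValueError("acc must be strictly between 0 and 1")
    if eir == 0.0:
        return ecr > 0.0
    return ecr / eir > acc / (1.0 - acc)

# Example: at 80% accuracy the ratio must exceed 4.
print(should_iterate(acc=0.80, eir=0.005, ecr=0.10))  # True  (ratio 20)
print(should_iterate(acc=0.80, eir=0.020, ecr=0.05))  # False (ratio 2.5)
```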

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Training techniques that systematically lower EIR could turn self-correction into a reliably beneficial component of agentic systems.
  • The same stability-threshold logic may apply to other iterative refinement loops such as tool-use chains or multi-agent debate.
  • Repeating the EIR measurement protocol on additional tasks would test whether the 0.5 percent boundary is task-independent.

Load-bearing premise

The chance that the model introduces or corrects an error stays constant from one iteration to the next and does not depend on the question content or prior answers.
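Stated in matrix form: the premise is that a single transition matrix, with rows and columns ordered (Correct, Incorrect), governs every iteration and every problem.

```latex
\[
  P \;=\;
  \begin{pmatrix}
    1 - \mathrm{EIR} & \mathrm{EIR} \\
    \mathrm{ECR} & 1 - \mathrm{ECR}
  \end{pmatrix},
  \qquad
  P_t = P \quad \text{for all iterations } t.
\]
```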

What would settle it

Either of two observations would falsify the claimed separation and causal mechanism: an EIR measured above 0.5 percent on a new model or dataset that still yields consistent accuracy gains over multiple self-correction rounds, or a verify-first prompt that leaves EIR unchanged while accuracy nevertheless improves.

Figures

Figures reproduced from arXiv: 2604.22273 by Aofan Liu, Jingxiang Meng.

Figure 1. Three-layer view of iterative self-correction as a Markov feedback loop.
Figure 2. Two-tier capability map: Tier 1 (EIR suppression, prompt-level), Tier 2 (ECR enhancement, training-level).
original abstract

Iterative self-correction is increasingly deployed in agentic LLM systems, yet whether repeated refinement improves or degrades performance remains inconsistent across models. We recast self-correction as a closed-loop feedback-control problem in which the same model is both controller and plant, and analyze its error dynamics via a two-state Markov model over {Correct, Incorrect}, parameterized by the Error Introduction Rate (EIR) and Error Correction Rate (ECR). The model yields a directly measurable stability threshold -- iterate only when ECR/EIR > Acc/(1-Acc) -- in which EIR acts as a stability margin and prompting becomes lightweight controller design. Empirically, across 7 models and 3 datasets (GSM8K, MATH, StrategyQA), a sharp near-zero EIR boundary (< 0.5%) cleanly separates beneficial from harmful self-correction: only o3-mini (+3.4 pp), Claude Opus 4.6 (+0.6 pp), and o4-mini (+/-0 pp) stay non-degrading, while GPT-5 and four others lose accuracy. A verify-first prompt intervention then provides causal evidence: it drives GPT-4o-mini's EIR from 2% to 0% and converts a -6.2 pp degradation into +0.2 pp (paired McNemar, p<10^{-4}), with negligible change on already-sub-threshold models -- exactly as the diagnostic predicts. A complementary analysis of adaptive self-consistency (ASC) shows it halts harmful refinement at a 3.8 pp confidence-elicitation cost, exposing a two-tier capability structure: prompt-level EIR suppression prevents degradation, whereas ECR enhancement -- plausibly training-level -- is required for genuine gains. Self-correction should thus be treated not as a default behavior but as a control decision governed by measurable error dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper recasts iterative self-correction in LLMs as a closed-loop feedback control system using a two-state Markov chain over {Correct, Incorrect} states, parameterized by Error Introduction Rate (EIR) and Error Correction Rate (ECR). From the steady-state equations it derives a stability threshold ECR/EIR > Acc/(1-Acc) under which iteration is beneficial, with EIR serving as a measurable stability margin. Empirically, across 7 models and 3 datasets (GSM8K, MATH, StrategyQA), an EIR boundary below 0.5% sharply separates beneficial from harmful self-correction; a verify-first prompt intervention reduces GPT-4o-mini's EIR from 2% to 0% and reverses a -6.2 pp degradation to +0.2 pp (paired McNemar p<10^{-4}). The work also examines adaptive self-consistency as a halting mechanism.

Significance. If the central claims hold, the paper supplies a principled, testable control-theoretic framework for deciding when self-correction should be applied, shifting it from default behavior to a diagnosable control decision. Strengths include the derivation of a directly measurable inequality from the Markov steady-state, consistent empirical separation by EIR across multiple models and tasks, and causal evidence from the prompt intervention with statistical testing. This could inform prompt engineering and agentic system design by providing lightweight diagnostics and interventions.

major comments (1)
  1. [Markov model derivation] In the section deriving the stability threshold from the two-state Markov chain, the inequality ECR/EIR > Acc/(1-Acc) and the claimed sharp <0.5% EIR boundary are obtained by solving the stationary distribution under the assumption that EIR and ECR are time-homogeneous, independent of iteration count and problem state. This assumption is load-bearing: if EIR increases in later iterations (remaining errors are harder) or ECR decreases as context accumulates, the long-run accuracy prediction and the boundary no longer hold, and the empirical separation could be coincidental rather than model-validated. The manuscript should supply per-iteration EIR/ECR measurements or a sensitivity analysis to substantiate the assumption.
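One way to run the per-iteration check the referee requests: estimate EIR and ECR separately at each transition from per-problem correctness traces. A hypothetical sketch (the manuscript's actual measurement protocol may differ):

```python
def per_iteration_rates(runs):
    """Estimate EIR/ECR at each iteration from per-problem correctness traces.

    runs: list of boolean lists, one per problem; runs[i][t] is whether
    problem i is answered correctly after iteration t. Returns, for each
    transition t -> t+1, (EIR_t, ECR_t) as observed flip frequencies.
    Roughly constant values across t would support time-homogeneity.
    """
    rates = []
    n_steps = len(runs[0]) - 1
    for t in range(n_steps):
        c = sum(1 for r in runs if r[t])                      # correct at t
        c2i = sum(1 for r in runs if r[t] and not r[t + 1])   # correct -> incorrect
        i = len(runs) - c                                     # incorrect at t
        i2c = sum(1 for r in runs if not r[t] and r[t + 1])   # incorrect -> correct
        rates.append((c2i / c if c else 0.0, i2c / i if i else 0.0))
    return rates

traces = [[True, True, True], [True, False, False], [False, False, True]]
print(per_iteration_rates(traces))  # [(0.5, 0.0), (0.0, 0.5)]
```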
minor comments (2)
  1. [Abstract] Abstract: the phrase 'negligible change on already-sub-threshold models' would be strengthened by reporting the specific accuracy deltas or referencing the relevant table/figure.
  2. [Notation and definitions] Notation: expand EIR and ECR on first use in the main text body even if already defined in the abstract, and clarify how 'Acc' is computed in the threshold inequality.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the careful analysis of the Markov model and for identifying the importance of validating its core assumptions. We address the concern directly below and have revised the manuscript to incorporate additional empirical checks and analysis.

point-by-point responses
  1. Referee: In the section deriving the stability threshold from the two-state Markov chain, the inequality ECR/EIR > Acc/(1-Acc) and the claimed sharp <0.5% EIR boundary are obtained by solving the stationary distribution under the assumption that EIR and ECR are time-homogeneous, independent of iteration count and problem state. This assumption is load-bearing: if EIR increases in later iterations (remaining errors are harder) or ECR decreases as context accumulates, the long-run accuracy prediction and the boundary no longer hold, and the empirical separation could be coincidental rather than model-validated. The manuscript should supply per-iteration EIR/ECR measurements or a sensitivity analysis to substantiate the assumption.

    Authors: We agree that the time-homogeneous assumption is load-bearing for the closed-form threshold and that direct validation is required. In the revised manuscript we have added per-iteration EIR and ECR measurements for all seven models across the three datasets (up to five iterations). These measurements show that both rates remain stable: EIR varies by at most 0.3 pp across iterations with no systematic upward drift, and ECR exhibits similarly low variation. We have also included a sensitivity analysis that relaxes homogeneity by allowing EIR to increase linearly by up to 50% over iterations while holding ECR fixed; under these conditions the inequality ECR/EIR > Acc/(1-Acc) remains a conservative separator between beneficial and harmful regimes, and the observed 0.5% EIR boundary continues to classify the models correctly. The new measurements and sensitivity results appear in Section 3.3 and Appendix C, together with the corresponding tables and plots. This addition directly addresses the concern and strengthens the link between the theoretical threshold and the empirical findings.

    revision: yes
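The drift experiment described above is straightforward to reproduce in outline. A hypothetical expected-value version (the authors report empirical runs; this deterministic propagation and its parameters are our simplification):

```python
def accuracy_after_drift(acc0, eir0, ecr, iterations=5, drift=0.5):
    """Propagate expected accuracy while EIR grows linearly by `drift`
    (e.g. 0.5 = +50%) over the run and ECR stays fixed, per the rebuttal's
    relaxation of time-homogeneity. Deterministic expected-value update:
    acc_{t+1} = acc_t * (1 - EIR_t) + (1 - acc_t) * ECR.
    """
    acc = acc0
    for t in range(iterations):
        eir_t = eir0 * (1.0 + drift * t / max(iterations - 1, 1))
        acc = acc * (1.0 - eir_t) + (1.0 - acc) * ecr
    return acc

# A model that passes the static gate at Acc = 0.80 (threshold ratio 4):
print(accuracy_after_drift(0.80, eir0=0.004, ecr=0.10))  # still improves under +50% drift
```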

Circularity Check

0 steps flagged

No significant circularity in the derivation chain

full rationale

The paper proposes a two-state Markov model for iterative self-correction error dynamics parameterized by EIR and ECR. It derives the stability threshold ECR/EIR > Acc/(1-Acc) directly from solving the stationary error probability equation p = EIR/(EIR + ECR) and requiring p < 1 - Acc; this is an algebraic consequence of the model equations with no dependence on fitted data or outcomes. EIR and ECR are then measured from LLM correction runs across models and datasets, and the inequality is checked against observed accuracy changes, with a prompt intervention providing causal validation by driving EIR to zero. Although rates are estimated from the same runs whose accuracy is assessed, the threshold itself is not redefined or fitted from those outcomes but pre-derived from the Markov recurrence, and the intervention tests the prediction independently. No self-citations, uniqueness theorems, or ansatz smuggling appear in the load-bearing steps. The chain is self-contained as a modeling framework whose predictions are tested rather than presupposed by the inputs.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The central claim rests on a standard Markov modeling assumption plus empirical measurement of two rates; no new physical entities or ad-hoc constants are introduced.

free parameters (2)
  • EIR
    Error introduction probability measured per model and dataset from the observed transitions.
  • ECR
    Error correction probability measured per model and dataset from the observed transitions.
axioms (1)
  • domain assumption: Self-correction iterations form a time-homogeneous two-state Markov chain with constant EIR and ECR.
    Invoked to derive the closed-form stability threshold from the transition matrix.

pith-pipeline@v0.9.0 · 5643 in / 1429 out tokens · 53553 ms · 2026-05-08T12:15:42.942263+00:00 · methodology

