When can LLMs actually correct their own mistakes? A critical survey of self-correction of LLMs

Kamoi, Ryo, Zhang, Yusen, Zhang, Nan, Han, Jiawei, Zhang, Rui · 2024 · DOI 10.1162/tacl_a_00713

8 Pith papers cite this work. Polarity classification is still indexing.



representative citing papers

RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement

cs.LG · 2026-05-10 · unverdicted · novelty 7.0 · 3 refs

RubricRefine is a training-free pre-execution method that creates rubrics to score and fix inter-tool contract violations in agent code, reaching 0.86 average on M3ToolEval across seven models with zero executions and lower latency.

Scaling Performance and Low-Resource Annotation with Many-Shot In-Context Learning for Named Entity Recognition

cs.CL · 2026-06-20 · unverdicted · novelty 6.0

Many-shot ICL with LLMs matches or exceeds supervised BERT on NER and generates high-quality labels for low-resource settings, producing ~10% absolute F1 gains when used to fine-tune BERT.

Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors

cs.PL · 2026-05-18 · unverdicted · novelty 6.0

LORIS detects local reasoning errors in LLM-generated proofs for loop invariants by translating natural-language steps to first-order logic implications and using invalid implications to refine the invariants, achieving 93.1% success on 460 C programs.

Weighted Rules under the Stable Model Semantics

cs.AI · 2026-05-10 · unverdicted · novelty 6.0

Weighted rules extend stable model semantics to support probabilistic reasoning, model ranking, and statistical inference in answer set programs.

ReflectCAP: Detailed Image Captioning with Reflective Memory

cs.AI · 2026-04-14 · unverdicted · novelty 6.0

ReflectCAP distills model-specific hallucination and oversight patterns into Structured Reflection Notes that steer LVLMs toward more factual and complete image captions, reaching the Pareto frontier on factuality-coverage trade-offs.

Correction and Corruption: A Two-Rate View of Error Flow in LLM Protocols

cs.LG · 2026-04-20 · unverdicted · novelty 5.0

A two-rate measurement (correction c and corruption γ) for LLM protocol steps predicts accuracy changes from paired correctness bits and flags three failure modes including mixture shift on GSM8K.

Understanding the Self-Reflection Mechanisms of LLMs through Biased Attitude Associations

cs.SI · 2026-05-30 · unverdicted · novelty 4.0

ReBias-Lens shows LLM self-reflection produces layer-wise smoothing of global valence fluctuations that reduces behavioral bias overall, yet selectively locks in and amplifies certain category-specific biases.

sciwrite-lint: Verification Infrastructure for the Age of Science Vibe-Writing

cs.DL · 2026-04-09

citing papers explorer


RubricRefine: Improving Tool-Use Agent Reliability with Training-Free Pre-Execution Refinement cs.LG · 2026-05-10 · unverdicted · none · ref 13 · 3 links
Scaling Performance and Low-Resource Annotation with Many-Shot In-Context Learning for Named Entity Recognition cs.CL · 2026-06-20 · unverdicted · none · ref 30
Guiding LLM-based Loop Invariant Synthesis via Feedback on Local Reasoning Errors cs.PL · 2026-05-18 · unverdicted · none · ref 21
Weighted Rules under the Stable Model Semantics cs.AI · 2026-05-10 · unverdicted · none · ref 64
ReflectCAP: Detailed Image Captioning with Reflective Memory cs.AI · 2026-04-14 · unverdicted · none · ref 13
Correction and Corruption: A Two-Rate View of Error Flow in LLM Protocols cs.LG · 2026-04-20 · unverdicted · none · ref 7
Understanding the Self-Reflection Mechanisms of LLMs through Biased Attitude Associations cs.SI · 2026-05-30 · unverdicted · none · ref 3
sciwrite-lint: Verification Infrastructure for the Age of Science Vibe-Writing cs.DL · 2026-04-09 · unreviewed · ref 19
