Self-correction bench: Uncovering and addressing the self-correction blind spot in large language models

2025 · arXiv 2507.02778

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection

cs.MA · 2026-05-20 · unverdicted · novelty 6.0

ProCrit proposes a Proposal-Critic framework that synthesizes process-level annotations via agentic rollout and uses draft-critique-revise with mutual-refinement RL to improve multimodal sarcasm detection.

ReFlect: An Effective Harness System for Complex Long-Horizon LLM Reasoning

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

ReFlect is a harness that wraps LLMs to detect and recover from reasoning errors, achieving gains of 7-29 percentage points over direct chain-of-thought (CoT) prompting on long-horizon tasks and improving code patch quality to 82-87%.

citing papers explorer

ProCrit: Self-Elicited Multi-Perspective Reasoning with Critic-Guided Revision for Multimodal Sarcasm Detection

cs.MA · 2026-05-20 · unverdicted · none · ref 11

ReFlect: An Effective Harness System for Complex Long-Horizon LLM Reasoning

cs.AI · 2026-05-07 · unverdicted · none · ref 28
