CyberCorrect applies cybernetic control theory to LLM self-correction, reporting 79.8% accuracy on a new 440-task benchmark with 6.2-point gains and 41% less over-correction.
2 Pith papers cite this work. Polarity classification is still indexing.
Citing papers by year: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
COOPO is a cyclic offline-online RL algorithm that repeatedly anchors the policy to a dataset via KL-regularized updates then fine-tunes online, claiming better sample efficiency and monotonic improvement under coverage assumptions.
citing papers explorer
- CyberCorrect: A Cybernetic Framework for Closed-Loop Self-Correction in Large Language Models
- COOPO: Cyclic Offline-Online Policy Optimization Algorithm