Recognition: 2 theorem links
Autolearn: Learn by Surprise, Commit by Proof
Pith reviewed 2026-05-13 21:53 UTC · model grok-4.3
The pith
Language models can learn new facts from documents by training on self-generated questions from surprising passages.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Autolearn enables language models to learn from documents with no external supervision by identifying high-loss passages, creating self-generated Q&A verification chains, and applying β2 adjustments proportional to conviction. Q&A-format training drives the perturbation gap from the pre-trained baseline of 2.204 down to 2.098, while standard fine-tuning stays within noise. The probability of generating a correct novel fact rises from 6% to 54%, and the effect is consistent across the Qwen3 and Phi-4 model families.
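For concreteness, a minimal structural sketch of the loop as the claim describes it. The injected helper callables, the conviction scale, and the β2 mapping are illustrative assumptions, not the paper's implementation.

```python
# Structural sketch of the Autolearn loop. Assumptions: helper callables are
# injected, conviction is a fraction in [0, 1], and the beta2 mapping below
# is one illustrative choice rather than the paper's.
from typing import Callable, Iterable, List, Tuple

QAPair = Tuple[str, str]  # (question, answer)

def autolearn_pass(
    passages: Iterable[str],
    passage_loss: Callable[[str], float],               # mean per-token loss
    generate_qa: Callable[[str], List[QAPair]],         # self-generated Q&A chain
    verify: Callable[[List[QAPair]], float],            # conviction in [0, 1]
    train_step: Callable[[List[QAPair], float], None],  # Q&A-format update
    surprisal_threshold: float,
) -> None:
    """One reading pass over a document's passages."""
    for passage in passages:
        if passage_loss(passage) < surprisal_threshold:
            # Self-extinguishing: learned content falls below threshold
            # and is skipped on re-encounter.
            continue
        qa_chain = generate_qa(passage)
        conviction = verify(qa_chain)
        if conviction <= 0.0:
            continue  # unverified surprise: do not commit
        # Conviction-proportional beta2: higher conviction moves beta2 toward
        # its usual 0.999 (an assumed linear mapping).
        beta2 = 0.99 + 0.009 * conviction
        train_step(qa_chain, beta2)
```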
What carries the argument
The perturbation gap: the ratio of paraphrase perplexity to original perplexity. Paired with the Q&A training format, it serves as the metric that detects when training has shifted from sequence memorization to broader understanding.
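A minimal sketch of the metric, assuming a Hugging Face causal LM; the model name is a stand-in, and the paper's exact perplexity protocol (context handling, paraphrase source) is not specified here.

```python
import math
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_name = "Qwen/Qwen3-0.6B"  # stand-in from the Qwen3 family; any causal LM works
tok = AutoTokenizer.from_pretrained(model_name)
model = AutoModelForCausalLM.from_pretrained(model_name).eval()

def perplexity(text: str) -> float:
    """Exponentiated mean per-token cross-entropy of `text` under the model."""
    ids = tok(text, return_tensors="pt").input_ids
    with torch.no_grad():
        loss = model(ids, labels=ids).loss  # mean next-token negative log-likelihood
    return math.exp(loss.item())

def perturbation_gap(original: str, paraphrase: str) -> float:
    """Paraphrase-to-original perplexity ratio: a ratio near 1 suggests
    understanding; a large ratio suggests the original sequence was memorized."""
    return perplexity(paraphrase) / perplexity(original)
```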
If this is right
- Models can selectively acquire knowledge from new documents without human labeling or external rewards.
- Q&A format training consistently outperforms direct text fine-tuning at suppressing memorization across different model scales.
- The self-extinguishing property prevents repeated training on already-mastered content.
- Novel fact generation improves specifically on items absent from the original training distribution.
Where Pith is reading between the lines
- Deployed models could use this loop for ongoing adaptation to new user-provided documents.
- The same surprise-and-verify pattern might extend to code or multimodal data where high-loss segments can be turned into question-answer pairs.
- If the perturbation gap remains a reliable signal, it could serve as an internal curriculum signal for larger training runs.
- Models trained this way may show lower rates of confident repetition of training sequences on paraphrased inputs.
Load-bearing premise
Passages with anomalously high per-token loss contain verifiable learnable content that self-generated Q&A chains can accurately extract and reinforce without introducing or amplifying errors.
What would settle it
Apply the method to passages containing deliberately planted false facts and measure whether the post-training probability of generating those specific false facts increases above the pre-training baseline.
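A sketch of the probe that experiment needs: the probability the model assigns to a specific planted completion, scored before and after training. It assumes the prompt's tokenization is a prefix of the full string's tokenization, which holds for most tokenizers at word boundaries.

```python
# Falsification probe: score the log-probability the model assigns to a
# planted false fact's completion, before and after Autolearn training.
import torch
import torch.nn.functional as F

def completion_log_prob(model, tok, prompt: str, completion: str) -> float:
    """Sum of token log-probs of `completion` conditioned on `prompt`."""
    prompt_len = tok(prompt, return_tensors="pt").input_ids.shape[1]
    full_ids = tok(prompt + completion, return_tensors="pt").input_ids
    with torch.no_grad():
        logits = model(full_ids).logits
    log_probs = F.log_softmax(logits[0, :-1], dim=-1)  # position t predicts token t+1
    total = 0.0
    for pos in range(prompt_len - 1, full_ids.shape[1] - 1):
        total += log_probs[pos, full_ids[0, pos + 1]].item()
    return total
```

If the Q&A chains reinforce hallucinated content, the post-training value for the planted fact should rise above its pre-training baseline.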
Original abstract
We propose Autolearn, a framework that enables language models to learn from documents they read, with no external supervision. Passages that produce anomalously high per-token loss are flagged, verified through a self-generated Q&A chain, and trained on with conviction-proportional $\beta_2$ adjustment. We introduce the perturbation gap (paraphrase-to-original perplexity ratio) as a metric that distinguishes memorization from understanding. The key mechanism is the training data format: Q&A-format training drives the perturbation gap below the pre-trained baseline (2.098 vs. 2.204, $\Delta = -0.106$, $> 10\sigma$), suppressing token-sequence memorization, while standard fine-tuning's best attempt remains within noise ($\Delta = -0.010$, $< 1\sigma$). Across four models spanning Qwen3 and Phi-4 families, Autolearn is the only method that enters this regime. Stochastic evaluation reveals passage-specific knowledge acquisition: the probability of generating a correct novel fact rises from 6% to 54% after training ($p < 10^{-4}$), and Q&A format outperforms standard fine-tuning on genuinely novel facts. The system is self-extinguishing: learned content reduces surprisal below threshold and is skipped on re-encounter.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Autolearn, a self-supervised framework in which language models flag passages with anomalously high per-token loss, verify them via self-generated Q&A chains, and train on them with conviction-proportional β2 weighting. It introduces the perturbation gap (paraphrase-to-original perplexity ratio) as a metric to distinguish memorization from understanding, claiming that Q&A-format training reduces this gap below the pre-trained baseline (2.098 vs. 2.204, Δ = -0.106, >10σ) across four models while standard fine-tuning does not, and raises novel-fact generation probability from 6% to 54%. The system is described as self-extinguishing once surprisal falls below threshold.
Significance. If the empirical claims are robust, the work could meaningfully advance unsupervised continual learning for LLMs by turning internal surprisal into a training signal. The perturbation gap is a potentially useful diagnostic, the cross-model consistency is a strength, and the self-extinguishing loop offers an efficiency advantage. However, the absence of external validation for the Q&A chains and the self-referential use of the gap limit the strength of the conclusions.
major comments (3)
- [Abstract] The perturbation gap is introduced as the primary success metric, and the reported improvement (2.098 vs. 2.204 baseline) is measured directly on it; this creates a circularity risk because the metric was designed to capture the effect that Q&A-format training is intended to produce, with no external validation or alternative metric provided to confirm what the gap actually measures.
- [Abstract, method description] The loss-anomaly threshold and conviction-proportional β2 are free parameters, yet no procedure is given for setting the threshold, generating or filtering the self-generated Q&A chains, or controlling for prompt sensitivity in the stochastic evaluation; without these details the >10σ claim and the 6%→54% novel-fact result cannot be assessed for robustness.
- [Abstract] The central assumption that self-generated Q&A chains provide accurate verification without external ground truth carries both the memorization-suppression and novel-fact acquisition results; any hallucinations in the chains would be reinforced during training, yet the paper provides no independent oracle, human review, or cross-model consistency check on the generated pairs.
minor comments (1)
- [Abstract] The exact statistical test, number of trials, and variance-estimation method underlying the reported sigma levels and p < 10^{-4} should be stated explicitly.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on our manuscript. We address each of the major comments point by point below, providing clarifications and committing to revisions where appropriate to strengthen the paper.
Point-by-point responses
Referee: the perturbation gap is introduced as the primary success metric and the reported improvement (2.098 vs. 2.204 baseline) is measured directly on it; this creates a circularity risk because the metric was designed to capture the effect that Q&A-format training is intended to produce, with no external validation or alternative metric provided to confirm what the gap actually measures.
Authors: The perturbation gap is a theoretically motivated metric, defined as the ratio of perplexity on paraphrased versions to the original, to distinguish memorization (high gap) from understanding (low gap). Its application is not circular because we demonstrate a selective reduction only under Autolearn's Q&A training, not under standard fine-tuning, with statistical significance across models. The novel-fact generation probability provides an alternative, independent measure of acquisition. In the revision, we will expand the discussion on the metric's grounding and its correlation with fact-generation results. revision: partial
Referee: the loss-anomaly threshold and conviction-proportional β2 are free parameters, yet no procedure is given for setting the threshold, generating or filtering the self-generated Q&A chains, or controlling for prompt sensitivity in the stochastic evaluation; without these details the >10σ claim and the 6%→54% novel-fact result cannot be assessed for robustness.
Authors: We agree that more procedural details are necessary for reproducibility and robustness assessment. The revised manuscript will specify the exact method for determining the loss-anomaly threshold (e.g., passages exceeding mean loss by 2 standard deviations), the prompt engineering for Q&A generation, filtering based on internal consistency of the Q&A chain, and results from sensitivity analysis on prompt variations to support the reported significance levels. revision: yes
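A concrete reading of that example rule, as a sketch; the two-standard-deviation multiplier and the per-passage mean-loss inputs are the stated assumptions.

```python
# Loss-anomaly flag under the example rule cited above: flag passages whose
# mean per-token loss exceeds the corpus mean by k standard deviations.
import statistics
from typing import Dict, List

def flag_anomalous(passage_losses: Dict[str, float], k: float = 2.0) -> List[str]:
    """Return ids of passages with loss above mean + k * std."""
    values = list(passage_losses.values())
    threshold = statistics.fmean(values) + k * statistics.pstdev(values)
    return [pid for pid, loss in passage_losses.items() if loss > threshold]
```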
Referee: the central assumption that self-generated Q&A chains provide accurate verification without external ground truth carries both the memorization-suppression and novel-fact acquisition results; any hallucinations in the chains would be reinforced during training, yet the paper provides no independent oracle, human review, or cross-model consistency check on the generated pairs.
Authors: While the Q&A chains are self-generated, the primary training signal is the original high-loss passage, with β2 weighting based on conviction. The self-extinguishing mechanism ensures that only content reducing surprisal is learned, limiting reinforcement of errors. Cross-model consistency in outcomes provides some validation. We acknowledge the lack of external ground truth as a limitation and will include in the revision an analysis of Q&A consistency rates across the four models and a discussion of potential hallucination risks. revision: partial
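One way to realize conviction-proportional β2 with a stock optimizer is sketched below; PyTorch's AdamW reads `betas` from each parameter group at every step, so mutating the group takes effect on the next update. The [0.99, 0.999] interpolation range is an assumption, since the paper's mapping is not given.

```python
import torch

def set_conviction_beta2(
    optimizer: torch.optim.AdamW,
    conviction: float,
    lo: float = 0.99,
    hi: float = 0.999,
) -> None:
    """Interpolate beta2 between `lo` and `hi` by conviction in [0, 1]."""
    conviction = max(0.0, min(1.0, conviction))
    beta2 = lo + (hi - lo) * conviction
    for group in optimizer.param_groups:
        beta1, _ = group["betas"]
        group["betas"] = (beta1, beta2)
```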
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper introduces the perturbation gap as an independent metric (paraphrase-to-original perplexity ratio) and reports empirical results showing Q&A-format training reduces it below the pre-trained baseline while standard fine-tuning does not. This is presented as an experimental outcome across four models with statistical significance (p < 10^{-4}), not a first-principles derivation. No equation or claim reduces by construction to its inputs; the training mechanism (flagging high-loss passages, self-generated Q&A, conviction-weighted updates) operates separately from the post-training measurement of the gap. The self-extinguishing property and stochastic evaluation are additional empirical observations. The central claims rest on observable differences in held-out behavior rather than tautological redefinition or self-referential fitting.
Axiom & Free-Parameter Ledger
free parameters (2)
- loss anomaly threshold
- conviction-proportional β2
axioms (2)
- domain assumption: High per-token loss reliably indicates content worth learning rather than noise or model limitation.
- ad hoc to paper: Self-generated Q&A chains produce accurate verification without external ground truth.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (echoes)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
Passages that produce anomalously high per-token loss are flagged, verified through a self-generated Q&A chain, and trained on with conviction-proportional β2 adjustment.
- IndisputableMonolith/Foundation/Recognition.lean · reality_from_one_distinction (echoes)
ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
The framework is self-extinguishing: as the model learns, per-token loss on learned passages decreases toward the surprisal threshold.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.