An Information-Theoretic Criterion for Efficient Data Synthesis

Hanyu Li; Xiaotie Deng; Zhengqi Sun

arXiv: 2605.16379 · v1 · pith:N3TZSJ5Dnew · submitted 2026-05-11 · cs.LG · cs.AI · cs.IT · math.IT

Pith reviewed 2026-05-20 22:13 UTC · model grok-4.3

keywords: synthetic data · information theory · data processing inequality · model collapse · supervision signals · generalization · reward hacking · large language models

The pith

Synthetic data improves models only when the generation-training loop is information-open, receiving external signals that add task-relevant information.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper explains inconsistent results with synthetic data through an information-theoretic lens: gains occur exclusively in information-open loops where external signals such as verifiers or rubrics supply information absent from the model's current outputs. In information-closed loops that rely solely on the model's own generations, the data processing inequality forces task-relevant information to decrease, predicting collapse. Coarser supervision signals, such as binary correctness, support stronger generalization because they do not link learned behavior to any particular output form or domain. The account culminates in the thesis that learning converges to the most information-efficient signal component present, speeding intended progress or triggering reward hacking when a spurious pattern proves simpler.

Core claim

Synthetic data improves a model only when the generation-training loop is information-open, i.e., shaped by external signals (verifiers, environments, or rubrics) that inject task-relevant information beyond the model's current distribution. When the loop is information-closed (relying on the model's own outputs without such signals), the data processing inequality ensures that task-relevant information can only decrease, making collapse a predicted outcome. Among information-open pipelines, both efficiency and generalization hinge on the meta-level of supervision: a coarser signal such as binary correctness treats all acceptable outputs as equivalent, so the behavior it teaches is not tied to any particular domain or surface form and generalizes naturally across tasks and domains.
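
To make the open/closed distinction concrete, here is a hypothetical toy in Python; it is not from the paper. A "model" is just a distribution over four candidate answers to one prompt, "training" means re-estimating answer frequencies from a batch of samples, and the external signal is a binary-correctness verifier used as a rejection filter (one possible kind of external signal among those the paper lists). The names `CANDIDATES`, `verifier`, `refit`, `run_loop`, and `correct_mass` are invented for illustration.

```python
import random
from collections import Counter

# Hypothetical toy: a "model" is a distribution over candidate answers to one
# prompt ("what is 2 + 2?"). Two surface forms are correct, two are wrong.
CANDIDATES = ["4", "four", "5", "five"]
CORRECT = {"4", "four"}


def verifier(answer: str) -> bool:
    """Binary-correctness signal: any acceptable surface form passes."""
    return answer in CORRECT


def refit(samples: list[str]) -> dict[str, float]:
    """'Training' step: re-estimate answer frequencies from the samples."""
    counts = Counter(samples)
    return {c: counts[c] / len(samples) for c in CANDIDATES}


def run_loop(open_loop: bool, rounds: int = 5, n: int = 1000, seed: int = 0) -> dict[str, float]:
    rng = random.Random(seed)
    probs = {c: 1 / len(CANDIDATES) for c in CANDIDATES}  # start uniform
    for _ in range(rounds):
        weights = [probs[c] for c in CANDIDATES]
        samples = rng.choices(CANDIDATES, weights=weights, k=n)
        if open_loop:
            # External signal: keep only the outputs the verifier accepts.
            accepted = [s for s in samples if verifier(s)]
            if accepted:
                samples = accepted
        probs = refit(samples)
    return probs


def correct_mass(probs: dict[str, float]) -> float:
    """Total probability the model assigns to correct answers."""
    return sum(p for c, p in probs.items() if c in CORRECT)


closed = run_loop(open_loop=False)
opened = run_loop(open_loop=True)
print(f"closed loop: P(correct) = {correct_mass(closed):.3f}")
print(f"open loop:   P(correct) = {correct_mass(opened):.3f}")
```

Under these assumptions, the closed loop only resamples what the model already produces, so the expected probability of a correct answer stays where it started, while the filtered loop moves all mass onto correct answers. Because the verifier accepts both "4" and "four", the filtered model keeps both surface forms, which mirrors the claim that a coarse signal does not tie behavior to one form.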

What carries the argument

The distinction between information-open and information-closed generation-training loops: external signals determine whether task-relevant information can increase, and without them the data processing inequality implies that it cannot.

If this is right

Closed-loop synthetic data pipelines cannot gain task-relevant information, so repeated iterations are predicted to end in performance collapse.
Coarser signals such as binary correctness yield behaviors that generalize across tasks and domains because they do not specify particular surface forms.
Learning converges to whichever signal component carries the highest information efficiency among those available.
Reward hacking arises when a spurious pattern happens to be the simplest information-efficient component in the signal.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The open-versus-closed distinction offers a diagnostic for why many self-training methods degrade without independent feedback mechanisms.
Pipeline designers could prioritize adding minimal external verifiers that supply just enough new information to keep loops open.
The preference for efficient signal components may extend to explaining shortcut learning in supervised settings without synthetic data.
Measuring mutual information between generated data and task objectives before and after training loops could directly test the account.
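
One way to act on the last suggestion, sketched under loose assumptions: if generated outputs, verifier verdicts, and task labels can be reduced to discrete values, a plug-in (empirical-frequency) estimator gives a rough estimate of mutual information in bits. The function name `mutual_information` and the example data are hypothetical, and the estimator is a generic textbook choice rather than one the paper specifies. Plug-in estimates are biased upward on small samples, so any before/after comparison should use the same sample size.

```python
import math
from collections import Counter
from collections.abc import Hashable, Sequence


def mutual_information(xs: Sequence[Hashable], ys: Sequence[Hashable]) -> float:
    """Plug-in estimate of I(X; Y) in bits from paired discrete samples."""
    if len(xs) != len(ys) or len(xs) == 0:
        raise ValueError("need two non-empty sequences of equal length")
    n = len(xs)
    joint = Counter(zip(xs, ys))
    px, py = Counter(xs), Counter(ys)
    total = 0.0
    for (x, y), c in joint.items():
        # p(x,y) * log2(p(x,y) / (p(x) p(y))), written with raw counts
        total += (c / n) * math.log2(c * n / (px[x] * py[y]))
    return total


# Hypothetical data: task labels vs. two candidate external signals.
labels = [1, 0, 1, 1, 0, 0, 1, 0]
informative_signal = [1, 0, 1, 1, 0, 0, 1, 0]    # agrees with every label
uninformative_signal = [1, 1, 1, 0, 1, 0, 0, 0]  # balanced against the labels
print(mutual_information(labels, informative_signal))    # 1.0
print(mutual_information(labels, uninformative_signal))  # 0.0
```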

Load-bearing premise

The classification of generation-training loops as information-open versus information-closed is sufficient to determine whether task-relevant information increases or decreases, with the data processing inequality applying directly to the overall loop.

What would settle it

Conduct a closed-loop experiment that generates and trains repeatedly on the model's own outputs with no external verifier or rubric, then check whether accuracy on a held-out test set measuring task-relevant information steadily declines over iterations.
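
The proposed test can be previewed with a far smaller stand-in. The sketch below is the familiar Gaussian-refitting toy from the model-collapse literature, not the paper's experiment: each generation fits a Gaussian by maximum likelihood to samples drawn from the previous generation's fit. With `real_fraction=0.0` the loop is closed; with `real_fraction=0.5`, half of every training batch is fresh data from the true distribution N(0, 1), standing in for an external signal. The function names and all parameter values are illustrative assumptions, and the fitted standard deviation is only a proxy for what a held-out test set would measure.

```python
import math
import random


def fit_gaussian(samples: list[float]) -> tuple[float, float]:
    """Maximum-likelihood fit: sample mean and (biased) standard deviation."""
    n = len(samples)
    mu = sum(samples) / n
    var = sum((x - mu) ** 2 for x in samples) / n
    return mu, math.sqrt(var)


def run_generations(generations: int, n: int, real_fraction: float,
                    seed: int = 0) -> tuple[float, float]:
    """Repeatedly train on the previous model's samples, optionally mixing in real data."""
    rng = random.Random(seed)
    mu, sigma = 0.0, 1.0  # generation 0 matches the true distribution N(0, 1)
    n_real = int(n * real_fraction)
    for _ in range(generations):
        synthetic = [rng.gauss(mu, sigma) for _ in range(n - n_real)]
        real = [rng.gauss(0.0, 1.0) for _ in range(n_real)]  # fresh outside data
        mu, sigma = fit_gaussian(synthetic + real)
    return mu, sigma


_, closed_sigma = run_generations(generations=4000, n=200, real_fraction=0.0)
_, open_sigma = run_generations(generations=4000, n=200, real_fraction=0.5)
print(f"closed loop: sigma = {closed_sigma:.2e}")
print(f"open loop:   sigma = {open_sigma:.3f}")
```

In the closed run the fitted standard deviation shrinks toward zero, because each maximum-likelihood refit underestimates the spread on average and nothing restores it; the mixed run stays near 1. At LLM scale, a held-out accuracy curve over iterations would play the same role.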

Original abstract

Synthetic data has become crucial for large language model training, but its effectiveness is highly inconsistent. We provide an information-theoretic account of this inconsistency: synthetic data improves a model only when the generation-training loop is information-open, i.e., shaped by external signals (verifiers, environments, or rubrics) that inject task-relevant information beyond the model's current distribution. When the loop is information-closed (relying on the model's own outputs without such signals), the data processing inequality ensures that task-relevant information can only decrease, making collapse a predicted outcome. Among information-open pipelines, both efficiency and generalization hinge on the meta-level of supervision: a coarser signal such as binary correctness treats all acceptable outputs as equivalent, so the behavior it teaches is not tied to any particular domain or surface form and generalizes naturally across tasks and domains. These observations lead to a guiding thesis: learning preferentially converges to the most information-efficient signal component available, which accelerates learning when that component is the intended one, but causes reward hacking when a spurious pattern happens to be simpler.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The paper frames synthetic-data success in terms of information-open versus information-closed loops and uses the DPI to predict collapse, but because the model is updated at every iteration, applying the DPI directly to the whole loop likely needs an additional proof.

The main point is that this paper uses information theory to explain the inconsistent results of synthetic data in LLM training. It claims that only information-open loops, which bring in external signals like verifiers or rubrics, can add task-relevant information and improve the model. Closed loops that just feed the model its own outputs are expected to lose information by the data processing inequality, leading to collapse. It also suggests that coarser supervision signals help generalization because they do not tie behavior to specific output forms, and that learning tends toward the simplest available signal component, which can cause reward hacking if that component is spurious.

Referee Report

2 major / 1 minor

Summary. The paper claims that synthetic data improves LLM performance only in information-open generation-training loops, where external signals (verifiers, environments, rubrics) inject task-relevant information beyond the model's current distribution; in information-closed loops relying on the model's own outputs, the data processing inequality guarantees monotonic decrease in task-relevant mutual information, predicting collapse. Among open loops, coarser supervision (e.g., binary correctness) yields better generalization because it is not tied to specific surface forms. The guiding thesis is that learning converges to the most information-efficient signal component available.

Significance. If the result holds, the work supplies a principled information-theoretic lens for predicting and avoiding synthetic-data collapse, explaining empirical inconsistencies, and guiding pipeline design toward external signals and coarse supervision. It could unify observations across self-training, RLHF, and synthetic-data methods while highlighting the role of meta-level supervision efficiency.

major comments (2)

[Abstract / theoretical argument on closed loops] The central claim applies the data processing inequality directly to the composite iterative closed loop, treating it as a single channel that monotonically decreases task-relevant mutual information. However, each training step updates model parameters, so the next generation is performed by a different distribution; the overall map is not a fixed Markov chain X→Y→Z. A proof that the iterative operator still obeys the DPI bound for task-relevant information is required, as the standard inequality does not automatically extend to this setting.
[Introduction and definitions of open/closed loops] The distinction between information-open and information-closed loops is load-bearing for the main thesis, yet the manuscript supplies no formal definitions, quantitative criteria, or measurable quantities (e.g., mutual information thresholds or external-signal injection rates) that would allow classification of concrete pipelines or falsification of the predictions.

minor comments (1)

[Theoretical framework] Notation for mutual information and task-relevant quantities should be introduced explicitly with symbols and units early in the theoretical section to improve readability.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and detailed comments on our manuscript. These points help clarify the presentation of our information-theoretic framework. We address each major comment below, indicating the revisions we will make to the manuscript.

Referee: [Abstract / theoretical argument on closed loops] The central claim applies the data processing inequality directly to the composite iterative closed loop (abstract and theoretical argument), treating it as a single channel that monotonically decreases task-relevant mutual information. However, each training step updates model parameters, so the next generation is performed by a different distribution; the overall map is not a fixed Markov chain X→Y→Z. A proof that the iterative operator still obeys the DPI bound for task-relevant information is required, as the standard inequality does not automatically extend to this setting.

Authors: We acknowledge that the iterative nature of the closed loop, with parameter updates after each training step, means the overall process is not a fixed Markov chain, and thus the standard DPI does not apply directly. Our argument relies on the intuition that without external signals, no new task-relevant information is introduced at any step. To make this rigorous, we will add a dedicated subsection in the revised manuscript that defines the iterative generation-training operator and provides a proof that task-relevant mutual information is non-increasing in information-closed loops. This will draw on concepts from adaptive information processing and show that the composition cannot increase relevant information. revision: yes
Referee: [Introduction and definitions of open/closed loops] The distinction between information-open and information-closed loops is load-bearing for the main thesis, yet the manuscript supplies no formal definitions, quantitative criteria, or measurable quantities (e.g., mutual information thresholds or external-signal injection rates) that would allow classification of concrete pipelines or falsification of the predictions.

Authors: We agree that formal definitions and criteria are necessary to make our framework operational and falsifiable. In the revision, we will add a new section early in the paper that formally defines information-open and information-closed loops. A loop is information-closed if the external signal S satisfies I(task; S | current_model) = 0, meaning no additional task-relevant information is injected. We will also propose measurable proxies, such as the mutual information between the external signal and the task labels, to classify pipelines and enable empirical validation of our predictions. revision: yes

Circularity Check

0 steps flagged

No circularity: derivation applies external DPI to loop classification

The paper's central thesis applies the standard data processing inequality from information theory to information-closed generation-training loops, predicting monotonic decrease in task-relevant mutual information and model collapse. This rests on an external, independently established theorem rather than any quantities defined in terms of the paper's own fitted parameters, self-referential definitions, or load-bearing self-citations. The open/closed loop distinction is a conceptual framing that does not reduce the claimed outcome to the inputs by construction; the derivation chain remains self-contained and draws its force from the cited information-theoretic result.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 2 invented entities

The central claim rests on the data processing inequality as a background mathematical fact and introduces the new distinction between information-open and information-closed loops; no free parameters or additional invented physical entities are evident from the abstract.

axioms (1)

standard math · Data processing inequality: if X → Y → Z is a Markov chain, then I(X; Z) ≤ I(X; Y); processing a random variable cannot increase the mutual information it shares with another variable.
Invoked to conclude that task-relevant information can only decrease in information-closed loops.
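
The invoked step can be written out as a short derivation. It is a sketch under assumptions this page does not state: X is the task variable, Z_t is the model (or the data it generates) after iteration t, S_t is the external signal used at iteration t, and in a closed loop Z_{t+1} is produced from Z_t together with randomness (sampling, training seeds) that is independent of X. Under those assumptions each iteration is a Markov step even though the model changes from one iteration to the next, so the inequality can be applied one step at a time; the chain rule for mutual information then shows what an external signal must contribute.

```latex
\begin{aligned}
&\text{Closed loop } (X \to Z_t \to Z_{t+1} \text{ Markov for every } t): \\
&\qquad I(X; Z_{t+1}) \le I(X; Z_t)
  \quad\Longrightarrow\quad I(X; Z_T) \le I(X; Z_0). \\[6pt]
&\text{Open loop } (X \to (Z_t, S_t) \to Z_{t+1}): \\
&\qquad I(X; Z_{t+1}) \le I(X; Z_t, S_t) = I(X; Z_t) + I(X; S_t \mid Z_t).
\end{aligned}
```

On this reading, task-relevant information can grow from one iteration to the next only if I(X; S_t | Z_t) > 0, that is, only if the external signal carries information about the task that the current model does not already have.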

invented entities (2)

information-open loop · no independent evidence
purpose: Classifies synthetic data pipelines in which external signals add task-relevant information beyond the model's current distribution.
New descriptive category introduced to separate effective from ineffective synthetic data generation.
information-closed loop · no independent evidence
purpose: Classifies pipelines that rely solely on the model's own outputs without external signals.
Complementary category used to predict collapse via the data processing inequality.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

When the loop is information-closed ... the data processing inequality ensures that task-relevant information can only decrease

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

X → (Z_t, S) → Z_{t+1} and I(X; Z_{t+1}) ≤ I(X; Z_t, S)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
