Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

Jingfeng Zhong; Shuai Li; Zhengxiang Liu; Zhijie Wang

arxiv: 2605.21160 · v1 · pith:CBOLIDUUnew · submitted 2026-05-20 · 💻 cs.LG


Pith reviewed 2026-05-21 05:52 UTC · model grok-4.3

classification 💻 cs.LG

keywords: first integrals · differential equations · backward generation · reinforcement learning · large language models · conservation laws · symbolic computation


The pith

FISolver uses backward generation of training pairs and reinforcement learning to let a compact model outperform larger LLMs and Mathematica at finding first integrals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a data-driven approach to the long-standing problem of discovering first integrals, which represent conservation laws in systems of differential equations. It creates large synthetic datasets by reversing the usual process: sampling candidate integrals first and then deriving the differential equations that would have those integrals. Supervised fine-tuning on this data is followed by reinforcement learning that rewards outputs closer in edit distance to correct solutions. The resulting system solves challenging benchmark problems more accurately than much larger language models or established symbolic solvers while using far less computation.
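For reference, the standard definition behind this summary can be written out. For an autonomous system $\dot{x} = f(x)$, a function $I$ is a first integral when it stays constant along every trajectory. The symbols $x$, $f$, and $I$ are notation chosen here for illustration and are not taken from the paper.

```latex
\frac{d}{dt}\, I\bigl(x(t)\bigr) \;=\; \nabla I(x) \cdot f(x) \;=\; 0 .
```

Read in the backward direction, the same identity is a constraint on $f$ once $I$ is fixed, which is what makes it possible to sample an integral first and derive equations afterwards.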

Core claim

By deriving differential equations from randomly sampled first integrals, the backward generation procedure produces abundant (equation, integral) training pairs. Supervised fine-tuning of a compact mathematical language model on these pairs, followed by reinforcement learning with a Levenshtein-distance-shaped reward and targeted data blending, produces FISolver, which solves difficult first-integral problems more reliably than larger mathematical LLMs or commercial solvers such as Mathematica.
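The backward direction can be illustrated with a minimal sketch. It assumes planar polynomial integrals and uses the textbook construction x' = ∂I/∂y, y' = -∂I/∂x, which conserves I by design; the paper's actual sampling and derivation procedure is not described in this summary, so `sample_integral`, `backward_generate`, and the dictionary encoding of polynomials are all hypothetical.

```python
import random

# A polynomial in (x, y) is a dict mapping exponent pairs (i, j) to coefficients.

def sample_integral(max_degree=3, n_terms=3, rng=random):
    """Sample a random nonconstant polynomial I(x, y) (hypothetical sampler)."""
    poly = {}
    while not poly:
        for _ in range(n_terms):
            i = rng.randint(0, max_degree)
            j = rng.randint(0, max_degree - i)
            if i + j > 0:  # skip the constant term
                poly[(i, j)] = poly.get((i, j), 0) + rng.choice([-2, -1, 1, 2])
        poly = {k: c for k, c in poly.items() if c != 0}
    return poly

def diff(poly, var):
    """Partial derivative with respect to x (var=0) or y (var=1)."""
    out = {}
    for (i, j), c in poly.items():
        e = (i, j)[var]
        if e > 0:
            key = (i - 1, j) if var == 0 else (i, j - 1)
            out[key] = out.get(key, 0) + c * e
    return out

def mul(p, q):
    """Product of two polynomials, dropping zero coefficients."""
    out = {}
    for (a, b), c in p.items():
        for (d, e), f in q.items():
            key = (a + d, b + e)
            out[key] = out.get(key, 0) + c * f
    return {k: v for k, v in out.items() if v != 0}

def add(p, q):
    """Sum of two polynomials, dropping zero coefficients."""
    out = dict(p)
    for k, v in q.items():
        out[k] = out.get(k, 0) + v
    return {k: v for k, v in out.items() if v != 0}

def backward_generate(integral):
    """Assumed construction: x' = dI/dy, y' = -dI/dx, which conserves I."""
    f = diff(integral, 1)
    g = {k: -v for k, v in diff(integral, 0).items()}
    return f, g

def lie_derivative(integral, f, g):
    """dI/dt along the flow: I_x * f + I_y * g (empty dict means zero)."""
    return add(mul(diff(integral, 0), f), mul(diff(integral, 1), g))

integral = sample_integral(rng=random.Random(0))
f, g = backward_generate(integral)
assert lie_derivative(integral, f, g) == {}  # I is conserved by construction
```

Each `(f, g)` and `integral` pair would form one (equation, integral) training example. A construction this simple only reaches Hamiltonian-type planar systems, which is exactly the kind of coverage limit the load-bearing premise below is concerned with.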

What carries the argument

The backward generation algorithm, which starts from sampled first integrals and constructs corresponding differential equations to create training pairs, together with Levenshtein Distance-based reward shaping inside reinforcement learning for exact symbolic output.
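A string-edit reward of this kind might look like the sketch below. The Levenshtein distance itself is standard; the normalisation to the range [0, 1] and the name `shaped_reward` are assumptions made for illustration, since the summary does not give the paper's exact reward formula.

```python
def levenshtein(a, b):
    """Classic dynamic-programming edit distance between two sequences."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # substitution
        prev = curr
    return prev[-1]

def shaped_reward(prediction, target):
    """Reward in [0, 1]; 1.0 only for an exact match (assumed normalisation)."""
    longest = max(len(prediction), len(target))
    if longest == 0:
        return 1.0
    return 1.0 - levenshtein(prediction, target) / longest

near = shaped_reward("x**2 - y**2", "x**2 + y**2")
far = shaped_reward("sin(x)", "x**2 + y**2")
assert near > far  # closer strings earn a larger reward
```

Compared with a binary correct-or-not signal, a graded reward of this shape gives partial credit to near misses. It operates on strings, so two algebraically equivalent expressions written differently would still be penalised.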

If this is right

Data synthesis by reversing from solutions to problems can relieve scarcity for other symbolic mathematics tasks.
Reinforcement learning with string-edit rewards can steer language models toward exact rather than approximate mathematical outputs.
Smaller models become viable for specialized scientific discovery once appropriate synthetic data and reward shaping are supplied.
Conservation-law discovery in dynamical systems can be partially automated without relying on hand-crafted heuristics or massive model scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same backward-generation idea could be tested on inverse problems in algebra or geometry where forward sampling is expensive.
If the method scales, it suggests that many conservation laws possess statistical regularities that reward-shaped models can exploit without explicit mathematical insight.
Hybrid pipelines might emerge in which an LLM proposes candidate integrals and a traditional verifier confirms them, reducing the need for full end-to-end symbolic search.
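The verifier half of such a hybrid pipeline could be as simple as the numerical check sketched below. It is an illustration, not anything described in the paper: `is_first_integral`, its tolerances, and the sampling box are hypothetical choices, and a numerical test can only reject candidates or raise confidence, not prove conservation.

```python
import random

def is_first_integral(candidate, field, dim, trials=50, h=1e-5, tol=1e-6, seed=0):
    """Check numerically that grad(candidate) . field vanishes at random points.

    candidate: callable taking a list of floats and returning a float.
    field: callable taking a list of floats and returning the vector field value.
    """
    rng = random.Random(seed)
    for _ in range(trials):
        x = [rng.uniform(-2.0, 2.0) for _ in range(dim)]
        v = field(x)
        total = 0.0
        for k in range(dim):
            up = list(x)
            up[k] += h
            dn = list(x)
            dn[k] -= h
            # Central finite difference for the k-th partial derivative.
            total += (candidate(up) - candidate(dn)) / (2 * h) * v[k]
        if abs(total) > tol:
            return False
    return True

# Harmonic oscillator: x' = y, y' = -x.
oscillator = lambda p: [p[1], -p[0]]
assert is_first_integral(lambda p: p[0] ** 2 + p[1] ** 2, oscillator, dim=2)
assert not is_first_integral(lambda p: p[0] ** 2 - p[1] ** 2, oscillator, dim=2)
```

A constant function passes this check trivially, so a practical verifier would also need to reject degenerate candidates before accepting a proposal.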

Load-bearing premise

The synthetic pairs created by backward generation are diverse and representative enough that a model trained on them can solve genuinely new and harder families of differential equations rather than only patterns already present in the generated distribution.

What would settle it

A new benchmark consisting of first-integral problems drawn from families that cannot be obtained by the backward-generation sampling procedure, on which FISolver fails to match or exceed the success rate of Mathematica or larger LLMs, would show that generalization has not occurred.

Figures

Figures reproduced from arXiv: 2605.21160 by Jingfeng Zhong, Shuai Li, Zhengxiang Liu, Zhijie Wang.

**Figure 1.** Accuracy of first integral prediction on the Normal and Hard test sets. FISolver (1.5B parameters) significantly …

**Figure 2.** FISolver comprises three main stages: (1) …

Original abstract

The discovery of first integrals is of fundamental scientific importance for understanding conservation laws in dynamical systems. However, existing symbolic computation tools and Large Language Models (LLMs) remain limited on this task because high-quality training data are scarce and successful solutions often depend on mathematical intuition. This paper presents FISolver, an LLM-based solver developed to address this challenge. First, we introduce a "Backward Generation" algorithm that systematically builds large-scale datasets of (differential equation, first integral) pairs by deriving differential equations from sampled integrals, thereby alleviating the data scarcity bottleneck. Second, we apply supervised fine-tuning to a compact mathematical model and further improve its performance through reinforcement learning with a Levenshtein Distance-based shaped reward. In addition, we design data synthesis and blending strategies that support effective adaptation to difficult problem families from sparse examples. Experiments show that FISolver, while requiring substantially lower computational cost, significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks, indicating a new data-driven route for automated discovery of first integrals.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Backward generation of (DE, integral) pairs plus Levenshtein RL is a practical new route around data scarcity, but the abstract gives no numbers to support the outperformance claims.


The main thing here is a data-synthesis trick that starts from sampled first integrals, derives the corresponding differential equations, and uses those pairs to fine-tune a small model before applying reinforcement learning with a Levenshtein-distance reward. That pipeline is the clearest novelty, and it directly targets the scarcity problem that limits both symbolic solvers and larger LLMs on conservation-law discovery. The paper does well by keeping the base model compact and by making the reward external and measurable rather than relying on the model’s own guesses. The blending strategies for sparse difficult examples also look like a sensible engineering step. These pieces together give a concrete, reproducible way to generate training signal without needing massive human-curated datasets.

The soft spots sit mostly in the evaluation. The abstract states that the method significantly outperforms larger mathematical LLMs and Mathematica on challenging benchmarks, yet supplies no quantitative results, error bars, baseline details, or description of how the hard problems were chosen. Without those numbers it is impossible to judge whether the gains are real or modest. The stress-test concern about distribution mismatch is worth checking: if the backward-sampled integrals are drawn from a narrow class of forms, the derived DEs may not cover the nonlinear or high-order cases that actually define the tough benchmarks. A statistical comparison between the generated training distribution and the evaluation set would settle this quickly.

This paper is aimed at people working on symbolic regression, automated discovery for dynamical systems, and LLM fine-tuning for mathematics. A reader who needs practical data-augmentation ideas for equation learning will find usable pieces even if the final performance numbers turn out moderate.

It deserves a serious referee because the core pipeline is original, the problem is well-motivated, and the approach is technically plausible. I would send it to peer review and ask for the full experimental tables plus a direct check on whether the synthetic distribution matches the hard test cases.

Referee Report

2 major / 2 minor

Summary. The paper introduces FISolver, an LLM-based solver for discovering first integrals of differential equations. It proposes a backward-generation algorithm that samples first integrals and derives the corresponding DEs to create large-scale synthetic (DE, integral) training pairs, followed by supervised fine-tuning of a compact mathematical LLM and reinforcement learning using a Levenshtein distance-shaped reward. Data synthesis and blending strategies are used to adapt to difficult problem families. The central claim is that FISolver achieves significantly better performance than larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks, at substantially lower computational cost.

Significance. If the performance and generalization claims hold, the work would provide a practical data-driven route to automated discovery of conservation laws, addressing the scarcity of high-quality symbolic training data in dynamical systems. The combination of backward data generation with Levenshtein-guided RL on a compact model, plus explicit strategies for sparse adaptation, represents a concrete strength; reproducible code or machine-checked elements are not mentioned but would further strengthen the contribution if present.

major comments (2)

[Abstract] Abstract: the claim that FISolver 'significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks' is presented without any quantitative metrics, error bars, baseline details, or description of benchmark selection and difficulty. This absence directly undermines evaluation of the central performance claim.
[Data Synthesis / Experiments] Data generation and experiments sections: no quantitative measure (e.g., Kolmogorov-Smirnov test, moment matching, or coverage statistics over known hard families) is reported comparing the distribution of backward-generated (DE, first integral) pairs to the evaluation benchmarks. Without this, the generalization claim that the synthetic distribution supports extrapolation to genuinely difficult, unseen problem families remains unverified and load-bearing for the main result.
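The distributional check requested in the second major comment can be sketched with a two-sample Kolmogorov-Smirnov statistic over any scalar feature of the problems. The feature used below (character length of the integral string) and the toy samples are hypothetical; the report does not specify which features to compare, and the data shown are placeholders rather than results.

```python
from bisect import bisect_right

def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: largest gap between the empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    if not a or not b:
        raise ValueError("both samples must be non-empty")
    gap = 0.0
    for v in set(a) | set(b):
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Placeholder expressions standing in for generated and benchmark integrals.
train_integrals = ["x**2 + y**2", "x*y", "x**3 - y"]
bench_integrals = ["exp(x)*sin(y) + x**2*y**3", "log(x**2 + y**2) - x*y**4"]
gap = ks_statistic([len(s) for s in train_integrals],
                   [len(s) for s in bench_integrals])
assert 0.0 <= gap <= 1.0
```

A statistic near 0 means the two samples look alike on that feature, while a value near 1 means they barely overlap. Repeating the comparison across several features (degree, operator counts, number of variables) would give the coverage picture the referee asks for.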

minor comments (2)

[Abstract] The abstract would be clearer if it briefly stated the base model size, number of benchmarks, and exact performance deltas rather than qualitative descriptors.
[Method] Notation for the Levenshtein reward and the blending strategies could be introduced earlier with a short equation or pseudocode to aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment point by point below and outline the planned revisions to the manuscript.


Referee: [Abstract] Abstract: the claim that FISolver 'significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks' is presented without any quantitative metrics, error bars, baseline details, or description of benchmark selection and difficulty. This absence directly undermines evaluation of the central performance claim.

Authors: We agree that the abstract would benefit from explicit quantitative support for the performance claim. In the revised manuscript we will update the abstract to incorporate the key metrics already reported in the experiments section, including success rates on the benchmarks, direct comparisons to the larger LLMs and Mathematica, and brief indications of benchmark selection criteria and observed variability. This change will make the central claim easier to evaluate while preserving the abstract's length and focus.
Revision: yes

Referee: [Data Synthesis / Experiments] Data generation and experiments sections: no quantitative measure (e.g., Kolmogorov-Smirnov test, moment matching, or coverage statistics over known hard families) is reported comparing the distribution of backward-generated (DE, first integral) pairs to the evaluation benchmarks. Without this, the generalization claim that the synthetic distribution supports extrapolation to genuinely difficult, unseen problem families remains unverified and load-bearing for the main result.

Authors: We acknowledge that an explicit quantitative comparison of the synthetic data distribution to the evaluation benchmarks would strengthen the generalization argument. The backward-generation algorithm is constructed to sample broadly across polynomial, trigonometric, and other families that appear in standard benchmarks, but we did not include distributional statistics in the original submission. We will add coverage statistics and, where appropriate, moment-matching or diversity metrics in the revised data-synthesis section to document the overlap with known hard families and thereby support the extrapolation claim.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained.


The paper generates (DE, integral) training pairs independently via backward sampling of integrals followed by derivation of the corresponding DEs; the RL reward is defined externally as Levenshtein distance to ground-truth integrals; evaluation occurs on separate challenging benchmarks. No load-bearing step equates a claimed prediction or result to its own fitted inputs or self-citations by construction. The method is data-driven and externally validated rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the unverified assumption that backward-generated synthetic data plus string-edit reward produces genuine generalization rather than distribution-specific overfitting; no free parameters or invented entities are explicitly named in the abstract.

axioms (1)

Domain assumption: Large language models can be effectively fine-tuned and reinforced on symbolic mathematical tasks when provided with sufficient high-quality paired data.
Basis: implicit in the use of supervised fine-tuning and RL on the generated dataset.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · relation: unclear

Paper passage: "reinforcement learning with a Levenshtein Distance-based shaped reward"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.



work page 2021

[11] [11]

Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.ICLR, 1(2):3, 2022

work page 2022

[12] [12]

DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Y Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models, 2024. URL https://arxiv. org/abs/2402.03300, 2(3):5, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[13] [13]

Invariant variation problems.Transport theory and statistical physics, 1(3):186–207, 1971

Emmy Noether. Invariant variation problems.Transport theory and statistical physics, 1(3):186–207, 1971

work page 1971

[14] [14]

Elementary first integrals of differential equations

Myra Jean Prelle and Michael F Singer. Elementary first integrals of differential equations. InProceedings of the fourth ACM symposium on Symbolic and algebraic computation, pages 30–35, 1981

work page 1981

[15] [15]

Springer, 2002

George W Bluman and Stephen C Anco.Symmetry and integration methods for differential equations. Springer, 2002

work page 2002

[16] [16]

Liouvillian first integrals of differential equations.Transactions of the American Mathematical Society, 333(2):673–688, 1992

Michael F Singer. Liouvillian first integrals of differential equations.Transactions of the American Mathematical Society, 333(2):673–688, 1992

work page 1992

[17] [17]

Constants of motion network.Advances in Neural Information Processing Systems, 35:25295–25305, 2022

Muhammad Firmansyah Kasim and Yi Heng Lim. Constants of motion network.Advances in Neural Information Processing Systems, 35:25295–25305, 2022

work page 2022

[18] [18]

Hamiltonian neural networks.Advances in neural information processing systems, 32, 2019

Samuel Greydanus, Misko Dzamba, and Jason Yosinski. Hamiltonian neural networks.Advances in neural information processing systems, 32, 2019

work page 2019

[19] [19]

Neural symplectic form: Learning hamiltonian equations on general coordinate systems.Advances in Neural Information Processing Systems, 34:16659–16670, 2021

Yuhan Chen, Takashi Matsubara, and Takaharu Yaguchi. Neural symplectic form: Learning hamiltonian equations on general coordinate systems.Advances in Neural Information Processing Systems, 34:16659–16670, 2021

work page 2021

[20] [20]

Lagrangian neural networks, 2020

Miles Cranmer, Sam Greydanus, Stephan Hoyer, Peter Battaglia, David Spergel, and Shirley Ho. Lagrangian neural networks.arXiv preprint arXiv:2003.04630, 2020

work page arXiv 2003

[21] [21]

Discovering invariants via machine learning.Physical Review Research, 3(4):L042035, 2021

Seungwoong Ha and Hawoong Jeong. Discovering invariants via machine learning.Physical Review Research, 3(4):L042035, 2021

work page 2021

[22] [22]

Machine learning conservation laws from differential equations

Ziming Liu, Varun Madhavan, and Max Tegmark. Machine learning conservation laws from differential equations. Physical Review E, 106(4):045307, 2022

work page 2022

[23] [23]

Attention is all you need.Advances in neural information processing systems, 30, 2017

Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N Gomez, Łukasz Kaiser, and Illia Polosukhin. Attention is all you need.Advances in neural information processing systems, 30, 2017

work page 2017

[24] [24]

Qwen2 Technical Report

An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, et al. Qwen2 technical report.arXiv preprint arXiv:2407.10671, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[25] [25]

Learning the greatest common divisor: explaining transformer predictions.arXiv preprint arXiv:2308.15594, 2023

François Charton. Learning the greatest common divisor: explaining transformer predictions.arXiv preprint arXiv:2308.15594, 2023

work page arXiv 2023

[26] [26]

Linear algebra with transformers.arXiv preprint arXiv:2112.01898, 2021

François Charton. Linear algebra with transformers.arXiv preprint arXiv:2112.01898, 2021

work page arXiv 2021

[27] [27]

Transforming the bootstrap: using transformers to compute scattering amplitudes in planarN= 4super yang–mills theory.Machine Learning: Science and Technology, 5(3):035073, 2024

Tianji Cai, Garrett W Merz, François Charton, Niklas Nolte, Matthias Wilhelm, Kyle Cranmer, and Lance J Dixon. Transforming the bootstrap: using transformers to compute scattering amplitudes in planarN= 4super yang–mills theory.Machine Learning: Science and Technology, 5(3):035073, 2024

work page 2024

[28] [28]

Global lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers.Advances in Neural Information Processing Systems, 37:93643–93670, 2024

Alberto Alfarano, François Charton, and Amaury Hayat. Global lyapunov functions: a long-standing open problem in mathematics, with symbolic transformers.Advances in Neural Information Processing Systems, 37:93643–93670, 2024

work page 2024

[29] [29]

Ai feynman: A physics-inspired method for symbolic regression

Silviu-Marian Udrescu and Max Tegmark. Ai feynman: A physics-inspired method for symbolic regression. Science advances, 6(16):eaay2631, 2020

work page 2020

[30] [30]

Neural symbolic regression that scales

Luca Biggio, Tommaso Bendinelli, Alexander Neitz, Aurelien Lucchi, and Giambattista Parascandolo. Neural symbolic regression that scales. InInternational Conference on Machine Learning, pages 936–945. Pmlr, 2021. 15

work page 2021

[31] [31]

Shojaee, K

Parshin Shojaee, Kazem Meidani, Shashank Gupta, Amir Barati Farimani, and Chandan K Reddy. Llm-sr: Scientific equation discovery via programming with large language models.arXiv preprint arXiv:2404.18400, 2024

work page arXiv 2024

[32] [32]

Discovering symmetries of odes by symbolic regression

Paul Kahlmeyer, Niklas Merk, and Joachim Giesen. Discovering symmetries of odes by symbolic regression. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pages 17715–17723, 2025

work page 2025

[33] [33]

Mathgenie: Generating synthetic data with question back-translation for enhancing mathematical reasoning of llms.arXiv preprint arXiv:2402.16352, 2024

Zimu Lu, Aojun Zhou, Houxing Ren, Ke Wang, Weikang Shi, Junting Pan, Mingjie Zhan, and Hongsheng Li. Mathgenie: Generating synthetic data with question back-translation for enhancing mathematical reasoning of llms.arXiv preprint arXiv:2402.16352, 2024

work page arXiv 2024

[34] [34]

WizardMath: Empowering Mathematical Reasoning for Large Language Models via Reinforced Evol-Instruct

Haipeng Luo, Qingfeng Sun, Can Xu, Pu Zhao, Jianguang Lou, Chongyang Tao, Xiubo Geng, Qingwei Lin, Shifeng Chen, and Dongmei Zhang. Wizardmath: Empowering mathematical reasoning for large language models via reinforced evol-instruct.arXiv preprint arXiv:2308.09583, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[35] [35]

ToRA: A Tool-Integrated Reasoning Agent for Mathematical Problem Solving

Zhibin Gou, Zhihong Shao, Yeyun Gong, Yelong Shen, Yujiu Yang, Minlie Huang, Nan Duan, and Weizhu Chen. Tora: A tool-integrated reasoning agent for mathematical problem solving.arXiv preprint arXiv:2309.17452, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[36] [36]

Lima: Less is more for alignment.Advances in Neural Information Processing Systems, 36:55006–55021, 2023

Chunting Zhou, Pengfei Liu, Puxin Xu, Srinivasan Iyer, Jiao Sun, Yuning Mao, Xuezhe Ma, Avia Efrat, Ping Yu, Lili Yu, et al. Lima: Less is more for alignment.Advances in Neural Information Processing Systems, 36:55006–55021, 2023

work page 2023

[37] [37]

How abilities in large language models are affected by supervised fine-tuning data composition

Guanting Dong, Hongyi Yuan, Keming Lu, Chengpeng Li, Mingfeng Xue, Dayiheng Liu, Wei Wang, Zheng Yuan, Chang Zhou, and Jingren Zhou. How abilities in large language models are affected by supervised fine-tuning data composition. InProceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 177–198, 2024

work page 2024

[38] [38]

Textbooks Are All You Need

Suriya Gunasekar, Yi Zhang, Jyoti Aneja, Caio César Teodoro Mendes, Allie Del Giorno, Sivakanth Gopi, Mojan Javaheripi, Piero Kauffmann, Gustavo de Rosa, Olli Saarikivi, et al. Textbooks are all you need.arXiv preprint arXiv:2306.11644, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[39] [39]

Let’s verify step by step

Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, and Karl Cobbe. Let’s verify step by step. InThe Twelfth International Conference on Learning Representations, 2023

work page 2023

[40] [40]

Binary codes capable of correcting deletions, insertions, and reversals probl.Inf

VI Levenshtein. Binary codes capable of correcting deletions, insertions, and reversals probl.Inf. Transm, 1:8–17, 1965

work page 1965

[41] [41]

Sequence Level Training with Recurrent Neural Networks

Marc’Aurelio Ranzato, Sumit Chopra, Michael Auli, and Wojciech Zaremba. Sequence level training with recurrent neural networks.arXiv preprint arXiv:1511.06732, 2015

work page internal anchor Pith review Pith/arXiv arXiv 2015

[42] [42]

Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients

Brenden K Petersen, Mikel Landajuela, T Nathan Mundhenk, Claudio P Santiago, Soo K Kim, and Joanne T Kim. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients. arXiv preprint arXiv:1912.04871, 2019

work page arXiv 1912

[43] [43]

Tree edit distance learning via adaptive symbol embeddings

Benjamin Paaßen, Claudio Gallicchio, Alessio Micheli, and Barbara Hammer. Tree edit distance learning via adaptive symbol embeddings. InInternational Conference on Machine Learning, pages 3976–3985. PMLR, 2018

work page 2018

[44] [44]

Sympy: symbolic computing in python.PeerJ Computer Science, 3:e103, 2017

Aaron Meurer, Christopher P Smith, Mateusz Paprocki, Ondˇrej ˇCertík, Sergey B Kirpichev, Matthew Rocklin, AMiT Kumar, Sergiu Ivanov, Jason K Moore, Sartaj Singh, et al. Sympy: symbolic computing in python.PeerJ Computer Science, 3:e103, 2017

work page 2017

[45] [45]

Datasets: A community library for natural language processing

Quentin Lhoest, Albert Villanova Del Moral, Yacine Jernite, Abhishek Thakur, Patrick V on Platen, Suraj Patil, Julien Chaumond, Mariama Drame, Julien Plu, Lewis Tunstall, et al. Datasets: A community library for natural language processing. InProceedings of the 2021 conference on empirical methods in natural language processing: system demonstrations, pag...

work page 2021

[46] [46]

Transformers: State-of-the-art natural language processing

Thomas Wolf, Lysandre Debut, Victor Sanh, Julien Chaumond, Clement Delangue, Anthony Moi, Pierric Cistac, Tim Rault, Remi Louf, Morgan Funtowicz, et al. Transformers: State-of-the-art natural language processing. In Proceedings of the 2020 conference on empirical methods in natural language processing: system demonstrations, pages 38–45, 2020

work page 2020

[47] [47]

Trl: Transformer reinforcement learning

Leandro von Werra, Younes Belkada, Lewis Tunstall, Edward Beeching, Tristan Thrush, Nathan Lambert, Shengyi Huang, Kashif Rasul, and Quentin Gallouédec. Trl: Transformer reinforcement learning. https: //github.com/huggingface/trl, 2020

work page 2020

[48] [48]

Mathematica, Version 14.3, 2024

Wolfram Research, Inc. Mathematica, Version 14.3, 2024. Champaign, IL, 2024

work page 2024

[49] [49]

Deepseek-v3.2-exp: Boosting long-context efficiency with deepseek sparse attention, 2025

DeepSeek-AI. Deepseek-v3.2-exp: Boosting long-context efficiency with deepseek sparse attention, 2025. 16

work page 2025

[50] [50]

Mathematical biology: I

James D Murray. Mathematical biology: I. an introduction. interdisciplinary applied mathematics.Mathematical Biology, Springer, 17, 2002. 17

work page 2002