
arXiv: 2605.21160 · v1 · pith:CBOLIDUU · submitted 2026-05-20 · 💻 cs.LG

Learning First Integrals via Backward-Generated Data and Guided Reinforcement Learning

Pith reviewed 2026-05-21 05:52 UTC · model grok-4.3

classification 💻 cs.LG
keywords: first integrals, differential equations, backward generation, reinforcement learning, large language models, conservation laws, symbolic computation

The pith

FISolver uses backward generation of training pairs and reinforcement learning to let a compact model outperform larger LLMs and Mathematica at finding first integrals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a data-driven approach to the long-standing problem of discovering first integrals, which represent conservation laws in systems of differential equations. It creates large synthetic datasets by reversing the usual process: sampling candidate integrals first and then deriving the differential equations that would have those integrals. Supervised fine-tuning on this data is followed by reinforcement learning that rewards outputs closer in edit distance to correct solutions. The resulting system solves challenging benchmark problems more accurately than much larger language models or established symbolic solvers while using far less computation.
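For reference, the standard definition behind the backward direction: a first integral of an autonomous system is a function that stays constant along every trajectory. One generic way to derive equations from a sampled integral is to apply an antisymmetric matrix to its gradient. This construction is an illustrative assumption; the paper's exact derivation procedure may differ.

```latex
\dot{\mathbf{x}} = \mathbf{f}(\mathbf{x}),\ \mathbf{x}\in\mathbb{R}^n,
\qquad
\frac{dH}{dt} = \nabla H(\mathbf{x})\cdot\mathbf{f}(\mathbf{x}) = 0
\quad (H \text{ is a first integral})

\mathbf{f} = S\,\nabla H,\ \ S^{\top} = -S
\;\Longrightarrow\;
\nabla H\cdot\mathbf{f} = \nabla H^{\top} S\,\nabla H = 0

n = 2,\ \ S = \begin{pmatrix} 0 & 1 \\ -1 & 0 \end{pmatrix}:
\qquad \dot{x} = \partial_y H,\ \ \dot{y} = -\partial_x H
```

The middle identity holds because a scalar equals its own transpose: $v^{\top} S v = v^{\top} S^{\top} v = -v^{\top} S v$, so it must be zero.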

Core claim

By deriving differential equations from randomly sampled first integrals, the backward generation procedure produces abundant (equation, integral) training pairs. Supervised fine-tuning of a compact mathematical language model on these pairs, followed by reinforcement learning with a Levenshtein-distance-shaped reward and targeted data blending, produces FISolver, which solves difficult first-integral problems more reliably than larger mathematical LLMs or commercial solvers such as Mathematica.
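A minimal, self-contained sketch of the backward direction for planar polynomial systems, assuming the antisymmetric (Hamiltonian-style) construction x' = ∂H/∂y, y' = -∂H/∂x. The paper's sampler, expression families, and any further transformations are not described here, so `sample_integral`, `backward_generate`, and the degree and coefficient ranges are hypothetical. Polynomials are stored as dictionaries so that the conservation check is exact integer arithmetic.

```python
import random
from typing import Dict, Tuple

# A polynomial in x, y stored as {(i, j): c}, meaning the sum of c * x**i * y**j.
Poly = Dict[Tuple[int, int], int]


def sample_integral(rng: random.Random, max_deg: int = 3, n_terms: int = 3) -> Poly:
    """Sample a non-constant polynomial H(x, y) with small nonzero integer coefficients."""
    available = (max_deg + 1) * (max_deg + 2) // 2 - 1  # non-constant monomials of degree <= max_deg
    if n_terms > available:
        raise ValueError("n_terms exceeds the number of available monomials")
    H: Poly = {}
    while len(H) < n_terms:
        i = rng.randint(0, max_deg)
        j = rng.randint(0, max_deg - i)
        if i + j > 0:  # constants are trivial first integrals, so skip them
            H[(i, j)] = rng.choice([-3, -2, -1, 1, 2, 3])
    return H


def d_dx(p: Poly) -> Poly:
    return {(i - 1, j): i * c for (i, j), c in p.items() if i > 0}


def d_dy(p: Poly) -> Poly:
    return {(i, j - 1): j * c for (i, j), c in p.items() if j > 0}


def add(p: Poly, q: Poly) -> Poly:
    out = dict(p)
    for k, c in q.items():
        out[k] = out.get(k, 0) + c
    return {k: c for k, c in out.items() if c != 0}  # drop cancelled terms


def mul(p: Poly, q: Poly) -> Poly:
    out: Poly = {}
    for (i1, j1), a in p.items():
        for (i2, j2), b in q.items():
            k = (i1 + i2, j1 + j2)
            out[k] = out.get(k, 0) + a * b
    return {k: c for k, c in out.items() if c != 0}


def backward_generate(H: Poly) -> Tuple[Poly, Poly]:
    """Derive x' = dH/dy, y' = -dH/dx: a planar system that conserves H by construction."""
    return d_dy(H), {k: -c for k, c in d_dx(H).items()}


def is_first_integral(H: Poly, f: Poly, g: Poly) -> bool:
    """Exact check that dH/dt = H_x * f + H_y * g vanishes identically."""
    return add(mul(d_dx(H), f), mul(d_dy(H), g)) == {}


# Usage: build a few (equation, first integral) training pairs.
rng = random.Random(0)
dataset = []
for _ in range(5):
    H = sample_integral(rng)
    f, g = backward_generate(H)
    assert is_first_integral(H, f, g)
    dataset.append(((f, g), H))
```

Because the pairs are correct by construction, no solver is needed to label them; a real pipeline would additionally serialize each pair to strings for the language model.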

What carries the argument

The backward generation algorithm, which starts from sampled first integrals and constructs corresponding differential equations to create training pairs, together with Levenshtein Distance-based reward shaping inside reinforcement learning for exact symbolic output.
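The exact form of the shaped reward is not given on this page. A plausible sketch, assuming the reward is one minus the Levenshtein distance normalized by the longer string, gives partial credit that rises as a prediction approaches the reference integral. `shaped_reward` and its normalization are hypothetical; `levenshtein` is the classic dynamic-programming edit distance.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance with unit-cost insertions, deletions, and substitutions."""
    prev = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(
                prev[j] + 1,               # delete ca
                curr[j - 1] + 1,           # insert cb
                prev[j - 1] + (ca != cb),  # substitute, free when characters match
            ))
        prev = curr
    return prev[-1]


def shaped_reward(prediction: str, target: str) -> float:
    """Hypothetical shaping: 1.0 for an exact match, otherwise 1 - normalized distance in [0, 1)."""
    if prediction == target:  # also avoids dividing by zero when both strings are empty
        return 1.0
    return 1.0 - levenshtein(prediction, target) / max(len(prediction), len(target))


print(shaped_reward("x**2 + y**3", "x**2 + y**2"))
print(shaped_reward("x + y", "x**2 + y**2"))
```

The near miss scores higher than the distant one, which is the dense signal a sparse exact-match reward lacks. String distance is not invariant under algebraically equivalent rewrites (for example reordered terms), so how the paper canonicalizes outputs, if at all, matters.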

If this is right

  • Data synthesis by reversing from solutions to problems can relieve scarcity for other symbolic mathematics tasks.
  • Reinforcement learning with string-edit rewards can steer language models toward exact rather than approximate mathematical outputs.
  • Smaller models become viable for specialized scientific discovery once appropriate synthetic data and reward shaping are supplied.
  • Conservation-law discovery in dynamical systems can be partially automated without relying on hand-crafted heuristics or massive model scale.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same backward-generation idea could be tested on inverse problems in algebra or geometry where forward sampling is expensive.
  • If the method scales, it suggests that many conservation laws possess statistical regularities that reward-shaped models can exploit without explicit mathematical insight.
  • Hybrid pipelines might emerge in which an LLM proposes candidate integrals and a traditional verifier confirms them, reducing the need for full end-to-end symbolic search.
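In such a hybrid pipeline, a candidate can be screened cheaply before symbolic confirmation. The sketch below is a hypothetical numeric screen, not a component described in the paper: it estimates dH/dt = ∇H·f with central differences at random points and rejects the candidate if the value ever exceeds a tolerance. The function names, sampling box, and tolerance are assumptions. Passing the screen is evidence, not proof; a symbolic check that ∇H·f simplifies to zero would still be needed.

```python
import math
import random
from typing import Callable, List, Sequence

Point = Sequence[float]


def lie_derivative(H: Callable[[Point], float],
                   f: Callable[[Point], List[float]],
                   x: Point, h: float = 1e-5) -> float:
    """Approximate dH/dt = grad H(x) . f(x) using central differences."""
    fx = f(x)
    total = 0.0
    for k in range(len(x)):
        xp, xm = list(x), list(x)
        xp[k] += h
        xm[k] -= h
        total += (H(xp) - H(xm)) / (2 * h) * fx[k]
    return total


def passes_numeric_screen(H: Callable[[Point], float],
                          f: Callable[[Point], List[float]],
                          dim: int, n_points: int = 200, box: float = 2.0,
                          tol: float = 1e-6, seed: int = 0) -> bool:
    """Return True if |dH/dt| <= tol at n_points random points in [-box, box]^dim."""
    rng = random.Random(seed)
    for _ in range(n_points):
        x = [rng.uniform(-box, box) for _ in range(dim)]
        if abs(lie_derivative(H, f, x)) > tol:
            return False
    return True


# Usage: the pendulum x' = y, y' = -sin(x) conserves H = y**2/2 - cos(x).
def pendulum(p: Point) -> List[float]:
    return [p[1], -math.sin(p[0])]


def energy(p: Point) -> float:
    return 0.5 * p[1] ** 2 - math.cos(p[0])


print(passes_numeric_screen(energy, pendulum, dim=2))
```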

Load-bearing premise

The synthetic pairs created by backward generation are diverse and representative enough that a model trained on them can solve genuinely new and harder families of differential equations rather than only patterns already present in the generated distribution.

What would settle it

A new benchmark consisting of first-integral problems drawn from families that cannot be obtained by the backward-generation sampling procedure, on which FISolver fails to match or exceed the success rate of Mathematica or larger LLMs, would show that generalization has not occurred.

Figures

Figures reproduced from arXiv: 2605.21160 by Jingfeng Zhong, Shuai Li, Zhengxiang Liu, Zhijie Wang.

Figure 1. Accuracy of first integral prediction on the Normal and Hard test sets. FISolver (1.5B parameters) significantly …

Figure 2. FISolver comprises three main stages: (1) …
Original abstract

The discovery of first integrals is of fundamental scientific importance for understanding conservation laws in dynamical systems. However, existing symbolic computation tools and Large Language Models (LLMs) remain limited on this task because high-quality training data are scarce and successful solutions often depend on mathematical intuition. This paper presents FISolver, an LLM-based solver developed to address this challenge. First, we introduce a "Backward Generation" algorithm that systematically builds large-scale datasets of (differential equation, first integral) pairs by deriving differential equations from sampled integrals, thereby alleviating the data scarcity bottleneck. Second, we apply supervised fine-tuning to a compact mathematical model and further improve its performance through reinforcement learning with a Levenshtein Distance-based shaped reward. In addition, we design data synthesis and blending strategies that support effective adaptation to difficult problem families from sparse examples. Experiments show that FISolver, while requiring substantially lower computational cost, significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks, indicating a new data-driven route for automated discovery of first integrals.
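The abstract does not specify how the blending strategies work. As one hypothetical illustration, assuming the goal is to keep a sparse hard problem family visible during fine-tuning, one could upsample it to a fixed fraction of each training mix. `blend`, `hard_fraction`, and the sampling-with-replacement scheme are assumptions, not the paper's method.

```python
import random
from typing import List, Sequence, Tuple

Pair = Tuple[str, str]  # (differential equation, first integral), serialized as strings


def blend(generic: Sequence[Pair], hard: Sequence[Pair], hard_fraction: float,
          size: int, seed: int = 0) -> List[Pair]:
    """Build a shuffled mix of `size` pairs, round(size * hard_fraction) of them drawn
    with replacement from the sparse hard-family pool and the rest from the generic pool."""
    if not 0.0 <= hard_fraction <= 1.0:
        raise ValueError("hard_fraction must lie in [0, 1]")
    rng = random.Random(seed)
    n_hard = round(size * hard_fraction)
    mix = [rng.choice(hard) for _ in range(n_hard)]
    mix += [rng.choice(generic) for _ in range(size - n_hard)]
    rng.shuffle(mix)
    return mix
```

Sampling with replacement repeats the few hard examples many times, so `hard_fraction` trades faster adaptation against the risk of memorizing them; mixing in generic pairs is meant to preserve performance on the broad distribution.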

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces FISolver, an LLM-based solver for discovering first integrals of differential equations. It proposes a backward-generation algorithm that samples first integrals and derives the corresponding DEs to create large-scale synthetic (DE, integral) training pairs, followed by supervised fine-tuning of a compact mathematical LLM and reinforcement learning using a Levenshtein distance-shaped reward. Data synthesis and blending strategies are used to adapt to difficult problem families. The central claim is that FISolver achieves significantly better performance than larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks, at substantially lower computational cost.

Significance. If the performance and generalization claims hold, the work would provide a practical data-driven route to automated discovery of conservation laws, addressing the scarcity of high-quality symbolic training data in dynamical systems. The combination of backward data generation with Levenshtein-guided RL on a compact model, plus explicit strategies for sparse adaptation, represents a concrete strength; reproducible code or machine-checked elements are not mentioned but would further strengthen the contribution if present.

major comments (2)
  1. [Abstract] The claim that FISolver 'significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks' is presented without any quantitative metrics, error bars, baseline details, or a description of benchmark selection and difficulty. This absence directly undermines evaluation of the central performance claim.
  2. [Data Synthesis / Experiments] Data generation and experiments sections: no quantitative measure (e.g., Kolmogorov-Smirnov test, moment matching, or coverage statistics over known hard families) is reported comparing the distribution of backward-generated (DE, first integral) pairs to the evaluation benchmarks. Without this, the generalization claim that the synthetic distribution supports extrapolation to genuinely difficult, unseen problem families remains unverified and load-bearing for the main result.
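As a concrete example of the kind of measure the referee requests, the two-sample Kolmogorov-Smirnov statistic is the largest gap between two empirical CDFs. The sketch computes only the statistic (no p-value), and the scalar feature it is applied to, here expression length, is an assumption; agreement on one feature would not by itself establish coverage of hard families.

```python
from bisect import bisect_right
from typing import Sequence


def ks_statistic(a: Sequence[float], b: Sequence[float]) -> float:
    """Two-sample KS statistic: max over sample points of |F_a(v) - F_b(v)|."""
    a, b = sorted(a), sorted(b)
    d = 0.0
    # Empirical CDFs only jump at sample points, so checking those points suffices.
    for v in a + b:
        cdf_a = bisect_right(a, v) / len(a)
        cdf_b = bisect_right(b, v) / len(b)
        d = max(d, abs(cdf_a - cdf_b))
    return d


# Hypothetical usage: compare string lengths of synthetic and benchmark integrals.
synthetic = ["x**2 + y**2", "x*y", "y**2/2 - cos(x)"]
benchmark = ["x*y*exp(x)", "log(x) - x + 2*log(y) - y"]
print(ks_statistic([len(s) for s in synthetic], [len(s) for s in benchmark]))
```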
minor comments (2)
  1. [Abstract] The abstract would be clearer if it briefly stated the base model size, number of benchmarks, and exact performance deltas rather than qualitative descriptors.
  2. [Method] Notation for the Levenshtein reward and the blending strategies could be introduced earlier with a short equation or pseudocode to aid readability.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed review. We address each major comment point by point below and outline the planned revisions to the manuscript.

  1. Referee: [Abstract] The claim that FISolver 'significantly outperforms larger mathematical LLMs and commercial solvers such as Mathematica on challenging benchmarks' is presented without any quantitative metrics, error bars, baseline details, or a description of benchmark selection and difficulty. This absence directly undermines evaluation of the central performance claim.

    Authors: We agree that the abstract would benefit from explicit quantitative support for the performance claim. In the revised manuscript we will update the abstract to incorporate the key metrics already reported in the experiments section, including success rates on the benchmarks, direct comparisons to the larger LLMs and Mathematica, and brief indications of benchmark selection criteria and observed variability. This change will make the central claim easier to evaluate while preserving the abstract's length and focus. Revision: yes.

  2. Referee: [Data Synthesis / Experiments] Data generation and experiments sections: no quantitative measure (e.g., Kolmogorov-Smirnov test, moment matching, or coverage statistics over known hard families) is reported comparing the distribution of backward-generated (DE, first integral) pairs to the evaluation benchmarks. Without this, the generalization claim that the synthetic distribution supports extrapolation to genuinely difficult, unseen problem families remains unverified and load-bearing for the main result.

    Authors: We acknowledge that an explicit quantitative comparison of the synthetic data distribution to the evaluation benchmarks would strengthen the generalization argument. The backward-generation algorithm is constructed to sample broadly across polynomial, trigonometric, and other families that appear in standard benchmarks, but we did not include distributional statistics in the original submission. We will add coverage statistics and, where appropriate, moment-matching or diversity metrics in the revised data-synthesis section to document the overlap with known hard families and thereby support the extrapolation claim. Revision: yes.

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained.

full rationale

The paper generates (DE, integral) training pairs independently via backward sampling of integrals followed by derivation of the corresponding DEs; the RL reward is defined externally as Levenshtein distance to ground-truth integrals; evaluation occurs on separate challenging benchmarks. No load-bearing step equates a claimed prediction or result to its own fitted inputs or self-citations by construction. The method is data-driven and externally validated rather than tautological.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unverified assumption that backward-generated synthetic data plus string-edit reward produces genuine generalization rather than distribution-specific overfitting; no free parameters or invented entities are explicitly named in the abstract.

axioms (1)
  • domain assumption: Large language models can be effectively fine-tuned and reinforced on symbolic mathematical tasks when provided with sufficient high-quality paired data.
    Implicit in the use of supervised fine-tuning and RL on the generated dataset.

pith-pipeline@v0.9.0 · 5713 in / 1366 out tokens · 40412 ms · 2026-05-21T05:52:04.523745+00:00


