Recognition: 1 theorem link · Lean Theorem
Learning to Unscramble: Simplifying Symbolic Expressions via Self-Supervised Oracle Trajectories
Pith reviewed 2026-05-15 12:28 UTC · model grok-4.3
The pith
Self-supervised training on scrambled expressions lets a policy network simplify complex high-energy physics expressions to near-perfect accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
By training exclusively on oracle trajectories obtained from scrambling and unscrambling simple expressions, the model learns a policy that generalizes to simplify far larger and structurally different expressions that appear in actual 5-point gluon tree-level amplitudes, reaching a 100 percent full-simplification rate when the policy is augmented with contrastive grouping and beam search.
What carries the argument
A permutation-equivariant transformer-based policy network trained stepwise to predict the next oracle action from the current scrambled expression.
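To make "permutation-equivariant" concrete: reordering the input terms should reorder the outputs in the same way and change nothing else. The sketch below is an illustration only, not the paper's architecture. It assumes a single-head self-attention over scalar tokens with no positional encoding (the names `self_attention` and `permute` are hypothetical) and checks the property numerically.

```python
import math

def self_attention(xs):
    """Single-head self-attention over scalar tokens, with no positional encoding."""
    out = []
    for xi in xs:
        scores = [xi * xj for xj in xs]              # query-key products
        m = max(scores)                              # subtract max for a stable softmax
        weights = [math.exp(s - m) for s in scores]
        z = sum(weights)
        out.append(sum(w * xj for w, xj in zip(weights, xs)) / z)
    return out

def permute(xs, perm):
    """Reorder xs so that position k holds xs[perm[k]]."""
    return [xs[p] for p in perm]

xs = [0.5, -1.0, 2.0, 0.1]      # toy token values (assumption)
perm = [2, 0, 3, 1]

lhs = self_attention(permute(xs, perm))   # permute first, then apply the layer
rhs = permute(self_attention(xs), perm)   # apply the layer, then permute

# Equivariance: both orders agree up to floating-point summation order.
equivariant = all(
    math.isclose(a, b, rel_tol=1e-9, abs_tol=1e-12) for a, b in zip(lhs, rhs)
)
```

Adding a positional encoding to the tokens would generally break this check, which is why a policy over unordered sums of terms plausibly avoids one.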
Load-bearing premise
That a policy learned only from trajectories of scrambled simple expressions will generalize without failure to the structurally different and much larger expressions that arise in real high-energy physics calculations.
What would settle it
Finding even one 5-point gluon amplitude outside the reported test set, or any 6-point tree-level amplitude, on which the model plus beam search fails to reach a fully simplified form.
Original abstract
We present a new self-supervised machine learning approach for symbolic simplification of complex mathematical expressions. Training data is generated by scrambling simple expressions and recording the inverse operations, creating oracle trajectories that provide both goal states and explicit paths to reach them. A permutation-equivariant, transformer-based policy network is then trained on this data step-wise to predict the oracle action given the input expression. We demonstrate this approach on two problems in high-energy physics: dilogarithm reduction and spinor-helicity scattering amplitude simplification. In both cases, our trained policy network achieves near perfect solve rates across a wide range of difficulty levels, substantially outperforming prior approaches based on reinforcement learning and end-to-end regression. When combined with contrastive grouping and beam search, our model achieves a 100% full simplification rate on a representative selection of 5-point gluon tree-level amplitudes in Yang-Mills theory, including expressions with over 200 initial terms.
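The scramble-and-record idea in the abstract can be sketched on a toy domain. This is a minimal illustration under stated assumptions, not the paper's data pipeline: an "expression" is a list of integers read as a sum, the only scrambling move splits one term into two, and its inverse is a hypothetical `("merge", i)` action. The paper's real actions are dilogarithm and spinor-helicity identities, which are not modelled here.

```python
import random

def scramble_step(terms, rng):
    """Split one term into two with the same sum; return the new state and the inverse action."""
    i = rng.randrange(len(terms))
    a = rng.randint(-5, 5)
    b = terms[i] - a
    new_terms = terms[:i] + [a, b] + terms[i + 1:]
    # The inverse of this split merges the adjacent terms at positions i and i + 1.
    return new_terms, ("merge", i)

def apply_action(terms, action):
    """Apply a ("merge", i) action: replace terms i and i + 1 by their sum."""
    kind, i = action
    assert kind == "merge"
    return terms[:i] + [terms[i] + terms[i + 1]] + terms[i + 2:]

def oracle_trajectory(simple, n_steps, seed=0):
    """Scramble `simple` n_steps times, then return (state, oracle_action) training pairs."""
    rng = random.Random(seed)
    state = list(simple)
    inverse_actions = []
    for _ in range(n_steps):
        state, inverse = scramble_step(state, rng)
        inverse_actions.append(inverse)
    # Walk back from the most scrambled state, undoing the last scramble first.
    pairs = []
    for action in reversed(inverse_actions):
        pairs.append((list(state), action))
        state = apply_action(state, action)
    return pairs, state   # the final state equals the simple starting expression

pairs, recovered = oracle_trajectory([7, 3], n_steps=4, seed=1)
```

Each pair is one supervised example: the policy sees the state and is trained to predict the recorded action, so no search or reward signal is needed to produce labels.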
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a self-supervised approach to symbolic simplification in which simple expressions are scrambled to generate oracle trajectories that serve as training data for a permutation-equivariant transformer policy network. The network is trained step-wise to predict the inverse scrambling actions. The method is applied to dilogarithm reduction and to the simplification of spinor-helicity scattering amplitudes in Yang-Mills theory. The abstract reports near-perfect solve rates across a range of difficulty levels and states that, when augmented with contrastive grouping and beam search, the model reaches a 100% full simplification rate on a representative set of 5-point gluon tree-level amplitudes, including expressions with more than 200 initial terms.
Significance. If the reported generalization from scrambled simple expressions to large physical amplitudes can be substantiated, the work would offer a data-efficient, self-supervised alternative to reinforcement-learning or regression-based symbolic manipulators. This could reduce the manual effort required for amplitude simplification in high-energy physics and provide a reproducible pipeline for generating training trajectories without hand-crafted heuristics.
major comments (2)
- [Abstract] The headline claim of a 100% full simplification rate on 5-point gluon amplitudes with >200 initial terms is presented without any description of the training/validation split, the distribution of expression sizes or nesting depths in the training set, or an ablation that isolates the contribution of the learned policy from that of beam search. This directly bears on the central generalization claim and prevents assessment of whether the policy network itself succeeds on structurally dissimilar large expressions.
- [Abstract] No error analysis, failure cases, or controls for overfitting are supplied despite the policy being trained exclusively on trajectories from simple scrambled expressions. Without these, it is impossible to evaluate whether the reported performance on the 200-term amplitudes reflects successful transfer or is an artifact of the search procedure.
minor comments (1)
- [Abstract] The abstract refers to 'near perfect solve rates across a wide range of difficulty levels' but supplies no quantitative table or figure that would allow the reader to judge the scaling with expression size or term count.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on our manuscript. We address the major points raised below and will revise the manuscript to improve clarity on the training details, generalization evidence, and analysis of the results.
- Referee: [Abstract] The headline claim of a 100% full simplification rate on 5-point gluon amplitudes with >200 initial terms is presented without any description of the training/validation split, the distribution of expression sizes or nesting depths in the training set, or an ablation that isolates the contribution of the learned policy from that of beam search. This directly bears on the central generalization claim and prevents assessment of whether the policy network itself succeeds on structurally dissimilar large expressions.
  Authors: We agree that the abstract would benefit from additional context on these aspects to support the generalization claim. The full manuscript (Section 3) specifies that training trajectories are generated from scrambled expressions with 2–15 terms and nesting depths up to 3, using an 80/20 train/validation split on the oracle trajectories. We will revise the abstract to briefly summarize the training distribution and add an ablation study (new subsection in Section 5) that compares the learned policy against beam search with a random or heuristic policy to isolate the network's contribution.
  Revision: yes
- Referee: [Abstract] No error analysis, failure cases, or controls for overfitting are supplied despite the policy being trained exclusively on trajectories from simple scrambled expressions. Without these, it is impossible to evaluate whether the reported performance on the 200-term amplitudes reflects successful transfer or is an artifact of the search procedure.
  Authors: We acknowledge that a more explicit error analysis would strengthen the paper. The current manuscript reports near-perfect solve rates across a range of difficulty levels in Section 4, but we will expand the revision to include a dedicated error analysis subsection with representative failure cases (e.g., deeply nested dilogarithms) and a control experiment that evaluates the policy network on large amplitudes without beam search or contrastive grouping. This will help demonstrate that the performance reflects transfer rather than solely the search procedure.
  Revision: yes
Circularity Check
No circularity: training trajectories generated independently of target expressions
The paper's core method generates oracle trajectories exclusively by scrambling simple expressions and recording their inverse operations; these trajectories serve as training data for the policy network. The target 5-point gluon amplitudes (with >200 terms) are never used in training or as part of the oracle generation process. Consequently, the reported 100% simplification rate on those amplitudes cannot reduce to a fit or self-definition by construction. The paper describes no load-bearing self-citation, uniqueness theorem, or smuggled ansatz that would collapse the result to its inputs. The evaluation is therefore independent of the training data: success is measured on held-out, structurally distinct expressions.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
  Relation between the paper passage and the cited Recognition theorem: unclear.
  "Training data is generated by scrambling simple expressions and recording the inverse operations, creating oracle trajectories... A permutation-equivariant, transformer-based policy network is then trained on this data step-wise to predict the oracle action"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 3 Pith papers
- Learning to Unscramble Feynman Loop Integrals with SAILIR
  A self-supervised transformer learns to unscramble Feynman integrals for online IBP reduction, delivering bounded memory use on complex two-loop topologies while matching Kira's speed on the hardest cases tested.
- When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic-Actor Loop for Agentic Reasoning
  Structured critic-actor loops improve AI performance on theoretical physics reasoning tasks, with benefits strongest in asymmetric model pairings using constructive feedback.
- A Scientific Human-Agent Reproduction Pipeline
  SHARP is a human-AI collaboration pipeline for reproducing scientific analyses, demonstrated by recreating a jet classification task from a particle physics paper.
Reference graph
Works this paper leans on
- [1] Phase 1: Greedy Contrastive Grouping. Since input expressions have up to ∼200 terms but the model accepts at most 25, we follow CDS and use a contrastive grouping strategy to decompose the problem. A pre-trained contrastive encoder [18, 27] computes an embedding for each numerator term, and pairwise cosine similarities identify groups of terms likely to s...
- [2] Computes the pairwise cosine similarity matrix over all current numerator terms (with numerical coefficients removed for robustness)
- [3] For each reference term, selects the most similar neighbors above a threshold, forming groups of up to 25 terms
- [4] For each group, factors out common spinor brackets shared by all terms, reducing the sub-expression complexity before model evaluation
- [5] Applies a single model action to the reduced sub-expression, accepts the result if it reduces the term count, and reassembles with the remaining terms
- [6] After a full pass through all reference terms, re-cancels the expression and begins a new pass with updated similarities. The similarity threshold starts at 0.6 and relaxes across passes (through 0.5 and 0.4), progressively allowing less similar terms to be grouped. Up to 100 passes are performed per form, with numerical validation after each pass to d...
- [7] Phase 2: Beam Search. Forms not solved by the greedy phase enter a beam search over sequences of identity applications. The beam maintains the top K = 20 candidate expressions, ranked by (n_terms, n_brackets). At each step, every beam entry is expanded using contrastive grouping: the contrastive encoder computes pairwise cosine similarities among all numerator...
- [8] Results. All 103 forms are successfully simplified to the 1-term Parke-Taylor formula, a 100% solve rate. Of these, 17 are solved in the greedy phase alone and the remaining 86 require beam search. The hardest form (form990, starting at 178 terms) required 98 beam search steps to reach the target. Fig. 6 compares our solve rate against CDS as a functio...
- [9] G. Lample and F. Charton, in International Conference on Learning Representations (ICLR) (2020), arXiv:1912.01412 [cs.SC]
- [10] M. Cranmer, A. Sanchez-Gonzalez, P. Battaglia, R. Xu, K. Cranmer, D. Spergel, and S. Ho, in Advances in Neural Information Processing Systems 33 (2020) pp. 17429–17442, arXiv:2006.11287 [cs.LG]
- [11] G. Lample, M.-A. Lachaux, T. Lavril, X. Martinet, A. Hayat, G. Ebner, A. Rodriguez, and T. Lacroix, in Advances in Neural Information Processing Systems 35 (2022), arXiv:2205.11491 [cs.LG]
- [12] L. Dabelow and M. Ueda, Neurocomputing 613, 128732 (2024), arXiv:2401.13447 [cs.LG]
- [13]
- [14] A. N. Kirillov, Prog. Theor. Phys. Suppl. 118, 61 (1995), arXiv:hep-th/9408113
- [15] A. V. Kotikov, Phys. Lett. B 254, 158 (1991)
- [16] T. Gehrmann and E. Remiddi, "Two-Loop Master Integrals for $\gamma^* \to 3$ Jets: The planar topologies," Nucl. Phys. B 601, 248 (2001), arXiv:hep-ph/0008287
- [17] S. J. Parke and T. R. Taylor, Phys. Rev. Lett. 56, 2459 (1986)
- [18] V. P. Nair, Phys. Lett. B 214, 215 (1988)
- [19] E. Witten, "Perturbative Gauge Theory As A String Theory In Twistor Space," Commun. Math. Phys. 252, 189 (2004), arXiv:hep-th/0312171
- [20] F. Cachazo, P. Svrček, and E. Witten, "MHV Vertices And Tree Amplitudes In Gauge Theory," JHEP 09, 006 (2004), arXiv:hep-th/0403047
- [21] R. Britto, F. Cachazo, B. Feng, and E. Witten, "Direct Proof Of Tree-Level Recursion Relation In Yang-Mills Theory," Phys. Rev. Lett. 94, 181602 (2005), arXiv:hep-th/0501052
- [22] L. J. Dixon, in Theoretical Advanced Study Institute in Elementary Particle Physics: Particle Physics: The Higgs Boson and Beyond (2014) pp. 31–67, arXiv:1310.5353 [hep-ph]
- [23] H. Elvang and Y.-t. Huang, Scattering Amplitudes in Gauge Theory and Gravity (Cambridge University Press, 2015)
- [24] C. Cheung, "TASI Lectures on Scattering Amplitudes," in Proceedings, Theoretical Advanced Study Institute in Elementary Particle Physics (TASI 2016) (2018) pp. 571–623, arXiv:1708.03872 [hep-ph]
- [25]
- [26]
- [27] D. A. Pomerleau, in Advances in Neural Information Processing Systems 1, edited by D. S. Touretzky (Morgan Kaufmann, 1989) pp. 305–313
- [28]
- [29] F. Agostinelli, S. McAleer, A. Shmakov, R. Bhatt, et al., Nat. Mach. Intell. 1, 356 (2019)
- [30]
- [31] S. Ross, G. J. Gordon, and D. Bagnell, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, Vol. 15 (PMLR, 2011) pp. 627–635
- [32] A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, in Advances in Neural Information Processing Systems 30 (2017) pp. 5998–6008, arXiv:1706.03762 [cs.CL]
- [33] D. Zagier, in Frontiers in Number Theory, Physics, and Geometry II (Springer, Berlin, Heidelberg, 2007) pp. 3–65
- [34] A. Dersy, M. D. Schwartz, and X. Zhang, "ML Polylogarithms: Code and pretrained models for polylogarithm simplification," https://github.com/aureliendersy/ML_Polylogarithms (2022)
- [35] C. Cheung, A. Dersy, and M. D. Schwartz, "spinorhelicity: Code and pretrained models for scattering amplitude simplification," https://github.com/aureliendersy/spinorhelicity (2024)
- [36]
- [37] K. G. Chetyrkin and F. V. Tkachov, Nucl. Phys. B 192, 159 (1981)
- [38] S. Laporta, "High-precision calculation of multi-loop Feynman integrals by difference equations," Int. J. Mod. Phys. A 15, 5087 (2000), arXiv:hep-ph/0102033
- [39] M. von Hippel and M. Wilhelm, JHEP 05, 185 (2025), arXiv:2502.05121 [hep-th]
- [40] Z.-Y. Song, T.-Z. Yang, Q.-H. Cao, M.-x. Luo, and H. X. Zhu, (2025), arXiv:2502.09544 [hep-ph]
- [41] M. Zeng, Phys. Rev. D (2025), 10.1103/dmlf-jkfc, arXiv:2504.16045 [hep-ph]
- [42]
- [43]
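The two search phases excerpted in entries [1] to [7] of the reference graph can be sketched on toy data. This is a rough illustration under stated assumptions, not the authors' implementation. The thresholds 0.6, 0.5, 0.4, the group cap of 25, and the beam width K = 20 come from the excerpts; the 2-D embeddings, the integer "expressions", the merge move, the `max_steps` cap, and all function names are hypothetical stand-ins for the contrastive encoder, spinor-helicity terms, and model actions.

```python
import math

def cosine(u, v):
    """Cosine similarity of two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)

def group_for(ref, embeddings, threshold, max_size=25):
    """Phase 1 grouping: the reference term plus its most similar neighbours above threshold."""
    scored = [(cosine(embeddings[ref], e), j)
              for j, e in enumerate(embeddings) if j != ref]
    keep = sorted(((s, j) for s, j in scored if s >= threshold), reverse=True)
    return [ref] + [j for _, j in keep[:max_size - 1]]

def beam_search(start, expand, score, is_solved, beam_width=20, max_steps=100):
    """Phase 2: keep the beam_width best candidates per step; return (solution, steps)."""
    if is_solved(start):
        return start, 0
    beam, seen = [start], {start}
    for step in range(1, max_steps + 1):
        children = []
        for cand in beam:
            for child in expand(cand):
                if child in seen:
                    continue
                if is_solved(child):
                    return child, step
                seen.add(child)
                children.append(child)
        if not children:
            return None, step
        beam = sorted(children, key=score)[:beam_width]
    return None, max_steps

def expand(terms):
    """Toy move (assumption): merge any two terms into their sum, dropping a zero result."""
    out = []
    for i in range(len(terms)):
        for j in range(i + 1, len(terms)):
            rest = [t for k, t in enumerate(terms) if k not in (i, j)]
            merged = terms[i] + terms[j]
            if merged != 0:
                rest.append(merged)
            out.append(tuple(sorted(rest)))
    return out

def score(terms):
    """Rank by (n_terms, stand-in for n_brackets); lower is better."""
    return (len(terms), sum(abs(t) for t in terms))

# Toy embeddings (assumption): similarity to term 0 falls off with index.
embeddings = [(1.0, 0.0), (0.9, 0.1), (0.55, 0.835), (0.45, 0.893), (0.0, 1.0)]
groups = [group_for(0, embeddings, t) for t in (0.6, 0.5, 0.4)]

result, steps = beam_search((3, -3, 5, 2, -2), expand, score, lambda t: len(t) <= 1)
```

In the toy run, relaxing the threshold admits one more neighbour of term 0 at each level, mirroring how later passes allow less similar terms to be grouped. In this toy domain every move shrinks the expression, so a greedy pass alone would suffice; the beam only matters when some moves are needed that do not immediately reduce the term count.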