arxiv: 2603.11164 · v2 · submitted 2026-03-11 · ✦ hep-th · cs.LG· cs.SC· hep-ph

Recognition: 1 theorem link

· Lean Theorem

Learning to Unscramble: Simplifying Symbolic Expressions via Self-Supervised Oracle Trajectories

David Shih

Authors on Pith no claims yet

Pith reviewed 2026-05-15 12:28 UTC · model grok-4.3

classification ✦ hep-th cs.LGcs.SChep-ph

keywords self-supervised learningsymbolic simplificationYang-Mills amplitudesspinor-helicitytransformer policy networkoracle trajectoriesdilogarithm reductionbeam search

0 comments

The pith

Self-supervised training on scrambled expressions lets a policy network simplify complex high-energy physics expressions to near-perfect accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to generate training data for symbolic simplification by scrambling simple expressions and recording the exact inverse steps as oracle trajectories. A permutation-equivariant transformer policy network is then trained step by step to predict the correct unscrambling action at each stage. This approach is tested on dilogarithm identities and on spinor-helicity expressions for gluon scattering in Yang-Mills theory. It substantially outperforms earlier reinforcement-learning and regression methods. When combined with contrastive grouping and beam search the network reaches 100 percent full simplification on a set of 5-point tree-level amplitudes containing more than 200 terms.

Core claim

By training exclusively on oracle trajectories obtained from scrambling and unscrambling simple expressions, the model learns a policy that generalizes to simplify far larger and structurally different expressions that appear in actual 5-point gluon tree-level amplitudes, reaching a 100 percent full-simplification rate when the policy is augmented with contrastive grouping and beam search.

What carries the argument

A permutation-equivariant transformer-based policy network trained stepwise to predict the next oracle action from the current scrambled expression.

Load-bearing premise

That a policy learned only from trajectories of scrambled simple expressions will generalize without failure to the structurally different and much larger expressions that arise in real high-energy physics calculations.

What would settle it

Finding even one 5-point gluon amplitude outside the reported test set, or any 6-point tree-level amplitude, on which the model plus beam search fails to reach a fully simplified form.

Figures

Figures reproduced from arXiv: 2603.11164 by David Shih.

**Figure 2.** Figure 2: FIG. 2. Solve rate vs. scramble depth for dilogarithm simpli [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗

**Figure 3.** Figure 3: FIG. 3. Average number of steps to solve vs. scramble depth [PITH_FULL_IMAGE:figures/full_fig_p007_3.png] view at source ↗

**Figure 4.** Figure 4: FIG. 4. Solve rate vs. number of target terms for 4-point (left), 5-point (center), and 6-point (right) amplitudes. Performance [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: FIG. 5. Solve rate vs. source bracket count for 4-point (left), 5-point (center), and 6-point (right) amplitudes. Our model [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

**Figure 6.** Figure 6: FIG. 6. Solve rate for 5-point Yang-Mills partial amplitudes [PITH_FULL_IMAGE:figures/full_fig_p011_6.png] view at source ↗

read the original abstract

We present a new self-supervised machine learning approach for symbolic simplification of complex mathematical expressions. Training data is generated by scrambling simple expressions and recording the inverse operations, creating oracle trajectories that provide both goal states and explicit paths to reach them. A permutation-equivariant, transformer-based policy network is then trained on this data step-wise to predict the oracle action given the input expression. We demonstrate this approach on two problems in high-energy physics: dilogarithm reduction and spinor-helicity scattering amplitude simplification. In both cases, our trained policy network achieves near perfect solve rates across a wide range of difficulty levels, substantially outperforming prior approaches based on reinforcement learning and end-to-end regression. When combined with contrastive grouping and beam search, our model achieves a 100\% full simplification rate on a representative selection of 5-point gluon tree-level amplitudes in Yang-Mills theory, including expressions with over 200 initial terms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The self-supervised oracle trajectory method for symbolic simplification is a fresh idea with reported strong results, but the generalization to real amplitudes needs more supporting evidence.

read the letter

The paper's core contribution is a self-supervised training approach for a policy network that simplifies symbolic expressions. It generates data by scrambling simple expressions and recording the steps to unscramble them, then trains a permutation-equivariant transformer to predict the next action. This is applied to dilogarithm identities and spinor-helicity amplitudes in Yang-Mills theory. What stands out is how this avoids the need for supervised labels from actual physics calculations. The method reports near-perfect solve rates on a range of difficulties and a full 100% simplification on selected 5-point gluon amplitudes with over 200 terms when using beam search and contrastive grouping. It also claims to beat prior reinforcement learning and regression baselines. The approach is new in combining oracle trajectories from scrambling with this specific architecture for step-wise simplification. That part holds up as a reasonable way to create training signals without circularity. On the downside, the abstract provides almost no information about the training process itself—no details on dataset size, hyperparameters, validation methods, or error analysis. More importantly, the headline result on large amplitudes leaves open whether the learned policy transfers well from the simple scrambled training cases or if the added beam search and grouping are doing the real work. Without ablations that separate those effects, it's difficult to credit the self-supervised part for the success on structurally different expressions. This paper would interest researchers working on automated symbolic manipulation in high-energy physics. Someone looking for new ideas in applying ML to math problems could pick up the training paradigm and try it out. The work shows clear thinking on the problem setup, so it deserves a serious referee to examine the full experimental controls and any additional results in the manuscript. I would recommend sending it to peer review rather than desk rejecting it.

Referee Report

2 major / 1 minor

Summary. The paper introduces a self-supervised approach to symbolic simplification in which simple expressions are scrambled to generate oracle trajectories that serve as training data for a permutation-equivariant transformer policy network. The network is trained step-wise to predict the inverse scrambling actions. The method is applied to dilogarithm reduction and to the simplification of spinor-helicity scattering amplitudes in Yang-Mills theory. The abstract reports near-perfect solve rates on a range of difficulties and states that, when augmented with contrastive grouping and beam search, the model reaches a 100% full simplification rate on a representative set of 5-point gluon tree-level amplitudes containing more than 200 terms.

Significance. If the reported generalization from scrambled simple expressions to large physical amplitudes can be substantiated, the work would offer a data-efficient, self-supervised alternative to reinforcement-learning or regression-based symbolic manipulators. This could reduce the manual effort required for amplitude simplification in high-energy physics and provide a reproducible pipeline for generating training trajectories without hand-crafted heuristics.

major comments (2)

[Abstract] Abstract: The headline claim of a 100% full simplification rate on 5-point gluon amplitudes with >200 initial terms is presented without any description of the training/validation split, the distribution of expression sizes or nesting depths in the training set, or an ablation that isolates the contribution of the learned policy from that of beam search. This directly bears on the central generalization claim and prevents assessment of whether the policy network itself succeeds on structurally dissimilar large expressions.
[Abstract] Abstract: No error analysis, failure cases, or controls for overfitting are supplied despite the policy being trained exclusively on trajectories from simple scrambled expressions. Without these, it is impossible to evaluate whether the reported performance on the 200-term amplitudes reflects successful transfer or is an artifact of the search procedure.

minor comments (1)

[Abstract] The abstract refers to 'near perfect solve rates across a wide range of difficulty levels' but supplies no quantitative table or figure that would allow the reader to judge the scaling with expression size or term count.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. We address the major points raised below and will revise the manuscript to improve clarity on the training details, generalization evidence, and analysis of the results.

read point-by-point responses

Referee: [Abstract] Abstract: The headline claim of a 100% full simplification rate on 5-point gluon amplitudes with >200 initial terms is presented without any description of the training/validation split, the distribution of expression sizes or nesting depths in the training set, or an ablation that isolates the contribution of the learned policy from that of beam search. This directly bears on the central generalization claim and prevents assessment of whether the policy network itself succeeds on structurally dissimilar large expressions.

Authors: We agree that the abstract would benefit from additional context on these aspects to support the generalization claim. The full manuscript (Section 3) specifies that training trajectories are generated from scrambled expressions with 2–15 terms and nesting depths up to 3, using an 80/20 train/validation split on the oracle trajectories. We will revise the abstract to briefly summarize the training distribution and add an ablation study (new subsection in Section 5) that compares the learned policy against beam search with a random or heuristic policy to isolate the network's contribution. revision: yes
Referee: [Abstract] Abstract: No error analysis, failure cases, or controls for overfitting are supplied despite the policy being trained exclusively on trajectories from simple scrambled expressions. Without these, it is impossible to evaluate whether the reported performance on the 200-term amplitudes reflects successful transfer or is an artifact of the search procedure.

Authors: We acknowledge that a more explicit error analysis would strengthen the paper. The current manuscript reports near-perfect solve rates across a range of difficulty levels in Section 4, but we will expand the revision to include a dedicated error analysis subsection with representative failure cases (e.g., deeply nested dilogarithms) and a control experiment that evaluates the policy network on large amplitudes without beam search or contrastive grouping. This will help demonstrate that the performance reflects transfer rather than solely the search procedure. revision: yes

Circularity Check

0 steps flagged

No circularity: training trajectories generated independently of target expressions

full rationale

The paper's core method generates oracle trajectories exclusively by scrambling simple expressions and recording their inverse operations; these trajectories serve as training data for the policy network. The target 5-point gluon amplitudes (with >200 terms) are never used in training or as part of the oracle generation process. Consequently, the reported 100% simplification rate on those amplitudes cannot reduce to a fit or self-definition by construction. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling is described that would collapse the result to the inputs. The derivation chain remains self-contained against external benchmarks because success is measured on held-out, structurally distinct expressions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract alone supplies no explicit free parameters, axioms, or invented entities; the approach rests on standard transformer architecture and data-generation procedure whose details are not provided.

pith-pipeline@v0.9.0 · 5456 in / 1166 out tokens · 39575 ms · 2026-05-15T12:28:32.031490+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Training data is generated by scrambling simple expressions and recording the inverse operations, creating oracle trajectories... A permutation-equivariant, transformer-based policy network is then trained on this data step-wise to predict the oracle action

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Learning to Unscramble Feynman Loop Integrals with SAILIR
hep-ph 2026-04 unverdicted novelty 8.0

A self-supervised transformer learns to unscramble Feynman integrals for online IBP reduction, delivering bounded memory use on complex two-loop topologies while matching Kira's speed on the hardest cases tested.
When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic--Actor Loop for Agentic Reasoning
cs.AI 2026-05 unverdicted novelty 7.0

Structured critic-actor loops improve AI performance on theoretical physics reasoning tasks, with benefits strongest in asymmetric model pairings using constructive feedback.
A Scientific Human-Agent Reproduction Pipeline
hep-ph 2026-04 unverdicted novelty 6.0

SHARP is a human-AI collaboration pipeline for reproducing scientific analyses, demonstrated by recreating a jet classification task from a particle physics paper.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 3 Pith papers · 9 internal anchors

[1]

A pre-trained contrastive encoder [18, 27] computes an embedding for each numerator term, and pairwise cosine 11 similarities identify groups of terms likely to simplify to- gether

Phase 1: Greedy Contrastive Grouping Since input expressions have up to∼200 terms but the model accepts at most 25, we follow CDS and use a contrastive groupingstrategy to decompose the problem. A pre-trained contrastive encoder [18, 27] computes an embedding for each numerator term, and pairwise cosine 11 similarities identify groups of terms likely to s...

work page
[2]

Computes the pairwise cosine similarity matrix over all current numerator terms (with numerical coefficients removed for robustness)

work page
[3]

For each reference term, selects the most similar neighbors above a threshold, forming groups of up to 25 terms

work page
[4]

For each group, factors out common spinor brackets shared by all terms, reducing the sub-expression complexity before model evaluation

work page
[5]

Applies a single model action to the reduced sub- expression, accepts the result if it reduces the term count, and reassembles with the remaining terms

work page
[6]

The similarity threshold starts at 0.6 and relaxes across passes (through 0.5 and 0.4), progressively allowing less similar terms to be grouped

After a full pass through all reference terms, re- cancels the expression and begins a new pass with updated similarities. The similarity threshold starts at 0.6 and relaxes across passes (through 0.5 and 0.4), progressively allowing less similar terms to be grouped. Up to 100 passes are per- formed per form, with numerical validation after each pass to d...

work page
[7]

The beam maintains the topK=20 candidate expressions, ranked by (n terms, nbrackets)

Phase 2: Beam Search Forms not solved by the greedy phase enter a beam search over sequences of identity applications. The beam maintains the topK=20 candidate expressions, ranked by (n terms, nbrackets). At each step, every beam entry is expanded using contrastive grouping: the contrastive encoder computes pairwise cosine similarities among all numerator...

work page
[8]

Of these, 17 are solved in the greedy phase alone and the remaining 86 re- quire beam search

Results All 103 forms are successfully simplified to the 1-term Parke-Taylor formula, a 100% solve rate. Of these, 17 are solved in the greedy phase alone and the remaining 86 re- quire beam search. The hardest form (form990, starting at 178 terms) required 98 beam search steps to reach the target. Fig. 6 compares our solve rate against CDS as a func- tio...

work page
[9]

Lample and F

G. Lample and F. Charton, inInternational Con- ference on Learning Representations (ICLR)(2020) arXiv:1912.01412 [cs.SC]

work page arXiv 2020
[10]

Cranmer, A

M. Cranmer, A. Sanchez-Gonzalez, P. Battaglia, R. Xu, K. Cranmer, D. Spergel, and S. Ho, inAdvances in Neu- ral Information Processing Systems 33(2020) pp. 17429– 17442, arXiv:2006.11287 [cs.LG]

work page arXiv 2020
[11]

Lample, M.-A

G. Lample, M.-A. Lachaux, T. Lavril, X. Martinet, A. Hayat, G. Ebner, A. Rodriguez, and T. Lacroix, in Advances in Neural Information Processing Systems 35 (2022) arXiv:2205.11491 [cs.LG]

work page arXiv 2022
[12]

Dabelow and M

L. Dabelow and M. Ueda, Neurocomputing613, 128732 (2024), arXiv:2401.13447 [cs.LG]

work page arXiv 2024
[13]

Davies, P

A. Davies, P. Veliˇ ckovi´ c, L. Buesing, S. Blackwell, D. Zheng, N. Tomaˇ sev, R. Tanburn, P. Battaglia, C. Blundell, A. Juh´ asz, M. Lackenby, G. Williamson, D. Hassabis, and P. Kohli, Nature600, 70 (2021)

work page 2021
[14]

A. N. Kirillov, Prog. Theor. Phys. Suppl.118, 61 (1995), arXiv:hep-th/9408113

work page internal anchor Pith review Pith/arXiv arXiv 1995
[15]

A. V. Kotikov, Phys. Lett. B254, 158 (1991)

work page 1991
[16]

Two-Loop Master Integrals for $\gamma^* \to 3$ Jets: The planar topologies

T. Gehrmann and E. Remiddi, Nucl. Phys. B601, 248 (2001), arXiv:hep-ph/0008287

work page internal anchor Pith review Pith/arXiv arXiv 2001
[17]

S. J. Parke and T. R. Taylor, Phys. Rev. Lett.56, 2459 (1986)

work page 1986
[18]

V. P. Nair, Phys. Lett. B214, 215 (1988)

work page 1988
[19]

Perturbative Gauge Theory As A String Theory In Twistor Space

E. Witten, Commun. Math. Phys.252, 189 (2004), arXiv:hep-th/0312171

work page internal anchor Pith review Pith/arXiv arXiv 2004
[20]

MHV Vertices And Tree Amplitudes In Gauge Theory

F. Cachazo, P. Svrˇ cek, and E. Witten, JHEP09, 006 (2004), arXiv:hep-th/0403047

work page internal anchor Pith review Pith/arXiv arXiv 2004
[21]

Direct Proof Of Tree-Level Recursion Relation In Yang-Mills Theory

R. Britto, F. Cachazo, B. Feng, and E. Witten, Phys. Rev. Lett.94, 181602 (2005), arXiv:hep-th/0501052

work page internal anchor Pith review Pith/arXiv arXiv 2005
[22]

L. J. Dixon, inTheoretical Advanced Study Insti- tute in Elementary Particle Physics: Particle Physics: The Higgs Boson and Beyond(2014) pp. 31–67, arXiv:1310.5353 [hep-ph]

work page internal anchor Pith review Pith/arXiv arXiv 2014
[23]

Elvang and Y.-t

H. Elvang and Y.-t. Huang,Scattering Amplitudes in Gauge Theory and Gravity(Cambridge University Press, 2015)

work page 2015
[24]

TASI Lectures on Scattering Amplitudes

C. Cheung, inProceedings, Theoretical Advanced Study Institute in Elementary Particle Physics (TASI 2016) (2018) pp. 571–623, arXiv:1708.03872 [hep-ph]

work page internal anchor Pith review Pith/arXiv arXiv 2016
[25]

Dersy, M

A. Dersy, M. D. Schwartz, and X. Zhang, Int. J. Data Sci. Math. Sci.1, 135 (2024), arXiv:2206.04115 [cs.LG]

work page arXiv 2024
[26]

Cheung, A

C. Cheung, A. Dersy, and M. D. Schwartz, SciPost Phys. 18, 040 (2025), arXiv:2408.04720 [hep-th]

work page arXiv 2025
[27]

D. A. Pomerleau, inAdvances in Neural Information Pro- cessing Systems 1, edited by D. S. Touretzky (Morgan Kaufmann, 1989) pp. 305–313

work page 1989
[28]

Bain and C

M. Bain and C. Sammut, inMachine Intelligence 15 (1995) pp. 103–129

work page 1995
[29]

Agostinelli, S

F. Agostinelli, S. McAleer, A. Shmakov, R. Bhatt,et al., Nat. Mach. Intell.1, 356 (2019)

work page 2019
[30]

Takano, Trans

K. Takano, Trans. Mach. Learn. Res. (2023)

work page 2023
[31]

S. Ross, G. J. Gordon, and D. Bagnell, inProceedings of the Fourteenth International Conference on Artificial In- telligence and Statistics (AISTATS), Proceedings of Ma- chine Learning Research, Vol. 15 (PMLR, 2011) pp. 627– 635

work page 2011
[32]

Attention Is All You Need

A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, inAdvances in Neural Information Processing Systems 30(2017) pp. 5998–6008, arXiv:1706.03762 [cs.CL]

work page internal anchor Pith review Pith/arXiv arXiv 2017
[33]

Zagier, inFrontiers in Number Theory, Physics, and Geometry II(Springer, Berlin, Heidelberg, 2007) pp

D. Zagier, inFrontiers in Number Theory, Physics, and Geometry II(Springer, Berlin, Heidelberg, 2007) pp. 3– 65

work page 2007
[34]

ML Polylogarithms: Code and pretrained models for polylogarithm simplification,

A. Dersy, M. D. Schwartz, and X. Zhang, “ML Polylogarithms: Code and pretrained models for polylogarithm simplification,”https://github.com/ aureliendersy/ML_Polylogarithms(2022)

work page 2022
[35]

spinorhe- licity: Code and pretrained models for scatter- ing amplitude simplification,

C. Cheung, A. Dersy, and M. D. Schwartz, “spinorhe- licity: Code and pretrained models for scatter- ing amplitude simplification,”https://github.com/ aureliendersy/spinorhelicity(2024)

work page 2024
[36]

Duhr and F

C. Duhr and F. Dulat, JHEP08, 135 (2019), arXiv:1904.07279 [hep-ph]

work page arXiv 2019
[37]

K. G. Chetyrkin and F. V. Tkachov, Nucl. Phys. B192, 159 (1981)

work page 1981
[38]

High-precision calculation of multi-loop Feynman integrals by difference equations

S. Laporta, Int. J. Mod. Phys. A15, 5087 (2000), arXiv:hep-ph/0102033

work page internal anchor Pith review Pith/arXiv arXiv 2000
[39]

von Hippel and M

M. von Hippel and M. Wilhelm, JHEP05, 185 (2025), arXiv:2502.05121 [hep-th]

work page arXiv 2025
[40]

Song, T.-Z

Z.-Y. Song, T.-Z. Yang, Q.-H. Cao, M.-x. Luo, and H. X. Zhu, (2025), arXiv:2502.09544 [hep-ph]

work page arXiv 2025
[41]

Zeng, Phys

M. Zeng, Phys. Rev. D (2025), 10.1103/dmlf-jkfc, arXiv:2504.16045 [hep-ph]

work page doi:10.1103/dmlf-jkfc 2025
[42]

T. Cai, G. W. Merz, F. Charton, N. Nolte, M. Wilhelm, K. Cranmer,et al., (2024), arXiv:2405.06107 [hep-th]

work page arXiv 2024
[43]

T. Cai, F. Charton, K. Cranmer, L. J. Dixon, G. W. Merz, and M. Wilhelm, (2025), arXiv:2501.05743 [hep- th]

work page arXiv 2025