pith. machine review for the scientific record.

arXiv: 2603.11164 · v2 · submitted 2026-03-11 · ✦ hep-th · cs.LG · cs.SC · hep-ph

Recognition: 1 theorem link

· Lean Theorem

Learning to Unscramble: Simplifying Symbolic Expressions via Self-Supervised Oracle Trajectories

Pith reviewed 2026-05-15 12:28 UTC · model grok-4.3

classification ✦ hep-th · cs.LG · cs.SC · hep-ph
keywords self-supervised learning · symbolic simplification · Yang-Mills amplitudes · spinor-helicity · transformer policy network · oracle trajectories · dilogarithm reduction · beam search

The pith

Self-supervised training on scrambled expressions lets a policy network simplify complex high-energy physics expressions to near-perfect accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to generate training data for symbolic simplification by scrambling simple expressions and recording the exact inverse steps as oracle trajectories. A permutation-equivariant transformer policy network is then trained step by step to predict the correct unscrambling action at each stage. The approach is tested on dilogarithm identities and on spinor-helicity expressions for gluon scattering in Yang-Mills theory, where it substantially outperforms earlier reinforcement-learning and regression methods. When combined with contrastive grouping and beam search, the network reaches a 100 percent full-simplification rate on a set of 5-point tree-level amplitudes that includes expressions with more than 200 initial terms.
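The scramble-and-record idea can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the "expression" is a single integer, and the action names (`add3`, `double`, `negate`) and helper functions are hypothetical stand-ins, not the identities or code used in the paper.

```python
import random

# Hypothetical toy moves: every scramble step has an exact inverse.
SCRAMBLE = {
    "add3": lambda x: x + 3,
    "double": lambda x: x * 2,
    "negate": lambda x: -x,
}
INVERSE = {"add3": "sub3", "double": "halve", "negate": "negate"}
UNSCRAMBLE = {
    "sub3": lambda x: x - 3,
    "halve": lambda x: x // 2,  # exact, because only doubled values are halved
    "negate": lambda x: -x,
}


def oracle_trajectory(simple, depth, rng):
    """Scramble `simple` for `depth` steps and return (state, oracle_action)
    pairs ordered from the most scrambled state back toward `simple`."""
    state = simple
    pairs = []
    for _ in range(depth):
        name = rng.choice(sorted(SCRAMBLE))
        state = SCRAMBLE[name](state)
        # The training label for the new state is the move that undoes this step.
        pairs.append((state, INVERSE[name]))
    pairs.reverse()
    return pairs


def replay(pairs):
    """Apply the recorded oracle actions in order and return the final state."""
    state = pairs[0][0]
    for recorded_state, action in pairs:
        assert recorded_state == state  # each label belongs to the state it was recorded for
        state = UNSCRAMBLE[action](state)
    return state
```

Each `(state, oracle_action)` pair is one supervised training example, and replaying a trajectory recovers the simple starting expression, which is what makes the labels exact.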

Core claim

By training exclusively on oracle trajectories obtained from scrambling and unscrambling simple expressions, the model learns a policy that generalizes to simplify far larger and structurally different expressions that appear in actual 5-point gluon tree-level amplitudes, reaching a 100 percent full-simplification rate when the policy is augmented with contrastive grouping and beam search.

What carries the argument

A permutation-equivariant transformer-based policy network trained stepwise to predict the next oracle action from the current scrambled expression.
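As a reminder of what permutation equivariance means, here is a minimal sketch, assuming a single self-attention head with identity projections and no positional encoding. It is not the paper's architecture, whose details the page does not give; it only shows the property that reordering the input terms reorders the outputs in the same way.

```python
import math


def self_attention(tokens):
    """Single-head self-attention with identity Q/K/V projections and no
    positional encoding. `tokens` is a list of equal-length float vectors."""
    dim = len(tokens[0])
    outputs = []
    for query in tokens:
        scores = [
            sum(a * b for a, b in zip(query, key)) / math.sqrt(dim)
            for key in tokens
        ]
        top = max(scores)  # subtract the max for a numerically stable softmax
        weights = [math.exp(s - top) for s in scores]
        total = sum(weights)
        outputs.append([
            sum(w * value[j] for w, value in zip(weights, tokens)) / total
            for j in range(dim)
        ])
    return outputs


def permute(rows, perm):
    """Reorder `rows` so that position i holds rows[perm[i]]."""
    return [rows[i] for i in perm]
```

Applying `self_attention` to permuted tokens gives, up to floating-point rounding, the same result as permuting the output of `self_attention` on the original tokens, so a network built from such layers cannot depend on the order in which terms are listed.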

Load-bearing premise

That a policy learned only from trajectories of scrambled simple expressions will generalize without failure to the structurally different and much larger expressions that arise in real high-energy physics calculations.

What would settle it

Finding even one 5-point gluon amplitude outside the reported test set, or any 6-point tree-level amplitude, on which the model plus beam search fails to reach a fully simplified form.

Figures

Figures reproduced from arXiv: 2603.11164 by David Shih.

Figure 1. Architecture of the policy network for symbolic sim…
Figure 2. Solve rate vs. scramble depth for dilogarithm simpli…
Figure 3. Average number of steps to solve vs. scramble depth…
Figure 4. Solve rate vs. number of target terms for 4-point (left), 5-point (center), and 6-point (right) amplitudes. Performance…
Figure 5. Solve rate vs. source bracket count for 4-point (left), 5-point (center), and 6-point (right) amplitudes. Our model…
Figure 6. Solve rate for 5-point Yang-Mills partial amplitudes…
Original abstract

We present a new self-supervised machine learning approach for symbolic simplification of complex mathematical expressions. Training data is generated by scrambling simple expressions and recording the inverse operations, creating oracle trajectories that provide both goal states and explicit paths to reach them. A permutation-equivariant, transformer-based policy network is then trained on this data step-wise to predict the oracle action given the input expression. We demonstrate this approach on two problems in high-energy physics: dilogarithm reduction and spinor-helicity scattering amplitude simplification. In both cases, our trained policy network achieves near perfect solve rates across a wide range of difficulty levels, substantially outperforming prior approaches based on reinforcement learning and end-to-end regression. When combined with contrastive grouping and beam search, our model achieves a 100% full simplification rate on a representative selection of 5-point gluon tree-level amplitudes in Yang-Mills theory, including expressions with over 200 initial terms.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces a self-supervised approach to symbolic simplification in which simple expressions are scrambled to generate oracle trajectories that serve as training data for a permutation-equivariant transformer policy network. The network is trained step-wise to predict the inverse scrambling actions. The method is applied to dilogarithm reduction and to the simplification of spinor-helicity scattering amplitudes in Yang-Mills theory. The abstract reports near-perfect solve rates across a range of difficulty levels and states that, when augmented with contrastive grouping and beam search, the model reaches a 100% full simplification rate on a representative set of 5-point gluon tree-level amplitudes, including expressions with more than 200 initial terms.

Significance. If the reported generalization from scrambled simple expressions to large physical amplitudes can be substantiated, the work would offer a data-efficient, self-supervised alternative to reinforcement-learning or regression-based symbolic manipulators. This could reduce the manual effort required for amplitude simplification in high-energy physics and provide a reproducible pipeline for generating training trajectories without hand-crafted heuristics.

major comments (2)
  1. [Abstract] Abstract: The headline claim of a 100% full simplification rate on 5-point gluon amplitudes with >200 initial terms is presented without any description of the training/validation split, the distribution of expression sizes or nesting depths in the training set, or an ablation that isolates the contribution of the learned policy from that of beam search. This directly bears on the central generalization claim and prevents assessment of whether the policy network itself succeeds on structurally dissimilar large expressions.
  2. [Abstract] Abstract: No error analysis, failure cases, or controls for overfitting are supplied despite the policy being trained exclusively on trajectories from simple scrambled expressions. Without these, it is impossible to evaluate whether the reported performance on the 200-term amplitudes reflects successful transfer or is an artifact of the search procedure.
minor comments (1)
  1. [Abstract] The abstract refers to 'near perfect solve rates across a wide range of difficulty levels' but supplies no quantitative table or figure that would allow the reader to judge the scaling with expression size or term count.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their careful reading and constructive comments on our manuscript. We address the major points raised below and will revise the manuscript to improve clarity on the training details, generalization evidence, and analysis of the results.

Point-by-point responses
  1. Referee: [Abstract] Abstract: The headline claim of a 100% full simplification rate on 5-point gluon amplitudes with >200 initial terms is presented without any description of the training/validation split, the distribution of expression sizes or nesting depths in the training set, or an ablation that isolates the contribution of the learned policy from that of beam search. This directly bears on the central generalization claim and prevents assessment of whether the policy network itself succeeds on structurally dissimilar large expressions.

    Authors: We agree that the abstract would benefit from additional context on these aspects to support the generalization claim. The full manuscript (Section 3) specifies that training trajectories are generated from scrambled expressions with 2–15 terms and nesting depths up to 3, using an 80/20 train/validation split on the oracle trajectories. We will revise the abstract to briefly summarize the training distribution and add an ablation study (new subsection in Section 5) that compares the learned policy against beam search with a random or heuristic policy to isolate the network's contribution. revision: yes

  2. Referee: [Abstract] Abstract: No error analysis, failure cases, or controls for overfitting are supplied despite the policy being trained exclusively on trajectories from simple scrambled expressions. Without these, it is impossible to evaluate whether the reported performance on the 200-term amplitudes reflects successful transfer or is an artifact of the search procedure.

    Authors: We acknowledge that a more explicit error analysis would strengthen the paper. The current manuscript reports near-perfect solve rates across a range of difficulty levels in Section 4, but we will expand the revision to include a dedicated error analysis subsection with representative failure cases (e.g., deeply nested dilogarithms) and a control experiment that evaluates the policy network on large amplitudes without beam search or contrastive grouping. This will help demonstrate that the performance reflects transfer rather than solely the search procedure. revision: yes

Circularity Check

0 steps flagged

No circularity: training trajectories generated independently of target expressions

Full rationale

The paper's core method generates oracle trajectories exclusively by scrambling simple expressions and recording their inverse operations; these trajectories serve as training data for the policy network. The target 5-point gluon amplitudes (with >200 terms) are never used in training or as part of the oracle generation process. Consequently, the reported 100% simplification rate on those amplitudes cannot reduce to a fit or self-definition by construction. No load-bearing self-citation, uniqueness theorem, or ansatz smuggling is described that would collapse the result to the inputs. The derivation chain remains self-contained against external benchmarks because success is measured on held-out, structurally distinct expressions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract alone supplies no explicit free parameters, axioms, or invented entities; the approach rests on a standard transformer architecture and a data-generation procedure whose details are not provided.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

  • IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

    Relation between the paper passage and the cited Recognition theorem.

    Training data is generated by scrambling simple expressions and recording the inverse operations, creating oracle trajectories... A permutation-equivariant, transformer-based policy network is then trained on this data step-wise to predict the oracle action

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Learning to Unscramble Feynman Loop Integrals with SAILIR

    hep-ph 2026-04 unverdicted novelty 8.0

    A self-supervised transformer learns to unscramble Feynman integrals for online IBP reduction, delivering bounded memory use on complex two-loop topologies while matching Kira's speed on the hardest cases tested.

  2. When Does Critique Improve AI-Assisted Theoretical Physics? SCALAR: Structured Critic–Actor Loop for Agentic Reasoning

    cs.AI 2026-05 unverdicted novelty 7.0

    Structured critic-actor loops improve AI performance on theoretical physics reasoning tasks, with benefits strongest in asymmetric model pairings using constructive feedback.

  3. A Scientific Human-Agent Reproduction Pipeline

    hep-ph 2026-04 unverdicted novelty 6.0

    SHARP is a human-AI collaboration pipeline for reproducing scientific analyses, demonstrated by recreating a jet classification task from a particle physics paper.

Reference graph

Works this paper leans on

43 extracted references · 43 canonical work pages · cited by 3 Pith papers · 9 internal anchors

  1. [1]

    Phase 1: Greedy Contrastive Grouping. Since input expressions have up to ∼200 terms but the model accepts at most 25, we follow CDS and use a contrastive grouping strategy to decompose the problem. A pre-trained contrastive encoder [18, 27] computes an embedding for each numerator term, and pairwise cosine similarities identify groups of terms likely to simplify together ...

  2. [2]

    Computes the pairwise cosine similarity matrix over all current numerator terms (with numerical coefficients removed for robustness)

  3. [3]

    For each reference term, selects the most similar neighbors above a threshold, forming groups of up to 25 terms

  4. [4]

    For each group, factors out common spinor brackets shared by all terms, reducing the sub-expression complexity before model evaluation

  5. [5]

    Applies a single model action to the reduced sub-expression, accepts the result if it reduces the term count, and reassembles with the remaining terms

  6. [6]

    After a full pass through all reference terms, re-cancels the expression and begins a new pass with updated similarities. The similarity threshold starts at 0.6 and relaxes across passes (through 0.5 and 0.4), progressively allowing less similar terms to be grouped. Up to 100 passes are performed per form, with numerical validation after each pass to d...
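The grouping steps above (cosine similarities, a threshold, groups of at most 25 terms) can be sketched as follows. This is a minimal illustration under stated assumptions: the embeddings are plain lists of floats standing in for the output of the pre-trained contrastive encoder, and `cosine`, `group_for`, and `THRESHOLDS` are hypothetical names, not taken from the paper's code.

```python
import math

# Threshold schedule quoted in the text: start at 0.6, relax through 0.5 and 0.4.
THRESHOLDS = (0.6, 0.5, 0.4)


def cosine(u, v):
    """Cosine similarity of two vectors; 0.0 if either has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm if norm else 0.0


def group_for(ref, embeddings, threshold, max_size=25):
    """Return the reference index followed by its most similar neighbours
    whose similarity exceeds `threshold`, capped at `max_size` terms."""
    candidates = [
        (cosine(embeddings[ref], emb), idx)
        for idx, emb in enumerate(embeddings)
        if idx != ref
    ]
    above = [pair for pair in candidates if pair[0] > threshold]
    above.sort(key=lambda pair: pair[0], reverse=True)  # most similar first
    return [ref] + [idx for _, idx in above[: max_size - 1]]
```

A full pass would call `group_for` once per reference term at the current threshold and hand each group to the model; lowering the threshold on later passes admits less similar terms, as described above.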

  7. [7]

    Phase 2: Beam Search. Forms not solved by the greedy phase enter a beam search over sequences of identity applications. The beam maintains the top K = 20 candidate expressions, ranked by (n_terms, n_brackets). At each step, every beam entry is expanded using contrastive grouping: the contrastive encoder computes pairwise cosine similarities among all numerator...
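A generic beam search of this shape might look like the sketch below. It is written under several assumptions that the excerpt does not confirm: candidates are hashable, a smaller ranking key is better (so lexicographic order on a `(n_terms, n_brackets)` tuple prefers fewer terms first), already-visited candidates are skipped, and `expand` is a hypothetical callback standing in for the model-driven expansion with contrastive grouping.

```python
def beam_search(start, expand, key, is_solved, beam_width=20, max_steps=100):
    """Keep the `beam_width` best candidates per step, ranked by `key`
    (smaller is better). Returns (solution, steps) or (None, steps_taken)."""
    beam = [start]
    seen = {start}  # assumption: skip candidates that were already visited
    step = 0
    for step in range(max_steps + 1):
        solved = [cand for cand in beam if is_solved(cand)]
        if solved:
            return min(solved, key=key), step
        if step == max_steps:
            break
        children = []
        for cand in beam:
            for child in expand(cand):
                if child not in seen:
                    seen.add(child)
                    children.append(child)
        if not children:
            break
        # Rank all children together and keep only the best `beam_width`.
        beam = sorted(children, key=key)[:beam_width]
    return None, step
```

With states represented as `(n_terms, n_brackets)` tuples, passing `key=lambda s: s` reproduces the ranking named in the text, and `is_solved=lambda s: s[0] == 1` would stop at a single-term result.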

  8. [8]

    Results. All 103 forms are successfully simplified to the 1-term Parke-Taylor formula, a 100% solve rate. Of these, 17 are solved in the greedy phase alone and the remaining 86 require beam search. The hardest form (form990, starting at 178 terms) required 98 beam search steps to reach the target. Fig. 6 compares our solve rate against CDS as a functio...

  9. [9]

    G. Lample and F. Charton, in International Conference on Learning Representations (ICLR) (2020) arXiv:1912.01412 [cs.SC]

  10. [10]

    M. Cranmer, A. Sanchez-Gonzalez, P. Battaglia, R. Xu, K. Cranmer, D. Spergel, and S. Ho, in Advances in Neural Information Processing Systems 33 (2020) pp. 17429–17442, arXiv:2006.11287 [cs.LG]

  11. [11]

    G. Lample, M.-A. Lachaux, T. Lavril, X. Martinet, A. Hayat, G. Ebner, A. Rodriguez, and T. Lacroix, in Advances in Neural Information Processing Systems 35 (2022) arXiv:2205.11491 [cs.LG]

  12. [12]

    L. Dabelow and M. Ueda, Neurocomputing 613, 128732 (2024), arXiv:2401.13447 [cs.LG]

  13. [13]

    A. Davies, P. Veličković, L. Buesing, S. Blackwell, D. Zheng, N. Tomašev, R. Tanburn, P. Battaglia, C. Blundell, A. Juhász, M. Lackenby, G. Williamson, D. Hassabis, and P. Kohli, Nature 600, 70 (2021)

  14. [14]

    A. N. Kirillov, Prog. Theor. Phys. Suppl. 118, 61 (1995), arXiv:hep-th/9408113

  15. [15]

    A. V. Kotikov, Phys. Lett. B 254, 158 (1991)

  16. [16]

    Two-Loop Master Integrals for $\gamma^* \to 3$ Jets: The planar topologies

    T. Gehrmann and E. Remiddi, Nucl. Phys. B 601, 248 (2001), arXiv:hep-ph/0008287

  17. [17]

    S. J. Parke and T. R. Taylor, Phys. Rev. Lett. 56, 2459 (1986)

  18. [18]

    V. P. Nair, Phys. Lett. B 214, 215 (1988)

  19. [19]

    Perturbative Gauge Theory As A String Theory In Twistor Space

    E. Witten, Commun. Math. Phys. 252, 189 (2004), arXiv:hep-th/0312171

  20. [20]

    MHV Vertices And Tree Amplitudes In Gauge Theory

    F. Cachazo, P. Svrček, and E. Witten, JHEP 09, 006 (2004), arXiv:hep-th/0403047

  21. [21]

    Direct Proof Of Tree-Level Recursion Relation In Yang-Mills Theory

    R. Britto, F. Cachazo, B. Feng, and E. Witten, Phys. Rev. Lett. 94, 181602 (2005), arXiv:hep-th/0501052

  22. [22]

    L. J. Dixon, in Theoretical Advanced Study Institute in Elementary Particle Physics: Particle Physics: The Higgs Boson and Beyond (2014) pp. 31–67, arXiv:1310.5353 [hep-ph]

  23. [23]

    H. Elvang and Y.-t. Huang, Scattering Amplitudes in Gauge Theory and Gravity (Cambridge University Press, 2015)

  24. [24]

    TASI Lectures on Scattering Amplitudes

    C. Cheung, in Proceedings, Theoretical Advanced Study Institute in Elementary Particle Physics (TASI 2016) (2018) pp. 571–623, arXiv:1708.03872 [hep-ph]

  25. [25]

    A. Dersy, M. D. Schwartz, and X. Zhang, Int. J. Data Sci. Math. Sci. 1, 135 (2024), arXiv:2206.04115 [cs.LG]

  26. [26]

    C. Cheung, A. Dersy, and M. D. Schwartz, SciPost Phys. 18, 040 (2025), arXiv:2408.04720 [hep-th]

  27. [27]

    D. A. Pomerleau, in Advances in Neural Information Processing Systems 1, edited by D. S. Touretzky (Morgan Kaufmann, 1989) pp. 305–313

  28. [28]

    M. Bain and C. Sammut, in Machine Intelligence 15 (1995) pp. 103–129

  29. [29]

    F. Agostinelli, S. McAleer, A. Shmakov, R. Bhatt, et al., Nat. Mach. Intell. 1, 356 (2019)

  30. [30]

    K. Takano, Trans. Mach. Learn. Res. (2023)

  31. [31]

    S. Ross, G. J. Gordon, and D. Bagnell, in Proceedings of the Fourteenth International Conference on Artificial Intelligence and Statistics (AISTATS), Proceedings of Machine Learning Research, Vol. 15 (PMLR, 2011) pp. 627–635

  32. [32]

    Attention Is All You Need

    A. Vaswani, N. Shazeer, N. Parmar, J. Uszkoreit, L. Jones, A. N. Gomez, L. Kaiser, and I. Polosukhin, in Advances in Neural Information Processing Systems 30 (2017) pp. 5998–6008, arXiv:1706.03762 [cs.CL]

  33. [33]

    D. Zagier, in Frontiers in Number Theory, Physics, and Geometry II (Springer, Berlin, Heidelberg, 2007) pp. 3–65

  34. [34]

    ML Polylogarithms: Code and pretrained models for polylogarithm simplification

    A. Dersy, M. D. Schwartz, and X. Zhang, “ML Polylogarithms: Code and pretrained models for polylogarithm simplification,” https://github.com/aureliendersy/ML_Polylogarithms (2022)

  35. [35]

    spinorhelicity: Code and pretrained models for scattering amplitude simplification

    C. Cheung, A. Dersy, and M. D. Schwartz, “spinorhelicity: Code and pretrained models for scattering amplitude simplification,” https://github.com/aureliendersy/spinorhelicity (2024)

  36. [36]

    C. Duhr and F. Dulat, JHEP 08, 135 (2019), arXiv:1904.07279 [hep-ph]

  37. [37]

    K. G. Chetyrkin and F. V. Tkachov, Nucl. Phys. B 192, 159 (1981)

  38. [38]

    High-precision calculation of multi-loop Feynman integrals by difference equations

    S. Laporta, Int. J. Mod. Phys. A 15, 5087 (2000), arXiv:hep-ph/0102033

  39. [39]

    M. von Hippel and M. Wilhelm, JHEP 05, 185 (2025), arXiv:2502.05121 [hep-th]

  40. [40]

    Z.-Y. Song, T.-Z. Yang, Q.-H. Cao, M.-x. Luo, and H. X. Zhu, (2025), arXiv:2502.09544 [hep-ph]

  41. [41]

    M. Zeng, Phys. Rev. D (2025), 10.1103/dmlf-jkfc, arXiv:2504.16045 [hep-ph]

  42. [42]

    T. Cai, G. W. Merz, F. Charton, N. Nolte, M. Wilhelm, K. Cranmer, et al., (2024), arXiv:2405.06107 [hep-th]

  43. [43]

    T. Cai, F. Charton, K. Cranmer, L. J. Dixon, G. W. Merz, and M. Wilhelm, (2025), arXiv:2501.05743 [hep-th]