pith. sign in

arxiv: 2605.19063 · v1 · pith:5OPRPFEMnew · submitted 2026-05-18 · 💻 cs.LG

Mapping Uncharted Symmetries: Machine Discovery in Combinatorics

Pith reviewed 2026-05-20 11:45 UTC · model grok-4.3

classification 💻 cs.LG
keywords q,t-Narayana polynomialsnoncrossing partitionscombinatorial interpretationmachine discoveryalgebraic combinatoricssymmetry proofSLURPLean formalization
0
0 comments X

The pith

Machine learning discovers a new combinatorial interpretation of the q,t-Narayana polynomials based on noncrossing partitions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The authors formalize a machine learning task called Simple Learning Under Rigid Proportions to search for exact mathematical functions that satisfy strict distributional constraints. They develop two search procedures, MapSeek-Functional and MapSeek-Symbolic, and apply them to an open question in algebraic combinatorics. The procedures surface statistics on noncrossing partitions that generate the q,t-Narayana polynomials arising from representation theory. This yields the first such combinatorial interpretation using noncrossing partitions. One of the surfaced statistics supplies a combinatorial proof of symmetry for these polynomials in a case that had remained open.

Core claim

The central discovery is a new combinatorial interpretation of the q,t-Narayana polynomials that arises from representation theory and is realized by statistics on noncrossing partitions. Machine learning under exact proportion constraints identifies these statistics, and one of them directly yields a combinatorial proof of the polynomials' symmetry in a previously unsolved case. All findings are formalized in Lean 4.

What carries the argument

Statistics on noncrossing partitions discovered by MapSeek-Functional and MapSeek-Symbolic that generate the q,t-Narayana polynomials and establish their symmetry.

If this is right

  • The interpretation supplies the first combinatorial model of these polynomials using noncrossing partitions.
  • One discovered statistic gives a direct combinatorial proof of symmetry for a case without prior proof.
  • The SLURP framework and MapSeek procedures are shown to produce verifiable mathematical objects in algebraic combinatorics.
  • Full Lean 4 formalization and released code make the discoveries immediately checkable by others.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same constrained search approach could be tried on other families of polynomials whose combinatorial interpretations remain incomplete.
  • Success here indicates that rigid-proportion constraints can guide machine search toward objects that admit human-readable proofs.
  • Combining the methods with proof assistants may create a repeatable workflow for generating candidate combinatorial identities.

Load-bearing premise

The statistics produced by the search procedures must reflect genuine combinatorial structure rather than artifacts of the training distribution or search heuristic.

What would settle it

Explicit enumeration of noncrossing partitions for small n and k must show that the proposed statistic reproduces the exact coefficients of the q,t-Narayana polynomial.

Figures

Figures reproduced from arXiv: 2605.19063 by Alessandro Iraci, Eugenio Cainelli, Giovanni Paolini, Lorenzo Luccioli, Michele D'Adderio.

Figure 1
Figure 1. Figure 1: Left: atlas of notable statistics (nodes) for [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Noncrossing partitions NC(4, 3). The corresponding q, t-Narayana is q 2+qt+t 2+q+t+1. 4 q, t-Combinatorics Setup Within algebraic combinatorics, the area of q, t-combinatorics studies families of bivariate polyno￾mials F(q, t) with nonnegative integer coefficients arising in a variety of mathematical domains such as symmetric functions, representation theory, algebraic geometry, knot theory, probability, a… view at source ↗
Figure 3
Figure 3. Figure 3: Left: PCA scatter plot of the MapSeek-Functional successful runs (refined [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: A 12 × 7 polyomino with area 30 − 18 = 12 (left) and its bounce path (right); the bounce statistic is equal to 41 − 18 = 23. Definition A.1. The area statistic on PP(m, n) is the number of whole squares between the two paths, normalized by subtracting m + n − 1 (so that the minimum possible area is 0). Parallelogram polyominoes have a bounce path. To compute it, start from (0, 0), draw a single east step, … view at source ↗
Figure 5
Figure 5. Figure 5: The embedding used in Proposition A.3. A new bottom row is added, but only its leftmost [PITH_FULL_IMAGE:figures/full_fig_p015_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: An example of the bijection η. This is closely related to the known inv statistic on ordered set partitions [24]. The skip statistic from (2) can also be read directly from the drawing of a noncrossing partition. If β = {b1 < · · · < br} is a block, draw the edges joining consecutive elements in increasing order, namely b1b2, b2b3, . . . , br−1br, and ignore only the closing edge brb1. Each counted edge bi… view at source ↗
Figure 7
Figure 7. Figure 7: Computing the skip statistic from a noncrossing partition. The contribution of each counted [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Exchanging bijection for n = 4 and k = 3. A.4 The Min Gap Arc Statistic The min gap arc statistic (mingarc) is a human-defined extension of the flipped skew statistic discovered by our ML methods on NC(n, 3). It is defined on NC(n, k) for arbitrary k through the arc-selection procedure below and agrees with flipped skew when k = 3. Together with skip, it gives a combinatorial interpretation of Nn,k(q, t). … view at source ↗
Figure 9
Figure 9. Figure 9: Min gap arc construction on two noncrossing partitions. Solid red arcs are selected; dashed [PITH_FULL_IMAGE:figures/full_fig_p017_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Overview of the objects and statistics covered by the Lean code. Thick edges indicate [PITH_FULL_IMAGE:figures/full_fig_p018_10.png] view at source ↗
read the original abstract

Inspired by long-standing open problems in algebraic combinatorics, we show that modern machine learning can meaningfully contribute to verifiable mathematical discoveries. In particular, we focus on the construction of simple mathematical functions under exact distributional constraints, a setting we formalize as Simple Learning Under Rigid Proportions (SLURP). We tackle this problem by introducing two methods: MapSeek-Functional, which models the desired function alternating pseudo-labeling and supervised training steps; and MapSeek-Symbolic, designed to directly produce symbolic formulas. We successfully apply both methods to a research problem in algebraic combinatorics, discovering a new combinatorial interpretation of the $q,t$-Narayana polynomials arising from representation theory. To our knowledge, this is the first such interpretation based on noncrossing partitions. Using one discovered statistic, we find a combinatorial proof of the symmetry of these polynomials in a previously unsolved case. To streamline verification and reproducibility, we release all code, including a formalization of all the mathematical discoveries of this paper in Lean 4.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces the SLURP setting for discovering simple functions under exact distributional constraints and develops two methods (MapSeek-Functional, which alternates pseudo-labeling with supervised training, and MapSeek-Symbolic) to address it. These methods are applied to an open problem in algebraic combinatorics, yielding a new combinatorial interpretation of the q,t-Narayana polynomials in terms of noncrossing partitions (the first such interpretation) together with a combinatorial proof of symmetry in a previously unsolved case; all identities and the proof are formalized and machine-checked in Lean 4, with full code released.

Significance. If the results hold, the work demonstrates that modern machine-learning techniques can produce verifiable, parameter-free contributions to algebraic combinatorics by supplying the first noncrossing-partition interpretation of the q,t-Narayana polynomials and resolving an open symmetry question combinatorially. The explicit Lean 4 formalization of the discovered statistics, identities, and proof, together with the public release of all code, constitutes a major strength: the mathematical claims become independently verifiable and decoupled from the training procedure once found.

minor comments (3)
  1. The precise proportions and sampling procedure used to generate the SLURP training instances are described only at a high level in the application section; adding an explicit table or pseudocode listing the exact distributional constraints would improve reproducibility of the discovery process.
  2. Notation for the newly discovered statistics on noncrossing partitions is introduced inline; a dedicated subsection collecting the definitions, together with their q,t-generating functions, would aid readability.
  3. The Lean 4 formalization is referenced in the abstract and conclusion, but the main text should include a direct pointer to the repository or a short summary of which theorems have been checked (e.g., the symmetry identity in the open case).

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, including recognition of the SLURP framework, the application to the q,t-Narayana polynomials, the first noncrossing-partition interpretation, the combinatorial symmetry proof, and the Lean 4 formalization with public code release. We note the recommendation for minor revision and will address any editorial or presentational suggestions in the revised version.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper employs MapSeek-Functional and MapSeek-Symbolic to surface candidate statistics on noncrossing partitions, then extracts an explicit combinatorial interpretation of the q,t-Narayana polynomials together with a symmetry proof; both are stated as ordinary combinatorial objects and are independently formalized and machine-checked in Lean 4 with released code. Because the final mathematical statements are parameter-free, externally verifiable, and do not redefine any quantity in terms of the search procedure or fitted values, the derivation chain does not reduce to its inputs by construction. No self-definitional equations, fitted-input predictions, or load-bearing self-citations appear in the central claims.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work relies on standard facts from algebraic combinatorics and representation theory (q,t-analogs, noncrossing partitions) plus the correctness of the Lean 4 kernel. No new free parameters are introduced to define the final statistic or proof; the machine-learning stage is used only for discovery and is not part of the mathematical claim.

axioms (1)
  • domain assumption Standard properties of q,t-Narayana polynomials and noncrossing partitions as defined in the algebraic combinatorics literature
    Invoked when stating that the discovered statistic matches the known polynomials and when constructing the symmetry proof.

pith-pipeline@v0.9.0 · 5716 in / 1365 out tokens · 44761 ms · 2026-05-20T11:45:33.760999+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

28 extracted references · 28 canonical work pages · 2 internal anchors

  1. [1]

    Aristotle: IMO-level Automated Theorem Proving

    Tudor Achim, Alex Best, Alberto Bietti, Kevin Der, Mathïs Fédérico, Sergei Gukov, Daniel Halpern-Leistner, Kirsten Henningsgard, Yury Kudryashov, Alexander Meiburg, Martin Michelsen, Riley Patterson, Eric Rodriguez, Laura Scharff, Vikram Shanker, Vladmir Sicca, 10 Hari Sowrirajan, Aidan Swope, Matyas Tamas, Vlad Tenev, Jonathan Thomm, Harold Williams, and...

  2. [3]

    Global lyapunov functions: a long- standing open problem in mathematics, with symbolic transformers

    Alberto Alfarano, François Charton, and Amaury Hayat. Global lyapunov functions: a long- standing open problem in mathematics, with symbolic transformers. In Amir Globersons, Lester Mackey, Danielle Belgrave, Angela Fan, Ulrich Paquet, Jakub M. Tomczak, and Cheng Zhang, editors,Advances in Neural Information Processing Systems 38: Annual Conference on Neu...

  3. [4]

    Statistics on parallelogram polyominoes and a q, t-analogue of the Narayana numbers.J

    Jean-Christophe Aval, Michele D’Adderio, Mark Dukes, Angela Hicks, and Yvan Le Borgne. Statistics on parallelogram polyominoes and a q, t-analogue of the Narayana numbers.J. Combin. Theory Ser. A, 123:271–286, 2014. ISSN 0097-3165. URL https://doi.org/10. 1016/j.jcta.2013.09.001

  4. [6]

    Easy Learning from Label Proportions.CoRR, abs/2302.03115, 2023

    Róbert Istvan Busa-Fekete, Heejin Choi, Travis Dick, Claudio Gentile, and Andrés Muñoz Medina. Easy Learning from Label Proportions.CoRR, abs/2302.03115, 2023. doi: 10.48550/ ARXIV .2302.03115. URLhttps://doi.org/10.48550/arXiv.2302.03115

  5. [7]

    Neurosymbolic programming.Found

    Swarat Chaudhuri, Kevin Ellis, Oleksandr Polozov, Rishabh Singh, Armando Solar-Lezama, and Yisong Yue. Neurosymbolic programming.Found. Trends Program. Lang., 7(3):158–243,

  6. [8]

    URLhttps://doi.org/10.1561/2500000049

    doi: 10.1561/2500000049. URLhttps://doi.org/10.1561/2500000049

  7. [11]

    doi:10.1016/j

    Michele D’Adderio, Alessandro Iraci, and Anna Vanden Wyngaerd. Theta operators, refined delta conjectures, and coinvariants.Adv. Math., 376:60, 2021. ISSN 0001-8708. doi: 10.1016/j. aim.2020.107447. Id/No 107447

  8. [12]

    Michele D’Adderio, Alessandro Iraci, and Anna Vanden Wyngaerd.Decorated Dyck paths, polyominoes, and the Delta conjecture, volume 278 ofMemoirs of the American Mathemat- ical Society, no. 1370. American Mathematical Society, July 2022. ISBN 9781470471705 9781470471576. doi: 10.1090/memo/1370. URLhttps://www.ams.org/memo/1370/

  9. [13]

    Battaglia, Charles Blundell, András Juhász, Marc Lackenby, Geordie Williamson, Demis Hassabis, and Pushmeet Kohli

    Alex Davies, Petar Velickovic, Lars Buesing, Sam Blackwell, Daniel Zheng, Nenad Tomasev, Richard Tanburn, Peter W. Battaglia, Charles Blundell, András Juhász, Marc Lackenby, Geordie Williamson, Demis Hassabis, and Pushmeet Kohli. Advancing mathematics by guiding human intuition with AI.Nat., 600(7887):70–74, 2021. doi: 10.1038/S41586-021-04086-X. URL http...

  10. [14]

    Garsia and James Haglund

    Adriano M. Garsia and James Haglund. A proof of the q, t-Catalan positivity conjecture. Discrete Math, 256(3):677–717, 2002. ISSN 0012-365X. URL https://doi.org/10.1016/ S0012-365X(02)00343-6. LaCIM 2000 Conference on Combinatorics, Computer Science and Applications (Montreal, QC). 11

  11. [15]

    Generalized q, t- Catalan numbers.Algebraic Combinatorics, 3(4):855–886, 2020

    Eugene Gorsky, Graham Hawkes, Anne Schilling, and Julianne Rainbolt. Generalized q, t- Catalan numbers.Algebraic Combinatorics, 3(4):855–886, 2020. URL https://arxiv.org/ abs/1905.10973

  12. [16]

    Program synthesis.Found

    Sumit Gulwani, Oleksandr Polozov, and Rishabh Singh. Program synthesis.Found. Trends Program. Lang., 4(1-2):1–119, 2017. doi: 10.1561/2500000010. URL https://doi.org/10. 1561/2500000010

  13. [17]

    With an appendix on the combinatorics of Macdonald polynomials, volume 41 ofUniv

    James Haglund.The q, t-Catalan numbers and the space of diagonal harmonics. With an appendix on the combinatorics of Macdonald polynomials, volume 41 ofUniv. Lect. Ser. Providence, RI: American Mathematical Society (AMS), 2008. ISBN 978-0-8218-4411-3

  14. [18]

    Hilbert schemes, polygraphs and the Macdonald positivity conjecture.J

    Mark Haiman. Hilbert schemes, polygraphs and the Macdonald positivity conjecture.J. Amer. Math. Soc., 14(4):941–1006, 2001. ISSN 0894-0347. URL https://doi.org/10.1090/ S0894-0347-01-00373-3

  15. [19]

    Learning from label proportions: Bootstrapping supervised learners via belief propagation

    Shreyas Havaldar, Navodita Sharma, Shubhi Sareen, Karthikeyan Shanmugam, and Aravindan Raghuveer. Learning from label proportions: Bootstrapping supervised learners via belief propagation. InThe Twelfth International Conference on Learning Representations, ICLR 2024, Vienna, Austria, May 7-11, 2024. OpenReview.net, 2024. URL https://openreview.net/ forum?...

  16. [20]

    From black box to bijection: Interpreting machine learning to build a zeta map algorithm.CoRR, abs/2511.12421, 2025

    Xiaoyu Huang, Blake Jackson, and Kyu-Hwan Lee. From black box to bijection: Interpreting machine learning to build a zeta map algorithm.CoRR, abs/2511.12421, 2025. doi: 10.48550/ ARXIV .2511.12421. URLhttps://doi.org/10.48550/arXiv.2511.12421

  17. [21]

    I. G. Macdonald. A new class of symmetric functions.Sémin. Lothar. Comb., 20:b20a, 41, 1988. ISSN 1286-4889. URLhttps://eudml.org/doc/120965

  18. [22]

    The Lean mathematical library

    Mathlib community. The Lean mathematical library. InProceedings of the 9th ACM SIGPLAN International Conference on Certified Programs and Proofs, CPP 2020, pages 367–381, New York, NY , USA, 2020. Association for Computing Machinery. ISBN 9781450370974. doi: 10.1145/3372885.3373824. URLhttps://doi.org/10.1145/3372885.3373824

  19. [23]

    The Lean 4 theorem prover and programming language

    Leonardo de Moura and Sebastian Ullrich. The Lean 4 theorem prover and programming language. InAutomated Deduction – CADE 28: 28th International Conference on Automated Deduction, Virtual Event, July 12–15, 2021, Proceedings, pages 625–635, Berlin, Heidelberg,

  20. [24]

    TheLean4theoremproverandprogramming language

    Springer-Verlag. doi: 10.1007/978-3-030-79876-5_37. URL https://doi.org/10. 1007/978-3-030-79876-5_37

  21. [25]

    Langdon, and Nicholas Freitag McPhee.A Field Guide to Genetic Pro- gramming

    Riccardo Poli, William B. Langdon, and Nicholas Freitag McPhee.A Field Guide to Genetic Pro- gramming. lulu.com, 2008. ISBN 978-1-4092-0073-4. URL http://www.gp-field-guide. org.uk/

  22. [26]

    Smola, Tibério S

    Novi Quadrianto, Alexander J. Smola, Tibério S. Caetano, and Quoc V . Le. Estimating Labels from Label Proportions.J. Mach. Learn. Res., 10:2349–2374, 2009. doi: 10.5555/1577069. 1755865. URLhttps://dl.acm.org/doi/10.5555/1577069.1755865

  23. [27]

    Remmel and Andrew T

    Jeffrey B. Remmel and Andrew T. Wilson. An extension of MacMahon’s equidistribution theorem to ordered set partitions.J. Combin. Theory Ser. A, 134:242–277, 2015. ISSN 0097-

  24. [28]

    URL https://doi.org/10.1016/j.jcta.2015

    doi: 10.1016/j.jcta.2015.03.012. URL https://doi.org/10.1016/j.jcta.2015. 03.012

  25. [29]

    Learning from Label Proportions: A Mutual Contamination Framework

    Clayton Scott and Jianxin Zhang. Learning from Label Proportions: A Mutual Contamination Framework. In Hugo Larochelle, Marc’Aurelio Ranzato, Raia Hadsell, Maria-Florina Balcan, and Hsuan-Tien Lin, editors,Advances in Neural Information Processing Systems 33: Annual Conference on Neural Information Processing Systems 2020, NeurIPS 2020, December 6-12, 202...

  26. [30]

    Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories.Genet

    Sara Silva and Ernesto Costa. Dynamic limits for bloat control in genetic programming and a review of past and current bloat theories.Genet. Program. Evolvable Mach., 10(2): 141–179, 2009. doi: 10.1007/S10710-008-9075-9. URL https://doi.org/10.1007/ s10710-008-9075-9. 12

  27. [32]

    Constructions in combinatorics via neural networks.CoRR, abs/2104.14516, 2021

    Adam Zsolt Wagner. Constructions in combinatorics via neural networks.CoRR, abs/2104.14516, 2021. URLhttps://arxiv.org/abs/2104.14516

  28. [33]

    From scale to speed: Adaptive test-time scaling for image editing.CoRR, abs/2603.00141, 2026

    Geordie Williamson. Is deep learning a useful tool for the pure mathematician?CoRR, abs/2304.12602, 2023. doi: 10.48550/ARXIV .2304.12602. URL https://doi.org/10. 48550/arXiv.2304.12602. 13 Aq, t-Combinatorics Background & Narayana Models Triggered by the discovery of the famous Macdonald polynomials [19], q, t-combinatorics studies families of bivariate ...