pith. machine review for the scientific record.

arxiv: 2602.17276 · v2 · submitted 2026-02-19 · 💻 cs.LG · math.CO

Recognition: no theorem link

RLGT: A reinforcement learning framework for extremal graph theory

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:18 UTC · model grok-4.3

classification 💻 cs.LG math.CO
keywords reinforcement learning · extremal graph theory · combinatorial optimization · graph representation · Ramsey numbers · spectral radius · Turán problems

The pith

The RLGT framework systematizes reinforcement learning for extremal graph theory across undirected, directed, looped, and multi-colored graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces RLGT as a reinforcement learning framework that organizes prior applications of RL to extremal graph problems by reformulating them as combinatorial optimization tasks. It extends support to undirected and directed graphs, with or without loops, and any number of edge colors while emphasizing efficient graph representation. The modular design and performance optimizations are intended to reduce the effort needed for new RL experiments in the field. A sympathetic reader would care because this could accelerate work on open questions such as Ramsey number bounds and spectral radius inequalities by providing a shared starting point instead of repeated custom implementations.
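The reformulation described above can be made concrete with a toy sketch of the cross-entropy method on an extremal graph objective. This is an illustration of the general technique (plain cross-entropy over a Bernoulli edge distribution rather than a neural policy), and the objective, constants, and variable names are invented here, not taken from the paper:

```python
# Cross-entropy-method sketch of Wagner-style extremal graph search.
# Graphs on n vertices are bit vectors over the n*(n-1)/2 candidate edges;
# each iteration refits edge probabilities to the top-scoring ("elite") samples.
import itertools
import random

n = 6                                   # vertices
pairs = list(itertools.combinations(range(n), 2))
m = len(pairs)                          # candidate edges

def triangles(bits):
    """Count triangles in the graph encoded by the edge bit vector."""
    edge = {p for p, b in zip(pairs, bits) if b}
    return sum(1 for a, b, c in itertools.combinations(range(n), 3)
               if (a, b) in edge and (a, c) in edge and (b, c) in edge)

def score(bits):
    # Toy extremal objective: many edges, but triangles are penalized.
    return sum(bits) - n * triangles(bits)

random.seed(0)
probs = [0.5] * m                       # Bernoulli sampling distribution
for _ in range(40):
    batch = [[int(random.random() < p) for p in probs] for _ in range(100)]
    batch.sort(key=score, reverse=True)
    elite = batch[:10]                  # top 10% of the batch
    # Refit each edge probability to its frequency among the elite,
    # smoothed so no probability collapses to exactly 0 or 1.
    probs = [0.9 * (sum(b[i] for b in elite) / len(elite)) + 0.1 * probs[i]
             for i in range(m)]

best = max(batch, key=score)
best_score = score(best)   # bounded above by 9 = n^2/4, Mantel's theorem for n = 6
```

The same loop structure carries over to any objective that maps a finite edge vector to a real score, which is what makes the combinatorial-optimization reformulation attractive.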

Core claim

RLGT is a novel RL framework that systematizes the previous work and provides support for both undirected and directed graphs, with or without loops, and with an arbitrary number of edge colors. The framework efficiently represents graphs and aims to facilitate future RL-based research in extremal graph theory through optimized computational performance and a clean and modular design.

What carries the argument

The RLGT framework, a modular reinforcement learning environment that represents graphs of varying types and colors to optimize constructions for extremal objectives.
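What such a modular environment might look like can be sketched as a plain-Python skeleton following the Gymnasium reset()/step() convention that the paper's references use. The class name, constructor flags, and interface here are assumptions for illustration, not RLGT's documented API:

```python
# Hypothetical skeleton of a modular graph environment. State is a colored
# adjacency matrix; an action recolors one ordered pair (u, v), which covers
# directed edges, loops (u == v), and any number of edge colors, with color 0
# meaning "no edge".

class GraphEnv:
    def __init__(self, n, colors=2, directed=False, loops=False, reward_fn=None):
        self.n, self.colors = n, colors
        self.directed, self.loops = directed, loops
        self.reward_fn = reward_fn or (lambda adj: 0.0)

    def reset(self):
        # Start from the empty graph (all pairs color 0).
        self.adj = [[0] * self.n for _ in range(self.n)]
        return self.adj

    def step(self, action):
        u, v, color = action
        if u == v and not self.loops:
            raise ValueError("loops are disabled in this environment")
        if not 0 <= color < self.colors:
            raise ValueError("unknown edge color")
        self.adj[u][v] = color
        if not self.directed and u != v:
            self.adj[v][u] = color        # keep the matrix symmetric
        return self.adj, self.reward_fn(self.adj)

# Usage: reward = current edge count, on an undirected 4-vertex graph.
env = GraphEnv(4, reward_fn=lambda a: sum(a[u][v] for u in range(4)
                                          for v in range(u + 1, 4)))
env.reset()
_, r1 = env.step((0, 1, 1))
_, r2 = env.step((2, 3, 1))
```

The appeal of this shape is that swapping the objective means swapping only `reward_fn`, which is the modularity claim in miniature.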

If this is right

  • Researchers can apply RL directly to directed graphs and multi-edge-color problems without building new environments.
  • Standardized graph representations lower the barrier to experimenting with Turán-type problems involving cycles of length three and four.
  • Optimized performance supports scaling RL searches to larger graphs for spectral radius questions.
  • Modular components allow straightforward addition of new forbidden subgraph constraints.
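For the Turán-type bullet, the kind of scoring function such experiments rely on can be sketched directly: reward the edge count and penalize each forbidden C3 or C4. The penalty weight and function names are illustrative choices, not from the paper:

```python
# Score for the Turán-type problem forbidding cycles of length 3 and 4,
# computed from an undirected simple-graph adjacency matrix.
import itertools

def count_c3_c4(adj):
    """Count triangles and 4-cycles in an undirected simple graph."""
    n = len(adj)
    c3 = sum(1 for a, b, c in itertools.combinations(range(n), 3)
             if adj[a][b] and adj[b][c] and adj[a][c])
    c4 = 0
    for quad in itertools.combinations(range(n), 4):
        # A 4-cycle on four fixed vertices visits them in one of 3 orders.
        for a, b, c, d in ((quad[0], quad[1], quad[2], quad[3]),
                           (quad[0], quad[2], quad[1], quad[3]),
                           (quad[0], quad[1], quad[3], quad[2])):
            if adj[a][b] and adj[b][c] and adj[c][d] and adj[d][a]:
                c4 += 1
    return c3, c4

def turan_score(adj, penalty=10):
    n = len(adj)
    edges = sum(adj[u][v] for u in range(n) for v in range(u + 1, n))
    c3, c4 = count_c3_c4(adj)
    return edges - penalty * (c3 + c4)

square = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0]]          # the 4-cycle 0-1-2-3-0
counts = count_c3_c4(square)    # (0, 1): no triangle, one C4
```

A new forbidden subgraph then amounts to one more counting routine plugged into the score, which is the "straightforward addition" the last bullet envisions.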

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same modular structure could be adapted for RL on other combinatorial structures such as hypergraphs or matroids.
  • Integration with newer policy-gradient or actor-critic methods might improve sample efficiency on the same graph problems.
  • Automated conjecture generation becomes more feasible if the framework is used to systematically explore large families of graphs.

Load-bearing premise

That supplying a modular and computationally optimized framework will lead researchers to adopt it and produce new results in extremal graph theory.

What would settle it

The facilitation claim would be undercut if future papers solving specific problems, such as new lower bounds for Ramsey numbers, showed no measurable speed or quality gain from using RLGT compared with prior custom RL setups; a consistent gain would confirm it.

Figures

Figures reproduced from arXiv: 2602.17276 by Dragan Stevanović, Irena Đorđević, Ivan Damnjanović, Uroš Milivojević.

Figure 1. Best score versus step count for the Deep Cross-Entropy agent based disproof of Conjecture … [PITH_FULL_IMAGE:figures/full_fig_p018_1.png]
Figure 2. Best score versus step count for the disproof of Conjecture … [PITH_FULL_IMAGE:figures/full_fig_p021_2.png]
Figure 3. Best score versus step count for the unsuccessful attempt to disprove Conjecture … [PITH_FULL_IMAGE:figures/full_fig_p022_3.png]
Original abstract

Reinforcement learning (RL) is a subfield of machine learning that focuses on developing models that can autonomously learn optimal decision-making strategies over time. In a recent pioneering paper, Wagner demonstrated how the Deep Cross-Entropy RL method can be applied to tackle various problems from extremal graph theory by reformulating them as combinatorial optimization problems. Subsequently, many researchers became interested in refining and extending the framework introduced by Wagner, thereby creating various RL environments specialized for graph theory. Moreover, a number of problems from extremal graph theory were solved through the use of RL. In particular, several inequalities concerning the Laplacian spectral radius of graphs were refuted, new lower bounds were obtained for certain Ramsey numbers, and contributions were made to the Tur\'{a}n-type extremal problem in which the forbidden structures are cycles of length three and four. Here, we present Reinforcement Learning for Graph Theory (RLGT), a novel RL framework that systematizes the previous work and provides support for both undirected and directed graphs, with or without loops, and with an arbitrary number of edge colors. The framework efficiently represents graphs and aims to facilitate future RL-based research in extremal graph theory through optimized computational performance and a clean and modular design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces RLGT, a reinforcement learning framework that systematizes prior Deep Cross-Entropy RL work on extremal graph theory (e.g., Wagner and follow-ups). It claims to support undirected/directed graphs with or without loops and arbitrary edge colors via an efficient graph representation and modular design intended to facilitate future research.

Significance. If the efficiency and modularity claims are substantiated, the framework could standardize RL environments for combinatorial graph problems, lowering barriers for exploring open questions in Ramsey theory, spectral radii, and Turán-type problems. The absence of any benchmarks or usage examples in the manuscript, however, leaves the practical significance unverified.

major comments (2)
  1. [Abstract and §1] The central claims of 'optimized computational performance' and 'facilitate future RL-based research' rest on assertions about representation efficiency and modularity, yet the manuscript supplies no runtime data, memory profiles, scaling curves, or side-by-side comparisons against the ad-hoc environments cited from Wagner et al. This directly undermines the facilitation claim.
  2. [§3 (Framework Architecture)] The description of the graph encoding (adjacency tensor, edge-list, or reward shaping) provides no complexity analysis or pseudocode showing how arbitrary numbers of colors, directed edges, and loops are handled without performance degradation; without this, the efficiency assertion cannot be evaluated.
minor comments (1)
  1. [§2.2] The notation distinguishing graph classes (directed/undirected, loops, colors) would benefit from an explicit summary table to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and describe the revisions planned to strengthen the empirical and technical support for our claims.

Point-by-point responses
  1. Referee: [Abstract and §1] The central claims of 'optimized computational performance' and 'facilitate future RL-based research' rest on assertions about representation efficiency and modularity, yet the manuscript supplies no runtime data, memory profiles, scaling curves, or side-by-side comparisons against the ad-hoc environments cited from Wagner et al. This directly undermines the facilitation claim.

    Authors: We agree that the absence of runtime data, memory profiles, scaling curves, and direct comparisons weakens the substantiation of the efficiency and facilitation claims. In the revised manuscript we will add a new experimental subsection (likely §4) containing benchmark results against the ad-hoc environments from Wagner et al. These will include wall-clock times, peak memory usage, and scaling curves for graphs of increasing order, covering undirected/directed cases, loops, and multiple edge colors. The results will quantify the performance advantage of the modular tensor representation. revision: yes

  2. Referee: [§3 (Framework Architecture)] The description of the graph encoding (adjacency tensor, edge-list, or reward shaping) provides no complexity analysis or pseudocode showing how arbitrary numbers of colors, directed edges, and loops are handled without performance degradation; without this, the efficiency assertion cannot be evaluated.

    Authors: We accept that the current description lacks explicit pseudocode and asymptotic analysis. In the revision we will insert pseudocode for the core encoding routines (adjacency-tensor construction, directed-edge handling, loop support, and multi-color extension) together with a complexity statement: the representation uses O(n²C) space and O(mC) time per transition for m edges and C colors, with no hidden factors that degrade when C>1 or when direction/loops are enabled. This material will be placed in §3. revision: yes
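The stated O(n²C) space bound can be made concrete with a small sketch: a one-hot n × n × C color tensor, where the same container handles direction (asymmetric writes) and loops (diagonal entries). This is an illustration of the bound as stated in the response, not the authors' actual implementation:

```python
# One-hot color tensor for a graph with C edge colors: n * n * C cells,
# the O(n^2 C) space term. Color 0 means "no edge".

def make_tensor(n, colors):
    t = [[[0] * colors for _ in range(n)] for _ in range(n)]
    for u in range(n):
        for v in range(n):
            t[u][v][0] = 1          # every ordered pair starts uncolored
    return t

def set_color(t, u, v, color, directed=False):
    # O(C) work per endpoint: clear the old one-hot slot, set the new one.
    for c in range(len(t[u][v])):
        t[u][v][c] = 0
    t[u][v][color] = 1
    if not directed and u != v:     # mirror the write for undirected graphs
        for c in range(len(t[v][u])):
            t[v][u][c] = 0
        t[v][u][color] = 1

t = make_tensor(4, colors=3)
set_color(t, 0, 1, 2)               # an undirected edge in color 2
set_color(t, 2, 2, 1)               # a loop, stored on the diagonal
cells = sum(len(cell) for row in t for cell in row)   # 4 * 4 * 3 = 48
```

A transition that recolors one pair touches O(C) cells, consistent with the O(mC) per-transition figure when m pairs change across an episode.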

Circularity Check

0 steps flagged

No circularity: RLGT is a new implementation framework without self-referential derivations

full rationale

The manuscript presents RLGT as a systematization of Wagner's prior RL approach for extremal graph problems, extending support to directed/undirected graphs with loops and multi-color edges. No equations, fitted parameters, or predictions are defined in terms of themselves. The central claims concern code-level representation efficiency and modularity, which are asserted rather than derived from any input quantities by construction. Prior work is cited externally (Wagner et al.), with no load-bearing self-citation and no smuggled-in uniqueness theorems or ansatzes. This is a standard non-circular framework paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces a software framework rather than a mathematical derivation, so the ledger contains only standard domain assumptions with no free parameters or invented entities.

axioms (1)
  • domain assumption Reinforcement learning methods can be effectively applied to combinatorial optimization problems in graph theory.
    The framework relies on the success of prior RL applications to graph problems mentioned in the abstract.

pith-pipeline@v0.9.0 · 5540 in / 1256 out tokens · 44012 ms · 2026-05-15T21:18:18.935648+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Some results on small ordered and cyclic Ramsey numbers

    math.CO 2026-04 unverdicted novelty 7.0

    Authors compute new small two-color ordered and cyclic Ramsey numbers for monotone paths, cycles, stars, complete graphs and nested matchings via SAT solving, determine closed forms for several pairs of graph classes,...

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Józefowicz, Ł. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke..., TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

  2. [2] S. Akbari, A. Alazemi and M. Anđelić, Upper bounds on the energy of graphs in terms of matching number, Appl. Anal. Discrete Math. 15 (2021), 444–459, https://doi.org/10.2298/AADM201227016A

  3. [3] A. Ali and T. Došlić, Mostar index: Results and perspectives, Appl. Math. Comput. 404 (2021), 126245, https://doi.org/10.1016/j.amc.2021.126245

  4. [4] F. Angileri, G. Lombardi, A. Fois, R. Faraone, C. Metta, M. Salvi, L. A. Bianchi, M. Fantozzi, S. G. Galfrè, D. Pavesi, M. Parton and F. Morandin, A systematization of the Wagner framework: Graph theory conjectures and reinforcement learning, in: D. Pedreschi, A. Monreale, R. Guidotti, R. Pellungrini and F. Naretto (eds.), Discovery Science, volume 15243 o...

  5. [5] F. Angileri, G. Lombardi, A. Fois, R. Faraone, C. Metta, M. Salvi, L. A. Bianchi, M. Fantozzi, S. G. Galfrè, D. Pavesi, M. Parton and F. Morandin, Analyzing RL components for Wagner's framework via Brouwer's conjecture, Mach. Learn. 114 (2025), Art. No. 242, https://doi.org/10.1007/s10994-025-06890-2

  6. [6] N. Biggs, Algebraic Graph Theory, 2nd edition, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 1993, https://doi.org/10.1017/CBO9780511608704

  7. [7] P.-T. de Boer, D. P. Kroese, S. Mannor and R. Y. Rubinstein, A tutorial on the cross-entropy method, Ann. Oper. Res. 134 (2005), 19–67, https://doi.org/10.1007/s10479-005-5724-z

  8. [8] B. Bollobás, Modern Graph Theory, volume 184 of Graduate Texts in Mathematics, Springer, New York, NY, 1998, https://doi.org/10.1007/978-1-4612-0619-4

  9. [9] J. A. Bondy and U. S. R. Murty, Graph Theory with Applications, Elsevier Science Publishing Co., Inc., New York, NY, 1976

  10. [10] V. Brankov, P. Hansen and D. Stevanović, Automated conjectures on upper bounds for the largest Laplacian eigenvalue of graphs, Linear Algebra Appl. 414 (2006), 407–424, https://doi.org/10.1016/j.laa.2005.10.017

  11. [11] A. E. Brouwer and W. H. Haemers, Spectra of Graphs, Universitext, Springer, New York, NY, 2012, https://doi.org/10.1007/978-1-4614-1939-6

  12. [12] Y. Cao, G. Chen, G. Jing, M. Stiebitz and B. Toft, Graph edge coloring: A survey, Graphs Combin. 35 (2019), 33–66, https://doi.org/10.1007/s00373-018-1986-5

  13. [13] T. Crosley, et al., isort, https://pycqa.github.io/isort/

  14. [14] D. Cvetković, M. Doob and H. Sachs, Spectra of Graphs: Theory and Application, 2nd edition, Johann Ambrosius Barth Verlag, Heidelberg–Leipzig, 1980

  15. [15] I. Damnjanović, U. Milivojević, I. Ðorđević and D. Stevanović, RLGT: A reinforcement learning framework for extremal graph theory (GitHub documentation), https://ivan-damnjanovic.github.io/rlgt/

  16. [16] I. Damnjanović, U. Milivojević, I. Ðorđević and D. Stevanović, RLGT: A reinforcement learning framework for extremal graph theory (GitHub repository), https://github.com/Ivan-Damnjanovic/rlgt

  17. [17] I. Damnjanović, U. Milivojević, I. Ðorđević and D. Stevanović, RLGT: A reinforcement learning framework for extremal graph theory (PyPI page), https://pypi.org/project/RLGT/

  18. [18] R. Diestel, Graph Theory, 5th edition, volume 173 of Graduate Texts in Mathematics, Springer Berlin, Heidelberg, 2017, https://doi.org/10.1007/978-3-662-53622-3

  19. [19] T. Došlić, I. Martinjak, R. Škrekovski, S. Tipurić Spužević and I. Zubac, Mostar index, J. Math. Chem. 56 (2018), 2995–3013, https://doi.org/10.1007/s10910-018-0928-z

  20. [20] P. Erdős, Some recent progress on extremal problems in graph theory, Congr. Numer. 14 (1975), 3–14

  21. [21] S. Eustace, et al., Poetry, https://python-poetry.org/

  22. [22] M. Ghebleh, S. Al-Yakoob, A. Kanso and D. Stevanović, Graphs having two main eigenvalues and arbitrarily many distinct vertex degrees, Appl. Math. Comput. 495 (2025), 129311, https://doi.org/10.1016/j.amc.2025.129311

  23. [23] M. Ghebleh, S. Al-Yakoob, A. Kanso and D. Stevanović, Reinforcement learning for graph theory, II. Small Ramsey numbers, Art Discrete Appl. Math. 8 (2025), #P1.07, https://doi.org/10.26493/2590-9770.1788.8af

  24. [24] M. Ghebleh, S. Al-Yakoob, A. Kanso and D. Stevanović, Reinforcement learning for graph theory, I: Reimplementation of Wagner's approach, Discrete Appl. Math. 380 (2026), 468–479, https://doi.org/10.1016/j.dam.2025.10.047

  25. [25] M. Ghebleh, A. Kanso and D. Stevanović, Graph6Java: A researcher-friendly Java framework for testing conjectures in chemical graph theory, MATCH Commun. Math. Comput. Chem. 81 (2019), 737–770, https://match.pmf.kg.ac.rs/electronic_versions/Match81/n3/match81n3_737-770.pdf

  26. [26] F. Glover, Tabu search — Part I, ORSA J. Comput. 1 (1989), 190–206, https://doi.org/10.1287/ijoc.1.3.190

  27. [27] F. Glover and M. Laguna, Tabu search, in: D.-Z. Du and P. M. Pardalos (eds.), Handbook of Combinatorial Optimization, Springer, Boston, MA, 1998, pp. 2093–2229, https://doi.org/10.1007/978-1-4613-0303-9_33

  28. [28] C. Godsil and G. Royle, Algebraic Graph Theory, volume 207 of Graduate Texts in Mathematics, Springer, New York, NY, 2001, https://doi.org/10.1007/978-1-4613-0163-9

  29. [29] I. Gutman, The energy of a graph, Ber. Math.-Statist. Sekt. Forschungsz. Graz. 103 (1978), 1–22

  30. [30] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. Fernández del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke and T. E. Oliphant, Array prog...

  31. [31] H. Krekel, et al., pytest, https://docs.pytest.org/

  32. [32] Ł. Langa, et al., Black, https://black.readthedocs.io/en/stable/

  33. [33] M. Lapan, Deep Reinforcement Learning Hands-On, 2nd edition, Packt Publishing, Birmingham, 2020

  34. [34] N. Mazyavkina, S. Sviridov, S. Ivanov and E. Burnaev, Reinforcement learning for combinatorial optimization: A survey, Comput. Oper. Res. 134 (2021), 105400, https://doi.org/10.1016/j.cor.2021.105400

  35. [35] B. D. McKay and A. Piperno, Practical graph isomorphism, II, J. Symb. Comput. 60 (2014), 94–112, https://doi.org/10.1016/j.jsc.2013.09.003

  36. [36] A. Mehrabian, A. Anand, H. Kim, N. Sonnerat, M. Balog, Gh. Comanici, T. Berariu, A. Lee, A. Ruoss, A. Bulanova, D. Toyama, S. Blackwell, B. Romera Paredes, P. Veličković, L. Orseau, J. Lee, A. M. Naredla, D. Precup and A. Zs. Wagner, Finding increasingly large extremal graphs with AlphaZero and tabu search, in: K. Larson (ed.), Proceedings of the Thirty-Th...

  37. [37] S. Menard, L. Nell, et al., JPype, https://jpype.readthedocs.io/en/latest/

  38. [38] S. Nanz and C. A. Furia, A comparative study of programming languages in Rosetta Code, 2015, arXiv:1409.0252 [cs.SE]

  39. [39] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala, PyTorch: An imperative style, high-performance deep learning library, 2019, arXiv:1912.01703 [cs.LG]

  40. [40] W. B. Powell, Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions, 1st edition, John Wiley & Sons, Inc., Hoboken, NJ, 2022, https://doi.org/10.1002/9781119815068

  41. [41] S. P. Radziszowski, Small Ramsey numbers, Electron. J. Combin. DS01 (2024), https://doi.org/10.37236/21

  42. [42] P. Rowlinson, The main eigenvalues of a graph: A survey, Appl. Anal. Discrete Math. 1 (2007), 445–471, https://doi.org/10.2298/AADM0702445R

  43. [43] R. Y. Rubinstein, Optimization of computer simulation models with rare events, Eur. J. Oper. Res. 99 (1997), 89–112, https://doi.org/10.1016/S0377-2217(96)00385-2

  44. [44] J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, Proximal Policy Optimization algorithms, 2017, arXiv:1707.06347 [cs.LG]

  45. [45] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel and D. Hassabis, Mastering the game of Go without human knowledge, Nature 550 (2017), 354–359, https://doi.org/10.1038/nature24270

  46. [46] M. Simonovits, Extremal graph problems, degenerate extremal problems, and supersaturated graphs, in: J. A. Bondy and U. S. R. Murty (eds.), Progress in Graph Theory, Academic Press, Toronto, ON, 1984, pp. 419–437

  47. [47] R. Sharafdini and T. Réti, On the transmission-based graph topological indices, Kragujevac J. Math. 44 (2020), 41–63, https://doi.org/10.46793/KgJMat2001.041S

  48. [48] P. Soviany, R. T. Ionescu, P. Rota and N. Sebe, Curriculum learning: A survey, Int. J. Comput. Vision 130 (2022), 1526–1565, https://doi.org/10.1007/s11263-022-01611-x

  49. [49] Ð. Stevanović, I. Damnjanović and D. Stevanović, Finding counterexamples for a conjecture of Akbari, Alazemi and Anđelić, 2021, arXiv:2111.15303 [math.CO]

  50. [50] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2018

  51. [51] Cs. Szepesvári, Algorithms for Reinforcement Learning, 1st edition, Synthesis Lectures on Artificial Intelligence and Machine Learning, Springer, Cham, 2010, https://doi.org/10.1007/978-3-031-01551-9

  52. [52] L. Taieb, M. Roucairol, T. Cazenave and A. Harutyunyan, Automated refutation with Monte Carlo search of graph theory conjectures on the maximum Laplacian eigenvalue, in: Y. Zhang, M. Hladík and H. Moosaei (eds.), Learning and Intelligent Optimization, volume 15744 of Lecture Notes in Computer Science, Springer, Cham, 2026, pp. 52–63, https://doi.org/10.1007/97...

  53. [53] M. Towers, A. Kwiatkowski, J. Terry, J. U. Balis, G. De Cola, T. Deleu, M. Goulão, A. Kallinteris, M. Krimmel, A. KG, R. Perez-Vicente, A. Pierré, S. Schulhoff, J. J. Tai, H. Tan and O. G. Younis, Gymnasium: A standard interface for reinforcement learning environments, 2025, arXiv:2407.17032 [cs.LG]

  54. [54] A. Zs. Wagner, Constructions in combinatorics via neural networks, 2021, arXiv:2104.14516 [math.CO]

  55. [55] R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn. 8 (1992), 229–256, https://doi.org/10.1007/BF00992696

  56. [56] The Sage Developers, SageMath, the Sage Mathematics Software System (Version 10.8), 2025, https://www.sagemath.org