pith. machine review for the scientific record.

arxiv: 2602.17276 · v2 · submitted 2026-02-19 · 💻 cs.LG · math.CO

Recognition: no theorem link

RLGT: A reinforcement learning framework for extremal graph theory

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 21:18 UTC · model grok-4.3

classification 💻 cs.LG math.CO
keywords reinforcement learning · extremal graph theory · combinatorial optimization · graph representation · Ramsey numbers · spectral radius · Turán problems

The pith

The RLGT framework systematizes reinforcement learning for extremal graph theory across undirected, directed, looped, and multi-colored graphs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper introduces RLGT as a reinforcement learning framework that organizes prior applications of RL to extremal graph problems by reformulating them as combinatorial optimization tasks. It extends support to undirected and directed graphs, with or without loops, and any number of edge colors while emphasizing efficient graph representation. The modular design and performance optimizations are intended to reduce the effort needed for new RL experiments in the field. A sympathetic reader would care because this could accelerate work on open questions such as Ramsey number bounds and spectral radius inequalities by providing a shared starting point instead of repeated custom implementations.
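The reformulation described above can be made concrete with a toy sketch of the cross-entropy method on an extremal graph objective. This is an illustration of the general technique (plain cross-entropy over a Bernoulli edge distribution rather than a neural policy), and the objective, constants, and variable names are invented here, not taken from the paper:

```python
# Cross-entropy-method sketch of Wagner-style extremal graph search.
# Graphs on n vertices are bit vectors over the n*(n-1)/2 candidate edges;
# each iteration refits edge probabilities to the top-scoring ("elite") samples.
import itertools
import random

n = 6                                   # vertices
pairs = list(itertools.combinations(range(n), 2))
m = len(pairs)                          # candidate edges

def triangles(bits):
    """Count triangles in the graph encoded by the edge bit vector."""
    edge = {p for p, b in zip(pairs, bits) if b}
    return sum(1 for a, b, c in itertools.combinations(range(n), 3)
               if (a, b) in edge and (a, c) in edge and (b, c) in edge)

def score(bits):
    # Toy extremal objective: many edges, but triangles are penalized.
    return sum(bits) - n * triangles(bits)

random.seed(0)
probs = [0.5] * m                       # Bernoulli sampling distribution
for _ in range(40):
    batch = [[int(random.random() < p) for p in probs] for _ in range(100)]
    batch.sort(key=score, reverse=True)
    elite = batch[:10]                  # top 10% of the batch
    # Refit each edge probability to its frequency among the elite,
    # smoothed so no probability collapses to exactly 0 or 1.
    probs = [0.9 * (sum(b[i] for b in elite) / len(elite)) + 0.1 * probs[i]
             for i in range(m)]

best = max(batch, key=score)
best_score = score(best)   # bounded above by 9 = n^2/4, Mantel's theorem for n = 6
```

The same loop structure carries over to any objective that maps a finite edge vector to a real score, which is what makes the combinatorial-optimization reformulation attractive.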

Core claim

RLGT is a novel RL framework that systematizes the previous work and provides support for both undirected and directed graphs, with or without loops, and with an arbitrary number of edge colors. The framework efficiently represents graphs and aims to facilitate future RL-based research in extremal graph theory through optimized computational performance and a clean and modular design.

What carries the argument

The RLGT framework, a modular reinforcement learning environment that represents graphs of varying types and colors to optimize constructions for extremal objectives.
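What such a modular environment might look like can be sketched as a plain-Python skeleton following the Gymnasium reset()/step() convention that the paper's references use. The class name, constructor flags, and interface here are assumptions for illustration, not RLGT's documented API:

```python
# Hypothetical skeleton of a modular graph environment. State is a colored
# adjacency matrix; an action recolors one ordered pair (u, v), which covers
# directed edges, loops (u == v), and any number of edge colors, with color 0
# meaning "no edge".

class GraphEnv:
    def __init__(self, n, colors=2, directed=False, loops=False, reward_fn=None):
        self.n, self.colors = n, colors
        self.directed, self.loops = directed, loops
        self.reward_fn = reward_fn or (lambda adj: 0.0)

    def reset(self):
        # Start from the empty graph (all pairs color 0).
        self.adj = [[0] * self.n for _ in range(self.n)]
        return self.adj

    def step(self, action):
        u, v, color = action
        if u == v and not self.loops:
            raise ValueError("loops are disabled in this environment")
        if not 0 <= color < self.colors:
            raise ValueError("unknown edge color")
        self.adj[u][v] = color
        if not self.directed and u != v:
            self.adj[v][u] = color        # keep the matrix symmetric
        return self.adj, self.reward_fn(self.adj)

# Usage: reward = current edge count, on an undirected 4-vertex graph.
env = GraphEnv(4, reward_fn=lambda a: sum(a[u][v] for u in range(4)
                                          for v in range(u + 1, 4)))
env.reset()
_, r1 = env.step((0, 1, 1))
_, r2 = env.step((2, 3, 1))
```

The appeal of this shape is that swapping the objective means swapping only `reward_fn`, which is the modularity claim in miniature.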

If this is right

  • Researchers can apply RL directly to directed graphs and multi-edge-color problems without building new environments.
  • Standardized graph representations lower the barrier to experimenting with Turán-type problems involving cycles of length three and four.
  • Optimized performance supports scaling RL searches to larger graphs for spectral radius questions.
  • Modular components allow straightforward addition of new forbidden subgraph constraints.
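For the Turán-type bullet, the kind of scoring function such experiments rely on can be sketched directly: reward the edge count and penalize each forbidden C3 or C4. The penalty weight and function names are illustrative choices, not from the paper:

```python
# Score for the Turán-type problem forbidding cycles of length 3 and 4,
# computed from an undirected simple-graph adjacency matrix.
import itertools

def count_c3_c4(adj):
    """Count triangles and 4-cycles in an undirected simple graph."""
    n = len(adj)
    c3 = sum(1 for a, b, c in itertools.combinations(range(n), 3)
             if adj[a][b] and adj[b][c] and adj[a][c])
    c4 = 0
    for quad in itertools.combinations(range(n), 4):
        # A 4-cycle on four fixed vertices visits them in one of 3 orders.
        for a, b, c, d in ((quad[0], quad[1], quad[2], quad[3]),
                           (quad[0], quad[2], quad[1], quad[3]),
                           (quad[0], quad[1], quad[3], quad[2])):
            if adj[a][b] and adj[b][c] and adj[c][d] and adj[d][a]:
                c4 += 1
    return c3, c4

def turan_score(adj, penalty=10):
    n = len(adj)
    edges = sum(adj[u][v] for u in range(n) for v in range(u + 1, n))
    c3, c4 = count_c3_c4(adj)
    return edges - penalty * (c3 + c4)

square = [[0, 1, 0, 1],
          [1, 0, 1, 0],
          [0, 1, 0, 1],
          [1, 0, 1, 0]]          # the 4-cycle 0-1-2-3-0
counts = count_c3_c4(square)    # (0, 1): no triangle, one C4
```

A new forbidden subgraph then amounts to one more counting routine plugged into the score, which is the "straightforward addition" the last bullet envisions.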

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same modular structure could be adapted for RL on other combinatorial structures such as hypergraphs or matroids.
  • Integration with newer policy-gradient or actor-critic methods might improve sample efficiency on the same graph problems.
  • Automated conjecture generation becomes more feasible if the framework is used to systematically explore large families of graphs.

Load-bearing premise

That supplying a modular and computationally optimized framework will lead researchers to adopt it and produce new results in extremal graph theory.

What would settle it

The facilitation claim would be undercut if future papers solving specific problems, such as new lower bounds for Ramsey numbers, showed no measurable speed or quality gain from using RLGT compared with prior custom RL setups; a consistent gain would confirm it.

Figures

Figures reproduced from arXiv: 2602.17276 by Dragan Stevanović, Irena Đorđević, Ivan Damnjanović, Uroš Milivojević.

Figure 1. Best score versus step count for the Deep Cross-Entropy agent based disproof of Conjecture … [PITH_FULL_IMAGE:figures/full_fig_p018_1.png]
Figure 2. Best score versus step count for the disproof of Conjecture … [PITH_FULL_IMAGE:figures/full_fig_p021_2.png]
Figure 3. Best score versus step count for the unsuccessful attempt to disprove Conjecture … [PITH_FULL_IMAGE:figures/full_fig_p022_3.png]
Original abstract

Reinforcement learning (RL) is a subfield of machine learning that focuses on developing models that can autonomously learn optimal decision-making strategies over time. In a recent pioneering paper, Wagner demonstrated how the Deep Cross-Entropy RL method can be applied to tackle various problems from extremal graph theory by reformulating them as combinatorial optimization problems. Subsequently, many researchers became interested in refining and extending the framework introduced by Wagner, thereby creating various RL environments specialized for graph theory. Moreover, a number of problems from extremal graph theory were solved through the use of RL. In particular, several inequalities concerning the Laplacian spectral radius of graphs were refuted, new lower bounds were obtained for certain Ramsey numbers, and contributions were made to the Tur\'{a}n-type extremal problem in which the forbidden structures are cycles of length three and four. Here, we present Reinforcement Learning for Graph Theory (RLGT), a novel RL framework that systematizes the previous work and provides support for both undirected and directed graphs, with or without loops, and with an arbitrary number of edge colors. The framework efficiently represents graphs and aims to facilitate future RL-based research in extremal graph theory through optimized computational performance and a clean and modular design.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces RLGT, a reinforcement learning framework that systematizes prior Deep Cross-Entropy RL work on extremal graph theory (e.g., Wagner and follow-ups). It claims to support undirected/directed graphs with or without loops and arbitrary edge colors via an efficient graph representation and modular design intended to facilitate future research.

Significance. If the efficiency and modularity claims are substantiated, the framework could standardize RL environments for combinatorial graph problems, lowering barriers for exploring open questions in Ramsey theory, spectral radii, and Turán-type problems. The absence of any benchmarks or usage examples in the manuscript, however, leaves the practical significance unverified.

major comments (2)
  1. [Abstract and §1] The central claims of 'optimized computational performance' and 'facilitate future RL-based research' rest on assertions about representation efficiency and modularity, yet the manuscript supplies no runtime data, memory profiles, scaling curves, or side-by-side comparisons against the ad-hoc environments cited from Wagner et al. This directly undermines the facilitation claim.
  2. [§3 (Framework Architecture)] The description of the graph encoding (adjacency tensor, edge-list, or reward shaping) provides no complexity analysis or pseudocode showing how arbitrary numbers of colors, directed edges, and loops are handled without performance degradation; without this, the efficiency assertion cannot be evaluated.
minor comments (1)
  1. [§2.2] The notation distinguishing graph classes (directed/undirected, loops, colors) would benefit from an explicit summary table to improve readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and describe the revisions planned to strengthen the empirical and technical support for our claims.

Point-by-point responses
  1. Referee: [Abstract and §1] The central claims of 'optimized computational performance' and 'facilitate future RL-based research' rest on assertions about representation efficiency and modularity, yet the manuscript supplies no runtime data, memory profiles, scaling curves, or side-by-side comparisons against the ad-hoc environments cited from Wagner et al. This directly undermines the facilitation claim.

    Authors: We agree that the absence of runtime data, memory profiles, scaling curves, and direct comparisons weakens the substantiation of the efficiency and facilitation claims. In the revised manuscript we will add a new experimental subsection (likely §4) containing benchmark results against the ad-hoc environments from Wagner et al. These will include wall-clock times, peak memory usage, and scaling curves for graphs of increasing order, covering undirected/directed cases, loops, and multiple edge colors. The results will quantify the performance advantage of the modular tensor representation. revision: yes

  2. Referee: [§3 (Framework Architecture)] The description of the graph encoding (adjacency tensor, edge-list, or reward shaping) provides no complexity analysis or pseudocode showing how arbitrary numbers of colors, directed edges, and loops are handled without performance degradation; without this, the efficiency assertion cannot be evaluated.

    Authors: We accept that the current description lacks explicit pseudocode and asymptotic analysis. In the revision we will insert pseudocode for the core encoding routines (adjacency-tensor construction, directed-edge handling, loop support, and multi-color extension) together with a complexity statement: the representation uses O(n²C) space and O(mC) time per transition for m edges and C colors, with no hidden factors that degrade when C>1 or when direction/loops are enabled. This material will be placed in §3. revision: yes
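The stated O(n²C) space bound can be made concrete with a small sketch: a one-hot n × n × C color tensor, where the same container handles direction (asymmetric writes) and loops (diagonal entries). This is an illustration of the bound as stated in the response, not the authors' actual implementation:

```python
# One-hot color tensor for a graph with C edge colors: n * n * C cells,
# the O(n^2 C) space term. Color 0 means "no edge".

def make_tensor(n, colors):
    t = [[[0] * colors for _ in range(n)] for _ in range(n)]
    for u in range(n):
        for v in range(n):
            t[u][v][0] = 1          # every ordered pair starts uncolored
    return t

def set_color(t, u, v, color, directed=False):
    # O(C) work per endpoint: clear the old one-hot slot, set the new one.
    for c in range(len(t[u][v])):
        t[u][v][c] = 0
    t[u][v][color] = 1
    if not directed and u != v:     # mirror the write for undirected graphs
        for c in range(len(t[v][u])):
            t[v][u][c] = 0
        t[v][u][color] = 1

t = make_tensor(4, colors=3)
set_color(t, 0, 1, 2)               # an undirected edge in color 2
set_color(t, 2, 2, 1)               # a loop, stored on the diagonal
cells = sum(len(cell) for row in t for cell in row)   # 4 * 4 * 3 = 48
```

A transition that recolors one pair touches O(C) cells, consistent with the O(mC) per-transition figure when m pairs change across an episode.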

Circularity Check

0 steps flagged

No circularity: RLGT is a new implementation framework without self-referential derivations

full rationale

The manuscript presents RLGT as a systematization of Wagner's prior RL approach for extremal graph problems, extending support to directed/undirected graphs with loops and multi-color edges. No equations, fitted parameters, or predictions are defined in terms of themselves. The central claims concern code-level representation efficiency and modularity, which are asserted rather than derived from any input quantities by construction. Prior work is cited externally (Wagner et al.), with no load-bearing self-citation and no smuggled-in uniqueness theorems or ansatzes. This is a standard non-circular framework paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper introduces a software framework rather than a mathematical derivation, so the ledger contains only standard domain assumptions with no free parameters or invented entities.

axioms (1)
  • domain assumption Reinforcement learning methods can be effectively applied to combinatorial optimization problems in graph theory.
    The framework relies on the success of prior RL applications to graph problems mentioned in the abstract.

pith-pipeline@v0.9.0 · 5540 in / 1256 out tokens · 44012 ms · 2026-05-15T21:18:18.935648+00:00 · methodology

discussion (0)


Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Some results on small ordered and cyclic Ramsey numbers

    math.CO 2026-04 unverdicted novelty 7.0

    Authors compute new small two-color ordered and cyclic Ramsey numbers for monotone paths, cycles, stars, complete graphs and nested matchings via SAT solving, determine closed forms for several pairs of graph classes,...

Reference graph

Works this paper leans on

56 extracted references · 56 canonical work pages · cited by 1 Pith paper · 5 internal anchors

  1. [1] M. Abadi, A. Agarwal, P. Barham, E. Brevdo, Z. Chen, C. Citro, G. S. Corrado, A. Davis, J. Dean, M. Devin, S. Ghemawat, I. Goodfellow, A. Harp, G. Irving, M. Isard, Y. Jia, R. Józefowicz, Ł. Kaiser, M. Kudlur, J. Levenberg, D. Mané, R. Monga, S. Moore, D. Murray, C. Olah, M. Schuster, J. Shlens, B. Steiner, I. Sutskever, K. Talwar, P. Tucker, V. Vanhoucke..., TensorFlow: Large-Scale Machine Learning on Heterogeneous Distributed Systems

  2. [2] S. Akbari, A. Alazemi and M. Anđelić, Upper bounds on the energy of graphs in terms of matching number, Appl. Anal. Discrete Math. 15 (2021), 444–459, https://doi.org/10.2298/AADM201227016A

  3. [3] A. Ali and T. Došlić, Mostar index: Results and perspectives, Appl. Math. Comput. 404 (2021), 126245, https://doi.org/10.1016/j.amc.2021.126245

  4. [4] F. Angileri, G. Lombardi, A. Fois, R. Faraone, C. Metta, M. Salvi, L. A. Bianchi, M. Fantozzi, S. G. Galfrè, D. Pavesi, M. Parton and F. Morandin, A systematization of the Wagner framework: Graph theory conjectures and reinforcement learning, in: D. Pedreschi, A. Monreale, R. Guidotti, R. Pellungrini and F. Naretto (eds.), Discovery Science, volume 15243 o...

  5. [5] F. Angileri, G. Lombardi, A. Fois, R. Faraone, C. Metta, M. Salvi, L. A. Bianchi, M. Fantozzi, S. G. Galfrè, D. Pavesi, M. Parton and F. Morandin, Analyzing RL components for Wagner's framework via Brouwer's conjecture, Mach. Learn. 114 (2025), Art. No. 242, https://doi.org/10.1007/s10994-025-06890-2

  6. [6] N. Biggs, Algebraic Graph Theory, 2nd edition, Cambridge Mathematical Library, Cambridge University Press, Cambridge, 1993, https://doi.org/10.1017/CBO9780511608704

  7. [7] P.-T. de Boer, D. P. Kroese, S. Mannor and R. Y. Rubinstein, A tutorial on the cross-entropy method, Ann. Oper. Res. 134 (2005), 19–67, https://doi.org/10.1007/s10479-005-5724-z

  8. [8] B. Bollobás, Modern Graph Theory, volume 184 of Graduate Texts in Mathematics, Springer, New York, NY, 1998, https://doi.org/10.1007/978-1-4612-0619-4

  9. [9] J. A. Bondy and U. S. R. Murty, Graph Theory with Applications, Elsevier Science Publishing Co., Inc., New York, NY, 1976

  10. [10] V. Brankov, P. Hansen and D. Stevanović, Automated conjectures on upper bounds for the largest Laplacian eigenvalue of graphs, Linear Algebra Appl. 414 (2006), 407–424, https://doi.org/10.1016/j.laa.2005.10.017

  11. [11] A. E. Brouwer and W. H. Haemers, Spectra of Graphs, Universitext, Springer, New York, NY, 2012, https://doi.org/10.1007/978-1-4614-1939-6

  12. [12] Y. Cao, G. Chen, G. Jing, M. Stiebitz and B. Toft, Graph edge coloring: A survey, Graphs Combin. 35 (2019), 33–66, https://doi.org/10.1007/s00373-018-1986-5

  13. [13] T. Crosley, et al., isort, https://pycqa.github.io/isort/

  14. [14] D. Cvetković, M. Doob and H. Sachs, Spectra of Graphs: Theory and Application, 2nd edition, Johann Ambrosius Barth Verlag, Heidelberg–Leipzig, 1980

  15. [15] I. Damnjanović, U. Milivojević, I. Ðorđević and D. Stevanović, RLGT: A reinforcement learning framework for extremal graph theory (GitHub documentation), https://ivan-damnjanovic.github.io/rlgt/

  16. [16] I. Damnjanović, U. Milivojević, I. Ðorđević and D. Stevanović, RLGT: A reinforcement learning framework for extremal graph theory (GitHub repository), https://github.com/Ivan-Damnjanovic/rlgt

  17. [17] I. Damnjanović, U. Milivojević, I. Ðorđević and D. Stevanović, RLGT: A reinforcement learning framework for extremal graph theory (PyPI page), https://pypi.org/project/RLGT/

  18. [18] R. Diestel, Graph Theory, 5th edition, volume 173 of Graduate Texts in Mathematics, Springer Berlin, Heidelberg, 2017, https://doi.org/10.1007/978-3-662-53622-3

  19. [19] T. Došlić, I. Martinjak, R. Škrekovski, S. Tipurić Spužević and I. Zubac, Mostar index, J. Math. Chem. 56 (2018), 2995–3013, https://doi.org/10.1007/s10910-018-0928-z

  20. [20] P. Erdős, Some recent progress on extremal problems in graph theory, Congr. Numer. 14 (1975), 3–14

  21. [21] S. Eustace, et al., Poetry, https://python-poetry.org/

  22. [22] M. Ghebleh, S. Al-Yakoob, A. Kanso and D. Stevanović, Graphs having two main eigenvalues and arbitrarily many distinct vertex degrees, Appl. Math. Comput. 495 (2025), 129311, https://doi.org/10.1016/j.amc.2025.129311

  23. [23] M. Ghebleh, S. Al-Yakoob, A. Kanso and D. Stevanović, Reinforcement learning for graph theory, II. Small Ramsey numbers, Art Discrete Appl. Math. 8 (2025), #P1.07, https://doi.org/10.26493/2590-9770.1788.8af

  24. [24] M. Ghebleh, S. Al-Yakoob, A. Kanso and D. Stevanović, Reinforcement learning for graph theory, I: Reimplementation of Wagner's approach, Discrete Appl. Math. 380 (2026), 468–479, https://doi.org/10.1016/j.dam.2025.10.047

  25. [25] M. Ghebleh, A. Kanso and D. Stevanović, Graph6Java: A researcher-friendly Java framework for testing conjectures in chemical graph theory, MATCH Commun. Math. Comput. Chem. 81 (2019), 737–770, https://match.pmf.kg.ac.rs/electronic_versions/Match81/n3/match81n3_737-770.pdf

  26. [26] F. Glover, Tabu search — Part I, ORSA J. Comput. 1 (1989), 190–206, https://doi.org/10.1287/ijoc.1.3.190

  27. [27] F. Glover and M. Laguna, Tabu search, in: D.-Z. Du and P. M. Pardalos (eds.), Handbook of Combinatorial Optimization, Springer, Boston, MA, 1998, pp. 2093–2229, https://doi.org/10.1007/978-1-4613-0303-9_33

  28. [28] C. Godsil and G. Royle, Algebraic Graph Theory, volume 207 of Graduate Texts in Mathematics, Springer, New York, NY, 2001, https://doi.org/10.1007/978-1-4613-0163-9

  29. [29] I. Gutman, The energy of a graph, Ber. Math.-Statist. Sekt. Forschungsz. Graz. 103 (1978), 1–22

  30. [30] C. R. Harris, K. J. Millman, S. J. van der Walt, R. Gommers, P. Virtanen, D. Cournapeau, E. Wieser, J. Taylor, S. Berg, N. J. Smith, R. Kern, M. Picus, S. Hoyer, M. H. van Kerkwijk, M. Brett, A. Haldane, J. Fernández del Río, M. Wiebe, P. Peterson, P. Gérard-Marchant, K. Sheppard, T. Reddy, W. Weckesser, H. Abbasi, C. Gohlke and T. E. Oliphant, Array prog...

  31. [31] H. Krekel, et al., pytest, https://docs.pytest.org/

  32. [32] Ł. Langa, et al., Black, https://black.readthedocs.io/en/stable/

  33. [33] M. Lapan, Deep Reinforcement Learning Hands-On, 2nd edition, Packt Publishing, Birmingham, 2020

  34. [34] N. Mazyavkina, S. Sviridov, S. Ivanov and E. Burnaev, Reinforcement learning for combinatorial optimization: A survey, Comput. Oper. Res. 134 (2021), 105400, https://doi.org/10.1016/j.cor.2021.105400

  35. [35] B. D. McKay and A. Piperno, Practical graph isomorphism, II, J. Symb. Comput. 60 (2014), 94–112, https://doi.org/10.1016/j.jsc.2013.09.003

  36. [36] A. Mehrabian, A. Anand, H. Kim, N. Sonnerat, M. Balog, Gh. Comanici, T. Berariu, A. Lee, A. Ruoss, A. Bulanova, D. Toyama, S. Blackwell, B. Romera Paredes, P. Veličković, L. Orseau, J. Lee, A. M. Naredla, D. Precup and A. Zs. Wagner, Finding increasingly large extremal graphs with AlphaZero and tabu search, in: K. Larson (ed.), Proceedings of the Thirty-Th...

  37. [37] S. Menard, L. Nell, et al., JPype, https://jpype.readthedocs.io/en/latest/

  38. [38] S. Nanz and C. A. Furia, A comparative study of programming languages in Rosetta Code, 2015, arXiv:1409.0252 [cs.SE]

  39. [39] A. Paszke, S. Gross, F. Massa, A. Lerer, J. Bradbury, G. Chanan, T. Killeen, Z. Lin, N. Gimelshein, L. Antiga, A. Desmaison, A. Köpf, E. Yang, Z. DeVito, M. Raison, A. Tejani, S. Chilamkurthy, B. Steiner, L. Fang, J. Bai and S. Chintala, PyTorch: An imperative style, high-performance deep learning library, 2019, arXiv:1912.01703 [cs.LG]

  40. [40] W. B. Powell, Reinforcement Learning and Stochastic Optimization: A Unified Framework for Sequential Decisions, 1st edition, John Wiley & Sons, Inc., Hoboken, NJ, 2022, https://doi.org/10.1002/9781119815068

  41. [41] S. P. Radziszowski, Small Ramsey numbers, Electron. J. Combin. DS01 (2024), https://doi.org/10.37236/21

  42. [42] P. Rowlinson, The main eigenvalues of a graph: A survey, Appl. Anal. Discrete Math. 1 (2007), 445–471, https://doi.org/10.2298/AADM0702445R

  43. [43] R. Y. Rubinstein, Optimization of computer simulation models with rare events, Eur. J. Oper. Res. 99 (1997), 89–112, https://doi.org/10.1016/S0377-2217(96)00385-2

  44. [44] J. Schulman, F. Wolski, P. Dhariwal, A. Radford and O. Klimov, Proximal Policy Optimization algorithms, 2017, arXiv:1707.06347 [cs.LG]

  45. [45] D. Silver, J. Schrittwieser, K. Simonyan, I. Antonoglou, A. Huang, A. Guez, T. Hubert, L. Baker, M. Lai, A. Bolton, Y. Chen, T. Lillicrap, F. Hui, L. Sifre, G. van den Driessche, T. Graepel and D. Hassabis, Mastering the game of Go without human knowledge, Nature 550 (2017), 354–359, https://doi.org/10.1038/nature24270

  46. [46] M. Simonovits, Extremal graph problems, degenerate extremal problems, and supersaturated graphs, in: J. A. Bondy and U. S. R. Murty (eds.), Progress in Graph Theory, Academic Press, Toronto, ON, 1984, pp. 419–437

  47. [47] R. Sharafdini and T. Réti, On the transmission-based graph topological indices, Kragujevac J. Math. 44 (2020), 41–63, https://doi.org/10.46793/KgJMat2001.041S

  48. [48] P. Soviany, R. T. Ionescu, P. Rota and N. Sebe, Curriculum learning: A survey, Int. J. Comput. Vision 130 (2022), 1526–1565, https://doi.org/10.1007/s11263-022-01611-x

  49. [49] Ð. Stevanović, I. Damnjanović and D. Stevanović, Finding counterexamples for a conjecture of Akbari, Alazemi and Anđelić, 2021, arXiv:2111.15303 [math.CO]

  50. [50] R. S. Sutton and A. G. Barto, Reinforcement Learning: An Introduction, 2nd edition, Adaptive Computation and Machine Learning, MIT Press, Cambridge, MA, 2018

  51. [51] Cs. Szepesvári, Algorithms for Reinforcement Learning, 1st edition, Synthesis Lectures on Artificial Intelligence and Machine Learning, Springer, Cham, 2010, https://doi.org/10.1007/978-3-031-01551-9

  52. [52] L. Taieb, M. Roucairol, T. Cazenave and A. Harutyunyan, Automated refutation with Monte Carlo search of graph theory conjectures on the maximum Laplacian eigenvalue, in: Y. Zhang, M. Hladík and H. Moosaei (eds.), Learning and Intelligent Optimization, volume 15744 of Lecture Notes in Computer Science, Springer, Cham, 2026, pp. 52–63, https://doi.org/10.1007/97...

  53. [53] M. Towers, A. Kwiatkowski, J. Terry, J. U. Balis, G. De Cola, T. Deleu, M. Goulão, A. Kallinteris, M. Krimmel, A. KG, R. Perez-Vicente, A. Pierré, S. Schulhoff, J. J. Tai, H. Tan and O. G. Younis, Gymnasium: A standard interface for reinforcement learning environments, 2025, arXiv:2407.17032 [cs.LG]

  54. [54] A. Zs. Wagner, Constructions in combinatorics via neural networks, 2021, arXiv:2104.14516 [math.CO]

  55. [55] R. J. Williams, Simple statistical gradient-following algorithms for connectionist reinforcement learning, Mach. Learn. 8 (1992), 229–256, https://doi.org/10.1007/BF00992696

  56. [56] The Sage Developers, SageMath, the Sage Mathematics Software System (Version 10.8), 2025, https://www.sagemath.org