pith. sign in

arxiv: 2605.30954 · v1 · pith:5IWEWS62new · submitted 2026-05-29 · 💻 cs.NE

GP-GOMEA with GPU-Based Fitness Evaluations: Design and Performance Analysis

Pith reviewed 2026-06-28 20:19 UTC · model grok-4.3

classification 💻 cs.NE
keywords symbolic regressiongenetic programmingGPU accelerationevolutionary algorithmsfitness evaluationGP-GOMEAFeynman equations
0
0 comments X

The pith

A GPU fitness evaluation scheme for GP-GOMEA increases evaluation throughput enough to regress large Feynman equations in four hours.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper designs a GPU-friendly representation of GP-GOMEA template-based individuals together with a parallel evaluation strategy. This change replaces serial CPU fitness calculations with massively parallel GPU ones, raising the number of evaluations that fit inside a fixed time budget by orders of magnitude. On four standard symbolic regression benchmarks the extra evaluations produce better models, especially when datasets or populations are large. The same capacity also lets the algorithm run systematic experiments on expression structure that were previously impossible and, for the first time, reliably recover one of the largest Feynman equations within four hours.

Core claim

By introducing a GPU-based fitness evaluation scheme that exploits the inherent parallelism of population-based search, the work raises evaluation throughput for GP-GOMEA. The resulting capacity yields performance gains on benchmarks with larger datasets and populations, enables new analyses of how expression structure affects search difficulty, and allows a problem-agnostic evolutionary algorithm to regress one of the largest Feynman equations reliably inside four hours.

What carries the argument

The GPU-based fitness evaluation scheme, which uses a template-based individual representation optimized for parallel GPU computation to replace CPU evaluations.

If this is right

  • Performance improves on four standard symbolic regression benchmarks, with larger gains for bigger datasets and populations.
  • Systematic study of which expression structures make search harder for GP-GOMEA becomes feasible.
  • A problem-agnostic evolutionary algorithm can now reliably regress one of the largest Feynman equations in four hours.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same GPU pattern could be applied to other template-based genetic programming methods to achieve comparable speed-ups.
  • The ability to run far more evaluations may narrow the scalability gap between symbolic regression and deep-learning approaches.
  • Insights from the new difficulty analyses could be used to design heuristics that prune hard expression structures early.

Load-bearing premise

The GPU implementation must compute fitness values that match the accuracy of the original CPU version while delivering the claimed higher throughput.

What would settle it

A side-by-side run of identical GP-GOMEA individuals on CPU and GPU that produces measurably different model errors or solution qualities on any benchmark would show the accuracy equivalence does not hold.

Figures

Figures reproduced from arXiv: 2605.30954 by Anton Bouter, Jasper Post, Johannes Koch, Peter A.N. Bosman, Tanja Alderliesten.

Figure 1
Figure 1. Figure 1: Left: the expression tree; colored nodes are active, while gray dashed [PITH_FULL_IMAGE:figures/full_fig_p006_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Parallel decomposition and kernel design for two designs. Left: single block [PITH_FULL_IMAGE:figures/full_fig_p007_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Median validation MSE and interquartile ranges (shaded) over all runs [PITH_FULL_IMAGE:figures/full_fig_p010_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Median number of evaluations per minute over population size across [PITH_FULL_IMAGE:figures/full_fig_p011_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Median number of evaluations per minute over population size across [PITH_FULL_IMAGE:figures/full_fig_p012_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Top rows: NMSE over time across problems with template depths of 6 [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Median training MSE and interquartile ranges (shaded areas) over all [PITH_FULL_IMAGE:figures/full_fig_p017_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Median validation MSE and interquartile ranges (shaded areas) over all [PITH_FULL_IMAGE:figures/full_fig_p018_8.png] view at source ↗
Figure 9
Figure 9. Figure 9: Median training MSE and interquartile ranges (shaded areas) over all [PITH_FULL_IMAGE:figures/full_fig_p019_9.png] view at source ↗
Figure 10
Figure 10. Figure 10: Median number of evaluations per minute and interquartile ranges [PITH_FULL_IMAGE:figures/full_fig_p020_10.png] view at source ↗
read the original abstract

GP-GOMEA is a state-of-the-art evolutionary algorithm for symbolic regression, known for discovering small and interpretable models. However, its computational cost remains substantial, limiting its applicability to larger datasets and more complex target expressions. In contrast, the rise of modern subsymbolic approaches, particularly deep learning, has been driven largely by the massive parallelism offered by GPUs. In this work, we take the first major step toward a fully GPU-accelerated GP-GOMEA by introducing a GPU-based fitness evaluation scheme. We design a GPU-friendly representation of GP-GOMEA's template-based individuals and a corresponding evaluation strategy that exploits the inherent parallelism of population-based search. This substantially increases evaluation throughput, enabling orders of magnitude more evaluations within the same time budget. Across four standard symbolic regression benchmarks, this increased evaluation capacity yields performance improvements, particularly for larger datasets and larger population sizes. Moreover, the ability to efficiently evaluate much larger datasets and more complex templates enables analyses that were previously infeasible, allowing us to systematically analyze what makes expressions increasingly difficult for GP-GOMEA, providing new insights into how expression structure affects search difficulty. Finally, for the first time, this expanded capability allows a problem-agnostic evolutionary algorithm to reliably regress one of the largest Feynman equations within four hours.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces a GPU-friendly representation and evaluation strategy for template-based individuals in GP-GOMEA to accelerate fitness evaluations via parallelism. It reports substantially higher throughput enabling more evaluations within fixed time budgets, yielding performance gains on four symbolic regression benchmarks (especially larger datasets and populations), new analyses of how expression structure affects search difficulty, and the first reliable regression of one of the largest Feynman equations by a problem-agnostic EA within four hours.

Significance. If the GPU scheme produces numerically identical fitness values to the CPU version (preserving evolutionary dynamics), the work meaningfully extends GP-GOMEA's reach to larger-scale problems and provides new empirical insights into search difficulty; the Feynman result would be a notable demonstration of scalability for evolutionary symbolic regression.

major comments (3)
  1. [Abstract and experimental results] Abstract and experimental results section: The central claim that GPU evaluations enable the Feynman regression 'for the first time' within four hours rests on the unverified assumption that the new scheme yields identical fitness values (and thus identical selection dynamics) to the original CPU implementation. No explicit numerical verification—e.g., side-by-side fitness comparisons or model-quality metrics on identical individuals—is reported, despite the risk that changes in floating-point associativity, reduction order, or constant handling could alter outcomes.
  2. [Benchmark results] Benchmark results (likely §4): Performance improvements are stated without error bars, multiple independent runs with statistical tests, or direct baseline comparisons to the original CPU GP-GOMEA under identical time budgets; this makes it impossible to quantify whether gains are due to throughput alone or to any unintended changes in search behavior.
  3. [Methods/design] Methods/design section: The GPU representation and evaluation strategy for template-based individuals is described at a high level, but no pseudocode, kernel details, or handling of subtree reductions/precision is provided to allow assessment of numerical equivalence.
minor comments (2)
  1. [Design section] Notation for template-based individuals and GPU data structures could be clarified with a small diagram or explicit mapping to the original CPU representation.
  2. [Experimental setup] The four standard benchmarks should be named explicitly with dataset sizes and target expressions in the main text (not only supplementary).

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive review. We address each major comment below and commit to revisions that directly respond to the concerns raised.

read point-by-point responses
  1. Referee: [Abstract and experimental results] Abstract and experimental results section: The central claim that GPU evaluations enable the Feynman regression 'for the first time' within four hours rests on the unverified assumption that the new scheme yields identical fitness values (and thus identical selection dynamics) to the original CPU implementation. No explicit numerical verification—e.g., side-by-side fitness comparisons or model-quality metrics on identical individuals—is reported, despite the risk that changes in floating-point associativity, reduction order, or constant handling could alter outcomes.

    Authors: We agree that explicit verification of numerical equivalence is necessary to substantiate the claim that the GPU scheme preserves evolutionary dynamics and to support the 'for the first time' statement. Although the GPU representation and evaluation strategy were designed to maintain identical semantics and reduction order where possible, the submitted manuscript does not include side-by-side fitness or model-quality comparisons. In the revision we will add such verification experiments on identical individuals across both implementations. revision: yes

  2. Referee: [Benchmark results] Benchmark results (likely §4): Performance improvements are stated without error bars, multiple independent runs with statistical tests, or direct baseline comparisons to the original CPU GP-GOMEA under identical time budgets; this makes it impossible to quantify whether gains are due to throughput alone or to any unintended changes in search behavior.

    Authors: We accept that the results presentation would be strengthened by error bars, explicit reporting of the number of independent runs, statistical tests, and direct CPU baselines under matched time budgets. The experiments underlying the reported improvements were performed with multiple runs, yet these details and comparisons were not fully documented. We will revise the experimental results section to include error bars, run counts, statistical analysis, and side-by-side CPU comparisons. revision: yes

  3. Referee: [Methods/design] Methods/design section: The GPU representation and evaluation strategy for template-based individuals is described at a high level, but no pseudocode, kernel details, or handling of subtree reductions/precision is provided to allow assessment of numerical equivalence.

    Authors: We acknowledge that additional low-level details would facilitate independent assessment of numerical equivalence. The current description focuses on the high-level design choices that enable GPU parallelism. In the revised manuscript we will expand the methods section with pseudocode for the core evaluation kernels, explicit handling of subtree reductions, and discussion of floating-point precision and associativity choices. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical implementation paper with externally verifiable benchmarks

full rationale

The paper describes a GPU implementation of fitness evaluation for the existing GP-GOMEA algorithm and reports empirical throughput and performance gains on symbolic regression benchmarks, including one Feynman equation. No mathematical derivations, parameter fits, or predictions appear. Claims rest on direct runtime measurements and solution quality comparisons that can be reproduced independently. No self-citations are load-bearing for any core result, and no steps reduce by construction to inputs. The equivalence assumption for GPU vs CPU is an implementation detail subject to external verification rather than a circular definition.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical algorithmic engineering effort; the abstract introduces no mathematical derivations, free parameters, axioms, or new postulated entities.

pith-pipeline@v0.9.1-grok · 5770 in / 1014 out tokens · 19147 ms · 2026-06-28T20:19:34.667139+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

25 extracted references · 18 canonical work pages · 1 internal anchor

  1. [1]

    Baeta, F., Correia, J., Martins, T., Machado, P.: TensorGP - Genetic Programming EngineinTensorflow.In:Castillo,P.A.,JiménezLaredo,J.L.(eds.)Applicationsof Evolutionary Computation. pp. 763–778. Springer International Publishing, Cham (2021)

  2. [2]

    https://doi.org/10.48550/ARXIV.2503.19043, https://arxiv.org/abs/2503.19043

    Bruneton, J.P.: Enhancing symbolic regression with quality-diversity and physics-inspired constraints (2025). https://doi.org/10.48550/ARXIV.2503.19043, https://arxiv.org/abs/2503.19043

  3. [3]

    In: Proceed- ings of the 2020 Genetic and Evolutionary Computation Conference Com- panion

    Burlacu, B., Kronberger, G., Kommenda, M.: Operon C++: an Efficient Genetic Programming Framework for Symbolic Regression. In: Proceed- ings of the 2020 Genetic and Evolutionary Computation Conference Com- panion. p. 1562–1570. GECCO ’20, Association for Computing Machin- ery, New York, NY, USA (2020). https://doi.org/10.1145/3377929.3398099, https://doi....

  4. [4]

    Cava, W.L., Orzechowski, P., Burlacu, B., de França, F.O., Virgolin, M., Jin, Y., Kommenda, M., Moore, J.H.: Contemporary Symbolic Regression Methods and their Relative Performance (2021), https://arxiv.org/abs/2107.14351

  5. [5]

    Cranmer, M.: Interpretable Machine Learning for Science with PySR and Symbol- icRegression (2023), https://arxiv.org/abs/2305.01582

  6. [6]

    Evolutionary computa- tion32(4), 371–397 (2024)

    Dushatskiy, A., Virgolin, M., Bouter, A., Thierens, D., Bosman, P.A.: Parameter- less Gene-pool Optimal Mixing Evolutionary Algorithms. Evolutionary computa- tion32(4), 371–397 (2024)

  7. [7]

    UCI Machine Learning Repository (2016), DOI: https://doi.org/10.24432/C5BC8T

    Ferreira, R., Martiniano, A., Ferreira, A., Ferreira, A., Sassi, R.: Daily De- mand Forecasting Orders. UCI Machine Learning Repository (2016), DOI: https://doi.org/10.24432/C5BC8T

  8. [8]

    The Journal of Machine Learning Research13(1), 2171–2175 (2012)

    Fortin, F.A., De Rainville, F.M., Gardner, M.A.G., Parizeau, M., Gagné, C.: DEAP: Evolutionary Algorithms Made Easy. The Journal of Machine Learning Research13(1), 2171–2175 (2012)

  9. [9]

    Harrison, J., Bosman, P.A.N., Alderliesten, T.: Thinking Outside the Template with Modular GP-GOMEA (2025), https://arxiv.org/abs/2505.01262

  10. [10]

    In: 2013 IEEE Congress on Evolutionary Computation

    Janikow, C.Z., Aleshunas, J.: Impact of commutative and non-commutative func- tions on symbolic regression with acgp. In: 2013 IEEE Congress on Evolutionary Computation. pp. 2290–2297 (2013). https://doi.org/10.1109/CEC.2013.6557842

  11. [11]

    In: Affenzeller, M., Winkler, S.M., Kononova, A.V., Trautmann, H., Tušar, T., Machado, P., Bäck, T

    Koch, J., Alderliesten, T., Bosman, P.A.N.: Simultaneous Model-Based Evolution of Constants and Expression Structure in GP-GOMEA for Symbolic Regression. In: Affenzeller, M., Winkler, S.M., Kononova, A.V., Trautmann, H., Tušar, T., Machado, P., Bäck, T. (eds.) Parallel Problem Solving from Nature – PPSN XVIII. pp. 238–255. Springer Nature Switzerland, Cham (2024)

  12. [12]

    arXiv preprint arXiv:2602.02311 (2026)

    Koch, J., Alderliesten, T., Bosman, P.A.: Introns and Templates Matter: Rethink- ing Linkage in GP-GOMEA. arXiv preprint arXiv:2602.02311 (2026)

  13. [13]

    (ed.) Late Breaking Papers at the Genetic Pro- gramming 1996 Conference Stanford University July 28-31, 1996

    Maxwell, S.R.: Why Might Some Problems Be Difficult for Genetic Programming to Find Solutions? In: Koza, J.R. (ed.) Late Breaking Papers at the Genetic Pro- gramming 1996 Conference Stanford University July 28-31, 1996. pp. 125–128. Stanford Bookstore, Stanford University, CA, USA (28–31 Jul 1996)

  14. [14]

    Meurer, A., Smith, C.P., Paprocki, M., Čertík, O., Kirpichev, S.B., Rocklin, M., Kumar, A., Ivanov, S., Moore, J.K., Singh, S., Rathnayake, T., Vig, S., Granger, B.E., Muller, R.P., Bonazzi, F., Gupta, H., Vats, S., Johansson, F., Pedregosa, F., Curry, M.J., Terrel, A.R., Roučka, v., Saboo, A., Fernando, I., Kulal, S., Cimrman, R., Scopatz, A.: SymPy: sym...

  15. [15]

    Journal of Machine Learning Research12, 2825–2830 (2011)

    Pedregosa, F., Varoquaux, G., Gramfort, A., Michel, V., Thirion, B., Grisel, O., Blondel, M., Prettenhofer, P., Weiss, R., Dubourg, V., Vanderplas, J., Passos, A., Cournapeau, D., Brucher, M., Perrot, M., Duchesnay, E.: Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research12, 2825–2830 (2011)

  16. [16]

    UCI Machine Learning Repository (1993), DOI: https://doi.org/10.24432/C5859H

    Quinlan, R.: Auto MPG. UCI Machine Learning Repository (1993), DOI: https://doi.org/10.24432/C5859H

  17. [17]

    arXiv preprint arXiv:2012.00058v2 (2021)

    Romano, J.D., Le, T.T., La Cava, W., Gregg, J.T., Goldberg, D.J., Chakraborty, P., Ray, N.L., Himmelstein, D., Fu, W., Moore, J.H.: PMLB v1.0: an open source dataset collection for benchmarking machine learning methods. arXiv preprint arXiv:2012.00058v2 (2021)

  18. [18]

    In: Pro- ceedings of the Genetic and Evolutionary Computation Conference

    Schlender, T., Malafaia, M., Alderliesten, T., Bosman, P.: Improv- ing the efficiency of GP-GOMEA for higher-arity operators. In: Pro- ceedings of the Genetic and Evolutionary Computation Conference. p. 971–979. GECCO ’24, Association for Computing Machinery, New York, NY, USA (2024). https://doi.org/10.1145/3638529.3654118, https://doi.org/10.1145/363852...

  19. [19]

    Zonglin Yang and Lidong Bing

    Shojaee, P., Meidani, K., Gupta, S., Farimani, A.B., Reddy, C.K.: Llm-sr: Sci- entific equation discovery via programming with large language models (2024). https://doi.org/10.48550/ARXIV.2404.18400, https://arxiv.org/abs/2404.18400

  20. [20]

    URL https://github

    Stephens, T.: gplearn: Genetic programming in Python, with a scikit-learn inspired API. URL https://github. com/trevorstephens/gplearn. Accessed pp. 05–22 (2024)

  21. [21]

    Science Advances 6(16), eaay2631 (2020)

    Udrescu, S.M., Tegmark, M.: AI Feynman: A physics- inspired method for symbolic regression. Science Advances 6(16), eaay2631 (2020). https://doi.org/10.1126/sciadv.aay2631, https://www.science.org/doi/abs/10.1126/sciadv.aay2631

  22. [22]

    In: Proceedings of the Genetic and Evolutionary Com- putation Conference

    Virgolin, M., Alderliesten, T., Witteveen, C., Bosman, P.A.N.: Scalable Genetic Programming by Gene-Pool Optimal Mixing and Input-Space Entropy-Based Building-Block Learning. In: Proceedings of the Genetic and Evolutionary Com- putation Conference. p. 1041–1048. GECCO ’17, Association for Computing Ma- chinery, New York, NY, USA (2017). https://doi.org/10...

  23. [23]

    In: Proceedings of the Genetic and Evolutionary Computation Conference Companion

    de Vries, S., Keemink, S.W., van Gerven, M.A.J.: Kozax: Flexible and Scalable Genetic Programming in JAX. In: Proceedings of the Genetic and Evolutionary Computation Conference Companion. p. 603–606. GECCO ’25 Companion, Association for Computing Machin- ery, New York, NY, USA (2025). https://doi.org/10.1145/3712255.3726681, https://doi.org/10.1145/371225...

  24. [24]

    Wu, Z., Wang, L., Sun, K., Li, Z., Cheng, R.: Enabling Population-Level Parallelism in Tree-Based Genetic Programming for GPU Acceleration (2026), https://arxiv.org/abs/2501.17168

  25. [25]

    Zhang, R., Lensen, A., Sun, Y.: Speeding up Genetic Programming Based Sym- bolic Regression using GPUs. In: PRICAI 2022: Trends in Artificial Intelligence: 19th Pacific Rim International Conference on Artificial Intelligence, PRICAI 2022, Shanghai, China, November 10–13, 2022, Proceedings, Part I. p. 519–533. Springer- Verlag, Berlin, Heidelberg (2022). h...