Guiding Multi-Objective Genetic Programming with Description Length Improves Symbolic Regression Solutions

Deaglan J. Bartlett; Fabricio Olivetti de Franca; Gabriel Kronberger; Harry Desmond; Pedro G. Ferreira

Description length post-selection after multi-objective genetic programming improves test performance in symbolic regression over AIC and BIC.

Reviewed by Pith at T0; open to challenge. T0 means a machine referee read the full paper against a public rubric.

T0 review · grok-4.3

2026-05-22 01:59 UTC pith:5RPWHSF3

Load-bearing objection: Post-selection with description length after multi-objective GP beats AIC/BIC on test error for symbolic regression, but direct use as fitness causes premature convergence to simple models.

arxiv 2605.22374 v1 pith:5RPWHSF3 submitted 2026-05-21 cs.NE stat.ML

Gabriel Kronberger, Fabricio Olivetti de Franca, Deaglan J. Bartlett, Harry Desmond, Pedro G. Ferreira

keywords: symbolic regression, genetic programming, description length, model selection, overfitting, fractional Bayes factor, AIC, BIC

verification ladder T0 review T1 audit T2 compute T3 formal T4 reserved

The pith

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper evaluates description length and fractional Bayes factor as data-efficient alternatives to AIC and BIC for choosing compact symbolic expressions that generalize well. The authors test these criteria in three strategies within genetic programming for symbolic regression: post-selection after multi-objective search on accuracy and length, direct use in multi-objective search, and single-objective optimization. Across noisy synthetic benchmarks and real-world problems, post-selection with description length or fractional Bayes factor yields better test performance than AIC or BIC baselines. Using the same criteria directly as fitness often leads to premature convergence on overly simple models instead. The work provides practical guidance for incorporating these principled selection tools into genetic programming workflows to combat overfitting and bloat.

Core claim

The central claim is that applying description length (DL) using a Fisher-information-based parameter encoding, or the fractional Bayes factor (FBF), as a post-selection step on models found by multi-objective genetic programming for symbolic regression yields improved test performance compared to using AIC or BIC. In contrast, optimizing DL or FBF directly as the fitness function in single-objective GP frequently causes premature convergence to overly simple expressions. BIC with the same function complexity penalty as DL/FBF produces similar results to the proposed methods.
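For readers unfamiliar with the baselines, the following is a minimal sketch of how AIC and BIC rank candidate expressions, assuming i.i.d. Gaussian noise with the variance estimated by maximum likelihood. The candidate labels, residual sums of squares, and parameter counts are hypothetical and are not taken from the paper.

```python
import math

def gaussian_neg2_log_lik(rss: float, n: int) -> float:
    """-2 * maximised log-likelihood for Gaussian errors with sigma^2 = rss / n."""
    return n * (math.log(2.0 * math.pi * rss / n) + 1.0)

def aic(rss: float, n: int, k: int) -> float:
    # Akaike information criterion: 2k - 2 log L
    return gaussian_neg2_log_lik(rss, n) + 2.0 * k

def bic(rss: float, n: int, k: int) -> float:
    # Bayesian information criterion: k log n - 2 log L
    return gaussian_neg2_log_lik(rss, n) + k * math.log(n)

# Hypothetical candidates: (label, training RSS, number of fitted parameters)
candidates = [("small", 10.8, 2), ("medium", 10.0, 4), ("large", 9.6, 9)]
n = 100  # number of training observations (assumed)

best_aic = min(candidates, key=lambda c: aic(c[1], n, c[2]))[0]
best_bic = min(candidates, key=lambda c: bic(c[1], n, c[2]))[0]
print("AIC picks:", best_aic)
print("BIC picks:", best_bic)
```

Because BIC charges log n per parameter rather than 2, it tends to prefer the smaller model once n exceeds roughly 7 observations. Neither criterion sees the structure of the expression itself, which is the gap the function complexity penalty in DL/FBF is meant to fill.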

What carries the argument

Description length criterion implemented via Fisher-information-based parameter encoding to score the complexity and fit of tree-structured symbolic expressions in genetic programming.

Load-bearing premise

The Fisher-information-based approximation for encoding parameters in description length calculations remains accurate and stable for the discrete, tree-like program structures generated by genetic programming, even in the presence of noise in the data.
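One way this premise can fail is illustrated below. The sketch assumes a least-squares model with Gaussian noise, for which the Fisher information is J^T J / sigma^2 with J the Jacobian of the model output with respect to its parameters. The two toy models and all names are hypothetical and are not the paper's implementation.

```python
def fisher_information(jacobian_rows, sigma):
    """Fisher information J^T J / sigma^2 for Gaussian noise with std sigma."""
    p = len(jacobian_rows[0])
    info = [[0.0] * p for _ in range(p)]
    for row in jacobian_rows:
        for i in range(p):
            for j in range(p):
                info[i][j] += row[i] * row[j] / sigma**2
    return info

def det2(m):
    """Determinant of a 2x2 matrix."""
    return m[0][0] * m[1][1] - m[0][1] * m[1][0]

xs = [0.5, 1.0, 1.5, 2.0]
a, b, sigma = 2.0, 3.0, 1.0

# f(x) = a * x + b has gradient (x, 1): the parameters are identifiable.
jac_linear = [[x, 1.0] for x in xs]

# f(x) = a * b * x has gradient (b * x, a * x): the columns are proportional,
# so the two parameters are redundant.
jac_redundant = [[b * x, a * x] for x in xs]

det_linear = det2(fisher_information(jac_linear, sigma))
det_redundant = det2(fisher_information(jac_redundant, sigma))
print(det_linear, det_redundant)
```

The redundant model yields a singular Fisher matrix, so a codelength term that takes the logarithm of this determinant is undefined without some form of simplification or regularisation. Genetic programming can produce such redundant subtrees through crossover and mutation.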

What would settle it

On a held-out noisy synthetic regression dataset, measure whether models chosen by DL/FBF post-selection show lower test mean squared error than those chosen by AIC or BIC; absence of improvement would falsify the performance claim.

If this is right

DL/FBF post-selection improves test performance compared to the AIC/BIC baselines across the evaluated datasets.
BIC combined with the function complexity penalty from DL/FBF produces results similar to DL/FBF.
Using DL/FBF directly as the fitness function in single-objective GPSR frequently induces premature convergence to overly simple models.
Multi-objective search for accuracy and program length followed by DL/FBF selection is an effective workflow.
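The workflow in the last point can be sketched as follows: extract the Pareto front over training error and program length, then pick one model from that front with a selection criterion. BIC (up to an additive constant) is used here purely as a stand-in, since this page does not give enough detail to reproduce the paper's DL or FBF scores. All model entries are hypothetical.

```python
import math

def pareto_front(models):
    """Keep models not dominated in (rss, length); both objectives are minimised."""
    front = []
    for m in models:
        dominated = any(
            o["rss"] <= m["rss"] and o["length"] <= m["length"]
            and (o["rss"] < m["rss"] or o["length"] < m["length"])
            for o in models
        )
        if not dominated:
            front.append(m)
    return front

def bic_score(m, n):
    # Stand-in criterion (BIC up to a constant); a DL or FBF score would slot in here.
    return n * math.log(m["rss"] / n) + m["params"] * math.log(n)

def post_select(models, n, criterion):
    """Select the front member with the lowest criterion value."""
    return min(pareto_front(models), key=lambda m: criterion(m, n))

# Hypothetical final population from a multi-objective run
models = [
    {"name": "x", "rss": 40.0, "length": 1, "params": 0},
    {"name": "a*x", "rss": 12.0, "length": 3, "params": 1},
    {"name": "a*x+b", "rss": 10.0, "length": 5, "params": 2},
    {"name": "a*x+b*x", "rss": 12.0, "length": 7, "params": 2},  # dominated by a*x
    {"name": "a*x+b+c*sin(d*x)", "rss": 9.8, "length": 11, "params": 4},
]
n = 50  # number of training observations (assumed)

front = pareto_front(models)
selected = post_select(models, n, bic_score)
print([m["name"] for m in front])
print("selected:", selected["name"])
```

Keeping the search objectives (error and length) separate from the selection criterion means the criterion is evaluated only once, on the final front, rather than steering the population during evolution.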

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same post-selection approach could be tested in other evolutionary computation methods that produce interpretable models.
Performance gains might be larger on higher-dimensional or noisier real-world problems where overfitting is more severe.
Integrating these criteria with additional regularization strategies could further limit program bloat in genetic programming.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit.

Referee Report

2 major / 1 minor

Summary. The manuscript evaluates description length (DL) using a Fisher-information-based parameter encoding and fractional Bayes factor (FBF) as principled alternatives to AIC and BIC for model selection in genetic programming symbolic regression. It examines three integration strategies—multi-objective search followed by DL/FBF post-selection, multi-objective search with DL as an objective, and single-objective optimization using DL/FBF as fitness—across noisy synthetic benchmarks and real-world regression problems. The central empirical claim is that DL/FBF post-selection improves test-set performance relative to AIC/BIC baselines, while direct use of DL/FBF as fitness often causes premature convergence to overly simple models; BIC paired with the DL complexity penalty yields comparable results.

Significance. If the reported gains prove robust under statistical scrutiny and the DL encoding is shown to be reliable for GP trees, the work supplies a data-efficient, information-theoretic route to controlling bloat and overfitting in symbolic regression. The comparative analysis of search versus selection strategies supplies actionable guidance for practitioners and could encourage wider adoption of minimum-description-length principles within evolutionary computation.

major comments (2)

§3.2 (DL implementation): The Fisher-information determinant used to encode parameters for the description-length criterion assumes regularity conditions and a local quadratic approximation that may not hold for discrete, tree-structured expressions generated by mutation and crossover, particularly when additive noise is present or subtrees are redundant; this approximation is load-bearing for the claim that DL/FBF post-selection reliably improves generalization.
§5 (Experimental results): The manuscript states that DL/FBF post-selection improves test performance across datasets yet supplies neither the number of independent runs averaged, quantitative effect sizes, nor any statistical significance tests (e.g., Wilcoxon signed-rank or paired t-tests), leaving open the possibility that observed differences are attributable to run-to-run variability rather than a genuine advantage over AIC/BIC.
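The kind of paired test requested in the second comment can be sketched as below. This is a plain-Python Wilcoxon signed-rank test using a normal approximation without tie or continuity corrections, which is rough for small samples; in practice a library routine such as `scipy.stats.wilcoxon` would be the usual choice. The per-run RMSE values are hypothetical and are not results from the paper.

```python
import math

def wilcoxon_signed_rank(a, b):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped; tied absolute differences get average ranks.
    Returns (W+, z, p).
    """
    diffs = [x - y for x, y in zip(a, b) if x != y]
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        # Extend j over a run of tied absolute differences.
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2.0 + 1.0
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4.0
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24.0)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2.0))  # two-sided p-value
    return w_plus, z, p

# Hypothetical test RMSE per independent run for two selection criteria
rmse_dl = [0.41, 0.39, 0.44, 0.40, 0.42, 0.38, 0.43, 0.41, 0.40, 0.39]
rmse_bic = [0.45, 0.40, 0.49, 0.43, 0.48, 0.40, 0.51, 0.50, 0.47, 0.49]

w_plus, z, p = wilcoxon_signed_rank(rmse_dl, rmse_bic)
print(f"W+ = {w_plus}, z = {z:.3f}, p = {p:.4f}")
```

Pairing by run (same data split and seed for both criteria) is what makes the signed-rank test appropriate here; reporting the median or mean paired difference alongside the p-value gives the effect size the referee also asks for.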

minor comments (1)

[Abstract] The abstract would be strengthened by naming the specific datasets and reporting at least one numerical improvement (e.g., mean test RMSE reduction) so readers can immediately gauge practical impact.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments point by point below and outline the revisions we will make to strengthen the manuscript.

Referee: §3.2 (DL implementation): The Fisher-information determinant used to encode parameters for the description-length criterion assumes regularity conditions and a local quadratic approximation that may not hold for discrete, tree-structured expressions generated by mutation and crossover, particularly when additive noise is present or subtrees are redundant; this approximation is load-bearing for the claim that DL/FBF post-selection reliably improves generalization.

Authors: We agree that the Fisher-information-based encoding relies on regularity conditions and a local quadratic approximation that are not guaranteed to hold exactly for discrete GP trees produced by mutation and crossover, especially in the presence of additive noise or redundant subtrees. This is a substantive theoretical limitation. Nevertheless, the same encoding has been employed successfully in prior MDL-based model selection work for regression and symbolic models. Our experiments show consistent generalization gains from DL/FBF post-selection over AIC/BIC across noisy synthetic and real datasets, indicating practical robustness. In the revision we will expand §3.2 with an explicit discussion of these assumptions, their potential violations, and supporting references from the evolutionary computation literature on MDL approximations.

Revision: partial

Referee: §5 (Experimental results): The manuscript states that DL/FBF post-selection improves test performance across datasets yet supplies neither the number of independent runs averaged, quantitative effect sizes, nor any statistical significance tests (e.g., Wilcoxon signed-rank or paired t-tests), leaving open the possibility that observed differences are attributable to run-to-run variability rather than a genuine advantage over AIC/BIC.

Authors: We accept this criticism. The current manuscript omits these details. In the revised version we will update §5 to report that all results are averaged over 30 independent runs, include quantitative effect sizes (mean test RMSE differences and relative improvements), and add Wilcoxon signed-rank tests with p-values for the key pairwise comparisons between DL/FBF post-selection and the AIC/BIC baselines. These additions will directly address concerns about run-to-run variability.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical model selection on held-out data

The paper conducts an empirical study comparing DL/FBF post-selection and direct optimization against AIC/BIC baselines on noisy synthetic and real-world regression datasets. All reported improvements are measured via test-set performance after search, with no derivation, prediction, or uniqueness claim that reduces by construction to the authors' own equations or prior self-citations. The Fisher-information encoding is presented as a standard implementation choice rather than a result derived from the current experiments, and the central findings remain falsifiable against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The work rests on standard assumptions from information theory and genetic programming; no new free parameters or invented entities are introduced beyond the Fisher-information encoding whose validity is taken as given.

axioms (1)

Domain assumption: Fisher information provides a reliable local approximation to the description length of tree-structured symbolic expressions.
Invoked when implementing the DL criterion for parameter encoding in GPSR.

pith-pipeline@v0.9.0 · 5749 in / 1380 out tokens · 53110 ms · 2026-05-22T01:59:07.171807+00:00 · methodology

Original abstract

Symbolic regression with genetic programming (GPSR) may suffer from overfitting and structural bloat, especially when noise is present. In this paper we evaluate description length (DL) and fractional Bayes factor (FBF) criteria as principled, data-efficient alternatives to heuristics for selecting compact expressions that generalise well. We implement DL using a Fisher-information-based parameter encoding and compare it to AIC and BIC across multiple datasets, including noisy synthetic benchmarks and real-world regression problems. We study three search/selection strategies: (i) multi-objective search for accuracy and program length followed by DL/FBF selection; (ii) multi-objective search using DL directly as an objective; and (iii) single-objective optimisation with DL/FBF as the fitness. Across datasets we find that DL/FBF post-selection improves test performance compared to AIC/BIC baseline and that BIC in combination with the same function complexity penalty from DL/FBF produces similar results. In contrast, using DL/FBF directly as a fitness function in single-objective GPSR frequently induces premature convergence to overly simple models. We conclude with practical guidance for using DL/FBF as robust model-selection tools in genetic programming workflows.

Figures

Figures reproduced from arXiv: 2605.22374 by Deaglan J. Bartlett, Fabricio Olivetti de Franca, Gabriel Kronberger, Harry Desmond, Pedro G. Ferreira.

**Figure 2.** Boxplots of the selected program lengths for all criteria and all datasets for …

**Figure 3.** Boxplots of the DL of selected expressions in the final generation when us…

**Figure 4.** Predictions of the MDL models found with MO-Length on the test set. Dashed …

**Figure 5.** Test predictions of the top-25 models found with MO-Length and selected …

**Figure 6.** MO-Length+DL (top row) automatically adjusts to the noise level. No over…

**Figure 7.** MO-Length+DL (top row) automatically adjusts to the number of observa…

**Figure 8.** Program length difference of selected expressions compared to the expression …

**Figure 9.** Test errors of the models in the final Pareto front for the different model selec…

**Figure 10.** Test RMSE (black curve) and DL (orange curve) of expressions in the MO…

**Figure 11.** Boxplots of the differences of the DL of expressions selected by BIC …

**Figure 12.** Boxplots of the RMSE on test sets of the expressions found with MO-Length …

**Figure 13.** Boxplots of relative difference in test RMSE of MDL expressions found with …

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: DL = −log L + k log n + Σ c_i log c_i + ½ Σ max(0, log S_ii − log 3 + log |θ̂_rot_i|) (Eq. 7); Fisher SVD rotation for parameter codelength

IndisputableMonolith/Foundation/BranchSelection.lean · branch_selection · unclear
Paper passage: BIC_SR, FBF, DL select shorter expressions than plain BIC/AIC; MO-Length + post-selection beats single-objective DL fitness

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Evaluation of Population Initialization Methods for Genetic Programming-based Symbolic Regression
cs.NE 2026-06 unverdicted novelty 2.0

Comparative study of initialization methods in GP-based symbolic regression finds negligible effect on final Pareto fronts when initial population diversity is similar.

Reference graph

Works this paper leans on

65 extracted references · 65 canonical work pages · cited by 1 Pith paper · 1 internal anchor

[1]

Koza , isbn =

John R. Koza , isbn =. Genetic Programming: On the Programming of Computers by Means of Natural Selection , year =

work page
[2]

Contemporary Symbolic Regression Methods and their Relative Performance , volume =

La Cava, William and Orzechowski, Patryk and Burlacu, Bogdan and de Franca, Fabricio and Virgolin, Marco and Jin, Ying and Kommenda, Michael and Moore, Jason , booktitle =. Contemporary Symbolic Regression Methods and their Relative Performance , volume =

work page
[3]

and de Fran

Imai Aldeia, Guilherme Seidyo and Zhang, Hengzhe and Bomarito, Geoffrey and Cranmer, Miles and Fonseca, Alcides and Burlacu, Bogdan and La Cava, William G. and de Fran. Call for Action: towards the next generation of symbolic regression benchmark , year =. Proceedings of the Genetic and Evolutionary Computation Conference Companion , pages =. doi:10.1145/...

work page doi:10.1145/3712255.3734309
[4]

, title =

Grünwald, Peter D. , title =. 2007 , month =

work page 2007
[5]

Time for a Change: a Tutorial for Comparing Multiple Classifiers Through

Alessio Benavoli and Giorgio Corani and Janez Dem. Time for a Change: a Tutorial for Comparing Multiple Classifiers Through. Journal of Machine Learning Research , year =

work page
[6]

Improving Genetic Programming for Symbolic Regression with Equality Graphs , year =

de Fran. Improving Genetic Programming for Symbolic Regression with Equality Graphs , year =. doi:10.1145/3712256.3726383 , booktitle =

work page doi:10.1145/3712256.3726383
[7]

Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Interpretable machine learning for science with PySR and SymbolicRegression. jl , author=. arXiv preprint arXiv:2305.01582 , year=

work page internal anchor Pith review Pith/arXiv arXiv
[8]

Discovering physical laws with parallel symbolic enumeration

Ruan, Kai and Xu, Yilong and Gao, Ze-Feng and Liu, Yang and Guo, Yike and Wen, Ji-Rong and Sun, Hao. Discovering physical laws with parallel symbolic enumeration. Nature Computational Science. 2026. doi:10.1038/s43588-025-00904-8

work page doi:10.1038/s43588-025-00904-8 2026
[9]

and Affenzeller, Michael , year =

Kronberger, Gabriel and Burlacu, Bogdan and Kommenda, Michael and Winkler, Stephan M. and Affenzeller, Michael , year =. Symbolic Regression , ISBN =. doi:10.1201/9781315166407 , publisher =

work page doi:10.1201/9781315166407
[10]

Friedman , title =

Jerome H. Friedman , title =. The Annals of Statistics , number =. 1991 , doi =

work page 1991
[11]

Neural Computation , author =

Measuring the. Neural Computation , author =. 1994 , keywords =. doi:10.1162/neco.1994.6.5.851 , abstract =

work page doi:10.1162/neco.1994.6.5.851 1994
[12]

and Miranda, Manuel and Pallarès, Jordi and Sales-Pardo, Marta , year =

Guimerà, Roger and Reichardt, Ignasi and Aguilar-Mogas, Antoni and Massucci, Francesco A. and Miranda, Manuel and Pallarès, Jordi and Sales-Pardo, Marta , year =. A. Science Advances , publisher =. doi:10.1126/sciadv.aav6971 , number =

work page doi:10.1126/sciadv.aav6971
[13]

Predicting friction system performance with symbolic regression and genetic programming with factor variables , DOI =

Kronberger, Gabriel and Kommenda, Michael and Promberger, Andreas and Nickel, Falk , year =. Predicting friction system performance with symbolic regression and genetic programming with factor variables , DOI =. Proceedings of the Genetic and Evolutionary Computation Conference , publisher =

work page
[14]

and Smits, Guido F

Vladislavleva, Ekaterina J. and Smits, Guido F. and den Hertog, Dick , journal=. Order of Nonlinearity as a Complexity Measure for Models Generated by Symbolic Regression via. 2009 , volume=

work page 2009
[15]

Evolutionary Computation in the Chemical Industry

Kordon, Arthur. Evolutionary Computation in the Chemical Industry. Evolutionary Computation in Practice. 2008. doi:10.1007/978-3-540-75771-9_11

work page doi:10.1007/978-3-540-75771-9_11 2008
[16]

2012 , publisher=

Smoothing methods in statistics , author=. 2012 , publisher=

work page 2012
[17]

2003 , publisher=

Analyzing categorical data , author=. 2003 , publisher=

work page 2003
[18]

1993 , publisher=

Visualizing data , author=. 1993 , publisher=

work page 1993
[19]

Kilpatrick and M

D. Kilpatrick and M. Cameron. Numeric Prediction Using Instance-Based Learning with Encoding Length Selection , booktitle =. 1997 , timestamp =

work page 1997
[20]

Computational statistics & data analysis , volume=

Stochastic gradient boosting , author=. Computational statistics & data analysis , volume=. 2002 , publisher=

work page 2002
[21]

2017 , publisher=

Olson, Randal S and La Cava, William and Orzechowski, Patryk and Urbanowicz, Ryan J and Moore, Jason H , journal=. 2017 , publisher=

work page 2017
[22]

A Numerical Approach to Genetic Programming for System Identification , year=

Iba, Hitoshi and deGaris, Hugo and Sato, Taisuke , journal=. A Numerical Approach to Genetic Programming for System Identification , year=

work page
[23]

Proceedings of the Genetic and Evolutionary Computation Conference , publisher =

Ali Soltani and Gabriel Kronberger and Fabricio Olivetti de Franca and Mattia Billa and Alessandro Lucantonio , title=. Proceedings of the Genetic and Evolutionary Computation Conference , publisher =. 2026 , series =

work page 2026
[24]

Solomonoff

Solomonoff, R.J. , year =. A formal theory of inductive inference. Information and Control , publisher =. doi:10.1016/s0019-9958(64)90223-2 , number =

work page doi:10.1016/s0019-9958(64)90223-2
[25]

, year =

Solomonoff, R.J. , year =. A formal theory of inductive inference. Information and Control , publisher =. doi:10.1016/s0019-9958(64)90131-7 , number =

work page doi:10.1016/s0019-9958(64)90131-7
[26]

IEEE Transactions on Evolutionary Computation , volume=

Exhaustive symbolic regression , author=. IEEE Transactions on Evolutionary Computation , volume=. 2023 , publisher=

work page 2023
[27]

and Desmond, Harry and Ferreira, Pedro G

Bartlett, Deaglan and Desmond, Harry and Ferreira, Pedro , title =. Proceedings of the Companion Conference on Genetic and Evolutionary Computation , pages =. 2023 , isbn =. doi:10.1145/3583133.3596327 , abstract =

work page doi:10.1145/3583133.3596327 2023
[28]

, keywords =

Desmond, Harry and Bartlett, Deaglan J and Ferreira, Pedro G , year =. On the functional form of the radial acceleration relation , volume =. Monthly Notices of the Royal Astronomical Society , publisher =. doi:10.1093/mnras/stad597 , number =

work page doi:10.1093/mnras/stad597
[29]

2025 , eprint=

Bayesian Symbolic Regression via Posterior Sampling , author=. 2025 , eprint=

work page 2025
[30]

Comparative Analysis of Model Selection Criteria for Symbolic Regression Using Genetic Programming , ISBN =

Ramlan, Fitria Wulandari and Kronberger, Gabriel and O'Riordan, Colm and McDermott, James , year =. Comparative Analysis of Model Selection Criteria for Symbolic Regression Using Genetic Programming , ISBN =. doi:10.1007/978-3-032-15635-8_6 , booktitle =

work page doi:10.1007/978-3-032-15635-8_6
[31]

, urldate =

Akaike, H. , year =. A new look at the statistical model identification , volume =. IEEE Transactions on Automatic Control , publisher =. doi:10.1109/tac.1974.1100705 , number =

work page doi:10.1109/tac.1974.1100705 1974
[32]

Estimating the Dimension of a Model , volume =

Gideon Schwarz , journal =. Estimating the Dimension of a Model , volume =

work page
[33]

The Elements of Statistical Learning , year =

Trevor Hastie and Robert Tibshirani and Jerome Friedman , publisher =. The Elements of Statistical Learning , year =

work page
[34]

Journal of the Royal Statistical Society: Series B (Methodological)57(1), 289–300 (1995)

O'Hagan, Anthony , year =. Fractional Bayes Factors for Model Comparison , volume =. Journal of the Royal Statistical Society Series B: Statistical Methodology , publisher =. doi:10.1111/j.2517-6161.1995.tb02017.x , number =

work page doi:10.1111/j.2517-6161.1995.tb02017.x 1995
[35]

Rissanen , keywords =

Rissanen, J. , year =. Modeling by shortest data description , volume =. Automatica , publisher =. doi:10.1016/0005-1098(78)90005-5 , number =

work page doi:10.1016/0005-1098(78)90005-5
[36]

Effects of reducing redundant parameters in parameter optimization for symbolic regression using genetic programming , volume =

Kronberger, Gabriel and Olivetti de Fran. Effects of reducing redundant parameters in parameter optimization for symbolic regression using genetic programming , volume =. 2025 , month = jul, pages =. doi:10.1016/j.jsc.2024.102413 , journal =

work page doi:10.1016/j.jsc.2024.102413 2025
[37]

Information geometry for multiparameter models: new perspectives on the origin of simplicity , volume =

Quinn, Katherine N and Abbott, Michael C and Transtrum, Mark K and Machta, Benjamin B and Sethna, James P , year =. Information geometry for multiparameter models: new perspectives on the origin of simplicity , volume =. Reports on Progress in Physics , publisher =. doi:10.1088/1361-6633/aca6f8 , number =

work page doi:10.1088/1361-6633/aca6f8
[38]

Burlacu, Bogdan and Kronberger, Gabriel and Kommenda, Michael , booktitle=. Operon

work page
[39]

A fast and elitistmultiobjectivegeneticalgorithm:NSGA-II

Deb, K. and Pratap, A. and Agarwal, S. and Meyarivan, T. , year =. A fast and elitist multiobjective genetic algorithm:. IEEE Transactions on Evolutionary Computation , publisher =. doi:10.1109/4235.996017 , number =

work page doi:10.1109/4235.996017
[40]

2015 , publisher=

Doing Bayesian data analysis: A tutorial with R, JAGS, and Stan , author=. 2015 , publisher=

work page 2015
[41]

Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis , journal =

Alessio Benavoli and Giorgio Corani and Janez Dem. Time for a Change: a Tutorial for Comparing Multiple Classifiers Through Bayesian Analysis , journal =. 2017 , volume =

work page 2017
[42]

and Mangili, F

Benavoli, A. and Mangili, F. and Corani, G. and Zaffalon, M. and Ruggeri, F. , title =. Proceedings of the 31st International Conference on International Conference on Machine Learning - Volume 32 , pages =. 2014 , publisher =

work page 2014
[43]

Nikuradse, Johann , title =

work page
[44]

Proceedings of the Genetic and Evolutionary Computation Conference Companion , pages =

Haut, Nathan and Kotanchek, Mark , title =. Proceedings of the Genetic and Evolutionary Computation Conference Companion , pages =. 2025 , isbn =. doi:10.1145/3712255.3734327 , abstract =

work page doi:10.1145/3712255.3734327 2025
[45]

and Kotanchek, Mark

Smits, Guido F. and Kotanchek, Mark. Pareto-Front Exploitation in Symbolic Regression. Genetic Programming Theory and Practice II. 2005. doi:10.1007/0-387-23254-0_17

work page doi:10.1007/0-387-23254-0_17 2005
[46]

doi: 10.1162/EVCO_a_00025

Lehman, Joel and Stanley, Kenneth O. , title =. Evolutionary Computation , volume =. 2011 , month =. doi:10.1162/EVCO_a_00025 , eprint =

work page doi:10.1162/evco_a_00025 2011
[47]

de Franca, F. O. and Virgolin, M. and Kommenda, M. and Majumder, M. S. and Cranmer, M. and Espada, G. and Ingelse, L. and Fonseca, A. and Landajuela, M. and Petersen, B. and Glatt, R. and Mundhenk, N. and Lee, C. S. and Hochhalter, J. D. and Randall, D. L. and Kamienny, P. and Zhang, H. and Dick, G. and Simon, A. and Burlacu, B. and Kasak, Jaan and Machad...

work page doi:10.1109/tevc.2024.3423681 2024
[48]

Symbolic regression via

Zihan Yu and Jingtao Ding and Yong Li and Depeng Jin , booktitle=. Symbolic regression via. 2025 , url=

work page 2025
[49]

Structural Risk Minimization-Driven Genetic Programming for Enhancing Generalization in Symbolic Regression , volume =

Chen, Qi and Zhang, Mengjie and Xue, Bing , year =. Structural Risk Minimization-Driven Genetic Programming for Enhancing Generalization in Symbolic Regression , volume =. IEEE Transactions on Evolutionary Computation , publisher =. doi:10.1109/tevc.2018.2881392 , number =

work page doi:10.1109/tevc.2018.2881392 2018
[50]

Improving Generalisation of Genetic Programming for Symbolic Regression with Structural Risk Minimisation , DOI =

Chen, Qi and Xue, Bing and Shang, Lin and Zhang, Mengjie , year =. Improving Generalisation of Genetic Programming for Symbolic Regression with Structural Risk Minimisation , DOI =. Proceedings of the Genetic and Evolutionary Computation Conference 2016 , publisher =

work page 2016
[51]

and Alonso, C\'

Borges, Cruz E. and Alonso, C\'. Model selection in genetic programming , year =. Proceedings of the 12th Annual Conference on Genetic and Evolutionary Computation , pages =. doi:10.1145/1830483.1830662 , abstract =

work page doi:10.1145/1830483.1830662
[52]

, title =

Vapnik, Vladimir N. , title =. 1995 , isbn =

work page 1995
[53]

2015 , eprint=

Illuminating search spaces by mapping elites , author=. 2015 , eprint=

work page 2015
[54]

McAllester

McAllester, David A. , year =. Some. Machine Learning , publisher =. doi:10.1023/a:1007618624809 , number =

work page doi:10.1023/a:1007618624809
[55]

User-friendly Introduction to

Alquier, Pierre , year =. User-friendly Introduction to. Foundations and Trends in Machine Learning , publisher =. doi:10.1561/2200000100 , number =

work page doi:10.1561/2200000100
[56]

Germain, Pascal and Bach, Francis and Lacoste, Alexandre and Lacoste-Julien, Simon , booktitle =

work page
[57]

Quarterly of Applied Mathematics , year=

A METHOD FOR THE SOLUTION OF CERTAIN NON – LINEAR PROBLEMS IN LEAST SQUARES , author=. Quarterly of Applied Mathematics , year=

work page
[58]

, title =

Marquardt, Donald W. , title =. Journal of the Society for Industrial and Applied Mathematics , volume =. 1963 , doi =

work page 1963
[59]

Toward an artificial intelligence physicist for unsupervised learning , volume =

Wu, Tailin and Tegmark, Max , year =. Toward an artificial intelligence physicist for unsupervised learning , volume =. Physical Review E , publisher =. doi:10.1103/physreve.100.033311 , number =

work page doi:10.1103/physreve.100.033311
[60]

Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =

Udrescu, Silviu-Marian and Tan, Andrew and Feng, Jiahai and Neto, Orisvaldo and Wu, Tailin and Tegmark, Max , title =. Proceedings of the 34th International Conference on Neural Information Processing Systems , articleno =. 2020 , isbn =

work page 2020
[61]

and Kammerer, Lukas , year =

Kronberger, Gabriel and Olivetti de Franca, Fabricio and Desmond, Harry and Bartlett, Deaglan J. and Kammerer, Lukas , year =. The Inefficiency of Genetic Programming for Symbolic Regression , ISBN =. doi:10.1007/978-3-031-70055-2_17 , booktitle =

work page doi:10.1007/978-3-031-70055-2_17
[62]

Journal of Machine Learning Research , year =

Jacques Wainer , title =. Journal of Machine Learning Research , year =

work page
[63]

Salustowicz, Rafal and Schmidhuber, J. Probabilistic Incremental Program Evolution. Evolutionary Computation. June 1997. doi:10.1162/evco.1997.5.2.123
[64]

Reichardt, Ignasi and Pallarès, Jordi and Sales-Pardo, Marta and Guimerà, Roger. Bayesian Machine Scientist to Compare Data Collapses for the Nikuradse Dataset. Physical Review Letters. doi:10.1103/physrevlett.124.084503
[65]

Kommenda, Michael and Burlacu, Bogdan and Kronberger, Gabriel and Affenzeller, Michael. Parameter identification for symbolic regression using nonlinear least squares. Genetic Programming and Evolvable Machines 21(3), 471–501 (Dec 2019). doi:10.1007/s10710-019-09371-3
