Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees

\c{S}uayp Talha Kocabay; Kerem Yal\c{c}{\i}n; Talha R\"uzgar Akku\c{s}

arxiv: 2606.29331 · v1 · pith:2DAVE6ZAnew · submitted 2026-06-28 · 💻 cs.LG · stat.ML

Sample Complexity of Scientific Discovery: PAC Learnability of Compositional Function Trees

\c{S}uayp Talha Kocabay , Talha R\"uzgar Akku\c{s} , Kerem Yal\c{c}{\i}n This is my paper

Pith reviewed 2026-06-30 07:49 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords PAC learningRademacher complexitysymbolic regressioncompositional functionsLipschitz operatorssample complexityfunction treesgeneralization bounds

0 comments

The pith

Compositional function trees have generalization error controlled by depth and operator Lipschitz constants rather than by the total number of possible expressions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that the Rademacher complexity of depth-d compositional function trees built from a finite vocabulary of smooth operators is bounded by (K b sqrt(2) L)^{d-1} times the complexity of depth-1 trees. This yields high-probability excess risk bounds that scale as O(L^d / sqrt(n)) when vocabulary size and arity are constant. A sympathetic reader would care because the result shows that symbolic regression need not be statistically intractable despite the combinatorial size of the hypothesis space. The derivation relies on a finite union bound combined with recursive Maurer vector contraction, and the paper supports the theory with experiments on differentiable operator trees trained on synthetic targets.

Core claim

Under mild Lipschitz conditions on operators and bounded affine leaves, a finite-union bound over a vocabulary of size K together with Maurer-type vector contraction yields R_n(H_comp^d) ≤ (K b sqrt(2) L)^{d-1} R_n(H_comp^1), with corresponding high-probability risk bounds scaling as O(L^d / sqrt(n)) when K,b=O(1).

What carries the argument

Finite union bound over the vocabulary of size K combined with recursive Maurer vector contraction, which produces the depth-dependent multiplier (K b sqrt(2) L)^{d-1} on the base Rademacher complexity.

If this is right

High-probability excess risk bounds scale as O(L^d / sqrt(n)) for fixed K and b.
The combinatorial number of distinct symbolic structures does not appear in the leading term of the bound.
Empirical generalization gaps observed when training differentiable operator trees correlate positively with the term (L^d)/sqrt(n).
The bound applies directly to symbolic regression on targets of controlled depth built from the given operator vocabulary.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Preferring operators whose Lipschitz constants are close to one could keep the sample requirement modest even at larger depths.
The contraction technique might extend to other structured hypothesis classes that compose a finite set of smooth maps.
The theory supplies a concrete criterion for choosing operator vocabularies in symbolic regression systems.
Direct verification of the bound on real scientific data would test whether the Lipschitz and bounded-leaf assumptions hold in practice.

Load-bearing premise

The base operators satisfy uniform Lipschitz conditions with a fixed constant L and the affine leaves are bounded.

What would settle it

An experiment in which the measured excess risk for depth-d trees fails to decrease proportionally to L^d over sqrt(n) as sample size n grows, or in which the empirical Rademacher complexity exceeds the stated multiplicative bound by more than a small constant factor.

Figures

Figures reproduced from arXiv: 2606.29331 by \c{S}uayp Talha Kocabay, Kerem Yal\c{c}{\i}n, Talha R\"uzgar Akku\c{s}.

**Figure 1.** Figure 1: Empirical generalization gap versus the predicted complexity term (Lbd )/ √ n across 400 runs. horizontal uncertainty factor 1/ √ n by increasing n from 50 to 5000 yields comparable reductions in both median |gap| and median (Lbd )/ √ n. At d = 3 and especially d = 4, the same table shows that Lb remains order-one to ten while (Lbd )/ √ n spans orders of magnitude across depths because of exponentiation i… view at source ↗

**Figure 3.** Figure 3: Factorization diagnostics computed from the same logged runs (no retraining). (a) Heatmap of median log10 |gap| over depth and n. (b) Distribution of log10 |gap| aggregated across (n, seed) within each depth [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Descriptive summaries of the sweep beyond scatter plots ( [PITH_FULL_IMAGE:figures/full_fig_p008_4.png] view at source ↗

read the original abstract

Scientific discovery via symbolic regression is often viewed as statistically and computationally intractable because the hypothesis space of expressions grows combinatorially with depth. This paper revisits the statistical side through the lens of PAC learning, focusing on compositional function trees built from a finite vocabulary of smooth operators (e.g., $\{+,\times,\sin,\exp\}$ and affine maps). We prove that the relevant generalization quantity, Rademacher complexity, hence the excess risk, does not necessarily blow up exponentially with the number of distinct symbolic structures, but is controlled by (i) the depth $d$ and (ii) the Lipschitz constants of the base operators along the composed computation graph. Concretely, under mild Lipschitz conditions on operators and bounded affine leaves, a finite-union bound over a vocabulary of size $K=|\mathcal{H}_{\mathrm{base}}|$ together with Maurer-type vector contraction yields $\mathfrak{R}_n(\mathcal{H}_{\mathrm{comp}}^{d}) \leq (Kb\sqrt{2}L)^{d-1}\mathfrak{R}_n(\mathcal{H}_{\mathrm{comp}}^{1})$ with arity bound $b$; corresponding high-probability risk bounds scale as $\mathcal{O}(L^{d}/\sqrt{n})$ when $K,b=O(1)$ and $\mathfrak{R}_n(\mathcal{H}_{\mathrm{comp}}^{1})=O(n^{-1/2})$. We complement the theory with a modular codebase that trains differentiable operator trees (not MLPs) on synthetic "physics-like" targets of controlled depth and shows that the empirical generalization gap correlates positively with the predicted complexity term $(\widehat{L}^{d})/\sqrt{n}$.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The depth-dependent Rademacher bound is the main claim, but the uniform Lipschitz assumption does not hold for the listed operators without range controls on intermediates.

read the letter

The paper derives a Rademacher complexity bound for compositional function trees that scales with depth d and the Lipschitz constant L rather than exploding with the total number of possible expressions. Under the stated conditions it gives R_n(H_comp^d) ≤ (K b sqrt(2) L)^{d-1} R_n(H_comp^1) and the corresponding excess-risk scaling O(L^d / sqrt(n)).

What is actually new is the direct application of Maurer vector contraction plus a union bound over a finite vocabulary to the recursive tree structure. That step produces a depth-controlled bound instead of the usual combinatorial blow-up, and the authors supply a modular codebase that trains differentiable operator trees on synthetic targets of known depth. The reported positive correlation between the empirical gap and the term (L^d)/sqrt(n) is a concrete check on the prediction.

The central assumption is the weak point. The vocabulary includes exp and multiplication, neither of which is globally Lipschitz on the reals. The bound requires a single fixed L that works at every level, yet bounded affine leaves only constrain the depth-1 case. Nothing in the argument propagates explicit range bounds on intermediate activations, so the local Lipschitz constants can grow without limit as depth increases. If that happens the contraction step cannot be applied with a depth-independent L and the claimed recursion fails for the full hypothesis class. The abstract states the bound under “mild Lipschitz conditions” but does not show how those conditions are maintained through composition.

The empirical correlation is consistent with the theory but does not test the gap; no error bars or ablation on the range issue appear in the summary. The work is aimed at researchers who analyze sample complexity in symbolic regression and scientific machine learning. It is worth sending to a serious referee because the contraction idea is worth checking in detail and the codebase makes verification feasible, even though the Lipschitz gap needs to be closed or the claim narrowed.

Referee Report

1 major / 1 minor

Summary. The paper claims that under mild Lipschitz conditions on base operators and bounded affine leaves, the Rademacher complexity of compositional function trees of depth d satisfies R_n(H_comp^d) ≤ (K b sqrt(2) L)^{d-1} R_n(H_comp^1) via finite union bound and Maurer-type vector contraction, yielding high-probability excess risk bounds of order O(L^d / sqrt(n)) for constant K, b. The theory is supported by experiments on synthetic physics-like targets showing positive correlation between the empirical generalization gap and the predicted term (L^d)/sqrt(n).

Significance. If the bound holds with the stated assumptions, the result would offer a useful PAC-learning analysis for symbolic regression, demonstrating that generalization is controlled by depth and operator Lipschitz constants rather than the combinatorial size of the expression space. The modular codebase for training differentiable operator trees (distinct from MLPs) is a strength, enabling direct verification of the empirical claims on controlled-depth targets.

major comments (1)

[Abstract] Abstract: The derivation of R_n(H_comp^d) ≤ (K b sqrt(2) L)^{d-1} R_n(H_comp^1) applies Maurer vector contraction recursively under a uniform L-Lipschitz condition on all operators. However, the vocabulary includes exp (not globally Lipschitz) and × (Lipschitz constant depends on the magnitude of the other factor). Bounded affine leaves control only the depth-1 case; the abstract provides no mechanism to propagate explicit bounds on intermediate activations, so the contraction cannot be applied with depth-independent L for the full class H_comp^d.

minor comments (1)

The description of the empirical correlation between generalization gap and (L^d)/sqrt(n) should report a quantitative statistic (e.g., Pearson r or regression slope with confidence interval) rather than a qualitative statement.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for highlighting an important point about the Lipschitz assumptions. We address the major comment below and will make the corresponding revisions.

read point-by-point responses

Referee: [Abstract] Abstract: The derivation of R_n(H_comp^d) ≤ (K b sqrt(2) L)^{d-1} R_n(H_comp^1) applies Maurer vector contraction recursively under a uniform L-Lipschitz condition on all operators. However, the vocabulary includes exp (not globally Lipschitz) and × (Lipschitz constant depends on the magnitude of the other factor). Bounded affine leaves control only the depth-1 case; the abstract provides no mechanism to propagate explicit bounds on intermediate activations, so the contraction cannot be applied with depth-independent L for the full class H_comp^d.

Authors: We agree that the abstract (and the current presentation) does not explicitly describe how bounded affine leaves induce bounds on intermediate activations, which is needed to justify uniform Lipschitz constants for operators such as exp and ×. The manuscript states the result under 'mild Lipschitz conditions on operators and bounded affine leaves,' intending these conditions to hold on the compact ranges determined by the leaves (where exp is Lipschitz and multiplication has a bounded local Lipschitz constant). However, an explicit inductive range-propagation argument is indeed absent. We will revise by adding a short preliminary lemma in Section 3 that inductively bounds the range of every node in a depth-d tree given bounded leaves; this justifies applying a single effective L (or folding any mild depth dependence into the existing O(L^d) term). The main complexity bound and all subsequent statements will be updated for clarity. This addresses the concern without altering the core proof strategy. revision: yes

Circularity Check

0 steps flagged

No circularity: bound obtained by direct application of standard Rademacher and contraction inequalities to tree structure.

full rationale

The derivation applies a finite-union bound over vocabulary size K together with Maurer-type vector contraction to obtain the recursive inequality R_n(H_comp^d) ≤ (K b sqrt(2) L)^{d-1} R_n(H_comp^1) under the stated Lipschitz and bounded-leaf assumptions. This is a standard chaining argument on the compositional hypothesis class and does not reduce any quantity to a fitted parameter, self-definition, or load-bearing self-citation; the result is independent of its own outputs and remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that operators are Lipschitz with uniform constant L and that affine leaves are bounded; these are domain assumptions that enable the contraction to produce the stated depth scaling. No free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Base operators satisfy uniform Lipschitz conditions with constant L
Invoked to bound the growth of complexity along the computation graph with depth.
domain assumption Affine leaves are bounded
Required for the base case of the inductive bound on Rademacher complexity.

pith-pipeline@v0.9.1-grok · 5853 in / 1373 out tokens · 35417 ms · 2026-06-30T07:49:39.239421+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

69 extracted references · 27 canonical work pages · 6 internal anchors

[1]

A theory of the learnable , author=. Commun. ACM , year=
[2]

Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , author=. J. Mach. Learn. Res. , year=
[3]

Annals of Statistics , year=

Empirical margin distributions and bounding the generalization error of combined classifiers , author=. Annals of Statistics , year=
[4]

Journal of Functional Analysis , year=

The Sizes of Compact Subsets of Hilbert Space and Continuity of Gaussian Processes , author=. Journal of Functional Analysis , year=
[5]

Annals of Probability , year=

A new look at independence , author=. Annals of Probability , year=
[6]

2014 , howpublished =

Probability in High Dimension , author =. 2014 , howpublished =

2014
[7]

2018 , edition=

Foundations of Machine Learning , author=. 2018 , edition=

2018
[8]

Journal of Machine Learning Research , year=

Stability and Generalization , author=. Journal of Machine Learning Research , year=
[9]

2010 , eprint =

Robustness and Generalization , author =. 2010 , eprint =

2010
[10]

User-friendly Introduction to

Alquier, Pierre , year =. User-friendly Introduction to. doi:10.1561/2200000100 , note =. 2110.11216 , archivePrefix=

work page doi:10.1561/2200000100
[12]

Langford, John and Seeger, Matthias , year =
[13]

PAC-Bayesian Generalization Error Bounds for Gaussian Process Classification , volume =

Seeger, Matthias , year =. PAC-Bayesian Generalization Error Bounds for Gaussian Process Classification , volume =. Journal of Machine Learning Research , doi =
[14]

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

Dziugaite, Gintare Karolina and Roy, Daniel M. , title =. Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence,. 2017 , url =. 1703.11008 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv 2017
[15]

Proceedings of the 26th International Conference on Machine Learning , pages =

Germain, Pascal and Lacasse, Alexandre and Laviolette, Fran. Proceedings of the 26th International Conference on Machine Learning , pages =. 2009 , doi =

2009
[17]

2014 , isbn =

Shalev-Shwartz, Shai and Ben-David, Shai , title =. 2014 , isbn =

2014
[18]

1991 , isbn =

Ledoux, Michel and Talagrand, Michel , title =. 1991 , isbn =

1991
[19]

and Wellner, Jon A

van der Vaart, Aad W. and Wellner, Jon A. , title =. 1996 , isbn =

1996
[20]

2014 , edition =

Talagrand, Michel , title =. 2014 , edition =

2014
[21]

Theoretical Computer Science , volume =

Maurer, Andreas , title =. Theoretical Computer Science , volume =. 2016 , doi =

2016
[23]

2011 , eprint =

Structured Sparsity and Generalization , author =. 2011 , eprint =

2011
[24]

2017 , eprint =

Spectrally-normalized margin bounds for neural networks , author =. 2017 , eprint =

2017
[25]

2017 , eprint=

Exploring Generalization in Deep Learning , author=. 2017 , eprint=

2017
[26]

Vapnik, Vladimir Naumovich , biburl =
[27]

Science , year =

Distilling Free-Form Natural Laws from Experimental Data , author =. Science , year =
[28]

, title =

Koza, John R. , title =. 1992 , isbn =

1992
[30]

2021 , eprint =

Deep Symbolic Regression: Recovering Mathematical Expressions from Data via Risk-Seeking Policy Gradients , author =. 2021 , eprint =

2021
[32]

Proceedings of the National Academy of Sciences , volume =

Bongard, Josh and Lipson, Hod , title =. Proceedings of the National Academy of Sciences , volume =. 2007 , doi =

2007
[33]

and Proctor, Joshua L

Brunton, Steven L. and Proctor, Joshua L. and Kutz, J. Nathan , title =. Proceedings of the National Academy of Sciences , year =
[34]

Journal of Computational Physics , volume =

Raissi, Maziar and Perdikaris, Paris and Karniadakis, George Em , title =. Journal of Computational Physics , volume =. 2019 , doi =

2019
[37]

2020 , eprint=

Discovering Symbolic Models from Deep Learning with Inductive Biases , author=. 2020 , eprint=

2020
[38]

2021 , eprint =

Contemporary Symbolic Regression Methods and their Relative Performance , author =. 2021 , eprint =

2021
[39]

Automatica , volume =

Rissanen, Jorma , title =. Automatica , volume =. 1978 , doi =

1978
[41]

Neurocomputing , volume =

Lei, Yunwen and Ding, Lixin and Bi, Yingzhou , title =. Neurocomputing , volume =. 2016 , doi =

2016
[42]

Journal of Machine Learning Research , year =

Kanade, Varun and Rebeschini, Patrick and Vaskevicius, Tomas , title =. Journal of Machine Learning Research , year =
[43]

and Kutz, J

Fasel, U. and Kutz, J. N. and Brunton, B. W. and Brunton, S. L. , title =. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences , volume =. 2022 , doi =

2022
[44]

User-friendly introduction to pac-bayes bounds.arXiv preprint arXiv:2110.11216,

Alquier, P. User-friendly introduction to PAC - B ayes bounds, 2024. URL https://arxiv.org/abs/2110.11216. Foundations and Trends in Machine Learning

work page arXiv 2024
[45]

Bartlett, P. L. and Mendelson, S. Rademacher and gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res., 3: 0 463--482, 2003. URL https://api.semanticscholar.org/CorpusID:463216

2003
[46]

Spectrally-normalized margin bounds for neural networks

Bartlett, P. L., Foster, D. J., and Telgarsky, M. J. Spectrally-normalized margin bounds for neural networks, 2017. URL https://arxiv.org/abs/1706.08498

work page internal anchor Pith review Pith/arXiv arXiv 2017
[47]

Discovering governing equations from data by sparse identification of nonlinear dynamical systems

Brunton, S. L., Proctor, J. L., and Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113 0 (15): 0 3932--3937, 2016. doi:10.1073/pnas.1517384113

work page doi:10.1073/pnas.1517384113 2016
[48]

PAC - B ayesian Supervised Classification: The Thermodynamics of Statistical Learning , volume 56 of Lecture Notes---Monograph Series

Catoni, O. PAC - B ayesian Supervised Classification: The Thermodynamics of Statistical Learning , volume 56 of Lecture Notes---Monograph Series. Institute of Mathematical Statistics, Beachwood, Ohio, 2007. ISBN 978-0-940600-68-7. doi:10.1214/074921707000000391. URL https://doi.org/10.1214/074921707000000391

work page doi:10.1214/074921707000000391 2007
[49]

Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Cranmer, M. Interpretable machine learning for science with PySR and S ymbolic R egression.jl, 2023. URL https://arxiv.org/abs/2305.01582

work page internal anchor Pith review Pith/arXiv arXiv 2023
[50]

Discovering symbolic models from deep learning with inductive biases, 2020

Cranmer, M., Sanchez-Gonzalez, A., Battaglia, P., Xu, R., Cranmer, K., Spergel, D., and Ho, S. Discovering symbolic models from deep learning with inductive biases, 2020. URL https://arxiv.org/abs/2006.11287

work page arXiv 2020
[51]

Dudley, R. M. The sizes of compact subsets of hilbert space and continuity of gaussian processes. Journal of Functional Analysis, 1: 0 125--165, 1967. URL https://api.semanticscholar.org/CorpusID:122249056

1967
[52]

Exponential tail local R ademacher complexity risk bounds without the B ernstein condition

Kanade, V., Rebeschini, P., and Vaskevicius, T. Exponential tail local R ademacher complexity risk bounds without the B ernstein condition. Journal of Machine Learning Research, 25 0 (388): 0 1--43, 2024. URL http://jmlr.org/papers/v25/23-0063.html

2024
[53]

and Panchenko, D

Koltchinskii, V. and Panchenko, D. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30: 0 1--50, 2002. URL https://api.semanticscholar.org/CorpusID:2307733

2002
[54]

Koza, J. R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, 1992. ISBN 978-0-262-11170-6

1992
[55]

La Cava, W., Orzechowski, P., Burlacu, B., Olivetti de Fran c a, F., Virgolin, M., Jin, Y., Kommenda, M., and Moore, J. H. Contemporary symbolic regression methods and their relative performance, 2021. URL https://arxiv.org/abs/2107.14351. NeurIPS 2021 Datasets and Benchmarks Track; introduces the SRBench suite

work page arXiv 2021
[56]

and Seeger, M

Langford, J. and Seeger, M. Bounds for averaging classifiers. 02 2001

2001
[57]

Ledoux and M

Ledoux, M. and Talagrand, M. Probability in B anach Spaces: Isoperimetry and Processes . Springer-Verlag, Berlin, 1991. ISBN 978-3-540-52013-9. doi:10.1007/978-3-642-20212-4

work page doi:10.1007/978-3-642-20212-4 1991
[58]

Local R ademacher complexity bounds based on covering numbers

Lei, Y., Ding, L., and Bi, Y. Local R ademacher complexity bounds based on covering numbers. Neurocomputing, 218: 0 320--330, 2016. doi:10.1016/j.neucom.2016.08.074

work page doi:10.1016/j.neucom.2016.08.074 2016
[59]

and Chawla, S

Makke, N. and Chawla, S. Interpretable scientific discovery with symbolic regression: a review. Artificial Intelligence Review, 57 0 (1), 2024. doi:10.1007/s10462-023-10622-0. URL https://doi.org/10.1007/s10462-023-10622-0. Article number 2

work page doi:10.1007/s10462-023-10622-0 2024
[60]

A chain rule for the expected suprema of G aussian processes

Maurer, A. A chain rule for the expected suprema of G aussian processes. Theoretical Computer Science, 650: 0 109--122, 2016 a . doi:10.1016/j.tcs.2016.07.034

work page doi:10.1016/j.tcs.2016.07.034 2016
[61]

A vector-contraction inequality for Rademacher complexities

Maurer, A. A vector-contraction inequality for R ademacher complexities, 2016 b . URL https://arxiv.org/abs/1605.00251

work page internal anchor Pith review Pith/arXiv arXiv 2016
[62]

Structured Sparsity and Generalization

Maurer, A. and Pontil, M. Structured sparsity and generalization, 2011. URL https://arxiv.org/abs/1108.3476

work page internal anchor Pith review Pith/arXiv arXiv 2011
[63]

McAllester, D. A. Some PAC - B ayesian theorems. Machine Learning, 37 0 (3): 0 355--363, 1999. doi:10.1023/A:1007618624809. URL https://doi.org/10.1023/A:1007618624809

work page doi:10.1023/a:1007618624809 1999
[64]

Foundations of Machine Learning

Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of Machine Learning. MIT Press, Cambridge, MA, 2nd edition, 2018. ISBN 9780262039406

2018
[65]

Exploring Generalization in Deep Learning

Neyshabur, B., Bhojanapalli, S., McAllester, D., and Srebro, N. Exploring generalization in deep learning, 2017. URL https://arxiv.org/abs/1706.08947

work page internal anchor Pith review Pith/arXiv arXiv 2017
[66]

K., Landajuela, M., Mundhenk, T

Petersen, B. K., Landajuela, M., Mundhenk, T. N., Santiago, C. P., Kim, S. K., and Kim, J. T. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients, 2021. URL https://arxiv.org/abs/1912.04871. Published at ICLR 2021

work page arXiv 2021
[67]

Modeling by shortest data description

Rissanen, J. Modeling by shortest data description. Automatica, 14 0 (5): 0 465--471, 1978. doi:10.1016/0005-1098(78)90005-5

work page doi:10.1016/0005-1098(78)90005-5 1978
[68]

Schmidt, M. D. and Lipson, H. Distilling free-form natural laws from experimental data. Science, 324 0 (5923): 0 81--85, 2009 a . doi:10.1126/science.1165893

work page doi:10.1126/science.1165893 2009
[69]

Schmidt, M. D. and Lipson, H. Symbolic regression of implicit equations. In Genetic Programming Theory and Practice VII, Genetic and Evolutionary Computation, pp.\ 73--85. Springer, New York, 2009 b . doi:10.1007/978-1-4419-1626-6\_5. URL https://doi.org/10.1007/978-1-4419-1626-6\_5

work page doi:10.1007/978-1-4419-1626-6 2009
[70]

Pac-bayesian generalization error bounds for gaussian process classification

Seeger, M. Pac-bayesian generalization error bounds for gaussian process classification. Journal of Machine Learning Research, 3, 08 2002. doi:10.1162/153244303765208386

work page doi:10.1162/153244303765208386 2002
[71]

and Ben-David, S

Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. ISBN 978-1-107-05713-5. doi:10.1017/CBO9781107298019

work page doi:10.1017/cbo9781107298019 2014
[72]

Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems

Talagrand, M. Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. Springer, Heidelberg, 2 edition, 2014. ISBN 978-3-642-54074-5. doi:10.1007/978-3-642-54075-2

work page doi:10.1007/978-3-642-54075-2 2014
[73]

Truong, L. V. On R ademacher complexity-based generalization bounds for deep learning, 2022. URL https://arxiv.org/abs/2208.04284

work page arXiv 2022
[74]

and Tegmark, M

Udrescu, S.-M. and Tegmark, M. AI F eynman: A physics-inspired method for symbolic regression, 2020. URL https://arxiv.org/abs/1905.11481

work page arXiv 2020
[75]

Valiant, L. G. A theory of the learnable. Commun. ACM, 27: 0 1134--1142, 1984. URL https://api.semanticscholar.org/CorpusID:59712

1984
[76]

van der Vaart, A. W. and Wellner, J. A. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, 1996. ISBN 978-0-387-94640-5. doi:10.1007/978-1-4757-2545-2

work page doi:10.1007/978-1-4757-2545-2 1996
[77]

Vapnik, V. N. Statistical Learning Theory. Wiley, New York, NY, USA, September 1998

1998

[1] [1]

A theory of the learnable , author=. Commun. ACM , year=

[2] [2]

Rademacher and Gaussian Complexities: Risk Bounds and Structural Results , author=. J. Mach. Learn. Res. , year=

[3] [3]

Annals of Statistics , year=

Empirical margin distributions and bounding the generalization error of combined classifiers , author=. Annals of Statistics , year=

[4] [4]

Journal of Functional Analysis , year=

The Sizes of Compact Subsets of Hilbert Space and Continuity of Gaussian Processes , author=. Journal of Functional Analysis , year=

[5] [5]

Annals of Probability , year=

A new look at independence , author=. Annals of Probability , year=

[6] [6]

2014 , howpublished =

Probability in High Dimension , author =. 2014 , howpublished =

2014

[7] [7]

2018 , edition=

Foundations of Machine Learning , author=. 2018 , edition=

2018

[8] [8]

Journal of Machine Learning Research , year=

Stability and Generalization , author=. Journal of Machine Learning Research , year=

[9] [9]

2010 , eprint =

Robustness and Generalization , author =. 2010 , eprint =

2010

[10] [10]

User-friendly Introduction to

Alquier, Pierre , year =. User-friendly Introduction to. doi:10.1561/2200000100 , note =. 2110.11216 , archivePrefix=

work page doi:10.1561/2200000100

[11] [12]

Langford, John and Seeger, Matthias , year =

[12] [13]

PAC-Bayesian Generalization Error Bounds for Gaussian Process Classification , volume =

Seeger, Matthias , year =. PAC-Bayesian Generalization Error Bounds for Gaussian Process Classification , volume =. Journal of Machine Learning Research , doi =

[13] [14]

Computing Nonvacuous Generalization Bounds for Deep (Stochastic) Neural Networks with Many More Parameters than Training Data

Dziugaite, Gintare Karolina and Roy, Daniel M. , title =. Proceedings of the Thirty-Third Conference on Uncertainty in Artificial Intelligence,. 2017 , url =. 1703.11008 , archivePrefix =

work page internal anchor Pith review Pith/arXiv arXiv 2017

[14] [15]

Proceedings of the 26th International Conference on Machine Learning , pages =

Germain, Pascal and Lacasse, Alexandre and Laviolette, Fran. Proceedings of the 26th International Conference on Machine Learning , pages =. 2009 , doi =

2009

[15] [17]

2014 , isbn =

Shalev-Shwartz, Shai and Ben-David, Shai , title =. 2014 , isbn =

2014

[16] [18]

1991 , isbn =

Ledoux, Michel and Talagrand, Michel , title =. 1991 , isbn =

1991

[17] [19]

and Wellner, Jon A

van der Vaart, Aad W. and Wellner, Jon A. , title =. 1996 , isbn =

1996

[18] [20]

2014 , edition =

Talagrand, Michel , title =. 2014 , edition =

2014

[19] [21]

Theoretical Computer Science , volume =

Maurer, Andreas , title =. Theoretical Computer Science , volume =. 2016 , doi =

2016

[20] [23]

2011 , eprint =

Structured Sparsity and Generalization , author =. 2011 , eprint =

2011

[21] [24]

2017 , eprint =

Spectrally-normalized margin bounds for neural networks , author =. 2017 , eprint =

2017

[22] [25]

2017 , eprint=

Exploring Generalization in Deep Learning , author=. 2017 , eprint=

2017

[23] [26]

Vapnik, Vladimir Naumovich , biburl =

[24] [27]

Science , year =

Distilling Free-Form Natural Laws from Experimental Data , author =. Science , year =

[25] [28]

, title =

Koza, John R. , title =. 1992 , isbn =

1992

[26] [30]

2021 , eprint =

Deep Symbolic Regression: Recovering Mathematical Expressions from Data via Risk-Seeking Policy Gradients , author =. 2021 , eprint =

2021

[27] [32]

Proceedings of the National Academy of Sciences , volume =

Bongard, Josh and Lipson, Hod , title =. Proceedings of the National Academy of Sciences , volume =. 2007 , doi =

2007

[28] [33]

and Proctor, Joshua L

Brunton, Steven L. and Proctor, Joshua L. and Kutz, J. Nathan , title =. Proceedings of the National Academy of Sciences , year =

[29] [34]

Journal of Computational Physics , volume =

Raissi, Maziar and Perdikaris, Paris and Karniadakis, George Em , title =. Journal of Computational Physics , volume =. 2019 , doi =

2019

[30] [37]

2020 , eprint=

Discovering Symbolic Models from Deep Learning with Inductive Biases , author=. 2020 , eprint=

2020

[31] [38]

2021 , eprint =

Contemporary Symbolic Regression Methods and their Relative Performance , author =. 2021 , eprint =

2021

[32] [39]

Automatica , volume =

Rissanen, Jorma , title =. Automatica , volume =. 1978 , doi =

1978

[33] [41]

Neurocomputing , volume =

Lei, Yunwen and Ding, Lixin and Bi, Yingzhou , title =. Neurocomputing , volume =. 2016 , doi =

2016

[34] [42]

Journal of Machine Learning Research , year =

Kanade, Varun and Rebeschini, Patrick and Vaskevicius, Tomas , title =. Journal of Machine Learning Research , year =

[35] [43]

and Kutz, J

Fasel, U. and Kutz, J. N. and Brunton, B. W. and Brunton, S. L. , title =. Proceedings of the Royal Society A: Mathematical, Physical and Engineering Sciences , volume =. 2022 , doi =

2022

[36] [44]

User-friendly introduction to pac-bayes bounds.arXiv preprint arXiv:2110.11216,

Alquier, P. User-friendly introduction to PAC - B ayes bounds, 2024. URL https://arxiv.org/abs/2110.11216. Foundations and Trends in Machine Learning

work page arXiv 2024

[37] [45]

Bartlett, P. L. and Mendelson, S. Rademacher and gaussian complexities: Risk bounds and structural results. J. Mach. Learn. Res., 3: 0 463--482, 2003. URL https://api.semanticscholar.org/CorpusID:463216

2003

[38] [46]

Spectrally-normalized margin bounds for neural networks

Bartlett, P. L., Foster, D. J., and Telgarsky, M. J. Spectrally-normalized margin bounds for neural networks, 2017. URL https://arxiv.org/abs/1706.08498

work page internal anchor Pith review Pith/arXiv arXiv 2017

[39] [47]

Discovering governing equations from data by sparse identification of nonlinear dynamical systems

Brunton, S. L., Proctor, J. L., and Kutz, J. N. Discovering governing equations from data by sparse identification of nonlinear dynamical systems. Proceedings of the National Academy of Sciences, 113 0 (15): 0 3932--3937, 2016. doi:10.1073/pnas.1517384113

work page doi:10.1073/pnas.1517384113 2016

[40] [48]

PAC - B ayesian Supervised Classification: The Thermodynamics of Statistical Learning , volume 56 of Lecture Notes---Monograph Series

Catoni, O. PAC - B ayesian Supervised Classification: The Thermodynamics of Statistical Learning , volume 56 of Lecture Notes---Monograph Series. Institute of Mathematical Statistics, Beachwood, Ohio, 2007. ISBN 978-0-940600-68-7. doi:10.1214/074921707000000391. URL https://doi.org/10.1214/074921707000000391

work page doi:10.1214/074921707000000391 2007

[41] [49]

Interpretable Machine Learning for Science with PySR and SymbolicRegression.jl

Cranmer, M. Interpretable machine learning for science with PySR and S ymbolic R egression.jl, 2023. URL https://arxiv.org/abs/2305.01582

work page internal anchor Pith review Pith/arXiv arXiv 2023

[42] [50]

Discovering symbolic models from deep learning with inductive biases, 2020

Cranmer, M., Sanchez-Gonzalez, A., Battaglia, P., Xu, R., Cranmer, K., Spergel, D., and Ho, S. Discovering symbolic models from deep learning with inductive biases, 2020. URL https://arxiv.org/abs/2006.11287

work page arXiv 2020

[43] [51]

Dudley, R. M. The sizes of compact subsets of hilbert space and continuity of gaussian processes. Journal of Functional Analysis, 1: 0 125--165, 1967. URL https://api.semanticscholar.org/CorpusID:122249056

1967

[44] [52]

Exponential tail local R ademacher complexity risk bounds without the B ernstein condition

Kanade, V., Rebeschini, P., and Vaskevicius, T. Exponential tail local R ademacher complexity risk bounds without the B ernstein condition. Journal of Machine Learning Research, 25 0 (388): 0 1--43, 2024. URL http://jmlr.org/papers/v25/23-0063.html

2024

[45] [53]

and Panchenko, D

Koltchinskii, V. and Panchenko, D. Empirical margin distributions and bounding the generalization error of combined classifiers. Annals of Statistics, 30: 0 1--50, 2002. URL https://api.semanticscholar.org/CorpusID:2307733

2002

[46] [54]

Koza, J. R. Genetic Programming: On the Programming of Computers by Means of Natural Selection. MIT Press, Cambridge, MA, 1992. ISBN 978-0-262-11170-6

1992

[47] [55]

La Cava, W., Orzechowski, P., Burlacu, B., Olivetti de Fran c a, F., Virgolin, M., Jin, Y., Kommenda, M., and Moore, J. H. Contemporary symbolic regression methods and their relative performance, 2021. URL https://arxiv.org/abs/2107.14351. NeurIPS 2021 Datasets and Benchmarks Track; introduces the SRBench suite

work page arXiv 2021

[48] [56]

and Seeger, M

Langford, J. and Seeger, M. Bounds for averaging classifiers. 02 2001

2001

[49] [57]

Ledoux and M

Ledoux, M. and Talagrand, M. Probability in B anach Spaces: Isoperimetry and Processes . Springer-Verlag, Berlin, 1991. ISBN 978-3-540-52013-9. doi:10.1007/978-3-642-20212-4

work page doi:10.1007/978-3-642-20212-4 1991

[50] [58]

Local R ademacher complexity bounds based on covering numbers

Lei, Y., Ding, L., and Bi, Y. Local R ademacher complexity bounds based on covering numbers. Neurocomputing, 218: 0 320--330, 2016. doi:10.1016/j.neucom.2016.08.074

work page doi:10.1016/j.neucom.2016.08.074 2016

[51] [59]

and Chawla, S

Makke, N. and Chawla, S. Interpretable scientific discovery with symbolic regression: a review. Artificial Intelligence Review, 57 0 (1), 2024. doi:10.1007/s10462-023-10622-0. URL https://doi.org/10.1007/s10462-023-10622-0. Article number 2

work page doi:10.1007/s10462-023-10622-0 2024

[52] [60]

A chain rule for the expected suprema of G aussian processes

Maurer, A. A chain rule for the expected suprema of G aussian processes. Theoretical Computer Science, 650: 0 109--122, 2016 a . doi:10.1016/j.tcs.2016.07.034

work page doi:10.1016/j.tcs.2016.07.034 2016

[53] [61]

A vector-contraction inequality for Rademacher complexities

Maurer, A. A vector-contraction inequality for R ademacher complexities, 2016 b . URL https://arxiv.org/abs/1605.00251

work page internal anchor Pith review Pith/arXiv arXiv 2016

[54] [62]

Structured Sparsity and Generalization

Maurer, A. and Pontil, M. Structured sparsity and generalization, 2011. URL https://arxiv.org/abs/1108.3476

work page internal anchor Pith review Pith/arXiv arXiv 2011

[55] [63]

McAllester, D. A. Some PAC - B ayesian theorems. Machine Learning, 37 0 (3): 0 355--363, 1999. doi:10.1023/A:1007618624809. URL https://doi.org/10.1023/A:1007618624809

work page doi:10.1023/a:1007618624809 1999

[56] [64]

Foundations of Machine Learning

Mohri, M., Rostamizadeh, A., and Talwalkar, A. Foundations of Machine Learning. MIT Press, Cambridge, MA, 2nd edition, 2018. ISBN 9780262039406

2018

[57] [65]

Exploring Generalization in Deep Learning

Neyshabur, B., Bhojanapalli, S., McAllester, D., and Srebro, N. Exploring generalization in deep learning, 2017. URL https://arxiv.org/abs/1706.08947

work page internal anchor Pith review Pith/arXiv arXiv 2017

[58] [66]

K., Landajuela, M., Mundhenk, T

Petersen, B. K., Landajuela, M., Mundhenk, T. N., Santiago, C. P., Kim, S. K., and Kim, J. T. Deep symbolic regression: Recovering mathematical expressions from data via risk-seeking policy gradients, 2021. URL https://arxiv.org/abs/1912.04871. Published at ICLR 2021

work page arXiv 2021

[59] [67]

Modeling by shortest data description

Rissanen, J. Modeling by shortest data description. Automatica, 14 0 (5): 0 465--471, 1978. doi:10.1016/0005-1098(78)90005-5

work page doi:10.1016/0005-1098(78)90005-5 1978

[60] [68]

Schmidt, M. D. and Lipson, H. Distilling free-form natural laws from experimental data. Science, 324 0 (5923): 0 81--85, 2009 a . doi:10.1126/science.1165893

work page doi:10.1126/science.1165893 2009

[61] [69]

Schmidt, M. D. and Lipson, H. Symbolic regression of implicit equations. In Genetic Programming Theory and Practice VII, Genetic and Evolutionary Computation, pp.\ 73--85. Springer, New York, 2009 b . doi:10.1007/978-1-4419-1626-6\_5. URL https://doi.org/10.1007/978-1-4419-1626-6\_5

work page doi:10.1007/978-1-4419-1626-6 2009

[62] [70]

Pac-bayesian generalization error bounds for gaussian process classification

Seeger, M. Pac-bayesian generalization error bounds for gaussian process classification. Journal of Machine Learning Research, 3, 08 2002. doi:10.1162/153244303765208386

work page doi:10.1162/153244303765208386 2002

[63] [71]

and Ben-David, S

Shalev-Shwartz, S. and Ben-David, S. Understanding Machine Learning: From Theory to Algorithms. Cambridge University Press, 2014. ISBN 978-1-107-05713-5. doi:10.1017/CBO9781107298019

work page doi:10.1017/cbo9781107298019 2014

[64] [72]

Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems

Talagrand, M. Upper and Lower Bounds for Stochastic Processes: Modern Methods and Classical Problems. Springer, Heidelberg, 2 edition, 2014. ISBN 978-3-642-54074-5. doi:10.1007/978-3-642-54075-2

work page doi:10.1007/978-3-642-54075-2 2014

[65] [73]

Truong, L. V. On R ademacher complexity-based generalization bounds for deep learning, 2022. URL https://arxiv.org/abs/2208.04284

work page arXiv 2022

[66] [74]

and Tegmark, M

Udrescu, S.-M. and Tegmark, M. AI F eynman: A physics-inspired method for symbolic regression, 2020. URL https://arxiv.org/abs/1905.11481

work page arXiv 2020

[67] [75]

Valiant, L. G. A theory of the learnable. Commun. ACM, 27: 0 1134--1142, 1984. URL https://api.semanticscholar.org/CorpusID:59712

1984

[68] [76]

van der Vaart, A. W. and Wellner, J. A. Weak Convergence and Empirical Processes: With Applications to Statistics. Springer Series in Statistics. Springer, 1996. ISBN 978-0-387-94640-5. doi:10.1007/978-1-4757-2545-2

work page doi:10.1007/978-1-4757-2545-2 1996

[69] [77]

Vapnik, V. N. Statistical Learning Theory. Wiley, New York, NY, USA, September 1998

1998