pith. sign in

arxiv: 2606.17319 · v1 · pith:HCC4VJFGnew · submitted 2026-06-15 · 📊 stat.ML · cs.LG· math.CO· math.ST· stat.TH

Tight L_infty Sample Complexity for Low-Degree and Sparse Boolean Polynomials

Pith reviewed 2026-06-27 02:06 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.COmath.STstat.TH
keywords sample complexityBoolean hypercubeL_infty estimationlow-degree polynomialssparse polynomialsFourier-Walsh expansionsubgaussian noiseuniform convergence
0
0 comments X

The pith

Uniform L_infty estimation of degree-d Boolean polynomials requires Theta(n^{d+1}) samples under subgaussian noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper characterizes the minimax sample complexity for learning bounded Boolean polynomials with uniform L_infty error guarantees rather than L2-type averages, motivated by the need for surrogates whose optimizers remain close to the true optimum of a black-box objective. For polynomials of degree at most d the complexity scales as n^{d+1}; for s-sparse Fourier-Walsh polynomials it scales as n s^2. These rates exceed the noiseless exact-recovery rates of n^d and n s by extra factors of n and s. The lower bounds apply even to adaptive sampling, and the proofs use auxiliary norms as proxies because standard Fourier tools do not directly control uniform error.

Core claim

For polynomials of degree at most d on n variables the minimax sample complexity of uniform L_infty estimation under subgaussian noise is Theta(n^{d+1}). For s-sparse Fourier-Walsh polynomials with s at most n it is Theta(n s^2). These rates differ structurally from the noiseless setting where uniform exact recovery scales as n^d and n s respectively. The lower bounds hold for arbitrary adaptive learners, and the proofs rely on suitably chosen auxiliary norms that serve as proxies for the L_infty error.

What carries the argument

Suitably chosen auxiliary norms used as proxies for controlling the L_infty error, which enable uniform guarantees where standard Fourier-analysis tools for the L2 norm do not extend.

If this is right

  • The extra factor of n in sample complexity for low-degree polynomials is intrinsic even under adaptive sampling.
  • The extra factor of s appears for sparse polynomials and is likewise intrinsic.
  • Standard L2 Fourier tools fail to produce the needed uniform guarantees, requiring new proof techniques via auxiliary norms.
  • The structural gap between noisy and noiseless rates holds for both polynomial classes studied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • These bounds suggest that practitioners may need to prefer lower-degree models over sparser ones when n is large and noise is present, to keep total samples manageable.
  • The same auxiliary-norm technique could be tested on other discrete domains or on functions that are neither low-degree nor sparse.
  • Empirical verification on small synthetic instances with known polynomials would check whether the predicted scaling appears in finite-sample regimes.

Load-bearing premise

Uniform L_infty error guarantees are required to ensure that optimizing the learned surrogate yields good solutions for the underlying bounded binary objective.

What would settle it

A learner achieving uniform L_infty approximation of all degree-d polynomials with o(n^{d+1}) subgaussian-noise samples on n variables, or a matching upper-bound algorithm for the sparse case with o(n s^2) samples.

read the original abstract

Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 2 minor

Summary. The paper characterizes the minimax sample complexity of uniform L_∞ estimation for two classes of bounded polynomials on the Boolean hypercube under sub-Gaussian noise. For polynomials of degree at most d on n variables, the sample complexity scales as n^{d+1}. For s-sparse Fourier-Walsh polynomials with s ≤ n, it scales as n s^2. These rates are shown to be tight with matching upper and lower bounds; the lower bounds hold even for arbitrary adaptive learners. The rates differ structurally from the noiseless setting (n^d and n s, respectively). Proofs rely on auxiliary norms as proxies to control the L_∞ error, overcoming limitations of standard Fourier L_2 tools.

Significance. If the results hold, this work delivers a tight characterization of sample complexity for learning optimization-safe polynomial surrogates, which is relevant for black-box optimization of bounded functions. Credit is due for the matching bounds under noise, the information-theoretic arguments establishing adaptivity lower bounds, and the auxiliary-norm technique that yields explicit control for the claimed rates. These elements strengthen the contribution to statistical learning theory in noisy uniform-approximation settings.

minor comments (2)
  1. The abstract motivates the L_∞ requirement by noting that it ensures optimizing the surrogate yields good solutions, but a short explicit inequality or reduction in the introduction would make this link more concrete for readers.
  2. A comparison table of the noisy versus noiseless rates (n^{d+1} vs n^d, and n s^2 vs n s) placed early in the paper would improve readability and highlight the structural differences.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive assessment of the manuscript. The report recommends minor revision but lists no specific major comments or requested changes. Accordingly, we have no individual points to address point-by-point. We are happy to incorporate any minor editorial suggestions that may arise during the revision process.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents minimax sample-complexity characterizations for uniform L_infty estimation of low-degree and s-sparse Boolean polynomials under sub-Gaussian noise. The claimed rates (n^{d+1} and ns^2) are derived via information-theoretic lower bounds that hold for adaptive learners and matching upper bounds obtained through auxiliary-norm proxies; these steps rely on standard Fourier analysis and concentration inequalities rather than any fitted parameters, self-definitions, or load-bearing self-citations. The distinction from the noiseless setting is established directly by comparing the noisy and exact-recovery regimes within the same analytic framework. No equation or claim reduces to its own input by construction, and the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; standard subgaussian noise and bounded polynomial assumptions are invoked but not detailed.

axioms (2)
  • domain assumption Noise is subgaussian
    Invoked to obtain concentration for uniform error bounds under noise.
  • domain assumption Polynomials are bounded
    Required for the L_infty uniform estimation setting.

pith-pipeline@v0.9.1-grok · 5793 in / 1159 out tokens · 34986 ms · 2026-06-27T02:06:05.102988+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

38 extracted references · 23 canonical work pages

  1. [1]

    Efficiently learning Fourier sparse set functions

    Andisheh Amrollahi, Amir Zandieh, Michael Kapralov, and Andreas Krause. Efficiently learning Fourier sparse set functions. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, 11 and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_fil...

  2. [2]

    Learning polynomials with neural networks

    Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In Eric P. Xing and Tony Jebara, editors,Proceedings of the 31st International Conference on Machine Learning, volume 32(2) ofProceedings of Machine Learning Research, pages 1908–1916, Beijing, China, June 2014. PMLR. URLhttps://proceedings.mlr.press...

  3. [3]

    Learning sparse polynomial functions

    Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning sparse polynomial functions. InProceedings of the 2014 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 500–510, 2014.https://doi.org/10.1137/1.9781611973402.37

  4. [4]

    Bayesian optimization of combinatorial structures

    Ricardo Baptista and Matthias Poloczek. Bayesian optimization of combinatorial structures. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 462–471. PMLR, July 2018. URL https://proceedings.mlr.press/v80/baptista18a.html

  5. [5]

    Inequalities in Fourier analysis.Annals of Mathematics, 102(1):159–182, 1975

    William Beckner. Inequalities in Fourier analysis.Annals of Mathematics, 102(1):159–182, 1975. ISSN 0003486X, 19398980. URLhttp://www.jstor.org/stable/1970980

  6. [6]

    The Annals of Statistics , author =

    Peter J. Bickel, Yaacov Ritov, and Alexandre B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector.The Annals of Statistics, 37(4):1705–1732, 2009.https://doi.org/10.1214/08-AOS620

  7. [7]

    H. F. Bohnenblust and Einar Hille. On the absolute convergence of dirichlet series.The Annals of Mathematics, 32(3):600, July 1931. ISSN 0003-486X.https://doi.org/10.2307/1968255

  8. [8]

    étude des coefficients de Fourier des fonctions deLp(G).Annales de l’Institut Fourier, 20 (2):335–402, 1970.https://doi.org/10.5802/aif.357

    Aline Bonami. étude des coefficients de Fourier des fonctions deLp(G).Annales de l’Institut Fourier, 20 (2):335–402, 1970.https://doi.org/10.5802/aif.357

  9. [9]

    The Dantzig selector: Statistical estimation when p is much larger than n.The Annals of Statistics, 35(6):2313–2351, 2007

    Emmanuel Candes and Terence Tao. The Dantzig selector: Statistical estimation when p is much larger than n.The Annals of Statistics, 35(6):2313–2351, 2007. https://doi.org/10.1214/ 009053606000001523

  10. [10]

    On Bayes risk lower bounds.Journal of Machine Learning Research, 17(218):1–58, 2016

    Xi Chen, Adityanand Guntuboyina, and Yuchen Zhang. On Bayes risk lower bounds.Journal of Machine Learning Research, 17(218):1–58, 2016. URLhttp://jmlr.org/papers/v17/16-185.html

  11. [11]

    Hammer.Boolean Functions: Theory, Algorithms, and Applications

    Yves Crama and Peter L. Hammer.Boolean Functions: Theory, Algorithms, and Applications. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2011. https: //doi.org/10.1017/CBO9780511852008

  12. [12]

    Hoffman, Troy David Loeffler, and Subramanian Sankaranarayanan

    Hamid Dadkhahi, Karthikeyan Shanmugam, Jesus Rios, Payel Das, Samuel C. Hoffman, Troy David Loeffler, and Subramanian Sankaranarayanan. Combinatorial black-box optimization with expert advice. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’20, pages 1918–1927. ACM, August 2020.https://doi.org/10....

  13. [13]

    Fourier representations for black- box optimization over categorical variables.Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):10156–10165, June 2022

    Hamid Dadkhahi, Jesus Rios, Karthikeyan Shanmugam, and Payel Das. Fourier representations for black- box optimization over categorical variables.Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):10156–10165, June 2022. ISSN 2159-5399.https://doi.org/10.1609/aaai.v36i9.21255

  14. [14]

    On the Fourier spectrum of functions on boolean cubes.Mathematische Annalen, 374(1-2):653–680, September 2018

    Andreas Defant, Mieczysław Mastyło, and Antonio Pérez. On the Fourier spectrum of functions on boolean cubes.Mathematische Annalen, 374(1-2):653–680, September 2018. ISSN 1432-1807. https://doi.org/10.1007/s00208-018-1756-y

  15. [15]

    Walsh-based surrogate-assisted multi-objective combinatorial optimization: A fine-grained analysis for pseudo-boolean functions.Applied Soft Computing, 136:110061, March 2023

    Bilel Derbel, Geoffrey Pruvost, Arnaud Liefooghe, Sébastien Verel, and Qingfu Zhang. Walsh-based surrogate-assisted multi-objective combinatorial optimization: A fine-grained analysis for pseudo-boolean functions.Applied Soft Computing, 136:110061, March 2023. ISSN 1568-4946.https://doi.org/10. 1016/j.asoc.2023.110061. 12

  16. [16]

    High-dimensional Bayesian optimization with sparse axis- aligned subspaces

    David Eriksson and Martin Jankowiak. High-dimensional Bayesian optimization with sparse axis- aligned subspaces. In Cassio de Campos and Marloes H. Maathuis, editors,Proceedings of the Thirty- Seventh Conference on Uncertainty in Artificial Intelligence, volume 161 ofProceedings of Machine Learning Research, pages 493–503. PMLR, July 2021. URLhttps://proc...

  17. [17]

    Learning low-degree functions from a logarithmic number of random queries

    Alexandros Eskenazis and Paata Ivanisvili. Learning low-degree functions from a logarithmic number of random queries. InProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC ’22. ACM, June 2022.https://doi.org/10.1145/3519935.3519981

  18. [18]

    Low-degree learning and the metric entropy of polynomials.Discrete Analysis, November 2023.https://doi.org/10.19086/da.88507

    Alexandros Eskenazis, Paata Ivanisvili, and Lauritz Streck. Low-degree learning and the metric entropy of polynomials.Discrete Analysis, November 2023.https://doi.org/10.19086/da.88507

  19. [19]

    The restricted isometry property of subsampled fourier matrices

    Ishay Haviv and Oded Regev. The restricted isometry property of subsampled fourier matrices. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, pages 288–297, USA, 2016. Society for Industrial and Applied Mathematics. ISBN 9781611974331. https://doi.org/10.1007/978-3-319-45282-1_11

  20. [20]

    Communication network reliability under geograph- ically correlated failures using probabilistic seismic hazard analysis.IEEE Access, 11:31341–31354, 2023

    Dora Jiménez, Javiera Barrera, and Héctor Cancela. Communication network reliability under geograph- ically correlated failures using probabilistic seismic hazard analysis.IEEE Access, 11:31341–31354, 2023. ISSN 2169-3536.https://doi.org/10.1109/access.2023.3255794

  21. [21]

    2025, PASA, 42, e136, doi: 10.1017/pasa.2025.10101

    David Krieg, Erich Novak, and Mario Ullrich. On the power of adaption and randomization.Forum of Mathematics, Sigma, 13, 2025. ISSN 2050-5094.https://doi.org/10.1017/fms.2025.10101. URL http://dx.doi.org/10.1017/fms.2025.10101

  22. [22]

    Lattimore and C

    T. Lattimore and C. Szepesvári.Bandit Algorithms. Cambridge University Press, 2020. ISBN 9781108486828. URLhttps://books.google.cl/books?id=bydXzAEACAAJ

  23. [23]

    Springer New York, 2012

    Jon Lee and Sven Leyffer.Mixed Integer Nonlinear Programming. Springer New York, 2012. ISBN 9781461419273.https://doi.org/10.1007/978-1-4614-1927-3

  24. [24]

    Constant depth circuits, Fourier transform, and learnability.Journal of the ACM, 40(3):607–620, July 1993

    Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability.Journal of the ACM, 40(3):607–620, July 1993. ISSN 1557-735X.https://doi.org/10. 1145/174130.174138

  25. [25]

    Learning sparse Boolean polynomials

    Sahand Negahban and Devavrat Shah. Learning sparse Boolean polynomials. In2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 2032–2036. IEEE, October 2012.https://doi.org/10.1109/allerton.2012.6483472

  26. [26]

    Cambridge University Press, June 2014

    Ryan O’Donnell.Analysis of Boolean Functions. Cambridge University Press, June 2014. ISBN 9781107471542.https://doi.org/10.1017/cbo9781139814782

  27. [27]

    Combinatorial Bayesian optimiza- tion using the graph Cartesian product

    Changyong Oh, Jakub Tomczak, Efstratios Gavves, and Max Welling. Combinatorial Bayesian optimiza- tion using the graph Cartesian product. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Cur- ran Associates, Inc., 2019. URLhttps://proceedings.neurips...

  28. [28]

    Pijnappel, J.L

    T.R. Pijnappel, J.L. van den Berg, S.C. Borst, and R. Litjens. Online positioning of a drone-mounted base station in emergency scenarios.IEEE Transactions on Vehicular Technology, 73(4):5572–5586, April 2024. ISSN 1939-9359.https://doi.org/10.1109/tvt.2023.3329960

  29. [29]

    High-dimensional statistics, 2023

    Philippe Rigollet and Jan-Christian Hütter. High-dimensional statistics, 2023. URLhttps://doi.org/ 10.48550/arXiv.2310.19244. Lecture notes for MIT course

  30. [30]

    Adams, and Nando de Freitas

    Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, January

  31. [31]

    ISSN 1558-2256.https://doi.org/10.1109/jproc.2015.2494218. 13

  32. [32]

    Learning fourier sparse set functions

    Peter Stobbe and Andreas Krause. Learning fourier sparse set functions. InProceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 ofProceedings of Machine Learning Research, La Palma, Canary Islands, April 2012. PMLR. URLhttps://proceedings.mlr. press/v22/stobbe12.html

  33. [33]

    Regression shrinkage and selection via the Lasso.Journal of the Royal Sta- tistical Society: Series B (Methodological), 58(1):267–288, 1996

    Robert Tibshirani. Regression shrinkage and selection via the Lasso.Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.https://doi.org/10.1111/j.2517-6161. 1996.tb02080.x

  34. [34]

    Tsybakov.Introduction to Nonparametric Estimation

    Alexandre B. Tsybakov.Introduction to Nonparametric Estimation. Springer New York, 2009. ISBN 9780387790527.https://doi.org/10.1007/b13794. URLhttp://dx.doi.org/10.1007/b13794

  35. [35]

    High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics

    Roman Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018. https://doi.org/10.1017/9781108231596

  36. [36]

    , year =

    Martin J. Wainwright.High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, February 2019. ISBN 9781108498029.https://doi.org/10.1017/9781108627771. 14 A Tools for upper and lower-bounds A.1 Walsh polynomials Lemma A.1(Moments of signed Walsh sums).Let A ⊆ 2[n] sa...

  37. [37]

    Deterministic lower bound.We prove the lower bound by indistinguishability

    to lift this lower bound to randomized algorithms. Deterministic lower bound.We prove the lower bound by indistinguishability. Fix a determin- istic exact-query learner usingm < D queries, and run it on the zero function. This determines queried pointsx1, . . . , xm ∈ {−1, 1}n, all with observed value zero. Since these queries impose onlym homogeneous lin...

  38. [38]

    On that event, a fixed estimatorbf cannot be withinε of both posterior draws

    Thus, choosing C0 > 0sufficiently large, two independent posterior draws are separated by more than 2ε in uniform norm with conditional probability at least3/4. On that event, a fixed estimatorbf cannot be withinε of both posterior draws. Whenever∥fµG −f µG′∥L∞ > 2ε, at least one of the two inequalities ∥fµG − bf∥ L∞ > ε and ∥fµG′ − bf∥ L∞ > ε must hold, ...