Tight L_infty Sample Complexity for Low-Degree and Sparse Boolean Polynomials
Pith reviewed 2026-06-27 02:06 UTC · model grok-4.3
The pith
Uniform L_infty estimation of degree-d Boolean polynomials requires Theta(n^{d+1}) samples under subgaussian noise.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For polynomials of degree at most d on n variables the minimax sample complexity of uniform L_infty estimation under subgaussian noise is Theta(n^{d+1}). For s-sparse Fourier-Walsh polynomials with s at most n it is Theta(n s^2). These rates differ structurally from the noiseless setting where uniform exact recovery scales as n^d and n s respectively. The lower bounds hold for arbitrary adaptive learners, and the proofs rely on suitably chosen auxiliary norms that serve as proxies for the L_infty error.
What carries the argument
Suitably chosen auxiliary norms used as proxies for controlling the L_infty error, which enable uniform guarantees where standard Fourier-analysis tools for the L2 norm do not extend.
If this is right
- The extra factor of n in sample complexity for low-degree polynomials is intrinsic even under adaptive sampling.
- The extra factor of s appears for sparse polynomials and is likewise intrinsic.
- Standard L2 Fourier tools fail to produce the needed uniform guarantees, requiring new proof techniques via auxiliary norms.
- The structural gap between noisy and noiseless rates holds for both polynomial classes studied.
Where Pith is reading between the lines
- These bounds suggest that practitioners may need to prefer lower-degree models over sparser ones when n is large and noise is present, to keep total samples manageable.
- The same auxiliary-norm technique could be tested on other discrete domains or on functions that are neither low-degree nor sparse.
- Empirical verification on small synthetic instances with known polynomials would check whether the predicted scaling appears in finite-sample regimes.
Load-bearing premise
Uniform L_infty error guarantees are required to ensure that optimizing the learned surrogate yields good solutions for the underlying bounded binary objective.
What would settle it
A learner achieving uniform L_infty approximation of all degree-d polynomials with o(n^{d+1}) subgaussian-noise samples on n variables, or a matching upper-bound algorithm for the sparse case with o(n s^2) samples.
read the original abstract
Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper characterizes the minimax sample complexity of uniform L_∞ estimation for two classes of bounded polynomials on the Boolean hypercube under sub-Gaussian noise. For polynomials of degree at most d on n variables, the sample complexity scales as n^{d+1}. For s-sparse Fourier-Walsh polynomials with s ≤ n, it scales as n s^2. These rates are shown to be tight with matching upper and lower bounds; the lower bounds hold even for arbitrary adaptive learners. The rates differ structurally from the noiseless setting (n^d and n s, respectively). Proofs rely on auxiliary norms as proxies to control the L_∞ error, overcoming limitations of standard Fourier L_2 tools.
Significance. If the results hold, this work delivers a tight characterization of sample complexity for learning optimization-safe polynomial surrogates, which is relevant for black-box optimization of bounded functions. Credit is due for the matching bounds under noise, the information-theoretic arguments establishing adaptivity lower bounds, and the auxiliary-norm technique that yields explicit control for the claimed rates. These elements strengthen the contribution to statistical learning theory in noisy uniform-approximation settings.
minor comments (2)
- The abstract motivates the L_∞ requirement by noting that it ensures optimizing the surrogate yields good solutions, but a short explicit inequality or reduction in the introduction would make this link more concrete for readers.
- A comparison table of the noisy versus noiseless rates (n^{d+1} vs n^d, and n s^2 vs n s) placed early in the paper would improve readability and highlight the structural differences.
Simulated Author's Rebuttal
We thank the referee for their careful reading and positive assessment of the manuscript. The report recommends minor revision but lists no specific major comments or requested changes. Accordingly, we have no individual points to address point-by-point. We are happy to incorporate any minor editorial suggestions that may arise during the revision process.
Circularity Check
No significant circularity
full rationale
The paper presents minimax sample-complexity characterizations for uniform L_infty estimation of low-degree and s-sparse Boolean polynomials under sub-Gaussian noise. The claimed rates (n^{d+1} and ns^2) are derived via information-theoretic lower bounds that hold for adaptive learners and matching upper bounds obtained through auxiliary-norm proxies; these steps rely on standard Fourier analysis and concentration inequalities rather than any fitted parameters, self-definitions, or load-bearing self-citations. The distinction from the noiseless setting is established directly by comparing the noisy and exact-recovery regimes within the same analytic framework. No equation or claim reduces to its own input by construction, and the derivation remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Noise is subgaussian
- domain assumption Polynomials are bounded
Reference graph
Works this paper leans on
-
[1]
Efficiently learning Fourier sparse set functions
Andisheh Amrollahi, Amir Zandieh, Michael Kapralov, and Andreas Krause. Efficiently learning Fourier sparse set functions. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, 11 and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_fil...
2019
-
[2]
Learning polynomials with neural networks
Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In Eric P. Xing and Tony Jebara, editors,Proceedings of the 31st International Conference on Machine Learning, volume 32(2) ofProceedings of Machine Learning Research, pages 1908–1916, Beijing, China, June 2014. PMLR. URLhttps://proceedings.mlr.press...
1908
-
[3]
Learning sparse polynomial functions
Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning sparse polynomial functions. InProceedings of the 2014 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 500–510, 2014.https://doi.org/10.1137/1.9781611973402.37
-
[4]
Bayesian optimization of combinatorial structures
Ricardo Baptista and Matthias Poloczek. Bayesian optimization of combinatorial structures. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 462–471. PMLR, July 2018. URL https://proceedings.mlr.press/v80/baptista18a.html
2018
-
[5]
Inequalities in Fourier analysis.Annals of Mathematics, 102(1):159–182, 1975
William Beckner. Inequalities in Fourier analysis.Annals of Mathematics, 102(1):159–182, 1975. ISSN 0003486X, 19398980. URLhttp://www.jstor.org/stable/1970980
arXiv 1975
-
[6]
The Annals of Statistics , author =
Peter J. Bickel, Yaacov Ritov, and Alexandre B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector.The Annals of Statistics, 37(4):1705–1732, 2009.https://doi.org/10.1214/08-AOS620
-
[7]
H. F. Bohnenblust and Einar Hille. On the absolute convergence of dirichlet series.The Annals of Mathematics, 32(3):600, July 1931. ISSN 0003-486X.https://doi.org/10.2307/1968255
-
[8]
Aline Bonami. étude des coefficients de Fourier des fonctions deLp(G).Annales de l’Institut Fourier, 20 (2):335–402, 1970.https://doi.org/10.5802/aif.357
-
[9]
The Dantzig selector: Statistical estimation when p is much larger than n.The Annals of Statistics, 35(6):2313–2351, 2007
Emmanuel Candes and Terence Tao. The Dantzig selector: Statistical estimation when p is much larger than n.The Annals of Statistics, 35(6):2313–2351, 2007. https://doi.org/10.1214/ 009053606000001523
2007
-
[10]
On Bayes risk lower bounds.Journal of Machine Learning Research, 17(218):1–58, 2016
Xi Chen, Adityanand Guntuboyina, and Yuchen Zhang. On Bayes risk lower bounds.Journal of Machine Learning Research, 17(218):1–58, 2016. URLhttp://jmlr.org/papers/v17/16-185.html
2016
-
[11]
Hammer.Boolean Functions: Theory, Algorithms, and Applications
Yves Crama and Peter L. Hammer.Boolean Functions: Theory, Algorithms, and Applications. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2011. https: //doi.org/10.1017/CBO9780511852008
-
[12]
Hoffman, Troy David Loeffler, and Subramanian Sankaranarayanan
Hamid Dadkhahi, Karthikeyan Shanmugam, Jesus Rios, Payel Das, Samuel C. Hoffman, Troy David Loeffler, and Subramanian Sankaranarayanan. Combinatorial black-box optimization with expert advice. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’20, pages 1918–1927. ACM, August 2020.https://doi.org/10....
-
[13]
Hamid Dadkhahi, Jesus Rios, Karthikeyan Shanmugam, and Payel Das. Fourier representations for black- box optimization over categorical variables.Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):10156–10165, June 2022. ISSN 2159-5399.https://doi.org/10.1609/aaai.v36i9.21255
-
[14]
Andreas Defant, Mieczysław Mastyło, and Antonio Pérez. On the Fourier spectrum of functions on boolean cubes.Mathematische Annalen, 374(1-2):653–680, September 2018. ISSN 1432-1807. https://doi.org/10.1007/s00208-018-1756-y
-
[15]
Bilel Derbel, Geoffrey Pruvost, Arnaud Liefooghe, Sébastien Verel, and Qingfu Zhang. Walsh-based surrogate-assisted multi-objective combinatorial optimization: A fine-grained analysis for pseudo-boolean functions.Applied Soft Computing, 136:110061, March 2023. ISSN 1568-4946.https://doi.org/10. 1016/j.asoc.2023.110061. 12
arXiv 2023
-
[16]
High-dimensional Bayesian optimization with sparse axis- aligned subspaces
David Eriksson and Martin Jankowiak. High-dimensional Bayesian optimization with sparse axis- aligned subspaces. In Cassio de Campos and Marloes H. Maathuis, editors,Proceedings of the Thirty- Seventh Conference on Uncertainty in Artificial Intelligence, volume 161 ofProceedings of Machine Learning Research, pages 493–503. PMLR, July 2021. URLhttps://proc...
2021
-
[17]
Learning low-degree functions from a logarithmic number of random queries
Alexandros Eskenazis and Paata Ivanisvili. Learning low-degree functions from a logarithmic number of random queries. InProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC ’22. ACM, June 2022.https://doi.org/10.1145/3519935.3519981
-
[18]
Alexandros Eskenazis, Paata Ivanisvili, and Lauritz Streck. Low-degree learning and the metric entropy of polynomials.Discrete Analysis, November 2023.https://doi.org/10.19086/da.88507
-
[19]
The restricted isometry property of subsampled fourier matrices
Ishay Haviv and Oded Regev. The restricted isometry property of subsampled fourier matrices. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, pages 288–297, USA, 2016. Society for Industrial and Applied Mathematics. ISBN 9781611974331. https://doi.org/10.1007/978-3-319-45282-1_11
-
[20]
Dora Jiménez, Javiera Barrera, and Héctor Cancela. Communication network reliability under geograph- ically correlated failures using probabilistic seismic hazard analysis.IEEE Access, 11:31341–31354, 2023. ISSN 2169-3536.https://doi.org/10.1109/access.2023.3255794
-
[21]
2025, PASA, 42, e136, doi: 10.1017/pasa.2025.10101
David Krieg, Erich Novak, and Mario Ullrich. On the power of adaption and randomization.Forum of Mathematics, Sigma, 13, 2025. ISSN 2050-5094.https://doi.org/10.1017/fms.2025.10101. URL http://dx.doi.org/10.1017/fms.2025.10101
-
[22]
Lattimore and C
T. Lattimore and C. Szepesvári.Bandit Algorithms. Cambridge University Press, 2020. ISBN 9781108486828. URLhttps://books.google.cl/books?id=bydXzAEACAAJ
2020
-
[23]
Jon Lee and Sven Leyffer.Mixed Integer Nonlinear Programming. Springer New York, 2012. ISBN 9781461419273.https://doi.org/10.1007/978-1-4614-1927-3
-
[24]
Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability.Journal of the ACM, 40(3):607–620, July 1993. ISSN 1557-735X.https://doi.org/10. 1145/174130.174138
arXiv 1993
-
[25]
Learning sparse Boolean polynomials
Sahand Negahban and Devavrat Shah. Learning sparse Boolean polynomials. In2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 2032–2036. IEEE, October 2012.https://doi.org/10.1109/allerton.2012.6483472
-
[26]
Cambridge University Press, June 2014
Ryan O’Donnell.Analysis of Boolean Functions. Cambridge University Press, June 2014. ISBN 9781107471542.https://doi.org/10.1017/cbo9781139814782
-
[27]
Combinatorial Bayesian optimiza- tion using the graph Cartesian product
Changyong Oh, Jakub Tomczak, Efstratios Gavves, and Max Welling. Combinatorial Bayesian optimiza- tion using the graph Cartesian product. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Cur- ran Associates, Inc., 2019. URLhttps://proceedings.neurips...
2019
-
[28]
T.R. Pijnappel, J.L. van den Berg, S.C. Borst, and R. Litjens. Online positioning of a drone-mounted base station in emergency scenarios.IEEE Transactions on Vehicular Technology, 73(4):5572–5586, April 2024. ISSN 1939-9359.https://doi.org/10.1109/tvt.2023.3329960
-
[29]
High-dimensional statistics, 2023
Philippe Rigollet and Jan-Christian Hütter. High-dimensional statistics, 2023. URLhttps://doi.org/ 10.48550/arXiv.2310.19244. Lecture notes for MIT course
-
[30]
Adams, and Nando de Freitas
Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, January
-
[31]
ISSN 1558-2256.https://doi.org/10.1109/jproc.2015.2494218. 13
-
[32]
Learning fourier sparse set functions
Peter Stobbe and Andreas Krause. Learning fourier sparse set functions. InProceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 ofProceedings of Machine Learning Research, La Palma, Canary Islands, April 2012. PMLR. URLhttps://proceedings.mlr. press/v22/stobbe12.html
2012
-
[33]
Robert Tibshirani. Regression shrinkage and selection via the Lasso.Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.https://doi.org/10.1111/j.2517-6161. 1996.tb02080.x
-
[34]
Tsybakov.Introduction to Nonparametric Estimation
Alexandre B. Tsybakov.Introduction to Nonparametric Estimation. Springer New York, 2009. ISBN 9780387790527.https://doi.org/10.1007/b13794. URLhttp://dx.doi.org/10.1007/b13794
-
[35]
Roman Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018. https://doi.org/10.1017/9781108231596
-
[36]
Martin J. Wainwright.High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, February 2019. ISBN 9781108498029.https://doi.org/10.1017/9781108627771. 14 A Tools for upper and lower-bounds A.1 Walsh polynomials Lemma A.1(Moments of signed Walsh sums).Let A ⊆ 2[n] sa...
-
[37]
Deterministic lower bound.We prove the lower bound by indistinguishability
to lift this lower bound to randomized algorithms. Deterministic lower bound.We prove the lower bound by indistinguishability. Fix a determin- istic exact-query learner usingm < D queries, and run it on the zero function. This determines queried pointsx1, . . . , xm ∈ {−1, 1}n, all with observed value zero. Since these queries impose onlym homogeneous lin...
-
[38]
On that event, a fixed estimatorbf cannot be withinε of both posterior draws
Thus, choosing C0 > 0sufficiently large, two independent posterior draws are separated by more than 2ε in uniform norm with conditional probability at least3/4. On that event, a fixed estimatorbf cannot be withinε of both posterior draws. Whenever∥fµG −f µG′∥L∞ > 2ε, at least one of the two inequalities ∥fµG − bf∥ L∞ > ε and ∥fµG′ − bf∥ L∞ > ε must hold, ...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.