Tight $L_\infty$ Sample Complexity for Low-Degree and Sparse Boolean Polynomials

Jasper van Doornmalen; Jos\'e Verschae; Mathieu Molina; Victor Verdugo

arxiv: 2606.17319 · v1 · pith:HCC4VJFGnew · submitted 2026-06-15 · 📊 stat.ML · cs.LG· math.CO· math.ST· stat.TH

Tight L_infty Sample Complexity for Low-Degree and Sparse Boolean Polynomials

Jasper van Doornmalen , Mathieu Molina , Victor Verdugo , Jos\'e Verschae This is my paper

Pith reviewed 2026-06-27 02:06 UTC · model grok-4.3

classification 📊 stat.ML cs.LGmath.COmath.STstat.TH

keywords sample complexityBoolean hypercubeL_infty estimationlow-degree polynomialssparse polynomialsFourier-Walsh expansionsubgaussian noiseuniform convergence

0 comments

The pith

Uniform L_infty estimation of degree-d Boolean polynomials requires Theta(n^{d+1}) samples under subgaussian noise.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper characterizes the minimax sample complexity for learning bounded Boolean polynomials with uniform L_infty error guarantees rather than L2-type averages, motivated by the need for surrogates whose optimizers remain close to the true optimum of a black-box objective. For polynomials of degree at most d the complexity scales as n^{d+1}; for s-sparse Fourier-Walsh polynomials it scales as n s^2. These rates exceed the noiseless exact-recovery rates of n^d and n s by extra factors of n and s. The lower bounds apply even to adaptive sampling, and the proofs use auxiliary norms as proxies because standard Fourier tools do not directly control uniform error.

Core claim

For polynomials of degree at most d on n variables the minimax sample complexity of uniform L_infty estimation under subgaussian noise is Theta(n^{d+1}). For s-sparse Fourier-Walsh polynomials with s at most n it is Theta(n s^2). These rates differ structurally from the noiseless setting where uniform exact recovery scales as n^d and n s respectively. The lower bounds hold for arbitrary adaptive learners, and the proofs rely on suitably chosen auxiliary norms that serve as proxies for the L_infty error.

What carries the argument

Suitably chosen auxiliary norms used as proxies for controlling the L_infty error, which enable uniform guarantees where standard Fourier-analysis tools for the L2 norm do not extend.

If this is right

The extra factor of n in sample complexity for low-degree polynomials is intrinsic even under adaptive sampling.
The extra factor of s appears for sparse polynomials and is likewise intrinsic.
Standard L2 Fourier tools fail to produce the needed uniform guarantees, requiring new proof techniques via auxiliary norms.
The structural gap between noisy and noiseless rates holds for both polynomial classes studied.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

These bounds suggest that practitioners may need to prefer lower-degree models over sparser ones when n is large and noise is present, to keep total samples manageable.
The same auxiliary-norm technique could be tested on other discrete domains or on functions that are neither low-degree nor sparse.
Empirical verification on small synthetic instances with known polynomials would check whether the predicted scaling appears in finite-sample regimes.

Load-bearing premise

Uniform L_infty error guarantees are required to ensure that optimizing the learned surrogate yields good solutions for the underlying bounded binary objective.

What would settle it

A learner achieving uniform L_infty approximation of all degree-d polynomials with o(n^{d+1}) subgaussian-noise samples on n variables, or a matching upper-bound algorithm for the sparse case with o(n s^2) samples.

read the original abstract

Motivated by the optimization of bounded binary black-box functions, we study the problem of learning polynomial surrogates over the Boolean hypercube. To ensure that optimizing the surrogate yields good solutions for the underlying objective, we require uniform $L_\infty$-error guarantees rather than the usual $L_2$-type guarantees. We characterize the minimax sample complexity of uniform estimation under subgaussian noise for two classes of bounded polynomials. First, for polynomials of degree at most $d$ on $n$ variables, the sample complexity scales as $n^{d+1}$. Second, for $s$-sparse Fourier-Walsh polynomials with $s \leq n$, it scales as $ns^2$. These rates differ structurally from the noiseless setting, where uniform exact recovery scales as $n^d$ and $ns$, respectively. Our lower bounds hold even for arbitrary adaptive learners, showing that the additional factors are intrinsic to the noisy cases. Standard Fourier-analysis tools for the $L_2$-norm do not naturally extend to the $L_\infty$-setting in a way that yields uniform guarantees. Our proofs overcome this difficulty by relying on suitably chosen auxiliary norms that serve as proxies for controlling the $L_\infty$-error. Together, our results provide a tight characterization of the sample complexity of learning optimization-safe polynomial surrogates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives tight matching bounds showing noisy L_infty estimation of degree-d and s-sparse Boolean polynomials requires n^{d+1} and ns^2 samples respectively, an extra factor over noiseless recovery.

read the letter

This paper pins down the sample complexity for uniform L_infty approximation of low-degree and sparse polynomials on the hypercube under sub-Gaussian noise. The rates are n^{d+1} for degree-d and ns^2 for s-sparse, and they match from both sides even against adaptive learners.

The work does well by separating the noisy case from the noiseless one with explicit information-theoretic arguments. The auxiliary-norm technique supplies the uniform control that standard Fourier L2 tools do not deliver directly. The distinction in rates is structural and cleanly derived.

No major derivation gaps show up in the central claims. The motivation that L_infty error is needed for optimization safety is a modeling choice, but the sample-complexity question stands on its own.

This is aimed at people working on surrogate optimization over discrete spaces or on minimax rates for uniform function estimation. A reader who needs the precise noisy rates will get direct use from it.

It deserves peer review. The bounds are sharp and the techniques look reusable.

Referee Report

0 major / 2 minor

Summary. The paper characterizes the minimax sample complexity of uniform L_∞ estimation for two classes of bounded polynomials on the Boolean hypercube under sub-Gaussian noise. For polynomials of degree at most d on n variables, the sample complexity scales as n^{d+1}. For s-sparse Fourier-Walsh polynomials with s ≤ n, it scales as n s^2. These rates are shown to be tight with matching upper and lower bounds; the lower bounds hold even for arbitrary adaptive learners. The rates differ structurally from the noiseless setting (n^d and n s, respectively). Proofs rely on auxiliary norms as proxies to control the L_∞ error, overcoming limitations of standard Fourier L_2 tools.

Significance. If the results hold, this work delivers a tight characterization of sample complexity for learning optimization-safe polynomial surrogates, which is relevant for black-box optimization of bounded functions. Credit is due for the matching bounds under noise, the information-theoretic arguments establishing adaptivity lower bounds, and the auxiliary-norm technique that yields explicit control for the claimed rates. These elements strengthen the contribution to statistical learning theory in noisy uniform-approximation settings.

minor comments (2)

The abstract motivates the L_∞ requirement by noting that it ensures optimizing the surrogate yields good solutions, but a short explicit inequality or reduction in the introduction would make this link more concrete for readers.
A comparison table of the noisy versus noiseless rates (n^{d+1} vs n^d, and n s^2 vs n s) placed early in the paper would improve readability and highlight the structural differences.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their careful reading and positive assessment of the manuscript. The report recommends minor revision but lists no specific major comments or requested changes. Accordingly, we have no individual points to address point-by-point. We are happy to incorporate any minor editorial suggestions that may arise during the revision process.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper presents minimax sample-complexity characterizations for uniform L_infty estimation of low-degree and s-sparse Boolean polynomials under sub-Gaussian noise. The claimed rates (n^{d+1} and ns^2) are derived via information-theoretic lower bounds that hold for adaptive learners and matching upper bounds obtained through auxiliary-norm proxies; these steps rely on standard Fourier analysis and concentration inequalities rather than any fitted parameters, self-definitions, or load-bearing self-citations. The distinction from the noiseless setting is established directly by comparing the noisy and exact-recovery regimes within the same analytic framework. No equation or claim reduces to its own input by construction, and the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

Based on abstract only; standard subgaussian noise and bounded polynomial assumptions are invoked but not detailed.

axioms (2)

domain assumption Noise is subgaussian
Invoked to obtain concentration for uniform error bounds under noise.
domain assumption Polynomials are bounded
Required for the L_infty uniform estimation setting.

pith-pipeline@v0.9.1-grok · 5793 in / 1159 out tokens · 34986 ms · 2026-06-27T02:06:05.102988+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

38 extracted references · 23 canonical work pages

[1]

Efficiently learning Fourier sparse set functions

Andisheh Amrollahi, Amir Zandieh, Michael Kapralov, and Andreas Krause. Efficiently learning Fourier sparse set functions. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, 11 and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_fil...

2019
[2]

Learning polynomials with neural networks

Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In Eric P. Xing and Tony Jebara, editors,Proceedings of the 31st International Conference on Machine Learning, volume 32(2) ofProceedings of Machine Learning Research, pages 1908–1916, Beijing, China, June 2014. PMLR. URLhttps://proceedings.mlr.press...

1908
[3]

Learning sparse polynomial functions

Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning sparse polynomial functions. InProceedings of the 2014 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 500–510, 2014.https://doi.org/10.1137/1.9781611973402.37

work page doi:10.1137/1.9781611973402.37 2014
[4]

Bayesian optimization of combinatorial structures

Ricardo Baptista and Matthias Poloczek. Bayesian optimization of combinatorial structures. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 462–471. PMLR, July 2018. URL https://proceedings.mlr.press/v80/baptista18a.html

2018
[5]

Inequalities in Fourier analysis.Annals of Mathematics, 102(1):159–182, 1975

William Beckner. Inequalities in Fourier analysis.Annals of Mathematics, 102(1):159–182, 1975. ISSN 0003486X, 19398980. URLhttp://www.jstor.org/stable/1970980

arXiv 1975
[6]

The Annals of Statistics , author =

Peter J. Bickel, Yaacov Ritov, and Alexandre B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector.The Annals of Statistics, 37(4):1705–1732, 2009.https://doi.org/10.1214/08-AOS620

work page doi:10.1214/08-aos620 2009
[7]

H. F. Bohnenblust and Einar Hille. On the absolute convergence of dirichlet series.The Annals of Mathematics, 32(3):600, July 1931. ISSN 0003-486X.https://doi.org/10.2307/1968255

work page doi:10.2307/1968255 1931
[8]

étude des coefficients de Fourier des fonctions deLp(G).Annales de l’Institut Fourier, 20 (2):335–402, 1970.https://doi.org/10.5802/aif.357

Aline Bonami. étude des coefficients de Fourier des fonctions deLp(G).Annales de l’Institut Fourier, 20 (2):335–402, 1970.https://doi.org/10.5802/aif.357

work page doi:10.5802/aif.357 1970
[9]

The Dantzig selector: Statistical estimation when p is much larger than n.The Annals of Statistics, 35(6):2313–2351, 2007

Emmanuel Candes and Terence Tao. The Dantzig selector: Statistical estimation when p is much larger than n.The Annals of Statistics, 35(6):2313–2351, 2007. https://doi.org/10.1214/ 009053606000001523

2007
[10]

On Bayes risk lower bounds.Journal of Machine Learning Research, 17(218):1–58, 2016

Xi Chen, Adityanand Guntuboyina, and Yuchen Zhang. On Bayes risk lower bounds.Journal of Machine Learning Research, 17(218):1–58, 2016. URLhttp://jmlr.org/papers/v17/16-185.html

2016
[11]

Hammer.Boolean Functions: Theory, Algorithms, and Applications

Yves Crama and Peter L. Hammer.Boolean Functions: Theory, Algorithms, and Applications. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2011. https: //doi.org/10.1017/CBO9780511852008

work page doi:10.1017/cbo9780511852008 2011
[12]

Hoffman, Troy David Loeffler, and Subramanian Sankaranarayanan

Hamid Dadkhahi, Karthikeyan Shanmugam, Jesus Rios, Payel Das, Samuel C. Hoffman, Troy David Loeffler, and Subramanian Sankaranarayanan. Combinatorial black-box optimization with expert advice. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’20, pages 1918–1927. ACM, August 2020.https://doi.org/10....

work page doi:10.1145/3394486.3403243 1918
[13]

Fourier representations for black- box optimization over categorical variables.Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):10156–10165, June 2022

Hamid Dadkhahi, Jesus Rios, Karthikeyan Shanmugam, and Payel Das. Fourier representations for black- box optimization over categorical variables.Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):10156–10165, June 2022. ISSN 2159-5399.https://doi.org/10.1609/aaai.v36i9.21255

work page doi:10.1609/aaai.v36i9.21255 2022
[14]

On the Fourier spectrum of functions on boolean cubes.Mathematische Annalen, 374(1-2):653–680, September 2018

Andreas Defant, Mieczysław Mastyło, and Antonio Pérez. On the Fourier spectrum of functions on boolean cubes.Mathematische Annalen, 374(1-2):653–680, September 2018. ISSN 1432-1807. https://doi.org/10.1007/s00208-018-1756-y

work page doi:10.1007/s00208-018-1756-y 2018
[15]

Walsh-based surrogate-assisted multi-objective combinatorial optimization: A fine-grained analysis for pseudo-boolean functions.Applied Soft Computing, 136:110061, March 2023

Bilel Derbel, Geoffrey Pruvost, Arnaud Liefooghe, Sébastien Verel, and Qingfu Zhang. Walsh-based surrogate-assisted multi-objective combinatorial optimization: A fine-grained analysis for pseudo-boolean functions.Applied Soft Computing, 136:110061, March 2023. ISSN 1568-4946.https://doi.org/10. 1016/j.asoc.2023.110061. 12

arXiv 2023
[16]

High-dimensional Bayesian optimization with sparse axis- aligned subspaces

David Eriksson and Martin Jankowiak. High-dimensional Bayesian optimization with sparse axis- aligned subspaces. In Cassio de Campos and Marloes H. Maathuis, editors,Proceedings of the Thirty- Seventh Conference on Uncertainty in Artificial Intelligence, volume 161 ofProceedings of Machine Learning Research, pages 493–503. PMLR, July 2021. URLhttps://proc...

2021
[17]

Learning low-degree functions from a logarithmic number of random queries

Alexandros Eskenazis and Paata Ivanisvili. Learning low-degree functions from a logarithmic number of random queries. InProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC ’22. ACM, June 2022.https://doi.org/10.1145/3519935.3519981

work page doi:10.1145/3519935.3519981 2022
[18]

Low-degree learning and the metric entropy of polynomials.Discrete Analysis, November 2023.https://doi.org/10.19086/da.88507

Alexandros Eskenazis, Paata Ivanisvili, and Lauritz Streck. Low-degree learning and the metric entropy of polynomials.Discrete Analysis, November 2023.https://doi.org/10.19086/da.88507

work page doi:10.19086/da.88507 2023
[19]

The restricted isometry property of subsampled fourier matrices

Ishay Haviv and Oded Regev. The restricted isometry property of subsampled fourier matrices. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, pages 288–297, USA, 2016. Society for Industrial and Applied Mathematics. ISBN 9781611974331. https://doi.org/10.1007/978-3-319-45282-1_11

work page doi:10.1007/978-3-319-45282-1_11 2016
[20]

Communication network reliability under geograph- ically correlated failures using probabilistic seismic hazard analysis.IEEE Access, 11:31341–31354, 2023

Dora Jiménez, Javiera Barrera, and Héctor Cancela. Communication network reliability under geograph- ically correlated failures using probabilistic seismic hazard analysis.IEEE Access, 11:31341–31354, 2023. ISSN 2169-3536.https://doi.org/10.1109/access.2023.3255794

work page doi:10.1109/access.2023.3255794 2023
[21]

2025, PASA, 42, e136, doi: 10.1017/pasa.2025.10101

David Krieg, Erich Novak, and Mario Ullrich. On the power of adaption and randomization.Forum of Mathematics, Sigma, 13, 2025. ISSN 2050-5094.https://doi.org/10.1017/fms.2025.10101. URL http://dx.doi.org/10.1017/fms.2025.10101

work page doi:10.1017/fms.2025.10101 2025
[22]

Lattimore and C

T. Lattimore and C. Szepesvári.Bandit Algorithms. Cambridge University Press, 2020. ISBN 9781108486828. URLhttps://books.google.cl/books?id=bydXzAEACAAJ

2020
[23]

Springer New York, 2012

Jon Lee and Sven Leyffer.Mixed Integer Nonlinear Programming. Springer New York, 2012. ISBN 9781461419273.https://doi.org/10.1007/978-1-4614-1927-3

work page doi:10.1007/978-1-4614-1927-3 2012
[24]

Constant depth circuits, Fourier transform, and learnability.Journal of the ACM, 40(3):607–620, July 1993

Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability.Journal of the ACM, 40(3):607–620, July 1993. ISSN 1557-735X.https://doi.org/10. 1145/174130.174138

arXiv 1993
[25]

Learning sparse Boolean polynomials

Sahand Negahban and Devavrat Shah. Learning sparse Boolean polynomials. In2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 2032–2036. IEEE, October 2012.https://doi.org/10.1109/allerton.2012.6483472

work page doi:10.1109/allerton.2012.6483472 2032
[26]

Cambridge University Press, June 2014

Ryan O’Donnell.Analysis of Boolean Functions. Cambridge University Press, June 2014. ISBN 9781107471542.https://doi.org/10.1017/cbo9781139814782

work page doi:10.1017/cbo9781139814782 2014
[27]

Combinatorial Bayesian optimiza- tion using the graph Cartesian product

Changyong Oh, Jakub Tomczak, Efstratios Gavves, and Max Welling. Combinatorial Bayesian optimiza- tion using the graph Cartesian product. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Cur- ran Associates, Inc., 2019. URLhttps://proceedings.neurips...

2019
[28]

Pijnappel, J.L

T.R. Pijnappel, J.L. van den Berg, S.C. Borst, and R. Litjens. Online positioning of a drone-mounted base station in emergency scenarios.IEEE Transactions on Vehicular Technology, 73(4):5572–5586, April 2024. ISSN 1939-9359.https://doi.org/10.1109/tvt.2023.3329960

work page doi:10.1109/tvt.2023.3329960 2024
[29]

High-dimensional statistics, 2023

Philippe Rigollet and Jan-Christian Hütter. High-dimensional statistics, 2023. URLhttps://doi.org/ 10.48550/arXiv.2310.19244. Lecture notes for MIT course

work page doi:10.48550/arxiv.2310.19244 2023
[30]

Adams, and Nando de Freitas

Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, January
[31]

ISSN 1558-2256.https://doi.org/10.1109/jproc.2015.2494218. 13

work page doi:10.1109/jproc.2015.2494218 2015
[32]

Learning fourier sparse set functions

Peter Stobbe and Andreas Krause. Learning fourier sparse set functions. InProceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 ofProceedings of Machine Learning Research, La Palma, Canary Islands, April 2012. PMLR. URLhttps://proceedings.mlr. press/v22/stobbe12.html

2012
[33]

Regression shrinkage and selection via the Lasso.Journal of the Royal Sta- tistical Society: Series B (Methodological), 58(1):267–288, 1996

Robert Tibshirani. Regression shrinkage and selection via the Lasso.Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.https://doi.org/10.1111/j.2517-6161. 1996.tb02080.x

work page doi:10.1111/j.2517-6161 1996
[34]

Tsybakov.Introduction to Nonparametric Estimation

Alexandre B. Tsybakov.Introduction to Nonparametric Estimation. Springer New York, 2009. ISBN 9780387790527.https://doi.org/10.1007/b13794. URLhttp://dx.doi.org/10.1007/b13794

work page doi:10.1007/b13794 2009
[35]

High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics

Roman Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018. https://doi.org/10.1017/9781108231596

work page doi:10.1017/9781108231596 2018
[36]

, year =

Martin J. Wainwright.High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, February 2019. ISBN 9781108498029.https://doi.org/10.1017/9781108627771. 14 A Tools for upper and lower-bounds A.1 Walsh polynomials Lemma A.1(Moments of signed Walsh sums).Let A ⊆ 2[n] sa...

work page doi:10.1017/9781108627771 2019
[37]

Deterministic lower bound.We prove the lower bound by indistinguishability

to lift this lower bound to randomized algorithms. Deterministic lower bound.We prove the lower bound by indistinguishability. Fix a determin- istic exact-query learner usingm < D queries, and run it on the zero function. This determines queried pointsx1, . . . , xm ∈ {−1, 1}n, all with observed value zero. Since these queries impose onlym homogeneous lin...
[38]

On that event, a fixed estimatorbf cannot be withinε of both posterior draws

Thus, choosing C0 > 0sufficiently large, two independent posterior draws are separated by more than 2ε in uniform norm with conditional probability at least3/4. On that event, a fixed estimatorbf cannot be withinε of both posterior draws. Whenever∥fµG −f µG′∥L∞ > 2ε, at least one of the two inequalities ∥fµG − bf∥ L∞ > ε and ∥fµG′ − bf∥ L∞ > ε must hold, ...

[1] [1]

Efficiently learning Fourier sparse set functions

Andisheh Amrollahi, Amir Zandieh, Michael Kapralov, and Andreas Krause. Efficiently learning Fourier sparse set functions. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, 11 and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Curran Associates, Inc., 2019. URL https://proceedings.neurips.cc/paper_fil...

2019

[2] [2]

Learning polynomials with neural networks

Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning polynomials with neural networks. In Eric P. Xing and Tony Jebara, editors,Proceedings of the 31st International Conference on Machine Learning, volume 32(2) ofProceedings of Machine Learning Research, pages 1908–1916, Beijing, China, June 2014. PMLR. URLhttps://proceedings.mlr.press...

1908

[3] [3]

Learning sparse polynomial functions

Alexandr Andoni, Rina Panigrahy, Gregory Valiant, and Li Zhang. Learning sparse polynomial functions. InProceedings of the 2014 Annual ACM-SIAM Symposium on Discrete Algorithms (SODA), pages 500–510, 2014.https://doi.org/10.1137/1.9781611973402.37

work page doi:10.1137/1.9781611973402.37 2014

[4] [4]

Bayesian optimization of combinatorial structures

Ricardo Baptista and Matthias Poloczek. Bayesian optimization of combinatorial structures. In Jennifer Dy and Andreas Krause, editors,Proceedings of the 35th International Conference on Machine Learning, volume 80 ofProceedings of Machine Learning Research, pages 462–471. PMLR, July 2018. URL https://proceedings.mlr.press/v80/baptista18a.html

2018

[5] [5]

Inequalities in Fourier analysis.Annals of Mathematics, 102(1):159–182, 1975

William Beckner. Inequalities in Fourier analysis.Annals of Mathematics, 102(1):159–182, 1975. ISSN 0003486X, 19398980. URLhttp://www.jstor.org/stable/1970980

arXiv 1975

[6] [6]

The Annals of Statistics , author =

Peter J. Bickel, Yaacov Ritov, and Alexandre B. Tsybakov. Simultaneous analysis of Lasso and Dantzig selector.The Annals of Statistics, 37(4):1705–1732, 2009.https://doi.org/10.1214/08-AOS620

work page doi:10.1214/08-aos620 2009

[7] [7]

H. F. Bohnenblust and Einar Hille. On the absolute convergence of dirichlet series.The Annals of Mathematics, 32(3):600, July 1931. ISSN 0003-486X.https://doi.org/10.2307/1968255

work page doi:10.2307/1968255 1931

[8] [8]

étude des coefficients de Fourier des fonctions deLp(G).Annales de l’Institut Fourier, 20 (2):335–402, 1970.https://doi.org/10.5802/aif.357

Aline Bonami. étude des coefficients de Fourier des fonctions deLp(G).Annales de l’Institut Fourier, 20 (2):335–402, 1970.https://doi.org/10.5802/aif.357

work page doi:10.5802/aif.357 1970

[9] [9]

The Dantzig selector: Statistical estimation when p is much larger than n.The Annals of Statistics, 35(6):2313–2351, 2007

Emmanuel Candes and Terence Tao. The Dantzig selector: Statistical estimation when p is much larger than n.The Annals of Statistics, 35(6):2313–2351, 2007. https://doi.org/10.1214/ 009053606000001523

2007

[10] [10]

On Bayes risk lower bounds.Journal of Machine Learning Research, 17(218):1–58, 2016

Xi Chen, Adityanand Guntuboyina, and Yuchen Zhang. On Bayes risk lower bounds.Journal of Machine Learning Research, 17(218):1–58, 2016. URLhttp://jmlr.org/papers/v17/16-185.html

2016

[11] [11]

Hammer.Boolean Functions: Theory, Algorithms, and Applications

Yves Crama and Peter L. Hammer.Boolean Functions: Theory, Algorithms, and Applications. Encyclopedia of Mathematics and its Applications. Cambridge University Press, 2011. https: //doi.org/10.1017/CBO9780511852008

work page doi:10.1017/cbo9780511852008 2011

[12] [12]

Hoffman, Troy David Loeffler, and Subramanian Sankaranarayanan

Hamid Dadkhahi, Karthikeyan Shanmugam, Jesus Rios, Payel Das, Samuel C. Hoffman, Troy David Loeffler, and Subramanian Sankaranarayanan. Combinatorial black-box optimization with expert advice. InProceedings of the 26th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, KDD ’20, pages 1918–1927. ACM, August 2020.https://doi.org/10....

work page doi:10.1145/3394486.3403243 1918

[13] [13]

Fourier representations for black- box optimization over categorical variables.Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):10156–10165, June 2022

Hamid Dadkhahi, Jesus Rios, Karthikeyan Shanmugam, and Payel Das. Fourier representations for black- box optimization over categorical variables.Proceedings of the AAAI Conference on Artificial Intelligence, 36(9):10156–10165, June 2022. ISSN 2159-5399.https://doi.org/10.1609/aaai.v36i9.21255

work page doi:10.1609/aaai.v36i9.21255 2022

[14] [14]

On the Fourier spectrum of functions on boolean cubes.Mathematische Annalen, 374(1-2):653–680, September 2018

Andreas Defant, Mieczysław Mastyło, and Antonio Pérez. On the Fourier spectrum of functions on boolean cubes.Mathematische Annalen, 374(1-2):653–680, September 2018. ISSN 1432-1807. https://doi.org/10.1007/s00208-018-1756-y

work page doi:10.1007/s00208-018-1756-y 2018

[15] [15]

Walsh-based surrogate-assisted multi-objective combinatorial optimization: A fine-grained analysis for pseudo-boolean functions.Applied Soft Computing, 136:110061, March 2023

Bilel Derbel, Geoffrey Pruvost, Arnaud Liefooghe, Sébastien Verel, and Qingfu Zhang. Walsh-based surrogate-assisted multi-objective combinatorial optimization: A fine-grained analysis for pseudo-boolean functions.Applied Soft Computing, 136:110061, March 2023. ISSN 1568-4946.https://doi.org/10. 1016/j.asoc.2023.110061. 12

arXiv 2023

[16] [16]

High-dimensional Bayesian optimization with sparse axis- aligned subspaces

David Eriksson and Martin Jankowiak. High-dimensional Bayesian optimization with sparse axis- aligned subspaces. In Cassio de Campos and Marloes H. Maathuis, editors,Proceedings of the Thirty- Seventh Conference on Uncertainty in Artificial Intelligence, volume 161 ofProceedings of Machine Learning Research, pages 493–503. PMLR, July 2021. URLhttps://proc...

2021

[17] [17]

Learning low-degree functions from a logarithmic number of random queries

Alexandros Eskenazis and Paata Ivanisvili. Learning low-degree functions from a logarithmic number of random queries. InProceedings of the 54th Annual ACM SIGACT Symposium on Theory of Computing, STOC ’22. ACM, June 2022.https://doi.org/10.1145/3519935.3519981

work page doi:10.1145/3519935.3519981 2022

[18] [18]

Low-degree learning and the metric entropy of polynomials.Discrete Analysis, November 2023.https://doi.org/10.19086/da.88507

Alexandros Eskenazis, Paata Ivanisvili, and Lauritz Streck. Low-degree learning and the metric entropy of polynomials.Discrete Analysis, November 2023.https://doi.org/10.19086/da.88507

work page doi:10.19086/da.88507 2023

[19] [19]

The restricted isometry property of subsampled fourier matrices

Ishay Haviv and Oded Regev. The restricted isometry property of subsampled fourier matrices. In Proceedings of the Twenty-Seventh Annual ACM-SIAM Symposium on Discrete Algorithms, SODA ’16, pages 288–297, USA, 2016. Society for Industrial and Applied Mathematics. ISBN 9781611974331. https://doi.org/10.1007/978-3-319-45282-1_11

work page doi:10.1007/978-3-319-45282-1_11 2016

[20] [20]

Communication network reliability under geograph- ically correlated failures using probabilistic seismic hazard analysis.IEEE Access, 11:31341–31354, 2023

Dora Jiménez, Javiera Barrera, and Héctor Cancela. Communication network reliability under geograph- ically correlated failures using probabilistic seismic hazard analysis.IEEE Access, 11:31341–31354, 2023. ISSN 2169-3536.https://doi.org/10.1109/access.2023.3255794

work page doi:10.1109/access.2023.3255794 2023

[21] [21]

2025, PASA, 42, e136, doi: 10.1017/pasa.2025.10101

David Krieg, Erich Novak, and Mario Ullrich. On the power of adaption and randomization.Forum of Mathematics, Sigma, 13, 2025. ISSN 2050-5094.https://doi.org/10.1017/fms.2025.10101. URL http://dx.doi.org/10.1017/fms.2025.10101

work page doi:10.1017/fms.2025.10101 2025

[22] [22]

Lattimore and C

T. Lattimore and C. Szepesvári.Bandit Algorithms. Cambridge University Press, 2020. ISBN 9781108486828. URLhttps://books.google.cl/books?id=bydXzAEACAAJ

2020

[23] [23]

Springer New York, 2012

Jon Lee and Sven Leyffer.Mixed Integer Nonlinear Programming. Springer New York, 2012. ISBN 9781461419273.https://doi.org/10.1007/978-1-4614-1927-3

work page doi:10.1007/978-1-4614-1927-3 2012

[24] [24]

Constant depth circuits, Fourier transform, and learnability.Journal of the ACM, 40(3):607–620, July 1993

Nathan Linial, Yishay Mansour, and Noam Nisan. Constant depth circuits, Fourier transform, and learnability.Journal of the ACM, 40(3):607–620, July 1993. ISSN 1557-735X.https://doi.org/10. 1145/174130.174138

arXiv 1993

[25] [25]

Learning sparse Boolean polynomials

Sahand Negahban and Devavrat Shah. Learning sparse Boolean polynomials. In2012 50th Annual Allerton Conference on Communication, Control, and Computing (Allerton), pages 2032–2036. IEEE, October 2012.https://doi.org/10.1109/allerton.2012.6483472

work page doi:10.1109/allerton.2012.6483472 2032

[26] [26]

Cambridge University Press, June 2014

Ryan O’Donnell.Analysis of Boolean Functions. Cambridge University Press, June 2014. ISBN 9781107471542.https://doi.org/10.1017/cbo9781139814782

work page doi:10.1017/cbo9781139814782 2014

[27] [27]

Combinatorial Bayesian optimiza- tion using the graph Cartesian product

Changyong Oh, Jakub Tomczak, Efstratios Gavves, and Max Welling. Combinatorial Bayesian optimiza- tion using the graph Cartesian product. In H. Wallach, H. Larochelle, A. Beygelzimer, F. d'Alché-Buc, E. Fox, and R. Garnett, editors,Advances in Neural Information Processing Systems, volume 32. Cur- ran Associates, Inc., 2019. URLhttps://proceedings.neurips...

2019

[28] [28]

Pijnappel, J.L

T.R. Pijnappel, J.L. van den Berg, S.C. Borst, and R. Litjens. Online positioning of a drone-mounted base station in emergency scenarios.IEEE Transactions on Vehicular Technology, 73(4):5572–5586, April 2024. ISSN 1939-9359.https://doi.org/10.1109/tvt.2023.3329960

work page doi:10.1109/tvt.2023.3329960 2024

[29] [29]

High-dimensional statistics, 2023

Philippe Rigollet and Jan-Christian Hütter. High-dimensional statistics, 2023. URLhttps://doi.org/ 10.48550/arXiv.2310.19244. Lecture notes for MIT course

work page doi:10.48550/arxiv.2310.19244 2023

[30] [30]

Adams, and Nando de Freitas

Bobak Shahriari, Kevin Swersky, Ziyu Wang, Ryan P. Adams, and Nando de Freitas. Taking the human out of the loop: A review of Bayesian optimization.Proceedings of the IEEE, 104(1):148–175, January

[31] [31]

ISSN 1558-2256.https://doi.org/10.1109/jproc.2015.2494218. 13

work page doi:10.1109/jproc.2015.2494218 2015

[32] [32]

Learning fourier sparse set functions

Peter Stobbe and Andreas Krause. Learning fourier sparse set functions. InProceedings of the Fifteenth International Conference on Artificial Intelligence and Statistics, volume 22 ofProceedings of Machine Learning Research, La Palma, Canary Islands, April 2012. PMLR. URLhttps://proceedings.mlr. press/v22/stobbe12.html

2012

[33] [33]

Regression shrinkage and selection via the Lasso.Journal of the Royal Sta- tistical Society: Series B (Methodological), 58(1):267–288, 1996

Robert Tibshirani. Regression shrinkage and selection via the Lasso.Journal of the Royal Statistical Society: Series B (Methodological), 58(1):267–288, 1996.https://doi.org/10.1111/j.2517-6161. 1996.tb02080.x

work page doi:10.1111/j.2517-6161 1996

[34] [34]

Tsybakov.Introduction to Nonparametric Estimation

Alexandre B. Tsybakov.Introduction to Nonparametric Estimation. Springer New York, 2009. ISBN 9780387790527.https://doi.org/10.1007/b13794. URLhttp://dx.doi.org/10.1007/b13794

work page doi:10.1007/b13794 2009

[35] [35]

High-Dimensional Probability: An Introduction with Applications in Data Science, volume 47 of Cambridge Series in Statistical and Probabilistic Mathematics

Roman Vershynin.High-Dimensional Probability: An Introduction with Applications in Data Science. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, 2018. https://doi.org/10.1017/9781108231596

work page doi:10.1017/9781108231596 2018

[36] [36]

, year =

Martin J. Wainwright.High-Dimensional Statistics: A Non-Asymptotic Viewpoint. Cambridge Series in Statistical and Probabilistic Mathematics. Cambridge University Press, February 2019. ISBN 9781108498029.https://doi.org/10.1017/9781108627771. 14 A Tools for upper and lower-bounds A.1 Walsh polynomials Lemma A.1(Moments of signed Walsh sums).Let A ⊆ 2[n] sa...

work page doi:10.1017/9781108627771 2019

[37] [37]

Deterministic lower bound.We prove the lower bound by indistinguishability

to lift this lower bound to randomized algorithms. Deterministic lower bound.We prove the lower bound by indistinguishability. Fix a determin- istic exact-query learner usingm < D queries, and run it on the zero function. This determines queried pointsx1, . . . , xm ∈ {−1, 1}n, all with observed value zero. Since these queries impose onlym homogeneous lin...

[38] [38]

On that event, a fixed estimatorbf cannot be withinε of both posterior draws

Thus, choosing C0 > 0sufficiently large, two independent posterior draws are separated by more than 2ε in uniform norm with conditional probability at least3/4. On that event, a fixed estimatorbf cannot be withinε of both posterior draws. Whenever∥fµG −f µG′∥L∞ > 2ε, at least one of the two inequalities ∥fµG − bf∥ L∞ > ε and ∥fµG′ − bf∥ L∞ > ε must hold, ...