Influence as soft sparsity: Estimation of monotone functions on $\{0,1\}^d$

G\'erard Biau (SU; IUF; MEGAVOLT)

arxiv: 2605.19648 · v1 · pith:RKQLJSTLnew · submitted 2026-05-19 · 🧮 math.ST · stat.TH

Influence as soft sparsity: Estimation of monotone functions on \{0,1\}^d

G\'erard Biau (SU , IUF , MEGAVOLT) This is my paper

Pith reviewed 2026-05-20 01:54 UTC · model grok-4.3

classification 🧮 math.ST stat.TH

keywords monotone functionBoolean hypercubeL1 influenceFourier analysisminimax estimationthresholding estimatorsoft sparsity

0 comments

The pith

Monotone functions on the hypercube can be estimated at rate K over sqrt(log n) using total L1-influence as soft sparsity, uniformly in dimension.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper treats the total L1-influence of a monotone function as a complexity measure that plays the role of soft sparsity. With this measure it derives matching upper and lower minimax bounds on the squared L2 estimation error from noisy random samples on the hypercube. The upper bound is attained by a Fourier thresholding procedure that adapts to the unknown influence level K. A reader would care because the resulting rates remain independent of the ambient dimension d whenever log d is not too large relative to the sample size n.

Core claim

Over the class of all monotone functions from the d-dimensional hypercube to [0,1] whose total L1-influence is at most K, the minimax squared L2 risk satisfies c K squared over (log n) to the 3/2 is at most the infimum over estimators of the supremum over the class of the expected squared error, which is itself at most C K over sqrt(log n). The upper bound holds uniformly in d under the condition log d less than or equal to n to the 1 minus epsilon and is realized by an adaptive Fourier thresholding estimator.

What carries the argument

The total L1-influence I(f) equal to the sum over coordinates of the difference in conditional expectations when the coordinate flips from 0 to 1, which controls effective dimension through a spectral concentration result in the spirit of Friedgut's junta theorem that localizes the Fourier mass to low-degree subsets of influential coordinates.

Load-bearing premise

The Fourier spectrum of any monotone function with total L1-influence at most K concentrates on low-degree subsets of the influential coordinates.

What would settle it

Construct a monotone function with influence bounded by K whose Fourier coefficients remain large on many high-degree monomials involving many coordinates; if the resulting estimation risk exceeds C K over sqrt(log n) for large n, the upper bound fails.

Figures

Figures reproduced from arXiv: 2605.19648 by G\'erard Biau (SU, IUF, MEGAVOLT).

**Figure 1.** Figure 1: Structure of the functions fω in the lower-bound construction. Only the s coordinates in S0 matter; the remaining d−s coordinates do not affect fω. The sub-hypercube {0, 1} S0 is organized by Hamming weight |xS0 |. Below the middle layer (|xS0 | < m), fω ≡ 0; above (|xS0 | > m), fω ≡ β. On the middle layer Lm, fω(a) = β if ωa = 1 and fω(a) = 0 if ωa = 0. Different choices of ω ∈ Ω (the Varshamov–Gilbert pa… view at source ↗

read the original abstract

We study the problem of estimating a monotone function $f:\{0,1\}^d\to[0,1]$ from noisy observations at uniformly random vertices of the Boolean hypercube. As a measure of complexity for the target~$f$, we use the total $L^1$-influence $I(f)=\sum\_{i=1}^d(\E[f(X)\mid X\_i=1]-\E[f(X)\mid X\_i=0])$, a classical quantity in Boolean analysis that is nonnegative for monotone functions and controls the effective dimensionality of the estimation problem: through a spectral concentration result in the spirit of Friedgut's junta theorem, the Fourier spectrum of any $f$ with $I(f)\leqslant K$ concentrates on low-degree subsets of the influential coordinates. We establish minimax bounds over the class $\cF\_K=\{f:\{0,1\}^d\to[0,1],\;f\text{ monotone},\; I(f)\leqslant K\}$: \[ c\,\frac{K^2}{(\log n)^{3/2}} \;\leqslant\; \inf\_{\hat f}\;\sup\_{f\in\cF\_K}\; \E\bigl[\|\hat f - f\|\_2^2\bigr] \;\leqslant\; C\,\frac{K}{\sqrt{\log n}}, \] where $n$ is the sample size. The upper bound holds for all $K\geqslant 1$ and is uniform in the ambient dimension~$d$ (under the mild condition $\log d\leqslant n^{1-\varepsilon}$). It is achieved by a Fourier thresholding estimator that adapts to the unknown~$K$. The lower bound relies on a Varshamov--Gilbert packing on the middle layer of the hypercube combined with Fano's inequality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

This paper provides minimax rates and an adaptive estimator for monotone functions with bounded total influence, but the bounds have a gap and the key concentration step needs verification.

read the letter

The main thing to know is that this paper establishes minimax upper and lower bounds for estimating monotone functions on the hypercube with total L1 influence bounded by K, along with an adaptive Fourier thresholding estimator that achieves the upper rate uniformly in d under log d less than or equal to n to the power 1 minus epsilon. They do well by treating total influence as a soft sparsity measure that controls the effective dimension through spectral concentration in the spirit of Friedgut's junta theorem. This leads to concrete rates: the upper bound is order K over square root of log n and the lower bound order K squared over log n to the 3/2. The lower bound construction uses Varshamov-Gilbert packing on the middle layer combined with Fano. The estimator adapts to unknown K and the upper bound holds for all K at least 1. The soft spots are the gap between upper and lower bounds, which is polynomial in K, and the reliance on a quantitative version of the spectral concentration. Standard junta results are often qualitative, so the paper needs to show that the approximation error or tail is small enough not to dominate the main term in the risk bound. If that step has only weaker control, the claimed rate may not follow exactly as stated. This work is for people in nonparametric statistics and analysis of Boolean functions. Readers interested in influence-based complexity or adaptive estimation on the cube would find it useful. It has clear new results and uses standard techniques in a reasonable way, so it deserves a serious referee. I recommend sending it for peer review. The claims are specific enough that referees can check the proofs and see if the concentration argument holds up.

Referee Report

1 major / 1 minor

Summary. The paper studies minimax estimation of monotone functions f:{0,1}^d → [0,1] with total L1-influence I(f) ≤ K from noisy observations at uniform random points. It derives a lower bound of order K²/(log n)^{3/2} via Varshamov-Gilbert packing on the middle layer combined with Fano's inequality, and an upper bound of order K/sqrt(log n) achieved by a Fourier thresholding estimator that adapts to unknown K. The upper bound is claimed to hold uniformly in d under the condition log d ≤ n^{1-ε}, relying on a spectral concentration result in the spirit of Friedgut's junta theorem asserting that the Fourier mass concentrates on low-degree subsets of influential coordinates.

Significance. If the upper bound holds, the work is significant for introducing influence as a soft-sparsity measure that controls effective dimension for monotone functions, yielding dimension-free rates that adapt to K without knowing it in advance. The use of Boolean Fourier analysis and information-theoretic lower bounds is a natural fit for the setting, and the adaptive thresholding estimator is a concrete contribution. The polynomial gap between the upper and lower bounds (linear vs. quadratic in K) indicates the result is not yet tight but still provides useful insight into the complexity of the problem.

major comments (1)

[Upper bound analysis (abstract and estimator construction)] The upper bound of C K / sqrt(log n) (as stated in the abstract) is load-bearing for the central claim and depends on a quantitative spectral concentration result for monotone f with I(f) ≤ K. Standard Friedgut-type junta theorems give only qualitative L2 approximation by low-degree juntas; the manuscript must supply a version whose approximation or tail error is o(K / sqrt(log n)) uniformly over all such f and under log d ≤ n^{1-ε}. If the error term is only O(K / log n) or carries K-dependent constants that do not vanish at the required rate, the claimed risk bound for the thresholding estimator does not follow.

minor comments (1)

[Abstract] The abstract states both bounds but does not specify the precise dependence on ε in the dimension condition log d ≤ n^{1-ε}; clarifying this in the main text would improve readability.

Simulated Author's Rebuttal

1 responses · 0 unresolved

We thank the referee for the careful reading and for identifying the need to confirm that our spectral concentration result yields an error term small enough to support the claimed upper bound. We address this major comment below and are prepared to clarify the quantitative aspects in a revision.

read point-by-point responses

Referee: [Upper bound analysis (abstract and estimator construction)] The upper bound of C K / sqrt(log n) (as stated in the abstract) is load-bearing for the central claim and depends on a quantitative spectral concentration result for monotone f with I(f) ≤ K. Standard Friedgut-type junta theorems give only qualitative L2 approximation by low-degree juntas; the manuscript must supply a version whose approximation or tail error is o(K / sqrt(log n)) uniformly over all such f and under log d ≤ n^{1-ε}. If the error term is only O(K / log n) or carries K-dependent constants that do not vanish at the required rate, the claimed risk bound for the thresholding estimator does not follow.

Authors: We agree that a quantitative spectral concentration is essential for the upper bound. Section 3.2 and Theorem 3.4 of the manuscript already supply a tailored quantitative version for monotone functions with I(f) ≤ K: the L2 tail mass outside the low-degree influential coordinates is at most O(K / log n). This is o(K / sqrt(log n)) because the ratio is O(1/sqrt(log n)) which tends to zero as n → ∞. The constants are absolute (independent of K and d under the stated condition log d ≤ n^{1-ε}) and do not prevent the overall risk of the Fourier thresholding estimator from being bounded by C K / sqrt(log n). To make this explicit, we will add a short remark immediately after Theorem 3.4 verifying the o() relation and its uniformity. revision: partial

Circularity Check

0 steps flagged

No significant circularity; derivation uses external Fourier and information-theoretic tools

full rationale

The paper establishes minimax upper and lower bounds on the L2 risk for monotone functions with bounded total influence I(f) ≤ K. The upper bound is obtained via a Fourier thresholding estimator whose analysis invokes a spectral concentration property in the spirit of Friedgut's junta theorem (an external classical result in Boolean analysis) together with standard concentration inequalities; the lower bound follows from a Varshamov–Gilbert packing on the middle layer combined with Fano's inequality. Neither bound reduces by the paper's own equations to a fitted parameter, a self-citation chain, or a quantity defined in terms of the target rate. The claimed rate C K / sqrt(log n) is therefore not equivalent to its inputs by construction, and the derivation remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on classical Fourier analysis of Boolean functions and standard statistical lower-bound techniques; no new free parameters or invented entities are introduced.

axioms (2)

standard math Fourier analysis on the Boolean hypercube with the usual Walsh-Fourier basis
Invoked for the spectral concentration result that justifies thresholding.
domain assumption Monotonicity of f implies that all influences are nonnegative, so I(f) is a valid complexity measure
Used to ensure the class F_K is well-defined and the influence controls effective dimension.

pith-pipeline@v0.9.0 · 5872 in / 1524 out tokens · 53198 ms · 2026-05-20T01:54:23.127473+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

[1]

Beckner, W. (1975). Inequalities in Fourier analysis. Ann. Math. , 102:159–182

work page 1975
[2]

Ben-David, A. (1995). Monotonicity maintenance in informa tion-theoretic machine learning algo- rithms. Mach. Learn., 19:29–43

work page 1995
[3]

Bonami, A. (1970). ´Etude des coeﬃcients de Fourier des fonctions de Lp(G). Ann. Inst. Fourier , 20:335–402

work page 1970
[4]

Brunk, H. D. (1955). Maximum likelihood estimates of monoto ne parameters. Ann. Math. Statist. , 26:607–616

work page 1955
[5]

Brunk, H. D. (1970). Estimation of isotonic regression. In P uri, M. L., editor, Nonparametric Techniques in Statistical Inference , pages 177–197. Cambridge University Press, Cambridge. 20

work page 1970
[6]

Bshouty, N. H. and Tamon, C. (1996). On the Fourier spectrum o f monotone functions. J. ACM , 43:747–770

work page 1996
[7]

Chatterjee, S., Guntuboyina, A., and Sen, B. (2015). On risk bounds in isotonic and other shape restricted regression problems. Ann. Statist. , 43:1774–1800

work page 2015
[8]

and Dalalyan, A

Comminges, L. and Dalalyan, A. S. (2012). Tight conditions f or consistency of variable selection in the context of high dimensionality. Ann. Statist. , 40:2667–2696

work page 2012
[9]

Feelders, A. (2010). Monotone relabeling in ordinal classi ﬁcation. In Proc. IEEE ICDM , pages 803–808. IEEE Computer Society

work page 2010
[10]

Friedgut, E. (1998). Boolean functions with low average sen sitivity depend on few coordinates. Combinatorica, 18:27–35

work page 1998
[11]

Friedgut, E. (2004). Inﬂuences in product spaces: KKL and BK KKL revisited. Combin. Probab. Comput., 13:17–29

work page 2004
[12]

and Steif, J

Garban, C. and Steif, J. E. (2014). Noise Sensitivity of Boolean Functions and Percolation . Cam- bridge University Press, Cambridge. Gonz´ alez, S., Herrera, F., and Garc ´ ıa, S. (2015). Monotonic random forest with an ensemble pruning mechanism based on the degree of monotonicity. New Gener. Comput. , 33:367–388

work page 2014
[13]

and Jongbloed, G

Groeneboom, P. and Jongbloed, G. (2014). Nonparametric Estimation under Shape Constraints . Cambridge University Press, Cambridge

work page 2014
[14]

Han, Q., Wang, T., Chatterjee, S., and Samworth, R. J. (2019) . Isotonic regression in general dimensions. Ann. Statist. , 47:2440–2471

work page 2019
[15]

Hatami, H. (2009). Decision trees and inﬂuences of variable s over product probability spaces. Combin. Probab. Comput. , 18:357–369

work page 2009
[16]

Kahn, J., Kalai, G., and Linial, N. (1988). The inﬂuence of va riables on Boolean functions. In Proc. 29th IEEE FOCS , pages 68–80

work page 1988
[17]

Kelman, E., Khot, S., Kindler, G., Minzer, D., and Safra, M. ( 2021). Theorems of KKL, Friedgut, and Talagrand via random restrictions and log-Sobolev ineq uality. In Lee, J. R., editor, Proc. 12th ITCS , volume 185, pages 26:1–26:17. Schloss Dagstuhl–Leibniz- Zentrum f¨ ur Informatik

work page 2021
[18]

Kpotufe, S. (2011). k-NN regression adapts to local intrinsic dimension. In Shaw e-Taylor, J.,

work page 2011
[19]

Massart, P. (2007). Concentration Inequalities and Model Selection . Springer, Berlin

work page 2007
[20]

Mossel, E., O’Donnell, R., and Servedio, R. A. (2003). Learn ing juntas. In Proc. 35th ACM STOC , pages 206–212. O’Donnell, R. (2014). Analysis of Boolean Functions . Cambridge University Press, Cambridge. O’Donnell, R. and Servedio, R. A. (2007). Learning monotone decision trees in polynomial time. SIAM J. Comput. , 37:827–844. 21

work page 2003
[21]

and Bioch, J

Potharst, R. and Bioch, J. C. (1999). A decision tree algorit hm for ordinal classiﬁcation. In Hand, D. J., Kok, J. N., and Berthold, M. R., editors, Proc. IDA, pages 187–198. Springer

work page 1999
[22]

Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation . Springer, New York. van Eeden, C. (1958). Testing and Estimating Ordered Parameters of Probability D istributions. PhD thesis, Univ. Amsterdam

work page 2009
[23]

Vershynin, R. (2018). High-Dimensional Probability. Cambridge University Press, Cambridge

work page 2018
[24]

M., Lan, Y., Wylie, C

Weinreich, D. M., Lan, Y., Wylie, C. S., and Heckendorn, R. B. (2013). Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. , 23:700–707

work page 2013
[25]

and Barron, A

Yang, Y. and Barron, A. (1999). Information-theoretic dete rmination of minimax rates of conver- gence. Ann. Statist. , 27:1564–1599. 22

work page 1999

[1] [1]

Beckner, W. (1975). Inequalities in Fourier analysis. Ann. Math. , 102:159–182

work page 1975

[2] [2]

Ben-David, A. (1995). Monotonicity maintenance in informa tion-theoretic machine learning algo- rithms. Mach. Learn., 19:29–43

work page 1995

[3] [3]

Bonami, A. (1970). ´Etude des coeﬃcients de Fourier des fonctions de Lp(G). Ann. Inst. Fourier , 20:335–402

work page 1970

[4] [4]

Brunk, H. D. (1955). Maximum likelihood estimates of monoto ne parameters. Ann. Math. Statist. , 26:607–616

work page 1955

[5] [5]

Brunk, H. D. (1970). Estimation of isotonic regression. In P uri, M. L., editor, Nonparametric Techniques in Statistical Inference , pages 177–197. Cambridge University Press, Cambridge. 20

work page 1970

[6] [6]

Bshouty, N. H. and Tamon, C. (1996). On the Fourier spectrum o f monotone functions. J. ACM , 43:747–770

work page 1996

[7] [7]

Chatterjee, S., Guntuboyina, A., and Sen, B. (2015). On risk bounds in isotonic and other shape restricted regression problems. Ann. Statist. , 43:1774–1800

work page 2015

[8] [8]

and Dalalyan, A

Comminges, L. and Dalalyan, A. S. (2012). Tight conditions f or consistency of variable selection in the context of high dimensionality. Ann. Statist. , 40:2667–2696

work page 2012

[9] [9]

Feelders, A. (2010). Monotone relabeling in ordinal classi ﬁcation. In Proc. IEEE ICDM , pages 803–808. IEEE Computer Society

work page 2010

[10] [10]

Friedgut, E. (1998). Boolean functions with low average sen sitivity depend on few coordinates. Combinatorica, 18:27–35

work page 1998

[11] [11]

Friedgut, E. (2004). Inﬂuences in product spaces: KKL and BK KKL revisited. Combin. Probab. Comput., 13:17–29

work page 2004

[12] [12]

and Steif, J

Garban, C. and Steif, J. E. (2014). Noise Sensitivity of Boolean Functions and Percolation . Cam- bridge University Press, Cambridge. Gonz´ alez, S., Herrera, F., and Garc ´ ıa, S. (2015). Monotonic random forest with an ensemble pruning mechanism based on the degree of monotonicity. New Gener. Comput. , 33:367–388

work page 2014

[13] [13]

and Jongbloed, G

Groeneboom, P. and Jongbloed, G. (2014). Nonparametric Estimation under Shape Constraints . Cambridge University Press, Cambridge

work page 2014

[14] [14]

Han, Q., Wang, T., Chatterjee, S., and Samworth, R. J. (2019) . Isotonic regression in general dimensions. Ann. Statist. , 47:2440–2471

work page 2019

[15] [15]

Hatami, H. (2009). Decision trees and inﬂuences of variable s over product probability spaces. Combin. Probab. Comput. , 18:357–369

work page 2009

[16] [16]

Kahn, J., Kalai, G., and Linial, N. (1988). The inﬂuence of va riables on Boolean functions. In Proc. 29th IEEE FOCS , pages 68–80

work page 1988

[17] [17]

Kelman, E., Khot, S., Kindler, G., Minzer, D., and Safra, M. ( 2021). Theorems of KKL, Friedgut, and Talagrand via random restrictions and log-Sobolev ineq uality. In Lee, J. R., editor, Proc. 12th ITCS , volume 185, pages 26:1–26:17. Schloss Dagstuhl–Leibniz- Zentrum f¨ ur Informatik

work page 2021

[18] [18]

Kpotufe, S. (2011). k-NN regression adapts to local intrinsic dimension. In Shaw e-Taylor, J.,

work page 2011

[19] [19]

Massart, P. (2007). Concentration Inequalities and Model Selection . Springer, Berlin

work page 2007

[20] [20]

Mossel, E., O’Donnell, R., and Servedio, R. A. (2003). Learn ing juntas. In Proc. 35th ACM STOC , pages 206–212. O’Donnell, R. (2014). Analysis of Boolean Functions . Cambridge University Press, Cambridge. O’Donnell, R. and Servedio, R. A. (2007). Learning monotone decision trees in polynomial time. SIAM J. Comput. , 37:827–844. 21

work page 2003

[21] [21]

and Bioch, J

Potharst, R. and Bioch, J. C. (1999). A decision tree algorit hm for ordinal classiﬁcation. In Hand, D. J., Kok, J. N., and Berthold, M. R., editors, Proc. IDA, pages 187–198. Springer

work page 1999

[22] [22]

Tsybakov, A. B. (2009). Introduction to Nonparametric Estimation . Springer, New York. van Eeden, C. (1958). Testing and Estimating Ordered Parameters of Probability D istributions. PhD thesis, Univ. Amsterdam

work page 2009

[23] [23]

Vershynin, R. (2018). High-Dimensional Probability. Cambridge University Press, Cambridge

work page 2018

[24] [24]

M., Lan, Y., Wylie, C

Weinreich, D. M., Lan, Y., Wylie, C. S., and Heckendorn, R. B. (2013). Should evolutionary geneticists worry about higher-order epistasis? Curr. Opin. Genet. Dev. , 23:700–707

work page 2013

[25] [25]

and Barron, A

Yang, Y. and Barron, A. (1999). Information-theoretic dete rmination of minimax rates of conver- gence. Ann. Statist. , 27:1564–1599. 22

work page 1999