Minimax optimal submatrix detection: Sharp non-asymptotic rates

Julien Chhor; Parker Knight

arxiv: 2605.09569 · v2 · pith:3QNPPWEBnew · submitted 2026-05-10 · 🧮 math.ST · cs.IT· math.IT· stat.ML· stat.TH

Minimax optimal submatrix detection: Sharp non-asymptotic rates

Parker Knight , Julien Chhor This is my paper

Pith reviewed 2026-05-20 22:46 UTC · model grok-4.3

classification 🧮 math.ST cs.ITmath.ITstat.MLstat.TH

keywords submatrix detectionminimax ratesplanted signalnon-asymptotic boundsGaussian noiseadaptive hypothesis testingmatrix detection

0 comments

The pith

Matching non-asymptotic minimax bounds now give the exact signal strength mu needed to detect a planted s1 by s2 submatrix in Gaussian noise for any dimensions and sparsities.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper determines how large the minimum entry mu must be before one can reliably tell whether a d1 by d2 matrix contains a hidden rectangular block of elevated values or is entirely zero, when the matrix is observed under i.i.d. standard Gaussian noise. A reader would care because the result supplies the precise finite-sample threshold separating detectable from undetectable signals, removing the need for large-dimension limits or special assumptions on how sparse or dense the block is. The authors prove an information-theoretic lower bound on the required mu and construct explicit tests that succeed at exactly this level. They further provide adaptive versions of the tests that work when the block dimensions s1 and s2 are unknown. The upper and lower bounds match exactly in every configuration of the four parameters.

Core claim

We establish a minimax lower bound characterizing how large μ must be to ensure that the two hypotheses are distinguishable with high probability. Furthermore, we derive novel minimax-optimal tests achieving the lower bound, and describe extensions of these tests that are adaptive to unknown sparsity levels s1 and s2. In contrast with previous work, our non-asymptotic upper and lower bounds match for any configuration of these parameters.

What carries the argument

The matching minimax lower bound on the critical signal strength μ, together with the novel detection tests that achieve it without restrictions on d1, d2, s1 or s2.

If this is right

Reliable detection is possible precisely at the information-theoretic limit for every choice of matrix size and submatrix sparsity.
Adaptive procedures succeed without prior knowledge of the unknown row and column counts s1 and s2.
No restrictions on the ratios between s1, s2, d1 and d2 are required to obtain sharp non-asymptotic rates.
The same bound and test construction apply uniformly across sparse, dense and intermediate regimes.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same style of exact finite-sample analysis may transfer to related tasks such as detecting sparse rank-one signals or anomalous subgraphs.
One could check whether the derived thresholds remain valid when the noise has heavier tails or mild dependence.
Connecting these rates to computational complexity would clarify whether the optimal tests can be implemented efficiently.

Load-bearing premise

The matrix is observed as a fixed mean plus i.i.d. standard normal noise, and the alternative is exactly one rectangular submatrix whose entries are all at least some constant mu greater than zero.

What would settle it

Simulate data under the alternative with mu set a small constant below the derived threshold; any procedure should then fail to achieve detection probability approaching one even as dimensions grow large.

read the original abstract

Given an observation $\mathbf Y \in \mathbb{R}^{d_1\times d_2}$ from the model $\mathbf Y = \mathbf X + \mathbf E$ where $\mathbf X$ is constant and $\mathbf E$ has i.i.d. $N(0,1)$ entries, we consider the problem of detecting a planted submatrix in the mean matrix $\mathbf X$. Specifically, we aim to distinguish the null hypothesis $\mathbf X = 0$ from the alternative hypothesis in which $\mathbf X$ is non-zero only on a submatrix of size $s_1 \times s_2$ with elevated entries bounded below by $\mu>0$. We establish a minimax lower bound characterizing how large $\mu$ must be to ensure that the two hypotheses are distinguishable with high probability. Furthermore, we derive novel minimax-optimal tests achieving the lower bound, and describe extensions of these tests that are adaptive to unknown sparsity levels $s_1$ and $s_2$. In contrast with previous work, which required restrictive assumptions on $s_1,s_2, d_1$ and $d_2$, our non-asymptotic upper and lower bounds match for any configuration of these parameters.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Matching non-asymptotic minimax bounds for arbitrary d1,d2,s1,s2 is the real advance, though the adaptive case needs close checking on logs.

read the letter

The main thing here is that the paper removes the usual restrictions on how the sparsity levels s1 and s2 sit relative to the matrix dimensions and still gets upper and lower bounds that match exactly in the non-asymptotic regime. Previous results often needed s1 much smaller than d1 or similar special cases; this one claims the threshold for mu works for any configuration. That is the concrete step forward. They build tests that hit the lower bound derived from standard information-theoretic arguments on the Gaussian model, and they sketch adaptive versions when s1 and s2 are unknown. The model itself is clean: Y equals X plus i.i.d. standard normal noise, with the alternative putting a constant mu on an s1 by s2 block. The derivations appear to rest on chi-squared or Le Cam style lower bounds and explicit test constructions rather than post-hoc fitting. Credit is due for delivering explicit rates that apply broadly instead of case-by-case. The soft spot sits in the adaptive extension. Union bounds or discretization over candidate sparsity pairs typically inject an extra log(d1 d2) or log log term into the threshold for finite samples, which would make the upper bound strictly larger than the lower bound for some parameter values. The abstract states that the bounds still match for any configuration, so either they have a construction that avoids the usual penalty or the extra factor is absorbed without breaking exactness. Without the full proofs it is hard to see which. The rest of the argument looks standard and non-circular. This is for people who work on sharp minimax rates in high-dimensional detection and sparse signal problems. A reader who wants explicit non-asymptotic thresholds rather than asymptotic regimes will get direct value. It is worth sending to a serious referee because the claim is precise, the setup is standard, and the removal of restrictive assumptions is a genuine improvement even if the adaptive part needs tightening.

Referee Report

2 major / 2 minor

Summary. The paper studies minimax-optimal detection of a planted s1×s2 submatrix with minimum entry μ in a d1×d2 matrix observed as Y = X + E with i.i.d. N(0,1) noise. It derives a non-asymptotic minimax lower bound on the critical μ separating the null (X=0) from the alternative, constructs tests achieving this bound, and extends the tests to the adaptive setting where s1 and s2 are unknown. The central claim is that the resulting upper and lower bounds match exactly for arbitrary configurations of d1,d2,s1,s2 without the restrictive assumptions required in prior work.

Significance. If the matching non-asymptotic bounds hold for all regimes, the work strengthens the literature on submatrix detection and sparse signal detection by removing dimension/sparsity restrictions and delivering sharp finite-sample thresholds. The adaptive extensions would be practically relevant for settings where sparsity levels are not known a priori.

major comments (2)

[§5] §5 (Adaptive extensions): The upper bound for the adaptive test is obtained via a union bound (or discretization) over candidate (s1,s2) pairs. This introduces an additive log(d1 d2) or log log term inside the threshold for μ that is absent from the fixed-(s1,s2) lower bound derived via the usual chi-squared or Le Cam method over exact-size subsets. Consequently the claimed exact non-asymptotic matching fails to hold uniformly for every parameter configuration, especially when s1,s2 ≪ d1,d2.
[Theorem 4.1] Theorem 4.1 (upper bound for known sparsity): the probability-of-error bound is stated to match the lower bound of Theorem 3.1 for all regimes, but the proof sketch relies on a specific choice of test statistic whose variance and tail behavior must be verified to remain parameter-free after the exact constants are tracked through the non-asymptotic analysis.

minor comments (2)

[Notation section] Notation: matrix and vector boldface is used inconsistently in the statements of the main theorems.
[Introduction] Add a short table comparing the new thresholds with the best previously known rates under the same non-asymptotic regime.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful reading of our manuscript and the constructive comments. We address each major comment point by point below and indicate the planned revisions.

read point-by-point responses

Referee: [§5] §5 (Adaptive extensions): The upper bound for the adaptive test is obtained via a union bound (or discretization) over candidate (s1,s2) pairs. This introduces an additive log(d1 d2) or log log term inside the threshold for μ that is absent from the fixed-(s1,s2) lower bound derived via the usual chi-squared or Le Cam method over exact-size subsets. Consequently the claimed exact non-asymptotic matching fails to hold uniformly for every parameter configuration, especially when s1,s2 ≪ d1,d2.

Authors: We agree that the adaptive procedure in Section 5 relies on a union bound over candidate sparsity pairs, which introduces an extra logarithmic factor (of order log log(d1 d2) or similar, depending on the discretization) into the detection threshold. The lower bound of Theorem 3.1 is derived specifically for known s1 and s2. We do not claim that the adaptive upper bound matches this lower bound exactly; the adaptive test achieves the minimax rate up to this small logarithmic factor. We will revise the manuscript to clarify this distinction explicitly, state the precise adaptive rate, and note that exact matching holds only in the non-adaptive setting with known sparsity. This clarification is particularly relevant in the sparse regime s1, s2 ≪ d1, d2. revision: yes
Referee: [Theorem 4.1] Theorem 4.1 (upper bound for known sparsity): the probability-of-error bound is stated to match the lower bound of Theorem 3.1 for all regimes, but the proof sketch relies on a specific choice of test statistic whose variance and tail behavior must be verified to remain parameter-free after the exact constants are tracked through the non-asymptotic analysis.

Authors: We appreciate the referee highlighting the need for explicit constant tracking. The full proof of Theorem 4.1 (provided in the appendix) uses a normalized maximum-submatrix-sum statistic and applies standard Gaussian tail bounds (e.g., sub-Gaussian concentration for the maximum of correlated Gaussians). After normalization by the exact variance factor sqrt(s1 s2), the resulting bounds on the type-I and type-II errors are indeed parameter-free and match the lower bound of Theorem 3.1 for all regimes. To improve transparency, we will expand the main-text proof sketch with a short paragraph outlining the constant verification steps. revision: partial

Circularity Check

0 steps flagged

No circularity; standard minimax analysis is self-contained

full rationale

The paper derives minimax lower bounds via information-theoretic arguments (e.g., chi-squared or Le Cam methods over all s1 x s2 submatrices) and constructs explicit tests for matching upper bounds under the Gaussian noise model Y = X + E. These steps are independent: the lower bound quantifies distinguishability from the null, while the upper bound exhibits a test whose error probability is controlled directly by the same threshold on mu. Adaptive extensions to unknown s1,s2 are handled by discretization and union bounds, but the paper states that the resulting non-asymptotic bounds continue to match the lower bound for arbitrary parameter configurations without reducing to a fit or self-citation. No quoted step equates a claimed result to its input by construction, and the analysis relies on external mathematical facts (Gaussian tail bounds, union bounds) rather than prior work by the same authors as the load-bearing justification. The derivation is therefore verifiable from first principles within the stated model.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the standard Gaussian noise model and the planted-submatrix alternative; no free parameters or invented entities are introduced in the abstract.

axioms (2)

domain assumption Observation model Y = X + E with E having i.i.d. standard normal entries.
Standard additive Gaussian noise assumption for the matrix observation.
domain assumption Under the alternative, X is nonzero only on an s1 × s2 submatrix with all entries ≥ μ > 0.
Definition of the planted signal strength and support.

pith-pipeline@v0.9.0 · 5753 in / 1314 out tokens · 60517 ms · 2026-05-20T22:46:14.571555+00:00 · methodology

Review history (2 revisions) →

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We observe a matrix Y∈R^{d1×d2} drawn from the model Y=X+E where ... E_{ij} iid ~ N(0,1). ... Θ(s1,s2,d1,d2,μ) = union over subsets of matrices nonzero only on s1×s2 block with entries ≥μ.
IndisputableMonolith/Foundation/ArithmeticFromLogic.lean LogicNat.induction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Our proof of the lower bound ... relies on a refined use of the second moment method ... E[exp(μ² W1 W2)] with Wj ~ HyperGeometric.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

30 extracted references · 30 canonical work pages

[1]

Ingster, Y

Springer Science & Business Media. Ingster, Y. I. (1982). On the minimax nonparametric detection of signals in white gaussian noise.Problemy Peredachi Informatsii, 18(2):61–73. Ingster, Y. I. (1987). Minimax testing of nonparametric hypotheses on a distribution density in the l_p metrics.Theory of Probability & Its Applications, 31(2):333–337. 21 Ingster,...

work page arXiv 1982
[2]

In thegeneral sparsitysetting, meaning thats1≤c1d1 and s2≤c2d2 for sufficiently small constants c1,c 2 > 0, we show that there exists a constantcµ> 0such that if µ2≤cµR, then (41) holds. The proof of this claim constitutes the primary technical difficulty of the derivation of our lower bound, and relies on a precise analysis of the moment generating funct...

work page
[3]

In this case, the problem roughly reduces to the sparse signal detection problem in a standard Gaussian sequence model

Otherwise, we place our selves in thevery densesetting and assume without loss of generality that s1≥cd1 for a constantc∈(0, 1). In this case, the problem roughly reduces to the sparse signal detection problem in a standard Gaussian sequence model. In Section A.2, we prove that there exist constants cµ,c′> 0such that if µ2≤cµd1 s2 1 log ( 1 +c′d2 s2 2 ) ,...

work page 2026
[4]

Then if µ2 <C−1 ∗ d1 s2 1 log ( cd2 s2 ) + cµ s2 log (C∗ 2e ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α

Suppose thatc−1 µs2 log ( cd2 s2 ) ≤C∗ s2 1 d1 . Then if µ2 <C−1 ∗ d1 s2 1 log ( cd2 s2 ) + cµ s2 log (C∗ 2e ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α

work page
[5]

Then if µ2 < cµ s2 log (s2d1 log(cd2 s2 ) cµ2es2 1 ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α

Suppose thatC∗ s2 1 d1 <c−1 µs2 log ( cd2 s2 ) ≤s1. Then if µ2 < cµ s2 log (s2d1 log(cd2 s2 ) cµ2es2 1 ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α

work page
[6]

Proof.DefineA={⌈C∗ s2 1 d1 ⌉,...,s1}andgas in (43)

Suppose thats1 <c−1 µs2 log ( cd2 s2 ) .Then if µ2 < 1 s1 log ( cd2 s2 ) + cµ s2 log (d1 2es1 ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α. Proof.DefineA={⌈C∗ s2 1 d1 ⌉,...,s1}andgas in (43). Fork∈A, it holds cµk s2 log (kd1 2es2 1 ) ≥cµC∗s2 1 s2d1 log (C∗ 2e ) ≥cµC∗(2e)−4log (C∗ 2e ) >0 where the second inequality follows from the assumptions2 1≥(...

work page
[7]

Thenf is increasing overA, and by our assumption onµ2, it holds µ2 <C−1 ∗ d1 s2 1 log ( cd2 s2 ) + cµ s2 log (C∗ 2e ) ≤min k∈A f(k) ≤min k∈A g(k)

Suppose thatc−1 µs2 log ( cd2 s2 ) ≤C∗ s2 1 d1 . Thenf is increasing overA, and by our assumption onµ2, it holds µ2 <C−1 ∗ d1 s2 1 log ( cd2 s2 ) + cµ s2 log (C∗ 2e ) ≤min k∈A f(k) ≤min k∈A g(k)

work page
[8]

Then by our assumption onµ2, we have µ2 < cµ s2 log (s2d1 log(cd2 s2 ) cµ2es2 1 ) ≤min k∈A f(k) ≤min k∈A g(k)

Suppose thatC∗ s2 1 d1 <c−1 µs2 log ( cd2 s2 ) ≤s1. Then by our assumption onµ2, we have µ2 < cµ s2 log (s2d1 log(cd2 s2 ) cµ2es2 1 ) ≤min k∈A f(k) ≤min k∈A g(k)

work page
[9]

Thenf is decreasing overA and is minimized atk =s1

Suppose thats1 <c−1 µs2 log ( cd2 s2 ) . Thenf is decreasing overA and is minimized atk =s1. Therefore by our assumption onµ2, we have µ2 < 1 s1 log ( cd2 s2 ) + cµ s2 log (d1 2es1 ) = min k∈A f(k) ≤min k∈A g(k). By Lemma 2, the proof is complete. 40 A.3.1 Simplification of the rate For anys1,s 2,d 1,d 2∈N, we define the following quantities ψ(s1,s 2,d 1,...

work page
[10]

Then the following two properties hold ( ψ12 +ψ21 ) ∧ϕ12≍ψ21∧ϕ12 (47) ψ21∧ϕ12≍ ( ψ21 +β12 ) ∧ϕ12.(48)

work page
[11]

Proof of Lemma 8

It follows that, for anyα>0, there exists a constantcµ>0such that, wheneverµ2≤cµR, we have E [ exp(µ2XY)1 ( X≥C∗ s2 1 d1 )] <α. Proof of Lemma 8

work page
[12]

41 We start by showing that ψ21∧ϕ12≍ψ21∧˜ϕ12

For anys1,s 2,d 1,d 2, we let ˜ϕ12 = d1 s2 1 log ( 1 +d2 s2 2 ) . 41 We start by showing that ψ21∧ϕ12≍ψ21∧˜ϕ12. This is clear ifd1 s2 1 ≤Cby definition ofϕ12. Assume now thatd1 s2 1 >C , which implies thatϕ12 =∞and ψ21∧ϕ12 =ψ21. By the assumptions2 1≥¯cd1s2, we obtains2≤s2 1 d1¯c≤1 ¯c. Therefore, we have ϕ12 = d1 s2 1 log ( 1 +d2 s2 2 ) ≥d1 s2 1 log ( 1 +...

work page
[13]

Assume first thatc−1 µs2 log ( cd2 s2 ) ≤C∗ s2 1 d1

Now, letα>0and letc µ,c>0be the constants defined in Lemma 7. Assume first thatc−1 µs2 log ( cd2 s2 ) ≤C∗ s2 1 d1 . Recalling that d2 s2 ≥1 c′where c′can be taken arbitrarily small, we have R≤ϕ12 = d1 s2 1 log ( 1 +d2 s2 2 ) ≤d1 s2 1 log ( 2d2 s2 ) providedc′is small enough ≤2d1 s2 1 log ( cd2 s2 ) providedc′is small enough ≤C−1 ∗ d1 s2 1 log ( cd2 s2 ) +...

work page
[14]

Hence, for anyα>0, there exists a constantc′ µ>0such that, wheneverµ2≤c′ µR, we have E [ exp(µ2XY)1 ( 1∨C∗ s2 1 d1 ≤X≤s1∧ s2 log (d1s2 s2 1 ) )] <α

Then for some sufficiently small constantc> 0depending only on µ, it holds thatM≥cR. Hence, for anyα>0, there exists a constantc′ µ>0such that, wheneverµ2≤c′ µR, we have E [ exp(µ2XY)1 ( 1∨C∗ s2 1 d1 ≤X≤s1∧ s2 log (d1s2 s2 1 ) )] <α

work page
[15]

(51) Proof of Lemma 9

Moreover, we have R≥    1 2(ψ12 +β21)∧ϕ12∧ϕ21 ifs 1≤ s2 log ( d1s2 s2 1 ) 1 2(ψ21 +β12)∧ϕ12∧ϕ21 otherwise. (51) Proof of Lemma 9

work page
[16]

We start by showing thatM≥c(ψ21 +ψ12). Using Lemma 4 and Lemma 25.(i), we obtain 2M≥1 s1 log ( 1 +cµ d2s1 s2 2 log( d1 2es1 ) ) + log (d1s2 s2 1 ) s2 log ( 1 +cµ d2 s2 log (d1s2 s2 1 )log ( d1s2 2es2 1 log(d1s2 s2 1 ) )) ≥cµ s1 log ( 1 +d2s1 s2 2 log( d1 2es1 ) ) +cµ log (d1s2 s2 1 ) s2 log ( 1 + d2 s2 log (d1s2 s2 1 )log ( d1s2 2es2 1 log(d1s2 s2 1 ) )) ...

work page
[17]

Assume first thats1≤ s2 log ( d1s2 s2 1 )

We now prove equation (51). Assume first thats1≤ s2 log ( d1s2 s2 1 ). Then we obtain d1 s1 ≤ d1s2 s2 1 log (d1s2 s2 1 ),hence d1s2 s2 1 ≥d1 s1 log (d1 s1 ) ,i.e. 1 s1 ≥1 s2 log (d1 s1 ) by Lemma 24.(ii). Note that, sinced2s1 s2 2 ≥d1 s1 ≥1 c′can be made arbitrarily large by takingc′small enough, we have β21≤1 s2 log (d1 s1 ) ≤1 s1 ≤1 s1 log ( 1 +d2s1 s2 ...

work page
[18]

Then for anyα>0, there exist constantscµ> 0and C∗≥1such that if1 ∨ ⌈ C∗ s2 1 d1 ⌉ ∨ ⌈ C∗s2 2/d2 log (d1s2 2 s2 1d2 ) ⌉ ≤ s1∧ ⌊ s2 log ( d1s2 s2 1 )⌋ andµ2≤cµR, E [ exp(µ2XY)1 ( C∗ s2 1 d1 ∨C∗s2 2/d2 log (d1s2 2 s2 1d2 )≤X≤s1∧ s2 log (d1s2 s2 1 ) )] <α. 47

work page
[19]

Proof of Lemma 11

Moreover, we have R≥    ( ψ21 +β12 ) ∧ϕ12∧ϕ21 ifs 2≤s1 log ( d1 s1 ) ( ψ12 +β21 ) ∧ϕ12∧ϕ21 otherwise. Proof of Lemma 11

work page
[20]

By the assumptiond1s2 s2 1 ≥¯c−1≥16e4, we can repeat the steps leading to equation (52) to obtain log (d1s2 s2 1 ) s2 log ( 1 +cµ d2 s2 log (d1s2 s2 1 )log ( d1s2 2es2 1 log(d1s2 s2 1 ) )) ≥cµ 4 log ( 1 + d1s2 s2 1 log (d2 s2 )) s2 = cµ 4 ψ21. It immediately follows that, for some small enough constantc, we have µ2≤cR ≤c(ψ12 +ψ21) ≤1 2 1 s1 log ( 1 +cµ d2...

work page
[21]

Then we have ψ12 = 1 s1 log ( 1 +d2s1 s2 2 log (d1 s1 )) ≥1 s1 log ( 1 +d2 s2 ) ≥β12, which yields the desired result

Assume now thats2≤s1 log ( d1 s1 ) . Then we have ψ12 = 1 s1 log ( 1 +d2s1 s2 2 log (d1 s1 )) ≥1 s1 log ( 1 +d2 s2 ) ≥β12, which yields the desired result. Assume now thats2 >s 1 log ( d1 s1 ) . Then ψ21 = 1 s2 log ( 1 +d1s2 s2 1 log (d2 s2 )) > 1 s2 log (d1 s1 log (d1 s1 ) log (d2 s2 )) ≥1 s2 log (d1 s1 ) ≥β21. This concludes the proof. The lemma below e...

work page
[22]

Then for anyα4 >0, there exist constantscµ>0andC ∗≥1such that ifµ2≤cµR,then E [ exp(µ2XY)1 ( X≥ s2 log (d1s2 s2 1 )∨C∗ s2 1 d1 ⌉)] <α4

work page
[23]

Proof of Lemma 12

Moreover, it holds that R≥    1 4 ( (ψ21 +β12) + (ψ12 +β21) ) ∧ϕ12∧ϕ21 ifs 1≤s2 log ( d2 s2 ) 1 2 ( ψ21 +β12 ) ∧ϕ12∧ϕ21 otherwise. Proof of Lemma 12

work page
[24]

Assume first thats1 <s 2 log(d2/s2)

Note that the relations2 <s 1 log ( (d1s2)/s2 1 ) implies that d1s2 s2 1 < d1 s1 log (d1s2 s2 1 ) ≤2d1 s1 log (2d1 s1 ) by Lemma 24.(i) which yields s2≤4s1 log(d1/s1).(53) Now, we obtain d2s1 s2 2 log (d1 s1 ) = d2 s2 ·s1 s2 log (d1 s1 ) ≥1 4c′≥1 providedc′≤1/4, and d1s2 s2 1 log (d2 s2 ) ≥1 ¯clog ( 1 c′ ) ≥1 providedc′and¯care small enough, which ensures...

work page
[25]

Assume first thats1 <s 2 log ( d2 s2 ) . Then we have ψ12 +ψ21 = 1 s1 log ( 1 +d2s1 s2 2 log (d1 s1 )) + 1 s2 log ( 1 +d1s2 s2 1 log (d2 s2 )) ≥1 s1 log ( 1 + d2 4s2 ) + 1 s2 log (d1s2 s2 1 log (d2 s2 )) > 1 4s1 log ( 1 +d2 s2 ) + 1 s2 log (d1 s1 ) ≥1 4 ( β12 +β21 ) , which yieldsR≥1 8 ( (ψ21 +β12) + (ψ12 +β21) ) ∧ϕ12∧ϕ21, as desired. Assume now thats1≥s2...

work page 2021
[26]

Similarly, if˜R =ϕ21 and d1≥s2 1, the result follows by Lemma 16

Then the result follows by Lemma 16. Similarly, if˜R =ϕ21 and d1≥s2 1, the result follows by Lemma 16. Finally, if none of the conditions above are satisfied, then we have ∆∗= ∆ h2 lin and the result follows by Lemma 14. The proof is complete. B.6 Analysis of adaptive tests B.6.1 Adaptive Bonferroni corrected linear test Lemma 18.Letα∈(0,1)be given. Defin...

work page
[27]

However, it is true thats+ 1 :=d12−⌈log2(d1/s1)⌉belongs toΩ 1

We have not assumed thats∗ 1∈Ω 1. However, it is true thats+ 1 :=d12−⌈log2(d1/s1)⌉belongs toΩ 1. Sinces∗ 1≤s+ 1 , there exists a setS+ 1 ∈Ps+ 1 (d1)such that S∗ 1⊂S+ 1 . Define the statistic tS+ 1 = d2∑ j=1 ¯YS+ 1,j = 1√ s+ 1d2 d2∑ j=1 ∑ i∈S+ 1 Yij. Under the alternative hypothesis,EX[tS+ 1 ] = (µs2s∗ 1)/ √ s+ 1d2 and var(tS1) = 1. By our assumption onµan...

work page 2023
[28]

Lemma 23.Let f : S→(0,∞)be differentiable, where S⊆(0,∞)is an interval

Then for anyk∈{1,...,n}, we have Pr(X=k)≤ (2enp k )k exp ( −np ) Proof.Using the bound (n k ) ≤ (ne k )k (Appendix A in Roch (2024)), we have Pr(X=k) = (n k ) pk( 1−p )n−k ≤ (npe k )k( 1−p )n−k = (npe k )k( 1−p )n( 1−p )−k ≤ (npe k )k( 1−p )n( 1/2 )−k = (2enp k )k( 1−p )n ≤ (2enp k )k exp ( −np ) where the final inequality uses(1−x)b≤exp(−xb)for anyx,b≥0....

work page 2024
[29]

First case: Assumelog(d 1)>s 2. Then the quantities involved in the rate simplify as follows •ψ12 = log ( 1 + d2 s2 2 log(ed1) ) ≍log ( ed2 s2 ) + log ( elog(d 1) s2 ) •ψ21 = 1 s2 log ( 1 +d 1 log(e (d2 s2 ) ) ) ≍logd 1 s2 + log(log(e( d2 s2))) s2 , hence ψ12 +ψ21≍log (ed2 s2 ) + logd 1 s2 . Moreover, we also have ϕ21 = d2 s2 2 log(1 +d 1)≳log (ed2 s2 ) +...

work page
[30]

We haveψ12 = log ( 1 + d2 s2 2 log(ed1) ) , and •ϕ12 =d 1 log ( 1 + d2 s2 2 ) ≥ψ12 by Lemma 25.(ii)

Assume now thatlog(d1)<s 2. We haveψ12 = log ( 1 + d2 s2 2 log(ed1) ) , and •ϕ12 =d 1 log ( 1 + d2 s2 2 ) ≥ψ12 by Lemma 25.(ii). •ϕ21 = d2 s2 2 log(1 +d 1)≳ψ12. Assume first thatd2 s2 2 log(ed1)< 1. It follows thatψ12≍ϕ21, so that(µ∗)2≍(ψ12+ψ21)∧ϕ12∧ϕ21≍ϕ12, as claimed. Now, assume thatd2 s2 2 log(ed1)≥1. Then by Lemma 25.(ii), we have s2ψ12≥log ( 1 +d2 l...

work page

[1] [1]

Ingster, Y

Springer Science & Business Media. Ingster, Y. I. (1982). On the minimax nonparametric detection of signals in white gaussian noise.Problemy Peredachi Informatsii, 18(2):61–73. Ingster, Y. I. (1987). Minimax testing of nonparametric hypotheses on a distribution density in the l_p metrics.Theory of Probability & Its Applications, 31(2):333–337. 21 Ingster,...

work page arXiv 1982

[2] [2]

In thegeneral sparsitysetting, meaning thats1≤c1d1 and s2≤c2d2 for sufficiently small constants c1,c 2 > 0, we show that there exists a constantcµ> 0such that if µ2≤cµR, then (41) holds. The proof of this claim constitutes the primary technical difficulty of the derivation of our lower bound, and relies on a precise analysis of the moment generating funct...

work page

[3] [3]

In this case, the problem roughly reduces to the sparse signal detection problem in a standard Gaussian sequence model

Otherwise, we place our selves in thevery densesetting and assume without loss of generality that s1≥cd1 for a constantc∈(0, 1). In this case, the problem roughly reduces to the sparse signal detection problem in a standard Gaussian sequence model. In Section A.2, we prove that there exist constants cµ,c′> 0such that if µ2≤cµd1 s2 1 log ( 1 +c′d2 s2 2 ) ,...

work page 2026

[4] [4]

Then if µ2 <C−1 ∗ d1 s2 1 log ( cd2 s2 ) + cµ s2 log (C∗ 2e ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α

Suppose thatc−1 µs2 log ( cd2 s2 ) ≤C∗ s2 1 d1 . Then if µ2 <C−1 ∗ d1 s2 1 log ( cd2 s2 ) + cµ s2 log (C∗ 2e ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α

work page

[5] [5]

Then if µ2 < cµ s2 log (s2d1 log(cd2 s2 ) cµ2es2 1 ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α

Suppose thatC∗ s2 1 d1 <c−1 µs2 log ( cd2 s2 ) ≤s1. Then if µ2 < cµ s2 log (s2d1 log(cd2 s2 ) cµ2es2 1 ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α

work page

[6] [6]

Proof.DefineA={⌈C∗ s2 1 d1 ⌉,...,s1}andgas in (43)

Suppose thats1 <c−1 µs2 log ( cd2 s2 ) .Then if µ2 < 1 s1 log ( cd2 s2 ) + cµ s2 log (d1 2es1 ) , it holds E [ exp(µ2XY)1 ( X≥⌈C∗ s2 1 d1 ⌉ )] <α. Proof.DefineA={⌈C∗ s2 1 d1 ⌉,...,s1}andgas in (43). Fork∈A, it holds cµk s2 log (kd1 2es2 1 ) ≥cµC∗s2 1 s2d1 log (C∗ 2e ) ≥cµC∗(2e)−4log (C∗ 2e ) >0 where the second inequality follows from the assumptions2 1≥(...

work page

[7] [7]

Thenf is increasing overA, and by our assumption onµ2, it holds µ2 <C−1 ∗ d1 s2 1 log ( cd2 s2 ) + cµ s2 log (C∗ 2e ) ≤min k∈A f(k) ≤min k∈A g(k)

Suppose thatc−1 µs2 log ( cd2 s2 ) ≤C∗ s2 1 d1 . Thenf is increasing overA, and by our assumption onµ2, it holds µ2 <C−1 ∗ d1 s2 1 log ( cd2 s2 ) + cµ s2 log (C∗ 2e ) ≤min k∈A f(k) ≤min k∈A g(k)

work page

[8] [8]

Then by our assumption onµ2, we have µ2 < cµ s2 log (s2d1 log(cd2 s2 ) cµ2es2 1 ) ≤min k∈A f(k) ≤min k∈A g(k)

Suppose thatC∗ s2 1 d1 <c−1 µs2 log ( cd2 s2 ) ≤s1. Then by our assumption onµ2, we have µ2 < cµ s2 log (s2d1 log(cd2 s2 ) cµ2es2 1 ) ≤min k∈A f(k) ≤min k∈A g(k)

work page

[9] [9]

Thenf is decreasing overA and is minimized atk =s1

Suppose thats1 <c−1 µs2 log ( cd2 s2 ) . Thenf is decreasing overA and is minimized atk =s1. Therefore by our assumption onµ2, we have µ2 < 1 s1 log ( cd2 s2 ) + cµ s2 log (d1 2es1 ) = min k∈A f(k) ≤min k∈A g(k). By Lemma 2, the proof is complete. 40 A.3.1 Simplification of the rate For anys1,s 2,d 1,d 2∈N, we define the following quantities ψ(s1,s 2,d 1,...

work page

[10] [10]

Then the following two properties hold ( ψ12 +ψ21 ) ∧ϕ12≍ψ21∧ϕ12 (47) ψ21∧ϕ12≍ ( ψ21 +β12 ) ∧ϕ12.(48)

work page

[11] [11]

Proof of Lemma 8

It follows that, for anyα>0, there exists a constantcµ>0such that, wheneverµ2≤cµR, we have E [ exp(µ2XY)1 ( X≥C∗ s2 1 d1 )] <α. Proof of Lemma 8

work page

[12] [12]

41 We start by showing that ψ21∧ϕ12≍ψ21∧˜ϕ12

For anys1,s 2,d 1,d 2, we let ˜ϕ12 = d1 s2 1 log ( 1 +d2 s2 2 ) . 41 We start by showing that ψ21∧ϕ12≍ψ21∧˜ϕ12. This is clear ifd1 s2 1 ≤Cby definition ofϕ12. Assume now thatd1 s2 1 >C , which implies thatϕ12 =∞and ψ21∧ϕ12 =ψ21. By the assumptions2 1≥¯cd1s2, we obtains2≤s2 1 d1¯c≤1 ¯c. Therefore, we have ϕ12 = d1 s2 1 log ( 1 +d2 s2 2 ) ≥d1 s2 1 log ( 1 +...

work page

[13] [13]

Assume first thatc−1 µs2 log ( cd2 s2 ) ≤C∗ s2 1 d1

Now, letα>0and letc µ,c>0be the constants defined in Lemma 7. Assume first thatc−1 µs2 log ( cd2 s2 ) ≤C∗ s2 1 d1 . Recalling that d2 s2 ≥1 c′where c′can be taken arbitrarily small, we have R≤ϕ12 = d1 s2 1 log ( 1 +d2 s2 2 ) ≤d1 s2 1 log ( 2d2 s2 ) providedc′is small enough ≤2d1 s2 1 log ( cd2 s2 ) providedc′is small enough ≤C−1 ∗ d1 s2 1 log ( cd2 s2 ) +...

work page

[14] [14]

Hence, for anyα>0, there exists a constantc′ µ>0such that, wheneverµ2≤c′ µR, we have E [ exp(µ2XY)1 ( 1∨C∗ s2 1 d1 ≤X≤s1∧ s2 log (d1s2 s2 1 ) )] <α

Then for some sufficiently small constantc> 0depending only on µ, it holds thatM≥cR. Hence, for anyα>0, there exists a constantc′ µ>0such that, wheneverµ2≤c′ µR, we have E [ exp(µ2XY)1 ( 1∨C∗ s2 1 d1 ≤X≤s1∧ s2 log (d1s2 s2 1 ) )] <α

work page

[15] [15]

(51) Proof of Lemma 9

Moreover, we have R≥    1 2(ψ12 +β21)∧ϕ12∧ϕ21 ifs 1≤ s2 log ( d1s2 s2 1 ) 1 2(ψ21 +β12)∧ϕ12∧ϕ21 otherwise. (51) Proof of Lemma 9

work page

[16] [16]

We start by showing thatM≥c(ψ21 +ψ12). Using Lemma 4 and Lemma 25.(i), we obtain 2M≥1 s1 log ( 1 +cµ d2s1 s2 2 log( d1 2es1 ) ) + log (d1s2 s2 1 ) s2 log ( 1 +cµ d2 s2 log (d1s2 s2 1 )log ( d1s2 2es2 1 log(d1s2 s2 1 ) )) ≥cµ s1 log ( 1 +d2s1 s2 2 log( d1 2es1 ) ) +cµ log (d1s2 s2 1 ) s2 log ( 1 + d2 s2 log (d1s2 s2 1 )log ( d1s2 2es2 1 log(d1s2 s2 1 ) )) ...

work page

[17] [17]

Assume first thats1≤ s2 log ( d1s2 s2 1 )

We now prove equation (51). Assume first thats1≤ s2 log ( d1s2 s2 1 ). Then we obtain d1 s1 ≤ d1s2 s2 1 log (d1s2 s2 1 ),hence d1s2 s2 1 ≥d1 s1 log (d1 s1 ) ,i.e. 1 s1 ≥1 s2 log (d1 s1 ) by Lemma 24.(ii). Note that, sinced2s1 s2 2 ≥d1 s1 ≥1 c′can be made arbitrarily large by takingc′small enough, we have β21≤1 s2 log (d1 s1 ) ≤1 s1 ≤1 s1 log ( 1 +d2s1 s2 ...

work page

[18] [18]

Then for anyα>0, there exist constantscµ> 0and C∗≥1such that if1 ∨ ⌈ C∗ s2 1 d1 ⌉ ∨ ⌈ C∗s2 2/d2 log (d1s2 2 s2 1d2 ) ⌉ ≤ s1∧ ⌊ s2 log ( d1s2 s2 1 )⌋ andµ2≤cµR, E [ exp(µ2XY)1 ( C∗ s2 1 d1 ∨C∗s2 2/d2 log (d1s2 2 s2 1d2 )≤X≤s1∧ s2 log (d1s2 s2 1 ) )] <α. 47

work page

[19] [19]

Proof of Lemma 11

Moreover, we have R≥    ( ψ21 +β12 ) ∧ϕ12∧ϕ21 ifs 2≤s1 log ( d1 s1 ) ( ψ12 +β21 ) ∧ϕ12∧ϕ21 otherwise. Proof of Lemma 11

work page

[20] [20]

By the assumptiond1s2 s2 1 ≥¯c−1≥16e4, we can repeat the steps leading to equation (52) to obtain log (d1s2 s2 1 ) s2 log ( 1 +cµ d2 s2 log (d1s2 s2 1 )log ( d1s2 2es2 1 log(d1s2 s2 1 ) )) ≥cµ 4 log ( 1 + d1s2 s2 1 log (d2 s2 )) s2 = cµ 4 ψ21. It immediately follows that, for some small enough constantc, we have µ2≤cR ≤c(ψ12 +ψ21) ≤1 2 1 s1 log ( 1 +cµ d2...

work page

[21] [21]

Then we have ψ12 = 1 s1 log ( 1 +d2s1 s2 2 log (d1 s1 )) ≥1 s1 log ( 1 +d2 s2 ) ≥β12, which yields the desired result

Assume now thats2≤s1 log ( d1 s1 ) . Then we have ψ12 = 1 s1 log ( 1 +d2s1 s2 2 log (d1 s1 )) ≥1 s1 log ( 1 +d2 s2 ) ≥β12, which yields the desired result. Assume now thats2 >s 1 log ( d1 s1 ) . Then ψ21 = 1 s2 log ( 1 +d1s2 s2 1 log (d2 s2 )) > 1 s2 log (d1 s1 log (d1 s1 ) log (d2 s2 )) ≥1 s2 log (d1 s1 ) ≥β21. This concludes the proof. The lemma below e...

work page

[22] [22]

Then for anyα4 >0, there exist constantscµ>0andC ∗≥1such that ifµ2≤cµR,then E [ exp(µ2XY)1 ( X≥ s2 log (d1s2 s2 1 )∨C∗ s2 1 d1 ⌉)] <α4

work page

[23] [23]

Proof of Lemma 12

Moreover, it holds that R≥    1 4 ( (ψ21 +β12) + (ψ12 +β21) ) ∧ϕ12∧ϕ21 ifs 1≤s2 log ( d2 s2 ) 1 2 ( ψ21 +β12 ) ∧ϕ12∧ϕ21 otherwise. Proof of Lemma 12

work page

[24] [24]

Assume first thats1 <s 2 log(d2/s2)

Note that the relations2 <s 1 log ( (d1s2)/s2 1 ) implies that d1s2 s2 1 < d1 s1 log (d1s2 s2 1 ) ≤2d1 s1 log (2d1 s1 ) by Lemma 24.(i) which yields s2≤4s1 log(d1/s1).(53) Now, we obtain d2s1 s2 2 log (d1 s1 ) = d2 s2 ·s1 s2 log (d1 s1 ) ≥1 4c′≥1 providedc′≤1/4, and d1s2 s2 1 log (d2 s2 ) ≥1 ¯clog ( 1 c′ ) ≥1 providedc′and¯care small enough, which ensures...

work page

[25] [25]

Assume first thats1 <s 2 log ( d2 s2 ) . Then we have ψ12 +ψ21 = 1 s1 log ( 1 +d2s1 s2 2 log (d1 s1 )) + 1 s2 log ( 1 +d1s2 s2 1 log (d2 s2 )) ≥1 s1 log ( 1 + d2 4s2 ) + 1 s2 log (d1s2 s2 1 log (d2 s2 )) > 1 4s1 log ( 1 +d2 s2 ) + 1 s2 log (d1 s1 ) ≥1 4 ( β12 +β21 ) , which yieldsR≥1 8 ( (ψ21 +β12) + (ψ12 +β21) ) ∧ϕ12∧ϕ21, as desired. Assume now thats1≥s2...

work page 2021

[26] [26]

Similarly, if˜R =ϕ21 and d1≥s2 1, the result follows by Lemma 16

Then the result follows by Lemma 16. Similarly, if˜R =ϕ21 and d1≥s2 1, the result follows by Lemma 16. Finally, if none of the conditions above are satisfied, then we have ∆∗= ∆ h2 lin and the result follows by Lemma 14. The proof is complete. B.6 Analysis of adaptive tests B.6.1 Adaptive Bonferroni corrected linear test Lemma 18.Letα∈(0,1)be given. Defin...

work page

[27] [27]

However, it is true thats+ 1 :=d12−⌈log2(d1/s1)⌉belongs toΩ 1

We have not assumed thats∗ 1∈Ω 1. However, it is true thats+ 1 :=d12−⌈log2(d1/s1)⌉belongs toΩ 1. Sinces∗ 1≤s+ 1 , there exists a setS+ 1 ∈Ps+ 1 (d1)such that S∗ 1⊂S+ 1 . Define the statistic tS+ 1 = d2∑ j=1 ¯YS+ 1,j = 1√ s+ 1d2 d2∑ j=1 ∑ i∈S+ 1 Yij. Under the alternative hypothesis,EX[tS+ 1 ] = (µs2s∗ 1)/ √ s+ 1d2 and var(tS1) = 1. By our assumption onµan...

work page 2023

[28] [28]

Lemma 23.Let f : S→(0,∞)be differentiable, where S⊆(0,∞)is an interval

Then for anyk∈{1,...,n}, we have Pr(X=k)≤ (2enp k )k exp ( −np ) Proof.Using the bound (n k ) ≤ (ne k )k (Appendix A in Roch (2024)), we have Pr(X=k) = (n k ) pk( 1−p )n−k ≤ (npe k )k( 1−p )n−k = (npe k )k( 1−p )n( 1−p )−k ≤ (npe k )k( 1−p )n( 1/2 )−k = (2enp k )k( 1−p )n ≤ (2enp k )k exp ( −np ) where the final inequality uses(1−x)b≤exp(−xb)for anyx,b≥0....

work page 2024

[29] [29]

First case: Assumelog(d 1)>s 2. Then the quantities involved in the rate simplify as follows •ψ12 = log ( 1 + d2 s2 2 log(ed1) ) ≍log ( ed2 s2 ) + log ( elog(d 1) s2 ) •ψ21 = 1 s2 log ( 1 +d 1 log(e (d2 s2 ) ) ) ≍logd 1 s2 + log(log(e( d2 s2))) s2 , hence ψ12 +ψ21≍log (ed2 s2 ) + logd 1 s2 . Moreover, we also have ϕ21 = d2 s2 2 log(1 +d 1)≳log (ed2 s2 ) +...

work page

[30] [30]

We haveψ12 = log ( 1 + d2 s2 2 log(ed1) ) , and •ϕ12 =d 1 log ( 1 + d2 s2 2 ) ≥ψ12 by Lemma 25.(ii)

Assume now thatlog(d1)<s 2. We haveψ12 = log ( 1 + d2 s2 2 log(ed1) ) , and •ϕ12 =d 1 log ( 1 + d2 s2 2 ) ≥ψ12 by Lemma 25.(ii). •ϕ21 = d2 s2 2 log(1 +d 1)≳ψ12. Assume first thatd2 s2 2 log(ed1)< 1. It follows thatψ12≍ϕ21, so that(µ∗)2≍(ψ12+ψ21)∧ϕ12∧ϕ21≍ϕ12, as claimed. Now, assume thatd2 s2 2 log(ed1)≥1. Then by Lemma 25.(ii), we have s2ψ12≥log ( 1 +d2 l...

work page