A Multiscale Scan Statistic for Adaptive Submatrix Localization

Ery Arias-Castro; Yuchao Liu

arxiv: 1906.08884 · v1 · pith:NZFMGNUZnew · submitted 2019-06-20 · 🧮 math.ST · stat.CO· stat.ME· stat.TH

A Multiscale Scan Statistic for Adaptive Submatrix Localization

Yuchao Liu , Ery Arias-Castro This is my paper

Pith reviewed 2026-05-25 18:50 UTC · model grok-4.3

classification 🧮 math.ST stat.COstat.MEstat.TH

keywords submatrix localizationscan statisticmultiscale methodsanomaly detectionexact recoveryminimax ratesadaptive estimation

0 comments

The pith

A multiscale scan statistic localizes an anomalous submatrix without knowing its size in advance while matching the signal strength needed by an oracle estimator that knows the size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the task of identifying a rectangular submatrix whose entries have a higher mean than the rest of the data matrix when the dimensions of that submatrix are unknown beforehand. It constructs an optimization problem centered on a multiscale scan statistic and supplies algorithms to compute a solution. The central result shows that this procedure recovers the exact location of the submatrix with high probability at a signal strength that is of the same order as the minimax rate achieved by an estimator given oracle knowledge of the submatrix size. This removes a common practical obstacle because the size is seldom known in applications. Simulations indicate that the method outperforms other estimators that also lack size information and runs comparatively quickly.

Core claim

We establish an optimization framework based on a multiscale scan statistic for localizing a submatrix with elevated means without knowing its size, and prove that our estimator requires only signal strength of the same order as the minimax rate with oracle knowledge of the submatrix size to exactly recover the anomaly with high probability.

What carries the argument

The multiscale scan statistic that evaluates candidate submatrices over a range of scales and locations to identify the elevated-mean rectangle.

If this is right

Exact recovery of the anomalous submatrix becomes feasible in settings where its dimensions cannot be specified in advance.
The multiscale formulation yields an estimator whose computational cost remains practical relative to alternatives that also lack size information.
Performance in simulations exceeds that of other size-agnostic methods at comparable signal strengths.
The approach supplies both a theoretical guarantee and implementable algorithms for the same problem.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The independence assumption on matrix entries could be relaxed to weak dependence while preserving the recovery guarantee, though this would require a separate proof.
The same multiscale construction might apply directly to rectangular anomaly detection in image or genomic matrices without further modification.
Comparing the method's runtime scaling with matrix dimensions against exhaustive search provides a concrete way to quantify the efficiency gain.

Load-bearing premise

The data matrix entries are independent with a constant elevated mean inside an unknown rectangular submatrix and null mean elsewhere.

What would settle it

Generate matrices under the stated model with signal strength at the claimed order; if the proposed estimator fails to recover the exact submatrix with high probability while an oracle-size estimator succeeds, the matching-rate claim is falsified.

Figures

Figures reproduced from arXiv: 1906.08884 by Ery Arias-Castro, Yuchao Liu.

**Figure 2.** Figure 2: Error counts for an imbalanced design [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗

**Figure 3.** Figure 3: Computing times for different algorithms. (The computing time is calculated by function [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗

**Figure 4.** Figure 4: Levelplots for illustrating unimodality The spectral method was computationally much more demanding than the other methods. As expected, the clustering algorithm based on the largest marginal gap is the fastest, however our algorithms are not too far (while much more accurate). 5.3 Unimodality issue in Algorithm 3 The golden section search presented in Algorithm 3 requires the function defined in (8) to be… view at source ↗

read the original abstract

We consider the problem of localizing a submatrix with larger-than-usual entry values inside a data matrix, without the prior knowledge of the submatrix size. We establish an optimization framework based on a multiscale scan statistic, and develop algorithms in order to approach the optimizer. We also show that our estimator only requires a signal strength of the same order as the minimax estimator with oracle knowledge of the submatrix size, to exactly recover the anomaly with high probability. We perform some simulations that show that our estimator has superior performance compared to other estimators which do not require prior submatrix knowledge, while being comparatively faster to compute.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

Multiscale scan statistic matches oracle minimax rate for unknown-size submatrix localization under the standard Gaussian model.

read the letter

The key point is that this paper gives a multiscale scan statistic for finding a submatrix with elevated entries when you do not know the size ahead of time. Their estimator matches the signal strength needed by the oracle that knows the size, for exact recovery with high probability. The new part is the multiscale construction that adapts to unknown dimensions. They frame it as an optimization problem and provide algorithms to solve it. Under the independent Gaussian model with a rectangular anomaly, the theory shows the rate matches the minimax benchmark. Simulations back this up by showing better performance than other non-oracle estimators and quicker computation. The paper does well by staying close to the standard model and deriving the rate from known minimax results without introducing circular arguments. The citation pattern looks appropriate for the scan statistic literature. Soft spots are limited. The analysis depends on independent entries and constant mean shift, which is standard but means it does not handle dependence or varying signals. The simulation section is only summarized in the abstract, so the strength of the empirical evidence is not fully clear from what is here. These are not load-bearing issues for the main theoretical contribution. This paper is for researchers in statistical signal processing who care about anomaly localization in matrices. A reader focused on theoretical rates for adaptive detection will get something concrete from the matching bound and the multiscale idea. It deserves a serious referee because the result is sharp and the method is practical enough to implement. I recommend sending it out for peer review.

Referee Report

0 major / 3 minor

Summary. The paper addresses adaptive localization of an unknown rectangular submatrix with elevated mean inside an otherwise null-mean matrix of independent Gaussian entries. It introduces a multiscale scan statistic, formulates an optimization problem over candidate rectangles at multiple scales, develops algorithms to approximate the optimizer, and proves that the resulting estimator achieves exact recovery of the anomalous submatrix with high probability at the same signal-strength order as the oracle minimax estimator that knows the submatrix dimensions in advance. Simulations are reported to show improved performance over non-adaptive competitors while remaining computationally competitive.

Significance. If the central matching-rate claim holds, the work supplies a practical adaptive procedure whose detection threshold matches the information-theoretic optimum without requiring prior knowledge of submatrix size. The theoretical guarantee is a clear strength, as is the explicit comparison to the oracle benchmark and the reported simulation superiority.

minor comments (3)

[Abstract / Main theorem] The abstract states that the estimator 'exactly recover[s] the anomaly with high probability' at the oracle minimax order; the corresponding theorem statement (presumably in the main theoretical section) should explicitly restate the precise probability bound and the dependence on matrix dimensions n,m to avoid any ambiguity about the 'same order' claim.
[Simulations] Simulation section: the description of competing methods and the precise parameter settings (matrix size, signal strength, number of Monte Carlo trials) should be expanded so that the reported superiority can be reproduced from the text alone.
[Method / Notation] Notation: ensure that the multiscale collection of candidate rectangles is defined with a single consistent symbol before it is used in the optimization program and in the concentration arguments.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our manuscript on the multiscale scan statistic for adaptive submatrix localization. The description accurately captures the problem setup, the proposed method, the theoretical result matching the oracle minimax rate, and the simulation comparisons. We appreciate the recognition of these contributions and the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper constructs a multiscale scan statistic estimator for submatrix localization and proves that it matches the signal strength order of an external oracle minimax benchmark for exact recovery w.h.p. under the standard independent Gaussian model. This rate comparison is to a known external result rather than any fitted parameter or self-derived quantity. No self-citation is load-bearing for the central claim, no ansatz is smuggled, and no step reduces by definition or construction to the paper's own inputs. The derivation is self-contained against external minimax benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on standard i.i.d. entry assumptions and the existence of a rectangular anomalous block with constant signal; no free parameters or invented entities are introduced in the abstract.

axioms (1)

domain assumption Matrix entries are independent with known null distribution (e.g., standard normal) outside the anomalous submatrix.
Standard modeling choice for scan statistics; invoked to define the null and alternative.

pith-pipeline@v0.9.0 · 5630 in / 1079 out tokens · 23081 ms · 2026-05-25T18:50:50.153355+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1]

Abbe, E., A. S. Bandeira, and G. Hall (2016). Exact recovery in the stochastic block model. IEEE Transactions on Information Theory\/ 62\/ (1), 471--487

work page 2016
[2]

Arias-Castro, E., R. M. Castro, E. T \'a nczos, and M. Wang (2018). Distribution-free detection of structured anomalies: Permutation and rank-based scans. Journal of the American Statistical Association\/ 113\/ (522), 789--801

work page 2018
[3]

Arias-Castro, E. and Y. Liu (2017). Distribution-free detection of a submatrix. Journal of Multivariate Analysis\/ 156 , 29--38

work page 2017
[4]

Arias-Castro, E. and N. Verzelen (2014). Community detection in dense random networks. The Annals of Statistics\/ 42\/ (3), 940--969

work page 2014
[5]

Brault, V. and A. Channarond (2016). Fast and consistent algorithm for the latent block model. arXiv preprint arXiv:1610.09005\/

work page arXiv 2016
[6]

Butucea, C. and Y. I. Ingster (2013). Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli\/ 19\/ (5B), 2652--2688

work page 2013
[7]

Butucea, C., Y. I. Ingster, and I. A. Suslina (2015). Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM: Probability and Statistics\/ 19 , 115--134

work page 2015
[8]

Cai, T. T., T. Liang, A. Rakhlin, et al. (2017). Computational and statistical boundaries for submatrix localization in a large noisy matrix. The Annals of Statistics\/ 45\/ (4), 1403--1430

work page 2017
[9]

Chang, Y.-C. (2009). N-dimension golden section search: Its variants and limitations. In 2009 2nd International Conference on Biomedical Engineering and Informatics , pp.\ 1--6. IEEE

work page 2009
[10]

Charrad, M. and M. B. Ahmed (2011). Simultaneous clustering: A survey. In International Conference on Pattern Recognition and Machine Intelligence , pp.\ 370--375. Springer

work page 2011
[11]

Chung, and A

Chaudhuri, K., F. Chung, and A. Tsiatas (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. In Conference on Learning Theory , pp.\ 1--23

work page 2012
[12]

Chen, Y. and J. Xu (2016). Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. Journal of Machine Learning Research\/ 17\/ (27), 1--57

work page 2016
[13]

Cheng, Y. and G. M. Church (2000). Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology , pp.\ 93--103. AAAI Press

work page 2000
[14]

Chi, E. C., G. I. Allen, and R. G. Baraniuk (2016). Convex biclustering. Biometrics\/

work page 2016
[15]

Gotelli, N. J. (2000). Null model analysis of species co-occurrence patterns. Ecology\/ 81\/ (9), 2606--2621

work page 2000
[16]

Wu, and J

Hajek, B., Y. Wu, and J. Xu (2017a). Information limits for recovering a hidden community. IEEE Transactions on Information Theory\/ 63\/ (8), 4729--4745

work page
[17]

Wu, and J

Hajek, B., Y. Wu, and J. Xu (2017b). Submatrix localization via message passing. The Journal of Machine Learning Research\/ 18\/ (1), 6817--6868

work page
[18]

Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps. Social networks\/ 5\/ (2), 109--137

work page 1983
[19]

Balakrishnan, A

Kolar, M., S. Balakrishnan, A. Rinaldo, and A. Singh (2011). Minimax localization of structural information in large noisy matrices. In Advances in Neural Information Processing Systems , pp.\ 909--917

work page 2011
[20]

Lee, J. D., Y. Sun, and J. E. Taylor (2015). Evaluating the statistical significance of biclusters. In Advances in Neural Information Processing Systems , pp.\ 1324--1332

work page 2015
[21]

Lehmann, E. L. and J. P. Romano (2005). Testing Statistical Hypotheses\/ (3rd ed.). Spring. New York: Springer

work page 2005
[22]

Ma, Z. and Y. Wu (2015). Computational barriers in minimax submatrix detection. The Annals of Statistics\/ 43\/ (3), 1089--1116

work page 2015
[23]

Madeira, S. C. and A. L. Oliveira (2004). Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)\/ 1\/ (1), 24--45

work page 2004
[24]

McSherry, F. (2001). Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE symposium on Foundations of Computer Science , pp.\ 529. IEEE Computer Society

work page 2001
[25]

Neeman, A

Mossel, E., J. Neeman, A. Sly, et al. (2016). Consistency thresholds for the planted bisection model. Electronic Journal of Probability\/ 21

work page 2016
[26]

Bleuler, P

Preli \'c , A., S. Bleuler, P. Zimmermann, A. Wille, P. B \"u hlmann, W. Gruissem, L. Hennig, L. Thiele, and E. Zitzler (2006). A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics\/ 22\/ (9), 1122--1129

work page 2006
[27]

Shabalin, A. A., V. J. Weigman, C. M. Perou, and A. B. Nobel (2009). Finding large average submatrices in high dimensional data. The Annals of Applied Statistics\/ 3\/ (3), 985--1012

work page 2009
[28]

Tan, K. M. and D. M. Witten (2014). Sparse biclustering of transposable data. Journal of Computational and Graphical Statistics\/ 23\/ (4), 985--1008

work page 2014
[29]

Sharan, and R

Tanay, A., R. Sharan, and R. Shamir (2002). Discovering statistically significant biclusters in gene expression data. Bioinformatics\/ 18\/ (suppl 1), S136--S144

work page 2002
[30]

Verzelen, N. and E. Arias-Castro (2015). Community detection in sparse random networks. The Annals of Applied Probability\/ 25\/ (6), 3465--3510

work page 2015
[31]

Zhang, A. Y. and H. H. Zhou (2016). Minimax rates of community detection in stochastic block models. The Annals of Statistics\/ 44\/ (5), 2252--2280

work page 2016

[1] [1]

Abbe, E., A. S. Bandeira, and G. Hall (2016). Exact recovery in the stochastic block model. IEEE Transactions on Information Theory\/ 62\/ (1), 471--487

work page 2016

[2] [2]

Arias-Castro, E., R. M. Castro, E. T \'a nczos, and M. Wang (2018). Distribution-free detection of structured anomalies: Permutation and rank-based scans. Journal of the American Statistical Association\/ 113\/ (522), 789--801

work page 2018

[3] [3]

Arias-Castro, E. and Y. Liu (2017). Distribution-free detection of a submatrix. Journal of Multivariate Analysis\/ 156 , 29--38

work page 2017

[4] [4]

Arias-Castro, E. and N. Verzelen (2014). Community detection in dense random networks. The Annals of Statistics\/ 42\/ (3), 940--969

work page 2014

[5] [5]

Brault, V. and A. Channarond (2016). Fast and consistent algorithm for the latent block model. arXiv preprint arXiv:1610.09005\/

work page arXiv 2016

[6] [6]

Butucea, C. and Y. I. Ingster (2013). Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli\/ 19\/ (5B), 2652--2688

work page 2013

[7] [7]

Butucea, C., Y. I. Ingster, and I. A. Suslina (2015). Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM: Probability and Statistics\/ 19 , 115--134

work page 2015

[8] [8]

Cai, T. T., T. Liang, A. Rakhlin, et al. (2017). Computational and statistical boundaries for submatrix localization in a large noisy matrix. The Annals of Statistics\/ 45\/ (4), 1403--1430

work page 2017

[9] [9]

Chang, Y.-C. (2009). N-dimension golden section search: Its variants and limitations. In 2009 2nd International Conference on Biomedical Engineering and Informatics , pp.\ 1--6. IEEE

work page 2009

[10] [10]

Charrad, M. and M. B. Ahmed (2011). Simultaneous clustering: A survey. In International Conference on Pattern Recognition and Machine Intelligence , pp.\ 370--375. Springer

work page 2011

[11] [11]

Chung, and A

Chaudhuri, K., F. Chung, and A. Tsiatas (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. In Conference on Learning Theory , pp.\ 1--23

work page 2012

[12] [12]

Chen, Y. and J. Xu (2016). Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. Journal of Machine Learning Research\/ 17\/ (27), 1--57

work page 2016

[13] [13]

Cheng, Y. and G. M. Church (2000). Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology , pp.\ 93--103. AAAI Press

work page 2000

[14] [14]

Chi, E. C., G. I. Allen, and R. G. Baraniuk (2016). Convex biclustering. Biometrics\/

work page 2016

[15] [15]

Gotelli, N. J. (2000). Null model analysis of species co-occurrence patterns. Ecology\/ 81\/ (9), 2606--2621

work page 2000

[16] [16]

Wu, and J

Hajek, B., Y. Wu, and J. Xu (2017a). Information limits for recovering a hidden community. IEEE Transactions on Information Theory\/ 63\/ (8), 4729--4745

work page

[17] [17]

Wu, and J

Hajek, B., Y. Wu, and J. Xu (2017b). Submatrix localization via message passing. The Journal of Machine Learning Research\/ 18\/ (1), 6817--6868

work page

[18] [18]

Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps. Social networks\/ 5\/ (2), 109--137

work page 1983

[19] [19]

Balakrishnan, A

Kolar, M., S. Balakrishnan, A. Rinaldo, and A. Singh (2011). Minimax localization of structural information in large noisy matrices. In Advances in Neural Information Processing Systems , pp.\ 909--917

work page 2011

[20] [20]

Lee, J. D., Y. Sun, and J. E. Taylor (2015). Evaluating the statistical significance of biclusters. In Advances in Neural Information Processing Systems , pp.\ 1324--1332

work page 2015

[21] [21]

Lehmann, E. L. and J. P. Romano (2005). Testing Statistical Hypotheses\/ (3rd ed.). Spring. New York: Springer

work page 2005

[22] [22]

Ma, Z. and Y. Wu (2015). Computational barriers in minimax submatrix detection. The Annals of Statistics\/ 43\/ (3), 1089--1116

work page 2015

[23] [23]

Madeira, S. C. and A. L. Oliveira (2004). Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)\/ 1\/ (1), 24--45

work page 2004

[24] [24]

McSherry, F. (2001). Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE symposium on Foundations of Computer Science , pp.\ 529. IEEE Computer Society

work page 2001

[25] [25]

Neeman, A

Mossel, E., J. Neeman, A. Sly, et al. (2016). Consistency thresholds for the planted bisection model. Electronic Journal of Probability\/ 21

work page 2016

[26] [26]

Bleuler, P

Preli \'c , A., S. Bleuler, P. Zimmermann, A. Wille, P. B \"u hlmann, W. Gruissem, L. Hennig, L. Thiele, and E. Zitzler (2006). A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics\/ 22\/ (9), 1122--1129

work page 2006

[27] [27]

Shabalin, A. A., V. J. Weigman, C. M. Perou, and A. B. Nobel (2009). Finding large average submatrices in high dimensional data. The Annals of Applied Statistics\/ 3\/ (3), 985--1012

work page 2009

[28] [28]

Tan, K. M. and D. M. Witten (2014). Sparse biclustering of transposable data. Journal of Computational and Graphical Statistics\/ 23\/ (4), 985--1008

work page 2014

[29] [29]

Sharan, and R

Tanay, A., R. Sharan, and R. Shamir (2002). Discovering statistically significant biclusters in gene expression data. Bioinformatics\/ 18\/ (suppl 1), S136--S144

work page 2002

[30] [30]

Verzelen, N. and E. Arias-Castro (2015). Community detection in sparse random networks. The Annals of Applied Probability\/ 25\/ (6), 3465--3510

work page 2015

[31] [31]

Zhang, A. Y. and H. H. Zhou (2016). Minimax rates of community detection in stochastic block models. The Annals of Statistics\/ 44\/ (5), 2252--2280

work page 2016