pith. sign in

arxiv: 1906.08884 · v1 · pith:NZFMGNUZnew · submitted 2019-06-20 · 🧮 math.ST · stat.CO· stat.ME· stat.TH

A Multiscale Scan Statistic for Adaptive Submatrix Localization

Pith reviewed 2026-05-25 18:50 UTC · model grok-4.3

classification 🧮 math.ST stat.COstat.MEstat.TH
keywords submatrix localizationscan statisticmultiscale methodsanomaly detectionexact recoveryminimax ratesadaptive estimation
0
0 comments X

The pith

A multiscale scan statistic localizes an anomalous submatrix without knowing its size in advance while matching the signal strength needed by an oracle estimator that knows the size.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper addresses the task of identifying a rectangular submatrix whose entries have a higher mean than the rest of the data matrix when the dimensions of that submatrix are unknown beforehand. It constructs an optimization problem centered on a multiscale scan statistic and supplies algorithms to compute a solution. The central result shows that this procedure recovers the exact location of the submatrix with high probability at a signal strength that is of the same order as the minimax rate achieved by an estimator given oracle knowledge of the submatrix size. This removes a common practical obstacle because the size is seldom known in applications. Simulations indicate that the method outperforms other estimators that also lack size information and runs comparatively quickly.

Core claim

We establish an optimization framework based on a multiscale scan statistic for localizing a submatrix with elevated means without knowing its size, and prove that our estimator requires only signal strength of the same order as the minimax rate with oracle knowledge of the submatrix size to exactly recover the anomaly with high probability.

What carries the argument

The multiscale scan statistic that evaluates candidate submatrices over a range of scales and locations to identify the elevated-mean rectangle.

If this is right

  • Exact recovery of the anomalous submatrix becomes feasible in settings where its dimensions cannot be specified in advance.
  • The multiscale formulation yields an estimator whose computational cost remains practical relative to alternatives that also lack size information.
  • Performance in simulations exceeds that of other size-agnostic methods at comparable signal strengths.
  • The approach supplies both a theoretical guarantee and implementable algorithms for the same problem.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The independence assumption on matrix entries could be relaxed to weak dependence while preserving the recovery guarantee, though this would require a separate proof.
  • The same multiscale construction might apply directly to rectangular anomaly detection in image or genomic matrices without further modification.
  • Comparing the method's runtime scaling with matrix dimensions against exhaustive search provides a concrete way to quantify the efficiency gain.

Load-bearing premise

The data matrix entries are independent with a constant elevated mean inside an unknown rectangular submatrix and null mean elsewhere.

What would settle it

Generate matrices under the stated model with signal strength at the claimed order; if the proposed estimator fails to recover the exact submatrix with high probability while an oracle-size estimator succeeds, the matching-rate claim is falsified.

Figures

Figures reproduced from arXiv: 1906.08884 by Ery Arias-Castro, Yuchao Liu.

Figure 1
Figure 1. Figure 1: Error counts for a balanced design [PITH_FULL_IMAGE:figures/full_fig_p011_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: Error counts for an imbalanced design [PITH_FULL_IMAGE:figures/full_fig_p012_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Computing times for different algorithms. (The computing time is calculated by function [PITH_FULL_IMAGE:figures/full_fig_p013_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Levelplots for illustrating unimodality The spectral method was computationally much more demanding than the other methods. As expected, the clustering algorithm based on the largest marginal gap is the fastest, however our algorithms are not too far (while much more accurate). 5.3 Unimodality issue in Algorithm 3 The golden section search presented in Algorithm 3 requires the function defined in (8) to be… view at source ↗
read the original abstract

We consider the problem of localizing a submatrix with larger-than-usual entry values inside a data matrix, without the prior knowledge of the submatrix size. We establish an optimization framework based on a multiscale scan statistic, and develop algorithms in order to approach the optimizer. We also show that our estimator only requires a signal strength of the same order as the minimax estimator with oracle knowledge of the submatrix size, to exactly recover the anomaly with high probability. We perform some simulations that show that our estimator has superior performance compared to other estimators which do not require prior submatrix knowledge, while being comparatively faster to compute.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper addresses adaptive localization of an unknown rectangular submatrix with elevated mean inside an otherwise null-mean matrix of independent Gaussian entries. It introduces a multiscale scan statistic, formulates an optimization problem over candidate rectangles at multiple scales, develops algorithms to approximate the optimizer, and proves that the resulting estimator achieves exact recovery of the anomalous submatrix with high probability at the same signal-strength order as the oracle minimax estimator that knows the submatrix dimensions in advance. Simulations are reported to show improved performance over non-adaptive competitors while remaining computationally competitive.

Significance. If the central matching-rate claim holds, the work supplies a practical adaptive procedure whose detection threshold matches the information-theoretic optimum without requiring prior knowledge of submatrix size. The theoretical guarantee is a clear strength, as is the explicit comparison to the oracle benchmark and the reported simulation superiority.

minor comments (3)
  1. [Abstract / Main theorem] The abstract states that the estimator 'exactly recover[s] the anomaly with high probability' at the oracle minimax order; the corresponding theorem statement (presumably in the main theoretical section) should explicitly restate the precise probability bound and the dependence on matrix dimensions n,m to avoid any ambiguity about the 'same order' claim.
  2. [Simulations] Simulation section: the description of competing methods and the precise parameter settings (matrix size, signal strength, number of Monte Carlo trials) should be expanded so that the reported superiority can be reproduced from the text alone.
  3. [Method / Notation] Notation: ensure that the multiscale collection of candidate rectangles is defined with a single consistent symbol before it is used in the optimization program and in the concentration arguments.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our manuscript on the multiscale scan statistic for adaptive submatrix localization. The description accurately captures the problem setup, the proposed method, the theoretical result matching the oracle minimax rate, and the simulation comparisons. We appreciate the recognition of these contributions and the recommendation for minor revision.

Circularity Check

0 steps flagged

No significant circularity identified

full rationale

The paper constructs a multiscale scan statistic estimator for submatrix localization and proves that it matches the signal strength order of an external oracle minimax benchmark for exact recovery w.h.p. under the standard independent Gaussian model. This rate comparison is to a known external result rather than any fitted parameter or self-derived quantity. No self-citation is load-bearing for the central claim, no ansatz is smuggled, and no step reduces by definition or construction to the paper's own inputs. The derivation is self-contained against external minimax benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The framework rests on standard i.i.d. entry assumptions and the existence of a rectangular anomalous block with constant signal; no free parameters or invented entities are introduced in the abstract.

axioms (1)
  • domain assumption Matrix entries are independent with known null distribution (e.g., standard normal) outside the anomalous submatrix.
    Standard modeling choice for scan statistics; invoked to define the null and alternative.

pith-pipeline@v0.9.0 · 5630 in / 1079 out tokens · 23081 ms · 2026-05-25T18:50:50.153355+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

  1. [1]

    Abbe, E., A. S. Bandeira, and G. Hall (2016). Exact recovery in the stochastic block model. IEEE Transactions on Information Theory\/ 62\/ (1), 471--487

  2. [2]

    Arias-Castro, E., R. M. Castro, E. T \'a nczos, and M. Wang (2018). Distribution-free detection of structured anomalies: Permutation and rank-based scans. Journal of the American Statistical Association\/ 113\/ (522), 789--801

  3. [3]

    Arias-Castro, E. and Y. Liu (2017). Distribution-free detection of a submatrix. Journal of Multivariate Analysis\/ 156 , 29--38

  4. [4]

    Arias-Castro, E. and N. Verzelen (2014). Community detection in dense random networks. The Annals of Statistics\/ 42\/ (3), 940--969

  5. [5]

    Brault, V. and A. Channarond (2016). Fast and consistent algorithm for the latent block model. arXiv preprint arXiv:1610.09005\/

  6. [6]

    Butucea, C. and Y. I. Ingster (2013). Detection of a sparse submatrix of a high-dimensional noisy matrix. Bernoulli\/ 19\/ (5B), 2652--2688

  7. [7]

    Butucea, C., Y. I. Ingster, and I. A. Suslina (2015). Sharp variable selection of a sparse submatrix in a high-dimensional noisy matrix. ESAIM: Probability and Statistics\/ 19 , 115--134

  8. [8]

    Cai, T. T., T. Liang, A. Rakhlin, et al. (2017). Computational and statistical boundaries for submatrix localization in a large noisy matrix. The Annals of Statistics\/ 45\/ (4), 1403--1430

  9. [9]

    Chang, Y.-C. (2009). N-dimension golden section search: Its variants and limitations. In 2009 2nd International Conference on Biomedical Engineering and Informatics , pp.\ 1--6. IEEE

  10. [10]

    Charrad, M. and M. B. Ahmed (2011). Simultaneous clustering: A survey. In International Conference on Pattern Recognition and Machine Intelligence , pp.\ 370--375. Springer

  11. [11]

    Chung, and A

    Chaudhuri, K., F. Chung, and A. Tsiatas (2012). Spectral clustering of graphs with general degrees in the extended planted partition model. In Conference on Learning Theory , pp.\ 1--23

  12. [12]

    Chen, Y. and J. Xu (2016). Statistical-computational tradeoffs in planted problems and submatrix localization with a growing number of clusters and submatrices. Journal of Machine Learning Research\/ 17\/ (27), 1--57

  13. [13]

    Cheng, Y. and G. M. Church (2000). Biclustering of expression data. In Proceedings of the Eighth International Conference on Intelligent Systems for Molecular Biology , pp.\ 93--103. AAAI Press

  14. [14]

    Chi, E. C., G. I. Allen, and R. G. Baraniuk (2016). Convex biclustering. Biometrics\/

  15. [15]

    Gotelli, N. J. (2000). Null model analysis of species co-occurrence patterns. Ecology\/ 81\/ (9), 2606--2621

  16. [16]

    Wu, and J

    Hajek, B., Y. Wu, and J. Xu (2017a). Information limits for recovering a hidden community. IEEE Transactions on Information Theory\/ 63\/ (8), 4729--4745

  17. [17]

    Wu, and J

    Hajek, B., Y. Wu, and J. Xu (2017b). Submatrix localization via message passing. The Journal of Machine Learning Research\/ 18\/ (1), 6817--6868

  18. [18]

    Holland, P. W., K. B. Laskey, and S. Leinhardt (1983). Stochastic blockmodels: First steps. Social networks\/ 5\/ (2), 109--137

  19. [19]

    Balakrishnan, A

    Kolar, M., S. Balakrishnan, A. Rinaldo, and A. Singh (2011). Minimax localization of structural information in large noisy matrices. In Advances in Neural Information Processing Systems , pp.\ 909--917

  20. [20]

    Lee, J. D., Y. Sun, and J. E. Taylor (2015). Evaluating the statistical significance of biclusters. In Advances in Neural Information Processing Systems , pp.\ 1324--1332

  21. [21]

    Lehmann, E. L. and J. P. Romano (2005). Testing Statistical Hypotheses\/ (3rd ed.). Spring. New York: Springer

  22. [22]

    Ma, Z. and Y. Wu (2015). Computational barriers in minimax submatrix detection. The Annals of Statistics\/ 43\/ (3), 1089--1116

  23. [23]

    Madeira, S. C. and A. L. Oliveira (2004). Biclustering algorithms for biological data analysis: a survey. IEEE/ACM Transactions on Computational Biology and Bioinformatics (TCBB)\/ 1\/ (1), 24--45

  24. [24]

    McSherry, F. (2001). Spectral partitioning of random graphs. In Proceedings of the 42nd IEEE symposium on Foundations of Computer Science , pp.\ 529. IEEE Computer Society

  25. [25]

    Neeman, A

    Mossel, E., J. Neeman, A. Sly, et al. (2016). Consistency thresholds for the planted bisection model. Electronic Journal of Probability\/ 21

  26. [26]

    Bleuler, P

    Preli \'c , A., S. Bleuler, P. Zimmermann, A. Wille, P. B \"u hlmann, W. Gruissem, L. Hennig, L. Thiele, and E. Zitzler (2006). A systematic comparison and evaluation of biclustering methods for gene expression data. Bioinformatics\/ 22\/ (9), 1122--1129

  27. [27]

    Shabalin, A. A., V. J. Weigman, C. M. Perou, and A. B. Nobel (2009). Finding large average submatrices in high dimensional data. The Annals of Applied Statistics\/ 3\/ (3), 985--1012

  28. [28]

    Tan, K. M. and D. M. Witten (2014). Sparse biclustering of transposable data. Journal of Computational and Graphical Statistics\/ 23\/ (4), 985--1008

  29. [29]

    Sharan, and R

    Tanay, A., R. Sharan, and R. Shamir (2002). Discovering statistically significant biclusters in gene expression data. Bioinformatics\/ 18\/ (suppl 1), S136--S144

  30. [30]

    Verzelen, N. and E. Arias-Castro (2015). Community detection in sparse random networks. The Annals of Applied Probability\/ 25\/ (6), 3465--3510

  31. [31]

    Zhang, A. Y. and H. H. Zhou (2016). Minimax rates of community detection in stochastic block models. The Annals of Statistics\/ 44\/ (5), 2252--2280