Adaptive spatial blocking for scalable clustering inference with applications to high-throughput spatial proteomics
Pith reviewed 2026-06-27 08:43 UTC · model grok-4.3
The pith
Adaptive spatial blocking makes Ripley's K-function clustering tests feasible on large point pattern images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The proposed adaptive spatial blocking algorithm constructs blocks satisfying point-count and shape constraints, enabling scalable spatial clustering inference and fast p-value computation through an asymptotic normal approximation. Numerical studies demonstrate that the proposed method provides a favorable balance between statistical power and computational efficiency. In an application to healthy human intestine spatial proteomics data, the method detects strong spatial aggregation of plasma cells and colocalization between plasma cells and macrophages, while scaling favorably to large images.
What carries the argument
The adaptive spatial blocking algorithm, which extracts disjoint blocks from the image that meet point-count and shape constraints, so that local K-function signals can be aggregated across blocks and the aggregate approximated by a normal distribution to obtain p-values.
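The abstract does not say how the blocks are actually built, so the following is only a minimal sketch of one way to meet a point-count and a shape constraint at the same time. It assumes a k-d-style rule that halves the longer side of any block holding too many points. The names `Block`, `adaptive_blocks`, `min_pts`, `max_pts`, `max_aspect`, and `min_side` are hypothetical and do not come from the paper.

```python
import random
from dataclasses import dataclass


@dataclass
class Block:
    x0: float
    y0: float
    x1: float
    y1: float
    points: list  # (x, y) tuples that fall inside the block

    @property
    def aspect(self) -> float:
        w, h = self.x1 - self.x0, self.y1 - self.y0
        return max(w, h) / min(w, h)


def adaptive_blocks(points, bounds, min_pts=20, max_pts=80,
                    max_aspect=2.0, min_side=1e-6):
    """Hypothetical rule (not the paper's): halve the longer side of any
    block with more than max_pts points; keep leaves meeting both constraints."""
    stack = [Block(*bounds, list(points))]
    kept = []
    while stack:
        b = stack.pop()
        w, h = b.x1 - b.x0, b.y1 - b.y0
        if len(b.points) > max_pts:
            if max(w, h) / 2 < min_side:
                continue  # cannot split further (e.g. duplicate points): drop
            if w >= h:  # cut the longer side so blocks stay close to square
                mid = (b.x0 + b.x1) / 2
                lo = [p for p in b.points if p[0] < mid]
                hi = [p for p in b.points if p[0] >= mid]
                stack += [Block(b.x0, b.y0, mid, b.y1, lo),
                          Block(mid, b.y0, b.x1, b.y1, hi)]
            else:
                mid = (b.y0 + b.y1) / 2
                lo = [p for p in b.points if p[1] < mid]
                hi = [p for p in b.points if p[1] >= mid]
                stack += [Block(b.x0, b.y0, b.x1, mid, lo),
                          Block(b.x0, mid, b.x1, b.y1, hi)]
        elif len(b.points) >= min_pts and b.aspect <= max_aspect:
            kept.append(b)  # count and shape constraints both satisfied
    return kept


# Usage on simulated uniform points in the unit square.
rng = random.Random(0)
pts = [(rng.random(), rng.random()) for _ in range(2000)]
blocks = adaptive_blocks(pts, (0.0, 0.0, 1.0, 1.0))
```

In this sketch, blocks that end up with fewer than `min_pts` points are simply discarded, so the kept blocks are disjoint but need not cover the whole image.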
If this is right
- Clustering inference becomes practical for high-throughput spatial proteomics images that were previously too large for full K-function calculations.
- P-values can be obtained rapidly via the normal approximation once blocks are formed.
- The method maintains statistical power while reducing computation time, as shown in the numerical studies.
- Real-data analysis can identify biologically relevant patterns such as cell aggregation and colocalization at scale.
Where Pith is reading between the lines
- The same blocking idea might extend to other spatial summary statistics that also suffer from quadratic scaling with point count.
- Parallel implementation of block construction and local K calculations could further reduce wall-clock time on multi-core hardware.
- If the shape and count constraints prove too restrictive in some domains, relaxing them while preserving the normal approximation would be a direct next test.
Load-bearing premise
Aggregating clustering evidence from the constructed disjoint blocks still permits a reliable asymptotic normal approximation for p-value calculation.
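To make the premise concrete, here is a hedged sketch of one possible aggregation; the paper's actual statistic is not given in the abstract. It assumes rectangular blocks, a naive K estimate without edge correction, centering by the exact mean of that estimate for independent uniform points in a rectangle (valid for r no larger than the shorter side), and a studentized mean across blocks compared with a standard normal. The names `k_hat`, `k_null_mean`, and `block_cluster_test` are hypothetical.

```python
import math
import random
import statistics


def k_hat(points, area, r):
    """Naive Ripley's K estimate without edge correction:
    area times the fraction of point pairs at distance <= r."""
    n = len(points)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(points[i], points[j]) <= r)
    return area * 2 * close / (n * (n - 1))


def k_null_mean(w, h, r):
    """Exact mean of k_hat for i.i.d. uniform points in a w-by-h rectangle,
    valid for r <= min(w, h). It is below pi * r**2 because of edge effects."""
    a = w * h
    return (math.pi * a * r**2 - 4 / 3 * (w + h) * r**3 + r**4 / 2) / a


def block_cluster_test(blocks, r):
    """blocks: list of (points, width, height), each with at least 2 points,
    and at least 2 blocks. Returns (z, upper-tail normal p-value);
    a large z suggests clustering at scale r."""
    diffs = [k_hat(pts, w * h, r) - k_null_mean(w, h, r) for pts, w, h in blocks]
    z = statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(len(diffs)))
    p = 0.5 * math.erfc(z / math.sqrt(2))  # P(N(0, 1) >= z)
    return z, p


# Usage on simulated clustered blocks (illustrative data, not from the paper).
rng = random.Random(1)


def clustered_block(n_parents=4, n_children=10, spread=0.02):
    pts = []
    for _ in range(n_parents):
        cx, cy = rng.random(), rng.random()
        for _ in range(n_children):
            x = min(max(rng.gauss(cx, spread), 0.0), 1.0)
            y = min(max(rng.gauss(cy, spread), 0.0), 1.0)
            pts.append((x, y))
    return pts


blocks = [(clustered_block(), 1.0, 1.0) for _ in range(30)]
z, p = block_cluster_test(blocks, r=0.1)
```

Whether a normal reference distribution is adequate for this kind of statistic depends on the number of blocks and the points per block, which is exactly what the premise asks the reader to accept.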
What would settle it
Compare p-values and detected clusters from the blocked method against those from the full-image K-function calculation on the same large simulated or real datasets; substantial mismatch in either quantity would undermine the claim.
Original abstract
Ripley's K-function is a widely used spatial summary statistic for assessing clustering in point patterns. However, existing K-based methods can be computationally prohibitive for large-scale data, particularly in high-throughput spatial proteomics, because they rely on spatial information from all points in the image. To address this challenge, we propose a computationally efficient block-based testing framework that extracts disjoint local blocks from an image and aggregates clustering evidence across them. The proposed adaptive spatial blocking algorithm constructs blocks satisfying point-count and shape constraints, enabling scalable spatial clustering inference and fast p-value computation through an asymptotic normal approximation. Numerical studies demonstrate that the proposed method provides a favorable balance between statistical power and computational efficiency. In an application to healthy human intestine spatial proteomics data, our method detects strong spatial aggregation of plasma cells and colocalization between plasma cells and macrophages, while scaling favorably to large images.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes an adaptive spatial blocking algorithm that extracts disjoint blocks from large point patterns subject to point-count and shape constraints. Clustering evidence is aggregated across blocks using Ripley's K-function, and p-values are obtained via an asymptotic normal approximation. Numerical studies are claimed to demonstrate a favorable power-efficiency tradeoff, and the method is applied to healthy human intestine spatial proteomics data to detect plasma-cell aggregation and colocalization with macrophages.
Significance. If the asymptotic normality of the aggregated statistic remains valid after data-dependent block selection, the framework would address a genuine computational bottleneck in spatial statistics for high-throughput proteomics and similar large-scale point-pattern applications, enabling routine analysis of images that are currently intractable.
major comments (2)
- [Abstract] Abstract: the claim that numerical studies demonstrate a favorable power-efficiency balance supplies no quantitative details on type-I error rates, coverage probabilities of the normal approximation, or sensitivity of results to block-construction parameters. Without these, the central claim of reliable scalable inference cannot be evaluated.
- [Method (adaptive blocking)] Method (adaptive blocking construction): blocks are selected using the same observed point pattern that enters the aggregated K-statistic. Standard central-limit arguments for fixed blocks do not automatically extend to this data-dependent selection; the manuscript provides neither additional regularity conditions nor simulation evidence confirming that the null distribution of the aggregated quantity remains asymptotically normal after adaptation.
minor comments (1)
- [Application] Application section: the reported detections of aggregation and colocalization are stated qualitatively; quantitative p-values, effect sizes, and block counts used should be reported for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive report. The comments highlight important aspects of the abstract presentation and the theoretical justification for the asymptotic approximation under adaptive blocking. We respond to each major comment below and indicate the revisions we will make.
- Referee: [Abstract] Abstract: the claim that numerical studies demonstrate a favorable power-efficiency balance supplies no quantitative details on type-I error rates, coverage probabilities of the normal approximation, or sensitivity of results to block-construction parameters. Without these, the central claim of reliable scalable inference cannot be evaluated.
  Authors: We agree that the abstract would benefit from greater specificity. The numerical studies section already contains the requested quantities (empirical type-I error rates near nominal levels, coverage of the normal approximation, and sensitivity checks to block parameters), but these are not summarized in the abstract. In the revised manuscript we will update the abstract to include concise quantitative statements drawn from those studies.
  revision: yes
- Referee: [Method (adaptive blocking)] Method (adaptive blocking construction): blocks are selected using the same observed point pattern that enters the aggregated K-statistic. Standard central-limit arguments for fixed blocks do not automatically extend to this data-dependent selection; the manuscript provides neither additional regularity conditions nor simulation evidence confirming that the null distribution of the aggregated quantity remains asymptotically normal after adaptation.
  Authors: We acknowledge that data-dependent block selection requires justification beyond the fixed-block case. The selection rules depend only on local point counts and shape constraints, which under the null are independent of the spatial clustering structure that the K-function targets. The numerical studies section already reports Monte Carlo experiments under the null in which the aggregated statistic exhibits close-to-normal behavior and type-I error control across a range of image sizes and block-parameter settings. We will add an explicit paragraph in the methods section discussing why the adaptation does not invalidate the asymptotic normality and will include a supplementary table summarizing the simulation diagnostics for the normal approximation.
  revision: partial
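The kind of null simulation discussed in this exchange can be sketched as follows. This is an illustration under stated assumptions, not the authors' experiment: points are independent and uniform on the unit square, blocks come from a hypothetical count-driven rule that halves the longer side, and the test is a studentized mean of per-block K deviations with a normal reference. For independent uniform points, the points inside a cell stay independent and uniform given the cell's count, so a rule that looks only at counts leaves the per-block null mean unchanged. All function names and parameter values are assumptions.

```python
import math
import random
import statistics


def split_blocks(points, box, min_pts, max_pts):
    """Halve the longer side until a block holds <= max_pts points; keep
    blocks with >= min_pts points. Assumes no duplicate points."""
    x0, y0, x1, y1 = box
    if len(points) > max_pts:
        if x1 - x0 >= y1 - y0:
            m = (x0 + x1) / 2
            parts = [([p for p in points if p[0] < m], (x0, y0, m, y1)),
                     ([p for p in points if p[0] >= m], (m, y0, x1, y1))]
        else:
            m = (y0 + y1) / 2
            parts = [([p for p in points if p[1] < m], (x0, y0, x1, m)),
                     ([p for p in points if p[1] >= m], (x0, m, x1, y1))]
        return [b for pts, bx in parts
                for b in split_blocks(pts, bx, min_pts, max_pts)]
    return [(points, box)] if len(points) >= min_pts else []


def deviation(points, box, r):
    """Naive K estimate minus its exact mean for i.i.d. uniform points
    in the box (requires r <= shorter side of the box)."""
    x0, y0, x1, y1 = box
    w, h = x1 - x0, y1 - y0
    n = len(points)
    close = sum(1 for i in range(n) for j in range(i + 1, n)
                if math.dist(points[i], points[j]) <= r)
    k_est = w * h * 2 * close / (n * (n - 1))
    k_null = (math.pi * w * h * r**2 - 4 / 3 * (w + h) * r**3 + r**4 / 2) / (w * h)
    return k_est - k_null


def null_rejection_rate(reps=100, n=400, r=0.03, alpha=0.05, seed=0):
    """Fraction of simulated null patterns rejected at level alpha."""
    rng = random.Random(seed)
    rejections = 0
    for _ in range(reps):
        pts = [(rng.random(), rng.random()) for _ in range(n)]
        d = [deviation(p, b, r) for p, b in split_blocks(pts, (0, 0, 1, 1), 10, 40)]
        z = statistics.mean(d) / (statistics.stdev(d) / math.sqrt(len(d)))
        p_value = 0.5 * math.erfc(z / math.sqrt(2))  # upper-tail normal p-value
        rejections += p_value < alpha
    return rejections / reps


rate = null_rejection_rate(reps=40)
```

A rejection rate far from `alpha` over many repetitions would point to a problem with the normal approximation under this particular rule. The sketch covers only the completely random null and says nothing about selection rules that use more than counts.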
Circularity Check
No circularity; asymptotic approximation treated as standard external tool
The provided abstract and description present the adaptive blocking algorithm as a computational device that extracts disjoint blocks satisfying point-count and shape constraints, after which clustering evidence is aggregated and an asymptotic normal approximation (a standard statistical result) is invoked for p-value computation. No equations, fitted parameters, or self-citations are shown that define the normal approximation in terms of the blocks or reduce the reported performance to a quantity fitted from the same data. The derivation chain therefore remains self-contained against external benchmarks; the referee's concern about data-dependent block selection is a question of regularity conditions for the CLT, not a circular reduction by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: aggregated evidence across disjoint blocks follows an asymptotic normal distribution suitable for p-value calculation.