
arxiv: 2605.15291 · v1 · pith:LVRAYLFYnew · submitted 2026-05-14 · 📊 stat.AP

BaySC: Uncovering Tissue Architecture in Spatial Multi-Omics via Probabilistic Spatial Clustering

Pith reviewed 2026-05-19 15:53 UTC · model grok-4.3

classification 📊 stat.AP
keywords spatial domain identification · Bayesian clustering · multi-omics integration · Markov random field · mixture of finite mixtures · tissue architecture · spatial transcriptomics · Gibbs sampling

The pith

BaySC automatically infers the number of spatial domains from data and integrates multi-omics layers while enforcing local tissue coherence.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces BaySC, a Bayesian spatial clustering method that identifies tissue domains by jointly using molecular profiles and physical locations. It employs a Mixture of Finite Mixtures (MFM) prior so that the number of domains is inferred from the observations rather than specified by the user. Tissue structure is captured by placing a Markov Random Field (MRF) on the discrete cell-to-domain assignments, which encourages neighboring cells to receive the same label without altering the original gene expression signals. For datasets that combine multiple omics measurements, the framework fuses the sources through a weighted log-likelihood updated by Gibbs sampling, producing interpretable weights that show each modality's contribution. Across ten single-modal and two multi-omics spatial datasets, the method is reported to recover both contiguous tissue layers and spatially scattered groups of transcriptionally identical cells, and to score higher on a spatially aware agreement metric (spARI) than prior tools.

Core claim

BaySC models spatial domain identification by combining a Mixture of Finite Mixtures prior, which infers the number of domains from data, with a Markov Random Field on discrete cell assignments, which enforces local spatial coherence. According to the paper, this setup recovers tissue architecture in both contiguous layers and scattered, transcriptionally identical populations. For multi-omics data, a weighted log-likelihood fusion is performed via Gibbs sampling to assign interpretable weights to each data modality.
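To make the interplay between the fused likelihood and the spatial prior concrete, here is a minimal sketch of a single-spot label update. It assumes a Potts-type MRF with interaction strength `beta` and a fixed number of candidate domains; the function name `site_conditional`, its arguments, and the toy numbers are hypothetical and are not taken from the paper, whose exact energy function and weight update are not given in this summary.

```python
import numpy as np

def site_conditional(loglik, weights, neighbor_labels, beta):
    """Conditional label probabilities for one spot (illustrative only).

    loglik: array (M, K), log-likelihood of this spot under each of K domains
            for each of M modalities (hypothetical inputs).
    weights: array (M,), nonnegative modality weights.
    neighbor_labels: domain labels (0..K-1) of the spot's spatial neighbors.
    beta: Potts interaction strength (assumed form of the MRF).
    """
    loglik = np.asarray(loglik, dtype=float)
    weights = np.asarray(weights, dtype=float)
    n_domains = loglik.shape[1]
    # Weighted log-likelihood fusion across modalities: shape (K,)
    fused = weights @ loglik
    # Number of neighbors currently assigned to each domain
    counts = np.bincount(np.asarray(neighbor_labels, dtype=int),
                         minlength=n_domains)
    logp = fused + beta * counts
    logp -= logp.max()  # stabilize before exponentiating
    p = np.exp(logp)
    return p / p.sum()

# Toy example: two modalities, two domains.
loglik = np.array([[-1.0, -3.0],
                   [-2.0, -2.5]])
weights = np.array([0.7, 0.3])
p_no_space = site_conditional(loglik, weights, [], beta=0.0)
p_space = site_conditional(loglik, weights, [1, 1, 1, 1], beta=1.0)
```

In this toy case the expression evidence alone favors domain 0, while four neighbors in domain 1 with `beta=1.0` tip the conditional toward domain 1. In a full sampler the MFM prior would also contribute a cluster-size term, which this sketch leaves out.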

What carries the argument

Mixture of Finite Mixtures (MFM) prior for automatic domain count selection together with Markov Random Field (MRF) on discrete cellular assignments for spatial coherence, plus weighted log-likelihood fusion executed by Gibbs sampling for multimodal integration.
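For readers unfamiliar with the MFM prior, the standard urn scheme from the mixture-modeling literature can be written as follows. This is a sketch of the generic construction, not a quotation from the paper: whether BaySC uses exactly these coefficients, and how the MRF term is combined with them, is an assumption here. The symbols $\gamma$ (Dirichlet concentration), $p_K$ (prior on the number of components), $n_c$ (size of cluster $c$ with spot $i$ removed), and $t$ (number of clusters with spot $i$ removed) are introduced only for illustration.

```latex
\[
P(z_i = c \mid z_{-i}) \;\propto\;
\begin{cases}
  n_c + \gamma, & c \text{ an existing cluster},\\[4pt]
  \dfrac{V_n(t+1)}{V_n(t)}\,\gamma, & c \text{ a new cluster},
\end{cases}
\qquad
V_n(t) \;=\; \sum_{k=1}^{\infty} \frac{k_{(t)}}{(\gamma k)^{(n)}}\, p_K(k),
\]
\[
k_{(t)} = k(k-1)\cdots(k-t+1), \qquad
x^{(n)} = x(x+1)\cdots(x+n-1).
\]
```

In a Gibbs sweep these prior weights are multiplied by the corresponding likelihood terms, so a new domain is opened only when the data support it. This is the mechanism by which the number of domains is inferred rather than fixed in advance.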

Load-bearing premise

The approach assumes that an MRF applied to discrete cell assignments will enforce local spatial coherence without distorting the underlying gene expression features and that the weighted fusion will assign biologically meaningful importance to each modality.

What would settle it

Run BaySC on a spatial multi-omics dataset whose domain boundaries have been independently verified by expert histological annotation or orthogonal imaging and check whether the recovered domains match the known topography or diverge from it.
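One simple way to score such a comparison is the plain Adjusted Rand Index between the recovered domains and the expert annotation. The sketch below implements the ordinary ARI from its contingency-table definition. It is not the spatially aware spARI used in the paper, whose definition is not reproduced in this summary, and the function name and toy labels are illustrative.

```python
from collections import Counter
from math import comb

def adjusted_rand_index(labels_true, labels_pred):
    """Plain ARI between two labelings of the same spots (not spARI)."""
    if len(labels_true) != len(labels_pred):
        raise ValueError("label sequences must have the same length")
    n = len(labels_true)
    if n < 2:
        return 1.0
    # Pair counts within contingency cells and within each marginal
    pair_counts = Counter(zip(labels_true, labels_pred))
    sum_pairs = sum(comb(c, 2) for c in pair_counts.values())
    sum_true = sum(comb(c, 2) for c in Counter(labels_true).values())
    sum_pred = sum(comb(c, 2) for c in Counter(labels_pred).values())
    expected = sum_true * sum_pred / comb(n, 2)
    max_index = 0.5 * (sum_true + sum_pred)
    if max_index == expected:
        return 1.0  # degenerate case, e.g. both labelings are one cluster
    return (sum_pairs - expected) / (max_index - expected)

# Identical partitions up to relabeling give a perfect score.
annotation = [0, 0, 1, 1]
recovered = [1, 1, 0, 0]
score = adjusted_rand_index(annotation, recovered)
```

Because ARI is invariant to label permutation, it measures agreement of the partitions only. It does not account for where disagreements occur in the tissue, which is the gap a spatially aware variant is meant to close.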

Figures

Figures reproduced from arXiv: 2605.15291 by Guanyu Hu, Hanwen Ning, Lulu Shang, Xiaofei Dong, Xiao Wang, Xin Li, Xinyuan Song, Zhenke Duan.

Figure 1: Overall framework of BaySC. (Left) Raw count matrices from each molecular modality are embedded into low-dimensional representations, from which pairwise similarity matrices are derived via Fisher-Z transformation. A binary spatial neighborhood matrix W is simultaneously constructed from physical spot coordinates. (Middle) A collapsed Gibbs sampler updates cluster assignments by jointly balancing multimoda…
Figure 2: Single-modality spatial clustering results on the Human Breast Cancer 10x Visium dataset and the STARmap
Figure 3: Spatial clustering results on the HER2-positive Breast Cancer dataset across eight tissue sections (A1–H1).
Figure 4: Multi-modal spatial clustering results on the Human Lymph Node A1 and MISAR-seq Mouse E15.5 Brain
Original abstract

Spatial domain identification requires jointly modeling molecular signatures and physical coordinates, yet current tools frequently over-smooth biological boundaries, require user-specified cluster numbers, and lack principled multimodal integration. We introduce BaySC, an integrative Bayesian spatial clustering framework for spatial domain identification. BaySC inherently learns the true number of spatial domains from the data by employing a Mixture of Finite Mixtures (MFM) prior. Tissue topology is modeled via a Markov Random Field (MRF) applied to discrete cellular assignments, a strategy that enforces local spatial coherence without distorting the underlying gene expression features. This enables BaySC to accurately map contiguous tissue layers as well as geographically scattered, transcriptionally identical cell populations. Furthermore, BaySC handles spatial multi-omics data through a weighted log-likelihood fusion mechanism executed via Gibbs sampling. This approach assigns interpretable weights to each modality, allowing users to quantify the biological relevance of different data layers to the final tissue map. Validated across ten single-modal spatial transcriptomics and two spatial multi-omics datasets, BaySC yields highly interpretable probabilistic outputs. It demonstrates competitive accuracy on standard clustering metrics and consistently outperforms existing tools in preserving spatial topography, as measured by spatially-aware Adjusted Rand Index (spARI).

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces BaySC, a Bayesian spatial clustering framework for identifying tissue domains in spatial multi-omics data. BaySC employs a Mixture of Finite Mixtures (MFM) prior to learn the number of spatial domains directly from the data. It models tissue topology using a Markov Random Field (MRF) on discrete cellular assignments to enforce local spatial coherence. For multi-omics data, it uses a weighted log-likelihood fusion mechanism within Gibbs sampling to integrate modalities with interpretable weights. The method is validated on ten single-modal spatial transcriptomics datasets and two spatial multi-omics datasets, demonstrating competitive accuracy on clustering metrics and superior performance in preserving spatial topography via the spatially-aware Adjusted Rand Index (spARI).

Significance. If the key modeling assumptions hold—specifically that the MRF prior on assignments does not distort transcriptional signals and that the modality weights reflect biological relevance—this approach could address limitations in existing spatial clustering tools by automatically inferring domain number, providing probabilistic outputs, and enabling multimodal integration without heavy user tuning. The multi-dataset validation supports potential robustness for tissue architecture mapping in spatial biology.

major comments (2)
  1. [Methods (generative model and MRF specification)] The central claim that the MRF applied to discrete cellular assignments 'enforces local spatial coherence without distorting the underlying gene expression features' is load-bearing for the method's ability to recover both contiguous layers and scattered transcriptionally identical populations. In the generative model, p(expression | z) may be independent of the MRF prior p(z | neighbors), yet the posterior necessarily compromises between them; a non-negligible MRF interaction parameter risks pulling assignments toward spatial contiguity even for transcriptionally identical but geographically distant cells. This requires either an explicit derivation of conditions under which distortion remains negligible or targeted simulations with known scattered populations (see also the modeling choice highlighted in the stress-test note).
  2. [Methods (multi-omics fusion step)] The weighted log-likelihood fusion for multi-omics integration is described as assigning 'interpretable weights' to modalities via Gibbs sampling, but the manuscript does not derive these weights from first principles or demonstrate that they capture biological relevance rather than acting as tunable parameters. Without sensitivity analyses, ablation studies comparing weighted vs. unweighted fusion, or convergence diagnostics for the weight estimation, it is unclear whether this mechanism provides a principled advantage over existing multimodal approaches.
minor comments (2)
  1. [Abstract] The abstract would be strengthened by briefly referencing the form of the MFM prior or the MRF energy function to convey technical details to readers familiar with Bayesian nonparametrics and spatial models.
  2. [Results] In the results, ensure that spARI and other metrics are reported with variability measures (e.g., standard deviation across replicates or datasets) and direct statistical comparisons to baseline methods to support the outperformance claims.
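Major comment 1 can be made concrete with a short calculation. Assume, purely for illustration, a Potts-type MRF with interaction parameter $\beta$, a fixed set of candidate domains, and no cluster-size term from the MFM prior. Let $\ell_i(k)$ denote the (fused) log-likelihood of spot $i$ under domain $k$ and $n_i(k)$ the number of its spatial neighbors currently in domain $k$. None of this notation is taken from the paper.

```latex
\[
\log \frac{p(z_i = k \mid z_{-i}, x)}{p(z_i = k' \mid z_{-i}, x)}
  \;=\; \bigl[\ell_i(k) - \ell_i(k')\bigr]
  \;+\; \beta\,\bigl[n_i(k) - n_i(k')\bigr].
\]
```

Under these assumptions, a spot whose expression favors $k$ by $\Delta\ell = \ell_i(k) - \ell_i(k') > 0$ is more likely to be assigned to $k'$ only when $\beta\,[n_i(k') - n_i(k)] > \Delta\ell$. If each spot has at most $d$ neighbors, the spatial term cannot overturn the expression evidence whenever $\Delta\ell > \beta d$. A condition of this kind is what the requested derivation would need to establish for the actual model.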

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments, which help clarify key modeling aspects of BaySC. We address each major comment below with clarifications and indicate the revisions planned for the manuscript.

Point-by-point responses
  1. Referee: The central claim that the MRF applied to discrete cellular assignments 'enforces local spatial coherence without distorting the underlying gene expression features' is load-bearing for the method's ability to recover both contiguous layers and scattered transcriptionally identical populations. In the generative model, p(expression | z) may be independent of the MRF prior p(z | neighbors), yet the posterior necessarily compromises between them; a non-negligible MRF interaction parameter risks pulling assignments toward spatial contiguity even for transcriptionally identical but geographically distant cells. This requires either an explicit derivation of conditions under which distortion remains negligible or targeted simulations with known scattered populations (see also the modeling choice highlighted in the stress-test note).

    Authors: We agree that the joint posterior involves a trade-off between the expression likelihood and the MRF prior on assignments. The original claim in the manuscript is motivated by the fact that the likelihood term p(expression | z) depends directly on the cluster assignments while the MRF only regularizes the spatial configuration of those assignments; sufficiently strong transcriptional evidence can therefore dominate. To address the concern rigorously, the revised manuscript will include (i) a short derivation bounding the MRF influence as a function of the likelihood precision and the interaction parameter, and (ii) new simulation experiments on synthetic data containing both contiguous domains and deliberately scattered, transcriptionally identical populations. We will also revise the abstract and introduction to describe the mechanism as “encouraging local coherence while permitting data-driven exceptions” rather than claiming zero distortion. revision: yes

  2. Referee: The weighted log-likelihood fusion for multi-omics integration is described as assigning 'interpretable weights' to modalities via Gibbs sampling, but the manuscript does not derive these weights from first principles or demonstrate that they capture biological relevance rather than acting as tunable parameters. Without sensitivity analyses, ablation studies comparing weighted vs. unweighted fusion, or convergence diagnostics for the weight estimation, it is unclear whether this mechanism provides a principled advantage over existing multimodal approaches.

    Authors: The fusion weights are sampled from their full conditional posterior within the Gibbs sampler and are therefore data-driven rather than user-specified tuning parameters. Nevertheless, we acknowledge that additional empirical support is needed to demonstrate their interpretability and advantage. In the revision we will add (i) sensitivity analyses over the hyperprior on the weights, (ii) ablation experiments that compare weighted fusion against equal-weight and single-modality baselines on the two multi-omics datasets, and (iii) convergence diagnostics (trace plots and Gelman–Rubin statistics) for the weight parameters. These results will be presented in a new supplementary section. revision: yes
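The Gelman–Rubin diagnostic promised in the second response can be sketched in a few lines. The version below is the classic potential scale reduction factor for a scalar parameter, applied here to hypothetical post-burn-in draws of one modality weight from several independent chains. The function name and the simulated chains are illustrative and do not come from the paper.

```python
import numpy as np

def gelman_rubin(chains):
    """Classic R-hat for a scalar parameter.

    chains: array (m, n) holding m chains of n post-burn-in draws each.
    """
    chains = np.asarray(chains, dtype=float)
    m, n = chains.shape
    within = chains.var(axis=1, ddof=1).mean()        # W: mean within-chain variance
    between = n * chains.mean(axis=1).var(ddof=1)     # B: between-chain variance
    var_hat = (n - 1) / n * within + between / n      # pooled variance estimate
    return float(np.sqrt(var_hat / within))

# Simulated stand-ins for weight draws (not real BaySC output).
rng = np.random.default_rng(0)
mixed = rng.normal(size=(4, 2000))                    # chains agree
stuck = mixed + np.array([0.0, 0.0, 3.0, 3.0])[:, None]  # chains disagree
rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 indicate that the chains agree on the weight's distribution; values well above 1, as in the shifted example, indicate that the chains have not mixed and the reported weights should not yet be interpreted.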

Circularity Check

0 steps flagged

No significant circularity; framework introduces independent modeling components

full rationale

The paper presents BaySC as a new integrative Bayesian framework using an MFM prior to learn the number of domains, an MRF on discrete assignments for spatial coherence, and a weighted log-likelihood fusion for multi-omics. These elements are described as novel combinations without any quoted derivation that reduces a claimed prediction or result directly back to fitted inputs or self-citations by construction. No equations or steps in the provided abstract or reader summary exhibit self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that collapse the central claims. The validation on external datasets further supports treating the approach as self-contained rather than tautological.

Axiom & Free-Parameter Ledger

1 free parameters · 2 axioms · 0 invented entities

The central claims rest on standard Bayesian modeling assumptions plus two domain-specific modeling choices whose validity is not independently verified in the abstract.

free parameters (1)
  • modality weights
    Weights that determine the contribution of each omics layer in the fused log-likelihood; per the simulated rebuttal, they are sampled from their full conditional within the Gibbs sampler rather than set by the user.
axioms (2)
  • domain assumption: The Mixture of Finite Mixtures prior can recover the true number of spatial domains from the observed data.
    Invoked to eliminate the need for user-specified cluster numbers.
  • domain assumption: A Markov Random Field on discrete cell assignments enforces local spatial coherence without distorting gene expression features.
    Core premise for preserving tissue topology while modeling spatial structure.

pith-pipeline@v0.9.0 · 5765 in / 1524 out tokens · 56882 ms · 2026-05-19T15:53:46.368557+00:00 · methodology


