BaySC: Uncovering Tissue Architecture in Spatial Multi-Omics via Probabilistic Spatial Clustering
Pith reviewed 2026-05-19 15:53 UTC · model grok-4.3
The pith
BaySC automatically infers the number of spatial domains from data and integrates multi-omics layers while enforcing local tissue coherence.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
BaySC models spatial domain identification by combining a Mixture of Finite Mixtures prior, which infers the number of domains from the data, with a Markov Random Field on discrete cell assignments, which enforces local spatial coherence. This setup allows accurate recovery of tissue architecture for both contiguous layers and scattered, transcriptionally identical populations. For multi-omics data, a weighted log-likelihood fusion performed within Gibbs sampling assigns an interpretable weight to each data modality.
What carries the argument
Mixture of Finite Mixtures (MFM) prior for automatic domain count selection together with Markov Random Field (MRF) on discrete cellular assignments for spatial coherence, plus weighted log-likelihood fusion executed by Gibbs sampling for multimodal integration.
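To make the interplay between these components concrete, here is a minimal sketch of a single-site Gibbs conditional. It assumes a Potts-form MRF with interaction strength `beta` and a unit-variance Gaussian likelihood on a one-dimensional feature. Neither the Potts form, the Gaussian likelihood, nor any of the names below are taken from the paper, and the number of domains is held fixed, so the MFM step that adds or removes domains is omitted.

```python
import math

def site_conditional(x, neighbor_labels, means, beta):
    """Return p(z_i = k | x_i, neighbor labels) for every domain k.

    Assumptions (illustrative only, not BaySC's actual model):
      - likelihood: x_i ~ Normal(means[k], 1)
      - prior: Potts MRF, log p(z_i = k | neighbors) = beta * (#neighbors labeled k) + const
    """
    log_post = []
    for k, mu in enumerate(means):
        log_lik = -0.5 * (x - mu) ** 2            # Gaussian log-density up to a constant
        same = sum(1 for z in neighbor_labels if z == k)
        log_post.append(log_lik + beta * same)
    # Normalize with the log-sum-exp trick for numerical stability.
    top = max(log_post)
    weights = [math.exp(v - top) for v in log_post]
    total = sum(weights)
    return [w / total for w in weights]

# A cell whose expression sits near domain 1 while all four neighbors are in domain 0.
means = [0.0, 3.0]
no_smoothing = site_conditional(2.8, [0, 0, 0, 0], means, beta=0.0)
strong_smoothing = site_conditional(2.8, [0, 0, 0, 0], means, beta=2.0)
```

In this toy setting the cell follows its expression when `beta=0.0` and is pulled toward its neighbors' domain when `beta=2.0`, which shows how the prior strength governs whether a scattered cell keeps its transcriptional label.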
Load-bearing premise
The approach assumes that an MRF applied to discrete cell assignments will enforce local spatial coherence without distorting the underlying gene expression features and that the weighted fusion will assign biologically meaningful importance to each modality.
What would settle it
Run BaySC on a spatial multi-omics dataset whose domain boundaries have been independently verified by expert histological annotation or orthogonal imaging and check whether the recovered domains match the known topography or diverge from it.
Original abstract
Spatial domain identification requires jointly modeling molecular signatures and physical coordinates, yet current tools frequently over-smooth biological boundaries, require user-specified cluster numbers, and lack principled multimodal integration. We introduce BaySC, an integrative Bayesian spatial clustering framework for spatial domain identification. BaySC inherently learns the true number of spatial domains from the data by employing a Mixture of Finite Mixtures (MFM) prior. Tissue topology is modeled via a Markov Random Field (MRF) applied to discrete cellular assignments, a strategy that enforces local spatial coherence without distorting the underlying gene expression features. This enables BaySC to accurately map contiguous tissue layers as well as geographically scattered, transcriptionally identical cell populations. Furthermore, BaySC handles spatial multi-omics data through a weighted log-likelihood fusion mechanism executed via Gibbs sampling. This approach assigns interpretable weights to each modality, allowing users to quantify the biological relevance of different data layers to the final tissue map. Validated across ten single-modal spatial transcriptomics and two spatial multi-omics datasets, BaySC yields highly interpretable probabilistic outputs. It demonstrates competitive accuracy on standard clustering metrics and consistently outperforms existing tools in preserving spatial topography, as measured by spatially-aware Adjusted Rand Index (spARI).
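The fusion step described in the abstract can be illustrated with a minimal sketch, assuming the fused score for a domain is a weighted sum of per-modality log-likelihoods. The function names, the modality labels, and the numbers are hypothetical. The paper's sampler also draws the weights themselves, which this sketch does not attempt.

```python
def fuse_log_likelihoods(log_liks, weights):
    """Combine per-modality log-likelihoods into one score per domain.

    log_liks: dict modality -> list of log-likelihoods, one per candidate domain
    weights:  dict modality -> non-negative weight (hypothetical values)
    """
    n_domains = len(next(iter(log_liks.values())))
    fused = [0.0] * n_domains
    for modality, values in log_liks.items():
        for k, value in enumerate(values):
            fused[k] += weights[modality] * value
    return fused

def best_domain(scores):
    """Index of the highest fused score."""
    return max(range(len(scores)), key=lambda k: scores[k])

# Hypothetical cell: one modality favors domain 0, the other favors domain 1.
log_liks = {"rna": [-1.0, -3.0], "atac": [-4.0, -1.5]}
rna_heavy = fuse_log_likelihoods(log_liks, {"rna": 0.8, "atac": 0.2})
atac_heavy = fuse_log_likelihoods(log_liks, {"rna": 0.2, "atac": 0.8})
```

Calling `best_domain` on the two fused score lists shows the assignment switching with the weights, which is why the meaning attached to those weights matters for interpreting the final tissue map.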
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces BaySC, a Bayesian spatial clustering framework for identifying tissue domains in spatial multi-omics data. BaySC employs a Mixture of Finite Mixtures (MFM) prior to learn the number of spatial domains directly from the data. It models tissue topology using a Markov Random Field (MRF) on discrete cellular assignments to enforce local spatial coherence. For multi-omics data, it uses a weighted log-likelihood fusion mechanism within Gibbs sampling to integrate modalities with interpretable weights. The method is validated on ten single-modal spatial transcriptomics datasets and two spatial multi-omics datasets, demonstrating competitive accuracy on clustering metrics and superior performance in preserving spatial topography via the spatially-aware Adjusted Rand Index (spARI).
Significance. If the key modeling assumptions hold—specifically that the MRF prior on assignments does not distort transcriptional signals and that the modality weights reflect biological relevance—this approach could address limitations in existing spatial clustering tools by automatically inferring domain number, providing probabilistic outputs, and enabling multimodal integration without heavy user tuning. The multi-dataset validation supports potential robustness for tissue architecture mapping in spatial biology.
major comments (2)
- [Methods (generative model and MRF specification)] The central claim that the MRF applied to discrete cellular assignments 'enforces local spatial coherence without distorting the underlying gene expression features' is load-bearing for the method's ability to recover both contiguous layers and scattered transcriptionally identical populations. In the generative model, p(expression | z) may be independent of the MRF prior p(z | neighbors), yet the posterior necessarily compromises between them; a non-negligible MRF interaction parameter risks pulling assignments toward spatial contiguity even for transcriptionally identical but geographically distant cells. This requires either an explicit derivation of conditions under which distortion remains negligible or targeted simulations with known scattered populations (see also the modeling choice highlighted in the stress-test note).
- [Methods (multi-omics fusion step)] The weighted log-likelihood fusion for multi-omics integration is described as assigning 'interpretable weights' to modalities via Gibbs sampling, but the manuscript does not derive these weights from first principles or demonstrate that they capture biological relevance rather than acting as tunable parameters. Without sensitivity analyses, ablation studies comparing weighted vs. unweighted fusion, or convergence diagnostics for the weight estimation, it is unclear whether this mechanism provides a principled advantage over existing multimodal approaches.
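The trade-off in the first major comment can be written out under one common assumption, namely that the MRF takes a Potts form with interaction strength $\beta \ge 0$. The symbols below ($\ell_i$, $n_i$, $d_i$, $w_m$) are introduced here for illustration and are not the manuscript's notation. Cluster-size terms contributed by the MFM prior are ignored.

```latex
% Log posterior odds for cell i between domains a and b, given all other labels:
\log \frac{p(z_i = a \mid x_i, z_{-i})}{p(z_i = b \mid x_i, z_{-i})}
  = \underbrace{\ell_i(a) - \ell_i(b)}_{\text{expression evidence}}
  + \underbrace{\beta \, \bigl[ n_i(a) - n_i(b) \bigr]}_{\text{spatial prior}},
% where \ell_i(k) = \log p(x_i \mid z_i = k) and n_i(k) counts neighbors of i labeled k.
% With d_i neighbors in total, the prior term is bounded:
\bigl| \beta \, [ n_i(a) - n_i(b) ] \bigr| \le \beta \, d_i .
% Hence the sign of the odds is set by expression whenever
\bigl| \ell_i(a) - \ell_i(b) \bigr| > \beta \, d_i .
% Under weighted fusion over modalities m, replace \ell_i(k) by \sum_m w_m \, \ell_i^{(m)}(k).
```

Read this way, a scattered cell keeps its transcriptional label only when its log-likelihood gap exceeds $\beta d_i$, which is the kind of condition the requested derivation would need to establish for the actual model.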
minor comments (2)
- [Abstract] The abstract would be strengthened by briefly referencing the form of the MFM prior or the MRF energy function to convey technical details to readers familiar with Bayesian nonparametrics and spatial models.
- [Results] In the results, ensure that spARI and other metrics are reported with variability measures (e.g., standard deviation across replicates or datasets) and direct statistical comparisons to baseline methods to support the outperformance claims.
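As an illustration of the reporting the second minor comment asks for, the sketch below computes the standard Adjusted Rand Index by pair counting and summarizes it as mean and standard deviation across replicates. It implements plain ARI, not spARI, whose spatial weighting is not described in this review. The labels and replicate data are invented for the example.

```python
from collections import Counter
from math import comb
from statistics import mean, stdev

def adjusted_rand_index(truth, pred):
    """Standard (non-spatial) ARI from the contingency table; assumes len(truth) >= 2."""
    n = len(truth)
    pairs_joint = sum(comb(c, 2) for c in Counter(zip(truth, pred)).values())
    pairs_truth = sum(comb(c, 2) for c in Counter(truth).values())
    pairs_pred = sum(comb(c, 2) for c in Counter(pred).values())
    expected = pairs_truth * pairs_pred / comb(n, 2)
    max_index = 0.5 * (pairs_truth + pairs_pred)
    if max_index == expected:      # degenerate case, e.g. both labelings are one cluster
        return 1.0
    return (pairs_joint - expected) / (max_index - expected)

def summarize(scores):
    """Mean and sample standard deviation across replicates."""
    return mean(scores), stdev(scores)

truth = [0, 0, 0, 1, 1, 1]
replicates = [
    [1, 1, 1, 0, 0, 0],   # same partition, labels swapped
    [0, 0, 1, 1, 1, 1],   # one cell misassigned
    [0, 0, 0, 1, 1, 0],   # a different cell misassigned
]
scores = [adjusted_rand_index(truth, r) for r in replicates]
ari_mean, ari_sd = summarize(scores)
```

Reporting `ari_mean` together with `ari_sd`, rather than a single best run, is the sort of variability measure the comment has in mind.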
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which help clarify key modeling aspects of BaySC. We address each major comment below with clarifications and indicate the revisions planned for the manuscript.
- Referee: The central claim that the MRF applied to discrete cellular assignments 'enforces local spatial coherence without distorting the underlying gene expression features' is load-bearing for the method's ability to recover both contiguous layers and scattered transcriptionally identical populations. In the generative model, p(expression | z) may be independent of the MRF prior p(z | neighbors), yet the posterior necessarily compromises between them; a non-negligible MRF interaction parameter risks pulling assignments toward spatial contiguity even for transcriptionally identical but geographically distant cells. This requires either an explicit derivation of conditions under which distortion remains negligible or targeted simulations with known scattered populations (see also the modeling choice highlighted in the stress-test note).
  Authors: We agree that the joint posterior involves a trade-off between the expression likelihood and the MRF prior on assignments. The original claim in the manuscript is motivated by the fact that the likelihood term p(expression | z) depends directly on the cluster assignments while the MRF only regularizes the spatial configuration of those assignments; sufficiently strong transcriptional evidence can therefore dominate. To address the concern rigorously, the revised manuscript will include (i) a short derivation bounding the MRF influence as a function of the likelihood precision and the interaction parameter, and (ii) new simulation experiments on synthetic data containing both contiguous domains and deliberately scattered, transcriptionally identical populations. We will also revise the abstract and introduction to describe the mechanism as “encouraging local coherence while permitting data-driven exceptions” rather than claiming zero distortion.
  Revision: yes
- Referee: The weighted log-likelihood fusion for multi-omics integration is described as assigning 'interpretable weights' to modalities via Gibbs sampling, but the manuscript does not derive these weights from first principles or demonstrate that they capture biological relevance rather than acting as tunable parameters. Without sensitivity analyses, ablation studies comparing weighted vs. unweighted fusion, or convergence diagnostics for the weight estimation, it is unclear whether this mechanism provides a principled advantage over existing multimodal approaches.
  Authors: The fusion weights are sampled from their full conditional posterior within the Gibbs sampler and are therefore data-driven rather than user-specified tuning parameters. Nevertheless, we acknowledge that additional empirical support is needed to demonstrate their interpretability and advantage. In the revision we will add (i) sensitivity analyses over the hyperprior on the weights, (ii) ablation experiments that compare weighted fusion against equal-weight and single-modality baselines on the two multi-omics datasets, and (iii) convergence diagnostics (trace plots and Gelman–Rubin statistics) for the weight parameters. These results will be presented in a new supplementary section.
  Revision: yes
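For the convergence diagnostics the authors promise, a minimal sketch of the classic Gelman–Rubin statistic is given below, assuming several independent, equal-length chains of a scalar modality weight. The chains are simulated with Python's `random` module purely for illustration. Nothing here reproduces BaySC output, and the split-chain or rank-normalized variants used by modern packages are not implemented.

```python
import random
from statistics import mean, variance

def gelman_rubin(chains):
    """Classic potential scale reduction factor for equal-length scalar chains."""
    n = len(chains[0])                                   # draws per chain
    chain_means = [mean(c) for c in chains]
    within = mean([variance(c) for c in chains])         # W: average within-chain variance
    between = n * variance(chain_means)                  # B: n times variance of chain means
    pooled = (n - 1) / n * within + between / n          # pooled variance estimate
    return (pooled / within) ** 0.5

random.seed(0)
# Hypothetical weight draws: three chains exploring the same region ...
mixed = [[random.gauss(0.6, 0.05) for _ in range(500)] for _ in range(3)]
# ... versus three chains stuck around different values.
stuck = [[random.gauss(center, 0.05) for _ in range(500)] for center in (0.2, 0.5, 0.8)]

rhat_mixed = gelman_rubin(mixed)
rhat_stuck = gelman_rubin(stuck)
```

Values close to 1 are usually read as consistent with convergence, while values well above 1 indicate that the chains disagree about the weight.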
Circularity Check
No significant circularity; framework introduces independent modeling components
The paper presents BaySC as a new integrative Bayesian framework using an MFM prior to learn the number of domains, an MRF on discrete assignments for spatial coherence, and a weighted log-likelihood fusion for multi-omics. These elements are described as novel combinations without any quoted derivation that reduces a claimed prediction or result directly back to fitted inputs or self-citations by construction. No equations or steps in the provided abstract or reader summary exhibit self-definitional loops, fitted parameters renamed as predictions, or load-bearing self-citations that collapse the central claims. The validation on external datasets further supports treating the approach as self-contained rather than tautological.
Axiom & Free-Parameter Ledger
free parameters (1)
- modality weights
axioms (2)
- domain assumption: The Mixture of Finite Mixtures prior can recover the true number of spatial domains from the observed data.
- domain assumption: A Markov Random Field on discrete cell assignments enforces local spatial coherence without distorting gene expression features.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: LogicNat recovery and embed_strictMono
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Passage: Tissue topology is modeled via a Markov Random Field (MRF) applied to discrete cellular assignments... without distorting the underlying gene expression features.
- IndisputableMonolith/Foundation/AlphaCoordinateFixation.lean: alpha_pin_under_high_calibration
  Tag: unclear. Relation between the paper passage and the cited Recognition theorem.
  Passage: BaySC inherently learns the true number of spatial domains from the data by employing a Mixture of Finite Mixtures (MFM) prior.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.