INST-Align: Implicit Neural Alignment for Spatial Transcriptomics via Canonical Expression Fields
Pith reviewed 2026-05-10 15:26 UTC · model grok-4.3
The pith
A shared canonical expression field enables joint unsupervised alignment and reconstruction of spatial transcriptomics slices.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
INST-Align is an unsupervised pairwise framework that couples a coordinate-based deformation network with a shared Canonical Expression Field, an implicit neural representation mapping spatial coordinates to expression embeddings. The two-phase training first establishes a stable canonical embedding space and then jointly optimizes deformation and spatial-feature matching. Cross-slice parameter sharing of the canonical field regularizes ambiguous correspondences and absorbs batch variation, yielding state-of-the-art OT Accuracy of 0.702, NN Accuracy of 0.719, and up to 94.9% Chamfer distance reduction on large-deformation data across nine datasets while producing biologically meaningful 3D-s
What carries the argument
Canonical Expression Field: an implicit neural representation mapping spatial coordinates to expression embeddings that is shared across slices to regularize correspondences and absorb batch effects.
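As a rough illustration of what "shared across slices" means, the sketch below builds a tiny coordinate network in plain Python and queries it from two slices. Everything in it is an assumption made for illustration: the Fourier-feature encoding, the layer sizes, the random weights, the placeholder shifts, and the names `CanonicalField`, `fourier_features`, `deform_a` and `deform_b` are hypothetical and not taken from the paper.

```python
import math
import random


def fourier_features(x, y, num_bands=4):
    """Encode a 2D coordinate with sines and cosines at doubling frequencies."""
    feats = []
    for k in range(num_bands):
        freq = (2.0 ** k) * math.pi
        for v in (x, y):
            feats.append(math.sin(freq * v))
            feats.append(math.cos(freq * v))
    return feats


class CanonicalField:
    """Hypothetical coordinate-to-embedding MLP with one hidden layer."""

    def __init__(self, num_bands=4, hidden=16, embed_dim=8, seed=0):
        rng = random.Random(seed)
        in_dim = 4 * num_bands  # (sin, cos) for each of x and y, per band
        self.num_bands = num_bands
        self.w1 = [[rng.gauss(0, 0.5) for _ in range(in_dim)] for _ in range(hidden)]
        self.w2 = [[rng.gauss(0, 0.5) for _ in range(hidden)] for _ in range(embed_dim)]

    def __call__(self, x, y):
        feats = fourier_features(x, y, self.num_bands)
        hidden = [math.tanh(sum(w * f for w, f in zip(row, feats))) for row in self.w1]
        return [sum(w * h for w, h in zip(row, hidden)) for row in self.w2]


# One field instance, i.e. one set of weights, serves both slices.
field = CanonicalField()


# Placeholder deformations (assumed): slice A is the reference, slice B is shifted.
def deform_a(x, y):
    return x, y


def deform_b(x, y):
    return x - 0.1, y + 0.05


spot_a = (0.30, 0.40)  # a spot in slice A
spot_b = (0.40, 0.35)  # the matching spot in slice B, before alignment

emb_a = field(*deform_a(*spot_a))
emb_b = field(*deform_b(*spot_b))
```

Because both slices query the same weights, two spots that land on the same canonical coordinate receive the same embedding (up to floating-point rounding). That is the sense in which a shared field can constrain correspondences.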
If this is right
- Joint optimization produces mutually constrained alignment and representation learning.
- Mean OT Accuracy reaches 0.702 and NN Accuracy reaches 0.719 across nine datasets.
- Chamfer distance drops by up to 94.9% on large-deformation sections relative to baselines.
- The learned embeddings are biologically meaningful and support coherent 3D tissue reconstruction.
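To make the Chamfer figure concrete, here is a minimal sketch of how a Chamfer distance and a percentage reduction could be computed. The exact variant used in the paper is not stated on this page, so the symmetric mean nearest-neighbour form below is an assumption, and the coordinates are toy values rather than data from the paper.

```python
import math


def chamfer_distance(points_a, points_b):
    """Symmetric Chamfer distance: mean nearest-neighbour distance in both directions."""
    def one_way(src, dst):
        return sum(min(math.dist(p, q) for q in dst) for p in src) / len(src)
    return one_way(points_a, points_b) + one_way(points_b, points_a)


def percent_reduction(baseline, ours):
    """Relative reduction of `ours` with respect to `baseline`, in percent."""
    return 100.0 * (baseline - ours) / baseline


# Toy coordinates (illustrative only, not data from the paper).
reference = [(0.0, 0.0), (1.0, 0.0), (0.0, 1.0)]
before = [(0.5, 0.0), (1.5, 0.0), (0.5, 1.0)]  # reference shifted by 0.5 in x
after = [(0.1, 0.0), (1.1, 0.0), (0.1, 1.0)]   # reference shifted by 0.1 in x

d_before = chamfer_distance(reference, before)
d_after = chamfer_distance(reference, after)
reduction = percent_reduction(d_before, d_after)
```

In this toy case the residual shift drops from 0.5 to 0.1, so the Chamfer distance falls from 1.0 to about 0.2, a reduction of roughly 80%. A reported reduction of 94.9% would be read the same way, relative to whichever baseline distance is used as the denominator.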
Where Pith is reading between the lines
- The same implicit-field regularizer could be applied to align other spatially resolved modalities such as multiplexed imaging or spatial proteomics.
- Chaining pairwise canonical fields across many slices may allow direct multi-slice integration without sequential pairwise steps.
- If the canonical embeddings capture condition-invariant signals, they could support cross-sample or cross-condition comparison without additional alignment.
Load-bearing premise
The shared Canonical Expression Field can effectively regularize ambiguous correspondences and absorb batch variation through cross-slice parameter sharing without any supervision or external validation.
What would settle it
On a held-out dataset with known ground-truth deformations, the shared-field version shows no improvement in alignment metrics over independently trained per-slice fields.
Original abstract
Spatial transcriptomics (ST) measures mRNA expression while preserving spatial organization, but multi-slice analysis faces two coupled difficulties: large non-rigid deformations across slices and inter-slice batch effects when alignment and integration are treated independently. We present INST-Align, an unsupervised pairwise framework that couples a coordinate-based deformation network with a shared Canonical Expression Field, an implicit neural representation mapping spatial coordinates to expression embeddings, for joint alignment and reconstruction. A two-phase training strategy first establishes a stable canonical embedding space and then jointly optimizes deformation and spatial-feature matching, enabling mutually constrained alignment and representation learning. Cross-slice parameter sharing of the canonical field regularizes ambiguous correspondences and absorbs batch variation. Across nine datasets, INST-Align achieves state-of-the-art mean OT Accuracy (0.702), NN Accuracy (0.719), and Chamfer distance, with Chamfer reductions of up to 94.9% on large-deformation sections relative to the strongest baseline. The framework also yields biologically meaningful spatial embeddings and coherent 3D tissue reconstruction. The code will be released after review phase.
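The two-phase idea can be sketched on a toy 1D problem. This is only a schematic under stated assumptions: the "field" is a linear function `a*u + b`, the "deformation" is a single shift `t` for slice B, the loss is plain mean squared error, and the learning rates and step counts are arbitrary. None of these choices, nor the helper `mse_grads`, come from the paper.

```python
def mse_grads(a, b, t, coords, expr, train_shift):
    """Mean squared error of the linear field a*u + b at u = x + t, with gradients."""
    n = len(coords)
    loss = ga = gb = gt = 0.0
    for x, e in zip(coords, expr):
        u = x + t
        r = a * u + b - e
        loss += r * r / n
        ga += 2 * r * u / n
        gb += 2 * r / n
        if train_shift:
            gt += 2 * r * a / n
    return loss, ga, gb, gt


# Toy 1D "tissue": expression is 2x + 1; slice B is the same tissue shifted by +0.3.
coords_a = [0.0, 0.25, 0.5, 0.75, 1.0]
expr = [2 * x + 1 for x in coords_a]
coords_b = [x + 0.3 for x in coords_a]

a, b, t = 0.0, 0.0, 0.0  # field weights (a, b) and slice-B shift t

# Phase 1: fit the field on the reference slice; the deformation stays frozen.
for _ in range(2000):
    _, ga, gb, _ = mse_grads(a, b, 0.0, coords_a, expr, train_shift=False)
    a, b = a - 0.1 * ga, b - 0.1 * gb

# Phase 2: jointly update the shared field and the slice-B deformation.
for _ in range(5000):
    _, ga_a, gb_a, _ = mse_grads(a, b, 0.0, coords_a, expr, train_shift=False)
    loss_b, ga_b, gb_b, gt = mse_grads(a, b, t, coords_b, expr, train_shift=True)
    a -= 0.05 * (ga_a + ga_b)
    b -= 0.05 * (gb_a + gb_b)
    t -= 0.05 * gt
```

In this toy setting the shift converges towards -0.3, undoing the offset between slices, while the field weights stay close to the values found in phase 1. The point of the warm-up is that the deformation starts moving only once the field gives a meaningful signal to match against.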
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces INST-Align, an unsupervised pairwise framework for spatial transcriptomics slice alignment. It couples a coordinate-based deformation network with a shared Canonical Expression Field (an implicit neural representation mapping coordinates to expression embeddings) via a two-phase training procedure: first stabilizing the canonical space, then jointly optimizing deformation and spatial-feature matching. Cross-slice parameter sharing in the canonical field is used to regularize ambiguous correspondences and absorb batch effects. On nine datasets the method reports state-of-the-art mean OT Accuracy (0.702), NN Accuracy (0.719), and Chamfer distance, with Chamfer reductions up to 94.9% on large-deformation sections, plus biologically coherent 3D reconstructions. Code release is promised.
Significance. If the quantitative gains and regularization mechanism hold under detailed scrutiny, the work offers a principled unsupervised route to joint alignment and representation learning for multi-slice ST data, addressing the coupled problems of non-rigid deformation and batch variation through implicit neural fields. The continuous canonical representation and cross-slice sharing constitute a clear technical contribution, and the promised code release supports reproducibility. The reported Chamfer improvements on large-deformation cases are particularly noteworthy if they generalize beyond the evaluated sections.
major comments (2)
- [§3 and §4] §3 (Method) and §4 (Experiments): the abstract and high-level description claim that cross-slice parameter sharing of the Canonical Expression Field regularizes correspondences and absorbs batch variation without supervision, yet no derivation, loss-term weighting schedule, or ablation isolating the sharing effect is provided. It is therefore unclear whether the reported OT/NN accuracy gains are independent of the joint optimization loop or whether they reduce to quantities already fitted during training.
- [§4] §4 (Experiments), Table 1 or equivalent results table: mean OT Accuracy 0.702 and NN Accuracy 0.719 are stated as state-of-the-art across nine datasets, but the manuscript supplies neither per-dataset standard deviations, statistical significance tests against the strongest baseline, nor explicit data-exclusion or preprocessing criteria. Without these, the robustness of the central empirical claim cannot be fully assessed.
minor comments (2)
- [Abstract] The abstract states that 'the code will be released after review phase.' Adding a GitHub link or Zenodo DOI in the camera-ready version would strengthen reproducibility.
- [§3.1] Notation for the Canonical Expression Field (e.g., the mapping from coordinates to embeddings) should be introduced with an explicit equation in §3.1 to aid readers in following the two-phase training description.
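For orientation, the kind of explicit definition this comment asks for might look like the following. The paper's own notation is not reproduced on this page, so every symbol here is an assumption: $f_\theta$ for the shared field with weights $\theta$, $\phi_\psi$ for the deformation network with weights $\psi$, $x_i^{(s)}$ for the coordinate of spot $i$ in slice $s$, and $d$ for the embedding dimension.

```latex
% Hypothetical notation, not taken from the paper.
f_\theta : \mathbb{R}^2 \to \mathbb{R}^d, \qquad
\phi_\psi : \mathbb{R}^2 \to \mathbb{R}^2, \qquad
z_i^{(s)} = f_\theta\bigl(\phi_\psi(x_i^{(s)})\bigr)
```

Under this reading, cross-slice sharing means that $\theta$ carries no slice index: every slice is embedded by the same $f_\theta$ after its coordinates are deformed.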
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and positive overall assessment of our work. We address each major comment below with point-by-point responses and indicate the revisions we will make to the manuscript.
- Referee: [§3 and §4] §3 (Method) and §4 (Experiments): the abstract and high-level description claim that cross-slice parameter sharing of the Canonical Expression Field regularizes correspondences and absorbs batch variation without supervision, yet no derivation, loss-term weighting schedule, or ablation isolating the sharing effect is provided. It is therefore unclear whether the reported OT/NN accuracy gains are independent of the joint optimization loop or whether they reduce to quantities already fitted during training.
Authors: The cross-slice parameter sharing is implemented by maintaining a single set of network weights for the Canonical Expression Field that is optimized jointly across all input slices. This architectural choice creates a shared embedding space that serves as a regularizer: the deformation network must map each slice into a consistent canonical representation, which constrains ambiguous correspondences and absorbs batch effects without requiring explicit alignment supervision. The two-phase training first stabilizes the canonical field on individual slices before enabling joint optimization, ensuring the shared parameters are not simply fitted to the deformation objective. While the original manuscript did not include a formal derivation or isolated ablation of the sharing mechanism, the performance improvements on large-deformation cases are consistent with the regularization effect. To address the concern directly, we will expand Section 3 with the loss-term weighting schedule and add an ablation comparing shared versus per-slice canonical fields in the revised manuscript. revision: yes
- Referee: [§4] §4 (Experiments), Table 1 or equivalent results table: mean OT Accuracy 0.702 and NN Accuracy 0.719 are stated as state-of-the-art across nine datasets, but the manuscript supplies neither per-dataset standard deviations, statistical significance tests against the strongest baseline, nor explicit data-exclusion or preprocessing criteria. Without these, the robustness of the central empirical claim cannot be fully assessed.
Authors: We agree that additional statistical detail would strengthen the empirical section. The reported means aggregate results over the nine datasets, with each dataset processed using the same pipeline. Preprocessing steps, including spot filtering and normalization, are described in Section 4.1, and data exclusion criteria follow standard practices for the source datasets. In the revision we will expand the results table to include per-dataset means and standard deviations, add paired statistical significance tests (Wilcoxon signed-rank) against the strongest baseline, and provide an explicit summary of preprocessing and exclusion rules to allow full evaluation of the claims. revision: yes
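The paired test mentioned in the second response can be sketched as follows. With only nine datasets an exact Wilcoxon signed-rank test is feasible by enumerating all sign patterns, which is what this standard-library sketch does. The scores are made-up placeholders, not results from the paper, and the function name `wilcoxon_signed_rank_exact` is hypothetical.

```python
from itertools import product
from statistics import mean


def wilcoxon_signed_rank_exact(x, y):
    """Exact two-sided Wilcoxon signed-rank test for small paired samples."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # drop zero differences
    order = sorted(range(len(diffs)), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * len(diffs)
    i = 0
    while i < len(order):  # assign average ranks over tied |differences|
        j = i
        while j + 1 < len(order) and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    # Null distribution: each difference is positive or negative with equal probability.
    null = [
        sum(r for r, s in zip(ranks, signs) if s)
        for signs in product((0, 1), repeat=len(ranks))
    ]
    p_low = sum(w <= w_plus for w in null) / len(null)
    p_high = sum(w >= w_plus for w in null) / len(null)
    return w_plus, min(1.0, 2 * min(p_low, p_high))


# Placeholder per-dataset scores for nine datasets (NOT values from the paper).
ours = [0.71, 0.64, 0.80, 0.55, 0.69, 0.77, 0.62, 0.73, 0.66]
baseline = [0.65, 0.60, 0.71, 0.56, 0.61, 0.74, 0.50, 0.66, 0.64]

mean_gain = mean(a - b for a, b in zip(ours, baseline))
w_plus, p_value = wilcoxon_signed_rank_exact(ours, baseline)
```

With nine pairs the smallest attainable two-sided p-value is 2/512, so a nine-dataset comparison can reach conventional significance only if the method wins on nearly every dataset. Reporting per-dataset differences alongside the test therefore matters as much as the p-value itself.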
Circularity Check
No significant circularity detected
The INST-Align framework is an unsupervised optimization procedure that jointly learns a shared Canonical Expression Field and a deformation network across slices via two-phase training. Reported metrics (OT Accuracy, NN Accuracy, Chamfer distance) are external evaluation quantities computed after optimization on nine datasets; they are not shown to reduce by construction to any fitted parameter or input quantity inside the same loss. No self-definitional equations, fitted-input predictions, load-bearing self-citations, or ansatz smuggling appear in the provided description. The cross-slice parameter sharing is presented as a regularization mechanism whose effect is measured against independent baselines, leaving the derivation self-contained against external benchmarks.