
arxiv: 2604.21573 · v1 · submitted 2026-04-23 · 💻 cs.CV · q-bio.QM

CHRep: Cross-modal Histology Representation and Post-hoc Calibration for Spatial Gene Expression Prediction

Pith reviewed 2026-05-09 22:33 UTC · model grok-4.3

keywords spatial gene expression · histology representation · H&E image · cross-modal prediction · post-hoc calibration · topology regularization · leave-one-slide-out · spatial transcriptomics

The pith

CHRep combines structure-aware representation learning with post-hoc calibration to predict spatial gene expression from H&E slides more robustly across different slides.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper develops CHRep to predict where genes are expressed in tissue using only common H&E stained slides instead of expensive spatial transcriptomics. Existing approaches struggle when test slides look different from training ones and tend to average out real spatial patterns in genes. CHRep tackles this in two steps: first by training a model with objectives that capture both expression correlations and spatial structure from the images, then by adding a lightweight adjustment step at prediction time that uses examples from the training data to fix slide-specific biases. The result is better matching to actual gene measurements when testing on entirely new slides from different patients or cohorts. This matters because it could make gene mapping cheaper and more scalable for studying diseases in large groups without needing specialized equipment for every sample.

Core claim

CHRep is a two-phase framework that learns a structure-aware representation by jointly optimizing correlation-aware regression, symmetric image-expression alignment, and coordinate-induced spatial topology regularization during training. For inference, it improves robustness to slide-level appearance shifts using a lightweight calibration module that combines a non-parametric estimate from a training gallery with a magnitude-regularized correction module, all without fine-tuning the main model. This coupling allows stable neighborhood retrieval and controlled bias correction, leading to improved gene-wise correlation under leave-one-slide-out evaluation on multiple cohorts.
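
The summary names the three training terms and the two calibration components but not their functional forms. As a reading aid only, one generic way to write such a setup is shown below. The additive combination, the weights \lambda_1, \lambda_2, \mu, the correction \Delta_\theta, and the penalty \Omega are all assumptions introduced here for illustration, not the paper's stated equations.

```latex
% Assumed generic form, not taken from the paper.
\begin{align}
\mathcal{L}_{\mathrm{train}}
  &= \mathcal{L}_{\mathrm{reg}}
   + \lambda_{1}\,\mathcal{L}_{\mathrm{align}}
   + \lambda_{2}\,\mathcal{L}_{\mathrm{topo}}, \\
\hat{y}(x)
  &= \hat{y}_{\mathrm{gal}}(x) + \Delta_{\theta}(x), \\
\theta^{*}
  &= \arg\min_{\theta} \sum_{i}
     \bigl\lVert y_i - \hat{y}_{\mathrm{gal}}(x_i) - \Delta_{\theta}(x_i) \bigr\rVert_2^2
   + \mu\,\Omega(\Delta_{\theta}).
\end{align}
```

Here \hat{y}_{\mathrm{gal}} stands for the non-parametric gallery estimate and \Omega for some penalty on the size of the correction (a squared norm would be one choice). Under this reading, \lambda_1, \lambda_2 and \mu are the tunable quantities a reader would want reported.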

What carries the argument

The post-hoc calibration module that integrates non-parametric gallery estimates with magnitude-regularized corrections, supported by joint training objectives for correlation-aware regression, symmetric alignment, and spatial topology regularization.
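
To make the two calibration ingredients concrete, here is a minimal sketch under strong simplifying assumptions: the gallery estimate is a plain k-nearest-neighbour average in embedding space, and the correction is a single per-gene offset b chosen to minimize sum_i (r_i - b)^2 + mu * b^2 over leave-one-out gallery residuals r_i. The function names, the toy data, and both modelling choices are hypothetical; the paper's actual module may differ substantially.

```python
import math

def knn_estimate(query, gallery_emb, gallery_expr, k=2, exclude=None):
    """Average the expression of the k nearest gallery spots (Euclidean distance)."""
    candidates = (i for i in range(len(gallery_emb)) if i != exclude)
    nearest = sorted(candidates, key=lambda i: math.dist(query, gallery_emb[i]))[:k]
    n_genes = len(gallery_expr[0])
    return [sum(gallery_expr[i][g] for i in nearest) / len(nearest)
            for g in range(n_genes)]

def fit_shrunk_offset(gallery_emb, gallery_expr, k=2, mu=1.0):
    """Per-gene offset b minimizing sum_i (r_i - b)^2 + mu * b^2.

    r_i is the leave-one-out residual of the gallery estimate, so the
    closed form is b = sum_i r_i / (n + mu). Larger mu shrinks b toward 0.
    """
    n, n_genes = len(gallery_emb), len(gallery_expr[0])
    residual_sums = [0.0] * n_genes
    for i in range(n):
        est = knn_estimate(gallery_emb[i], gallery_emb, gallery_expr, k, exclude=i)
        for g in range(n_genes):
            residual_sums[g] += gallery_expr[i][g] - est[g]
    return [s / (n + mu) for s in residual_sums]

def calibrated_predict(query, gallery_emb, gallery_expr, offset, k=2):
    """Gallery estimate plus the magnitude-regularized offset."""
    est = knn_estimate(query, gallery_emb, gallery_expr, k)
    return [e + b for e, b in zip(est, offset)]

# Toy gallery (hypothetical values): 4 training spots, 2-D embeddings, 2 genes.
gallery_emb = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [5.0, 5.0]]
gallery_expr = [[1.0, 2.0], [2.0, 2.0], [1.0, 4.0], [9.0, 0.0]]

offset = fit_shrunk_offset(gallery_emb, gallery_expr, k=2, mu=5.0)
prediction = calibrated_predict([0.1, 0.0], gallery_emb, gallery_expr, offset)
```

Everything is fit on training spots only, which mirrors the "no backbone fine-tuning" constraint. The offset is global rather than slide-specific, which is a deliberate simplification of this sketch.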

If this is right

  • Gene expression predictions achieve higher Pearson correlations with ground truth across all considered genes on held-out slides.
  • Mean squared error and mean absolute error decrease for the predictions.
  • The method maintains biologically meaningful spatial variation instead of over-smoothing.
  • Predictions remain stable even when test slides have different appearance characteristics from the training set.
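
The three metrics in the list above can be computed as in the sketch below. It assumes PCC(ACG) is the mean of per-gene Pearson correlations taken across the spots of a held-out slide, which the abstract does not spell out; `pearson` and `evaluate` are illustrative names.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences; NaN if either is constant."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    if sxx == 0 or syy == 0:
        return float("nan")
    return sxy / math.sqrt(sxx * syy)

def evaluate(pred, truth):
    """pred, truth: one row per spot, one column per gene."""
    n_genes = len(truth[0])
    pccs = [pearson([p[g] for p in pred], [t[g] for t in truth])
            for g in range(n_genes)]
    valid = [r for r in pccs if not math.isnan(r)]  # skip constant genes
    errors = [p[g] - t[g] for p, t in zip(pred, truth) for g in range(n_genes)]
    return {
        "pcc_mean": sum(valid) / len(valid) if valid else float("nan"),
        "mse": sum(e * e for e in errors) / len(errors),
        "mae": sum(abs(e) for e in errors) / len(errors),
    }
```

Note that a prediction shifted by a constant keeps a perfect correlation while MSE and MAE grow, which is why the page reports the correlation and the error metrics separately.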

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the calibration generalizes well, this could allow training on smaller spatial transcriptomics datasets and applying to many routine H&E slides for larger studies.
  • Similar calibration strategies might help other histology-based prediction tasks facing domain shifts between slides or patients.
  • Extending the topology regularization to incorporate known biological relationships could further improve prediction of coordinated gene sets.
  • Testing the framework on whole-slide images from additional tissue types would reveal how broadly the robustness to appearance shifts holds.

Load-bearing premise

That the calibration module, trained solely on training data, can correct for appearance shifts in completely new slides without adjusting the main model, and that the combined training losses avoid overly smoothed predictions that erase real spatial gene patterns.

What would settle it

Observing that on a new cohort of slides, the calibrated predictions show lower correlation or higher error than the base model alone, or that the predicted expression maps lack the expected spatial heterogeneity seen in actual measurements.
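
The second condition, loss of spatial heterogeneity, needs a measurable proxy. Two candidates, offered here as assumptions rather than the paper's metrics, are Moran's I (spatial autocorrelation of a gene's map) and the ratio of predicted to measured variance. The sketch uses binary neighbour weights within a hypothetical `radius` on spot coordinates.

```python
import math

def morans_i(values, coords, radius=1.0):
    """Moran's I with binary weights: w_ij = 1 if spots i != j lie within radius."""
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    num, w_sum = 0.0, 0.0
    for i in range(n):
        for j in range(n):
            if i != j and math.dist(coords[i], coords[j]) <= radius:
                num += dev[i] * dev[j]
                w_sum += 1.0
    if denom == 0 or w_sum == 0:
        return float("nan")
    return (n / w_sum) * (num / denom)

def variance_ratio(pred, truth):
    """Variance of predictions over variance of measurements for one gene."""
    def var(x):
        m = sum(x) / len(x)
        return sum((v - m) ** 2 for v in x) / len(x)
    return var(pred) / var(truth)

# Toy 4x4 grid of spots (hypothetical): a smooth gradient and a checkerboard.
grid = [(float(x), float(y)) for x in range(4) for y in range(4)]
gradient = [c[0] for c in grid]
checker = [1.0 if (int(c[0]) + int(c[1])) % 2 == 0 else -1.0 for c in grid]
```

A predicted map whose Moran's I is much higher than the measured map's, together with a variance ratio well below 1, would be consistent with over-smoothing. Comparing both quantities for base and calibrated predictions on the same held-out slide addresses the first condition as well.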

Original abstract

Spatial transcriptomics (ST) enables spatially resolved gene profiling but remains expensive and low-throughput, limiting large-cohort studies and routine clinical use. Predicting spatial gene expression from routine hematoxylin and eosin (H&E) slides is a promising alternative, yet under realistic leave-one-slide-out evaluation, existing models often suffer from slide-level appearance shifts and regression-driven over-smoothing that suppress biologically meaningful variation. CHRep is a two-phase framework for robust histology-to-expression prediction. In the training phase, CHRep learns a structure-aware representation by jointly optimizing correlation-aware regression, symmetric image-expression alignment, and coordinate-induced spatial topology regularization. In the inference phase, cross-slide robustness is improved without backbone fine-tuning through a lightweight calibration module trained on the training slides, which combines a non-parametric estimate from a training gallery with a magnitude-regularized correction module. Unlike prior embedding-alignment or retrieval-based transfer methods that rely on a single prediction route, CHRep couples topology-preserving representation learning with post-hoc calibration, enabling stable neighborhood retrieval and controlled bias correction under slide-level shifts. Across the three cohorts, CHRep consistently improves gene-wise correlation under leave-one-slide-out evaluation, with the largest gains observed on Alex+10x. Relative to HAGE, the Pearson correlation coefficient on all considered genes [PCC(ACG)] increases by 4.0% on cSCC and 9.8% on HER2+. Relative to mclSTExp, PCC(ACG) further improves by 39.5% on Alex+10x, together with 9.7% and 9.0% reductions in mean squared error (MSE) and mean absolute error (MAE), respectively.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. CHRep is a two-phase framework for predicting spatial gene expression from H&E histology images. In training, it jointly optimizes correlation-aware regression, symmetric image-expression alignment, and coordinate-induced topology regularization to learn structure-aware cross-modal representations. At inference, a lightweight post-hoc calibration module (non-parametric training-gallery estimate plus magnitude-regularized correction) is applied without backbone fine-tuning to mitigate slide-level appearance shifts. Under leave-one-slide-out evaluation on three cohorts (cSCC, HER2+, Alex+10x), the method reports consistent gains in gene-wise Pearson correlation (PCC(ACG)) over baselines such as HAGE (4.0% on cSCC, 9.8% on HER2+) and mclSTExp (39.5% on Alex+10x, with accompanying MSE/MAE reductions).

Significance. If the reported gains prove robust, the work would be significant for spatial transcriptomics by directly addressing two practical failure modes—slide appearance shifts and regression-induced over-smoothing—through a combination of topology-preserving representation learning and post-hoc calibration. The approach avoids full fine-tuning at test time, which is practically attractive. However, the absence of statistical testing, error bars, and detailed ablation of the calibration module's generalization limits the strength of the significance assessment at present.

major comments (3)
  1. [Abstract and §4 (Experiments)] The central claim of consistent PCC(ACG) improvement under leave-one-slide-out rests on percentage gains (e.g., 4.0% vs HAGE on cSCC, 39.5% vs mclSTExp on Alex+10x) without reported p-values, confidence intervals, exact baseline re-implementations, or variance across random seeds; this makes it impossible to judge whether the gains exceed experimental noise or depend on particular data-split choices.
  2. [§3.2 (Inference-phase calibration)] The calibration module is trained on the same training-slide gallery used to learn the backbone; while leave-one-slide-out partially mitigates leakage, no analysis demonstrates that the non-parametric gallery estimate plus magnitude-regularized correction generalizes when a held-out slide's appearance shift lies outside the span of the training gallery, which is load-bearing for the claim of cross-slide robustness without fine-tuning.
  3. [§4.3 (Ablations and topology regularization)] The joint training objectives are asserted to prevent over-smoothing while preserving spatial variation, yet no isolated ablation shows whether the post-hoc calibration step itself preserves or erodes the topology-regularized properties on out-of-gallery test slides; this directly affects the weakest assumption that biologically meaningful variation is retained after correction.
minor comments (2)
  1. [Abstract] Notation for PCC(ACG) and the exact gene sets considered should be defined once in §2 or §4 rather than only in the abstract.
  2. [Figures] Figure captions for qualitative spatial maps should explicitly state whether the displayed predictions include or exclude the calibration step.
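
Major comment 2 turns on whether a held-out slide falls outside the span of the training gallery. One crude way to make that measurable, sketched here under the assumption that per-spot embeddings are available for every slide, is the distance from the held-out slide's mean embedding to the nearest training-slide mean. The function names, the toy embeddings, and the thresholds `low` and `high` are hypothetical.

```python
import math

def centroid(embs):
    """Mean embedding of a slide (one row per spot)."""
    n = len(embs)
    return [sum(e[d] for e in embs) / n for d in range(len(embs[0]))]

def shift_score(test_embs, train_slides):
    """Distance from the test-slide centroid to the nearest training-slide centroid."""
    c = centroid(test_embs)
    return min(math.dist(c, centroid(embs)) for embs in train_slides.values())

def stratify(scores, low, high):
    """Bin slides by shift score so metrics can be reported per bin."""
    bins = {"in-span": [], "borderline": [], "out-of-span": []}
    for slide, s in scores.items():
        if s <= low:
            bins["in-span"].append(slide)
        elif s <= high:
            bins["borderline"].append(slide)
        else:
            bins["out-of-span"].append(slide)
    return bins

# Toy embeddings (hypothetical values).
train_slides = {"A": [[0.0, 0.0], [2.0, 0.0]], "B": [[10.0, 10.0], [12.0, 10.0]]}
test_slides = {"T1": [[1.0, 1.0], [1.0, 1.0]], "T2": [[30.0, 30.0], [32.0, 30.0]]}

scores = {name: shift_score(embs, train_slides) for name, embs in test_slides.items()}
bins = stratify(scores, low=2.0, high=10.0)
```

Centroid distance ignores within-slide spread, so a distribution-level distance would be a stronger choice in practice. The point of the sketch is only that reporting calibrated versus uncalibrated metrics per bin would speak directly to the referee's concern.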

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments, which highlight important aspects for strengthening the presentation of our results. We address each major comment below and will revise the manuscript to incorporate additional statistical analyses, generalization checks, and ablations as outlined.

Point-by-point responses
  1. Referee: [Abstract and §4 (Experiments)] The central claim of consistent PCC(ACG) improvement under leave-one-slide-out rests on percentage gains (e.g., 4.0% vs HAGE on cSCC, 39.5% vs mclSTExp on Alex+10x) without reported p-values, confidence intervals, exact baseline re-implementations, or variance across random seeds; this makes it impossible to judge whether the gains exceed experimental noise or depend on particular data-split choices.

    Authors: We agree that statistical rigor is essential for validating the reported improvements. In the revised manuscript, we will add p-values (via paired Wilcoxon signed-rank tests across genes) and 95% confidence intervals for PCC(ACG) differences. We will also report mean and standard deviation over at least three random seeds for training, and provide explicit details on baseline re-implementations, including all hyperparameters and data preprocessing steps used to ensure reproducibility.

    Revision: yes

  2. Referee: [§3.2 (Inference-phase calibration)] The calibration module is trained on the same training-slide gallery used to learn the backbone; while leave-one-slide-out partially mitigates leakage, no analysis demonstrates that the non-parametric gallery estimate plus magnitude-regularized correction generalizes when a held-out slide's appearance shift lies outside the span of the training gallery, which is load-bearing for the claim of cross-slide robustness without fine-tuning.

    Authors: The leave-one-slide-out protocol already evaluates on completely unseen slides, and the non-parametric gallery estimate is intended to capture typical appearance variations within the training distribution. We acknowledge the value of explicit out-of-span analysis. In revision, we will add a supplementary analysis quantifying appearance shift magnitudes (e.g., via embedding distances) between training and test slides, and report calibration performance stratified by shift severity to demonstrate robustness limits.

    Revision: yes

  3. Referee: [§4.3 (Ablations and topology regularization)] The joint training objectives are asserted to prevent over-smoothing while preserving spatial variation, yet no isolated ablation shows whether the post-hoc calibration step itself preserves or erodes the topology-regularized properties on out-of-gallery test slides; this directly affects the weakest assumption that biologically meaningful variation is retained after correction.

    Authors: We concur that isolating the calibration module's impact on topology preservation is important. The current ablations focus on the joint training objectives, but we will expand §4.3 with a dedicated experiment measuring topology metrics (e.g., spatial autocorrelation of predictions and neighborhood preservation scores) before versus after calibration on held-out slides. This will confirm that the magnitude-regularized correction does not erode the structure-aware properties learned during training.

    Revision: yes
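
The first response promises paired Wilcoxon signed-rank tests across genes and 95% confidence intervals. A minimal pure-Python sketch of both follows. It uses the normal approximation without tie or continuity corrections, and a percentile bootstrap for the interval; a maintained implementation such as `scipy.stats.wilcoxon` would be the safer choice for a real analysis. The per-gene PCC values are hypothetical placeholders, not results from the paper.

```python
import math
import random

def wilcoxon_signed_rank(a, b):
    """Two-sided paired Wilcoxon signed-rank test (normal approximation)."""
    diffs = [x - y for x, y in zip(a, b) if x != y]  # zero differences are dropped
    n = len(diffs)
    if n == 0:
        raise ValueError("all paired differences are zero")
    order = sorted(range(n), key=lambda i: abs(diffs[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:  # assign average ranks to ties in |diff|
        j = i
        while j + 1 < n and abs(diffs[order[j + 1]]) == abs(diffs[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # mean of 1-based ranks i+1 .. j+1
        for k in range(i, j + 1):
            ranks[order[k]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p = math.erfc(abs(z) / math.sqrt(2))  # two-sided p-value
    return w_plus, z, p

def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean paired difference."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(sum(rng.choice(diffs) for _ in range(n)) / n
                   for _ in range(n_boot))
    return means[int(alpha / 2 * n_boot)], means[int((1 - alpha / 2) * n_boot) - 1]

# Hypothetical per-gene PCC values for two methods on the same held-out slide.
pcc_new = [0.31, 0.42, 0.55, 0.61, 0.48, 0.39, 0.70, 0.52, 0.44, 0.66]
pcc_base = [0.30, 0.40, 0.50, 0.57, 0.45, 0.33, 0.62, 0.45, 0.35, 0.56]

w_plus, z, p_value = wilcoxon_signed_rank(pcc_new, pcc_base)
ci_low, ci_high = bootstrap_ci([a - b for a, b in zip(pcc_new, pcc_base)])
```

Genes are correlated with one another, so treating them as independent pairs likely overstates significance. Repeating the comparison across slides and seeds, as the authors also propose, guards against that.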

Circularity Check

0 steps flagged

CHRep's claimed PCC gains under leave-one-slide-out do not reduce to fitted inputs or self-citations by construction

Full rationale

The paper's core derivation consists of a training phase that jointly optimizes three explicit objectives (correlation-aware regression, symmetric image-expression alignment, topology regularization) on labeled training slides, followed by a separate inference calibration module whose non-parametric gallery and magnitude-regularized correction are also fit exclusively on those same training slides. Leave-one-slide-out evaluation holds out entire slides, so the reported improvements (e.g., +4.0% PCC(ACG) vs HAGE on cSCC) are measured on data absent from both the main model and the calibration gallery. No equation, claim, or cited result in the manuscript equates a prediction to its training inputs by definition, imports a uniqueness theorem from the same authors, or renames an empirical pattern as a new derivation. The calibration step is an explicit post-hoc mechanism whose generalization to unseen shifts is an empirical question, not a tautology.
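
The separation this rationale relies on can be stated as a split routine: for each fold, every spot from one slide is held out, and both the model and the calibration gallery see only the remaining slides. The sketch below assumes each spot is a dict with a `slide` key; that record layout is hypothetical.

```python
def leave_one_slide_out(spots):
    """Yield (held_out_slide, train_spots, test_spots), one fold per slide."""
    slides = sorted({s["slide"] for s in spots})
    for held_out in slides:
        train = [s for s in spots if s["slide"] != held_out]
        test = [s for s in spots if s["slide"] == held_out]
        # Both the backbone and the calibration gallery must be fit on `train` only.
        yield held_out, train, test

# Toy example (hypothetical spot records).
spots = [
    {"slide": "A", "id": 0},
    {"slide": "A", "id": 1},
    {"slide": "B", "id": 2},
    {"slide": "C", "id": 3},
]
folds = list(leave_one_slide_out(spots))
```

If several slides come from the same patient, splitting by slide can still leak patient-level appearance into training. Grouping by patient instead would be the stricter variant.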

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The central claim rests on the effectiveness of the described training objectives and the generalization of the calibration module, but the abstract provides no explicit list of hyperparameters, loss weights, or architectural choices; the evaluation protocol and biological-preservation assumptions are domain-level rather than derived.

free parameters (2)
  • loss weighting coefficients for joint optimization
    The training phase jointly optimizes three terms whose relative importance must be set by hand or cross-validation.
  • magnitude regularization strength in calibration
    The correction module includes a regularization term whose scale is a tunable parameter.
axioms (2)
  • domain assumption: Leave-one-slide-out evaluation adequately proxies real-world slide-level appearance shifts
    The paper treats entire-slide hold-out as a sufficient test of cross-slide robustness.
  • domain assumption: Coordinate-induced topology regularization preserves biologically meaningful gene variation
    The regularization is assumed not to suppress real spatial patterns while reducing over-smoothing.



Reference graph

Works this paper leans on

25 extracted references · 25 canonical work pages

  1. S. Alon, D. R. Goodwin, A. Sinha, A. T. Wassie, F. Chen, E. R. Daugharthy, Y. Bando, A. Kajita, A. G. Xue, K. Marrett et al., “Expansion sequencing: Spatially precise in situ transcriptomics in intact biological systems,” Science, vol. 371, no. 6528, p. eaax2656, 2021.

  2. M. Asp, J. Bergenstråhle, and J. Lundeberg, “Spatially resolved transcriptomes—next generation tools for tissue exploration,” BioEssays, vol. 42, no. 10, p. 1900221, 2020.

  3. A. Rao, D. Barkley, G. S. França, and I. Yanai, “Exploring tissue architecture using spatial transcriptomics,” Nature, vol. 596, no. 7871, pp. 211–220, 2021.

  4. J. Hu, K. Coleman, D. Zhang, E. B. Lee, H. Kadara, L. Wang, and M. Li, “Deciphering tumor ecosystems at super resolution from spatial transcriptomics with TESLA,” Cell Systems, vol. 14, no. 5, pp. 404–417, 2023.

  5. A. Chen, S. Liao, M. Cheng, K. Ma, L. Wu, Y. Lai, X. Qiu, J. Yang, J. Xu, S. Hao et al., “Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays,” Cell, vol. 185, no. 10, pp. 1777–1792, 2022.

  6. E. Zhao, M. R. Stone, X. Ren, J. Guenthoer, K. S. Smythe, T. Pulliam, S. R. Williams, C. R. Uytingco, S. E. Taylor, P. Nghiem et al., “Spatial transcriptomics at subspot resolution with BayesSpace,” Nature Biotechnology, vol. 39, no. 11, pp. 1375–1384, 2021.

  7. J. Lee, M. Yoo, and J. Choi, “Recent advances in spatially resolved transcriptomics: challenges and opportunities,” BMB Reports, vol. 55, no. 3, p. 113, 2022.

  8. T. M. Dang, Q. Zhou, Y. Guo, H. Ma, S. Na, T. B. Dang, J. Gao, and J. Huang, “Abnormality-aware multimodal learning for WSI classification,” Frontiers in Medicine, vol. 12, p. 1546452, 2025.

  9. M. Pizurica, Y. Zheng, F. Carrillo-Perez, H. Noor, W. Yao, C. Wohlfart, A. Vladimirova, K. Marchal, and O. Gevaert, “Digital profiling of gene expression from histology images with linearized attention,” Nature Communications, vol. 15, no. 1, p. 9886, 2024.

  10. S. He, Y. Jin, A. Nazaret, L. Shi, X. Chen, S. Rampersaud, B. S. Dhillon, I. Valdez, L. E. Friend, J. L. Fan et al., “Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor–immune hubs,” Nature Biotechnology, vol. 43, no. 2, pp. 223–235, 2025.

  11. D. Tellez, G. Litjens, P. Bándi, W. Bulten, J.-M. Bokhorst, F. Ciompi, and J. Van Der Laak, “Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology,” Medical Image Analysis, vol. 58, p. 101544, 2019.

  12. B. He, L. Bergenstråhle, L. Stenbeck, A. Abid, A. Andersson, Å. Borg, J. Maaskola, J. Lundeberg, and J. Zou, “Integrating spatial gene expression and breast tumour morphology via deep learning,” Nature Biomedical Engineering, vol. 4, no. 8, pp. 827–834, 2020.

  13. H. Li, C. Wang, G. Zhao, Z. He, Y. Wang, and Z. Sun, “Sclera-transfuse: Fusing swin transformer and CNN for accurate sclera segmentation,” in 2023 IEEE International Joint Conference on Biometrics (IJCB). IEEE, 2023, pp. 1–8.

  14. M. Pang, K. Su, and M. Li, “Leveraging information in spatial transcriptomics to predict super-resolution gene expression from histology images in tumors,” bioRxiv, pp. 2021–11, 2021.

  15. Y. Zeng, Z. Wei, W. Yu, R. Yin, Y. Yuan, B. Li, Z. Tang, Y. Lu, and Y. Yang, “Spatial transcriptomics prediction from histology jointly through transformer and graph neural networks,” Briefings in Bioinformatics, vol. 23, no. 5, 2022.

  16. Y. Jia, J. Liu, L. Chen, T. Zhao, and Y. Wang, “THItoGene: a deep learning method for predicting spatial transcriptomics from histological images,” Briefings in Bioinformatics, vol. 25, no. 1, 2023.

  17. R. Xie, K. Pang, S. Chung, C. Perciani, S. MacParland, B. Wang, and G. Bader, “Spatially resolved gene expression prediction from histology images via bi-modal contrastive learning,” Advances in Neural Information Processing Systems, vol. 36, pp. 70626–70637, 2023.

  18. W. Min, Z. Shi, J. Zhang, J. Wan, and C. Wang, “Multimodal contrastive learning for spatial gene expression prediction using histology images,” Briefings in Bioinformatics, vol. 25, no. 6, p. bbae551, 2024.

  19. K. Stacke, G. Eilertsen, J. Unger, and C. Lundström, “Measuring domain shift for deep learning in histopathology,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 2, pp. 325–336, 2020.

  20. H. Ren, B. L. Walker, Z. Cang, and Q. Nie, “Identifying multicellular spatiotemporal organization of cells with SpaceFlow,” Nature Communications, vol. 13, no. 1, p. 4076, 2022.

  21. A. L. Ji, A. J. Rubin, K. Thrane, S. Jiang, D. L. Reynolds, R. M. Meyers, M. G. Guo, B. M. George, A. Mollbrink, J. Bergenstråhle et al., “Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma,” Cell, vol. 182, no. 2, pp. 497–514, 2020.

  22. A. Andersson, L. Larsson, L. Stenbeck, F. Salmén, A. Ehinger, S. Wu, G. Al-Eryani, D. Roden, A. Swarbrick, Å. Borg et al., “Spatial deconvolution of HER2-positive breast tumors reveals novel intercellular relationships,” bioRxiv, pp. 2020–07, 2020.

  23. S. Z. Wu, G. Al-Eryani, D. L. Roden, S. Junankar, K. Harvey, A. Andersson, A. Thennavan, C. Wang, J. R. Torpy, N. Bartonicek et al., “A single-cell and spatially resolved atlas of human breast cancers,” Nature Genetics, vol. 53, no. 9, pp. 1334–1347, 2021.

  24. T. M. Dang, H. Li, Y. Guo, H. Ma, F. Jiang, Y. Miao, Q. Zhou, J. Gao, and J. Huang, “HAGE: Hierarchical alignment gene-enhanced pathology representation learning with spatial transcriptomics,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2025, pp. 228–238.

  25. C. Wang, A. S. Chan, X. Fu, S. Ghazanfar, J. Kim, E. Patrick, and J. Y. Yang, “Benchmarking the translational potential of spatial gene expression prediction from histology,” Nature Communications, vol. 16, no. 1, p. 1544, 2025.