CHRep: Cross-modal Histology Representation and Post-hoc Calibration for Spatial Gene Expression Prediction
Pith reviewed 2026-05-09 22:33 UTC · model grok-4.3
The pith
CHRep combines structure-aware representation learning with post-hoc calibration to predict spatial gene expression from H&E slides more robustly across different slides.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CHRep is a two-phase framework that learns a structure-aware representation by jointly optimizing correlation-aware regression, symmetric image-expression alignment, and coordinate-induced spatial topology regularization during training. For inference, it improves robustness to slide-level appearance shifts using a lightweight calibration module that combines a non-parametric estimate from a training gallery with a magnitude-regularized correction module, all without fine-tuning the main model. This coupling allows stable neighborhood retrieval and controlled bias correction, leading to improved gene-wise correlation under leave-one-slide-out evaluation on multiple cohorts.
What carries the argument
The post-hoc calibration module that integrates non-parametric gallery estimates with magnitude-regularized corrections, supported by joint training objectives for correlation-aware regression, symmetric alignment, and spatial topology regularization.
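To make the two ingredients concrete, here is a minimal sketch of a gallery-based estimate combined with a bounded correction. It is an illustration only, not the paper's module: the function names, the inverse-distance weighting, the `blend` factor, and the hard clip `max_shift` are all assumptions, and the clip merely stands in for whatever learned, magnitude-regularized correction the authors actually train.

```python
import math

def knn_gallery_estimate(query, gallery_emb, gallery_expr, k=3):
    """Inverse-distance-weighted mean expression of the k nearest gallery spots.

    Hypothetical helper: the paper's retrieval rule is not specified here.
    """
    nearest = sorted(
        (math.dist(query, emb), idx) for idx, emb in enumerate(gallery_emb)
    )[:k]
    weights = [1.0 / (dist + 1e-8) for dist, _ in nearest]
    total = sum(weights)
    n_genes = len(gallery_expr[0])
    return [
        sum(w * gallery_expr[idx][g] for w, (_, idx) in zip(weights, nearest)) / total
        for g in range(n_genes)
    ]

def calibrate(base_pred, gallery_est, blend=0.5, max_shift=1.0):
    """Move the base prediction toward the gallery estimate by a bounded amount.

    The clip to [-max_shift, max_shift] is an assumed stand-in for a
    magnitude penalty on the correction; it is not the authors' formulation.
    """
    calibrated = []
    for base, est in zip(base_pred, gallery_est):
        delta = blend * (est - base)
        delta = max(-max_shift, min(max_shift, delta))
        calibrated.append(base + delta)
    return calibrated
```

The backbone is never updated in this sketch: only stored training embeddings and their expression vectors are consulted, which mirrors the "no fine-tuning" property described above.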
If this is right
- Gene expression predictions achieve higher Pearson correlations with ground truth across all considered genes on held-out slides.
- Mean squared error and mean absolute error decrease for the predictions.
- The method maintains biologically meaningful spatial variation instead of over-smoothing.
- Predictions remain stable even when test slides have different appearance characteristics from the training set.
Where Pith is reading between the lines
- If the calibration generalizes well, this could allow training on smaller spatial transcriptomics datasets and then applying the model to many routine H&E slides in larger studies.
- Similar calibration strategies might help other histology-based prediction tasks facing domain shifts between slides or patients.
- Extending the topology regularization to incorporate known biological relationships could further improve prediction of coordinated gene sets.
- Testing the framework on whole-slide images from additional tissue types would reveal how broadly the robustness to appearance shifts holds.
Load-bearing premise
The premise has two parts: that a calibration module trained solely on training data can correct for appearance shifts in completely new slides without adjusting the main model, and that the combined training losses do not produce overly smoothed predictions that erase real spatial gene patterns.
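One way to probe the first half of this premise is to measure how far a held-out slide sits from the training gallery in embedding space. The sketch below is an assumption about how such a check could look, not an analysis from the paper: `shift_ratio` and its leave-one-out baseline are hypothetical, and Euclidean distance is chosen only for simplicity.

```python
import math

def nearest_distance(query, gallery, skip=None):
    """Distance from query to its closest gallery embedding (optionally skipping one index)."""
    return min(math.dist(query, g) for i, g in enumerate(gallery) if i != skip)

def shift_ratio(test_emb, gallery_emb):
    """Mean nearest-gallery distance of test spots, relative to the gallery's
    own leave-one-out nearest-neighbor distance.

    Values well above 1 suggest the test slide lies outside the region the
    gallery covers. Assumes the gallery has no exact duplicate embeddings,
    otherwise the baseline can be zero.
    """
    within = [nearest_distance(g, gallery_emb, skip=i) for i, g in enumerate(gallery_emb)]
    across = [nearest_distance(t, gallery_emb) for t in test_emb]
    baseline = sum(within) / len(within)
    return (sum(across) / len(across)) / baseline
```

Reporting calibrated-versus-base accuracy as a function of this ratio would show where the premise starts to fail.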
What would settle it
Observing that on a new cohort of slides, the calibrated predictions show lower correlation or higher error than the base model alone, or that the predicted expression maps lack the expected spatial heterogeneity seen in actual measurements.
Original abstract
Spatial transcriptomics (ST) enables spatially resolved gene profiling but remains expensive and low-throughput, limiting large-cohort studies and routine clinical use. Predicting spatial gene expression from routine hematoxylin and eosin (H&E) slides is a promising alternative, yet under realistic leave-one-slide-out evaluation, existing models often suffer from slide-level appearance shifts and regression-driven over-smoothing that suppress biologically meaningful variation. CHRep is a two-phase framework for robust histology-to-expression prediction. In the training phase, CHRep learns a structure-aware representation by jointly optimizing correlation-aware regression, symmetric image-expression alignment, and coordinate-induced spatial topology regularization. In the inference phase, cross-slide robustness is improved without backbone fine-tuning through a lightweight calibration module trained on the training slides, which combines a non-parametric estimate from a training gallery with a magnitude-regularized correction module. Unlike prior embedding-alignment or retrieval-based transfer methods that rely on a single prediction route, CHRep couples topology-preserving representation learning with post-hoc calibration, enabling stable neighborhood retrieval and controlled bias correction under slide-level shifts. Across the three cohorts, CHRep consistently improves gene-wise correlation under leave-one-slide-out evaluation, with the largest gains observed on Alex+10x. Relative to HAGE, the Pearson correlation coefficient on all considered genes [PCC(ACG)] increases by 4.0% on cSCC and 9.8% on HER2+. Relative to mclSTExp, PCC(ACG) further improves by 39.5% on Alex+10x, together with 9.7% and 9.0% reductions in mean squared error (MSE) and mean absolute error (MAE), respectively.
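For readers unfamiliar with the metrics, the sketch below shows one plausible way to compute them. It assumes PCC(ACG) is the mean of per-gene Pearson correlations taken across spots, and that MSE and MAE are averaged over all spot-gene entries; the abstract does not spell out either convention, and the function names are illustrative.

```python
import math

def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return 0.0  # assumption: a constant gene is scored as zero correlation
    return cov / math.sqrt(vx * vy)

def evaluate(pred, truth):
    """pred and truth are lists of spots, each a list of per-gene values."""
    n_genes = len(truth[0])
    pccs = [
        pearson([p[g] for p in pred], [t[g] for t in truth])
        for g in range(n_genes)
    ]
    errs = [p[g] - t[g] for p, t in zip(pred, truth) for g in range(n_genes)]
    return {
        "pcc_acg": sum(pccs) / n_genes,
        "mse": sum(e * e for e in errs) / len(errs),
        "mae": sum(abs(e) for e in errs) / len(errs),
    }
```

Because correlation ignores scale and offset, a model can raise PCC(ACG) without lowering MSE or MAE, which is why the abstract reports all three.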
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. CHRep is a two-phase framework for predicting spatial gene expression from H&E histology images. In training, it jointly optimizes correlation-aware regression, symmetric image-expression alignment, and coordinate-induced topology regularization to learn structure-aware cross-modal representations. At inference, a lightweight post-hoc calibration module (non-parametric training-gallery estimate plus magnitude-regularized correction) is applied without backbone fine-tuning to mitigate slide-level appearance shifts. Under leave-one-slide-out evaluation on three cohorts (cSCC, HER2+, Alex+10x), the method reports consistent gains in gene-wise Pearson correlation (PCC(ACG)) over baselines such as HAGE (4.0% on cSCC, 9.8% on HER2+) and mclSTExp (39.5% on Alex+10x, with accompanying MSE/MAE reductions).
Significance. If the reported gains prove robust, the work would be significant for spatial transcriptomics by directly addressing two practical failure modes—slide appearance shifts and regression-induced over-smoothing—through a combination of topology-preserving representation learning and post-hoc calibration. The approach avoids full fine-tuning at test time, which is practically attractive. However, the absence of statistical testing, error bars, and detailed ablation of the calibration module's generalization limits the strength of the significance assessment at present.
major comments (3)
- [Abstract and §4 (Experiments)] The central claim of consistent PCC(ACG) improvement under leave-one-slide-out rests on percentage gains (e.g., 4.0% vs HAGE on cSCC, 39.5% vs mclSTExp on Alex+10x) without reported p-values, confidence intervals, exact baseline re-implementations, or variance across random seeds; this makes it impossible to judge whether the gains exceed experimental noise or depend on particular data-split choices.
- [§3.2 (Inference-phase calibration)] The calibration module is trained on the same training-slide gallery used to learn the backbone; while leave-one-slide-out partially mitigates leakage, no analysis demonstrates that the non-parametric gallery estimate plus magnitude-regularized correction generalizes when a held-out slide's appearance shift lies outside the span of the training gallery, which is load-bearing for the claim of cross-slide robustness without fine-tuning.
- [§4.3 (Ablations and topology regularization)] The joint training objectives are asserted to prevent over-smoothing while preserving spatial variation, yet no isolated ablation shows whether the post-hoc calibration step itself preserves or erodes the topology-regularized properties on out-of-gallery test slides; this directly affects the weakest assumption that biologically meaningful variation is retained after correction.
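The statistical checks requested in the first major comment could take roughly the following form. This is a sketch under stated assumptions: the inputs are per-gene differences in correlation between two methods on the same genes, the interval is a percentile bootstrap, and the paired test is a Wilcoxon signed-rank test using the normal approximation without tie or continuity corrections. A statistics library would normally be used for the test; it is written out here only to keep the example self-contained.

```python
import math
import random

def bootstrap_ci(diffs, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for the mean per-gene difference."""
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        sum(rng.choice(diffs) for _ in range(n)) / n for _ in range(n_boot)
    )
    lo = means[int((alpha / 2) * n_boot)]
    hi = means[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi

def wilcoxon_signed_rank(diffs):
    """Two-sided Wilcoxon signed-rank test (normal approximation).

    Zero differences are dropped and tied magnitudes share average ranks.
    Requires at least one non-zero difference.
    """
    nz = [d for d in diffs if d != 0]
    n = len(nz)
    order = sorted(range(n), key=lambda i: abs(nz[i]))
    ranks = [0.0] * n
    i = 0
    while i < n:
        j = i
        while j + 1 < n and abs(nz[order[j + 1]]) == abs(nz[order[i]]):
            j += 1
        avg_rank = (i + j) / 2 + 1  # ranks are 1-based
        for pos in range(i, j + 1):
            ranks[order[pos]] = avg_rank
        i = j + 1
    w_plus = sum(r for r, d in zip(ranks, nz) if d > 0)
    mean = n * (n + 1) / 4
    sd = math.sqrt(n * (n + 1) * (2 * n + 1) / 24)
    z = (w_plus - mean) / sd
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return w_plus, p_value
```

Pairing by gene matters here: genes differ widely in how predictable they are, so an unpaired comparison of mean correlations would mix that between-gene spread into the noise.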
minor comments (2)
- [Abstract] Notation for PCC(ACG) and the exact gene sets considered should be defined once in §2 or §4 rather than only in the abstract.
- [Figures] Figure captions for qualitative spatial maps should explicitly state whether the displayed predictions include or exclude the calibration step.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which highlight important aspects for strengthening the presentation of our results. We address each major comment below and will revise the manuscript to incorporate additional statistical analyses, generalization checks, and ablations as outlined.
- Referee: [Abstract and §4 (Experiments)] The central claim of consistent PCC(ACG) improvement under leave-one-slide-out rests on percentage gains (e.g., 4.0% vs HAGE on cSCC, 39.5% vs mclSTExp on Alex+10x) without reported p-values, confidence intervals, exact baseline re-implementations, or variance across random seeds; this makes it impossible to judge whether the gains exceed experimental noise or depend on particular data-split choices.
  Authors: We agree that statistical rigor is essential for validating the reported improvements. In the revised manuscript, we will add p-values (via paired Wilcoxon signed-rank tests across genes) and 95% confidence intervals for PCC(ACG) differences. We will also report mean and standard deviation over at least three random seeds for training, and provide explicit details on baseline re-implementations, including all hyperparameters and data preprocessing steps used to ensure reproducibility.
  Revision: yes
- Referee: [§3.2 (Inference-phase calibration)] The calibration module is trained on the same training-slide gallery used to learn the backbone; while leave-one-slide-out partially mitigates leakage, no analysis demonstrates that the non-parametric gallery estimate plus magnitude-regularized correction generalizes when a held-out slide's appearance shift lies outside the span of the training gallery, which is load-bearing for the claim of cross-slide robustness without fine-tuning.
  Authors: The leave-one-slide-out protocol already evaluates on completely unseen slides, and the non-parametric gallery estimate is intended to capture typical appearance variations within the training distribution. We acknowledge the value of explicit out-of-span analysis. In revision, we will add a supplementary analysis quantifying appearance shift magnitudes (e.g., via embedding distances) between training and test slides, and report calibration performance stratified by shift severity to demonstrate robustness limits.
  Revision: yes
- Referee: [§4.3 (Ablations and topology regularization)] The joint training objectives are asserted to prevent over-smoothing while preserving spatial variation, yet no isolated ablation shows whether the post-hoc calibration step itself preserves or erodes the topology-regularized properties on out-of-gallery test slides; this directly affects the weakest assumption that biologically meaningful variation is retained after correction.
  Authors: We concur that isolating the calibration module's impact on topology preservation is important. The current ablations focus on the joint training objectives, but we will expand §4.3 with a dedicated experiment measuring topology metrics (e.g., spatial autocorrelation of predictions and neighborhood preservation scores) before versus after calibration on held-out slides. This will test whether the magnitude-regularized correction erodes the structure-aware properties learned during training.
  Revision: yes
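The spatial-autocorrelation comparison mentioned in the last response could be run with a statistic such as Moran's I. The sketch below is one possible version and not the authors' planned metric: it uses binary k-nearest-neighbor weights on spot coordinates, and both the weighting scheme and the default `k=4` are assumptions.

```python
import math

def morans_i(values, coords, k=4):
    """Moran's I for one gene, with binary k-nearest-neighbor spatial weights.

    values: predicted (or measured) expression per spot.
    coords: matching (x, y) spot coordinates.
    Positive values indicate smooth spatial structure, negative values
    indicate alternation between neighbors.
    """
    n = len(values)
    mean = sum(values) / n
    dev = [v - mean for v in values]
    denom = sum(d * d for d in dev)
    if denom == 0:
        return 0.0  # assumption: a constant map is scored as zero
    num = 0.0
    weight_sum = 0
    for i in range(n):
        neighbors = sorted(
            (j for j in range(n) if j != i),
            key=lambda j: math.dist(coords[i], coords[j]),
        )[:k]
        for j in neighbors:
            num += dev[i] * dev[j]
            weight_sum += 1
    return (n / weight_sum) * (num / denom)
```

Computing this per gene on the base predictions, on the calibrated predictions, and on the measured expression of a held-out slide would show whether calibration moves the spatial structure toward or away from the ground truth.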
Circularity Check
CHRep's claimed PCC gains under leave-one-slide-out do not reduce to fitted inputs or self-citations by construction
The paper's core derivation consists of a training phase that jointly optimizes three explicit objectives (correlation-aware regression, symmetric image-expression alignment, topology regularization) on labeled training slides, followed by a separate inference calibration module whose non-parametric gallery and magnitude-regularized correction are also fit exclusively on those same training slides. Leave-one-slide-out evaluation holds out entire slides, so the reported improvements (e.g., +4.0% PCC(ACG) vs HAGE on cSCC) are measured on data absent from both the main model and the calibration gallery. No equation, claim, or cited result in the manuscript equates a prediction to its training inputs by definition, imports a uniqueness theorem from the same authors, or renames an empirical pattern as a new derivation. The calibration step is an explicit post-hoc mechanism whose generalization to unseen shifts is an empirical question, not a tautology.
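The separation this rationale relies on can be expressed as a split routine. The sketch is generic and assumes only that every spot carries a slide identifier; the function name is illustrative.

```python
def leave_one_slide_out(slide_ids):
    """Yield (held_out_slide, train_indices, test_indices) for every slide.

    slide_ids: one slide identifier per spot.
    """
    for held_out in sorted(set(slide_ids)):
        train = [i for i, s in enumerate(slide_ids) if s != held_out]
        test = [i for i, s in enumerate(slide_ids) if s == held_out]
        yield held_out, train, test
```

For the non-circularity argument to hold, both the backbone and the calibration gallery must be built from `train` only within each fold, with `test` touched solely for scoring.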
Axiom & Free-Parameter Ledger
free parameters (2)
- loss weighting coefficients for joint optimization
- magnitude regularization strength in calibration
axioms (2)
- domain assumption: Leave-one-slide-out evaluation adequately proxies real-world slide-level appearance shifts
- domain assumption: Coordinate-induced topology regularization preserves biologically meaningful gene variation
Reference graph
Works this paper leans on
- [1] S. Alon, D. R. Goodwin, A. Sinha, A. T. Wassie, F. Chen, E. R. Daugharthy, Y. Bando, A. Kajita, A. G. Xue, K. Marrett et al., “Expansion sequencing: Spatially precise in situ transcriptomics in intact biological systems,” Science, vol. 371, no. 6528, p. eaax2656, 2021
- [2] M. Asp, J. Bergenstråhle, and J. Lundeberg, “Spatially resolved transcriptomes—next generation tools for tissue exploration,” BioEssays, vol. 42, no. 10, p. 1900221, 2020
- [3] A. Rao, D. Barkley, G. S. França, and I. Yanai, “Exploring tissue architecture using spatial transcriptomics,” Nature, vol. 596, no. 7871, pp. 211–220, 2021
- [4] J. Hu, K. Coleman, D. Zhang, E. B. Lee, H. Kadara, L. Wang, and M. Li, “Deciphering tumor ecosystems at super resolution from spatial transcriptomics with TESLA,” Cell Systems, vol. 14, no. 5, pp. 404–417, 2023
- [5] A. Chen, S. Liao, M. Cheng, K. Ma, L. Wu, Y. Lai, X. Qiu, J. Yang, J. Xu, S. Hao et al., “Spatiotemporal transcriptomic atlas of mouse organogenesis using DNA nanoball-patterned arrays,” Cell, vol. 185, no. 10, pp. 1777–1792, 2022
- [6] E. Zhao, M. R. Stone, X. Ren, J. Guenthoer, K. S. Smythe, T. Pulliam, S. R. Williams, C. R. Uytingco, S. E. Taylor, P. Nghiem et al., “Spatial transcriptomics at subspot resolution with BayesSpace,” Nature Biotechnology, vol. 39, no. 11, pp. 1375–1384, 2021
- [7] J. Lee, M. Yoo, and J. Choi, “Recent advances in spatially resolved transcriptomics: challenges and opportunities,” BMB Reports, vol. 55, no. 3, p. 113, 2022
- [8] T. M. Dang, Q. Zhou, Y. Guo, H. Ma, S. Na, T. B. Dang, J. Gao, and J. Huang, “Abnormality-aware multimodal learning for WSI classification,” Frontiers in Medicine, vol. 12, p. 1546452, 2025
- [9] M. Pizurica, Y. Zheng, F. Carrillo-Perez, H. Noor, W. Yao, C. Wohlfart, A. Vladimirova, K. Marchal, and O. Gevaert, “Digital profiling of gene expression from histology images with linearized attention,” Nature Communications, vol. 15, no. 1, p. 9886, 2024
- [10] S. He, Y. Jin, A. Nazaret, L. Shi, X. Chen, S. Rampersaud, B. S. Dhillon, I. Valdez, L. E. Friend, J. L. Fan et al., “Starfysh integrates spatial transcriptomic and histologic data to reveal heterogeneous tumor–immune hubs,” Nature Biotechnology, vol. 43, no. 2, pp. 223–235, 2025
- [11] D. Tellez, G. Litjens, P. Bándi, W. Bulten, J.-M. Bokhorst, F. Ciompi, and J. Van Der Laak, “Quantifying the effects of data augmentation and stain color normalization in convolutional neural networks for computational pathology,” Medical Image Analysis, vol. 58, p. 101544, 2019
- [12] B. He, L. Bergenstråhle, L. Stenbeck, A. Abid, A. Andersson, Å. Borg, J. Maaskola, J. Lundeberg, and J. Zou, “Integrating spatial gene expression and breast tumour morphology via deep learning,” Nature Biomedical Engineering, vol. 4, no. 8, pp. 827–834, 2020
- [13] H. Li, C. Wang, G. Zhao, Z. He, Y. Wang, and Z. Sun, “Sclera-transfuse: Fusing swin transformer and CNN for accurate sclera segmentation,” in 2023 IEEE International Joint Conference on Biometrics (IJCB). IEEE, 2023, pp. 1–8
- [14] M. Pang, K. Su, and M. Li, “Leveraging information in spatial transcriptomics to predict super-resolution gene expression from histology images in tumors,” bioRxiv, pp. 2021–11, 2021
- [15] Y. Zeng, Z. Wei, W. Yu, R. Yin, Y. Yuan, B. Li, Z. Tang, Y. Lu, and Y. Yang, “Spatial transcriptomics prediction from histology jointly through transformer and graph neural networks,” Briefings in Bioinformatics, vol. 23, no. 5, 2022
- [16] Y. Jia, J. Liu, L. Chen, T. Zhao, and Y. Wang, “THItoGene: a deep learning method for predicting spatial transcriptomics from histological images,” Briefings in Bioinformatics, vol. 25, no. 1, 2023
- [17] R. Xie, K. Pang, S. Chung, C. Perciani, S. MacParland, B. Wang, and G. Bader, “Spatially resolved gene expression prediction from histology images via bi-modal contrastive learning,” Advances in Neural Information Processing Systems, vol. 36, pp. 70626–70637, 2023
- [18] W. Min, Z. Shi, J. Zhang, J. Wan, and C. Wang, “Multimodal contrastive learning for spatial gene expression prediction using histology images,” Briefings in Bioinformatics, vol. 25, no. 6, p. bbae551, 2024
- [19] K. Stacke, G. Eilertsen, J. Unger, and C. Lundström, “Measuring domain shift for deep learning in histopathology,” IEEE Journal of Biomedical and Health Informatics, vol. 25, no. 2, pp. 325–336, 2020
- [20] H. Ren, B. L. Walker, Z. Cang, and Q. Nie, “Identifying multicellular spatiotemporal organization of cells with SpaceFlow,” Nature Communications, vol. 13, no. 1, p. 4076, 2022
- [21] A. L. Ji, A. J. Rubin, K. Thrane, S. Jiang, D. L. Reynolds, R. M. Meyers, M. G. Guo, B. M. George, A. Mollbrink, J. Bergenstråhle et al., “Multimodal analysis of composition and spatial architecture in human squamous cell carcinoma,” Cell, vol. 182, no. 2, pp. 497–514, 2020
- [22] A. Andersson, L. Larsson, L. Stenbeck, F. Salmén, A. Ehinger, S. Wu, G. Al-Eryani, D. Roden, A. Swarbrick, Å. Borg et al., “Spatial deconvolution of HER2-positive breast tumors reveals novel intercellular relationships,” bioRxiv, pp. 2020–07, 2020
- [23] S. Z. Wu, G. Al-Eryani, D. L. Roden, S. Junankar, K. Harvey, A. Andersson, A. Thennavan, C. Wang, J. R. Torpy, N. Bartonicek et al., “A single-cell and spatially resolved atlas of human breast cancers,” Nature Genetics, vol. 53, no. 9, pp. 1334–1347, 2021
- [24] T. M. Dang, H. Li, Y. Guo, H. Ma, F. Jiang, Y. Miao, Q. Zhou, J. Gao, and J. Huang, “HAGE: Hierarchical alignment gene-enhanced pathology representation learning with spatial transcriptomics,” in International Conference on Medical Image Computing and Computer-Assisted Intervention. Springer, 2025, pp. 228–238
- [25] C. Wang, A. S. Chan, X. Fu, S. Ghazanfar, J. Kim, E. Patrick, and J. Y. Yang, “Benchmarking the translational potential of spatial gene expression prediction from histology,” Nature Communications, vol. 16, no. 1, p. 1544, 2025