arXiv: 2605.04682 · v1 · submitted 2026-05-06 · cs.LG · cs.CV

HEXST: Hexagonal Shifted-Window Transformer for Spatial Transcriptomics Gene Expression Prediction

Keunho Byeon, Jin Tae Kwak

Pith reviewed 2026-05-08 16:50 UTC · model grok-4.3

keywords: spatial transcriptomics, gene expression prediction, hexagonal transformer, histology slides, spatial heterogeneity, shifted-window attention, rotary positional encoding

The pith

A transformer built for hexagonal spot layouts predicts gene activity from tissue slides while keeping local contrasts sharp.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to show that existing models for turning routine microscope slides into spatial gene maps lose important detail because they ignore the actual hexagonal arrangement of measurement spots on common platforms. HEXST instead builds attention windows and position encodings that match this hexagonal sampling directly, then adds a training term that rewards differences between neighboring genes rather than averaging them away. A reader should care because this could let doctors and researchers obtain reliable maps of gene activity without running costly specialized assays for every sample. If the approach holds, histology images alone would yield predictions that reflect real biological variation across space instead of smoothed approximations.

Core claim

HEXST operates directly on hexagonal spot coordinates to enable efficient local-to-global contextual modeling via a tailored shifted-window attention mechanism and hexagonal rotary positional encoding, and it complements point-wise regression with a contrast-sensitive differential objective plus transcriptomic priors from a pretrained single-cell model, yielding accurate and robust spatial gene expression predictions that preserve gene-wise contrast and spatial heterogeneity across seven datasets.

What carries the argument

Hexagonal shifted-window attention paired with hexagonal rotary positional encoding, which aligns the model's context aggregation with the native geometry of spot-array platforms.
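
To make "native geometry" concrete, here is a minimal sketch of hexagonal lattice indexing in axial coordinates. It is a generic illustration, not code from the paper: the names `HEX_DIRECTIONS`, `hex_neighbors`, `hex_distance`, and `hex_ball` are hypothetical, and the paper may index spots differently.

```python
# Axial coordinates (q, r) index a hexagonal lattice; the implied third cube
# coordinate is s = -q - r. All names here are illustrative, not from the paper.
HEX_DIRECTIONS = [(1, 0), (1, -1), (0, -1), (-1, 0), (-1, 1), (0, 1)]


def hex_neighbors(q, r):
    """Return the six lattice sites adjacent to (q, r)."""
    return [(q + dq, r + dr) for dq, dr in HEX_DIRECTIONS]


def hex_distance(a, b):
    """Number of lattice steps between two axial coordinates."""
    dq, dr = a[0] - b[0], a[1] - b[1]
    return (abs(dq) + abs(dr) + abs(dq + dr)) // 2


def hex_ball(center, radius):
    """All sites within `radius` steps of `center` (a hexagon-shaped patch)."""
    cq, cr = center
    return [
        (cq + dq, cr + dr)
        for dq in range(-radius, radius + 1)
        for dr in range(max(-radius, -dq - radius), min(radius, -dq + radius) + 1)
    ]


ring = hex_neighbors(0, 0)
patch = hex_ball((0, 0), 2)
print(len(ring), len(patch))  # 6 19
```

Every site has six neighbors at equal distance, whereas a square grid offers four edge neighbors, or eight if diagonals at a longer distance are counted. That mismatch is what a Cartesian locality assumption gets wrong on a spot array.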

If this is right

Gene expression maps retain distinct profiles for individual genes rather than converging toward uniform averages.
Predictions remain robust when applied to new tissue sections from the same platforms without major retraining.
The model can incorporate single-cell priors during training to guide spatial inference from bulk histology alone.
Over-smoothing artifacts common in point-wise regression objectives are reduced in the final output maps.
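
The over-smoothing point can be illustrated with a toy loss. The sketch below is an assumption about what a contrast-sensitive differential term could look like, not the paper's formulation: it penalizes mismatches in pairwise differences between gene values at one spot, and the names `differential_term`, `combined_loss`, and the weight `lam` are hypothetical.

```python
from itertools import combinations


def mse(pred, true):
    """Plain point-wise mean squared error."""
    return sum((p - t) ** 2 for p, t in zip(pred, true)) / len(true)


def differential_term(pred, true):
    """Mean squared mismatch between predicted and true pairwise differences.

    Assumed form for illustration only; the paper's objective may differ.
    """
    pairs = list(combinations(range(len(true)), 2))
    return sum(
        ((pred[i] - pred[j]) - (true[i] - true[j])) ** 2 for i, j in pairs
    ) / len(pairs)


def combined_loss(pred, true, lam=0.5):
    """Point-wise regression plus a weighted contrast term (lam is hypothetical)."""
    return mse(pred, true) + lam * differential_term(pred, true)


true_expr = [0.0, 2.0]    # two genes with a clear contrast
flat_pred = [1.0, 1.0]    # averaged-out prediction
offset_pred = [1.0, 3.0]  # biased, but contrast preserved

print(mse(flat_pred, true_expr), mse(offset_pred, true_expr))        # 1.0 1.0
print(combined_loss(flat_pred, true_expr), combined_loss(offset_pred, true_expr))  # 3.0 1.0
```

Both predictions have the same point-wise error, so plain regression cannot tell them apart. Only the added term prefers the prediction that keeps the contrast between the two genes.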

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Similar geometry-aware attention designs could be tested on other data collected on non-rectangular lattices, such as certain lattice-based imaging sensors.
The contrast-sensitive objective might be adapted to other regression tasks where preserving local differences matters more than global averages.
If the hexagonal components prove essential, future histology-based predictors may need to expose their sampling geometry as an explicit input rather than assuming image-like grids.

Load-bearing premise

That matching attention and position encoding to hexagonal spot geometry plus using a contrast-sensitive loss term will capture gene-specific spatial patterns more faithfully than standard Cartesian or geometry-agnostic models.

What would settle it

If a baseline transformer without hexagonal window shifting or hexagonal rotary encoding matches or exceeds HEXST accuracy and contrast preservation on the same seven datasets, the claim that geometry alignment is required would be falsified.

Figures

Figures reproduced from arXiv: 2605.04682 by Jin Tae Kwak, Keunho Byeon.

**Figure 1.** Overview of the HEXST architecture. Given a WSI, spot-level image patches are extracted and encoded by a pathology foundation model, and then processed by a multi-stage shifted hexagonal window Transformer with hexagonal rotary positional encoding (HexRoPE). HEXST predicts spatial gene expression from spot images by matching deviations between ground truth and predictions and aligning intermediate represen…

**Figure 2.** Visualization of hexagonal window partitioning. By varying the shifting offset δ and window size k, distinct windows are formed (displayed with different colors).
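
A minimal sketch of how a shifting offset and a window size can produce different partitions of the same spots. It deliberately simplifies: windows here are k-by-k rhombi along the two axial directions, which is not necessarily the window shape the paper uses, and `window_key`, `partition_windows`, and `delta` are hypothetical names.

```python
from collections import defaultdict


def window_key(spot, k, delta=(0, 0)):
    """Window index of an axial coordinate (q, r) after shifting by delta."""
    q, r = spot
    return ((q + delta[0]) // k, (r + delta[1]) // k)


def partition_windows(spots, k, delta=(0, 0)):
    """Group spots into rhombic windows of side k (a simplifying assumption)."""
    windows = defaultdict(list)
    for spot in spots:
        windows[window_key(spot, k, delta)].append(spot)
    return dict(windows)


spots = [(q, r) for q in range(4) for r in range(4)]
plain = partition_windows(spots, k=2)
shifted = partition_windows(spots, k=2, delta=(1, 1))
print(len(plain), len(shifted))  # 4 9

# Spots split across a window border in one block share a window in the next.
print(window_key((1, 1), 2) == window_key((2, 2), 2))                  # False
print(window_key((1, 1), 2, (1, 1)) == window_key((2, 2), 2, (1, 1)))  # True
```

Alternating between the unshifted and shifted partitions across successive blocks lets attention restricted to windows still pass information across window borders, the same idea as the shifted windows of Swin Transformer.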

**Figure 3.** Conceptual illustration of hexagonal rotary positional encoding (HexRoPE). Within each window, HexRoPE assigns a unique set of relative spatial offsets that are used to rotate query and key features. Rectangular boxes denote rotation angles applied along each principal direction (u, v, w) of the hexagonal lattice.
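
The caption's idea of rotating query and key features along three lattice directions can be sketched as follows. This is an illustration under stated assumptions, not the paper's HexRoPE: the three directions are taken to be the cube coordinates (q, r, s = -q - r), the feature vector is split into three equal groups, and the frequency schedule copies the common rotary-embedding convention. The names `hex_rope`, `rotate_pair`, and `base_freq` are hypothetical.

```python
import math


def rotate_pair(x, y, angle):
    """Rotate the 2-D point (x, y) by `angle` radians."""
    c, s = math.cos(angle), math.sin(angle)
    return x * c - y * s, x * s + y * c


def hex_rope(vec, q, r, base_freq=1.0):
    """Rotate feature pairs by angles proportional to the cube coordinates.

    `vec` must have a length divisible by 6: three axis groups, each made of
    (x, y) pairs. The axis assignment and frequencies are assumptions.
    """
    coords = (q, r, -q - r)
    group = len(vec) // 3
    out = []
    for axis, coord in enumerate(coords):
        chunk = vec[axis * group:(axis + 1) * group]
        for i in range(0, group, 2):
            freq = base_freq / (10000 ** (i / group))
            out.extend(rotate_pair(chunk[i], chunk[i + 1], coord * freq))
    return out


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


query = [1.0, 0.0, 0.5, -0.5, 0.2, 0.3]
key = [0.3, 0.7, -0.1, 0.4, 0.9, -0.2]
score_a = dot(hex_rope(query, 1, 2), hex_rope(key, 3, -1))
score_b = dot(hex_rope(query, 6, -1), hex_rope(key, 8, -4))  # both shifted by (5, -3)
print(abs(score_a - score_b) < 1e-9)  # True
```

The attention score depends only on the offset between the two spots, not on where the pair sits on the slide, which is the property that makes rotary encodings relative.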

**Figure 4.** Qualitative comparison of spatial gene expression prediction results. Each column shows heatmaps of ground truth and HEXST-predicted gene expression for selected marker genes. Heatmap values are annotated as mean ± standard deviation (minimum–maximum) across spatial spots.

**Figure 5.** Qualitative examples of transcriptomics-guided survival prediction on TCGA-LUAD. A representative slide is shown, annotated with survival outcome (event vs. censored) and follow-up time (days), together with predicted spatial gene expression heatmaps. The numbers reported below each heatmap indicate the mean ± standard deviation with the minimum–maximum range of predicted expression computed across all sp…

**Figure 6.** Rank-based comparison across seven SpaRED datasets over five metrics (PCCF, PCCS, MIF, AUC0vNZ, AUCQ50). For each metric and dataset, models are ranked using dense ranking (1 = best), and the bar plot reports the top-1 frequency (fraction of datasets where a method attains rank 1).
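
The ranking procedure in this caption can be sketched in a few lines. The model names and scores below are made up for illustration, and `dense_rank` and `top1_frequency` are hypothetical helper names; the sketch assumes higher scores are better for the metric being ranked.

```python
def dense_rank(scores, higher_is_better=True):
    """Dense ranking: ties share a rank and the next rank is consecutive."""
    distinct = sorted(set(scores.values()), reverse=higher_is_better)
    rank_of = {value: i + 1 for i, value in enumerate(distinct)}
    return {model: rank_of[value] for model, value in scores.items()}


def top1_frequency(per_dataset_scores):
    """Fraction of datasets on which each model attains rank 1."""
    counts = {}
    for scores in per_dataset_scores:
        for model, rank in dense_rank(scores).items():
            counts[model] = counts.get(model, 0) + (rank == 1)
    n = len(per_dataset_scores)
    return {model: count / n for model, count in counts.items()}


# Made-up scores for three placeholder models on two placeholder datasets.
toy_scores = [
    {"A": 0.5, "B": 0.5, "C": 0.3},
    {"A": 0.4, "B": 0.6, "C": 0.2},
]
print(dense_rank(toy_scores[0]))   # {'A': 1, 'B': 1, 'C': 2}
print(top1_frequency(toy_scores))  # {'A': 0.5, 'B': 1.0, 'C': 0.0}
```

Because tied models share rank 1 under dense ranking, top-1 frequencies across methods can sum to more than one.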

**Figure 7.** Qualitative comparison of spatial gene expression prediction results. For each dataset, the first column shows Leiden clustering results, followed by ground-truth (GT) and HEXST-predicted gene expression heatmaps for selected marker genes. Results are shown for Abalo Human Squamous Cell Carcinoma, Erickson Human Prostate Cancer P1, and Villacampa Lung Organoid. Heatmap values are annotated as mean ± standa…

**Figure 8.** Qualitative examples of transcriptomics-guided survival prediction on TCGA-LUAD. For each patient, we show a representative slide thumbnail annotated with survival outcome (event vs. censored) and follow-up time, together with predicted spatial gene expression heatmaps for SLC2A1, MDK, and B2M. The values reported below each heatmap indicate the mean ± standard deviation with the minimum–maximum range of p…

Original abstract

Spatial transcriptomics offers spatially resolved gene expression profiling within tissue sections, but its cost and limited throughput hinder large-scale deployment. To extend this capability to routine practice, recent computational methods aim to infer spatial gene expression directly from ubiquitous hematoxylin and eosin-stained histology slides. However, most existing models assume Cartesian or geometry-agnostic locality, despite the hexagonal sampling of widely used spot-array platforms, and point-wise regression objectives often yield over-smoothed gene expression profiles, obscuring gene-specific spatial heterogeneity. To address these limitations, we propose HEXST, a geometry-aligned Transformer for spatial gene expression prediction from histology. HEXST operates directly on hexagonal spot coordinates to enable efficient local-to-global contextual modeling via a tailored shifted-window attention mechanism and hexagonal rotary positional encoding. To enhance gene-wise spatial contrast, HEXST complements point-wise regression with a contrast-sensitive differential objective and transcriptomic priors from a pretrained single-cell foundation model during training. Across seven spatial transcriptomics datasets, HEXST consistently outperforms state-of-the-art models, providing accurate and robust spatial gene expression predictions while preserving gene-wise contrast and spatial heterogeneity.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

HEXST tunes a Transformer to hexagonal spot geometry and adds a contrast-sensitive loss for spatial transcriptomics prediction, but the abstract supplies no numbers or ablations to back the outperformance claim.

HEXST adapts the Transformer to hexagonal spot geometry and adds a contrast-sensitive loss for spatial transcriptomics prediction, but the abstract supplies no numbers or ablations to back the outperformance claim. The work targets a real bottleneck: most histology-to-gene models assume square grids or ignore geometry, and plain regression losses tend to wash out gene-specific spatial patterns.

HEXST instead uses shifted-window attention built for hex coordinates, a matching rotary positional encoding, and an extra differential term in the objective, plus priors from a pretrained single-cell model. Those choices are a direct response to the sampling layout and the smoothing issue, and they are not just a minor tweak on existing Cartesian designs. If the gains are real, the approach could help scale up spatial studies without running full ST assays on every sample.

The abstract states consistent wins across seven datasets while preserving contrast and heterogeneity. The main weakness is the complete absence of quantitative results, baseline names, ablation tables, error bars, or statistical tests in the provided text. Without those, it is impossible to tell whether the hexagonal adaptations actually deliver measurable improvement or whether the contrast term helps more than a simpler regularizer. The reliance on an external single-cell prior is reasonable but also means the model is not purely end-to-end from histology.

This is aimed at computational pathology and spatial omics groups that already experiment with Transformers on histology. A reader who wants concrete ideas for geometry-aware attention could extract useful pieces even if the final numbers need checking. I would send it to peer review. The problem is practical, the architectural moves are targeted, and a referee can verify the results and controls once the full methods and tables are examined.

Referee Report

2 major / 1 minor

Summary. The paper proposes HEXST, a geometry-aligned Transformer for inferring spatial gene expression from H&E histology slides. It operates on hexagonal spot coordinates using a tailored shifted-window attention mechanism and hexagonal rotary positional encoding, augments standard point-wise regression with a contrast-sensitive differential objective, and incorporates transcriptomic priors from a pretrained single-cell foundation model. The central claim is that HEXST consistently outperforms state-of-the-art models across seven spatial transcriptomics datasets while preserving gene-wise contrast and spatial heterogeneity.

Significance. If the performance claims hold with rigorous validation, the work could meaningfully advance computational spatial transcriptomics by respecting the native hexagonal sampling geometry of platforms such as Visium and mitigating over-smoothing that obscures spatial heterogeneity. The geometry-specific architectural choices and differential objective represent a targeted adaptation that, if shown to generalize, may influence similar modeling in other spatially resolved biological data modalities.

major comments (2)

Abstract: The assertion that HEXST 'consistently outperforms state-of-the-art models' on seven datasets is presented without any quantitative metrics (e.g., MSE, Pearson correlation), baseline specifications, statistical tests, ablation results, or error analysis. This absence is load-bearing for the central claim and prevents assessment of whether the reported gains are meaningful or robust.
Methods (architecture and objective): The motivation for hexagonal rotary positional encoding and the contrast-sensitive differential objective is stated as addressing Cartesian assumptions and over-smoothing, yet no equations, derivations, or ablation studies are supplied to demonstrate that these components produce measurable improvement in gene-specific spatial heterogeneity over geometry-agnostic baselines.

minor comments (1)

Abstract: The writing is clear, but inclusion of one or two key quantitative highlights (with dataset names) would allow readers to gauge the scale of improvement immediately.
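
For readers unfamiliar with the metrics the report asks for, here is a minimal sketch of a per-gene Pearson correlation computed across spots. The data are made up, the helper names are hypothetical, and the paper's own metric definitions (for example PCCF and PCCS in Figure 6) are not reproduced here.

```python
import math


def pearson(x, y):
    """Pearson correlation; returns NaN when either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    vx = sum((a - mx) ** 2 for a in x)
    vy = sum((b - my) ** 2 for b in y)
    if vx == 0 or vy == 0:
        return float("nan")
    return cov / math.sqrt(vx * vy)


def per_gene_pearson(pred, true):
    """pred, true: lists of spots, each spot a list of gene values."""
    n_genes = len(true[0])
    return [
        pearson([spot[g] for spot in pred], [spot[g] for spot in true])
        for g in range(n_genes)
    ]


# Made-up values: three spots, two genes.
true_spots = [[1.0, 4.0], [2.0, 2.0], [3.0, 0.0]]
pred_spots = [[2.0, 1.0], [4.0, 1.0], [6.0, 1.0]]
gene_scores = per_gene_pearson(pred_spots, true_spots)
print(gene_scores[0], math.isnan(gene_scores[1]))  # 1.0 True
```

The second gene shows why correlation-style metrics are relevant to over-smoothing: a spatially flat prediction has no variance, so its correlation with the ground truth is undefined rather than merely low.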

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the detailed and constructive feedback. We address each major comment below and will revise the manuscript to strengthen the presentation of results and methods.

Referee: Abstract: The assertion that HEXST 'consistently outperforms state-of-the-art models' on seven datasets is presented without any quantitative metrics (e.g., MSE, Pearson correlation), baseline specifications, statistical tests, ablation results, or error analysis. This absence is load-bearing for the central claim and prevents assessment of whether the reported gains are meaningful or robust.

Authors: We agree that the abstract would be strengthened by including concise quantitative support for the performance claim. In the revised version we will add key metrics (e.g., mean Pearson correlation and MSE improvements across the seven datasets) together with a brief statement of the primary baselines and the use of paired statistical tests. Full tables, error analyses, and ablation results remain in the main Results section and supplementary material.

Revision: yes

Referee: Methods (architecture and objective): The motivation for hexagonal rotary positional encoding and the contrast-sensitive differential objective is stated as addressing Cartesian assumptions and over-smoothing, yet no equations, derivations, or ablation studies are supplied to demonstrate that these components produce measurable improvement in gene-specific spatial heterogeneity over geometry-agnostic baselines.

Authors: The current Methods section provides the architectural description and objective formulation but does not include explicit derivations or dedicated ablation experiments isolating the hexagonal rotary encoding and contrast-sensitive loss. We will add (i) the full mathematical definitions and a short derivation showing how the hexagonal rotary encoding differs from standard Cartesian rotary encodings, and (ii) ablation results quantifying the contribution of each component to gene-wise spatial contrast and heterogeneity metrics relative to Cartesian and standard-regression baselines. These additions will appear in the main text or supplementary material.

Revision: yes
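
The first response mentions paired statistical tests without naming one. As one possible example, the sketch below implements an exact two-sided sign test over per-dataset scores. The choice of test is an assumption made here for illustration, the scores are made up, and `sign_test_p_value` is a hypothetical name.

```python
from math import comb


def sign_test_p_value(method_scores, baseline_scores):
    """Exact two-sided sign test on paired scores; ties are dropped."""
    diffs = [m - b for m, b in zip(method_scores, baseline_scores) if m != b]
    n = len(diffs)
    if n == 0:
        return 1.0
    wins = sum(d > 0 for d in diffs)
    # Probability mass of the smaller tail under a fair-coin null, doubled.
    tail = sum(comb(n, i) for i in range(min(wins, n - wins) + 1))
    return min(1.0, 2 * tail / 2 ** n)


# Made-up scores on seven datasets where the method wins every time.
method = [0.5, 0.6, 0.7, 0.4, 0.8, 0.9, 0.3]
baseline = [0.4, 0.5, 0.6, 0.3, 0.7, 0.8, 0.2]
print(sign_test_p_value(method, baseline))  # 0.015625
```

With only seven datasets, 2/128 = 0.015625 is the smallest two-sided p-value this test can produce, so win counts across datasets alone cannot support strong significance claims. Tests that use effect sizes or per-sample pairing would be needed for that.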

Circularity Check

0 steps flagged

No significant circularity identified

The accessible text consists only of the abstract, which describes architectural motivations (hexagonal coordinates, shifted-window attention, rotary encoding, contrast-sensitive objective) and use of an external pretrained single-cell model without providing equations, derivation steps, fitted-parameter predictions, or self-citations. No load-bearing claim reduces to its own inputs by construction; empirical outperformance is asserted across datasets with external grounding. Full manuscript details are unavailable, precluding identification of any circular reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Based solely on the abstract, no explicit free parameters, axioms, or invented entities are described. The model presumably inherits standard Transformer hyperparameters and relies on an external pretrained single-cell foundation model whose weights are treated as given.
