ImmuVis: Hyperconvolutional Foundation Model for Imaging Mass Cytometry

Dawid Uchal; Eike Staub; Ewa Szczurek; Jakub Giezga{\l}a; Kacper Pietrzyk; Karol Zagr\'odka; Krzysztof Gogolewski; Marcin Mo\.zejko; Mateusz Sulimowicz; Michal Orzy{\l}owski

arxiv: 2602.04585 · v2 · pith:XFV5UMRQnew · submitted 2026-02-04 · 💻 cs.CV

ImmuVis: Hyperconvolutional Foundation Model for Imaging Mass Cytometry

Dawid Uchal , Marcin Mo\.zejko , Krzysztof Gogolewski , Piotr Kupidura , Szymon {\L}ukasik , Jakub Giezga{\l}a , Tomasz Noco\'n , Kacper Pietrzyk

show 7 more authors

Robert Pieniuta Mateusz Sulimowicz Michal Orzy{\l}owski Tomasz Si{\l}kowski Karol Zagr\'odka Eike Staub Ewa Szczurek

This is my paper

Pith reviewed 2026-05-16 07:32 UTC · model grok-4.3

classification 💻 cs.CV

keywords imaging mass cytometryhyperconvolutionfoundation modelmultiplex imagingvirtual staininguncertainty calibrationmarker-adaptive convolution

0 comments

The pith

ImmuVis generates convolutional kernels on the fly from marker embeddings so one model works with any combination of molecular markers in tissue images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard vision models require a fixed set of input channels, but imaging mass cytometry measures different molecular markers in each study, so the channel count changes. ImmuVis pretrains a single set of weights that learns an embedding for every marker and then uses those embeddings to create the exact convolutional filters needed for whatever markers are present in a given image. The resulting hyperconvolution layer lets the model process completely new marker panels without retraining or architectural changes. Pretraining occurs on the IMC17M dataset of more than seventeen million patches, and the model exceeds prior methods on virtual staining and classification while also returning calibrated uncertainty estimates.

Core claim

ImmuVis establishes that marker-adaptive hyperconvolutions, driven by learned embeddings of each measured marker, allow a single foundation model to operate on arbitrary marker subsets in imaging mass cytometry data. The model is pretrained with self-supervised masked reconstruction on the largest IMC corpus to date and delivers higher accuracy than fixed-channel baselines and transformer alternatives at lower compute cost, while the heteroscedastic likelihood objective supplies the only calibrated uncertainty among compared approaches.

What carries the argument

marker-adaptive hyperconvolutions that generate convolutional kernels directly from learned embeddings of the input markers

If this is right

A single set of model weights can be deployed across studies that measure completely different marker panels.
Virtual staining and tissue classification tasks become feasible without aligning marker spaces between training and test data.
Inference cost stays low because the same convolutional backbone serves every marker combination instead of requiring separate models.
Uncertainty estimates are available for every prediction because the heteroscedastic likelihood is part of the training objective.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding-to-kernel mechanism could be tested on other variable-channel modalities such as multiplexed immunofluorescence or mass spectrometry imaging.
New studies could add markers incrementally by learning only the embedding for the added marker while keeping the rest of the model frozen.
The approach may reduce the data-collection burden in clinical settings where full marker panels are expensive or unavailable.

Load-bearing premise

Embeddings learned from the pretraining markers remain effective when asked to generate kernels for entirely new marker combinations never seen during training.

What would settle it

Measure performance on a held-out IMC dataset that uses at least one marker absent from the 265 markers in IMC17M and compare it to a model retrained from scratch on that new marker set.

read the original abstract

We present ImmuVis, a family of efficient foundation models for imaging mass cytometry (IMC), a high-throughput multiplex imaging technology that handles molecular marker measurements as image channels and enables large-scale spatial tissue profiling. Unlike natural images, multiplex imaging lacks a fixed channel space, as real-world marker sets vary across studies, violating a core assumption of standard vision backbones. To address this, ImmuVis introduces marker-adaptive hyperconvolutions that generate convolutional kernels from learned marker embeddings, enabling a single model to operate on arbitrary measured marker subsets without retraining. We pretrain ImmuVis on the largest dataset to date, IMC17M (28 cohorts, 24,405 images, 265 markers, over 17M patches), using self-supervised masked reconstruction. ImmuVis outperforms state-of-the-art baselines and ablations in virtual staining and downstream classification tasks at substantially lower compute cost than transformer-based alternatives, and is the sole model that provides calibrated uncertainty via a heteroscedastic likelihood objective. These results position ImmuVis as a practical framework for real-world IMC modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

ImmuVis gives a workable hyperconvolution fix for variable marker sets in IMC, but the generalization claim to truly new markers still needs external checks.

read the letter

The core move is generating conv kernels from marker embeddings so one model can take any subset of channels without retraining. That hyperconvolution step is new for this domain and directly solves a real pain point in multiplex imaging where panels change between studies. Pretraining on the 17M-patch IMC17M set across 28 cohorts is a solid scale choice, and the self-supervised masked reconstruction gives the model a broad starting point before fine-tuning on virtual staining or classification tasks. The lower compute relative to transformers and the heteroscedastic uncertainty output are practical bonuses if the numbers check out in the full experiments. The approach avoids obvious circularity since the embeddings are learned from data rather than baked into the architecture by design. The soft spot is the generalization test. Results on held-out combinations from the same 265 markers do not address whether the embeddings produce useful kernels for markers never seen in pretraining, different antibody clones, or new staining protocols. That is the load-bearing assumption for the “arbitrary subsets” claim, and without those external cohorts the practical reach is narrower than stated. The paper is aimed at spatial proteomics groups who run IMC and need models that survive changing marker panels. A reader already working on multiplex data would find the method and the large pretraining corpus worth examining. It deserves peer review because the idea is concrete, the dataset is substantial, and the remaining questions are fixable with targeted experiments rather than fatal.

Referee Report

2 major / 1 minor

Summary. The paper introduces ImmuVis, a family of foundation models for imaging mass cytometry that employs marker-adaptive hyperconvolutions. These generate convolutional kernels dynamically from learned marker embeddings, allowing a single pretrained model to process arbitrary marker subsets without retraining. The model is pretrained via self-supervised masked reconstruction on the large IMC17M dataset (28 cohorts, 24,405 images, 265 markers, >17M patches) and is claimed to outperform baselines in virtual staining and downstream classification tasks at lower compute cost than transformers while uniquely providing calibrated uncertainty through a heteroscedastic likelihood.

Significance. If the central claims hold, this work would be significant for multiplex imaging by directly addressing the variable-channel problem that breaks standard vision backbones. It offers a practical, efficient alternative to transformers for real-world IMC analysis and introduces uncertainty calibration that could improve reliability in downstream biological applications.

major comments (2)

[Abstract] Abstract: The assertions of outperformance over state-of-the-art baselines and unique calibrated uncertainty are stated without any quantitative metrics, ablation results, or error analysis, preventing verification of the central claims from the provided text.
[§4] §4 (Experiments): Evaluation is performed only on held-out marker subsets drawn from the same 265-marker vocabulary used in pretraining; this does not test generalization to entirely new markers (different antibody clones, staining protocols, or novel targets) from external cohorts, which is load-bearing for the marker-adaptive hyperconvolution claim.

minor comments (1)

[§3.2] §3.2: The hyperconvolution operation would benefit from an explicit equation showing how marker embeddings are mapped to kernel weights, including any dimensionality or normalization details.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback on our manuscript. We address each major comment below and describe the corresponding revisions.

read point-by-point responses

Referee: [Abstract] Abstract: The assertions of outperformance over state-of-the-art baselines and unique calibrated uncertainty are stated without any quantitative metrics, ablation results, or error analysis, preventing verification of the central claims from the provided text.

Authors: We agree that the abstract, being a concise summary, lacks quantitative support for the performance claims. In the revised manuscript we will insert key metrics (e.g., PSNR/SSIM gains in virtual staining and accuracy improvements in downstream classification) directly into the abstract. Full quantitative results, ablations, and uncertainty calibration details remain in Section 4 and the supplement. revision: yes
Referee: [§4] §4 (Experiments): Evaluation is performed only on held-out marker subsets drawn from the same 265-marker vocabulary used in pretraining; this does not test generalization to entirely new markers (different antibody clones, staining protocols, or novel targets) from external cohorts, which is load-bearing for the marker-adaptive hyperconvolution claim.

Authors: The experiments test the core capability of processing arbitrary subsets drawn from the pretrained 265-marker vocabulary without retraining, which directly addresses the variable-channel problem encountered in real IMC studies. We do not claim zero-shot generalization to entirely novel markers outside this vocabulary, as new markers would require learning new embeddings. In revision we will explicitly delimit the scope of the claim, add a dedicated limitations paragraph, and outline future directions for extending the embedding space. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The paper's central mechanism (marker-adaptive hyperconvolutions generating kernels from embeddings) is an architectural design trained via self-supervised masked reconstruction on the external IMC17M dataset (28 cohorts, 24k+ images, 265 markers). Downstream evaluations use separate tasks and held-out data. No equations reduce to fitted inputs by construction, no self-citation chains load-bear the uniqueness or derivation, and no ansatz or renaming is smuggled in. The result is independently falsifiable on external cohorts and does not collapse to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 1 invented entities

The central claim rests on the unproven assumption that marker embeddings can be learned to produce suitable kernels for unseen marker combinations; no free parameters or invented entities are quantified in the abstract.

free parameters (1)

marker embeddings
Learned vectors that drive kernel generation; their dimensionality and training details are unspecified.

axioms (1)

domain assumption Marker embeddings capture functional properties sufficient to generate appropriate convolutional kernels for any marker subset
Core premise enabling the hyperconvolution to work without retraining on new panels.

invented entities (1)

hyperconvolution no independent evidence
purpose: Dynamically generates convolutional kernels from marker embeddings
New architectural component introduced to solve variable-channel problem

pith-pipeline@v0.9.0 · 5568 in / 1220 out tokens · 46538 ms · 2026-05-16T07:32:09.971514+00:00 · methodology

ImmuVis: Hyperconvolutional Foundation Model for Imaging Mass Cytometry

Core claim

What carries the argument

If this is right

Where Pith is reading between the lines

Load-bearing premise

What would settle it

discussion (0)