ImmuVis: Hyperconvolutional Foundation Model for Imaging Mass Cytometry
Pith reviewed 2026-05-16 07:32 UTC · model grok-4.3
The pith
ImmuVis generates convolutional kernels on the fly from marker embeddings so one model works with any combination of molecular markers in tissue images.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ImmuVis establishes that marker-adaptive hyperconvolutions, driven by learned embeddings of each measured marker, allow a single foundation model to operate on arbitrary marker subsets in imaging mass cytometry (IMC) data. The model is pretrained with self-supervised masked reconstruction on the largest IMC corpus to date and delivers higher accuracy than fixed-channel baselines and transformer alternatives at lower compute cost. Its heteroscedastic likelihood objective also makes it the only compared approach that provides calibrated uncertainty.
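The masked-reconstruction pretraining mentioned above can be illustrated with a minimal NumPy sketch. The masking scheme is not described in this summary, so everything here is an assumption: square spatial patches are hidden with a fixed ratio, the mask is shared across marker channels, and the error is scored only on hidden pixels. The names `random_patch_mask` and `masked_mse` are hypothetical, and the "model" is a trivial stand-in that predicts zeros.

```python
import numpy as np


def random_patch_mask(height, width, patch, mask_ratio, rng):
    """Boolean (H, W) mask; True marks pixels hidden from the model.

    Assumes height and width are multiples of `patch`.
    """
    gh, gw = height // patch, width // patch
    n_masked = int(round(mask_ratio * gh * gw))
    grid = np.zeros(gh * gw, dtype=bool)
    grid[rng.choice(gh * gw, size=n_masked, replace=False)] = True
    grid = grid.reshape(gh, gw)
    # Expand each grid cell into a patch x patch block of pixels.
    return np.repeat(np.repeat(grid, patch, axis=0), patch, axis=1)


def masked_mse(image, reconstruction, mask):
    """Mean squared error over hidden pixels only, for a (C, H, W) image."""
    diff = (image - reconstruction)[:, mask]  # shape (C, n_hidden_pixels)
    return float(np.mean(diff ** 2))


rng = np.random.default_rng(0)
image = rng.normal(size=(3, 16, 16))           # 3 marker channels (toy data)
mask = random_patch_mask(16, 16, patch=4, mask_ratio=0.5, rng=rng)
visible = np.where(mask, 0.0, image)           # what the model would see
loss = masked_mse(image, visible, mask)        # stand-in model outputs zeros
print(mask.mean(), loss)
```

Scoring only the hidden pixels forces a model to infer missing tissue content from its surroundings rather than copy its input; whether ImmuVis masks patches, channels, or both is not stated here.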
What carries the argument
Marker-adaptive hyperconvolutions, which generate convolutional kernels directly from learned embeddings of the input markers.
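To make the mechanism concrete, here is a minimal NumPy sketch of one way to turn marker embeddings into convolution kernels. It is not the paper's architecture: the single linear hypernetwork `W_hyper`, the sizes, the marker names, and the functions `make_kernels` and `hyperconv` are all illustrative assumptions, and random values stand in for learned parameters.

```python
import numpy as np

rng = np.random.default_rng(0)

EMBED_DIM = 8      # size of each marker embedding (hypothetical)
OUT_CHANNELS = 4   # feature maps produced by the layer (hypothetical)
KERNEL = 3         # spatial kernel size (hypothetical)

# One embedding per marker in the vocabulary (random stand-ins for learned values).
vocab = ["CD3", "CD8", "CD20", "Ki67", "panCK"]
embeddings = {m: rng.normal(size=EMBED_DIM) for m in vocab}

# Hypernetwork: a single linear map from an embedding to a bank of kernels.
W_hyper = rng.normal(size=(EMBED_DIM, OUT_CHANNELS * KERNEL * KERNEL)) * 0.1


def make_kernels(markers):
    """Return kernels of shape (n_markers, OUT_CHANNELS, KERNEL, KERNEL)."""
    E = np.stack([embeddings[m] for m in markers])  # (n_markers, EMBED_DIM)
    return (E @ W_hyper).reshape(len(markers), OUT_CHANNELS, KERNEL, KERNEL)


def hyperconv(image, markers):
    """Apply marker-specific kernels and sum over the input markers.

    image: (n_markers, H, W), channel i measured with markers[i].
    Returns (OUT_CHANNELS, H - KERNEL + 1, W - KERNEL + 1).
    """
    kernels = make_kernels(markers)
    # windows has shape (n_markers, H-K+1, W-K+1, K, K).
    windows = np.lib.stride_tricks.sliding_window_view(
        image, (KERNEL, KERNEL), axis=(1, 2)
    )
    # Cross-correlation, contracted over markers and kernel positions.
    return np.einsum("nhwij,noij->ohw", windows, kernels)


panel_a = ["CD3", "CD8", "panCK"]   # two studies with different panels
panel_b = ["CD20", "Ki67"]
out_a = hyperconv(rng.normal(size=(3, 16, 16)), panel_a)
out_b = hyperconv(rng.normal(size=(2, 16, 16)), panel_b)
print(out_a.shape, out_b.shape)  # (4, 14, 14) (4, 14, 14)
```

Because kernels are looked up per marker and the contributions are summed, the output shape does not depend on how many markers were measured, and reordering the channels together with their marker names leaves the result unchanged.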
If this is right
- A single set of model weights can be deployed across studies that measure completely different marker panels.
- Virtual staining and tissue classification tasks become feasible without aligning marker spaces between training and test data.
- Inference cost stays low because the same convolutional backbone serves every marker combination instead of requiring separate models.
- Uncertainty estimates are available for every prediction because the heteroscedastic likelihood is part of the training objective.
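The last point can be sketched with a generic heteroscedastic loss. This summary does not state which likelihood family ImmuVis uses, so the Gaussian form below, with a predicted mean and log-variance per pixel, is an assumption, and `gaussian_nll` is a hypothetical helper rather than the paper's objective.

```python
import math


def gaussian_nll(target, mean, log_var):
    """Negative log-likelihood of `target` under N(mean, exp(log_var))."""
    return 0.5 * (
        log_var
        + (target - mean) ** 2 / math.exp(log_var)
        + math.log(2 * math.pi)
    )


# A large error is penalised less when the model admits high variance ...
confident_wrong = gaussian_nll(target=3.0, mean=0.0, log_var=0.0)
uncertain_wrong = gaussian_nll(target=3.0, mean=0.0, log_var=2.0)

# ... but claiming high variance on an accurate prediction costs extra.
confident_right = gaussian_nll(target=0.0, mean=0.0, log_var=0.0)
uncertain_right = gaussian_nll(target=0.0, mean=0.0, log_var=2.0)

print(confident_wrong > uncertain_wrong, uncertain_right > confident_right)
```

The two opposing terms are what push the predicted variance toward the actual error magnitude; that is the usual argument for why such an objective can yield calibrated uncertainty, though calibration still has to be checked empirically.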
Where Pith is reading between the lines
- The same embedding-to-kernel mechanism could be tested on other variable-channel modalities such as multiplexed immunofluorescence or mass spectrometry imaging.
- New studies could add markers incrementally by learning only the embedding for the added marker while keeping the rest of the model frozen.
- The approach may reduce the data-collection burden in clinical settings where full marker panels are expensive or unavailable.
Load-bearing premise
Embeddings learned for the pretraining markers remain effective when used to generate kernels for marker combinations never seen during training.
What would settle it
Measure performance on a held-out IMC dataset that uses at least one marker absent from the 265 markers in IMC17M and compare it to a model retrained from scratch on that new marker set.
Original abstract
We present ImmuVis, a family of efficient foundation models for imaging mass cytometry (IMC), a high-throughput multiplex imaging technology that handles molecular marker measurements as image channels and enables large-scale spatial tissue profiling. Unlike natural images, multiplex imaging lacks a fixed channel space, as real-world marker sets vary across studies, violating a core assumption of standard vision backbones. To address this, ImmuVis introduces marker-adaptive hyperconvolutions that generate convolutional kernels from learned marker embeddings, enabling a single model to operate on arbitrary measured marker subsets without retraining. We pretrain ImmuVis on the largest dataset to date, IMC17M (28 cohorts, 24,405 images, 265 markers, over 17M patches), using self-supervised masked reconstruction. ImmuVis outperforms state-of-the-art baselines and ablations in virtual staining and downstream classification tasks at substantially lower compute cost than transformer-based alternatives, and is the sole model that provides calibrated uncertainty via a heteroscedastic likelihood objective. These results position ImmuVis as a practical framework for real-world IMC modeling.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces ImmuVis, a family of foundation models for imaging mass cytometry that employs marker-adaptive hyperconvolutions. These generate convolutional kernels dynamically from learned marker embeddings, allowing a single pretrained model to process arbitrary marker subsets without retraining. The model is pretrained via self-supervised masked reconstruction on the large IMC17M dataset (28 cohorts, 24,405 images, 265 markers, >17M patches) and is claimed to outperform baselines in virtual staining and downstream classification tasks at lower compute cost than transformers while uniquely providing calibrated uncertainty through a heteroscedastic likelihood.
Significance. If the central claims hold, this work would be significant for multiplex imaging by directly addressing the variable-channel problem that breaks standard vision backbones. It offers a practical, efficient alternative to transformers for real-world IMC analysis and introduces uncertainty calibration that could improve reliability in downstream biological applications.
major comments (2)
- [Abstract] Abstract: The assertions of outperformance over state-of-the-art baselines and unique calibrated uncertainty are stated without any quantitative metrics, ablation results, or error analysis, preventing verification of the central claims from the provided text.
- [§4] §4 (Experiments): Evaluation is performed only on held-out marker subsets drawn from the same 265-marker vocabulary used in pretraining; this does not test generalization to entirely new markers (different antibody clones, staining protocols, or novel targets) from external cohorts, which is load-bearing for the marker-adaptive hyperconvolution claim.
minor comments (1)
- [§3.2] §3.2: The hyperconvolution operation would benefit from an explicit equation showing how marker embeddings are mapped to kernel weights, including any dimensionality or normalization details.
Simulated Author's Rebuttal
We thank the referee for their thoughtful review and constructive feedback on our manuscript. We address each major comment below and describe the corresponding revisions.
Point-by-point responses
- Referee: [Abstract] Abstract: The assertions of outperformance over state-of-the-art baselines and unique calibrated uncertainty are stated without any quantitative metrics, ablation results, or error analysis, preventing verification of the central claims from the provided text.
  Authors: We agree that the abstract, being a concise summary, lacks quantitative support for the performance claims. In the revised manuscript we will insert key metrics (e.g., PSNR/SSIM gains in virtual staining and accuracy improvements in downstream classification) directly into the abstract. Full quantitative results, ablations, and uncertainty calibration details remain in Section 4 and the supplement.
  Revision: yes
- Referee: [§4] §4 (Experiments): Evaluation is performed only on held-out marker subsets drawn from the same 265-marker vocabulary used in pretraining; this does not test generalization to entirely new markers (different antibody clones, staining protocols, or novel targets) from external cohorts, which is load-bearing for the marker-adaptive hyperconvolution claim.
  Authors: The experiments test the core capability of processing arbitrary subsets drawn from the pretrained 265-marker vocabulary without retraining, which directly addresses the variable-channel problem encountered in real IMC studies. We do not claim zero-shot generalization to entirely novel markers outside this vocabulary, as new markers would require learning new embeddings. In revision we will explicitly delimit the scope of the claim, add a dedicated limitations paragraph, and outline future directions for extending the embedding space.
  Revision: partial
Circularity Check
No significant circularity detected
full rationale
The paper's central mechanism (marker-adaptive hyperconvolutions that generate kernels from embeddings) is an architectural design trained via self-supervised masked reconstruction on the external IMC17M dataset (28 cohorts, 24,405 images, 265 markers). Downstream evaluations use separate tasks and held-out data. No equation reduces to its fitted inputs by construction, no self-citation chain carries the uniqueness claim or the derivation, and no ansatz or renaming is smuggled in. The result is independently falsifiable on external cohorts and does not collapse to its own inputs.
Axiom & Free-Parameter Ledger
free parameters (1)
- marker embeddings
axioms (1)
- domain assumption: marker embeddings capture functional properties sufficient to generate appropriate convolutional kernels for any marker subset
invented entities (1)
- hyperconvolution: no independent evidence