arxiv: 2602.04585 · v2 · pith:XFV5UMRQnew · submitted 2026-02-04 · 💻 cs.CV

ImmuVis: Hyperconvolutional Foundation Model for Imaging Mass Cytometry

Pith reviewed 2026-05-16 07:32 UTC · model grok-4.3

keywords: imaging mass cytometry · hyperconvolution · foundation model · multiplex imaging · virtual staining · uncertainty calibration · marker-adaptive convolution

The pith

ImmuVis generates convolutional kernels on the fly from marker embeddings so one model works with any combination of molecular markers in tissue images.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard vision models require a fixed set of input channels, but imaging mass cytometry measures a different set of molecular markers in each study, so both the number and the identity of the channels change. ImmuVis pretrains a single set of weights that learns an embedding for every marker and then uses those embeddings to create the convolutional filters needed for whatever markers are present in a given image. The resulting hyperconvolution layer lets the model process marker panels it has never seen as a combination, without retraining or architectural changes. Pretraining uses the IMC17M dataset of more than seventeen million patches, and the model is reported to outperform prior methods on virtual staining and classification while also returning calibrated uncertainty estimates.
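The mechanism can be written down in one plausible form. This is an illustrative sketch, not the paper's equation (the referee report notes that the manuscript would benefit from stating one explicitly). The symbols are assumptions: S is the set of markers measured in an image, e_m is the learned embedding of marker m with dimension d, W is a shared hypernetwork matrix, x_m is the image channel for marker m, and the 1/|S| normalization is one possible choice for keeping activations comparable across panel sizes.

```latex
\[
K_m = \operatorname{reshape}\!\bigl(W^{\top} e_m\bigr) \in \mathbb{R}^{C_{\text{out}} \times k \times k},
\qquad W \in \mathbb{R}^{d \times C_{\text{out}} k^{2}},
\]
\[
y_o = \frac{1}{|S|} \sum_{m \in S} K_{m,o} \ast x_m,
\qquad o = 1, \dots, C_{\text{out}}.
\]
```

Because the sum runs over whichever markers are present, the output shape depends only on C_out and not on the size or order of the panel.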

Core claim

ImmuVis establishes that marker-adaptive hyperconvolutions, driven by learned embeddings of each measured marker, allow a single foundation model to operate on arbitrary marker subsets in imaging mass cytometry data. The model is pretrained with self-supervised masked reconstruction on the largest IMC corpus to date and delivers higher accuracy than fixed-channel baselines and transformer alternatives at lower compute cost, while the heteroscedastic likelihood objective supplies the only calibrated uncertainty among compared approaches.
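Self-supervised masked reconstruction can be illustrated with a small NumPy sketch. The masking scheme here (random square patches shared across all channels, with the loss computed only on hidden pixels) is an assumption chosen for illustration; the summary above does not specify how ImmuVis masks its inputs, and the function names are hypothetical.

```python
import numpy as np

def random_patch_mask(height, width, patch=8, mask_ratio=0.5, rng=None):
    """Return a boolean (H, W) mask; True marks pixels hidden from the model."""
    assert height % patch == 0 and width % patch == 0
    rng = np.random.default_rng() if rng is None else rng
    gh, gw = height // patch, width // patch
    n_masked = int(round(mask_ratio * gh * gw))
    flat = np.zeros(gh * gw, dtype=bool)
    flat[rng.choice(gh * gw, size=n_masked, replace=False)] = True
    grid = flat.reshape(gh, gw)
    # Expand each grid cell to a patch-by-patch block of pixels.
    return grid.repeat(patch, axis=0).repeat(patch, axis=1)

def masked_mse(x, recon, mask):
    """Mean squared error over hidden pixels only; x and recon are (M, H, W)."""
    return float(np.mean((x - recon)[:, mask] ** 2))

rng = np.random.default_rng(0)
mask = random_patch_mask(32, 32, patch=8, mask_ratio=0.5, rng=rng)
x = rng.random((5, 32, 32))              # 5 marker channels
recon = np.where(mask, x, 0.0)           # correct on hidden pixels, wrong elsewhere
print(masked_mse(x, recon, mask))        # errors on visible pixels are not counted
```

Restricting the loss to hidden pixels means the model gets no credit for copying what it can already see.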

What carries the argument

marker-adaptive hyperconvolutions that generate convolutional kernels directly from learned embeddings of the input markers
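A minimal NumPy sketch of such a layer follows. It is an assumption-laden toy, not the authors' implementation: the class name `HyperConv2d`, the single linear hypernetwork `W`, the embedding size, and the division by the number of markers are all illustrative choices.

```python
import numpy as np
from numpy.lib.stride_tricks import sliding_window_view

class HyperConv2d:
    """Toy marker-adaptive convolution: kernels are generated from marker embeddings."""

    def __init__(self, vocab_size, embed_dim, out_channels, kernel_size=3, seed=0):
        rng = np.random.default_rng(seed)
        self.k = kernel_size
        self.out_channels = out_channels
        # One embedding per marker in the vocabulary (random here, learned in practice).
        self.embeddings = rng.normal(0.0, 1.0, (vocab_size, embed_dim))
        # Hypothetical hypernetwork: one linear map from embedding to kernel weights.
        self.W = rng.normal(0.0, embed_dim ** -0.5,
                            (embed_dim, out_channels * kernel_size ** 2))

    def kernels(self, marker_ids):
        flat = self.embeddings[marker_ids] @ self.W          # (M, C_out * k * k)
        return flat.reshape(len(marker_ids), self.out_channels, self.k, self.k)

    def __call__(self, x, marker_ids):
        # x has shape (M, H, W): one channel per measured marker.
        marker_ids = np.asarray(marker_ids)
        assert x.shape[0] == len(marker_ids)
        pad = self.k // 2
        xp = np.pad(x, ((0, 0), (pad, pad), (pad, pad)))
        windows = sliding_window_view(xp, (self.k, self.k), axis=(1, 2))  # (M, H, W, k, k)
        y = np.einsum("mhwij,moij->ohw", windows, self.kernels(marker_ids))
        # Average over markers so the output scale does not grow with panel size.
        return y / len(marker_ids)

layer = HyperConv2d(vocab_size=265, embed_dim=16, out_channels=8)
rng = np.random.default_rng(1)
panel_a, ids_a = rng.random((3, 32, 32)), [0, 5, 17]
panel_b, ids_b = rng.random((7, 32, 32)), [1, 2, 3, 4, 5, 6, 7]
print(layer(panel_a, ids_a).shape, layer(panel_b, ids_b).shape)  # (8, 32, 32) (8, 32, 32)
```

The same weights handle a 3-marker and a 7-marker panel and produce the same output shape. Because each kernel is tied to a marker identity rather than a channel position, reordering the channels together with their ids leaves the output unchanged.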

If this is right

  • A single set of model weights can be deployed across studies that measure completely different marker panels.
  • Virtual staining and tissue classification tasks become feasible without aligning marker spaces between training and test data.
  • Inference cost stays low because the same convolutional backbone serves every marker combination instead of requiring separate models.
  • Uncertainty estimates are available for every prediction because the heteroscedastic likelihood is part of the training objective.
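The last point can be made concrete with a sketch of a heteroscedastic objective. It assumes a Gaussian likelihood in which the model predicts a mean and a log-variance per pixel; the summary above does not state which likelihood family ImmuVis uses, so treat both functions as hypothetical illustrations. The coverage check is one simple way to probe calibration on synthetic data.

```python
import numpy as np

def gaussian_nll(target, mean, log_var):
    """Per-element negative log-likelihood of target under N(mean, exp(log_var))."""
    return 0.5 * (log_var + (target - mean) ** 2 / np.exp(log_var) + np.log(2.0 * np.pi))

def interval_coverage(target, mean, log_var, z=1.96):
    """Fraction of targets that fall inside mean ± z * sigma."""
    sigma = np.exp(0.5 * log_var)
    return float(np.mean(np.abs(target - mean) <= z * sigma))

# Synthetic check: noise level varies per pixel, and the "model" knows it exactly.
rng = np.random.default_rng(0)
n = 100_000
sigma = rng.uniform(0.5, 2.0, n)
mean = rng.normal(size=n)
target = mean + sigma * rng.normal(size=n)
log_var = 2.0 * np.log(sigma)

calibrated = interval_coverage(target, mean, log_var)
overconfident = interval_coverage(target, mean, log_var - np.log(4.0))  # sigma halved
print(calibrated, overconfident)  # first is close to 0.95, second is clearly lower
```

A calibrated model covers roughly 95% of targets with its nominal 95% interval. Halving the predicted standard deviation drops coverage and raises the average negative log-likelihood, which is why training on this objective penalizes overconfidence.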

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same embedding-to-kernel mechanism could be tested on other variable-channel modalities such as multiplexed immunofluorescence or mass spectrometry imaging.
  • New studies could add markers incrementally by learning only the embedding for the added marker while keeping the rest of the model frozen.
  • The approach may reduce the data-collection burden in clinical settings where full marker panels are expensive or unavailable.
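The incremental-marker idea in the second bullet can be sketched on a toy problem. Everything here is an assumption made for illustration: the hypernetwork is a single frozen matrix `W`, the kernel is 1x1, the data are synthetic, and the loss is a plain squared error rather than the paper's training objective. Only the new marker's embedding receives gradient updates.

```python
import numpy as np

rng = np.random.default_rng(0)
embed_dim, out_channels, n_pixels = 8, 4, 256

# Frozen hypernetwork weights, standing in for a pretrained model (hypothetical).
W = rng.normal(0.0, embed_dim ** -0.5, (embed_dim, out_channels))
W_before = W.copy()

# Synthetic data for the new marker: its target 1x1 kernel is reachable by some embedding.
e_true = rng.normal(size=embed_dim)
x = rng.normal(size=n_pixels)                 # pixel intensities of the new channel
y = np.outer(e_true @ W, x)                   # target features, (out_channels, n_pixels)

def loss_and_grad(e):
    """Squared-error loss and its gradient with respect to the embedding only."""
    kernel = e @ W                            # (out_channels,)
    residual = np.outer(kernel, x) - y
    loss = np.mean(residual ** 2)
    grad_kernel = 2.0 * (residual @ x) / residual.size
    return loss, W @ grad_kernel              # chain rule through kernel = e @ W

e_new = np.zeros(embed_dim)                   # embedding for the added marker
initial_loss, _ = loss_and_grad(e_new)
for _ in range(300):
    _, grad = loss_and_grad(e_new)
    e_new -= 0.2 * grad                       # W is never updated
final_loss, _ = loss_and_grad(e_new)
print(initial_loss, final_loss)
```

Whether this works on real data depends on whether the frozen hypernetwork can express a useful kernel for the new marker, which is exactly the premise discussed in the next section.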

Load-bearing premise

Embeddings learned from the pretraining markers remain effective when asked to generate kernels for entirely new marker combinations never seen during training.

What would settle it

Measure performance on a held-out IMC dataset that uses at least one marker absent from the 265 markers in IMC17M and compare it to a model retrained from scratch on that new marker set.

Original abstract

We present ImmuVis, a family of efficient foundation models for imaging mass cytometry (IMC), a high-throughput multiplex imaging technology that handles molecular marker measurements as image channels and enables large-scale spatial tissue profiling. Unlike natural images, multiplex imaging lacks a fixed channel space, as real-world marker sets vary across studies, violating a core assumption of standard vision backbones. To address this, ImmuVis introduces marker-adaptive hyperconvolutions that generate convolutional kernels from learned marker embeddings, enabling a single model to operate on arbitrary measured marker subsets without retraining. We pretrain ImmuVis on the largest dataset to date, IMC17M (28 cohorts, 24,405 images, 265 markers, over 17M patches), using self-supervised masked reconstruction. ImmuVis outperforms state-of-the-art baselines and ablations in virtual staining and downstream classification tasks at substantially lower compute cost than transformer-based alternatives, and is the sole model that provides calibrated uncertainty via a heteroscedastic likelihood objective. These results position ImmuVis as a practical framework for real-world IMC modeling.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces ImmuVis, a family of foundation models for imaging mass cytometry that employs marker-adaptive hyperconvolutions. These generate convolutional kernels dynamically from learned marker embeddings, allowing a single pretrained model to process arbitrary marker subsets without retraining. The model is pretrained via self-supervised masked reconstruction on the large IMC17M dataset (28 cohorts, 24,405 images, 265 markers, >17M patches) and is claimed to outperform baselines in virtual staining and downstream classification tasks at lower compute cost than transformers while uniquely providing calibrated uncertainty through a heteroscedastic likelihood.

Significance. If the central claims hold, this work would be significant for multiplex imaging by directly addressing the variable-channel problem that breaks standard vision backbones. It offers a practical, efficient alternative to transformers for real-world IMC analysis and introduces uncertainty calibration that could improve reliability in downstream biological applications.

major comments (2)
  1. [Abstract] The assertions of outperformance over state-of-the-art baselines and unique calibrated uncertainty are stated without any quantitative metrics, ablation results, or error analysis, preventing verification of the central claims from the provided text.
  2. [§4, Experiments] Evaluation is performed only on held-out marker subsets drawn from the same 265-marker vocabulary used in pretraining; this does not test generalization to entirely new markers (different antibody clones, staining protocols, or novel targets) from external cohorts, which is load-bearing for the marker-adaptive hyperconvolution claim.
minor comments (1)
  1. [§3.2] The hyperconvolution operation would benefit from an explicit equation showing how marker embeddings are mapped to kernel weights, including any dimensionality or normalization details.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thoughtful review and constructive feedback on our manuscript. We address each major comment below and describe the corresponding revisions.

Point-by-point responses
  1. Referee: [Abstract] The assertions of outperformance over state-of-the-art baselines and unique calibrated uncertainty are stated without any quantitative metrics, ablation results, or error analysis, preventing verification of the central claims from the provided text.

    Authors: We agree that the abstract, being a concise summary, lacks quantitative support for the performance claims. In the revised manuscript we will insert key metrics (e.g., PSNR/SSIM gains in virtual staining and accuracy improvements in downstream classification) directly into the abstract. Full quantitative results, ablations, and uncertainty calibration details remain in Section 4 and the supplement. revision: yes

  2. Referee: [§4, Experiments] Evaluation is performed only on held-out marker subsets drawn from the same 265-marker vocabulary used in pretraining; this does not test generalization to entirely new markers (different antibody clones, staining protocols, or novel targets) from external cohorts, which is load-bearing for the marker-adaptive hyperconvolution claim.

    Authors: The experiments test the core capability of processing arbitrary subsets drawn from the pretrained 265-marker vocabulary without retraining, which directly addresses the variable-channel problem encountered in real IMC studies. We do not claim zero-shot generalization to entirely novel markers outside this vocabulary, as new markers would require learning new embeddings. In revision we will explicitly delimit the scope of the claim, add a dedicated limitations paragraph, and outline future directions for extending the embedding space. revision: partial

Circularity Check

0 steps flagged

No significant circularity detected

Full rationale

The paper's central mechanism (marker-adaptive hyperconvolutions that generate kernels from embeddings) is an architectural design trained via self-supervised masked reconstruction on the external IMC17M dataset (28 cohorts, 24,405 images, 265 markers). Downstream evaluations use separate tasks and held-out data. No equation reduces to a fitted input by construction, no chain of self-citations carries the uniqueness claim or the derivation, and no ansatz or renaming is smuggled in. The result is independently falsifiable on external cohorts and does not collapse to its own inputs.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 1 invented entity

The central claim rests on the unproven assumption that marker embeddings can be learned to produce suitable kernels for unseen marker combinations; the abstract does not quantify the free parameter or the invented entity listed below.

free parameters (1)
  • marker embeddings
    Learned vectors that drive kernel generation; their dimensionality and training details are unspecified.
axioms (1)
  • domain assumption: Marker embeddings capture functional properties sufficient to generate appropriate convolutional kernels for any marker subset.
    Core premise enabling the hyperconvolution to work without retraining on new panels.
invented entities (1)
  • hyperconvolution (no independent evidence)
    purpose: Dynamically generates convolutional kernels from marker embeddings.
    New architectural component introduced to solve the variable-channel problem.

pith-pipeline@v0.9.0 · 5568 in / 1220 out tokens · 46538 ms · 2026-05-16T07:32:09.971514+00:00 · methodology
