SVC-Probe: A Framework for Evaluating Perturbation Generalization in Spatial Foundation-Model Embeddings

Ehsan Saghapour; Fuad Al Abir; Huu Phong Nguyen; Jake Y. Chen

arxiv: 2606.28465 · v1 · pith:FXCI2HTOnew · submitted 2026-06-26 · 🧬 q-bio.QM · cs.AI

Pith reviewed 2026-06-30 01:11 UTC · model grok-4.3

classification 🧬 q-bio.QM cs.AI

keywords perturbation generalization · spatial embeddings · foundation models · fluorescence microscopy · drug perturbation · embedding stability · cross-drug prediction · virtual cell

The pith

High accuracy distinguishing drug conditions in spatial embeddings does not imply reliable cross-drug prediction.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces SVC-Probe to test whether spatial foundation-model embeddings from fluorescence microscopy capture perturbation patterns that transfer across drugs, rather than merely discriminating the conditions seen in training. Applied to the CM4AI MDA-MB-468 atlas (462 antibody labels, 1536-dimensional SubCell embeddings), the framework shows that embeddings supporting 98.6% three-way condition accuracy still see cosine similarity fall from 0.944 in-domain to 0.30 under leave-one-drug-out evaluation. Because the atlas covers only two drugs, the authors frame this evaluation as a two-drug stress test rather than a general benchmark. The approach combines stability metrics, neighborhood graphs, and centroid probes; after null calibration, a drug-specific signal emerges for vorinostat, while the paclitaxel axis is not robustly reconstructed, likely because microtubule-associated proteins are sparsely covered.

Core claim

SVC-Probe demonstrates that 98.6% three-way condition accuracy does not correlate with reliable cross-drug prediction, with cosine similarity diminishing from 0.944 in-domain to 0.30 under leave-one-drug-out evaluation, constituting a two-drug stress test rather than a general benchmark. Null calibration indicates that raw residual-turnover coupling is largely influenced by generic embedding structure, whereas a drug-specific signal emerges under vorinostat and is consistent with chromatin-related reorganization. In contrast, the paclitaxel axis is not robustly reconstructed, likely due to sparse coverage of microtubule-associated proteins.

What carries the argument

SVC-Probe, a perturbation-aware framework that combines Subcellular Embedding Atlas Stability, Mondrian Neighborhood Graphs, and a Foundation Model Perturbation Probe to assess embedding stability, neighborhood rewiring, and centroid prediction under drug treatment.
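As a rough illustration of the centroid-prediction step, the sketch below scores a leave-one-drug-out prediction by cosine similarity. It assumes a deliberately naive predictor (the held-out drug's centroid shift is predicted as the mean shift of the remaining drugs) and toy 2-D vectors in place of the 1536-dimensional SubCell embeddings. The paper's actual Foundation Model Perturbation Probe is not specified here, and every function name is hypothetical.

```python
import math


def centroid(vectors):
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def cosine(u, v):
    """Cosine similarity; returns 0.0 if either vector has zero norm."""
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv) if nu and nv else 0.0


def shift(ctrl_cells, drug_cells):
    """Centroid shift from control to drug for one protein."""
    c, d = centroid(ctrl_cells), centroid(drug_cells)
    return [b - a for a, b in zip(c, d)]


def leave_one_drug_out(shifts_by_drug):
    """Predict each held-out drug's shift as the mean of the other drugs'
    shifts (an assumed baseline), then score the prediction by cosine."""
    scores = {}
    for held_out, true_shift in shifts_by_drug.items():
        train = [v for d, v in shifts_by_drug.items() if d != held_out]
        scores[held_out] = cosine(centroid(train), true_shift)
    return scores


# Toy per-cell embeddings for one protein under three conditions.
ctrl = [[0.0, 0.0], [0.0, 2.0]]
drug_a = [[1.0, 1.0], [3.0, 1.0]]   # shifts along the first axis
drug_b = [[0.0, 3.0], [0.0, 5.0]]   # shifts along the second axis

shifts = {"drug_a": shift(ctrl, drug_a), "drug_b": shift(ctrl, drug_b)}
print(leave_one_drug_out(shifts))
```

With only two drugs, holding one out leaves a single drug to learn from, so the score measures how well one drug's shift stands in for the other's. In the toy data the two shifts are orthogonal, so both held-out scores are zero.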

If this is right

Perturbation generalization serves as a stricter benchmark than baseline condition discrimination for spatial virtual-cell representations.
Drug-specific signals can be isolated from generic embedding structure using null calibration.
Sparse coverage of certain protein classes limits reconstruction of specific perturbation axes such as paclitaxel.
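The null-calibration idea in the second point can be sketched with a simple permutation test. The paper's exact null model is not described here, so this example assumes one common choice: shuffle the pairing between two per-protein quantities (labeled `residual` and `turnover` purely for illustration) and compare the observed correlation with the shuffled distribution. All names and numbers are hypothetical.

```python
import math
import random


def pearson(x, y):
    """Pearson correlation; returns 0.0 if either input has zero variance."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy) if sxx and syy else 0.0


def permutation_null(x, y, n_perm=1000, seed=0):
    """Return the observed correlation and a two-sided empirical p-value
    obtained by shuffling y relative to x."""
    rng = random.Random(seed)
    observed = pearson(x, y)
    y_perm = list(y)
    exceed = 0
    for _ in range(n_perm):
        rng.shuffle(y_perm)
        if abs(pearson(x, y_perm)) >= abs(observed):
            exceed += 1
    # Add-one correction keeps the p-value strictly positive.
    return observed, (exceed + 1) / (n_perm + 1)


# Toy per-protein values (made up for illustration only).
residual = [1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0, 8.0]
turnover = [2.0, 4.0, 6.0, 8.0, 10.0, 12.0, 14.0, 16.0]
r, p = permutation_null(residual, turnover)
print(f"observed r = {r:.3f}, empirical p = {p:.4f}")
```

A coupling that survives this shuffle shows only that the two quantities are linked. Separating a drug-specific signal from generic embedding structure would need a null that preserves that generic structure, for example one built from shuffled condition labels.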

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Applying the same probe to embeddings from other cell lines or perturbation classes could show whether the observed similarity drop is atlas-specific.
Expanding antibody panels to include more microtubule-associated proteins would test if paclitaxel-axis reconstruction improves.
Comparing multiple foundation models with SVC-Probe could quantify which architectures better capture transferable perturbation axes.

Load-bearing premise

That leave-one-drug-out evaluation on this particular atlas with its specific protein coverage provides a meaningful test of generalization to arbitrary perturbations.

What would settle it

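If cosine similarity stayed above 0.7 under leave-one-drug-out evaluation on a new atlas with dense microtubule-associated protein labels, that would support sparse coverage as the cause of the paclitaxel reconstruction failure. A comparable drop despite dense coverage would instead point to a limitation of the embeddings themselves.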

Figures

Figures reproduced from arXiv: 2606.28465 by Ehsan Saghapour, Fuad Al Abir, Huu Phong Nguyen, Jake Y. Chen.

**Figure 1.** SVC-Probe pipeline: CM4AI IF → SubCell embeddings (1536-D) → SEAS (stability scoring) → MNG (Mondrian kNN graph + R(P)) → FMP (perturbation probe). From Section 2.3.1 (SEAS): for each protein P and each control-vs-drug condition pair (ctrl, c), SEAS computes S(P) ∈ (0, 1], capturing how stable the protein's subcellular fingerprint is across the perturbation. With per-cell embedding cloud, centroid μ(P,c),…

**Figure 2.** SEAS stability scores for the 33-label calibration subset under control-versus-paclitaxel and control-versus-vorinostat comparisons. Each point represents the SEAS score of one protein, and horizontal bars indicate 95% bootstrap confidence intervals estimated from per-cell SubCell embeddings. Proteins are shown in the same order across both panels to facilitate comparison between perturbations. Colors indi…
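The 95% bootstrap confidence intervals mentioned in the Figure 2 caption can be illustrated with a percentile bootstrap over cells. This is a minimal sketch under stated assumptions: each cell contributes one scalar (for instance, a hypothetical per-cell similarity to the control centroid) and the per-protein score is the mean of those scalars. The paper's actual SEAS formula and resampling scheme are not given here, and the function names are illustrative.

```python
import random


def mean(values):
    """Arithmetic mean of a non-empty list."""
    return sum(values) / len(values)


def bootstrap_ci(values, stat, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval for stat(values), resampling cells
    with replacement. Returns (lower, upper)."""
    rng = random.Random(seed)
    n = len(values)
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lower, upper


# Toy per-cell scores for one protein (made up for illustration only).
per_cell = [0.80, 0.90, 0.85, 0.95, 0.90, 0.88]
low, high = bootstrap_ci(per_cell, mean)
print(f"score = {mean(per_cell):.3f}, 95% CI = [{low:.3f}, {high:.3f}]")
```

Resampling cells rather than proteins matches the caption's statement that intervals are estimated from per-cell embeddings. A fixed seed keeps the interval reproducible across runs.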

Original abstract

This work examines perturbation generalization in spatial foundation-model embeddings derived from fluorescence microscopy images. Although these models can discriminate drug conditions accurately, it remains unclear whether the learned representations reflect patterns consistent with expected perturbation axes that transfer across drugs. We introduce SVC-Probe, a perturbation-aware framework that combines Subcellular Embedding Atlas Stability, Mondrian Neighborhood Graphs, and a Foundation Model Perturbation Probe to assess embedding stability, neighborhood rewiring, and centroid prediction under drug treatment. Applied to the CM4AI MDA-MB-468 chemical-perturbation atlas comprising 462 antibody labels and SubCell 1536-dimensional embeddings, SVC-Probe demonstrates that 98.6% three-way condition accuracy does not correlate with reliable cross-drug prediction, with cosine similarity diminishing from 0.944 in-domain to 0.30 under leave-one-drug-out evaluation, constituting a two-drug stress test rather than a general benchmark. Null calibration indicates that raw residual-turnover coupling is largely influenced by generic embedding structure, whereas a drug-specific signal emerges under vorinostat and is consistent with chromatin-related reorganization. In contrast, the paclitaxel axis is not robustly reconstructed, likely due to sparse coverage of microtubule-associated proteins. Together, these results introduce and demonstrate a reusable diagnostic framework for stress-testing spatial virtual-cell representations and indicate that perturbation generalization may serve as a stricter and more informative benchmark than baseline condition discrimination.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The main result is that 98.6% condition accuracy does not predict reliable cross-drug centroid shifts in these embeddings, shown by a cosine drop to 0.30 on leave-one-drug-out.

The punchline is that high accuracy on distinguishing drug conditions does not mean the embeddings capture perturbation effects that transfer to a new drug. The paper introduces SVC-Probe, which combines Subcellular Embedding Atlas Stability, Mondrian Neighborhood Graphs, and a Foundation Model Perturbation Probe. On the CM4AI MDA-MB-468 atlas with 462 antibody labels and SubCell embeddings, it reports 98.6% three-way accuracy alongside a cosine similarity drop from 0.944 in-domain to 0.30 under leave-one-drug-out. Null calibration shows mostly generic structure, with a drug-specific chromatin signal only for vorinostat; paclitaxel fails, which the authors tie to sparse microtubule-protein coverage.

This combined diagnostic approach for checking stability, neighborhood rewiring, and centroid prediction is new. The work does well by stating its own limits as a two-drug stress test and by grounding the claims with the null runs.

The soft spot is the narrow scope: one atlas, one cell line, two drugs. That makes the suggestion that perturbation generalization is a stricter benchmark feel preliminary rather than settled. The abstract gives clean numbers but no error bars or run-to-run details, though the internal logic is consistent. No circularity or fitting issues appear.

This is for people building or using spatial foundation models in bioimaging who want concrete diagnostics for perturbation tasks. A reader focused on drug studies or virtual-cell representations would get practical value from the framework and the specific failure modes.

It deserves peer review to test whether the probe extends beyond these two cases.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces SVC-Probe, a perturbation-aware framework that integrates Subcellular Embedding Atlas Stability, Mondrian Neighborhood Graphs, and a Foundation Model Perturbation Probe to evaluate embedding stability, neighborhood rewiring, and centroid prediction in spatial foundation-model embeddings from fluorescence microscopy. Applied to the CM4AI MDA-MB-468 chemical-perturbation atlas (462 antibody labels, 1536-dimensional SubCell embeddings), the work shows that 98.6% three-way condition accuracy does not imply reliable cross-drug prediction: cosine similarity falls from 0.944 (in-domain) to 0.30 (leave-one-drug-out). The evaluation is explicitly framed as a two-drug stress test. Null calibration indicates mostly generic embedding structure, with a drug-specific chromatin signal only for vorinostat, while the paclitaxel axis is not robustly reconstructed, likely due to sparse coverage of microtubule-associated proteins. The central claim is that perturbation generalization constitutes a stricter benchmark than baseline condition discrimination.

Significance. If the reported metrics and caveats hold, the paper supplies a reusable diagnostic framework for stress-testing spatial virtual-cell representations. The explicit positioning as a limited two-drug test, together with null calibration and protein-coverage caveats, provides a balanced assessment of current embedding limitations and highlights perturbation generalization as a more informative evaluation axis than simple classification accuracy.

minor comments (3)

Abstract: the quantitative claims (98.6% accuracy, cosine values 0.944/0.30) would benefit from a parenthetical reference to the exact data split and embedding dimensionality used, even at abstract length.
Methods/Results: clarify whether the Mondrian Neighborhood Graphs component includes any tunable parameters (e.g., neighborhood size) and, if so, how they were chosen or shown to be robust.
Results: the null-calibration section would be strengthened by an explicit table or figure panel contrasting generic vs. drug-specific signals across all tested perturbations rather than narrative description alone.
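The second minor comment, on neighborhood size, could be addressed with a sensitivity sweep of the kind sketched below. This is an assumption-laden illustration and not the paper's Mondrian Neighborhood Graph: it builds a plain Euclidean k-nearest-neighbor graph over per-protein centroids and measures rewiring as one minus the Jaccard overlap of each protein's neighbor set between control and drug. The function names and toy coordinates are hypothetical.

```python
import math


def knn(points, k):
    """points: {name: vector}. Returns {name: set of the k nearest other
    names} by Euclidean distance, breaking ties alphabetically."""
    neighbors = {}
    for p, vp in points.items():
        ranked = sorted(
            (math.dist(vp, vq), q) for q, vq in points.items() if q != p
        )
        neighbors[p] = {q for _, q in ranked[:k]}
    return neighbors


def turnover(ctrl_points, drug_points, k):
    """Per-protein rewiring: 1 - Jaccard overlap of kNN sets across
    the two conditions (0 = unchanged, 1 = fully rewired)."""
    a, b = knn(ctrl_points, k), knn(drug_points, k)
    return {p: 1.0 - len(a[p] & b[p]) / len(a[p] | b[p]) for p in a}


# Toy per-protein centroids; only protein D moves under treatment.
ctrl = {"A": (0, 0), "B": (1, 0), "C": (10, 0), "D": (11, 0), "E": (10, 1)}
drug = {"A": (0, 0), "B": (1, 0), "C": (10, 0), "D": (0, 1), "E": (10, 1)}

for k in (1, 2, 4):
    print(k, turnover(ctrl, drug, k))
```

In this toy case the rewiring attributed to each protein changes with k and vanishes when k covers every other protein. That dependence is what a robustness check on the neighborhood parameter would need to rule out.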

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive and accurate summary of the manuscript, which correctly captures the scope of SVC-Probe, the reported metrics, the two-drug stress-test framing, and the protein-coverage caveats. We appreciate the recommendation for minor revision. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity identified

The paper introduces SVC-Probe as an evaluation framework applied to an external atlas (CM4AI MDA-MB-468) using explicit data splits such as leave-one-drug-out. No equations, fitted parameters, self-citations, or ansatzes are presented that reduce the reported metrics (condition accuracy, cosine similarity) to inputs by construction. The central results derive from standard cross-validation on independent data rather than self-referential definitions or renamings.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no explicit free parameters, axioms, or invented entities; the framework implicitly assumes embeddings should encode drug-specific perturbation axes independent of generic structure.

pith-pipeline@v0.9.1-grok · 5794 in / 1174 out tokens · 28840 ms · 2026-06-30T01:11:39.636899+00:00 · methodology

Reference graph

Works this paper leans on

16 extracted references · 1 canonical work page

[1] Thul et al., “A subcellular map of the human proteome,” Science, vol. 356, p. eaal3321, 2017.
[2] Gupta et al., “SubCell: Proteome-aware vision foundation models for microscopy capture single-cell biology,” bioRxiv 2024.12.06.627299, 2024.
[3] Shaban et al., “A foundation model for spatial proteomics,” arXiv:2506.03373, 2025.
[4] Lotfollahi et al., “scGen predicts single-cell perturbation responses,” Nat. Methods, vol. 16, pp. 715–721, 2019.
[5] Bunne et al., “Learning single-cell perturbation responses using neural optimal transport,” Nat. Methods, vol. 20, pp. 1759–1768, 2023.
[6] Roohani et al., “Predicting transcriptional outcomes of novel multigene perturbations with GEARS,” Nat. Biotechnol., vol. 42, pp. 927–935, 2024.
[7] Schürch et al., “Coordinated cellular neighborhoods orchestrate antitumoral immunity at the colorectal cancer invasive front,” Cell, vol. 182, pp. 1341–1359, 2020.
[8] Mitchison, “The proliferation rate paradox in antimitotic chemotherapy,” Mol. Biol. Cell, vol. 23, pp. 1–6, 2012.
[9] West et al., “New and emerging HDAC inhibitors for cancer treatment,” J. Clin. Invest., vol. 124, pp. 30–39, 2014.
[10] Clark et al., “Cell Maps for Artificial Intelligence: AI-ready maps of human cell architecture from disease-relevant cell lines,” bioRxiv 2024.05.21.589311, 2024.
[11] Huang et al., “Densely connected convolutional networks,” in Proc. CVPR, pp. 4700–4708, 2017.
[12] Sun et al., “Generative machine learning unlocks the first proteome-wide image of human cells,” bioRxiv 2026.03.31.715748, 2026.
[13] Mädler et al., “scPortrait integrates single-cell images into multimodal modeling,” bioRxiv 2025.09.22.677590, 2025.
[14] Ahlmann-Eltze et al., “Deep learning-based gene perturbation effect prediction does not yet outperform simple linear methods,” Nat. Methods, vol. 22, pp. 1657–1661, 2025.
[15] Pachitariu et al., “Cellpose 2.0: how to train your own model,” Nat. Methods, vol. 19, pp. 1634–1641, 2022.
[16] Traag et al., “From Louvain to Leiden: guaranteeing well-connected communities,” Sci. Reports, vol. 9, p. 5233, 2019.

Running footer in the source: Proceedings of CIBB 2026
