Cell Instance Segmentation via Multi-Task Image-to-Image Schrödinger Bridge
Pith reviewed 2026-05-10 15:47 UTC · model grok-4.3
The pith
Cell instance segmentation can be reframed as generating masks from images via a Schrödinger Bridge to enforce global structure without post-processing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Existing cell instance segmentation pipelines typically combine deterministic predictions with post-processing, which imposes limited explicit constraints on the global structure of instance masks. This work proposes a multi-task image-to-image Schrödinger Bridge framework that formulates instance segmentation as a distribution-based image-to-image generation problem. Boundary-aware supervision is integrated through a reverse distance map, and deterministic inference is employed to produce stable predictions. Experimental results on the PanNuke dataset demonstrate that the proposed method achieves competitive or superior performance without relying on SAM pre-training or additional post-processing.
What carries the argument
The multi-task image-to-image Schrödinger Bridge framework, which solves segmentation by finding a stochastic path that matches the distribution of input cell images to the distribution of their instance mask outputs.
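As a rough illustration of the "stochastic path" idea, the sketch below samples an intermediate state on a Brownian bridge pinned at a mask `x0` and an image `x1`. It assumes a constant noise scale `beta` and flat lists of pixel values. The function name `bridge_sample` and all of its parameters are hypothetical; the paper's actual noise schedule and network are not described in this summary.

```python
import math
import random


def bridge_sample(x0, x1, t, beta=0.1, rng=None, deterministic=False):
    """Sample x_t on a Brownian bridge pinned at x0 (t=0) and x1 (t=1).

    With a constant diffusion scale beta (an assumption), the bridge
    marginal is Gaussian with mean (1 - t) * x0 + t * x1 and
    variance beta * t * (1 - t).
    """
    if not 0.0 <= t <= 1.0:
        raise ValueError("t must lie in [0, 1]")
    if len(x0) != len(x1):
        raise ValueError("x0 and x1 must have the same length")
    rng = rng or random.Random()
    std = math.sqrt(beta * t * (1.0 - t))
    out = []
    for a, b in zip(x0, x1):
        mean = (1.0 - t) * a + t * b
        # deterministic=True keeps only the bridge mean (no noise).
        noise = 0.0 if deterministic else rng.gauss(0.0, std)
        out.append(mean + noise)
    return out


# Hypothetical 4-pixel example: x0 plays the role of a mask, x1 of an image.
mask = [0.0, 1.0, 1.0, 0.0]
image = [0.2, 0.8, 0.6, 0.1]
midpoint = bridge_sample(mask, image, t=0.5, deterministic=True)
```

The noise vanishes at both endpoints, so the path starts exactly at the mask and ends exactly at the image; a learned model would be trained to walk this path in the image-to-mask direction.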
If this is right
- Instance segmentation pipelines no longer require separate post-processing stages to refine outputs.
- Performance stays competitive on PanNuke even without large pre-trained models or extra supervision.
- The same framework remains effective on MoNuSeg when only limited training data is available.
- Boundary information can be supplied directly through a reverse distance map rather than as a separate task.
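This summary does not define the reverse distance map, so the following is only one plausible reading: a distance transform flipped so that values peak at the object boundary and fall toward the object core. The function names, the city-block metric, and the single-object binary mask are all assumptions made for illustration.

```python
from collections import deque


def distance_to_background(mask):
    """City-block distance from every pixel to the nearest background (0) pixel."""
    h, w = len(mask), len(mask[0])
    dist = [[0 if mask[r][c] == 0 else None for c in range(w)] for r in range(h)]
    queue = deque((r, c) for r in range(h) for c in range(w) if mask[r][c] == 0)
    if not queue:
        raise ValueError("mask needs at least one background pixel")
    # Multi-source breadth-first search outward from all background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < h and 0 <= nc < w and dist[nr][nc] is None:
                dist[nr][nc] = dist[r][c] + 1
                queue.append((nr, nc))
    return dist


def reverse_distance_map(mask):
    """Assumed definition: highest at the object boundary, lowest at its core."""
    dist = distance_to_background(mask)
    peak = max(max(row) for row in dist)
    return [
        [peak - d + 1 if m else 0 for d, m in zip(d_row, m_row)]
        for d_row, m_row in zip(dist, mask)
    ]


# Toy 5x5 mask containing a single 3x3 object.
toy_mask = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
rdm = reverse_distance_map(toy_mask)
```

Under this reading, boundary pixels carry the largest target value, which would weight supervision toward the contours that separate touching cells.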
Where Pith is reading between the lines
- The approach might transfer to other medical imaging tasks that need strong global consistency, such as nuclei or organ segmentation.
- Avoiding dependence on large pre-trained models could help in settings with strict privacy rules or small datasets.
- The underlying distribution-matching view may later support uncertainty estimates for each segmented instance.
- Bridge-based generation could replace heuristic cleanup steps in other dense prediction problems.
Load-bearing premise
That treating segmentation as a Schrödinger Bridge between image distributions imposes stronger explicit global-structure constraints than deterministic prediction plus post-processing.
What would settle it
A head-to-head test on PanNuke where a standard deterministic model plus post-processing produces higher instance-level accuracy or fewer global mask inconsistencies than the Schrödinger Bridge method.
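Instance-level accuracy in such a comparison is often summarized by panoptic quality (PQ), one of the metrics the authors name in their rebuttal. The sketch below computes single-class PQ on small label grids using the standard IoU > 0.5 matching rule. The function name and the toy label maps are hypothetical, and this is not the evaluation code used in the paper.

```python
from collections import Counter


def panoptic_quality(gt, pred):
    """Single-class PQ; gt and pred are 2D lists of instance ids (0 = background)."""
    gt_area, pred_area, overlap = Counter(), Counter(), Counter()
    for g_row, p_row in zip(gt, pred):
        for g, p in zip(g_row, p_row):
            if g:
                gt_area[g] += 1
            if p:
                pred_area[p] += 1
            if g and p:
                overlap[(g, p)] += 1
    iou_sum, tp = 0.0, 0
    for (g, p), inter in overlap.items():
        iou = inter / (gt_area[g] + pred_area[p] - inter)
        if iou > 0.5:  # IoU > 0.5 guarantees each instance matches at most once
            iou_sum += iou
            tp += 1
    fp = len(pred_area) - tp  # predicted instances with no match
    fn = len(gt_area) - tp    # ground-truth instances with no match
    denom = tp + 0.5 * fp + 0.5 * fn
    return iou_sum / denom if denom else 0.0


# Toy label maps, invented for illustration only.
gt_labels = [[1, 1, 0, 2],
             [1, 1, 0, 2]]
pred_labels = [[1, 1, 0, 0],
               [1, 0, 0, 3]]
pq = panoptic_quality(gt_labels, pred_labels)
```

In the toy case one instance is matched with IoU 0.75 while the other pair sits exactly at IoU 0.5 and therefore counts as one false positive plus one false negative, giving PQ = 0.75 / 2 = 0.375.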
Original abstract
Existing cell instance segmentation pipelines typically combine deterministic predictions with post-processing, which imposes limited explicit constraints on the global structure of instance masks. In this work, we propose a multi-task image-to-image Schrödinger Bridge framework that formulates instance segmentation as a distribution-based image-to-image generation problem. Boundary-aware supervision is integrated through a reverse distance map, and deterministic inference is employed to produce stable predictions. Experimental results on the PanNuke dataset demonstrate that the proposed method achieves competitive or superior performance without relying on SAM pre-training or additional post-processing. Additional results on the MoNuSeg dataset show robustness under limited training data. These findings indicate that Schrödinger Bridge-based image-to-image generation provides an effective framework for cell instance segmentation.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a multi-task image-to-image Schrödinger Bridge framework for cell instance segmentation. It formulates the task as distribution-based image-to-image generation, integrates boundary-aware supervision via a reverse distance map, and uses deterministic inference at test time. Experiments claim competitive or superior performance on PanNuke without SAM pre-training or post-processing, plus robustness on MoNuSeg under limited data.
Significance. If validated, the work could advance generative approaches to instance segmentation by leveraging Schrödinger Bridge transport for explicit global structure constraints, reducing reliance on post-processing in pathology imaging. The multi-task and boundary components are sensible extensions, but the paper does not ship machine-checked proofs, reproducible code, or parameter-free derivations that would strengthen the assessment.
major comments (2)
- [Experimental Results] Experimental Results section: the central claim that the Schrödinger Bridge imposes stronger explicit global-structure constraints (and thereby enables competitive results without post-processing) is not isolated from the multi-task loss and reverse distance map supervision. No ablation is reported that holds the auxiliary supervision fixed while replacing the SB transport with a standard deterministic regression head or conditional GAN; this directly undermines attribution of any PanNuke gains to the SB mechanism.
- [Abstract and Results] Abstract and Results section: competitive or superior performance is asserted on PanNuke and MoNuSeg, yet the manuscript supplies no quantitative tables, per-class metrics, error bars, ablation tables, or statistical significance tests. This absence is load-bearing for the claim of robustness under limited training data.
minor comments (2)
- [Method] Notation in the method section: the precise definition of the multi-task objective combining the Schrödinger Bridge loss with the reverse distance map term should be written out explicitly (including any weighting hyperparameters) to allow reproduction.
- [Figures] Figure captions: qualitative segmentation examples would benefit from side-by-side comparison with a deterministic baseline to visually illustrate the claimed global consistency advantage.
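To make the first minor comment concrete, a written-out objective might take a form like the one below. This is an assumed shape for illustration only: the symbols $\mathcal{L}_{\text{SB}}$, $\mathcal{L}_{\text{rdm}}$, and the weight $\lambda_{\text{rdm}}$ are hypothetical placeholders, not the paper's notation, and the actual loss terms are not given in this summary.

```latex
\mathcal{L}_{\text{total}}
  = \mathcal{L}_{\text{SB}}
  + \lambda_{\text{rdm}} \, \mathcal{L}_{\text{rdm}},
\qquad \lambda_{\text{rdm}} \ge 0
```

Reporting the value of any such weight alongside the definition of each term is what would allow the training setup to be reproduced.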
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to strengthen the experimental validation and presentation of results.
- Referee: [Experimental Results] Experimental Results section: the central claim that the Schrödinger Bridge imposes stronger explicit global-structure constraints (and thereby enables competitive results without post-processing) is not isolated from the multi-task loss and reverse distance map supervision. No ablation is reported that holds the auxiliary supervision fixed while replacing the SB transport with a standard deterministic regression head or conditional GAN; this directly undermines attribution of any PanNuke gains to the SB mechanism.
  Authors: We agree that isolating the contribution of the Schrödinger Bridge (SB) transport is necessary to support the claim of stronger global-structure constraints. The current framework integrates SB with multi-task learning and reverse distance map supervision, but we did not include the requested ablation that replaces SB with a standard deterministic regression head (or conditional GAN) while holding the auxiliary losses fixed. In the revised version, we will add this ablation on PanNuke, training a direct-regression baseline with identical multi-task and boundary supervision for direct comparison of instance segmentation metrics. This will clarify the specific role of the SB mechanism.
  Revision: yes
- Referee: [Abstract and Results] Abstract and Results section: competitive or superior performance is asserted on PanNuke and MoNuSeg, yet the manuscript supplies no quantitative tables, per-class metrics, error bars, ablation tables, or statistical significance tests. This absence is load-bearing for the claim of robustness under limited training data.
  Authors: We acknowledge that the manuscript does not currently include the full set of quantitative tables, per-class metrics, error bars, complete ablation tables, or statistical significance tests needed to fully substantiate the performance claims. The abstract and results summarize outcomes, but to support assertions of competitive results on PanNuke and robustness on MoNuSeg under limited data, we will expand the Results section and supplementary material with comprehensive tables (including Dice, AJI, PQ scores, per-class breakdowns, error bars from repeated runs, the new SB-vs-regression ablation, and paired statistical tests).
  Revision: yes
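The paired statistical tests promised above could take several forms; the rebuttal does not say which. One simple option is an exact sign-flip permutation test on per-run score differences, sketched here. The function name and the scores are invented for illustration and are not results from the paper.

```python
from itertools import product


def paired_permutation_pvalue(scores_a, scores_b):
    """Exact two-sided sign-flip permutation test on paired score differences.

    Enumerates all 2**n sign assignments, so it is only practical for a
    small number of paired runs or folds.
    """
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need two non-empty score lists of equal length")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    observed = abs(sum(diffs))
    hits = 0
    total = 0
    for signs in product((1, -1), repeat=len(diffs)):
        total += 1
        flipped = abs(sum(s * d for s, d in zip(signs, diffs)))
        if flipped >= observed - 1e-12:  # small tolerance for float round-off
            hits += 1
    return hits / total


# Made-up scores for five paired runs of two hypothetical methods.
method_a = [0.70, 0.72, 0.68, 0.71, 0.69]
method_b = [0.66, 0.69, 0.67, 0.68, 0.65]
p_value = paired_permutation_pvalue(method_a, method_b)
```

With five runs in which one method always wins, only the all-positive and all-negative sign patterns reach the observed total, so the smallest achievable two-sided p-value is 2/32 = 0.0625. That floor is a reminder that a handful of repeated runs cannot reach the conventional 0.05 threshold with this test.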
Circularity Check
No significant circularity; Schrödinger Bridge formulation is an independent modeling choice with empirical validation.
The paper proposes framing cell instance segmentation as a distribution-based image-to-image generation task via a multi-task Schrödinger Bridge, augmented with reverse distance map supervision and deterministic inference at test time. Performance is demonstrated via experiments on PanNuke and MoNuSeg datasets, claiming competitive results without SAM pre-training or post-processing. No equations, derivations, or claims in the provided text reduce the reported outcomes to a fitted parameter renamed as prediction, a self-citation chain, or a self-definitional loop. The central claims rest on external dataset benchmarks rather than internal construction, satisfying the default expectation of self-contained modeling.