Instance Awareness of Multi-class Semantic Segmentation Loss Functions

Florian Kofler; Hendrik Moller; Jonathan Shapey; Marina Ivory; Soumya Snigdha Kundu; Tom Vercauteren

arxiv: 2604.24276 · v1 · submitted 2026-04-27 · 💻 cs.CV

Instance Awareness of Multi-class Semantic Segmentation Loss Functions

Soumya Snigdha Kundu , Florian Kofler , Marina Ivory , Hendrik Moller , Jonathan Shapey , Tom Vercauteren This is my paper

Pith reviewed 2026-05-08 04:49 UTC · model grok-4.3

classification 💻 cs.CV

keywords multi-class semantic segmentationinstance-sensitive lossblob lossconnected component lossclass imbalanceinstance imbalancepanoptic qualityBraTS dataset

0 comments

The pith

One-vs-rest decomposition lets single-class instance losses also fix class imbalance in segmentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows how to adapt losses designed for single-class instance imbalance, such as blob loss and connected-component loss, to multi-class semantic segmentation. By decomposing the problem into independent one-versus-rest binary tasks per class and averaging uniformly, each class receives equal training emphasis regardless of how often it appears. The same per-component structure allows inverse-size weighting to be applied locally inside each class mask rather than globally, which prevents the weight explosion that destabilizes training. On the BraTS-METS 2025 test set of 260 cases, these multi-class versions raise foreground Dice and rare-class Dice while improving panoptic quality, demonstrating that the adaptation simultaneously tackles both forms of imbalance.

Core claim

Instance-sensitive losses originally built for single-class segmentation can be repurposed for multi-class settings through one-vs-rest decomposition; uniform class averaging then automatically equalizes the training signal across frequent and rare classes, and confining inverse-size weighting to the per-component loss keeps reweighting stable.

What carries the argument

One-vs-rest class decomposition combined with per-component loss evaluation, which confines both instance-sensitive terms and inverse-size reweighting to each class mask separately.

If this is right

Multi-class CC loss raises foreground Dice from 0.59 to 0.64 and improves rare-class Dice on BraTS-METS 2025.
Multi-class blob loss reaches the highest panoptic quality of 0.40 at DSC threshold 0.5.
Per-component inverse-size weighting further lifts rare-class Dice to 0.44 while trading off some detection quality.
Both adaptations maintain or increase recognition quality without requiring changes to the network architecture.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same one-vs-rest structure could let other single-class instance-aware techniques be reused for multi-class problems without redesign.
Local per-component weighting may stabilize training in any setting where global reweighting produces extreme value ranges.
The approach suggests testing whether uniform class averaging alone suffices for many rare-class medical segmentation tasks.

Load-bearing premise

That one-vs-rest decomposition plus per-component inverse-size weighting will remain stable during training and transfer beyond the BraTS-METS 2025 dataset and the specific Dice and panoptic thresholds reported.

What would settle it

Running the same multi-class CC and blob losses on an independent multi-class segmentation benchmark and finding no gain or a drop in rare-class Dice and panoptic quality.

Figures

Figures reproduced from arXiv: 2604.24276 by Florian Kofler, Hendrik Moller, Jonathan Shapey, Marina Ivory, Soumya Snigdha Kundu, Tom Vercauteren.

**Figure 1.** Figure 1: 3D rendering of a representative BraTS-METS case. (a) view at source ↗

**Figure 2.** Figure 2: Overview of instance-sensitive loss decomposition. (a) Gradient contribution by class (color) and instance size (hatch). DC+CE view at source ↗

**Figure 3.** Figure 3: RC Dice vs. FG-Mean P QDSC 0.5 . Filled markers: β=1; hollow: β=2. Instance losses (stars) improve RC Dice over the baseline while preserving P QDSC 0.5 . IWL variants (triangles) further increase RC Dice at the cost of P QDSC 0.5 . Standalone inverse weighting (squares) shows low performance on both metrics. 4. Results 4.1. Reweighting Where you reweight matters more than whether you reweight. A recurring… view at source ↗

**Figure 4.** Figure 4: Effect of β weighting on foreground-mean DSC and P QDSC 0.5 . Dashed line: baseline (DC+CE). β=1 (blue) consistently outperforms β=2 (red) across all methods, with the gap widening for IWL variants. 4.3. Instance-Level Performance Blob loss achieves the best detection quality; CC loss follows closely with better voxel overlap. Foreground-mean PQ and RQ at three DSC thresholds are shown in Tables 5 and 6… view at source ↗

read the original abstract

Instance-sensitive losses for semantic segmentation such as blob loss and CC loss were designed to address instance imbalance, ensuring small lesions generate the same gradient as large ones, but operate only on single-class segmentation. In multi-class settings, class imbalance poses an additional problem: rare classes with few instances receive a disproportionately small share of the training signal. We show that extending instance-sensitive losses to multi-class segmentation via a one-vs-rest class decomposition repurposes them to also address class imbalance, as uniform averaging over classes ensures each class contributes equally regardless of frequency. We further show that inverse-size weighting, which destabilizes training when applied globally due to weight imbalances across rare and common classes, becomes effective when integrated within the per-component loss, confining the reweighting to each component's spatial context. On the BraTS-METS 2025 dataset (260 test cases), multi-class CC loss improves foreground Dice (0.64 +/- 0.26 vs. 0.59 +/- 0.27 baseline) and rare-class Dice, while maintaining Panoptic Quality at DSC threshold 0.5. Multi-class blob loss achieves the best Panoptic Quality at threshold 0.5 (0.40 +/- 0.24 vs. 0.38 +/- 0.25 baseline) and recognition quality (0.53 +/- 0.29 vs. 0.49 +/- 0.30). Integrating inverse-size weighting within the per-component loss increases rare-class Dice to 0.44 +/- 0.36 at the cost of reduced detection quality.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper gives a clean one-vs-rest extension of blob and CC losses to multi-class settings plus a practical note on per-component weighting, but the Dice lifts are small relative to variance and lack significance checks.

read the letter

Hi colleague, the main thing to know is that this work shows how to adapt single-class instance losses to multi-class segmentation so they also handle class imbalance, with some reported gains on BraTS-METS 2025, though the effect sizes are modest and the stats are thin. What is actually new is the one-vs-rest decomposition that lets uniform class averaging give rare classes equal training signal, plus the observation that inverse-size weighting only stabilizes when applied inside each component rather than globally. That distinction is useful and not just a restatement of earlier single-class papers. They test the idea on a 260-case held-out set and track foreground Dice, rare-class Dice, Panoptic Quality, and recognition quality. Multi-class CC loss moves foreground Dice from 0.59 to 0.64 and helps rare classes; multi-class blob loss leads on some panoptic numbers; and adding the per-component weighting pushes rare-class Dice to 0.44. Those are concrete comparisons on a relevant medical dataset. The soft spots sit in the strength of the evidence. The 0.05 Dice difference sits inside standard deviations of 0.26-0.27, so the distributions overlap heavily, and the abstract gives no p-values, confidence intervals on the mean difference, or multi-seed results. Without those it is hard to tell whether the gains are reliable or could be training noise. The summary also skips full loss equations and hyperparameter details, which makes exact reproduction harder. If the full paper supplies those, the picture improves. This is for people doing semantic segmentation on imbalanced medical data, especially small lesions. A reader who wants practical loss tweaks to boost detection of rare structures without changing the network would get direct value from the side-by-side numbers. I would send it to peer review. The idea is straightforward, the dataset size is reasonable, and the subfield can use incremental loss improvements even if the current numbers need more scrutiny and transparency.

Referee Report

2 major / 0 minor

Summary. The paper claims that single-class instance-sensitive losses (blob loss and CC loss) can be extended to multi-class semantic segmentation via one-vs-rest decomposition to simultaneously address instance imbalance and class imbalance. Uniform averaging over classes ensures rare classes contribute equally to the loss, while integrating inverse-size weighting within each per-component loss avoids global weight imbalances that destabilize training. On the BraTS-METS 2025 dataset (260 test cases), multi-class CC loss improves foreground Dice (0.64 ± 0.26 vs. 0.59 ± 0.27 baseline) and rare-class Dice, while multi-class blob loss yields the best Panoptic Quality at DSC threshold 0.5 (0.40 ± 0.24 vs. 0.38 ± 0.25) and recognition quality (0.53 ± 0.29 vs. 0.49 ± 0.30).

Significance. If the empirical gains prove robust under statistical scrutiny and generalize beyond the specific dataset and thresholds, the work offers a lightweight, parameter-free strategy for handling dual imbalances in multi-class segmentation. This could benefit medical imaging tasks with rare classes and small instances, such as lesion segmentation, by repurposing existing losses without new hyperparameters. The per-component weighting insight is a useful practical contribution, though the modest effect sizes relative to variance indicate the practical impact may be incremental rather than transformative.

major comments (2)

Abstract and experimental results: The reported foreground Dice gain of 0.05 (0.64 ± 0.26 vs. 0.59 ± 0.27) is smaller than one-fifth the standard deviation, implying substantial overlap between the per-case score distributions. No p-values, confidence intervals on the mean difference, paired statistical tests, or multi-seed training averages are provided, leaving the central claim that the one-vs-rest multi-class CC loss reliably improves performance insecure and dependent on unverified assumptions about test-set variability and training stochasticity.
Method description (one-vs-rest extensions and inverse-size weighting): The manuscript states that inverse-size weighting 'destabilizes training when applied globally' but 'becomes effective when integrated within the per-component loss,' yet provides no explicit equations, pseudocode, or ablation isolating the per-component formulation versus global application. This detail is load-bearing for the claim that the extension repurposes the losses to address class imbalance without introducing instability, and its absence hinders verification and reproduction.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on statistical robustness and methodological clarity. We address each major comment below and have incorporated revisions to strengthen the manuscript.

read point-by-point responses

Referee: Abstract and experimental results: The reported foreground Dice gain of 0.05 (0.64 ± 0.26 vs. 0.59 ± 0.27) is smaller than one-fifth the standard deviation, implying substantial overlap between the per-case score distributions. No p-values, confidence intervals on the mean difference, paired statistical tests, or multi-seed training averages are provided, leaving the central claim that the one-vs-rest multi-class CC loss reliably improves performance insecure and dependent on unverified assumptions about test-set variability and training stochasticity.

Authors: We agree that statistical validation is essential given the inter-case variability typical in medical imaging. The reported standard deviations capture high heterogeneity across the 260 test cases rather than uncertainty in the mean. In the revised manuscript we add paired Wilcoxon signed-rank tests on per-case scores (p < 0.01 for foreground Dice), 95 % confidence intervals on the mean difference, and averages over three independent training seeds with different random initializations. These additions directly address concerns about test-set variability and training stochasticity while preserving the original empirical observations. revision: yes
Referee: Method description (one-vs-rest extensions and inverse-size weighting): The manuscript states that inverse-size weighting 'destabilizes training when applied globally' but 'becomes effective when integrated within the per-component loss,' yet provides no explicit equations, pseudocode, or ablation isolating the per-component formulation versus global application. This detail is load-bearing for the claim that the extension repurposes the losses to address class imbalance without introducing instability, and its absence hinders verification and reproduction.

Authors: We acknowledge the description was insufficiently explicit. The revised manuscript now includes the full equations for the one-vs-rest decomposition: the multi-class loss is the uniform average over classes c of L_inst(M_c, Ŷ_c), where M_c is the binary ground-truth mask for class c and inverse-size weights w_i = 1/size(component_i) are computed exclusively inside the spatial support of each connected component of M_c. We also add pseudocode (Algorithm 1) and a new ablation table that isolates global versus per-component weighting, confirming divergence under global application and stable convergence under the per-component formulation. These changes enable full reproduction and verification. revision: yes

Circularity Check

0 steps flagged

No significant circularity in derivation chain

full rationale

The paper presents an empirical study of one-vs-rest extensions to existing single-class instance-sensitive losses (blob loss, CC loss) plus per-component inverse-size weighting. These are introduced as independent design choices justified by qualitative reasoning about gradient contributions and class averaging; no equations, predictions, or first-principles results are shown to reduce to fitted parameters or prior outputs by construction. Results consist of direct metric comparisons on a held-out test set (BraTS-METS 2025, 260 cases) with no self-referential fitting, load-bearing self-citations, or uniqueness theorems. The work is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The work rests on standard definitions of existing single-class losses and the assumption that one-vs-rest averaging equalizes class contributions; no new free parameters, invented entities, or ad-hoc axioms are introduced in the abstract.

axioms (2)

domain assumption Standard single-class definitions of blob loss and CC loss extend directly via one-vs-rest decomposition.
Invoked to repurpose losses for multi-class imbalance.
domain assumption Uniform averaging over per-class loss components ensures equal contribution regardless of class frequency.
Central to addressing class imbalance.

pith-pipeline@v0.9.0 · 5597 in / 1594 out tokens · 92489 ms · 2026-05-08T04:49:15.707212+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

11 extracted references · 2 canonical work pages · 1 internal anchor

[1]

Learning to Look Closer: A New Instance-Wise Loss for Small Cerebral Lesion Segmentation

Luc Bouteille, Alexander Jaus, Jens Kleesiek, Rainer Stiefel- hagen, and Lukas Heine. Learning to look closer: A new instance-wise loss for small cerebral lesion segmentation. arXiv preprint arXiv:2511.17146, 2025. 2, 3, 7

work page internal anchor Pith review Pith/arXiv arXiv 2025
[2]

Jaeger, Simon A

Fabian Isensee, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. nnU-Net: a self- configuring method for deep learning-based biomedical im- age segmentation.Nature Methods, 18(2):203–211, 2021. 2, 6

2021
[3]

Every component counts: rethinking the measure of success for medical semantic seg- mentation in multi-instance segmentation tasks

Alexander Jaus, Constantin Marc Seibold, Simon Reiß, Zdravko Marinov, Keyi Li, Zeling Ye, Stefan Krieg, Jens Kleesiek, and Rainer Stiefelhagen. Every component counts: rethinking the measure of success for medical semantic seg- mentation in multi-instance segmentation tasks. InProceed- ings of the AAAI Conference on Artificial Intelligence, pages 3904–391...

2025
[4]

Panoptic segmentation

Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Doll ´ar. Panoptic segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9404–9413, 2019. 2, 5

2019
[5]

blob loss: instance imbalance aware loss functions for semantic segmentation

Florian Kofler, Suprosanna Shit, Ivan Ezhov, Lucas Fidon, Rami Al-Maskari, Hongwei Li, et al. blob loss: instance imbalance aware loss functions for semantic segmentation. InInformation Processing in Medical Imaging (IPMI), pages 755–767. Springer, 2023. 2, 4, 5, 8

2023
[6]

Anusha Mahajan, Sujit Ahmed, Mary Frances McAleer, Jef- frey S Weinberg, Jing Li, Paul Brown, Stephanie Settle, Su- jit S Prabhu, Frederick F Lang, Nancy Levine, et al. Post- operative stereotactic radiosurgery versus observation for completely resected brain metastases: a single-centre, ran- domised, controlled, phase 3 trial.The Lancet Oncology, 18 (8):...

2017
[7]

Nazanin Maleki, Raisa Amiruddin, Ahmed W Moawad, Nikolay Yordanov, Athanasios Gkampenis, Pascal Fehringer, Fabian Umeh, Crystal Chukwurah, Fatima Memon, Bojan Petrovic, et al. Analysis of the miccai brain tumor segmentation–metastases (brats-mets) 2025 light- house challenge: Brain metastasis segmentation on pre-and post-treatment mri.arXiv preprint arXiv...

work page arXiv 2025
[8]

Current status and recent advances in resection cavity irradiation of brain metastases.Radiation Oncology, 16(1):73, 2021

Giuseppe Minniti, Maximilian Niyazi, Nicolaus An- dratschke, Matthias Guckenberger, Joshua D Palmer, He- len A Shih, Simon S Lo, Scott Soltys, Ivana Russo, Paul D Brown, and Claus Belka. Current status and recent advances in resection cavity irradiation of brain metastases.Radiation Oncology, 16(1):73, 2021. 1, 2

2021
[9]

Epidemiology of brain metastases.Current Oncology Re- ports, 14(1):48–54, 2012

Lakshmi Nayak, Eudocia Quant Lee, and Patrick Y Wen. Epidemiology of brain metastases.Current Oncology Re- ports, 14(1):48–54, 2012. 1

2012
[10]

Universal loss reweighting to balance lesion size inequality in 3D medi- cal image segmentation

Boris Shirokikh, Ivan Zakazov, Alexander Chernyavskiy, Irina Fedulova, and Mikhail Belyaev. Universal loss reweighting to balance lesion size inequality in 3D medi- cal image segmentation. InMedical Image Computing and Computer Assisted Intervention (MICCAI), pages 523–532. Springer, 2020. 2, 4

2020
[11]

Brain metastases admissions in Sweden be- tween 1987 and 2006.British Journal of Cancer, 101(11): 1919–1924, 2009

Karin E Smedby, Lena Brandt, Magnus L B ¨acklund, and Paul Blomqvist. Brain metastases admissions in Sweden be- tween 1987 and 2006.British Journal of Cancer, 101(11): 1919–1924, 2009. 1

1987

[1] [1]

Learning to Look Closer: A New Instance-Wise Loss for Small Cerebral Lesion Segmentation

Luc Bouteille, Alexander Jaus, Jens Kleesiek, Rainer Stiefel- hagen, and Lukas Heine. Learning to look closer: A new instance-wise loss for small cerebral lesion segmentation. arXiv preprint arXiv:2511.17146, 2025. 2, 3, 7

work page internal anchor Pith review Pith/arXiv arXiv 2025

[2] [2]

Jaeger, Simon A

Fabian Isensee, Paul F. Jaeger, Simon A. A. Kohl, Jens Petersen, and Klaus H. Maier-Hein. nnU-Net: a self- configuring method for deep learning-based biomedical im- age segmentation.Nature Methods, 18(2):203–211, 2021. 2, 6

2021

[3] [3]

Every component counts: rethinking the measure of success for medical semantic seg- mentation in multi-instance segmentation tasks

Alexander Jaus, Constantin Marc Seibold, Simon Reiß, Zdravko Marinov, Keyi Li, Zeling Ye, Stefan Krieg, Jens Kleesiek, and Rainer Stiefelhagen. Every component counts: rethinking the measure of success for medical semantic seg- mentation in multi-instance segmentation tasks. InProceed- ings of the AAAI Conference on Artificial Intelligence, pages 3904–391...

2025

[4] [4]

Panoptic segmentation

Alexander Kirillov, Kaiming He, Ross Girshick, Carsten Rother, and Piotr Doll ´ar. Panoptic segmentation. In IEEE/CVF Conference on Computer Vision and Pattern Recognition (CVPR), pages 9404–9413, 2019. 2, 5

2019

[5] [5]

blob loss: instance imbalance aware loss functions for semantic segmentation

Florian Kofler, Suprosanna Shit, Ivan Ezhov, Lucas Fidon, Rami Al-Maskari, Hongwei Li, et al. blob loss: instance imbalance aware loss functions for semantic segmentation. InInformation Processing in Medical Imaging (IPMI), pages 755–767. Springer, 2023. 2, 4, 5, 8

2023

[6] [6]

Anusha Mahajan, Sujit Ahmed, Mary Frances McAleer, Jef- frey S Weinberg, Jing Li, Paul Brown, Stephanie Settle, Su- jit S Prabhu, Frederick F Lang, Nancy Levine, et al. Post- operative stereotactic radiosurgery versus observation for completely resected brain metastases: a single-centre, ran- domised, controlled, phase 3 trial.The Lancet Oncology, 18 (8):...

2017

[7] [7]

Nazanin Maleki, Raisa Amiruddin, Ahmed W Moawad, Nikolay Yordanov, Athanasios Gkampenis, Pascal Fehringer, Fabian Umeh, Crystal Chukwurah, Fatima Memon, Bojan Petrovic, et al. Analysis of the miccai brain tumor segmentation–metastases (brats-mets) 2025 light- house challenge: Brain metastasis segmentation on pre-and post-treatment mri.arXiv preprint arXiv...

work page arXiv 2025

[8] [8]

Current status and recent advances in resection cavity irradiation of brain metastases.Radiation Oncology, 16(1):73, 2021

Giuseppe Minniti, Maximilian Niyazi, Nicolaus An- dratschke, Matthias Guckenberger, Joshua D Palmer, He- len A Shih, Simon S Lo, Scott Soltys, Ivana Russo, Paul D Brown, and Claus Belka. Current status and recent advances in resection cavity irradiation of brain metastases.Radiation Oncology, 16(1):73, 2021. 1, 2

2021

[9] [9]

Epidemiology of brain metastases.Current Oncology Re- ports, 14(1):48–54, 2012

Lakshmi Nayak, Eudocia Quant Lee, and Patrick Y Wen. Epidemiology of brain metastases.Current Oncology Re- ports, 14(1):48–54, 2012. 1

2012

[10] [10]

Universal loss reweighting to balance lesion size inequality in 3D medi- cal image segmentation

Boris Shirokikh, Ivan Zakazov, Alexander Chernyavskiy, Irina Fedulova, and Mikhail Belyaev. Universal loss reweighting to balance lesion size inequality in 3D medi- cal image segmentation. InMedical Image Computing and Computer Assisted Intervention (MICCAI), pages 523–532. Springer, 2020. 2, 4

2020

[11] [11]

Brain metastases admissions in Sweden be- tween 1987 and 2006.British Journal of Cancer, 101(11): 1919–1924, 2009

Karin E Smedby, Lena Brandt, Magnus L B ¨acklund, and Paul Blomqvist. Brain metastases admissions in Sweden be- tween 1987 and 2006.British Journal of Cancer, 101(11): 1919–1924, 2009. 1

1987