arxiv: 2605.31094 · v1 · pith:6WBBHFX4new · submitted 2026-05-29 · 💻 cs.CV · cs.AI

Redefining Instance Matching: A Unified Framework for Part-Aware Matching in Panoptic Segmentation Evaluation

Pith reviewed 2026-06-28 23:05 UTC · model grok-4.3

classification 💻 cs.CV cs.AI
keywords: panoptic segmentation · instance matching · Panoptic Quality · bipartite matching · part-aware segmentation · evaluation metrics · segmentation evaluation

The pith

Panoptic Quality stays well-defined under three of four segment matching strategies, but not under Many-to-Many, once the IoU threshold drops below 0.5

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes a unified framework for instance matching in panoptic segmentation by modeling it as a constrained bipartite assignment problem. Bounding the matching degrees independently on the prediction and ground-truth sides defines four strategies. One-to-One, Many-to-One, and One-to-Many remain well-defined within the Panoptic Quality metric; Many-to-Many does not. This matters when instances are fragmented, adjacent objects are hard to delineate, or annotations are noisy. True positives, false negatives, and false positives are counted per vertex (segment) rather than per matching edge, and the framework extends to part-aware segmentation.
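A minimal sketch of degree-bounded matching, assuming a greedy edge selection as a stand-in for whatever solver the paper uses (the abstract does not name one) and an inclusive threshold. The function `bounded_matching`, its parameters, and the toy IoU values are hypothetical and are not taken from Panoptica.

```python
import math

def bounded_matching(ious, threshold, max_per_gt=1, max_per_pred=1):
    """Greedily pick (gt, pred) edges with IoU >= threshold under degree bounds.

    ious: dict mapping (gt_id, pred_id) -> IoU in [0, 1].
    max_per_gt / max_per_pred: edge limit per vertex (math.inf means "many").
    """
    gt_degree, pred_degree, edges = {}, {}, []
    # Visit candidate edges from highest to lowest IoU.
    for (gt, pred), iou in sorted(ious.items(), key=lambda kv: -kv[1]):
        if iou < threshold:
            break  # every remaining candidate is below the threshold
        gt_full = gt_degree.get(gt, 0) >= max_per_gt
        pred_full = pred_degree.get(pred, 0) >= max_per_pred
        if gt_full or pred_full:
            continue  # one endpoint has no capacity left
        edges.append((gt, pred))
        gt_degree[gt] = gt_degree.get(gt, 0) + 1
        pred_degree[pred] = pred_degree.get(pred, 0) + 1
    return edges

# Toy case: ground-truth segment g1 is fragmented into predictions p1 and p2.
ious = {("g1", "p1"): 0.40, ("g1", "p2"): 0.35, ("g2", "p2"): 0.10}
one_to_one = bounded_matching(ious, threshold=0.3)
fragments_kept = bounded_matching(ious, threshold=0.3, max_per_gt=math.inf)
```

With both bounds at 1 only the best fragment is matched; lifting the ground-truth bound lets both fragments attach to `g1`. A greedy pass is not guaranteed to be optimal, so it should be read as an illustration of the constraints rather than of the paper's solver.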

Core claim

By recasting segment matching as a constrained bipartite assignment problem and bounding the degrees on the prediction and ground-truth sides independently, four matching strategies arise. The first three are well-defined within the PQ framework while Many-to-Many falls outside it. Central to the framework is a vertex-based accounting of TP, FN, and FP anchored to ground truth and predicted segments rather than to matching edges. The framework extends naturally to part-aware panoptic segmentation.
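The mapping from the two degree bounds to the four named strategies can be written down directly. The sketch below assumes that "Many-to-One" means several predictions attached to one ground-truth segment and "One-to-Many" the reverse; the summary does not state the direction, so the paper may use the opposite convention. `strategy_name` is a hypothetical helper.

```python
import math

def strategy_name(max_per_pred, max_per_gt):
    """Name the strategy implied by the per-vertex edge limits on each side."""
    gt_takes_many = max_per_gt > 1      # a ground-truth segment may hold several predictions
    pred_takes_many = max_per_pred > 1  # a prediction may hold several ground-truth segments
    if gt_takes_many and pred_takes_many:
        return "Many-to-Many"
    if gt_takes_many:
        return "Many-to-One"   # assumed direction: many predictions -> one ground truth
    if pred_takes_many:
        return "One-to-Many"   # assumed direction: one prediction -> many ground truths
    return "One-to-One"

# Enumerate all four combinations of bounded (1) and unbounded (inf) sides.
table = [strategy_name(p, g) for p in (1, math.inf) for g in (1, math.inf)]
```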

What carries the argument

Constrained bipartite assignment with independent degree bounds on each side of the graph, which classifies matching strategies and supports vertex-based TP/FN/FP accounting
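To make the difference between vertex-based and edge-based counting concrete, here is a small sketch. It assumes that a vertex counts as matched when it has at least one accepted edge and that TP is counted on the ground-truth side; the summary does not spell out the exact rule, so both choices are assumptions, and `vertex_counts` is a hypothetical name.

```python
def vertex_counts(gt_ids, pred_ids, edges):
    """Count TP, FN, FP on segments (vertices) instead of on matching edges.

    edges: iterable of (gt_id, pred_id) pairs accepted by the matcher.
    """
    edges = list(edges)
    matched_gt = {gt for gt, _ in edges}
    matched_pred = {pred for _, pred in edges}
    tp = len(matched_gt)                     # ground-truth segments with >= 1 edge
    fn = len(set(gt_ids) - matched_gt)       # ground-truth segments with no edge
    fp = len(set(pred_ids) - matched_pred)   # predictions with no edge
    return tp, fn, fp

# g1 is fragmented into three matched predictions; g2 is missed; p4 is spurious.
edges = [("g1", "p1"), ("g1", "p2"), ("g1", "p3")]
counts = vertex_counts(["g1", "g2"], ["p1", "p2", "p3", "p4"], edges)
# counts == (1, 1, 1), whereas counting edges would report three true positives
```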

If this is right

  • Across configurable case studies, different combinations of thresholds and matching strategies can be compared in practice
  • The framework extends naturally to part-aware panoptic segmentation evaluation
  • A unified open-source package built on Panoptica implements the strategies, with Voronoi-based region-wise analysis and Area Under Threshold Curve computation as configurable options
  • These strategies become relevant for fragmented instances, difficult delineations, or noisy annotations
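The summary does not define how the Area Under Threshold Curve is computed. One plausible reading, sketched below purely as an assumption, is to evaluate a metric at several IoU thresholds and integrate with the trapezoidal rule, normalised by the threshold span so the result stays on the metric's scale. The function name and the sample scores are hypothetical.

```python
def area_under_threshold_curve(thresholds, scores):
    """Trapezoidal area under score(threshold), divided by the threshold span."""
    pairs = sorted(zip(thresholds, scores))
    if len(pairs) < 2:
        raise ValueError("need at least two thresholds")
    span = pairs[-1][0] - pairs[0][0]
    if span <= 0:
        raise ValueError("thresholds must not all be equal")
    area = 0.0
    # Sum the trapezoid between each pair of neighbouring thresholds.
    for (t0, s0), (t1, s1) in zip(pairs, pairs[1:]):
        area += 0.5 * (s0 + s1) * (t1 - t0)
    return area / span

# Made-up PQ values at three IoU thresholds, for illustration only.
autc = area_under_threshold_curve([0.1, 0.3, 0.5], [0.9, 0.7, 0.5])
```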

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This model could be tested on standard panoptic benchmarks to quantify score differences from the original PQ
  • Extending the vertex accounting to other segmentation metrics might unify evaluation across tasks
  • The part-aware version opens evaluation for hierarchical structures in medical imaging datasets

Load-bearing premise

Independently bounding the degrees on the prediction and ground-truth sides of the bipartite graph produces evaluation strategies that remain meaningful and comparable to the original Panoptic Quality definition

What would settle it

Finding a Many-to-Many matching example where the computed Panoptic Quality score behaves identically to the standard One-to-One matching under the same IoU conditions would challenge the claim that it falls outside the framework

Figures

Figures reproduced from arXiv: 2605.31094 by Benedikt Wiestler, Erik Großkopf, Florian Kofler, Hendrik Möller, Jan Kirschke, Jonathan Shapey, Kerstin Ritter, Mehdi Astaraki, Nicolas Münster, Paula Tamara Buzduga, Soumya Snigdha Kundu, Tom Vercauteren.

Figure 1: Comparison of different scenarios where the choice of a matching strategy and threshold ...
Figure 2: Left shows the One-to-One matching for the threshold ...
Figure 3: Examples of part-aware segmentation across domains: (a) natural-image instances with ...
Figure 4: A toy example encompassing different scenarios in real-life segmentation tasks, like ...
Figure 5: Semantic segmentation delineates stuff (grass and sheep) exclusively (b), whereas panoptic segmentation augments semantic classes with things, which are countable objects like individual sheep (d). Instance segmentation permits multiple instance assignments per pixel, in contrast to panoptic segmentation (c). Part-aware panoptic segmentation combines scene and part parsing in a single task (e).
Original abstract

The Panoptic Quality (PQ) metric is the standard for jointly evaluating instance and semantic segmentation. However, its original definition relies on a One-to-One matching between predicted and ground truth segments, which is only straightforward when the IoU threshold exceeds 0.5. Below 0.5, multiple matching strategies emerge in a poorly explored problem space. We systematically elucidate this space by recasting segment matching as a constrained bipartite assignment problem. Independently bounding the prediction- and ground-truth-side degrees yields four matching strategies: One-to-One, Many-to-One, One-to-Many, and Many-to-Many. We show that the first three are well-defined within the PQ framework, while Many-to-Many falls outside it. These strategies become relevant when instances are fragmented, adjacent objects are difficult to delineate, or annotations are noisy. Central to our framework is a vertex-based accounting of TP, FN, and FP, anchored to ground truth and predicted segments rather than to matching edges. We further show that the framework extends naturally to part-aware panoptic segmentation, and we explore part-aware evaluation on biomedical data. Across configurable case studies we report how different combinations of thresholds and matching strategies behave in practice. We release a unified open-source package built on Panoptica. It exposes Voronoi-based region-wise analysis, part-aware evaluation, and Area Under Threshold Curve computations as configurable options.
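For reference, the classical PQ is the sum of IoUs over matched pairs divided by |TP| + ½|FP| + ½|FN|, and it factors into segmentation quality (SQ, mean IoU of matches) times recognition quality (RQ, an F1-style score). For non-overlapping segments, at most one prediction can exceed IoU 0.5 with a given ground-truth segment, which is why One-to-One matching is unambiguous above that threshold. The sketch below computes the three quantities; the function name and the zero-denominator convention are assumptions for illustration.

```python
def panoptic_quality(matched_ious, num_fp, num_fn):
    """Return (PQ, SQ, RQ) from the IoUs of matched pairs and the FP/FN counts."""
    tp = len(matched_ious)
    denominator = tp + 0.5 * num_fp + 0.5 * num_fn
    if denominator == 0:
        return 0.0, 0.0, 0.0  # assumed convention when there are no segments at all
    sq = sum(matched_ious) / tp if tp else 0.0  # mean IoU over matches
    rq = tp / denominator                       # F1-style detection score
    return sq * rq, sq, rq

# Two matches (IoU 0.8 and 0.6), one unmatched prediction, one missed ground truth.
pq, sq, rq = panoptic_quality([0.8, 0.6], num_fp=1, num_fn=1)
```

Here SQ is 0.7, RQ is 2/3, and PQ is their product, roughly 0.467.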

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The paper recasts panoptic segment matching as a constrained bipartite assignment problem whose four strategies (One-to-One, Many-to-One, One-to-Many, Many-to-Many) arise from independent degree bounds on the prediction and ground-truth partitions. It asserts that the first three remain well-defined inside the original PQ framework via a vertex-based (segment-anchored) counting of TP/FN/FP, while Many-to-Many does not; the framework is extended to part-aware panoptic segmentation and demonstrated on biomedical data with configurable thresholds and an open-source implementation on Panoptica.

Significance. A rigorously derived set of matching strategies that reduce exactly to classical PQ when both degree bounds equal 1, together with an invariance proof for SQ and RQ, would supply a principled tool for evaluating fragmented or noisy instances. The release of a configurable open-source package exposing Voronoi-based region-wise analysis, part-aware metrics, and Area Under Threshold Curve computations is a concrete strength that lowers the barrier to adoption.

major comments (1)
  1. [Abstract / central claim] The central claim that the first three strategies are 'well-defined within the PQ framework' is load-bearing yet unsupported by any explicit reduction: no derivation shows that the vertex-based TP/FN/FP rules recover the classical PQ formulas exactly when both degree bounds are set to 1, nor that SQ and RQ retain their original interpretation, monotonicity, and [0,1] range once a bound exceeds 1 (abstract, paragraph on recasting as constrained assignment).

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review. The major comment correctly identifies that the central claim requires an explicit supporting derivation, which we will add in revision.

Point-by-point responses
  1. Referee: [Abstract / central claim] The central claim that the first three strategies are 'well-defined within the PQ framework' is load-bearing yet unsupported by any explicit reduction: no derivation shows that the vertex-based TP/FN/FP rules recover the classical PQ formulas exactly when both degree bounds are set to 1, nor that SQ and RQ retain their original interpretation, monotonicity, and [0,1] range once a bound exceeds 1 (abstract, paragraph on recasting as constrained assignment).

    Authors: We agree that an explicit derivation is required. In the revised manuscript we will insert a new subsection (under Methods) containing: (i) a formal proof that the vertex-based TP/FN/FP accounting reduces exactly to the classical edge-based PQ formulas when both degree bounds equal 1; (ii) proofs that SQ and RQ preserve their original semantic interpretations, monotonicity with respect to matching quality, and the [0,1] range for the three bounded strategies; and (iii) a concise counter-example showing why Many-to-Many violates these properties. The abstract and introductory paragraph will be updated to reference this new subsection.

    Revision: yes
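As an illustration of what the reduction in point (i) could look like (a sketch under stated assumptions, not the authors' proof): let G and P be the ground-truth and predicted segments and E the accepted edges, and assume a vertex counts as matched when it has at least one incident edge, with TP counted on the ground-truth side. When both degree bounds equal 1, every matched vertex has exactly one edge, so vertex counts and edge counts coincide:

```latex
\begin{align*}
|\mathrm{TP}| &= \bigl|\{\, g \in G : \deg(g) \ge 1 \,\}\bigr| = |E|, \\
|\mathrm{FN}| &= |G| - |E|, \qquad |\mathrm{FP}| = |P| - |E|, \\
\mathrm{PQ} &= \frac{\sum_{(g,p) \in E} \mathrm{IoU}(g,p)}
                   {|\mathrm{TP}| + \tfrac{1}{2}|\mathrm{FP}| + \tfrac{1}{2}|\mathrm{FN}|}.
\end{align*}
```

Under these assumptions the last line is the classical edge-based PQ. Once either bound exceeds 1, |E| can be larger than the number of matched vertices, which is where the two ways of counting diverge.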

Circularity Check

0 steps flagged

No significant circularity; framework extends standard bipartite matching without self-referential reduction.

Full rationale

The paper recasts panoptic matching as a constrained bipartite assignment problem whose four strategies follow directly from independent degree bounds on the two partitions. The vertex-based TP/FN/FP accounting is introduced as a modeling choice anchored to segments, not derived from or fitted to the same quantities it evaluates. No equations reduce a reported metric to a fitted parameter or prior result by construction, and no load-bearing claim rests on a self-citation chain. The assertion that the first three strategies remain inside the PQ framework is presented as a modeling consequence rather than an unproven invariance that collapses to the input definitions. The derivation is therefore self-contained against external benchmarks (standard assignment problem and original PQ definition) and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 1 invented entity

The framework relies on standard graph-matching algorithms and introduces configurable IoU thresholds plus degree bounds; the vertex-based counting is presented as a new conceptual primitive without external validation shown in the abstract.

free parameters (2)
  • IoU threshold
    Determines when multiple matching strategies become active; treated as a configurable parameter in the released package.
  • Degree bounds on prediction and ground-truth sides
    Chosen independently to generate the four matching strategies; no fixed values given in abstract.
axioms (1)
  • Domain assumption: Bipartite assignment with degree constraints yields well-defined matching strategies that can be compared to the original PQ definition
    Invoked when the authors state that the first three strategies remain inside the PQ framework.
invented entities (1)
  • Vertex-based accounting of TP, FN, FP (no independent evidence)
    purpose: Anchor counts to segments rather than to matching edges
    Presented as central to the framework; no independent evidence supplied in abstract.

pith-pipeline@v0.9.1-grok · 5841 in / 1459 out tokens · 31967 ms · 2026-06-28T23:05:32.913900+00:00 · methodology

Appendix A Voronoi Matching: Formal Construction

Let Ω denote the image domain and d(x, g) the (Euclidean) distance from voxel x ∈ Ω to the nearest voxel of ground truth segment g ∈ G. The Voronoi cell of g is

$$V(g) = \{\, x \in \Omega : d(x, g) \le d(x, g') \;\; \forall g' \in G \,\}, \tag{14}$$

with ties broken arbitra...
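The Voronoi cell definition in Appendix A can be illustrated with a brute-force sketch on a small 2D grid. It assumes Euclidean distance to the nearest voxel of each ground-truth segment and breaks ties in favour of the segment listed first; the function name and the toy layout are hypothetical, and a real implementation would more likely use a distance transform.

```python
import math

def voronoi_labels(shape, segments):
    """Assign every grid voxel to the ground-truth segment it is closest to.

    shape: (rows, cols) of the image domain.
    segments: dict mapping label -> list of (row, col) voxels of that segment.
    """
    rows, cols = shape
    labels = [[None] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            best_label, best_dist = None, math.inf
            for label, voxels in segments.items():
                # d(x, g): distance from voxel x to the nearest voxel of segment g
                dist = min(math.hypot(r - vr, c - vc) for vr, vc in voxels)
                if dist < best_dist:  # strict "<" keeps the earlier label on ties
                    best_label, best_dist = label, dist
            labels[r][c] = best_label
    return labels

# Two single-voxel segments at opposite ends of a 1 x 5 strip.
cells = voronoi_labels((1, 5), {1: [(0, 0)], 2: [(0, 4)]})
# cells == [[1, 1, 1, 2, 2]]; the middle voxel is equidistant and goes to label 1
```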