
arxiv: 2606.10107 · v1 · pith:LVRDKZP5new · submitted 2026-06-08 · 💻 cs.CV · q-bio.QM

Maximum Matching Accuracy: An Instance Segmentation Evaluation Metric Utilizing Globally Optimal Matching

Pith reviewed 2026-06-27 16:54 UTC · model grok-4.3

classification 💻 cs.CV q-bio.QM
keywords instance segmentation · evaluation metric · maximum matching accuracy · biological cell imaging · globally optimal matching · per-pixel normalization · segmentation quality

The pith

Maximum Matching Accuracy evaluates instance segmentation via globally optimal one-to-one matching and per-pixel normalization without thresholds.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Maximum Matching Accuracy (MMA) to fix weaknesses in common instance segmentation metrics used in biological imaging. Existing approaches rely on hard IoU thresholds that create discontinuous scores, per-object normalization that distorts results with size variation, and greedy matching that produces order-dependent or suboptimal assignments. MMA instead solves for a globally optimal bipartite matching between predictions and ground truth then aggregates overlap with per-pixel normalization. Tests on synthetic failure modes, progressive corruption, and real model rankings indicate MMA yields more stable, sensitive, and interpretable scores than AP@50, PQ, SEG, or AJI when models produce splits, merges, or boundary errors. The result is a metric intended to support fairer benchmarking of cell segmentation algorithms.

Core claim

MMA is a threshold-free continuous metric that finds a globally optimal one-to-one matching between predicted and ground truth objects and aggregates total overlap using per-pixel normalization.

What carries the argument

Maximum Matching Accuracy (MMA), which solves for a globally optimal one-to-one bipartite matching between predicted and ground-truth instances, then normalizes the aggregate overlap on a per-pixel basis.
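
As a rough illustration of these two ingredients, the Python sketch below scores a prediction by (1) searching for the one-to-one matching with the largest total pixel overlap and (2) dividing that overlap by a pixel count. The function names are hypothetical, the brute-force search stands in for a proper assignment solver, and the denominator (the union of all ground-truth and predicted foreground pixels) is an assumption, since the exact normalization is not reproduced on this page.

```python
from itertools import permutations


def overlap_matrix(gt, pred):
    """Pixel-count overlap between every ground-truth and predicted instance."""
    return [[len(g & p) for p in pred] for g in gt]


def max_total_overlap(overlaps):
    """Largest total overlap over all one-to-one matchings (brute force).

    Illustration only: exhaustive search is exponential, so a real
    implementation would use an assignment solver instead.
    """
    n_rows = len(overlaps)
    n_cols = len(overlaps[0]) if n_rows else 0
    if n_rows == 0 or n_cols == 0:
        return 0
    if n_rows > n_cols:
        # Transpose so the smaller side is always the one being assigned.
        overlaps = [list(col) for col in zip(*overlaps)]
        n_rows, n_cols = n_cols, n_rows
    return max(
        sum(overlaps[i][j] for i, j in enumerate(perm))
        for perm in permutations(range(n_cols), n_rows)
    )


def mma_like_score(gt, pred):
    """Matched overlap divided by all foreground pixels (assumed denominator)."""
    matched = max_total_overlap(overlap_matrix(gt, pred))
    foreground = len(set().union(*gt, *pred))
    return matched / foreground if foreground else 1.0


# Toy case: one 10-pixel cell predicted as two fragments of 6 and 4 pixels.
gt = [{(0, c) for c in range(10)}]
pred = [{(0, c) for c in range(6)}, {(0, c) for c in range(6, 10)}]
split_score = mma_like_score(gt, pred)
perfect_score = mma_like_score(gt, gt)
```

Under these assumptions the split cell scores 0.6, because only one fragment can be matched to the single ground-truth object, while an exact prediction scores 1.0.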

If this is right

  • MMA scores vary continuously rather than jumping when an IoU threshold is crossed.
  • MMA produces order-independent correspondences even when predictions contain splits or merges.
  • Per-pixel normalization avoids the score distortion that per-object normalization introduces when object sizes vary.
  • Model rankings remain consistent across common biological imaging failure modes.
  • Benchmarking can proceed without arbitrary threshold choices.
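
The first point can be seen in a small Python sketch. It shrinks a prediction inside a single 100-pixel ground-truth object and records both the raw overlap ratio and a thresholded hit at IoU 0.5. This is not MMA itself; the helper names and the `>=` comparison are assumptions chosen only to contrast a continuous quantity with a thresholded one.

```python
def iou(a, b):
    """Intersection over union of two pixel sets."""
    union = len(a | b)
    return len(a & b) / union if union else 1.0


def sweep(gt_size, kept_sizes, threshold=0.5):
    """Shrink a prediction inside one ground-truth object and record both scores."""
    gt = set(range(gt_size))
    rows = []
    for kept in kept_sizes:
        pred = set(range(kept))  # prediction covers the first `kept` pixels
        value = iou(gt, pred)
        rows.append((kept, value, int(value >= threshold)))
    return rows


rows = sweep(100, [60, 51, 50, 49, 40])
for kept, value, hit in rows:
    print(f"kept={kept:3d}  overlap_ratio={value:.2f}  thresholded={hit}")
```

The overlap ratio falls in small steps as pixels are lost, while the thresholded value drops from 1 to 0 between 50 and 49 kept pixels, a change of one pixel.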

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Adoption of MMA might shift training objectives toward boundary precision in ways that current IoU-based losses do not reward.
  • The same matching-plus-normalization construction could be tested on non-biological instance segmentation tasks such as autonomous driving or microscopy of non-cellular objects.
  • Direct comparison of MMA rankings against expert pairwise preference judgments would provide an external check on claimed interpretability gains.

Load-bearing premise

That globally optimal one-to-one matching combined with per-pixel normalization better reflects true segmentation quality than threshold-based or per-object methods, particularly under splits, merges, and boundary imprecision.

What would settle it

A controlled experiment in which two models produce segmentations judged equivalent by biologists yet receive substantially different MMA scores driven by the matching step.

Figures

Figures reproduced from arXiv: 2606.10107 by Alexandra D. VandeLoo, Craig R. Forest, Kaden Stillwagon.

Figure 1
Figure 1: Maximum Bipartite Matching for Instance Segmentation Assignment. (a) Example ground truth (GT) and predicted (Pred) segmentation masks. Areas of overlap are represented by numbers in regions where GT and Pred overlap. (b) A bipartite graph representing the segmentation scenario in (a) including nodes for GT masks on the left, nodes for predicted masks on the right, and edges between nodes that overlap with…
Figure 2
Figure 2: MMA Demonstrates Robustness on Structured Test Cases where Other Metrics Behave Unintuitively. MMA and five baseline segmentation quality metrics are evaluated on six structured scenarios representing common segmentation failure modes. In the left column are drawings of each scenario with filled blue circles representing predicted masks and dashed black circle outlines representing ground truth masks. To t…
Figure 3
Figure 3: MMA Consistently Exhibits Greater Stability and Sensitivity than Baseline Metrics Under Progressive Segmentation Corruption. Ground truth instance masks were progressively degraded using eight controlled corruption operations representing common segmentation failure modes. Metric responses are shown as corruption severity increases across iterations. Additionally, a segmentation example from LIVECell (Edlu…
Original abstract

Reliable evaluation of instance segmentation models requires metrics that accurately and consistently reflect segmentation quality. However, the metrics most widely used in biological imaging carry fundamental mathematical weaknesses: hard Intersection-over-Union (IoU) thresholds that produce discontinuous, low sensitivity scoring; per-object normalization that distorts scores under object size variation; and greedy or one-to-many matching procedures that yield non-optimal, order-dependent correspondences. Together, these properties produce unintuitive and unreliable model rankings under common failure modes such as split cells, merged cells, and cell boundary imprecision. We propose Maximum Matching Accuracy (MMA), a threshold-free continuous metric that finds a globally optimal one-to-one matching between predicted and ground truth objects and aggregates total overlap using per-pixel normalization. We evaluate MMA against AP@50, PQ, SEG, and AJI across three experiments: synthetic failure cases, progressive corruption tests, and a model ranking comparison. MMA produces scores that are more stable, more sensitive, and more interpretable than existing alternatives, providing a principled foundation for fair instance segmentation benchmarking in biological cell imaging.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The paper claims that widely used instance segmentation metrics (AP@50, PQ, SEG, AJI) suffer from hard IoU thresholds causing discontinuous low-sensitivity scoring, per-object normalization that distorts results under size variation, and greedy/one-to-many matching that produces non-optimal order-dependent correspondences. It introduces Maximum Matching Accuracy (MMA) as a threshold-free continuous metric that computes a globally optimal one-to-one matching between predictions and ground truth then aggregates overlap via per-pixel normalization. Experiments on synthetic failure cases (splits, merges, boundary errors), progressive corruption tests, and model ranking comparisons are reported to show MMA yields more stable, sensitive, and interpretable scores than the baselines for biological cell imaging.

Significance. If the reported experimental advantages hold under scrutiny, MMA would offer a more reliable and principled alternative for benchmarking instance segmentation in biological imaging, where object-size variation and split/merge errors are common. The globally optimal matching component and per-pixel normalization are clear methodological strengths that directly target documented weaknesses in prior metrics; reproducible code or explicit matching formulation would further strengthen the contribution.

minor comments (3)
  1. [Abstract] The claim that MMA is 'more stable, more sensitive, and more interpretable' is presented without any numerical deltas or statistical tests; the results section should include explicit quantitative comparisons (e.g., variance across corruption levels or ranking stability metrics) to support this.
  2. [Methods] The description of the matching procedure does not specify the algorithm used to obtain the globally optimal one-to-one assignment (Hungarian, min-cost flow, etc.) or its computational complexity; this detail is needed for reproducibility and should appear in the methods section.
  3. [Experiments] Figure captions and axis labels in the corruption-test and model-ranking figures should explicitly state the number of trials, error bars, and whether differences are statistically significant.
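
Comment 2 turns on how the optimal assignment is obtained. The small Python sketch below shows why the choice matters: on the same overlap matrix, a greedy matcher returns different totals depending on the order in which ground-truth objects are processed, whereas an exhaustive search over one-to-one matchings does not. The overlap values and function names are invented for illustration and are not taken from the paper.

```python
from itertools import permutations


def greedy_total(overlaps, order):
    """Each ground-truth row, in the given order, takes its best unused column."""
    used = set()
    total = 0
    for i in order:
        free = [j for j in range(len(overlaps[i])) if j not in used]
        if not free:
            break
        best = max(free, key=lambda j: overlaps[i][j])
        used.add(best)
        total += overlaps[i][best]
    return total


def optimal_total(overlaps):
    """Best total over all one-to-one matchings of a square overlap matrix."""
    n = len(overlaps)
    return max(
        sum(overlaps[i][perm[i]] for i in range(n))
        for perm in permutations(range(n))
    )


# Hypothetical pixel overlaps: rows are ground-truth objects, columns predictions.
overlaps = [
    [5, 4],
    [4, 0],
]

greedy_forward = greedy_total(overlaps, order=[0, 1])   # row 0 grabs column 0 first
greedy_reverse = greedy_total(overlaps, order=[1, 0])   # row 1 grabs column 0 first
best = optimal_total(overlaps)
```

Here the forward greedy pass totals 5 and the reversed pass totals 8, which equals the exhaustive optimum; the result of greedy matching depends on processing order, the optimum does not.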

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for the positive summary of our work, the recognition of MMA's methodological strengths, and the recommendation for minor revision. No specific major comments were raised in the report.

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper defines MMA as a new metric via globally optimal one-to-one matching plus per-pixel normalization. This is an explicit construction, not a derivation that reduces to fitted inputs or self-citations. Experiments compare it to AP@50, PQ, SEG, and AJI on synthetic cases and model rankings, but the metric definition itself is independent of those outcomes. No self-citation load-bearing steps, no fitted-parameter predictions, and no uniqueness theorems imported from prior author work are present in the provided text. The central claim rests on empirical comparisons rather than any definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The proposal relies on the existence of an efficient algorithm for globally optimal bipartite matching (standard assignment problem) and the mathematical validity of per-pixel normalization; no free parameters or invented entities are mentioned.

axioms (1)
  • standard math An efficient algorithm exists to compute the globally optimal one-to-one matching that maximizes total overlap
    Invoked when defining MMA; this is the standard bipartite matching / assignment problem solvable by Hungarian algorithm or min-cost flow.
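
For reference, the assignment problem invoked here can be written as an integer program. The notation (ground-truth instances G_i, predicted instances P_j, binary indicators x_ij) is assumed for illustration and is not taken from the paper.

```latex
\begin{aligned}
\max_{x}\quad & \sum_{i}\sum_{j} \lvert G_i \cap P_j \rvert \, x_{ij} \\
\text{s.t.}\quad & \sum_{j} x_{ij} \le 1 \quad \text{for every ground-truth instance } i, \\
& \sum_{i} x_{ij} \le 1 \quad \text{for every predicted instance } j, \\
& x_{ij} \in \{0, 1\}.
\end{aligned}
```

The two inequality constraints enforce the one-to-one property. The constraint matrix of this program is totally unimodular, so its linear relaxation has an integral optimum, which is why the Hungarian algorithm (cubic time in the number of instances) or a min-cost flow solver returns an exact solution.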

pith-pipeline@v0.9.1-grok · 5724 in / 1222 out tokens · 19463 ms · 2026-06-27T16:54:47.191340+00:00 · methodology


Reference graph

Works this paper leans on

29 extracted references · 29 canonical work pages · 3 internal anchors

  1. [1]

    Medical Image Analysis 84, 102699

    Mitosis domain generalization in histopathology images — The MIDOG challenge. Medical Image Analysis 84, 102699. URL:https://www.sciencedirect.com/science/article/pii/S1361841522003279, doi:10.1016/j.media.2022.102699. Bradski, G.,

  2. [2]

    Nature Methods 16, 1247–1253

    Nucleus segmentation across imaging experiments: the 2018 Data Science Bowl. Nature Methods 16, 1247–1253. URL:https://www.nature.com/articles/s41592-019-0612-7, doi:10.1038/s41592-019-0612-7. Chen, L., Wu, Y., Stegmaier, J., Merhof, D.,

  3. [3]

    URL:http://arxiv

    SortedAP: Rethinking evaluation metrics for instance segmentation. URL:http://arxiv.org/abs/2309.04887, doi:10.48550/arXiv.2309.04887. arXiv:2309.04887 [cs.CV]. Cheng, B., Girshick, R., Dollár, P., Berg, A.C., Kirillov, A., 2021. Boundary IoU: Improving Object-Centric Image Segmentation Evaluation. URL: http://arxiv.org/abs/2103.16562, doi:10.48550/arXiv.2103.16562. arX...

  4. [4]

    Edlund, C., Jackson, T.R., Khalid, N., Bevan, N., Dale, T., Dengel, A., Ahmed, S., Trygg, J., Sjögren, R.,

    URL:https://jbt.pubpub.org/pub/4n84h3kc/release/1, doi:10.7171/3fc1f5fe.5d696e01. Edlund, C., Jackson, T.R., Khalid, N., Bevan, N., Dale, T., Dengel, A., Ahmed, S., Trygg, J., Sjögren, R.,

  5. [5]

    Nature Methods 18, 1038–1045

    LIVECell—A large-scale dataset for label-free live cell segmentation. Nature Methods 18, 1038–1045. URL:https://www.nature.com/articles/s41592-021-01249-6, doi:10.1038/s41592-021-01249-6. Everingham, M., Van Gool, L., Williams, C.K.I., Winn, J., Zisserman, A., 2010. The Pascal Visual Object Classes (VOC) Challenge. International Journal of Computer Vision 88, 303–338. URL:https:...

  6. [6]

    Graham, S., Jahanifar, M., Vu, Q.D., Hadjigeorghiou, G., Leech, T., Snead, D., Raza, S.E.A., Minhas, F., Rajpoot, N.,

    URL:https://www.nature.com/articles/s41598-023-35605-7, doi:10.1038/s41598-023-35605-7. Graham, S., Jahanifar, M., Vu, Q.D., Hadjigeorghiou, G., Leech, T., Snead, D., Raza, S.E.A., Minhas, F., Rajpoot, N.,

  7. [7]

    arXiv:2111.14485 [cs.CV]

    URL:http://arxiv.org/abs/2111.14485, doi:10.48550/arXiv.2111.14485. arXiv:2111.14485 [cs.CV]. Graham, S., Vu, Q.D., Raza, S.E.A., Azam, A., Tsang, Y.W., Kwak, J.T., Rajpoot, N.,

  8. [8]

    URL:http://arxiv.org/abs/1812.06499, doi:10.48550/arXiv.1812

    HoVer-Net: Simultaneous Segmentation and Classification of Nuclei in Multi-Tissue Histology Images. URL:http://arxiv.org/abs/1812.06499, doi:10.48550/arXiv.1812.06499. arXiv:1812.06499 [cs.CV]. Hagberg, A.A., Schult, D.A., Swart, P.J., 2008. Proceedings of the Python in Science Conference (SciPy): Exploring Network Structure, Dynamics, and Function using NetworkX. URL:ht...

  9. [9]

    BMC Biology 22, 1

    Morphology-based deep learning enables accurate detection of senescence in mesenchymal stem cell cultures. BMC Biology 22, 1. URL:https://doi.org/10.1186/s12915-023-01780-2, doi:10.1186/s12915-023-01780-2. Hirling, D., Tasnadi, E., Caicedo, J., Caroprese, M.V., Sjögren, R., Aubreville, M., Koos, K., Horvath, P.,

  10. [10]

    Howard, A., Chow, A., CorporateResearchSartorius, Ca, M., Culliton, P., Jackson, T.,

    Segmentation metric misinterpretations in bioimage analysis. Nature Methods 21, 213–216. URL:https://www.nature.com/articles/s41592-023-01942-8, doi:10.1038/s41592-023-01942-8. Howard, A., Chow, A., CorporateResearchSartorius, Ca, M., Culliton, P., Jackson, T.,

  11. [11]

    J.vandeSandeetal.arXive-prints,art.arXiv:2306.00059,May2023

    Sartorius - Cell Instance Segmentation. URL: https://kaggle.com/sartorius-cell-instance-segmentation. Hörst, F., Rempe, M., Heine, L., Seibold, C., Keyl, J., Baldini, G., Ugurel, S., Siveke, J., Grünwald, B., Egger, J., Kleesiek, J., 2023. CellViT: Vision Transformers for Precise Cell Segmentation and Classification. URL:http://arxiv.org/abs/2306.15350, doi:10.48550/arXiv.2306....

  12. [12]

    URL:http://arxiv.org/abs/2207.01614, doi:10.48550/arXiv.2207.01614

    Beyond mAP: Towards better evaluation of instance segmentation. URL:http://arxiv.org/abs/2207.01614, doi:10.48550/arXiv.2207.01614. arXiv:2207.01614 [cs.CV]. Kamat, P., Macaluso, N., Min, C., Li, Y., Agrawal, A., Winston, A., Pan, L., Starich, B., Stewart, T., Wu, P.H., Fan, J., Walston, J., Phillip, J.M.,

  13. [13]

    URL:https://www.biorxiv

    Single-cell morphology encodes functional subtypes of senescence in aging human dermal fibroblasts. URL:https://www.biorxiv.org/content/10.1101/2024.05.10.593637v2, doi:10.1101/2024.05.10.593637. pages: 2024.05.10.593637 Section: New Results. Karmakar, R., Nørrelykke, S.F.,

  14. [14]

    URL: http://arxiv.org/abs/2505.12155, doi:10.48550/arXiv.2505.12155

    SoftPQ: Robust Instance Segmentation Evaluation via Soft Matching and Tunable Thresholds. URL: http://arxiv.org/abs/2505.12155, doi:10.48550/arXiv.2505.12155. arXiv:2505.12155 [cs]. Kirillov, A., Girshick, R., He, K., Dollár, P.,

  15. [15]

    Panoptic Feature Pyramid Networks

    Panoptic Feature Pyramid Networks. URL:http://arxiv.org/abs/1901.02446, doi:10.48550/arXiv.1901.02446. arXiv:1901.02446 [cs]. Kumar, N., Verma, R., Sharma, S., Bhargava, S., Vahadane, A., Sethi, A.,

  16. [16]

    IEEE Transactions on Medical Imaging 36, 1550–1560

    A Dataset and a Technique for Generalized Nuclear Segmentation for Computational Pathology. IEEE Transactions on Medical Imaging 36, 1550–1560. URL:https://ieeexplore.ieee.org/document/7872382, doi:10.1109/TMI.2017.2677499. Lin, T.Y., Maire, M., Belongie, S., Bourdev, L., Girshick, R., Hays, J., Perona, P., Ramanan, D., Zitnick, C.L., Dollár, P.,

  17. [17]

    Microsoft COCO: Common Objects in Context

    Microsoft COCO: Common Objects in Context. URL:http://arxiv.org/abs/1405.0312, doi:10.48550/arXiv.1405.0312. arXiv:1405.0312 [cs]. Marks, M., Israel, U., Dilip, R., Li, Q., Yu, C., Laubscher, E., Iqbal, A., Pradhan, E., Ates, A., Abt, M., Brown, C., Pao, E., Li, S., Pearson-Goulart, A., Perona, P., Gkioxari, G., Barnowski, R., Yue, Y., Van Valen, D., 2025. CellSAM:af...

  18. [18]

    Bioinformatics 30, 1609–1617

    A benchmark for comparison of cell tracking algorithms. Bioinformatics 30, 1609–1617. URL:https://doi.org/10.1093/bioinformatics/btu080, doi:10.1093/bioinformatics/btu080. Mousavikhamene, Z., Sykora, D.J., Mrksich, M., Bagheri, N., 2021. Morphological features of single cells enable accurate automated classification of cancer from non-cancer cell lines. Scientific Repor...

  19. [19]

    Pachitariu, M., Rariden, M., Stringer, C.,

    URL:https://doi.org/10.1186/s13287-017-0740-x, doi:10.1186/s13287-017-0740-x. Pachitariu, M., Rariden, M., Stringer, C.,

  20. [20]

    URL:https://www

    Cellpose-SAM: superhuman generalization for cellular segmentation. URL:https://www.biorxiv.org/content/10.1101/2025.04.28.651001v1, doi:10.1101/2025.04.28.651001. pages: 2025.04.28.651001 Section: New Results. Reta, C., Altamirano, L., Gonzalez, J.A., Diaz-Hernandez, R., Peregrina, H., Olmos, I., Alonso, J.E., Lobato, R.,

  21. [21]

    URL:https://pmc.ncbi.nlm.nih.gov/articles/PMC4479443/, doi:10.1371/journal.pone.0130805

    Segmentation and Classification of Bone Marrow Cells Images Using Contextual Information for Medical Diagnosis of Acute Leukemias. PLoS ONE 10, e0130805. URL:https://pmc.ncbi.nlm.nih.gov/articles/PMC4479443/, doi:10.1371/journal.pone.0130805. Stillwagon, K., VandeLoo, A.D., Magondu, B., Forest, C.R.,

  22. [22]

    Self-supervised Pretraining of Cell Segmentation Models

    Self-supervised Pretraining of Cell Segmentation Models. URL:http://arxiv.org/abs/2604.10609, doi:10.48550/arXiv.2604.10609. arXiv:2604.10609 [cs.CV] version:

  23. [23]

    Cellpose: a generalist algorithm for cellular segmentation

    Stringer, C., Wang, T., Michaelos, M., Pachitariu, M., 2021. Cellpose: a generalist algorithm for cellular segmentation. Nature Methods 18, 100–106. URL:https://www.nature.com/articles/s41592-020-01018-x, doi:10.1038/s41592-020-01018-x. VandeLoo, A.D., Malta, N.J., Sanganeriya, S., Aponte, E., Zyl, C.v., Xu, D., Forest, C.,

  24. [24]

    PLOS ONE 20, e0319532

    SAMCell: Generalized label-free biological cell segmentation with segment anything. PLOS ONE 20, e0319532. URL:https://journals.plos.org/plosone/article?id=10.1371/journal.pone.0319532, doi:10.1371/journal.pone.0319532. Vasilevich, A.S., Vermeulen, S., Kamphuis, M., Roumans, N., Eroumé, S., Hebels, D.G.A.J., van de Peppel, J., Reihs, R., Beijer, N.R.M., ...

  25. [25]

    Scientific Reports 10, 18988

    On the correlation between material-induced cell shape and phenotypical response of human mesenchymal stem cells. Scientific Reports 10, 18988. URL:https://www.nature.com/articles/s41598-020-76019-z, doi:10.1038/s41598-020-76019-z. Verma, R., Kumar, N., Patil, A., Kurian, N.C., Rane, S., Graham, S., Vu, Q.D., Zwager, M., Raza, S.E.A., Rajpoot, N., Wu, X.,...

  26. [26]

    IEEE Transactions on Medical Imaging 40(12), 3413–3423 (2021). https://doi.org/10.1109/TMI.2021.3085712

    MoNuSAC2020: A Multi-Organ Nuclei Segmentation and Classification Challenge. IEEE Transactions on Medical Imaging 40, 3413–3423. URL:https://ieeexplore.ieee.org/abstract/document/9446924, doi:10.1109/TMI.2021.3085712. Way, G.P., Kost-Alimova, M., Shibue, T., Harrington, W.F., Gill, S., Piccioni, F., Becker, T., Shafqat-Abbasi, H., Hahn, W.C., Carpenter, A...

  27. [27]

    Molecular Biology of the Cell 32, 995–1005

    Predicting cell health phenotypes using image-based morphology profiling. Molecular Biology of the Cell 32, 995–1005. doi:10.1091/mbc.E20-12-0784. Welter, E.M., Benavides, S., Archer, T.K., Kosyk, O., Zannas, A.S.,

  28. [28]

    GeroScience 46, 2425–2439

    Machine learning-based morphological quantification of replicative senescence in human fibroblasts. GeroScience 46, 2425–2439. URL:https://doi.org/10.1007/s11357-023-01007-w, doi:10.1007/s11357-023-01007-w. Wu, P.H., Gilkes, D.M., Phillip, J.M., Narkar, A., Cheng, T.W.T., Marchand, J., Lee, M.H., Li, R., Wirtz, D.,

  29. [29]

    Science Advances 6, eaaw6938

    Single-cell morphology encodes metastatic potential. Science Advances 6, eaaw6938. URL:https://www.science.org/doi/10.1126/sciadv.aaw6938, doi:10.1126/sciadv.aaw6938. Kaden Stillwagon is a Computer Science master's student at Georgia Tech. He holds a B.S. in Computer Science from Georgia...