Maximum Matching Accuracy: An Instance Segmentation Evaluation Metric Utilizing Globally Optimal Matching
Pith reviewed 2026-06-27 16:54 UTC · model grok-4.3
The pith
Maximum Matching Accuracy evaluates instance segmentation via globally optimal one-to-one matching and per-pixel normalization without thresholds.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
MMA is a threshold-free continuous metric that finds a globally optimal one-to-one matching between predicted and ground truth objects and aggregates total overlap using per-pixel normalization.
What carries the argument
Maximum Matching Accuracy (MMA): it solves a globally optimal bipartite matching between predicted and ground-truth instances, then normalizes the aggregate overlap on a per-pixel basis.
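A minimal sketch of the matching-plus-normalization construction may make it concrete. It is not the authors' implementation. The function names `overlap_matrix` and `mma_sketch` are hypothetical, and the denominator (pixels that are foreground in either map) is an assumption, since the exact normalization is not given in the text reviewed here. The matching step uses SciPy's `linear_sum_assignment` with pixel intersection as the weight.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment


def overlap_matrix(gt, pred):
    """Count shared pixels for every (ground-truth, predicted) instance pair.

    Both inputs are integer label maps of equal shape; 0 is background.
    """
    gt_ids = [i for i in np.unique(gt) if i != 0]
    pred_ids = [j for j in np.unique(pred) if j != 0]
    overlap = np.zeros((len(gt_ids), len(pred_ids)), dtype=np.int64)
    for r, i in enumerate(gt_ids):
        for c, j in enumerate(pred_ids):
            overlap[r, c] = np.count_nonzero((gt == i) & (pred == j))
    return overlap


def mma_sketch(gt, pred):
    """Matched overlap divided by foreground-union pixels (assumed normalization)."""
    overlap = overlap_matrix(gt, pred)
    matched = 0
    if overlap.size:
        # One-to-one assignment that maximizes the summed pixel overlap.
        rows, cols = linear_sum_assignment(overlap, maximize=True)
        matched = int(overlap[rows, cols].sum())
    # ASSUMPTION: normalize by the number of pixels that are foreground in
    # either map; the paper's exact denominator is not stated here.
    union = int(np.count_nonzero((gt != 0) | (pred != 0)))
    return matched / union if union else 1.0


gt = np.array([[1, 1, 0, 2, 2],
               [1, 1, 0, 2, 2]])
pred = np.array([[1, 1, 0, 2, 3],
                 [1, 1, 0, 2, 3]])  # ground-truth object 2 is split in two
score = mma_sketch(gt, pred)
print(score)
```

In this toy case only one half of the split object can be matched, so 6 of the 8 foreground pixels are credited. Under a different denominator the number would change, but the matching step would not.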
If this is right
- MMA scores vary continuously rather than jumping when an IoU threshold is crossed.
- MMA produces order-independent correspondences even when predictions contain splits or merges.
- Per-pixel normalization avoids the score distortion that per-object normalization introduces when object sizes vary.
- Model rankings remain consistent across common biological imaging failure modes.
- Benchmarking can proceed without arbitrary threshold choices.
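The first bullet, on continuity, can be illustrated with a one-dimensional toy case. This is a sketch under stated assumptions: objects are modeled as integer intervals, `iou_1d` is a hypothetical helper, and the threshold convention `IoU >= 0.5` is assumed (some metrics use a strict inequality).

```python
def iou_1d(a, b):
    """IoU of two half-open integer intervals given as (start, stop)."""
    inter = max(0, min(a[1], b[1]) - max(a[0], b[0]))
    union = (a[1] - a[0]) + (b[1] - b[0]) - inter
    return inter / union


gt = (0, 30)  # a 30-pixel ground-truth object
results = []
for shift in (8, 9, 11, 12):
    pred = (shift, shift + 30)  # same size, displaced by `shift` pixels
    iou = iou_1d(gt, pred)
    hit = int(iou >= 0.5)  # thresholded: a full match or nothing
    results.append((shift, iou, hit))
    print(f"shift={shift:2d}  IoU={iou:.3f}  thresholded hit={hit}")
```

Between a shift of 9 and a shift of 11 pixels the IoU moves by less than 0.1, yet the thresholded outcome flips from a match to a miss. A score built directly on overlap would change by a correspondingly small amount.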
Where Pith is reading between the lines
- Adoption of MMA might shift training objectives toward boundary precision in ways that current IoU-based losses do not reward.
- The same matching-plus-normalization construction could be tested on non-biological instance segmentation tasks such as autonomous driving or microscopy of non-cellular objects.
- Direct comparison of MMA rankings against expert pairwise preference judgments would provide an external check on claimed interpretability gains.
Load-bearing premise
That globally optimal one-to-one matching combined with per-pixel normalization better reflects true segmentation quality than threshold-based or per-object methods, particularly under splits, merges, and boundary imprecision.
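To see what is at stake in the normalization half of this premise, the arithmetic below compares the two conventions on the same pair of matched objects. The pixel counts are made up for illustration, and the per-pixel formula (pooled intersection over pooled union) is an assumption rather than the paper's stated definition.

```python
# (intersection, union) in pixels for each matched ground-truth/prediction pair.
# Values are made up for illustration.
pairs = {
    "large cell": (1000, 1000),  # segmented perfectly
    "small cell": (20, 40),      # half of a small object is missed
}

# Per-object normalization: every object contributes one IoU, whatever its size.
per_object = sum(i / u for i, u in pairs.values()) / len(pairs)

# Per-pixel normalization: pool the pixel counts first, then divide once.
per_pixel = sum(i for i, _ in pairs.values()) / sum(u for _, u in pairs.values())

print(f"per-object: {per_object:.3f}  per-pixel: {per_pixel:.3f}")
```

The same 20-pixel error costs 0.25 under per-object averaging and about 0.02 under per-pixel pooling. Which of the two better reflects segmentation quality is exactly what the premise asserts and what the experiments would need to show.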
What would settle it
A controlled experiment in which two models produce segmentations judged equivalent by biologists yet receive substantially different MMA scores driven by the matching step.
Original abstract
Reliable evaluation of instance segmentation models requires metrics that accurately and consistently reflect segmentation quality. However, the metrics most widely used in biological imaging carry fundamental mathematical weaknesses: hard Intersection-over-Union (IoU) thresholds that produce discontinuous, low sensitivity scoring; per-object normalization that distorts scores under object size variation; and greedy or one-to-many matching procedures that yield non-optimal, order-dependent correspondences. Together, these properties produce unintuitive and unreliable model rankings under common failure modes such as split cells, merged cells, and cell boundary imprecision. We propose Maximum Matching Accuracy (MMA), a threshold-free continuous metric that finds a globally optimal one-to-one matching between predicted and ground truth objects and aggregates total overlap using per-pixel normalization. We evaluate MMA against AP@50, PQ, SEG, and AJI across three experiments: synthetic failure cases, progressive corruption tests, and a model ranking comparison. MMA produces scores that are more stable, more sensitive, and more interpretable than existing alternatives, providing a principled foundation for fair instance segmentation benchmarking in biological cell imaging.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that widely used instance segmentation metrics (AP@50, PQ, SEG, AJI) suffer from hard IoU thresholds causing discontinuous low-sensitivity scoring, per-object normalization that distorts results under size variation, and greedy/one-to-many matching that produces non-optimal order-dependent correspondences. It introduces Maximum Matching Accuracy (MMA) as a threshold-free continuous metric that computes a globally optimal one-to-one matching between predictions and ground truth then aggregates overlap via per-pixel normalization. Experiments on synthetic failure cases (splits, merges, boundary errors), progressive corruption tests, and model ranking comparisons are reported to show MMA yields more stable, sensitive, and interpretable scores than the baselines for biological cell imaging.
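The order-dependence attributed to greedy matching in the summary can be reproduced on a two-object toy case. The overlap values below are invented so that the locally best choice is a trap, and the exhaustive search stands in for a polynomial-time solver only because the example is tiny. The function names are hypothetical.

```python
from itertools import permutations

# overlap[g][p]: shared pixels between ground-truth object g and prediction p.
# Toy values chosen so that the greedy choice is a trap.
overlap = [
    [5, 4],
    [4, 0],
]


def greedy_total(order):
    """Visit ground-truth objects in `order`; each takes its best free prediction."""
    free = set(range(len(overlap[0])))
    total = 0
    for g in order:
        if not free:
            break
        choice = max(free, key=lambda p: overlap[g][p])
        total += overlap[g][choice]
        free.remove(choice)
    return total


def optimal_total():
    """Exhaustive search over all one-to-one assignments (square case only)."""
    n = len(overlap)
    return max(
        sum(overlap[g][perm[g]] for g in range(n))
        for perm in permutations(range(n))
    )


forward = greedy_total([0, 1])
backward = greedy_total([1, 0])
optimum = optimal_total()
print(forward, backward, optimum)
```

Processing the objects in one order yields a total overlap of 5, the reverse order yields 8, and the global optimum is 8. A metric built on the greedy result therefore depends on an arbitrary iteration order, while one built on the optimum does not.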
Significance. If the reported experimental advantages hold under scrutiny, MMA would offer a more reliable and principled alternative for benchmarking instance segmentation in biological imaging, where object-size variation and split/merge errors are common. The globally optimal matching component and per-pixel normalization are clear methodological strengths that directly target documented weaknesses in prior metrics; reproducible code or explicit matching formulation would further strengthen the contribution.
minor comments (3)
- [Abstract] The claim that MMA is 'more stable, more sensitive, and more interpretable' is presented without numerical deltas or statistical tests; the results section should include explicit quantitative comparisons (e.g., variance across corruption levels or ranking stability metrics) to support it.
- [Methods] The description of the matching procedure does not specify the algorithm used to obtain the globally optimal one-to-one assignment (Hungarian, min-cost flow, etc.) or its computational complexity; this detail is needed for reproducibility and should appear in the methods section.
- [Experiments] Figure captions and axis labels in the corruption-test and model-ranking figures should explicitly state the number of trials, what the error bars represent, and whether differences are statistically significant.
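The first comment asks for ranking stability metrics without naming one. One common choice, assumed here and not taken from the report, is Kendall's tau between the model rankings a metric produces under two conditions. The sketch below implements the tie-free tau-a variant; the model names and ranks are placeholders, not results from the paper.

```python
from itertools import combinations


def kendall_tau(rank_a, rank_b):
    """Kendall's tau-a between two rankings given as {model: rank} dicts (no ties)."""
    models = sorted(rank_a)
    concordant = discordant = 0
    for m, n in combinations(models, 2):
        sign = (rank_a[m] - rank_a[n]) * (rank_b[m] - rank_b[n])
        if sign > 0:
            concordant += 1
        elif sign < 0:
            discordant += 1
    pairs = len(models) * (len(models) - 1) // 2
    return (concordant - discordant) / pairs


# Placeholder rankings, not results from the paper.
clean = {"model_a": 1, "model_b": 2, "model_c": 3, "model_d": 4}
corrupted = {"model_a": 1, "model_b": 3, "model_c": 2, "model_d": 4}
tau = kendall_tau(clean, corrupted)
print(round(tau, 3))
```

A value of 1 means the ranking is unchanged and lower values mean more pairwise swaps. Reporting such a coefficient per metric would turn 'more stable' into a number that can be compared across AP@50, PQ, SEG, AJI, and MMA.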
Simulated Author's Rebuttal
We thank the referee for the positive summary of our work, the recognition of MMA's methodological strengths, and the recommendation for minor revision. No specific major comments were raised in the report.
Circularity Check
No significant circularity
The paper defines MMA as a new metric via globally optimal one-to-one matching plus per-pixel normalization. This is an explicit construction, not a derivation that reduces to fitted inputs or self-citations. Experiments compare it to AP@50, PQ, SEG, and AJI on synthetic cases and model rankings, but the metric definition itself is independent of those outcomes. No self-citation load-bearing steps, no fitted-parameter predictions, and no uniqueness theorems imported from prior author work are present in the provided text. The central claim rests on empirical comparisons rather than any definitional equivalence.
Axiom & Free-Parameter Ledger
axioms (1)
- [standard math] An efficient algorithm exists to compute the globally optimal one-to-one matching that maximizes total overlap.
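Read as a linear assignment problem, the axiom can be written out as follows. The notation is an assumption introduced here: $w_{ij}$ is the overlap between ground-truth object $i$ and predicted object $j$, $x_{ij}$ indicates that the pair is matched, and $m$ and $n$ are the numbers of ground-truth and predicted objects.

```latex
\max_{x}\ \sum_{i=1}^{m}\sum_{j=1}^{n} w_{ij}\,x_{ij}
\quad\text{subject to}\quad
\sum_{j=1}^{n} x_{ij}\le 1\ \ \forall i,\qquad
\sum_{i=1}^{m} x_{ij}\le 1\ \ \forall j,\qquad
x_{ij}\in\{0,1\}.
```

For a square $n \times n$ weight matrix the Hungarian algorithm solves this problem in $O(n^3)$ time, which is the sense in which the axiom counts as standard math. Whether the paper uses that algorithm or another exact solver is not stated in the text reviewed here.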
Kaden Stillwagon is a Computer Science masters student at Georgia Tech. He holds a B.S. in Computer Science from Georgia...