
arxiv: 2206.01653 · v8 · pith:TXA72PKYnew · submitted 2022-06-03 · 💻 cs.CV

Metrics reloaded: Recommendations for image analysis validation

classification 💻 cs.CV
keywords metrics · image · reloaded · validation · analysis · framework · problem · across

Increasing evidence shows that flaws in machine learning (ML) algorithm validation are an underestimated global problem. Particularly in automatic biomedical image analysis, chosen performance metrics often do not reflect the domain interest, thus failing to adequately measure scientific progress and hindering translation of ML techniques into practice. To overcome this, our large international expert consortium created Metrics Reloaded, a comprehensive framework guiding researchers in the problem-aware selection of metrics. Following the convergence of ML methodology across application domains, Metrics Reloaded fosters the convergence of validation methodology. The framework was developed in a multi-stage Delphi process and is based on the novel concept of a problem fingerprint - a structured representation of the given problem that captures all aspects that are relevant for metric selection, from the domain interest to the properties of the target structure(s), data set and algorithm output. Based on the problem fingerprint, users are guided through the process of choosing and applying appropriate validation metrics while being made aware of potential pitfalls. Metrics Reloaded targets image analysis problems that can be interpreted as a classification task at image, object or pixel level, namely image-level classification, object detection, semantic segmentation, and instance segmentation tasks. To improve the user experience, we implemented the framework in the Metrics Reloaded online tool, which also provides a point of access to explore weaknesses, strengths and specific recommendations for the most common validation metrics. The broad applicability of our framework across domains is demonstrated by an instantiation for various biological and medical image analysis use cases.

This paper has not been read by Pith yet.


Forward citations

Cited by 7 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. U-Mamba: Enhancing Long-range Dependency for Biomedical Image Segmentation

    eess.IV 2024-01 unverdicted novelty 7.0

    U-Mamba is a hybrid CNN-SSM architecture that outperforms prior CNN and Transformer networks on biomedical image segmentation tasks by efficiently modeling long-range dependencies.

  2. MONAI: An open-source framework for deep learning in healthcare

    cs.LG 2022-11 accept novelty 6.0

    MONAI is a community-supported PyTorch framework that extends deep learning to medical data with domain-specific architectures, transforms, and deployment tools.

  3. ClinReadNet: A clinical reading-inspired network for low-dose abdominal CT image quality assessment

    cs.CV 2026-06 unverdicted novelty 5.0

    ClinReadNet introduces SOQN, (S)W-MTMSA, and HRPS loss to achieve SOTA no-reference IQA on LDCTIQAG2023 with PLCC 0.9507, SROCC 0.9554, KROCC 0.8629.

  4. OSS: Open Suturing Skills Vision-Based Assessment Challenge 2024-2025

    cs.CV 2026-05 accept novelty 5.0

    The OSS Challenge provides benchmarks showing spatiotemporal video models excel at open suturing skill classification and OSATS scoring but struggle with keypoint tracking under occlusion.

  5. U-SEG: Uncertainty in SEGmentation -- A systematic multi-variable exploration

    cs.CV 2026-05 unverdicted novelty 5.0

    Systematic multi-variable experiments show panoptic segmentation yields poorer uncertainty quality than semantic, with high variance across datasets and backbones, limited value from time-series samples, calibration g...

  6. The autoPET3 Challenge: Automated Lesion Segmentation in Whole-Body PET/CT – Multitracer Multicenter Generalization

    cs.CV 2026-05 unverdicted novelty 5.0

    The autoPET3 challenge finds that leading AI models reach a mean Dice score of 0.66 for multitracer PET/CT lesion segmentation, with compositional generalization to unseen tracer-center pairs remaining an open problem...

  7. The autoPET3 Challenge: Automated Lesion Segmentation in Whole-Body PET/CT – Multitracer Multicenter Generalization

    cs.CV 2026-05 unverdicted novelty 4.0

    The autoPET3 challenge finds good in-domain lesion segmentation performance in multitracer PET/CT but identifies compositional generalization to unseen tracer-center combinations as an open problem driven by volume ov...