arxiv: 2604.17525 · v1 · submitted 2026-04-19 · 📡 eess.IV · cs.CV

VIDS: A Verified Imaging Dataset Standard for Medical AI

Pith reviewed 2026-05-10 04:56 UTC · model grok-4.3

classification 📡 eess.IV cs.CV
keywords VIDS · medical imaging datasets · dataset provenance · DICOM metadata · NIfTI format · AI dataset validation · annotation quality · medical AI

The pith

VIDS introduces a machine-enforceable standard for medical imaging datasets that tracks annotation provenance and quality documentation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents VIDS as an open specification that defines consistent folder layouts, file naming, annotation history schemas, and quality records for medical imaging data used in AI. It fills the gap left by DICOM, which handles individual studies, and BIDS, which focuses on research organization, by adding curation-level checks that current datasets often lack. VIDS keeps NIfTI as the working image format while storing full original DICOM metadata in sidecar files to maintain traceability, and it includes 21 validation rules plus export paths to common ML tools. Benchmarking of four widely used public datasets shows they meet only 20 to 39 percent of the 22 defined compliance dimensions, with the biggest shortfalls in provenance and quality documentation. The authors release LIDC-Hybrid-100, a 100-subject CT dataset with consensus annotations that passes all 21 rules on the full compliance profile.

Core claim

VIDS establishes folder layouts, naming rules, provenance schemas for who annotated what and with what tool, plus quality documentation requirements, all backed by 21 machine-enforceable validation rules across two compliance profiles. It adopts NIfTI as the canonical working format while preserving complete DICOM metadata in sidecars, and it supports direct export to frameworks such as nnU-Net, MONAI, COCO, or flat NIfTI without loss of traceability information. Benchmarking reveals that even major public datasets like LIDC-IDRI, BraTS, CheXpert, and the Medical Segmentation Decathlon satisfy only 20-39 percent of the 22 compliance dimensions, primarily failing on provenance and quality.
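
As a rough illustration of what an annotation provenance record might hold, the sketch below writes a JSON sidecar next to a NIfTI segmentation mask. The field names (`annotator_id`, `tool`, `tool_version`, `annotated_on`, `qc_status`) and the `<mask>.json` naming are assumptions made for this example; they are not taken from the VIDS specification.

```python
import json
from dataclasses import dataclass, asdict
from pathlib import Path


@dataclass
class AnnotationProvenance:
    """Hypothetical provenance record; field names are illustrative only."""
    annotator_id: str   # who annotated
    tool: str           # with what tool
    tool_version: str
    annotated_on: str   # ISO 8601 date
    qc_status: str      # e.g. "passed" or "pending"


def sidecar_path(mask_path: Path) -> Path:
    """Map 'x_seg.nii.gz' or 'x_seg.nii' to 'x_seg.json' (assumed convention)."""
    name = mask_path.name
    for suffix in (".nii.gz", ".nii"):
        if name.endswith(suffix):
            return mask_path.with_name(name[: -len(suffix)] + ".json")
    raise ValueError(f"not a NIfTI file: {mask_path}")


def write_sidecar(mask_path: Path, record: AnnotationProvenance) -> Path:
    """Serialize the record as JSON beside the mask and return the sidecar path."""
    out = sidecar_path(mask_path)
    out.write_text(json.dumps(asdict(record), indent=2, sort_keys=True))
    return out
```

Keeping the record in a separate text file means the mask itself is untouched, and a validator can check for the sidecar without opening any image data.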

What carries the argument

The 22 compliance dimensions and 21 validation rules that check dataset structure, annotation provenance, quality documentation, and ML export readiness while preserving DICOM metadata in sidecars.
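
One way to picture "machine-enforceable" rules is as small functions that each inspect a dataset folder and return a list of violations. The sketch below follows that pattern under stated assumptions: the two rules, their IDs, and the file name `dataset_description.json` are hypothetical and do not reproduce any of the 21 VIDS rules or the API of the `vids-validator` package.

```python
from pathlib import Path
from typing import Callable, Dict, List

Rule = Callable[[Path], List[str]]  # a rule returns human-readable violations


def rule_root_metadata(root: Path) -> List[str]:
    """Hypothetical rule: a root-level metadata file must exist."""
    required = root / "dataset_description.json"  # assumed file name
    return [] if required.is_file() else [f"missing {required.name}"]


def rule_mask_has_sidecar(root: Path) -> List[str]:
    """Hypothetical rule: every '*_seg.nii.gz' mask needs a JSON sidecar."""
    problems = []
    for mask in sorted(root.rglob("*_seg.nii.gz")):
        sidecar = mask.with_name(mask.name[: -len(".nii.gz")] + ".json")
        if not sidecar.is_file():
            problems.append(f"no provenance sidecar for {mask.name}")
    return problems


# Rule IDs are placeholders, not identifiers from the specification.
RULES: Dict[str, Rule] = {
    "R-example-01": rule_root_metadata,
    "R-example-02": rule_mask_has_sidecar,
}


def validate(root: Path) -> Dict[str, List[str]]:
    """Run every rule and collect violations keyed by rule ID."""
    return {rule_id: rule(root) for rule_id, rule in RULES.items()}


def passed(report: Dict[str, List[str]]) -> int:
    """Count rules with no violations, e.g. to report a score such as 'n/21'."""
    return sum(1 for violations in report.values() if not violations)
```

A compliance profile could then be modeled as a subset of rule IDs, so that a lighter profile runs fewer checks than the full one.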

If this is right

  • Existing public datasets can be converted to VIDS format to add missing provenance records and pass automated validation.
  • Dataset creators can apply the rules from the start so that every annotation step remains traceable for later review.
  • ML pipelines can import VIDS datasets into nnU-Net or MONAI while retaining full metadata and quality flags.
  • Validators can be run automatically during dataset curation to catch gaps before release.
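
The export point above can be sketched as a planning step that maps each case to its target file names and keeps a back-reference to the source files and provenance sidecar. The target names follow the raw-data naming commonly used with nnU-Net v2 (`imagesTr/<id>_0000.nii.gz`, `labelsTr/<id>.nii.gz`); the manifest layout and the `Case` type are assumptions for this example and are not described in the paper.

```python
from pathlib import Path
from typing import Dict, List, NamedTuple


class Case(NamedTuple):
    case_id: str   # e.g. "sub-001" (hypothetical identifier)
    image: Path    # source NIfTI image
    mask: Path     # source NIfTI segmentation
    sidecar: Path  # provenance sidecar that belongs to the mask


def plan_export(cases: List[Case], out_root: Path) -> Dict[str, Dict[str, str]]:
    """Plan a single-channel export and keep a back-reference for every case.

    Nothing is copied here; the function only returns the mapping that a
    copy step and a written manifest could share.
    """
    manifest: Dict[str, Dict[str, str]] = {}
    for case in cases:
        image_dst = out_root / "imagesTr" / f"{case.case_id}_0000.nii.gz"
        label_dst = out_root / "labelsTr" / f"{case.case_id}.nii.gz"
        manifest[case.case_id] = {
            "image_dst": image_dst.as_posix(),
            "label_dst": label_dst.as_posix(),
            "image_src": case.image.as_posix(),
            "label_src": case.mask.as_posix(),
            "provenance": case.sidecar.as_posix(),
        }
    return manifest
```

Writing this manifest alongside the exported files is one possible way to keep exported training data traceable to the annotated source.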

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread adoption could make it easier to combine data from multiple sources without losing track of how each label was created.
  • The sidecar approach for DICOM metadata may reduce the need to store duplicate full DICOM archives alongside working NIfTI files.
  • Future extensions could apply the same provenance rules to non-imaging medical data such as electronic health records.

Load-bearing premise

That the 22 dimensions and 21 rules capture the essential requirements for dataset quality and provenance without becoming either too narrow or too burdensome for everyday use.

What would settle it

A large-scale comparison in which models trained on VIDS-converted versions of existing datasets show no measurable improvement in accuracy or robustness over models trained on the original versions.

Figures

Figures reproduced from arXiv: 2604.17525 by Joan S. Muthu, John Shalen.

Figure 1: VIDS directory structure. Root level metadata files, subject/session/modality hierarchy, …
Figure 2: VIDS annotation sidecar with provenance, documenting the annotator, tool, date, and QC …
Figure 3: Average compliance across four public datasets by category. Provenance and quality documen…
Original abstract

Medical imaging AI development is fundamentally dependent on annotated datasets, yet no existing standard provides machine-enforceable validation across dataset structure, annotation provenance, quality documentation, and ML readiness within a single framework. DICOM standardizes image acquisition, storage, and communication at the individual study level. BIDS organizes neuroimaging research datasets with consistent naming conventions. Neither addresses the curation layer, viz., who annotated what, when, with what tool, and to what quality standard. This paper presents VIDS (Verified Imaging Dataset Standard), an open specification that defines folder layout, file naming, annotation provenance schemas, quality documentation, and 21 machine-enforceable validation rules across two compliance profiles. VIDS uses NIfTI as a canonical working format while preserving full DICOM metadata in sidecars for traceability, and supports export to any downstream ML framework (nnU-Net, MONAI, COCO, flat NIfTI) without loss of provenance. Twenty-two compliance dimensions are defined and four major public datasets (LIDC-IDRI, BraTS, CheXpert, and the Medical Segmentation Decathlon) are benchmarked against these dimensions. Even widely used datasets satisfy only 20-39% of these dimensions, with provenance and quality documentation as the largest systematic gaps. LIDC-Hybrid-100 is released as a 100-subject VIDS-compliant reference CT dataset with consensus segmentation masks from four radiologist annotations (mean pairwise Dice 0.7765), validating 21/21 on the Full compliance profile. VIDS is fully open source: the specification is CC BY 4.0, all tools are Apache 2.0, the reference validator is available on PyPI (pip install vids-validator), and LIDC-Hybrid-100 is published on Zenodo (https://doi.org/10.5281/zenodo.19582717).
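
The abstract reports a mean pairwise Dice score across four radiologist annotations and a consensus mask built from them. The sketch below shows how such quantities are commonly computed, with masks represented as sets of foreground voxel coordinates. The vote threshold used for the consensus is an assumption; the abstract does not state how the consensus masks were derived.

```python
from itertools import combinations
from typing import Dict, FrozenSet, List, Tuple

Voxel = Tuple[int, int, int]
Mask = FrozenSet[Voxel]  # set of foreground voxel coordinates


def dice(a: Mask, b: Mask) -> float:
    """Dice coefficient 2|A∩B| / (|A|+|B|); taken as 1.0 when both masks are empty."""
    total = len(a) + len(b)
    return 1.0 if total == 0 else 2.0 * len(a & b) / total


def mean_pairwise_dice(masks: List[Mask]) -> float:
    """Average Dice over all unordered annotator pairs (6 pairs for 4 annotators)."""
    pairs = list(combinations(masks, 2))
    return sum(dice(a, b) for a, b in pairs) / len(pairs)


def consensus(masks: List[Mask], min_votes: int) -> Mask:
    """Keep voxels marked by at least `min_votes` annotators (threshold is assumed)."""
    votes: Dict[Voxel, int] = {}
    for mask in masks:
        for voxel in mask:
            votes[voxel] = votes.get(voxel, 0) + 1
    return frozenset(v for v, n in votes.items() if n >= min_votes)
```

With four annotators, `min_votes=2` keeps any voxel that at least half of them marked, while `min_votes=3` corresponds to a strict majority; which of these (or another scheme) a dataset uses is exactly the kind of detail a provenance record should state.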

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper introduces VIDS (Verified Imaging Dataset Standard), an open specification defining folder layout, file naming, annotation provenance schemas, quality documentation, and 21 machine-enforceable validation rules across two compliance profiles for medical imaging datasets. VIDS adopts NIfTI as the canonical working format while retaining full DICOM metadata in sidecars and supports lossless export to frameworks such as nnU-Net, MONAI, and COCO. Twenty-two compliance dimensions are defined; four public datasets (LIDC-IDRI, BraTS, CheXpert, Medical Segmentation Decathlon) are benchmarked and found to satisfy only 20-39% of the dimensions, with provenance and quality documentation as the largest gaps. The authors release LIDC-Hybrid-100, a 100-subject VIDS-compliant CT reference dataset with consensus segmentations (mean pairwise Dice 0.7765) that passes 21/21 rules on the Full profile. The specification (CC BY 4.0), validator (Apache 2.0, available via pip), and dataset (Zenodo) are fully open.

Significance. If the 22 dimensions and 21 rules are shown to be appropriately scoped and derived, VIDS could provide a practical, machine-checkable framework that improves traceability and quality assurance for medical imaging datasets used in AI. The open release of the validator tool and the LIDC-Hybrid-100 reference dataset with multi-radiologist consensus annotations constitute concrete, immediately usable contributions that support reproducibility.

major comments (1)
  1. [Definition of the 22 compliance dimensions] The manuscript does not describe any systematic process (literature synthesis, expert Delphi, or explicit gap mapping from DICOM/BIDS) used to arrive at the precise set of 22 compliance dimensions and 21 validation rules. This is load-bearing for the central claim: the reported 20-39% compliance rates for LIDC-IDRI, BraTS, CheXpert, and the Medical Segmentation Decathlon, and the conclusion that provenance/quality gaps are systematic, depend directly on these author-chosen criteria being both essential and non-burdensome. Without such justification it remains possible that the dimensions are incomplete (e.g., omitting regulatory traceability or inter-rater metrics) or include low-value items, rendering the benchmarking percentages non-generalizable. (See the section defining the 22 compliance dimensions and the subsequent benchmarking results.)
minor comments (1)
  1. [Abstract and benchmarking results] The abstract reports that datasets satisfy 'only 20-39% of these dimensions'; a table or explicit breakdown in the results section showing per-dataset and per-dimension compliance would make the gap analysis more transparent and reproducible.
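
The breakdown requested in the minor comment could be produced with a few lines of code once each dataset has a per-dimension score. The sketch below assumes binary pass/fail scoring per dimension, which the paper may not use (partial credit is possible), and the dataset and dimension names in the usage note are placeholders rather than results from the paper.

```python
from typing import Dict, List

Scores = Dict[str, bool]  # dimension name -> satisfied?


def compliance_percent(scores: Scores) -> float:
    """Share of dimensions satisfied, in percent."""
    return 100.0 * sum(scores.values()) / len(scores)


def breakdown(results: Dict[str, Scores]) -> List[str]:
    """One row per dimension, one column per dataset, plus a totals row."""
    names = list(results)
    dims = sorted({dim for scores in results.values() for dim in scores})
    rows = ["dimension | " + " | ".join(names)]
    for dim in dims:
        cells = ["yes" if results[n].get(dim, False) else "no" for n in names]
        rows.append(f"{dim} | " + " | ".join(cells))
    totals = [f"{compliance_percent(results[n]):.0f}%" for n in names]
    rows.append("total | " + " | ".join(totals))
    return rows
```

Calling `breakdown({"dataset_a": {...}, "dataset_b": {...}})` with one boolean per dimension yields rows that can be printed directly or pasted into a results table.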

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation of VIDS and for the constructive major comment. We agree that explicit justification of the compliance dimensions is necessary to support the benchmarking claims and will revise the manuscript accordingly.

Point-by-point responses
  1. Referee: [Definition of the 22 compliance dimensions] The manuscript does not describe any systematic process (literature synthesis, expert Delphi, or explicit gap mapping from DICOM/BIDS) used to arrive at the precise set of 22 compliance dimensions and 21 validation rules. This is load-bearing for the central claim: the reported 20-39% compliance rates for LIDC-IDRI, BraTS, CheXpert, and the Medical Segmentation Decathlon, and the conclusion that provenance/quality gaps are systematic, depend directly on these author-chosen criteria being both essential and non-burdensome. Without such justification it remains possible that the dimensions are incomplete (e.g., omitting regulatory traceability or inter-rater metrics) or include low-value items, rendering the benchmarking percentages non-generalizable. (See the section defining the 22 compliance dimensions and the subsequent benchmarking results.)

    Authors: We acknowledge that the submitted manuscript does not include an explicit description of the derivation process for the 22 compliance dimensions and 21 rules. These were obtained by (1) cataloguing recurring dataset deficiencies reported across medical imaging AI literature (e.g., missing annotation provenance, undocumented quality controls), (2) identifying the specific gaps left by DICOM (study-level) and BIDS (research neuroimaging) when applied to curated ML datasets, and (3) retaining only those attributes that can be expressed as machine-enforceable rules. We will add a dedicated subsection (new Section 3.1) that presents this rationale, supplies supporting citations, and provides a gap-mapping table. We maintain that the current dimensions are both essential and practical, as shown by the reference dataset achieving full compliance and by the systematic shortfalls observed in four widely used public collections. We agree, however, that additional discussion of scope limitations (e.g., dataset-specific inter-rater statistics) will further strengthen the paper. revision: yes

Circularity Check

0 steps flagged

No significant circularity; VIDS is a definitional standard proposal with direct benchmarking

Full rationale

The paper defines 22 compliance dimensions and 21 validation rules as an open specification, then directly counts how many are met by four public datasets (reporting 20-39% compliance) and shows one constructed reference dataset meeting 21/21. No equations, fitted parameters, predictions, or self-citations appear in the derivation chain. The benchmarking is a straightforward application of the proposed criteria rather than any reduction of outputs to inputs by construction. The central claim—that existing datasets exhibit gaps relative to the new standard—is self-contained and does not rely on external uniqueness theorems or smuggled ansatzes. This matches the expected non-finding for a standards proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces a new specification rather than deriving results from equations; it relies on existing image formats as foundations.

axioms (1)
  • domain assumption: NIfTI is an appropriate canonical working format that can preserve full DICOM metadata via sidecars
    Stated as the basis for the standard in the abstract
invented entities (1)
  • VIDS compliance profiles and 21 validation rules (no independent evidence)
    purpose: To provide machine-enforceable checks for dataset structure, provenance, and quality
    Newly defined by the authors
