
arxiv: 2604.11944 · v1 · submitted 2026-04-13 · 💻 cs.LG · q-bio.QM

A unified data format for managing diabetes time-series data: DIAbetes eXchange (DIAX)

Pith reviewed 2026-05-10 14:55 UTC · model grok-4.3

classification 💻 cs.LG q-bio.QM
keywords diabetes · time-series data · data standardization · JSON format · CGM · machine learning · interoperability · insulin delivery

The pith

DIAX introduces a standardized JSON format to unify diabetes time-series data from CGM, insulin, and meals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes DIAX, a JSON-based standard for organizing time-series data generated by diabetes devices such as continuous glucose monitors and insulin delivery systems, along with meal logs. The inconsistent formats currently used by different devices and studies make it difficult to combine datasets or reproduce analyses, especially for machine learning tasks. DIAX addresses this by defining a common structure that covers the key signals while allowing extensions, backed by open-source tools for conversion and visualization. If adopted, it would make the supported datasets, which already total over 10 million patient-hours, more accessible and usable across research groups without forcing data to be hosted centrally.

Core claim

We present DIAX (DIAbetes eXchange), a standardized JSON-based format for unifying diabetes time-series data, including CGM, insulin, and meal signals. DIAX promotes interoperability, reproducibility, and extensibility, particularly for machine learning applications. An open-source repository provides tools for dataset conversion, cross-format compatibility, visualization, and community contributions. DIAX is a translational resource, not a data host, ensuring flexibility without imposing data-sharing constraints. Currently, DIAX is compatible with other standardization efforts and supports major datasets totaling over 10 million patient-hours of data.

What carries the argument

The DIAX JSON schema, which defines a common extensible structure for time-stamped signals including glucose levels, insulin doses, and meal events from diverse diabetes devices.
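To make "a common extensible structure for time-stamped signals" concrete, here is a minimal Python sketch of what such a record could look like. The field names (`subject_id`, `cgm`, `bolus`, `meal`, `time`, `value`) and the `heart_rate` extension are illustrative assumptions, not the published DIAX schema; the repository holds the actual definition.

```python
import json

# Hypothetical DIAX-like record. Every field name below is an illustrative
# assumption, not the published DIAX schema.
record = {
    "subject_id": "example-001",
    "cgm": [
        {"time": "2024-01-01T08:00:00+00:00", "value": 112.0},
        {"time": "2024-01-01T08:05:00+00:00", "value": 118.0},
    ],
    "bolus": [{"time": "2024-01-01T08:02:00+00:00", "value": 3.5}],
    "meal": [{"time": "2024-01-01T08:00:00+00:00", "value": 45.0}],
    # Extension signal: a reader that does not know this key can skip it.
    "heart_rate": [{"time": "2024-01-01T08:00:00+00:00", "value": 72.0}],
}

# Assumed set of core signals for this sketch.
CORE_SIGNALS = ("cgm", "bolus", "meal")


def core_signals(rec):
    """Return only the core signals, ignoring unknown extension keys."""
    return {name: rec.get(name, []) for name in CORE_SIGNALS}


# Round-trip through JSON text, as a file-based exchange would.
text = json.dumps(record)
restored = json.loads(text)
core = core_signals(restored)
```

The point of the sketch is the extension behaviour: a consumer that only understands the core signals still reads the record correctly when extra keys are present.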

If this is right

  • Data from different clinical trials and devices can be combined more easily for larger-scale analyses.
  • Machine learning models for glucose prediction or automated insulin delivery become more reproducible across independent studies.
  • New device types or signal categories can be added to the format without breaking compatibility with existing datasets.
  • Existing large datasets such as DCLP3, DCLP5, IOBP2, PEDAP, T1Dexi, and Loop become directly usable with shared analysis code.
  • The format remains compatible with other ongoing standardization initiatives in the field.
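The "shared analysis code" consequence can be sketched as one function applied unchanged to records from different sources. This is an illustration under stated assumptions: the record layout and the trial names are hypothetical, glucose is assumed to be in mg/dL, and samples are assumed to be evenly spaced so that the fraction of samples approximates the fraction of time. The 70 to 180 mg/dL window is the conventional time-in-range target.

```python
def time_in_range(cgm_samples, low=70.0, high=180.0):
    """Percent of CGM samples within [low, high] mg/dL, or None if empty."""
    values = [sample["value"] for sample in cgm_samples]
    if not values:
        return None
    in_range = sum(1 for v in values if low <= v <= high)
    return 100.0 * in_range / len(values)


# Two hypothetical records, standing in for datasets converted to one layout.
records = {
    "trial_a": {
        "cgm": [
            {"time": "2024-01-01T08:00:00+00:00", "value": 65.0},
            {"time": "2024-01-01T08:05:00+00:00", "value": 110.0},
            {"time": "2024-01-01T08:10:00+00:00", "value": 150.0},
            {"time": "2024-01-01T08:15:00+00:00", "value": 200.0},
        ]
    },
    "trial_b": {
        "cgm": [
            {"time": "2024-02-01T09:00:00+00:00", "value": 90.0},
            {"time": "2024-02-01T09:05:00+00:00", "value": 120.0},
        ]
    },
}

# The same function runs on both sources with no per-dataset preprocessing.
tir = {name: time_in_range(rec["cgm"]) for name, rec in records.items()}
print(tir)  # {'trial_a': 50.0, 'trial_b': 100.0}
```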

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Widespread use of DIAX could speed up training of more generalizable AI models by letting researchers pool data across trials without heavy preprocessing.
  • Device manufacturers may gain incentive to support DIAX export in their software to make their data more attractive for research collaborations.
  • The same extensible JSON approach could be tested on time-series data from other chronic conditions that rely on wearable sensors.

Load-bearing premise

That the diabetes research community and device manufacturers will adopt the DIAX format and associated tools instead of continuing with their current varied or proprietary data structures.

What would settle it

Check whether recent publications or new studies using diabetes device data have converted their datasets to DIAX format using the provided tools, or whether major device makers have added native DIAX export support.

Figures

Figures reproduced from arXiv: 2604.11944 by Anas El Fathi, Elliott C. Pryor, Marc D. Breton.

Figure 1: Example conversion workflow. Raw data is acquired from the original source. Then the matching …
Figure 2: Example visualization of DCLP3 data. Left shows comparison of AGP for two weeks generated with …
Original abstract

Diabetes devices, including Continuous Glucose Monitoring (CGM), Smart Insulin Pens, and Automated Insulin Delivery systems, generate rich time-series data widely used in research and machine learning. However, inconsistent data formats across sources hinder sharing, integration, and analysis. We present DIAX (DIAbetes eXchange), a standardized JSON-based format for unifying diabetes time-series data, including CGM, insulin, and meal signals. DIAX promotes interoperability, reproducibility, and extensibility, particularly for machine learning applications. An open-source repository provides tools for dataset conversion, cross-format compatibility, visualization, and community contributions. DIAX is a translational resource, not a data host, ensuring flexibility without imposing data-sharing constraints. Currently, DIAX is compatible with other standardization efforts and supports major datasets (DCLP3, DCLP5, IOBP2, PEDAP, T1Dexi, Loop), totaling over 10 million patient-hours of data. https://github.com/Center-for-Diabetes-Technology/DIAX

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The paper proposes DIAX (DIAbetes eXchange), a standardized JSON-based format for unifying heterogeneous diabetes time-series data sources including CGM, insulin, and meal logs. It supplies an open-source repository with conversion tools, visualization utilities, and cross-format compatibility, claiming support for major public datasets (DCLP3, DCLP5, IOBP2, PEDAP, T1Dexi, Loop) totaling over 10 million patient-hours. The format is presented as an extensible container to promote interoperability, reproducibility, and ML use without serving as a data host.

Significance. If the format and tools achieve adoption, the work could meaningfully reduce format inconsistencies that currently impede data sharing and integration in diabetes research and machine learning. The explicit compatibility claims with large public datasets and the provision of conversion/visualization code constitute practical strengths that support reproducibility. The decision to position DIAX as a translational resource rather than a data repository avoids overreach and aligns with the stated goals.

major comments (1)
  1. [Abstract and §3] Abstract and §3 (DIAX format description): The JSON schema is characterized at a high level (e.g., as a container for CGM/insulin/meal signals) without including the actual schema definition, required/optional fields, data types, timestamp conventions, or validation constraints. This omission is load-bearing for the central interoperability claim, as readers cannot evaluate extensibility or compatibility without immediately consulting the external repository.
minor comments (2)
  1. [§4] §4 (compatibility and tools): The conversion process for each listed dataset is asserted but not illustrated with even a single before/after example or pseudocode; adding one compact example per major dataset type would improve clarity without lengthening the manuscript substantially.
  2. The manuscript references an open-source repository but does not summarize its directory structure, main entry points, or maintenance plan; a short paragraph or table listing key files (schema.json, converters/, viz/) would aid reproducibility.
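As an illustration of the kind of compact before/after example the first minor comment asks for, the sketch below converts a small CSV export into per-subject records. It is not taken from the DIAX converters: the column names (`PtID`, `DeviceDtTm`, `Glucose`), the output field names, and the assumption that source timestamps are naive UTC are all hypothetical.

```python
import csv
import io
from datetime import datetime, timezone

# "Before": a made-up raw export. Column names are hypothetical.
RAW_CSV = """PtID,DeviceDtTm,Glucose
1,2024-01-01 08:00:00,112
1,2024-01-01 08:05:00,118
2,2024-01-01 08:00:00,95
"""


def convert(raw_text):
    """Group CSV rows by subject into hypothetical DIAX-like records."""
    subjects = {}
    for row in csv.DictReader(io.StringIO(raw_text)):
        # Assumption: source timestamps are naive and expressed in UTC.
        ts = datetime.strptime(row["DeviceDtTm"], "%Y-%m-%d %H:%M:%S")
        ts = ts.replace(tzinfo=timezone.utc)
        rec = subjects.setdefault(
            row["PtID"], {"subject_id": row["PtID"], "cgm": []}
        )
        rec["cgm"].append({"time": ts.isoformat(), "value": float(row["Glucose"])})
    return subjects


# "After": one record per subject, with ISO 8601 timestamps.
converted = convert(RAW_CSV)
print(converted["1"]["cgm"][0])
# {'time': '2024-01-01T08:00:00+00:00', 'value': 112.0}
```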

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their positive evaluation of the work and for the constructive recommendation of minor revision. We address the single major comment below and will update the manuscript accordingly.

Point-by-point responses
  1. Referee: [Abstract and §3] Abstract and §3 (DIAX format description): The JSON schema is characterized at a high level (e.g., as a container for CGM/insulin/meal signals) without including the actual schema definition, required/optional fields, data types, timestamp conventions, or validation constraints. This omission is load-bearing for the central interoperability claim, as readers cannot evaluate extensibility or compatibility without immediately consulting the external repository.

    Authors: We agree that the current high-level description in the abstract and §3 limits readers' ability to fully assess the format without consulting the repository. In the revised manuscript we will expand §3 to include the core JSON schema definition, explicitly listing required and optional fields for CGM, insulin, and meal entries, data types, timestamp conventions (ISO 8601 with timezone handling), and validation constraints. We will retain the repository link for the complete extensible schema, conversion tools, and examples. This change directly strengthens the interoperability claim while preserving the paper's focus as a translational resource rather than a data host. revision: yes
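The constraints the authors promise to document (required fields, data types, ISO 8601 timestamps with timezone handling) can be sketched as a small validator. This is a guess at the shape of such checks, not the repository's validation code; the required field names `time` and `value` are assumptions.

```python
from datetime import datetime

# Hypothetical required fields for a single signal entry.
REQUIRED_FIELDS = ("time", "value")


def validate_entry(entry):
    """Return a list of human-readable problems; an empty list means valid."""
    errors = []
    for field in REQUIRED_FIELDS:
        if field not in entry:
            errors.append(f"missing required field: {field}")
    if "time" in entry:
        try:
            ts = datetime.fromisoformat(entry["time"])
        except (TypeError, ValueError):
            errors.append("time is not ISO 8601")
        else:
            # A naive timestamp parses fine but carries no UTC offset.
            if ts.utcoffset() is None:
                errors.append("time has no timezone offset")
    if "value" in entry:
        value = entry["value"]
        # bool is a subclass of int in Python, so exclude it explicitly.
        if isinstance(value, bool) or not isinstance(value, (int, float)):
            errors.append("value is not a number")
    return errors


ok = validate_entry({"time": "2024-01-01T08:00:00+00:00", "value": 112.0})
naive = validate_entry({"time": "2024-01-01T08:00:00", "value": 112.0})
bad = validate_entry({"value": "x"})
```

The examples use explicit `+00:00` offsets because `datetime.fromisoformat` only accepts a trailing `Z` from Python 3.11 onward.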

Circularity Check

0 steps flagged

No significant circularity identified


The paper proposes DIAX, a JSON-based data format and associated open-source conversion/visualization tools for unifying heterogeneous diabetes time-series data (CGM, insulin, meals). No mathematical derivations, equations, predictions, or fitted parameters appear anywhere in the manuscript. Claims of interoperability rest on explicit compatibility with listed public datasets (DCLP3, DCLP5, etc.) and the released repository, not on any self-referential construction, self-citation load-bearing step, or renaming of prior results. The contribution is a self-contained practical standard whose correctness is externally verifiable by adoption and use of the schema, with no internal reduction of outputs to inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The paper introduces a new data format without relying on fitted parameters or complex mathematical axioms; it assumes standard JSON parsing and time-series handling are sufficient.

axioms (1)
  • domain assumption: JSON is an appropriate and extensible container for heterogeneous diabetes time-series signals
    The choice of JSON as the base format is presented without comparison to alternatives such as HDF5 or specialized medical formats.
invented entities (1)
  • DIAX JSON schema (no independent evidence)
    purpose: To provide a unified structure for CGM, insulin, meal, and related diabetes signals
    The specific schema and field definitions are defined in this work.

