LLM-Powered Personalized Glycemic Assessment in Type 2 Diabetes with Wearable Sensor Data

Yanmin Gong; Yifan Gao; Yuanxiong Guo; Yun Shi

arxiv: 2606.12699 · v1 · pith:JZMERVE7new · submitted 2026-06-10 · 💻 cs.LG · cs.AI

Yifan Gao, Yanmin Gong, Yun Shi, Yuanxiong Guo

Pith reviewed 2026-06-27 09:54 UTC · model grok-4.3

keywords: LLM · Type 2 Diabetes · Continuous Glucose Monitor · Glycemic Assessment · Wearable Sensors · Glucose Forecasting · Personalized Medicine · AI-READI

The pith

GlyLLM combines continuous glucose monitor readings with personal metadata inside a pre-trained LLM to improve forecasting and classification for type 2 diabetes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces GlyLLM, a framework that feeds wearable sensor streams and structured health metadata into a large language model so the model can perform sensor-text semantic abstraction at decision time. Traditional machine-learning approaches rely mainly on past glucose values and ignore individual context, which limits accuracy across patients. GlyLLM uses the LLM's pre-trained knowledge plus the supplied metadata to address this gap. On the AI-READI dataset, the method lowers root-mean-squared error (RMSE) in glucose forecasting by an average of 13.66 percent and raises AUROC in diabetes categorization by an average of 13.08 percent relative to traditional machine-learning baselines. An ablation study identifies diabetes surveys and biometric tests as the most influential metadata components.

Core claim

GlyLLM achieves sensor-text semantic abstraction at decision time by integrating continuous glucose monitor data with structured metadata inside a pre-trained large language model, yielding lower forecasting error and higher categorization accuracy than traditional machine-learning baselines on the AI-READI dataset.

What carries the argument

GlyLLM, an LLM-powered framework that performs sensor-text semantic abstraction using pre-trained knowledge plus provided metadata at decision time.

If this is right

Glucose forecasting RMSE drops by an average of 13.66 percent relative to traditional machine-learning methods.
Diabetes categorization AUROC rises by an average of 13.08 percent relative to traditional machine-learning methods.
Diabetes surveys and biometric test results contribute more to performance than other categories of health metadata.
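
The percentages above are relative changes against a baseline. The summary does not give the exact formula the authors used, so the following is a minimal sketch that assumes the conventional definitions of RMSE, AUROC, and relative improvement. The function names are illustrative and any numbers passed to them would be hypothetical, not taken from the paper.

```python
import math


def rmse(y_true, y_pred):
    """Root mean squared error between two equal-length sequences."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    squared = [(t - p) ** 2 for t, p in zip(y_true, y_pred)]
    return math.sqrt(sum(squared) / len(squared))


def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for label, s in zip(labels, scores) if label == 1]
    neg = [s for label, s in zip(labels, scores) if label == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def relative_improvement(baseline, model, lower_is_better):
    """Percent change in the model's favour, relative to the baseline value.

    Assumption: improvement is normalised by the baseline metric.
    """
    delta = (baseline - model) if lower_is_better else (model - baseline)
    return 100.0 * delta / baseline
```

Under this convention, an RMSE that falls from 10 to 8 is a 20 percent improvement, and an AUROC that rises from 0.75 to 1.0 is a 33.3 percent improvement. Note that a relative AUROC gain is not the same as a gain in absolute AUROC points, which is one reason the exact definition matters.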

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same metadata-plus-LLM pattern could be tested on other wearable-driven chronic-disease tasks such as hypertension or sleep-apnea monitoring.
Real-time mobile applications could deliver daily glycemic guidance without collecting large amounts of patient-specific training data.
If distribution shift proves problematic, lightweight metadata-only adapters might be added without full model retraining.

Load-bearing premise

The pre-trained LLM can reliably translate sensor readings into useful abstractions from metadata alone without task-specific fine-tuning or performance loss from patient distribution shift.

What would settle it

Performance on a held-out patient cohort drawn from a different demographic or sensor distribution falls back to or below the level of traditional ML methods.

Figures

Figures reproduced from arXiv: 2606.12699 by Yanmin Gong, Yifan Gao, Yuanxiong Guo, Yun Shi.

**Figure 1.** GlyLLM Model Architecture. Text embeddings from static metadata and sensor data embeddings, along with text embeddings from task instruction prompts, are structured as sequential inputs for the backbone LLM. The accompanying text lists the components: a text embedder that encodes X_p and X_q into text token embeddings, a sensor encoder with an adapter to map X_s into sensor data embeddings, a backbone LLM to fuse and analyze all the provided information…
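
The Figure 1 caption describes text embeddings and sensor embeddings being placed in one sequence for the backbone LLM. The summary does not say how the sensor encoder or adapter is built, so the sketch below is only an illustration under stated assumptions: the CGM series is cut into fixed-length patches, a single linear layer stands in for the sensor encoder plus adapter, and the order metadata, sensor, prompt is chosen arbitrarily. All names and dimensions are hypothetical.

```python
import random


def patchify(series, patch_len, stride):
    """Split a 1-D series into fixed-length patches taken every `stride` steps."""
    last_start = len(series) - patch_len
    return [series[i:i + patch_len] for i in range(0, last_start + 1, stride)]


def make_adapter(in_dim, out_dim, seed=0):
    """Random linear layer (weight rows, bias); a stand-in for a trained adapter."""
    rng = random.Random(seed)
    weight = [[rng.gauss(0.0, 0.02) for _ in range(in_dim)] for _ in range(out_dim)]
    bias = [0.0] * out_dim
    return weight, bias


def linear(vec, weight, bias):
    """Compute W x + b for one input vector."""
    return [sum(w * x for w, x in zip(row, vec)) + b for row, b in zip(weight, bias)]


def build_input_sequence(metadata_emb, cgm, prompt_emb, adapter, patch_len=4, stride=4):
    """Concatenate text and sensor embeddings into one token sequence.

    Assumption: every embedding has the LLM's hidden size, and the ordering
    metadata -> sensor -> prompt is illustrative, not taken from the paper.
    """
    weight, bias = adapter
    sensor_emb = [linear(p, weight, bias) for p in patchify(cgm, patch_len, stride)]
    return metadata_emb + sensor_emb + prompt_emb
```

With twelve CGM readings and non-overlapping patches of four, the sensor stream contributes three embedding vectors to the sequence, alongside however many text tokens the metadata and prompt produce.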

**Figure 2.** Examples of text prompts and sensor data used in two tasks.

**Figure 3.** Prompt templates used for glucose forecasting.

Abstract

Type 2 Diabetes (T2D) poses an increasing global health threat, demanding effective glycemic assessment to support personalized and improved diabetes care. Wearable sensors such as continuous glucose monitors (CGM) and fitness trackers offer many valuable insights for glycemic assessment. However, effectively analyzing these data requires integration with essential individual-level context. Existing methods are often based on traditional machine learning (ML) and rely primarily on historical blood glucose measurements and overlook personalized information, which limits their performance across diverse diabetes populations. Recent advances in large language models (LLMs) have demonstrated their ability to integrate diverse data modalities while modeling sequential dependencies, motivating the exploration of their potential for personalized glycemic assessment. In this paper, we propose GlyLLM, an LLM-powered framework for modeling CGM-based glycemic dynamics through the integration of wearable sensor data and structured metadata. GlyLLM can leverage the extensive prior knowledge of pre-trained LLMs and achieve sensor-text semantic abstraction at decision time. Experiments on two related tasks on the AI-READI dataset demonstrate that our model outperforms traditional ML methods by an average of 13.66% in Root Mean Squared Error (RMSE) for glucose forecasting and 13.08% in Area Under the Receiver Operating Characteristic (AUROC) for diabetes categorization. Additionally, our ablation study shows that diabetes surveys and biometric tests are more critical than other health information for glycemic assessment. Our work presents a promising step toward harnessing the power of LLMs to advance personalized glycemic assessment in T2D care.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

GlyLLM gets modest gains over plain ML on glucose tasks by adding metadata to an LLM, but the writeup skips the details needed to trust how the sensor data was handled or split.

The paper's main result is that GlyLLM beats standard machine learning by roughly 13.7% RMSE on glucose forecasting and 13.1% AUROC on diabetes categorization when it folds in wearable readings plus patient metadata from the AI-READI cohort. An ablation flags diabetes surveys and biometric tests as the most useful extra inputs.

The approach itself is a direct application of pre-trained LLMs to combine time-series sensor data with structured text. That move is reasonable for a setting where personal context should matter, and the ablation gives a usable signal about which metadata types actually help.

The weak point is the missing experimental controls. Nothing in the description explains how CGM traces are tokenized for the LLM, whether any fine-tuning or in-context examples were used, or whether the train-test split kept patients completely separate. Without those pieces it is difficult to rule out leakage or distribution shift as the source of the reported margins. Significance testing is also absent.
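
The leakage concern comes down to whether windows from the same patient can land on both sides of the split. The following is a minimal sketch of a patient-level split, assuming each record is a dict carrying a hypothetical `patient_id` key. It illustrates the control the letter asks about and is not a description of what the authors did.

```python
import random


def patient_level_split(records, test_fraction=0.2, seed=0):
    """Split records so that no patient appears in both train and test.

    `records` is a list of dicts with a 'patient_id' key (hypothetical schema).
    Whole patients, not individual windows, are assigned to the test set.
    """
    patient_ids = sorted({r["patient_id"] for r in records})
    rng = random.Random(seed)
    rng.shuffle(patient_ids)
    n_test = max(1, round(len(patient_ids) * test_fraction))
    test_ids = set(patient_ids[:n_test])
    train = [r for r in records if r["patient_id"] not in test_ids]
    test = [r for r in records if r["patient_id"] in test_ids]
    return train, test
```

A window-level random split would instead shuffle `records` directly, which lets near-identical overlapping CGM windows from one person appear in both sets and can inflate test performance.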

This work is aimed at people already working on wearable health models who want to test LLM prompting in a diabetes context. A reader looking for concrete ideas on metadata weighting could extract value from the ablation, but the performance claims are not yet reproducible from what is shown.

I would send it to peer review if the authors supply the tokenization, training, and split details, because the topic is practical and the ablation is worth checking. Right now the evidence is too thin for a strong endorsement.

Referee Report

3 major / 2 minor

Summary. The paper proposes GlyLLM, an LLM-powered framework that integrates CGM wearable sensor data with structured metadata (including diabetes surveys and biometric tests) to model glycemic dynamics for Type 2 diabetes. It reports that GlyLLM outperforms traditional ML baselines by an average of 13.66% RMSE on glucose forecasting and 13.08% AUROC on diabetes categorization tasks using the AI-READI dataset, with an ablation study identifying surveys and biometric tests as the most critical metadata components.

Significance. If the performance margins are reproducible and attributable to the LLM component rather than data-processing choices, the work would represent a meaningful exploration of pre-trained LLMs for multimodal sensor-text integration in personalized glycemic assessment, addressing a gap in traditional ML approaches that overlook individual context.

major comments (3)

[Abstract and §3 (Methods)] No description is given of CGM time-series tokenization, input formatting for the LLM, use of in-context examples, fine-tuning procedure, or whether the LLM operates zero-shot at inference; without these details the central claim that 'sensor-text semantic abstraction at decision time' drives the reported gains cannot be evaluated.
[§4 (Experiments)] The 13.66% RMSE and 13.08% AUROC margins are stated without patient-level train/test splits, baseline implementation details, statistical significance tests, or variance across runs; this prevents verification that improvements are not due to distribution shift or leakage within the AI-READI cohort.
[§4.2 (Ablation)] The finding that 'diabetes surveys and biometric tests are more critical' is presented without quantitative ablation tables or controls for feature correlation, so it is impossible to assess whether the result is load-bearing for the personalization claim.
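
For the ablation comment, a quantitative table would typically report how the metric changes when each metadata group is removed. The sketch below shows one common leave-one-group-out layout. The `evaluate` callback, the group names, and every number in the toy scorer are hypothetical placeholders, not results from the paper.

```python
def leave_one_out_ablation(groups, evaluate):
    """Return {group: score_without_group - full_score} for a lower-is-better metric.

    `evaluate(active_groups)` is a user-supplied callback that trains and scores
    a model using only the given metadata groups (hypothetical interface).
    """
    groups = list(groups)
    full_score = evaluate(frozenset(groups))
    return {g: evaluate(frozenset(groups) - {g}) - full_score for g in groups}


# Toy scorer with made-up, purely additive effects (illustration only).
HYPOTHETICAL_GAIN = {"diabetes_survey": 3.0, "biometric_tests": 2.5, "demographics": 0.5}


def toy_evaluate(active_groups):
    """Pretend RMSE: a fixed error minus the made-up gain of each active group."""
    return 20.0 - sum(HYPOTHETICAL_GAIN[g] for g in active_groups)


deltas = leave_one_out_ablation(HYPOTHETICAL_GAIN, toy_evaluate)
```

Because the toy scorer is additive, each delta simply recovers that group's made-up gain. Real metadata groups can be correlated, in which case removing one group may be partly compensated by another; that is the confound the referee asks the authors to control for.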

minor comments (2)

[Abstract] The phrase 'outperforms traditional ML methods by an average of 13.66%' should specify the exact set of baselines and whether the average is macro or weighted.
[§2 (Notation)] 'Sensor-text semantic abstraction' is used without a formal definition or pseudocode showing how metadata is concatenated with CGM sequences.
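
The macro-versus-weighted question in the first minor comment can change the headline number. A small sketch, using hypothetical per-setting improvements and sample counts, shows how far the two averages can diverge:

```python
def macro_average(values):
    """Unweighted mean: every evaluation setting counts equally."""
    return sum(values) / len(values)


def weighted_average(values, weights):
    """Mean weighted by, for example, the number of test samples per setting."""
    total = sum(weights)
    if total == 0:
        raise ValueError("weights must not sum to zero")
    return sum(v * w for v, w in zip(values, weights)) / total


# Hypothetical relative improvements (percent) and test-set sizes per setting.
improvements = [20.0, 10.0, 5.0]
sample_counts = [100, 300, 600]

macro = macro_average(improvements)
weighted = weighted_average(improvements, sample_counts)
```

Here the macro average is about 11.7 percent while the sample-weighted average is 8.0 percent, because the largest setting shows the smallest gain.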

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important gaps in methodological and experimental transparency. We will revise the manuscript to address each point and improve reproducibility and clarity.

Referee: [Abstract and §3 (Methods)] No description is given of CGM time-series tokenization, input formatting for the LLM, use of in-context examples, fine-tuning procedure, or whether the LLM operates zero-shot at inference; without these details the central claim that 'sensor-text semantic abstraction at decision time' drives the reported gains cannot be evaluated.

Authors: We agree these details are required to evaluate the LLM's contribution. In the revised §3 we will add a full description of the CGM tokenization process (including how time-series values are discretized and embedded), the precise prompt/input formatting for the LLM, whether in-context examples were used, the fine-tuning procedure (or confirmation of zero-shot operation), and the inference setting. This will directly support the sensor-text abstraction claim.
Revision: yes

Referee: [§4 (Experiments)] The 13.66% RMSE and 13.08% AUROC margins are stated without patient-level train/test splits, baseline implementation details, statistical significance tests, or variance across runs; this prevents verification that improvements are not due to distribution shift or leakage within the AI-READI cohort.

Authors: We acknowledge that these experimental controls are necessary to rule out leakage and confirm robustness. We will update §4 to explicitly state patient-level train/test splits, provide complete baseline implementation details (hyperparameters, libraries, preprocessing), report statistical significance (e.g., paired tests with p-values), and include variance or standard deviation across multiple random seeds/runs.
Revision: yes

Referee: [§4.2 (Ablation)] The finding that 'diabetes surveys and biometric tests are more critical' is presented without quantitative ablation tables or controls for feature correlation, so it is impossible to assess whether the result is load-bearing for the personalization claim.

Authors: We agree that quantitative tables and correlation controls are needed. We will expand §4.2 with full ablation tables showing performance changes when each metadata type is removed, plus an analysis of feature correlations (e.g., correlation matrix or controlled ablations) to demonstrate that the identified components remain critical after accounting for inter-feature dependencies.
Revision: yes
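
The rebuttal promises significance testing but does not commit to a specific test. One option is a paired bootstrap confidence interval on the RMSE difference between two models scored on the same test points. This sketch swaps in that technique as an illustration; it is not the authors' stated procedure, and the function name and defaults are assumptions.

```python
import math
import random


def paired_bootstrap_rmse_diff(y_true, pred_a, pred_b, n_boot=2000, seed=0):
    """Return (observed, ci_low, ci_high) for RMSE(a) - RMSE(b).

    The same resampled indices are used for both models (paired design).
    The interval is a 95% percentile bootstrap interval.
    """
    n = len(y_true)
    if n == 0 or len(pred_a) != n or len(pred_b) != n:
        raise ValueError("inputs must be non-empty and of equal length")
    rng = random.Random(seed)

    def rmse_on(indices, pred):
        return math.sqrt(sum((y_true[i] - pred[i]) ** 2 for i in indices) / len(indices))

    all_idx = range(n)
    observed = rmse_on(all_idx, pred_a) - rmse_on(all_idx, pred_b)
    diffs = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(rmse_on(idx, pred_a) - rmse_on(idx, pred_b))
    diffs.sort()
    ci_low = diffs[int(0.025 * n_boot)]
    ci_high = diffs[min(n_boot - 1, int(0.975 * n_boot))]
    return observed, ci_low, ci_high
```

If the interval excludes zero, the difference is unlikely to be a resampling artifact. For CGM data, where many windows come from the same person, resampling whole patients rather than individual windows would be the more conservative choice.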

Circularity Check

0 steps flagged

No circularity; empirical ML framework with standard evaluation

The paper proposes GlyLLM as an LLM integration framework for CGM and metadata, then reports empirical outperformance (13.66% RMSE, 13.08% AUROC) on the AI-READI dataset. No mathematical derivation chain, equations, or first-principles results exist that reduce to inputs by construction. No self-definitional steps, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The central claims rest on experimental results rather than any closed logical loop, making the work self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unstated premise that pre-trained LLMs already encode useful priors for glucose dynamics and that metadata can be fused at inference time without additional training details or validation against external cohorts.

axioms (1)

Domain assumption: Pre-trained LLMs contain transferable knowledge sufficient for sensor-text abstraction in glycemic tasks.
Invoked in the motivation paragraph when the authors state that LLMs can integrate diverse data modalities while modeling sequential dependencies.



2021

[19] [19]

A time series is worth 64 words: Long-term forecasting with transformers

Y . Nie, N. H. Nguyen, P. Sinthong, and J. Kalagnanam, “A time series is worth 64 words: Long-term forecasting with transformers”, in International Conference on Learning Representations, 2023

2023

[20] [20]

SensorLM: Learning the language of wearable sensors

Y . Zhang, K. Ayush, S. Qiao, A. A. Heydari, G. Narayanswamy, M. A. Xu, A. Metwally, J. Xu, J. Garrison, X. Xu, T. Althoff, Y . Liu, P. Kohli, J. Zhan, M. Malhotra, S. Patel, C. Mascolo, X. Liu, D. McDuff, and Y . Yang, “SensorLM: Learning the language of wearable sensors”, in 39th Conference on Neural Information Processing Systems, 2025

2025

[21] [21]

LoRA: Low-rank adaptation of large language models

E. J. Hu, yelong shen, P. Wallis, Z. Allen-Zhu, Y . Li, S. Wang, L. Wang, and W. Chen, “LoRA: Low-rank adaptation of large language models”, inInternational Conference on Learning Representations, 2022

2022

[22] [22]

Navigating text-to-image customization: From LyCORIS fine-tuning to model evaluation

S.-Y . Yeh, Y .-G. Hsieh, Z. Gao, B. B. Yang, G. Oh, and Y . Gong, “Navigating text-to-image customization: From LyCORIS fine-tuning to model evaluation”, inICLR, 2023

2023

[23] [23]

Federated adaptive fine-tuning of large language models with heterogeneous quantization and LoRA

Z. Gao, Z. Zhang, Y . Guo, and Y . Gong, “Federated adaptive fine-tuning of large language models with heterogeneous quantization and LoRA”, inIEEE INFOCOM, 2025

2025

[24] [24]

FedKRSO: Commu- nication and memory efficient federated fine-tuning of large language models

G. Yang, T. Wu, Y . Guo, Y . Sun, and Y . Gong, “FedKRSO: Commu- nication and memory efficient federated fine-tuning of large language models”, inIEEE INFOCOM, 2026

2026

[25] [25]

Management of diabetes and hyperglycaemia in the hospital

F. J. Pasquel, M. C. Lansang, K. Dhatariya, and G. E. Umpierrez, “Management of diabetes and hyperglycaemia in the hospital”,The Lancet Diabetes and Endocrinology, vol. 9, no. 3, pp. 174–188, 2021

2021

[26] [26]

Ana- lyzing the impact of personalization on fairness in federated learning for healthcare

T. Wang, K. Zhang, J. Cai, Y . Gong, K.-K. R. Choo, and Y . Guo, “Ana- lyzing the impact of personalization on fairness in federated learning for healthcare”,Journal of Healthcare Informatics Research, vol. 8, no. 2, pp. 181–205, 2024

2024

[27] [27]

AI-READI: Rethinking data collection, prepa- ration and sharing for propelling AI-based discoveries in diabetes research and beyond

AI-READI Consortium, “AI-READI: Rethinking data collection, prepa- ration and sharing for propelling AI-based discoveries in diabetes research and beyond”,Nature Metabolism, vol. 6, no. 12, pp. 2210– 2212, 2024

2024

[28] [28]

Are time series foundation models ready for vital sign forecasting in healthcare?

X. Gu, Y . Liu, Z. Mohsin, J. Bedford, A. Thakur, P. Watkinson, L. Clifton, T. Zhu, and D. Clifton, “Are time series foundation models ready for vital sign forecasting in healthcare?”, inProceedings of the 4th Machine Learning for Health Symposium, pp. 401–419, 2025

2025

[29] [29]

A foundation model for continuous glucose monitoring data

G. Lutsker, G. Sapir, S. Shilo, J. Merino, A. Godneva, J. R. Greenfield, D. Samocha-Bonet, R. Dhir, F. Gude, S. Mannor, E. Meirom, E. P. Xing, G. Chechik, H. Rossman, and E. Segal, “A foundation model for continuous glucose monitoring data”,Nature, vol. 650, no. 8103, pp. 978–986, 2026

2026

[30] [30]

A pretrained transformer model for decoding individual glucose dynamics from continuous glucose monitoring data

Y . Lu, D. Liu, Z. Liang, R. Liu, P. Chen, Y . Liu, J. Li, Z. Feng, L. M. Li, B. Sheng, W. Jia, L. Chen, H. Li, and Y . Wang, “A pretrained transformer model for decoding individual glucose dynamics from continuous glucose monitoring data”,National Science Review, vol. 12, no. 5, 2025

2025

[31] [31]

Integration of artificial intelligence and wearable technology in the management of diabetes and prediabetes

R. A. Fraser, R. J. Walker, J. A. Campbell, O. Ekwunife, and L. E. Egede, “Integration of artificial intelligence and wearable technology in the management of diabetes and prediabetes”,npj Digital Medicine, vol. 8, no. 1, 2025

2025

[32] [32]

Med42-v2: A suite of clinical LLMs

C. Christophe, P. K. Kanithi, T. Raha, S. Khan, and M. A. Pi- mentel, “Med42-v2: A suite of clinical LLMs”,arXiv preprint arXiv:2408.06142, 2024

work page arXiv 2024

[33] [33]

Gemma 2: Improving Open Language Models at a Practical Size

G. Team, M. Riviere, S. Pathak, P. G. Sessa, C. Hardin, S. Bhupatiraju, L. Hussenot, T. Mesnard, B. Shahriari, A. Ram ´e et al., “Gemma 2: Improving open language models at a practical size”,arXiv preprint arXiv:2408.00118, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[34] [34]

Mistral 7B

A. Q. Jiang, A. Sablayrolles, A. Mensch, C. Bamford, D. S. Chaplot, D. de las Casas, F. Bressand et al., “Mistral 7B”,arXiv preprint arXiv:2310.06825, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023

[35] [35]

Crossformer: Transformer utilizing cross- dimension dependency for multivariate time series forecasting

Y . Zhang, and J. Yan, “Crossformer: Transformer utilizing cross- dimension dependency for multivariate time series forecasting”, in International Conference on Learning Representations, 2023

2023

[36] [36]

iTrans- former: Inverted transformers are effective for time series forecasting

Y . Liu, T. Hu, H. Zhang, H. Wu, S. Wang, L. Ma, and M. Long, “iTrans- former: Inverted transformers are effective for time series forecasting”, inInternational Conference on Learning Representations, 2024

2024

[37] [37]

Multilayer feedforward networks are universal approximators

K. Hornik, M. Stinchcombe, and H. White, “Multilayer feedforward networks are universal approximators”,Neural Networks, vol. 2, no. 5, pp. 359–366, 1989

1989

[38] [38]

Long short-term memory

S. Hochreiter, and J. Schmidhuber, “Long short-term memory”,Neural Computation, vol. 9, no. 8, pp. 1735–1780, 1997

1997

[39] [39]

Large language models are zero-shot reasoners

T. Kojima, S. S. Gu, M. Reid, Y . Matsuo, and Y . Iwasawa, “Large language models are zero-shot reasoners”, in36th Conference on Neural Information Processing Systems, 2022

2022

[40] [40]

Language models are few-shot learners

T. Brown, B. Mann, N. Ryder, M. Subbiah, J. D. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell et al., “Language models are few-shot learners”, in34th Conference on Neural Information Processing Systems, 2020

2020

[41] [41]

Are language models actually useful for time series forecasting?

M. Tan, M. A. Merrill, V . Gupta, T. Althoff, and T. Hartvigsen, “Are language models actually useful for time series forecasting?”, in38th Conference on Neural Information Processing Systems, 2024

2024