Hybrid Attention Model Using Feature Decomposition and Knowledge Distillation for Glucose Forecasting
Pith reviewed 2026-05-23 17:25 UTC · model grok-4.3
The pith
GlucoNet forecasts blood glucose from irregular multimodal data by decomposing features and distilling a transformer, cutting RMSE by 60 percent.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GlucoNet is a feature decomposition-based transformer that incorporates patients' behavioral and physiological data. It uses a mathematical model to transform sparse, irregular patient data into continuous features, extracts low- and high-frequency components from blood glucose level (BGL) signals to address their non-linear, non-stationary character, and applies knowledge distillation to compress the model. On recordings from twelve participants with type 1 diabetes (T1D), it reports a 60 percent RMSE improvement, a 21 percent reduction in parameters, and gains of 51 percent in RMSE and 57 percent in MAE.
What carries the argument
Feature decomposition-based transformer that converts irregular inputs to continuous features via a mathematical model, splits BGL signals into frequency components, and is compressed by knowledge distillation.
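The summary does not say which algorithm GlucoNet uses to separate low- and high-frequency components. As an illustration only, the sketch below substitutes a plain FFT low-pass split: every component at or below a cutoff frequency goes into the low-frequency part, and the residual becomes the high-frequency part. The function name `split_low_high`, the 5-minute sampling interval, and the 3-hour cutoff are hypothetical choices, not values from the paper.

```python
import numpy as np

def split_low_high(signal, sample_minutes=5.0, cutoff_hours=3.0):
    """Split a uniformly sampled glucose trace into low- and high-frequency parts.

    Stand-in for GlucoNet's decomposition (assumption): an ideal FFT low-pass
    filter keeps components whose period is at least `cutoff_hours`.
    """
    x = np.asarray(signal, dtype=float)
    spectrum = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(x.size, d=sample_minutes)  # cycles per minute
    cutoff = 1.0 / (cutoff_hours * 60.0)
    low = np.fft.irfft(np.where(freqs <= cutoff, spectrum, 0.0), n=x.size)
    high = x - low  # the residual keeps the fast fluctuations
    return low, high

# One day of 5-minute CGM samples: a 12-hour swing plus a 30-minute oscillation.
t = np.arange(288)
trace = 120 + 30 * np.sin(2 * np.pi * t / 144) + 10 * np.sin(2 * np.pi * t / 6)
low, high = split_low_high(trace)
```

Because the high-frequency part is defined as the residual, the two components always sum back to the original trace. A decomposition such as variational mode decomposition would instead produce several narrow-band modes.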
If this is right
- Enables real-time forecasting on edge devices for in-the-moment interventions.
- Improves handling of long prediction horizons in non-stationary signals.
- Reduces model size while preserving accuracy on multimodal inputs.
- Allows behavioral data to be used alongside continuous glucose readings without alignment problems.
Where Pith is reading between the lines
- The same decomposition step could be tested on other irregularly sampled physiological streams such as heart-rate variability or activity counts.
- If the frequency split reliably captures non-stationarity, the approach might reduce data requirements for training in related time-series health tasks.
- Closed-loop integration with insulin delivery pumps would be a direct next measurement of clinical utility.
Load-bearing premise
The mathematical model that converts sparse irregular patient data into continuous features integrates accurately with blood glucose level measurements.
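One way to turn sparse, irregular events such as meals into a continuous feature on the CGM time grid is to spread each event over time with an absorption kernel. The summary does not give GlucoNet's actual mathematical model; this sketch assumes a gamma-shaped kernel k(tau) = tau / T^2 * exp(-tau / T), the form used for gut glucose absorption in some compartmental glucose-insulin models, with a hypothetical peak time T = 55 minutes. The helper `events_to_continuous` and all example values are illustrative.

```python
import numpy as np

def events_to_continuous(event_times, event_amounts, grid, t_peak=55.0):
    """Turn sparse, irregular events (e.g. carb grams) into a continuous rate signal.

    Assumed kernel: k(tau) = tau / t_peak**2 * exp(-tau / t_peak) for tau >= 0,
    zero before the event. It peaks at tau = t_peak and integrates to 1, so the
    area under each bump equals the logged amount. t_peak is in minutes.
    """
    grid = np.asarray(grid, dtype=float)
    rate = np.zeros_like(grid)
    for t0, amount in zip(event_times, event_amounts):
        tau = grid - t0
        mask = tau >= 0  # an event only affects later time points
        rate[mask] += amount * tau[mask] / t_peak**2 * np.exp(-tau[mask] / t_peak)
    return rate

grid = np.arange(0, 24 * 60, 5.0)          # 5-minute grid over one day, in minutes
meal_times = [7 * 60 + 13, 12 * 60 + 41]   # irregular timestamps (minutes)
meal_carbs = [45.0, 60.0]                  # grams, hypothetical
carb_rate = events_to_continuous(meal_times, meal_carbs, grid)  # grams per minute
```

The same function could be reused for insulin doses with a different kernel width; because each kernel integrates to one, the area under each bump preserves the logged amount.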
What would settle it
A new cohort of patients with type 1 diabetes in which the full GlucoNet pipeline shows no RMSE reduction relative to an undecomposed, undistilled transformer baseline.
Original abstract
The availability of continuous glucose monitors as over-the-counter commodities has created a unique opportunity to monitor a person's blood glucose levels, forecast blood glucose trajectories, and provide automated interventions to prevent devastating chronic complications that arise from poor glucose control. However, forecasting blood glucose levels is challenging because blood glucose changes constantly in response to food intake, medication intake, physical activity, sleep, and stress. It is particularly difficult to accurately predict blood glucose levels (BGL) from multimodal and irregularly sampled data and over long prediction horizons. Furthermore, these forecasting models must operate in real time on edge devices to provide in-the-moment interventions. To address these challenges, we propose GlucoNet, an AI-powered sensor system for continuously monitoring behavioral and physiological health and robustly forecasting blood glucose patterns. GlucoNet devises a feature decomposition-based transformer model that incorporates patients' behavioral and physiological data and transforms sparse and irregular patient data (e.g., diet and medication intake data) into continuous features using a mathematical model, facilitating better integration with the BGL data. Given the non-linear and non-stationary nature of BGL signals, we propose a decomposition method to extract both low- and high-frequency components from the BGL signals, thus providing accurate forecasting. To reduce the computational complexity, we also propose to employ knowledge distillation to compress the transformer model. GlucoNet achieves a 60% improvement in RMSE and a 21% reduction in the number of parameters, improving RMSE and MAE by 51% and 57%, respectively, using data from 12 participants with type 1 diabetes (T1D). These results underscore GlucoNet's potential as a compact and reliable tool for real-world diabetes prevention and management.
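The abstract does not state the distillation objective. A common adaptation of Hinton-style knowledge distillation to regression is to train the student on a weighted sum of its error against the ground truth and its error against the teacher's predictions. The sketch below assumes that objective with a hypothetical weight `alpha`, and uses a linear least-squares student in place of GlucoNet's compressed transformer so that the minimiser can be computed in closed form; the data and teacher outputs are simulated.

```python
import numpy as np

def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Assumed regression distillation objective:
    alpha * MSE(student, ground truth) + (1 - alpha) * MSE(student, teacher)."""
    s, t, y = (np.asarray(a, dtype=float) for a in (student_pred, teacher_pred, target))
    return float(alpha * np.mean((s - y) ** 2) + (1 - alpha) * np.mean((s - t) ** 2))

def fit_linear_student(features, teacher_pred, target, alpha=0.5):
    """Closed-form minimiser of distillation_loss for a linear student with a bias term."""
    # For squared error, the blended loss equals least squares on this soft target plus a constant.
    soft_target = (alpha * np.asarray(target, dtype=float)
                   + (1 - alpha) * np.asarray(teacher_pred, dtype=float))
    X = np.column_stack([features, np.ones(len(features))])
    weights, *_ = np.linalg.lstsq(X, soft_target, rcond=None)
    return weights, X @ weights

rng = np.random.default_rng(0)
X_feat = rng.normal(size=(200, 3))  # stand-in input features
y_true = X_feat @ np.array([1.5, -2.0, 0.5]) + 100 + rng.normal(scale=5.0, size=200)
teacher = y_true + rng.normal(scale=2.0, size=200)  # simulated outputs of a large teacher
w, student = fit_linear_student(X_feat, teacher, y_true, alpha=0.7)
print(distillation_loss(student, teacher, y_true, alpha=0.7))
```

Expanding the squares shows that alpha * (s - y)^2 + (1 - alpha) * (s - t)^2 differs from (s - (alpha * y + (1 - alpha) * t))^2 only by a term that does not depend on s, which is why `fit_linear_student` needs no iterative training. A neural student would instead minimise `distillation_loss` with gradient descent.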
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes GlucoNet, a hybrid attention transformer model for blood glucose level (BGL) forecasting that uses a mathematical model to convert sparse, irregular multimodal inputs (diet, medication, etc.) into continuous features, decomposes BGL signals into low- and high-frequency components, and uses knowledge distillation to compress the model for edge deployment. It reports a 60% RMSE improvement, 51% RMSE and 57% MAE gains, and a 21% parameter reduction on data from 12 T1D participants.
Significance. If the performance claims prove robust, the combination of explicit handling of irregular sampling via mathematical feature transformation and distillation-based compression would be a useful contribution to real-time, resource-constrained physiological forecasting. The emphasis on edge-device suitability is timely given the rise of continuous glucose monitors.
major comments (1)
- [Abstract and experimental evaluation] The central claims of a 60% RMSE improvement (plus the 51%/57% gains and the 21% parameter reduction) are presented without any description of the baseline models, the cross-validation scheme (subject-independent vs. random splits), statistical significance tests, or error bars. With only 12 participants, this omission makes the quantitative improvements impossible to assess, and those improvements are load-bearing for the paper's primary contribution.
minor comments (1)
- [Abstract] The abstract states both a '60% improvement in RMSE' and 'improving RMSE and MAE by 51% and 57%'; clarify whether these refer to different baselines or prediction horizons.
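The percentages in the abstract are presumably relative error reductions, and such numbers depend entirely on which baseline they are measured against. The helpers below compute RMSE, MAE, and percent improvement; the RMSE values in the usage lines are hypothetical and chosen only to show that one model can report a 60% or a 50% improvement depending on the baseline.

```python
import numpy as np

def rmse(pred, target):
    """Root-mean-square error."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.sqrt(np.mean(diff ** 2)))

def mae(pred, target):
    """Mean absolute error."""
    diff = np.asarray(pred, dtype=float) - np.asarray(target, dtype=float)
    return float(np.mean(np.abs(diff)))

def relative_improvement(baseline_error, model_error):
    """Percent reduction in error relative to a baseline: 100 * (b - m) / b."""
    return 100.0 * (baseline_error - model_error) / baseline_error

model_rmse = 10.0  # hypothetical RMSE in mg/dL
print(relative_improvement(25.0, model_rmse))  # → 60.0 against a weaker baseline
print(relative_improvement(20.0, model_rmse))  # → 50.0 against a stronger baseline
```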
Simulated Author's Rebuttal
We thank the referee for the constructive feedback on experimental reporting. We agree that additional details are required to allow proper assessment of the reported performance gains and will revise the manuscript to address this.
- Referee: [Abstract and experimental evaluation] The central claims of a 60% RMSE improvement (plus the 51%/57% gains and the 21% parameter reduction) are presented without any description of the baseline models, the cross-validation scheme (subject-independent vs. random splits), statistical significance tests, or error bars. With only 12 participants, this omission makes the quantitative improvements impossible to assess, and those improvements are load-bearing for the paper's primary contribution.
Authors: We acknowledge the omission and agree that the abstract and experimental sections must be expanded for transparency. In the revision we will add: (1) explicit descriptions of all baseline models compared against (LSTM, standard Transformer, and prior glucose forecasting methods); (2) clarification that evaluation used leave-one-subject-out cross-validation to ensure subject independence; (3) statistical significance results (paired t-tests or equivalent with p-values); and (4) error bars (standard deviation across folds) on all metrics. These changes will be incorporated into a dedicated Experimental Setup subsection and updated Results tables/figures. The small cohort (n=12) is already noted as a limitation in the discussion; the added details will make the quantitative claims evaluable without altering the underlying experiments. revision: yes
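The evaluation protocol promised in the rebuttal (leave-one-subject-out folds, per-fold error bars, and a paired test across folds) can be sketched in a few lines. The helper names are hypothetical, the model training step is omitted, and the per-fold RMSE values are toy numbers for illustration. A p-value would come from comparing the statistic with a t distribution with n - 1 degrees of freedom, for example via `scipy.stats.ttest_rel`.

```python
import numpy as np

def leave_one_subject_out(subject_ids):
    """Yield (held-out subject, train mask, test mask); no subject is in both splits."""
    subject_ids = np.asarray(subject_ids)
    for subject in np.unique(subject_ids):
        test_mask = subject_ids == subject
        yield subject, ~test_mask, test_mask

def mean_and_error_bar(fold_errors):
    """Mean and sample standard deviation (ddof=1) across folds, for error bars."""
    e = np.asarray(fold_errors, dtype=float)
    return float(e.mean()), float(e.std(ddof=1))

def paired_t_statistic(errors_a, errors_b):
    """Paired t statistic over folds: fold i of model A is paired with fold i of model B."""
    diff = np.asarray(errors_a, dtype=float) - np.asarray(errors_b, dtype=float)
    return float(diff.mean() / (diff.std(ddof=1) / np.sqrt(diff.size)))

subjects = np.repeat(np.arange(12), 100)  # 12 participants x 100 windows (hypothetical)
folds = list(leave_one_subject_out(subjects))
# For each fold: train on windows where train_mask is True, evaluate where test_mask is True.
baseline_rmse = [3.0, 4.0, 5.0]  # toy per-fold RMSEs (three folds shown for brevity)
model_rmse = [1.0, 2.0, 2.0]
print(mean_and_error_bar(model_rmse))
print(paired_t_statistic(baseline_rmse, model_rmse))
```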
Circularity Check
No circularity in derivation or performance claims
The paper describes an empirical ML pipeline: a feature-decomposition transformer trained on multimodal patient data from 12 T1D participants, followed by knowledge distillation and standard RMSE/MAE evaluation. No equation or step equates a fitted parameter to a claimed prediction by construction, no self-citation chain supplies the central result, and the reported improvements are external experimental outcomes rather than tautological restatements of inputs. The derivation remains self-contained against the data and architecture.
Forward citations
Cited by 1 Pith paper
- Parkinson's Disease Detection via Self-Supervised Dual-Channel Cross-Attention on Bilateral Wrist-Worn IMU Signals
Self-supervised cross-attention on bilateral wrist IMU signals achieves 93% accuracy distinguishing Parkinson's from healthy controls and 87-92% from other diagnoses using only 20% labeled data.