arxiv: 2411.10703 · v3 · submitted 2024-11-16 · 💻 cs.LG · eess.SP

Hybrid Attention Model Using Feature Decomposition and Knowledge Distillation for Glucose Forecasting

Pith reviewed 2026-05-23 17:25 UTC · model grok-4.3

classification 💻 cs.LG eess.SP
keywords glucose forecasting · feature decomposition · knowledge distillation · transformer model · T1 diabetes · continuous glucose monitoring · edge deployment

The pith

GlucoNet forecasts blood glucose from irregular multimodal data by decomposing features and distilling a transformer, cutting RMSE by 60 percent.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents GlucoNet as a hybrid attention system that turns sparse behavioral inputs such as diet and medication into continuous features through a mathematical model and then decomposes blood glucose signals into low- and high-frequency parts for longer-horizon prediction. Knowledge distillation shrinks the transformer so the full pipeline can run on edge hardware while still incorporating physiological signals. A reader would care because continuous glucose monitors now generate the raw streams needed for real-time forecasts that could trigger automated interventions before complications develop. The system is evaluated on data from twelve people with type 1 diabetes.

Core claim

GlucoNet is a feature decomposition-based transformer that incorporates patients' behavioral and physiological data, transforms sparse and irregular patient data into continuous features using a mathematical model, extracts low and high-frequency components from BGL signals to address their non-linear non-stationary character, and applies knowledge distillation to compress the model, delivering a 60 percent RMSE improvement, a 21 percent parameter reduction, and 51 percent and 57 percent gains in RMSE and MAE on recordings from twelve T1-Diabetes participants.

What carries the argument

Feature decomposition-based transformer that converts irregular inputs to continuous features via a mathematical model, splits BGL signals into frequency components, and is compressed by knowledge distillation.
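
To make the frequency split concrete, here is a minimal sketch of separating a glucose trace into a slow and a fast component. It uses a centered moving average as the low-pass step, which is a stand-in chosen for brevity and not necessarily the decomposition the paper uses; the function name `split_frequency`, the window size, and the sample readings are all hypothetical.

```python
def split_frequency(series, window=5):
    """Return (low, high): low is a centered moving average (slow trend),
    high is the residual, so low[i] + high[i] reproduces series[i]."""
    half = window // 2
    low = []
    for i in range(len(series)):
        # Shrink the window at the edges instead of padding.
        start, stop = max(0, i - half), min(len(series), i + half + 1)
        low.append(sum(series[start:stop]) / (stop - start))
    high = [x - l for x, l in zip(series, low)]
    return low, high


# Hypothetical CGM readings in mg/dL, for illustration only.
bgl = [110, 112, 118, 130, 145, 150, 148, 140, 131, 125]
low, high = split_frequency(bgl)
```

Each component could then be forecast separately and the two predictions summed, which is the usual motivation for decomposing a non-stationary signal before modeling it.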

If this is right

  • Enables real-time forecasting on edge devices for in-the-moment interventions.
  • Improves handling of long prediction horizons in non-stationary signals.
  • Reduces model size while preserving accuracy on multimodal inputs.
  • Allows behavioral data to be used alongside continuous glucose readings without alignment problems.
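
The size-versus-accuracy point rests on knowledge distillation. A minimal sketch of a distillation objective for a regression task follows, assuming the student is trained on a weighted blend of its error against the ground truth and its error against the teacher's predictions. The blend weight `alpha`, the function names, and the numbers are illustrative assumptions; the page does not state the loss the authors actually use.

```python
def mse(a, b):
    """Mean squared error between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b)) / len(a)


def distillation_loss(student_pred, teacher_pred, target, alpha=0.5):
    """Blend the error against ground truth with the error against the teacher.

    alpha = 1.0 ignores the teacher; alpha = 0.0 trains on teacher outputs only.
    """
    hard = mse(student_pred, target)        # fit the measured glucose values
    soft = mse(student_pred, teacher_pred)  # imitate the larger model
    return alpha * hard + (1.0 - alpha) * soft


# Hypothetical three-step forecasts in mg/dL.
target = [120.0, 125.0, 131.0]
teacher = [121.0, 124.0, 130.0]
student = [118.0, 127.0, 133.0]
loss = distillation_loss(student, teacher, target, alpha=0.5)
```

In this toy case the student is off by 2 mg/dL from the target and by 3 mg/dL from the teacher at every step, so the two terms are 4.0 and 9.0 before blending.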

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same decomposition step could be tested on other irregularly sampled physiological streams such as heart-rate variability or activity counts.
  • If the frequency split reliably captures non-stationarity, the approach might reduce data requirements for training in related time-series health tasks.
  • Closed-loop integration with insulin delivery pumps would be a direct next measurement of clinical utility.

Load-bearing premise

The mathematical model that converts sparse irregular patient data into continuous features integrates accurately with blood glucose level measurements.
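
One way to picture this premise is a response kernel that spreads each logged meal or dose over the samples that follow it. The sketch below assumes an exponential decay with a fixed half-life, which is an illustrative choice and not the model described in the paper; `events_to_continuous`, `half_life_min`, and the event values are hypothetical.

```python
import math


def events_to_continuous(events, n_steps, step_min=5.0, half_life_min=60.0):
    """Spread sparse (time_index, amount) events over a regular sampling grid.

    Each event adds amount * exp(-decay * elapsed_minutes) to its own sample
    and to every later one. The kernel shape is an assumption for illustration.
    """
    decay = math.log(2.0) / half_life_min
    signal = [0.0] * n_steps
    for start, amount in events:
        for t in range(start, n_steps):
            elapsed = (t - start) * step_min
            signal[t] += amount * math.exp(-decay * elapsed)
    return signal


# Two hypothetical carbohydrate entries (grams) on a 5-minute grid.
carbs = events_to_continuous([(2, 40.0), (10, 25.0)], n_steps=24)
```

The output has one value per glucose sample, so it can be stacked with the CGM series as an extra input channel. Whether a kernel of this kind tracks real absorption dynamics is exactly what the premise depends on.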

What would settle it

A new cohort of T1-Diabetes patients where the full GlucoNet pipeline shows no RMSE reduction relative to an undecomposed undistilled transformer baseline.

Figures

Figures reproduced from arXiv: 2411.10703 by Ebrahim Farahmand, Hassan Ghasemzadeh, Nooshin Taheri Chatrudi, Shovito Barua Soumma.

Figure 1. Accuracy-efficiency trade-offs Martinsson
Figure 2. An Overview of GlucoNet includes sensing to measure variables, Sparse signal construction to extract the effective
Figure 3. An example of transforming sparse events (carb intake and insulin dosage) using Sparse Signal Reconstruction (SSR)
Figure 4. Overall Transformer model architecture for the teacher
  Accompanying text from the paper: We conducted a series of experiments to optimize the Transformer model architecture. For the teacher model, the best-performing configuration consisted of a single encoder layer, 64 input dimensions, 4 attention heads, and 128 feed-forward units. Moreover, for the student model, the optimal configuration differed slightly. It utilized a single encoder layer, 32 input dimensions, 2 attention heads, and 64 f…
Figure 5. An example of forecasting the blood glucose model with
Figure 6. Average error metrics for different configurations of GlucoNet for different memory cells of LSTM across three
Figure 7. Accuracy-efficiency trade-offs for different configurations of GlucoNet compared to state-of-the-art works. GN 1: GlucoNet+KD (ST), [(128, 128), (128, 64)]; GN 2: GlucoNet (ST), [(128, 128), (128, 64)]; GN 3: GlucoNet (LT), [(128, 128), (128, 64)]; GN 4: GlucoNet+KD (ST), [(128, 64)]; GN 5: GlucoNet+KD (ST), [(64, 32)]; GN 6: GlucoNet+KD (ST), [(32, 16)]; GN 7: GlucoNet+KD (ST), [(16, 8)]
Original abstract

The availability of continuous glucose monitors as over-the-counter commodities has created a unique opportunity to monitor a person's blood glucose levels, forecast blood glucose trajectories, and provide automated interventions to prevent devastating chronic complications that arise from poor glucose control. However, forecasting blood glucose levels (BGL) is challenging because blood glucose changes consistently in response to food intake, medication intake, physical activity, sleep, and stress. It is particularly difficult to accurately predict BGL from multimodal and irregularly sampled data and over long prediction horizons. Furthermore, these forecasting models must operate in real time on edge devices to provide in-the-moment interventions. To address these challenges, we propose GlucoNet, an AI-powered sensor system for continuously monitoring behavioral and physiological health and robust forecasting of blood glucose patterns. GlucoNet devises a feature decomposition-based transformer model that incorporates patients' behavioral and physiological data and transforms sparse and irregular patient data (e.g., diet and medication intake data) into continuous features using a mathematical model, facilitating better integration with the BGL data. Given the non-linear and non-stationary nature of BGL signals, we propose a decomposition method to extract both low- and high-frequency components from the BGL signals, thus providing accurate forecasting. To reduce the computational complexity, we also propose to employ knowledge distillation to compress the transformer model. GlucoNet achieves a 60% improvement in RMSE and a 21% reduction in the number of parameters, improving RMSE and MAE by 51% and 57%, using data obtained from 12 participants with T1-Diabetes. These results underscore GlucoNet's potential as a compact and reliable tool for real-world diabetes prevention and management.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes GlucoNet, a hybrid attention transformer model for blood glucose level (BGL) forecasting that applies feature decomposition to convert sparse/irregular multimodal inputs (diet, medication, etc.) into continuous features, decomposes BGL signals into low- and high-frequency components, and uses knowledge distillation to compress the model for edge deployment. It reports a 60% RMSE improvement, 51% RMSE and 57% MAE gains, and 21% parameter reduction on data from 12 T1D participants.

Significance. If the performance claims prove robust, the combination of explicit handling of irregular sampling via mathematical feature transformation and distillation-based compression would be a useful contribution to real-time, resource-constrained physiological forecasting. The emphasis on edge-device suitability is timely given the rise of continuous glucose monitors.

major comments (1)
  1. [Abstract and experimental evaluation] The central claims of a 60% RMSE improvement (plus 51%/57% gains and a 21% parameter reduction) are presented without any description of the baseline models, cross-validation scheme (subject-independent vs. random splits), statistical significance tests, or error bars. With only 12 participants, this omission makes the quantitative improvements impossible to assess, and it bears directly on the paper's primary contribution.
minor comments (1)
  1. [Abstract] The abstract states both a '60% improvement in RMSE' and 'improving RMSE and MAE by 51% and 57%'; clarify whether these refer to different baselines or prediction horizons.
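
The ambiguity matters because a percentage improvement only has meaning relative to a named baseline and horizon. A minimal sketch of the arithmetic is below; the helper names and all numbers are hypothetical and are not taken from the paper.

```python
import math


def rmse(pred, truth):
    """Root mean squared error."""
    return math.sqrt(sum((p - t) ** 2 for p, t in zip(pred, truth)) / len(truth))


def mae(pred, truth):
    """Mean absolute error."""
    return sum(abs(p - t) for p, t in zip(pred, truth)) / len(truth)


def percent_improvement(baseline_error, model_error):
    """Relative error reduction, in percent, against one specific baseline."""
    return 100.0 * (baseline_error - model_error) / baseline_error


# Hypothetical forecasts in mg/dL, for illustration only.
truth = [100.0, 110.0, 120.0, 130.0]
baseline = [110.0, 100.0, 130.0, 120.0]  # every point off by 10
model = [105.0, 105.0, 125.0, 125.0]     # every point off by 5

gain = percent_improvement(rmse(baseline, truth), rmse(model, truth))
```

Swapping in a weaker or stronger baseline changes `gain` without any change to the model, which is why two different RMSE percentages in one abstract need their reference points stated.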

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback on experimental reporting. We agree that additional details are required to allow proper assessment of the reported performance gains and will revise the manuscript to address this.

Point-by-point responses
  1. Referee: [Abstract and experimental evaluation] The central claims of a 60% RMSE improvement (plus 51%/57% gains and a 21% parameter reduction) are presented without any description of the baseline models, cross-validation scheme (subject-independent vs. random splits), statistical significance tests, or error bars. With only 12 participants, this omission makes the quantitative improvements impossible to assess, and it bears directly on the paper's primary contribution.

    Authors: We acknowledge the omission and agree that the abstract and experimental sections must be expanded for transparency. In the revision we will add: (1) explicit descriptions of all baseline models compared against (LSTM, standard Transformer, and prior glucose forecasting methods); (2) clarification that evaluation used leave-one-subject-out cross-validation to ensure subject independence; (3) statistical significance results (paired t-tests or equivalent with p-values); and (4) error bars (standard deviation across folds) on all metrics. These changes will be incorporated into a dedicated Experimental Setup subsection and updated Results tables/figures. The small cohort (n=12) is already noted as a limitation in the discussion; the added details will make the quantitative claims evaluable without altering the underlying experiments. revision: yes
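
The evaluation protocol promised in this response can be sketched in a few lines: hold out each subject once, collect a per-fold error for each method, then report the spread and a paired test statistic. The subject IDs and error values below are placeholders, and the helper names are hypothetical; a p-value would additionally require a t-distribution lookup with n - 1 degrees of freedom, which is omitted here.

```python
import math
import statistics


def leave_one_subject_out(subject_ids):
    """Yield (train_ids, test_id) with each subject held out exactly once."""
    for held_out in subject_ids:
        yield [s for s in subject_ids if s != held_out], held_out


def paired_t_statistic(errors_a, errors_b):
    """t statistic of the per-fold differences (degrees of freedom = n - 1)."""
    diffs = [a - b for a, b in zip(errors_a, errors_b)]
    n = len(diffs)
    return statistics.mean(diffs) / (statistics.stdev(diffs) / math.sqrt(n))


# Placeholder IDs for a 12-participant cohort.
subjects = [f"S{i:02d}" for i in range(1, 13)]
folds = list(leave_one_subject_out(subjects))

# Hypothetical per-fold RMSE values for three folds, for illustration only.
baseline_rmse = [3.0, 4.0, 5.0]
model_rmse = [1.0, 3.0, 2.0]
t_stat = paired_t_statistic(baseline_rmse, model_rmse)
spread = statistics.stdev(model_rmse)  # error-bar width across folds
```

Because no subject appears in both the training and test side of a fold, the resulting errors reflect performance on unseen people instead of unseen time windows from people already seen.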

Circularity Check

0 steps flagged

No circularity in derivation or performance claims

The paper describes an empirical ML pipeline: a feature-decomposition transformer trained on multimodal patient data from 12 T1D participants, followed by knowledge distillation and standard RMSE/MAE evaluation. No equation or step equates a fitted parameter to a claimed prediction by construction, no self-citation chain supplies the central result, and the reported improvements are external experimental outcomes rather than tautological restatements of inputs. The derivation remains self-contained against the data and architecture.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the approach relies on standard transformer and distillation methods from prior literature without detailing any new fitted constants or unproven assumptions beyond the high-level description.

pith-pipeline@v0.9.0 · 5846 in / 1256 out tokens · 40516 ms · 2026-05-23T17:25:26.957159+00:00 · methodology

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Parkinson's Disease Detection via Self-Supervised Dual-Channel Cross-Attention on Bilateral Wrist-Worn IMU Signals

    cs.LG 2026-04 unverdicted novelty 5.0

    Self-supervised cross-attention on bilateral wrist IMU signals achieves 93% accuracy distinguishing Parkinson's from healthy controls and 87-92% from other diagnoses using only 20% labeled data.
