
arxiv: 2604.26126 · v3 · pith:FHF4O3LPnew · submitted 2026-04-28 · 📡 eess.SY · cs.SY · stat.ML

Application of Deep Reinforcement Learning to Event-Triggered Control for Networked Artificial Pancreas Systems

Pith reviewed 2026-05-19 17:41 UTC · model grok-4.3

classification 📡 eess.SY · cs.SY · stat.ML
keywords deep reinforcement learning, event-triggered control, artificial pancreas, networked control systems, semi-Markov decision process, blood glucose regulation, insulin delivery

The pith

A rule-based trigger on blood glucose changes lets deep reinforcement learning control networked artificial pancreas systems at irregular intervals.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a deep reinforcement learning controller for insulin delivery in networked artificial pancreas systems that avoids the complexity of jointly learning both dosing amounts and update timings. It replaces explicit timing decisions with a simple rule that triggers updates whenever blood glucose levels change by a defined amount. This choice converts the control task into a semi-Markov decision process, which is solved by extending a standard DRL algorithm. The resulting controller issues commands only at irregular times rather than on a fixed schedule. A reader would care because frequent wireless updates drain batteries and raise energy costs in wearable medical devices, while the method keeps glucose regulation performance intact.

Core claim

By introducing a rule-based event-triggering criterion defined by changes in blood glucose, the design avoids explicitly learning update timing inside the DRL framework. Decision making therefore occurs at irregular intervals, and the problem is formulated as a semi-Markov decision process for which a standard DRL algorithm is extended. Numerical experiments show that this controller improves communication efficiency while maintaining control performance comparable to periodic-update baselines.
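A minimal sketch of what a glucose-change trigger of this kind could look like. The exact criterion used in the paper is not given in this summary, so the send-on-delta form below (fire when the CGM reading has moved by at least `threshold` since the last update), the function name `trigger_times`, the mg/dL unit, and the sample trace are all assumptions made for illustration.

```python
from typing import List, Sequence


def trigger_times(cgm: Sequence[float], threshold: float) -> List[int]:
    """Return the sample indices at which a control update is transmitted.

    Assumed rule (not taken from the paper): the first sample always triggers,
    and afterwards an update fires whenever the current CGM reading differs
    from the reading at the previous update by at least `threshold`.
    """
    if not cgm:
        return []
    updates = [0]
    last_sent = cgm[0]
    for k in range(1, len(cgm)):
        if abs(cgm[k] - last_sent) >= threshold:
            updates.append(k)
            last_sent = cgm[k]
    return updates


# Invented CGM readings (mg/dL), one per sampling period.
cgm_trace = [110, 112, 115, 121, 130, 128, 119, 118, 117, 105]
updates = trigger_times(cgm_trace, threshold=10.0)

# Gaps between consecutive updates: these are the irregular decision intervals.
intervals = [b - a for a, b in zip(updates, updates[1:])]
print(updates)    # [0, 3, 9]
print(intervals)  # [3, 6]
```

With this rule the policy is queried only at the listed indices, so the time between decisions varies with how quickly glucose moves.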

What carries the argument

Rule-based event-triggering criterion on blood glucose changes, which decides update times without joint learning and converts the task into a semi-Markov decision process.
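To make the semi-Markov step concrete, here is a hedged sketch of one common SMDP convention: rewards collected between two updates are lumped into a single transition, and the next value is discounted by gamma raised to the interval length. How the paper actually extends its DRL algorithm is not described in this summary; `smdp_transitions`, `smdp_return`, and the numbers below are hypothetical.

```python
from typing import List, Sequence, Tuple


def smdp_transitions(rewards: Sequence[float], updates: Sequence[int],
                     gamma: float) -> List[Tuple[float, int]]:
    """Collapse per-sample rewards into one (lumped reward, duration) per decision.

    `updates` holds the sample indices of the decisions and must start at 0.
    """
    bounds = list(updates) + [len(rewards)]
    transitions = []
    for start, end in zip(bounds, bounds[1:]):
        tau = end - start  # number of samples this action is held
        lumped = sum(gamma ** i * rewards[start + i] for i in range(tau))
        transitions.append((lumped, tau))
    return transitions


def smdp_return(transitions: Sequence[Tuple[float, int]], gamma: float) -> float:
    """Discounted return over decisions, discounting by gamma ** tau per step."""
    g = 0.0
    for lumped, tau in reversed(transitions):
        g = lumped + gamma ** tau * g
    return g


# Invented example: constant reward, decisions at samples 0, 3 and 9.
rewards = [1.0] * 10
updates = [0, 3, 9]
gamma = 0.9

transitions = smdp_transitions(rewards, updates, gamma)
per_sample = sum(gamma ** t * r for t, r in enumerate(rewards))

print([tau for _, tau in transitions])                          # [3, 6, 1]
print(abs(smdp_return(transitions, gamma) - per_sample) < 1e-9)  # True
```

The final check shows that, under this convention, the return computed over irregular decisions matches the ordinary per-sample discounted return, which is why a standard algorithm can be reused with only the discounting changed.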

If this is right

  • Control actions are issued only at irregular intervals determined by observed glucose changes rather than fixed periods.
  • Communication frequency drops because updates occur only when the rule-based criterion is met.
  • The semi-Markov decision process formulation allows extension of existing DRL algorithms without major redesign.
  • Numerical results indicate that glucose control performance stays comparable to periodic-update methods.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same rule-based separation of timing and action could simplify DRL designs in other networked medical or industrial control settings.
  • Patient-specific tuning of the glucose-change threshold might further reduce unnecessary updates while preserving safety margins.
  • Hardware deployment on actual insulin pumps would reveal whether the observed communication savings translate to longer battery life.
  • Combining the approach with model-predictive elements could address rare edge cases that the fixed rule might overlook.

Load-bearing premise

A rule-based criterion defined by changes in blood glucose is adequate to handle event-triggering without jointly learning update timing in the DRL framework.

What would settle it

A simulation in which rapid or unexpected blood glucose excursions cause the rule-based triggers to miss necessary updates, resulting in poorer glucose regulation than a periodic DRL controller with the same network constraints.
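One way such a check could be scored, sketched under stated assumptions: compare time in range between a trace produced under the event-triggered controller and one produced under a periodic controller. The 70 to 180 mg/dL band is the range commonly used for time-in-range reporting; whether the paper uses this metric is not stated here. The function names, the 5-point `margin`, and both traces are invented for illustration and are not results.

```python
from typing import Sequence


def time_in_range(cgm: Sequence[float], low: float = 70.0,
                  high: float = 180.0) -> float:
    """Fraction of CGM samples with low <= value <= high."""
    if not cgm:
        raise ValueError("cgm trace must not be empty")
    return sum(low <= g <= high for g in cgm) / len(cgm)


def trigger_underperforms(event_cgm: Sequence[float],
                          periodic_cgm: Sequence[float],
                          margin: float = 0.05) -> bool:
    """True if the event-triggered trace loses more than `margin` time in range."""
    return time_in_range(event_cgm) < time_in_range(periodic_cgm) - margin


# Invented traces (mg/dL) standing in for two simulated closed-loop runs.
periodic_trace = [120, 150, 175, 160, 140, 130, 110, 100, 95, 90]
event_trace = [120, 160, 190, 210, 185, 170, 140, 100, 80, 65]

print(time_in_range(periodic_trace))                       # 1.0
print(time_in_range(event_trace))                          # 0.6
print(trigger_underperforms(event_trace, periodic_trace))  # True
```

A real test would generate both traces from the same simulator, patient, and meal scenario, so that the trigger is the only difference between the runs.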

Figures

Figures reproduced from arXiv: 2604.26126 by Junya Ikemoto, Kazumune Hashimoto, Satoshi Maruyama.

Figure 1: Illustration of a networked AP system. The controller, implemented …
Figure 3: Illustration of the glucose-insulin dynamics (S2008). The mathemat…
Figure 4: Time responses under the policy learned by CGM-ETPPO for …
Figure 5: Histograms of the interval-averaged CGM values and the correspond…
Figure 6: Time responses under the policy learned by CGM-ETPPO with …
Original abstract

This paper proposes a deep reinforcement learning (DRL)-based event-triggered controller design for networked artificial pancreas (AP) systems. Although existing DRL-based AP controllers typically assume periodic control updates, networked control systems (NCSs) require a reduction in communication frequency to achieve energy-efficient operation, which is directly tied to control updates. However, jointly learning both insulin dosing and update timing significantly increases the complexity of the learning problem. To alleviate this complexity, we develop a practical DRL-based controller design that avoids explicitly learning update timing by introducing a rule-based criterion defined by changes in blood glucose. As a result, decision-making occurs at irregular intervals, and the problem is naturally formulated as a semi-Markov decision process (SMDP), for which we extend a standard DRL algorithm. Numerical experiments demonstrate that the proposed method improves communication efficiency while maintaining control performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The manuscript proposes a DRL-based event-triggered controller for networked artificial pancreas systems. To avoid the complexity of jointly learning insulin dosing and update timing, it introduces a fixed rule-based event trigger defined by changes in blood glucose levels. The resulting irregular decision intervals are handled by formulating the problem as a semi-Markov decision process and extending a standard DRL algorithm. Numerical experiments are presented to support the claim of improved communication efficiency without degradation in control performance.

Significance. If the experimental claims hold under broader conditions, the work could provide a practical route to energy-efficient networked control for medical devices such as the artificial pancreas by reducing communication overhead while preserving glycemic regulation. The SMDP formulation is a natural fit for the irregular timing induced by the rule-based trigger. However, the reliance on a non-learned, fixed trigger restricts the method's adaptability, and the absence of detailed experimental protocols limits the strength of the supporting evidence.

major comments (1)
  1. The central claim that numerical experiments demonstrate improved communication efficiency while maintaining control performance rests on the adequacy of the fixed rule-based glucose-change trigger. Because this trigger is not co-optimized with the insulin policy inside the SMDP, the reported results only validate the specific combination of patient model, disturbance set, and threshold chosen; nothing rules out regimes in which the same rule either triggers excessively or misses excursions that a jointly learned policy would capture.
minor comments (1)
  1. The abstract would benefit from naming the specific DRL algorithm being extended and from indicating the baseline periodic controller used for comparison.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comments and the opportunity to improve our manuscript. We address the major comment below and have incorporated revisions to strengthen the experimental validation and discussion of limitations.

Point-by-point responses
  1. Referee: The central claim that numerical experiments demonstrate improved communication efficiency while maintaining control performance rests on the adequacy of the fixed rule-based glucose-change trigger. Because this trigger is not co-optimized with the insulin policy inside the SMDP, the reported results only validate the specific combination of patient model, disturbance set, and threshold chosen; nothing rules out regimes in which the same rule either triggers excessively or misses excursions that a jointly learned policy would capture.

    Authors: We thank the referee for this observation. The fixed rule-based trigger is a deliberate design choice to avoid the substantial increase in learning complexity that would arise from jointly optimizing update timing and insulin dosing within the SMDP. This approach prioritizes practicality and interpretability for medical applications. The reported experiments use the standard UVA/Padova virtual patient simulator with benchmark meal disturbances and patient parameters from the literature. We acknowledge that the results are tied to the selected threshold and conditions. In the revised manuscript we have added a sensitivity analysis varying the glucose-change threshold, additional simulation cases with altered disturbance magnitudes, and an expanded discussion of regimes where the fixed trigger may underperform relative to a jointly learned policy, including a brief comparison to fully adaptive triggering approaches.

    Revision: partial
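A threshold sensitivity analysis of the kind mentioned in the rebuttal could start from a sweep like the one below. This is a sketch, not the authors' procedure: the send-on-delta rule, the helper `count_updates`, the candidate thresholds, and the synthetic triangular glucose excursion are all assumptions, and a real analysis would also re-evaluate glycemic outcomes for each threshold.

```python
from typing import Dict, Sequence


def count_updates(cgm: Sequence[float], threshold: float) -> int:
    """Number of transmissions under an assumed send-on-delta rule.

    The first sample is always sent; afterwards a sample is sent when it
    differs from the last sent value by at least `threshold`.
    """
    if not cgm:
        return 0
    count, last_sent = 1, cgm[0]
    for value in cgm[1:]:
        if abs(value - last_sent) >= threshold:
            count += 1
            last_sent = value
    return count


# Synthetic excursion (mg/dL): climb from 100 to 200 in steps of 5, then back.
rise = [100 + 5 * k for k in range(21)]
fall = [200 - 5 * k for k in range(1, 21)]
cgm_trace = rise + fall

sweep: Dict[int, int] = {th: count_updates(cgm_trace, th) for th in (5, 10, 20)}
for th, n in sweep.items():
    fraction = n / len(cgm_trace)
    print(f"threshold={th:>2} mg/dL  updates={n:>2}  fraction={fraction:.2f}")
```

On this trace the update count falls from 41 to 21 to 11 as the threshold goes from 5 to 10 to 20 mg/dL, which illustrates the communication side of the trade-off only.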

Circularity Check

0 steps flagged

No significant circularity; derivation is self-contained

Full rationale

The paper introduces a practical DRL controller for networked AP systems by adopting a fixed rule-based event trigger (blood-glucose change threshold) to avoid jointly learning update timing, then formulates the resulting irregular decision process as an SMDP and extends a standard DRL algorithm. The performance claim rests on numerical experiments that evaluate the combined policy-plus-trigger design against periodic baselines. No equation or result is shown to equal its own inputs by construction, no parameter is fitted on a subset and then relabeled a prediction, and no load-bearing premise reduces to a self-citation chain. The method is therefore an independent engineering proposal whose validity is tested externally rather than derived tautologically from its own assumptions.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No free parameters, axioms, or invented entities are specified in the abstract.


