Precision Physical Activity Prescription via Reinforcement Learning for Functional Actions
Pith reviewed 2026-05-20 02:59 UTC · model grok-4.3
The pith
Offline reinforcement learning derives personalized daily step distributions from All of Us data that associate with improved cardiometabolic markers.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors introduce an offline reinforcement learning algorithm designed for functional actions, where each action is a full distribution of daily steps across a time window. Using large-scale observational records that link step counts to cardiometabolic biomarkers, the method learns a policy whose recommended activity patterns are associated with lower risk markers. The learned policy increases total daily steps and reduces day-to-day variability while providing subgroup-specific adjustments for blood glucose level, body mass index, blood pressure, age, and sex.
What carries the argument
Offline reinforcement learning algorithm that learns policies over functional actions representing daily step count distributions, trained on paired step and biomarker trajectories from the All of Us program.
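To make the idea of a functional action concrete, the following Python sketch represents one window of daily step counts as an empirical quantile function and summarises its level and day-to-day variability. It is an illustration only, not the authors' implementation: the function names, the 50-point grid, and the two example weeks are all hypothetical.

```python
import numpy as np

def quantile_action(daily_steps, grid_size=50):
    """Encode a window of daily step counts as an empirical quantile function.

    Returns the probability grid and the step count at each probability level.
    The grid size is an arbitrary illustrative choice.
    """
    steps = np.asarray(daily_steps, dtype=float)
    probs = np.linspace(0.0, 1.0, grid_size)
    return probs, np.quantile(steps, probs)

def summarize(daily_steps):
    """Mean daily steps and day-to-day standard deviation for one window."""
    steps = np.asarray(daily_steps, dtype=float)
    return {"mean": steps.mean(), "sd": steps.std(ddof=1)}

# Hypothetical one-week windows (not from the All of Us data).
observed = [3200, 9100, 4500, 12000, 2800, 7600, 5100]
steadier = [7000, 7400, 6800, 7600, 7100, 7300, 6900]

probs, q_observed = quantile_action(observed)
_, q_steadier = quantile_action(steadier)
```

In this toy comparison the second week has both a higher mean and a smaller standard deviation, which is the direction of change the summary above attributes to the learned policy.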
If this is right
- The optimal policy generally recommends higher daily step totals than observed in the data.
- It favors a steadier, less variable pattern of activity across days.
- Recommendations differ for subgroups defined by blood glucose level.
- Further tailoring occurs according to body mass index, blood pressure, age, and sex.
Where Pith is reading between the lines
- The same functional-action reinforcement learning setup could be applied to other continuous wearable signals such as heart rate or sleep duration.
- Embedding the learned policy in a smartphone app would allow real-time updates as new step and biomarker data arrive.
- Direct comparison of the derived policy against current public health step guidelines could quantify how much additional personalization improves outcomes.
Load-bearing premise
The observational All of Us step counts and biomarker records contain enough information for the algorithm to recover activity policies whose effects on cardiometabolic markers are not substantially distorted by unmeasured confounding or selection bias.
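One way to probe this premise is to ask how far an estimated policy value could move if the importance weights used in off-policy evaluation were wrong by a bounded factor. The sketch below is a simplified Python illustration under assumptions not taken from the paper: each weight may be off by at most a factor `gamma`, the value is a self-normalised weighted mean, and `value_bounds` together with its inputs is hypothetical.

```python
import numpy as np

def value_bounds(outcomes, weights, gamma):
    """Lower and upper bounds on a self-normalised weighted mean when every
    weight may be misspecified by a factor in [1/gamma, gamma], gamma >= 1.

    Assumption for illustration: a simple box constraint on the weights,
    not the sensitivity model the authors plan to use.
    """
    r = np.asarray(outcomes, dtype=float)
    w = np.asarray(weights, dtype=float)
    order = np.argsort(-r)                 # sort units by outcome, highest first
    r, w = r[order], w[order]
    lo_w, hi_w = w / gamma, w * gamma

    def extreme(top_w, rest_w, pick):
        # For k = 0..n, the k highest-outcome units get top_w, the rest rest_w.
        num_top = np.concatenate(([0.0], np.cumsum(top_w * r)))
        den_top = np.concatenate(([0.0], np.cumsum(top_w)))
        num_rest = np.concatenate((np.cumsum((rest_w * r)[::-1])[::-1], [0.0]))
        den_rest = np.concatenate((np.cumsum(rest_w[::-1])[::-1], [0.0]))
        return pick((num_top + num_rest) / (den_top + den_rest))

    upper = extreme(hi_w, lo_w, np.max)    # upweight high outcomes
    lower = extreme(lo_w, hi_w, np.min)    # upweight low outcomes
    return lower, upper

lower, upper = value_bounds([1.0, 2.0, 3.0, 4.0], [1.0, 1.0, 1.0, 1.0], gamma=2.0)
```

With `gamma=1` the two bounds collapse to the ordinary weighted mean; widening `gamma` shows how quickly a conclusion about the policy could become uninformative.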
What would settle it
A randomized trial that assigns participants to follow the learned step-distribution policy versus usual care and then tracks changes in blood glucose, BMI, and blood pressure would directly test whether the recommended patterns produce the expected biomarker improvements.
Original abstract
Physical activity (PA) plays an important role in maintaining and improving health. Daily steps have been a key PA measure that is easily accessible with common wearable devices. However, methods are lacking to recommend a personalized optimal distribution of daily steps over a period of time for the best of certain health biomarkers. In this paper, we fill this void based on the data from the All of Us Research Program which includes months of step counts as well as repeated measurements of key health biomarkers. We develop a new offline reinforcement learning (RL) algorithm to learn personalized and optimal PA distributions associated with cardiometabolic risk, where the action is a function representing the daily step distribution over a period of time. Simulation studies demonstrate the advantage of the proposed approach over existing continuous-action RL methods. The learned optimal policy from the All of Us data generally suggests people take more daily steps and also follow a more consistent pattern of PA over time while offering tailored recommendations for subgroups in blood glucose level, body mass index, blood pressure, age, and sex.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a new offline reinforcement learning algorithm that treats the daily step count distribution over a multi-day period as a functional action. It applies this method to observational data from the All of Us Research Program to learn personalized policies that are claimed to optimize cardiometabolic biomarkers (blood glucose, BMI, blood pressure). Simulation studies are reported to show advantages over existing continuous-action RL baselines, and the real-data analysis concludes that the learned policy recommends higher and more consistent step counts with subgroup-specific tailoring by age, sex, and biomarker levels.
Significance. If the offline RL procedure can be shown to recover policies whose value functions reflect causal effects rather than spurious associations, the work would offer a practical framework for precision physical-activity prescriptions that leverage readily available wearable data. The functional-action formulation is a distinctive technical contribution that could be adopted in other longitudinal health-behavior settings.
major comments (3)
- [§4.2] (Offline RL algorithm and value-function estimation): the procedure learns the policy directly from the observational All of Us trajectories without any described adjustment for unmeasured confounding, time-varying covariates, or selection bias. Because the headline claim is that the resulting policy improves cardiometabolic markers, the absence of confounding control (e.g., via negative controls, instrumental variables, or explicit sensitivity analysis) is load-bearing for the causal interpretation of the subgroup recommendations.
- [§5.2] (Real-data results): no quantitative performance metrics, confidence intervals, or sensitivity checks are supplied for the All of Us policy; the text only states that the policy “generally suggests” more steps and consistency. This makes it impossible to judge the magnitude or robustness of the claimed biomarker improvements that underpin the personalized recommendations.
- [§3] (Simulation studies): while the abstract asserts an advantage over existing continuous-action RL methods, the specific evaluation metrics (regret, value-function difference, or biomarker improvement) and their variability across replications are not reported in sufficient detail to verify that the proposed functional-action approach is meaningfully superior under realistic confounding structures.
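The kind of reporting requested for the simulations can be sketched briefly. The Python example below computes regret per replication and summarises it as a mean with a standard error; the function name and the simulated numbers are placeholders invented for illustration and say nothing about the paper's actual results.

```python
import numpy as np

def summarize_regret(optimal_values, learned_values):
    """Mean regret and its standard error across simulation replications.

    Regret is the value of the optimal policy minus the value of the
    learned policy, computed separately for each replication.
    """
    regret = np.asarray(optimal_values, dtype=float) - np.asarray(learned_values, dtype=float)
    return regret.mean(), regret.std(ddof=1) / np.sqrt(regret.size)

# Placeholder values for 100 replications (synthetic, not from the paper).
rng = np.random.default_rng(0)
optimal = np.full(100, 1.0)
learned = optimal - rng.gamma(shape=2.0, scale=0.05, size=100)

mean_regret, se_regret = summarize_regret(optimal, learned)
```

Reporting both numbers for each method and each confounding scenario would let a reader judge whether a difference between methods exceeds replication noise.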
minor comments (2)
- [§2.1] The notation for the functional action space (step-count distribution over a horizon) is introduced without an explicit mathematical definition or illustrative plot; adding a short equation and example figure would improve readability.
- [§4.1] Several biomarker trajectories are described as “repeated measurements” but the exact number of observations per participant and the handling of missingness are not stated; a brief table summarizing the data structure would help.
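As an indication of what the explicit definition requested for §2.1 might look like, the following LaTeX sketch writes a step-count distribution as a quantile function and gives the standard log quantile density (LQD) transform, which the paper is reported to use. The symbols T, F, Q, f, and \psi are notation assumed here for illustration and may differ from the paper's own.

```latex
% Daily step counts X_1, ..., X_T in a window of T days, with distribution function F.
% Functional action: the quantile function of the daily step distribution.
Q(p) = \inf\{\, x : F(x) \ge p \,\}, \qquad p \in [0, 1].

% Log quantile density transform, with f = F' the density of daily steps.
\psi(p) = \log \frac{dQ}{dp}(p) = -\log f\bigl(Q(p)\bigr).

% Inverse map from an unconstrained function \psi back to a quantile function.
Q(p) = Q(0) + \int_0^p \exp\{\psi(s)\}\, ds.

% Average daily steps implied by the action.
\bar{X} = \int_0^1 Q(p)\, dp.
```

The appeal of such a transform is that \psi is unconstrained, so smoothing or optimisation can be done on \psi while the inverse map always returns a valid, increasing quantile function.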
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which have helped us identify areas for improvement in the manuscript. We address each major comment below and indicate the revisions we plan to make.
Point-by-point responses
- Referee: [§4.2] (Offline RL algorithm and value-function estimation): the procedure learns the policy directly from the observational All of Us trajectories without any described adjustment for unmeasured confounding, time-varying covariates, or selection bias. Because the headline claim is that the resulting policy improves cardiometabolic markers, the absence of confounding control (e.g., via negative controls, instrumental variables, or explicit sensitivity analysis) is load-bearing for the causal interpretation of the subgroup recommendations.
Authors: We agree that the manuscript would benefit from a clearer discussion of the assumptions underlying the causal interpretation of the learned policies. Our approach is based on offline RL applied to observational data, which implicitly relies on the no-unmeasured-confounding assumption for causal claims. In the revised manuscript, we will expand the methods section to explicitly state these assumptions and add a sensitivity analysis subsection. This will include bounding the policy value under different levels of unmeasured confounding and discussing potential time-varying confounders in the All of Us data context. We believe this will strengthen the interpretation without overclaiming causality.
Revision: yes
- Referee: [§5.2] (Real-data results): no quantitative performance metrics, confidence intervals, or sensitivity checks are supplied for the All of Us policy; the text only states that the policy “generally suggests” more steps and consistency. This makes it impossible to judge the magnitude or robustness of the claimed biomarker improvements that underpin the personalized recommendations.
Authors: We acknowledge the lack of quantitative details in the real-data analysis. To address this, we will revise §5.2 to include specific quantitative results, such as the average increase in recommended daily steps (with standard deviations), consistency measures (e.g., variance of daily steps), and estimated improvements in biomarker values under the learned policy. We will also add bootstrap confidence intervals for these estimates and perform sensitivity checks by varying the number of days in the functional action and the RL hyperparameters. These additions will allow for a better assessment of the magnitude and robustness of the findings.
Revision: yes
- Referee: [§3] (Simulation studies): while the abstract asserts an advantage over existing continuous-action RL methods, the specific evaluation metrics (regret, value-function difference, or biomarker improvement) and their variability across replications are not reported in sufficient detail to verify that the proposed functional-action approach is meaningfully superior under realistic confounding structures.
Authors: Thank you for this observation. The simulation studies in §3 compare our functional-action RL method against continuous-action baselines using metrics such as average regret and value function estimates. However, we recognize that more detailed reporting is needed. In the revision, we will include comprehensive tables showing the mean and standard error of these metrics across 100 simulation replications. We will also add experiments under simulated confounding structures to demonstrate performance under realistic conditions, thereby verifying the advantages more rigorously.
Revision: yes
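The bootstrap confidence intervals promised in the response on §5.2 could be computed along the following lines. This is a minimal percentile-bootstrap sketch in Python under stated assumptions: `bootstrap_ci` is a hypothetical helper, the per-participant step increases are synthetic, and resampling is done over participants rather than over days.

```python
import numpy as np

def bootstrap_ci(values, stat=np.mean, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic.

    Each entry of `values` is assumed to be one participant's summary,
    so whole participants are resampled with replacement.
    """
    rng = np.random.default_rng(seed)
    x = np.asarray(values, dtype=float)
    idx = rng.integers(0, x.size, size=(n_boot, x.size))   # resampled indices
    stats = np.apply_along_axis(stat, 1, x[idx])            # statistic per resample
    lo, hi = np.quantile(stats, [alpha / 2, 1 - alpha / 2])
    return lo, hi

# Synthetic per-participant differences: recommended minus observed mean
# daily steps (invented numbers, not All of Us results).
rng = np.random.default_rng(1)
step_increase = rng.normal(loc=1200.0, scale=800.0, size=300)

lo, hi = bootstrap_ci(step_increase)
```

Because the policy itself is learned from the same data, a fuller analysis would probably refit the policy inside each bootstrap resample; the sketch above only covers the final summary step.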
Circularity Check
No significant circularity; derivation is data-driven RL on observational inputs
full rationale
The paper develops an offline RL algorithm whose output (optimal policy for step distributions) is computed directly from the All of Us observational records via the proposed method. No equation or procedure reduces the target result to a quantity defined in terms of itself, nor renames a fitted parameter as a prediction. The abstract and described approach contain no self-citation load-bearing steps, uniqueness theorems imported from prior author work, or ansatzes smuggled via citation. The central claim remains an empirical learning result whose validity hinges on external assumptions about confounding rather than internal definitional equivalence.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean, theorem washburn_uniqueness_aczel. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
Paper passage: "We develop a new offline reinforcement learning (RL) algorithm to learn personalized and optimal PA distributions... using penalized splines... LQD transformation"
- IndisputableMonolith/Foundation/RealityFromDistinction.lean, theorem reality_from_one_distinction. Tag: unclear (relation between the paper passage and the cited Recognition theorem).
Paper passage: "Algorithm 2... policy update... maximizing averaged Q with roughness penalty on second derivative"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.