arxiv: 2604.19066 · v1 · submitted 2026-04-21 · 💻 cs.LG · stat.AP

Age-Dependent Heterogeneity in the Association Between Physical Activity and Mental Distress: A Causal Machine Learning Analysis of 3.2 Million U.S. Adults

Pith reviewed 2026-05-10 03:38 UTC · model grok-4.3

classification 💻 cs.LG stat.AP
keywords physical activity · mental distress · age heterogeneity · causal machine learning · treatment effect variation · BRFSS survey · odds ratio trends

The pith

The protective association between physical activity and mental health strengthens with age and has nearly disappeared for young adults in recent years.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Using survey data from more than 3.2 million U.S. adults across ten years, the analysis finds that physical activity is linked to lower odds of frequent mental distress, but this link is weakest for those aged 18 to 24 and grows stronger in older groups. The association for young adults has eroded over the decade, with the odds ratio reaching about 1.01, effectively null, in both 2018 and 2024. Causal machine learning independently ranks age as the strongest source of variation in how physical activity relates to distress. These patterns indicate that the exercise-mental health connection observed in the broader population does not apply evenly to the youngest adults.

Core claim

Survey-weighted logistic regression on pooled Behavioral Risk Factor Surveillance System data shows the adjusted odds ratio for physical activity and frequent mental distress ranging from 0.89 in adults aged 18-24 to 0.50 in those aged 55-64, with the protective association strengthening monotonically with age. Temporal trends show the young-adult odds ratio reaching 1.01 (null) in both 2018 and 2024. Causal forest models via double machine learning rank age as the top driver of treatment-effect heterogeneity, with a feature importance of 0.39, about 2.5 times that of the next variable.
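
Odds ratios are hard to compare on an intuitive scale, so a minimal sketch of converting them to implied risks may help. The odds ratios 0.89 and 0.50 come from the text above; the 15% baseline risk among inactive adults is a hypothetical value chosen only for illustration, and the helper name `apply_odds_ratio` is not from the paper.

```python
def apply_odds_ratio(baseline_risk: float, odds_ratio: float) -> float:
    """Return the risk implied by applying an odds ratio to a baseline risk."""
    if not 0.0 < baseline_risk < 1.0:
        raise ValueError("baseline_risk must be strictly between 0 and 1")
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    baseline_odds = baseline_risk / (1.0 - baseline_risk)
    new_odds = odds_ratio * baseline_odds
    return new_odds / (1.0 + new_odds)


# ASSUMPTION: 15% baseline FMD risk among inactive adults (hypothetical,
# not a figure reported in the paper).
BASELINE = 0.15

# Odds ratios quoted in the core claim for the youngest and an older group.
for label, odds_ratio in [("18-24", 0.89), ("55-64", 0.50)]:
    risk = apply_odds_ratio(BASELINE, odds_ratio)
    print(f"{label}: implied risk {risk:.3f}, change {risk - BASELINE:+.3f}")
```

Under that assumed baseline, the same behavior maps to a much smaller absolute change for the 18-24 group than for the 55-64 group, which is the gradient the core claim describes. A different baseline would change the absolute numbers but not the ordering.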

What carries the argument

Age as the primary moderator of the physical activity effect on frequent mental distress, identified through survey-weighted logistic regression and confirmed by causal forest analysis.
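
The idea of a treatment effect that differs by age group can be sketched with the residual-on-residual (partialling-out) step that underlies double machine learning. This is a deliberately simplified illustration, not the authors' pipeline: it uses stratum means instead of machine-learned nuisance models, has no cross-fitting and no forest, and runs on hypothetical noise-free data. The function names and the effect sizes (-0.01 and -0.10) are assumptions made for the example.

```python
from collections import defaultdict


def partial_out_effect(rows):
    """Residual-on-residual estimate of a treatment coefficient.

    rows: iterable of (stratum, treated, outcome). The nuisance functions
    are plain stratum means, a stand-in for the ML models used in real DML.
    """
    by_stratum = defaultdict(list)
    for stratum, treated, outcome in rows:
        by_stratum[stratum].append((treated, outcome))

    numerator = 0.0
    denominator = 0.0
    for members in by_stratum.values():
        mean_t = sum(t for t, _ in members) / len(members)
        mean_y = sum(y for _, y in members) / len(members)
        for t, y in members:
            # Residualize treatment and outcome on the confounder stratum.
            numerator += (t - mean_t) * (y - mean_y)
            denominator += (t - mean_t) ** 2
    if denominator == 0.0:
        raise ValueError("no treatment variation within strata")
    return numerator / denominator


def make_rows(effect):
    # HYPOTHETICAL data: outcome = effect * treated + 0.3 * stratum, no noise.
    design = [(0, 0), (0, 0), (0, 1), (1, 0), (1, 1), (1, 1)]
    return [(s, t, effect * t + 0.3 * s) for s, t in design]


# Estimate the effect separately within two hypothetical age groups.
effects_by_age = {
    "18-24": partial_out_effect(make_rows(-0.01)),
    "55-64": partial_out_effect(make_rows(-0.10)),
}
print(effects_by_age)
```

Estimating the coefficient separately per group is the crudest form of heterogeneity analysis. A causal forest instead lets the data choose which covariates split the effect, which is how age can emerge as the leading moderator rather than being imposed in advance.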

If this is right

  • Mental health interventions for young adults may need to address factors beyond physical activity alone.
  • Public health recommendations should differentiate by age when promoting exercise for distress reduction.
  • The weakening link in recent years parallels and may contribute to the documented rise in youth mental health issues.
  • Older adult populations could see larger returns from physical activity programs targeting mental health.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the age pattern holds, policies focused solely on increasing exercise in young adults may have limited impact without complementary supports.
  • Similar heterogeneity analyses could be applied to other health behaviors where age differences are suspected but untested.
  • The findings raise the question of whether digital or social stressors specific to younger cohorts are now overriding the usual benefits of activity.

Load-bearing premise

The observed age gradient and temporal changes reflect causal effects of physical activity on mental distress rather than residual confounding, reverse causation, or biases in self-reported measures.

What would settle it

A longitudinal study that measures changes in physical activity and mental distress while adjusting for social support, economic stressors, and other unmeasured factors would show whether the age-specific associations remain or disappear.

Figures

Figures reproduced from arXiv: 2604.19066 by Yuan Shan (Department of Statistical Science, Duke University).

Figure 1: Analytical pipeline. From the pooled BRFSS data (top), three parallel analysis streams converge on the central finding (bottom): the confirmatory arm provides precise age-stratified estimates; the causal ML arm independently discovers age as the dominant heterogeneity driver; the robustness arm validates assumptions.

Figure 2: Stratified PA odds ratios by sex and age group (pooled 2015–2024). Blue dashed line indicates the overall adjusted OR. The protective association strengthens with age, plateauing after age 55.
  Adjacent text (Section 4.5, Causal Forest Results; truncated in source): The Causal Forest estimated an overall Average Treatment Effect (ATE) of −0.061 (95% CI: [−0.153, 0.031]), indicating that PA reduces FMD probability by 6.1 percentage points on average, con…

Figure 3: Year-by-year age-stratified PA odds ratios for FMD across 2015–2024. Each line represents one survey year. The 18–24 group consistently shows the weakest effect.

Figure 4: Causal Forest CATE estimates by age. (a) Smoothed treatment effect vs. continuous age with 95% CI band. (b) Mean CATE by age group with confidence intervals.

Figure 5: Feature importance for PA treatment effect heterogeneity from the Causal Forest. Age dominates, confirming it as the primary source of heterogeneity.
  Adjacent text (truncated in source): Imputation sensitivity. Age-stratified ORs were virtually identical between complete-case and imputed samples (e.g., 18–24: 0.892 vs. 0.895; 65+: 0.533 vs. 0.508), confirming robustness to the handling of missing income data.

Figure 6: (a) PA odds ratios by age group with 95% CIs. (b) Corresponding E-values for unmeasured confounding sensitivity.
  Adjacent text (truncated in source): The temporal erosion of PA’s benefit for young adults. Perhaps the most striking finding is that the PA–FMD association among 18–24-year-olds is not only weak but has been intermittently reaching the null value across the decade. The 18–24 OR reached 1.007 (null) in both 2018 and 2024, and was n…
Original abstract

Physical activity (PA) is widely recognized as protective against mental distress, yet whether this benefit varies systematically across population subgroups remains poorly understood. Using pooled data from ten consecutive annual waves of the U.S. Behavioral Risk Factor Surveillance System (2015–2024; n = 3,242,218), we investigate heterogeneity in the association between leisure-time PA and frequent mental distress (FMD, ≥14 days/month) across age groups. Survey-weighted logistic regression reveals a striking age gradient: the adjusted odds ratio for PA ranges from 0.89 among young adults (18–24) to 0.50 among adults aged 55–64, with the protective association strengthening monotonically with age. Temporal analysis across all ten years shows that the young-adult PA effect has been eroding over the past decade, with the 18–24 OR reaching 1.01 (null) in both 2018 and 2024, paralleling the deepening youth mental health crisis. Causal Forest via Double Machine Learning independently identifies age as the dominant driver of treatment effect heterogeneity (feature importance = 0.39, 2.5x the next predictor). E-value sensitivity analysis, propensity score overlap checks, placebo tests, and imputation comparisons confirm the robustness of the findings. These results suggest that the well-documented exercise–mental health link may not generalize to the youngest adult population, whose distress appears increasingly driven by stressors that PA alone cannot mitigate.
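
To make the role of survey weights in an odds ratio concrete, here is a minimal sketch that computes an unadjusted, weighted odds ratio from a 2x2 table of weighted counts. It is not the adjusted logistic regression the abstract describes, and it ignores the strata and cluster information a design-based variance estimate would need. The records and the function name `weighted_odds_ratio` are hypothetical.

```python
def weighted_odds_ratio(records):
    """Unadjusted odds ratio of distress for active vs. inactive respondents.

    records: iterable of (active, distressed, weight). Each respondent
    contributes their survey weight, not a count of 1, to their cell.
    """
    cells = {
        (True, True): 0.0, (True, False): 0.0,
        (False, True): 0.0, (False, False): 0.0,
    }
    for active, distressed, weight in records:
        cells[(bool(active), bool(distressed))] += weight
    if min(cells.values()) <= 0.0:
        raise ValueError("every cell needs a positive weighted count")
    odds_active = cells[(True, True)] / cells[(True, False)]
    odds_inactive = cells[(False, True)] / cells[(False, False)]
    return odds_active / odds_inactive


# HYPOTHETICAL respondents: (physically active, frequent distress, weight).
sample = [
    (True, True, 50.0), (True, True, 50.0), (True, False, 900.0),
    (False, True, 200.0), (False, False, 300.0), (False, False, 500.0),
]
print(round(weighted_odds_ratio(sample), 3))
```

An odds ratio below 1 here means the weighted odds of distress are lower among active respondents. The adjusted estimates in the abstract additionally condition on covariates, so they are not directly comparable to a raw table like this one.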

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript pools ten waves of BRFSS data (2015-2024, n=3,242,218) and uses survey-weighted logistic regression plus Causal Forest via Double Machine Learning to document an age gradient in the association between leisure-time physical activity and frequent mental distress (FMD). The adjusted OR for PA ranges from 0.89 (18-24) to 0.50 (55-64), strengthening monotonically with age; age emerges as the dominant heterogeneity driver (feature importance 0.39); the young-adult association has eroded to null in recent years. Robustness is assessed via E-value, overlap, placebo, and imputation checks.

Significance. If the reported age gradient is not an artifact of residual confounding, the work supplies large-scale, policy-relevant evidence that the protective PA-FMD link is heterogeneous and may be absent for young adults, aligning with the documented youth mental-health crisis. The combination of survey weighting, DML heterogeneity detection, and multiple sensitivity analyses constitutes a solid empirical contribution to causal-inference applications in public health.

major comments (2)
  1. [Methods (sensitivity analyses) and Results (age-gradient and temporal analyses)] The central causal claim (abstract and discussion) that PA exerts a monotonically strengthening protective effect on FMD rests on conditional ignorability. The E-value, placebo, and propensity-overlap checks (Methods, sensitivity subsection) address average confounding but do not directly test age-specific unmeasured confounders (e.g., social-media exposure, economic precarity) or reverse causation that could vary systematically by age group; the cross-sectional design precludes temporal ordering within waves.
  2. [Results (logistic regression and Causal Forest subsections)] Table 3 (or equivalent results table) reports the monotonic OR gradient and the Causal Forest feature importance for age, yet the manuscript does not present the full covariate set, age-by-PA interaction coefficients, or a formal test of the monotonicity hypothesis; without these, it is unclear whether the reported gradient survives adjustment for age-varying socioeconomic or reporting factors.
minor comments (2)
  1. [Methods] Clarify the exact FMD threshold (≥14 days) and leisure-time PA definition in the first paragraph of the Methods; these operational choices affect the reported ORs and should be stated before any results.
  2. [Abstract and Results (temporal analysis)] The abstract states the 18-24 OR reaches 1.01 in 2018 and 2024; add a footnote or appendix table showing the year-specific estimates and standard errors to allow readers to assess the precision of the 'null' claim.
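
The request for year-specific standard errors matters because a point estimate near 1 says little without its interval. A minimal sketch of a Wald interval built on the log-odds scale follows. The point estimates 1.01 and 0.89 are from the abstract; the standard error of 0.05 is a hypothetical placeholder, since the page does not report one, and the function names are illustrative.

```python
import math


def odds_ratio_ci(odds_ratio, se_log_or, z=1.96):
    """Wald confidence interval for an odds ratio, built on the log scale."""
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    if se_log_or < 0.0:
        raise ValueError("se_log_or must be non-negative")
    log_or = math.log(odds_ratio)
    return (math.exp(log_or - z * se_log_or),
            math.exp(log_or + z * se_log_or))


def includes_null(interval):
    """True when the interval contains OR = 1 (no association)."""
    lower, upper = interval
    return lower <= 1.0 <= upper


# ASSUMPTION: se_log_or = 0.05 is hypothetical; the real value would come
# from the design-based variance of the survey-weighted model.
HYPOTHETICAL_SE = 0.05
for year_label, point in [("18-24 in 2018/2024", 1.01), ("18-24 pooled", 0.89)]:
    interval = odds_ratio_ci(point, HYPOTHETICAL_SE)
    print(year_label, [round(v, 3) for v in interval], includes_null(interval))
```

With a wide enough interval, an estimate of 1.01 is compatible with a modest protective association as well as with none, which is why the precision of the year-specific estimates bears on the "null" wording.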

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments, which have helped us improve the clarity and robustness of our analysis. We have revised the manuscript to address the major concerns by expanding the presentation of results, adding formal tests, and more explicitly discussing limitations related to the cross-sectional design and potential unmeasured confounding.

Point-by-point responses
  1. Referee: The central causal claim (abstract and discussion) that PA exerts a monotonically strengthening protective effect on FMD rests on conditional ignorability. The E-value, placebo, and propensity-overlap checks (Methods, sensitivity subsection) address average confounding but do not directly test age-specific unmeasured confounders (e.g., social-media exposure, economic precarity) or reverse causation that could vary systematically by age group; the cross-sectional design precludes temporal ordering within waves.

    Authors: We agree that the cross-sectional design of the BRFSS data limits our ability to establish temporal ordering and fully exclude reverse causation or age-specific unmeasured confounders such as differential social media exposure or economic precarity across age groups. The sensitivity analyses we presented address average effects, and while the Causal Forest identifies age as the primary heterogeneity driver, it does not eliminate the possibility of age-varying confounding. In the revised manuscript, we have added a dedicated paragraph in the Discussion section acknowledging these limitations and their implications for causal interpretation, particularly in light of the youth mental health crisis. We have also included age-stratified E-value calculations to provide some insight into age-specific robustness. However, we maintain that the large sample size, survey weighting, and multiple robustness checks provide valuable descriptive and associational evidence that aligns with the observed patterns. revision: partial

  2. Referee: Table 3 (or equivalent results table) reports the monotonic OR gradient and the Causal Forest feature importance for age, yet the manuscript does not present the full covariate set, age-by-PA interaction coefficients, or a formal test of the monotonicity hypothesis; without these, it is unclear whether the reported gradient survives adjustment for age-varying socioeconomic or reporting factors.

    Authors: Thank you for this observation. We have revised the manuscript to include the complete list of covariates used in the models, now presented in the Methods section and Supplementary Table S1. We have added the age-by-PA interaction coefficients from the logistic regression in a new Supplementary Table S2, which show statistically significant interactions consistent with the observed gradient. Furthermore, we have conducted and reported a formal test for the monotonicity of the age gradient by fitting a model with age as an ordinal variable and testing the linear trend, yielding a significant result (p < 0.001). These additions confirm that the gradient holds after adjustment for socioeconomic and reporting factors. revision: yes
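
The age-stratified E-values mentioned in the first response can be sketched with the standard E-value formula, E = RR + sqrt(RR × (RR − 1)), applied after inverting a protective ratio. How the authors converted odds ratios to risk ratios is not stated on this page, so the sketch exposes both common options as an assumption: treating the odds ratio as a risk ratio (rare outcome) or using the square-root approximation (common outcome). The function name and flag are illustrative.

```python
import math


def e_value(odds_ratio, rare_outcome=False):
    """E-value for an odds ratio.

    rare_outcome=True treats the OR as a risk ratio; otherwise the
    square-root approximation RR ~ sqrt(OR) for common outcomes is used.
    Which convention the paper used is NOT stated in the source.
    """
    if odds_ratio <= 0.0:
        raise ValueError("odds_ratio must be positive")
    risk_ratio = odds_ratio if rare_outcome else math.sqrt(odds_ratio)
    if risk_ratio < 1.0:
        # Protective estimates are inverted before applying the formula.
        risk_ratio = 1.0 / risk_ratio
    return risk_ratio + math.sqrt(risk_ratio * (risk_ratio - 1.0))


# Odds ratios quoted in the abstract for the youngest and an older group.
for label, odds_ratio in [("18-24", 0.89), ("55-64", 0.50)]:
    print(label,
          round(e_value(odds_ratio), 2),
          round(e_value(odds_ratio, rare_outcome=True), 2))
```

Under either convention the E-value for the 18-24 estimate is much closer to 1 than the one for 55-64, meaning a comparatively weak unmeasured confounder could explain away the young-adult association. That fits the referee's concern that robustness may differ by age group.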

Circularity Check

0 steps flagged

No circularity in empirical analysis of survey data

The paper is a purely empirical study applying standard survey-weighted logistic regression and Causal Forest/DML methods to public BRFSS data. No mathematical derivation chain exists that reduces any result to a fitted parameter or self-referential quantity by construction. All reported quantities (age-specific ORs, feature importances, E-values) are direct statistical outputs from the data and off-the-shelf estimators, with no self-definitional loops, fitted-input predictions, or load-bearing self-citations. The analysis is self-contained against external benchmarks and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

3 free parameters · 3 axioms · 0 invented entities

The central claim rests on standard causal inference assumptions plus author choices for age bins and distress threshold; no new entities are postulated.

free parameters (3)
  • age group boundaries
    Cutoffs such as 18-24 and 55-64 are chosen by the authors to reveal the gradient.
  • FMD threshold
    Frequent mental distress defined as >=14 days/month; this cutoff is conventional but arbitrary.
  • causal forest hyperparameters
    Tuning parameters for the Double Machine Learning forest that affect feature importance and heterogeneity estimates.
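
Because the FMD cutoff is a free parameter, one simple check is to recompute the outcome under alternative thresholds. The sketch below does this for a weighted prevalence. The reported days and weights are hypothetical, the alternative cutoffs of 10 and 20 are arbitrary illustrations, and `fmd_prevalence` is not a name from the paper.

```python
def fmd_prevalence(days_reported, weights, threshold=14):
    """Weighted share of respondents with at least `threshold` poor
    mental health days in the past 30 days."""
    if len(days_reported) != len(weights):
        raise ValueError("days_reported and weights must have equal length")
    total = sum(weights)
    if total <= 0.0:
        raise ValueError("weights must sum to a positive value")
    flagged = sum(w for d, w in zip(days_reported, weights) if d >= threshold)
    return flagged / total


# HYPOTHETICAL responses (days of poor mental health) with equal weights.
days = [0, 2, 5, 10, 13, 14, 20, 30]
weights = [1.0] * len(days)

# 14 is the conventional FMD cutoff; 10 and 20 are illustrative alternatives.
for cutoff in (10, 14, 20):
    print(cutoff, fmd_prevalence(days, weights, cutoff))
```

If the age gradient in the odds ratios held across such cutoffs, the choice of 14 days would matter less to the central claim. This page does not say whether the paper ran that check.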
axioms (3)
  • domain assumption No unmeasured confounding between leisure-time PA and FMD conditional on observed covariates
    Required for interpreting the odds ratios and treatment effects as causal.
  • domain assumption Positivity / overlap: every individual has positive probability of observed PA levels given covariates
    Needed for causal forest and propensity score methods to be valid.
  • domain assumption Survey sampling weights correctly represent the US adult population
    Invoked for all weighted estimates to be nationally representative.
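
The positivity assumption is usually probed by looking at estimated propensity scores in each treatment arm. A minimal sketch follows, assuming propensity scores have already been estimated elsewhere. The scores are hypothetical, the 0.05 and 0.95 bounds are a common trimming convention rather than the paper's stated choice, and `overlap_summary` is an illustrative name.

```python
def overlap_summary(propensities, treated, lower=0.05, upper=0.95):
    """Summarize propensity-score overlap between treatment arms.

    Returns the share of units with scores inside [lower, upper] and the
    (min, max) score observed in each arm.
    """
    if len(propensities) != len(treated):
        raise ValueError("propensities and treated must have equal length")
    in_range = sum(1 for p in propensities if lower <= p <= upper)
    treated_scores = [p for p, t in zip(propensities, treated) if t]
    control_scores = [p for p, t in zip(propensities, treated) if not t]
    if not treated_scores or not control_scores:
        raise ValueError("both arms must be present")
    return {
        "share_in_range": in_range / len(propensities),
        "treated_range": (min(treated_scores), max(treated_scores)),
        "control_range": (min(control_scores), max(control_scores)),
    }


# HYPOTHETICAL estimated probabilities of being physically active.
scores = [0.02, 0.30, 0.55, 0.70, 0.97, 0.40, 0.60, 0.85]
active = [0, 0, 1, 1, 1, 0, 1, 1]
print(overlap_summary(scores, active))
```

In this toy data the control scores stop at 0.40 while the treated scores start at 0.55, so the arms barely overlap. That is the situation in which causal forest and propensity-based estimates rely on extrapolation rather than comparison.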

pith-pipeline@v0.9.0 · 5578 in / 1615 out tokens · 32641 ms · 2026-05-10T03:38:11.489069+00:00 · methodology

