pith. machine review for the scientific record. sign in

arxiv: 2604.06990 · v1 · submitted 2026-04-08 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links

· Lean Theorem

Stress Estimation in Elderly Oncology Patients Using Visual Wearable Representations and Multi-Instance Learning

Authors on Pith no claims yet

Pith reviewed 2026-05-10 17:43 UTC · model grok-4.3

classification 💻 cs.LG cs.AI
keywords stress estimationwearable sensorsmultiple instance learningelderly oncologyperceived stress scalevisual representationsweak supervisionbreast cancer
0
0 comments X

The pith

Wearable sensor data enables moderate prediction of perceived stress in elderly breast cancer patients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to estimate psychological stress from passive multimodal wearable devices in elderly oncology patients, where stress affects cardio-oncology outcomes but is typically measured only through infrequent self-report questionnaires. By converting activity, sleep, and ECG streams into visual representations and applying attention-based multiple instance learning on embeddings from a lightweight pretrained model, the approach handles the weak supervision setting of one questionnaire score per many unlabeled windows. Under leave-one-subject-out testing, this produces moderate agreement with Perceived Stress Scale scores at three and six months, pointing toward continuous monitoring integrated into routine surveillance.

Core claim

Transforming multimodal wearable streams into heterogeneous visual representations, embedding them with a pretrained Tiny-BioMoE into 192-dimensional vectors, and aggregating via attention-based multiple instance learning enables prediction of Perceived Stress Scale scores that show moderate agreement with questionnaire results (R²=0.24 at month 3 and R²=0.28 at month 6) under leave-one-subject-out evaluation in an elderly multicenter breast cancer cohort.

What carries the argument

Attention-based multiple instance learning aggregator operating on 192-dimensional embeddings from a lightweight pretrained mixture-of-experts model applied to visual representations of physical activity, sleep, and ECG data.

Load-bearing premise

That heterogeneous visual representations derived from multimodal wearable streams contain sufficient generalizable information about perceived stress to support accurate prediction under weak supervision in a new elderly oncology cohort.

What would settle it

Repeating the leave-one-subject-out protocol on data from an independent elderly oncology cohort and finding correlations below 0.3 with actual Perceived Stress Scale scores would show the predictions do not generalize.

Figures

Figures reproduced from arXiv: 2604.06990 by Anastasia Constantinidou, Andri Papakonstantinou, Aristofania Simatou, Dimitar Stefanovski, Dimitrios I. Fotiadis, Georgia Karanasiou, Ioannis Kyprakis, Kalliopi Keramida, Ketti Mazzocco, Konstantinos Marias, Manolis Tsiknakis, Vasileios Skaramagkas, Vasilis Bouratzis.

Figure 1
Figure 1. Figure 1: Example of visual representations used. From left to right: Physical Activity Heatmap, Hypnogram, Sleep Heatmap, Scalogram, Recurence, Poincar [PITH_FULL_IMAGE:figures/full_fig_p005_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: A schematic illustration of the proposed pipeline; The first two images (human icons) were generated using Google Gemini [30] (synthetic images). [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
read the original abstract

Psychological stress is clinically relevant in cardio-oncology, yet it is typically assessed only through patient-reported outcome measures (PROMs) and is rarely integrated into continuous cardiotoxicity surveillance. We estimate perceived stress in an elderly, multicenter breast cancer cohort (CARDIOCARE) using multimodal wearable data from a smartwatch (physical activity and sleep) and a chest-worn ECG sensor. Wearable streams are transformed into heterogeneous visual representations, yielding a weakly supervised setting in which a single Perceived Stress Scale (PSS) score corresponds to many unlabeled windows. A lightweight pretrained mixture-of-experts backbone (Tiny-BioMoE) embeds each representation into 192-dimensional vectors, which are aggregated via attention-based multiple instance learning (MIL) to predict PSS at month 3 (M3) and month 6 (M6). Under leave-one-subject-out (LOSO) evaluation, predictions showed moderate agreement with questionnaire scores (M3: R^2=0.24, Pearson r=0.42, Spearman rho=0.48; M6: R^2=0.28, Pearson r=0.49, Spearman rho=0.52), with global RMSE/MAE of 6.62/6.07 at M3 and 6.13/5.54 at M6.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript presents a machine learning pipeline for estimating perceived stress in elderly oncology patients from multimodal wearable sensor data. Wearable streams (physical activity, sleep from smartwatch; ECG from chest sensor) are converted to heterogeneous visual representations, embedded using a pretrained Tiny-BioMoE model into 192-dimensional vectors, and aggregated using attention-based multiple instance learning (MIL) to predict single Perceived Stress Scale (PSS) scores per subject at months 3 and 6. Under leave-one-subject-out evaluation on the CARDIOCARE cohort, the model achieves moderate agreement with questionnaire scores (R²=0.24 at M3, R²=0.28 at M6).

Significance. If the results hold after addressing potential confounds, this approach could contribute to integrating continuous stress assessment into cardio-oncology monitoring, reducing reliance on infrequent PROMs. The use of visual representations and MIL for weak supervision is a reasonable adaptation for the setting where only one label per subject is available. However, the moderate R² values limit immediate clinical impact, and the significance hinges on demonstrating that predictions are driven by stress-related patterns rather than subject-level covariates.

major comments (2)
  1. [Abstract / Evaluation section] Abstract / Evaluation: The reported metrics show only moderate explanatory power (R²=0.24–0.28), which is equivalent to limited predictive utility; this undermines the claim of effective stress estimation unless accompanied by ablation studies showing improvement over baselines that use only non-stress covariates such as age or treatment stage.
  2. [Methods] Methods (visual representation and MIL): No details are provided on the construction of the heterogeneous visual representations from wearable streams, the training procedure for the MIL head, hyperparameter selection, or any checks for confounding factors (e.g., age, treatment effects) that are constant within subjects but vary across LOSO folds. This is load-bearing because the concern that attention may latch onto spurious correlations cannot be ruled out without such controls.
minor comments (2)
  1. [Abstract] The abstract mentions 'global RMSE/MAE' but does not specify if these are computed per-subject or aggregated; clarify the exact evaluation protocol.
  2. [Results] Consider adding a table comparing to simple baselines (e.g., mean predictor or linear regression on subject metadata) to contextualize the R² values.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We appreciate the referee's insightful comments, which have helped us improve the manuscript. We address each major comment below and have made revisions to strengthen the evaluation and methods sections.

read point-by-point responses
  1. Referee: [Abstract / Evaluation section] Abstract / Evaluation: The reported metrics show only moderate explanatory power (R²=0.24–0.28), which is equivalent to limited predictive utility; this undermines the claim of effective stress estimation unless accompanied by ablation studies showing improvement over baselines that use only non-stress covariates such as age or treatment stage.

    Authors: We agree that the moderate R² values warrant further validation to confirm the model's reliance on stress-related signals. Accordingly, we have conducted ablation experiments in the revised manuscript. These include training and evaluating baseline models using only subject-level covariates (age, cancer treatment stage, and other demographic factors) under the same leave-one-subject-out protocol. The results demonstrate that the full model, which incorporates the visual embeddings from wearable data, achieves higher R² and correlation metrics compared to the covariate-only baselines. This supports that the predictions are not solely driven by non-stress covariates. We have also clarified in the evaluation section that while the absolute performance is moderate, the relative improvement highlights the value of the wearable-based approach in this challenging setting. revision: yes

  2. Referee: [Methods] Methods (visual representation and MIL): No details are provided on the construction of the heterogeneous visual representations from wearable streams, the training procedure for the MIL head, hyperparameter selection, or any checks for confounding factors (e.g., age, treatment effects) that are constant within subjects but vary across LOSO folds. This is load-bearing because the concern that attention may latch onto spurious correlations cannot be ruled out without such controls.

    Authors: We concur that expanded methodological transparency is essential. The revised Methods section now provides: (i) explicit details on generating the heterogeneous visual representations, such as converting activity counts to bar plots, sleep metrics to timeline visualizations, and ECG signals to spectrograms or waveform images; (ii) the full specification of the MIL head, including the attention pooling mechanism, the prediction head architecture, and the training objective (e.g., mean squared error loss with Adam optimizer); (iii) the hyperparameter selection strategy, which involved a grid search over learning rates, attention dimensions, and number of experts in the backbone, validated on a held-out subset; and (iv) confounding checks, including Pearson correlations between model predictions and covariates, as well as retraining with covariates concatenated to the embeddings to assess if they explain additional variance. These controls help mitigate concerns about spurious subject-level correlations in the LOSO evaluation. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical ML pipeline evaluated on external labels

full rationale

The manuscript describes a standard supervised learning pipeline: wearable streams are converted to heterogeneous visual representations, embedded by a pretrained Tiny-BioMoE model into 192-dim vectors, aggregated by attention MIL, and regressed against single per-subject PSS questionnaire scores. Evaluation uses LOSO cross-validation with reported R², Pearson, Spearman, RMSE and MAE metrics against those independent ground-truth labels. No equations, derivations, fitted parameters renamed as predictions, self-citations invoked as uniqueness theorems, or ansatzes appear in the provided text. The central claim therefore rests on empirical correlation with external data rather than any tautological reduction of outputs to inputs by construction.

Axiom & Free-Parameter Ledger

1 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard machine-learning assumptions and a pretrained backbone; no new physical entities or ad-hoc constants are introduced beyond typical model parameters.

free parameters (1)
  • MIL attention weights
    Learned parameters that aggregate instance embeddings into a bag-level prediction; fitted during training on the target cohort.
axioms (1)
  • domain assumption Wearable-derived visual representations encode patterns correlated with self-reported psychological stress
    Invoked to justify the feasibility of the prediction task from activity, sleep, and ECG streams.

pith-pipeline@v0.9.0 · 5608 in / 1409 out tokens · 93559 ms · 2026-05-10T17:43:48.498757+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 24 canonical work pages

  1. [1]

    Protective and damaging effects of stress mediators,

    B. S. McEwen, “Protective and damaging effects of stress mediators,” New England Journal of Medicine, vol. 338, no. 3, pp. 171–179, 1998, doi: 10.1056/NEJM199801153380307

  2. [2]

    Stress and cardiovascular disease,

    A. Steptoe and M. Kivim ¨aki, “Stress and cardiovascular disease,” Nature Reviews Cardiology, vol. 9, no. 6, pp. 360–370, 2012, doi: 10.1038/nrcardio.2012.45

  3. [3]

    Mind over malignancy: A systematic review and meta-analysis of psychological distress, coping, and therapeutic interventions in oncology,

    A. M. Paslaruet al., “Mind over malignancy: A systematic review and meta-analysis of psychological distress, coping, and therapeutic interventions in oncology,”Medicina, vol. 61, no. 6, p. 1086, 2025, doi: 10.3390/medicina61061086

  4. [4]

    A. R. Lyonet al., “Baseline cardiovascular risk assessment in cancer patients scheduled to receive cardiotoxic cancer therapies: A position statement and new risk assessment tools from the Cardio-Oncology Study Group of the Heart Failure Association of the European Society of Cardiology in collaboration with the International Cardio-Oncology Society,”Euro...

  5. [5]

    Cardiotoxicity in elderly breast cancer patients,

    K. Keramida et al., “Cardiotoxicity in elderly breast cancer patients,” Cancers, vol. 17, no. 13, p. 2198, 2025, doi: 10.3390/cancers17132198

  6. [6]

    A., and Kaplan, J

    Rozanski, A., Blumenthal, J. A., and Kaplan, J. (1999). Impact of psychological factors on the pathogenesis of cardiovascular disease. Circulation, 99(16), 2192–2217. doi: 10.1161/01.CIR.99.16.2192

  7. [7]

    Reduced heart rate variability and mortality risk

    Tsuji, H.et al., (1996). Reduced heart rate variability and mortality risk. Circulation, 94(11), 2850–2855. doi: 10.1161/01.CIR.94.11.2850

  8. [8]

    Stress management interventions to facilitate psychological and physiological adaptation and optimal health outcomes in cancer patients and survivors,

    M. H. Antoniet al., “Stress management interventions to facilitate psychological and physiological adaptation and optimal health outcomes in cancer patients and survivors,”Annual Review of Psychology, vol. 74, pp. 423–455, 2023, doi: 10.1146/annurev-psych-030122-124119

  9. [9]

    Detec- tion and monitoring of stress using wearables: A systematic review,

    A. Pinge, V . Gad, D. Jaisighani, S. Ghosh, and S. Sen, “Detec- tion and monitoring of stress using wearables: A systematic review,” Frontiers in Computer Science, vol. 6, Art. no. 1478851, 2024, doi: 10.3389/fcomp.2024.1478851

  10. [10]

    Large-scale wearable data reveal digital phenotypes for daily-life stress detection,

    E. Smets et al., “Large-scale wearable data reveal digital phenotypes for daily-life stress detection,” npj Digital Medicine, vol. 1, no. 1, p. 67, 2018, doi: 10.1038/s41746-018-0074-9

  11. [11]

    Wearables and the medical revolu- tion,

    J. Dunn, R. Runge, and M. Snyder, “Wearables and the medical revolu- tion,” Per. Med., vol. 15, no. 5, pp. 429–448, 2018, doi: 10.2217/pme- 2018-0044

  12. [12]

    A deep learning approach to stress recognition through multimodal physiological signal image transformation,

    S. Yanget al., “A deep learning approach to stress recognition through multimodal physiological signal image transformation,”Scientific Re- ports, vol. 15, art. no. 22258, 2025, doi: 10.1038/s41598-025-01228-3

  13. [13]

    Multi-representation dia- grams for pain recognition: Integrating various electrodermal activity signals into a single image,

    S. Gkikas, I. Kyprakis, and M. Tsiknakis, “Multi-representation dia- grams for pain recognition: Integrating various electrodermal activity signals into a single image,” inCompanion Proc. 27th Int. Conf. on Multimodal Interaction (ICMI Companion), New York, NY , USA: ACM, 2025, pp. 162–171, doi: 10.1145/3747327.3764793

  14. [14]

    Dn-splatter: Depth and normal priors for gaussian splatting and meshing

    S. Ziaratnia, T. Laohakangvalvit, M. Sugaya, and P. Sripian, “Mul- timodal deep learning for remote stress estimation using CCT- LSTM,” inProc. IEEE/CVF Winter Conf. on Applications of Com- puter Vision (WACV), Waikoloa, HI, USA, 2024, pp. 8321–8329, doi: 10.1109/W ACV57701.2024.00815

  15. [15]

    Generalizable machine learning for stress monitoring from wearable devices: A sys- tematic literature review,

    G. V os, K. Trinh, Z. Sarnyai, and M. R. Azghadi, “Generalizable machine learning for stress monitoring from wearable devices: A sys- tematic literature review,”International Journal of Medical Informatics, vol. 173, p. 105026, 2023, doi: 10.1016/j.ijmedinf.2023.105026

  16. [16]

    Attention-based deep multiple instance learning,

    M. Ilse, J. M. Tomczak, and M. Welling, “Attention-based deep multiple instance learning,” in Proceedings of the 35th International Confer- ence on Machine Learning (ICML), Stockholm, Sweden, 2018, pp. 2127–2136

  17. [17]

    Toward Efficient Inference for Mixture of Experts,

    H. Huang, N. Ardalani, A. Sun, L. Ke, H.-H. S. Lee, S. Bhosale, C.-J. Wu, and B. Lee, “Toward Efficient Inference for Mixture of Experts,” in Proc. 38th Conf. Neural Information Processing Systems (NeurIPS), 2024

  18. [18]

    Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity,

    W. Fedus, B. Zoph, and N. Shazeer, “Switch transformers: Scaling to trillion parameter models with simple and efficient sparsity,”J. Mach. Learn. Res., vol. 23, no. 1, art. no. 120, pp. 1–39, Jan. 2022

  19. [19]

    Home – CARDIOCARE

    “Home – CARDIOCARE.” [Online]. Available: https://cardiocare- project.eu/ Accessed: Dec. 2025

  20. [20]

    AI and smart devices in cardio-oncology: Advance- ments in cardiotoxicity prediction and cardiovascular monitoring,

    L. C. Nechita,et al., “AI and smart devices in cardio-oncology: Advance- ments in cardiotoxicity prediction and cardiovascular monitoring,”Diag- nostics, vol. 15, no. 6, p. 787, 2025, doi: 10.3390/diagnostics15060787

  21. [21]

    Early detection of chronic stress using wearable devices: A machine learning approach with the WESAD database,

    A. Calvo, J. Martin, and C. Martin, “Early detection of chronic stress using wearable devices: A machine learning approach with the WESAD database,” inProc. 11th Int. Conf. Information and Communication Technologies for Ageing Well and e-Health (ICT4AWE), 2025, pp. 189–196, doi: 10.5220/0013209700003938

  22. [22]

    Wearable flexible electronics based cardiac electrode for researcher mental stress detection system using machine learning models on single lead electrocardiogram signal,

    M. B. Bin Heyatet al., “Wearable flexible electronics based cardiac electrode for researcher mental stress detection system using machine learning models on single lead electrocardiogram signal,” Biosensors, vol. 12, no. 6, p. 427, 2022, doi: 10.3390/bios12060427

  23. [23]

    Extending Stress Detection Reproducibility to Consumer Wearable Sensors,

    O. B. Amin, V . Mishra, T. M. Tapera, R. V olpe, and A. Sathyanarayana, “Extending Stress Detection Reproducibility to Consumer Wearable Sensors,”arXiv preprint arXiv:2505.05694, 2025. [Online]. Available: https://arxiv.org/abs/2505.05694

  24. [24]

    Alignment between heart rate variability from fitness trackers and perceived stress: Per- spectives from a large-scale in situ longitudinal study of information workers,

    G. J. Martinez, T. Grover, S. M. Mattingly,et al., “Alignment between heart rate variability from fitness trackers and perceived stress: Per- spectives from a large-scale in situ longitudinal study of information workers,”JMIR Human Factors, vol. 9, no. 3, p. e33754, Aug. 2022, doi: 10.2196/33754

  25. [25]

    Visualizing relaxation in wearables: Multi-domain feature fusion of HRV using fuzzy recurrence plots,

    P. Arya,et al., “Visualizing relaxation in wearables: Multi-domain feature fusion of HRV using fuzzy recurrence plots,”Sensors, vol. 25, no. 13, p. 4210, Jul. 2025, doi: 10.3390/s25134210

  26. [26]

    Venu SQ smartwatch,

    Garmin Ltd., “Venu SQ smartwatch,” [Online]. Available: https://www.garmin.com/en-US/p/707174/. Accessed: Dec. 2025

  27. [27]

    Polar H10 heart rate sensor,

    Polar Electro Oy, “Polar H10 heart rate sensor,” [Online]. Available: https://www.polar.com/en/sensors/h10-heart-rate-sensor. Accessed: Dec. 2025

  28. [28]

    A global measure of perceived stress,

    S. Cohen, T. Kamarck, and R. Mermelstein, “A global measure of perceived stress,”Journal of Health and Social Behavior, vol. 24, no. 4, pp. 385–396, 1983, doi: 10.2307/2136404

  29. [29]

    Lau, Jan C

    D. Makowskiet al., “NeuroKit2: A Python toolbox for neurophysio- logical signal processing,”Behavior Research Methods, vol. 53, no. 4, pp. 1689–1696, 2021, doi: 10.3758/s13428-020-01516-y

  30. [30]

    Accessed: Dec

    Google LLC, “Gemini,” AI image generation system. Accessed: Dec

  31. [31]

    Available: https://gemini.google.com/

    [Online]. Available: https://gemini.google.com/

  32. [32]

    Tiny-BioMoE: A Lightweight Embedding Model for Biosignal Analysis,

    S. Gkikas, I. Kyprakis, and M. Tsiknakis, “Tiny-BioMoE: A Lightweight Embedding Model for Biosignal Analysis,” inCompanion Proceedings of the 27th International Conference on Multimodal Interaction (ICMI Companion), New York, NY , USA: ACM, 2025, pp. 117–126, doi: 10.1145/3747327.3764788