arxiv: 2604.06985 · v1 · submitted 2026-04-08 · 💻 cs.LG · cs.AI

Recognition: 2 theorem links


Frailty Estimation in Elderly Oncology Patients Using Multimodal Wearable Data and Multi-Instance Learning

Anastasia Constantinidou, Andri Papakonstantinou, Dimitrios I. Fotiadis, Domen Ribnikar, Dorothea Tsekoura, Georgia Karanasiou, Ioannis Kyprakis, Kalliopi Keramida, Ketti Mazzocco, Konstantinos Marias, Lampros Lakkas, Manolis Tsiknakis, Vasileios Skaramagkas


Pith reviewed 2026-05-10 17:49 UTC · model grok-4.3

classification 💻 cs.LG cs.AI

keywords frailty estimation · wearable sensors · multi-instance learning · elderly oncology · multimodal fusion · functional decline · attention mechanism · smartwatch data


The pith

Multimodal wearables and attention-based learning estimate frailty-related functional changes between clinic visits in elderly breast cancer patients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a system that uses free-living data from smartwatches and chest-strap ECG devices to estimate whether handgrip strength or FACIT-F fatigue scores have worsened, remained stable, or improved relative to baseline. Wearable measurements are grouped into bags aligned to the month-3 and month-6 visits and processed by an attention-based multiple-instance learning model that learns to weight the most informative instances despite irregular timing and missing readings. Training relies on weak supervision: only the visit-level clinic labels are available. Under leave-one-subject-out evaluation, balanced accuracy at six months reaches 0.70 for handgrip and 0.64 for FACIT-F, with smartwatch activity and sleep features supplying the strongest signal. If the approach holds, it offers a practical route to continuous frailty monitoring without additional patient burden or extra clinic trips.
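The bag construction described above can be sketched as follows. This is a minimal illustration and not the authors' code: the record layout, the `build_bags` helper, and the day windows assigned to M3 and M6 are all assumptions, since the source does not state how recordings are assigned to each horizon.

```python
from collections import defaultdict

# Hypothetical horizon windows in days since baseline (an assumption;
# the source does not say how recordings map to M3 or M6).
HORIZON_WINDOWS = {"M3": (0, 90), "M6": (90, 180)}

def build_bags(records):
    """Group wearable instances into (patient, horizon) bags.

    Each record is a dict with keys: patient, day, and one feature
    list per modality (None marks a missing modality).
    """
    bags = defaultdict(list)
    for rec in records:
        for horizon, (start, end) in HORIZON_WINDOWS.items():
            if start <= rec["day"] < end:
                bags[(rec["patient"], horizon)].append(rec)
    # Sort each bag chronologically; bag sizes are free to differ.
    return {key: sorted(bag, key=lambda r: r["day"]) for key, bag in bags.items()}

# Made-up records: watch = [steps, sleep hours], hrv = [one HRV feature].
records = [
    {"patient": "P01", "day": 12, "watch": [5200.0, 6.8], "hrv": [41.0]},
    {"patient": "P01", "day": 47, "watch": [4100.0, 7.1], "hrv": None},
    {"patient": "P01", "day": 130, "watch": None, "hrv": [38.5]},
    {"patient": "P02", "day": 80, "watch": [7300.0, 5.9], "hrv": [52.0]},
]

bags = build_bags(records)
for key, bag in sorted(bags.items()):
    print(key, len(bag))
```

Each bag then receives a single visit-level label, which is what makes the supervision weak: no individual recording is labeled.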

Core claim

An attention-based multiple instance learning model with modality-specific MLP encoders aggregates variable-length, partially missing multimodal wearable instances (smartwatch physical activity and sleep plus ECG heart-rate variability) into bags aligned to clinical follow-ups and predicts discretized change-from-baseline classes for handgrip strength and FACIT-F in elderly oncology patients. Under subject-independent validation it attains balanced accuracies of 0.68 (month 3) and 0.70 (month 6) for handgrip, and 0.59 (month 3) and 0.64 (month 6) for FACIT-F.

What carries the argument

Attention-based multiple instance learning that applies modality-specific multilayer perceptrons to encode and then attention-weight irregular longitudinal wearable instances under weak supervision.
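As a rough illustration of this mechanism, the sketch below follows the plain attention pooling of Ilse et al. [14]: each embedding h_k receives a score w^T tanh(V h_k), the scores are softmax-normalized, and the bag embedding is the weighted sum. Everything else is an assumption made for the example: the weights are random and untrained, a single linear layer with ReLU stands in for each MLP encoder, the embedding size is 4 rather than the reported 128, and missing modalities are simply skipped. The paper may fuse modalities or handle missingness differently.

```python
import math
import random

random.seed(0)
D = 4  # demo embedding size; the paper reports 128

def linear(n_in, n_out):
    """Randomly initialised weights (n_out x n_in) with a zero bias."""
    w = [[random.uniform(-0.5, 0.5) for _ in range(n_in)] for _ in range(n_out)]
    return w, [0.0] * n_out

def dense(layer, x, act=None):
    """Apply a linear layer, then an optional elementwise activation."""
    w, b = layer
    out = [sum(wi * xi for wi, xi in zip(row, x)) + bi for row, bi in zip(w, b)]
    return [act(v) for v in out] if act else out

def relu(v):
    return max(0.0, v)

def softmax(xs):
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

# One encoder per modality (hypothetical input sizes: 2 watch features, 1 HRV feature).
encoders = {"watch": linear(2, D), "hrv": linear(1, D)}
attn_hidden = linear(D, D)  # V in the attention score
attn_score = linear(D, 1)   # w in the attention score
classifier = linear(D, 3)   # worsened / stable / improved

def forward(bag):
    """bag: list of dicts mapping modality name -> standardized features or None.

    Assumes at least one reading is present in the bag.
    """
    # Encode only the readings that exist; missing modalities are skipped.
    embeddings = [dense(encoders[m], feats, relu)
                  for inst in bag for m, feats in inst.items() if feats is not None]
    scores = [dense(attn_score, dense(attn_hidden, h, math.tanh))[0] for h in embeddings]
    weights = softmax(scores)
    # Bag embedding: attention-weighted sum of the instance embeddings.
    pooled = [sum(a * h[j] for a, h in zip(weights, embeddings)) for j in range(D)]
    return softmax(dense(classifier, pooled)), weights

bag = [
    {"watch": [0.3, -1.2], "hrv": [0.5]},
    {"watch": [1.1, 0.4], "hrv": None},  # chest strap not worn
]
probs, weights = forward(bag)
print(len(weights), probs)
```

Because the attention weights sum to one over whatever readings are present, the same code handles bags of any size, which is the property the pith relies on for irregular recordings.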

If this is right

Smartwatch activity and sleep streams supply the strongest predictive information, while heart rate variability (HRV) contributes complementary information when fused with them.
The model maintains performance under leave-one-subject-out validation, supporting generalization to unseen patients.
Discretized change classes align outputs directly to clinical thresholds used for treatment decisions.
The framework tolerates real-world irregularities including variable bag sizes and missing modalities without requiring complete recordings.
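The evaluation protocol behind these points can be sketched in a few lines. The example below is an assumption-laden illustration: `loso_splits` and `balanced_accuracy` are hypothetical helper names, the labels are made up, the majority-class predictor is only a placeholder for the MIL model, and predictions are pooled across folds before scoring. The source does not say how its mean and standard deviation are aggregated.

```python
from collections import Counter

def loso_splits(subject_ids):
    """Yield (held_out, train_idx, test_idx), one fold per subject."""
    for held_out in sorted(set(subject_ids)):
        train = [i for i, s in enumerate(subject_ids) if s != held_out]
        test = [i for i, s in enumerate(subject_ids) if s == held_out]
        yield held_out, train, test

def balanced_accuracy(y_true, y_pred):
    """Mean per-class recall over the classes present in y_true."""
    recalls = []
    for c in sorted(set(y_true)):
        idx = [i for i, y in enumerate(y_true) if y == c]
        recalls.append(sum(y_pred[i] == c for i in idx) / len(idx))
    return sum(recalls) / len(recalls)

# Made-up bag labels; subjects A and C contribute two bags each.
subjects = ["A", "A", "B", "C", "C", "D"]
labels = ["worsened", "stable", "stable", "improved", "stable", "worsened"]

pooled_pred = [None] * len(labels)
for held_out, train, test in loso_splits(subjects):
    # Placeholder model: predict the majority class of the training subjects.
    majority = Counter(labels[i] for i in train).most_common(1)[0][0]
    for i in test:
        pooled_pred[i] = majority

score = balanced_accuracy(labels, pooled_pred)
print(score)
```

Balanced accuracy averages recall per class, so a model that always predicts the most common class scores only 1/3 on a three-class problem regardless of class imbalance.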

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Continuous estimates could trigger earlier supportive interventions before the next scheduled visit.
The same bag-and-attention structure may transfer to other sparse-label longitudinal monitoring tasks in chronic disease.
Pairing the model with outcome data such as treatment tolerance or survival would test whether predicted changes carry prognostic weight.

Load-bearing premise

The attention-weighted aggregation of wearable instances can recover the true functional changes that occur between clinic visits when only the visit-level labels are available for training.

What would settle it

A new cohort in which model predictions of worsened/stable/improved status are directly compared against repeated clinical handgrip and FACIT-F measurements taken at the same three- and six-month time points.

Figures

Figures reproduced from arXiv: 2604.06985 by Anastasia Constantinidou, Andri Papakonstantinou, Dimitrios I. Fotiadis, Domen Ribnikar, Dorothea Tsekoura, Georgia Karanasiou, Ioannis Kyprakis, Kalliopi Keramida, Ketti Mazzocco, Konstantinos Marias, Lampros Lakkas, Manolis Tsiknakis, Vasileios Skaramagkas.

**Figure 1.** A graphical illustration of the proposed pipeline; the initial two images (human icons) were generated using Google Gemini [30] (synthetic images).

original abstract

Frailty and functional decline strongly influence treatment tolerance and outcomes in older patients with cancer, yet assessment is typically limited to infrequent clinic visits. We propose a multimodal wearable framework to estimate frailty-related functional change between visits in elderly breast cancer patients enrolled in the multicenter CARDIOCARE study. Free-living smartwatch physical activity and sleep features are combined with ECG-derived heart rate variability (HRV) features from a chest strap and organized into patient-horizon bags aligned to month 3 (M3) and month 6 (M6) follow-ups. Our innovation is an attention-based multiple instance learning (MIL) formulation that fuses irregular, multimodal wearable instances under real-world missingness and weak supervision. An attention-based MIL model with modality-specific multilayer perceptron (MLP) encoders with embedding dimension 128 aggregates variable-length and partially missing longitudinal instances to predict discretized change-from-baseline classes (worsened, stable, improved) for FACIT-F and handgrip strength. Under subject-independent leave-one-subject-out (LOSO) evaluation, the full multimodal model achieved balanced accuracy/F1 of 0.68 +/- 0.08/0.67 +/- 0.09 at M3 and 0.70 +/- 0.10/0.69 +/- 0.08 at M6 for handgrip, and 0.59 +/- 0.04/0.58 +/- 0.06 at M3 and 0.64 +/- 0.05/0.63 +/- 0.07 at M6 for FACIT-F. Ablation results indicated that smartwatch activity and sleep provide the strongest predictive information for frailty-related functional changes, while HRV contributes complementary information when fused with smartwatch streams.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript proposes an attention-based multiple instance learning (MIL) framework to estimate changes in frailty indicators (handgrip strength and FACIT-F) in elderly oncology patients using multimodal data from smartwatches (activity and sleep) and chest-strap ECG (HRV). Data are organized into patient-horizon bags aligned to M3 and M6 visits, with the model predicting discretized change classes (worsened, stable, improved) under weak supervision and real-world missingness. Subject-independent LOSO evaluation yields balanced accuracy/F1 scores of 0.68/0.67 (M3 handgrip), 0.70/0.69 (M6 handgrip), 0.59/0.58 (M3 FACIT-F), and 0.64/0.63 (M6 FACIT-F), with ablations indicating smartwatch features as most predictive and HRV as complementary.

Significance. If the results hold under larger validation, the work offers a promising approach for remote frailty monitoring between clinic visits in oncology, leveraging MIL to handle irregular multimodal wearable streams. The explicit LOSO protocol and modality ablations are strengths that support reproducibility and interpretability of feature contributions. Performance remains modest (near 0.6-0.7 balanced accuracy), however, so clinical translation would require further evidence on robustness to missingness and alignment of attention weights with meaningful temporal patterns.

major comments (3)

Results (LOSO evaluation paragraph): The reported standard deviations (0.04–0.10) are large relative to the mean accuracies, yet no cohort size, number of subjects, or total instances is stated. Without this, it is impossible to assess whether the variability reflects small-sample effects or unstable aggregation, which directly weakens confidence in the headline performance claims.
Methods (attention-based MIL formulation): The central assumption that attention weights capture clinically meaningful functional-change dynamics rather than artifacts of data availability or missingness patterns is untested. No attention-weight visualizations, correlations with clinical events, or ablation on masked vs. imputed instances are provided, leaving the reliability of the bag-level aggregation unsupported.
Methods (label preparation): Continuous handgrip and FACIT-F scores are discretized into three classes without reported validation against continuous regression baselines, sensitivity analysis on thresholds, or clinical justification for the cut-points. This post-hoc step is load-bearing for the reported classification metrics and could introduce bias not captured by the current evaluation.

minor comments (2)

Abstract: Sample size and patient count should be stated explicitly so readers can contextualize the reported means and standard deviations.
Ablation experiments: Clarify the exact missingness handling strategy (exclusion, masking, or imputation) applied when ablating modalities, as this affects comparability across ablations.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed comments. We address each major point below and will revise the manuscript to improve transparency and support for our claims.

point-by-point responses

Referee: Results (LOSO evaluation paragraph): The reported standard deviations (0.04–0.10) are large relative to the mean accuracies, yet no cohort size, number of subjects, or total instances is stated. Without this, it is impossible to assess whether the variability reflects small-sample effects or unstable aggregation, which directly weakens confidence in the headline performance claims.

Authors: We agree that cohort details are essential for interpreting the reported variability. The revised manuscript will explicitly state the number of subjects and total instances in the Results section. The standard deviations primarily reflect the small, heterogeneous elderly oncology cohort and real-world missingness patterns under the stringent LOSO protocol; we will add a brief discussion of these factors and their implications for performance stability. revision: yes
Referee: Methods (attention-based MIL formulation): The central assumption that attention weights capture clinically meaningful functional-change dynamics rather than artifacts of data availability or missingness patterns is untested. No attention-weight visualizations, correlations with clinical events, or ablation on masked vs. imputed instances are provided, leaving the reliability of the bag-level aggregation unsupported.

Authors: We acknowledge the value of directly testing this assumption. The revised version will include attention-weight visualizations for representative patients and an ablation comparing performance on masked versus imputed instances to assess sensitivity to missingness patterns. While exhaustive correlation with specific clinical events would require additional annotations beyond the current dataset, the existing modality ablations already provide supporting evidence for the contribution of each data stream to the bag-level predictions. revision: partial
Referee: Methods (label preparation): Continuous handgrip and FACIT-F scores are discretized into three classes without reported validation against continuous regression baselines, sensitivity analysis on thresholds, or clinical justification for the cut-points. This post-hoc step is load-bearing for the reported classification metrics and could introduce bias not captured by the current evaluation.

Authors: The discretization thresholds follow established minimal clinically important differences (MCID) reported in the oncology and geriatrics literature for handgrip strength and FACIT-F. The revised Methods section will include explicit clinical references and justification for the cut-points. We will also add a sensitivity analysis varying the thresholds and report the resulting impact on balanced accuracy. A supplementary comparison against continuous regression baselines can be provided if requested. revision: yes
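The label preparation and threshold sensitivity analysis discussed in this exchange could look roughly like the sketch below. The `discretize_change` helper, the handgrip values, and the candidate thresholds are all hypothetical; the source gives neither the paper's cut-points nor patient data. The sketch relies on the fact that higher scores mean better function for both handgrip strength and FACIT-F.

```python
from collections import Counter

def discretize_change(baseline, follow_up, threshold):
    """Map change from baseline to worsened / stable / improved.

    Assumes higher scores are better, and that a change counts only
    when its magnitude reaches the threshold.
    """
    delta = follow_up - baseline
    if delta >= threshold:
        return "improved"
    if delta <= -threshold:
        return "worsened"
    return "stable"

# Made-up (baseline, follow-up) handgrip pairs in kg, for illustration only.
pairs = [(22.0, 19.5), (18.0, 18.6), (25.0, 27.8), (20.0, 17.0), (21.5, 21.0)]

# Sensitivity sweep: how the class balance shifts as the cut-point moves.
# These thresholds are illustrative, not the paper's values.
for threshold in (1.0, 2.0, 3.0):
    counts = Counter(discretize_change(b, f, threshold) for b, f in pairs)
    print(threshold, dict(counts))
```

A sweep like this shows how quickly the "stable" class can absorb the others as the cut-point grows, which is why the referee treats the thresholds as load-bearing for the reported balanced accuracy.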

Circularity Check

0 steps flagged

No circularity: empirical MIL training on external labels under LOSO

full rationale

The paper describes a supervised attention-based MIL pipeline that encodes multimodal wearable instances with modality-specific MLPs, aggregates via attention, and predicts discretized functional-change classes from clinical labels. All reported metrics (balanced accuracy/F1 under subject-independent LOSO) are obtained by training and evaluating on held-out subjects; no equation, parameter, or prediction is defined in terms of itself or reduced to a fitted input by construction. No self-citation is invoked as a uniqueness theorem or to justify the core architecture. The derivation chain is therefore self-contained empirical learning.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Relies on standard assumptions of MIL applicability to longitudinal data with missingness; no new invented entities or heavily fitted parameters beyond typical hyperparameters.

free parameters (1)

embedding dimension
MLP encoder size set to 128; chosen to balance capacity and data scale.

axioms (1)

domain assumption: Attention weights can effectively aggregate variable-length multimodal instances for bag-level prediction under weak supervision
Core MIL assumption invoked for handling irregular wearable data aligned to M3/M6 visits.



Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · relation to the paper passage: unclear

Paper passage: attention-based MIL model with modality-specific MLP encoders (embedding dimension D=128) aggregates variable-length and partially missing longitudinal instances to predict discretized change-from-baseline classes

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · relation to the paper passage: unclear

Paper passage: LOSO evaluation... balanced accuracy/F1 of 0.68±0.08/0.67±0.09 at M3 and 0.70±0.10/0.69±0.08 at M6 for handgrip

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

31 extracted references · 23 canonical work pages

[1] A. Clegg, J. Young, S. Iliffe, M. O. Rikkert, and K. Rockwood, “Frailty in elderly people,” The Lancet, vol. 381, no. 9868, pp. 752–762, Mar. 2013, doi: 10.1016/S0140-6736(12)62167-9
[2] A. Hurria et al., “Functional Decline and Resilience in Older Women Receiving Adjuvant Chemotherapy for Breast Cancer,” Journal of the American Geriatrics Society, vol. 67, no. 5, pp. 920–927, 2019, doi: 10.1111/jgs.15493
[3] T. Kalsi et al., “The impact of low-grade toxicity in older people with cancer undergoing chemotherapy,” British Journal of Cancer, vol. 111, no. 12, pp. 2224–2228, 2014, doi: 10.1038/bjc.2014.496
[4] J. J. Mao et al., “Randomized clinical trial of a digital integrative medicine intervention among patients undergoing active cancer treatment,” npj Digital Medicine, vol. 8, Art. no. 29, 2025, doi: 10.1038/s41746-024-01387-z
[5] J. S. Coviello, “Cardiovascular and cancer risk: The role of cardio-oncology,” Journal of the Advanced Practitioner in Oncology, vol. 9, no. 2, pp. 160–176, 2018
[6] H. Wildiers et al., “International Society of Geriatric Oncology Consensus on Geriatric Assessment in Older Patients With Cancer,” Journal of Clinical Oncology, vol. 32, no. 24, pp. 2595–2603, 2014, doi: 10.1200/JCO.2013.54.8347
[7] C. Owusu and N. A. Berger, “Comprehensive geriatric assessment in the older cancer patient: coming of age in clinical cancer care,” Clin. Pract. (Lond.), vol. 11, no. 6, pp. 749–762, 2014, doi: 10.2217/cpr.14.72
[8] M. L. Wong et al., “Characteristics associated with physical function trajectories in older adults with cancer during chemotherapy,” J. Pain Symptom Manage., vol. 56, no. 5, pp. 678–688.e1, 2018, doi: 10.1016/j.jpainsymman.2018.08.006
[9] J. Razjouyan et al., “Wearable Sensors and the Assessment of Frailty among Vulnerable Older Adults: An Observational Cohort Study,” Sensors (Basel), vol. 18, no. 5, p. 1336, Apr. 2018, doi: 10.3390/s18051336
[10] P. Daniore et al., “From wearable sensor data to digital biomarker development: ten lessons learned and a framework proposal,” npj Digital Medicine, vol. 7, Art. no. 161, 2024, doi: 10.1038/s41746-024-01151-3
[11] K. Keramida et al., “Cardiotoxicity in elderly breast cancer patients,” Cancers, vol. 17, no. 13, p. 2198, 2025, doi: 10.3390/cancers17132198
[12] “Home – CARDIOCARE.” [Online]. Available: https://cardiocare-project.eu/ Accessed: Dec. 2025
[13] D. Cella, Manual of the Functional Assessment of Chronic Illness Therapy (FACIT) Measurement System, Ver. 4. Evanston, IL, USA: Center on Outcomes, Research and Education (CORE), Evanston Northwestern Healthcare and Northwestern University, 1997
[14] M. Ilse, J. M. Tomczak, and M. Welling, “Attention-based deep multiple instance learning,” in Proceedings of the 35th International Conference on Machine Learning (ICML), Stockholm, Sweden, 2018, pp. 2127–2136
[15] K. A. Kasper et al., “Wearable AI for on-device frailty assessment,” Nature Communications, 2025, doi: 10.1038/s41467-025-67728-y
[16] C. Park, R. Mishra, J. Golledge, and B. Najafi, “Digital Biomarkers of Physical Frailty and Frailty Phenotypes Using Sensor-Based Physical Activity and Machine Learning,” Sensors, vol. 21, no. 16, Art. no. 5289, 2021, doi: 10.3390/s21165289
[17] C. Park et al., “Digital Biomarker Representing Frailty Phenotypes: The Use of Machine Learning and Sensor-Based Sit-to-Stand Test,” Sensors (Basel), vol. 21, no. 9, Art. no. 3258, 2021, doi: 10.3390/s21093258
[18] D. Jung, J. Kim, M. Kim, C. W. Won, and K.-R. Mun, “Frailty Assessment Using Temporal Gait Characteristics and a Long Short-Term Memory Network,” IEEE J. Biomed. Health Inform., vol. 25, no. 9, pp. 3649–3658, 2021, doi: 10.1109/JBHI.2021.3067931
[19] S. Fan et al., “Digital health technology combining wearable gait sensors and machine learning improve the accuracy in prediction of frailty,” Frontiers in Public Health, vol. 11, p. 1169083, 2023, doi: 10.3389/fpubh.2023.1169083
[20] D. Minici et al., “Towards Automated Assessment of Frailty Status Using a Wrist-Worn Device,” IEEE J. Biomed. Health Inform., vol. 26, no. 3, pp. 1013–1022, 2022, doi: 10.1109/JBHI.2021.3100979
[21] A. Pearce et al., “Frailty and outcomes in adults undergoing systemic anti-cancer treatment: a systematic review and meta-analysis,” J. Natl. Cancer Inst., vol. 117, no. 7, pp. 1316–1339, 2025, doi: 10.1093/jnci/djaf017
[22] J. J. Duin et al., “The use of wearable technology in studies in older adults with cancer: A systematic review,” Oncologist, vol. 30, no. 8, p. oyae319, 2025, doi: 10.1093/oncolo/oyae319
[23] C. A. Low et al., “Associations between performance-based and patient-reported physical functioning and real-world mobile sensor metrics in older cancer survivors: A pilot study,” J. Geriatr. Oncol., p. 101708, 2024, doi: 10.1016/j.jgo.2024.101708
[24] G. Cay et al., “Harnessing physical activity monitoring and digital biomarkers of frailty from pendant based wearables to predict chemotherapy resilience in veterans with cancer,” Sci. Rep., vol. 14, p. 2612, 2024, doi: 10.1038/s41598-024-53025-z
[25] Garmin Ltd., “Venu SQ smartwatch,” [Online]. Available: https://www.garmin.com/en-US/p/707174/. Accessed: Dec. 2025
[26] Polar Electro Oy, “Polar H10 heart rate sensor,” [Online]. Available: https://www.polar.com/en/sensors/h10-heart-rate-sensor. Accessed: Dec. 2025
[27] D. Cella et al., “FACIT-Fatigue scale in patients with cold agglutinin disease: psychometric validation and estimation of clinically meaningful change,” Frontiers in Hematology, vol. 4, 2025, doi: 10.3389/frhem.2025.1490130
[28] Y. Sawaya et al., “Minimal detectable change in handgrip strength and usual and maximum gait speed scores in community-dwelling Japanese older adults requiring long-term care/support,” Geriatric Nursing, vol. 42, no. 5, pp. 1184–1189, 2021, doi: 10.1016/j.gerinurse.2021.07.004
[29] D. Makowski et al., “NeuroKit2: A Python toolbox for neurophysiological signal processing,” Behavior Research Methods, vol. 53, no. 4, pp. 1689–1696, 2021, doi: 10.3758/s13428-020-01516-y
[30] Google LLC, “Gemini,” AI image generation system. Accessed: Dec
[31] [Online]. Available: https://gemini.google.com/