Adaptive data selection improves wearable prediction under low baseline performance

Ali Kargarandehkordi

arXiv: 2606.00141 · v1 · pith:JSR6QDZE · submitted 2026-05-29 · 💻 cs.LG · cs.AI


Pith reviewed 2026-06-28 23:46 UTC · model grok-4.3


keywords: adaptive data selection, wearable prediction, baseline performance, inverse correlation, AUROC, time window selection, longitudinal wearable data, EMA


The pith

Adaptive window selection improves wearable predictions most for users with low baseline performance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests adaptive selection of time windows for training prediction models on wearable data under fixed budgets, comparing it to random sampling across heart rate, activity, and EMA modalities. Adaptive methods produce large AUROC gains, up to 0.7, for participants whose random-sampling baselines are weak, but deliver small or negative gains for participants whose baselines are already strong. The size of the gain is strongly inversely related to baseline performance, with Pearson correlation -0.67 across modalities, and 60-80 percent of individuals show AUROC improvement even though F1 gains are smaller and less reliable. These patterns indicate that adaptive sensing delivers its main value in the subset of cases where standard sampling already performs poorly rather than helping uniformly.

Core claim

Adaptive strategies for selecting time windows yield substantial AUROC improvements for participants with low baseline performance while offering limited or negative gains for those with strong baselines, with adaptive gain strongly inversely correlated with baseline performance across modalities.

What carries the argument

Adaptive selection of time windows for model training under fixed measurement budgets, measured by its performance gain over random sampling as a function of each participant's baseline AUROC.
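The abstract does not name the adaptive selection rule. As a stand-in, the sketch below contrasts random selection with uncertainty sampling, a common active-learning heuristic that picks the windows whose current predicted probability is closest to 0.5. Both rules respect the same fixed budget; the window ids, the probabilities, and the choice of uncertainty sampling are assumptions for illustration only, not the paper's procedure.

```python
import random


def random_selection(window_ids, budget, seed=0):
    """Baseline: draw `budget` windows uniformly without replacement."""
    rng = random.Random(seed)
    return sorted(rng.sample(window_ids, budget))


def uncertainty_selection(window_ids, probs, budget):
    """Assumed adaptive rule: keep the `budget` windows whose predicted
    probability is closest to 0.5, i.e. where the current model is least sure."""
    ranked = sorted(window_ids, key=lambda w: abs(probs[w] - 0.5))
    return sorted(ranked[:budget])


# Hypothetical windows and model probabilities.
windows = [0, 1, 2, 3, 4, 5]
probs = {0: 0.05, 1: 0.48, 2: 0.90, 3: 0.55, 4: 0.20, 5: 0.70}
print(random_selection(windows, budget=2))
print(uncertainty_selection(windows, probs, budget=2))  # [1, 3]
```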

If this is right

Adaptive sensing supplies its largest benefit precisely when baseline performance is low.
Selective use of adaptive methods conditioned on baseline performance improves overall efficiency of wearable monitoring.
Sixty to eighty percent of participants gain in AUROC from adaptive window selection.
F1-score improvements remain smaller and less consistent than AUROC improvements.
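One way to see how AUROC and F1 can diverge: AUROC scores only how well positives are ranked above negatives, while F1 depends on a decision threshold. The sketch below computes AUROC as the probability that a random positive outranks a random negative (ties count one half) and F1 from thresholded predictions. The toy labels, scores, and the default 0.5 threshold are assumptions; this is a plausible mechanism, not the paper's stated explanation for its smaller F1 gains.

```python
def auroc(labels, scores):
    """Mann-Whitney form: P(random positive scores above random negative)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def f1_at_threshold(labels, scores, threshold=0.5):
    """F1 of the hard predictions obtained by thresholding the scores."""
    preds = [1 if s >= threshold else 0 for s in scores]
    tp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 1)
    fp = sum(1 for p, y in zip(preds, labels) if p == 1 and y == 0)
    fn = sum(1 for p, y in zip(preds, labels) if p == 0 and y == 1)
    if tp == 0:
        return 0.0  # convention: no true positives means F1 = 0
    return 2 * tp / (2 * tp + fp + fn)


labels = [0, 0, 1, 1]
scores = [0.1, 0.2, 0.3, 0.4]  # perfectly ranked, but all below 0.5
print(auroc(labels, scores))                  # 1.0
print(f1_at_threshold(labels, scores))        # 0.0
print(f1_at_threshold(labels, scores, 0.25))  # 1.0
```

A model can therefore improve its ranking substantially while its F1 at a fixed threshold barely moves.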

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

A quick baseline estimate could let a system switch between adaptive and random sampling on a per-user basis.
The same inverse relationship may appear in other longitudinal sensing or prediction settings outside wearables.
Real-time online deployment of the adaptive rule could be tested to check whether offline gains translate to live use.
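The per-user switch suggested in the first item could look like the sketch below. The 0.7 cutoff, the `UserEstimate` record, and the assumption that a quick baseline estimate is available before deployment are all hypothetical; the paper reports a correlation, not a deployment rule.

```python
from dataclasses import dataclass


@dataclass
class UserEstimate:
    user_id: str
    baseline_auroc: float  # quick estimate under random sampling (assumed available)


def choose_strategy(est: UserEstimate, cutoff: float = 0.7) -> str:
    """Hypothetical rule: use adaptive selection only where the baseline looks weak."""
    return "adaptive" if est.baseline_auroc < cutoff else "random"


def assign_strategies(users, cutoff=0.7):
    return {u.user_id: choose_strategy(u, cutoff) for u in users}


users = [UserEstimate("p01", 0.55), UserEstimate("p02", 0.88)]
print(assign_strategies(users))  # {'p01': 'adaptive', 'p02': 'random'}
```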

Load-bearing premise

The longitudinal wearable dataset and the chosen adaptive procedure produce unbiased performance estimates without participant or window exclusions that artificially strengthen the reported inverse correlation.

What would settle it

Re-running the identical adaptive procedure on a fresh longitudinal wearable dataset and finding that the inverse correlation with baseline performance disappears or reverses when no post-hoc exclusions are applied.

Original abstract

Adaptive sensing strategies that selectively sample data are increasingly used in wearable health systems to improve prediction performance under limited data budgets, yet their benefits across individuals remain poorly understood. Here, we evaluate adaptive selection of time windows for model training under fixed measurement budgets across multiple sensing modalities, including heart rate, activity, and ecological momentary assessment (EMA), in a longitudinal wearable dataset. We quantify performance gains relative to random sampling using both area under the receiver operating characteristic curve (AUROC) and F1 score. Adaptive strategies yield substantial improvements in AUROC for participants with low baseline performance (with gains up to 0.7), while offering limited or negative gains for participants with strong baselines. Across modalities, adaptive gain is strongly inversely correlated with baseline performance (Pearson r = -0.67; Spearman p = -0.62). At the participant level, most individuals benefit in AUROC (60-80% across modalities), although improvements in F1 are smaller and less consistent. These findings show that adaptive sensing is not uniformly beneficial, but instead provides the greatest value in underperforming settings. Our results support selective deployment strategies that tailor adaptive sensing based on baseline performance to improve efficiency in wearable health monitoring.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

The paper reports a clear inverse link between baseline performance and adaptive sampling gains in wearables, but the abstract supplies no methods, so the correlation cannot be trusted yet.


The main point is that adaptive window selection improves AUROC most for participants whose random-sampling baseline is already weak, with gains up to 0.7 and a reported Pearson correlation of -0.67 across modalities.

The work does one thing cleanly: it shows the benefit is not uniform. Most participants (60-80%) see an AUROC lift, but F1 gains are smaller and the high-baseline group often sees little or negative change. That differential pattern is worth knowing for anyone thinking about when to turn on adaptive sensing in health wearables.

The soft spot is obvious and large. The abstract gives zero information on model training, how baseline AUROC was computed, longitudinal window handling, or any exclusion rules applied after the baselines were known. Without those details the inverse correlation could easily be an artifact of post-hoc filtering that removed high-baseline cases. The stress-test note flags exactly this risk, and nothing in the provided text rules it out.

This is the sort of observation that matters for practical deployment in mobile health, but only if the methods hold up. Right now it reads as a preliminary finding rather than a verified result.

I would bring it to a reading group to talk through the filtering concern. It is worth sending to referees so the methods can be checked, even though heavy revision is likely needed.

Referee Report

3 major / 2 minor

Summary. The manuscript evaluates adaptive selection of time windows for model training under fixed measurement budgets in a longitudinal wearable dataset across modalities (heart rate, activity, EMA). It reports that adaptive strategies yield large AUROC gains (up to 0.7) for participants with low baseline performance, limited or negative gains for strong baselines, and a strong inverse correlation between adaptive gain and baseline performance (Pearson r = -0.67; Spearman ρ = -0.62). At the participant level, 60-80% benefit in AUROC (less consistently in F1), supporting selective deployment of adaptive sensing based on baseline performance.

Significance. If the results are robust, the work provides useful empirical evidence of heterogeneity in adaptive sensing benefits, which could inform more efficient wearable health systems by prioritizing adaptive strategies where baselines are weak. The cross-modality inverse correlation is a concrete observation that may guide future adaptive system design.

major comments (3)

[Methods] Methods: The manuscript provides no description of the predictive models used, training procedures, hyperparameter selection, or handling of longitudinal dependencies in the wearable time series. These details are required to evaluate whether the reported AUROC gains and the inverse correlation are reproducible and free of bias from data leakage or improper cross-validation.
[Results] Results (correlation and participant-level analyses): The reported Pearson r = -0.67, Spearman ρ = -0.62, and 60-80% benefit rates are presented without any statement of participant or window exclusion rules applied after baseline AUROC computation. If any minimum-sample, outlier, or modality-availability filter was applied post-baseline and disproportionately removed high-baseline cases, the inverse relationship would be mechanically strengthened; explicit safeguards or sensitivity checks are needed.
[Results] Results: No information is given on correction for multiple comparisons across modalities, participants, and the two metrics (AUROC, F1), nor on the exact definition of 'baseline performance' used to stratify participants. These omissions directly affect the reliability of the central claim that adaptive gain is strongly inversely correlated with baseline performance.
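If a correction were applied, Holm's step-down procedure is one standard option. The sketch assumes six hypothetical tests (three modalities times two metrics) with invented p-values; it is not an analysis the paper reports.

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Holm's step-down procedure; returns a reject flag per hypothesis."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for step, i in enumerate(order):
        # Compare the (step + 1)-th smallest p-value with alpha / (m - step).
        if pvalues[i] <= alpha / (m - step):
            reject[i] = True
        else:
            break  # every larger p-value is retained as well
    return reject


# Invented p-values: 3 modalities x 2 metrics (AUROC, F1).
tests = ["HR-AUROC", "HR-F1", "Activity-AUROC", "Activity-F1", "EMA-AUROC", "EMA-F1"]
pvals = [0.001, 0.02, 0.008, 0.04, 0.012, 0.30]
for name, p, rej in zip(tests, pvals, holm_bonferroni(pvals)):
    print(f"{name:<15} p = {p:.3f}  {'reject' if rej else 'retain'}")
```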

minor comments (2)

[Abstract] Abstract: The notation 'Spearman p = -0.62' is nonstandard; clarify whether this is Spearman ρ and whether a p-value is also reported.
[Results] The manuscript would benefit from a table summarizing per-modality baseline AUROC distributions and gain statistics to allow readers to assess the range and outliers driving the correlation.
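A per-modality summary of the kind requested could be produced as follows. The row format `(modality, baseline_auroc, adaptive_auroc)`, the column choices, and the sample rows are hypothetical.

```python
from collections import defaultdict
from statistics import mean, median


def summarize(rows):
    """rows: (modality, baseline_auroc, adaptive_auroc), one per participant."""
    by_mod = defaultdict(list)
    for modality, base, adapt in rows:
        by_mod[modality].append((base, adapt - base))
    table = {}
    for modality, pairs in sorted(by_mod.items()):
        bases = [b for b, _ in pairs]
        gains = [g for _, g in pairs]
        table[modality] = {
            "n": len(pairs),
            "baseline_median": median(bases),
            "baseline_min": min(bases),
            "baseline_max": max(bases),
            "gain_mean": mean(gains),
            "frac_improved": sum(g > 0 for g in gains) / len(gains),
        }
    return table


def print_table(table):
    print(f"{'modality':<12}{'n':>4}{'base med':>10}{'base range':>14}{'mean gain':>11}{'improved':>10}")
    for mod, s in table.items():
        rng = f"{s['baseline_min']:.2f}-{s['baseline_max']:.2f}"
        print(f"{mod:<12}{s['n']:>4}{s['baseline_median']:>10.2f}{rng:>14}"
              f"{s['gain_mean']:>11.3f}{100 * s['frac_improved']:>9.0f}%")


rows = [("heart_rate", 0.50, 0.70), ("heart_rate", 0.90, 0.85),
        ("activity", 0.65, 0.80), ("ema", 0.60, 0.65)]
table = summarize(rows)
print_table(table)
```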

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which highlight areas where additional clarity will strengthen the manuscript. We address each major comment below and will incorporate revisions as indicated.


Referee: [Methods] Methods: The manuscript provides no description of the predictive models used, training procedures, hyperparameter selection, or handling of longitudinal dependencies in the wearable time series. These details are required to evaluate whether the reported AUROC gains and the inverse correlation are reproducible and free of bias from data leakage or improper cross-validation.

Authors: We agree that the Methods section requires expansion for reproducibility. In the revised manuscript we will add a detailed description of the predictive models (including classifier types per modality), training procedures, hyperparameter selection methods, and the time-series cross-validation strategy used to respect longitudinal structure and prevent leakage. revision: yes
Referee: [Results] Results (correlation and participant-level analyses): The reported Pearson r = -0.67, Spearman ρ = -0.62, and 60-80% benefit rates are presented without any statement of participant or window exclusion rules applied after baseline AUROC computation. If any minimum-sample, outlier, or modality-availability filter was applied post-baseline and disproportionately removed high-baseline cases, the inverse relationship would be mechanically strengthened; explicit safeguards or sensitivity checks are needed.

Authors: No post-baseline exclusions or filters were applied beyond the dataset preprocessing steps already described. We will add an explicit statement confirming this and include a sensitivity analysis demonstrating that the reported correlations remain stable under alternative outlier-handling rules. revision: yes
Referee: [Results] Results: No information is given on correction for multiple comparisons across modalities, participants, and the two metrics (AUROC, F1), nor on the exact definition of 'baseline performance' used to stratify participants. These omissions directly affect the reliability of the central claim that adaptive gain is strongly inversely correlated with baseline performance.

Authors: Baseline performance is defined as AUROC under random window selection at the same measurement budget; we will state this definition explicitly. The primary correlation result is reported with both Pearson and Spearman coefficients across modalities. We will add a note on the absence of formal multiple-comparison correction (given the small number of pre-specified analyses) while presenting modality-specific correlations to allow independent evaluation. revision: partial
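Under the definition in this response (AUROC under random window selection at the same budget), the baseline can vary from one random draw to the next, so one reasonable way to compute it is to average over repeated draws. In the sketch, `evaluate` is a hypothetical callback that trains on the given windows and returns held-out AUROC; the repeat count, the seed, and `toy_evaluate` are assumptions, not the paper's setup.

```python
import random
from statistics import mean, stdev


def random_baseline(evaluate, window_ids, budget, n_repeats=20, seed=0):
    """Mean and spread of AUROC over repeated random draws at a fixed budget.

    `evaluate` is a hypothetical callback: it trains a model on the given
    window ids and returns held-out AUROC. n_repeats must be at least 2.
    """
    rng = random.Random(seed)
    scores = [evaluate(rng.sample(window_ids, budget)) for _ in range(n_repeats)]
    return mean(scores), stdev(scores)


# Toy stand-in for training plus evaluation (not a real model).
def toy_evaluate(ids):
    return 0.5 + sum(ids) / (1000 * len(ids))


windows = list(range(100))
base_mean, base_sd = random_baseline(toy_evaluate, windows, budget=10)
print(f"baseline AUROC = {base_mean:.3f} ± {base_sd:.3f}")
```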

Circularity Check

0 steps flagged

No significant circularity in empirical comparisons


The paper reports direct empirical results from comparing adaptive versus random sampling strategies on a longitudinal wearable dataset, including AUROC/F1 gains and their correlation with baseline performance. No equations, fitted parameters, derivations, or self-citations are described that would reduce the reported gains or inverse correlation to tautological inputs by construction. The central claims rest on observable data comparisons without self-referential structure.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The work is an empirical comparison study; the abstract does not introduce or rely on explicit free parameters, mathematical axioms, or newly postulated entities beyond standard machine-learning evaluation practices.

pith-pipeline@v0.9.1-grok · 5733 in / 1249 out tokens · 22293 ms · 2026-06-28T23:46:19.604114+00:00


