pith.

arxiv: 2606.23631 · v1 · pith:7K7JDJAUnew · submitted 2026-06-22 · 💻 cs.AI

AI-driven Optimisation of Quality of Recovery (QoR) in Remote Patient Monitoring

Pith reviewed 2026-06-26 08:17 UTC · model grok-4.3

classification 💻 cs.AI
keywords: remote patient monitoring · quality of recovery · QoR-15 · item reduction · predictive modeling · postoperative recovery · AUC-ROC

The pith

A five-item subset of the QoR-15 predicts recovery severity as accurately as the full survey.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops QoR-compact, a five-question version of the QoR-15 for daily remote patient monitoring. It shows through exhaustive testing that this subset matches the full 15-item survey's ability to predict near-term recovery severity, with an AUC-ROC of 0.968 compared to 0.964. This matters because low completion rates of the longer survey in daily use limit its utility in remote monitoring, so a shorter version could improve consistency while preserving predictive power. The items cover both physical and psychological aspects of recovery.

Core claim

QoR-compact, identified by evaluating all possible five-item subsets of the QoR-15, achieves a mean AUC-ROC of 0.968 (95% CI 0.915-0.988) in predicting recovery severity. This is statistically comparable to the full QoR-15 baseline of 0.964 while using one-third of the items, and the compact form tracks readmission events similarly in patient-level backtesting.
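The headline number is a multiclass AUC-ROC over four ordinal recovery classes. The text above does not say how per-class AUCs are pooled. The sketch below assumes the common one-vs-rest, unweighted (macro) average, computed from the Mann-Whitney rank statistic in plain Python. `binary_auc`, `macro_ovr_auc` and the toy probabilities are illustrative, not the authors' code. The reported "mean" would presumably be this quantity averaged over bootstrap resamples.

```python
def binary_auc(scores, labels):
    """AUC-ROC as the Mann-Whitney probability that a random positive
    outscores a random negative; tied scores count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("AUC needs at least one positive and one negative")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def macro_ovr_auc(prob_rows, y_true, classes):
    """Unweighted mean of one-vs-rest AUCs.

    prob_rows[i][k] is the predicted probability that sample i belongs to
    classes[k]; y_true[i] is the observed class label."""
    per_class = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        labels = [1 if y == cls else 0 for y in y_true]
        per_class.append(binary_auc(scores, labels))
    return sum(per_class) / len(per_class)


CLASSES = ["excellent", "good", "moderate", "poor"]
probs = [
    [0.7, 0.2, 0.1, 0.0],
    [0.1, 0.6, 0.2, 0.1],
    [0.1, 0.2, 0.6, 0.1],
    [0.0, 0.1, 0.2, 0.7],
]
y = ["excellent", "good", "moderate", "poor"]
print(macro_ovr_auc(probs, y, CLASSES))  # perfectly ranked toy data -> 1.0
```

A macro average weights every class equally, so a rare "poor" class counts as much as a common "good" one. A prevalence-weighted average would be a different, equally standard choice.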

What carries the argument

Exhaustive search over all 3,003 possible five-question subsets of the QoR-15 to select the subset that best matches the full instrument's predictive performance for postoperative recovery severity.
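The search space is small enough to enumerate directly, since C(15, 5) = 15! / (5! · 10!) = 3,003. Below is a minimal Python sketch. It assumes a caller-supplied `score_fn` that stands in for the paper's XGBoost-plus-bootstrap evaluation; `rank_subsets` and the toy scorer are hypothetical.

```python
from itertools import combinations
from math import comb

QOR15_ITEMS = [f"Q{i}" for i in range(1, 16)]


def rank_subsets(items, k, score_fn):
    """Score every k-item subset of `items`; return (score, subset) pairs, best first."""
    scored = [(score_fn(subset), subset) for subset in combinations(items, k)]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored


# 15 choose 5 = 15! / (5! * 10!) = 3003 candidate questionnaires.
assert comb(15, 5) == 3003


# Hypothetical stand-in for the real evaluator: prefers low item numbers.
def toy_score(subset):
    return -sum(int(q[1:]) for q in subset)


ranking = rank_subsets(QOR15_ITEMS, 5, toy_score)
best_score, best_subset = ranking[0]
print(len(ranking), best_subset)  # 3003 ('Q1', 'Q2', 'Q3', 'Q4', 'Q5')
```

With a real evaluator, each of the 3,003 calls trains and bootstraps a model, so this step dominates the compute. Unlike greedy forward or backward elimination, exhaustive enumeration cannot miss a subset because of the order in which items were added or dropped.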

If this is right

  • QoR-compact provides a shorter daily input that maintains predictive parity with the full QoR-15 for remote monitoring.
  • The five items span physical and psychological recovery axes.
  • Patient-level tracking of readmission events remains faithful to the full form.
  • This parity supports testing whether lighter input increases daily completion rates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Completion rates in daily remote monitoring may rise with fewer items per day.
  • External validation across different surgical cohorts and settings is needed to confirm generalizability.
  • The exhaustive subset search method could apply to other multi-item patient-reported outcome measures.

Load-bearing premise

The five-item subset selected from one post-surgical deployment cohort will perform equally well in other patient populations and in ongoing daily remote monitoring.

What would settle it

Measuring the AUC-ROC of QoR-compact on an independent cohort of patients from a different hospital or surgical type and finding it significantly below the full survey's performance.

Figures

Figures reproduced from arXiv: 2606.23631 by Ivana Drobnjak, John Kelly, Li-Hsi (Sonny) Lin, Pramit Khetrapal, Ronnie Stafford, Yansong Liu.

Figure 1. Clinical data-collection workflow of the HALO-Surgery study. Eligible patients undergoing abdominal or thoracic cancer surgery were enrolled, discharged with a remote-monitoring device, and asked to complete the QoR-15 survey daily; the submissions were streamed to the HALO platform for analysis.
… feature vector is the mean score of each question over the input window (…
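The Figure 1 text notes that the model's feature vector is the mean score of each question over the input window. The sketch below illustrates that step. It assumes each day is either a dict of item scores or `None` for a missed submission, and that missed days are simply skipped; the source does not describe its missing-data rule. `window_features` is a hypothetical name.

```python
from statistics import mean


def window_features(daily, items):
    """Mean score of each item over the input window.

    daily: one entry per monitoring day, either a dict mapping item -> score
    (0-10) or None when no survey was submitted that day. Missed days are
    skipped, which is an assumption rather than the paper's stated rule.
    Items never answered in the window map to None."""
    features = {}
    for item in items:
        values = [day[item] for day in daily if day is not None and item in day]
        features[item] = mean(values) if values else None
    return features


window = [{"Q3": 6, "Q9": 8}, None, {"Q3": 8, "Q9": 7}]
print(window_features(window, ["Q3", "Q9", "Q14"]))
```

An item with no answers in the window comes back as `None`. A downstream model then needs an explicit policy for it: imputation, a missingness flag, or excluding the window.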
Figure 2. Overview of the analysis. (1) The 15 QoR-15 items generate all $\binom{15}{5} = 3{,}003$ five-question input subsets. (2) The prediction target is the patient's recovery class over the 14 days following the input window, mapped to four ordinal categories. (3) Each subset is benchmarked with an XGBoost multiclass classifier under 10 stratified bootstrap resamples. (4) Subsets are ranked by AUC-ROC; the five items most…
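The caption does not spell out the four ordinal targets. One plausible mapping uses the Kleif and Gögenur severity bands for the QoR-15 total [7]: excellent 136–150, good 122–135, moderate 90–121, poor 0–89. The sketch below applies those bands. It also assumes that the daily totals in the 14 days after the window are pooled by their mean; taking the worst day (`min`) would be an equally defensible choice. Both assumptions and the helper names are hypothetical.

```python
from statistics import mean

# Severity bands for the QoR-15 total (0-150); the first lower bound met wins.
# Assumed to follow Kleif and Gogenur [7]; not confirmed by the caption above.
SEVERITY_BANDS = [(136, "excellent"), (122, "good"), (90, "moderate"), (0, "poor")]


def severity_class(total):
    """Map a QoR-15 total score to an ordinal recovery class."""
    if not 0 <= total <= 150:
        raise ValueError("QoR-15 total must lie in [0, 150]")
    for lower_bound, label in SEVERITY_BANDS:
        if total >= lower_bound:
            return label


def recovery_target(next_totals, aggregate=mean):
    """Class label for the days after the input window.

    next_totals: daily QoR-15 totals for the following 14 days, None if missed.
    aggregate: how submitted totals are pooled (hypothetical default: mean).
    Returns None if nothing was submitted, i.e. the window has no label."""
    submitted = [t for t in next_totals if t is not None]
    if not submitted:
        return None
    return severity_class(aggregate(submitted))


print(recovery_target([140, None, 132]))                 # mean 136 -> excellent
print(recovery_target([140, None, 132], aggregate=min))  # worst day 132 -> good
```

The two calls show how much the pooling rule matters: the same submissions land in different classes.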
Figure 3. Hierarchical clustering of the 15 QoR-15 items. The dendrogram uses distance = 1 − |ρ| with average linkage on the pairwise Spearman correlations. Tight item clusters pair questions that probe the same domain: Q3 (feeling rested) with Q4 (good sleep), Q11 (moderate pain) with Q12 (severe pain), Q9 (feeling comfortable and in control) with Q10 (general well-being), and Q14 (feeling worried or anxious) with …
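Figure 3's dendrogram is built from the distance 1 − |ρ| under average linkage on pairwise Spearman correlations. A dependency-free sketch of that pipeline follows. In practice, `scipy.stats.spearmanr` and `scipy.cluster.hierarchy.linkage(..., method="average")` do the same job. Ties are rank-averaged because 0–10 answers repeat heavily. The function names and toy data are illustrative, and a constant item column (zero variance) would leave ρ undefined.

```python
def average_ranks(values):
    """1-based ranks; tied values share the mean of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1  # mean of 1-based positions start..end
        for pos in range(start, end + 1):
            ranks[order[pos]] = shared
        start = end + 1
    return ranks


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # raises ZeroDivisionError for a constant column


def spearman(x, y):
    """Spearman's rho: Pearson correlation of tie-averaged ranks."""
    return pearson(average_ranks(x), average_ranks(y))


def average_linkage(columns):
    """Naive average-linkage (UPGMA) clustering with distance 1 - |rho|.

    columns: dict item -> list of scores (same patients, same order).
    Returns the merge sequence as (cluster_a, cluster_b, distance) tuples."""
    names = list(columns)
    dist = {(a, b): 1 - abs(spearman(columns[a], columns[b])) for a in names for b in names}
    clusters = [(name,) for name in names]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                # Average of all pairwise item distances between the two clusters.
                d = sum(dist[a, b] for a in clusters[i] for b in clusters[j])
                d /= len(clusters[i]) * len(clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        rest = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters = rest + [clusters[i] + clusters[j]]
    return merges


toy = {"Q3": [4, 6, 7, 9, 5], "Q4": [3, 5, 6, 9, 4], "Q12": [9, 6, 8, 2, 5]}
for a, b, d in average_linkage(toy):
    print(a, b, round(d, 3))
```

Using |ρ| rather than ρ places strongly negatively correlated items close together as well. This matters for the QoR-15, where positively worded items (Q1–Q10) and negatively worded items (Q11–Q15) can be inversely related.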
Figure 4. Exhaustive subset evaluation and robust item selection.
Figure 5. Longitudinal backtesting of QoR-15 baseline versus QoR-compact scores. Postoperative recovery trajectories over 30 days are shown for three individual patients (Panels A, B, and C). The full QoR-15 total score (solid line, left axis; max 150) is plotted alongside the derived 5-question QoR-compact total score (dashed line, right axis; max 50). The QoR-compact demonstrates strong alignment with the baseline…
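The two trajectories in Figure 5 sit on different scales (QoR-15 total up to 150, QoR-compact up to 50) because each item contributes 0 to 10 points. The sketch below computes both totals per day and summarises their agreement with a Pearson correlation. The correlation is an illustrative choice, not necessarily the paper's alignment measure. Items are assumed to be already scored so that higher means better recovery; in the QoR-15, items 11–15 are negatively worded and reverse-scored [3]. The helper names and toy patient are hypothetical.

```python
QOR15 = [f"Q{i}" for i in range(1, 16)]
QOR_COMPACT = ["Q3", "Q9", "Q10", "Q12", "Q14"]


def total(day, items):
    """Sum of item scores for one day; each item assumed already scored 0-10, higher = better."""
    return sum(day[item] for item in items)


def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def backtest_series(days):
    """Daily full and compact totals for one patient, plus their correlation."""
    full = [total(day, QOR15) for day in days]           # range 0-150
    compact = [total(day, QOR_COMPACT) for day in days]  # range 0-50
    return full, compact, pearson(full, compact)


# Hypothetical patient whose answers improve over three submitted days.
days = [{item: score for item in QOR15} for score in (4, 6, 9)]
full, compact, r = backtest_series(days)
print(full, compact, round(r, 3))  # [60, 90, 135] [20, 30, 45] 1.0
```

Correlation ignores scale, so the 150-point and 50-point series can be compared without rescaling. Real patients answer items unevenly, and their r would fall below the 1.0 of this uniform toy.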
Original abstract

Remote patient monitoring (RPM) depends on patient-reported data to capture the subjective dimension of recovery that devices cannot measure. The Quality of Recovery (QoR-15) survey is the gold-standard instrument for this purpose. It was designed and validated for occasional in-hospital assessment, yet remote monitoring now administers it to patients daily. In our own post-surgical deployment, only 55% of patients submitted the survey on more than 14 of the 30 monitoring days. We developed QoR-compact, a five-item daily input for the RPM prediction pathway. Setting a deployment-driven target of one-third of the daily items, we exhaustively evaluated all 3,003 five-question subsets of the QoR-15 and tested whether the best of them matches the full instrument in predicting near-term postoperative recovery severity. QoR-compact achieves a mean AUC-ROC of 0.968 (95% CI 0.915-0.988), statistically comparable to the full-instrument baseline of 0.964 while using one-third of the items. Patient-level backtesting indicates that it tracks readmission events as faithfully as the full form. Its five items span the physical and psychological axes of recovery: Q3 (feeling rested), Q9 (feeling comfortable and in control), Q10 (general well-being), Q12 (severe pain), and Q14 (feeling worried or anxious). The QoR-15 remains the gold-standard measure of recovery; QoR-compact complements it as a shorter daily input designed for prediction. This parity provides the basis for a prospective study of whether a lighter daily input is, in turn, completed more consistently. External validation on larger cohorts is required before clinical use.
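The 55% compliance figure is a patient-level threshold count. The sketch below computes it. It assumes each patient's submissions are recorded as day indices 1–30 and that repeat submissions on the same day count once; the data layout and `compliance_rate` are hypothetical.

```python
def compliance_rate(submission_days, window=30, threshold=14):
    """Share of patients who submitted on more than `threshold` distinct days
    within monitoring days 1..window."""
    if not submission_days:
        raise ValueError("no patients")
    compliant = 0
    for days in submission_days.values():
        distinct = {d for d in days if 1 <= d <= window}
        if len(distinct) > threshold:
            compliant += 1
    return compliant / len(submission_days)


cohort = {
    "patient_a": range(1, 31),                    # every day
    "patient_b": range(1, 15),                    # 14 days: not more than 14
    "patient_c": [1, 1, 2] + list(range(3, 17)),  # duplicate day 1; 16 distinct days
}
print(compliance_rate(cohort))  # 0.6666666666666666
```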

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that exhaustive enumeration of all 3,003 five-item subsets of the QoR-15 on a single post-surgical remote-monitoring deployment cohort yields a compact instrument (items Q3, Q9, Q10, Q12, Q14). Its AUC-ROC of 0.968 (95% CI 0.915-0.988) for predicting near-term recovery severity is statistically comparable to the 0.964 obtained with the full 15-item instrument, although it uses one-third of the items. Patient-level backtesting further indicates equivalent tracking of readmission events. The work positions QoR-compact as a lower-burden daily input that complements the full QoR-15 while preserving predictive utility.

Significance. If externally validated, the result would supply a practical, deployment-derived route to reducing daily patient burden from 15 to 5 items in remote monitoring without apparent loss of predictive signal for recovery severity. It directly addresses the reported compliance gap: only 55% of patients submitted the survey on more than 14 of the 30 monitoring days. The transparent enumeration of all subsets and the explicit call for prospective studies on completion rates constitute concrete, falsifiable next steps.

major comments (2)
  1. [Abstract] Abstract: the reported AUC-ROC parity (0.968 vs 0.964) and the claim that the selected five-item subset 'tracks readmission events as faithfully as the full form' rest on exhaustive subset selection followed by performance evaluation on the identical internal cohort, with no description of nested cross-validation, pre-specified hold-out, or external test set; this selection procedure is load-bearing for the generalization claim to prospective daily RPM use.
  2. [Abstract] Abstract: the manuscript supplies no cohort size, patient demographics, handling of missing data, or statistical test used to declare the two AUC values 'statistically comparable'; these omissions prevent assessment of whether the observed parity exceeds what would be expected from chance capitalization on cohort-specific correlations.
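The remedy the referee asks for is to treat subset selection as part of the model and to score it on patients it never saw. The sketch below shows a minimal version of that outer loop. It assumes patient-level folds, so a patient's daily windows never straddle train and test. Caller-supplied `select_score` and `test_score` functions stand in for the XGBoost-and-bootstrap evaluation. All names are hypothetical, and stratifying folds by outcome is omitted for brevity.

```python
import random
from itertools import combinations


def patient_folds(patient_ids, k, seed=0):
    """Split patients (not daily windows) into k disjoint folds."""
    ids = list(patient_ids)
    random.Random(seed).shuffle(ids)
    return [ids[i::k] for i in range(k)]


def nested_subset_selection(patient_ids, items, size, select_score, test_score, k=5, seed=0):
    """Outer cross-validation around an exhaustive subset search.

    select_score(subset, train_ids) -> float: selection criterion computed on
        training patients only (an inner CV or bootstrap would live here).
    test_score(subset, train_ids, test_ids) -> float: fit on train, score on test.
    Returns one (chosen_subset, held_out_score) pair per outer fold."""
    results = []
    for test_ids in patient_folds(patient_ids, k, seed):
        held_out = set(test_ids)
        train_ids = [p for p in patient_ids if p not in held_out]
        best = max(combinations(items, size), key=lambda s: select_score(s, train_ids))
        results.append((best, test_score(best, train_ids, test_ids)))
    return results


# Toy stand-ins (hypothetical): prefer low item numbers; "score" = fold size.
items = [f"Q{i}" for i in range(1, 16)]
patients = [f"P{n:02d}" for n in range(10)]
results = nested_subset_selection(
    patients, items, 5,
    select_score=lambda s, train: -sum(int(q[1:]) for q in s),
    test_score=lambda s, train, test: len(test),
)
print([score for _, score in results])  # [2, 2, 2, 2, 2]
```

The held-out scores estimate the whole select-then-fit procedure, not one fixed subset. If the outer folds choose different subsets, or score well below the in-sample 0.968, that would signal the selection optimism this comment raises.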
minor comments (1)
  1. [Abstract] Abstract: the parenthetical descriptions of the five retained items (e.g., 'Q3 (feeling rested)') are helpful but would benefit from an explicit table mapping each to the original QoR-15 wording and subscale for readers outside the immediate domain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review. We address each major comment below, agreeing that greater transparency on validation and methods is needed.

  1. Referee: [Abstract] Abstract: the reported AUC-ROC parity (0.968 vs 0.964) and the claim that the selected five-item subset 'tracks readmission events as faithfully as the full form' rest on exhaustive subset selection followed by performance evaluation on the identical internal cohort, with no description of nested cross-validation, pre-specified hold-out, or external test set; this selection procedure is load-bearing for the generalization claim to prospective daily RPM use.

    Authors: We agree that subset selection and evaluation were performed on the same internal cohort without nested cross-validation, hold-out set, or external testing. This limits generalization claims and risks cohort-specific overfitting. We will revise the abstract, methods, and discussion to explicitly describe the procedure as internal validation only, qualify the readmission tracking as cohort backtesting, and strengthen emphasis on the need for prospective external validation before clinical deployment. revision: yes

  2. Referee: [Abstract] Abstract: the manuscript supplies no cohort size, patient demographics, handling of missing data, or statistical test used to declare the two AUC values 'statistically comparable'; these omissions prevent assessment of whether the observed parity exceeds what would be expected from chance capitalization on cohort-specific correlations.

    Authors: We will add these details in revision: cohort size and demographics in Methods/Results, missing data handling (e.g., exclusion or imputation rules), and the statistical test for AUC comparability (e.g., bootstrap or DeLong test). The abstract will reference the updated methods to allow assessment of chance capitalization risk. revision: yes
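Of the two tests the response names, the bootstrap is the simpler to sketch. The version below resamples the same cases for both models, which preserves the correlation between their scores, and returns a percentile interval for the AUC difference. It is written for a binary outcome and treats rows as independent; with several windows per patient, one would resample patients rather than rows. A CI that contains zero means no difference was detected. That is weaker than showing equivalence, which would need a pre-specified margin. Names and defaults are illustrative.

```python
import random


def binary_auc(scores, labels):
    """Mann-Whitney AUC; ties count as half a win."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def paired_bootstrap_auc_diff(labels, scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Point estimate and percentile CI for AUC(a) - AUC(b) on shared resamples."""
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # same rows for both models
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:
            continue  # AUC is undefined without both classes; draw again
        auc_a = binary_auc([scores_a[i] for i in idx], y)
        auc_b = binary_auc([scores_b[i] for i in idx], y)
        diffs.append(auc_a - auc_b)
    diffs.sort()
    lower = diffs[int(n_boot * alpha / 2)]
    upper = diffs[int(n_boot * (1 - alpha / 2)) - 1]
    point = binary_auc(scores_a, labels) - binary_auc(scores_b, labels)
    return point, (lower, upper)


labels = [0, 0, 0, 1, 1, 1]
model_a = [0.1, 0.2, 0.3, 0.7, 0.8, 0.9]
model_b = [0.9, 0.2, 0.3, 0.7, 0.1, 0.8]
point, (lower, upper) = paired_bootstrap_auc_diff(labels, model_a, model_b)
print(round(point, 3), round(lower, 3), round(upper, 3))
```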

Circularity Check

0 steps flagged

No significant circularity; empirical selection and evaluation on internal cohort with no equations or self-citation chains

The paper contains no equations, derivations, or first-principles claims. It describes an exhaustive enumeration of 3,003 five-item subsets on the authors' single post-surgical deployment cohort, followed by direct reporting of AUC-ROC performance for the selected subset on that same data. This is a standard (if optimistic) empirical feature-selection procedure rather than any reduction of a claimed prediction to its inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The result is therefore self-contained as a data-driven report on the deployment cohort without circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms or invented entities are stated. The subset selection is data-driven via exhaustive enumeration but details on data characteristics and selection criteria are absent.

pith-pipeline@v0.9.1-grok · 5858 in / 1116 out tokens · 24091 ms · 2026-06-26T08:17:16.183386+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 9 canonical work pages

  1. [1] Shaik T, Tao X, Higgins N, et al. Remote patient monitoring using artificial intelligence: Current state, applications, and challenges. WIREs Data Mining and Knowledge Discovery. 2023;13(2):e1485. doi:https://doi.org/10.1002/widm.1485

  2. [2] Churruca K, Pomare C, Ellis LA, et al. Patient-reported outcome measures (PROMs): A review of generic and condition-specific measures and a discussion of trends and issues. Health Expectations. 2021;24(4):1015–1024. doi:https://doi.org/10.1111/hex.13254

  3. [3] Stark PA, Myles PS, Burke JA. Development and psychometric evaluation of a postoperative quality of recovery score: the QoR-15. Anesthesiology. 2013;118(6):1332–1340. doi:https://doi.org/10.1097/ALN.0b013e318289b84b

  4. [4] Myles PS, Shulman MA, Reilly J, Kasza J, Romero L. Measurement of quality of recovery after surgery using the 15-item quality of recovery scale: a systematic review and meta-analysis. British Journal of Anaesthesia. 2022;128(6):1029–1039. doi:https://doi.org/10.1016/j.bja.2022.03.009

  5. [5] Semple C, Bhatt B, Sharpe S, et al. Using a mobile app for monitoring post-operative quality of recovery of patients at home: a feasibility study. Journal of Medical Internet Research. 2015;17(7):e168. doi:https://doi.org/10.2196/jmir.3851

  6. [6] Temple-Oberle C, Shea-Budgell MA, Tan M, et al. Effect of Smartphone App Postoperative Home Monitoring After Oncologic Surgery on Quality of Recovery: A Randomized Clinical Trial. JAMA Surgery. 2023;158(11):1181–1188. doi:https://doi.org/10.1001/jamasurg.2023.4145

  7. [7] Kleif J, Gögenur I. Severity classification of the quality of recovery-15 score—An observational study. Journal of Surgical Research. 2018;225:101–107. doi:https://doi.org/10.1016/j.jss.2017.12.040

  8. [8] Liu Y, Stafford R, Khetrapal P, et al. Multi-Modal AI for Remote Patient Monitoring in Cancer Care. arXiv preprint arXiv:2512.00949. 2025. https://arxiv.org/abs/2512.00949

  9. [9] Stafford R, Liu Y, Khetrapal P, Carvalho G, Kocadag H, Surrao D, McBain H, Winter P, Jackson-Spence F, Powles T, Kelly JD, Drobnjak I. HALO-X: A full-stack remote patient monitoring platform for post-cancer recovery. Manuscript under review, 2025.

  10. [10] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.

  11. [11] Coste J, Guillemin F, Pouchot J, Fermanian J. Methodological approaches to shortening composite measurement scales. Journal of Clinical Epidemiology. 1997;50(3):247–252. doi:https://doi.org/10.1016/S0895-4356(97)90533-8

  12. [12] Kehlet H, Wilmore DW. Multimodal strategies to improve surgical outcome. American Journal of Surgery. 2002;183(6):630–641. doi:https://doi.org/10.1016/S0002-9610(02)00412-3

  13. [13] Lin LH, Liu Y, Khetrapal P, et al. AI-driven Optimisation of Quality of Recovery (QoR) in Remote Patient Monitoring [abstract]. Accepted as a poster at AI in Medicine 2026, Polish Institute for Evidence Based Medicine. Available at: piebm.org/abstracts/qor-remote-patient-monitoring

Supplementary Material. Figure S1: Lower-triangle Spearman correlation matr…