AI-driven Optimisation of Quality of Recovery (QoR) in Remote Patient Monitoring

Ivana Drobnjak; John Kelly; Li-Hsi (Sonny) Lin; Pramit Khetrapal; Ronnie Stafford; Yansong Liu

arxiv: 2606.23631 · v1 · pith:7K7JDJAU · submitted 2026-06-22 · 💻 cs.AI

Yansong Liu, Li-Hsi (Sonny) Lin, Pramit Khetrapal, Ronnie Stafford, John Kelly, Ivana Drobnjak

Pith reviewed 2026-06-26 08:17 UTC · model grok-4.3

classification 💻 cs.AI

keywords remote patient monitoring · quality of recovery · QoR-15 · item reduction · predictive modeling · postoperative recovery · AUC-ROC

The pith

A five-item subset of the QoR-15 predicts recovery severity as accurately as the full survey.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops QoR-compact, a five-question version of the QoR-15 for daily remote patient monitoring. It shows through exhaustive testing that this subset matches the full 15-item survey's ability to predict near-term recovery severity, with an AUC-ROC of 0.968 compared to 0.964. This matters because low completion rates of the longer survey in daily use limit its utility in remote monitoring, so a shorter version could improve consistency while preserving predictive power. The items cover both physical and psychological aspects of recovery.

Core claim

QoR-compact, identified by evaluating all possible five-item subsets of the QoR-15, achieves a mean AUC-ROC of 0.968 (95% CI 0.915-0.988) in predicting recovery severity. This is statistically comparable to the 0.964 baseline of the full QoR-15, while using one-third of the items. QoR-compact also tracks readmission events similarly in patient-level backtesting.
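The page does not say how the 95% interval was computed. The sketch below shows one way to summarize the results. It assumes each of the 10 bootstrap resamples yields one AUC-ROC value and that the interval is a percentile interval over those values. The function names and example values are illustrative, not taken from the paper.

```python
from statistics import mean

def percentile(sorted_vals, q):
    """Percentile of already-sorted values by linear interpolation, q in [0, 1]."""
    pos = q * (len(sorted_vals) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(sorted_vals) - 1)
    return sorted_vals[lo] + (pos - lo) * (sorted_vals[hi] - sorted_vals[lo])

def summarize_aucs(aucs, level=0.95):
    """Return (mean, lower, upper) for per-resample AUC-ROC values.

    Assumption: a percentile interval over the resample AUCs; with only ten
    resamples the tails are estimated very coarsely.
    """
    ordered = sorted(aucs)
    alpha = (1 - level) / 2
    return mean(aucs), percentile(ordered, alpha), percentile(ordered, 1 - alpha)

# Illustrative values only (not the paper's per-resample results).
print(summarize_aucs([0.90, 0.95, 1.00]))
```

With ten resamples, the 2.5th and 97.5th percentiles are interpolated between the two most extreme values. Such an interval should be read as a rough spread rather than a precise confidence bound.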

What carries the argument

Exhaustive search over all 3,003 possible five-question subsets of the QoR-15 to select the subset that best matches the full instrument's predictive performance for postoperative recovery severity.
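The search space is C(15, 5) = 3,003 subsets, which is small enough to score every one. The sketch below enumerates the subsets with `itertools.combinations` and ranks them by a caller-supplied score. `toy_score` is a hypothetical placeholder. In the study, each subset is instead scored by the AUC-ROC of an XGBoost classifier trained on only those items.

```python
from itertools import combinations
from math import comb

N_ITEMS = 15      # QoR-15 items, numbered Q1..Q15
SUBSET_SIZE = 5   # deployment target: one-third of the daily items

def all_subsets(n_items=N_ITEMS, k=SUBSET_SIZE):
    """Every k-item subset of Q1..Qn, in lexicographic order."""
    return list(combinations(range(1, n_items + 1), k))

def rank_subsets(subsets, score_fn):
    """Order subsets from best to worst by a caller-supplied score (higher is better)."""
    return sorted(subsets, key=score_fn, reverse=True)

def toy_score(subset):
    # Hypothetical placeholder that simply prefers items near Q8; a real run
    # would train and evaluate a classifier on the chosen items instead.
    return -sum(abs(q - 8) for q in subset)

subsets = all_subsets()
assert len(subsets) == comb(N_ITEMS, SUBSET_SIZE) == 3003
print(rank_subsets(subsets, toy_score)[0])  # prints (6, 7, 8, 9, 10)
```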

If this is right

QoR-compact provides a shorter daily input that maintains predictive parity with the full QoR-15 for remote monitoring.
The five items span physical and psychological recovery axes.
Patient-level tracking of readmission events remains faithful to the full form.
This parity supports testing whether lighter input increases daily completion rates.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Completion rates in daily remote monitoring may rise with fewer items per day.
External validation across different surgical cohorts and settings is needed to confirm generalizability.
The exhaustive subset search method could apply to other multi-item patient-reported outcome measures.

Load-bearing premise

The five-item subset selected from one post-surgical deployment cohort will perform equally well in other patient populations and in ongoing daily remote monitoring.

What would settle it

Measuring the AUC-ROC of QoR-compact on an independent cohort of patients from a different hospital or surgical type and finding it significantly below the full survey's performance.

Figures

Figures reproduced from arXiv: 2606.23631 by Ivana Drobnjak, John Kelly, Li-Hsi (Sonny) Lin, Pramit Khetrapal, Ronnie Stafford, Yansong Liu.

**Figure 1.** Clinical data-collection workflow of the HALO-Surgery study. Eligible patients undergoing abdominal or thoracic cancer surgery were enrolled, discharged with a remote-monitoring device, and asked to complete the QoR-15 survey daily; the submissions were streamed to the HALO platform for analysis. … feature vector is the mean score of each question over the input window (…
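The sketch below shows one way a training example might be assembled from the daily submissions described in Figure 1. It rests on several assumptions:

- The input features are the per-question means over the input window, as the caption fragment states.
- The target is the severity class of the mean daily QoR-15 total over the following days.
- The cut-offs are the bands commonly attributed to Kleif and Gögenur: excellent 136–150, good 122–135, moderate 90–121, poor 0–89. The page does not list the paper's actual mapping.

All function names are hypothetical.

```python
from statistics import mean

# Assumed severity bands on the QoR-15 total (0-150), highest cut-off first.
# These follow the bands commonly attributed to Kleif & Gogenur; the paper's
# exact mapping is not given on this page.
SEVERITY_BANDS = [(136, "excellent"), (122, "good"), (90, "moderate"), (0, "poor")]

def feature_vector(window):
    """Mean score of each question over the input window.

    window: list of daily submissions, each a list of item scores (0-10).
    """
    n_items = len(window[0])
    return [mean(day[i] for day in window) for i in range(n_items)]

def severity_class(total):
    """Map a QoR-15 total to an ordinal severity label."""
    for cutoff, label in SEVERITY_BANDS:
        if total >= cutoff:
            return label
    raise ValueError("QoR-15 total must be non-negative")

def make_example(window, following_days):
    """Pair input-window features with the class of the following days.

    Assumption: the target is the class of the mean daily total over the
    following (up to 14) days of submissions.
    """
    features = feature_vector(window)
    target = severity_class(mean(sum(day) for day in following_days))
    return features, target
```

For example, `make_example([[8] * 15, [10] * 15], [[9] * 15] * 14)` returns fifteen 9s as features and `"good"` as the target.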

**Figure 2.** Overview of the analysis. (1) The 15 QoR-15 items generate all C(15, 5) = 3,003 five-question input subsets. (2) The prediction target is the patient’s recovery class over the 14 days following the input window, mapped to four ordinal categories. (3) Each subset is benchmarked with an XGBoost multiclass classifier under 10 stratified bootstrap resamples. (4) Subsets are ranked by AUC-ROC; the five items most…
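Step (3) of Figure 2 can be sketched without XGBoost to show two evaluation pieces. The first is a stratified bootstrap that resamples patients within each recovery class. The second is a macro-averaged one-vs-rest AUC-ROC for the four-class target. This is an illustrative sketch under assumptions. The page does not say whether each resample is scored in-bag or out-of-bag, or which multiclass AUC averaging the authors used. The names below are hypothetical.

```python
import random
from collections import defaultdict

def binary_auc(scores, labels):
    """AUC-ROC via the Mann-Whitney statistic; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def macro_ovr_auc(prob_rows, labels, classes):
    """Unweighted mean of one-vs-rest AUCs; prob_rows[i][k] scores classes[k]."""
    aucs = []
    for k, cls in enumerate(classes):
        scores = [row[k] for row in prob_rows]
        aucs.append(binary_auc(scores, [y == cls for y in labels]))
    return sum(aucs) / len(aucs)

def stratified_bootstrap(labels, rng):
    """Indices drawn with replacement within each class, preserving class sizes."""
    by_class = defaultdict(list)
    for i, y in enumerate(labels):
        by_class[y].append(i)
    sample = []
    for members in by_class.values():
        sample.extend(rng.choice(members) for _ in members)
    return sample

# Ten resamples, matching the count in Figure 2 (labels are illustrative).
labels = ["excellent", "good", "moderate", "poor"] * 3
resamples = [stratified_bootstrap(labels, random.Random(seed)) for seed in range(10)]
```

Stratifying keeps every class present in each resample. This matters because a one-vs-rest AUC is undefined for any class that drops out.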

**Figure 3.** Hierarchical clustering of the 15 QoR-15 items. The dendrogram uses distance = 1 − |ρ| with average linkage on the pairwise Spearman correlations. Tight item clusters pair questions that probe the same domain: Q3 (feeling rested) with Q4 (good sleep), Q11 (moderate pain) with Q12 (severe pain), Q9 (feeling comfortable and in control) with Q10 (general well-being), and Q14 (feeling worried or anxious) with …
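The distance in Figure 3 can be computed with the standard library alone. The sketch below works in three steps:

1. Rank each item column, with tied values sharing their average rank.
2. Take the Pearson correlation of the ranks, which gives Spearman's ρ.
3. Return 1 − |ρ| for every item pair.

The function names are hypothetical. A constant column would make ρ undefined.

```python
def average_ranks(values):
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks

def pearson(x, y):
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)  # ZeroDivisionError for a constant column

def spearman(x, y):
    """Spearman's rho: Pearson correlation of average ranks."""
    return pearson(average_ranks(x), average_ranks(y))

def item_distance_matrix(columns):
    """Symmetric matrix of 1 - |rho| between item columns, zero on the diagonal."""
    n = len(columns)
    return [[0.0 if i == j else 1 - abs(spearman(columns[i], columns[j]))
             for j in range(n)] for i in range(n)]
```

The resulting square matrix could be converted with `scipy.spatial.distance.squareform` and passed to `scipy.cluster.hierarchy.linkage(..., method="average")` to build an average-linkage dendrogram. Because the distance uses |ρ|, strongly negatively correlated items would cluster as tightly as positively correlated ones.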

**Figure 4.** Exhaustive subset evaluation and robust item selection.

**Figure 5.** Longitudinal backtesting of QoR-15 baseline versus QoR-compact scores. Postoperative recovery trajectories over 30 days are shown for three individual patients (Panels A, B, and C). The full QoR-15 total score (solid line, left axis; max 150) is plotted alongside the derived 5-question QoR-compact total score (dashed line, right axis; max 50). The QoR-compact demonstrates strong alignment with the baselin…
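Figure 5 plots the 0–150 QoR-15 total and the 0–50 QoR-compact total on separate axes, which amounts to a factor-of-three rescaling. The sketch below assumes every item is already on a 0–10 scale where higher means better recovery. In the standard QoR-15, the negatively worded items Q11–Q15 are reverse-scored to achieve this. The function names are hypothetical.

```python
COMPACT_ITEMS = (3, 9, 10, 12, 14)  # QoR-compact items in QoR-15 numbering (1-based)

def qor15_total(scores):
    """Full QoR-15 total, 0-150.

    Assumes 15 item scores on 0-10 with higher = better recovery, i.e. the
    negatively worded items have already been reverse-scored.
    """
    if len(scores) != 15:
        raise ValueError("expected 15 item scores")
    return sum(scores)

def qor_compact_total(scores):
    """QoR-compact total, 0-50, from the five retained items."""
    return sum(scores[q - 1] for q in COMPACT_ITEMS)

def compact_on_full_scale(scores):
    """Compact total rescaled by 150 / 50 = 3 to share the QoR-15 axis."""
    return 3 * qor_compact_total(scores)

day = [7, 8, 6, 5, 9, 10, 8, 7, 6, 9, 4, 3, 8, 5, 7]  # illustrative responses
print(qor15_total(day), qor_compact_total(day), compact_on_full_scale(day))  # prints 102 29 87
```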

Original abstract

Remote patient monitoring depends on patient-reported data to capture the subjective dimension of recovery that devices cannot measure. The Quality of Recovery (QoR-15) survey is the gold-standard instrument for this purpose. It was designed and validated for occasional in-hospital assessment, yet remote monitoring now administers it to patients daily. In our own post-surgical deployment, only 55% of patients submitted the survey on more than 14 of 30 monitoring days. We developed QoR-compact, a five-item daily input for the RPM prediction pathway. Setting a deployment-driven target of one-third of the daily items, we exhaustively evaluated all 3,003 five-question subsets of the QoR-15 and tested whether the best of them matches the full instrument in predicting near-term postoperative recovery severity. QoR-compact achieves a mean AUC-ROC of 0.968 (95% CI 0.915-0.988), statistically comparable to the 0.964 baseline of the full QoR-15 while using one-third of the items. Patient-level backtesting indicates that it tracks readmission events as faithfully as the full form. Its five items span the physical and psychological axes of recovery: Q3 (feeling rested), Q9 (feeling comfortable and in control), Q10 (general well-being), Q12 (severe pain), and Q14 (feeling worried or anxious). The QoR-15 remains the gold-standard measure of recovery; QoR-compact complements it as a shorter daily input designed for prediction. This parity provides the basis for a prospective study of whether a lighter daily input is, in turn, completed more consistently. External validation on larger cohorts is required before clinical use.
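The 55% figure is the share of patients who submitted the survey on more than 14 of the 30 monitoring days. The sketch below shows that calculation using a hypothetical data layout: a set of submission day numbers for each patient.

```python
def adherent_fraction(submission_days, min_days=15, window=30):
    """Share of patients with submissions on at least `min_days` distinct days
    among days 1..window. "More than 14 of 30 days" is min_days=15.

    submission_days: hypothetical layout, a dict mapping a patient ID to the
    set of 1-based monitoring day numbers on which a survey was submitted.
    """
    if not submission_days:
        raise ValueError("no patients")
    adherent = sum(
        1 for days in submission_days.values()
        if len({d for d in days if 1 <= d <= window}) >= min_days
    )
    return adherent / len(submission_days)
```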

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper identifies a 5-item QoR-15 subset that matches full-instrument AUC internally via exhaustive search. However, it reports no nested CV or hold-out, so the parity claim rests on potentially overfit internal data.

The main thing here is that the authors ran an exhaustive search over all 3,003 possible 5-item subsets of the QoR-15 on their single post-surgical RPM cohort. They landed on items Q3, Q9, Q10, Q12, and Q14, which gave a mean AUC-ROC of 0.968 (CI 0.915-0.988) versus 0.964 for the full QoR-15. Patient-level backtesting showed similar tracking of readmission events. That specific subset and its reported numbers in the daily RPM setting are new.

The work is straightforward and honest about its limits. They started from a real deployment problem (only 55% of patients submitted the full QoR-15 on more than 14 of 30 monitoring days) and set a practical target of one-third of the items. The chosen items cover both physical and psychological domains, and the abstract explicitly calls for external validation before clinical use.

The soft spot is the evaluation design. Subset selection and performance reporting both happened on the same internal cohort with no description of nested cross-validation, pre-specified hold-out, or external test set. Exhaustive enumeration on the evaluation data tends to capitalize on cohort-specific correlations, so the claimed statistical comparability may shrink or disappear in new populations or prospective daily use. The abstract also omits cohort size, demographics, missing-data handling, and the exact statistical test for comparability.
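The concern can be made concrete. The sketch below selects the best five-item subset on one stratified split of patients. It then reports AUC-ROC once on a held-out split that played no part in the selection. This is a simple hold-out, not full nested cross-validation. It also swaps the paper's XGBoost classifier for a hypothetical stand-in score (the negated sum of the chosen items) so that it runs with the standard library. All names are illustrative.

```python
import random
from itertools import combinations

def auc(scores, labels):
    """AUC-ROC via the Mann-Whitney statistic; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def subset_auc(rows, labels, subset):
    # Hypothetical stand-in model: lower item scores mean worse recovery, so the
    # negated sum of the chosen items serves as a "severe" risk score.
    return auc([-sum(row[q] for q in subset) for row in rows], labels)

def stratified_split(labels, test_frac, rng):
    """Split patient indices so both classes appear in both parts."""
    selection, held_out = [], []
    for cls in (True, False):
        ids = [i for i, y in enumerate(labels) if bool(y) == cls]
        rng.shuffle(ids)
        n_test = max(1, int(len(ids) * test_frac))
        held_out += ids[:n_test]
        selection += ids[n_test:]
    return selection, held_out

def select_then_test(rows, labels, k=5, test_frac=0.3, seed=0):
    """Pick the best k-item subset on the selection split only, then score it
    once on the held-out split. Returns (subset, selection AUC, held-out AUC)."""
    sel, test = stratified_split(labels, test_frac, random.Random(seed))
    sel_rows, sel_y = [rows[i] for i in sel], [labels[i] for i in sel]
    test_rows, test_y = [rows[i] for i in test], [labels[i] for i in test]
    candidates = combinations(range(len(rows[0])), k)  # 0-based item indices
    best = max(candidates, key=lambda s: subset_auc(sel_rows, sel_y, s))
    return best, subset_auc(sel_rows, sel_y, best), subset_auc(test_rows, test_y, best)

# Pure-noise data: item scores carry no information about the label.
rng = random.Random(1)
rows = [[rng.randint(0, 10) for _ in range(15)] for _ in range(40)]
labels = [i % 2 for i in range(40)]
print(select_then_test(rows, labels))
```

On pure-noise data, the winning subset's selection-split AUC typically sits well above 0.5, while its held-out AUC hovers near 0.5. Nested or external validation is meant to expose exactly this optimism.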

This is for people working on daily remote monitoring instruments and patient-reported outcome reduction. A reader interested in QoR-15 applications or short-form instrument design would get value from the concrete subset and the compliance motivation, but the paper is too preliminary for direct adoption. It deserves a serious referee to check the full methods and push for proper validation.

Referee Report

2 major / 1 minor

Summary. The manuscript claims that exhaustive enumeration of all 3,003 five-item subsets of the QoR-15, on a single post-surgical remote-monitoring deployment cohort, yields a compact instrument (items Q3, Q9, Q10, Q12, Q14). Its AUC-ROC of 0.968 (95% CI 0.915-0.988) for predicting near-term recovery severity is statistically comparable to the 0.964 value obtained with the full QoR-15, despite using one-third of the items. Patient-level backtesting further indicates equivalent tracking of readmission events. The work positions QoR-compact as a lower-burden daily input that complements the full QoR-15 while preserving predictive utility.

Significance. If externally validated, the result would supply a practical, deployment-derived route to reducing daily patient burden from 15 to 5 items in remote monitoring without apparent loss of predictive signal for recovery severity. This directly addresses the reported adherence gap (only 55% of patients submitted on more than 14 of 30 monitoring days). The transparent enumeration of all subsets and the explicit call for prospective studies on completion rates constitute concrete, falsifiable next steps.

major comments (2)

[Abstract] Abstract: the reported AUC-ROC parity (0.968 vs 0.964) and the claim that the selected five-item subset 'tracks readmission events as faithfully as the full form' rest on exhaustive subset selection followed by performance evaluation on the identical internal cohort, with no description of nested cross-validation, pre-specified hold-out, or external test set; this selection procedure is load-bearing for the generalization claim to prospective daily RPM use.
[Abstract] Abstract: the manuscript supplies no cohort size, patient demographics, handling of missing data, or statistical test used to declare the two AUC values 'statistically comparable'; these omissions prevent assessment of whether the observed parity exceeds what would be expected from chance capitalization on cohort-specific correlations.

minor comments (1)

[Abstract] Abstract: the parenthetical descriptions of the five retained items (e.g., 'Q3 (feeling rested)') are helpful but would benefit from an explicit table mapping each to the original QoR-15 wording and subscale for readers outside the immediate domain.

Simulated Author's Rebuttal

2 responses · 0 unresolved

Thank you for the constructive review. We address each major comment below, agreeing that greater transparency on validation and methods is needed.

Referee: [Abstract] Abstract: the reported AUC-ROC parity (0.968 vs 0.964) and the claim that the selected five-item subset 'tracks readmission events as faithfully as the full form' rest on exhaustive subset selection followed by performance evaluation on the identical internal cohort, with no description of nested cross-validation, pre-specified hold-out, or external test set; this selection procedure is load-bearing for the generalization claim to prospective daily RPM use.

Authors: We agree that subset selection and evaluation were performed on the same internal cohort without nested cross-validation, hold-out set, or external testing. This limits generalization claims and risks cohort-specific overfitting. We will revise the abstract, methods, and discussion to explicitly describe the procedure as internal validation only, qualify the readmission tracking as cohort backtesting, and strengthen emphasis on the need for prospective external validation before clinical deployment. revision: yes
Referee: [Abstract] Abstract: the manuscript supplies no cohort size, patient demographics, handling of missing data, or statistical test used to declare the two AUC values 'statistically comparable'; these omissions prevent assessment of whether the observed parity exceeds what would be expected from chance capitalization on cohort-specific correlations.

Authors: We will add these details in revision: cohort size and demographics in Methods/Results, missing data handling (e.g., exclusion or imputation rules), and the statistical test for AUC comparability (e.g., bootstrap or DeLong test). The abstract will reference the updated methods to allow assessment of chance capitalization risk. revision: yes
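The rebuttal names a bootstrap as one option for the comparability test. The sketch below assumes a binary outcome for simplicity and two score vectors for the same patients, for example predictions from the full and compact inputs. It resamples patients jointly, recomputes both AUCs, and takes a percentile interval of the difference. DeLong's test is the analytic alternative and is not shown. All names here are hypothetical.

```python
import random

def auc(scores, labels):
    """AUC-ROC via the Mann-Whitney statistic; tied scores count one half."""
    pos = [s for s, y in zip(scores, labels) if y]
    neg = [s for s, y in zip(scores, labels) if not y]
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))

def paired_bootstrap_auc_diff(scores_a, scores_b, labels, n_boot=2000, level=0.95, seed=0):
    """Point estimate and percentile CI of AUC(a) - AUC(b).

    Patients are resampled jointly, so both AUCs in a replicate see the same
    patients; replicates missing a class are redrawn.
    """
    if all(labels) or not any(labels):
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if all(ys) or not any(ys):
            continue  # AUC is undefined without both classes
        diffs.append(auc([scores_a[i] for i in idx], ys) - auc([scores_b[i] for i in idx], ys))
    diffs.sort()
    alpha = (1 - level) / 2
    lower = diffs[int(alpha * (n_boot - 1))]
    upper = diffs[int((1 - alpha) * (n_boot - 1))]
    return auc(scores_a, labels) - auc(scores_b, labels), lower, upper
```

An interval that spans zero shows only that no difference was detected. Declaring the two inputs equivalent would also require a pre-specified margin that the interval falls inside.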

Circularity Check

0 steps flagged

No significant circularity; empirical selection and evaluation on internal cohort with no equations or self-citation chains

full rationale

The paper contains no equations, derivations, or first-principles claims. It describes an exhaustive enumeration of 3,003 five-item subsets on the authors' single post-surgical deployment cohort, followed by direct reporting of AUC-ROC performance for the selected subset on that same data. This is a standard (if optimistic) empirical feature-selection procedure rather than any reduction of a claimed prediction to its inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked as load-bearing steps. The result is therefore self-contained as a data-driven report on the deployment cohort without circularity under the enumerated patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms or invented entities are stated. The subset selection is data-driven via exhaustive enumeration but details on data characteristics and selection criteria are absent.

pith-pipeline@v0.9.1-grok · 5858 in / 1116 out tokens · 24091 ms · 2026-06-26T08:17:16.183386+00:00 · methodology

Reference graph

Works this paper leans on

13 extracted references · 9 canonical work pages

[1] Shaik T, Tao X, Higgins N, et al. Remote patient monitoring using artificial intelligence: Current state, applications, and challenges. WIREs Data Mining and Knowledge Discovery. 2023;13(2):e1485. doi:10.1002/widm.1485

[2] Churruca K, Pomare C, Ellis LA, et al. Patient-reported outcome measures (PROMs): A review of generic and condition-specific measures and a discussion of trends and issues. Health Expectations. 2021;24(4):1015–1024. doi:10.1111/hex.13254

[3] Stark PA, Myles PS, Burke JA. Development and psychometric evaluation of a postoperative quality of recovery score: the QoR-15. Anesthesiology. 2013;118(6):1332–1340. doi:10.1097/ALN.0b013e318289b84b

[4] Myles PS, Shulman MA, Reilly J, Kasza J, Romero L. Measurement of quality of recovery after surgery using the 15-item quality of recovery scale: a systematic review and meta-analysis. British Journal of Anaesthesia. 2022;128(6):1029–1039. doi:10.1016/j.bja.2022.03.009

[5] Semple C, Bhatt B, Sharpe S, et al. Using a mobile app for monitoring post-operative quality of recovery of patients at home: a feasibility study. Journal of Medical Internet Research. 2015;17(7):e168. doi:10.2196/jmir.3851

[6] Temple-Oberle C, Shea-Budgell MA, Tan M, et al. Effect of Smartphone App Postoperative Home Monitoring After Oncologic Surgery on Quality of Recovery: A Randomized Clinical Trial. JAMA Surgery. 2023;158(11):1181–1188. doi:10.1001/jamasurg.2023.4145

[7] Kleif J, Gögenur I. Severity classification of the quality of recovery-15 score—An observational study. Journal of Surgical Research. 2018;225:101–107. doi:10.1016/j.jss.2017.12.040

[8] Liu Y, Stafford R, Khetrapal P, et al. Multi-Modal AI for Remote Patient Monitoring in Cancer Care. arXiv preprint arXiv:2512.00949. 2025. https://arxiv.org/abs/2512.00949

[9] Stafford R, Liu Y, Khetrapal P, Carvalho G, Kocadag H, Surrao D, McBain H, Winter P, Jackson-Spence F, Powles T, Kelly JD, Drobnjak I. HALO-X: A full-stack remote patient monitoring platform for post-cancer recovery. Manuscript under review, 2025.

[10] Pedregosa F, Varoquaux G, Gramfort A, et al. Scikit-learn: Machine Learning in Python. Journal of Machine Learning Research. 2011;12:2825–2830.

[11] Coste J, Guillemin F, Pouchot J, Fermanian J. Methodological approaches to shortening composite measurement scales. Journal of Clinical Epidemiology. 1997;50(3):247–252. doi:10.1016/S0895-4356(97)90533-8

[12] Kehlet H, Wilmore DW. Multimodal strategies to improve surgical outcome. American Journal of Surgery. 2002;183(6):630–641. doi:10.1016/S0002-9610(02)00412-3

[13] Lin LH, Liu Y, Khetrapal P, et al. AI-driven Optimisation of Quality of Recovery (QoR) in Remote Patient Monitoring [abstract]. Accepted as a poster at AI in Medicine 2026, Polish Institute for Evidence Based Medicine. Available at: piebm.org/abstracts/qor-remote-patient-monitoring

Supplementary Material, Figure S1: Lower-triangle Spearman correlation matr...
