Temporal Data Requirement for Predicting Unplanned Hospital Readmissions

Amir T. Namin; Ramin Mohammadi; Ramya Palacholla; Sagar Kamarthi; Sarthak Jain; Vahab Vahdat

arxiv: 2605.00738 · v1 · submitted 2026-05-01 · 💻 cs.LG

Pith reviewed 2026-05-09 19:09 UTC · model grok-4.3

keywords hospital readmission · electronic health records · clinical notes · structured data · temporal windows · machine learning prediction · arthroplasty · EHR modalities

The pith

Shorter histories from clinical notes outperform longer ones for predicting surgical readmissions, while structured data improves only up to twelve months before plateauing.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests how far back in electronic health records one needs to look to best predict 30-day unplanned readmission after hip or knee replacement surgery. It compares models built from unstructured clinical notes alone, structured encounter records alone, and both together, using data from over 7,000 patients and a range of encoders from bag-of-words to BERT and BiLSTM. The results show that notes reach peak accuracy with information from only the prior three to six months, whereas structured data keeps gaining value as the window extends to twelve months and then stops improving. These modality-specific patterns hold steady no matter which model or encoder is used. If correct, the work supplies practical rules for choosing data windows that can make readmission models both more accurate and less expensive to run.

Core claim

The study shows that the optimal observation window for unstructured clinical notes is three to six months prior to surgery for maximum predictive performance in 30-day readmission after arthroplasties, while for structured encounter records performance improves with longer windows but plateaus after twelve months; these modality-specific temporal patterns remain consistent across non-neural and neural encoders and across model complexities.

What carries the argument

Modality-specific temporal performance curves, in which unstructured notes require markedly shorter observation windows than structured records to reach peak accuracy for readmission prediction.
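
A minimal sketch of how a peak and a plateau might be read off a window-versus-score curve. The AUROC values below are made up for illustration and are not results from the paper; the helper names `best_window` and `plateau_start` and the tolerance `tol` are assumptions introduced here.

```python
from typing import Dict


def best_window(curve: Dict[int, float]) -> int:
    """Return the window length (months) with the highest score.

    Ties are broken in favour of the shortest window.
    """
    return min(curve, key=lambda months: (-curve[months], months))


def plateau_start(curve: Dict[int, float], tol: float = 0.005) -> int:
    """Return the shortest window whose score is within `tol` of the best score."""
    top = max(curve.values())
    return min(months for months, score in curve.items() if top - score <= tol)


# Made-up illustrative AUROC values keyed by window length in months.
# These are NOT numbers reported by the paper.
notes_curve = {0: 0.60, 3: 0.66, 6: 0.66, 12: 0.64, 24: 0.63, 36: 0.62}
structured_curve = {0: 0.58, 3: 0.61, 6: 0.64, 12: 0.67, 24: 0.67, 36: 0.671}

print("notes peak (months):", best_window(notes_curve))
print("structured plateau starts (months):", plateau_start(structured_curve))
```

The tolerance is a judgment call: a plateau defined this way depends on how much score difference one is willing to treat as noise, which is why interval estimates per window matter.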

If this is right

Models reach maximum accuracy using notes from only the three-to-six-month window before surgery.
Structured data adds value up to twelve months but yields no further gain beyond that point.
The differing optimal windows hold for every encoder and model type tested.
Readmission prediction systems can be tuned with modality-specific time cutoffs rather than uniform long histories.
The assumption that longer historical data always improves machine-learning predictions is contradicted by these modality-specific results.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar modality-dependent window rules may apply to other EHR-based predictions such as infection risk or length of stay.
Limiting note processing to recent months could reduce storage and compute costs while preserving or improving accuracy.
Hospitals adopting these guidelines might achieve faster model retraining cycles by focusing data pipelines on the most informative periods.
The plateau in structured data suggests that very old records contain mostly redundant information for this outcome.

Load-bearing premise

The temporal patterns found in one health system's 7,174-patient cohort will appear in other hospitals and patient populations.

What would settle it

Re-running the exact same window-length experiments on readmission data from a second, independent health system and finding that optimal note windows exceed six months or that structured-data performance keeps rising past twelve months.

Original abstract

With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical data time window to maximize accuracy. This study investigates the impact of various observation windows, ranging from the day of surgery to three years prior, on predicting 30-day readmission following hip and knee arthroplasties. The dataset encompasses both structured encounter records (over 4 million) and unstructured clinical notes (80,000) from 7,174 patients. To extract meaning from the clinical notes, we employed a suite of non-neural (BOW, count BOW, TF-IDF, LDA) and neural encoders (BERT, 1D CNN, BiLSTM, Average). We subsequently evaluated models utilizing clinical notes alone, structured data alone, and a combination of both modalities. Our results demonstrate that the optimal time window for unstructured clinical notes is significantly shorter than for structured data; maximum predictive performance was achieved using notes from just three to six months prior to surgery. In contrast, performance using structured data improved as the time window lengthened, but strictly plateaued after twelve months. These modality-specific temporal patterns remained consistent regardless of model complexity or encoder type. Ultimately, these findings challenge the general assumption that more historical data inherently yields better machine learning predictions, establishing targeted time-window guidelines for optimizing readmission prediction models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows notes stop adding value after 3-6 months for readmission prediction while structured data improves to 12 months, with the pattern holding across encoders, but single-site data caps how much we can generalize the rule.

The central finding is straightforward: for 30-day readmission after hip and knee procedures, clinical notes reach peak usefulness at 3-6 months before surgery and then plateau, while structured encounter data keeps improving until 12 months and then levels off. This split appears no matter which encoder they use on the notes, from simple bag-of-words to BERT or BiLSTM. They ran the comparison on 7,174 patients with over 4 million structured records and 80,000 notes, sweeping windows from the day of surgery out to three years.

That consistency across model types is the part that feels solid and worth noting. It directly tests the common assumption that longer history is always better and gives a concrete, modality-specific cutoff instead. The work is purely empirical with no circular definitions or self-referential loops, which keeps it clean. They also check notes alone, structured alone, and combined, so the reader can see the windows don't shift much when modalities are mixed.

The main weakness is the single health system. All patterns could reflect local documentation habits, coding density, or patient mix rather than a transferable principle. No external cohort or multi-site check is described, so the exact 3-6 and 12-month numbers are hard to treat as general guidelines yet. The abstract also skips details on statistical testing, confidence intervals, or imbalance handling, though the full paper may cover them.

This is the kind of paper that matters for teams inside health systems who are trimming data pipelines and want an evidence-based rule rather than a blanket “use everything” approach. Applied ML people working on EHR models would get practical value from the curves. It deserves peer review because the experimental design is transparent and the question is actionable, even if reviewers will likely ask for external validation or clearer stats before acceptance.

Referee Report

2 major / 1 minor

Summary. The manuscript presents an empirical study on the effect of historical time windows (from surgery day to 3 years prior) on machine learning models for predicting 30-day readmission after hip and knee arthroplasties. Using data from 7,174 patients including over 4 million structured records and 80,000 clinical notes, it evaluates models based on notes alone, structured data alone, and combined, employing both traditional (BOW, TF-IDF, LDA) and neural (BERT, 1D CNN, BiLSTM, Average) encoders. The key finding is that optimal performance for notes occurs with 3-6 months of history, while structured data benefits from up to 12 months before plateauing, with these patterns consistent across encoder types. The authors argue this challenges the assumption that longer historical data always improves predictions and offers practical guidelines for time window selection.

Significance. If the modality-specific temporal patterns hold, this work could meaningfully inform the design of efficient EHR-based predictive models by demonstrating that longer histories are not always superior and may be unnecessary or even suboptimal for unstructured notes. The consistency of results across a diverse suite of non-neural and neural encoders is a clear strength, lending credibility to the empirical trends observed. Such targeted guidelines could help optimize data selection, reduce computational overhead, and improve model practicality in clinical settings. The purely empirical nature with no circular derivations further supports its value as a data-driven contribution.

major comments (2)

[Abstract] Abstract: The claim that the findings 'challenge the general assumption that more historical data inherently yields better machine learning predictions' and 'establish targeted time-window guidelines' is load-bearing on generalizability. However, the entire analysis derives from a single health system cohort of 7,174 patients with no external validation cohort, multi-site split, or cross-institutional temporal hold-out described. The modality-specific patterns (notes at 3-6 months; structured plateauing at 12 months) could arise from local documentation practices, coding styles, or base rates rather than transferable principles.
[Results] Results (performance trends): The reported optimal windows and consistency across encoders are presented without statistical testing, confidence intervals, or p-values to establish that the peaks and plateaus differ significantly from noise or sampling variation. No details are provided on handling of class imbalance (typical in readmission tasks) or missing data, which directly affects the reliability of cross-window model comparisons.
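
The kind of interval estimate this comment asks for can be sketched as a percentile bootstrap over patients. This is an illustrative stdlib-only sketch, not the authors' procedure; the function names, the resample count `n_boot`, and the toy labels and scores are assumptions made here.

```python
import random
from typing import List, Sequence, Tuple


def auroc(y_true: Sequence[int], y_score: Sequence[float]) -> float:
    """AUROC as the probability that a random positive outranks a random negative.

    Ties between a positive and a negative score count as half a win.
    """
    pos = [s for y, s in zip(y_true, y_score) if y == 1]
    neg = [s for y, s in zip(y_true, y_score) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = sum((p > n) + 0.5 * (p == n) for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_ci(
    y_true: Sequence[int],
    y_score: Sequence[float],
    n_boot: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float, float]:
    """Return (point estimate, lower bound, upper bound) for AUROC."""
    point = auroc(y_true, y_score)  # also validates that both classes exist
    rng = random.Random(seed)
    n = len(y_true)
    stats: List[float] = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]  # resample with replacement
        ys = [y_true[i] for i in idx]
        if len(set(ys)) < 2:
            continue  # skip resamples that lost one of the classes
        stats.append(auroc(ys, [y_score[i] for i in idx]))
    stats.sort()
    lo_idx = int((alpha / 2) * n_boot)
    hi_idx = n_boot - 1 - lo_idx
    return point, stats[lo_idx], stats[hi_idx]


# Toy labels and scores for illustration only (not data from the paper).
labels = [0, 0, 0, 0, 1, 0, 1, 0, 0, 1, 0, 0]
scores = [0.1, 0.3, 0.2, 0.4, 0.7, 0.35, 0.6, 0.15, 0.5, 0.45, 0.25, 0.05]
point, lower, upper = bootstrap_ci(labels, scores, n_boot=500)
print(f"AUROC {point:.3f} (95% CI {lower:.3f} to {upper:.3f})")
```

Comparing two windows would then mean checking whether their intervals, or better a bootstrap of the paired difference on the same resampled patients, separate the peak from its neighbours.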

minor comments (1)

[Methods] Methods: The exact procedure for constructing the observation windows (e.g., cumulative aggregation vs. window-restricted selection for encounters and notes) and any preprocessing steps for the 4 million structured records should be specified in greater detail to support reproducibility.
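
The distinction raised here can be made concrete with a short sketch. It assumes records are simple `(date, payload)` tuples and approximates a month as 30 days; both choices, and the function names, are illustrative assumptions rather than the paper's method.

```python
from datetime import date, timedelta
from typing import List, Tuple

Record = Tuple[date, str]  # hypothetical shape: (event date, payload)
DAYS_PER_MONTH = 30        # assumption: a month is approximated as 30 days


def cumulative_window(records: List[Record], surgery: date, months: int) -> List[Record]:
    """Keep everything from `months` before surgery up to and including surgery day."""
    start = surgery - timedelta(days=DAYS_PER_MONTH * months)
    return [r for r in records if start <= r[0] <= surgery]


def restricted_window(
    records: List[Record], surgery: date, near_months: int, far_months: int
) -> List[Record]:
    """Keep only the band between `far_months` and `near_months` before surgery."""
    far = surgery - timedelta(days=DAYS_PER_MONTH * far_months)
    near = surgery - timedelta(days=DAYS_PER_MONTH * near_months)
    return [r for r in records if far <= r[0] < near]


# Toy records for one hypothetical patient.
surgery = date(2020, 6, 1)
records = [
    (date(2019, 3, 1), "old encounter"),
    (date(2020, 1, 15), "imaging"),
    (date(2020, 5, 20), "pre-op visit"),
    (date(2020, 6, 1), "admission"),
    (date(2020, 6, 10), "post-op visit"),  # after surgery: never included
]

print(cumulative_window(records, surgery, 6))      # last 6 months, cumulative
print(restricted_window(records, surgery, 6, 36))  # only the 6-to-36-month band
```

Under cumulative aggregation a longer window always contains the shorter one, so a drop in performance means older records actively hurt; under window-restricted selection each band is evaluated on its own, which answers a different question.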

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for their constructive feedback. We address each major comment below and indicate revisions made to the manuscript.

Referee: [Abstract] Abstract: The claim that the findings 'challenge the general assumption that more historical data inherently yields better machine learning predictions' and 'establish targeted time-window guidelines' is load-bearing on generalizability. However, the entire analysis derives from a single health system cohort of 7,174 patients with no external validation cohort, multi-site split, or cross-institutional temporal hold-out described. The modality-specific patterns (notes at 3-6 months; structured plateauing at 12 months) could arise from local documentation practices, coding styles, or base rates rather than transferable principles.

Authors: We acknowledge that the study is limited to a single health system and that the observed patterns could partly reflect local documentation or coding practices. The consistency of the 3-6 month optimum for notes and 12-month plateau for structured data across both traditional (BOW, TF-IDF, LDA) and neural (BERT, CNN, BiLSTM) encoders provides some internal evidence that the modality-specific temporal behavior is not purely an artifact of one site. We have revised the abstract and added an explicit limitations paragraph that qualifies the claims as cohort-specific observations and calls for future multi-site replication. We cannot supply external validation with the current dataset.
Revision: partial

Referee: [Results] Results (performance trends): The reported optimal windows and consistency across encoders are presented without statistical testing, confidence intervals, or p-values to establish that the peaks and plateaus differ significantly from noise or sampling variation. No details are provided on handling of class imbalance (typical in readmission tasks) or missing data, which directly affects the reliability of cross-window model comparisons.

Authors: We agree that formal statistical support and clearer methodological details are needed. In the revised manuscript we now report bootstrap 95% confidence intervals for AUC and AUPRC at every time window, and we add paired statistical tests (McNemar for classification, Wilcoxon signed-rank for continuous metrics) between adjacent windows to confirm that the identified optima differ significantly from neighboring windows. Class imbalance is addressed by class-weighted cross-entropy loss and by reporting both AUC-ROC and AUPRC; missing-data handling (exclusion rules for patients lacking sufficient history and multiple imputation for structured variables) is now described in the Methods section.
Revision: yes
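
As an illustration of the class-weighted cross-entropy mentioned in this response, the sketch below uses inverse-frequency weights. The weighting formula is a common convention and an assumption here; the response does not specify which scheme was used, and the function names and toy labels are hypothetical.

```python
import math
from typing import Dict, Sequence


def class_weights(labels: Sequence[int]) -> Dict[int, float]:
    """Inverse-frequency weights: n_samples / (n_classes * count of that class)."""
    n = len(labels)
    counts = {c: list(labels).count(c) for c in set(labels)}
    return {c: n / (len(counts) * k) for c, k in counts.items()}


def weighted_bce(
    y_true: Sequence[int],
    p_pred: Sequence[float],
    weights: Dict[int, float],
    eps: float = 1e-12,
) -> float:
    """Weighted mean of the per-sample binary cross-entropy."""
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(y_true, p_pred):
        p = min(max(p, eps), 1.0 - eps)  # clip to avoid log(0)
        loss = -math.log(p) if y == 1 else -math.log(1.0 - p)
        total += weights[y] * loss
        weight_sum += weights[y]
    return total / weight_sum


# Toy cohort: 9 non-readmitted patients and 1 readmitted (illustrative only).
labels = [0] * 9 + [1]
preds = [0.1] * 10  # a model that always predicts a low readmission risk

w = class_weights(labels)
unweighted = weighted_bce(labels, preds, {0: 1.0, 1: 1.0})
weighted = weighted_bce(labels, preds, w)
print(w, unweighted, weighted)
```

With these weights the single missed readmission dominates the loss, so a model that ignores the minority class is penalised far more than under the unweighted average.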

Standing simulated objections (not resolved)

We do not have access to an external validation cohort from another health system and therefore cannot directly demonstrate that the modality-specific temporal patterns generalize beyond the single-center data.

Circularity Check

0 steps flagged

No circularity: purely empirical evaluation of time-window performance on fixed cohort

The paper trains and evaluates a suite of encoders (BOW, TF-IDF, LDA, BERT, CNN, BiLSTM) on structured records and clinical notes drawn from a fixed cohort of 7,174 patients, then measures AUROC or similar metrics for each observation window length. No equations, fitted parameters, or predictions are defined in terms of the reported optima; the optimal windows (3-6 months for notes, 12-month plateau for structured data) are direct experimental outcomes, not constructed by definition or renamed known results. No self-citations supply uniqueness theorems or ansatzes. The derivation chain is simply data partitioning → model training → performance measurement, which is self-contained and externally falsifiable on other cohorts.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on standard supervised-learning assumptions plus the representativeness of a single-institution cohort; no new entities are postulated.

axioms (2)

domain assumption: The 30-day readmission label is accurately recorded in the EHR for all patients in the cohort.
Implicit in any readmission-prediction study; no validation of label quality is described in the abstract.
domain assumption: Clinical notes and structured records are independent enough that their optimal windows can be studied separately.
The paper evaluates notes alone, structured alone, and combined, but does not test interaction effects between modalities.
