Temporal Data Requirement for Predicting Unplanned Hospital Readmissions
Pith reviewed 2026-05-09 19:09 UTC · model grok-4.3
The pith
Shorter histories from clinical notes outperform longer ones for predicting surgical readmissions, while structured data improves only up to twelve months before plateauing.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
For predicting 30-day readmission after hip and knee arthroplasty, unstructured clinical notes reach maximum predictive performance with an observation window of three to six months before surgery. Structured encounter records improve with longer windows but plateau after twelve months. These modality-specific temporal patterns remain consistent across non-neural and neural encoders and across model complexities.
What carries the argument
Modality-specific temporal performance curves, in which unstructured notes require markedly shorter observation windows than structured records to reach peak accuracy for readmission prediction.
If this is right
- Models reach maximum accuracy using notes from only the three-to-six-month window before surgery.
- Structured data adds value up to twelve months but yields no further gain beyond that point.
- The differing optimal windows hold for every encoder and model type tested.
- Readmission prediction systems can be tuned with modality-specific time cutoffs rather than uniform long histories.
- The assumption that longer historical data always improves machine-learning predictions is contradicted by these modality-specific results.
Where Pith is reading between the lines
- Similar modality-dependent window rules may apply to other EHR-based predictions such as infection risk or length of stay.
- Limiting note processing to recent months could reduce storage and compute costs while preserving or improving accuracy.
- Hospitals adopting these guidelines might achieve faster model retraining cycles by focusing data pipelines on the most informative periods.
- The plateau in structured data suggests that very old records contain mostly redundant information for this outcome.
Load-bearing premise
The temporal patterns found in one health system's 7,174-patient cohort will appear in other hospitals and patient populations.
What would settle it
Re-running the exact same window-length experiments on readmission data from a second, independent health system and finding that optimal note windows exceed six months or that structured-data performance keeps rising past twelve months.
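The replication check described above can be sketched as a small comparison over per-window performance curves. This is only an illustration: the function names (`best_window`, `plateau_start`, `contradicts_claim`), the 0.005 tolerance, and the tie-breaking rule are assumptions, and the numbers in the usage note are placeholders, not results from the paper.

```python
def best_window(curve):
    """Return the window (in months) with the highest metric.

    `curve` maps window length in months to a performance metric such as AUC.
    Ties are broken in favour of the shorter window (an assumption).
    """
    if not curve:
        raise ValueError("curve must not be empty")
    return max(curve, key=lambda w: (curve[w], -w))


def plateau_start(curve, tol=0.005):
    """Return the shortest window after which no longer window gains more than `tol`."""
    if not curve:
        raise ValueError("curve must not be empty")
    windows = sorted(curve)
    for i, w in enumerate(windows):
        later = [curve[v] for v in windows[i + 1:]]
        # True for the longest window too, since `later` is then empty.
        if all(score - curve[w] <= tol for score in later):
            return w


def contradicts_claim(notes_curve, structured_curve, tol=0.005):
    """True if notes peak beyond 6 months or structured data keeps rising past 12."""
    return best_window(notes_curve) > 6 or plateau_start(structured_curve, tol) > 12
```

For example, with placeholder curves `{0: 0.60, 3: 0.66, 6: 0.66, 12: 0.64}` for notes and `{0: 0.58, 6: 0.65, 12: 0.68, 24: 0.681}` for structured data, `contradicts_claim` returns `False`, meaning a second site with those curves would be consistent with the claim. A real check would also need the uncertainty around each point, which this sketch ignores.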
Original abstract
With the proliferation of Electronic Health Records (EHRs), a critical challenge in building predictive models is determining the optimal historical data time window to maximize accuracy. This study investigates the impact of various observation windows ranging from the day of surgery to three years prior on predicting 30-day readmission following hip and knee arthroplasties. The dataset encompasses both structured encounter records (over 4 million) and unstructured clinical notes (80,000) from 7,174 patients. To extract meaning from the clinical notes, we employed a suite of non-neural (BOW, count BOW, TF-IDF, LDA) and neural encoders (BERT, 1D CNN, BiLSTM, Average). We subsequently evaluated models utilizing clinical notes alone, structured data alone, and a combination of both modalities. Our results demonstrate that the optimal time window for unstructured clinical notes is significantly shorter than for structured data: maximum predictive performance was achieved using notes from just three to six months prior to surgery. In contrast, performance using structured data improved as the time window lengthened, but strictly plateaued after twelve months. These modality-specific temporal patterns remained consistent regardless of model complexity or encoder type. Ultimately, these findings challenge the general assumption that more historical data inherently yields better machine learning predictions, establishing targeted time-window guidelines for optimizing readmission prediction models.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents an empirical study on the effect of historical time windows (from surgery day to 3 years prior) on machine learning models for predicting 30-day readmission after hip and knee arthroplasties. Using data from 7,174 patients including over 4 million structured records and 80,000 clinical notes, it evaluates models based on notes alone, structured data alone, and combined, employing both traditional (BOW, TF-IDF, LDA) and neural (BERT, 1D CNN, BiLSTM, Average) encoders. The key finding is that optimal performance for notes occurs with 3-6 months of history, while structured data benefits from up to 12 months before plateauing, with these patterns consistent across encoder types. The authors argue this challenges the assumption that longer historical data always improves predictions and offers practical guidelines for time window selection.
Significance. If the modality-specific temporal patterns hold, this work could meaningfully inform the design of efficient EHR-based predictive models by demonstrating that longer histories are not always superior and may be unnecessary or even suboptimal for unstructured notes. The consistency of results across a diverse suite of non-neural and neural encoders is a clear strength, lending credibility to the empirical trends observed. Such targeted guidelines could help optimize data selection, reduce computational overhead, and improve model practicality in clinical settings. The purely empirical nature with no circular derivations further supports its value as a data-driven contribution.
major comments (2)
- [Abstract] Abstract: The claim that the findings 'challenge the general assumption that more historical data inherently yields better machine learning predictions' and 'establish targeted time-window guidelines' is load-bearing on generalizability. However, the entire analysis derives from a single health system cohort of 7,174 patients with no external validation cohort, multi-site split, or cross-institutional temporal hold-out described. The modality-specific patterns (notes at 3-6 months; structured plateauing at 12 months) could arise from local documentation practices, coding styles, or base rates rather than transferable principles.
- [Results] Results (performance trends): The reported optimal windows and consistency across encoders are presented without statistical testing, confidence intervals, or p-values to establish that the peaks and plateaus differ significantly from noise or sampling variation. No details are provided on handling of class imbalance (typical in readmission tasks) or missing data, which directly affects the reliability of cross-window model comparisons.
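For readers unfamiliar with the class-imbalance concern raised above, one common remedy is a class-weighted cross-entropy loss, which the authors' rebuttal further down says the revision uses. The sketch below is an assumption-laden illustration, not the paper's implementation: the inverse-frequency weighting scheme, the normalization by total weight, and both function names are choices made here.

```python
import math


def balanced_class_weights(labels):
    """Inverse-frequency weights for binary 0/1 labels: n / (2 * count of class)."""
    n = len(labels)
    n_pos = sum(labels)
    n_neg = n - n_pos
    if n_pos == 0 or n_neg == 0:
        raise ValueError("both classes must be present")
    return {0: n / (2 * n_neg), 1: n / (2 * n_pos)}


def weighted_cross_entropy(labels, probs, weights, eps=1e-12):
    """Weighted binary cross-entropy, normalized by the sum of sample weights."""
    total = 0.0
    weight_sum = 0.0
    for y, p in zip(labels, probs):
        p = min(max(p, eps), 1 - eps)  # clip to avoid log(0)
        w = weights[y]
        total += -w * (y * math.log(p) + (1 - y) * math.log(1 - p))
        weight_sum += w
    return total / weight_sum
```

With rare readmissions, the positive class receives the larger weight, so a misclassified readmission costs more than a misclassified non-readmission. Because the weights depend on the label distribution, they would need to be recomputed for each observation window if the eligible cohort changes with window length.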
minor comments (1)
- [Methods] Methods: The exact procedure for constructing the observation windows (e.g., cumulative aggregation vs. window-restricted selection for encounters and notes) and any preprocessing steps for the 4 million structured records should be specified in greater detail to support reproducibility.
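The distinction the referee draws between cumulative aggregation and window-restricted selection can be made concrete with a short sketch. The representation of events as `(date, payload)` tuples, the function names, and the inclusive/exclusive boundary choices are all assumptions for illustration; the paper's actual procedure is exactly what the comment asks the authors to specify.

```python
from datetime import date, timedelta


def cumulative_window(events, surgery_date, days_back):
    """Keep every event from `days_back` days before surgery up to the surgery date.

    Events after the surgery date are excluded so that no post-operative
    information leaks into the predictors.
    """
    start = surgery_date - timedelta(days=days_back)
    return [e for e in events if start <= e[0] <= surgery_date]


def restricted_window(events, surgery_date, near_days, far_days):
    """Keep only events between `far_days` and `near_days` before surgery.

    The far boundary is inclusive and the near boundary exclusive (an assumption),
    so consecutive bands such as 0-90 and 90-180 days do not overlap.
    """
    start = surgery_date - timedelta(days=far_days)
    end = surgery_date - timedelta(days=near_days)
    return [e for e in events if start <= e[0] < end]
```

Under the cumulative scheme a 12-month window contains everything in the 6-month window, so a plateau means the extra months add nothing. Under the restricted scheme each band is evaluated on its own, so the same curve would instead describe how informative each period is in isolation. The two readings lead to different guidelines, which is why the choice matters for reproducibility.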
Simulated Author's Rebuttal
We thank the referee for their constructive feedback. We address each major comment below and indicate revisions made to the manuscript.
Point-by-point responses
- Referee: [Abstract] Abstract: The claim that the findings 'challenge the general assumption that more historical data inherently yields better machine learning predictions' and 'establish targeted time-window guidelines' is load-bearing on generalizability. However, the entire analysis derives from a single health system cohort of 7,174 patients with no external validation cohort, multi-site split, or cross-institutional temporal hold-out described. The modality-specific patterns (notes at 3-6 months; structured plateauing at 12 months) could arise from local documentation practices, coding styles, or base rates rather than transferable principles.
  Authors: We acknowledge that the study is limited to a single health system and that the observed patterns could partly reflect local documentation or coding practices. The consistency of the 3-6 month optimum for notes and 12-month plateau for structured data across both traditional (BOW, TF-IDF, LDA) and neural (BERT, CNN, BiLSTM) encoders provides some internal evidence that the modality-specific temporal behavior is not purely an artifact of one site. We have revised the abstract and added an explicit limitations paragraph that qualifies the claims as cohort-specific observations and calls for future multi-site replication. We cannot supply external validation with the current dataset.
  Revision: partial
- Referee: [Results] Results (performance trends): The reported optimal windows and consistency across encoders are presented without statistical testing, confidence intervals, or p-values to establish that the peaks and plateaus differ significantly from noise or sampling variation. No details are provided on handling of class imbalance (typical in readmission tasks) or missing data, which directly affects the reliability of cross-window model comparisons.
  Authors: We agree that formal statistical support and clearer methodological details are needed. In the revised manuscript we now report bootstrap 95% confidence intervals for AUC and AUPRC at every time window, and we add paired statistical tests (McNemar for classification, Wilcoxon signed-rank for continuous metrics) between adjacent windows to confirm that the identified optima differ significantly from neighboring windows. Class imbalance is addressed by class-weighted cross-entropy loss and by reporting both AUC-ROC and AUPRC; missing-data handling (exclusion rules for patients lacking sufficient history and multiple imputation for structured variables) is now described in the Methods section.
  Revision: yes
- We do not have access to an external validation cohort from another health system and therefore cannot directly demonstrate that the modality-specific temporal patterns generalize beyond the single-center data.
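The bootstrap 95% confidence intervals for AUC mentioned in the authors' response could be computed along the following lines. This is a minimal sketch and not the authors' code: the rebuttal does not say which bootstrap variant is used, so the percentile method, the 1,000 resamples, the fixed seed, and the names `auc` and `bootstrap_auc_ci` are all assumptions.

```python
import random


def auc(labels, scores):
    """AUC as the probability that a positive outranks a negative (ties count 0.5)."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("both classes must be present")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def bootstrap_auc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for AUC over resampled patients."""
    auc(labels, scores)  # validates that both classes are present
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain a single class
            stats.append(auc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo = stats[int((alpha / 2) * n_boot)]
    hi = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi
```

The pairwise AUC here is quadratic in cohort size and is written for clarity rather than speed. Overlapping intervals between adjacent windows would suggest that an apparent peak or plateau cannot be distinguished from sampling variation, which is the referee's underlying concern.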
Circularity Check
No circularity: purely empirical evaluation of time-window performance on fixed cohort
full rationale
The paper trains and evaluates a suite of encoders (BOW, TF-IDF, LDA, BERT, CNN, BiLSTM) on structured records and clinical notes drawn from a fixed cohort of 7,174 patients, then measures AUROC or similar metrics for each observation window length. No equations, fitted parameters, or predictions are defined in terms of the reported optima; the optimal windows (3-6 months for notes, 12-month plateau for structured data) are direct experimental outcomes, not constructed by definition or renamed known results. No self-citations supply uniqueness theorems or ansatzes. The derivation chain is simply data partitioning → model training → performance measurement, which is self-contained and externally falsifiable on other cohorts.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption The 30-day readmission label is accurately recorded in the EHR for all patients in the cohort.
- domain assumption Clinical notes and structured records are independent enough that their optimal windows can be studied separately.