arxiv: 1907.00089 · v1 · pith:PJ6COE77new · submitted 2019-06-28 · 💻 cs.CY · cs.LG · stat.ML

Learning to Identify Patients at Risk of Uncontrolled Hypertension Using Electronic Health Records Data

Pith reviewed 2026-05-25 12:45 UTC · model grok-4.3

classification 💻 cs.CY · cs.LG · stat.ML
keywords hypertension · machine learning · electronic health records · risk prediction · uncontrolled hypertension · logistic regression · recurrent neural networks · AUROC

The pith

Machine learning models using EHR data can identify patients likely to have uncontrolled hypertension in the next three months.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and tests logistic regression and recurrent neural network models to predict which patients with hypertension will show uncontrolled blood pressure in the coming three months. It uses data from over 17,000 patients' electronic health records. The best model reaches an area under the ROC curve of 0.719, better than just using the most recent blood pressure reading as a predictor. Surprisingly, the simpler logistic regression performs as well as or better than the recurrent neural networks. This suggests that early identification could enable more proactive care for at-risk patients.

Core claim

Logistic regression and recurrent neural networks trained on electronic health record data from 14,407 patients can stratify hypertension patients by their risk of uncontrolled hypertension within three months, with the best model achieving an AUROC of 0.719 compared to a baseline of 0.634 using only the last blood pressure measure.
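
To make the comparison concrete, the sketch below computes AUROC through its rank interpretation: the probability that a randomly chosen uncontrolled patient receives a higher score than a randomly chosen controlled one, with ties counted as one half. The function name `auroc` and the toy labels and readings are illustrative assumptions, not code or data from the paper. For a last-BP baseline, the score would simply be the most recent blood pressure reading.

```python
def auroc(labels, scores):
    """AUROC as the fraction of (positive, negative) pairs ranked correctly.

    labels: 1 = uncontrolled, 0 = controlled. Ties count as half a correct pair.
    Quadratic in the number of patients; fine for a sketch, slow for large cohorts.
    """
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    correct = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                correct += 1.0
            elif p == n:
                correct += 0.5
    return correct / (len(pos) * len(neg))


# Toy example (made-up values): last systolic reading used directly as the score.
toy_labels = [0, 0, 1, 1]
toy_last_systolic = [118, 142, 135, 150]
toy_auroc = auroc(toy_labels, toy_last_systolic)  # 3 of 4 pairs ranked correctly
```

An AUROC of 0.5 corresponds to random ranking, so both reported values (0.634 and 0.719) should be read against that floor rather than against zero.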

What carries the argument

Logistic regression and recurrent neural networks applied to sequences of patient EHR features for three-month risk stratification of uncontrolled hypertension.

If this is right

  • Targeted use of personalized treatments for high-risk patients becomes feasible.
  • Simple linear models like logistic regression serve as strong baselines and may suffice for EHR predictive tasks.
  • Recurrent neural networks do not provide additional benefit over logistic regression in this setting.
  • Proactive management of hypertension could decrease incidence of uncontrolled cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar modeling approaches might apply to predicting other chronic disease complications using EHR.
  • Deployment would require validation on diverse populations beyond the training data.
  • Integration into clinical workflows could change how follow-up visits are scheduled.

Load-bearing premise

The electronic health records from the studied patients are complete, unbiased, and representative of future patients seen in clinical practice.

What would settle it

A drop in predictive performance below the reported AUROC when the model is applied to a new, independent cohort of patients from a different healthcare system.

Figures

Figures reproduced from arXiv: 1907.00089 by Byron C. Wallace, Ramin Mohammadi, Ramya Palacholla, Sagar Kamarthi, Sarthak Jain, Stephen Agboola.

Figure 1: The cohort used excludes deceased patients; patients older than 90 and younger than 18; those with fewer than 2 records in a year; and those with no vital sign records.
… Organizations (ACOs) face significant financial penalties if more than half of their hypertensive population remain uncontrolled at the end of the financial year [7]. Thus, hospitals and care providers have additional incentives to mitigate u…
Figure 2: A schematic depicting the retrospective predictive task setup we consider. We acquired and cleaned EHR data from all patients in our cohort and created targets that reflect their hypertension status in a ninety-day window from the point of prediction.
Schematic labels: Patient Level: Demographic information, Health history, Health information, Vital information, Laboratory test results, Co-morbidities, Medication information. Hospital Leve…
Figure 3: LSTM model for processing visits in sequence.
… values were dropped (see Appendix D and Appendix E). Categorical variables were converted to one-hot representation (i.e., indicator vectors). We labeled patients with systolic BP above 140 or diastolic BP above 90 as uncontrolled and others as controlled. Uncontrolled and controlled statuses were coded as 1 and 0, respectively. We fit our model on the training…
Figure 4: ROC curves of each method over the test set.
We used early stopping criteria for assessing convergence, terminating training when loss decreased by ≤ 10⁻⁷ on the validation set. Under this criterion, the LR model trained for 500 epochs, and the LSTM model ran for 250. Results: We compared developed models against the natural baseline of using the patient’s BP measure from their most recent (last) visit as the p…
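
The text excerpted alongside Figures 3 and 4 describes three small preprocessing and training rules: the 140/90 labeling threshold, one-hot encoding of categorical variables, and an early-stopping tolerance of 10⁻⁷ on validation loss. A minimal sketch of those rules follows. The function names, argument names, and the strict-inequality reading of "above" are assumptions made for illustration; the paper's own implementation (built on Keras and TensorFlow, per its references) is not reproduced here.

```python
def label_uncontrolled(systolic, diastolic):
    """Return 1 (uncontrolled) if systolic > 140 or diastolic > 90, else 0."""
    return 1 if systolic > 140 or diastolic > 90 else 0


def one_hot(value, categories):
    """Indicator vector for `value` over a fixed, ordered list of categories."""
    return [1 if value == c else 0 for c in categories]


def should_stop(prev_val_loss, curr_val_loss, min_delta=1e-7):
    """Early stopping: stop once validation loss improves by at most min_delta."""
    return (prev_val_loss - curr_val_loss) <= min_delta


# Made-up readings for illustration only.
example_label = label_uncontrolled(systolic=152, diastolic=84)    # uncontrolled
example_vector = one_hot("former", ["never", "former", "current"])
```

Whether a reading of exactly 140/90 counts as uncontrolled depends on how "above" is interpreted; the sketch treats it as controlled.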
Original abstract

Hypertension is a major risk factor for stroke, cardiovascular disease, and end-stage renal disease, and its prevalence is expected to rise dramatically. Effective hypertension management is thus critical. A particular priority is decreasing the incidence of uncontrolled hypertension. Early identification of patients at risk for uncontrolled hypertension would allow targeted use of personalized, proactive treatments. We develop machine learning models (logistic regression and recurrent neural networks) to stratify patients with respect to the risk of exhibiting uncontrolled hypertension within the coming three-month period. We trained and tested models using EHR data from 14,407 and 3,009 patients, respectively. The best model achieved an AUROC of 0.719, outperforming the simple, competitive baseline of basing predictions on the last BP measure alone (0.634). Perhaps surprisingly, recurrent neural networks did not outperform a simple logistic regression for this task, suggesting that linear models should be included as strong baselines for predictive tasks using EHR

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 0 minor

Summary. The manuscript develops logistic regression and recurrent neural network models to predict the risk that a patient will exhibit uncontrolled hypertension in the next three-month window, using electronic health record data. Models are trained on 14,407 patients and evaluated on a held-out set of 3,009 patients; the best model reaches an AUROC of 0.719, exceeding the baseline that simply uses the most recent blood-pressure measurement (AUROC 0.634). The authors observe that the RNN does not outperform logistic regression and therefore recommend that linear models be retained as strong baselines for EHR prediction tasks.

Significance. If the reported performance difference is reproducible and generalizable, the work supplies a concrete, low-complexity risk-stratification signal that could support targeted hypertension management. The explicit comparison against a competitive last-BP baseline and the counter-intuitive finding that a recurrent architecture adds no value are useful contributions that strengthen the empirical literature on EHR-based forecasting.

major comments (2)
  1. [Methods] Methods section: the abstract (and therefore the central performance claim) supplies no description of cohort assembly, inclusion/exclusion criteria, the temporal or random nature of the 14,407/3,009 split, feature construction, or handling of missing blood-pressure values. These omissions are load-bearing because systematic missingness or selection effects in EHR follow-up frequency could inflate the reported AUROC difference relative to the last-BP baseline.
  2. [Results] Results: no confidence intervals, statistical test, or calibration plot is mentioned for the AUROC values 0.719 versus 0.634. Without these, it is impossible to judge whether the 0.085 absolute improvement is distinguishable from sampling variability and therefore whether the headline claim of outperformance is actionable.
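
One common way to address the second comment is a paired percentile bootstrap over test-set patients. The sketch below is an illustration under stated assumptions, not the authors' analysis: `auroc` and `bootstrap_auroc_diff` are hypothetical helper names, and the resample count, seed, and percentile method are arbitrary choices.

```python
import random


def auroc(labels, scores):
    """Pairwise AUROC; ties between a positive and a negative count as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_diff(labels, scores_a, scores_b, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for AUROC(scores_a) - AUROC(scores_b).

    Patients are resampled with replacement, and both score sets are evaluated
    on the same resample so the comparison stays paired.
    """
    if len(set(labels)) < 2:
        raise ValueError("labels must contain both classes")
    rng = random.Random(seed)
    n = len(labels)
    diffs = []
    while len(diffs) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        y = [labels[i] for i in idx]
        if len(set(y)) < 2:  # AUROC is undefined without both classes; redraw
            continue
        a = auroc(y, [scores_a[i] for i in idx])
        b = auroc(y, [scores_b[i] for i in idx])
        diffs.append(a - b)
    diffs.sort()
    lo = diffs[int((alpha / 2) * n_boot)]
    hi = diffs[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return lo, hi
```

If the resulting interval for the difference excludes zero, the improvement over the baseline is unlikely to be explained by test-set sampling variability alone. The interval says nothing about calibration, which needs a separate check.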

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We will revise the manuscript to address the points raised on methodological transparency and statistical reporting. Point-by-point responses follow.

  1. Referee: [Methods] Methods section: the abstract (and therefore the central performance claim) supplies no description of cohort assembly, inclusion/exclusion criteria, the temporal or random nature of the 14,407/3,009 split, feature construction, or handling of missing blood-pressure values. These omissions are load-bearing because systematic missingness or selection effects in EHR follow-up frequency could inflate the reported AUROC difference relative to the last-BP baseline.

    Authors: We agree these details are essential for evaluating the results. We will expand the Methods section to explicitly describe cohort assembly, inclusion/exclusion criteria, the random (non-temporal) nature of the 14,407/3,009 split, feature construction from EHR data, and the handling of missing blood-pressure values (via forward-fill where clinically appropriate or exclusion). This revision will clarify the comparison to the last-BP baseline. revision: yes

  2. Referee: [Results] Results: no confidence intervals, statistical test, or calibration plot is mentioned for the AUROC values 0.719 versus 0.634. Without these, it is impossible to judge whether the 0.085 absolute improvement is distinguishable from sampling variability and therefore whether the headline claim of outperformance is actionable.

    Authors: We agree that uncertainty quantification and a formal comparison are needed. We will add 95% bootstrap confidence intervals for both AUROCs, apply a statistical test for the difference (e.g., DeLong test), and include a calibration plot in the revised results section. revision: yes
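
The first response mentions forward-fill as one way of handling missing blood-pressure values. A minimal sketch of that idea is below. The function name `forward_fill`, the use of `None` for a missing reading, and the choice to leave leading gaps unfilled are assumptions for illustration; the response does not specify these details.

```python
def forward_fill(readings):
    """Replace each missing reading (None) with the most recent observed value.

    Leading gaps have nothing to carry forward and stay None, so the caller can
    exclude those visits.
    """
    filled = []
    last = None
    for value in readings:
        if value is not None:
            last = value
        filled.append(last)
    return filled


# Made-up systolic readings across five visits; None marks a missing value.
toy_systolic = [None, 128, None, None, 146]
toy_filled = forward_fill(toy_systolic)
```

Forward-filled values make a last-BP baseline look at stale readings, so any comparison against that baseline should state whether filled values were allowed as the "last" measurement.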

Circularity Check

0 steps flagged

No circularity: purely empirical held-out evaluation

The paper trains logistic regression and RNN models on EHR data from 14,407 patients and reports AUROC 0.719 on a separate 3,009-patient test set, outperforming a last-BP baseline (0.634). No equations, derivations, or self-citations are present that reduce the reported performance metric to any fitted input by construction. The result is a standard empirical comparison on held-out data and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the empirical performance of fitted machine-learning models; the only non-standard elements are the specific EHR cohort and the 3-month prediction window.

free parameters (1)
  • logistic regression and RNN parameters
    All model weights are fitted to the training EHR data; no explicit count or values given in abstract.
axioms (2)
  • domain assumption Training and test splits are representative of the target clinical population
    Required for any generalization claim from the reported AUROC.
  • domain assumption Uncontrolled hypertension can be reliably labeled from EHR fields
    Implicit in the supervised learning setup.

Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 2 internal anchors

  1. [1]

    Hypertension management: an update,

    Q. Nguyen, J. Dominguez, L. Nguyen, and N. Gullapalli, “Hypertension management: an update,” American health & drug benefits, vol. 3, no. 1, p. 47, 2010

  2. [2]

    Heart disease and stroke statistics–2015 update: a report from the american heart association,

    D. Mozaffarian, “Heart disease and stroke statistics–2015 update: a report from the american heart association,” Circulation, vol. 131, no. 4, pp. e29–e322, 2015

  3. [3]

    New acc/aha high blood pressure guidelines lower definition of hypertension,

    American College of Cardiology Foundation et al., “New acc/aha high blood pressure guidelines lower definition of hypertension,” 2018

  4. [4]

    Long-term absolute benefit of lowering blood pressure in hypertensive patients according to the jnc vi risk stratification,

    L. G. Ogden, J. He, E. Lydick, and P. K. Whelton, “Long-term absolute benefit of lowering blood pressure in hypertensive patients according to the jnc vi risk stratification,” Hypertension, vol. 35, no. 2, pp. 539–543, 2000

  5. [5]

    Risk stratification in hypertension: new insights from the framingham study,

    W. B. Kannel, “Risk stratification in hypertension: new insights from the framingham study,” American journal of hypertension, vol. 13, no. S1, pp. 3S–10S, 2000

  6. [6]

    Accountable care organization (aco),

    G. W. de la Torre JI, “Accountable care organization (aco),” Medical Care Research and Review, 2017

  7. [7]

    Accountable care organizations, explained,

    J. Gold, “Accountable care organizations, explained,” 2015

  8. [8]

    Predicting changes in hypertension control using electronic health records from a chronic disease management program,

    J. Sun, C. D. McNaughton, P. Zhang, A. Perer, A. Gkoulalas-Divanis, J. C. Denny, J. Kirby, T. Lasko, A. Saip, and B. A. Malin, “Predicting changes in hypertension control using electronic health records from a chronic disease management program,” Journal of the American Medical Informatics Association, vol. 21, no. 2, pp. 337–344, 2013

  9. [9]

    Long short-term memory,

    S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997

  10. [10]

    Class imbalance, redux,

    B. C. Wallace, K. Small, C. E. Brodley, and T. A. Trikalinos, “Class imbalance, redux,” in Data Mining (ICDM), 2011 IEEE 11th International Conference on, pp. 754–763, IEEE, 2011

  11. [11]

    Learning to Diagnose with LSTM Recurrent Neural Networks

    Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to diagnose with lstm recurrent neural networks,” arXiv preprint arXiv:1511.03677, 2015

  12. [12]

    Scalable and accurate deep learning with electronic health records,

    A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, X. Liu, J. Marcus, M. Sun, et al., “Scalable and accurate deep learning with electronic health records,” npj Digital Medicine, vol. 1, no. 1, p. 18, 2018

  13. [13]

    Keras,

    F. Chollet et al., “Keras.” https://keras.io, 2015

  14. [14]

    Tensorflow: a system for large-scale machine learning,

    M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., “Tensorflow: a system for large-scale machine learning,” in OSDI, vol. 16, pp. 265–283, 2016

  15. [15]

    Adam: A Method for Stochastic Optimization

    D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014

  16. [16]

    Axiomatic attribution for deep networks,

    M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in International Conference on Machine Learning, pp. 3319–3328, 2017

  17. [17]

    Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls,

    J. A. Sterne, I. R. White, J. B. Carlin, M. Spratt, P. Royston, M. G. Kenward, A. M. Wood, and J. R. Carpenter, “Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls,” Bmj, vol. 338, p. b2393, 2009

  18. [18]

    Modeling missing data in clinical time series with rnns,

    Z. C. Lipton, D. C. Kale, and R. Wetzel, “Modeling missing data in clinical time series with rnns,” Machine Learning for Healthcare, 2016

  19. [19]

    Supervised machine learning: A review of classification techniques,

    S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, “Supervised machine learning: A review of classification techniques,” Emerging artificial intelligence applications in computer engineering, vol. 160, pp. 3–24, 2007.

Appendix A: Medications
  Drug Family: Types
  ACE Inhibitor: Lisinopril, Benazepril
  Calcium channel blocker: Amlodipine, Nifedipine
  ...