Learning to Identify Patients at Risk of Uncontrolled Hypertension Using Electronic Health Records Data

Byron C. Wallace; Ramin Mohammadi; Ramya Palacholla; Sagar Kamarthi; Sarthak Jain; Stephen Agboola

arXiv: 1907.00089 · v1 · pith:PJ6COE77new · submitted 2019-06-28 · cs.CY · cs.LG · stat.ML

Ramin Mohammadi, Sarthak Jain, Stephen Agboola, Ramya Palacholla, Sagar Kamarthi, Byron C. Wallace

Pith reviewed 2026-05-25 12:45 UTC · model grok-4.3

classification cs.CY · cs.LG · stat.ML

keywords hypertension · machine learning · electronic health records · risk prediction · uncontrolled hypertension · logistic regression · recurrent neural networks · AUROC

The pith

Machine learning models using EHR data can identify patients likely to have uncontrolled hypertension in the next three months.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops and tests logistic regression and recurrent neural network models to predict which patients with hypertension will show uncontrolled blood pressure in the coming three months. It uses data from over 17,000 patients' electronic health records. The best model reaches an area under the ROC curve of 0.719, better than just using the most recent blood pressure reading as a predictor. Surprisingly, the simpler logistic regression performs as well as or better than the recurrent neural networks. This suggests that early identification could enable more proactive care for at-risk patients.

Core claim

Logistic regression and recurrent neural networks trained on electronic health record data from 14,407 patients can stratify hypertension patients by their risk of uncontrolled hypertension within three months, with the best model achieving an AUROC of 0.719 compared to a baseline of 0.634 using only the last blood pressure measure.
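The AUROC comparison in the core claim can be illustrated with a small sketch. It computes AUROC as the probability that a randomly chosen uncontrolled patient receives a higher score than a randomly chosen controlled one, and applies it once to a last-BP score and once to a model score. The function name `auroc` and all toy values are hypothetical and are not taken from the paper; the numbers it prints say nothing about the reported 0.719 and 0.634.

```python
def auroc(labels, scores):
    """Rank-based AUROC: P(score of a positive > score of a negative), ties count 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative example")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical toy cohort: 1 = uncontrolled within the window, 0 = controlled.
labels = [1, 0, 1, 0, 1, 0]
# Baseline score: the most recent systolic reading (illustrative values only).
last_sbp = [150, 128, 135, 138, 160, 120]
# Model score: a predicted risk from some fitted classifier (illustrative values only).
model_risk = [0.81, 0.22, 0.64, 0.40, 0.90, 0.15]

print(round(auroc(labels, last_sbp), 3))    # 0.889
print(round(auroc(labels, model_risk), 3))  # 1.0
```

Because AUROC depends only on the ranking of scores, the raw last-BP value can be used as a score directly, with no thresholding.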

What carries the argument

Logistic regression and recurrent neural networks applied to sequences of patient EHR features for three-month risk stratification of uncontrolled hypertension.
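One way a logistic regression could consume a variable-length visit sequence is to collapse it into a fixed-length vector first. The sketch below does this with the last visit's values plus per-feature means, then fits the model by plain gradient descent. This summarization, the helper names, the learning rate, and the toy data are all assumptions made for illustration; the page does not say how the paper built its logistic regression inputs.

```python
import math


def summarize_visits(visits):
    """Collapse a list of per-visit feature lists into [last visit..., feature means...]."""
    n_feat = len(visits[0])
    means = [sum(v[j] for v in visits) / len(visits) for j in range(n_feat)]
    return list(visits[-1]) + means


def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def predict_risk(w, b, x):
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


def fit_logistic_regression(X, y, lr=0.1, epochs=500):
    """Batch gradient descent on the mean log loss (hyperparameters are illustrative)."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = predict_risk(w, b, xi) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


# Hypothetical patients: each visit holds one standardized systolic BP value.
patients = [
    [[0.5], [1.0]],      # later uncontrolled
    [[0.25], [0.75]],    # later uncontrolled
    [[-0.5], [-1.0]],    # later controlled
    [[-0.25], [-0.5]],   # later controlled
]
y = [1, 1, 0, 0]
X = [summarize_visits(p) for p in patients]
w, b = fit_logistic_regression(X, y)
print([round(predict_risk(w, b, x), 2) for x in X])
```

A recurrent model would instead read the visits one at a time; the fixed-length summary is what lets a linear model be applied to the same records.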

If this is right

Targeted use of personalized treatments for high-risk patients becomes feasible.
Simple linear models like logistic regression serve as strong baselines and may suffice for EHR predictive tasks.
Recurrent neural networks do not provide additional benefit over logistic regression in this setting.
Proactive management of hypertension could decrease incidence of uncontrolled cases.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Similar modeling approaches might apply to predicting other chronic disease complications using EHR.
Deployment would require validation on diverse populations beyond the training data.
Integration into clinical workflows could change how follow-up visits are scheduled.

Load-bearing premise

The electronic health records from the studied patients are complete, unbiased, and representative of future patients seen in clinical practice.

What would settle it

A drop in predictive performance below the reported AUROC when the model is applied to a new, independent cohort of patients from a different healthcare system.

Figures

Figures reproduced from arXiv: 1907.00089 by Byron C. Wallace, Ramin Mohammadi, Ramya Palacholla, Sagar Kamarthi, Sarthak Jain, Stephen Agboola.

**Figure 1.** The cohort used excludes deceased patients; patients older than 90 and younger than 18; those with fewer than 2 records in a year; and those with no vital sign records.

… Organizations (ACOs) face significant financial penalties if more than half of their hypertensive population remain uncontrolled at the end of the financial year [7]. Thus, hospitals and care providers have additional incentives to mitigate u…

**Figure 2.** A schematic depicting the retrospective predictive task setup we consider. We acquired and cleaned EHR data from all patients in our cohort and created targets that reflect their hypertension status in a ninety-day window from the point of prediction. Panel labels: Patient level (demographic information, health history, health information, vital information, laboratory test results, co-morbidities, medication information); Hospital leve…

**Figure 3.** LSTM model for processing visits in sequence.

… values were dropped (see Appendix D and Appendix E). Categorical variables were converted to one-hot representation (i.e., indicator vectors). We labeled patients with systolic BP above 140 or diastolic BP above 90 as uncontrolled and others as controlled. Uncontrolled and controlled statuses were coded as 1 and 0, respectively. We fit our model on the training…
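The labeling rule and one-hot encoding quoted above can be sketched in a few lines. The thresholds (systolic above 140 or diastolic above 90) and the 1/0 coding come from the quoted text; the function names and the example category list are hypothetical.

```python
def label_uncontrolled(systolic, diastolic):
    """Return 1 (uncontrolled) if systolic > 140 or diastolic > 90, else 0 (controlled)."""
    return 1 if systolic > 140 or diastolic > 90 else 0


def one_hot(value, categories):
    """Indicator vector with a 1 at the position of `value` in `categories`."""
    return [1 if value == c else 0 for c in categories]


print(label_uncontrolled(152, 84))  # 1
print(label_uncontrolled(140, 90))  # 0: the quoted rule says "above", so the boundary is controlled
print(one_hot("former", ["never", "former", "current"]))  # [0, 1, 0]
```

Reading "above" as a strict inequality is an interpretation of the quoted sentence; a reading at exactly 140/90 would flip class under a non-strict comparison.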

**Figure 4.** ROC curves of each method over the test set.

We used early stopping criteria for assessing convergence, terminating training when loss decreased by ≤ 10^-7 on the validation set. Under this criterion, the LR model trained for 500 epochs, and the LSTM model ran for 250.

Results. We compared developed models against the natural baseline of using the patient’s BP measure from their most recent (last) visit as the p…
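The early stopping criterion quoted above (stop once the validation loss improves by no more than 10^-7) can be sketched as a generic training loop. The `step_fn` callback, the function name, and the toy loss sequence are hypothetical; the paper's models were built with Keras and TensorFlow according to its reference list, and this sketch does not reproduce that code.

```python
def train_with_early_stopping(step_fn, max_epochs, min_delta=1e-7):
    """Run step_fn(epoch) -> validation loss until the loss improves by <= min_delta.

    Returns (last epoch run, validation loss at that epoch).
    """
    prev_loss = float("inf")
    for epoch in range(1, max_epochs + 1):
        loss = step_fn(epoch)
        if prev_loss - loss <= min_delta:
            return epoch, loss
        prev_loss = loss
    return max_epochs, prev_loss


# Hypothetical validation losses standing in for a real training step.
toy_losses = [1.0, 0.5, 0.4, 0.4, 0.3]
epoch, loss = train_with_early_stopping(lambda e: toy_losses[e - 1], max_epochs=5)
print(epoch, loss)  # 4 0.4
```

With a threshold this small, training stops only once the validation loss has effectively plateaued or started to rise.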

Original abstract

Hypertension is a major risk factor for stroke, cardiovascular disease, and end-stage renal disease, and its prevalence is expected to rise dramatically. Effective hypertension management is thus critical. A particular priority is decreasing the incidence of uncontrolled hypertension. Early identification of patients at risk for uncontrolled hypertension would allow targeted use of personalized, proactive treatments. We develop machine learning models (logistic regression and recurrent neural networks) to stratify patients with respect to the risk of exhibiting uncontrolled hypertension within the coming three-month period. We trained and tested models using EHR data from 14,407 and 3,009 patients, respectively. The best model achieved an AUROC of 0.719, outperforming the simple, competitive baseline of basing predictions on the last BP measure alone (0.634). Perhaps surprisingly, recurrent neural networks did not outperform a simple logistic regression for this task, suggesting that linear models should be included as strong baselines for predictive tasks using EHR

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Logistic regression matches RNN performance here but the abstract gives almost no methods detail, so the 0.719 AUROC is hard to evaluate or trust for deployment.

The main point is that logistic regression reaches an AUROC of 0.719 on predicting uncontrolled hypertension in the next three months from EHR data, beats the last-BP baseline at 0.634, and RNNs add nothing. That empirical observation is the paper's real contribution. It is useful to see a case where the simpler model is enough and to have the explicit suggestion that linear baselines should be tried first on EHR tasks. The inclusion of a competitive baseline is also a plus; too many papers skip that step. The work is otherwise standard supervised learning applied to a well-defined clinical endpoint.

The central weakness is the absence of any description of cohort construction, feature extraction, missing-data handling, temporal splitting, or site characteristics. With only patient counts and AUROC numbers, it is impossible to judge whether the result reflects real signal or artifacts from how the data were assembled or labeled. Single-center EHR data often carry selection and measurement biases that do not travel, and nothing in the abstract addresses external validation or sensitivity checks.

This paper is mainly for researchers who build clinical prediction models on EHR and want a quick data point on when RNNs are not worth the trouble. It is not a methods paper and does not claim new algorithms. A reader looking for reproducible risk scores or deployment guidance will not get much from it without the full methods.

It deserves peer review if the full manuscript supplies the missing details on data processing and shows some attempt at temporal validation or multi-site checks; otherwise the numerical claim is too under-specified to justify referee time. I would not cite the numbers as they stand but would note the RNN-versus-LR comparison if a methods paper later fills in the gaps.

Referee Report

2 major / 0 minor

Summary. The manuscript develops logistic regression and recurrent neural network models to predict the risk that a patient will exhibit uncontrolled hypertension in the next three-month window, using electronic health record data. Models are trained on 14,407 patients and evaluated on a held-out set of 3,009 patients; the best model reaches an AUROC of 0.719, exceeding the baseline that simply uses the most recent blood-pressure measurement (AUROC 0.634). The authors observe that the RNN does not outperform logistic regression and therefore recommend that linear models be retained as strong baselines for EHR prediction tasks.

Significance. If the reported performance difference is reproducible and generalizable, the work supplies a concrete, low-complexity risk-stratification signal that could support targeted hypertension management. The explicit comparison against a competitive last-BP baseline and the counter-intuitive finding that a recurrent architecture adds no value are useful contributions that strengthen the empirical literature on EHR-based forecasting.

major comments (2)

[Methods] Methods section: the abstract (and therefore the central performance claim) supplies no description of cohort assembly, inclusion/exclusion criteria, the temporal or random nature of the 14,407/3,009 split, feature construction, or handling of missing blood-pressure values. These omissions are load-bearing because systematic missingness or selection effects in EHR follow-up frequency could inflate the reported AUROC difference relative to the last-BP baseline.
[Results] Results: no confidence intervals, statistical test, or calibration plot is mentioned for the AUROC values 0.719 versus 0.634. Without these, it is impossible to judge whether the 0.085 absolute improvement is distinguishable from sampling variability and therefore whether the headline claim of outperformance is actionable.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments. We will revise the manuscript to address the points raised on methodological transparency and statistical reporting. Point-by-point responses follow.

Referee: [Methods] Methods section: the abstract (and therefore the central performance claim) supplies no description of cohort assembly, inclusion/exclusion criteria, the temporal or random nature of the 14,407/3,009 split, feature construction, or handling of missing blood-pressure values. These omissions are load-bearing because systematic missingness or selection effects in EHR follow-up frequency could inflate the reported AUROC difference relative to the last-BP baseline.

Authors: We agree these details are essential for evaluating the results. We will expand the Methods section to explicitly describe cohort assembly, inclusion/exclusion criteria, the random (non-temporal) nature of the 14,407/3,009 split, feature construction from EHR data, and the handling of missing blood-pressure values (via forward-fill where clinically appropriate or exclusion). This revision will clarify the comparison to the last-BP baseline. revision: yes
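The two data-handling choices named in this simulated response, forward-filling missing blood-pressure values and a random (non-temporal) split, can be sketched as follows. Both helpers, the seed, and the toy inputs are hypothetical; splitting by patient ID rather than by visit is an additional assumption, chosen so that no patient appears on both sides.

```python
import random


def forward_fill(readings):
    """Replace each None with the most recent earlier reading (leading Nones stay None)."""
    filled, last = [], None
    for r in readings:
        if r is not None:
            last = r
        filled.append(last)
    return filled


def random_patient_split(patient_ids, test_fraction, seed=0):
    """Shuffle unique patient IDs and split them into (train_ids, test_ids)."""
    ids = sorted(set(patient_ids))
    rng = random.Random(seed)
    rng.shuffle(ids)
    n_test = round(len(ids) * test_fraction)
    return ids[n_test:], ids[:n_test]


print(forward_fill([None, 130, None, None, 145, None]))
# [None, 130, 130, 130, 145, 145]

train_ids, test_ids = random_patient_split(range(100), test_fraction=0.2)
print(len(train_ids), len(test_ids))  # 80 20
```

Forward-filling carries the last observed value into later gaps, so a model fed filled readings sees much of the same information the last-BP baseline uses.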
Referee: [Results] Results: no confidence intervals, statistical test, or calibration plot is mentioned for the AUROC values 0.719 versus 0.634. Without these, it is impossible to judge whether the 0.085 absolute improvement is distinguishable from sampling variability and therefore whether the headline claim of outperformance is actionable.

Authors: We agree that uncertainty quantification and a formal comparison are needed. We will add 95% bootstrap confidence intervals for both AUROCs, apply a statistical test for the difference (e.g., DeLong test), and include a calibration plot in the revised results section. revision: yes
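A percentile bootstrap for an AUROC confidence interval, as proposed in this simulated response, might look like the sketch below. The function names, the number of resamples, and the toy data are assumptions; the DeLong test mentioned above is not implemented here.

```python
import random


def auroc(labels, scores):
    """Rank-based AUROC with ties counted as 0.5."""
    pos = [s for y, s in zip(labels, scores) if y == 1]
    neg = [s for y, s in zip(labels, scores) if y == 0]
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def bootstrap_auroc_ci(labels, scores, n_boot=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI: resample patients with replacement, recompute AUROC."""
    if len(set(labels)) < 2:
        raise ValueError("both classes must be present")
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    while len(stats) < n_boot:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:  # skip resamples that contain only one class
            stats.append(auroc(ys, [scores[i] for i in idx]))
    stats.sort()
    lo_idx = max(0, round(alpha / 2 * n_boot))
    hi_idx = min(n_boot - 1, round((1 - alpha / 2) * n_boot) - 1)
    return stats[lo_idx], stats[hi_idx]


# Hypothetical noisy scores for 40 patients (labels are 0/1).
rng = random.Random(1)
labels = [i % 2 for i in range(40)]
scores = [y + rng.gauss(0, 1) for y in labels]
low, high = bootstrap_auroc_ci(labels, scores, n_boot=200)
print(round(low, 3), round(high, 3))
```

To compare two models on the same test set, the same resampled indices would be used for both and the interval taken over the difference in AUROC, which accounts for the correlation between the two scores.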

Circularity Check

0 steps flagged

No circularity: purely empirical held-out evaluation

The paper trains logistic regression and RNN models on EHR data from 14,407 patients and reports AUROC 0.719 on a separate 3,009-patient test set, outperforming a last-BP baseline (0.634). No equations, derivations, or self-citations are present that reduce the reported performance metric to any fitted input by construction. The result is a standard empirical comparison on held-out data and is self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 2 axioms · 0 invented entities

The central claim rests on the empirical performance of fitted machine-learning models; the only non-standard elements are the specific EHR cohort and the 3-month prediction window.

free parameters (1)

logistic regression and RNN parameters
All model weights are fitted to the training EHR data; no explicit count or values given in abstract.

axioms (2)

domain assumption: Training and test splits are representative of the target clinical population
Required for any generalization claim from the reported AUROC.
domain assumption: Uncontrolled hypertension can be reliably labeled from EHR fields
Implicit in the supervised learning setup.


Reference graph

Works this paper leans on

19 extracted references · 19 canonical work pages · 2 internal anchors

[1] Q. Nguyen, J. Dominguez, L. Nguyen, and N. Gullapalli, “Hypertension management: an update,” American health & drug benefits, vol. 3, no. 1, p. 47, 2010.

[2] D. Mozaffarian, “Heart disease and stroke statistics–2015 update: a report from the american heart association,” Circulation, vol. 131, no. 4, pp. e29–e322, 2015.

[3] American College of Cardiology Foundation et al., “New acc/aha high blood pressure guidelines lower definition of hypertension,” 2018.

[4] L. G. Ogden, J. He, E. Lydick, and P. K. Whelton, “Long-term absolute benefit of lowering blood pressure in hypertensive patients according to the jnc vi risk stratification,” Hypertension, vol. 35, no. 2, pp. 539–543, 2000.

[5] W. B. Kannel, “Risk stratification in hypertension: new insights from the framingham study,” American journal of hypertension, vol. 13, no. S1, pp. 3S–10S, 2000.

[6] G. W. de la Torre JI, “Accountable care organization (aco),” Medical Care Research and Review, 2017.

[7] J. Gold, “Accountable care organizations, explained,” 2015.

[8] J. Sun, C. D. McNaughton, P. Zhang, A. Perer, A. Gkoulalas-Divanis, J. C. Denny, J. Kirby, T. Lasko, A. Saip, and B. A. Malin, “Predicting changes in hypertension control using electronic health records from a chronic disease management program,” Journal of the American Medical Informatics Association, vol. 21, no. 2, pp. 337–344, 2013.

[9] S. Hochreiter and J. Schmidhuber, “Long short-term memory,” Neural computation, vol. 9, no. 8, pp. 1735–1780, 1997.

[10] B. C. Wallace, K. Small, C. E. Brodley, and T. A. Trikalinos, “Class imbalance, redux,” in Data Mining (ICDM), 2011 IEEE 11th International Conference on, pp. 754–763, IEEE, 2011.

[11] Z. C. Lipton, D. C. Kale, C. Elkan, and R. Wetzel, “Learning to diagnose with lstm recurrent neural networks,” arXiv preprint arXiv:1511.03677, 2015.

[12] A. Rajkomar, E. Oren, K. Chen, A. M. Dai, N. Hajaj, M. Hardt, P. J. Liu, X. Liu, J. Marcus, M. Sun, et al., “Scalable and accurate deep learning with electronic health records,” npj Digital Medicine, vol. 1, no. 1, p. 18, 2018.

[13] F. Chollet et al., “Keras.” https://keras.io, 2015.

[14] M. Abadi, P. Barham, J. Chen, Z. Chen, A. Davis, J. Dean, M. Devin, S. Ghemawat, G. Irving, M. Isard, et al., “Tensorflow: a system for large-scale machine learning,” in OSDI, vol. 16, pp. 265–283, 2016.

[15] D. P. Kingma and J. Ba, “Adam: A method for stochastic optimization,” arXiv preprint arXiv:1412.6980, 2014.

[16] M. Sundararajan, A. Taly, and Q. Yan, “Axiomatic attribution for deep networks,” in International Conference on Machine Learning, pp. 3319–3328, 2017.

[17] J. A. Sterne, I. R. White, J. B. Carlin, M. Spratt, P. Royston, M. G. Kenward, A. M. Wood, and J. R. Carpenter, “Multiple imputation for missing data in epidemiological and clinical research: potential and pitfalls,” BMJ, vol. 338, p. b2393, 2009.

[18] Z. C. Lipton, D. C. Kale, and R. Wetzel, “Modeling missing data in clinical time series with rnns,” Machine Learning for Healthcare, 2016.

[19] S. B. Kotsiantis, I. Zaharakis, and P. Pintelas, “Supervised machine learning: A review of classification techniques,” Emerging artificial intelligence applications in computer engineering, vol. 160, pp. 3–24, 2007.

Appendix A (Medications), drug family and types: ACE inhibitor (Lisinopril, Benazepril); calcium channel blocker (Amlodipine, Nifedipine); ...
