pith.

arxiv: 1906.08339 · v1 · pith:Z6ARXL2Hnew · submitted 2019-06-19 · 💻 cs.LG · stat.AP · stat.ML

Learning Patient Engagement in Care Management: Performance vs. Interpretability

Pith reviewed 2026-05-25 20:09 UTC · model grok-4.3

classification 💻 cs.LG · stat.AP · stat.ML
keywords patient engagement · care management · predictive modeling · interpretability · healthcare analytics · enrollment prediction · goal commitment

The pith

A behavioral engagement scoring pipeline predicts patient responses to care program calls and goals while supplying interpretable references through prototypical patients.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces a data-driven scoring method that measures a patient's interest in enrolling in a care program and their commitment to assigned goals. These scores are then used to forecast whether the patient will respond to enrollment outreach or goal assignments. On real-world care management records, the authors report that the scores predict engagement successfully. The same pipeline supplies care managers with concrete reference points in the form of prototypical patients, preserving that accuracy rather than trading it for explainability.

Core claim

Using real-world care management data, the scoring method successfully predicts patient engagement, and the use of prototypical patients as reference points supplies interpretable insights to care managers without sacrificing prediction performance.

What carries the argument

Behavioral engagement scoring pipeline that produces two component scores (enrollment interest and goal commitment) and compares new patients to a set of prototypical patients for interpretability.
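To make the prototype comparison concrete, it can be pictured as a nearest-prototype lookup. This is an illustration under stated assumptions, not the authors' method: each patient is assumed to be summarized by the two component scores, prototypes are assumed to be fixed points in that plane (for example, cluster centers; Figure 8 refers to five behavioral profiles), and Euclidean distance is an arbitrary choice. `Prototype`, `explain`, the profile labels, and all numbers are hypothetical.

```python
from __future__ import annotations

import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Prototype:
    """Hypothetical prototypical patient: a profile label and its two component scores."""
    label: str
    enrollment_interest: float
    goal_commitment: float


def explain(patient: tuple[float, float], prototypes: list[Prototype]) -> tuple[str, float]:
    """Return the label of the closest prototype and the Euclidean distance to it."""
    def dist(p: Prototype) -> float:
        return math.dist(patient, (p.enrollment_interest, p.goal_commitment))

    best = min(prototypes, key=dist)
    return best.label, dist(best)


# Illustrative prototypes; the profiles and values are invented for this sketch.
PROTOTYPES = [
    Prototype("eager enroller, weak follow-through", 0.90, 0.30),
    Prototype("reluctant enroller, strong follow-through", 0.20, 0.80),
    Prototype("consistently engaged", 0.85, 0.90),
]

label, distance = explain((0.80, 0.85), PROTOTYPES)
print(label, round(distance, 3))  # consistently engaged 0.071
```

Returning the distance alongside the label lets a care manager see how closely a new patient resembles the reference case, not only which case it resembles.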

If this is right

  • Care managers can prioritize outreach calls and goal assignments according to the two engagement scores.
  • Explanations anchored to prototypical patients allow managers to discuss specific reasons for a given prediction.
  • The dual requirement of prediction accuracy and interpretability can be met within the same pipeline rather than requiring separate models.
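The first point can be sketched as two independent rankings, one per task. This is a minimal illustration, assuming both scores lie in [0, 1] with higher values meaning more engagement and that each task has a fixed capacity; `Patient`, `prioritize`, and `capacity` are hypothetical names, and the cohort is invented.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass(frozen=True)
class Patient:
    patient_id: str
    enrollment_interest: float  # hypothetical score in [0, 1]
    goal_commitment: float      # hypothetical score in [0, 1]


def prioritize(patients: list[Patient], score: str, capacity: int) -> list[str]:
    """Return the IDs of the `capacity` patients with the highest value of `score`."""
    ranked = sorted(patients, key=lambda p: getattr(p, score), reverse=True)
    return [p.patient_id for p in ranked[:capacity]]


cohort = [
    Patient("A", 0.72, 0.40),
    Patient("B", 0.35, 0.91),
    Patient("C", 0.88, 0.66),
    Patient("D", 0.10, 0.20),
]

# Outreach calls rank by the enrollment score; goal assignment ranks by commitment.
print(prioritize(cohort, "enrollment_interest", capacity=2))  # ['C', 'A']
print(prioritize(cohort, "goal_commitment", capacity=2))      # ['B', 'C']
```

`sorted` is stable, so patients with tied scores keep their input order; a real deployment would need an explicit tie-breaking rule.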

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • The same scoring structure might transfer to other patient-facing interventions where enrollment and follow-through are the key behaviors.
  • If new data streams such as wearable or portal usage logs are added, the pipeline could be retrained without redesigning the interpretability layer.

Load-bearing premise

The premise is that the single real-world care management dataset used for training and testing represents the broader patient population, and that the chosen prototypical patients remain valid references for unseen cases.
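The second half of this premise can be monitored rather than assumed. One possible check, sketched here as an assumption and not taken from the paper, flags new patients who sit farther from every prototype than most training patients did; such cases have no close reference point, so a prototype-based explanation may not apply to them. The Euclidean metric, the 0.95 nearest-rank quantile, the function names, and the data are all hypothetical.

```python
from __future__ import annotations

import math


def nearest_distance(x: tuple[float, float], prototypes: list[tuple[float, float]]) -> float:
    """Euclidean distance from x to its closest prototype."""
    return min(math.dist(x, p) for p in prototypes)


def coverage_threshold(train: list[tuple[float, float]],
                       prototypes: list[tuple[float, float]], q: float = 0.95) -> float:
    """Nearest-rank q-quantile of the training cases' nearest-prototype distances."""
    d = sorted(nearest_distance(x, prototypes) for x in train)
    k = max(0, math.ceil(q * len(d)) - 1)
    return d[k]


def poorly_covered(new: list[tuple[float, float]],
                   prototypes: list[tuple[float, float]], threshold: float) -> list[int]:
    """Indices of new cases that lie farther than `threshold` from every prototype."""
    return [i for i, x in enumerate(new) if nearest_distance(x, prototypes) > threshold]


# Invented data in the (enrollment interest, goal commitment) plane.
prototypes = [(0.90, 0.30), (0.20, 0.80), (0.85, 0.90)]
train = [(0.90, 0.35), (0.20, 0.75), (0.85, 0.95), (0.80, 0.90), (0.25, 0.80)]
threshold = coverage_threshold(train, prototypes)

new_patients = [(0.86, 0.88), (0.50, 0.10)]
print(poorly_covered(new_patients, prototypes, threshold))  # [1]
```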

What would settle it

Apply the trained scoring model to a fresh care-management dataset drawn from a different health system or program and measure whether prediction accuracy falls below the level reported on the original data or whether the prototypical-patient explanations cease to align with observed responses.
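The proposed test is an external-validation comparison. A minimal sketch, assuming AUC as the measure of prediction accuracy (the abstract does not name the metric the authors report): `auc` computes the rank-based (Mann-Whitney) estimate, counting ties as one half, and the scores and labels for both cohorts are invented.

```python
from __future__ import annotations


def auc(scores: list[float], labels: list[int]) -> float:
    """Probability that a random responder outscores a random non-responder (ties count 1/2)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one responder and one non-responder")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


# Hypothetical model scores and observed responses (1 = responded).
original = auc([0.9, 0.8, 0.7, 0.3, 0.2], [1, 1, 0, 1, 0])  # original data
fresh = auc([0.9, 0.6, 0.5, 0.4, 0.2], [0, 1, 1, 0, 0])     # different health system
print(f"original {original:.3f}, fresh {fresh:.3f}, drop {original - fresh:.3f}")
```

A gap between the two values is only evidence if it exceeds sampling noise; with cohorts this small it would not, so the comparison should come with an interval estimate.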

Figures

Figures reproduced from arXiv: 1906.08339 by Chandramouli Maduri, Ching-Hua Chen, Pei-Yun S. Hsueh, Subhro Das.

Figure 1: Care Management flow.
Figure 2: Program enrollment timeline.
Figure 5: Goal assignment across different focus areas.
Figure 6: Goal attainment records of patients distributed across focus areas & intervention categories.
Figure 7: Interpretable engagement insights for Care Man…
Figure 8: Feature weights that indicate patient response driver across the five different behavioral profiles for goal attainment.
Figure 9: Pipeline evaluation using prototypical patient cases.
Original abstract

The health outcomes of high-need patients can be substantially influenced by the degree of patient engagement in their own care. The role of care managers includes that of enrolling patients into care programs and keeping them sufficiently engaged in the program, so that patients can attain various goals. The attainment of these goals is expected to improve the patients' health outcomes. In this paper, we present a real-world data-driven method and the behavioral engagement scoring pipeline for scoring the engagement level of a patient in two regards: (1) their interest in enrolling into a relevant care program, and (2) their interest and commitment to program goals. We use this score to predict a patient's propensity to respond (i.e., to a call for enrollment into a program, or to an assigned program goal). Using real-world care management data, we show that our scoring method successfully predicts patient engagement. We also show that we are able to provide interpretable insights to care managers, using prototypical patients as a point of reference, without sacrificing prediction performance.
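The abstract does not say how an engagement score becomes a propensity to respond. One common choice, shown here purely as an assumption, is a separate logistic link per task (enrollment call or goal assignment) applied to the corresponding score. The `COEFFS` table and `response_propensity` are hypothetical, and the coefficients are placeholders rather than values fitted in the paper.

```python
import math

# Hypothetical per-task (slope, intercept) pairs; not values from the paper.
COEFFS = {
    "enrollment_call": (4.0, -2.0),
    "goal_assignment": (3.0, -1.0),
}


def response_propensity(score: float, task: str) -> float:
    """Map an engagement score to P(respond) with a logistic link for the given task."""
    slope, intercept = COEFFS[task]
    return 1.0 / (1.0 + math.exp(-(slope * score + intercept)))


print(response_propensity(0.5, "enrollment_call"))            # 0.5
print(round(response_propensity(0.5, "goal_assignment"), 3))  # 0.622
```

Because each slope is positive, the mapping is monotone: ranking patients by score and ranking them by predicted propensity give the same order. In practice the slope and intercept would be estimated from historical responses, for example by logistic regression.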

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

0 major / 2 minor

Summary. The manuscript introduces a behavioral engagement scoring pipeline that assesses patients' interest in enrolling into relevant care programs and their commitment to assigned program goals. It uses this score to predict response propensity and, on real-world care management data, claims to demonstrate successful prediction of engagement while enabling interpretable insights to care managers via prototypical patients without sacrificing predictive performance.

Significance. If the empirical results hold under proper validation, the work addresses a practically important tension in healthcare ML between predictive accuracy and interpretability. The emphasis on real-world data and prototype-based explanations for actionable insights to non-technical users is a strength that could support deployment in care-management settings.

minor comments (2)
  1. Abstract: the summary asserts successful prediction and preserved interpretability but supplies no quantitative metrics, model class, validation scheme, or data characteristics; adding one or two key numbers would make the claim easier to evaluate at a glance.
  2. The manuscript would benefit from an explicit statement of the baseline models against which the engagement scoring pipeline is compared and from reporting performance with confidence intervals or statistical tests.
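The second comment can be met with a percentile bootstrap, one simple way to attach an interval to any performance metric. A minimal sketch, assuming accuracy at a 0.5 threshold as the metric and resampling patients with replacement; `accuracy`, `bootstrap_ci`, the seed, and the data are hypothetical.

```python
from __future__ import annotations

import random
from collections.abc import Callable, Sequence


def accuracy(scores: Sequence[float], labels: Sequence[int], threshold: float = 0.5) -> float:
    """Share of patients whose thresholded score matches the observed response (0 or 1)."""
    hits = sum((s >= threshold) == bool(y) for s, y in zip(scores, labels))
    return hits / len(labels)


def bootstrap_ci(
    metric: Callable[[Sequence[float], Sequence[int]], float],
    scores: Sequence[float],
    labels: Sequence[int],
    n_boot: int = 2000,
    alpha: float = 0.05,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Point estimate plus a percentile-bootstrap (1 - alpha) interval, resampling patients."""
    rng = random.Random(seed)
    n = len(labels)
    stats = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(metric([scores[i] for i in idx], [labels[i] for i in idx]))
    stats.sort()
    lo = stats[int(alpha / 2 * n_boot)]
    hi = stats[min(n_boot - 1, int((1 - alpha / 2) * n_boot))]
    return metric(scores, labels), lo, hi


scores = [0.9, 0.8, 0.7, 0.3, 0.2, 0.6, 0.4, 0.1]
labels = [1, 1, 0, 1, 0, 1, 0, 0]
point, lo, hi = bootstrap_ci(accuracy, scores, labels)
print(f"accuracy {point:.2f}, 95% CI [{lo:.2f}, {hi:.2f}]")
```

If a patient contributes several goal records, resampling patients rather than individual records keeps correlated outcomes together; this sketch assumes one record per patient.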

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their thoughtful review and positive recommendation for minor revision. The referee's summary correctly reflects the core contributions of the manuscript regarding the behavioral engagement scoring pipeline, its use for predicting response propensity on real-world data, and the provision of interpretable insights via prototypical patients. No specific major comments were provided in the report.

Circularity Check

0 steps flagged

No significant circularity detected

full rationale

The manuscript describes an empirical pipeline that scores patient engagement from care-management records and evaluates its ability to predict response propensity on held-out data, together with prototype-based explanations. No equations, fitting procedures, or derivation steps are presented in the abstract or summary that would allow any claimed prediction to reduce by construction to its own inputs. No self-citations are invoked as load-bearing uniqueness theorems, and no ansatz or renaming of known results is described. The central claims therefore rest on external performance numbers rather than on any definitional or self-referential reduction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review yields no information on free parameters, axioms, or invented entities.

pith-pipeline@v0.9.0 · 5722 in / 1059 out tokens · 29393 ms · 2026-05-25T20:09:16.411679+00:00 · methodology


Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages · 2 internal anchors

  1. [1]

    Agency for Healthcare Research and Quality. 2015. Implications for Medical Practice, Health Policy, and Health Services Research. Technical Report. Rockville, MD, USA.

  2. [2]

    Randall S Brown, Deborah Peikes, Greg Peterson, Jennifer Schore, and Carol M Razafindrakoto. 2012. Six features of Medicare coordinated care demonstration programs that cut hospital admissions of high-risk patients. Health Affairs 31, 6 (2012), 1156–1166

  3. [3]

    Rich Caruana, Yin Lou, Johannes Gehrke, Paul Koch, Marc Sturm, and Noemie Elhadad. 2015. Intelligible Models for HealthCare. In Proceedings of the 21th ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’15. ACM Press, New York, New York, USA, 1721–1730. https://doi.org/10.1145/2783258.2788613

  4. [4]

    Center for Health Care Strategies. 2007. Care Management Definition and Framework. Technical Report.

  5. [5]

    Jonathan H. Chen and Steven M. Asch. 2017. Machine Learning and Prediction in Medicine – Beyond the Peak of Inflated Expectations. New England Journal of Medicine 376, 26 (jun 2017), 2507–2509. https://doi.org/10.1056/NEJMp1702071

  6. [6]

    Edward Choi, Mohammad Taha Bahadori, Joshua A. Kulas, Andy Schuetz, Walter F. Stewart, and Jimeng Sun. 2016. RETAIN: An Interpretable Predictive Model for Healthcare using Reverse Time Attention Mechanism. (aug 2016). arXiv:1608.05745 http://arxiv.org/abs/1608.05745

  7. [7]

    DARPA. [n. d.]. ARPA explainable AI Program. Retrieved January 28, 2019 from https://www.darpa.mil/program/explainable-artificial-intelligence

  8. [8]

    Sanjoy Dey, Kelvin Lim, Gowtham Atluri, Angus MacDonald, Michael Steinbach, and Vipin Kumar. 2012. A pattern mining based integrative framework for biomarker discovery. In Proceedings of the ACM Conference on Bioinformatics, Computational Biology and Biomedicine - BCB ’12. ACM Press, New York, New York, USA, 498–505. https://doi.org/10.1145/2382936.2383000

  9. [9]

    Jacob Feldman. 2000. Minimization of Boolean complexity in human concept learning. Nature 407, 6804 (oct 2000), 630–633. https://doi.org/10.1038/35036586

  10. [10]

    Jerome Friedman, Trevor Hastie, and Robert Tibshirani. 2001. The elements of statistical learning. Vol. 1. Springer series in statistics New York, NY, USA:

  11. [11]

    Google. [n. d.]. Google AI experiment: high-dimensional space. Retrieved January 28, 2019 from https://experiments.withgoogle.com/ai/visualizing-high-dimensional-space

  12. [12]

    G. Graffigna, S. Barello, A. Bonanomi, and E. Lozza. 2015. Measuring patient engagement: development and psychometric properties of the Patient Health Engagement (PHE) Scale. Frontiers in psychology 6, 274 (2015).

  13. [13]

    J. H. Hibbard, J. Stockard, E. R. Mahoney, and M. Tusler. 2004. Development of the Patient Activation Measure (PAM): conceptualizing and measuring activation in patients and consumers. Health services research 39, 4 Pt 1 (2004), 1005–26.

  14. [14]

    Pei-Yun S. Hsueh, S. Dey, S. Das, and T. Wetter. 2017. Making sense of patient-generated health data for interpretable patient-centered care: The transition from "More" to "Better". Vol. 245. https://doi.org/10.3233/978-1-61499-830-3-113

  15. [15]

    Xinyu Hu, Pei-Yun S Hsueh, Ching-Hua Chen, Keith M Diaz, Ying-Kuen K Cheung, and Min Qian. 2017. A First Step Towards Behavioral Coaching for Managing Stress: A Case Study on Optimal Policy Estimation with Multi-stage Threshold Q-learning. AMIA ... Annual Symposium proceedings. AMIA Symposium 2017 (2017), 930–939. http://www.ncbi.nlm.nih.gov/pubmed/2985...

  16. [16]

    Ravi Karkar, Jasmine Zia, Roger Vilardaga, Sonali R Mishra, James Fogarty, Sean A Munson, and Julie A Kientz. 2015. A framework for self-experimentation in personalized health. Journal of the American Medical Informatics Association (2015)

  17. [17]

    David J Ketchen and Christopher L Shook. 1996. The application of cluster analysis in strategic management research: an analysis and critique. Strategic management journal 17, 6 (1996), 441–458

  18. [18]

    Been Kim, Cynthia Rudin, and Julie A. Shah. 2014. The Bayesian Case Model: A Generative Approach for Case-Based Reasoning and Prototype Classification. , 1952–1960 pages

  19. [19]

    Himabindu Lakkaraju, Stephen H. Bach, and Jure Leskovec. 2016. Interpretable Decision Sets. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16. ACM Press, New York, New York, USA, 1675–1684. https://doi.org/10.1145/2939672.2939874

  20. [20]

    Tao Lei, Regina Barzilay, and Tommi Jaakkola. 2016. Rationalizing Neural Predictions. (jun 2016). arXiv:1606.04155 http://arxiv.org/abs/1606.04155

  21. [21]

    Bin Liu, Ying Li, Zhaonan Sun, Soumya Ghosh, and Kenney Ng. 2018. Early Prediction of Diabetes Complications from Electronic Health Records: A Multi-Task Survival Analysis Approach. AAAI (2018). https://www.semanticscholar.org/paper/Early-Prediction-of-Diabetes-Complications-from-A-Liu-Li/28dec33fc71b9139e7e1b6c4a1b32b2947c53176

  22. [22]

    Peter V Long. 2017. Effective Care for High-need Patients: Opportunities for Improving Outcomes, Value, and Health. National Academy Of Medicine

  23. [23]

    Yen-Fu Luo and Anna Rumshisky. [n. d.]. Interpretable Topic Features for Post-ICU Mortality Prediction. ([n. d.]). http://www.cs.uml.edu/~arum/publications/YFLuo_AMIA_2016.pdf

  24. [24]

    Michigan Care Management Resource Center Home. [n. d.]. Patient Engagement. Retrieved January 28, 2019 from https://micmrc.org/topics/patient-engagement-0

  25. [25]

    Marco Tulio Ribeiro, Sameer Singh, and Carlos Guestrin. 2016. "Why Should I Trust You?". In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining - KDD ’16. ACM Press, New York, New York, USA, 1135–1144. https://doi.org/10.1145/2939672.2939778

  26. [26]

    James M. Robins, Andrea Rotnitzky, and Lue Ping Zhao. 1994. Estimation of Regression Coefficients When Some Regressors Are Not Always Observed. J. Amer. Statist. Assoc. 89, 427 (sep 1994), 846. https://doi.org/10.2307/2290910

  27. [27]

    Manu Sridharan and Gerald Tesauro. 2002. Multi-agent Q-learning and Regression Trees for Automated Pricing Decisions. Springer, Boston, MA, 217–234. https://doi.org/10.1007/978-1-4615-1107-6_11

  28. [28]

    Jimeng Sun, Daby Sow, Jianying Hu, and Shahram Ebadollahi. 2010. Localized supervised metric learning on temporal physiological data. In Pattern Recognition (ICPR), 2010 20th International Conference on. IEEE, 4149–4152.

  29. [29]

    Jimeng Sun, Fei Wang, Jianying Hu, and Shahram Edabollahi. 2012. Supervised patient similarity measure of heterogeneous patient records. ACM SIGKDD Explorations Newsletter 14, 1 (2012), 16–24

  30. [30]

    Jaakko Tuomilehto, Jaana Lindström, Johan G Eriksson, Timo T Valle, Helena Hämäläinen, Pirjo Ilanne-Parikka, Sirkka Keinänen-Kiukaanniemi, Mauri Laakso, Anne Louheranta, Merja Rastas, et al. 2001. Prevention of type 2 diabetes mellitus by changes in lifestyle among subjects with impaired glucose tolerance. New England Journal of Medicine 344, 18 (2001), 1343–1350

  31. [31]

    Jeffrey A West, Nancy H Miller, Kathleen M Parker, Deborah Senneca, Ghassan Ghandour, Mia Clark, George Greenwald, Robert S Heller, Michael B Fowler, and Robert F DeBusk. 1997. A comprehensive management system for heart failure improves clinical outcomes and reduces medical resource utilization. American Journal of Cardiology 79, 1 (1997), 58–63.