Medical Concept Representation Learning from Claims Data and Application to Health Plan Payment Risk Adjustment
Pith reviewed 2026-05-24 21:30 UTC · model grok-4.3
The pith
Semantic embeddings from claims data improve risk score prediction over a commercial model.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Patient histories can be represented as embeddings that encode medical concepts from sequences of codes; these representations allow predictive models to achieve higher performance than a commercial linear risk adjustment model on the task of prospective risk score prediction, while requiring substantially less expert-driven feature engineering.
What carries the argument
Semantic embeddings that encode medical concepts from diagnostic, procedure, and prescription codes appearing in patient claims histories.
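One simple way to turn per-code vectors into a patient-level representation is mean pooling. The sketch below illustrates that idea only; the source does not say how the paper aggregates code embeddings. The table `CODE_VECTORS`, its values, the code strings, and the function `patient_vector` are all hypothetical.

```python
from typing import Dict, List

# Hypothetical 3-dimensional embedding table; the values are made up for illustration.
CODE_VECTORS: Dict[str, List[float]] = {
    "DX:E11.9": [0.9, 0.1, 0.0],      # a diagnosis code
    "RX:metformin": [0.8, 0.2, 0.1],  # a prescription code
    "PX:99213": [0.1, 0.7, 0.3],      # a procedure code
}


def patient_vector(codes: List[str], table: Dict[str, List[float]]) -> List[float]:
    """Mean-pool the vectors of known codes; codes missing from the table are skipped."""
    dim = len(next(iter(table.values())))
    known = [table[c] for c in codes if c in table]
    if not known:
        # No recognized codes: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


# One patient's claims history, including a code absent from the table.
history = ["DX:E11.9", "RX:metformin", "PX:99213", "DX:UNKNOWN"]
vec = patient_vector(history, CODE_VECTORS)
print(vec)
```

The resulting fixed-length vector can be passed to any downstream regressor, which is what removes the need for hand-built condition categories in this kind of pipeline.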
If this is right
- Risk adjustment systems can operate with reduced reliance on manual preprocessing of claims codes.
- Linear regression models leave exploitable structure in the raw sequence of medical codes.
- Embeddings can be pre-computed once and reused across multiple downstream payment or cost models.
- The same code sequences can support both risk scoring and other patient-level predictions.
Where Pith is reading between the lines
- The method could be applied to predict individual disease progression or hospitalization risk using the same embedding space.
- Combining embeddings with structured demographic or pharmacy data might close remaining gaps versus expert models.
- Periodic retraining of the embeddings on newer claims would be needed to track changes in coding practices or medical technology.
Load-bearing premise
Performance gains are caused by the embeddings rather than differences in training data, feature sets, or tuning between the embedding models and the commercial baseline.
What would settle it
Re-train the commercial model on exactly the same patient cohort and data split used for the embedding models and compare their prospective prediction errors head-to-head.
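A head-to-head comparison of this kind amounts to scoring both models against the same held-out outcomes. The sketch below shows the bookkeeping with two common regression metrics. All numbers are invented toy values, not results from the paper, and the choice of R-squared and mean absolute error is an assumption, since the source does not state which metrics the paper reports.

```python
from typing import Sequence


def mean_absolute_error(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Average absolute gap between observed and predicted risk scores."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Share of outcome variance explained by the predictions."""
    mean_y = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    return 1.0 - ss_res / ss_tot


# Toy held-out cohort: the SAME patients, in the same order, scored by both models.
y_true = [0.5, 1.0, 2.0, 4.0]
baseline_pred = [1.0, 1.0, 1.5, 3.0]    # hypothetical commercial-model output
embedding_pred = [0.6, 1.1, 1.9, 3.8]   # hypothetical embedding-model output

for name, pred in [("baseline", baseline_pred), ("embedding", embedding_pred)]:
    print(name, round(mean_absolute_error(y_true, pred), 3), round(r_squared(y_true, pred), 3))
```

Because both prediction lists index the same patients, any difference in the metrics cannot come from cohort or split differences, which is the confound the load-bearing premise is concerned with.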
Original abstract
Risk adjustment has become an increasingly important tool in healthcare. It has been extensively applied to payment adjustment for health plans to reflect the expected cost of providing coverage for members. Risk adjustment models are typically estimated using linear regression, which does not fully exploit the information in claims data. Moreover, the development of such linear regression models requires substantial domain expert knowledge and computational effort for data preprocessing. In this paper, we propose a novel approach for risk adjustment that uses semantic embeddings to represent patient medical histories. Embeddings efficiently represent medical concepts learned from diagnostic, procedure, and prescription codes in patients' medical histories. This approach substantially reduces the need for feature engineering. Our results show that models using embeddings had better performance than a commercial risk adjustment model on the task of prospective risk score prediction.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes learning semantic embeddings from diagnostic, procedure, and prescription codes in claims data to represent patient medical histories for risk adjustment. It argues that this approach reduces the need for domain-expert feature engineering required by conventional linear regression models and reports that embedding-based models outperform a commercial risk adjustment model on prospective risk score prediction.
Significance. A well-controlled demonstration that embeddings yield measurable gains over a strong commercial baseline while lowering preprocessing effort would be of interest to health-services research and applied ML, as it could streamline risk adjustment for plan payments. The current manuscript supplies no quantitative deltas, dataset sizes, or matched-baseline details, so the practical significance cannot yet be evaluated.
major comments (1)
- [Abstract] The central claim that 'models using embeddings had better performance than a commercial risk adjustment model' is presented without any numerical results, confidence intervals, cohort sizes, time windows, outcome definitions, or description of how the commercial baseline was re-implemented or whether identical patient cohorts and splits were used. This omission directly undermines the attribution of any gain to the embeddings rather than to differences in data, features, or tuning.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. The comment on the abstract is well-taken, and we will revise the manuscript to address it directly.
Point-by-point responses
- Referee: [Abstract] The central claim that 'models using embeddings had better performance than a commercial risk adjustment model' is presented without any numerical results, confidence intervals, cohort sizes, time windows, outcome definitions, or description of how the commercial baseline was re-implemented or whether identical patient cohorts and splits were used. This omission directly undermines the attribution of any gain to the embeddings rather than to differences in data, features, or tuning.
  Authors: We agree that the abstract should include key quantitative results, cohort sizes, time windows, and baseline details to support the claim. The main text of the manuscript reports these elements (including performance metrics on prospective risk score prediction, dataset characteristics, and how the commercial model was applied to the same cohorts and splits). In the revision we will condense the relevant results into the abstract so that the performance comparison is stated with numbers rather than qualitatively.
  Revision: yes
Circularity Check
No circularity: empirical ML results with independent embedding learning and external baseline
The paper describes an empirical pipeline in which medical concept embeddings are learned from claims data and then used as features in downstream risk score models. No equations, derivations, or self-citations are presented that reduce the reported performance gain to a quantity defined by construction from the same fitted parameters or inputs. The claimed superiority is an experimental outcome relative to an external commercial model rather than a tautological renaming or self-referential fit. The derivation chain is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (2)
- embedding dimension
- context window or negative sampling rate
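To see why the context window is a genuine free parameter, consider how it determines which code pairs a skip-gram-style objective trains on. The sketch below assumes a word2vec-style setup, which the source does not confirm as the paper's method. The function `skipgram_pairs` and the placeholder code strings are hypothetical.

```python
from typing import List, Tuple


def skipgram_pairs(sequence: List[str], window: int) -> List[Tuple[str, str]]:
    """Return (target, context) pairs for every code within `window` positions of the target."""
    pairs: List[Tuple[str, str]] = []
    for i, target in enumerate(sequence):
        lo = max(0, i - window)
        hi = min(len(sequence), i + window + 1)
        for j in range(lo, hi):
            if j != i:  # a code is never its own context
                pairs.append((target, sequence[j]))
    return pairs


# A toy, time-ordered claims sequence with placeholder codes.
codes = ["DX:A", "RX:B", "PX:C", "DX:D"]
narrow = skipgram_pairs(codes, window=1)
wide = skipgram_pairs(codes, window=2)
print(len(narrow), len(wide))
```

A wider window links codes that sit further apart in the history, so the choice changes which co-occurrences the embeddings can encode and therefore needs to be reported alongside the results.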
axioms (1)
- domain assumption: Medical codes in claims data carry semantic relationships that can be captured by vector proximity
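Vector proximity is usually measured with cosine similarity. The sketch below shows what the axiom would predict on toy data: clinically related codes score higher than unrelated ones. The three vectors and their labels are invented for illustration and are not taken from the paper.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either vector has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


# Hypothetical embeddings: a diabetes diagnosis, a diabetes drug, and an unrelated injury code.
diabetes_dx = [0.9, 0.1, 0.0]
diabetes_rx = [0.8, 0.2, 0.1]
fracture_dx = [0.0, 0.2, 0.9]

sim_related = cosine_similarity(diabetes_dx, diabetes_rx)
sim_unrelated = cosine_similarity(diabetes_dx, fracture_dx)
print(round(sim_related, 3), round(sim_unrelated, 3))
```

If learned embeddings failed to show this kind of ordering on known code pairs, the domain assumption would be in doubt regardless of downstream prediction scores.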