Medical Concept Representation Learning from Claims Data and Application to Health Plan Payment Risk Adjustment

Andrew H. Fairless; Farbod Rahmanian; Jasmine M. McCammon; Qiu-Yue Zhong

arxiv: 1907.06600 · v1 · pith:FDLIMRHGnew · submitted 2019-07-15 · 💻 cs.LG · stat.ML

Qiu-Yue Zhong, Andrew H. Fairless, Jasmine M. McCammon, Farbod Rahmanian

Pith reviewed 2026-05-24 21:30 UTC · model grok-4.3

classification 💻 cs.LG stat.ML

keywords risk adjustment · medical embeddings · claims data · healthcare prediction · prospective modeling · semantic representations · feature learning

The pith

Semantic embeddings from claims data improve risk score prediction over a commercial model.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes representing patient medical histories as semantic embeddings learned directly from diagnostic, procedure, and prescription codes in claims data. Traditional risk adjustment relies on linear regression that demands heavy manual feature engineering by domain experts. The embedding approach reduces that preprocessing burden while capturing richer patterns in the data. Models built on these embeddings outperform a commercial risk adjustment system at forecasting future healthcare costs. Accurate risk scores matter because they determine fair payments to health plans that cover sicker populations.

Core claim

Patient histories can be represented as embeddings that encode medical concepts from sequences of codes; these representations allow predictive models to achieve higher performance than a commercial linear risk adjustment model on the task of prospective risk score prediction, while requiring substantially less expert-driven feature engineering.

What carries the argument

Semantic embeddings that encode medical concepts from diagnostic, procedure, and prescription codes appearing in patient claims histories.
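
As a minimal sketch of how code-level embeddings might be turned into one vector per patient, the snippet below averages the vectors of the codes in a claims history. The averaging step, the toy table `CODE_VECTORS`, its code labels, and the function name `patient_vector` are all illustrative assumptions; the page does not say how the paper aggregates codes.

```python
from typing import Dict, List, Sequence

# Hypothetical toy embedding table; codes and values are illustrative only.
CODE_VECTORS: Dict[str, List[float]] = {
    "DX:E11.9": [0.9, 0.1, 0.0],      # a diagnosis code
    "RX:metformin": [0.8, 0.2, 0.1],  # a prescription code
    "PX:99213": [0.1, 0.7, 0.3],      # a procedure code
}


def patient_vector(codes: Sequence[str], table: Dict[str, List[float]]) -> List[float]:
    """Average the vectors of known codes; codes missing from the table are skipped."""
    dim = len(next(iter(table.values())))
    known = [table[c] for c in codes if c in table]
    if not known:
        # No recognised codes: fall back to a zero vector of the right size.
        return [0.0] * dim
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


history = ["DX:E11.9", "RX:metformin", "DX:unknown"]
features = patient_vector(history, CODE_VECTORS)
```

The resulting `features` list could then feed any downstream regressor. Averaging discards code order, so a sequence-aware aggregation would be a different design choice.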

If this is right

Risk adjustment systems can operate with reduced reliance on manual preprocessing of claims codes.
Linear regression models leave exploitable structure in the raw sequence of medical codes unused.
Embeddings can be pre-computed once and reused across multiple downstream payment or cost models.
The same code sequences can support both risk scoring and other patient-level predictions.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method could be applied to predict individual disease progression or hospitalization risk using the same embedding space.
Combining embeddings with structured demographic or pharmacy data might close remaining gaps versus expert models.
Periodic retraining of the embeddings on newer claims would be needed to track changes in coding practices or medical technology.

Load-bearing premise

Performance gains are caused by the embeddings rather than differences in training data, feature sets, or tuning between the embedding models and the commercial baseline.

What would settle it

Re-train the commercial model on exactly the same patient cohort and data split used for the embedding models and compare their prospective prediction errors head-to-head.
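
A head-to-head check of this kind can be sketched as follows, assuming both models have already produced predictions for the same held-out patients. The metric choices (mean absolute error, R-squared), the paired bootstrap, and all function names are assumptions made for illustration; the page does not state which metrics or tests the paper uses.

```python
import random
from typing import List, Sequence, Tuple


def mae(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Mean absolute error over paired observations."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


def r_squared(y_true: Sequence[float], y_pred: Sequence[float]) -> float:
    """Coefficient of determination; undefined when y_true is constant."""
    mean_y = sum(y_true) / len(y_true)
    ss_tot = sum((t - mean_y) ** 2 for t in y_true)
    if ss_tot == 0:
        raise ValueError("y_true has zero variance")
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    return 1.0 - ss_res / ss_tot


def paired_bootstrap_mae_diff(
    y_true: Sequence[float],
    pred_a: Sequence[float],
    pred_b: Sequence[float],
    n_boot: int = 2000,
    seed: int = 0,
) -> Tuple[float, float]:
    """Approximate 95% interval for MAE(a) - MAE(b), resampling the same patients for both models."""
    rng = random.Random(seed)
    n = len(y_true)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx = [rng.randrange(n) for _ in range(n)]
        err_a = sum(abs(y_true[i] - pred_a[i]) for i in idx) / n
        err_b = sum(abs(y_true[i] - pred_b[i]) for i in idx) / n
        diffs.append(err_a - err_b)
    diffs.sort()
    lo = diffs[int(0.025 * n_boot)]
    hi = diffs[max(int(0.975 * n_boot) - 1, 0)]
    return lo, hi
```

An interval that lies entirely below zero would suggest model `a` has lower error than model `b` on this cohort. Because both models are scored on the same resampled patients, differences in cohort composition cannot account for the gap.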

Original abstract

Risk adjustment has become an increasingly important tool in healthcare. It has been extensively applied to payment adjustment for health plans to reflect the expected cost of providing coverage for members. Risk adjustment models are typically estimated using linear regression, which does not fully exploit the information in claims data. Moreover, the development of such linear regression models requires substantial domain expert knowledge and computational effort for data preprocessing. In this paper, we propose a novel approach for risk adjustment that uses semantic embeddings to represent patient medical histories. Embeddings efficiently represent medical concepts learned from diagnostic, procedure, and prescription codes in patients' medical histories. This approach substantially reduces the need for feature engineering. Our results show that models using embeddings had better performance than a commercial risk adjustment model on the task of prospective risk score prediction.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 0 minor

Summary. The manuscript proposes learning semantic embeddings from diagnostic, procedure, and prescription codes in claims data to represent patient medical histories for risk adjustment. It argues that this approach reduces the need for domain-expert feature engineering required by conventional linear regression models and reports that embedding-based models outperform a commercial risk adjustment model on prospective risk score prediction.

Significance. A well-controlled demonstration that embeddings yield measurable gains over a strong commercial baseline while lowering preprocessing effort would be of interest to health-services research and applied ML, as it could streamline risk adjustment for plan payments. The current manuscript supplies no quantitative deltas, dataset sizes, or matched-baseline details, so the practical significance cannot yet be evaluated.

major comments (1)

[Abstract] Abstract: the central claim that 'models using embeddings had better performance than a commercial risk adjustment model' is presented without any numerical results, confidence intervals, cohort sizes, time windows, outcome definitions, or description of how the commercial baseline was re-implemented or whether identical patient cohorts and splits were used. This omission directly undermines the attribution of any gain to the embeddings rather than to differences in data, features, or tuning.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive feedback. The comment on the abstract is well-taken, and we will revise the manuscript to address it directly.

Point-by-point responses

Referee: [Abstract] Abstract: the central claim that 'models using embeddings had better performance than a commercial risk adjustment model' is presented without any numerical results, confidence intervals, cohort sizes, time windows, outcome definitions, or description of how the commercial baseline was re-implemented or whether identical patient cohorts and splits were used. This omission directly undermines the attribution of any gain to the embeddings rather than to differences in data, features, or tuning.

Authors: We agree that the abstract should include key quantitative results, cohort sizes, time windows, and baseline details to support the claim. The main text of the manuscript reports these elements (including performance metrics on prospective risk score prediction, dataset characteristics, and how the commercial model was applied to the same cohorts and splits). In the revision we will condense the relevant results into the abstract so that the performance comparison is stated with numbers rather than qualitatively.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical ML results with independent embedding learning and external baseline

Full rationale

The paper describes an empirical pipeline in which medical concept embeddings are learned from claims data and then used as features in downstream risk score models. No equations, derivations, or self-citations are presented that reduce the reported performance gain to a quantity defined by construction from the same fitted parameters or inputs. The claimed superiority is an experimental outcome relative to an external commercial model rather than a tautological renaming or self-referential fit. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

2 free parameters · 1 axiom · 0 invented entities

The approach rests on the domain assumption that medical codes exhibit semantic structure amenable to embedding and on standard hyperparameters of embedding models that must be chosen or tuned.

free parameters (2)

embedding dimension
Standard hyperparameter of the embedding model whose value affects downstream performance and is not derived from first principles.
context window or negative sampling rate
Typical training choices for learning embeddings from code sequences.
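
To make these two free parameters concrete, here is a small sketch of where they would enter a word2vec-style setup: the context window controls which code pairs become training examples, and the embedding dimension sets the size of each code's vector. Skip-gram training is an assumption here, and `skipgram_pairs` and `init_embeddings` are hypothetical helper names, not functions from the paper or from any library.

```python
import random
from typing import Dict, List, Sequence, Tuple


def skipgram_pairs(codes: Sequence[str], window: int) -> List[Tuple[str, str]]:
    """Emit (center, context) pairs for every code within `window` positions of the center."""
    pairs: List[Tuple[str, str]] = []
    for i, center in enumerate(codes):
        lo = max(0, i - window)
        hi = min(len(codes), i + window + 1)
        for j in range(lo, hi):
            if j != i:
                pairs.append((center, codes[j]))
    return pairs


def init_embeddings(vocab: Sequence[str], dim: int, seed: int = 0) -> Dict[str, List[float]]:
    """Create one small random vector of length `dim` per code, as a starting point for training."""
    rng = random.Random(seed)
    return {c: [rng.uniform(-0.5 / dim, 0.5 / dim) for _ in range(dim)] for c in vocab}


sequence = ["DX:E11.9", "RX:metformin", "PX:99213"]  # illustrative codes only
narrow = skipgram_pairs(sequence, window=1)  # adjacent codes only
wide = skipgram_pairs(sequence, window=2)    # every other code in this short history
table = init_embeddings(sequence, dim=8)
```

A wider window produces more pairs per history and treats distant codes as related, while a larger dimension adds capacity at the cost of more parameters to fit. Neither value follows from first principles, which is why the ledger counts both as free parameters.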

axioms (1)

domain assumption: Medical codes in claims data carry semantic relationships that can be captured by vector proximity.
Invoked when the paper states that embeddings represent medical concepts learned from diagnostic, procedure, and prescription codes.
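
One common way to measure "vector proximity" is cosine similarity, sketched below on made-up vectors. The choice of cosine similarity and the three toy vectors are assumptions for illustration; the page does not specify which proximity measure the paper relies on.

```python
import math
from typing import Sequence


def cosine_similarity(u: Sequence[float], v: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 1.0 means the same direction."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_u * norm_v)


# Hypothetical vectors: a diabetes diagnosis, a diabetes drug, an unrelated fracture code.
diabetes_dx = [0.9, 0.1, 0.0]
metformin_rx = [0.8, 0.2, 0.1]
fracture_dx = [0.0, 0.2, 0.9]

related = cosine_similarity(diabetes_dx, metformin_rx)
unrelated = cosine_similarity(diabetes_dx, fracture_dx)
```

If the axiom holds for learned embeddings, clinically related codes should score higher than unrelated ones, as `related` does relative to `unrelated` in this toy setup.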

pith-pipeline@v0.9.0 · 5671 in / 1189 out tokens · 22731 ms · 2026-05-24T21:30:17.494728+00:00 · methodology

Reference graph

Works this paper leans on

20 extracted references · 20 canonical work pages · 4 internal anchors

[1]

Jacek M Bajor, Diego A Mesa, Travis J Osterman, and Thomas A Lasko. 2018. Embedding Complexity In the Data Representation Instead of In the Model: A Case Study Using Heterogeneous Medical Data. (Feb. 2018). arXiv:stat.AP/1802.04233

work page arXiv 2018
[2]

Andrew L Beam, Benjamin Kompa, Inbar Fried, Nathan P Palmer, Xu Shi, Tianxi Cai, and Isaac S Kohane. 2018. Clinical Concept Embeddings Learned from Massive Sources of Multimodal Medical Data. (April 2018). arXiv:cs.CL/1804.01486

work page arXiv 2018
[3]

Yoshua Bengio, Réjean Ducharme, Pascal Vincent, and Christian Jauvin. 2003. A Neural Probabilistic Language Model. J. Mach. Learn. Res. 3, Feb (2003), 1137–1155

work page 2003
[4]

Hsien-Yen Chang and Jonathan P Weiner. 2010. An in-depth assessment of a diagnosis-based risk adjustment model based on national health insurance claims: the application of the Johns Hopkins Adjusted Clinical Group case-mix system in Taiwan. BMC Med. 8 (Jan. 2010), 7

work page 2010
[5]

Tianqi Chen and Carlos Guestrin. 2016. XGBoost: A Scalable Tree Boosting System. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining (KDD ’16). ACM, New York, NY, USA, 785–794

work page 2016
[6]

Edward Choi, Mohammad Taha Bahadori, Elizabeth Searles, Catherine Coffey, and Jimeng Sun. 2016. Multi-layer Representation Learning for Medical Concepts. (Feb. 2016). arXiv:cs.LG/1602.05568

work page internal anchor Pith review Pith/arXiv arXiv 2016
[7]

Youngduck Choi, Chill Yi-I Chiu, and David Sontag. 2016. Learning low-dimensional representations of medical concepts. AMIA Summits on Translational Science Proceedings 2016 (2016), 41

work page 2016
[8]

Lance De Vine, Guido Zuccon, Bevan Koopman, Laurianne Sitbon, and Peter Bruza. 2014. Medical semantic similarity with a neural language model. In Proceedings of the 23rd ACM international conference on conference on information and knowledge management. ACM, 1819–1822

work page 2014
[9]

Randall P Ellis, Bruno Martins, and Sherri Rose. 2018. Chapter 3 - Risk Adjustment for Health Plan Payment. In Risk Adjustment, Risk Sharing and Premium Regulation in Health Insurance Markets, Thomas G McGuire and Richard C van Kleef (Eds.). Academic Press, 55–104

work page 2018
[10]

Geoffrey R Hileman, Syed Muzayan Mehmud, and Marjorie A Rosenberg. 2016. Risk Scoring in Health Insurance: A Primer. Technical Report. Society of Actuaries, Chicago, IL

work page 2016
[11]

Geoffrey R Hileman and Spenser Steele. 2007. Accuracy of Claims-Based Risk Scoring Models. Technical Report. Society of Actuaries, Chicago, IL

work page 2007
[12]

Quoc V Le and Tomas Mikolov. 2014. Distributed Representations of Sentences and Documents. (May 2014). arXiv:cs.CL/1405.4053

work page internal anchor Pith review Pith/arXiv arXiv 2014
[13]

Tomas Mikolov, Kai Chen, Greg Corrado, and Jeffrey Dean. 2013. Efficient Estimation of Word Representations in Vector Space. (Jan. 2013). arXiv:cs.CL/1301.3781

work page internal anchor Pith review Pith/arXiv arXiv 2013
[14]

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. 2013. Distributed Representations of Words and Phrases and their Compositionality. In Advances in Neural Information Processing Systems 26, C J C Burges, L Bottou, M Welling, Z Ghahramani, and K Q Weinberger (Eds.). Curran Associates, Inc., 3111–3119

work page 2013
[15]

Riccardo Miotto, Li Li, Brian A Kidd, and Joel T Dudley. 2016. Deep Patient: An Unsupervised Representation to Predict the Future of Patients from the Electronic Health Records. Sci. Rep. 6 (May 2016), 26094

work page 2016
[16]

Trang Pham, Truyen Tran, Dinh Phung, and Svetha Venkatesh. 2017. Predicting healthcare trajectories from medical records: A deep learning approach. J. Biomed. Inform. 69 (May 2017), 218–229

work page 2017
[17]

Alvin Rajkomar, Eyal Oren, Kai Chen, Andrew M Dai, Nissan Hajaj, Michaela Hardt, Peter J Liu, Xiaobing Liu, Jake Marcus, Mimi Sun, Patrik Sundberg, Hector Yee, Kun Zhang, Yi Zhang, Gerardo Flores, Gavin E Duggan, Jamie Irvine, Quoc Le, Kurt Litsch, Alexander Mossin, Justin Tansuwan, De Wang, James Wexler, Jimbo Wilson, Dana Ludwig, Samuel L Volchenboum...

work page 2018
[18]

Sherri Rose. 2016. A Machine Learning Framework for Plan Payment Risk Adjustment. Health Serv. Res. 51, 6 (Dec. 2016), 2358–2374

work page 2016
[19]

Eric Schone and Randall Brown. 2013. Risk Adjustment: What is the Current State of the Art and How Can it be Improved? Technical Report. Wood Johnson Foundation, Princeton, NJ

work page 2013
[20]

Jinghe Zhang, Kamran Kowsari, James H Harrison, Jennifer M Lobo, and Laura E Barnes. 2018. Patient2Vec: A Personalized Interpretable Deep Representation of the Longitudinal Electronic Health Record. (Oct. 2018). arXiv:q-bio.QM/1810.04793

work page internal anchor Pith review Pith/arXiv arXiv 2018
