Representation learning to advance multi-institutional studies with electronic health record data from US and France

Boris Hejblum; Chuan Hong; Clara-Lea Bonzel; Doudou Zhou; Han Tong; J. Michael Gaziano; Katherine Liao; Kelly Cho; Kenneth Mandl; Kevin Pan

arxiv: 2502.08547 · v2 · submitted 2025-02-12 · 💻 cs.AI

Representation learning to advance multi-institutional studies with electronic health record data from US and France

Doudou Zhou , Han Tong , Linshanshan Wang , Suqi Liu , Xin Xiong , Ziming Gan , Romain Griffier , Boris Hejblum

show 16 more authors

Yun-Chung Liu Chuan Hong Clara-Lea Bonzel Tianrun Cai Kevin Pan Yuk-Lam Ho Lauren Costa Vidul A. Panickan J. Michael Gaziano Kenneth Mandl Vianney Jouhet Rodolphe Thiebaut Zongqi Xia Kelly Cho Katherine Liao Tianxi Cai

This is my paper

Pith reviewed 2026-05-23 03:15 UTC · model grok-4.3

classification 💻 cs.AI

keywords electronic health recordsdata harmonizationrepresentation learningmulti-institutional collaborationprivacy-preserving methodsknowledge graphssemantic embedding

0 comments

The pith

A graph-based framework aligns electronic health record vocabularies across institutions by learning a shared semantic space from local statistics, knowledge graphs, and language models.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a graph-based framework for harmonizing electronic health record data from multiple institutions without sharing patient-level information. It treats data harmonization as a representation learning task that combines institution-specific summary statistics, biomedical knowledge graphs, and semantic embeddings from large language models. The goal is to create a common semantic space that maps different coding practices used at each site. This approach was tested on data from seven institutions in the US and France, supporting the development of clinical models that can be trained and deployed across different healthcare systems and languages.

Core claim

The framework learns a shared semantic space by integrating institution-specific summary statistics from health records, curated biomedical knowledge graphs, and semantic information derived from large language models, thereby aligning diverse site-specific vocabularies while preserving patient privacy.

What carries the argument

A graph-based representation learning framework that jointly embeds institution-specific data summaries, biomedical knowledge graphs, and large language model-derived semantics into a unified space for vocabulary alignment.

If this is right

Clinical models can be trained at one institution and deployed at others with aligned data representations.
The method supports multi-institutional studies across different countries and languages.
Privacy is maintained since only summary statistics are used, not individual patient records.
Scalable harmonization is achieved without relying on fixed standards or manual mappings.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

This approach might extend to other data types like imaging or genomic records if similar summary statistics and knowledge resources are available.
Institutions could use the shared space to identify and correct inconsistencies in their own coding practices.
Future work could test whether the alignment improves performance in specific clinical prediction tasks like disease diagnosis.

Load-bearing premise

That institution-specific summary statistics, curated biomedical knowledge graphs, and semantic information derived from large language models can be jointly learned into a shared semantic space that aligns diverse site-specific vocabularies.

What would settle it

Demonstrating that models using the learned alignments perform no better than those using random mappings or no alignment on cross-institution tasks would falsify the central claim.

Figures

Figures reproduced from arXiv: 2502.08547 by Boris Hejblum, Chuan Hong, Clara-Lea Bonzel, Doudou Zhou, Han Tong, J. Michael Gaziano, Katherine Liao, Kelly Cho, Kenneth Mandl, Kevin Pan, Lauren Costa, Linshanshan Wang, Rodolphe Thiebaut, Romain Griffier, Suqi Liu, Tianrun Cai, Tianxi Cai, Vianney Jouhet, Vidul A. Panickan, Xin Xiong, Yuk-Lam Ho, Yun-Chung Liu, Ziming Gan, Zongqi Xia.

**Figure 2.** Figure 2: The data processing procedure of the GAME algorithm. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Mapping local codes to standard codes using GPT-4. [PITH_FULL_IMAGE:figures/full_fig_p008_3.png] view at source ↗

**Figure 4.** Figure 4: Overview of key steps in the GAME algorithm: (a) aligning embeddings into a shared repre [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗

**Figure 5.** Figure 5: Comparison of AUCs for detecting similarity (left) and relatedness (right) relationships using [PITH_FULL_IMAGE:figures/full_fig_p014_5.png] view at source ↗

**Figure 6.** Figure 6: Correlation between cosine similarities assigned by GAME and other PLMs compared to [PITH_FULL_IMAGE:figures/full_fig_p015_6.png] view at source ↗

**Figure 7.** Figure 7: The C-index between the cosine similarities of the candidate features and the GPT-4 scores [PITH_FULL_IMAGE:figures/full_fig_p016_7.png] view at source ↗

**Figure 8.** Figure 8: Comparison of hazard ratios (HR) between AD subgroups identified by GAME embedding with [PITH_FULL_IMAGE:figures/full_fig_p016_8.png] view at source ↗

**Figure 9.** Figure 9: Comparison of hazard ratios (HR) across mental health subgroups identified by GAME em [PITH_FULL_IMAGE:figures/full_fig_p018_9.png] view at source ↗

read the original abstract

The widespread adoption of electronic health records has created new opportunities for translational clinical research, yet this promise remains constrained by fragmented data across privacy-siloed institutions and substantial heterogeneity in local coding practices. While privacy-preserving collaborative learning allows institutions to work together without sharing patient-level data, it does not address inconsistencies in how clinical concepts are represented across sites. We introduce a graph-based framework that addresses this gap by treating data harmonization as a scalable representation learning problem. Rather than relying on fixed standards or manual mappings, the framework integrates institution-specific summary statistics from health records, curated biomedical knowledge graphs, and semantic information derived from large language models to learn a shared semantic space. This joint learning approach aligns diverse, site-specific vocabularies while preserving patient privacy. Evaluated across seven institutions and two languages, the framework provides a robust, data-centric foundation for training and deploying clinical models across heterogeneous healthcare systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper sketches a graph representation learning setup that fuses site summaries, KGs, and LLM semantics for EHR vocabulary alignment, but the abstract supplies no mechanism, loss, or results to show it works.

read the letter

The core idea is a graph-based framework that treats EHR harmonization as representation learning. It pulls in institution-specific summary statistics, curated biomedical knowledge graphs, and LLM-derived semantics to build a shared space that aligns site vocabularies without moving patient records. That specific mix is the main novelty relative to standard federated or mapping approaches mentioned in the abstract. It correctly identifies that privacy-preserving methods still leave coding differences unaddressed, so the framing of the practical obstacle is on target. The claim of evaluation across seven institutions in two languages is stated plainly. The soft spot is that none of the supporting evidence appears: no architecture, no alignment objective, no loss function, no baselines, and no quantitative metrics. The joint learning step is asserted rather than demonstrated, which leaves the robustness claim hanging on an untested premise. Minor details like how summary statistics avoid losing too much signal or how LLM noise is handled are also absent. This paper is aimed at clinical informatics and federated learning groups that need scalable ways to combine heterogeneous EHR sources. Readers working on data integration methods would get the most from it if the full text fills in the missing technical pieces. It deserves peer review because the underlying problem is real and the proposed direction is distinct enough to warrant scrutiny, even with the current thin evidence.

Referee Report

2 major / 1 minor

Summary. The manuscript introduces a graph-based framework for data harmonization of electronic health records across privacy-siloed institutions. It treats harmonization as a representation learning task that jointly incorporates institution-specific summary statistics, curated biomedical knowledge graphs, and LLM-derived semantic information to produce a shared semantic space aligning site-specific vocabularies, while preserving patient privacy. The work claims evaluation across seven institutions and two languages as providing a robust foundation for multi-institutional clinical models.

Significance. If the central mechanism can be shown to work, the approach would address a genuine barrier in collaborative EHR research by moving beyond fixed standards or manual mappings. The data-centric framing and use of multiple heterogeneous inputs (summary stats + KGs + LLM semantics) are conceptually aligned with current needs in federated clinical modeling.

major comments (2)

[Abstract] Abstract: the claim that the framework 'was evaluated across seven institutions' is unsupported because the abstract (and the supplied manuscript excerpt) contains no quantitative results, baselines, error metrics, ablation studies, or performance tables.
[Abstract] Abstract: the core modeling claim—that institution-specific summary statistics, biomedical KGs, and LLM semantics can be jointly learned into an aligning shared space—lacks any description of the loss function, architecture, alignment objective (contrastive, reconstruction, graph alignment, etc.), or training procedure, rendering the joint-learning premise an unverified assumption rather than a demonstrated mechanism.

minor comments (1)

[Abstract] The abstract could be revised to separate the high-level motivation from the specific technical contributions and to include at least one key quantitative result if the full manuscript contains it.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on the abstract. We agree that the abstract requires strengthening to better substantiate its claims and will revise it accordingly while preserving its brevity. We address each major comment below.

read point-by-point responses

Referee: [Abstract] Abstract: the claim that the framework 'was evaluated across seven institutions' is unsupported because the abstract (and the supplied manuscript excerpt) contains no quantitative results, baselines, error metrics, ablation studies, or performance tables.

Authors: We acknowledge that the current abstract states the evaluation scope without accompanying metrics. The full manuscript reports quantitative results including alignment F1 scores, cosine similarity improvements over baselines (e.g., direct KG matching and LLM-only embeddings), and ablation studies removing each input modality, all computed across the seven institutions. In revision we will insert a concise sentence summarizing key performance metrics and the evaluation scope to make the claim self-contained within the abstract. revision: yes
Referee: [Abstract] Abstract: the core modeling claim—that institution-specific summary statistics, biomedical KGs, and LLM semantics can be jointly learned into an aligning shared space—lacks any description of the loss function, architecture, alignment objective (contrastive, reconstruction, graph alignment, etc.), or training procedure, rendering the joint-learning premise an unverified assumption rather than a demonstrated mechanism.

Authors: The manuscript body specifies a graph neural network architecture with a composite objective: reconstruction loss on site-specific summary statistics, graph alignment loss on the biomedical KG edges, and contrastive loss aligning LLM-derived embeddings to the shared space, optimized via federated averaging. To address the abstract-level concern we will add a single clause describing the joint objective and alignment mechanism. revision: yes

Circularity Check

0 steps flagged

No significant circularity; framework relies on external inputs

full rationale

The paper introduces a graph-based representation learning framework that integrates institution-specific summary statistics, curated biomedical knowledge graphs, and LLM-derived semantics to produce a shared semantic space for vocabulary alignment. No equations, loss functions, or derivation steps are shown that reduce any claimed prediction or alignment result to a fitted parameter or input by construction. The approach depends on external resources (KGs, LLMs, site aggregates) rather than self-defining its outputs, and the provided text invokes no self-citations or uniqueness theorems as load-bearing justification. The central claim of cross-institution robustness is presented as an empirical outcome of the joint learning process, not a tautological renaming or definitional equivalence.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; the approach implicitly assumes standard properties of representation learning and privacy-preserving aggregation.

pith-pipeline@v0.9.0 · 5780 in / 1004 out tokens · 31922 ms · 2026-05-23T03:15:59.341056+00:00 · methodology

discussion (0)

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · 1 internal anchor

[1]

Perspectives and challenges in patient stratification in alzheimer’s disease.Alzheimer’s research & therapy, 14(1):112, 2022

Carla Abdelnour, Federica Agosta, Marco Bozzali, Bertrand Fougère, Atsushi Iwata, Ramin Nil- forooshan, Leonel T Takada, Félix Viñuela, and Martin Traber. Perspectives and challenges in patient stratification in alzheimer’s disease.Alzheimer’s research & therapy, 14(1):112, 2022

work page 2022
[2]

Melissa J Armstrong, Shangchen Song, Andrea M Kurasz, and Zhigang Li. Predictors of mortality 7https://docs.smarthealthit.org/ 19 in individuals with dementia in the national alzheimer’s coordinating center.Journal of Alzheimer’s Disease, 86(4):1935–1946, 2022

work page 1935
[3]

seroquel

Lisa A Arvanitis and Barbara G Miller. Multiple fixed doses of “seroquel”(quetiapine) in patients with acute exacerbation of schizophrenia: a comparison with haloperidol and placebo.Biological psychiatry, 42(4):233–246, 1997

work page 1997
[4]

Ehr phenotyping via jointly embedding medical concepts and words into a unified vector space.BMC medical informatics and decision making, 18:15–25, 2018

Tian Bai, Ashis Kumar Chanda, Brian L Egleston, and Slobodan Vucetic. Ehr phenotyping via jointly embedding medical concepts and words into a unified vector space.BMC medical informatics and decision making, 18:15–25, 2018

work page 2018
[5]

Tucker: Tensor factorization for knowledge graph completion

Ivana Balažević, Carl Allen, and Timothy M Hospedales. Tucker: Tensor factorization for knowledge graph completion. arXiv preprint arXiv:1901.09590, 2019

work page arXiv 1901
[6]

Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin Weber, Nathan Palmer, Xu Shi, Tianxi Cai, and Isaac S

Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin Weber, Nathan Palmer, Xu Shi, Tianxi Cai, and Isaac S. Kohane. Clinical concept embeddings learned from massive sources of multimodal medical data. In Biocomputing 2020. WORLD SCIENTIFIC, Nov 2019. doi: 10. 1142/9789811215636_0027. URL https://doi.org/10.1142%2F9789811215636_0027

work page 2020
[7]

A neural probabilistic language model

Yoshua Bengio, Réjean Ducharme, and Pascal Vincent. A neural probabilistic language model. JMLR, 3:1137–1155, 2003

work page 2003
[8]

Bodenreider

O. Bodenreider. The unified medical language system (umls): integrating biomedical terminology. Nucleic Acids Research, page 267D – 270, Jan 2004. doi: 10.1093/nar/gkh061. URLhttp://dx. doi.org/10.1093/nar/gkh061

work page doi:10.1093/nar/gkh061 2004
[9]

Evaluation of the ccam hierarchy and semi structured code for retrieving relevant procedures in a hospital case mix database

Cédric Bousquet, Béatrice Trombert, Julien Souvignet, Eric Sadou, and Jean-Marie Rodrigues. Evaluation of the ccam hierarchy and semi structured code for retrieving relevant procedures in a hospital case mix database. In AMIA Annual Symposium Proceedings, volume 2010, page 61. American Medical Informatics Association, 2010

work page 2010
[10]

International statistical classification of diseases and related health problems

Gerlind R Brämer. International statistical classification of diseases and related health problems. tenth revision.World health statistics quarterly. Rapport trimestriel de statistiques sanitaires mon- diales, 41(1):32–36, 1988

work page 1988
[11]

Consensus knowledge graph learning via multi-view sparse low rank block model.arXiv preprint arXiv:2209.13762, 2022

Tianxi Cai, Dong Xia, Luwan Zhang, and Doudou Zhou. Consensus knowledge graph learning via multi-view sparse low rank block model.arXiv preprint arXiv:2209.13762, 2022

work page arXiv 2022
[12]

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distil- lation. arXiv preprint arXiv:2402.03216, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[13]

Privacy protec- tion and intrusion avoidance for cloudlet-based medical data sharing.IEEE transactions on Cloud computing, 8(4):1274–1283, 2016

Min Chen, Yongfeng Qian, Jing Chen, Kai Hwang, Shiwen Mao, and Long Hu. Privacy protec- tion and intrusion avoidance for cloudlet-based medical data sharing.IEEE transactions on Cloud computing, 8(4):1274–1283, 2016

work page 2016
[14]

Multi-layer representation learning for medi- cal concepts

Edward Choi, Mohammad Taha Bahadori, Elizabeth Searles, Catherine Coffey, Michael Thompson, James Bost, Javier Tejedor-Sojo, and Jimeng Sun. Multi-layer representation learning for medi- cal concepts. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1495–1504, 2016. 20

work page 2016
[15]

Comparative efficacy and acceptability of antimanic drugs in acute mania: a multiple-treatments meta-analysis

Andrea Cipriani, Corrado Barbui, Georgia Salanti, Jennifer Rendell, Rachel Brown, Sarah Stockton, Marianna Purgato, Loukia M Spineli, Guy M Goodwin, and John R Geddes. Comparative efficacy and acceptability of antimanic drugs in acute mania: a multiple-treatments meta-analysis. The Lancet, 378(9799):1306–1315, 2011

work page 2011
[16]

Comorbidity clusters in autism spectrum dis- orders: an electronic health record time-series analysis.Pediatrics, 133(1):e54–e63, 2014

Finale Doshi-Velez, Yaorong Ge, and Isaac Kohane. Comorbidity clusters in autism spectrum dis- orders: an electronic health record time-series analysis.Pediatrics, 133(1):e54–e63, 2014

work page 2014
[17]

Risk factors for suicide in adults: systematic review and meta-analysis of psychological autopsy studies.BMJ Ment Health, 25(4):148–155, 2022

Louis Favril, Rongqin Yu, Abdo Uyar, Michael Sharpe, and Seena Fazel. Risk factors for suicide in adults: systematic review and meta-analysis of psychological autopsy studies.BMJ Ment Health, 25(4):148–155, 2022

work page 2022
[18]

Seena Fazel and Bo Runeson. Suicide. New England Journal of Medicine, 382(3):266–274, 2020. doi: 10.1056/NEJMra1902944

work page doi:10.1056/nejmra1902944 2020
[19]

Gnaeus: Utilizing clinical guidelines for knowledge-assisted visualisation of ehr cohorts

Paolo Federico, Jürgen Unger, Albert Amor-Amorós, Lucia Sacchi, Denis Klimov, and Silvia Miksch. Gnaeus: Utilizing clinical guidelines for knowledge-assisted visualisation of ehr cohorts. InEuroVA@ EuroVis, pages 79–83, 2015

work page 2015
[20]

The benefit of augmenting open data with clinical data-warehouse ehr for forecasting sars-cov-2 hospi- talizations in bordeaux area, france.JAMIA open, 5(4):ooac086, 2022

Thomas Ferté, Vianney Jouhet, Romain Griffier, Boris P Hejblum, and Rodolphe Thiébaut. The benefit of augmenting open data with clinical data-warehouse ehr for forecasting sars-cov-2 hospi- talizations in bordeaux area, france.JAMIA open, 5(4):ooac086, 2022

work page 2022
[21]

ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis.medRxiv, 2023

Ziming Gan, Doudou Zhou, Everett Rush, Vidul A Panickan, Yuk-Lam Ho, George Ostrouchov, Zhiwei Xu, Shuting Shen, Xin Xiong, Kimberly F Greco, et al. ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis.medRxiv, 2023

work page 2023
[22]

A new model for learning in graph domains

Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. InProceedings. 2005 IEEE international joint conference on neural networks, 2005., volume 2, pages 729–734. IEEE, 2005

work page 2005
[23]

Domain-specific language model pretraining for biomedical natural language processing.ACM Transactions on Computing for Healthcare (HEALTH), 3(1):1–23, 2021

Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. Domain-specific language model pretraining for biomedical natural language processing.ACM Transactions on Computing for Healthcare (HEALTH), 3(1):1–23, 2021

work page 2021
[24]

An open-source framework for end-to-end analysis of electronic health record data.Nature medicine, 30(11):3369– 3380, 2024

Lukas Heumos, Philipp Ehmele, Tim Treis, Julius Upmeier zu Belzen, Eljas Roellin, Lilly May, Altana Namsaraeva, Nastassya Horlava, Vladimir A Shitov, Xinyue Zhang, et al. An open-source framework for end-to-end analysis of electronic health record data.Nature medicine, 30(11):3369– 3380, 2024

work page 2024
[25]

Clinical knowledge extraction via sparse embedding regression (keser) with multi-center large scale electronic health record data

Chuan Hong, Everett Rush, Molei Liu, Doudou Zhou, Jiehuan Sun, Aaron Sonabend, Victor M Castro, Petra Schubert, Vidul A Panickan, Tianrun Cai, et al. Clinical knowledge extraction via sparse embedding regression (keser) with multi-center large scale electronic health record data. medRxiv, 2021

work page 2021
[26]

Psychosis in alzheimer disease—mechanisms, genetics and therapeutic opportunities

Zahinoor Ismail, Byron Creese, Dag Aarsland, Helen C Kales, Constantine G Lyketsos, Robert A Sweet, and Clive Ballard. Psychosis in alzheimer disease—mechanisms, genetics and therapeutic opportunities. Nature Reviews Neurology, 18(3):131–144, 2022

work page 2022
[27]

MIMIC-IV (version 0.4)

A Johnson, L Bulgarelli, T Pollard, S Horng, L A Celi, and R Mark. MIMIC-IV (version 0.4). PhysioNet., 2020. 21

work page 2020
[28]

Code2vec: Embed- ding and clustering medical diagnosis data

David Kartchner, Tanner Christensen, Jeffrey Humpherys, and Sean Wade. Code2vec: Embed- ding and clustering medical diagnosis data. In2017 IEEE International Conference on Healthcare Informatics, pages 386–390, 2017

work page 2017
[29]

Deep representation learning of electronic health records to unlock patient stratification at scale.NPJ digital medicine, 3(1):96, 2020

Isotta Landi, Benjamin S Glicksberg, Hao-Chih Lee, Sarah Cherng, Giulia Landi, Matteo Danieletto, Joel T Dudley, Cesare Furlanello, and Riccardo Miotto. Deep representation learning of electronic health records to unlock patient stratification at scale.NPJ digital medicine, 3(1):96, 2020

work page 2020
[30]

Lozano, A

Dongha Lee, Xiaoqian Jiang, and Hwanjo Yu. Harmonized representation learning on dynamic ehr graphs. Journal of biomedical informatics, 106:103426, June 2020. ISSN 1532-0464. doi: 10.1016/j. jbi.2020.103426. URL https://doi.org/10.1016/j.jbi.2020.103426

work page doi:10.1016/j 2020
[31]

Biobert: a pre-trained biomedical language representation model for biomedical text mining

Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. Biobert: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, 2020

work page 2020
[32]

Neural word embedding as implicit matrix factorization.Advances in neural information processing systems, 27, 2014

Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization.Advances in neural information processing systems, 27, 2014

work page 2014
[33]

Identification of type 2 diabetes subgroups through topological analysis of patient similarity.Science translational medicine, 7(311):311ra174–311ra174, 2015

Li Li, Wei-Yi Cheng, Benjamin S Glicksberg, Omri Gottesman, Ronald Tamler, Rong Chen, Erwin P Bottinger, and Joel T Dudley. Identification of type 2 diabetes subgroups through topological analysis of patient similarity.Science translational medicine, 7(311):311ra174–311ra174, 2015

work page 2015
[34]

Development of phenotype algorithms using electronic medical records and incorporating natural language processing

Katherine P Liao, Tianxi Cai, Guergana K Savova, Shawn N Murphy, Elizabeth W Karlson, Ash- win N Ananthakrishnan, Vivian S Gainer, Stanley Y Shaw, Zongqi Xia, Peter Szolovits, et al. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. bmj, 350, 2015

work page 2015
[35]

Multimodal learning on graphs for disease relation extraction

Yucong Lin, Keming Lu, Sheng Yu, Tianxi Cai, and Marinka Zitnik. Multimodal learning on graphs for disease relation extraction. CoRR, abs/2203.08893, 2022. doi: 10.48550/ARXIV.2203.08893. URL https://doi.org/10.48550/arXiv.2203.08893

work page doi:10.48550/arxiv.2203.08893 2022
[36]

Self-alignment pretraining for biomedical entity representations

Fangyu Liu, Ehsan Shareghi, Zaiqiao Meng, Marco Basaldella, and Nigel Collier. Self-alignment pretraining for biomedical entity representations. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou, editors,Proceedings of the 2021 Conference of the North...

work page doi:10.18653/v1/2021.naacl-main 2021
[37]

URL https://aclanthology.org/2021.naacl-main.334

work page 2021
[38]

The role of nmda receptors in alzheimer’s disease.Frontiers in neuroscience, 13:43, 2019

Jinping Liu, Lirong Chang, Yizhi Song, Hui Li, and Yan Wu. The role of nmda receptors in alzheimer’s disease.Frontiers in neuroscience, 13:43, 2019

work page 2019
[39]

Loinc, a universal standard for identifying laboratory observations: a 5-year update.Clinical chemistry, 49(4):624–633, 2003

Clement J McDonald, Stanley M Huff, Jeffrey G Suico, Gilbert Hill, Dennis Leavelle, Raymond Aller, Arden Forrey, Kathy Mercer, Georges DeMoor, John Hook, et al. Loinc, a universal standard for identifying laboratory observations: a 5-year update.Clinical chemistry, 49(4):624–633, 2003

work page 2003
[40]

Distributed representa- tions of words and phrases and their compositionality.Adv Neural Inf Process Syst, 26:3111–3119, 2013

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representa- tions of words and phrases and their compositionality.Adv Neural Inf Process Syst, 26:3111–3119, 2013. 22

work page 2013
[41]

Federated learning for heterogeneous electronic health records utilising augmented temporal graph attention networks

Soheila Molaei, Anshul Thakur, Ghazaleh Niknam, Andrew Soltan, Hadi Zare, and David A Clifton. Federated learning for heterogeneous electronic health records utilising augmented temporal graph attention networks. InInternational Conference on Artificial Intelligence and Statistics, pages 1342–

work page
[42]

Omop, 2021

OMOP. Omop, 2021. URLhttps://ohdsi.org/omop/. Accessed: June, 2021

work page 2021
[43]

International classification of diseases—ninth revision (icd-9)

World Health Organization et al. International classification of diseases—ninth revision (icd-9). Weekly Epidemiological Record= Relevé épidémiologique hebdomadaire, 63(45):343–344, 1988

work page 1988
[44]

Glove: Global vectors for word representation

Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, 2014

work page 2014
[45]

Reina, Jason Martin, Sarthak Pati, Aikaterini Kotrotsou, Mikhail Milchenko, Weilin Xu, Daniel Marcus, Rivka Colen, and Spyridon Bakas

Micah Sheller, Brandon Edwards, G. Reina, Jason Martin, Sarthak Pati, Aikaterini Kotrotsou, Mikhail Milchenko, Weilin Xu, Daniel Marcus, Rivka Colen, and Spyridon Bakas. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data.Scientific Reports, 10, 07 2020. doi: 10.1038/s41598-020-69250-1

work page doi:10.1038/s41598-020-69250-1 2020
[46]

Biomegatron: larger biomedical domain language model

Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, and Raghav Mani. Biomegatron: larger biomedical domain language model. In Pro- ceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4700–4706, 2020

work page 2020
[47]

Suicide and prevalence of mental disorders: A systematic review and meta-analysis of world data on case-control psychological autopsy studies

Roshan Sutar, Akash Kumar, and Vikas Yadav. Suicide and prevalence of mental disorders: A systematic review and meta-analysis of world data on case-control psychological autopsy studies. Psychiatry research, page 115492, 2023

work page 2023
[48]

Federated k-means clustering.arXiv preprint arXiv:2310.01195, 2024

Marcel Reinders Swier Garst. Federated k-means clustering.arXiv preprint arXiv:2310.01195, 2024

work page arXiv 2024
[49]

Tariot, Martin R

Pierre N. Tariot, Martin R. Farlow, George T. Grossberg, Stephen M. Graham, Scott McDonald, Ivan Gergel, and for the Memantine Study Group. Memantine treatment in patients with moderate to severe alzheimer disease already receiving donepezila randomized controlled trial.JAMA, 291(3): 317–324, 01 2004. ISSN 0098-7484. doi: 10.1001/jama.291.3.317

work page doi:10.1001/jama.291.3.317 2004
[50]

Graph attention networks

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. InInternational Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ

work page 2018
[51]

Risk and Trust Perceptions of the Public of Artifical Intelligence Applications

Ke Wang, Ning Chen, and Ting Chen. Joint medical ontology representation learning for healthcare predictions. In2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2020. doi: 10.1109/IJCNN48605.2020.9207355

work page doi:10.1109/ijcnn48605.2020.9207355 2020
[52]

Stratification of alzheimer’s disease patients using knowledge-guided unsupervised latent factor clustering with electronic health record data

Linshanshan Wang, Shruthi Venkatesh, Michele Morris, Mengyan Li, Ratnam Srivastava, Shyam Visweswaran, Oscar Lopez, Zongqi Xia, and Tianxi Cai. Stratification of alzheimer’s disease patients using knowledge-guided unsupervised latent factor clustering with electronic health record data. medRxiv, 2024. doi: 10.1101/2024.12.23.24319588. URL https://www.medr...

work page doi:10.1101/2024.12.23.24319588 2024
[53]

Multi-similarity loss with general pair weighting for deep metric learning

Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matthew R Scott. Multi-similarity loss with general pair weighting for deep metric learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5022–5030, 2019. 23

work page 2019
[54]

Knowledge graph embedding by trans- lating on hyperplanes

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by trans- lating on hyperplanes. InProceedings of the AAAI Conference on Artificial Intelligence, volume 28, 2014

work page 2014
[55]

Mandl, Suchun Cheng, Zongqi Xia, Kelly Cho, J

Xin Xiong, Sara Morini Sweet, Molei Liu, Chuan Hong, Clara-Lea Bonzel, Vidul Ayakulangara Panickan, Doudou Zhou, Linshanshan Wang, Lauren Costa, Yuk-Lam Ho, Alon Geva, Kenneth D. Mandl, Suchun Cheng, Zongqi Xia, Kelly Cho, J. Michael Gaziano, Katherine P. Liao, Tianxi Cai, and Tianrun Cai. Knowledge-driven online multimodal automated phenotyping system.medRxiv,

work page
[56]

URL https://www.medrxiv.org/content/early/2023/ 10/02/2023.09.29.23296239

doi: 10.1101/2023.09.29.23296239. URL https://www.medrxiv.org/content/early/2023/ 10/02/2023.09.29.23296239

work page doi:10.1101/2023.09.29.23296239 2023
[57]

Kg-bert: Bert for knowledge graph completion.arXiv preprint arXiv:1909.03193, 2019

Liang Yao, Chengsheng Mao, and Yuan Luo. Kg-bert: Bert for knowledge graph completion.arXiv preprint arXiv:1909.03193, 2019

work page arXiv 1909
[58]

Coder: Knowledge- infused cross-lingual medical term embedding for term normalization.Journal of Biomedical Infor- matics, 126:103983, 2022

Zheng Yuan, Zhengyun Zhao, Haixia Sun, Jiao Li, Fei Wang, and Sheng Yu. Coder: Knowledge- infused cross-lingual medical term embedding for term normalization.Journal of Biomedical Infor- matics, 126:103983, 2022

work page 2022
[59]

Predictors for survival in patients with alzheimer’s disease: a large comprehensive meta-analysis.Translational Psychiatry, 14(1):184, 2024

Xiaoting Zheng, Shichan Wang, Jingxuan Huang, Chunyu Li, and Huifang Shang. Predictors for survival in patients with alzheimer’s disease: a large comprehensive meta-analysis.Translational Psychiatry, 14(1):184, 2024

work page 2024
[60]

Panickan, Chuan Hong, Yuk-Lam Ho, Tianrun Cai, Lauren Costa, Xiaoou Li, Victor M

Doudou Zhou, Ziming Gan, Xu Shi, Alina Patwari, Everett Rush, Clara-Lea Bonzel, Vidul A. Panickan, Chuan Hong, Yuk-Lam Ho, Tianrun Cai, Lauren Costa, Xiaoou Li, Victor M. Castro, Shawn N. Murphy, Gabriel Brat, Griffin Weber, Paul Avillach, J. Michael Gaziano, Kelly Cho, Katherine P. Liao, Junwei Lu, and Tianxi Cai. Multiview incomplete knowledge graph int...

work page doi:10.1016/j.jbi.2022.104147 2022
[61]

grandparent

Doudou Zhou, Yufeng Zhang, Aaron Sonabend-W, Zhaoran Wang, Junwei Lu, and Tianxi Cai. Federated offline reinforcement learning. Journal of the American Statistical Association, pages 1–12, 2024. 24 Supplementary Material Representation Learning to Advance Multi-Institutional Studies with Electronic Health Record Data S.1 Training and validation data base ...

work page 2024
[62]

one-step training

In the similarity training step, we save the embedding with the highest code mapping accuracy, as detailed in Algorithm 2. In the relatedness training step, we save the embedding with the highest feature selection correlation, also detailed in Algorithm 2. When splitting the training and validation sets, we divide the similar hierarchical pairs according ...

work page arXiv 1977

[1] [1]

Perspectives and challenges in patient stratification in alzheimer’s disease.Alzheimer’s research & therapy, 14(1):112, 2022

Carla Abdelnour, Federica Agosta, Marco Bozzali, Bertrand Fougère, Atsushi Iwata, Ramin Nil- forooshan, Leonel T Takada, Félix Viñuela, and Martin Traber. Perspectives and challenges in patient stratification in alzheimer’s disease.Alzheimer’s research & therapy, 14(1):112, 2022

work page 2022

[2] [2]

Melissa J Armstrong, Shangchen Song, Andrea M Kurasz, and Zhigang Li. Predictors of mortality 7https://docs.smarthealthit.org/ 19 in individuals with dementia in the national alzheimer’s coordinating center.Journal of Alzheimer’s Disease, 86(4):1935–1946, 2022

work page 1935

[3] [3]

seroquel

Lisa A Arvanitis and Barbara G Miller. Multiple fixed doses of “seroquel”(quetiapine) in patients with acute exacerbation of schizophrenia: a comparison with haloperidol and placebo.Biological psychiatry, 42(4):233–246, 1997

work page 1997

[4] [4]

Ehr phenotyping via jointly embedding medical concepts and words into a unified vector space.BMC medical informatics and decision making, 18:15–25, 2018

Tian Bai, Ashis Kumar Chanda, Brian L Egleston, and Slobodan Vucetic. Ehr phenotyping via jointly embedding medical concepts and words into a unified vector space.BMC medical informatics and decision making, 18:15–25, 2018

work page 2018

[5] [5]

Tucker: Tensor factorization for knowledge graph completion

Ivana Balažević, Carl Allen, and Timothy M Hospedales. Tucker: Tensor factorization for knowledge graph completion. arXiv preprint arXiv:1901.09590, 2019

work page arXiv 1901

[6] [6]

Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin Weber, Nathan Palmer, Xu Shi, Tianxi Cai, and Isaac S

Andrew L. Beam, Benjamin Kompa, Allen Schmaltz, Inbar Fried, Griffin Weber, Nathan Palmer, Xu Shi, Tianxi Cai, and Isaac S. Kohane. Clinical concept embeddings learned from massive sources of multimodal medical data. In Biocomputing 2020. WORLD SCIENTIFIC, Nov 2019. doi: 10. 1142/9789811215636_0027. URL https://doi.org/10.1142%2F9789811215636_0027

work page 2020

[7] [7]

A neural probabilistic language model

Yoshua Bengio, Réjean Ducharme, and Pascal Vincent. A neural probabilistic language model. JMLR, 3:1137–1155, 2003

work page 2003

[8] [8]

Bodenreider

O. Bodenreider. The unified medical language system (umls): integrating biomedical terminology. Nucleic Acids Research, page 267D – 270, Jan 2004. doi: 10.1093/nar/gkh061. URLhttp://dx. doi.org/10.1093/nar/gkh061

work page doi:10.1093/nar/gkh061 2004

[9] [9]

Evaluation of the ccam hierarchy and semi structured code for retrieving relevant procedures in a hospital case mix database

Cédric Bousquet, Béatrice Trombert, Julien Souvignet, Eric Sadou, and Jean-Marie Rodrigues. Evaluation of the ccam hierarchy and semi structured code for retrieving relevant procedures in a hospital case mix database. In AMIA Annual Symposium Proceedings, volume 2010, page 61. American Medical Informatics Association, 2010

work page 2010

[10] [10]

International statistical classification of diseases and related health problems

Gerlind R Brämer. International statistical classification of diseases and related health problems. tenth revision.World health statistics quarterly. Rapport trimestriel de statistiques sanitaires mon- diales, 41(1):32–36, 1988

work page 1988

[11] [11]

Consensus knowledge graph learning via multi-view sparse low rank block model.arXiv preprint arXiv:2209.13762, 2022

Tianxi Cai, Dong Xia, Luwan Zhang, and Doudou Zhou. Consensus knowledge graph learning via multi-view sparse low rank block model.arXiv preprint arXiv:2209.13762, 2022

work page arXiv 2022

[12] [12]

M3-Embedding: Multi-Linguality, Multi-Functionality, Multi-Granularity Text Embeddings Through Self-Knowledge Distillation

Jianlv Chen, Shitao Xiao, Peitian Zhang, Kun Luo, Defu Lian, and Zheng Liu. Bge m3-embedding: Multi-lingual, multi-functionality, multi-granularity text embeddings through self-knowledge distil- lation. arXiv preprint arXiv:2402.03216, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024

[13] [13]

Privacy protec- tion and intrusion avoidance for cloudlet-based medical data sharing.IEEE transactions on Cloud computing, 8(4):1274–1283, 2016

Min Chen, Yongfeng Qian, Jing Chen, Kai Hwang, Shiwen Mao, and Long Hu. Privacy protec- tion and intrusion avoidance for cloudlet-based medical data sharing.IEEE transactions on Cloud computing, 8(4):1274–1283, 2016

work page 2016

[14] [14]

Multi-layer representation learning for medi- cal concepts

Edward Choi, Mohammad Taha Bahadori, Elizabeth Searles, Catherine Coffey, Michael Thompson, James Bost, Javier Tejedor-Sojo, and Jimeng Sun. Multi-layer representation learning for medi- cal concepts. In Proceedings of the 22nd ACM SIGKDD International Conference on Knowledge Discovery and Data Mining, pages 1495–1504, 2016. 20

work page 2016

[15] [15]

Comparative efficacy and acceptability of antimanic drugs in acute mania: a multiple-treatments meta-analysis

Andrea Cipriani, Corrado Barbui, Georgia Salanti, Jennifer Rendell, Rachel Brown, Sarah Stockton, Marianna Purgato, Loukia M Spineli, Guy M Goodwin, and John R Geddes. Comparative efficacy and acceptability of antimanic drugs in acute mania: a multiple-treatments meta-analysis. The Lancet, 378(9799):1306–1315, 2011

work page 2011

[16] [16]

Comorbidity clusters in autism spectrum dis- orders: an electronic health record time-series analysis.Pediatrics, 133(1):e54–e63, 2014

Finale Doshi-Velez, Yaorong Ge, and Isaac Kohane. Comorbidity clusters in autism spectrum dis- orders: an electronic health record time-series analysis.Pediatrics, 133(1):e54–e63, 2014

work page 2014

[17] [17]

Risk factors for suicide in adults: systematic review and meta-analysis of psychological autopsy studies.BMJ Ment Health, 25(4):148–155, 2022

Louis Favril, Rongqin Yu, Abdo Uyar, Michael Sharpe, and Seena Fazel. Risk factors for suicide in adults: systematic review and meta-analysis of psychological autopsy studies.BMJ Ment Health, 25(4):148–155, 2022

work page 2022

[18] [18]

Seena Fazel and Bo Runeson. Suicide. New England Journal of Medicine, 382(3):266–274, 2020. doi: 10.1056/NEJMra1902944

work page doi:10.1056/nejmra1902944 2020

[19] [19]

Gnaeus: Utilizing clinical guidelines for knowledge-assisted visualisation of ehr cohorts

Paolo Federico, Jürgen Unger, Albert Amor-Amorós, Lucia Sacchi, Denis Klimov, and Silvia Miksch. Gnaeus: Utilizing clinical guidelines for knowledge-assisted visualisation of ehr cohorts. InEuroVA@ EuroVis, pages 79–83, 2015

work page 2015

[20] [20]

The benefit of augmenting open data with clinical data-warehouse ehr for forecasting sars-cov-2 hospi- talizations in bordeaux area, france.JAMIA open, 5(4):ooac086, 2022

Thomas Ferté, Vianney Jouhet, Romain Griffier, Boris P Hejblum, and Rodolphe Thiébaut. The benefit of augmenting open data with clinical data-warehouse ehr for forecasting sars-cov-2 hospi- talizations in bordeaux area, france.JAMIA open, 5(4):ooac086, 2022

work page 2022

[21] [21]

ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis.medRxiv, 2023

Ziming Gan, Doudou Zhou, Everett Rush, Vidul A Panickan, Yuk-Lam Ho, George Ostrouchov, Zhiwei Xu, Shuting Shen, Xin Xiong, Kimberly F Greco, et al. ARCH: Large-scale knowledge graph via aggregated narrative codified health records analysis.medRxiv, 2023

work page 2023

[22] [22]

A new model for learning in graph domains

Marco Gori, Gabriele Monfardini, and Franco Scarselli. A new model for learning in graph domains. InProceedings. 2005 IEEE international joint conference on neural networks, 2005., volume 2, pages 729–734. IEEE, 2005

work page 2005

[23] [23]

Domain-specific language model pretraining for biomedical natural language processing.ACM Transactions on Computing for Healthcare (HEALTH), 3(1):1–23, 2021

Yu Gu, Robert Tinn, Hao Cheng, Michael Lucas, Naoto Usuyama, Xiaodong Liu, Tristan Naumann, Jianfeng Gao, and Hoifung Poon. Domain-specific language model pretraining for biomedical natural language processing.ACM Transactions on Computing for Healthcare (HEALTH), 3(1):1–23, 2021

work page 2021

[24] [24]

An open-source framework for end-to-end analysis of electronic health record data.Nature medicine, 30(11):3369– 3380, 2024

Lukas Heumos, Philipp Ehmele, Tim Treis, Julius Upmeier zu Belzen, Eljas Roellin, Lilly May, Altana Namsaraeva, Nastassya Horlava, Vladimir A Shitov, Xinyue Zhang, et al. An open-source framework for end-to-end analysis of electronic health record data.Nature medicine, 30(11):3369– 3380, 2024

work page 2024

[25] [25]

Clinical knowledge extraction via sparse embedding regression (keser) with multi-center large scale electronic health record data

Chuan Hong, Everett Rush, Molei Liu, Doudou Zhou, Jiehuan Sun, Aaron Sonabend, Victor M Castro, Petra Schubert, Vidul A Panickan, Tianrun Cai, et al. Clinical knowledge extraction via sparse embedding regression (keser) with multi-center large scale electronic health record data. medRxiv, 2021

work page 2021

[26] [26]

Psychosis in alzheimer disease—mechanisms, genetics and therapeutic opportunities

Zahinoor Ismail, Byron Creese, Dag Aarsland, Helen C Kales, Constantine G Lyketsos, Robert A Sweet, and Clive Ballard. Psychosis in alzheimer disease—mechanisms, genetics and therapeutic opportunities. Nature Reviews Neurology, 18(3):131–144, 2022

work page 2022

[27] [27]

MIMIC-IV (version 0.4)

A Johnson, L Bulgarelli, T Pollard, S Horng, L A Celi, and R Mark. MIMIC-IV (version 0.4). PhysioNet., 2020. 21

work page 2020

[28] [28]

Code2vec: Embed- ding and clustering medical diagnosis data

David Kartchner, Tanner Christensen, Jeffrey Humpherys, and Sean Wade. Code2vec: Embed- ding and clustering medical diagnosis data. In2017 IEEE International Conference on Healthcare Informatics, pages 386–390, 2017

work page 2017

[29] [29]

Deep representation learning of electronic health records to unlock patient stratification at scale.NPJ digital medicine, 3(1):96, 2020

Isotta Landi, Benjamin S Glicksberg, Hao-Chih Lee, Sarah Cherng, Giulia Landi, Matteo Danieletto, Joel T Dudley, Cesare Furlanello, and Riccardo Miotto. Deep representation learning of electronic health records to unlock patient stratification at scale.NPJ digital medicine, 3(1):96, 2020

work page 2020

[30] [30]

Lozano, A

Dongha Lee, Xiaoqian Jiang, and Hwanjo Yu. Harmonized representation learning on dynamic ehr graphs. Journal of biomedical informatics, 106:103426, June 2020. ISSN 1532-0464. doi: 10.1016/j. jbi.2020.103426. URL https://doi.org/10.1016/j.jbi.2020.103426

work page doi:10.1016/j 2020

[31] [31]

Biobert: a pre-trained biomedical language representation model for biomedical text mining

Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. Biobert: a pre-trained biomedical language representation model for biomedical text mining. Bioinformatics, 36(4):1234–1240, 2020

work page 2020

[32] [32]

Neural word embedding as implicit matrix factorization.Advances in neural information processing systems, 27, 2014

Omer Levy and Yoav Goldberg. Neural word embedding as implicit matrix factorization.Advances in neural information processing systems, 27, 2014

work page 2014

[33] [33]

Identification of type 2 diabetes subgroups through topological analysis of patient similarity.Science translational medicine, 7(311):311ra174–311ra174, 2015

Li Li, Wei-Yi Cheng, Benjamin S Glicksberg, Omri Gottesman, Ronald Tamler, Rong Chen, Erwin P Bottinger, and Joel T Dudley. Identification of type 2 diabetes subgroups through topological analysis of patient similarity.Science translational medicine, 7(311):311ra174–311ra174, 2015

work page 2015

[34] [34]

Development of phenotype algorithms using electronic medical records and incorporating natural language processing

Katherine P Liao, Tianxi Cai, Guergana K Savova, Shawn N Murphy, Elizabeth W Karlson, Ash- win N Ananthakrishnan, Vivian S Gainer, Stanley Y Shaw, Zongqi Xia, Peter Szolovits, et al. Development of phenotype algorithms using electronic medical records and incorporating natural language processing. bmj, 350, 2015

work page 2015

[35] [35]

Multimodal learning on graphs for disease relation extraction

Yucong Lin, Keming Lu, Sheng Yu, Tianxi Cai, and Marinka Zitnik. Multimodal learning on graphs for disease relation extraction. CoRR, abs/2203.08893, 2022. doi: 10.48550/ARXIV.2203.08893. URL https://doi.org/10.48550/arXiv.2203.08893

work page doi:10.48550/arxiv.2203.08893 2022

[36] [36]

Self-alignment pretraining for biomedical entity representations

Fangyu Liu, Ehsan Shareghi, Zaiqiao Meng, Marco Basaldella, and Nigel Collier. Self-alignment pretraining for biomedical entity representations. In Kristina Toutanova, Anna Rumshisky, Luke Zettlemoyer, Dilek Hakkani-Tur, Iz Beltagy, Steven Bethard, Ryan Cotterell, Tanmoy Chakraborty, and Yichao Zhou, editors,Proceedings of the 2021 Conference of the North...

work page doi:10.18653/v1/2021.naacl-main 2021

[37] [37]

URL https://aclanthology.org/2021.naacl-main.334

work page 2021

[38] [38]

The role of nmda receptors in alzheimer’s disease.Frontiers in neuroscience, 13:43, 2019

Jinping Liu, Lirong Chang, Yizhi Song, Hui Li, and Yan Wu. The role of nmda receptors in alzheimer’s disease.Frontiers in neuroscience, 13:43, 2019

work page 2019

[39] [39]

Loinc, a universal standard for identifying laboratory observations: a 5-year update.Clinical chemistry, 49(4):624–633, 2003

Clement J McDonald, Stanley M Huff, Jeffrey G Suico, Gilbert Hill, Dennis Leavelle, Raymond Aller, Arden Forrey, Kathy Mercer, Georges DeMoor, John Hook, et al. Loinc, a universal standard for identifying laboratory observations: a 5-year update.Clinical chemistry, 49(4):624–633, 2003

work page 2003

[40] [40]

Distributed representa- tions of words and phrases and their compositionality.Adv Neural Inf Process Syst, 26:3111–3119, 2013

Tomas Mikolov, Ilya Sutskever, Kai Chen, Greg S Corrado, and Jeff Dean. Distributed representa- tions of words and phrases and their compositionality.Adv Neural Inf Process Syst, 26:3111–3119, 2013. 22

work page 2013

[41] [41]

Federated learning for heterogeneous electronic health records utilising augmented temporal graph attention networks

Soheila Molaei, Anshul Thakur, Ghazaleh Niknam, Andrew Soltan, Hadi Zare, and David A Clifton. Federated learning for heterogeneous electronic health records utilising augmented temporal graph attention networks. InInternational Conference on Artificial Intelligence and Statistics, pages 1342–

work page

[42] [42]

Omop, 2021

OMOP. Omop, 2021. URLhttps://ohdsi.org/omop/. Accessed: June, 2021

work page 2021

[43] [43]

International classification of diseases—ninth revision (icd-9)

World Health Organization et al. International classification of diseases—ninth revision (icd-9). Weekly Epidemiological Record= Relevé épidémiologique hebdomadaire, 63(45):343–344, 1988

work page 1988

[44] [44]

Glove: Global vectors for word representation

Jeffrey Pennington, Richard Socher, and Christopher D Manning. Glove: Global vectors for word representation. In Proceedings of the 2014 Conference on Empirical Methods in Natural Language Processing, pages 1532–1543, 2014

work page 2014

[45] [45]

Reina, Jason Martin, Sarthak Pati, Aikaterini Kotrotsou, Mikhail Milchenko, Weilin Xu, Daniel Marcus, Rivka Colen, and Spyridon Bakas

Micah Sheller, Brandon Edwards, G. Reina, Jason Martin, Sarthak Pati, Aikaterini Kotrotsou, Mikhail Milchenko, Weilin Xu, Daniel Marcus, Rivka Colen, and Spyridon Bakas. Federated learning in medicine: facilitating multi-institutional collaborations without sharing patient data.Scientific Reports, 10, 07 2020. doi: 10.1038/s41598-020-69250-1

work page doi:10.1038/s41598-020-69250-1 2020

[46] [46]

Biomegatron: larger biomedical domain language model

Hoo-Chang Shin, Yang Zhang, Evelina Bakhturina, Raul Puri, Mostofa Patwary, Mohammad Shoeybi, and Raghav Mani. Biomegatron: larger biomedical domain language model. In Pro- ceedings of the 2020 Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 4700–4706, 2020

work page 2020

[47] [47]

Suicide and prevalence of mental disorders: A systematic review and meta-analysis of world data on case-control psychological autopsy studies

Roshan Sutar, Akash Kumar, and Vikas Yadav. Suicide and prevalence of mental disorders: A systematic review and meta-analysis of world data on case-control psychological autopsy studies. Psychiatry research, page 115492, 2023

work page 2023

[48] [48]

Federated k-means clustering.arXiv preprint arXiv:2310.01195, 2024

Marcel Reinders Swier Garst. Federated k-means clustering.arXiv preprint arXiv:2310.01195, 2024

work page arXiv 2024

[49] [49]

Tariot, Martin R

Pierre N. Tariot, Martin R. Farlow, George T. Grossberg, Stephen M. Graham, Scott McDonald, Ivan Gergel, and for the Memantine Study Group. Memantine treatment in patients with moderate to severe alzheimer disease already receiving donepezila randomized controlled trial.JAMA, 291(3): 317–324, 01 2004. ISSN 0098-7484. doi: 10.1001/jama.291.3.317

work page doi:10.1001/jama.291.3.317 2004

[50] [50]

Graph attention networks

Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. InInternational Conference on Learning Representations, 2018. URL https://openreview.net/forum?id=rJXMpikCZ

work page 2018

[51] [51]

Risk and Trust Perceptions of the Public of Artifical Intelligence Applications

Ke Wang, Ning Chen, and Ting Chen. Joint medical ontology representation learning for healthcare predictions. In2020 International Joint Conference on Neural Networks (IJCNN), pages 1–7, 2020. doi: 10.1109/IJCNN48605.2020.9207355

work page doi:10.1109/ijcnn48605.2020.9207355 2020

[52] [52]

Stratification of alzheimer’s disease patients using knowledge-guided unsupervised latent factor clustering with electronic health record data

Linshanshan Wang, Shruthi Venkatesh, Michele Morris, Mengyan Li, Ratnam Srivastava, Shyam Visweswaran, Oscar Lopez, Zongqi Xia, and Tianxi Cai. Stratification of alzheimer’s disease patients using knowledge-guided unsupervised latent factor clustering with electronic health record data. medRxiv, 2024. doi: 10.1101/2024.12.23.24319588. URL https://www.medr...

work page doi:10.1101/2024.12.23.24319588 2024

[53] [53]

Multi-similarity loss with general pair weighting for deep metric learning

Xun Wang, Xintong Han, Weilin Huang, Dengke Dong, and Matthew R Scott. Multi-similarity loss with general pair weighting for deep metric learning. InProceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 5022–5030, 2019. 23

work page 2019

[54] [54]

Knowledge graph embedding by trans- lating on hyperplanes

Zhen Wang, Jianwen Zhang, Jianlin Feng, and Zheng Chen. Knowledge graph embedding by trans- lating on hyperplanes. InProceedings of the AAAI Conference on Artificial Intelligence, volume 28, 2014

work page 2014

[55] [55]

Mandl, Suchun Cheng, Zongqi Xia, Kelly Cho, J

Xin Xiong, Sara Morini Sweet, Molei Liu, Chuan Hong, Clara-Lea Bonzel, Vidul Ayakulangara Panickan, Doudou Zhou, Linshanshan Wang, Lauren Costa, Yuk-Lam Ho, Alon Geva, Kenneth D. Mandl, Suchun Cheng, Zongqi Xia, Kelly Cho, J. Michael Gaziano, Katherine P. Liao, Tianxi Cai, and Tianrun Cai. Knowledge-driven online multimodal automated phenotyping system.medRxiv,

work page

[56] [56]

URL https://www.medrxiv.org/content/early/2023/ 10/02/2023.09.29.23296239

doi: 10.1101/2023.09.29.23296239. URL https://www.medrxiv.org/content/early/2023/ 10/02/2023.09.29.23296239

work page doi:10.1101/2023.09.29.23296239 2023

[57] [57]

Kg-bert: Bert for knowledge graph completion.arXiv preprint arXiv:1909.03193, 2019

Liang Yao, Chengsheng Mao, and Yuan Luo. Kg-bert: Bert for knowledge graph completion.arXiv preprint arXiv:1909.03193, 2019

work page arXiv 1909

[58] [58]

Coder: Knowledge- infused cross-lingual medical term embedding for term normalization.Journal of Biomedical Infor- matics, 126:103983, 2022

Zheng Yuan, Zhengyun Zhao, Haixia Sun, Jiao Li, Fei Wang, and Sheng Yu. Coder: Knowledge- infused cross-lingual medical term embedding for term normalization.Journal of Biomedical Infor- matics, 126:103983, 2022

work page 2022

[59] [59]

Predictors for survival in patients with alzheimer’s disease: a large comprehensive meta-analysis.Translational Psychiatry, 14(1):184, 2024

Xiaoting Zheng, Shichan Wang, Jingxuan Huang, Chunyu Li, and Huifang Shang. Predictors for survival in patients with alzheimer’s disease: a large comprehensive meta-analysis.Translational Psychiatry, 14(1):184, 2024

work page 2024

[60] [60]

Panickan, Chuan Hong, Yuk-Lam Ho, Tianrun Cai, Lauren Costa, Xiaoou Li, Victor M

Doudou Zhou, Ziming Gan, Xu Shi, Alina Patwari, Everett Rush, Clara-Lea Bonzel, Vidul A. Panickan, Chuan Hong, Yuk-Lam Ho, Tianrun Cai, Lauren Costa, Xiaoou Li, Victor M. Castro, Shawn N. Murphy, Gabriel Brat, Griffin Weber, Paul Avillach, J. Michael Gaziano, Kelly Cho, Katherine P. Liao, Junwei Lu, and Tianxi Cai. Multiview incomplete knowledge graph int...

work page doi:10.1016/j.jbi.2022.104147 2022

[61] [61]

grandparent

Doudou Zhou, Yufeng Zhang, Aaron Sonabend-W, Zhaoran Wang, Junwei Lu, and Tianxi Cai. Federated offline reinforcement learning. Journal of the American Statistical Association, pages 1–12, 2024. 24 Supplementary Material Representation Learning to Advance Multi-Institutional Studies with Electronic Health Record Data S.1 Training and validation data base ...

work page 2024

[62] [62]

one-step training

In the similarity training step, we save the embedding with the highest code mapping accuracy, as detailed in Algorithm 2. In the relatedness training step, we save the embedding with the highest feature selection correlation, also detailed in Algorithm 2. When splitting the training and validation sets, we divide the similar hierarchical pairs according ...

work page arXiv 1977