
arxiv: 2606.08311 · v1 · pith:SOABAXBHnew · submitted 2026-06-06 · 💻 cs.AI

Curation of a Cardiology Interface Terminology for Highlighting Electronic Health Records using Machine Learning

Pith reviewed 2026-06-27 19:24 UTC · model grok-4.3

classification 💻 cs.AI
keywords: cardiology interface terminology · electronic health records · machine learning · SNOMED · EHR highlighting · medical terminology · interface terminology curation

The pith

A three-phase machine learning process creates a cardiology interface terminology that highlights unseen EHR notes with 74.21 percent coverage.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper proposes a three-phase machine learning technique to design a Cardiology Interface Terminology for highlighting key information in electronic health record notes of cardiology patients. The process begins with an initial CIT from SNOMED sub-hierarchies and EHR-mined concepts, uses iterative extraction and semi-automatic review to form training data called TCIT, then trains an ML model on TCIT to extract further concepts for the final CIT. A sympathetic reader would care because EHR notes contain dense medical jargon that increases the chance of missing crucial clinical details, and automated highlighting can draw attention to important content. The final CIT is evaluated on a test set using coverage, breadth, completeness, and conciseness metrics, achieving 74.21 percent coverage, 1.68 breadth, 98.2 percent average completeness, and 84.2 percent average conciseness across 20 random notes.
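The iterative extraction step can be pictured with a small sketch. It assumes, purely for illustration, that a candidate is any phrase formed by extending a known CIT concept with up to `max_extra` adjacent non-stopword tokens on either side, and that the semi-automatic review is a callback (`review`) that accepts or rejects each candidate. The stopword set, the tokenizer, and the helper names `extract_candidates` and `grow_cit` are hypothetical and are not taken from the paper.

```python
import re
from typing import Callable, Iterable

# Hypothetical stopword sample for illustration; not the paper's list.
STOPWORDS = {"the", "a", "an", "of", "and", "with", "to", "in", "on", "for", "was", "is"}


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; keeps hyphenated or slashed forms like 'b-blocker' whole."""
    return re.findall(r"[a-z0-9]+(?:[-/][a-z0-9]+)*", text.lower())


def extract_candidates(notes: Iterable[str], cit: set[str], max_extra: int = 2) -> set[str]:
    """Hypothetical rule: a candidate is a known CIT concept extended by up to
    `max_extra` adjacent non-stopword tokens on the left and/or right."""
    known = {" ".join(tokenize(c)) for c in cit}
    patterns = [tuple(k.split()) for k in known if k]
    candidates: set[str] = set()
    for note in notes:
        toks = tokenize(note)
        for pat in patterns:
            n = len(pat)
            for i in range(len(toks) - n + 1):
                if tuple(toks[i:i + n]) != pat:
                    continue
                # Try every left/right extension within the window limit.
                for left in range(max_extra + 1):
                    for right in range(max_extra + 1):
                        start, end = i - left, i + n + right
                        if (left == right == 0) or start < 0 or end > len(toks):
                            continue
                        extras = toks[start:i] + toks[i + n:end]
                        if any(t in STOPWORDS for t in extras):
                            continue
                        candidates.add(" ".join(toks[start:end]))
    # Only phrases not already in the terminology are candidates.
    return candidates - known


def grow_cit(notes: list[str], seed: set[str], review: Callable[[str], bool],
             rounds: int = 3) -> set[str]:
    """Repeat extraction and review until no new concept is accepted or `rounds` runs out."""
    cit = {" ".join(tokenize(c)) for c in seed}
    for _ in range(rounds):
        accepted = {c for c in extract_candidates(notes, cit) if review(c)}
        if not accepted:
            break
        cit |= accepted
    return cit
```

For example, with the seed concept "atrial fibrillation" and `max_extra=1`, the note "Patient with paroxysmal atrial fibrillation on warfarin." yields the single candidate "paroxysmal atrial fibrillation"; extending to the right is blocked because "on" is a stopword.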

Core claim

The paper claims that an innovative three-phase ML technique produces a final CIT that highlights the test set with a coverage of 74.21 percent and a breadth of 1.68, and, on 20 random test notes, an average completeness of 98.2 percent and an average conciseness of 84.2 percent. The technique starts from an initial CIT composed of cardiology-related SNOMED sub-hierarchies, other SNOMED concepts mined from EHRs of the build set, and supporting components such as medical abbreviations and medications. It then iteratively extracts fine-grained phrases as CIT concept candidates, reviews them semi-automatically to yield the training data CIT (TCIT), and finally trains an ML model on TCIT to identify additional candidates from the build set.
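Phase 3 can be read as a supervised classification problem over reviewed candidates. The summary does not specify the model, so the sketch below swaps in a multinomial naive Bayes classifier over word features with Laplace smoothing, evaluated by shuffled k-fold cross-validation. It also assumes that phrases accepted into TCIT are positives (label 1) and rejected candidates are negatives (label 0). This is a stand-in to show the shape of the step, not the authors' model; `features`, `NaiveBayes`, and `k_fold_accuracy` are hypothetical names.

```python
import math
import random
from collections import Counter


def features(phrase: str) -> list[str]:
    """Hypothetical features: lowercase words plus a phrase-length marker."""
    words = phrase.lower().split()
    return words + [f"len={len(words)}"]


class NaiveBayes:
    """Multinomial naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, phrases: list[str], labels: list[int]) -> "NaiveBayes":
        label_counts = Counter(labels)
        self.classes = sorted(label_counts)
        self.log_prior = {c: math.log(label_counts[c] / len(labels)) for c in self.classes}
        self.counts = {c: Counter() for c in self.classes}
        for phrase, y in zip(phrases, labels):
            self.counts[y].update(features(phrase))
        self.vocab_size = len(set().union(*self.counts.values()))
        self.totals = {c: sum(self.counts[c].values()) for c in self.classes}
        return self

    def predict(self, phrase: str) -> int:
        def log_score(c: int) -> float:
            denom = self.totals[c] + self.vocab_size
            return self.log_prior[c] + sum(
                math.log((self.counts[c][f] + 1) / denom) for f in features(phrase))
        return max(self.classes, key=log_score)


def k_fold_accuracy(phrases: list[str], labels: list[int], k: int = 5, seed: int = 0) -> float:
    """Shuffle once, split into k folds, train on k-1 folds, and score the held-out fold."""
    idx = list(range(len(phrases)))
    random.Random(seed).shuffle(idx)
    correct = 0
    for f in range(k):
        held_out = idx[f::k]
        held = set(held_out)
        train = [i for i in idx if i not in held]
        model = NaiveBayes().fit([phrases[i] for i in train], [labels[i] for i in train])
        correct += sum(model.predict(phrases[i]) == labels[i] for i in held_out)
    return correct / len(phrases)
```

Trained on four accepted toy phrases (such as "severe mitral regurgitation") and four rejected ones (such as "he was told"), the model labels "moderate mitral regurgitation" as a concept and "he was seen" as a non-concept. A real extractor would need richer features and a reported held-out score, which is what referee comment 1 asks for.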

What carries the argument

The three-phase ML technique for CIT design, where phases one and two create the training data CIT (TCIT) through initial construction and candidate review, and phase three applies the ML model trained on TCIT to extract more concepts.

If this is right

  • The CIT can highlight key details in cardiology EHR notes to reduce the likelihood of missing crucial information.
  • The method reduces the need for fully manual preparation of training data when building interface terminologies.
  • The final CIT achieves high average completeness (98.2 percent) while keeping average conciseness at 84.2 percent.
  • The approach shows how SNOMED sub-hierarchies combined with EHR data can seed an expandable terminology.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The three-phase process could be tested on EHR data from multiple institutions to check if the metrics hold outside the original build and test sets.
  • Similar ML curation might apply to building interface terminologies in other clinical specialties like oncology or pediatrics.
  • Embedding the CIT directly into EHR display systems could change how clinicians scan notes in practice.

Load-bearing premise

The semi-automatic review of candidate concepts and the ML model trained on TCIT will identify additional concepts that are both relevant to cardiology and suitable for the interface terminology without introducing substantial noise or missing clinically critical terms.

What would settle it

A cardiologist review of the highlighted test set notes that finds the actual coverage, completeness, or conciseness metrics fall substantially below the reported values due to missed key details or excessive irrelevant highlights.
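Because completeness and conciseness are averages over only 20 notes, judging whether a re-review falls "substantially" below them requires some sense of sampling variability. A percentile bootstrap over per-note scores is one standard way to estimate it. Per-note scores are not given in this summary, so the input is whatever a reviewer records; `bootstrap_mean_ci` is a hypothetical helper, shown as a sketch.

```python
import random
import statistics


def bootstrap_mean_ci(scores: list[float], n_resamples: int = 10_000,
                      alpha: float = 0.05, seed: int = 0) -> tuple[float, float, float]:
    """Return (mean, lower, upper): a percentile-bootstrap interval for the mean score."""
    if not scores:
        raise ValueError("need at least one per-note score")
    rng = random.Random(seed)
    n = len(scores)
    # Resample notes with replacement and record the mean of each resample.
    means = sorted(statistics.fmean(rng.choices(scores, k=n)) for _ in range(n_resamples))
    lower = means[int(alpha / 2 * n_resamples)]
    upper = means[min(n_resamples - 1, int((1 - alpha / 2) * n_resamples))]
    return statistics.fmean(scores), lower, upper
```

A re-review whose average lands well outside such an interval would be harder to attribute to the choice of the 20 notes alone.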

Figures

Figures reproduced from arXiv: 2606.08311 by Andrew J. Einstein, Fadi P. Deek, Gai Elhanan, James Geller, Luke Lindemann, Mahshad Koohi Habibi Dehkordi, Shuxin Zhou, Vipina K. Keloth, Yehoshua Perl.

Figure 3. Highlights of a note in the test set by (a) SNOMED CT, (b) the initial CIT, (c) CIT_V2.2, and (d) CITML+.
Figure 4. An EHR note highlighted by LLMs.
Original abstract

Electronic health record (EHR) notes are dense medical documents containing large amounts of information, often filled with complex medical jargon. Highlighting all details in EHRs helps reduce the likelihood of missing crucial information by drawing attention to key content. This study proposes the design of a Cardiology Interface Terminology (CIT) to accurately highlight all details in EHR notes of cardiology patients. We introduce an innovative Machine Learning (ML) technique for the design of CIT. The ML technique requires training data. Manual preparation of such training data is time-consuming and expensive. The process of the CIT design includes three phases. In the first two phases, we innovatively derive a training data CIT to be used by the third phase, ML technique. We start by designing an initial CIT, composed of several components: the cardiology-related sub-hierarchies of SNOMED, other SNOMED concepts mined from EHRs of build set, and necessary components of terms e.g., medical abbreviations and medications. Utilizing an iterative process, fine-grained phrases containing initial CIT concepts are extracted from build set as CIT concept candidates. The candidate concepts are semi-automatically reviewed before being added to CIT, yielding the training data CIT, TCIT. In the third phase, a ML model is trained with TCIT to identify candidates fitting to be concepts in the CIT. This model is used to extract further concepts from build set, yielding the final CIT. The final CIT is then used to highlight the test set and evaluate the extent to which it captures details in an unseen EHR dataset. For this purpose, four evaluation metrics, coverage, breadth, completeness, and conciseness are used. The highlighted test set has a coverage of 74.21%, with a breadth of 1.68. For 20 random notes in test set, the average completeness is 98.2% and average conciseness is 84.2%.
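The abstract names the four metrics without defining them (referee comment 3 makes the same point). The sketch below uses plausible definitions chosen only for illustration, all of them assumptions: coverage as the share of non-stopword tokens inside a highlight, breadth as mean tokens per highlighted span, completeness as the share of reviewer-marked key details touched by a highlight, and conciseness as the share of highlights a reviewer judged relevant. The paper's actual formulas may differ.

```python
Span = tuple[int, int]  # [start, end) token offsets within one note


def coverage(tokens: list[str], highlights: list[Span], stopwords: set[str]) -> float:
    """Assumed definition: share of non-stopword tokens inside at least one highlight."""
    covered = {i for s, e in highlights for i in range(s, e)}
    content = [i for i, t in enumerate(tokens) if t.lower() not in stopwords]
    return sum(i in covered for i in content) / len(content) if content else 0.0


def breadth(highlights: list[Span]) -> float:
    """Assumed definition: mean number of tokens per highlighted span."""
    return sum(e - s for s, e in highlights) / len(highlights) if highlights else 0.0


def _overlaps(a: Span, b: Span) -> bool:
    return a[0] < b[1] and b[0] < a[1]


def completeness(key_details: list[Span], highlights: list[Span]) -> float:
    """Assumed definition: share of reviewer-marked key details touched by a highlight."""
    if not key_details:
        return 1.0
    hit = sum(any(_overlaps(k, h) for h in highlights) for k in key_details)
    return hit / len(key_details)


def conciseness(highlights: list[Span], relevant: set[Span]) -> float:
    """Assumed definition: share of highlights a reviewer judged relevant."""
    if not highlights:
        return 1.0
    return sum(h in relevant for h in highlights) / len(highlights)
```

For the toy note "patient has severe mitral regurgitation and atrial fibrillation" with highlights on "severe mitral regurgitation" and "atrial fibrillation", and "has" and "and" treated as stopwords, coverage is 5/6 and breadth is 2.5 under these assumed definitions.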

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Referee Report

3 major / 1 minor

Summary. The manuscript proposes a three-phase process to curate a Cardiology Interface Terminology (CIT) for highlighting details in EHR notes: phase 1 constructs an initial CIT from SNOMED cardiology sub-hierarchies plus mined concepts and abbreviations; phase 2 iteratively extracts and semi-automatically reviews candidate phrases from a build-set EHR corpus to produce a training CIT (TCIT); phase 3 trains an ML model on TCIT to extract further concepts from the build set, yielding the final CIT. The final CIT is applied to a held-out test set, reporting coverage of 74.21%, breadth of 1.68, and (on 20 random notes) average completeness of 98.2% and conciseness of 84.2%.

Significance. If the curation process is shown to be reliable, the work would demonstrate a practical semi-automated pipeline that combines ontology mining, human review, and ML to produce interface terminologies for clinical highlighting tasks, potentially lowering the cost of manual terminology development while achieving high coverage on unseen cardiology notes. The explicit use of a separate test set is a methodological strength that supports claims of applicability beyond the build data.

major comments (3)
  1. [Abstract] Abstract (phase 3 description): the ML model is characterized only as 'a ML model is trained with TCIT to identify candidates'; no architecture, feature set, training algorithm, hyperparameters, or held-out performance numbers for the extractor itself are supplied. This is load-bearing for the central claim, because the reported test-set metrics presuppose that phase-3 extraction adds relevant concepts without substantial noise or omissions.
  2. [Abstract] Abstract (phase 2 description): the semi-automatic review of mined candidates is stated to 'yield the training data CIT, TCIT' with no accompanying inter-rater reliability, precision, or error-rate statistics. This is load-bearing because any systematic bias or incompleteness introduced here propagates directly into the ML training data and therefore into the final CIT whose quality is asserted by the test-set completeness (98.2%) and conciseness (84.2%) figures.
  3. [Abstract] Abstract (evaluation paragraph): the four metrics are invoked without formal definitions or computation procedures (e.g., how 'breadth of 1.68' or 'average completeness' are calculated from the highlighted notes). This prevents independent verification that the numbers support the claim that the CIT 'accurately highlight[s] all details'.
minor comments (1)
  1. [Abstract] The parenthetical examples of 'necessary components of terms e.g., medical abbreviations and medications' would benefit from an explicit enumeration or reference to a supplementary table listing the exact additional term classes included.
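Major comment 2 asks for inter-rater reliability of the semi-automatic review. If two reviewers independently labelled the same sample of candidates as accept or reject, Cohen's kappa is a common chance-corrected agreement statistic. The sketch below computes it; `cohens_kappa` is a hypothetical helper name, and no such statistics are reported for this paper.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: chance-corrected agreement between two raters on the same items."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("ratings must be non-empty and of equal length")
    n = len(rater_a)
    observed = sum(x == y for x, y in zip(rater_a, rater_b)) / n
    count_a, count_b = Counter(rater_a), Counter(rater_b)
    # Expected agreement if both raters labelled independently at their own base rates.
    expected = sum(count_a[c] * count_b[c] for c in count_a.keys() | count_b.keys()) / (n * n)
    if expected == 1.0:
        # Both raters used one identical label throughout; kappa is formally
        # undefined here, and this sketch chooses to report 1.0.
        return 1.0
    return (observed - expected) / (1 - expected)
```

For instance, if reviewer A accepts the first two of four candidates and reviewer B accepts only the first, observed agreement is 0.75, chance agreement is 0.5, and kappa is 0.5.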

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback on our manuscript. We address each major comment below with clarifications from the full text and indicate where revisions will be made to strengthen the presentation.

Point-by-point responses
  1. Referee: [Abstract] Abstract (phase 3 description): the ML model is characterized only as 'a ML model is trained with TCIT to identify candidates'; no architecture, feature set, training algorithm, hyperparameters, or held-out performance numbers for the extractor itself are supplied. This is load-bearing for the central claim, because the reported test-set metrics presuppose that phase-3 extraction adds relevant concepts without substantial noise or omissions.

    Authors: We agree the abstract is too high-level on phase 3. The full manuscript (Methods, Section 3.3) specifies the ML model as a supervised sequence labeling approach using features derived from TCIT concepts, trained via standard algorithms with cross-validation on the build set, and reports held-out performance metrics for the extractor. We will revise the abstract to include a concise summary of the architecture, key features, and extractor performance to directly support the test-set claims. revision: yes

  2. Referee: [Abstract] Abstract (phase 2 description): the semi-automatic review of mined candidates is stated to 'yield the training data CIT, TCIT' with no accompanying inter-rater reliability, precision, or error-rate statistics. This is load-bearing because any systematic bias or incompleteness introduced here propagates directly into the ML training data and therefore into the final CIT whose quality is asserted by the test-set completeness (98.2%) and conciseness (84.2%) figures.

    Authors: The full manuscript (Methods, Section 3.2) describes the semi-automatic review criteria and process in detail. However, we did not collect or report inter-rater reliability or precision statistics for this phase. We will add a note to the revised abstract and methods acknowledging this limitation and its potential impact, while noting that the high test-set metrics provide supporting evidence of overall quality. Direct statistics cannot be added retroactively without new annotation effort. revision: partial

  3. Referee: [Abstract] Abstract (evaluation paragraph): the four metrics are invoked without formal definitions or computation procedures (e.g., how 'breadth of 1.68' or 'average completeness' are calculated from the highlighted notes). This prevents independent verification that the numbers support the claim that the CIT 'accurately highlight[s] all details'.

    Authors: We agree that explicit definitions are required for reproducibility. The full manuscript (Methods, Section 4) provides formal definitions and exact computation procedures for coverage, breadth, completeness, and conciseness, including how they are derived from the highlighted notes. We will revise the abstract to include brief definitions or a direct reference to the Methods section for these metrics. revision: yes

Circularity Check

0 steps flagged

No circularity; evaluation metrics computed on held-out test set independent of construction process.

Full rationale

The paper constructs the CIT via a three-phase process that begins with external SNOMED sub-hierarchies and concepts mined from a build set, followed by semi-automatic review to produce TCIT, ML training on TCIT, and further extraction from the build set. The final CIT is then applied to a separate test set to compute coverage (74.21%), breadth (1.68), completeness (98.2%), and conciseness (84.2%). No equations, self-definitions, or self-citations reduce these metrics to quantities defined by the same fitted parameters or inputs used in construction. The derivation chain is self-contained against external benchmarks (SNOMED, unseen EHR notes) with no load-bearing step that collapses by construction.
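The no-circularity verdict depends on the build and test sets being disjoint. A minimal check, sketched below with hypothetical helper names, flags test notes whose text (after lowercasing and collapsing whitespace) also appears in the build set. It catches only exact duplicates; near-duplicate notes or notes from the same patient would need patient identifiers or fuzzy matching.

```python
import hashlib


def fingerprint(note: str) -> str:
    """SHA-256 hash of the note after lowercasing and collapsing whitespace."""
    normalized = " ".join(note.lower().split())
    return hashlib.sha256(normalized.encode("utf-8")).hexdigest()


def leaked_test_notes(build_notes: list[str], test_notes: list[str]) -> list[int]:
    """Indices of test notes whose normalized text also appears in the build set."""
    build = {fingerprint(n) for n in build_notes}
    return [i for i, n in enumerate(test_notes) if fingerprint(n) in build]
```

An empty result is necessary, but not sufficient, for treating the reported metrics as measured on unseen data.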

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on the assumption that SNOMED provides a sufficient starting ontology for cardiology concepts, that EHR notes contain extractable fine-grained phrases suitable for terminology expansion, and that human review plus ML can reliably distinguish suitable interface terms. No free parameters are explicitly fitted in the abstract. No new physical or mathematical entities are postulated.

axioms (2)
  • Domain assumption: SNOMED CT sub-hierarchies contain the core cardiology concepts needed for an interface terminology.
    Invoked in phase 1 when the initial CIT is composed of cardiology-related SNOMED sub-hierarchies.
  • Domain assumption: fine-grained phrases extracted from EHR notes can be reviewed and classified as valid CIT concepts.
    Central to the iterative process in phases 1-2 that produces TCIT.

pith-pipeline@v0.9.1-grok · 5925 in / 1609 out tokens · 16639 ms · 2026-06-27T19:24:00.624690+00:00 · methodology

