RDMA: Cost Effective Agent-Driven Rare Disease Mining from Electronic Health Records

Adam Cross; Jimeng Sun; John Wu

arxiv: 2507.15867 · v2 · pith:KZ5K6C5Hnew · submitted 2025-07-14 · 💻 cs.LG · cs.AI· cs.CL· cs.MA

RDMA: Cost Effective Agent-Driven Rare Disease Mining from Electronic Health Records

John Wu , Adam Cross , Jimeng Sun This is my paper

Pith reviewed 2026-05-19 03:53 UTC · model grok-4.3

classification 💻 cs.LG cs.AIcs.CLcs.MA

keywords rare disease miningelectronic health recordsclinical notesagentic frameworkquantized language modelsontology groundingphenotype reasoning

0 comments

The pith

Agent tools let small quantized models extract rare diseases from noisy clinical notes without any task-specific training.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents RDMA as an agentic system that gives smaller language models access to tools for resolving abbreviations, reasoning about implicit phenotypes, and grounding findings to Orphanet and HPO ontologies. This setup targets the gap where more than half of rare-disease codes lack direct ICD mappings, so patient cases stay invisible in structured records. By operating directly on long, abbreviation-heavy notes, the approach avoids the need for fine-tuning or large annotated datasets that are usually required for clinical NLP tasks. The result is higher accuracy than both fine-tuned models and retrieval-augmented baselines across varied benchmarks, achieved with a quantized model that also lowers inference and hardware costs.

Core claim

RDMA shows that an agentic framework supplying abbreviation resolution, implicit phenotype reasoning, and ontology grounding tools allows a small quantized LLM to outperform fine-tuned and RAG baselines on rare-disease extraction from real-world clinical notes without task-specific training or large labeled data.

What carries the argument

The RDMA agentic framework, which equips the model with callable tools for abbreviation resolution, phenotype reasoning, and ontology grounding against Orphanet and HPO.

If this is right

Rare-disease populations become visible in existing EHR data without new annotation campaigns.
Expert review effort drops because uncertainty flags direct attention only to ambiguous cases.
Deployment can move to local standard hardware, removing the need to send protected health information to external cloud services.
The same agent pattern could scale to other sparsely coded conditions once the core tools are in place.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Hospitals could run the system nightly on existing note archives to generate candidate rare-disease lists for specialist review.
The uncertainty-flagging step might serve as a general template for reducing labeling cost in other clinical extraction tasks.
If the tool set generalizes, similar agent designs could address other under-coded medical domains such as social determinants or adverse-event detection.

Load-bearing premise

The provided tools are enough for a small quantized model to handle the noise and abbreviations in actual clinical notes reliably enough to beat trained baselines.

What would settle it

A held-out collection of real clinical notes containing many rare-disease mentions and heavy abbreviation use where the small quantized RDMA model fails to exceed the accuracy of a fine-tuned baseline.

read the original abstract

Rare diseases affect 1 in 10 Americans yet remain systematically underdocumented in clinical records. ICD-based systems cannot capture their breadth, over 50\% of Orphanet codes lack a direct ICD mapping and only 2.2\% of HPO codes have matching ICD codes, leaving patient populations invisible and delaying diagnosis. Mining unstructured clinical notes offers a direct path forward, but real notes are long, noisy, and abbreviation-dense, and limited annotations make fine-tuning infeasible, demanding approaches that generalize without task-specific training. We present Rare Disease Mining Agents (RDMA), an agentic framework equipping smaller quantized LLMs with tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding against Orphanet and HPO. RDMA substantially outperforms fine-tuned and RAG-based baselines across benchmarks with different data characteristics, without any task-specific training. A small quantized model achieves maximal performance, reducing inference costs by up to 10x and local hardware costs by up to 17x, enabling private deployment on standard hardware without cloud-based PHI exposure. RDMA's uncertainty-flagging mechanism further reduces expert annotation burden while preserving agreement quality, supporting scalable rare disease documentation in clinical practice. Available at https://github.com/jhnwu3/RDMA.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

RDMA gives small quantized LLMs a tool kit for rare-disease phenotype extraction from notes and claims big cost wins, but the abstract leaves the actual performance numbers and tool ablations unclear.

read the letter

RDMA equips small quantized models with tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding to Orphanet and HPO. The main claim is that this agentic setup beats fine-tuned and RAG baselines on benchmarks without any task-specific training, while cutting inference cost up to 10x and hardware cost up to 17x for local private deployment. Uncertainty flagging is added to lower annotation load. That combination is the concrete new piece: prior work on clinical note mining has not paired exactly these tools with quantized models for this rare-disease use case. The clinical motivation is solid; the under-mapping of Orphanet and HPO codes to ICD is a documented problem that affects real patients. The privacy angle for keeping PHI on-prem is also useful for anyone who cannot send notes to cloud APIs. The soft spot is the missing detail on numbers. The abstract says “substantially outperforms” but gives no F1 scores, no exact baseline code or hyperparameters, and no statistical tests. Without those, it is hard to know how large the gap really is or whether it holds on noisier held-out notes. The stress-test point about tool sufficiency is fair: if the tools do not fully compensate for reduced model capacity on abbreviation-heavy text, the cost claims would not transfer. The paper would be stronger with explicit ablations that remove one tool at a time and with error analysis on real clinical notes. This work is aimed at applied clinical NLP groups and health-system teams that need rare-disease documentation without large labeled sets or cloud exposure. A reader already running quantized models on EHR data would find the tool descriptions and deployment numbers directly usable. The problem is important enough and the approach practical enough that it deserves a serious referee to check the experiments and reproducibility. I would send it out for review rather than desk reject.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces RDMA, an agentic framework that equips smaller quantized LLMs with tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding against Orphanet and HPO to extract rare disease information from noisy, abbreviation-dense clinical notes in EHRs. The central claim is that RDMA substantially outperforms fine-tuned and RAG-based baselines across benchmarks with varying data characteristics, without any task-specific training; a small quantized model achieves peak performance, yielding up to 10x inference cost reduction and 17x local hardware cost reduction while enabling private on-premise deployment, and an uncertainty-flagging mechanism reduces expert annotation burden.

Significance. If the empirical results hold under scrutiny, the work could meaningfully advance scalable rare-disease documentation by demonstrating that tool-augmented small models can generalize to real clinical notes without fine-tuning or large annotated datasets, while addressing privacy and cost barriers. The emphasis on cost reductions and uncertainty flagging is clinically relevant. The significance is tempered by the current lack of detailed quantitative support and component ablations needed to confirm that the claimed gains are attributable to the proposed framework rather than untested assumptions about tool sufficiency.

major comments (2)

[Abstract and §4] Abstract and §4 (Experimental Results): The abstract asserts that RDMA 'substantially outperforms fine-tuned and RAG-based baselines across benchmarks' yet supplies no numeric metrics, exact baseline implementations, statistical significance tests, or error bars. This absence directly undermines evaluation of the headline performance and cost-reduction claims (10x inference, 17x hardware), which are load-bearing for the paper's contribution.
[§4.2–4.3] §4.2–4.3 (Ablations and Robustness): No ablation results are presented that remove or isolate individual tools (abbreviation resolution, implicit phenotype reasoning, ontology grounding) or that evaluate performance on progressively noisier held-out clinical notes. Without these controls it is impossible to determine whether the reported outperformance on real-world abbreviation-dense notes is driven by the agentic tool suite or by the base quantized LLM, which is the central assumption underlying the claim of training-free generalization and the associated cost savings.

minor comments (2)

[§3] §3 (Method): The description of how tool outputs are aggregated and passed back to the LLM could be clarified with a short pseudocode snippet or explicit state diagram to improve reproducibility.
[Table 1] Table 1 or equivalent benchmark table: Ensure all baselines are described with the exact model sizes, quantization levels, and prompting strategies used so that the 'no task-specific training' comparison is fully transparent.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback, which has helped us identify areas to strengthen the manuscript. We address each major comment below and have revised the paper to incorporate the suggested improvements where feasible.

read point-by-point responses

Referee: [Abstract and §4] Abstract and §4 (Experimental Results): The abstract asserts that RDMA 'substantially outperforms fine-tuned and RAG-based baselines across benchmarks' yet supplies no numeric metrics, exact baseline implementations, statistical significance tests, or error bars. This absence directly undermines evaluation of the headline performance and cost-reduction claims (10x inference, 17x hardware), which are load-bearing for the paper's contribution.

Authors: We agree that the abstract would benefit from explicit quantitative support to make the claims more immediately verifiable. In the revised manuscript, we will update the abstract to report key metrics such as F1-score gains over baselines, exact inference cost reductions (e.g., 10x), and hardware cost savings (e.g., 17x), while directing readers to the corresponding tables in §4. We will also expand §4 to fully specify baseline implementations (including model sizes, quantization levels, and RAG configurations), include statistical significance testing (e.g., McNemar’s test or paired t-tests with p-values), and add error bars or standard deviations across multiple runs for all primary results. revision: yes
Referee: [§4.2–4.3] §4.2–4.3 (Ablations and Robustness): No ablation results are presented that remove or isolate individual tools (abbreviation resolution, implicit phenotype reasoning, ontology grounding) or that evaluate performance on progressively noisier held-out clinical notes. Without these controls it is impossible to determine whether the reported outperformance on real-world abbreviation-dense notes is driven by the agentic tool suite or by the base quantized LLM, which is the central assumption underlying the claim of training-free generalization and the associated cost savings.

Authors: We concur that targeted ablations are important for isolating the contribution of the tool suite. Although the existing comparisons to fine-tuned and RAG baselines provide indirect evidence of the framework’s value, we will add a dedicated ablation study in the revised §4. This will include variants that disable one tool at a time (abbreviation resolution, implicit phenotype reasoning, and ontology grounding) while keeping the rest of the agent intact, reporting performance deltas on the same benchmarks. We will also introduce robustness experiments on progressively noisier held-out clinical note subsets (e.g., by synthetically increasing abbreviation density and noise levels) to directly test generalization under realistic EHR conditions. revision: yes

Circularity Check

0 steps flagged

No significant circularity; empirical claims rest on external benchmarks

full rationale

The paper presents an agentic framework (RDMA) that augments smaller quantized LLMs with tools for abbreviation resolution, phenotype reasoning, and ontology grounding, then reports empirical outperformance versus fine-tuned and RAG baselines on benchmarks with varying data characteristics. No equations, derivations, or load-bearing self-citations appear in the abstract or description that would reduce any claimed result to a fitted parameter or self-referential definition by construction. All central claims are tested against independent external baselines rather than internal fits, satisfying the self-contained-against-benchmarks criterion.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that tool-augmented smaller LLMs can perform implicit phenotype reasoning and ontology grounding on noisy clinical text without task-specific training data.

axioms (1)

domain assumption Smaller quantized LLMs equipped with abbreviation resolution, implicit phenotype reasoning, and ontology grounding tools can generalize to real clinical notes without task-specific fine-tuning.
This premise is required for the claim that no annotations are needed and that performance exceeds fine-tuned baselines.

pith-pipeline@v0.9.0 · 5756 in / 1269 out tokens · 40663 ms · 2026-05-19T03:53:12.168797+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

RDMA connects scattered clinical observations... tools for abbreviation resolution, implicit phenotype reasoning, and ontology grounding against Orphanet and HPO.
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

RDMA substantially outperforms fine-tuned and RAG-based baselines... without any task-specific training.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

54 extracted references · 54 canonical work pages · 3 internal anchors

[1]

Virginia Tech News

Virginia Tech: One in 10 Americans Is Living with a Rare Disease. Virginia Tech News. Accessed: 2025-04-02 (2025). news.vt.edu/articles/2025/02/research fralinbiomed rarediseaseday2025 0228.html

work page 2025
[2]

Value in Health 21(5), 501–507 (2018)

Auvin, S., Irwin, J., Abi-Aad, P., Battersby, A.: The problem of rarity: estimation of prevalence in rare disease. Value in Health 21(5), 501–507 (2018)

work page 2018
[3]

European Journal of Public Health 30(Supplement 5), 166–494 (2020)

Cavero-Carbonell, C., Rico, J., Garibay, L., Garc´ ıa-L´ opez, M., Guardiola- Vilarroig, S., Maceda-Rold´ an, L., Zurriaga, O.: From icd10 to orphacodes: paving the way towards improved identification systems for rare diseases. European Journal of Public Health 30(Supplement 5), 166–494 (2020)

work page 2020
[4]

JAMIA open 7(4), 118 (2024)

Tan, A.L., Gon¸ calves, R.S., Yuan, W., Brat, G.A., Gentleman, R., Kohane, I.S.: Implications of mappings between international classification of diseases clinical diagnosis codes and human phenotype ontology terms. JAMIA open 7(4), 118 (2024)

work page 2024
[5]

BMC Medical Informatics and Decision Making 23(1), 86 (2023)

Dong, H., Su´ arez-Paniagua, V., Zhang, H., Wang, M., Casey, A., Davidson, E., Chen, J., Alex, B., Whiteley, W., Wu, H.: Ontology-driven and weakly super- vised rare disease identification from clinical notes. BMC Medical Informatics and Decision Making 23(1), 86 (2023)

work page 2023
[6]

https://arxiv.org/abs/2308.06294

Yang, J., Liu, C., Deng, W., Wu, D., Weng, C., Zhou, Y., Wang, K.: Enhanc- ing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT (2023). https://arxiv.org/abs/2308.06294

work page arXiv 2023
[7]

BMC Medical Informatics and Decision Making 24(1), 289 (2024)

Wu, J., Dong, H., Li, Z., Wang, H., Li, R., Patra, A., Dai, C., Ali, W., Scordis, P., Wu, H.: A hybrid framework with large language models for rare disease phenotyping. BMC Medical Informatics and Decision Making 24(1), 289 (2024)

work page 2024
[8]

https://arxiv.org/abs/2402

Chen, X., Mao, X., Guo, Q., Wang, L., Zhang, S., Chen, T.: RareBench: Can LLMs Serve as Rare Diseases Specialists? (2024). https://arxiv.org/abs/2402. 06341

work page 2024
[9]

NPJ Digital Medicine 7(1), 20 (2024)

Savage, T., Nayak, A., Gallo, R., Rangan, E., Chen, J.H.: Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine. NPJ Digital Medicine 7(1), 20 (2024)

work page 2024
[10]

medRxiv, 2024–12 (2024)

Garcia, B.T., Westerfield, L., Yelemali, P., Gogate, N., Rivera-Munoz, E.A., Du, H., Dawood, M., Jolly, A., Lupski, J.R., Posey, J.E.: Improving automated deep phenotyping through large language models using retrieval augmented generation. medRxiv, 2024–12 (2024)

work page 2024
[11]

arXiv preprint arXiv:2405.12035 (2024) 32

Sanmartin, D.: Kg-rag: Bridging the gap between knowledge and creativity. arXiv preprint arXiv:2405.12035 (2024) 32

work page arXiv 2024
[12]

Retrieval-Augmented Generation for Large Language Models: A Survey

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., Wang, H.: Retrieval-Augmented Generation for Large Language Models: A Survey (2024). https://arxiv.org/abs/2312.10997

work page internal anchor Pith review Pith/arXiv arXiv 2024
[13]

: ¡? mode longauthoraffil?¿ the human phenotype ontology in 2024: phenotypes around the world

Gargano, M.A., Matentzoglu, N., Coleman, B., Addo-Lartey, E.B., Anagnos- topoulos, A.V., Anderton, J., Avillach, P., Bagley, A.M., Bakˇ stein, E., Balhoff, J.P., et al. : ¡? mode longauthoraffil?¿ the human phenotype ontology in 2024: phenotypes around the world. Nucleic acids research 52(D1), 1333–1346 (2024)

work page 2024
[14]

Nederlands tijdschrift voor geneeskunde 152(9), 518–519 (2008)

Weinreich, S.S., Mangon, R., Sikkens, J., Teeuw, M.E., Cornel, M.: Orphanet: a european database for rare diseases. Nederlands tijdschrift voor geneeskunde 152(9), 518–519 (2008)

work page 2008
[15]

Scientific data 3(1), 1–9 (2016)

Johnson, A.E., Pollard, T.J., Shen, L., Lehman, L.-w.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Anthony Celi, L., Mark, R.G.: Mimic-iii, a freely accessible critical care database. Scientific data 3(1), 1–9 (2016)

work page 2016
[16]

In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp

Edin, J., Junge, A., Havtorn, J.D., Borgholt, L., Maistro, M., Ruotsalo, T., Maaløe, L.: Automated medical coding on mimic-iii and mimic-iv: a critical review and replicability study. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2572–2582 (2023)

work page 2023
[17]

MIMIC-IV-Note: Deidentified free-text clinical notes.PhysioNet, 2023b

Johnson, A., et al.: MIMIC-IV-Note: Deidentified free-text clinical notes. Phy- sioNet (2023). https://doi.org/10.13026/1n74-ne17 . https://doi.org/10.13026/ 1n74-ne17

work page doi:10.13026/1n74-ne17 2023
[18]

Windows% 20Azure% 20HIPAA% 20Imple- mentation% 20Guidance

Ayad, M., Rodriguez, H., Squire, J.: Addressing hipaa security and privacy requirements in the microsoft cloud. Windows% 20Azure% 20HIPAA% 20Imple- mentation% 20Guidance. pdf (2011)

work page 2011
[19]

: Designing scalable and hipaa-compliant notification systems for healthcare: Leveraging cloud, microservices, and secure architectures

Keshetti, S., et al. : Designing scalable and hipaa-compliant notification systems for healthcare: Leveraging cloud, microservices, and secure architectures. In: International Journal for Research Publication and Seminar, vol. 16, pp. 154–173 (2025)

work page 2025
[20]

Chest 148(5), 1148–1155 (2015)

Grady, C.: Institutional review boards: Purpose and challenges. Chest 148(5), 1148–1155 (2015)

work page 2015
[21]

https://arxiv.org/abs/2411

Sun, Q., Wu, H., Zhang, X.S.: On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models (2024). https://arxiv.org/abs/2411. 07070

work page 2024
[22]

Bioinformatics 40(7), 406 (2024) 33

Groza, T., Gration, D., Baynam, G., Robinson, P.N.: Fasthpocr: pragmatic, fast, and accurate concept recognition using the human phenotype ontology. Bioinformatics 40(7), 406 (2024) 33

work page 2024
[23]

Journal of the American Medical Informatics Association 25(5), 530–537 (2018)

Wu, H., Toti, G., Morley, K.I., Ibrahim, Z.M., Folarin, A., Jackson, R., Kartoglu, I., Agrawal, A., Stringer, C., Gale, D., et al.: Semehr: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research. Journal of the American Medical Informatics Association 25(5), 530–537 (2018)

work page 2018
[24]

: A survey on large language model based autonomous agents

Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., et al. : A survey on large language model based autonomous agents. Frontiers of Computer Science 18(6), 186345 (2024)

work page 2024
[25]

In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Edin, J., Junge, A., Havtorn, J.D., Borgholt, L., Maistro, M., Ruotsalo, T., Maaløe, L.: Automated medical coding on mimic-iii and mimic-iv: A critical review and replicability study. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’23, pp. 2572–2582. ACM, ??? (2023). https://doi.o...

work page doi:10.1145/3539618.3591918 2023
[26]

https://salad.com/pricing

Pricing, S.: Pricing (2025). https://salad.com/pricing

work page 2025
[27]

https://www.hyperstack.cloud/gpu-pricing

Hyperstack: GPU Pricing (2025). https://www.hyperstack.cloud/gpu-pricing

work page 2025
[28]

https:// github.com/abhinand5/MedEmbed

Balachandran, A.: MedEmbed: Medical-Focused Embedding Models. https:// github.com/abhinand5/MedEmbed

work page
[29]

Natural Language Engineering, 1–28 (2023)

Rohanian, O., Nouriborji, M., Jauncey, H., Kouchaki, S., Nooralahzadeh, F., Clifton, L., Merson, L., Clifton, D.A., Group, I.C.C., et al.: Lightweight trans- formers for clinical natural language processing. Natural Language Engineering, 1–28 (2023)

work page 2023
[30]

Hugging Face (2024)

Ankit Pal, M.S.: OpenBioLLMs: Advancing Open-Source Large Language Models for Healthcare and Life Sciences. Hugging Face (2024)

work page 2024
[31]

The Llama 3 Herd of Models

Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Let- man, A., Mathur, A., Schelten, A., Vaughan, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., Rao, A., Zhang, A., Rodriguez, A., Gregerson, A., Spataru, A., Roziere, B., Biron, B., Tang, B., Chern, B., Cauchete...

work page internal anchor Pith review Pith/arXiv arXiv 2024
[32]

https://mistral.ai/news/mistral-small-3-1 Accessed 2025-04-27

AI, M.: Mistral Small 3.1. https://mistral.ai/news/mistral-small-3-1 Accessed 2025-04-27

work page 2025
[33]

https://www.newegg.com/p/ 3D5-000V-001R8 Accessed 2025-04-26

Newegg: Product 3D5-000V-001R8. https://www.newegg.com/p/ 3D5-000V-001R8 Accessed 2025-04-26

work page 2025
[34]

https://www.newegg.com/ velztorm-gaming-desktop-nvidia-rtx-a6000-intel-core-i9-13900k-32gb-ddr5-1tb-ssd-ace-i-black/ p/3D5-000W-134U1 Accessed 2025-04-26

Newegg: Velztorm Gaming Desktop with NVIDIA RTX A6000, Intel Core i9-13900K. https://www.newegg.com/ velztorm-gaming-desktop-nvidia-rtx-a6000-intel-core-i9-13900k-32gb-ddr5-1tb-ssd-ace-i-black/ p/3D5-000W-134U1 Accessed 2025-04-26

work page 2025
[35]

https://www.thinkmate.com/system/ gpx-xn4-21s3-4gpu Accessed 2025-04-26

Thinkmate: GPX XN4-21S3-4GPU. https://www.thinkmate.com/system/ gpx-xn4-21s3-4gpu Accessed 2025-04-26

work page 2025
[36]

BioMed Research International 2017(1), 8565739 (2017)

Lobo, M., Lamurias, A., Couto, F.M.: Identifying human phenotype terms by combining machine learning and validation rules. BioMed Research International 2017(1), 8565739 (2017)

work page 2017
[37]

arXiv preprint 36 arXiv:2003.07082 (2020)

Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D.: Stanza: A python natural language processing toolkit for many human languages. arXiv preprint 36 arXiv:2003.07082 (2020)

work page arXiv 2003
[38]

Scaling Laws for Neural Language Models

Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., Amodei, D.: Scaling Laws for Neural Language Models (2020). https://arxiv.org/abs/2001.08361

work page internal anchor Pith review Pith/arXiv arXiv 2020
[39]

Genome medicine 7, 1–14 (2015)

Wei, W.-Q., Denny, J.C.: Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome medicine 7, 1–14 (2015)

work page 2015
[40]

https://arxiv.org/abs/2412.12475

Chen, X., Jin, Y., Mao, X., Wang, L., Zhang, S., Chen, T.: RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team (2025). https://arxiv.org/abs/2412.12475

work page arXiv 2025
[41]

Orphanet Journal of Rare Diseases 20, 186 (2025)

Germain, D.P., Gruson, D., Malcles, M., Garcelon, N.: Applying artificial intelli- gence to rare diseases: a literature review highlighting lessons from fabry disease. Orphanet Journal of Rare Diseases 20, 186 (2025)

work page 2025
[42]

https://arxiv.org/abs/2108.01204

Mart´ ınez-deMiguel, C., Segura-Bedmar, I., Chac´ on-Solano, E., Guerrero-Aspizua, S.: The RareDis corpus: a corpus annotated with rare diseases, their signs and symptoms (2021). https://arxiv.org/abs/2108.01204

work page arXiv 2021
[43]

: Mimic-iv, a freely accessible electronic health record dataset

Johnson, A.E., Bulgarelli, L., Shen, L., Gayles, A., Shammout, A., Horng, S., Pollard, T.J., Hao, S., Moody, B., Gow, B., et al. : Mimic-iv, a freely accessible electronic health record dataset. Scientific data 10(1), 1 (2023)

work page 2023
[44]

NEJM AI 1(5), 2300040 (2024)

Soroush, A., Glicksberg, B.S., Zimlichman, E., Barash, Y., Freeman, R., Char- ney, A.W., Nadkarni, G.N., Klang, E.: Large language models are poor medical coders—benchmarking of medical code querying. NEJM AI 1(5), 2300040 (2024)

work page 2024
[45]

npj Digital Medicine 7(1), 16 (2024)

Wang, H., Gao, C., Dantona, C., Hull, B., Sun, J.: Drg-llama: tuning llama model to predict diagnosis-related group for hospitalized patients. npj Digital Medicine 7(1), 16 (2024)

work page 2024
[46]

: Orphacodes use for the coding of rare diseases: comparison of the accuracy and cross country comparability

Mazzucato, M., Pozza, L.V.D., Facchin, P., Angin, C., Agius, F., Cavero- Carbonell, C., Corrochano, V., Hanusova, K., Kirch, K., Lambert, D., et al. : Orphacodes use for the coding of rare diseases: comparison of the accuracy and cross country comparability. Orphanet Journal of Rare Diseases18(1), 267 (2023)

work page 2023
[47]

Journal of clinical epidemiology 65(9), 1026–1027 (2012)

Kodra, Y., Fantini, B., Taruscio, D.: Classification and codification of rare diseases. Journal of clinical epidemiology 65(9), 1026–1027 (2012)

work page 2012
[48]

In: Rogers, A., Boyd- Graber, J., Okazaki, N

Cheng, H., Jafari, R., Russell, A., Klopfer, R., Lu, E., Striner, B., Gormley, M.: MDACE: MIMIC documents annotated with code evidence. In: Rogers, A., Boyd- Graber, J., Okazaki, N. (eds.) Proceedings of the 61st Annual Meeting of the Asso- ciation for Computational Linguistics (Volume 1: Long Papers), pp. 7534–7550. Association for Computational Linguist...

work page doi:10.18653/v1/2023.acl-long.416 2023
[49]

arXiv preprint arXiv:2504.13861 (2025)

Sviridov, I., Miftakhova, A., Tereshchenko, A., Zubkova, G., Blinov, P., Savchenko, A.: 3mdbench: Medical multimodal multi-agent dialogue benchmark. arXiv preprint arXiv:2504.13861 (2025)

work page arXiv 2025
[50]

Schmidgall, R

Schmidgall, S., Ziaei, R., Harris, C., Reis, E., Jopling, J., Moor, M.: Agentclinic: a multimodal agent benchmark to evaluate ai in simulated clinical environments. arXiv preprint arXiv:2405.07960 (2024)

work page arXiv 2024
[51]

In: The Thirty-eight Confer- ence on Neural Information Processing Systems Datasets and Benchmarks Track (2024)

Wu, Z., Dadu, A., Nalls, M., Faghri, F., Sun, J.: Instruction tuning large language models to understand electronic health records. In: The Thirty-eight Confer- ence on Neural Information Processing Systems Datasets and Benchmarks Track (2024)

work page 2024
[52]

Advances in Neural Information Processing Systems 37, 140334–140365 (2024)

Xia, P., Chen, Z., Tian, J., Gong, Y., Hou, R., Xu, Y., Wu, Z., Fan, Z., Zhou, Y., Zhu, K., et al.: Cares: A comprehensive benchmark of trustworthiness in medical vision language models. Advances in Neural Information Processing Systems 37, 140334–140365 (2024)

work page 2024
[53]

Encyclopedia of library and information science, 369–378 (2002)

Nelson, S.J., Powell, T., Humphreys, B.: The unified medical language system (umls) project. Encyclopedia of library and information science, 369–378 (2002)

work page 2002
[54]

https://arxiv.org/abs/2401

Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazar´ e, P.-E., Lomeli, M., Hosseini, L., J´ egou, H.: The Faiss library (2025). https://arxiv.org/abs/2401. 08281 38

work page 2025

[1] [1]

Virginia Tech News

Virginia Tech: One in 10 Americans Is Living with a Rare Disease. Virginia Tech News. Accessed: 2025-04-02 (2025). news.vt.edu/articles/2025/02/research fralinbiomed rarediseaseday2025 0228.html

work page 2025

[2] [2]

Value in Health 21(5), 501–507 (2018)

Auvin, S., Irwin, J., Abi-Aad, P., Battersby, A.: The problem of rarity: estimation of prevalence in rare disease. Value in Health 21(5), 501–507 (2018)

work page 2018

[3] [3]

European Journal of Public Health 30(Supplement 5), 166–494 (2020)

Cavero-Carbonell, C., Rico, J., Garibay, L., Garc´ ıa-L´ opez, M., Guardiola- Vilarroig, S., Maceda-Rold´ an, L., Zurriaga, O.: From icd10 to orphacodes: paving the way towards improved identification systems for rare diseases. European Journal of Public Health 30(Supplement 5), 166–494 (2020)

work page 2020

[4] [4]

JAMIA open 7(4), 118 (2024)

Tan, A.L., Gon¸ calves, R.S., Yuan, W., Brat, G.A., Gentleman, R., Kohane, I.S.: Implications of mappings between international classification of diseases clinical diagnosis codes and human phenotype ontology terms. JAMIA open 7(4), 118 (2024)

work page 2024

[5] [5]

BMC Medical Informatics and Decision Making 23(1), 86 (2023)

Dong, H., Su´ arez-Paniagua, V., Zhang, H., Wang, M., Casey, A., Davidson, E., Chen, J., Alex, B., Whiteley, W., Wu, H.: Ontology-driven and weakly super- vised rare disease identification from clinical notes. BMC Medical Informatics and Decision Making 23(1), 86 (2023)

work page 2023

[6] [6]

https://arxiv.org/abs/2308.06294

Yang, J., Liu, C., Deng, W., Wu, D., Weng, C., Zhou, Y., Wang, K.: Enhanc- ing Phenotype Recognition in Clinical Notes Using Large Language Models: PhenoBCBERT and PhenoGPT (2023). https://arxiv.org/abs/2308.06294

work page arXiv 2023

[7] [7]

BMC Medical Informatics and Decision Making 24(1), 289 (2024)

Wu, J., Dong, H., Li, Z., Wang, H., Li, R., Patra, A., Dai, C., Ali, W., Scordis, P., Wu, H.: A hybrid framework with large language models for rare disease phenotyping. BMC Medical Informatics and Decision Making 24(1), 289 (2024)

work page 2024

[8] [8]

https://arxiv.org/abs/2402

Chen, X., Mao, X., Guo, Q., Wang, L., Zhang, S., Chen, T.: RareBench: Can LLMs Serve as Rare Diseases Specialists? (2024). https://arxiv.org/abs/2402. 06341

work page 2024

[9] [9]

NPJ Digital Medicine 7(1), 20 (2024)

Savage, T., Nayak, A., Gallo, R., Rangan, E., Chen, J.H.: Diagnostic reasoning prompts reveal the potential for large language model interpretability in medicine. NPJ Digital Medicine 7(1), 20 (2024)

work page 2024

[10] [10]

medRxiv, 2024–12 (2024)

Garcia, B.T., Westerfield, L., Yelemali, P., Gogate, N., Rivera-Munoz, E.A., Du, H., Dawood, M., Jolly, A., Lupski, J.R., Posey, J.E.: Improving automated deep phenotyping through large language models using retrieval augmented generation. medRxiv, 2024–12 (2024)

work page 2024

[11] [11]

arXiv preprint arXiv:2405.12035 (2024) 32

Sanmartin, D.: Kg-rag: Bridging the gap between knowledge and creativity. arXiv preprint arXiv:2405.12035 (2024) 32

work page arXiv 2024

[12] [12]

Retrieval-Augmented Generation for Large Language Models: A Survey

Gao, Y., Xiong, Y., Gao, X., Jia, K., Pan, J., Bi, Y., Dai, Y., Sun, J., Wang, M., Wang, H.: Retrieval-Augmented Generation for Large Language Models: A Survey (2024). https://arxiv.org/abs/2312.10997

work page internal anchor Pith review Pith/arXiv arXiv 2024

[13] [13]

: ¡? mode longauthoraffil?¿ the human phenotype ontology in 2024: phenotypes around the world

Gargano, M.A., Matentzoglu, N., Coleman, B., Addo-Lartey, E.B., Anagnos- topoulos, A.V., Anderton, J., Avillach, P., Bagley, A.M., Bakˇ stein, E., Balhoff, J.P., et al. : ¡? mode longauthoraffil?¿ the human phenotype ontology in 2024: phenotypes around the world. Nucleic acids research 52(D1), 1333–1346 (2024)

work page 2024

[14] [14]

Nederlands tijdschrift voor geneeskunde 152(9), 518–519 (2008)

Weinreich, S.S., Mangon, R., Sikkens, J., Teeuw, M.E., Cornel, M.: Orphanet: a european database for rare diseases. Nederlands tijdschrift voor geneeskunde 152(9), 518–519 (2008)

work page 2008

[15] [15]

Scientific data 3(1), 1–9 (2016)

Johnson, A.E., Pollard, T.J., Shen, L., Lehman, L.-w.H., Feng, M., Ghassemi, M., Moody, B., Szolovits, P., Anthony Celi, L., Mark, R.G.: Mimic-iii, a freely accessible critical care database. Scientific data 3(1), 1–9 (2016)

work page 2016

[16] [16]

In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp

Edin, J., Junge, A., Havtorn, J.D., Borgholt, L., Maistro, M., Ruotsalo, T., Maaløe, L.: Automated medical coding on mimic-iii and mimic-iv: a critical review and replicability study. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 2572–2582 (2023)

work page 2023

[17] [17]

MIMIC-IV-Note: Deidentified free-text clinical notes.PhysioNet, 2023b

Johnson, A., et al.: MIMIC-IV-Note: Deidentified free-text clinical notes. Phy- sioNet (2023). https://doi.org/10.13026/1n74-ne17 . https://doi.org/10.13026/ 1n74-ne17

work page doi:10.13026/1n74-ne17 2023

[18] [18]

Windows% 20Azure% 20HIPAA% 20Imple- mentation% 20Guidance

Ayad, M., Rodriguez, H., Squire, J.: Addressing hipaa security and privacy requirements in the microsoft cloud. Windows% 20Azure% 20HIPAA% 20Imple- mentation% 20Guidance. pdf (2011)

work page 2011

[19] [19]

: Designing scalable and hipaa-compliant notification systems for healthcare: Leveraging cloud, microservices, and secure architectures

Keshetti, S., et al. : Designing scalable and hipaa-compliant notification systems for healthcare: Leveraging cloud, microservices, and secure architectures. In: International Journal for Research Publication and Seminar, vol. 16, pp. 154–173 (2025)

work page 2025

[20] [20]

Chest 148(5), 1148–1155 (2015)

Grady, C.: Institutional review boards: Purpose and challenges. Chest 148(5), 1148–1155 (2015)

work page 2015

[21] [21]

https://arxiv.org/abs/2411

Sun, Q., Wu, H., Zhang, X.S.: On Active Privacy Auditing in Supervised Fine-tuning for White-Box Language Models (2024). https://arxiv.org/abs/2411. 07070

work page 2024

[22] [22]

Bioinformatics 40(7), 406 (2024) 33

Groza, T., Gration, D., Baynam, G., Robinson, P.N.: Fasthpocr: pragmatic, fast, and accurate concept recognition using the human phenotype ontology. Bioinformatics 40(7), 406 (2024) 33

work page 2024

[23] [23]

Journal of the American Medical Informatics Association 25(5), 530–537 (2018)

Wu, H., Toti, G., Morley, K.I., Ibrahim, Z.M., Folarin, A., Jackson, R., Kartoglu, I., Agrawal, A., Stringer, C., Gale, D., et al.: Semehr: A general-purpose semantic search system to surface semantic data from clinical notes for tailored care, trial recruitment, and clinical research. Journal of the American Medical Informatics Association 25(5), 530–537 (2018)

work page 2018

[24] [24]

: A survey on large language model based autonomous agents

Wang, L., Ma, C., Feng, X., Zhang, Z., Yang, H., Zhang, J., Chen, Z., Tang, J., Chen, X., Lin, Y., et al. : A survey on large language model based autonomous agents. Frontiers of Computer Science 18(6), 186345 (2024)

work page 2024

[25] [25]

In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval

Edin, J., Junge, A., Havtorn, J.D., Borgholt, L., Maistro, M., Ruotsalo, T., Maaløe, L.: Automated medical coding on mimic-iii and mimic-iv: A critical review and replicability study. In: Proceedings of the 46th International ACM SIGIR Conference on Research and Development in Information Retrieval. SIGIR ’23, pp. 2572–2582. ACM, ??? (2023). https://doi.o...

work page doi:10.1145/3539618.3591918 2023

[26] [26]

https://salad.com/pricing

Pricing, S.: Pricing (2025). https://salad.com/pricing

work page 2025

[27] [27]

https://www.hyperstack.cloud/gpu-pricing

Hyperstack: GPU Pricing (2025). https://www.hyperstack.cloud/gpu-pricing

work page 2025

[28] [28]

https:// github.com/abhinand5/MedEmbed

Balachandran, A.: MedEmbed: Medical-Focused Embedding Models. https:// github.com/abhinand5/MedEmbed

work page

[29] [29]

Natural Language Engineering, 1–28 (2023)

Rohanian, O., Nouriborji, M., Jauncey, H., Kouchaki, S., Nooralahzadeh, F., Clifton, L., Merson, L., Clifton, D.A., Group, I.C.C., et al.: Lightweight trans- formers for clinical natural language processing. Natural Language Engineering, 1–28 (2023)

work page 2023

[30] [30]

Hugging Face (2024)

Ankit Pal, M.S.: OpenBioLLMs: Advancing Open-Source Large Language Models for Healthcare and Life Sciences. Hugging Face (2024)

work page 2024

[31] [31]

The Llama 3 Herd of Models

Grattafiori, A., Dubey, A., Jauhri, A., Pandey, A., Kadian, A., Al-Dahle, A., Let- man, A., Mathur, A., Schelten, A., Vaughan, A., Yang, A., Fan, A., Goyal, A., Hartshorn, A., Yang, A., Mitra, A., Sravankumar, A., Korenev, A., Hinsvark, A., Rao, A., Zhang, A., Rodriguez, A., Gregerson, A., Spataru, A., Roziere, B., Biron, B., Tang, B., Chern, B., Cauchete...

work page internal anchor Pith review Pith/arXiv arXiv 2024

[32] [32]

https://mistral.ai/news/mistral-small-3-1 Accessed 2025-04-27

AI, M.: Mistral Small 3.1. https://mistral.ai/news/mistral-small-3-1 Accessed 2025-04-27

work page 2025

[33] [33]

https://www.newegg.com/p/ 3D5-000V-001R8 Accessed 2025-04-26

Newegg: Product 3D5-000V-001R8. https://www.newegg.com/p/ 3D5-000V-001R8 Accessed 2025-04-26

work page 2025

[34] [34]

https://www.newegg.com/ velztorm-gaming-desktop-nvidia-rtx-a6000-intel-core-i9-13900k-32gb-ddr5-1tb-ssd-ace-i-black/ p/3D5-000W-134U1 Accessed 2025-04-26

Newegg: Velztorm Gaming Desktop with NVIDIA RTX A6000, Intel Core i9-13900K. https://www.newegg.com/ velztorm-gaming-desktop-nvidia-rtx-a6000-intel-core-i9-13900k-32gb-ddr5-1tb-ssd-ace-i-black/ p/3D5-000W-134U1 Accessed 2025-04-26

work page 2025

[35] [35]

https://www.thinkmate.com/system/ gpx-xn4-21s3-4gpu Accessed 2025-04-26

Thinkmate: GPX XN4-21S3-4GPU. https://www.thinkmate.com/system/ gpx-xn4-21s3-4gpu Accessed 2025-04-26

work page 2025

[36] [36]

BioMed Research International 2017(1), 8565739 (2017)

Lobo, M., Lamurias, A., Couto, F.M.: Identifying human phenotype terms by combining machine learning and validation rules. BioMed Research International 2017(1), 8565739 (2017)

work page 2017

[37] [37]

arXiv preprint 36 arXiv:2003.07082 (2020)

Qi, P., Zhang, Y., Zhang, Y., Bolton, J., Manning, C.D.: Stanza: A python natural language processing toolkit for many human languages. arXiv preprint 36 arXiv:2003.07082 (2020)

work page arXiv 2003

[38] [38]

Scaling Laws for Neural Language Models

Kaplan, J., McCandlish, S., Henighan, T., Brown, T.B., Chess, B., Child, R., Gray, S., Radford, A., Wu, J., Amodei, D.: Scaling Laws for Neural Language Models (2020). https://arxiv.org/abs/2001.08361

work page internal anchor Pith review Pith/arXiv arXiv 2020

[39] [39]

Genome medicine 7, 1–14 (2015)

Wei, W.-Q., Denny, J.C.: Extracting research-quality phenotypes from electronic health records to support precision medicine. Genome medicine 7, 1–14 (2015)

work page 2015

[40] [40]

https://arxiv.org/abs/2412.12475

Chen, X., Jin, Y., Mao, X., Wang, L., Zhang, S., Chen, T.: RareAgents: Advancing Rare Disease Care through LLM-Empowered Multi-disciplinary Team (2025). https://arxiv.org/abs/2412.12475

work page arXiv 2025

[41] [41]

Orphanet Journal of Rare Diseases 20, 186 (2025)

Germain, D.P., Gruson, D., Malcles, M., Garcelon, N.: Applying artificial intelli- gence to rare diseases: a literature review highlighting lessons from fabry disease. Orphanet Journal of Rare Diseases 20, 186 (2025)

work page 2025

[42] [42]

https://arxiv.org/abs/2108.01204

Mart´ ınez-deMiguel, C., Segura-Bedmar, I., Chac´ on-Solano, E., Guerrero-Aspizua, S.: The RareDis corpus: a corpus annotated with rare diseases, their signs and symptoms (2021). https://arxiv.org/abs/2108.01204

work page arXiv 2021

[43] [43]

: Mimic-iv, a freely accessible electronic health record dataset

Johnson, A.E., Bulgarelli, L., Shen, L., Gayles, A., Shammout, A., Horng, S., Pollard, T.J., Hao, S., Moody, B., Gow, B., et al. : Mimic-iv, a freely accessible electronic health record dataset. Scientific data 10(1), 1 (2023)

work page 2023

[44] [44]

NEJM AI 1(5), 2300040 (2024)

Soroush, A., Glicksberg, B.S., Zimlichman, E., Barash, Y., Freeman, R., Char- ney, A.W., Nadkarni, G.N., Klang, E.: Large language models are poor medical coders—benchmarking of medical code querying. NEJM AI 1(5), 2300040 (2024)

work page 2024

[45] [45]

npj Digital Medicine 7(1), 16 (2024)

Wang, H., Gao, C., Dantona, C., Hull, B., Sun, J.: Drg-llama: tuning llama model to predict diagnosis-related group for hospitalized patients. npj Digital Medicine 7(1), 16 (2024)

work page 2024

[46] [46]

: Orphacodes use for the coding of rare diseases: comparison of the accuracy and cross country comparability

Mazzucato, M., Pozza, L.V.D., Facchin, P., Angin, C., Agius, F., Cavero- Carbonell, C., Corrochano, V., Hanusova, K., Kirch, K., Lambert, D., et al. : Orphacodes use for the coding of rare diseases: comparison of the accuracy and cross country comparability. Orphanet Journal of Rare Diseases18(1), 267 (2023)

work page 2023

[47] [47]

Journal of clinical epidemiology 65(9), 1026–1027 (2012)

Kodra, Y., Fantini, B., Taruscio, D.: Classification and codification of rare diseases. Journal of clinical epidemiology 65(9), 1026–1027 (2012)

work page 2012

[48] [48]

In: Rogers, A., Boyd- Graber, J., Okazaki, N

Cheng, H., Jafari, R., Russell, A., Klopfer, R., Lu, E., Striner, B., Gormley, M.: MDACE: MIMIC documents annotated with code evidence. In: Rogers, A., Boyd- Graber, J., Okazaki, N. (eds.) Proceedings of the 61st Annual Meeting of the Asso- ciation for Computational Linguistics (Volume 1: Long Papers), pp. 7534–7550. Association for Computational Linguist...

work page doi:10.18653/v1/2023.acl-long.416 2023

[49] [49]

arXiv preprint arXiv:2504.13861 (2025)

Sviridov, I., Miftakhova, A., Tereshchenko, A., Zubkova, G., Blinov, P., Savchenko, A.: 3mdbench: Medical multimodal multi-agent dialogue benchmark. arXiv preprint arXiv:2504.13861 (2025)

work page arXiv 2025

[50] [50]

Schmidgall, R

Schmidgall, S., Ziaei, R., Harris, C., Reis, E., Jopling, J., Moor, M.: Agentclinic: a multimodal agent benchmark to evaluate ai in simulated clinical environments. arXiv preprint arXiv:2405.07960 (2024)

work page arXiv 2024

[51] [51]

In: The Thirty-eight Confer- ence on Neural Information Processing Systems Datasets and Benchmarks Track (2024)

Wu, Z., Dadu, A., Nalls, M., Faghri, F., Sun, J.: Instruction tuning large language models to understand electronic health records. In: The Thirty-eight Confer- ence on Neural Information Processing Systems Datasets and Benchmarks Track (2024)

work page 2024

[52] [52]

Advances in Neural Information Processing Systems 37, 140334–140365 (2024)

Xia, P., Chen, Z., Tian, J., Gong, Y., Hou, R., Xu, Y., Wu, Z., Fan, Z., Zhou, Y., Zhu, K., et al.: Cares: A comprehensive benchmark of trustworthiness in medical vision language models. Advances in Neural Information Processing Systems 37, 140334–140365 (2024)

work page 2024

[53] [53]

Encyclopedia of library and information science, 369–378 (2002)

Nelson, S.J., Powell, T., Humphreys, B.: The unified medical language system (umls) project. Encyclopedia of library and information science, 369–378 (2002)

work page 2002

[54] [54]

https://arxiv.org/abs/2401

Douze, M., Guzhva, A., Deng, C., Johnson, J., Szilvasy, G., Mazar´ e, P.-E., Lomeli, M., Hosseini, L., J´ egou, H.: The Faiss library (2025). https://arxiv.org/abs/2401. 08281 38

work page 2025