UniD$^3$: A Knowledge Graph-Enhanced RAG Framework for Drug-Disease Discovery and Reasoning

Guangyu Wang; Jialu Liang; Jing Su; Minghao Zhou; Qianqian Song; Qing Wang; Sen Guo; Tianshi Liu

arXiv: 2606.01394 · v1 · pith:F2GEGQMH · submitted 2026-05-31 · 💻 cs.CL


Qing Wang, Tianshi Liu, Minghao Zhou, Jialu Liang, Sen Guo, Guangyu Wang, Jing Su, Qianqian Song

Pith reviewed 2026-06-28 16:57 UTC · model grok-4.3

classification 💻 cs.CL

keywords: drug-disease relationships, knowledge graph, retrieval-augmented generation, large language models, biomedical literature, drug discovery, drug repurposing, precision medicine


The pith

UniD³ combines LLMs with knowledge graph RAG to turn PubMed literature into structured drug-disease datasets.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces UniD³ as a framework that processes 157,849 PubMed articles with Llama 3.3-70B to extract drug-disease relations through a dual-stage approach of paper-level extraction followed by KG-level consolidation. This produces six knowledge graphs and large QA datasets for drug-disease matching, effectiveness assessment, and target analysis, which are then used with KG-RAG to generate outputs that are evaluated against external benchmarks and clinician review. A sympathetic reader would care because existing manual datasets are incomplete and pure LLM approaches hallucinate, so a method that grounds generation in consolidated graphs could support more reliable AI applications in drug discovery and repurposing.

Core claim

UniD³ processes 157,849 PubMed articles with Llama 3.3-70B and constructs knowledge graphs via a dual-stage strategy combining paper-level extraction with KG-level consolidation centered on drug and disease entities. These graphs support KG-RAG-based generation of structured datasets for Drug-Disease Matching, Drug Effectiveness Assessment, and Drug-Target Analysis, achieving F1 scores of 0.85-0.87 for DDM/DEA and 0.82 for DTA on external validation and an AUROC of 0.90 in clinician review, while KG-RAG models outperform standalone LLMs.
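The review does not say how the clinician-review AUROC is derived. One common setup, assumed here rather than taken from the paper, gives each reviewed item a model confidence score and a binary clinician verdict (1 = judged correct); AUROC is then the probability that a correct item is scored above an incorrect one. The function name `auroc` and its inputs are illustrative.

```python
from typing import Sequence


def auroc(scores: Sequence[float], labels: Sequence[int]) -> float:
    """AUROC as the normalized Mann-Whitney U statistic.

    Assumption: labels[i] is 1 if a clinician judged item i correct, else 0,
    and scores[i] is the model's confidence for that item.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative label")
    wins = 0.0
    # Compare every positive with every negative; ties count as half a win.
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))
```

Unlike F1, which depends on a fixed decision threshold, this quantity summarizes ranking quality across all thresholds, so the two reported numbers are not directly interchangeable.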

What carries the argument

The dual-stage paper-level extraction and KG-level consolidation centered on drug and disease entities, which feeds consolidated graphs into KG-RAG for evidence-grounded generation of structured outputs.
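The review names the two stages but not the consolidation rule. The sketch below is a minimal illustration under stated assumptions: Stage I emits one (drug, relation, disease) triple per paper together with its PubMed ID, and Stage II merges triples whose normalized names agree, keeping the IDs as provenance. `Triple`, `Edge`, `normalize`, `consolidate`, and the `min_support` filter are hypothetical names, not UniD³'s implementation.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Triple:
    """One relation extracted from a single paper (hypothetical Stage I output)."""
    drug: str
    relation: str
    disease: str
    pmid: str


@dataclass
class Edge:
    """A consolidated edge with all supporting papers (hypothetical Stage II output)."""
    drug: str
    relation: str
    disease: str
    pmids: set[str] = field(default_factory=set)


def normalize(name: str) -> str:
    # Hypothetical normalization: case-fold and collapse whitespace.
    # A real pipeline would more likely map names to ontology identifiers.
    return " ".join(name.lower().split())


def consolidate(triples: list[Triple], min_support: int = 1) -> list[Edge]:
    """Merge paper-level triples into graph-level edges keyed on normalized names."""
    merged: dict[tuple[str, str, str], Edge] = {}
    for t in triples:
        key = (normalize(t.drug), normalize(t.relation), normalize(t.disease))
        if key not in merged:
            merged[key] = Edge(*key)
        merged[key].pmids.add(t.pmid)  # keep provenance for later citation
    # Optionally drop edges supported by fewer than min_support papers.
    return [e for e in merged.values() if len(e.pmids) >= min_support]
```

Keeping every supporting PubMed ID on the merged edge is what would let a downstream KG-RAG answer cite its sources; raising `min_support` trades recall for robustness to single-paper extraction errors.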

If this is right

KG-RAG-augmented models outperform standalone LLMs on the drug-disease tasks.
The framework produces 28,915 DDM and 15,042 DEA QA pairs plus more than 4,000 DTA QA pairs.
The UniD³ chatbot enables interpretable, citation-supported exploration of drug-disease relationships.
The overall approach supplies a scalable method for converting unstructured literature into usable knowledge for repurposing and precision medicine.
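How UniD³ retrieves context from its graphs is not described in this review. As a rough illustration only, the sketch assumes the question's drug and disease entities are already identified, collects every edge that touches them (a 1-hop neighbourhood), and formats the hits as numbered facts with PubMed IDs so that an LLM's answer can cite them. `Fact`, `retrieve`, and `build_prompt` are hypothetical; published graph-RAG systems use considerably richer retrieval.

```python
from typing import Iterable, NamedTuple


class Fact(NamedTuple):
    """A graph edge with its source paper (hypothetical schema)."""
    head: str
    relation: str
    tail: str
    pmid: str


def retrieve(facts: Iterable[Fact], entities: set[str], limit: int = 10) -> list[Fact]:
    """Return up to `limit` edges whose head or tail is a query entity."""
    wanted = {e.lower() for e in entities}
    hits = [f for f in facts if f.head.lower() in wanted or f.tail.lower() in wanted]
    return hits[:limit]


def build_prompt(question: str, facts: list[Fact]) -> str:
    """Format retrieved facts as numbered, citable context for an LLM."""
    lines = [
        f"[{i}] {fact.head} {fact.relation} {fact.tail} (PMID {fact.pmid})"
        for i, fact in enumerate(facts, 1)
    ]
    context = "\n".join(lines) if lines else "(no matching facts)"
    return (
        "Answer using only the facts below and cite them by number.\n"
        f"Facts:\n{context}\n\nQuestion: {question}"
    )
```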

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same dual-stage KG construction could be applied to extract other biomedical relations such as gene-disease or protein-drug links.
Periodic re-running of the pipeline on new literature releases could maintain up-to-date graphs without full manual re-curation.
Hybrid use with existing curated databases might further reduce any residual extraction errors.

Load-bearing premise

The dual-stage extraction and consolidation with Llama 3.3-70B on 157,849 articles produces accurate entity relations without substantial hallucination or selection bias.

What would settle it

An independent manual review of a sample of the generated QA pairs or relations that finds agreement rates substantially below the reported F1 of 0.85-0.87 or clinician AUROC of 0.90.
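Comparing a manual-review agreement rate with the reported 0.85-0.87 requires accounting for sampling error in the reviewed subset. Treating agreement as the proportion of sampled items judged correct (an assumption; F1 is not itself a simple proportion), a Wilson score interval gives an approximate 95% range for the true rate. The function name and the sample counts used below are hypothetical.

```python
import math


def wilson_interval(correct: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion (z = 1.96 gives ~95%)."""
    if n <= 0:
        raise ValueError("n must be positive")
    p = correct / n
    denom = 1 + z**2 / n
    centre = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return centre - half, centre + half
```

For example, 150 of 200 sampled pairs judged correct gives an interval of roughly 0.69 to 0.80, whose upper end falls below 0.85; a result like that would count against the reported figures, whereas a point estimate alone from a small sample would not.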

Figures

Figures reproduced from arXiv: 2606.01394 by Guangyu Wang, Jialu Liang, Jing Su, Minghao Zhou, Qianqian Song, Qing Wang, Sen Guo, Tianshi Liu.

**Figure 2.** Research classification prompt used in UniD³. Example of the task-specific prompt provided to Llama 3.3-70B for classifying biomedical research articles into Drug-Disease Matching (DDM), Drug Effectiveness Assessment (DEA), and Drug-Target Analysis (DTA). The prompt instructs Llama 3.3-70B to act as a domain expert, defines the scope and criteria of each task, and specifies a standardized output forma…

**Figure 3.** Task-specific entity extraction prompts used in UniD³. Illustration of the prompt templates provided to Llama 3.3-70B for entity and relationship extraction in the three UniD³ tasks: Drug-Disease Matching (DDM), Drug Effectiveness Assessment (DEA), and Drug-Target Analysis (DTA). Each prompt defines the task objective, specifies relevant entity types, and instructs Llama 3.3-70B to extract task-specific e…
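The Figure 2 caption is cut off before it states the output format. Assuming, purely for illustration, that the model is asked to reply with JSON such as `{"tasks": ["DDM", "DTA"]}`, a small validator can reject malformed replies or labels outside the three task codes before articles move on to extraction. The field name `tasks` and the function `parse_classification` are hypothetical.

```python
import json

TASKS = {"DDM", "DEA", "DTA"}


def parse_classification(raw: str) -> set[str]:
    """Parse a hypothetical JSON reply like {"tasks": ["DDM", "DTA"]}.

    Returns the set of task codes; raises ValueError on malformed output
    or on labels outside DDM/DEA/DTA.
    """
    try:
        data = json.loads(raw)
    except json.JSONDecodeError as exc:
        raise ValueError(f"reply is not valid JSON: {exc}") from exc
    if not isinstance(data, dict):
        raise ValueError("reply must be a JSON object")
    labels = data.get("tasks", [])
    if not isinstance(labels, list):
        raise ValueError("'tasks' must be a list")
    found = {str(x).strip().upper() for x in labels}
    unknown = found - TASKS
    if unknown:
        raise ValueError(f"unexpected task labels: {sorted(unknown)}")
    return found
```

Failing loudly on unexpected labels, rather than silently dropping them, makes formatting drift visible, which matters when the same prompt is run over many thousands of articles.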

Original abstract

Systematic characterization of drug-disease relationships is essential for drug discovery and repurposing, yet is hindered by the heterogeneity and rapid growth of biomedical literature. Existing datasets rely on labor-intensive curation and are often incomplete, while LLM-only approaches suffer from hallucination and weak evidence grounding. We introduce UniD$^3$, a unified framework that integrates Large Language Models with Knowledge Graph-enhanced Retrieval-Augmented Generation (KG-RAG) to extract, organize, and validate drug-disease knowledge across Drug-Disease Matching (DDM), Drug Effectiveness Assessment (DEA), and Drug-Target Analysis (DTA). UniD$^3$ processes 157,849 PubMed articles with Llama 3.3-70B and constructs knowledge graphs via a dual-stage strategy combining paper-level extraction with KG-level consolidation centered on drug and disease entities. These graphs support KG-RAG-based generation of structured datasets, evaluated through external benchmarks, fuzzy matching with curated resources, and clinician review. UniD$^3$ produces six knowledge graphs and large-scale datasets, including 28,915 DDM, 15,042 DEA, and over 4,000 DTA QA pairs. External validation shows strong performance (F1: 0.85-0.87 for DDM/DEA; 0.82 for DTA), with clinician review confirming high reliability (AUROC = 0.90). KG-RAG-augmented models outperform standalone LLMs, and the UniD$^3$ chatbot enables interpretable, citation-supported exploration of drug-disease relationships. UniD$^3$ provides a scalable, extensible framework for transforming unstructured biomedical literature into high-quality, structured drug-disease knowledge, supporting AI-driven discovery, repurposing, and precision medicine.
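The abstract mentions fuzzy matching against curated resources but not the library, the matching unit, or the threshold. The sketch assumes generated and curated items are (drug, disease) name pairs, scores similarity with the standard library's `difflib.SequenceMatcher.ratio()`, and counts a pair as matched when both names reach a hypothetical threshold of 0.85; precision, recall, and F1 then follow from the matched counts. None of these choices is taken from the paper, and the function names are illustrative.

```python
from difflib import SequenceMatcher

Pair = tuple[str, str]  # (drug name, disease name)


def similar(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 1] from difflib's ratio()."""
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def pair_matches(pred: Pair, gold: Pair, threshold: float) -> bool:
    # Both the drug and the disease name must clear the threshold.
    return similar(pred[0], gold[0]) >= threshold and similar(pred[1], gold[1]) >= threshold


def fuzzy_prf(predicted: list[Pair], reference: list[Pair],
              threshold: float = 0.85) -> tuple[float, float, float]:
    """Precision, recall, and F1 under fuzzy pair matching."""
    # A predicted pair is a true positive if it matches any reference pair.
    tp_pred = sum(any(pair_matches(p, g, threshold) for g in reference) for p in predicted)
    # A reference pair is recovered if any predicted pair matches it.
    tp_ref = sum(any(pair_matches(p, g, threshold) for p in predicted) for g in reference)
    precision = tp_pred / len(predicted) if predicted else 0.0
    recall = tp_ref / len(reference) if reference else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return precision, recall, f1
```

A looser threshold absorbs more naming variants (such as "leukaemia" versus "leukemia") but also admits more false matches, so F1 values are only comparable under the same matching rule.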

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

UniD³ scales extraction from 157k papers into new drug-disease datasets with decent reported numbers, but the extraction accuracy itself lacks direct human validation.


The paper processes 157,849 PubMed articles with Llama 3.3-70B through paper-level extraction followed by KG-level consolidation, then applies KG-RAG to produce structured outputs: 28,915 DDM pairs, 15,042 DEA pairs, and over 4,000 DTA pairs, along with six knowledge graphs and a citation-backed chatbot.

It does produce dataset sizes not previously available at this literature volume, reports F1 scores of 0.85-0.87 for DDM and DEA (0.82 for DTA) on external validation and a clinician-review AUROC of 0.90, and shows KG-RAG models outperforming standalone LLMs.

The main limitation is in how the extraction quality is checked. The abstract relies on fuzzy matching to curated resources and clinician review, but gives no held-out human-annotated precision or recall for the relations pulled from the papers, no inter-annotator agreement, and no ablation that isolates hallucination rates before versus after consolidation. If systematic errors survive into the graphs, the downstream numbers may not reflect true knowledge quality.
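The letter asks for inter-annotator agreement on a human-annotated sample. For two annotators assigning categorical labels to the same extracted relations (for example "correct" versus "error", an illustrative label set), Cohen's kappa is one standard choice: it corrects raw agreement for the agreement expected by chance from each annotator's label frequencies.

```python
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(a: Sequence[Hashable], b: Sequence[Hashable]) -> float:
    """Cohen's kappa for two annotators labelling the same items in the same order."""
    if len(a) != len(b) or not a:
        raise ValueError("need two equal-length, non-empty label lists")
    n = len(a)
    observed = sum(x == y for x, y in zip(a, b)) / n
    ca, cb = Counter(a), Counter(b)
    # Chance agreement: product of each annotator's marginal label frequencies.
    expected = sum(ca[k] * cb[k] for k in ca.keys() | cb.keys()) / (n * n)
    if expected == 1.0:
        # Both annotators used one identical label throughout; kappa is
        # undefined, and by convention this returns perfect agreement.
        return 1.0
    return (observed - expected) / (1 - expected)
```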

This work is aimed at biomedical NLP groups and drug-repurposing teams that need large-scale structured literature data. The datasets could be practical if they ship with clear release details and provenance.

It has enough concrete scale, pipeline description, and benchmark numbers to warrant peer review, though the methods will need tighter evidence on the extraction step.

Referee Report

2 major / 2 minor

Summary. The paper introduces UniD³, a KG-RAG framework that applies Llama 3.3-70B to 157,849 PubMed articles via dual-stage paper-level extraction followed by KG-level consolidation around drug/disease entities. It constructs six knowledge graphs and generates large QA datasets (28,915 DDM, 15,042 DEA, >4,000 DTA pairs) for drug-disease matching, effectiveness assessment, and target analysis. External benchmarks report F1 scores of 0.85-0.87 (DDM/DEA) and 0.82 (DTA), clinician review yields AUROC 0.90, KG-RAG outperforms standalone LLMs, and a citation-supported chatbot is provided for exploration.

Significance. If the extraction and consolidation steps are shown to be accurate, the work supplies a scalable pipeline for converting unstructured literature into structured, evidence-linked drug-disease knowledge at a scale (157k articles) that existing curated resources cannot match. The release of the generated datasets and the interpretable KG-RAG chatbot constitute concrete, reusable contributions to biomedical NLP and drug-repurposing research.

major comments (2)

[Evaluation] Evaluation section: The reported F1 0.85-0.87 and clinician AUROC 0.90 rest on external benchmarks, fuzzy matching, and clinician review, yet no held-out human-annotated precision/recall, inter-annotator agreement, or error analysis is provided for the paper-level extraction step itself. Because the central claim is that the dual-stage process yields high-quality structured knowledge without substantial hallucination, the absence of an independent extraction-level validation is load-bearing.
[Methods] Methods (dual-stage KG construction): The paper asserts that KG-level consolidation improves accuracy over paper-level extraction alone, but supplies no ablation that quantifies hallucination or relation-error rates before versus after consolidation. Without this measurement it is impossible to determine whether the reported downstream metrics reflect true knowledge quality or undetected systematic errors in the 157k-article extraction.
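The requested ablation would compare relation-error rates before and after KG-level consolidation. Assuming two independently drawn, manually judged samples (one from paper-level output, one from consolidated output), a pooled two-proportion z-test is a simple way to check whether the difference exceeds sampling noise. If the same relations were judged in both conditions, a paired test such as McNemar's would fit better. The function name and the counts below are hypothetical.

```python
import math


def two_proportion_z(err_a: int, n_a: int, err_b: int, n_b: int) -> tuple[float, float]:
    """Two-sided pooled z-test for a difference between two error rates.

    Returns (z, p_value); a positive z means sample A has the higher error rate.
    """
    p_a, p_b = err_a / n_a, err_b / n_b
    pooled = (err_a + err_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    if se == 0:
        # Both samples are all-correct or all-error: no detectable difference.
        return 0.0, 1.0
    z = (p_a - p_b) / se
    # Two-sided p-value: 2 * (1 - Phi(|z|)) == erfc(|z| / sqrt(2)).
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return z, p_value
```

For instance, 40 errors in 200 paper-level relations versus 20 in 200 consolidated relations gives z of about 2.8 and a two-sided p below 0.01.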

minor comments (2)

The abstract states that six knowledge graphs are produced; the manuscript should include a table or section explicitly listing their node/edge counts, entity coverage, and construction parameters for reproducibility.
Clarify the exact prompt templates and few-shot examples used for the Llama 3.3-70B extraction and consolidation stages; the current description is high-level and would benefit from an appendix containing the prompts.
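The reproducibility table requested in the first minor comment could be generated directly from each graph. The sketch assumes a graph stored as a mapping from node ID to entity type plus a list of (head, relation, tail) triples, a hypothetical schema. It reports node counts per type, edge counts per relation, and edges whose endpoints are missing from the node table, a cheap consistency check worth reporting alongside the counts.

```python
from collections import Counter


def kg_summary(nodes: dict[str, str], edges: list[tuple[str, str, str]]) -> dict:
    """Summarize one knowledge graph for a reproducibility table.

    nodes maps node ID -> entity type; edges are (head, relation, tail) triples.
    """
    node_types = Counter(nodes.values())
    relations = Counter(rel for _, rel, _ in edges)
    # Edges pointing at IDs absent from the node table indicate a construction bug.
    dangling = sum(1 for head, _, tail in edges if head not in nodes or tail not in nodes)
    return {
        "num_nodes": len(nodes),
        "num_edges": len(edges),
        "nodes_by_type": dict(node_types),
        "edges_by_relation": dict(relations),
        "dangling_edges": dangling,
    }
```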

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on evaluation and methods. We respond point-by-point below.


Referee: [Evaluation] Evaluation section: The reported F1 0.85-0.87 and clinician AUROC 0.90 rest on external benchmarks, fuzzy matching, and clinician review, yet no held-out human-annotated precision/recall, inter-annotator agreement, or error analysis is provided for the paper-level extraction step itself. Because the central claim is that the dual-stage process yields high-quality structured knowledge without substantial hallucination, the absence of an independent extraction-level validation is load-bearing.

Authors: We agree that the current validations (external benchmarks, fuzzy matching to curated resources, and clinician review) do not isolate the paper-level extraction step with held-out human annotations or inter-annotator agreement. This is a substantive gap for the central claim. In revision we will add a dedicated extraction-level validation subsection reporting precision/recall on a held-out human-annotated sample, inter-annotator agreement, and error analysis. revision: yes
Referee: [Methods] Methods (dual-stage KG construction): The paper asserts that KG-level consolidation improves accuracy over paper-level extraction alone, but supplies no ablation that quantifies hallucination or relation-error rates before versus after consolidation. Without this measurement it is impossible to determine whether the reported downstream metrics reflect true knowledge quality or undetected systematic errors in the 157k-article extraction.

Authors: The manuscript describes the consolidation step but does not quantify its effect via ablation on hallucination or relation-error rates. We acknowledge this limits interpretation of the downstream metrics. In the revision we will add an ablation study on a representative subset measuring error rates before and after consolidation. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical system evaluation without derivations or self-referential reductions


The paper presents a descriptive framework for LLM-based extraction and KG construction from PubMed articles, followed by empirical evaluation via external benchmarks, fuzzy matching, and clinician review. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. Performance numbers (F1 0.85-0.87, AUROC 0.90) are reported outcomes of the described pipeline rather than quantities defined in terms of the pipeline's own outputs. The central claim therefore does not reduce to its inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review supplies no explicit free parameters, axioms, or invented entities; all claims rest on empirical processing and validation whose internal assumptions are not detailed.

pith-pipeline@v0.9.1-grok · 5877 in / 1325 out tokens · 35145 ms · 2026-06-28T16:57:17.617332+00:00 · methodology



[1] [1]

Data Collection DDM DEA DTA Llama 3 KG Knowledge Graph Refinement Stage I: Paper-level

[2] [2]

Ah, the PI3K pathway

Dual-Stage Extraction (UniD3 Core) Stage II: KG-level Drug Entity Expansion Targets Side Effect Gene Pathway Gene Pathway Gene Gene Side Effect Disease Entity b Structured Datasets & Explainable Chat Which pathway is involved in Disease B treatment with Drug A? Let's use our KG-RAG. Ah, the PI3K pathway. Based on the KG and 150K+ papers, the PI3K/AKT path...

2024

[3] [3]

effective

and applied them to perform task-specific classification and filtering. Using his approach, the Llama3.3-70B model initially identified 4,701, 4,642, and 4,638 papers for the three tasks, respectiv ely. However, due to potential hallucination effects in the LLM, some of the extracted titles were incomplete or inconsistent with the expected content. To add...

2079

[4] [4]

[VD] Context: Influence of baseline airflow obstruction on the patient's ability to detect any further increase in airway resistance

[5] [5]

[VD] Context: Effect of eosinophilic inflammation on the airway and its relation to perception of dyspnea

[6] [6]

[VD] Study findings on the effect of ICSs on PD during bronchoconstriction

[7] [7]

[KG] Asthma pathophysiology and treatment guidelines

[8] [8]

Short: yes Base Model Answer Reference Answer UniD3 Answer validated by clinicians and cross -referenced with external datasets

[KG] Role of inhaled corticosteroids in asthma management and their effects on airway inflammation and hyperresponsiveness." IS LATE-NIGHT SALIV ARY CORTISOL A BETTER SCREENING TEST FOR POSSIBLE CORTISOL EXCESS THAN STANDARD SCREENING TESTS IN OBESE PATIENTS WITH TYPE 2 DIABETES? Long: We have shown that eosinophilic inflammation of the airway wall may in...

[9] [9]

& Cheng, Y

Yang, J., Li, Z., Fan, X. & Cheng, Y. Drug–disease association and drug-repositioning predictions in complex diseases using causal inference –probabilistic matrix factorization. Journal of chemical information and modeling 54, 2562-2569 (2014)

2014

[10] [10]

& Woodcock, J

Corrigan-Curay, J., Sacks, L. & Woodcock, J. Real -world evidence and real -world data for evaluating drug safety and effectiveness. Jama 320, 867-868 (2018)

2018

[11] [11]

& MacKenzie, D

Mitchell, O., Wilson, D.B., Eggers, A. & MacKenzie, D. L. Assessing the effectiveness of drug courts on recidivism: A meta-analytic review of traditional and non -traditional drug courts. Journal of Criminal Justice 40, 60-71 (2012)

2012

[12] [12]

Liu, X. et al. DrugFormer: Graph‐Enhanced Language Model to Predict Drug Sensitivity. Advanced Science 11, 2405861 (2024)

2024

[13] [13]

Wang, Q. et al. scDrugMap: Benchmarking Large Founda tion Models for Drug Response Prediction. arXiv preprint arXiv:2505.05612 (2025)

work page arXiv 2025

[14] [14]

& Shi, Y

Duan, W., Yu, Y ., He, J. & Shi, Y . Adaptive Context Length Optimization with Low-Frequency Truncation for Multi-Agent Reinforcement Learning. arXiv preprint arXiv:2510.26389 (2025)

work page arXiv 2025

[15] [15]

-I., Cusick, M.E., Barabási, A

Yıldırım, M.A., Goh, K. -I., Cusick, M.E., Barabási, A. -L. & Vidal, M. Drug —target network. Nature biotechnology 25, 1119-1126 (2007)

2007

[16] [16]

Tanoli, Z. et al. Drug Target Commons 2.0: a communit y platform for systematic analysis of drug –target interaction profiles. Database 2018, bay083 (2018)

2018

[17] [17]

Ochoa, D. et al. Open Targets Platform: supporting sys tematic drug–target identification and prioritisation. Nucleic acids research 49, D1302-D1310 (2021)

2021

[18] [18]

Zhang, W. et al. Predicting drug-disease associations by using similarity constrained matrix factorization. BMC bioinformatics 19, 233 (2018)

2018

[19] [19]

-H., Luo, L., Chen, Q

Lai, P.-T., Wei, C. -H., Luo, L., Chen, Q. & Lu, Z. BioREx: improving biomedical r elation extraction by leveraging heterogeneous datasets. Journal of Biomedical Informatics 146, 104487 (2023)

2023

[20] [20]

& Wang, F

Zhu, Y ., Elemento, O., Pathak, J. & Wang, F. Drug knowle dge bases and their applications in biomedical informatics research. Briefings in bioinformatics 20, 1308-1321 (2019)

2019

[21] [21]

Davis, A.P. et al. Comparative Toxicogenomics Database: a knowledgebase and discovery tool for chemical – gene–disease networks. Nucleic acids research 37, D786-D792 (2009)

2009

[22] [22]

Brown, T. et al. Language models are few-shot learners. Advances in neural information processing systems 33, 1877-1901 (2020)

1901

[23] [23]

Touvron, H. et al. Llama: Open and efficient foundat ion language models. arXiv preprint arXiv:2302.13971 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[24] [24]

& van der Schaar, M

Seedat, N., Huynh, N., van Breugel, B. & van der Schaar, M. Curated llm: Synergy of llms and data curation for tabular augmentation in ultra low-data regimes. (2023)

2023

[25] [25]

Abdullin, Y ., Molla-Aliod, D., Ofoghi, B., Yearwood, J. & Li, Q. Synthetic dia logue dataset generation using llm agents. arXiv preprint arXiv:2401.17461 (2024)

work page arXiv 2024

[26] [26]

Silberg, J. et al. UniTox: leveraging LLMs to curate a unified dataset of drug-induced toxicity from FDA labels. Advances in Neural Information Processing Systems 37, 12078-12093 (2024)

2024

[27] [27]

Han, H. et al. Retrieval-augmented generation with graphs (graphrag). arXiv preprint arXiv:2501.00309 (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024

[28] [28]

& Zhang, R

Li, M., Kilicoglu, H., Xu, H. & Zhang, R. Biomedrag: A retr ieval augmented large language model for biomedicine. Journal of Biomedical Informatics 162, 104769 (2025)

[29]

Wu, J. et al. Medical graph rag: Towards safe medical large language model via graph retrieval-augmented generation. arXiv preprint arXiv:2408.04187 (2024)

[30]

Zhao, X., Liu, S., Yang, S.-Y. & Miao, C. in Proceedings of the ACM on Web Conference 2025 4442-4457 (2025)

[31]

Li, D. et al. DALK: Dynamic Co-Augmentation of LLMs and KG to answer Alzheimer's Disease Questions with Scientific Literature. arXiv preprint arXiv:2405.04819 (2024)

[32]

Matsumoto, N. et al. KRAGEN: a knowledge graph-enhanced RAG framework for biomedical problem solving using large language models. Bioinformatics 40, btae353 (2024)

[33]

Guo, Z., Xia, L., Yu, Y., Ao, T. & Huang, C. Lightrag: Simple and fast retrieval-augmented generation. arXiv preprint arXiv:2410.05779 (2024)

[34]

Fan, T., Wang, J., Ren, X. & Huang, C. Minirag: Towards extremely simple retrieval-augmented generation. arXiv preprint arXiv:2501.06713 (2025)

[35]

Edge, D. et al. From local to global: A graph rag approach to query-focused summarization. arXiv preprint arXiv:2404.16130 (2024)

[36]

Wang, S., Fang, Y., Zhou, Y., Liu, X. & Ma, Y. ArchRAG: Attributed Community-based Hierarchical Retrieval-Augmented Generation. arXiv preprint arXiv:2502.09891 (2025)

[37]

contributors, M.B.a. RapidFuzz: Rapid fuzzy string matching in Python and C++. (2024)

[38]

Zhou, Y. et al. in The eleventh international conference on learning representations (2022)

[39]

White, J. et al. A prompt pattern catalog to enhance prompt engineering with chatgpt. arXiv preprint arXiv:2302.11382 (2023)

[40]

Hurst, A. et al. Gpt-4o system card. arXiv preprint arXiv:2410.21276 (2024)

[41]

Anthropic (Anthropic, 2024)

[42]

Team, G. et al. Gemini: a family of highly capable multimodal models. arXiv preprint arXiv:2312.11805 (2023)

[43]

Dasari, S. & Tchounwou, P.B. Cisplatin in cancer therapy: molecular mechanisms of action. European journal of pharmacology 740, 364-378 (2014)

[44]

Ghosh, S. Cisplatin: The first metal based anticancer drug. Bioorganic chemistry 88, 102925 (2019)

[45]

Druker, B.J. et al. Five-year follow-up of patients receiving imatinib for chronic myeloid leukemia. New England Journal of Medicine 355, 2408-2417 (2006)

[46]

Slamon, D.J. et al. Use of chemotherapy plus a monoclonal antibody against HER2 for metastatic breast cancer that overexpresses HER2. New England journal of medicine 344, 783-792 (2001)

[47]

Llovet, J.M. et al. Sorafenib in advanced hepatocellular carcinoma. New England journal of medicine 359, 378-390 (2008)

[48]

Dixon, S.J. et al. Ferroptosis: an iron-dependent form of nonapoptotic cell death. cell 149, 1060-1072 (2012)

[49]

Singhal, A. et al. Metformin as adjunct antituberculosis therapy. Science translational medicine 6, 263ra159 (2014)

[50]

Lhoest, Q. et al. Datasets: A community library for natural language processing. arXiv preprint arXiv:2109.02846 (2021)

[51]

YufeiHFUT CDR_with_all_info. https://huggingface.co/datasets/YufeiHFUT/CDR_with_all_info (2024)

[52]

flxclxc encoded_drug_reviews. https://huggingface.co/datasets/flxclxc/encoded_drug_reviews (2024)

[53]

P3ps condition_to_drug. https://huggingface.co/datasets/P3ps/condition_to_drug (2024)

[54]

TrueHealth medicationqa. https://huggingface.co/datasets/truehealth/medicationqa (2024)

[55]

Pal, A. & Sankarasubbu, M. OpenBioLLMs: Advancing Open-Source Large Language Models for Healthcare and Life Sciences. Hugging Face repository (2024)

[56]

Labs, J.S. JSL-MedLlama-3-8B-v2.0. https://huggingface.co/johnsnowlabs/JSL-MedLlama-3-8B-v2.0 (2024)

[57]

skumar9 Llama-medx_v3.2. https://huggingface.co/skumar9/Llama-medx_v3.2 (2024)

[58]

Chen, J. et al. Huatuogpt-o1, towards medical complex reasoning with llms. arXiv preprint arXiv:2412.18925 (2024)

[59]

Luo, R. et al. BioGPT: generative pre-trained transformer for biomedical text generation and mining. Briefings in bioinformatics 23, bbac409 (2022)

[60]

Grattafiori, A. et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783 (2024)

[61]

AI, O.L.S. Open Life Science AI. https://huggingface.co/openlifescienceai (2024)

[62]

Tombros, A. & Sanderson, M. in Proceedings of the 21st annual international ACM SIGIR conference on Research and development in information retrieval 2-10 (1998)

[63]

Papineni, K., Roukos, S., Ward, T. & Zhu, W.-J. in Proceedings of the 40th annual meeting of the Association for Computational Linguistics 311-318 (2002)

[64]

Lin, C.-Y. in Text summarization branches out 74-81 (2004)

[65]

Zhang, T., Kishore, V., Wu, F., Weinberger, K.Q. & Artzi, Y. Bertscore: Evaluating text generation with bert. arXiv preprint arXiv:1904.09675 (2019)

[66]

Hancock, D.Y. et al. in Practice and Experience in Advanced Research Computing 2021: Evolution Across All Dimensions 1-8 (2021)

[67]

Boerner, T.J., Deems, S., Furlani, T.R., Knuth, S.L. & Towns, J. in Practice and experience in advanced research computing 2023: Computing for the common good 173-176 (2023)
