Continual pretraining on UMLS-derived text improves BERT on BLURB biomedical tasks while GraphRAG boosts LLaMA 3-8B accuracy by over 3 points on PubMedQA and 5 on BioASQ without retraining.
Injecting Structured Biomedical Knowledge into Language Models: Continual Pretraining vs. GraphRAG
abstract
The injection of domain-specific knowledge is crucial for adapting language models (LMs) to specialized fields such as biomedicine. While most current approaches rely on unstructured text corpora, this study explores two complementary strategies for leveraging structured knowledge from the UMLS Metathesaurus: (i) continual pretraining, which embeds knowledge into model parameters, and (ii) Graph Retrieval-Augmented Generation (GraphRAG), which consults a knowledge graph at inference time. We first construct a large-scale biomedical knowledge graph from UMLS (3.4 million concepts and 34.2 million relations), stored in Neo4j for efficient querying. We then derive a ~100-million-token textual corpus from this graph to continually pretrain two models: BERTUMLS (from BERT) and BioBERTUMLS (from BioBERT). We evaluate these models on six BLURB (Biomedical Language Understanding and Reasoning Benchmark) datasets spanning five task types, and evaluate GraphRAG on the two QA (Question Answering) datasets (PubMedQA, BioASQ). On BLURB tasks, BERTUMLS improves over BERT, with the largest gains on knowledge-intensive QA. Effects on BioBERT are more nuanced, suggesting diminishing returns when the base model already encodes substantial biomedical text knowledge. Finally, augmenting LLaMA 3-8B with our GraphRAG pipeline yields accuracy gains of more than 3 points on PubMedQA and 5 points on BioASQ without any retraining, delivering transparent, multi-hop, and easily updated knowledge access. We release the processed UMLS Neo4j graph to support reproducibility.
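To make the inference-time strategy concrete, the following is a minimal sketch of graph retrieval followed by prompt construction. It is not the authors' pipeline: the abstract does not describe the retrieval algorithm, so the breadth-first multi-hop walk, the in-memory adjacency map standing in for the Neo4j store, the toy triples, and every function name (`build_adjacency`, `retrieve_facts`, `verbalize`, `build_prompt`) are assumptions made purely for illustration.

```python
from collections import deque

# Hypothetical toy triples (head, relation, tail). These are illustrative
# only and are NOT taken from UMLS or from the released graph.
TRIPLES = [
    ("metformin", "treats", "type 2 diabetes"),
    ("type 2 diabetes", "is_a", "diabetes mellitus"),
    ("diabetes mellitus", "has_finding", "hyperglycemia"),
    ("aspirin", "treats", "headache"),
]


def build_adjacency(triples):
    """Index outgoing edges by head concept (a stand-in for a graph store)."""
    adj = {}
    for head, rel, tail in triples:
        adj.setdefault(head, []).append((rel, tail))
    return adj


def retrieve_facts(adj, seed, max_hops=2):
    """Breadth-first walk from `seed`, collecting triples within `max_hops`."""
    facts = []
    seen = {seed}
    queue = deque([(seed, 0)])
    while queue:
        node, depth = queue.popleft()
        if depth == max_hops:
            continue  # do not expand nodes that sit at the hop limit
        for rel, tail in adj.get(node, []):
            facts.append((node, rel, tail))
            if tail not in seen:
                seen.add(tail)
                queue.append((tail, depth + 1))
    return facts


def verbalize(fact):
    """Turn one triple into a plain sentence (assumed template)."""
    head, rel, tail = fact
    return f"{head} {rel.replace('_', ' ')} {tail}."


def build_prompt(question, facts):
    """Prepend the retrieved facts to the question as context."""
    context = "\n".join(verbalize(f) for f in facts)
    return f"Context:\n{context}\n\nQuestion: {question}\nAnswer:"


adj = build_adjacency(TRIPLES)
facts = retrieve_facts(adj, "metformin", max_hops=2)
prompt = build_prompt("Is metformin linked to diabetes mellitus?", facts)
print(prompt)
```

The resulting prompt would then be passed to a generator such as LLaMA 3-8B. Because the knowledge lives in the graph rather than in model weights, updating a fact means editing a triple, and the retrieved triples can be inspected directly, which is one way to read the "transparent" and "easily updated" properties the abstract mentions.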