BioMedGPT: Open Multimodal Generative Pre-trained Transformer for BioMedicine

Jiahuan Zhang; Kai Yang; Mu Qiao; Siqi Fan; Yizhen Luo; Yushuai Wu; Zaiqing Nie

arxiv: 2308.09442 · v2 · pith:2PI4IXJXnew · submitted 2023-08-18 · 💻 cs.CE

Yizhen Luo, Jiahuan Zhang, Siqi Fan, Kai Yang, Yushuai Wu, Mu Qiao, Zaiqing Nie

keywords: language · biomedgpt · generative · natural · biological · biomedgpt-10b · biomedicine · human

Foundation models (FMs) have exhibited remarkable performance across a wide range of downstream tasks in many domains. Nevertheless, general-purpose FMs often face challenges when confronted with domain-specific problems, due to their limited access to the proprietary training data in a particular domain. In biomedicine, there are various biological modalities, such as molecules, proteins, and cells, which are encoded by the language of life and exhibit significant modality gaps with human natural language. In this paper, we introduce BioMedGPT, an open multimodal generative pre-trained transformer (GPT) for biomedicine, to bridge the gap between the language of life and human natural language. BioMedGPT is the first model of its kind to let users easily "communicate" with diverse biological modalities through free text. BioMedGPT aligns different biological modalities with natural language via a large generative language model, namely BioMedGPT-LM. We publish BioMedGPT-10B, which unifies the feature spaces of molecules, proteins, and natural language via encoding and alignment. Through fine-tuning, BioMedGPT-10B outperforms or is on par with humans and significantly larger general-purpose foundation models on the biomedical QA task. It also demonstrates promising performance on the molecule QA and protein QA tasks, which could greatly accelerate the discovery of new drugs and therapeutic targets. In addition, BioMedGPT-LM-7B is the first large generative language model in the biomedical domain based on Llama2, and is therefore commercially friendly. Both BioMedGPT-10B and BioMedGPT-LM-7B are open-sourced to the research community. We also publish the datasets that were meticulously curated for multimodal alignment, i.e., PubChemQA and UniProtQA. All models, code, and datasets are available at https://github.com/PharMolix/OpenBioMed.

Forward citations

Cited by 10 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

BioMatrix: Towards a Comprehensive Biological Foundation Model Spanning the Modality Matrix of Sequences, Structures, and Language
cs.CL · 2026-06 · unverdicted · novelty 7.0

BioMatrix unifies sequences, structures, and language for molecules and proteins inside one decoder-only foundation model via shared discrete tokens and achieves SOTA or competitive results on 77 of 80 downstream tasks.

A Vision-language Framework for Comparative Reasoning in Radiology
cs.CV · 2026-06 · unverdicted · novelty 7.0

Introduces MedReCo-DB dataset of 690k+ images and entity-aware models MedReCo/MedReCo-VLM that improve reference retrieval and comparative change interpretation in radiology across multiple centers and modalities.

Caught in the Web of Words: Do LLMs Fall for Spin in Medical Literature?
cs.CL · 2025-02 · unverdicted · novelty 7.0

Evaluation of 22 LLMs shows they are more susceptible to spin in medical abstracts than humans but can recognize and mitigate it when prompted.

Vision-driven Preference Synthesis for Mitigating Hallucinations in VLMs
cs.CV · 2026-06 · unverdicted · novelty 6.0

ViPSy constructs policy-aligned and visually grounded preference pairs for VLMs via visual cues from image variants, yielding SOTA hallucination reductions of 35.7% on AMBER and 24.5% on Object HalBench.

Bolek: A Multimodal Language Model for Molecular Reasoning
cs.LG · 2026-05 · unverdicted · novelty 5.0

Bolek injects Morgan fingerprint embeddings into an instruction-tuned text model, then fine-tunes on molecular alignment and synthetic chain-of-thought tasks to improve performance and grounding on 15 TDC binary class...

Deep neural networks with Fisher vector encoding for medical image classification
cs.CV · 2026-05 · unverdicted · novelty 5.0

Fisher vector encoding integrated into CNN-ViT hybrids outperforms benchmarks on MedMNIST datasets and matches literature results on other medical image sets.

Human-aligned AI Model Cards with Weighted Hierarchy Architecture
cs.SE · 2025-10 · unverdicted · novelty 4.0

Introduces CRAI-MCF, an eight-module framework distilling 217 parameters from 240 projects into a quantitative sufficiency criterion for cross-model LLM comparison grounded in Value Sensitive Design.

Perovskite-R1: a domain-specialized large language model for intelligent discovery of precursor additives and experimental design
cs.LG · 2025-07 · unverdicted · novelty 4.0

A fine-tuned LLM called Perovskite-R1, built from curated perovskite literature and material libraries, proposes precursor additives and designs with some experimental validation showing improved stability and performance.

A Survey on Knowledge Distillation of Large Language Models
cs.CL · 2024-02 · accept · novelty 3.0

A comprehensive survey of knowledge distillation for LLMs structured around algorithms, skill enhancement, and vertical applications, highlighting data augmentation as a key enabler.

Data-Centric Foundation Models in Computational Healthcare: A Survey
cs.LG · 2024-01 · unverdicted · novelty 3.0

The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.