Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare

Dading Chong; Junling Liu; Peilin Zhou; Qichen Ye; Yining Hua; Ziming Wang

arxiv: 2310.17956 · v2 · pith:RQ3IC73Enew · submitted 2023-10-27 · 💻 cs.CV · cs.AI · cs.CL

Junling Liu, Ziming Wang, Qichen Ye, Dading Chong, Peilin Zhou, Yining Hua

classification 💻 cs.CV · cs.AI · cs.CL

keywords healthcare · large · medical · model · models · qilin-med-vl · chinese · complex

Large Language Models (LLMs) have introduced a new era of proficiency in comprehending complex healthcare and biomedical topics. However, there is a noticeable lack of models in languages other than English and models that can interpret multi-modal input, which is crucial for global healthcare accessibility. In response, this study introduces Qilin-Med-VL, the first Chinese large vision-language model designed to integrate the analysis of textual and visual data. Qilin-Med-VL combines a pre-trained Vision Transformer (ViT) with a foundational LLM. It undergoes a thorough two-stage curriculum training process that includes feature alignment and instruction tuning. This method enhances the model's ability to generate medical captions and answer complex medical queries. We also release ChiMed-VL, a dataset consisting of more than 1M image-text pairs. This dataset has been carefully curated to enable detailed and comprehensive interpretation of medical data using various types of images.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation
cs.LG 2026-05 unverdicted novelty 6.0

DREAM-S combines neural architecture search, target-aware supernet training, and attention-entropy-guided distillation to accelerate speculative decoding in VLMs, reporting up to 3.85x speedup over standard methods.

PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering
cs.CV 2023-05 conditional novelty 6.0

PMC-VQA dataset and MedVInT model achieve better generative performance on medical VQA benchmarks by visual instruction tuning on a newly constructed large-scale dataset.

Data-Centric Foundation Models in Computational Healthcare: A Survey
cs.LG 2024-01 unverdicted novelty 3.0

The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.