arxiv: 2310.17956 · v2 · pith:RQ3IC73Enew · submitted 2023-10-27 · 💻 cs.CV · cs.AI · cs.CL

Qilin-Med-VL: Towards Chinese Large Vision-Language Model for General Healthcare

classification 💻 cs.CV cs.AI cs.CL
keywords healthcare, large, medical, model, models, qilin-med-vl, chinese, complex

Large Language Models (LLMs) have introduced a new era of proficiency in comprehending complex healthcare and biomedical topics. However, there is a noticeable lack of models in languages other than English and models that can interpret multi-modal input, which is crucial for global healthcare accessibility. In response, this study introduces Qilin-Med-VL, the first Chinese large vision-language model designed to integrate the analysis of textual and visual data. Qilin-Med-VL combines a pre-trained Vision Transformer (ViT) with a foundational LLM. It undergoes a thorough two-stage curriculum training process that includes feature alignment and instruction tuning. This method enhances the model's ability to generate medical captions and answer complex medical queries. We also release ChiMed-VL, a dataset consisting of more than 1M image-text pairs. This dataset has been carefully curated to enable detailed and comprehensive interpretation of medical data using various types of images.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. DREAM-S: Speculative Decoding with Searchable Drafting and Target-Aware Refinement for Multimodal Generation

    cs.LG 2026-05 unverdicted novelty 6.0

    DREAM-S combines neural architecture search, target-aware supernet training, and attention-entropy-guided distillation to accelerate speculative decoding in VLMs, reporting up to 3.85x speedup over standard methods.

  2. PMC-VQA: Visual Instruction Tuning for Medical Visual Question Answering

    cs.CV 2023-05 conditional novelty 6.0

    PMC-VQA dataset and MedVInT model achieve better generative performance on medical VQA benchmarks by visual instruction tuning on a newly constructed large-scale dataset.

  3. Data-Centric Foundation Models in Computational Healthcare: A Survey

    cs.LG 2024-01 unverdicted novelty 3.0

    The paper surveys data-centric strategies for foundation models in computational healthcare and supplies a curated list of related models and datasets.