pith. machine review for the scientific record.

arxiv: 2307.01139 · v2 · submitted 2023-07-03 · 💻 cs.CV · cs.AI · cs.CL · cs.LG

Recognition: unknown

SCITUNE: Aligning Large Language Models with Human-Curated Scientific Multimodal Instructions

Authors on Pith: no claims yet
classification 💻 cs.CV · cs.AI · cs.CL · cs.LG
keywords models · multimodal · scientific · scitune · instructions · language · large · llama-scitune
abstract

Instruction finetuning is a popular paradigm to align large language models (LLM) with human intent. Despite its popularity, this idea is less explored in improving LLMs to align existing foundation models with scientific disciplines, concepts and goals. In this work, we present SciTune as a tuning framework to improve the ability of LLMs to follow multimodal instructions generated from scientific publications. To test our methodology, we train a large multimodal model, LLaMA-SciTune, that connects a vision encoder and LLM for science-focused visual and language understanding. LLaMA-SciTune significantly outperforms the state-of-the-art models in the generated figure types and captions in SciCap and VisText benchmarks. In comparison to the models that are finetuned with synthetic data only, LLaMA-SciTune surpasses human performance on average and in many sub-categories on the ScienceQA benchmark. Our results demonstrate that human-generated scientific multimodal instructions remain highly valuable in tuning LLMs to perform well on science tasks, despite their lower volume and relative scarcity compared to synthetic data. We publicly release the SciTune codebase at https://github.com/pnnl/scitune.
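The abstract describes connecting a vision encoder to an LLM for multimodal instruction following. A minimal sketch of one common recipe for such a connection (a learned projection mapping visual features into the LLM embedding space, then prepending them to the text tokens) is below; this is an illustration under assumed shapes, not the authors' implementation, and all dimensions and variable names are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Assumed, illustrative dimensions (not from the paper):
d_vision, d_llm = 1024, 4096   # vision-encoder and LLM hidden sizes
n_patches, n_text = 256, 32    # image patch tokens and instruction tokens

# Features from a frozen vision encoder for one scientific figure
vision_feats = rng.standard_normal((n_patches, d_vision))

# Learned linear projector bridging the two embedding spaces
W_proj = rng.standard_normal((d_vision, d_llm)) * 0.01
visual_tokens = vision_feats @ W_proj          # (n_patches, d_llm)

# Embedded instruction text generated from a scientific publication
text_embeds = rng.standard_normal((n_text, d_llm))

# Multimodal input sequence for the LLM: visual prefix + instruction tokens
llm_input = np.concatenate([visual_tokens, text_embeds], axis=0)
print(llm_input.shape)  # (288, 4096)
```

During tuning, the projector (and optionally the LLM) would be trained on instruction-response pairs so the model learns to ground its answers in the figure.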

This paper has not been read by Pith yet.

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read papers on Pith without signing in.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Back to the Barn with LLAMAs: Evolving Pretrained LLM Backbones in Finetuning Vision Language Models

    cs.AI · 2026-04 · unverdicted · novelty 5.0

    Newer LLM backbones in VLMs do not always improve performance; gains are task-dependent, with VQA models solving different questions due to better confidence calibration and stable representations.