A single model trained only on image-text pairs gains instruction-following ability across images, video, and audio by routing all modalities through ImageBind's shared embedding space into Vicuna.
3 Pith papers cite this work.
Citing papers explorer
- PandaGPT: One Model To Instruction-Follow Them All
  A single model trained only on image-text pairs gains instruction-following ability across images, video, and audio by routing all modalities through ImageBind's shared embedding space into Vicuna.
- Preserving Knowledge in Large Language Model with Model-Agnostic Self-Decompression
  Introduces Tree Generation (TG-SFT) to generate synthetic instruction-tuning data from LLMs, reducing catastrophic forgetting when fine-tuning MLLMs on domain-specific or multimodal data.
- FediLoRA: Practical Federated Fine-Tuning of Foundation Models Under Missing-Modality Constraints
  FediLoRA is a lightweight federated LoRA aggregation method that jointly mitigates missing modalities and heterogeneous ranks in collaborative fine-tuning of foundation models.