Title resolution pending
2 Pith papers cite this work. Polarity classification is still indexing.
verdicts
UNVERDICTED 2
representative citing papers
A multimodal Transformer ingests image features plus multiple external entity label sources and learns to control their appearance in fluent output captions.
citing papers explorer
- Modality-Inconsistent Continual Learning of Multimodal Large Language Models
The paper introduces the MICL scenario for MLLMs with modality and task shifts and proposes MoInCL using pseudo-target generation and instruction-based distillation, reporting gains over continual learning baselines on six tasks.
- Informative Image Captioning with External Sources of Information
A multimodal Transformer ingests image features plus multiple external entity label sources and learns to control their appearance in fluent output captions.