Crossmodal-3600: A Massively Multilingual Multimodal Evaluation Dataset
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields: cs.CV 3
roles: dataset 1
polarities: background 1
representative citing papers
PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.
GRAPE applies GRPO to an LLM query rewriter with a corpus-relative ranking reward to improve frozen CLIP retrieval by an average 4.9% Recall@10 on shifted benchmarks without retraining or re-embedding.
citing papers explorer
- Continual Learning for VLMs: A Survey and Taxonomy Beyond Forgetting
  The paper offers a comprehensive survey and proposes a new taxonomy for continual learning strategies in VLMs and MLLMs to combat catastrophic forgetting beyond traditional methods.
- PaLI: A Jointly-Scaled Multilingual Language-Image Model
  PaLI jointly scales a 4B-parameter vision transformer with language models on a new 10B multilingual image-text dataset to reach state-of-the-art results on vision-language tasks while keeping a simple modular design.
- GRAPE: Let GRPO Supervise Query Rewriting by Ranking for Retrieval
  GRAPE applies GRPO to an LLM query rewriter with a corpus-relative ranking reward to improve frozen CLIP retrieval by an average 4.9% Recall@10 on shifted benchmarks without retraining or re-embedding.