The paper surveys continual learning strategies for vision-language models (VLMs) and multimodal large language models (MLLMs) and proposes a new taxonomy of approaches that combat catastrophic forgetting beyond traditional methods.
A practitioner’s guide to continual multimodal pretraining
5 Pith papers cite this work.
Citation verdicts: 5 unverdicted. Citation roles: 1 background. Citation polarities: 1 background.
Representative citing papers
DataComp-VLM benchmark shows instruction-heavy data mixtures outperform caption-heavy ones for VLM training, with DCVLM-Baseline reaching 63.6% on 33 tasks using 200B tokens, +5.4pp over FineVision.
ProtoAda uses format-aware prototypes for better task routing and geometry-aware consolidation to reduce interference in multimodal continual instruction tuning.
Introduces a representation-geometry-based taxonomy for continual learning in speech and audio, identifies mismatches with current CL assumptions in foundation models, and lists open challenges.
CRAM uses adaptive MoE with centroid routing and orthogonality constraints to enable parameter-efficient multimodal continual instruction tuning while mitigating forgetting.