PERL augments frozen CLIP with a shared recurrent reasoning module of roughly 6K parameters that iteratively refines representations via latent token injection, delivering strong base-to-novel and transfer performance across 15 benchmarks.
Mmrl: Multi-modal representation learning for vision- language models
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CV 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
PERL: Parameter Efficient Reasoning in CLIP Latent Space
PERL augments frozen CLIP with a shared recurrent reasoning module of roughly 6K parameters that iteratively refines representations via latent token injection, delivering strong base-to-novel and transfer performance across 15 benchmarks.