BRAVE: Broadening the visual encoding of vision-language models

Oğuzhan Fatih Kar, Alessio Tonioni, Petra Poklukar, Achin Kulshrestha, Amir Zamir, Federico Tombari · 2024 · arXiv 2404.07204

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Beyond Encoder Accumulation: Measuring Encoder Roles in Multi-Encoder VLMs

cs.CV · 2026-06-02 · unverdicted · novelty 6.0

Retraining all 31 subsets of five vision encoders shows Capacity and Necessity are distinct, pre-projector effective rank predicts residual performance at fixed parameter count, and high-Capacity plus adaptive complement pairs match the full five-encoder model.

PaliGemma 2: A Family of Versatile VLMs for Transfer

cs.CV · 2024-12-04 · unverdicted · novelty 4.0

PaliGemma 2 is a family of vision-language models that achieves state-of-the-art results on transfer tasks like table structure recognition and radiography report generation by combining SigLIP with Gemma 2 models at various sizes and resolutions.

citing papers explorer

Showing 2 of 2 citing papers after filters.

Beyond Encoder Accumulation: Measuring Encoder Roles in Multi-Encoder VLMs · cs.CV · 2026-06-02 · unverdicted · none · ref 12
PaliGemma 2: A Family of Versatile VLMs for Transfer · cs.CV · 2024-12-04 · unverdicted · none · ref 34
