Introduces Efficient Operator Search, a differentiable framework that jointly optimizes token reduction locations, retention budgets, and operator behaviors in multimodal models under cost constraints, recovering manual baselines and finding hybrid operators with competitive efficiency.
Rethinking causal mask attention for vision-language inference
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
PulseFocus improves multi-image reasoning in VLMs by interleaving planning and attention-gated focus blocks during chain-of-thought, achieving gains on BLINK and MuirBench.
citing papers explorer
-
Differentiable Efficient Operator Search
Introduces Efficient Operator Search, a differentiable framework that jointly optimizes token reduction locations, retention budgets, and operator behaviors in multimodal models under cost constraints, recovering manual baselines and finding hybrid operators with competitive efficiency.
-
Decoding the Pulse of Reasoning VLMs in Multi-Image Understanding Tasks
PulseFocus improves multi-image reasoning in VLMs by interleaving planning and attention-gated focus blocks during chain-of-thought, achieving gains on BLINK and MuirBench.