OLIVIA treats LLM agent action selection as a contextual linear bandit over frozen hidden states and applies UCB exploration to adapt online, yielding consistent gains over static ReAct and prompt-based baselines on four benchmarks.
arXiv preprint arXiv:2502.11740 , volume=
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3verdicts
UNVERDICTED 3representative citing papers
CMIB uses a conditional multimodal information bottleneck to create reusable agent skills that separate verbalizable text content from predictive perceptual residuals, improving execution stability.
LASER uses Visual Grounding Reward and Sink Suppression Reward to preserve visual attention trajectories and suppress sink tokens, reducing visual forgetting in LVLMs.
citing papers explorer
-
LASER: A Corrective Lens for LVLMs via Visual Attention Preservation and Sink Suppression
LASER uses Visual Grounding Reward and Sink Suppression Reward to preserve visual attention trajectories and suppress sink tokens, reducing visual forgetting in LVLMs.