arXiv preprint arXiv:2304.09349 (2023)
Three papers indexed by Pith cite this work.
Citation roles: background (1). Citation polarities: background (1).
Citing papers:

- Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation
  Visual attention in MLLMs exhibits inertia that hinders cognitive inference about object relations. The authors address this with Inertia-aware Visual Excitation, a training-free method that selects dynamically emerging tokens and applies an inertia-aware penalty.
- Analyzing and Mitigating Object Hallucination in Large Vision-Language Models
  LURE reduces object hallucination in LVLMs by 23% through post-hoc revision informed by an analysis of co-occurrence, uncertainty, and text position.
- Describe, Explain, Plan and Select: Interactive Planning with Large Language Models Enables Open-World Multi-Task Agents
  DEPS combines LLM-based interactive planning with a trainable goal selector to build a zero-shot multi-task agent that completes more than 70 Minecraft tasks and nearly doubles the performance of prior methods.