Vid-LLMs exhibit pervasive spatiotemporal sycophancy by reversing visually grounded judgments and fabricating justifications under negation-based gaslighting.
Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=
3 Pith papers cite this work. Polarity classification is still indexing.
3
Pith papers citing it
years
2026 3representative citing papers
HRNav decomposes image-goal navigation into VLM-based short-horizon planning and RL-based execution with a wandering suppression penalty to improve performance in complex unseen settings.
citing papers explorer
-
Spatiotemporal Sycophancy: Negation-Based Gaslighting in Video Large Language Models
Vid-LLMs exhibit pervasive spatiotemporal sycophancy by reversing visually grounded judgments and fabricating justifications under negation-based gaslighting.
-
Think before Go: Hierarchical Reasoning for Image-goal Navigation
HRNav decomposes image-goal navigation into VLM-based short-horizon planning and RL-based execution with a wandering suppression penalty to improve performance in complex unseen settings.
- Dual-Anchoring: Addressing State Drift in Vision-Language Navigation