Incremental visual scaffolding using multimodal models improves persistent common ground representation in situated dialogue by reducing representational blur compared to text-only approaches, with hybrid text-visual yielding best results on the IndiRef benchmark.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Using Machine Mental Imagery for Representing Common Ground in Situated Dialogue
Incremental visual scaffolding using multimodal models improves persistent common ground representation in situated dialogue by reducing representational blur compared to text-only approaches, with hybrid text-visual yielding best results on the IndiRef benchmark.