MM-Conv supplies over 4,200 verified referring expressions from dynamic 3D VR data and shows that a two-stage pipeline with contextual rewriting raises grounding accuracy by 11-22 points, nearly doubling pronominal performance relative to end-to-end baselines.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.CV (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
MM-Conv: A Multimodal Dataset and Benchmark for Context-Aware Grounding in 3D Dialogue