An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.
2 Pith papers cite this work. Polarity classification is still indexing.
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Representative citing papers
NEXT estimates external torques from short free-motion data without hardware sensors and FIRST improves imitation learning by upsampling contact phases, yielding over 17% better task progress on long-horizon manipulation tasks.
citing papers explorer
- Think While You Map: Asynchronous Vision-Language Agents for Incremental 3D Scene Graphs
  An asynchronous architecture decouples incremental voxel-based mapping from VLM-based semantic enrichment to produce queryable open-vocabulary 3D scene graphs that match or exceed prior methods on segmentation and grounding benchmarks.
- FACTR 2: Learning External Force Sensing for Commodity Robot Arms Improves Policy Learning
  NEXT estimates external torques from short free-motion data without hardware sensors and FIRST improves imitation learning by upsampling contact phases, yielding over 17% better task progress on long-horizon manipulation tasks.