Dynamic scene graphs serve as explicit memory to improve imitation learning policies for spatial-temporal reasoning under partial observability in mobile and tabletop manipulation.
ConceptGraphs: Open-vocabulary 3D scene graphs for perception and planning
5 Pith papers cite this work. Polarity classification is still indexing.
citation summary
verdicts: UNVERDICTED 5
roles: background 1
polarities: background 1
representative citing papers
This is the first survey on vision-language-action models, providing a taxonomy across three lines of research, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.
SceneGraphGrounder builds a persistent 3D scene graph from VLM-inferred relations in 2D views and solves grounding via constrained graph alignment, achieving competitive zero-shot results on ScanRefer with only RGB-D input.
FUS3DMaps fuses voxel-level and instance-level open-vocabulary layers inside a shared 3D voxel map to improve both layers and enable scalable, accurate semantic mapping.
Describes a modular VLA framework with semantic voxel mapping via OwlViT and VLM-based command classification and grounding for the CMU VLA Challenge.
citing papers explorer
- A Survey on Vision-Language-Action Models for Embodied AI