ConceptGraphs: Open-Vocabulary 3D Scene Graphs for Perception and Planning

Qiao Gu, Ali Kuwajerwala, Sacha Morin, Krishna Murthy Jatavallabhula, Bipasha Sen, Aditya Agarwal, Corban Rivera, William Paul, Kirsty Ellis, Ramalingam Chellappa, Chuang Gan, Celso Miguel de Melo, Joshua B Tenenbaum, Antonio Torralba, Flor · 2023 · arXiv 2309.16650

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Expanding Spatial and Temporal Context for Robotic Imitation Learning With Scene Graphs

cs.RO · 2026-05-31 · unverdicted · novelty 6.0

Dynamic scene graphs serve as explicit memory to improve imitation learning policies for spatial-temporal reasoning under partial observability in mobile and tabletop manipulation.

A Survey on Vision-Language-Action Models for Embodied AI

cs.RO · 2024-05-23 · unverdicted · novelty 6.0

This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.

SceneGraphGrounder: Zero-Shot 3D Visual Grounding via Structured Scene Graph Matching

cs.CV · 2026-05-20 · unverdicted · novelty 5.0

SceneGraphGrounder builds a persistent 3D scene graph from VLM-inferred relations in 2D views and solves grounding via constrained graph alignment, achieving competitive zero-shot results on ScanRefer with only RGB-D input.

FUS3DMaps: Scalable and Accurate Open-Vocabulary Semantic Mapping by 3D Fusion of Voxel- and Instance-Level Layers

cs.RO · 2026-05-05 · unverdicted · novelty 5.0

FUS3DMaps fuses voxel- and instance-level open-vocabulary layers inside a shared 3D voxel map to improve both layers and enable scalable accurate semantic mapping.

A Modular Vision-Language-Action Robotics Framework for Indoor Environments

cs.RO · 2026-06-30 · unverdicted · novelty 3.0

Describes a modular VLA framework with semantic voxel mapping via OwlViT and VLM-based command classification and grounding for the CMU VLA Challenge.

citing papers explorer

Showing 1 of 1 citing paper after filters.

A Survey on Vision-Language-Action Models for Embodied AI

cs.RO · 2024-05-23 · unverdicted · none · ref 154

This is the first survey on vision-language-action models, providing a taxonomy across three lines, plus summaries of datasets, simulators, benchmarks, challenges, and future directions in embodied AI.