Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Niessner

· 2017 · DOI 10.1109/cvpr.2017.261

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

open at publisher browse 4 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

From Pixels to Concepts: Growing Rich 3D Semantic Scene Graph Forests utilizing Foundation Models

cs.RO · 2026-06-22 · unverdicted · novelty 6.0

Uses VLMs to detect instance concepts and LLMs to infer abstract relationships, assembling them into 3D scene graph forests that are evaluated on uHumans2 and ScanNet and tested in open-vocabulary retrieval on a Spot robot.

Language as a Sensor: Calibrated Spatial Belief Estimation in 3D Scenes from Natural Language

cs.RO · 2026-06-07 · unverdicted · novelty 6.0

Introduces LSM that outputs calibrated multimodal spatial distributions from language plus scene graph, fused via VL-Map to improve 3D target localization on VLA-3D benchmark and real robot.

Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models

cs.RO · 2026-06-09 · unverdicted · novelty 5.0

Embodied-R1.5 is an 8B EFM achieving SOTA on 16 of 24 embodied VLM benchmarks, fine-tunable to outperform leading VLAs, with claimed zero-shot real-robot generalization.

A Survey on Vision-Language-Action Models: An Action Tokenization Perspective

cs.RO · 2025-07-02 · unverdicted · novelty 5.0

The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.

citing papers explorer

Showing 3 of 3 citing papers after filters.

From Pixels to Concepts: Growing Rich 3D Semantic Scene Graph Forests utilizing Foundation Models cs.RO · 2026-06-22 · unverdicted · none · ref 25
Uses VLMs to detect instance concepts and LLMs to infer abstract relationships, assembling them into 3D scene graph forests that are evaluated on uHumans2 and ScanNet and tested in open-vocabulary retrieval on a Spot robot.
Language as a Sensor: Calibrated Spatial Belief Estimation in 3D Scenes from Natural Language cs.RO · 2026-06-07 · unverdicted · none · ref 26
Introduces LSM that outputs calibrated multimodal spatial distributions from language plus scene graph, fused via VL-Map to improve 3D target localization on VLA-3D benchmark and real robot.
Embodied-R1.5: Evolving Physical Intelligence via Embodied Foundation Models cs.RO · 2026-06-09 · unverdicted · none · ref 15
Embodied-R1.5 is an 8B EFM achieving SOTA on 16 of 24 embodied VLM benchmarks, fine-tunable to outperform leading VLAs, with claimed zero-shot real-robot generalization.

Chang, Manolis Savva, Maciej Halber, Thomas Funkhouser, and Matthias Niessner

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer