hub

SayPlan: Grounding large language models using 3d scene graphs for scalable robot task planning

· 2023 · arXiv 2307.06135

14 Pith papers cite this work. Polarity classification is still indexing.

14 Pith papers citing it

read on arXiv browse 14 citing papers

hub tools

JSON dossier citing papers JSON arXiv source

citation-role summary

background 4

citation-polarity summary

background 3 support 1

representative citing papers

When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution

cs.AI · 2026-05-14 · unverdicted · novelty 8.0 · 2 refs

LongAct benchmark evaluates long-horizon household task execution from free-form instructions; HoloMind agent raises performance but top VLMs still reach only 59% goal completion and 16% full-task success.

KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis

cs.RO · 2026-04-08 · unverdicted · novelty 7.0

KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.

ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs

cs.RO · 2026-02-09 · unverdicted · novelty 7.0

ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.

BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections

cs.CV · 2025-11-16 · conditional · novelty 7.0

BridgeEQA creates a new benchmark and EMVR method for embodied agents to perform question answering on real-world bridge inspections using egocentric images and professional reports.

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

cs.AI · 2023-04-22 · accept · novelty 7.0

LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.

VEOcc: Voxel-Centric Online Semantic Occupancy Prediction For Embodied Scene Understanding

cs.CV · 2026-05-24 · unverdicted · novelty 6.0

VEOcc is a voxel-based online semantic occupancy prediction method using recursive assimilation and three update modules (TLA, RCM, CSU) that reports new SOTA results on Occ-ScanNet and EmbodiedOcc-ScanNet.

RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots

cs.RO · 2026-05-18 · unverdicted · novelty 6.0

RGB-only active 3D scene graph generation unifies perception and planning to achieve depth-baseline parity and more than double object detection in active indoor exploration.

Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation

cs.RO · 2026-05-18 · unverdicted · novelty 6.0

Fixed external cameras as Common Prior Maps boost initial object recall in 3D scene graph generation by up to 79% and improve active exploration efficiency.

Long-Horizon Manipulation via Trace-Conditioned VLA Planning

cs.RO · 2026-04-23 · unverdicted · novelty 6.0

LoHo-Manip enables robust long-horizon robot manipulation by using a receding-horizon VLM manager to output progress-aware subtask sequences and 2D visual traces that condition a VLA executor for automatic replanning.

InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy

cs.RO · 2025-10-15 · unverdicted · novelty 6.0

InternVLA-M1 uses spatially guided pre-training on 2.3M examples followed by action post-training to deliver up to 17% gains on robot manipulation benchmarks and 20.6% on unseen objects.

Beyond Waypoints: Dual-Heatmap Grounding for Cross-Embodiment Semantic Navigation

cs.RO · 2026-05-19 · unverdicted · novelty 5.0

A vision-language model outputs dual heatmaps for navigation affordance and facing to ground semantic instructions into executable free space, achieving higher affordance rates than waypoint regression across simulated robot embodiments.

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

TaskGround introduces a Ground-Infer-Execute framework for full-scene household reasoning that improves success rates on the FullHome benchmark and enables compact models to match larger ones at up to 18x lower token cost.

Towards Robust Surgical Automation via Digital Twin Representations from Foundation Models

cs.RO · 2024-09-19 · unverdicted · novelty 5.0

Digital twin representations from vision foundation models enable LLM-based planning for robust peg transfer and gauze retrieval on the dVRK surgical platform with claimed generalizability.

The Rise and Potential of Large Language Model Based Agents: A Survey

cs.AI · 2023-09-14 · accept · novelty 4.0

The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

citing papers explorer

Showing 14 of 14 citing papers.

When Robots Do the Chores: A Benchmark and Agent for Long-Horizon Household Task Execution cs.AI · 2026-05-14 · unverdicted · none · ref 31 · 2 links
LongAct benchmark evaluates long-horizon household task execution from free-form instructions; HoloMind agent raises performance but top VLMs still reach only 59% goal completion and 16% full-task success.
KITE: Keyframe-Indexed Tokenized Evidence for VLM-Based Robot Failure Analysis cs.RO · 2026-04-08 · unverdicted · none · ref 42
KITE is a training-free method that uses keyframe-indexed tokenized evidence including BEV schematics to enhance VLM performance on robot failure detection, identification, localization, explanation, and correction.
ST-BiBench: Benchmarking Multi-Stream Multimodal Coordination in Bimanual Embodied Tasks for MLLMs cs.RO · 2026-02-09 · unverdicted · none · ref 92
ST-BiBench reveals a coordination paradox in which MLLMs show strong high-level strategic reasoning yet fail at fine-grained 16-dimensional bimanual action synthesis and multi-stream fusion.
BridgeEQA: Virtual Embodied Agents for Real Bridge Inspections cs.CV · 2025-11-16 · conditional · none · ref 37
BridgeEQA creates a new benchmark and EMVR method for embodied agents to perform question answering on real-world bridge inspections using egocentric images and professional reports.
LLM+P: Empowering Large Language Models with Optimal Planning Proficiency cs.AI · 2023-04-22 · accept · none · ref 56
LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.
VEOcc: Voxel-Centric Online Semantic Occupancy Prediction For Embodied Scene Understanding cs.CV · 2026-05-24 · unverdicted · none · ref 7
VEOcc is a voxel-based online semantic occupancy prediction method using recursive assimilation and three update modules (TLA, RCM, CSU) that reports new SOTA results on Occ-ScanNet and EmbodiedOcc-ScanNet.
RGB-only Active 3D Scene Graph Generation for Indoor Mobile Robots cs.RO · 2026-05-18 · unverdicted · none · ref 5
RGB-only active 3D scene graph generation unifies perception and planning to achieve depth-baseline parity and more than double object detection in active indoor exploration.
Fixed External Cameras as Common Prior Maps for Active 3D Scene Graph Generation cs.RO · 2026-05-18 · unverdicted · none · ref 5
Fixed external cameras as Common Prior Maps boost initial object recall in 3D scene graph generation by up to 79% and improve active exploration efficiency.
Long-Horizon Manipulation via Trace-Conditioned VLA Planning cs.RO · 2026-04-23 · unverdicted · none · ref 54
LoHo-Manip enables robust long-horizon robot manipulation by using a receding-horizon VLM manager to output progress-aware subtask sequences and 2D visual traces that condition a VLA executor for automatic replanning.
InternVLA-M1: A Spatially Guided Vision-Language-Action Framework for Generalist Robot Policy cs.RO · 2025-10-15 · unverdicted · none · ref 33
InternVLA-M1 uses spatially guided pre-training on 2.3M examples followed by action post-training to deliver up to 17% gains on robot manipulation benchmarks and 20.6% on unseen objects.
Beyond Waypoints: Dual-Heatmap Grounding for Cross-Embodiment Semantic Navigation cs.RO · 2026-05-19 · unverdicted · none · ref 20
A vision-language model outputs dual heatmaps for navigation affordance and facing to ground semantic instructions into executable free space, achieving higher affordance rates than waypoint regression across simulated robot embodiments.
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning cs.AI · 2026-05-18 · unverdicted · none · ref 33
TaskGround introduces a Ground-Infer-Execute framework for full-scene household reasoning that improves success rates on the FullHome benchmark and enables compact models to match larger ones at up to 18x lower token cost.
Towards Robust Surgical Automation via Digital Twin Representations from Foundation Models cs.RO · 2024-09-19 · unverdicted · none · ref 58
Digital twin representations from vision foundation models enable LLM-based planning for robust peg transfer and gauze retrieval on the dVRK surgical platform with claimed generalizability.
The Rise and Potential of Large Language Model Based Agents: A Survey cs.AI · 2023-09-14 · accept · none · ref 261
The paper surveys the origins, frameworks, applications, and open challenges of AI agents built on large language models.

SayPlan: Grounding large language models using 3d scene graphs for scalable robot task planning

hub tools

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer