Embodied task planning with large language models

Zhenyu Wu, Ziwei Wang, Xiuwei Xu, Jiwen Lu, Haibin Yan · 2023 · arXiv 2307.01848

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 4

citation-polarity summary

background 4

representative citing papers

Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation

cs.CV · 2026-04-02 · unverdicted · novelty 7.0

Visual attention in MLLMs shows inertia that hinders cognitive inference on object relations, addressed by a training-free Inertia-aware Visual Excitation method that selects dynamically emerging tokens and applies an inertia-aware penalty.

From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems

cs.MA · 2025-06-05 · accept · novelty 7.0

A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.

Mitigating Object Hallucinations via Sentence-Level Early Intervention

cs.CV · 2025-07-16 · conditional · novelty 6.0

SENTINEL reduces MLLM object hallucinations by over 90% via sentence-level early intervention with detector-bootstrapped preference data and C-DPO loss, outperforming prior SOTA on hallucination and capability benchmarks.

A Survey on Large Language Model based Autonomous Agents

cs.AI · 2023-08-22 · accept · novelty 6.0

A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

TaskGround introduces a Ground-Infer-Execute framework for full-scene household reasoning that improves success rates on the FullHome benchmark and enables compact models to match larger ones at up to 18x lower token cost.

RoboAgent: Chaining Basic Capabilities for Embodied Task Planning

cs.RO · 2026-04-09 · unverdicted · novelty 5.0

RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.

Embodied Task Planning via Graph-Informed Action Generation with Large Language Models

cs.CL · 2026-01-29 · unverdicted · novelty 5.0

GiG uses a Graph-in-Graph architecture with GNN-encoded states, experience memory retrieval, and bounded symbolic lookahead to improve LLM planning on embodied benchmarks with gains up to 37%.

Towards Robust Surgical Automation via Digital Twin Representations from Foundation Models

cs.RO · 2024-09-19 · unverdicted · novelty 5.0

Digital twin representations from vision foundation models enable LLM-based planning for robust peg transfer and gauze retrieval on the dVRK surgical platform with claimed generalizability.

citing papers explorer

Showing 8 of 8 citing papers.

Attention at Rest Stays at Rest: Breaking Visual Inertia for Cognitive Hallucination Mitigation cs.CV · 2026-04-02 · unverdicted · none · ref 49
Visual attention in MLLMs shows inertia that hinders cognitive inference on object relations, addressed by a training-free Inertia-aware Visual Excitation method that selects dynamically emerging tokens and applies an inertia-aware penalty.
From Standalone LLMs to Integrated Intelligence: A Survey of Compound Al Systems cs.MA · 2025-06-05 · accept · none · ref 200
A survey that defines Compound AI Systems, proposes a multi-dimensional taxonomy based on component roles and orchestration strategies, reviews four foundational paradigms, and identifies key challenges for future research.
Mitigating Object Hallucinations via Sentence-Level Early Intervention cs.CV · 2025-07-16 · conditional · none · ref 68
SENTINEL reduces MLLM object hallucinations by over 90% via sentence-level early intervention with detector-bootstrapped preference data and C-DPO loss, outperforming prior SOTA on hallucination and capability benchmarks.
A Survey on Large Language Model based Autonomous Agents cs.AI · 2023-08-22 · accept · none · ref 137
A survey of LLM-based autonomous agents that proposes a unified framework for their construction and reviews applications in social science, natural science, and engineering along with evaluation methods and future directions.
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning cs.AI · 2026-05-18 · unverdicted · none · ref 47
TaskGround introduces a Ground-Infer-Execute framework for full-scene household reasoning that improves success rates on the FullHome benchmark and enables compact models to match larger ones at up to 18x lower token cost.
RoboAgent: Chaining Basic Capabilities for Embodied Task Planning cs.RO · 2026-04-09 · unverdicted · none · ref 116
RoboAgent chains basic vision-language capabilities inside a single VLM via a scheduler and trains it in three stages (behavior cloning, DAgger, RL) to improve embodied task planning.
Embodied Task Planning via Graph-Informed Action Generation with Large Language Models cs.CL · 2026-01-29 · unverdicted · none · ref 9
GiG uses a Graph-in-Graph architecture with GNN-encoded states, experience memory retrieval, and bounded symbolic lookahead to improve LLM planning on embodied benchmarks with gains up to 37%.
Towards Robust Surgical Automation via Digital Twin Representations from Foundation Models cs.RO · 2024-09-19 · unverdicted · none · ref 57
Digital twin representations from vision foundation models enable LLM-based planning for robust peg transfer and gauze retrieval on the dVRK surgical platform with claimed generalizability.

Embodied task planning with large language models

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer