Robots that ask for help: Uncertainty alignment for large language model planners

Allen Z Ren, Anushri Dixit, Alexandra Bodrova, Sumeet Singh, Stephen Tu, Noah Brown, Peng Xu, Leila Takayama, Fei Xia, Jake Varley, et al · 2023 · arXiv 2307.01928

10 Pith papers cite this work. Polarity classification is still indexing.



citation-role summary: background 2

citation-polarity summary: background 1, support 1

representative citing papers

3D Instruction Ambiguity Detection

cs.AI · 2026-01-09 · unverdicted · novelty 8.0

Defines 3D Instruction Ambiguity Detection as a new task, releases the Ambi3D benchmark, shows state-of-the-art 3D LLMs struggle with it, and proposes the AmbiVer framework that gathers multi-view visual evidence to guide VLMs in judging ambiguity.

Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

Language models produce overcomplete reasoning traces where on average 46% of steps can be removed while preserving the answer in 86% of cases, with necessity concentrated in the top three steps.

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

cs.AI · 2023-04-22 · accept · novelty 7.0

LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.

Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

A kNN lower-confidence-bound approach for act-or-defer decisions in multi-agent LLM debates respects user-declared wrong-action budgets while achieving high automation rates on benchmarks.

Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

A small RL-trained policy for stepwise model routing between LLM sizes improves the accuracy-cost tradeoff on math benchmarks over handcrafted strategies and matches large process reward model methods.

Geometry-Calibrated Conformal Abstention for Language Models

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.

KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning

cs.RO · 2026-02-04 · unverdicted · novelty 6.0

KGLAMP uses a dynamically updated knowledge graph to guide LLMs in creating and replanning PDDL specifications for heterogeneous multi-robot teams, reporting at least 25.3% better performance than LLM-only or classical PDDL baselines on the MAT-THOR benchmark.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

TaskGround introduces a Ground-Infer-Execute framework for full-scene household reasoning that improves success rates on the FullHome benchmark and enables compact models to match larger ones at up to 18x lower token cost.

VizCopilot: Fostering Appropriate Reliance on Enterprise Chatbots with Context Visualization

cs.HC · 2025-10-13 · unverdicted · novelty 5.0

VizCopilot integrates topic modeling with document visualization to support user oversight of retrieved context in enterprise chatbots, enabling detection of misalignments and adaptation of prompting strategies.

citing papers explorer

Showing 8 of 8 citing papers after filters.

3D Instruction Ambiguity Detection · cs.AI · 2026-01-09 · unverdicted · none · ref 20
Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces · cs.AI · 2026-05-14 · unverdicted · none · ref 61
Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds · cs.AI · 2026-06-28 · unverdicted · none · ref 82
Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning · cs.AI · 2026-05-07 · unverdicted · none · ref 14
Geometry-Calibrated Conformal Abstention for Language Models · cs.CL · 2026-04-30 · unverdicted · none · ref 20
KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning · cs.RO · 2026-02-04 · unverdicted · none · ref 7
Code as Agent Harness · cs.CL · 2026-05-18 · accept · none · ref 111
TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning · cs.AI · 2026-05-18 · unverdicted · none · ref 34
