Robots that ask for help: Uncertainty alignment for large language model planners

URL https://openai · 2023 · arXiv 2307.01928

14 Pith papers cite this work. Polarity classification is still being indexed.

citation-role summary

background 2

citation-polarity summary

background 1 · support 1

representative citing papers

3D Instruction Ambiguity Detection

cs.AI · 2026-01-09 · unverdicted · novelty 8.0

Defines 3D Instruction Ambiguity Detection as a new task, releases the Ambi3D benchmark, shows state-of-the-art 3D LLMs struggle with it, and proposes the AmbiVer framework that gathers multi-view visual evidence to guide VLMs in judging ambiguity.

Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces

cs.AI · 2026-05-14 · unverdicted · novelty 7.0

Language models produce overcomplete reasoning traces where on average 46% of steps can be removed while preserving the answer in 86% of cases, with necessity concentrated in the top three steps.

LLM+P: Empowering Large Language Models with Optimal Planning Proficiency

cs.AI · 2023-04-22 · accept · novelty 7.0

LLM+P lets LLMs solve planning problems optimally by converting them to PDDL for classical planners and back to natural language.

Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds

cs.AI · 2026-06-28 · unverdicted · novelty 6.0

A kNN lower-confidence-bound approach for act-or-defer decisions in multi-agent LLM debates respects user-declared wrong-action budgets while achieving high automation rates on benchmarks.

Calibration Is Not Control: Why LLM-Agent Oversight Needs Intervention

cs.AI · 2026-06-19 · unverdicted · novelty 6.0

Action-conditioned estimation of intervention advantage via prefix branching reduces control regret over calibrated scalar risk scores in LLM agent oversight across benchmarks.

Beyond Failure Recovery: An Engagement-Aware Human-in-the-loop Framework for Robotic Systems

cs.RO · 2026-06-16 · unverdicted · novelty 6.0

E-MPC is a model predictive control framework that uses a user interaction dynamics model to balance autonomy and engagement under workload constraints in robotic caregiving, evaluated via simulation and a user study.

Confidence Laundering in Agent Systems: Why Uncertainty Needs a Latent Carrier

cs.AI · 2026-06-09 · unverdicted · novelty 6.0

Agent systems lose uncertainty at decision handoffs, causing downstream over-trust; the paper proposes latent uncertainty as a carrier to preserve pre-commitment fragility across interfaces.

Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning

cs.AI · 2026-05-07 · unverdicted · novelty 6.0

A small RL-trained policy for stepwise model routing between LLM sizes improves the accuracy-cost tradeoff on math benchmarks over handcrafted strategies and matches large process reward model methods.

Geometry-Calibrated Conformal Abstention for Language Models

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.

KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning

cs.RO · 2026-02-04 · unverdicted · novelty 6.0

KGLAMP uses a dynamically updated knowledge graph to guide LLMs in creating and replanning PDDL specifications for heterogeneous multi-robot teams, reporting at least 25.3% better performance than LLM-only or classical PDDL baselines on the MAT-THOR benchmark.

Strategic Decision Support for AI Agents

cs.AI · 2026-06-10 · unverdicted · novelty 5.0

The paper introduces an optimization framework for AI agents to strategically seek support, proving a threshold policy on support value and providing an online algorithm to control missed-support error without distributional assumptions.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

TaskGround introduces a Ground-Infer-Execute framework for full-scene household reasoning that improves success rates on the FullHome benchmark and enables compact models to match larger ones at up to 18x lower token cost.

VizCopilot: Fostering Appropriate Reliance on Enterprise Chatbots with Context Visualization

cs.HC · 2025-10-13 · unverdicted · novelty 5.0

VizCopilot integrates topic modeling with document visualization to support user oversight of retrieved context in enterprise chatbots, enabling detection of misalignments and adaptation of prompting strategies.

citing papers explorer

Showing 12 of 12 citing papers after filters.

3D Instruction Ambiguity Detection

cs.AI · 2026-01-09 · unverdicted · none · ref 20

Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces

cs.AI · 2026-05-14 · unverdicted · none · ref 61

Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds

cs.AI · 2026-06-28 · unverdicted · none · ref 82

Calibration Is Not Control: Why LLM-Agent Oversight Needs Intervention

cs.AI · 2026-06-19 · unverdicted · none · ref 9

Beyond Failure Recovery: An Engagement-Aware Human-in-the-loop Framework for Robotic Systems

cs.RO · 2026-06-16 · unverdicted · none · ref 31

Confidence Laundering in Agent Systems: Why Uncertainty Needs a Latent Carrier

cs.AI · 2026-06-09 · unverdicted · none · ref 56

Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning

cs.AI · 2026-05-07 · unverdicted · none · ref 14

Geometry-Calibrated Conformal Abstention for Language Models

cs.CL · 2026-04-30 · unverdicted · none · ref 20

KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning

cs.RO · 2026-02-04 · unverdicted · none · ref 7

Strategic Decision Support for AI Agents

cs.AI · 2026-06-10 · unverdicted · none · ref 71

TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning

cs.AI · 2026-05-18 · unverdicted · none · ref 34

VizCopilot: Fostering Appropriate Reliance on Enterprise Chatbots with Context Visualization

cs.HC · 2025-10-13 · unverdicted · none · ref 41
