pith. sign in

Robots that ask for help: Uncertainty alignment for large language model planners

10 Pith papers cite this work. Polarity classification is still indexing.

10 Pith papers citing it

citation-role summary

background 2

citation-polarity summary

roles

background 2

polarities

background 1 support 1

clear filters

representative citing papers

3D Instruction Ambiguity Detection

cs.AI · 2026-01-09 · unverdicted · novelty 8.0

Defines 3D Instruction Ambiguity Detection as a new task, releases the Ambi3D benchmark, shows state-of-the-art 3D LLMs struggle with it, and proposes the AmbiVer framework that gathers multi-view visual evidence to guide VLMs in judging ambiguity.

Geometry-Calibrated Conformal Abstention for Language Models

cs.CL · 2026-04-30 · unverdicted · novelty 6.0

Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.

Code as Agent Harness

cs.CL · 2026-05-18 · accept · novelty 5.0

A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

citing papers explorer

Showing 8 of 8 citing papers after filters.

  • 3D Instruction Ambiguity Detection cs.AI · 2026-01-09 · unverdicted · none · ref 20

    Defines 3D Instruction Ambiguity Detection as a new task, releases the Ambi3D benchmark, shows state-of-the-art 3D LLMs struggle with it, and proposes the AmbiVer framework that gathers multi-view visual evidence to guide VLMs in judging ambiguity.

  • Uncovering the Representation Geometry of Minimal Cores in Overcomplete Reasoning Traces cs.AI · 2026-05-14 · unverdicted · none · ref 61

    Language models produce overcomplete reasoning traces where on average 46% of steps can be removed while preserving the answer in 86% of cases, with necessity concentrated in the top three steps.

  • Budgeted Act-or-Defer Multi-Agent LLM Deliberation with Local Reliability Bounds cs.AI · 2026-06-28 · unverdicted · none · ref 82

    A kNN lower-confidence-bound approach for act-or-defer decisions in multi-agent LLM debates respects user-declared wrong-action budgets while achieving high automation rates on benchmarks.

  • Policy-Guided Stepwise Model Routing for Cost-Effective Reasoning cs.AI · 2026-05-07 · unverdicted · none · ref 14

    A small RL-trained policy for stepwise model routing between LLM sizes improves the accuracy-cost tradeoff on math benchmarks over handcrafted strategies and matches large process reward model methods.

  • Geometry-Calibrated Conformal Abstention for Language Models cs.CL · 2026-04-30 · unverdicted · none · ref 20

    Geometry-calibrated conformal abstention lets language models abstain from uncertain queries with finite-sample guarantees on both participation rate and conditional correctness of answers.

  • KGLAMP: Knowledge Graph-guided Language model for Adaptive Multi-robot Planning and Replanning cs.RO · 2026-02-04 · unverdicted · none · ref 7

    KGLAMP uses a dynamically updated knowledge graph to guide LLMs in creating and replanning PDDL specifications for heterogeneous multi-robot teams, reporting at least 25.3% better performance than LLM-only or classical PDDL baselines on the MAT-THOR benchmark.

  • Code as Agent Harness cs.CL · 2026-05-18 · accept · none · ref 111

    A survey that organizes existing work on LLM-based agents around code as the central harness, structured in three layers of interfaces, mechanisms, and multi-agent scaling, with applications across domains and listed open challenges.

  • TaskGround: Structured Executable Task Inference for Full-Scene Household Reasoning cs.AI · 2026-05-18 · unverdicted · none · ref 34

    TaskGround introduces a Ground-Infer-Execute framework for full-scene household reasoning that improves success rates on the FullHome benchmark and enables compact models to match larger ones at up to 18x lower token cost.