LLM agents achieve greater flexibility during execution by classifying actions via a reversibility taxonomy and using an Earliest-Conflict Rollback algorithm that matches full-restart quality while wasting far less completed work.
In The Fourteenth International Conference on Learning Representations
8 Pith papers cite this work.
citing papers explorer
-
Revisable by Design: A Theory of Streaming LLM Agent Execution
LLM agents achieve greater flexibility during execution by classifying actions via a reversibility taxonomy and using an Earliest-Conflict Rollback algorithm that matches full-restart quality while wasting far less completed work.
-
Aligning Agents via Planning: A Benchmark for Trajectory-Level Reward Modeling
Plan-RewardBench is a trajectory-level preference benchmark that evaluates how well reward models distinguish preferred agent trajectories from hard distractors across safety refusal, tool handling, complex planning, and error recovery tasks.
-
A paradox of AI fluency
Fluent AI users adopt an active, iterative collaboration mode that produces more visible failures but better recovery and success on hard tasks, whereas novices experience more invisible failures from passive use.
-
Alignment has a Fantasia Problem
AI alignment must move beyond assuming users have fully formed goals and instead provide active cognitive support to help form and refine intent over time.
-
Decoupled Travel Planning with Behavior Forest
Behavior Forest decouples multi-constraint travel planning into parallel behavior trees with LLM nodes and global coordination, yielding 6.67% and 11.82% gains over prior methods on two benchmarks.
-
When Should Users Check? Modeling Confirmation Frequency in Multi-Step Agentic AI Tasks
A decision-theoretic model based on the observed Confirmation-Diagnosis-Correction-Redo user pattern places intermediate confirmations in AI agent tasks, yielding 81% user preference and 13.54% faster completion versus confirm-at-end.
-
Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies
In real human subjects, AI transparency impacts imperfectly cooperative interactions far more than personality traits, unlike simulations where both are comparably influential.
-
Multi-Agent Systems: From Classical Paradigms to Large Foundation Model-Enabled Futures
A survey comparing classical multi-agent systems with large foundation model-enabled multi-agent systems, showing how the latter enables semantic-level collaboration and greater adaptability.