AdaPlanBench introduces a multi-turn benchmark where LLM agents must adapt plans under progressively revealed dual constraints, with top models reaching only 67.75% accuracy.
Title resolution pending
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
AdaPlanBench: Evaluating Adaptive Planning in Large Language Model Agents under World and User Constraints
AdaPlanBench introduces a multi-turn benchmark where LLM agents must adapt plans under progressively revealed dual constraints, with top models reaching only 67.75% accuracy.