PlanningBench supplies a taxonomy-guided synthesis pipeline that produces scalable, self-verifiable planning instances, used both to evaluate where LLMs fail at planning and to improve their planning via reinforcement learning.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
- PlanningBench: Generating Scalable and Verifiable Planning Data for Evaluating and Training Large Language Models