PDDL planning problems are used to generate about one million precise reasoning steps for training Process Reward Models, and adding this data to existing datasets improves LLM performance on both mathematical and non-mathematical reasoning benchmarks.
2 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
method: 1
citation-polarity summary
years: 2026 (2)
verdicts: UNVERDICTED (2)
roles: method (1)
polarities: use method (1)
representative citing papers
Distill refines user task specifications for robots by pruning unnecessary steps, generalizing meanings, and relaxing order constraints, as demonstrated in a crowdsourcing study on a web interface.
citing papers explorer
- Process Reward Models Meet Planning: Generating Precise and Scalable Datasets for Step-Level Rewards
  PDDL planning problems are used to generate about one million precise reasoning steps for training Process Reward Models, and adding this data to existing datasets improves LLM performance on both mathematical and non-mathematical reasoning benchmarks.
- Distill: Uncovering the True Intent behind Human-Robot Communication
  Distill refines user task specifications for robots by pruning unnecessary steps, generalizing meanings, and relaxing order constraints, as demonstrated in a crowdsourcing study on a web interface.