arXiv preprint arXiv:2311.09830
2 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.AI (2)
Years: 2026 (2)
Verdicts: UNVERDICTED (2)
Citing papers
- Training the Orchestrator: A Supervised Approach to End-to-End PDDL Planning with LLM Agents
  HALO trains an orchestrator policy on verifier-approved refinement trajectories across 11 PDDL domains, matching GPT-5-mini success rates at roughly 45x lower orchestration cost and cutting LLM calls by 40-50%.
- Toward Secure and Reliable PDDL Formalization of Large Language Models with Planner-in-the-Loop Feedback
  Presents NL-PDDL-Bench and a planner-in-the-loop framework combining LoRA fine-tuning, DPO on planner-derived pairs, and inference-time repair to improve LLM PDDL generation.