Optimizing solely the planner with trajectory-level RL rewards in a decomposed multi-agent setup yields compute-efficient gains on long-horizon benchmarks.
Output format: <plan>Your updated plan here</plan> <subgoal>next single step</subgoal>
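To illustrate how a planner response in this format might be consumed, here is a minimal sketch of a parser. Only the `<plan>` and `<subgoal>` tag names come from the output format above; the function name `parse_planner_output`, the `PlannerOutput` type, and the choice to return `None` on malformed responses are assumptions made for illustration, not part of the cited work.

```python
import re
from typing import NamedTuple, Optional


class PlannerOutput(NamedTuple):
    """Hypothetical container for the two tagged fields."""
    plan: str
    subgoal: str


# Tag names are taken from the output format; the regex itself is an assumption.
_TAG = r"<{0}>(.*?)</{0}>"


def parse_planner_output(text: str) -> Optional[PlannerOutput]:
    """Extract the plan and the next subgoal; return None if either tag is missing."""
    plan = re.search(_TAG.format("plan"), text, re.DOTALL)
    subgoal = re.search(_TAG.format("subgoal"), text, re.DOTALL)
    if plan is None or subgoal is None:
        return None
    return PlannerOutput(plan.group(1).strip(), subgoal.group(1).strip())


if __name__ == "__main__":
    reply = "<plan>Find key, open door</plan> <subgoal>go to kitchen</subgoal>"
    print(parse_planner_output(reply))
```

Returning `None` rather than raising lets a caller decide how to handle a malformed response, for example by re-prompting the planner.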
1 Pith paper cites this work.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Planner Matters! An Efficient and Unbalanced Multi-agent Collaboration Framework for Long-horizon Planning