Agent² RL-Bench shows that LLM agents can occasionally engineer online RL post-training pipelines that boost performance (e.g., ALFWorld from 4.85 to 93.28), but stable success remains rare under fixed budgets.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.AI (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
Agent^2 RL-Bench: Can LLM Agents Engineer Agentic RL Post-Training?