arxiv: 2605.30719 · 2 revisions
When are LLMs Sufficient Policy Optimizers for Sequential RL Tasks?