arxiv: 2601.21484 · 2 revisions
ETS: Energy-Guided Test-Time Scaling for Training-Free RL Alignment