arxiv: 2603.19470 · 2 revisions
Adaptive Layerwise Perturbation: Unifying Off-Policy Corrections for LLM RL