Revisiting diffusion q-learning: From iterative denoising to one-step action generation. arXiv preprint arXiv:2508.13904, 2025.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.RO (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers:
ReFPO: Reflow Regularization for Flow Matching Policy Gradients
ReFPO adds explicit Reflow regularization to FPO, stabilizing PPO-style training and supporting high-fidelity one-step inference across GridWorld, MuJoCo, and Humanoid tasks.