Q-Flow enables stable optimization of expressive flow-based policies in RL by propagating terminal values along deterministic flow dynamics to intermediate states for gradient updates without solver unrolling.
Both approaches fall into the class of guidance-based methods, where policy improvement relies on evaluating the outer critic at intermediate latent actions
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.LG 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy
Q-Flow enables stable optimization of expressive flow-based policies in RL by propagating terminal values along deterministic flow dynamics to intermediate states for gradient updates without solver unrolling.