Q-Flow bridges stability and expressivity in flow-based RL policies by propagating terminal trajectory values to intermediate states for gradient-based optimization.
However, the fundamental distinction lies in the generative policy class, which dictates optimization complexity and intermediate value construction.
Cited by 1 Pith paper (field: cs.LG; year: 2026; verdict: UNVERDICTED). Polarity classification is still indexing.
Representative citing paper:
Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy