For the policy network, we use Fourier embedding for the flow time embedding

For policy, we use the Euler method of 10 steps across all tasks · 2026

1 Pith paper cite this work. Polarity classification is still indexing.

1 Pith paper citing it

browse 1 citing papers

representative citing papers

Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy

cs.LG · 2026-05-13 · unverdicted · novelty 6.0

Q-Flow enables stable optimization of expressive flow-based policies in RL by propagating terminal values along deterministic flow dynamics to intermediate states for gradient updates without solver unrolling.

citing papers explorer

Showing 1 of 1 citing paper.

Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy cs.LG · 2026-05-13 · unverdicted · none · ref 22
Q-Flow enables stable optimization of expressive flow-based policies in RL by propagating terminal values along deterministic flow dynamics to intermediate states for gradient updates without solver unrolling.

For the policy network, we use Fourier embedding for the flow time embedding

fields

years

verdicts

representative citing papers

citing papers explorer