In contrast, the sparse reward definition used in *-sparse tasks does not award the subtask completion reward and provides the full reward only upon full task completion.
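To make the contrast concrete, here is a minimal sketch of the two reward definitions for a task made of several sequential subtasks. The equal per-subtask share, the `full_reward` value of 1.0, and the function names `staged_reward` and `sparse_reward` are illustrative assumptions, not the exact reward formula used by the *-sparse tasks.

```python
def staged_reward(num_completed: int, num_subtasks: int, full_reward: float = 1.0) -> float:
    """Hypothetical dense variant: each completed subtask earns an equal share
    of the full reward (assumed split, for illustration only)."""
    if not 0 <= num_completed <= num_subtasks:
        raise ValueError("num_completed must be in [0, num_subtasks]")
    return full_reward * num_completed / num_subtasks


def sparse_reward(num_completed: int, num_subtasks: int, full_reward: float = 1.0) -> float:
    """Sparse variant: no credit for intermediate subtasks; the full reward
    is given only once every subtask is complete."""
    if not 0 <= num_completed <= num_subtasks:
        raise ValueError("num_completed must be in [0, num_subtasks]")
    return full_reward if num_completed == num_subtasks else 0.0


if __name__ == "__main__":
    # A three-subtask task: partial progress is rewarded only by the staged variant.
    for done in range(4):
        print(done, round(staged_reward(done, 3), 3), sparse_reward(done, 3))
```

With two of three subtasks done, the staged variant already returns about 0.667 while the sparse variant still returns 0.0; both return 1.0 once all three are complete, which is why the sparse setting gives the agent far less signal during partial progress.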

In contrast, manipulation tasks typically involve multiple sequential subtasks (e · 2021

1 Pith paper cites this work.


representative citing papers

Q-Flow: Stable and Expressive Reinforcement Learning with Flow-Based Policy

cs.LG · 2026-05-13 · unverdicted · novelty 6.0 · ref 19

Q-Flow enables stable optimization of expressive flow-based policies in RL by propagating terminal values along deterministic flow dynamics to intermediate states for gradient updates without solver unrolling.
