Flow-based single-step completion for efficient and expressive policy learning. arXiv preprint arXiv:2506.21427, 2025

Prajwal Koirala, Cody Fleming · 2025 · arXiv 2506.21427

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

cs.LG · 2026-06-09 · unverdicted · novelty 7.0

QGF performs test-time policy optimization for flow models in RL by guiding a behavior-cloned reference policy with value-function gradients, achieving strong results on high-dimensional offline RL benchmarks without additional policy training.

Reinforcement Learning for Flow-Matching Policies with Density Transport

cs.LG · 2026-06-07 · unverdicted · novelty 7.0

RLDT fine-tunes pretrained flow-matching policies for continuous control by aligning them to a max-entropy RL transport field constructed via SVGD, using expected-target estimation for stable multi-step updates.

Drift Q-Learning

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

DriftQL is a single-pass offline RL algorithm using drift regularization that outperforms diffusion and flow policies on standard benchmarks.

citing papers explorer

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning · cs.LG · 2026-06-09 · unverdicted · none · ref 28
Reinforcement Learning for Flow-Matching Policies with Density Transport · cs.LG · 2026-06-07 · unverdicted · none · ref 21
Drift Q-Learning · cs.LG · 2026-05-29 · unverdicted · none · ref 42
