
Flow-based single-step completion for efficient and expressive policy learning. arXiv preprint arXiv:2506.21427, 2025

3 Pith papers cite this work. Polarity classification is still indexing.




representative citing papers

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

cs.LG · 2026-06-09 · unverdicted · novelty 7.0

QGF performs test-time policy optimization for flow models in RL by guiding a behavior-cloned reference policy with value-function gradients, achieving strong results on high-dimensional offline RL benchmarks without additional policy training.

Drift Q-Learning

cs.LG · 2026-05-29 · unverdicted · novelty 6.0

DriftQL is a single-pass offline RL algorithm using drift regularization that outperforms diffusion and flow policies on standard benchmarks.

citing papers explorer

Showing 3 of 3 citing papers after filters.

  • Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning · cs.LG · 2026-06-09 · unverdicted · none · ref 28

    QGF performs test-time policy optimization for flow models in RL by guiding a behavior-cloned reference policy with value-function gradients, achieving strong results on high-dimensional offline RL benchmarks without additional policy training.

  • Reinforcement Learning for Flow-Matching Policies with Density Transport · cs.LG · 2026-06-07 · unverdicted · none · ref 21

    RLDT fine-tunes pretrained flow-matching policies for continuous control by aligning them to a max-entropy RL transport field constructed via SVGD, using expected-target estimation for stable multi-step updates.

  • Drift Q-Learning · cs.LG · 2026-05-29 · unverdicted · none · ref 42

    DriftQL is a single-pass offline RL algorithm using drift regularization that outperforms diffusion and flow policies on standard benchmarks.