Value Flows

Benjamin Eysenbach; Chelsea Finn; Chongyi Zheng; Dorsa Sadigh; Perry Dong

arxiv: 2510.07650 · v4 · pith:QMTYAAIGnew · submitted 2025-10-09 · 💻 cs.LG · cs.AI

Value Flows

Perry Dong , Chongyi Zheng , Chelsea Finn , Dorsa Sadigh , Benjamin Eysenbach This is my paper

classification 💻 cs.LG cs.AI

keywords returndistributionvalueflowslearningmethodsstatesuncertainty

0 comments

read the original abstract

While most reinforcement learning methods today flatten the distribution of future returns to a single scalar value, distributional RL methods exploit the return distribution to provide stronger learning signals and to enable applications in exploration and safe RL. While the predominant method for estimating the return distribution is by modeling it as a categorical distribution over discrete bins or estimating a finite number of quantiles, such approaches leave unanswered questions about the fine-grained structure of the return distribution and about how to distinguish states with high return uncertainty for decision-making. The key idea in this paper is to use modern, flexible flow-based models to estimate the full future return distributions and identify those states with high return variance. We do so by formulating a new flow-matching objective that generates probability density paths satisfying the distributional Bellman equation. Building upon the learned flow models, we estimate the return uncertainty of distinct states using a new flow derivative ODE. We additionally use this uncertainty information to prioritize learning a more accurate return estimation on certain transitions. We compare our method (Value Flows) with prior methods in the offline and online-to-online settings. Experiments on $37$ state-based and $25$ image-based benchmark tasks demonstrate that Value Flows achieves a $1.3\times$ improvement on average in success rates. Website: https://pd-perry.github.io/value-flows Code: https://github.com/chongyi-zheng/value-flows

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 4 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning
cs.LG 2026-05 unverdicted novelty 7.0

FAN achieves state-of-the-art offline RL performance on robotic tasks by anchoring flow policies and using single-sample noise-conditioned Q-learning, with proven convergence and reduced runtimes.
Reinforcement Learning via Value Gradient Flow
cs.LG 2026-04 unverdicted novelty 7.0

VGF solves behavior-regularized RL by transporting particles from a reference distribution to the value-induced optimal policy via discrete value-guided gradient flow.
FASTER: Value-Guided Sampling for Fast RL
cs.LG 2026-04 unverdicted novelty 6.0

FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.
What Does Flow Matching Bring To TD Learning?
cs.LG 2026-03 conditional novelty 6.0

Flow matching critics outperform monolithic ones in RL by 2x performance and 5x sample efficiency via test-time error recovery through integration and multi-point velocity supervision that preserves feature plasticity.