Energy-weighted flow matching for offline reinforcement learning

Shiyuan Zhang, Weitong Zhang, Quanquan Gu · 2025 · arXiv 2503.04975

12 Pith papers cite this work.

citation-role summary: background 3

citation-polarity summary: background 2, unclear 1

representative citing papers

Test-Time Gradient Guidance of Flow Policies in Reinforcement Learning

cs.LG · 2026-06-09 · unverdicted · novelty 7.0

QGF performs test-time policy optimization for flow models in RL by guiding a behavior-cloned reference policy with value-function gradients, achieving strong results on high-dimensional offline RL benchmarks without additional policy training.

Block-R1: Rethinking the Role of Block Size in Multi-domain Reinforcement Learning for Diffusion Large Language Models

cs.LG · 2026-05-12 · unverdicted · novelty 7.0 · 2 refs

Introduces the Block-R1 benchmark, the Block-R1-41K dataset, and a conflict score for handling domain-specific optimal block sizes in RL post-training of diffusion LLMs.

ScoRe-Flow: Complete Distributional Control via Score-Based Reinforcement Learning for Flow Matching

cs.RO · 2026-04-13 · unverdicted · novelty 7.0

ScoRe-Flow achieves decoupled mean-variance control in stochastic flow matching by deriving a closed-form score for drift modulation plus learned variance, yielding faster RL convergence and higher success rates on locomotion and manipulation benchmarks.

Steering Your Diffusion Policy with Latent Space Reinforcement Learning

cs.RO · 2025-06-18 · unverdicted · novelty 7.0

DSRL steers pretrained diffusion policies for robotics by applying RL to their latent noise inputs, achieving sample-efficient real-world adaptation with only black-box access.

Dual-Flow Reinforcement Learning with State-Aware Exploration

cs.LG · 2026-06-29 · unverdicted · novelty 6.0

Dual-Flow RL jointly models return distributions and multimodal policies via conditional flow matching with an added ECER for exploration, claiming SOTA results on control benchmarks.

GDSD: Reinforcement Learning as Guided Denoiser Self-Distillation for Diffusion Language Models

cs.LG · 2026-05-28 · unverdicted · novelty 6.0

GDSD reduces RL for dLLMs to likelihood-free self-distillation via a normalization-free logit-matching objective, outperforming ELBO methods with more stable training on LLaDA-8B and Dream-7B.

Path-Coupled Bellman Flows for Distributional Reinforcement Learning

cs.LG · 2026-05-07 · unverdicted · novelty 6.0

PCBF learns return distributions via source-consistent Bellman-coupled paths with shared noise and λ-parameterized control variates, reporting improved fidelity and stability on MRPs, OGBench, and D4RL.

Towards Efficient and Expressive Offline RL via Flow-Anchored Noise-conditioned Q-Learning

cs.LG · 2026-05-03 · unverdicted · novelty 6.0

FAN simplifies expressive flow policies and distributional critics in offline RL via single-iteration behavior regularization and single-sample noise conditioning, claiming SOTA performance with lower training and inference time.

Fisher Decorator: Refining Flow Policy via a Local Transport Map

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

Fisher Decorator refines flow policies in offline RL via a local transport map and Fisher-matrix quadratic approximation of the KL constraint, yielding controllable error near the optimum and SOTA benchmark results.

Energy-Guided Generative Modeling for Low-Energy Molecular Structure Discovery

cs.LG · 2025-12-27 · unverdicted · novelty 6.0

EnFlow integrates flow-based conformer generation with energy landscape modeling to enable joint ensemble generation and ground-state identification using only 1-2 ODE steps.

Energy-Weighted Flow Matching: Unlocking Continuous Normalizing Flows for Efficient and Scalable Boltzmann Sampling

stat.ML · 2025-09-03 · unverdicted · novelty 6.0

Energy-Weighted Flow Matching reformulates conditional flow matching with importance sampling to enable continuous normalizing flows to model Boltzmann distributions from energy evaluations alone, with iterative and annealed variants showing competitive performance on benchmarks.

FlowAWR: Online Adaptive Flow Reinforcement via Advantage-Weighted Rectification

cs.LG · 2026-06-29 · unverdicted · novelty 5.0

FlowAWR derives an advantage-weighted rectification for optimal velocity fields in flow models, claiming 2-5x faster convergence than DiffusionNFT on SD3.5-Medium.
