arxiv: 2505.23527 · v4 · pith:AAJBQHVMnew · submitted 2025-05-29 · 💻 cs.LG

Normalizing Flows are Capable Models for Continuous Control

classification 💻 cs.LG
keywords models · algorithms · flows · learning · normalizing · autoregressive · differential · diffusion

Modern reinforcement learning (RL) algorithms have found success by using powerful probabilistic models, such as transformers, energy-based models, and diffusion/flow-based models. To this end, RL researchers often choose to pay the price of accommodating these models into their algorithms: diffusion models are expressive but computationally intensive because they rely on solving differential equations, while autoregressive transformer models are scalable but typically require learning discrete representations. Normalizing flows (NFs), by contrast, seem to provide an appealing alternative, as they enable likelihoods and sampling without solving differential equations or relying on autoregressive architectures. However, their potential in RL has received limited attention, partly due to the prevailing belief that normalizing flows lack sufficient expressivity. We show that this is not the case. Building on recent work in NFs, we propose a single NF architecture that integrates seamlessly into RL algorithms, serving as a policy, Q-function, and occupancy measure. Our approach leads to much simpler algorithms and achieves higher performance in imitation learning, offline RL, goal-conditioned RL, and unsupervised RL.
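To make concrete the claim that a normalizing flow gives both exact likelihoods and one-pass sampling, here is a minimal sketch of a single affine coupling layer on a two-dimensional action. It is not the architecture proposed in the paper: the constants `A_SCALE` and `A_SHIFT` and the linear conditioner are hypothetical stand-ins for a learned network, and a practical flow would stack many such layers.

```python
import math
import random

# Hypothetical conditioner parameters; a real model would learn a neural network here.
A_SCALE, A_SHIFT = 0.5, -0.3

def forward(x1, x2):
    """Map data (x1, x2) to latent (z1, z2); also return log|det Jacobian|."""
    s = math.tanh(A_SCALE * x1)      # log-scale depends only on x1
    t = A_SHIFT * x1                 # shift depends only on x1
    return x1, x2 * math.exp(s) + t, s

def inverse(z1, z2):
    """Map latent (z1, z2) back to data in closed form (no ODE solver)."""
    s = math.tanh(A_SCALE * z1)
    t = A_SHIFT * z1
    return z1, (z2 - t) * math.exp(-s)

def log_prob(x1, x2):
    """Exact log-density via change of variables with a standard normal base."""
    z1, z2, log_det = forward(x1, x2)
    log_base = -0.5 * (z1 ** 2 + z2 ** 2) - math.log(2 * math.pi)
    return log_base + log_det

def sample(rng):
    """Draw one sample with a single pass through the inverse map."""
    return inverse(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0))

rng = random.Random(0)
action = sample(rng)
print(action, log_prob(*action))
```

Because the Jacobian of the coupling map is triangular, its log-determinant is just the log-scale `s`, which is why the likelihood is exact and cheap to evaluate.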

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SERNF: Sample-Efficient Real-World Dexterous Policy Fine-Tuning via Action-Chunked Critics and Normalizing Flows

    cs.RO · 2026-02 · unverdicted · novelty 6.0

    SERNF achieves sample-efficient real-world fine-tuning of multimodal dexterous policies by pairing exact-likelihood normalizing flow policies with action-chunked value critics.