Diffusion actor-critic with entropy regulator. Advances in Neural Information Processing Systems, 37:54183–54204
2 Pith papers cite this work, both from 2026.
Representative citing papers:
Truncated Rectified Flow Policy for Reinforcement Learning with One-Step Sampling
TRFP combines rectified flow models with truncation to support multimodal policies in MaxEnt RL while allowing fast one-step sampling and stable training.