TRFP combines rectified flow models with truncation to support multimodal policies in MaxEnt RL while allowing fast one-step sampling and stable training.
Diffusion actor-critic with entropy regulator. Advances in Neural Information Processing Systems, 37:54183–54204
2 Pith papers cite this work.