pith. sign in

Offline reinforcement learning via high-fidelity generative behavior modeling

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

citation-role summary

background 2 baseline 1

citation-polarity summary

fields

cs.LG 6 cs.RO 2

representative citing papers

EXPO: Stable Reinforcement Learning with Expressive Policies

cs.LG · 2025-07-10 · conditional · novelty 7.0

EXPO stabilizes online RL for expressive policies by training a base policy with imitation and using a lightweight Gaussian edit policy to select higher-value actions on the fly for sampling and TD backups.

FASTER: Value-Guided Sampling for Fast RL

cs.LG · 2026-04-21 · unverdicted · novelty 6.0

FASTER models multi-candidate denoising as an MDP and trains a value function to filter actions early, delivering the performance of full sampling at lower cost in diffusion RL policies.

Diffusion Policy Policy Optimization

cs.RO · 2024-09-01 · unverdicted · novelty 6.0

DPPO fine-tunes diffusion policies via policy gradients and outperforms prior RL approaches for diffusion policies and PG-tuned alternatives on robot benchmarks while enabling stable training and hardware deployment.

citing papers explorer

Showing 8 of 8 citing papers.