Real-world reinforcement learning from suboptimal interventions
2 Pith papers cite this work.
Fields: cs.RO (2). Years: 2026 (2).
Representative citing papers:
- Sample-Efficient Diffusion-based Reinforcement Learning with Critic Guidance
  CGPO integrates training-free critic guidance into diffusion denoising to generate high-Q actions that serve as regression targets, reporting state-of-the-art results on MuJoCo locomotion and successful grasping with a Franka arm.
- OHP-RL: Online Human Preference as Guidance in Reinforcement Learning for Robot Manipulation