Learning quadrupedal locomotion via differentiable simulation. arXiv preprint arXiv:2404.02887
2 Pith papers cite this work. Polarity classification is still in progress.
Citing papers by year: 2026 (2). Verdicts: 2 unverdicted.
Citing papers:
- Efficient On-policy Visual-RL via Stochastic Decoupled Policy Gradient
  SDPG is a new on-policy visual RL algorithm that estimates gradients from stochastic perturbations of rollouts. On visual MuJoCo tasks it trains faster and uses less memory than the baselines; the paper also introduces new robotics benchmarks and reports sim-to-real results.
- Does "Do Differentiable Simulators Give Better Policy Gradients?" Give Better Policy Gradients?
  In policy-gradient RL, careful variance control and simple estimator switching frequently outperform explicit discontinuity detection, even when differentiable simulators are available.
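The variance control and estimator switching mentioned above can be illustrated with a minimal sketch, assuming a scalar parameter, Gaussian perturbations, and a toy objective; it is not SDPG or either paper's method. It contrasts a zeroth-order (perturbation-based, randomized-smoothing) gradient estimate, which needs only objective values, with a first-order (pathwise) estimate that uses the derivative a differentiable simulator would supply, and blends the two by inverse-variance weighting. The names `zeroth_order_samples`, `first_order_samples`, and `blended_gradient`, and the parameters `sigma`, `n`, and `seed`, are hypothetical.

```python
import random
import statistics
from typing import Callable, List, Tuple

Scalar = Callable[[float], float]


def zeroth_order_samples(f: Scalar, theta: float, sigma: float, n: int,
                         rng: random.Random) -> List[float]:
    """Per-sample randomized-smoothing (perturbation) estimates of
    d/dtheta E_w[f(theta + w)], w ~ N(0, sigma^2). Needs only f values."""
    base = f(theta)  # baseline: lowers variance, keeps the estimate unbiased
    samples = []
    for _ in range(n):
        w = rng.gauss(0.0, sigma)
        samples.append((f(theta + w) - base) * w / sigma ** 2)
    return samples


def first_order_samples(df: Scalar, theta: float, sigma: float, n: int,
                        rng: random.Random) -> List[float]:
    """Per-sample pathwise estimates: the derivative df evaluated at perturbed
    points, standing in for gradients from a differentiable simulator."""
    return [df(theta + rng.gauss(0.0, sigma)) for _ in range(n)]


def blended_gradient(f: Scalar, df: Scalar, theta: float, sigma: float = 0.1,
                     n: int = 1000, seed: int = 0) -> Tuple[float, float]:
    """Inverse-variance blend of the two estimators (hypothetical helper).
    Returns (gradient estimate, weight alpha on the first-order estimate)."""
    rng = random.Random(seed)
    g0 = zeroth_order_samples(f, theta, sigma, n, rng)
    g1 = first_order_samples(df, theta, sigma, n, rng)
    # Variance of each sample mean = sample variance / n.
    v0 = statistics.variance(g0) / n
    v1 = statistics.variance(g1) / n
    # Inverse-variance weighting: the lower-variance estimator gets more weight.
    alpha = v0 / (v0 + v1) if v0 + v1 > 0 else 0.5
    estimate = alpha * statistics.fmean(g1) + (1 - alpha) * statistics.fmean(g0)
    return estimate, alpha


if __name__ == "__main__":
    # Hypothetical smooth toy objective f(x) = x^2; the smoothed gradient is 2*theta.
    grad, alpha = blended_gradient(lambda x: x * x, lambda x: 2 * x, theta=1.0)
    print(f"gradient ~ {grad:.3f}, weight on first-order = {alpha:.3f}")
```

On this smooth toy objective the pathwise samples have far lower variance, so almost all weight goes to the first-order estimate. With a discontinuous objective (for example, a step), the pathwise samples never see the jump: they can have low variance yet be biased, so inverse-variance weighting alone would favor the wrong estimator. That trade-off is what the estimator-selection question in the papers above is about.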