In policy gradient RL, careful variance control and simple estimator switching frequently outperform explicit discontinuity detection even when using differentiable simulators.
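The following is a minimal sketch of what "variance control and simple estimator switching" can look like. It assumes Gaussian randomized smoothing of a scalar objective, where the gradient of E_w[f(x + w)] is estimated two ways: a zeroth-order (score-function) estimate and a first-order (pathwise) estimate. All helper names (`zeroth_order_samples`, `first_order_samples`, `combine`, `switch`, `demo`) are hypothetical. The inverse-variance weight is the textbook rule for two independent estimates and is not necessarily the rule used in either paper. The sketch also ignores bias, which is exactly what can go wrong when the first-order estimate is evaluated across a discontinuity.

```python
import random
import statistics


def mean_and_var_of_mean(samples):
    """Sample mean and estimated variance of that mean (s^2 / n); needs n >= 2."""
    n = len(samples)
    return statistics.fmean(samples), statistics.variance(samples) / n


def zeroth_order_samples(f, x, sigma, n, rng):
    """Score-function estimates of d/dx E_w[f(x + w)], w ~ N(0, sigma^2).

    Subtracting the baseline f(x) keeps the estimator unbiased
    (E[w] = 0) while reducing its variance.
    """
    fx = f(x)
    out = []
    for _ in range(n):
        w = rng.gauss(0.0, sigma)
        out.append((f(x + w) - fx) * w / sigma**2)
    return out


def first_order_samples(df, x, sigma, n, rng):
    """Pathwise estimates: the exact derivative evaluated at perturbed points."""
    return [df(x + rng.gauss(0.0, sigma)) for _ in range(n)]


def combine(g0_samples, g1_samples):
    """Inverse-variance interpolation alpha * g1 + (1 - alpha) * g0.

    Assumes the two estimates are independent (separate noise draws)
    and ignores any bias in either one.
    """
    m0, v0 = mean_and_var_of_mean(g0_samples)
    m1, v1 = mean_and_var_of_mean(g1_samples)
    total = v0 + v1
    alpha = 0.5 if total == 0.0 else v0 / total
    return alpha, alpha * m1 + (1.0 - alpha) * m0


def switch(g0_samples, g1_samples):
    """Hard switch: keep whichever estimate has the smaller estimated variance."""
    m0, v0 = mean_and_var_of_mean(g0_samples)
    m1, v1 = mean_and_var_of_mean(g1_samples)
    return m1 if v1 <= v0 else m0


def demo(seed=0, x=1.0, sigma=0.5, n=2000):
    """Toy smooth objective f(x) = x^2, whose smoothed gradient is exactly 2x."""
    rng = random.Random(seed)
    f = lambda z: z * z
    df = lambda z: 2.0 * z
    g0 = zeroth_order_samples(f, x, sigma, n, rng)
    g1 = first_order_samples(df, x, sigma, n, rng)  # independent noise draws
    alpha, combined = combine(g0, g1)
    return alpha, combined, switch(g0, g1)


if __name__ == "__main__":
    alpha, combined, switched = demo()
    print(f"alpha={alpha:.2f} combined={combined:.3f} switched={switched:.3f}")
```

On this smooth toy objective the first-order samples have far lower variance than the zeroth-order ones, so `alpha` lands well above 0.5 and the hard switch keeps the first-order estimate; both results sit near the true value 2.0.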
Figure 12: Sensitivity analysis of the parameter γ for AoBG in the Ball with Wall landscape analysis (1000 samples).
Cited by 1 Pith paper (field: cs.LG; year: 2026):
Does "Do Differentiable Simulators Give Better Policy Gradients?" Give Better Policy Gradients?