Policy invariance under reward transformations: Theory and application to reward shaping

Andrew Y Ng, Daishi Harada, Stuart Russell · 1999

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning

cs.MA · 2026-02-23 · unverdicted · novelty 7.0

DG-PG augments policy gradients with descent signals from analytical models to reduce estimator variance from O(N) to O(1), preserve game equilibria, and achieve agent-independent sample complexity while converging on 1500-agent tasks where baselines fail.

citing papers explorer

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning

cs.MA · 2026-02-23 · unverdicted · none · ref 17
