pith

Policy gradient methods for reinforcement learning with function approximation. Advances in Neural Information Processing Systems, 12

4 Pith papers cite this work.

fields: cs.LG (3), cs.MA (1)

years: 2026 (2), 2025 (2)

representative citing papers

Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning

cs.MA · 2026-02-23 · unverdicted · novelty 7.0

DG-PG augments policy gradients with descent signals from analytical models to reduce estimator variance from O(N) to O(1), preserve game equilibria, and achieve agent-independent sample complexity while converging on 1500-agent tasks where baselines fail.

Soft Deterministic Policy Gradient with Gaussian Smoothing

cs.LG · 2026-05-07 · unverdicted · novelty 5.0

Soft-DPG uses Gaussian smoothing on the Bellman equation to derive a well-defined policy gradient without relying on critic action derivatives, yielding competitive performance on dense-reward tasks and gains on discretized-reward variants.

citing papers explorer

  • Descent-Guided Policy Gradient for Scalable Cooperative Multi-Agent Learning cs.MA · 2026-02-23 · unverdicted · none · ref 24

  • Soft Deterministic Policy Gradient with Gaussian Smoothing cs.LG · 2026-05-07 · unverdicted · none · ref 24