Advances in Neural Information Processing Systems

Actor-critic algorithms

5 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients

cs.LG · 2026-05-14 · unverdicted · novelty 7.0

HPO enables unbiased policy optimization in hybrid action spaces by mixing differentiable simulation gradients with score-function estimates, outperforming PPO as continuous dimensions increase.

Policy Gradient Methods for Non-Markovian Reinforcement Learning

cs.LG · 2026-05-11 · unverdicted · novelty 6.0

Introduces the Agent State-Markov Policy Gradient (ASMPG) algorithm and a policy gradient theorem for non-Markovian decision processes by jointly optimizing agent state dynamics and control policy.

The Counterexample Game: Iterated Conceptual Analysis and Repair in Language Models

cs.CL · 2026-05-05 · unverdicted · novelty 6.0

Language models engage in counterexample-repair loops for conceptual definitions but produce increasingly verbose outputs without accuracy gains and hit diminishing returns quickly.

Using Common Random Numbers for Simulation-based Planning with Rollouts

cs.LG · 2026-05-06 · unverdicted · novelty 5.0

Using common random numbers in rollout simulations provably reduces variance in relative utility estimates when a rollout policy is invoked beyond some depth.

A note on convergence of Wasserstein policy optimization

cs.LG · 2026-05-21 · unverdicted · novelty 4.0

The note claims linear convergence of WPO in entropy-regularized MDPs by combining mean-field gradient flow analysis with a local log-Sobolev inequality under a regularity assumption.

citing papers explorer

Showing 5 of 5 citing papers.

Policy Optimization in Hybrid Discrete-Continuous Action Spaces via Mixed Gradients
cs.LG · 2026-05-14 · unverdicted · none · ref 5
Policy Gradient Methods for Non-Markovian Reinforcement Learning
cs.LG · 2026-05-11 · unverdicted · none · ref 65
The Counterexample Game: Iterated Conceptual Analysis and Repair in Language Models
cs.CL · 2026-05-05 · unverdicted · none · ref 4
Using Common Random Numbers for Simulation-based Planning with Rollouts
cs.LG · 2026-05-06 · unverdicted · none · ref 9
A note on convergence of Wasserstein policy optimization
cs.LG · 2026-05-21 · unverdicted · none · ref 62
