Greedification operators for policy optimization: Investigating forward and reverse KL divergences

Alan Chan, Hugo Silva, Sungsu Lim, Tadashi Kozuno, A Rupam Mahmood, Martha White · 2022

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces

cs.LG · 2025-09-26 · unverdicted · novelty 6.0

A method trains discrete diffusion policies for combinatorial RL by matching to a PMD-regularized target distribution, reporting SOTA performance and sample efficiency on DNA generation, macro-action, and multi-agent benchmarks.

Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives

cs.LG · 2025-09-11 · conditional · novelty 6.0

Shows entropy coupling limits DSAC on discrete tasks and introduces a generalized actor-critic framework with m-step critics and novel entropy-regularized objectives that perform robustly on Atari.

citing papers explorer

Reinforcement Learning with Discrete Diffusion Policies for Combinatorial Action Spaces · cs.LG · 2025-09-26 · unverdicted · none · ref 10
Dissecting Discrete Soft Actor-Critic: Limitations and Principled Alternatives · cs.LG · 2025-09-11 · conditional · none · ref 5
