You may not need ratio clipping in PPO. arXiv preprint arXiv:2202.00079, 2022.
1 Pith paper cites this work. Polarity classification is still indexing.
Fields: cs.LG (1)
Years: 2026 (1)
Verdicts: UNVERDICTED (1)
Representative citing papers
KLip-PPO: A per-sample KL perspective on PPO-Clip
PPO-Clip gradient equals a per-sample KL surrogate with closed-form coefficient on importance ratio and advantage, yielding identical curves on five MuJoCo tasks.