Title resolution pending

Hunter Lightman, Vineet Kosaraju, Yuri Burda, Harrison Edwards, Bowen Baker, Teddy Lee, Jan Leike, John Schulman, Ilya Sutskever, Karl Cobbe · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

representative citing papers

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

cs.LG · 2025-12-05 · unverdicted · novelty 6.0

Entropy Ratio Clipping introduces a global entropy-ratio constraint that stabilizes RL policy updates in LLM post-training beyond local PPO clipping.

CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

cs.LG · 2025-09-25 · unverdicted · novelty 6.0

CE-GPPO preserves bounded gradients from clipped tokens in PPO to regulate entropy evolution and improve performance on mathematical reasoning benchmarks.

citing papers explorer

Entropy Ratio Clipping as a Soft Global Constraint for Stable Reinforcement Learning

cs.LG · 2025-12-05 · unverdicted · none · ref 10

CE-GPPO: Coordinating Entropy via Gradient-Preserving Clipping Policy Optimization in Reinforcement Learning

cs.LG · 2025-09-25 · unverdicted · none · ref 13
