How does RL policy entropy converge during iteration? https://zhuanlan.zhihu

2 Pith papers cite this work. Polarity classification is still indexing.

Fields: cs.LG (2)

Years: 2026 (1), 2025 (1)

Verdicts: unverdicted (2)

representative citing papers

Rethinking Entropy Interventions in RLVR: An Entropy Change Perspective

cs.LG · 2025-10-11 · unverdicted · novelty 5.0

Derives a token-level approximation of entropy change that reveals four contributing factors, identifies limitations in prior entropy interventions, and proposes STEER, which adaptively reweights tokens to mitigate entropy collapse and improve performance on math and coding benchmarks.
