Training language models to follow instructions with human feedback
1 Pith paper cites this work. Polarity classification is still indexing.
citation-role summary: background 1
fields: cs.LG 1
years: 2026 1
verdicts: UNVERDICTED 1
roles: background 1
polarities: background 1
representative citing papers
A Comparative Theoretical Analysis of Entropy Control Methods in Reinforcement Learning
Covariance-based entropy control selectively regularizes high-covariance tokens in softmax policies and achieves asymptotic unbiasedness upon annealing, unlike traditional regularization which introduces dense bias and alters the stationary distribution.