Introduces RSI metric and RSI-S filtering method for adaptive token selection in RLVR, reporting 2-3 point gains over GRPO on AIME/AMC benchmarks.
On the entropy dynamics in reinforcement fine-tuning of large language models
4 Pith papers cite this work.
Years: 2026 (4)
Representative citing papers
Fine-tuning reorganizes uncertainty in LLMs into more efficient information conveyance, as shown by stronger length-entropy correlations and a tripling of entropy-semantic diversity links after controls.
Entrocraft uses rejection sampling to enforce precise entropy schedules in LLM RL by biasing advantages, enabling longer training, better generalization, and higher performance than baselines.