Beyond the 80/20 rule: High-entropy minority tokens drive effective reinforcement learning for llm reasoning

Shenzhi Wang, Le Yu, Chang Gao, Chujie Zheng, Shixuan Liu, Rui Lu, Kai Dang, Xionghui Chen, Jianxin Yang, Zhenru Zhang, Yuqiong Liu, An Yang, Andrew Zhao, Yang Yue, Shiji Song, Bowen Yu, Gao Huang, Junyang Lin · 2025

2 Pith papers cite this work. Polarity classification is still indexing.

2 Pith papers citing it

browse 2 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.

Heterogeneous Adaptive Policy Optimization: Tailoring Optimization to Every Token's Nature

cs.CL · 2025-09-20 · unverdicted · novelty 5.0

HAPO is a new token-level policy optimization method for LLMs that continuously adapts four optimization stages using entropy, claiming consistent gains over DAPO on math, code, and logic tasks.

citing papers explorer

Showing 2 of 2 citing papers.

GEAR: Granularity-Adaptive Advantage Reweighting for LLM Agents via Self-Distillation cs.LG · 2026-05-12 · unverdicted · none · ref 14 · 2 links
GEAR adaptively reweights GRPO advantages in LLM RL by using divergence spikes from self-distillation to define semantic segments and modulate local credit.
Heterogeneous Adaptive Policy Optimization: Tailoring Optimization to Every Token's Nature cs.CL · 2025-09-20 · unverdicted · none · ref 27
HAPO is a new token-level policy optimization method for LLMs that continuously adapts four optimization stages using entropy, claiming consistent gains over DAPO on math, code, and logic tasks.

Beyond the 80/20 rule: High-entropy minority tokens drive effective reinforcement learning for llm reasoning

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer