Reasoning with exploration: An entropy perspective

Daixuan Cheng, Shaohan Huang, Xuekai Zhu, Bo Dai, Xin Zhao, Zhenliang Zhang, Furu Wei · 2026 · DOI 10.1609/aaai.v40i36.40290

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open at publisher browse 3 citing papers

representative citing papers

Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning

cs.LG · 2026-05-21 · unverdicted · novelty 6.0

DASD improves math reasoning in LLMs by adaptively directing self-distillation based on per-token entropy to balance exploration and step accuracy, outperforming prior self-distillation and RLVR baselines on six benchmarks.

Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control

cs.LG · 2026-05-12 · unverdicted · novelty 6.0 · 2 refs

Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.

Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens

cs.CL · 2026-04-20 · unverdicted · novelty 6.0

ETW uses predictive entropy as a proxy for token informativeness to improve selective unlearning in LLMs, achieving better forgetting with less utility loss than prior token-level methods.

citing papers explorer

Showing 3 of 3 citing papers.

Tailoring Teaching to Aptitude: Direction-Adaptive Self-Distillation for LLM Reasoning cs.LG · 2026-05-21 · unverdicted · none · ref 31
DASD improves math reasoning in LLMs by adaptively directing self-distillation based on per-token entropy to balance exploration and step accuracy, outperforming prior self-distillation and RLVR baselines on six benchmarks.
Entropy Polarity in Reinforcement Fine-Tuning: Direction, Asymmetry, and Control cs.LG · 2026-05-12 · unverdicted · none · ref 45 · 2 links
Entropy polarity is a signed token-level quantity derived from a first-order approximation of entropy change that predicts whether RL updates expand or contract policy entropy in LLM fine-tuning, revealing an asymmetry between high- and low-probability tokens.
Forget What Matters, Keep the Rest: Selective Unlearning of Informative Tokens cs.CL · 2026-04-20 · unverdicted · none · ref 28
ETW uses predictive entropy as a proxy for token informativeness to improve selective unlearning in LLMs, achieving better forgetting with less utility loss than prior token-level methods.

Reasoning with exploration: An entropy perspective

fields

years

verdicts

representative citing papers

citing papers explorer