Entropy change with contrast score. As established in §D, our framework is equivalent to policy gradient when the contrast score is used as the advantage.

Thus, when high-probability actions tend to carry positive advantage (positive covariance), entropy decreases; when advantage is concentrated on low-probability actions (negative covariance), entropy can increase (Cui et al., 2025).
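The covariance argument can be checked numerically. The sketch below makes several assumptions: a single-state softmax policy over a few discrete actions, a fixed advantage vector standing in for the contrast score (whose definition is not reproduced here), and one exact (expected, not sampled) policy-gradient step on the logits. For a softmax policy, a first-order expansion gives $\Delta H \approx -\operatorname{Cov}_{a\sim\pi}(\log\pi(a), \Delta z_a)$. With the vanilla gradient step $\Delta z_a = \eta\,\pi(a)A(a)$ (baseline-subtracted $A$), this becomes $\Delta H \approx -\eta\,\operatorname{Cov}_{a\sim\pi}(\log\pi(a), \pi(a)A(a))$. The helper `entropy_change` is hypothetical and exists only for illustration.

```python
import numpy as np


def softmax(z: np.ndarray) -> np.ndarray:
    """Numerically stable softmax over a 1-D array of logits."""
    e = np.exp(z - z.max())
    return e / e.sum()


def entropy(p: np.ndarray) -> float:
    """Shannon entropy in nats; assumes every p > 0 (true for softmax outputs)."""
    return float(-(p * np.log(p)).sum())


def entropy_change(z: np.ndarray, adv: np.ndarray, lr: float) -> tuple[float, float]:
    """Hypothetical helper: return (actual, first-order predicted) entropy change
    after one exact policy-gradient step on the logits of a single-state softmax policy."""
    pi = softmax(z)
    a = adv - pi @ adv                    # baseline-subtract so that E_pi[A] = 0
    grad = pi * a                         # d/dz_b E_pi[A] = pi_b * (A_b - E_pi[A])
    actual = entropy(softmax(z + lr * grad)) - entropy(pi)
    logp = np.log(pi)
    # Cov_{a~pi}(log pi(a), pi(a) * A(a)), taken under the policy's own distribution
    cov = pi @ (logp * grad) - (pi @ logp) * (pi @ grad)
    return actual, float(-lr * cov)


z = np.array([2.0, 0.5, -1.0])  # action 0 is the most likely
# Advantage on the most likely action: positive covariance, entropy falls.
print(entropy_change(z, np.array([1.0, 0.0, 0.0]), lr=0.01))
# Advantage on the least likely action: negative covariance, entropy rises.
print(entropy_change(z, np.array([0.0, 0.0, 1.0]), lr=0.01))
```

The exact expected gradient keeps the comparison deterministic, so the actual and predicted changes agree up to second-order terms in the step size. The first-order change is $-\operatorname{Cov}(\log\pi, \Delta z)$ for any small logit update. Switching to a different update rule therefore changes only the second argument of the covariance: a natural-gradient step $\Delta z_a = \eta A(a)$, for example, yields $-\eta\,\operatorname{Cov}(\log\pi, A)$.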

1 Pith paper cites this work.


representative citing papers

LightReasoner: Can Small Language Models Teach Large Language Models Reasoning?

cs.CL · 2025-10-09 · unverdicted · novelty 6.0 · ref 26

LightReasoner distills supervision signals from the behavioral divergence between a small language model (SLM) and a large language model (LLM) to improve LLM reasoning on math benchmarks, reporting accuracy gains of up to 28.1% and 90-99% reductions in resource use.
