Title resolution pending

Beyond the 80/20 Rule: High-Entropy Minority Tokens Drive Effective Reinforcement Learning for LLM Reasoning , author= · 2025

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

browse 4 citing papers

Title metadata for this work has not finished resolving. The hub is built from the citation graph; the title resolver retries DOI and OpenAlex on its next pass.

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity

cs.LG · 2026-05-01 · unverdicted · novelty 7.0

UCPO modifies GRPO with a uniformity penalty over correct solutions to prevent diversity collapse in RLVR, yielding up to 10% higher Pass@64 on AIME24 and 45% more equation-level diversity.

Forecasting Downstream Performance of LLMs With Proxy Metrics

cs.CL · 2026-05-18 · unverdicted · novelty 6.0

Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.

Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models

cs.AI · 2026-05-18 · unverdicted · novelty 5.0

Defines Entropy-Gradient Inversion as a geometric fingerprint of LRM reasoning and introduces CorR-PO to embed it in RL reward regularization, reporting improved benchmark performance.

Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation

cs.CL · 2026-05-12 · unverdicted · novelty 5.0 · 2 refs

On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.

citing papers explorer

Showing 4 of 4 citing papers.

Uniform-Correct Policy Optimization: Breaking RLVR's Indifference to Diversity cs.LG · 2026-05-01 · unverdicted · none · ref 12
UCPO modifies GRPO with a uniformity penalty over correct solutions to prevent diversity collapse in RLVR, yielding up to 10% higher Pass@64 on AIME24 and 45% more equation-level diversity.
Forecasting Downstream Performance of LLMs With Proxy Metrics cs.CL · 2026-05-18 · unverdicted · none · ref 91
Proxy metrics from next-token distributions over expert solutions outperform loss and compute baselines for ranking LLMs, selecting pretraining data, and extrapolating performance across compute scales.
Entropy-Gradient Inversion: Moving Toward Internal Mechanism of Large Reasoning Models cs.AI · 2026-05-18 · unverdicted · none · ref 37
Defines Entropy-Gradient Inversion as a geometric fingerprint of LRM reasoning and introduces CorR-PO to embed it in RL reward regularization, reporting improved benchmark performance.
Learning to Foresee: Unveiling the Unlocking Efficiency of On-Policy Distillation cs.CL · 2026-05-12 · unverdicted · none · ref 56 · 2 links
On-policy distillation gains efficiency from early foresight in module allocation and update directions, which the proposed EffOPD method exploits for 3x faster training with comparable performance.

Title resolution pending

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer