Title resolution pending

1 Pith paper cites this work. Polarity classification is still indexing.

fields: cs.LG (1)

years: 2026 (1)

verdicts: unverdicted (1)

representative citing papers

OISD: On-Policy Internal Self-Distillation of Language Models

cs.LG · 2026-05-27 · unverdicted · novelty 6.0

OISD improves mathematical reasoning in language models by using the final layer as an internal teacher to align logits and attention patterns in selected intermediate layers via signed advantage-weighted Jensen-Shannon divergence during GRPO optimization.

citing papers explorer

Showing 1 of 1 citing paper after filters.

  • OISD: On-Policy Internal Self-Distillation of Language Models
    cs.LG · 2026-05-27 · unverdicted · none · ref 2
