Nanbeige4-3b technical report: Exploring the frontier of small language models. arXiv preprint arXiv:2512.06266

1 Pith paper cites this work. Polarity classification is still indexing.

1 Pith paper citing it

fields

cs.LG 1

years

2026 1

verdicts

UNVERDICTED 1

representative citing papers

Extreme Region Policy Distillation

cs.LG · 2026-05-25 · unverdicted · novelty 6.0

ERPD decouples aggressive off-policy optimization on fixed trajectories from trust-region distillation to achieve comparable or better LLM performance with substantially smaller KL divergence.

citing papers explorer

Showing 1 of 1 citing paper.

  • Extreme Region Policy Distillation cs.LG · 2026-05-25 · unverdicted · none · ref 12
