Nanbeige4-3b technical report: Exploring the frontier of small language models. arXiv preprint arXiv:2512.06266

Chen Yang, Guangyue Peng, Jiaying Zhu, Ran Le, Ruixiang Feng, Tao Zhang, Wei Ruan, Xiaoqi Liu, Xiaoxue Cheng, Xiyun Xu, et al. · arXiv 2512.06266

1 Pith paper cites this work. Polarity classification is still being indexed.


representative citing papers

Extreme Region Policy Distillation

cs.LG · 2026-05-25 · unverdicted · novelty 6.0

ERPD decouples aggressive off-policy optimization on fixed trajectories from trust-region distillation to achieve comparable or better LLM performance with substantially smaller KL divergence.

citing papers explorer


Extreme Region Policy Distillation cs.LG · 2026-05-25 · unverdicted · none · ref 12

