QwenLong-L1: Towards Long-Context Large Reasoning Models with Reinforcement Learning

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · dataset: 1

citation-polarity summary

years

2026: 6 · 2025: 2

verdicts

unverdicted: 8

representative citing papers

OPSDL: On-Policy Self-Distillation for Long-Context Language Models

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

OPSDL improves long-context LLM performance by having the model self-distill from its short-context capability using point-wise reverse KL divergence on generated tokens, outperforming SFT and DPO on benchmarks without harming short-context abilities.
