Language models can self-lengthen to generate long texts
4 Pith papers cite this work. Polarity classification is still indexing.
Fields: cs.CL (4)
Verdicts: UNVERDICTED (4)
Citing papers:
- A Data-Efficient Path to Multilingual LLMs: Language Expansion via Post-training PARAMΔ Integration into Upcycled MoE
  PARAMΔ upcycles dense models to MoE with per-language experts and grafts post-training deltas onto them, enabling data-efficient language expansion while preserving the original capabilities.
- LongWriter-Zero: Mastering Ultra-Long Text Generation via Reinforcement Learning
  LongWriter-Zero applies RL from a base model, with specialized rewards for length, quality, and structure, to outperform SFT baselines and larger models on long-writing benchmarks.
- Writing-RL: Advancing Long-form Writing via Adaptive Curriculum Reinforcement Learning
  Writing-RL applies adaptive curriculum RL with pairwise rewards and dynamic scheduling to improve long-form writing in 7B LLMs over SFT baselines, and shows generalization to long-input reasoning tasks.
- Qwen2.5 Technical Report
  Qwen2.5 LLMs scale pre-training data to 18 trillion tokens and apply multistage reinforcement learning, achieving benchmark performance competitive with models up to 5 times larger.