QwenLong-L1: Towards long-context large reasoning models with reinforcement learning

Fanqi Wan, Weizhou Shen, Shengyi Liao, Yingcheng Shi, Chenliang Li, Ziyi Yang, Ji Zhang, Fei Huang, Jingren Zhou, Ming Yan · 2025 · arXiv 2505.17667

8 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background: 1 · dataset: 1

citation-polarity summary

background: 1 · use dataset: 1

representative citing papers

StoryAlign: Evaluating and Training Reward Models for Story Generation

cs.CL · 2026-05-06 · unverdicted · novelty 7.0

StoryReward, trained on a new 100k story preference dataset, sets state-of-the-art performance on the introduced StoryRMB benchmark for aligning LLM stories with human preferences.

OPSDL: On-Policy Self-Distillation for Long-Context Language Models

cs.CL · 2026-04-19 · unverdicted · novelty 6.0

OPSDL improves long-context LLM performance by having the model self-distill from its short-context capability using point-wise reverse KL divergence on generated tokens, outperforming SFT and DPO on benchmarks without harming short-context abilities.

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites its memory, extrapolating from an 8K training context to 3.5M-token QA with under 5% performance loss and scoring 95%+ on 512K RULER.

A Recipe for Long-Context Reasoning in Large Language Models via On-Policy Optimization and Distillation

cs.CL · 2026-05-12 · unverdicted · novelty 5.0

Combines GRPO with teacher-guided on-policy distillation and introduces the LongBlocks dataset, yielding more stable long-context reasoning than either method alone.

LongAct: Harnessing Intrinsic Activation Patterns for Long-Context Reinforcement Learning

cs.LG · 2026-04-16 · unverdicted · novelty 5.0

LongAct uses saliency from high-magnitude activations to guide sparse weight updates in long-context RL, yielding about 8% gains on LongBench v2 across multiple algorithms.

A Decomposition Perspective to Long-context Reasoning for LLMs

cs.CL · 2026-04-09 · unverdicted · novelty 5.0

Decomposing long-context reasoning into atomic skills, synthesizing targeted pseudo-datasets, and applying RL improves LLM performance on long-context benchmarks by an average of 7.7%.

GLM-4.5: Agentic, Reasoning, and Coding (ARC) Foundation Models

cs.CL · 2025-08-08 · unverdicted · novelty 4.0

GLM-4.5, a 355B-parameter MoE model with hybrid reasoning, scores 70.1% on TAU-Bench, 91.0% on AIME 24, and 64.2% on SWE-bench Verified while ranking 3rd overall and 2nd on agentic benchmarks.

Internalized Reasoning for Long-Context Visual Document Understanding

cs.CV · 2026-03-31

citing papers explorer

Internalized Reasoning for Long-Context Visual Document Understanding cs.CV · 2026-03-31 · unreviewed · ref 47
