QwQ-32B: Embracing the Power of Reinforcement Learning, 2024

Qwen · 2024

2 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

cs.CL · 2025-07-03 · unverdicted · novelty 6.0

MemAgent uses multi-conversation RL to train a memory agent that reads text in segments and overwrites its memory, extrapolating from an 8K-token training context to 3.5M-token QA with under 5% performance loss and scoring above 95% on the 512K RULER test.

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

cs.LG · 2025-03-18 · conditional · novelty 6.0

DAPO introduces decoupled clipping and dynamic sampling for LLM RL, achieving 50 points on AIME 2024 with Qwen2.5-32B while fully open-sourcing its code, data, and verl-based training system.

citing papers explorer

Showing 2 of 2 citing papers.

MemAgent: Reshaping Long-Context LLM with Multi-Conv RL-based Memory Agent

cs.CL · 2025-07-03 · unverdicted · none · ref 52

DAPO: An Open-Source LLM Reinforcement Learning System at Scale

cs.LG · 2025-03-18 · conditional · none · ref 10
