GoLongRL releases a 23K-sample open long-context RL dataset spanning 9 tasks and introduces TMN-Reweight to improve multitask optimization, achieving performance comparable to much larger models under GRPO.
Proceedings of the Student Research Workshop at the 14th Conference of the European Chapter of the Association for Computational Linguistics , month =
1 Pith paper cite this work. Polarity classification is still indexing.
1
Pith paper citing it
fields
cs.CL 1years
2026 1verdicts
UNVERDICTED 1representative citing papers
citing papers explorer
-
GoLongRL: Capability-Oriented Long Context Reinforcement Learning with Multitask Alignment
GoLongRL releases a 23K-sample open long-context RL dataset spanning 9 tasks and introduces TMN-Reweight to improve multitask optimization, achieving performance comparable to much larger models under GRPO.