We have also tried different sampling batch sizes and gradient update batch sizes to vary the maximum number of off-policy update

We use learning rate 5e-7 for Qwen2

1 Pith paper cites this work. Polarity classification is still indexing.

representative citing papers

The Unlearnability Phenomenon in RLVR for Language Models

cs.LG · 2026-05-16 · unverdicted · novelty 7.0

RLVR training for language models exhibits an unlearnability phenomenon where certain hard examples stay unlearnable due to low gradient similarity and ungeneralizable reasoning patterns.

citing papers explorer

The Unlearnability Phenomenon in RLVR for Language Models

cs.LG · 2026-05-16 · unverdicted · none · ref 3