pith.

Open R1: A fully open reproduction of DeepSeek-R1, January 2025

5 Pith papers cite this work. Polarity classification is still being indexed.


citation summary

years: 2026 (3), 2025 (2)
verdicts: UNVERDICTED (5)
roles: dataset (2)
polarities: use dataset (2)

representative citing papers

Learning to Reason under Off-Policy Guidance

cs.LG · 2025-04-21 · unverdicted · novelty 6.0

LUFFY mixes off-policy reasoning traces into RLVR training via Mixed-Policy GRPO and regularized importance sampling, delivering gains of over 6 points on math benchmarks and enabling the training of weak models where on-policy RLVR fails.
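As a rough illustration of why mixing off-policy traces can help weak models, the Python sketch below computes GRPO-style group-normalized advantages over a group that pools on-policy rollouts with off-policy traces. It is a simplified sketch under stated assumptions, not LUFFY's implementation: the helper names `group_advantages`, `mixed_policy_advantages`, and `shaped_weight` are hypothetical, and the shaping function f(p) = p / (p + γ) with γ = 0.1 is an assumed form of the regularized importance weight.

```python
import math


def group_advantages(rewards, eps=1e-6):
    """GRPO-style advantage: standardize each reward within its group."""
    n = len(rewards)
    mean = sum(rewards) / n
    std = math.sqrt(sum((r - mean) ** 2 for r in rewards) / n)
    # eps avoids division by zero when every reward in the group is equal
    return [(r - mean) / (std + eps) for r in rewards]


def mixed_policy_advantages(on_policy_rewards, off_policy_rewards):
    """Hypothetical helper: pool on-policy rollouts and off-policy traces
    into one group, normalize jointly, then split the advantages back."""
    pooled = list(on_policy_rewards) + list(off_policy_rewards)
    adv = group_advantages(pooled)
    k = len(on_policy_rewards)
    return adv[:k], adv[k:]


def shaped_weight(prob, gamma=0.1):
    """Assumed regularized weight f(p) = p / (p + gamma) for off-policy tokens."""
    return prob / (prob + gamma)


# Three failed on-policy rollouts (reward 0) plus one correct off-policy trace.
on_adv, off_adv = mixed_policy_advantages([0.0, 0.0, 0.0], [1.0])
print(on_adv, off_adv)
```

In this example the pooled group gives the failed rollouts negative advantages and the correct trace a positive one, whereas a purely on-policy group of all-zero rewards gives every sample zero advantage and therefore no learning signal.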
