TinyV: Reducing False Negatives in Verification Improves RL for LLM Reasoning

Zhangchen Xu, Yuetai Li, Fengqing Jiang, Bhaskar Ramasubramanian, Luyao Niu, Bill Yuchen Lin, Radha Poovendran · 2025 · arXiv 2505.14625

3 Pith papers cite this work.

Representative citing papers

Delay, Plateau, or Collapse: Evaluating the Impact of Systematic Verification Error on RLVR

cs.LG · 2026-04-06 · unverdicted · novelty 6.0 · cites this work as ref. 22

Systematic false positives in verifiers can cause RLVR training to reach suboptimal plateaus or collapse, with outcomes driven by error patterns rather than overall error rate.

Reinforcement Learning with Verifiable yet Noisy Rewards under Imperfect Verifiers

cs.LG · 2025-10-01 · unverdicted · novelty 6.0 · cites this work as ref. 34

Derives backward and forward corrections for asymmetric verifier noise that improve RLVR performance on math reasoning tasks.

High-Dimensional Statistics: Reflections on Progress and Open Problems

math.ST · 2026-05-06 · unreviewed · cites this work as ref. 105
