Learn more with less: Uncertainty consistency guided query selection for RLVR. arXiv preprint arXiv:2601.22595, 2026

Hao Yi, Yulan Hu, Xin Li, Sheng Ouyang, Lizhong Ding, Yong Liu · 2026 · arXiv 2601.22595

1 Pith paper cites this work. Polarity classification is still being indexed.

representative citing papers

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards

cs.LG · 2026-05-25 · unverdicted · novelty 5.0

RLAVR uses the Corrective Advantage Gap metric and CARE policy to actively acquire ground-truth labels for key samples, stabilizing RLVR training and boosting performance with limited annotation budgets.

citing papers explorer

Showing 1 of 1 citing paper.

When Self-Belief Misleads: Active Label Acquisition for Reinforcement Learning with Verifiable Rewards

cs.LG · 2026-05-25 · unverdicted · none · ref 45

RLAVR uses the Corrective Advantage Gap metric and CARE policy to actively acquire ground-truth labels for key samples, stabilizing RLVR training and boosting performance with limited annotation budgets.
