arXiv preprint arXiv:2402.12146 , year=

Liu, Z · 2024 · arXiv 2402.12146

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing

cs.LG · 2026-02-03 · unverdicted · novelty 7.0

Positive-negative prompt pairing with weighted GRPO improves RLVR sample efficiency, raising AIME 2025 Pass@8 from 16.8 to 22.2 on Qwen2.5-Math-7B while matching large-scale training.

Reinforcement Learning for Reasoning in Large Language Models with One Training Example

cs.LG · 2025-04-29 · accept · novelty 7.0

One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.

HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment

cs.LG · 2026-04-20 · unverdicted · novelty 6.0

HEAL mitigates entropy collapse in few-shot RLVR by selectively adding general-domain data and aligning trajectory-level entropy dynamics, matching full-shot performance with 32 target samples.

citing papers explorer

Showing 3 of 3 citing papers.

Beyond Variance: Prompt-Efficient RLVR via Rare-Event Amplification and Bidirectional Pairing cs.LG · 2026-02-03 · unverdicted · none · ref 16
Positive-negative prompt pairing with weighted GRPO improves RLVR sample efficiency, raising AIME 2025 Pass@8 from 16.8 to 22.2 on Qwen2.5-Math-7B while matching large-scale training.
Reinforcement Learning for Reasoning in Large Language Models with One Training Example cs.LG · 2025-04-29 · accept · none · ref 53
One training example via RLVR boosts LLM math reasoning from 17.6% to 35.7% average across six benchmarks.
HEALing Entropy Collapse: Enhancing Exploration in Few-Shot RLVR via Hybrid-Domain Entropy Dynamics Alignment cs.LG · 2026-04-20 · unverdicted · none · ref 94
HEAL mitigates entropy collapse in few-shot RLVR by selectively adding general-domain data and aligning trajectory-level entropy dynamics, matching full-shot performance with 32 target samples.

arXiv preprint arXiv:2402.12146 , year=

fields

years

verdicts

representative citing papers

citing papers explorer