Data Diversity Matters for Robust Instruction Tuning
2 Pith papers cite this work, alongside 11 external citations. Polarity classification is still indexing.
citation-role summary: background 1
citation-polarity summary
fields: cs.LG 2
years: 2026 2
roles: background 1
polarities: background 1
representative citing papers
- Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
FEST improves RLVR sample efficiency on math and coding benchmarks by combining supervised signals, on-policy signals, and decaying weights on just 128 randomly chosen demonstrations, matching full-dataset baselines.
- Weasel: Out-of-Domain Generalization for Web Agents via Importance-Diversity Data Selection