FEST improves RLVR sample efficiency on math and coding benchmarks by combining supervised signals, on-policy signals, and decaying weights on just 128 randomly chosen demonstrations, matching full-dataset baselines.
Title resolution pending
2 Pith papers cite this work, alongside 6 external citations. Polarity classification is still indexing.
2
Pith papers citing it
6
external citations · Crossref
citation-role summary
background 2
citation-polarity summary
verdicts
UNVERDICTED 2roles
background 2polarities
background 2representative citing papers
A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.
citing papers explorer
-
Boosting Reinforcement Learning with Verifiable Rewards via Randomly Selected Few-Shot Guidance
FEST improves RLVR sample efficiency on math and coding benchmarks by combining supervised signals, on-policy signals, and decaying weights on just 128 randomly chosen demonstrations, matching full-dataset baselines.
-
Bridging the Linguistic Divide: A Survey on Leveraging Large Language Models for Machine Translation
A literature survey that organizes prompting, fine-tuning, preference optimization, and context-aware techniques for LLM-based machine translation with emphasis on low-resource languages.