Prefix-RFT blends SFT and RFT via prefix sampling from demonstrations to outperform standalone SFT, RFT, and mixed-policy baselines on math reasoning problems.
Title resolution pending
3 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
fields
cs.LG 3verdicts
UNVERDICTED 3representative citing papers
RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.
AWAC combines offline data with online RL via advantage-weighted actor-critic updates to enable faster acquisition of robotic skills such as dexterous manipulation.
citing papers explorer
-
Blending Supervised and Reinforcement Fine-Tuning with Prefix Sampling
Prefix-RFT blends SFT and RFT via prefix sampling from demonstrations to outperform standalone SFT, RFT, and mixed-policy baselines on math reasoning problems.
-
$\pi^{*}_{0.6}$: a VLA That Learns From Experience
RECAP enables a generalist VLA to self-improve via advantage-conditioned RL on mixed real-world data, more than doubling throughput and halving failure rates on hard manipulation tasks.
-
AWAC: Accelerating Online Reinforcement Learning with Offline Datasets
AWAC combines offline data with online RL via advantage-weighted actor-critic updates to enable faster acquisition of robotic skills such as dexterous manipulation.