VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.
arXiv preprint arXiv:2403.12203 , year=
3 Pith papers cite this work. Polarity classification is still indexing.
representative citing papers
Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.
SCAL derives an upper bound on target-domain imitation loss using source loss plus state-conditional latent KL divergence and aligns distributions via a discriminator-based adversarial estimator.
citing papers explorer
-
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.
-
Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.
-
State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning
SCAL derives an upper bound on target-domain imitation loss using source loss plus state-conditional latent KL divergence and aligns distributions via a discriminator-based adversarial estimator.