arXiv preprint arXiv:2402.02868
3 Pith papers cite this work.
Citing papers
- Hierarchical Behaviour Spaces
  Hierarchical Behaviour Spaces uses linear combinations of reward functions to induce expressive behavior spaces in hierarchical RL, yielding strong performance on NetHack primarily through better exploration rather than long-term planning.
- VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
  VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.
- Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
  Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.