arXiv preprint arXiv:2402.02868 , year=

Wołczyk, M · 2024 · arXiv 2402.02868

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

representative citing papers

cs.AI · 2026-04-27 · unverdicted · novelty 6.0

Hierarchical Behaviour Spaces uses linear combinations of reward functions to induce expressive behavior spaces in hierarchical RL, yielding strong performance on NetHack primarily through better exploration rather than long-term planning.

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning

cs.RO · 2025-05-24 · conditional · novelty 6.0

VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.

citing papers explorer

Showing 3 of 3 citing papers.

Hierarchical Behaviour Spaces cs.AI · 2026-04-27 · unverdicted · none · ref 19
Hierarchical Behaviour Spaces uses linear combinations of reward functions to induce expressive behavior spaces in hierarchical RL, yielding strong performance on NetHack primarily through better exploration rather than long-term planning.
VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning cs.RO · 2025-05-24 · conditional · none · ref 76
VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.
Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies cs.LG · 2026-05-12 · unverdicted · none · ref 71
Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.

arXiv preprint arXiv:2402.02868 , year=

fields

years

verdicts

representative citing papers

citing papers explorer