arXiv preprint arXiv:2403.12203, 2024

Jiaxu Xing, Angel Romero, Leonard Bauersfeld, Davide Scaramuzza · 2024 · arXiv 2403.12203

3 Pith papers cite this work. Polarity classification is still indexing.

representative citing papers

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning

cs.RO · 2025-05-24 · conditional · novelty 6.0

VLA-RL applies online RL to pretrained VLAs, yielding a 4.5% gain over strong baselines on 40 LIBERO manipulation tasks and matching commercial models like π₀-FAST.

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies

cs.LG · 2026-05-12 · unverdicted · novelty 5.0

Unsupervised behavioral mode discovery combined with mutual information rewards enables RL fine-tuning of multimodal generative policies that achieves higher success rates without losing action diversity.

State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning

cs.RO · 2025-12-05 · unverdicted · novelty 5.0

SCAL derives an upper bound on target-domain imitation loss using source loss plus state-conditional latent KL divergence and aligns distributions via a discriminator-based adversarial estimator.

citing papers explorer

Showing 3 of 3 citing papers.

VLA-RL: Towards Masterful and General Robotic Manipulation with Scalable Reinforcement Learning
cs.RO · 2025-05-24 · conditional · none · ref 77

Behavioral Mode Discovery for Fine-tuning Multimodal Generative Policies
cs.LG · 2026-05-12 · unverdicted · none · ref 22

State-Conditional Adversarial Learning: An Off-Policy Visual Domain Transfer Method for End-to-End Imitation Learning
cs.RO · 2025-12-05 · unverdicted · none · ref 28
