PaIR-Drive runs IL and RL in parallel branches with a tree-structured sampler to reach 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks while outperforming sequential RL fine-tuning and correcting some human errors.
Efficient online reinforcement learning with offline data
2 Pith papers cite this work. Polarity classification is still indexing.
2
Pith papers citing it
years
2026 2verdicts
UNVERDICTED 2representative citing papers
RL Token enables sample-efficient online RL fine-tuning of large VLAs, delivering up to 3x speed gains and higher success rates on real-robot manipulation tasks within minutes to hours.
citing papers explorer
-
Fine-tuning is Not Enough: A Parallel Framework for Collaborative Imitation and Reinforcement Learning in End-to-end Autonomous Driving
PaIR-Drive runs IL and RL in parallel branches with a tree-structured sampler to reach 91.2 PDMS and 87.9 EPDMS on NAVSIM benchmarks while outperforming sequential RL fine-tuning and correcting some human errors.
-
RL Token: Bootstrapping Online RL with Vision-Language-Action Models
RL Token enables sample-efficient online RL fine-tuning of large VLAs, delivering up to 3x speed gains and higher success rates on real-robot manipulation tasks within minutes to hours.