RePO-VLA raises average adversarial success rates in VLA manipulation from 20% to 75% by using recovery-aware initialization, a progress-aware semantic value function, and value-conditioned refinement on success and corrective trajectories.
Yell at your robot: Improving on-the-fly from language corrections
9 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
roles
background 3polarities
background 3representative citing papers
π_{0.5} is a VLA model that achieves long-horizon dexterous manipulation in entirely new homes through co-training on heterogeneous tasks and multi-source data including web and semantic predictions.
OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.
A hierarchical VLA architecture lets robots follow complex instructions and situated feedback by separating high-level reasoning from low-level control.
DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
AFIL trains dual action generators on success and failure rollouts from a pretrained VLA to steer diffusion policies away from failure modes during inference.
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
TinyVLA achieves faster inference and higher data efficiency than OpenVLA on robotic manipulation tasks by initializing from high-speed multimodal models and adding a diffusion policy decoder, without any pre-training phase.
citing papers explorer
-
RePO-VLA: Recovery-Driven Policy Optimization for Vision-Language-Action Models
RePO-VLA raises average adversarial success rates in VLA manipulation from 20% to 75% by using recovery-aware initialization, a progress-aware semantic value function, and value-conditioned refinement on success and corrective trajectories.
-
$\pi_{0.5}$: a Vision-Language-Action Model with Open-World Generalization
π_{0.5} is a VLA model that achieves long-horizon dexterous manipulation in entirely new homes through co-training on heterogeneous tasks and multi-source data including web and semantic predictions.
-
Fine-Tuning Vision-Language-Action Models: Optimizing Speed and Success
OpenVLA-OFT fine-tuning boosts LIBERO success rate from 76.5% to 97.1%, speeds action generation 26x, and outperforms baselines on real bimanual dexterous tasks.
-
Hi Robot: Open-Ended Instruction Following with Hierarchical Vision-Language-Action Models
A hierarchical VLA architecture lets robots follow complex instructions and situated feedback by separating high-level reasoning from low-level control.
-
DexVLA: Vision-Language Model with Plug-In Diffusion Expert for General Robot Control
DexVLA combines a scaled diffusion action expert with embodiment curriculum learning to achieve better generalization and performance than prior VLA models on diverse robot hardware and long-horizon tasks.
-
DyGRO-VLA: Cross-Task Scaling of Vision-Language-Action Models via Dynamic Grouped Residual Optimization
DyGRO-VLA is a two-stage optimization framework for cross-task scaling of Vision-Language-Action models via dynamic grouped residual optimization in RL.
-
Failing Forward: Adaptive Failure-Informed Learning for Vision-Language-Action Models
AFIL trains dual action generators on success and failure rollouts from a pretrained VLA to steer diffusion policies away from failure modes during inference.
-
A Survey on Vision-Language-Action Models: An Action Tokenization Perspective
The survey frames VLA models as pipelines that generate progressively grounded action tokens and classifies those tokens into eight types to guide future development.
-
TinyVLA: Towards Fast, Data-Efficient Vision-Language-Action Models for Robotic Manipulation
TinyVLA achieves faster inference and higher data efficiency than OpenVLA on robotic manipulation tasks by initializing from high-speed multimodal models and adding a diffusion policy decoder, without any pre-training phase.