Capability vectors extracted from parameter differences between standard and auxiliary-finetuned VLA models can be merged into pretrained weights to match auxiliary-training performance while reducing computational overhead during adaptation.
Shortcut learning in generalist robot policies: The role of dataset diversity and fragmentation.arXiv preprint arXiv:2508.06426
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 4roles
background 3polarities
background 3representative citing papers
Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.
DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.
A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.
citing papers explorer
-
CapVector: Learning Transferable Capability Vectors in Parametric Space for Vision-Language-Action Models
Capability vectors extracted from parameter differences between standard and auxiliary-finetuned VLA models can be merged into pretrained weights to match auxiliary-training performance while reducing computational overhead during adaptation.
-
Action Images: End-to-End Policy Learning via Multiview Video Generation
Action Images turn robot arm motions into interpretable multiview pixel videos, letting video backbones serve as zero-shot policies for end-to-end robot learning.
-
Breaking Lock-In: Preserving Steerability under Low-Data VLA Post-Training
DeLock mitigates lock-in in low-data VLA post-training via visual grounding preservation and test-time contrastive prompt guidance, outperforming baselines across eight evaluations while matching data-heavy generalist policies.
-
Grounded World Model for Semantically Generalizable Planning
A vision-language-aligned world model turns visuomotor MPC into a language-following planner that reaches 87% success on 288 unseen semantic tasks where standard VLAs drop to 22%.