PrimitiveVLA: Learning Reusable Motion Primitives for Efficient and Generalizable Robotic Manipulation

Di Huang; Jiaming Guo; Ling Li; Shaohui Peng; Siming Lan; Xing Hu; Yunji Chen; Yunkai Gao; Yutai Li; Yuxuan Guo

arxiv: 2605.28634 · v1 · pith:4ANJTL75new · submitted 2026-05-27 · 💻 cs.RO


Yutai Li, Shaohui Peng, Jiaming Guo, Di Huang, Zihao Zhang, Yuxuan Guo, Yunkai Gao, Siming Lan, Ling Li, Xing Hu, Yunji Chen


classification 💻 cs.RO

keywords primitives, primitivevla, reusable, models, motion, paradigm, data, disassemble



Vision-Language-Action (VLA) models offer a promising paradigm for generalist robotic policies, yet their adaptation is hindered by data inefficiency and poor generalization. We argue that these bottlenecks stem from the prevailing Direct Instruction-to-Control Mapping, which forces models to memorize monolithic trajectories rather than reusable motion patterns, i.e., primitives. We propose PrimitiveVLA, a framework that replaces this mapping with a Primitive-Centric Disassemble & Assemble paradigm. Supported by a shared Multimodal Canonical Representation (MCR), PrimitiveVLA unifies two phases: (1) Fine-tuning-phase Disassembly, which uses an automated pipeline to disassemble demonstrations into reusable primitives; and (2) Inference-phase Assembly, which employs a VLM-based planner and an LLM-generated switch module for robust closed-loop execution. By disassembling tasks into reusable primitives, PrimitiveVLA enables VLA models to learn invariant motion patterns instead of task-specific trajectories. Extensive experiments show that our framework improves data efficiency and achieves superior zero-shot generalization on unseen and long-horizon tasks.
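
To make the Inference-phase Assembly idea concrete, the following is a minimal sketch of a closed loop that runs a planned sequence of primitives and advances when a switch condition fires. It is not the authors' implementation: the abstract does not specify any interfaces, so `Primitive`, `run_assembled`, `toy_plan`, and the dictionary-based state are all hypothetical names chosen for illustration. The lambdas stand in for the VLA policy and the LLM-generated switch module, and the hard-coded list stands in for the output of the VLM-based planner.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# All names below are hypothetical; the abstract does not define these interfaces.
State = Dict[str, float]


@dataclass
class Primitive:
    name: str                        # e.g. "reach" or "grasp"
    policy: Callable[[State], State]  # one control step (stand-in for the VLA policy;
                                      # in this toy it also plays the environment)
    done: Callable[[State], bool]     # switch condition (stand-in for the switch module)


def run_assembled(plan: List[Primitive], state: State, max_steps: int = 50) -> List[str]:
    """Run primitives in order, switching to the next one once `done` is true."""
    trace: List[str] = []
    for prim in plan:
        steps = 0
        while not prim.done(state):
            if steps >= max_steps:
                raise RuntimeError(f"primitive {prim.name!r} did not finish")
            state = prim.policy(state)
            steps += 1
            trace.append(prim.name)
    return trace


def toy_plan() -> List[Primitive]:
    """A fixed two-primitive plan; a planner would produce this from an instruction."""
    reach = Primitive(
        name="reach",
        policy=lambda s: {**s, "dist": max(0.0, s["dist"] - 0.5)},
        done=lambda s: s["dist"] <= 0.0,
    )
    grasp = Primitive(
        name="grasp",
        policy=lambda s: {**s, "grip": 1.0},
        done=lambda s: s["grip"] >= 1.0,
    )
    return [reach, grasp]


if __name__ == "__main__":
    print(run_assembled(toy_plan(), {"dist": 1.0, "grip": 0.0}))
```

The point of the sketch is the separation of concerns: the same `reach` and `grasp` primitives could be recombined into a different plan without retraining, which is the kind of reuse the abstract argues for.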


