Is Diversity All You Need for Scalable Robotic Manipulation?

Chiming Liu; Di Huang; Guanghui Ren; Hongyang Li; Jin Chen; Li Chen; Maoqing Yao; Modi Shi; Ping Luo; Yuxiang Lu

arxiv: 2507.06219 · v2 · pith:EWD6K4D7new · submitted 2025-07-08 · 💻 cs.RO · cs.AI · cs.LG

classification 💻 cs.RO cs.AI cs.LG

keywords data diversity manipulation pre-training robot robotic scaling critical

Data scaling has driven remarkable success in foundation models for Natural Language Processing (NLP) and Computer Vision (CV), yet the principles of effective data scaling in robotic manipulation remain insufficiently understood. In this work, we investigate the nuanced role of data diversity in robot learning by examining three critical dimensions: task (what to do), embodiment (which robot to use), and expert (who demonstrates), challenging the conventional intuition that "more diverse is better". Through extensive experiments on various robot platforms, we reveal that (1) task diversity proves more critical than per-task demonstration quantity, benefiting transfer from diverse pre-training tasks to novel downstream scenarios; (2) multi-embodiment pre-training data is optional for cross-embodiment transfer: models trained on high-quality single-embodiment data can efficiently transfer to different platforms, showing a more desirable scaling property during fine-tuning than multi-embodiment pre-trained models; and (3) expert diversity, arising from individual operational preferences and stochastic variations in human demonstrations, can be confounding to policy learning, with velocity multimodality emerging as a key contributing factor. Based on this insight, we propose a distribution debiasing method to mitigate velocity ambiguity; the resulting GO-1-Pro achieves substantial performance gains of 15%, equivalent to using 2.5 times the pre-training data. Collectively, these findings provide new perspectives and offer practical guidance on how to scale robotic manipulation datasets effectively.

Forward citations

Cited by 5 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

QDTraj: Exploration of Diverse Trajectory Primitives for Articulated Objects Robotic Manipulation
cs.RO 2026-04 unverdicted novelty 6.0

QDTraj uses Quality-Diversity algorithms with sparse rewards to produce at least five times more diverse high-performing trajectories for articulated object manipulation than compared methods, validated across 30 obje...
FASTER: Rethinking Real-Time Flow VLAs
cs.RO 2026-03 conditional novelty 6.0

FASTER uses a horizon-aware flow sampling schedule to compress immediate-action denoising to one step, slashing effective reaction latency in real-robot VLA deployments.
FASTER: Rethinking Real-Time Flow VLAs
cs.RO 2026-03 unverdicted novelty 6.0

FASTER adds a Horizon-Aware Schedule to flow VLAs that compresses immediate-action denoising to one step while keeping long-horizon trajectory quality, lowering real-robot reaction latency.
RISE: Self-Improving Robot Policy with Compositional World Model
cs.RO 2026-02 unverdicted novelty 6.0

RISE combines a controllable dynamics model and progress value model into a closed-loop self-improving pipeline that updates robot policies entirely in imagination, reporting over 35% absolute gains on three real-world tasks.
Genie Sim 3.0 : A High-Fidelity Comprehensive Simulation Platform for Humanoid Robot
cs.RO 2026-01 unverdicted novelty 6.0

Genie Sim 3.0 introduces an LLM-powered scene generator, the first LLM-based automated evaluation benchmark, and a large open synthetic dataset that demonstrates zero-shot sim-to-real transfer for robotic manipulation...