EBench: Elemental Diagnosis of Generalist Mobile Manipulation Policies

arXiv: 2606.18239 · v1 · submitted 2026-06-16 · 💻 cs.RO

Ning Gao, Jinliang Zheng, Xing Gao, Haoxiang Ma, Hanqing Wang, Yukai Wang, Jiantong Chen, Zanxin Chen, Shujie Zhang, Mingda Jia, Xuekun Jiang, Zihou Zhu, Xinyu Li, Shuai Wang, Hao Li, Wenzhe Cai, Yuqiang Yang, Xudong Xu, Zhaoyang Lyu, Yao Mu, Tai Wang, Jiangmiao Pang, Jia Zeng, Weinan Zhang, Chunhua Shen

classification 💻 cs.RO

keywords: manipulation, EBench, generalist, models, capability, mobile, policies, benchmark

Abstract

We present EBench, a simulation benchmark that diagnoses generalist mobile manipulation policies beyond a single success-rate scalar. EBench comprises 26 diverse and challenging manipulation tasks annotated along 5 capability dimensions and 4 generalization dimensions. We evaluate state-of-the-art generalist manipulation models, including $\pi_0$, $\pi_{0.5}$, XVLA, and InternVLA-A1, and find that models with similar overall success rates exhibit strikingly different capability profiles: $\pi_{0.5}$ achieves the highest test success rate and the best train-test retention; InternVLA-A1 dominates mobile manipulation but collapses on dexterous tasks; and XVLA shows strengths on a set of atomic skills disjoint from those of the other policies. Beyond capability profiling, EBench analyzes generalization from 4 representative perspectives, identifying the impact of different distribution-shift factors. The results reveal model strengths and weaknesses that an overall score hides. We hope this benchmark offers a broad set of diagnostic signals to guide iteration on generalist manipulation models.
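The core idea of the abstract, reporting a success rate per annotated capability plus a train-test retention figure instead of one overall scalar, can be illustrated with a short sketch. This is not EBench's code: the `Episode` record, the task names, the capability tags in `TASK_CAPABILITIES`, and the definition of retention as test success rate divided by train success rate are all hypothetical choices made for this example.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass
class Episode:
    task: str      # hypothetical task identifier
    split: str     # hypothetical split label: "train" or "test"
    success: bool


# Hypothetical annotation: each task carries one or more capability tags.
TASK_CAPABILITIES = {
    "open_drawer": ["articulated"],
    "fetch_cup_from_table": ["mobile", "pick_place"],
    "insert_peg": ["dexterous"],
}


def success_rate(episodes):
    """Fraction of successful episodes; NaN when there are no episodes."""
    if not episodes:
        return float("nan")
    return sum(e.success for e in episodes) / len(episodes)


def capability_profile(episodes, task_caps):
    """Success rate per capability tag; an episode counts toward every tag of its task."""
    buckets = defaultdict(list)
    for e in episodes:
        for cap in task_caps.get(e.task, []):
            buckets[cap].append(e)
    return {cap: success_rate(eps) for cap, eps in sorted(buckets.items())}


def retention(episodes):
    """Assumed definition: test success rate divided by train success rate."""
    train = success_rate([e for e in episodes if e.split == "train"])
    test = success_rate([e for e in episodes if e.split == "test"])
    # NaN if the train split is empty or has zero successes.
    return test / train if train > 0 else float("nan")


if __name__ == "__main__":
    demo = [
        Episode("open_drawer", "train", True),
        Episode("open_drawer", "test", True),
        Episode("fetch_cup_from_table", "train", True),
        Episode("fetch_cup_from_table", "test", False),
        Episode("insert_peg", "train", False),
        Episode("insert_peg", "test", False),
    ]
    print(capability_profile(demo, TASK_CAPABILITIES))
    # {'articulated': 1.0, 'dexterous': 0.0, 'mobile': 0.5, 'pick_place': 0.5}
    print(retention(demo))  # 0.5
```

Two policies with the same overall success rate can produce very different dictionaries from `capability_profile`, which is the kind of difference a single scalar hides.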
