Recognition: unknown
RoboEval: Where Robotic Manipulation Meets Structured and Scalable Evaluation
read the original abstract
We introduce RoboEval, a structured evaluation framework and benchmark for robotic manipulation that augments binary success with principled behavioral and outcome metrics. Existing evaluations often collapse performance into outcome counts, masking differences in execution quality and obscuring failure structure. RoboEval provides eight bimanual tasks with systematically controlled variations, more than three thousand expert demonstrations, and a modular simulation platform for reproducible experimentation. All tasks are instrumented with standardized metrics that quantify efficiency, coordination, and safety/stability, as well as outcome measures that trace stagewise progress and localize failure modes. Through extensive experiments with state-of-the-art visuomotor policies, we validate these metrics by analyzing their stability under variation, discriminative power across policies with similar success rates, and correlation with task success. Project Page: https://robo-eval.github.io
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
MolmoAct2: Action Reasoning Models for Real-world Deployment
MolmoAct2 delivers an open VLA model with new specialized components, datasets, and techniques that outperforms baselines on benchmarks while releasing all weights, code, and data for real-world robot use.
-
MolmoAct2: Action Reasoning Models for Real-world Deployment
MolmoAct2 is an open VLA model that outperforms baselines like Pi-05 on 7 benchmarks and whose backbone surpasses GPT-5 on 13 embodied-reasoning tasks through new datasets, specialized training, and architecture chang...
-
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.