RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

· 2026 · cs.RO · arXiv 2604.09860

3 Pith papers cite this work. Polarity classification is still indexing.

3 Pith papers citing it

open full Pith review browse 3 citing papers arXiv PDF

abstract

The pursuit of general-purpose robotics has yielded impressive foundation models, yet simulation-based benchmarking remains a bottleneck due to rapid performance saturation and a lack of true generalization testing. Existing benchmarks often exhibit significant domain overlap between training and evaluation, trivializing success rates and obscuring insights into robustness. We introduce RoboLab, a simulation benchmarking framework designed to address these challenges. Concretely, our framework is designed to answer two questions: (1) to what extent can we understand the performance of a real-world policy by analyzing its behavior in simulation, and (2) which factor most strongly affect policy behavior. First, RoboLab enables human-authored and LLM-enabled generation of scenes and tasks in a robot- and policy-agnostic manner within a high-fidelity simulation environment. We introduce an accompanying RoboLab-120 benchmark, consisting of 120 tasks categorized into three competency axes: visual, procedural, relational, across three difficulty levels. Second, we introduce a systematic analysis of real-world policies that quantify both their performance and the sensitivity of their behavior to controlled perturbations, exposing significant performance gap in current state-of-the-art models. By providing granular metrics and a scalable toolset, RoboLab offers a scalable framework for evaluating the true generalization capabilities of task-generalist robotic policies. Project website: https://research.nvidia.com/labs/srl/projects/robolab/.

citation-role summary

background 1

citation-polarity summary

unclear 1

representative citing papers

VLA-REPLICA: A Low-Cost, Reproducible Benchmark for Real-World Evaluation of Vision-Language-Action Models

cs.RO · 2026-05-20 · conditional · novelty 5.0

VLA-REPLICA is a low-cost and reproducible real-world benchmark for evaluating VLA models in robotic manipulation tasks.

Automatically Improving Simulation Physics for Articulated Objects

cs.RO · 2026-05-18 · unverdicted · novelty 5.0

A simulator-in-the-loop multi-modal method refines physical properties of incomplete 3D articulated objects to improve simulation stability and downstream robot policy performance.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

citing papers explorer

Showing 3 of 3 citing papers.

VLA-REPLICA: A Low-Cost, Reproducible Benchmark for Real-World Evaluation of Vision-Language-Action Models cs.RO · 2026-05-20 · conditional · none · ref 31 · internal anchor
VLA-REPLICA is a low-cost and reproducible real-world benchmark for evaluating VLA models in robotic manipulation tasks.
Automatically Improving Simulation Physics for Articulated Objects cs.RO · 2026-05-18 · unverdicted · none · ref 16 · internal anchor
A simulator-in-the-loop multi-modal method refines physical properties of incomplete 3D articulated objects to improve simulation stability and downstream robot policy performance.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 242 · internal anchor
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer