Roboarena: Distributed real-world evaluation of generalist robot policies

Pranav Atreya, Karl Pertsch, Tony Lee, Moo Jin Kim, Arhan Jain, Artur Kuramshin, Clemens Eppner, Cyrus Neary, Edward Hu, Fabio Ramos, et al · 2025 · arXiv 2506.18123

8 Pith papers cite this work. Polarity classification is still indexing.

8 Pith papers citing it

read on arXiv browse 8 citing papers

citation-role summary

background 4

citation-polarity summary

background 3 unclear 1

representative citing papers

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies

cs.RO · 2026-04-29 · unverdicted · novelty 7.0 · 2 refs

A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.

LongBench: Evaluating Robotic Manipulation Policies on Real-World Long-Horizon Tasks

cs.RO · 2026-04-18 · unverdicted · novelty 6.0

LongBench is a new real-world benchmark that separates execution robustness from context-dependent reasoning in long-horizon robotic manipulation and shows these are distinct challenges not uniformly solved by memory-based policies.

RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains

cs.RO · 2026-04-06 · unverdicted · novelty 6.0

RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.

ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling

cs.RO · 2026-03-23 · unverdicted · novelty 6.0

ROBOGATE applies adaptive boundary-focused sampling in simulation to discover robot policy failure boundaries, revealing a 97.65 percentage point performance gap for a VLA model between LIBERO and industrial scenarios.

A Pragmatic VLA Foundation Model

cs.RO · 2026-01-26 · unverdicted · novelty 6.0

LingBot-VLA is a VLA foundation model trained on massive real robot data that shows superior generalization across tasks and platforms with fast training throughput.

Ctrl-World: A Controllable Generative World Model for Robot Manipulation

cs.RO · 2025-10-11 · unverdicted · novelty 6.0

A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.

From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs

cs.CV · 2026-05-04 · unverdicted · novelty 5.0

SFI-Bench shows current multimodal LLMs struggle to integrate spatial memory with functional reasoning and external knowledge in video tasks.

World Action Models: The Next Frontier in Embodied AI

cs.RO · 2026-05-12 · unverdicted · novelty 4.0

The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

citing papers explorer

Showing 8 of 8 citing papers.

Atomic-Probe Governance for Skill Updates in Compositional Robot Policies cs.RO · 2026-04-29 · unverdicted · none · ref 26 · 2 links
A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.
LongBench: Evaluating Robotic Manipulation Policies on Real-World Long-Horizon Tasks cs.RO · 2026-04-18 · unverdicted · none · ref 1
LongBench is a new real-world benchmark that separates execution robustness from context-dependent reasoning in long-horizon robotic manipulation and shows these are distinct challenges not uniformly solved by memory-based policies.
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains cs.RO · 2026-04-06 · unverdicted · none · ref 3
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling cs.RO · 2026-03-23 · unverdicted · none · ref 28
ROBOGATE applies adaptive boundary-focused sampling in simulation to discover robot policy failure boundaries, revealing a 97.65 percentage point performance gap for a VLA model between LIBERO and industrial scenarios.
A Pragmatic VLA Foundation Model cs.RO · 2026-01-26 · unverdicted · none · ref 1
LingBot-VLA is a VLA foundation model trained on massive real robot data that shows superior generalization across tasks and platforms with fast training throughput.
Ctrl-World: A Controllable Generative World Model for Robot Manipulation cs.RO · 2025-10-11 · unverdicted · none · ref 2
A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.
From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs cs.CV · 2026-05-04 · unverdicted · none · ref 1
SFI-Bench shows current multimodal LLMs struggle to integrate spatial memory with functional reasoning and external knowledge in video tasks.
World Action Models: The Next Frontier in Embodied AI cs.RO · 2026-05-12 · unverdicted · none · ref 257
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.

Roboarena: Distributed real-world evaluation of generalist robot policies

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer