A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.
Roboarena: Distributed real-world evaluation of generalist robot policies
8 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
verdicts
UNVERDICTED 8roles
background 4representative citing papers
LongBench is a new real-world benchmark that separates execution robustness from context-dependent reasoning in long-horizon robotic manipulation and shows these are distinct challenges not uniformly solved by memory-based policies.
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
ROBOGATE applies adaptive boundary-focused sampling in simulation to discover robot policy failure boundaries, revealing a 97.65 percentage point performance gap for a VLA model between LIBERO and industrial scenarios.
LingBot-VLA is a VLA foundation model trained on massive real robot data that shows superior generalization across tasks and platforms with fast training throughput.
A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.
SFI-Bench shows current multimodal LLMs struggle to integrate spatial memory with functional reasoning and external knowledge in video tasks.
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.
citing papers explorer
-
Atomic-Probe Governance for Skill Updates in Compositional Robot Policies
A cross-version swap protocol reveals dominant skills that swing composition success by up to 50 percentage points, and an atomic probe with selective revalidation governs updates at lower cost than always re-testing full compositions.
-
LongBench: Evaluating Robotic Manipulation Policies on Real-World Long-Horizon Tasks
LongBench is a new real-world benchmark that separates execution robustness from context-dependent reasoning in long-horizon robotic manipulation and shows these are distinct challenges not uniformly solved by memory-based policies.
-
RoboPlayground: Democratizing Robotic Evaluation through Structured Physical Domains
RoboPlayground reframes robotic manipulation evaluation as a language-driven process over structured physical domains, letting users author varied yet reproducible tasks that reveal policy generalization failures.
-
ROBOGATE: Adaptive Failure Discovery for Safe Robot Policy Deployment via Two-Stage Boundary-Focused Sampling
ROBOGATE applies adaptive boundary-focused sampling in simulation to discover robot policy failure boundaries, revealing a 97.65 percentage point performance gap for a VLA model between LIBERO and industrial scenarios.
-
A Pragmatic VLA Foundation Model
LingBot-VLA is a VLA foundation model trained on massive real robot data that shows superior generalization across tasks and platforms with fast training throughput.
-
Ctrl-World: A Controllable Generative World Model for Robot Manipulation
A controllable world model trained on the DROID dataset generates consistent multi-view robot trajectories for over 20 seconds and improves generalist policy success rates by 44.7% via imagined trajectory fine-tuning.
-
From Where Things Are to What They Are For: Benchmarking Spatial-Functional Intelligence in Multimodal LLMs
SFI-Bench shows current multimodal LLMs struggle to integrate spatial memory with functional reasoning and external knowledge in video tasks.
-
World Action Models: The Next Frontier in Embodied AI
The paper introduces World Action Models as a new paradigm unifying predictive world modeling with action generation in embodied foundation models and provides a taxonomy of existing approaches.