hub Mixed citations

LIBERO-Plus: In-depth Robustness Analysis of Vision-Language-Action Models

Mixed citation behavior. Most common role is background (46%).

74 Pith papers citing it
Background 46% of classified citations
abstract

Vision-Language-Action (VLA) models report impressive success rates on robotic manipulation benchmarks, yet these results may mask fundamental weaknesses in robustness. We perform a systematic vulnerability analysis by introducing controlled perturbations across seven dimensions: object layout, camera viewpoints, robot initial states, language instructions, lighting conditions, background textures, and sensor noise. We comprehensively analyze multiple state-of-the-art models and reveal consistent brittleness beneath apparent competence. Our analysis exposes critical weaknesses: models exhibit extreme sensitivity to perturbation factors, including camera viewpoints and robot initial states, with performance dropping from 95% to below 30% under modest perturbations. Surprisingly, models are largely insensitive to language variations, with further experiments revealing that models tend to ignore language instructions completely. Our findings challenge the assumption that high benchmark scores equate to true competency and highlight the need for evaluation practices that assess reliability under realistic variation.

citation-role summary

background 15 · dataset 7 · baseline 3 · method 1

claims ledger

  • background However, standard VLA models do not explicitly model world dynamics; they learn direct observation-to-action mappings without predicting how the environment changes under intervention [4]. This absence of predictive physical reasoning limits their generalization, where anticipating future states is essential. Equipping embodied policy models with world modeling capabilities thus emerges as a natural direction [5]. A growing body of recent work has begun integrating world models into the embo

years

2026 (73) · 2025 (1)

representative citing papers

PlayWorld: Learning Robot World Models from Autonomous Play

cs.RO · 2026-03-09 · unverdicted · novelty 7.0

PlayWorld learns high-fidelity robot world models from unsupervised self-play, producing physically consistent video predictions that outperform models trained on human data and enabling 65% better real-world policy performance via model-based RL.

ABot-M0.5: Unified Mobility-and-Manipulation World Action Model

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

ABot-M0.5 proposes a unified mobility-and-manipulation world action model using three alignment strategies that achieves state-of-the-art performance on mobile and fine-grained manipulation benchmarks.

Sequential Planning via Anchored Robotic Keypoints

cs.RO · 2026-06-29 · unverdicted · novelty 6.0

SPARK reaches 43.7% success on six LIBERO-PRO cells by LLM-generated typed behavior trees plus multi-prompt perception and recovery, more than doubling CaP-Agent0 and VLA baselines.

What Are We Actually Benchmarking in Robot Manipulation?

cs.RO · 2026-06-02 · conditional · novelty 6.0

LIBERO and CALVIN fail multiple proposed diagnostics for shortcut solvability, statistical significance, overfitting, and data dependence, while a tiny 0.09B probe reaches near-SOTA on LIBERO.

citing papers explorer

Showing 4 of 4 citing papers after filters.