Pith Number

pith:QJ7SX2AM

pith:2025:QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ

Status: not attested · not anchored · not stored · refs resolved

LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

Duanfeng Chu, Guiyao Tie, Guowen Zhang, Lichao Sun, Pan Zhou, Xueyang Zhou, Yangming Xu, Yongchao Chen

Vision-Language-Action models that score over 90 percent on the standard LIBERO evaluation drop to zero percent when objects, states, instructions, or environments are perturbed.

arXiv:2510.03827v1 · 2025-10-04 · cs.CV · cs.RO


Add to your LaTeX paper

\usepackage{pith}
\pithnumber{QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ}

The package prints a linked badge after your title and injects PDF metadata. It compiles on arXiv.

Record completeness

1 Bitcoin timestamp

2 Internet Archive

3 Author claim open

4 Citations open

5 Replications open

✓ Portable graph bundle live

The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
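The page does not spell out the merge algorithm. As a minimal sketch of what "deterministic" can mean here, the Python below assumes a hypothetical event shape (a dict with `event_id`, `ts`, `key`, and `value`; none of these names come from Pith), drops duplicate events by `event_id`, orders the rest by `(ts, event_id)`, and folds them over the canonical record in that order. Signature checking, which a real mirror would perform before merging, is deliberately left out.

```python
from typing import Any

# Hypothetical event shape, assumed for illustration only (not Pith's format):
# {"event_id": str, "ts": str (ISO 8601 UTC), "key": str, "value": Any}
Event = dict[str, Any]


def merge(canonical_record: dict[str, Any], events: list[Event]) -> dict[str, Any]:
    """Fold events over the canonical record in a deterministic order.

    Signature verification is omitted; a real mirror would first drop any
    event whose signature does not verify.
    """
    # Deduplicate by event_id. This assumes event_id is derived from the
    # event's content, so two copies with the same id are identical.
    unique = {e["event_id"]: e for e in events}
    # Order depends only on event contents, never on arrival order.
    # Uniformly formatted ISO 8601 UTC timestamps sort correctly as strings.
    ordered = sorted(unique.values(), key=lambda e: (e["ts"], e["event_id"]))
    fields: dict[str, Any] = {}
    for e in ordered:
        # Later events in this order overwrite earlier ones for the same key.
        fields[e["key"]] = e["value"]
    return {"record": canonical_record, "fields": fields}


if __name__ == "__main__":
    record = {"source": {"kind": "arxiv", "id": "2510.03827", "version": 1}}
    events = [
        {"event_id": "b", "ts": "2026-05-18T00:00:00Z", "key": "status", "value": "y"},
        {"event_id": "a", "ts": "2026-05-18T00:00:00Z", "key": "status", "value": "x"},
    ]
    # Two mirrors receiving the same events in different orders agree.
    assert merge(record, events) == merge(record, events[::-1])
    print(merge(record, events)["fields"])  # {'status': 'y'}
```

Because the sort key depends only on what is inside each event, any mirror holding the same set of events reaches the same state regardless of the order in which it received them.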

Claims

C1 (strongest claim)

Although existing models achieve over 90% accuracy under the standard LIBERO evaluation, their performance collapses to 0.0% under our generalized setting. This discrepancy exposes the models' reliance on rote memorization of action sequences and environment layouts from the training set, rather than genuine task understanding or environmental perception.

C2 (weakest assumption)

The specific perturbations chosen across the four dimensions constitute fair tests of generalization and comprehension rather than introducing unrelated difficulties that no model could reasonably handle.

C3 (one-line summary)

LIBERO-PRO shows VLA models collapse from over 90% to 0% accuracy under perturbations in objects, states, instructions, and environments, exposing memorization instead of genuine comprehension.

References

25 extracted · 25 resolved · 13 Pith anchors

[1] π0: A Vision-Language-Action Flow Model for General Robot Control · arXiv:2410.24164

[2] UniVLA: Learning to Act Anywhere with Task-centric Latent Actions · arXiv:2505.06111

[3] WorldVLA: Towards Autoregressive Action World Model · arXiv:2506.21539

[4] arXiv preprint arXiv:2506.08440

[5] Irving Fang, Juexiao Zhang, Shengbang Tong, and Chen Feng

Formal links

3 machine-checked theorem links

Cited by

20 papers in Pith

PointACT: Vision-Language-Action Models with Multi-Scale Point-Action Interaction

Steerable Vision-Language-Action Policies for Embodied Reasoning and Hierarchical Control

vla-eval: A Unified Evaluation Harness for Vision-Language-Action Models

RoboLab: A High-Fidelity Simulation Benchmark for Analysis of Task Generalist Policies

VP-VLA: Visual Prompting as an Interface for Vision-Language-Action Models

Receipt and verification

First computed	2026-05-17T23:38:14.843571Z
Builder	pith-number-builder-2026-05-17-v1
Signature	Pith Ed25519 (`pith-v1-2026-05`)
Schema	pith-number/v1.0

Canonical hash

827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b

Aliases

arxiv: 2510.03827 · arxiv_version: 2510.03827v1 · doi: 10.48550/arxiv.2510.03827 · pith_short_12: QJ7SX2AM3DJ4 · pith_short_16: QJ7SX2AM3DJ4RZAJ · pith_short_8: QJ7SX2AM
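The identifiers on this page appear to be related mechanically. For this record, the full Pith Number equals the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash, and each `pith_short_N` alias is the first N characters of the full number. That relationship is an observation checked against this one record, not a documented rule of the `pith-number/v1.0` schema; the helper names below are hypothetical, and the sketch uses only the Python standard library.

```python
import base64

# Canonical hash from "Canonical hash" above.
CANONICAL_HASH = "827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b"


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode the SHA-256 digest and keep the first `length` characters.

    Observed relationship for this record only; the 26-character length is
    taken from the Pith Number shown on this page, not from a spec.
    """
    encoded = base64.b32encode(bytes.fromhex(hex_digest)).decode("ascii")
    return encoded[:length]


def short_aliases(pith_number: str) -> dict[str, str]:
    """Prefix aliases matching pith_short_8/12/16 under Aliases."""
    return {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}


full = pith_number_from_hash(CANONICAL_HASH)
print(full)  # QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ
print(short_aliases(full))
```

If this holds in general, checking the canonical hash (as in the verification pipeline) also checks the Pith Number itself.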

Agent API

Resolver JSON · Graph JSON · Events JSON · Schema · Signing key

Verify this Pith Number yourself

curl -sH 'Accept: application/ld+json' https://pith.science/pith/QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b

Canonical record JSON

{
  "metadata": {
    "abstract_canon_sha256": "be511cad227cf3b223756032a6a653287c4f6fcade091d65bcff23d40138b13a",
    "cross_cats_sorted": [
      "cs.RO"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-04T14:56:40Z",
    "title_canon_sha256": "ac47bff84c98b8cbd8254dad859366420c75b1086ba3196334414e34e088aaed"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.03827",
    "kind": "arxiv",
    "version": 1
  }
}
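The `curl | jq | python3` verification pipeline can also be run offline against the record printed here. The sketch below parses the JSON text, serializes it the same way the one-liner does (sorted keys, no insignificant whitespace, `ensure_ascii=False`, UTF-8), and hashes the bytes with SHA-256. It assumes the JSON shown on this page is exactly the `canonical_record` the resolver returns; if the displayed record differs from the served one in any field, the digest will not match the expected hash.

```python
import hashlib
import json

# Copied from "Canonical record JSON"; assumed identical to the resolver's
# canonical_record.
RECORD_TEXT = """
{
  "metadata": {
    "abstract_canon_sha256": "be511cad227cf3b223756032a6a653287c4f6fcade091d65bcff23d40138b13a",
    "cross_cats_sorted": ["cs.RO"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-04T14:56:40Z",
    "title_canon_sha256": "ac47bff84c98b8cbd8254dad859366420c75b1086ba3196334414e34e088aaed"
  },
  "schema_version": "1.0",
  "source": {"id": "2510.03827", "kind": "arxiv", "version": 1}
}
"""

EXPECTED = "827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b"


def canonical_bytes(record: dict) -> bytes:
    # Same serialization as the verification one-liner.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: dict) -> str:
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


digest = canonical_sha256(json.loads(RECORD_TEXT))
print(digest)
print("match" if digest == EXPECTED else "mismatch")
```

Because keys are sorted and whitespace is stripped before hashing, the indentation and key order of the displayed JSON do not affect the digest; only the parsed values do.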