Pith Number

pith:QJ7SX2AM

pith:2025:QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ
not attested · not anchored · not stored · refs resolved

LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization

Duanfeng Chu, Guiyao Tie, Guowen Zhang, Lichao Sun, Pan Zhou, Xueyang Zhou, Yangming Xu, Yongchao Chen

Vision-Language-Action models achieve over 90 percent accuracy under the standard LIBERO evaluation yet drop to zero percent when objects, states, instructions, or environments are perturbed.

arxiv:2510.03827 v1 · 2025-10-04 · cs.CV · cs.RO

Add to your LaTeX paper
\usepackage{pith}
\pithnumber{QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ}

Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.

Record completeness

1 Bitcoin timestamp
2 Internet Archive
3 Author claim open
4 Citations open
5 Replications open
Portable graph bundle live · download bundle · merged state
The bundle contains the canonical record plus signed events. A mirror can host it anywhere and recompute the same current state with the deterministic merge algorithm.
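The merge algorithm itself is not specified on this page. As a rough illustration of how a mirror could arrive at the same state regardless of the order in which it received events, the sketch below deduplicates events by content hash, sorts them by a total order, and folds them into a state. The event fields (`timestamp`, `field`, `value`), the state layout, and the function names are all hypothetical, and signature checking is omitted.

```python
import hashlib
import json
from typing import Iterable


def event_id(event: dict) -> str:
    """Content hash of an event, used for deduplication and tie-breaking."""
    blob = json.dumps(event, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return hashlib.sha256(blob).hexdigest()


def merge_state(record: dict, events: Iterable[dict]) -> dict:
    """Fold events into a state that does not depend on arrival order.

    Assumption: each event carries `timestamp`, `field`, and `value`.
    """
    unique = {event_id(e): e for e in events}  # drop exact duplicates
    # Total order: timestamp first, then content hash to break ties.
    ordered = sorted(unique.items(), key=lambda kv: (kv[1]["timestamp"], kv[0]))
    state = {"record": record, "fields": {}}
    for _, event in ordered:
        state["fields"][event["field"]] = event["value"]  # later events win
    return state


# Hypothetical events, for illustration only.
record = {"source": {"id": "2510.03827", "kind": "arxiv", "version": 1}}
events = [
    {"timestamp": "2026-01-01T00:00:00Z", "field": "citations", "value": "open"},
    {"timestamp": "2026-01-02T00:00:00Z", "field": "citations", "value": "closed"},
    {"timestamp": "2026-01-01T00:00:00Z", "field": "citations", "value": "open"},
]
state_a = merge_state(record, events)
state_b = merge_state(record, list(reversed(events)))
print(state_a == state_b)
```

Sorting on a key that includes the content hash gives every mirror the same order even when two events share a timestamp, which is what makes the fold reproducible.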

Claims

C1 · strongest claim

although existing models achieve over 90% accuracy under the standard LIBERO evaluation, their performance collapses to 0.0% under our generalized setting. This discrepancy exposes the models' reliance on rote memorization of action sequences and environment layouts from the training set, rather than genuine task understanding or environmental perception.

C2 · weakest assumption

The specific perturbations chosen across the four dimensions constitute fair tests of generalization and comprehension rather than introducing unrelated difficulties that no model could reasonably handle.

C3 · one line summary

LIBERO-PRO shows VLA models collapse from over 90% to 0% accuracy under perturbations in objects, states, instructions, and environments, exposing memorization instead of genuine comprehension.

References

25 extracted · 25 resolved · 13 Pith anchors

[1] $\pi_0$: A Vision-Language-Action Flow Model for General Robot Control · arXiv:2410.24164
[2] UniVLA: Learning to Act Anywhere with Task-centric Latent Actions · arXiv:2505.06111
[3] WorldVLA: Towards Autoregressive Action World Model · arXiv:2506.21539
[4] arXiv:2506.08440
[5] Irving Fang, Juexiao Zhang, Shengbang Tong, and Chen Feng

Formal links

3 machine-checked theorem links

Cited by

20 papers in Pith

Receipt and verification
First computed: 2026-05-17T23:38:14.843571Z
Builder: pith-number-builder-2026-05-17-v1
Signature: Pith Ed25519 (pith-v1-2026-05)
Schema: pith-number/v1.0

Canonical hash

827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b

Aliases

arxiv: 2510.03827 · arxiv_version: 2510.03827v1 · doi: 10.48550/arxiv.2510.03827 · pith_short_12: QJ7SX2AM3DJ4 · pith_short_16: QJ7SX2AM3DJ4RZAJ · pith_short_8: QJ7SX2AM
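The short aliases above are prefixes of the full Pith Number. In this record, the Pith Number also matches the first 26 characters of the RFC 4648 Base32 encoding of the canonical hash. That relationship is inferred from the values on this page, not from a stated specification, so the sketch below is a consistency check and not a definition. The function name `pith_number_from_hash` is hypothetical.

```python
import base64


def pith_number_from_hash(hex_digest: str, length: int = 26) -> str:
    """Base32-encode a hex SHA-256 digest and keep the first `length` characters.

    Assumption: the 26-character truncation is inferred from this record only.
    """
    raw = bytes.fromhex(hex_digest)
    return base64.b32encode(raw).decode("ascii")[:length]


canonical_hash = "827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b"
pith_number = pith_number_from_hash(canonical_hash)

# The short aliases are plain prefixes of the full identifier.
short_aliases = {f"pith_short_{n}": pith_number[:n] for n in (8, 12, 16)}

print(pith_number)  # QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ
for name, value in short_aliases.items():
    print(name, value)
```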
Agent API
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "be511cad227cf3b223756032a6a653287c4f6fcade091d65bcff23d40138b13a",
    "cross_cats_sorted": [
      "cs.RO"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-04T14:56:40Z",
    "title_canon_sha256": "ac47bff84c98b8cbd8254dad859366420c75b1086ba3196334414e34e088aaed"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.03827",
    "kind": "arxiv",
    "version": 1
  }
}
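The verification one-liner above can also be run offline against the record as printed here. The sketch below applies the same serialization as the shell command (sorted keys, compact separators, non-ASCII kept as-is, UTF-8) and hashes the result with SHA-256. The function name `canonical_sha256` is hypothetical. Whether the digest equals the canonical hash listed on this page depends on the displayed record being byte-for-byte what the API returns, which has not been checked here.

```python
import hashlib
import json


def canonical_sha256(record: dict) -> str:
    """SHA-256 over a compact, key-sorted JSON serialization of `record`."""
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# Record copied from the "Canonical record JSON" section of this page.
record = {
    "metadata": {
        "abstract_canon_sha256": "be511cad227cf3b223756032a6a653287c4f6fcade091d65bcff23d40138b13a",
        "cross_cats_sorted": ["cs.RO"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-10-04T14:56:40Z",
        "title_canon_sha256": "ac47bff84c98b8cbd8254dad859366420c75b1086ba3196334414e34e088aaed",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.03827", "kind": "arxiv", "version": 1},
}

digest = canonical_sha256(record)
print(digest)  # compare with the canonical hash listed on this page
```

Because keys are sorted before hashing, the digest does not depend on the order in which a mirror happens to store the fields. Any change to a value, such as the source version, produces a different digest.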