pith:QJ7SX2AM
LIBERO-PRO: Towards Robust and Fair Evaluation of Vision-Language-Action Models Beyond Memorization
Vision-Language-Action models score over 90 percent on the standard LIBERO benchmark yet drop to zero percent when objects, states, instructions or environments are perturbed.
arxiv:2510.03827 v1 · 2025-10-04 · cs.CV · cs.RO
Add to your LaTeX paper
```latex
\usepackage{pith}
\pithnumber{QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ}
```
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
Although existing models achieve over 90% accuracy under the standard LIBERO evaluation, their performance collapses to 0.0% under our generalized setting. This discrepancy exposes the models' reliance on rote memorization of action sequences and environment layouts from the training set, rather than genuine task understanding or environmental perception.
The specific perturbations chosen across the four dimensions constitute fair tests of generalization and comprehension rather than introducing unrelated difficulties that no model could reasonably handle.
LIBERO-PRO shows VLA models collapse from over 90% to 0% accuracy under perturbations in objects, states, instructions, and environments, exposing memorization instead of genuine comprehension.
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:38:14.843571Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b
Verify this Pith Number yourself
```shell
# Fetch the JSON-LD record, extract canonical_record, re-serialize it canonically
# (sorted keys, compact separators, UTF-8) and print its SHA-256 digest.
curl -sH 'Accept: application/ld+json' https://pith.science/pith/QJ7SX2AM3DJ4RZAJ7H3PPKJHSJ \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b
```
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "be511cad227cf3b223756032a6a653287c4f6fcade091d65bcff23d40138b13a",
    "cross_cats_sorted": [
      "cs.RO"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2025-10-04T14:56:40Z",
    "title_canon_sha256": "ac47bff84c98b8cbd8254dad859366420c75b1086ba3196334414e34e088aaed"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2510.03827",
    "kind": "arxiv",
    "version": 1
  }
}
```
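The canonicalization step of the verification command can be pulled out into a standalone Python function and applied offline to the record above. This is a minimal sketch. It assumes that the JSON shown here is exactly the `canonical_record` served by the endpoint, and the names `canonical_hash`, `record` and `reordered` are illustrative only. Sorting keys before hashing is what makes the digest independent of the order in which fields happen to be serialized.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of a record's canonical JSON form.

    Mirrors the one-liner in the verification command: keys sorted at every
    nesting level, ',' and ':' separators with no extra whitespace,
    non-ASCII characters kept as-is, then UTF-8 encoded before hashing.
    """
    canonical = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


# The canonical record shown on this page, transcribed as a Python dict
# (assumption: identical to the record served by the endpoint).
record = {
    "metadata": {
        "abstract_canon_sha256": "be511cad227cf3b223756032a6a653287c4f6fcade091d65bcff23d40138b13a",
        "cross_cats_sorted": ["cs.RO"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2025-10-04T14:56:40Z",
        "title_canon_sha256": "ac47bff84c98b8cbd8254dad859366420c75b1086ba3196334414e34e088aaed",
    },
    "schema_version": "1.0",
    "source": {"id": "2510.03827", "kind": "arxiv", "version": 1},
}

# Same content with top-level keys in a different insertion order.
reordered = {
    "source": record["source"],
    "schema_version": record["schema_version"],
    "metadata": record["metadata"],
}

digest = canonical_hash(record)
print(digest)
# Key order does not affect the digest.
print(canonical_hash(reordered) == digest)
# Compare with the published canonical hash.
print(digest == "827f2be80cd8d3c8e409f9f6f7a927924061d8248c61b892f6dcb7847bbe717b")
```

If the transcription differs from the served record in any character, such as a changed value or an extra field, the last comparison fails. This is the property that lets the published hash pin down the record.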