pith:GTPZI2U6
VISOR: A Vision-Language Model-based Test Oracle for Testing Robots
VISOR uses vision-language models to automatically score robot task correctness, quality, and uncertainty from videos.
arxiv:2605.10408 v2 · 2026-05-11 · cs.SE · cs.RO
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{GTPZI2U66RREXMND4TVXBFAEHV}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
VISOR performs automated evaluation of task correctness and quality, addressing the limitations of existing symbolic test oracles, which are task-specific and provide pass/fail judgments without explicitly quantifying task quality. Given the inherent uncertainty in VLMs, VISOR also explicitly quantifies its own uncertainty during test assessments.
Underlying premise: off-the-shelf vision-language models can accurately interpret and score complex, dynamic robot behaviors in videos without task-specific fine-tuning or symbolic grounding.
VISOR applies VLMs to automate robot test oracles for correctness and quality assessment while reporting uncertainty. Evaluation with GPT and Gemini shows trade-offs between precision and recall, and poor uncertainty calibration.
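To make the terms in the claim above concrete, here is a minimal sketch of how precision, recall, and a calibration gap could be computed for a test oracle's verdicts. It is an illustration only: the source does not say which metrics or protocol the paper uses, the choice of "failure detected" as the positive class is an assumption, and the function names and toy data are hypothetical. Expected calibration error (ECE) is used here as one common calibration measure.

```python
from typing import List, Tuple


def precision_recall(predicted: List[bool], actual: List[bool]) -> Tuple[float, float]:
    """Precision and recall, treating True as the positive class
    (assumed here to mean 'the oracle flags a failure')."""
    pairs = list(zip(predicted, actual))
    tp = sum(1 for p, a in pairs if p and a)
    fp = sum(1 for p, a in pairs if p and not a)
    fn = sum(1 for p, a in pairs if not p and a)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


def expected_calibration_error(
    confidences: List[float], correct: List[bool], n_bins: int = 5
) -> float:
    """Weighted average gap between stated confidence and observed accuracy,
    computed over equal-width confidence bins (lo, hi]."""
    total = len(confidences)
    ece = 0.0
    for b in range(n_bins):
        lo, hi = b / n_bins, (b + 1) / n_bins
        # A confidence of exactly 0.0 is placed in the first bin.
        idx = [
            i for i, c in enumerate(confidences)
            if (lo < c <= hi) or (b == 0 and c == 0.0)
        ]
        if not idx:
            continue
        avg_conf = sum(confidences[i] for i in idx) / len(idx)
        accuracy = sum(1 for i in idx if correct[i]) / len(idx)
        ece += (len(idx) / total) * abs(avg_conf - accuracy)
    return ece


# Toy data (hypothetical, not from the paper).
predicted = [True, True, False, False]
actual = [True, False, True, False]
confidences = [0.95, 0.95, 0.55, 0.55]
correct = [p == a for p, a in zip(predicted, actual)]

precision, recall = precision_recall(predicted, actual)
ece = expected_calibration_error(confidences, correct)
print(precision, recall, round(ece, 3))
```

A well-calibrated oracle would have an ECE near zero; in the toy data the oracle is 95% confident on verdicts that are right only half the time, which is the kind of mismatch "poor uncertainty calibration" refers to.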
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-20T00:03:16.796682Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
34df946a9ef4624bb1a3e4eb7094043d7c1f6cdb0f5b60b7a2d716c0224d5671
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/GTPZI2U66RREXMND4TVXBFAEHV \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 34df946a9ef4624bb1a3e4eb7094043d7c1f6cdb0f5b60b7a2d716c0224d5671
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "f53a59a7f98ab23fcbaedf5b5f1e4342efe241fe944c3c0ebe1a527eb93217be",
    "cross_cats_sorted": [
      "cs.RO"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-11T11:46:57Z",
    "title_canon_sha256": "1599b5322227c8656f71f69588cecc589c9565f0ee2e7eb656639c6dd0aed675"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.10408",
    "kind": "arxiv",
    "version": 2
  }
}
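
The one-line verification command packs the canonicalization into a `python3 -c` string. The sketch below spells out the same steps offline, using the canonical record shown above: parse the JSON, re-serialize it with sorted keys and compact separators, and hash the UTF-8 bytes with SHA-256. The helper names `canonical_bytes` and `canonical_hash` are hypothetical. Whether the printed digest equals the published canonical hash depends on the served record matching the copy above byte for byte after canonicalization, which has not been checked here.

```python
import hashlib
import json

# Canonical record copied from the JSON shown on this page.
RECORD_JSON = """
{
  "metadata": {
    "abstract_canon_sha256": "f53a59a7f98ab23fcbaedf5b5f1e4342efe241fe944c3c0ebe1a527eb93217be",
    "cross_cats_sorted": ["cs.RO"],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.SE",
    "submitted_at": "2026-05-11T11:46:57Z",
    "title_canon_sha256": "1599b5322227c8656f71f69588cecc589c9565f0ee2e7eb656639c6dd0aed675"
  },
  "schema_version": "1.0",
  "source": {"id": "2605.10408", "kind": "arxiv", "version": 2}
}
"""

# Hash published on this page, kept for comparison.
EXPECTED = "34df946a9ef4624bb1a3e4eb7094043d7c1f6cdb0f5b60b7a2d716c0224d5671"


def canonical_bytes(record: dict) -> bytes:
    """Serialize with sorted keys, no whitespace, and raw (non-escaped) Unicode,
    mirroring the json.dumps arguments in the one-liner."""
    text = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return text.encode("utf-8")


def canonical_hash(record: dict) -> str:
    """SHA-256 hex digest of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


record = json.loads(RECORD_JSON)
digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == EXPECTED)
```

Sorting keys and fixing the separators makes the serialization independent of how the JSON was originally formatted, so the indentation of the record on this page does not affect the digest.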