pith:CQEI36IJ
SceneFunRI: Reasoning the Invisible for Task-Driven Functional Object Localization
Vision-language models cannot reliably locate invisible functional objects from task instructions and commonsense.
arxiv:2605.14704 v1 · 2026-05-14 · cs.CV · cs.AI · cs.RO
Add to your LaTeX paper
\usepackage{pith}
\pithnumber{CQEI36IJNBHHZC2CHBUTGJ2QMZ}
Prints a linked badge after your title and injects PDF metadata. Compiles on arXiv.
Claims
The strongest baseline model (Gemini 3 Flash) achieves only a CAcc@75 of 15.20, an mIoU of 0.74, and a Dist of 28.65. These findings indicate that invisible-region reasoning remains an unstable capability in current VLMs.
The semi-automatic pipeline accurately creates 855 instances that genuinely require commonsense and spatial reasoning beyond superficial visual cues, rather than introducing artifacts that explain the low model performance.
The SceneFunRI benchmark shows that current VLMs struggle severely to infer the locations of invisible functional objects, with the strongest model (Gemini 3 Flash) reaching only 15.20 CAcc@75.
Receipt and verification
| First computed | 2026-05-17T23:38:59.290291Z |
|---|---|
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) · public key |
| Schema | pith-number/v1.0 |
Canonical hash
14088df909684e7c8b4238693327506666bc4de6bf1c46cbd714cf846ebb3700
Verify this Pith Number yourself
curl -sH 'Accept: application/ld+json' https://pith.science/pith/CQEI36IJNBHHZC2CHBUTGJ2QMZ \
| jq -c '.canonical_record' \
| python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 14088df909684e7c8b4238693327506666bc4de6bf1c46cbd714cf846ebb3700
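The same check can be sketched in plain Python, without curl or jq. This is a minimal sketch that mirrors the canonicalization used in the one-liner above (sorted keys, compact separators, `ensure_ascii=False`, UTF-8 bytes, SHA-256). The names `canonical_hash`, `record`, and `EXPECTED_HASH` are illustrative and not part of any Pith API, and the record is copied by hand from this page rather than fetched, so a match depends on that copy being exact.

```python
import hashlib
import json


def canonical_hash(record: dict) -> str:
    """Hash a record the same way the shell one-liner does (hypothetical helper)."""
    # Sorted keys and compact separators give a byte-stable serialization.
    payload = json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode()  # str.encode() defaults to UTF-8
    return hashlib.sha256(payload).hexdigest()


# Canonical record as shown on this page, copied by hand (an assumption:
# the live endpoint is the authoritative source).
record = {
    "metadata": {
        "abstract_canon_sha256": "54637346a93f1dc6c32674dbf6a01de1093a6119253cb8b203a427902373360a",
        "cross_cats_sorted": ["cs.AI", "cs.RO"],
        "license": "http://creativecommons.org/licenses/by/4.0/",
        "primary_cat": "cs.CV",
        "submitted_at": "2026-05-14T11:21:41Z",
        "title_canon_sha256": "fb13c269bda0be01455a9e6ad02abdb673621e71c92e40ae576b22d11b96e459",
    },
    "schema_version": "1.0",
    "source": {"id": "2605.14704", "kind": "arxiv", "version": 1},
}

# Hash published on this page.
EXPECTED_HASH = "14088df909684e7c8b4238693327506666bc4de6bf1c46cbd714cf846ebb3700"

digest = canonical_hash(record)
print(digest)
print("matches published hash:", digest == EXPECTED_HASH)
```

Because the keys are sorted before hashing, the order in which the record's fields are written does not affect the digest; only the keys and values themselves do.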
Canonical record JSON
{
  "metadata": {
    "abstract_canon_sha256": "54637346a93f1dc6c32674dbf6a01de1093a6119253cb8b203a427902373360a",
    "cross_cats_sorted": [
      "cs.AI",
      "cs.RO"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.CV",
    "submitted_at": "2026-05-14T11:21:41Z",
    "title_canon_sha256": "fb13c269bda0be01455a9e6ad02abdb673621e71c92e40ae576b22d11b96e459"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14704",
    "kind": "arxiv",
    "version": 1
  }
}