pith:SORHSKQO
Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning
Routing rewards to the specific source of error, identified via blindfolded verification, improves both perception and reasoning in vision-language models.
arxiv:2605.14054 v1 · 2026-05-13 · cs.AI · cs.CV
Add to your LaTeX paper:

```latex
\usepackage{pith}
\pithnumber{SORHSKQOXTUTC5ZOJA5WWZRVBY}
```
The package prints a linked badge after your title and injects PDF metadata. It compiles on arXiv.
Claims
the root cause of this trade-off is an ambiguity in modality credit assignment: when a VLM fails, is it due to flawed perception (bad seeing) or flawed logic (bad thinking)? ... These techniques are integrated into a Modality-Aware Credit Assignment (MoCA) mechanism, which routes rewards to the specific source of error.
Key premise: the blindfolded reasoning proxy in Perception Verification can reliably measure and reward perceptual fidelity independently of reasoning outcomes, without introducing new biases or requiring a perfect separation of modalities.
MoCA, a new RL method that uses Perception Verification to reward perceptual fidelity independently of reasoning outcomes, improves both seeing and thinking in VLMs.
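To make the credit-assignment idea in these claims concrete, here is a toy sketch. It is not the paper's MoCA implementation: every name in it (`Rollout`, `diagnose`, `route_rewards`, the two boolean signals and the 0/1 reward values) is hypothetical. It assumes that "blindfolded" verification means a text-only proxy answers the question from the VLM's own description of the image, so a correct blindfolded answer is taken as evidence of faithful perception.

```python
from dataclasses import dataclass


@dataclass
class Rollout:
    # Hypothetical per-rollout signals (assumption, not taken from the paper).
    blindfolded_correct: bool  # text-only proxy answered correctly from the VLM's description alone
    answer_correct: bool       # the VLM's own final answer matched the reference


def diagnose(r: Rollout) -> str:
    """Label a rollout with the (assumed) source of error."""
    if r.answer_correct:
        return "correct"
    if r.blindfolded_correct:
        # The description carried enough information, so the failure lies in the reasoning.
        return "bad thinking"
    # The description was insufficient even for the proxy; this toy rule blames
    # perception, although reasoning may also be at fault.
    return "bad seeing"


def route_rewards(r: Rollout) -> dict:
    """Return separate perception and reasoning rewards (toy 0/1 values)."""
    return {
        # Perception is scored on the blindfolded check, independently of the final answer.
        "perception": 1.0 if r.blindfolded_correct else 0.0,
        "reasoning": 1.0 if r.answer_correct else 0.0,
    }


r = Rollout(blindfolded_correct=True, answer_correct=False)
print(diagnose(r))       # bad thinking
print(route_rewards(r))  # {'perception': 1.0, 'reasoning': 0.0}
```

The point of keeping two reward channels is that a rollout with a faithful description but a wrong answer still earns perception credit, instead of being penalized as a whole.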
Receipt and verification
| Field | Value |
|---|---|
| First computed | 2026-05-17T23:39:12.607850Z |
| Builder | pith-number-builder-2026-05-17-v1 |
| Signature | Pith Ed25519 (pith-v1-2026-05) |
| Schema | pith-number/v1.0 |
Canonical hash
93a2792a0ebce931772e483b6b66350e3792d34ff7ac67fa33c097b01c74cb12
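The identifiers on this page are consistent with a simple derivation: the 26-character Pith Number equals the unpadded RFC 4648 base32 encoding of the first 16 bytes (128 bits) of the canonical hash, and the short form `pith:SORHSKQO` is its first 8 characters. This relationship was checked against this one record only and is not stated on the page, so treat the sketch below as an observation rather than the specification; the function names are hypothetical.

```python
import base64


def pith_number_from_hash(canonical_hash_hex: str) -> str:
    """Base32-encode the first 16 bytes of a SHA-256 hex digest, without '=' padding.

    Assumption: inferred from this record, not from a published spec.
    """
    digest = bytes.fromhex(canonical_hash_hex)
    return base64.b32encode(digest[:16]).decode("ascii").rstrip("=")


def short_pith(pith_number: str) -> str:
    """Short display form: 'pith:' plus the first 8 characters."""
    return "pith:" + pith_number[:8]


h = "93a2792a0ebce931772e483b6b66350e3792d34ff7ac67fa33c097b01c74cb12"
number = pith_number_from_hash(h)
print(number)              # SORHSKQOXTUTC5ZOJA5WWZRVBY
print(short_pith(number))  # pith:SORHSKQO
```

Sixteen bytes encode to 26 base32 characters plus six `=` padding characters, which is why the padding is stripped.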
Verify this Pith Number yourself:

```shell
curl -sH 'Accept: application/ld+json' https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY \
  | jq -c '.canonical_record' \
  | python3 -c "import sys,json,hashlib; b=json.dumps(json.loads(sys.stdin.read()), sort_keys=True, separators=(',',':'), ensure_ascii=False).encode(); print(hashlib.sha256(b).hexdigest())"
# expect: 93a2792a0ebce931772e483b6b66350e3792d34ff7ac67fa33c097b01c74cb12
```
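The Python one-liner in the verification pipeline canonicalizes the record before hashing: keys are sorted at every level, there is no whitespace between tokens, and non-ASCII characters are kept as UTF-8 rather than escaped. The same steps as reusable functions (the names `canonical_bytes` and `canonical_sha256` are hypothetical, not part of any Pith tooling) show why the key order of the fetched JSON does not affect the hash:

```python
import hashlib
import json
from typing import Any


def canonical_bytes(record: Any) -> bytes:
    # Same settings as the one-liner: sorted keys, compact separators, UTF-8 output.
    return json.dumps(
        record, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def canonical_sha256(record: Any) -> str:
    """Hex SHA-256 of the canonical serialization."""
    return hashlib.sha256(canonical_bytes(record)).hexdigest()


a = {"source": {"version": 1, "id": "2605.14054"}, "schema_version": "1.0"}
b = {"schema_version": "1.0", "source": {"id": "2605.14054", "version": 1}}
print(canonical_bytes(a).decode("utf-8"))
# {"schema_version":"1.0","source":{"id":"2605.14054","version":1}}
print(canonical_sha256(a) == canonical_sha256(b))  # True
```

To check the record offline, you could pass the parsed "Canonical record JSON" to `canonical_sha256` and compare the result with the canonical hash, assuming the JSON shown on the page is the same record the API returns as `canonical_record`.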
Canonical record JSON
```json
{
  "metadata": {
    "abstract_canon_sha256": "694771751536fcba700787642bf133f0d086ef157772f5033bc5d447655fcf45",
    "cross_cats_sorted": [
      "cs.CV"
    ],
    "license": "http://creativecommons.org/licenses/by/4.0/",
    "primary_cat": "cs.AI",
    "submitted_at": "2026-05-13T19:23:53Z",
    "title_canon_sha256": "9ebacb23d313ac608f8a5c552c8dd22b719d0dee191d005b9d551ca8153f4c7b"
  },
  "schema_version": "1.0",
  "source": {
    "id": "2605.14054",
    "kind": "arxiv",
    "version": 1
  }
}
```
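The `source` block is enough to rebuild the versioned arXiv identifier and abstract URL. The helper functions below are hypothetical, not part of the Pith API; the embedded record is abbreviated to the fields they use, and the sortedness check only assumes what the field name `cross_cats_sorted` promises.

```python
import json

# Abbreviated copy of the canonical record (only the fields used below).
RECORD = json.loads("""
{"metadata": {"cross_cats_sorted": ["cs.CV"], "primary_cat": "cs.AI"},
 "schema_version": "1.0",
 "source": {"id": "2605.14054", "kind": "arxiv", "version": 1}}
""")


def arxiv_versioned_id(record: dict) -> str:
    src = record["source"]
    if src["kind"] != "arxiv":
        raise ValueError(f"not an arXiv record: {src['kind']!r}")
    return f"{src['id']}v{src['version']}"


def arxiv_abs_url(record: dict) -> str:
    return f"https://arxiv.org/abs/{arxiv_versioned_id(record)}"


def categories(record: dict) -> list:
    meta = record["metadata"]
    cross = meta["cross_cats_sorted"]
    # The field name says the list is sorted; verify before relying on it.
    if cross != sorted(cross):
        raise ValueError("cross_cats_sorted is not sorted")
    return [meta["primary_cat"], *cross]


print(arxiv_versioned_id(RECORD))  # 2605.14054v1
print(arxiv_abs_url(RECORD))       # https://arxiv.org/abs/2605.14054v1
print(categories(RECORD))          # ['cs.AI', 'cs.CV']
```

The result agrees with the page header line `arxiv:2605.14054 v1 · cs.AI · cs.CV`.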