{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:SORHSKQOXTUTC5ZOJA5WWZRVBY","short_pith_number":"pith:SORHSKQO","schema_version":"1.0","canonical_sha256":"93a2792a0ebce931772e483b6b66350e3792d34ff7ac67fa33c097b01c74cb12","source":{"kind":"arxiv","id":"2605.14054","version":1},"attestation_state":"computed","paper":{"title":"Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Vision-language models improve both perception and reasoning by routing rewards to the specific source of error via blindfolded verification.","cross_cats":["cs.CV"],"primary_cat":"cs.AI","authors_text":"Changpeng Wang, Chong Peng, Fangzhen Lin, Haozhe Wang, Qixin Xu, Taofeng Xue, Wenhu Chen","submitted_at":"2026-05-13T19:23:53Z","abstract_excerpt":"Achieving robust perception-reasoning synergy is a central goal for advanced Vision-Language Models (VLMs). Recent advancements have pursued this goal via architectural designs or agentic workflows. However, these approaches are often limited by static textual reasoning or complicated by the significant compute and engineering burden of external agentic complexity. Worse, this heavy investment does not yield proportional gains, often witnessing a \"seesaw effect\" on perception and reasoning. This motivates a fundamental rethinking of the true bottleneck. In this paper, we argue that the root ca"},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":false},"canonical_record":{"source":{"id":"2605.14054","kind":"arxiv","version":1},"metadata":{"license":"http://creativecommons.org/licenses/by/4.0/","primary_cat":"cs.AI","submitted_at":"2026-05-13T19:23:53Z","cross_cats_sorted":["cs.CV"],"title_canon_sha256":"9ebacb23d313ac608f8a5c552c8dd22b719d0dee191d005b9d551ca8153f4c7b","abstract_canon_sha256":"694771751536fcba700787642bf133f0d086ef157772f5033bc5d447655fcf45"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-17T23:39:12.608576Z","signature_b64":"XNYFM8qVo3MhoDlKfrvB32lAmfM5/R0REUUN4YSpumcddbZcucRaesqREdP+gXSUUt+yj9/w03ocFwALxj3PAA==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"93a2792a0ebce931772e483b6b66350e3792d34ff7ac67fa33c097b01c74cb12","last_reissued_at":"2026-05-17T23:39:12.607850Z","signature_status":"signed_v1","first_computed_at":"2026-05-17T23:39:12.607850Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"Bad Seeing or Bad Thinking? Rewarding Perception for Vision-Language Reasoning","license":"http://creativecommons.org/licenses/by/4.0/","headline":"Vision-language models improve both perception and reasoning by routing rewards to the specific source of error via blindfolded verification.","cross_cats":["cs.CV"],"primary_cat":"cs.AI","authors_text":"Changpeng Wang, Chong Peng, Fangzhen Lin, Haozhe Wang, Qixin Xu, Taofeng Xue, Wenhu Chen","submitted_at":"2026-05-13T19:23:53Z","abstract_excerpt":"Achieving robust perception-reasoning synergy is a central goal for advanced Vision-Language Models (VLMs). Recent advancements have pursued this goal via architectural designs or agentic workflows. However, these approaches are often limited by static textual reasoning or complicated by the significant compute and engineering burden of external agentic complexity. Worse, this heavy investment does not yield proportional gains, often witnessing a \"seesaw effect\" on perception and reasoning. This motivates a fundamental rethinking of the true bottleneck. In this paper, we argue that the root ca"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"the root cause of this trade-off is an ambiguity in modality credit assignment: when a VLM fails, is it due to flawed perception (bad seeing) or flawed logic (bad thinking)? ... These techniques are integrated into a Modality-Aware Credit Assignment (MoCA) mechanism, which routes rewards to the specific source of error.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That the blindfolded reasoning proxy in Perception Verification can reliably measure and reward perceptual fidelity independently of reasoning outcomes without introducing new biases or requiring perfect separation of modalities.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"A new RL method called MoCA with Perception Verification rewards perceptual fidelity independently to improve both seeing and thinking in VLMs.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Vision-language models improve both perception and reasoning by routing rewards to the specific source of error via blindfolded verification.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"cf462856a502330046b2e61fc13488c0f27208f96a2378871c885962038b7d24"},"source":{"id":"2605.14054","kind":"arxiv","version":1},"verdict":{"id":"85e1d818-1167-4a1a-96b0-57abb5f3f096","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T05:10:40.425167Z","strongest_claim":"the root cause of this trade-off is an ambiguity in modality credit assignment: when a VLM fails, is it due to flawed perception (bad seeing) or flawed logic (bad thinking)? ... These techniques are integrated into a Modality-Aware Credit Assignment (MoCA) mechanism, which routes rewards to the specific source of error.","one_line_summary":"A new RL method called MoCA with Perception Verification rewards perceptual fidelity independently to improve both seeing and thinking in VLMs.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That the blindfolded reasoning proxy in Perception Verification can reliably measure and reward perceptual fidelity independently of reasoning outcomes without introducing new biases or requiring perfect separation of modalities.","pith_extraction_headline":"Vision-language models improve both perception and reasoning by routing rewards to the specific source of error via blindfolded verification."},"references":{"count":100,"sample":[{"doi":"","year":null,"title":"FirstName LastName , title =","work_id":"d9cab501-317f-4237-9e32-b5ead5964402","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"FirstName Alpher , title =","work_id":"42297990-8783-41a1-b0fa-8ccdbf630852","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Journal of Foo , volume = 13, number = 1, pages =","work_id":"65a8b3d0-af84-4f68-87eb-101c85ab18b2","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Journal of Foo , volume = 14, number = 1, pages =","work_id":"b3089947-bd36-4a24-9199-cc535e299537","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"FirstName Alpher and FirstName Gamow , title =","work_id":"caed320b-7cdc-41ca-bb08-00fb14feec62","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":100,"snapshot_sha256":"80ea5f2c4df8119756beb1529e3459ce5cfc0c8e65ac50bc14a6bdfc34aa73cc","internal_anchors":7},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2605.14054","created_at":"2026-05-17T23:39:12.607986+00:00"},{"alias_kind":"arxiv_version","alias_value":"2605.14054v1","created_at":"2026-05-17T23:39:12.607986+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2605.14054","created_at":"2026-05-17T23:39:12.607986+00:00"},{"alias_kind":"pith_short_12","alias_value":"SORHSKQOXTUT","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"SORHSKQOXTUTC5ZO","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"SORHSKQO","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":0,"internal_anchor_count":0,"sample":[]},"formal_canon":{"evidence_count":0,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY","json":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY.json","graph_json":"https://pith.science/api/pith-number/SORHSKQOXTUTC5ZOJA5WWZRVBY/graph.json","events_json":"https://pith.science/api/pith-number/SORHSKQOXTUTC5ZOJA5WWZRVBY/events.json","paper":"https://pith.science/paper/SORHSKQO"},"agent_actions":{"view_html":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY","download_json":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY.json","view_paper":"https://pith.science/paper/SORHSKQO","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2605.14054&json=true","fetch_graph":"https://pith.science/api/pith-number/SORHSKQOXTUTC5ZOJA5WWZRVBY/graph.json","fetch_events":"https://pith.science/api/pith-number/SORHSKQOXTUTC5ZOJA5WWZRVBY/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY/action/timestamp_anchor","attest_storage":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY/action/storage_attestation","attest_author":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY/action/author_attestation","sign_citation":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY/action/citation_signature","submit_replication":"https://pith.science/pith/SORHSKQOXTUTC5ZOJA5WWZRVBY/action/replication_record"}},"created_at":"2026-05-17T23:39:12.607986+00:00","updated_at":"2026-05-17T23:39:12.607986+00:00"}