{"record_type":"pith_number_record","schema_url":"https://pith.science/schemas/pith-number/v1.json","pith_number":"pith:2026:LVPWXSGYEVADWFCGDBQYX7ZL5L","short_pith_number":"pith:LVPWXSGY","schema_version":"1.0","canonical_sha256":"5d5f6bc8d825403b144618618bff2bead30ec8155b6065932ba15832e9e355a8","source":{"kind":"arxiv","id":"2602.07458","version":4},"attestation_state":"computed","paper":{"title":"SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Anchoring rewards to predicted edit regions closes the perception gap in image editing RL","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bin Wen, Changyi Liu, Fan Yang, Han Li, Haonan Fan, Hongyang Wei, Jiankang Chen, Kaiyu Jiang, Kaiyu Tang, Shuo Yang, Tianke Zhang, Tingting Gao, Wei Chen, Yancheng Long, Yankai Yang","submitted_at":"2026-02-07T09:23:34Z","abstract_excerpt":"Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing evaluators frequently struggle with a critical perception gap we term \"Attention Collapse,\" where models neglect cross-image comparisons and fail to capture fine-grained details, resulting in inaccurate perception and miscalibrated scores. To address these limitations, we propose SpatialReward, a reward model that enforces precise verification via explicit spatial reasoning. By anchoring reasoning to predicted "},"verification_status":{"content_addressed":true,"pith_receipt":true,"author_attested":false,"weak_author_claims":0,"strong_author_claims":0,"externally_anchored":false,"storage_verified":false,"citation_signatures":0,"replication_records":0,"graph_snapshot":true,"references_resolved":true,"formal_links_present":true},"canonical_record":{"source":{"id":"2602.07458","kind":"arxiv","version":4},"metadata":{"license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","primary_cat":"cs.CV","submitted_at":"2026-02-07T09:23:34Z","cross_cats_sorted":[],"title_canon_sha256":"1b5d4237dc6f688e08f86957fa5d701d53ea0fa79ea55d2fde65d83d45a38ce0","abstract_canon_sha256":"858e2f5a747c75d8bd75320f8f48e4fc3f0a7e5cdc6e2a6ef6e7b75004144ea8"},"schema_version":"1.0"},"receipt":{"kind":"pith_receipt","key_id":"pith-v1-2026-05","algorithm":"ed25519","signed_at":"2026-05-18T02:44:31.313550Z","signature_b64":"2b8or8nPl1VbxndHxGzi7gjFSX3UMo0+/oFE/PigAOyv+DT1X8uhXev3UTfyyj2nFDRvdNODu/zVlQagbloPCg==","signed_message":"canonical_sha256_bytes","builder_version":"pith-number-builder-2026-05-17-v1","receipt_version":"0.3","canonical_sha256":"5d5f6bc8d825403b144618618bff2bead30ec8155b6065932ba15832e9e355a8","last_reissued_at":"2026-05-18T02:44:31.312988Z","signature_status":"signed_v1","first_computed_at":"2026-05-18T02:44:31.312988Z","public_key_fingerprint":"8d4b5ee74e4693bcd1df2446408b0d54"},"graph_snapshot":{"paper":{"title":"SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Anchoring rewards to predicted edit regions closes the perception gap in image editing RL","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bin Wen, Changyi Liu, Fan Yang, Han Li, Haonan Fan, Hongyang Wei, Jiankang Chen, Kaiyu Jiang, Kaiyu Tang, Shuo Yang, Tianke Zhang, Tingting Gao, Wei Chen, Yancheng Long, Yankai Yang","submitted_at":"2026-02-07T09:23:34Z","abstract_excerpt":"Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing evaluators frequently struggle with a critical perception gap we term \"Attention Collapse,\" where models neglect cross-image comparisons and fail to capture fine-grained details, resulting in inaccurate perception and miscalibrated scores. To address these limitations, we propose SpatialReward, a reward model that enforces precise verification via explicit spatial reasoning. By anchoring reasoning to predicted "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"SpatialReward serves as a robust signal in online RL, boosting OmniGen2 by +0.90 on GEdit-Bench--surpassing the leading discriminative model and doubling the gain of GPT-4.1 (+0.45).","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That predicting edit regions and anchoring reasoning to them reliably grounds semantic judgments in pixel-level evidence without the prediction step introducing new errors that offset the gains.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SpatialReward is a new reward model that grounds image edit evaluations in pixel-level spatial reasoning on predicted regions, achieving SOTA on benchmarks and doubling RL gains for OmniGen2.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Anchoring rewards to predicted edit regions closes the perception gap in image editing RL","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c37e0bce4c6925394fa2bfaffdedf6bf090594c13379ef373ccad3e0141bc907"},"source":{"id":"2602.07458","kind":"arxiv","version":4},"verdict":{"id":"b41b4442-63f5-4b9e-b539-5dc62b07270e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T06:30:28.074324Z","strongest_claim":"SpatialReward serves as a robust signal in online RL, boosting OmniGen2 by +0.90 on GEdit-Bench--surpassing the leading discriminative model and doubling the gain of GPT-4.1 (+0.45).","one_line_summary":"SpatialReward is a new reward model that grounds image edit evaluations in pixel-level spatial reasoning on predicted regions, achieving SOTA on benchmarks and doubling RL gains for OmniGen2.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That predicting edit regions and anchoring reasoning to them reliably grounds semantic judgments in pixel-level evidence without the prediction step introducing new errors that offset the gains.","pith_extraction_headline":"Anchoring rewards to predicted edit regions closes the perception gap in image editing RL"},"references":{"count":29,"sample":[{"doi":"","year":null,"title":"• Good: All edit operations in the instruction are perfectly executed","work_id":"5aa17261-390f-4938-a28d-cef6dfb44857","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"•Good: High fidelity, no visible artifacts","work_id":"bacec80f-ceaf-47cb-8122-ca37b79da335","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Overall AestheticsA holistic assessment of the image’s visual appeal and harmony. annotators are instructed to judge solely based on the visual outcome: •Good: Visually pleasing, professional-looking ","work_id":"1626f568-15b7-4dd0-b7f8-0dce40250ee2","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Reward Model Interpretation(Section C.1): We analyze the internal attention mechanisms of SpatialReward to verify its reasoning logic and explain the metrics used for quantitative diagnosis","work_id":"339aea62-cb1a-4a16-ab1b-f02f812da550","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Policy Generation Results(Section C.2): We showcase additional qualitative comparisons of the downstream policy model (OmniGen2) trained via Online RL, demonstrating the effectiveness of our reward si","work_id":"638b3958-8a3c-4ec9-ad34-8fa2ace54285","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":29,"snapshot_sha256":"e392cd577fc150b4a5b1c8001342c906a09c3ebf0d7e37c97d5f8535a4ebb7c8","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"7fcad0284b9586149212dd74cd856be83c26e01c97febcea75a9cd92901d98e7"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"},"aliases":[{"alias_kind":"arxiv","alias_value":"2602.07458","created_at":"2026-05-18T02:44:31.313070+00:00"},{"alias_kind":"arxiv_version","alias_value":"2602.07458v4","created_at":"2026-05-18T02:44:31.313070+00:00"},{"alias_kind":"doi","alias_value":"10.48550/arxiv.2602.07458","created_at":"2026-05-18T02:44:31.313070+00:00"},{"alias_kind":"pith_short_12","alias_value":"LVPWXSGYEVAD","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_16","alias_value":"LVPWXSGYEVADWFCG","created_at":"2026-05-18T12:33:37.589309+00:00"},{"alias_kind":"pith_short_8","alias_value":"LVPWXSGY","created_at":"2026-05-18T12:33:37.589309+00:00"}],"events":[],"event_summary":{},"paper_claims":[],"inbound_citations":{"count":1,"internal_anchor_count":1,"sample":[{"citing_arxiv_id":"2604.07296","citing_title":"OpenSpatial: A Principled Data Engine for Empowering Spatial Intelligence","ref_index":32,"is_internal_anchor":true}]},"formal_canon":{"evidence_count":2,"sample":[],"anchors":[]},"links":{"html":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L","json":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L.json","graph_json":"https://pith.science/api/pith-number/LVPWXSGYEVADWFCGDBQYX7ZL5L/graph.json","events_json":"https://pith.science/api/pith-number/LVPWXSGYEVADWFCGDBQYX7ZL5L/events.json","paper":"https://pith.science/paper/LVPWXSGY"},"agent_actions":{"view_html":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L","download_json":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L.json","view_paper":"https://pith.science/paper/LVPWXSGY","resolve_alias":"https://pith.science/api/pith-number/resolve?arxiv=2602.07458&json=true","fetch_graph":"https://pith.science/api/pith-number/LVPWXSGYEVADWFCGDBQYX7ZL5L/graph.json","fetch_events":"https://pith.science/api/pith-number/LVPWXSGYEVADWFCGDBQYX7ZL5L/events.json","actions":{"anchor_timestamp":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L/action/timestamp_anchor","attest_storage":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L/action/storage_attestation","attest_author":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L/action/author_attestation","sign_citation":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L/action/citation_signature","submit_replication":"https://pith.science/pith/LVPWXSGYEVADWFCGDBQYX7ZL5L/action/replication_record"}},"created_at":"2026-05-18T02:44:31.313070+00:00","updated_at":"2026-05-18T02:44:31.313070+00:00"}