{"paper":{"title":"SpatialReward: Bridging the Perception Gap in Online RL for Image Editing via Explicit Spatial Reasoning","license":"http://arxiv.org/licenses/nonexclusive-distrib/1.0/","headline":"Anchoring rewards to predicted edit regions closes the perception gap in image editing RL","cross_cats":[],"primary_cat":"cs.CV","authors_text":"Bin Wen, Changyi Liu, Fan Yang, Han Li, Haonan Fan, Hongyang Wei, Jiankang Chen, Kaiyu Jiang, Kaiyu Tang, Shuo Yang, Tianke Zhang, Tingting Gao, Wei Chen, Yancheng Long, Yankai Yang","submitted_at":"2026-02-07T09:23:34Z","abstract_excerpt":"Online Reinforcement Learning (RL) offers a promising avenue for complex image editing but is currently constrained by the scarcity of reliable and fine-grained reward signals. Existing evaluators frequently struggle with a critical perception gap we term \"Attention Collapse,\" where models neglect cross-image comparisons and fail to capture fine-grained details, resulting in inaccurate perception and miscalibrated scores. To address these limitations, we propose SpatialReward, a reward model that enforces precise verification via explicit spatial reasoning. By anchoring reasoning to predicted "},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"SpatialReward serves as a robust signal in online RL, boosting OmniGen2 by +0.90 on GEdit-Bench--surpassing the leading discriminative model and doubling the gain of GPT-4.1 (+0.45).","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That predicting edit regions and anchoring reasoning to them reliably grounds semantic judgments in pixel-level evidence without the prediction step introducing new errors that offset the gains.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"SpatialReward is a new reward model that grounds image edit evaluations in pixel-level spatial reasoning on predicted regions, achieving SOTA on benchmarks and doubling RL gains for OmniGen2.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"Anchoring rewards to predicted edit regions closes the perception gap in image editing RL","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"c37e0bce4c6925394fa2bfaffdedf6bf090594c13379ef373ccad3e0141bc907"},"source":{"id":"2602.07458","kind":"arxiv","version":4},"verdict":{"id":"b41b4442-63f5-4b9e-b539-5dc62b07270e","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-16T06:30:28.074324Z","strongest_claim":"SpatialReward serves as a robust signal in online RL, boosting OmniGen2 by +0.90 on GEdit-Bench--surpassing the leading discriminative model and doubling the gain of GPT-4.1 (+0.45).","one_line_summary":"SpatialReward is a new reward model that grounds image edit evaluations in pixel-level spatial reasoning on predicted regions, achieving SOTA on benchmarks and doubling RL gains for OmniGen2.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That predicting edit regions and anchoring reasoning to them reliably grounds semantic judgments in pixel-level evidence without the prediction step introducing new errors that offset the gains.","pith_extraction_headline":"Anchoring rewards to predicted edit regions closes the perception gap in image editing RL"},"references":{"count":29,"sample":[{"doi":"","year":null,"title":"• Good: All edit operations in the instruction are perfectly executed","work_id":"5aa17261-390f-4938-a28d-cef6dfb44857","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"•Good: High fidelity, no visible artifacts","work_id":"bacec80f-ceaf-47cb-8122-ca37b79da335","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2017,"title":"Overall AestheticsA holistic assessment of the image’s visual appeal and harmony. annotators are instructed to judge solely based on the visual outcome: •Good: Visually pleasing, professional-looking ","work_id":"1626f568-15b7-4dd0-b7f8-0dce40250ee2","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Reward Model Interpretation(Section C.1): We analyze the internal attention mechanisms of SpatialReward to verify its reasoning logic and explain the metrics used for quantitative diagnosis","work_id":"339aea62-cb1a-4a16-ab1b-f02f812da550","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":null,"title":"Policy Generation Results(Section C.2): We showcase additional qualitative comparisons of the downstream policy model (OmniGen2) trained via Online RL, demonstrating the effectiveness of our reward si","work_id":"638b3958-8a3c-4ec9-ad34-8fa2ace54285","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":29,"snapshot_sha256":"e392cd577fc150b4a5b1c8001342c906a09c3ebf0d7e37c97d5f8535a4ebb7c8","internal_anchors":0},"formal_canon":{"evidence_count":2,"snapshot_sha256":"7fcad0284b9586149212dd74cd856be83c26e01c97febcea75a9cd92901d98e7"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}