{"paper":{"title":"PROVE: A Perceptual RemOVal cohErence Benchmark for Visual Media","license":"http://creativecommons.org/licenses/by/4.0/","headline":"RC metrics measure local spatial and temporal coherence in object removal to better match human perception than prior evaluation protocols.","cross_cats":["cs.AI","cs.MM"],"primary_cat":"cs.CV","authors_text":"Daiguo Zhou, Fei Wang, Fuhao Li, Jiagao Hu, Jian Luan, Shaofeng You, Yu Liu, Yuxuan Chen, Zepeng Wang","submitted_at":"2026-05-14T08:16:51Z","abstract_excerpt":"Evaluating object removal in images and videos remains challenging because the task is inherently one-to-many, yet existing metrics frequently disagree with human perception. Full-reference metrics reward copy-paste behaviors over genuine erasure; no-reference metrics suffer from systematic biases such as favoring blurry results; and global temporal metrics are insensitive to localized artifacts within edited regions. To address these limitations, we propose RC (Removal Coherence), a pair of perception-aligned metrics: RC-S, which measures spatial coherence via sliding-window feature compariso"},"claims":{"count":4,"items":[{"kind":"strongest_claim","text":"Experiments across diverse image and video benchmarks demonstrate that RC achieves substantially stronger alignment with human judgments than existing evaluation protocols.","source":"verdict.strongest_claim","status":"machine_extracted","claim_id":"C1","attestation":"unclaimed"},{"kind":"weakest_assumption","text":"That sliding-window feature comparisons and distribution tracking in restored regions will reliably capture human perception of coherence without post-hoc tuning or unstated biases in the chosen feature extractors.","source":"verdict.weakest_assumption","status":"machine_extracted","claim_id":"C2","attestation":"unclaimed"},{"kind":"one_line_summary","text":"PROVE proposes RC metrics for perceptual removal coherence and releases PROVE-Bench to better align automatic scores with human judgments on object removal tasks.","source":"verdict.one_line_summary","status":"machine_extracted","claim_id":"C3","attestation":"unclaimed"},{"kind":"headline","text":"RC metrics measure local spatial and temporal coherence in object removal to better match human perception than prior evaluation protocols.","source":"verdict.pith_extraction.headline","status":"machine_extracted","claim_id":"C4","attestation":"unclaimed"}],"snapshot_sha256":"ec506dd766459f1740e9f18787098b78198101f78e35a57fed8c03e3b1bbfff3"},"source":{"id":"2605.14534","kind":"arxiv","version":1},"verdict":{"id":"f79dcb31-903e-4600-9230-646463996c21","model_set":{"reader":"grok-4.3"},"created_at":"2026-05-15T02:08:07.586906Z","strongest_claim":"Experiments across diverse image and video benchmarks demonstrate that RC achieves substantially stronger alignment with human judgments than existing evaluation protocols.","one_line_summary":"PROVE proposes RC metrics for perceptual removal coherence and releases PROVE-Bench to better align automatic scores with human judgments on object removal tasks.","pipeline_version":"pith-pipeline@v0.9.0","weakest_assumption":"That sliding-window feature comparisons and distribution tracking in restored regions will reliably capture human perception of coherence without post-hoc tuning or unstated biases in the chosen feature extractors.","pith_extraction_headline":"RC metrics measure local spatial and temporal coherence in object removal to better match human perception than prior evaluation protocols."},"references":{"count":43,"sample":[{"doi":"","year":2024,"title":"Assessing image inpainting via re-inpainting self-consistency evaluation,","work_id":"94d1f353-a1ae-4c6f-a293-845194a9e75a","ref_index":1,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2010,"title":"Image quality metrics: Psnr vs. ssim,","work_id":"1180cfb6-cbae-49e8-b6b5-a0192b43b015","ref_index":2,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2004,"title":"Image quality assessment: from error visibility to structural similarity","work_id":"3d0f8170-1f04-4c66-b75e-5cd6399ea77c","ref_index":3,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2018,"title":"The unreasonable effectiveness of deep features as a perceptual metric,","work_id":"8fc1f7c4-743c-4fb9-b0bd-63b7fcce3272","ref_index":4,"cited_arxiv_id":"","is_internal_anchor":false},{"doi":"","year":2022,"title":"Resolution-robust large mask inpainting with fourier convolutions,","work_id":"a1cd330e-e045-41cd-a256-812d4f644f13","ref_index":5,"cited_arxiv_id":"","is_internal_anchor":false}],"resolved_work":43,"snapshot_sha256":"d3223ad3c30edf46b794b7eca5b46ad9de2d963f473eebef32e4762d1382a461","internal_anchors":3},"formal_canon":{"evidence_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"author_claims":{"count":0,"strong_count":0,"snapshot_sha256":"258153158e38e3291e3d48162225fcdb2d5a3ed65a07baac614ab91432fd4f57"},"builder_version":"pith-number-builder-2026-05-17-v1"}