UniREditBench: A Unified Reasoning-based Image Editing Benchmark

Han, F · 2025 · arXiv 2511.01295

9 Pith papers cite this work. Polarity classification is still indexing.

citation-role summary

background 2

citation-polarity summary

background 2

representative citing papers

PhyEditBench: A Real-World Multi-Stage Benchmark for Physics-Aware Image Editing

cs.CV · 2026-06-25 · unverdicted · novelty 7.0

PhyEditBench is a new benchmark for physics-aware image editing with real and synthetic instances plus a training-free PhyWorld baseline that uses test-time scaling to outperform SOTA models.

Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling

cs.CV · 2026-05-13 · unverdicted · novelty 7.0

Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.

Inline Critic Steers Image Editing

cs.CV · 2026-05-12 · conditional · novelty 7.0

Inline Critic uses a learnable token to critique and steer a frozen image-editing model's intermediate layers during generation, delivering state-of-the-art results on GEdit-Bench, RISEBench, and KRIS-Bench.

MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing

cs.CV · 2026-05-04 · unverdicted · novelty 7.0

MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.

DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model

cs.CV · 2026-02-27 · unverdicted · novelty 7.0

DLEBench is the first benchmark for small-scale object editing in instruction-based image editing models, using 1889 samples, seven instruction types, and a dual-mode evaluation protocol to reveal performance gaps in 10 tested models.

ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning

cs.CV · 2026-05-08 · unverdicted · novelty 6.0

ReasonEdit uses a new CoT dataset and reinforcement learning to produce interpretable, human-aligned evaluations of text-guided image edits.

PaintBench: Deterministic Evaluation of Precise Visual Editing

cs.GR · 2026-05-29 · unverdicted · novelty 5.0

PaintBench provides a scalable deterministic benchmark for precise visual editing operations, revealing that even the best of 11 models achieves only 17.1% mIoU and that scores correlate strongly with applied data visualization editing performance.

Bernini: Latent Semantic Planning for Video Diffusion

cs.CV · 2026-05-21 · unverdicted · novelty 5.0

Bernini is a framework that uses an MLLM planner to output semantic representations for a DiT renderer to generate or edit videos, reporting SOTA benchmark performance.

DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents

cs.AI · 2026-05-03 · unverdicted · novelty 5.0

DataEvolver introduces a reusable framework with generation-time self-correction and validation-time self-expansion loops that improves visual datasets, shown to outperform baselines on an object-rotation task.

citing papers explorer

Showing 8 of 8 citing papers after filters.

PhyEditBench: A Real-World Multi-Stage Benchmark for Physics-Aware Image Editing cs.CV · 2026-06-25 · unverdicted · none · ref 17
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling cs.CV · 2026-05-13 · unverdicted · none · ref 13
MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing cs.CV · 2026-05-04 · unverdicted · none · ref 51
DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model cs.CV · 2026-02-27 · unverdicted · none · ref 8
ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning cs.CV · 2026-05-08 · unverdicted · none · ref 14
PaintBench: Deterministic Evaluation of Precise Visual Editing cs.GR · 2026-05-29 · unverdicted · none · ref 11
Bernini: Latent Semantic Planning for Video Diffusion cs.CV · 2026-05-21 · unverdicted · none · ref 25
DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents cs.AI · 2026-05-03 · unverdicted · none · ref 12
