Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.
arXiv preprint arXiv:2511.01295 , year=
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
years
2026 7roles
background 2polarities
background 2representative citing papers
Inline Critic uses a learnable token to critique and steer a frozen image-editing model's intermediate layers during generation, delivering state-of-the-art results on GEdit-Bench, RISEBench, and KRIS-Bench.
MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
DLEBench is the first benchmark for small-scale object editing in instruction-based image editing models, using 1889 samples, seven instruction types, and a dual-mode evaluation protocol to reveal performance gaps in 10 tested models.
ReasonEdit uses a new CoT dataset and reinforcement learning to produce interpretable, human-aligned evaluations of text-guided image edits.
Bernini is a framework that uses an MLLM planner to output semantic representations for a DiT renderer to generate or edit videos, reporting SOTA benchmark performance.
DataEvolver introduces a reusable framework with generation-time self-correction and validation-time self-expansion loops that improves visual datasets, shown to outperform baselines on an object-rotation task.
citing papers explorer
-
Edit-Compass & EditReward-Compass: A Unified Benchmark for Image Editing and Reward Modeling
Edit-Compass and EditReward-Compass are new unified benchmarks for fine-grained image editing evaluation and realistic reward modeling in reinforcement learning optimization.
-
Inline Critic Steers Image Editing
Inline Critic uses a learnable token to critique and steer a frozen image-editing model's intermediate layers during generation, delivering state-of-the-art results on GEdit-Bench, RISEBench, and KRIS-Bench.
-
MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing
MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
-
DLEBench: Evaluating Small-scale Object Editing Ability for Instruction-based Image Editing Model
DLEBench is the first benchmark for small-scale object editing in instruction-based image editing models, using 1889 samples, seven instruction types, and a dual-mode evaluation protocol to reveal performance gaps in 10 tested models.
-
ReasonEdit: Towards Interpretable Image Editing Evaluation via Reinforcement Learning
ReasonEdit uses a new CoT dataset and reinforcement learning to produce interpretable, human-aligned evaluations of text-guided image edits.
-
Bernini: Latent Semantic Planning for Video Diffusion
Bernini is a framework that uses an MLLM planner to output semantic representations for a DiT renderer to generate or edit videos, reporting SOTA benchmark performance.
-
DataEvolver: Let Your Data Build and Improve Itself via Goal-Driven Loop Agents
DataEvolver introduces a reusable framework with generation-time self-correction and validation-time self-expansion loops that improves visual datasets, shown to outperform baselines on an object-rotation task.