Do-Undo Bench is a new evaluation task and dataset that requires models to simulate the forward effects of an action and then undo them, as a test of genuine action understanding in image generation.
Picabench: How far are we from physically realistic image editing?
4 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary: baseline 1, method 1
representative citing papers
LithoBench is a new multi-level benchmark showing that existing large multimodal models have substantial limitations in geological semantic understanding for remote sensing lithology interpretation.
DataEvolver introduces a reusable framework with generation-time self-correction and validation-time self-expansion loops that improves visual datasets and is shown to outperform baselines on an object-rotation task.