New 3DLP benchmark with real-world 1K HDR pairs shows state-of-the-art image editing models vary in physical lighting consistency, with best models close to reality but error-prone in low-light regions.
Picabench: How far are we from physically realistic image editing?
7 Pith papers cite this work. Polarity classification is still indexing.
citation-role summary
citation-polarity summary
representative citing papers
Presents Entity-Rubrics and AbstractEdit benchmark to measure image editing models on abstract intent, finding standard models struggle to balance edit intent with image preservation.
Do-Undo Bench is a new evaluation task and dataset that forces models to simulate forward action effects and then undo them to measure genuine action understanding in image generation.
Introduces HOI-Edit benchmark with HOI-Eval metric and SCPE self-correcting framework leveraging I2V models for competitive HOI image editing performance.
LithoBench is a new multi-level benchmark showing that existing large multimodal models have substantial limitations in geological semantic understanding for remote sensing lithology interpretation.
DataEvolver introduces a reusable framework with generation-time self-correction and validation-time self-expansion loops that improves visual datasets, shown to outperform baselines on an object-rotation task.
citing papers explorer
-
LithoBench: Benchmarking Large Multimodal Models for Remote-Sensing Lithology Interpretation
LithoBench is a new multi-level benchmark showing that existing large multimodal models have substantial limitations in geological semantic understanding for remote sensing lithology interpretation.