arXiv preprint arXiv:2411.07232
5 Pith papers cite this work. Polarity classification is still indexing.
fields
cs.CV 5
verdicts
UNVERDICTED 5
representative citing papers
citing papers explorer
- CV-Arena: An Open Benchmark for Instructional Computer Vision Problem Solving with Human-AI Collaborative Preferences
  CV-Arena is a new 12K-pair benchmark for instruction-guided real-image editing with 16 task types, CogRetriever curation, and Active Elo mixed human-AI evaluation; it finds gaps in 21 models and presents CV-Agent.
- Editor's Choice: Evaluating Abstract Intent in Image Editing through Atomic Entity Analysis
  Presents Entity-Rubrics and the AbstractEdit benchmark to measure image editing models on abstract intent, finding that standard models struggle to balance edit intent with image preservation.
- GenHSI: Controllable Generation of Human-Scene Interaction Videos
  GenHSI is a training-free three-stage pipeline that turns a scene image, a character image, and a complex HSI prompt into long videos with plausible chained interactions by generating atomic actions and 3D keyframes (via 2D inpainting plus optimization) and then feeding them to a pre-trained video diffusion model.
- AdaEraser: Training-Free Object Removal via Adaptive Attention Suppression
  AdaEraser introduces token-wise adaptive attention suppression in diffusion denoising, enabling high-quality training-free object removal by modulating suppression according to evolving self-attention maps.
- AD-Relight: Training-Free Banner Relighting via Illumination Translation with Diffusion Priors
  AD-Relight adapts diffusion-based relighting models at test time via a multi-stage framework to relight custom ad banners so that they match scene illumination, outperforming warping and prior relighting approaches.