ImagenWorld: Stress-testing image generation models with explainable human evaluation on open-ended real-world tasks

Samin Mahdizadeh Sani, Max Ku, Nima Jamali, Matina Mahdizadeh Sani, Paria Khoshtab, Wei-Chieh Sun, Parnian Fazel, Zhi Rui Tam, Thomas Chong, Edisy Kin Wai Chan, Donald Wai Tong Tsang, Chiao-Wei Hsu, Ting Wai Lam, Ho Yin Sam Ng, Chiafeng Chu · 2026 · arXiv 2603.27862

4 Pith papers cite this work. Polarity classification is still indexing.


citation-role summary: background 1

citation-polarity summary: background 1

representative citing papers

Can Image Models Imagine Time? ImageTime: A Novel Benchmark for Probing Visual World Modeling Through Spatiotemporal Consistency

cs.CV · 2026-06-09 · unverdicted · novelty 7.0

ImageTime is a benchmark that probes image generation models' visual world modeling by requiring coherent four-state sequences in single images, scored by a VLM judge.

TASTE: A Designer-Annotated Multi-Dimensional Preference Dataset for AI-Generated Graphic Design

cs.CV · 2026-05-20 · unverdicted · novelty 7.0 · 2 refs

TASTE supplies designer multi-dimensional rankings of T2I graphic outputs with statistical validation showing moderate agreement and benchmarks where a TASTE-trained MLP outperforms off-the-shelf VLMs.

RewardHarness: Self-Evolving Agentic Post-Training

cs.AI · 2026-05-09 · unverdicted · novelty 7.0

RewardHarness self-evolves a tool-and-skill library from 100 preference examples to reach 47.4% accuracy on image-edit evaluation, beating GPT-5, and yields stronger RL-tuned models.

A Cross-Model VLM-Judge Protocol for Single-Image 3D Mesh Quality (and Why Cheap Proxies Fall Short)

cs.LG · 2026-06-16 · unverdicted · novelty 6.0

A reproducible VLM-judge protocol with position-bias correction is validated as superior to CLIP similarity and geometry-validity proxies for assessing single-image 3D mesh quality.

