GenExam: A Multidisciplinary Text-to-Image Exam

Zhaokai Wang , Penghao Yin , Xiangyu Zhao , Changyao Tian , Yu Qiao , Wenhai Wang , Jifeng Dai , Gen Luo

Authors on Pith no claims yet

classification 💻 cs.CV

keywords genexamgenerationmodelsevaluationexamsreasoningtext-to-imageunderstanding

read the original abstract

Exams are a fundamental test of expert-level intelligence and require integrated understanding, reasoning, and generation. Existing exam-style benchmarks mainly focus on understanding and reasoning tasks, and current generation benchmarks emphasize the illustration of world knowledge and visual concepts, neglecting the evaluation of rigorous drawing exams. We introduce GenExam, the first benchmark for multidisciplinary text-to-image exams, featuring 1,000 samples across 10 subjects with exam-style prompts organized under a four-level taxonomy. Each problem is equipped with ground-truth images and fine-grained scoring points to enable a precise evaluation of semantic correctness and visual plausibility. Experiments on 17 text-to-image and unified models demonstrate the great challenge of GenExam and the huge gap where open-source models consistently lag behind the leading closed-source ones. By framing image generation as an exam, GenExam offers a rigorous assessment of models' ability to integrate understanding, reasoning, and generation, providing insights for on the path to intelligent generative models. Our benchmark and evaluation code are released at https://github.com/OpenGVLab/GenExam.

This paper has not been read by Pith yet.

discussion (0)

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

MULTITEXTEDIT: Benchmarking Cross-Lingual Degradation in Text-in-Image Editing
cs.CV 2026-05 unverdicted novelty 7.0

MULTITEXTEDIT benchmark reveals that all tested text-in-image editing models show pronounced degradation on non-English languages, especially Hebrew and Arabic, mainly in text accuracy and script fidelity.
How RL Unlocks the Aha Moment in Geometric Interleaved Reasoning
cs.CL 2026-03 unverdicted novelty 6.0

Reinforcement learning with three causal constraints enables multimodal models to internalize diagram-reasoning links in geometry, unlike SFT which only mimics surface format and harms performance.