IGenBench: Benchmarking the Reliability of Text-to-Infographic Generation
read the original abstract
Infographics are composite visual artifacts that combine data visualizations with textual and illustrative elements to communicate information. While recent text-to-image (T2I) models can generate aesthetically appealing images, their reliability in generating infographics remains unclear. Generated infographics may appear correct at first glance but contain easily overlooked issues, such as distorted data encoding or incorrect textual content. We present IGENBENCH, the first benchmark for evaluating the reliability of text-to-infographic generation, comprising 600 curated test cases spanning 30 infographic types. We design an automated evaluation framework that decomposes reliability verification into atomic yes/no questions based on a taxonomy of 10 question types. We employ multimodal large language models (MLLMs) to verify each question, yielding question-level accuracy (Q-ACC) and infographic-level accuracy (I-ACC). We comprehensively evaluate 10 state-of-the-art T2I models on IGENBENCH. Our systematic analysis reveals key insights for future model development: (i) a three-tier performance hierarchy with the top model achieving Q-ACC of 0.90 but I-ACC of only 0.49; (ii) data-related dimensions emerging as universal bottlenecks (e.g., Data Completeness: 0.21); and (iii) the challenge of achieving end-to-end correctness across all models. We release IGENBENCH at https://igen-bench.vercel.app/.
This paper has not been read by Pith yet.
Forward citations
Cited by 4 Pith papers
-
sketch-plot: Progressive Editing for Text-to-Image Academic Figures
sketch-plot introduces a three-layer progressive editing pipeline with human-in-the-loop refinement for targeted modifications to text-to-image academic figures.
-
DataMagic: Transforming Tabular Data into Data Insight Video
DataMagic generates narrative data videos from tabular data and queries via DVSpec declarative bindings and a Generate-then-Orchestrate multi-agent pipeline.
-
Demonstrating chart-plot: Closing the Last Mile of Academic Chart Generation
chart-plot is an agentic harness using style-aware code generation from venue figures, a LaTeX-aware render-and-revise loop, and structured edit handles to produce top-venue-ready academic charts.
-
SenseNova-U1: Unifying Multimodal Understanding and Generation with NEO-unify Architecture
SenseNova-U1 presents native unified multimodal models that match top understanding VLMs while delivering strong performance in image generation, infographics, and interleaved tasks via the NEO-unify architecture.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.