Commonsense-t2i challenge: Can text-to-image generation models understand commonsense?

Fu, X · 2024 · arXiv 2406.07546

4 Pith papers cite this work. Polarity classification is still indexing.

4 Pith papers citing it

representative citing papers

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts

cs.CV · 2024-12-05 · unverdicted · novelty 7.0

T2I-FactualBench is a new three-tier benchmark for factuality of knowledge-intensive concepts in T2I models, using multi-round VQA evaluation to show SOTA models need improvement.

Intermediate Text Representation Guided Text-to-Image Generation for Enhancing One-and-Only Alignment

cs.CV · 2026-06-29 · unverdicted · novelty 6.0

IR-guided diffusion injects intermediate text representations into early denoising steps to improve alignment for one-and-only objects, reporting up to 19.1pp VQAScore gains on OAO-AttackBench and other benchmarks.

Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation

cs.CV · 2024-10-07 · unverdicted · novelty 6.0

PhyGenBench supplies 160 prompts across 27 physical laws and an automated LLM/VLM evaluation pipeline to measure physical commonsense compliance in current text-to-video models.

WISE: A World Knowledge-Informed Semantic Evaluation for Text-to-Image Generation

cs.CV · 2025-03-10

citing papers explorer

Showing 2 of 2 citing papers after filters.

T2I-FactualBench: Benchmarking the Factuality of Text-to-Image Models with Knowledge-Intensive Concepts cs.CV · 2024-12-05 · unverdicted · none · ref 10
T2I-FactualBench is a new three-tier benchmark for factuality of knowledge-intensive concepts in T2I models, using multi-round VQA evaluation to show SOTA models need improvement.
Towards World Simulator: Crafting Physical Commonsense-Based Benchmark for Video Generation cs.CV · 2024-10-07 · unverdicted · none · ref 8
PhyGenBench supplies 160 prompts across 27 physical laws and an automated LLM/VLM evaluation pipeline to measure physical commonsense compliance in current text-to-video models.

Commonsense-t2i challenge: Can text-to-image generation models understand commonsense?

fields

years

verdicts

representative citing papers

citing papers explorer