T2i-compbench++: An enhanced and comprehensive benchmark for compositional text-to-image generation

Huang, K · 2025 · arXiv 2307.06350

5 Pith papers cite this work. Polarity classification is still indexing.

5 Pith papers citing it

read on arXiv browse 5 citing papers

citation-role summary

background 1

citation-polarity summary

background 1

representative citing papers

Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models

cs.CV · 2026-04-27 · unverdicted · novelty 7.0

XTC-Bench reveals that strong performance on generation or understanding tasks in unified multimodal models does not guarantee cross-task semantic consistency, which instead depends on how tightly coupled the learning objectives are across modalities.

The Illusion of High Utility in Safety Alignment of Text-to-Image Diffusion Models

cs.CV · 2026-07-01 · unverdicted · novelty 6.0

Safety-aligned T2I diffusion models exhibit semantic collapse in text embeddings causing TIFA drops; SAGE regularization restores structured utility while retaining safety.

Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning

cs.CV · 2026-05-14 · unverdicted · novelty 6.0

CLVR framework adds closed-loop visual verification, proxy prompt reinforcement learning, and delta-space weight merge to improve complex text-to-image generation over single-step or unverified multi-step baselines.

Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models

cs.CL · 2024-11-07 · conditional · novelty 6.0

MoT decouples non-embedding parameters by modality in transformers to match dense multi-modal performance with roughly one-third to one-half the FLOPs.

Scaling Rectified Flow Transformers for High-Resolution Image Synthesis

cs.CV · 2024-03-05 · conditional · novelty 6.0

Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.

citing papers explorer

Showing 5 of 5 citing papers.

Beyond Accuracy: Benchmarking Cross-Task Consistency in Unified Multimodal Models cs.CV · 2026-04-27 · unverdicted · none · ref 17
XTC-Bench reveals that strong performance on generation or understanding tasks in unified multimodal models does not guarantee cross-task semantic consistency, which instead depends on how tightly coupled the learning objectives are across modalities.
The Illusion of High Utility in Safety Alignment of Text-to-Image Diffusion Models cs.CV · 2026-07-01 · unverdicted · none · ref 12
Safety-aligned T2I diffusion models exhibit semantic collapse in text embeddings causing TIFA drops; SAGE regularization restores structured utility while retaining safety.
Unlocking Complex Visual Generation via Closed-Loop Verified Reasoning cs.CV · 2026-05-14 · unverdicted · none · ref 18
CLVR framework adds closed-loop visual verification, proxy prompt reinforcement learning, and delta-space weight merge to improve complex text-to-image generation over single-step or unverified multi-step baselines.
Mixture-of-Transformers: A Sparse and Scalable Architecture for Multi-Modal Foundation Models cs.CL · 2024-11-07 · conditional · none · ref 15
MoT decouples non-embedding parameters by modality in transformers to match dense multi-modal performance with roughly one-third to one-half the FLOPs.
Scaling Rectified Flow Transformers for High-Resolution Image Synthesis cs.CV · 2024-03-05 · conditional · none · ref 142
Biased noise sampling for rectified flows combined with a bidirectional text-image transformer architecture yields state-of-the-art high-resolution text-to-image results that scale predictably with model size.

T2i-compbench++: An enhanced and comprehensive benchmark for compositional text-to-image generation

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer