Qwen-image technical report

Chenfei Wu, Jiahao Li, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kun Yan, Sheng ming Yin, Shuai Bai, Xiao Xu, Yilei Chen, Yuxiang Chen, Zecheng Tang, Zekai Zhang, Zhengyi Wang, An Yang, Bowen Yu, Chen Cheng, Dayiheng Liu, Deqing Li, Hang Zhan · 2025

6 Pith papers cite this work. Polarity classification is still indexing.

6 Pith papers citing it

browse 6 citing papers

citation-role summary

method 2 background 1

citation-polarity summary

use method 2 background 1

representative citing papers

SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild

cs.CV · 2026-05-08 · unverdicted · novelty 7.0

SAM 3D Animal is the first promptable framework for multi-animal 3D reconstruction from single images, built on SMAL+ and trained on the new Herd3D dataset, achieving SOTA results on Animal3D, APTv2, and Animal Kingdom benchmarks.

Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation

cs.CV · 2026-04-24 · conditional · novelty 7.0

KVBench reveals major gaps in current T2I models for knowledge-intensive tasks, and KE-Check narrows the gap between open- and closed-source models by adding structured knowledge and enforcing constraints.

Efficient Image Synthesis with Sphere Latent Encoder

cs.CV · 2026-05-15 · unverdicted · novelty 6.0

Decouples Sphere Encoder into fixed pretrained encoder and spherical latent denoiser, yielding higher quality and faster inference than the joint original on Animal-Faces, Oxford-Flowers and ImageNet-1K.

Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning

cs.CV · 2026-05-09 · conditional · novelty 6.0

Diverse teacher-generated rationales improve MLLM visual persuasiveness prediction via supervised fine-tuning, while a new three-dimensional faithfulness framework shows that prediction accuracy alone does not ensure faithful reasoning and that decision sensitivity best matches human preferences.

Nucleus-Image: Sparse MoE for Image Generation

cs.CV · 2026-04-14 · unverdicted · novelty 6.0

A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.

BLK-Assist: A Methodological Framework for Artist-Led Co-Creation with Generative AI Models

cs.CY · 2026-03-10 · unverdicted · novelty 6.0

BLK-Assist is a three-part framework (Conceptor for sketches, Stencil for transparent assets, Upscale for high-res outputs) that fine-tunes public diffusion models on one artist's proprietary corpus for style-faithful generative co-creation.

citing papers explorer

Showing 6 of 6 citing papers.

SAM 3D Animal: Promptable Animal 3D Reconstruction from Images in the Wild cs.CV · 2026-05-08 · unverdicted · none · ref 37
SAM 3D Animal is the first promptable framework for multi-animal 3D reconstruction from single images, built on SMAL+ and trained on the new Herd3D dataset, achieving SOTA results on Animal3D, APTv2, and Animal Kingdom benchmarks.
Knowledge Visualization: A Benchmark and Method for Knowledge-Intensive Text-to-Image Generation cs.CV · 2026-04-24 · conditional · none · ref 68
KVBench reveals major gaps in current T2I models for knowledge-intensive tasks, and KE-Check narrows the gap between open- and closed-source models by adding structured knowledge and enforcing constraints.
Efficient Image Synthesis with Sphere Latent Encoder cs.CV · 2026-05-15 · unverdicted · none · ref 33
Decouples Sphere Encoder into fixed pretrained encoder and spherical latent denoiser, yielding higher quality and faster inference than the joint original on Animal-Faces, Oxford-Flowers and ImageNet-1K.
Can MLLMs Reason About Visual Persuasion? Evaluating the Efficacy and Faithfulness of Reasoning cs.CV · 2026-05-09 · conditional · none · ref 38
Diverse teacher-generated rationales improve MLLM visual persuasiveness prediction via supervised fine-tuning, while a new three-dimensional faithfulness framework shows that prediction accuracy alone does not ensure faithful reasoning and that decision sensitivity best matches human preferences.
Nucleus-Image: Sparse MoE for Image Generation cs.CV · 2026-04-14 · unverdicted · none · ref 13
A 17B-parameter sparse MoE diffusion transformer activates 2B parameters per pass and reaches competitive quality on image generation benchmarks without post-training.
BLK-Assist: A Methodological Framework for Artist-Led Co-Creation with Generative AI Models cs.CY · 2026-03-10 · unverdicted · none · ref 38
BLK-Assist is a three-part framework (Conceptor for sketches, Stencil for transparent assets, Upscale for high-res outputs) that fine-tunes public diffusion models on one artist's proprietary corpus for style-faithful generative co-creation.

Qwen-image technical report

citation-role summary

citation-polarity summary

fields

years

verdicts

roles

polarities

representative citing papers

citing papers explorer