Qwen-Image-Flash: Beyond Objective Design

Chenfei Wu; Deqing Li; Jiahao Li; Jie Zhang; Jingren Zhou; Kaiyuan Gao; Kun Yan; Liang Peng; Lihan Jiang; Ningyuan Tang

arXiv: 2606.03746 · v2 · pith:KMGZLPZInew · submitted 2026-06-02 · 💻 cs.CV · cs.AI · cs.GR · cs.LG


Tianhe Wu, Kun Yan, Zikai Zhou, Lihan Jiang, Jiahao Li, Jie Zhang, Kaiyuan Gao, Ningyuan Tang, Shengming Yin, Xiaoyue Chen, Xiao Xu, Yilei Chen, Yuxiang Chen, Yan Shu, Yixian Xu, Yanran Zhang, Zihao Liu, Zhendong Wang, Zekai Zhang, Deqing Li, Liang Peng, Yi Wang, Jingren Zhou, Chenfei Wu


Pith reviewed 2026-06-28 10:57 UTC · model grok-4.3

classification 💻 cs.CV · cs.AI · cs.GR · cs.LG

keywords: few-step distillation, text-to-image generation, instruction-guided editing, training pipeline, data composition, teacher guidance, task mixture, Qwen-Image-Flash


The pith

Effective few-step distillation requires principled organization of the training pipeline beyond the distillation objective.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines few-step distillation of visual generative models, using Qwen-Image-2.0 to shift attention from distillation objectives alone to the full training recipe. It systematically varies data composition, teacher guidance, and task mixture for unified text-to-image generation and instruction-guided editing. This analysis uncovers non-obvious behaviors that guide the creation of Qwen-Image-Flash. A sympathetic reader would care because the work indicates that accelerating generative models depends on how training is structured, not solely on loss design.

Core claim

By systematically varying data composition, teacher guidance, and task mixture when distilling Qwen-Image-2.0 for unified text-to-image generation and instruction-guided image editing, the authors identify non-obvious behaviors that motivate Qwen-Image-Flash, establishing that effective few-step distillation requires not only carefully designed objectives but also principled organization of the broader training pipeline.

What carries the argument

Data composition, teacher guidance, and task mixture as the training-pipeline factors that shape student performance in few-step distillation.

If this is right

Changes in data composition produce non-obvious effects on distilled model quality.
Different strengths of teacher guidance lead to distinct student outcomes during distillation.
The ratio of generation to editing tasks in the mixture influences final performance.
These pipeline adjustments enable the construction of Qwen-Image-Flash with improved few-step results.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same pipeline factors could be tuned when distilling other text-to-image models to achieve similar gains.
Automated search over data composition and task mixture might further improve distillation efficiency.
This emphasis on training organization could extend to few-step distillation of video or 3D generative models.

Load-bearing premise

The non-obvious behaviors observed when varying data composition, teacher guidance, and task mixture on Qwen-Image-2.0 will hold for other base models and distillation settings.

What would settle it

Repeating the same variations of data composition, teacher guidance, and task mixture on a different base model and finding that they produce no performance change or the opposite effect from what was observed with Qwen-Image-2.0.
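One way to put this test into practice is to compare each ablation factor's effect direction on Qwen-Image-2.0 with its direction on a second base model. The sketch below assumes each ablation has been reduced to a single score difference (variant minus baseline) on a shared metric. The `AblationEffect` record, the tolerance, and the numbers in the usage block are hypothetical and do not come from the paper.

```python
from dataclasses import dataclass


@dataclass
class AblationEffect:
    factor: str        # hypothetical label, e.g. "task mixture"
    delta_ref: float   # variant minus baseline score on Qwen-Image-2.0
    delta_new: float   # the same comparison repeated on a different base model


def replication_status(effect: AblationEffect, tol: float = 1e-3) -> str:
    """Classify whether an ablation effect carries over to a new base model.

    Differences within +/- tol of zero are treated as "no change".
    """
    if abs(effect.delta_ref) <= tol:
        return "no reference effect"
    if abs(effect.delta_new) <= tol:
        return "null"
    if (effect.delta_new > 0) == (effect.delta_ref > 0):
        return "replicated"
    return "reversed"


if __name__ == "__main__":
    # Hypothetical numbers for illustration only; not results from the paper.
    effects = [
        AblationEffect("teacher guidance", delta_ref=0.8, delta_new=0.5),
        AblationEffect("task mixture", delta_ref=0.4, delta_new=-0.3),
        AblationEffect("data composition", delta_ref=0.6, delta_new=0.0),
    ]
    for e in effects:
        print(f"{e.factor}: {replication_status(e)}")
```

A bare sign comparison ignores run-to-run noise. In practice one would want confidence intervals on both models before calling an effect "reversed".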

Figures

Figures reproduced from arXiv: 2606.03746 by Chenfei Wu, Deqing Li, Jiahao Li, Jie Zhang, Jingren Zhou, Kaiyuan Gao, Kun Yan, Liang Peng, Lihan Jiang, Ningyuan Tang, Shengming Yin, Tianhe Wu, Xiao Xu, Xiaoyue Chen, Yanran Zhang, Yan Shu, Yilei Chen, Yi Wang, Yixian Xu, Yuxiang Chen, Zekai Zhang, Zhendong Wang, Zihao Liu, Zikai Zhou.

**Figure 1.** Qwen-Image-Flash examples. T2I and instruction-guided editing results with only 4 NFEs, showing unified few-step generation-editing capability.
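"4 NFEs" means the student network is evaluated only four times per image. The minimal sketch below shows what that budget looks like at sampling time. It assumes a flow-matching parameterization, in which the network predicts a velocity and sampling integrates from noise at t=1 to data at t=0 with plain Euler steps. The uniform time grid and the `toy_velocity` function are hypothetical stand-ins, not Qwen-Image-Flash's actual schedule or network.

```python
import numpy as np


def euler_sample(velocity_fn, x_init, num_steps=4):
    """Integrate dx/dt = v(x, t) from t=1 (noise) to t=0 (data) with Euler steps.

    Each step makes exactly one call to velocity_fn, so the number of
    function evaluations (NFEs) equals num_steps.
    """
    ts = np.linspace(1.0, 0.0, num_steps + 1)
    x = np.asarray(x_init, dtype=float)
    nfe = 0
    for t_cur, t_next in zip(ts[:-1], ts[1:]):
        v = velocity_fn(x, t_cur)
        nfe += 1
        x = x + (t_next - t_cur) * v
    return x, nfe


# Toy stand-in for a network (hypothetical): along the straight path
# x_t = t * noise + (1 - t) * target, the exact velocity is (x_t - target) / t.
target = np.array([0.5, -1.0, 2.0])


def toy_velocity(x, t):
    return (x - target) / t


if __name__ == "__main__":
    noise = np.random.default_rng(0).standard_normal(3)
    sample, nfe = euler_sample(toy_velocity, noise, num_steps=4)
    print(nfe, sample)  # 4 NFEs; sample equals target up to float error
```

The toy reaches its target in four steps only because its path is exactly straight. A pretrained teacher's sampling trajectory is generally curved, which is why a student is distilled to take such large steps rather than simply running the teacher with fewer steps.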

**Figure 2.** Qualitative comparison of T2I distillation under different training data compositions. We compare students distilled with text-centric, mixed-category, landscape-only, landscape-portrait, and portrait-only training data across representative evaluation scenarios. The results show that text-centric or more diverse mixed-category data does not necessarily improve text rendering or overall visual quality. In …
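The orientation-based subsets in Figure 2 (landscape-only, portrait-only, landscape-portrait) can be made concrete by bucketing training samples by aspect ratio and drawing from the buckets with fixed weights. This is a hypothetical sketch. The `Sample` record, the width-versus-height rule, and the weights are illustrative assumptions, because the caption does not say how the paper defines these subsets.

```python
import random
from dataclasses import dataclass


@dataclass
class Sample:
    prompt: str
    width: int
    height: int


def orientation(s: Sample) -> str:
    # Hypothetical rule: the paper's exact thresholds are not given in the caption.
    if s.width > s.height:
        return "landscape"
    if s.height > s.width:
        return "portrait"
    return "square"


def compose_subset(pool, weights, n, seed=0):
    """Draw n samples (with replacement) whose orientation buckets follow `weights`.

    Example weights: {"landscape": 1.0} for landscape-only, or
    {"landscape": 0.5, "portrait": 0.5} for a landscape-portrait mix.
    """
    rng = random.Random(seed)
    buckets = {}
    for s in pool:
        buckets.setdefault(orientation(s), []).append(s)
    cats = [c for c, w in weights.items() if w > 0 and buckets.get(c)]
    if not cats:
        raise ValueError("no non-empty bucket has positive weight")
    probs = [weights[c] for c in cats]
    picks = rng.choices(cats, weights=probs, k=n)
    return [rng.choice(buckets[c]) for c in picks]


if __name__ == "__main__":
    pool = [Sample("a", 1024, 768), Sample("b", 768, 1024), Sample("c", 1344, 768)]
    for s in compose_subset(pool, {"landscape": 0.5, "portrait": 0.5}, n=6):
        print(s.prompt, orientation(s))
```

Keeping `n` fixed across variants means only the composition changes between runs. That is the kind of control needed to attribute a quality difference to the data mix rather than to data volume.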

**Figure 3.** Qualitative comparison of teacher guidance strategies during distillation. (a) Direct guidance from a task-specialized teacher can destabilize training, leading to progressive degradation in alignment and visual quality. (b) Step-wise multi-teacher guidance maintains sample fidelity and layout consistency throughout distillation, yielding better-aligned generations. … downstream performance of the specialize…
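Teacher guidance in distillation is commonly built on classifier-free guidance (CFG), which combines the teacher's conditional and unconditional predictions with a scale; a larger scale is one plausible meaning of "stronger" guidance. The sketch below shows that combination alongside one hypothetical reading of "step-wise multi-teacher guidance": each of the student's few steps takes its target from a teacher chosen by a fixed schedule. The teacher names, the toy teachers, the schedule, and the scale are illustrative assumptions. The caption does not specify how the paper assigns teachers to steps.

```python
import numpy as np


def cfg(pred_cond, pred_uncond, scale):
    """Classifier-free guidance; with this convention scale=1 returns the conditional prediction."""
    return pred_uncond + scale * (pred_cond - pred_uncond)


def stepwise_teacher_target(step, schedule, teachers, x, cond, scale):
    """Guided target for the student's `step`, taken from the teacher that
    `schedule` assigns to that step (hypothetical routing scheme).

    `teachers` maps a name to f(x, cond) -> prediction; cond=None is unconditional.
    """
    teacher = teachers[schedule[step]]
    return cfg(teacher(x, cond), teacher(x, None), scale)


# Toy teachers and a 4-step schedule, purely illustrative.
teachers = {
    "general": lambda x, c: x * (2.0 if c is not None else 1.0),
    "specialist": lambda x, c: x + (3.0 if c is not None else 0.0),
}
schedule = ["general", "general", "specialist", "specialist"]

if __name__ == "__main__":
    x = np.ones(2)
    for step, name in enumerate(schedule):
        target = stepwise_teacher_target(step, schedule, teachers, x, "edit prompt", 4.0)
        print(step, name, target)
```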

**Figure 4.** Qualitative comparison of joint T2I-editing distillation under different task-mixture ratios. We compare editing results from the task-specialized teacher, the T2I-only zero-shot student, and jointly distilled students trained with T2I:Edit ratios of 9:1, 7:3, and 5:5 across six editing categories. The balanced 5:5 mixture consistently achieves better instruction following while preserving image fidelity, …
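The T2I:Edit ratios in Figure 4 (9:1, 7:3, 5:5) describe how often each task appears during joint distillation. Here is a minimal sketch of a mixture sampler, assuming the ratio is realized by drawing each batch slot's task independently. That sampling scheme is an assumption, since the caption gives only the ratios.

```python
import random
from collections import Counter


def task_schedule(ratio_t2i, ratio_edit, batch_size, num_batches, seed=0):
    """Yield batches of task labels with P("t2i") = ratio_t2i / (ratio_t2i + ratio_edit).

    Slots are drawn independently, so an individual batch matches the ratio
    only in expectation (an assumption; fixed per-batch quotas are an alternative).
    """
    rng = random.Random(seed)
    p_t2i = ratio_t2i / (ratio_t2i + ratio_edit)
    for _ in range(num_batches):
        yield ["t2i" if rng.random() < p_t2i else "edit" for _ in range(batch_size)]


def realized_t2i_fraction(batches):
    """Fraction of T2I slots actually drawn across all batches."""
    counts = Counter(task for batch in batches for task in batch)
    return counts["t2i"] / sum(counts.values())


if __name__ == "__main__":
    for t2i, edit in [(9, 1), (7, 3), (5, 5)]:
        frac = realized_t2i_fraction(task_schedule(t2i, edit, batch_size=64, num_batches=100))
        print(f"{t2i}:{edit} -> realized T2I fraction {frac:.3f}")
```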

Original abstract

Few-step distillation has become an effective strategy for accelerating advanced visual generative models, yet prior work has largely focused on distillation objectives. In this work, we revisit few-step distillation from a complementary perspective, focusing on the training recipe that critically shapes student performance. Using Qwen-Image-2.0 as a representative case, we systematically investigate three factors in unified text-to-image generation and instruction-guided image editing distillation: data composition, teacher guidance, and task mixture. Our empirical analysis reveals several non-obvious behaviors, which motivate the development of Qwen-Image-Flash. Overall, our results suggest that effective few-step distillation requires not only carefully designed objectives, but also principled organization of the broader training pipeline.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

Pipeline tweaks like data mix and teacher guidance show non-obvious effects in Qwen distillation, but the work stays tied to one model with no numbers visible yet.


The main thing to know is that this paper argues few-step distillation works better when you organize the full training setup, not just the loss. Using Qwen-Image-2.0, they examined data composition, teacher guidance, and task mixture for text-to-image generation and instruction-guided editing, found some unexpected patterns, and turned those findings into Qwen-Image-Flash.

What is new is the explicit shift to these pipeline factors as a separate lever. Most prior distillation papers fix the recipe and vary the objective; here they treat the recipe itself as the variable and run a unified study across two tasks. That framing is reasonable because real training always involves these choices together.

The paper does a clean job laying out why these factors deserve attention and showing that standard ML knobs can produce non-obvious results in this setting. Anyone already running distillation on large vision models will recognize the variables and may pick up practical signals for their own runs.

The soft spots are straightforward. The abstract contains no tables, no effect sizes, and no controls, so the size and reliability of the claimed behaviors cannot be judged. All the work rests on a single base model, which is exactly the load-bearing premise noted above: if the same changes produce different or null results on other architectures, the recommendation about principled pipeline organization does not travel far. That is a real limit on scope rather than a fatal flaw.

This is for engineers and researchers who need faster inference versions of generative image models and are willing to run their own ablations. A reader in that position can extract ideas to test; a reader looking for general principles or cross-model evidence will find less.

The paper deserves a serious referee. The topic is practically relevant and the empirical angle is coherent even if the current write-up is light on data. Send it for review so the experiments can be examined in full and the generalization question can be addressed.

Referee Report

2 major / 1 minor

Summary. The manuscript presents an empirical study on few-step distillation for text-to-image generation and instruction-guided editing. Using Qwen-Image-2.0 as the base model, it systematically ablates data composition, teacher guidance, and task mixture, identifies non-obvious behaviors from these factors, develops the Qwen-Image-Flash recipe, and concludes that effective few-step distillation requires principled organization of the broader training pipeline in addition to objective design.

Significance. If the observed behaviors prove robust, the work usefully shifts focus from distillation objectives alone to pipeline-level choices, with credit for the unified treatment of generation and editing tasks. The single-model scope, however, limits the strength of the broader claim about pipeline organization.

major comments (2)

[Abstract] The claim of a 'systematic empirical investigation' revealing non-obvious behaviors is unsupported by any quantitative results, ablation tables, controls, or statistical details in the abstract, preventing verification of the central claim.
[Empirical analysis, throughout] All ablations and the resulting Qwen-Image-Flash recipe are performed exclusively on Qwen-Image-2.0. The paper provides no experiments on other base models, and such experiments are load-bearing for the general recommendation that pipeline organization is required beyond objectives.
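Consider two students, for example one distilled with the 5:5 mixture and one with 9:1, each scored on the same set of prompts. The missing statistical details would include the mean per-prompt difference between them and a confidence interval for that difference. Below is a minimal percentile-bootstrap sketch. The paired per-prompt score lists are a hypothetical input format, and the numbers in the usage block are invented placeholders, not data from the paper.

```python
import random
import statistics


def paired_bootstrap_ci(scores_a, scores_b, n_boot=2000, alpha=0.05, seed=0):
    """Mean paired difference (a - b) with a percentile bootstrap CI, resampling prompts."""
    if len(scores_a) != len(scores_b) or not scores_a:
        raise ValueError("need equal-length, non-empty paired score lists")
    diffs = [a - b for a, b in zip(scores_a, scores_b)]
    rng = random.Random(seed)
    n = len(diffs)
    # Resample prompts with replacement and record the mean difference each time.
    boots = sorted(statistics.fmean(rng.choices(diffs, k=n)) for _ in range(n_boot))
    lo = boots[int((alpha / 2) * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return statistics.fmean(diffs), (lo, hi)


if __name__ == "__main__":
    # Hypothetical per-prompt scores for two students; not data from the paper.
    student_55 = [0.62, 0.71, 0.58, 0.80, 0.67, 0.74]
    student_91 = [0.55, 0.69, 0.52, 0.78, 0.60, 0.70]
    mean, (lo, hi) = paired_bootstrap_ci(student_55, student_91)
    print(f"mean diff {mean:.3f}, 95% CI [{lo:.3f}, {hi:.3f}]")
```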

minor comments (1)

The title is somewhat generic; a more specific subtitle referencing the three pipeline factors would better convey the contribution.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive feedback. We address each major comment below and have revised the manuscript to improve clarity and accuracy where feasible.

Point-by-point responses

Referee: [Abstract] The claim of a 'systematic empirical investigation' revealing non-obvious behaviors is unsupported by any quantitative results, ablation tables, controls, or statistical details in the abstract, preventing verification of the central claim.

Authors: We agree that the abstract should better substantiate the central claims with concrete evidence. In the revised manuscript, we have updated the abstract to include specific quantitative highlights from the ablations, such as the performance gains in unified generation and editing tasks from the optimized data composition, teacher guidance, and task mixture. (Revision: yes.)
Referee: [Empirical analysis, throughout] All ablations and the resulting Qwen-Image-Flash recipe are performed exclusively on Qwen-Image-2.0. The paper provides no experiments on other base models, and such experiments are load-bearing for the general recommendation that pipeline organization is required beyond objectives.

Authors: We acknowledge the single-model scope as a limitation that restricts the strength of broader claims. The work is framed as a detailed case study on Qwen-Image-2.0 as a representative model. We have revised the manuscript to moderate the general recommendation, emphasizing the findings as suggestive for this model and calling for future validation on additional base models. (Revision: partial.)

Standing simulated objections (not resolved)

Experiments on additional base models to support the general claim that pipeline organization is required beyond objectives.

Circularity Check

0 steps flagged

No circularity: empirical study with independent experimental support

full rationale

The paper contains no equations, derivations, or first-principles claims. It is framed entirely as an empirical investigation that reports ablation results on data composition, teacher guidance, and task mixture using Qwen-Image-2.0. The central recommendation about pipeline organization follows directly from those observed behaviors rather than reducing to any fitted parameter, self-definition, or self-citation chain. No load-bearing step equates its output to its input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical derivations, free parameters, axioms, or invented entities are described in the abstract; the work is entirely empirical.

pith-pipeline@v0.9.1-grok · 5733 in / 1058 out tokens · 27927 ms · 2026-06-28T10:57:54.252617+00:00 · methodology


Reference graph

Works this paper leans on

17 extracted references · 13 linked inside Pith

[1] Lichen Bai, Zikai Zhou, Shitong Shao, Wenliang Zhong, Shuo Yang, Shuo Chen, Bojun Chen, and Zeke Xie. Optimizing few-step generation with adaptive matching distillation. arXiv preprint arXiv:2602.07345.

[2] Zhen Fang, Wenxuan Huang, Yu Zeng, Yiming Zhao, Shuang Chen, Kaituo Feng, Yunlong Lin, Lin Chen, Zehui Chen, Shaosheng Cao, et al. Flow-OPD: On-policy distillation for flow matching models. arXiv preprint arXiv:2605.08063.

[3] Zhengyang Geng, Mingyang Deng, Xingjian Bai, J Zico Kolter, and Kaiming He. Mean flows for one-step generative modeling. arXiv preprint arXiv:2505.13447.

[4] Dengyang Jiang, Dongyang Liu, Zanyi Wang, Qilong Wu, Liuzhuozheng Li, Hengzhuang Li, Xin Jin, David Liu, Changsheng Lu, Zhen Li, et al. Distribution matching distillation meets reinforcement learning. arXiv preprint arXiv:2511.13649.

[5] Quanhao Li, Junqiu Yu, Kaixun Jiang, Yujie Wei, Zhen Xing, Pandeng Li, Ruihang Chu, Shiwei Zhang, Yu Liu, and Zuxuan Wu. DiffusionOPD: A unified perspective of on-policy distillation in diffusion models. arXiv preprint arXiv:2605.15055.

[6] Dongyang Liu, Peng Gao, David Liu, Ruoyi Du, Zhen Li, Qilong Wu, Xin Jin, Sihan Cao, Shifeng Zhang, Hongsheng Li, et al. Decoupled DMD: CFG augmentation as the spear, distribution matching as the shield. arXiv preprint arXiv:2511.22677.

[7] Jiaxiang Liu, Zhida Feng, Pengyu Zou, Zhenyu Qian, Tianrui Zhu, Jun Xia, Yuehu Dong, Yanzheng Lin, Honglin Xiong, et al. ERNIE-Image technical report. arXiv preprint arXiv:2605.25347, 2026a. Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-GRPO: Training flow matching models via online...

[8] Simian Luo, Yiqin Tan, Longbo Huang, Jian Li, and Hang Zhao. Latent consistency models: Synthesizing high-resolution images with few-step inference. arXiv preprint arXiv:2310.04378.

[9] Yihong Luo, Tianyang Hu, Weijian Luo, and Jing Tang. TDM-R1: Reinforcing few-step diffusion models with non-differentiable reward. arXiv preprint arXiv:2603.07700.

[10] Chaojie Mao, Chen-Wei Xie, Chongyang Zhong, Haoyou Deng, Jiaxing Zhao, Jie Xiao, Jinbo Xing, Jingfeng Zhang, Jingren Zhou, Jingyi Zhang, et al. Wan-Image: Pushing the boundaries of generative visual intelligence. arXiv preprint arXiv:2604.19858.

[11] Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. arXiv preprint arXiv:2202.00512.

[12] Lin Song, Wenbo Li, Guoqing Ma, Wei Tang, Bo Wang, Yuan Zhang, Yijun Yang, Yicheng Xiao, Jianhui Liu, Yanbing Zhang, et al. JoyAI-Image: Awaking spatial intelligence in unified multimodal understanding and generation. arXiv preprint arXiv:2605.04128.

[13] Xinyu Wei, Jinrui Zhang, Zeqing Wang, Hongyang Wei, Zhen Guo, and Lei Zhang. TIIF-Bench: How does your T2I model follow your instructions? arXiv preprint arXiv:2506.02161.

[14] Tianhe Wu, Ruibin Li, Lei Zhang, and Kede Ma. Diversity-preserved distribution matching distillation for fast visual synthesis. arXiv preprint arXiv:2602.03139.

[15] Bangjun Xiao, Bingquan Xia, Bo Yang, Bofei Gao, Bowen Shen, Chen Zhang, Chenhong He, Chiheng Lou, Fuli Luo, Gang Wang, et al. MiMo-V2-Flash technical report. arXiv preprint arXiv:2601.02780.

[16] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388.

[17] Bing Zhao, Chenfei Wu, Deqing Li, Hao Meng, Jiahao Li, Jie Zhang, Jingren Zhou, Junyang Lin, Kaiyuan Gao, Kuan Cao, et al. Qwen-Image-2.0 technical report. arXiv preprint arXiv:2605.10730.
