Flow-OPD: On-Policy Distillation for Flow Matching Models
Pith reviewed 2026-05-20 22:35 UTC · model grok-4.3
The pith
Flow-OPD aligns flow matching text-to-image models on multiple tasks by distilling expertise from single-reward teachers without gradient clashes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Flow-OPD first cultivates domain-specialized teacher models via single-reward GRPO fine-tuning. It then initializes a policy with a Flow-based Cold-Start and consolidates the heterogeneous expertise into one student through on-policy sampling, task-routing labeling, and dense trajectory-level supervision, augmented by Manifold Anchor Regularization from a task-agnostic teacher. On Stable Diffusion 3.5 Medium, this raises GenEval from 63 to 92 and OCR accuracy from 59 to 94, an overall gain of roughly 10 points over vanilla GRPO, while preserving fidelity and human-preference alignment and producing an emergent teacher-surpassing effect.
What carries the argument
The two-stage on-policy distillation pipeline that samples trajectories from the current student, routes them to the appropriate teacher for labeling, and applies dense supervision plus Manifold Anchor Regularization to consolidate multiple objectives without interference.
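One way to picture the consolidation step described above is the toy NumPy sketch below. It is not the paper's implementation, and every modeling choice in it is an assumption: the student and teachers are hypothetical linear velocity fields (`linear_field`), on-policy sampling is plain Euler integration of the student's own ODE over t in [0, 1), routing is a dictionary lookup on a task tag, dense trajectory-level supervision is assumed to be a per-step squared velocity error against the routed teacher, and Manifold Anchor Regularization is assumed to be the same error against a task-agnostic anchor teacher, weighted by a hypothetical `lam_mar`.

```python
import numpy as np


def linear_field(W):
    """Hypothetical stand-in for a flow-matching network: v(x, t) = W @ x."""
    return lambda x, t: W @ x


def euler_trajectory(velocity, x0, num_steps):
    """Roll out dx/dt = velocity(x, t) for t in [0, 1) with explicit Euler steps.

    Returns the states and times at which the velocity was queried, i.e. the
    on-policy states visited by this model.
    """
    dt = 1.0 / num_steps
    states, times = [], []
    x = np.asarray(x0, dtype=float)
    for k in range(num_steps):
        t = k * dt
        states.append(x)
        times.append(t)
        x = x + dt * velocity(x, t)
    return states, times


def consolidation_loss(student, teachers, anchor, task, x0, num_steps=8, lam_mar=0.1):
    """Routed dense velocity-matching loss plus an anchor term (both assumed forms)."""
    # 1. On-policy sampling: the trajectory comes from the student itself.
    states, times = euler_trajectory(student, x0, num_steps)
    # 2. Task routing: the task tag selects which specialized teacher labels it.
    teacher = teachers[task]
    dense, anchor_term = 0.0, 0.0
    # 3. Dense supervision: compare velocities at every visited step, not only the end.
    for x, t in zip(states, times):
        v_s = student(x, t)
        dense += np.mean((v_s - teacher(x, t)) ** 2)
        # MAR-style term (assumed): pull toward a task-agnostic teacher on every sample.
        anchor_term += np.mean((v_s - anchor(x, t)) ** 2)
    return (dense + lam_mar * anchor_term) / num_steps


rng = np.random.default_rng(0)
d = 4
student = linear_field(0.1 * np.eye(d))
teachers = {"geneval": linear_field(0.2 * np.eye(d)), "ocr": linear_field(-0.1 * np.eye(d))}
anchor = linear_field(0.1 * np.eye(d))
x0 = rng.standard_normal(d)
print(consolidation_loss(student, teachers, anchor, "ocr", x0))
```

In an actual training loop the visited states would be treated as fixed (no gradient through the rollout) and only the student's velocity output would be optimized; the time convention and sampler of the SD3.5 backbone may also differ from the 0-to-1 Euler schedule used here.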
If this is right
- Specialized expertise from isolated single-reward teachers transfers into one policy without metric trade-offs.
- Dense trajectory supervision and task routing produce higher combined benchmark scores than joint multi-reward training.
- Manifold Anchor Regularization keeps image fidelity and human preference scores stable during aggressive alignment.
- The student can exceed the performance of its source teachers on some tasks.
- The approach scales as a general post-training method for building generalist flow matching text-to-image models.
Where Pith is reading between the lines
- The same teacher-then-distill pattern might apply to other generative paradigms that currently struggle with multi-objective fine-tuning.
- Removing the cold-start initialization or the manifold anchor could be tested to measure how much each component contributes to stability.
- If the teacher-surpassing effect holds across more tasks, it may indicate that the orchestration creates new synergies rather than simple averaging of capabilities.
Load-bearing premise
That single-reward teachers can be merged via on-policy sampling and anchoring without reintroducing gradient interference or reward hacking when the student faces heterogeneous objectives.
What would settle it
Training the student on the routed data and observing the same seesaw effect, reward hacking, or aesthetic drop as vanilla GRPO, or finding that the student scores below the strongest individual teacher on any single task, would show the consolidation step does not deliver the claimed benefits.
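The last falsification criterion can be written as a per-task comparison against the strongest teacher. The sketch below assumes scores are stored as plain dictionaries keyed by task; the teacher names and all numbers are placeholders, not results from the paper. A non-empty result would count against the claimed consolidation benefit.

```python
def tasks_below_best_teacher(student, teachers):
    """Tasks where the student scores strictly below the strongest teacher.

    student:  {task: score}
    teachers: {teacher_name: {task: score}}
    Tasks that no teacher was evaluated on are never flagged.
    """
    failing = []
    for task, score in student.items():
        best = max((t[task] for t in teachers.values() if task in t), default=float("-inf"))
        if score < best:
            failing.append(task)
    return failing


# Placeholder scores for illustration only; NOT results from the paper.
student = {"geneval": 0.92, "ocr": 0.90}
teachers = {"geneval_teacher": {"geneval": 0.90}, "ocr_teacher": {"ocr": 0.94}}
print(tasks_below_best_teacher(student, teachers))
```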
Original abstract
Existing Flow Matching (FM) text-to-image models suffer from two critical bottlenecks under multi-task alignment: the reward sparsity induced by scalar-valued rewards, and the gradient interference arising from jointly optimizing heterogeneous objectives, which together give rise to a 'seesaw effect' of competing metrics and pervasive reward hacking. Inspired by the success of On-Policy Distillation (OPD) in the large language model community, we propose Flow-OPD, the first unified post-training framework that integrates on-policy distillation into Flow Matching models. Flow-OPD adopts a two-stage alignment strategy: it first cultivates domain-specialized teacher models via single-reward GRPO fine-tuning, allowing each expert to reach its performance ceiling in isolation; it then establishes a robust initial policy through a Flow-based Cold-Start scheme and seamlessly consolidates heterogeneous expertise into a single student via a three-step orchestration of on-policy sampling, task-routing labeling, and dense trajectory-level supervision. We further introduce Manifold Anchor Regularization (MAR), which leverages a task-agnostic teacher to provide full-data supervision that anchors generation to a high-quality manifold, effectively mitigating the aesthetic degradation commonly observed in purely RL-driven alignment. Built upon Stable Diffusion 3.5 Medium, Flow-OPD raises the GenEval score from 63 to 92 and the OCR accuracy from 59 to 94, yielding an overall improvement of roughly 10 points over vanilla GRPO, while preserving image fidelity and human-preference alignment and exhibiting an emergent 'teacher-surpassing' effect. These results establish Flow-OPD as a scalable alignment paradigm for building generalist text-to-image models. The code and weights will be released at: https://github.com/CostaliyA/Flow-OPD .
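To make "reward sparsity induced by scalar-valued rewards" concrete, the sketch below shows the group-relative advantage used in GRPO-style training: each image sampled for a prompt receives one scalar reward, rewards are standardized within the group, and that single value is shared by every denoising step of the image's trajectory. The epsilon, group size, step count, and reward values are placeholders; this is the generic GRPO normalization, not code from Flow-OPD or Flow-GRPO.

```python
import numpy as np


def group_relative_advantages(rewards, eps=1e-8):
    """GRPO-style advantage: standardize scalar rewards within one prompt's group."""
    r = np.asarray(rewards, dtype=float)
    return (r - r.mean()) / (r.std() + eps)


def per_step_advantages(rewards, num_steps):
    """Give every denoising step of an image the same scalar advantage.

    Each trajectory receives one number regardless of how many steps it has,
    which is the sense in which a scalar reward is sparse compared with
    per-step (dense) supervision.
    """
    adv = group_relative_advantages(rewards)
    return np.repeat(adv[:, None], num_steps, axis=1)


rewards = [0.2, 0.9, 0.5, 0.4]  # placeholder: one reward per image in a group of 4
A = per_step_advantages(rewards, num_steps=10)
print(A.shape)
```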
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes Flow-OPD, a two-stage post-training framework for Flow Matching text-to-image models that first trains domain-specialized teachers via single-reward GRPO and then consolidates them into a student using on-policy sampling, task-routing labeling, dense trajectory supervision, and Manifold Anchor Regularization (MAR) to mitigate reward sparsity and gradient interference. Built on Stable Diffusion 3.5 Medium, it reports GenEval rising from 63 to 92 and OCR accuracy from 59 to 94, with an overall ~10-point gain over vanilla GRPO, preserved fidelity, and an emergent teacher-surpassing effect.
Significance. If the performance gains and mechanism hold under rigorous controls, the work offers a scalable paradigm for multi-task alignment of flow-based generative models by extending on-policy distillation ideas from LLMs, with the MAR component providing a concrete way to anchor aesthetics. The planned code and weight release is a clear strength for reproducibility.
major comments (1)
- [§4 (Experiments)] The central claim that task-routing labeling and dense trajectory supervision eliminate gradient interference (and thereby the seesaw effect) is load-bearing, yet the manuscript provides no quantification of gradient cosine similarities across tasks, no per-task reward curves during student training, and no ablation isolating the routing mechanism from potential confounds such as the cold-start policy or MAR. This leaves the attribution of the reported metric gains to the distillation procedure unverified.
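The gradient-interference measurement the referee asks for can be sketched as pairwise cosine similarity between per-task gradients of the shared student parameters; clearly negative values indicate conflicting update directions. The example assumes hypothetical quadratic per-task losses so the gradients are analytic, and the task names are illustrative; in practice each vector would be the flattened autograd gradient of that task's loss.

```python
import numpy as np


def pairwise_cosine(grads, eps=1e-12):
    """Cosine-similarity matrix for {task: flattened gradient of shared parameters}."""
    names = list(grads)
    G = np.stack([np.ravel(np.asarray(grads[n], dtype=float)) for n in names])
    G = G / np.maximum(np.linalg.norm(G, axis=1, keepdims=True), eps)
    return names, G @ G.T


# Hypothetical per-task losses L_k(theta) = 0.5 * ||theta - target_k||^2,
# whose gradient at theta is (theta - target_k).
theta = np.zeros(3)
targets = {
    "geneval": np.array([1.0, 0.0, 0.0]),
    "ocr": np.array([-1.0, 0.2, 0.0]),
    "preference": np.array([0.0, 0.0, 1.0]),
}
grads = {task: theta - target for task, target in targets.items()}
names, cos = pairwise_cosine(grads)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        print(f"{names[i]} vs {names[j]}: {cos[i, j]:+.3f}")
```

Tracking this matrix over training steps, for both vanilla multi-reward GRPO and the routed student, is one way the requested comparison could be reported.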
minor comments (3)
- [Abstract and §4.1] The abstract and §4.1 should report exact baseline configurations for vanilla GRPO (including reward weights, sampling steps, and statistical significance tests) rather than summary deltas.
- [§3.2] The Flow-based Cold-Start scheme is described at a high level; explicit hyperparameter values, loss formulations, and initialization details are needed for reproducibility.
- [Table 2] Table 2 or equivalent: Clarify whether the GenEval and OCR numbers reflect single-run results or averages over multiple seeds, and include standard deviations.
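Reporting seed averages with standard deviations, plus a simple significance statistic against the baseline, takes only a few lines. The sketch uses the sample standard deviation (n - 1 denominator) and Welch's t statistic, which are common conventions rather than choices stated in the paper; all scores are placeholders.

```python
import math
import statistics


def summarize(scores):
    """Mean and sample standard deviation (n - 1 denominator) across seeds."""
    return statistics.mean(scores), statistics.stdev(scores)


def welch_t(a, b):
    """Welch's t statistic for two independent samples with unequal variances."""
    va, vb = statistics.variance(a), statistics.variance(b)
    return (statistics.mean(a) - statistics.mean(b)) / math.sqrt(va / len(a) + vb / len(b))


# Placeholder per-seed scores for illustration only; NOT results from the paper.
method_scores = [0.90, 0.92, 0.91]
baseline_scores = [0.80, 0.83, 0.81]
mean, std = summarize(method_scores)
print(f"method: {mean:.3f} ± {std:.3f} (n={len(method_scores)})")
print(f"Welch t vs baseline: {welch_t(method_scores, baseline_scores):.2f}")
```

Turning the statistic into a p-value requires the Welch–Satterthwaite degrees of freedom; `scipy.stats.ttest_ind(a, b, equal_var=False)` performs the full test if SciPy is available.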
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive feedback on our manuscript. We have carefully reviewed the major comment concerning the empirical support for our claims about gradient interference mitigation and provide a point-by-point response below. We commit to making the necessary revisions to strengthen the validation of our proposed mechanisms.
Point-by-point responses
- Referee: The central claim that task-routing labeling and dense trajectory supervision eliminate gradient interference (and thereby the seesaw effect) is load-bearing, yet the manuscript provides no quantification of gradient cosine similarities across tasks, no per-task reward curves during student training, and no ablation isolating the routing mechanism from potential confounds such as the cold-start policy or MAR. This leaves the attribution of the reported metric gains to the distillation procedure unverified.
Authors: We agree that direct quantification would provide stronger mechanistic evidence for the role of task-routing labeling and dense trajectory supervision in reducing gradient interference. The current results demonstrate substantial gains over vanilla GRPO along with an emergent teacher-surpassing effect, which offer indirect support for the overall framework. However, to address the concern rigorously, the revised manuscript will include: (1) measurements of gradient cosine similarities across tasks during student training to quantify interference reduction; (2) per-task reward curves to illustrate stable multi-objective optimization without seesaw dynamics; and (3) a controlled ablation isolating the routing and dense supervision components from the cold-start policy and MAR. These analyses will be added to Section 4 to better attribute the metric improvements to the distillation procedure.
Revision: yes
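The per-task reward curves promised in point (2) can be screened for seesaw dynamics with a crude heuristic: compare each task's mean reward over the last few logged steps with its mean over the first few, and flag a seesaw when one task gains while another loses. The window, tolerance, and curves below are hypothetical and synthetic; neither the paper nor the rebuttal specifies this test.

```python
import numpy as np


def net_changes(curves, window=5):
    """Mean reward over the last `window` steps minus mean over the first `window`."""
    return {
        task: float(np.mean(curve[-window:]) - np.mean(curve[:window]))
        for task, curve in curves.items()
    }


def is_seesaw(curves, window=5, tol=0.01):
    """True if some task gains more than `tol` while another loses more than `tol`."""
    deltas = net_changes(curves, window).values()
    return any(d > tol for d in deltas) and any(d < -tol for d in deltas)


# Synthetic reward curves over 20 training steps; NOT data from the paper.
steps = np.arange(20)
seesaw = {"geneval": 0.5 + 0.02 * steps, "ocr": 0.8 - 0.015 * steps}
balanced = {"geneval": 0.5 + 0.02 * steps, "ocr": 0.6 + 0.01 * steps}
print(is_seesaw(seesaw), is_seesaw(balanced))
```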
Circularity Check
No circularity: empirical method with benchmarked gains
Full rationale
The paper describes an empirical two-stage post-training procedure (single-reward GRPO teachers, Flow-based Cold-Start, on-policy sampling with task-routing, dense supervision, and Manifold Anchor Regularization) applied to Stable Diffusion 3.5 Medium. Reported gains such as GenEval rising from 63 to 92 and OCR from 59 to 94 are presented as experimental outcomes on standard benchmarks, not as quantities derived from equations or parameters that are defined in terms of themselves. No mathematical derivations, uniqueness theorems, or fitted-input predictions appear in the provided text that reduce the central claims to tautological inputs by construction. The approach builds on existing RL and distillation techniques without self-referential definitions or load-bearing self-citations that substitute for independent validation.
Reference graph
Works this paper leans on
- [1] Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, Sumith Kulal, et al. Flux.1 Kontext: Flow matching for in-context image generation and editing in latent space. arXiv e-prints, pages arXiv–2506, 2025.
- [2] Patrick Esser, Sumith Kulal, Andreas Blattmann, Rahim Entezari, Jonas Müller, Harry Saini, Yam Levi, Dominik Lorenz, Axel Sauer, Frederic Boesel, et al. Scaling rectified flow transformers for high-resolution image synthesis. In Forty-first International Conference on Machine Learning, 2024.
- [3] Yaron Lipman, Ricky TQ Chen, Heli Ben-Hamu, Maximilian Nickel, and Matt Le. Flow matching for generative modeling. arXiv preprint arXiv:2210.02747, 2022.
- [4] Zhen Fang, Zhuoyang Liu, Jiaming Liu, Hao Chen, Yu Zeng, Shiting Huang, Zehui Chen, Lin Chen, Shanghang Zhang, and Feng Zhao. Dualvla: Building a generalizable embodied agent via partial decoupling of reasoning and action. arXiv preprint arXiv:2511.22134, 2025.
- [5] Wenxuan Huang, Bohan Jia, Zijie Zhai, Shaosheng Cao, Zheyu Ye, Fei Zhao, Zhe Xu, Xu Tang, Yao Hu, and Shaohui Lin. Vision-r1: Incentivizing reasoning capability in multimodal large language models, 2026.
- [6] Wenxuan Huang, Yu Zeng, Qiuchen Wang, Zhen Fang, Shaosheng Cao, Zheng Chu, Qingyu Yin, Shuang Chen, Zhenfei Yin, Lin Chen, et al. Vision-deepresearch: Incentivizing deepresearch capability in multimodal large language models. arXiv preprint arXiv:2601.22060, 2026.
- [7] Shuang Chen, Yue Guo, Zhaochen Su, Yafu Li, Yulun Wu, Jiacheng Chen, Jiayu Chen, Weijie Wang, Xiaoye Qu, and Yu Cheng. Advancing multimodal reasoning: From optimized cold start to staged reinforcement learning. arXiv preprint arXiv:2506.04207, 2025.
- [8] Shuang Chen, Yue Guo, Yimeng Ye, Shijue Huang, Wenbo Hu, Haoxi Li, Manyuan Zhang, Jiayu Chen, Song Guo, and Nanyun Peng. Ares: Multimodal adaptive reasoning via difficulty-aware token-level entropy shaping. arXiv preprint arXiv:2510.08457, 2025.
- [10] Shuang Chen, Kaituo Feng, Hangting Chen, Wenxuan Huang, Dasen Dai, Quanxin Shou, Yunlong Lin, Xiangyu Yue, Shenghua Gao, and Tianyu Pang. Opensearch-vl: An open recipe for frontier multimodal search agents, 2026.
- [11] Ruiyan Han, Zhen Fang, XinYu Sun, Yuchen Ma, Ziheng Wang, Yu Zeng, Zehui Chen, Lin Chen, Wenxuan Huang, Wei-Jie Xu, et al. Unicorn: Towards self-improving unified multimodal models through self-generated supervision. arXiv preprint arXiv:2601.03193, 2026.
- [12] Shuang Chen, Quanxin Shou, Hangting Chen, Yucheng Zhou, Kaituo Feng, Wenbo Hu, Yi-Fan Zhang, Yunlong Lin, Wenxuan Huang, Mingyang Song, et al. Unify-agent: A unified multimodal agent for world-grounded image synthesis. arXiv preprint arXiv:2603.29620, 2026.
- [13] Kaituo Feng, Manyuan Zhang, Shuang Chen, Yunlong Lin, Kaixuan Fan, Yilei Jiang, Hongyu Li, Dian Zheng, Chenyang Wang, and Xiangyu Yue. Gen-searcher: Reinforcing agentic search for image generation. arXiv preprint arXiv:2603.28767, 2026.
- [14] Wenxuan Huang, Shuang Chen, Zheyong Xie, Shaosheng Cao, Shixiang Tang, Yufan Shen, Qingyu Yin, Wenbo Hu, Xiaoman Wang, Yuntian Tang, et al. Interleaving reasoning for better text-to-image generation. arXiv preprint arXiv:2509.06945, 2025.
- [15] Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025.
- [17] Zeyue Xue, Jie Wu, Yu Gao, Fangyuan Kong, Lingting Zhu, Mengzhao Chen, Zhiheng Liu, Wei Liu, Qiushan Guo, Weilin Huang, and Ping Luo. Dancegrpo: Unleashing grpo on visual generation, 2025.
- [18] Junzhe Li, Yutao Cui, Tao Huang, Yinping Ma, Chun Fan, Yiming Cheng, Miles Yang, Zhao Zhong, and Liefeng Bo. Mixgrpo: Unlocking flow-based grpo efficiency with mixed ode-sde. arXiv preprint arXiv:2507.21802, 2025.
- [19] Aohan Zeng, Xin Lv, Zhenyu Hou, Zhengxiao Du, Qinkai Zheng, Bin Chen, Da Yin, Chendi Ge, Chenghua Huang, Chengxing Xie, et al. Glm-5: from vibe coding to agentic engineering. arXiv preprint arXiv:2602.15763, 2026.
- [20] Bangjun Xiao, Bingquan Xia, Bo Yang, Bofei Gao, Bowen Shen, Chen Zhang, Chenhong He, Chiheng Lou, Fuli Luo, Gang Wang, et al. Mimo-v2-flash technical report. arXiv preprint arXiv:2601.02780, 2026.
- [21] Dhruba Ghosh, Hannaneh Hajishirzi, and Ludwig Schmidt. Geneval: An object-focused framework for evaluating text-to-image alignment. Advances in Neural Information Processing Systems, 36:52132–52152, 2023.
- [22] Jingye Chen, Yupan Huang, Tengchao Lv, Lei Cui, Qifeng Chen, and Furu Wei. Textdiffuser: Diffusion models as text painters. Advances in Neural Information Processing Systems, 36:9353–9387, 2023.
- [23] Kevin Black, Michael Janner, Yilun Du, Ilya Kostrikov, and Sergey Levine. Training diffusion models with reinforcement learning, 2024.
- [24] Ying Fan, Olivia Watkins, Yuqing Du, Hao Liu, Moonkyung Ryu, Craig Boutilier, Pieter Abbeel, Mohammad Ghavamzadeh, Kangwook Lee, and Kimin Lee. Dpok: Reinforcement learning for fine-tuning text-to-image diffusion models, 2023.
- [25] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: learning and evaluating human preferences for text-to-image generation. In Proceedings of the 37th International Conference on Neural Information Processing Systems, pages 15903–15935, 2023.
- [26] Bram Wallace, Meihua Dang, Rafael Rafailov, Linqi Zhou, Aaron Lou, Senthil Purushwalkam, Stefano Ermon, Caiming Xiong, Shafiq Joty, and Nikhil Naik. Diffusion model alignment using direct preference optimization, 2023.
- [27] Jie Liu, Gongye Liu, Jiajun Liang, Yangguang Li, Jiaheng Liu, Xintao Wang, Pengfei Wan, Di Zhang, and Wanli Ouyang. Flow-grpo: Training flow matching models via online rl. arXiv preprint arXiv:2505.05470, 2025.
- [28] Shihao Yuan, Yahui Liu, Yang Yue, Jingyuan Zhang, Wangmeng Zuo, Qi Wang, Fuzheng Zhang, and Guorui Zhou. Ar-grpo: Training autoregressive image generation models via reinforcement learning. arXiv preprint arXiv:2508.06924, 2025.
- [29] Guohui Zhang, Hu Yu, Xiaoxiao Ma, JingHao Zhang, Yaning Pan, Mingde Yao, Jie Xiao, Linjiang Huang, and Feng Zhao. Group critical-token policy optimization for autoregressive image generation. arXiv preprint arXiv:2509.22485, 2025.
- [30] Xiaoxiao Ma, Haibo Qiu, Guohui Zhang, Zhixiong Zeng, Siqi Yang, Lin Ma, and Feng Zhao. Stage: Stable and generalizable grpo for autoregressive image generation. arXiv preprint arXiv:2509.25027, 2025.
- [31] Guohui Zhang, Hu Yu, Xiaoxiao Ma, Yaning Pan, Hang Xu, and Feng Zhao. Maskfocus: Focusing policy optimization on critical steps for masked image generation. arXiv preprint arXiv:2512.18766, 2025.
- [32] Xiaoxiao Ma, Jiachen Lei, Tianfei Ren, Jie Huang, Siming Fu, Aiming Hao, Jiahong Wu, Xiangxiang Chu, and Feng Zhao. Mar-grpo: Stabilized grpo for ar-diffusion hybrid image generation. arXiv preprint arXiv:2604.06966, 2026.
- [33] Shih-Yang Liu, Xin Dong, Ximing Lu, Shizhe Diao, Peter Belcak, Mingjie Liu, Min-Hung Chen, Hongxu Yin, Yu-Chiang Frank Wang, Kwang-Ting Cheng, Yejin Choi, Jan Kautz, and Pavlo Molchanov. Gdpo: Group reward-decoupled normalization policy optimization for multi-reward rl optimization, 2026.
- [34] Rishabh Agarwal, Nino Vieillard, Yongchao Zhou, Piotr Stanczyk, Sabela Ramos Garea, Matthieu Geist, and Olivier Bachem. On-policy distillation of language models: Learning from self-generated mistakes. In The Twelfth International Conference on Learning Representations, 2024.
- [35] Yuxian Gu, Li Dong, Furu Wei, and Minlie Huang. Minillm: Knowledge distillation of large language models. In The Twelfth International Conference on Learning Representations, 2024.
- [36] Jongwoo Ko, Tianyi Chen, Sungnyun Kim, Tianyu Ding, Luming Liang, Ilya Zharkov, and Se-Young Yun. Distillm-2: A contrastive approach boosts the distillation of llms. arXiv preprint arXiv:2503.07067, 2025.
- [37] Wenkai Yang, Weijie Liu, Ruobing Xie, Kai Yang, Saiyong Yang, and Yankai Lin. Learning beyond teacher: Generalized on-policy distillation with reward extrapolation. arXiv preprint arXiv:2602.12125, 2026.
- [38] Woogyeol Jin, Taywon Min, Yongjin Yang, Swanand Ravindra Kadhe, Yi Zhou, Dennis Wei, Nathalie Baracaldo, and Kimin Lee. Entropy-aware on-policy distillation of language models. arXiv preprint arXiv:2603.07079, 2026.
- [39] Dongxu Zhang, Zhichao Yang, Sepehr Janghorbani, Jun Han, Andrew Ressler II, Qian Qian, Gregory D Lyng, Sanjit Singh Batra, and Robert E Tillman. Fast and effective on-policy distillation from reasoning prefixes. arXiv preprint arXiv:2602.15260, 2026.
- [40] Yuanda Xu, Hejian Sang, Zhengze Zhou, Ran He, and Zhipeng Wang. Paced: Distillation and self-distillation at the frontier of student competence. arXiv e-prints, pages arXiv–2603, 2026.
- [41] Kevin Lu and Thinking Machines Lab. On-policy distillation. Thinking Machines Lab: Connectionism, 2025. https://thinkingmachines.ai/blog/on-policy-distillation
- [42] Yuval Kirstain, Adam Polyak, Uriel Singer, Shahbuland Matiana, Joe Penna, and Omer Levy. Pick-a-pic: An open dataset of user preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36:36652–36663, 2023.
- [43] Zhiyuan You, Xin Cai, Jinjin Gu, Tianfan Xue, and Chao Dong. Teaching large language models to regress accurate image quality scores using score distribution. arXiv preprint arXiv:2501.11561, 2025.
- [45] Jiazheng Xu, Xiao Liu, Yuchen Wu, Yuxuan Tong, Qinkai Li, Ming Ding, Jie Tang, and Yuxiao Dong. Imagereward: Learning and evaluating human preferences for text-to-image generation. Advances in Neural Information Processing Systems, 36, 2024.
- [46] Yibin Wang, Yuhang Zang, Hao Li, Cheng Jin, and Jiaqi Wang. Unified reward model for multimodal understanding and generation. arXiv preprint arXiv:2503.05236, 2025.
- [47] Xiaoshi Wu, Yiming Hao, Keqiang Sun, Yixiong Chen, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score v2: A solid benchmark for evaluating human preferences of text-to-image synthesis. arXiv preprint arXiv:2306.09341, 2023.
- [48] An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Gao, Chengen Huang, Chenxu Lv, et al. Qwen3 technical report. arXiv preprint arXiv:2505.09388, 2025.
- [49] Kaiwen Zheng, Huayu Chen, Haotian Ye, Haoxiang Wang, Qinsheng Zhang, Kai Jiang, Hang Su, Stefano Ermon, Jun Zhu, and Ming-Yu Liu. Diffusionnft: Online diffusion reinforcement with forward process. arXiv preprint arXiv:2509.16117, 2025.

A More Details
Following the data and reward configurations of Flow-GRPO, we conducted multi-task hybrid training for G...
Aesthetic Quality
• 1-2 (Low): Blurry, poor lighting, or chaotic composition.
• 3 (Fair): In focus, adequate lighting, but lacks creativity.
• 4-5 (High): Sharp, vibrant colors, masterful composition and impact.
Instruction Following
• 1-2 (Low): Ignores or contradicts the instruction; misses key elements.
• 3 (Fair): Partially follows, but distorts some important elements.
• 4-5 (High): Faithful representation of all elements in the prompt.
Overall Score (Priority: Alignment > Aesthetics)
The overall score must primarily reflect Instruction Following. A fair image that perfectly follows the prompt scores higher than a beautiful image that misses it.
[EXECUTION RULES]
• Strictness: Be rigorous; required details must be explicitly supported.
• Reasoning: You MUST analyze keyword-by-keyword in the<Tho...