Premier: Personalized Preference Modulation with Learnable User Embedding in Text-to-Image Generation
Recognition: 2 theorem links · Lean theorems
Pith reviewed 2026-05-15 07:15 UTC · model grok-4.3
The pith
Premier learns distinct user preference embeddings that fuse with text prompts to personalize image generation.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Premier represents each user's preference as a learnable embedding and introduces a preference adapter that fuses the user embedding with the text prompt. The fused preference embedding is further used to modulate the generative process. A dispersion loss enforces separation among user embeddings to enhance distinctness and alignment. When user data are scarce, new users are represented as linear combinations of existing preference embeddings learned during training.
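The mechanism as described is small enough to sketch. Everything below is assumed for illustration (the shapes, the additive form of the adapter, and the exact dispersion penalty are not specified here): a user embedding table, a toy fusion step, and a pairwise-cosine dispersion loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes -- the paper's actual dimensions are not given here.
n_users, d = 8, 16
user_emb = rng.normal(size=(n_users, d))   # learnable user preference embeddings

def fuse(user_vec, text_emb, W):
    """Toy preference adapter: project the user vector and add it to the
    text-prompt embedding. The real adapter architecture is unspecified."""
    return text_emb + user_vec @ W

def dispersion_loss(embeddings):
    """One plausible dispersion loss: penalize squared pairwise cosine
    similarity so user embeddings stay separated."""
    unit = embeddings / np.linalg.norm(embeddings, axis=1, keepdims=True)
    sim = unit @ unit.T
    off_diag = sim[~np.eye(len(unit), dtype=bool)]
    return float(np.mean(off_diag ** 2))

W = 0.1 * rng.normal(size=(d, d))          # adapter projection (hypothetical)
text_emb = rng.normal(size=d)
fused = fuse(user_emb[0], text_emb, W)     # conditions the generative process
loss = dispersion_loss(user_emb)           # auxiliary term in the training objective
```

In the paper the fused embedding then modulates the diffusion process; here it is simply a vector of the text-embedding width.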
What carries the argument
The learnable user preference embedding combined with a preference adapter that fuses it into the text prompt and modulates generation.
If this is right
- Stronger preference alignment than prior methods when both use the same length of user history.
- Improved text consistency and higher scores on ViPer proxy metrics.
- Better results according to expert human evaluations.
- Effective personalization for new users without requiring large amounts of their own data.
Where Pith is reading between the lines
- The linear-combination approach for new users could be tested as a lightweight way to initialize personalization in other conditional generation tasks.
- Because embeddings are kept separate by the dispersion loss, the framework might support user-level privacy controls by allowing selective forgetting or isolation of individual embeddings.
- If the modulation mechanism proves stable, similar adapters could be explored for video or 3D generation conditioned on the same preference vectors.
Load-bearing premise
User preferences can be faithfully captured by low-dimensional learnable embeddings that stay distinct under dispersion loss and generalize accurately to unseen users via linear combinations of trained embeddings.
What would settle it
An experiment that replaces learned user embeddings with random vectors or untrained linear combinations for new users and measures whether preference alignment, text consistency, and expert ratings drop to the level of non-personalized baselines.
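That ablation is easy to prototype. The sketch below is purely illustrative (the dimensions, noise level, and the assumption that the held-out user truly is a combination of training embeddings are all invented): it compares least-squares coefficient recovery against a random-vector control.

```python
import numpy as np

rng = np.random.default_rng(1)
n_train, d = 10, 16
train_emb = rng.normal(size=(n_train, d))   # embeddings learned on training users

# Synthetic ground truth: the held-out user IS a combination of training users.
true_coef = rng.dirichlet(np.ones(n_train))
new_user = train_emb.T @ true_coef

# Few-shot "history": noisy observations of the held-out user's preference.
history = new_user + 0.05 * rng.normal(size=(5, d))
target = history.mean(axis=0)

# Condition A: recover combination coefficients by least squares.
coef, *_ = np.linalg.lstsq(train_emb.T, target, rcond=None)
recovered = train_emb.T @ coef

# Condition B: random vector, standing in for a non-personalized control.
random_vec = rng.normal(size=d)

err_recovered = float(np.linalg.norm(recovered - new_user))
err_random = float(np.linalg.norm(random_vec - new_user))
```

If the gap between the two conditions vanished on real users, that would be evidence the linear-combination premise is doing no work.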
Original abstract
Text-to-image generation has advanced rapidly, yet it still struggles to capture the nuanced user preferences. Existing approaches typically rely on multimodal large language models to infer user preferences, but the derived prompts or latent codes rarely reflect them faithfully, leading to suboptimal personalization. We present Premier, a novel preference modulation framework for personalized image generation. Premier represents each user's preference as a learnable embedding and introduces a preference adapter that fuses the user embedding with the text prompt. To enable accurate and fine-grained preference control, the fused preference embedding is further used to modulate the generative process. To enhance the distinctness of individual preference and improve alignment between outputs and user-specific styles, we incorporate a dispersion loss that enforces separation among user embeddings. When user data are scarce, new users are represented as linear combinations of existing preference embeddings learned during training, enabling effective generalization. Experiments show that Premier outperforms prior methods under the same history length, achieving stronger preference alignment and superior performance on text consistency, ViPer proxy metrics, and expert evaluations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Premier, a preference modulation framework for personalized text-to-image generation. Each user is represented by a learnable embedding; a preference adapter fuses this embedding with the text prompt, and the fused representation modulates the generative process. A dispersion loss encourages separation among embeddings. For users with scarce data, new preference vectors are formed as linear combinations of the embeddings learned on the training set. Experiments report that Premier outperforms prior methods at matched history lengths on preference alignment, text consistency, ViPer proxy metrics, and expert evaluations.
Significance. If the linear-combination generalization holds and the reported metric gains are robust, the work would offer a practical, low-data route to user-specific control that avoids per-user fine-tuning or heavy reliance on MLLM inference. The dispersion loss and adapter are incremental but well-motivated extensions of existing embedding techniques.
major comments (2)
- [Method (generalization subsection)] The central generalization claim (new users represented as linear combinations of training embeddings) is load-bearing yet unsupported by any analysis showing that the dispersion loss produces an embedding space whose convex hull covers the distribution of real user preferences. Without such evidence or a controlled ablation on coefficient recovery from few examples, the reported gains for low-history regimes cannot be distinguished from overfitting to the training user set.
- [Experiments] The abstract and experimental claims assert consistent outperformance on preference alignment, text consistency, ViPer metrics, and expert scores, but no table or section provides the exact baseline implementations, metric definitions, statistical significance tests, or train/test user splits. This absence prevents verification that post-hoc choices were not made after observing results.
minor comments (2)
- [Method] Notation for the fused preference embedding and the modulation operation should be introduced with explicit equations rather than prose descriptions to improve reproducibility.
- [Method] The paper should clarify whether the linear coefficients for unseen users are obtained by a closed-form solve, a small optimization step, or another procedure, and report the computational cost of this step.
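On the second minor comment, the two candidate procedures are both simple; a hedged sketch with invented dimensions and a hypothetical ridge regularizer shows they should agree on a well-conditioned problem.

```python
import numpy as np

rng = np.random.default_rng(2)
n_train, d = 10, 16
E = rng.normal(size=(n_train, d))   # trained user embeddings, one per row
target = rng.normal(size=d)         # preference estimate for the unseen user
lam = 0.5                           # ridge regularizer (hypothetical)

# Closed-form solve: minimize ||E.T @ c - target||^2 + lam * ||c||^2.
A = E @ E.T + lam * np.eye(n_train)
c_closed = np.linalg.solve(A, E @ target)

# The same objective by a short gradient-descent loop ("small optimization step").
c_gd = np.zeros(n_train)
lr = 1e-2
for _ in range(5000):
    grad = E @ (E.T @ c_gd - target) + lam * c_gd
    c_gd -= lr * grad
```

The closed-form route costs one n_train-by-n_train solve; the iterative route only matters if extra constraints (e.g., nonnegativity of the coefficients) are imposed.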
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and describe the revisions we will make to strengthen the manuscript.
Point-by-point responses
Referee: [Method (generalization subsection)] The central generalization claim (new users represented as linear combinations of training embeddings) is load-bearing yet unsupported by any analysis showing that the dispersion loss produces an embedding space whose convex hull covers the distribution of real user preferences. Without such evidence or a controlled ablation on coefficient recovery from few examples, the reported gains for low-history regimes cannot be distinguished from overfitting to the training user set.
Authors: We appreciate the referee's concern regarding the generalization mechanism. The dispersion loss is explicitly designed to encourage separation among user embeddings, creating a basis in which new preferences can be expressed as linear combinations. We agree that direct evidence for convex-hull coverage and coefficient recovery would strengthen the claim. In the revised manuscript we will add (i) a quantitative analysis and visualization of the learned embedding space (including convex-hull coverage metrics) and (ii) a controlled ablation that recovers combination coefficients from few-shot examples of held-out users and reports the resulting preference-alignment performance. These additions will clarify that gains in low-history regimes arise from the proposed linear-combination approach rather than overfitting.
Revision: yes
Referee: [Experiments] The abstract and experimental claims assert consistent outperformance on preference alignment, text consistency, ViPer metrics, and expert scores, but no table or section provides the exact baseline implementations, metric definitions, statistical significance tests, or train/test user splits. This absence prevents verification that post-hoc choices were not made after observing results.
Authors: We acknowledge that the current manuscript lacks sufficient experimental detail for full reproducibility and verification. In the revised version we will expand the Experiments section and add a dedicated appendix that provides: (1) exact code-level descriptions and hyper-parameters for every baseline, (2) precise mathematical definitions and implementation details for all metrics (including ViPer), (3) statistical significance results (paired t-tests or Wilcoxon tests with p-values), and (4) explicit documentation of the train/test user splits and data-partitioning protocol. These changes will eliminate ambiguity about post-hoc decisions.
Revision: yes
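For the statistical-significance item, a paired test per user is the natural shape. The sketch below uses synthetic scores (all numbers invented) and a hand-computed paired t statistic.

```python
import numpy as np
from math import sqrt

rng = np.random.default_rng(3)

# Hypothetical per-user preference-alignment scores for two systems.
scores_premier = rng.normal(0.72, 0.05, size=30)
scores_baseline = scores_premier - rng.normal(0.04, 0.02, size=30)

# Paired t statistic on per-user score differences.
diff = scores_premier - scores_baseline
t_stat = diff.mean() / (diff.std(ddof=1) / sqrt(len(diff)))

# Two-sided critical value for df = 29 at alpha = 0.05 is about 2.045.
significant = t_stat > 2.045
```

In practice scipy.stats.ttest_rel or scipy.stats.wilcoxon would be used; the Wilcoxon test drops the normality assumption, which matters for bounded expert-rating scales.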
Circularity Check
No significant circularity; derivation relies on standard ML constructs
full rationale
The paper presents a learnable user embedding fused via a preference adapter, modulated into the generative process, with a dispersion loss for separation and linear combination fallback for scarce-data users. These are standard embedding and generalization techniques grounded in external ML literature (e.g., user embeddings in recommendation systems and convex combinations in few-shot adaptation). No equations reduce claimed improvements to fitted parameters by definition, no self-citations are invoked as load-bearing uniqueness theorems, and no ansatz is smuggled via prior author work. The central claims rest on experimental comparisons against baselines rather than tautological reductions, making the derivation self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
free parameters (1)
- user preference embeddings
invented entities (1)
- preference adapter (no independent evidence)
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean: embed_add / embed_injective (tag: echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Paper passage: "When user data are scarce, new users are represented as linear combinations of existing preference embeddings learned during training"
- IndisputableMonolith/Cost/FunctionalEquation.lean: Jcost_pos_of_ne_one (tag: echoes)
  ECHOES: this paper passage has the same mathematical shape or conceptual pattern as the Recognition theorem, but is not a direct formal dependency.
  Paper passage: "we introduce a dispersion loss that promotes separation between the adapter's output representations in the feature space"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.