CraftGraffiti: Exploring Human Identity with Custom Graffiti Art via Facial-Preserving Diffusion Models
Pith reviewed 2026-05-18 20:41 UTC · model grok-4.3
The pith
A diffusion-based system creates graffiti art from photos while keeping the subject's face recognizable by styling first then locking in identity.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
CraftGraffiti shows that graffiti style transfer followed by identity enforcement via augmented self-attention layers produces outputs with reduced facial attribute drift, competitive identity preservation metrics, and high aesthetic and preference scores, outperforming the identity-first ordering.
What carries the argument
The face-consistent self-attention mechanism that augments attention layers with explicit identity embeddings to protect facial features after style application.
If this is right
- Ordering style transfer before identity enforcement measurably lowers attribute drift relative to the reverse sequence.
- The resulting images achieve competitive facial feature consistency with existing methods.
- Aesthetic quality and human preference reach state-of-the-art levels for text-guided graffiti generation.
- Pose customization works without keypoints while facial coherence is retained.
- The pipeline supports live creative deployments such as festival installations.
Where Pith is reading between the lines
- The same ordering principle could be tested on other extreme stylizations such as pixel art or mosaic rendering to see whether identity drift drops similarly.
- Integration with existing portrait editing tools might extend the approach to video or animation sequences.
- Cultural applications could include generating identity-preserving art for communities that value both stylistic freedom and recognizability.
Load-bearing premise
The explicit identity embeddings added to attention layers will reliably block subtle distortions to eyes, nose, or mouth even when graffiti stylization is extreme.
What would settle it
Generate a set of graffiti images from the same input faces using the system and check whether independent viewers can still match them to the originals at rates no better than chance or whether measurable eye-nose-mouth drift exceeds that of a simple reverse-order baseline.
Figures
read the original abstract
Preserving facial identity under extreme stylistic transformation remains a major challenge in generative art. In graffiti, a high-contrast, abstract medium, subtle distortions to the eyes, nose, or mouth can erase the subject's recognizability, undermining both personal and cultural authenticity. We present CraftGraffiti, an end-to-end text-guided graffiti generation framework designed with facial feature preservation as a primary objective. Given an input image and a style and pose descriptive prompt, CraftGraffiti first applies graffiti style transfer via LoRA-fine-tuned pretrained diffusion transformer, then enforces identity fidelity through a face-consistent self-attention mechanism that augments attention layers with explicit identity embeddings. Pose customization is achieved without keypoints, using CLIP-guided prompt extension to enable dynamic re-posing while retaining facial coherence. We formally justify and empirically validate the "style-first, identity-after" paradigm, showing it reduces attribute drift compared to the reverse order. Quantitative results demonstrate competitive facial feature consistency and state-of-the-art aesthetic and human preference scores, while qualitative analyses and a live deployment at the Cruilla Festival highlight the system's real-world creative impact. CraftGraffiti advances the goal of identity-respectful AI-assisted artistry, offering a principled approach for blending stylistic freedom with recognizability in creative AI applications.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents CraftGraffiti, an end-to-end text-guided framework for generating custom graffiti art from an input face image and descriptive prompts. It adopts a 'style-first, identity-after' pipeline: LoRA-fine-tuned pretrained diffusion transformer for graffiti style transfer, followed by a face-consistent self-attention mechanism that augments attention layers with explicit identity embeddings to enforce facial fidelity. Pose customization is handled via CLIP-guided prompt extension without keypoints. The work formally justifies and empirically validates that this ordering reduces attribute drift relative to the reverse sequence, reporting competitive facial feature consistency together with state-of-the-art aesthetic and human-preference scores, supported by quantitative tables, qualitative examples, and a live deployment at the Cruilla Festival.
Significance. If the empirical claims are substantiated, the paper would make a meaningful contribution to identity-preserving generative models for highly abstract artistic domains. The explicit validation of the style-first ordering, the real-world festival deployment, and the focus on recognizability in graffiti together address a practical gap between stylistic freedom and cultural/personal authenticity in creative AI applications.
major comments (2)
- [Quantitative Evaluation] Quantitative Evaluation section: the reported facial-feature consistency scores rely on standard identity embeddings (ArcFace or equivalent). These embeddings are trained on photorealistic data; the paper does not demonstrate that the same embeddings remain reliable under extreme high-contrast graffiti stylization where eyes, nose, and mouth can be heavily abstracted. Without a domain-specific validation (e.g., human study on identity recognition in the generated graffiti or an alternative metric), the central claim that the style-first paradigm reduces attribute drift cannot be considered fully supported.
- [Framework and Ablation] Framework and Ablation sections: the face-consistent self-attention is described as the safeguard against subtle distortions, yet the manuscript provides no targeted ablation isolating its contribution specifically on identity-critical regions (eyes/nose/mouth) under the most extreme stylization prompts. This leaves the mechanistic justification for the paradigm partially unproven.
minor comments (2)
- [Abstract] The abstract states that the paradigm is 'formally justified'; if this justification appears in §3 or §4, a one-sentence pointer in the abstract would improve readability.
- [Tables] Tables reporting quantitative scores should include standard deviations or confidence intervals and the exact number of human raters for the preference study.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback, which raises valid points about metric reliability and ablation depth in stylized domains. We address each major comment below and will revise the manuscript accordingly to provide stronger empirical support.
read point-by-point responses
-
Referee: [Quantitative Evaluation] Quantitative Evaluation section: the reported facial-feature consistency scores rely on standard identity embeddings (ArcFace or equivalent). These embeddings are trained on photorealistic data; the paper does not demonstrate that the same embeddings remain reliable under extreme high-contrast graffiti stylization where eyes, nose, and mouth can be heavily abstracted. Without a domain-specific validation (e.g., human study on identity recognition in the generated graffiti or an alternative metric), the central claim that the style-first paradigm reduces attribute drift cannot be considered fully supported.
Authors: We acknowledge that ArcFace embeddings are trained on photorealistic data and may have reduced reliability for heavily abstracted graffiti styles. While our existing human preference scores provide indirect support for recognizability, we agree this does not fully substitute for domain-specific validation. In the revised manuscript, we will add a targeted human study in which participants match generated graffiti images to original face photos and rate identity similarity. Results will be reported alongside the existing metrics in the Quantitative Evaluation section to directly support the claim that the style-first ordering reduces attribute drift. revision: yes
-
Referee: [Framework and Ablation] Framework and Ablation sections: the face-consistent self-attention is described as the safeguard against subtle distortions, yet the manuscript provides no targeted ablation isolating its contribution specifically on identity-critical regions (eyes/nose/mouth) under the most extreme stylization prompts. This leaves the mechanistic justification for the paradigm partially unproven.
Authors: The face-consistent self-attention augments attention layers with identity embeddings precisely to protect critical facial regions. To strengthen the mechanistic evidence, we will add a focused ablation study in the revised manuscript. This will compare outputs with and without the self-attention module under extreme stylization prompts, with quantitative evaluation of distortions localized to the eyes, nose, and mouth regions (using region-specific feature similarity) as well as qualitative examples. These results will be included in the Ablation section. revision: yes
Circularity Check
No circularity: framework relies on external pretrained components and empirical validation
full rationale
The paper presents CraftGraffiti as an end-to-end framework that applies LoRA-fine-tuned pretrained diffusion transformers for style transfer followed by a face-consistent self-attention mechanism using explicit identity embeddings. The 'style-first, identity-after' paradigm is described as formally justified and empirically validated against the reverse order, with quantitative results on facial consistency, aesthetics, and human preference. No equations, fitted parameters, or self-referential definitions are shown that reduce any claimed prediction or improvement to quantities defined inside the paper itself. All core modules reference external pretrained models (diffusion transformers, CLIP) rather than internal fits or self-citations that would create a closed loop. The derivation chain therefore remains self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We formally justify and empirically validate the 'style-first, identity-after' paradigm... face-consistent self-attention mechanism that augments attention layers with explicit identity embeddings.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Quantitative results demonstrate competitive facial feature consistency and state-of-the-art aesthetic and human preference scores.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
DocRevive: A Unified Pipeline for Document Text Restoration
DocRevive builds a unified pipeline using OCR, image analysis, language models, and diffusion to reconstruct degraded document text, backed by a 30k-image synthetic dataset and the UCSM metric.
-
DocRevive: A Unified Pipeline for Document Text Restoration
A unified pipeline using OCR, inpainting, and diffusion models restores text in degraded documents on a new synthetic benchmark dataset, evaluated with the proposed UCSM metric.
Reference graph
Works this paper leans on
-
[1]
https://en.wikipedia.org/wiki/Living_lab, 2025
Living lab. https://en.wikipedia.org/wiki/Living_lab, 2025. Accessed: 2025-08-09
work page 2025
-
[2]
Mousa Al-kfairy, Dheya Mustafa, Nir Kshetri, Mazen Insiew, and Omar Alfandi. Ethical challenges and solutions of generative ai: An interdisciplinary perspective.Informatics, 11(3):58,
-
[3]
Svgcraft: Beyond single object text-to-svg synthesis with comprehensive canvas layout, 2025
Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, and Anjan Dutta. Svgcraft: Beyond single object text-to-svg synthesis with comprehensive canvas layout, 2025
work page 2025
-
[4]
Bombing, tagging, writing: An analysis of the significance of graffiti and street art
Lindsay Bates. Bombing, tagging, writing: An analysis of the significance of graffiti and street art. PhD thesis, University of Pennsylvania, 2014
work page 2014
-
[5]
Fairness in machine learning: Lessons from political philosophy
Reuben Binns. Fairness in machine learning: Lessons from political philosophy. Proceedings of the 2017 FMML Workshop on Fair ML, 2017. arXiv preprint arXiv:1712.03586
-
[6]
Instructpix2pix: Learning to follow image editing instructions
Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18392–18402, 2023
work page 2023
-
[7]
Vggface2: A dataset for recognising faces across pose and age
Qiong Cao, Li Shen, Weidi Xie, Omkar M Parkhi, and Andrew Zisserman. Vggface2: A dataset for recognising faces across pose and age. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), pages 67–74. IEEE, 2018
work page 2018
-
[8]
Upgpt: Universal diffusion model for person image generation, editing and pose transfer
Soon Yau Cheong, Armin Mustafa, and Andrew Gilbert. Upgpt: Universal diffusion model for person image generation, editing and pose transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4173–4182, 2023
work page 2023
-
[9]
Living labs and user engagement for innovation and sustainability
Luca Compagnucci, Francesca Spigarelli, Jorge Coelho, and Carlos Duarte. Living labs and user engagement for innovation and sustainability. Journal of Cleaner Production, 317:128223, 2021
work page 2021
-
[10]
Power of graffiti: Exploring its cultural and social significance
Saday Chandra Das. Power of graffiti: Exploring its cultural and social significance. Aayushi International Interdisciplinary Research Journal (AIIRJ), X (IX), pages 34–35, 2023. 9
work page 2023
-
[11]
Prompt tuning inversion for text- driven image editing using diffusion models
Wenkai Dong, Song Xue, Xiaoyue Duan, and Shumin Han. Prompt tuning inversion for text- driven image editing using diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7430–7440, 2023
work page 2023
-
[12]
Martin Nicolas Everaert, Marco Bocchio, Sami Arpa, Sabine Süsstrunk, and Radhakrishna Achanta. Diffusion in style. In Proceedings of the ieee/cvf international conference on computer vision, pages 2251–2261, 2023
work page 2023
-
[13]
Fairness and bias in artificial intelligence: A survey
Emilio Ferrara. Fairness and bias in artificial intelligence: A survey. Digital, 6(1):1–41, 2023
work page 2023
-
[14]
Evaluating the cultural signifi- cance of historic graffiti
Alan M Forster, Samantha Vettese-Forster, and John Borland. Evaluating the cultural signifi- cance of historic graffiti. Structural Survey, 30(1):43–64, 2012
work page 2012
-
[15]
i don’t see myself represented here at all
Sourojit Ghosh, Nina Lutz, and Aylin Caliskan. “i don’t see myself represented here at all”: User experiences of stable diffusion outputs containing representational harms across gender identities and nationalities. In Proceedings of the AAAI/ACM conference on AI, ethics, and society, volume 7, pages 463–475, 2024
work page 2024
- [16]
-
[17]
Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation
Qin Guo and Tianwei Lin. Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6986–6996, 2024
work page 2024
-
[18]
Diffusion-enhanced patchmatch: A framework for arbitrary style transfer with diffusion models
Mark Hamazaspyan and Shant Navasardyan. Diffusion-enhanced patchmatch: A framework for arbitrary style transfer with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 797–805, 2023
work page 2023
-
[19]
Lora: Low-rank adaptation of large language models
Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022
work page 2022
-
[20]
Diffstyler: Controllable dual diffusion for text-driven image stylization
Nisha Huang, Yuxin Zhang, Fan Tang, Chongyang Ma, Haibin Huang, Weiming Dong, and Changsheng Xu. Diffstyler: Controllable dual diffusion for text-driven image stylization. IEEE Transactions on Neural Networks and Learning Systems, 2024
work page 2024
-
[21]
Diffusion model-based image editing: A survey
Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Liangliang Cao, and Shifeng Chen. Diffusion model-based image editing: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025
work page 2025
-
[22]
Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024
work page internal anchor Pith review Pith/arXiv arXiv 2024
-
[23]
Humansd: A native skeleton-guided diffusion model for human image generation
Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, and Qiang Xu. Humansd: A native skeleton-guided diffusion model for human image generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15988–15998, 2023
work page 2023
-
[24]
Imagic: Text-based real image editing with diffusion models
Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6007–6017, 2023
work page 2023
-
[25]
Reposedm: Recurrent pose alignment and gradient guidance for pose guided image synthesis
Anant Khandelwal. Reposedm: Recurrent pose alignment and gradient guidance for pose guided image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2495–2504, 2024
work page 2024
-
[26]
Ecoval: Ecological validity of cues and representative design in user experience evaluations
Suzanne Kieffer. Ecoval: Ecological validity of cues and representative design in user experience evaluations. AIS Transactions on Human-Computer Interaction, 9(2):149–172, 2017
work page 2017
-
[27]
Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, et al. Flux. 1 kontext: Flow matching for in-context image generation and editing in latent space. arXiv preprint arXiv:2506.15742, 2025. 10
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[28]
Universal style transfer via feature transforms
Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, and Ming-Hsuan Yang. Universal style transfer via feature transforms. Advances in neural information processing systems, 30, 2017
work page 2017
-
[29]
Global and local consistent age generative adversarial network (glca-gan)
Zhen Li, Ping Wang, Qiong Hu, and Ran He. Global and local consistent age generative adversarial network (glca-gan). In Proceedings of the 26th ACM International Conference on Multimedia, pages 305–313, 2018
work page 2018
-
[30]
Null-text inversion for editing real images using guided diffusion models
Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6038–6047, 2023
work page 2023
-
[31]
Ava: A large-scale database for aesthetic visual analysis
Naila Murray, Luca Marchesotti, and Florent Perronnin. Ava: A large-scale database for aesthetic visual analysis. In 2012 IEEE conference on computer vision and pattern recognition, pages 2408–2415. IEEE, 2012
work page 2012
-
[32]
Uncovering bias in face generation models
Cristian Muñoz, Nicola Zannone, Mohamed Mohammed, and Adriano Koshiyama. Uncovering bias in face generation models. arXiv preprint arXiv:2302.11562, 2023
-
[33]
Contrastive denoising score for text-guided latent diffusion image editing
Hyelin Nam, Gihyun Kwon, Geon Yeong Park, and Jong Chul Ye. Contrastive denoising score for text-guided latent diffusion image editing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9192–9201, 2024
work page 2024
-
[34]
Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, et al. Do transformer modifi- cations transfer across implementations and applications? arXiv preprint arXiv:2102.11972, 2021
-
[35]
Diffbody: Diffusion-based pose and shape editing of human images
Yuta Okuyama, Yuki Endo, and Yoshihiro Kanamori. Diffbody: Diffusion-based pose and shape editing of human images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 6333–6342, 2024
work page 2024
-
[36]
Ziheng Ouyang, Zhen Li, and Qibin Hou. K-lora: Unlocking training-free fusion of any subject and style loras. arXiv preprint arXiv:2502.18461, 2025
-
[37]
Enhancing dreambooth with lora for generating unlimited characters with stable diffusion
Rubén Pascual, Adrián Maiza, Mikel Sesma-Sara, Daniel Paternain, and Mikel Galar. Enhancing dreambooth with lora for generating unlimited characters with stable diffusion. In 2024 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2024
work page 2024
-
[38]
Pytorch: An imperative style, high-performance deep learning library
Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019
work page 2019
-
[39]
SDXL: Improving Latent Diffusion Models for High-Resolution Image Synthesis
Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[40]
Crowd sensing and living lab outdoor experimentation made easy
Evangelos Pournaras, Atif Nabi Ghulam, Renato Kunz, and Regula Hänggli. Crowd sensing and living lab outdoor experimentation made easy. arXiv preprint arXiv:2107.04117, 2021
-
[41]
Training- free identity preservation in stylized image generation using diffusion models
Mohammad Ali Rezaei, Helia Hajikazem, Saeed Khanehgir, and Mahdi Javanmardi. Training- free identity preservation in stylized image generation using diffusion models. arXiv preprint arXiv:2506.06802, 2025
-
[42]
High- resolution image synthesis with latent diffusion models
Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022
work page 2022
-
[43]
Facenet: A unified embedding for face recognition and clustering
Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015. 11
work page 2015
-
[44]
Smiling women pitching down: Auditing gender bias in image generative ai
Chien Sun, William Tzeng, et al. Smiling women pitching down: Auditing gender bias in image generative ai. arXiv preprint arXiv:2305.10566, 2023
-
[45]
Training-free consistent text-to-image generation
Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, and Yuval Atzmon. Training-free consistent text-to-image generation. ACM Transactions on Graphics (TOG) , 43(4):1–18, 2024
work page 2024
-
[46]
Instantstyle-plus: Style transfer with content-preserving in text-to-image generation
Haofan Wang, Peng Xing, Renyuan Huang, Hao Ai, Qixun Wang, and Xu Bai. Instantstyle- plus: Style transfer with content-preserving in text-to-image generation. arXiv preprint arXiv:2407.00788, 2024
-
[47]
Stable-pose: Leveraging transformers for pose-guided text-to-image generation
Jiajun Wang, Morteza Ghahremani Boozandani, Yitong Li, Björn Ommer, and Christian Wachinger. Stable-pose: Leveraging transformers for pose-guided text-to-image generation. Advances in Neural Information Processing Systems, 37:65670–65698, 2024
work page 2024
-
[48]
Interactive image style transfer guided by graffiti
Quan Wang, Yanli Ren, Xinpeng Zhang, and Guorui Feng. Interactive image style transfer guided by graffiti. In Proceedings of the 31st ACM International Conference on Multimedia, pages 6685–6694, 2023
work page 2023
-
[49]
Au- tostory: Generating diverse storytelling images with minimal human efforts
Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, and Chunhua Shen. Au- tostory: Generating diverse storytelling images with minimal human efforts. International Journal of Computer Vision, pages 1–22, 2024
work page 2024
-
[50]
Stylediffusion: Controllable disentangled style transfer via diffusion models
Zhizhong Wang, Lei Zhao, and Wei Xing. Stylediffusion: Controllable disentangled style transfer via diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7677–7689, 2023
work page 2023
-
[51]
A latent space of stochastic diffusion models for zero-shot image editing and guidance
Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7378–7387, 2023
work page 2023
-
[52]
Human preference score: Better aligning text-to-image models with human preference
Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score: Better aligning text-to-image models with human preference. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2096–2105, 2023
work page 2096
-
[53]
Drb-gan: A dynamic res- block generative adversarial network for artistic style transfer
Wenju Xu, Chengjiang Long, Ruisheng Wang, and Guanghui Wang. Drb-gan: A dynamic res- block generative adversarial network for artistic style transfer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6383–6392, 2021
work page 2021
-
[54]
Controllable artistic text style transfer via shape-matching gan
Shuai Yang, Zhangyang Wang, Zhaowen Wang, Ning Xu, Jiaying Liu, and Zongming Guo. Controllable artistic text style transfer via shape-matching gan. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4442–4451, 2019
work page 2019
-
[55]
IP-Adapter: Text Compatible Image Prompt Adapter for Text-to-Image Diffusion Models
Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. Ip-adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721, 2023
work page internal anchor Pith review Pith/arXiv arXiv 2023
-
[56]
Inversion-based style transfer with diffusion models
Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, Weiming Dong, and Changsheng Xu. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10146–10156, 2023
work page 2023
-
[57]
Sine: Single image editing with text-to-image diffusion models
Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris N Metaxas, and Jian Ren. Sine: Single image editing with text-to-image diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6027–6037, 2023
work page 2023
-
[58]
Xiang Zhou. Bias in generative ai. arXiv preprint arXiv:2403.02726, 2024
-
[59]
Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, and Qibin Hou. Storydiffusion: Consistent self-attention for long-range image and video generation. Advances in Neural Information Processing Systems, 37:110315–110340, 2024. 12 A Demonstration at Cruïlla Festival An installation (see Fig. 8) implementing the proposed system was deployed during the F...
work page 2024
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.