CraftGraffiti: Exploring Human Identity with Custom Graffiti Art via Facial-Preserving Diffusion Models

Ayan Banerjee; Fernando Vilariño; Josep Lladós

arXiv: 2508.20640 · v2 · submitted 2025-08-28 · 💻 cs.CV

Pith reviewed 2026-05-18 20:41 UTC · model grok-4.3

keywords: graffiti generation, facial identity preservation, diffusion models, style transfer, self-attention mechanism, LoRA fine-tuning, CLIP-guided reposing

The pith

A diffusion-based system creates graffiti art from photos while keeping the subject's face recognizable by styling first then locking in identity.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents CraftGraffiti as an end-to-end framework that turns input images into text-guided graffiti using a LoRA-fine-tuned diffusion transformer. It tackles the problem of facial distortions in high-contrast abstract styles by first applying the style, then adding a face-consistent self-attention layer that injects explicit identity embeddings. The work validates that this style-first order reduces attribute drift compared with reversing the steps, while also supporting keypoint-free pose changes through CLIP prompt extension. Results show competitive facial consistency alongside leading scores for aesthetics and human preference, with a festival deployment demonstrating practical use in creative settings.

Core claim

CraftGraffiti shows that graffiti style transfer followed by identity enforcement via augmented self-attention layers produces outputs with reduced facial attribute drift, competitive identity preservation metrics, and high aesthetic and preference scores, outperforming the identity-first ordering.

What carries the argument

The face-consistent self-attention mechanism that augments attention layers with explicit identity embeddings to protect facial features after style application.
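
As a rough illustration of this idea, the sketch below appends one identity vector as an extra key/value token in a single-head attention layer, so every image token can attend to it. It is a minimal pure-Python sketch under assumptions: the function names, the single-head layout, and the reuse of the identity vector as both key and value are hypothetical and are not taken from the paper.

```python
import math

def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def dot(a, b):
    """Dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))

def attention_with_identity(queries, keys, values, identity):
    """Single-head scaled dot-product attention with one identity vector
    appended as an extra key/value token (hypothetical layout)."""
    d = len(identity)
    ext_keys = keys + [identity]      # extra key token
    ext_values = values + [identity]  # extra value token
    outputs, weights = [], []
    for q in queries:
        scores = [dot(q, k) / math.sqrt(d) for k in ext_keys]
        w = softmax(scores)
        out = [sum(w_j * v[i] for w_j, v in zip(w, ext_values)) for i in range(d)]
        outputs.append(out)
        weights.append(w)
    return outputs, weights

# Toy example: two image tokens of dimension 2 and one identity vector.
tokens = [[1.0, 0.0], [0.0, 1.0]]
identity = [1.0, 1.0]
outputs, weights = attention_with_identity(tokens, tokens, tokens, identity)
```

In this toy setup each row of `weights` has three entries, the last of which is the share of attention given to the identity token. A real diffusion transformer would use learned projections and many heads.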

If this is right

Ordering style transfer before identity enforcement measurably lowers attribute drift relative to the reverse sequence.
The resulting images achieve competitive facial feature consistency with existing methods.
Aesthetic quality and human preference reach state-of-the-art levels for text-guided graffiti generation.
Pose customization works without keypoints while facial coherence is retained.
The pipeline supports live creative deployments such as festival installations.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same ordering principle could be tested on other extreme stylizations such as pixel art or mosaic rendering to see whether identity drift drops similarly.
Integration with existing portrait editing tools might extend the approach to video or animation sequences.
Cultural applications could include generating identity-preserving art for communities that value both stylistic freedom and recognizability.

Load-bearing premise

The explicit identity embeddings added to attention layers will reliably block subtle distortions to eyes, nose, or mouth even when graffiti stylization is extreme.

What would settle it

Generate a set of graffiti images from the same input faces using the system, then check two things: whether independent viewers match them to the originals at rates no better than chance, and whether measurable eye-nose-mouth drift exceeds that of a simple reverse-order baseline. Either result would count against the claim.
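
One way to make "no better than chance" concrete is an exact one-sided binomial test on the viewers' matching results. The sketch below assumes a forced-choice design in which each trial shows one graffiti image and a fixed number of candidate photos; that design, the function name, and the numbers are hypothetical and not from the paper.

```python
from math import comb

def binomial_tail(successes, trials, p_chance):
    """Exact one-sided P(X >= successes) for X ~ Binomial(trials, p_chance)."""
    return sum(
        comb(trials, k) * p_chance**k * (1 - p_chance) ** (trials - k)
        for k in range(successes, trials + 1)
    )

# Hypothetical numbers for illustration only: 100 matching trials,
# 4 candidate photos per trial (chance = 0.25), 25 correct matches.
p_value = binomial_tail(25, 100, 0.25)
```

A large `p_value` here means the observed matching rate is consistent with guessing. A small one means viewers recognize the subject more often than chance would explain.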

Figures

Figures reproduced from arXiv: 2508.20640 by Ayan Banerjee, Fernando Vilariño, Josep Lladós.

**Figure 2.** CraftGraffiti transforms a source image into a graffiti-style portrait while preserving the subject’s identity and pose. Graffiti style is injected via a pretrained diffusion fine-tuned with LoRA for the dedicated style. Later on, another diffusion model is equipped with face-consistent self-attention and cross-attention modules to preserve key facial features, and a LoRA module enables pose customization …

**Figure 3.** Face-consistent self-attention: we can easily preserve the facial attribute of the character through the extra dimension of identity embedding. To ensure that the two generated characters depict the same identity, we introduce an explicit identity embedding into the attention computation. Conceptually, this means adding a special identity vector (often derived from a reference face) as an extra token or …

**Figure 4.** Qualitative Comparison: CraftGraffiti perfectly transforms the input image into the graffiti style and maintains facial attributes, while the rest cannot do both. We also compare CraftGraffiti with InstructPix2Pix [6]; however, it neither adds objects nor generates high-quality images. Similarly, VLMs (Grok 3 [16] and GPT4o [22]) maintain consistency, unable to blend style due to the complexity of graffiti…

**Figure 5.** It has been observed that with the FLUX.1 dev (12B) baseline, we neither achieve consistency …

**Figure 6.** Ablation of the self-attention: The face consistent self-attention primarily focuses on the human faces and their corresponding poses, whereas the traditional self-attention and subject-driven self-attention of Consistory [45] diverges towards the global scenario. Cultural and Societal Implications: Beyond technical performance, CraftGraffiti acts as an enabler in ongoing cultural debates around represen…

**Figure 7.** Human Evaluation: CraftGraffiti outperforms SOTA techniques [55, 27, 6, 16, 22] in style blending and aesthetics while preserving facial attributes (a decent performance in recognizability). Limitations: While our model successfully preserves facial attributes, real-time generation in public settings is computationally demanding, and user interactions are influenced by the physical constraints of the inst…

**Figure 8.** CraftGraffiti artistic installation: conceptual rendering and actual setup at Cruïlla Festival 2025. (1) The general impression of the users was that the demo was fun and engaging, often reacting with surprise and amusement at the results. The majority of participants understood that the demonstration was intended as a playful, exploratory experience rather than a precise or professional tool. (2) Some user…

**Figure 9.** Example outcomes from the demonstration at Cruilla Festival Barcelona 2025

**Figure 10.** Some more qualitative examples generated with …

Original abstract

Preserving facial identity under extreme stylistic transformation remains a major challenge in generative art. In graffiti, a high-contrast, abstract medium, subtle distortions to the eyes, nose, or mouth can erase the subject's recognizability, undermining both personal and cultural authenticity. We present CraftGraffiti, an end-to-end text-guided graffiti generation framework designed with facial feature preservation as a primary objective. Given an input image and a style and pose descriptive prompt, CraftGraffiti first applies graffiti style transfer via LoRA-fine-tuned pretrained diffusion transformer, then enforces identity fidelity through a face-consistent self-attention mechanism that augments attention layers with explicit identity embeddings. Pose customization is achieved without keypoints, using CLIP-guided prompt extension to enable dynamic re-posing while retaining facial coherence. We formally justify and empirically validate the "style-first, identity-after" paradigm, showing it reduces attribute drift compared to the reverse order. Quantitative results demonstrate competitive facial feature consistency and state-of-the-art aesthetic and human preference scores, while qualitative analyses and a live deployment at the Cruilla Festival highlight the system's real-world creative impact. CraftGraffiti advances the goal of identity-respectful AI-assisted artistry, offering a principled approach for blending stylistic freedom with recognizability in creative AI applications.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

CraftGraffiti describes a diffusion pipeline for graffiti art that puts style transfer before identity enforcement, but the evidence for better face preservation is still thin.

The main takeaway is a practical pipeline that turns a photo into graffiti-style output while trying to keep the person's face recognizable. It fine-tunes a diffusion transformer with LoRA for the style, then adds identity embeddings into the attention layers, and uses CLIP prompt extension to change pose without keypoints. The ordering they push (style first, identity after) is presented as the key choice that cuts attribute drift compared to the reverse sequence.
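
For readers unfamiliar with LoRA, the style fine-tuning step rests on a low-rank weight update of the form W' = W + (alpha / r) · B · A, as introduced in reference [19]. The sketch below shows that merge on a toy matrix. The function names and the numbers are hypothetical, and nothing here reflects the rank, scaling, or layers the authors actually used.

```python
def matmul(a, b):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))] for i in range(len(a))]

def lora_merge(weight, lora_b, lora_a, alpha, rank):
    """Return weight + (alpha / rank) * (lora_b @ lora_a)."""
    delta = matmul(lora_b, lora_a)
    scale = alpha / rank
    return [[w + scale * d for w, d in zip(w_row, d_row)]
            for w_row, d_row in zip(weight, delta)]

# Toy 2x2 frozen weight with a rank-1 update.
weight = [[1.0, 0.0], [0.0, 1.0]]
lora_b = [[1.0], [0.0]]   # shape (2, rank)
lora_a = [[0.0, 2.0]]     # shape (rank, 2)
merged = lora_merge(weight, lora_b, lora_a, alpha=1.0, rank=1)
```

Only the two small factors are trained while the original weight stays frozen, which is why a style adapter can be swapped in without retraining the base model.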

Referee Report

2 major / 2 minor

Summary. The manuscript presents CraftGraffiti, an end-to-end text-guided framework for generating custom graffiti art from an input face image and descriptive prompts. It adopts a 'style-first, identity-after' pipeline: a LoRA-fine-tuned pretrained diffusion transformer performs graffiti style transfer, and a face-consistent self-attention mechanism then augments attention layers with explicit identity embeddings to enforce facial fidelity. Pose customization is handled via CLIP-guided prompt extension without keypoints. The work formally justifies and empirically validates that this ordering reduces attribute drift relative to the reverse sequence, reporting competitive facial feature consistency together with state-of-the-art aesthetic and human-preference scores, supported by quantitative tables, qualitative examples, and a live deployment at the Cruilla Festival.

Significance. If the empirical claims are substantiated, the paper would make a meaningful contribution to identity-preserving generative models for highly abstract artistic domains. The explicit validation of the style-first ordering, the real-world festival deployment, and the focus on recognizability in graffiti together address a practical gap between stylistic freedom and cultural/personal authenticity in creative AI applications.

major comments (2)

[Quantitative Evaluation] Quantitative Evaluation section: the reported facial-feature consistency scores rely on standard identity embeddings (ArcFace or equivalent). These embeddings are trained on photorealistic data; the paper does not demonstrate that the same embeddings remain reliable under extreme high-contrast graffiti stylization where eyes, nose, and mouth can be heavily abstracted. Without a domain-specific validation (e.g., human study on identity recognition in the generated graffiti or an alternative metric), the central claim that the style-first paradigm reduces attribute drift cannot be considered fully supported.
[Framework and Ablation] Framework and Ablation sections: the face-consistent self-attention is described as the safeguard against subtle distortions, yet the manuscript provides no targeted ablation isolating its contribution specifically on identity-critical regions (eyes/nose/mouth) under the most extreme stylization prompts. This leaves the mechanistic justification for the paradigm partially unproven.
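
To make the first comment concrete: identity-consistency scores of this kind are commonly computed as a similarity between a face embedding of the source photo and one of the generated image. The sketch below assumes cosine similarity averaged over image pairs, which the source does not confirm. The toy vectors stand in for real face embeddings and the function names are hypothetical.

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity between two embedding vectors of equal length."""
    if len(a) != len(b):
        raise ValueError("embeddings must have the same length")
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("embeddings must be non-zero")
    return sum(x * y for x, y in zip(a, b)) / (norm_a * norm_b)

def mean_identity_similarity(pairs):
    """Average cosine similarity over (source_embedding, output_embedding) pairs."""
    return sum(cosine_similarity(s, o) for s, o in pairs) / len(pairs)

# Toy 3-dimensional vectors standing in for real face embeddings.
pairs = [([1.0, 0.0, 0.0], [1.0, 0.0, 0.0]),
         ([1.0, 0.0, 0.0], [0.0, 1.0, 0.0])]
score = mean_identity_similarity(pairs)
```

The referee's concern is that such a score depends entirely on the embedding model, so it is only as trustworthy as that model is on stylized faces.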

minor comments (2)

[Abstract] The abstract states that the paradigm is 'formally justified'; if this justification appears in §3 or §4, a one-sentence pointer in the abstract would improve readability.
[Tables] Tables reporting quantitative scores should include standard deviations or confidence intervals and the exact number of human raters for the preference study.
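
The reporting requested in the last comment could look like the sketch below, which returns a mean, a sample standard deviation, and the half-width of a normal-approximation 95% confidence interval. The rating values are made up for illustration and the function name is hypothetical. With few raters, a t-based interval would be the more careful choice.

```python
import math
import statistics

def mean_with_ci(scores, z=1.96):
    """Return (mean, sample std, half-width of a normal-approximation 95% CI)."""
    if len(scores) < 2:
        raise ValueError("need at least two scores")
    mean = statistics.mean(scores)
    std = statistics.stdev(scores)  # sample standard deviation (n - 1)
    half_width = z * std / math.sqrt(len(scores))
    return mean, std, half_width

# Hypothetical rater scores on a 1-5 scale, for illustration only.
ratings = [4, 5, 3, 4, 4]
mean, std, half_width = mean_with_ci(ratings)
```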

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which raises valid points about metric reliability and ablation depth in stylized domains. We address each major comment below and will revise the manuscript accordingly to provide stronger empirical support.

Referee: [Quantitative Evaluation] Quantitative Evaluation section: the reported facial-feature consistency scores rely on standard identity embeddings (ArcFace or equivalent). These embeddings are trained on photorealistic data; the paper does not demonstrate that the same embeddings remain reliable under extreme high-contrast graffiti stylization where eyes, nose, and mouth can be heavily abstracted. Without a domain-specific validation (e.g., human study on identity recognition in the generated graffiti or an alternative metric), the central claim that the style-first paradigm reduces attribute drift cannot be considered fully supported.

Authors: We acknowledge that ArcFace embeddings are trained on photorealistic data and may have reduced reliability for heavily abstracted graffiti styles. While our existing human preference scores provide indirect support for recognizability, we agree this does not fully substitute for domain-specific validation. In the revised manuscript, we will add a targeted human study in which participants match generated graffiti images to original face photos and rate identity similarity. Results will be reported alongside the existing metrics in the Quantitative Evaluation section to directly support the claim that the style-first ordering reduces attribute drift.

Revision: yes

Referee: [Framework and Ablation] Framework and Ablation sections: the face-consistent self-attention is described as the safeguard against subtle distortions, yet the manuscript provides no targeted ablation isolating its contribution specifically on identity-critical regions (eyes/nose/mouth) under the most extreme stylization prompts. This leaves the mechanistic justification for the paradigm partially unproven.

Authors: The face-consistent self-attention augments attention layers with identity embeddings precisely to protect critical facial regions. To strengthen the mechanistic evidence, we will add a focused ablation study in the revised manuscript. This will compare outputs with and without the self-attention module under extreme stylization prompts, with quantitative evaluation of distortions localized to the eyes, nose, and mouth regions (using region-specific feature similarity) as well as qualitative examples. These results will be included in the Ablation section.

Revision: yes

Circularity Check

0 steps flagged

No circularity: framework relies on external pretrained components and empirical validation

The paper presents CraftGraffiti as an end-to-end framework that applies LoRA-fine-tuned pretrained diffusion transformers for style transfer followed by a face-consistent self-attention mechanism using explicit identity embeddings. The 'style-first, identity-after' paradigm is described as formally justified and empirically validated against the reverse order, with quantitative results on facial consistency, aesthetics, and human preference. No equations, fitted parameters, or self-referential definitions are shown that reduce any claimed prediction or improvement to quantities defined inside the paper itself. All core modules reference external pretrained models (diffusion transformers, CLIP) rather than internal fits or self-citations that would create a closed loop. The derivation chain therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only view yields no explicit free parameters, axioms, or invented entities; the central claim rests on the unstated assumption that pretrained diffusion models plus added identity embeddings will behave as described without further derivation.

pith-pipeline@v0.9.0 · 5761 in / 1082 out tokens · 42621 ms · 2026-05-18T20:41:50.970887+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear
Paper passage: "We formally justify and empirically validate the 'style-first, identity-after' paradigm... face-consistent self-attention mechanism that augments attention layers with explicit identity embeddings."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "Quantitative results demonstrate competitive facial feature consistency and state-of-the-art aesthetic and human preference scores."

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

DocRevive: A Unified Pipeline for Document Text Restoration
cs.CV 2026-04 unverdicted novelty 5.0

DocRevive builds a unified pipeline using OCR, image analysis, language models, and diffusion to reconstruct degraded document text, backed by a 30k-image synthetic dataset and the UCSM metric.

DocRevive: A Unified Pipeline for Document Text Restoration
cs.CV 2026-04 unverdicted novelty 5.0

A unified pipeline using OCR, inpainting, and diffusion models restores text in degraded documents on a new synthetic benchmark dataset, evaluated with the proposed UCSM metric.

Reference graph

Works this paper leans on

59 extracted references · 59 canonical work pages · cited by 1 Pith paper · 4 internal anchors

[1] Living lab. https://en.wikipedia.org/wiki/Living_lab, 2025. Accessed: 2025-08-09.
[2] Mousa Al-kfairy, Dheya Mustafa, Nir Kshetri, Mazen Insiew, and Omar Alfandi. Ethical challenges and solutions of generative ai: An interdisciplinary perspective. Informatics, 11(3):58,
[3] Ayan Banerjee, Nityanand Mathur, Josep Lladós, Umapada Pal, and Anjan Dutta. Svgcraft: Beyond single object text-to-svg synthesis with comprehensive canvas layout, 2025.
[4] Lindsay Bates. Bombing, tagging, writing: An analysis of the significance of graffiti and street art. PhD thesis, University of Pennsylvania, 2014.
[5] Reuben Binns. Fairness in machine learning: Lessons from political philosophy. Proceedings of the 2017 FMML Workshop on Fair ML, 2017. arXiv preprint arXiv:1712.03586.
[6] Tim Brooks, Aleksander Holynski, and Alexei A Efros. Instructpix2pix: Learning to follow image editing instructions. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 18392–18402, 2023.
[7] Qiong Cao, Li Shen, Weidi Xie, Omkar M Parkhi, and Andrew Zisserman. Vggface2: A dataset for recognising faces across pose and age. In 2018 13th IEEE international conference on automatic face & gesture recognition (FG 2018), pages 67–74. IEEE, 2018.
[8] Soon Yau Cheong, Armin Mustafa, and Andrew Gilbert. Upgpt: Universal diffusion model for person image generation, editing and pose transfer. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4173–4182, 2023.
[9] Luca Compagnucci, Francesca Spigarelli, Jorge Coelho, and Carlos Duarte. Living labs and user engagement for innovation and sustainability. Journal of Cleaner Production, 317:128223, 2021.
[10] Saday Chandra Das. Power of graffiti: Exploring its cultural and social significance. Aayushi International Interdisciplinary Research Journal (AIIRJ), X (IX), pages 34–35, 2023.
[11] Wenkai Dong, Song Xue, Xiaoyue Duan, and Shumin Han. Prompt tuning inversion for text-driven image editing using diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7430–7440, 2023.
[12] Martin Nicolas Everaert, Marco Bocchio, Sami Arpa, Sabine Süsstrunk, and Radhakrishna Achanta. Diffusion in style. In Proceedings of the IEEE/CVF international conference on computer vision, pages 2251–2261, 2023.
[13] Emilio Ferrara. Fairness and bias in artificial intelligence: A survey. Digital, 6(1):1–41, 2023.
[14] Alan M Forster, Samantha Vettese-Forster, and John Borland. Evaluating the cultural significance of historic graffiti. Structural Survey, 30(1):43–64, 2012.
[15] Sourojit Ghosh, Nina Lutz, and Aylin Caliskan. “i don’t see myself represented here at all”: User experiences of stable diffusion outputs containing representational harms across gender identities and nationalities. In Proceedings of the AAAI/ACM conference on AI, ethics, and society, volume 7, pages 463–475, 2024.
[16] XAI Grok. beta—the age of reasoning agents, 3
[17] Qin Guo and Tianwei Lin. Focus on your instruction: Fine-grained and multi-instruction image editing by attention modulation. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 6986–6996, 2024.
[18] Mark Hamazaspyan and Shant Navasardyan. Diffusion-enhanced patchmatch: A framework for arbitrary style transfer with diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 797–805, 2023.
[19] Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1(2):3, 2022.
[20] Nisha Huang, Yuxin Zhang, Fan Tang, Chongyang Ma, Haibin Huang, Weiming Dong, and Changsheng Xu. Diffstyler: Controllable dual diffusion for text-driven image stylization. IEEE Transactions on Neural Networks and Learning Systems, 2024.
[21] Yi Huang, Jiancheng Huang, Yifan Liu, Mingfu Yan, Jiaxi Lv, Jianzhuang Liu, Wei Xiong, He Zhang, Liangliang Cao, and Shifeng Chen. Diffusion model-based image editing: A survey. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2025.
[22] Aaron Hurst, Adam Lerer, Adam P Goucher, Adam Perelman, Aditya Ramesh, Aidan Clark, AJ Ostrow, Akila Welihinda, Alan Hayes, Alec Radford, et al. Gpt-4o system card. arXiv preprint arXiv:2410.21276, 2024.
[23] Xuan Ju, Ailing Zeng, Chenchen Zhao, Jianan Wang, Lei Zhang, and Qiang Xu. Humansd: A native skeleton-guided diffusion model for human image generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 15988–15998, 2023.
[24] Bahjat Kawar, Shiran Zada, Oran Lang, Omer Tov, Huiwen Chang, Tali Dekel, Inbar Mosseri, and Michal Irani. Imagic: Text-based real image editing with diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6007–6017, 2023.
[25] Anant Khandelwal. Reposedm: Recurrent pose alignment and gradient guidance for pose guided image synthesis. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 2495–2504, 2024.
[26] Suzanne Kieffer. Ecoval: Ecological validity of cues and representative design in user experience evaluations. AIS Transactions on Human-Computer Interaction, 9(2):149–172, 2017.
[27] Black Forest Labs, Stephen Batifol, Andreas Blattmann, Frederic Boesel, Saksham Consul, Cyril Diagne, Tim Dockhorn, Jack English, Zion English, Patrick Esser, et al. Flux.1 kontext: Flow matching for in-context image generation and editing in latent space. arXiv preprint arXiv:2506.15742, 2025.
[28] Yijun Li, Chen Fang, Jimei Yang, Zhaowen Wang, Xin Lu, and Ming-Hsuan Yang. Universal style transfer via feature transforms. Advances in neural information processing systems, 30, 2017.
[29] Zhen Li, Ping Wang, Qiong Hu, and Ran He. Global and local consistent age generative adversarial network (glca-gan). In Proceedings of the 26th ACM International Conference on Multimedia, pages 305–313, 2018.
[30] Ron Mokady, Amir Hertz, Kfir Aberman, Yael Pritch, and Daniel Cohen-Or. Null-text inversion for editing real images using guided diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6038–6047, 2023.
[31] Naila Murray, Luca Marchesotti, and Florent Perronnin. Ava: A large-scale database for aesthetic visual analysis. In 2012 IEEE conference on computer vision and pattern recognition, pages 2408–2415. IEEE, 2012.
[32] Cristian Muñoz, Nicola Zannone, Mohamed Mohammed, and Adriano Koshiyama. Uncovering bias in face generation models. arXiv preprint arXiv:2302.11562, 2023.
[33] Hyelin Nam, Gihyun Kwon, Geon Yeong Park, and Jong Chul Ye. Contrastive denoising score for text-guided latent diffusion image editing. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 9192–9201, 2024.
[34] Sharan Narang, Hyung Won Chung, Yi Tay, William Fedus, Thibault Fevry, Michael Matena, Karishma Malkan, Noah Fiedel, Noam Shazeer, Zhenzhong Lan, et al. Do transformer modifications transfer across implementations and applications? arXiv preprint arXiv:2102.11972, 2021.
[35] Yuta Okuyama, Yuki Endo, and Yoshihiro Kanamori. Diffbody: Diffusion-based pose and shape editing of human images. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision, pages 6333–6342, 2024.
[36] Ziheng Ouyang, Zhen Li, and Qibin Hou. K-lora: Unlocking training-free fusion of any subject and style loras. arXiv preprint arXiv:2502.18461, 2025.
[37] Rubén Pascual, Adrián Maiza, Mikel Sesma-Sara, Daniel Paternain, and Mikel Galar. Enhancing dreambooth with lora for generating unlimited characters with stable diffusion. In 2024 International Joint Conference on Neural Networks (IJCNN), pages 1–8. IEEE, 2024.
[38] Adam Paszke, Sam Gross, Francisco Massa, Adam Lerer, James Bradbury, Gregory Chanan, Trevor Killeen, Zeming Lin, Natalia Gimelshein, Luca Antiga, et al. Pytorch: An imperative style, high-performance deep learning library. Advances in neural information processing systems, 32, 2019.
[39] Dustin Podell, Zion English, Kyle Lacey, Andreas Blattmann, Tim Dockhorn, Jonas Müller, Joe Penna, and Robin Rombach. Sdxl: Improving latent diffusion models for high-resolution image synthesis. arXiv preprint arXiv:2307.01952, 2023.
[40] Evangelos Pournaras, Atif Nabi Ghulam, Renato Kunz, and Regula Hänggli. Crowd sensing and living lab outdoor experimentation made easy. arXiv preprint arXiv:2107.04117, 2021.
[41]

Training- free identity preservation in stylized image generation using diffusion models

Mohammad Ali Rezaei, Helia Hajikazem, Saeed Khanehgir, and Mahdi Javanmardi. Training- free identity preservation in stylized image generation using diffusion models. arXiv preprint arXiv:2506.06802, 2025

work page arXiv 2025
[42]

High- resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High- resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10684–10695, 2022

work page 2022
[43]

Facenet: A unified embedding for face recognition and clustering

Florian Schroff, Dmitry Kalenichenko, and James Philbin. Facenet: A unified embedding for face recognition and clustering. In Proceedings of the IEEE conference on computer vision and pattern recognition, pages 815–823, 2015. 11

work page 2015
[44] Chien Sun, William Tzeng, et al. Smiling women pitching down: Auditing gender bias in image generative ai. arXiv preprint arXiv:2305.10566, 2023
[45] Yoad Tewel, Omri Kaduri, Rinon Gal, Yoni Kasten, Lior Wolf, Gal Chechik, and Yuval Atzmon. Training-free consistent text-to-image generation. ACM Transactions on Graphics (TOG), 43(4):1–18, 2024
[46] Haofan Wang, Peng Xing, Renyuan Huang, Hao Ai, Qixun Wang, and Xu Bai. Instantstyle-plus: Style transfer with content-preserving in text-to-image generation. arXiv preprint arXiv:2407.00788, 2024
[47] Jiajun Wang, Morteza Ghahremani Boozandani, Yitong Li, Björn Ommer, and Christian Wachinger. Stable-pose: Leveraging transformers for pose-guided text-to-image generation. Advances in Neural Information Processing Systems, 37:65670–65698, 2024
[48] Quan Wang, Yanli Ren, Xinpeng Zhang, and Guorui Feng. Interactive image style transfer guided by graffiti. In Proceedings of the 31st ACM International Conference on Multimedia, pages 6685–6694, 2023
[49] Wen Wang, Canyu Zhao, Hao Chen, Zhekai Chen, Kecheng Zheng, and Chunhua Shen. Autostory: Generating diverse storytelling images with minimal human efforts. International Journal of Computer Vision, pages 1–22, 2024
[50] Zhizhong Wang, Lei Zhao, and Wei Xing. Stylediffusion: Controllable disentangled style transfer via diffusion models. In Proceedings of the IEEE/CVF international conference on computer vision, pages 7677–7689, 2023
[51] Chen Henry Wu and Fernando De la Torre. A latent space of stochastic diffusion models for zero-shot image editing and guidance. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 7378–7387, 2023
[52] Xiaoshi Wu, Keqiang Sun, Feng Zhu, Rui Zhao, and Hongsheng Li. Human preference score: Better aligning text-to-image models with human preference. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 2096–2105, 2023
[53] Wenju Xu, Chengjiang Long, Ruisheng Wang, and Guanghui Wang. Drb-gan: A dynamic res-block generative adversarial network for artistic style transfer. In Proceedings of the IEEE/CVF international conference on computer vision, pages 6383–6392, 2021
[54] Shuai Yang, Zhangyang Wang, Zhaowen Wang, Ning Xu, Jiaying Liu, and Zongming Guo. Controllable artistic text style transfer via shape-matching gan. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4442–4451, 2019
[55] Hu Ye, Jun Zhang, Sibo Liu, Xiao Han, and Wei Yang. IP-Adapter: Text compatible image prompt adapter for text-to-image diffusion models. arXiv preprint arXiv:2308.06721, 2023
[56] Yuxin Zhang, Nisha Huang, Fan Tang, Haibin Huang, Chongyang Ma, Weiming Dong, and Changsheng Xu. Inversion-based style transfer with diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 10146–10156, 2023
[57] Zhixing Zhang, Ligong Han, Arnab Ghosh, Dimitris N Metaxas, and Jian Ren. Sine: Single image editing with text-to-image diffusion models. In Proceedings of the IEEE/CVF conference on computer vision and pattern recognition, pages 6027–6037, 2023
[58] Xiang Zhou. Bias in generative ai. arXiv preprint arXiv:2403.02726, 2024
[59] Yupeng Zhou, Daquan Zhou, Ming-Ming Cheng, Jiashi Feng, and Qibin Hou. Storydiffusion: Consistent self-attention for long-range image and video generation. Advances in Neural Information Processing Systems, 37:110315–110340, 2024

A Demonstration at Cruïlla Festival

An installation (see Fig. 8) implementing the proposed system was deployed during the F...
