Training-Free Debiasing of Diffusion Models via CLIP-Guided Denoising Optimization

Dain Kim; Jinseo Kim; Sungyong Baik

arxiv: 2607.00817 · v1 · pith:BJXBGJKXnew · submitted 2026-07-01 · 💻 cs.CV


Pith reviewed 2026-07-02 13:55 UTC · model grok-4.3

classification 💻 cs.CV

keywords diffusion models · debiasing · text embedding optimization · CLIP guidance · training-free methods · demographic bias · fairness in generation · inference-time steering


The pith

Text Embedding Steering optimizes conditional embeddings in two stages to reduce demographic bias in diffusion models without any parameter changes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that demographic bias in text-to-image diffusion models, where neutral prompts yield stereotypical gender and race outputs, can be addressed by directly optimizing the conditional text embeddings during the diffusion process. It introduces a two-stage approach consisting of early global alignment followed by iterative refinement guided by CLIP feedback at denoising steps. This method avoids both model retraining and the quality degradation common in prior inference-time fixes. A sympathetic reader would care because it offers a practical way to steer attributes controllably while keeping the original model intact and image quality competitive. Experiments on Stable Diffusion show it surpasses existing training-free baselines on fairness metrics.

Core claim

Text Embedding Steering (TES) is a training-free framework that mitigates demographic bias by optimizing conditional text embeddings during the diffusion process. A two-stage strategy—early-stage global alignment followed by iterative denoising-time refinement with CLIP-based feedback—enables stable and controllable attribute steering without modifying model parameters. On Stable Diffusion, TES outperforms training-free baselines in fairness while maintaining competitive image quality.

What carries the argument

Text Embedding Steering (TES), which directly optimizes conditional text embeddings using a two-stage process of global alignment and CLIP-guided iterative refinement during denoising.
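The two-stage idea can be illustrated with a deliberately small toy. The sketch below is not the authors' implementation: cosine similarity to a fixed `target` vector stands in for the CLIP score, there is no denoiser, and `early_frac`, the two learning rates, and the `fidelity` weight are hypothetical knobs chosen only for illustration. It shows larger embedding updates in an early phase, then smaller refinement updates with a pull back toward the original embedding.

```python
import numpy as np

def cosine(a, b):
    """Cosine similarity between two 1-D vectors."""
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def cosine_grad(e, target):
    """Gradient of cosine(e, target) with respect to e."""
    ne, nt = np.linalg.norm(e), np.linalg.norm(target)
    return target / (ne * nt) - (e @ target) * e / (ne**3 * nt)

def steer_embedding(e0, target, num_steps=50, early_frac=0.2,
                    lr_early=0.5, lr_refine=0.1, fidelity=0.05):
    """Toy two-stage steering of a single embedding vector.

    Stage 1 (first `early_frac` of the steps) takes larger steps.
    Stage 2 (remaining steps) takes smaller refinement steps.
    Both stages ascend the stand-in attribute score and penalize
    squared distance from the original embedding e0.
    """
    e = e0.copy()
    n_early = int(round(early_frac * num_steps))
    for step in range(num_steps):
        lr = lr_early if step < n_early else lr_refine
        grad = cosine_grad(e, target) - fidelity * (e - e0)
        e = e + lr * grad
        # A real sampler would run one denoising step conditioned on `e` here.
    return e

rng = np.random.default_rng(0)
e0 = rng.normal(size=16)      # hypothetical prompt embedding
target = rng.normal(size=16)  # hypothetical attribute direction
e = steer_embedding(e0, target)
print("score before:", cosine(e0, target), "after:", cosine(e, target))
```

In the actual method the score would come from CLIP applied to a decoded image prediction, so each update would require backpropagating through the decoder and denoiser; the toy only conveys the schedule and the fidelity trade-off.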

If this is right

Demographic bias can be steered at inference time without retraining the diffusion model.
The two-stage process supports controllable attribute adjustment while preserving semantic alignment.
Training-free embedding optimization can outperform other training-free baselines on fairness while keeping image quality competitive.
CLIP feedback loops enable refinement specifically during the denoising trajectory rather than only at the start.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same embedding optimization pattern might extend to other controllable attributes beyond demographics, such as style or object composition.
If CLIP signals prove noisy for certain attributes, the method could be paired with alternative feedback models without changing the core two-stage structure.
The approach suggests that many bias issues in generative models stem from conditioning embeddings rather than the underlying denoising network itself.

Load-bearing premise

CLIP-based feedback during denoising supplies reliable signals for demographic attributes that can be optimized without creating new semantic misalignment or degrading image quality.

What would settle it

Apply TES to neutral prompts on Stable Diffusion, then check whether stereotypical gender or race representations remain in the outputs, and whether image quality (FID, lower is better) or prompt alignment (CLIP score, higher is better) becomes worse than the unmodified model baseline.
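One way to make the first half of that check concrete is to tally predicted attribute labels per prompt and measure how far the shares sit from an even split. The sketch below is an illustration only: it is not the paper's fairness metric, the labels are hypothetical outputs of some external attribute classifier, and treating equal representation as the target is itself an assumption.

```python
from collections import Counter

def group_shares(labels, groups):
    """Fraction of samples assigned to each group."""
    if not labels:
        raise ValueError("labels must be non-empty")
    counts = Counter(labels)
    return {g: counts.get(g, 0) / len(labels) for g in groups}

def max_deviation_from_uniform(labels, groups):
    """Largest absolute gap between a group's share and an even split."""
    shares = group_shares(labels, groups)
    uniform = 1.0 / len(groups)
    return max(abs(share - uniform) for share in shares.values())

groups = ["male", "female"]
# Hypothetical classifier outputs for 100 images of one profession prompt.
baseline_labels = ["male"] * 85 + ["female"] * 15
steered_labels = ["male"] * 52 + ["female"] * 48

print("baseline gap:", max_deviation_from_uniform(baseline_labels, groups))
print("steered gap:", max_deviation_from_uniform(steered_labels, groups))
```

A gap near zero indicates a balanced sample; the same tally would be repeated per profession and per attribute, alongside the image-quality comparison.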

Figures

Figures reproduced from arXiv: 2607.00817 by Dain Kim, Jinseo Kim, Sungyong Baik.

**Figure 1.** Overview of qualitative results on Stable Diffusion v2.1. Our method reduces demographic bias across multiple profession prompts while preserving profession semantics and image quality.

**Figure 2.** Overview of the proposed Text Embedding Steering (TES). Instead of keeping the text embedding fixed throughout sampling, TES continuously updates the embedding during diffusion, enabling controllable attribute steering while preserving the original prompt semantics.

**Figure 3.** Overall framework of Text Embedding Steering (TES). TES first performs an early-stage embedding alignment to steer the global generation trajectory, followed by denoising-time refinement that iteratively updates the text embedding using CLIP feedback from predicted clean images. At each active timestep, the reconstructed clean image $\hat{x}_0$ is compared with the target attribute prompt, and the resulting object…
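The predicted clean image mentioned in the Figure 3 caption is conventionally obtained in DDPM/DDIM-style samplers [15, 39] by inverting the forward noising equation with the model's noise estimate. The sketch below assumes the ε-prediction parameterization; checkpoints trained with a different parameterization (for example v-prediction) need a different expression, and the page does not say which one TES uses. The function name and the toy shapes are hypothetical.

```python
import numpy as np

def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean sample from a noisy one.

    Inverts x_t = sqrt(alpha_bar_t) * x0 + sqrt(1 - alpha_bar_t) * eps
    using the predicted noise eps_pred in place of the true eps.
    """
    return (x_t - np.sqrt(1.0 - alpha_bar_t) * eps_pred) / np.sqrt(alpha_bar_t)

rng = np.random.default_rng(0)
x0 = rng.normal(size=(4, 8, 8))   # toy "latent" with an arbitrary shape
eps = rng.normal(size=x0.shape)   # true noise
alpha_bar_t = 0.3                 # cumulative noise-schedule product at step t

# Forward-noise x0, then recover it with a perfect noise prediction.
x_t = np.sqrt(alpha_bar_t) * x0 + np.sqrt(1.0 - alpha_bar_t) * eps
x0_hat = predict_x0(x_t, eps, alpha_bar_t)
print("max reconstruction error:", np.abs(x0_hat - x0).max())
```

With a real denoiser the noise estimate is imperfect, so the reconstruction is only approximate, and in a latent diffusion model it is a latent that must be decoded before an image encoder such as CLIP can score it.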

**Figure 4.** Effect of embedding update timing. Apply Ratio denotes the fraction of total denoising steps during which embedding updates are active, where 0.1–0.5 corresponds to early-stage, 0.3–0.7 to mid-stage, and 0.5–0.9 to late-stage intervention.
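To make the timing windows in the Figure 4 caption concrete, the sketch below maps a window over normalized sampling progress to step indices. Reading each range (for example 0.1–0.5) as a start/end pair over the trajectory is an interpretation of the caption, and the function name, the 50-step schedule, and the half-open rounding rule are assumptions made for illustration.

```python
def active_steps(num_steps, start, end):
    """Step indices i (0 = first denoising step) with start <= i/num_steps < end."""
    if not 0.0 <= start < end <= 1.0:
        raise ValueError("expected 0 <= start < end <= 1")
    lo = round(start * num_steps)
    hi = round(end * num_steps)
    return list(range(lo, hi))

# Windows named as in the caption; the 50-step schedule is hypothetical.
WINDOWS = {"early": (0.1, 0.5), "mid": (0.3, 0.7), "late": (0.5, 0.9)}

for name, (start, end) in WINDOWS.items():
    steps = active_steps(50, start, end)
    print(f"{name}: steps {steps[0]}..{steps[-1]} ({len(steps)} active)")
```

All three windows cover the same number of steps, so a comparison across them isolates when the updates happen rather than how many are applied.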

**Figure 5.** Intersectional attribute control. Our method enables simultaneous control of gender and race, generating diverse attribute combinations while preserving realism.

**Figure 6.** Generalization to Stable Diffusion v1.5. Our method consistently reduces bias while preserving semantic fidelity across different diffusion backbones.

Original abstract

Text-to-image diffusion models achieve impressive visual quality, yet demographic bias remains a challenge, as neutral prompts consistently produce stereotypical representations across gender and race. Existing approaches remain limited by costly retraining or by inference-time interventions that often degrade image quality and semantic alignment. We propose Text Embedding Steering (TES), a training-free framework that mitigates demographic bias by directly optimizing conditional text embeddings during the diffusion process. We show that a two-stage strategy - early-stage global alignment followed by iterative denoising-time refinement with CLIP-based feedback - enables stable and controllable attribute steering without modifying model parameters. Extensive experiments on Stable Diffusion demonstrate that TES outperforms existing training-free baselines in fairness while maintaining competitive image quality. These results highlight that inference-time text embedding optimization is a practical and scalable solution for fairness-aware generation in diffusion models.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The paper's core idea is a two-stage CLIP-guided optimization of text embeddings to cut demographic bias in diffusion models without training, but the method's viability turns on whether CLIP signals stay reliable during denoising.


The one thing to flag is that TES tries to fix bias in models like Stable Diffusion by directly tweaking the conditional text embeddings in two phases: an early global alignment step followed by iterative refinement using CLIP feedback at denoising time. This keeps the model weights untouched and claims better fairness than other training-free baselines while holding image quality steady.

The two-stage structure is the clearest addition here. Most prior inference-time fixes either apply post-hoc corrections or alter the sampling schedule in ways that hurt alignment. Steering the embedding itself during the process is a straightforward alternative that could scale easily if it works.

The weakest part is the assumption that CLIP can supply trustworthy gradients for attributes like gender and race when it scores clean-image predictions reconstructed from partially denoised latents. CLIP itself is trained on web-scraped image-text data of the kind that encodes the stereotypes the method wants to remove, so its scores could lock in spurious correlations or create new semantic drift. The abstract states that experiments show outperformance, yet it gives no concrete metrics, no ablation of CLIP signal accuracy against human judgments, and no check for embedding drift from the original prompt. That gap makes the central claim hard to evaluate from the abstract alone.
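A drift check of the kind asked for here could be as simple as comparing the steered embedding with the original one, token by token. The sketch below is a hypothetical diagnostic, not something the paper is known to report: the array shapes, the noise scale used to fake a steered embedding, and the choice of summary statistics are all assumptions.

```python
import numpy as np

def per_token_cosine(original, steered):
    """Cosine similarity per token for arrays of shape (num_tokens, dim)."""
    num = np.sum(original * steered, axis=-1)
    den = np.linalg.norm(original, axis=-1) * np.linalg.norm(steered, axis=-1)
    return num / den

def drift_report(original, steered):
    """Summarize how far a steered embedding moved from the original."""
    cos = per_token_cosine(original, steered)
    rel_l2 = np.linalg.norm(steered - original) / np.linalg.norm(original)
    return {
        "min_cosine": float(cos.min()),
        "mean_cosine": float(cos.mean()),
        "rel_l2": float(rel_l2),
    }

rng = np.random.default_rng(0)
original = rng.normal(size=(77, 32))                      # illustrative shape only
steered = original + 0.1 * rng.normal(size=original.shape)  # fake steering result
print(drift_report(original, steered))
```

Tracking these numbers across denoising steps would show whether refinement keeps the embedding near the prompt or lets it wander, which is the question the abstract leaves open.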

The work is aimed at people building or deploying text-to-image systems who need practical, no-retrain options for fairness. A reader already working on inference-time control or CLIP-based guidance would get the most out of the two-stage recipe.

It is worth sending to peer review. The problem matters and the framing is distinct enough that referees can pressure-test the CLIP reliability question with the full experiments and ablations.

Referee Report

2 major / 0 minor

Summary. The manuscript proposes Text Embedding Steering (TES), a training-free debiasing method for text-to-image diffusion models. It optimizes conditional text embeddings via a two-stage process—early global alignment followed by iterative CLIP-guided refinement during denoising—claiming this achieves superior fairness on gender/race attributes compared to training-free baselines while preserving image quality and semantic fidelity on Stable Diffusion.

Significance. If the empirical claims hold under rigorous controls, the work would demonstrate a practical inference-time intervention that avoids retraining costs, offering a scalable route to controllable attribute steering in diffusion models. The two-stage design and emphasis on parameter-free operation are potentially useful contributions to fairness-aware generation.

major comments (2)

[Abstract] Abstract: the central claim of outperformance on fairness metrics while maintaining quality is asserted without any reported quantitative results, baselines, datasets, or controls; this absence prevents evaluation of whether the two-stage strategy actually supports the fairness claim.
[Abstract (method description)] The method's reliance on CLIP similarity signals extracted from partially denoised latents for demographic attribute optimization lacks any reported validation (e.g., correlation with human labels or ablation on embedding drift); because CLIP was trained on the same web data containing the target stereotypes, this untested assumption is load-bearing for the refinement stage.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. The abstract summarizes results detailed in the full manuscript (Sections 4-5), but we agree it can be strengthened with explicit metrics. For the CLIP component, our experiments include supporting ablations, though we acknowledge the need for clearer validation reporting.


Referee: [Abstract] Abstract: the central claim of outperformance on fairness metrics while maintaining quality is asserted without any reported quantitative results, baselines, datasets, or controls; this absence prevents evaluation of whether the two-stage strategy actually supports the fairness claim.

Authors: The abstract provides a high-level overview; quantitative results appear in the main text, including fairness metric improvements (e.g., reduced gender/race bias scores vs. baselines like prompt engineering and embedding editing), datasets (e.g., neutral prompts for gender/race), controls, and quality preservation (FID, CLIP similarity). We will revise the abstract to include key numerical results and baseline names for immediate evaluability. revision: yes
Referee: [Abstract (method description)] The method's reliance on CLIP similarity signals extracted from partially denoised latents for demographic attribute optimization lacks any reported validation (e.g., correlation with human labels or ablation on embedding drift); because CLIP was trained on the same web data containing the target stereotypes, this untested assumption is load-bearing for the refinement stage.

Authors: The manuscript reports ablations on the refinement stage (Section 4.3 and supp. mat.) showing stable attribute steering and limited embedding drift via the two-stage design. CLIP guidance on partial latents is validated indirectly through final image metrics and semantic fidelity. Direct human correlation studies on intermediate latents are absent; we note this as a common limitation in CLIP-based methods and can add targeted validation or discussion of the bias concern in revision. revision: partial

Circularity Check

0 steps flagged

No circularity detected in derivation chain


The paper proposes the TES framework as a training-free two-stage optimization of text embeddings using early global alignment and later CLIP-guided refinement during denoising. No equations, self-definitional reductions, fitted inputs renamed as predictions, or load-bearing self-citations appear in the provided text. The central claim rests on the described procedure and experimental results on Stable Diffusion rather than any step that reduces by construction to its own inputs. This is a standard methodological contribution with independent content.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract provides no equations, parameters, or explicit assumptions; cannot identify free parameters, axioms, or invented entities.



Reference graph

Works this paper leans on

52 extracted references · 12 canonical work pages

[1] Hritik Bansal, Da Yin, Masoud Monajatipoor, and Kai-Wei Chang. How well can text-to-image generative models understand ethical natural language interventions? In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1358–1370, 2022.
[2] Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. Easily accessible text-to-image generation amplifies demographic stereotypes at large scale. In 2023 ACM Conference on Fairness, Accountability, and Transparency, pages 1493–1504, 2023.
[3] Sarah Bonna, Yu-Cheng Huang, Ekaterina Novozhilova, Sejin Paik, Zhengyang Shan, Michelle Yilin Feng, Ge Gao, Yonish Tayal, Rushil Kulkarni, Jialin Yu, Nupur Divekar, Deepti Ghadiyaram, Derry Wijaya, and Margrit Betke. Debiaspi: Inference-time debiasing by prompt iteration of a text-to-image generative model. arXiv preprint arXiv:2501.18642, 2025.
[4] Jaemin Cho, Abhay Zala, and Mohit Bansal. Dall-eval: Probing the reasoning skills and social biases of text-to-image generation models. In ICCV, 2023.
[5] Kristy Choi, Aditya Grover, Trisha Singh, Rui Shu, and Stefano Ermon. Fair generative modeling via weak supervision. In ICML, 2020.
[6] Yujin Choi, Jinseong Park, Hoki Kim, Jaewook Lee, and Saerom Park. Fair sampling in diffusion models through switching mechanism. In AAAI, 2024.
[7] Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, and Stefanie Jegelka. Debiasing vision-language models via biased prompts. arXiv preprint arXiv:2302.00070, 2023.
[8] Felix Friedrich, Manuel Brack, Lukas Struppek, Dominik Hintersdorf, Patrick Schramowski, Sasha Luccioni, and Kristian Kersting. Fair diffusion: Instructing text-to-image generation models on fairness. arXiv preprint arXiv:2302.10893, 2023.
[9] Zihao Fu, Ryan Brown, Shun Shao, Kai Rawal, Eoin D. Delaney, and Chris Russell. Fairimagen: Post-processing for bias mitigation in text-to-image models. In NeurIPS, 2025.
[10] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit Haim Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. In ICLR, 2023.
[11] Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, and David Bau. Unified concept editing in diffusion models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5111–5120, 2024.
[12] Boyu Han, Qianqian Xu, Shilong Bao, Zhiyong Yang, Kangli Zi, and Qingming Huang. Lightfair: Towards an efficient alternative for fair t2i diffusion via debiasing pre-trained text encoders. In Advances in Neural Information Processing Systems (NeurIPS), 2025.
[13] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In NeurIPS, 2017.
[14] Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, and Ryo Hachiuma. Saner: Annotation-free societal attribute neutralizer for debiasing clip. In ICLR, 2025.
[15] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.
[16] Xinyu Hou, Xiaoming Li, and Chen Change Loy. Aitti: Learning adaptive inclusive token for text-to-image generation. arXiv preprint arXiv:2406.12805, 2026.
[17] Niharika Jain, Alberto Olmo, Sailik Sengupta, Lydia Manikonda, and Subbarao Kambhampati. Imperfect ImaGANation: Implications of GANs exacerbating biases on facial data augmentation and Snapchat selfie lenses. arXiv preprint arXiv:2001.09528, 2020.
[18] Yue Jiang, Yueming Lyu, Ziwen He, Bo Peng, and Jing Dong. Mitigating social biases in text-to-image diffusion models via linguistic-aligned attention guidance. In ACM MM, 2024.
[19] Mintong Kang, Vinayshekhar Bannihatti Kumar, Shamik Roy, Abhishek Kumar, Sopan Khosla, Balakrishnan Murali Narayanaswamy, and Rashmi Gangadharaiah. Fairgen: Controlling sensitive attributes for fair generations in diffusion models via adaptive latent guidance. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), ..., 2025.
[20] Jia Li, Lijie Hu, Jingfeng Zhang, Tianhang Zheng, Hua Zhang, and Di Wang. Fair text-to-image diffusion via fair mapping. arXiv preprint arXiv:2311.17695, 2024.
[21] Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, and Jean Oh. Scoft: Self-contrastive fine-tuning for equitable image generation. In CVPR, 2024.
[22] Sasha Luccioni, Christopher Akiki, Margaret Mitchell, and Yacine Jernite. Stable bias: Evaluating societal representations in diffusion models. In NeurIPS, 2023.
[23] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In ICML.
[24] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. Transactions on Machine Learning Research (TMLR), pages 1–31, 2024.
[25] Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. Editing implicit assumptions in text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7053–7061, 2023.
[26] Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu, Saswat Mallick, Jogendra Nath Kundu, and R. Venkatesh Babu. Balancing act: Distribution-guided debiasing in diffusion models. In CVPR, 2024.
[27] Jeonghoon Park, Juyoung Lee, Chaeyeon Chung, Jaeseong Lee, Jaegul Choo, and Jindong Gu. Fair generation without unfair distortions: Debiasing text-to-image generation with entanglement-free attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.
[28] Malsha V. Perera and Vishal M. Patel. Analyzing bias in diffusion-based face generation models. arXiv preprint arXiv:2305.06402, 2023.
[29] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021.
[30] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In ICML, 2021.
[31] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.
[32] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, 2022.
[33] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In NeurIPS, 2016.
[34] Preethi Seshadri, Sameer Singh, and Yanai Elazar. The bias amplification paradox in text-to-image generation. arXiv preprint arXiv:2308.00755, 2023.
[35] Ashish Seth, Mayur Hemani, and Chirag Agarwal. Dear: Debiasing vision-language models with additive residuals. In CVPR, 2023.
[36] Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, and Mohan Kankanhalli. Finetuning text-to-image diffusion models for fairness. In ICLR, 2024.
[37] Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, and Mohan Kankanhalli. Finetuning text-to-image diffusion models for fairness. In ICLR, 2024.
[38] Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, and Siqi Deng. Fairrag: Fair human generation via fair retrieval augmentation. In CVPR, 2024.
[39] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.
[40] Lukas Struppek, Dom Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, and Kristian Kersting. Exploiting cultural biases via homoglyphs in text-to-image synthesis. Journal of Artificial Intelligence Research, pages 1017–1068, 2023.
[41] Christopher T. H. Teo, Milad Abdollahzadeh, and Ngai-Man Cheung. Fair generative models via transfer learning. In AAAI, 2023.
[42] Boris van Breugel, Trent Kyono, Jeroen Berrevoets, and Mihaela van der Schaar. DECAF: Generating fair synthetic data using causally-aware generative networks. In NeurIPS, 2021.
[43] Korada Sri Vardhana, Shrikrishna Lolla, and Soma Biswas. Fully unsupervised self-debiasing of text-to-image diffusion models. In WACV, 2026.
[44] Guorun Wang and Lucia Specia. Moesd: Mixture of experts stable diffusion to mitigate gender bias. arXiv preprint arXiv:2407.11002, 2024.
[45] Jialu Wang, Xinyue Liu, Zonglin Di, Yang Liu, and Xin Wang. T2IAT: Measuring valence and stereotypical biases in text-to-image generation. In Findings of the Association for Computational Linguistics (ACL), 2023.
[46] Chao Wu, Zhenyi Wang, Kangxian Xie, Naresh Kumar Devulapally, Vishnu Suresh Lokhande, and Mingchen Gao. Model-agnostic gender bias control for text-to-image generation via sparse autoencoder. arXiv preprint arXiv:2507.20973, 2025.
[47] Yankun Wu, Yuta Nakashima, and Noa Garcia. Stable diffusion exposed: Gender bias from prompt to image. arXiv preprint arXiv:2312.03027, 2024.
[48] Hidir Yesiltepe, Kiymet Akdemir, and Pinar Yanardag. Mist: Mitigating intersectional bias with disentangled cross-attention editing in text-to-image diffusion models. arXiv preprint arXiv:2403.19738, 2024.
[49] Cheng Zhang, Xuanbai Chen, Siqi Chai, Henry Chen Wu, Dmitry Lagun, Thabo Beeler, and Fernando De la Torre. ITI-GEN: Inclusive text-to-image generation. In ICCV, 2023.

Training-Free Debiasing of Diffusion Models via CLIP-Guided Denoising Optimization: Supplementary Material

[50] Implementation Details. This section provides additional implementation details that are omitted from the main paper for brevity. Unless otherwise specified, all experiments follow the settings described in Sec. 5.1 of the main paper. 1.1. Attribute and Semantic Alignment. At every active optimization timestep, the reconstructed clean image $\hat{x}_0$ is decoded ... (extracted prompt fragment: “male” and “female”)
[51] Additional Results. 2.1. Quantitative Results. Generalization to Other Backbones. To assess whether our method is specific to a particular diffusion model, we additionally evaluate it on Stable Diffusion v1.5 under the same experimental protocol. As shown in Table S4 and Table S5, our approach consistently improves fairness metrics while maintaining compet... (extracted prompt fragment: “a photo of a {profession}”)
[52] Fairness Metrics Details. Following LightFair [12], we adopt the same evaluation protocol and report both fairness and generation quality metrics. In our experiments, we consider six professions: doctor, CEO, taxi driver, nurse, artist, and teacher. For each profession, we generate 100 images, resulting in a total of 600 generated images for evaluation. All metri... (extracted prompt fragment: “Photo portrait of a {profession}, a person”)

[1] [1]

Hritik Bansal, Da Yin, Masoud Monajatipoor, and Kai-Wei Chang. How well can text-to-image generative models un- derstand ethical natural language interventions? InPro- ceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP), pages 1358–1370, 2022. 2, 3, 7

2022

[2] [2]

Easily acces- sible text-to-image generation amplifies demographic stereo- types at large scale

Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. Easily acces- sible text-to-image generation amplifies demographic stereo- types at large scale. In2023 ACM Conference on Fairness, Accountability, and Transparency, pages 1493–1504, 2023. 2

2023

[3] [3]

Debiaspi: Inference-time debiasing by prompt iter- ation of a text-to-image generative model.arXiv preprint arXiv:2501.18642, 2025

Sarah Bonna, Yu-Cheng Huang, Ekaterina Novozhilova, Sejin Paik, Zhengyang Shan, Michelle Yilin Feng, Ge Gao, Yonish Tayal, Rushil Kulkarni, Jialin Yu, Nupur Divekar, Deepti Ghadiyaram, Derry Wijaya, and Margrit Betke. Debiaspi: Inference-time debiasing by prompt iter- ation of a text-to-image generative model.arXiv preprint arXiv:2501.18642, 2025. 2, 3

work page arXiv 2025

[4] [4]

Dall-eval: Probing the reasoning skills and social biases of text-to- image generation models

Jaemin Cho, Abhay Zala, and Mohit Bansal. Dall-eval: Probing the reasoning skills and social biases of text-to- image generation models. InICCV, 2023. 2

2023

[5] Kristy Choi, Aditya Grover, Trisha Singh, Rui Shu, and Stefano Ermon. Fair generative modeling via weak supervision. In ICML, 2020.

[6] Yujin Choi, Jinseong Park, Hoki Kim, Jaewook Lee, and Saerom Park. Fair sampling in diffusion models through switching mechanism. In AAAI, 2024.

[7] Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, and Stefanie Jegelka. Debiasing vision-language models via biased prompts. arXiv preprint arXiv:2302.00070, 2023.

[8] Felix Friedrich, Manuel Brack, Lukas Struppek, Dominik Hintersdorf, Patrick Schramowski, Sasha Luccioni, and Kristian Kersting. Fair diffusion: Instructing text-to-image generation models on fairness. arXiv preprint arXiv:2302.10893, 2023.

[9] Zihao Fu, Ryan Brown, Shun Shao, Kai Rawal, Eoin D. Delaney, and Chris Russell. Fairimagen: Post-processing for bias mitigation in text-to-image models. In NeurIPS, 2025.

[10] Rinon Gal, Yuval Alaluf, Yuval Atzmon, Or Patashnik, Amit Haim Bermano, Gal Chechik, and Daniel Cohen-Or. An image is worth one word: Personalizing text-to-image generation using textual inversion. In ICLR, 2023.

[11] Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, and David Bau. Unified concept editing in diffusion models. In Proceedings of the IEEE/CVF Winter Conference on Applications of Computer Vision (WACV), pages 5111–5120, 2024.

[12] Boyu Han, Qianqian Xu, Shilong Bao, Zhiyong Yang, Kangli Zi, and Qingming Huang. Lightfair: Towards an efficient alternative for fair t2i diffusion via debiasing pre-trained text encoders. In Advances in Neural Information Processing Systems (NeurIPS), 2025.

[13] Martin Heusel, Hubert Ramsauer, Thomas Unterthiner, Bernhard Nessler, and Sepp Hochreiter. Gans trained by a two time-scale update rule converge to a local nash equilibrium. In NeurIPS, 2017.

[14] Yusuke Hirota, Min-Hung Chen, Chien-Yi Wang, Yuta Nakashima, Yu-Chiang Frank Wang, and Ryo Hachiuma. Saner: Annotation-free societal attribute neutralizer for debiasing clip. In ICLR, 2025.

[15] Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In NeurIPS, 2020.

[16] Xinyu Hou, Xiaoming Li, and Chen Change Loy. Aitti: Learning adaptive inclusive token for text-to-image generation. arXiv preprint arXiv:2406.12805, 2026.

[17] Niharika Jain, Alberto Olmo, Sailik Sengupta, Lydia Manikonda, and Subbarao Kambhampati. Imperfect ImaGANation: Implications of GANs exacerbating biases on facial data augmentation and Snapchat selfie lenses. arXiv preprint arXiv:2001.09528, 2020.

[18] Yue Jiang, Yueming Lyu, Ziwen He, Bo Peng, and Jing Dong. Mitigating social biases in text-to-image diffusion models via linguistic-aligned attention guidance. In ACM MM, 2024.

[19] Mintong Kang, Vinayshekhar Bannihatti Kumar, Shamik Roy, Abhishek Kumar, Sopan Khosla, Balakrishnan Murali Narayanaswamy, and Rashmi Gangadharaiah. Fairgen: Controlling sensitive attributes for fair generations in diffusion models via adaptive latent guidance. In Proceedings of the Conference on Empirical Methods in Natural Language Processing (EMNLP),... 2025.

[20] Jia Li, Lijie Hu, Jingfeng Zhang, Tianhang Zheng, Hua Zhang, and Di Wang. Fair text-to-image diffusion via fair mapping. arXiv preprint arXiv:2311.17695, 2024.

[21] Zhixuan Liu, Peter Schaldenbrand, Beverley-Claire Okogwu, Wenxuan Peng, Youngsik Yun, Andrew Hundt, Jihie Kim, and Jean Oh. Scoft: Self-contrastive fine-tuning for equitable image generation. In CVPR, 2024.

[22] Sasha Luccioni, Christopher Akiki, Margaret Mitchell, and Yacine Jernite. Stable bias: Evaluating societal representations in diffusion models. In NeurIPS, 2023.

[23] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. Glide: Towards photorealistic image generation and editing with text-guided diffusion models. In ICML,

[24] Maxime Oquab, Timothée Darcet, Théo Moutakanni, Huy Vo, Marc Szafraniec, Vasil Khalidov, Pierre Fernandez, Daniel Haziza, Francisco Massa, Alaaeldin El-Nouby, et al. Dinov2: Learning robust visual features without supervision. Transactions on Machine Learning Research (TMLR), pages 1–31, 2024.

[25] Hadas Orgad, Bahjat Kawar, and Yonatan Belinkov. Editing implicit assumptions in text-to-image diffusion models. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pages 7053–7061, 2023.

[26] Rishubh Parihar, Abhijnya Bhat, Abhipsa Basu, Saswat Mallick, Jogendra Nath Kundu, and R. Venkatesh Babu. Balancing act: Distribution-guided debiasing in diffusion models. In CVPR, 2024.

[27] Jeonghoon Park, Juyoung Lee, Chaeyeon Chung, Jaeseong Lee, Jaegul Choo, and Jindong Gu. Fair generation without unfair distortions: Debiasing text-to-image generation with entanglement-free attention. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), 2025.

[28] Malsha V. Perera and Vishal M. Patel. Analyzing bias in diffusion-based face generation models. arXiv preprint arXiv:2305.06402, 2023.

[29] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, Gretchen Krueger, and Ilya Sutskever. Learning transferable visual models from natural language supervision. In ICML, 2021.

[30] Aditya Ramesh, Mikhail Pavlov, Gabriel Goh, Scott Gray, Chelsea Voss, Alec Radford, Mark Chen, and Ilya Sutskever. Zero-shot text-to-image generation. In ICML, 2021.

[31] Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In CVPR, 2022.

[32] Chitwan Saharia, William Chan, Saurabh Saxena, Lala Li, Jay Whang, Emily L. Denton, Kamyar Ghasemipour, Raphael Gontijo Lopes, Burcu Karagol Ayan, Tim Salimans, Jonathan Ho, David J. Fleet, and Mohammad Norouzi. Photorealistic text-to-image diffusion models with deep language understanding. In NeurIPS, 2022.

[33] Tim Salimans, Ian Goodfellow, Wojciech Zaremba, Vicki Cheung, Alec Radford, and Xi Chen. Improved techniques for training gans. In NeurIPS, 2016.

[34] Preethi Seshadri, Sameer Singh, and Yanai Elazar. The bias amplification paradox in text-to-image generation. arXiv preprint arXiv:2308.00755, 2023.

[35] Ashish Seth, Mayur Hemani, and Chirag Agarwal. Dear: Debiasing vision-language models with additive residuals. In CVPR, 2023.

[36] Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, and Mohan Kankanhalli. Finetuning text-to-image diffusion models for fairness. In ICLR, 2024.

[37] Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, and Mohan Kankanhalli. Finetuning text-to-image diffusion models for fairness. In ICLR, 2024.

[38] Robik Shrestha, Yang Zou, Qiuyu Chen, Zhiheng Li, Yusheng Xie, and Siqi Deng. Fairrag: Fair human generation via fair retrieval augmentation. In CVPR, 2024.

[39] Jiaming Song, Chenlin Meng, and Stefano Ermon. Denoising diffusion implicit models. In ICLR, 2021.

[40] Lukas Struppek, Dom Hintersdorf, Felix Friedrich, Manuel Brack, Patrick Schramowski, and Kristian Kersting. Exploiting cultural biases via homoglyphs in text-to-image synthesis. Journal of Artificial Intelligence Research, pages 1017–1068, 2023.

[41] Christopher T. H. Teo, Milad Abdollahzadeh, and Ngai-Man Cheung. Fair generative models via transfer learning. In AAAI, 2023.

[42] Boris van Breugel, Trent Kyono, Jeroen Berrevoets, and Mihaela van der Schaar. DECAF: Generating fair synthetic data using causally-aware generative networks. In NeurIPS, 2021.

[43] Korada Sri Vardhana, Shrikrishna Lolla, and Soma Biswas. Fully unsupervised self-debiasing of text-to-image diffusion models. In WACV, 2026.

[44] Guorun Wang and Lucia Specia. Moesd: Mixture of experts stable diffusion to mitigate gender bias. arXiv preprint arXiv:2407.11002, 2024.

[45] Jialu Wang, Xinyue Liu, Zonglin Di, Yang Liu, and Xin Wang. T2IAT: Measuring valence and stereotypical biases in text-to-image generation. In Findings of the Association for Computational Linguistics (ACL), 2023.

[46] Chao Wu, Zhenyi Wang, Kangxian Xie, Naresh Kumar Devulapally, Vishnu Suresh Lokhande, and Mingchen Gao. Model-agnostic gender bias control for text-to-image generation via sparse autoencoder. arXiv preprint arXiv:2507.20973, 2025.

[47] Yankun Wu, Yuta Nakashima, and Noa Garcia. Stable diffusion exposed: Gender bias from prompt to image. arXiv preprint arXiv:2312.03027, 2024.

[48] Hidir Yesiltepe, Kiymet Akdemir, and Pinar Yanardag. Mist: Mitigating intersectional bias with disentangled cross-attention editing in text-to-image diffusion models. arXiv preprint arXiv:2403.19738, 2024.

[49] Cheng Zhang, Xuanbai Chen, Siqi Chai, Henry Chen Wu, Dmitry Lagun, Thabo Beeler, and Fernando De la Torre. ITI-GEN: Inclusive text-to-image generation. In ICCV, 2023.

Training-Free Debiasing of Diffusion Models via CLIP-Guided Denoising Optimization: Supplementary Material

Implementation Details

This section provides additional implementation details that are omitted from the main paper for brevity. Unless otherwise specified, all experiments follow the settings described in Sec. 5.1 of the main paper.

1.1. Attribute and Semantic Alignment

At every active optimization timestep, the reconstructed clean image $\hat{x}_0$ is decoded ...
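For orientation, a reconstructed clean sample of this kind is commonly obtained from the model's noise prediction through the standard DDPM identity [15]. The sketch below is an assumption about that step, not the paper's exact formula, since the passage is truncated here. The names `predict_x0` and `alpha_bar_t` and the toy vectors are hypothetical; in a latent diffusion model the result would still be a latent that has to pass through the VAE decoder before CLIP can score it.

```python
import math


def predict_x0(x_t, eps_pred, alpha_bar_t):
    """Estimate the clean sample from a noisy one.

    Standard DDPM identity:
        x0_hat = (x_t - sqrt(1 - alpha_bar_t) * eps_pred) / sqrt(alpha_bar_t)
    Inputs are flat lists of floats to keep the sketch dependency-free.
    """
    if not 0.0 < alpha_bar_t <= 1.0:
        raise ValueError("alpha_bar_t must lie in (0, 1]")
    signal_scale = math.sqrt(alpha_bar_t)
    noise_scale = math.sqrt(1.0 - alpha_bar_t)
    return [(x - noise_scale * e) / signal_scale for x, e in zip(x_t, eps_pred)]


# Toy check: noise a known sample with the forward process, then invert it
# using the true noise in place of a network prediction.
x0 = [0.5, -1.0, 2.0]
eps = [0.1, 0.3, -0.2]
alpha_bar = 0.64
x_t = [
    math.sqrt(alpha_bar) * a + math.sqrt(1.0 - alpha_bar) * b
    for a, b in zip(x0, eps)
]
x0_hat = predict_x0(x_t, eps, alpha_bar)
```

With a real denoiser the predicted noise is imperfect, so the estimate is blurrier at early (high-noise) timesteps and sharper later, which is one reason feedback computed on it tends to be more reliable late in sampling.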

Additional Results

2.1. Quantitative Results

Generalization to Other Backbones. To assess whether our method is specific to a particular diffusion model, we additionally evaluate it on Stable Diffusion v1.5 under the same experimental protocol. As shown in Table S4 and Table S5, our approach consistently improves fairness metrics while maintaining compet...
