Operationalizing Fairness in Text-to-Image Models: A Survey of Bias, Fairness Audits and Mitigation Strategies
Pith reviewed 2026-05-10 13:43 UTC · model grok-4.3
The pith
Text-to-image models require a shift from vague descriptive bias checks to concrete target-based fairness tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Current fairness work in text-to-image generation suffers from unclear distinctions between bias and fairness and from reliance on descriptive metrics that do not yield actionable decisions. The paper organizes the literature into a taxonomy of bias types and fairness notions, identifies the gap between "target fairness" (normative ideals) and "threshold fairness" (normative standards with actionable decision rules), surveys mitigation approaches from prompt engineering through diffusion process manipulation, and concludes that a new operational framework centered on rigorous target-based testing is needed for accountable development.
What carries the argument
The proposed operationalizing framework, which separates target fairness (normative ideals) from threshold fairness (standards with decision rules) and replaces descriptive metrics with target-based testing.
If this is right
- Developers can apply the bias taxonomy to locate and classify unwanted patterns in model outputs before deployment.
- Mitigation methods such as prompt engineering or diffusion changes can be selected according to the specific fairness notion they address.
- Target-based testing supplies clear pass-fail criteria that can be written into development and release checklists.
- Accountability improves because teams must state explicit output targets rather than report only average metric values.
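The pass-fail idea in these points can be made concrete. Below is a minimal sketch of a target-based fairness check, assuming an upstream classifier has already labeled a batch of generated images with a sensitive attribute; the function name, tolerance default, and report shape are illustrative, not taken from the paper.

```python
from collections import Counter

def target_fairness_test(observed_labels, targets, tolerance=0.05):
    """Pass/fail check: does the observed attribute distribution meet
    each stated target proportion within a tolerance band?

    observed_labels: attribute label per generated image, e.g. ["woman", "man", ...]
    targets: stated target share per attribute, e.g. {"woman": 0.5, "man": 0.5}
    """
    total = len(observed_labels)
    counts = Counter(observed_labels)
    per_attribute = {}
    for attribute, target_share in targets.items():
        observed_share = counts.get(attribute, 0) / total
        per_attribute[attribute] = {
            "target": target_share,
            "observed": round(observed_share, 3),
            # Decision rule: within tolerance of the stated target, or fail.
            "pass": abs(observed_share - target_share) <= tolerance,
        }
    return {
        "attributes": per_attribute,
        "overall_pass": all(r["pass"] for r in per_attribute.values()),
    }
```

Unlike a descriptive audit that only reports the observed shares, the `overall_pass` flag is the kind of binary criterion that can be written directly into a release checklist.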
Where Pith is reading between the lines
- Industry groups could adopt the taxonomy as a shared checklist for auditing new text-to-image releases.
- The framework might extend naturally to other generative media such as video or audio where similar stereotype issues appear.
- Regulators could reference the target-versus-threshold distinction when drafting rules for high-impact AI image tools.
- Empirical studies could test whether models that meet stated targets also reduce downstream harms in real user interactions.
Load-bearing premise
The existing research literature can be exhaustively sorted into the paper's taxonomy of bias types and fairness notions, and the main obstacle to progress is the lack of a shift from descriptive to target-based evaluation.
What would settle it
A new review that uncovers major bias types or fairness concepts absent from the taxonomy, or an experiment showing that target-based testing produces no measurable reduction in stereotypical outputs compared with current descriptive audits.
Original abstract
Text-to-Image (T2I) generation models have been widely adopted across various industries, yet are criticized for frequently exhibiting societal stereotypes. While a growing body of research has emerged to evaluate and mitigate these biases, the field at present contends with conceptual ambiguity, for example terms like "bias" and "fairness" are not always clearly distinguished and often lack clear operational definitions. This paper provides a comprehensive systematic review of T2I fairness literature, organizing existing work into a taxonomy of bias types and fairness notions. We critically assess the gap between "target fairness" (normative ideals in T2I outputs) and "threshold fairness" (normative standards with actionable decision rules). Furthermore, we survey the landscape of mitigation strategies, ranging from prompt engineering to diffusion process manipulation. We conclude by proposing a new framework for operationalizing fairness that moves beyond descriptive metrics towards rigorous, target-based testing, offering an approach for more accountable generative AI development.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper conducts a systematic review of bias and fairness research in text-to-image (T2I) generation models. It organizes the literature into a taxonomy of bias types and fairness notions, critically assesses the conceptual gap between 'target fairness' (normative ideals for T2I outputs) and 'threshold fairness' (actionable normative standards with decision rules), surveys mitigation strategies ranging from prompt engineering to diffusion process manipulation, and proposes a new framework to operationalize fairness through rigorous target-based testing rather than descriptive metrics.
Significance. If the taxonomy proves comprehensive and the framework supplies concrete, testable protocols, the work could become a key reference for standardizing fairness evaluation in generative AI. The synthesis of mitigation approaches and the emphasis on moving from descriptive to prescriptive methods address a genuine need for accountability in T2I systems; the absence of mathematical derivations or fitted models is appropriate for a survey but means the contribution rests on the clarity and completeness of the organizational scheme.
major comments (2)
- [concluding section / proposed framework] The central claim that the proposed framework moves 'beyond descriptive metrics towards rigorous, target-based testing' (abstract and concluding section) is load-bearing yet remains high-level; without explicit decision rules, example test protocols, or pseudocode for implementing target-based evaluation, it is difficult to verify that the framework is operationalizable or superior to existing threshold approaches.
- [taxonomy / methods section] The taxonomy of bias types and fairness notions is presented as comprehensive, but the manuscript does not detail the systematic review methodology (databases, search strings, inclusion/exclusion criteria, or date range). This omission directly affects the credibility of the gap assessment between target and threshold fairness, as unstated coverage limitations could mean key works are absent.
minor comments (2)
- [abstract] The abstract introduces 'target fairness' and 'threshold fairness' without brief parenthetical definitions; adding one-sentence glosses would improve accessibility for readers outside the immediate subfield.
- [mitigation survey] Mitigation strategies are surveyed from prompt engineering to diffusion manipulation, but the paper should include a summary table mapping each strategy to the bias types it targets and the fairness notion it addresses; this would strengthen the synthesis without altering the core argument.
Simulated Author's Rebuttal
We thank the referee for their constructive review and recommendation for minor revision. The comments identify areas where greater transparency and operational detail will strengthen the manuscript, and we address each point below with commitments to specific revisions.
Point-by-point responses
-
Referee: The central claim that the proposed framework moves 'beyond descriptive metrics towards rigorous, target-based testing' (abstract and concluding section) is load-bearing yet remains high-level; without explicit decision rules, example test protocols, or pseudocode for implementing target-based evaluation, it is difficult to verify that the framework is operationalizable or superior to existing threshold approaches.
Authors: We agree that the framework section would benefit from greater concreteness to support the central claim. In the revised manuscript we will expand the concluding section with (1) explicit decision rules that distinguish target fairness evaluation from threshold-based metrics, (2) two worked example test protocols (one for gender-stereotype auditing and one for occupational bias), and (3) pseudocode that outlines the target-based evaluation workflow. These additions will make the operational steps explicit while preserving the survey character of the paper. revision: yes
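As a hedged illustration of what the promised workflow pseudocode might look like end to end: `generate` and `classify` below stand in for a T2I model call and an attribute classifier, and every name and default value here is hypothetical rather than drawn from the manuscript.

```python
def audit_prompt(prompt, generate, classify, targets, n_samples=100, tolerance=0.05):
    """Target-based audit for one prompt: sample images, classify the
    sensitive attribute in each, and return a per-attribute pass/fail
    decision (a decision rule) rather than a bare descriptive statistic.

    generate: callable prompt -> image (stand-in for the T2I model)
    classify: callable image -> attribute label (stand-in for a classifier)
    targets:  stated target share per attribute, e.g. {"woman": 0.5, "man": 0.5}
    """
    labels = [classify(generate(prompt)) for _ in range(n_samples)]
    decisions = {}
    for attribute, target_share in targets.items():
        observed_share = labels.count(attribute) / n_samples
        decisions[attribute] = abs(observed_share - target_share) <= tolerance
    return all(decisions.values()), decisions
```

With a stated parity target for a prompt like "a photo of a CEO", the audit fails as soon as any attribute's observed share drifts outside the tolerance band, which is the release-blocking behavior a purely descriptive metric cannot supply on its own.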
-
Referee: The taxonomy of bias types and fairness notions is presented as comprehensive, but the manuscript does not detail the systematic review methodology (databases, search strings, inclusion/exclusion criteria, or date range). This omission directly affects the credibility of the gap assessment between target and threshold fairness, as unstated coverage limitations could mean key works are absent.
Authors: We acknowledge the omission of methodological details. The revised manuscript will include a new subsection (placed after the introduction) that reports the systematic review protocol in full: databases searched (arXiv, ACL Anthology, Google Scholar, IEEE Xplore), exact search strings, inclusion/exclusion criteria, screening process, and date range (January 2019–December 2023). This addition will allow readers to assess coverage and will directly support the validity of the identified gap between target and threshold fairness. revision: yes
Circularity Check
No significant circularity identified
Full rationale
This is a survey paper that organizes T2I fairness literature into a taxonomy of bias types and fairness notions, assesses the gap between target and threshold fairness, surveys mitigation strategies, and proposes a high-level framework based on that synthesis. No mathematical derivations, equations, fitted parameters, or predictions appear in the provided abstract or reader summary. The central claims are literature-based synthesis and a conceptual distinction presented as an organizing principle; they do not reduce to self-definitions, self-citation chains, or inputs renamed as outputs. The framework is explicitly derived from analysis of cited prior work rather than internal construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Fairness concepts in generative AI can be usefully taxonomized and operationalized through target-based testing.
invented entities (2)
- Target fairness: no independent evidence
- Threshold fairness: no independent evidence
Reference graph
ICLR 2026 Algorithmic Fairness Across Alignment Procedures and Agentic Systems (AFAA) Workshop
Works this paper leans on
- [1] Mikko Apiola, Henriikka Vartiainen, and Matti Tedre. First year CS students exploring and identifying biases and social injustices in text-to-image generative AI. pp. 485–491. doi: 10.1145/3649217.3653596.
- [2] Basim Azam and Naveed Akhtar. Plug-and-play interpretable responsible text-to-image generation via dual-space multi-facet concept control. In Proceedings of the Computer Vision and Pattern Recognition Conference, pp. 2976–2985.
- [3] Federico Bianchi, Pratyusha Kalluri, Esin Durmus, Faisal Ladhak, Myra Cheng, Debora Nozza, Tatsunori Hashimoto, Dan Jurafsky, James Zou, and Aylin Caliskan. Easily accessible text-to-image generation amplifies demographic stereotypes at large scale. Association for Computing Machinery. ISBN 9798400701924. doi: 10.1145/3593013.3594095.
- [4] Tolga Bolukbasi, Kai-Wei Chang, James Y. Zou, Venkatesh Saligrama, and Adam T. Kalai. Man is to computer programmer as woman is to homemaker? Debiasing word embeddings. Advances in Neural Information Processing Systems, 29, 2016.
- [5] Chih-Hong Cheng, Harald Ruess, Changshun Wu, and Xingyu Zhao. Conditional fairness for generative AIs. arXiv preprint arXiv:2404.16663.
- [6] Ching-Yao Chuang, Varun Jampani, Yuanzhen Li, Antonio Torralba, and Stefanie Jegelka. Debiasing vision-language models via biased prompts. arXiv preprint arXiv:2302.00070.
- [7] Sam Corbett-Davies, Johann D. Gaebler, Hamed Nilforoshan, Ravi Shroff, and Sharad Goel. The measure and mismeasure of fairness. Journal of Machine Learning Research, 24(312):1–117.
- [8] Moreno D'Incà, Elia Peruzzo, Massimiliano Mancini, Dejia Xu, Vidit Goel, Xingqian Xu, Zhangyang Wang, Humphrey Shi, and Nicu Sebe. OpenBias: Open-set bias detection in text-to-image generative models. In CVPR.
- [9] Friedler, Carlos Scheidegger, and Suresh Venkatasubramanian. doi: 10.1145/3433949.
  Felix Friedrich, Manuel Brack, Lukas Struppek, Dominik Hintersdorf, Patrick Schramowski, Sasha Luccioni, and Kristian Kersting. Auditing and instructing text-to-image generation models on fairness. AI and Ethics, 5(3):2103–2123.
- [10] Rohit Gandikota, Joanna Materzyńska, Tingrui Zhou, Antonio Torralba, and David Bau. Concept sliders: LoRA adaptors for precise control in diffusion models. In European Conference on Computer Vision, pp. 172–188. Springer, 2024.
  Rohit Gandikota, Hadas Orgad, Yonatan Belinkov, Joanna Materzyńska, and David Bau. Unified concept editing in diffusion models. 2024.
- [11] Hila Gonen and Yoav Goldberg. Lipstick on a pig: Debiasing methods cover up systematic gender biases in word embeddings but do not remove them. arXiv preprint arXiv:1903.03862.
- [12] Shahin Hakemi, Naveed Akhtar, Ghulam Mubashar Hassan, and Ajmal Mian. Deeper diffusion models amplify bias. arXiv preprint arXiv:2505.17560.
- [13] Yaru Hao, Zewen Chi, Li Dong, and Furu Wei. Optimizing prompts for text-to-image generation. Advances in Neural Information Processing Systems, 36:66923–66939.
- [14] Jack Hessel, Ari Holtzman, Maxwell Forbes, Ronan Le Bras, and Yejin Choi. CLIPScore: A reference-free evaluation metric for image captioning. In Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, pp. 7514–7528, 2021.
- [15] Di Jin, Xing Liu, Yu Liu, Jia Qing Yap, Andrea Wong, Adriana Crespo, Qi Lin, Zhiyuan Yin, Qiang Yan, and Ryan Ye. Infelm: In-depth fairness evaluation of large text-to-image models. arXiv preprint arXiv:2501.01973.
- [16] Mintong Kang, Vinayshekhar Bannihatti Kumar, Shamik Roy, Abhishek Kumar, Sopan Khosla, Balakrishnan Murali Narayanaswamy, and Rashmi Gangadharaiah. FairGen: Controlling sensitive attributes for fair generations in diffusion models via adaptive latent guidance. arXiv preprint arXiv:2503.01872.
- [17] Jon Kleinberg, Sendhil Mullainathan, and Manish Raghavan. Inherent trade-offs in the fair determination of risk scores. In Proceedings of the 8th Innovations in Theoretical Computer Science Conference (ITCS 2017), volume 67 of Leibniz International Proceedings in Informatics (LIPIcs), pp. 43:1–43:23. Schloss Dagstuhl–Leibniz-Zentrum für Informatik, 2017. doi: 10.4230/LIPIcs.ITCS.2017.43. URL https://drops.dagstuhl.de/entities/document/10.4230/LIPIcs.ITCS.2017.43.
- [18] Tony Lee, Michihiro Yasunaga, Chenlin Meng, Yifan Mai, Joon Sung Park, Agrim Gupta, Yunzhi Zhang, Deepak Narayanan, Hannah Teufel, Marco Bellagente, et al. Holistic evaluation of text-to-image models. Advances in Neural Information Processing Systems.
- [19] Jia Li, Lijie Hu, Jingfeng Zhang, Tianhang Zheng, Hua Zhang, and Di Wang. Fair text-to-image diffusion via fair mapping. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 39, pp. 26256–26264, 2025.
  Lijun Li, Zhelun Shi, Xuhao Hu, Bowen …
- [20] Hanjun Luo, Ziye Deng, Haoyu Huang, Xuecheng Liu, Ruizhe Chen, and Zuozhu Liu. VersusDebias: Universal zero-shot debiasing for text-to-image models via SLM-based prompt engineering and generative adversary. arXiv preprint arXiv:2407.19524.
- [21] Yunbo Lyu, Zhou Yang, Yuqing Niu, Jing Jiang, and David Lo. Do existing testing tools really uncover gender bias in text-to-image models? arXiv preprint arXiv:2501.00000.
- [22] Kelly Avery Mack, Rida Qadri, Remi Denton, Shaun K. Kane, and Cynthia L. Bennett. "They only care to show us the wheelchair": disability representation in text-to-image AI models. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, CHI '24, New York, NY, USA, 2024. Association for Computing Machinery. ISBN 9798400703300. doi: 10.1145/3613904.3642166.
- [23] Sina Malakouti and Adriana Kovashka. Role bias in diffusion models: Diagnosing and mitigating through intermediate decomposition. In The Thirty-ninth Annual Conference on Neural Information Processing Systems.
  David Manheim and Scott Garrabrant. Categorizing variants of Goodhart's Law. doi: 10.48550/arXiv.1803.04585. URL https://arxiv.org/abs/1803.04585.
- [24] Chenlin Meng, Yutong He, Yang Song, Jiaming Song, Jiajun Wu, Jun-Yan Zhu, and Stefano Ermon. SDEdit: Guided image synthesis and editing with stochastic differential equations. arXiv preprint arXiv:2108.01073.
- [25] Ranjita Naik and Besmira Nushi. Social biases through the text-to-image generation lens. In Proceedings of the 2023 AAAI/ACM Conference on AI, Ethics, and Society, pp. 786–808, Montréal QC Canada, August 2023. ACM. ISBN 979-8-4007-0231-0. doi: 10.1145/3600211.3604711.
- [26] Alex Nichol, Prafulla Dhariwal, Aditya Ramesh, Pranav Shyam, Pamela Mishkin, Bob McGrew, Ilya Sutskever, and Mark Chen. GLIDE: Towards photorealistic image generation and editing with text-guided diffusion models. arXiv preprint arXiv:2112.10741.
- [27] Malsha V. Perera and Vishal M. Patel. Analyzing bias in diffusion-based face generation models. In 2023 IEEE International Joint Conference on Biometrics (IJCB), pp. 1–10. IEEE, 2023.
- [28] Alec Radford, Jong Wook Kim, Chris Hallacy, Aditya Ramesh, Gabriel Goh, Sandhini Agarwal, Girish Sastry, Amanda Askell, Pamela Mishkin, Jack Clark, et al. Learning transferable visual models from natural language supervision. In International Conference on Machine Learning.
- [29] Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-Net: Convolutional networks for biomedical image segmentation. In Medical Image Computing and Computer-Assisted Intervention (MICCAI 2015), pp. 234–241. Springer, 2015.
- [30] Preethi Seshadri, Sameer Singh, and Yanai Elazar. The bias amplification paradox in text-to-image generation. In Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics. Association for Computational Linguistics. doi: 10.18653/v1/2024.naacl-long.353. URL https://aclanthology.org/2024.naacl-long.353/.
- [31] Xudong Shen, Chao Du, Tianyu Pang, Min Lin, Yongkang Wong, and Mohan Kankanhalli. Fine-tuning text-to-image diffusion models for fairness. arXiv preprint arXiv:2311.07604.
- [32] Piotr Szymański, Magdalena Lipczyńska, and Anna Maria Górska. From data to perception: visualizing bias in artificial intelligence-generated images. European Heart Journal, pp. ehae850, March. ISSN 0195-668X, 1522-9645. doi: 10.1093/eurheartj/ehae850.
- [33] Christopher T. Teo, Milad Abdollahzadeh, Xinda Ma, and Ngai-Man Cheung. FairQueue: Rethinking prompt learning for fair text-to-image generation. Advances in Neural Information Processing Systems, 37:22878–22926.
- [34] Jialu Wang, Xinyue Liu, Zonglin Di, Yang Liu, and Xin Wang. T2IAT: Measuring valence and stereotypical biases in text-to-image generation. In Findings of the Association for Computational Linguistics: ACL 2023, pp. 2560–2574, 2023.
- [35] Eric J. York, Eva Brumberger, and La Verne Abe Harris. Prompting bias: Assessing representation and accuracy in AI-generated images. In Proceedings of the 42nd ACM International Conference on Design of Communication. ACM. ISBN 979-8-4007-0519-9. doi: 10.1145/3641237.3691658.
- [36] Cheng Zhang, Xuanbai Chen, Siqi Chai, Chen Henry Wu, Dmitry Lagun, Thabo Beeler, and Fernando De la Torre. ITI-GEN: Inclusive text-to-image generation. In Proceedings of the IEEE/CVF International Conference on Computer Vision (ICCV), pp. 3969–…