Unmasking LAION-5B: Age, Gender, Race, and Emotion Biases in Large-Scale Image Datasets
Pith reviewed 2026-06-26 09:10 UTC · model grok-4.3
The pith
LAION-5B substantially overrepresents young White males while underrepresenting minority groups, older women, and certain emotion expressions.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using FairFace, DeepFace, and Emo-AffectNet to label faces detected in LAION-2B-en and LAION-2B-multi, the authors identify substantial overrepresentation of young adults (20-39), White individuals, and males, alongside consistent underrepresentation of minority racial groups and middle-aged or older women. They also observe stereotypical associations between demographic attributes and emotions, such as Anger being predominantly linked to males and Happiness to females. The consistency of these patterns across both dataset components and two demographic models shows that the biases are deeply embedded.
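One way to make the "stereotypical association" claim concrete is to score each demographic-emotion pair with normalized pointwise mutual information (NPMI). The sketch below is illustrative only: it is not confirmed to be the authors' measure, the function name `npmi_table` is hypothetical, and the label counts are made-up toy data rather than figures from the paper.

```python
import math
from collections import Counter

def npmi_table(pairs):
    """Return {(gender, emotion): NPMI} for a list of (gender, emotion) labels.

    NPMI lies in [-1, 1]: positive values mean the pair co-occurs more often
    than independence would predict, negative values mean less often.
    """
    n = len(pairs)
    joint = Counter(pairs)
    gender = Counter(g for g, _ in pairs)
    emotion = Counter(e for _, e in pairs)
    scores = {}
    for (g, e), count in joint.items():
        p_xy = count / n
        p_x = gender[g] / n
        p_y = emotion[e] / n
        pmi = math.log(p_xy / (p_x * p_y))
        # Normalize by -log p(x, y); a single-cell table has no association.
        scores[(g, e)] = pmi / -math.log(p_xy) if p_xy < 1 else 0.0
    return scores

# Toy labels (assumption): 100 faces with predicted gender and emotion.
pairs = (
    [("Male", "Anger")] * 30
    + [("Male", "Happiness")] * 30
    + [("Female", "Anger")] * 10
    + [("Female", "Happiness")] * 30
)
scores = npmi_table(pairs)
for pair, value in sorted(scores.items()):
    print(pair, round(value, 3))
```

In this toy table, Male-Anger and Female-Happiness score above zero and the other two pairs score below zero, which is the shape of association the core claim describes. Because the inputs are classifier predictions, the scores also inherit any classifier error.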
What carries the argument
Demographic and emotion labeling of detected faces by the pre-trained models FairFace, DeepFace, and Emo-AffectNet
If this is right
- Generative models trained on LAION-5B are likely to inherit and reproduce the observed demographic imbalances in their outputs.
- AI systems built on the dataset may show skewed performance or representation across age, gender, race, and emotion categories.
- The documented patterns persist across both English and multilingual components, indicating the biases are not limited to one language slice.
- Users of LAION-5B for training should incorporate explicit balancing or filtering steps to counteract the measured skews.
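The balancing step in the last point could take several forms. A minimal sketch of one of them, inverse-frequency sample weighting, is shown here; the function name `inverse_frequency_weights` and the toy group labels are assumptions for illustration, not something the paper prescribes.

```python
from collections import Counter

def inverse_frequency_weights(groups):
    """Give each sample a weight so every group carries equal total weight.

    With n samples and k groups, a sample in a group of size c gets
    n / (k * c), so the weights still sum to n overall.
    """
    counts = Counter(groups)
    n = len(groups)
    k = len(counts)
    return [n / (k * counts[g]) for g in groups]

# Toy predicted labels (assumption): a skewed sample of 10 faces.
groups = ["White"] * 6 + ["Black"] * 3 + ["East Asian"] * 1
weights = inverse_frequency_weights(groups)

# Total weight per group after reweighting.
totals = Counter()
for g, w in zip(groups, weights):
    totals[g] += w
print(dict(totals))
```

The weights are computed from predicted labels, so this kind of rebalancing is only as reliable as the classifiers that produced them.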
Where Pith is reading between the lines
- The same face-labeling pipeline could be run on other large web-scraped collections to test whether comparable demographic patterns appear elsewhere.
- If classifier error rates do vary by group, the reported bias magnitudes could shift once those rates are measured and corrected.
- The intersectional underrepresentation of older women from minority groups may compound effects on downstream model fairness beyond the separate marginal counts.
Load-bearing premise
The error rates of the three attribute-classification models do not vary systematically with the true demographics or emotions present in the LAION images.
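A small worked example shows why this premise matters. The sketch below pushes a hypothetical true distribution through two confusion matrices, one with equal error rates for both groups and one with differential error rates. All numbers are invented for illustration and `observed_distribution` is a hypothetical helper.

```python
def observed_distribution(true_dist, confusion):
    """Return predicted-label shares given true shares and a confusion matrix.

    confusion[i][j] is P(predicted group j | true group i).
    """
    k = len(true_dist)
    return [
        sum(true_dist[i] * confusion[i][j] for i in range(k))
        for j in range(k)
    ]

# Hypothetical balanced population of two groups, A and B.
true_dist = [0.5, 0.5]

# Same 10% error rate for both groups.
uniform_error = [[0.90, 0.10], [0.10, 0.90]]
# Group A is labeled almost perfectly; group B is mislabeled 20% of the time.
differential_error = [[0.98, 0.02], [0.20, 0.80]]

obs_uniform = observed_distribution(true_dist, uniform_error)
obs_diff = observed_distribution(true_dist, differential_error)
print(obs_uniform)  # approximately [0.5, 0.5]
print(obs_diff)     # approximately [0.59, 0.41]
```

With a balanced population and equal error rates, the errors cancel and the observed shares stay balanced. With differential error rates, the same population appears to be 59% group A, so part of a measured skew can be an artifact of the classifier.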
What would settle it
A human re-annotation of a random sample of several thousand images from each component, followed by a direct statistical comparison of the resulting age-gender-race-emotion distributions against the model outputs.
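The comparison described above might be sketched as follows, assuming human and model labels are available for the same sampled images. It reports the total variation distance between the two marginal distributions and the per-group agreement (recall) of the model against the human labels. The function name `compare_annotations` and the toy labels are assumptions for illustration.

```python
from collections import Counter

def compare_annotations(human, model):
    """Compare paired human and model labels for the same images.

    Returns (tvd, recall): tvd is the total variation distance between the
    two marginal label distributions; recall maps each human label to the
    fraction of its images that the model labeled identically.
    """
    if len(human) != len(model):
        raise ValueError("label lists must be paired and equally long")
    n = len(human)
    labels = sorted(set(human) | set(model))
    human_counts = Counter(human)
    model_counts = Counter(model)
    tvd = 0.5 * sum(abs(human_counts[l] - model_counts[l]) / n for l in labels)
    hits = Counter(h for h, m in zip(human, model) if h == m)
    recall = {l: hits[l] / human_counts[l] for l in labels if human_counts[l] > 0}
    return tvd, recall

# Toy paired labels (assumption): 100 images annotated by humans and a model.
human = ["Male"] * 50 + ["Female"] * 50
model = ["Male"] * 49 + ["Female"] * 1 + ["Male"] * 10 + ["Female"] * 40
tvd, recall = compare_annotations(human, model)
print(round(tvd, 3), recall)
```

A large gap between the per-group recall values would indicate exactly the differential error the load-bearing premise rules out. A formal test on paired labels would need a method suited to paired data, which this sketch does not attempt.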
Original abstract
Large-scale image-text datasets, such as LAION-5B, are foundational to modern AI systems, yet their vast scale and uncurated nature raise significant concerns about demographic and stereotypical biases. This study presents a comprehensive analysis of the demographic composition and representational, stereotypical, and intersectional biases in LAION-2B-en and LAION-2B-multi, the two main components of the LAION-5B dataset. Using state-of-the-art models (FairFace, DeepFace, and Emo-AffectNet), we analyze faces detected in the dataset to identify biases across age, gender, race, and expressed emotion. Our findings reveal substantial overrepresentation of young adults (20-39), White individuals, and males, alongside consistent underrepresentation of minority racial groups and middle-aged or older women across both dataset components. We also observe stereotypical associations between demographic attributes and emotions, such as "Anger" being predominantly linked to males and "Happiness" to females, pointing to systemic imbalances in the data. The consistency of these patterns across two demographic models and both components of LAION-5B demonstrates that these biases are deeply embedded in one of the most widely-used training datasets. Given the scale at which LAION-5B is used to train generative models, these demographic imbalances could shape the behavior and outputs of numerous downstream AI systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper analyzes demographic (age, gender, race) and emotion biases in the LAION-2B-en and LAION-2B-multi components of LAION-5B by applying face detection followed by the classifiers FairFace, DeepFace, and Emo-AffectNet. It reports substantial overrepresentation of young adults (20-39), White individuals, and males, underrepresentation of minority racial groups and middle-aged/older women, and stereotypical emotion associations (e.g., Anger predominantly with males, Happiness with females), with consistency across the two dataset components and two demographic models.
Significance. If the classifier-derived counts accurately reflect image content, the work provides a valuable empirical baseline on biases in one of the largest publicly used image-text datasets, with direct relevance to fairness in generative models and downstream CV systems. The reported consistency across components and models is a positive empirical feature.
major comments (2)
- §3 (Methodology): No stratified validation, calibration curves, or human-labeled accuracy metrics are reported for FairFace, DeepFace, or Emo-AffectNet on LAION images or a comparable held-out set; without this, differential error rates by age/gender/race cannot be ruled out as a source of the reported imbalances.
- §4 (Results) and Abstract: The central percentages and association claims rest on the untested assumption that classifier error rates do not covary with the true demographics or emotions; no sensitivity analysis or error-propagation bounds are supplied to quantify how plausible misclassification patterns would alter the over/under-representation findings.
minor comments (1)
- The abstract would be clearer if it stated the total number of detected faces, the face-detection threshold used, and the fraction of images discarded due to no-face or low-confidence detections.
Simulated Author's Rebuttal
We thank the referee for the constructive comments on classifier validation and robustness. We address each major comment below and will revise the manuscript accordingly to improve methodological transparency.
- Referee: §3 (Methodology): No stratified validation, calibration curves, or human-labeled accuracy metrics are reported for FairFace, DeepFace, or Emo-AffectNet on LAION images or a comparable held-out set; without this, differential error rates by age/gender/race cannot be ruled out as a source of the reported imbalances.
Authors: The manuscript selects FairFace, DeepFace, and Emo-AffectNet as established, publicly documented tools for demographic and emotion inference, relying on the performance metrics reported in their original publications rather than re-validating them on LAION. No new stratified validation, calibration curves, or human-labeled metrics on LAION (or a comparable set) are provided because the study's focus is an audit of dataset composition, not a benchmark of the classifiers themselves. We acknowledge that this leaves open the possibility that differential error rates contribute to the observed imbalances. In revision we will add an expanded discussion in §3 of the classifiers' documented limitations and a new limitations subsection that explicitly flags the absence of LAION-specific validation as a constraint on interpreting the counts. revision: yes
- Referee: §4 (Results) and Abstract: The central percentages and association claims rest on the untested assumption that classifier error rates do not covary with the true demographics or emotions; no sensitivity analysis or error-propagation bounds are supplied to quantify how plausible misclassification patterns would alter the over/under-representation findings.
Authors: We agree that the reported percentages and associations would be strengthened by explicit quantification of sensitivity to plausible misclassification. While the manuscript already notes consistency of patterns across two independent demographic classifiers, this does not constitute a formal sensitivity analysis. In the revised version we will insert a new subsection in §4 that performs a sensitivity analysis: we will simulate misclassification matrices drawn from the published error rates of FairFace, DeepFace, and Emo-AffectNet, propagate these errors through the demographic and emotion distributions, and report bounds on how the over- and under-representation statistics could shift under different error-covariance assumptions. Corresponding caveats will be added to the abstract. revision: yes
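The proposed sensitivity analysis can be illustrated in its simplest form for a binary attribute. The sketch below inverts the misclassification relation p_obs = sens * p + (1 - spec) * (1 - p) over a grid of assumed error rates and reports the range of corrected shares. The observed share of 0.65, the grid of rates, and the function names are hypothetical; the authors' planned analysis would draw full misclassification matrices from published error rates, which this sketch does not reproduce.

```python
def corrected_share(p_obs, sens, spec):
    """Estimate the true share of a group from its observed share.

    sens is P(predicted in group | truly in group); spec is
    P(predicted out of group | truly out of group). The result is
    clipped to [0, 1].
    """
    denom = sens + spec - 1.0
    if denom <= 0:
        raise ValueError("classifier must perform better than chance")
    p = (p_obs - (1.0 - spec)) / denom
    return min(1.0, max(0.0, p))

def share_bounds(p_obs, sens_values, spec_values):
    """Return (low, high) corrected shares over all rate combinations."""
    estimates = [
        corrected_share(p_obs, sens, spec)
        for sens in sens_values
        for spec in spec_values
    ]
    return min(estimates), max(estimates)

# Hypothetical inputs: 65% of faces predicted "Male", rates from 0.85 to 1.0.
rates = [0.85, 0.90, 0.95, 1.00]
low, high = share_bounds(0.65, rates, rates)
print(round(low, 3), round(high, 3))
```

Under these assumed rates the corrected share ranges from roughly 0.59 to 0.76, so the direction of the skew survives but its magnitude is uncertain. Bounds of this kind are what the referee's request would add to the reported percentages.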
Circularity Check
No circularity; empirical counts from external classifiers on external data
The paper performs a direct empirical audit: it runs three pre-trained, externally developed classifiers (FairFace, DeepFace, Emo-AffectNet) on faces detected in the public LAION-5B subsets and tabulates the resulting label distributions. No equations, fitted parameters, or self-citations are used to derive the reported over/under-representations or emotion associations; the counts are literal outputs of the external models. The work therefore contains no self-definitional steps, no fitted-input predictions, and no load-bearing self-citation chains. The skeptic's concern that classifier error rates may vary by demographic is a validity question, not a circularity question under the stated criteria.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption: FairFace, DeepFace, and Emo-AffectNet labels are sufficiently accurate and unbiased for the purpose of measuring dataset composition.