Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models

Gustavo Bonil; Helena Maia; Helio Pedrini; Jo\~ao Gondim; Marina dos Santos; Nadia Silva; Sandra Avila; Simone Hashiguti

arxiv: 2509.02834 · v1 · submitted 2025-09-02 · 💻 cs.CL · cs.AI

Clustering Discourses: Racial Biases in Short Stories about Women Generated by Large Language Models

Gustavo Bonil , Jo\~ao Gondim , Marina dos Santos , Simone Hashiguti , Helena Maia , Nadia Silva , Helio Pedrini , Sandra Avila This is my paper

Pith reviewed 2026-05-18 19:00 UTC · model grok-4.3

classification 💻 cs.CL cs.AI

keywords large language modelsracial biasesshort storiesdiscourse analysisPortuguesecolonial framingnarrative generationsemantic clustering

0 comments

The pith

Large language models embed colonially structured framings of Black and white women in generated short stories.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper examines how a large language model builds short stories in Portuguese about Black and white women. It generates 2100 texts, clusters them by semantic similarity, and selects groups for close discourse analysis. The clusters reveal three recurring ways of portraying women: overcoming social barriers, connecting to ancestral myths, and achieving personal realization. These ways rely on a fixed colonial view of the female body that keeps older power imbalances in place. Readers should care because AI text now reaches many people and can spread such framings without anyone noticing obvious mistakes.

Core claim

The central claim is that grammatically coherent and seemingly neutral short stories generated by large language models about Black and white women in Portuguese actually carry a crystallized, colonially structured framing of the female body. This framing reinforces historical inequalities through three main discursive representations: social overcoming, ancestral mythification, and subjective self-realization. The evidence comes from computational clustering of 2100 generated texts followed by qualitative analysis of the resulting groups.

What carries the argument

The mechanism of computationally clustering semantically similar generated stories and then applying qualitative discourse analysis to uncover the underlying colonial structures.

If this is right

Generated stories about women will tend to recycle colonial ideas about the body even when they sound neutral and coherent.
The combination of automated grouping and close reading can make subtle biases in AI text visible and testable.
These discursive patterns may shape public understanding of race and gender when AI content spreads widely.
Applying the same method to other models or languages could show whether the colonial framing is widespread.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Training data scraped from historical sources likely plants these colonial patterns deep in the model weights.
Future bias checks for creative writing tools should include discourse analysis of clustered outputs rather than just checking for explicit slurs.
Readers might start treating AI stories as neutral mirrors of culture when they actually carry older power structures.
Similar methods could be used to study biases in other generated content like news summaries or educational materials.

Load-bearing premise

The load-bearing premise is that clusters based on semantic similarity in the stories point to distinct discursive representations that reveal colonial framing, and that the sample chosen for reading does not skew the interpretation.

What would settle it

If a researcher clusters a new set of stories from the same model and finds that the qualitative themes do not match social overcoming, ancestral mythification, or subjective self-realization, or that they lack any colonial body framing, the central claim would be challenged.

Figures

Figures reproduced from arXiv: 2509.02834 by Gustavo Bonil, Helena Maia, Helio Pedrini, Jo\~ao Gondim, Marina dos Santos, Nadia Silva, Sandra Avila, Simone Hashiguti.

**Figure 2.** Figure 2: Representation of corpus’ structure in reduced dimension. [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: On the top left (a), a word cloud considering all the words in short [PITH_FULL_IMAGE:figures/full_fig_p009_3.png] view at source ↗

read the original abstract

This study investigates how large language models, in particular LLaMA 3.2-3B, construct narratives about Black and white women in short stories generated in Portuguese. From 2100 texts, we applied computational methods to group semantically similar stories, allowing a selection for qualitative analysis. Three main discursive representations emerge: social overcoming, ancestral mythification and subjective self-realization. The analysis uncovers how grammatically coherent, seemingly neutral texts materialize a crystallized, colonially structured framing of the female body, reinforcing historical inequalities. The study proposes an integrated approach, that combines machine learning techniques with qualitative, manual discourse analysis.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper claims that generating 2100 short stories in Portuguese with LLaMA 3.2-3B about Black and white women, followed by computational clustering of semantically similar texts, yields three main discursive representations—social overcoming, ancestral mythification, and subjective self-realization. Qualitative analysis of these clusters is presented as evidence that the generated narratives materialize a crystallized, colonially structured framing of the female body that reinforces historical inequalities. The work advocates an integrated approach combining machine learning techniques with manual discourse analysis.

Significance. If the clustering and interpretive steps prove robust, the findings would add concrete evidence that LLMs can embed colonial-era tropes into coherent, seemingly neutral Portuguese-language narratives about women. The mixed-methods design offers a replicable template for bias auditing in generative models, particularly for non-English settings, and could inform both technical mitigation strategies and broader discussions in computational social science.

major comments (2)

The manuscript supplies no information on the clustering algorithm, embedding model, distance metric, or criterion used to select the number of clusters. Because the headline result is the emergence of exactly three representations, these details are load-bearing; without them the reader cannot determine whether the groupings are stable or sensitive to arbitrary choices.
The qualitative step that assigns the labels 'social overcoming', 'ancestral mythification', and 'subjective self-realization' and then interprets them as 'colonially structured framing' is described without a pre-registered codebook, inter-annotator agreement statistics, or explicit sampling protocol for the stories chosen for manual reading. This mapping is central to the claim that the LLM outputs reproduce historical inequalities rather than generic narrative patterns.

minor comments (2)

The abstract states that 2100 texts were generated but does not clarify the distribution across Black and white women or the prompt templates used; adding these details would improve reproducibility.
A short discussion of how the Portuguese language setting might interact with the observed representations (e.g., lexical choices unavailable in English) would strengthen the cross-linguistic contribution.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which identifies key areas where additional methodological detail will strengthen the paper. We respond to each major comment below and indicate the revisions we will implement.

read point-by-point responses

Referee: The manuscript supplies no information on the clustering algorithm, embedding model, distance metric, or criterion used to select the number of clusters. Because the headline result is the emergence of exactly three representations, these details are load-bearing; without them the reader cannot determine whether the groupings are stable or sensitive to arbitrary choices.

Authors: We agree that these details are necessary for assessing reproducibility and robustness. The initial submission omitted a dedicated methods subsection on the computational pipeline for brevity. In the revised manuscript we will add a full description of the embedding model, clustering algorithm, distance metric, and cluster-number selection procedure, along with supplementary analyses of cluster stability under varied parameter settings. revision: yes
Referee: The qualitative step that assigns the labels 'social overcoming', 'ancestral mythification', and 'subjective self-realization' and then interprets them as 'colonially structured framing' is described without a pre-registered codebook, inter-annotator agreement statistics, or explicit sampling protocol for the stories chosen for manual reading. This mapping is central to the claim that the LLM outputs reproduce historical inequalities rather than generic narrative patterns.

Authors: We acknowledge that greater transparency is warranted. The thematic labels were developed through iterative team-based close reading grounded in discourse-analytic traditions. In revision we will insert an explicit sampling protocol (stratified random selection from each cluster) and a step-by-step account of how the categories were derived. We did not pre-register a codebook or compute inter-annotator agreement; we will therefore add a limitations paragraph addressing interpretive subjectivity while preserving the exploratory character of the study. revision: partial

Circularity Check

0 steps flagged

No significant circularity in the empirical derivation chain

full rationale

The paper generates 2100 short stories in Portuguese using LLaMA 3.2-3B, applies computational clustering to group semantically similar texts, selects exemplars for qualitative discourse analysis, and interprets three emergent representations (social overcoming, ancestral mythification, subjective self-realization) as evidence of colonially structured framing. No equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations appear in the chain; the interpretive conclusions are derived from the generated data and manual analysis rather than reducing to prior inputs by construction. The study is self-contained against external benchmarks of LLM output and clustering.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no explicit free parameters, axioms, or invented entities; the three discursive representations are presented as emergent findings rather than postulated constructs.

pith-pipeline@v0.9.0 · 5653 in / 1011 out tokens · 37642 ms · 2026-05-18T19:00:10.829754+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We applied clustering algorithms in the original 1024-dimensional vectors... DBSCAN yielding a VRC of 114.41 and with three clusters... qualitative analysis... Discursive Sequences of Reference (DSRs)

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages

[1]

Abid, A., Farooqi, M., and Zou, J. (2021). Persistent anti-muslim bias in large language models. In AAAI/ACM Conference on AI, Ethics, and Society , pages 298--306

work page 2021
[2]

Araújo, J. (2024). Racismo algorítmico e microagressões nas redes sociais . Domínios de Lingu@gem , 18:e1849

work page 2024
[3]

Assi, F. M. and Caseli, H. d. M. (2024). Biases in gpt-3.5 turbo model: a case study regarding gender and language. In Simp \'o sio Brasileiro de Tecnologia da Informa c \ a o e da Linguagem Humana (STIL) , pages 294--305. SBC

work page 2024
[4]

L., Barocas, S., Daum \'e III, H., and Wallach, H

Blodgett, S. L., Barocas, S., Daum \'e III, H., and Wallach, H. (2020). Language (technology) is power: A critical survey of ``bias'' in NLP . In 58th Annual Meeting of the Association for Computational Linguistics , pages 5454--5476

work page 2020
[5]

Bonil, G., Hashiguti, S., Silva, J., Gondim, J., Maia, H., Silva, N., Pedrini, H., and Avila, S. (2025). Yet another algorithmic bias: A discursive analysis of large language models reinforcing dominant discourses on gender and race. arXiv 2508.10304

work page arXiv 2025
[6]

and Bowman, S

Bordia, S. and Bowman, S. R. (2019). Identifying and reducing gender bias in word-level language models. In Conference of the North A merican Chapter of the Association for Computational Linguistics: Student Research Workshop , pages 7--15

work page 2019
[7]

and Harabasz, J

Caliński, T. and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics -- Theory and Methods , 3:1--27

work page 1974
[8]

Cervantes, J., Garcia-Lamont, F., Rodr\'iguez-Mazahua, L., and Lopez, A. (2020). A comprehensive survey on support vector machine classification: Applications , challenges and trends. Neurocomputing , 408:189--215

work page 2020
[9]

Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., and Liu, Z. (2024). M 3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. In Findings of the Association for Computational Linguistics: ACL 2024 , pages 2318--2335. Association for Computational Linguistics

work page 2024
[10]

X., Silva, D

Corazza, B. X., Silva, D. V. S., and Neves, C. A. d. B. (2024). ``Ser ou não ser'' digno de uma história de amor: inovações do ChatGPT e persistência colonial na validação de existências . Domínios de Lingu@gem , 18:e1831

work page 2024
[11]

and Vapnik, V

Cortes, C. and Vapnik, V. (1995). Support-Vector Networks . Machine Learning , 20(3):273--297

work page 1995
[12]

Courtine, J. J. (1981). Analyse du discours politique (le discours communiste addres \'e aux chr \'e tiens). Langages , 62:19--128

work page 1981
[13]

Grant, D. L. and Grant, M. B. (1975). Some notes on the capital `` N ''. Phylon (1960-) , 36(4):435--443

work page 1975
[14]

Grattafiori, A. et al. (2024). The Llama 3 Herd of Models

work page 2024
[15]

Guardian, T. (2021). Facebook data leak: Details from 533 million users found on website for hackers

work page 2021
[16]

Hashiguti, S. T. (2015). Corpo de Mem \'o ria . Paco Editorial, S \ a o Paulo

work page 2015
[17]

Kilomba, G. (2016). Plantation memories: episodes of everyday racism . UNRAST-Verlag, Münster, 4th edition edition

work page 2016
[18]

and Bamman, D

Lucy, L. and Bamman, D. (2021). Gender and representation bias in GPT-3 generated stories . In Third Workshop on Narrative Understanding , pages 48--55

work page 2021
[19]

Maxwell, J. (1992). Understanding and validity in qualitative research. Harvard Educational Review , 62(3):279--301

work page 1992
[20]

R., and Rudinger, R

May, C., Wang, A., Bordia, S., Bowman, S. R., and Rudinger, R. (2019). On measuring social biases in sentence encoders. In Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies , pages 622--628. Association for Computational Linguistics

work page 2019
[21]

McInnes, L., Healy, J., Saul, N., and Grossberger, L. (2018). Umap: Uniform manifold approximation and projection. The Journal of Open Source Software , 3(29):861

work page 2018
[22]

Pêcheux, M. (2022). O Discurso - Estrutura ou Acontecimento . Pontes, Campinas. Translation: Eni Puccinelli Orlandi

work page 2022
[23]

Rajagopalan, K. (2023). A disciplina chamada lingu \' stica aplicada e as contribui c \ o es de Luiz Paulo da Moita Lopes . Oficina de Lingu \' stica Aplicada INdisciplinar: homenagem a Luiz Paulo da Moita Lopes. Campinas, SP: Editora da Unicamp , pages 193--212

work page 2023
[24]

Remy, P. (2021). Name dataset. https://github.com/philipperemy/name-dataset

work page 2021
[25]

Salinas, A., Haim, A., and Nyarko, J. (2024). What's in a name? Auditing large language models for race and gender bias . arXiv preprint arXiv:2402.14875

work page arXiv 2024
[26]

Sheng, E., Chang, K.-W., Natarajan, P., and Peng, N. (2019). The woman worked as a babysitter: On biases in language generation. In International Joint Conference on Natural Language Processing , pages 3407--3412

work page 2019
[27]

Silva, T. (2022). Racismo Algor \' tmico: Intelig \^e ncia Artificial e Discrimina c \ a o nas Redes Digitais . Edi c \ o es SESC SP, São Paulo

work page 2022
[28]

Tharps, L. L. (2014). I refuse to remain in the lower case. https://myamericanmeltingpot.com/2014/06/02/i-refuse-to-remain-in-the-lower-case. [Accessed 22-1-2025]

work page 2014
[29]

N., Gautam, S., Panchanadikar, R., Huang, T.-H., and Wilson, S

Venkit, P. N., Gautam, S., Panchanadikar, R., Huang, T.-H., and Wilson, S. (2023). Nationality bias in text generation. arXiv preprint arXiv:2302.02463

work page arXiv 2023
[30]

Zao-Sanders, M. (2025). How People Are Really Using Gen AI in 2025 . https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025

work page 2025
[31]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page
[32]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page

[1] [1]

Abid, A., Farooqi, M., and Zou, J. (2021). Persistent anti-muslim bias in large language models. In AAAI/ACM Conference on AI, Ethics, and Society , pages 298--306

work page 2021

[2] [2]

Araújo, J. (2024). Racismo algorítmico e microagressões nas redes sociais . Domínios de Lingu@gem , 18:e1849

work page 2024

[3] [3]

Assi, F. M. and Caseli, H. d. M. (2024). Biases in gpt-3.5 turbo model: a case study regarding gender and language. In Simp \'o sio Brasileiro de Tecnologia da Informa c \ a o e da Linguagem Humana (STIL) , pages 294--305. SBC

work page 2024

[4] [4]

L., Barocas, S., Daum \'e III, H., and Wallach, H

Blodgett, S. L., Barocas, S., Daum \'e III, H., and Wallach, H. (2020). Language (technology) is power: A critical survey of ``bias'' in NLP . In 58th Annual Meeting of the Association for Computational Linguistics , pages 5454--5476

work page 2020

[5] [5]

Bonil, G., Hashiguti, S., Silva, J., Gondim, J., Maia, H., Silva, N., Pedrini, H., and Avila, S. (2025). Yet another algorithmic bias: A discursive analysis of large language models reinforcing dominant discourses on gender and race. arXiv 2508.10304

work page arXiv 2025

[6] [6]

and Bowman, S

Bordia, S. and Bowman, S. R. (2019). Identifying and reducing gender bias in word-level language models. In Conference of the North A merican Chapter of the Association for Computational Linguistics: Student Research Workshop , pages 7--15

work page 2019

[7] [7]

and Harabasz, J

Caliński, T. and Harabasz, J. (1974). A dendrite method for cluster analysis. Communications in Statistics -- Theory and Methods , 3:1--27

work page 1974

[8] [8]

Cervantes, J., Garcia-Lamont, F., Rodr\'iguez-Mazahua, L., and Lopez, A. (2020). A comprehensive survey on support vector machine classification: Applications , challenges and trends. Neurocomputing , 408:189--215

work page 2020

[9] [9]

Chen, J., Xiao, S., Zhang, P., Luo, K., Lian, D., and Liu, Z. (2024). M 3-embedding: Multi-linguality, multi-functionality, multi-granularity text embeddings through self-knowledge distillation. In Findings of the Association for Computational Linguistics: ACL 2024 , pages 2318--2335. Association for Computational Linguistics

work page 2024

[10] [10]

X., Silva, D

Corazza, B. X., Silva, D. V. S., and Neves, C. A. d. B. (2024). ``Ser ou não ser'' digno de uma história de amor: inovações do ChatGPT e persistência colonial na validação de existências . Domínios de Lingu@gem , 18:e1831

work page 2024

[11] [11]

and Vapnik, V

Cortes, C. and Vapnik, V. (1995). Support-Vector Networks . Machine Learning , 20(3):273--297

work page 1995

[12] [12]

Courtine, J. J. (1981). Analyse du discours politique (le discours communiste addres \'e aux chr \'e tiens). Langages , 62:19--128

work page 1981

[13] [13]

Grant, D. L. and Grant, M. B. (1975). Some notes on the capital `` N ''. Phylon (1960-) , 36(4):435--443

work page 1975

[14] [14]

Grattafiori, A. et al. (2024). The Llama 3 Herd of Models

work page 2024

[15] [15]

Guardian, T. (2021). Facebook data leak: Details from 533 million users found on website for hackers

work page 2021

[16] [16]

Hashiguti, S. T. (2015). Corpo de Mem \'o ria . Paco Editorial, S \ a o Paulo

work page 2015

[17] [17]

Kilomba, G. (2016). Plantation memories: episodes of everyday racism . UNRAST-Verlag, Münster, 4th edition edition

work page 2016

[18] [18]

and Bamman, D

Lucy, L. and Bamman, D. (2021). Gender and representation bias in GPT-3 generated stories . In Third Workshop on Narrative Understanding , pages 48--55

work page 2021

[19] [19]

Maxwell, J. (1992). Understanding and validity in qualitative research. Harvard Educational Review , 62(3):279--301

work page 1992

[20] [20]

R., and Rudinger, R

May, C., Wang, A., Bordia, S., Bowman, S. R., and Rudinger, R. (2019). On measuring social biases in sentence encoders. In Conference of the North A merican Chapter of the Association for Computational Linguistics: Human Language Technologies , pages 622--628. Association for Computational Linguistics

work page 2019

[21] [21]

McInnes, L., Healy, J., Saul, N., and Grossberger, L. (2018). Umap: Uniform manifold approximation and projection. The Journal of Open Source Software , 3(29):861

work page 2018

[22] [22]

Pêcheux, M. (2022). O Discurso - Estrutura ou Acontecimento . Pontes, Campinas. Translation: Eni Puccinelli Orlandi

work page 2022

[23] [23]

Rajagopalan, K. (2023). A disciplina chamada lingu \' stica aplicada e as contribui c \ o es de Luiz Paulo da Moita Lopes . Oficina de Lingu \' stica Aplicada INdisciplinar: homenagem a Luiz Paulo da Moita Lopes. Campinas, SP: Editora da Unicamp , pages 193--212

work page 2023

[24] [24]

Remy, P. (2021). Name dataset. https://github.com/philipperemy/name-dataset

work page 2021

[25] [25]

Salinas, A., Haim, A., and Nyarko, J. (2024). What's in a name? Auditing large language models for race and gender bias . arXiv preprint arXiv:2402.14875

work page arXiv 2024

[26] [26]

Sheng, E., Chang, K.-W., Natarajan, P., and Peng, N. (2019). The woman worked as a babysitter: On biases in language generation. In International Joint Conference on Natural Language Processing , pages 3407--3412

work page 2019

[27] [27]

Silva, T. (2022). Racismo Algor \' tmico: Intelig \^e ncia Artificial e Discrimina c \ a o nas Redes Digitais . Edi c \ o es SESC SP, São Paulo

work page 2022

[28] [28]

Tharps, L. L. (2014). I refuse to remain in the lower case. https://myamericanmeltingpot.com/2014/06/02/i-refuse-to-remain-in-the-lower-case. [Accessed 22-1-2025]

work page 2014

[29] [29]

N., Gautam, S., Panchanadikar, R., Huang, T.-H., and Wilson, S

Venkit, P. N., Gautam, S., Panchanadikar, R., Huang, T.-H., and Wilson, S. (2023). Nationality bias in text generation. arXiv preprint arXiv:2302.02463

work page arXiv 2023

[30] [30]

Zao-Sanders, M. (2025). How People Are Really Using Gen AI in 2025 . https://hbr.org/2025/04/how-people-are-really-using-gen-ai-in-2025

work page 2025

[31] [31]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page

[32] [32]

write newline

" write newline "" before.all 'output.state := FUNCTION fin.entry add.period write newline FUNCTION new.block output.state before.all = 'skip after.block 'output.state := if FUNCTION new.sentence output.state after.block = 'skip output.state before.all = 'skip after.sentence 'output.state := if if FUNCTION not #0 #1 if FUNCTION and 'skip pop #0 if FUNCTIO...

work page