Measuring Stereotype and Deviation Biases in Large Language Models

Daniel Wang; Eli Brignac; Minjia Mao; Xiao Fang

arxiv: 2508.06649 · v3 · pith:6I5X2XJDnew · submitted 2025-08-08 · 💻 cs.CL

Pith reviewed 2026-05-21 23:54 UTC · model grok-4.3

classification 💻 cs.CL

keywords large language models · stereotype bias · deviation bias · demographic profiles · bias measurement · LLM fairness · profile generation · AI ethics

The pith

Large language models show both stereotype bias and deviation bias when generating individual profiles.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests how LLMs link demographic groups to traits such as political affiliation, religion, and sexual orientation. It does this by prompting four models to create person profiles and then measuring two things: consistent trait associations with specific groups, and gaps between the groups that appear in the generated profiles versus real population data. Every model tested produced both types of bias across multiple groups. These patterns matter because LLMs are used to create content that can shape user perceptions or decisions in many settings.

Core claim

When four advanced LLMs are prompted to generate profiles of individuals, they exhibit significant stereotype bias by associating particular demographic groups with attributes such as political affiliation, religion, and sexual orientation, and they exhibit deviation bias by producing demographic distributions that differ from real-world references.

What carries the argument

Profile generation task that extracts demographic associations from model outputs and compares them to real-world distributions to measure stereotype and deviation biases.
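
The comparison against real-world distributions can be sketched as follows. This is a minimal illustration, not the authors' code. It assumes the Deviation Bias Score quoted further down this page (number of significant p-values divided by the total number of p-values, with one binomial test per cell), a significance level of 0.05, and an exact two-sided binomial test written with the standard library. The function names and the reference shares are hypothetical placeholders; the observed counts correspond to the Male (n=500) row for claude-3.5-sonnet under implicit prompts (4.20%, 93.80%, 2.00%) from the supplementary excerpt at the end of the reference list.

```python
from math import exp, lgamma, log


def binom_pmf(k: int, n: int, p: float) -> float:
    """Probability of exactly k successes in n trials (requires 0 < p < 1)."""
    log_coeff = lgamma(n + 1) - lgamma(k + 1) - lgamma(n - k + 1)
    return exp(log_coeff + k * log(p) + (n - k) * log(1.0 - p))


def binom_test_two_sided(k: int, n: int, p: float) -> float:
    """Exact two-sided p-value: total mass of outcomes no more likely than k."""
    observed = binom_pmf(k, n, p)
    tail = sum(
        pmf
        for pmf in (binom_pmf(i, n, p) for i in range(n + 1))
        if pmf <= observed * (1.0 + 1e-7)  # small tolerance for float ties
    )
    return min(1.0, tail)


def deviation_bias_score(cells, alpha: float = 0.05) -> float:
    """Fraction of (group, category) binomial tests that are significant.

    `cells` is a list of (counts, reference) pairs, one per demographic group:
    counts maps category -> number of generated profiles,
    reference maps category -> real-world share (strictly between 0 and 1).
    """
    p_values = []
    for counts, reference in cells:
        n = sum(counts.values())
        for category, share in reference.items():
            p_values.append(binom_test_two_sided(counts.get(category, 0), n, share))
    return sum(p < alpha for p in p_values) / len(p_values)


# Observed counts: 4.20%, 93.80%, 2.00% of 500 profiles.
observed_male = {"conservative": 21, "liberal": 469, "neutral": 10}
# Hypothetical reference shares, for illustration only (not survey data).
hypothetical_reference = {"conservative": 0.35, "liberal": 0.35, "neutral": 0.30}

print(deviation_bias_score([(observed_male, hypothetical_reference)]))
```

Counting significant tests rather than averaging effect sizes makes the score sensitive to sample size: with 500 profiles per group, even modest gaps can register as significant, which is one reason the choice of reference data matters so much.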

If this is right

LLMs may infer user attributes in biased ways across different applications.
Outputs generated by these models carry potential harms due to the observed biases.
The biases appear consistently in all four models examined toward multiple demographic groups.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same profile-generation method could be applied to other attributes such as occupation or income to check for additional bias patterns.
Downstream tools that rely on LLM outputs for personalization or summarization may inherit these demographic skews.
Prompt engineering or fine-tuning adjustments could be tested as ways to reduce the measured deviations from real-world distributions.

Load-bearing premise

Real-world demographic distributions serve as accurate, complete, and directly comparable reference points to the distributions extracted from LLM-generated profiles.

What would settle it

Re-running the profile generation with varied prompt wording or updated real-world demographic data and finding that the extracted distributions match the references with no significant group-trait associations would undermine the reported biases.

Figures

Figures reproduced from arXiv: 2508.06649 by Daniel Wang, Eli Brignac, Minjia Mao, Xiao Fang.

**Figure 1.** The political affiliation distributions for texts generated using implicit inputs. When given implicit prompts, all four models overwhelmingly classify individuals as liberal in their political affiliation, as seen in…

**Figure 2.** The political affiliation distributions for texts generated using explicit inputs. When given explicit prompts, the LLMs also tend to overrepresent liberal political affiliation while underrepresenting the conservative and neutral political affiliations, as seen in…

**Figure 3.** The religious affiliation distributions for texts generated using implicit inputs. Tables A9, A10, A11 and A12 report the results of religion outputs when asked with implicit prompts. It is demonstrated that all four models we investigated tend to generate "unaffiliated" and "Christian" as the response for a person’s religion. We observe a substantial percentage of "Christian" in several demographic groups…

**Figure 4.** The sexual orientation distributions for texts generated using implicit inputs. […] significantly overrepresent minority sexual orientations compared to real-world statistics, where the majority of individuals identify as heterosexual [19]. Looking more closely at the results, claude-3.5-sonnet, llama-3.1-70b, and command-r-plus exhibit a higher proportion of heterosexual responses for White (16%, 10%, and 20%)…

**Figure 5.** The sexual orientation distributions for texts generated using explicit inputs. When given explicit prompts, the LLMs also tend to overrepresent minority sexual orientations (homosexual, bisexual, etc.). However, an exception exists when the models are asked about the sexual orientation of Baby Boomer individuals. The percentage of heterosexual responses for Baby Boomers is 100% for claude-3.5-sonnet, llam…

**Figure 6.** Prompt template and model output example.

Original abstract

Large language models (LLMs) are widely applied across diverse domains, raising concerns about their limitations and potential risks. In this study, we investigate two types of bias that LLMs may display: stereotype bias and deviation bias. Stereotype bias refers to when LLMs consistently associate specific traits with a particular demographic group. Deviation bias reflects the disparity between the demographic distributions extracted from LLM-generated content and real-world demographic distributions. By asking four advanced LLMs to generate profiles of individuals, we examine the associations between each demographic group and attributes such as political affiliation, religion, and sexual orientation. Our experimental results show that all examined LLMs exhibit both significant stereotype bias and deviation bias towards multiple groups. Our findings uncover the biases that occur when LLMs infer user attributes and shed light on the potential harms of LLM-generated outputs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The paper measures stereotype and deviation biases through LLM profile generation for attributes like politics and religion, but the deviation claims depend on unverified real-world references without enough method details to confirm they reflect model behavior rather than data artifacts.

The core finding is that four LLMs show both stereotype bias in trait associations and deviation bias when their generated profiles are compared to real demographic distributions. The work applies this to profile generation tasks involving political affiliation, religion, and sexual orientation, which is a direct way to test how models infer user attributes from limited input.

Referee Report

2 major / 1 minor

Summary. The manuscript investigates two biases in LLMs: stereotype bias (consistent association of traits with demographic groups) and deviation bias (disparity between LLM-generated demographic distributions and real-world ones). By prompting four advanced LLMs to generate individual profiles, the authors analyze associations with attributes such as political affiliation, religion, and sexual orientation, concluding that all examined models exhibit significant stereotype and deviation biases toward multiple groups.

Significance. If the experimental results are supported by adequate methodological details and robustness checks, the work would offer concrete evidence of risks in LLM inference of user attributes, with implications for safer use in content generation and personalization tasks.

major comments (2)

[Abstract] Abstract: the claim that 'all examined LLMs exhibit both significant stereotype bias and deviation bias' is presented without any reported sample sizes, statistical tests, prompt templates, or controls for confounding variables, rendering it impossible to verify whether the data support the central results.
[Methodology] The deviation bias definition relies on direct comparison to real-world demographic distributions for attributes like political affiliation, religion, and sexual orientation; however, the manuscript provides no verification of reference data accuracy, completeness, recency, or comparability to LLM outputs, nor any robustness checks against prompt wording or training-data effects, which is load-bearing for interpreting deviations as model bias rather than artifact.

minor comments (1)

[Abstract] The abstract would benefit from briefly stating the number of profiles generated per model and the exact LLMs tested to improve clarity for readers.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the thoughtful and constructive review. We address each major comment point by point below. Where the comments identify gaps in detail or verification, we have revised the manuscript to incorporate additional information and checks while preserving the original experimental design and findings.

Referee: [Abstract] Abstract: the claim that 'all examined LLMs exhibit both significant stereotype bias and deviation bias' is presented without any reported sample sizes, statistical tests, prompt templates, or controls for confounding variables, rendering it impossible to verify whether the data support the central results.

Authors: We agree that the abstract would benefit from additional quantitative context to allow readers to assess the claims at a glance. In the revised manuscript we have expanded the abstract to report the total number of generated profiles (500 per model per demographic category across four models), the primary statistical tests used (chi-squared tests for stereotype associations with p-values below 0.01 after correction), and a brief reference to prompt templates and controls. Full templates, exact sample sizes per attribute, and the complete set of confounding-variable controls (including prompt-order randomization and temperature settings) are now explicitly cross-referenced to the Methods and Appendix sections.

revision: yes

Referee: [Methodology] The deviation bias definition relies on direct comparison to real-world demographic distributions for attributes like political affiliation, religion, and sexual orientation; however, the manuscript provides no verification of reference data accuracy, completeness, recency, or comparability to LLM outputs, nor any robustness checks against prompt wording or training-data effects, which is load-bearing for interpreting deviations as model bias rather than artifact.

Authors: We have added a new subsection (3.3) that documents the exact reference sources (Pew Research Center 2022 surveys for political affiliation and religion; Williams Institute 2021 estimates for sexual orientation), their geographic scope (U.S. adult population), publication dates, and sample sizes. We also include a short discussion of comparability, noting that LLM outputs were mapped to the same categorical bins used in the reference surveys. For robustness, we now report results from an additional set of 200 profiles generated with rephrased prompts; deviation patterns remained directionally consistent. Training-data effects cannot be isolated without model transparency and are therefore acknowledged as an inherent limitation in the revised Discussion; we do not claim to have fully ruled them out.

revision: partial
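
The simulated rebuttal above names chi-squared tests for the stereotype associations. As a rough sketch of what such a test computes, the snippet below evaluates the Pearson chi-squared statistic for a group-by-attribute contingency table of profile counts. The function name and the example counts are hypothetical, and nothing here is taken from the paper's analysis code. Turning the statistic into a p-value needs the chi-squared distribution, for example `scipy.stats.chi2.sf(statistic, dof)` if SciPy is available.

```python
def chi_squared_statistic(table):
    """Pearson chi-squared statistic and degrees of freedom for an r x c table.

    `table` is a list of rows of counts (rows: demographic groups,
    columns: attribute categories). Assumes every row and column total
    is nonzero, otherwise an expected count would be zero.
    """
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    grand_total = sum(row_totals)

    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            # Expected count if group and attribute were independent.
            expected = row_totals[i] * col_totals[j] / grand_total
            statistic += (observed - expected) ** 2 / expected

    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


# Hypothetical counts for two groups across three attribute categories.
example_table = [
    [40, 440, 20],
    [60, 420, 20],
]
statistic, dof = chi_squared_statistic(example_table)
print(statistic, dof)
```

A statistic of zero means the attribute distribution is identical across groups; larger values indicate a stronger group-attribute association relative to what independence would predict.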

Circularity Check

0 steps flagged

No circularity: empirical measurements against external real-world data

The paper defines and measures stereotype bias and deviation bias through direct comparison of LLM-generated profiles to external real-world demographic distributions for attributes like political affiliation, religion, and sexual orientation. No equations, fitted parameters, predictions, or self-citations appear in the provided text that would reduce any claim to its own inputs by construction. The central results are observational and benchmarked externally, satisfying the criteria for a self-contained analysis with no load-bearing circular steps.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The measurement of deviation bias rests on the assumption that external demographic statistics serve as an unbiased ground truth; no free parameters or new entities are introduced in the abstract.

axioms (1)

domain assumption: Real-world demographic distributions provide an accurate and unbiased reference for measuring deviation bias.
Invoked when comparing LLM-generated profile distributions to external statistics.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Paper passage: Deviation bias reflects the disparity between the demographic distributions extracted from LLM-generated content and real-world demographic distributions... binomial test... Deviation Bias Score = # of significant p-values / # of total p-values

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · absolute_floor_iff_bare_distinguishability · unclear

Paper passage: Stereotype Bias Score = mean(maxKL gender, maxKL ethnicity, maxKL age)
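
The quoted formula, Stereotype Bias Score = mean(maxKL gender, maxKL ethnicity, maxKL age), can be illustrated with a short sketch. This page does not define maxKL, so the code below makes an assumption: for each demographic dimension, maxKL is taken to be the largest pairwise KL divergence between the attribute distributions of any two groups in that dimension. The function names, the smoothing constant, and all example distributions are hypothetical.

```python
from itertools import permutations
from math import log
from statistics import mean


def kl_divergence(p, q, eps=1e-9):
    """KL(P || Q) over the union of categories, with additive smoothing."""
    categories = sorted(set(p) | set(q))
    p_s = [p.get(c, 0.0) + eps for c in categories]
    q_s = [q.get(c, 0.0) + eps for c in categories]
    p_total, q_total = sum(p_s), sum(q_s)
    return sum(
        (a / p_total) * log((a / p_total) / (b / q_total))
        for a, b in zip(p_s, q_s)
    )


def max_pairwise_kl(group_distributions):
    """Largest KL divergence over all ordered pairs of groups (assumed maxKL)."""
    return max(
        kl_divergence(group_distributions[a], group_distributions[b])
        for a, b in permutations(group_distributions, 2)
    )


def stereotype_bias_score(dimensions):
    """Mean of the per-dimension maxKL values (gender, ethnicity, age, ...)."""
    return mean(max_pairwise_kl(groups) for groups in dimensions.values())


# Hypothetical attribute distributions per group, for illustration only.
dimensions = {
    "gender": {
        "group_a": {"x": 0.90, "y": 0.10},
        "group_b": {"x": 0.80, "y": 0.20},
    },
    "ethnicity": {
        "group_c": {"x": 0.70, "y": 0.30},
        "group_d": {"x": 0.95, "y": 0.05},
    },
    "age": {
        "group_e": {"x": 0.50, "y": 0.50},
        "group_f": {"x": 0.99, "y": 0.01},
    },
}
print(stereotype_bias_score(dimensions))
```

Because KL divergence is asymmetric, the sketch takes the maximum over ordered pairs; a symmetric alternative such as Jensen-Shannon divergence would give a different scale. The smoothing term keeps the divergence finite when a model never produces a given category for some group.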

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

42 extracted references · 42 canonical work pages · 2 internal anchors

[1] Zhao, W. X. et al. A survey of large language models. arXiv preprint arXiv:2303.18223 (2023)

[2] Hu, K. Chatgpt sets record for fastest-growing user base - analyst note. https://www.reuters.com/technology/chatgpt-sets-record-fastest-growing-user-base-analyst-note-2023-02-01/ (2023). Accessed on October 28, 2024

[3] Friedman, B. & Nissenbaum, H. Bias in computer systems. ACM Transactions on Inf. Syst. 14, 330–347 (1996)

[4] Gallegos, I. O. et al. Bias and fairness in large language models: A survey. Comput. Linguist. 1–79 (2024)

[5] Gupta, S. et al. Bias runs deep: Implicit reasoning biases in persona-assigned llms. arXiv preprint arXiv:2311.04892 (2023)

[6] Zhu, S., Wang, W. & Liu, Y. Quite good, but not enough: Nationality bias in large language models–a case study of chatgpt. arXiv preprint arXiv:2405.06996 (2024)

[7] Huang, P.-S. et al. Reducing sentiment bias in language models via counterfactual evaluation. arXiv preprint arXiv:1911.03064 (2019)

[8] Venkit, P. N., Gautam, S., Panchanadikar, R., Huang, T.-H. & Wilson, S. Nationality bias in text generation. arXiv preprint arXiv:2302.02463 (2023)

[9] Leidinger, A. & Rogers, R. How are llms mitigating stereotyping harms? learning from search engine studies. In Proceedings of the AAAI/ACM Conference on AI, Ethics, and Society, vol. 7, 839–854 (2024)

[10] Abid, A., Farooqi, M. & Zou, J. Persistent anti-muslim bias in large language models. In Proceedings of the 2021 AAAI/ACM Conference on AI, Ethics, and Society, 298–306 (2021)

[11] Wan, Y. et al. "kelly is a warm person, joseph is a role model": Gender biases in llm-generated reference letters. arXiv preprint arXiv:2310.09219 (2023)

[12] Fang, X. et al. Bias of ai-generated content: an examination of news produced by large language models. Sci. Reports 14, 5224 (2024)

[13] Shrawgi, H., Rath, P., Singhal, T. & Dandapat, S. Uncovering stereotypes in large language models: A task complexity-based approach. In Proceedings of the 18th Conference of the European Chapter of the Association for Computational Linguistics (Volume 1: Long Papers), 1841–1857 (2024)

[14] von der Heyde, L., Haensch, A.-C. & Wenz, A. Assessing bias in llm-generated synthetic datasets: The case of german voter behavior. Tech. Rep., Center for Open Science (2023)

[15] Wang, Z. et al. Bias amplification: Language models as increasingly biased media. arXiv preprint arXiv:2410.15234 (2024)

[16] Chen, X. et al. Evaluation of bias towards medical professionals in large language models (2024). 2407.12031

[17] Zhang, Z. et al. A survey on the memory mechanism of large language model based agents. arXiv preprint arXiv:2404.13501 (2024)

[18] Giorgi, S. et al. Explicit and implicit large language model personas generate opinions but fail to replicate deeper perceptions and biases. arXiv preprint arXiv:2406.14462 (2024)

[19] Jones, J. M. Growing lgbt id seen across major u.s. racial, ethnic groups. https://news.gallup.com/poll/393464/growing-lgbt-seen-across-major-racial-ethnic-groups.aspx (2022). Accessed on January 17, 2025

[20] U.S. Social Security Administration. Popular baby names by decade. https://www.ssa.gov/oact/babynames/decades/index.html (2024). Accessed on April 14, 2025.

[21] Sisense. What baby names tell us about ethnic and gender trends. https://cdn.sisense.com/wp-content/uploads/What-Baby-Names-Tell-Us-About-Ethnic-and-Gender-Trends.pdf (2017). Accessed on April 13, 2025

[22] Kochhar, R. The state of the american middle class. https://www.pewresearch.org/race-and-ethnicity/2024/05/31/the-state-of-the-american-middle-class/ (2024). Accessed on April 13, 2025

[23] Pew Research Center. Trends in party affiliation among demographic groups. https://www.pewresearch.org/politics/2018/03/20/1-trends-in-party-affiliation-among-demographic-groups/ (2018). Accessed on April 13, 2025

[24] Pew Research Center. 2023–24 u.s. religious landscape study interactive database. https://www.pewresearch.org/religious-landscape-study/database/ (2025). Accessed on April 13, 2025

[25] Pew Research Center. Gender composition of religious traditions. https://www.pewresearch.org/religious-landscape-study/database/gender-composition/ (2024). Accessed on April 13, 2025

[26] Pew Research Center. Racial and ethnic composition of religious traditions. https://www.pewresearch.org/religious-landscape-study/database/racial-and-ethnic-composition/ (2025). Accessed on April 13, 2025

[27] Jones, J. M. Growing lgbt identification seen across major u.s. racial, ethnic groups. https://news.gallup.com/poll/393464/growing-lgbt-seen-across-major-racial-ethnic-groups.aspx (2022). Accessed on April 13, 2025

[28] Choi, S. K., Wilson, B. D., Bouton, L. J. & Mallory, C. Aapi lgbt adults in the us. https://williamsinstitute.law.ucla.edu/publications/lgbt-aapi-adults-in-the-us/ (2021). Accessed on April 13, 2025

[29] Jones, J. M. Lgbtq+ identification in u.s. now at 7.6%. https://news.gallup.com/poll/611864/lgbtq-identification.aspx (2024). Accessed on April 13, 2025

[30] Pew Research Center. Generational cohort – religious landscape study. https://www.pewresearch.org/religious-landscape-study/database/generational-cohort/ (2025). Accessed on April 13, 2025

[31] Public Religion Research Institute. Prri generation z fact sheet. https://www.prri.org/spotlight/prri-generation-z-fact-sheet/ (2024). Accessed on April 13, 2025

[32] Springtide Research Institute. Gen alpha and religion: What 13-year-olds say. https://springtideresearch.org/post/religion-and-spirituality/gen-alpha-and-religion-what-13-year-olds-say (2025). Accessed on April 13, 2025

[33] Public Religion Research Institute. A political and cultural glimpse into america’s future: Generation z’s views on generational change and the challenges and opportunities ahead. https://www.prri.org/research/generation-zs-views-on-generational-change-and-the-challenges-and-opportunities-ahead-a-political-and-cultural-glimpse-into-americas-future/ (2024...

[34] Machi, S. & Jackson, C. Gender identity and sexual orientation differences by generation. https://www.ipsos.com/en-us/gender-identity-and-sexual-orientation-differences-generation (2021). Accessed on April 13, 2025

[35] U.S. Social Security Administration. Top names over the last 100 years. https://www.ssa.gov/oact/babynames/decades/century.html (2024). Accessed on April 13, 2025

[36] USC Libraries. Age groups - demographics - research guides. https://libguides.usc.edu/busdem/age (2020). Accessed on April 13, 2025

[37] Anthropic. Introducing claude 3.5 sonnet. https://www.anthropic.com/news/claude-3-5-sonnet (2024). Published June 20,

[39] OpenAI. Gpt-4o mini: advancing cost-efficient intelligence. https://openai.com/index/gpt-4o-mini-advancing-cost-efficient-intelligence/ (2024). Published July 18, 2024. Accessed on April 22, 2025

[40] Cohere. Command r+ model documentation. https://docs.cohere.com/v2/docs/command-r-plus (2024). Released August

[42] Meta AI. Meta llama 3.1: Advancing open-source ai. https://ai.meta.com/blog/meta-llama-3-1/ (2024). Published July 23,

[43] Accessed on April 22, 2025

[44] Csiszar, I. I-Divergence Geometry of Probability Distributions and Minimization Problems. The Annals Probab. 3, 146–158, DOI: 10.1214/aop/1176996454 (1975).

Supplementary Material · Politics Tables · Implicit · claude-3.5-sonnet (columns: Conservative, Liberal, Neutral, Refusal). Gender: Male (n=500) 4.20∗∗∗ 93.80∗∗∗ 2.00∗∗∗ 0.00; Female (n=500) 9.20∗∗∗ 90.00∗∗∗ 0.40∗...
