Not-quite-human tastes: the stylized omnivorousness of LLM survey surrogates
Pith reviewed 2026-06-30 06:13 UTC · model grok-4.3
The pith
LLM survey surrogates show systematic positive bias for liking and lose the relational and demographic structure of human cultural tastes.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Large language models produce silicon surrogates whose tastes are highly stylized facsimiles of human tastes. Silicon samples have a systematic positive bias for liking that inflates ecological estimates; the complex relationality of real taste structures is completely lost; and very little of the known cultural alignment between tastes and social space is preserved, with attenuated age-taste associations, resurrected anachronistic class-taste associations, and caricatured gender- and race-taste associations.
What carries the argument
Generation of 277,470 LLM silicon surrogates per model (30 replications of each of 9,249 SPPA respondents), matched to respondent demographics, followed by direct comparison of liking rates, pairwise taste correlations, and regression coefficients linking tastes to age, class, gender, and race.
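The sample-size arithmetic and the first of these comparisons can be sketched as follows. This is a minimal illustration rather than the paper's code: the helper names and the toy 0/1 responses are assumptions made for the example, and only the counts 30 and 9,249 come from the abstract.

```python
# Hypothetical sketch: helper names and toy data are illustrative assumptions.

REPLICATIONS = 30    # silicon replications per respondent (from the abstract)
RESPONDENTS = 9249   # SPPA respondents (from the abstract)
SURROGATES_PER_MODEL = REPLICATIONS * RESPONDENTS


def liking_rate(responses):
    """Share of 1s ("likes") in a list of 0/1 responses."""
    return sum(responses) / len(responses)


def liking_rate_gap(human, silicon):
    """Silicon minus human liking rate; positive means silicon over-reports liking."""
    return liking_rate(silicon) - liking_rate(human)


# Toy responses for a single taste item (made up for illustration only).
human_likes = [1, 0, 0, 1, 0, 0, 0, 1, 0, 0]    # 3 of 10 like the item
silicon_likes = [1, 1, 0, 1, 1, 0, 1, 1, 0, 1]  # 7 of 10 like the item

gap = liking_rate_gap(human_likes, silicon_likes)
print(SURROGATES_PER_MODEL)  # 30 * 9249
print(round(gap, 2))         # gap in liking rates for the toy item
```

A positive gap on most items would correspond to the inflated ecological estimates described in the core claim; a real analysis would also need survey weights and uncertainty estimates, which this sketch omits.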
If this is right
- Ecological estimates of how much the population likes particular cultural activities will be higher when drawn from silicon samples than from human samples.
- Any analysis that depends on the network of associations among tastes will find essentially no structure in silicon data where human data shows clear patterns.
- Inferences about how tastes vary by age, education, gender, or race will be systematically distorted, with some associations weakened and others invented or exaggerated.
Where Pith is reading between the lines
- Market-research firms already selling synthetic panels may need bias-correction layers before their outputs can substitute for human surveys.
- The absence of taste correlations suggests LLMs are not reproducing the social or psychological mechanisms that generate human taste clusters.
- Policy or marketing uses that rely on accurate mapping of tastes onto demographic groups risk misallocating resources if they draw on silicon data without adjustment.
Load-bearing premise
The specific prompts and sampling procedure used to generate the silicon surrogates produce responses whose distribution matches the target human respondent population without introducing unmeasured artifacts from model training data or prompt design.
What would settle it
A controlled replication that draws both human and silicon responses to identical SPPA items under the same demographic stratification and then measures the size of the positive bias and the drop in taste correlations.
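One way to put a number on the "drop in taste correlations" is to average the absolute pairwise association across taste items in each sample and compare the two averages. The sketch below uses the phi coefficient for binary like/dislike items (for a 2x2 table, Cramér's V equals the absolute value of phi). The function names and toy data are illustrative assumptions, not the paper's procedure.

```python
from itertools import combinations
from math import sqrt


def phi(x, y):
    """Phi coefficient between two equal-length 0/1 lists (0.0 if undefined)."""
    n11 = sum(1 for a, b in zip(x, y) if a == 1 and b == 1)
    n10 = sum(1 for a, b in zip(x, y) if a == 1 and b == 0)
    n01 = sum(1 for a, b in zip(x, y) if a == 0 and b == 1)
    n00 = sum(1 for a, b in zip(x, y) if a == 0 and b == 0)
    denom = sqrt((n11 + n10) * (n01 + n00) * (n11 + n01) * (n10 + n00))
    if denom == 0:
        # A constant item has no variance, so the association is undefined;
        # this sketch treats that case as "no structure".
        return 0.0
    return (n11 * n00 - n10 * n01) / denom


def mean_abs_association(tastes):
    """Average |phi| over all pairs of items; `tastes` maps item name -> 0/1 list."""
    pairs = list(combinations(sorted(tastes), 2))
    return sum(abs(phi(tastes[a], tastes[b])) for a, b in pairs) / len(pairs)


# Toy data (made up): humans show clustered tastes, silicon likes nearly everything.
human = {
    "jazz": [1, 1, 1, 0, 0, 0],
    "classical": [1, 1, 0, 0, 0, 0],
    "country": [0, 0, 0, 1, 1, 0],
}
silicon = {
    "jazz": [1, 1, 1, 1, 1, 1],
    "classical": [1, 1, 1, 1, 1, 1],
    "country": [1, 0, 1, 1, 0, 1],
}

human_structure = mean_abs_association(human)
silicon_structure = mean_abs_association(silicon)
print(human_structure - silicon_structure)  # size of the drop in this toy example
```

Treating undefined associations as zero is a design choice for the sketch; a replication would more likely report how many items have near-zero variance in the silicon sample alongside the association matrices.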
Original abstract
Large-language models have proven to be remarkable if inconsistent parrots of public attitudes and opinions. The extent to which LLMs are able to produce reasonable approximations of cultural taste remains an open empirical question that becomes more urgent by the day, with market research companies already offering provisional `synthetic' survey panels and the contamination of standard survey data from LLM-generated responses. In this study, we build on past work on silicon sampling by extending considerations of its algorithmic fidelity and alignment to the domain of cultural consumption. We use large-language models from OpenAI, Anthropic, and DeepSeek to each produce 277,470 (30x9249) silicon surrogates of survey respondents from the Survey of Public Participation in the Arts (SPPA). We find these silicon surrogates' tastes to be highly stylized facsimiles of human tastes. (1) Silicon samples have a systematic postive-bias for liking, resulting in inflated ecological estimates of tastes. The individual-level bias of silicon samples are not well-explained by the WEIRD-bias often discussed in the literature. (2) The complex relationality in real taste structures is completely lost among silicon samples. (3) Finally, very little of the known cultural alignment between tastes and social space are preserved. Silicon samples attenuate age-taste associations, resurrect anachronistic class-taste associations, caricaturize gender- and race-taste associations.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper examines whether large language models can serve as faithful surrogates for human respondents in surveys of cultural tastes. Using OpenAI, Anthropic, and DeepSeek models, it generates 277,470 silicon responses (30 replications of each of 9,249 SPPA respondents) and reports three main distortions relative to the human benchmark: (1) a systematic positive bias in reported liking that inflates ecological estimates of taste prevalence and is not explained by WEIRD bias; (2) complete loss of the complex relational structure among tastes; and (3) attenuation or reversal of known demographic alignments (age, class, gender, race) with tastes.
Significance. If the reported distortions prove robust to prompt variation and sampling choices, the work would be significant for the emerging literature on synthetic survey data. It supplies concrete, large-scale evidence that LLM surrogates systematically misrepresent both marginal distributions and joint structures of cultural preferences, with direct implications for market-research panels and any downstream social-science use of LLM-generated responses.
major comments (3)
- [Methods] Methods section (prompt construction and sampling): the central claim that the three headline distortions are properties of silicon surrogates rather than elicitation artifacts rests on the untested premise that the demographic-conditioning and question-framing templates produce human-like distributions absent model-specific effects. No robustness checks that vary prompt wording, response instructions, or temperature are reported, which is load-bearing for distinguishing intrinsic LLM behavior from prompt-induced shifts.
- [Results] Results on relational structure: the assertion that 'the complex relationality in real taste structures is completely lost' requires explicit metrics (e.g., pairwise correlations, network modularity, or factor loadings) and a direct side-by-side comparison table against the human SPPA matrix; without these, the strength of the 'completely lost' claim cannot be evaluated.
- [Results] Demographic-alignment analyses: the statements that silicon samples 'attenuate age-taste associations, resurrect anachronistic class-taste associations, [and] caricaturize gender- and race-taste associations' are presented directionally; quantitative effect sizes, confidence intervals, and multiple-testing corrections for the many taste-by-demographic tests performed are needed to support the claim that alignment is systematically mis-preserved.
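The robustness checks requested in the first major comment amount to a factorial design over prompt wording, response instructions, and temperature. A minimal sketch of how such a grid might be enumerated is given below. Every value in it is a placeholder assumption, since the paper's actual prompts and settings are not reproduced on this page.

```python
from itertools import product

# Placeholder factor levels; none of these are the paper's prompts or settings.
PROMPT_WORDINGS = ["wording_a", "wording_b", "wording_c"]
RESPONSE_INSTRUCTIONS = ["answer_only", "answer_with_reason"]
TEMPERATURES = [0.0, 0.7, 1.0]


def build_conditions(wordings, instructions, temperatures):
    """Return one dict per cell of the full factorial design."""
    return [
        {"wording": w, "instruction": i, "temperature": t}
        for w, i, t in product(wordings, instructions, temperatures)
    ]


def estimate_range(estimates):
    """Spread (max minus min) of one headline estimate across conditions."""
    return max(estimates) - min(estimates)


conditions = build_conditions(PROMPT_WORDINGS, RESPONSE_INSTRUCTIONS, TEMPERATURES)
print(len(conditions))  # 3 wordings * 2 instructions * 3 temperatures
```

If a headline estimate such as the liking-rate gap has a small `estimate_range` across all cells, that would support reading the distortion as a property of the model rather than of one prompt template.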
minor comments (2)
- [Abstract] Abstract: 'postive-bias' is a typographical error.
- [Abstract] Abstract: 'WEIRD-bias' is used without definition or citation; a brief parenthetical or reference is required for readers outside the cultural-sociology literature.
Simulated Author's Rebuttal
We thank the referee for their constructive comments, which help clarify the presentation of our findings on LLM survey surrogates. We address each major comment below.
- Referee: [Methods] Methods section (prompt construction and sampling): the central claim that the three headline distortions are properties of silicon surrogates rather than elicitation artifacts rests on the untested premise that the demographic-conditioning and question-framing templates produce human-like distributions absent model-specific effects. No robustness checks that vary prompt wording, response instructions, or temperature are reported, which is load-bearing for distinguishing intrinsic LLM behavior from prompt-induced shifts.
Authors: We agree that the absence of explicit robustness checks to prompt variations is a limitation in distinguishing model-intrinsic effects from elicitation artifacts. Our multi-model approach and large replication count offer some protection against model-specific artifacts, but we will strengthen the manuscript by adding robustness analyses that vary prompt wording, response instructions, and temperature settings. These will be reported in a revised methods and results section. revision: yes
- Referee: [Results] Results on relational structure: the assertion that 'the complex relationality in real taste structures is completely lost' requires explicit metrics (e.g., pairwise correlations, network modularity, or factor loadings) and a direct side-by-side comparison table against the human SPPA matrix; without these, the strength of the 'completely lost' claim cannot be evaluated.
Authors: The manuscript includes analyses of taste correlations and relational structures, but we acknowledge that a more explicit presentation with metrics and a comparison table would better support the claim. In the revision, we will add a dedicated table and metrics (pairwise correlations, modularity scores) comparing silicon and human taste matrices to allow direct evaluation of the 'completely lost' assertion. revision: yes
- Referee: [Results] Demographic-alignment analyses: the statements that silicon samples 'attenuate age-taste associations, resurrect anachronistic class-taste associations, [and] caricaturize gender- and race-taste associations' are presented directionally; quantitative effect sizes, confidence intervals, and multiple-testing corrections for the many taste-by-demographic tests performed are needed to support the claim that alignment is systematically mis-preserved.
Authors: We agree that providing quantitative effect sizes, confidence intervals, and addressing multiple testing would strengthen the demographic alignment results. The current analyses are directional but we will revise to include these statistical details and corrections in the updated results section. revision: yes
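The multiple-testing correction requested in the third comment could take several forms. As one hedged illustration, the sketch below implements the Benjamini-Hochberg step-up procedure, which controls the false discovery rate across many taste-by-demographic tests. The choice of this procedure, the function name, and the toy p-values are assumptions for the example; the exchange above does not say which correction the authors will use.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return a list of booleans, True where the test is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff_rank:
            rejected[idx] = True
    return rejected


# Toy p-values for four hypothetical taste-by-demographic tests (made up).
toy_p = [0.039, 0.001, 0.205, 0.008]
print(benjamini_hochberg(toy_p))
```

In practice an established implementation would usually be preferable to hand-rolled code; this version is only meant to show the step-up logic.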
Circularity Check
No circularity: purely empirical comparison to external SPPA benchmark with no derivations or self-referential steps.
The paper generates 277,470 LLM responses under demographic conditioning and directly compares their taste distributions, relational structure, and demographic alignments against the independent SPPA survey data. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the described method or findings. The three headline results (positive liking bias, collapsed relationality, misaligned demographics) are presented as outcomes of this external comparison rather than reductions to the prompt design itself. The analysis is therefore self-contained against the external benchmark.