
arxiv: 2606.30085 · v1 · pith:VEY2ZS6Tnew · submitted 2026-06-29 · 💻 cs.CL · econ.GN · q-fin.EC

Not-quite-human tastes: the stylized omnivorousness of LLM survey surrogates

Pith reviewed 2026-06-30 06:13 UTC · model grok-4.3

classification 💻 cs.CL · econ.GN · q-fin.EC
keywords LLM survey surrogates · silicon sampling · cultural tastes · taste structures · demographic alignment · positive bias · arts participation

The pith

LLM survey surrogates show systematic positive bias for liking and lose the relational and demographic structure of human cultural tastes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper examines whether large language models can stand in for human respondents when measuring preferences for cultural consumption such as arts participation. It uses models from three providers (OpenAI, Anthropic, and DeepSeek) to each generate 277,470 silicon surrogates, 30 for each of 9,249 respondents, and compares their responses to the real Survey of Public Participation in the Arts (SPPA). Silicon samples display a consistent upward bias in reported liking that inflates population-level taste estimates. They also erase the correlations among different tastes that exist in human data and fail to reproduce the documented alignments between tastes and social position. Age associations weaken, outdated class patterns reappear, and gender and race patterns become caricatures.

Core claim

Large-language models produce silicon surrogates whose tastes are highly stylized facsimiles of human tastes: silicon samples have a systematic positive-bias for liking that inflates ecological estimates, the complex relationality in real taste structures is completely lost, and very little of the known cultural alignment between tastes and social space is preserved, with attenuated age-taste associations, resurrected anachronistic class-taste associations, and caricaturized gender- and race-taste associations.

What carries the argument

Generation of 277,470 LLM silicon surrogates per model, matched to SPPA respondent demographics, followed by direct comparison of liking rates, pairwise taste correlations, and regression coefficients linking tastes to age, class, gender, and race.
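The pairwise comparison of taste associations can be illustrated with a small sketch. The figure captions on this page report Cramer's V, so the sketch uses that statistic for two like/dislike items; the function name `cramers_v`, the item names, and the toy responses are hypothetical, and the paper's own estimation details (survey weights, any bias correction) are not described here.

```python
from collections import Counter
from math import sqrt


def cramers_v(xs, ys):
    """Uncorrected Cramer's V for two categorical sequences of equal length."""
    if len(xs) != len(ys) or not xs:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(xs)
    row = Counter(xs)            # marginal counts of the first item
    col = Counter(ys)            # marginal counts of the second item
    cell = Counter(zip(xs, ys))  # joint counts
    k = min(len(row), len(col))
    if k < 2:
        return 0.0  # a constant item cannot be associated with anything
    chi2 = 0.0
    for r, nr in row.items():
        for c, nc in col.items():
            expected = nr * nc / n
            observed = cell.get((r, c), 0)
            chi2 += (observed - expected) ** 2 / expected
    return sqrt(chi2 / (n * (k - 1)))


# Hypothetical toy responses (1 = likes, 0 = does not like), not SPPA data.
human_jazz = [1, 1, 1, 0, 0, 0, 1, 0]
human_classical = [1, 1, 1, 0, 0, 0, 0, 1]
silicon_jazz = [1, 1, 1, 1, 1, 1, 0, 1]
silicon_classical = [1, 1, 1, 1, 1, 0, 1, 1]

v_human = cramers_v(human_jazz, human_classical)
v_silicon = cramers_v(silicon_jazz, silicon_classical)
deviation = v_silicon - v_human  # negative when the silicon association is weaker
print(v_human, v_silicon, deviation)
```

In this toy case nearly every silicon respondent likes both items, so there is little variation left for an association to show up in, which is one way a uniform positive bias and a flattened taste structure can go together.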

If this is right

  • Ecological estimates of how much the population likes particular cultural activities will be higher when drawn from silicon samples than from human samples.
  • Any analysis that depends on the network of associations among tastes will find essentially no structure in silicon data where human data shows clear patterns.
  • Inferences about how tastes vary by age, education, gender, or race will be systematically distorted, with some associations weakened and others invented or exaggerated.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Market-research firms already selling synthetic panels may need bias-correction layers before their outputs can substitute for human surveys.
  • The absence of taste correlations suggests LLMs are not reproducing the social or psychological mechanisms that generate human taste clusters.
  • Policy or marketing uses that rely on accurate mapping of tastes onto demographic groups risk misallocating resources if they draw on silicon data without adjustment.

Load-bearing premise

The specific prompts and sampling procedure used to generate the silicon surrogates produce responses whose distribution matches the target human respondent population without introducing unmeasured artifacts from model training data or prompt design.

What would settle it

A controlled replication that draws both human and silicon responses to identical SPPA items under the same demographic stratification and then measures the size of the positive bias and the drop in taste correlations.
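As a rough illustration of the first measurement in such a replication, the sketch below computes the gap in liking rates between silicon and human responses within each demographic stratum. The helper names, the age strata, and the toy records are assumptions made for illustration; they are not taken from the paper or from the SPPA.

```python
from collections import defaultdict


def liking_rate_by_stratum(records):
    """Map each stratum to the share of its responses that report liking (1)."""
    counts = defaultdict(lambda: [0, 0])  # stratum -> [likes, total]
    for stratum, likes in records:
        counts[stratum][0] += likes
        counts[stratum][1] += 1
    return {s: likes / total for s, (likes, total) in counts.items()}


def positive_bias(human, silicon):
    """Silicon minus human liking rate for every stratum present in both samples."""
    h = liking_rate_by_stratum(human)
    s = liking_rate_by_stratum(silicon)
    return {k: s[k] - h[k] for k in h.keys() & s.keys()}


# Hypothetical (stratum, likes_item) records for a single survey item.
human = [("18-34", 1), ("18-34", 0), ("18-34", 0), ("18-34", 0),
         ("65+", 1), ("65+", 1), ("65+", 0), ("65+", 0)]
silicon = [("18-34", 1), ("18-34", 1), ("18-34", 1), ("18-34", 0),
           ("65+", 1), ("65+", 1), ("65+", 1), ("65+", 0)]

bias = positive_bias(human, silicon)
for stratum in sorted(bias):
    print(stratum, bias[stratum])
```

Positive values mean the silicon sample reports more liking than the human sample in that stratum. A real replication would also need survey weights and uncertainty estimates, which this sketch leaves out.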

Figures

Figures reproduced from arXiv: 2606.30085 by Mengmi Zhang, Minne Chen, Shannon Ang, Xiangyu Ma.

Figure 1: Comparing Cramer’s V estimates of Silicon Samples against SPPA data
Figure 2: Heat map of Deviation in Cramer’s V

Original abstract

Large-language models have proven to be remarkable if inconsistent parrots of public attitudes and opinions. The extent to which LLMs are able to produce reasonable approximations of cultural taste remains an open empirical question that becomes more urgent by the day, with market research companies already offering provisional 'synthetic' survey panels and the contamination of standard survey data from LLM-generated responses. In this study, we build on past work on silicon sampling by extending considerations of its algorithmic fidelity and alignment to the domain of cultural consumption. We use large-language models from OpenAI, Anthropic, and DeepSeek to each produce 277,470 (30 × 9,249) silicon surrogates of survey respondents from the Survey of Public Participation in the Arts (SPPA). We find these silicon surrogates' tastes to be highly stylized facsimiles of human tastes. (1) Silicon samples have a systematic postive-bias for liking, resulting in inflated ecological estimates of tastes. The individual-level bias of silicon samples are not well-explained by the WEIRD-bias often discussed in the literature. (2) The complex relationality in real taste structures is completely lost among silicon samples. (3) Finally, very little of the known cultural alignment between tastes and social space are preserved. Silicon samples attenuate age-taste associations, resurrect anachronistic class-taste associations, caricaturize gender- and race-taste associations.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper examines whether large language models can serve as faithful surrogates for human respondents in surveys of cultural tastes. Using OpenAI, Anthropic, and DeepSeek models, it generates 277,470 silicon responses (30 replications of each of 9,249 SPPA respondents) and reports three main distortions relative to the human benchmark: (1) a systematic positive bias in reported liking that inflates ecological estimates of taste prevalence and is not explained by WEIRD bias; (2) complete loss of the complex relational structure among tastes; and (3) attenuation or reversal of known demographic alignments (age, class, gender, race) with tastes.

Significance. If the reported distortions prove robust to prompt variation and sampling choices, the work would be significant for the emerging literature on synthetic survey data. It supplies concrete, large-scale evidence that LLM surrogates systematically misrepresent both marginal distributions and joint structures of cultural preferences, with direct implications for market-research panels and any downstream social-science use of LLM-generated responses.

major comments (3)
  1. [Methods] Methods section (prompt construction and sampling): the central claim that the three headline distortions are properties of silicon surrogates rather than elicitation artifacts rests on the untested premise that the demographic-conditioning and question-framing templates produce human-like distributions absent model-specific effects. No robustness checks that vary prompt wording, response instructions, or temperature are reported, which is load-bearing for distinguishing intrinsic LLM behavior from prompt-induced shifts.
  2. [Results] Results on relational structure: the assertion that 'the complex relationality in real taste structures is completely lost' requires explicit metrics (e.g., pairwise correlations, network modularity, or factor loadings) and a direct side-by-side comparison table against the human SPPA matrix; without these, the strength of the 'completely lost' claim cannot be evaluated.
  3. [Results] Demographic-alignment analyses: the statements that silicon samples 'attenuate age-taste associations, resurrect anachronistic class-taste associations, [and] caricaturize gender- and race-taste associations' are presented directionally; quantitative effect sizes, confidence intervals, and multiple-testing corrections for the many taste-by-demographic tests performed are needed to support the claim that alignment is systematically mis-preserved.
minor comments (2)
  1. [Abstract] Abstract: 'postive-bias' is a typographical error.
  2. [Abstract] Abstract: 'WEIRD-bias' is used without definition or citation; a brief parenthetical or reference is required for readers outside the cultural-sociology literature.
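
The multiple-testing correction requested in major comment 3 could take several forms, and the report does not name one. As one hedged possibility, the sketch below applies the Benjamini-Hochberg step-up procedure to a list of p-values from taste-by-demographic tests; the function name and the example p-values are hypothetical.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans: True where a hypothesis is rejected at FDR level alpha."""
    m = len(p_values)
    # Indices of the p-values in ascending order of p.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff = rank
    # Step-up rule: reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for rank, idx in enumerate(order, start=1):
        if rank <= cutoff:
            rejected[idx] = True
    return rejected


# Hypothetical p-values for four taste-by-demographic tests.
example = [0.001, 0.04, 0.03, 0.2]
print(benjamini_hochberg(example))
```

Because the procedure is step-up, a p-value that misses its own threshold is still rejected when a larger p-value further down the ranking meets its threshold.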

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which help clarify the presentation of our findings on LLM survey surrogates. We address each major comment below.

Point-by-point responses
  1. Referee: [Methods] Methods section (prompt construction and sampling): the central claim that the three headline distortions are properties of silicon surrogates rather than elicitation artifacts rests on the untested premise that the demographic-conditioning and question-framing templates produce human-like distributions absent model-specific effects. No robustness checks that vary prompt wording, response instructions, or temperature are reported, which is load-bearing for distinguishing intrinsic LLM behavior from prompt-induced shifts.

    Authors: We agree that the absence of explicit robustness checks against prompt variation limits our ability to distinguish model-intrinsic effects from elicitation artifacts. Our multi-model approach and large replication count offer some protection against model-specific artifacts, but we will strengthen the manuscript by adding robustness analyses that vary prompt wording, response instructions, and temperature settings. These will be reported in revised methods and results sections. revision: yes

  2. Referee: [Results] Results on relational structure: the assertion that 'the complex relationality in real taste structures is completely lost' requires explicit metrics (e.g., pairwise correlations, network modularity, or factor loadings) and a direct side-by-side comparison table against the human SPPA matrix; without these, the strength of the 'completely lost' claim cannot be evaluated.

    Authors: The manuscript includes analyses of taste correlations and relational structures, but we acknowledge that a more explicit presentation with metrics and a comparison table would better support the claim. In the revision, we will add a dedicated table and metrics (pairwise correlations, modularity scores) comparing silicon and human taste matrices to allow direct evaluation of the 'completely lost' assertion. revision: yes

  3. Referee: [Results] Demographic-alignment analyses: the statements that silicon samples 'attenuate age-taste associations, resurrect anachronistic class-taste associations, [and] caricaturize gender- and race-taste associations' are presented directionally; quantitative effect sizes, confidence intervals, and multiple-testing corrections for the many taste-by-demographic tests performed are needed to support the claim that alignment is systematically mis-preserved.

    Authors: We agree that quantitative effect sizes, confidence intervals, and multiple-testing corrections would strengthen the demographic-alignment results. The current analyses are directional, but we will revise the results section to include these statistical details and corrections. revision: yes

Circularity Check

0 steps flagged

No circularity: purely empirical comparison to external SPPA benchmark with no derivations or self-referential steps.

full rationale

The paper generates 277,470 LLM responses under demographic conditioning and directly compares their taste distributions, relational structure, and demographic alignments against the independent SPPA survey data. No equations, fitted parameters, predictions derived from inputs, or load-bearing self-citations appear in the described method or findings. The three headline results (positive liking bias, collapsed relationality, misaligned demographics) are presented as outcomes of this external comparison rather than reductions to the prompt design itself. The analysis is therefore self-contained against the external benchmark.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no free parameters, axioms, or invented entities are described.

pith-pipeline@v0.9.1-grok · 5793 in / 1108 out tokens · 35667 ms · 2026-06-30T06:13:21.467229+00:00 · methodology

