Distorted Perspectives of LLM-Simulated Preferences: Can AI Mislead Design?

Eduard Kuric; Matus Krajcovic; Peter Demcak

arxiv: 2605.18311 · v1 · pith:XD4UTQYM · submitted 2026-05-18 · 💻 cs.HC



Pith reviewed 2026-05-20 08:43 UTC · model grok-4.3

classification 💻 cs.HC

keywords: LLM simulation · design preferences · user experience · preference testing · algorithmic fidelity · visual design · AI misalignment · synthetic user data


The pith

LLM simulations of design preferences diverge systematically from real user choices across multiple setups.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models can stand in for real people when designers want quick feedback on visual interfaces and layouts. It draws on twenty-nine real preference tests (n = 2073) run on a live research platform and simulates them in parallel while varying model reasoning, sampling settings, persona type, and prompt specificity. The comparisons reveal consistent gaps that do not disappear when the simulation parameters change. Human answers show specific reasoning and balanced critique; LLM answers default to generic observations, fixation on isolated elements, needless elaboration, and excessive praise. Because many design teams already consult LLMs for early direction, these gaps could steer final products away from what users actually prefer.

Core claim

Aggregated data from twenty-nine real preference tests (n = 2073) show significant and systematic discrepancies with LLM outputs; the mismatches remain consistent when the model's reasoning depth, sampling strategy, persona framing, or prompt specificity is varied. LLM justifications replace genuine depth and nuance with patterns such as emphasis on generic visual properties, attention to isolated elements, unnecessary elaboration, and overpraising.
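Neither this summary nor the abstract states the exact discrepancy metric. As a minimal sketch only, assuming each preference test is a single-choice question over a few design variants, per-test alignment could be measured as the total variation distance between human and simulated choice shares, plus a check on whether both rank the same variant first. The function names, the choice of metric, and the counts are hypothetical, not taken from the paper.

```python
from collections import Counter

def choice_shares(choices, options):
    """Share of respondents picking each option, in the order given by `options`."""
    counts = Counter(choices)
    return [counts[o] / len(choices) for o in options]

def total_variation(p, q):
    """Total variation distance: 0 for identical shares, 1 for disjoint choices."""
    return 0.5 * sum(abs(a - b) for a, b in zip(p, q))

def _argmax(shares):
    """Index of the most chosen option (first one wins a tie)."""
    return max(range(len(shares)), key=shares.__getitem__)

def top_choice_agrees(p, q):
    """True when both share vectors put the same option first."""
    return _argmax(p) == _argmax(q)

# Hypothetical counts for one three-variant test (not data from the paper).
options = ["A", "B", "C"]
human = ["A"] * 50 + ["B"] * 30 + ["C"] * 20
llm = ["A"] * 10 + ["B"] * 80 + ["C"] * 10

p_h = choice_shares(human, options)
p_l = choice_shares(llm, options)
print(round(total_variation(p_h, p_l), 2))  # 0.5
print(top_choice_agrees(p_h, p_l))          # False
```

Averaging such per-test distances over all tests would yield one aggregate figure; a rank correlation between the two share vectors is a common alternative when a test has many variants.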

What carries the argument

Holistic multimodal simulation of preference-test stimuli, with controlled manipulation of LLM variables (reasoning, sampling, persona, specificity) to quantify alignment against real-user aggregates.
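The controlled manipulation can be pictured as a one-factor-at-a-time design around a baseline configuration. The sketch below assumes that design. The baseline model, top_p, and persona echo the Figure 2 caption (GPT-4.1, top_p = 1, mega-personas). Every other value and level, along with the class and field names, is a hypothetical placeholder rather than the paper's actual grid.

```python
from dataclasses import dataclass, replace

@dataclass(frozen=True)
class SimConfig:
    """One simulation condition. Field names and most values are illustrative only."""
    model: str = "gpt-4.1"        # baseline model named in the Figure 2 caption
    reasoning: bool = False       # hypothetical switch for a reasoning-enabled variant
    temperature: float = 1.0      # hypothetical baseline value
    top_p: float = 1.0            # the Figure 2 caption mentions top_p = 1
    persona: str = "mega-persona"
    specificity: str = "low"      # hypothetical prompt-detail level

# Hypothetical levels for each manipulated variable (none equal the baseline).
VARIATIONS = {
    "reasoning": [True],
    "temperature": [0.0, 0.5, 1.5],
    "top_p": [0.5],
    "persona": ["none", "individual"],
    "specificity": ["high"],
}

def one_factor_at_a_time(baseline, variations):
    """Yield (label, config): the baseline first, then each variable changed alone."""
    yield "baseline", baseline
    for field, levels in variations.items():
        for level in levels:
            yield f"{field}={level}", replace(baseline, **{field: level})

for label, cfg in one_factor_at_a_time(SimConfig(), VARIATIONS):
    print(label, cfg)
```

Varying one factor at a time keeps each comparison attributable to a single variable, at the cost of not probing interactions between variables.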

If this is right

Design teams that substitute LLM feedback for human testing risk creating interfaces that real users rate lower on preference measures.
LLM-generated design critiques tend to lack the balanced, context-specific reasoning that human participants provide.
Any automated pipeline that relies on current LLM preference simulation would likely inherit the same systematic biases observed here.
Patterns such as overpraising and generic focus can be used as diagnostic signals to flag low-fidelity LLM outputs in design workflows.
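To illustrate how overpraising might serve as a diagnostic signal, here is a deliberately crude lexicon-based flag. The word lists, the praise-ratio definition, and the 0.8 threshold are all hypothetical. The paper identifies the pattern through its analysis of justifications, not through this heuristic.

```python
import re

# Hypothetical lexicons; a real audit would need validated, domain-specific lists.
PRAISE_TERMS = {"excellent", "amazing", "stunning", "perfect", "beautiful",
                "great", "fantastic", "outstanding", "impressive", "love"}
CRITIQUE_TERMS = {"but", "however", "although", "cluttered", "confusing",
                  "hard", "unclear", "too", "lacks", "missing"}

def praise_ratio(text):
    """Share of praise words among praise + critique words in one justification."""
    words = re.findall(r"[a-z']+", text.lower())
    praise = sum(w in PRAISE_TERMS for w in words)
    critique = sum(w in CRITIQUE_TERMS for w in words)
    if praise + critique == 0:
        return 0.0
    return praise / (praise + critique)

def flag_overpraise(justifications, threshold=0.8):
    """Indices of justifications whose praise ratio meets the hypothetical threshold."""
    return [i for i, t in enumerate(justifications) if praise_ratio(t) >= threshold]

samples = [
    "Stunning layout, perfect colors, an excellent and impressive design.",
    "The header is clear, but the pricing table feels cluttered.",
]
print(flag_overpraise(samples))  # [0]
```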

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Designers could treat LLM output as a low-cost first pass that still requires targeted human checks on the specific dimensions where mismatches are largest.
The same simulation approach might be applied to other subjective judgments, such as content appeal or brand perception, to test whether similar distortions appear.
If the root cause lies in training-data coverage of visual design judgments, targeted fine-tuning on large preference datasets could narrow the observed gaps.

Load-bearing premise

The aggregated preference data from the UXtweak platform accurately reflects unbiased user choices without platform-specific selection effects or test-format artifacts.

What would settle it

If LLM simulations closely matched the human choice distributions in a new set of preference tests collected outside the original platform, using different recruitment and response formats, that would undermine the claim of persistent discrepancies.
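"Closely matching" needs an operational test. Here is a minimal sketch, assuming single-choice data and using a two-sample permutation test with total variation distance as the statistic. This is a swapped-in technique, not necessarily the paper's. It estimates how often randomly relabeled responses look at least as different as the observed human and LLM samples. All counts are hypothetical.

```python
import random
from collections import Counter

def tvd(a, b, options):
    """Total variation distance between the choice shares of two label lists."""
    ca, cb = Counter(a), Counter(b)
    return 0.5 * sum(abs(ca[o] / len(a) - cb[o] / len(b)) for o in options)

def permutation_pvalue(human, llm, n_perm=2000, seed=0):
    """Two-sample permutation test of whether human and LLM choices could share one distribution."""
    options = sorted(set(human) | set(llm))
    observed = tvd(human, llm, options)
    pooled = list(human) + list(llm)
    rng = random.Random(seed)          # seeded so the estimate is reproducible
    extreme = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)            # relabel responses at random under the null
        stat = tvd(pooled[:len(human)], pooled[len(human):], options)
        if stat >= observed - 1e-12:   # small tolerance absorbs float rounding
            extreme += 1
    # Add-one correction: the observed labeling counts as one permutation.
    return (extreme + 1) / (n_perm + 1)

human = ["A"] * 50 + ["B"] * 30 + ["C"] * 20     # hypothetical counts
llm_far = ["A"] * 10 + ["B"] * 80 + ["C"] * 10
llm_near = ["A"] * 48 + ["B"] * 32 + ["C"] * 20
print(permutation_pvalue(human, llm_far) < 0.01)   # True
print(permutation_pvalue(human, llm_near) > 0.05)  # True
```

A large p-value only means no difference was detected. Demonstrating close agreement would call for an equivalence-style test against a tolerance fixed in advance.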

Figures

Figures reproduced from arXiv: 2605.18311 by Eduard Kuric, Matus Krajcovic, Peter Demcak.

**Figure 1.** Research model. H2b. Temperature does not affect the similarity between LLM-synthesized and audience design preferences. H2c. Top-p does not affect the similarity between LLM-synthesized and audience design preferences. Works simulating participants have imposed various persona representations to prime models toward better alignment with audiences (Gerosa et al., 2024). Personas can represent individuals (…

**Figure 2.** Preference test LLM simulation procedure. Our ensemble of hypotheses demanded that the simulations be performed iteratively with different settings. We used GPT-4.1 as the baseline model to assess LLM-generated design preferences and their justifications, with a parameter configuration intended to improve its algorithmic fidelity. Mega-personas and recommended values of temperature and top_p = 1 were used …

**Figure 3.** Open-ended justification measures (a-d) and linguistic similarity of simulations to real justifications (e, f). … argument, even as they were less likely to be relevant. In copywriting alternatives communicating the same message differently, LLMs failed to capture nuanced subjective reasons that caused some options to resonate with people more strongly. Inconsistency with human justifications also translated …
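Panels (e) and (f) report linguistic similarity between simulated and real justifications, but the measure is not specified in this excerpt. Here is a minimal sketch, assuming TF-IDF vectors and cosine similarity, which is one common choice (the reference list includes a TF-IDF/cosine-similarity case study [28]). Each LLM justification is scored against its closest human justification. That aggregation rule and all function names are hypothetical.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase word tokens."""
    return re.findall(r"[a-z']+", text.lower())

def tfidf_vectors(docs):
    """Sparse TF-IDF vectors (dicts) using raw term counts and a smoothed IDF."""
    tokenized = [tokenize(d) for d in docs]
    n = len(docs)
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {t: math.log((1 + n) / (1 + df[t])) + 1 for t in df}
    return [{t: c * idf[t] for t, c in Counter(toks).items()} for toks in tokenized]

def cosine(u, v):
    """Cosine similarity of two sparse vectors; 0.0 if either is empty."""
    dot = sum(w * v.get(t, 0.0) for t, w in u.items())
    nu = math.sqrt(sum(w * w for w in u.values()))
    nv = math.sqrt(sum(w * w for w in v.values()))
    return dot / (nu * nv) if nu and nv else 0.0

def mean_similarity_to_humans(human_texts, llm_texts):
    """Average over LLM justifications of the best cosine match among human ones."""
    vecs = tfidf_vectors(human_texts + llm_texts)
    h, l = vecs[:len(human_texts)], vecs[len(human_texts):]
    return sum(max(cosine(x, y) for y in h) for x in l) / len(l)

# Hypothetical justifications, not data from the paper.
human = [
    "The second option makes the price easier to compare at a glance.",
    "I prefer the darker header because the logo stands out.",
]
llm = ["Option B offers a clean, modern and visually appealing layout."]
print(round(mean_similarity_to_humans(human, llm), 3))
```

Lexical overlap is only a proxy: it cannot tell a specific critique from generic praise that happens to share vocabulary, so it complements rather than replaces coding of justification content.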

Original abstract

Designers of digital solutions increasingly consult Large Language Models (LLMs) for their work. However, it remains unclear how this may affect the user experiences they produce, and no established practices exist. We investigate how design preferences expressed by LLM-driven simulation methods align with those of real users. We present a study that aggregates real-world data and design stimuli from twenty-nine preference tests conducted in practice by users of the UXtweak online research platform (n = 2073). We perform holistic multimodal simulations in which we manipulate LLM variables (model reasoning, sampling, persona type, and specificity) and assess their effects on algorithmic fidelity. Our results reveal significant and systematic discrepancies between people's real design preferences and LLM simulations that are consistent across manipulations. Synthetic justifications lack genuine depth, nuance, and reasoning, which they replace with patterns such as a focus on generic properties or specific elements, elaboration, and overpraising. By directing attention to preferences for visual design stimuli, this research highlights how LLMs misrepresent perception and meaning in a context that is intuitive yet critical for design teams. The external and ecological validity of our findings is high, given their replication across a multitude of real-world studies.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note · private letter to a colleague

LLM simulations of visual design preferences show consistent gaps from real aggregated user data, but the UXtweak source needs checking for platform effects.


The main thing to know is that this paper reports systematic differences between what LLMs pick in visual design preference tasks and what real users chose in a set of real-world tests, and those differences hold up when the authors vary model reasoning, sampling, personas, and prompt detail. They aggregate data from 29 separate tests on the UXtweak platform with 2073 participants and run holistic multimodal simulations against the same stimuli. That setup is more grounded than most single-study LLM evaluations. They also break down the synthetic justifications and flag recurring patterns such as emphasis on generic properties, focus on isolated elements, extra elaboration, and overpraising. The cross-study replication gives the discrepancy claim some weight beyond a one-off result, and the work does a decent job of tying the question to actual design practice rather than abstract benchmarks.

The real-user data comes from self-selected platform users and forced-choice formats, so selection or presentation effects could shape the baseline preferences in ways the paper does not fully test against other elicitation methods. The abstract is also light on exact discrepancy metrics and on how justification patterns were coded, so the methods section will need to show clear statistical controls and reproducibility steps.

This is useful for HCI readers who evaluate or deploy LLM tools in design workflows; a practitioner or researcher looking for concrete evidence on simulation limits in visual tasks would get practical value. The empirical base and direct engagement with real test data are solid enough to send it for peer review, with the main requests concerning data-source limitations and analysis transparency.

Referee Report

1 major / 2 minor

Summary. The manuscript aggregates real-world design preference data from 29 UXtweak platform tests (n=2073) and compares them to LLM-driven multimodal simulations that systematically vary model reasoning, sampling, persona type, and specificity. It reports significant, manipulation-consistent discrepancies between real and simulated preferences, along with qualitative patterns in LLM justifications (focus on generic properties and specific elements, elaboration, overpraising) that lack depth or nuance. The work emphasizes high ecological validity from cross-study replication and highlights risks for design teams using LLMs to simulate user perception of visual stimuli.

Significance. If the discrepancies are robust to alternative elicitation methods, the findings would caution against direct substitution of LLM simulations for real-user preference testing in visual design, particularly given the multi-study scale and explicit manipulation of LLM variables. The external grounding in independent platform data and the focus on algorithmic fidelity in an applied HCI context add practical value beyond purely synthetic evaluations.

major comments (1)

Methods / Study Design: The central claim attributes observed discrepancies to LLM limitations after treating the aggregated UXtweak preference tests as an unbiased ground truth for 'people’s real design preferences.' No explicit controls, sensitivity analyses, or discussion address platform selection effects (self-selected, digitally savvy participants) or test-format artifacts (forced-choice visual stimuli), leaving open the possibility that these factors contribute to or drive the reported misalignment rather than LLM behavior alone.

minor comments (2)

Abstract: The claim of 'significant and systematic discrepancies' would benefit from a brief statement of the exact discrepancy metric (e.g., choice agreement rate, rank correlation) and any statistical controls applied across the 29 studies.
Results: The description of post-hoc coding of justification patterns (generic properties, specific elements, elaboration, overpraising) should include inter-coder reliability or a reproducible coding scheme to support the qualitative claims.
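The requested inter-coder reliability is commonly reported as Cohen's kappa, κ = (p_o - p_e) / (1 - p_e), which corrects the observed agreement p_o for the agreement p_e expected by chance. Here is a minimal sketch for two coders. The category labels only echo the four patterns named in the report, and the codes themselves are invented for illustration.

```python
from collections import Counter

def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders assigning exactly one category per item."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("need two equal-length, non-empty code lists")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    ca, cb = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal share, summed over categories.
    expected = sum(ca[k] * cb[k] for k in ca) / (n * n)
    if expected == 1:
        return 1.0  # both used one identical category throughout; kappa is formally undefined
    return (observed - expected) / (1 - expected)

# Hypothetical codes for eight LLM justifications.
coder_a = ["generic", "generic", "overpraise", "element",
           "elaboration", "generic", "overpraise", "element"]
coder_b = ["generic", "element", "overpraise", "element",
           "elaboration", "generic", "generic", "element"]
print(round(cohens_kappa(coder_a, coder_b), 3))  # 0.652
```

Here raw agreement is 0.75, but chance agreement of about 0.28 pulls kappa down to roughly 0.65. That gap is why the referee's request targets a chance-corrected statistic rather than percent agreement.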

Simulated Authors' Rebuttal

1 response · 0 unresolved

We thank the referee for their insightful comments, which highlight important considerations for interpreting our real-world benchmark data. We address the major comment below and will incorporate revisions to clarify the scope of our findings.


Referee: The central claim attributes observed discrepancies to LLM limitations after treating the aggregated UXtweak preference tests as an unbiased ground truth for 'people’s real design preferences.' No explicit controls, sensitivity analyses, or discussion address platform selection effects (self-selected, digitally savvy participants) or test-format artifacts (forced-choice visual stimuli), leaving open the possibility that these factors contribute to or drive the reported misalignment rather than LLM behavior alone.

Authors: We agree that the manuscript would benefit from greater explicitness on this point. The UXtweak data is presented as an ecologically valid aggregation of real design preference tests rather than a universally unbiased ground truth for all people's preferences. To address the referee's concern, we will add a dedicated 'Limitations' subsection in the Discussion that discusses platform self-selection (e.g., digitally engaged participants) and forced-choice format effects as potential influences on the observed distributions. We will also note the consistency of discrepancies across the 29 independent studies as partial evidence of robustness, though we did not perform formal sensitivity analyses focused on these artifacts. We maintain that the core finding, systematic misalignment between LLM simulations and real aggregated preferences, remains informative for design practice even if the real data carries context-specific characteristics, but we will revise the text to avoid any implication of universal ground truth.

Revision: yes

Circularity Check

0 steps flagged

No circularity: empirical comparison to independent external platform data


The paper conducts an empirical study by aggregating real-world preference test data from 29 studies on the independent UXtweak platform (n=2073) and directly comparing it against LLM simulations under manipulated variables. No mathematical derivations, equations, fitted parameters, or self-citations are used to generate the central results; the discrepancies are measured against external user data rather than being constructed from the study's own inputs or prior author work. The analysis is therefore self-contained against external benchmarks with no reduction of outputs to inputs by definition or fitting.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The paper relies on standard statistical comparison methods and introduces no free parameters or invented entities; beyond basic statistical assumptions, its only axiom is the domain assumption below about the representativeness of platform-collected user data.

axioms (1)

Domain assumption: Aggregated preference data from multiple real-world tests can be treated as a reliable proxy for general user design preferences.
Invoked when claiming high external validity and systematic discrepancies.

pith-pipeline@v0.9.0 · 5740 in / 1299 out tokens · 48693 ms · 2026-05-20T08:43:51.012563+00:00


Reference graph

Works this paper leans on

57 extracted references

[1] Yoon, Se-eun; He, Zhankui; Echterhoff, Jessica; McAuley, Julian. Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), 2024.

[2] Park, Peter S.; Schoenegger, Philipp; Zhu, Chongyang. Behavior Research Methods.

[3] Sparrenberg, Lorenz; Schneider, Tobias; Deußer, Tobias; Koppenborg, Markus; Sifa, Rafet. Correcting Systematic Bias in LLM-Generated Dialogues Using Big Five Personality Traits.

[4] Do Personality Tests Generalize to Large Language Models? Socially Responsible Language Modelling Research.

[5] Jia, Feiran; Ye, Ziyu; Lai, Shiyang; Shu, Kai; Gu, Jindong; Bibi, Adel; Hu, Ziniu; Jurgens, David; Evans, James; Torr, Philip H.S.; Ghanem, Bernard; Li, Guohao; Xie, Chengxing; Chen, Canyu. Can Large Language Model Agents Simulate Human Trust Behavior?

[6] Qu, Yao; Wang, Jue. Humanities and Social Sciences Communications.

[7] Whose Opinions Do Language Models Reflect? Proceedings of the 40th International Conference on Machine Learning, 2023.

[8] Toward accurate psychological simulations: Investigating LLMs’ responses to personality and cultural variables. 2025.

[9] Giorgi, Salvatore; Liu, Tingting; Aich, Ankit; Isman, Kelsey Jane; Sherman, Garrick; Fried, Zachary; Sedoc, João; Ungar, Lyle; Curtis, Brenda. Modeling Human Subjectivity in LLMs Using Explicit and Implicit Human Factors in Personas. Findings of the Association for Computational Linguistics: EMNLP 2024, 2024.

[10] Gerosa, Marco; Trinkenreich, Bianca; Steinmacher, Igor; Sarma, Anita. Automated Software Engineering.
[11] Fleeson, William; Gallagher, Patrick. The implications of Big Five standing for the distribution of trait manifestation in behavior: fifteen experience-sampling studies and a meta-analysis. J Pers Soc Psychol.

[12] Kovač, Grgur; Portelas, Rémy; Sawayama, Masataka; Dominey, Peter Ford; Oudeyer, Pierre-Yves. Stick to your role! Stability of personal values expressed in large language models.

[13] Xiao, Ziang; Zhou, Michelle X.; Liao, Q. Vera; Mark, Gloria; Chi, Changyan; Chen, Wenxi; Yang, Huahai. ACM Trans. Comput.-Hum. Interact., 2020.

[14] Jan Karem Höhne; Konstantin Gavras; Joshua Claassen. Social Science Computer Review, 2024.

[15] Zhu, Zimeng; Hsu, Carol; Nah, Fiona Fui-Hoon; Liu, Na. Internet Research, 2026.

[16] Baughan, Amanda; August, Tal; Yamashita, Naomi; Reinecke, Katharina. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems, 2020.

[17] Stock, Ruth Maria; Oliveira, Pedro; von Hippel, Eric. Journal of Product Innovation Management.

[18] Selena Russo; Chiara Jongerius; Flavia Faccio; Silvia F.M. Pizzoli; Cathy Anne Pinto; Jorien Veldwijk; Rosanne Janssens; Gwenda Simons; Marie Falahee; Esther. Understanding Patients' Preferences: A Systematic Review of Psychological Instruments Used in Patients' Preference and Decision Studies. 2019.

[19] Lee, Sangwon; Koubek, Richard J. Interacting with Computers, 2010.

[20] Pfefferle, Dana; Talbot, Steven R.; Kahnau, Pia; Cassidy, Lauren C.; Brockhausen, Ralf R.; Jaap, Anne; Deikun, Veronika; Yurt, Pinar; Gail, Alexander; Treue, Stefan; Lewejohann, Lars. Behavior Research Methods.
[21] Tomlin, W. Craig. UX and Usability Testing Data. In UX Optimization: Combining Behavioral UX and Usability Testing Data to Optimize Websites, 2018.

[22] Lazik, Christopher Klaus; Katins, Christopher; Kauter, Charlotte; Jakob, Jonas; Jay, Caroline; Grunske, Lars; Kosch, Thomas. Proceedings of the Mensch und Computer 2025, 2025.

[23] Out of One, Many: Using Language Models to Simulate Human Samples. Political Analysis, 2023.

[24] Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems, 2023.

[25] Monika Imschloss; Marko Sarstedt; Susanne J. Adler; Jun Hwa Cheah. The Service Industries Journal, 2025.

[26] Shanahan, Murray; McDonell, Kyle; Reynolds, Laria. Nature.

[27] Zhang, Taiyu; Zhang, Xuesong; Cools, Robbe; Simeone, Adalberto. Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents, 2024.

[28] Lumintu, Ida. Content-Based Recommendation Engine Using Term Frequency-Inverse Document Frequency Vectorization and Cosine Similarity: A Case Study.

[29] Hui, Xiang; Reshef, Oren; Zhou, Luofeng. Organization Science, 2024.

[30] Niederhoffer, Kate; Kellerman, Gabriella Rosen; Lee, Angela; Liebscher, Alex; Rapuano, Kristina; Hancock, Jeffrey T. 2025.
[31] Dietrich, Franz; List, Christian. Noûs.

[32] Trust and reliance on AI – An experimental study on the extent and costs of overreliance on AI. 2024.

[33] AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises. Preprint, 2026.

[34] Towards Measuring the Representation of Subjective Global Opinions in Language Models. Preprint, 2024.

[35] Using LLMs for market research. Harvard Business School Marketing Unit Working Paper, 2023.

[36] Do large language models produce diverse design concepts? A comparative study with human-crowdsourced solutions. Journal of Computing and Information Science in Engineering, 2025.

[37] Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina. Preprint, 2025.

[38] A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas. Preprint, 2025.

[39] The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Preprint, 2025.

[40] Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis. Preprint, 2024.
[41] Introducing … 2025.

[42] Models … 2024.

[43] Ma, Weicheng; Deng, Chunyuan; Moossavi, Aram; Wang, Lili; Vosoughi, Soroush; Yang, Diyi. Simulated Misinformation Susceptibility (SMISTS): Enhancing Misinformation Research with Large Language Model Simulations. Findings of the Association for Computational Linguistics: ACL 2024, 2024.

[44] Takaffoli, Macy; Li, Sijia; et al. Generative AI in User Experience Design and Research: How Do UX Practitioners, Teams, and Companies Use GenAI in Industry? Proceedings of the 2024 ACM Designing Interactive Systems Conference, 2024.

[45] Schuller, Andreas; Janssen, Doris; et al. Generating personas using LLMs and assessing their viability. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems.

[46] Zhu, Qihao; Chong, Leah; Yang, Maria; Luo, Jianxi. Journal of Mechanical Design, 2025.

[47] Jingoog Kim; Mary Lou Maher. International Journal of Design Creativity and Innovation, 2023.

[48] Duan, Peitong; Cheng, Chin-Yi; Li, Gang; Hartmann, Bjoern; Li, Yang. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology, 2024.

[49] Petridis, Savvas; Terry, Michael; Cai, Carrie Jun. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems, 2023.

[50] The preference effect in design concept evaluation. 2014.
[51] Preferences: neither behavioural nor mental. Economics and Philosophy, 2019.

[52] Is usability testing valid with prototypes where clickable hotspots are highlighted upon misclick? 2025.

[53] Validation of information architecture: Cross-methodological comparison of tree testing variants and prototype user testing. 2025.

[54] Marcel Binz; Eric Schulz. Proceedings of the National Academy of Sciences, 2023.

[55] Sop, Serhat Adem; Kurçer, Doğa. Journal of Hospitality and Tourism Technology, 2024.

[56] Democratizing eye-tracking? Appearance-based gaze estimation with improved attention branch. 2025.

[57] Can behavioral features reveal lying in an online personality questionnaire? The impact of mouse dynamics and speech. 2025.

[1] [1]

Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation

Yoon, Se-eun and He, Zhankui and Echterhoff, Jessica and McAuley, Julian. Evaluating Large Language Models as Generative User Simulators for Conversational Recommendation. Proceedings of the 2024 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers). 2024

work page 2024

[2] [2]

and Schoenegger, Philipp and Zhu, Chongyang , title=

Park, Peter S. and Schoenegger, Philipp and Zhu, Chongyang , title=. Behavior Research Methods , year=

work page

[3] [3]

Correcting Systematic Bias in LLM-Generated Dialogues Using Big Five Personality Traits , year=

Sparrenberg, Lorenz and Schneider, Tobias and Deußer, Tobias and Koppenborg, Markus and Sifa, Rafet , booktitle=. Correcting Systematic Bias in LLM-Generated Dialogues Using Big Five Personality Traits , year=

work page

[4] [4]

Socially Responsible Language Modelling Research , year=

Do Personality Tests Generalize to Large Language Models? , author=. Socially Responsible Language Modelling Research , year=

work page

[5] [5]

and Ghanem, Bernard and Li, Guohao and Xie, Chengxing and Chen, Canyu , booktitle =

Jia, Feiran and Ye, Ziyu and Lai, Shiyang and Shu, Kai and Gu, Jindong and Bibi, Adel and Hu, Ziniu and Jurgens, David and Evans, James and Torr, Philip H.S. and Ghanem, Bernard and Li, Guohao and Xie, Chengxing and Chen, Canyu , booktitle =. Can Large Language Model Agents Simulate Human Trust Behavior? , volume =

work page

[6] [6]

Humanities and Social Sciences Communications , year=

Qu, Yao and Wang, Jue , title=. Humanities and Social Sciences Communications , year=

work page

[7] [7]

Proceedings of the 40th International Conference on Machine Learning , pages =

Whose Opinions Do Language Models Reflect? , author =. Proceedings of the 40th International Conference on Machine Learning , pages =. 2023 , editor =

work page 2023

[8] [8]

2025 , issn =

Toward accurate psychological simulations: Investigating LLMs’ responses to personality and cultural variables , journal =. 2025 , issn =

work page 2025

[9] [9]

Modeling Human Subjectivity in LLM s Using Explicit and Implicit Human Factors in Personas

Giorgi, Salvatore and Liu, Tingting and Aich, Ankit and Isman, Kelsey Jane and Sherman, Garrick and Fried, Zachary and Sedoc, Jo \ a o and Ungar, Lyle and Curtis, Brenda. Modeling Human Subjectivity in LLM s Using Explicit and Implicit Human Factors in Personas. Findings of the Association for Computational Linguistics: EMNLP 2024. 2024

work page 2024

[10] [10]

Automated Software Engineering , year=

Gerosa, Marco and Trinkenreich, Bianca and Steinmacher, Igor and Sarma, Anita , title=. Automated Software Engineering , year=

work page

[11] [11]

The implications of Big Five standing for the distribution of trait manifestation in behavior: fifteen experience-sampling studies and a meta-analysis

Fleeson, William and Gallagher, Patrick. The implications of Big Five standing for the distribution of trait manifestation in behavior: fifteen experience-sampling studies and a meta-analysis. J Pers Soc Psychol

work page

[12] [12]

Stick to your role! Stability of personal values expressed in large language models , year =

Kovač, Grgur AND Portelas, Rémy AND Sawayama, Masataka AND Dominey, Peter Ford AND Oudeyer, Pierre-Yves , journal =. Stick to your role! Stability of personal values expressed in large language models , year =

work page

[13] [13]

and Liao, Q

Xiao, Ziang and Zhou, Michelle X. and Liao, Q. Vera and Mark, Gloria and Chi, Changyan and Chen, Wenxi and Yang, Huahai , title =. ACM Trans. Comput.-Hum. Interact. , month = jun, articleno =. 2020 , issue_date =

work page 2020

[14] [14]

Social Science Computer Review , volume =

Jan Karem Höhne and Konstantin Gavras and Joshua Claassen , title =. Social Science Computer Review , volume =. 2024 , URL =

work page 2024

[15] [15]

Internet Research , volume =

Zhu, Zimeng and Hsu, Carol and Nah, Fiona Fui-Hoon and Liu, Na , title =. Internet Research , volume =. 2026 , month =

work page 2026

[16] Baughan, Amanda and August, Tal and Yamashita, Naomi and Reinecke, Katharina. Proceedings of the 2020 CHI Conference on Human Factors in Computing Systems. 2020.

[17] Stock, Ruth Maria and Oliveira, Pedro and von Hippel, Eric. Journal of Product Innovation Management.

[18] Russo, Selena and Jongerius, Chiara and Faccio, Flavia and Pizzoli, Silvia F.M. and Pinto, Cathy Anne and Veldwijk, Jorien and Janssens, Rosanne and Simons, Gwenda and Falahee, Marie and Esther …. Understanding Patients' Preferences: A Systematic Review of Psychological Instruments Used in Patients' Preference and Decision Studies. 2019.

[19] Lee, Sangwon and Koubek, Richard J. Interacting with Computers. 2010.

[20] Pfefferle, Dana and Talbot, Steven R. and Kahnau, Pia and Cassidy, Lauren C. and Brockhausen, Ralf R. and Jaap, Anne and Deikun, Veronika and Yurt, Pinar and Gail, Alexander and Treue, Stefan and Lewejohann, Lars. Behavior Research Methods.

[21] Tomlin, W. Craig. UX and Usability Testing Data. UX Optimization: Combining Behavioral UX and Usability Testing Data to Optimize Websites. 2018.

[22] Lazik, Christopher Klaus and Katins, Christopher and Kauter, Charlotte and Jakob, Jonas and Jay, Caroline and Grunske, Lars and Kosch, Thomas. Proceedings of the Mensch Und Computer 2025. 2025.

[23] Out of One, Many: Using Language Models to Simulate Human Samples. Political Analysis. 2023.

[24] H…. Evaluating Large Language Models in Generating Synthetic HCI Research Data: a Case Study. Proceedings of the 2023 CHI Conference on Human Factors in Computing Systems. 2023.

[25] Imschloss, Monika and Sarstedt, Marko and Adler, Susanne J. and Cheah, Jun Hwa. The Service Industries Journal. 2025.

[26] Shanahan, Murray and McDonell, Kyle and Reynolds, Laria. Nature.

[27] Zhang, Taiyu and Zhang, Xuesong and Cools, Robbe and Simeone, Adalberto. Proceedings of the 24th ACM International Conference on Intelligent Virtual Agents. 2024.

[28] Lumintu, Ida. Content-Based Recommendation Engine Using Term Frequency-Inverse Document Frequency Vectorization and Cosine Similarity: A Case Study.

[29] Hui, Xiang and Reshef, Oren and Zhou, Luofeng. Organization Science. 2024.

[30] Niederhoffer, Kate and Kellerman, Gabriella Rosen and Lee, Angela and Liebscher, Alex and Rapuano, Kristina and Hancock, Jeffrey T. 2025.

[31] Dietrich, Franz and List, Christian. Noûs.

[32] Trust and reliance on AI — An experimental study on the extent and costs of overreliance on AI. 2024.

[33] AI Arms and Influence: Frontier Models Exhibit Sophisticated Reasoning in Simulated Nuclear Crises. Preprint. 2026.

[34] Towards Measuring the Representation of Subjective Global Opinions in Language Models. Preprint. 2024.

[35] Using LLMs for market research. Harvard Business School Marketing Unit Working Paper. 2023.

[36] Do large language models produce diverse design concepts? A comparative study with human-crowdsourced solutions. Journal of Computing and Information Science in Engineering. 2025.

[37] Take Caution in Using LLMs as Human Surrogates: Scylla Ex Machina. Preprint. 2025.

[38] A Tale of Two Identities: An Ethical Audit of Human and AI-Crafted Personas. Preprint. 2025.

[39] The Illusion of Thinking: Understanding the Strengths and Limitations of Reasoning Models via the Lens of Problem Complexity. Preprint. 2025.

[40] Limited Ability of LLMs to Simulate Human Psychological Behaviours: a Psychometric Analysis. Preprint. 2024.

[41] Introducing …. 2025.

[42] Models - …. 2024.

[43] Ma, Weicheng and Deng, Chunyuan and Moossavi, Aram and Wang, Lili and Vosoughi, Soroush and Yang, Diyi. Simulated Misinformation Susceptibility (SMISTS): Enhancing Misinformation Research with Large Language Model Simulations. Findings of the Association for Computational Linguistics: ACL 2024. 2024.

[44] Takaffoli, Macy and Li, Sijia and M…. Generative AI in User Experience Design and Research: How Do UX Practitioners, Teams, and Companies Use GenAI in Industry? Proceedings of the 2024 ACM Designing Interactive Systems Conference. 2024.

[45] Schuller, Andreas and Janssen, Doris and Blumenr…. Generating personas using LLMs and assessing their viability. Extended Abstracts of the CHI Conference on Human Factors in Computing Systems.

[46] Zhu, Qihao and Chong, Leah and Yang, Maria and Luo, Jianxi. Journal of Mechanical Design. 2025.

[47] Kim, Jingoog and Maher, Mary Lou. International Journal of Design Creativity and Innovation. 2023.

[48] Duan, Peitong and Cheng, Chin-Yi and Li, Gang and Hartmann, Bjoern and Li, Yang. Proceedings of the 37th Annual ACM Symposium on User Interface Software and Technology. 2024.

[49] Petridis, Savvas and Terry, Michael and Cai, Carrie Jun. Extended Abstracts of the 2023 CHI Conference on Human Factors in Computing Systems. 2023.

[50] The preference effect in design concept evaluation. 2014.

[51] Preferences: neither behavioural nor mental. Economics and Philosophy. 2019.

[52] Is usability testing valid with prototypes where clickable hotspots are highlighted upon misclick? 2025.

[53] Validation of information architecture: Cross-methodological comparison of tree testing variants and prototype user testing. 2025.

[54] Binz, Marcel and Schulz, Eric. Proceedings of the National Academy of Sciences. 2023.

[55] Sop, Serhat Adem and Kurçer, Doğa. Journal of Hospitality and Tourism Technology. 2024.

[56] Democratizing eye-tracking? Appearance-based gaze estimation with improved attention branch. 2025.

[57] Can behavioral features reveal lying in an online personality questionnaire? The impact of mouse dynamics and speech. 2025.