
arxiv: 2511.02135 · v2 · submitted 2025-11-03 · 💻 cs.CL

Graph-Based Alternatives to LLMs for Human Simulation

Pith reviewed 2026-05-18 00:44 UTC · model grok-4.3

classification 💻 cs.CL
keywords graph neural networks · human behavior simulation · link prediction · large language models · survey prediction · test-taking simulation · efficient modeling

The pith

Graph neural networks match or beat LLMs at simulating human choices on closed-ended tasks while using three orders of magnitude fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models are essential for simulating human behavior on tasks such as survey response prediction and test-taking. It presents GEMS, which builds a heterogeneous graph from past individual responses and frames new simulations as a link-prediction problem solved by a graph neural network. On three datasets and three evaluation settings the graph model performs at or above the level of strong LLM baselines. The result matters because it suggests that, for many practical simulation needs, far smaller and more efficient models can deliver comparable accuracy without the scale and cost of current LLM approaches.

Core claim

GEMS formulates close-ended human simulation as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, this graph neural network matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters.

What carries the argument

Link prediction on a heterogeneous graph whose nodes are individuals and possible choices, with edges derived from historical response data.
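The link-prediction framing can be illustrated with a minimal sketch. This is not the GEMS implementation: the toy responses, the names `build_graph` and `predict`, and the neighbor-overlap score are all assumptions made for illustration, and the overlap score merely stands in for the learned graph neural network. What the sketch does share with the paper's description is the shape of the problem: individuals and choices are nodes, observed responses are edges, and a prediction is a softmax over the options of one question.

```python
from collections import defaultdict
import math

# Hypothetical toy data: (individual, question, chosen option).
responses = [
    ("ana", "q1", "q1:agree"),
    ("ana", "q2", "q2:yes"),
    ("ben", "q1", "q1:agree"),
    ("ben", "q2", "q2:yes"),
    ("cy", "q1", "q1:disagree"),
    ("cy", "q2", "q2:no"),
    ("dee", "q1", "q1:agree"),  # dee's answer to q2 is the missing link
]
options = {"q1": ["q1:agree", "q1:disagree"], "q2": ["q2:yes", "q2:no"]}


def build_graph(responses):
    """Build adjacency sets for the bipartite individual-choice graph."""
    person_to_choices = defaultdict(set)
    choice_to_persons = defaultdict(set)
    for person, _question, choice in responses:
        person_to_choices[person].add(choice)
        choice_to_persons[choice].add(person)
    return person_to_choices, choice_to_persons


def predict(person, question, person_to_choices, choice_to_persons):
    """Return a softmax distribution over the options of one question.

    Assumed stand-in for a learned scorer: each option is scored by how many
    choices the target person shares with the people who picked that option.
    """
    mine = person_to_choices[person]
    scores = []
    for option in options[question]:
        score = sum(
            len(mine & person_to_choices[other])
            for other in choice_to_persons[option]
            if other != person
        )
        scores.append(float(score))
    peak = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return {opt: e / total for opt, e in zip(options[question], exps)}


p2c, c2p = build_graph(responses)
probs = predict("dee", "q2", p2c, c2p)
print(max(probs, key=probs.get), probs)
```

In this toy graph the prediction for the unobserved individual-question pair leans toward the option picked by people with matching history, which is the intuition behind treating simulation as a missing-edge problem.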

If this is right

  • Survey and test prediction tasks can achieve strong accuracy without generative language models.
  • Behavioral simulation becomes feasible at lower computational cost for repeated or large-scale use.
  • Predictions rest on observable historical links rather than opaque internal representations.
  • The method supplies a lighter-weight complement for applications where past choice patterns dominate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • If the same graph construction succeeds on additional closed-ended tasks, historical data alone may suffice for many simulation needs without any generative component.
  • Hybrid systems could route the bulk of prediction through the graph model and reserve language models only for edge cases that require explicit reasoning.
  • Scaling the approach to new populations would reveal how much of the performance depends on the density and coverage of the original response graph.

Load-bearing premise

The graph built from historical responses already encodes the behavioral patterns required to predict responses to new items accurately.

What would settle it

On a fresh dataset or task, if the graph model falls substantially below LLM performance while the graph construction remains unchanged, the claimed advantage would not hold.

Figures

Figures reproduced from arXiv: 2511.02135 by Joseph Suh, Serina Chang, Suhong Moon.

Figure 1: In our GEMS framework, we construct a heterogeneous graph for discrete choice simulation tasks (Top) where the goal is to predict the option chosen by an individual in response to a context or question. Under three widely studied settings (Bottom), we show that our GNN-based method achieves prediction accuracy consistently comparable to the best LLM-based approaches. imputation), (2) responses of new indi…
Figure 2: Overall architecture of GEMS. The graph encoder learns representations of individual and choice nodes from the relational structure of observed responses, then predicts new responses with a softmax classifier over question options (Top). In setting 3 only, we learn a simple LLM-to-GNN projection that maps an LLM’s frozen representation of the choice node’s text to its GNN embedding, so that we can acquire …
Figure 3: Prediction accuracy vs. GPU-hours (A100-80GB-SXM4) on the OPINIONQA dataset by task setting and method. Zero-/few-shot prompting accuracies fall below the plotted y-range. For LLM-based methods, we report the best result across three LLMs (LLaMA-2-7B, Mistral-7B-v0.1, and Qwen3-8B). For GEMS, we report the best result across three models (RGCN, GAT, and SAGE) for setting 1 & 2, and report across different …
Figure 4: Visualization of LLM hidden states and GNN node embeddings on the first and second …
Figure 5: Mean and standard deviation of prediction accuracy on setting 3 (new questions) of O…
Figure 6: Mean and standard deviation of prediction accuracy on Setting 3 (new questions) of the …
Original abstract

Large language models (LLMs) have become a popular approach for simulating human behaviors, yet it remains unclear if LLMs are necessary for all simulation tasks. We study a broad family of close-ended simulation tasks, with applications from survey prediction to test-taking, and show that a graph neural network can match or surpass strong LLM-based methods. We introduce Graph-basEd Models for Human Simulation (GEMS) which formulates close-ended simulation as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, GEMS matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters. These results suggest that graph-based modeling can complement LLMs as an efficient and transparent approach to simulating human behaviors. Code is available at https://github.com/schang-lab/gems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Graph-basEd Models for Human Simulation (GEMS), which reformulates close-ended human behavior simulation tasks (e.g., survey prediction, test-taking) as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, the authors report that GEMS matches or outperforms strong LLM-based methods while using three orders of magnitude fewer parameters, positioning graph-based modeling as an efficient, transparent complement to LLMs.

Significance. If the empirical comparisons prove fair with respect to input data parity, this result would demonstrate that standard GNN link-prediction models can achieve competitive performance on structured human simulation tasks at far lower computational cost. The provision of code and the focus on parameter efficiency are strengths that could influence practical deployments in behavioral modeling.

major comments (3)
  1. [§4] §4 (Evaluation Settings) and associated tables: the central claim that GEMS matches or outperforms LLM baselines requires explicit confirmation that the LLM methods received the same per-individual historical response data used to construct the GEMS heterogeneous graph. If the LLMs were prompted only with demographics or item descriptions, the reported advantage may reflect asymmetric information access rather than inherent modeling superiority; this parity must be verified for each of the three settings and datasets.
  2. [§3.1] §3.1 (Graph Construction) and §3.2 (Link Prediction Objective): the heterogeneous graph encodes all historical individual-choice links before prediction. Clarify the train/test edge split procedure to rule out leakage of test-item information into the graph used for evaluation; without this, the link-prediction performance cannot be interpreted as genuine out-of-sample simulation.
  3. [Results] Results tables (e.g., Tables 2–4): the performance comparisons should report statistical significance tests or confidence intervals for the 'matches or outperforms' statements. Current presentation leaves unclear whether observed differences are reliable across the three datasets.
minor comments (2)
  1. [Abstract] Abstract: the claim of 'three orders of magnitude fewer parameters' would be strengthened by stating the exact parameter counts for GEMS versus the strongest LLM baselines.
  2. [§3] Notation in §3: the definition of the heterogeneous graph (nodes for individuals and choices, edge types) could include a small diagram or explicit adjacency-matrix formulation to aid readability.
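As an illustration of the kind of explicit formulation the second minor comment asks for, one possible way to write the graph and the prediction rule is sketched below. The notation is assumed here and is not taken from the paper: $U$ is the set of individuals, $C_q$ the option set of question $q$, $C$ the union of all option sets, $h_u$ and $h_c$ the node embeddings produced by the graph encoder, and $s$ an unspecified scoring function. The paper's actual graph may contain further node or edge types.

```latex
% Assumed notation, for illustration only.
% Observed responses as a bipartite adjacency matrix:
\[
A \in \{0,1\}^{|U| \times |C|}, \qquad
A_{uc} =
\begin{cases}
1 & \text{if individual } u \text{ selected choice } c,\\
0 & \text{otherwise.}
\end{cases}
\]
% Link prediction for an unobserved pair (u, q): softmax over the options of q only.
\[
\hat{p}(c \mid u, q) =
\frac{\exp\bigl(s(h_u, h_c)\bigr)}
     {\sum_{c' \in C_q} \exp\bigl(s(h_u, h_{c'})\bigr)},
\qquad c \in C_q .
\]
```

Restricting the normalization to $C_q$ matches the Figure 2 description of a softmax classifier over question options; everything else in the block is a hedged reading rather than the authors' definition.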

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate clarifications and additional analyses.

Point-by-point responses
  1. Referee: [§4] §4 (Evaluation Settings) and associated tables: the central claim that GEMS matches or outperforms LLM baselines requires explicit confirmation that the LLM methods received the same per-individual historical response data used to construct the GEMS heterogeneous graph. If the LLMs were prompted only with demographics or item descriptions, the reported advantage may reflect asymmetric information access rather than inherent modeling superiority; this parity must be verified for each of the three settings and datasets.

    Authors: We confirm that the LLM baselines received exactly the same per-individual historical response data used to build the GEMS graph. In the prompting protocol of §4, each LLM input for an individual includes their full set of prior responses to other items (along with demographics and item descriptions). This information parity holds for all three datasets and all three evaluation settings. We will add an explicit verification paragraph in the revised §4 to document this for each case. revision: yes

  2. Referee: [§3.1] §3.1 (Graph Construction) and §3.2 (Link Prediction Objective): the heterogeneous graph encodes all historical individual-choice links before prediction. Clarify the train/test edge split procedure to rule out leakage of test-item information into the graph used for evaluation; without this, the link-prediction performance cannot be interpreted as genuine out-of-sample simulation.

    Authors: We use a per-individual random edge split: for every person, 70% of their historical responses are included as training edges when constructing the heterogeneous graph, while the remaining 30% are held out entirely as test edges. No test edges or test-item information appear in the graph at training or inference time. This is a standard inductive-style split for simulation. We will expand §3.1 with a dedicated paragraph and pseudocode describing the split to eliminate any ambiguity. revision: yes

  3. Referee: Results tables (e.g., Tables 2–4): the performance comparisons should report statistical significance tests or confidence intervals for the 'matches or outperforms' statements. Current presentation leaves unclear whether observed differences are reliable across the three datasets.

    Authors: We agree that statistical reliability measures will strengthen the claims. In the revised version we will augment Tables 2–4 with 95% bootstrap confidence intervals (1,000 resamples) for every reported metric across the three datasets. This will make clear which performance differences are reliable. revision: yes
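The confidence intervals promised in the third response could be computed along the following lines. This is a minimal sketch of a percentile bootstrap over per-prediction correctness flags, using the 95% level and 1,000 resamples named in the simulated rebuttal; the function name `bootstrap_accuracy_ci` and the toy outcome vector are hypothetical, and nothing here is confirmed to be the procedure the authors would use.

```python
import random


def bootstrap_accuracy_ci(correct, n_resamples=1000, level=0.95, seed=0):
    """Percentile bootstrap CI for accuracy over 0/1 per-prediction outcomes."""
    rng = random.Random(seed)
    n = len(correct)
    means = []
    for _ in range(n_resamples):
        # Resample predictions with replacement and record the accuracy.
        sample = [correct[rng.randrange(n)] for _ in range(n)]
        means.append(sum(sample) / n)
    means.sort()
    alpha = (1.0 - level) / 2.0
    lower = means[int(alpha * n_resamples)]
    upper = means[min(n_resamples - 1, int((1.0 - alpha) * n_resamples))]
    return sum(correct) / n, lower, upper


# Hypothetical outcomes: 70 correct and 30 incorrect predictions.
outcomes = [1] * 70 + [0] * 30
accuracy, lower, upper = bootstrap_accuracy_ci(outcomes)
print(f"accuracy={accuracy:.3f}, 95% CI=[{lower:.3f}, {upper:.3f}]")
```

One design caveat: resampling individual predictions treats them as independent. Since several predictions come from the same person, resampling whole individuals instead may give more honest intervals.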

Circularity Check

0 steps flagged

No significant circularity in empirical comparison of GNN link prediction vs LLM baselines

full rationale

The paper's core contribution is an empirical demonstration that a standard heterogeneous GNN link-prediction model (GEMS) matches or exceeds LLM performance on close-ended simulation tasks across three datasets and three evaluation settings, while using far fewer parameters. No mathematical derivation chain exists that reduces reported performance metrics to quantities defined by construction from the same fitted inputs; the heterogeneous graph is assembled from historical individual-choice links and evaluated on held-out predictions, which is a conventional train/test split rather than a self-referential loop. The modeling choice to treat simulation as link prediction is an explicit ansatz, not a result derived from prior equations within the paper. No self-citations, uniqueness theorems, or renamings of known results are invoked as load-bearing steps. The central claim therefore remains an independent empirical finding rather than a tautology.
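The held-out evaluation described above depends on test edges never entering the graph. A minimal sketch of a per-individual edge split follows, assuming the 70/30 ratio quoted in the simulated rebuttal (the paper itself is not confirmed to use it). The function name `split_edges_per_individual` and the toy responses are hypothetical.

```python
import random
from collections import defaultdict


def split_edges_per_individual(responses, train_frac=0.7, seed=0):
    """Split (individual, question, choice) edges into train and test sets.

    Each individual's edges are shuffled and divided separately, so every
    person keeps roughly train_frac of their history in the training graph.
    """
    rng = random.Random(seed)
    by_person = defaultdict(list)
    for edge in responses:
        by_person[edge[0]].append(edge)

    train, test = [], []
    for person in sorted(by_person):  # sorted for reproducibility
        edges = list(by_person[person])
        rng.shuffle(edges)
        cut = round(train_frac * len(edges))
        train.extend(edges[:cut])
        test.extend(edges[cut:])
    return train, test


# Hypothetical toy data: three individuals answering ten questions each.
toy_responses = [
    (person, f"q{i}", f"q{i}:a")
    for person in ("ana", "ben", "cy")
    for i in range(10)
]
train_edges, test_edges = split_edges_per_individual(toy_responses)

# Leakage check: no edge may appear on both sides of the split.
assert not set(train_edges) & set(test_edges)
print(len(train_edges), len(test_edges))
```

Only `train_edges` would be used to build the graph; `test_edges` are scored afterwards. Note that a random edge split keeps every individual and every question in the training graph, so it corresponds to imputing missing responses rather than to simulating new individuals or new questions.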

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on standard graph-neural-network assumptions plus the modeling decision that human choices can be represented as edges in a person-choice graph; no new physical entities or ad-hoc constants are introduced.

axioms (1)
  • domain assumption: A heterogeneous graph of individuals and answer choices can be constructed from historical response data such that missing links correspond to plausible future choices.
    This premise is required for the link-prediction framing to be meaningful for simulation.

pith-pipeline@v0.9.0 · 5658 in / 1254 out tokens · 28602 ms · 2026-05-18T00:44:01.953996+00:00 · methodology



Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · 16 internal anchors

  1. [1]

    Prediction-powered inference

    Anastasios N Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I Jordan, and Tijana Zrnic. Prediction-powered inference. Science, 382 0 (6671): 0 669--674, 2023

  2. [2]

    Richardson, Austin C

    Jacy Reese Anthis, Ryan Liu, Sean M Richardson, Austin C Kozlowski, Bernard Koch, James Evans, Erik Brynjolfsson, and Michael Bernstein. Llm social simulations are a promising research method. arXiv preprint arXiv:2504.02234, 2025

  3. [3]

    Artificial societies — company profile

    Artificial Societies Artificial Societies. Artificial societies — company profile. https://www.ycombinator.com/companies/artificial-societies, 2025. Accessed: 2025-09-21

  4. [4]

    Layer Normalization

    Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016

  5. [5]

    Explicitly unbiased large language models still form biased associations

    Xuechunzi Bai, Angelina Wang, Ilia Sucholutsky, and Thomas L Griffiths. Explicitly unbiased large language models still form biased associations. Proceedings of the National Academy of Sciences, 122 0 (8): 0 e2416228122, 2025

  6. [6]

    Relational inductive biases, deep learning, and graph networks

    Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018

  7. [7]

    Integration of choice and latent variable models

    Moshe Ben-Akiva, Joan Walker, Adriana T Bernardino, Dinesh A Gopinath, Taka Morikawa, and Amalia Polydoropoulou. Integration of choice and latent variable models. Perpetual motion: Travel behaviour research opportunities and application challenges, 2002: 0 431--470, 2002

  8. [8]

    Graph Convolutional Matrix Completion

    Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263, 2017

  9. [9]

    A foundation model to predict and capture human cognition

    Marcel Binz, Elif Akata, Matthias Bethge, Franziska Br \"a ndle, Fred Callaway, Julian Coda-Forno, Peter Dayan, Can Demircan, Maria K Eckstein, No \'e mi \'E ltet o , et al. A foundation model to predict and capture human cognition. Nature, pp.\ 1--8, 2025

  10. [10]

    Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer M

    James Bisbee, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer M. Larson. Synthetic replacements for human survey data? the perils of large language models. Political Analysis, 2024

  11. [11]

    Specializing large language models to simulate survey response distributions for global populations

    Yong Cao, Haijiang Liu, Arnav Arora, Isabelle Augenstein, Paul R \"o ttger, and Daniel Hershcovich. Specializing large language models to simulate survey response distributions for global populations. arXiv preprint arXiv:2502.07068, 2025

  12. [12]

    Issues and the 2024 election

    Pew Research Center. Issues and the 2024 election. https://www.pewresearch.org/politics/2024/09/09/issues-and-the-2024-election/, September 9 2024. Accessed: YYYY-MM-DD

  13. [13]

    Llaga: Large language and graph assistant.arXiv preprint arXiv:2402.08170, 2024

    Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, and Zhangyang Wang. Llaga: Large language and graph assistant. arXiv preprint arXiv:2402.08170, 2024

  14. [14]

    Compost: Characterizing and evaluating caricature in llm simulations

    Myra Cheng, Tiziano Piccardi, and Diyi Yang. Compost: Characterizing and evaluating caricature in llm simulations. In EMNLP, 2023

  15. [15]

    arXiv preprint arXiv:2303.16779 (2023)

    Eric Chu, Jacob Andreas, Stephen Ansolabehere, and Deb Roy. Language models trained on media diets can predict public opinion. arXiv preprint arXiv:2303.16779, 2023

  16. [16]

    Unveiling the spectrum of data contamination in language models: A survey from detection to remediation

    Chunyuan Deng, Yilun Zhao, Yuzhao Heng, Yitong Li, Jiannan Cao, Xiangru Tang, and Arman Cohan. Unveiling the spectrum of data contamination in language models: A survey from detection to remediation. arXiv preprint arXiv:2406.14644, 2024

  17. [17]

    Dominguez-Olmedo, M

    Ricardo Dominguez-Olmedo, Moritz Hardt, and Celestine Mendler-D \"u nner. Questioning the survey responses of large language models. arXiv preprint arXiv:2306.07951, 2023

  18. [18]

    The Llama 3 Herd of Models

    Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

  19. [19]

    Expected parrot

    Expected Parrot Expected Parrot. Expected parrot. https://www.expectedparrot.com/, 2025. Accessed: 2025-09-21

  20. [20]

    Graph neural networks for social recommendation

    Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. Graph neural networks for social recommendation. In The world wide web conference, pp.\ 417--426, 2019

  21. [21]

    Modular pluralism: Pluralistic alignment via multi- LLM collaboration

    Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, and Yulia Tsvetkov. Modular pluralism: Pluralistic alignment via multi- LLM collaboration. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp.\ 4151--4171, November 2024

  22. [22]

    Fast Graph Representation Learning with PyTorch Geometric

    Matthias Fey and Jan Eric Lenssen. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428, 2019

  23. [23]

    Large language models empowered agent-based modeling and simulation: a survey and perspectives

    Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, and Yong Li. Large language models empowered agent-based modeling and simulation: a survey and perspectives. Humanities and Social Sciences Communications, 11 0 (1259), 2024

  24. [24]

    A latent class model for discrete choice analysis: contrasts with mixed logit

    William H Greene and David A Hensher. A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B: Methodological, 37 0 (8): 0 681--698, 2003

  25. [25]

    Inductive representation learning on large graphs

    Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017

  26. [26]

    Lightgcn: Simplifying and powering graph convolution network for recommendation

    Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pp.\ 639--648, 2020

  27. [27]

    Explanations as features: Llm-based features for text-attributed graphs.CoRR, abs/2305.19523, 2023

    Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, and Bryan Hooi. Harnessing explanations: Llm-to-lm interpreter for enhanced text-attributed graph representation learning. arXiv preprint arXiv:2305.19523, 2023

  28. [28]

    Community-cross-instruct: Unsupervised instruction generation for aligning large language models to online communities

    Zihao He, Minh Duc Chu, Rebecca Dorn, Siyi Guo, and Kristina Lerman. Community-cross-instruct: Unsupervised instruction generation for aligning large language models to online communities. arXiv preprint arXiv:2406.12074, 2024

  29. [29]

    Predicting results of social science experiments using large language models

    Luke Hewitt, Ashwini Ashokkumar, Isaias Ghezae, and Robb Willer. Predicting results of social science experiments using large language models. Technical report, Stanford University and New York University, August 2024

  30. [30]

    Addressing systematic non-response bias with supervised fine-tuning of large language models: A case study on german voting behaviour

    Tobias Holtdirk, Dennis Assenmacher, Arnim Bleier, and Claudia Wagner. Addressing systematic non-response bias with supervised fine-tuning of large language models: A case study on german voting behaviour. Technical report, Center for Open Science, 2025

  31. [31]

    Lora: Low-rank adaptation of large language models

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1 0 (2): 0 3, 2022

  32. [32]

    Let's ask gnn: Empowering large language model for graph in-context learning

    Zhengyu Hu, Yichuan Li, Zhengyu Chen, Jingang Wang, Han Liu, Kyumin Lee, and Kaize Ding. Let's ask gnn: Empowering large language model for graph in-context learning. arXiv preprint arXiv:2410.07074, 2024

  33. [33]

    Human subjects research in the age of generative ai: Opportunities and challenges of applying llm-simulated data to hci studies

    Angel Hsing-Chi Hwang, Michael S Bernstein, S Shyam Sundar, Renwen Zhang, Manoel Horta Ribeiro, Yingdan Lu, Serina Chang, Tongshuang Wu, Aimei Yang, Dmitri Williams, et al. Human subjects research in the age of generative ai: Opportunities and challenges of applying llm-simulated data to hci studies. In Proceedings of the Extended Abstracts of the CHI Con...

  34. [34]

    Hwang, B

    EunJeong Hwang, Bodhisattwa Prasad Majumder, and Niket Tandon. Aligning language models to user opinions. arXiv preprint arXiv:2305.14929, 2023

  35. [35]

    A rational model of the dunning--kruger effect supports insensitivity to evidence in low performers

    Rachel A Jansen, Anna N Rafferty, and Thomas L Griffiths. A rational model of the dunning--kruger effect supports insensitivity to evidence in low performers. Nature Human Behaviour, 5 0 (6): 0 756--763, 2021

  36. [36]

    Mistral 7B

    Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023

  37. [37]

    Deep binding of language model virtual personas: a study on approximating political partisan misperceptions

    Minwoo Kang, Suhong Moon, Seung Hyeong Lee, Ayush Raj, Joseph Suh, and David Chan. Deep binding of language model virtual personas: a study on approximating political partisan misperceptions. In Second Conference on Language Modeling, 2025

  38. [38]

    Simulacrum of stories: Examining large language models as qualitative research participants

    Shivani Kapania, William Agnew, Motahhare Eslami, Hoda Heidari, and Sarah E Fox. Simulacrum of stories: Examining large language models as qualitative research participants. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp.\ 1--17, 2025

  39. [39]

    Few-shot personalization of llms with mis-aligned responses

    Jaehyung Kim and Yiming Yang. Few-shot personalization of llms with mis-aligned responses. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp.\ 11943--11974, 2025

  40. [40]

    Linear representations of political perspective emerge in large language models

    Junsol Kim, James Evans, and Aaron Schein. Linear representations of political perspective emerge in large language models. In The Thirteenth International Conference on Learning Representations, 2025

  41. [41]

    Adam: A Method for Stochastic Optimization

    Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

  42. [42]

    Variational Graph Auto-Encoders

    Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016

  43. [43]

    Bernstein

    Akaash Kolluri, Shengguang Wu, Joon Sung Park, and Michael S. Bernstein. Finetuning llms for human behavior prediction in social science experiments, 2025

  44. [44]

    Persona-driven simulation of voting behavior in the european parliament with large language models

    Maximilian Kreutner, Marlene Lutz, and Markus Strohmaier. Persona-driven simulation of voting behavior in the european parliament with large language models. arXiv preprint arXiv:2506.11798, 2025

  45. [45]

    Valid survey simulations with limited human data: The roles of prompting, fine-tuning, and rectification

    Stefan Krsteski, Giuseppe Russo, Serina Chang, Robert West, and Kristina Gligori \'c . Valid survey simulations with limited human data: The roles of prompting, fine-tuning, and rectification. arXiv preprint arXiv:2510.11408, 2025

  46. [46]

    Efficient memory management for large language model serving with pagedattention

    Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th symposium on operating systems principles, pp.\ 611--626, 2023

  47. [47]

    Llm generated persona is a promise with a catch.arXiv preprint arXiv:2503.16527, 2025

    Ang Li, Haozhe Chen, Hongseok Namkoong, and Tianyi Peng. Llm generated persona is a promise with a catch. arXiv preprint arXiv:2503.16527, 2025

  48. [48]

    Culturellm: Incorporating cultural differences into large language models

    Cheng Li, Mengzhuo Chen, Jindong Wang, Sunayana Sitaram, and Xing Xie. Culturellm: Incorporating cultural differences into large language models. Advances in Neural Information Processing Systems, 37: 0 84799--84838, 2024

  49. [49]

    An introduction to neural networks for the social sciences

    Gechun Lin and Christopher Lucas. An introduction to neural networks for the social sciences. Oxford Handbook of Engaged Methodological Pluralism in Political Science, 2023

  50. [50]

    What makes good in-context examples for GPT-3?arXiv preprint arXiv:2101.06804, 2021

    Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. What makes good in-context examples for gpt- 3 ? arXiv preprint arXiv:2101.06804, 2021

  51. [51]

    Decoupled Weight Decay Regularization

    Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017

  52. [52]

    Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity

    Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 8086--8098, 2022

  53. [53]

    Automated social science: Language models as scientist and subjects

    Benjamin S Manning, Kehang Zhu, and John J Horton. Automated social science: Language models as scientist and subjects. Technical report, National Bureau of Economic Research, 2024

  54. [54]

    Mixed mnl models for discrete response

    Daniel McFadden and Kenneth Train. Mixed mnl models for discrete response. Journal of applied Econometrics, 15 0 (5): 0 447--470, 2000

  55. [55]

    Virtual personas for language models via an anthology of backstories

    Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Behar, and David Chan. Virtual personas for language models via an anthology of backstories. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp.\ 19864--19897, 2024

  56. [56]

    gpt-oss-120b & gpt-oss-20b Model Card

    OpenAI. gpt-oss-120b & gpt-oss-20b model card, 2025. URL https://arxiv.org/abs/2508.10925

  57. [57]

    LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

    Joon Sung Park, Carolyn Q Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, and Michael S Bernstein. Generative agent simulations of 1,000 people. arXiv preprint arXiv:2411.10109, 2024

  58. [58]

    America trends panel waves

    PewResearch. America trends panel waves. Retrieved February 06, 2025, from https://www.pewsocialtrends.org/dataset, 2018

  59. [59]

    Performance and biases of large language models in public opinion simulation

    Yao Qu and Jue Wang. Performance and biases of large language models in public opinion simulation. Humanities and Social Sciences Communications, 11 0 (1): 0 1--13, 2024

  60. [60]

    Qwen2.5 Technical Report

    Qwen, :, An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li,...

  61. [61]

    Synthia: Scalable Grounded Persona Generation from Social Media Data

    Vahid Rahimzadeh, Erfan Moosavi Monazzah, Mohammad Taher Pilehvar, and Yadollah Yaghoobzadeh. Synthia: Synthetic yet naturally tailored human-inspired personas. arXiv preprint arXiv:2507.14922, 2025

  62. [62]

    Representation learning with large language models for recommendation

    Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. Representation learning with large language models for recommendation. In Proceedings of the ACM Web Conference 2024, pp. 3464–3475, 2024

  63. [63]

    Opportunities and risks of llms in survey research

    David M Rothschild, James Brand, Hope Schroeder, and Jenny Wang. Opportunities and risks of llms in survey research. Available at SSRN, 2024

  64. [64]

    Whose opinions do language models reflect?

    Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. Whose opinions do language models reflect? In International Conference on Machine Learning, pp. 29971–30004. PMLR, 2023

  65. [65]

    Modeling relational data with graph convolutional networks

    Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607. Springer, 2018

  66. [66]

    Quantifying language models' sensitivity to spurious features in prompt design or: How i learned to start worrying about prompt formatting

    Melanie Sclar, Yejin Choi, Yulia Tsvetkov, and Alane Suhr. Quantifying language models' sensitivity to spurious features in prompt design or: How i learned to start worrying about prompt formatting. In The Twelfth International Conference on Learning Representations, 2024

  67. [67]

    Language representations can be what recommenders need: Findings and potentials

    Leheng Sheng, An Zhang, Yi Zhang, Yuxin Chen, Xiang Wang, and Tat-Seng Chua. Language representations can be what recommenders need: Findings and potentials. In ICLR, 2025

  68. [68]

    Global MMLU: Understanding and addressing cultural and linguistic biases in multilingual evaluation

    Shivalika Singh, Angelika Romanou, Clémentine Fourrier, David I Adelani, Jian Gang Ngui, Daniel Vila-Suero, Peerat Limkonchotiwat, Kelly Marchisio, Wei Qi Leong, Yosephine Susanto, et al. Global MMLU: Understanding and addressing cultural and linguistic biases in multilingual evaluation. arXiv preprint arXiv:2412.03304, 2024

  69. [69]

    Social simulation with llms @ colm 2025

    SocialSim'25. Social simulation with llms @ colm 2025. https://sites.google.com/view/social-sims-with-llms, 2025. Accessed: 2025-09-21

  70. [70]

    Language model fine-tuning on scaled survey data for predicting distributions of public opinions

    Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, and Serina Chang. Language model fine-tuning on scaled survey data for predicting distributions of public opinions. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 21147–21170, July 2025

  71. [71]

    Graphicl: Unlocking graph learning potential in llms through structured prompt design

    Yuanfu Sun, Zhengnan Ma, Yi Fang, Jing Ma, and Qiaoyu Tan. Graphicl: Unlocking graph learning potential in llms through structured prompt design. arXiv preprint arXiv:2501.15755, 2025

  72. [72]

    Graphgpt: Graph instruction tuning for large language models

    Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, and Chao Huang. Graphgpt: Graph instruction tuning for large language models. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 491–500, 2024

  73. [73]

    Linear Representations of Sentiment in Large Language Models

    Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, and Neel Nanda. Linear representations of sentiment in large language models. arXiv preprint arXiv:2310.15154, 2023

  74. [74]

    Graph-based methods for discrete choice

    Kiran Tomlinson and Austin R Benson. Graph-based methods for discrete choice. Network Science, 12(1): 21–40, 2024

  75. [75]

    Twin-2k-500: A data set for building digital twins of over 2,000 people based on their answers to over 500 questions

    Olivier Toubia, George Z Gui, Tianyi Peng, Daniel J Merlau, Ang Li, and Haozhe Chen. Twin-2k-500: A data set for building digital twins of over 2,000 people based on their answers to over 500 questions. Marketing Science, 2025

  76. [76]

    Llama 2: Open Foundation and Fine-Tuned Chat Models

    Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

  77. [77]

    Discrete choice methods with simulation

    Kenneth E Train. Discrete choice methods with simulation. Cambridge university press, 2009

  78. [78]

    Graph attention networks

    Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018

  79. [79]

    Generative agent-based modeling with actions grounded in physical, social, or digital space using concordia

    Alexander Sasha Vezhnevets, John P Agapiou, Avia Aharon, Ron Ziv, Jayd Matyas, Edgar A Duéñez-Guzmán, William A Cunningham, Simon Osindero, Danny Karmon, and Joel Z Leibo. Generative agent-based modeling with actions grounded in physical, social, or digital space using concordia. arXiv preprint arXiv:2312.03664, 2023

  80. [80]

    Vox populi, vox ai? using large language models to estimate german vote choice

    Leah von der Heyde, Anna-Carolina Haensch, and Alexander Wenz. Vox populi, vox ai? using large language models to estimate german vote choice. Social Science Computer Review, pp. 08944393251337014, 2025

Showing first 80 references.