Graph-Based Alternatives to LLMs for Human Simulation

Joseph Suh; Serina Chang; Suhong Moon

arXiv: 2511.02135 · v2 · submitted 2025-11-03 · 💻 cs.CL


Pith reviewed 2026-05-18 00:44 UTC · model grok-4.3

classification 💻 cs.CL

keywords graph neural networks · human behavior simulation · link prediction · large language models · survey prediction · test-taking simulation · efficient modeling


The pith

Graph neural networks match or beat LLMs at simulating human choices on closed-ended tasks while using three orders of magnitude fewer parameters.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models are essential for simulating human behavior on tasks such as survey response prediction and test-taking. It presents GEMS, which builds a heterogeneous graph from past individual responses and frames new simulations as a link-prediction problem solved by a graph neural network. On three datasets and three evaluation settings, the graph model performs at or above the level of strong LLM baselines. The result matters because it shows that, for many practical simulation needs, far smaller and more efficient models can stand in for costly LLM-based approaches.

Core claim

GEMS formulates closed-ended human simulation as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, this graph neural network matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters.

What carries the argument

Link prediction on a heterogeneous graph whose nodes are individuals and possible choices, with edges derived from historical response data.
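A minimal sketch of that machinery, assuming toy response records of the form (individual, question, chosen option) and one choice node per question-option pair. The record format, the weight-free mean aggregation, and the dot-product scorer are illustrative stand-ins, not the GEMS implementation (Figure 3 names trained RGCN, GAT, and SAGE encoders); random features play the role of learned embeddings, so the probabilities only demonstrate the mechanics.

```python
import numpy as np

# Hypothetical toy records: (individual_id, question_id, chosen_option_index).
RESPONSES = [
    ("p1", "q1", 0), ("p1", "q2", 1),
    ("p2", "q1", 1), ("p2", "q2", 1),
    ("p3", "q1", 0),
]
NUM_OPTIONS = {"q1": 2, "q2": 3}  # hypothetical option counts per question


def build_graph(responses, num_options):
    """Index individual nodes and one choice node per (question, option) pair;
    each observed response becomes an individual -> choice edge."""
    people = sorted({person for person, _, _ in responses})
    person_idx = {person: i for i, person in enumerate(people)}
    choice_idx = {}
    for question in sorted(num_options):
        for option in range(num_options[question]):
            choice_idx[(question, option)] = len(choice_idx)
    edges = [(person_idx[p], choice_idx[(q, o)]) for p, q, o in responses]
    return person_idx, choice_idx, edges


def mean_aggregate(edges, person_x, choice_x):
    """One weight-free message-passing step on the bipartite graph: every node
    adds the mean feature of its neighbours to its own feature."""
    p_sum, p_deg = np.zeros_like(person_x), np.zeros(len(person_x))
    c_sum, c_deg = np.zeros_like(choice_x), np.zeros(len(choice_x))
    for p, c in edges:
        p_sum[p] += choice_x[c]
        p_deg[p] += 1
        c_sum[c] += person_x[p]
        c_deg[c] += 1
    person_out = person_x + p_sum / np.maximum(p_deg, 1)[:, None]
    choice_out = choice_x + c_sum / np.maximum(c_deg, 1)[:, None]
    return person_out, choice_out


def option_probs(person_vec, choice_x, choice_idx, question, num_options):
    """Link prediction restricted to one question: dot-product scores against
    that question's option nodes, normalised with a softmax."""
    rows = [choice_idx[(question, o)] for o in range(num_options[question])]
    scores = choice_x[rows] @ person_vec
    exp = np.exp(scores - scores.max())  # subtract max for numerical stability
    return exp / exp.sum()


person_idx, choice_idx, edges = build_graph(RESPONSES, NUM_OPTIONS)
rng = np.random.default_rng(0)  # random features stand in for learned embeddings
h_person, h_choice = mean_aggregate(
    edges,
    rng.normal(size=(len(person_idx), 8)),
    rng.normal(size=(len(choice_idx), 8)),
)
# p3 never answered q2: predict a distribution over q2's three options.
probs = option_probs(h_person[person_idx["p3"]], h_choice, choice_idx, "q2", NUM_OPTIONS)
```

Restricting the softmax to one question's option nodes is what turns generic link prediction into a closed-ended choice: the model only ranks the answers that question actually offers.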

If this is right

Survey and test prediction tasks can achieve strong accuracy without generative language models.
Behavioral simulation becomes feasible at lower computational cost for repeated or large-scale use.
Predictions rest on observable historical links rather than opaque internal representations.
The method supplies a lighter-weight complement for applications where past choice patterns dominate.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

If the same graph construction succeeds on additional closed-ended tasks, historical data alone may suffice for many simulation needs without any generative component.
Hybrid systems could route the bulk of prediction through the graph model and reserve language models only for edge cases that require explicit reasoning.
Scaling the approach to new populations would reveal how much of the performance depends on the density and coverage of the original response graph.
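One simple way to quantify the "density and coverage" mentioned above is the share of individual-question pairs that have an observed response. The sketch assumes the same hypothetical (individual, question, option) records; the metric is an illustrative choice, not one the paper reports.

```python
from collections import defaultdict


def response_coverage(responses, individuals, questions):
    """Share of (individual, question) pairs with an observed response,
    overall and per question; each pair is counted at most once."""
    answered = defaultdict(set)
    for person, question, _option in responses:
        if person in individuals and question in questions:
            answered[question].add(person)
    per_question = {q: len(answered[q]) / len(individuals) for q in questions}
    overall = sum(len(answered[q]) for q in questions) / (len(individuals) * len(questions))
    return overall, per_question


# Hypothetical toy records: (individual_id, question_id, chosen_option_index).
responses = [("p1", "q1", 0), ("p1", "q2", 1), ("p2", "q1", 1),
             ("p2", "q2", 1), ("p3", "q1", 0)]
overall, per_question = response_coverage(responses, {"p1", "p2", "p3"}, ["q1", "q2"])
```

A question with low coverage has few edges reaching its option nodes, so its embeddings rest on little evidence; tracking this figure across new populations could show how much performance depends on graph density.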

Load-bearing premise

The graph built from historical responses already encodes the behavioral patterns required to predict responses to new items accurately.

What would settle it

On a fresh dataset or task, if the graph model falls substantially below LLM performance while the graph construction remains unchanged, the claimed advantage would not hold.

Figures

Figures reproduced from arXiv: 2511.02135 by Joseph Suh, Serina Chang, Suhong Moon.

**Figure 1.** In our GEMS framework, we construct a heterogeneous graph for discrete choice simulation tasks (Top) where the goal is to predict the option chosen by an individual in response to a context or question. Under three widely studied settings (Bottom), we show that our GNN-based method achieves prediction accuracy consistently comparable to the best LLM-based approaches. imputation), (2) responses of new indi…

**Figure 2.** Overall architecture of GEMS. The graph encoder learns representations of individual and choice nodes from the relational structure of observed responses, then predicts new responses with a softmax classifier over question options (Top). In setting 3 only, we learn a simple LLM-to-GNN projection that maps an LLM’s frozen representation of the choice node’s text to its GNN embedding, so that we can acquire …

**Figure 3.** Prediction accuracy vs. GPU-hours (A100-80GB-SXM4) on the OPINIONQA dataset by task setting and method. Zero-/few-shot prompting accuracies fall below the plotted y-range. For LLM-based methods, we report the best result across three LLMs (LLaMA-2-7B, Mistral-7B-v0.1, and Qwen3-8B). For GEMS, we report the best result across three models (RGCN, GAT, and SAGE) for setting 1 & 2, and report across different …

**Figure 4.** Visualization of LLM hidden states and GNN node embeddings on the first and second …

**Figure 5.** Mean and standard deviation of prediction accuracy on setting 3 (new questions) of O…

**Figure 6.** Mean and standard deviation of prediction accuracy on setting 3 (new questions) of the …
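Figure 2's caption describes, for setting 3 only, a "simple LLM-to-GNN projection" that maps an LLM's frozen representation of a choice node's text to its GNN embedding, so that options of unseen questions can be embedded. The caption does not give the projection's form; the sketch below assumes a linear map fitted by closed-form ridge regression, on synthetic data with arbitrarily chosen dimensions.

```python
import numpy as np


def fit_projection(llm_feats, gnn_embs, ridge=1e-2):
    """Closed-form ridge regression: W = argmin ||X W - H||^2 + ridge * ||W||^2."""
    d = llm_feats.shape[1]
    gram = llm_feats.T @ llm_feats + ridge * np.eye(d)
    return np.linalg.solve(gram, llm_feats.T @ gnn_embs)


def project(llm_feats, w):
    """Map frozen LLM features of new choice nodes into the GNN embedding space."""
    return llm_feats @ w


# Synthetic stand-ins: 200 training choice nodes with 16-d "LLM" features
# and 8-d "GNN" embeddings generated by a hidden linear map plus noise.
rng = np.random.default_rng(0)
w_true = rng.normal(size=(16, 8))
x_train = rng.normal(size=(200, 16))
h_train = x_train @ w_true + 0.01 * rng.normal(size=(200, 8))

w = fit_projection(x_train, h_train)

# Options of an unseen question: embed them from text features alone.
x_new = rng.normal(size=(4, 16))
h_new = project(x_new, w)
rel_err = np.linalg.norm(h_new - x_new @ w_true) / np.linalg.norm(x_new @ w_true)
```

The projected vectors can then be scored against an individual's embedding like any trained choice node. Real LLM hidden states are far wider than 16 dimensions, which is where the ridge term matters.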

Original abstract

Large language models (LLMs) have become a popular approach for simulating human behaviors, yet it remains unclear if LLMs are necessary for all simulation tasks. We study a broad family of close-ended simulation tasks, with applications from survey prediction to test-taking, and show that a graph neural network can match or surpass strong LLM-based methods. We introduce Graph-basEd Models for Human Simulation (GEMS) which formulates close-ended simulation as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, GEMS matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters. These results suggest that graph-based modeling can complement LLMs as an efficient and transparent approach to simulating human behaviors. Code is available at https://github.com/schang-lab/gems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

GEMS shows a graph model can match LLMs on closed-ended simulation by casting it as link prediction, but the win hinges on whether baselines got the same per-person history data.


The main takeaway is that this paper offers a concrete alternative to LLMs for tasks like survey response prediction or test simulation. It builds a heterogeneous graph from past individual choices, uses a standard GNN for link prediction, and reports that the approach holds up against strong LLM baselines across three datasets while using three orders of magnitude fewer parameters. That efficiency angle is the practical hook.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces Graph-basEd Models for Human Simulation (GEMS), which reformulates close-ended human behavior simulation tasks (e.g., survey prediction, test-taking) as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, the authors report that GEMS matches or outperforms strong LLM-based methods while using three orders of magnitude fewer parameters, positioning graph-based modeling as an efficient, transparent complement to LLMs.

Significance. If the empirical comparisons prove fair with respect to input data parity, this result would demonstrate that standard GNN link-prediction models can achieve competitive performance on structured human simulation tasks at far lower computational cost. The provision of code and the focus on parameter efficiency are strengths that could influence practical deployments in behavioral modeling.

major comments (3)

[§4] §4 (Evaluation Settings) and associated tables: the central claim that GEMS matches or outperforms LLM baselines requires explicit confirmation that the LLM methods received the same per-individual historical response data used to construct the GEMS heterogeneous graph. If the LLMs were prompted only with demographics or item descriptions, the reported advantage may reflect asymmetric information access rather than inherent modeling superiority; this parity must be verified for each of the three settings and datasets.
[§3.1] §3.1 (Graph Construction) and §3.2 (Link Prediction Objective): the heterogeneous graph encodes all historical individual-choice links before prediction. Clarify the train/test edge split procedure to rule out leakage of test-item information into the graph used for evaluation; without this, the link-prediction performance cannot be interpreted as genuine out-of-sample simulation.
[Results] Results tables (e.g., Tables 2–4): the performance comparisons should report statistical significance tests or confidence intervals for the 'matches or outperforms' statements. Current presentation leaves unclear whether observed differences are reliable across the three datasets.

minor comments (2)

[Abstract] Abstract: the claim of 'three orders of magnitude fewer parameters' would be strengthened by stating the exact parameter counts for GEMS versus the strongest LLM baselines.
[§3] Notation in §3: the definition of the heterogeneous graph (nodes for individuals and choices, edge types) could include a small diagram or explicit adjacency-matrix formulation to aid readability.
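Turning exact parameter counts into an "orders of magnitude" figure is a log ratio. The sketch assumes a hypothetical GEMS-like model made of one learned embedding per node plus stacked GraphSAGE-style layers (separate self and neighbour weight matrices and one bias); the node count and width are invented for illustration, and 7e9 is the nominal size of the 7B baselines named in Figure 3, not an exact count.

```python
import math


def sage_layer_params(d_in, d_out):
    """One GraphSAGE-style layer: self and neighbour weight matrices plus a bias."""
    return 2 * d_in * d_out + d_out


def gnn_params(num_nodes, dim, num_layers):
    """Hypothetical model: one learned dim-sized embedding per node plus layers."""
    return num_nodes * dim + num_layers * sage_layer_params(dim, dim)


def orders_of_magnitude(big, small):
    """log10 of the size ratio; about 3 means 'three orders of magnitude'."""
    return math.log10(big / small)


# Invented configuration: 50,000 nodes, 128-d embeddings, 2 layers.
gems_like = gnn_params(num_nodes=50_000, dim=128, num_layers=2)
llm_nominal = 7e9  # nominal size of a 7B-parameter baseline
ratio_log10 = orders_of_magnitude(llm_nominal, gems_like)
```

In this hypothetical configuration the embedding table dominates, so the count grows with the number of individuals and choices; the exact figure the referee asks for would therefore depend on dataset size.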

Simulated Authors' Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate clarifications and additional analyses.


Referee: [§4] §4 (Evaluation Settings) and associated tables: the central claim that GEMS matches or outperforms LLM baselines requires explicit confirmation that the LLM methods received the same per-individual historical response data used to construct the GEMS heterogeneous graph. If the LLMs were prompted only with demographics or item descriptions, the reported advantage may reflect asymmetric information access rather than inherent modeling superiority; this parity must be verified for each of the three settings and datasets.

Authors: We confirm that the LLM baselines received exactly the same per-individual historical response data used to build the GEMS graph. In the prompting protocol of §4, each LLM input for an individual includes their full set of prior responses to other items (along with demographics and item descriptions). This information parity holds for all three datasets and all three evaluation settings. We will add an explicit verification paragraph in the revised §4 to document this for each case. revision: yes
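One way to make that parity claim auditable is to generate each prompt from the same training records that become graph edges, then check that every one appears. A sketch; the template, field names, and example items are hypothetical, and the paper's actual prompt is not reproduced here.

```python
import string


def build_prompt(demographics, history, target_question, options):
    """Serialise an individual's held-in responses (the same records used as
    graph edges) plus demographics into a multiple-choice prompt.
    The template is hypothetical."""
    if len(options) > 26:
        raise ValueError("at most 26 lettered options are supported")
    profile = ", ".join(f"{key}: {value}" for key, value in demographics.items())
    lines = [f"Respondent profile: {profile}", "Previous answers:"]
    for question, answer in history:
        lines.append(f"- Q: {question} A: {answer}")
    lines.append(f"Question: {target_question}")
    for letter, option in zip(string.ascii_uppercase, options):
        lines.append(f"{letter}. {option}")
    lines.append("Answer:")
    return "\n".join(lines)


def history_is_complete(prompt, history):
    """Parity check: every training response of this individual appears verbatim."""
    return all(f"- Q: {q} A: {a}" in prompt for q, a in history)


history = [("How often do you read the news?", "Daily"),
           ("Do you own a car?", "Yes")]
prompt = build_prompt({"age": "30-49", "region": "West"}, history,
                      "How do you usually commute?", ["Car", "Transit", "Walk or bike"])
```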
Referee: [§3.1] §3.1 (Graph Construction) and §3.2 (Link Prediction Objective): the heterogeneous graph encodes all historical individual-choice links before prediction. Clarify the train/test edge split procedure to rule out leakage of test-item information into the graph used for evaluation; without this, the link-prediction performance cannot be interpreted as genuine out-of-sample simulation.

Authors: We use a per-individual random edge split: for every person, 70% of their historical responses are included as training edges when constructing the heterogeneous graph, while the remaining 30% are held out entirely as test edges. No test edges appear in the graph at training or inference time. This is a standard transductive edge split: individuals and choice nodes remain in the graph, and only the held-out response edges are hidden. We will expand §3.1 with a dedicated paragraph and pseudocode describing the split to eliminate any ambiguity. revision: yes
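The split described in this simulated response can be sketched as follows, assuming (individual, question, option) records; the 70/30 ratio comes from the simulated rebuttal, not from a verified reading of the paper. It covers the case where every individual keeps some edges in the graph; the new-questions case (setting 3 in Figures 5 and 6) would instead hold out whole questions.

```python
import random
from collections import defaultdict


def per_individual_split(responses, train_frac=0.7, seed=0):
    """Hold out a random (1 - train_frac) share of each individual's responses.
    Only the training edges should be used to build the graph."""
    rng = random.Random(seed)
    by_person = defaultdict(list)
    for record in responses:
        by_person[record[0]].append(record)
    train, test = [], []
    for person in sorted(by_person):
        records = list(by_person[person])
        rng.shuffle(records)
        k = round(train_frac * len(records))
        train.extend(records[:k])
        test.extend(records[k:])
    return train, test


# Hypothetical records: 3 individuals x 10 questions, as (individual, question, option).
responses = [(p, f"q{j}", j % 2) for p in ("p1", "p2", "p3") for j in range(10)]
train, test = per_individual_split(responses)
```

Building the graph from `train` only and scoring `test` is what keeps held-out answers out of message passing.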
Referee: Results tables (e.g., Tables 2–4): the performance comparisons should report statistical significance tests or confidence intervals for the 'matches or outperforms' statements. Current presentation leaves unclear whether observed differences are reliable across the three datasets.

Authors: We agree that statistical reliability measures will strengthen the claims. In the revised version we will augment Tables 2–4 with 95% bootstrap confidence intervals (1,000 resamples) for every reported metric across the three datasets. This will make clear which performance differences are reliable. revision: yes
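For a "matches or outperforms" claim, a paired bootstrap on the accuracy difference is a more direct check than separate per-model intervals, since both models are scored on the same items. A sketch using the 1,000 resamples and 95% level promised above, assuming per-item 0/1 correctness for both models (the inputs are hypothetical):

```python
import random


def paired_bootstrap_diff(correct_a, correct_b, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(a) - accuracy(b), resampling the
    shared test items jointly so per-item pairing is preserved."""
    if len(correct_a) != len(correct_b):
        raise ValueError("both models must be scored on the same test items")
    rng = random.Random(seed)
    n = len(correct_a)
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lo = diffs[int(n_resamples * alpha / 2)]
    hi = diffs[int(n_resamples * (1 - alpha / 2)) - 1]
    point = (sum(correct_a) - sum(correct_b)) / n
    return point, lo, hi


# Hypothetical 0/1 correctness of two models on the same 100 test items.
gems_correct = [1] * 80 + [0] * 20
llm_correct = [1] * 60 + [0] * 40
point, lo, hi = paired_bootstrap_diff(gems_correct, llm_correct)
```

An interval that excludes 0 supports "outperforms"; one that straddles 0 means the data do not distinguish the two models, which is weaker than showing that they match.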

Circularity Check

0 steps flagged

No significant circularity in empirical comparison of GNN link prediction vs LLM baselines


The paper's core contribution is an empirical demonstration that a standard heterogeneous GNN link-prediction model (GEMS) matches or exceeds LLM performance on close-ended simulation tasks across three datasets and three evaluation settings, while using far fewer parameters. No mathematical derivation chain exists that reduces reported performance metrics to quantities defined by construction from the same fitted inputs; the heterogeneous graph is assembled from historical individual-choice links and evaluated on held-out predictions, which is a conventional train/test split rather than a self-referential loop. The modeling choice to treat simulation as link prediction is an explicit ansatz, not a result derived from prior equations within the paper. No self-citations, uniqueness theorems, or renamings of known results are invoked as load-bearing steps. The central claim therefore remains an independent empirical finding rather than a tautology.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The approach relies on standard graph-neural-network assumptions plus the modeling decision that human choices can be represented as edges in a person-choice graph; no new physical entities or ad-hoc constants are introduced.

axioms (1)

domain assumption A heterogeneous graph of individuals and answer choices can be constructed from historical response data such that missing links correspond to plausible future choices.
This premise is required for the link-prediction framing to be meaningful for simulation.

pith-pipeline@v0.9.0 · 5658 in / 1254 out tokens · 28602 ms · 2026-05-18T00:44:01.953996+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

Relation between the paper passage and the cited Recognition theorem.

We formulate discrete choice simulation as a link prediction problem on a graph... GEMS as a link prediction model trained end-to-end.
IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

Relation between the paper passage and the cited Recognition theorem.

GEMS matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

105 extracted references · 105 canonical work pages · 16 internal anchors

[1]

Prediction-powered inference

Anastasios N Angelopoulos, Stephen Bates, Clara Fannjiang, Michael I Jordan, and Tijana Zrnic. Prediction-powered inference. Science, 382 0 (6671): 0 669--674, 2023

work page 2023
[2]

Richardson, Austin C

Jacy Reese Anthis, Ryan Liu, Sean M Richardson, Austin C Kozlowski, Bernard Koch, James Evans, Erik Brynjolfsson, and Michael Bernstein. Llm social simulations are a promising research method. arXiv preprint arXiv:2504.02234, 2025

work page arXiv 2025
[3]

Artificial societies — company profile

Artificial Societies Artificial Societies. Artificial societies — company profile. https://www.ycombinator.com/companies/artificial-societies, 2025. Accessed: 2025-09-21

work page 2025
[4]

Layer Normalization

Jimmy Lei Ba, Jamie Ryan Kiros, and Geoffrey E Hinton. Layer normalization. arXiv preprint arXiv:1607.06450, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[5]

Explicitly unbiased large language models still form biased associations

Xuechunzi Bai, Angelina Wang, Ilia Sucholutsky, and Thomas L Griffiths. Explicitly unbiased large language models still form biased associations. Proceedings of the National Academy of Sciences, 122 0 (8): 0 e2416228122, 2025

work page 2025
[6]

Relational inductive biases, deep learning, and graph networks

Peter W Battaglia, Jessica B Hamrick, Victor Bapst, Alvaro Sanchez-Gonzalez, Vinicius Zambaldi, Mateusz Malinowski, Andrea Tacchetti, David Raposo, Adam Santoro, Ryan Faulkner, et al. Relational inductive biases, deep learning, and graph networks. arXiv preprint arXiv:1806.01261, 2018

work page internal anchor Pith review Pith/arXiv arXiv 2018
[7]

Integration of choice and latent variable models

Moshe Ben-Akiva, Joan Walker, Adriana T Bernardino, Dinesh A Gopinath, Taka Morikawa, and Amalia Polydoropoulou. Integration of choice and latent variable models. Perpetual motion: Travel behaviour research opportunities and application challenges, 2002: 0 431--470, 2002

work page 2002
[8]

Graph Convolutional Matrix Completion

Rianne van den Berg, Thomas N Kipf, and Max Welling. Graph convolutional matrix completion. arXiv preprint arXiv:1706.02263, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[9]

A foundation model to predict and capture human cognition

Marcel Binz, Elif Akata, Matthias Bethge, Franziska Br \"a ndle, Fred Callaway, Julian Coda-Forno, Peter Dayan, Can Demircan, Maria K Eckstein, No \'e mi \'E ltet o , et al. A foundation model to predict and capture human cognition. Nature, pp.\ 1--8, 2025

work page 2025
[10]

Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer M

James Bisbee, Joshua D. Clinton, Cassy Dorff, Brenton Kenkel, and Jennifer M. Larson. Synthetic replacements for human survey data? the perils of large language models. Political Analysis, 2024

work page 2024
[11]

Specializing large language models to simulate survey response distributions for global populations

Yong Cao, Haijiang Liu, Arnav Arora, Isabelle Augenstein, Paul R \"o ttger, and Daniel Hershcovich. Specializing large language models to simulate survey response distributions for global populations. arXiv preprint arXiv:2502.07068, 2025

work page arXiv 2025
[12]

Issues and the 2024 election

Pew Research Center. Issues and the 2024 election. https://www.pewresearch.org/politics/2024/09/09/issues-and-the-2024-election/, September 9 2024. Accessed: YYYY-MM-DD

work page 2024
[13]

Llaga: Large language and graph assistant.arXiv preprint arXiv:2402.08170, 2024

Runjin Chen, Tong Zhao, Ajay Jaiswal, Neil Shah, and Zhangyang Wang. Llaga: Large language and graph assistant. arXiv preprint arXiv:2402.08170, 2024

work page arXiv 2024
[14]

Compost: Characterizing and evaluating caricature in llm simulations

Myra Cheng, Tiziano Piccardi, and Diyi Yang. Compost: Characterizing and evaluating caricature in llm simulations. In EMNLP, 2023

work page 2023
[15]

arXiv preprint arXiv:2303.16779 (2023)

Eric Chu, Jacob Andreas, Stephen Ansolabehere, and Deb Roy. Language models trained on media diets can predict public opinion. arXiv preprint arXiv:2303.16779, 2023

work page arXiv 2023
[16]

Unveiling the spectrum of data contamination in language models: A survey from detection to remediation

Chunyuan Deng, Yilun Zhao, Yuzhao Heng, Yitong Li, Jiannan Cao, Xiangru Tang, and Arman Cohan. Unveiling the spectrum of data contamination in language models: A survey from detection to remediation. arXiv preprint arXiv:2406.14644, 2024

work page arXiv 2024
[17]

Dominguez-Olmedo, M

Ricardo Dominguez-Olmedo, Moritz Hardt, and Celestine Mendler-D \"u nner. Questioning the survey responses of large language models. arXiv preprint arXiv:2306.07951, 2023

work page arXiv 2023
[18]

The Llama 3 Herd of Models

Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024

work page internal anchor Pith review Pith/arXiv arXiv 2024
[19]

Expected parrot

Expected Parrot Expected Parrot. Expected parrot. https://www.expectedparrot.com/, 2025. Accessed: 2025-09-21

work page 2025
[20]

Graph neural networks for social recommendation

Wenqi Fan, Yao Ma, Qing Li, Yuan He, Eric Zhao, Jiliang Tang, and Dawei Yin. Graph neural networks for social recommendation. In The world wide web conference, pp.\ 417--426, 2019

work page 2019
[21]

Modular pluralism: Pluralistic alignment via multi- LLM collaboration

Shangbin Feng, Taylor Sorensen, Yuhan Liu, Jillian Fisher, Chan Young Park, Yejin Choi, and Yulia Tsvetkov. Modular pluralism: Pluralistic alignment via multi- LLM collaboration. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp.\ 4151--4171, November 2024

work page 2024
[22]

Fast Graph Representation Learning with PyTorch Geometric

Matthias Fey and Jan Eric Lenssen. Fast graph representation learning with pytorch geometric. arXiv preprint arXiv:1903.02428, 2019

work page internal anchor Pith review Pith/arXiv arXiv 1903
[23]

Large language models empowered agent-based modeling and simulation: a survey and perspectives

Chen Gao, Xiaochong Lan, Nian Li, Yuan Yuan, Jingtao Ding, Zhilun Zhou, Fengli Xu, and Yong Li. Large language models empowered agent-based modeling and simulation: a survey and perspectives. Humanities and Social Sciences Communications, 11 0 (1259), 2024

work page 2024
[24]

A latent class model for discrete choice analysis: contrasts with mixed logit

William H Greene and David A Hensher. A latent class model for discrete choice analysis: contrasts with mixed logit. Transportation Research Part B: Methodological, 37 0 (8): 0 681--698, 2003

work page 2003
[25]

Inductive representation learning on large graphs

Will Hamilton, Zhitao Ying, and Jure Leskovec. Inductive representation learning on large graphs. Advances in neural information processing systems, 30, 2017

work page 2017
[26]

Lightgcn: Simplifying and powering graph convolution network for recommendation

Xiangnan He, Kuan Deng, Xiang Wang, Yan Li, Yongdong Zhang, and Meng Wang. Lightgcn: Simplifying and powering graph convolution network for recommendation. In Proceedings of the 43rd International ACM SIGIR conference on research and development in Information Retrieval, pp.\ 639--648, 2020

work page 2020
[27]

Explanations as features: Llm-based features for text-attributed graphs.CoRR, abs/2305.19523, 2023

Xiaoxin He, Xavier Bresson, Thomas Laurent, Adam Perold, Yann LeCun, and Bryan Hooi. Harnessing explanations: Llm-to-lm interpreter for enhanced text-attributed graph representation learning. arXiv preprint arXiv:2305.19523, 2023

work page arXiv 2023
[28]

Community-cross-instruct: Unsupervised instruction generation for aligning large language models to online communities

Zihao He, Minh Duc Chu, Rebecca Dorn, Siyi Guo, and Kristina Lerman. Community-cross-instruct: Unsupervised instruction generation for aligning large language models to online communities. arXiv preprint arXiv:2406.12074, 2024

work page arXiv 2024
[29]

Predicting results of social science experiments using large language models

Luke Hewitt, Ashwini Ashokkumar, Isaias Ghezae, and Robb Willer. Predicting results of social science experiments using large language models. Technical report, Stanford University and New York University, August 2024

work page 2024
[30]

Addressing systematic non-response bias with supervised fine-tuning of large language models: A case study on german voting behaviour

Tobias Holtdirk, Dennis Assenmacher, Arnim Bleier, and Claudia Wagner. Addressing systematic non-response bias with supervised fine-tuning of large language models: A case study on german voting behaviour. Technical report, Center for Open Science, 2025

work page 2025
[31]

Lora: Low-rank adaptation of large language models

Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Lu Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models. ICLR, 1 0 (2): 0 3, 2022

work page 2022
[32]

Let's ask gnn: Empowering large language model for graph in-context learning

Zhengyu Hu, Yichuan Li, Zhengyu Chen, Jingang Wang, Han Liu, Kyumin Lee, and Kaize Ding. Let's ask gnn: Empowering large language model for graph in-context learning. arXiv preprint arXiv:2410.07074, 2024

work page arXiv 2024
[33]

Human subjects research in the age of generative ai: Opportunities and challenges of applying llm-simulated data to hci studies

Angel Hsing-Chi Hwang, Michael S Bernstein, S Shyam Sundar, Renwen Zhang, Manoel Horta Ribeiro, Yingdan Lu, Serina Chang, Tongshuang Wu, Aimei Yang, Dmitri Williams, et al. Human subjects research in the age of generative ai: Opportunities and challenges of applying llm-simulated data to hci studies. In Proceedings of the Extended Abstracts of the CHI Con...

work page 2025
[34]

Hwang, B

EunJeong Hwang, Bodhisattwa Prasad Majumder, and Niket Tandon. Aligning language models to user opinions. arXiv preprint arXiv:2305.14929, 2023

work page arXiv 2023
[35]

A rational model of the dunning--kruger effect supports insensitivity to evidence in low performers

Rachel A Jansen, Anna N Rafferty, and Thomas L Griffiths. A rational model of the dunning--kruger effect supports insensitivity to evidence in low performers. Nature Human Behaviour, 5 0 (6): 0 756--763, 2021

work page 2021
[36]

Mistral 7B

Albert Q Jiang, Alexandre Sablayrolles, Arthur Mensch, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Florian Bressand, Gianna Lengyel, Guillaume Lample, Lucile Saulnier, et al. Mistral 7b. arXiv preprint arXiv:2310.06825, 2023

work page internal anchor Pith review Pith/arXiv arXiv 2023
[37]

Deep binding of language model virtual personas: a study on approximating political partisan misperceptions

Minwoo Kang, Suhong Moon, Seung Hyeong Lee, Ayush Raj, Joseph Suh, and David Chan. Deep binding of language model virtual personas: a study on approximating political partisan misperceptions. In Second Conference on Language Modeling, 2025

work page 2025
[38]

Simulacrum of stories: Examining large language models as qualitative research participants

Shivani Kapania, William Agnew, Motahhare Eslami, Hoda Heidari, and Sarah E Fox. Simulacrum of stories: Examining large language models as qualitative research participants. In Proceedings of the 2025 CHI Conference on Human Factors in Computing Systems, pp.\ 1--17, 2025

work page 2025
[39]

Few-shot personalization of llms with mis-aligned responses

Jaehyung Kim and Yiming Yang. Few-shot personalization of llms with mis-aligned responses. In Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp.\ 11943--11974, 2025

work page 2025
[40]

Linear representations of political perspective emerge in large language models

Junsol Kim, James Evans, and Aaron Schein. Linear representations of political perspective emerge in large language models. In The Thirteenth International Conference on Learning Representations, 2025

work page 2025
[41]

Adam: A Method for Stochastic Optimization

Diederik P Kingma and Jimmy Ba. Adam: A method for stochastic optimization. arXiv preprint arXiv:1412.6980, 2014

work page internal anchor Pith review Pith/arXiv arXiv 2014
[42]

Variational Graph Auto-Encoders

Thomas N Kipf and Max Welling. Variational graph auto-encoders. arXiv preprint arXiv:1611.07308, 2016

work page internal anchor Pith review Pith/arXiv arXiv 2016
[43]

Bernstein

Akaash Kolluri, Shengguang Wu, Joon Sung Park, and Michael S. Bernstein. Finetuning llms for human behavior prediction in social science experiments, 2025

work page 2025
[44]

Persona-driven simulation of voting behavior in the european parliament with large language models

Maximilian Kreutner, Marlene Lutz, and Markus Strohmaier. Persona-driven simulation of voting behavior in the european parliament with large language models. arXiv preprint arXiv:2506.11798, 2025

work page arXiv 2025
[45]

Valid survey simulations with limited human data: The roles of prompting, fine-tuning, and rectification

Stefan Krsteski, Giuseppe Russo, Serina Chang, Robert West, and Kristina Gligori \'c . Valid survey simulations with limited human data: The roles of prompting, fine-tuning, and rectification. arXiv preprint arXiv:2510.11408, 2025

work page arXiv 2025
[46]

Efficient memory management for large language model serving with pagedattention

Woosuk Kwon, Zhuohan Li, Siyuan Zhuang, Ying Sheng, Lianmin Zheng, Cody Hao Yu, Joseph Gonzalez, Hao Zhang, and Ion Stoica. Efficient memory management for large language model serving with pagedattention. In Proceedings of the 29th symposium on operating systems principles, pp.\ 611--626, 2023

work page 2023
[47]

Llm generated persona is a promise with a catch.arXiv preprint arXiv:2503.16527, 2025

Ang Li, Haozhe Chen, Hongseok Namkoong, and Tianyi Peng. Llm generated persona is a promise with a catch. arXiv preprint arXiv:2503.16527, 2025

work page arXiv 2025
[48]

Culturellm: Incorporating cultural differences into large language models

Cheng Li, Mengzhuo Chen, Jindong Wang, Sunayana Sitaram, and Xing Xie. Culturellm: Incorporating cultural differences into large language models. Advances in Neural Information Processing Systems, 37: 0 84799--84838, 2024

work page 2024
[49]

An introduction to neural networks for the social sciences

Gechun Lin and Christopher Lucas. An introduction to neural networks for the social sciences. Oxford Handbook of Engaged Methodological Pluralism in Political Science, 2023

work page 2023
[50]

What makes good in-context examples for GPT-3?arXiv preprint arXiv:2101.06804, 2021

Jiachang Liu, Dinghan Shen, Yizhe Zhang, Bill Dolan, Lawrence Carin, and Weizhu Chen. What makes good in-context examples for gpt- 3 ? arXiv preprint arXiv:2101.06804, 2021

work page arXiv 2021
[51]

Decoupled Weight Decay Regularization

Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. arXiv preprint arXiv:1711.05101, 2017

work page internal anchor Pith review Pith/arXiv arXiv 2017
[52]

Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity

Yao Lu, Max Bartolo, Alastair Moore, Sebastian Riedel, and Pontus Stenetorp. Fantastically ordered prompts and where to find them: Overcoming few-shot prompt order sensitivity. In Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp.\ 8086--8098, 2022

work page 2022
[53]

Automated social science: Language models as scientist and subjects

Benjamin S Manning, Kehang Zhu, and John J Horton. Automated social science: Language models as scientist and subjects. Technical report, National Bureau of Economic Research, 2024

work page 2024
[54]

Mixed mnl models for discrete response

Daniel McFadden and Kenneth Train. Mixed mnl models for discrete response. Journal of applied Econometrics, 15 0 (5): 0 447--470, 2000

[55] Suhong Moon, Marwa Abdulhai, Minwoo Kang, Joseph Suh, Widyadewi Soedarmadji, Eran Behar, and David Chan. Virtual personas for language models via an anthology of backstories. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pp. 19864–19897, 2024

[56] OpenAI. gpt-oss-120b & gpt-oss-20b model card, 2025. URL https://arxiv.org/abs/2508.10925

[57] Joon Sung Park, Carolyn Q Zou, Aaron Shaw, Benjamin Mako Hill, Carrie Cai, Meredith Ringel Morris, Robb Willer, Percy Liang, and Michael S Bernstein. Generative agent simulations of 1,000 people. arXiv preprint arXiv:2411.10109, 2024

[58] Pew Research Center. American Trends Panel waves. Retrieved February 06, 2025, from https://www.pewsocialtrends.org/dataset, 2018

[59] Yao Qu and Jue Wang. Performance and biases of large language models in public opinion simulation. Humanities and Social Sciences Communications, 11(1):1–13, 2024

[60] Qwen: An Yang, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chengyuan Li, Dayiheng Liu, Fei Huang, Haoran Wei, Huan Lin, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Yang, Jiaxi Yang, Jingren Zhou, Junyang Lin, Kai Dang, Keming Lu, Keqin Bao, Kexin Yang, Le Yu, Mei Li, Mingfeng Xue, Pei Zhang, Qin Zhu, Rui Men, Runji Lin, Tianhao Li, ... Qwen2.5 technical report

[61] Vahid Rahimzadeh, Erfan Moosavi Monazzah, Mohammad Taher Pilehvar, and Yadollah Yaghoobzadeh. Synthia: Synthetic yet naturally tailored human-inspired personas. arXiv preprint arXiv:2507.14922, 2025

[62] Xubin Ren, Wei Wei, Lianghao Xia, Lixin Su, Suqi Cheng, Junfeng Wang, Dawei Yin, and Chao Huang. Representation learning with large language models for recommendation. In Proceedings of the ACM Web Conference 2024, pp. 3464–3475, 2024

[63] David M Rothschild, James Brand, Hope Schroeder, and Jenny Wang. Opportunities and risks of LLMs in survey research. Available at SSRN, 2024

[64] Shibani Santurkar, Esin Durmus, Faisal Ladhak, Cinoo Lee, Percy Liang, and Tatsunori Hashimoto. Whose opinions do language models reflect? In International Conference on Machine Learning, pp. 29971–30004. PMLR, 2023

[65] Michael Schlichtkrull, Thomas N Kipf, Peter Bloem, Rianne Van Den Berg, Ivan Titov, and Max Welling. Modeling relational data with graph convolutional networks. In European Semantic Web Conference, pp. 593–607. Springer, 2018

[66] Melanie Sclar, Yejin Choi, Yulia Tsvetkov, and Alane Suhr. Quantifying language models' sensitivity to spurious features in prompt design or: How I learned to start worrying about prompt formatting. In The Twelfth International Conference on Learning Representations, 2024

[67] Leheng Sheng, An Zhang, Yi Zhang, Yuxin Chen, Xiang Wang, and Tat-Seng Chua. Language representations can be what recommenders need: Findings and potentials. In ICLR, 2025

[68] Shivalika Singh, Angelika Romanou, Clémentine Fourrier, David I Adelani, Jian Gang Ngui, Daniel Vila-Suero, Peerat Limkonchotiwat, Kelly Marchisio, Wei Qi Leong, Yosephine Susanto, et al. Global MMLU: Understanding and addressing cultural and linguistic biases in multilingual evaluation. arXiv preprint arXiv:2412.03304, 2024

[69] SocialSim'25. Social simulation with LLMs @ COLM 2025. https://sites.google.com/view/social-sims-with-llms, 2025. Accessed: 2025-09-21

[70] Joseph Suh, Erfan Jahanparast, Suhong Moon, Minwoo Kang, and Serina Chang. Language model fine-tuning on scaled survey data for predicting distributions of public opinions. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pp. 21147–21170, July 2025

[71] Yuanfu Sun, Zhengnan Ma, Yi Fang, Jing Ma, and Qiaoyu Tan. GraphICL: Unlocking graph learning potential in LLMs through structured prompt design. arXiv preprint arXiv:2501.15755, 2025

[72] Jiabin Tang, Yuhao Yang, Wei Wei, Lei Shi, Lixin Su, Suqi Cheng, Dawei Yin, and Chao Huang. GraphGPT: Graph instruction tuning for large language models. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pp. 491–500, 2024

[73] Curt Tigges, Oskar John Hollinsworth, Atticus Geiger, and Neel Nanda. Linear representations of sentiment in large language models. arXiv preprint arXiv:2310.15154, 2023

[74] Kiran Tomlinson and Austin R Benson. Graph-based methods for discrete choice. Network Science, 12(1):21–40, 2024

[75] Olivier Toubia, George Z Gui, Tianyi Peng, Daniel J Merlau, Ang Li, and Haozhe Chen. Twin-2k-500: A data set for building digital twins of over 2,000 people based on their answers to over 500 questions. Marketing Science, 2025

[76] Hugo Touvron, Louis Martin, Kevin Stone, Peter Albert, Amjad Almahairi, Yasmine Babaei, Nikolay Bashlykov, Soumya Batra, Prajjwal Bhargava, Shruti Bhosale, et al. Llama 2: Open foundation and fine-tuned chat models. arXiv preprint arXiv:2307.09288, 2023

[77] Kenneth E Train. Discrete choice methods with simulation. Cambridge University Press, 2009

[78] Petar Veličković, Guillem Cucurull, Arantxa Casanova, Adriana Romero, Pietro Liò, and Yoshua Bengio. Graph attention networks. In International Conference on Learning Representations, 2018

[79] Alexander Sasha Vezhnevets, John P Agapiou, Avia Aharon, Ron Ziv, Jayd Matyas, Edgar A Duéñez-Guzmán, William A Cunningham, Simon Osindero, Danny Karmon, and Joel Z Leibo. Generative agent-based modeling with actions grounded in physical, social, or digital space using Concordia. arXiv preprint arXiv:2312.03664, 2023

[80] Leah von der Heyde, Anna-Carolina Haensch, and Alexander Wenz. Vox populi, vox AI? Using large language models to estimate German vote choice. Social Science Computer Review, pp. 08944393251337014, 2025
