Graph-Based Alternatives to LLMs for Human Simulation
Pith reviewed 2026-05-18 00:44 UTC · model grok-4.3
The pith
Graph neural networks match or beat LLMs at simulating human choices on closed-ended tasks while using three orders of magnitude fewer parameters.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
GEMS, a graph neural network, formulates close-ended human simulation as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, it matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters.
What carries the argument
Link prediction on a heterogeneous graph whose nodes are individuals and possible choices, with edges derived from historical response data.
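The framing can be illustrated with a minimal sketch. It assumes responses arrive as hypothetical (individual, question, option) triples, with each choice node being a (question, option) pair. GEMS trains a graph neural network; this sketch does not. It substitutes a simple path-counting heuristic that counts length-3 paths individual -> shared choice -> peer -> candidate choice. The goal is to show the graph structure and the link-prediction formulation, not to reproduce the paper's model.

```python
from collections import defaultdict


def build_graph(responses):
    """Build the two adjacency maps of a bipartite individual-choice graph.

    `responses` holds hypothetical (individual, question, option) triples;
    each choice node is a (question, option) pair.
    """
    person_to_choices = defaultdict(set)
    choice_to_people = defaultdict(set)
    for person, question, option in responses:
        choice = (question, option)
        person_to_choices[person].add(choice)
        choice_to_people[choice].add(person)
    return person_to_choices, choice_to_people


def predict_choice(person, question, options, person_to_choices, choice_to_people):
    """Score each candidate link (person, (question, option)); return the best option.

    Heuristic stand-in for a trained GNN: count length-3 paths
    person -> shared choice -> peer -> candidate choice.
    """
    # Weight each peer by how many choices they share with `person`.
    peer_weight = defaultdict(int)
    for choice in person_to_choices.get(person, set()):
        for peer in choice_to_people[choice]:
            if peer != person:
                peer_weight[peer] += 1
    # A candidate's score is the summed weight of the peers who selected it.
    scores = {
        option: sum(peer_weight[p] for p in choice_to_people.get((question, option), set()))
        for option in options
    }
    return max(options, key=scores.get), scores


responses = [
    ("a", "q1", "yes"), ("a", "q2", "left"),
    ("b", "q1", "yes"), ("b", "q2", "left"), ("b", "q3", "agree"),
    ("c", "q1", "no"), ("c", "q2", "right"), ("c", "q3", "disagree"),
]
graph = build_graph(responses)
prediction, scores = predict_choice("a", "q3", ["agree", "disagree"], *graph)
print(prediction, scores)  # agree {'agree': 2, 'disagree': 0}
```

Individual "a" shares two choices with "b" and none with "c", so the missing link to b's answer on q3 scores highest. A trained GNN replaces this fixed counting rule with learned node embeddings and a learned link score.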
If this is right
- Survey and test prediction tasks can achieve strong accuracy without generative language models.
- Behavioral simulation becomes feasible at lower computational cost for repeated or large-scale use.
- Predictions rest on observable historical links rather than opaque internal representations.
- The method supplies a lighter-weight complement for applications where past choice patterns dominate.
Where Pith is reading between the lines
- If the same graph construction succeeds on additional closed-ended tasks, historical data alone may suffice for many simulation needs without any generative component.
- Hybrid systems could route the bulk of prediction through the graph model and reserve language models only for edge cases that require explicit reasoning.
- Scaling the approach to new populations would reveal how much of the performance depends on the density and coverage of the original response graph.
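"Density" and "coverage" can be made concrete in several ways. The sketch below uses one possible definition, which is an assumption and not taken from the paper. Density is the fraction of (individual, question) pairs with an observed answer. Coverage, computed per question, is the fraction of individuals who answered that question. The input format is the same hypothetical (individual, question, option) triples.

```python
def response_graph_stats(responses):
    """Density and per-question coverage of a response graph (illustrative definitions).

    Density: fraction of (individual, question) pairs with an observed answer.
    Coverage: for each question, the fraction of individuals who answered it.
    """
    people = {p for p, _, _ in responses}
    questions = {q for _, q, _ in responses}
    answered = {(p, q) for p, q, _ in responses}
    density = len(answered) / (len(people) * len(questions))
    coverage = {
        q: sum((p, q) in answered for p in people) / len(people)
        for q in sorted(questions)
    }
    return density, coverage


responses = [
    ("a", "q1", "yes"), ("a", "q2", "left"),
    ("b", "q1", "yes"), ("b", "q2", "left"), ("b", "q3", "agree"),
    ("c", "q1", "no"), ("c", "q2", "right"), ("c", "q3", "disagree"),
]
# 8 answered pairs out of 3 individuals x 3 questions, so density = 8/9.
density, coverage = response_graph_stats(responses)
```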
Load-bearing premise
The graph built from historical responses already encodes the behavioral patterns required to predict responses to new items accurately.
What would settle it
On a fresh dataset or task, if the graph model falls substantially below LLM performance while the graph construction remains unchanged, the claimed advantage would not hold.
Original abstract
Large language models (LLMs) have become a popular approach for simulating human behaviors, yet it remains unclear if LLMs are necessary for all simulation tasks. We study a broad family of close-ended simulation tasks, with applications from survey prediction to test-taking, and show that a graph neural network can match or surpass strong LLM-based methods. We introduce Graph-basEd Models for Human Simulation (GEMS) which formulates close-ended simulation as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, GEMS matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters. These results suggest that graph-based modeling can complement LLMs as an efficient and transparent approach to simulating human behaviors. Code is available at https://github.com/schang-lab/gems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces Graph-basEd Models for Human Simulation (GEMS), which reformulates close-ended human behavior simulation tasks (e.g., survey prediction, test-taking) as link prediction on a heterogeneous graph of individuals and choices. Across three datasets and three evaluation settings, the authors report that GEMS matches or outperforms strong LLM-based methods while using three orders of magnitude fewer parameters, positioning graph-based modeling as an efficient, transparent complement to LLMs.
Significance. If the empirical comparisons prove fair with respect to input data parity, this result would demonstrate that standard GNN link-prediction models can achieve competitive performance on structured human simulation tasks at far lower computational cost. The provision of code and the focus on parameter efficiency are strengths that could influence practical deployments in behavioral modeling.
major comments (3)
- §4 (Evaluation Settings) and associated tables: the central claim that GEMS matches or outperforms LLM baselines requires explicit confirmation that the LLM methods received the same per-individual historical response data used to construct the GEMS heterogeneous graph. If the LLMs were prompted only with demographics or item descriptions, the reported advantage may reflect asymmetric information access rather than inherent modeling superiority. This parity must be verified for each of the three settings and datasets.
- §3.1 (Graph Construction) and §3.2 (Link Prediction Objective): the heterogeneous graph encodes all historical individual-choice links before prediction. The train/test edge split procedure should be clarified to rule out leakage of test-item information into the graph used for evaluation. Without this, the link-prediction performance cannot be interpreted as genuine out-of-sample simulation.
- Results tables (e.g., Tables 2–4): the performance comparisons should report statistical significance tests or confidence intervals for the "matches or outperforms" statements. The current presentation leaves it unclear whether the observed differences are reliable across the three datasets.
minor comments (2)
- Abstract: the claim of "three orders of magnitude fewer parameters" would be strengthened by stating the exact parameter counts for GEMS versus the strongest LLM baselines.
- §3 (Notation): the definition of the heterogeneous graph (nodes for individuals and choices, edge types) could include a small diagram or an explicit adjacency-matrix formulation to aid readability.
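An adjacency-matrix formulation of the kind requested might look like the following. The notation is an illustrative assumption, not the paper's own. The graph is treated as bipartite, with N individuals and M choice nodes (one per answer option of each question). C_q denotes the options of question q, and s_θ is a learned link score, for instance a dot product of GNN node embeddings.

```latex
R \in \{0,1\}^{N \times M}, \qquad
R_{ic} =
\begin{cases}
  1 & \text{if individual } i \text{ selected choice } c,\\
  0 & \text{otherwise,}
\end{cases}
\qquad
A =
\begin{pmatrix}
  \mathbf{0}_{N \times N} & R \\
  R^{\top} & \mathbf{0}_{M \times M}
\end{pmatrix},
\qquad
\hat{y}_{iq} = \arg\max_{c \in \mathcal{C}_q} s_\theta(i, c).
```

Under this formulation, simulation amounts to filling in missing entries of R. For each unanswered question, the predicted answer is the highest-scoring option.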
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript to incorporate clarifications and additional analyses.
Referee (§4, Evaluation Settings): confirm that the LLM baselines received the same per-individual historical response data used to build the GEMS graph, for every setting and dataset.
Authors: We confirm that the LLM baselines received exactly the same per-individual historical response data used to build the GEMS graph. In the prompting protocol of §4, each LLM input for an individual includes their full set of prior responses to other items, along with demographics and item descriptions. This information parity holds for all three datasets and all three evaluation settings. We will add an explicit verification paragraph in the revised §4 to document this for each case.
Revision: yes
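The information-parity claim can be made concrete with a hypothetical prompt builder. The template below is an assumption, because the paper's actual prompt is not reproduced here. The point is that the LLM receives the same (question, answer) history that forms the individual's training edges in the graph.

```python
def build_prompt(demographics, prior_responses, target_question, target_options):
    """Render one individual's information for an LLM baseline (hypothetical template).

    `prior_responses` should be exactly the (question, answer) pairs used as this
    individual's training edges in the graph. The caller must not include the
    target question's answer.
    """
    lines = ["Respondent profile:"]
    lines += [f"- {key}: {value}" for key, value in demographics.items()]
    lines.append("Previous answers:")
    lines += [f"- Q: {q} A: {a}" for q, a in prior_responses]
    lines.append(f"Question: {target_question}")
    # Label options (A), (B), ... so the model can answer with a single letter.
    lines += [f"({chr(ord('A') + i)}) {opt}" for i, opt in enumerate(target_options)]
    lines.append("Answer with a single letter.")
    return "\n".join(lines)


prompt = build_prompt(
    {"age": "30-49", "region": "Midwest"},
    [("Did you vote in the last election?", "Yes")],
    "How much do you trust national news organizations?",
    ["A lot", "Some", "Not much", "Not at all"],
)
print(prompt)
```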
Referee (§3.1 and §3.2): clarify the train/test edge split to rule out leakage of test-item information into the graph used for evaluation.
Authors: We use a per-individual random edge split. For every person, 70% of their historical responses are included as training edges when constructing the heterogeneous graph, and the remaining 30% are held out entirely as test edges. No held-out edges appear in the graph at training or inference time. Because every individual keeps some training edges, this is a standard transductive edge split for link prediction. We will expand §3.1 with a dedicated paragraph and pseudocode describing the split to eliminate any ambiguity.
Revision: yes
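The per-individual 70/30 split described in this response can be sketched as follows. The record format and the function name are hypothetical, and this is not the authors' promised pseudocode. Only the returned `train` edges would be used to build the graph. The `test` edges are scored afterwards and never added to it.

```python
import random
from collections import defaultdict


def per_individual_split(responses, train_frac=0.7, seed=0):
    """Split each individual's responses into training edges and held-out test edges.

    `responses` holds hypothetical (individual, question, option) triples.
    Build the graph from `train` only; `test` edges are scored, never added.
    """
    rng = random.Random(seed)
    by_person = defaultdict(list)
    for record in responses:
        by_person[record[0]].append(record)
    train, test = [], []
    for person in sorted(by_person):  # sorted for a reproducible order
        records = list(by_person[person])
        rng.shuffle(records)
        k = round(train_frac * len(records))
        train.extend(records[:k])
        test.extend(records[k:])
    return train, test


responses = [(p, f"q{j}", "opt") for p in ["a", "b", "c"] for j in range(10)]
train, test = per_individual_split(responses)
print(len(train), len(test))  # 21 9
```

Every individual keeps some training edges, so the held-out edges belong to nodes that are already in the graph. A question held out for one person can still appear in the graph through other people's training edges. That is expected and does not leak the held-out answer itself.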
Referee (Results, Tables 2–4): report statistical significance tests or confidence intervals for the "matches or outperforms" statements.
Authors: We agree that statistical reliability measures will strengthen the claims. In the revised version we will augment Tables 2–4 with 95% bootstrap confidence intervals (1,000 resamples) for every reported metric across the three datasets. This will make clear which performance differences are reliable.
Revision: yes
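A paired variant of the promised bootstrap intervals is sketched below. It is a percentile bootstrap with 1,000 resamples on the accuracy difference between two models scored on the same test items. The variant, the function name, and the example scores are assumptions; the scores are not the paper's results.

```python
import random


def paired_bootstrap_ci(correct_a, correct_b, n_resamples=1000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the accuracy difference (a - b) of two models.

    `correct_a` and `correct_b` are per-item 0/1 correctness scores on the same
    test items. Items are resampled with replacement, keeping the pairing.
    Returns (observed difference, lower bound, upper bound).
    """
    if len(correct_a) != len(correct_b) or not correct_a:
        raise ValueError("need two non-empty, equal-length score lists")
    rng = random.Random(seed)
    n = len(correct_a)
    observed = (sum(correct_a) - sum(correct_b)) / n
    diffs = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        diffs.append(sum(correct_a[i] - correct_b[i] for i in idx) / n)
    diffs.sort()
    lower = diffs[int(alpha / 2 * n_resamples)]
    upper = diffs[int((1 - alpha / 2) * n_resamples) - 1]
    return observed, lower, upper


gnn = [1] * 70 + [0] * 30  # hypothetical per-item correctness
llm = [1] * 60 + [0] * 40
observed, lower, upper = paired_bootstrap_ci(gnn, llm)
print(observed, lower, upper)
```

If the interval excludes 0, the difference is unlikely to be resampling noise. An interval that includes 0 is consistent with "matches" but does not by itself establish equivalence, which would require a pre-specified equivalence margin.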
Circularity Check
No significant circularity in empirical comparison of GNN link prediction vs LLM baselines
Full rationale
The paper's core contribution is an empirical demonstration that a standard heterogeneous GNN link-prediction model (GEMS) matches or exceeds LLM performance on close-ended simulation tasks across three datasets and three evaluation settings, while using far fewer parameters. No mathematical derivation chain exists that reduces reported performance metrics to quantities defined by construction from the same fitted inputs; the heterogeneous graph is assembled from historical individual-choice links and evaluated on held-out predictions, which is a conventional train/test split rather than a self-referential loop. The modeling choice to treat simulation as link prediction is an explicit ansatz, not a result derived from prior equations within the paper. No self-citations, uniqueness theorems, or renamings of known results are invoked as load-bearing steps. The central claim therefore remains an independent empirical finding rather than a tautology.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: a heterogeneous graph of individuals and answer choices can be constructed from historical response data such that missing links correspond to plausible future choices.
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel (unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We formulate discrete choice simulation as a link prediction problem on a graph... GEMS as a link prediction model trained end-to-end."
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction (unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "GEMS matches or outperforms the strongest LLM-based methods while using three orders of magnitude fewer parameters."
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.