pith. sign in

hub Canonical reference

LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

Canonical reference. 78% of citing Pith papers cite this work as background.

43 Pith papers citing it
Background 78% of classified citations
abstract

Machine learning can predict human behavior well when substantial structured data and well-defined outcomes are available, but these models are typically limited to specific outcomes and cannot readily be applied to new domains. We test whether large language models (LLMs) can support a more general-purpose approach by building person-specific simulations (i.e., "generative agents") grounded in self-report data. Using data from a diverse national sample of 1,052 Americans, we build agents from (i) two-hour, semi-structured interviews (elicited using the American Voices Project interview schedule), (ii) structured surveys (the General Social Survey and Big Five personality inventory), or (iii) both sources combined. On held-out General Social Survey items, agent accuracy reached 83% (interview only), 82% (surveys only), and 86% (combined) of participants' two-week test-retest consistency, compared with agents prompted only with individuals' demographics (74%). Agents predicted personality traits and behaviors in experiments with similar accuracy, and reduced disparities in accuracy across racial and ideological groups relative to demographics-only baselines. Together, these results show that LLMs agents grounded in rich qualitative or quantitative self-report data can support general-purpose simulation of individuals across outcomes, without requiring task-specific training data.

hub tools

citation-role summary

background 7 method 1 other 1

citation-polarity summary

clear filters

representative citing papers

From Role to Person: Trust Calibration Challenges in Twin Agents

cs.HC · 2026-05-19 · unverdicted · novelty 7.0

Twin agents as personal digital representations create distinct trust calibration challenges because they dissolve the boundary between AI and human decision-makers, unlike existing frameworks designed for clear separation.

Text-Based Personas for Simulating User Privacy Decisions

cs.CR · 2026-03-20 · unverdicted · novelty 7.0

Narriva generates behavior-grounded text personas from survey data that achieve up to 87% accuracy in predicting privacy decisions, improve 6-17 points over baselines, cut tokens by 80-95%, and reproduce aggregate distributions across different studies.

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

PrivacySIM shows that conditioning LLMs on user personas like demographics and attitudes improves simulation of privacy choices but reaches only 40.4% accuracy against real responses from 1,000 users.

Post-training makes large language models less human-like

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Post-training reduces LLMs' behavioral alignment with humans across families and sizes, with the misalignment increasing in newer generations while persona induction fails to improve individual-level predictions.

The Collapse of Heterogeneity in Silicon Philosophers

cs.CY · 2026-04-26 · unverdicted · novelty 6.0

Large language models collapse philosophical heterogeneity by over-correlating judgments across domains, creating artificial consensus unlike the views of 277 professional philosophers.

In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

Standardized-test benchmarks for LLM fairness are unreliable because prompt wording alone drives most score variance and ranking changes, while a multi-agent conversational framework reveals consistent model-specific fairness behaviors across millions of dialogues.

Explicit Trait Inference for Multi-Agent Coordination

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

ETI lets LLM agents infer and track partners' psychological traits (warmth and competence) from histories, cutting payoff loss 45-77% in games and boosting performance 3-29% on MultiAgentBench versus CoT baselines.

Graph-Based Alternatives to LLMs for Human Simulation

cs.CL · 2025-11-03 · conditional · novelty 6.0

GEMS formulates close-ended human-behavior simulation as link prediction on a heterogeneous graph and matches or exceeds LLM performance with three orders of magnitude fewer parameters across three datasets and three evaluation settings.

citing papers explorer

Showing 1 of 1 citing paper after filters.