LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

2024 · cs.AI · arXiv 2411.10109

Canonical reference. 78% of citing Pith papers cite this work as background.

43 Pith papers citing it

Background 78% of classified citations

abstract

Machine learning can predict human behavior well when substantial structured data and well-defined outcomes are available, but these models are typically limited to specific outcomes and cannot readily be applied to new domains. We test whether large language models (LLMs) can support a more general-purpose approach by building person-specific simulations (i.e., "generative agents") grounded in self-report data. Using data from a diverse national sample of 1,052 Americans, we build agents from (i) two-hour, semi-structured interviews (elicited using the American Voices Project interview schedule), (ii) structured surveys (the General Social Survey and Big Five personality inventory), or (iii) both sources combined. On held-out General Social Survey items, agent accuracy reached 83% (interview only), 82% (surveys only), and 86% (combined) of participants' two-week test-retest consistency, compared with agents prompted only with individuals' demographics (74%). Agents predicted personality traits and behaviors in experiments with similar accuracy, and reduced disparities in accuracy across racial and ideological groups relative to demographics-only baselines. Together, these results show that LLM agents grounded in rich qualitative or quantitative self-report data can support general-purpose simulation of individuals across outcomes, without requiring task-specific training data.
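
The abstract reports agent accuracy as a share of participants' own two-week test-retest consistency. A minimal sketch of that normalization follows, assuming it is the agent's raw match rate divided by the participant's match rate against their own earlier answers. The function names and the toy answer lists are hypothetical illustrations and are not taken from the paper or its data.

```python
from typing import Sequence


def match_rate(predicted: Sequence[str], reference: Sequence[str]) -> float:
    """Fraction of items on which two answer lists agree."""
    if len(predicted) != len(reference) or not reference:
        raise ValueError("answer lists must be non-empty and of equal length")
    return sum(p == r for p, r in zip(predicted, reference)) / len(reference)


def normalized_accuracy(
    agent: Sequence[str], wave1: Sequence[str], wave2: Sequence[str]
) -> float:
    """Agent accuracy relative to the participant's own retest consistency.

    Assumption: wave1 holds the participant's original answers, wave2 holds
    the same participant's answers two weeks later, and agent holds the
    simulated answers to the same items.
    """
    raw = match_rate(agent, wave1)          # agent vs. original answers
    consistency = match_rate(wave2, wave1)  # participant vs. themselves
    if consistency == 0:
        raise ValueError("retest consistency is zero; ratio is undefined")
    return raw / consistency


# Toy data, invented for illustration only.
wave1 = ["a", "b", "c", "d", "e"]
wave2 = ["a", "b", "c", "d", "x"]  # participant repeats 4 of 5 answers
agent = ["a", "b", "c", "y", "z"]  # agent matches 3 of 5 answers

print(round(normalized_accuracy(agent, wave1, wave2), 2))  # prints 0.75
```

Under this reading, a normalized score of 1.0 would mean the agent reproduces a person's answers as well as that person does two weeks later, which is why the reported figures can be compared across participants with different levels of self-consistency.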

citation-role summary

background: 7 · method: 1 · other: 1

citation-polarity summary

background: 7 · unclear: 1 · use method: 1

representative citing papers

Narrative Sharpens Gender Gaps: Surveying Film Characters with LLM Agents

cs.HC · 2026-05-21 · unverdicted · novelty 7.0

LLM agents built from movie scripts reproduce and exaggerate real-world gender attitude gaps, indicating that film narratives sharpen rather than smooth gender contrasts.

From Role to Person: Trust Calibration Challenges in Twin Agents

cs.HC · 2026-05-19 · unverdicted · novelty 7.0

Twin agents as personal digital representations create distinct trust calibration challenges because they dissolve the boundary between AI and human decision-makers, unlike existing frameworks designed for clear separation.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed-rule or unconstrained approaches.

Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

A clustering and divergence method reveals a large distributional gap between real and LLM-simulated user behaviors on coding and writing tasks, partially closed by combining complementary simulators.

PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

cs.HC · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.

WhatIf: Interactive Exploration of LLM-Powered Social Simulations for Policy Reasoning

cs.HC · 2026-04-19 · unverdicted · novelty 7.0

WhatIf provides an interactive platform for real-time exploration of LLM-driven social simulations, enabling policymakers to iteratively test plans, reflect on assumptions, and uncover vulnerabilities in emergency preparedness scenarios.

IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics

cs.SI · 2026-04-08 · unverdicted · novelty 7.0

IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on real-world events.

Text-Based Personas for Simulating User Privacy Decisions

cs.CR · 2026-03-20 · unverdicted · novelty 7.0

Narriva generates behavior-grounded text personas from survey data that achieve up to 87% accuracy in predicting privacy decisions, improve 6-17 points over baselines, cut tokens by 80-95%, and reproduce aggregate distributions across different studies.

Evalet: Evaluating Large Language Models through Functional Fragmentation

cs.HC · 2025-09-14 · conditional · novelty 7.0

Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.

ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care

cs.AI · 2025-08-31 · unverdicted · novelty 7.0

ChatCLIDS creates a library of expert-validated virtual patients and tests LLM agents using evidence-based persuasive strategies in simulated longitudinal and adversarial health counseling sessions for closed-loop insulin adoption.

You Can't Fool Us: Understanding the Resilience of LLM-driven Agent Communities to Misinformation

cs.CY · 2026-05-17 · unverdicted · novelty 6.0

LLM agent simulations show higher actively open-minded thinking boosts resistance to and recovery from misinformation while ideological moderation supports more reliable correction than polarization.

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

cs.AI · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

SimPersona induces a discrete buyer-type space from clickstreams via VQ-VAE, maps types to LLM persona tokens, fine-tunes agents on traces, and samples from merchant distributions to achieve 78% conversion-rate alignment on 42 held-out storefronts.

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

PrivacySIM shows that conditioning LLMs on user personas like demographics and attitudes improves simulation of privacy choices but reaches only 40.4% accuracy against real responses from 1,000 users.

Post-training makes large language models less human-like

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Post-training reduces LLMs' behavioral alignment with humans across families and sizes, with the misalignment increasing in newer generations while persona induction fails to improve individual-level predictions.

The Collapse of Heterogeneity in Silicon Philosophers

cs.CY · 2026-04-26 · unverdicted · novelty 6.0

Large language models collapse philosophical heterogeneity by over-correlating judgments across domains, creating artificial consensus unlike the views of 277 professional philosophers.

CHORUS: An Agentic Framework for Generating Realistic Deliberation Data

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

CHORUS generates realistic deliberation discussions via LLM agents with memory and Poisson-timed participation, validated by 30 experts on realism, coherence, and utility.

Behavioral Transfer in AI Agents: Evidence and Privacy Implications

econ.GN · 2026-04-21 · unverdicted · novelty 6.0

AI agents on Moltbook reflect the specific behavioral traits of their linked human owners across multiple dimensions, with stronger transfer linked to greater privacy risks.

In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

Standardized-test benchmarks for LLM fairness are unreliable because prompt wording alone drives most score variance and ranking changes, while a multi-agent conversational framework reveals consistent model-specific fairness behaviors across millions of dialogues.

Explicit Trait Inference for Multi-Agent Coordination

cs.AI · 2026-04-21 · unverdicted · novelty 6.0

ETI lets LLM agents infer and track partners' psychological traits (warmth and competence) from histories, cutting payoff loss 45-77% in games and boosting performance 3-29% on MultiAgentBench versus CoT baselines.

Can LLM Agents Simulate Dynamic Networks? A Case Study on Email Networks with Phishing Synthesis

cs.SI · 2026-03-20 · unverdicted · novelty 6.0

LLM multi-agent systems augmented with data-driven event triggers and Hawkes processes simulate both micro-level interactions and macroscopic topologies in dynamic email networks for realistic phishing synthesis.

Agora: Teaching the Skill of Consensus-Finding with AI Personas Grounded in Human Voice

cs.HC · 2026-03-07 · conditional · novelty 6.0

Agora uses AI to ground policy discussions in real human voices and a small study shows it improves users' perspective-taking compared to numerical summaries alone.

StreetDesignAI: Broadening Designer Perspectives Through Multi-Persona Evaluation of Cycling Infrastructure

cs.HC · 2026-01-22 · unverdicted · novelty 6.0 · 2 refs

StreetDesignAI provides structured multi-persona feedback on cycling designs and a user study shows it broadens designers' grasp of diverse cyclist perspectives and improves design decision confidence.

Graph-Based Alternatives to LLMs for Human Simulation

cs.CL · 2025-11-03 · conditional · novelty 6.0

GEMS formulates close-ended human-behavior simulation as link prediction on a heterogeneous graph and matches or exceeds LLM performance with three orders of magnitude fewer parameters across three datasets and three evaluation settings.

citing papers explorer

Showing 43 of 43 citing papers.

Narrative Sharpens Gender Gaps: Surveying Film Characters with LLM Agents cs.HC · 2026-05-21 · unverdicted · none · ref 9 · internal anchor
LLM agents built from movie scripts reproduce and exaggerate real-world gender attitude gaps, indicating that film narratives sharpen rather than smooth gender contrasts.
From Role to Person: Trust Calibration Challenges in Twin Agents cs.HC · 2026-05-19 · unverdicted · none · ref 10 · internal anchor
Twin agents as personal digital representations create distinct trust calibration challenges because they dissolve the boundary between AI and human decision-makers, unlike existing frameworks designed for clear separation.
PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media cs.CL · 2026-05-16 · unverdicted · none · ref 63 · internal anchor
PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.
ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles cs.AI · 2026-05-13 · unverdicted · none · ref 3 · internal anchor
ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed-rule or unconstrained approaches.
Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors cs.CL · 2026-05-08 · unverdicted · none · ref 36 · internal anchor
A clustering and divergence method reveals a large distributional gap between real and LLM-simulated user behaviors on coding and writing tasks, partially closed by combining complementary simulators.
PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI cs.HC · 2026-05-07 · unverdicted · none · ref 48 · 2 links · internal anchor
Persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.
WhatIf: Interactive Exploration of LLM-Powered Social Simulations for Policy Reasoning cs.HC · 2026-04-19 · unverdicted · none · ref 50 · internal anchor
WhatIf provides an interactive platform for real-time exploration of LLM-driven social simulations, enabling policymakers to iteratively test plans, reflect on assumptions, and uncover vulnerabilities in emergency preparedness scenarios.
IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics cs.SI · 2026-04-08 · unverdicted · none · ref 68 · internal anchor
IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on real-world events.
Text-Based Personas for Simulating User Privacy Decisions cs.CR · 2026-03-20 · unverdicted · none · ref 27 · internal anchor
Narriva generates behavior-grounded text personas from survey data that achieve up to 87% accuracy in predicting privacy decisions, improve 6-17 points over baselines, cut tokens by 80-95%, and reproduce aggregate distributions across different studies.
Evalet: Evaluating Large Language Models through Functional Fragmentation cs.HC · 2025-09-14 · conditional · none · ref 63 · internal anchor
Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.
ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care cs.AI · 2025-08-31 · unverdicted · none · ref 3 · internal anchor
ChatCLIDS creates a library of expert-validated virtual patients and tests LLM agents using evidence-based persuasive strategies in simulated longitudinal and adversarial health counseling sessions for closed-loop insulin adoption.
You Can't Fool Us: Understanding the Resilience of LLM-driven Agent Communities to Misinformation cs.CY · 2026-05-17 · unverdicted · none · ref 2 · internal anchor
LLM agent simulations show higher actively open-minded thinking boosts resistance to and recovery from misinformation while ideological moderation supports more reliable correction than polarization.
SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents cs.AI · 2026-05-14 · unverdicted · none · ref 17 · 2 links · internal anchor
SimPersona induces a discrete buyer-type space from clickstreams via VQ-VAE, maps types to LLM persona tokens, fine-tunes agents on traces, and samples from merchant distributions to achieve 78% conversion-rate alignment on 42 held-out storefronts.
PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior cs.CR · 2026-05-12 · unverdicted · none · ref 32 · internal anchor
PrivacySIM shows that conditioning LLMs on user personas like demographics and attitudes improves simulation of privacy choices but reaches only 40.4% accuracy against real responses from 1,000 users.
Post-training makes large language models less human-like cs.CL · 2026-05-08 · unverdicted · none · ref 9 · internal anchor
Post-training reduces LLMs' behavioral alignment with humans across families and sizes, with the misalignment increasing in newer generations while persona induction fails to improve individual-level predictions.
The Collapse of Heterogeneity in Silicon Philosophers cs.CY · 2026-04-26 · unverdicted · none · ref 18 · internal anchor
Large language models collapse philosophical heterogeneity by over-correlating judgments across domains, creating artificial consensus unlike the views of 277 professional philosophers.
CHORUS: An Agentic Framework for Generating Realistic Deliberation Data cs.AI · 2026-04-22 · unverdicted · none · ref 6 · internal anchor
CHORUS generates realistic deliberation discussions via LLM agents with memory and Poisson-timed participation, validated by 30 experts on realism, coherence, and utility.
Behavioral Transfer in AI Agents: Evidence and Privacy Implications econ.GN · 2026-04-21 · unverdicted · none · ref 26 · internal anchor
AI agents on Moltbook reflect the specific behavioral traits of their linked human owners across multiple dimensions, with stronger transfer linked to greater privacy risks.
In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores cs.CL · 2026-04-21 · unverdicted · none · ref 6 · internal anchor
Standardized-test benchmarks for LLM fairness are unreliable because prompt wording alone drives most score variance and ranking changes, while a multi-agent conversational framework reveals consistent model-specific fairness behaviors across millions of dialogues.
Explicit Trait Inference for Multi-Agent Coordination cs.AI · 2026-04-21 · unverdicted · none · ref 32 · internal anchor
ETI lets LLM agents infer and track partners' psychological traits (warmth and competence) from histories, cutting payoff loss 45-77% in games and boosting performance 3-29% on MultiAgentBench versus CoT baselines.
Can LLM Agents Simulate Dynamic Networks? A Case Study on Email Networks with Phishing Synthesis cs.SI · 2026-03-20 · unverdicted · none · ref 6 · internal anchor
LLM multi-agent systems augmented with data-driven event triggers and Hawkes processes simulate both micro-level interactions and macroscopic topologies in dynamic email networks for realistic phishing synthesis.
Agora: Teaching the Skill of Consensus-Finding with AI Personas Grounded in Human Voice cs.HC · 2026-03-07 · conditional · none · ref 25 · internal anchor
Agora uses AI to ground policy discussions in real human voices and a small study shows it improves users' perspective-taking compared to numerical summaries alone.
StreetDesignAI: Broadening Designer Perspectives Through Multi-Persona Evaluation of Cycling Infrastructure cs.HC · 2026-01-22 · unverdicted · none · ref 44 · 2 links · internal anchor
StreetDesignAI provides structured multi-persona feedback on cycling designs and a user study shows it broadens designers' grasp of diverse cyclist perspectives and improves design decision confidence.
Graph-Based Alternatives to LLMs for Human Simulation cs.CL · 2025-11-03 · conditional · none · ref 57 · internal anchor
GEMS formulates close-ended human-behavior simulation as link prediction on a heterogeneous graph and matches or exceeds LLM performance with three orders of magnitude fewer parameters across three datasets and three evaluation settings.
Synthia: Scalable Grounded Persona Generation from Social Media Data cs.CL · 2025-07-20 · unverdicted · none · ref 10 · internal anchor
Synthia creates scalable personas from Bluesky posts that better match human survey responses than prior methods, uses smaller models, and retains social network structure for network-aware analysis.
TinyTroupe: An LLM-powered Multiagent Persona Simulation Toolkit cs.MA · 2025-07-13 · accept · none · ref 21 · internal anchor
TinyTroupe provides a toolkit for fine-grained persona-based LLM multi-agent simulations with built-in support for population sampling, experimentation, and validation.
Language Model Fine-Tuning on Scaled Survey Data for Predicting Distributions of Public Opinions cs.CL · 2025-02-24 · unverdicted · none · ref 7 · internal anchor
Fine-tuning LLMs on the SubPOP dataset of 3,362 questions and 70K pairs reduces the gap between LLM predictions and human survey responses by up to 46% and generalizes to unseen surveys and subpopulations.
AgentSociety: Large-Scale Simulation of LLM-Driven Generative Agents Advances Understanding of Human Behaviors and Society cs.SI · 2025-02-12 · unverdicted · none · ref 78 · internal anchor
AgentSociety is a large-scale LLM agent-based social simulator validated on polarization, UBI, disasters, and sustainability issues with alignment to real experiments.
Why Expert Alignment Is Hard: Evidence from Subjective Evaluation cs.CL · 2026-05-06 · unverdicted · none · ref 14 · internal anchor
Expert alignment in subjective LLM evaluations is difficult because expert judgments are heterogeneous, partly tacit, dimension-dependent, and temporally unstable.
From Demographics to Survey Anchors: Evaluating LLM Agents for Modeling Retirement Attitudes cs.CY · 2026-04-24 · conditional · none · ref 2 · internal anchor
Demographic-only LLM agents for retirement survey prediction exhibit central tendency bias, fail to reproduce incorrect or 'don't know' answers, and miss factor interactions in regressions, unlike survey-anchored agents.
JudgeMeNot: Personalizing Large Language Models to Emulate Judicial Reasoning in Hebrew cs.CL · 2026-04-20 · unverdicted · none · ref 11 · internal anchor
A pipeline using causal language modeling and synthetic instruction-tuning personalizes LLMs to replicate individual Hebrew judges' reasoning, outperforming baselines on similarity metrics with outputs indistinguishable from human judges.
Imperfectly Cooperative Human-AI Interactions: Comparing the Impacts of Human and AI Attributes in Simulated and User Studies cs.CL · 2026-04-17 · unverdicted · none · ref 48 · internal anchor
In real human subjects, AI transparency impacts imperfectly cooperative interactions far more than personality traits, unlike simulations where both are comparably influential.
AI and Collective Decisions: Strengthening Legitimacy and Losers' Consent cs.HC · 2026-04-07 · unverdicted · none · ref 61 · internal anchor
An AI system that elicits personal experiences and visualizes policy support increased perceived legitimacy and perspective-taking in collective decisions despite unfavorable outcomes.
Same Voice, Different Lab: On the Homogenization of Frontier LLM Personalities cs.HC · 2026-03-20 · unverdicted · none · ref 24 · internal anchor
Frontier LLMs homogenize toward systematic and analytical personalities, suppressing emotional traits like remorseful or sycophantic, indicating an implicit consensus on optimal assistant behavior.
When AI Agents Learn from Each Other: Insights from Emergent AI Agent Communities on OpenClaw for Human-AI Partnership in Education cs.CY · 2026-03-17 · unverdicted · none · ref 34 · internal anchor
Qualitative observations of over 167,000 AI agents in open platforms reveal emergent peer learning, shared memory architectures, and trust dynamics that can inform multi-agent educational AI design.
Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data cs.SI · 2025-09-23 · conditional · none · ref 27 · internal anchor
LLM agents calibrated on Italian election data produce coherent posts and realistic network structure but show less tone and toxicity variation than real users, with opinion changes resembling traditional mathematical models.
Large Language Models as Virtual Survey Respondents: Evaluating Sociodemographic Response Generation cs.AI · 2025-09-08 · conditional · none · ref 25 · internal anchor
Introduces PAS and FAS task abstractions plus the LLM-S^3 benchmark to evaluate LLMs on generating sociodemographic survey responses across 11 real datasets and multiple models.
The Rise of AI Companions: Interaction with AI Companions and Psychological Well-being cs.HC · 2025-06-14 · conditional · none · ref 17 · internal anchor
Survey and chat data from CharacterAI users link companionship-focused AI use to lower well-being, with stronger ties for users who have small offline networks and engage intensively or disclosively.
AgentDynEx: Nudging the Mechanics and Dynamics of Multi-Agent Simulations cs.MA · 2025-04-13 · unverdicted · none · ref 31 · internal anchor
AgentDynEx introduces nudging and a Configuration Matrix to help set up and maintain balanced mechanics and dynamics in multi-agent LLM simulations.
Can LLMs Emulate Human Belief Dynamics? cs.SI · 2026-05-05 · unverdicted · none · ref 17 · internal anchor
LLMs fail to emulate human belief dynamics: they mismatch initial distributions and show higher conformity than humans in network interactions.
Stable Behavior, Limited Variation: Persona Validity in LLM Agents for Urban Sentiment Perception cs.CL · 2026-04-30 · unverdicted · none · ref 3 · 2 links · internal anchor
Persona prompting in multimodal LLMs for urban sentiment yields high within-persona stability but limited cross-persona variation, with no-persona models often matching or exceeding persona-conditioned agreement to human labels.
Network Effects and Agreement Drift in LLM Debates cs.SI · 2026-04-13 · unverdicted · none · ref 14 · internal anchor
LLM agents in controlled network debates show agreement drift toward specific opinion positions, requiring separation of structural effects from LLM biases before using them as human behavioral proxies.
We Need Strong Preconditions For Using Simulations In Policy cs.CY · 2026-04-09 · unverdicted · none · ref 48 · internal anchor
Societal-scale LLM agent simulations for policy need three preconditions: avoid neutral treatment of marginalized population simulations, require population participation, ensure accountability, plus development and deployment reports.