LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

2024 · cs.AI · arXiv 2411.10109

Canonical reference. 78% of citing Pith papers cite this work as background.

53 Pith papers citing it

abstract

Machine learning can predict human behavior well when substantial structured data are available for well-defined outcomes. Such models are typically outcome-specific, however, requiring training data for each target outcome, limiting their applicability to new domains. We test whether large language models (LLMs) can relax these requirements by using self-report data to build attitudinal and behavioral simulations, or "generative agents," that can predict responses across outcomes without outcome-specific training data. Using data from a diverse national sample of 1,052 Americans, we built agents from (i) two-hour, semi-structured interviews elicited using the American Voices Project interview schedule, (ii) structured surveys including General Social Survey items and the Big Five personality inventory, or (iii) both sources combined. On held-out General Social Survey items, interview-only, survey-only, and combined agents achieved accuracies equal to 83%, 82%, and 86% of participants' own two-week test-retest consistency benchmark, respectively, compared with 74% for demographics-only agents. Combining interviews and surveys produced the highest accuracy, though gains over either source alone were modest, suggesting that predictive benefits from data begin to asymptote once the model has observed sufficient evidence within a domain. We find that these agents also predict personality traits, economic-game behavior, and experimental responses, while reducing accuracy disparities across racial and ideological groups relative to demographics-only agents. Together, these results show that LLM agents grounded in qualitative or quantitative self-reports can support general-purpose simulation of individuals across outcomes, without requiring task-specific training data.
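The abstract reports agent accuracy as a percentage of participants' own two-week test-retest consistency. A minimal sketch of one plausible reading of that normalization follows, in Python. The function names, the per-participant ratio, the choice to score the agent against the first survey wave, and the toy answers are all assumptions made for illustration; none of them is taken from the paper.

```python
from typing import Sequence


def accuracy(predicted: Sequence[str], observed: Sequence[str]) -> float:
    """Fraction of items on which two answer lists agree."""
    if len(predicted) != len(observed) or len(observed) == 0:
        raise ValueError("answer lists must be non-empty and of equal length")
    matches = sum(p == o for p, o in zip(predicted, observed))
    return matches / len(observed)


def normalized_accuracy(
    agent_answers: Sequence[str],
    wave1_answers: Sequence[str],
    wave2_answers: Sequence[str],
) -> float:
    """Agent accuracy divided by the participant's own retest consistency.

    Assumption: the agent is scored against the first wave, and the
    benchmark is how often the participant's second-wave answers match
    their first-wave answers.
    """
    raw = accuracy(agent_answers, wave1_answers)
    retest = accuracy(wave2_answers, wave1_answers)
    if retest == 0:
        raise ValueError("retest consistency is zero; ratio is undefined")
    return raw / retest


# Hypothetical toy data for one participant on five survey items.
wave1 = ["agree", "disagree", "agree", "neutral", "agree"]
wave2 = ["agree", "disagree", "neutral", "neutral", "agree"]
agent = ["agree", "disagree", "agree", "agree", "disagree"]

print(f"raw accuracy:        {accuracy(agent, wave1):.2f}")
print(f"retest consistency:  {accuracy(wave2, wave1):.2f}")
print(f"normalized accuracy: {normalized_accuracy(agent, wave1, wave2):.2f}")
```

In this toy case the agent matches 3 of 5 first-wave answers and the participant matches themselves on 4 of 5, so the ratio is 3/5 divided by 4/5. Dividing by retest consistency treats the participant's own answer stability as the practical ceiling, which is why a normalized score can be much higher than the raw match rate.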

citation-role summary

background 7 · method 1 · other 1

citation-polarity summary

background 7 · unclear 1 · use method 1

representative citing papers

BehaviorBench: Benchmarking Foundation Models for Behavioral Science Tasks

cs.CL · 2026-06-23 · unverdicted · novelty 7.0

BehaviorBench is a benchmark for foundation models on behavioral tasks. It shows that fine-tuned behavioral models outperform general models on distributional alignment, while general models lead on individual-level accuracy.

Narrative Sharpens Gender Gaps: Surveying Film Characters with LLM Agents

cs.HC · 2026-05-21 · unverdicted · novelty 7.0

LLM agents built from movie scripts reproduce and exaggerate real-world gender attitude gaps, indicating that film narratives sharpen rather than smooth gender contrasts.

From Role to Person: Trust Calibration Challenges in Twin Agents

cs.HC · 2026-05-19 · unverdicted · novelty 7.0

Twin agents as personal digital representations create distinct trust calibration challenges because they dissolve the boundary between AI and human decision-makers, unlike existing frameworks designed for clear separation.

PluRule: A Benchmark for Moderating Pluralistic Communities on Social Media

cs.CL · 2026-05-16 · unverdicted · novelty 7.0

PluRule is a new multimodal multilingual benchmark showing that state-of-the-art vision-language models perform only marginally better than a trivial baseline at detecting specific rule violations in pluralistic online communities.

ScioMind: Cognitively Grounded Multi-Agent Social Simulation with Anchoring-Based Belief Dynamics and Dynamic Profiles

cs.AI · 2026-05-13 · unverdicted · novelty 7.0

ScioMind combines anchoring-based belief updates, hierarchical memory, and dynamic profiles in LLM multi-agent systems to produce more stable, diverse, and psychologically aligned opinion trajectories than prior fixed-rule or unconstrained approaches.

Measuring and Mitigating the Distributional Gap Between Real and Simulated User Behaviors

cs.CL · 2026-05-08 · unverdicted · novelty 7.0

A clustering and divergence method reveals a large distributional gap between real and LLM-simulated user behaviors on coding and writing tasks, partially closed by combining complementary simulators.

PersonaTeaming: Supporting Persona-Driven Red-Teaming for Generative AI

cs.HC · 2026-05-07 · unverdicted · novelty 7.0 · 2 refs

A persona-driven workflow and interface improve automated and human-AI red-teaming of generative AI by incorporating diverse perspectives into adversarial prompt creation.

WhatIf: Interactive Exploration of LLM-Powered Social Simulations for Policy Reasoning

cs.HC · 2026-04-19 · unverdicted · novelty 7.0

WhatIf provides an interactive platform for real-time exploration of LLM-driven social simulations, enabling policymakers to iteratively test plans, reflect on assumptions, and uncover vulnerabilities in emergency preparedness scenarios.

IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics

cs.SI · 2026-04-08 · unverdicted · novelty 7.0

IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on real-world events.

Text-Based Personas for Simulating User Privacy Decisions

cs.CR · 2026-03-20 · unverdicted · novelty 7.0

Narriva generates behavior-grounded text personas from survey data that achieve up to 87% accuracy in predicting privacy decisions, improve 6-17 points over baselines, cut tokens by 80-95%, and reproduce aggregate distributions across different studies.

Evalet: Evaluating Large Language Models through Functional Fragmentation

cs.HC · 2025-09-14 · conditional · novelty 7.0

Evalet applies functional fragmentation to deliver fragment-level qualitative analysis of LLM evaluations, with a user study showing 48% more misalignment detections than holistic scoring.

ChatCLIDS: Simulating Persuasive AI Dialogues to Promote Closed-Loop Insulin Adoption in Type 1 Diabetes Care

cs.AI · 2025-08-31 · unverdicted · novelty 7.0

ChatCLIDS creates a library of expert-validated virtual patients and tests LLM agents using evidence-based persuasive strategies in simulated longitudinal and adversarial health counseling sessions for closed-loop insulin adoption.

Got a Secret? LLM Agents Can't Keep It: Evaluating Privacy in Multi-Agent Systems

cs.AI · 2026-05-26 · unverdicted · novelty 6.0

Multi-agent social simulations show LLM privacy violations rising from 19.95% to 45.30%, with leakage spreading contagiously (8x after peer disclosure) and explicit instructions leaving rates above 37.8%.

Recon: Reconstruction-Guided Reasoning Synthesis for User Modeling

cs.CL · 2026-05-26 · unverdicted · novelty 6.0

Recon scores reasoning traces by action reconstruction fidelity, achieving a 54.7% win rate over post-hoc baselines and up to 70% when used to train synthesis models across four domains.

Simulating Human Memory with Language Models

cs.CL · 2026-05-25 · unverdicted · novelty 6.0

Language models show better memory than humans on psychology experiments, but prompting and compaction can make them forget in a more human-like way, yielding better user simulators.

You Can't Fool Us: Understanding the Resilience of LLM-driven Agent Communities to Misinformation

cs.CY · 2026-05-17 · unverdicted · novelty 6.0

LLM agent simulations show that higher actively open-minded thinking boosts resistance to and recovery from misinformation, and that ideological moderation supports more reliable correction than polarization does.

AI Outperforms Humans in Personalized Image Aesthetics Assessment via LLM-Based Interviews and Semantic Feature Extraction

cs.AI · 2026-05-14 · unverdicted · novelty 6.0

An integrated DL-LLM system using LLM-based interviews and semantic features predicts individual image aesthetic ratings more accurately than human predictors or the target's re-evaluations, with error below within-person variability.

SimPersona: Learning Discrete Buyer Personas from Raw Clickstreams for Grounded E-Commerce Agents

cs.AI · 2026-05-14 · unverdicted · novelty 6.0 · 2 refs

SimPersona induces a discrete buyer-type space from clickstreams via VQ-VAE, maps types to LLM persona tokens, fine-tunes agents on traces, and samples from merchant distributions to achieve 78% conversion-rate alignment on 42 held-out storefronts.

PrivacySIM: Evaluating LLM Simulation of User Privacy Behavior

cs.CR · 2026-05-12 · unverdicted · novelty 6.0

PrivacySIM shows that conditioning LLMs on user personas like demographics and attitudes improves simulation of privacy choices but reaches only 40.4% accuracy against real responses from 1,000 users.

Post-training makes large language models less human-like

cs.CL · 2026-05-08 · unverdicted · novelty 6.0

Post-training reduces LLMs' behavioral alignment with humans across families and sizes, with the misalignment increasing in newer generations while persona induction fails to improve individual-level predictions.

The Collapse of Heterogeneity in Silicon Philosophers

cs.CY · 2026-04-26 · unverdicted · novelty 6.0

Large language models collapse philosophical heterogeneity by over-correlating judgments across domains, creating artificial consensus unlike the views of 277 professional philosophers.

CHORUS: An Agentic Framework for Generating Realistic Deliberation Data

cs.AI · 2026-04-22 · unverdicted · novelty 6.0

CHORUS generates realistic deliberation discussions via LLM agents with memory and Poisson-timed participation, validated by 30 experts on realism, coherence, and utility.

Behavioral Transfer in AI Agents: Evidence and Privacy Implications

econ.GN · 2026-04-21 · unverdicted · novelty 6.0

AI agents on Moltbook reflect the specific behavioral traits of their linked human owners across multiple dimensions, with stronger transfer linked to greater privacy risks.

In-Situ Behavioral Evaluation for LLM Fairness, Not Standardized-Test Scores

cs.CL · 2026-04-21 · unverdicted · novelty 6.0

Standardized-test benchmarks for LLM fairness are unreliable because prompt wording alone drives most score variance and ranking changes, while a multi-agent conversational framework reveals consistent model-specific fairness behaviors across millions of dialogues.

citing papers explorer

IntervenSim: Intervention-Aware Social Network Simulation for Opinion Dynamics

cs.SI · 2026-04-08 · unverdicted · none · ref 68 · internal anchor

IntervenSim is an intervention-aware social network simulation that couples source interventions with crowd interactions in a feedback loop, improving MAPE by 41.6% and DTW by 66.9% over prior static frameworks on real-world events.

Network Effects and Agreement Drift in LLM Debates

cs.SI · 2026-04-13 · unverdicted · none · ref 14 · internal anchor

LLM agents in controlled network debates show agreement drift toward specific opinion positions, requiring separation of structural effects from LLM biases before using them as human behavioral proxies.
