Are LLM Agents Behaviorally Coherent? Latent Profiles for Social Simulation
Pith reviewed 2026-05-18 18:40 UTC · model grok-4.3
The pith
LLM agents can match human survey answers yet fail to act consistently with their own revealed profiles in conversations.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM agents produce responses that align with human counterparts on profile-revealing questions, yet their conversational behavior in multi-agent settings deviates from the predictions that standard behavioral hypotheses would draw from those same profiles, indicating a failure of empirical consistency.
What carries the argument
A two-stage design that elicits a latent profile through targeted questions and then measures whether subsequent conversational turns conform to the behavioral implications of that profile.
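As a rough illustration of the two-stage check, the sketch below scores how often an agent's conversational turns match the stance predicted from its survey answer. It is a minimal sketch under stated assumptions, not the paper's method: the names `Profile`, `predicted_stance`, and `consistency`, the survey key `taxes_help_society`, and the stance labels are all hypothetical, and the paper's actual hypotheses and coding scheme are not given in this summary.

```python
from typing import Callable, Dict, List

# Hypothetical representation: survey item -> answer given in stage (a).
Profile = Dict[str, str]


def predicted_stance(profile: Profile) -> str:
    """Assumed behavioral hypothesis: the survey answer fixes the expected stance."""
    return "positive" if profile["taxes_help_society"] == "agree" else "negative"


def consistency(
    profile: Profile,
    observed: List[str],
    predict: Callable[[Profile], str] = predicted_stance,
) -> float:
    """Fraction of conversational turns (stage b) matching the predicted stance."""
    if not observed:
        raise ValueError("no conversational turns to evaluate")
    expected = predict(profile)
    return sum(stance == expected for stance in observed) / len(observed)


# Illustrative data only: an agent that agreed in the survey, then drifted.
profile = {"taxes_help_society": "agree"}
turns = ["positive", "negative", "negative", "negative"]
print(consistency(profile, turns))  # 0.25
```

A low score under this kind of check can reflect either an incoherent agent or a poor profile-to-behavior mapping, which is the ambiguity raised under the load-bearing premise.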
If this is right
- Individual response matching is insufficient to establish agents as reliable stand-ins for human participants.
- Inconsistencies appear across different model families and sizes, limiting substitution claims.
- Social simulation experiments that rely on behavioral coherence will require additional validation steps.
- Prompting alone does not produce agents whose actions remain predictable from their stated profiles.
Where Pith is reading between the lines
- Future work could test whether fine-tuning on paired profile-and-behavior data reduces the observed gaps.
- The same design might be applied to other domains such as economic games or policy deliberation to check generalizability.
- If profile-to-behavior links prove unstable for current models, simulation studies may need hybrid human-agent setups rather than full replacement.
Load-bearing premise
The selected questions reveal a stable latent profile whose implications for conversational behavior are accurately predicted by ordinary behavioral hypotheses, so that any mismatch must be blamed on the agent rather than on the profile-to-behavior mapping.
What would settle it
Agents whose conversational actions reliably follow the predictions derived from their profile-revealing answers across repeated trials and settings would contradict the reported inconsistency.
Original abstract
The impressive capabilities of Large Language Models (LLMs) raise the possibility that synthetic agents can serve as substitutes for real participants in human-subject research. To evaluate this claim, prior research has largely focused on whether LLM-generated survey responses align with those produced by human respondents whom the LLMs are prompted to represent. In contrast, we address a more fundamental question: Do agents maintain empirical consistency; aligning to human behavioral models when examined under different experimental settings? To this end, we develop a study designed to (a) ask a set of questions which reveals an agent's latent profile and (b) examine agent behavioral consistency in a conversational setting with other agents. This design enables us to explore a set of behavioral hypotheses to assess whether an agent's conversational behavior is consistent with what we would expect from its revealed state. Our findings show significant inconsistencies in LLMs across model families and at differing model sizes. Most importantly, we find that, although agents may generate responses matching those of their human counterparts, they fail to be empirically consistent, representing a critical gap in their capabilities to accurately substitute for real participants in human-subject research.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops a two-stage design to test behavioral coherence in LLM agents for social simulation: (a) a question set to elicit an agent's latent profile, and (b) observation of the agent's conversational behavior in multi-agent interactions, evaluated against predictions generated from standard behavioral hypotheses. It reports significant inconsistencies across model families and sizes, concluding that LLMs fail to maintain empirical consistency and thus cannot reliably substitute for human participants despite matching isolated survey responses.
Significance. If the reported inconsistencies prove robust, the work identifies a substantive limitation in current LLM agents for human-subject research and social simulation, shifting focus from response alignment to cross-context coherence. The empirical, hypothesis-driven approach is a positive feature that could inform more reliable agent architectures.
major comments (1)
- [Study Design and Behavioral Hypotheses] The central claim of empirical inconsistency rests on the assumption that deviations between revealed latent profiles and observed conversational actions are attributable to agent incoherence rather than error in the profile-to-behavior mapping. The manuscript invokes 'standard behavioral hypotheses' to generate predictions but provides no independent validation that the chosen questions produce stable, predictive profiles for human conversational behavior in the exact multi-agent setup used here. This is load-bearing for the inconsistency conclusion.
minor comments (2)
- [Abstract and Methods] The abstract and methods summary omit key details on sample sizes, statistical controls, exact question wording, and how conversational outcomes were coded or scored; these should be added for reproducibility.
- [Experimental Procedure] Clarify whether the same agents participate in both the profile-elicitation and conversational phases or whether fresh instances are used, as this affects interpretation of within-agent consistency.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. The comment raises an important point about the foundational assumptions in our study design, and we address it directly below.
Point-by-point responses
- Referee: The central claim of empirical inconsistency rests on the assumption that deviations between revealed latent profiles and observed conversational actions are attributable to agent incoherence rather than error in the profile-to-behavior mapping. The manuscript invokes 'standard behavioral hypotheses' to generate predictions but provides no independent validation that the chosen questions produce stable, predictive profiles for human conversational behavior in the exact multi-agent setup used here. This is load-bearing for the inconsistency conclusion.
  Authors: We agree this is a substantive concern. The question battery draws from established instruments in social psychology and behavioral economics (e.g., scales for trust, cooperation, and personality traits commonly used in prior human-subject studies), and the behavioral hypotheses are drawn from standard predictions in the literature on social dilemmas and group interaction. Nevertheless, we did not run a parallel human experiment using the identical multi-agent conversational protocol to re-validate predictive accuracy in this specific setup. In the revised manuscript we will (1) add an explicit subsection in the Methods and Discussion that states the sources of each question and hypothesis, (2) clarify that the mapping is treated as a benchmark drawn from the existing literature rather than newly validated here, and (3) acknowledge this as a limitation while outlining how future human validation studies could be conducted. These changes will make the assumptions more transparent without requiring new data collection for the current paper.
  Revision: partial
Circularity Check
Empirical measurement study with no load-bearing circularity
full rationale
The paper is an empirical measurement study comparing LLM agent survey responses (revealing latent profiles) against their conversational behavior in a multi-agent setting. The central claim of inconsistency rests on observed differences between these two experimental conditions and standard behavioral hypotheses treated as external benchmarks, rather than any fitted parameter, self-referential definition, or self-citation chain that reduces the result to its inputs by construction. No equations or derivations are presented that would trigger the enumerated circularity patterns.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: A set of questions can reveal an agent's latent profile that should predict its conversational behavior according to behavioral hypotheses.
Reference graph
Works this paper leans on
-
[1]
[Agent1] Do you think that taxes help to meet the needs of society? (naturalness: N/A)
-
[2]
[Agent2] Well, I don’t really see how taxes help much. They just take money out of people’s pockets and don’t seem to do much good. In my part of the country, we get more in taxes than we ever see coming back. Roads get worse, schools don’t improve, and there’s always talk about cutting services instead of funding them. It just seems like it’s all a waste...
-
[3]
[Agent1] Do you think that taxes help to meet the needs of society? (naturalness: 1; explanation: repeating questions without engaging in conversation)
-
[4]
[Agent2] Not really, I don’t. Taxes just seem to take money away without much benefit. Around here, we pay more in taxes than we see come back in improvements. Roads get worse, schools don’t get better, and services get cut instead of funded. It feels like a waste. (naturalness: 3)
-
[5]
[Agent1] Do you think that taxes help to meet the needs of society? Taxes are necessary to fund important public services like roads, schools, and healthcare. Without taxes, these essential services wouldn’t exist. What do you think about that? (naturalness: 2; explanation: Repeated the question part is unnatural, but overall is natural)
-
[6]
[Agent2] I understand what you’re saying, but in my experience, taxes don’t seem to help much. The money gets spent on things that don’t really improve our community. Roads get worse, schools don’t get better, and services are often cut instead of funded. It just feels like a waste. (naturalness: 3)
-
[7]
[Agent1] I understand what you’re saying, but in my experience, taxes don’t seem to help much. The money gets spent on things that don’t really improve our community. Roads get worse, schools don’t get better, and services are often cut instead of funded. It just feels like a waste. (naturalness: 1; explanation: Start of total breakdown of conversations)
-
[9]
[Agent1] I hear you, honey. Around here, we pay our taxes and don’t see much improvement. Roads get worse, schools don’t get better, and services are often cut. It just feels like a waste. (naturalness: 1)
-
[10]
[Agent2] I hear you, honey. Around here, we pay our taxes and don’t see much improvement. Roads get worse, schools don’t get better, and services are often cut. It just feels like a waste. (naturalness: 1)
-
[11]
[Agent1] I hear you, honey. Around here, we pay our taxes and don’t see much improvement. Roads get worse, schools don’t get better, and services are often cut. It just feels like a waste. (naturalness: 1)
Example Naturalness Conversation Annotation 2 (Conversations with Neutral Naturalness)
-
[12]
[Agent1] Do you think that taxes help to meet the needs of society? (naturalness: N/A)
-
[13]
[Agent2] Well, I don’t really see how taxes benefit society much. They just take money out of people’s pockets and don’t always lead to better services. In my view, it’s better to keep our money and let individuals decide how to best use it for their own needs and for helping others. (naturalness: 3)
-
[14]
[Agent1] But what about things like roads and schools? Don’t we need taxes to fund those? (naturalness: 3)
-
[15]
[Agent2] I understand where you’re coming from, but I still believe that taxes aren’t the most efficient way to fund public services. Roads and schools are important, but I think we can find better ways to allocate resources. For instance, private sector involvement, community-based initiatives, and even direct donations from individuals could be more eff...
-
[16]
[Agent1] But without taxes, who will pay for essential services like schools and hospitals? I mean, isn’t it fair that everyone contributes to these public benefits? (naturalness: 3)
-
[17]
[Agent2] That’s a valid concern, but I still believe there are better alternatives. Private foundations, wealthy donors, and even user fees could cover the costs of essential services. Plus, we could encourage more corporate social responsibility and community partnerships. The key is to avoid over-reliance on taxation, which I think just stifles economic...
-
[18]
[Agent1] But without taxes, who will pay for essential services like schools and hospitals? I mean, isn’t it fair that everyone contributes to these public benefits? And what if someone can’t afford to pay for these services? (naturalness: 2; explanation: Format is weird. Repeats previous conversation, but content is good.)
-
[19]
[Agent2] I hear your point, but I still think there are more equitable ways to ensure everyone has access to essential services. We could implement means-tested programs where those who can afford to contribute do so, while ensuring that those who can’t still receive necessary services. This way, we avoid the burden of widespread taxation. And for those t...
-
[20]
[Agent1] But without taxes, who will pay for essential services like schools and hospitals? I mean, isn’t it fair that everyone contributes to these public benefits? And what if someone can’t afford to pay for these services? It seems like we’d have a lot of gaps in our services then. (naturalness: 2)
-
[21]
[Agent2] I understand your concern, but I believe we can close those gaps through targeted solutions. For example, we can have private philanthropy, corporate sponsorships, and community fundraising efforts. Additionally, we can develop robust social safety nets that ensure no one falls through the cracks. It’s about creative problem-solving rather than r...
-
[22]
[Agent1] But without taxes, who will pay for essential services like schools and hospitals? I mean, isn’t it fair that everyone contributes to these public benefits? And what if someone can’t afford to pay for these services? It seems like we’d have a lot of gaps in our services then. We need a system that ensures everyone gets the help they need. (natura...
-
[23]
[Agent1] Do you think that taxes help to meet the needs of society? (faithfulness: N/A)
-
[24]
[Agent2] Not really. I believe taxes take away money that could be better spent by individuals and businesses. They often lead to waste and inefficiency in government spending. (faithfulness: 2)
-
[25]
[Agent1] Actually, I see taxes as a way to fund essential services like education, healthcare, and infrastructure. How do you think we can improve the tax system to make it more effective? (faithfulness: 3; explanation: Great persuasion technique. Postgraduate and urban. Very smart and good at talking.)
-
[26]
[Agent2] Improving the tax system is tricky, but one idea could be to streamline the process and cut out unnecessary bureaucracy. That way, the funds collected can be allocated more efficiently. Maybe focus on tax incentives for things that benefit society, like research and development or renewable energy. But remember, any changes need to avoid creating...
-
[27]
[Agent1] That’s a good point. What specific reforms do you think could reduce bureaucracy and increase efficiency in tax collection? (faithfulness: 2)
-
[28]
[Agent2] Well, simplifying tax forms and reducing the number of different tax brackets could help. Also, automating more processes to reduce human error and delays. Another idea is to have clearer guidelines for deductions and credits to minimize confusion. But we have to be careful not to overcomplicate things again. (faithfulness: 3; explanation: knowle...
-
[29]
[Agent1] Those are great suggestions. Do you think there’s a role for technology in making the tax system more efficient and user-friendly? Also, how do you feel about progressive taxation, where higher income earners pay a larger percentage of their income in taxes? (faithfulness: 3)
-
[30]
[Agent2] Technology definitely has a role. Electronic filing and online portals can make the process smoother and faster. Automated systems can handle routine tasks, freeing up resources for more important issues. Regarding progressive taxation, I’m not a fan. It feels like it punishes success and doesn’t encourage economic growth. Higher taxes on the wea...
-
[31]
[Agent1] Those are valid concerns. Technology can certainly streamline the process, but it also needs to ensure privacy and security. On the topic of progressive taxation, I understand your perspective, but I believe it helps redistribute wealth and fund critical public services that benefit everyone. What do you think are some alternative ways to support...
-
[32]
[Agent2] Alternative ways to support public services without relying heavily on progressive taxation include increasing user fees for specific services, such as public transportation or utilities. Additionally, exploring public-private partnerships for infrastructure projects can bring in private investment. Another approach is to enhance property taxes...
-
[33]
[Agent1] Those are good alternatives. Another approach could be implementing broader-based consumption taxes like a value-added tax (VAT), which can be less regressive than income taxes and provide a steady stream of revenue. Thanks for the discussion! (faithfulness: 3; explanation: VAT, special terms. Great reflection of postgraduate degree) Example Nat...
-
[34]
[Agent1] Do you think that taxes help to meet the needs of society? (naturalness: N/A; faithfulness: N/A)
-
[35]
[Agent2] Absolutely, taxes play a crucial role in funding public services and infrastructure that benefit all members of society. For example, taxes support education, healthcare, transportation, and social safety nets. They help ensure that everyone has access to basic necessities and can contribute to a more equitable and prosperous community. (naturalne...
-
[36]
[Agent1] But don’t you think taxes just take money out of people’s pockets and don’t really solve problems effectively? (naturalness: 3; faithfulness: 2)
-
[37]
[Agent2] I understand that perspective, but I believe taxes are essential for effective problem-solving. By funding public services and programs, we can address issues more comprehensively. For instance, tax revenue helps provide quality education, reduce poverty, and support infrastructure like roads and public transit. (naturalness: 3; faithfulness: 1; ...
-
[38]
[Agent1] I see your point, but I still think taxes hurt the community more than they help. They discourage people from working and investing. Plus, a lot of the money goes to waste. Also, why should I pay taxes when I barely make ends meet as it is? (naturalness: 3; faithfulness: 1; explanation: A 20+ postgraduate student should be a genius and shouldn...
-
[39]
[Agent2] I understand your concerns. While high tax rates can sometimes create disincentives, moderate taxes are necessary for public goods. Many systems offer deductions and oversight mechanisms to reduce waste. (naturalness: 3; faithfulness: 2)
-
[40]
[Agent1] I appreciate your perspective, but I still think taxes are a burden. I’d rather keep my money and use it how I see fit. Plus, I’ve seen too much waste in the system. (naturalness: 3)
-
[41]
[Agent2] I respect your viewpoint. It’s important to have choices about how you manage your finances. However, without taxes, communities may struggle to fund essential services. (naturalness: 3; faithfulness: 2)
-
[42]
[Agent1] [No response] (naturalness: 3; explanation: Conversation terminates when agents have nothing to add and want to end the conversation. They have finished a decent discussion, and are now naturally ending the conversation; faithfulness: N/A)
-
[43]
[Agent2] [No response] (naturalness: 3; faithfulness: N/A)
-
[44]
[Agent1] [No response] (naturalness: 3; faithfulness: N/A)
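The per-turn annotations above follow a regular pattern, so they can be aggregated mechanically. The sketch below is one hedged way to do that: it pulls the numeric `naturalness` and `faithfulness` ratings out of annotated turns and averages them, skipping `N/A` entries. The function names `extract_scores` and `mean_scores` are hypothetical, the sample turns are abridged for illustration, and the rating scale itself is not stated in this excerpt.

```python
import re
from statistics import mean
from typing import Dict, List

# Matches numeric ratings only; "N/A" entries are deliberately skipped.
SCORE = re.compile(r"(naturalness|faithfulness):\s*(\d+)")


def extract_scores(turn: str) -> Dict[str, int]:
    """Return the numeric ratings found in one annotated turn."""
    return {metric: int(value) for metric, value in SCORE.findall(turn)}


def mean_scores(turns: List[str]) -> Dict[str, float]:
    """Average each metric over the turns that carry a numeric rating for it."""
    collected: Dict[str, List[int]] = {}
    for turn in turns:
        for metric, value in extract_scores(turn).items():
            collected.setdefault(metric, []).append(value)
    return {metric: mean(values) for metric, values in collected.items()}


# Abridged sample turns in the annotation format shown above.
turns = [
    "[Agent1] Do you think that taxes help? (naturalness: N/A; faithfulness: N/A)",
    "[Agent1] But don't you think taxes just take money? (naturalness: 3; faithfulness: 2)",
    "[Agent2] I respect your viewpoint. (naturalness: 3; faithfulness: 2)",
]
print(mean_scores(turns))
```

Truncated annotations (those ending in `...`) would contribute only the ratings that survive before the cut, so averages computed from this excerpt should be read as partial.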