Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data
Pith reviewed 2026-05-18 14:35 UTC · model grok-4.3
The pith
LLM agents calibrated on 2022 Italian election data form realistic social networks and evolve opinions like traditional models, yet generate content with less tone and toxicity variation than real users.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
LLM-based agents initialized with realistic profiles calibrated on 2022 Italian political election conversations generate coherent content, form connections, and build realistic social network structures in a simulated microblogging environment. Their opinion dynamics evolve over time in ways similar to traditional mathematical models. However, the generated content displays less heterogeneity in tone and toxicity than real-world data, and varying opinion-modeling parameters produces no significant changes.
What carries the argument
LLM agents equipped with opinion modeling mechanisms inside an extended microblogging simulator, with profiles calibrated directly on real conversation data.
If this is right
- Controlled simulations of controversial topics become feasible without exposing real users to harm.
- Opinion shifts in the model follow trajectories already described by established mathematical frameworks.
- Current LLM setups under-reproduce the range of tones and toxicity levels seen in actual online exchanges.
- More detailed cognitive modeling during agent initialization is needed before simulations can faithfully replicate human behavior.
Where Pith is reading between the lines
- Testing the same agents on non-political or non-Italian topics could reveal whether calibration on one dataset limits broader applicability.
- Adding sources of individual personality variation at setup might increase content diversity without changing the network-building mechanics.
- The approach could be used to forecast how different moderation rules affect conversation spread under realistic language constraints.
Load-bearing premise
Profiles drawn solely from one 2022 Italian election dataset, together with adjustable opinion parameters, are enough to produce behavior that generalizes and reveals robustness limits.
What would settle it
Compute the variance in tone and toxicity scores across large sets of generated versus real posts from a later election or different country; a persistently smaller variance in the simulated set would falsify the realism claim.
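As a minimal sketch of that check, assume each post has already been assigned a numeric toxicity (or tone) score by some classifier. The function name `variance_ratio` and the score lists below are hypothetical illustrations, not code or data from the paper.

```python
from statistics import pvariance

def variance_ratio(simulated, real):
    """Return var(simulated) / var(real) for two lists of per-post scores."""
    real_var = pvariance(real)
    if real_var == 0:
        raise ValueError("real scores have zero variance; ratio is undefined")
    return pvariance(simulated) / real_var

# Hypothetical toxicity scores in [0, 1]; illustrative values, not paper data.
real_toxicity = [0.02, 0.10, 0.45, 0.80, 0.05, 0.60]
simulated_toxicity = [0.08, 0.10, 0.12, 0.15, 0.09, 0.11]

ratio = variance_ratio(simulated_toxicity, real_toxicity)
print(f"variance ratio (simulated / real): {ratio:.3f}")
```

A ratio that stays well below 1 across datasets would indicate that the simulated posts are under-dispersed relative to real ones. A real analysis would also need a significance test and far larger samples.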
Original abstract
Online social networks offer a valuable lens to analyze both individual and collective phenomena. Researchers often use simulators to explore controlled scenarios, and the integration of Large Language Models (LLMs) makes these simulations more realistic by enabling agents to understand and generate natural language content. In this work, we investigate the behavior of LLM-based agents in a simulated microblogging social network. We initialize agents with realistic profiles calibrated on real-world online conversations from the 2022 Italian political election and extend an existing simulator by introducing mechanisms for opinion modeling. We examine how LLM agents simulate online conversations, interact with others, and evolve their opinions under different scenarios. Our results show that LLM agents generate coherent content, form connections, and build a realistic social network structure. However, their generated content displays less heterogeneity in tone and toxicity compared to real data. We also find that LLM-based opinion dynamics evolve over time in ways similar to traditional mathematical models. Varying parameter configurations produces no significant changes, indicating that simulations require more careful cognitive modeling at initialization to replicate human behavior more faithfully. Overall, we demonstrate the potential of LLMs for simulating user behavior in social environments, while also identifying key challenges in capturing heterogeneity and complex dynamics.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces a simulation framework for online social media conversations on controversial topics using LLM-based agents. Agents are initialized with profiles calibrated on 2022 Italian election conversation data, and the simulator is extended with opinion modeling mechanisms. Key findings include that agents generate coherent content and form realistic network structures, but exhibit reduced heterogeneity in tone and toxicity relative to real data; LLM opinion dynamics evolve similarly to traditional mathematical models; and varying opinion-modeling parameters produces no significant changes, implying that more careful cognitive modeling at initialization is required to better replicate human behavior.
Significance. If the central claims hold under rigorous validation, the work contributes to computational social science by demonstrating how LLMs can enhance agent-based simulators with natural language capabilities while identifying limitations in heterogeneity and dynamics. The grounding in real election data and comparison to mathematical opinion models are positive elements that could guide future hybrid modeling approaches.
major comments (2)
- [Results] Results section: the claim that 'LLM-based opinion dynamics evolve over time in ways similar to traditional mathematical models' is presented only qualitatively, without explicit quantitative metrics (e.g., Kolmogorov-Smirnov distances on opinion distributions, fitted update rates, or convergence statistics) or direct comparisons to specific models such as DeGroot averaging or bounded-confidence dynamics. This is load-bearing for the argument that parameter insensitivity indicates a need for better initialization.
- [Results] Results section: the statement that 'Varying parameter configurations produces no significant changes' lacks details on which parameters were varied, the statistical tests or effect sizes used, and any error bars or robustness checks, undermining the conclusion that initialization rather than the opinion update rule drives the outcomes.
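The quantitative comparison requested in the first major comment could be sketched roughly as follows: run a DeGroot averaging baseline from the same initial opinions, then measure the two-sample Kolmogorov-Smirnov distance between its opinions and those read out of the LLM agents. This is an illustration under stated assumptions, not the authors' procedure. The function names, the uniform trust matrix, and the opinion values are all hypothetical.

```python
from bisect import bisect_right

def degroot_step(opinions, weights):
    """One DeGroot update: each agent adopts a weighted average of all opinions.

    weights[i][j] is the trust agent i places in agent j; each row sums to 1.
    """
    return [sum(w * x for w, x in zip(row, opinions)) for row in weights]

def ks_statistic(sample_a, sample_b):
    """Two-sample Kolmogorov-Smirnov statistic: largest gap between empirical CDFs."""
    a, b = sorted(sample_a), sorted(sample_b)
    gap = 0.0
    # The gap between two step functions can only change at a sample point.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        gap = max(gap, abs(cdf_a - cdf_b))
    return gap

# Hypothetical setup: four agents with opinions in [-1, 1] and uniform trust.
initial = [-0.8, -0.2, 0.3, 0.9]
uniform = [[0.25] * 4 for _ in range(4)]

baseline = initial
for _ in range(5):
    baseline = degroot_step(baseline, uniform)

# Placeholder for opinions extracted from an LLM-agent run (illustrative only).
llm_opinions = [-0.1, 0.0, 0.1, 0.2]

print("DeGroot baseline:", baseline)
print("KS distance:", ks_statistic(llm_opinions, baseline))
```

Tracking this distance at every simulation step, rather than only at the end, would show whether the two trajectories stay close over time or merely end in similar states.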
minor comments (2)
- [Abstract] Abstract and results: specify the exact quantitative measures (e.g., variance, entropy, or toxicity scores) used to assess reduced heterogeneity in tone and toxicity compared to real data.
- [Methodology] Methodology: clarify the precise prompt structure and opinion-update rule implementation when extending the existing simulator.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments. These observations help clarify how the presentation of our results can be made more rigorous. We address each major comment below and will incorporate the suggested improvements in the revised manuscript.
-
Referee: [Results] Results section: the claim that 'LLM-based opinion dynamics evolve over time in ways similar to traditional mathematical models' is presented only qualitatively, without explicit quantitative metrics (e.g., Kolmogorov-Smirnov distances on opinion distributions, fitted update rates, or convergence statistics) or direct comparisons to specific models such as DeGroot averaging or bounded-confidence dynamics. This is load-bearing for the argument that parameter insensitivity indicates a need for better initialization.
Authors: We agree that the current comparison is primarily qualitative, based on visual inspection of opinion trajectories and aggregate statistics in the Results section. To address this, the revised manuscript will include quantitative metrics: Kolmogorov-Smirnov distances between the evolving opinion distributions produced by the LLM agents and those from DeGroot averaging and bounded-confidence models run on equivalent initial conditions; fitted per-step update rates; and convergence statistics such as time to stabilization and final variance. These additions will provide explicit support for the similarity claim and strengthen the subsequent argument that initialization, rather than the update rule, is the dominant factor.
revision: yes
-
Referee: [Results] Results section: the statement that 'Varying parameter configurations produces no significant changes' lacks details on which parameters were varied, the statistical tests or effect sizes used, and any error bars or robustness checks, undermining the conclusion that initialization rather than the opinion update rule drives the outcomes.
Authors: We accept that the reporting of the parameter-sensitivity experiments requires greater specificity. In the revision we will explicitly enumerate the parameters varied (opinion-update threshold, interaction probability, and network rewiring rate), describe the statistical procedure (repeated-measures ANOVA across 20 independent runs per configuration), report effect sizes (partial eta-squared), and display error bars as standard deviation across runs. These details will demonstrate that the observed insensitivity is statistically robust and thereby reinforce the conclusion that more careful cognitive modeling at initialization is needed.
revision: yes
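To make the notion of an effect size across parameter configurations concrete, here is a deliberately simplified sketch. It computes plain one-way eta-squared (between-configuration sum of squares over total sum of squares), which is a stand-in and not the repeated-measures partial eta-squared named in the response. The function name, the outcome measure, and the run values are hypothetical.

```python
from statistics import fmean

def eta_squared(groups):
    """One-way eta-squared: between-group sum of squares over total sum of squares."""
    all_values = [v for g in groups for v in g]
    grand_mean = fmean(all_values)
    ss_total = sum((v - grand_mean) ** 2 for v in all_values)
    if ss_total == 0:
        return 0.0  # no variation at all, so no effect to attribute
    ss_between = sum(len(g) * (fmean(g) - grand_mean) ** 2 for g in groups)
    return ss_between / ss_total

# Hypothetical outcome per run (e.g. final opinion variance), one list per
# parameter configuration; illustrative values, not results from the paper.
runs_by_config = [
    [0.21, 0.19, 0.22, 0.20],
    [0.20, 0.22, 0.19, 0.21],
    [0.22, 0.20, 0.21, 0.19],
]

print(f"eta-squared across configurations: {eta_squared(runs_by_config):.3f}")
```

A value near 0 means the configuration explains almost none of the spread between runs, which is the pattern the insensitivity claim would predict. A value near 1 would mean the parameters dominate the outcome.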
Circularity Check
No significant circularity; simulation grounded in external calibration data
The paper initializes LLM agents using profiles calibrated directly on external 2022 Italian election conversation data and extends a pre-existing simulator with opinion modeling. Reported outcomes on content coherence, network formation, reduced heterogeneity versus real data, and qualitative similarity of opinion trajectories to traditional models are obtained by executing the simulation under varied parameters and inspecting the generated outputs. No equations, fitted parameters renamed as predictions, or self-citation chains are shown that would make any central result equivalent to its inputs by construction. The derivation therefore remains self-contained against the external benchmark data and independent model comparisons.
Axiom & Free-Parameter Ledger
free parameters (1)
- opinion-modeling parameters
axioms (1)
- domain assumption: LLM agents supplied with realistic profiles from election data will generate coherent posts and evolve opinions in a manner comparable to human users and to traditional mathematical models
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: We initialize agents with realistic profiles calibrated on real-world online conversations from the 2022 Italian political election and extend an existing simulator by introducing mechanisms for opinion modeling.
- IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: LLM-based opinion dynamics evolve over time in ways similar to traditional mathematical models.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.