arxiv: 2509.18985 · v2 · submitted 2025-09-23 · 💻 cs.SI

Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data

Pith reviewed 2026-05-18 14:35 UTC · model grok-4.3

classification 💻 cs.SI
keywords LLM agents · social media simulation · opinion dynamics · AI calibration · online conversations · network structure · toxicity analysis

The pith

LLM agents calibrated on 2022 Italian election data form realistic social networks and evolve opinions like traditional models, yet generate content with less tone and toxicity variation than real users.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language model agents, given profiles drawn from real 2022 Italian election conversations, can run controlled simulations of online debates on divisive topics. The agents produce coherent messages, form connections, and create network structures that resemble those of actual platforms, while their opinions shift over time in patterns that match older mathematical models. At the same time, the messages show narrower ranges of tone and toxicity than those observed in real data, and changing the parameters that govern how agents update opinions makes little difference to the results. The work therefore shows that LLMs can drive social simulations, but it also highlights the need for richer initial cognitive modeling to match human heterogeneity.

Core claim

LLM-based agents initialized with realistic profiles calibrated on 2022 Italian political election conversations generate coherent content, form connections, and build realistic social network structures in a simulated microblogging environment. Their opinion dynamics evolve over time in ways similar to traditional mathematical models. However, the generated content displays less heterogeneity in tone and toxicity than real-world data, and varying opinion-modeling parameters produces no significant changes.

What carries the argument

LLM agents equipped with opinion modeling mechanisms inside an extended microblogging simulator, with profiles calibrated directly on real conversation data.

If this is right

  • Controlled simulations of controversial topics become feasible without exposing real users to harm.
  • Opinion shifts in the model follow trajectories already described by established mathematical frameworks.
  • Current LLM setups under-reproduce the range of tones and toxicity levels seen in actual online exchanges.
  • More detailed cognitive modeling during agent initialization is needed before simulations can faithfully replicate human behavior.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

  • Testing the same agents on non-political or non-Italian topics could reveal whether calibration on one dataset limits broader applicability.
  • Adding sources of individual personality variation at setup might increase content diversity without changing the network-building mechanics.
  • The approach could be used to forecast how different moderation rules affect conversation spread under realistic language constraints.

Load-bearing premise

Profiles drawn solely from one 2022 Italian election dataset, together with adjustable opinion parameters, are enough to produce behavior that generalizes and reveals robustness limits.

What would settle it

Compute the variance in tone and toxicity scores across large sets of generated versus real posts from a later election or different country; a persistently smaller variance in the simulated set would falsify the realism claim.
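
The variance comparison described above can be sketched in a few lines. This is a minimal illustration, not the paper's evaluation code: the function name `variance_ratio` and both score lists are hypothetical, and the scores are assumed to be per-post toxicity values in [0, 1] produced by some external classifier.

```python
from statistics import pvariance


def variance_ratio(simulated, real):
    """Return var(simulated) / var(real) using population variance."""
    real_var = pvariance(real)
    if real_var == 0:
        raise ValueError("real scores have zero variance; ratio is undefined")
    return pvariance(simulated) / real_var


# Hypothetical per-post toxicity scores (illustrative only, not from the paper).
real_scores = [0.02, 0.10, 0.45, 0.80, 0.05, 0.60]
simulated_scores = [0.08, 0.10, 0.12, 0.15, 0.09, 0.11]

ratio = variance_ratio(simulated_scores, real_scores)
print(f"variance ratio (simulated / real): {ratio:.3f}")
```

A ratio well below 1 means the simulated posts are more homogeneous than the real ones. In practice the ratio would need a confidence interval (for example from bootstrap resampling) before being read as evidence either way.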

Figures

Figures reproduced from arXiv: 2509.18985 by Elisa Composta, Francesco Corso, Francesco Pierri, Nicolo' Fontana.

Figure 1: Percentages of users per political coalition in the …
Figure 2: Distribution of the correlation of in-group interac…
Figure 3: Example of comparison between simulated (10 …
Figure 5: Example of comparison between simulated (10 …
Figure 6: Distribution of the correlation of in-group toxicity …
Figure 7: Example of comparison between simulated (10 …
Figure 8: Distribution of the correlation of inter-group tox…
Figure 9: Example of comparison between simulated (10 …
Figure 11: Example of opinion shifts for each coalition …
Figure 10: Example of evolution of opinion for each topic, …
Figure 12: Topic: Nuclear Energy. Opinion shifts for each coalition for different configurations of model, network initialization, and recommender system. For each configuration, the corresponding simulation using the Friedkin–Johnsen mathematical model is reported (dashed lines).
Figure 14: Topic: Reddito di Cittadinanza. Opinion shifts for each coalition for different configurations of model, network initialization, and recommender system. For each configuration, the corresponding simulation using the Friedkin–Johnsen mathematical model is reported (dashed lines).
Figure 13: Topic: Immigration. Opinion shifts for each coalition for different configurations of model, network initialization, and recommender system. For each configuration, the corresponding simulation using the Friedkin–Johnsen mathematical model is reported (dashed lines).

Original abstract

Online social networks offer a valuable lens to analyze both individual and collective phenomena. Researchers often use simulators to explore controlled scenarios, and the integration of Large Language Models (LLMs) makes these simulations more realistic by enabling agents to understand and generate natural language content. In this work, we investigate the behavior of LLM-based agents in a simulated microblogging social network. We initialize agents with realistic profiles calibrated on real-world online conversations from the 2022 Italian political election and extend an existing simulator by introducing mechanisms for opinion modeling. We examine how LLM agents simulate online conversations, interact with others, and evolve their opinions under different scenarios. Our results show that LLM agents generate coherent content, form connections, and build a realistic social network structure. However, their generated content displays less heterogeneity in tone and toxicity compared to real data. We also find that LLM-based opinion dynamics evolve over time in ways similar to traditional mathematical models. Varying parameter configurations produces no significant changes, indicating that simulations require more careful cognitive modeling at initialization to replicate human behavior more faithfully. Overall, we demonstrate the potential of LLMs for simulating user behavior in social environments, while also identifying key challenges in capturing heterogeneity and complex dynamics.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces a simulation framework for online social media conversations on controversial topics using LLM-based agents. Agents are initialized with profiles calibrated on 2022 Italian election conversation data, and the simulator is extended with opinion modeling mechanisms. Key findings include that agents generate coherent content and form realistic network structures, but exhibit reduced heterogeneity in tone and toxicity relative to real data; LLM opinion dynamics evolve similarly to traditional mathematical models; and varying opinion-modeling parameters produces no significant changes, implying that more careful cognitive modeling at initialization is required to better replicate human behavior.

Significance. If the central claims hold under rigorous validation, the work contributes to computational social science by demonstrating how LLMs can enhance agent-based simulators with natural language capabilities while identifying limitations in heterogeneity and dynamics. The grounding in real election data and comparison to mathematical opinion models are positive elements that could guide future hybrid modeling approaches.

major comments (2)
  1. [Results] Results section: the claim that 'LLM-based opinion dynamics evolve over time in ways similar to traditional mathematical models' is presented only qualitatively, without explicit quantitative metrics (e.g., Kolmogorov-Smirnov distances on opinion distributions, fitted update rates, or convergence statistics) or direct comparisons to specific models such as DeGroot averaging or bounded-confidence dynamics. This is load-bearing for the argument that parameter insensitivity indicates a need for better initialization.
  2. [Results] Results section: the statement that 'Varying parameter configurations produces no significant changes' lacks details on which parameters were varied, the statistical tests or effect sizes used, and any error bars or robustness checks, undermining the conclusion that initialization rather than the opinion update rule drives the outcomes.
minor comments (2)
  1. [Abstract] Abstract and results: specify the exact quantitative measures (e.g., variance, entropy, or toxicity scores) used to assess reduced heterogeneity in tone and toxicity compared to real data.
  2. [Methodology] Methodology: clarify the precise prompt structure and opinion-update rule implementation when extending the existing simulator.
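
One of the metrics the referee names, the two-sample Kolmogorov-Smirnov distance, is simple enough to sketch directly. The code below is an illustrative assumption about how such a comparison could be set up, not something taken from the paper; `ks_statistic` and the two opinion lists are hypothetical names and values.

```python
from bisect import bisect_right


def ks_statistic(sample_a, sample_b):
    """Two-sample KS statistic: the largest gap between the empirical CDFs."""
    if not sample_a or not sample_b:
        raise ValueError("both samples must be non-empty")
    a, b = sorted(sample_a), sorted(sample_b)
    max_gap = 0.0
    # Both empirical CDFs are step functions that only change at data points,
    # so the supremum of their difference is reached at one of those points.
    for x in a + b:
        cdf_a = bisect_right(a, x) / len(a)
        cdf_b = bisect_right(b, x) / len(b)
        max_gap = max(max_gap, abs(cdf_a - cdf_b))
    return max_gap


# Hypothetical final opinions from an LLM run and from a mathematical model.
llm_opinions = [-0.5, -0.2, 0.0, 0.1, 0.3]
model_opinions = [-1.5, -0.8, 0.0, 0.9, 1.6]
print(f"KS distance: {ks_statistic(llm_opinions, model_opinions):.2f}")
```

The statistic is 0 for identical samples and 1 for samples with no overlap. Computing it at each simulation step would give a trajectory of distances rather than a single visual comparison.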

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. These observations help clarify how the presentation of our results can be made more rigorous. We address each major comment below and will incorporate the suggested improvements in the revised manuscript.

point-by-point responses
  1. Referee: [Results] Results section: the claim that 'LLM-based opinion dynamics evolve over time in ways similar to traditional mathematical models' is presented only qualitatively, without explicit quantitative metrics (e.g., Kolmogorov-Smirnov distances on opinion distributions, fitted update rates, or convergence statistics) or direct comparisons to specific models such as DeGroot averaging or bounded-confidence dynamics. This is load-bearing for the argument that parameter insensitivity indicates a need for better initialization.

    Authors: We agree that the current comparison is primarily qualitative, based on visual inspection of opinion trajectories and aggregate statistics in the Results section. To address this, the revised manuscript will include quantitative metrics: Kolmogorov-Smirnov distances between the evolving opinion distributions produced by the LLM agents and those from DeGroot averaging and bounded-confidence models run on equivalent initial conditions; fitted per-step update rates; and convergence statistics such as time to stabilization and final variance. These additions will provide explicit support for the similarity claim and strengthen the subsequent argument that initialization, rather than the update rule, is the dominant factor. revision: yes

  2. Referee: [Results] Results section: the statement that 'Varying parameter configurations produces no significant changes' lacks details on which parameters were varied, the statistical tests or effect sizes used, and any error bars or robustness checks, undermining the conclusion that initialization rather than the opinion update rule drives the outcomes.

    Authors: We accept that the reporting of the parameter-sensitivity experiments requires greater specificity. In the revision we will explicitly enumerate the parameters varied (opinion-update threshold, interaction probability, and network rewiring rate), describe the statistical procedure (repeated-measures ANOVA across 20 independent runs per configuration), report effect sizes (partial eta-squared), and display error bars as standard deviation across runs. These details will demonstrate that the observed insensitivity is statistically robust and thereby reinforce the conclusion that more careful cognitive modeling at initialization is needed. revision: yes
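
For readers unfamiliar with the baselines discussed here, the following is a minimal sketch of the textbook Friedkin–Johnsen update, x(t+1) = Λ W x(t) + (I − Λ) x(0), which reduces to DeGroot averaging when every susceptibility equals 1. It is not the paper's implementation: the function names, the tolerance, the weight matrix, and the opinion values are all assumptions chosen for illustration.

```python
def fj_step(opinions, initial, weights, susceptibility):
    """One Friedkin-Johnsen update for every agent.

    weights[i][j] is the influence of agent j on agent i (rows sum to 1).
    susceptibility[i] in [0, 1]; 1 for all agents gives DeGroot averaging.
    """
    updated = []
    for i, row in enumerate(weights):
        social = sum(w * x for w, x in zip(row, opinions))
        lam = susceptibility[i]
        updated.append(lam * social + (1 - lam) * initial[i])
    return updated


def simulate(initial, weights, susceptibility, tol=1e-6, max_steps=1000):
    """Iterate until the largest per-step change drops below tol.

    Returns (final_opinions, steps_taken) as a crude time-to-stabilization.
    """
    opinions = list(initial)
    for step in range(1, max_steps + 1):
        nxt = fj_step(opinions, initial, weights, susceptibility)
        if max(abs(a - b) for a, b in zip(nxt, opinions)) < tol:
            return nxt, step
        opinions = nxt
    return opinions, max_steps


# Hypothetical three-agent example (values are not from the paper).
initial = [-1.0, 0.0, 1.0]
weights = [[0.6, 0.2, 0.2], [0.3, 0.4, 0.3], [0.2, 0.2, 0.6]]
final, steps = simulate(initial, weights, susceptibility=[0.8, 0.8, 0.8])
print(final, steps)
```

The returned step count and the variance of the final opinions correspond to the kind of convergence statistics the rebuttal promises; the same loop can be run from the initial conditions of an LLM simulation to produce a like-for-like baseline.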

Circularity Check

0 steps flagged

No significant circularity; simulation grounded in external calibration data

The paper initializes LLM agents using profiles calibrated directly on external 2022 Italian election conversation data and extends a pre-existing simulator with opinion modeling. Reported outcomes on content coherence, network formation, reduced heterogeneity versus real data, and qualitative similarity of opinion trajectories to traditional models are obtained by executing the simulation under varied parameters and inspecting the generated outputs. No equations, fitted parameters renamed as predictions, or self-citation chains are shown that would make any central result equivalent to its inputs by construction. The derivation therefore remains self-contained against the external benchmark data and independent model comparisons.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central results rest on the assumption that LLM agents initialized from one election dataset will exhibit human-like interaction patterns and that opinion dynamics can be usefully compared to existing mathematical models; no new physical entities are introduced.

free parameters (1)
  • opinion-modeling parameters
    The paper states that varying these configurations produced no significant changes, implying they are chosen or tuned values whose exact fitting procedure is not detailed in the abstract.
axioms (1)
  • domain assumption: LLM agents supplied with realistic profiles from election data will generate coherent posts and evolve opinions in a manner comparable to human users and to traditional mathematical models
    This premise underpins both the network-structure claim and the opinion-dynamics comparison.

pith-pipeline@v0.9.0 · 5751 in / 1385 out tokens · 42362 ms · 2026-05-18T14:35:44.777694+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?

  • matches: The paper's claim is directly supported by a theorem in the formal canon.
  • supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
  • extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
  • uses: The paper appears to rely on the theorem as machinery.
  • contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
  • unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

35 extracted references · 35 canonical work pages · 4 internal anchors

  3. [3]

    Bakshy, E.; Messing, S.; and Adamic, L. 2015. Political science. Exposure to ideologically diverse news and opinion on Facebook. Science (New York, N.Y.), 348

  4. [4]

    Barrick, M. R.; and Mount, M. K. 1991. The Big Five personality dimensions and job performance: A meta-analysis. Personnel Psychology, 44(1): 1--26

  5. [5]

    Bender, E. M.; Gebru, T.; McMillan-Major, A.; and Shmitchell, S. 2021. On the Dangers of Stochastic Parrots: Can Language Models Be Too Big? In Proceedings of the 2021 ACM Conference on Fairness, Accountability, and Transparency, FAccT ’21, 610–623. New York, NY, USA: Association for Computing Machinery. ISBN 978-1-4503-8309-7

  6. [6]

    Cau, E.; Pansanella, V.; Pedreschi, D.; and Rossetti, G. 2025. Language-Driven Opinion Dynamics in Agent-Based Simulations with LLMs. arXiv:2502.19098

  7. [7]

    Chuang, Y.-S.; Goyal, A.; Harlalka, N.; Suresh, S.; Hawkins, R.; Yang, S.; Shah, D.; Hu, J.; and Rogers, T. T. 2024. Simulating Opinion Dynamics with Networks of LLM-based Agents. arXiv:2311.09618

  8. [8]

    Conte, R.; and Paolucci, M. 2014. On Agent-Based Modeling and Computational Social Science. Frontiers in Psychology, 5

  9. [9]

    Corso, F.; Russo, G.; Pierri, F.; and Morales, G. D. F. 2025. Early linguistic fingerprints of online users who engage with conspiracy communities. arXiv preprint arXiv:2506.05086

  10. [10]

    DeGroot, M. H. 1974a. Reaching a Consensus. Journal of the American Statistical Association, 69(345): 118--121

  11. [11]

    DeGroot, M. H. 1974b. Reaching a Consensus. Journal of the American Statistical Association, 69(345): 118--121

  12. [12]

    I’m in the Bluesky Tonight

    Failla, A.; and Rossetti, G. 2024. “I’m in the Bluesky Tonight”: Insights from a year worth of social data. PLOS ONE, 19(11): e0310330

  13. [13]

    Floridi, L.; and Chiriatti, M. 2020. GPT-3: Its Nature, Scope, Limits, and Consequences. Minds and Machines, 30(4): 681–694

  14. [14]

    Fontana, N.; Pierri, F.; and Aiello, L. M. 2025. Nicer Than Humans: How Do Large Language Models Behave in the Prisoner's Dilemma? In Proceedings of the International AAAI Conference on Web and Social Media, volume 19, 522--535

  15. [15]

    Friedkin, N.; and Johnsen, E. 1990. Social Influence and Opinions. Journal of Mathematical Sociology, 15: 193--206

  16. [16]

    Gao, C.; Lan, X.; Lu, Z.; Mao, J.; Piao, J.; Wang, H.; Jin, D.; and Li, Y. 2023. S3: Social-network Simulation System with Large Language Model-Empowered Agents. arXiv:2307.14984

  17. [17]

    Gausen, A.; Luk, W.; and Guo, C. 2021. Can We Stop Fake News? Using Agent-Based Modelling to Evaluate Countermeasures for Misinformation on Social Media. In Gausen, A.; Luk, W.; and Guo, C., eds., Proceedings of the 15th International AAAI Conference on Web and Social Media (ICWSM). AAAI Press

  18. [18]

    Hanu, D. 2020. Detoxify. https://github.com/unitaryai/detoxify. Accessed on June 24, 2025

  19. [19]

    Hendrycks, D.; Mazeika, M.; and Woodside, T. 2023a. An Overview of Catastrophic AI Risks. arXiv:2306.12001

  20. [20]

    Hu, T.; Liakopoulos, D.; Wei, X.; Marculescu, R.; and Yadwadkar, N. J. 2025. Simulating Rumor Spreading in Social Networks using LLM Agents. arXiv:2502.01450

  21. [21]

    Lazer, D.; Pentland, A.; Adamic, L.; Aral, S.; Barabási, A.-L.; Brewer, D.; Christakis, N.; Contractor, N.; Fowler, J.; Gutmann, M.; Jebara, T.; King, G.; Macy, M.; Roy, D.; and Alstyne, M. V. 2009. Computational Social Science. Science, 323(5915): 721--723

  22. [22]

    Liu, J.; Ye, M.; Anderson, B. D.; Basar, T.; and Nedic, A. 2018. Discrete-Time Polar Opinion Dynamics with Heterogeneous Individuals. In 2018 IEEE Conference on Decision and Control (CDC), 1694–1699. IEEE

  23. [23]

    Liu, Y.; Chen, X.; Zhang, X.; Gao, X.; Zhang, J.; and Yan, R. 2024. From Skepticism to Acceptance: Simulating the Attitude Dynamics Toward Fake News. In Proceedings of the Thirty-Third International Joint Conference on Artificial Intelligence, IJCAI-2024, 7886–7894. International Joint Conferences on Artificial Intelligence Organization

  24. [24]

    Macy, M. W.; and Willer, R. 2002. From Factors to Actors: Computational Sociology and Agent-Based Modeling. Annual Review of Sociology, 28: 143--166

  25. [25]

    McCrae, R. R.; and John, O. P. 1992. An Introduction to the Five-Factor Model and Its Applications. Journal of Personality, 60(2): 175--215

  26. [26]

    Generative Agents: Interactive Simulacra of Human Behavior

    Park, J. S.; O'Brien, J. C.; Cai, C. J.; Morris, M. R.; Liang, P.; and Bernstein, M. S. 2023. Generative Agents: Interactive Simulacra of Human Behavior. arXiv:2304.03442

  27. [27]

    LLM Agents Grounded in Self-Reports Enable General-Purpose Simulation of Individuals

    Park, J. S.; Zou, C. Q.; Shaw, A.; Hill, B. M.; Cai, C.; Morris, M. R.; Willer, R.; Liang, P.; and Bernstein, M. S. 2024. Generative agent simulations of 1,000 people. arXiv preprint arXiv:2411.10109

  28. [28]

    Emergence of human-like polarization among large language model agents

    Piao, J.; Lu, Z.; Gao, C.; Xu, F.; Hu, Q.; Santos, F. P.; Li, Y.; and Evans, J. 2025. Emergence of human-like polarization among large language model agents. arXiv:2501.05171

  29. [29]

    Pierri, F.; Liu, G.; and Ceri, S. 2023. ITA-ELECTION-2022: A multi-platform dataset of social media conversations around the 2022 Italian general election. arXiv:2301.05119

  30. [30]

    Rossetti, G.; Stella, M.; Cazabet, R.; Abramski, K.; Cau, E.; Citraro, S.; Failla, A.; Improta, R.; Morini, V.; and Pansanella, V. 2024. Y Social: an LLM-powered Social Media Digital Twin. arXiv:2408.00818

  31. [31]

    Squazzoni, F.; Jager, W.; and Edmonds, B. 2014. Social Simulation in the Social Sciences: A Brief Overview. Social Science Computer Review, 32(3): 279--294

  32. [32]

    Törnberg, P.; Valeeva, D.; Uitermark, J.; and Bail, C. 2023. Simulating Social Media Using Large Language Models to Evaluate Alternative News Feed Algorithms. arXiv:2310.05984

  33. [33]

    Vosoughi, S.; Roy, D.; and Aral, S. 2018. The spread of true and false news online. Science, 359(6380): 1146--1151

  34. [34]

    Weidinger, L.; Uesato, J.; Rauh, M.; Griffin, C.; Huang, P.-S.; Mellor, J.; Glaese, A.; Cheng, M.; Balle, B.; Kasirzadeh, A.; Biles, C.; Brown, S.; Kenton, Z.; Hawkins, W.; Stepleton, T.; Birhane, A.; Hendricks, L. A.; Rimell, L.; Isaac, W.; Haas, J.; Legassick, S.; Irving, G.; and Gabriel, I. 2022b. Taxonomy of Risks posed by Language Models. In Proceedi...

  35. [35]

    Ye, M.; Liu, J.; and Anderson, B. D. O. 2018. Opinion Dynamics with State-Dependent Susceptibility to Influence. In Proceedings of the 23rd International Symposium on Mathematical Theory of Networks and Systems (MTNS)