Simulating Hate Speech Cascades with Multi-LLM Agents: Empirical Grounding, Modeling Fidelity, and Intervention Strategies

Fan Huang

arXiv: 2606.18264 · v1 · pith:YCXDXOOPnew · submitted 2026-05-21 · 💻 cs.SI · cs.AI · cs.CL

Pith reviewed 2026-06-30 15:28 UTC · model grok-4.3

classification 💻 cs.SI · cs.AI · cs.CL

keywords hate speech cascades · multi-LLM agents · cascade simulation · Bluesky · toxicity modeling · social media moderation · agent heterogeneity · intervention strategies

The pith

Multi-LLM agents simulate hateful cascades by basing each reshare decision on user profile, community, and post content.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish that multi-agent LLM systems can model hateful content propagation more faithfully than classical cascade models, which omit explicit profile, community, and content factors. An empirical examination of three hateful Bluesky cascades shows that 97.4-99.7 percent of reposters take a hostile stance, that toxicity-engagement homophily is higher on the diffusion tree than on the follower graph, and that the hateful cascades are star-like, whereas the benign control is tree-like. The multi-LLM simulator reproduces the stance monoculture and the toxicity-delta direction. A structured ablation identifies agent heterogeneity as the leading fidelity factor, and amplifier targeting on dense networks produces a 7.5-12.9 percent toxicity reduction at 5.7 percent benign collateral. Sympathetic readers would care because such models could yield moderation strategies that transfer better to real platforms.

Core claim

In simulation, a multi-LLM-agent simulator reproduces the stance monoculture and the toxicity-delta direction observed in real Bluesky hateful cascades. A structured ablation identifies agent heterogeneity as the leading fidelity factor, and amplifier targeting on dense networks yields a 7.5-12.9 percent reduction at 5.7 percent benign collateral.
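The two intervention figures in the claim can be read as relative changes against a no-intervention baseline. The sketch below shows that arithmetic only; the function name and the counts are hypothetical, and the paper's exact definitions of "reduction" and "benign collateral" are not given in this summary.

```python
def relative_reduction(baseline: float, treated: float) -> float:
    """Fractional drop of a metric under intervention, relative to its baseline."""
    if baseline <= 0:
        raise ValueError("baseline must be positive")
    return (baseline - treated) / baseline


# Made-up counts for illustration only; these are not results from the paper.
hateful_reduction = relative_reduction(baseline=200.0, treated=180.0)
benign_collateral = relative_reduction(baseline=1000.0, treated=950.0)

print(f"hateful reduction: {hateful_reduction:.1%}")
print(f"benign collateral: {benign_collateral:.1%}")
```

Reporting both numbers side by side makes the trade-off explicit: an intervention is only attractive if the first is large relative to the second.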

What carries the argument

The multi-LLM-agent simulator in which each agent decides on reshares after being prompted with the user's profile, surrounding community, and post content.
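As a rough illustration of that mechanism, the sketch below builds a prompt from profile, community, and post content and turns the model's answer into a reshare decision. Everything here is an assumption made for illustration: the `Agent` fields, the prompt wording, and `stub_llm`, which is a keyword placeholder standing in for a real model call and not the paper's prompting setup.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass(frozen=True)
class Agent:
    profile: str    # hypothetical: short summary of the user's bio or history
    community: str  # hypothetical: description of the surrounding community


def build_prompt(agent: Agent, post: str) -> str:
    """Combine the three conditioning signals into a single prompt string."""
    return (
        f"User profile: {agent.profile}\n"
        f"Community: {agent.community}\n"
        f"Post: {post}\n"
        "Answer RESHARE or IGNORE."
    )


def decide_reshare(agent: Agent, post: str, llm: Callable[[str], str]) -> bool:
    """Ask the model for a decision and parse it into a boolean."""
    answer = llm(build_prompt(agent, post))
    return answer.strip().upper().startswith("RESHARE")


def stub_llm(prompt: str) -> str:
    # Placeholder rule so the example runs offline; a real run would call a model.
    return "RESHARE" if "frequent resharer" in prompt.lower() else "IGNORE"


agents = [
    Agent("frequent resharer of topic-x posts", "topic-x community"),
    Agent("rarely reposts anything", "gardening community"),
]
decisions = [decide_reshare(a, "example post about topic-x", stub_llm) for a in agents]
print(decisions)
```

Passing the model as a callable keeps the decision logic testable and lets different agents use different backends, which is one way agent heterogeneity could be introduced.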

If this is right

Classical cascade models that omit profile, community, and content factors will produce less faithful simulations of hateful cascades than the multi-LLM approach.
Agent heterogeneity is required to match the empirical patterns of stance monoculture and toxicity homophily.
Amplifier targeting on dense networks can reduce hateful content spread while limiting collateral effects on benign activity.
Simulations that match real empirical signatures can be used to test moderation interventions before live deployment.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same prompting structure might be applied to other platforms if their user and community signals can be extracted at similar granularity.
Running repeated simulations with varied seeds could quantify uncertainty in the predicted intervention effects.
Identifying dense subgraphs for targeting might be combined with real-time network monitoring to prioritize interventions.
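The second extension above, repeated runs with varied seeds, can be sketched as follows. `simulate_reduction` is a hypothetical stand-in that draws a fake effect size from a seeded generator; in practice it would wrap one full simulator run with the intervention applied.

```python
import random
import statistics
from typing import Iterable, Tuple


def simulate_reduction(seed: int) -> float:
    """Hypothetical stand-in for one simulator run; returns a fake effect size."""
    rng = random.Random(seed)
    return rng.gauss(0.10, 0.02)


def summarize(seeds: Iterable[int]) -> Tuple[float, float, float, float]:
    """Mean, sample standard deviation, minimum, and maximum across seeds."""
    effects = sorted(simulate_reduction(s) for s in seeds)
    return statistics.mean(effects), statistics.stdev(effects), effects[0], effects[-1]


mean, sd, lowest, highest = summarize(range(30))
print(f"mean={mean:.3f} sd={sd:.3f} range=[{lowest:.3f}, {highest:.3f}]")
```

Fixing the seed list makes the summary reproducible, and the spread across seeds gives a first indication of how much a reported intervention effect depends on a single run.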

Load-bearing premise

LLM agents prompted with user profile, community, and content information can produce reshare decisions that faithfully reflect the real behavioral drivers of hateful cascades.

What would settle it

Running the multi-LLM simulator on new hateful cascades from the same platform and finding that it no longer reproduces the stance monoculture or toxicity-delta direction, or that the simulated intervention reductions do not appear in controlled platform experiments.

Figures

Figures reproduced from arXiv: 2606.18264 by Fan Huang.

**Figure 2.** Homophily delta $\Delta H_a = H_a^{\text{diffusion}} - H_a^{\text{follower}}$ per attribute per cascade. Negative values indicate a diffusion tree less homophilic than the follower network; positive values, the converse.

**Figure 3.** Cumulative reshare profiles per cascade on (a) linear and (b) log time axes. Cascade B shows a fast viral

**Figure 4.** Intervention parameter sweeps. (a) Delay-based moderation: cascade reduction as a function of delay

**Figure 5.** Cascade structure metrics per cascade (size, depth, virality, time-to-90%). X-axis labels

**Figure 6.** Cross-community penetration by hop distance

**Figure 7.** Homophily $H_a$ comparison: follower network (gray) versus diffusion tree (cascade-colored), per cascade.

[Plot text: x-axis "Hop distance from root"; y-axis "P(reshare | exposed at hop d)"; legend: Cascade A (Anti-trans), Cascade B (Islamophobia), Cascade C (Anti-DEI)]

**Figure 8.** Per-hop reshare probability per cascade.

**Figure 9.** Fidelity error per model, averaged across the three hateful cascades. Panels (left to right): toxicity-delta
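The homophily delta in the Figure 2 caption is a difference between the same homophily score computed on two edge sets. The sketch below assumes a simple proxy for that score, the share of edges whose endpoints carry the same attribute label; the paper's actual definition of homophily is not given in this summary, and the toy graph is invented.

```python
from typing import Dict, Hashable, Iterable, Tuple

Edge = Tuple[Hashable, Hashable]


def edge_homophily(edges: Iterable[Edge], label: Dict[Hashable, str]) -> float:
    """Share of edges whose two endpoints carry the same attribute label."""
    edges = list(edges)
    if not edges:
        raise ValueError("need at least one edge")
    same = sum(1 for u, v in edges if label[u] == label[v])
    return same / len(edges)


def homophily_delta(diffusion: Iterable[Edge], follower: Iterable[Edge],
                    label: Dict[Hashable, str]) -> float:
    """Diffusion-tree homophily minus follower-graph homophily."""
    return edge_homophily(diffusion, label) - edge_homophily(follower, label)


# Toy data: a binned attribute per user (invented for illustration).
label = {"a": "high", "b": "high", "c": "low", "d": "low"}
follower_edges = [("a", "b"), ("a", "c"), ("b", "d"), ("c", "d")]
diffusion_edges = [("a", "b"), ("c", "d")]

delta = homophily_delta(diffusion_edges, follower_edges, label)
print(delta)  # positive: the diffusion tree is more homophilic than the follower graph
```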

Original abstract

Faithful modeling of hateful content propagation on online platforms remains an open problem for moderation research. Classical cascade models that do not explicitly represent the profile, community, and content factors associated with hateful-content propagation may yield moderation strategies that behave less effectively when deployed in real-world scenarios. Multi-agent large language model (LLM) systems can, in principle, make each reshare decision depend on the user's profile, the surrounding community, and the post's content, but it remains unclear whether this added flexibility actually reproduces real hateful cascades more faithfully than classical baselines. We study three hateful Bluesky cascades and a size-matched benign control. In the empirical Bluesky data, we found that: 97.4-99.7% of reposters take a hostile stance; toxicity-engagement homophily is higher on the diffusion tree than on the follower graph for hateful cascades; topology is star-like for the hateful cascades (most reposts come directly from the root) versus tree-like for the benign cascade (reposts propagate through multi-hop chains). In simulation, a multi-LLM-agent simulator reproduces the stance monoculture and the toxicity-delta direction. A structured ablation identifies agent heterogeneity as the leading fidelity factor, and amplifier targeting on dense networks yields 7.5-12.9% reduction at 5.7% benign collateral.
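The star-like versus tree-like distinction in the abstract can be made concrete with two simple statistics on a repost tree: the share of reposts made directly from the root and the maximum hop depth. The sketch below is one possible way to compute them from child-to-parent links; the function names and toy cascades are illustrative assumptions, not the paper's metrics.

```python
from typing import Dict, Hashable


def hop_depths(parent: Dict[Hashable, Hashable], root: Hashable) -> Dict[Hashable, int]:
    """Hop distance from the root for every repost, given child -> parent links."""
    depths = {}
    for node in parent:
        hops, current, seen = 0, node, set()
        while current != root:
            if current in seen or current not in parent:
                raise ValueError(f"{node!r} does not trace back to the root")
            seen.add(current)
            current = parent[current]
            hops += 1
        depths[node] = hops
    return depths


def root_direct_share(parent: Dict[Hashable, Hashable], root: Hashable) -> float:
    """Fraction of reposts whose parent is the root post itself."""
    depths = hop_depths(parent, root)
    if not depths:
        raise ValueError("cascade has no reposts")
    return sum(1 for d in depths.values() if d == 1) / len(depths)


# Toy cascades (invented): "r" is the root post, keys are reposts.
star = {"a": "r", "b": "r", "c": "r", "d": "a"}
chain = {"a": "r", "b": "a", "c": "b"}

print(root_direct_share(star, "r"), max(hop_depths(star, "r").values()))
print(root_direct_share(chain, "r"), max(hop_depths(chain, "r").values()))
```

A high root-direct share with shallow depth corresponds to the star-like pattern, while a low share with deeper chains corresponds to the tree-like one.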

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

Multi-LLM simulation matches some Bluesky hate cascade patterns but rests on thin evidence without methods details.

The main point is that this paper builds a multi-LLM agent simulator for hate speech cascades on Bluesky, claims it reproduces stance monoculture and toxicity direction from real data, identifies agent heterogeneity as the key factor via ablation, and shows an amplifier-targeting intervention cutting spread by 7.5-12.9% with 5.7% collateral on dense networks.

It does a few things right. The empirical section reports clear Bluesky observations: 97.4-99.7% hostile stances among reposters, higher toxicity-engagement homophily on diffusion trees than follower graphs for hateful cases, and star-like topology for hate cascades versus tree-like for the benign control. The simulation side compares directly to those observations rather than fitting to itself, and the structured ablation plus intervention numbers give concrete outputs to evaluate.

The soft spots are mostly about missing pieces. The abstract gives percentages and directions but no error bars, prompting details, network generation steps, or exact baseline comparisons, so it is hard to judge whether the fidelity gains are robust or driven by modeling choices. The central assumption, that LLM agents prompted with profile, community, and content will reflect real reshare drivers, remains untested at the level needed for strong claims. Without the full methods, soundness cannot be rated highly.

This is for researchers working on agent-based social simulations and moderation testing. A reader focused on computational approaches to online harm would get value from the empirical grounding and the intervention angle.

It deserves serious referee time because the idea connects simulation to real platform data in a new way, even if the current version needs substantial methods expansion.

Referee Report

2 major / 1 minor

Summary. The paper claims that classical cascade models omit key factors in hateful content propagation and that a multi-LLM-agent simulator, by conditioning reshare decisions on user profile, community, and content, reproduces empirical Bluesky patterns from three hateful cascades and one benign control: 97.4-99.7% hostile stance among reposters, higher toxicity-engagement homophily on the diffusion tree than the follower graph, and star-like vs. tree-like topology. A structured ablation identifies agent heterogeneity as the dominant fidelity driver, and an amplifier-targeting intervention on dense networks yields 7.5-12.9% toxicity reduction at 5.7% benign collateral.

Significance. If the fidelity claims hold after full verification, the work would provide a concrete advance over classical models by enabling simulation-based testing of interventions that incorporate behavioral realism. The empirical grounding against independent Bluesky observations and the explicit ablation on heterogeneity are strengths that could support more reliable moderation strategy evaluation.

major comments (2)

[Abstract and Methods] The central claim that the multi-LLM simulator reproduces stance monoculture and toxicity-delta direction rests on quantitative matches to empirical data, yet the abstract supplies no simulation protocol, network generation procedure, exact comparison metrics, error bars, or verification steps, preventing assessment of whether the reproduction is robust or artifactual.
[Results (ablation study)] The assertion that agent heterogeneity is the leading fidelity factor is load-bearing for the modeling contribution, but requires the specific per-metric deltas (with vs. without heterogeneity, vs. other ablations and classical baselines) and statistical tests to be shown; without them the ranking cannot be evaluated.

minor comments (1)

[Abstract] The phrase 'toxicity-delta direction' is undefined; the main text should supply its precise definition and how it is computed from the empirical and simulated cascades.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive report and positive assessment of the work's potential contribution. We address each major comment below and will revise the manuscript to improve clarity and substantiation of the claims.

Referee: [Abstract and Methods] The central claim that the multi-LLM simulator reproduces stance monoculture and toxicity-delta direction rests on quantitative matches to empirical data, yet the abstract supplies no simulation protocol, network generation procedure, exact comparison metrics, error bars, or verification steps, preventing assessment of whether the reproduction is robust or artifactual.

Authors: We agree that the abstract would benefit from additional high-level information to allow readers to evaluate the robustness of the reproduction claims. We will revise the abstract to include a concise description of the simulation protocol (multi-LLM agent reshare decisions conditioned on profile/community/content), network generation procedure (size-matched to empirical cascades), comparison metrics (stance distribution, toxicity-engagement homophily, topology), and verification steps (quantitative matches with error bars). Full details remain in Methods, but this addition will address the concern without altering the manuscript's core claims.
Revision: yes

Referee: [Results (ablation study)] The assertion that agent heterogeneity is the leading fidelity factor is load-bearing for the modeling contribution, but requires the specific per-metric deltas (with vs. without heterogeneity, vs. other ablations and classical baselines) and statistical tests to be shown; without them the ranking cannot be evaluated.

Authors: We acknowledge that the ablation results section must explicitly report the per-metric deltas (e.g., stance match, homophily delta, topology metrics) for the heterogeneity ablation versus other ablations and classical baselines, along with the associated statistical tests, to substantiate the ranking. We will add these quantitative values and tests to the revised manuscript. This strengthens the evidence for agent heterogeneity as the dominant fidelity driver without changing the existing conclusion.
Revision: yes

Circularity Check

0 steps flagged

No significant circularity; simulation validated against independent empirical data

The paper's central claims rest on direct comparison of multi-LLM simulation outputs to independent Bluesky cascade observations (stance monoculture, toxicity-delta, topology differences) rather than any self-referential fitting or redefinition. No equations, parameters, or predictions are shown to reduce to the inputs by construction; ablations test heterogeneity as a variable against external fidelity metrics. The derivation chain is self-contained because the empirical grounding data and simulation protocol are described as separate, with results evaluated on held-out structural patterns.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review provides no extractable details on free parameters, axioms, or invented entities; full text would be required to populate the ledger.

pith-pipeline@v0.9.1-grok · 5772 in / 1195 out tokens · 62271 ms · 2026-06-30T15:28:02.685423+00:00 · methodology
