
arxiv: 2405.18374 · v2 · submitted 2024-05-28 · 💻 cs.CY · cs.HC

Assessing How Hate, Counterspeech, and Toxicity Affect Hate Group Newcomers

Pith reviewed 2026-05-24 01:12 UTC · model grok-4.3

classification 💻 cs.CY cs.HC
keywords counterspeech · hate speech · toxicity · Reddit · newcomers · online communities · retention · social media

The pith

Newcomers who post hate speech and receive counterspeech are less likely to continue posting in hate subreddits.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper studies how counterspeech influences whether newcomers keep participating in online hate communities after they post hate speech. Using data from over 16,000 newcomers in 104 Reddit hate subreddits, it shows that counterspeech reduces the chance they will post again, instead of making them more committed. It also measures toxicity in counterspeech and finds that while counterspeech is more toxic than normal discussion, its toxicity level does not change the retention outcome but does increase the odds of further hostile replies from the newcomer. These patterns matter for efforts to limit hate speech because they indicate that responses can discourage new participants rather than entrench them.

Core claim

Newcomers using hate speech who receive counterspeech are less likely to continue posting within these hate subreddits, rather than becoming galvanized. Counterspeech comments are less toxic than hate speech comments but almost twice as toxic as other discourse. No association exists between the toxicity of counterspeech and its effects on user retention, yet toxic counterspeech increases the probability of continued hostility from hate users within the same discussion.

What carries the argument

LLM-based counterspeech detection applied to observational posting records of 16,513 newcomers across 104 hate subreddits.

If this is right

  • Counterspeech reduces the likelihood that newcomers continue posting in hate subreddits.
  • Toxicity of counterspeech has no measurable effect on whether users stay or leave the community.
  • Toxic counterspeech raises the probability of continued hostile replies from the original poster in the same thread.
  • Many newcomers may be testing proscribed beliefs rather than acting as committed adherents.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Platforms might reduce newcomer involvement by promoting or surfacing counterspeech early in threads.
  • The retention drop could reflect boundary-testing behavior that fades once the belief meets opposition.
  • Effects observed on Reddit may differ on platforms with different moderation norms or reply visibility.
  • Longer-term tracking could reveal whether the initial drop in posting leads to permanent disengagement or migration elsewhere.

Load-bearing premise

The LLM-based counterspeech detection accurately identifies true counterspeech without significant false positives or negatives, and the observational data allows inferring the effect of counterspeech on retention without major confounding.
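
One way to probe the first half of this premise is to compare detector output with human labels on a sample of replies. The sketch below is illustrative only: the function name and the toy label lists are hypothetical, and it is not the paper's evaluation code.

```python
def precision_recall_f1(human_labels, detector_labels):
    """Score binary detector output against human labels (1 = counterspeech)."""
    pairs = list(zip(human_labels, detector_labels))
    tp = sum(1 for h, d in pairs if h and d)        # true positives
    fp = sum(1 for h, d in pairs if not h and d)    # false positives
    fn = sum(1 for h, d in pairs if h and not d)    # false negatives
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom else 0.0
    return precision, recall, f1


# Hypothetical toy labels, not data from the paper.
human = [1, 1, 1, 0, 0, 0, 1, 0]
detector = [1, 1, 0, 1, 0, 0, 1, 0]
print(precision_recall_f1(human, detector))
```

Low precision would inflate the counterspeech group with ordinary replies, while low recall would leak counterspeech into the comparison group; either could bias the retention comparison.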

What would settle it

A controlled experiment that randomly assigns counterspeech replies to some hate-posting newcomers and measures their subsequent posting rates in the same subreddits.
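
A minimal sketch of how such an experiment could be analyzed, assuming a simple 50/50 random split and a pooled two-proportion z statistic. The function names and the counts in the usage lines are hypothetical and do not come from the paper.

```python
import math
import random


def assign_treatment(user_ids, seed=0):
    """Randomly split users into treatment (receives counterspeech) and control."""
    rng = random.Random(seed)
    shuffled = list(user_ids)
    rng.shuffle(shuffled)
    half = len(shuffled) // 2
    return set(shuffled[:half]), set(shuffled[half:])


def two_proportion_z(returned_t, n_t, returned_c, n_c):
    """Return (treatment rate - control rate) and its pooled z statistic.

    Assumes the pooled return rate is strictly between 0 and 1.
    """
    p_t, p_c = returned_t / n_t, returned_c / n_c
    pooled = (returned_t + returned_c) / (n_t + n_c)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_t + 1 / n_c))
    return p_t - p_c, (p_t - p_c) / se


# Hypothetical counts: 30 of 100 treated users and 45 of 100 controls post again.
treatment, control = assign_treatment(range(200))
diff, z = two_proportion_z(30, 100, 45, 100)
print(len(treatment), len(control), round(diff, 2), round(z, 2))
```

Because assignment is random, a negative difference here would not be explained by which newcomers happen to attract counterspeech, which is the gap the observational design leaves open.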

Figures

Figures reproduced from arXiv: 2405.18374 by Daniel Hickey, Daniel M.T. Fessler, Goran Murić, Keith Burghardt, Kristina Lerman, Matheus Schmitz, Paul E. Smaldino.

Figure 1: Assessing the impact of counterspeech on new hate …
Figure 2: Performance of counterspeech detection models
Figure 3: Mean toxicity by type of reply. Black vertical lines …
Figure 5: Probability of receiving a toxic follow-up reply
Original abstract

Counterspeech has gained attention as a strategy to reduce hate speech on social media. Although previous studies suggest that counterspeech can reduce hate speech, little is known about its effects on participation in online hate communities. Relatedly, we lack an understanding about the degree of hostility in counterspeech. Hostile counterspeech may increase online conflict, potentially hardening the positions of hate adherents, and further eroding online environments. Here, we analyzed the effect of counterspeech on 16,513 newcomers across 104 hate subreddits (forums within Reddit.com). We devised an LLM-based counterspeech detection approach that outperforms specialized models trained on existing datasets, then examined the presence, and effects of, hostility. While counterspeech comments are less toxic than hate speech comments, they are almost twice as toxic as other discourse within hate subreddits. We then evaluated the effect of counterspeech on newcomer engagement in hate subreddits. We found that newcomers using hate speech who receive counterspeech are less likely to continue posting within these hate subreddits, rather than becoming galvanized. We speculate that, instead of constituting ardent hate adherents, readily-dissuaded newcomers may merely be toying with beliefs that are proscribed in other contexts. Although we found no association between the toxicity of counterspeech and its effects on user retention, consistent with prior research regarding the harmful effects of toxic speech, we found that toxic counterspeech increases the probability of continued hostility from hate users within the same discussion.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper analyzes counterspeech effects on 16,513 newcomers across 104 hate subreddits using Reddit data. It introduces an LLM-based counterspeech detector claimed to outperform specialized models, reports that counterspeech is less toxic than hate speech but nearly twice as toxic as other subreddit discourse, and finds that hate-speech-posting newcomers receiving counterspeech are less likely to continue posting (rather than becoming galvanized). It further notes no association between counterspeech toxicity and retention but an increase in continued hostility from toxic counterspeech within the same thread.

Significance. If the central observational association can be shown to support causal inference, the result would contribute to computational social science by suggesting counterspeech deters rather than entrenches participation in hate communities, with implications for moderation design. The scale of the newcomer cohort and the toxicity comparisons add empirical value, though the lack of reported identification strategies limits immediate policy weight.

major comments (3)
  1. [Abstract and Results] Abstract and Results sections: the claim that counterspeech recipients show lower retention is presented as an effect on engagement, yet the analysis is purely observational with no reported fixed effects, propensity matching, or other identification strategy to address selection on unobservables (e.g., pre-existing user commitment, post visibility, or subreddit norms that jointly predict both counterspeech receipt and disengagement).
  2. [Methods] Methods section: the LLM counterspeech detector is validated only against existing external datasets rather than against human labels collected on the specific corpus of newcomer hate-speech posts; this gap directly affects the reliability of the downstream retention comparison.
  3. [Results] Results section: no error bars, confidence intervals, or robustness checks (e.g., alternative classifiers, subsample analyses) are mentioned for the reported retention differences, undermining assessment of whether the observed association is statistically distinguishable from noise.
minor comments (1)
  1. [Abstract] The abstract could more explicitly distinguish the reported association from a causal claim to avoid over-interpretation by readers.
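
To make the first major comment concrete, the sketch below compares retention within strata (for example, within each subreddit) and then averages the within-stratum gaps. Stratification is used here as a simpler stand-in for the fixed effects or propensity matching the referee mentions; the function name and the record layout are hypothetical, and this is not the paper's analysis.

```python
from collections import defaultdict


def stratified_retention_gap(records):
    """records: iterable of (stratum, got_counterspeech, returned) tuples.

    Returns the size-weighted mean of within-stratum retention differences
    (counterspeech minus no counterspeech), using only strata that contain
    both groups.
    """
    # counts[stratum][group] = [n_users, n_returned]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for stratum, got_cs, returned in records:
        cell = counts[stratum][bool(got_cs)]
        cell[0] += 1
        cell[1] += int(bool(returned))

    weighted_sum, total_weight = 0.0, 0
    for groups in counts.values():
        (n_cs, r_cs), (n_no, r_no) = groups[True], groups[False]
        if n_cs == 0 or n_no == 0:
            continue  # no within-stratum comparison is possible
        weight = n_cs + n_no
        weighted_sum += weight * (r_cs / n_cs - r_no / n_no)
        total_weight += weight
    if total_weight == 0:
        raise ValueError("no stratum contains both groups")
    return weighted_sum / total_weight
```

Stratifying only removes confounding from the variables used to define the strata; unobserved factors such as prior commitment would still need a stronger design.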

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed comments. We address each major comment below, clarifying the observational nature of the study and outlining planned revisions to improve clarity, transparency, and rigor.

Point-by-point responses
  1. Referee: [Abstract and Results] Abstract and Results sections: the claim that counterspeech recipients show lower retention is presented as an effect on engagement, yet the analysis is purely observational with no reported fixed effects, propensity matching, or other identification strategy to address selection on unobservables (e.g., pre-existing user commitment, post visibility, or subreddit norms that jointly predict both counterspeech receipt and disengagement).

    Authors: We agree that the analysis is observational and does not include causal identification strategies. The manuscript employs the term 'effect' in a descriptive sense, but to prevent any implication of causality we will revise the abstract and results sections to use 'association' and 'relationship' throughout. We will also add an explicit statement in the methods and discussion sections noting the observational design and the absence of controls for selection on unobservables. revision: yes

  2. Referee: [Methods] Methods section: the LLM counterspeech detector is validated only against existing external datasets rather than against human labels collected on the specific corpus of newcomer hate-speech posts; this gap directly affects the reliability of the downstream retention comparison.

    Authors: Validation was conducted on established external datasets to benchmark against prior work. We acknowledge that human annotation on the specific newcomer corpus would provide stronger domain-specific evidence. Because new annotation collection lies outside the scope of the current study, we will add a dedicated limitations paragraph discussing this gap and its potential implications for the retention analyses. revision: partial

  3. Referee: [Results] Results section: no error bars, confidence intervals, or robustness checks (e.g., alternative classifiers, subsample analyses) are mentioned for the reported retention differences, undermining assessment of whether the observed association is statistically distinguishable from noise.

    Authors: We will incorporate confidence intervals or error bars for all reported retention differences. In addition, we will perform and report robustness checks using alternative classifiers and relevant subsample analyses in the revised results section. revision: yes
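
The confidence intervals promised in the third response could be obtained in several ways. The sketch below uses a percentile bootstrap on per-user return flags; this method is assumed here for illustration and is not stated by the authors. The function name, defaults, and toy data are hypothetical.

```python
import random


def bootstrap_gap_ci(cs_returned, no_cs_returned, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for the difference in return rates.

    cs_returned / no_cs_returned: lists of 0/1 flags (1 = user posted again)
    for newcomers who did / did not receive counterspeech.
    """
    rng = random.Random(seed)
    gaps = []
    for _ in range(n_boot):
        # Resample each group with replacement at its original size.
        a = rng.choices(cs_returned, k=len(cs_returned))
        b = rng.choices(no_cs_returned, k=len(no_cs_returned))
        gaps.append(sum(a) / len(a) - sum(b) / len(b))
    gaps.sort()
    lo = gaps[int((alpha / 2) * n_boot)]
    hi = gaps[int((1 - alpha / 2) * n_boot) - 1]
    return lo, hi


# Hypothetical toy data: 30% vs 45% of users post again.
cs = [1] * 30 + [0] * 70
no_cs = [1] * 45 + [0] * 55
print(bootstrap_gap_ci(cs, no_cs))
```

An interval that excludes zero would suggest the retention gap is distinguishable from sampling noise, though it says nothing about confounding.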

Circularity Check

0 steps flagged

No circularity: empirical observational analysis with external data and validation

Full rationale

The paper conducts an observational study on Reddit data for 16,513 newcomers across 104 subreddits, using an LLM-based detector validated against existing datasets and examining associations with retention and toxicity. No equations, derivations, or first-principles claims are presented that reduce to fitted inputs or self-definitions by construction. No load-bearing self-citations or uniqueness theorems from prior author work are invoked to force results. The central findings (lower retention after counterspeech, toxicity comparisons) are statistical associations drawn from an external corpus, not renamings or predictions equivalent to inputs. This is a standard empirical paper whose claims are checked against external benchmarks rather than against its own outputs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The study relies on assumptions about data labeling and detection accuracy that are typical in social media research; no free parameters or new entities are introduced.

axioms (2)
  • domain assumption: Reddit subreddits labeled as hate communities accurately represent online hate groups.
    The study selects 104 hate subreddits without detailing how they were identified or validated.
  • domain assumption: An LLM can reliably detect counterspeech in the context of hate speech.
    The paper devises an LLM-based approach, but its details are not given in the abstract.

pith-pipeline@v0.9.0 · 5828 in / 1427 out tokens · 35633 ms · 2026-05-24T01:12:00.248937+00:00 · methodology

