arxiv: 2605.16204 · v1 · pith:OSYSUEV4new · submitted 2026-05-15 · 💻 cs.CY

Who, Why, and How: Disentangling the Effects of Moderation Source, Context, and Language on Post-Removal Behavior

Pith reviewed 2026-05-19 21:45 UTC · model grok-4.3

classification 💻 cs.CY
keywords content moderation · reddit · bot moderation · self-censorship · user compliance · violation severity · linguistic strategies

The pith

Bot moderation on Reddit produces higher compliance and lower self-censorship than human or modteam moderation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper analyzes how moderator source, violation context, and removal message language jointly shape what users do after their content is taken down. It draws on more than eleven million Reddit moderation events to compare bots, individual humans, and moderation teams. The central finding is that bots achieve stronger compliance with less silent withdrawal, while team moderation increases self-censorship and violation severity reverses which linguistic tactics succeed.

Core claim

In a dataset of 11,795,036 moderation events across 9,285,410 users, bot-moderated removals yield higher compliance and lower self-censorship than removals by humans or modteams. Modteam actions produce the largest withdrawal effects. Linguistic features such as elaborated explanations and direct address improve outcomes only for routine violations; for serious violations these same features increase withdrawal, while prosocial and emotionally emphatic framing becomes most effective.
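
One way to read the claimed reversal is as a sign change in a conditional slope. The following is a minimal sketch, assuming a single linguistic feature score $x$, a binary severity indicator $s$ (0 for routine, 1 for serious), and a behavioral outcome $y$ such as compliance probability. These symbols are illustrative assumptions and not the paper's notation or its fitted model.

```latex
y = \beta_0 + \beta_1 x + \beta_2 s + \beta_3 (x \cdot s) + \varepsilon,
\qquad
\frac{\partial y}{\partial x} =
\begin{cases}
\beta_1 & \text{if } s = 0 \text{ (routine)} \\
\beta_1 + \beta_3 & \text{if } s = 1 \text{ (serious)}
\end{cases}
```

Under this reading, a feature "helps for routine violations but backfires for serious ones" when $\beta_1 > 0$ and $\beta_1 + \beta_3 < 0$.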

What carries the argument

The argument rests on violation severity acting as a moderator of cue-based processing. This is tested within an extension of the Human-AI Interaction Theory of Interactive Media Effects, using probabilistic behavioral classification and regression on linguistic features extracted via PCA.

If this is right

  • Routine violations can be routed to bots to raise compliance rates without raising self-censorship.
  • Modteam interventions should be reserved for cases where institutional signaling is the goal rather than retention.
  • Removal messages for high-severity violations should favor prosocial framing and emotional emphasis over detailed explanations.
  • Moderation systems can become context-adaptive by letting violation severity select the linguistic strategy.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The compliance advantage of bots may extend to other platforms if their community structures resemble Reddit's subreddit model.
  • Hybrid designs that start with bot messages and escalate serious cases to humans could capture both efficiency and perceived legitimacy.
  • Long-term user retention on platforms might rise if self-censorship is lowered through calibrated moderation language.

Load-bearing premise

The large observational dataset lets researchers attribute differences in user compliance and withdrawal directly to moderator source and message language without major confounding from subreddit norms or moderator assignment choices.

What would settle it

A randomized experiment that assigns identical violations to bot, human, or team moderation while varying message language and then measures the fraction of users who post again versus those who reduce activity.
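
The assignment step of such an experiment can be sketched as follows. This is a minimal illustration, assuming a factorial design with the three moderator sources named in the paper and two message styles; the style labels, the function name `assign_conditions`, and the seed are hypothetical and not drawn from the paper.

```python
import random
from itertools import product

SOURCES = ["bot", "human", "modteam"]      # moderator sources named in the paper
STYLES = ["explanatory", "prosocial"]      # hypothetical message variants


def assign_conditions(violation_ids, seed=0):
    """Randomly assign each violation to one (source, style) cell.

    Ids are shuffled with a seeded generator and then dealt round-robin,
    so cell sizes differ by at most one.
    """
    cells = list(product(SOURCES, STYLES))
    rng = random.Random(seed)
    ids = list(violation_ids)
    rng.shuffle(ids)
    return {vid: cells[i % len(cells)] for i, vid in enumerate(ids)}


if __name__ == "__main__":
    assignment = assign_conditions(range(60))
    counts = {}
    for cell in assignment.values():
        counts[cell] = counts.get(cell, 0) + 1
    print(counts)
```

Because assignment is independent of subreddit and violation content, differences in later posting behavior between cells could then be attributed to the source and message style rather than to how cases are routed.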

Figures

Figures reproduced from arXiv: 2605.16204 by Emilio Ferrara, Lindsay Young, Marlon Twyman, Siyi Zhou.

Figure 1: Sigmoid-based probabilistic classifiers for self-censorship, resistance, and compliance.
Figure 2: Mean probability of user behavior trajectory after moderated by different source for different ...
Figure 7: Distribution for difference of post frequency, log ratio of post frequency, and moderation ...
Original abstract

Content moderation is a central mechanism through which platforms attempt to balance user engagement with community governance. Yet existing research has largely treated moderation as a uniform intervention, overlooking how moderator source, violation context, and linguistic style jointly shape user behavior. Drawing on the Human-AI Interaction Theory of Interactive Media Effects (HAII-TIME), this study examines how these three dimensions produce divergent post-moderation behavioral trajectories in a large-scale observational dataset of 11,795,036 moderation events across 9,285,410 users and 61,261 subreddits on Reddit (2021-2025). Using probabilistic behavioral classification, ANOVA, and OLS regression with PCA-derived linguistic features, we find that bot moderation consistently produces higher compliance and lower self-censorship than human or modteam moderation, challenging the assumption that human agency cues are inherently advantageous. Modteam moderation produces the strongest self-censorship effects, suggesting that institutional depersonalization is a meaningful driver of behavioral withdrawal. Violation severity emerges as a critical contingency: linguistic strategies effective in routine contexts (elaborated explanation, community-scale appeals, direct personal address) can backfire for serious violations, whereas prosocially framed and emotionally emphatic messages become most effective when stakes are highest. Of 480 linguistic interactions tested, 33 survive FDR correction. These findings extend HAII-TIME by introducing violation salience as a moderator of cue-based processing, and offer empirical grounding for context-adaptive moderation design.
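
The idea of a sigmoid-based probabilistic classifier can be sketched for one behavior. The example below is a hedged illustration only: it scores self-censorship from the log ratio of posting frequency after versus before a removal, a quantity named in the Figure 7 caption. The add-one smoothing, the `midpoint` and `steepness` values, and the function names are assumptions made here, not parameters or definitions from the paper.

```python
import math


def sigmoid(x: float) -> float:
    """Standard logistic function, mapping any real number into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def self_censorship_probability(posts_before: int, posts_after: int,
                                midpoint: float = -1.0,
                                steepness: float = 4.0) -> float:
    """Soft score in (0, 1) that rises as posting activity falls.

    midpoint and steepness are hypothetical values chosen for illustration.
    """
    # Add-one smoothing keeps the log ratio finite when a user stops posting.
    log_ratio = math.log((posts_after + 1) / (posts_before + 1))
    # Negated so that a larger drop in activity yields a higher score;
    # the score equals 0.5 when log_ratio equals midpoint.
    return sigmoid(-steepness * (log_ratio - midpoint))


if __name__ == "__main__":
    for after in (0, 10, 20):
        print(after, round(self_censorship_probability(20, after), 3))
```

A soft score of this kind avoids a hard cutoff between "withdrew" and "stayed", which is presumably why the mean probabilities in Figure 2 can be compared across moderator sources.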

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. This paper analyzes a large observational dataset of 11,795,036 moderation events across 9,285,410 users and 61,261 subreddits on Reddit (2021-2025) to examine how moderator source (bot, human, modteam), violation context, and linguistic style jointly influence post-moderation user behavior. Drawing on HAII-TIME, it employs probabilistic behavioral classification, ANOVA, and OLS regression with PCA-derived linguistic features, reporting that bot moderation is associated with higher compliance and lower self-censorship than human or modteam moderation, that modteam moderation drives the strongest self-censorship, and that violation severity moderates the effectiveness of linguistic strategies (with 33 of 480 interactions surviving FDR correction). The work claims to extend HAII-TIME by introducing violation salience as a moderator of cue-based processing.

Significance. If the central associations hold after addressing potential confounding, the findings would be significant for computational social science and platform governance research by providing large-scale evidence on differential effects of automated versus human moderation and by identifying violation severity as a key contingency for linguistic interventions. The dataset scale, use of FDR correction across 480 tests, and extension of an existing theoretical framework are clear strengths that would support practical implications for context-adaptive moderation design.

major comments (3)
  1. [Abstract] The claim that 'bot moderation consistently produces higher compliance and lower self-censorship' attributes outcomes causally to moderator source, yet the observational design compares outcomes across non-randomly assigned sources without demonstrated controls (e.g., subreddit fixed effects, violation-type stratification, or propensity weighting) for selection into moderator type or subreddit norms; the reported OLS and ANOVA results on PCA features therefore cannot isolate the source cue itself from the contexts in which each source appears.
  2. [Methods/Results] (OLS and ANOVA sections) The manuscript does not detail whether the regression models include subreddit fixed effects, user-level clustering, or robustness checks such as propensity score weighting to address the non-random assignment of moderation sources; without these, the source main effects and the 33 FDR-significant interactions remain vulnerable to confounding and cannot cleanly support the headline behavioral attribution.
  3. [Abstract and Discussion] The extension of HAII-TIME by 'introducing violation salience as a moderator' is presented as a theoretical contribution, but the observational data leave open whether the reported severity-by-language interactions reflect cue processing or unmeasured differences in how severe violations are routed to different moderator sources and linguistic framings.
minor comments (2)
  1. [Abstract] The abstract would benefit from a brief parenthetical definition or citation for 'probabilistic behavioral classification' to clarify how compliance and self-censorship are operationalized from the 11.8M events.
  2. [Results] Figure or table captions for the linguistic interaction results should explicitly state the exact number of tests (480) and the FDR threshold applied so readers can assess the 33 significant findings without returning to the text.
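
For readers who want to see what an FDR correction over a family of tests involves, here is a minimal sketch of the Benjamini-Hochberg step-up procedure. The paper is reported only as applying "FDR correction"; that it used Benjamini-Hochberg, and the `alpha` of 0.05, are assumptions made for this example. The p-values shown are made up for illustration and are not results from the paper.

```python
def benjamini_hochberg(p_values, alpha=0.05):
    """Return booleans marking which hypotheses are rejected at FDR level alpha."""
    m = len(p_values)
    # Indices sorted by ascending p-value.
    order = sorted(range(m), key=lambda i: p_values[i])
    # Step-up rule: find the largest 1-based rank k with p_(k) <= (k / m) * alpha.
    cutoff_rank = 0
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank / m * alpha:
            cutoff_rank = rank
    # Reject every hypothesis ranked at or below the cutoff.
    rejected = [False] * m
    for idx in order[:cutoff_rank]:
        rejected[idx] = True
    return rejected


if __name__ == "__main__":
    toy_p = [0.001, 0.008, 0.039, 0.041, 0.042, 0.06, 0.074, 0.205, 0.212, 0.216]
    print(sum(benjamini_hochberg(toy_p)), "of", len(toy_p), "rejected")
```

In the paper's setting the family size `m` would be the 480 interaction tests, which is why the referee asks for both that number and the threshold to appear alongside the 33 surviving results.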

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below, clarifying our approach and indicating revisions where the manuscript can be strengthened without overstating the observational evidence.

  1. Referee: [Abstract] The claim that 'bot moderation consistently produces higher compliance and lower self-censorship' attributes outcomes causally to moderator source, yet the observational design compares outcomes across non-randomly assigned sources without demonstrated controls (e.g., subreddit fixed effects, violation-type stratification, or propensity weighting) for selection into moderator type or subreddit norms; the reported OLS and ANOVA results on PCA features therefore cannot isolate the source cue itself from the contexts in which each source appears.

    Authors: We agree that the phrasing 'produces' risks implying causation beyond what the observational data support. The reported OLS models control for violation severity, subreddit size, and other observed covariates, with violation-type stratification implicit in the interaction terms, but subreddit fixed effects and propensity weighting were not applied in the primary specifications. We will revise the abstract to use associative language ('is associated with') and add a dedicated robustness subsection describing these controls and limitations. revision: yes

  2. Referee: [Methods/Results] (OLS and ANOVA sections) The manuscript does not detail whether the regression models include subreddit fixed effects, user-level clustering, or robustness checks such as propensity score weighting to address the non-random assignment of moderation sources; without these, the source main effects and the 33 FDR-significant interactions remain vulnerable to confounding and cannot cleanly support the headline behavioral attribution.

    Authors: The primary models include user-level random effects to address clustering and control for violation type and subreddit characteristics. Subreddit fixed effects were omitted from the main results to retain statistical power across 61,261 subreddits. We will expand the Methods section with complete model equations, explicit mention of the clustering approach, and new robustness analyses that incorporate subreddit fixed effects and propensity-score weighting on observable features such as subreddit activity and violation category. revision: yes

  3. Referee: [Abstract and Discussion] The extension of HAII-TIME by 'introducing violation salience as a moderator' is presented as a theoretical contribution, but the observational data leave open whether the reported severity-by-language interactions reflect cue processing or unmeasured differences in how severe violations are routed to different moderator sources and linguistic framings.

    Authors: The models explicitly interact linguistic features with violation severity while holding moderator source constant within strata, which provides evidence consistent with salience moderating cue effectiveness. We cannot fully exclude differential routing with observational data alone. We will revise the Discussion to acknowledge this limitation more explicitly, frame the HAII-TIME extension as an empirical pattern supporting the proposed moderator rather than a conclusive test, and suggest future experimental designs to isolate routing mechanisms. revision: partial
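
The subreddit-level confounding raised in the first two exchanges can be illustrated with a toy comparison of a pooled gap against a within-subreddit gap. This is a sketch under stated assumptions: the event records, the subreddit names "a" and "b", and the helper names are synthetic and invented for this example; none of the numbers come from the paper. Stratifying by subreddit is used here as a simple stand-in for the fixed-effects idea, not as the authors' specification.

```python
from collections import defaultdict


def make_events(subreddit, source, n, n_complied):
    """Build n synthetic events, the first n_complied of which show compliance."""
    return [{"subreddit": subreddit, "source": source,
             "complied": 1 if i < n_complied else 0} for i in range(n)]


def mean(values):
    return sum(values) / len(values)


def pooled_gap(events):
    """Bot-minus-human difference in compliance rate, ignoring subreddit."""
    bot = [e["complied"] for e in events if e["source"] == "bot"]
    human = [e["complied"] for e in events if e["source"] == "human"]
    return mean(bot) - mean(human)


def within_subreddit_gap(events):
    """Size-weighted average of per-subreddit bot-minus-human gaps."""
    by_sub = defaultdict(list)
    for e in events:
        by_sub[e["subreddit"]].append(e)
    total, weight = 0.0, 0
    for rows in by_sub.values():
        # Only subreddits containing both sources can be compared internally.
        if {"bot", "human"} <= {r["source"] for r in rows}:
            total += pooled_gap(rows) * len(rows)
            weight += len(rows)
    if weight == 0:
        raise ValueError("no subreddit contains both sources")
    return total / weight


# Synthetic data: bots are concentrated in the high-compliance subreddit "a".
events = (make_events("a", "bot", 8, 6) + make_events("a", "human", 2, 2)
          + make_events("b", "bot", 2, 0) + make_events("b", "human", 8, 2))

if __name__ == "__main__":
    print(round(pooled_gap(events), 2))            # 0.2
    print(round(within_subreddit_gap(events), 2))  # -0.25
```

In this synthetic case bots look better in the pooled comparison only because they operate mostly in the subreddit where everyone complies more; inside each subreddit the gap has the opposite sign. That is the pattern the requested robustness analyses are meant to rule out.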

Circularity Check

0 steps flagged

No significant circularity; empirical analysis is self-contained

The paper reports results from an observational dataset of 11.8M moderation events analyzed via probabilistic classification, ANOVA, and OLS regression on PCA-derived features. All load-bearing claims (bot moderation producing higher compliance, violation severity as moderator, 33 FDR-significant interactions) are statistical outputs from the data rather than quantities defined by the paper's own fitted parameters or reduced to self-citations by construction. The reference to HAII-TIME is used to frame the study and is extended by new empirical findings; it does not serve as a load-bearing premise whose validity depends on the present results. No self-definitional loops, fitted inputs called predictions, or ansatzes smuggled via citation appear in the derivation chain. The analysis is therefore independent of its own outputs and receives the default non-circularity finding.

Axiom & Free-Parameter Ledger

2 free parameters · 2 axioms · 0 invented entities

The analysis rests on standard statistical modeling assumptions and data classification procedures rather than new theoretical entities or derivations.

free parameters (2)
  • PCA-derived linguistic feature dimensions
    Number and selection of principal components for language features fitted from the moderation message corpus.
  • OLS regression coefficients for interaction terms
    Coefficients estimated from data to quantify effects of moderator type, severity, and language on behavioral outcomes.
axioms (2)
  • domain assumption: Probabilistic behavioral classification correctly identifies compliance versus self-censorship from post-moderation activity logs
    Central measurement step for the dependent variables.
  • domain assumption: OLS regression assumptions (linearity, no omitted variable bias, homoscedasticity) hold for the behavioral outcome models
    Required for interpreting coefficient estimates as effects.
