Among Us: Language of Conspiracy Theorists on Mainstream Reddit
Pith reviewed 2026-05-19 11:18 UTC · model grok-4.3
The pith
Users active in conspiracy subreddits display distinctive language patterns even in mainstream Reddit communities that per-community machine learning models can identify at 87 percent average accuracy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Users who participate in conspiracy-focused subreddits exhibit distinctive linguistic patterns when they post in general-interest Reddit communities. These patterns allow binary machine learning classifiers to distinguish them from other users within individual communities at an average accuracy of 87 percent across more than twenty tasks. Community-specific models outperform any single global classifier by as much as 17 percentage points. The authors interpret this gap as evidence that linguistic expression among these users remains dynamic and responsive to the norms of each environment rather than fixed across all contexts.
What carries the argument
Per-community binary classifiers trained on linguistic features from user comments to separate conspiracy-active users from the rest of a given subreddit.
If this is right
- Uniform moderation strategies across Reddit will miss many signals because linguistic markers vary by community.
- Detection systems must be retrained or adapted separately for each subreddit to maintain high accuracy.
- Linguistic differences among these users persist across news, humor, and hobby contexts rather than appearing only inside conspiracy spaces.
- Global models trained on pooled data lose up to 17 points of accuracy compared with localized ones.
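The local-versus-global gap described above can be illustrated with a toy sketch. Everything here is an assumption made for illustration: the data are synthetic, the two community names are placeholders, and a nearest-centroid rule stands in for whatever classifier the paper actually uses. The only point is that a marker whose direction differs by community is easy for a per-community model and hard for a pooled one.

```python
from statistics import mean

def fit_centroids(rows):
    """Return {label: centroid} from rows of (feature_vector, label)."""
    by_label = {}
    for features, label in rows:
        by_label.setdefault(label, []).append(features)
    return {label: [mean(col) for col in zip(*vectors)]
            for label, vectors in by_label.items()}

def predict(centroids, features):
    """Assign the label whose centroid is closest in squared Euclidean distance."""
    def sq_dist(centroid):
        return sum((a - b) ** 2 for a, b in zip(features, centroid))
    return min(centroids, key=lambda label: sq_dist(centroids[label]))

def accuracy(centroids, rows):
    """Fraction of rows whose predicted label equals the true label."""
    return sum(predict(centroids, f) == y for f, y in rows) / len(rows)

# Synthetic one-feature data: label 1 = conspiracy-active, 0 = other users.
# The marker points in opposite directions in the two toy communities.
train = {
    "news":  [([0.8], 1), ([1.2], 1), ([-0.8], 0), ([-1.2], 0)],
    "humor": [([-0.4], 1), ([-0.6], 1), ([0.4], 0), ([0.6], 0)],
}
test = {
    "news":  [([1.0], 1), ([-1.0], 0)],
    "humor": [([-0.5], 1), ([0.5], 0)],
}

# One model per community versus one model fitted on the pooled training data.
local_acc = {c: accuracy(fit_centroids(train[c]), test[c]) for c in train}
pooled = fit_centroids([row for rows in train.values() for row in rows])
global_acc = {c: accuracy(pooled, test[c]) for c in test}

# Gap in percentage points between mean local and mean global accuracy.
gap_points = 100 * (mean(local_acc.values()) - mean(global_acc.values()))
print(local_acc, global_acc, gap_points)
```

In this toy the pooled model fits the larger-magnitude community and misclassifies the other, so the gap is far larger than the 17 points reported in the paper; the numbers say nothing about the real data.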
Where Pith is reading between the lines
- The same approach could be applied to users from other fringe or activist groups to test whether distinctive language appears outside their dedicated forums.
- Longitudinal tracking of individual users might reveal whether these linguistic markers appear before or after joining conspiracy communities.
- Platform interventions could prioritize community-specific training data collection to improve early identification without over-flagging general discussion.
Load-bearing premise
Participation in conspiracy subreddits marks a distinct user population whose language differences in other communities arise independently of the specific topics under discussion.
What would settle it
Train the same classifiers after matching or otherwise controlling for the exact topics discussed in the comments. If accuracy then falls close to random guessing, the claim of topic-independent signatures is falsified.
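One way to set up such a topic-controlled test is to downsample the data so that, within every topic, both classes contribute the same number of comments. The sketch below assumes each comment already carries a `topic` label (for example from a topic model); the field names `topic`, `label`, and `text` and the function name are hypothetical, not taken from the paper.

```python
import random
from collections import defaultdict

def topic_matched_sample(rows, seed=0):
    """Downsample so each topic contributes equally many rows per class.

    rows: list of dicts with keys "topic", "label" (1 or 0), and "text".
    """
    rng = random.Random(seed)
    by_cell = defaultdict(list)
    for row in rows:
        by_cell[(row["topic"], row["label"])].append(row)

    matched = []
    for topic in sorted({t for t, _ in by_cell}):
        pos = by_cell.get((topic, 1), [])
        neg = by_cell.get((topic, 0), [])
        k = min(len(pos), len(neg))  # topics seen in only one class are dropped
        matched += rng.sample(pos, k) + rng.sample(neg, k)
    return matched
```

After matching, the two classes are balanced overall and within each topic, so chance accuracy is 50 percent and topic alone cannot separate the groups. A classifier that still scores well above that on held-out matched data would be relying on something other than topic.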
Original abstract
The interaction between fringe subcultures and mainstream online communities poses significant challenges for understanding discourse on social media. In this work, we investigate whether users active in conspiracy-focused communities exhibit detectable linguistic signatures when participating in general-interest spaces, such as news, humor, or hobbyist forums. We analyze a large-scale longitudinal dataset of over 500 million comments spanning 10 years of Reddit activity, examining the communication patterns of these users across diverse social contexts independent of the topics they discuss. We show that these users exhibit distinctive linguistic patterns that enable machine learning models to reliably distinguish them from the general population within individual communities (averaging 87% accuracy across more than 20 binary classification tasks). Crucially, no single aggregate model captures these patterns across communities, as community-specific models outperform global classifiers by up to 17 percentage points. This result suggests that while these users are distinct, their linguistic expression is dynamic and highly responsive to the social norms of the environment they inhabit. Our findings suggest the need for tailored interventions in online spaces, as linguistic signals associated with conspiracy and fringe subcultures vary across communities and cannot be effectively addressed by uniform detection or moderation strategies.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that users active in conspiracy-focused subreddits exhibit distinctive linguistic patterns in mainstream Reddit communities (news, humor, hobbyist forums) that enable ML models to distinguish them from the general population with an average 87% accuracy across more than 20 per-community binary classification tasks. It further claims that community-specific models outperform global classifiers by up to 17 points, indicating that linguistic expression is dynamic and responsive to local social norms rather than fixed or topic-driven.
Significance. If the central result holds after proper controls, the work would be significant for computational social science by providing large-scale longitudinal evidence (500M+ comments over 10 years) that fringe-group linguistic markers are context-dependent. This would support the need for community-tailored moderation strategies and demonstrate the value of comparing local versus aggregate classifiers for detecting norm-responsive idiolects.
major comments (2)
- [Abstract] Abstract: The claim that detected patterns are 'independent of the topics they discuss' and reflect 'dynamic and highly responsive' linguistic signatures is load-bearing for the central result but lacks described controls. The user-identification method (conspiracy-subreddit participation) followed by classification on mainstream comments does not mention topic matching, keyword filtering, LDA-based content removal, or restriction to function-word features; without these, the 87% accuracy may exploit residual domain vocabulary rather than style.
- [Abstract] Abstract and results sections: The reported classification accuracies are presented without details on feature engineering, baseline comparisons (e.g., bag-of-words vs. LIWC vs. embeddings), statistical significance testing, or confounds such as differential post volume, user tenure, or demographic proxies. These omissions make it impossible to evaluate whether the per-community advantage over global models (up to 17 points) is robust or an artifact of unbalanced data.
minor comments (1)
- [Abstract] Abstract: The statement 'averaging 87% accuracy across more than 20 binary classification tasks' would be clearer if it reported the range, standard deviation, or per-community breakdown to allow assessment of consistency.
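The first major comment mentions restricting the classifier to function-word features as one control for topic. A minimal sketch of that idea follows; the word list is a short illustrative sample and the function name is hypothetical. This is not the feature set used in the paper.

```python
import re

# Illustrative closed-class word list; a real study would use a fuller inventory.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in", "that", "it", "i", "you",
                  "they", "we", "not", "but", "because", "if")

def function_word_profile(text):
    """Relative frequency of each function word among all tokens in `text`."""
    tokens = re.findall(r"[a-z']+", text.lower())
    total = len(tokens)
    if total == 0:
        return [0.0] * len(FUNCTION_WORDS)
    return [tokens.count(word) / total for word in FUNCTION_WORDS]

profile = function_word_profile("They said that the moon is not the issue.")
```

Because content words such as "moon" only affect the denominator, a model trained on these vectors cannot key on domain vocabulary directly, which is the property the referee asks the authors to demonstrate.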
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive feedback. The comments identify important areas for clarification regarding controls for topic independence and experimental details. We address each point below and commit to revisions that strengthen the manuscript without altering its core claims.
Point-by-point responses
- Referee: [Abstract] Abstract: The claim that detected patterns are 'independent of the topics they discuss' and reflect 'dynamic and highly responsive' linguistic signatures is load-bearing for the central result but lacks described controls. The user-identification method (conspiracy-subreddit participation) followed by classification on mainstream comments does not mention topic matching, keyword filtering, LDA-based content removal, or restriction to function-word features; without these, the 87% accuracy may exploit residual domain vocabulary rather than style.
  Authors: We agree that the abstract would benefit from explicit reference to the controls used. The full manuscript selects mainstream comments exclusively from non-conspiracy subreddits (news, humor, and hobbyist forums) and focuses analysis on stylistic patterns. In revision we will expand the abstract and add a methods subsection detailing keyword filtering to exclude conspiracy-related terms, LDA-based verification that topic distributions are matched between groups, and primary reliance on function-word and syntactic features to isolate style from content. These additions will directly support the independence claim.
  Revision: yes
- Referee: [Abstract] Abstract and results sections: The reported classification accuracies are presented without details on feature engineering, baseline comparisons (e.g., bag-of-words vs. LIWC vs. embeddings), statistical significance testing, or confounds such as differential post volume, user tenure, or demographic proxies. These omissions make it impossible to evaluate whether the per-community advantage over global models (up to 17 points) is robust or an artifact of unbalanced data.
  Authors: We accept that greater methodological transparency is required. The revised manuscript will include an expanded methods section describing the full feature set, explicit baseline comparisons (bag-of-words, LIWC, and embeddings), and statistical tests (e.g., McNemar’s test) for accuracy differences between community-specific and global models. We will also report controls for post volume and user tenure via matching or covariate adjustment. Demographic proxies cannot be addressed because the Reddit dataset contains no such information; we will note this limitation explicitly.
  Revision: partial
- Demographic proxies cannot be controlled because the underlying Reddit data provides no user demographic information.
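The rebuttal proposes McNemar's test for comparing community-specific and global models. A minimal sketch of the exact (binomial) form of that test is below, assuming both models are scored on the same held-out items from one community. The function name and argument layout are hypothetical.

```python
from math import comb

def mcnemar_exact(y_true, pred_a, pred_b):
    """Two-sided exact McNemar test on paired predictions from two classifiers.

    Returns (only_a, only_b, p_value), where only_a counts items that model A
    gets right and model B gets wrong, and only_b counts the reverse.
    """
    only_a = sum(a == y != b for y, a, b in zip(y_true, pred_a, pred_b))
    only_b = sum(b == y != a for y, a, b in zip(y_true, pred_a, pred_b))
    n = only_a + only_b
    if n == 0:
        return only_a, only_b, 1.0  # the models never disagree on correctness
    k = min(only_a, only_b)
    # Under the null, discordant items split as Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return only_a, only_b, min(1.0, 2 * tail)
```

Only the discordant items enter the test, so two models with very different accuracies on disjoint test sets cannot be compared this way; the pairing on identical items is what makes the test valid.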
Circularity Check
No circularity: empirical classification results are independent of inputs
The paper conducts a standard empirical study: users are labeled by participation in conspiracy-focused subreddits, features are extracted from their separate mainstream Reddit comments, and binary classifiers are trained and evaluated on held-out data to produce the reported per-community accuracies. No equations, parameters, or predictions are defined in terms of the target result itself, no self-citation chain is load-bearing for the central claim, and the accuracies are direct outputs of the ML pipeline rather than renamings or fitted inputs. The work is therefore self-contained against external benchmarks and exhibits no reduction of results to inputs by construction.
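The labeling step of the pipeline described above can be sketched as follows. The record fields (`author`, `subreddit`, `body`) and both function names are hypothetical. The default threshold of 100 follows the appendix caption quoted on this page, which mentions users with at least 100 comments on r/conspiracy. Excluding users who fall below the threshold but have some conspiracy activity is an assumption made here to keep the control group clean, not a documented choice of the authors.

```python
from collections import Counter

def label_users(comments, conspiracy_subs, min_comments=100):
    """Label 1 for authors with >= min_comments in conspiracy_subs and 0 for
    authors with none. Authors in between are left out (assumption)."""
    counts = Counter(c["author"] for c in comments
                     if c["subreddit"] in conspiracy_subs)
    labels = {}
    for author in {c["author"] for c in comments}:
        if counts[author] >= min_comments:
            labels[author] = 1
        elif counts[author] == 0:
            labels[author] = 0
    return labels

def mainstream_rows(comments, labels, community):
    """Comments from one mainstream community, paired with the author's label."""
    return [(c["body"], labels[c["author"]]) for c in comments
            if c["subreddit"] == community and c["author"] in labels]
```

Labels come only from conspiracy-subreddit activity while features come only from the mainstream community, which is the separation the circularity check relies on. A held-out split should then be made by author rather than by comment, so the same user does not appear in both training and test data.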
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Participation in conspiracy-focused subreddits reliably identifies users with a distinct linguistic profile independent of discussion topics.
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/RealityFromDistinction.lean: reality_from_one_distinction (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "We analyze a large-scale longitudinal dataset of over 500 million comments... machine learning models to reliably distinguish them... averaging 87% accuracy across more than 20 binary classification tasks."
- IndisputableMonolith/Cost/FunctionalEquation.lean: washburn_uniqueness_aczel (tag: unclear)
  Relation between the paper passage and the cited Recognition theorem.
  Paper passage: "user representations based on the activity in several of the most popular mainstream subreddits... LIWC-22... 110 linguistic and psycholinguistic features"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
- Effects of Algorithmic Visibility on Conspiracy Communities: Reddit after Epstein's 'Suicide'
  Mainstream visibility after Epstein's death selected for users who stayed less and talked less like core members, pointing to selection rather than simple amplification in conspiracy community growth.
- Simulating Online Social Media Conversations on Controversial Topics Using AI Agents Calibrated on Real-World Data
  LLM agents calibrated on Italian election data produce coherent posts and realistic network structure but show less tone and toxicity variation than real users, with opinion changes resembling traditional mathematical models.
Fig. A1: Bias in the selection of subreddits using the activity of conspiracy users as proxy. The users we consider are those with at least 100 comments on r/conspiracy.
Appendix A: Dataset Information
Table A1: Data sizes for each subreddit, i...