arXiv: 2603.26062 · v2 · submitted 2026-03-27 · 💻 cs.CL · cs.CY · cs.SI

Measuring the Semantic Structure and Evolution of Conspiracy Theories

Manisha Keim, Sarmad Chandio, Osama Khalid, Rishab Nithyanand

Pith reviewed 2026-05-15 00:26 UTC · model grok-4.3

classification 💻 cs.CL · cs.CY · cs.SI

keywords conspiracy theories · semantic analysis · word embeddings · reddit · language evolution · online discourse · political language

The pith

Conspiracy theories form coherent semantic regions and evolve non-uniformly over time.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper demonstrates that conspiracy-related language occupies distinct, coherent areas in semantic space derived from online discussions, making it possible to study conspiracy theories as dynamic semantic entities. Using aligned word embeddings on nearly 170 million Reddit comments from 2012 to 2022, the authors track changes in these semantic neighborhoods. The analysis uncovers that conspiracy theories do not change uniformly but show varied patterns including periods of stability, growth, shrinkage, and outright replacement of ideas. Traditional keyword tracking overlooks these deeper shifts in meaning.

Core claim

Conspiracy-related language forms coherent and semantically distinguishable regions of language space, allowing conspiracy theories to be treated as semantic objects. Using aligned word embeddings on Reddit comments spanning 2012-2022, the analysis shows these objects evolve non-uniformly through semantic stability, expansion, contraction, and replacement, patterns invisible to keyword-based methods.

What carries the argument

Aligned word embeddings used to map and compare semantic neighborhoods of conspiracy terms across different time periods in a large Reddit corpus.
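
The alignment step can be sketched as follows. This is a minimal illustration, assuming orthogonal Procrustes alignment, which the referee report below names only as the likely technique; the abstract does not confirm it. The function names `procrustes_align` and `cross_period_cosine` and the toy matrices are hypothetical and do not come from the paper.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target`; rows are vectors for a shared vocabulary."""
    # The orthogonal R minimizing ||source @ R - target||_F is U @ Vt,
    # where U, S, Vt is the SVD of source.T @ target.
    u, _, vt = np.linalg.svd(source.T @ target)
    return source @ (u @ vt)


def cross_period_cosine(aligned, target):
    """Cosine similarity between each word's aligned earlier and later vector."""
    dots = np.sum(aligned * target, axis=1)
    norms = np.linalg.norm(aligned, axis=1) * np.linalg.norm(target, axis=1)
    return dots / norms


# Toy data (assumption): the "earlier" space is an exact rotation of the
# "later" one, so alignment should recover it and report no semantic drift.
rng = np.random.default_rng(0)
later = rng.normal(size=(6, 3))
rotation, _ = np.linalg.qr(rng.normal(size=(3, 3)))
earlier = later @ rotation
aligned = procrustes_align(earlier, later)
drift = 1.0 - cross_period_cosine(aligned, later)
```

On real yearly embeddings the drift values would not be zero, and separating genuine semantic change from alignment error is exactly the concern raised under the load-bearing premise.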

If this is right

Conspiracy theories can be modeled and tracked as evolving semantic objects.
Non-uniform evolution means some theories persist in meaning while others transform significantly.
Keyword approaches are insufficient for capturing the full dynamics of conspiracy language.
This enables finer-grained study of how meanings in political discourse shift over time.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

This method could be applied to track semantic shifts in other types of online belief systems.
Semantic stability patterns might correlate with the longevity of specific conspiracy theories.
It suggests potential for real-time monitoring of meaning changes in political discussions.

Load-bearing premise

That the alignment of word embeddings across time periods preserves true semantic relationships without introducing distortions from the alignment technique or subreddit data biases.

What would settle it

A study comparing the embedding-based evolution metrics against human-coded semantic similarity judgments on sampled conspiracy texts from 2012 and 2022.
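
One way such a comparison could be scored is a rank correlation between the embedding-based change metric and the human-coded judgments. The sketch below assumes Spearman correlation as the agreement measure, which the page does not specify, and every number in it is made up for illustration.

```python
def ranks(values):
    """1-based ranks; tied values share the mean of their rank positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = mean_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation of two equal-length sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)


def spearman(x, y):
    """Spearman correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Hypothetical per-term scores: embedding-based change between 2012 and 2022,
# and human-coded change ratings for the same terms.
embedding_change = [0.10, 0.45, 0.30, 0.80]
human_change = [1, 3, 2, 5]
agreement = spearman(embedding_change, human_change)
```

A high correlation on sampled terms would support the embedding metrics; a low one would suggest the measured shifts do not match how readers perceive the change.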

Original abstract

Research on conspiracy theories has largely focused on belief formation, exposure, and diffusion, while paying less attention to how their meanings change over time. This gap persists partly because conspiracy-related terms are often treated as stable lexical markers, making it difficult to separate genuine semantic changes from surface-level vocabulary changes. In this paper, we measure the semantic structure and evolution of conspiracy theories in online political discourse. Using 169.9M comments from Reddit's r/politics subreddit spanning 2012–2022, we first demonstrate that conspiracy-related language forms coherent and semantically distinguishable regions of language space, allowing conspiracy theories to be treated as semantic objects. We then track how these objects evolve over time using aligned word embeddings, enabling comparisons of semantic neighborhoods across periods. Our analysis reveals that conspiracy theories evolve non-uniformly, exhibiting patterns of semantic stability, expansion, contraction, and replacement that are not captured by keyword-based approaches alone.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

Using aligned embeddings, the paper shows that conspiracy language on Reddit forms distinct semantic clusters that shift non-uniformly over a decade, in ways that keyword tracking does not capture.

The core contribution is applying aligned word embeddings to 170 million r/politics comments from 2012-2022 to treat conspiracy theories as evolving semantic objects rather than fixed terms. They first show these terms cluster coherently in embedding space, then track neighborhood changes to identify stability, expansion, contraction, and replacement patterns. This moves past static keyword tracking and gives a quantitative handle on how meanings drift in online discourse.

The scale of the corpus and the explicit contrast with keyword baselines are the parts that land cleanly. It is a straightforward extension of embedding alignment techniques to a domain where temporal change matters.

The weakest part is the reliance on the alignment process itself. Political subreddits have strong topic drift and uneven comment volumes, so neighborhood shifts could partly reflect alignment artifacts or sampling biases rather than genuine semantic evolution. The abstract gives no numbers on alignment stability, shuffled controls, or human checks on whether the measured neighborhoods match actual usage. Without those, the non-uniform evolution claim stays provisional.

This is useful for researchers already working on computational social science and misinformation tracking who need a way to quantify narrative change. It is not yet tight enough for immediate citation in my own work, but the idea is clear enough that a serious referee should see it. I would recommend sending it out for review so the authors can add the missing validation steps.

Referee Report

3 major / 3 minor

Summary. The paper claims that conspiracy-related language in 169.9M Reddit r/politics comments (2012–2022) forms coherent and semantically distinguishable regions in embedding space, allowing conspiracy theories to be treated as semantic objects, and that these objects evolve non-uniformly via patterns of stability, expansion, contraction, and replacement that keyword-based methods miss, as measured with aligned word embeddings.

Significance. If the embedding-based measurements are shown to be robust, the work would provide a concrete quantitative method for tracking semantic evolution in conspiracy discourse beyond static lexical markers, with potential to inform computational social science on meaning change in polarized online communities.

major comments (3)

[Methods] Methods (embedding alignment subsection): the central non-uniform evolution claim depends on aligned embeddings (likely Procrustes or equivalent) capturing genuine semantic shifts rather than alignment artifacts or subreddit topic drift; the manuscript provides no alignment stability metrics, shuffled-corpus controls, or held-out neighborhood validation against human judgments.
[Results] Results (coherence and evolution sections): the assertion that conspiracy language forms coherent, distinguishable regions lacks quantitative support such as coherence scores, silhouette coefficients, or baseline comparisons to random or non-conspiracy terms, leaving the 'semantically distinguishable' claim dependent on visualizations alone.
[Results] Results (temporal patterns): reported patterns of stability/expansion/contraction/replacement are presented without error analysis, statistical significance tests, or controls for varying comment volume and annotation biases in seed selection, which are load-bearing for the claim that these patterns are not captured by keyword approaches.

minor comments (3)

[Methods] Clarify the exact procedure and hyperparameters for training yearly embeddings and the alignment method in the methods section.
[Figures] Add more descriptive figure captions that explicitly state the time slices and embedding dimensions used.
[Data] The abstract states 169.9M comments; ensure the data filtering and conspiracy-term seed selection criteria are fully detailed to allow replication.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the detailed and constructive comments. We address each major point below and will incorporate the suggested validations and quantitative analyses into a revised manuscript.

Referee: [Methods] Methods (embedding alignment subsection): the central non-uniform evolution claim depends on aligned embeddings (likely Procrustes or equivalent) capturing genuine semantic shifts rather than alignment artifacts or subreddit topic drift; the manuscript provides no alignment stability metrics, shuffled-corpus controls, or held-out neighborhood validation against human judgments.

Authors: We agree that explicit validation of the alignment procedure is needed to rule out artifacts. In the revised manuscript we will report alignment stability via average cosine similarity of a fixed set of anchor terms across consecutive periods, include shuffled-corpus controls that randomly permute temporal labels while preserving corpus size, and add a small-scale human validation study on held-out neighborhood changes for a subset of terms. revision: yes
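
The anchor-term stability metric promised in this response could look roughly like the sketch below. It assumes the two periods are already aligned and stored as word-to-vector dictionaries; the anchor words and vectors are hypothetical, and the rebuttal does not say how the anchor set would be chosen.

```python
import math


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def anchor_stability(period_a, period_b, anchors):
    """Mean cosine similarity of anchor terms across two aligned periods."""
    sims = [cosine(period_a[w], period_b[w])
            for w in anchors if w in period_a and w in period_b]
    if not sims:
        raise ValueError("no anchor term is present in both periods")
    return sum(sims) / len(sims)


# Hypothetical aligned vectors for two consecutive periods.
period_2015 = {"the": [1.0, 0.0], "and": [0.0, 1.0]}
period_2016 = {"the": [2.0, 0.0], "and": [1.0, 1.0]}
stability = anchor_stability(period_2015, period_2016, ["the", "and"])
```

Values close to 1 for words expected to be semantically stable would suggest the alignment itself adds little distortion.
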
Referee: [Results] Results (coherence and evolution sections): the assertion that conspiracy language forms coherent, distinguishable regions lacks quantitative support such as coherence scores, silhouette coefficients, or baseline comparisons to random or non-conspiracy terms, leaving the 'semantically distinguishable' claim dependent on visualizations alone.

Authors: The current version relies primarily on visualizations. We will add quantitative support in the revised results: average intra-cluster cosine similarity for conspiracy terms versus random samples, silhouette coefficients measuring separation from non-conspiracy terms, and explicit baseline comparisons against randomly sampled vocabulary items of matched frequency. revision: yes
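
A minimal sketch of the intra-cluster coherence comparison described in this response follows. It is a simplification: the random baseline here ignores the frequency matching the authors mention, the silhouette coefficient is left out, and all vectors are toy values.

```python
import itertools
import math
import random


def cosine(u, v):
    """Cosine similarity of two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def mean_pairwise_cosine(vectors):
    """Average cosine similarity over all unordered pairs of vectors."""
    pairs = list(itertools.combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def random_baseline(all_vectors, size, trials, seed):
    """Mean coherence of randomly drawn term sets of the same size."""
    rng = random.Random(seed)
    scores = [mean_pairwise_cosine(rng.sample(all_vectors, size))
              for _ in range(trials)]
    return sum(scores) / len(scores)


# Toy vectors (assumption): three "conspiracy" terms pointing the same way,
# plus unrelated vocabulary scattered around the plane.
conspiracy = [[1.0, 0.1], [0.9, 0.0], [1.0, -0.1]]
other = [[0.0, 1.0], [-1.0, 0.2], [0.1, -1.0], [-0.7, 0.7], [0.5, 0.9]]
coherence = mean_pairwise_cosine(conspiracy)
baseline = random_baseline(conspiracy + other, size=3, trials=200, seed=0)
```

A coherence score well above the random baseline is the kind of quantitative evidence the referee asks for in place of visualizations.
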
Referee: [Results] Results (temporal patterns): reported patterns of stability/expansion/contraction/replacement are presented without error analysis, statistical significance tests, or controls for varying comment volume and annotation biases in seed selection, which are load-bearing for the claim that these patterns are not captured by keyword approaches.

Authors: We acknowledge the absence of error bars and formal tests. The revision will include bootstrap-derived confidence intervals for each evolution metric, permutation tests comparing embedding-based patterns against keyword baselines, volume-normalized subsampling to equalize comment counts across periods, and sensitivity checks that vary the seed-term sets to assess robustness to annotation choices. revision: yes
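
The bootstrap-derived confidence intervals mentioned in this response could be computed along these lines. This sketch assumes a percentile bootstrap over per-term metric values; the rebuttal does not state which bootstrap variant or resampling unit the authors intend, and the numbers are invented.

```python
import random


def mean(values):
    return sum(values) / len(values)


def bootstrap_ci(values, stat, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`."""
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic, and sort the results.
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)])
        for _ in range(n_resamples)
    )
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Hypothetical per-term neighborhood-change scores for one conspiracy theory.
change_scores = [0.12, 0.18, 0.25, 0.31, 0.22, 0.15, 0.28, 0.20]
low, high = bootstrap_ci(change_scores, mean)
```

Reporting an interval like `(low, high)` next to each evolution metric would show whether an apparent expansion or contraction exceeds resampling noise.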

Circularity Check

0 steps flagged

No significant circularity; empirical measurements on external corpus using standard methods

The paper applies established word-embedding and alignment techniques to an external 169.9M-comment Reddit corpus. The claims that conspiracy language forms coherent regions and evolves via stability/expansion/contraction are direct outputs of neighborhood analysis and temporal comparisons on the data, not reductions of fitted parameters to themselves or load-bearing self-citations. No equations or steps equate a derived quantity to its own input by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on standard NLP assumptions about embeddings representing semantics via co-occurrence; no free parameters, new entities, or ad-hoc axioms are introduced in the abstract description.

axioms (1)

domain assumption: Word embeddings capture semantic meaning based on contextual co-occurrence patterns in text corpora.
Invoked implicitly when treating conspiracy language as forming coherent semantic regions via embeddings.