Measuring the Semantic Structure and Evolution of Conspiracy Theories
Pith reviewed 2026-05-15 00:26 UTC · model grok-4.3
The pith
Conspiracy theories form coherent semantic regions and evolve non-uniformly over time.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Conspiracy-related language forms coherent and semantically distinguishable regions of language space, allowing conspiracy theories to be treated as semantic objects. Using aligned word embeddings on Reddit comments spanning 2012-2022, the analysis shows these objects evolve non-uniformly through semantic stability, expansion, contraction, and replacement, patterns invisible to keyword-based methods.
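As a rough illustration of the four patterns named above, the sketch below labels the change in one term's nearest-neighbor set between two periods. The function name `classify_change`, the Jaccard thresholds, and the example neighbor lists are all assumptions made for illustration; the paper's own operational definitions are not stated in this summary.

```python
def classify_change(old_neighbors, new_neighbors, stable_at=0.6, replaced_at=0.2):
    """Label how a term's neighbor set changed between two periods.

    Thresholds are hypothetical; they are not taken from the paper.
    """
    old, new = set(old_neighbors), set(new_neighbors)
    union = old | new
    if not union:
        return "stability"  # nothing to compare, treat as unchanged
    jaccard = len(old & new) / len(union)
    if jaccard >= stable_at:
        return "stability"      # most neighbors are shared
    if jaccard <= replaced_at:
        return "replacement"    # almost no neighbors are shared
    if len(new) > len(old):
        return "expansion"      # neighborhood grew around a shared core
    if len(new) < len(old):
        return "contraction"    # neighborhood shrank to a shared core
    return "replacement"        # same size, partial turnover (sketch choice)


# Made-up neighbor lists for a single hypothetical seed term.
labels = {
    "stable": classify_change(["a", "b", "c", "d"], ["a", "b", "c", "d", "e"]),
    "grown": classify_change(["a", "b"], ["a", "b", "c", "d", "e"]),
    "shrunk": classify_change(["a", "b", "c", "d", "e"], ["a", "b"]),
    "swapped": classify_change(["a", "b", "c"], ["x", "y", "z"]),
}
```

A keyword counter would see the same seed term in all four cases, which is the sense in which such patterns would be invisible to keyword-based methods.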
What carries the argument
Aligned word embeddings used to map and compare semantic neighborhoods of conspiracy terms across different time periods in a large Reddit corpus.
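This summary does not say which alignment method the paper uses. One common choice for comparing separately trained embedding spaces is orthogonal Procrustes, sketched below with NumPy on synthetic matrices. The function name `procrustes_align`, the matrix sizes, and the synthetic data are assumptions for illustration only.

```python
import numpy as np


def procrustes_align(source, target):
    """Rotate `source` onto `target`.

    Both matrices hold one row per shared vocabulary item, in the same order.
    Solves min_R ||source @ R - target||_F subject to R being orthogonal.
    """
    u, _, vt = np.linalg.svd(source.T @ target)
    rotation = u @ vt
    return source @ rotation, rotation


rng = np.random.default_rng(0)
target = rng.normal(size=(50, 8))           # stand-in for a later period's vectors

# Stand-in for an earlier period: the same geometry in a rotated coordinate system.
q, _ = np.linalg.qr(rng.normal(size=(8, 8)))
source = target @ q.T

aligned, rotation = procrustes_align(source, target)
max_error = float(np.abs(aligned - target).max())  # near zero for this synthetic case
```

Because the rotation is orthogonal, distances and cosine similarities within the source space are preserved; only the coordinate system changes. Real yearly embeddings would not align exactly, and the residual is where both genuine semantic shift and alignment noise would show up.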
If this is right
- Conspiracy theories can be modeled and tracked as evolving semantic objects.
- Non-uniform evolution means some theories persist in meaning while others transform significantly.
- Keyword approaches are insufficient for capturing the full dynamics of conspiracy language.
- This enables finer-grained study of how meanings in political discourse shift over time.
Where Pith is reading between the lines
- This method could be applied to track semantic shifts in other types of online belief systems.
- Semantic stability patterns might correlate with the longevity of specific conspiracy theories.
- It suggests potential for real-time monitoring of meaning changes in political discussions.
Load-bearing premise
That the alignment of word embeddings across time periods preserves true semantic relationships without introducing distortions from the alignment technique or subreddit data biases.
What would settle it
A study comparing the embedding-based evolution metrics against human-coded semantic similarity judgments on sampled conspiracy texts from 2012 and 2022.
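A minimal sketch of how such a comparison could be scored, assuming each sampled term has one embedding-based change score and one averaged human rating. Spearman rank correlation is used here as one reasonable agreement measure, not as the paper's method; the numbers are invented and the helper names are hypothetical.

```python
def ranks(values):
    """Return 1-based ranks, giving tied values their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average_rank
        i = j + 1
    return result


def pearson(x, y):
    """Pearson correlation; assumes neither input is constant."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sd_x = sum((a - mean_x) ** 2 for a in x) ** 0.5
    sd_y = sum((b - mean_y) ** 2 for b in y) ** 0.5
    return cov / (sd_x * sd_y)


def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))


# Invented scores for five hypothetical terms (2012 vs 2022).
embedding_change = [0.12, 0.45, 0.30, 0.80, 0.05]  # e.g. 1 - cosine similarity
human_rating = [1.5, 2.5, 3.0, 4.5, 1.0]           # e.g. mean annotator score

rho = spearman(embedding_change, human_rating)
```

A high correlation on a properly sampled set would support the premise that the aligned embeddings track meaning change; a low one would point toward alignment or corpus artifacts.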
Original abstract
Research on conspiracy theories has largely focused on belief formation, exposure, and diffusion, while paying less attention to how their meanings change over time. This gap persists partly because conspiracy-related terms are often treated as stable lexical markers, making it difficult to separate genuine semantic changes from surface-level vocabulary changes. In this paper, we measure the semantic structure and evolution of conspiracy theories in online political discourse. Using 169.9M comments from Reddit's r/politics subreddit spanning 2012–2022, we first demonstrate that conspiracy-related language forms coherent and semantically distinguishable regions of language space, allowing conspiracy theories to be treated as semantic objects. We then track how these objects evolve over time using aligned word embeddings, enabling comparisons of semantic neighborhoods across periods. Our analysis reveals that conspiracy theories evolve non-uniformly, exhibiting patterns of semantic stability, expansion, contraction, and replacement that are not captured by keyword-based approaches alone.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper claims that conspiracy-related language in 169.9M Reddit r/politics comments (2012–2022) forms coherent and semantically distinguishable regions in embedding space, allowing conspiracy theories to be treated as semantic objects, and that these objects evolve non-uniformly via patterns of stability, expansion, contraction, and replacement that keyword-based methods miss, as measured with aligned word embeddings.
Significance. If the embedding-based measurements are shown to be robust, the work would provide a concrete quantitative method for tracking semantic evolution in conspiracy discourse beyond static lexical markers, with potential to inform computational social science on meaning change in polarized online communities.
major comments (3)
- [Methods] Methods (embedding alignment subsection): the central non-uniform evolution claim depends on aligned embeddings (likely Procrustes or equivalent) capturing genuine semantic shifts rather than alignment artifacts or subreddit topic drift; the manuscript provides no alignment stability metrics, shuffled-corpus controls, or held-out neighborhood validation against human judgments.
- [Results] Results (coherence and evolution sections): the assertion that conspiracy language forms coherent, distinguishable regions lacks quantitative support such as coherence scores, silhouette coefficients, or baseline comparisons to random or non-conspiracy terms, leaving the 'semantically distinguishable' claim dependent on visualizations alone.
- [Results] Results (temporal patterns): reported patterns of stability/expansion/contraction/replacement are presented without error analysis, statistical significance tests, or controls for varying comment volume and annotation biases in seed selection, which are load-bearing for the claim that these patterns are not captured by keyword approaches.
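The kind of quantitative coherence check requested in the second major comment could look roughly like the sketch below: the mean pairwise cosine similarity of a seed-term set is compared against randomly drawn term sets of the same size. The function names, toy vectors, and term labels are assumptions for illustration, and the random baseline here is not frequency-matched, which a real analysis would need.

```python
import math
import random
from itertools import combinations


def cosine(u, v):
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def mean_pairwise_cosine(vectors):
    pairs = list(combinations(vectors, 2))
    return sum(cosine(u, v) for u, v in pairs) / len(pairs)


def coherence_vs_random(embeddings, seed_terms, n_draws=200, rng=None):
    """Return (seed coherence, mean random coherence, empirical p-value)."""
    rng = rng or random.Random(0)
    observed = mean_pairwise_cosine([embeddings[t] for t in seed_terms])
    pool = [t for t in embeddings if t not in seed_terms]
    baseline = []
    for _ in range(n_draws):
        sample = rng.sample(pool, len(seed_terms))
        baseline.append(mean_pairwise_cosine([embeddings[t] for t in sample]))
    # Share of random sets that are at least as coherent as the seed set.
    p_value = sum(b >= observed for b in baseline) / n_draws
    return observed, sum(baseline) / n_draws, p_value


# Toy data: three seed vectors clustered in one direction, thirty scattered others.
rng = random.Random(1)
seed_terms = ["seed_a", "seed_b", "seed_c"]
embeddings = {
    t: [1 + rng.gauss(0, 0.05)] + [rng.gauss(0, 0.05) for _ in range(4)]
    for t in seed_terms
}
for i in range(30):
    embeddings[f"other_{i}"] = [rng.gauss(0, 1) for _ in range(5)]

observed, baseline_mean, p_value = coherence_vs_random(embeddings, seed_terms, rng=rng)
```

A silhouette coefficient would add a separation measure on top of this cohesion measure; the same random-set baseline applies to either.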
minor comments (3)
- [Methods] Clarify the exact procedure and hyperparameters for training yearly embeddings and the alignment method in the methods section.
- [Figures] Add more descriptive figure captions that explicitly state the time slices and embedding dimensions used.
- [Data] The abstract states 169.9M comments; ensure the data filtering and conspiracy-term seed selection criteria are fully detailed to allow replication.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive comments. We address each major point below and will incorporate the suggested validations and quantitative analyses into a revised manuscript.
Point-by-point responses
- Referee: [Methods] Methods (embedding alignment subsection): the central non-uniform evolution claim depends on aligned embeddings (likely Procrustes or equivalent) capturing genuine semantic shifts rather than alignment artifacts or subreddit topic drift; the manuscript provides no alignment stability metrics, shuffled-corpus controls, or held-out neighborhood validation against human judgments.
  Authors: We agree that explicit validation of the alignment procedure is needed to rule out artifacts. In the revised manuscript we will report alignment stability via average cosine similarity of a fixed set of anchor terms across consecutive periods, include shuffled-corpus controls that randomly permute temporal labels while preserving corpus size, and add a small-scale human validation study on held-out neighborhood changes for a subset of terms.
  Revision: yes
- Referee: [Results] Results (coherence and evolution sections): the assertion that conspiracy language forms coherent, distinguishable regions lacks quantitative support such as coherence scores, silhouette coefficients, or baseline comparisons to random or non-conspiracy terms, leaving the 'semantically distinguishable' claim dependent on visualizations alone.
  Authors: The current version relies primarily on visualizations. We will add quantitative support in the revised results: average intra-cluster cosine similarity for conspiracy terms versus random samples, silhouette coefficients measuring separation from non-conspiracy terms, and explicit baseline comparisons against randomly sampled vocabulary items of matched frequency.
  Revision: yes
- Referee: [Results] Results (temporal patterns): reported patterns of stability/expansion/contraction/replacement are presented without error analysis, statistical significance tests, or controls for varying comment volume and annotation biases in seed selection, which are load-bearing for the claim that these patterns are not captured by keyword approaches.
  Authors: We acknowledge the absence of error bars and formal tests. The revision will include bootstrap-derived confidence intervals for each evolution metric, permutation tests comparing embedding-based patterns against keyword baselines, volume-normalized subsampling to equalize comment counts across periods, and sensitivity checks that vary the seed-term sets to assess robustness to annotation choices.
  Revision: yes
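The bootstrap-derived confidence intervals promised in the last response could be computed along the lines below. This is a percentile-bootstrap sketch under stated assumptions: the function name `bootstrap_ci`, the choice of the mean as the statistic, and the per-term overlap scores are all hypothetical and not taken from the paper.

```python
import random


def bootstrap_ci(values, stat=None, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for `stat` over `values`."""
    stat = stat or (lambda xs: sum(xs) / len(xs))
    rng = random.Random(seed)
    n = len(values)
    # Resample with replacement, recompute the statistic, then sort the results.
    stats = sorted(
        stat([values[rng.randrange(n)] for _ in range(n)]) for _ in range(n_boot)
    )
    lower = stats[int((alpha / 2) * n_boot)]
    upper = stats[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


# Invented neighborhood-overlap scores for ten hypothetical seed terms
# between two consecutive periods.
overlap_scores = [0.62, 0.55, 0.71, 0.48, 0.66, 0.59, 0.74, 0.52, 0.63, 0.57]

mean_overlap = sum(overlap_scores) / len(overlap_scores)
lower, upper = bootstrap_ci(overlap_scores)
```

Resampling terms, as here, captures variation across the seed set only. Uncertainty from embedding training and from comment volume would need resampling at the corpus level, which is what the proposed volume-normalized subsampling addresses.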
Circularity Check
No significant circularity; empirical measurements on external corpus using standard methods
Full rationale
The paper applies established word-embedding and alignment techniques to an external 169.9M-comment Reddit corpus. The claims that conspiracy language forms coherent regions and evolves via stability/expansion/contraction are direct outputs of neighborhood analysis and temporal comparisons on the data, not reductions of fitted parameters to themselves or load-bearing self-citations. No equations or steps equate a derived quantity to its own input by construction.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: Word embeddings capture semantic meaning based on contextual co-occurrence patterns in text corpora.