LLMs Can Get "Brain Rot": A Pilot Study on Twitter/X
Pith reviewed 2026-05-18 07:10 UTC · model grok-4.3
The pith
Continual pre-training on junk Twitter text causes lasting cognitive decline in large language models.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors establish that continual pre-training on Twitter data labeled as junk through either high engagement or low semantic quality produces non-trivial declines on reasoning benchmarks, long-context understanding, and safety evaluations while also increasing dark traits, with the magnitude of decline rising in proportion to the fraction of junk data used.
What carries the argument
The construction and use of junk and reverse-controlled Twitter datasets based on engagement degree and semantic quality, applied via controlled continual pre-training on four LLMs.
If this is right
- Reasoning performance on ARC-Challenge with chain-of-thought drops from 72.1 to 57.2 as the junk ratio rises from 0% to 100% under the engagement measure.
- The primary error mode is thought-skipping: models increasingly truncate or skip steps in their reasoning chains.
- Additional instruction tuning and clean continual pre-training produce partial recovery but leave residual deficits below the original baseline.
- Tweet popularity predicts the size of the decline better than tweet length does.
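The dose-response design behind the first bullet depends on mixing junk and control text at a chosen ratio while holding the total token count fixed. The sketch below is one way such a mixture could be built, not the authors' pipeline: the function names, the whitespace tokenizer, and the greedy fill strategy are all assumptions made for illustration.

```python
import random


def mix_by_token_budget(junk, control, junk_ratio, token_budget, seed=0):
    """Build a mixture where each source fills its share of a fixed token budget.

    Hypothetical helper: tokens are counted by whitespace splitting, which is
    only a stand-in for whatever tokenizer the real training run would use.
    """
    if not 0.0 <= junk_ratio <= 1.0:
        raise ValueError("junk_ratio must be in [0, 1]")
    rng = random.Random(seed)
    quotas = {
        "junk": junk_ratio * token_budget,
        "control": (1.0 - junk_ratio) * token_budget,
    }
    pools = {"junk": list(junk), "control": list(control)}
    mixture = []
    for name, pool in pools.items():
        rng.shuffle(pool)
        used = 0
        for doc in pool:
            n = len(doc.split())
            if used + n > quotas[name]:
                continue  # skip documents that would overshoot this source's quota
            mixture.append((name, doc))
            used += n
    rng.shuffle(mixture)  # interleave sources so training order is not blocked
    return mixture


def token_count(mixture, source=None):
    """Count whitespace tokens in the mixture, optionally for one source only."""
    return sum(len(doc.split()) for name, doc in mixture if source in (None, name))
```

Sweeping `junk_ratio` from 0.0 to 1.0 with the same `token_budget` yields the kind of graded conditions a dose-response curve needs; because documents are added whole, each source can fall slightly short of its quota.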
Where Pith is reading between the lines
- Training data pipelines for ongoing model updates may need explicit filters based on semantic quality to limit gradual capability loss.
- Similar degradation patterns could appear with low-quality text from other social platforms that reward engagement over depth.
- Periodic checks on fixed reasoning and safety benchmarks could become routine to track model health during continual pre-training.
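A periodic check of the kind described in the last bullet could be as simple as comparing current benchmark scores against a stored baseline. The following is a minimal sketch under stated assumptions: the `health_check` function and the 2-point tolerance are hypothetical, and only the four scores are taken from the paper's abstract.

```python
def health_check(baseline, current, max_drop=2.0):
    """Return the benchmarks whose score fell by more than max_drop points.

    max_drop is an illustrative tolerance, not a value from the paper.
    """
    flagged = {}
    for name, base_score in baseline.items():
        if name not in current:
            raise KeyError(f"missing score for {name}")
        drop = base_score - current[name]
        if drop > max_drop:
            flagged[name] = round(drop, 2)
    return flagged


# Scores reported in the abstract for 0% and 100% junk under M1.
baseline = {"ARC-Challenge (CoT)": 72.1, "RULER-CWE": 83.7}
after_junk = {"ARC-Challenge (CoT)": 57.2, "RULER-CWE": 52.3}
print(health_check(baseline, after_junk))
```

In practice the tolerance would need to reflect run-to-run variance on each benchmark, which the reviewed summary does not report.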
Load-bearing premise
The observed performance drops are caused by the engagement or semantic properties of the junk text rather than by unmatched statistical properties of the Twitter corpora or by the specific training schedule.
What would settle it
An experiment that matches the junk and control datasets on every statistical property, including token frequencies and sequence statistics, applies identical training, and then finds no performance difference would indicate that the effect is not due to junk content.
Original abstract
We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). To unveil junk effects, we designed a novel controlled experiment on real Twitter/X corpora, by constructing junk and reverse-controlled datasets via two orthogonal operationalizations: M1 (engagement degree) and M2 (semantic quality), with matched token scale and training operations across conditions. Compared to the control group, continual pre-training of 4 LLMs on the junk dataset causes non-trivial declines (Hedges' g>0.3) on reasoning, long-context understanding, safety, and inflating "dark traits" (e.g., psychopathy, narcissism). The gradual mixtures of junk and control datasets also yield dose-response cognition decay: for example, under M1, ARC-Challenge with Chain-of-Thought drops 72.1 -> 57.2 and RULER-CWE 83.7 -> 52.3 as junk ratio rises from 0% to 100%. Error forensics reveal several key insights. First, we identify thought-skipping as the primary lesion in reasoning: models increasingly truncate or skip chains. Second, partial but incomplete healing is observed: scaling instruction tuning and clean continual pre-training improve the declined cognition, yet cannot restore baseline capability, suggesting persistent representational drift rather than format mismatch. Finally, we discover that the popularity, a non-semantic metric, of a tweet is a better indicator of the Brain Rot effect than the length in M1. Together, the results provide significant, multi-perspective evidence that social effects of data could be a causal driver of LLM capability decay in continual pre-training, thereby motivating routine "cognitive health checks" for deployed and evolving LLMs.
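The abstract's threshold of Hedges' g > 0.3 refers to a standardized mean difference with a small-sample correction. The sketch below shows the textbook computation with an approximate 95% confidence interval; it assumes two independent groups of scores, and the abstract does not say which units (models, seeds, or benchmarks) the authors pooled, so the function and its inputs are illustrative only.

```python
import math
from statistics import mean, variance


def hedges_g(control, treated):
    """Return Hedges' g (control minus treated) and an approximate 95% CI.

    A positive g means the treated group scored lower than the control group.
    Uses the common approximation J = 1 - 3 / (4 * (n1 + n2) - 9).
    """
    n1, n2 = len(control), len(treated)
    if n1 < 2 or n2 < 2:
        raise ValueError("each group needs at least two scores")
    pooled_var = (
        (n1 - 1) * variance(control) + (n2 - 1) * variance(treated)
    ) / (n1 + n2 - 2)
    if pooled_var == 0:
        raise ValueError("pooled variance is zero; g is undefined")
    d = (mean(control) - mean(treated)) / math.sqrt(pooled_var)  # Cohen's d
    j = 1 - 3 / (4 * (n1 + n2) - 9)  # small-sample correction factor
    g = j * d
    # Large-sample variance of d, scaled by J to give a standard error for g.
    var_d = (n1 + n2) / (n1 * n2) + d * d / (2 * (n1 + n2))
    se = j * math.sqrt(var_d)
    return g, (g - 1.96 * se, g + 1.96 * se)
```

With only a handful of scores per group the interval is wide, which is why an effect size reported without its interval is hard to interpret.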
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript presents a pilot study testing the LLM Brain Rot Hypothesis: continual pre-training on junk Twitter/X text causes lasting cognitive decline in LLMs. Using two orthogonal definitions of junk (M1: engagement degree; M2: semantic quality) with matched token counts and training operations, the authors continually pre-train 4 LLMs and report non-trivial declines (Hedges' g > 0.3) on reasoning (ARC-Challenge CoT drops from 72.1 to 57.2), long-context understanding (RULER-CWE from 83.7 to 52.3), safety benchmarks, and inflated dark traits as junk ratio rises from 0% to 100%. They further identify thought-skipping as the dominant reasoning error, observe partial recovery via instruction tuning and clean pre-training, and note that tweet popularity outperforms length as a predictor under M1.
Significance. If the results hold, the work supplies multi-perspective empirical evidence that engagement and semantic properties of web data can causally degrade LLM capabilities during continual pre-training, beyond simple token-volume effects. Strengths include the orthogonal junk operationalizations, explicit dose-response curves, error forensics, and recovery experiments; these elements make the design more informative than single-condition comparisons. The pilot scope with four models limits generalizability but usefully motivates routine cognitive-health monitoring for evolving LLMs.
major comments (2)
- [Methods / Dataset Construction] Dataset construction and matching procedure: the central claim requires that observed declines on ARC-Challenge, RULER, and safety suites are driven by the semantic/engagement properties of the junk text rather than unmatched higher-order corpus statistics. Token counts and training operations are matched, yet no balancing or reporting is provided for perplexity under a reference LM, n-gram distributions, lexical diversity, topic distribution, or syntactic complexity between junk and control sets. These unmeasured differences could produce the reported performance drops independently of the junk labels.
- [Results / Error Forensics] Results section on error forensics: the identification of thought-skipping as the primary lesion is presented as a key insight, but the manuscript does not specify the a priori criteria or annotation protocol used to detect and quantify truncation or skipping of reasoning chains across conditions. Without this, it is unclear whether the pattern was hypothesized before inspection or emerged post-hoc, affecting the strength of the mechanistic interpretation.
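The first major comment asks for corpus statistics beyond token counts. As a rough illustration of two of the quantities it names, the sketch below computes a type-token ratio (lexical diversity) and a Jensen-Shannon divergence between unigram distributions. The function names and the lowercase whitespace tokenizer are assumptions; a real audit would use the model's tokenizer and add perplexity, topic, and syntactic measures.

```python
import math
from collections import Counter


def tokenize(text):
    """Lowercase whitespace tokenizer, used here only as a stand-in."""
    return text.lower().split()


def type_token_ratio(docs):
    """Distinct tokens divided by total tokens across all documents."""
    tokens = [tok for doc in docs for tok in tokenize(doc)]
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def unigram_js_divergence(docs_a, docs_b):
    """Jensen-Shannon divergence (base 2, range 0 to 1) between unigram distributions."""
    counts_a = Counter(tok for doc in docs_a for tok in tokenize(doc))
    counts_b = Counter(tok for doc in docs_b for tok in tokenize(doc))
    total_a, total_b = sum(counts_a.values()), sum(counts_b.values())
    if total_a == 0 or total_b == 0:
        raise ValueError("both corpora must contain at least one token")
    divergence = 0.0
    for tok in set(counts_a) | set(counts_b):
        p = counts_a[tok] / total_a
        q = counts_b[tok] / total_b
        m = 0.5 * (p + q)  # mixture distribution
        if p > 0:
            divergence += 0.5 * p * math.log2(p / m)
        if q > 0:
            divergence += 0.5 * q * math.log2(q / m)
    return divergence
```

A divergence near 0 would suggest the junk and control sets share similar vocabulary usage, while a large value would flag a lexical confound of the kind the referee describes. Note that type-token ratio is sensitive to corpus size, so it is only comparable across sets with matched token counts.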
minor comments (3)
- [Abstract] Abstract and results: the reported metric drops and Hedges' g values would be more interpretable with accompanying standard errors or confidence intervals; their absence on all metrics makes it harder to judge the reliability of the dose-response trends.
- [Evaluation Metrics] Throughout: the exact instruments and scoring procedures for safety benchmarks and dark-trait measures (psychopathy, narcissism) should be stated explicitly, including any prompt templates or evaluation rubrics.
- [Discussion] Discussion: the suggestion of persistent representational drift versus format mismatch could be strengthened by reporting representation-similarity or probing analyses before and after the continual pre-training stages.
Simulated Author's Rebuttal
We thank the referee for their constructive and insightful review. We appreciate the acknowledgment of the study's design strengths, including the orthogonal junk operationalizations and dose-response analyses. We address each major comment below with proposed revisions to enhance clarity and robustness.
Point-by-point responses
- Referee: [Methods / Dataset Construction] Dataset construction and matching procedure: the central claim requires that observed declines on ARC-Challenge, RULER, and safety suites are driven by the semantic/engagement properties of the junk text rather than unmatched higher-order corpus statistics. Token counts and training operations are matched, yet no balancing or reporting is provided for perplexity under a reference LM, n-gram distributions, lexical diversity, topic distribution, or syntactic complexity between junk and control sets. These unmeasured differences could produce the reported performance drops independently of the junk labels.
  Authors: We agree that explicit controls for higher-order corpus statistics would strengthen isolation of the junk properties' effects. While the consistent degradation patterns across two orthogonal junk definitions (M1 engagement and M2 semantic quality) provide evidence against purely superficial confounds, we did not report perplexity, n-gram distributions, lexical diversity, topic distributions, or syntactic complexity in the original manuscript. In the revised version, we will add these analyses (e.g., perplexity under a held-out reference LM, type-token ratios, and topic model comparisons) for the junk and control sets in the Methods section or Appendix to directly address this concern.
  revision: partial
- Referee: [Results / Error Forensics] Results section on error forensics: the identification of thought-skipping as the primary lesion is presented as a key insight, but the manuscript does not specify the a priori criteria or annotation protocol used to detect and quantify truncation or skipping of reasoning chains across conditions. Without this, it is unclear whether the pattern was hypothesized before inspection or emerged post-hoc, affecting the strength of the mechanistic interpretation.
  Authors: The thought-skipping pattern was identified through systematic manual review of model outputs on reasoning tasks, comparing error types across junk ratios. We defined it as abrupt truncation of reasoning chains without logical completion (distinct from factual errors or format violations). While the analysis had an exploratory component, we will revise the manuscript to explicitly document the annotation protocol, including the predefined error categories, sample sizes inspected per condition, and inter-annotator agreement if applicable. This will be added to the error forensics subsection to improve transparency and reproducibility.
  revision: yes
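The inter-annotator agreement mentioned in the second response is commonly summarized with Cohen's kappa, which corrects raw agreement for the agreement expected by chance. The sketch below is a generic implementation for two annotators; the label names in the usage note are hypothetical, and the rebuttal does not say which agreement statistic the authors would report.

```python
from collections import Counter


def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa for two annotators labelling the same items."""
    if len(labels_a) != len(labels_b) or not labels_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(labels_a)
    # Observed agreement: fraction of items given the same label by both.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    counts_a, counts_b = Counter(labels_a), Counter(labels_b)
    # Chance agreement from each annotator's marginal label frequencies.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both annotators used a single identical label throughout
    return (observed - expected) / (1 - expected)
```

For example, two annotators assigning hypothetical categories such as `"thought_skipping"` and `"factual_error"` to the same set of model outputs would pass their label lists in item order. Values near 1 indicate strong agreement, and values near 0 indicate agreement no better than chance.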
Circularity Check
No significant circularity: empirical measurements on external benchmarks
Full rationale
The paper reports results from a controlled empirical study: junk and reverse-controlled Twitter datasets are constructed via two operationalizations (M1 engagement, M2 semantic quality), token scale and training operations are matched, four LLMs undergo continual pre-training, and performance is measured on independent external suites (ARC-Challenge, RULER, safety benchmarks). Declines, dose-response curves, thought-skipping observations, and partial healing under instruction tuning are direct experimental outcomes, not reductions of any internal equation or fitted parameter to itself. No self-definitional loops, fitted-input predictions, or load-bearing self-citation chains appear in the reported chain; the central claim rests on observable differences against fixed external metrics rather than on any ansatz or uniqueness theorem imported from the authors' prior work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption High-engagement or low-semantic-quality tweets constitute junk text that induces cognitive decline when used for continual pre-training
invented entities (1)
- LLM Brain Rot: no independent evidence
Lean theorems connected to this paper
- IndisputableMonolith/Foundation/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: We propose and test the LLM Brain Rot Hypothesis: continual exposure to junk web text induces lasting cognitive decline in large language models (LLMs). ... constructing junk and reverse-controlled datasets via two orthogonal operationalizations: M1 (engagement degree) and M2 (semantic quality)
- IndisputableMonolith/Foundation/ArithmeticFromLogic.lean · embed_strictMono_of_one_lt
  Relation between the paper passage and the cited Recognition theorem: unclear
  Passage: The gradual mixtures of junk and control datasets also yield dose-response cognition decay: for example, under M1, ARC-Challenge with Chain-of-Thought drops 72.1 -> 57.2 ... as junk ratio rises from 0% to 100%.
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 4 Pith papers
- The Impact of AI-Generated Text on the Internet
  By mid-2025 roughly 35% of new websites are AI-generated or AI-assisted, correlating with lower semantic diversity and higher positive sentiment but showing no significant drop in factual accuracy or stylistic diversity.
- LongTail Driving Scenarios with Reasoning Traces: The KITScenes LongTail Dataset
  KITScenes LongTail supplies multimodal driving data and multilingual expert reasoning traces to benchmark models on rare scenarios beyond basic safety metrics.
- State Contamination in Memory-Augmented LLM Agents
  Toxic context can be laundered into memory summaries that stay below toxicity thresholds while still driving higher downstream toxicity in LLM agents compared to neutral baselines.
- Sketching the Readout of Large Language Models for Scalable Data Attribution and Valuation
  RISE applies CountSketch to dual lexical and semantic channels derived from output-layer gradient outer products, cutting data attribution storage by up to 112x and enabling retrospective and prospective influence ana...