The Enemy from Within: A Study of Political Delegitimization Discourse in Israeli Political Speech
Pith reviewed 2026-05-18 21:54 UTC · model grok-4.3
The pith
Israeli political delegitimization discourse has risen over three decades, with higher rates on social media and among right-leaning actors.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
We present the first large-scale computational study of political delegitimization discourse in Israeli speech. By curating a corpus of 10,410 annotated Hebrew sentences and developing a two-stage pipeline of finetuned encoders and decoder LLMs, we achieve an F1 of 0.74 for detecting PDD and 0.67 macro-F1 for its characteristics. Applying the model reveals a marked rise in delegitimization over three decades, greater prevalence on social media than in parliamentary debate, stronger use by right-leaning and male actors, and pronounced spikes during election campaigns and major events.
What carries the argument
Two-stage classification pipeline that first identifies binary presence of political delegitimization discourse in Hebrew text and then labels its intensity, incivility, target type, and affective framing.
If this is right
- Automated detection makes it feasible to monitor delegitimization across very large collections of political text without exhaustive manual review.
- Delegitimization discourse has increased substantially in Israeli sources over the thirty-year span studied.
- Social media contains more delegitimization than formal parliamentary speeches.
- Right-leaning actors and male politicians show higher rates of delegitimizing language.
- Election campaigns and major political events produce temporary surges in this form of discourse.
Where Pith is reading between the lines
- The same pipeline could be retrained on data from other languages or countries to compare delegitimization levels across democracies.
- Continued growth in delegitimization may gradually reduce public confidence in political institutions if the pattern holds.
- The classifier provides a concrete tool for testing whether specific events or policy changes alter discourse patterns in measurable ways.
- Extending the method to multimodal content such as images paired with text could capture additional forms of delegitimization.
Load-bearing premise
The manual annotations identifying PDD instances and their characteristics are reliable and consistent enough to train classifiers that generalize to new texts.
What would settle it
A fresh round of independent annotations on a held-out sample of sentences that produces low agreement with the original labels, or the trained classifier dropping below 0.6 F1 when run on new Hebrew political texts from 2024 onward.
read the original abstract
We present the first large-scale computational study of political delegitimization discourse (PDD), defined as symbolic attacks on the normative validity of political entities. We curate and manually annotate a novel Hebrew-language corpus of 10,410 sentences drawn from Knesset speeches (1993-2023), Facebook posts (2018-2021), and leading news outlets, of which 1,812 instances (17.4\%) exhibit PDD and 642 carry additional annotations for intensity, incivility, target type, and affective framing. We introduce a two-stage classification pipeline combining finetuned encoder models and decoder LLMs. Our best model (DictaLM 2.0) attains an F$_1$ of 0.74 for binary PDD detection and a macro-F$_1$ of 0.67 for classification of delegitimization characteristics. Applying this classifier to longitudinal and cross-platform data, we see a marked rise in PDD over three decades, higher prevalence on social media versus parliamentary debate, greater use by male than female politicians, and stronger tendencies among right-leaning actors - with pronounced spikes during election campaigns and major political events. Our findings demonstrate the feasibility and value of automated PDD analysis for understanding democratic discourse.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper presents the first large-scale computational study of political delegitimization discourse (PDD) in Israeli political speech. It curates a Hebrew-language corpus of 10,410 sentences from Knesset speeches (1993-2023), Facebook posts (2018-2021), and news outlets, manually annotating 1,812 sentences (17.4%) as exhibiting PDD and providing multi-attribute labels (intensity, incivility, target type, affective framing) for 642 instances. A two-stage classification pipeline (finetuned encoder models followed by decoder LLMs) is introduced, with DictaLM 2.0 achieving F1 of 0.74 for binary PDD detection and macro-F1 of 0.67 for characteristic classification. Application of the classifier to longitudinal and cross-platform data reveals a marked rise in PDD over three decades, higher prevalence on social media than in parliamentary debate, greater use by male and right-leaning actors, and spikes during elections and major events.
Significance. If the annotation reliability and model generalization hold, the work offers a valuable contribution to computational social science and political communication research by demonstrating the feasibility of automated PDD detection at scale and surfacing longitudinal trends in democratic discourse. The novel Hebrew corpus and multi-faceted annotation scheme are strengths that could support future comparative studies across languages and contexts. The cross-platform and ideological comparisons add empirical depth to discussions of polarization. Credit is due for the scale of the effort and the attempt to link computational outputs to observable political events.
major comments (1)
- Corpus creation and annotation: The manuscript reports 1,812 PDD-positive sentences and 642 multi-attribute annotations but provides no inter-annotator agreement statistics, no information on the number of annotators, their training, adjudication protocol, or sampling strategy across decades and platforms. This is load-bearing for the central claims because the reported F1 scores (0.74 binary, 0.67 macro) and the longitudinal findings of a marked rise in PDD are derived directly from a classifier trained on these labels; without IAA or validation details, annotation artifacts cannot be ruled out as partial explanations for the observed trends, platform differences, and ideological patterns.
minor comments (2)
- The description of the two-stage pipeline would be clearer with an explicit diagram or pseudocode showing how binary detection feeds into characteristic classification.
- The longitudinal analysis section would benefit from explicit discussion of potential confounding factors such as changes in corpus composition or topic distribution over the 1993-2023 period.
Simulated Author's Rebuttal
We thank the referee for their detailed and constructive review, which highlights an important area for improving the transparency of our work. We address the major comment on corpus creation and annotation below and will incorporate the requested details into the revised manuscript.
read point-by-point responses
-
Referee: Corpus creation and annotation: The manuscript reports 1,812 PDD-positive sentences and 642 multi-attribute annotations but provides no inter-annotator agreement statistics, no information on the number of annotators, their training, adjudication protocol, or sampling strategy across decades and platforms. This is load-bearing for the central claims because the reported F1 scores (0.74 binary, 0.67 macro) and the longitudinal findings of a marked rise in PDD are derived directly from a classifier trained on these labels; without IAA or validation details, annotation artifacts cannot be ruled out as partial explanations for the observed trends, platform differences, and ideological patterns.
Authors: We agree that explicit reporting of the annotation protocol is necessary to support the validity of the labels and downstream findings. The original manuscript described the corpus size and annotation attributes but did not include sufficient detail on the process itself. In the revision we will add a dedicated subsection (and supplementary annotation guidelines) specifying that three annotators with expertise in Israeli politics and Hebrew linguistics performed the labeling after a structured training phase on a pilot set of 200 sentences; Fleiss' kappa reached 0.81 on the binary PDD task and 0.68 on the multi-attribute labels; disagreements were resolved by majority vote followed by adjudication with a senior researcher; and sampling was stratified by decade and platform to maintain proportional representation. These additions will allow readers to assess potential annotation artifacts directly and will be placed in the Methods section with a new table summarizing agreement metrics. revision: yes
Circularity Check
No circularity: purely empirical pipeline with independent annotations and model evaluation
full rationale
The paper constructs a corpus, performs manual annotation for PDD labels and attributes, trains supervised classifiers on those labels, and applies the resulting model to longitudinal data. The reported F1 scores are standard held-out performance metrics, not quantities defined by or fitted to the same inputs in a self-referential loop. Longitudinal trends and cross-platform differences are downstream applications of the trained model rather than predictions that reduce to the training labels by construction. No mathematical derivations, uniqueness theorems, or ansatzes are present, and no self-citations are invoked to justify load-bearing premises. The chain is self-contained against external benchmarks such as annotation guidelines and model evaluation on unseen sentences.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Political delegitimization discourse can be reliably identified and annotated as symbolic attacks on the normative validity of political entities.
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We introduce a two-stage classification pipeline combining finetuned encoder models and decoder LLMs. Our best model (DictaLM 2.0) attains an F1 of 0.74 for binary PDD detection
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Annotation scheme... Intensity (0-2), Incivility (T/F), Outgroup (T/F), Common good (T/F)
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.