LLM Analysis of 150+ years of German Parliamentary Debates on Migration Reveals Shift from Post-War Solidarity to Anti-Solidarity in the Last Decade
Pith reviewed 2026-05-18 17:24 UTC · model grok-4.3
The pith
German parliamentary debates showed high postwar solidarity toward migrants but a sharp rise in anti-solidarity since 2015.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Using a theory-driven scheme, the strongest LLMs achieve human-level agreement on labeling solidarity and anti-solidarity subtypes in German parliamentary migration speech. Systematic errors persist and bias trend estimates, but combining soft-label outputs with Design-based Supervised Learning reduces that bias. The corrected annotations show relatively high levels of solidarity, especially group-based and compassionate forms, throughout the postwar period and a marked rise in anti-solidarity since 2015, expressed through exclusion, undeservingness, and resource burden.
What carries the argument
LLM annotation of solidarity and anti-solidarity subtypes, combined with Design-based Supervised Learning to correct systematic bias in long-term trend estimates.
If this is right
- Political discourse on migrants moved from postwar solidarity toward exclusionary framing after 2015.
- Group-based and compassionate solidarity dominated earlier debates while resource-burden arguments rose recently.
- LLM-based annotation supports large-scale historical text analysis only when outputs are validated and bias-corrected.
- Systematic model errors can distort downstream social-science inferences unless addressed by design-based methods.
- The corrected labels enable tracing how solidarity subtypes changed across postwar displacement, labor migration, and recent refugee movements.
Where Pith is reading between the lines
- The same corrected-LLM pipeline could be applied to parliamentary records in other countries to test whether similar solidarity declines occurred after 2015.
- The post-2015 rise in anti-solidarity may be linked to specific legislative events or crises that future work could align with the trend line.
- If the framing shift holds, it suggests resource-burden arguments have become the dominant anti-solidarity register in contemporary German politics.
- Longer time-series analysis could reveal whether solidarity levels follow cyclical patterns tied to economic conditions or demographic changes.
Load-bearing premise
LLM outputs retain enough validity for long-term historical trend inference after statistical correction despite remaining systematic errors.
What would settle it
Manual re-annotation of a large stratified sample of pre-2015 and post-2015 debate segments that finds no increase in anti-solidarity frames would falsify the reported shift.
Figures
read the original abstract
Migration has been a core topic in German political debate, from the postwar displacement of millions of expellees to labor migration and recent refugee movements. Studying political speech across such wide-ranging phenomena in depth has traditionally required extensive manual annotation, limiting analysis to small subsets of the data. Large language models (LLMs) offer a potential way to overcome this constraint. Using a theory-driven annotation scheme, we examine how well LLMs annotate subtypes of solidarity and anti-solidarity in German parliamentary debates and whether the resulting labels support valid downstream inference. We first provide a comprehensive evaluation of multiple LLMs, analyzing the effects of model size, prompting strategies, fine-tuning, historical versus contemporary data, and systematic error patterns. We find that the strongest models, especially GPT-5 and gpt-oss-120B, achieve human-level agreement on this task, although their errors remain systematic and bias downstream results. To address this issue, we combine soft-label model outputs with Design-based Supervised Learning (DSL) to reduce bias in long-term trend estimates. Beyond the methodological evaluation, we interpret the resulting annotations from a social-scientific perspective to trace trends in solidarity and anti-solidarity toward migrants in postwar and contemporary Germany. Our approach shows relatively high levels of solidarity in the postwar period, especially in group-based and compassionate forms, and a marked rise in anti-solidarity since 2015, framed through exclusion, undeservingness, and resource burden. We argue that LLMs can support large-scale social-scientific text analysis, but only when their outputs are rigorously validated and statistically corrected.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript evaluates LLMs for annotating subtypes of solidarity and anti-solidarity in 150+ years of German parliamentary debates on migration. It reports human-level agreement for top models like GPT-5, identifies systematic errors, applies Design-based Supervised Learning (DSL) to combine soft labels and reduce bias for trend estimation, and interprets the corrected annotations as showing relatively high postwar solidarity (especially group-based and compassionate forms) with a marked rise in anti-solidarity since 2015 framed through exclusion, undeservingness, and resource burden.
Significance. If the DSL correction proves robust to historical language variation, the work would offer a scalable, theory-driven pipeline for large-scale historical discourse analysis that combines LLM annotation with statistical debiasing, enabling trend inference over periods where manual coding is impractical.
major comments (2)
- [Abstract / Methods] Abstract and methods: the claim that DSL mitigates bias sufficiently for valid long-term trend inference lacks quantitative details on correction performance (e.g., pre/post bias metrics or residual error impact specifically on the 2015 shift). This is load-bearing for the central historical comparison.
- [Results / Evaluation] Results / Evaluation: no era-specific validation is reported to test whether LLM error patterns (e.g., over-detection of exclusion frames in formal postwar German versus contemporary speech) are stationary; if the DSL correction is fit on mixed or recent data, it may not generalize to earlier periods and could distort the reported postwar-to-2015 solidarity shift.
minor comments (2)
- [Methods] Clarify the precise mathematical formulation of the soft-label + DSL combination and any assumptions about label noise stationarity.
- [Results] Add explicit before/after trend plots or tables showing the numerical effect of the DSL step on the key solidarity/anti-solidarity time series.
Simulated Author's Rebuttal
We thank the referee for their constructive and detailed comments, which highlight important aspects of our methodological claims. We address each major comment point by point below and have revised the manuscript accordingly where the concerns are valid.
read point-by-point responses
-
Referee: [Abstract / Methods] Abstract and methods: the claim that DSL mitigates bias sufficiently for valid long-term trend inference lacks quantitative details on correction performance (e.g., pre/post bias metrics or residual error impact specifically on the 2015 shift). This is load-bearing for the central historical comparison.
Authors: We agree that the manuscript would benefit from more explicit quantitative evidence on DSL performance to support the long-term inferences. In the revised version we have added a dedicated subsection reporting pre- and post-correction bias metrics (including mean bias reduction and confidence intervals) together with a direct assessment of the correction's impact on the estimated 2015 shift. These additions confirm a substantial bias reduction while preserving the direction and statistical significance of the observed change. revision: yes
-
Referee: [Results / Evaluation] Results / Evaluation: no era-specific validation is reported to test whether LLM error patterns (e.g., over-detection of exclusion frames in formal postwar German versus contemporary speech) are stationary; if the DSL correction is fit on mixed or recent data, it may not generalize to earlier periods and could distort the reported postwar-to-2015 solidarity shift.
Authors: The original manuscript already contains a comparison of model performance on historical versus contemporary subsets. To directly address stationarity, we have expanded the evaluation section with era-stratified metrics and a cross-era hold-out analysis. The results indicate that the dominant systematic error patterns remain sufficiently consistent across periods for the mixed-data DSL correction to be applied to the full series; we now discuss residual limitations of this assumption in the text. revision: partial
Circularity Check
No significant circularity; trends derived from externally validated LLM annotations plus statistical debiasing
full rationale
The paper's derivation chain begins with LLM annotation of solidarity/anti-solidarity subtypes, followed by direct comparison to human agreement benchmarks on held-out data, identification of systematic errors, and application of Design-based Supervised Learning (DSL) for bias correction before trend estimation. None of these steps reduce by construction to the target historical trends: the human benchmarks are independent external labels, DSL is a general statistical correction procedure applied to the observed error patterns rather than a fit tuned to the postwar-to-2015 contrast, and the final social-scientific interpretation follows from the corrected labels without redefining the inputs in terms of the outputs. No self-citation is load-bearing for the central claim, no ansatz is smuggled, and no prediction is statistically forced by construction. The analysis is therefore self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption LLMs can approximate human annotations on solidarity and anti-solidarity subtypes when evaluated against human agreement
- domain assumption Design-based Supervised Learning removes enough systematic bias to support valid long-term inferences
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/RealityFromDistinction.leanreality_from_one_distinction unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
Our approach shows relatively high levels of solidarity in the postwar period... and a marked rise in anti-solidarity since 2015, framed through exclusion, undeservingness, and resource burden.
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We combine soft-label model outputs with Design-based Supervised Learning (DSL) to reduce bias in long-term trend estimates.
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
“Wir glauben, daß das Problem der Beschaffung von Arbeit für [Umsiedler] unter dem kapitalis- tischen System nicht in der Form gelöst werden kann [...]. Wir sind also der Auffassung, daß die Umsiedlung eine Notwendigkeit ist [...].Wenn hier der Sprecher der CDU das schöne, stolze und richtige Wort geprägt hat ‘Unser Haus ist deutsch’ , wenn er am Schluß g...
work page 2009
-
[2]
Diese Zahl wächst seit 2015 durch die Zuwan- derung von Gering- und Unqualifizierten rasant
“[...] jährlich [verlassen] etwa 140 000 Hochkompetente [...] dieses Land und im Gegenzug nach derzeitigem Stand um die 200 000 sogenannte Flüchtlinge hereinkommen, von denen nicht nur kaum einer hochkompe- tent ist, sondern viele Analphabeten sind.In Deutschland, meine Damen und Herren, lebt bereits eine große soziale Unterschicht [...]. Diese Zahl wächs...
work page 2015
-
[3]
“[...] Die gute Nachricht ist: Der Politikwechsel hat begonnen. [...]Die illegale Migration der letzten zehn Jahre gefährdet die politische Sta- bilität Deutschlands und Europas.[...] Unsere Städte und Gemeinden, die Landkreise: Alle sind über dem Limit. In den Kitas, in den Schulen, auf dem Wohnungsmarkt, bei der Sicherheit an den Bahnhöfen und auf den M...
work page 2009
-
[4]
Damit betreibt sie eine gefährliche und spaltende Gleichsetzung
“[...]Migration wird pauschal als sicherheit- spolitisches Problem dargestellt.Die AfD führt die sogenannte gescheiterte Migrationspolitik in einer Liste von Krisen auf – neben dem Krieg in der Ukraine, der Coronapandemie und der hybriden Bedrohung. Damit betreibt sie eine gefährliche und spaltende Gleichsetzung. Sie erk- lärt das bloße Vorhandensein von ...
work page 2009
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.