arxiv: 2606.08512 · v1 · pith:GNXJKOTYnew · submitted 2026-06-07 · 💻 cs.CY · cs.CL

Friend or Foe? Language as an ideological switch in open-weight LLMs under Russian disinformation stress

Pith reviewed 2026-06-27 17:59 UTC · model grok-4.3

classification 💻 cs.CY cs.CL
keywords LLMs · disinformation · fine-tuning · Russian language · Ukrainian language · war narratives · information sovereignty · hybrid warfare

The pith

In this audit, the Ukrainian-oriented LLM accepts Russian disinformation more readily than the Russian-oriented one when prompted in Russian.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests the assumption that fine-tuning an LLM for a specific cultural community makes it resistant to the narratives of that community's adversaries. It audits four open models that share the same base but were adapted for Ukrainian, Russian, and other post-Soviet languages, querying each in Ukrainian, Russian, and English about ten contested wartime narratives, including Crimea, "denazification", the "one people" thesis, and atrocity denial at Bucha and Mariupol. The results reveal a Fine-Tuning Paradox: the Ukrainian-oriented model shows the least resistance to Russian disinformation when prompted in Russian, while the Russian-oriented model rejects it most strongly. Corpus composition, language coverage, and prompt format shape outcomes more than the model's stated cultural target. The finding calls into question whether local adaptation reliably builds information sovereignty against hybrid warfare narratives.

Core claim

The central claim is that nominal cultural provenance does not predict resilience: the Ukrainian-oriented model exhibits the weakest resistance to Russian disinformation in Russian, while the Russian-oriented model shows the strongest rejection, with corpus composition, language coverage, and prompt format proving more decisive than assumed alignment.

What carries the argument

A controlled audit of four open LLMs built on a shared base model, run across three prompt languages and ten contested wartime narratives, which exposes the Fine-Tuning Paradox.
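
As a rough illustration of the audit's shape, the sketch below enumerates every combination of model, prompt language, and narrative. The model labels and narrative IDs are placeholders introduced here, not names taken from the paper; only the grid dimensions (four models, three prompt languages, ten narratives) come from the summary.

```python
from itertools import product

# Placeholder labels (assumptions): the summary describes the models only by
# orientation and lists just some of the ten narratives.
MODELS = ["ukrainian_oriented", "russian_oriented", "other_a", "other_b"]
PROMPT_LANGUAGES = ["uk", "ru", "en"]
NARRATIVES = [f"narrative_{i:02d}" for i in range(1, 11)]


def build_audit_grid(models, languages, narratives):
    """Return one record per (model, prompt language, narrative) cell."""
    return [
        {"model": m, "prompt_language": lang, "narrative": n}
        for m, lang, n in product(models, languages, narratives)
    ]


grid = build_audit_grid(MODELS, PROMPT_LANGUAGES, NARRATIVES)
print(len(grid))  # 4 models x 3 languages x 10 narratives = 120 cells
```

Each cell would then hold one or more coded model responses. Comparing cells that differ only in prompt language, or only in model, is what would let such an audit separate the two effects.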

If this is right

  • Prompt language influences model acceptance of disinformation more than the fine-tuning community's nominal orientation.
  • Corpus composition and multilingual coverage determine resistance to contested narratives more than cultural alignment.
  • Open models adapted for local languages do not automatically encode the political orientation of their target community.
  • The main risk to regional information sovereignty stems from the assumption that cultural fine-tuning guarantees resilience.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Audits of other language pairs or geopolitical narratives could test whether the same language-coverage dominance appears.
  • Providers may need to add explicit cross-lingual consistency checks beyond community-specific fine-tuning.
  • Balanced training data across adversarial viewpoints could reduce prompt-language sensitivity in future models.

Load-bearing premise

The four models differ primarily in their fine-tuning language and cultural target rather than in uncontrolled factors such as training data overlap, model scale, or safety tuning.

What would settle it

Demonstrating that the Ukrainian-oriented model rejects Russian disinformation more strongly than the Russian-oriented model under identical Russian prompts would falsify the reported paradox.
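
A minimal sketch of that check, assuming each model response has already been hand-coded with a label such as "rejection". The record layout, model labels, and toy data are hypothetical and only illustrate the comparison; they are not results from the paper.

```python
from collections import defaultdict


def rejection_rates(records, prompt_language):
    """Share of responses coded 'rejection' per model for one prompt language."""
    counts = defaultdict(lambda: [0, 0])  # model -> [rejections, total]
    for record in records:
        if record["prompt_language"] != prompt_language:
            continue
        counts[record["model"]][1] += 1
        if record["label"] == "rejection":
            counts[record["model"]][0] += 1
    return {model: rej / total for model, (rej, total) in counts.items()}


def paradox_contradicted(rates, ukrainian_model, russian_model):
    """True if the Ukrainian-oriented model rejects more often than the Russian-oriented one."""
    return rates[ukrainian_model] > rates[russian_model]


# Toy, made-up coded responses (assumption, for illustration only).
records = (
    [{"model": "ukrainian_oriented", "prompt_language": "ru", "label": "rejection"}] * 2
    + [{"model": "ukrainian_oriented", "prompt_language": "ru", "label": "acceptance"}] * 3
    + [{"model": "russian_oriented", "prompt_language": "ru", "label": "rejection"}] * 4
    + [{"model": "russian_oriented", "prompt_language": "ru", "label": "neutral"}] * 1
    + [{"model": "ukrainian_oriented", "prompt_language": "uk", "label": "rejection"}] * 5
)

rates = rejection_rates(records, "ru")
contradicted = paradox_contradicted(rates, "ukrainian_oriented", "russian_oriented")
print(rates, contradicted)
```

A real test would also need enough responses per model to rule out sampling noise, so an ordering check like this is a starting point rather than a verdict.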

Original abstract

As Russia's war against Ukraine extends into generative AI, large language models (LLMs) adapted for local post-Soviet languages are deployed in contested information environments. Policy and industry discourse assumes that culturally aligned adaptation encodes the political orientation of the target community: a Ukrainian-oriented model will resist Russian narratives, a Russian-oriented one will reinforce them. Does it? This article systematically disconfirms that assumption. We run a controlled audit of four openly available LLMs sharing a common base model but fine-tuned for different linguistic communities, querying them in Ukrainian, Russian and English across ten contested wartime narratives: Crimea, "denazification", the "one people" thesis, and atrocity denial at Bucha and Mariupol. The result is a Fine-Tuning Paradox: the Ukrainian-oriented model shows the weakest resistance to Russian disinformation in Russian, while the Russian-oriented one exhibits the strongest rejection. Corpus composition, language coverage and prompt format prove more decisive than nominal cultural provenance. We situate these findings within debates on hybrid warfare, digital sovereignty and post-imperial information orders, arguing that the principal threat to regional information sovereignty is not adversarial fine-tuning but the untested assumption that cultural alignment guarantees resilience.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper conducts a controlled empirical audit of four open-weight LLMs that share a common base but were fine-tuned for different post-Soviet linguistic communities. It queries the models in Ukrainian, Russian, and English on ten contested wartime narratives (including Crimea, "denazification", the "one people" thesis, and atrocity denial at Bucha and Mariupol) and reports a Fine-Tuning Paradox: the Ukrainian-oriented model shows the weakest resistance to Russian disinformation when prompted in Russian, while the Russian-oriented model exhibits the strongest rejection. The authors conclude that corpus composition, language coverage, and prompt format are more decisive than nominal cultural provenance, with implications for hybrid warfare and information sovereignty.

Significance. If the empirical patterns survive methodological scrutiny, the work would provide a concrete, falsifiable counterexample to the policy assumption that culturally aligned fine-tuning automatically confers resilience to adversarial narratives. It supplies an empirical test of claims made in digital sovereignty debates and highlights the risk of untested alignment assumptions, a substantive contribution to the cs.CY literature on LLMs in contested information environments.

major comments (2)
  1. [Methods] Methods section: The manuscript supplies no information on the number of queries per narrative, the response coding protocol, statistical tests performed, inter-rater reliability, or controls for prompt variation. Because the central claim (the Fine-Tuning Paradox and its attribution) rests entirely on these observed response differences, the absence of these details prevents assessment of whether the reported pattern is robust.
  2. [Results] Results / Discussion: The attribution of response differences primarily to corpus composition, language coverage, and prompt format (rather than model scale, safety tuning, or training-data overlap) is load-bearing for the claim that nominal cultural provenance is not decisive. No ablations, matching criteria, or explicit controls for these confounding variables are reported, leaving open the possibility that uncontrolled factors explain the observed ordering.
minor comments (1)
  1. [Abstract] The abstract and introduction use the term "controlled audit" without defining what controls were applied; a brief enumeration of the matching or balancing steps would improve clarity.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. The comments highlight important areas for improving transparency and strengthening the attribution of results. We address each major comment below and commit to revisions that enhance the manuscript without altering its core findings.

Point-by-point responses
  1. Referee: [Methods] Methods section: The manuscript supplies no information on the number of queries per narrative, the response coding protocol, statistical tests performed, inter-rater reliability, or controls for prompt variation. Because the central claim (the Fine-Tuning Paradox and its attribution) rests entirely on these observed response differences, the absence of these details prevents assessment of whether the reported pattern is robust.

    Authors: We agree that the current Methods section is insufficiently detailed for full reproducibility and robustness assessment. In the revised manuscript we will expand it to specify: five independent queries per narrative-language-model combination to mitigate stochasticity; a two-coder manual classification protocol into acceptance/rejection/neutral categories with disagreement resolution by consensus; chi-squared tests for distributional differences; Cohen's kappa inter-rater reliability (target >0.8); and fixed prompt templates that vary only the narrative content while holding structure, length, and instruction wording constant across conditions. These additions will directly enable evaluation of the reported patterns. revision: yes
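
The reliability and significance checks named in this response can be sketched in plain Python. This is an illustrative sketch under stated assumptions, not the authors' analysis code: the label lists and the contingency table are made-up toy data, and the function names are introduced here.

```python
from collections import Counter
from math import exp


def cohens_kappa(coder_a, coder_b):
    """Cohen's kappa for two coders labelling the same responses."""
    if len(coder_a) != len(coder_b) or not coder_a:
        raise ValueError("both coders must label the same non-empty set of responses")
    n = len(coder_a)
    observed = sum(a == b for a, b in zip(coder_a, coder_b)) / n
    freq_a, freq_b = Counter(coder_a), Counter(coder_b)
    # Chance agreement: product of each coder's marginal label proportions.
    expected = sum(freq_a[label] * freq_b[label] for label in freq_a) / (n * n)
    if expected == 1.0:
        return 1.0  # both coders used one identical label throughout
    return (observed - expected) / (1.0 - expected)


def chi_squared_independence(table):
    """Pearson chi-squared statistic and degrees of freedom for a count table."""
    row_totals = [sum(row) for row in table]
    col_totals = [sum(col) for col in zip(*table)]
    total = sum(row_totals)
    if 0 in row_totals or 0 in col_totals:
        raise ValueError("every row and column needs at least one observation")
    statistic = 0.0
    for i, row in enumerate(table):
        for j, observed in enumerate(row):
            expected = row_totals[i] * col_totals[j] / total
            statistic += (observed - expected) ** 2 / expected
    dof = (len(table) - 1) * (len(table[0]) - 1)
    return statistic, dof


def p_value_two_dof(statistic):
    """Upper-tail p-value, valid only for 2 degrees of freedom."""
    return exp(-statistic / 2.0)


# Toy data (assumption): ten responses coded by two coders.
coder_a = ["rejection"] * 4 + ["acceptance"] * 3 + ["neutral"] * 3
coder_b = ["rejection"] * 4 + ["acceptance"] * 2 + ["neutral"] * 4
kappa = cohens_kappa(coder_a, coder_b)

# Toy table (assumption): rows are two models, columns are
# acceptance / rejection / neutral counts under one prompt language.
table = [[30, 10, 10], [10, 30, 10]]
statistic, dof = chi_squared_independence(table)
p_value = p_value_two_dof(statistic)
print(round(kappa, 3), statistic, dof, p_value)
```

With three coding categories and two models the table has 2 degrees of freedom, which is why the closed-form p-value applies here. Larger tables would need a general chi-squared survival function, for example from SciPy.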

  2. Referee: [Results] Results / Discussion: The attribution of response differences primarily to corpus composition, language coverage, and prompt format (rather than model scale, safety tuning, or training-data overlap) is load-bearing for the claim that nominal cultural provenance is not decisive. No ablations, matching criteria, or explicit controls for these confounding variables are reported, leaving open the possibility that uncontrolled factors explain the observed ordering.

    Authors: The experimental design matches all models on the identical base architecture, thereby controlling for scale and core capabilities while varying only the fine-tuning corpus and language focus. This matching criterion is stated in the manuscript. We acknowledge, however, that provider disclosures do not permit full ablations on safety-tuning intensity or exact training-data overlap. In revision we will add an explicit Limitations subsection that (a) reiterates the base-model matching, (b) discusses the untestable confounders, and (c) qualifies the causal language around corpus and prompt factors. No new experiments are feasible at this stage, but the discussion will be expanded to address the referee's concern. revision: partial

Circularity Check

0 steps flagged

No circularity: purely empirical audit with independent measurements

full rationale

The paper reports a controlled empirical audit of four LLMs on ten wartime narratives, comparing how strongly each model rejects Russian disinformation across prompt languages. No equations, fitted parameters, derivations, or self-citations appear in the provided text. The central claim rests on observed model outputs rather than on any quantity defined by the authors' prior work or by construction. This matches the default expectation for non-circular empirical studies.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The abstract provides no information on free parameters, axioms, or invented entities; the counts above reflect that none could be identified, not that none exist.

