
arxiv: 2604.27661 · v1 · submitted 2026-04-30 · 💻 cs.CL

Language Ideologies in a Multilingual Society: An LLM-based Analysis of Luxembourgish News Comments

Pith reviewed 2026-05-07 06:14 UTC · model grok-4.3

classification 💻 cs.CL
keywords language ideologies · Luxembourgish · large language models · multilingual society · ideological annotation · news comments · low-resource languages · discourse analysis

The pith

Large language models can practically identify language ideological content in Luxembourgish news comments despite incomplete optimization for multi-class tasks.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether large language models can detect language ideologies in user comments on Luxembourgish news sites. In Luxembourg's multilingual setting these ideologies go beyond simple preferences and instead reflect how people build identities and social belonging through language. The authors first label a set of comments by hand with ideological categories, then prompt LLMs to reproduce those labels under different conditions. They also check whether translating the Luxembourgish text into better-represented languages raises the models' accuracy. A reader would care because reliable automated detection would let researchers study discourse and identity at a scale that manual coding alone cannot reach, especially for small languages that are poorly covered in current training data.

Core claim

After manually annotating Luxembourgish news comments with predefined language-ideology categories, the authors evaluate large language models on their ability to replicate the human labels. The models prove useful for spotting ideological content even though they fall short of full optimization on the multi-class version of the task. The study further tests whether machine translation of the original Luxembourgish comments into high-resource languages improves classification performance for this low-resource language.

What carries the argument

The pipeline of hand-labeled ideological categories on Luxembourgish comments followed by LLM prompting for classification, with optional machine translation of the input text.
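
The shape of that pipeline can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the authors' code: `annotate`, `raw_agreement`, `toy_classify`, and `toy_translate` are hypothetical names, the keyword rule stands in for a prompted LLM, the identity function stands in for a machine-translation system, and the two sentences and their labels are invented.

```python
from typing import Callable, List, Optional


def annotate(
    sentences: List[str],
    classify: Callable[[str], str],
    translate: Optional[Callable[[str], str]] = None,
) -> List[str]:
    """Label each sentence, optionally translating it before classification."""
    labels = []
    for sentence in sentences:
        text = translate(sentence) if translate is not None else sentence
        labels.append(classify(text))
    return labels


def raw_agreement(gold: List[str], predicted: List[str]) -> float:
    """Fraction of items on which the predicted label equals the human label."""
    if len(gold) != len(predicted) or not gold:
        raise ValueError("label lists must be non-empty and of equal length")
    return sum(g == p for g, p in zip(gold, predicted)) / len(gold)


# Hypothetical stand-ins: a real setup would call an LLM and an MT system here.
def toy_classify(text: str) -> str:
    return "ideology" if "luxembourgish" in text.lower() else "none"


def toy_translate(text: str) -> str:
    return text  # identity placeholder for a translation step


# Invented toy inputs and human labels, for illustration only.
sentences = ["Everyone here should speak Luxembourgish.", "Have a nice evening."]
gold = ["ideology", "none"]

direct = annotate(sentences, toy_classify)
via_translation = annotate(sentences, toy_classify, translate=toy_translate)
print(raw_agreement(gold, direct), raw_agreement(gold, via_translation))
```

Passing the classifier and the translator as plain callables keeps the comparison between the original-language and translated conditions down to a single argument, which is the contrast the paper's translation experiment turns on.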

If this is right

  • LLMs can serve as scalable assistants for researchers studying how language ideologies shape identity in multilingual societies.
  • Translation to high-resource languages offers one workable route for applying these models to low-resource language data.
  • Even imperfect multi-class performance still yields usable signals for identifying ideological content in online discourse.
  • The approach supports larger-scale analysis of cultural and social meanings carried by language choices in places like Luxembourg.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Similar LLM-assisted annotation could be tested on comments or texts from other small or minority languages to map identity construction patterns.
  • Combining the models' output with targeted human review might produce richer qualitative insights into social belonging than either method alone.
  • If prompt engineering or fine-tuning raises accuracy, real-time monitoring of public discourse for ideological signals could become feasible in multilingual regions.

Load-bearing premise

The manually chosen ideological categories and the human annotations used as ground truth accurately and exhaustively capture the language ideologies present in the comments without major annotator bias or category overlap.

What would settle it

A fresh round of independent human annotations that produces substantially different category assignments for the same comments, after which the LLMs show markedly lower agreement with the new labels.

Figures

Figures reproduced from arXiv: 2604.27661 by Alistair Plum, Christoph Purschke, Emilia Milano, Yves Scherrer.

Figure 1: Prompt 4, comprising instructions, definitions, and examples.
Figure 2: JSON representation of the required schema.
Figure 3: Confusion matrix per annotation with GPT-5: (a) Lang subset; (b) Lang and NotLang subsets.
Figure 5: Confusion matrix per annotation with Magistral: (a) Lang subset; (b) Lang and NotLang subsets.

Original abstract

Detecting language ideologies is a valuable yet complex task for understanding how identities are constructed through discourse. In Luxembourg's multicultural and multilingual society, language ideologies reflect more than simple preferences: they carry deep cultural and social meanings, shaping identities and social belonging. Following recent developments in applying Natural Language Processing tools to linguistics and social science, this paper explores the potential of large language models to assist in the detection of language ideologies. We manually annotate a corpus of user comments in Luxembourgish with predefined ideological categories and then evaluate the performance of large language models under varying prompt conditions to assess their ability to replicate these human annotations. Since Luxembourgish is a small language and poorly represented in the LLMs' training data, we also investigate whether machine-translating the data to high-resource languages increases performance on the ideology detection task. Our findings suggest that, while LLMs are not yet fully optimized for a multi-class ideological annotation task, they are practical tools to identify language ideological content.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 1 minor

Summary. The paper claims that LLMs can assist in detecting language ideologies in Luxembourgish news comments from a multilingual society. It describes manually annotating a corpus of user comments with predefined ideological categories, then evaluating LLMs under varying prompt conditions (including a machine-translation condition to high-resource languages) to measure how well they replicate the human annotations. The authors conclude that LLMs are not yet fully optimized for multi-class ideological annotation but remain practical tools for identifying language-ideological content.

Significance. If the central empirical comparison holds after proper validation, the work would be significant for computational sociolinguistics and digital humanities. It addresses a culturally situated task in an under-resourced language (Luxembourgish) within a multilingual setting, testing whether LLMs plus translation can scale ideological analysis where manual methods are costly. The translation experiment is a concrete contribution to handling low-resource scenarios in LLM-based social-science applications.

major comments (3)
  1. [Methods / annotation subsection] No inter-annotator agreement statistics (Cohen’s κ, Fleiss’ κ, or equivalent), annotator training details, or disagreement-resolution procedure are reported. Because the entire evaluation of LLM performance is benchmarked against these human labels as ground truth, the absence of reliability metrics means the reported agreement between LLMs and humans cannot be interpreted as evidence of model capability versus agreement with a potentially noisy label set.
  2. [Results section] The abstract states that LLMs are 'practical tools' for identifying language-ideological content, yet no quantitative metrics (accuracy, macro-F1, per-class precision/recall), confusion matrices, or error analysis are supplied for the prompt-variation or translation conditions. Without these numbers it is impossible to assess whether the observed performance actually supports the 'practical' claim or merely reflects chance-level or category-imbalanced behavior.
  3. [Category definition / annotation scheme] The paper does not demonstrate that the manually predefined ideological categories are exhaustive or minimally overlapping for Luxembourgish comments (e.g., whether dimensions such as purism, identity construction, or economic utility are covered or bleed into one another). Because the LLM evaluation is defined relative to these categories, any incompleteness or overlap directly limits the generalizability of the 'practical tools' conclusion.
minor comments (1)
  1. [Abstract] The abstract would be more informative if it included at least one key quantitative result (e.g., best F1 or accuracy under the translation condition) to ground the qualitative claim that LLMs are 'practical tools'.
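
The inter-annotator statistic requested in major comment 1 is straightforward to compute once two annotators have labeled the same items. The following is a minimal sketch of Cohen's κ for two annotators, written with the standard library only; the function name `cohen_kappa` and the toy labels `x` and `y` are illustrative assumptions and do not come from the paper.

```python
from collections import Counter
from typing import List


def cohen_kappa(annotator_a: List[str], annotator_b: List[str]) -> float:
    """Cohen's kappa: (p_o - p_e) / (1 - p_e) for two annotators on the same items."""
    if len(annotator_a) != len(annotator_b) or not annotator_a:
        raise ValueError("label lists must be non-empty and of equal length")
    n = len(annotator_a)

    # Observed agreement: share of items given the same label by both annotators.
    p_o = sum(a == b for a, b in zip(annotator_a, annotator_b)) / n

    # Expected chance agreement from each annotator's marginal label frequencies.
    counts_a, counts_b = Counter(annotator_a), Counter(annotator_b)
    labels = set(counts_a) | set(counts_b)
    p_e = sum(counts_a[label] * counts_b[label] for label in labels) / (n * n)

    if p_e == 1.0:
        # Both annotators used one identical label throughout; kappa is undefined.
        return float("nan")
    return (p_o - p_e) / (1.0 - p_e)


# Toy labels (invented): the annotators agree on 3 of 4 items.
first = ["x", "x", "y", "y"]
second = ["x", "y", "y", "y"]
print(cohen_kappa(first, second))
```

Unlike raw agreement, κ discounts the matches that two annotators would produce by chance given how often each of them uses each label, which is why it is the more informative figure when one category dominates.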

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive feedback. The comments identify key areas where additional methodological details and quantitative results will improve the paper's clarity and rigor. We will make the suggested revisions and respond to each major comment below.

Point-by-point responses
  1. Referee: [Methods / annotation subsection] No inter-annotator agreement statistics (Cohen’s κ, Fleiss’ κ, or equivalent), annotator training details, or disagreement-resolution procedure are reported. Because the entire evaluation of LLM performance is benchmarked against these human labels as ground truth, the absence of reliability metrics means the reported agreement between LLMs and humans cannot be interpreted as evidence of model capability versus agreement with a potentially noisy label set.

    Authors: We agree that reliability metrics for the human annotations are essential to interpret the LLM results. We will revise the Methods section to include a full description of the annotation guidelines, annotator qualifications and training, and the disagreement-resolution procedure (iterative discussion until consensus). To provide a quantitative reliability measure, we will re-annotate a random 20% subset of the corpus with a second independent annotator and report Cohen’s κ. This addition will allow readers to assess label noise and strengthen the grounding of the LLM evaluation.

    Revision: yes

  2. Referee: [Results section] The abstract states that LLMs are 'practical tools' for identifying language-ideological content, yet no quantitative metrics (accuracy, macro-F1, per-class precision/recall), confusion matrices, or error analysis are supplied for the prompt-variation or translation conditions. Without these numbers it is impossible to assess whether the observed performance actually supports the 'practical' claim or merely reflects chance-level or category-imbalanced behavior.

    Authors: We acknowledge that the current Results section emphasizes qualitative observations and does not present the full quantitative evaluation. We will add a summary table reporting accuracy, macro-F1, per-class precision and recall for every prompt condition and the machine-translation setup. We will also include confusion matrices (in the main text or appendix) and a dedicated error-analysis subsection that examines misclassifications, class imbalance effects, and whether performance exceeds chance baselines. These revisions will directly support and quantify the 'practical tools' claim.

    Revision: yes

  3. Referee: [Category definition / annotation scheme] The paper does not demonstrate that the manually predefined ideological categories are exhaustive or minimally overlapping for Luxembourgish comments (e.g., whether dimensions such as purism, identity construction, or economic utility are covered or bleed into one another). Because the LLM evaluation is defined relative to these categories, any incompleteness or overlap directly limits the generalizability of the 'practical tools' conclusion.

    Authors: The categories were derived from established sociolinguistic literature on Luxembourgish and multilingual language ideologies. We will add a dedicated subsection in Methods that (a) provides explicit definitions and corpus examples for each category, (b) discusses potential overlaps with justifications for maintaining the distinctions, and (c) argues for exhaustiveness within the news-comment genre while acknowledging limits on generalizability to other text types. This will clarify the scope of the 'practical tools' conclusion.

    Revision: yes
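
The metrics named in the second exchange above (accuracy, per-class precision and recall, macro-F1, and a chance-style baseline) can be illustrated with a short standard-library sketch. Everything here is an assumption made for illustration: the function names are hypothetical, the labels `a`, `b`, and `none` are placeholders, and the ten gold labels are invented to mimic a corpus in which most sentences carry no ideology.

```python
from collections import Counter
from typing import Dict, List, Tuple


def accuracy(gold: List[str], pred: List[str]) -> float:
    """Share of items whose predicted label matches the gold label."""
    return sum(g == p for g, p in zip(gold, pred)) / len(gold)


def per_class_scores(gold: List[str], pred: List[str]) -> Dict[str, Tuple[float, float, float]]:
    """Map each label to (precision, recall, F1); undefined ratios are set to 0.0."""
    scores = {}
    for label in sorted(set(gold) | set(pred)):
        tp = sum(g == label and p == label for g, p in zip(gold, pred))
        fp = sum(g != label and p == label for g, p in zip(gold, pred))
        fn = sum(g == label and p != label for g, p in zip(gold, pred))
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores[label] = (precision, recall, f1)
    return scores


def macro_f1(gold: List[str], pred: List[str]) -> float:
    """Unweighted mean of per-class F1, so rare classes count as much as frequent ones."""
    scores = per_class_scores(gold, pred)
    return sum(f1 for _, _, f1 in scores.values()) / len(scores)


def majority_baseline(gold: List[str]) -> List[str]:
    """Predict the most frequent gold label for every item."""
    top_label = Counter(gold).most_common(1)[0][0]
    return [top_label] * len(gold)


# Invented, imbalanced toy data: 8 of 10 items are "none".
gold = ["none"] * 8 + ["a", "b"]
baseline = majority_baseline(gold)
print(accuracy(gold, baseline), macro_f1(gold, baseline))
```

On data this imbalanced the majority baseline reaches high accuracy while its macro-F1 stays low, because the rare classes are never predicted. Reporting both figures next to such a baseline is one way to show whether a model's performance reflects more than category imbalance.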

Circularity Check

0 steps flagged

No circularity: purely empirical comparison of LLM outputs to human annotations

Full rationale

The paper describes a standard empirical pipeline: manual annotation of Luxembourgish comments using predefined ideological categories, followed by direct evaluation of LLM performance (under different prompts and with or without translation) against those human labels. It contains no equations, no fitted parameters, no predictions derived from the authors' own prior quantities, and no load-bearing self-citations that substitute for independent justification. The central claim rests on observable agreement between model and human labels rather than on any self-referential derivation or renaming of results. This is a self-contained, data-driven study in which outputs are not reduced to inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on two domain assumptions about annotation quality and category validity rather than mathematical free parameters or new entities.

axioms (2)
  • domain assumption: Human annotations using the predefined ideological categories constitute reliable ground truth for evaluating LLM performance.
    All reported LLM results are measured against these annotations; any systematic bias in the human labels would directly affect the 'practical tools' conclusion.
  • domain assumption: The chosen ideological categories are sufficient to represent the relevant language ideologies in Luxembourgish discourse.
    Both human and LLM annotation tasks are restricted to these categories; incomplete coverage would limit the generalizability of the findings.

pith-pipeline@v0.9.0 · 5469 in / 1386 out tokens · 74683 ms · 2026-05-07T06:14:10.191349+00:00 · methodology

