Cross-Lingual Sentiment Misalignment: Auditing Multilingual Language Models for Inversion Risk, Dialectal Representation, and Affective Stability

Nusrat Jahan Lia; Shubhashis Roy Dipta

arxiv: 2602.17469 · v2 · submitted 2026-02-19 · 💻 cs.CL · cs.HC

Cross-Lingual Sentiment Misalignment: Auditing Multilingual Language Models for Inversion Risk, Dialectal Representation, and Affective Stability

Nusrat Jahan Lia , Shubhashis Roy Dipta This is my paper

Pith reviewed 2026-05-15 20:58 UTC · model grok-4.3

classification 💻 cs.CL cs.HC

keywords cross-lingual sentimentmultilingual transformerssentiment inversionBengalidialect biasaffective stability

0 comments

The pith

A compressed multilingual model inverts sentiment polarity in 28.7 percent of Bengali-English sentence pairs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper audits four multilingual transformer models on parallel Bengali-English sentences to measure how reliably they keep positive or negative meaning intact across languages. It reports that one compressed architecture flips the sentiment label in 28.7 percent of cases and systematically changes the emotional intensity of Bengali text relative to its English match. The study also finds that a regional model makes 57 percent more alignment errors on formal Bengali than on modern colloquial text. These findings matter because the same encoders often act as safety classifiers or reward models inside larger language systems.

Core claim

Multilingual transformers exhibit measurable cross-lingual sentiment misalignment, with a compressed model reaching a 28.7 percent Sentiment Inversion Rate on dialect-stratified parallel Bengali-English pairs, plus asymmetric affective weighting and a modern bias that raises alignment error on formal registers.

What carries the argument

A controlled benchmarking framework that runs four transformer models on parallel sentence pairs stratified by dialect and computes Sentiment Inversion Rate plus alignment error.

Load-bearing premise

The parallel Bengali-English sentence pairs are perfectly sentiment-aligned and representative of real-world usage across dialects.

What would settle it

Human-annotated sentiment labels on the identical parallel sentence pairs compared against the models' output labels to verify or refute the reported 28.7 percent inversion rate.

Figures

Figures reproduced from arXiv: 2602.17469 by Nusrat Jahan Lia, Shubhashis Roy Dipta.

**Figure 2.** Figure 2: Sentiment Inversion Rate Across Models • Compression and Elevated Inversion Risk. The distilled multilingual architecture (mDistilBERT) exhibits the highest mean alignment divergence and lowest robustness (see Table 1). Nearly one in three sentence pairs 5 [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗

**Figure 3.** Figure 3: Distribution of Alignment Error Density 4.2 Finding 2: Representational Harm and the Dialectal Gap To evaluate robustness under Bengali diglossia, we compare alignment divergence across colloquial (Cholito) and formal (Sadhu) variants. A multilingual system should maintain stable cross-lingual 6 [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗

**Figure 4.** Figure 4: Directional Bias in Sentiment Scores (English [PITH_FULL_IMAGE:figures/full_fig_p010_4.png] view at source ↗

**Figure 5.** Figure 5: Illustrative case study validating Asymmetric [PITH_FULL_IMAGE:figures/full_fig_p010_5.png] view at source ↗

read the original abstract

Recent advances in multilingual representation learning aim to bridge the performance gap between high- and low-resource languages, yet their ability to preserve affective meaning across languages remains underexplored, particularly for underrepresented languages like Bengali. This research addresses cross-lingual sentiment misalignment between Bengali and English by introducing a controlled benchmarking framework evaluating four multilingual transformer models on parallel Bengali-English sentence pairs, stratified by dialect, to assess their representational stability. We demonstrate that a compressed model architecture exhibits a 28.7% "Sentiment Inversion Rate," fundamentally misinterpreting positive semantics as negative (or vice versa). Consequently, we identify a cross-lingual sentiment skew that we call "Asymmetric Empathy," where models systematically dampen or artificially amplify the affective weight of Bengali text relative to its exact English counterpart. Finally, we expose a key vulnerability regarding dialectal representation: a "Modern Bias" in the regional model, which exhibits a 57% increase in alignment error when processing the formal Bengali register compared to modern colloquial text. As foundational encoders continue to serve as safety classifiers and reward models for LLM pipelines, cross-lingual reliability becomes a critical concern. We therefore advocate for the integration of "Affective Stability" metrics into future cross-lingual benchmarks to detect and penalize polarity inversions, particularly in low-resource settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

The 28.7% inversion rate and new terms like Asymmetric Empathy rest on unverified claims that the Bengali-English pairs match in sentiment polarity.

read the letter

The paper's main takeaway is a reported 28.7% sentiment inversion rate in one compressed multilingual model on Bengali-English pairs, plus two new labels for skew and dialect problems. It flags a practical risk for models used in safety filters or reward modeling for low-resource languages. That focus on affective stability in deployed systems is the useful part. The work applies standard parallel-sentence testing to Bengali and adds concrete percentages plus the dialect split, which is a straightforward extension rather than a new method. It does draw attention to an area that matters for real-world use. The soft spots are bigger. The central numbers come without sample sizes, error bars, or statistical tests in the abstract, and the stress-test note is right that no independent native-speaker check on pair polarity is described. If even a modest fraction of the pairs have translation-induced sentiment shifts, the inversion rate becomes an upper bound on model error instead of a clean measure of misalignment. The Asymmetric Empathy and Modern Bias claims inherit the same gap. Without the full methods, data release, or annotation details, the evidence stays thin. This is the kind of paper that could interest people building or auditing multilingual safety components, but only after the data and verification steps are shown. It is not ready for a serious referee in its current form because the load-bearing assumption about pair equivalence is untested. I would not cite it yet and would not bring it to a reading group until the methods are filled in.

Referee Report

2 major / 2 minor

Summary. The paper introduces a controlled benchmarking framework to evaluate cross-lingual sentiment alignment in four multilingual transformer models using parallel Bengali-English sentence pairs stratified by dialect. It reports a 28.7% Sentiment Inversion Rate in a compressed model, identifies an 'Asymmetric Empathy' skew where models dampen or amplify Bengali affective weight relative to English counterparts, and documents a 'Modern Bias' with 57% higher alignment error on formal Bengali registers. The work advocates incorporating 'Affective Stability' metrics into future cross-lingual benchmarks.

Significance. If the empirical measurements hold after verification, the results would highlight a practically important failure mode in multilingual encoders used as safety classifiers or reward models. The specific metrics (Sentiment Inversion Rate, Asymmetric Empathy, Modern Bias) and focus on a low-resource language like Bengali provide a concrete starting point for improving affective reliability in cross-lingual settings.

major comments (2)

[Benchmarking Framework] Benchmarking Framework (described in Abstract and Methods): The 28.7% Sentiment Inversion Rate is computed by treating parallel Bengali-English pairs as sentiment-identical by construction, yet no section reports independent polarity annotation by native speakers, inter-annotator agreement, or a check that dialectal variants preserve affective polarity. If 10-15% of pairs contain translation-induced shifts, the reported rate becomes an upper bound on model error rather than a direct measure of misalignment.
[Results] Results and Abstract: Specific percentages (28.7% inversion rate, 57% increase in alignment error) are presented without sample sizes, statistical tests, baseline comparisons, or error bars. This prevents verification of the central quantitative claims and undermines the cross-dialect and cross-model comparisons.

minor comments (2)

[Introduction] The terms 'Asymmetric Empathy' and 'Modern Bias' are introduced without formal definitions or equations; a brief operationalization in the Methods section would improve clarity.
[Methods] The manuscript would benefit from an explicit statement of the total number of sentence pairs and the dialect stratification procedure.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive comments on our manuscript. The feedback highlights important aspects of methodological transparency and statistical rigor that we will address in revision. Below we respond point by point to the major comments.

read point-by-point responses

Referee: [Benchmarking Framework] Benchmarking Framework (described in Abstract and Methods): The 28.7% Sentiment Inversion Rate is computed by treating parallel Bengali-English pairs as sentiment-identical by construction, yet no section reports independent polarity annotation by native speakers, inter-annotator agreement, or a check that dialectal variants preserve affective polarity. If 10-15% of pairs contain translation-induced shifts, the reported rate becomes an upper bound on model error rather than a direct measure of misalignment.

Authors: We agree that explicit validation of affective polarity preservation would strengthen the interpretation. The parallel pairs were sourced from established, publicly documented corpora (OPUS and related resources) whose translations are generally accepted as sentiment-preserving in prior literature, but the manuscript does not include new native-speaker polarity annotations or inter-annotator agreement statistics for this dataset. In the revised version we will add a Methods subsection that (a) cites the exact data sources and any pre-existing validation, (b) explicitly states the assumption of polarity equivalence, and (c) acknowledges the possibility of translation-induced shifts as a limitation. We will also note that the reported inversion rate should be interpreted as model behavior on these particular pairs rather than an absolute measure of misalignment. This clarification does not alter the empirical observations but improves transparency. revision: yes
Referee: [Results] Results and Abstract: Specific percentages (28.7% inversion rate, 57% increase in alignment error) are presented without sample sizes, statistical tests, baseline comparisons, or error bars. This prevents verification of the central quantitative claims and undermines the cross-dialect and cross-model comparisons.

Authors: We accept this criticism. While the full experimental section contains the underlying sample sizes (approximately 5,000 sentence pairs per dialect stratum across the four models), these figures and associated statistical details were not restated prominently in the abstract or summarized results. In revision we will (a) report exact sample sizes, (b) add error bars or confidence intervals, (c) include appropriate statistical tests (e.g., McNemar’s test for paired polarity comparisons and paired t-tests for alignment error differences), and (d) provide baseline comparisons against monolingual English and Bengali models. These additions will enable direct verification of all quantitative claims and strengthen the cross-dialect and cross-model analyses. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical benchmarking of model outputs

full rationale

The paper's core results consist of direct empirical measurements of multilingual model predictions on parallel Bengali-English sentence pairs, including the reported 28.7% Sentiment Inversion Rate and derived statistics for Asymmetric Empathy and Modern Bias. No mathematical derivations, fitted parameters, or equations are presented that reduce these outputs to the inputs by construction. The framework evaluates existing transformer models without invoking self-citations, uniqueness theorems, or ansatzes that would create load-bearing circularity. New terminology describes observed patterns from the measurements rather than renaming results or smuggling assumptions. The analysis is therefore self-contained as an auditing study.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 1 invented entities

The central claims rest on the unstated premise that the chosen parallel sentences carry identical sentiment across languages and that the four models are representative of current multilingual practice.

axioms (1)

domain assumption Parallel Bengali-English sentences have identical ground-truth sentiment polarity
Required for the inversion-rate calculation to be meaningful

invented entities (1)

Asymmetric Empathy no independent evidence
purpose: Label for observed affective skew between Bengali and English representations
New descriptive term introduced to characterize the empirical pattern

pith-pipeline@v0.9.0 · 5542 in / 1286 out tokens · 38361 ms · 2026-05-15T20:58:08.825071+00:00 · methodology

discussion (0)

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/RealityFromDistinction.lean reality_from_one_distinction unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

We demonstrate that a compressed model architecture exhibits a 28.7% Sentiment Inversion Rate... Asymmetric Empathy... Modern Bias... Affective Stability Index AS(M)
IndisputableMonolith/Cost/FunctionalEquation.lean washburn_uniqueness_aczel unclear

?

unclear
Relation between the paper passage and the cited Recognition theorem.

Jcost and recognition-cost machinery never referenced

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

18 extracted references · 18 canonical work pages

[1]

Llm stabil- ity: A detailed analysis with some surprises.arXiv preprint arXiv:2408.04667,

work page arXiv
[2]

InFindings of the Association for Computational Linguistics: NAACL 2022, pages 1318–1327

Banglabert: Language model pretraining and bench- marks for low-resource language understanding eval- uation in bangla. InFindings of the Association for Computational Linguistics: NAACL 2022, pages 1318–1327. Anirban Bhowmick and Abhik Jana

work page 2022
[3]

arXiv preprint arXiv:2507.23248

Evaluating llms’ multilingual capabilities for ben- gali: Benchmark creation and performance analysis. arXiv preprint arXiv:2507.23248. Terra Blevins, Tomasz Limisiewicz, Suchin Gururan- gan, Margaret Li, Hila Gonen, Noah A Smith, and Luke Zettlemoyer

work page arXiv
[4]

InProceedings of the 2024 conference on empiri- cal methods in natural language processing, pages 10822–10837

Breaking the curse of multi- linguality with cross-lingual expert language models. InProceedings of the 2024 conference on empiri- cal methods in natural language processing, pages 10822–10837. Vadim Borisov, Samuel Gyamfi, and Richard H. Schreiber

work page 2024
[5]

colonial impulse

The“colonial impulse" of natural language processing: An audit of bengali sentiment analysis tools and their identity-based biases. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–18. Negar Foroutan, Paul Teiletche, Ayush Kumar Tarun, and Antoine Bosselut

work page 2024
[6]

Omer Goldman, Uri Shaham, Dan Malkin, Sivan Eiger, Avinatan Hassidim, Yossi Matias, Joshua Maynez, Adi Mayrav Gilady, Jason Riesa, Shruti Rijhwani, and 1 others

Revisiting multilingual data mixtures in language model pretraining.arXiv preprint arXiv:2510.25947. Omer Goldman, Uri Shaham, Dan Malkin, Sivan Eiger, Avinatan Hassidim, Yossi Matias, Joshua Maynez, Adi Mayrav Gilady, Jason Riesa, Shruti Rijhwani, and 1 others

work page arXiv
[7]

doi:10.48550/arXiv.2502.21228 , url =

Eclektic: a novel challenge set for evaluation of cross-lingual knowledge transfer. arXiv preprint arXiv:2502.21228. Wenhan Han, Yifan Zhang, Zhixun Chen, Binbin Liu, Haobin Lin, Bingni Zhang, Taifeng Wang, Mykola Pechenizkiy, Meng Fang, and Yin Zheng

work page arXiv
[8]

Md Nesarul Hoque, Umme Salma, Md Jamal Uddin, Md Martuza Ahamad, and Sakifa Aktar

Mubench: Assessment of multilingual capabilities of large language models across 61 languages.arXiv preprint arXiv:2506.19468. Md Nesarul Hoque, Umme Salma, Md Jamal Uddin, Md Martuza Ahamad, and Sakifa Aktar

work page arXiv
[9]

Mohsinul Kabir, Mohammed Saidul Islam, Md Tah- mid Rahman Laskar, Mir Tafseer Nayeem, M Saiful Bari, and Enamul Hoque

Bnmmlu: Measuring massive multitask lan- guage understanding in bengali.arXiv preprint arXiv:2505.18951. Mohsinul Kabir, Mohammed Saidul Islam, Md Tah- mid Rahman Laskar, Mir Tafseer Nayeem, M Saiful Bari, and Enamul Hoque

work page arXiv
[10]

InPro- ceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2238–

Benllm-eval: A com- prehensive evaluation into the potentials and pitfalls of large language models on bengali nlp. InPro- ceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2238–

work page 2024
[11]

InFindings of the association for compu- tational linguistics: EMNLP 2020, pages 4948–4961

Indicnlpsuite: Monolingual corpora, evaluation benchmarks and 9 pre-trained multilingual language models for indian languages. InFindings of the association for compu- tational linguistics: EMNLP 2020, pages 4948–4961. Zhiwei Liu, Lingfei Qian, Qianqian Xie, Jimin Huang, Kailai Yang, and Sophia Ananiadou

work page 2020
[12]

Hemal Mahmud, Hasan Mahmud, and Mohammad Rifat Ahmmad Rashid

Mmaff- ben: a multilingual and multimodal affective analy- sis benchmark for evaluating llms and vlms.arXiv preprint arXiv:2505.24423. Hemal Mahmud, Hasan Mahmud, and Mohammad Rifat Ahmmad Rashid

work page arXiv
[13]

Md Saef Ullah Miah, Md Mohsin Kabir, Talha Bin Sarwar, Mejdl Safran, Sultan Alfarhood, and Md F Mridha

Enhancing sen- timent analysis in bengali texts: A hybrid ap- proach using lexicon-based algorithm and pre- trained language model bangla-bert.arXiv preprint arXiv:2411.19584. Md Saef Ullah Miah, Md Mohsin Kabir, Talha Bin Sarwar, Mejdl Safran, Sultan Alfarhood, and Md F Mridha

work page arXiv
[14]

Alberto Poncelas, Pintu Lohar, James Hadley, and Andy Way

Reasoning beyond labels: Measuring llm sentiment in low- resource, culturally nuanced contexts.arXiv preprint arXiv:2508.04199. Alberto Poncelas, Pintu Lohar, James Hadley, and Andy Way

work page arXiv
[15]

InProceedings of the 2021 Conference on Empirical Methods in Natural Lan- guage Processing, pages 10215–10245

Xtreme-r: Towards more challenging and nuanced multilingual evaluation. InProceedings of the 2021 Conference on Empirical Methods in Natural Lan- guage Processing, pages 10215–10245. Jayanta Sadhu, Maneesha Rani Saha, and Rifat Shahri- yar

work page 2021
[16]

InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 264–285

Lionguard 2: Building lightweight, data- efficient & localised multilingual content moderators. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 264–285. Azmine Toushik Wasi, Raima Islam, Mst Rafia Islam, Taki Hasan Rafi, and Dong-Kyu Chae

work page 2025
[17]

A Supplementary Figures and Tables This appendix contains the visualizations and table referenced in the main findings of the paper

Explor- ing bengali religious dialect biases in large language models with evaluation perspectives.arXiv preprint arXiv:2407.18376. A Supplementary Figures and Tables This appendix contains the visualizations and table referenced in the main findings of the paper. Model Name Repository (HuggingFace) XLM-Tcardiffnlp/XLM-Toberta-sentiment (Barbieri et al.,

work page arXiv
[18]

It utilizes synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts (Borisov et al., 2025)

IndicBERTai4bharat/IndicBERTv2-sentiment Tabularistabularisai/multilingual-sentiment mDistilBERT lxyuan/distilbert-multilingual Table 4: Model Repository Mapping Note on Tabularis:This model is a fine-tuned version of the model distilbert/distilbert-base-multilingual-cased for multilingual sentiment analysis. It utilizes synthetic data from multiple sourc...

work page 2025

[1] [1]

Llm stabil- ity: A detailed analysis with some surprises.arXiv preprint arXiv:2408.04667,

work page arXiv

[2] [2]

InFindings of the Association for Computational Linguistics: NAACL 2022, pages 1318–1327

Banglabert: Language model pretraining and bench- marks for low-resource language understanding eval- uation in bangla. InFindings of the Association for Computational Linguistics: NAACL 2022, pages 1318–1327. Anirban Bhowmick and Abhik Jana

work page 2022

[3] [3]

arXiv preprint arXiv:2507.23248

Evaluating llms’ multilingual capabilities for ben- gali: Benchmark creation and performance analysis. arXiv preprint arXiv:2507.23248. Terra Blevins, Tomasz Limisiewicz, Suchin Gururan- gan, Margaret Li, Hila Gonen, Noah A Smith, and Luke Zettlemoyer

work page arXiv

[4] [4]

InProceedings of the 2024 conference on empiri- cal methods in natural language processing, pages 10822–10837

Breaking the curse of multi- linguality with cross-lingual expert language models. InProceedings of the 2024 conference on empiri- cal methods in natural language processing, pages 10822–10837. Vadim Borisov, Samuel Gyamfi, and Richard H. Schreiber

work page 2024

[5] [5]

colonial impulse

The“colonial impulse" of natural language processing: An audit of bengali sentiment analysis tools and their identity-based biases. In Proceedings of the 2024 CHI Conference on Human Factors in Computing Systems, pages 1–18. Negar Foroutan, Paul Teiletche, Ayush Kumar Tarun, and Antoine Bosselut

work page 2024

[6] [6]

Omer Goldman, Uri Shaham, Dan Malkin, Sivan Eiger, Avinatan Hassidim, Yossi Matias, Joshua Maynez, Adi Mayrav Gilady, Jason Riesa, Shruti Rijhwani, and 1 others

Revisiting multilingual data mixtures in language model pretraining.arXiv preprint arXiv:2510.25947. Omer Goldman, Uri Shaham, Dan Malkin, Sivan Eiger, Avinatan Hassidim, Yossi Matias, Joshua Maynez, Adi Mayrav Gilady, Jason Riesa, Shruti Rijhwani, and 1 others

work page arXiv

[7] [7]

doi:10.48550/arXiv.2502.21228 , url =

Eclektic: a novel challenge set for evaluation of cross-lingual knowledge transfer. arXiv preprint arXiv:2502.21228. Wenhan Han, Yifan Zhang, Zhixun Chen, Binbin Liu, Haobin Lin, Bingni Zhang, Taifeng Wang, Mykola Pechenizkiy, Meng Fang, and Yin Zheng

work page arXiv

[8] [8]

Md Nesarul Hoque, Umme Salma, Md Jamal Uddin, Md Martuza Ahamad, and Sakifa Aktar

Mubench: Assessment of multilingual capabilities of large language models across 61 languages.arXiv preprint arXiv:2506.19468. Md Nesarul Hoque, Umme Salma, Md Jamal Uddin, Md Martuza Ahamad, and Sakifa Aktar

work page arXiv

[9] [9]

Mohsinul Kabir, Mohammed Saidul Islam, Md Tah- mid Rahman Laskar, Mir Tafseer Nayeem, M Saiful Bari, and Enamul Hoque

Bnmmlu: Measuring massive multitask lan- guage understanding in bengali.arXiv preprint arXiv:2505.18951. Mohsinul Kabir, Mohammed Saidul Islam, Md Tah- mid Rahman Laskar, Mir Tafseer Nayeem, M Saiful Bari, and Enamul Hoque

work page arXiv

[10] [10]

InPro- ceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2238–

Benllm-eval: A com- prehensive evaluation into the potentials and pitfalls of large language models on bengali nlp. InPro- ceedings of the 2024 Joint International Conference on Computational Linguistics, Language Resources and Evaluation (LREC-COLING 2024), pages 2238–

work page 2024

[11] [11]

InFindings of the association for compu- tational linguistics: EMNLP 2020, pages 4948–4961

Indicnlpsuite: Monolingual corpora, evaluation benchmarks and 9 pre-trained multilingual language models for indian languages. InFindings of the association for compu- tational linguistics: EMNLP 2020, pages 4948–4961. Zhiwei Liu, Lingfei Qian, Qianqian Xie, Jimin Huang, Kailai Yang, and Sophia Ananiadou

work page 2020

[12] [12]

Hemal Mahmud, Hasan Mahmud, and Mohammad Rifat Ahmmad Rashid

Mmaff- ben: a multilingual and multimodal affective analy- sis benchmark for evaluating llms and vlms.arXiv preprint arXiv:2505.24423. Hemal Mahmud, Hasan Mahmud, and Mohammad Rifat Ahmmad Rashid

work page arXiv

[13] [13]

Md Saef Ullah Miah, Md Mohsin Kabir, Talha Bin Sarwar, Mejdl Safran, Sultan Alfarhood, and Md F Mridha

Enhancing sen- timent analysis in bengali texts: A hybrid ap- proach using lexicon-based algorithm and pre- trained language model bangla-bert.arXiv preprint arXiv:2411.19584. Md Saef Ullah Miah, Md Mohsin Kabir, Talha Bin Sarwar, Mejdl Safran, Sultan Alfarhood, and Md F Mridha

work page arXiv

[14] [14]

Alberto Poncelas, Pintu Lohar, James Hadley, and Andy Way

Reasoning beyond labels: Measuring llm sentiment in low- resource, culturally nuanced contexts.arXiv preprint arXiv:2508.04199. Alberto Poncelas, Pintu Lohar, James Hadley, and Andy Way

work page arXiv

[15] [15]

InProceedings of the 2021 Conference on Empirical Methods in Natural Lan- guage Processing, pages 10215–10245

Xtreme-r: Towards more challenging and nuanced multilingual evaluation. InProceedings of the 2021 Conference on Empirical Methods in Natural Lan- guage Processing, pages 10215–10245. Jayanta Sadhu, Maneesha Rani Saha, and Rifat Shahri- yar

work page 2021

[16] [16]

InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 264–285

Lionguard 2: Building lightweight, data- efficient & localised multilingual content moderators. InProceedings of the 2025 Conference on Empirical Methods in Natural Language Processing: System Demonstrations, pages 264–285. Azmine Toushik Wasi, Raima Islam, Mst Rafia Islam, Taki Hasan Rafi, and Dong-Kyu Chae

work page 2025

[17] [17]

A Supplementary Figures and Tables This appendix contains the visualizations and table referenced in the main findings of the paper

Explor- ing bengali religious dialect biases in large language models with evaluation perspectives.arXiv preprint arXiv:2407.18376. A Supplementary Figures and Tables This appendix contains the visualizations and table referenced in the main findings of the paper. Model Name Repository (HuggingFace) XLM-Tcardiffnlp/XLM-Toberta-sentiment (Barbieri et al.,

work page arXiv

[18] [18]

It utilizes synthetic data from multiple sources to achieve robust performance across different languages and cultural contexts (Borisov et al., 2025)

IndicBERTai4bharat/IndicBERTv2-sentiment Tabularistabularisai/multilingual-sentiment mDistilBERT lxyuan/distilbert-multilingual Table 4: Model Repository Mapping Note on Tabularis:This model is a fine-tuned version of the model distilbert/distilbert-base-multilingual-cased for multilingual sentiment analysis. It utilizes synthetic data from multiple sourc...

work page 2025