arxiv: 2604.02772 · v1 · submitted 2026-04-03 · 💻 cs.CL

Multiple-Debias: A Full-process Debiasing Method for Multilingual Pre-trained Language Models

Haoyu Liang, Peijian Zeng, Wentao Huang, Aimin Yang, Dong Zhou

Pith reviewed 2026-05-13 20:28 UTC · model grok-4.3

classification 💻 cs.CL

keywords debiasing · multilingual pre-trained language models · bias mitigation · counterfactual data augmentation · self-debiasing · gender bias · racial bias · religious bias


The pith

Multilingual debiasing via Multiple-Debias reduces gender, racial, and religious biases in multilingual pre-trained language models more effectively than monolingual methods.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces Multiple-Debias, a full-process method for reducing biases in multilingual pre-trained language models. It applies multilingual counterfactual data augmentation and multilingual self-debiasing during both pre-processing and post-processing stages, combined with parameter-efficient fine-tuning. The approach was evaluated on extended CrowS-Pairs datasets covering German, Spanish, Chinese, and Japanese for three bias types. Results indicate that multilingual strategies outperform monolingual ones and that sharing debiasing signals across languages further improves fairness.
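As a rough illustration of the pre-processing step, counterfactual data augmentation can be sketched as swapping attribute terms and keeping both the original and the swapped sentence. The word pairs, the helper names, and the whitespace tokenization below are illustrative assumptions; the paper's actual word lists and procedure are not reproduced here.

```python
# Hypothetical attribute word pairs per language (illustration only).
PAIRS = {
    "en": [("he", "she"), ("father", "mother")],
    "de": [("er", "sie"), ("vater", "mutter")],
    "es": [("él", "ella"), ("padre", "madre")],
}

def swap_table(pairs):
    """Build a two-way lookup so each term maps to its counterpart."""
    table = {}
    for a, b in pairs:
        table[a] = b
        table[b] = a
    return table

def swap_token(tok, table):
    """Replace a token by its counterpart, preserving initial capitalization."""
    repl = table.get(tok.lower())
    if repl is None:
        return tok
    return repl.capitalize() if tok[0].isupper() else repl

def counterfactual(sentence, lang):
    """Swap every attribute term; return None if nothing was swapped."""
    table = swap_table(PAIRS[lang])
    tokens = sentence.split()  # assumption: whitespace tokens, no attached punctuation
    swapped = [swap_token(t, table) for t in tokens]
    return " ".join(swapped) if swapped != tokens else None

def augment(corpus):
    """corpus: list of (lang, sentence). Returns originals plus counterfactuals."""
    out = []
    for lang, sent in corpus:
        out.append((lang, sent))
        cf = counterfactual(sent, lang)
        if cf is not None:
            out.append((lang, cf))
    return out
```

A naive swap like this can break grammatical agreement in gendered languages (for example, German articles), which is one reason the semantic-preservation assumption listed later in this review matters.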

Core claim

The authors establish that incorporating multilingual counterfactual data augmentation and multilingual Self-Debias across pre-processing and post-processing stages, alongside parameter-efficient fine-tuning, significantly reduces biases in MPLMs for gender, race, and religion in four languages. Multilingual debiasing methods surpass monolingual approaches, and integrating debiasing information from different languages notably improves the fairness of MPLMs.

What carries the argument

The Multiple-Debias pipeline, which combines multilingual counterfactual data augmentation and multilingual self-debiasing in pre- and post-processing with parameter-efficient fine-tuning.
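For orientation, the post-processing half can be pictured through the original Self-Debias decoding rule of Schick et al. [12]: a token's probability is scaled down when a bias-encouraging prompt makes that token more likely. The sketch below applies the rule to toy distributions. The probabilities, the decay constant `lam`, and the function name are assumptions, and the multilingual variant used in the paper may differ.

```python
import math

def self_debias(p_plain, p_biased, lam=10.0):
    """Rescale a next-token distribution.

    p_plain:  token -> probability given the plain input
    p_biased: token -> probability given the same input prefixed with a
              bias-encouraging prompt
    lam:      decay constant (arbitrary value chosen for illustration)
    """
    scaled = {}
    for tok, p in p_plain.items():
        delta = p - p_biased.get(tok, 0.0)
        # Tokens that the biased prompt makes MORE likely (delta < 0)
        # are damped exponentially; all other tokens are left unchanged.
        alpha = 1.0 if delta >= 0 else math.exp(lam * delta)
        scaled[tok] = alpha * p
    total = sum(scaled.values())
    # Renormalize so the result is a probability distribution again.
    return {tok: v / total for tok, v in scaled.items()}

# Toy example: the biased prompt shifts mass toward "tok_a".
debiased = self_debias({"tok_a": 0.5, "tok_b": 0.5}, {"tok_a": 0.8, "tok_b": 0.2})
```

In this toy case the mass of `tok_a` shrinks and `tok_b` absorbs the difference after renormalization.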

If this is right

Multilingual debiasing methods surpass monolingual approaches in mitigating biases.
Integrating debiasing information from different languages improves the fairness of MPLMs.
The method reduces biases across gender, race, and religion in German, Spanish, Chinese, and Japanese using the extended datasets.
Applying interventions at multiple stages in the pipeline produces stronger bias reduction than single-stage techniques.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Bias patterns may have shared cross-lingual structures that joint multilingual methods can target more efficiently than isolated per-language fixes.
Similar full-process pipelines could address additional bias types such as age or disability once comparable multilingual datasets exist.
The technique may support better model generalization in multilingual settings without explicit trade-offs against task performance.

Load-bearing premise

The extended CrowS-Pairs datasets accurately measure bias in non-English languages and the interventions do not introduce new biases or degrade core model performance.

What would settle it

Bias scores on the extended CrowS-Pairs datasets remaining unchanged or rising after Multiple-Debias application, or substantial drops in accuracy on standard multilingual NLP tasks, would show the method fails to deliver the claimed reductions.
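To make this test concrete: in the original CrowS-Pairs [13], the score is the percentage of sentence pairs for which the model prefers the more stereotypical sentence, with 50 as the unbiased reference point. The sketch assumes per-sentence model scores (for example, pseudo-log-likelihoods) are already computed. All numbers are invented, and the MBE score shown in the figures may be defined differently.

```python
def bias_score(pairs):
    """pairs: list of (score_stereo, score_anti) model scores, one per sentence pair.
    Returns the percentage of pairs where the stereotypical sentence scores higher."""
    if not pairs:
        raise ValueError("need at least one pair")
    wins = sum(1 for stereo, anti in pairs if stereo > anti)
    return 100.0 * wins / len(pairs)

def distance_from_ideal(score, ideal=50.0):
    """How far a score sits from the unbiased reference point."""
    return abs(score - ideal)

# Invented scores for four sentence pairs, before and after debiasing.
before = [(-10.2, -11.0), (-8.1, -9.4), (-7.7, -7.9), (-12.0, -11.5)]
after = [(-10.6, -10.9), (-9.0, -8.8), (-7.9, -7.7), (-11.4, -11.6)]

improved = distance_from_ideal(bias_score(after)) < distance_from_ideal(bias_score(before))
```

Under this reading, the method would fail the test if `improved` came out false on the real extended datasets, or if the gain came with a large drop in task accuracy.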

Figures

Figures reproduced from arXiv: 2604.02772 by Aimin Yang, Dong Zhou, Haoyu Liang, Peijian Zeng, Wentao Huang.

**Figure 1.** The framework of Multiple-Debias.

**Figure 2.** MBE bias scores of various debiasing methods on the mBERT model.

**Figure 3.** MBE bias scores of various debiasing methods on the XLM-R model.

Original abstract

Multilingual Pre-trained Language Models (MPLMs) have become essential tools for natural language processing. However, they often exhibit biases related to sensitive attributes such as gender, race, and religion. In this paper, we introduce a comprehensive multilingual debiasing method named Multiple-Debias to address these issues across multiple languages. By incorporating multilingual counterfactual data augmentation and multilingual Self-Debias across both pre-processing and post-processing stages, alongside parameter-efficient fine-tuning, we significantly reduced biases in MPLMs across three sensitive attributes in four languages. We also extended CrowS-Pairs to German, Spanish, Chinese, and Japanese, validating our full-process multilingual debiasing method for gender, racial, and religious bias. Our experiments show that (i) multilingual debiasing methods surpass monolingual approaches in effectively mitigating biases, and (ii) integrating debiasing information from different languages notably improves the fairness of MPLMs.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note

private letter to a colleague

Multiple-Debias stitches together counterfactual augmentation and Self-Debias into a multilingual pipeline and extends CrowS-Pairs, but the new dataset versions carry unverified translation and cultural issues that undercut the main results.


The paper's main move is to run debiasing at both pre- and post-processing stages for models like mBERT and XLM-R, combining multilingual counterfactual data augmentation with a multilingual version of Self-Debias plus parameter-efficient fine-tuning. It also produces new CrowS-Pairs sets for German, Spanish, Chinese, and Japanese and reports that the full multilingual treatment beats monolingual baselines while cross-language signals add further fairness gains on gender, race, and religion biases. That integration across languages and stages is the concrete new piece relative to prior monolingual work. The experiments appear to show measurable drops in bias scores, which is useful for anyone who actually ships these models. The approach is straightforward to implement and targets a real deployment problem.

The soft spot is the extended CrowS-Pairs data. The abstract and methods give almost no detail on translation procedure, native-speaker review, or checks for language-specific stereotypes versus translation artifacts, so the reported improvements could partly reflect metric distortion rather than genuine debiasing. There is also no clear evidence presented on whether the interventions preserve downstream task performance, which matters for any practical claim. The central empirical comparison therefore rests on a load-bearing assumption that has not been stress-tested in the write-up.

This work is aimed at practitioners and applied researchers who need off-the-shelf debiasing recipes for multilingual settings. A reader already working on fairness metrics or counterfactual methods will find the pipeline description and the cross-language results worth looking at, even if they will want to re-run the dataset construction themselves. The paper is coherent on its own terms and engages the existing literature without obvious internal contradictions, so it clears the bar for serious refereeing.

I would send it to review with explicit requests for dataset validation details and utility metrics.

Referee Report

3 major / 1 minor

Summary. The manuscript presents Multiple-Debias, a full-process debiasing method for multilingual pre-trained language models that combines multilingual counterfactual data augmentation, multilingual Self-Debias across pre- and post-processing stages, and parameter-efficient fine-tuning. The authors extend CrowS-Pairs to German, Spanish, Chinese, and Japanese and report experiments on gender, racial, and religious bias, claiming that multilingual debiasing outperforms monolingual approaches and that cross-lingual integration of debiasing information improves fairness in MPLMs.

Significance. If the extended datasets prove valid and downstream performance is preserved, the work could offer a practical full-process framework for multilingual debiasing that addresses multiple attributes across languages. The emphasis on both pre- and post-processing stages and the cross-lingual integration aspect represent constructive directions, though the current support for the superiority claims rests on unverified dataset extensions.

major comments (3)

[CrowS-Pairs Extension] The extension to German, Spanish, Chinese, and Japanese is load-bearing for all reported bias reductions, yet the manuscript supplies no details on translation procedure, cultural adaptation, native-speaker validation, or inter-annotator reliability. Without these, measured improvements cannot be confidently attributed to Multiple-Debias rather than metric artifacts.
[Experimental Evaluation] The abstract states that biases were significantly reduced and that multilingual methods surpass monolingual ones, but no quantitative bias scores, error bars, statistical tests, or ablation results isolating the cross-lingual integration component are referenced. In addition, no controls or measurements are described for downstream task performance after debiasing.
[Method Description] The claim that integrating debiasing information from different languages notably improves fairness requires explicit mechanisms and ablations showing how the multilingual components interact beyond simple concatenation; the current description leaves open whether gains arise from the method or from increased data volume.
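One way to supply the error bars and significance test requested above is a paired comparison over matched runs, for example the same random seeds with and without debiasing. The sketch computes the mean difference, its sample standard deviation, and the paired t statistic using only the standard library. The scores are placeholders rather than results from the paper, and the choice of a paired t-test is an assumption.

```python
import math
import statistics

def paired_t(baseline, treated):
    """Return (mean_diff, sd_diff, t_stat, dof) for paired samples."""
    if len(baseline) != len(treated) or len(baseline) < 2:
        raise ValueError("need two equal-length samples with n >= 2")
    diffs = [b - t for b, t in zip(baseline, treated)]
    mean_diff = statistics.mean(diffs)
    sd_diff = statistics.stdev(diffs)  # sample standard deviation (n - 1)
    if sd_diff == 0:
        raise ValueError("all differences are identical; t is undefined")
    t_stat = mean_diff / (sd_diff / math.sqrt(len(diffs)))
    return mean_diff, sd_diff, t_stat, len(diffs) - 1

# Placeholder bias scores for four matched runs (not from the paper).
baseline_scores = [60.0, 62.0, 61.0, 63.0]
debiased_scores = [55.0, 56.0, 57.0, 56.0]
mean_diff, sd_diff, t_stat, dof = paired_t(baseline_scores, debiased_scores)
```

The p-value would then come from the t distribution with `dof` degrees of freedom, typically via a statistics library.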

minor comments (1)

[Abstract] Including one or two key quantitative bias-metric results would immediately ground the positive claims for readers.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We appreciate the emphasis on methodological transparency, experimental rigor, and clarity in cross-lingual mechanisms. We respond to each major comment below and will revise the manuscript accordingly.


Referee: [CrowS-Pairs Extension] The extension to German, Spanish, Chinese, and Japanese is load-bearing for all reported bias reductions, yet the manuscript supplies no details on translation procedure, cultural adaptation, native-speaker validation, or inter-annotator reliability. Without these, measured improvements cannot be confidently attributed to Multiple-Debias rather than metric artifacts.

Authors: We agree that the current description of the CrowS-Pairs extension is insufficiently detailed. In the revised manuscript we will add a dedicated subsection (under Datasets and Evaluation Metrics) that specifies: (1) the translation pipeline using professional native-speaker translators with bias-context preservation guidelines, (2) cultural adaptation steps to ensure stereotypes remain relevant in each language, and (3) inter-annotator reliability metrics (Cohen’s kappa and percentage agreement) obtained from three independent annotators per language. We will also release the full extended datasets and annotation protocols.
revision: yes

Referee: [Experimental Evaluation] The abstract states that biases were significantly reduced and that multilingual methods surpass monolingual ones, but no quantitative bias scores, error bars, statistical tests, or ablation results isolating the cross-lingual integration component are referenced. In addition, no controls or measurements are described for downstream task performance after debiasing.

Authors: The full manuscript (Section 4) already reports per-language and per-attribute CrowS-Pairs scores, monolingual vs. multilingual comparisons, and some ablation results. However, we acknowledge that error bars, formal statistical tests, and downstream-task controls were not presented with sufficient prominence. In the revision we will: (i) add standard deviations and paired t-test p-values for all bias reductions, (ii) include an explicit ablation isolating the cross-lingual integration component, and (iii) report downstream performance on standard tasks (e.g., XNLI, NER) before and after debiasing to demonstrate that utility is preserved.
revision: yes

Referee: [Method Description] The claim that integrating debiasing information from different languages notably improves fairness requires explicit mechanisms and ablations showing how the multilingual components interact beyond simple concatenation; the current description leaves open whether gains arise from the method or from increased data volume.

Authors: We will expand the Method section to provide a precise description of the cross-lingual interaction: multilingual Self-Debias operates on a shared parameter-efficient adapter that receives concatenated counterfactual examples from all languages, with language-specific prefixes and a joint debiasing loss that encourages transfer. To rule out simple data-volume effects, the revision will include a controlled ablation in which monolingual baselines are trained on repeated data to match the total token count of the multilingual setting; results will show that the multilingual configuration still yields larger bias reductions, supporting the claim that cross-lingual integration is the source of improvement.
revision: yes
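The data-volume control described in the last response could be set up along the following lines: the monolingual corpus is cycled until its token count reaches that of the multilingual mix. Whitespace token counting and the helper names are assumptions for illustration; the simulated rebuttal does not specify the matching procedure.

```python
from itertools import cycle

def count_tokens(sentences):
    """Assumption: tokens are whitespace-separated words."""
    return sum(len(s.split()) for s in sentences)

def repeat_to_budget(mono, token_budget):
    """Cycle through `mono` until at least `token_budget` tokens are collected."""
    if count_tokens(mono) == 0:
        raise ValueError("monolingual corpus has no tokens")
    out, total = [], 0
    for sent in cycle(mono):
        if total >= token_budget:
            break
        out.append(sent)
        total += len(sent.split())
    return out

# Toy corpora: the multilingual mix has 10 tokens, the monolingual one has 5.
multilingual = ["a b c", "x y z w", "p q", "r"]
monolingual = ["a b c", "d e"]
matched = repeat_to_budget(monolingual, count_tokens(multilingual))
```

Training the monolingual baseline on `matched` rather than `monolingual` keeps the token count comparable, so any remaining gap is less likely to be explained by data volume alone.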

Circularity Check

0 steps flagged

No circularity: empirical method validated on independent benchmarks


The paper introduces Multiple-Debias via multilingual counterfactual augmentation, Self-Debias, and parameter-efficient fine-tuning, then reports bias reductions on extended CrowS-Pairs datasets across languages. All central claims rest on direct experimental comparisons rather than any derivation, fitted parameter renamed as prediction, or self-citation chain. No equations, uniqueness theorems, or ansatzes are presented that reduce to the inputs by construction. The extension of CrowS-Pairs is described as a supporting contribution but does not create a self-referential loop in the reported results.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The approach assumes counterfactual sentences preserve semantics and that bias metrics like extended CrowS-Pairs capture real-world fairness; no new entities postulated.

free parameters (1)

fine-tuning hyperparameters
Learning rate, epochs, and regularization strength for parameter-efficient tuning are chosen to balance bias reduction and performance.

axioms (1)

domain assumption: Counterfactual data augmentation does not introduce new semantic distortions
Invoked when generating multilingual counterfactual examples for pre-processing.

pith-pipeline@v0.9.0 · 5462 in / 1258 out tokens · 26230 ms · 2026-05-13T20:28:50.791872+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: By incorporating multilingual counterfactual data augmentation and multilingual Self-Debias across both pre-processing and post-processing stages, alongside parameter-efficient fine-tuning

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: We also extended CrowS-Pairs to German, Spanish, Chinese, and Japanese

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Reference graph

Works this paper leans on

23 extracted references · 23 canonical work pages · 1 internal anchor

[1] J. Devlin, M. Chang, K. Lee, and K. Toutanova, “BERT: pre-training of deep bidirectional transformers for language understanding,” in Proceedings of the 2019 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 1 (Long and Short Papers), 2019, pp. 4171–4186

[2] T. B. Brown, B. Mann, N. Ryder, M. Subbiah, J. Kaplan, P. Dhariwal, A. Neelakantan, P. Shyam, G. Sastry, A. Askell, S. Agarwal, A. Herbert-Voss, G. Krueger, T. Henighan, R. Child, A. Ramesh, D. M. Ziegler, J. Wu, C. Winter, C. Hesse, M. Chen, E. Sigler, M. Litwin, S. Gray, B. Chess, J. Clark, C. Berner, S. McCandlish, A. Radford, I. Sutskever, and D. Am..., “Language models are few-shot learners,” 2020

[3] H. Touvron, T. Lavril, G. Izacard, X. Martinet, M.-A. Lachaux, T. Lacroix, B. Rozière, N. Goyal, E. Hambro, F. Azhar et al., “Llama: Open and efficient foundation language models,” arXiv preprint arXiv:2302.13971, 2023

[4] B. Hutchinson, V. Prabhakaran, E. Denton, K. Webster, Y. Zhong, and S. Denuyl, “Social biases in NLP models as barriers for persons with disabilities,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics. Association for Computational Linguistics, 2020, pp. 5491–5501

[5] D. Pessach and E. Shmueli, “A review on fairness in machine learning,” ACM Computing Surveys (CSUR), vol. 55, no. 3, pp. 1–44, 2022

[6] Y. Xu, L. Hu, J. Zhao, Z. Qiu, Y. Ye, and H. Gu, “A survey on multilingual large language models: Corpora, alignment, and bias,” arXiv preprint arXiv:2404.00929, 2024

[7] J. Zhao, T. Wang, M. Yatskar, V. Ordonez, and K. Chang, “Gender bias in coreference resolution: Evaluation and debiasing methods,” in Proceedings of the 2018 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, Volume 2 (Short Papers), 2018, pp. 15–20

[8] N. Houlsby, A. Giurgiu, S. Jastrzebski, B. Morrone, Q. De Laroussilhe, A. Gesmundo, M. Attariyan, and S. Gelly, “Parameter-efficient transfer learning for NLP,” in International conference on machine learning, 2019, pp. 2790–2799

[9] X. L. Li and P. Liang, “Prefix-tuning: Optimizing continuous prompts for generation,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 4582–4597

[10] B. Lester, R. Al-Rfou, and N. Constant, “The power of scale for parameter-efficient prompt tuning,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 3045–3059

[11] A. Conneau, K. Khandelwal, N. Goyal, V. Chaudhary, G. Wenzek, F. Guzmán, E. Grave, M. Ott, L. Zettlemoyer, and V. Stoyanov, “Unsupervised cross-lingual representation learning at scale,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 8440–8451

[12] T. Schick, S. Udupa, and H. Schütze, “Self-diagnosis and self-debiasing: A proposal for reducing corpus-based bias in NLP,” Transactions of the Association for Computational Linguistics, vol. 9, pp. 1408–1424, 2021

[13] N. Nangia, C. Vania, R. Bhalerao, and S. R. Bowman, “Crows-pairs: A challenge dataset for measuring social biases in masked language models,” in Proceedings of the 2020 Conference on Empirical Methods in Natural Language Processing, 2020, pp. 1953–1967

[14] J. Zhao, S. Mukherjee, S. Hosseini, K. Chang, and A. H. Awadallah, “Gender bias in multilingual embeddings and cross-lingual transfer,” in Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, 2020, pp. 2896–2907

[15] A. Vashishtha, K. Ahuja, and S. Sitaram, “On evaluating and mitigating gender biases in multilingual settings,” in Findings of the Association for Computational Linguistics: ACL 2023, 2023, pp. 307–318

[16] M. Reusens, P. Borchert, M. Mieskes, J. D. Weerdt, and B. Baesens, “Investigating bias in multilingual language models: Cross-lingual transfer of debiasing techniques,” in Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, 2023, pp. 2887–2896

[17] K. Ramesh, S. Sitaram, and M. Choudhury, “Fairness in language models beyond english: Gaps and challenges,” in Findings of the Association for Computational Linguistics: EACL 2023, 2023, pp. 2061–2074

[18] Z. Xie and T. Lukasiewicz, “An empirical analysis of parameter-efficient methods for debiasing pre-trained language models,” in Proceedings of the 61st Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), A. Rogers, J. L. Boyd-Graber, and N. Okazaki, Eds., 2023, pp. 15730–15745

[19] M. Nadeem, A. Bethke, and S. Reddy, “Stereoset: Measuring stereotypical bias in pretrained language models,” in Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 1: Long Papers), 2021, pp. 5356–5371

[20] M. Kaneko, A. Imankulova, D. Bollegala, and N. Okazaki, “Gender bias in masked language models for multiple languages,” in Proceedings of the 2022 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, 2022, pp. 2740–2750

[21] T. Wolf, L. Debut, V. Sanh, J. Chaumond, C. Delangue, A. Moi, P. Cistac, T. Rault, R. Louf, M. Funtowicz, J. Davison, S. Shleifer, P. von Platen, C. Ma, Y. Jernite, J. Plu, C. Xu, T. Le Scao, S. Gugger, M. Drame, Q. Lhoest, and A. Rush, “Transformers: State-of-the-art natural language processing,” in Proceedings of the 2020 Conference on Empirical Method...

[22] N. Meade, E. Poole-Dayan, and S. Reddy, “An empirical survey of the effectiveness of debiasing techniques for pre-trained language models,” in Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), 2022, pp. 1878–1898

[23] J. Ahn and A. Oh, “Mitigating language-dependent ethnic bias in BERT,” in Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing, 2021, pp. 533–549