UnBias-Plus: Detect, Explain, and Rewrite Bias

Ahmed ElKady; Ahmed Y. Radwan; Amrit Krishnan; Mohamed Hafez; Shaina Raza; Sindhuja Chaduvula

arXiv: 2606.23412 · v1 · pith:XDSI46V2new · submitted 2026-06-22 · cs.CL · cs.AI · cs.SE

Pith reviewed 2026-06-26 08:34 UTC · model grok-4.3

classification cs.CL · cs.AI · cs.SE

keywords bias detection · natural language processing · text rewriting · explainable AI · open-source toolkit · multi-class classification · span localization

The pith

UnBias-Plus unifies segment-level bias classification, span localization, neutral rewriting, and decision reasoning in one open-source toolkit.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents UnBias-Plus to overcome the limits of current bias detection tools that mostly flag bias presence without finer details or fixes. It bundles four capabilities into one system: multi-class classification of bias in text segments, identification of the exact biased spans, rewriting of those spans into neutral versions, and generation of reasoning for the outputs. This approach targets persistent bias issues in journalism, education, and AI-generated content by making the full pipeline accessible. The toolkit runs through Python code, command line, REST API, and web interfaces, with all models, data, and documentation released publicly.
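To make the four outputs concrete, here is a minimal sketch of how a per-segment result might be represented and how span-level replacements could be applied. Every name in it (`BiasSpan`, `SegmentResult`, `apply_rewrites`, the label string) is hypothetical and chosen for illustration. It is not the UnBias-Plus API, which this page does not describe, and the example sentence is invented.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class BiasSpan:
    """A biased span inside a segment (hypothetical structure)."""
    start: int          # inclusive character offset
    end: int            # exclusive character offset
    replacement: str    # proposed neutral wording
    reasoning: str      # short explanation for the decision


@dataclass
class SegmentResult:
    """Per-segment output covering the four tasks (hypothetical structure)."""
    text: str
    label: str                                   # multi-class label for the segment
    spans: list[BiasSpan] = field(default_factory=list)


def apply_rewrites(result: SegmentResult) -> str:
    """Replace each flagged span with its neutral alternative.

    Spans are applied right to left so earlier offsets stay valid.
    Overlapping spans are rejected rather than merged.
    """
    ordered = sorted(result.spans, key=lambda s: s.start)
    for prev, cur in zip(ordered, ordered[1:]):
        if cur.start < prev.end:
            raise ValueError("overlapping spans are not supported in this sketch")
    text = result.text
    for span in reversed(ordered):
        text = text[:span.start] + span.replacement + text[span.end:]
    return text


# Invented example; the label name and offsets are illustrative only.
example = SegmentResult(
    text="The senator's reckless plan was slammed by experts.",
    label="loaded_language",
    spans=[
        BiasSpan(14, 22, "proposed", "'reckless' asserts a judgment as fact"),
        BiasSpan(32, 39, "criticized", "'slammed' is emotionally charged"),
    ],
)
print(apply_rewrites(example))
```

Keeping the reasoning attached to each span, rather than to the segment as a whole, is one way to let a reader trace every replacement back to the explanation that motivated it.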

Core claim

UnBias-Plus is an open-source toolkit that unifies segment-level multi-class bias classification, biased span localization, neutral text rewriting, and reasoning for each decision, delivered through Python, CLI, REST API, and web interfaces with all components made publicly available.

What carries the argument

The UnBias-Plus toolkit, which integrates the four bias-handling tasks of classification, localization, rewriting, and explanation into a single accessible system.

If this is right

Analysts gain segment-level multi-class labels instead of binary bias flags.
Exact biased spans can be isolated and replaced with neutral alternatives.
Each output includes explicit reasoning to support interpretability.
Multiple interfaces make the capabilities available to coders, command-line users, and web visitors.
Public release of models and datasets enables reuse and extension by others.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Content platforms could embed the rewriting step to flag and suggest edits for user posts before publication.
The modular design might allow swapping in new bias classifiers for specialized domains such as legal or medical text.
Integration with existing writing assistants could create automated bias checks during document creation.
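The classifier-swapping idea above can be sketched with a small interface. This is an illustration under stated assumptions: `SegmentClassifier`, `KeywordClassifier`, `label_segments`, and the label names are all hypothetical, the keyword lookup is a toy stand-in for a trained model, and nothing here reflects how UnBias-Plus is actually structured.

```python
from typing import Protocol


class SegmentClassifier(Protocol):
    """Any object with a matching `classify` method can be plugged in."""

    def classify(self, segment: str) -> str: ...


class KeywordClassifier:
    """Toy stand-in: labels a segment by keyword lookup, not a trained model."""

    def __init__(self, lexicon: dict[str, str], default: str = "neutral") -> None:
        self.lexicon = {k.lower(): v for k, v in lexicon.items()}
        self.default = default

    def classify(self, segment: str) -> str:
        lowered = segment.lower()
        for keyword, label in self.lexicon.items():
            if keyword in lowered:
                return label
        return self.default


def label_segments(
    segments: list[str], classifier: SegmentClassifier
) -> list[tuple[str, str]]:
    """Run whichever classifier is supplied over every segment."""
    return [(seg, classifier.classify(seg)) for seg in segments]


# Two interchangeable classifiers with invented, domain-specific lexicons.
general = KeywordClassifier({"slammed": "loaded_language"})
legal = KeywordClassifier({"notorious defendant": "prejudicial_framing"})

segments = [
    "Critics slammed the ruling.",
    "The notorious defendant appeared in court.",
]
print(label_segments(segments, general))
print(label_segments(segments, legal))
```

Because `label_segments` depends only on the `classify` method, a domain-specific model could replace the default without touching the rest of the pipeline.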

Load-bearing premise

Existing bias detection techniques can be reliably combined into one toolkit that produces accurate, interpretable, and neutral outputs across domains without introducing new biases or losing original meaning.

What would settle it

A side-by-side human evaluation of the toolkit's neutral rewrites on held-out texts from multiple domains, checking whether meaning is preserved and bias is reduced without new biases appearing.
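One way such an evaluation could be tallied is sketched below. The rating fields, the `summarize` helper, and the "clean success" criterion (meaning preserved, bias reduced, no new bias) are assumptions made for illustration, and the ratings are made-up placeholders, not results from the paper.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Rating:
    """One annotator judgment on one rewrite (hypothetical schema)."""
    domain: str
    meaning_preserved: bool
    bias_reduced: bool
    new_bias_introduced: bool


def summarize(ratings: list[Rating]) -> dict[str, dict[str, float]]:
    """Return per-domain rates for each criterion and for their conjunction."""
    by_domain: dict[str, list[Rating]] = defaultdict(list)
    for r in ratings:
        by_domain[r.domain].append(r)

    summary: dict[str, dict[str, float]] = {}
    for domain, rs in by_domain.items():
        n = len(rs)
        summary[domain] = {
            "meaning_preserved": sum(r.meaning_preserved for r in rs) / n,
            "bias_reduced": sum(r.bias_reduced for r in rs) / n,
            "new_bias_introduced": sum(r.new_bias_introduced for r in rs) / n,
            # A rewrite counts as a clean success only if all three checks pass.
            "clean_success": sum(
                r.meaning_preserved and r.bias_reduced and not r.new_bias_introduced
                for r in rs
            ) / n,
        }
    return summary


# Placeholder ratings for illustration only.
ratings = [
    Rating("news", True, True, False),
    Rating("news", True, False, False),
    Rating("education", False, True, False),
    Rating("education", True, True, True),
]
for domain, stats in summarize(ratings).items():
    print(domain, stats)
```

Reporting the conjunction alongside the individual rates matters because a rewrite can score well on bias reduction while still failing on meaning preservation or introducing a new bias.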

Figures

Figures reproduced from arXiv: 2606.23412 by Ahmed ElKady, Ahmed Y. Radwan, Amrit Krishnan, Mohamed Hafez, Shaina Raza, Sindhuja Chaduvula.

**Figure 2.** The web demo interface of UnBias-Plus allows users to input text, highlights biased segments with ...

Original abstract

Bias in natural language remains a persistent challenge in both human-written and AI-generated content, affecting domains such as journalism, education, and AI research. Most existing detection methods identify only the presence of bias, with limited support for granular detection, interpretable explanations, neutral rewriting, and openly available trained models. We present UnBias-Plus, an open-source toolkit unifying (1) segment-level multi-class bias classification, (2) biased span localization, (3) neutral text rewriting, and (4) reasoning for each decision. Available via Python, CLI, REST API, and web interfaces, UnBias-Plus supports accessible bias analysis. The toolkit, source code, models, datasets, and documentation are publicly available.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

UnBias-Plus is a released open-source toolkit that bundles four bias-handling tasks with multiple interfaces, but the paper offers no performance data or comparisons.

The paper's core contribution is the public release of UnBias-Plus, a toolkit that combines segment-level multi-class bias classification, biased span localization, neutral text rewriting, and per-decision reasoning. It supports access through Python, CLI, REST API, and a web interface, with the code, models, and datasets all made available.

This packaging and the decision to open everything up are the concrete steps that stand out. Releasing trained models and datasets gives the community something usable right away, which can save time for people who need bias tools in practice.

The abstract contains no accuracy figures, no baselines, and no evaluation details on how well the components work together or across domains. The text stays at the level of describing what the system supports rather than showing results or improvements over prior separate tools.

Without those numbers it is hard to judge whether the combined system holds up or introduces new issues during rewriting. The paper reads as a systems announcement rather than a claim of new algorithmic performance.

This work is aimed at practitioners who want a ready package for bias analysis in journalism, education, or AI content. Researchers looking for new methods, formal analysis, or strong empirical evidence will get less from it.

I would send it to peer review as a systems paper. The release itself can be useful if the implementation details check out, even though the scientific advance is modest.

Referee Report

2 major / 0 minor

Summary. The paper presents UnBias-Plus, an open-source toolkit that unifies four capabilities: (1) segment-level multi-class bias classification, (2) biased span localization, (3) neutral text rewriting, and (4) reasoning for each decision. The toolkit is made available through Python, CLI, REST API, and web interfaces, with source code, models, datasets, and documentation publicly released to support bias analysis in domains such as journalism, education, and AI research.

Significance. If the implementations deliver accurate, interpretable outputs without introducing new biases or altering meaning, the toolkit would address a practical gap by providing granular, multi-function bias handling in a single accessible package. The public release of all components supports reproducibility and adoption, which could be a concrete contribution to applied NLP tooling even without novel algorithmic advances.

major comments (2)

[Abstract] The central claim is the release of a unified toolkit addressing limitations of existing bias detection methods, yet no performance metrics, baselines, evaluation datasets, or ablation studies are mentioned. This absence makes it impossible to verify whether the four components function reliably together or outperform prior tools, which is load-bearing for any claim of practical utility.
[Abstract] The manuscript asserts that the toolkit supports 'accurate, interpretable, and neutral outputs across domains' implicitly through its design, but provides no description of the underlying models, training procedures, or integration method. Without these details, the unification claim cannot be assessed for internal consistency or risk of compounding errors from the combined components.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive comments. We address each major comment below and indicate planned revisions to the abstract.

Referee: [Abstract] The central claim is the release of a unified toolkit addressing limitations of existing bias detection methods, yet no performance metrics, baselines, evaluation datasets, or ablation studies are mentioned. This absence makes it impossible to verify whether the four components function reliably together or outperform prior tools, which is load-bearing for any claim of practical utility.

Authors: The manuscript is a toolkit release paper whose primary contribution is the public unification and accessibility of the four capabilities rather than new algorithmic results. Evaluation details, including metrics on the component models, appear in the Methods and Experiments sections. We will revise the abstract to reference key performance figures and direct readers to those sections. Revision: yes.

Referee: [Abstract] The manuscript asserts that the toolkit supports 'accurate, interpretable, and neutral outputs across domains' implicitly through its design, but provides no description of the underlying models, training procedures, or integration method. Without these details, the unification claim cannot be assessed for internal consistency or risk of compounding errors from the combined components.

Authors: The body of the manuscript describes the models, training procedures, and integration pipeline. To make this information visible from the abstract, we will add a concise summary of the model choices and integration approach. Revision: yes.

Circularity Check

0 steps flagged

No circularity: descriptive system announcement with no derivations or fitted claims

The paper presents an open-source toolkit implementing four listed capabilities (segment-level classification, span localization, neutral rewriting, reasoning). No equations, predictions, fitted parameters, or derivation chains appear in the provided text. The central claim is simply the construction and public release of the toolkit, which is a factual descriptive statement rather than a result derived from inputs. No self-citations, uniqueness theorems, or ansatzes are invoked. This matches the default expectation of no significant circularity for system-description papers.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

No mathematical model or derivation is present; the work is a software engineering contribution.



