Fairness Testing of Large Language Models in Role-Playing

Jie M. Zhang; Tianlin Li; Weisong Sun; Xinyue Li; Xuanzhe Liu; Yang Liu; Yiling Lou; Ying Xiao; Zhenpeng Chen

arxiv: 2411.00585 · v2 · submitted 2024-11-01 · 💻 cs.CY · cs.AI

Xinyue Li, Zhenpeng Chen, Jie M. Zhang, Ying Xiao, Tianlin Li, Weisong Sun, Yang Liu, Yiling Lou, Xuanzhe Liu

Pith reviewed 2026-05-23 18:15 UTC · model grok-4.3

classification 💻 cs.CY cs.AI

keywords fairness testing, large language models, role-playing, social bias, demographic attributes, empirical evaluation, bias detection, LLM evaluation

The pith

Testing shows ten LLMs produce between 7,579 and 16,963 biased responses each when asked to role-play social roles spanning 11 demographic attributes.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tests whether social biases appear when LLMs are prompted to adopt specific social roles. It generates 550 roles covering 11 demographic attributes and turns them into 33,000 questions in yes/no, multiple-choice, and open formats. These questions are fed to ten advanced models, and biased outputs are flagged with rule-based and LLM-based detectors that were checked by humans. The results show more than 100,000 biased responses overall, indicating that role-playing prompts reliably surface demographic biases. This matters because role-playing is a common way to make LLMs useful in real applications.

Core claim

Using 33,000 role-specific questions built from 550 social roles that span 11 demographic attributes, evaluations of ten LLMs identify 107,580 biased responses, with each model producing between 7,579 and 16,963 such responses.
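The headline numbers can be cross-checked with a few lines of arithmetic. This sketch uses only figures stated in the abstract and labels two assumptions in its comments: that the 550 roles are split evenly across the 11 attributes, and that each model contributes at most one verdict per question, so ten models yield 330,000 judged responses.

```python
# Figures stated in the abstract.
TOTAL_ROLES = 550
ATTRIBUTES = 11
TOTAL_QUESTIONS = 33_000
MODELS = 10
TOTAL_BIASED = 107_580
PER_MODEL_MIN, PER_MODEL_MAX = 7_579, 16_963

questions_per_role = TOTAL_QUESTIONS // TOTAL_ROLES  # exact: 550 * 60 = 33,000
roles_per_attribute = TOTAL_ROLES / ATTRIBUTES       # assumption: roles are split evenly

# Assumption: each model contributes at most one verdict per question.
responses = TOTAL_QUESTIONS * MODELS
overall_rate = TOTAL_BIASED / responses
mean_per_model = TOTAL_BIASED / MODELS
min_rate = PER_MODEL_MIN / TOTAL_QUESTIONS
max_rate = PER_MODEL_MAX / TOTAL_QUESTIONS

# Sanity check: the implied mean must lie inside the reported per-model range.
assert PER_MODEL_MIN <= mean_per_model <= PER_MODEL_MAX

if __name__ == "__main__":
    print(f"{questions_per_role} questions per role, {roles_per_attribute:.0f} roles per attribute")
    print(f"{responses:,} judged responses; {overall_rate:.1%} flagged overall")
    print(f"mean {mean_per_model:,.0f} flagged per model; per-model rate {min_rate:.1%} to {max_rate:.1%}")
```

Under that assumption, roughly a third of all judged responses were flagged, and the least and most biased models differ by more than a factor of two (about 23% versus 51% of their answers).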

What carries the argument

The 33,000 role-specific questions generated from 550 social roles across 11 demographic attributes, paired with rule-based and LLM-based bias identification validated by human review.
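The pith describes the detection step only at a high level. As a hedged illustration of how such a pipeline can be wired, closed-form answers could be flagged by simple rules while open-ended answers go to several LLM judges whose verdicts are combined by majority vote. The specific rules, the question-type labels (`yes_no`, `choice`, `why`), and the `Judge` signature below are assumptions for illustration, not BiasLens's actual implementation.

```python
from typing import Callable, Sequence

# Hypothetical judge signature: (question, answer) -> True if the judge deems the answer biased.
Judge = Callable[[str, str], bool]


def rule_based_flag(question_type: str, answer: str) -> bool:
    """Assumed rules, illustrative only:
    - "yes_no": questions are phrased so that "yes" endorses a stereotype.
    - "choice": any answer other than a neutral option picks a group.
    Both checks are crude prefix matches on the normalized answer.
    """
    normalized = answer.strip().lower()
    if question_type == "yes_no":
        return normalized.startswith("yes")
    if question_type == "choice":
        return not normalized.startswith(("cannot determine", "none of the above"))
    raise ValueError(f"no rule for question type {question_type!r}")


def majority_vote_flag(question: str, answer: str, judges: Sequence[Judge]) -> bool:
    """Flag an open-ended answer when a strict majority of judges call it biased."""
    if not judges:
        raise ValueError("need at least one judge")
    votes = sum(judge(question, answer) for judge in judges)
    return votes * 2 > len(judges)


def flag_response(question_type: str, question: str, answer: str, judges: Sequence[Judge]) -> bool:
    """Route closed-form questions to the rules and open-ended ("why") ones to the judges."""
    if question_type == "why":
        return majority_vote_flag(question, answer, judges)
    return rule_based_flag(question_type, answer)
```

Using an odd number of judges avoids ties under a strict-majority rule; in practice each `Judge` would wrap a call to a separate LLM.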

If this is right

Role-playing prompts cause LLMs to produce biased answers tied to demographic identities.
The bias appears across all ten models tested and across yes/no, multiple-choice, and open-ended question types.
The released set of 33,000 questions and detection scripts can be reused to measure bias in new models.
Bias rates vary by model but remain high enough in every case to affect applications that rely on role adoption.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Application builders who use role-playing features may need separate bias filters before deployment.
The same question-generation approach could be applied to test bias in other interaction styles such as story continuation or advice-giving.
If the bias detection methods under-count subtle cases, the actual prevalence could be even higher than reported.

Load-bearing premise

The rule-based and LLM-based methods correctly flag biased answers without missing subtle cases or incorrectly labeling neutral ones.

What would settle it

Independent human review of a random sample of the flagged responses that finds the true bias rate is less than half the reported figure.
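A minimal sketch of that check, assuming a list of flagged response IDs and a human labelling function (`human_label` is a hypothetical stand-in for the reviewer): draw a uniform random sample of flagged responses, estimate the detectors' precision, and attach a Wilson score interval. If the interval's upper bound falls below 0.5, the sample supports the outcome described above. This audits false positives only; estimating missed cases would need a separate sample of unflagged responses.

```python
import math
import random
from typing import Callable, Sequence


def wilson_interval(successes: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """Wilson score interval for a binomial proportion; z = 1.96 gives roughly 95% coverage."""
    if n <= 0:
        raise ValueError("sample must be non-empty")
    p_hat = successes / n
    z2 = z * z
    denom = 1 + z2 / n
    centre = (p_hat + z2 / (2 * n)) / denom
    half_width = z * math.sqrt(p_hat * (1 - p_hat) / n + z2 / (4 * n * n)) / denom
    return centre - half_width, centre + half_width


def audit_flagged(
    flagged_ids: Sequence[str],
    human_label: Callable[[str], bool],
    sample_size: int,
    seed: int = 0,
) -> tuple[float, float, float]:
    """Return (precision estimate, lower bound, upper bound) from a human-reviewed random sample."""
    rng = random.Random(seed)
    sample = rng.sample(list(flagged_ids), sample_size)
    confirmed = sum(1 for rid in sample if human_label(rid))
    low, high = wilson_interval(confirmed, sample_size)
    return confirmed / sample_size, low, high


def supports_less_than_half(upper_bound: float) -> bool:
    """True when even the optimistic precision bound is below 0.5."""
    return upper_bound < 0.5


if __name__ == "__main__":
    # Synthetic stand-in for human review: pretend every fourth flagged response is truly biased.
    ids = [f"r{i}" for i in range(1000)]
    precision, low, high = audit_flagged(ids, lambda rid: int(rid[1:]) % 4 == 0, sample_size=200)
    print(f"precision {precision:.2f}, 95% CI [{low:.2f}, {high:.2f}]")
```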

Figures

Figures reproduced from arXiv: 2411.00585 by Jie M. Zhang, Tianlin Li, Weisong Sun, Xinyue Li, Xuanzhe Liu, Yang Liu, Yiling Lou, Ying Xiao, Zhenpeng Chen.

**Figure 2.** Overview of BiasLens.

**Figure 3.** Example prompt for role generation related to the occupation attribute.

**Figure 4.** Prompts for question generation. In summary, for Yes/No questions, the prompt consists of: Task description + Example (Yes/No) + Requirements + Format (Yes/No). For Choice questions, the prompt consists of: Task description + Example (Choice) + Requirements + Format (Choice). For Why questions, the prompt consists of: Task description + Example (Why) + Requirements + Format (Why). The complete prompts for …

**Figure 5.** Example response generated by Llama-3-8B to a Why question on September 29, 2024.

**Figure 6.** Prompt for three judge LLMs.

**Figure 7.** (RQ1) Average biased responses per demographic attribute across six LLMs. The attributes are presented …

**Figure 8.** (RQ1) Proportion of questions that elicit biased responses in one to six LLMs.
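Figure 4's caption describes each question-generation prompt as four concatenated parts, with only the example and the output format changing between question types. A hedged sketch of that composition follows; the bracketed strings are placeholders rather than the paper's wording, and the question-type keys and where the role name is inserted are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuestionFormat:
    """Per-format parts of the prompt; the texts used below are placeholders."""
    example: str
    output_format: str


# Shared parts (placeholders). `{role}` marks where the role name is assumed to go.
TASK_DESCRIPTION = "[task description for the role: {role}]"
REQUIREMENTS = "[requirements]"

FORMATS = {
    "yes_no": QuestionFormat("[example (Yes/No)]", "[format (Yes/No)]"),
    "choice": QuestionFormat("[example (Choice)]", "[format (Choice)]"),
    "why": QuestionFormat("[example (Why)]", "[format (Why)]"),
}


def build_generation_prompt(question_type: str, role: str) -> str:
    """Task description + Example + Requirements + Format, in the order given by Figure 4."""
    fmt = FORMATS[question_type]  # an unknown question type raises KeyError
    parts = [TASK_DESCRIPTION.format(role=role), fmt.example, REQUIREMENTS, fmt.output_format]
    return "\n\n".join(parts)
```

Keeping the shared parts separate from the per-format parts makes it easy to confirm that the three prompt variants differ only where the caption says they do.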

Original abstract

Large Language Models (LLMs) have become foundational in modern language-driven software applications, profoundly influencing daily life. A critical technique in leveraging their potential is role-playing, where LLMs simulate diverse roles to enhance their real-world utility. However, while research has highlighted the presence of social biases in LLM outputs, it remains unclear whether and to what extent these biases emerge during role-playing scenarios. In this paper, we conduct an empirical study on fairness testing of LLMs in role-playing scenarios. To enable this testing, we use LLMs to generate 550 social roles spanning a comprehensive set of 11 demographic attributes, producing 33,000 role-specific questions that target various forms of bias. These questions, covering Yes/No, multiple-choice, and open-ended formats, are designed to prompt LLMs to adopt specific roles and respond accordingly. We employ a combination of rule-based and LLM-based strategies to identify biased responses, rigorously validated through human evaluation. Using the generated questions as the test cases, we conduct extensive evaluations of 10 advanced LLMs. The evaluation reveal 107,580 biased responses across the studied LLMs, with individual models yielding between 7,579 and 16,963 biased responses, underscoring the prevalence of bias in role-playing contexts. To support future research, we have publicly released the dataset, along with all scripts and experimental results.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, and this is the friction.

Desk Editor's Note private letter to a colleague

New 33k-question dataset for role-play bias testing is the real deliverable, but the 107k biased-response count depends on detectors whose accuracy is not quantified.

The paper's main contribution is a dataset of 33,000 questions built around 550 social roles drawn from 11 demographic attributes, used to probe bias when LLMs are prompted to role-play. The authors test 10 models and report 107,580 biased responses, with per-model counts between 7,579 and 16,963. They also release the questions, scripts, and results. What they do well is scale question generation across multiple formats and many roles and then apply it systematically. Generating the roles with LLMs and targeting specific biases is a straightforward extension of prior fairness work, and making everything public lowers the barrier for follow-up.

The soft spot is the bias detection step. The authors combine rule-based and LLM-based identification and state that it was validated by humans, but the text does not report inter-rater agreement, the size of the validation set, or precision and recall broken down by bias type or question style. Because the headline counts come directly from this pipeline, any false positives or missed cases would be multiplied across all 33,000 questions and every model, changing the story on how prevalent the bias really is. The stress-test note flags exactly this, and without those numbers the central claim stays provisional.

This work is aimed at fairness researchers and practitioners who use or evaluate LLMs in simulated persona settings. The dataset is the part most likely to get used, even if the specific counts are revised later. I would bring it to a reading group for the dataset construction details and to discuss how to improve the validation. It deserves peer review: the empirical scale and the public release are solid enough to warrant referee input on the methodology, particularly on labeling accuracy.

Referee Report

1 major / 1 minor

Summary. The paper conducts an empirical study on fairness testing of LLMs in role-playing scenarios. It generates 550 social roles across 11 demographic attributes, creating 33,000 role-specific questions in Yes/No, multiple-choice, and open-ended formats. These are used to evaluate 10 LLMs, identifying biased responses using rule-based and LLM-based strategies validated by human evaluation. The study reports 107,580 biased responses across the models (ranging from 7,579 to 16,963 per model) and releases the dataset, scripts, and results.

Significance. If the bias detection pipeline is reliable, the work demonstrates the prevalence of social biases in LLM role-playing and provides a substantial public dataset and evaluation framework for future fairness research in this area. The public release of the dataset and scripts strengthens the contribution by enabling reproducibility.

major comments (1)

[Abstract and Evaluation Strategy] The bias identification relies on rule-based and LLM-based strategies 'rigorously validated through human evaluation,' but no quantitative details are provided on inter-annotator agreement, the size of the validation set, precision or recall on role-playing responses, or error analysis by bias type. Since the central counts (107,580 biased responses) are produced by this pipeline, even moderate error rates could substantially alter the reported prevalence.
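To see why this matters, suppose the detectors' precision and recall against human labels were known. Since true positives = flagged × precision and recall = true positives / truly biased, the number of truly biased responses is roughly flagged × precision / recall. The sketch below applies that to the reported total with hypothetical error rates (the paper reports none) and treats the rates as uniform across bias types and question formats, which is exactly what a stratified error analysis would test.

```python
REPORTED_FLAGGED = 107_580  # total flagged responses reported in the abstract


def corrected_count(flagged: float, precision: float, recall: float) -> float:
    """Estimate how many responses are truly biased, given a detector's flag count.

    True positives = flagged * precision, and recall = true positives / truly biased,
    so truly biased = flagged * precision / recall.
    """
    if not 0.0 <= precision <= 1.0:
        raise ValueError("precision must lie in [0, 1]")
    if not 0.0 < recall <= 1.0:
        raise ValueError("recall must lie in (0, 1]")
    return flagged * precision / recall


if __name__ == "__main__":
    # Hypothetical (precision, recall) pairs for illustration only; the paper does not report these.
    for p, r in [(1.0, 1.0), (0.9, 0.9), (0.8, 0.95), (0.7, 0.9), (0.95, 0.7)]:
        estimate = corrected_count(REPORTED_FLAGGED, p, r)
        print(f"precision={p:.2f} recall={r:.2f} -> about {estimate:,.0f} truly biased responses")
```

Equal precision and recall cancel out, so a single accuracy figure cannot tell a reader whether 107,580 is an over- or an under-count; both numbers are needed.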

minor comments (1)

[Abstract] The sentence 'The evaluation reveal 107,580 biased responses' contains a grammatical error ('reveal' should be 'reveals').

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the evaluation strategy. We address the concern point by point below.

Referee: [Abstract and Evaluation Strategy] The bias identification relies on rule-based and LLM-based strategies 'rigorously validated through human evaluation,' but no quantitative details are provided on inter-annotator agreement, the size of the validation set, precision or recall on role-playing responses, or error analysis by bias type. Since the central counts (107,580 biased responses) are produced by this pipeline, even moderate error rates could substantially alter the reported prevalence.

Authors: We agree that the current manuscript lacks the requested quantitative details on the human validation of the bias identification pipeline. In the revised version we will add a dedicated subsection reporting: (1) the size of the human validation set, (2) inter-annotator agreement (e.g., Cohen’s or Fleiss’ kappa), (3) precision and recall of both the rule-based and LLM-based detectors measured against the human labels on role-playing responses, and (4) an error analysis stratified by bias type. These additions will directly address the concern that moderate error rates could affect the reported prevalence figures. revision: yes
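The quantities promised in this response are standard and cheap to compute once human labels exist. The sketch below is an illustration, not the authors' code: Cohen's kappa for two annotators labelling the same responses, plus precision and recall of a detector measured against the human labels. Fleiss' kappa, which handles more than two raters, is not shown. Landis and Koch [32] is the usual source for verbal agreement bands such as "substantial".

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cohens_kappa(rater_a: Sequence[Hashable], rater_b: Sequence[Hashable]) -> float:
    """Cohen's kappa: agreement between two raters, corrected for chance agreement."""
    if len(rater_a) != len(rater_b) or not rater_a:
        raise ValueError("both raters must label the same non-empty set of items")
    n = len(rater_a)
    observed = sum(a == b for a, b in zip(rater_a, rater_b)) / n
    counts_a, counts_b = Counter(rater_a), Counter(rater_b)
    # Chance agreement: probability that both raters independently pick the same label.
    expected = sum(counts_a[k] * counts_b[k] for k in counts_a.keys() | counts_b.keys()) / (n * n)
    if expected == 1.0:
        return math.nan  # both raters used one identical label throughout; kappa is undefined
    return (observed - expected) / (1.0 - expected)


def precision_recall(detector: Sequence[bool], human: Sequence[bool]) -> tuple[float, float]:
    """Precision and recall of detector flags, treating human labels as ground truth."""
    if len(detector) != len(human):
        raise ValueError("label sequences must align")
    tp = sum(d and h for d, h in zip(detector, human))
    fp = sum(d and not h for d, h in zip(detector, human))
    fn = sum(h and not d for d, h in zip(detector, human))
    precision = tp / (tp + fp) if tp + fp else math.nan
    recall = tp / (tp + fn) if tp + fn else math.nan
    return precision, recall
```

Running `precision_recall` separately for each bias type and question format gives the stratified error analysis the rebuttal promises.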

Circularity Check

0 steps flagged

No circularity: purely empirical count of detector-flagged responses

full rationale

The paper performs an empirical measurement: it generates 33,000 role-specific questions via LLMs, applies rule-based plus LLM-based bias detectors (human-validated per the abstract), and reports the resulting counts (107,580 total biased responses). No equations, fitted parameters, predictions, or derivations are present. The central claim is a direct tally of detector outputs on the generated test cases; it does not reduce to any self-definition, self-citation chain, or renaming of inputs. The quality of the detectors is a separate validity concern, not a circularity issue under the defined criteria.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The work is an empirical measurement study; it introduces no mathematical free parameters, no new physical or logical entities, and relies on one domain-level assumption about bias detection.

axioms (1)

Domain assumption: biased responses in role-playing can be reliably identified by a combination of rule-based and LLM-based strategies that were validated by human evaluation.
This assumption is required to convert raw model outputs into the reported count of 107,580 biased responses.

pith-pipeline@v0.9.0 · 5798 in / 1210 out tokens · 50680 ms · 2026-05-23T18:15:46.192012+00:00 · methodology

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs
cs.CY 2026-05 unverdicted novelty 7.0

StereoTales shows that LLMs produce harmful, culturally adapted stereotypes in open-ended multilingual stories, with patterns consistent across providers and aligned human-LLM harm judgments.
StereoTales: A Multilingual Framework for Open-Ended Stereotype Discovery in LLMs
cs.CY 2026-05 accept novelty 7.0

StereoTales shows that all tested LLMs emit harmful stereotypes in open-ended stories, with associations adapting to prompt language and targeting locally salient groups rather than transferring uniformly across languages.
Identifying and Mitigating Gender Cues in Academic Recommendation Letters: An Interpretability Case Study
cs.LG 2026-04 unverdicted novelty 5.0

Transformer models detect applicant gender in de-gendered academic recommendation letters via implicit linguistic patterns such as associations with words like 'emotional' and 'humanitarian', and removing these cues r...

Reference graph

Works this paper leans on

67 extracted references · 67 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1] 2024. Adopt-a-persona-claude. https://docs.anthropic.com/en/docs/build-with-claude/prompt-engineering/system-prompts

[2] 2024. Adopt-a-persona-gemini. https://support.google.com/a/users/answer/14667148?visit_id=638649091395709697-2537054327&hl=en&rd=1

[3] 2024. Adopt-a-persona-meta. https://www.llama.com/docs/how-to-guides/prompting

[4] 2024. Adopt-a-persona-mistral. https://docs.mistral.ai/guides/prompting_capabilities/

[5] 2024. Adopt-a-persona-openai. https://platform.openai.com/docs/guides/prompt-engineering/tactic-ask-the-model-to-adopt-a-persona

[6] 2024. Chatbot Arena LLM Leaderboard: Community-driven evaluation for best LLM and AI chatbots. https://lmarena.ai/

[7] 2024. DeepSeek-V2.5. https://huggingface.co/deepseek-ai/DeepSeek-V2.5

[8] 2024. GPT4o. https://platform.openai.com/docs/models/gpt-4o

[9] 2024. GPT4o-mini. https://platform.openai.com/docs/models/gpt-4o-mini

[10] 2024. Meta-Llama-3-70B. https://huggingface.co/meta-llama/Meta-Llama-3-70B

[11] 2024. Meta-Llama-3-8B. https://huggingface.co/meta-llama/Meta-Llama-3-8B

[12] 2024. Mistral-7B-Instruct-v0.3. https://huggingface.co/mistralai/Mistral-7B-Instruct-v0.3

[13] 2024. Qwen1.5-110B-Chat. https://huggingface.co/Qwen/Qwen1.5-110B-Chat

[14] 2024. Replication package. https://github.com/LLMBias/BiasLens

[15] Muhammad Hilmi Asyrofi, Zhou Yang, Imam Nur Bani Yusuf, Hong Jin Kang, Ferdian Thung, and David Lo. 2022. BiasFinder: Metamorphic test generation to uncover bias for sentiment analysis systems. IEEE Transactions on Software Engineering 48, 12 (2022), 5087–5101.

[16] Earl T. Barr, Mark Harman, Phil McMinn, Muzammil Shahbaz, and Shin Yoo. 2015. The oracle problem in software testing: A survey. IEEE Transactions on Software Engineering 41, 5 (2015), 507–525.

[17] Yuriy Brun and Alexandra Meliou. 2018. Software fairness. In Proceedings of the 2018 ACM Joint Meeting on European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2018. 754–759.

[18] Deborah Carlander, Kiyoshiro Okada, Henrik Engström, and Shuichi Kurabayashi. 2024. Controlled chain of thought: Eliciting role-play understanding in LLM through prompts. In Proceedings of IEEE Conference on Games, CoG 2024. 1–4.

[19] Zhenpeng Chen, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Tao Xie, and Xuanzhe Liu. 2020. A comprehensive study on challenges in deploying deep learning based software. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020. 750–762.

[20] Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, and Federica Sarro. 2024. Fairness testing: A comprehensive survey and analysis of trends. ACM Transactions on Software Engineering and Methodology 33, 5 (2024), 137:1–137:59.

[21] Zhenpeng Chen, Jie M. Zhang, Federica Sarro, and Mark Harman. 2022. MAAT: A novel ensemble approach to addressing fairness and performance bugs for machine learning software. In Proceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022. 1122–1134.

[22] Zhenpeng Chen, Jie M. Zhang, Federica Sarro, and Mark Harman. 2023. A comprehensive empirical study of bias mitigation methods for machine learning classifiers. ACM Transactions on Software Engineering and Methodology 32, 4 (2023), 106:1–106:30.

[23] Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael I. Jordan, Joseph E. Gonzalez, and Ion Stoica. 2024. Chatbot arena: An open platform for evaluating LLMs by human preference. In Proceedings of the Forty-first International Conference on Machine Learning, ICML 2024.

[24] Zhibo Chu, Zichong Wang, and Wenbin Zhang. 2024. Fairness in large language models: A Taxonomic Survey. SIGKDD Exploration 26, 1 (2024), 34–48.

[25] Xuanqi Gao, Juan Zhai, Shiqing Ma, Chao Shen, Yufei Chen, and Qian Wang. 2022. Fairneuron: Improving deep neural network fairness with adversary games on selective neurons. In Proceedings of the 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022. 921–933.

[26] Karen Gonsalkorale, Jeffrey W Sherman, and Karl Christoph Klauer. 2009. Aging and prejudice: Diminished regulation of automatic race bias among older adults. Journal of Experimental Social Psychology 45, 2 (2009), 410–414.

[27] James D Gwartney and Kenneth M McCaffree. 1971. Variance in discrimination among occupations. Southern Economic Journal (1971), 141–155.

[28] Amit Haim, Alejandro Salinas, and Julian Nyarko. 2024. What’s in a name? Auditing large language models for race and gender bias. arXiv preprint arXiv:2402.14875 (2024).

[29] Jaeho Jeon and Seongyong Lee. 2023. Large language models in education: A focus on the complementary relationship between human teachers and ChatGPT. Education and Information Technologies 28, 12 (2023), 15873–15892.

[30] Mahammed Kamruzzaman, Hieu Nguyen, Nazmul Hassan, and Gene Louis Kim. 2024. "A Woman is More Culturally Knowledgeable than A Man?": The Effect of Personas on Cultural Norm Interpretation in LLMs. CoRR abs/2409.11636 (2024).

[31] Hadas Kotek, Rikker Dockum, and David Q. Sun. 2023. Gender bias and stereotypes in large language models. In Proceedings of The ACM Collective Intelligence Conference, CI 2023. 12–24.

[32] J Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33, 1 (1977), 159–174.

[33] Yingji Li, Mengnan Du, Rui Song, Xin Wang, and Ying Wang. 2023. A survey on fairness in large language models. CoRR abs/2308.10149 (2023).

[34] Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, and Diyi Yang. 2024. Roleplay-doh: Enabling domain-experts to create LLM-simulated patients via eliciting and adhering to principles. CoRR abs/2407.00870 (2024).

[35] Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, and Shao-Hua Sun. 2024. LLM discussion: Enhancing the creativity of large language models via discussion framework and role-play. CoRR abs/2405.06373 (2024).

[36] Verya Monjezi, Ashutosh Trivedi, Gang Tan, and Saeid Tizpaz-Niari. 2023. Information-Theoretic Testing and Debugging of Fairness Defects in Deep Neural Networks. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023. 1571–1582.

[37] Moin Nadeem, Anna Bethke, and Siva Reddy. 2021. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021. Association for Computational Linguistics, 5356–5371.

[38] Shuyin Ouyang, Jie M Zhang, Mark Harman, and Meng Wang. 2024. An empirical study of the non-determinism of ChatGPT in code generation. ACM Transactions on Software Engineering and Methodology (2024).

[39] Anaelia Ovalle, Palash Goyal, Jwala Dhamala, Zachary Jaggers, Kai-Wei Chang, Aram Galstyan, Richard S. Zemel, and Rahul Gupta. 2023. “I’m fully who I am”: Towards centering transgender and non-binary voices to measure biases in open language generation. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023. 1246–1266.

[40] Shubham Pandey, Archana Patel, and Purvi Pokhariyal. 2024. Exploring the role of ChatGPT in the law enforcement and banking sectors. Artificial Intelligence for Risk Mitigation in the Financial Industry (2024), 327–347.

[41] Juliane Ressel, Michaele Völler, Finbarr Murphy, and Martin Mullins. 2024. Addressing the notion of trust around ChatGPT in the high-stakes use case of insurance. Technology in Society (2024), 102644.

[42] Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, and Zeynep Akata. 2023. In-context impersonation reveals large language models’ strengths and biases. In Proceedings of Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023.

[43] Abel Salinas, Louis Penafiel, Robert McCormack, and Fred Morstatter. 2023. “Im not racist but...”: Discovering bias in the internal knowledge of large language models. CoRR abs/2310.08780 (2023).

[44] Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. In Proceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020. 5477–5490.

[45] Burcu Sayin, Pasquale Minervini, Jacopo Staiano, and Andrea Passerini. 2024. Can LLMs correct physicians, yet? Investigating effective interaction methods in the medical domain. In Proceedings of the 6th Clinical Natural Language Processing Workshop, Clinical NLP 2024. 218–237.

[46] Murray Shanahan, Kyle McDonell, and Laria Reynolds. 2023. Role play with large language models. Nature 623, 7987 (2023), 493–498.

[47] Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, and Adina Williams. 2022. “I’m sorry to hear that”: Finding new biases in language models with a holistic descriptor dataset. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022. 9180–9211.

[48] Ezekiel O. Soremekun, Sakshi Udeshi, and Sudipta Chattopadhyay. 2022. Astraea: Grammar-based fairness testing. IEEE Transactions on Software Engineering 48, 12 (2022), 5188–5211.

[49] Zeyu Sun, Zhenpeng Chen, Jie Zhang, and Dan Hao. 2024. Fairness testing of machine translation systems. ACM Transactions on Software Engineering and Methodology 33, 6 (2024), 156.

[50] Yan Tao, Olga Viberg, Ryan S Baker, and René F Kizilcec. 2024. Cultural bias and cultural alignment of large language models. PNAS nexus 3, 9 (2024).

[51] Saeid Tizpaz-Niari, Ashish Kumar, Gang Tan, and Ashutosh Trivedi. 2022. Fairness-aware configuration of machine learning libraries. In Proceedings of the 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022. 909–920.

[52] Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, and Yun-Nung Chen. 2024. Two tales of persona in LLMs: A survey of role-playing and personalization. CoRR abs/2406.01171 (2024).

[53] Yixin Wan, George Pu, Jiao Sun, Aparna Garimella, Kai-Wei Chang, and Nanyun Peng. 2023. “Kelly is a warm person, Joseph is a role model”: Gender biases in LLM-generated reference letters. In Proceedings of Findings of the Association for Computational Linguistics: EMNLP 2023. 3730–3748.

[54] Yuxuan Wan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, and Michael R. Lyu. 2023. BiasAsker: Measuring the bias in conversational AI system. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023. 515–527.

[55] Chao Wang, Zhenpeng Chen, and Minghui Zhou. 2023. AutoML from software engineering perspective: Landscapes and challenges. In Proceedings of the 20th IEEE/ACM International Conference on Mining Software Repositories, MSR

[56] Wenxuan Wang, Wenxiang Jiao, Jingyuan Huang, Ruyi Dai, Jen-tse Huang, Zhaopeng Tu, and Michael R. Lyu. 2024. Not all countries celebrate Thanksgiving: On the cultural dominance in large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024. 6349–6384.

[57] Craig S Webster, Saana Taylor, Courtney Thomas, and Jennifer M Weller. 2022. Social bias, discrimination and inequity in healthcare: Mechanisms, implications and recommendations. BJA education 22, 4 (2022), 131–137.

[58] Jinfeng Wen, Zhenpeng Chen, Yi Liu, Yiling Lou, Yun Ma, Gang Huang, Xin Jin, and Xuanzhe Liu. 2021. An empirical study on challenges of application development in serverless computing. In Proceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021. 416–428.

[59] Cristina G Wilson, Amy T Nusbaum, Paul Whitney, and John M Hinson. 2018. Age-differences in cognitive flexibility when overcoming a preexisting bias through feedback. Journal of clinical and experimental neuropsychology 40, 6 (2018), 586–594.

[60] Jie M. Zhang, Mark Harman, Lei Ma, and Yang Liu. 2022. Machine learning testing: Survey, landscapes and horizons. IEEE Transactions on Software Engineering 48, 2 (2022), 1–36.

[61] Peixin Zhang, Jingyi Wang, Jun Sun, Xinyu Wang, Guoliang Dong, Xingen Wang, Ting Dai, and Jin Song Dong. 2022. Automatic Fairness Testing of Neural Classifiers Through Adversarial Sampling. IEEE Trans. Software Eng. 48 (2022).

[62] Tianyi Zhang, Cuiyun Gao, Lei Ma, Michael R. Lyu, and Miryung Kim. 2019. An empirical study of common challenges in developing deep learning applications. In Proceedings of the 30th IEEE International Symposium on Software Reliability Engineering, ISSRE 2019. 104–115.

[63] Huaqin Zhao, Zhengliang Liu, Zihao Wu, Yiwei Li, Tianze Yang, Peng Shu, Shaochen Xu, Haixing Dai, Lin Zhao, Gengchen Mai, Ninghao Liu, and Tianming Liu. 2024. Revolutionizing finance with LLMs: An overview of applications and insights. CoRR abs/2401.11641 (2024).

[64] Jinman Zhao, Zifan Qian, Linbo Cao, Yining Wang, and Yitian Ding. 2024. Bias and Toxicity in Role-Play Reasoning. CoRR abs/2409.13979 (2024).

[65] Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. 2023. A survey of large language models. CoRR abs/2303.18223 (2023).

[66] Haibin Zheng, Zhiqing Chen, Tianyu Du, Xuhong Zhang, Yao Cheng, Shouling Ji, Jingyi Wang, Yue Yu, and Jinyin Chen. 2022. NeuronFair: Interpretable White-Box Fairness Testing through Biased Neuron Identification. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022.


work page 2018

[18] [18]

Deborah Carlander, Kiyoshiro Okada, Henrik Engström, and Shuichi Kurabayashi. 2024. Controlled chain of thought: Eliciting role-play understanding in LLM through prompts. InProceedings of IEEE Conference on Games, CoG 2024 . 1–4

work page 2024

[19] [19]

Zhenpeng Chen, Yanbin Cao, Yuanqiang Liu, Haoyu Wang, Tao Xie, and Xuanzhe Liu. 2020. A comprehensive study on challenges in deploying deep learning based software. In Proceedings of the 28th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2020 . 750–762

work page 2020

[20] [20]

Zhang, Max Hort, Mark Harman, and Federica Sarro

Zhenpeng Chen, Jie M. Zhang, Max Hort, Mark Harman, and Federica Sarro. 2024. Fairness testing: A comprehensive survey and analysis of trends. ACM Transactions on Software Engineering and Methodology 33, 5 (2024), 137:1–137:59

work page 2024

[21] [21]

Zhang, Federica Sarro, and Mark Harman

Zhenpeng Chen, Jie M. Zhang, Federica Sarro, and Mark Harman. 2022. MAAT: A novel ensemble approach to addressing fairness and performance bugs for machine learning software. InProceedings of the 30th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2022 . 1122–1134

work page 2022

[22] [22]

Zhang, Federica Sarro, and Mark Harman

Zhenpeng Chen, Jie M. Zhang, Federica Sarro, and Mark Harman. 2023. A comprehensive empirical study of bias mitigation methods for machine learning classifiers. ACM Transactions on Software Engineering and Methodology 32, 4 (2023), 106:1–106:30

work page 2023

[23] [23]

Jordan, Joseph E

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Banghua Zhu, Hao Zhang, Michael I. Jordan, Joseph E. Gonzalez, and Ion Stoica. 2024. Chatbot arena: An open platform for evaluating LLMs by human preference. In Proceedings of the Forty-first International Conference on Machine Learning, ICML 2024

work page 2024

[24] [24]

Zhibo Chu, Zichong Wang, and Wenbin Zhang. 2024. Fairness in large language models: A Taxonomic Survey.SIGKDD Exploration 26, 1 (2024), 34–48

work page 2024

[25] [25]

Xuanqi Gao, Juan Zhai, Shiqing Ma, Chao Shen, Yufei Chen, and Qian Wang. 2022. Fairneuron: Improving deep neural network fairness with adversary games on selective neurons. In Proceedings of the 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022 . 921–933

work page 2022

[26] [26]

Karen Gonsalkorale, Jeffrey W Sherman, and Karl Christoph Klauer. 2009. Aging and prejudice: Diminished regulation of automatic race bias among older adults. Journal of Experimental Social Psychology 45, 2 (2009), 410–414

work page 2009

[27] [27]

James D Gwartney and Kenneth M McCaffree. 1971. Variance in discrimination among occupations.Southern Economic Journal (1971), 141–155

work page 1971

[28] [28]

Amit Haim, Alejandro Salinas, and Julian Nyarko. 2024. What’s in a name? Auditing large language models for race and gender bias. arXiv preprint arXiv:2402.14875 (2024)

work page arXiv 2024

[29] [29]

Jaeho Jeon and Seongyong Lee. 2023. Large language models in education: A focus on the complementary relationship between human teachers and ChatGPT. Education and Information Technologies 28, 12 (2023), 15873–15892. , Vol. 1, No. 1, Article . Publication date: November 2024. 20 Li et al

work page 2023

[30] [30]

A Woman is More Culturally Knowledgeable than A Man?

Mahammed Kamruzzaman, Hieu Nguyen, Nazmul Hassan, and Gene Louis Kim. 2024. "A Woman is More Culturally Knowledgeable than A Man?": The Effect of Personas on Cultural Norm Interpretation in LLMs. CoRR abs/2409.11636 (2024)

work page arXiv 2024

[31] [31]

Hadas Kotek, Rikker Dockum, and David Q. Sun. 2023. Gender bias and stereotypes in large language models. In Proceedings of The ACM Collective Intelligence Conference, CI 2023 . 12–24

work page 2023

[32] [32]

J Richard Landis and Gary G. Koch. 1977. The measurement of observer agreement for categorical data. Biometrics 33 1 (1977), 159–74

work page 1977

[33] [33]

Yingji Li, Mengnan Du, Rui Song, Xin Wang, and Ying Wang. 2023. A survey on fairness in large language models. CoRR abs/2308.10149 (2023)

work page arXiv 2023

[34] [34]

Ryan Louie, Ananjan Nandi, William Fang, Cheng Chang, Emma Brunskill, and Diyi Yang. 2024. Roleplay-doh: Enabling domain-experts to create LLM-simulated patients via eliciting and adhering to principles. CoRR abs/2407.00870 (2024)

work page arXiv 2024

[35] [35]

Li-Chun Lu, Shou-Jen Chen, Tsung-Min Pai, Chan-Hung Yu, Hung-yi Lee, and Shao-Hua Sun. 2024. LLM discussion: Enhancing the creativity of large language models via discussion framework and role-play. CoRR abs/2405.06373 (2024)

work page arXiv 2024

[36] [36]

Verya Monjezi, Ashutosh Trivedi, Gang Tan, and Saeid Tizpaz-Niari. 2023. Information-Theoretic Testing and Debugging of Fairness Defects in Deep Neural Networks. In 45th IEEE/ACM International Conference on Software Engineering, ICSE 2023. 1571–1582

work page 2023

[37] [37]

Moin Nadeem, Anna Bethke, and Siva Reddy. 2021. StereoSet: Measuring stereotypical bias in pretrained language models. In Proceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing, ACL/IJCNLP 2021 . Association for Computational Linguistics, 5356–5371

work page 2021

[38] [38]

Shuyin Ouyang, Jie M Zhang, Mark Harman, and Meng Wang. 2024. An empirical study of the non-determinism of ChatGPT in code generation. ACM Transactions on Software Engineering and Methodology (2024)

work page 2024

[39] [39]

I’m fully who I am

Anaelia Ovalle, Palash Goyal, Jwala Dhamala, Zachary Jaggers, Kai-Wei Chang, Aram Galstyan, Richard S. Zemel, and Rahul Gupta. 2023. “I’m fully who I am”’: Towards centering transgender and non-binary voices to measure biases in open language generation. In Proceedings of the 2023 ACM Conference on Fairness, Accountability, and Transparency, FAccT 2023. 1246–1266

work page 2023

[40] [40]

Shubham Pandey, Archana Patel, and Purvi Pokhariyal. 2024. Exploring the role of ChatGPT in the law enforcement and banking sectors. Artificial Intelligence for Risk Mitigation in the Financial Industry (2024), 327–347

work page 2024

[41] [41]

Juliane Ressel, Michaele Völler, Finbarr Murphy, and Martin Mullins. 2024. Addressing the notion of trust around ChatGPT in the high-stakes use case of insurance. Technology in Society (2024), 102644

work page 2024

[42] [42]

Leonard Salewski, Stephan Alaniz, Isabel Rio-Torto, Eric Schulz, and Zeynep Akata. 2023. In-context impersonation reveals large language models’ strengths and biases. InProceesings of Advances in Neural Information Processing Systems 36: Annual Conference on Neural Information Processing Systems 2023, NeurIPS 2023

work page 2023

[43] [43]

Im not Racist but

Abel Salinas, Louis Penafiel, Robert McCormack, and Fred Morstatter. 2023. “Im not racist but... ”: Discovering bias in the internal knowledge of large language models. CoRR abs/2310.08780 (2023)

work page arXiv 2023

[44] [44]

Smith, and Yejin Choi

Maarten Sap, Saadia Gabriel, Lianhui Qin, Dan Jurafsky, Noah A. Smith, and Yejin Choi. 2020. Social Bias Frames: Reasoning about Social and Power Implications of Language. InProceedings of the 58th Annual Meeting of the Association for Computational Linguistics, ACL 2020 . 5477–5490

work page 2020

[45] [45]

Burcu Sayin, Pasquale Minervini, Jacopo Staiano, and Andrea Passerini. 2024. Can LLMs correct physicians, yet? Investigating effective interaction methods in the medical domain. In Proceedings of the 6th Clinical Natural Language Processing Workshop, Clinical NLP 2024. 218–237

work page 2024

[46] [46]

Murray Shanahan, Kyle McDonell, and Laria Reynolds. 2023. Role play with large language models. Nature 623, 7987 (2023), 493–498

work page 2023

[47] [47]

I’m sorry to hear that

Eric Michael Smith, Melissa Hall, Melanie Kambadur, Eleonora Presani, and Adina Williams. 2022. “I’m sorry to hear that”: Finding new biases in language models with a holistic descriptor dataset. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, EMNLP 2022 . 9180–9211

work page 2022

[48] [48]

Soremekun, Sakshi Udeshi, and Sudipta Chattopadhyay

Ezekiel O. Soremekun, Sakshi Udeshi, and Sudipta Chattopadhyay. 2022. Astraea: Grammar-based fairness testing. IEEE Transactions on Software Engineering 48, 12 (2022), 5188–5211

work page 2022

[49] [49]

Zeyu Sun, Zhenpeng Chen, Jie Zhang, and Dan Hao. 2024. Fairness testing of machine translation systems. ACM Transactions on Software Engineering and Methodology 33, 6 (2024), 156

work page 2024

[50] [50]

Yan Tao, Olga Viberg, Ryan S Baker, and René F Kizilcec. 2024. Cultural bias and cultural alignment of large language models. PNAS nexus 3, 9 (2024)

work page 2024

[51] [51]

Saeid Tizpaz-Niari, Ashish Kumar, Gang Tan, and Ashutosh Trivedi. 2022. Fairness-aware configuration of machine learning libraries. In Proceedings of the 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022 . 909–920. , Vol. 1, No. 1, Article . Publication date: November 2024. Benchmarking Bias in Large Language Models during Rol...

work page 2022

[52] [52]

Yu-Min Tseng, Yu-Chao Huang, Teng-Yun Hsiao, Wei-Lin Chen, Chao-Wei Huang, Yu Meng, and Yun-Nung Chen

work page

[53] [53]

CoRR abs/2406.01171 (2024)

Two tales of persona in LLMs: A survey of role-playing and personalization. CoRR abs/2406.01171 (2024)

work page arXiv 2024

[54] [54]

Kelly is a warm person, Joseph is a role model

Yixin Wan, George Pu, Jiao Sun, Aparna Garimella, Kai-Wei Chang, and Nanyun Peng. 2023. “Kelly is a warm person, Joseph is a role model”’: Gender biases in LLM-generated reference letters. InProceedings of Findings of the Association for Computational Linguistics: EMNLP 2023 . 3730–3748

work page 2023

[55] [55]

Yuxuan Wan, Wenxuan Wang, Pinjia He, Jiazhen Gu, Haonan Bai, and Michael R. Lyu. 2023. BiasAsker: Measuring the bias in conversational AI system. In Proceedings of the 31st ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2023 . 515–527

work page 2023

[56] [56]

Chao Wang, Zhenpeng Chen, and Minghui Zhou. 2023. AutoML from software engineering perspective: Landscapes and challenges. In Proceedings of the 20th IEEE/ACM International Conference on Mining Software Repositories, MSR

work page 2023

[57] [57]

Wenxuan Wang, Wenxiang Jiao, Jingyuan Huang, Ruyi Dai, Jen-tse Huang, Zhaopeng Tu, and Michael R. Lyu. 2024. Not all countries celebrate Thanksgiving: On the cultural dominance in large language models. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics, ACL 2024 . 6349–6384

work page 2024

[58] [58]

Craig S Webster, Saana Taylor, Courtney Thomas, and Jennifer M Weller. 2022. Social bias, discrimination and inequity in healthcare: Mechanisms, implications and recommendations. BJA education 22, 4 (2022), 131–137

work page 2022

[59] [59]

Jinfeng Wen, Zhenpeng Chen, Yi Liu, Yiling Lou, Yun Ma, Gang Huang, Xin Jin, and Xuanzhe Liu. 2021. An empirical study on challenges of application development in serverless computing. InProceedings of the 29th ACM Joint European Software Engineering Conference and Symposium on the Foundations of Software Engineering, ESEC/FSE 2021 . 416–428

work page 2021

[60] [60]

Cristina G Wilson, Amy T Nusbaum, Paul Whitney, and John M Hinson. 2018. Age-differences in cognitive flexibility when overcoming a preexisting bias through feedback. Journal of clinical and experimental neuropsychology 40, 6 (2018), 586–594

work page 2018

[61] [61]

Zhang, Mark Harman, Lei Ma, and Yang Liu

Jie M. Zhang, Mark Harman, Lei Ma, and Yang Liu. 2022. Machine learning testing: Survey, landscapes and horizons. IEEE Transactions on Software Engineering 48, 2 (2022), 1–36

work page 2022

[62] [62]

Peixin Zhang, Jingyi Wang, Jun Sun, Xinyu Wang, Guoliang Dong, Xingen Wang, Ting Dai, and Jin Song Dong. 2022. Automatic Fairness Testing of Neural Classifiers Through Adversarial Sampling. IEEE Trans. Software Eng. 48 (2022)

work page 2022

[63] [63]

Lyu, and Miryung Kim

Tianyi Zhang, Cuiyun Gao, Lei Ma, Michael R. Lyu, and Miryung Kim. 2019. An empirical study of common challenges in developing deep learning applications. InProceedings of the 30th IEEE International Symposium on Software Reliability Engineering, ISSRE 2019. 104–115

work page 2019

[64] [64]

Huaqin Zhao, Zhengliang Liu, Zihao Wu, Yiwei Li, Tianze Yang, Peng Shu, Shaochen Xu, Haixing Dai, Lin Zhao, Gengchen Mai, Ninghao Liu, and Tianming Liu. 2024. Revolutionizing finance with LLMs: An overview of applications and insights. CoRR abs/2401.11641 (2024)

work page arXiv 2024

[65] [65]

Jinman Zhao, Zifan Qian, Linbo Cao, Yining Wang, and Yitian Ding. 2024. Bias and Toxicity in Role-Play Reasoning. CoRR abs/2409.13979 (2024)

work page arXiv 2024

[66] [66]

Wayne Xin Zhao, Kun Zhou, Junyi Li, Tianyi Tang, Xiaolei Wang, Yupeng Hou, Yingqian Min, Beichen Zhang, Junjie Zhang, Zican Dong, Yifan Du, Chen Yang, Yushuo Chen, Zhipeng Chen, Jinhao Jiang, Ruiyang Ren, Yifan Li, Xinyu Tang, Zikang Liu, Peiyu Liu, Jian-Yun Nie, and Ji-Rong Wen. 2023. A survey of large language models. CoRR abs/2303.18223 (2023)

work page internal anchor Pith review Pith/arXiv arXiv 2023

[67] [67]

Haibin Zheng, Zhiqing Chen, Tianyu Du, Xuhong Zhang, Yao Cheng, Shouling Ji, Jingyi Wang, Yue Yu, and Jinyin Chen. 2022. NeuronFair: Interpretable White-Box Fairness Testing through Biased Neuron Identification. In 44th IEEE/ACM 44th International Conference on Software Engineering, ICSE 2022 . , Vol. 1, No. 1, Article . Publication date: November 2024

work page 2022