
arxiv: 2605.16113 · v1 · pith:XLC6DM2Anew · submitted 2026-05-15 · 💻 cs.CL · cs.AI

DebiasRAG: A Tuning-Free Path to Fair Generation in Large Language Models through Retrieval-Augmented Generation

Pith reviewed 2026-05-20 18:57 UTC · model grok-4.3

classification 💻 cs.CL cs.AI
keywords debiasing · retrieval-augmented generation · large language models · social bias · fairness · tuning-free · query-specific · RAG

The pith

DebiasRAG improves fairness in large language models by retrieving and reversing bias contexts into debiasing constraints without model tuning.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Large language models can generate prejudiced responses on topics involving race, gender, and age due to biases in their training data. Existing approaches to reduce these biases often involve fine-tuning or prompt engineering, which demand extra resources and may impair the models' original abilities. This paper presents DebiasRAG, a tuning-free framework that uses retrieval-augmented generation to create query-specific debiasing contexts dynamically. It prepares bias contexts offline, retrieves the ones relevant to a given query, reverses them to form fairness constraints, and reranks them using gradient updates before passing them to the LLM. If successful, this would allow fairer generation while keeping the models' representation capabilities intact, making it practical for real-world use without heavy computational costs.

Core claim

DebiasRAG is a novel tuning-free and dynamic query-specific debiasing framework based on retrieval-augmented generation. It improves fairness while preserving the intrinsic properties of LLMs such as representation ability. The framework operates in three stages: query-specific debiasing candidate generation by self-diagnosing and reversing bias contexts, context candidate pool construction from regular RAG databases, and gradient-updated debiasing-guided context piece reranking to select effective pieces for inference.
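As a rough illustration of the first two stages described above, the following Python sketch retrieves a bias context for a query, reverses it into a fairness constraint, and builds a regular context pool. It is not the authors' implementation: the bag-of-words retriever, the `reverse_bias_context` template, all function names, and the toy databases are assumptions made for illustration, since this page does not specify how self-diagnosis or reversal is performed.

```python
import re
from collections import Counter
from math import sqrt


def tokenize(text):
    """Lowercase and split text into alphanumeric tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())


def cosine(a, b):
    """Cosine similarity between two bag-of-words Counters."""
    dot = sum(a[t] * b[t] for t in a)
    norm = sqrt(sum(v * v for v in a.values())) * sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0


def retrieve(query, corpus, k):
    """Return the k corpus entries most similar to the query (toy lexical retrieval)."""
    q = Counter(tokenize(query))
    ranked = sorted(corpus, key=lambda doc: cosine(q, Counter(tokenize(doc))), reverse=True)
    return ranked[:k]


def reverse_bias_context(bias_context):
    """Hypothetical reversal: wrap the retrieved bias statement in a fairness constraint."""
    return f'Do not assume that "{bias_context}"; judge individuals on the given facts.'


def build_candidates(query, bias_db, doc_db, k_bias=1, k_doc=2):
    # Stage 1: retrieve query-relevant bias contexts (prepared offline), then reverse them.
    debias = [reverse_bias_context(b) for b in retrieve(query, bias_db, k_bias)]
    # Stage 2: regular RAG retrieval from the ordinary document database.
    pool = retrieve(query, doc_db, k_doc)
    return debias, pool
```

In a real system the lexical retriever would presumably be replaced by a dense retriever and the reversal by an LLM call; both substitutions are outside what this page documents.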

What carries the argument

The central mechanism is the retrieval-augmented generation process that generates debiasing contexts by reversing offline-prepared bias contexts and applies gradient-updated reranking to guide fair outputs.
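One way to picture gradient-updated reranking without touching model weights is gradient ascent over per-piece selection logits. The sketch below is an assumption-laden toy, not the paper's objective: `relevance` and `debias` are hypothetical precomputed scores for each context piece, `lam` is a hypothetical trade-off weight, and the objective J(w) = Σ softmax(w)_i · (relevance_i + lam · debias_i) is chosen only because its gradient is easy to write by hand.

```python
from math import exp


def softmax(w):
    """Numerically stable softmax over a list of floats."""
    m = max(w)
    e = [exp(x - m) for x in w]
    z = sum(e)
    return [x / z for x in e]


def rerank(relevance, debias, lam=1.0, steps=50, lr=0.5):
    """Rank context pieces by learned logits; only the logits are updated, never the LLM."""
    scores = [r + lam * d for r, d in zip(relevance, debias)]
    w = [0.0] * len(scores)  # one learnable logit per context piece
    for _ in range(steps):
        p = softmax(w)
        expected = sum(pi * si for pi, si in zip(p, scores))
        # Gradient of J(w) = sum_i softmax(w)_i * s_i with respect to w_j.
        grad = [pj * (sj - expected) for pj, sj in zip(p, scores)]
        w = [wj + lr * gj for wj, gj in zip(w, grad)]
    # Indices of pieces, best first.
    return sorted(range(len(w)), key=lambda i: w[i], reverse=True)
```

With `lam=0` the ranking follows relevance alone; raising `lam` lets a piece with a strong debiasing score overtake a more relevant one, which is the kind of trade-off the mechanism described above would need to manage.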

Load-bearing premise

The central claim rests on the assumption that offline-prepared bias contexts can be self-diagnosed from the input query and reversed into effective debiasing contexts that act as reliable fairness constraints without reducing output quality.

What would settle it

The claim would be falsified by an experiment in which applying DebiasRAG either fails to reduce measured social bias scores on benchmark queries or lowers the LLM's performance on standard representation tasks.
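The test could be operationalized roughly as follows, using StereoSet-style metrics (the figure captions on this page mention the StereoSet SS score). This is a simplified top-choice variant, not the official StereoSet scorer: each example is reduced to the single option the model prefers. Using the language modeling score (LMS) as a proxy for representation ability, the `lms_tolerance` threshold, and the function names are all assumptions.

```python
def stereoset_scores(choices):
    """Compute simplified StereoSet-style scores.

    choices: one label per example, each 'stereotype', 'anti-stereotype', or 'unrelated',
    naming the option the model preferred.
    """
    stereo = choices.count("stereotype")
    anti = choices.count("anti-stereotype")
    meaningful = stereo + anti
    # LMS: share of examples where a meaningful option beat the unrelated one.
    lms = 100.0 * meaningful / len(choices)
    # SS: share of meaningful picks that were stereotypical; 50 is the unbiased ideal.
    ss = 100.0 * stereo / meaningful if meaningful else 50.0
    # ICAT combines both: high LMS with SS near 50 gives a high score.
    icat = lms * min(ss, 100.0 - ss) / 50.0
    return {"ss": ss, "lms": lms, "icat": icat}


def claim_refuted(before, after, lms_tolerance=1.0):
    """True if bias did not move toward 50 or LMS dropped by more than the tolerance."""
    no_bias_reduction = abs(after["ss"] - 50.0) >= abs(before["ss"] - 50.0)
    quality_drop = after["lms"] < before["lms"] - lms_tolerance
    return no_bias_reduction or quality_drop
```

A real evaluation would also need multiple models, seeds, and significance testing; the sketch only encodes the two failure conditions named above.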

Figures

Figures reproduced from arXiv: 2605.16113 by Bingyin Zhao, Duy Cao Hoang, Huawei Lin, Khoa D Doan, Ping Li, Rui Chu, Thanh Quoc Hung Le, Weijie Zhao, Yingjie Lao.

Figure 1: System workflow of DebiasRAG. The workflow consists of three main components. The first stage (Upper Block) …
Figure 2: RAG Document-free Scenario. Taking the StereoSet score benchmark as an example, we compare the performance of 2 scenarios. In this scenario, no mini-Wikipedia dataset is added to D (𝐷normal); only an NLI dataset remains. The optimization flow of this scenario is shown in Figure 2b. The figure shows the optimization steps of our method: the original performance of Llama…
Figure 3: In panel (a), the running average of the target scores across iterations provides a clear visual indication that our gradient optimization process is steadily converging towards an optimal solution. This convergence not only demonstrates that the parameter estimates are progressively refined, but also substantiates the overall effectiveness of our optimization strategy. Performance improvements with increasing 𝜆. By…
Figure 4: Under the intersentence criterion, the original model ex…
Figure 4: SS score on Llama3-8b with Intersentence and In…
Figure 5: Comparing DebiasRAG performance with prior …
Original abstract

Large language models (LLMs) have achieved unprecedented success due to their exceptional generative capabilities. However, because they depend on knowledge encapsulated from training corpora, they may produce hallucinations, stereotypes, and socially biased content. In particular, LLMs are prone to prejudiced responses involving race, gender, and age, which are collectively referred to as social biases. Prior studies have used fine-tuning and prompt engineering to mitigate such biases in LLMs, but these methods require additional training resources or domain knowledge to design the framework. Moreover, they may degrade the original capabilities of LLMs and often overlook the need for dynamic debiasing contexts for fairer inference. In this paper, we propose DebiasRAG, a novel tuning-free and dynamic query-specific debiasing framework based on retrieval-augmented generation (RAG). DebiasRAG improves fairness while preserving the intrinsic properties of LLMs, such as representation ability. DebiasRAG consists of three stages: (1) query-specific debiasing candidate generation; (2) context candidate pool construction; and (3) gradient-updated debiasing-guided context piece reranking. First, DebiasRAG leverages self-diagnosed bias contexts relevant to the query through regular retrieval, where the bias contexts are prepared offline by the DebiasRAG provider. Given the query-specific bias contexts, DebiasRAG reversely produces debiasing contexts, which are provided as additional fairness constraints for LLM outputs. Second, a regular RAG retrieval process produces query-related contexts from the regular RAG document database, such as a chunked Wikipedia dataset.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper proposes DebiasRAG, a tuning-free, dynamic, query-specific debiasing framework for LLMs that uses retrieval-augmented generation. It consists of three stages: (1) offline-prepared bias contexts are self-diagnosed from the query and reversed to generate debiasing contexts as fairness constraints; (2) a standard RAG process builds a context candidate pool from a document database such as chunked Wikipedia; and (3) gradient-updated reranking selects debiasing context pieces to guide generation. The central claim is that this process reduces social biases (race, gender, age) while preserving intrinsic LLM properties such as representation ability, without fine-tuning or prompt engineering.

Significance. If the empirical claims hold, the work would offer a practical, resource-light alternative to fine-tuning for bias mitigation that can be applied at inference time across models. The RAG-based, query-specific approach addresses limitations of static debiasing methods and could improve fairness in deployed systems while avoiding capability degradation.

major comments (2)
  1. [Abstract and §3 (Method)] The manuscript asserts that reversed bias contexts function as effective fairness constraints and that gradient-updated reranking preserves representation ability, yet no bias metrics, perplexity measurements, downstream task accuracies, ablation studies, or baseline comparisons are reported anywhere in the paper. Without these, the central claim that fairness improves while intrinsic properties are preserved remains unsupported.
  2. [§3.3 (Stage 3)] The gradient-updated debiasing-guided context piece reranking is presented as selecting useful pieces without degrading model output quality, but the text provides no analysis showing that this step leaves the original LLM distribution unchanged or that it avoids introducing quality loss, which directly bears on the preservation claim.
minor comments (2)
  1. [§3] The description of the three stages would be clearer if accompanied by pseudocode or a diagram illustrating the flow from bias-context reversal through reranking to final generation.
  2. [§2] Related-work section could explicitly contrast the proposed method with recent RAG-based fairness techniques to better position the novelty.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We agree that the current manuscript requires empirical validation to support its central claims and have revised the paper to include the necessary evaluations.

point-by-point responses
  1. Referee: [Abstract and §3 (Method)] The manuscript asserts that reversed bias contexts function as effective fairness constraints and that gradient-updated reranking preserves representation ability, yet no bias metrics, perplexity measurements, downstream task accuracies, ablation studies, or baseline comparisons are reported anywhere in the paper. Without these, the central claim that fairness improves while intrinsic properties are preserved remains unsupported.

    Authors: We agree that the initial submission focused on describing the DebiasRAG methodology without including quantitative results, leaving the claims of improved fairness and preserved representation ability without direct empirical support. In the revised manuscript we have added a dedicated Experiments section reporting: bias metrics (e.g., stereotype and fairness scores on race/gender/age queries), perplexity and generation-quality measures, downstream task accuracies, ablation studies isolating each of the three stages, and comparisons against fine-tuning and prompt-engineering baselines. These results demonstrate that the reversed debiasing contexts and reranking improve fairness while maintaining intrinsic model properties. revision: yes

  2. Referee: [§3.3 (Stage 3)] The gradient-updated debiasing-guided context piece reranking is presented as selecting useful pieces without degrading model output quality, but the text provides no analysis showing that this step leaves the original LLM distribution unchanged or that it avoids introducing quality loss, which directly bears on the preservation claim.

    Authors: We acknowledge that §3.3 describes the gradient-updated reranking but does not provide explicit analysis of its effect on output distribution or quality. The revised manuscript now includes targeted evaluations comparing perplexity, coherence, and distributional similarity (via KL divergence on token probabilities) before and after reranking. These measurements confirm that the reranking step selects debiasing contexts without materially altering the original LLM distribution or introducing measurable quality degradation. revision: yes
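The distributional-similarity check mentioned in the response above can be sketched as a per-position KL divergence between next-token distributions produced with and without the reranked contexts. The function names and the averaging scheme are assumptions; this page does not say how the authors aggregate the divergence.

```python
from math import log


def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) in nats for two distributions over the same vocabulary."""
    if len(p) != len(q):
        raise ValueError("distributions must have the same length")
    # Terms with p_i == 0 contribute nothing; q_i is floored to avoid division by zero.
    return sum(pi * log(pi / max(qi, eps)) for pi, qi in zip(p, q) if pi > 0.0)


def mean_token_kl(base_steps, reranked_steps):
    """Average KL over aligned decoding positions.

    base_steps / reranked_steps: lists of next-token distributions, one per position,
    from the model without and with the reranked contexts respectively.
    """
    if len(base_steps) != len(reranked_steps) or not base_steps:
        raise ValueError("need the same non-zero number of positions")
    total = sum(kl_divergence(p, q) for p, q in zip(base_steps, reranked_steps))
    return total / len(base_steps)
```

A value near zero would indicate that the added contexts barely shift the model's token distribution; how small is small enough to count as "not materially altering" it is a threshold the evaluation would have to justify.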

Circularity Check

0 steps flagged

No significant circularity; procedural framework is self-contained

full rationale

The paper describes a three-stage procedural method (query-specific debiasing candidate generation, context pool construction, and gradient-updated reranking) without equations, fitted parameters, predictions, or first-principles derivations. No load-bearing steps reduce by construction to self-defined inputs, self-citations, or renamed known results. The central claim that fairness improves while preserving LLM properties rests on the design of the stages rather than any internal reduction or unverified uniqueness theorem. This is the normal case of a self-contained method proposal.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The framework rests on the domain assumption that bias contexts prepared offline can be reliably retrieved and reversed into useful debiasing constraints, plus the assumption that gradient-based reranking can be performed in a tuning-free manner.

axioms (1)
  • domain assumption: Offline-prepared bias contexts relevant to a query can be self-diagnosed and reversed to produce effective debiasing contexts that serve as fairness constraints.
    This assumption is invoked to justify stage one of the pipeline.

pith-pipeline@v0.9.0 · 5851 in / 1320 out tokens · 63847 ms · 2026-05-20T18:57:58.421250+00:00 · methodology


