Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation

Jyrki Nummenmaa; Kostas Stefanidis; Vasilis Efthymiou; Yingqi Zhao

arxiv: 2605.15790 · v1 · pith:P6HZQIN2new · submitted 2026-05-15 · 💻 cs.DB · cs.IR

Pith reviewed 2026-05-19 19:22 UTC · model grok-4.3

classification 💻 cs.DB cs.IR

keywords fairness-aware retrieval · retrieval-augmented generation · bias mitigation · optimization · reranking · RAG systems · fairness optimization

The pith

Retrieval optimization that models position-dependent bias propagation can reduce unfairness in RAG outputs while maintaining document relevance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish a method for making retrieval in RAG systems fairer by controlling how bias from document positions affects the final generation. In top-k retrieval, several documents influence the output together, so bias can spread unevenly. The approach uses reranking to inject controlled bias, a model that tracks bias based on position, and an optimization to balance fairness with relevance. A scalable solver called FARO makes this practical by breaking down the problem. If successful, this would let RAG systems produce more equitable answers without sacrificing accuracy.

Core claim

The central claim has two parts. First, bias propagation is modeled in a position-aware way, and retrieval is formulated as an optimization problem that trades off relevance against fairness, with a quadratic approximation via dual hyperplanes for efficiency. Second, solving that problem mitigates the bias that reaches the generation stage in RAG while preserving the utility of the retrieved documents.
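The trade-off in this claim can be illustrated with a small sketch. It is not the paper's formulation: the scalar penalty `lam`, the absolute-value bias term, the position weights, and all numbers are assumptions chosen for illustration, and the exhaustive search only suits tiny candidate sets.

```python
from itertools import permutations

def position_weighted_bias(ranking, bias, weights):
    """Surrogate bias of an ordered top-k list: sum of weights[p] * bias[doc]."""
    return sum(w * bias[d] for w, d in zip(weights, ranking))

def best_ranking(relevance, bias, weights, lam):
    """Exhaustively pick the ordered top-k list that maximizes
    total relevance minus lam * |position-weighted bias|."""
    k = len(weights)

    def score(ranking):
        rel = sum(relevance[d] for d in ranking)
        return rel - lam * abs(position_weighted_bias(ranking, bias, weights))

    return max(permutations(range(len(relevance)), k), key=score)

# Hypothetical candidates: relevance in [0, 1], signed bias in [-1, 1].
relevance = [0.9, 0.8, 0.7, 0.6]
bias = [0.8, 0.6, -0.7, -0.5]
weights = [0.6, 0.4]  # assumed influence of positions 1 and 2

relevance_only = best_ranking(relevance, bias, weights, lam=0.0)
fairness_aware = best_ranking(relevance, bias, weights, lam=5.0)
print(relevance_only, fairness_aware)
```

With `lam=0.0` the search returns the two most relevant documents. A larger `lam` swaps in documents whose position-weighted biases nearly cancel, at some cost in total relevance.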

What carries the argument

The argument rests on three pieces: the position-aware model of bias propagation, controlled bias injection via reranking, and the FARO optimization, which decomposes the quadratic fairness problem using a dual-hyperplane approximation.

Load-bearing premise

The position-aware model of bias propagation combined with controlled bias injection via reranking accurately represents how retrieval choices affect downstream generation bias in top-k settings.

What would settle it

Run the method on a RAG system and measure generation-bias metrics before and after. The claim holds if bias drops significantly while relevance stays high; it fails if bias remains unchanged.
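A minimal sketch of such a before/after check follows. The metric names, the thresholds, and the scores are hypothetical. A real evaluation would also need a statistical test, which this sketch replaces with fixed thresholds.

```python
from statistics import mean

def summarize(bias_scores, relevance_scores):
    """Aggregate per-question scores into the two quantities being compared."""
    return {
        "mean_abs_bias": mean(abs(b) for b in bias_scores),
        "mean_relevance": mean(relevance_scores),
    }

def claim_holds(before, after, min_bias_drop, max_relevance_loss):
    """True if bias fell by at least min_bias_drop while relevance
    fell by no more than max_relevance_loss."""
    bias_drop = before["mean_abs_bias"] - after["mean_abs_bias"]
    relevance_loss = before["mean_relevance"] - after["mean_relevance"]
    return bias_drop >= min_bias_drop and relevance_loss <= max_relevance_loss

# Hypothetical per-question scores for a baseline and an optimized retriever.
before = summarize([0.5, -0.25, 0.75], [1.0, 0.5, 0.75])
after = summarize([0.25, -0.25, 0.25], [1.0, 0.5, 0.5])

print(claim_holds(before, after, min_bias_drop=0.2, max_relevance_loss=0.1))
```

Tightening `max_relevance_loss` makes the same data fail the check, which is why both thresholds have to be fixed before running the experiment.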

Figures

Figures reproduced from arXiv: 2605.15790 by Jyrki Nummenmaa, Kostas Stefanidis, Vasilis Efthymiou, Yingqi Zhao.

**Figure 2.** Reranking-based mechanism for controlling embedding bias in retrieved docu

**Figure 3.** Overview of the FARO optimization process. For each value of the surrogate

**Figure 4.** Top-2 RAG grid search results: We conducted our grid search experiments based

**Figure 5.** To further assess generalization, we apply the learned linear parameters

**Figure 5.** Bias validation results: the first row corresponds to gender bias, the second

**Figure 6.** Comparison of position-dependent weight distributions of biased content in the

**Figure 7.** Comparison of attention weight distributions for political bias across different

**Figure 8.** Trade-off curves between fairness and relevance for political bias in RAG systems

**Figure 9.** Trade-off curves between fairness and relevance for gender bias in RAG systems

Original abstract

Retrieval-Augmented Generation (RAG) improves reliability of large language models by incorporating external knowledge, but the retrieval process can introduce bias that propagates to generated outputs. This issue is particularly challenging in top-k settings, where multiple documents jointly influence generation. We propose a fairness-aware retrieval framework that models and controls this bias. Our approach combines controlled bias injection via reranking, a position-aware model of bias propagation, and an optimization formulation that balances relevance and fairness. We further introduce a scalable solution based on Quadratic Fairness via Dual Hyperplane Approximation (FARO), which enables efficient optimization through problem decomposition. Experimental results show that our method effectively mitigates generation bias while preserving relevance. This work provides a principled approach for fairness-aware retrieval in RAG systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

FARO gives a workable approximation for trading relevance against fairness in RAG retrieval, but the abstract leaves the experimental claims uncheckable and the bias model may oversimplify joint document effects.

The paper's main contribution is FARO, a dual-hyperplane approximation that turns a quadratic fairness objective into something decomposable and scalable for top-k retrieval. It pairs this with reranking for controlled bias injection and a position-aware propagation model. That combination is the concrete new piece: an optimization formulation that explicitly balances the two goals instead of treating fairness as a post-hoc filter.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a fairness-aware retrieval framework for Retrieval-Augmented Generation (RAG) in top-k settings. It combines a position-aware model of bias propagation, controlled bias injection via reranking, and an optimization formulation that balances relevance and fairness objectives. The framework is made scalable by Quadratic Fairness via Dual Hyperplane Approximation (FARO), which decomposes the problem. The central claim is that this approach mitigates generation bias while preserving relevance, a claim the authors support with experimental results.

Significance. If the claims hold, the work addresses an important practical issue in RAG systems by providing a principled optimization approach to fairness. The FARO decomposition for efficient solving represents a useful technical contribution for balancing the two objectives.

major comments (1)

[position-aware model of bias propagation] The position-aware model of bias propagation (combined with reranking-based controlled injection) is load-bearing for the central claim that the method accurately controls downstream generation bias. This model implicitly treats bias effects as additive or linear across ranked positions, yet top-k RAG generation involves non-linear interactions including attention mixing and context fusion over the full retrieved set. If these joint effects are not captured, the optimization objective and reported bias reductions may be miscalibrated relative to actual LLM outputs.

minor comments (2)

[Abstract] The abstract asserts that experiments support bias mitigation with preserved relevance but provides no details on datasets, baselines, metrics, error bars, or statistical tests; a brief summary of these should be added for transparency.
[optimization formulation] The relevance-fairness trade-off weight is listed as a free parameter; clarify whether the method is intended to be parameter-free or how this hyperparameter is selected in practice.
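One way the trade-off weight could be chosen in practice is to sweep it and keep the smallest value that meets a bias budget. The sketch below illustrates that idea only; the budget rule, the grid of weights, the objective, and the data are assumptions, not the authors' procedure.

```python
from itertools import permutations

def evaluate(ranking, relevance, bias, weights):
    """Return (total relevance, |position-weighted bias|) for an ordered list."""
    rel = sum(relevance[d] for d in ranking)
    b = abs(sum(w * bias[d] for w, d in zip(weights, ranking)))
    return rel, b

def sweep(relevance, bias, weights, lams):
    """For each trade-off weight, solve by exhaustive search and record
    (lam, relevance, bias) of the best ranking."""
    candidates = list(permutations(range(len(relevance)), len(weights)))
    curve = []
    for lam in lams:
        def score(r):
            rel, b = evaluate(r, relevance, bias, weights)
            return rel - lam * b
        rel, b = evaluate(max(candidates, key=score), relevance, bias, weights)
        curve.append((lam, rel, b))
    return curve

def smallest_lambda_within_budget(curve, bias_budget):
    """Pick the first (smallest) weight whose solution meets the bias budget."""
    for lam, _rel, b in curve:
        if b <= bias_budget:
            return lam
    return None

# Hypothetical candidates and position weights.
relevance = [0.9, 0.8, 0.7, 0.6]
bias = [0.8, 0.6, -0.7, -0.5]
weights = [0.6, 0.4]

curve = sweep(relevance, bias, weights, lams=[0, 1, 2, 5])
print(smallest_lambda_within_budget(curve, bias_budget=0.15))
```

The recorded `curve` is also the raw material for a fairness-relevance trade-off plot, since each entry pairs a weight with the relevance and bias it produces.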

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for highlighting the central role of the position-aware bias propagation model. We address this comment directly below and outline planned revisions to strengthen the manuscript.

Referee: [position-aware model of bias propagation] The position-aware model of bias propagation (combined with reranking-based controlled injection) is load-bearing for the central claim that the method accurately controls downstream generation bias. This model implicitly treats bias effects as additive or linear across ranked positions, yet top-k RAG generation involves non-linear interactions including attention mixing and context fusion over the full retrieved set. If these joint effects are not captured, the optimization objective and reported bias reductions may be miscalibrated relative to actual LLM outputs.

Authors: We agree that the position-aware propagation model serves as a key component and that it employs a structured, position-dependent weighting rather than a fully non-linear representation of LLM internals. The model is intentionally formulated as a tractable approximation that captures observed positional decay in bias influence, which is supported by prior empirical studies on context utilization in retrieval-augmented settings. While we do not claim to model every attention-mixing or fusion interaction explicitly, the framework is validated end-to-end: bias metrics are computed directly from the LLM's generated outputs after applying the optimized retrieval sets. This provides empirical grounding that the resulting bias reductions are realized in practice, not merely in the surrogate objective. To address the concern, we will add a new subsection in the revised manuscript discussing the modeling assumptions, the linear-position approximation, and its limitations relative to full non-linear LLM dynamics, along with suggestions for future extensions.

revision: yes

Circularity Check

0 steps flagged

No significant circularity; optimization balances independent objectives

The paper presents a fairness-aware retrieval framework that combines controlled bias injection via reranking, a position-aware model of bias propagation, and an optimization formulation balancing relevance and fairness, solved via the FARO approximation. No equations, derivations, or self-citations are exhibited in the provided text that reduce the claimed bias mitigation or fairness gains to a fitted parameter by construction or to a load-bearing self-citation chain. The central claims rest on the proposed balancing of two objectives and experimental validation, which remain independent of the inputs by the paper's own description.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Framework rests on the domain assumption that bias propagation can be modeled positionally and controlled through reranking; introduces one optimization technique and likely at least one trade-off parameter.

free parameters (1)

relevance-fairness trade-off weight
Optimization formulation balances relevance and fairness; such a scalar is required to produce a single solution and is not derived from first principles in the abstract.

axioms (1)

Domain assumption: bias in RAG generation can be modeled via position-aware propagation and controlled by reranking.
This premise underpins the entire controlled-bias-injection and optimization approach described in the abstract.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "we approximate this relationship using a linear model: $R_b = \sum_p w_p E_b^{p} + L_b + \varepsilon$"
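The quoted linear model can be fitted by ordinary least squares. The sketch below assumes a particular reading of the symbols: R is the measured response bias, E holds one embedding-bias value per ranked position, w holds the position weights, and L is an intercept. That reading, the function names, and the synthetic data are assumptions, and the solver is a plain normal-equations fit rather than the paper's procedure.

```python
def solve(a, b):
    """Solve the square system a x = b by Gauss-Jordan elimination
    with partial pivoting."""
    n = len(b)
    m = [row[:] + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                f = m[r][col] / m[col][col]
                m[r] = [x - f * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]

def fit_linear_bias_model(embedding_bias, response_bias):
    """Least-squares fit of R = sum_p w_p * E_p + L.
    embedding_bias: one row per sample, one value per ranked position.
    Returns (position weights, intercept)."""
    rows = [list(e) + [1.0] for e in embedding_bias]  # intercept column
    n = len(rows[0])
    ata = [[sum(r[i] * r[j] for r in rows) for j in range(n)] for i in range(n)]
    atb = [sum(r[i] * y for r, y in zip(rows, response_bias)) for i in range(n)]
    coef = solve(ata, atb)
    return coef[:-1], coef[-1]

# Synthetic, noise-free data generated from assumed parameters.
true_w, true_l = [0.6, 0.3], 0.1
E = [(1.0, 0.0), (0.0, 1.0), (1.0, 1.0), (-1.0, 0.5), (0.5, -1.0)]
R = [sum(w * e for w, e in zip(true_w, row)) + true_l for row in E]

fitted_w, fitted_l = fit_linear_bias_model(E, R)
print(fitted_w, fitted_l)
```

On noise-free data the fit recovers the generating parameters up to rounding. With real measurements the residual term would absorb whatever the linear form cannot explain.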
IndisputableMonolith/Foundation/RealityFromDistinction.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem: unclear.

Paper passage: "FARO framework transforms globally coupled fairness optimization into independent per-question assignment problems"
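The decomposition in the quoted passage can be illustrated as follows. The sketch assumes the coupled fairness term has already been linearized with a fixed multiplier `mu`, so that each question's cost is a sum of per-(document, position) terms. The cost form, `mu`, and the data are assumptions rather than FARO's actual formulation. Brute force stands in for a proper assignment solver such as the Hungarian method, which SciPy provides as `scipy.optimize.linear_sum_assignment`.

```python
from itertools import permutations

def solve_question(relevance, bias, weights, mu):
    """Assign documents to the k ranked positions for ONE question.
    Placing document d at position p costs
        -relevance[d] + mu * weights[p] * bias[d],
    which is linear in the assignment, so this is a rectangular
    assignment problem. Brute force is used for clarity only."""
    k = len(weights)

    def cost(ranking):
        return sum(-relevance[d] + mu * weights[p] * bias[d]
                   for p, d in enumerate(ranking))

    return min(permutations(range(len(relevance)), k), key=cost)

def solve_all(questions, weights, mu):
    """With mu fixed, every question is solved independently of the others."""
    return [solve_question(q["relevance"], q["bias"], weights, mu)
            for q in questions]

# Hypothetical questions, each with its own candidate documents.
questions = [
    {"relevance": [0.9, 0.8, 0.7, 0.6], "bias": [0.8, 0.6, -0.7, -0.5]},
    {"relevance": [0.5, 0.9, 0.4], "bias": [-0.2, 0.1, 0.9]},
]
weights = [0.6, 0.4]  # assumed influence of positions 1 and 2

print(solve_all(questions, weights, mu=0.0))
print(solve_all(questions, weights, mu=2.0))
```

Because no term links one question to another once `mu` is fixed, the per-question calls can run in parallel. An outer loop, not shown here, would be responsible for choosing `mu`.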

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
