
arxiv: 2605.15790 · v1 · pith:P6HZQIN2new · submitted 2026-05-15 · 💻 cs.DB · cs.IR

Fairness-Aware Retrieval Optimization for Retrieval-Augmented Generation

Pith reviewed 2026-05-19 19:22 UTC · model grok-4.3

classification 💻 cs.DB cs.IR
keywords fairness-aware retrieval · retrieval-augmented generation · bias mitigation · optimization · reranking · RAG systems · fairness optimization

The pith

Retrieval optimization that models position-dependent bias propagation can reduce unfairness in RAG outputs while maintaining document relevance.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper aims to establish a method for making retrieval in RAG systems fairer by controlling how bias from document positions affects the final generation. In top-k retrieval, several documents influence the output together, so bias can spread unevenly. The approach uses reranking to inject controlled bias, a model that tracks bias based on position, and an optimization to balance fairness with relevance. A scalable solver called FARO makes this practical by breaking down the problem. If successful, this would let RAG systems produce more equitable answers without sacrificing accuracy.
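
One way to make the position-aware idea concrete is a weighted sum: each retrieved document carries a bias score, and its rank decides how much of that score reaches the output. The sketch below is an illustration only. The function name `propagated_bias`, the signed bias scores in [-1, 1], and the example weights are assumptions made here, not the paper's formulation.

```python
from typing import Sequence


def propagated_bias(doc_bias: Sequence[float], position_weights: Sequence[float]) -> float:
    """Estimate output bias as a position-weighted sum of per-document bias scores."""
    if len(doc_bias) != len(position_weights):
        raise ValueError("need exactly one weight per ranked position")
    return sum(w * b for w, b in zip(position_weights, doc_bias))


# Hypothetical top-3 example: weights decay with rank and sum to 1 (assumed values).
weights = [0.5, 0.3, 0.2]
ranking_a = [0.8, -0.2, -0.6]   # most biased document ranked first
ranking_b = [-0.2, 0.8, -0.6]   # same documents, top two swapped

bias_a = propagated_bias(ranking_a, weights)
bias_b = propagated_bias(ranking_b, weights)
```

Under these assumed weights, swapping the top two documents lowers the estimated bias from about 0.22 to about 0.02 without changing which documents are retrieved, which is the kind of lever reranking provides.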

Core claim

The central claim is that by modeling bias propagation in a position-aware way and formulating retrieval as an optimization problem that trades off relevance against fairness, with a quadratic approximation via dual hyperplanes for efficiency, one can mitigate the bias that reaches the generation stage in RAG while preserving the utility of the retrieved documents.
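
The trade-off in this claim can be illustrated with a toy objective: position-weighted relevance minus a penalty on the magnitude of position-weighted bias. The sketch below solves it by exhaustive search over ordered top-k selections, which is only feasible for tiny candidate sets and is not the FARO solver. The name `best_ranking`, the absolute-value penalty, and all numbers are assumptions for illustration.

```python
from itertools import permutations
from typing import Sequence, Tuple


def best_ranking(
    relevance: Sequence[float],
    bias: Sequence[float],
    position_weights: Sequence[float],
    trade_off: float,
) -> Tuple[Tuple[int, ...], float]:
    """Return the ordered top-k (as document indices) that maximizes
    weighted relevance minus trade_off * |weighted bias|."""
    k = len(position_weights)
    best_order: Tuple[int, ...] = ()
    best_score = float("-inf")
    # Brute force: every ordered choice of k documents from the candidates.
    for order in permutations(range(len(relevance)), k):
        rel = sum(w * relevance[d] for w, d in zip(position_weights, order))
        b = sum(w * bias[d] for w, d in zip(position_weights, order))
        score = rel - trade_off * abs(b)
        if score > best_score:
            best_order, best_score = order, score
    return best_order, best_score


# Hypothetical candidates (assumed scores, not data from the paper).
relevance = [0.9, 0.8, 0.7, 0.6]
bias = [0.9, 0.5, -0.8, -0.1]
weights = [0.6, 0.4]

relevance_only, _ = best_ranking(relevance, bias, weights, trade_off=0.0)
fair_order, fair_score = best_ranking(relevance, bias, weights, trade_off=1.0)
```

With the penalty switched off, the search returns documents 0 and 1, the two most relevant. With `trade_off=1.0` it returns documents 1 and 2, whose opposing bias scores nearly cancel, at a modest cost in relevance. The search space grows factorially with the number of candidates, which is why a decomposed solver matters in practice.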

What carries the argument

The position-aware model of bias propagation combined with controlled bias injection via reranking and the FARO optimization that decomposes the quadratic fairness problem using dual hyperplane approximation.

Load-bearing premise

The position-aware model of bias propagation combined with controlled bias injection via reranking accurately represents how retrieval choices affect downstream generation bias in top-k settings.

What would settle it

Running the method on a RAG system and measuring generation bias metrics before and after, checking if bias drops significantly while relevance stays high; if bias remains unchanged, the claim fails.
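
A minimal sketch of that before-and-after check follows, assuming per-query bias and relevance scores have already been measured on generated outputs. The names `Verdict` and `compare_runs`, the fixed thresholds, and the sample scores are hypothetical. The thresholds are a stand-in for a proper significance test, which a real evaluation would need.

```python
from statistics import mean
from typing import NamedTuple, Sequence


class Verdict(NamedTuple):
    bias_drop: float        # reduction in mean absolute bias
    relevance_loss: float   # reduction in mean relevance
    claim_holds: bool


def compare_runs(
    bias_before: Sequence[float],
    bias_after: Sequence[float],
    rel_before: Sequence[float],
    rel_after: Sequence[float],
    min_bias_drop: float = 0.1,        # assumed threshold
    max_relevance_loss: float = 0.05,  # assumed threshold
) -> Verdict:
    """Compare baseline and fairness-aware runs on the same queries."""
    bias_drop = mean(abs(b) for b in bias_before) - mean(abs(b) for b in bias_after)
    relevance_loss = mean(rel_before) - mean(rel_after)
    holds = bias_drop >= min_bias_drop and relevance_loss <= max_relevance_loss
    return Verdict(bias_drop, relevance_loss, holds)


# Placeholder scores for four queries (illustrative only, not results from the paper).
verdict = compare_runs(
    bias_before=[0.6, -0.5, 0.7, 0.4],
    bias_after=[0.2, -0.1, 0.3, 0.0],
    rel_before=[0.90, 0.80, 0.85, 0.75],
    rel_after=[0.88, 0.79, 0.83, 0.74],
)
unchanged = compare_runs([0.6, -0.5], [0.6, -0.5], [0.9, 0.8], [0.9, 0.8])
```

Here `verdict.claim_holds` is true because bias falls while relevance barely moves, and `unchanged.claim_holds` is false because bias does not move at all, matching the failure condition described above.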

Figures

Figures reproduced from arXiv: 2605.15790 by Jyrki Nummenmaa, Kostas Stefanidis, Vasilis Efthymiou, Yingqi Zhao.

Figure 1: Overview of the proposed three-stage framework for fairness-aware retrieval in …
Figure 2: Reranking-based mechanism for controlling embedding bias in retrieved docu…
Figure 3: Overview of the FARO optimization process. For each value of the surrogate …
Figure 4: Top-2 RAG grid search results: We conducted our grid search experiments based …
Figure 5: To further assess generalization, we apply the learned linear parameters …
Figure 5: Bias validation results: the first row corresponds to gender bias, the second …
Figure 6: Comparison of position-dependent weight distributions of biased content in the …
Figure 7: Comparison of attention weight distributions for political bias across different …
Figure 8: Trade-off curves between fairness and relevance for political bias in RAG systems …
Figure 9: Trade-off curves between fairness and relevance for gender bias in RAG systems …

Original abstract

Retrieval-Augmented Generation (RAG) improves reliability of large language models by incorporating external knowledge, but the retrieval process can introduce bias that propagates to generated outputs. This issue is particularly challenging in top-k settings, where multiple documents jointly influence generation. We propose a fairness-aware retrieval framework that models and controls this bias. Our approach combines controlled bias injection via reranking, a position-aware model of bias propagation, and an optimization formulation that balances relevance and fairness. We further introduce a scalable solution based on Quadratic Fairness via Dual Hyperplane Approximation (FARO), which enables efficient optimization through problem decomposition. Experimental results show that our method effectively mitigates generation bias while preserving relevance. This work provides a principled approach for fairness-aware retrieval in RAG systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a fairness-aware retrieval framework for Retrieval-Augmented Generation (RAG) in top-k settings. It combines a position-aware model of bias propagation, controlled bias injection via reranking, and an optimization formulation that balances relevance and fairness objectives. Scalability comes from FARO (Quadratic Fairness via Dual Hyperplane Approximation), which solves the optimization efficiently by decomposing the problem. The central claim is that this approach mitigates generation bias while preserving relevance, supported by experimental results.

Significance. If the claims hold, the work addresses an important practical issue in RAG systems by providing a principled optimization approach to fairness. The FARO decomposition for efficient solving represents a useful technical contribution for balancing the two objectives.

major comments (1)
  1. [position-aware model of bias propagation] The position-aware model of bias propagation (combined with reranking-based controlled injection) is load-bearing for the central claim that the method accurately controls downstream generation bias. This model implicitly treats bias effects as additive or linear across ranked positions, yet top-k RAG generation involves non-linear interactions including attention mixing and context fusion over the full retrieved set. If these joint effects are not captured, the optimization objective and reported bias reductions may be miscalibrated relative to actual LLM outputs.
minor comments (2)
  1. [Abstract] The abstract asserts that experiments support bias mitigation with preserved relevance but provides no details on datasets, baselines, metrics, error bars, or statistical tests; a brief summary of these should be added for transparency.
  2. [optimization formulation] The relevance-fairness trade-off weight is listed as a free parameter; clarify whether the method is intended to be parameter-free or how this hyperparameter is selected in practice.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive review and for highlighting the central role of the position-aware bias propagation model. We address this comment directly below and outline planned revisions to strengthen the manuscript.

point-by-point responses
  1. Referee: [position-aware model of bias propagation] The position-aware model of bias propagation (combined with reranking-based controlled injection) is load-bearing for the central claim that the method accurately controls downstream generation bias. This model implicitly treats bias effects as additive or linear across ranked positions, yet top-k RAG generation involves non-linear interactions including attention mixing and context fusion over the full retrieved set. If these joint effects are not captured, the optimization objective and reported bias reductions may be miscalibrated relative to actual LLM outputs.

    Authors: We agree that the position-aware propagation model serves as a key component and that it employs a structured, position-dependent weighting rather than a fully non-linear representation of LLM internals. The model is intentionally formulated as a tractable approximation that captures observed positional decay in bias influence, which is supported by prior empirical studies on context utilization in retrieval-augmented settings. While we do not claim to model every attention-mixing or fusion interaction explicitly, the framework is validated end-to-end: bias metrics are computed directly from the LLM's generated outputs after applying the optimized retrieval sets. This provides empirical grounding that the resulting bias reductions are realized in practice, not merely in the surrogate objective. To address the concern, we will add a new subsection in the revised manuscript discussing the modeling assumptions, the linear-position approximation, and its limitations relative to full non-linear LLM dynamics, along with suggestions for future extensions. revision: yes
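
The disagreement between referee and authors suggests a simple diagnostic: fit a linear position-weighted model to measured output bias and inspect the residual. The sketch below does this for a top-2 setting with a one-dimensional grid search. It is not the paper's fitting procedure. The name `fit_top2_weights`, the constraint that the two weights sum to 1, and both synthetic datasets are assumptions.

```python
from typing import Sequence, Tuple


def fit_top2_weights(
    doc_bias: Sequence[Tuple[float, float]],
    output_bias: Sequence[float],
    steps: int = 100,
) -> Tuple[float, float, float]:
    """Grid-search w1 in [0, 1] with w2 = 1 - w1.

    Returns (w1, w2, mean squared error) for the best linear fit of
    output_bias ~ w1 * bias_at_rank1 + w2 * bias_at_rank2.
    """
    best = (0.0, 1.0, float("inf"))
    for i in range(steps + 1):
        w1 = i / steps
        w2 = 1.0 - w1
        mse = sum(
            (w1 * b1 + w2 * b2 - y) ** 2
            for (b1, b2), y in zip(doc_bias, output_bias)
        ) / len(output_bias)
        if mse < best[2]:
            best = (w1, w2, mse)
    return best


# Synthetic bias scores of the documents at rank 1 and rank 2 (assumed values).
pairs = [(1.0, 0.0), (0.0, 1.0), (0.5, 0.5), (-1.0, 1.0)]

# Case 1: outputs generated by a purely linear rule with w1 = 0.7.
linear_outputs = [0.7 * b1 + 0.3 * b2 for b1, b2 in pairs]
w1_lin, w2_lin, mse_lin = fit_top2_weights(pairs, linear_outputs)

# Case 2: the same rule plus an interaction term the linear model cannot express.
mixed_outputs = [0.7 * b1 + 0.3 * b2 + 0.5 * b1 * b2 for b1, b2 in pairs]
w1_mix, w2_mix, mse_mix = fit_top2_weights(pairs, mixed_outputs)
```

In the first case the fit recovers the generating weight with a residual near zero. In the second the residual stays clearly above zero and the fitted weight drifts away from 0.7. A persistent residual of that kind on real LLM outputs would be evidence for the referee's concern, while a small one would support the authors' approximation.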

Circularity Check

0 steps flagged

No significant circularity; optimization balances independent objectives

The paper presents a fairness-aware retrieval framework that combines controlled bias injection via reranking, a position-aware model of bias propagation, and an optimization formulation balancing relevance and fairness, solved via the FARO approximation. No equations, derivations, or self-citations are exhibited in the provided text that reduce the claimed bias mitigation or fairness gains to a fitted parameter by construction or to a load-bearing self-citation chain. The central claims rest on the proposed balancing of two objectives and experimental validation, which remain independent of the inputs by the paper's own description.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

Framework rests on the domain assumption that bias propagation can be modeled positionally and controlled through reranking; introduces one optimization technique and likely at least one trade-off parameter.

free parameters (1)
  • relevance-fairness trade-off weight
    Optimization formulation balances relevance and fairness; such a scalar is required to produce a single solution and is not derived from first principles in the abstract.
axioms (1)
  • domain assumption: Bias in RAG generation can be modeled via position-aware propagation and controlled by reranking.
    This premise underpins the entire controlled-bias-injection and optimization approach described in the abstract.
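
Since the trade-off weight is a free parameter, one common way to choose it is to sweep several values, record relevance and bias for each, and keep the most relevant setting that fits a bias budget. The sketch below shows that selection step under stated assumptions: `SweepPoint`, `pick_trade_off`, the budget, and every number in the sweep are hypothetical, and the source does not say how the authors select the weight.

```python
from typing import List, NamedTuple, Optional


class SweepPoint(NamedTuple):
    trade_off: float   # weight on the fairness term
    relevance: float   # measured relevance at this weight
    bias: float        # measured (signed) output bias at this weight


def pick_trade_off(points: List[SweepPoint], bias_budget: float) -> Optional[SweepPoint]:
    """Return the most relevant sweep point whose absolute bias fits the budget."""
    feasible = [p for p in points if abs(p.bias) <= bias_budget]
    if not feasible:
        return None  # no swept weight meets the budget
    return max(feasible, key=lambda p: p.relevance)


# Placeholder sweep (illustrative values only, not results from the paper).
sweep = [
    SweepPoint(0.0, 0.86, 0.74),
    SweepPoint(0.5, 0.80, 0.30),
    SweepPoint(1.0, 0.76, 0.02),
    SweepPoint(2.0, 0.70, 0.01),
]

chosen = pick_trade_off(sweep, bias_budget=0.1)
too_strict = pick_trade_off(sweep, bias_budget=0.001)
```

With a budget of 0.1 the selection lands on the weight 1.0, the smallest penalty that brings bias under the budget in this made-up sweep. With a budget of 0.001 nothing qualifies and the function returns `None`, signalling that the sweep range needs extending.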

pith-pipeline@v0.9.0 · 5662 in / 1236 out tokens · 74341 ms · 2026-05-19T19:22:29.557138+00:00 · methodology


