Masking or Mitigating? Deconstructing the Impact of Query Rewriting on Retriever Biases in RAG

Agam Goyal; Anirudh Phukan; Apoorv Saxena; Eshwar Chandrasekharan; Hari Sundaram; Koyel Mukherjee

arxiv: 2604.06097 · v2 · submitted 2026-04-07 · 💻 cs.IR

Pith reviewed 2026-05-10 18:31 UTC · model grok-4.3

classification 💻 cs.IR

keywords query rewriting · retriever biases · RAG · dense retrieval · bias mitigation · query enhancement · retrieval augmented generation

The pith

Simple LLM rewriting reduces dense retriever biases by 54 percent in RAG but fails when biases combine.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper tests how five query rewriting methods change four known biases in six dense retrievers used inside RAG pipelines. It reports that plain LLM rewriting produces the largest average drop in bias while methods that generate pseudo-documents reduce bias by breaking correlations with biased document features. The two approaches succeed through different internal mechanisms, and results vary sharply across retrievers. The work also separates biases that arise at query-document matching time from those baked into document encodings, showing that query changes alone cannot reach the second group. These distinctions matter for anyone building retrieval pipelines that must stay fair across different query styles.

Core claim

Simple LLM-based rewriting achieves the strongest aggregate bias reduction of 54 percent. It reduces bias mainly by raising variance among retrieval scores. Pseudo-document generation methods instead reduce bias by decorrelating from the features that trigger the biases. No single technique fixes every bias at once, and the size of the effect depends on the underlying retriever. The study introduces a taxonomy that separates query-document interaction biases from document encoding biases to mark the boundary of what query-side changes can fix.
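The difference between the two routes can be seen in the arithmetic of a t-statistic, the quantity the paper's figures report (as |t|) for bias strength. The sketch below assumes a paired form, t = mean(d) / (sd(d) / sqrt(n)), where d is the per-query score gap between a bias-favored document and its control. The exact test is not stated on this page, and every number here is invented for illustration.

```python
import math
import statistics


def paired_t(diffs):
    """t-statistic for the hypothesis that the mean paired score gap is zero."""
    n = len(diffs)
    mean_gap = statistics.fmean(diffs)
    sd_gap = statistics.stdev(diffs)  # sample standard deviation
    return mean_gap / (sd_gap / math.sqrt(n))


# Hypothetical per-query gaps: score(bias-favored doc) - score(control doc).
baseline = [0.10, 0.12, 0.08, 0.11, 0.09, 0.10, 0.13, 0.07]

# Route 1 (variance inflation): the mean gap is unchanged, the spread is wider.
noisier = [0.30, -0.10, 0.25, -0.05, 0.28, -0.08, 0.32, -0.12]

# Route 2 (decorrelation): the mean gap itself shrinks, the spread is unchanged.
decorrelated = [0.01, 0.03, -0.01, 0.02, 0.00, 0.01, 0.04, -0.02]

t_base = paired_t(baseline)
t_noisy = paired_t(noisier)
t_decor = paired_t(decorrelated)

for name, gaps, t in [
    ("baseline", baseline, t_base),
    ("noisier", noisier, t_noisy),
    ("decorrelated", decorrelated, t_decor),
]:
    print(f"{name:>12}: mean gap = {statistics.fmean(gaps):.3f}, |t| = {abs(t):.2f}")
```

Both altered samples give a much smaller |t| than the baseline, but only the decorrelated one has a smaller mean gap. Under this reading, a drop in |t| alone cannot tell the two mechanisms apart, which is why the average preference is worth reporting alongside it.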

What carries the argument

The mechanistic split between bias reduction via increased retrieval-score variance versus genuine decorrelation from bias-inducing features, together with the taxonomy of query-document interaction biases versus document encoding biases.

If this is right

No rewriting method removes every bias or works equally on every retriever.
Simple rewriting gives the biggest average gain but collapses under adversarial combinations of biases.
Pseudo-document methods offer more stable decorrelation for some biases.
Query-only fixes cannot reach document encoding biases.
RAG builders should match the rewriting method to the dominant bias risk in their system.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

RAG systems may need separate document-side debiasing steps once query rewriting reaches its limit.
The variance mechanism points to score calibration as a possible alternative lever for bias control.
Real deployments should test rewriting strategies against queries that deliberately trigger multiple biases at once.
The interaction-versus-encoding taxonomy could be used to diagnose retrieval problems outside RAG.

Load-bearing premise

The four named biases, five rewriting methods, and six retrievers tested here stand in for the full range of biases and RAG setups that appear in practice.

What would settle it

Measure the same four biases on a new retriever or with an additional bias type and check whether the 54 percent aggregate reduction and the variance-versus-decorrelation split still appear.
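A replication along these lines needs a small harness that takes any retriever's scoring function and returns a bias statistic. The sketch below is one possible shape for it, not the paper's protocol: `bias_t_statistic`, the triple layout, and `toy_score` are all hypothetical names, and `toy_score` is a deliberately length-sensitive stand-in rather than a real dense retriever.

```python
import math
import statistics
from typing import Callable, Sequence, Tuple


def bias_t_statistic(
    score: Callable[[str, str], float],
    triples: Sequence[Tuple[str, str, str]],
) -> float:
    """Paired t-statistic over (query, favored_doc, control_doc) triples.

    A positive value means the scorer prefers the bias-favored variant.
    """
    gaps = [score(q, favored) - score(q, control) for q, favored, control in triples]
    mean_gap = statistics.fmean(gaps)
    sd_gap = statistics.stdev(gaps)
    return mean_gap / (sd_gap / math.sqrt(len(gaps)))


def toy_score(query: str, doc: str) -> float:
    """Stand-in scorer (assumption): query-word hits divided by document length.

    Dividing by length makes it favor short documents by construction.
    """
    query_words = set(query.lower().split())
    doc_words = doc.lower().split()
    return sum(word in query_words for word in doc_words) / len(doc_words)


# Synthetic triples: the same matching content, short versus padded.
triples = [
    ("red apple pie", "red apple pie recipe",
     "red apple pie recipe with many extra filler words added"),
    ("fast blue car", "fast blue car review",
     "fast blue car review and a long unrelated closing paragraph here"),
    ("old stone bridge", "old stone bridge history",
     "old stone bridge history plus extra padding text"),
]

brevity_t = bias_t_statistic(toy_score, triples)
print(f"brevity |t| for the toy scorer: {abs(brevity_t):.2f}")
```

To test a new retriever, `toy_score` would be replaced by a function that embeds the query and document and returns their similarity. The triples would come from whatever controlled document pairs the bias definition calls for.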

Figures

Figures reproduced from arXiv: 2604.06097 by Agam Goyal, Anirudh Phukan, Apoorv Saxena, Eshwar Chandrasekharan, Hari Sundaram, Koyel Mukherjee.

**Figure 1.** Paper overview. Dense retrievers are susceptible to various biases, leading to incorrect retrieval and erroneous LLM outputs. Can query rewriting help? We systematically evaluate its effects on retrieval bias.

**Figure 3.** Mean |t|-statistic by retriever and query enhancement method, averaged across all four bias types. Lower values indicate reduced bias. Simple rewriting consistently reduces bias, while pseudo-document methods show differential effects.

**Figure 2.** Mean |t|-statistic by bias type and method (averaged across 6 retrievers). Lower values indicate reduced bias. All |t| values were averaged irrespective of statistical significance. Brevity bias remains the most severe. Across all methods, brevity bias exhibits the highest |t|-statistics, indicating that dense retrievers’ preference for shorter documents is particularly robust to query-side interventions…

**Figure 4.** Complete bias analysis using Qwen3-4B-Instruct for query enhancement. Each cell shows the |t|-statistic measuring retrieval bias strength across retrievers and biases. Lower values indicate reduced bias. Results demonstrate consistent patterns with Gemma-3-12B-IT, confirming generalizability of our findings.

Abstract

Dense retrievers in retrieval-augmented generation (RAG) systems exhibit systematic biases -- including brevity, position, literal matching, and repetition biases -- that can compromise retrieval quality. Query rewriting techniques are now standard in RAG pipelines, yet their impact on these biases remains unexplored. We present the first systematic study of how query enhancement techniques affect dense retrieval biases, evaluating five methods across six retrievers. Our findings reveal that simple LLM-based rewriting achieves the strongest aggregate bias reduction (54%), yet fails under adversarial conditions where multiple biases combine. Mechanistic analysis uncovers two distinct mechanisms: simple rewriting reduces bias through increased score variance, while pseudo-document generation methods achieve reduction through genuine decorrelation from bias-inducing features. However, no technique uniformly addresses all biases, and effects vary substantially across retrievers. Our results provide practical guidance for selecting query enhancement strategies based on specific bias vulnerabilities. More broadly, we establish a taxonomy distinguishing query-document interaction biases from document encoding biases, clarifying the limits of query-side interventions for debiasing RAG systems.
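To make the method families concrete, here is a minimal sketch of how three kinds of enhanced query could be assembled before embedding. It follows the pattern visible in the example queries extracted further down this page: the HyDE entries are a generated passage alone, and the Query2Doc entries are the original question followed by that passage. The `generate` callable, the prompt wording, and the plain-space join are assumptions, not the paper's implementation.

```python
from typing import Callable, Dict


def enhance_query(query: str, generate: Callable[[str], str]) -> Dict[str, str]:
    """Build enhanced variants of `query`.

    `generate` is a hypothetical prompt -> text function standing in for any LLM.
    The prompts are illustrative only.
    """
    rewrite = generate(f"Rewrite this search query more clearly: {query}")
    pseudo_doc = generate(f"Write a short passage that answers: {query}")
    return {
        "original": query,
        "simple_rewrite": rewrite,             # rephrased question
        "hyde": pseudo_doc,                    # pseudo-document only
        "query2doc": f"{query} {pseudo_doc}",  # original query + pseudo-document
    }


def stub_llm(prompt: str) -> str:
    """Deterministic stand-in so the sketch runs without a model."""
    return "REWRITTEN QUERY" if prompt.startswith("Rewrite") else "GENERATED PASSAGE"


variants = enhance_query("Where was Cesare Mori born?", stub_llm)
for name, text in variants.items():
    print(f"{name:>14}: {text}")
```

Whichever string is chosen is what the retriever embeds in place of the raw query, so only the query side of the match changes.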

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

Simple LLM rewriting cuts aggregate retriever bias by 54% in this study, but the two-mechanism explanation rests on correlations without isolating ablations.

The paper's core contribution is a side-by-side test of five query rewriting approaches on six dense retrievers, measuring effects on brevity, position, literal matching, and repetition biases. Simple LLM rewriting delivers the largest average drop, yet it collapses when biases are combined adversarially. The authors also separate query-document interaction biases from document encoding biases and link the reductions to two pathways: variance increase for basic rewrites and feature decorrelation for pseudo-document methods. That framing gives RAG practitioners concrete selection rules they can apply today.

Referee Report

2 major / 2 minor

Summary. The paper presents the first systematic empirical study of how five query rewriting techniques (including simple LLM-based rewriting and pseudo-document generation) affect four biases (brevity, position, literal matching, and repetition) in six dense retrievers used in RAG pipelines. It reports that simple LLM rewriting yields the largest aggregate bias reduction of 54% but fails under combined adversarial biases, while mechanistic analysis attributes the effect of simple rewriting to increased score variance and that of pseudo-document methods to decorrelation from bias-inducing features. The work concludes with a taxonomy distinguishing query-document interaction biases from document encoding biases and offers practical guidance on selecting query-side interventions.

Significance. If the reported aggregate reductions and mechanistic distinctions hold, the paper would be the first to quantify and deconstruct query rewriting effects on retriever biases, supplying a useful taxonomy and selection guidelines for RAG practitioners. The empirical comparison across multiple retrievers and bias types strengthens the practical relevance, though the absence of causal isolation for the proposed mechanisms limits the strength of the taxonomy.

major comments (2)

[§5] §5 (Mechanistic Analysis): The claim that simple LLM rewriting reduces bias specifically through increased score variance (as opposed to a byproduct of query lengthening or lexical expansion) rests on post-hoc correlation between variance statistics and bias scores before/after rewriting. No ablation is described that holds other embedding properties fixed (e.g., norm, token overlap, or semantic drift) while varying only variance; without this, the causal attribution and the distinction from the decorrelation mechanism in pseudo-document methods cannot be isolated.
[§4] §4 (Experimental Results) and §3 (Adversarial Setup): The 54% aggregate bias reduction and the failure under adversarial conditions are presented as central findings, yet the manuscript provides no details on how the four biases were quantified, how adversarial combinations were constructed, or any statistical tests (e.g., significance or confidence intervals) supporting the aggregate number. These omissions make it impossible to assess whether the reported effect sizes are robust or sensitive to implementation choices.
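The control the first major comment asks for could start with a filter that keeps only rewrites matched to the original query on length and lexical overlap, so that these two properties are held roughly fixed. The sketch below is one hypothetical way to do that. The whitespace tokenization, the Jaccard measure, and both thresholds are assumptions chosen for illustration.

```python
def token_overlap(a: str, b: str) -> float:
    """Jaccard overlap between lowercase whitespace-token sets."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    if not tokens_a and not tokens_b:
        return 1.0
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


def is_matched(
    original: str,
    rewrite: str,
    max_len_ratio: float = 1.25,  # assumed tolerance on token-count ratio
    min_overlap: float = 0.5,     # assumed floor on Jaccard overlap
) -> bool:
    """True when the rewrite stays close to the original in length and vocabulary."""
    n_orig, n_new = len(original.split()), len(rewrite.split())
    if min(n_orig, n_new) == 0:
        return False
    len_ratio = max(n_orig, n_new) / min(n_orig, n_new)
    return len_ratio <= max_len_ratio and token_overlap(original, rewrite) >= min_overlap


original = "where was cesare mori born"
close_rewrite = "where was cesare mori born exactly"
loose_rewrite = "what is the birthplace of cesare mori"

print(is_matched(original, close_rewrite))
print(is_matched(original, loose_rewrite))
```

Comparing bias statistics only on the matched subset would show whether the reduction survives once length and overlap are controlled. It would not control embedding norm or semantic drift, which the comment also names.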

minor comments (2)

[§1] The abstract and §1 state that effects 'vary substantially across retrievers' but do not include a table or figure summarizing per-retriever bias reductions; adding such a breakdown would improve clarity without altering the central claims.
[§3] Notation for the bias metrics (e.g., how brevity or repetition bias is formally defined) is introduced in §3 but not consistently referenced in the mechanistic plots of §5; a single equation or definition box would aid readability.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for their thorough and constructive review. We address each major comment point by point below, providing the strongest honest defense of our work while acknowledging where revisions are needed to strengthen the manuscript.

Referee: [§5] §5 (Mechanistic Analysis): The claim that simple LLM rewriting reduces bias specifically through increased score variance (as opposed to a byproduct of query lengthening or lexical expansion) rests on post-hoc correlation between variance statistics and bias scores before/after rewriting. No ablation is described that holds other embedding properties fixed (e.g., norm, token overlap, or semantic drift) while varying only variance; without this, the causal attribution and the distinction from the decorrelation mechanism in pseudo-document methods cannot be isolated.

Authors: We appreciate the referee highlighting the correlational nature of our mechanistic claims. Our analysis in §5 compared variance increases after simple rewriting against decorrelation effects in pseudo-document methods, using before/after statistics across retrievers. We agree that without controlled ablations isolating variance from confounders like length or overlap, full causality cannot be established. In the revised manuscript we add a new ablation subsection that generates length- and overlap-matched rewrites (via constrained prompting) and shows that bias reduction tracks variance changes even when length and lexical features are held constant. We also explicitly note the remaining limitations of our evidence and qualify the taxonomy accordingly. This addresses the concern while preserving the observed distinction between mechanisms.

revision: yes

Referee: [§4] §4 (Experimental Results) and §3 (Adversarial Setup): The 54% aggregate bias reduction and the failure under adversarial conditions are presented as central findings, yet the manuscript provides no details on how the four biases were quantified, how adversarial combinations were constructed, or any statistical tests (e.g., significance or confidence intervals) supporting the aggregate number. These omissions make it impossible to assess whether the reported effect sizes are robust or sensitive to implementation choices.

Authors: We acknowledge that the original submission omitted key implementation and statistical details, which limits assessment of robustness. In the revised version we expand §3 and §4 with: (1) exact formulas and pseudocode for computing each bias (brevity, position, literal matching, repetition) on the retrieved document sets; (2) the precise procedure for constructing adversarial combinations, including query templates and how multiple biases are jointly induced; and (3) bootstrap-derived 95% confidence intervals and paired significance tests (p < 0.01) for the 54% aggregate reduction, computed over retrievers and datasets. These additions enable full reproducibility and sensitivity analysis.

revision: yes
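A percentile bootstrap of the kind the rebuttal mentions can be sketched in a few lines. The example treats each retriever-dataset cell as one resampling unit and defines the aggregate reduction as 1 minus the ratio of mean |t| after to mean |t| before. Both choices are assumptions, since the page does not give the paper's formula. The |t| values are placeholders, not results from the paper.

```python
import random
import statistics


def percent_reduction(before, after):
    """Aggregate reduction in mean |t|, as a percentage (assumed definition)."""
    return 100.0 * (1.0 - statistics.fmean(after) / statistics.fmean(before))


def bootstrap_ci(before, after, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap interval, resampling paired cells with replacement."""
    rng = random.Random(seed)
    n = len(before)
    stats = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        stats.append(
            percent_reduction([before[i] for i in idx], [after[i] for i in idx])
        )
    stats.sort()
    lower = stats[int((alpha / 2) * n_resamples)]
    upper = stats[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


# Placeholder |t| values per cell (NOT from the paper).
t_before = [6.0, 5.0, 8.0, 4.0, 7.0, 5.5]
t_after = [4.0, 3.0, 6.0, 2.0, 5.0, 4.0]

point = percent_reduction(t_before, t_after)
ci_low, ci_high = bootstrap_ci(t_before, t_after)
print(f"reduction = {point:.1f}%, 95% CI = [{ci_low:.1f}%, {ci_high:.1f}%]")
```

Resampling cells in pairs keeps each before/after measurement together. With only a handful of cells the interval will be wide, which is useful information about how far a single aggregate figure can be trusted.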

Circularity Check

0 steps flagged

No circularity: empirical evaluation of rewriting effects on measured biases

The paper conducts a systematic empirical comparison of five query rewriting methods across six retrievers and four named biases, reporting aggregate reductions (e.g., 54%) and mechanistic observations from score variance and feature correlations. No equations, parameter fits presented as predictions, self-citations as load-bearing uniqueness theorems, or ansatzes smuggled via prior work appear in the abstract or described structure. All central claims derive from direct experimental measurements rather than any reduction to the paper's own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claims rest on domain assumptions about how the four biases are defined and measured and on the representativeness of the tested methods and retrievers; no free parameters or invented entities are described in the abstract.

axioms (2)

domain assumption: Dense retrievers exhibit systematic biases, including brevity, position, literal matching, and repetition biases, that can be quantified and aggregated.
The 54% reduction figure and mechanistic distinctions depend on these bias definitions and measurement procedures.
domain assumption: The five rewriting methods and six retrievers are sufficient to reveal general patterns and limits of query-side interventions.
General guidance and the taxonomy rest on this representativeness assumption.

pith-pipeline@v0.9.0 · 5507 in / 1466 out tokens · 81762 ms · 2026-05-10T18:31:48.351099+00:00 · methodology

Reference graph

Works this paper leans on

14 extracted references · 14 canonical work pages · 1 internal anchor

[1]

In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 538–548, Abu Dhabi, United Arab Emirates

RetroMAE: Pre-training retrieval-oriented language models via masked auto-encoder. In Proceedings of the 2022 Conference on Empirical Methods in Natural Language Processing, pages 538–548, Abu Dhabi, United Arab Emirates. Association for Computational Linguistics. An Yang, Anfeng Li, Baosong Yang, Beichen Zhang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang...

work page 2022
[2]

Qwen3 Technical Report

Qwen3 technical report. arXiv preprint arXiv:2505.09388. Yue Yu, Chenyan Xiong, Si Sun, Chao Zhang, and Arnold Overwijk. 2022. Coco-dr: Combating distribution shifts in zero-shot dense retrieval with contrastive and distributionally robust learning. arXiv preprint arXiv:2210.15212. Xianyang Zhan, Agam Goyal, Yilun Chen, Eshwar Chandrasekharan, and Koust...

work page internal anchor Pith review Pith/arXiv arXiv 2022
[3]

When was The Private Life of Helen of Troy published?

work page
[4]

Where is Los Angeles located in or next to a body of water?

work page
[5]

Where was Cesare Mori born? LLM Rewritten Queries

work page
[6]

What is the publication date of The Private Life of Helen of Troy?

work page
[7]

What body of water is Los Angeles located near or in?

work page
[8]

What is the birthplace of Cesare Mori? HyDE Queries

work page
[9]

Private Life of Helen of Troy

The “Private Life of Helen of Troy” was published in 1920 by the renowned author, Robert Graves. This groundbreaking work, a reimagining of the Trojan War legend, offered a fresh perspective on Helen’s character and the events surrounding the war

work page 1920
[10]

The city is situated on the shores of the Pacific Ocean, with the Pacific Ocean bordering its western side

Los Angeles is located on the southern coast of California, a state situated on the Pacific Ocean. The city is situated on the shores of the Pacific Ocean, with the Pacific Ocean bordering its western side. The city is also bordered by the Los Angeles Basin, a large inland basin that is partially surrounded by mountains

work page
[11]

He was a powerful and controversial figure who served as Cardinal and later as the Duke of Valentinois

Cesare Borgia, the Italian Renaissance figure, died in Rome, Italy on 18th of January 1507. He was a powerful and controversial figure who served as Cardinal and later as the Duke of Valentinois. His death was a significant event in the history of the Borgia family and the Italian Renaissance. Query2Doc Queries

work page
[12]

Private Life of Helen of Troy

When was The Private Life of Helen of Troy published? The “Private Life of Helen of Troy” was published in 1920 by the renowned author, Robert Graves. This groundbreaking work, a reimagining of the Trojan War legend, offered a fresh perspective on Helen’s character and the events surrounding the war

work page 1920
[13]

The city is situated on the shores of the Pacific Ocean, with the Pacific Ocean bordering its western side

Where is Los Angeles located in or next to a body of water? Los Angeles is located on the southern coast of California, a state situated on the Pacific Ocean. The city is situated on the shores of the Pacific Ocean, with the Pacific Ocean bordering its western side. The city is also bordered by the Los Angeles Basin, a large inland basin that is partially...

work page
[14]

When was X published?

Where was Cesare Mori born? Cesare Borgia, the Italian Renaissance figure, died in Rome, Italy on 18th of January 1507. He was a powerful and controversial figure who served as Cardinal and later as the Duke of Valentinois. His death was a significant event in the history of the Borgia family and the Italian Renaissance. B Results for Qwen3 To verify that...

work page
