arxiv: 2605.28112 · v1 · pith:IZKX2E2Ynew · submitted 2026-05-27 · 💻 cs.CR · cs.CL · cs.IR

A Wolf in Sheep's Clothing: Targeted Routing Hijacking in Federated RAG

Pith reviewed 2026-06-29 11:35 UTC · model grok-4.3

classification 💻 cs.CR · cs.CL · cs.IR
keywords federated rag · routing hijacking · retrieval-augmented generation · security attack · data poisoning · query routing · federated learning · adversarial manipulation

The pith

Malicious clients forge semantic profiles to hijack routing in FedRAG, misdirecting queries and triggering hallucinations.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Federated RAG keeps raw data local for privacy, forcing routing decisions to rely on client-supplied semantic profiles. The paper shows that this setup allows a malicious client to fabricate its profile and attract specific target queries even when its underlying data is irrelevant. Across three standard routing architectures the attack produces consistent misrouting, which then causes missing evidence, data poisoning, wrong answers, and hallucinations in the generation stage. A MedQA-USMLE case study confirms that the poisoned evidence misleads models of different sizes. Existing defenses such as encrypted routing and Byzantine-robust aggregation leave the vulnerability open, leading the authors to introduce a post-routing reweighting method based on retrieval feedback.

Core claim

Routing Hijacking is a routing-stage attack in which a malicious client forges its profile to attract target queries despite having irrelevant underlying data. This vulnerability is severe. Across three representative FedRAG routing architectures, Routing Hijacking consistently misroutes target queries and leads to downstream disruptions and failures, including missing evidence, poisoning, incorrect answers, and hallucinations. In a high-stakes MedQA-USMLE case study, poisoned retrieved evidence misleads models across scales, leading to incorrect answers, hallucinations, and sycophantic failures. Existing defenses do not close this gap: encrypted routing preserves the exploited ranking, and Byzantine-robust Federated Learning (FL) rules transfer poorly to heterogeneous routing profiles.

What carries the argument

Routing Hijacking attack that exploits unverified, client-provided semantic profiles to manipulate query routing in FedRAG.
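
To make the mechanism concrete, here is a minimal sketch of profile-based routing and how a forged profile captures a target query. It assumes cosine-similarity top-k routing over client profile vectors, which is one plausible design and not necessarily any of the three architectures the paper evaluates. The client names, the 3-dimensional toy embedding space, and the helpers `cosine` and `route` are all hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def route(query_vec, profiles, k=1):
    """Rank clients by profile similarity to the query; return the top-k ids."""
    ranked = sorted(profiles, key=lambda cid: cosine(query_vec, profiles[cid]),
                    reverse=True)
    return ranked[:k]

# Toy 3-d space: axes stand for medicine, physics, law (illustrative only).
honest_profiles = {
    "clinic": [0.9, 0.1, 0.0],
    "lab": [0.1, 0.9, 0.1],
    "attacker": [0.0, 0.1, 0.9],  # truthful profile of its real (law) data
}
target_query = [1.0, 0.0, 0.0]  # a medical query the attacker wants to capture

before = route(target_query, honest_profiles)

# The attacker overwrites its self-reported profile with the target direction.
# Its underlying data is unchanged; only the unverified profile is forged.
forged_profiles = dict(honest_profiles, attacker=list(target_query))
after = route(target_query, forged_profiles)

print(before, after)
```

In this toy setting the router has no way to tell the forged profile from an honest one, because the only input to ranking is the vector each client reports about itself.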

If this is right

  • Misrouting produces concrete downstream failures such as missing evidence and hallucinations.
  • The attack succeeds against three representative FedRAG routing architectures.
  • Poisoned evidence from hijacked routes misleads models on medical QA tasks across model scales.
  • Encrypted routing and Byzantine-robust FL rules leave the routing vulnerability intact.
  • A trust-aware post-routing framework using relevance, consistency, and agreement feedback can suppress persistent hijacking and transfer to neural routers.
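
The post-routing idea in the last bullet can be sketched as a trust score per client that is updated from feedback and multiplied into the routing score. The paper's actual reweighting function is not given in the text above, so everything here is an assumption: the equal-weight combination in `feedback_score`, the moving-average rule in `update_trust`, the multiplicative `rerank`, and the numeric feedback values are all hypothetical.

```python
def feedback_score(relevance, consistency, agreement,
                   weights=(1 / 3, 1 / 3, 1 / 3)):
    """Combine the three feedback signals (each in [0, 1]) into one score."""
    w_r, w_c, w_a = weights
    return w_r * relevance + w_c * consistency + w_a * agreement

def update_trust(trust, score, rate=0.5):
    """Exponential moving average of feedback (an assumed update rule)."""
    return (1 - rate) * trust + rate * score

def rerank(similarities, trust):
    """Scale each client's routing similarity by its trust and rank clients."""
    effective = {cid: similarities[cid] * trust[cid] for cid in similarities}
    return sorted(effective, key=effective.get, reverse=True)

# The forged profile gives the attacker the highest raw similarity.
similarities = {"clinic": 0.95, "attacker": 1.0}
trust = {"clinic": 1.0, "attacker": 1.0}
# Hypothetical per-round feedback: (relevance, consistency, agreement).
feedback = {"clinic": (0.9, 0.9, 0.9), "attacker": (0.1, 0.0, 0.1)}

history = []
for _ in range(3):  # the same target query recurs three times
    history.append(rerank(similarities, trust)[0])
    # Simplification: every client is queried and scored each round.
    for cid in similarities:
        trust[cid] = update_trust(trust[cid], feedback_score(*feedback[cid]))

print(history, trust)
```

Under these assumed numbers the attacker wins the first round and loses the following ones, which mirrors the claim that feedback suppresses persistent hijacking over recurring queries but cannot prevent the first misroute.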

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Systems that select data sources using only self-reported metadata may share similar hijacking risks beyond FedRAG.
  • Independent verification of client data relevance could be tested as a direct countermeasure.
  • The feedback-based reweighting approach might extend to other federated selection problems where profile accuracy is hard to audit upfront.

Load-bearing premise

The routing mechanism trusts and ranks clients based solely on the semantic profiles they voluntarily provide, without independent verification of profile accuracy or data relevance.

What would settle it

A routing implementation that rejects forged profiles by cross-checking returned evidence against the claimed profile or by requiring proof of data relevance would show the attack does not succeed.
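
One way to picture such a cross-check is a gate that compares the evidence a client returns against the profile it claimed. The sketch below is an illustration under stated assumptions, not a defense described in the paper: it assumes the server can embed returned evidence itself, and the helper names and the `threshold` value are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def centroid(vectors):
    """Component-wise mean of a non-empty list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def profile_is_consistent(claimed_profile, evidence_vecs, threshold=0.5):
    """Accept a client only if its returned evidence resembles its profile."""
    if not evidence_vecs:
        return False  # nothing returned, so the claim cannot be checked
    return cosine(claimed_profile, centroid(evidence_vecs)) >= threshold

# Honest client: evidence embeddings lie near the claimed profile.
honest_ok = profile_is_consistent(
    [0.9, 0.1, 0.0], [[1.0, 0.0, 0.0], [0.8, 0.2, 0.0]])

# Hijacker: claims the target domain but returns off-domain evidence.
forged_ok = profile_is_consistent(
    [1.0, 0.0, 0.0], [[0.0, 0.1, 0.9], [0.1, 0.0, 0.9]])

print(honest_ok, forged_ok)
```

A check like this only settles the question if the evidence embeddings are computed by a party the attacker does not control; otherwise the embeddings can be forged in the same way as the profile.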

Figures

Figures reproduced from arXiv: 2605.28112 by Junjie Mu, Qiongxiu Li.

Figure 1: Routing Hijacking and TASR in FedRAG. A malicious client forges a target-domain semantic profile to …
Figure 2: Failure Mode Distribution Under Poisoned …
Figure 3: Harmful Content Injection. A selected mali…
Figure 4: Missing Information Attack. A selected mali…
Figure 5: Data Poisoning Attack. A selected malicious …
Figure 6: HR@1 on Physics under single-domain and multi-domain client configurations
Figure 7: HR@1 of Byzantine-robust baselines under …
Figure 8: Effective trust trajectory of the malicious …
Original abstract

Federated Retrieval-Augmented Generation (FedRAG) is attractive for privacy-sensitive applications because raw data remain local. As a result, routing must rely on client-provided semantic profiles, creating a new opportunity for manipulation. We introduce Routing Hijacking, a routing-stage attack in which a malicious client forges its profile to attract target queries despite having irrelevant underlying data. We show that this vulnerability is severe. Across three representative FedRAG routing architectures, Routing Hijacking consistently misroutes target queries and leads to downstream disruptions and failures, including missing evidence, poisoning, incorrect answers, and hallucinations. In a high-stakes MedQA-USMLE case study, we further show that poisoned retrieved evidence can mislead models across scales, leading to incorrect answers, hallucinations, and sycophantic failures. Existing defenses do not close this gap: encrypted routing preserves the exploited ranking, and Byzantine-robust Federated Learning (FL) rules transfer poorly to heterogeneous routing profiles. To address this gap, we propose a trust-aware post-routing framework that reweights clients using returned-evidence feedback, including retrieval relevance, profile consistency, and cross-client agreement; online experiments show that it suppresses persistent hijacking over recurring queries and transfers to a learned neural router. Our findings establish routing integrity as a new security challenge in FedRAG and highlight the need for stronger defenses for secure federated retrieval.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

0 major / 3 minor

Summary. The manuscript introduces Routing Hijacking, an attack in which a malicious client in Federated RAG forges its semantic profile to attract target queries despite holding irrelevant data. It reports that the attack succeeds across three representative FedRAG routing architectures, produces downstream failures (missing evidence, poisoning, incorrect answers, hallucinations), demonstrates these effects in a MedQA-USMLE case study across model scales, shows that encrypted routing and Byzantine-robust FL do not close the gap, and proposes a trust-aware post-routing framework that reweights clients via retrieval relevance, profile consistency, and cross-client agreement; online experiments indicate the framework suppresses persistent hijacking and transfers to learned neural routers.

Significance. If the empirical results hold, the work is significant for establishing routing integrity as a distinct security challenge in FedRAG systems that rely on unverified client profiles to preserve privacy. The cross-architecture evaluation, the high-stakes MedQA case study, and the concrete post-routing mitigation (with online experiments) provide falsifiable evidence and a practical starting point for defenses. The explicit premise that routing trusts voluntarily provided profiles without verification is stated directly and underpins the attack surface analysis.

minor comments (3)
  1. [Abstract] The phrase 'sycophantic failures' is used without a brief parenthetical definition or example; adding one would improve accessibility for readers outside the immediate subfield.
  2. [Evaluation] The three representative routing architectures are described at a high level; a short table or paragraph in the evaluation section listing their key differences (e.g., profile representation, ranking function) would aid reproducibility.
  3. [Defense Proposal] The trust-aware framework description would benefit from an explicit equation or pseudocode for the reweighting function that combines the three feedback signals.

Simulated Author's Rebuttal

0 responses · 0 unresolved

We thank the referee for their positive assessment of the manuscript, accurate summary of the contributions, and recommendation for minor revision. We appreciate the recognition that routing integrity represents a distinct security challenge in FedRAG.

Circularity Check

0 steps flagged

No significant circularity

The paper is an empirical security analysis of a routing attack in FedRAG. It states the core premise (routing trusts unverified client semantic profiles) directly in the abstract and introduction, then reports experimental outcomes across three architectures, a MedQA case study, and a proposed mitigation. No equations, fitted parameters, self-definitional reductions, or load-bearing self-citations appear in the provided text. The central claims follow from the stated attack surface and observed results rather than any internal redefinition or renaming of inputs.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 1 invented entity

The central claim depends on the domain assumption that client profiles are forgeable and that routing decisions are made without external verification of data quality.

axioms (1)
  • domain assumption: Routing decisions in FedRAG are made exclusively from client-provided semantic profiles.
    Stated in abstract as the reason the attack is possible.
invented entities (1)
  • Routing Hijacking attack (no independent evidence)
    purpose: Demonstrate targeted misrouting via profile forgery.
    New attack concept introduced to explain the vulnerability.

pith-pipeline@v0.9.1-grok · 5776 in / 1181 out tokens · 25491 ms · 2026-06-29T11:35:46.347474+00:00 · methodology
