Efficient Federated Search for Retrieval-Augmented Generation using Lightweight Routing

Akash Dhasade; Anne-Marie Kermarrec; Diana Petrescu; Martijn de Vos; Mathis Randl; Rachid Guerraoui; Rafael Pires

arXiv: 2502.19280 · v2 · submitted 2025-02-26 · 💻 cs.LG · cs.DC · cs.IR

Pith reviewed 2026-05-23 02:15 UTC · model grok-4.3

classification 💻 cs.LG · cs.DC · cs.IR

keywords federated search · retrieval-augmented generation · RAG routing · neural classifier · communication efficiency · latency reduction · distributed retrieval

The pith

RAGRoute uses a neural classifier to route queries only to relevant sources in federated RAG, cutting communication volume by up to 80.65% and latency by 52.50% while matching full accuracy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RAGRoute to support retrieval-augmented generation when documents sit across separate organizations that cannot pool their data. A lightweight neural classifier examines each query and picks only the sources likely to contain useful information, skipping the rest. This selective routing avoids the cost of broadcasting every query to every source. Experiments on three benchmarks show that retrieval accuracy stays the same as the all-sources baseline. The approach therefore lowers data movement and response time without sacrificing quality.

Core claim

RAGRoute is a lightweight routing mechanism that employs a neural classifier to dynamically select relevant data sources at query time for federated search in RAG systems. By avoiding indiscriminate querying of all sources, the method reduces communication volume by up to 80.65% and end-to-end latency by up to 52.50% across three benchmarks while preserving retrieval accuracy equivalent to querying every source.

What carries the argument

A neural classifier that predicts which data sources are relevant to a given query and thereby enables selective rather than broadcast routing.
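The selective-routing idea can be sketched in a few lines. This is a minimal illustration, not the paper's implementation: it assumes a linear scorer with a sigmoid stands in for the neural classifier, and the source names, weights, and the 0.5 threshold are all hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple

# Hypothetical per-source parameters: (weights, bias) of a linear scorer.
SourceParams = Dict[str, Tuple[Sequence[float], float]]


def relevance_score(query_vec: Sequence[float], weights: Sequence[float], bias: float) -> float:
    """Sigmoid of a linear score; a stand-in for a trained neural classifier."""
    logit = sum(w * q for w, q in zip(weights, query_vec)) + bias
    return 1.0 / (1.0 + math.exp(-logit))


def route(query_vec: Sequence[float], params: SourceParams, threshold: float = 0.5) -> List[str]:
    """Return the sources whose predicted relevance meets the threshold."""
    return [
        name
        for name, (weights, bias) in params.items()
        if relevance_score(query_vec, weights, bias) >= threshold
    ]


# Illustrative parameters and query embedding (made up for this sketch).
params = {
    "source_a": ([1.0, 0.0], 0.0),
    "source_b": ([0.0, 1.0], 0.0),
    "source_c": ([-1.0, -1.0], -1.0),
}
selected = route([2.0, -2.0], params)
print(selected)  # only these sources would receive the query
```

Lowering the threshold trades extra queries for a smaller chance of skipping a relevant source, which is the tension the load-bearing premise below is about.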

Load-bearing premise

The neural classifier can reliably pick all relevant sources for arbitrary queries without systematically omitting any that would lower overall retrieval quality.

What would settle it

A benchmark run in which the classifier consistently fails to select a source containing unique relevant documents for a measurable fraction of queries, producing lower accuracy than the full-query baseline.
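One way to measure the failure described above is to count the queries for which the router skipped at least one source that held relevant documents. The sketch below assumes ground-truth relevant sources are available per query; the function name and the toy data are hypothetical.

```python
from typing import List, Set


def miss_rate(selected: List[Set[str]], relevant: List[Set[str]]) -> float:
    """Fraction of queries where at least one relevant source was not selected."""
    if len(selected) != len(relevant):
        raise ValueError("selected and relevant must have one entry per query")
    missed = sum(1 for chosen, truth in zip(selected, relevant) if truth - chosen)
    return missed / len(relevant)


# Toy example: three queries; the third one misses source "a".
selected_sources = [{"a"}, {"a", "b"}, {"c"}]
relevant_sources = [{"a"}, {"b"}, {"a", "c"}]
rate = miss_rate(selected_sources, relevant_sources)
print(f"miss rate: {rate:.3f}")
```

A miss rate that is consistently above zero, paired with lower end-to-end accuracy than the all-sources baseline, would be the evidence this section asks for.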

Figures

Figures reproduced from arXiv: 2502.19280 by Akash Dhasade, Anne-Marie Kermarrec, Diana Petrescu, Martijn de Vos, Mathis Randl, Rachid Guerraoui, Rafael Pires.

**Figure 2.** The relevance of different corpora in RAG when answering questions, using question sets from MIRAGE.

**Figure 3.** The workflow of RAGRoute. The components specific to RAGRoute are indicated in the box with the dashed border. In contrast to existing RAG workflows that rely on a single data store, RAGRoute enables efficient federated search by using a lightweight router to determine relevant data sources during an inference request.

**Figure 4.** The mean recall for both benchmarks and for different data sources. We also show the mean recall for RAGRoute.

Table text extracted alongside Figure 4:

| Experiment | Accuracy (%) | Recall (%) | F1-Score (%) | AUC (%) |
|---|---|---|---|---|
| MIRAGE (Top 32) | 85.63 ± 3.92 | 85.47 ± 3.61 | 85.79 ± 2.45 | 92.6 ± 2.33 |
| MIRAGE (Top 10) | 87.3 ± 6.1 | 88.32 ± 3.96 | 85.43 ± 4.18 | 93.67 ± 3.33 |
| MMLU (Top 10) | 90.06 ± 5.04 | 76.23 ± 6.64 | 78.29 ± 7.59 | 92.88 ± 3.29 |

**Figure 5.** The number of queries for both benchmarks and for different routing strategies.

Original abstract

Large language models (LLMs) achieve remarkable performance across domains but remain prone to hallucinations and inconsistencies. Retrieval-augmented generation (RAG) mitigates these issues by augmenting model inputs with relevant documents retrieved from external sources. In many real-world scenarios, relevant knowledge is fragmented across organizations or institutions, motivating the need for federated search mechanisms that can aggregate results from heterogeneous data sources without centralizing the data. We introduce RAGRoute, a lightweight routing mechanism for federated search in RAG systems that dynamically selects relevant data sources at query time using a neural classifier, avoiding indiscriminate querying. This selective routing reduces communication overhead and end-to-end latency while preserving retrieval quality, achieving up to 80.65% reductions in communication volume and 52.50% reductions in latency across three benchmarks, while matching the accuracy of querying all sources.
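To make the kind of saving concrete, the sketch below computes the relative reduction in the number of source queries when each query is sent only to its selected sources instead of to all of them. It counts queries as a simple proxy for communication volume, which is an assumption of this sketch; the numbers are illustrative and are not the paper's measurements.

```python
from typing import List


def query_reduction(selected_per_query: List[int], num_sources: int) -> float:
    """Relative reduction in source queries versus broadcasting to every source."""
    if num_sources <= 0 or not selected_per_query:
        raise ValueError("need at least one source and one query")
    broadcast = num_sources * len(selected_per_query)  # every query hits every source
    routed = sum(selected_per_query)                   # only selected sources are hit
    return 1.0 - routed / broadcast


# Illustrative: 5 queries over 4 sources, with 1, 1, 2, 1 and 0 sources selected.
reduction = query_reduction([1, 1, 2, 1, 0], num_sources=4)
print(f"{reduction:.2%} fewer source queries")
```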

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

RAGRoute adds a practical neural router for federated RAG that cuts communication and latency while claiming accuracy parity, but the parity rests on unverified source recall.

RAGRoute trains a lightweight neural classifier to pick which data sources to query at runtime instead of hitting every source in a federated RAG setup. The core new element is this selective routing layer built specifically for the constraint that data stays local and cannot be centralized. The paper reports the mechanism, how the classifier is trained on source relevance, and results across three benchmarks.

It does well on the practical metrics: the claimed reductions reach 80.65% in communication volume and 52.50% in latency while retrieval accuracy stays level with the full-query baseline. Those numbers address a real cost in distributed deployments.

The soft spot is exactly the one flagged in the stress test. Accuracy parity requires the classifier to maintain high recall on sources that actually contain useful documents for the query. If relevant documents are split across sources or if the training labels are single-source only, even moderate false negatives would produce measurable drops that the headline numbers do not address. The abstract gives only summary percentages, so the experiments section must supply source-level recall figures, error bars, dataset descriptions, and a direct comparison showing the end-to-end metrics are statistically indistinguishable. Without that, the central claim stays provisional.

The rest of the work looks standard: no circular derivations, ordinary citation framing for RAG and federated learning.

This paper is for people building or evaluating federated retrieval pipelines who need a concrete way to trim overhead. A reader who wants to test a routing method on their own benchmarks will get usable details. It deserves peer review because the problem is concrete, the approach is simple to replicate, and the reported gains are large enough to check.

Referee Report

3 major / 1 minor

Summary. The manuscript introduces RAGRoute, a lightweight neural classifier for routing queries to relevant data sources in federated RAG setups. It claims this selective routing achieves up to 80.65% reduction in communication volume and 52.50% reduction in latency across three benchmarks while matching the retrieval accuracy of querying all sources.

Significance. If the empirical claims hold under rigorous verification, the work addresses a practical bottleneck in distributed RAG by enabling efficient federated search without data centralization. The reported overhead reductions are large enough to matter for real-world multi-institutional deployments, provided the accuracy parity is shown to be robust rather than benchmark-specific.

major comments (3)

Section 4 (experiments): The accuracy-matching claim is load-bearing yet rests on aggregate end-to-end metrics without reported source-level recall or precision of the classifier. The manuscript must show that the false-negative rate on relevant sources is low enough that retrieval metrics (e.g., recall@K) remain statistically indistinguishable from the all-sources baseline, including error bars and significance tests.

Section 3 (method): The training procedure for the neural classifier is not described in sufficient detail to evaluate the weakest assumption. The paper should specify the labeling strategy (single-source vs. multi-source relevance), loss function, and how queries whose relevant documents are split across sources are handled during training and evaluation.

Section 4 (experiments): No information is provided on dataset characteristics, number of sources per benchmark, query distribution, or baseline routing methods. Without these, it is impossible to assess whether the reported reductions generalize or are artifacts of particular benchmark constructions.
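The source-level precision and recall requested in the first comment could be computed along the following lines. This is a sketch under stated assumptions: router outputs and ground-truth relevance are given as one set of source names per query, and the function name and toy data are hypothetical.

```python
from typing import Dict, List, Set, Tuple


def per_source_precision_recall(
    predicted: List[Set[str]], relevant: List[Set[str]], sources: List[str]
) -> Dict[str, Tuple[float, float]]:
    """Per-source (precision, recall) of routing decisions over a query set."""
    stats: Dict[str, Tuple[float, float]] = {}
    for source in sources:
        pairs = list(zip(predicted, relevant))
        tp = sum(1 for p, r in pairs if source in p and source in r)
        fp = sum(1 for p, r in pairs if source in p and source not in r)
        fn = sum(1 for p, r in pairs if source not in p and source in r)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0  # 1 - recall is the false-negative rate
        stats[source] = (precision, recall)
    return stats


# Toy example with two sources and three queries.
predicted_sets = [{"a"}, {"a", "b"}, set()]
relevant_sets = [{"a"}, {"b"}, {"a"}]
stats = per_source_precision_recall(predicted_sets, relevant_sets, ["a", "b"])
print(stats)
```

Recall is the number that matters most for the parity claim, since each false negative is a relevant source that never gets queried.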

minor comments (1)

[Abstract] The abstract states concrete percentage improvements without referencing the corresponding tables or figures; cross-references should be added.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the presentation of the method and experiments.

Referee: Section 4 (experiments): The accuracy-matching claim is load-bearing yet rests on aggregate end-to-end metrics without reported source-level recall or precision of the classifier. The manuscript must show that the false-negative rate on relevant sources is low enough that retrieval metrics (e.g., recall@K) remain statistically indistinguishable from the all-sources baseline, including error bars and significance tests.

Authors: We agree that source-level metrics would provide stronger support for the accuracy parity claim. In the revised manuscript we will add the classifier's per-source precision and recall, false-negative rates on relevant sources, error bars on all retrieval metrics, and statistical significance tests against the all-sources baseline.

Revision: yes

Referee: Section 3 (method): The training procedure for the neural classifier is not described in sufficient detail to evaluate the weakest assumption. The paper should specify the labeling strategy (single-source vs. multi-source relevance), loss function, and how queries whose relevant documents are split across sources are handled during training and evaluation.

Authors: We will expand Section 3 with the requested details: the labeling strategy (multi-label relevance when documents span sources), the loss function used for training the router, and the procedure for handling split-relevance queries in both training and evaluation.

Revision: yes

Referee: Section 4 (experiments): No information is provided on dataset characteristics, number of sources per benchmark, query distribution, or baseline routing methods. Without these, it is impossible to assess whether the reported reductions generalize or are artifacts of particular benchmark constructions.

Authors: We will add a dedicated subsection in Section 4 describing dataset characteristics, the number of sources per benchmark, query distributions, and explicit comparisons to baseline routing methods to allow readers to evaluate generalizability.

Revision: yes
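As an illustration of what a multi-label setup for split-relevance queries might look like, the sketch below marks a source as relevant when it contributes at least one document to the merged top-k results, and scores predictions with binary cross-entropy. Both the labeling rule and the loss are assumptions made for this sketch; the text above does not say which the authors use.

```python
import math
from typing import List


def multi_label_targets(top_k_origins: List[str], sources: List[str]) -> List[int]:
    """1 for each source that contributed a document to the merged top-k, else 0."""
    present = set(top_k_origins)
    return [1 if source in present else 0 for source in sources]


def binary_cross_entropy(probs: List[float], targets: List[int], eps: float = 1e-12) -> float:
    """Mean binary cross-entropy over sources for a single query."""
    total = 0.0
    for p, t in zip(probs, targets):
        p = min(max(p, eps), 1.0 - eps)  # clamp to avoid log(0)
        total += t * math.log(p) + (1 - t) * math.log(1.0 - p)
    return -total / len(targets)


# Toy query whose top-3 documents come from sources "a", "a" and "c".
sources = ["a", "b", "c"]
targets = multi_label_targets(["a", "a", "c"], sources)
loss = binary_cross_entropy([0.9, 0.2, 0.6], targets)
print(targets, round(loss, 4))
```

Under this rule a query whose relevant documents span two sources simply gets two positive labels, so no split-relevance query has to be forced onto a single source.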

Circularity Check

0 steps flagged

No circularity: empirical system with direct benchmark measurements.

The paper describes RAGRoute as a neural classifier-based router for federated RAG search. Claims of communication/latency reductions and accuracy matching are presented as direct empirical outcomes from three benchmarks, not as quantities derived from equations or fitted parameters that are then renamed as predictions. No equations, self-definitional steps, fitted-input predictions, or load-bearing self-citations appear in the provided text. The method is self-contained against external benchmarks, with results falsifiable via the reported measurements rather than forced by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Abstract-only review; no explicit free parameters, axioms, or invented entities are described.

Reference graph

Works this paper leans on

36 extracted references · 36 canonical work pages · 2 internal anchors

[1] Parker Addison, Minh-Tuan H Nguyen, Tomislav Medan, Mohammad T Manzari, Brendan McElrone, Laksh Lalwani, Aboli More, Smita Sharma, Holger R Roth, Isaac Yang, et al. C-fedrag: A confidential federated retrieval-augmented generation system. arXiv preprint arXiv:2412.13163, 2024.

[2] Youssef Allouah, Akash Dhasade, Rachid Guerraoui, Nirupam Gupta, Anne-Marie Kermarrec, Rafael Pinot, Rafael Pires, and Rishi Sharma. On the effectiveness of one-shot federated ensembles in heterogeneous cross-silo settings. Advances in Neural Information Processing Systems, 2024. Efficient Federated Search for Retrieval-Augmented Generation EuroMLSys’25,...

[3] Jaime Arguello, Jamie Callan, and Fernando Diaz. Classification-based resource selection. In Proceedings of the 18th ACM conference on Information and knowledge management, pages 1277–1286, 2009.

[4] G Bharathi Mohan, R Prasanna Kumar, P Vishal Krishh, A Keerthinathan, G Lavanya, Meka Kavya Uma Meghana, Sheba Sulthana, and Srinath Doss. An analysis of large language models: their impact and potential applications. Knowledge and Information Systems, pages 1–24, 2024.

[5] Suresh K Bhavnani and Concepción S Wilson. Information scattering. Encyclopedia of library and information sciences, pages 2564–2569, 2009.

[6] Cohere. Wikipedia 2023-11 embed multilingual v3, 2023. Accessed: 2025-02-10.

[7] Zhuyun Dai, Yubin Kim, and Jamie Callan. Learning to rank resources. In Proceedings of the 40th International ACM SIGIR conference on research and development in information retrieval, pages 837–840, 2017.

[8] Matthijs Douze, Alexandr Guzhva, Chengqi Deng, Jeff Johnson, Gergely Szilvasy, Pierre-Emmanuel Mazaré, Maria Lomeli, Lucas Hosseini, and Hervé Jégou. The faiss library. 2024.

[9] Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Amy Yang, Angela Fan, et al. The llama 3 herd of models. arXiv preprint arXiv:2407.21783, 2024.

[10] Eva Eigner and Thorsten Händler. Determinants of llm-assisted decision-making. arXiv preprint arXiv:2402.17385, 2024.

[11] Adamu Garba, Shengli Wu, and Shah Khalid. Federated search techniques: an overview of the trends and state of the art. Knowledge and Information Systems, 65(12):5065–5095, 2023.

[12] Joschka Haltaufderheide and Robert Ranisch. The ethics of chatgpt in medicine and healthcare: a systematic review on large language models (llms). NPJ digital medicine, 7(1):183, 2024.

[13] Yikun Han, Chunjiang Liu, and Pengfei Wang. A comprehensive survey on vector database: Storage and retrieval technique, challenge. arXiv preprint arXiv:2310.11703, 2023.

[14] Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding, 2021.

[15] Ziwei Ji, Tiezheng Yu, Yan Xu, Nayeon Lee, Etsuko Ishii, and Pascale Fung. Towards mitigating llm hallucination via self reflection. In Findings of the Association for Computational Linguistics: EMNLP 2023, pages 1827–1843, 2023.

[16] Emily Jiang. Clinical Question-Answering over Distributed EHR Data. PhD thesis, Massachusetts Institute of Technology, 2024.

[17] Qiao Jin, Won Kim, Qingyu Chen, Donald C. Comeau, Lana Yeganova, W. John Wilbur, and Zhiyong Lu. MedCPT: Contrastive pre-trained transformers with large-scale pubmed search logs for zero-shot biomedical information retrieval. Bioinformatics, 2023.

[18] Peter Kairouz, H Brendan McMahan, Brendan Avent, Aurélien Bellet, Mehdi Bennis, Arjun Nitin Bhagoji, Kallista Bonawitz, Zachary Charles, Graham Cormode, Rachel Cummings, et al. Advances and open problems in federated learning. Foundations and Trends® in Machine Learning, 14(1–2):1–210, 2021.

[19] Jared Kaplan, Sam McCandlish, Tom Henighan, Tom B Brown, Benjamin Chess, Rewon Child, Scott Gray, Alec Radford, Jeffrey Wu, and Dario Amodei. Scaling laws for neural language models. arXiv preprint arXiv:2001.08361, 2020.

[20] Sanjay Kukreja, Tarun Kumar, Vishal Bharate, Amit Purohit, Abhijit Dasgupta, and Debashis Guha. Performance evaluation of vector embeddings with retrieval-augmented generation. In 2024 9th International Conference on Computer and Communication Systems (ICCCS), pages 333–340. IEEE, 2024.

[21] Yoonjoo Lee, Kihoon Son, Tae Soo Kim, Jisu Kim, John Joon Young Chung, Eytan Adar, and Juho Kim. One vs. many: Comprehending accurate information from multiple erroneous and inconsistent ai generations. In The 2024 ACM Conference on Fairness, Accountability, and Transparency, pages 2518–2531, 2024.

[22] Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, et al. Retrieval-augmented generation for knowledge-intensive nlp tasks. Advances in Neural Information Processing Systems, 33:9459–9474, 2020.

[23] Wen Li, Ying Zhang, Yifang Sun, Wei Wang, Mingjie Li, Wenjie Zhang, and Xuemin Lin. Approximate nearest neighbor search on high dimensional data—experiments, analyses, and improvement. IEEE Transactions on Knowledge and Data Engineering, 32(8):1475–1488, 2019.

[24] Aashiq Muhamed, Pratiksha Thaker, Mona T Diab, and Virginia Smith. Cache me if you can: The case for retrieval augmentation in federated learning. In Privacy Regulation and Protection in Machine Learning

[25] Ollama. Ollama: Get up and running with large language models. GitHub repository, 2025. Accessed: February 8, 2025.

[26] Tolga Şakar and Hakan Emekci. Maximizing rag efficiency: A comparative analysis of rag methods. Natural Language Processing, 31(1):1–25, 2025.

[27] Aniruddha Salve, Saba Attar, Mahesh Deshmukh, Sayali Shivpuje, and Arnab Mitra Utsab. A collaborative multi-agent approach to retrieval-augmented generation across diverse data. arXiv preprint arXiv:2412.05838, 2024.

[28] Xiang Shi, Jiawei Liu, Yinpeng Liu, Qikai Cheng, and Wei Lu. Know where to go: Make llm a relevant, responsible, and trustworthy searchers. Decision Support Systems, 188:114354, 2025.

[29] Milad Shokouhi, Luo Si, et al. Federated search. Foundations and Trends® in Information Retrieval, 5(1):1–102, 2011.

[30] MyScale Team. Retrieval-qa-benchmark: A benchmark for evaluating retrieval-augmented qa systems. GitHub repository, 2024. Accessed: 2025-02-11.

[31] Shuai Wang, Ekaterina Khramtsova, Shengyao Zhuang, and Guido Zuccon. Feb4rag: Evaluating federated search in the context of retrieval augmented generation. In Proceedings of the 47th International ACM SIGIR Conference on Research and Development in Information Retrieval, pages 763–773, 2024.

[32] Shuai Wang, Shengyao Zhuang, Bevan Koopman, and Guido Zuccon. Resllm: Large language models are strong resource selectors for federated search. arXiv preprint arXiv:2401.17645, 2024.

[33] Tianfeng Wu, Xiaofeng Liu, and Shoubin Dong. Ltrrs: a learning to rank based algorithm for resource selection in distributed information retrieval. In Information Retrieval: 25th China Conference, CCIR 2019, Fuzhou, China, September 20–22, 2019, Proceedings 25, pages 52–63. Springer, 2019.

[34] Guangzhi Xiong, Qiao Jin, Zhiyong Lu, and Aidong Zhang. Benchmarking retrieval-augmented generation for medicine. In Lun-Wei Ku, Andre Martins, and Vivek Srikumar, editors, Findings of the Association for Computational Linguistics ACL 2024, pages 6233–6251, Bangkok, Thailand and virtual meeting, August 2024. Association for Computational Linguistics.

[35] Dongfang Zhao. Frag: Toward federated vector database management for collaborative and secure retrieval-augmented generation. arXiv preprint arXiv:2410.13272, 2024.

[36] Yanqi Zhou, Tao Lei, Hanxiao Liu, Nan Du, Yanping Huang, Vincent Zhao, Andrew M Dai, Quoc V Le, James Laudon, et al. Mixture-of-experts with expert choice routing. Advances in Neural Information Processing Systems, 35:7103–7114, 2022.
