Query pipeline optimization for cancer patient question answering systems
Pith reviewed 2026-05-23 06:28 UTC · model grok-4.3
The pith
A three-aspect optimization of the RAG query pipeline improves Claude-3-haiku's answer accuracy on cancer patient questions by 5.24 percent over chain-of-thought prompting.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that three proposed optimizations raise the answer accuracy of Claude-3-haiku by 5.24 percent over chain-of-thought prompting and roughly 3 percent over a naive RAG setup when tested on a custom dataset of cancer-related inquiries. The three optimizations are: a comparative analysis of NCBI resources combined with Hybrid Semantic Real-time Document Retrieval (HSRDR), identification of the best dense retriever and reranker pairs, and Semantic Enhanced Overlap Segmentation (SEOS).
What carries the argument
The three-aspect optimization approach for the RAG query pipeline, consisting of document retrieval via HSRDR, passage retrieval via retriever-reranker pairings, and semantic representation via SEOS.
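One way to picture how the three stages could fit together is the minimal sketch below. It is not the paper's implementation: the class name `QueryPipeline`, its fields, and the choice to segment documents before ranking passages are all assumptions made for illustration, and each stage is a caller-supplied function rather than HSRDR, a specific retriever-reranker pair, or SEOS.

```python
from dataclasses import dataclass
from typing import Callable, List


@dataclass
class QueryPipeline:
    """Hypothetical three-stage query pipeline; every stage is pluggable."""

    # Stage 1 (document retrieval): question -> candidate documents.
    retrieve_documents: Callable[[str], List[str]]
    # Stage 3 (semantic representation): document -> passages.
    segment: Callable[[str], List[str]]
    # Stage 2 (passage retrieval): question + passages -> ranked passages.
    rank_passages: Callable[[str, List[str]], List[str]]
    # Number of passages handed to the LLM as grounding context.
    top_k: int = 3

    def build_context(self, question: str) -> List[str]:
        """Return the top_k passages used to ground the answer."""
        docs = self.retrieve_documents(question)
        # Assumption: documents are segmented before passages are ranked.
        passages = [p for doc in docs for p in self.segment(doc)]
        return self.rank_passages(question, passages)[: self.top_k]
```

Keeping each stage behind a plain callable makes it possible to swap one stage at a time while holding the others fixed, which is the kind of comparison the three-aspect framing implies.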
If this is right
- Domain-specific tuning of each RAG pipeline stage is required to achieve the reported gains in CPQA systems.
- Public biomedical databases such as PubMed become effective grounding sources once paired with the described retrieval methods.
- The overall framework supports construction of more accurate CPQA systems than either prompting alone or untuned RAG.
- The same three-aspect structure can be reused as a template for other biomedical RAG applications.
Where Pith is reading between the lines
- If the optimizations hold on other medical topics, they could reduce the need for larger models in specialized QA tasks.
- Testing the pipeline on questions drawn directly from clinical records rather than a custom set would clarify real-world transfer.
- The accuracy delta might compound if the optimized retrieval is combined with model fine-tuning on biomedical text.
Load-bearing premise
The custom dataset is representative of real cancer patient questions and the measured accuracy gains are produced by the three optimizations rather than by how the dataset was built or evaluated.
What would settle it
Running the same optimized pipeline and baselines on an independently gathered collection of actual cancer patient questions and observing no accuracy improvement would falsify the claim.
Original abstract
Retrieval-augmented generation (RAG) mitigates hallucination in Large Language Models (LLMs) by using query pipelines to retrieve relevant external information and grounding responses in retrieved knowledge. However, query pipeline optimization for cancer patient question-answering (CPQA) systems requires separately optimizing multiple components with domain-specific considerations. We propose a novel three-aspect optimization approach for the RAG query pipeline in CPQA systems, utilizing public biomedical databases like PubMed and PubMed Central. Our optimization includes: (1) document retrieval, utilizing a comparative analysis of NCBI resources and introducing Hybrid Semantic Real-time Document Retrieval (HSRDR); (2) passage retrieval, identifying optimal pairings of dense retrievers and rerankers; and (3) semantic representation, introducing Semantic Enhanced Overlap Segmentation (SEOS) for improved contextual understanding. On a custom-developed dataset tailored for cancer-related inquiries, our optimized RAG approach improved the answer accuracy of Claude-3-haiku by 5.24% over chain-of-thought prompting and about 3% over a naive RAG setup. This study highlights the importance of domain-specific query optimization in realizing the full potential of RAG and provides a robust framework for building more accurate and reliable CPQA systems, advancing the development of RAG-based biomedical systems.
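The abstract does not spell out how SEOS works, so the sketch below shows only the generic idea of overlap segmentation that its name suggests: splitting a document into sentence windows that share sentences with their neighbours. The function name `overlapping_segments`, the regex-based sentence splitter, and the `window` and `overlap` parameters are assumptions for illustration, not the paper's method.

```python
import re
from typing import List


def overlapping_segments(text: str, window: int = 3, overlap: int = 1) -> List[str]:
    """Split text into windows of `window` sentences sharing `overlap` sentences."""
    if not 0 <= overlap < window:
        raise ValueError("overlap must satisfy 0 <= overlap < window")
    # Naive sentence split on ., ! or ? followed by whitespace (an assumption;
    # biomedical text would likely need a more careful splitter).
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    step = window - overlap
    segments = []
    for start in range(0, len(sentences), step):
        segments.append(" ".join(sentences[start:start + window]))
        # Stop once a window has reached the final sentence.
        if start + window >= len(sentences):
            break
    return segments
```

The overlap means a fact that straddles a segment boundary still appears intact in at least one passage, at the cost of some duplicated text in the index.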
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript proposes a three-aspect optimization framework for RAG query pipelines in cancer patient question-answering systems. It introduces Hybrid Semantic Real-time Document Retrieval (HSRDR) using NCBI resources, identifies optimal dense retriever-reranker pairings for passage retrieval, and presents Semantic Enhanced Overlap Segmentation (SEOS) for semantic representation. On a custom-developed dataset of cancer-related inquiries, the optimized pipeline is reported to improve answer accuracy of Claude-3-haiku by 5.24% relative to chain-of-thought prompting and approximately 3% relative to a naive RAG baseline.
Significance. If the reported gains prove robust and causally attributable to the three proposed components, the work would supply a practical, domain-specific template for RAG optimization in biomedical QA. The emphasis on public biomedical corpora (PubMed, PubMed Central) and the explicit separation of document-level, passage-level, and segmentation-level choices are potentially useful for practitioners. At present, however, the absence of dataset statistics, metric definitions, and component ablations prevents any such assessment.
major comments (3)
- [Abstract] Abstract: the headline claim of a 5.24% accuracy lift is stated without any accompanying information on dataset size, question provenance, ground-truth construction, inter-annotator agreement, or the precise definition of “answer accuracy” (exact match, LLM-as-judge, human rating, etc.). These omissions make it impossible to determine whether the observed deltas are driven by the three optimizations or by dataset-construction or evaluation artifacts.
- [Results] Results / Evaluation section: the three proposed components (HSRDR, retriever-reranker pairings, SEOS) are never ablated against one another on a fixed test set. Consequently it is impossible to isolate which component, if any, accounts for the reported improvement over the naive RAG baseline.
- [Methods] Methods: no statistical tests, confidence intervals, or controls for prompt leakage or dataset leakage are described, leaving the 3% and 5.24% deltas without evidence of statistical reliability or causal attribution.
minor comments (2)
- [Abstract] The manuscript should supply a clear, reproducible definition of the accuracy metric and release (or at minimum describe in detail) the custom dataset so that the empirical claims can be verified.
- [Methods] The three optimization stages (HSRDR, retriever-reranker pairing, SEOS) are introduced without an accompanying diagram or pseudocode that would clarify their integration into a single query pipeline.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript to improve transparency, add missing analyses, and strengthen the evaluation section.
Point-by-point responses
- Referee: [Abstract] Abstract: the headline claim of a 5.24% accuracy lift is stated without any accompanying information on dataset size, question provenance, ground-truth construction, inter-annotator agreement, or the precise definition of “answer accuracy” (exact match, LLM-as-judge, human rating, etc.). These omissions make it impossible to determine whether the observed deltas are driven by the three optimizations or by dataset-construction or evaluation artifacts.
  Authors: We agree that the abstract would benefit from a concise summary of the evaluation setup. Detailed information on the custom dataset (size, provenance from cancer patient inquiries, expert-constructed ground truth, and answer accuracy defined via LLM-as-judge with human verification) appears in the Methods and Results sections. We will revise the abstract to include a brief overview of these elements, dataset statistics, and the accuracy metric. Inter-annotator agreement is not applicable, as ground truth was produced by domain experts using a single-annotator protocol for consistency in this specialized biomedical domain.
  Revision: yes
- Referee: [Results] Results / Evaluation section: the three proposed components (HSRDR, retriever-reranker pairings, SEOS) are never ablated against one another on a fixed test set. Consequently it is impossible to isolate which component, if any, accounts for the reported improvement over the naive RAG baseline.
  Authors: The referee is correct that component-wise ablations on a fixed test set are absent. We will add these ablations in a revised Results section, reporting performance when enabling/disabling each aspect (HSRDR, optimal retriever-reranker pairs, and SEOS) independently while holding the test set constant. This will clarify the contribution of each optimization to the observed gains over the naive RAG baseline.
  Revision: yes
- Referee: [Methods] Methods: no statistical tests, confidence intervals, or controls for prompt leakage or dataset leakage are described, leaving the 3% and 5.24% deltas without evidence of statistical reliability or causal attribution.
  Authors: We acknowledge the omission of statistical validation and leakage controls. In the revision we will add bootstrap-derived 95% confidence intervals around the accuracy deltas and report results of paired statistical tests (e.g., McNemar’s test) to establish reliability. We will also describe the leakage safeguards already used, including temporally disjoint test questions and explicit checks that test queries do not overlap with the retrieval corpus or model training data.
  Revision: yes
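The analyses named in these responses (component ablations on a fixed test set, bootstrap confidence intervals, and McNemar's test) can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the authors' evaluation code: the component labels, the `evaluate` callable, the "naive_rag" label for the all-off configuration, and the 0/1 per-question correctness lists are all hypothetical.

```python
import math
import random
from itertools import product

# Hypothetical labels for the three pipeline aspects.
COMPONENTS = ("hsrdr", "reranker_pairing", "seos")


def ablation_configs(components=COMPONENTS):
    """Yield every on/off combination of the components as a dict."""
    for flags in product((False, True), repeat=len(components)):
        yield dict(zip(components, flags))


def run_ablation(evaluate, test_set):
    """Score each configuration on the same test set.

    `evaluate(config, test_set)` is a caller-supplied (hypothetical) function
    returning accuracy; the all-off configuration is labelled "naive_rag".
    """
    results = {}
    for cfg in ablation_configs():
        label = "+".join(name for name, on in cfg.items() if on) or "naive_rag"
        results[label] = evaluate(cfg, test_set)
    return results


def paired_bootstrap_ci(base, opt, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap CI for accuracy(opt) - accuracy(base).

    `base` and `opt` are 0/1 correctness lists over the same questions;
    questions are resampled with replacement, keeping pairs together.
    """
    if len(base) != len(opt) or not base:
        raise ValueError("need two non-empty lists of equal length")
    rng = random.Random(seed)
    n = len(base)
    deltas = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        deltas.append(sum(opt[i] - base[i] for i in idx) / n)
    deltas.sort()
    lower = deltas[int((alpha / 2) * n_resamples)]
    upper = deltas[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper


def mcnemar_exact(base, opt):
    """Two-sided exact McNemar p-value computed from the discordant pairs."""
    b = sum(1 for x, y in zip(base, opt) if x == 1 and y == 0)  # only baseline correct
    c = sum(1 for x, y in zip(base, opt) if x == 0 and y == 1)  # only optimized correct
    n = b + c
    if n == 0:
        return 1.0
    k = min(b, c)
    # Binomial(n, 0.5) tail probability, doubled for a two-sided test.
    tail = sum(math.comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

Both statistics operate on per-question outcomes from the same test set, which is why the ablation grid and the tests are paired here: a confidence interval that excludes zero together with a small McNemar p-value would support the claim that a delta is not noise, although neither addresses whether the custom dataset is representative.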
Circularity Check
No circularity: purely empirical performance reporting with no derivations or self-referential reductions
full rationale
The paper proposes three RAG pipeline components (HSRDR, retriever-reranker pairings, SEOS) and reports measured accuracy gains on a custom dataset. No equations, derivations, fitted parameters renamed as predictions, or load-bearing self-citations appear in the provided text. All claims are direct empirical outcomes rather than tautological reductions of inputs; the work is self-contained against external benchmarks and receives the default non-circularity finding.
Forward citations
Cited by 1 Pith paper
- Learning Where to Embed: Noise-Aware Positional Embedding for Query Retrieval in Small-Object Detection. HELP uses heatmap-guided positional embeddings and a gradient mask to suppress background noise in queries, enabling efficient small-object detection with fewer decoder layers and parameters.