AdaComp: Extractive Context Compression with Adaptive Predictor for Retrieval-Augmented Large Language Models

Hainan Zhang; Hongwei Zheng; Liang Pang; Qianchi Zhang; Zhiming Zheng

arXiv: 2409.01579 · v2 · submitted 2024-09-03 · 💻 cs.CL · cs.AI

Qianchi Zhang, Hainan Zhang, Liang Pang, Hongwei Zheng, Zhiming Zheng

Pith reviewed 2026-05-23 21:04 UTC · model grok-4.3


keywords: context compression · retrieval-augmented generation · adaptive compression · RAG efficiency · extractive compression · compression rate predictor · query complexity · retrieval quality


The pith

A predictor trained on minimal sufficient document sets lets RAG systems keep only the documents each query actually needs.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Retrieved documents for retrieval-augmented generation often contain noise that raises both inference cost and error rates. The paper argues that the smallest number of top-ranked documents required to answer a query depends on query complexity and retrieval quality, so a fixed compression rate is inefficient. It first labels, for training queries, the fewest top documents that still let the model produce a correct answer. These labels form triplets with the query and documents that train a predictor to output the right compression rate at test time. Experiments on three QA datasets and one conversational multi-document QA dataset show the resulting shorter contexts reduce inference cost while answer accuracy stays nearly the same as the uncompressed baseline.

Core claim

AdaComp first annotates, as the compression rate, the minimum number of top-ranked documents the RAG system needs to answer the current query. It then constructs triplets of the query, the retrieved documents, and that compression rate, and uses them to train a compression-rate predictor that adaptively determines the compression rate from both query complexity and retrieval quality.
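The annotation step can be pictured as a greedy search over prefixes of the ranked list. The sketch below illustrates the idea under stated assumptions and is not the authors' procedure. `answer_fn` (the RAG generator) and `is_correct` (an answer checker, for example exact match against the gold answer) are hypothetical callables. Starting the search at k = 1 and returning `None` when even the full list fails are conventions chosen here.

```python
from typing import Callable, Optional, Sequence


def annotate_min_k(
    query: str,
    ranked_docs: Sequence[str],
    answer_fn: Callable[[str, Sequence[str]], str],
    is_correct: Callable[[str], bool],
) -> Optional[int]:
    """Return the smallest k whose top-k documents let the generator answer correctly.

    Tries k = 1, 2, ..., len(ranked_docs) in rank order and stops at the first
    success. Returns None if no prefix works (a convention assumed here).
    """
    for k in range(1, len(ranked_docs) + 1):
        prediction = answer_fn(query, ranked_docs[:k])
        if is_correct(prediction):
            return k
    return None


def toy_answer(query: str, docs: Sequence[str]) -> str:
    """Hypothetical stand-in generator: 'knows' the answer only if a passage mentions Paris."""
    return "Paris" if any("Paris" in d for d in docs) else "unknown"


docs = ["Berlin is in Germany.", "Paris is the capital of France.", "Rome is in Italy."]
print(annotate_min_k("What is the capital of France?", docs, toy_answer, lambda p: p == "Paris"))  # 2
```

Each step adds one document and reruns the generator, so labeling a query costs up to n generator calls for an n-document list. That cost is paid once per training query, not at inference time.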

What carries the argument

A compression-rate predictor trained on triplets of query, retrieved documents, and the minimum number of those documents needed to answer correctly.
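Once labels exist, each query becomes one (query, documents, compression rate) triplet. The following sketch shows one hypothetical way to store triplets and serialize them for a k-way classifier. The `Triplet` type, the text template, and the label mapping are illustrative choices, because the page does not specify the predictor's input format or architecture. Queries with no sufficient prefix (label `None`) are dropped here, which is only one possible policy.

```python
from dataclasses import dataclass
from typing import List, Optional, Sequence, Tuple


@dataclass(frozen=True)
class Triplet:
    """One training example: a query, its ranked documents, and the annotated min-k."""
    query: str
    docs: Tuple[str, ...]
    compression_rate: int  # minimum number of top-ranked documents needed (k >= 1)


def build_triplets(
    queries: Sequence[str],
    doc_lists: Sequence[Sequence[str]],
    min_k_labels: Sequence[Optional[int]],
) -> List[Triplet]:
    """Pair each query with its documents and label, skipping unlabeled queries."""
    triplets = []
    for query, docs, k in zip(queries, doc_lists, min_k_labels):
        if k is None:  # hypothetical policy: drop queries no prefix could answer
            continue
        triplets.append(Triplet(query, tuple(docs), k))
    return triplets


def to_classifier_example(t: Triplet, max_docs: int) -> Tuple[str, int]:
    """Serialize a triplet as (input_text, class_id) for a max_docs-way classifier.

    Class ids 0..max_docs-1 stand for k = 1..max_docs; labels above max_docs are clipped.
    """
    lines = [f"Query: {t.query}"]
    for rank, doc in enumerate(t.docs[:max_docs], start=1):
        lines.append(f"[Doc {rank}] {doc}")
    class_id = min(t.compression_rate, max_docs) - 1
    return "\n".join(lines), class_id
```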

If this is right

The number of kept documents varies with each query instead of using one fixed rate for every input.
Fewer tokens reach the language model, lowering inference time and memory use.
Answer quality on standard QA tasks remains nearly identical to the full-context baseline.
The same procedure applies to both standard QA benchmarks and conversational multi-document QA.
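Applying the predictor at inference time amounts to truncating the ranked list. In this sketch `predict_k` is a hypothetical stand-in for the trained predictor. Whitespace-separated words serve as a crude proxy for model tokens; a real measurement would use the generator's own tokenizer.

```python
from typing import Callable, List, Sequence


def compress_context(
    query: str,
    docs: Sequence[str],
    predict_k: Callable[[str, Sequence[str]], int],
) -> List[str]:
    """Keep only the top-k ranked documents, where k comes from the predictor.

    The prediction is clamped to [0, len(docs)]; k = 0 means no documents are passed.
    """
    k = predict_k(query, docs)
    k = max(0, min(k, len(docs)))
    return list(docs[:k])


def token_savings(full_docs: Sequence[str], kept_docs: Sequence[str]) -> float:
    """Fraction of context tokens removed, using whitespace splitting as a token proxy."""
    full = sum(len(d.split()) for d in full_docs)
    kept = sum(len(d.split()) for d in kept_docs)
    return 0.0 if full == 0 else 1.0 - kept / full
```

The kept set is always a prefix of the retriever's ranking, so the predictor only has to emit one integer per query. This is part of what keeps this style of extractive compression inexpensive.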

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The predictor could be retrained on different retrievers or language models without changing the overall approach.
If labeling the minimal k during training proves costly, cheaper proxies such as attention patterns inside the model might substitute for it.
Adaptive selection might let systems safely increase the initial retrieval list size without a matching rise in cost.

Load-bearing premise

The smallest number of documents sufficient to answer a query can be reliably identified in advance for training examples so that a predictor trained on those labels will choose an adequate number for new queries.

What would settle it

Apply the trained predictor to a new set of queries and measure whether answer accuracy falls more than a few percentage points below the accuracy obtained when all retrieved documents are kept.
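The check above can be written down directly, given per-query correctness flags for the full-context and compressed runs on the same queries. This is a minimal sketch. The tolerance is a parameter because the text does not fix a value, and the paired bootstrap interval is a standard technique chosen here for error bars, not something taken from the paper.

```python
import random
from typing import Sequence, Tuple


def accuracy(correct: Sequence[bool]) -> float:
    """Fraction of queries answered correctly."""
    return sum(correct) / len(correct) if correct else 0.0


def accuracy_drop_pp(full: Sequence[bool], compressed: Sequence[bool]) -> float:
    """Accuracy with all documents minus accuracy with compressed context, in points."""
    if len(full) != len(compressed):
        raise ValueError("both settings must be scored on the same queries")
    return 100.0 * (accuracy(full) - accuracy(compressed))


def within_tolerance(full: Sequence[bool], compressed: Sequence[bool], tolerance_pp: float) -> bool:
    """True if compression costs at most `tolerance_pp` percentage points."""
    return accuracy_drop_pp(full, compressed) <= tolerance_pp


def bootstrap_drop_ci(
    full: Sequence[bool],
    compressed: Sequence[bool],
    n_resamples: int = 1000,
    alpha: float = 0.05,
    seed: int = 0,
) -> Tuple[float, float]:
    """Percentile CI for the drop, resampling queries jointly (paired bootstrap)."""
    if len(full) != len(compressed) or not full:
        raise ValueError("need two equal-length, non-empty score lists")
    rng = random.Random(seed)
    n = len(full)
    drops = []
    for _ in range(n_resamples):
        idx = [rng.randrange(n) for _ in range(n)]
        # Same resampled queries for both settings keeps the comparison paired.
        drops.append(100.0 * sum(full[i] - compressed[i] for i in idx) / n)
    drops.sort()
    lower = drops[int(alpha / 2 * n_resamples)]
    upper = drops[int((1 - alpha / 2) * n_resamples) - 1]
    return lower, upper
```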

Figures

Figures reproduced from arXiv: 2409.01579 by Hainan Zhang, Hongwei Zheng, Liang Pang, Qianchi Zhang, Zhiming Zheng.

**Figure 2.** Overall architecture of AdaComp, which includes a retriever module …

**Figure 3.** An illustration of how the number of documents affects final RAG performance, generally, in the beginning, as the …

**Figure 5.** Case Study: answers generated using without …

Original abstract

Retrieved documents containing noise will hinder RAG from detecting answer clues and make the inference process slow and expensive. Therefore, context compression is necessary to enhance its accuracy and efficiency. Existing context compression methods use extractive or generative models to retain the most query-relevant sentences or apply the information bottleneck theory to preserve sufficient information. However, these methods may face issues such as over-compression or high computational costs. We observe that the retriever often ranks relevant documents at the top, but the exact number of documents needed to answer the query is uncertain due to the impact of query complexity and retrieval quality: complex queries like multi-hop questions may require retaining more documents than simpler queries, and a low-quality retrieval may need to rely on more documents to generate accurate outputs. Therefore, determining the minimum number of required documents (compression rate) is still a challenge for RAG. In this paper, we introduce AdaComp, a low-cost extractive context compression method that adaptively determines the compression rate based on both query complexity and retrieval quality. Specifically, we first annotate the minimum top-k documents necessary for the RAG system to answer the current query as the compression rate and then construct triplets of the query, retrieved documents, and its compression rate. Then, we use this triplet dataset to train a compression-rate predictor. Experiments on three QA datasets and one conversational Multi-doc QA dataset show that AdaComp significantly reduces inference costs while maintaining performance nearly identical to uncompressed models, achieving a balance between efficiency and performance.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

AdaComp trains a predictor on per-query min-k labels to adapt RAG compression, but the labeling step lacks detail and the performance claims rest on thin evidence.


The core idea is to train a small predictor that outputs the smallest number of top documents a RAG system needs for a given query, using labels that mark the minimum k that still allows a correct answer. This is distinct from the fixed extractive compressors and information-bottleneck approaches mentioned in the abstract. The paper correctly notes that query complexity and retrieval quality make a single compression rate suboptimal, and building triplets and then training the predictor is a straightforward way to make compression adaptive. Experiments on three QA sets plus a conversational multi-doc set reportedly keep answer quality close to the uncompressed baseline while cutting inference cost. That practical angle is the main value.

The soft spot is the annotation procedure itself. The abstract says only that min-k is annotated as the smallest number allowing the RAG system to answer; there is no description of how that label is produced, whether it is stable across seeds or models, or how well it transfers. If the labels are noisy or query-specific in ways that do not generalize, the predictor will either waste tokens or drop accuracy on new queries. The abstract also gives no numbers, baselines, or error bars, so the central claim cannot be checked from the summary alone. The setup looks non-circular: the predictor is trained on external labels rather than fitted quantities.

This paper is aimed at people shipping RAG systems who want a tunable efficiency knob. A reader already working on context compression might pick up the adaptive-labeling trick, but the labeling step needs clearer validation before the efficiency claim can be trusted. I would send it for peer review so the annotation method and experimental details can be examined directly.

Referee Report

2 major / 1 minor

Summary. The paper introduces AdaComp, a low-cost extractive context compression technique for RAG. It first annotates the minimum top-k documents required for a query to be answered correctly, forms (query, retrieved docs, compression-rate) triplets, trains a predictor on these triplets to output an adaptive compression rate based on query complexity and retrieval quality, and then applies the predictor at inference time. Experiments on three QA datasets plus one conversational multi-document QA dataset are reported to show substantial inference-cost reduction while preserving performance nearly identical to the uncompressed baseline.

Significance. If the central experimental claim holds, AdaComp would supply a practical, trainable mechanism for variable-rate context compression that avoids both the over-compression risk of fixed extractive methods and the high cost of generative compressors. The explicit modeling of both query difficulty and retrieval quality as inputs to the predictor is a clear conceptual advance over static top-k or information-bottleneck baselines.

major comments (2)

[Abstract and §3 (Method)] The annotation procedure that produces the ground-truth minimum top-k labels is described only as “annotate the minimum top-k documents necessary for the RAG system to answer.” No concrete protocol (incremental LLM prompting, oracle search, human judgment, stability checks across seeds, or inter-annotator agreement) is supplied. Because the predictor is trained directly on these labels, any noise or query-specific bias in the annotation step undermines the claim that the learned predictor will generalize to held-out queries while preserving answer quality.
[§4 (Experiments)] The abstract states that AdaComp “significantly reduces inference costs while maintaining performance nearly identical to uncompressed models,” yet supplies neither concrete metrics (exact accuracy deltas, latency or token counts, baselines, error bars, or statistical tests) nor training details for the predictor (architecture, loss, hyperparameters, dataset sizes). These omissions make the central efficiency–performance trade-off claim impossible to verify from the provided text.

minor comments (1)

[Abstract] The phrase “nearly identical to uncompressed models” is vague; a quantitative bound (e.g., <1% drop on each dataset) would strengthen the claim even in the abstract.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. The comments highlight important areas for improving clarity and verifiability. We address each major comment below and will revise the manuscript accordingly.


Referee: [Abstract and §3 (Method)] The annotation procedure that produces the ground-truth minimum top-k labels is described only as “annotate the minimum top-k documents necessary for the RAG system to answer.” No concrete protocol (incremental LLM prompting, oracle search, human judgment, stability checks across seeds, or inter-annotator agreement) is supplied. Because the predictor is trained directly on these labels, any noise or query-specific bias in the annotation step undermines the claim that the learned predictor will generalize to held-out queries while preserving answer quality.

Authors: We agree that the current description of the annotation procedure is insufficiently detailed for reproducibility. In the revised version we will expand §3 with a concrete protocol: an automated incremental prompting procedure that begins with the top-ranked document and successively adds the next-highest document until the RAG system produces a correct answer, repeated across three random seeds with a stability threshold. We will also add discussion of potential label noise and a pseudocode listing. revision: yes
Referee: [§4 (Experiments)] The abstract states that AdaComp “significantly reduces inference costs while maintaining performance nearly identical to uncompressed models,” yet supplies neither concrete metrics (exact accuracy deltas, latency or token counts, baselines, error bars, or statistical tests) nor training details for the predictor (architecture, loss, hyperparameters, dataset sizes). These omissions make the central efficiency–performance trade-off claim impossible to verify from the provided text.

Authors: We acknowledge that the abstract and introductory sections do not contain the numerical results needed for immediate verification. The full experimental section reports these quantities (accuracy deltas, token counts, etc.), but to make the central claim verifiable without reading the entire paper we will insert a concise metrics summary into the abstract and add an explicit subsection in §4 listing the predictor architecture, loss function, hyperparameters, and training set sizes. revision: yes
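The annotation protocol promised in the first response (incremental prompting repeated over three seeds with a stability threshold) needs a rule for combining the per-seed labels. The sketch below is hypothetical. The 2/3 agreement threshold, breaking ties toward the larger k, and discarding unstable queries are assumptions made here, not details stated in the rebuttal.

```python
from collections import Counter
from typing import Optional, Sequence


def stable_min_k(
    labels_per_seed: Sequence[Optional[int]],
    min_agreement: float = 2 / 3,
) -> Optional[int]:
    """Combine per-seed min-k labels into one label, or None if they disagree too much.

    `None` in the input marks a seed where no prefix of the ranked list was sufficient.
    """
    if not labels_per_seed:
        return None
    counts = Counter(labels_per_seed)
    # Pick the most frequent label; on ties prefer the larger k (keeping more
    # documents is the safer error). None ranks below every integer label.
    label, count = max(
        counts.items(),
        key=lambda item: (item[1], -1 if item[0] is None else item[0]),
    )
    if label is None or count / len(labels_per_seed) < min_agreement:
        return None
    return label
```

With three seeds, labels 3, 3, 4 yield 3. Labels 2, 3, 4 are treated as unstable and return `None`.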

Circularity Check

0 steps flagged

No circularity; method uses external annotation to train independent predictor


The paper's core procedure annotates minimum top-k externally, builds triplets, and trains a separate predictor; reported performance is measured on held-out datasets rather than any fitted quantity or self-referential definition. No equations, self-citations, or renamings appear that would force the compression-rate output to equal its training inputs by construction. The derivation chain is therefore self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

The method rests on the existence of a reliable annotation procedure for minimum top-k and on standard supervised-learning assumptions that the predictor will generalize; no free parameters, invented entities, or non-standard axioms are visible in the abstract.

pith-pipeline@v0.9.0 · 5810 in / 1042 out tokens · 18562 ms · 2026-05-23T21:04:51.667110+00:00 · methodology


