
arxiv: 2606.31156 · v1 · pith:QMBXLJ6Gnew · submitted 2026-06-30 · 💻 cs.IR · cs.AI

One Retrieval to Cover Them All: Co-occurrence-Aware Knowledge Base Reorganization for Session-Level RAG

Pith reviewed 2026-07-01 04:38 UTC · model grok-4.3

classification 💻 cs.IR cs.AI
keywords RAG · knowledge base reorganization · co-occurrence clustering · session-level retrieval · enterprise support · information retrieval · session coverage · retrieval augmentation

The pith

Reorganizing the knowledge base with co-occurrence clusters raises single-retrieval session coverage from 41% to 58%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

Standard RAG retrieves documents for one query at a time, yet users work through sessions of related questions that touch distant parts of the knowledge base. Analysis of enterprise support sessions shows that a single retrieval covers only 41% of a session's information need. The paper proposes reorganizing the knowledge base offline by clustering articles that co-occur in sessions, then retrieving entire cluster neighborhoods at query time. This raises single-retrieval coverage to 58%, cuts the retrieval calls needed to reach 70% coverage by 34%, and reduces the knowledge base to 20% of its original size. Readers should care because it shifts evaluation from single-query performance to session-level utility, which better matches how enterprise users actually interact with these systems.
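To make the offline step concrete, here is a minimal sketch of grouping articles by session co-occurrence. It is not the paper's method: the figure captions mention Word2Vec training on co-occurrence sequences, whereas this stand-in simply counts pairwise co-occurrences and merges articles whose count reaches a threshold. The names `cooccurrence_counts`, `cluster_by_cooccurrence`, and `min_count`, and the toy article IDs, are all hypothetical.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(sessions):
    """Count how many sessions contain each unordered pair of articles."""
    counts = Counter()
    for session in sessions:
        for a, b in combinations(sorted(set(session)), 2):
            counts[(a, b)] += 1
    return counts


def cluster_by_cooccurrence(sessions, min_count=2):
    """Merge articles that co-occur in at least `min_count` sessions.

    Assumption: a simple union-find stand-in for the paper's clustering,
    which is not specified in the text reproduced here.
    """
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path compression
            x = parent[x]
        return x

    # Register every article so that singletons get their own cluster.
    for session in sessions:
        for doc in session:
            find(doc)

    for (a, b), n in cooccurrence_counts(sessions).items():
        if n >= min_count:
            parent[find(a)] = find(b)

    clusters = {}
    for doc in list(parent):
        clusters.setdefault(find(doc), set()).add(doc)
    return list(clusters.values())


# Toy session logs with hypothetical article IDs.
sessions = [
    ["domains", "templates", "payments"],
    ["domains", "payments"],
    ["templates", "payments"],
    ["seo", "analytics"],
]
clusters = cluster_by_cooccurrence(sessions, min_count=2)
```

In this toy log, the three site-setup articles end up in one cluster even though nothing about their text was compared, while the two articles seen together only once stay separate.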

Core claim

A single retrieval over a standard knowledge base covers only 41% of a user's session-level information need. Reorganizing the KB offline using co-occurrence-aware clustering and expanding retrieval candidates through cluster neighborhoods at query time raises single-query session coverage to 58%, reduces the retrieval calls needed to reach 70% coverage by 34%, and compresses the KB to 20% of its original size. These results hold consistently across four embedding models and six functional domains on the WixQA dataset.
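The coverage figures above depend on how session coverage is computed. The text reproduced here does not give the paper's exact definition, so the sketch below assumes the simplest reading: the fraction of the articles a session needs that appear in the result of one retrieval. The function names and article IDs are hypothetical.

```python
from typing import Iterable, Set


def session_coverage(retrieved: Iterable[str], session_articles: Set[str]) -> float:
    """Fraction of a session's needed articles found in one retrieval result.

    Assumption: coverage is set overlap divided by session size; the paper's
    exact definition may differ.
    """
    if not session_articles:
        return 0.0
    hits = session_articles & set(retrieved)
    return len(hits) / len(session_articles)


def mean_session_coverage(pairs) -> float:
    """Average coverage over (retrieved, session_articles) pairs."""
    scores = [session_coverage(r, s) for r, s in pairs]
    return sum(scores) / len(scores) if scores else 0.0


# Toy example: one session that needs four articles.
session = {"domains", "templates", "payments", "seo"}
vanilla = ["domains", "dns", "ssl"]                      # similarity-only result
expanded = ["domains", "dns", "templates", "payments"]   # with cluster neighbors

print(session_coverage(vanilla, session))   # 0.25
print(session_coverage(expanded, session))  # 0.75
```

Under this reading, a result can score well on single-query relevance and still cover little of the session, which is the gap the paper measures.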

What carries the argument

Co-occurrence-aware clustering, which identifies groups of articles that frequently appear together in user sessions so retrieval can include neighboring documents from the same cluster.
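A rough sketch of the query-time expansion, under stated assumptions: seed documents are chosen by cosine similarity, every member of each seed's cluster is added as a candidate, and the union is re-ranked by the same similarity score. The paper's actual re-ranking rule is not given in the text reproduced here, and `retrieve_with_expansion`, `cluster_of`, `seed_k`, and `final_k` are hypothetical names. The defaults of 3 seeds and 8 returned documents mirror the numbers in the Figure 2 caption.

```python
import math
from collections import defaultdict


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def retrieve_with_expansion(query_vec, doc_vecs, cluster_of, seed_k=3, final_k=8):
    """Top-k retrieval followed by cluster-neighborhood expansion.

    doc_vecs:   {doc_id: embedding}
    cluster_of: {doc_id: cluster_id}, assumed to come from an offline step
    """
    scores = {d: cosine(query_vec, v) for d, v in doc_vecs.items()}

    # 1. Seed with the most similar documents.
    seeds = sorted(scores, key=scores.get, reverse=True)[:seed_k]

    # 2. Expand each seed to all members of its cluster.
    members = defaultdict(list)
    for doc, cluster in cluster_of.items():
        if doc in doc_vecs:
            members[cluster].append(doc)
    candidates = set(seeds)
    for seed in seeds:
        cluster = cluster_of.get(seed)
        if cluster is not None:
            candidates.update(members[cluster])

    # 3. Re-rank the union (assumption: by query similarity) and truncate.
    return sorted(candidates, key=scores.get, reverse=True)[:final_k]


# Toy 2-D embeddings and cluster assignments (all hypothetical).
doc_vecs = {
    "domains": (1.0, 0.0),
    "dns": (0.9, 0.1),
    "templates": (0.1, 0.9),
    "payments": (0.0, 1.0),
}
cluster_of = {"domains": 0, "templates": 0, "payments": 0, "dns": 1}
result = retrieve_with_expansion((1.0, 0.0), doc_vecs, cluster_of, seed_k=1, final_k=3)
print(result)  # ['domains', 'templates', 'payments']
```

In the toy data, "payments" has zero similarity to the query and would never be retrieved on similarity alone, yet it enters the result through the cluster of the seed document.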

If this is right

  • Session-level coverage becomes the recommended primary metric for evaluating enterprise RAG systems.
  • Knowledge bases can be compressed substantially while improving session coverage.
  • The number of retrieval operations per session decreases.
  • Results remain stable across the four embedding models tested.
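The claim about fewer retrieval operations can be illustrated with a small sketch that counts calls until a session reaches a target coverage. It assumes coverage accumulates as the union of articles returned across successive calls. The paper's exact accounting is not given in the text reproduced here, and `calls_to_reach` and the toy data are hypothetical.

```python
def calls_to_reach(retrievals, session_articles, target=0.70):
    """Number of retrieval calls until cumulative coverage reaches `target`.

    retrievals: list of result lists, one per call, in the order issued.
    Returns None if the target is never reached.
    """
    if not session_articles:
        return None
    covered = set()
    for n_calls, retrieved in enumerate(retrievals, start=1):
        covered |= session_articles & set(retrieved)
        if len(covered) / len(session_articles) >= target:
            return n_calls
    return None


# Toy session needing four articles (hypothetical IDs).
session = {"a", "b", "c", "d"}
vanilla_calls = calls_to_reach([["a"], ["b"], ["c"], ["d"]], session)
expanded_calls = calls_to_reach([["a", "b"], ["c", "x"]], session)
print(vanilla_calls, expanded_calls)  # 3 2
```

Averaging this count over all sessions, with and without expansion, would give a relative reduction comparable in kind to the 34% the paper reports.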

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • This method may lower overall latency in RAG applications by requiring fewer retrieval rounds.
  • It suggests that static knowledge bases could benefit from periodic reorganization based on usage logs.
  • The approach might generalize to non-enterprise settings with logged user sessions.

Load-bearing premise

Co-occurrence patterns from the observed enterprise sessions are representative enough that expanding to cluster neighborhoods adds relevant documents without introducing harmful noise.

What would settle it

Running the reorganization method on a different set of user sessions and measuring no gain in session coverage or a drop in final answer quality would show the claim does not hold.

Figures

Figures reproduced from arXiv: 2606.31156 by Cecilia Cheng, Chaya Vijayakumar, Shivam Ratnakar, Yixuan Zhu.

Figure 1: The session coverage gap. A user setting up a website needs articles about domains, templates, and payments, but these reside in distant regions of the semantic embedding space. Vanilla RAG retrieves only documents within the top-k similarity radius of the query (dashed circle), missing semantically distant but session-relevant articles. Our cluster expansion recovers these, raising session coverage from 4…

Figure 2: System overview. (a) Offline: Documents from the same user session (same color) are scattered in semantic embedding space but cluster together after Word2Vec training on co-occurrence sequences. (b) Online: Given a query, we retrieve the top-3 by similarity (gray dots), expand through their cluster neighborhoods (blue dots, +12 candidates), and re-rank the union to return the top-8 with 58% session coverag…

Figure 3: Session coverage vs. retrieval budget. The…

Figure 4: Retrieval calls to reach session coverage…

Figure 5: Semantic similarity vs. co-occurrence strength…
Original abstract

RAG systems retrieve documents optimized for answering one query at a time. Yet enterprise users arrive with sessions, that is, coherent episodes of related questions that span semantically distant parts of the knowledge base. We show that a single retrieval call over a standard knowledge base covers only 41% of a user's session-level information need. To close this gap, we reorganize the KB offline using co-occurrence-aware clustering and expand retrieval candidates through cluster neighborhoods at query time. On WixQA (6,221 enterprise support articles), our method raises single-query session coverage to 58% (+17% absolute; 95% CI: [14.1, 20.4]), reduces retrieval calls to 70% coverage by 34%, and compresses the KB to 20% of its original size, all consistently across four embedding models and six functional domains. We argue that session-level coverage, not single-query recall, should be the primary metric for enterprise RAG evaluation.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that a single retrieval over a standard KB covers only 41% of session-level information needs in enterprise RAG. It introduces offline co-occurrence-aware clustering to reorganize the KB and expands retrieval to cluster neighborhoods at query time. On the WixQA dataset of 6,221 articles, this raises single-query session coverage to 58% (+17% absolute, 95% CI [14.1, 20.4]), reduces the number of retrieval calls needed for 70% coverage by 34%, and compresses the KB to 20% of original size, with results consistent across four embedding models and six functional domains. The authors argue that session-level coverage should replace single-query recall as the primary RAG evaluation metric.

Significance. If the coverage gains translate to improved end-to-end performance without introducing noise, the work could meaningfully shift enterprise RAG evaluation practices toward session-aware metrics and yield efficiency gains via KB compression. The reported cross-model and cross-domain consistency strengthens the empirical case, but the absence of any downstream answer-quality evaluation limits the assessed significance.

major comments (1)
  1. [Experiments] Experiments (results on WixQA): the central claim that the reorganization improves practical RAG utility rests on coverage and call-reduction numbers alone; no experiments evaluate whether cluster-neighborhood expansion improves, preserves, or harms final answer correctness, relevance, or hallucination rates. This is load-bearing for the recommendation to adopt session coverage as the primary metric.
minor comments (1)
  1. [Abstract] Abstract: the 95% CI is reported for the coverage gain, which is a positive detail, but the paper should state the exact statistical procedure and whether the CI accounts for multiple embedding models.
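For context on the minor comment, one common way to obtain such an interval is a percentile bootstrap over per-session paired differences. The abstract does not say which procedure the authors used, so the sketch below is an illustration of that generic technique only. `bootstrap_ci` and its parameters are hypothetical names.

```python
import random


def bootstrap_ci(diffs, n_resamples=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the mean of `diffs`.

    diffs: per-session coverage differences (expanded minus vanilla).
    Assumption: sessions are resampled with replacement as independent units.
    """
    if not diffs:
        raise ValueError("diffs must be non-empty")
    rng = random.Random(seed)
    n = len(diffs)
    means = sorted(
        sum(rng.choices(diffs, k=n)) / n for _ in range(n_resamples)
    )
    lo = means[int((alpha / 2) * n_resamples)]
    hi = means[int((1 - alpha / 2) * n_resamples) - 1]
    return lo, hi


# Toy per-session gains (fabricated for illustration, not paper data).
toy_diffs = [0.0, 0.1, 0.2, 0.3, 0.4] * 10
low, high = bootstrap_ci(toy_diffs)
print(low, high)
```

Whether a single interval should pool sessions across embedding models or be reported per model is exactly the kind of detail the referee asks the authors to state.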

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the detailed review and constructive feedback. We address the major comment on experiments below.

  1. Referee: Experiments (results on WixQA): the central claim that the reorganization improves practical RAG utility rests on coverage and call-reduction numbers alone; no experiments evaluate whether cluster-neighborhood expansion improves, preserves, or harms final answer correctness, relevance, or hallucination rates. This is load-bearing for the recommendation to adopt session coverage as the primary metric.

    Authors: We agree that end-to-end evaluation of answer correctness, relevance, and hallucination rates would provide stronger evidence for practical utility. Our work focuses on retrieval coverage as a necessary precondition for answer quality in session-based RAG, since missing documents cannot be used by the generator. The co-occurrence clustering is designed to group semantically related content, which we expect to limit noise, but we did not measure downstream effects. In the revised manuscript we will add an explicit limitations section discussing this gap, its implications for the proposed metric, and planned future experiments with answer-quality metrics across multiple generators.

    Revision: yes

Circularity Check

0 steps flagged

No circularity; empirical measurements only

The paper presents a clustering-based KB reorganization method evaluated via direct coverage measurements on WixQA sessions. No equations, fitted parameters, or predictions are described that reduce by construction to their inputs. No load-bearing self-citations or smuggled ansatz appear in the provided text. Results are framed as empirical observations across embedding models and domains, so the argument does not depend on its own conclusions.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The abstract supplies no explicit parameters, lemmas, or new entities; the approach implicitly rests on the domain assumption that session co-occurrences reflect retrieval utility.

axioms (1)
  • domain assumption: Documents that co-occur in user sessions are related closely enough that retrieving from their cluster neighborhoods improves session coverage without adding harmful noise.
    This premise underpins both the offline reorganization and the query-time neighborhood expansion.

pith-pipeline@v0.9.1-grok · 5711 in / 1309 out tokens · 49978 ms · 2026-07-01T04:38:50.446217+00:00 · methodology
