Don't Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills for QA and RAG
Pith reviewed 2026-05-19 17:35 UTC · model grok-4.3
The pith
Distilling a corpus into a hierarchical skill directory lets an LLM agent navigate it to improve QA and RAG on structured enterprise data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Corpus2Skill distills the document corpus offline into a hierarchical skill directory. An LLM agent then navigates this directory at inference time, beginning with broad summaries and refining its path to specific documents while backtracking from unproductive branches. The result is improved answer quality and stronger grounding compared with single-shot dense, hybrid, hierarchical-retrieval, and agentic RAG baselines on an enterprise customer-support benchmark.
What carries the argument
The hierarchical skill directory created by offline clustering, which the agent traverses by expanding summaries level by level and backtracking when a branch yields no progress.
If this is right
- Answer quality and evidence grounding both rise on enterprise customer-support benchmarks.
- The method outperforms single-shot dense, hybrid, hierarchical-retrieval, and agentic RAG baselines.
- Cost increases remain moderate relative to the quality gains.
- Consistent benefits appear on single-domain corpora that possess a recoverable topical taxonomy.
- Flat retrieval remains preferable for open-domain factoid pools or homogeneous-tabular corpora.
Where Pith is reading between the lines
- Navigation gives the agent an explicit map of the corpus, which may reduce cases where relevant but non-retrieved documents are overlooked.
- System designers should first test whether their data supports clean top-level clustering before adopting navigation over retrieval.
- The same offline distillation step could be applied to other structured collections such as technical manuals or policy archives that share a natural hierarchy.
Load-bearing premise
The target corpus contains a recoverable topical taxonomy that supports effective offline hierarchical clustering into a navigable skill directory.
What would settle it
A performance comparison on a corpus lacking clear topical clusters, such as a random mix of unrelated facts, where navigation fails to outperform standard retrieval methods.
Figures
read the original abstract
Retrieval-Augmented Generation (RAG) grounds LLM responses in external evidence but treats the model as a passive consumer of search results, with no view of how the corpus is organized or what it has not yet seen. We present Corpus2Skill, which distills a document corpus offline into a hierarchical skill directory and lets an LLM agent navigate it at serve time, drilling from a bird's-eye view through progressively finer summaries down to documents, and backtracking when a branch is unproductive. On an enterprise customer-support benchmark, Corpus2Skill improves both answer quality and grounding over single-shot dense, hybrid, hierarchical-retrieval, and agentic RAG baselines at a moderate cost tradeoff. A ten-subset generalization study further shows that corpus navigation is not a universal replacement for retrieval: it consistently helps on single-domain corpora with a recoverable topical taxonomy, but flat retrieval remains preferable on open-domain factoid pools or homogeneous-tabular corpora that defeat top-level clustering. We characterize this scope distinction and discuss it as a design guideline for knowledge-grounded systems. Code is available at https://github.com/dukesun99/Corpus2Skill.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces Corpus2Skill, which distills a document corpus offline into a hierarchical skill directory that an LLM agent navigates at serve time by drilling from high-level summaries to documents and backtracking as needed. On an enterprise customer-support benchmark it reports improvements in answer quality and grounding relative to single-shot dense, hybrid, hierarchical-retrieval, and agentic RAG baselines at moderate extra cost. A ten-subset generalization study further claims that navigation outperforms retrieval on single-domain corpora possessing a recoverable topical taxonomy but that flat retrieval remains preferable on open-domain factoid or homogeneous-tabular corpora; the authors present this distinction as a design guideline and release code.
Significance. If the empirical results and scope characterization hold, the work offers a concrete alternative paradigm to retrieval-centric RAG by shifting to offline distillation and online navigation. The explicit code release and the attempt to delineate corpus conditions under which navigation is advantageous are positive contributions that could guide practitioners building enterprise QA systems. The generalization study, while preliminary, supplies a falsifiable framing that future work can test.
major comments (1)
- [Generalization study] Generalization study (ten-subset evaluation): the scope guideline that navigation helps precisely on corpora with a 'recoverable topical taxonomy' is asserted after observing performance differences, yet the manuscript reports no independent clustering-quality diagnostics (silhouette score, cophenetic correlation, cluster purity against domain labels, or summary-fidelity metrics). Without these, it is impossible to confirm that the skill directory actually recovered a useful taxonomy or whether gains arise from prompting differences or other unablated factors; this directly affects the load-bearing claim that the method's benefit is tied to the stated corpus property.
minor comments (2)
- [Abstract] Abstract: comparative improvements are stated without any numerical deltas, error bars, or statistical tests, leaving the central empirical claim difficult to assess from the summary alone.
- [Method] The description of the offline hierarchical clustering procedure lacks sufficient detail on algorithm choice, linkage method, stopping criteria, and how summaries are generated at each level.
Simulated Author's Rebuttal
We thank the referee for the detailed and constructive review. We address the major comment on the generalization study below and will incorporate revisions to strengthen the supporting evidence for our scope guideline.
read point-by-point responses
-
Referee: Generalization study (ten-subset evaluation): the scope guideline that navigation helps precisely on corpora with a 'recoverable topical taxonomy' is asserted after observing performance differences, yet the manuscript reports no independent clustering-quality diagnostics (silhouette score, cophenetic correlation, cluster purity against domain labels, or summary-fidelity metrics). Without these, it is impossible to confirm that the skill directory actually recovered a useful taxonomy or whether gains arise from prompting differences or other unablated factors; this directly affects the load-bearing claim that the method's benefit is tied to the stated corpus property.
Authors: We agree that the generalization study would benefit from explicit, independent clustering-quality diagnostics to more directly link performance gains to the recovery of a useful topical taxonomy rather than unablated factors such as prompting variations. The current manuscript presents the scope distinction primarily through observed performance differentials across corpus types. In the revised version we will add silhouette scores and cophenetic correlation coefficients computed on the hierarchical clustering steps used to build the skill directories for each of the ten subsets. We will also report summary-fidelity metrics (e.g., ROUGE or embedding similarity between generated summaries and source documents) and, for the subsets that possess domain labels, cluster purity. In addition, we will include a targeted ablation that applies the same hierarchical prompting structure to a flat-retrieval baseline, helping isolate the contribution of the navigation mechanism itself. These additions should provide clearer confirmation that the reported benefits track the stated corpus property. revision: yes
Circularity Check
No significant circularity; claims rest on direct empirical comparisons
full rationale
The paper's central claims rest on offline distillation of a corpus into a hierarchical skill directory followed by online agent navigation, with performance improvements demonstrated via direct comparisons against dense, hybrid, hierarchical-retrieval, and agentic RAG baselines on an enterprise benchmark plus a ten-subset generalization study. No equations, fitted parameters, or predictions reduce by construction to the same evaluation data or self-defined quantities. The scope distinction (navigation helps on single-domain corpora with recoverable topical taxonomy) is presented as an observed empirical pattern rather than a self-referential definition or load-bearing self-citation. The derivation chain is self-contained through the described distillation and navigation process without invoking unverified uniqueness theorems or ansatzes from prior author work.
Axiom & Free-Parameter Ledger
axioms (1)
- domain assumption Document corpora in the target domain contain a recoverable topical taxonomy amenable to hierarchical clustering.
invented entities (1)
-
Hierarchical skill directory
no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Cost/FunctionalEquation.leanwashburn_uniqueness_aczel unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
We build a multi-level hierarchy through an iterative bottom-up process controlled by two parameters: the branching ratio p ... via K-Means ... Each resulting cluster is then summarized by an LLM
-
IndisputableMonolith/Foundation/AlexanderDuality.leanalexander_duality_circle_linking unclear?
unclearRelation between the paper passage and the cited Recognition theorem.
the hierarchy depth grows as O(log_p N) ... agent traverses at most L=⌈log p N⌉ levels
What do these tags mean?
- matches
- The paper's claim is directly supported by a theorem in the formal canon.
- supports
- The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends
- The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses
- The paper appears to rely on the theorem as machinery.
- contradicts
- The paper's claim conflicts with a theorem or certificate in the canon.
- unclear
- Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 2 Pith papers
-
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
A survey that taxonomizes agent skills for LLM-based agents across representation, acquisition, retrieval, and evolution stages while reviewing methods, resources, and open challenges.
-
A Comprehensive Survey on Agent Skills: Taxonomy, Techniques, and Applications
The paper surveys agent skills for LLM agents, organizing the literature into a four-stage lifecycle of representation, acquisition, retrieval, and evolution while highlighting their role in system scalability.
Reference graph
Works this paper leans on
-
[1]
Agentic Retrieval-Augmented Generation: A Survey on Agentic RAG
InProceedings of the 1st Workshop on NLP for COVID-19 at ACL 2020, 2020. Nandy, A., Sharma, S., Maddhashiya, S., Sachdeva, K., Goyal, P., and Ganguly, N. Question answering over elec- tronic devices: A new benchmark dataset and a multi-task learning based QA framework. InFindings of the Associ- ation for Computational Linguistics: EMNLP 2021, pp. 4600–460...
work page internal anchor Pith review Pith/arXiv arXiv 2020
-
[4]
Name actual features, products, or processes
Key TERMS or features mentioned across documents Be specific and concrete. Name actual features, products, or processes. Documents ({N} total, showing up to 15): --- Document 1 --- {doc_text[:600]} 15 Don’t Retrieve, Navigate: Distilling Enterprise Knowledge into Navigable Agent Skills --- Document 2 --- {doc_text[:600]} [...] Summary: Cluster-level summa...
-
[7]
What types of user QUESTIONS this group can answer Be specific -- name the main product areas, features, or workflows. Sub-group summaries: - Sub-group 1: {summary[:300]} - Sub-group 2: {summary[:300]} [...] Overview: Labeling prompt.Used to generate a filesystem-safe di- rectory name from a cluster summary. The response is post-processed: lowercased, non...
-
[8]
The common TOPIC area these documents cover
-
[9]
The types of QUESTIONS these documents answer
-
[10]
Name actual features, products, or processes
Key TERMS or features mentioned across documents Be specific and concrete. Name actual features, products, or processes. Documents ({N} total, showing up to 15): --- Document 1 --- {doc_text[:600]} --- Document 2 --- {doc_text[:600]} [...] Summary: Cluster-level summarization.Used at higher levels to summarize a group of sub-cluster summaries into a broad...
-
[11]
The broad DOMAIN these sub-groups cover
-
[12]
The range of TOPICS within this domain
-
[13]
What types of user QUESTIONS this group can answer Be specific -- name the main product areas, features, or workflows. Sub-group summaries: - Sub-group 1: {summary[:300]} - Sub-group 2: {summary[:300]} [...] Overview: Labeling.Used to generate a filesystem-safe directory name from a cluster summary. The response is post- processed: lowercased, non-alphanu...
-
[14]
Read the SKILL.md of the 1-2 most relevant skills for your query
-
[15]
Drill into the most relevant sub-group: Read its INDEX.md
-
[16]
Pick the most relevant document IDs
At the leaf level, INDEX.md lists document IDs with brief titles. Pick the most relevant document IDs
-
[17]
Call get_document with each relevant doc_id to retrieve the full text
-
[18]
Read at least one full document before answering. ## Tools - Code execution: Use ‘ls‘ and ‘cat‘ to navigate the skills hierarchy. - get_document(doc_id): Retrieve the full text of a document by its ID. The doc_id values are listed in leaf-level INDEX.md files. ## Answer Format - First sentence = direct answer. No preamble. - Factual questions: 1-3 sentenc...
-
[19]
Go to Settings in your dashboard
-
[20]
Click Language & Region
-
[21]
Scroll to Currency and select your desired currency
-
[22]
Click Save. This applies to all Wix products including Online Programs. Your currency must match your payment provider’s currency." Navigation paths: 6221 -> 1513 -> 107 -> 18 -> 1 (programs) 6221 -> 1513 -> 68 -> 18 -> 1 (billing) This trace demonstrates cross-branch navigation: the agent first explored the online-programs subgroup to understand the rela...
work page 2009
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.