Recognition: 2 Lean theorem links
SkillRAE: Agent Skill-Based Context Compilation for Retrieval-Augmented Execution
Pith reviewed 2026-05-12 02:50 UTC · model grok-4.3
The pith
SkillRAE compiles coarse skill retrievals into compact, grounded contexts using a multi-level graph and rescue-aware steps for better LLM agent execution.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SkillRAE is a two-stage Retrieval-Augmented Execution (RAE) method. In an offline indexing stage it builds a multi-level skill graph over communities, skills, and reusable subunits; in the online stage it performs skill-ranked retrieval with subunit evidence export, followed by rescue-aware compact compilation that converts a coarse-ranked skill set into a compact, grounded, and immediately usable task-specific context.
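The two stages can be pictured with a small sketch. Every name below (SkillGraph, rank_skills, compile_context), the term-overlap scoring, and the one-subunit rescue rule are illustrative stand-ins, not the paper's actual algorithm:

```python
from dataclasses import dataclass, field

@dataclass
class SkillGraph:
    """Multi-level index over communities -> skills -> reusable subunits."""
    communities: dict = field(default_factory=dict)  # community -> [skill name]
    subunits: dict = field(default_factory=dict)     # skill name -> [subunit]

    def index(self, skills):
        # Offline stage: group skills into communities, record their subunits.
        for name, (community, units) in skills.items():
            self.communities.setdefault(community, []).append(name)
            self.subunits[name] = units

def rank_skills(graph, query):
    # Online stage, step 1: coarse skill-ranked retrieval (term overlap here).
    terms = set(query.lower().split())
    scored = [(len(terms & set(name.split("_"))), name) for name in graph.subunits]
    return [name for score, name in sorted(scored, reverse=True) if score > 0]

def compile_context(graph, ranked, budget=2):
    # Online stage, step 2: export subunit evidence for the top skills, then
    # "rescue" one subunit from each otherwise-dropped skill so key evidence
    # survives the cut (a stand-in for rescue-aware compact compilation).
    kept, dropped = ranked[:budget], ranked[budget:]
    context = [unit for skill in kept for unit in graph.subunits[skill]]
    for skill in dropped:
        for unit in graph.subunits[skill][:1]:
            if unit not in context:
                context.append(unit)
    return context
```

With a toy library of three skills, a query such as "parse pdf merge" keeps the best-matching skill's evidence in full and rescues one subunit from the runner-up, which is the shape of the compiled context the claim describes.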
What carries the argument
The multi-level skill graph over communities, skills, and subunits, paired with rescue-aware compact compilation that recovers key evidence from coarse retrievals.
If this is right
- LLM agents can scale to larger skill libraries while keeping execution contexts efficient and grounded.
- Retrieval can tolerate coarser initial ranking provided a subsequent rescue and compilation stage is present.
- Document-centric and data-intensive workflows become more tractable once skills are organized into immediately usable forms.
- Context compilation is shown to be a distinct and necessary component rather than a simple prompt-engineering addition.
Where Pith is reading between the lines
- The same graph-plus-rescue pattern could be tested on non-skill retrieval-augmented tasks such as code generation or multi-hop question answering.
- Dynamic updates to the skill graph during agent operation might allow the system to incorporate newly discovered skills without full re-indexing.
- If the graph construction depends on initial skill quality, low-quality libraries would limit gains and point to a need for upstream skill curation.
Load-bearing premise
The multi-level skill graph accurately captures skill relationships, and the rescue-aware compilation step can recover critical evidence from coarse-ranked retrievals without significant loss.
What would settle it
Running SkillRAE on SkillsBench with the rescue-aware compilation stage removed and finding performance essentially unchanged, still above the prior SOTA baseline, would falsify the claim that context compilation is essential; a drop to or below that baseline would support it.
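One way to operationalize this test, reading "essential" as "the gains disappear when the compilation stage is removed". The function and the example scores are illustrative, not from the paper:

```python
def ablation_verdict(full_score: float, ablated_score: float,
                     sota_score: float) -> str:
    """Interpret a remove-the-compilation-stage ablation against SOTA.

    Direction of inference: if the ablated system still beats SOTA, the
    gains did not come from compilation (claim falsified); if it falls to
    or below SOTA, the compilation stage carried the gains (claim stands).
    """
    if full_score <= sota_score:
        return "inconclusive: full system does not beat SOTA"
    if ablated_score > sota_score:
        return "falsified: gains persist without the compilation stage"
    return "supported: gains hinge on the compilation stage"
```

For example, `ablation_verdict(0.70, 0.58, 0.60)` reports the claim as supported, while `ablation_verdict(0.70, 0.65, 0.60)` reports it as falsified.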
Original abstract
Large Language Model (LLM)-based agents (e.g., OpenClaw) increasingly rely on reusable skill libraries to solve artifact-rich tasks such as document-centric workflows and data-intensive analysis. As these libraries grow, a few works have attempted to study the Retrieval-Augmented Execution (RAE), which often first retrieves some external skills and other knowledge, then compiles the context using retrieved skills, and finally executes the task. Existing works mainly focus on optimizing skill retrieval and task execution, and they pay little attention to how to effectively organize the selected skill evidence in a form that is compact, grounded, and immediately usable for the downstream executors to complete tasks. To fill this gap, we propose SkillRAE, a two-stage RAE approach focusing on skill-based context compilation, which consists of the offline and online stages. Specifically, in the offline indexing stage, it builds a multi-level skill graph over skill communities, skills, and reusable subunits, for capturing their relationships. In the online retrieval stage, it first performs skill-ranked retrieval with selected-subunit evidence export in the graph, and then applies rescue-aware compact compilation to recover the key evidence. Together, these components compile a coarse-ranked skill set into a task-specific context that is compact, grounded, and immediately usable. Experiments on two public benchmarks show that SkillRAE achieves a significant improvement over baselines for RAE. For example, on SkillsBench, it achieves an improvement of 11.7% over the SOTA method. Ablation studies further show that our context compilation is crucial, instead of a mere prompt addition.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces SkillRAE, a two-stage Retrieval-Augmented Execution (RAE) method for LLM-based agents. The offline stage constructs a multi-level skill graph over skill communities, skills, and reusable subunits to capture relationships. The online stage performs skill-ranked retrieval with selected-subunit evidence export, followed by rescue-aware compact compilation to produce compact, grounded, and immediately usable task contexts. Experiments on two public benchmarks report an 11.7% improvement over SOTA on SkillsBench, with ablations indicating that context compilation is essential rather than a simple prompt addition.
Significance. If the results hold under rigorous verification, SkillRAE addresses a clear gap in RAE literature by prioritizing effective organization of retrieved skill evidence over retrieval or execution alone. The multi-level graph and rescue-aware mechanism offer a structured way to handle expanding skill libraries for artifact-rich tasks. The reported gains and ablation emphasis on compilation provide a promising direction, though the absence of detailed experimental protocols limits immediate assessment of robustness and reproducibility.
major comments (2)
- [Experiments] Experiments section (as summarized in abstract): The central performance claim of an 11.7% improvement over SOTA on SkillsBench, along with the assertion that ablations demonstrate context compilation is crucial, lacks any description of experimental setup, baselines, number of runs, statistical tests, error bars, or data handling. This information is load-bearing for validating the empirical results that support the paper's main contribution.
- [Online stage] Online retrieval and compilation stage (as described in abstract): The rescue-aware compact compilation is presented at a high level as recovering key evidence from coarse-ranked skills without loss; however, no concrete mechanism, algorithm, or example is supplied to show how it avoids dropping critical evidence, which directly underpins the claim that the compiled context is 'immediately usable' for downstream executors.
minor comments (2)
- [Abstract] The abstract would benefit from a brief illustrative example of the multi-level skill graph (communities/skills/subunits) to clarify the offline indexing process for readers.
- [Method overview] Terminology such as 'rescue-aware' and 'selected-subunit evidence export' is introduced without prior definition or reference, which could be clarified in the method overview for better readability.
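On the first minor point, the kind of illustrative example requested might look like this toy three-level index. Every community, skill, and subunit name here is invented for illustration, not taken from the paper:

```python
from collections import Counter

# Toy communities -> skills -> subunits hierarchy (all names invented).
skill_graph = {
    "document_handling": {                 # community
        "parse_pdf": ["open_file", "extract_text", "detect_tables"],
        "fill_form": ["open_file", "locate_fields", "write_values"],
    },
    "data_analysis": {                     # community
        "plot_series": ["load_csv", "resample", "render_chart"],
    },
}

# Subunits reused across skills are what make this a graph rather than a
# tree: cross-links let retrieval reach shared evidence from either skill.
counts = Counter(
    unit
    for skills in skill_graph.values()
    for units in skills.values()
    for unit in units
)
reused = [unit for unit, n in counts.items() if n > 1]
```

Here `reused` picks out the subunit shared by two skills, the kind of relationship the offline indexing stage is said to capture.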
Simulated Author's Rebuttal
We thank the referee for the constructive feedback and the recommendation for major revision. We address each major comment point by point below and will incorporate the suggested improvements to strengthen the manuscript's clarity and reproducibility.
read point-by-point responses
-
Referee: [Experiments] Experiments section (as summarized in abstract): The central performance claim of an 11.7% improvement over SOTA on SkillsBench, along with the assertion that ablations demonstrate context compilation is crucial, lacks any description of experimental setup, baselines, number of runs, statistical tests, error bars, or data handling. This information is load-bearing for validating the empirical results that support the paper's main contribution.
Authors: We agree that the current manuscript does not include sufficient details on the experimental protocol to allow full verification of the reported results. In the revised version, we will expand Section 4 (Experiments) to explicitly describe the full experimental setup, list all baselines with citations, specify the number of runs (including error bars), detail the statistical tests performed, and outline data handling procedures. This will directly support the 11.7% improvement claim and the ablation analysis. revision: yes
-
Referee: [Online stage] Online retrieval and compilation stage (as described in abstract): The rescue-aware compact compilation is presented at a high level as recovering key evidence from coarse-ranked skills without loss; however, no concrete mechanism, algorithm, or example is supplied to show how it avoids dropping critical evidence, which directly underpins the claim that the compiled context is 'immediately usable' for downstream executors.
Authors: We acknowledge that the rescue-aware compact compilation is currently described at a conceptual level without a concrete algorithm or example. In the revision, we will add a detailed algorithmic description with pseudocode for the rescue mechanism in Section 3, along with a worked example illustrating how critical evidence is identified, exported from the skill graph, and preserved during compilation to ensure no loss and immediate usability. revision: yes
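A plausible shape for the promised pseudocode, our guess rather than the authors' algorithm; the triple format, budget, and readmission rule are all assumptions:

```python
def rescue_compile(evidence, budget):
    """evidence: list of (skill, item, score) triples from coarse retrieval;
    budget: max items admitted before the rescue pass."""
    ranked = sorted(evidence, key=lambda e: e[2], reverse=True)
    kept = ranked[:budget]
    covered = {skill for skill, _, _ in kept}
    # Rescue pass: readmit the single best item of any skill whose evidence
    # was dropped entirely, so no retrieved skill loses all its grounding.
    for skill, item, score in ranked[budget:]:
        if skill not in covered:
            kept.append((skill, item, score))
            covered.add(skill)
    return [item for _, item, _ in kept]
```

The rescue pass is what distinguishes this from plain top-k truncation: a low-scoring skill still contributes its best item rather than vanishing from the compiled context.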
Circularity Check
No significant circularity; empirical claims on public benchmarks
full rationale
The paper introduces SkillRAE as a two-stage system (offline multi-level skill graph over communities/skills/subunits, online ranked retrieval plus rescue-aware compilation) whose value is asserted via measured gains on public benchmarks (e.g., +11.7% on SkillsBench versus SOTA) and ablation studies showing context compilation is not mere prompt addition. No equations, fitted parameters, or derivations are present that could reduce to the inputs by construction. No self-citations are invoked as load-bearing uniqueness theorems or ansatzes. The evaluation therefore rests on independent external data rather than self-referential definitions or renamings, rendering the derivation chain self-contained.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption LLM agents benefit from reusable skill libraries for artifact-rich tasks
- domain assumption Organizing skills into communities, skills, and subunits captures useful relationships
invented entities (2)
- multi-level skill graph: no independent evidence
- rescue-aware compact compilation: no independent evidence
Lean theorems connected to this paper
-
IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean, Cost/FunctionalEquation.lean: reality_from_one_distinction (unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
Passage: "builds a multi-level skill graph over skill communities, skills, and reusable subunits... rescue-aware compact compilation to recover the key evidence"
-
IndisputableMonolith/Foundation/AlexanderDuality.lean: alexander_duality_circle_linking (unclear)
unclear: Relation between the paper passage and the cited Recognition theorem.
Passage: "Experiments on SkillsBench... 11.7% improvement over SOTA"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Reference graph
Works this paper leans on
-
[1]
OpenClaw: Personal AI assistant
OpenClaw. OpenClaw: Personal AI assistant. https://github.com/openclaw/openclaw, 2026. Accessed: 2026-05-05
work page 2026
-
[2]
Manus AI. Welcome – manus documentation. https://manus.im/docs/introduction/welcome, 2026. Accessed: 2026-05-05
work page 2026
-
[4]
API-bank: A comprehensive benchmark for tool-augmented LLMs
Minghao Li, Yingxiu Zhao, Bowen Yu, Feifan Song, Hangyu Li, Haiyang Yu, Zhoujun Li, Fei Huang, and Yongbin Li. API-bank: A comprehensive benchmark for tool-augmented LLMs. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 3102–3116, Singapore, 2023. Association for Computational Linguistics. doi: 10.18653/v1...
work page 2023
-
[5]
ToolLLM: Facilitating large language models to master 16000+ real-world APIs
Yujia Qin, Shihao Liang, Yining Ye, Kunlun Zhu, Lan Yan, Yaxi Lu, Yankai Lin, Xin Cong, Xiangru Tang, Bill Qian, Sihan Zhao, Lauren Hong, Runchu Tian, Ruobing Xie, Jie Zhou, Mark Gerstein, Dahai Li, Zhiyuan Liu, and Maosong Sun. ToolLLM: Facilitating large language models to master 16000+ real-world APIs. In International Conference on Learning Represent...
work page 2024
-
[6]
ToolHop: A query-driven benchmark for evaluating large language models in multi-hop tool use
Junjie Ye, Zhengyin Du, Xuesong Yao, Weijian Lin, Yufei Xu, Zehui Chen, Zaiyuan Wang, Sining Zhu, Zhiheng Xi, Siyu Yuan, Tao Gui, Qi Zhang, Xuanjing Huang, and Jiecao Chen. ToolHop: A query-driven benchmark for evaluating large language models in multi-hop tool use. In Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics...
work page 2025
-
[7]
ShortcutsBench: A large-scale real-world benchmark for api-based agents
Haiyang Shen, Yue Li, Desong Meng, Dongqi Cai, Sheng Qi, Li Zhang, Mengwei Xu, and Yun Ma. ShortcutsBench: A large-scale real-world benchmark for api-based agents. In Proceedings of the International Conference on Learning Representations, 2025
work page 2025
-
[8]
Voyager: An Open-Ended Embodied Agent with Large Language Models
Guanzhi Wang, Yuqi Xie, Yunfan Jiang, Ajay Mandlekar, Chaowei Xiao, Yuke Zhu, Linxi Fan, and Anima Anandkumar. Voyager: An open-ended embodied agent with large language models. arXiv preprint arXiv:2305.16291, 2023. doi: 10.48550/arXiv.2305.16291. URL https://arxiv.org/abs/2305.16291
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2305.16291 2023
-
[9]
SkillsBench: Benchmarking How Well Agent Skills Work Across Diverse Tasks
Xiangyi Li, Wenbo Chen, Yimin Liu, Shenghan Zheng, Xiaokun Chen, Yifeng He, Yubo Li, Bingran You, Haotian Shen, Jiankai Sun, Shuyi Wang, Qunhong Zeng, Di Wang, Xuandong Zhao, Yuanli Wang, Roey Ben Chaim, Zonglin Di, Yipeng Gao, Junwei He, Yizhuo He, Liqiang Jing, Luyang Kong, Xin Lan, Jiachen Li, Songlin Li, Yijiang Li, Yueqian Lin, Xinyi Liu, Xuanqing ...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2602.12670 2026
-
[10]
Organizing, orchestrating, and benchmarking agent skills at ecosystem scale, 2026
Hao Li, Chunjiang Mu, Jianhao Chen, Siyue Ren, Zhiyao Cui, Yiqun Zhang, Lei Bai, and Shuyue Hu. Organizing, orchestrating, and benchmarking agent skills at ecosystem scale. arXiv preprint arXiv:2603.02176, 2026. doi: 10.48550/arXiv.2603.02176. URL https://arxiv.org/abs/2603.02176
-
[11]
SkillRouter: Skill routing for LLM agents at scale
YanZhao Zheng, ZhenTao Zhang, Chao Ma, YuanQiang Yu, JiHuai Zhu, Yong Wu, Tianze Xu, Baohua Dong, Hangcheng Zhu, Ruohui Huang, and Gang Yu. SkillRouter: Skill routing for LLM agents at scale. arXiv preprint arXiv:2603.22455, 2026. doi: 10.48550/arXiv.2603.22455
-
[13]
RepoCoder: Repository-level code completion through iterative retrieval and generation
Fengji Zhang, Bei Chen, Yue Zhang, Jacky Keung, Jin Liu, Daoguang Zan, Yi Mao, Jian-Guang Lou, and Weizhu Chen. RepoCoder: Repository-level code completion through iterative retrieval and generation. In Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing, pages 2471–2484, 2023
work page 2023
-
[14]
Dataflow-guided retrieval augmentation for repository-level code completion
Wei Cheng, Yuhan Wu, and Wei Hu. Dataflow-guided retrieval augmentation for repository-level code completion. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 7957–7977, 2024
work page 2024
-
[15]
RepoGraph: Enhancing ai software engineering with repository-level code graph
Siru Ouyang, Wenhao Yu, Kaixin Ma, Zilin Xiao, Zhihan Zhang, Mengzhao Jia, Jiawei Han, Hongming Zhang, and Dong Yu. RepoGraph: Enhancing ai software engineering with repository-level code graph. In Proceedings of the International Conference on Learning Representations, 2025
work page 2025
-
[16]
Parth Sarthi, Salman Abdullah, Aditi Tuli, Shubh Khanna, Anna Goldie, and Christopher D. Manning. RAPTOR: Recursive abstractive processing for tree-organized retrieval. In International Conference on Learning Representations, 2024. URL https://openreview.net/forum?id=GN921JHCRw
work page 2024
-
[17]
HippoRAG: Neurobiologically inspired long-term memory for large language models
Bernal Jiménez Gutiérrez, Yiheng Shu, Yu Gu, Michihiro Yasunaga, and Yu Su. HippoRAG: Neurobiologically inspired long-term memory for large language models. In Advances in Neural Information Processing Systems, volume 37, 2024
work page 2024
-
[18]
Zhuoqun Li, Xuanang Chen, Haiyang Yu, Hongyu Lin, Yaojie Lu, Qiaoyu Tang, Fei Huang, Xianpei Han, Le Sun, and Yongbin Li. StructRAG: Boosting knowledge intensive reasoning of LLMs via inference-time hybrid information structurization. In Proceedings of the International Conference on Learning Representations, 2025
work page 2025
-
[19]
ArchRAG: Attributed community-based hierarchical retrieval-augmented generation
Shu Wang, Yixiang Fang, Yingli Zhou, Xilin Liu, and Yuchi Ma. ArchRAG: Attributed community-based hierarchical retrieval-augmented generation. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 15868–15876, 2026. doi: 10.1609/aaai.v40i19.38619. URL https://ojs.aaai.org/index.php/AAAI/article/view/38619
work page doi:10.1609/aaai.v40i19.38619 2026
-
[20]
Shu Wang, Yingli Zhou, and Yixiang Fang. BookRAG: A hierarchical structure-aware index-based approach for retrieval-augmented generation on complex documents. arXiv preprint arXiv:2512.03413, 2025. doi: 10.48550/arXiv.2512.03413. URL https://arxiv.org/abs/2512.03413
-
[21]
HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face
Yongliang Shen, Kaitao Song, Xu Tan, Dongsheng Li, Weiming Lu, and Yueting Zhuang. HuggingGPT: Solving AI tasks with ChatGPT and its friends in hugging face. In Advances in Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=yHdTscY6Ci
work page 2023
-
[22]
ToolPlanner: A tool augmented LLM for multi granularity instructions with path planning and feedback
Qinzhuo Wu, Wei Liu, Jian Luan, and Bin Wang. ToolPlanner: A tool augmented LLM for multi granularity instructions with path planning and feedback. In Proceedings of the 2024 Conference on Empirical Methods in Natural Language Processing, pages 18315–18339, Miami, Florida, USA, November 2024. Association for Computational Linguistics. doi: 10.18653/v1/202...
work page 2024
-
[23]
Trace2Skill: Distill Trajectory-Local Lessons into Transferable Agent Skills
Jingwei Ni, Yihao Liu, Xinpeng Liu, Yutao Sun, Mengyu Zhou, Pengyu Cheng, Dexin Wang, Xiaoxi Jiang, and Guanjun Jiang. Trace2Skill: Distill trajectory-local lessons into transferable agent skills. arXiv preprint arXiv:2603.25158, 2026
work page internal anchor Pith review Pith/arXiv arXiv 2026
-
[24]
ReasoningBank: Scaling Agent Self-Evolving with Reasoning Memory
Siru Ouyang, Jun Yan, I Hsu, Yanfei Chen, Ke Jiang, Zifeng Wang, Rujun Han, Long T Le, Samira Daruki, Xiangru Tang, et al. ReasoningBank: Scaling agent self-evolving with reasoning memory. arXiv preprint arXiv:2509.25140, 2025
work page internal anchor Pith review arXiv 2025
-
[25]
Graph of Skills: Dependency-Aware Structural Retrieval for Massive Agent Skills
Dawei Liu, Zongxia Li, Hongyang Du, Xiyang Wu, Shihang Gui, Yongbei Kuang, and Lichao Sun. Graph of skills: Dependency-aware structural retrieval for massive agent skills. arXiv preprint arXiv:2604.05333, 2026. doi: 10.48550/arXiv.2604.05333. URL https://arxiv.org/abs/2604.05333
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2604.05333 2026
-
[26]
ReAct: Synergizing reasoning and acting in language models
Shunyu Yao, Jeffrey Zhao, Dian Yu, Nan Du, Izhak Shafran, Karthik Narasimhan, and Yuan Cao. ReAct: Synergizing reasoning and acting in language models. In International Conference on Learning Representations, 2023. URL https://openreview.net/forum?id=WE_vluYUL-X
work page 2023
-
[27]
Toolformer: Language models can teach themselves to use tools
Timo Schick, Jane Dwivedi-Yu, Roberto Dessi, Roberta Raileanu, Maria Lomeli, Eric Hambro, Luke Zettlemoyer, Nicola Cancedda, and Thomas Scialom. Toolformer: Language models can teach themselves to use tools. In Advances in Neural Information Processing Systems, 2023. URL https://openreview.net/forum?id=Yacmpz84TH
work page 2023
-
[28]
Gorilla: Large language model connected with massive APIs
Shishir G. Patil, Tianjun Zhang, Xin Wang, and Joseph E. Gonzalez. Gorilla: Large language model connected with massive APIs. In Advances in Neural Information Processing Systems, 2024. URL https://proceedings.neurips.cc/paper_files/paper/2024/hash/e4c61f578ff07830f5c37378dd3ecb0d-Abstract-Conference.html
work page 2024
-
[30]
ToolkenGPT: Augmenting frozen language models with massive tools via tool embeddings
Shibo Hao, Tianyang Liu, Zhen Wang, and Zhiting Hu. ToolkenGPT: Augmenting frozen language models with massive tools via tool embeddings. In Advances in Neural Information Processing Systems, 2023. URL https://proceedings.neurips.cc/paper_files/paper/2023/hash/8fd1a81c882cd45f64958da6284f4a3f-Abstract-Conference.html
work page 2023
-
[31]
AnyTool: Self-reflective, hierarchical agents for large-scale API calls
Yu Du, Fangyun Wei, and Hongyang Zhang. AnyTool: Self-reflective, hierarchical agents for large-scale API calls. In Proceedings of the 41st International Conference on Machine Learning, volume 235 of Proceedings of Machine Learning Research, pages 11812–11829. PMLR, 2024. URL https://proceedings.mlr.press/v235/du24h.html
work page 2024
-
[32]
Re-invoke: Tool invocation rewriting for zero-shot tool retrieval
Yanfei Chen, Jinsung Yoon, Devendra Singh Sachan, Qingze Wang, Vincent Cohen-Addad, Mohammadhossein Bateni, Chen-Yu Lee, and Tomas Pfister. Re-invoke: Tool invocation rewriting for zero-shot tool retrieval. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 4705–4726, Miami, Florida, USA, 2024. Association for Computational...
-
[33]
Retrieval-augmented generation for knowledge-intensive NLP tasks
Patrick Lewis, Ethan Perez, Aleksandra Piktus, Fabio Petroni, Vladimir Karpukhin, Naman Goyal, Heinrich Küttler, Mike Lewis, Wen-tau Yih, Tim Rocktäschel, Sebastian Riedel, and Douwe Kiela. Retrieval-augmented generation for knowledge-intensive NLP tasks. In Advances in Neural Information Processing Systems, volume 33, 2020. URL https://proceedings.neur...
work page 2020
-
[34]
In-depth Analysis of Graph-based RAG in a Unified Framework
Yingli Zhou, Yaodong Su, Youran Sun, Shu Wang, Taotao Wang, Runyuan He, Yongwei Zhang, Sicong Liang, Xilin Liu, Yuchi Ma, et al. In-depth analysis of graph-based RAG in a unified framework. arXiv preprint arXiv:2503.04338, 2025
work page internal anchor Pith review Pith/arXiv arXiv 2025
-
[35]
From Local to Global: A Graph RAG Approach to Query-Focused Summarization
Darren Edge, Ha Trinh, Newman Cheng, Joshua Bradley, Alex Chao, Apurva Mody, Steven Truitt, Dasha Metropolitansky, Robert Osazuwa Ness, and Jonathan Larson. From local to global: A graph RAG approach to query-focused summarization. arXiv preprint arXiv:2404.16130, 2024. doi: 10.48550/arXiv.2404.16130. URL https://arxiv.org/abs/2404.16130
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2404.16130 2024
-
[36]
PathRAG: Pruning graph-based retrieval augmented generation with relational paths
Boyu Chen, Zirui Guo, Zidan Yang, Yuluo Chen, Junze Chen, Zhenghao Liu, Chuan Shi, and Cheng Yang. PathRAG: Pruning graph-based retrieval augmented generation with relational paths. In Proceedings of the AAAI Conference on Artificial Intelligence, volume 40, pages 30183–30191, 2026
work page 2026
-
[37]
LightRAG: Simple and Fast Retrieval-Augmented Generation
Zirui Guo, Lianghao Xia, Yanhua Yu, Tian Ao, and Chao Huang. LightRAG: Simple and fast retrieval-augmented generation. arXiv preprint arXiv:2410.05779, 2(3), 2024
work page internal anchor Pith review arXiv 2024
-
[38]
Retrieval-augmented generation with hierarchical knowledge
Haoyu Huang, Yongfeng Huang, Junjie Yang, Zhenyu Pan, Yongqiang Chen, Kaili Ma, Hongzhi Chen, and James Cheng. Retrieval-augmented generation with hierarchical knowledge. arXiv preprint arXiv:2503.10150, 2025
-
[39]
Active retrieval augmented generation
Zhengbao Jiang, Frank F Xu, Luyu Gao, Zhiqing Sun, Qian Liu, Jane Dwivedi-Yu, Yiming Yang, Jamie Callan, and Graham Neubig. Active retrieval augmented generation. In Proceedings of the 2023 conference on empirical methods in natural language processing, pages 7969–7992, 2023
work page 2023
-
[40]
Self-RAG: Learning to retrieve, generate, and critique through self-reflection
Akari Asai, Zeqiu Wu, Yizhong Wang, Avirup Sil, and Hannaneh Hajishirzi. Self-RAG: Learning to retrieve, generate, and critique through self-reflection. In The Twelfth International Conference on Learning Representations, 2023
work page 2023
-
[41]
Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions
Harsh Trivedi, Niranjan Balasubramanian, Tushar Khot, and Ashish Sabharwal. Interleaving retrieval with chain-of-thought reasoning for knowledge-intensive multi-step questions. In Proceedings of the 61st annual meeting of the association for computational linguistics (volume 1: long papers), pages 10014–10037, 2023
work page 2023
-
[42]
Tiannuo Yang, Zebin Yao, Bowen Jin, Lixiao Cui, Yusen Li, Gang Wang, and Xiaoguang Liu. Demystifying and enhancing the efficiency of large language model based search agents. arXiv preprint arXiv:2505.12065, 2025
-
[43]
Recomp: Improving retrieval-augmented LMs with context compression and selective augmentation
Fangyuan Xu, Weijia Shi, and Eunsol Choi. Recomp: Improving retrieval-augmented LMs with context compression and selective augmentation. In The Twelfth International Conference on Learning Representations, 2023
work page 2023
-
[44]
LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression
Huiqiang Jiang, Qianhui Wu, Xufang Luo, Dongsheng Li, Chin-Yew Lin, Yuqing Yang, and Lili Qiu. LongLLMLingua: Accelerating and enhancing LLMs in long context scenarios via prompt compression. In Proceedings of the 62nd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers), pages 1658–1677, 2024
work page 2024
-
[45]
Jinyuan Fang, Zaiqiao Meng, and Craig Macdonald. Trace the evidence: Constructing knowledge-grounded reasoning chains for retrieval-augmented generation. In Findings of the Association for Computational Linguistics: EMNLP 2024, pages 8472–8494, 2024
work page 2024
-
[46]
Chameleon: Plug-and-play compositional reasoning with large language models
Pan Lu, Baolin Peng, Hao Cheng, Michel Galley, Kai-Wei Chang, Ying Nian Wu, Song-Chun Zhu, and Jianfeng Gao. Chameleon: Plug-and-play compositional reasoning with large language models. In Advances in Neural Information Processing Systems, 2023
work page 2023
-
[48]
URL https://arxiv.org/abs/2309.07597
work page internal anchor Pith review arXiv
-
[49]
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
Nils Reimers and Iryna Gurevych. Sentence-BERT: Sentence embeddings using siamese BERT-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing and the 9th International Joint Conference on Natural Language Processing, pages 3982–3992. Association for Computational Linguistics, 2019. doi: 10.18653/v1/D19-1410...
-
[51]
OpenAI. GPT-5.2 Model. https://developers.openai.com/api/docs/models/gpt-5.2, 2025. Accessed 2026-05-06
work page 2025
-
[52]
Google. Gemini CLI Documentation. https://google-gemini.github.io/gemini-cli/docs/, 2026. Accessed 2026-05-06
work page 2026
-
[53]
Gemini 3 Flash is now available in Gemini CLI
Google Developers Blog. Gemini 3 Flash is now available in Gemini CLI. https://developers.googleblog.com/gemini-3-flash-is-now-available-in-gemini-cli/, 2025. Accessed 2026-05-06
work page 2025
discussion (0)