SeedER: Seed-and-Expand Retrieval from Knowledge Graphs
Pith reviewed 2026-05-25 04:52 UTC · model grok-4.3
The pith
SeedER retrieves from knowledge graphs by seeding a compact node set then expanding it with a learned reinforcement policy.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
SeedER first seeds a compact set of core nodes using lightweight dense and entity-based retrieval, then selectively expands this set via a learned graph-aware policy trained with reinforcement learning. This design decomposes global reasoning into reusable local decisions, enabling efficient discovery of query-relevant nodes while tightly controlling expansion cost.
What carries the argument
The seed-and-expand retrieval process that trains a graph-aware policy with reinforcement learning to make low-cost local expansion decisions from an initial dense seed set.
If this is right
- Recall improves over dense retrieval and graph-augmented baselines while keeping candidate sets small.
- The method serves as an effective first-stage retriever for knowledge-intensive reasoning systems.
- Theoretical analysis shows dense methods have inherent limits on compositional graph queries that the expansion step can mitigate.
- The approach yields advantages from both compositional generalization and graph-constrained submodular optimization views.
Where Pith is reading between the lines
- The same seeding-plus-local-policy pattern might reduce search cost in other irregular structured data such as molecular graphs or citation networks.
- If the policy can be made query-agnostic at inference time, the method could support real-time retrieval without per-query retraining.
- Downstream systems that rely on retrieved subgraphs might see accuracy gains if the compact sets preserve more relational paths than embedding-only methods.
Load-bearing premise
The learned expansion policy generalizes from training data to produce query-relevant nodes on unseen compositional queries without the set growing too large.
What would settle it
A test set of new multi-hop queries where the policy either misses more than a baseline fraction of relevant nodes or produces candidate sets larger than the reported compact sizes.
Figures
read the original abstract
Knowledge graphs (KGs) offer a rich representation for relational knowledge, but their irregular structure makes retrieval challenging: ego-graph expansion grows rapidly, and dense embedding methods struggle with multi-hop compositional queries. Existing agent-based graph exploration approaches, while expressive, are often too expensive for large-scale retrieval. We introduce SeedER (Seed-and-Expand Retrieval), a retrieval framework that explicitly leverages KG structure through iterative, low-cost expansion. SeedER first seeds a compact set of core nodes using lightweight dense and entity-based retrieval, then selectively expands this set via a learned graph-aware policy trained with reinforcement learning. This design decomposes global reasoning into reusable local decisions, enabling efficient discovery of query-relevant nodes while tightly controlling expansion cost. We show theoretical limitations of dense retrieval on compositional graph queries, and establish advantages of SeedER from both compositional generalization and graph-constrained submodular optimization perspectives. Empirically, SeedER substantially improves recall with compact candidate sets over strong dense and graph-augmented baselines, making it an effective first-stage retriever for knowledge-intensive reasoning systems.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces SeedER, a retrieval framework for knowledge graphs. It seeds a compact set of core nodes via lightweight dense and entity-based retrieval, then expands the set via a learned graph-aware policy trained with reinforcement learning on local decisions. The work claims to demonstrate theoretical limitations of dense retrieval on compositional graph queries, advantages from compositional generalization and graph-constrained submodular optimization perspectives, and substantial empirical recall gains with compact candidate sets over dense and graph-augmented baselines, positioning SeedER as an effective first-stage retriever.
Significance. If the empirical gains and policy generalization hold, SeedER could meaningfully advance efficient multi-hop retrieval over KGs by decomposing global reasoning into reusable local RL decisions while controlling expansion cost. The combination of seeding, RL expansion, and the stated theoretical perspectives would offer a practical alternative to both pure dense methods and expensive agent-based exploration.
major comments (2)
- [Abstract] Abstract and experimental sections: The central claim of substantial recall improvements with compact sets depends on the RL policy generalizing to unseen compositional queries without cost blowup. The manuscript asserts this but supplies no results on out-of-distribution generalization, policy overfitting to training query patterns, or behavior on higher-hop structures, leaving the weakest assumption unverified.
- [Theoretical analysis] Theoretical analysis section: The claimed advantages from both compositional generalization and graph-constrained submodular optimization are asserted, but the explicit linkage between the RL objective (local decisions) and the submodular objective is not derived or shown, so the connection to the stated perspectives remains unclear.
minor comments (1)
- [Abstract] The abstract uses qualitative phrasing ('substantially improves') without any quantitative anchors; adding even high-level numbers from the experiments would improve clarity.
Simulated Author's Rebuttal
We thank the referee for the constructive feedback. We address the two major comments below, agreeing that additional clarification and experiments would strengthen the manuscript.
read point-by-point responses
-
Referee: [Abstract] Abstract and experimental sections: The central claim of substantial recall improvements with compact sets depends on the RL policy generalizing to unseen compositional queries without cost blowup. The manuscript asserts this but supplies no results on out-of-distribution generalization, policy overfitting to training query patterns, or behavior on higher-hop structures, leaving the weakest assumption unverified.
Authors: We agree that the current manuscript does not include dedicated experiments on out-of-distribution generalization of the RL policy, potential overfitting to training query patterns, or explicit scaling to higher-hop structures. While the local decision formulation is intended to promote reusability, these aspects remain unverified in the presented results. We will add a new subsection with OOD tests, higher-hop analysis, and overfitting diagnostics in the revised version. revision: yes
-
Referee: [Theoretical analysis] Theoretical analysis section: The claimed advantages from both compositional generalization and graph-constrained submodular optimization are asserted, but the explicit linkage between the RL objective (local decisions) and the submodular objective is not derived or shown, so the connection to the stated perspectives remains unclear.
Authors: The manuscript presents advantages separately from the compositional generalization view and from the graph-constrained submodular optimization view, with the RL policy serving as a learned approximation to local submodular decisions. However, an explicit derivation linking the local RL objective to the global submodular objective is not provided. We will revise the theoretical analysis section to include this derivation or a clearer statement of the relationship. revision: yes
Circularity Check
No circularity detected; derivation is self-contained
full rationale
The provided abstract and description introduce SeedER as a seed-then-expand framework with an RL-trained policy, theoretical limitations on dense retrieval, and empirical gains, but contain no equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations that reduce claims to inputs by construction. Claims rest on external RL training, submodular optimization perspectives, and baselines rather than internal redefinition. This matches the default expectation of no significant circularity.
Axiom & Free-Parameter Ledger
Reference graph
Works this paper leans on
-
[1]
Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F Yang, and Anton Tsitsulin
doi: 10.1007/978-3-642-36973-5_36. Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F Yang, and Anton Tsitsulin. Don’t forget to connect! improving rag with graph-based reranking.arXiv preprint arXiv:2405.18414,
-
[2]
Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783,
work page internal anchor Pith review Pith/arXiv arXiv
-
[3]
LightRAG: Simple and Fast Retrieval-Augmented Generation
Zirui Guo, Lianghao Xia, Yanhua Yu, Tian Ao, and Chao Huang. Lightrag: Simple and fast retrieval- augmented generation.arXiv preprint arXiv:2410.05779, 2(3),
work page internal anchor Pith review Pith/arXiv arXiv
-
[4]
From RAG to Memory: Non-Parametric Continual Learning for Large Language Models
Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, and Yu Su. From rag to memory: Non-parametric continual learning for large language models.arXiv preprint arXiv:2502.14802,
work page internal anchor Pith review Pith/arXiv arXiv
-
[5]
Retrieval-Augmented Generation with Graphs (GraphRAG)
Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A Rossi, Subhabrata Mukherjee, Xianfeng Tang, et al. Retrieval-augmented generation with graphs (graphrag).arXiv preprint arXiv:2501.00309,
work page internal anchor Pith review Pith/arXiv arXiv
-
[6]
doi: 10.1090/S0273-0979-06-01126-8. Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.Iclr, 1(2):3,
-
[7]
Grag: Graph retrieval- augmented generation
Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, and Liang Zhao. Grag: Graph retrieval- augmented generation. InFindings of the Association for Computational Linguistics: NAACL 2025, pp. 4145–4157, 2025b. Glen Jeh and Jennifer Widom. Scaling personalized web search. InProceedings of the 12th international conference on World Wide Web, pp. 271–279,
work page 2025
-
[8]
Dense passage retrieval for open-domain question answering
Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. InProceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pp. 6769–6781,
work page 2020
-
[9]
Mufei Li, Siqi Miao, and Pan Li. Simple is effective: The roles of graphs and large language models in knowledge-graph-based retrieval-augmented generation. InInternational Conference on Learning Representations, volume 2025, pp. 6061–6089,
work page 2025
-
[10]
Understanding R1-Zero-Like Training: A Critical Perspective
Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, and Min Lin. Understanding r1-zero-like training: A critical perspective.arXiv preprint arXiv:2503.20783,
work page internal anchor Pith review Pith/arXiv arXiv
-
[11]
Reasoning on graphs: Faithful and inter- pretable large language model reasoning
Linhao Luo, Yuan-Fang Li, Reza Haffari, and Shirui Pan. Reasoning on graphs: Faithful and inter- pretable large language model reasoning. InInternational Conference on Learning Representations, volume 2024, pp. 14400–14423,
work page 2024
-
[12]
Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin
doi: 10.1007/BF01588971. Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. Document ranking with a pretrained sequence-to-sequence model. In Trevor Cohn, Yulan He, and Yang Liu (eds.),Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 708–718, Online, November
-
[13]
doi: 10.18653/v1/2020.findings-emnlp.63
Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.63. URLhttps: //aclanthology.org/2020.findings-emnlp.63/. OpenAI. New and improved embedding model. https://openai.com/index/ new-and-improved-embedding-model/,
-
[14]
Accessed: 2026-05-07. Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, and Jonathan Halcrow. Let your graph do the talking: Encoding structured data for LLMs,
work page 2026
-
[15]
Sentence-bert: Sentence embeddings using siamese bert-networks
Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11
work page 2019
-
[16]
Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks
URLhttps://arxiv.org/abs/1908.10084. Nils Reimers and Iryna Gurevych. Making monolingual sentence embeddings multilingual using knowledge distillation. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11
work page internal anchor Pith review Pith/arXiv arXiv 1908
-
[17]
URLhttps://arxiv. org/abs/2004.09813. Nils Reimers and Iryna Gurevych. The curse of dense low-dimensional information retrieval for large index sizes. InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 605–611, Online, 8
-
[18]
URL https://arxiv.org/abs/2012.14210
Association for Computational Linguistics. URL https://arxiv.org/abs/2012.14210. Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. Bpr: Bayesian personalized ranking from implicit feedback,
-
[19]
DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models
Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300,
work page internal anchor Pith review Pith/arXiv arXiv
-
[20]
Hamed Shirzad, Honghao Lin, Ameya Velingker, Balaji Venkatachalam, David Woodruff, and Danica Sutherland. A theory for compressibility of graph transformers for transductive learning.arXiv preprint arXiv:2411.13028, 2024a. Hamed Shirzad, Honghao Lin, Balaji Venkatachalam, Ameya Velingker, David P Woodruff, and Danica J Sutherland. Even sparser graph trans...
-
[21]
Nandan Thakur, Nils Reimers, Johannes Daxenberger, and Iryna Gurevych. Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 296–310, Online, June 2021a. As...
work page 2021
-
[22]
Qa-gnn: Reasoning with language models and knowledge graphs for question answering
MichihiroYasunaga, HongyuRen, AntoineBosselut, PercyLiang, andJureLeskovec. Qa-gnn: Reasoning with language models and knowledge graphs for question answering. InProceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 535–546,
work page 2021
-
[23]
Graphrag-r1: Graph retrieval-augmented generation with process-constrained reinforcement learning
Chuanyue Yu, Kuo Zhao, Yuhan Li, Heng Chang, Mingjian Feng, Xiangzhe Jiang, Yufei Sun, Jia Li, Yuzhi Zhang, Qingyun Sun, et al. Graphrag-r1: Graph retrieval-augmented generation with process-constrained reinforcement learning. InProceedings of the ACM Web Conference 2026, pp. 1398–1409,
work page 2026
-
[24]
Stochastic retrieval-conditioned reranking
Hamed Zamani, Michael Bendersky, Donald Metzler, Honglei Zhuang, and Xuanhui Wang. Stochastic retrieval-conditioned reranking. InProceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR ’22, pp. 81–91. Association for Computing Machinery,
work page 2022
-
[25]
Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna
doi: 10.1145/3539813.3545141. Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. Graphsaint: Graph sampling based inductive learning method.arXiv preprint arXiv:1907.04931,
-
[26]
Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models
Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. Qwen3 embedding: Advancing text embedding and reranking through foundation models.arXiv preprint arXiv:2506.05176,
work page internal anchor Pith review Pith/arXiv arXiv
-
[27]
Knowledge graph-guided retrieval augmented generation
Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, and Wei Hu. Knowledge graph-guided retrieval augmented generation. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 8912–8924,
work page 2025
-
[28]
Appendix A Dataset Details We evaluateSeedERon the three semi-structured retrieval benchmarks introduced by STARK (Wu et al., 2024): STARK-PRIME, STARK-MAG, and STARK-AMAZON. Each benchmark is built on top of a semi-structured knowledge base (SKB) that combines textual information associated with entities and typed relational information between entities....
work page 2024
-
[29]
Thus, unrolling the recursion and applyingQ i(1 −a i) ≥ 1 −P i ai for ai ∈[0,1], we have E|AL| ≥k L L−1Y ℓ=0 1− 1 2(k−1)k ℓ−L ≥k L − 1 2(k−1) L−1X ℓ=0 kℓ =k L − 1 2(k−1) kL −1 k−1 = 1 2(kL + 1)> 1 2 kL. B.1.1 Success of local, iterative rules Theorem B.5.Take any relation-tracing graphG. Assign the binary features xs = bin(σ1(s)) · · · bin(σk(s)) ...
work page 2014
-
[30]
over a textual graph, we implement only its non-finetuned retrieval stage. Following Yu et al. (2025), for each query we retrieve dense seed nodes using the existing node embeddings, construct a bounded 2-hop ego graph, assign node prizes from node- query similarity and edge prizes from head, relation, and tail similarity, and solve the PCST objective usi...
work page 2025
-
[31]
SFT.SFT is a supervised fine-tuning baseline built on top of the ToG-style LLM search agent
and GPT-4o. SFT.SFT is a supervised fine-tuning baseline built on top of the ToG-style LLM search agent. The agent is fine-tuned on valid retrieval trajectories so that it learns to imitate graph-search decisions that lead to relevant nodes. In the setup of Yu et al. (2025), SFT uses LLaMA3-8B-Instruct as the backbone and applies LoRA fine-tuning (Hu et al.,
work page 2025
-
[32]
for efficiency. PRM.The Process Reward Model (PRM) baseline follows the step-wise supervision paradigm of Lightman et al. (2023). In contrast to standard supervised fine-tuning, which imitates complete successful trajectories, PRM learns a reward model that scores intermediate state-action pairs during graph search. Given a retrieval statest and a candida...
work page 2023
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.