pith. sign in

arxiv: 2605.23753 · v1 · pith:LIIAJOL3new · submitted 2026-05-22 · 💻 cs.LG

SeedER: Seed-and-Expand Retrieval from Knowledge Graphs

Pith reviewed 2026-05-25 04:52 UTC · model grok-4.3

classification 💻 cs.LG
keywords knowledge graph retrievalseed and expandreinforcement learningcompositional queriesdense retrievalgraph explorationfirst-stage retrieval
0
0 comments X

The pith

SeedER retrieves from knowledge graphs by seeding a compact node set then expanding it with a learned reinforcement policy.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that dense embedding retrieval alone fails on multi-hop compositional queries in knowledge graphs because it ignores structure, while full ego-graph expansion grows too fast for practical use. SeedER addresses this by first retrieving a small seed set with lightweight dense and entity methods, then using a graph-aware policy trained via reinforcement learning to decide which neighbors to add at each step. This decomposition turns global retrieval into reusable local decisions that control cost while recovering relevant nodes. The authors argue the result is higher recall at small candidate sizes compared to strong baselines, positioning SeedER as an efficient first-stage retriever for downstream reasoning.

Core claim

SeedER first seeds a compact set of core nodes using lightweight dense and entity-based retrieval, then selectively expands this set via a learned graph-aware policy trained with reinforcement learning. This design decomposes global reasoning into reusable local decisions, enabling efficient discovery of query-relevant nodes while tightly controlling expansion cost.

What carries the argument

The seed-and-expand retrieval process that trains a graph-aware policy with reinforcement learning to make low-cost local expansion decisions from an initial dense seed set.

If this is right

  • Recall improves over dense retrieval and graph-augmented baselines while keeping candidate sets small.
  • The method serves as an effective first-stage retriever for knowledge-intensive reasoning systems.
  • Theoretical analysis shows dense methods have inherent limits on compositional graph queries that the expansion step can mitigate.
  • The approach yields advantages from both compositional generalization and graph-constrained submodular optimization views.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same seeding-plus-local-policy pattern might reduce search cost in other irregular structured data such as molecular graphs or citation networks.
  • If the policy can be made query-agnostic at inference time, the method could support real-time retrieval without per-query retraining.
  • Downstream systems that rely on retrieved subgraphs might see accuracy gains if the compact sets preserve more relational paths than embedding-only methods.

Load-bearing premise

The learned expansion policy generalizes from training data to produce query-relevant nodes on unseen compositional queries without the set growing too large.

What would settle it

A test set of new multi-hop queries where the policy either misses more than a baseline fraction of relevant nodes or produces candidate sets larger than the reported compact sizes.

Figures

Figures reproduced from arXiv: 2605.23753 by Danica J. Sutherland, Dominique Beaini, Emmanuel Noutahi, Frederik Wenkel, Hamed Shirzad.

Figure 1
Figure 1. Figure 1: Overview of the SeedER pipeline. Given a query, SeedER first retrieves a compact set of core nodes, then learns to selectively expand their multi-hop neighborhood using an RL-guided graph policy. The final candidate nodes are ranked with a GNN-based scoring head. mismatch is not merely empirical, but fundamental: there exist reasonable families of KGs and queries for which dense retrieval requires feature … view at source ↗
Figure 2
Figure 2. Figure 2: Illustration of multi-hop neighborhood sampling. Starting from a query node, the policy [PITH_FULL_IMAGE:figures/full_fig_p005_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Expansion rates of naive multi-hop neighborhoods and retrieval performance of [PITH_FULL_IMAGE:figures/full_fig_p006_3.png] view at source ↗
Figure 4
Figure 4. Figure 4 [PITH_FULL_IMAGE:figures/full_fig_p009_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: extends the k-hop expansion analysis from [PITH_FULL_IMAGE:figures/full_fig_p026_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Dense retrieval versus budgeted K-hop-with-filtering across datasets and metrics. Each row corresponds to a STARK dataset and each column to a retrieval metric. The curves compare pure dense retrieval with the same initial dense seeds followed by budgeted frontier expansion. Filtering over local graph neighborhoods improves coverage-oriented metrics most clearly on STARK￾PRIME and STARK-MAG, while gains on… view at source ↗
Figure 7
Figure 7. Figure 7: RL training stability across random seeds on STARK-PRIME. Curves show training loss, supervised BPR loss, RL loss, training reward, test Recall@20, and validation Recall@20 across ten random seeds. The BPR loss decreases consistently, and reward and recall improve early. and often matches or exceeds filtered expansion on Hit@Any, while Recall@Any remains competitive for the filtered method, this limits the… view at source ↗
read the original abstract

Knowledge graphs (KGs) offer a rich representation for relational knowledge, but their irregular structure makes retrieval challenging: ego-graph expansion grows rapidly, and dense embedding methods struggle with multi-hop compositional queries. Existing agent-based graph exploration approaches, while expressive, are often too expensive for large-scale retrieval. We introduce SeedER (Seed-and-Expand Retrieval), a retrieval framework that explicitly leverages KG structure through iterative, low-cost expansion. SeedER first seeds a compact set of core nodes using lightweight dense and entity-based retrieval, then selectively expands this set via a learned graph-aware policy trained with reinforcement learning. This design decomposes global reasoning into reusable local decisions, enabling efficient discovery of query-relevant nodes while tightly controlling expansion cost. We show theoretical limitations of dense retrieval on compositional graph queries, and establish advantages of SeedER from both compositional generalization and graph-constrained submodular optimization perspectives. Empirically, SeedER substantially improves recall with compact candidate sets over strong dense and graph-augmented baselines, making it an effective first-stage retriever for knowledge-intensive reasoning systems.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper introduces SeedER, a retrieval framework for knowledge graphs. It seeds a compact set of core nodes via lightweight dense and entity-based retrieval, then expands the set via a learned graph-aware policy trained with reinforcement learning on local decisions. The work claims to demonstrate theoretical limitations of dense retrieval on compositional graph queries, advantages from compositional generalization and graph-constrained submodular optimization perspectives, and substantial empirical recall gains with compact candidate sets over dense and graph-augmented baselines, positioning SeedER as an effective first-stage retriever.

Significance. If the empirical gains and policy generalization hold, SeedER could meaningfully advance efficient multi-hop retrieval over KGs by decomposing global reasoning into reusable local RL decisions while controlling expansion cost. The combination of seeding, RL expansion, and the stated theoretical perspectives would offer a practical alternative to both pure dense methods and expensive agent-based exploration.

major comments (2)
  1. [Abstract] Abstract and experimental sections: The central claim of substantial recall improvements with compact sets depends on the RL policy generalizing to unseen compositional queries without cost blowup. The manuscript asserts this but supplies no results on out-of-distribution generalization, policy overfitting to training query patterns, or behavior on higher-hop structures, leaving the weakest assumption unverified.
  2. [Theoretical analysis] Theoretical analysis section: The claimed advantages from both compositional generalization and graph-constrained submodular optimization are asserted, but the explicit linkage between the RL objective (local decisions) and the submodular objective is not derived or shown, so the connection to the stated perspectives remains unclear.
minor comments (1)
  1. [Abstract] The abstract uses qualitative phrasing ('substantially improves') without any quantitative anchors; adding even high-level numbers from the experiments would improve clarity.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We address the two major comments below, agreeing that additional clarification and experiments would strengthen the manuscript.

read point-by-point responses
  1. Referee: [Abstract] Abstract and experimental sections: The central claim of substantial recall improvements with compact sets depends on the RL policy generalizing to unseen compositional queries without cost blowup. The manuscript asserts this but supplies no results on out-of-distribution generalization, policy overfitting to training query patterns, or behavior on higher-hop structures, leaving the weakest assumption unverified.

    Authors: We agree that the current manuscript does not include dedicated experiments on out-of-distribution generalization of the RL policy, potential overfitting to training query patterns, or explicit scaling to higher-hop structures. While the local decision formulation is intended to promote reusability, these aspects remain unverified in the presented results. We will add a new subsection with OOD tests, higher-hop analysis, and overfitting diagnostics in the revised version. revision: yes

  2. Referee: [Theoretical analysis] Theoretical analysis section: The claimed advantages from both compositional generalization and graph-constrained submodular optimization are asserted, but the explicit linkage between the RL objective (local decisions) and the submodular objective is not derived or shown, so the connection to the stated perspectives remains unclear.

    Authors: The manuscript presents advantages separately from the compositional generalization view and from the graph-constrained submodular optimization view, with the RL policy serving as a learned approximation to local submodular decisions. However, an explicit derivation linking the local RL objective to the global submodular objective is not provided. We will revise the theoretical analysis section to include this derivation or a clearer statement of the relationship. revision: yes

Circularity Check

0 steps flagged

No circularity detected; derivation is self-contained

full rationale

The provided abstract and description introduce SeedER as a seed-then-expand framework with an RL-trained policy, theoretical limitations on dense retrieval, and empirical gains, but contain no equations, fitted parameters renamed as predictions, self-definitional constructs, or load-bearing self-citations that reduce claims to inputs by construction. Claims rest on external RL training, submodular optimization perspectives, and baselines rather than internal redefinition. This matches the default expectation of no significant circularity.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no specific free parameters, axioms, or invented entities can be identified from the provided text.

pith-pipeline@v0.9.0 · 5725 in / 989 out tokens · 17527 ms · 2026-05-25T04:52:06.478144+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · 8 internal anchors

  1. [1]

    Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F Yang, and Anton Tsitsulin

    doi: 10.1007/978-3-642-36973-5_36. Jialin Dong, Bahare Fatemi, Bryan Perozzi, Lin F Yang, and Anton Tsitsulin. Don’t forget to connect! improving rag with graph-based reranking.arXiv preprint arXiv:2405.18414,

  2. [2]

    The Llama 3 Herd of Models

    Aaron Grattafiori, Abhimanyu Dubey, Abhinav Jauhri, Abhinav Pandey, Abhishek Kadian, Ahmad Al-Dahle, Aiesha Letman, Akhil Mathur, Alan Schelten, Alex Vaughan, et al. The llama 3 herd of models.arXiv preprint arXiv:2407.21783,

  3. [3]

    LightRAG: Simple and Fast Retrieval-Augmented Generation

    Zirui Guo, Lianghao Xia, Yanhua Yu, Tian Ao, and Chao Huang. Lightrag: Simple and fast retrieval- augmented generation.arXiv preprint arXiv:2410.05779, 2(3),

  4. [4]

    From RAG to Memory: Non-Parametric Continual Learning for Large Language Models

    Bernal Jiménez Gutiérrez, Yiheng Shu, Weijian Qi, Sizhe Zhou, and Yu Su. From rag to memory: Non-parametric continual learning for large language models.arXiv preprint arXiv:2502.14802,

  5. [5]

    Retrieval-Augmented Generation with Graphs (GraphRAG)

    Haoyu Han, Yu Wang, Harry Shomer, Kai Guo, Jiayuan Ding, Yongjia Lei, Mahantesh Halappanavar, Ryan A Rossi, Subhabrata Mukherjee, Xianfeng Tang, et al. Retrieval-augmented generation with graphs (graphrag).arXiv preprint arXiv:2501.00309,

  6. [6]

    Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al

    doi: 10.1090/S0273-0979-06-01126-8. Edward J Hu, Yelong Shen, Phillip Wallis, Zeyuan Allen-Zhu, Yuanzhi Li, Shean Wang, Liang Wang, Weizhu Chen, et al. Lora: Low-rank adaptation of large language models.Iclr, 1(2):3,

  7. [7]

    Grag: Graph retrieval- augmented generation

    Yuntong Hu, Zhihan Lei, Zheng Zhang, Bo Pan, Chen Ling, and Liang Zhao. Grag: Graph retrieval- augmented generation. InFindings of the Association for Computational Linguistics: NAACL 2025, pp. 4145–4157, 2025b. Glen Jeh and Jennifer Widom. Scaling personalized web search. InProceedings of the 12th international conference on World Wide Web, pp. 271–279,

  8. [8]

    Dense passage retrieval for open-domain question answering

    Vladimir Karpukhin, Barlas Oguz, Sewon Min, Patrick Lewis, Ledell Wu, Sergey Edunov, Danqi Chen, and Wen-tau Yih. Dense passage retrieval for open-domain question answering. InProceedings of the 2020 conference on empirical methods in natural language processing (EMNLP), pp. 6769–6781,

  9. [9]

    Simple is effective: The roles of graphs and large language models in knowledge-graph-based retrieval-augmented generation

    Mufei Li, Siqi Miao, and Pan Li. Simple is effective: The roles of graphs and large language models in knowledge-graph-based retrieval-augmented generation. InInternational Conference on Learning Representations, volume 2025, pp. 6061–6089,

  10. [10]

    Understanding R1-Zero-Like Training: A Critical Perspective

    Zichen Liu, Changyu Chen, Wenjun Li, Penghui Qi, Tianyu Pang, Chao Du, Wee Sun Lee, and Min Lin. Understanding r1-zero-like training: A critical perspective.arXiv preprint arXiv:2503.20783,

  11. [11]

    Reasoning on graphs: Faithful and inter- pretable large language model reasoning

    Linhao Luo, Yuan-Fang Li, Reza Haffari, and Shirui Pan. Reasoning on graphs: Faithful and inter- pretable large language model reasoning. InInternational Conference on Learning Representations, volume 2024, pp. 14400–14423,

  12. [12]

    Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin

    doi: 10.1007/BF01588971. Rodrigo Nogueira, Zhiying Jiang, Ronak Pradeep, and Jimmy Lin. Document ranking with a pretrained sequence-to-sequence model. In Trevor Cohn, Yulan He, and Yang Liu (eds.),Findings of the Association for Computational Linguistics: EMNLP 2020, pp. 708–718, Online, November

  13. [13]

    doi: 10.18653/v1/2020.findings-emnlp.63

    Association for Computational Linguistics. doi: 10.18653/v1/2020.findings-emnlp.63. URLhttps: //aclanthology.org/2020.findings-emnlp.63/. OpenAI. New and improved embedding model. https://openai.com/index/ new-and-improved-embedding-model/,

  14. [14]

    Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, and Jonathan Halcrow

    Accessed: 2026-05-07. Bryan Perozzi, Bahare Fatemi, Dustin Zelle, Anton Tsitsulin, Mehran Kazemi, Rami Al-Rfou, and Jonathan Halcrow. Let your graph do the talking: Encoding structured data for LLMs,

  15. [15]

    Sentence-bert: Sentence embeddings using siamese bert-networks

    Nils Reimers and Iryna Gurevych. Sentence-bert: Sentence embeddings using siamese bert-networks. In Proceedings of the 2019 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11

  16. [16]

    Sentence-BERT: Sentence Embeddings using Siamese BERT-Networks

    URLhttps://arxiv.org/abs/1908.10084. Nils Reimers and Iryna Gurevych. Making monolingual sentence embeddings multilingual using knowledge distillation. InProceedings of the 2020 Conference on Empirical Methods in Natural Language Processing. Association for Computational Linguistics, 11

  17. [17]

    org/abs/2004.09813

    URLhttps://arxiv. org/abs/2004.09813. Nils Reimers and Iryna Gurevych. The curse of dense low-dimensional information retrieval for large index sizes. InProceedings of the 59th Annual Meeting of the Association for Computational Linguistics and the 11th International Joint Conference on Natural Language Processing (Volume 2: Short Papers), pp. 605–611, Online, 8

  18. [18]

    URL https://arxiv.org/abs/2012.14210

    Association for Computational Linguistics. URL https://arxiv.org/abs/2012.14210. Steffen Rendle, Christoph Freudenthaler, Zeno Gantner, and Lars Schmidt-Thieme. Bpr: Bayesian personalized ranking from implicit feedback,

  19. [19]

    DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models

    Zhihong Shao, Peiyi Wang, Qihao Zhu, Runxin Xu, Junxiao Song, Xiao Bi, Haowei Zhang, Mingchuan Zhang, YK Li, Yang Wu, et al. Deepseekmath: Pushing the limits of mathematical reasoning in open language models.arXiv preprint arXiv:2402.03300,

  20. [20]

    A theory for compressibility of graph transformers for transductive learning.arXiv preprint arXiv:2411.13028, 2024a

    Hamed Shirzad, Honghao Lin, Ameya Velingker, Balaji Venkatachalam, David Woodruff, and Danica Sutherland. A theory for compressibility of graph transformers for transductive learning.arXiv preprint arXiv:2411.13028, 2024a. Hamed Shirzad, Honghao Lin, Balaji Venkatachalam, Ameya Velingker, David P Woodruff, and Danica J Sutherland. Even sparser graph trans...

  21. [21]

    Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks

    Nandan Thakur, Nils Reimers, Johannes Daxenberger, and Iryna Gurevych. Augmented SBERT: Data augmentation method for improving bi-encoders for pairwise sentence scoring tasks. InProceedings of the 2021 Conference of the North American Chapter of the Association for Computational Linguistics: Human Language Technologies, pp. 296–310, Online, June 2021a. As...

  22. [22]

    Qa-gnn: Reasoning with language models and knowledge graphs for question answering

    MichihiroYasunaga, HongyuRen, AntoineBosselut, PercyLiang, andJureLeskovec. Qa-gnn: Reasoning with language models and knowledge graphs for question answering. InProceedings of the 2021 conference of the North American chapter of the association for computational linguistics: human language technologies, pp. 535–546,

  23. [23]

    Graphrag-r1: Graph retrieval-augmented generation with process-constrained reinforcement learning

    Chuanyue Yu, Kuo Zhao, Yuhan Li, Heng Chang, Mingjian Feng, Xiangzhe Jiang, Yufei Sun, Jia Li, Yuzhi Zhang, Qingyun Sun, et al. Graphrag-r1: Graph retrieval-augmented generation with process-constrained reinforcement learning. InProceedings of the ACM Web Conference 2026, pp. 1398–1409,

  24. [24]

    Stochastic retrieval-conditioned reranking

    Hamed Zamani, Michael Bendersky, Donald Metzler, Honglei Zhuang, and Xuanhui Wang. Stochastic retrieval-conditioned reranking. InProceedings of the 2022 ACM SIGIR International Conference on Theory of Information Retrieval, ICTIR ’22, pp. 81–91. Association for Computing Machinery,

  25. [25]

    Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna

    doi: 10.1145/3539813.3545141. Hanqing Zeng, Hongkuan Zhou, Ajitesh Srivastava, Rajgopal Kannan, and Viktor Prasanna. Graphsaint: Graph sampling based inductive learning method.arXiv preprint arXiv:1907.04931,

  26. [26]

    Qwen3 Embedding: Advancing Text Embedding and Reranking Through Foundation Models

    Yanzhao Zhang, Mingxin Li, Dingkun Long, Xin Zhang, Huan Lin, Baosong Yang, Pengjun Xie, An Yang, Dayiheng Liu, Junyang Lin, et al. Qwen3 embedding: Advancing text embedding and reranking through foundation models.arXiv preprint arXiv:2506.05176,

  27. [27]

    Knowledge graph-guided retrieval augmented generation

    Xiangrong Zhu, Yuexiang Xie, Yi Liu, Yaliang Li, and Wei Hu. Knowledge graph-guided retrieval augmented generation. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pp. 8912–8924,

  28. [28]

    Appendix A Dataset Details We evaluateSeedERon the three semi-structured retrieval benchmarks introduced by STARK (Wu et al., 2024): STARK-PRIME, STARK-MAG, and STARK-AMAZON. Each benchmark is built on top of a semi-structured knowledge base (SKB) that combines textual information associated with entities and typed relational information between entities....

  29. [29]

    connector

    Thus, unrolling the recursion and applyingQ i(1 −a i) ≥ 1 −P i ai for ai ∈[0,1], we have E|AL| ≥k L L−1Y ℓ=0 1− 1 2(k−1)k ℓ−L ≥k L − 1 2(k−1) L−1X ℓ=0 kℓ =k L − 1 2(k−1) kL −1 k−1 = 1 2(kL + 1)> 1 2 kL. B.1.1 Success of local, iterative rules Theorem B.5.Take any relation-tracing graphG. Assign the binary features xs =   bin(σ1(s)) · · · bin(σk(s))   ...

  30. [30]

    Following Yu et al

    over a textual graph, we implement only its non-finetuned retrieval stage. Following Yu et al. (2025), for each query we retrieve dense seed nodes using the existing node embeddings, construct a bounded 2-hop ego graph, assign node prizes from node- query similarity and edge prizes from head, relation, and tail similarity, and solve the PCST objective usi...

  31. [31]

    SFT.SFT is a supervised fine-tuning baseline built on top of the ToG-style LLM search agent

    and GPT-4o. SFT.SFT is a supervised fine-tuning baseline built on top of the ToG-style LLM search agent. The agent is fine-tuned on valid retrieval trajectories so that it learns to imitate graph-search decisions that lead to relevant nodes. In the setup of Yu et al. (2025), SFT uses LLaMA3-8B-Instruct as the backbone and applies LoRA fine-tuning (Hu et al.,

  32. [32]

    PRM.The Process Reward Model (PRM) baseline follows the step-wise supervision paradigm of Lightman et al

    for efficiency. PRM.The Process Reward Model (PRM) baseline follows the step-wise supervision paradigm of Lightman et al. (2023). In contrast to standard supervised fine-tuning, which imitates complete successful trajectories, PRM learns a reward model that scores intermediate state-action pairs during graph search. Given a retrieval statest and a candida...