Reusing knowledge-graph paths as intermediate supervision improves self-evolving search agents over standard Search Self-Play on seven QA benchmarks, by supplying relational context and graded waypoint rewards.
Generation uses a sampling temperature of 0.8, at most 2,048 tokens per response, and at most 10 search turns per trajectory; each search turn retrieves the top 3 documents.
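The generation settings above can be collected into a single configuration and enforced in a rollout loop. The sketch below is illustrative only: the four values come from the text, but the class name `SearchAgentGenConfig`, the `rollout` function, the `generate`/`retrieve` callable interfaces, and the stopping rule (stop when the policy issues no query or the turn cap is hit) are hypothetical assumptions, not the paper's implementation.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional, Tuple


@dataclass(frozen=True)
class SearchAgentGenConfig:
    """Hypothetical container for the reported generation settings."""
    temperature: float = 0.8      # sampling temperature
    max_new_tokens: int = 2048    # cap on tokens per response
    max_search_turns: int = 10    # cap on search turns per trajectory
    retrieval_top_k: int = 3      # documents retrieved per search turn


def rollout(
    question: str,
    generate: Callable[[str, float, int], Tuple[str, Optional[str]]],
    retrieve: Callable[[str, int], List[str]],
    cfg: SearchAgentGenConfig = SearchAgentGenConfig(),
) -> Tuple[str, int]:
    """Run one trajectory and return (final context, search turns used).

    Assumed interfaces (hypothetical):
      generate(context, temperature, max_new_tokens) -> (text, query or None)
      retrieve(query, k) -> list of passages
    """
    context = question
    turns = 0
    while True:
        text, query = generate(context, cfg.temperature, cfg.max_new_tokens)
        context += "\n" + text
        # Stop when the policy answers without searching, or the turn cap is reached.
        if query is None or turns >= cfg.max_search_turns:
            return context, turns
        # Keep at most top-k passages, even if the retriever returns more.
        docs = retrieve(query, cfg.retrieval_top_k)[: cfg.retrieval_top_k]
        context += "\n" + "\n".join(docs)
        turns += 1
```

With a stub policy that always requests a search, the loop performs exactly 10 retrievals and then forces a final response, so the 10-turn cap bounds trajectory length regardless of policy behavior.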
One Pith paper (cs.AI, 2026) cites this work.
Knowledge-Graph Paths as Intermediate Supervision for Self-Evolving Search Agents