Retrieval models aren’t tool-savvy: Benchmarking tool retrieval for large language models
3 Pith papers cite this work. Polarity classification is still indexing.
citation summary
years: 2026 (3)
verdicts: UNVERDICTED (3)
roles: background (1)
polarities: unclear (1)
citing papers explorer
- SkillRet: A Large-Scale Benchmark for Skill Retrieval in LLM Agents
  SkillRet benchmark shows fine-tuned retrievers improve NDCG@10 by 13+ points over prior models on large-scale skill retrieval for LLM agents.
- FitText: Evolving Agent Tool Ecologies via Memetic Retrieval
  FitText embeds memetic evolutionary retrieval inside the agent's reasoning loop to iteratively refine pseudo-tool descriptions, raising retrieval rank from 8.81 to 2.78 on ToolRet and pass rate to 0.73 on StableToolBench.
- Complete Cyclic Subtask Graphs for Tool-Using LLM Agents: Flexibility, Cost, and Bottlenecks in Multi-Agent Workflows
  Complete cyclic subtask graphs offer a lens to measure when multi-agent revisitation aids recovery and exploration versus when it increases costs or is dominated by other bottlenecks in LLM agent workflows.