One Tool Is Enough: Reinforcement Learning for Repository-Level LLM Agents

Deguo Xia; Jiahui Liang; Jiyan He; Jizhou Huang; Kun Liang; Weikang Li; Yanzhi Zhang; Yiming Xu; Yitong Duan; Yunfang Wu

arxiv: 2512.20957 · v6 · pith:JA66AEK5new · submitted 2025-12-24 · 💻 cs.SE · cs.AI

Zhaoxi Zhang, Yitong Duan, Yanzhi Zhang, Yiming Xu, Zhixiang Wang, Kun Liang, Weikang Li, Jiahui Liang, Deguo Xia, Jizhou Huang, Jiyan He, Yunfang Wu

keywords: model, tool, RepoNavigator, repository-level, closed-source, code, execution, learning

Locating files and functions requiring modification in large software repositories is challenging due to their scale and structural complexity. Existing LLM-based methods typically treat this as a repository-level retrieval task and rely on multiple auxiliary tools, which often overlook code execution logic and complicate model control. We propose RepoNavigator, an LLM agent equipped with a single execution-aware tool: jumping to the definition of an invoked symbol. This unified design reflects the actual flow of code execution while simplifying tool manipulation. RepoNavigator is trained end-to-end via Reinforcement Learning (RL) directly from a base pretrained model, without relying on closed-source distillation. Experiments demonstrate that RL-trained RepoNavigator achieves state-of-the-art performance, with the 7B model outperforming 14B baselines, the 14B model surpassing 32B competitors, and the 32B model exceeding closed-source models such as GPT-5 on most metrics. These results confirm that integrating a single, structurally grounded tool with RL training provides an efficient and scalable solution for repository-level issue localization.
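
The single tool described in the abstract, jumping to the definition of an invoked symbol, can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes a toy in-memory repository (`REPO`), resolves symbols by name only using Python's standard `ast` module, and the function names `jump_to_definition` and `invoked_symbols` are hypothetical. A real tool would more plausibly rely on a language server or static analyzer to resolve imports and scopes.

```python
import ast

def jump_to_definition(sources, symbol):
    """Return (path, line, source) for every def/class named `symbol`.

    Assumption: matching is by bare name only, with no scope or import resolution.
    """
    hits = []
    for path, text in sorted(sources.items()):
        for node in ast.walk(ast.parse(text)):
            is_def = isinstance(
                node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)
            )
            if is_def and node.name == symbol:
                hits.append((path, node.lineno, ast.get_source_segment(text, node)))
    return hits

def invoked_symbols(snippet):
    """List the names called inside a snippet: candidates for the next jump."""
    names = []
    for node in ast.walk(ast.parse(snippet)):
        if isinstance(node, ast.Call):
            func = node.func
            if isinstance(func, ast.Name):
                name = func.id
            elif isinstance(func, ast.Attribute):
                name = func.attr
            else:
                continue
            if name not in names:
                names.append(name)
    return names

# Hypothetical two-file repository used only for illustration.
REPO = {
    "app/main.py": "from app.util import parse\n\ndef run(raw):\n    return parse(raw) + 1\n",
    "app/util.py": "def parse(raw):\n    return int(raw.strip())\n",
}

# Start at an entry point, then follow the call it makes.
path, line, body = jump_to_definition(REPO, "run")[0]
next_symbol = invoked_symbols(body)[0]
target = jump_to_definition(REPO, next_symbol)[0]
print(path, line, "->", target[0], target[1])
```

In this sketch an agent would alternate the two steps, reading a definition and then jumping to one of the symbols it invokes, so that navigation follows the call chain rather than a keyword search over the whole repository.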

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Generate, Filter, Control, Replay: A Comprehensive Survey of Rollout Strategies for LLM Reinforcement Learning
cs.LG 2026-04 unverdicted novelty 7.0

This survey introduces the Generate-Filter-Control-Replay (GFCR) taxonomy to structure rollout pipelines for RL-based post-training of reasoning LLMs.

LARGER: Lexically Anchored Repository Graph Exploration and Retrieval
cs.IR 2026-05 unverdicted novelty 5.0

LARGER boosts file localization accuracy for repository-level coding agents by integrating lexically anchored graph expansion directly into standard search loops, yielding gains of up to 13.9 points on LocBench.