ToolRosella: Translating Code Repositories into Standardized Tools for Scientific Agents

Shimin Di, Xujie Yuan, Hanghui Guo, Chaoqian Ouyang, Yongxu Liu, Ling Yue, Zhangze Chen, Libin Zheng, Jia Zhu, Shaowu Pan, Jian Yin, Yong Rui, Min-Ling Zhang

classification cs.SE cs.CE cs.MA

keywords tools, repositories, scientific, toolrosella, agent, code, curated, invoke

Large Language Model (LLM)-based agent systems are increasingly used for scientific tasks, yet their practical capability remains constrained by the narrow scope of manually curated tools they can invoke. Much scientific computational functionality already exists in open-source code repositories, but these resources remain difficult to standardize, operationalize, and invoke reliably for agent use. Here we present ToolRosella, a framework that automatically transforms heterogeneous scientific code repositories into standardized, agent-invocable tools. ToolRosella combines repository analysis, tool interface construction, execution testing, and iterative repair to address the problem of repository-to-tool standardization. Across 122 GitHub repositories spanning 35 subdisciplines in six domains, ToolRosella reaches a 61.5% repository conversion success rate after iterative repair, with a 4.4× speedup over human engineers. The resulting 1,580 callable tools support a downstream task success rate of 84.0% and improve performance when integrated into other agent frameworks, particularly on tasks whose required tools are absent from fixed, curated inventories.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs
cs.CV 2026-04 unverdicted novelty 7.0

RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.

FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification
cs.AI 2026-04 conditional novelty 7.0

FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.

SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources
cs.AI 2026-04 unverdicted novelty 7.0

SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.