pith. machine review for the scientific record.

arxiv: 2603.09290 · v3 · submitted 2026-03-10 · 💻 cs.SE · cs.CE · cs.MA

Recognition: no theorem link

ToolRosella: Translating Code Repositories into Standardized Tools for Scientific Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 13:52 UTC · model grok-4.3

classification 💻 cs.SE · cs.CE · cs.MA
keywords ToolRosella · scientific agents · LLM tools · code repositories · tool standardization · repository conversion · agent frameworks · scientific computing

The pith

ToolRosella converts scientific code repositories into standardized, agent-invocable tools with 61.5 percent success after repair.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LLM-based agents for scientific work are limited by the small number of manually built tools available to them, while large amounts of useful code sit in open repositories that are hard to make reliable and callable. ToolRosella tackles this gap by running automated repository analysis, building tool interfaces, testing execution, and applying iterative repairs until the code becomes usable by agents. Tested on 122 GitHub repositories spanning 35 subdisciplines in six domains, the system reaches 61.5 percent conversion success after repairs, produces 1,580 callable tools, runs 4.4 times faster than human engineers, and supports 84 percent success on downstream tasks. These tools also raise performance when added to other agent frameworks, especially on problems whose needed functions are missing from existing fixed tool sets. The result matters because it offers a way to expand what agents can do in science without requiring constant human curation of every new capability.

Core claim

ToolRosella is a framework that automatically transforms heterogeneous scientific code repositories into standardized, agent-invocable tools through the combination of repository analysis, tool interface construction, execution testing, and iterative repair. Across 122 GitHub repositories covering 35 subdisciplines in six domains, it achieves a 61.5 percent repository conversion success rate after iterative repair at 4.4 times the speed of human engineers, yielding 1,580 callable tools that deliver an 84.0 percent success rate on downstream tasks and improve results when integrated into other agent frameworks, particularly where required tools are absent from curated inventories.

What carries the argument

The ToolRosella pipeline of repository analysis, interface construction, execution testing, and iterative repair that standardizes code into agent-callable tools.
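As a reading aid, the loop this pipeline describes can be sketched as below. Every function, field, and repository name here is a hypothetical stand-in for illustration, not ToolRosella's actual API:

```python
# Minimal sketch of a repository-to-tool conversion loop in the style the
# paper describes. All names are invented stand-ins, not ToolRosella's API.

def convert_repository(repo, analyze, build_interfaces, execute_test, repair,
                       max_repairs=3):
    """Analyze a repo, build tool interfaces, test them, repair failures."""
    tools = build_interfaces(analyze(repo))          # stages 1-2
    for _ in range(max_repairs):
        failing = [t for t in tools if not execute_test(t)]  # stage 3
        if not failing:
            break
        tools = [repair(t) if t in failing else t for t in tools]  # stage 4
    return [t for t in tools if execute_test(t)]     # only callable tools survive

# Toy usage: tools are dicts; a tool "passes" once its bug counter hits zero.
demo_tools = [{"name": "fit_solver", "bugs": 2}, {"name": "plot_grid", "bugs": 0}]
result = convert_repository(
    repo="github.com/example/sci-repo",
    analyze=lambda r: demo_tools,
    build_interfaces=lambda a: [dict(t) for t in a],
    execute_test=lambda t: t["bugs"] == 0,
    repair=lambda t: {**t, "bugs": t["bugs"] - 1},
)
print([t["name"] for t in result])  # → ['fit_solver', 'plot_grid']
```

The sketch makes one structural point concrete: conversion success is defined by passing execution tests after a bounded number of repair rounds, which is exactly the definition the referee report below probes.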

If this is right

  • Agents can draw on far larger sets of scientific functionality without manual tool creation for each repository.
  • Task success rates rise on problems that need tools missing from fixed, hand-curated inventories.
  • Human engineering time for tool standardization drops by a factor of roughly 4.4.
  • The produced tools integrate directly into multiple existing agent frameworks and raise their performance.
  • The same conversion process applies across many scientific domains and subdisciplines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Wider use could reduce the bottleneck of tool curation and support more autonomous scientific workflows.
  • The method might extend to non-scientific code bases for general-purpose agents.
  • Additional domain checks could be layered on to catch subtle functionality losses the current tests miss.
  • Scaling to thousands more repositories would test whether the 61.5 percent rate holds outside the evaluated set.

Load-bearing premise

That automatic analysis, interface building, testing, and repair can turn varied scientific code into reliable tools without losing original functionality or introducing new errors across many domains.

What would settle it

Apply ToolRosella to a fresh collection of repositories not included in the original 122 and check whether the conversion success rate remains near 61.5 percent while the resulting tools preserve the same computational outputs as the source code.
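One way such an output-preservation check could look, offered as an illustrative sketch rather than a protocol the paper specifies, is a numerical tolerance comparison between the original code path and the converted tool:

```python
import math

# Hypothetical output-preservation check: run the original code path and the
# converted tool on the same inputs and compare numerically. The function
# names are invented for this sketch.

def outputs_match(original_fn, tool_fn, test_inputs, rel_tol=1e-9):
    """True if the converted tool reproduces the original outputs."""
    for x in test_inputs:
        if not math.isclose(original_fn(x), tool_fn(x), rel_tol=rel_tol):
            return False
    return True

# Toy check: a faithful conversion passes; a lossy one (precision-truncating)
# fails at a tight tolerance.
original = lambda x: math.sqrt(x)
faithful = lambda x: x ** 0.5
lossy = lambda x: round(math.sqrt(x), 3)

inputs = [2.0, 10.0, 123.456]
print(outputs_match(original, faithful, inputs))  # True
print(outputs_match(original, lossy, inputs))     # False
```

A check of this shape would distinguish "runs without crashing" from "preserves the computation", which is the gap the referee's first major comment identifies.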

read the original abstract

Large Language Model (LLM)-based agent systems are increasingly used for scientific tasks, yet their practical capability remains constrained by the narrow scope of manually curated tools they can invoke. Much scientific computational functionality already exists in open-source code repositories, but these resources remain difficult to standardize, operationalize, and invoke reliably for agent use. Here we present ToolRosella, a framework that automatically transforms heterogeneous scientific code repositories into standardized, agent-invocable tools. ToolRosella combines repository analysis, tool interface construction, execution testing, and iterative repair to address the problem of repository-to-tool standardization. Across 122 GitHub repositories spanning 35 subdisciplines in six domains, ToolRosella reaches a 61.5% repository conversion success rate after iterative repair, with a 4.4× speedup over human engineers. The resulting 1,580 callable tools support a downstream task success rate of 84.0% and improve performance when integrated into other agent frameworks, particularly on tasks whose required tools are absent from fixed, curated inventories.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ToolRosella, a framework that automatically converts heterogeneous scientific code repositories into standardized, agent-invocable tools via repository analysis, interface construction, execution testing, and iterative repair. Evaluated on 122 GitHub repositories spanning 35 subdisciplines in six domains, it reports a 61.5% conversion success rate after repair, a 4.4x speedup over human engineers, 1,580 resulting callable tools, an 84.0% downstream task success rate, and performance gains when integrated into other agent frameworks.

Significance. If the conversion process reliably preserves original functionality, the work would meaningfully expand the tool inventory available to LLM-based scientific agents beyond manually curated sets, with the reported empirical scale (122 repositories, 1,580 tools) and downstream improvements providing concrete evidence of practical impact. The concrete success rates and speedup measurements are strengths that support the central claim of scalable standardization.

major comments (2)
  1. [Evaluation and Results] The definition of repository conversion success (61.5% after iterative repair) relies on execution testing that checks for non-crashing behavior on example inputs, but the manuscript provides no details on verification of semantic equivalence for scientific outputs such as numerical accuracy, solver results, or side effects. This assumption is load-bearing for the downstream 84.0% task success claim and the assertion that the 1,580 tools preserve original repository functionality without introducing silent errors.
  2. [Results] The 4.4x speedup over human engineers is presented as a direct empirical outcome, yet the manuscript lacks a precise description of the human baseline protocol, task scope, and measurement methodology (e.g., time per repository, expertise level), making it difficult to assess whether the comparison fairly supports the efficiency claim.
minor comments (2)
  1. The abstract and results would benefit from explicit reference to any supplementary material containing the full error analysis or repair logs to allow readers to evaluate the iterative repair process.
  2. Notation for tool interface standardization (e.g., how function signatures and parameter mappings are formalized) could be clarified with a small example in the methods section for reproducibility.
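The kind of small example the second minor comment asks for might look like the following JSON-style interface record; the field names and the example tool are invented here for illustration and are not ToolRosella's actual schema:

```python
import json

# Hypothetical tool-interface record: one way a repository function could be
# mapped to a standardized, agent-invocable signature. Every field name and
# value here is illustrative, not ToolRosella's actual format.
tool_spec = {
    "tool_name": "compute_band_gap",
    "source": {"repo": "github.com/example/dft-utils",
               "symbol": "dft.gap.band_gap"},
    "parameters": {
        "structure_file": {"type": "string",
                           "description": "Path to a crystal-structure file"},
        "functional": {"type": "string", "enum": ["PBE", "HSE06"],
                       "default": "PBE"},
    },
    "returns": {"type": "number", "unit": "eV"},
}

# Serialized form an agent framework could register and invoke against.
print(json.dumps(tool_spec, indent=2))
```

A record of this shape fixes the function signature and parameter mapping explicitly, which is what the reproducibility concern in the comment turns on.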

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we provide point-by-point responses to the major comments and indicate the revisions made.

read point-by-point responses
  1. Referee: [Evaluation and Results] The definition of repository conversion success (61.5% after iterative repair) relies on execution testing that checks for non-crashing behavior on example inputs, but the manuscript provides no details on verification of semantic equivalence for scientific outputs such as numerical accuracy, solver results, or side effects. This assumption is load-bearing for the downstream 84.0% task success claim and the assertion that the 1,580 tools preserve original repository functionality without introducing silent errors.

    Authors: We agree that the conversion success metric is defined via execution testing for non-crashing behavior on example inputs and does not include explicit verification of semantic equivalence such as numerical accuracy, solver outputs, or side effects. The manuscript therefore does not claim or demonstrate full semantic preservation beyond operational executability. The reported 84.0% downstream task success rate provides indirect support that the tools function usefully in agent workflows, but we acknowledge this does not substitute for direct equivalence checks. In the revision we have added an explicit limitations paragraph clarifying the evaluation scope and noting the difficulty of general semantic equivalence testing across heterogeneous scientific codes. revision: partial

  2. Referee: [Results] The 4.4x speedup over human engineers is presented as a direct empirical outcome, yet the manuscript lacks a precise description of the human baseline protocol, task scope, and measurement methodology (e.g., time per repository, expertise level), making it difficult to assess whether the comparison fairly supports the efficiency claim.

    Authors: We accept that the original manuscript did not supply sufficient detail on the human baseline. We have revised the relevant results section to describe the protocol: two graduate students with domain expertise each processed a disjoint subset of the 122 repositories; timing began at repository checkout and ended when a working standardized tool interface was produced; the task scope matched the automated pipeline exactly, including interface design and basic testing. This added description allows readers to evaluate the fairness of the 4.4x comparison. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results from external repositories

full rationale

The paper presents ToolRosella as an empirical system for repository-to-tool conversion, with all reported metrics (61.5% success rate, 4.4x speedup, 84.0% downstream task success) obtained directly from execution testing on 122 external GitHub repositories across domains. No equations, parameter fitting, self-citations, or uniqueness theorems appear in the provided text to derive these outcomes; the evaluation chain relies on independent test inputs and observed behavior rather than reducing to fitted inputs or self-referential definitions. The evaluation is therefore anchored in external benchmarks rather than in the paper's own constructs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about the feasibility of automatic standardization and repair for scientific code; no free parameters or invented entities are evident from the abstract.

axioms (2)
  • domain assumption Heterogeneous scientific code repositories can be automatically analyzed and transformed into standardized agent-invocable tools without substantial loss of original functionality.
    This assumption underpins the entire conversion pipeline and reported success rate.
  • domain assumption Iterative repair processes can resolve execution and interface issues across diverse codebases in a reliable manner.
    Required to reach the 61.5% success rate after initial conversion attempts.
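As a back-of-the-envelope illustration of how the second axiom interacts with the reported success rate (the per-round numbers below are invented for illustration, not taken from the paper):

```python
# Illustrative arithmetic, not the paper's data: if first-pass conversion
# succeeds with probability p0 and each of k repair rounds independently
# fixes a remaining failure with probability q, the overall rate compounds as
#   p = p0 + (1 - p0) * (1 - (1 - q)**k)
p0 = 0.40   # hypothetical first-pass conversion rate
q = 0.15    # hypothetical per-round repair success on remaining failures
k = 3       # hypothetical number of repair rounds

overall = p0 + (1 - p0) * (1 - (1 - q) ** k)
print(f"{overall:.3f}")  # → 0.632
```

The point of the sketch is that a headline figure like 61.5% is jointly determined by first-pass conversion and per-round repair reliability, so the axiom must hold across rounds, not just once.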

pith-pipeline@v0.9.0 · 5525 in / 1360 out tokens · 47362 ms · 2026-05-15T13:52:15.104433+00:00 · methodology

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs

    cs.CV 2026-04 unverdicted novelty 7.0

    RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.

  2. FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification

    cs.AI 2026-04 conditional novelty 7.0

    FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.

  3. SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

    cs.AI 2026-04 unverdicted novelty 7.0

    SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · cited by 3 Pith papers
