pith. machine review for the scientific record.

arxiv: 2603.09290 · v3 · submitted 2026-03-10 · 💻 cs.SE · cs.CE · cs.MA

Recognition: no theorem link

ToolRosella: Translating Code Repositories into Standardized Tools for Scientific Agents

Authors on Pith: no claims yet

Pith reviewed 2026-05-15 13:52 UTC · model grok-4.3

classification 💻 cs.SE · cs.CE · cs.MA
keywords ToolRosella · scientific agents · LLM tools · code repositories · tool standardization · repository conversion · agent frameworks · scientific computing

The pith

ToolRosella converts scientific code repositories into standardized, agent-invocable tools with 61.5 percent success after repair.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

LLM-based agents for scientific work are limited by the small number of manually built tools available to them, while large amounts of useful code sit in open repositories that are hard to make reliable and callable. ToolRosella tackles this gap by running automated repository analysis, building tool interfaces, testing execution, and applying iterative repairs until the code becomes usable by agents. Tested on 122 GitHub repositories spanning 35 subdisciplines in six domains, the system reaches 61.5 percent conversion success after repairs, produces 1,580 callable tools, runs 4.4 times faster than human engineers, and supports 84 percent success on downstream tasks. These tools also raise performance when added to other agent frameworks, especially on problems whose needed functions are missing from existing fixed tool sets. The result matters because it offers a way to expand what agents can do in science without requiring constant human curation of every new capability.

Core claim

ToolRosella is a framework that automatically transforms heterogeneous scientific code repositories into standardized, agent-invocable tools through the combination of repository analysis, tool interface construction, execution testing, and iterative repair. Across 122 GitHub repositories covering 35 subdisciplines in six domains, it achieves a 61.5 percent repository conversion success rate after iterative repair at 4.4 times the speed of human engineers, yielding 1,580 callable tools that deliver an 84.0 percent success rate on downstream tasks and improve results when integrated into other agent frameworks, particularly where required tools are absent from curated inventories.

What carries the argument

The ToolRosella pipeline of repository analysis, interface construction, execution testing, and iterative repair that standardizes code into agent-callable tools.
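As a reading aid, the loop this pipeline describes can be sketched as below. Every function, field, and repository name here is a hypothetical stand-in for illustration, not ToolRosella's actual API:

```python
# Minimal sketch of a repository-to-tool conversion loop in the style the
# paper describes. All names are invented stand-ins, not ToolRosella's API.

def convert_repository(repo, analyze, build_interfaces, execute_test, repair,
                       max_repairs=3):
    """Analyze a repo, build tool interfaces, test them, repair failures."""
    tools = build_interfaces(analyze(repo))          # stages 1-2
    for _ in range(max_repairs):
        failing = [t for t in tools if not execute_test(t)]  # stage 3
        if not failing:
            break
        tools = [repair(t) if t in failing else t for t in tools]  # stage 4
    return [t for t in tools if execute_test(t)]     # only callable tools survive

# Toy usage: tools are dicts; a tool "passes" once its bug counter hits zero.
demo_tools = [{"name": "fit_solver", "bugs": 2}, {"name": "plot_grid", "bugs": 0}]
result = convert_repository(
    repo="github.com/example/sci-repo",
    analyze=lambda r: demo_tools,
    build_interfaces=lambda a: [dict(t) for t in a],
    execute_test=lambda t: t["bugs"] == 0,
    repair=lambda t: {**t, "bugs": t["bugs"] - 1},
)
print([t["name"] for t in result])  # → ['fit_solver', 'plot_grid']
```

The sketch makes one structural point concrete: conversion success is defined by passing execution tests after a bounded number of repair rounds, which is exactly the definition the referee report below probes.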

If this is right

  • Agents can draw on far larger sets of scientific functionality without manual tool creation for each repository.
  • Task success rates rise on problems that need tools missing from fixed, hand-curated inventories.
  • Human engineering time for tool standardization drops by a factor of roughly 4.4.
  • The produced tools integrate directly into multiple existing agent frameworks and raise their performance.
  • The same conversion process applies across many scientific domains and subdisciplines.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Wider use could reduce the bottleneck of tool curation and support more autonomous scientific workflows.
  • The method might extend to non-scientific code bases for general-purpose agents.
  • Additional domain checks could be layered on to catch subtle functionality losses the current tests miss.
  • Scaling to thousands more repositories would test whether the 61.5 percent rate holds outside the evaluated set.

Load-bearing premise

That automatic analysis, interface building, testing, and repair can turn varied scientific code into reliable tools without losing original functionality or introducing new errors across many domains.

What would settle it

Apply ToolRosella to a fresh collection of repositories not included in the original 122 and check whether the conversion success rate remains near 61.5 percent while the resulting tools preserve the same computational outputs as the source code.
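One way such an output-preservation check could look, offered as an illustrative sketch rather than a protocol the paper specifies, is a numerical tolerance comparison between the original code path and the converted tool:

```python
import math

# Hypothetical output-preservation check: run the original code path and the
# converted tool on the same inputs and compare numerically. The function
# names are invented for this sketch.

def outputs_match(original_fn, tool_fn, test_inputs, rel_tol=1e-9):
    """True if the converted tool reproduces the original outputs."""
    for x in test_inputs:
        if not math.isclose(original_fn(x), tool_fn(x), rel_tol=rel_tol):
            return False
    return True

# Toy check: a faithful conversion passes; a lossy one (precision-truncating)
# fails at a tight tolerance.
original = lambda x: math.sqrt(x)
faithful = lambda x: x ** 0.5
lossy = lambda x: round(math.sqrt(x), 3)

inputs = [2.0, 10.0, 123.456]
print(outputs_match(original, faithful, inputs))  # True
print(outputs_match(original, lossy, inputs))     # False
```

A check of this shape would distinguish "runs without crashing" from "preserves the computation", which is the gap the referee's first major comment identifies.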

read the original abstract

Large Language Model (LLM)-based agent systems are increasingly used for scientific tasks, yet their practical capability remains constrained by the narrow scope of manually curated tools they can invoke. Much scientific computational functionality already exists in open-source code repositories, but these resources remain difficult to standardize, operationalize, and invoke reliably for agent use. Here we present ToolRosella, a framework that automatically transforms heterogeneous scientific code repositories into standardized, agent-invocable tools. ToolRosella combines repository analysis, tool interface construction, execution testing, and iterative repair to address the problem of repository-to-tool standardization. Across 122 GitHub repositories spanning 35 subdisciplines in six domains, ToolRosella reaches a 61.5% repository conversion success rate after iterative repair, with a 4.4× speedup over human engineers. The resulting 1,580 callable tools support a downstream task success rate of 84.0% and improve performance when integrated into other agent frameworks, particularly on tasks whose required tools are absent from fixed, curated inventories.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The paper introduces ToolRosella, a framework that automatically converts heterogeneous scientific code repositories into standardized, agent-invocable tools via repository analysis, interface construction, execution testing, and iterative repair. Evaluated on 122 GitHub repositories spanning 35 subdisciplines in six domains, it reports a 61.5% conversion success rate after repair, a 4.4x speedup over human engineers, 1,580 resulting callable tools, an 84.0% downstream task success rate, and performance gains when integrated into other agent frameworks.

Significance. If the conversion process reliably preserves original functionality, the work would meaningfully expand the tool inventory available to LLM-based scientific agents beyond manually curated sets, with the reported empirical scale (122 repositories, 1,580 tools) and downstream improvements providing concrete evidence of practical impact. The concrete success rates and speedup measurements are strengths that support the central claim of scalable standardization.

major comments (2)
  1. [Evaluation and Results] The definition of repository conversion success (61.5% after iterative repair) relies on execution testing that checks for non-crashing behavior on example inputs, but the manuscript provides no details on verification of semantic equivalence for scientific outputs such as numerical accuracy, solver results, or side effects. This assumption is load-bearing for the downstream 84.0% task success claim and the assertion that the 1,580 tools preserve original repository functionality without introducing silent errors.
  2. [Results] The 4.4x speedup over human engineers is presented as a direct empirical outcome, yet the manuscript lacks a precise description of the human baseline protocol, task scope, and measurement methodology (e.g., time per repository, expertise level), making it difficult to assess whether the comparison fairly supports the efficiency claim.
minor comments (2)
  1. The abstract and results would benefit from explicit reference to any supplementary material containing the full error analysis or repair logs to allow readers to evaluate the iterative repair process.
  2. Notation for tool interface standardization (e.g., how function signatures and parameter mappings are formalized) could be clarified with a small example in the methods section for reproducibility.
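The kind of small example the second minor comment asks for might look like the following JSON-style interface record; the field names and the example tool are invented here for illustration and are not ToolRosella's actual schema:

```python
import json

# Hypothetical tool-interface record: one way a repository function could be
# mapped to a standardized, agent-invocable signature. Every field name and
# value here is illustrative, not ToolRosella's actual format.
tool_spec = {
    "tool_name": "compute_band_gap",
    "source": {"repo": "github.com/example/dft-utils",
               "symbol": "dft.gap.band_gap"},
    "parameters": {
        "structure_file": {"type": "string",
                           "description": "Path to a crystal-structure file"},
        "functional": {"type": "string", "enum": ["PBE", "HSE06"],
                       "default": "PBE"},
    },
    "returns": {"type": "number", "unit": "eV"},
}

# Serialized form an agent framework could register and invoke against.
print(json.dumps(tool_spec, indent=2))
```

A record of this shape fixes the function signature and parameter mapping explicitly, which is what the reproducibility concern in the comment turns on.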

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback on our manuscript. Below we provide point-by-point responses to the major comments and indicate the revisions made.

read point-by-point responses
  1. Referee: [Evaluation and Results] The definition of repository conversion success (61.5% after iterative repair) relies on execution testing that checks for non-crashing behavior on example inputs, but the manuscript provides no details on verification of semantic equivalence for scientific outputs such as numerical accuracy, solver results, or side effects. This assumption is load-bearing for the downstream 84.0% task success claim and the assertion that the 1,580 tools preserve original repository functionality without introducing silent errors.

    Authors: We agree that the conversion success metric is defined via execution testing for non-crashing behavior on example inputs and does not include explicit verification of semantic equivalence such as numerical accuracy, solver outputs, or side effects. The manuscript therefore does not claim or demonstrate full semantic preservation beyond operational executability. The reported 84.0% downstream task success rate provides indirect support that the tools function usefully in agent workflows, but we acknowledge this does not substitute for direct equivalence checks. In the revision we have added an explicit limitations paragraph clarifying the evaluation scope and noting the difficulty of general semantic equivalence testing across heterogeneous scientific codes. revision: partial

  2. Referee: [Results] The 4.4x speedup over human engineers is presented as a direct empirical outcome, yet the manuscript lacks a precise description of the human baseline protocol, task scope, and measurement methodology (e.g., time per repository, expertise level), making it difficult to assess whether the comparison fairly supports the efficiency claim.

    Authors: We accept that the original manuscript did not supply sufficient detail on the human baseline. We have revised the relevant results section to describe the protocol: two graduate students with domain expertise each processed a disjoint subset of the 122 repositories; timing began at repository checkout and ended when a working standardized tool interface was produced; the task scope matched the automated pipeline exactly, including interface design and basic testing. This added description allows readers to evaluate the fairness of the 4.4x comparison. revision: yes

Circularity Check

0 steps flagged

No circularity: empirical results from external repositories

full rationale

The paper presents ToolRosella as an empirical system for repository-to-tool conversion, with all reported metrics (61.5% success rate, 4.4x speedup, 84.0% downstream task success) obtained directly from execution testing on 122 external GitHub repositories across domains. No equations, parameter fitting, self-citations, or uniqueness theorems appear in the provided text to derive these outcomes; the evaluation chain relies on independent test inputs and observed behavior rather than reducing to fitted inputs or self-referential definitions. The evaluation is therefore anchored in external benchmarks rather than in the paper's own constructs.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 0 invented entities

The central claim rests on domain assumptions about the feasibility of automatic standardization and repair for scientific code; no free parameters or invented entities are evident from the abstract.

axioms (2)
  • domain assumption Heterogeneous scientific code repositories can be automatically analyzed and transformed into standardized agent-invocable tools without substantial loss of original functionality.
    This assumption underpins the entire conversion pipeline and reported success rate.
  • domain assumption Iterative repair processes can resolve execution and interface issues across diverse codebases in a reliable manner.
    Required to reach the 61.5% success rate after initial conversion attempts.
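As a back-of-the-envelope illustration of how the second axiom interacts with the reported success rate (the per-round numbers below are invented for illustration, not taken from the paper):

```python
# Illustrative arithmetic, not the paper's data: if first-pass conversion
# succeeds with probability p0 and each of k repair rounds independently
# fixes a remaining failure with probability q, the overall rate compounds as
#   p = p0 + (1 - p0) * (1 - (1 - q)**k)
p0 = 0.40   # hypothetical first-pass conversion rate
q = 0.15    # hypothetical per-round repair success on remaining failures
k = 3       # hypothetical number of repair rounds

overall = p0 + (1 - p0) * (1 - (1 - q) ** k)
print(f"{overall:.3f}")  # → 0.632
```

The point of the sketch is that a headline figure like 61.5% is jointly determined by first-pass conversion and per-round repair reliability, so the axiom must hold across rounds, not just once.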

pith-pipeline@v0.9.0 · 5525 in / 1360 out tokens · 47362 ms · 2026-05-15T13:52:15.104433+00:00 · methodology

discussion (0)


Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. RemoteAgent: Bridging Vague Human Intents and Earth Observation with RL-based Agentic MLLMs

    cs.CV 2026-04 unverdicted novelty 7.0

    RemoteAgent uses RL fine-tuning on VagueEO to align MLLMs for vague EO intent recognition, handling simple tasks internally and routing dense predictions to tools via Model Context Protocol.

  2. FactReview: Evidence-Grounded Reviews with Literature Positioning and Execution-Based Claim Verification

    cs.AI 2026-04 conditional novelty 7.0

    FactReview extracts claims from ML papers, positions them via literature retrieval, and verifies them through code execution, labeling each as Supported, Partially supported, or In conflict, as shown in a CompGCN case study.

  3. SKILLFOUNDRY: Building Self-Evolving Agent Skill Libraries from Heterogeneous Scientific Resources

    cs.AI 2026-04 unverdicted novelty 7.0

    SkillFoundry mines heterogeneous scientific resources into a self-evolving library of validated agent skills, with 71.1% novelty versus prior libraries and measurable gains on coding benchmarks plus two genomics tasks.

Reference graph

Works this paper leans on

50 extracted references · 50 canonical work pages · cited by 3 Pith papers
