arxiv: 2509.23986 · v2 · pith:MRUFOTG3new · submitted 2025-09-28 · 💻 cs.AI

TusoAI: Agentic Optimization for Scientific Methods

Pith reviewed 2026-05-21 22:30 UTC · model grok-4.3

classification 💻 cs.AI
keywords agentic AI · scientific method development · knowledge tree · computational optimization · genetics · single-cell RNA-seq · autoimmune disease associations

The pith

TusoAI autonomously builds and refines computational methods for scientific tasks by turning domain knowledge into an optimizable tree structure.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents TusoAI as an AI agent that receives a scientific task and an evaluation function, then develops its own analysis tools without manual coding by scientists. It organizes scattered domain knowledge into a knowledge tree, generates candidate methods, and repeatedly diagnoses and improves them against data. If successful, this removes a major bottleneck where researchers spend time iterating on literature, assumptions, and code instead of interpreting results. The system is shown to beat existing expert methods and other AI agents on tasks including RNA sequencing and earth observation data, while also surfacing new biological findings in genetics.

Core claim

TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis to improve performance over a pool of candidate solutions, outperforming state-of-the-art expert methods, MLE agents, and scientific AI agents on benchmarks while improving genetics methods and identifying 9 new autoimmune-T-cell associations plus 7 unreported variant-gene links.

What carries the argument

The knowledge tree that structures domain knowledge for iterative optimization and diagnosis steps.
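
To make the mechanism concrete, here is a minimal Python sketch of how a knowledge tree could steer an optimization loop. It is an illustration under stated assumptions, not the authors' implementation: the `Category` class, the `propose` and `evaluate` callables, the instruction format, and the Thompson-sampling rule for picking a category are all hypothetical stand-ins. In particular, `propose` replaces what would be an LLM call that edits code according to an instruction.

```python
import random
from dataclasses import dataclass
from typing import List


@dataclass
class Category:
    """One branch of a hypothetical knowledge tree: a topic plus instructions."""
    name: str
    instructions: List[str]
    wins: int = 1    # Beta prior pseudo-counts (an assumption of this sketch)
    losses: int = 1


def optimize(categories, propose, evaluate, initial, steps, seed=0):
    """Greedy improvement loop guided by per-category success statistics."""
    rng = random.Random(seed)
    best, best_score = initial, evaluate(initial)
    for _ in range(steps):
        # Thompson sampling: draw a plausible success rate for each category
        # and follow the category with the highest draw.
        cat = max(categories, key=lambda c: rng.betavariate(c.wins, c.losses))
        instruction = rng.choice(cat.instructions)
        candidate = propose(best, instruction)
        score = evaluate(candidate)
        if score > best_score:
            best, best_score = candidate, score
            cat.wins += 1      # this category helped
        else:
            cat.losses += 1    # this category did not help
    return best, best_score


def propose(method, instruction):
    """Toy stand-in for an LLM edit: 'param:delta' shifts one parameter."""
    param, delta = instruction.split(":")
    updated = dict(method)
    updated[param] += float(delta)
    return updated


def evaluate(method):
    """Toy evaluation function: best at smooth=0.5 and noise=0."""
    return -((method["smooth"] - 0.5) ** 2) - method["noise"] ** 2


tree = [
    Category("denoising", ["smooth:0.1", "smooth:-0.1"]),
    Category("distractor", ["noise:0.1"]),  # never helps on this toy task
]
start = {"smooth": 0.0, "noise": 0.0}
best, score = optimize(tree, propose, evaluate, start, steps=200)
print(best, score)
```

Because a candidate is kept only when it strictly improves the score, the final score can never fall below the starting one, and the unhelpful category accumulates losses and is sampled less often over time.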

If this is right

  • TusoAI produces improved computational methods for single-cell RNA-seq denoising and satellite-based earth monitoring.
  • When applied to genetics open problems, it enhances existing methods and reports previously unknown associations between autoimmune diseases and T-cell subtypes.
  • The same process identifies new links between disease variants and their target genes.
  • The approach works across diverse tasks without requiring scientists to hand-craft each analysis pipeline.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same knowledge-tree plus diagnosis loop could be tested on tasks in chemistry or materials science where method development is similarly labor-intensive.
  • If the system scales, scientists might shift effort from writing analysis code toward designing better evaluation functions that guide the agent.
  • Repeated independent runs on the same task would clarify how stable the discovered methods are.

Load-bearing premise

Performance gains come from genuine integration of scientific domain knowledge rather than from how the system is prompted or from selecting favorable runs after the fact.

What would settle it

Apply TusoAI independently to a fresh scientific task several times with fixed prompts and report whether the best method found still outperforms baselines without post-hoc selection.
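
The test described above can be sketched in a few lines of Python. The sketch assumes a hypothetical `run_once` callable that performs one full independent run and returns its final score; the scores below are simulated, not taken from the paper. It reports the mean, the spread, and the number of runs that beat the baseline alongside the best run, so that the best-of-n figure is never read in isolation.

```python
import random
import statistics


def summarize_runs(run_once, baseline_score, n_runs, seed=0):
    """Run the agent n_runs times and report all runs, not only the best."""
    rng = random.Random(seed)
    scores = [run_once(rng) for _ in range(n_runs)]
    return {
        "mean": statistics.mean(scores),
        "stdev": statistics.stdev(scores),   # requires n_runs >= 2
        "best": max(scores),
        "wins": sum(s > baseline_score for s in scores),
        "n_runs": n_runs,
    }


def simulated_run(rng):
    """Hypothetical agent whose true mean score equals the baseline (0.70)."""
    return rng.gauss(0.70, 0.05)


report = summarize_runs(simulated_run, baseline_score=0.70, n_runs=5)
print(report)
```

In this simulation the agent is no better than the baseline on average, yet the best of several runs will usually exceed it. That gap between `best` and `mean` is the post-hoc selection effect the test is meant to rule out.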

Figures

Figures reproduced from arXiv: 2509.23986 by Alistair Turcan, Kexin Huang, Lei Li, Martin Jinye Zhang.

Figure 1. Scientific method development with TusoAI. (A) Method overview. (B) Example domain knowledge tree (categories and instructions per category), feedback, and diagnostics.
Figure 2. Behavior of code generated by TusoAI. (A) Code diversity of TusoAI and AIDE over optimization time, as measured by 1 − cosine similarity. Each line corresponds to a dataset. (B) Performance of the current code and the best code over optimization time for a representative task, "Denoise". Key optimization changes with their occurrence times are annotated.
Figure 3. Optimizing scDRS for detecting cell-disease associations. (A) Assessing power in causal simulations. 95% CIs are calculated across 30 replicates at each perturbation effect size. (B) Number of discovered ground-truth trait-cell type pairs at FDR 0.05. (C) Number of discovered trait-T cell subtype pairs at FDR 0.05.
Figure 4. Optimizing pgBoost for SNP-gene link discovery. (A) Area under the enrichment-recall curve (AUERC, as defined in pgBoost) across distance thresholds for ground truth eQTL variant-gene links. (B) AUERC across distance thresholds for ground truth ABC variant-gene links. (C) Locus plot of rs138917529 and surrounding genes. Red dashed line indicates cutoff for SNP-gene linking.
Figure 5. Validation performance across 3 replicates. Final validation performance after running TusoAI 3 separate times on each single-cell task.
Figure 6. Additional ablation information. (A) Box plot across 5 tasks of the mean code diversity. (B) Box plot across 5 tasks of the mean time to optimize.
Figure 7. Additional LLM information. (A) Average length of generated methods for each task and LLM versus the total count of how many methods were generated. (B) Average length of generated methods for each task and LLM versus the final deployment performance.
Figure 8. Additional scDRS metrics. (A) Q-Q plot of -log10 p-values in null simulations. 95% CIs are calculated at each point across 30 replicates. (B) AUPRC of associating individual cells in causal simulations. 95% CIs are calculated at each point across 30 replicates. (C) FDR of associating individual cells in causal simulations. 95% CIs are calculated at each point across 30 replicates.
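
Figure 2(A) measures code diversity as 1 − cosine similarity. The Python sketch below shows that computation on a pool of code strings. The vector representation is an assumption made for illustration: it uses plain token counts, whereas the representation behind the figure is not stated on this page, and the function names are hypothetical.

```python
import math
import re
from collections import Counter
from itertools import combinations


def token_counts(code):
    """Bag-of-tokens vector: identifiers and single non-space characters."""
    return Counter(re.findall(r"[A-Za-z_]\w*|\S", code))


def cosine_similarity(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def diversity(code_a, code_b):
    """Diversity of two code strings, defined as 1 - cosine similarity."""
    return 1.0 - cosine_similarity(token_counts(code_a), token_counts(code_b))


def mean_pairwise_diversity(codes):
    """Average diversity over all unordered pairs in a candidate pool."""
    pairs = list(combinations(codes, 2))
    return sum(diversity(a, b) for a, b in pairs) / len(pairs)


pool = [
    "def denoise(x): return smooth(x, k=5)",
    "def denoise(x): return smooth(x, k=9)",
    "def denoise(x): return pca_project(x, n=20)",
]
print(mean_pairwise_diversity(pool))
```

With count vectors, identical snippets score 0 and snippets with no shared tokens score 1, so a pool that only tweaks constants stays near 0 while a pool that changes approach moves toward 1.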

Abstract

Scientific discovery is often slowed by the manual development of computational tools needed to analyze complex experimental data. Building such tools is costly and time-consuming because scientists must iteratively review literature, test modeling and scientific assumptions against empirical data, and implement these insights into efficient software. Large language models (LLMs) have demonstrated strong capabilities in synthesizing literature, reasoning with empirical data, and generating domain-specific code, offering new opportunities to accelerate computational method development. Existing LLM-based systems either focus on performing scientific analyses using existing computational methods or on developing computational methods or models for general machine learning without effectively integrating the often unstructured knowledge specific to scientific domains. Here, we introduce TusoAI, an agentic AI system that takes a scientific task description with an evaluation function and autonomously develops and optimizes computational methods for the application. TusoAI integrates domain knowledge into a knowledge tree representation and performs iterative, domain-specific optimization and model diagnosis, improving performance over a pool of candidate solutions. We conducted comprehensive benchmark evaluations demonstrating that TusoAI outperforms state-of-the-art expert methods, MLE agents, and scientific AI agents across diverse tasks, such as single-cell RNA-seq data denoising and satellite-based earth monitoring. Applying TusoAI to two key open problems in genetics improved existing computational methods and uncovered novel biology, including 9 new associations between autoimmune diseases and T cell subtypes and 7 previously unreported links between disease variants and their target genes. Our code is publicly available at https://github.com/Alistair-Turcan/TusoAI.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The paper introduces TusoAI, an agentic AI system that takes a scientific task description and evaluation function as input and autonomously develops optimized computational methods. It represents domain knowledge as a knowledge tree, performs iterative domain-specific optimization and model diagnosis over a pool of candidate solutions, and claims to outperform state-of-the-art expert methods, MLE agents, and scientific AI agents on benchmarks including single-cell RNA-seq denoising and satellite-based earth monitoring. In genetics applications the system is reported to improve existing methods while identifying 9 new autoimmune-T-cell associations and 7 previously unreported variant-gene links. Public code is provided.

Significance. If the performance gains are shown to arise from structured domain integration rather than generic iterative LLM search or post-hoc selection, the framework could meaningfully accelerate computational method development across scientific domains. The public code release supports reproducibility, and the genetics application yielding novel biological associations demonstrates potential real-world utility.

major comments (3)
  1. [Abstract and §4] Abstract and §4 (Benchmark Evaluations): the reported outperformance over MLE agents and scientific AI agents provides no quantitative details on baselines, number of independent runs, statistical tests, or controls for prompt variability, preventing assessment of whether the gains are robust.
  2. [§3.2] §3.2 (Knowledge Tree and Iterative Optimization): no ablation is presented that holds total LLM calls, iteration budget, and supplied domain information fixed while removing the knowledge-tree structure. Without this control, gains on RNA-seq and genetics tasks cannot be attributed to domain integration rather than expanded search effort or selective reporting of favorable trajectories.
  3. [§5] §5 (Genetics Applications): the claims of 9 new autoimmune-T-cell associations and 7 unreported variant-gene links lack details on validation procedures or false-positive controls, which is load-bearing given the system's iterative, LLM-driven candidate selection.
minor comments (2)
  1. [Figure 1] Figure 1: the knowledge-tree diagram would benefit from an expanded caption that explicitly labels node types and shows how domain knowledge is injected at each level.
  2. [§2] §2 (Related Work): several recent papers on LLM agents for scientific code generation are cited only in passing; a more systematic comparison table would clarify the precise novelty of the knowledge-tree component.
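
Major comment 1 asks for independent runs and statistical tests. As one illustration of such a test, the Python sketch below computes an exact two-sided Wilcoxon signed-rank p-value for paired scores by enumerating all sign assignments. The scores are made-up numbers, the function names are hypothetical, and the enumeration is practical only for small samples (roughly 20 pairs or fewer).

```python
from itertools import product


def average_ranks(values):
    """1-based ranks of values, with ties sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        avg = (i + j) / 2 + 1  # average of positions i..j, shifted to 1-based
        for k in range(i, j + 1):
            ranks[order[k]] = avg
        i = j + 1
    return ranks


def wilcoxon_exact(x, y):
    """Return (W+, exact two-sided p-value) for paired samples x and y."""
    diffs = [a - b for a, b in zip(x, y) if a != b]  # zero differences dropped
    if not diffs:
        return 0.0, 1.0
    ranks = average_ranks([abs(d) for d in diffs])
    w_plus = sum(r for r, d in zip(ranks, diffs) if d > 0)
    center = sum(ranks) / 2          # expected W+ under the null hypothesis
    observed = abs(w_plus - center)
    extreme = 0
    # Under the null, every sign pattern of the differences is equally likely.
    for signs in product((0, 1), repeat=len(ranks)):
        w = sum(r for r, s in zip(ranks, signs) if s)
        if abs(w - center) >= observed - 1e-12:
            extreme += 1
    return w_plus, extreme / 2 ** len(ranks)


# Made-up scores (percent) for a new method and a baseline on five paired runs.
new_method = [81, 79, 84, 80, 83]
baseline = [75, 74, 80, 78, 76]
print(wilcoxon_exact(new_method, baseline))  # -> (15.0, 0.0625)
```

The example also shows a limit worth keeping in mind: with five pairs there are only 32 sign patterns, so the smallest attainable two-sided p-value is 2/32 = 0.0625. A paired test on five runs alone therefore cannot fall below 0.05, even when the new method wins every time; more runs, or pairing across tasks, would be needed.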

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback. We address each major comment below and have revised the manuscript to incorporate additional details, experiments, and clarifications where needed.

  1. Referee: [Abstract and §4] Abstract and §4 (Benchmark Evaluations): the reported outperformance over MLE agents and scientific AI agents provides no quantitative details on baselines, number of independent runs, statistical tests, or controls for prompt variability, preventing assessment of whether the gains are robust.

    Authors: We appreciate this observation. The original submission reported performance metrics but did not include sufficient statistical details. In the revised version, we have added information on the number of independent runs (five per method with varied seeds), standard deviations, and results from statistical tests (Wilcoxon signed-rank tests with p-values). Additionally, we describe our approach to controlling for prompt variability by using consistent prompt structures. These updates are included in the revised Abstract, §4, and a new supplementary section. revision: yes

  2. Referee: [§3.2] §3.2 (Knowledge Tree and Iterative Optimization): no ablation is presented that holds total LLM calls, iteration budget, and supplied domain information fixed while removing the knowledge-tree structure. Without this control, gains on RNA-seq and genetics tasks cannot be attributed to domain integration rather than expanded search effort or selective reporting of favorable trajectories.

    Authors: We acknowledge the importance of this control experiment. To address it, we have performed an ablation study in which the knowledge tree is replaced by an equivalent flat collection of domain facts, while maintaining identical LLM call budgets and iteration limits. The results indicate that the hierarchical structure contributes to improved performance beyond mere search volume. We have incorporated this ablation analysis into the revised §3.2 and updated the benchmark results in §4. revision: yes

  3. Referee: [§5] §5 (Genetics Applications): the claims of 9 new autoimmune-T-cell associations and 7 unreported variant-gene links lack details on validation procedures or false-positive controls, which is load-bearing given the system's iterative, LLM-driven candidate selection.

    Authors: This point is well taken. We have expanded §5 to detail the validation steps: associations were validated against the GWAS Catalog, GTEx eQTL data, and relevant literature for supporting evidence. For false-positive control, we report FDR-adjusted p-values and results from permutation-based controls where labels were shuffled to assess chance discovery rates. We emphasize that these are computational findings and recommend experimental follow-up. revision: yes
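
The two false-positive controls named in the third response, label shuffling and FDR adjustment, can be sketched generically in Python. This is an illustration only, not the authors' procedure: the test statistic (a difference in group means), the function names, and the toy numbers are assumptions.

```python
import random


def permutation_pvalue(values, labels, n_perm=2000, seed=0):
    """One-sided p-value for mean(label 1) - mean(label 0) via label shuffling."""
    rng = random.Random(seed)

    def stat(lbls):
        pos = [v for v, l in zip(values, lbls) if l]
        neg = [v for v, l in zip(values, lbls) if not l]
        return sum(pos) / len(pos) - sum(neg) / len(neg)

    observed = stat(labels)
    shuffled = list(labels)
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(shuffled)  # breaks any real link between value and label
        if stat(shuffled) >= observed:
            hits += 1
    # The +1 terms count the observed labeling itself, so p is never zero.
    return (hits + 1) / (n_perm + 1)


def benjamini_hochberg(pvalues):
    """Benjamini-Hochberg adjusted p-values, returned in the input order."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, enforcing monotonicity.
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvalues[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


# Toy data: scores for six cells, three of which carry the label of interest.
p = permutation_pvalue([5, 6, 7, 0, 1, 2], [1, 1, 1, 0, 0, 0])
print(p)
print(benjamini_hochberg([0.01, 0.04, 0.03, 0.20]))
```

A discovery would then be reported only if its adjusted p-value falls below the chosen FDR level, such as the 0.05 threshold used in the Figure 3 captions.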

Circularity Check

0 steps flagged

No significant circularity: empirical claims rest on external benchmarks rather than self-referential definitions or fitted predictions.

The paper presents TusoAI as an LLM-based agentic system that builds a knowledge tree, performs iterative optimization, and reports benchmark outperformance on tasks like RNA-seq denoising and genetics. No equations, uniqueness theorems, or first-principles derivations are described that reduce to the system's own inputs by construction. Performance claims are framed as empirical comparisons against external baselines (expert methods, MLE agents, scientific AI agents), with no evidence of self-definitional loops, post-hoc fitted quantities renamed as predictions, or load-bearing self-citations that substitute for independent verification. The absence of ablations is a methodological limitation but does not constitute circularity under the specified patterns, as the central results are not forced by re-labeling the evaluation function or candidate pool as an output of the knowledge tree itself.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on the unstated premise that LLM-generated code plus iterative self-diagnosis reliably extracts and applies domain-specific scientific knowledge without introducing systematic biases or hallucinations that would be caught by the evaluation function.

axioms (1)
  • domain assumption: LLMs can synthesize unstructured scientific literature into a usable knowledge tree that improves method optimization beyond generic prompting.
    Invoked in the description of how TusoAI integrates domain knowledge.

pith-pipeline@v0.9.0 · 5804 in / 1248 out tokens · 26657 ms · 2026-05-21T22:30:06.784527+00:00 · methodology

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SpatialEpiBench: Benchmarking Spatial Information and Epidemic Priors in Forecasting

    cs.AI · 2026-05 · unverdicted · novelty 7.0

    SpatialEpiBench shows adjacency-informed models with epidemic priors underperform a last-value baseline across 11 datasets from 1 day to 1 month ahead, identifying failures in outbreak anticipation, sparsity handling,...

  2. CellScientist: Dual-Space Hierarchical Orchestration for Closed-Loop Refinement of Virtual Cell Models

    cs.LG · 2026-05 · unverdicted · novelty 6.0

    CellScientist introduces a dual-space hierarchical orchestration system that enables closed-loop refinement of virtual cell models by routing execution discrepancies back to hypothesis or implementation updates, yield...

Reference graph

Works this paper leans on

47 extracted references · 47 canonical work pages · cited by 2 Pith papers · 5 internal anchors

  1. [1]

    Semantic Scholar

    Allen Institute for AI. Semantic Scholar. https://www.semanticscholar.org, 2025. Accessed: 2025-09-16

  2. [2]

    OpenScholar: Synthesizing scientific literature with retrieval-augmented LMs

    Akari Asai, Jacqueline He, Rulin Shao, Weijia Shi, Amanpreet Singh, Joseph Chee Chang, Kyle Lo, Luca Soldaini, Sergey Feldman, Mike D'arcy, et al. OpenScholar: Synthesizing scientific literature with retrieval-augmented LMs. arXiv preprint arXiv:2411.14199, 2024

  3. [3]

    An AI system to help scientists write expert-level empirical software

    Eser Aygün, Anastasiya Belyaeva, Gheorghe Comanici, Marc Coram, Hao Cui, Jake Garrison, Renee Johnston, Anton Kast, Cory Y McLean, Peter Norgaard, Zahra Shamsi, et al. An AI system to help scientists write expert-level empirical software. arXiv preprint arXiv:2509.06503, 2025

  4. [4]

    Range extender mediates long-distance enhancer activity

    G. Bower, E. W. Hollingsworth, S. H. Jacinto, et al. Range extender mediates long-distance enhancer activity. Nature, 643: 830--838, 2025. doi:10.1038/s41586-025-09221-6

  5. [5]

    Single-cell transcriptomics identifies an effectorness gradient shaping the response of CD4+ T cells to cytokines

    Enrique Cano-Gamez, Blagoje Soskic, Theodoros I. Roumeliotis, et al. Single-cell transcriptomics identifies an effectorness gradient shaping the response of CD4+ T cells to cytokines. Nature Communications, 11(1): 1801, 2020. doi:10.1038/s41467-020-15543-y. URL https://doi.org/10.1038/s41467-020-15543-y

  6. [6]

    Recognition and management of individuals with hyperglycemia because of a heterozygous glucokinase mutation

    Ali J. Chakera, Anna M. Steele, Anna L. Gloyn, Maggie H. Shepherd, Beverley Shields, Sian Ellard, and Andrew T. Hattersley. Recognition and management of individuals with hyperglycemia because of a heterozygous glucokinase mutation. Diabetes Care, 38(7): 1383--1392, 06 2015. ISSN 0149-5992. doi:10.2337/dc14-2769. URL https://doi.org/10.2337/dc14-2769

  7. [7]

    Margarita Dominguez-Villar and David A. Hafler. Regulatory T cells in autoimmune disease. Nature Immunology, 19: 665--673, 2018. doi:10.1038/s41590-018-0120-4. URL https://doi.org/10.1038/s41590-018-0120-4

  8. [8]

    Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance

    Elizabeth Dorans, Karthik Jagadeesh, Kushal Dey, and Alkes L Price. Linking regulatory variants to target genes by integrating single-cell multiome methods and genomic distance. Nature Genetics, pp. 1--10, 2025

  9. [9]

    AutoGluon-Tabular: Robust and Accurate AutoML for Structured Data

    Nick Erickson, Jonas Mueller, Alexander Shirkov, Hang Zhang, Pedro Larroy, Mu Li, and Alexander Smola. Autogluon-tabular: Robust and accurate automl for structured data. arXiv preprint arXiv:2003.06505, 2020

  10. [10]

    CodeBERT: A Pre-Trained Model for Programming and Natural Languages

    Zhangyin Feng, Daya Guo, Duyu Tang, Nan Duan, Xiaocheng Feng, Ming Gong, Linjun Shou, Bing Qin, Ting Liu, Daxin Jiang, and Ming Zhou. Codebert: A pre-trained model for programming and natural languages. CoRR, abs/2002.08155, 2020. URL https://arxiv.org/abs/2002.08155

  11. [11]

    Efficient and robust automated machine learning

    Matthias Feurer, Aaron Klein, Katharina Eggensperger, Jost Springenberg, Manuel Blum, and Frank Hutter. Efficient and robust automated machine learning. Advances in neural information processing systems, 28, 2015

  12. [12]

    Familial hyperglycemia due to mutations in glucokinase -- definition of a subtype of diabetes mellitus

    Philippe Froguel, Habib Zouali, Nathalie Vionnet, Gilberto Velho, Martine Vaxillaire, Fang Sun, Suzanne Lesage, Markus Stoffel, Jun Takeda, Philippe Passa, M. Alan Permutt, Jacques S. Beckmann, Graeme I. Bell, and Daniel Cohen. Familial hyperglycemia due to mutations in glucokinase -- definition of a subtype of diabetes mellitus. New England Journal of Me...

  13. [13]

    Empowering biomedical discovery with ai agents

    Shanghua Gao, Ada Fang, Yepeng Huang, Valentina Giunchiglia, Ayush Noori, Jonathan Richard Schwarz, Yasha Ektefaie, Jovana Kondic, and Marinka Zitnik. Empowering biomedical discovery with AI agents. Cell, 187(22): 6125--6151, 2024

  14. [14]

    Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity

    Steven Gazal, Omer Weissbrod, Farhad Hormozdiari, Kushal K. Dey, Joseph Nasser, Kumar A. Jagadeesh, Daniel J. Weiner, Huwenbo Shi, Charles P. Fulco, Luke J. O'Connor, Bogdan Pasaniuc, Jesse M. Engreitz, and Alkes L. Price. Combining SNP-to-gene linking strategies to identify disease genes and assess disease omnigenicity. Nature Genetics, 54(6): 827--...

  15. [15]

    DS-Agent: Automated data science by empowering large language models with case-based reasoning

    Siyuan Guo, Cheng Deng, Ying Wen, Hechang Chen, Yi Chang, and Jun Wang. Ds-agent: Automated data science by empowering large language models with case-based reasoning. arXiv preprint arXiv:2402.17453, 2024

  16. [16]

    Biomni: A general-purpose biomedical AI agent

    Kexin Huang, Serena Zhang, Hanchen Wang, Yuanhao Qu, Yingzhou Lu, Yusuf Roohani, Ryan Li, Lin Qiu, Gavin Li, Junze Zhang, et al. Biomni: A general-purpose biomedical AI agent. bioRxiv, pp. 2025--05, 2025

  17. [17]

    AIDE: AI-Driven Exploration in the Space of Code

    Zhengyao Jiang, Dominik Schmidt, Dhruv Srikanth, Dixing Xu, Ian Kaplan, Deniss Jacenko, and Yuxiang Wu. Aide: Ai-driven exploration in the space of code. arXiv preprint arXiv:2502.13138, 2025

  18. [18]

    STELLA: Self-evolving LLM agent for biomedical research

    Ruofan Jin, Zaixi Zhang, Mengdi Wang, and Le Cong. STELLA: Self-evolving LLM agent for biomedical research. arXiv preprint arXiv:2507.02004, 2025

  19. [19]

    From variant to function in human disease genetics

    Tuuli Lappalainen and Daniel G MacArthur. From variant to function in human disease genetics. Science, 373(6562): 1464--1468, 2021

  20. [20]

    H2o automl: Scalable automatic machine learning

    Erin LeDell, Sebastien Poirier, et al. H2O AutoML: Scalable automatic machine learning. In Proceedings of the AutoML Workshop at ICML, volume 2020, pp. 24, 2020

  21. [21]

    Benchmarking methods integrating GWAS and single-cell transcriptomic data for mapping trait-cell type associations

    Ang Li, Tian Lin, Alicia Walker, Xiao Tan, Ruolan Zhao, Shuyang Yao, Patrick F. Sullivan, Jens Hjerling-Leffler, Naomi R. Wray, and Jian Zeng. Benchmarking methods integrating GWAS and single-cell transcriptomic data for mapping trait-cell type associations. medRxiv, 2025. doi:10.1101/2025.05.24.25328275. URL https://www.medrxiv.org/content/early/2025/06/...

  22. [22]

    Zero-preserving imputation of single-cell RNA-seq data

    George C. Linderman, Jiajun Zhao, Maria Roulis, et al. Zero-preserving imputation of single-cell RNA-seq data. Nature Communications, 13: 192, 2022. doi:10.1038/s41467-021-27729-z. URL https://doi.org/10.1038/s41467-021-27729-z

  23. [23]

    DARTS: Differentiable Architecture Search

    Hanxiao Liu, Karen Simonyan, and Yiming Yang. Darts: Differentiable architecture search. arXiv preprint arXiv:1806.09055, 2018

  24. [24]

    M. D. Luecken, S. Gigante, D. B. Burkhardt, et al. Defining and benchmarking open problems in single-cell analysis. Nature Biotechnology, 43: 1035--1040, 2025. doi:10.1038/s41587-025-02694-w. URL https://doi.org/10.1038/s41587-025-02694-w

  25. [25]

    Large language models surpass human experts in predicting neuroscience results

    Xiaoliang Luo, Akilles Rechardt, Guangzhi Sun, Kevin K Nejad, Felipe Yáñez, Bati Yilmaz, Kangjoo Lee, Alexandra O Cohen, Valentina Borghesani, Anton Pashkov, et al. Large language models surpass human experts in predicting neuroscience results. Nature Human Behaviour, 9(2): 305--315, 2025

  26. [26]

    Augmenting large language models with chemistry tools

    Andres M. Bran, Sam Cox, Oliver Schilter, Carlo Baldassari, Andrew D White, and Philippe Schwaller. Augmenting large language models with chemistry tools. Nature Machine Intelligence, 6(5): 525--535, 2024

  27. [27]

    BioML-bench: Evaluation of AI agents for end-to-end biomedical ML

    Henry E. Miller, Matthew Greenig, Benjamin Tenmann, and Bo Wang. BioML-bench: Evaluation of AI agents for end-to-end biomedical ML. bioRxiv, 2025. doi:10.1101/2025.09.01.673319. URL https://www.biorxiv.org/content/early/2025/09/07/2025.09.01.673319

  28. [28]

    J. M. Mudge, S. Carbonell-Sala, M. Diekhans, et al. GENCODE 2025: reference gene annotation for human and mouse. Nucleic Acids Research, 53(D1): D966--D975, 2025. doi:10.1093/nar/gkae1078

  29. [29]

    Mle-star: Machine learning engineering agent via search and targeted refinement

    Jaehyun Nam, Jinsung Yoon, Jiefeng Chen, Jinwoo Shin, Sercan Ö. Arık, and Tomas Pfister. MLE-STAR: Machine learning engineering agent via search and targeted refinement. arXiv preprint arXiv:2506.15692, 2025

  30. [30]

    Tpot: A tree-based pipeline optimization tool for automating machine learning

    Randal S Olson and Jason H Moore. TPOT: A tree-based pipeline optimization tool for automating machine learning. In Workshop on automatic machine learning, pp. 66--74. PMLR, 2016

  31. [31]

    Introducing chatgpt agent: Bridging research and action

    OpenAI. Introducing chatgpt agent: Bridging research and action. https://openai.com/index/introducing-chatgpt-agent/, July 2025

  32. [32]

    Large language models for code generation: The practitioners perspective

    Zeeshan Rasheed, Muhammad Waseem, Kai Kristian Kemell, Aakash Ahmad, Malik Abdul Sami, Jussi Rasku, Kari Systä, and Pekka Abrahamsson. Large language models for code generation: The practitioners perspective. arXiv preprint arXiv:2501.16998, 2025

  33. [33]

    Mathematical discoveries from program search with large language models

    Bernardino Romera-Paredes, Mohammadamin Barekatain, Alexander Novikov, Matej Balog, M Pawan Kumar, Emilien Dupont, Francisco JR Ruiz, Jordan S Ellenberg, Pengming Wang, Omar Fawzi, et al. Mathematical discoveries from program search with large language models. Nature, 625(7995): 468--475, 2024

  34. [34]

    Multifaceted functions of tissue-resident memory T cells in tumorigenesis and cancer immunotherapy

    Eui S. Seo, Sang K. Lee, and Young M. Son. Multifaceted functions of tissue-resident memory T cells in tumorigenesis and cancer immunotherapy. Cancer Immunology, Immunotherapy, 74(6): 184, April 2025. doi:10.1007/s00262-025-04035-x. URL https://doi.org/10.1007/s00262-025-04035-x

  35. [35]

    Boxlm: Unifying structures and semantics of medical concepts for diagnosis prediction in healthcare

    Yanchao Tan, Hang Lv, Yunfei Zhan, Guofang Ma, Bo Xiong, and Carl Yang. Boxlm: Unifying structures and semantics of medical concepts for diagnosis prediction in healthcare. In Forty-second International Conference on Machine Learning, 2025

  36. [36]

    Cellforge: Agentic design of virtual cell models

    Xiangru Tang, Zhuoyun Yu, Jiapeng Chen, Yan Cui, Daniel Shao, Weixu Wang, Fang Wu, Yuchen Zhuang, Wenqi Shi, Zhi Huang, et al. Cellforge: Agentic design of virtual cell models. arXiv preprint arXiv:2508.02276, 2025

  37. [37]

    InternAgent: When agent becomes the scientist -- building closed-loop system from hypothesis to verification

    InternAgent Team, Bo Zhang, Shiyang Feng, Xiangchao Yan, Jiakang Yuan, Runmin Ma, Yusong Hu, Zhiyin Yu, Xiaohan He, Songtao Huang, Shaowei Hou, Zheng Nie, Zhilong Wang, Jinyao Liu, Tianshuo Peng, Peng Ye, Dongzhan Zhou, Shufei Zhang, Xiaosong Wang, Yilan Zhang, Meng Li, Zhongying Tu, Xiangyu Yue, Wangli Ouyang, Bowen Zhou, and Lei Bai. Internagent: When a...

  38. [38]

    Automl-agent: A multi-agent llm framework for full-pipeline automl

    Patara Trirat, Wonyong Jeong, and Sung Ju Hwang. Automl-agent: A multi-agent llm framework for full-pipeline automl. arXiv preprint arXiv:2410.02958, 2024

  39. [39]

    NAS-Bench-360: Benchmarking neural architecture search on diverse tasks

    Renbo Tu, Nicholas Roberts, Mikhail Khodak, Junhong Shen, Frederic Sala, and Ameet Talwalkar. NAS-Bench-360: Benchmarking neural architecture search on diverse tasks. In Thirty-sixth Conference on Neural Information Processing Systems Datasets and Benchmarks Track, 2022. URL https://openreview.net/forum?id=xUXTbq6gWsB

  40. [40]

    R&D-Agent: Automating data-driven AI solution building through LLM-powered automated research, development, and evolution

    Xu Yang, Xiao Yang, Shikai Fang, Bowen Xian, Yuante Li, Jian Wang, Minrui Xu, Haoran Pan, Xinpeng Hong, Weiqing Liu, Yelong Shen, Weizhu Chen, and Jiang Bian. R&d-agent: Automating data-driven ai solution building through llm-powered automated research, development, and evolution, 2025. URL https://arxiv.org/abs/2505.14738

  41. [41]

    Dolphin: Moving towards closed-loop auto-research through thinking, practice, and feedback

    Jiakang Yuan, Xiangchao Yan, Shiyang Feng, Bo Zhang, Tao Chen, Botian Shi, Wanli Ouyang, Yu Qiao, Lei Bai, and Bowen Zhou. Dolphin: Moving towards closed-loop auto-research through thinking, practice, and feedback, 2025. URL https://arxiv.org/abs/2501.03916

  42. [42]

    Polygenic enrichment distinguishes disease associations of individual cells in single-cell rna-seq data

    Martin Jinye Zhang, Kangcheng Hou, Kushal K Dey, Saori Sakaue, Karthik A Jagadeesh, Kathryn Weinand, Aris Taychameekiatchai, Poorvi Rao, Angela Oliveira Pisco, James Zou, et al. Polygenic enrichment distinguishes disease associations of individual cells in single-cell RNA-seq data. Nature Genetics, 54(10): 1572--1580, 2022

  43. [43]

    An automated framework for efficiently designing deep convolutional neural networks in genomics

    Zijun Zhang, Christopher Y Park, Chandra L Theesfeld, and Olga G Troyanskaya. An automated framework for efficiently designing deep convolutional neural networks in genomics. Nature Machine Intelligence, 3(5): 392--400, 2021
