pith. machine review for the scientific record.

arxiv: 2605.10813 · v1 · submitted 2026-05-11 · 💻 cs.AI

Recognition: no theorem link

NanoResearch: Co-Evolving Skills, Memory, and Policy for Personalized Research Automation

Authors on Pith · no claims yet

Pith reviewed 2026-05-12 04:07 UTC · model grok-4.3

classification 💻 cs.AI
keywords personalized research automation · multi-agent systems · co-evolution · skill bank · memory module · policy learning · LLM agents · research pipeline

The pith

NanoResearch uses tri-level co-evolution of skills, memory, and policy to personalize AI research automation.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper proposes NanoResearch as a multi-agent framework that personalizes LLM-powered research automation for users with different resources, preferences, and output needs. It identifies that uniform systems under-serve individuals and addresses this by accumulating reusable skills across projects, retaining user-specific experience, and internalizing implicit preferences from free-form feedback. These three layers co-evolve so that skills enrich memory, memory guides better planning, and policy updates realign the system to each user. A sympathetic reader would care because this setup allows the automation to improve output quality while lowering costs over successive uses instead of remaining static.

Core claim

NanoResearch is a multi-agent framework that addresses personalization gaps through tri-level co-evolution: a skill bank distills recurring operations into compact procedural rules reusable across projects, a memory module maintains user- and project-specific experience that grounds planning in each user's research history, and label-free policy learning converts free-form feedback into persistent parameter updates of the planner. These layers co-evolve such that reliable skills produce richer memory, richer memory informs better planning, and preference internalization continuously realigns the loop to each user. Experiments show this delivers substantial gains over state-of-the-art AI research systems.

What carries the argument

The tri-level co-evolution mechanism consisting of a skill bank for reusable procedural rules, a memory module for user-specific history, and label-free policy learning for preference updates, which together enable progressive adaptation without explicit formalization.
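The paper's own algorithms are not reproduced here, but the loop as described above can be sketched roughly as follows. Every class, method, and field name below is an illustrative assumption, not the paper's API; the skill-distillation and policy-update rules are deliberately naive stand-ins for the mechanisms the paper describes.

```python
from dataclasses import dataclass, field

@dataclass
class SkillBank:
    rules: list[str] = field(default_factory=list)

    def distill(self, trace: list[str]) -> None:
        # Distill recurring operations from an execution trace into
        # compact procedural rules (here: naive de-duplicated storage).
        for op in trace:
            if op not in self.rules:
                self.rules.append(op)

@dataclass
class MemoryModule:
    history: list[dict] = field(default_factory=list)

    def record(self, project: str, outcome: dict) -> None:
        # Retain user- and project-specific experience across sessions.
        self.history.append({"project": project, **outcome})

@dataclass
class PlannerPolicy:
    preference_weights: dict[str, float] = field(default_factory=dict)

    def update_from_feedback(self, feedback_terms: dict[str, float]) -> None:
        # Label-free update: free-form feedback is assumed to have been
        # reduced to signed preference terms upstream (e.g. by an LLM judge).
        for key, delta in feedback_terms.items():
            self.preference_weights[key] = (
                self.preference_weights.get(key, 0.0) + delta
            )

def research_cycle(skills, memory, policy, task, feedback_terms):
    # One cycle: run the task (stubbed as a fixed trace), then let each
    # layer evolve: skills enrich memory, memory grounds later planning,
    # and policy updates realign the loop to the user.
    trace = [f"plan:{task}", f"execute:{task}", "write:paper"]
    skills.distill(trace)
    memory.record(task, {"skills_used": len(skills.rules)})
    policy.update_from_feedback(feedback_terms)
```

The point of the sketch is only the shape of the dependency: each cycle leaves all three layers in a state the next cycle reads from, which is what "progressive adaptation" cashes out to.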

If this is right

  • The system produces research outputs that match individual users' resource limits and methodological preferences rather than uniform defaults.
  • Reusable skills distilled from past projects reduce repeated effort in new work.
  • User-specific memory allows planning to draw on personal research history for more relevant decisions.
  • Label-free feedback from free-form comments leads to ongoing planner adjustments without needing formal preference statements.
  • Overall performance improves and costs decrease as the system runs through multiple cycles for the same user.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same co-evolution structure could be tested in other multi-agent domains such as code generation or experiment design to see if personalization emerges without domain-specific redesign.
  • Long-term deployment might produce measurable divergence in research style and efficiency between users with different feedback patterns.
  • The approach suggests a path to evaluate personalization by tracking per-user cost-quality curves rather than aggregate benchmarks.
  • If feedback internalization works, it could reduce the need for explicit user modeling in other adaptive AI systems.
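The per-user cost-quality evaluation suggested above admits a very small sketch: track (cost, quality) pairs per cycle for each user and test the trend directly, rather than averaging over users. The class and method names are hypothetical, not from the paper.

```python
from collections import defaultdict

class CostQualityTracker:
    """Per-user cost-quality curves across successive automation cycles."""

    def __init__(self):
        # user_id -> [(cost, quality), ...] in cycle order
        self.curves = defaultdict(list)

    def log_cycle(self, user_id, cost, quality):
        self.curves[user_id].append((cost, quality))

    def is_improving(self, user_id):
        # The personalization claim, as a trend check: quality is
        # non-decreasing and cost non-increasing across cycles.
        pts = self.curves[user_id]
        return all(
            c2 <= c1 and q2 >= q1
            for (c1, q1), (c2, q2) in zip(pts, pts[1:])
        )
```

A benchmark built this way would report one curve per user profile instead of a single aggregate score, which is what the editorial extension above is proposing.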

Load-bearing premise

The three layers of skills, memory, and policy will reliably interact and improve each other using only implicit feedback to produce better personalized outputs over time.

What would settle it

Run NanoResearch and a non-co-evolving baseline on a sequence of similar research tasks for the same user profile and check whether output quality rises and total resource cost falls across cycles.
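A minimal harness for that settling experiment might look like the following. The cost/quality model is a stub standing in for real system runs, and the 0.5 and 0.3 coefficients are arbitrary assumptions; a real run would invoke NanoResearch and the baseline system instead.

```python
def run_sequence(tasks, evolve):
    # Run a sequence of similar tasks for one user profile. If evolve is
    # True, the system accumulates one reusable skill per cycle; the stub
    # model assumes skill reuse cuts cost and lifts quality linearly.
    results = []
    skills = 0
    for task in tasks:
        cost = 10.0 - 0.5 * skills
        quality = 5.0 + 0.3 * skills
        results.append({"task": task, "cost": cost, "quality": quality})
        if evolve:
            skills += 1
    return results

tasks = ["cycle_1", "cycle_2", "cycle_3"]
co_evolving = run_sequence(tasks, evolve=True)   # NanoResearch-like
baseline = run_sequence(tasks, evolve=False)     # static baseline
```

The settling criterion is then a trend comparison: quality rises and cost falls across cycles for the co-evolving run while the static baseline stays flat.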

Figures

Figures reproduced from arXiv: 2605.10813 by Cheng Tan, Conghui He, Dongxu Zhang, Jianxin Tang, Jingxuan Wei, Jinhang Xu, Marcia Tian, Odin Zhang, Qiyuan Zhu, Sirui Han, Siyuan Li, Yike Guo, Yiling Duan, Yujun Wu, Zirui Wang.

Figure 1
Figure 1: Comparison between (a) a uniform research automation pipeline that applies identical [PITH_FULL_IMAGE:figures/full_fig_p002_1.png] view at source ↗
Figure 2
Figure 2: The NanoResearch framework. An Orchestrator O processes a personalized research request and coordinates a three-stage pipeline (ideation, experimentation, writing) to produce a publication-ready paper. A Skill Bank S, a Memory Module M, and policy learning jointly accumulate experience and drive self-evolution across cycles. view at source ↗
Figure 3
Figure 3: Composition of our benchmark. The 20 research tasks span seven domains, and cover a wide variety of subtasks (left), with dataset sizes ranging from ∼5K to over 1M samples (right). Baselines. We compare NanoResearch against four representative end-to-end automated research systems: AI-Researcher [27], DeepScientist [32], EvoScientist [22], and AI Scientist-v2 [35]. All systems are run under the same task s… view at source ↗
Figure 4
Figure 4: Per-task performance of NanoResearch. The most pronounced advantage emerges on Compliance (8.963 vs. 6.656), confirming that the user profile U and SDPO-based feedback internalization let NanoResearch faithfully respect heterogeneous user preferences. Performance further improves monotonically from Round 1 to Round 3 on all dimensions, with notable gains on Innovation (4.960 → 5.645) and Expression (5.42… view at source ↗
Figure 5
Figure 5: Case study on UCI HAR: three simulated users with [PITH_FULL_IMAGE:figures/full_fig_p011_5.png] view at source ↗
Figure 6
Figure 6: All pages of the system-generated sensor time-series paper [PITH_FULL_IMAGE:figures/full_fig_p013_6.png] view at source ↗
Figure 7
Figure 7: All pages of the system-generated tabular regression paper [PITH_FULL_IMAGE:figures/full_fig_p014_7.png] view at source ↗
Figure 8
Figure 8: All pages of the system-generated keyword spotting paper [PITH_FULL_IMAGE:figures/full_fig_p015_8.png] view at source ↗
Figure 9
Figure 9: Architecture diagram for Profile A (Evidence-First Scientist). [PITH_FULL_IMAGE:figures/full_fig_p022_9.png] view at source ↗
Figure 10
Figure 10: Architecture diagram for Profile B (Ablation-Focused Researcher). [PITH_FULL_IMAGE:figures/full_fig_p024_10.png] view at source ↗
Figure 11
Figure 11: Architecture diagram for Profile C (Benchmark-Driven Exploratory Researcher). [PITH_FULL_IMAGE:figures/full_fig_p026_11.png] view at source ↗
Original abstract

LLM-powered multi-agent systems can now automate the full research pipeline from ideation to paper writing, but a fundamental question remains: automation for whom? Researchers operate under different resource configurations, hold different methodological preferences, and target different output formats. A system that produces uniform outputs regardless of these differences will systematically under-serve every individual user, making personalization a precondition for research automation to be genuinely usable. However, achieving it requires three capabilities that current systems lack: accumulating reusable procedural knowledge across projects, retaining user-specific experience across sessions, and internalizing implicit preferences that resist explicit formalization. We propose NanoResearch, a multi-agent framework that addresses these gaps through tri-level co-evolution. A skill bank distills recurring operations into compact procedural rules reusable across projects. A memory module maintains user- and project-specific experience that grounds planning decisions in each user's research history. A label-free policy learning converts free-form feedback into persistent parameter updates of the planner, reshaping subsequent coordination. These three layers co-evolve: reliable skills produce richer memory, richer memory informs better planning, and preference internalization continuously realigns the loop to each user. Extensive experiments demonstrate that NanoResearch delivers substantial gains over state-of-the-art AI research systems, and progressively refines itself to produce better research at lower cost over successive cycles.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 1 minor

Summary. The paper proposes NanoResearch, a multi-agent framework for personalized LLM-powered research automation. It introduces tri-level co-evolution via a skill bank that distills reusable procedural rules, a memory module that retains user- and project-specific experience, and label-free policy learning that converts free-form feedback into planner parameter updates. The central claim is that these layers co-evolve productively to deliver substantial gains over state-of-the-art AI research systems while progressively refining output quality and reducing costs across successive cycles.

Significance. If validated, the work would address a genuine gap in research automation by making systems adaptable to individual researchers' preferences and histories rather than producing uniform outputs. The co-evolution architecture is a coherent conceptual contribution that could inform future adaptive agent designs. However, the absence of any reported metrics, baselines, or controls in the manuscript substantially limits its current significance and falsifiability.

major comments (2)
  1. [Abstract] Abstract, final sentence: the assertion of 'substantial gains over state-of-the-art AI research systems' and 'progressively refines itself to produce better research at lower cost' is load-bearing for the primary contribution yet is unsupported by any quantitative results, specific metrics, baselines, number of trials, or controls, rendering the empirical claim unverifiable.
  2. [Abstract] Abstract, paragraph 3: the description of the three layers co-evolving (skills produce richer memory, memory informs planning, label-free feedback internalizes preferences) is presented without algorithms, interaction protocols, or pseudocode, so the mechanism by which the components are claimed to reinforce one another cannot be evaluated for internal consistency or feasibility.
minor comments (1)
  1. [Abstract] Abstract: 'LLM' is used without expansion on first occurrence (though standard in the field).

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the careful review and for identifying key areas where the presentation of our contributions can be strengthened. We address each major comment below and commit to revisions that improve verifiability and clarity without altering the core framework.

point-by-point responses
  1. Referee: [Abstract] Abstract, final sentence: the assertion of 'substantial gains over state-of-the-art AI research systems' and 'progressively refines itself to produce better research at lower cost' is load-bearing for the primary contribution yet is unsupported by any quantitative results, specific metrics, baselines, number of trials, or controls, rendering the empirical claim unverifiable.

    Authors: We agree that the abstract's empirical claims should be directly supported by concrete details to allow immediate verification. The manuscript's Experiments section reports comparisons against multiple baselines on repeated research tasks and documents progressive improvements in output quality and resource usage across cycles. To resolve the concern, we will revise the abstract to include explicit references to the baselines, the number of evaluation cycles, and the observed trends in quality and cost metrics, while ensuring the full quantitative results remain prominently detailed in the main text. revision: yes

  2. Referee: [Abstract] Abstract, paragraph 3: the description of the three layers co-evolving (skills produce richer memory, memory informs planning, label-free feedback internalizes preferences) is presented without algorithms, interaction protocols, or pseudocode, so the mechanism by which the components are claimed to reinforce one another cannot be evaluated for internal consistency or feasibility.

    Authors: The tri-level co-evolution process, including the skill distillation procedure, memory update and retrieval rules, and the label-free policy update from free-form feedback, is specified with equations and interaction flow in Section 3 of the manuscript. We acknowledge that the abstract provides only a high-level summary. In the revision we will add a compact algorithmic outline of the co-evolution loop and include pseudocode as a new figure or appendix entry so that the reinforcement mechanisms can be directly inspected for consistency and feasibility. revision: yes

Circularity Check

0 steps flagged

No significant circularity

full rationale

The paper proposes an architectural framework for tri-level co-evolution of skills, memory, and policy in a multi-agent research automation system. No mathematical derivations, equations, fitted parameters, or first-principles predictions are described in the abstract or claimed structure. Claims of performance gains rest on empirical experiments rather than any self-referential reduction of outputs to inputs by construction. No self-citations, uniqueness theorems, or ansatzes are invoked in a load-bearing way within the provided text.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 3 invented entities

The paper introduces three new system components as the core contribution. No explicit numerical free parameters are stated in the abstract. The framework rests on domain assumptions about LLM capabilities for skill distillation and feedback internalization.

axioms (1)
  • domain assumption LLM agents can reliably distill recurring operations into reusable procedural rules and internalize implicit preferences from free-form feedback
    This assumption underpins the skill bank and label-free policy learning components described in the abstract.
invented entities (3)
  • Skill bank no independent evidence
    purpose: Distills recurring operations into compact procedural rules reusable across projects
    New component introduced to accumulate procedural knowledge.
  • Memory module no independent evidence
    purpose: Maintains user- and project-specific experience to ground planning
    New component for retaining user-specific history.
  • Label-free policy learning no independent evidence
    purpose: Converts free-form feedback into persistent parameter updates of the planner
    New mechanism for preference internalization without labels.

pith-pipeline@v0.9.0 · 5580 in / 1422 out tokens · 54366 ms · 2026-05-12T04:07:31.055923+00:00 · methodology

discussion (0)


Reference graph

Works this paper leans on

37 extracted references · 37 canonical work pages · 11 internal anchors

  1. [1]

    GPT-4 Technical Report

    Josh Achiam, Steven Adler, Sandhini Agarwal, Lama Ahmad, Ilge Akkaya, Florencia Leoni Aleman, Diogo Almeida, Janko Altenschmidt, Sam Altman, Shyamal Anadkat, et al. Gpt-4 technical report.arXiv preprint arXiv:2303.08774, 2023

  2. [2]

    Researchagent: Iterative research idea generation over scientific literature with large language models

    Jinheon Baek, Sujay Kumar Jauhar, Silviu Cucerzan, and Sung Ju Hwang. Researchagent: Iterative research idea generation over scientific literature with large language models. InProceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers), pag...

  3. [3]

    Qwen Technical Report

    Jinze Bai, Shuai Bai, Yunfei Chu, Zeyu Cui, Kai Dang, Xiaodong Deng, Yang Fan, Wenbin Ge, Yu Han, Fei Huang, et al. Qwen technical report.arXiv preprint arXiv:2309.16609, 2023

  4. [4]

    Qwen3-VL Technical Report

    Shuai Bai, Yuxuan Cai, Ruizhe Chen, Keqin Chen, Xionghui Chen, Zesen Cheng, Lianghao Deng, Wei Ding, Chang Gao, Chunjiang Ge, et al. Qwen3-vl technical report.arXiv preprint arXiv:2511.21631, 2025

  5. [5]

    Scibert: A pretrained language model for scientific text

    Iz Beltagy, Kyle Lo, and Arman Cohan. Scibert: A pretrained language model for scientific text. InProceedings of the 2019 conference on empirical methods in natural language processing and the 9th international joint conference on natural language processing (EMNLP-IJCNLP), pages 3615–3620, 2019

  6. [6]

    Aligning language models from user interactions

    Thomas Kleine Buening, Jonas Hübotter, Barna Pásztor, Idan Shenfeld, Giorgia Ramponi, and Andreas Krause. Aligning language models from user interactions.arXiv preprint arXiv:2603.12273, 2026

  7. [7]

    Tldr: Extreme summarization of scientific documents

    Isabel Cachola, Kyle Lo, Arman Cohan, and Daniel S Weld. Tldr: Extreme summarization of scientific documents. InFindings of the Association for Computational Linguistics: EMNLP 2020, pages 4766–4777, 2020

  8. [8]

    Ai-gas: Ai-generating algorithms, an alternate paradigm for producing general artificial intelligence

    Jeff Clune. Ai-gas: Ai-generating algorithms, an alternate paradigm for producing general artificial intelligence. arXiv preprint arXiv:1905.10985, 2019

  9. [9]

    Researchcodeagent: An llm multi-agent system for automated codification of research methodologies

    Shubham Gandhi, Dhruv Shah, Manasi Patwardhan, Lovekesh Vig, and Gautam Shroff. Researchcodeagent: An llm multi-agent system for automated codification of research methodologies. InInternational Workshop on AI for Transportation, pages 3–37. Springer, 2025

  10. [10]

    DeepSeek-R1: Incentivizing Reasoning Capability in LLMs via Reinforcement Learning

    Daya Guo, Dejian Yang, Haowei Zhang, Junxiao Song, Peiyi Wang, Qihao Zhu, Runxin Xu, Ruoyu Zhang, Shirong Ma, Xiao Bi, et al. Deepseek-r1: Incentivizing reasoning capability in llms via reinforcement learning. arXiv preprint arXiv:2501.12948, 2025

  11. [11]

    Nova: An iterative planning and search approach to enhance novelty and diversity of LLM generated ideas

    Xiang Hu, Hongyu Fu, Jinge Wang, Yifeng Wang, Zhikun Li, Renjun Xu, Yu Lu, Yaochu Jin, Lili Pan, and Zhenzhong Lan. Nova: An iterative planning and search approach to enhance novelty and diversity of llm generated ideas.arXiv preprint arXiv:2410.14255, 2024

  12. [12]

    ClinicalBERT: Modeling Clinical Notes and Predicting Hospital Readmission

    Kexin Huang, Jaan Altosaar, and Rajesh Ranganath. Clinicalbert: Modeling clinical notes and predicting hospital readmission.arXiv preprint arXiv:1904.05342, 2019

  13. [13]

    Mixtral of Experts

    Albert Q Jiang, Alexandre Sablayrolles, Antoine Roux, Arthur Mensch, Blanche Savary, Chris Bamford, Devendra Singh Chaplot, Diego de las Casas, Emma Bou Hanna, Florian Bressand, et al. Mixtral of experts. arXiv preprint arXiv:2401.04088, 2024

  14. [14]

    The structure of scientific revolutions

    Thomas S Kuhn and Ian Hacking.The structure of scientific revolutions, volume 2. University of Chicago press Chicago, 1970

  15. [15]

    Scientific discovery: Computational explorations of the creative processes

    Pat Langley.Scientific discovery: Computational explorations of the creative processes. MIT press, 1987

  16. [16]

    Laboratory life: The construction of scientific facts

    Bruno Latour, Jonas Salk, and Steve Woolgar. Laboratory life: The construction of scientific facts. 2013

  17. [17]

    BioBERT: a pre-trained biomedical language representation model for biomedical text mining

    Jinhyuk Lee, Wonjin Yoon, Sungdong Kim, Donghyeon Kim, Sunkyu Kim, Chan Ho So, and Jaewoo Kang. Biobert: a pre-trained biomedical language representation model for biomedical text mining.Bioinformatics, 36(4):1234–1240, 2020

  18. [18]

    AutoSOTA: An End-to-End Automated Research System for State-of-the-Art AI Model Discovery

    Yu Li, Chenyang Shao, Xinyang Liu, Ruotong Zhao, Peijie Liu, Hongyuan Su, Zhibin Chen, Qinglong Yang, Anjie Xu, Yi Fang, et al. Autosota: An end-to-end automated research system for state-of-the-art ai model discovery. arXiv preprint arXiv:2604.05550, 2026

  19. [19]

    AutoP2C: An LLM-based agent framework for code repository generation from multimodal content in academic papers

    Zijie Lin, Yiqing Shen, Qilin Cai, He Sun, Jinrui Zhou, and Mingjun Xiao. Autop2c: An llm-based agent framework for code repository generation from multimodal content in academic papers.arXiv preprint arXiv:2504.20115, 2025

  20. [20]

    Troubling trends in machine learning scholarship: Some ML papers suffer from flaws that could mislead the public and stymie future research

    Zachary C Lipton and Jacob Steinhardt. Troubling trends in machine learning scholarship: Some ml papers suffer from flaws that could mislead the public and stymie future research.Queue, 17(1):45–77, 2019

  21. [21]

    The AI Scientist: Towards Fully Automated Open-Ended Scientific Discovery

    Chris Lu, Cong Lu, Robert Tjarko Lange, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist: Towards fully automated open-ended scientific discovery.arXiv preprint arXiv:2408.06292, 2024

  22. [22]

    EvoScientist: Towards multi-agent evolving AI scientists for end-to-end scientific discovery

    Yougang Lyu, Xi Zhang, Xinhao Yi, Yuyue Zhao, Shuyu Guo, Wenxiang Hu, Jan Piotrowski, Jakub Kaliski, Jacopo Urbani, Zaiqiao Meng, et al. Evoscientist: Towards multi-agent evolving ai scientists for end-to-end scientific discovery.arXiv preprint arXiv:2603.08127, 2026

  23. [23]

    PaperQA: Retrieval-augmented generative agent for scientific research

    Jakub Lála, Odhran O’Donoghue, Aleksandar Shtedritski, Sam Cox, Samuel G. Rodriques, and Andrew D. White. Paperqa: Retrieval-augmented generative agent for scientific research, 2023. URL https://arxiv.org/abs/2312.07559

  24. [24]

    Foundation models for generalist medical artificial intelligence

    Michael Moor, Oishi Banerjee, Zahra Shakeri Hossein Abad, Harlan M Krumholz, Jure Leskovec, Eric J Topol, and Pranav Rajpurkar. Foundation models for generalist medical artificial intelligence.Nature, 616(7956): 259–265, 2023

  25. [25]

    Green AI

    Roy Schwartz, Jesse Dodge, Noah A Smith, and Oren Etzioni. Green ai.Communications of the ACM, 63(12): 54–63, 2020

  26. [26]

    Omniscientist: Toward a co-evolving ecosystem of human and ai scientists

    Chenyang Shao, Dehao Huang, Yu Li, Keyu Zhao, Weiquan Lin, Yining Zhang, Qingbin Zeng, Zhiyu Chen, Tianxing Li, Yifei Huang, et al. Omniscientist: Toward a co-evolving ecosystem of human and ai scientists. arXiv preprint arXiv:2511.16931, 2025

  27. [27]

    AI-Researcher: Autonomous scientific innovation

    Jiabin Tang, Lianghao Xia, Zhonghang Li, and Chao Huang. Ai-researcher: Autonomous scientific innovation. arXiv preprint arXiv:2505.18705, 2025

  28. [28]

    InternAgent: When agent becomes the scientist – building closed-loop system from hypothesis to verification

    InternAgent Team, Bo Zhang, Shiyang Feng, Xiangchao Yan, Jiakang Yuan, Runmin Ma, Yusong Hu, Zhiyin Yu, Xiaohan He, Songtao Huang, et al. Internagent: When agent becomes the scientist–building closed-loop system from hypothesis to verification.arXiv preprint arXiv:2505.16938, 2025

  29. [29]

    Kimi-VL Technical Report

    Kimi Team, Angang Du, Bohong Yin, Bowei Xing, Bowen Qu, Bowen Wang, Cheng Chen, Chenlin Zhang, Chenzhuang Du, Chu Wei, et al. Kimi-vl technical report.arXiv preprint arXiv:2504.07491, 2025

  30. [30]

    LLaMA: Open and Efficient Foundation Language Models

    Hugo Touvron, Thibaut Lavril, Gautier Izacard, Xavier Martinet, Marie-Anne Lachaux, Timothée Lacroix, Baptiste Rozière, Naman Goyal, Eric Hambro, Faisal Azhar, et al. Llama: Open and efficient foundation language models.arXiv preprint arXiv:2302.13971, 2023

  31. [31]

    Automating science

    David Waltz and Bruce G Buchanan. Automating science.Science, 324(5923):43–44, 2009

  32. [32]

    DeepScientist: Advancing frontier-pushing scientific findings progressively

    Yixuan Weng, Minjun Zhu, Qiujie Xie, Qiyao Sun, Zhen Lin, Sifan Liu, and Yue Zhang. Deepscientist: Advancing frontier-pushing scientific findings progressively.arXiv preprint arXiv:2509.26603, 2025

  33. [33]

    The shaky foundations of large language models and foundation models for electronic health records

    Michael Wornow, Yizhe Xu, Rahul Thapa, Birju Patel, Ethan Steinberg, Scott Fleming, Michael A Pfeffer, Jason Fries, and Nigam H Shah. The shaky foundations of large language models and foundation models for electronic health records.npj digital medicine, 6(1):135, 2023

  34. [34]

    An empirical analysis of uncertainty in large language model evaluations

    Qiujie Xie, Qingqiu Li, Zhuohao Yu, Yuejie Zhang, Yue Zhang, and Linyi Yang. An empirical analysis of uncertainty in large language model evaluations.arXiv preprint arXiv:2502.10709, 2025

  35. [35]

    The AI Scientist-v2: Workshop-Level Automated Scientific Discovery via Agentic Tree Search

    Yutaro Yamada, Robert Tjarko Lange, Cong Lu, Shengran Hu, Chris Lu, Jakob Foerster, Jeff Clune, and David Ha. The ai scientist-v2: Workshop-level automated scientific discovery via agentic tree search.arXiv preprint arXiv:2504.08066, 2025

  36. [36]

    AI becomes a masterbrain scientist

    Zijie Yang, Yukai Wang, and Lijing Zhang. Ai becomes a masterbrain scientist.bioRxiv, pages 2023–04, 2023

  37. [37]

    AI scientists fail without strong implementation capability

    Minjun Zhu, Qiujie Xie, Yixuan Weng, Jian Wu, Zhen Lin, Linyi Yang, and Yue Zhang. Ai scientists fail without strong implementation capability. arXiv preprint arXiv:2506.01372, 2025