Agentic-Ideation: Sample Efficient Agentic Trajectories Synthesis for Scientific Ideation Agents

Fengli Xu; Keyu Zhao; Lingyan Kong; Yong Li

arXiv: 2606.31229 · v1 · pith:6PQVBRAL · submitted 2026-06-30 · 💻 cs.AI

Keyu Zhao, Lingyan Kong, Fengli Xu, Yong Li

Pith reviewed 2026-07-01 05:47 UTC · model grok-4.3

classification 💻 cs.AI

keywords: Agentic LLMs · Scientific Ideation · Trajectory Synthesis · Oracle-Guided Learning · Sample Efficiency · Tool-Augmented Agents · Multi-Agent Systems · Automated Discovery


The pith

Oracle-guided synthesis lets agentic LLMs for scientific ideation beat the strongest workflow-based baseline by 11.91 percent in overall quality, while making high-quality training-data synthesis over 10 times more sample-efficient.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper sets out to cut the high cost of creating training data for agentic LLMs that must perform open-ended scientific ideation. It replaces undirected trial-and-error with an oracle-guided pipeline. That pipeline uses a known reference idea to steer a multi-agent system toward useful reasoning paths and tool calls. The resulting trajectories then train a model whose loss ignores tool execution results, so the model learns the decision logic rather than the tools' outputs. A reader would care because the approach claims to make flexible, autonomous scientific agents practical instead of limiting them to rigid, pre-written workflows.

Core claim

By defining a tool space of three external and three cognitive tools and then applying oracle-guided data synthesis, the method reconstructs logical reasoning trajectories from a reference idea; training an agentic LLM on these trajectories with masked tool feedback produces a model that outperforms the strongest workflow-based baseline by 11.91 percent in overall quality and achieves more than 10 times better sample efficiency for high-quality data.
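To make the tool-space definition concrete, here is a minimal sketch of a six-tool registry split into external and cognitive kinds. It rests on stated assumptions. Only Reflection and Analyze_Gap are named in the paper (Figure 4 caption), and labelling them "cognitive" is our guess. Search_Papers, Read_Paper, Get_Citations and Propose_Idea are hypothetical placeholders, not the paper's actual tools.

```python
from dataclasses import dataclass
from enum import Enum


class ToolKind(Enum):
    EXTERNAL = "external"    # interacts with outside resources (e.g. literature)
    COGNITIVE = "cognitive"  # structures the agent's own reasoning


@dataclass(frozen=True)
class Tool:
    name: str
    kind: ToolKind


# Hypothetical registry. Only Reflection and Analyze_Gap appear in the paper
# (Figure 4 caption); their "cognitive" label and the four other names are
# placeholders, not the paper's actual tool list.
_TOOLS = [
    Tool("Search_Papers", ToolKind.EXTERNAL),
    Tool("Read_Paper", ToolKind.EXTERNAL),
    Tool("Get_Citations", ToolKind.EXTERNAL),
    Tool("Analyze_Gap", ToolKind.COGNITIVE),
    Tool("Reflection", ToolKind.COGNITIVE),
    Tool("Propose_Idea", ToolKind.COGNITIVE),
]
TOOL_SPACE = {tool.name: tool for tool in _TOOLS}


def validate_call(name: str) -> Tool:
    """Look up a tool call by name, rejecting anything outside the tool space."""
    try:
        return TOOL_SPACE[name]
    except KeyError:
        raise ValueError(f"unknown tool: {name}") from None


def count_by_kind() -> dict[ToolKind, int]:
    """Count registered tools per kind (expected: three of each)."""
    counts = {kind: 0 for kind in ToolKind}
    for tool in TOOL_SPACE.values():
        counts[tool.kind] += 1
    return counts
```

Rejecting calls outside the registry is one way a trainer could keep synthesized trajectories inside the defined action space. Whether the paper enforces this is not stated.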

What carries the argument

Oracle-Guided Data Synthesis, which uses a reference idea to direct a multi-agent system in reconstructing decision and tool-use sequences instead of undirected search.
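The contrast between undirected search and oracle guidance can be sketched as one sampling loop with an optional reference idea. This illustrates the idea under assumptions and is not the authors' pipeline. The generate and judge callables, the threshold, and the attempt budget are hypothetical stand-ins for the multi-agent system and whatever quality filter the paper applies.

```python
from typing import Callable, Optional

# Hypothetical representation: one string per agent step (thoughts, tool calls).
Trajectory = list[str]


def synthesize(
    generate: Callable[[str, Optional[str]], Trajectory],
    judge: Callable[[Trajectory], float],
    task: str,
    threshold: float,
    max_attempts: int,
    oracle: Optional[str] = None,
) -> tuple[Optional[Trajectory], int]:
    """Sample trajectories until one scores at or above threshold.

    oracle=None gives plain reject sampling; passing a reference idea lets
    the generator condition on it (the oracle-guided variant).
    Returns the accepted trajectory (or None) and the attempts consumed.
    """
    for attempt in range(1, max_attempts + 1):
        trajectory = generate(task, oracle)
        if judge(trajectory) >= threshold:
            return trajectory, attempt
    return None, max_attempts
```

With oracle=None the loop is plain reject sampling, and the attempt count it returns is what drives synthesis cost. The oracle-guided claim amounts to saying that conditioning generate on the reference idea makes the first accepted attempt arrive much sooner.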

If this is right

The trained agent can reason flexibly across literature and actions without being locked into a fixed workflow.
Masking tool execution results during training forces the model to internalize decision-making rather than rely on external signals.
The same synthesis pipeline can be applied to other domains that need autonomous tool use and long reasoning chains.
Data synthesis cost drops enough that larger sets of high-quality trajectories become feasible to collect.
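The masking idea in the second point can be sketched as a per-token loss mask. The sketch assumes a trajectory is stored as role-tagged segments of token ids; the Segment type and the role names are hypothetical. Tool execution results get mask 0, as the paper describes. Masking the task prompt as well is a common supervised fine-tuning convention, and it is an assumption here.

```python
from dataclasses import dataclass


@dataclass
class Segment:
    role: str  # hypothetical roles: "user" (task), "assistant" (model), "tool" (result)
    token_ids: list[int]


def build_loss_mask(segments: list[Segment]) -> tuple[list[int], list[int]]:
    """Flatten a trajectory and mark which tokens count toward the SFT loss.

    Only assistant tokens (reasoning and tool calls) get mask 1. Tool
    execution results get mask 0, and so does the task prompt (an assumption).
    """
    input_ids: list[int] = []
    loss_mask: list[int] = []
    for seg in segments:
        keep = 1 if seg.role == "assistant" else 0
        input_ids.extend(seg.token_ids)
        loss_mask.extend([keep] * len(seg.token_ids))
    return input_ids, loss_mask


def masked_mean_nll(token_nll: list[float], loss_mask: list[int]) -> float:
    """Mean negative log-likelihood over unmasked positions only."""
    if len(token_nll) != len(loss_mask):
        raise ValueError("token_nll and loss_mask must align")
    kept = [nll for nll, m in zip(token_nll, loss_mask) if m]
    return sum(kept) / len(kept) if kept else 0.0
```

Masked positions add nothing to either the numerator or the denominator. As a result, the model is scored only on its own reasoning and tool calls, never on predicting what a tool returned.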

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The method may extend to domains outside science where reference solutions exist for training but not for deployment.
Removing the oracle entirely during synthesis and testing would reveal how much the performance gain depends on guided data creation.
Combining the trained agent with live experimental tools could close the loop from idea generation to validation.

Load-bearing premise

Trajectories built while an oracle reference idea is available will still produce useful decision logic once that reference is removed at inference time.

What would settle it

Run the trained agent on new ideation tasks that supply no reference idea at all, and measure whether its overall-quality gain over the workflow baseline falls below the reported 11.91 percent.
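The test reduces to a relative-improvement comparison. This minimal sketch assumes that the 11.91 percent is a relative gain in a mean overall-quality score; the abstract does not say whether it is relative or absolute. The function names are hypothetical.

```python
def relative_gain(agent_score: float, baseline_score: float) -> float:
    """Percent improvement of the agent over the baseline, relative to the baseline."""
    if baseline_score <= 0:
        raise ValueError("baseline score must be positive")
    return 100.0 * (agent_score - baseline_score) / baseline_score


def gain_holds_without_oracle(
    oracle_free_score: float, baseline_score: float, reported_gain: float = 11.91
) -> bool:
    """True if the oracle-free agent still matches or beats the reported gain."""
    return relative_gain(oracle_free_score, baseline_score) >= reported_gain
```

For example, a baseline mean of 5.0 and an oracle-free agent mean of 5.5 give a 10 percent gain, which would fall short of the reported figure. These scores are illustrative, not the paper's.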

Figures

Figures reproduced from arXiv: 2606.31229 by Fengli Xu, Keyu Zhao, Lingyan Kong, Yong Li.

**Figure 1.** Overview of the proposed Agentic-Ideation, which includes Oracle-Guided Agentic Data Synthesis and Agentic Supervised Fine-Tuning. The system utilizes Oracle guidance to synthesize high-quality research trajectories, which are subsequently used to fine-tune the model with a masking strategy applied to tool execution results. 2.2. Agentic Data Synthesis Distinct from rigid workflow-based systems, Agentic LLMs are d…

**Figure 2.** Comparison between Reject Sampling and our Oracle-Guided strategy. However, utilizing this system directly for data synthesis suffers from severe inefficiency, as shown in …

**Figure 3.** The structure of agentic trajectory training data. For model training, we utilize a Supervised Fine-Tuning (SFT) approach. Following (Chen et al., 2023; Jin et al., 2025; Zhang et al., 2025b), we mask tool execution results during loss calculation to ensure training performance and robustness. This design prevents the model from memorizing deterministic environmental feedback, forcing it to focus solel…

**Figure 4.** A visualized research trajectory generated by Agentic-Ideation. The agent actively explores the literature, identifies a gap in error correction, and, crucially, utilizes the Reflection tool to reject a redundant initial idea, leading to a more novel final proposal. …standard LLMs that often generate generic or hallucinated limitations, our agent leverages the <Analyze_Gap> tool to conduct a structural critique…
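The 10× sample-efficiency claim behind Figure 2 is a ratio of yields. This minimal sketch assumes that sample efficiency means high-quality trajectories obtained per synthesis attempt. The paper may instead count tokens, API calls, or cost, and the function names are hypothetical.

```python
def yield_rate(high_quality: int, attempts: int) -> float:
    """High-quality trajectories kept per synthesis attempt."""
    if attempts <= 0:
        raise ValueError("attempts must be positive")
    return high_quality / attempts


def efficiency_gain(guided: tuple[int, int], baseline: tuple[int, int]) -> float:
    """Yield of the guided strategy divided by the baseline's yield.

    Each argument is (high_quality_kept, attempts).
    """
    base = yield_rate(*baseline)
    if base == 0:
        raise ValueError("baseline kept no high-quality trajectories")
    return yield_rate(*guided) / base
```

Under this definition, a strategy that keeps 30 of 100 attempts has a 10× gain over one that keeps 3 of 100. Any counts used with these functions are illustrative, not the paper's data.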

Original abstract

Ideation plays a pivotal role in scientific discovery. Recent LLMs, especially AI Scientist systems, show promising potential for automated ideation. However, existing approaches predominantly rely on pre-defined agentic workflows. This constraint severely limits the flexibility required to navigate the vast search space of scientific literature and the complex action space of research reasoning. Recently, training Agentic LLMs has emerged as a promising direction, offering flexible reasoning frameworks and the capability for autonomous tool utilization. However, there remains a non-trivial challenge: applying previous agentic data synthesis methods to scientific ideation suffers from prohibitively high data synthesis cost. To bridge this gap, we propose Agentic-Ideation, a novel framework comprising an automated trajectory synthesis pipeline and a specialized agentic LLM trained for scientific ideation. Specifically, we first define a comprehensive tool space incorporating three external tools and three cognitive tools. Then we introduce an Oracle-Guided Data Synthesis strategy. By leveraging a reference idea as oracle guidance, this approach steers the multi-agent system to efficiently reconstruct the logical reasoning and tool invocation paths, transforming aimless trial-and-error into directed trajectory generation. Finally, we train the agent on these synthesized trajectories, employing a masking strategy on tool execution results. This ensures the model focuses on decision-making logic without interference from external feedback. Experimental results demonstrate that our method outperforms the SOTA workflow-based baseline by 11.91% in overall quality. Furthermore, our approach improves the sample efficiency of high-quality data synthesis by over 10×.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it: the pith above is the substance, and this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's main move is using a reference idea as an oracle to steer multi-agent trajectory synthesis for scientific ideation agents, then masking tool results in training. The 11.91% and 10x claims are the parts that need verification.


The core contribution is the oracle-guided data synthesis pipeline. They define a tool space with external and cognitive tools, then use a reference idea to direct multi-agent reconstruction of reasoning paths instead of pure trial-and-error. Training applies masking to tool execution results so the model learns decision logic rather than just reacting to feedback. This targets the high cost of generating agentic data for scientific ideation, which the abstract flags as a blocker for prior methods.

That combination is a concrete engineering step. It turns an expensive exploration problem into directed generation and pairs it with a training trick that keeps focus on the agent's choices. If the experiments hold, the efficiency gain would matter for anyone scaling these systems.

The results claim an 11.91% quality lift over the SOTA workflow baseline and over 10x better sample efficiency for high-quality data. Those numbers cannot be checked from the abstract alone, which gives no details on the quality metric, the baseline implementation, the controls, or statistical tests.

The bigger open question is generalization. Synthesis uses the oracle to steer trajectories, but inference runs without it. The masking helps isolate decision-making, yet nothing described guarantees the trajectories encode oracle-independent reasoning rather than paths that only look good because the reference idea was present during creation. The stress-test note lands on this point.

This is for people working on agentic LLMs for research automation. It shows honest engagement with the data-synthesis bottleneck and cites relevant prior work on workflows and agent training. The central argument is plausible on its face but rests on empirical claims that need full scrutiny. It deserves a serious referee to examine the evaluation setup and any oracle-free tests.

Referee Report

2 major / 0 minor

Summary. The paper proposes Agentic-Ideation, a framework for scientific ideation agents consisting of a tool space (three external + three cognitive tools), an Oracle-Guided Data Synthesis pipeline that uses a reference idea to steer multi-agent trajectory reconstruction, and training of an agentic LLM with masking applied only to tool execution results. It claims this yields trajectories that enable an agent outperforming a SOTA workflow-based baseline by 11.91% in overall quality while improving sample efficiency of high-quality data synthesis by over 10×.

Significance. If the empirical results and generalization hold, the work would address a practical bottleneck in scaling agentic LLMs for open-ended scientific reasoning by converting high-cost trial-and-error synthesis into directed generation, potentially enabling more flexible, autonomous ideation systems beyond fixed workflows.

major comments (2)

[Abstract] Abstract / Experimental results: the 11.91% quality improvement and 10× sample-efficiency claims are stated without any description of the concrete metric(s) used for 'overall quality', the precise implementation and hyper-parameters of the SOTA workflow baseline, the number of evaluated trajectories or ideas, statistical significance tests, or controls for prompt sensitivity and oracle leakage.
[Method] Oracle-Guided Data Synthesis strategy (described in the method): the pipeline explicitly conditions trajectory reconstruction on a reference idea (oracle) to convert trial-and-error into directed generation, yet the subsequent training (masking only on tool results) contains no ablation, hold-out evaluation, or analysis showing that the learned decision logic remains effective when the oracle is absent at inference time—the setting required for real scientific ideation tasks.
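One cheap control for the oracle leakage raised in the first comment is a verbatim-overlap screen. It compares the reference idea against the agent's intermediate reasoning steps. This is a hypothetical filter, not something the paper reports, and the word 5-gram unit and the 0.2 threshold are assumptions. The final proposal should be excluded from the screen, because oracle-guided synthesis deliberately steers it toward the reference.

```python
import re


def word_ngrams(text: str, n: int) -> set[tuple[str, ...]]:
    """Lowercased word n-grams; punctuation is ignored."""
    words = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(words[i:i + n]) for i in range(len(words) - n + 1)}


def oracle_overlap(steps_text: str, oracle_idea: str, n: int = 5) -> float:
    """Fraction of the oracle's word n-grams that reappear verbatim in the steps."""
    oracle_grams = word_ngrams(oracle_idea, n)
    if not oracle_grams:
        return 0.0
    return len(oracle_grams & word_ngrams(steps_text, n)) / len(oracle_grams)


def is_leaky(steps_text: str, oracle_idea: str, max_overlap: float = 0.2) -> bool:
    """Flag a trajectory whose reasoning steps copy too much of the oracle."""
    return oracle_overlap(steps_text, oracle_idea) > max_overlap
```

A surface check like this only catches copied wording. Paraphrased leakage would need a semantic-similarity model or the oracle-free ablation that the second comment asks for.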

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback, which highlights important areas for improving clarity and rigor in our presentation of results and methods. We address each major comment below and commit to revisions that strengthen the manuscript without altering its core contributions.


Referee: [Abstract] Abstract / Experimental results: the 11.91% quality improvement and 10× sample-efficiency claims are stated without any description of the concrete metric(s) used for 'overall quality', the precise implementation and hyper-parameters of the SOTA workflow baseline, the number of evaluated trajectories or ideas, statistical significance tests, or controls for prompt sensitivity and oracle leakage.

Authors: We agree that the abstract would benefit from explicit details on these elements to allow readers to better assess the claims. In the revision, we will expand the abstract to define the overall quality metric (a composite human evaluation score across novelty, feasibility, and scientific impact), specify the SOTA baseline implementation and hyperparameters, report the evaluation scale (number of trajectories and ideas), include results of statistical significance tests, and describe the controls applied for prompt sensitivity and oracle leakage. Corresponding clarifications will also be added to the experimental section. revision: yes
Referee: [Method] Oracle-Guided Data Synthesis strategy (described in the method): the pipeline explicitly conditions trajectory reconstruction on a reference idea (oracle) to convert trial-and-error into directed generation, yet the subsequent training (masking only on tool results) contains no ablation, hold-out evaluation, or analysis showing that the learned decision logic remains effective when the oracle is absent at inference time—the setting required for real scientific ideation tasks.

Authors: The masking of tool execution results during training is specifically intended to encourage the model to internalize decision-making logic independent of external signals, supporting generalization to oracle-free inference. That said, we acknowledge the value of explicit validation for this claim. We will add an ablation study in the revised manuscript that evaluates the trained agent on held-out ideation tasks both with and without oracle guidance at inference time, along with analysis of decision quality in the oracle-absent setting. revision: yes

Circularity Check

0 steps flagged

No circularity; empirical method with no derivation chain


The paper describes an empirical pipeline (tool space definition, Oracle-Guided Data Synthesis using reference ideas, masking during training) and reports performance gains (11.91% quality, 10× efficiency) against a SOTA baseline. No equations, derivations, or first-principles results appear. None of the six enumerated circularity patterns apply: there are no self-definitional relations, no fitted inputs relabeled as predictions, no load-bearing self-citations, and no uniqueness theorems or ansatzes. The oracle-to-inference generalization is an empirical assumption, not a circular reduction. The claims therefore stand or fall on external benchmarks rather than on any self-referential derivation.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The central claim rests on the unstated assumption that LLM-based multi-agent systems can reliably reconstruct logical paths when guided by an oracle and that masking tool outputs isolates decision-making skill; no free parameters or invented physical entities are mentioned.

axioms (2)

domain assumption LLM agents can be steered by an oracle reference idea to produce useful reasoning trajectories
Invoked in the description of Oracle-Guided Data Synthesis
domain assumption Masking tool execution results during training isolates decision-making logic from external feedback
Stated as the training strategy to focus the model

invented entities (1)

Oracle-Guided Data Synthesis strategy no independent evidence
purpose: Efficient trajectory generation for agent training
New named method introduced to replace aimless trial-and-error

pith-pipeline@v0.9.1-grok · 5814 in / 1399 out tokens · 22483 ms · 2026-07-01T05:47:56.598468+00:00 · methodology


Reference graph

Works this paper leans on

12 extracted references · 11 canonical work pages · 4 internal anchors

[1] Chen, B., Shu, C., Shareghi, E., Collier, N., Narasimhan, K., and Yao, S. Fireact: Toward language agent fine-tuning. ArXiv, abs/2310.05915, 2023. URL https://api.semanticscholar.org/CorpusID:263829338. Chen, M., Li, T., Sun, H., Zhou, Y., Zhu, C., Wang, H., Pan, J. Z., Zhang, W., zeng Chen, H., Y...

[2] Fang, R., Cai, S., Li, B., Wu, J., Li, G., Yin, W., Wang, X., Wang, X., Su, L., Zhang, Z., Wu, S., Tao, Z., Jiang, Y., Xie, P., Huang, F., and Zhou, J. Towards general agentic intelligence via environment scaling. ArXiv, abs/2509.13311, ...

[3] Geng, X., Xia, P., Zhang, Z., Wang, X., Wang, Q., Ding, R., Wang, C., Wu, J., Zhao, Y., Li, K., Jiang, Y., Xie, P., Huang, F., and Zhou, J. Webwatcher: Breaking new frontier of vision-language deep research agent. ArXiv, abs/2508.05748, ...

[4] Gottweis, J., Weng, W.-H., Daryin, A., Tu, T., Palepu, A., Sirkovic, P., Myaskovsky, A., Weissenberger, F., Rong, K., Tanno, R., et al. Towards an ai co-scientist. arXiv preprint arXiv:2502.18864, 2025. Hu, S., Lu, C., and Clune, J. Automated design of agentic systems. ArXiv, abs/2408.08435, ...

[5] Jin, B., Zeng, H., Yue, Z., Wang, D., Zamani, H., and Han, J. Search-r1: Training llms to reason and leverage search engines with reinforcement learning. ArXiv, abs/2503.09516, 2025. URL https://api.semanticscholar.org/CorpusID:276937772. Li, K., Zhang, Z., Yin, H., Ye, R., Zhao, Y., Zhang, L., ...

[6] Pu, Y., Lin, T., and Chen, H. Piflow: Principle-aware scientific discovery with multi-agent collaboration. arXiv preprint arXiv:2505.15047, 2025. Romera-Paredes, B., Barekatain, M., Novikov, A., Balo...

[7] Schmidgall, S. and Moor, M. Agentrxiv: Towards collaborative autonomous research. arXiv preprint arXiv:2503.18102, 2025. Schmidgall, S., Su, Y., Wang, Z., Sun, X., Wu, J., Yu, X., Liu, J., Liu, Z., and Barsoum, E. Agent laboratory: Using llm agents as research assistants. arXiv preprint arXiv:2501.04...

[8] Su, H., Chen, R., Tang, S., Yin, Z., Zheng, X., Li, J., Qi, B., Wu, Q., Li, H., Ouyang, W., Torr, P., Zhou, B., and Dong, N. Many heads are better than one: Improved scientific idea generation by a llm-based multi-agent system. In Annual Meeting of the Association for Computational Linguistics, ...

[9] Tang, J., Xia, L., Li, Z., and Huang, C. Ai-researcher: Autonomous scientific innovation. arXiv preprint arXiv:2505.18705, 2025. Tao, Z., Wu, J., Yin, W., Zhang, J., Li, B., Shen, H., Li, K., Zhang, L., Wang, X., Jiang, Y., Xie, P., Huang, F., and Zhou, J. Webshaper: Agentically data synthesizing ...

[10] Wang, Q., Downey, D., Ji, H., and Hope, T. Scimon: Scientific inspiration machines optimized for novelty. In Annual Meeting of the Association for Computational Linguistics, 2023. URL https://api.semanticscholar.org/CorpusID:258841365. Wang, W., Gu, L., Zhang, L., Luo, Y., Dai, Y., Shen, C., Xi...

[11] Yamada, Y., Lange, R. T., Lu, C., Hu, S., Lu, C., Foerster, J., Clune, J., and Ha, D. The ai scientist-v2: Workshop-level automated scientific discovery via agentic tree search. arXiv preprint arXiv:2504.08066, 2025. Yan, X., Feng, S., Yuan, J., Xia, R., Wang, B., Bai, L., and Zhang, B. Surveyforge...

[12] Yu, H., Hong, Z., Cheng, Z., Zhu, K., Xuan, K., Yao, J., Feng, T., and You, J. Researchtown: Simulator of human research community. arXiv preprint arXiv:2412.17767, 2024. Zhang, L., Wang, M., and Chen, B. Scientific judgment drifts over time in ai ideation. ArXiv, abs/2511.04964, 2025a. URL https://ap...
