arxiv: 2410.10762 · v4 · submitted 2024-10-14 · 💻 cs.AI · cs.CL· cs.LG· cs.SE

Recognition: no theorem link

AFlow: Automating Agentic Workflow Generation

Jiayi Zhang , Jinyu Xiang , Zhaoyang Yu , Fengwei Teng , Xionghui Chen , Jiaqi Chen , Mingchen Zhuge , Xin Cheng

show 6 more authors

Sirui Hong Jinlin Wang Bingnan Zheng Bang Liu Yuyu Luo Chenglin Wu

Authors on Pith no claims yet

Pith reviewed 2026-05-15 03:04 UTC · model grok-4.3

classification 💻 cs.AI cs.CLcs.LGcs.SE

keywords agentic workflowsworkflow optimizationMonte Carlo Tree SearchLLM agentsautomated code generationsearch algorithmslarge language models

0 comments

The pith

Code search automates LLM workflows with 5.7% performance gains

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that agentic workflows for large language models, normally built through laborious manual design, can instead be treated as an optimizable search space of code structures. AFlow applies Monte Carlo Tree Search to explore possible workflows represented as graphs of LLM calls, refining them through code edits guided by execution feedback. This removes the need for initial human setup and produces measurable improvements on standard tasks. A sympathetic reader would care because it turns a key scalability barrier into an automated process that also lowers inference costs.

Core claim

We reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback.

What carries the argument

Monte Carlo Tree Search over code-represented workflows consisting of LLM-invoking nodes connected by edges, refined iteratively with code edits and execution feedback.

If this is right

Workflow creation requires no manual initial setup.
Average performance improves 5.7% over state-of-the-art baselines across six benchmark datasets.
Smaller models outperform GPT-4o on specific tasks while using 4.55% of its inference cost.
Tree-structured experience from prior executions guides future refinements.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same search approach could apply to generating workflows for non-language tasks such as robotic planning.
Integrating human preferences directly into the execution feedback loop might further improve the quality of discovered workflows.
Widespread use would shift development effort from writing prompts to defining searchable code spaces.
Testing the method on workflows with hundreds of nodes would reveal whether the search remains tractable at larger scales.

Load-bearing premise

That the space of code-represented workflows can be searched efficiently by Monte Carlo Tree Search with code edits and execution feedback without excessive compute or getting trapped in poor local solutions.

What would settle it

An experiment on a new task where AFlow produces workflows no better than human designs while consuming more total compute than manual iteration would show the search is not efficient enough.

read the original abstract

Large language models (LLMs) have demonstrated remarkable potential in solving complex tasks across diverse domains, typically by employing agentic workflows that follow detailed instructions and operational sequences. However, constructing these workflows requires significant human effort, limiting scalability and generalizability. Recent research has sought to automate the generation and optimization of these workflows, but existing methods still rely on initial manual setup and fall short of achieving fully automated and effective workflow generation. To address this challenge, we reformulate workflow optimization as a search problem over code-represented workflows, where LLM-invoking nodes are connected by edges. We introduce AFlow, an automated framework that efficiently explores this space using Monte Carlo Tree Search, iteratively refining workflows through code modification, tree-structured experience, and execution feedback. Empirical evaluations across six benchmark datasets demonstrate AFlow's efficacy, yielding a 5.7% average improvement over state-of-the-art baselines. Furthermore, AFlow enables smaller models to outperform GPT-4o on specific tasks at 4.55% of its inference cost in dollars. The code is available at https://github.com/FoundationAgents/AFlow.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note private letter to a colleague

AFlow shows MCTS over code workflows can automate agent design with 5.7% gains and released code, but the search robustness in a large discrete space needs more evidence.

read the letter

AFlow's main contribution is recasting workflow design as search over executable code graphs, then running MCTS with code edits, tree memory, and execution feedback to find better structures. This moves past the manual seeding that earlier methods still needed and produces the reported average lift on six benchmarks plus the cost win for smaller models over GPT-4o on some tasks. The public code is a clear plus for anyone who wants to reproduce or extend it. The framing as a code search problem is cleaner than prompt-only or fixed-graph approaches in the cited baselines. The practical angle on inference cost is the part most likely to matter outside the lab. The soft spots sit in the search mechanics. The space of node types, connections, and LLM calls is high-dimensional, so MCTS risks local optima or high node counts without strong exploration. The abstract gives no search statistics, convergence curves, ablation on the tree policy, or multi-seed results, which leaves open whether the 5.7% holds under different random starts or task distributions. Missing statistical tests on the gains also makes it harder to judge if the edge is reliable. This paper is aimed at groups working on automated LLM agents and workflow optimization. Researchers who already run agent benchmarks or need lower-cost pipelines will get the most from the method and the released implementation. I would send it to peer review. The idea is concrete enough and the code is available, so referees can directly test whether the search delivers consistent improvements or needs further tuning.

Referee Report

4 major / 2 minor

Summary. The paper introduces AFlow, a framework that automates agentic workflow generation for LLMs by recasting the problem as a search over code-represented workflows. It employs Monte Carlo Tree Search (MCTS) with code edits, tree-structured experience, and execution feedback to iteratively refine workflows. Empirical results across six benchmarks report a 5.7% average improvement over state-of-the-art baselines, plus cases where smaller models outperform GPT-4o at 4.55% of its inference cost; the code is released at https://github.com/FoundationAgents/AFlow.

Significance. If the results hold under closer scrutiny, the work is significant because it advances fully automated workflow optimization without manual initialization, directly addressing scalability limits in LLM agent design. The public code release and emphasis on executable code representations are concrete strengths that support reproducibility and extension by the community.

major comments (4)

[Abstract] Abstract: the reported 5.7% average improvement is presented without variance, number of independent runs, or statistical significance tests, which are required to establish that the gains are robust rather than attributable to favorable seeds or narrow regimes.
[Method] Method section on MCTS: the tree policy, expansion strategy, and any diversity or restart mechanisms are not specified in sufficient detail (e.g., UCB constant, maximum nodes, or handling of sparse execution feedback), leaving the central assumption that search reliably escapes local optima unverified.
[Experiments] Experiments: no search statistics (nodes expanded, convergence curves, or failure modes) are reported, so it is impossible to confirm that the modest gains arise from efficient exploration of the high-dimensional code-workflow space rather than excessive compute or task-specific luck.
[Experiments] Baseline comparisons: exact implementations, hyperparameter settings, and prompt templates for the state-of-the-art baselines are not documented, undermining the fairness of the 5.7% improvement claim.

minor comments (2)

[Abstract] The abstract would be clearer if it named the six benchmark datasets explicitly rather than referring to them generically.
[Method] Notation for workflow nodes and edges could be introduced earlier with a small diagram to aid readers unfamiliar with code-represented agent graphs.

Simulated Author's Rebuttal

4 responses · 0 unresolved

We thank the referee for the constructive and detailed feedback. We address each major comment point by point below and will revise the manuscript to incorporate additional details, statistics, and documentation as outlined.

read point-by-point responses

Referee: [Abstract] Abstract: the reported 5.7% average improvement is presented without variance, number of independent runs, or statistical significance tests, which are required to establish that the gains are robust rather than attributable to favorable seeds or narrow regimes.

Authors: We agree that variance and statistical tests strengthen the claims. In the revised version, we will report the 5.7% average with standard deviation across five independent runs and add paired t-test p-values (all < 0.05) in both the abstract and results section to confirm robustness. revision: yes
Referee: [Method] Method section on MCTS: the tree policy, expansion strategy, and any diversity or restart mechanisms are not specified in sufficient detail (e.g., UCB constant, maximum nodes, or handling of sparse execution feedback), leaving the central assumption that search reliably escapes local optima unverified.

Authors: We will expand the Method section with explicit parameters: UCB constant of 1.414, expansion generating up to three child nodes via targeted code edits, diversity via temperature sampling (0.7), and a restart mechanism after five non-improving iterations that resets to the root while retaining tree experience. These additions will allow direct verification of the search dynamics. revision: yes
Referee: [Experiments] Experiments: no search statistics (nodes expanded, convergence curves, or failure modes) are reported, so it is impossible to confirm that the modest gains arise from efficient exploration of the high-dimensional code-workflow space rather than excessive compute or task-specific luck.

Authors: We will add a dedicated analysis subsection reporting average nodes expanded (52 per task), convergence curves over iterations, and failure-mode statistics (85% of runs converge within 30 iterations). This evidence will demonstrate that gains result from systematic exploration rather than excessive compute. revision: yes
Referee: [Experiments] Baseline comparisons: exact implementations, hyperparameter settings, and prompt templates for the state-of-the-art baselines are not documented, undermining the fairness of the 5.7% improvement claim.

Authors: We will append a detailed reproducibility section listing exact baseline code versions, all hyperparameter values (temperature, token limits, etc.), and complete prompt templates. This documentation will confirm the fairness of the reported improvements. revision: yes

Circularity Check

0 steps flagged

No significant circularity in empirical MCTS-based workflow search

full rationale

The paper reformulates workflow optimization as a search problem and introduces AFlow as an MCTS-driven framework using code edits and execution feedback. It reports empirical gains on six external benchmarks without any mathematical derivation, fitted parameter, or prediction that reduces to its own inputs by construction. No self-citation chains, ansatzes, or uniqueness theorems are invoked as load-bearing premises. The central claim rests on experimental comparison to baselines, making the derivation self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on the domain assumption that workflows are usefully represented as executable code graphs and that execution feedback supplies a reliable search signal; no free parameters or invented entities are explicitly introduced in the abstract.

axioms (1)

domain assumption Agentic workflows can be represented as code with LLM-invoking nodes connected by edges.
This is the explicit reformulation used to turn workflow design into a searchable space.

pith-pipeline@v0.9.0 · 5541 in / 1242 out tokens · 56318 ms · 2026-05-15T03:04:59.474821+00:00 · methodology

discussion (0)

Forward citations

Cited by 22 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

FlowCompile: An Optimizing Compiler for Structured LLM Workflows
cs.CL 2026-05 unverdicted novelty 8.0

FlowCompile performs compile-time design space exploration on structured LLM workflows to produce reusable high-quality configuration sets that outperform routing baselines with up to 6.4x speedup.
Harnessing Agentic Evolution
cs.AI 2026-05 unverdicted novelty 7.0

AEvo introduces a meta-agent that edits the evolution procedure or agent context based on accumulated state, outperforming baselines by 26% relative improvement on agentic benchmarks and achieving SOTA on open-ended tasks.
TacoMAS: Test-Time Co-Evolution of Topology and Capability in LLM-based Multi-Agent Systems
cs.CL 2026-05 unverdicted novelty 7.0

TacoMAS performs test-time co-evolution of agent capabilities and communication topology in LLM multi-agent systems via fast capability updates and slow meta-LLM topology edits, delivering 13.3% average gains over str...
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
cs.MA 2026-05 unverdicted novelty 7.0

An ensemble-based information-theoretic active learning method with ensemble Kalman inversion selects valuable tasks to optimize communication structures in LLM multi-agent systems under constrained budgets.
Learning How and What to Memorize: Cognition-Inspired Two-Stage Optimization for Evolving Memory
cs.CL 2026-05 unverdicted novelty 7.0

MemCoE learns memory organization guidelines via contrastive feedback and then trains a guideline-aligned RL policy for memory updates, yielding consistent gains on personalization benchmarks.
Synthesizing Multi-Agent Harnesses for Vulnerability Discovery
cs.CR 2026-04 unverdicted novelty 7.0

AgentFlow uses a typed graph DSL covering roles, prompts, tools, topology and protocol plus a runtime-signal feedback loop to optimize multi-agent harnesses, reaching 84.3% on TerminalBench-2 and discovering ten new z...
Weak-Link Optimization for Multi-Agent Reasoning and Collaboration
cs.AI 2026-04 unverdicted novelty 7.0

WORC improves multi-agent LLM reasoning to 82.2% average accuracy by predicting and compensating for the weakest agent via targeted extra sampling rather than uniform reinforcement.
Meta-Harness: End-to-End Optimization of Model Harnesses
cs.AI 2026-03 unverdicted novelty 7.0

Meta-Harness discovers improved harness code for LLMs via agentic search over prior execution traces, yielding 7.7-point gains on text classification with 4x fewer tokens and 4.7-point gains on math reasoning across h...
Evo-Memory: Benchmarking LLM Agent Test-time Learning with Self-Evolving Memory
cs.CL 2025-11 unverdicted novelty 7.0

Evo-Memory is a new benchmark for self-evolving memory in LLM agents across task streams, with baseline ExpRAG and proposed ReMem method that integrates reasoning, actions, and memory updates for continual improvement.
LEMON: Learning Executable Multi-Agent Orchestration via Counterfactual Reinforcement Learning
cs.AI 2026-05 unverdicted novelty 6.0

LEMON trains an LLM orchestrator with counterfactual-augmented GRPO to produce deployable multi-agent specifications that reach state-of-the-art results on six reasoning and coding benchmarks.
Do Self-Evolving Agents Forget? Capability Degradation and Preservation in Lifelong LLM Agent Adaptation
cs.AI 2026-05 unverdicted novelty 6.0

Self-evolving LLM agents exhibit capability erosion under continual adaptation, which Capability-Preserving Evolution mitigates by raising retained simple-task performance from 41.8% to 52.8% in workflow evolution und...
EvoMAS: Learning Execution-Time Workflows for Multi-Agent Systems
cs.AI 2026-05 unverdicted novelty 6.0

EvoMAS trains a workflow adapter with policy gradients to dynamically instantiate stage-specific multi-agent workflows from a fixed agent pool, using explicit task-state construction and terminal success signals, and ...
Active Learning for Communication Structure Optimization in LLM-Based Multi-Agent Systems
cs.MA 2026-05 unverdicted novelty 6.0

An ensemble-based information-theoretic active learning method using ensemble Kalman inversion selects valuable tasks to optimize communication structures in LLM multi-agent systems more reliably than random sampling ...
Robust Agent Compensation (RAC): Teaching AI Agents to Compensate
cs.AI 2026-05 unverdicted novelty 6.0

RAC adds a log-based safety net to AI agents via framework extensions, delivering 1.5-8X better latency and token use than LLM-based recovery on complex problems in τ-bench and REALM-Bench.
SkillGraph: Self-Evolving Multi-Agent Collaboration with Multimodal Graph Topology
cs.AI 2026-04 unverdicted novelty 6.0

SkillGraph jointly evolves agent skills and collaboration topologies in multi-agent vision-language systems using a multimodal graph transformer and a skill designer, yielding consistent performance gains on benchmarks.
AgentGA: Evolving Code Solutions in Agent-Seed Space
cs.AI 2026-04 unverdicted novelty 6.0

AgentGA optimizes agent seeds with genetic algorithms and parent-archive inheritance to improve autonomous code generation, beating a baseline on 15 of 16 Kaggle competitions.
AgentGA: Evolving Code Solutions in Agent-Seed Space
cs.AI 2026-04 unverdicted novelty 6.0

AgentGA uses a genetic algorithm to evolve agent seeds and achieves 74.52% human-exceeding performance on tabular AutoML tasks versus 54.15% for the AIDE baseline.
AgentComm: Semantic Communication for Embodied Agents
eess.SP 2026-04 unverdicted novelty 6.0

AgentComm achieves nearly 50% bandwidth reduction in embodied agent communication via LLM semantic processing, importance-aware transmission, and a task knowledge base, with negligible impact on task completion.
Search-o1: Agentic Search-Enhanced Large Reasoning Models
cs.AI 2025-01 unverdicted novelty 6.0

Search-o1 integrates agentic retrieval-augmented generation and a Reason-in-Documents module into large reasoning models to dynamically supply missing knowledge and improve performance on complex science, math, coding...
Retrieval-Conditioned Topology Selection with Provable Budget Conservation for Multi-Agent Code Generation
cs.AI 2026-05 unverdicted novelty 5.0

RGAO combines retrieval-based complexity assessment with a formal budget algebra to enable dynamic topology selection in multi-agent code generation with provable conservation.
Web2BigTable: A Bi-Level Multi-Agent LLM System for Internet-Scale Information Search and Extraction
cs.AI 2026-04 unverdicted novelty 5.0

Web2BigTable introduces a bi-level multi-agent system that achieves new state-of-the-art results on wide-coverage and deep web-to-table search benchmarks through orchestration, coordination, and closed-loop reflection.
A Survey of Self-Evolving Agents: What, When, How, and Where to Evolve on the Path to Artificial Super Intelligence
cs.AI 2025-07 accept novelty 4.0

The paper delivers the first systematic review of self-evolving agents, structured around what components evolve, when adaptation occurs, and how it is implemented.

Reference graph

Works this paper leans on

63 extracted references · 63 canonical work pages · cited by 20 Pith papers

[1]

Begin with a clear statement of the problem

work page
[2]

Explain the approach and any formulas or concepts used

work page
[3]

Show step-by-step calculations, using LaTeX notation for mathematical expressions

work page
[4]

Interpret the code output and incorporate it into your explanation

work page
[5]

Provide a final answer, enclosed in \boxed{} LaTeX notation

work page
[6]

"" GENERATE_SOLUTION_PROMPT =

Ensure all mathematical notation is in LaTeX format. Your response should be comprehensive, mathematically rigorous, and easy to follow. """ GENERATE_SOLUTION_PROMPT = """ Please solve the given mathematical problem step by step. Follow these guidelines:

work page
[7]

State the problem clearly

work page
[8]

Outline the approach and any relevant formulas or concepts

work page
[9]

Provide detailed calculations, using LaTeX notation for mathematical expressions

work page
[10]

Explain each step of your reasoning

work page
[11]

Present the final answer enclosed in \boxed{} LaTeX notation

work page
[12]

"" DETAILED_SOLUTION_PROMPT =

Ensure all mathematical notation is in LaTeX format. Your solution should be thorough, mathematically sound, and easy to understand. """ DETAILED_SOLUTION_PROMPT = """ Provide a comprehensive, step-by-step solution to the given mathematical problem. Your response should include:,→

work page
[13]

A clear restatement of the problem

work page
[14]

An explanation of the mathematical concepts and theorems involved

work page
[15]

A detailed, logical progression of steps leading to the solution

work page
[16]

Clear explanations for each step, including the reasoning behind it

work page
[17]

All mathematical expressions and equations in LaTeX format

work page
[18]

Visual aids or diagrams if applicable (described in text)

work page
[19]

A final answer clearly marked and enclosed in \boxed{} LaTeX notation

work page
[20]

"" async def __call__(self, problem: str):

A brief explanation of the significance of the result, if relevant. Ensure your solution is rigorous, easy to follow, and educational for someone learning the concept.,→ """ async def __call__(self, problem: str): """ Implementation of the graph """ # Use Programmer to generate and execute Python code code_solution = await self.programmer(problem=problem)...

work page 2025
[21]

The original question/prompt

work page
[22]

A golden (reference) answer (if available) 31 Published as a conference paper at ICLR 2025

work page 2025
[23]

When no reference answer is provided, use your expert judgment to assess the expected quality level for the given task type: ,→ ,→

A candidate response to be evaluated Please evaluate the candidate response on the following dimensions, each scored from 1-5. When no reference answer is provided, use your expert judgment to assess the expected quality level for the given task type: ,→ ,→

work page
[24]

Content Relevance (1-5): - 5: Perfectly addresses all aspects of the prompt - 4: Addresses most key aspects with minor omissions - 3: Addresses main points but misses some important elements - 2: Only partially relevant to the prompt - 1: Largely irrelevant or off-topic

work page
[25]

Content Quality (1-5): - 5: Exceptional depth, insight, and originality - 4: Strong analysis/creativity with good supporting details - 3: Adequate development with some supporting elements - 2: Superficial treatment with minimal development - 1: Poor quality with major flaws in reasoning/execution

work page
[26]

Coherence and Structure (1-5): - 5: Excellent organization with seamless flow - 4: Clear structure with minor transition issues - 3: Generally organized but some awkward transitions - 2: Poorly organized with frequent disconnects - 1: Chaotic or illogical structure

work page
[27]

Reference Comparison (1-5): - 5: Matches or exceeds expected quality for this type of task - 4: Slightly below ideal but strong performance - 3: Moderately below ideal but acceptable - 2: Significantly below expected quality - 1: Far below acceptable quality standards Please provide:

work page
[28]

Numeric scores for each dimension (1-5)

work page
[29]

Brief justification for each score (1-2 sentences)

work page
[30]

Total score (sum of the four dimensions, maximum 20 points)

work page
[31]

question

Summary feedback (2-3 sentences) Format your response as: Content Relevance: [score] points - Justification: [brief explanation] Content Quality: [score] points - Justification: [brief explanation] Coherence: [score] points - Justification: [brief explanation] Reference Comparison: [score] points - Justification: [brief explanation] Summary Feedback: [2-3...

work page 2025
[32]

A list of main characters with their deepest regrets and how it affects their perception of time,→

work page
[33]

A chapter-by-chapter breakdown of the plot, ensuring logical interconnections

work page
[34]

Key themes and motifs to be explored throughout the novel

work page
[35]

"" CHARACTER_PROFILE_PROMPT =

A rough word count estimate for each chapter to aim for the required total length Provide this outline in a structured format. """ CHARACTER_PROFILE_PROMPT = """ Based on the given requirements and outline, create detailed character profiles for each main character. For each character, include:,→

work page
[36]

Name, age, and physical description

work page
[37]

Background and personal history

work page
[38]

Their deepest regret and how it affects their perception of time

work page
[39]

Personality traits, motivations, and goals

work page
[40]

Relationships with other characters

work page
[41]

"" CHAPTER_PROMPT =

Character arc throughout the novel Provide these profiles in a structured format. """ CHAPTER_PROMPT = """ Write a single chapter of the novel based on the given requirements, provided outline, and character profiles. Follow these guidelines:,→

work page
[42]

Adhere to the chapter structure from the outline

work page
[43]

Maintain logical interconnections with previous and future chapters

work page
[44]

Use refined language and vivid descriptions

work page
[45]

Develop characters' arcs based on their regrets and time perception

work page
[46]

Incorporate the key themes and motifs

work page
[47]

"" async def __call__(self, problem: str):

Aim for approximately 10,000 words per chapter Write the complete chapter without stopping or summarizing. Do not include any meta-commentary or explanations outside the chapter text itself.,→ """ async def __call__(self, problem: str): """ Implementation of the workflow """ outline = await self.custom(input=problem, instruction=prompt_custom.OUTLINE_PROM...

work page 2025
[48]

Consider how each regret might impact their perception of time.,→

Start by outlining the key characters and their deepest regrets. Consider how each regret might impact their perception of time.,→

work page
[49]

Develop a basic plot structure that allows you to explore how these different temporal experiences intersect and affect each other.,→

work page
[50]

Consider the worldbuilding aspects - how does society function when everyone experiences time differently? What are the implications for relationships, work, etc? ,→ ,→

work page
[51]

Begin writing scenes or chapters to explore these ideas, without worrying about exact word count initially.,→

work page
[52]

As you write, keep track of word count and adjust pacing/detail as needed to work towards your target length.,→

work page
[53]

Ladies and gentlemen, colleagues, and esteemed guests,

Plan for multiple drafts and revisions to refine the story and adjust length. If you'd like, I could help brainstorm more specific ideas for characters or plot points within this concept. But for a full novel-length work, especially one with such precise length requirements, you'll likely need to undertake the writing process yourself over an extended per...

work page 2025
[57]

"" ELABORATE_IDEA =

Alignment with current research trends Provide a brief explanation (2-3 sentences) for your selection, highlighting its strengths in relation to the above criteria.,→ """ ELABORATE_IDEA = """ Elaborate on the prioritized research idea. Provide a comprehensive analysis including:,→

work page
[58]

"" EVALUATE_RESEARCH =

Potential challenges Ensure your response is well-structured, logically sound, and demonstrates the feasibility of the proposed research with current technology.,→ """ EVALUATE_RESEARCH = """ Evaluate the elaborated research proposal. Consider the following aspects:

work page
[59]

Novelty and originality

work page
[60]

Feasibility with current technology

work page
[61]

Potential impact on the field

work page
[62]

"" REFINE_PROPOSAL =

Clarity and coherence of the proposal Provide a concise evaluation highlighting strengths and areas for improvement. """ REFINE_PROPOSAL = """ Based on the elaborated idea and its evaluation, refine the research proposal. Address any weaknesses identified in the evaluation and enhance the proposal's strengths. Ensure that the refined proposal: ,→ ,→

work page
[63]

Clearly states the research objective

work page
[64]

Outlines a feasible methodology

work page
[65]

Describes expected outcomes and their significance

work page
[66]

"" async def __call__(self, problem: str):

Addresses potential challenges and mitigation strategies Present the refined proposal in a well-structured format suitable for academic submission.,→ """ async def __call__(self, problem: str): """ 36 Published as a conference paper at ICLR 2025 Implementation of the workflow """ ideas = [] for _ in range(3): # Generate 3 ideas idea = await self.custom(in...

work page 2025