MetaPS: Adaptive Programmatic Strategy Selection for Market Agents

Aotian Luo; Chi Zhang; Jiaxiang Chen; Weiyi Huang; Zenglin Xu; Zhouyi Zheng

arxiv: 2606.22385 · v1 · pith:IIFK52ENnew · submitted 2026-06-21 · 💻 cs.AI · cs.CE

Pith reviewed 2026-06-26 11:04 UTC · model grok-4.3

classification 💻 cs.AI cs.CE

keywords: market strategy selection · programmatic agents · simulation-guided fine-tuning · LLM trading agents · adaptive decision making · executable policies

The pith

MetaPS trains models on simulation rollouts to select from a library of code-based market strategies instead of generating actions directly.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

No single trading rule succeeds in every market condition. The paper replaces direct action generation by language models with a library of fixed programmatic strategies, each a short code module. It rolls these strategies out in a market simulator to create labeled examples of which program performs best in each observed state. A model is then fine-tuned on those pairs so that, at runtime, it receives only the current state and chooses the right program from the library; the chosen program then executes the trade. Experiments show this selection approach improves results across model sizes from 0.8B to 9B parameters and beats both fixed-strategy baselines and prompted large API models.
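To make "a library of fixed programmatic strategies, each a short code module" concrete, here is a minimal sketch. It is not the paper's code: the `MarketState` type, the buy/sell/hold action space, the three strategies (named after families the abstract lists) and every threshold are illustrative assumptions.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

Action = str  # hypothetical action space: "buy", "sell" or "hold"


@dataclass
class MarketState:
    prices: List[float]  # recent closes, oldest first (assumed observation format)
    position: int        # current holding: -1 short, 0 flat, 1 long (assumed)


def momentum(s: MarketState, lookback: int = 5) -> Action:
    # Follow the trend over the last `lookback` steps (needs lookback + 1 prices).
    change = s.prices[-1] - s.prices[-1 - lookback]
    if change > 0:
        return "buy"
    if change < 0:
        return "sell"
    return "hold"


def mean_reversion(s: MarketState, lookback: int = 5, band: float = 0.02) -> Action:
    # Bet on a return to the recent average once price leaves a +/- band around it.
    window = s.prices[-lookback:]
    avg = sum(window) / len(window)
    deviation = (s.prices[-1] - avg) / avg
    if deviation > band:
        return "sell"
    if deviation < -band:
        return "buy"
    return "hold"


def risk_control(s: MarketState, max_move: float = 0.05) -> Action:
    # Stop-loss style exit: close a long after a fall of more than `max_move` from the
    # window's high, or a short after a rise of more than `max_move` from its low.
    last = s.prices[-1]
    if s.position > 0 and (max(s.prices) - last) / max(s.prices) > max_move:
        return "sell"
    if s.position < 0 and (last - min(s.prices)) / min(s.prices) > max_move:
        return "buy"
    return "hold"


# The "library": strategy name -> executable program. A selector only ever returns a key.
STRATEGY_LIBRARY: Dict[str, Callable[[MarketState], Action]] = {
    "momentum": momentum,
    "mean_reversion": mean_reversion,
    "risk_control": risk_control,
}
```

For example, `STRATEGY_LIBRARY["momentum"](MarketState(prices=[100, 101, 102, 103, 104, 105], position=0))` returns `"buy"`. Keeping each program a plain function is what makes the final agent auditable: the selection is a dictionary key, and the behavior is readable code.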

Core claim

MetaPS converts simulation rollouts of candidate strategy programs into supervised training data that teaches a model to map market states to the program expected to yield better future outcomes; after training, the model selects a program from the library using only the live state, and the selected program produces the final action without further simulation queries.
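The inference-time contract in this claim can be sketched as follows, under stated assumptions: the selector sees only the live state plus the candidate names and returns a strategy name, and the chosen program, not the selector, produces the action. `stub_selector` is a hypothetical rule standing in for the fine-tuned model, and the fallback to a default program when the selector names something outside the library is our assumption, not something the summary describes.

```python
from typing import Callable, Dict, List, Sequence, Tuple

Action = str
Program = Callable[[Sequence[float]], Action]

# Hypothetical two-program library; here a "state" is just a short list of recent prices.
LIBRARY: Dict[str, Program] = {
    "momentum": lambda p: "buy" if p[-1] > p[0] else "sell",
    "hold_cash": lambda p: "hold",
}


def stub_selector(prices: Sequence[float], candidates: List[str]) -> str:
    """Stand-in for the fine-tuned model: maps a state to a strategy *name*, never an action.
    In MetaPS this role is played by a language model given the state and candidate context."""
    swing = (max(prices) - min(prices)) / min(prices)
    return "hold_cash" if swing > 0.10 else "momentum"


def act(prices: Sequence[float],
        selector: Callable[[Sequence[float], List[str]], str] = stub_selector,
        default: str = "hold_cash") -> Tuple[str, Action]:
    # No simulator call here: only the live state and the candidate names are used.
    choice = selector(prices, list(LIBRARY))
    if choice not in LIBRARY:  # assumed guard for selector output outside the library
        choice = default
    return choice, LIBRARY[choice](prices)  # the selected program emits the action
```

For instance, `act([100, 101, 103])` returns `("momentum", "buy")`, while a selector that answers `"unknown"` falls back to `("hold_cash", "hold")`.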

What carries the argument

A simulation-guided supervised fine-tuning loop that labels state-strategy pairs by their measured performance in backtested or simulated markets.
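A minimal sketch of such a labeling loop, assuming a single price series, strategies that map price history to a position in {-1, 0, 1}, a fixed look-ahead horizon, and a label equal to the strategy with the highest simulated return. The `margin` filter (drop states where no strategy clearly wins) is one hypothetical reading of "identifies states where particular strategies lead to better future outcomes"; the prompt format and all names are illustrative.

```python
from typing import Callable, Dict, List, Sequence, Tuple

# A strategy maps the price history seen so far to a position in {-1, 0, 1}.
Strategy = Callable[[Sequence[float]], int]


def rollout_return(prices: Sequence[float], t: int, horizon: int, strategy: Strategy) -> float:
    """Summed (non-compounded) return from following `strategy` for `horizon` steps from t.
    Each decision sees only prices up to the current step; later prices are used only to
    score the outcome, as a backtest would."""
    total = 0.0
    for k in range(t, t + horizon):
        position = strategy(prices[: k + 1])
        total += position * (prices[k + 1] - prices[k]) / prices[k]
    return total


def build_sft_pairs(prices: Sequence[float], library: Dict[str, Strategy], horizon: int = 3,
                    lookback: int = 3, margin: float = 0.0) -> List[Tuple[str, str]]:
    """Return (prompt, best_strategy_name) pairs; `library` needs at least two strategies."""
    pairs = []
    for t in range(lookback - 1, len(prices) - horizon):
        scores = {name: rollout_return(prices, t, horizon, fn) for name, fn in library.items()}
        ranked = sorted(scores.items(), key=lambda kv: kv[1], reverse=True)
        (best, best_score), (_, runner_up) = ranked[0], ranked[1]
        if best_score - runner_up < margin:
            continue  # hypothetical filter: skip states where no strategy clearly wins
        state = list(prices[t - lookback + 1 : t + 1])
        prompt = f"Recent prices: {state}. Candidates: {sorted(library)}. Pick one."
        pairs.append((prompt, best))
    return pairs


# Toy usage: constant-position strategies on a rise-then-fall series.
TOY_LIBRARY: Dict[str, Strategy] = {"long": lambda h: 1, "short": lambda h: -1, "flat": lambda h: 0}
PRICES = [100, 101, 102, 103, 102, 101, 100]
labels = [label for _, label in build_sft_pairs(PRICES, TOY_LIBRARY, horizon=2, lookback=2)]
# labels == ["long", "long", "short", "short"]
```

Raising `margin` to 0.001 drops the near-tie state at the top of the series and leaves `["long", "short", "short"]`, which illustrates the trade-off between label quantity and label confidence.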

If this is right

Compact fine-tuned models can exceed the trading performance of larger prompted API models in several of the tested settings.
The final agent remains fully executable as code and produces human-readable strategy selections.
Training data can be generated at scale from any market simulator without requiring human labels.
The same selection mechanism can be applied to any domain that supplies a library of programmatic policies.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

If the simulation-to-reality gap proves small, the approach could reduce reliance on very large models for sequential decision tasks.
Extending the method to non-stationary environments would require periodic re-simulation to refresh the training labels.
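One way to picture "periodic re-simulation" is a walk-forward schedule: regenerate labels from a recent window, deploy the refreshed selector on the following window, then roll forward. This is a sketch of the idea only; the function name and window lengths are hypothetical.

```python
from typing import Iterator, Tuple


def walk_forward_windows(n_steps: int, relabel_len: int,
                         deploy_len: int) -> Iterator[Tuple[range, range]]:
    """Yield (relabel_range, deploy_range) index pairs.

    Labels are regenerated by re-simulating over the most recent `relabel_len` steps, and
    the refreshed selector is used only on the following `deploy_len` steps, so deployment
    never overlaps the window its labels came from. A trailing partial window is dropped.
    """
    if relabel_len <= 0 or deploy_len <= 0:
        raise ValueError("window lengths must be positive")
    start = 0
    while start + relabel_len + deploy_len <= n_steps:
        split = start + relabel_len
        yield range(start, split), range(split, split + deploy_len)
        start += deploy_len  # roll forward by one deployment period
```

With 10 steps, a relabel window of 4 and a deployment window of 3, this yields `(range(0, 4), range(4, 7))` and `(range(3, 7), range(7, 10))`. If labels use an H-step look-ahead, the last H relabel indices would peek into the deployment window, so a stricter version would trim them.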

Load-bearing premise

Performance rankings observed when strategies are rolled out in simulation will continue to hold when the selected strategy runs in the real market environment.

What would settle it

Deploy the trained MetaPS selector and a direct-decision baseline in the same live market for a fixed period and compare realized returns under identical conditions.
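For such a comparison, a paired bootstrap on per-period return differences is one simple way to attach an interval to "compare realized returns". A sketch, assuming both agents report returns over the same periods; the function name is hypothetical, and the i.i.d. resampling ignores autocorrelation, so a block bootstrap would be the more careful choice for daily market returns.

```python
import random
from statistics import mean
from typing import Sequence, Tuple


def paired_bootstrap_ci(a: Sequence[float], b: Sequence[float], n_boot: int = 2000,
                        alpha: float = 0.05, seed: int = 0) -> Tuple[float, float, float]:
    """Mean per-period return difference (a - b) with a percentile bootstrap interval.
    Periods are resampled as whole pairs, so both agents always face the same market day."""
    if len(a) != len(b) or not a:
        raise ValueError("need two non-empty return series over the same periods")
    diffs = [x - y for x, y in zip(a, b)]
    rng = random.Random(seed)  # fixed seed for reproducible intervals
    boots = sorted(mean(rng.choices(diffs, k=len(diffs))) for _ in range(n_boot))
    lo = boots[int(alpha / 2 * n_boot)]
    hi = boots[int((1 - alpha / 2) * n_boot) - 1]
    return mean(diffs), lo, hi
```

An interval that excludes zero would be the kind of evidence the live test calls for; one that straddles zero would say the realized gap is not yet distinguishable from noise.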

Figures

Figures reproduced from arXiv: 2606.22385 by Aotian Luo, Chi Zhang, Jiaxiang Chen, Weiyi Huang, Zenglin Xu, Zhouyi Zheng.

**Figure 1.** Overview of MetaPS. Market simulations supervise a meta-level router by rolling out executable strategy…

**Figure 2.** Stock-market return comparison across baseline families on the held-out 2025 benchmark. MetaPS is compared with strategy-only baselines, non-LLM learned selectors, base/API LLM prompting, and matched-backbone Qwen routers…

**Figure 3.** Cumulative return trajectories on the held…

**Figure 4.** Strategy behavior under simulation-derived…

**Figure 5.** Risk–return comparison across model scales.

**Figure 6.** 2025 returns across Qwen model scales under the Ranked-Strategy setting. Bars show MetaPS variants,…

**Figure 7.** Portfolio equity curves for the main Qwen and MetaPS variants on the held-out 2025 stock benchmark.

**Figure 8.** Return dynamics for MetaPS-9B V3 and representative baselines on the held-out 2025 stock benchmark.

**Figure 9.** Drawdown diagnostics on the held-out 2025 stock benchmark. Lower values indicate deeper declines…

**Figure 10.** Calendar-level return diagnostics on the held-out 2025 stock benchmark. The block-level view summa…

**Figure 11.** Additional baseline diagnostics on the held-out 2025 stock benchmark. The left panel compares return…

**Figure 12.** Scale and behavior analysis for the 2025 stock benchmark. The left panel reports returns across Qwen…

**Figure 13.** Behavior diagnostics for realized action labels and SFT data views. The best-router distribution shows…
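Figure 9's caption says lower drawdown values indicate deeper declines. A minimal sketch of that sign convention, assuming drawdown is measured against the running peak of a positive equity curve (the paper's exact definition is not reproduced on this page, and the function names are hypothetical):

```python
from typing import List, Sequence


def drawdown_series(equity: Sequence[float]) -> List[float]:
    """Drawdown at each step as equity / running peak - 1 (equity assumed positive).
    It is 0.0 at a new high and negative otherwise, so lower values mean deeper declines."""
    out: List[float] = []
    peak = float("-inf")
    for value in equity:
        peak = max(peak, value)
        out.append(value / peak - 1.0)
    return out


def max_drawdown(equity: Sequence[float]) -> float:
    """Most negative drawdown over a non-empty curve (-0.25 means a 25% peak-to-trough fall)."""
    return min(drawdown_series(equity))
```

For example, `max_drawdown([100, 110, 99, 120, 90])` returns `-0.25`: the fall from the 120 peak to 90 is deeper than the earlier dip from 110 to 99.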

Original abstract

No single market strategy always wins: momentum, mean reversion, risk control, and event-driven rules can each succeed or fail as market conditions change. Rather than asking large language models to directly generate market actions, we study an executable decision paradigm where an agent selects from a library of programmatic strategies, each implemented as a code module mapping market observations to actions. We propose MetaPS, a simulation-guided framework for adaptive programmatic strategy selection. MetaPS rolls out candidate strategies in simulated or backtested markets, identifies states where particular strategies lead to better future outcomes, and converts these state–strategy pairs into supervised fine-tuning data. During inference, the simulator is no longer queried: MetaPS observes only the current market state and candidate strategy context, selects a suitable strategy program, and the selected program produces the final action. Experiments on multi-stock trading and a controlled goods-exchange sandbox show that MetaPS consistently improves across model scales from 0.8B to 9B parameters. It outperforms fixed-strategy baselines, direct decision-making agents, and prompted API-based LLM agents; in several settings, compact fine-tuned models even surpass stronger API models. These results demonstrate that market simulations can provide scalable and targeted supervision for learning adaptive, interpretable, and executable strategy selection.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

MetaPS trains a selector on sim rollouts to pick code strategies and beats baselines inside those sims, but leaves sim-to-real transfer untested.

The paper's main move is to generate supervised data by rolling out a library of programmatic strategies in simulated markets, then fine-tune a model to map current states to the best strategy from that library. At inference, the model only picks a strategy, and the chosen code module produces the action. This keeps outputs executable and avoids direct action generation by the LLM.

The pipeline is new in its specific combination of simulation-guided labeling for strategy selection rather than generation. The experiments run on multi-stock trading and a goods-exchange sandbox, and they report consistent gains from 0.8B to 9B models, with some fine-tuned compact models beating prompted larger API models inside the test environments. That evidence is straightforward for what it measures.

The soft spot is the one flagged in the stress test. All results, including the performance rankings used to create the training labels, come from the same simulators. No experiment checks whether those rankings or the learned selector remain stable once real frictions such as slippage, liquidity shocks, or unmodeled regime shifts appear. For market agents that claim practical value, this transfer question is central and currently unanswered.

The work is aimed at researchers building LLM agents for dynamic control tasks where you want interpretable code actions. A reader interested in bootstrapping supervision from simulation would find the setup useful to examine.

I would bring it to a reading group to discuss the transfer gap. I would not cite it in the next year on current evidence. It deserves peer review because the method is concrete and the experiments line up with the claims, even though the sim-to-real issue needs attention.

Referee Report

2 major / 1 minor

Summary. The paper proposes MetaPS, a simulation-guided framework in which candidate programmatic strategies (implemented as code modules) are rolled out in simulated or backtested markets to generate state–strategy pairs; these pairs are used as supervised fine-tuning data for a selector model that, at inference time, maps observed market states to a chosen strategy program without further simulator queries. Experiments on multi-stock trading and a goods-exchange sandbox are reported to show consistent gains across model scales (0.8B–9B parameters), outperforming fixed-strategy baselines, direct decision-making agents, and prompted API-based LLM agents.

Significance. If the reported gains are robust, the approach offers a concrete method for obtaining targeted, scalable supervision from market simulators while preserving interpretability through executable strategy modules. This could be useful for domains where direct LLM decision-making is brittle and where programmatic policies are preferred for auditability.

major comments (2)

[Abstract] Abstract: the central claim that MetaPS 'consistently improves across model scales' and that 'compact fine-tuned models even surpass stronger API models' is stated without any numerical results, confidence intervals, dataset sizes, number of trials, or exclusion criteria, preventing verification of the magnitude or statistical reliability of the gains.
[Experiments section] Experiments on multi-stock trading and goods-exchange sandbox: all reported performance numbers are obtained inside the identical simulators used to generate the supervised fine-tuning data; no ablation or transfer experiment tests whether the learned state-to-strategy mapping or the relative ranking of strategy returns remains stable under unmodeled deployment dynamics (variable slippage, liquidity shocks, regime shifts). This assumption is load-bearing for any claim that the method produces deployable market agents.
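The stability check requested here can be prototyped cheaply. The sketch below scores each strategy's position path with and without a proportional per-trade cost and reports Kendall's tau (tau-a, where ties contribute zero) between the two score lists; a tau near 1 means the ranking used for labels survives the friction. The linear cost model and all names are assumptions, and real slippage and liquidity shocks are state-dependent rather than a fixed rate.

```python
from itertools import combinations
from typing import Dict, Sequence


def net_return(positions: Sequence[int], asset_returns: Sequence[float], cost: float) -> float:
    """Summed position * return, minus `cost` per unit of position change.
    A deliberately crude linear slippage model; the path starts from a flat position."""
    total, prev = 0.0, 0
    for pos, r in zip(positions, asset_returns):
        total += pos * r - cost * abs(pos - prev)
        prev = pos
    return total


def kendall_tau(x: Sequence[float], y: Sequence[float]) -> float:
    """Kendall's tau-a: (concordant - discordant) / all pairs; tied pairs count as 0."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("need two equal-length score lists with at least two items")
    pairs = list(combinations(range(len(x)), 2))
    s = 0
    for i, j in pairs:
        sx = (x[i] > x[j]) - (x[i] < x[j])  # sign of the comparison under scores x
        sy = (y[i] > y[j]) - (y[i] < y[j])  # sign of the same comparison under scores y
        s += sx * sy
    return s / len(pairs)


def ranking_stability(position_paths: Dict[str, Sequence[int]],
                      asset_returns: Sequence[float], cost: float) -> float:
    """Tau between frictionless and cost-adjusted strategy scores."""
    names = list(position_paths)
    frictionless = [net_return(position_paths[n], asset_returns, 0.0) for n in names]
    with_cost = [net_return(position_paths[n], asset_returns, cost) for n in names]
    return kendall_tau(frictionless, with_cost)
```

In a toy market alternating between +1% and -1%, a path that flips long and short every step looks best without friction; at a 1% cost per unit of position change it falls to last, and the tau against the frictionless ranking drops to -2/3. That reversal is exactly the kind of instability the comment asks the authors to rule out.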

minor comments (1)

[Abstract] Abstract: missing space in 'risk control,and event-driven'.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive feedback. We respond to each major comment below, indicating planned revisions where appropriate.

Referee: [Abstract] Abstract: the central claim that MetaPS 'consistently improves across model scales' and that 'compact fine-tuned models even surpass stronger API models' is stated without any numerical results, confidence intervals, dataset sizes, number of trials, or exclusion criteria, preventing verification of the magnitude or statistical reliability of the gains.

Authors: We agree that the abstract would benefit from quantitative support. In the revised manuscript we will add specific performance deltas (e.g., average return improvements across the 0.8B–9B scale), the number of evaluation trials, and a brief note on the evaluation protocol so that the magnitude and reliability of the reported gains can be assessed directly from the abstract. revision: yes
Referee: [Experiments section] Experiments on multi-stock trading and goods-exchange sandbox: all reported performance numbers are obtained inside the identical simulators used to generate the supervised fine-tuning data; no ablation or transfer experiment tests whether the learned state-to-strategy mapping or the relative ranking of strategy returns remains stable under unmodeled deployment dynamics (variable slippage, liquidity shocks, regime shifts). This assumption is load-bearing for any claim that the method produces deployable market agents.

Authors: The referee correctly notes that all reported results are obtained inside the simulators used for data generation. The current work evaluates the simulation-guided training pipeline itself rather than claiming out-of-distribution robustness; we therefore do not present transfer experiments. In revision we will add an explicit Limitations paragraph that states this scope limitation and the load-bearing nature of the in-simulator assumption, while preserving the paper’s focus on the proposed training method. revision: partial

Circularity Check

0 steps flagged

No circularity in claimed results or derivation

The paper presents an empirical framework that generates supervised training data from strategy rollouts inside simulators and then measures selector performance inside the same simulators. No equations, fitted parameters renamed as predictions, or self-citation chains are present that would make any reported gain equivalent to its inputs by construction. The central claim is a standard machine-learning outcome (improved selection policy on held-out simulation episodes) rather than a tautological reduction, so the result remains self-contained.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Only the abstract is available; no explicit free parameters, axioms, or invented entities are stated.

pith-pipeline@v0.9.1-grok · 5762 in / 1039 out tokens · 19252 ms · 2026-06-26T11:04:39.460392+00:00 · methodology


Pith/arXiv arXiv 2023

[21] [21]

arXiv preprint arXiv:2305.11738 , year=

Critic: Large language models can self-correct with tool-interactive critiquing , author=. arXiv preprint arXiv:2305.11738 , year=

Pith/arXiv arXiv

[22] [22]

URL https://arxiv

Reflexion: Language agents with verbal reinforcement learning, 2023 , author=. URL https://arxiv. org/abs/2303.11366 , volume=

Pith/arXiv arXiv 2023

[23] [23]

arXiv preprint arXiv:2506.11442 , year=

ReVeal: Self-Evolving Code Agents via Iterative Generation-Verification , author=. arXiv preprint arXiv:2506.11442 , year=

arXiv

[24] [24]

arXiv preprint arXiv:2310.06117 , year=

Take a step back: Evoking reasoning via abstraction in large language models , author=. arXiv preprint arXiv:2310.06117 , year=

arXiv

[25] [25]

Findings of the association for computational linguistics: ACL 2024 , pages=

Chain-of-verification reduces hallucination in large language models , author=. Findings of the association for computational linguistics: ACL 2024 , pages=

2024

[26] [26]

arXiv preprint arXiv:2307.15337 , year=

Skeleton-of-thought: Prompting llms for efficient parallel generation , author=. arXiv preprint arXiv:2307.15337 , year=

arXiv

[27] [27]

arXiv preprint arXiv:2308.04371 , year=

Cumulative reasoning with large language models , author=. arXiv preprint arXiv:2308.04371 , year=

Pith/arXiv arXiv

[28] [28]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Deal: Decoding-time alignment for large language models , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[29] [29]

The eleventh international conference on learning representations , year=

Large language models are human-level prompt engineers , author=. The eleventh international conference on learning representations , year=

[30] [30]

The Twelfth International Conference on Learning Representations , year=

Large language models as optimizers , author=. The Twelfth International Conference on Learning Representations , year=

[31] [31]

arXiv preprint arXiv:2306.07863 , year=

Synapse: Trajectory-as-exemplar prompting with memory for computer control , author=. arXiv preprint arXiv:2306.07863 , year=

arXiv

[32] [32]

arXiv preprint arXiv:2502.12018 , year=

Atom of thoughts for markov llm test-time scaling , author=. arXiv preprint arXiv:2502.12018 , year=

arXiv

[33] [33]

arXiv preprint arXiv:2509.26062 , year=

DyFlow: Dynamic Workflow Framework for Agentic Reasoning , author=. arXiv preprint arXiv:2509.26062 , year=

arXiv

[34] [34]

arXiv preprint arXiv:2507.19457 , year=

Gepa: Reflective prompt evolution can outperform reinforcement learning , author=. arXiv preprint arXiv:2507.19457 , year=

Pith/arXiv arXiv

[35] [35]

arXiv preprint arXiv:2305.04091 , year=

Plan-and-solve prompting: Improving zero-shot chain-of-thought reasoning by large language models , author=. arXiv preprint arXiv:2305.04091 , year=

Pith/arXiv arXiv

[36] [36]

arXiv preprint arXiv:2304.11477 , year=

Llm+ p: Empowering large language models with optimal planning proficiency , author=. arXiv preprint arXiv:2304.11477 , year=

Pith/arXiv arXiv

[37] [37]

arXiv preprint arXiv:2205.10625 , year=

Least-to-most prompting enables complex reasoning in large language models , author=. arXiv preprint arXiv:2205.10625 , year=

Pith/arXiv arXiv

[38] [38]

Advances in Neural Information Processing Systems , volume=

Gorilla: Large language model connected with massive apis , author=. Advances in Neural Information Processing Systems , volume=

[39] [39]

Advances in Neural Information Processing Systems , volume=

Toolformer: Language models can teach themselves to use tools , author=. Advances in Neural Information Processing Systems , volume=

[40] [40]

arXiv preprint arXiv:2205.12255 , year=

Talm: Tool augmented language models , author=. arXiv preprint arXiv:2205.12255 , year=

arXiv

[41] [41]

Easytool: Enhancing llm-based agents with concise tool instruction , author=. Proceedings of the 2025 Conference of the Nations of the Americas Chapter of the Association for Computational Linguistics: Human Language Technologies (Volume 1: Long Papers) , pages=

2025

[42] [42]

Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

Calc-x and calcformers: Empowering arithmetical chain-of-thought through interaction with symbolic systems , author=. Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing , pages=

2023

[43] [43]

Advances in Neural Information Processing Systems , volume=

Chameleon: Plug-and-play compositional reasoning with large language models , author=. Advances in Neural Information Processing Systems , volume=

[44] [44]

arXiv preprint arXiv:2504.19413 , year=

Mem0: Building production-ready ai agents with scalable long-term memory , author=. arXiv preprint arXiv:2504.19413 , year=

Pith/arXiv arXiv

[45] [45]

arXiv preprint arXiv:2508.06433 , year=

Memp: Exploring agent procedural memory , author=. arXiv preprint arXiv:2508.06433 , year=

Pith/arXiv arXiv

[46] [46]

arXiv preprint arXiv:2512.10696 , year=

Remember Me, Refine Me: A Dynamic Procedural Memory Framework for Experience-Driven Agent Evolution , author=. arXiv preprint arXiv:2512.10696 , year=

Pith/arXiv arXiv

[47] [47]

, author=

MemGPT: Towards LLMs as Operating Systems. , author=. 2023 , publisher=

2023

[48] [48]

Proceedings of the 36th annual acm symposium on user interface software and technology , pages=

Generative agents: Interactive simulacra of human behavior , author=. Proceedings of the 36th annual acm symposium on user interface software and technology , pages=

[49] [49]

arXiv preprint arXiv:2506.07398 , year=

G-Memory: Tracing Hierarchical Memory for Multi-Agent Systems , author=. arXiv preprint arXiv:2506.07398 , year=

arXiv

[50] [50]

Advances in Neural Information Processing Systems , volume=

Hipporag: Neurobiologically inspired long-term memory for large language models , author=. Advances in Neural Information Processing Systems , volume=

[51] [51]

arXiv preprint arXiv:2507.22925 , year=

Hierarchical memory for high-efficiency long-term reasoning in llm agents , author=. arXiv preprint arXiv:2507.22925 , year=

arXiv

[52] [52]

arXiv preprint arXiv:2407.04363 , year=

Arigraph: Learning knowledge graph world models with episodic memory for llm agents , author=. arXiv preprint arXiv:2407.04363 , year=

arXiv

[53] [53]

arXiv preprint arXiv:2305.13304 , year=

Recurrentgpt: Interactive generation of (arbitrarily) long text , author=. arXiv preprint arXiv:2305.13304 , year=

arXiv

[54] [54]

2022 , publisher=

Memprompt: Memory-assisted prompt editing with user feedback , author=. 2022 , publisher=

2022

[55] [55]

Proceedings of the AAAI Conference on Artificial Intelligence , volume=

Memorybank: Enhancing large language models with long-term memory , author=. Proceedings of the AAAI Conference on Artificial Intelligence , volume=

[56] [56]

Findings of the Association for Computational Linguistics: EMNLP 2025 , pages =

From Implicit Exploration to Structured Reasoning: Guideline and Refinement for LLMs , author =. Findings of the Association for Computational Linguistics: EMNLP 2025 , pages =

2025

[57] [57]

Findings of the Association for Computational Linguistics: EMNLP 2025 , pages =

FinHEAR: Human Expertise and Adaptive Risk-Aware Temporal Reasoning for Financial Decision-Making , author =. Findings of the Association for Computational Linguistics: EMNLP 2025 , pages =

2025

[58] [58]

IEEE Transactions on Big Data , year=

Finmem: A performance-enhanced llm trading agent with layered memory and character design , author=. IEEE Transactions on Big Data , year=

[59] [59]

arXiv preprint arXiv:2506.17288 , year=

SlimRAG: Retrieval without Graphs via Entity-Aware Context Selection , author=. arXiv preprint arXiv:2506.17288 , year=

arXiv

[60] [60]

arXiv preprint arXiv:2410.05779 , year=

Lightrag: Simple and fast retrieval-augmented generation , author=. arXiv preprint arXiv:2410.05779 , year=

Pith/arXiv arXiv

[61] [61]

arXiv preprint arXiv:2404.16130 , year=

From local to global: A graph rag approach to query-focused summarization , author=. arXiv preprint arXiv:2404.16130 , year=

Pith/arXiv arXiv

[62] [62]

arXiv preprint arXiv:2506.07820 , year=

Guideline Forest: Experience-Induced Multi-Guideline Reasoning with Stepwise Aggregation , author=. arXiv preprint arXiv:2506.07820 , year=

arXiv

[63] [63]

arXiv preprint arXiv:2306.03901 , year=

Chatdb: Augmenting llms with databases as their symbolic memory , author=. arXiv preprint arXiv:2306.03901 , year=

arXiv

[64] [64]

arXiv preprint arXiv:2508.08997 , year=

Intrinsic Memory Agents: Heterogeneous Multi-Agent LLM Systems through Structured Contextual Memory , author=. arXiv preprint arXiv:2508.08997 , year=

arXiv

[65] [65]

differentiation

Textgrad: Automatic" differentiation" via text , author=. arXiv preprint arXiv:2406.07496 , year=

Pith/arXiv arXiv

[66] [66]

arXiv preprint arXiv:2310.03714 , year=

Dspy: Compiling declarative language model calls into self-improving pipelines , author=. arXiv preprint arXiv:2310.03714 , year=

Pith/arXiv arXiv

[67] [67]

Advances in neural information processing systems , volume=

Adaplanner: Adaptive planning from feedback with language models , author=. Advances in neural information processing systems , volume=

[68] [68]

gradient descent

Automatic prompt optimization with" gradient descent" and beam search , author=. arXiv preprint arXiv:2305.03495 , year=

arXiv

[69] [69]

arXiv preprint arXiv:2408.08435 , year=

Automated design of agentic systems , author=. arXiv preprint arXiv:2408.08435 , year=

Pith/arXiv arXiv

[70] [70]

arXiv preprint arXiv:2410.10762 , year=

Aflow: Automating agentic workflow generation , author=. arXiv preprint arXiv:2410.10762 , year=

Pith/arXiv arXiv

[71] [71]

arXiv preprint arXiv:2508.08053 , year=

AdaptFlow: Adaptive Workflow Optimization via Meta-Learning , author=. arXiv preprint arXiv:2508.08053 , year=

arXiv

[72] [72]

arXiv preprint arXiv:2403.02502 , year=

Trial and error: Exploration-based trajectory optimization for llm agents , author=. arXiv preprint arXiv:2403.02502 , year=

arXiv

[73] [73]

arXiv preprint arXiv:2409.07429 , year=

Agent workflow memory , author=. arXiv preprint arXiv:2409.07429 , year=

Pith/arXiv arXiv

[74] [74]

arXiv preprint arXiv:2406.11176 , year=

Watch every step! llm agent learning via iterative step-level process refinement , author=. arXiv preprint arXiv:2406.11176 , year=

arXiv

[75] [75]

arXiv preprint arXiv:2310.10134 , year=

Clin: A continually learning language agent for rapid task adaptation and generalization , author=. arXiv preprint arXiv:2310.10134 , year=

arXiv

[76] [76]

arXiv preprint arXiv:2305.16291 , year=

Voyager: An open-ended embodied agent with large language models , author=. arXiv preprint arXiv:2305.16291 , year=

Pith/arXiv arXiv

[77] [77]

arXiv preprint arXiv:2309.17428 , year=

Craft: Customizing llms by creating and retrieving from specialized toolsets , author=. arXiv preprint arXiv:2309.17428 , year=

arXiv

[78] [78]

Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

Llm agents making agent tools , author=. Proceedings of the 63rd Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers) , pages=

[79] [79]

arXiv preprint arXiv:2511.10395 , year=

AgentEvolver: Towards Efficient Self-Evolving Agent System , author=. arXiv preprint arXiv:2511.10395 , year=

arXiv

[80] [80]

arXiv preprint arXiv:2511.14460 , year=

Agent-R1: Training Powerful LLM Agents with End-to-End Reinforcement Learning , author=. arXiv preprint arXiv:2511.14460 , year=

Pith/arXiv arXiv