pith. sign in

arxiv: 2510.15949 · v5 · pith:7RAPPF33new · submitted 2025-10-10 · 💱 q-fin.TR · cs.AI

ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination

Pith reviewed 2026-05-21 20:20 UTC · model grok-4.3

classification 💱 q-fin.TR cs.AI
keywords adaptive prompt optimizationLLM trading agentsmulti-agent coordinationstochastic feedbackorder-aware action spacefinancial decision making
0
0 comments X

The pith

The ATLAS framework enables LLM trading agents to improve performance over time by dynamically optimizing prompts with stochastic market feedback.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

This paper presents ATLAS, a multi-agent system that turns large language models into autonomous trading agents capable of handling markets, news, and fundamentals together. The central mechanism is Adaptive-OPRO, which updates the trading agent's prompt using real trading outcomes even when feedback arrives late and mixed with noise. A reader would care because this approach could let AI agents learn and adapt in unpredictable financial settings instead of depending on unchanging instructions. Tests across different market regimes and several LLM families show that Adaptive-OPRO delivers consistent gains where fixed prompts and reflection-based methods do not.

Core claim

Within ATLAS the central trading agent works in an order-aware action space to produce executable market orders and applies Adaptive-OPRO to incorporate real-time stochastic feedback into its prompt, producing increasing performance over time that outperforms fixed prompts across regime-specific equity studies and multiple LLM families while reflection-based feedback yields no systematic gains.

What carries the argument

Adaptive-OPRO, a prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback from trading outcomes.

If this is right

  • The trading agent generates outputs that map directly to executable market orders rather than abstract signals.
  • Multiple agents synthesize market data, news, and corporate fundamentals into coherent trading decisions.
  • Performance improves measurably as the agent continues to trade and receive feedback.
  • These advantages appear across different market regimes and several large language model families.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same feedback-driven prompt updates could be tested in other sequential decision settings that feature delayed and noisy rewards.
  • Combining Adaptive-OPRO with additional coordination rules among agents might increase stability during sudden market shifts.
  • Live-market deployment would reveal whether the measured gains survive transaction costs and execution delays not present in the controlled studies.

Load-bearing premise

That late and noisy market feedback can be incorporated into prompt updates in a stable way that produces measurable performance gains without introducing instability or overfitting to specific regimes.

What would settle it

A new set of regime-specific equity trading tests in which Adaptive-OPRO produces no consistent outperformance relative to fixed prompts across additional LLM families would show the central claim does not hold.

Figures

Figures reproduced from arXiv: 2510.15949 by Angeliki Dimitriou, Charidimos Papadakis, Giorgos Filandrianos, Giorgos Stamou, Konstantinos Thomas, Maria Lymperaiou.

Figure 1
Figure 1. Figure 1: ATLAS Framework Overview. The Central Trading Agent submits orders to the Trading Execution Engine [PITH_FULL_IMAGE:figures/full_fig_p003_1.png] view at source ↗
Figure 2
Figure 2. Figure 2: ROI across three assets using Adaptive-OPRO. [PITH_FULL_IMAGE:figures/full_fig_p017_2.png] view at source ↗
Figure 3
Figure 3. Figure 3: Daily vs weekly reflection mechanism performance comparison across models and assets, showing ROI [PITH_FULL_IMAGE:figures/full_fig_p023_3.png] view at source ↗
Figure 4
Figure 4. Figure 4: Header and trader identity modifications between iteration 4 and iteration 5, showing title changes and [PITH_FULL_IMAGE:figures/full_fig_p033_4.png] view at source ↗
Figure 5
Figure 5. Figure 5: Structural reorganization consolidating sections into a unified [PITH_FULL_IMAGE:figures/full_fig_p034_5.png] view at source ↗
Figure 6
Figure 6. Figure 6: Decision protocol restructuring from informal [PITH_FULL_IMAGE:figures/full_fig_p035_6.png] view at source ↗
Figure 7
Figure 7. Figure 7: Intermediate optimization (GPT-o4-mini, Prompt 4) featuring streamlined structure with a numbered [PITH_FULL_IMAGE:figures/full_fig_p036_7.png] view at source ↗
Figure 8
Figure 8. Figure 8: Final optimized prompt (GPT-o4-mini, Prompt 11) with a six-step decision framework and systematic [PITH_FULL_IMAGE:figures/full_fig_p037_8.png] view at source ↗
read the original abstract

Large language models show promise for financial decision-making, yet deploying them as autonomous trading agents raises fundamental challenges: how to adapt instructions when rewards arrive late and obscured by market noise, how to synthesize heterogeneous information streams into coherent decisions, and how to bridge the gap between model outputs and executable market actions. We present ATLAS (Adaptive Trading with LLM AgentS), a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. Within ATLAS, the central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent can incorporate feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time. Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript presents ATLAS, a multi-agent framework for deploying large language models as trading agents. It integrates structured data from markets, news, and corporate fundamentals; employs an order-aware action space to produce executable market orders; and introduces Adaptive-OPRO, a dynamic prompt-optimization technique that incorporates real-time stochastic feedback to adapt instructions during trading. The central empirical claim is that Adaptive-OPRO yields consistently increasing performance and outperforms fixed prompts across regime-specific equity studies and multiple LLM families, whereas reflection-based feedback does not deliver systematic gains.

Significance. If the reported performance improvements prove robust under proper controls, the work would offer a practical contribution to LLM-driven quantitative trading by tackling adaptation to delayed, noisy market rewards. The order-aware action space and multi-agent coordination address deployment gaps between model outputs and executable trades. The finding that reflection-based methods fail systematically is a useful negative result for the field.

major comments (3)
  1. [Abstract] Abstract: The performance claims for Adaptive-OPRO are stated without any quantitative metrics, error bars, dataset descriptions, number of trials, or ablation results, so the central claim that it 'consistently outperforms fixed prompts' cannot be evaluated from the text.
  2. [§3–4] §3–4: The description of Adaptive-OPRO does not specify regularization, variance-reduction steps, or anti-overfitting mechanisms for incorporating late, market-noise-obscured feedback into prompt updates. Without these, measured gains in non-stationary equity regimes may reflect transient regime fitting rather than stable adaptation.
  3. [§5] §5 (regime-specific studies): The claim that reflection-based feedback fails systematically is presented as supporting evidence for Adaptive-OPRO, yet the manuscript does not detail how regimes are identified, how performance is aggregated across them, or whether the same noise issues affect both methods equally.
minor comments (2)
  1. [§3] Clarify the precise definition of 'real-time' feedback given the acknowledged latency of market rewards.
  2. [Figures/Tables] Add confidence intervals or standard errors to any performance curves or tables comparing Adaptive-OPRO against baselines.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their thorough and constructive review of our manuscript on ATLAS. We address each of the major comments in detail below, indicating where revisions will be made to enhance the clarity, rigor, and completeness of the paper.

read point-by-point responses
  1. Referee: [Abstract] Abstract: The performance claims for Adaptive-OPRO are stated without any quantitative metrics, error bars, dataset descriptions, number of trials, or ablation results, so the central claim that it 'consistently outperforms fixed prompts' cannot be evaluated from the text.

    Authors: We agree that the abstract would benefit from including key quantitative results to support the performance claims. In the revised version, we will incorporate specific metrics such as the average outperformance in returns and Sharpe ratio, along with the number of trials and a high-level description of the datasets used in the regime-specific studies. This will make the central claims more evaluable directly from the abstract. revision: yes

  2. Referee: [§3–4] §3–4: The description of Adaptive-OPRO does not specify regularization, variance-reduction steps, or anti-overfitting mechanisms for incorporating late, market-noise-obscured feedback into prompt updates. Without these, measured gains in non-stationary equity regimes may reflect transient regime fitting rather than stable adaptation.

    Authors: This is a valid concern regarding the robustness of Adaptive-OPRO. The current manuscript describes the core stochastic feedback loop but does not explicitly outline regularization or anti-overfitting procedures. We will revise Sections 3 and 4 to include a detailed explanation of the variance reduction achieved through multi-episode stochastic sampling and introduce a regularization term in the prompt optimization objective to mitigate overfitting to noisy market signals. We will also add ablation experiments demonstrating the impact of these mechanisms on performance stability across regimes. revision: yes

  3. Referee: [§5] §5 (regime-specific studies): The claim that reflection-based feedback fails systematically is presented as supporting evidence for Adaptive-OPRO, yet the manuscript does not detail how regimes are identified, how performance is aggregated across them, or whether the same noise issues affect both methods equally.

    Authors: We appreciate this point on the need for greater transparency in the experimental setup. Section 5 currently presents the results but can be expanded for clarity. In the revision, we will add a subsection detailing the regime identification process (based on statistical properties of the time series), the aggregation method for performance metrics across regimes, and a comparative analysis confirming that both feedback methods encounter equivalent market noise levels, with only Adaptive-OPRO showing systematic adaptation gains. This will strengthen the interpretation of the negative result for reflection-based methods. revision: yes

Circularity Check

0 steps flagged

No circularity detected; empirical framework is self-contained

full rationale

The paper describes an empirical multi-agent trading framework (ATLAS) and a prompt-optimization method (Adaptive-OPRO) that incorporates real-time stochastic feedback. No mathematical derivation chain, self-definitional loops, fitted inputs renamed as predictions, or load-bearing self-citations appear in the abstract or described claims. Performance comparisons (Adaptive-OPRO vs. fixed prompts) are presented as outcomes of regime-specific equity studies across LLM families, without equations or definitions that reduce the central result to its own inputs by construction. The work is therefore treated as self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 0 axioms · 0 invented entities

Review performed on abstract only; no explicit free parameters, axioms, or invented entities with independent evidence are stated in the provided text.

pith-pipeline@v0.9.0 · 5718 in / 1061 out tokens · 37284 ms · 2026-05-21T20:20:05.039937+00:00 · methodology

discussion (0)

Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

What do these tags mean?
matches
The paper's claim is directly supported by a theorem in the formal canon.
supports
The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends
The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses
The paper appears to rely on the theorem as machinery.
contradicts
The paper's claim conflicts with a theorem or certificate in the canon.
unclear
Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents

    cs.LG 2026-05 unverdicted novelty 6.0

    SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage poin...

  2. Signal or Noise in Multi-Agent LLM-based Stock Recommendations?

    q-fin.PM 2026-04 unverdicted novelty 6.0

    A multi-agent LLM equity system produces statistically significant outperformance on S&P 500 stocks, with strong-buy portfolios returning +2.18% monthly versus +1.15% for the equal-weight benchmark over 19 months.

Reference graph

Works this paper leans on

32 extracted references · 32 canonical work pages · cited by 2 Pith papers

  1. [1]

    Magnificent Seven

    simulation environment and integrated into our analysis framework to provide comprehensive market insights. Data source.The Market Analyst consumes OHLCV , volume, and session VWAP series from Polygon.io2 for the specified instrument and eval- uation window. Bars are retrieved at daily resolu- tion and aligned to official U.S. market sessions, with corpor...

  2. [2]

    No Market Analyst: removes multi-timescale technical structure and indicators

  3. [3]

    No News Analyst: removes unstructured text processing of headlines and stories

  4. [4]

    We do not ablate theFundamental Analystdue to its intentionally low activation frequency within these windows; its role is assessed qualitatively around reporting events

    No Market & No News: leaves only portfolio state and fundamentals. We do not ablate theFundamental Analystdue to its intentionally low activation frequency within these windows; its role is assessed qualitatively around reporting events. Each ablation is run three times. D.6 Evaluation Methodology We use amulti-run protocolof three independent runs per co...

  5. [5]

    action":

    captures momentum shifts by computing the difference between the 12-day and 26-day exponential moving averages. A 9-day EMA of the MACD line is used as a signal line. Trading signals are generated when the MACD line crosses the signal line from below (buy) or from above (sell). The exponential formulation ensures increased sensitivity to recent price move...

  6. [6]

    **Market Structure:** Current trend context and notable support/resistance observations

  7. [7]

    **Price Action:** What the current session dynamics are showing

  8. [8]

    **Technical Patterns:** Observable confluences and technical formations

  9. [9]

    It enables the Market An- alyst to incorporate newly available data, refresh indicator readings, and re-evaluate market condi- tions

    **Notable Levels:** Key price levels and their technical significance 36 37**Available Technical Tools:** 38- Standard indicators: Moving averages, RSI, MACD, ATR, volume analysis 39- Advanced levels: Fibonacci retracements/extensions, pivot points, psychological levels 40- Pattern recognition: Chart patterns, candlestick formations, breakout setups 41- V...

  10. [10]

    **Sentiment Assessment:** What’s the overall sentiment trajectory and key narrative changes?

  11. [11]

    **Key Developments:** What significant events or announcements are reported?

  12. [12]

    **Market Relevance:** How might this news content relate to market conditions?

  13. [13]

    It enables the News Analyst to incorporate new articles, track evolv- ing sentiment trends, and reassess the relevance or reliability of information sources

    **Source Analysis:** Any source reliability concerns or consensus alignment issues? 32 33**Response Format:** 34- Write in simple, direct language without jargon overuse 35- Each section should be 2-3 concise sentences maximum 36- Avoid repetitive phrasing and redundant explanations 37- No excessive formatting, bold text, or bullet point lists 38- Focus o...

  14. [14]

    **Setup (this message)** - Complete framework, methodology and initial fundamentals batch

  15. [15]

    performance_analysis

    **Delta updates** - Compact {{ action_interval }} updates with updated fundamentals 9 10**CRITICAL:** Future deltas contain NO repeated instructions. 11All analytical frameworks must persist. 12 13You are an elite fundamental analyst with deep expertise in financial statement analysis and corporate finance. 14Your reputation is built on the ability 15to q...

  16. [16]

    Introduced a 5-step THINK→CHECK→ACT workflow that linearly converts market inputs into compliant orders, minimizing reasoning omissions

  17. [17]

    Added an explicit PRE-ORDER RISK CHECKLIST (cash, short limit, catalyst validity, ≥ 2:1 R:R) to curb rule violations and low-edge trades

  18. [18]

    Elevated the four context feeds (technical, news, fundamentals, reflection) into a singleMARKET SITUATION dashboard that the workflow must reference, ensuring holistic analysis

  19. [19]

    Moved the strict JSON schema into its own boxed section immediately before output in- structions; this reduces formatting errors

  20. [20]

    Kept language concise but directive, reinforcing trader autonomy while preventing over-trading with aPATIENCEoverride

  21. [21]

    -” and lines in green with a leading “+

    Preserved every required {{placeholder}} and {% if %} block exactly, guaranteeing template compatibility. Each modification directly corresponds to a specific weakness identified in the diagnostic phase, creating a clear causal chain from prob- lem identification to solution implementation. The architectural changes shown in Figures 4, 5, and 6 demonstrat...

  22. [22]

    Define thesis & edge

  23. [23]

    Identify entry, stop, and target levels

  24. [24]

    Assess risk/reward & size within cash limits

  25. [25]

    Choose order type & execution timing

  26. [26]

    action":

    Verify constraints & finalize plan ## CONSTRAINTS & PORTFOLIO - Fully concentrated in {{ instrument }}, Cash ${{ portfolio_cash }} - Long {{ shares_long }} | Short {{ shares_short }} | Net {{ shares_net }} - Recent orders: {{ executed_orders }} - Max short = 100% cash; close all shorts by {{ window_end }} - Actions: BUY, SELL, SHORT, SHORT_COVER - Order T...

  27. [27]

    Define Thesis & Edge: state your core conviction

  28. [28]

    Map Key Levels: identify entry, stop-loss, and target levels

  29. [29]

    Assess Risk/Reward: compute per-share risk, total risk, and reward potential

  30. [30]

    Allocate Size: determine quantity within cash limits (${{ portfolio_cash }})

  31. [31]

    Choose Execution: select action (BUY | SELL | SHORT | SHORT_COVER) and orderType (MARKET | LIMIT | STOP)

  32. [32]

    action":

    Validate Compliance: ensure all constraints are met before submission. ## OUTPUT SPECIFICATION Return only a JSON array of orders or an empty array ([]). No extra text: [ { "action": "BUY | SELL | SHORT | SHORT_COVER", "orderType": "MARKET | LIMIT | STOP", "price": float | null, "quantity": integer, "explanation": "Concise strategic reasoning" } ] Figure ...