arxiv: 2510.15949 · v4 · submitted 2025-10-10 · 💱 q-fin.TR · cs.AI

ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination

Charidimos Papadakis, Angeliki Dimitriou, Giorgos Filandrianos, Maria Lymperaiou, Konstantinos Thomas, Giorgos Stamou

Pith reviewed 2026-05-18 08:16 UTC · model grok-4.3

classification 💱 q-fin.TR cs.AI

keywords: LLM trading agents · prompt optimization · multi-agent systems · algorithmic trading · market feedback · adaptive learning · financial decision making

The pith

Adaptive prompt optimization lets LLM trading agents improve over time using noisy market feedback

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces ATLAS, a multi-agent system for financial decision-making that pulls together market data, news, and fundamentals into coherent trading actions. Its key innovation is Adaptive-OPRO, which updates the agent's instructions in real time by feeding back actual trading results even when those results are delayed and noisy. Tests on equity markets under different regimes and with several large language models show that this adaptive method beats static prompts, while reflection-based self-critique does not deliver reliable gains. The work targets the core difficulty of turning late, obscured rewards into better future decisions without requiring perfect upfront knowledge. If successful, it suggests LLM agents can gradually refine their behavior through ongoing interaction with real financial outcomes rather than relying on one-time prompt design.

Core claim

ATLAS is a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. The central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent incorporates feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time. Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.
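
To make "order-aware action space" concrete, here is a minimal Python sketch of how a model-emitted record might be checked before it is treated as an executable order. The field names (`action`, `orderType`, `price`, `quantity`, `explanation`) and the allowed values are loosely modeled on the order JSON in the paper's prompt listings; the `Order` class, the `parse_order` helper, and the rule that market orders carry no price are assumptions made for illustration, not the authors' implementation.

```python
from dataclasses import dataclass
from typing import Optional

# Assumed vocabularies, modeled on the order JSON in the paper's prompt listings.
ACTIONS = ("BUY", "SELL", "SHORT", "SHORT_COVER")
ORDER_TYPES = ("MARKET", "LIMIT", "STOP")


@dataclass(frozen=True)
class Order:
    action: str
    order_type: str
    quantity: int
    price: Optional[float]
    explanation: str


def _is_number(value) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(value, (int, float)) and not isinstance(value, bool)


def parse_order(raw: dict) -> Order:
    """Validate one model-emitted dict and return an Order, else raise ValueError."""
    action = raw.get("action")
    order_type = raw.get("orderType")
    quantity = raw.get("quantity")
    price = raw.get("price")

    if action not in ACTIONS:
        raise ValueError(f"unknown action: {action!r}")
    if order_type not in ORDER_TYPES:
        raise ValueError(f"unknown orderType: {order_type!r}")
    if not isinstance(quantity, int) or isinstance(quantity, bool) or quantity <= 0:
        raise ValueError("quantity must be a positive integer")

    if order_type == "MARKET":
        price = None  # hypothetical rule: a market order carries no price
    elif not _is_number(price) or price <= 0:
        raise ValueError("LIMIT and STOP orders need a positive price")
    else:
        price = float(price)

    return Order(action, order_type, quantity, price, str(raw.get("explanation", "")))
```

Rejecting anything outside the fixed vocabularies is what separates an order-aware output from a free-text signal such as "bullish": every accepted record can be handed to an execution layer without further interpretation.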

What carries the argument

Adaptive-OPRO, a prompt-optimization technique that dynamically adapts the agent's instructions by incorporating real-time stochastic feedback from trading outcomes to drive performance gains
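
The general shape of such a loop can be sketched as follows. This is a generic OPRO-style skeleton under stated assumptions, not the paper's Adaptive-OPRO: `run_window` (trade one window with a prompt and return a score) and `propose` (ask an optimizer LLM for a new prompt) are hypothetical callables supplied by the caller, and the meta-prompt wording is invented for illustration.

```python
from typing import Callable, List, Tuple

History = List[Tuple[str, float]]  # (prompt text, score observed for that prompt)


def build_meta_prompt(history: History, top_k: int = 3) -> str:
    """Show the optimizer the top_k scored prompts, ordered worst to best."""
    best = sorted(history, key=lambda item: item[1])[-top_k:]
    blocks = [f"score={score:.2f}\n{prompt}" for prompt, score in best]
    return (
        "Previous instructions, worst to best:\n\n"
        + "\n\n".join(blocks)
        + "\n\nWrite a better instruction."
    )


def adapt(
    initial_prompt: str,
    run_window: Callable[[str], float],  # hypothetical: trade one window, return a score
    propose: Callable[[str], str],       # hypothetical: optimizer LLM call
    n_windows: int,
) -> History:
    """Alternate between trading a window and rewriting the prompt from its score."""
    history: History = []
    prompt = initial_prompt
    for _ in range(n_windows):
        score = run_window(prompt)               # noisy, delayed feedback arrives here
        history.append((prompt, score))
        prompt = propose(build_meta_prompt(history))
    return history
```

Because each score is a single noisy draw, a loop like this can chase luck as easily as skill, which is the concern the load-bearing premise below has to answer.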

Load-bearing premise

That real-time stochastic feedback from trading outcomes, despite arriving late and being obscured by market noise, can be effectively incorporated into prompt adaptation to produce systematic performance gains over time

What would settle it

A fresh set of regime-specific equity trading experiments in which Adaptive-OPRO shows no consistent outperformance over fixed prompts across multiple LLM families would falsify the central claim
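
One way such a check might be operationalized is sketched below. The cell layout (one cell per model family and regime), the helper names, and the pass criterion (every cell must show a positive mean ROI gap) are assumptions made for illustration; a difference of means ignores run-to-run variance, so a serious replication would also report error bars.

```python
from statistics import mean
from typing import Dict, List, Tuple

Cell = Tuple[str, str]  # (model family, market regime)


def roi(start_value: float, end_value: float) -> float:
    """Return on investment as a fraction of the starting portfolio value."""
    return (end_value - start_value) / start_value


def mean_gap_by_cell(
    adaptive: Dict[Cell, List[float]],
    fixed: Dict[Cell, List[float]],
) -> Dict[Cell, float]:
    """Mean ROI of adaptive runs minus mean ROI of fixed-prompt runs, per cell."""
    if adaptive.keys() != fixed.keys():
        raise ValueError("both conditions must cover the same cells")
    return {cell: mean(adaptive[cell]) - mean(fixed[cell]) for cell in adaptive}


def consistent_outperformance(gaps: Dict[Cell, float], min_gap: float = 0.0) -> bool:
    """True only if the adaptive condition beats the fixed one in every cell."""
    return all(gap > min_gap for gap in gaps.values())
```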

Figures

Figures reproduced from arXiv: 2510.15949 by Angeliki Dimitriou, Charidimos Papadakis, Giorgos Filandrianos, Giorgos Stamou, Konstantinos Thomas, Maria Lymperaiou.

**Figure 2.** ROI across three assets using Adaptive-OPRO.

**Figure 3.** Daily vs weekly reflection mechanism performance comparison across models and assets, showing ROI

**Figure 4.** Header and trader identity modifications between iteration 4 and iteration 5, showing title changes and ...

**Figure 5.** Structural reorganization consolidating sections into a unified ...

**Figure 6.** Decision protocol restructuring from informal ...

**Figure 7.** Intermediate optimization (GPT-o4-mini, Prompt 4) featuring streamlined structure with a numbered ...

**Figure 8.** Final optimized prompt (GPT-o4-mini, Prompt 11) with a six-step decision framework and systematic ...

Original abstract

Large language models show promise for financial decision-making, yet deploying them as autonomous trading agents raises fundamental challenges: how to adapt instructions when rewards arrive late and obscured by market noise, how to synthesize heterogeneous information streams into coherent decisions, and how to bridge the gap between model outputs and executable market actions. We present ATLAS (Adaptive Trading with LLM AgentS), a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. Within ATLAS, the central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent can incorporate feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time. Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

2 major / 2 minor

Summary. The manuscript introduces ATLAS, a multi-agent LLM framework for trading that integrates market data, news, and fundamentals. The central trading agent uses an order-aware action space for executable orders. Adaptive-OPRO is proposed as a dynamic prompt-optimization method that incorporates real-time stochastic feedback from trading outcomes to achieve increasing performance over time. Across regime-specific equity studies and multiple LLM families, the paper claims Adaptive-OPRO consistently outperforms fixed prompts while reflection-based feedback fails to deliver systematic gains.

Significance. If the empirical results hold after addressing controls for market noise, ATLAS could advance LLM deployment in noisy, delayed-reward financial settings by demonstrating effective prompt adaptation and multi-agent coordination. The order-aware action space and integration of heterogeneous information streams address practical deployment gaps. The work's value would be strengthened by reproducible code or parameter-free elements, but currently rests on empirical claims that require clearer attribution of gains to the adaptation loop.

major comments (2)

[Abstract] Abstract: The assertion of 'consistent outperformance' and 'increasing performance over time' across regimes and LLM families is made without any quantitative metrics, error bars, dataset details, or experimental controls, leaving the central empirical claim without visible supporting evidence.
[Methodology] Methodology (Adaptive-OPRO description): The method claims to use stochastic feedback from trading outcomes despite late arrival and market noise, yet no explicit noise-filtering, counterfactual estimation, or attribution mechanism (e.g., realized P&L vs. no-action baseline) is described to isolate the adaptation effect from exogenous variance or multi-agent coordination.

minor comments (2)

[Methodology] Clarify the exact structure of the order-aware action space with a concrete example of how model outputs map to executable orders.
[Experiments] Add a table summarizing performance metrics across LLM families and regimes to support the outperformance claims.

Simulated Author's Rebuttal

2 responses · 0 unresolved

We thank the referee for the constructive and detailed comments on our manuscript. These observations help clarify the presentation of our empirical results and the handling of noisy, delayed rewards in Adaptive-OPRO. We address each major comment below and have revised the manuscript to strengthen the supporting evidence and methodological transparency.

Point-by-point responses

Referee: [Abstract] Abstract: The assertion of 'consistent outperformance' and 'increasing performance over time' across regimes and LLM families is made without any quantitative metrics, error bars, dataset details, or experimental controls, leaving the central empirical claim without visible supporting evidence.

Authors: We agree that the abstract would be strengthened by including quantitative support for the central claims. The full paper reports these results with metrics, standard errors, and controls in the Experiments section and associated tables/figures. We have revised the abstract to summarize key quantitative findings (outperformance margins, regime-specific results, and LLM-family consistency) while directing readers to the detailed evidence, error bars, and dataset descriptions already present in the main text. This change makes the empirical claims visible at the abstract level without altering the underlying experiments.
revision: yes

Referee: [Methodology] Methodology (Adaptive-OPRO description): The method claims to use stochastic feedback from trading outcomes despite late arrival and market noise, yet no explicit noise-filtering, counterfactual estimation, or attribution mechanism (e.g., realized P&L vs. no-action baseline) is described to isolate the adaptation effect from exogenous variance or multi-agent coordination.

Authors: The referee correctly notes that clearer attribution would help isolate the contribution of the adaptation loop. Adaptive-OPRO incorporates stochastic trading outcomes directly as feedback signals for prompt updates, with robustness to noise arising from the optimization procedure itself rather than separate filtering steps. In the revision we have added an explicit paragraph in the methodology describing the attribution approach: all comparisons are performed within the same multi-agent framework, using fixed-prompt and no-adaptation baselines to attribute performance gains specifically to dynamic prompt optimization. We have also clarified that delayed rewards are handled by aggregating outcome signals over trading windows and that full per-trade counterfactuals are approximated via regime-controlled backtests. We acknowledge that explicit per-trade noise filtering or perfect counterfactuals are not feasible in non-stationary markets and therefore rely on the controlled experimental design already reported.
revision: partial
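
The window aggregation and baseline comparison mentioned in this response could look roughly like the following Python sketch. The non-overlapping window scheme, the function names, and the choice to drop an incomplete final window are assumptions for illustration; the response does not specify how the paper actually aggregates outcomes.

```python
from typing import List, Sequence


def window_scores(daily_pnl: Sequence[float], window: int) -> List[float]:
    """Sum daily P&L over consecutive non-overlapping windows.

    An incomplete window at the end of the series is dropped.
    """
    if window <= 0:
        raise ValueError("window must be a positive number of days")
    n_full = len(daily_pnl) // window
    return [sum(daily_pnl[i * window:(i + 1) * window]) for i in range(n_full)]


def excess_over_baseline(
    agent_pnl: Sequence[float],
    baseline_pnl: Sequence[float],
    window: int,
) -> List[float]:
    """Per-window agent P&L minus baseline P&L (for example, a fixed-prompt run)."""
    if len(agent_pnl) != len(baseline_pnl):
        raise ValueError("series must cover the same trading days")
    agent = window_scores(agent_pnl, window)
    baseline = window_scores(baseline_pnl, window)
    return [a - b for a, b in zip(agent, baseline)]
```

Scoring each window against a baseline run over the same days removes market moves common to both runs, though it does not separate skill from luck within a single window.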

Circularity Check

0 steps flagged

No circularity: empirical performance claims rest on experimental comparisons without self-referential derivations

Full rationale

The paper describes a multi-agent trading framework (ATLAS) and introduces Adaptive-OPRO as a prompt-optimization method that incorporates stochastic feedback from trading outcomes. Its central claims of outperformance over fixed prompts and reflection-based methods are presented as results from regime-specific equity studies across multiple LLM families. No equations, normalizations, or first-principles derivations appear in the provided text that would reduce reported gains to fitted parameters or self-definitions by construction. The method is justified through direct empirical evaluation rather than any load-bearing self-citation chain or ansatz smuggled via prior work, making the derivation chain self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on domain assumptions about LLM information synthesis and the utility of stochastic trading feedback for adaptation, with Adaptive-OPRO introduced as a new method without additional free parameters or physical entities.

axioms (2)

domain assumption: LLMs can synthesize heterogeneous information streams from markets, news, and corporate fundamentals into coherent trading decisions.
Presented as the core capability the multi-agent system supports in the abstract.

domain assumption: Stochastic feedback from trading outcomes can be used to dynamically adapt prompts despite delayed rewards and market noise.
Central premise enabling the Adaptive-OPRO technique and performance gains.

invented entities (1)

Adaptive-OPRO (no independent evidence)
purpose: Dynamic prompt adaptation technique that incorporates real-time stochastic feedback to improve trading performance over time.
Introduced as a novel method in the paper to address adaptation challenges.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents
cs.LG · 2026-05 · unverdicted · novelty 6.0
SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage poin...

Signal or Noise in Multi-Agent LLM-based Stock Recommendations?
q-fin.PM · 2026-04 · unverdicted · novelty 6.0
A multi-agent LLM equity system produces statistically significant outperformance on S&P 500 stocks, with strong-buy portfolios returning +2.18% monthly versus +1.15% for the equal-weight benchmark over 19 months.
