ATLAS: Adaptive Trading with LLM AgentS Through Dynamic Prompt Optimization and Multi-Agent Coordination
Pith reviewed 2026-05-18 08:16 UTC · model grok-4.3
The pith
Adaptive prompt optimization lets LLM trading agents improve over time using noisy market feedback
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
ATLAS is a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. The central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent incorporates feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time. Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.
What carries the argument
Adaptive-OPRO, a prompt-optimization technique that dynamically adapts the agent's instructions by incorporating real-time stochastic feedback from trading outcomes to drive performance gains
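For readers unfamiliar with OPRO-style optimization, the loop below is a minimal sketch of how a prompt could be adapted from noisy, delayed trading feedback. It is not the authors' Adaptive-OPRO implementation: the class `AdaptivePromptOptimizer`, the `propose` callable (which in practice would wrap an LLM call), the `evaluate` callable, and scoring each prompt by its raw window outcome are all hypothetical choices. Following the original OPRO recipe, the optimizer is shown the top-k past prompts in ascending score order and asked for a better one after each trading window.

```python
from dataclasses import dataclass, field
from typing import Callable, List, Tuple


@dataclass
class AdaptivePromptOptimizer:
    """Hypothetical OPRO-style loop: score the live prompt on each trading window,
    then ask an optimizer (normally an LLM) to propose a better prompt."""
    propose: Callable[[str], str]  # meta-prompt -> candidate prompt (assumed LLM wrapper)
    top_k: int = 5                 # how many past (prompt, score) pairs the optimizer sees
    history: List[Tuple[str, float]] = field(default_factory=list)

    def record(self, prompt: str, score: float) -> None:
        # The score is the realized, noisy outcome of the window traded with `prompt`.
        self.history.append((prompt, score))

    def meta_prompt(self) -> str:
        # As in OPRO, list the best past prompts in ascending score order,
        # so the strongest example sits right before the instruction.
        best = sorted(self.history, key=lambda item: item[1])[-self.top_k:]
        blocks = [f"Prompt:\n{p}\nScore: {s:.4f}" for p, s in best]
        blocks.append("Write a new trading prompt that scores higher than all of the above.")
        return "\n\n".join(blocks)

    def run(self, initial_prompt: str, evaluate: Callable[[str, int], float],
            n_windows: int, update_every: int = 1) -> str:
        prompt = initial_prompt
        for window in range(n_windows):
            # The reward for this window only becomes available after the window closes.
            self.record(prompt, evaluate(prompt, window))
            if (window + 1) % update_every == 0:
                prompt = self.propose(self.meta_prompt())
        return prompt
```

Raising `update_every` keeps each prompt in play for several windows before it is rewritten, which gives the optimizer more than one noisy sample per prompt at the cost of slower adaptation.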
Load-bearing premise
That real-time stochastic feedback from trading outcomes, despite arriving late and being obscured by market noise, can be effectively incorporated into prompt adaptation to produce systematic performance gains over time
What would settle it
A fresh set of regime-specific equity trading experiments in which Adaptive-OPRO shows no consistent outperformance over fixed prompts across multiple LLM families would falsify the central claim
Original abstract
Large language models show promise for financial decision-making, yet deploying them as autonomous trading agents raises fundamental challenges: how to adapt instructions when rewards arrive late and are obscured by market noise, how to synthesize heterogeneous information streams into coherent decisions, and how to bridge the gap between model outputs and executable market actions. We present ATLAS (Adaptive Trading with LLM AgentS), a unified multi-agent framework that integrates structured information from markets, news, and corporate fundamentals to support robust trading decisions. Within ATLAS, the central trading agent operates in an order-aware action space, ensuring that outputs correspond to executable market orders rather than abstract signals. The agent can incorporate feedback while trading using Adaptive-OPRO, a novel prompt-optimization technique that dynamically adapts the prompt by incorporating real-time, stochastic feedback, leading to increasing performance over time. Across regime-specific equity studies and multiple LLM families, Adaptive-OPRO consistently outperforms fixed prompts, while reflection-based feedback fails to provide systematic gains.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces ATLAS, a multi-agent LLM framework for trading that integrates market data, news, and fundamentals. The central trading agent uses an order-aware action space for executable orders. Adaptive-OPRO is proposed as a dynamic prompt-optimization method that incorporates real-time stochastic feedback from trading outcomes to achieve increasing performance over time. Across regime-specific equity studies and multiple LLM families, the paper claims Adaptive-OPRO consistently outperforms fixed prompts while reflection-based feedback fails to deliver systematic gains.
Significance. If the empirical results hold once controls for market noise are added, ATLAS could advance LLM deployment in noisy, delayed-reward financial settings by demonstrating effective prompt adaptation and multi-agent coordination. The order-aware action space and integration of heterogeneous information streams address practical deployment gaps. The work's value would be strengthened by reproducible code or parameter-free elements, but it currently rests on empirical claims that require clearer attribution of gains to the adaptation loop.
major comments (2)
- [Abstract] Abstract: The assertion of 'consistent outperformance' and 'increasing performance over time' across regimes and LLM families is made without any quantitative metrics, error bars, dataset details, or experimental controls, leaving the central empirical claim without visible supporting evidence.
- [Methodology] Methodology (Adaptive-OPRO description): The method claims to use stochastic feedback from trading outcomes despite late arrival and market noise, yet no explicit noise-filtering, counterfactual estimation, or attribution mechanism (e.g., realized P&L vs. no-action baseline) is described to isolate the adaptation effect from exogenous variance or multi-agent coordination.
minor comments (2)
- [Methodology] Clarify the exact structure of the order-aware action space with a concrete example of how model outputs map to executable orders.
- [Experiments] Add a table summarizing performance metrics across LLM families and regimes to support the outperformance claims.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments on our manuscript. These observations help clarify the presentation of our empirical results and the handling of noisy, delayed rewards in Adaptive-OPRO. We address each major comment below and have revised the manuscript to strengthen the supporting evidence and methodological transparency.
Point-by-point responses
-
Referee: [Abstract] Abstract: The assertion of 'consistent outperformance' and 'increasing performance over time' across regimes and LLM families is made without any quantitative metrics, error bars, dataset details, or experimental controls, leaving the central empirical claim without visible supporting evidence.
Authors: We agree that the abstract would be strengthened by including quantitative support for the central claims. The full paper reports these results with metrics, standard errors, and controls in the Experiments section and associated tables/figures. We have revised the abstract to summarize key quantitative findings (outperformance margins, regime-specific results, and LLM-family consistency) while directing readers to the detailed evidence, error bars, and dataset descriptions already present in the main text. This change makes the empirical claims visible at the abstract level without altering the underlying experiments. revision: yes
-
Referee: [Methodology] Methodology (Adaptive-OPRO description): The method claims to use stochastic feedback from trading outcomes despite late arrival and market noise, yet no explicit noise-filtering, counterfactual estimation, or attribution mechanism (e.g., realized P&L vs. no-action baseline) is described to isolate the adaptation effect from exogenous variance or multi-agent coordination.
Authors: The referee correctly notes that clearer attribution would help isolate the contribution of the adaptation loop. Adaptive-OPRO incorporates stochastic trading outcomes directly as feedback signals for prompt updates, with robustness to noise arising from the optimization procedure itself rather than separate filtering steps. In the revision we have added an explicit paragraph in the methodology describing the attribution approach: all comparisons are performed within the same multi-agent framework, using fixed-prompt and no-adaptation baselines to attribute performance gains specifically to dynamic prompt optimization. We have also clarified that delayed rewards are handled by aggregating outcome signals over trading windows and that full per-trade counterfactuals are approximated via regime-controlled backtests. We acknowledge that explicit per-trade noise filtering or perfect counterfactuals are not feasible in non-stationary markets and therefore rely on the controlled experimental design already reported. revision: partial
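The attribution approach described in this response (fixed-prompt baselines inside the same framework, outcomes aggregated over trading windows, regime-controlled backtests) can be made concrete with a small calculation. The sketch below is an assumption about how such a comparison might be computed, not the paper's procedure: `window_return` compounds daily returns into one window-level outcome, and `paired_attribution` reports the mean and standard error of per-window differences between the adaptive and fixed-prompt arms. Pairing by window means much of the market move both arms share over that window drops out of each difference.

```python
import math
from statistics import mean, stdev
from typing import Sequence, Tuple


def window_return(daily_returns: Sequence[float]) -> float:
    """Compound a window's daily returns into one outcome that is only known at window end."""
    growth = 1.0
    for r in daily_returns:
        growth *= 1.0 + r
    return growth - 1.0


def paired_attribution(adaptive: Sequence[float],
                       baseline: Sequence[float]) -> Tuple[float, float]:
    """Mean and standard error of per-window (adaptive - baseline) returns.

    Index i of both sequences is assumed to be the same market window (e.g. one regime),
    so the market move shared by both arms over that window largely cancels.
    """
    if len(adaptive) != len(baseline) or len(adaptive) < 2:
        raise ValueError("need at least two paired windows of equal length")
    diffs = [a - b for a, b in zip(adaptive, baseline)]
    return mean(diffs), stdev(diffs) / math.sqrt(len(diffs))
```

With only a handful of windows the standard error is itself imprecise (it rests on n - 1 degrees of freedom), so a margin of one or two standard errors deserves caution.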
Circularity Check
No circularity: empirical performance claims rest on experimental comparisons without self-referential derivations
full rationale
The paper describes a multi-agent trading framework (ATLAS) and introduces Adaptive-OPRO as a prompt-optimization method that incorporates stochastic feedback from trading outcomes. Its central claims of outperformance over fixed prompts and reflection-based methods are presented as results from regime-specific equity studies across multiple LLM families. No equations, normalizations, or first-principles derivations appear in the provided text that would reduce reported gains to fitted parameters or self-definitions by construction. The method is justified through direct empirical evaluation rather than any load-bearing self-citation chain or ansatz smuggled via prior work, making the derivation chain self-contained against external benchmarks.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption: LLMs can synthesize heterogeneous information streams from markets, news, and corporate fundamentals into coherent trading decisions
- domain assumption: Stochastic feedback from trading outcomes can be used to dynamically adapt prompts despite delayed rewards and market noise
invented entities (1)
-
Adaptive-OPRO
no independent evidence
Forward citations
Cited by 2 Pith papers
-
SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents
SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage poin...
-
Signal or Noise in Multi-Agent LLM-based Stock Recommendations?
A multi-agent LLM equity system produces statistically significant outperformance on S&P 500 stocks, with strong-buy portfolios returning +2.18% monthly versus +1.15% for the equal-weight benchmark over 19 months.
Reference graph
Works this paper leans on
-
[1]
simulation environment and integrated into our analysis framework to provide comprehensive market insights. Data source. The Market Analyst consumes OHLCV, volume, and session VWAP series from Polygon.io for the specified instrument and evaluation window. Bars are retrieved at daily resolution and aligned to official U.S. market sessions, with corpor...
-
[2]
No Market Analyst: removes multi-timescale technical structure and indicators
-
[3]
No News Analyst: removes unstructured text processing of headlines and stories
-
[4]
No Market & No News: leaves only portfolio state and fundamentals. We do not ablate the Fundamental Analyst due to its intentionally low activation frequency within these windows; its role is assessed qualitatively around reporting events. Each ablation is run three times.
D.6 Evaluation Methodology
We use a multi-run protocol of three independent runs per co...
-
[5]
captures momentum shifts by computing the difference between the 12-day and 26-day exponential moving averages. A 9-day EMA of the MACD line is used as a signal line. Trading signals are generated when the MACD line crosses the signal line from below (buy) or from above (sell). The exponential formulation ensures increased sensitivity to recent price move...
-
[6]
**Market Structure:** Current trend context and notable support/resistance observations
-
[7]
**Price Action:** What the current session dynamics are showing
-
[8]
**Technical Patterns:** Observable confluences and technical formations
-
[9]
**Notable Levels:** Key price levels and their technical significance
**Available Technical Tools:**
- Standard indicators: Moving averages, RSI, MACD, ATR, volume analysis
- Advanced levels: Fibonacci retracements/extensions, pivot points, psychological levels
- Pattern recognition: Chart patterns, candlestick formations, breakout setups
- V...
-
[10]
**Sentiment Assessment:** What’s the overall sentiment trajectory and key narrative changes?
-
[11]
**Key Developments:** What significant events or announcements are reported?
-
[12]
**Market Relevance:** How might this news content relate to market conditions?
-
[13]
**Source Analysis:** Any source reliability concerns or consensus alignment issues?
**Response Format:**
- Write in simple, direct language without jargon overuse
- Each section should be 2-3 concise sentences maximum
- Avoid repetitive phrasing and redundant explanations
- No excessive formatting, bold text, or bullet point lists
- Focus o...
-
[14]
**Setup (this message)** - Complete framework, methodology and initial fundamentals batch
-
[15]
**Delta updates** - Compact {{ action_interval }} updates with updated fundamentals
**CRITICAL:** Future deltas contain NO repeated instructions.
All analytical frameworks must persist.
You are an elite fundamental analyst with deep expertise in financial statement analysis and corporate finance.
Your reputation is built on the ability to q...
-
[16]
Introduced a 5-step THINK→CHECK→ACT workflow that linearly converts market inputs into compliant orders, minimizing reasoning omissions
-
[17]
Added an explicit PRE-ORDER RISK CHECKLIST (cash, short limit, catalyst validity, ≥ 2:1 R:R) to curb rule violations and low-edge trades
-
[18]
Elevated the four context feeds (technical, news, fundamentals, reflection) into a single MARKET SITUATION dashboard that the workflow must reference, ensuring holistic analysis
-
[19]
Moved the strict JSON schema into its own boxed section immediately before output instructions; this reduces formatting errors
-
[20]
Kept language concise but directive, reinforcing trader autonomy while preventing over-trading with a PATIENCE override
-
[21]
… “-” and lines in green with a leading “+”.
Preserved every required {{placeholder}} and {% if %} block exactly, guaranteeing template compatibility. Each modification directly corresponds to a specific weakness identified in the diagnostic phase, creating a clear causal chain from problem identification to solution implementation. The architectural changes shown in Figures 4, 5, and 6 demonstrat...
-
[22]
Define thesis & edge
-
[23]
Identify entry, stop, and target levels
-
[24]
Assess risk/reward & size within cash limits
-
[25]
Choose order type & execution timing
-
[26]
Verify constraints & finalize plan
## CONSTRAINTS & PORTFOLIO
- Fully concentrated in {{ instrument }}, Cash ${{ portfolio_cash }}
- Long {{ shares_long }} | Short {{ shares_short }} | Net {{ shares_net }}
- Recent orders: {{ executed_orders }}
- Max short = 100% cash; close all shorts by {{ window_end }}
- Actions: BUY, SELL, SHORT, SHORT_COVER
- Order T...
-
[27]
Define Thesis & Edge: state your core conviction
-
[28]
Map Key Levels: identify entry, stop-loss, and target levels
-
[29]
Assess Risk/Reward: compute per-share risk, total risk, and reward potential
-
[30]
Allocate Size: determine quantity within cash limits (${{ portfolio_cash }})
-
[31]
Choose Execution: select action (BUY | SELL | SHORT | SHORT_COVER) and orderType (MARKET | LIMIT | STOP)
-
[32]
Validate Compliance: ensure all constraints are met before submission.
## OUTPUT SPECIFICATION
Return only a JSON array of orders or an empty array ([]). No extra text:
[
  {
    "action": "BUY | SELL | SHORT | SHORT_COVER",
    "orderType": "MARKET | LIMIT | STOP",
    "price": float | null,
    "quantity": integer,
    "explanation": "Concise strategic reasoning"
  }
]
Figure ...
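The output specification, together with the portfolio constraints and pre-order risk checklist quoted in the prompt fragments, lends itself to a pre-submission check that keeps the agent's reply inside an order-aware action space. This is a hedged sketch rather than the authors' executor: the function names are hypothetical, the rule that LIMIT and STOP orders must carry a positive price while MARKET orders may leave it null is an assumption, and the helpers simply encode the "≥ 2:1 R:R" and "Max short = 100% cash" rules.

```python
import json
from typing import Any, Dict, List

ACTIONS = {"BUY", "SELL", "SHORT", "SHORT_COVER"}
ORDER_TYPES = {"MARKET", "LIMIT", "STOP"}


def _is_number(x: Any) -> bool:
    # bool is a subclass of int in Python, so exclude it explicitly.
    return isinstance(x, (int, float)) and not isinstance(x, bool)


def parse_orders(raw: str) -> List[Dict[str, Any]]:
    """Parse the agent's reply and reject anything that is not an executable order list."""
    orders = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(orders, list):
        raise ValueError("reply must be a JSON array (possibly empty)")
    for order in orders:
        if not isinstance(order, dict):
            raise ValueError("each order must be a JSON object")
        if order.get("action") not in ACTIONS:
            raise ValueError(f"unknown action: {order.get('action')!r}")
        if order.get("orderType") not in ORDER_TYPES:
            raise ValueError(f"unknown orderType: {order.get('orderType')!r}")
        qty = order.get("quantity")
        if not isinstance(qty, int) or isinstance(qty, bool) or qty <= 0:
            raise ValueError(f"quantity must be a positive integer, got {qty!r}")
        price = order.get("price")
        if price is not None and not (_is_number(price) and price > 0):
            raise ValueError(f"price must be a positive number or null, got {price!r}")
        # Assumption: LIMIT/STOP orders need a limit or trigger price; MARKET may omit it.
        if order["orderType"] != "MARKET" and price is None:
            raise ValueError(f"{order['orderType']} order needs a price")
        if not isinstance(order.get("explanation"), str):
            raise ValueError("explanation must be a string")
    return orders


def reward_to_risk(entry: float, stop: float, target: float) -> float:
    """Per-share reward over per-share risk; uses distances, so it works for longs and shorts."""
    risk = abs(entry - stop)
    if risk == 0:
        raise ValueError("stop must differ from entry")
    return abs(target - entry) / risk


def passes_risk_checklist(entry: float, stop: float, target: float,
                          short_notional: float, cash: float,
                          min_rr: float = 2.0) -> bool:
    """Encode the '>= 2:1 R:R' and 'Max short = 100% cash' checklist items."""
    return reward_to_risk(entry, stop, target) >= min_rr and short_notional <= cash
```

An empty array passes, matching the specification's allowance for submitting no orders.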
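Entry [5] describes the MACD rule among the Market Analyst's indicators: the difference between the 12-day and 26-day EMAs, a 9-day EMA of that line as the signal line, and buy/sell signals when the MACD line crosses the signal line from below/above. A minimal Python sketch follows. The smoothing factor α = 2/(span + 1) and seeding each EMA with its first value are common conventions assumed here; the paper may initialize differently (for example with a simple moving average).

```python
from typing import List, Sequence


def ema(values: Sequence[float], span: int) -> List[float]:
    """EMA with alpha = 2 / (span + 1), seeded with the first observation (assumed convention)."""
    alpha = 2.0 / (span + 1)
    out = [float(values[0])]
    for v in values[1:]:
        # Incremental form: a flat series stays exactly flat, avoiding spurious float crossings.
        out.append(out[-1] + alpha * (v - out[-1]))
    return out


def macd_signals(closes: Sequence[float], fast: int = 12, slow: int = 26,
                 signal: int = 9) -> List[str]:
    """Label each bar BUY / SELL / HOLD from MACD-line vs. signal-line crossings."""
    macd_line = [f - s for f, s in zip(ema(closes, fast), ema(closes, slow))]
    signal_line = ema(macd_line, signal)
    gap = [m - s for m, s in zip(macd_line, signal_line)]  # the MACD histogram
    labels = ["HOLD"]  # no crossing can be observed on the first bar
    for prev, cur in zip(gap, gap[1:]):
        if prev <= 0 < cur:
            labels.append("BUY")   # MACD crosses the signal line from below
        elif prev >= 0 > cur:
            labels.append("SELL")  # MACD crosses the signal line from above
        else:
            labels.append("HOLD")
    return labels
```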