Recursive Multi-Agent Trading System: Iterative Optimized Portfolio Strategy Under Geopolitical Uncertainty
Pith reviewed 2026-06-29 20:01 UTC · model grok-4.3
The pith
A recursive multi-agent trading system achieves a 9.62 percent maximum drawdown over 561 trading days, lower than the mean-variance optimization and FinBERT sentiment baselines, and records the lowest drawdown in three of five geopolitical stress tests.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
RMATS integrates Sentiment, Report, Analysis, and Risk agents under a recursive Manager Agent that applies iterative feedback loops; over the January 2023 to March 2025 period this produces a 9.62 percent maximum drawdown, lower than MVO at 15.49 percent and FinBERT Sentiment at 15.28 percent, while recording the lowest drawdown in three of the five geopolitical stress scenarios.
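The headline metric, maximum drawdown, is the largest peak-to-trough decline in portfolio value over the evaluation window. The following is a minimal sketch of how it is commonly computed. The function name `max_drawdown` and the toy equity curve are illustrative assumptions, not code or data from the paper.

```python
from typing import Sequence


def max_drawdown(equity: Sequence[float]) -> float:
    """Return the largest peak-to-trough decline as a positive fraction.

    Assumes a strictly positive equity curve (hypothetical input format).
    """
    if not equity:
        raise ValueError("equity curve must not be empty")
    peak = equity[0]
    worst = 0.0
    for value in equity:
        peak = max(peak, value)                    # running high-water mark
        worst = max(worst, (peak - value) / peak)  # decline from that mark
    return worst


# Toy equity curve (hypothetical numbers, not from the paper).
curve = [100.0, 110.0, 99.0, 105.0, 120.0, 108.0]
print(f"max drawdown: {max_drawdown(curve):.2%}")
```

Comparing two strategies on this metric only requires running the same function over each strategy's equity curve for the same dates.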
What carries the argument
The recursive Manager Agent that coordinates iterative feedback loops among the four specialized agents.
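The paper does not publish the Manager Agent's logic, so the following is only a hypothetical sketch of what a recursive coordinator with a feedback loop could look like. The agent signature, the averaging rule, the exposure check, and every name (`manage`, `make_agent`, `risk_limit`, `max_depth`) are assumptions made for illustration and should not be read as the authors' method.

```python
from dataclasses import dataclass
from typing import Callable, Dict

# Hypothetical agent signature: (market_view, feedback) -> risky-asset score in [0, 1].
Agent = Callable[[Dict[str, float], float], float]


@dataclass
class Decision:
    risk_weight: float  # fraction allocated to risky assets
    rounds: int         # number of feedback rounds used


def manage(agents: Dict[str, Agent], view: Dict[str, float],
           risk_limit: float, feedback: float = 0.0,
           depth: int = 0, max_depth: int = 5) -> Decision:
    """Recursively revise the proposal until the risk check passes
    or the recursion budget is exhausted."""
    scores = [agent(view, feedback) for agent in agents.values()]
    proposal = sum(scores) / len(scores)
    # Stand-in risk check (assumption): exposure = weight * volatility.
    exposure = proposal * view["volatility"]
    if exposure <= risk_limit or depth >= max_depth:
        return Decision(risk_weight=proposal, rounds=depth)
    # Feed the size of the breach back to the agents and try again.
    return manage(agents, view, risk_limit,
                  feedback + (exposure - risk_limit), depth + 1, max_depth)


def make_agent(base: float) -> Agent:
    # Toy agent: lowers its score as negative feedback accumulates.
    return lambda view, feedback: max(0.0, base - 5.0 * feedback)


agents = {name: make_agent(base) for name, base in
          [("sentiment", 0.8), ("report", 0.6), ("analysis", 0.7), ("risk", 0.5)]}
decision = manage(agents, {"volatility": 0.3}, risk_limit=0.15)
print(decision)
```

The depth cap matters in any design of this kind: without it, agents that ignore feedback would make the recursion run indefinitely.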
If this is right
- RMATS trades off some upside in sustained bull markets to achieve lower drawdowns.
- Ablation results indicate that removing any single agent weakens the downside protection.
- The architecture is positioned for institutional use focused on capital preservation rather than return maximization.
- The lowest event-period drawdowns occur in three of the five tested geopolitical scenarios.
Where Pith is reading between the lines
- The same recursive coordination pattern could be applied to non-financial multi-agent decision tasks that require repeated revision under uncertainty.
- Extending the agent set or replacing individual agents with newer language models might further reduce drawdown if the feedback mechanism remains intact.
- A live deployment with real capital would reveal whether execution costs or model drift alter the backtested risk reduction.
Load-bearing premise
The chosen 561-day window and the five selected geopolitical stress periods are representative enough to support general claims about risk-control performance.
What would settle it
A backtest on a later or earlier multi-year window that includes new stress events and shows RMATS maximum drawdown exceeding the MVO or FinBERT baselines would falsify the reported advantage.
Original abstract
Recursive Multi-Agent Trading System (RMATS) integrates four specialized agents -- Sentiment, Report, Analysis, and Risk -- coordinated through a recursive Manager Agent with iterative feedback loops. Experimental evaluation over a 561-trading-day period (January 2023 to March 2025) across a 24-asset multi-class universe demonstrates that RMATS achieves a maximum drawdown of 9.62%, lower than MVO (15.49%) and FinBERT Sentiment (15.28%), and exhibits the lowest event-period drawdown in 3 of 5 geopolitical stress scenarios tested. While RMATS underperforms return-maximizing baselines in a sustained bull market environment, ablation studies confirm the individual contribution of each agent component to downside protection. These results position RMATS as a risk-control-oriented architecture suitable for institutions prioritizing capital preservation under geopolitical uncertainty.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The manuscript introduces the Recursive Multi-Agent Trading System (RMATS), which uses four specialized agents (Sentiment, Report, Analysis, Risk) coordinated by a recursive Manager Agent with iterative feedback loops for portfolio strategy under geopolitical uncertainty. Over a 561-trading-day backtest (Jan 2023 - Mar 2025) on 24 assets, it reports a maximum drawdown of 9.62% (vs. 15.49% for MVO and 15.28% for FinBERT Sentiment) and the lowest event-period drawdown in 3 of 5 stress scenarios, and it cites ablation studies as supporting each agent's contribution.
Significance. If the empirical results can be verified and generalized, RMATS could represent a meaningful advance in applying multi-agent systems to risk-controlled portfolio management in uncertain environments. The recursive feedback mechanism offers a novel way to integrate agent outputs iteratively. However, the short and recent backtest period, combined with lack of implementation details, limits the immediate significance for the field.
major comments (3)
- Abstract and Experimental Evaluation: The central performance claims (maximum drawdown of 9.62%, lowest in 3/5 scenarios) are presented without any details on the implementation of the agents, the recursive feedback loops, the specific data sources or asset selection criteria for the 24-asset universe, or any controls for look-ahead bias or data snooping. This makes the results unverifiable and is load-bearing for the claim of superior risk control.
- Ablation Studies: Ablation studies are cited as confirming the contribution of each agent component, but no quantitative results, setup details, or comparison metrics are provided in the manuscript. Without this, it is impossible to assess whether the recursive Manager Agent's iterative mechanism is isolated or if effects are due to other factors.
- Experimental Setup: No statistical significance tests (e.g., t-tests or bootstrap on drawdown differences), walk-forward validation, or robustness checks against scenario selection are reported. The 561-day window and post-hoc identification of 5 geopolitical stress periods raise concerns about generalizability.
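The bootstrap check requested in the last major comment could be sketched as follows. This is an illustrative assumption about how such a test might be set up, not a procedure from the manuscript: it resamples paired daily returns in circular blocks and reports an approximate 95% interval for the difference in maximum drawdown. The function names, block length, and synthetic return series are all hypothetical, and resampling a path-dependent statistic such as drawdown is itself only an approximation.

```python
import random
from typing import List, Sequence, Tuple


def mdd_from_returns(returns: Sequence[float]) -> float:
    """Maximum drawdown of the equity curve implied by simple returns."""
    equity, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        equity *= 1.0 + r
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst


def block_bootstrap_mdd_diff(a: Sequence[float], b: Sequence[float],
                             block: int = 20, draws: int = 500,
                             seed: int = 0) -> Tuple[float, float]:
    """Approximate 95% interval for MDD(a) - MDD(b) using paired circular blocks."""
    if len(a) != len(b) or not a:
        raise ValueError("return series must be non-empty and of equal length")
    rng = random.Random(seed)
    n = len(a)
    diffs: List[float] = []
    for _ in range(draws):
        idx: List[int] = []
        while len(idx) < n:
            start = rng.randrange(n)
            idx.extend((start + k) % n for k in range(block))
        idx = idx[:n]
        # The same resampled days are used for both strategies to keep the pairing.
        diffs.append(mdd_from_returns([a[i] for i in idx])
                     - mdd_from_returns([b[i] for i in idx]))
    diffs.sort()
    return diffs[int(0.025 * draws)], diffs[int(0.975 * draws) - 1]


# Synthetic data (hypothetical): a base strategy and a 2x levered copy of it.
data_rng = random.Random(1)
base = [data_rng.gauss(0.0003, 0.01) for _ in range(561)]
levered = [2.0 * r for r in base]
low, high = block_bootstrap_mdd_diff(base, levered)
print(f"95% interval for MDD(base) - MDD(levered): [{low:.4f}, {high:.4f}]")
```

An interval that excludes zero would suggest the drawdown gap is not an artifact of one particular path, although it would not address the scenario-selection concern.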
minor comments (2)
- The manuscript would benefit from clearer notation or pseudocode for the recursive Manager Agent to aid reproducibility.
- Consider adding references to prior work on multi-agent systems in finance for context.
Simulated Author's Rebuttal
Thank you for the detailed review. We address each of the major comments below, agreeing that additional details and analyses will strengthen the manuscript.
- Referee: Abstract and Experimental Evaluation: The central performance claims (maximum drawdown of 9.62%, lowest in 3/5 scenarios) are presented without any details on the implementation of the agents, the recursive feedback loops, the specific data sources or asset selection criteria for the 24-asset universe, or any controls for look-ahead bias or data snooping. This makes the results unverifiable and is load-bearing for the claim of superior risk control.
  Authors: We agree that the current manuscript lacks sufficient implementation details to allow full verification of the results. The full text describes the agent roles at a high level, but we will expand Section 3 with specific implementation details, including pseudocode for the recursive Manager Agent's feedback loops, exact data sources (e.g., news APIs and financial databases used), asset selection criteria (liquidity thresholds and sector balance for the 24 assets), and explicit statements on the use of only historical data to avoid look-ahead bias. These additions will be included in the revised manuscript.
  Revision: yes
- Referee: Ablation Studies: Ablation studies are cited as confirming the contribution of each agent component, but no quantitative results, setup details, or comparison metrics are provided in the manuscript. Without this, it is impossible to assess whether the recursive Manager Agent's iterative mechanism is isolated or if effects are due to other factors.
  Authors: The manuscript mentions ablation studies but does not present the quantitative results. We will add a dedicated table and section detailing the ablation experiments, including performance metrics (e.g., max drawdown, Sharpe ratio) for configurations with and without each agent, as well as the setup for isolating the recursive feedback mechanism. This will allow readers to evaluate the contribution of each component.
  Revision: yes
- Referee: Experimental Setup: No statistical significance tests (e.g., t-tests or bootstrap on drawdown differences), walk-forward validation, or robustness checks against scenario selection are reported. The 561-day window and post-hoc identification of 5 geopolitical stress periods raise concerns about generalizability.
  Authors: We acknowledge the value of statistical tests and will incorporate t-tests and bootstrap confidence intervals for the drawdown comparisons in the revised version. The 561-day period is the available data window, but we will add walk-forward analysis over sub-periods. The stress scenarios were selected based on major publicly known geopolitical events during the period (e.g., specific conflicts and policy announcements), not purely post hoc; however, we will provide a clearer justification and a sensitivity analysis of scenario selection to address generalizability concerns.
  Revision: partial
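The walk-forward analysis promised in the last response can be illustrated with a small index-splitting helper. This is a sketch under stated assumptions: the helper name `walk_forward_splits` and the window sizes (252 training days, 63 test days) are hypothetical choices, not values taken from the manuscript or the rebuttal.

```python
from typing import Iterator, Tuple


def walk_forward_splits(n_days: int, train: int, test: int
                        ) -> Iterator[Tuple[range, range]]:
    """Yield (train, test) index ranges that roll forward in time.

    Each test window starts exactly where its training window ends,
    so no test day is ever visible during fitting.
    """
    if min(n_days, train, test) <= 0:
        raise ValueError("all sizes must be positive")
    start = 0
    while start + train + test <= n_days:
        yield (range(start, start + train),
               range(start + train, start + train + test))
        start += test  # advance by one test window


# Hypothetical window sizes applied to a 561-day sample.
splits = list(walk_forward_splits(n_days=561, train=252, test=63))
for train_idx, test_idx in splits:
    print(f"train {train_idx.start}-{train_idx.stop - 1}, "
          f"test {test_idx.start}-{test_idx.stop - 1}")
```

With a sample this short, the number of out-of-sample windows is small, which is part of why the referee's generalizability concern is hard to resolve within the existing data.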
Circularity Check
No significant circularity; empirical backtest with no derivations or self-referential reductions
The paper describes an empirical evaluation of RMATS via a 561-day backtest and ablation studies comparing drawdown metrics against baselines like MVO and FinBERT. No equations, derivations, parameter fittings, or mathematical claims are present that could reduce to self-definitional inputs, fitted predictions, or self-citation chains. All load-bearing claims rest on direct experimental comparisons without any reduction by construction to the paper's own inputs or prior self-citations. This is a standard non-circular empirical architecture paper.
Axiom & Free-Parameter Ledger
axioms (1)
- Domain assumption: geopolitical stress periods can be reliably identified within the 2023-2025 window and serve as valid test cases for downside protection.
invented entities (5)
- Sentiment Agent: no independent evidence
- Report Agent: no independent evidence
- Analysis Agent: no independent evidence
- Risk Agent: no independent evidence
- Recursive Manager Agent: no independent evidence
Forward citations
Cited by 1 Pith paper
- Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability: Introduces a failure-aware observability framework for diagnosing wasted computation in multi-agent LLM systems and evaluates it on 165 GAIA traces, showing common operational failures.