arxiv: 2605.25311 · v1 · pith:ZOFIHIU5new · submitted 2026-05-25 · 💻 cs.MA

Recursive Multi-Agent Trading System: Iterative Optimized Portfolio Strategy Under Geopolitical Uncertainty

Pith reviewed 2026-06-29 20:01 UTC · model grok-4.3

classification 💻 cs.MA
keywords multi-agent trading · portfolio optimization · geopolitical uncertainty · risk management · recursive feedback · maximum drawdown · agent coordination · stress testing

The pith

A recursive multi-agent trading system holds maximum drawdown to 9.62 percent over 561 trading days, below the mean-variance optimization (15.49 percent) and FinBERT sentiment (15.28 percent) baselines, and records the lowest drawdown in three of five geopolitical stress tests.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper introduces RMATS, which coordinates four specialized agents through a recursive manager that applies iterative feedback loops to portfolio decisions. It tests the system on a 24-asset universe over a 561-trading-day window and five identified geopolitical stress periods, reporting a lower overall maximum drawdown than the compared methods and the lowest event-period drawdown in three of the five periods. The central aim is to show that this architecture improves downside protection for portfolios where capital preservation takes priority over returns. Ablation results are presented to attribute the protection to the individual agent roles and the recursive coordination.

Core claim

RMATS integrates Sentiment, Report, Analysis, and Risk agents under a recursive Manager Agent that applies iterative feedback loops; over the January 2023 to March 2025 period this produces a 9.62 percent maximum drawdown, lower than MVO at 15.49 percent and FinBERT Sentiment at 15.28 percent, while recording the lowest drawdown in three of the five geopolitical stress scenarios.
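
Since the whole claim rests on maximum drawdown, it helps to pin the metric down. The sketch below uses the standard definition (largest fractional decline from a running peak) and assumes a strictly positive portfolio value series. The function name and the toy numbers are illustrative and are not taken from the paper.

```python
from typing import Sequence


def max_drawdown(values: Sequence[float]) -> float:
    """Largest peak-to-trough decline of a value series, as a fraction of the peak.

    Assumes every value is strictly positive (e.g. a portfolio value curve).
    """
    if not values:
        raise ValueError("values must be non-empty")
    peak = values[0]
    worst = 0.0
    for v in values:
        peak = max(peak, v)                     # running high-water mark
        worst = max(worst, (peak - v) / peak)   # decline measured from that mark
    return worst


# Toy curve with made-up numbers (not the paper's data):
# the deepest decline is from 110.0 down to 99.0.
toy_curve = [100.0, 110.0, 99.0, 104.5, 120.0, 114.0]
toy_mdd = max_drawdown(toy_curve)
```

On this definition, the reported 9.62 percent means that at no point in the window did the RMATS portfolio sit more than 9.62 percent below its own earlier high.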

What carries the argument

The recursive Manager Agent that coordinates iterative feedback loops among the four specialized agents.
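
The reviewed text does not specify how the feedback loop works, so the following is only a guess at the general shape of such a mechanism, not the authors' implementation. It assumes agents submit target weights, the manager blends them, and a stand-in risk check sends the allocation back for revision until it passes or a round limit is hit. Every name here (`blend`, `risk_score`, `revise`, `CASH`, the shrink factor, the risk cap) is hypothetical.

```python
from typing import Dict, List, Tuple

Weights = Dict[str, float]
CASH = "CASH"  # hypothetical zero-risk sleeve that absorbs de-risked weight


def blend(proposals: List[Weights]) -> Weights:
    """Manager's first pass (assumed): equal-weight average of agent proposals."""
    assets = {a for p in proposals for a in p}
    return {a: sum(p.get(a, 0.0) for p in proposals) / len(proposals)
            for a in sorted(assets)}


def risk_score(weights: Weights, vols: Dict[str, float]) -> float:
    """Stand-in risk check: weighted-average volatility, ignoring correlations."""
    return sum(w * vols.get(a, 0.0) for a, w in weights.items())


def revise(weights: Weights, vols: Dict[str, float], risk_cap: float,
           round_no: int = 1, max_rounds: int = 5,
           shrink: float = 0.8) -> Tuple[Weights, int]:
    """Recursively de-risk until the risk check passes or rounds run out."""
    if risk_score(weights, vols) <= risk_cap or round_no >= max_rounds:
        return weights, round_no
    # Feedback step: scale down risky positions and park the rest in cash.
    scaled = {a: w * shrink for a, w in weights.items() if a != CASH}
    scaled[CASH] = 1.0 - sum(scaled.values())
    return revise(scaled, vols, risk_cap, round_no + 1, max_rounds, shrink)


# Toy inputs (made up): two agents disagree on the equity/bond split.
toy_proposals = [{"EQ": 0.7, "BOND": 0.3}, {"EQ": 0.5, "BOND": 0.5}]
toy_vols = {"EQ": 0.20, "BOND": 0.05, CASH: 0.0}
final_weights, rounds_used = revise(blend(toy_proposals), toy_vols, risk_cap=0.10)
```

The returned round count is the kind of quantity a convergence-rounds histogram would summarize. A real system would replace the volatility proxy and the fixed shrink factor with whatever the Risk and Manager agents actually compute.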

If this is right

  • RMATS trades off some upside in sustained bull markets to achieve lower drawdowns.
  • Ablation results indicate that removing any single agent weakens the downside protection.
  • The architecture is positioned for institutional use focused on capital preservation rather than return maximization.
  • The lowest event-period drawdowns occur in three of the five tested geopolitical scenarios.
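
The "three of five" count in the last bullet can be reproduced mechanically once each strategy's value curve and the event windows are fixed. The sketch below is one way to do that, assuming drawdown is measured inside each window with the peak reset at the window start; the paper's exact convention is not stated in the reviewed text. Strategy names, windows, and numbers are made up.

```python
from typing import Dict, List, Sequence, Tuple

Window = Tuple[int, int]  # inclusive (start_index, end_index) into a value curve


def max_drawdown(values: Sequence[float]) -> float:
    """Largest fractional decline from a running peak (values assumed positive)."""
    peak, worst = values[0], 0.0
    for v in values:
        peak = max(peak, v)
        worst = max(worst, (peak - v) / peak)
    return worst


def event_drawdowns(curve: Sequence[float], windows: List[Window]) -> List[float]:
    """Drawdown inside each window, with the peak reset at the window start."""
    return [max_drawdown(curve[start:end + 1]) for start, end in windows]


def count_lowest(curves: Dict[str, Sequence[float]],
                 windows: List[Window]) -> Dict[str, int]:
    """Number of windows in which each strategy has the smallest drawdown.

    Ties go to the strategy listed first in `curves`.
    """
    per_strategy = {name: event_drawdowns(c, windows) for name, c in curves.items()}
    wins = {name: 0 for name in curves}
    for i in range(len(windows)):
        best = min(per_strategy, key=lambda name: per_strategy[name][i])
        wins[best] += 1
    return wins


# Toy data (made up): two strategies, two event windows.
toy_curves = {
    "A": [100.0, 95.0, 97.0, 100.0, 90.0, 92.0],
    "B": [100.0, 98.0, 99.0, 100.0, 85.0, 95.0],
}
toy_windows = [(0, 2), (3, 5)]
toy_wins = count_lowest(toy_curves, toy_windows)
```

Because the count depends on where the window boundaries fall, shifting an event's start or end by a few days is a cheap sensitivity check on the headline figure.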

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • The same recursive coordination pattern could be applied to non-financial multi-agent decision tasks that require repeated revision under uncertainty.
  • Extending the agent set or replacing individual agents with newer language models might further reduce drawdown if the feedback mechanism remains intact.
  • A live deployment with real capital would reveal whether execution costs or model drift alter the backtested risk reduction.

Load-bearing premise

The chosen 561-day window and the five selected geopolitical stress periods are representative enough to support general claims about risk-control performance.

What would settle it

A backtest on a later or earlier multi-year window that includes new stress events and shows RMATS maximum drawdown exceeding the MVO or FinBERT baselines would falsify the reported advantage.

Figures

Figures reproduced from arXiv: 2605.25311 by Jianan Liu, Jing Yang, Mengwei Yuan, Penghao Liang, Weiran Yan, Xianyou Li, Yichao Wu.

Figure 1: RMATS agent collaboration architecture. Solid ar…
Figure 2: Cumulative portfolio performance (normalized to…
Figure 3: Portfolio drawdown comparison. RMATS main…
Figure 4: RMATS portfolio allocation over time. Red-shaded…
Figure 6: Distribution of convergence rounds: 74.1% of steps…
Figure 7: Normal vs. geopolitical stress convergence: stress…
Original abstract

Recursive Multi-Agent Trading System (RMATS) integrates four specialized agents -- Sentiment, Report, Analysis, and Risk -- coordinated through a recursive Manager Agent with iterative feedback loops. Experimental evaluation over a 561-trading-day period (January 2023 to March 2025) across a 24-asset multi-class universe demonstrates that RMATS achieves a maximum drawdown of 9.62%, lower than MVO (15.49%) and FinBERT Sentiment (15.28%), and exhibits the lowest event-period drawdown in 3 of 5 geopolitical stress scenarios tested. While RMATS underperforms return-maximizing baselines in a sustained bull market environment, ablation studies confirm the individual contribution of each agent component to downside protection. These results position RMATS as a risk-control-oriented architecture suitable for institutions prioritizing capital preservation under geopolitical uncertainty.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

3 major / 2 minor

Summary. The manuscript introduces the Recursive Multi-Agent Trading System (RMATS), which uses four specialized agents (Sentiment, Report, Analysis, Risk) coordinated by a recursive Manager Agent with iterative feedback loops for portfolio strategy under geopolitical uncertainty. Over a 561-trading-day backtest (Jan 2023 - Mar 2025) on 24 assets, it reports a maximum drawdown of 9.62% (vs. 15.49% for MVO and 15.28% for FinBERT Sentiment), with lowest event-period drawdown in 3 of 5 stress scenarios, and ablation studies supporting each agent's contribution.

Significance. If the empirical results can be verified and generalized, RMATS could represent a meaningful advance in applying multi-agent systems to risk-controlled portfolio management in uncertain environments. The recursive feedback mechanism offers a novel way to integrate agent outputs iteratively. However, the short and recent backtest period, combined with lack of implementation details, limits the immediate significance for the field.

major comments (3)
  1. Abstract and Experimental Evaluation: The central performance claims (maximum drawdown of 9.62%, lowest in 3/5 scenarios) are presented without any details on the implementation of the agents, the recursive feedback loops, the specific data sources or asset selection criteria for the 24-asset universe, or any controls for look-ahead bias or data snooping. This makes the results unverifiable and is load-bearing for the claim of superior risk control.
  2. Ablation Studies: Ablation studies are cited as confirming the contribution of each agent component, but no quantitative results, setup details, or comparison metrics are provided in the manuscript. Without this, it is impossible to assess whether the recursive Manager Agent's iterative mechanism is isolated or if effects are due to other factors.
  3. Experimental Setup: No statistical significance tests (e.g., t-tests or bootstrap on drawdown differences), walk-forward validation, or robustness checks against scenario selection are reported. The 561-day window and post-hoc identification of 5 geopolitical stress periods raise concerns about generalizability.
minor comments (2)
  1. The manuscript would benefit from clearer notation or pseudocode for the recursive Manager Agent to aid reproducibility.
  2. Consider adding references to prior work on multi-agent systems in finance for context.
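
Major comment 3 asks for a bootstrap on drawdown differences. One common choice for serially dependent returns is a moving-block bootstrap, sketched below with the standard library only. This is an illustration of the referee's suggestion, not a procedure from the paper. The block length, replicate count, percentile interval, and synthetic returns are all assumptions.

```python
import random
from typing import List, Sequence, Tuple


def max_drawdown_from_returns(returns: Sequence[float]) -> float:
    """Max drawdown of the value curve implied by simple periodic returns."""
    value, peak, worst = 1.0, 1.0, 0.0
    for r in returns:
        value *= 1.0 + r
        peak = max(peak, value)
        worst = max(worst, (peak - value) / peak)
    return worst


def block_bootstrap_mdd_diff(ret_a: Sequence[float], ret_b: Sequence[float],
                             block: int = 20, n_boot: int = 1000,
                             seed: int = 0) -> Tuple[float, float]:
    """95% percentile interval for MDD(a) - MDD(b) under a moving-block bootstrap.

    Both series are resampled with the same day indices, so their
    cross-sectional dependence on each day is preserved.
    """
    n = len(ret_a)
    if n != len(ret_b):
        raise ValueError("return series must have equal length")
    if not 1 <= block <= n:
        raise ValueError("block must be between 1 and the series length")
    rng = random.Random(seed)
    diffs: List[float] = []
    for _ in range(n_boot):
        idx: List[int] = []
        while len(idx) < n:
            start = rng.randrange(n - block + 1)   # random block start
            idx.extend(range(start, start + block))
        idx = idx[:n]                              # trim to original length
        a = [ret_a[i] for i in idx]
        b = [ret_b[i] for i in idx]
        diffs.append(max_drawdown_from_returns(a) - max_drawdown_from_returns(b))
    diffs.sort()
    return diffs[int(0.025 * (n_boot - 1))], diffs[int(0.975 * (n_boot - 1))]


# Synthetic daily returns (made up, not the paper's data).
_gen = random.Random(1)
toy_a = [_gen.gauss(0.0003, 0.008) for _ in range(561)]
toy_b = [_gen.gauss(0.0004, 0.012) for _ in range(561)]
ci_low, ci_high = block_bootstrap_mdd_diff(toy_a, toy_b, n_boot=200)
```

An interval that lies entirely below zero would support the claim that strategy `a` has the smaller drawdown; an interval straddling zero would indicate that a gap of the reported size could arise from sampling variation alone.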

Simulated Author's Rebuttal

3 responses · 0 unresolved

Thank you for the detailed review. We address each of the major comments below, agreeing that additional details and analyses will strengthen the manuscript.

  1. Referee: [—] Abstract and Experimental Evaluation: The central performance claims (maximum drawdown of 9.62%, lowest in 3/5 scenarios) are presented without any details on the implementation of the agents, the recursive feedback loops, the specific data sources or asset selection criteria for the 24-asset universe, or any controls for look-ahead bias or data snooping. This makes the results unverifiable and is load-bearing for the claim of superior risk control.

    Authors: We agree that the current manuscript lacks sufficient implementation details to allow full verification of the results. The full text describes the agent roles at a high level, but we will expand Section 3 with specific implementation details, including pseudocode for the recursive Manager Agent's feedback loops, exact data sources (e.g., news APIs and financial databases used), asset selection criteria (liquidity thresholds and sector balance for the 24 assets), and explicit statements on the use of only historical data to avoid look-ahead bias. These additions will be included in the revised manuscript.
    Revision: yes

  2. Referee: [—] Ablation Studies: Ablation studies are cited as confirming the contribution of each agent component, but no quantitative results, setup details, or comparison metrics are provided in the manuscript. Without this, it is impossible to assess whether the recursive Manager Agent's iterative mechanism is isolated or if effects are due to other factors.

    Authors: The manuscript mentions ablation studies but does not present the quantitative results. We will add a dedicated table and section detailing the ablation experiments, including performance metrics (e.g., max drawdown, Sharpe ratio) for configurations with and without each agent, as well as the setup for isolating the recursive feedback mechanism. This will allow readers to evaluate the contribution of each component.
    Revision: yes

  3. Referee: [—] Experimental Setup: No statistical significance tests (e.g., t-tests or bootstrap on drawdown differences), walk-forward validation, or robustness checks against scenario selection are reported. The 561-day window and post-hoc identification of 5 geopolitical stress periods raise concerns about generalizability.

    Authors: We acknowledge the value of statistical tests and will incorporate t-tests and bootstrap confidence intervals for the drawdown comparisons in the revised version. The 561-day period is the available data window, but we will add walk-forward analysis over sub-periods. The stress scenarios were selected based on major publicly known geopolitical events during the period (e.g., specific conflicts and policy announcements), not purely post hoc; however, we will provide a clearer justification and a sensitivity analysis of scenario selection to address generalizability concerns.
    Revision: partial
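
The walk-forward analysis over sub-periods mentioned in the third response can be set up with a simple rolling split, sketched below. The window lengths (252 training days, 63 test days) are assumptions chosen for illustration; neither the paper nor the rebuttal states them.

```python
from typing import Iterator, List, Tuple


def walk_forward_splits(n_obs: int, train: int,
                        test: int) -> Iterator[Tuple[range, range]]:
    """Yield (train_indices, test_indices) pairs for a rolling walk-forward.

    Each test block starts right after its training block, and the window
    advances by one test block, so test blocks never overlap each other
    and never precede their own training data.
    """
    if train <= 0 or test <= 0:
        raise ValueError("train and test must be positive")
    start = 0
    while start + train + test <= n_obs:
        yield (range(start, start + train),
               range(start + train, start + train + test))
        start += test


# 561 trading days, as in the reported backtest; window sizes are assumed.
splits: List[Tuple[range, range]] = list(walk_forward_splits(561, 252, 63))
```

With these assumed sizes the 561-day sample yields only a handful of out-of-sample blocks, which illustrates the referee's concern: a window this short leaves few independent sub-periods to test on.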

Circularity Check

0 steps flagged

No significant circularity; empirical backtest with no derivations or self-referential reductions

The paper describes an empirical evaluation of RMATS via a 561-day backtest and ablation studies comparing drawdown metrics against baselines like MVO and FinBERT. No equations, derivations, parameter fittings, or mathematical claims are present that could reduce to self-definitional inputs, fitted predictions, or self-citation chains. All load-bearing claims rest on direct experimental comparisons without any reduction by construction to the paper's own inputs or prior self-citations. This is a standard non-circular empirical architecture paper.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 5 invented entities

The central claim rests on empirical backtest performance of a newly defined multi-agent architecture whose components are introduced without external validation and whose evaluation depends on the representativeness of one historical window.

axioms (1)
  • Domain assumption: geopolitical stress periods can be reliably identified within the 2023-2025 window and serve as valid test cases for downside protection.
    The paper evaluates performance specifically on five such scenarios identified in the test period.
invented entities (5)
  • Sentiment Agent (no independent evidence)
    Purpose: processes sentiment signals for trading decisions.
    New component introduced as part of RMATS without independent evidence outside the paper.
  • Report Agent (no independent evidence)
    Purpose: analyzes financial reports.
    New component introduced as part of RMATS without independent evidence outside the paper.
  • Analysis Agent (no independent evidence)
    Purpose: performs market analysis.
    New component introduced as part of RMATS without independent evidence outside the paper.
  • Risk Agent (no independent evidence)
    Purpose: assesses and manages portfolio risk.
    New component introduced as part of RMATS without independent evidence outside the paper.
  • Recursive Manager Agent (no independent evidence)
    Purpose: coordinates the four agents via iterative feedback loops.
    Central coordinating entity introduced as part of RMATS without independent evidence outside the paper.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

  1. Early Diagnosis of Wasted Computation in Multi-Agent LLM Systems via Failure-Aware Observability

    cs.AI 2026-05 unverdicted novelty 5.0

    Introduces failure-aware observability framework for diagnosing wasted computation in multi-agent LLM systems and evaluates it on 165 GAIA traces showing common operational failures.
