Beyond Prompting: An Autonomous Framework for Systematic Factor Investing via Agentic AI
Pith reviewed 2026-05-15 11:09 UTC · model grok-4.3
The pith
An autonomous agentic AI system generates interpretable trading signals whose linear combination forms long-short U.S. equity portfolios with a reported annualized Sharpe ratio of 3.11 and a return of 59.53%.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The central claim is that an autonomous AI agent can endogenously formulate interpretable trading signals for factor investing. A closed-loop architecture that requires both out-of-sample validation and economic rationale during signal creation produces signals whose simple linear combination yields long-short portfolios with an annualized Sharpe ratio of 3.11 and returns of 59.53% in the U.S. equity market. The work concludes that self-evolving AI offers a scalable and interpretable paradigm for systematic investing.
What carries the argument
The closed-loop agentic AI engine that endogenously formulates signals subject to out-of-sample validation and economic rationale requirements.
If this is right
- Linear combinations of the generated signals are sufficient to achieve the reported performance without requiring complex nonlinear models.
- The framework reduces dependence on manually specified factors by shifting signal creation to the autonomous engine.
- Strict out-of-sample and rationale filters are claimed to produce more robust signals than unconstrained statistical search.
- The same closed-loop structure can be applied to other asset classes, provided the validation discipline is maintained.
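As a rough illustration of what "linear combination of signals" and "annualized Sharpe ratio" mean operationally, the sketch below scores assets with a weighted sum of signals, forms an equal-weighted long-short portfolio from the top and bottom quantiles, and annualizes the Sharpe ratio of the resulting return series. The quantile sort, the equal weighting, the monthly frequency, and every function name are assumptions made for illustration; the abstract does not describe the paper's actual portfolio construction.

```python
import math
import statistics


def combine_signals(signals, weights):
    """Weighted sum of signal scores per asset.

    signals[k][i] is the score of signal k for asset i (hypothetical layout).
    """
    n_assets = len(signals[0])
    return [
        sum(w * sig[i] for w, sig in zip(weights, signals))
        for i in range(n_assets)
    ]


def long_short_return(scores, next_returns, quantile=0.2):
    """Return of an equal-weighted top-quantile minus bottom-quantile portfolio."""
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    k = max(1, int(len(scores) * quantile))
    short_leg = order[:k]   # lowest combined scores
    long_leg = order[-k:]   # highest combined scores
    long_ret = sum(next_returns[i] for i in long_leg) / k
    short_ret = sum(next_returns[i] for i in short_leg) / k
    return long_ret - short_ret


def annualized_sharpe(period_returns, periods_per_year=12):
    """Mean over sample standard deviation, scaled by sqrt(periods per year).

    Long-short returns are treated as self-financing, so no risk-free
    rate is subtracted (an assumption of this sketch).
    """
    mean = statistics.mean(period_returns)
    sd = statistics.stdev(period_returns)
    return mean / sd * math.sqrt(periods_per_year)


if __name__ == "__main__":
    # Toy data: two signals over five assets, weights chosen arbitrarily.
    scores = combine_signals([[1, 2, 3, 4, 5], [5, 4, 3, 2, 1]], [1.0, 0.0])
    print(long_short_return(scores, [0.01, 0.02, 0.03, 0.04, 0.05]))
    print(annualized_sharpe([0.01, 0.03, 0.02]))
```

Note that the Sharpe ratio depends only on the sequence of period returns, so any selection that happened before those returns were computed is invisible in the final number.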
Where Pith is reading between the lines
- The approach may generalize to live trading environments where the agent continuously updates signals under the same validation constraints.
- Economic rationale checks could serve as a practical bridge between statistical factor discovery and traditional asset-pricing theory.
- If the signals prove stable across market regimes, the framework could lower the cost of maintaining systematic strategies over time.
Load-bearing premise
The closed-loop system with out-of-sample validation and economic rationale requirements is sufficient to eliminate data snooping biases during endogenous signal formulation.
What would settle it
Re-running the identical autonomous loop on a fresh post-sample equity dataset and finding that the resulting long-short portfolios deliver a Sharpe ratio below 1.0 would falsify the performance claim.
Original abstract
This paper develops an autonomous framework for systematic factor investing via agentic AI. Rather than relying on sequential manual prompts, our approach operationalizes the model as a self-directed engine that endogenously formulates interpretable trading signals. To mitigate data snooping biases, this closed-loop system imposes strict empirical discipline through out-of-sample validation and economic rationale requirements. Applying this methodology to the U.S. equity market, we document that long-short portfolios formed on the simple linear combination of signals deliver an annualized Sharpe ratio of 3.11 and a return of 59.53%. Finally, our empirics demonstrate that self-evolving AI offers a scalable and interpretable paradigm.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper develops an autonomous agentic AI framework for systematic factor investing that endogenously generates interpretable trading signals rather than relying on manual prompts. The closed-loop system applies out-of-sample validation and economic-rationale filters to mitigate data snooping. On U.S. equity data, long-short portfolios constructed from the linear combination of these signals are reported to deliver an annualized Sharpe ratio of 3.11 and a return of 59.53%. The work concludes that self-evolving AI provides a scalable and interpretable paradigm for factor investing.
Significance. If the performance claims prove robust after proper controls, the paper would represent a meaningful contribution by automating signal discovery in a disciplined, interpretable manner that addresses longstanding concerns about manual factor construction and overfitting in quantitative finance. The emphasis on closed-loop validation and economic rationale could influence future AI applications in portfolio management, provided the methodology demonstrably separates signal invention from performance evaluation.
major comments (2)
- [Abstract] Abstract: The headline result, an annualized Sharpe ratio of 3.11 and a 59.53% return for long-short portfolios formed on the linear combination of signals, is load-bearing for the central claim. The abstract supplies no information on the sample period, transaction costs, number of signals generated and tested by the agent, or exact out-of-sample procedures. Without these details the reported performance cannot be evaluated against standard concerns of overfitting or selection bias.
- [Methodology (closed-loop system)] Closed-loop system description: While out-of-sample validation and economic-rationale requirements are invoked to discipline the process, the manuscript states no explicit bound on the size of the hypothesis space the agent may explore, no pre-registration of the signal grammar, and no multiple-testing correction. This omission is critical because the out-of-sample window can still be used for implicit selection, undermining the claim that the metrics reflect a fixed, pre-specified strategy rather than endogenous optimization.
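To make the multiple-testing concern concrete, one simple and conservative yardstick is a Bonferroni adjustment: if the agent evaluated N candidate signals, a two-sided test at level alpha requires a z-score above the 1 - alpha/(2N) quantile of the standard normal. Under the common approximation that a strategy's t-statistic is its annualized Sharpe ratio times the square root of the sample length in years, this becomes a minimum Sharpe ratio. The sketch below is not taken from the paper; the function names are hypothetical, and the trial counts and sample length in the usage lines are made up, since the manuscript reports neither.

```python
import math
from statistics import NormalDist


def bonferroni_z(n_trials, alpha=0.05):
    """Two-sided critical z-score after a Bonferroni correction for n_trials."""
    return NormalDist().inv_cdf(1 - alpha / (2 * n_trials))


def min_annual_sharpe(n_trials, years, alpha=0.05):
    """Smallest annualized Sharpe ratio that clears the corrected threshold.

    Uses the approximation t-stat ~= annualized Sharpe * sqrt(years),
    which ignores non-normality and autocorrelation in returns.
    """
    return bonferroni_z(n_trials, alpha) / math.sqrt(years)


if __name__ == "__main__":
    # Hypothetical inputs: the paper states neither the number of
    # candidate signals nor the length of the evaluation window.
    for n in (1, 100, 10_000):
        print(n, round(min_annual_sharpe(n_trials=n, years=5), 2))
```

The threshold grows only slowly with the number of trials, so a correction of this kind is informative only if the number of candidates the agent actually examined is disclosed.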
minor comments (1)
- [Abstract] Abstract: A concise statement of the asset universe and time span would immediately contextualize the performance numbers for readers.
Simulated Author's Rebuttal
We thank the referee for the constructive comments, which highlight important aspects of transparency and methodological rigor. We address each major comment point by point below.
- Referee: [Abstract] Abstract: The headline result, an annualized Sharpe ratio of 3.11 and a 59.53% return for long-short portfolios formed on the linear combination of signals, is load-bearing for the central claim. The abstract supplies no information on the sample period, transaction costs, number of signals generated and tested by the agent, or exact out-of-sample procedures. Without these details the reported performance cannot be evaluated against standard concerns of overfitting or selection bias.
  Authors: We agree that the abstract requires additional details to permit proper assessment of the results against concerns such as overfitting. In the revised manuscript we will expand the abstract to report the sample period, note the treatment of transaction costs, state the number of signals generated and tested by the agent, and describe the out-of-sample validation procedures. These changes will improve transparency without altering the headline performance figures.
  revision: yes
- Referee: [Methodology (closed-loop system)] Closed-loop system description: While out-of-sample validation and economic-rationale requirements are invoked to discipline the process, the manuscript states no explicit bound on the size of the hypothesis space the agent may explore, no pre-registration of the signal grammar, and no multiple-testing correction. This omission is critical because the out-of-sample window can still be used for implicit selection, undermining the claim that the metrics reflect a fixed, pre-specified strategy rather than endogenous optimization.
  Authors: We acknowledge the value of an explicit bound on the hypothesis space and will revise the methodology section to specify the agent's operational constraints, including the fixed grammar of base factors and operators that limits the searchable space. We will also add a discussion of multiple-testing considerations and clarify how the nested out-of-sample splits combined with the economic-rationale filter reduce the scope for implicit selection. Pre-registration of the signal grammar was not performed because the work introduces a novel agentic framework; the closed-loop design with strict validation is intended to serve an analogous disciplining function.
  revision: partial
- Pre-registration of the signal grammar, which cannot be retroactively applied to the completed experiments.
Circularity Check
Endogenous signal generation inside the closed-loop AI reduces the reported Sharpe ratio to a fitted result by construction
specific steps
- fitted input called prediction [Abstract]
  "our approach operationalizes the model as a self-directed engine that endogenously formulates interpretable trading signals. To mitigate data snooping biases, this closed-loop system imposes strict empirical discipline through out-of-sample validation and economic rationale requirements. Applying this methodology to the U.S. equity market, we document that long-short portfolios formed on the simple linear combination of signals deliver an annualized Sharpe ratio of 3.11 and a return of 59.53%."
  Signals are generated inside the closed-loop agent; the linear-combination portfolio and its Sharpe/return are then computed on those same signals. The 'prediction' is therefore the fitted output of the endogenous search rather than an independent evaluation of a fixed, pre-specified strategy.
full rationale
The paper's headline performance (SR 3.11, 59.53% return) is obtained from signals that the agentic system itself invents and selects. The abstract states the framework 'endogenously formulates interpretable trading signals' and then immediately reports the metrics from 'long-short portfolios formed on the simple linear combination of signals'. Out-of-sample validation and rationale gates are applied inside the same loop; no pre-registered grammar, no bound on hypothesis space, and no external fixed signal set are described. Consequently the reported numbers are the output of the search process rather than an independent test of a pre-specified strategy.
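The mechanism described here can be illustrated with a small simulation, offered as a sketch and not as a reconstruction of the paper's pipeline. It draws many candidate "strategies" whose returns are pure zero-mean noise, picks the one with the best Sharpe ratio on a selection window, and then reports that same candidate's Sharpe ratio on a fresh window it was never selected on. The candidate count, window lengths, volatility, and function names are all hypothetical choices.

```python
import math
import random
import statistics


def annualized_sharpe(returns, periods_per_year=12):
    """Mean over sample standard deviation, scaled to an annual figure."""
    return (
        statistics.mean(returns)
        / statistics.stdev(returns)
        * math.sqrt(periods_per_year)
    )


def noise_strategy(rng, n_periods, vol=0.04):
    """Zero-mean Gaussian monthly returns: a strategy with no true edge."""
    return [rng.gauss(0.0, vol) for _ in range(n_periods)]


def select_best_of_noise(n_candidates=200, n_select=60, n_fresh=60, seed=0):
    """Pick the best noise strategy on one window, then score it on another.

    Returns (selection-window Sharpe of the winner,
             fresh-window Sharpe of the same winner,
             selection-window Sharpes of all candidates).
    """
    rng = random.Random(seed)
    candidates = []
    for _ in range(n_candidates):
        selection_window = noise_strategy(rng, n_select)
        fresh_window = noise_strategy(rng, n_fresh)
        candidates.append(
            (annualized_sharpe(selection_window), annualized_sharpe(fresh_window))
        )
    all_selection = [c[0] for c in candidates]
    best_selection, best_fresh = max(candidates, key=lambda c: c[0])
    return best_selection, best_fresh, all_selection


if __name__ == "__main__":
    best_sel, best_fresh, _ = select_best_of_noise()
    print(f"winner on selection window: {best_sel:.2f}")
    print(f"same winner on fresh window: {best_fresh:.2f}")
```

Because every candidate has zero expected return, any positive Sharpe ratio for the winner on the selection window comes from the search itself. This is why a window that influences which signals survive cannot also serve as an independent test.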
Axiom & Free-Parameter Ledger
free parameters (1)
- AI agent hyperparameters and signal thresholds
axioms (1)
- domain assumption: Out-of-sample validation and economic rationale requirements eliminate data snooping
Forward citations
Cited by 2 Pith papers
- From Hypotheses to Factors: Constrained LLM Agents in Cryptocurrency Markets
  Constrained LLM agents discover cryptocurrency factors that produce a portfolio with 44.55% annualized return and Sharpe ratio of 1.55 in pure out-of-sample 2024-2026 testing after trading costs.
- QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance
  QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.