arxiv: 2603.14288 · v2 · submitted 2026-03-15 · 💱 q-fin.PM · q-fin.GN · q-fin.PR

Beyond Prompting: An Autonomous Framework for Systematic Factor Investing via Agentic AI

Allen Yikuan Huang , Zheqi Fan

Pith reviewed 2026-05-15 11:09 UTC · model grok-4.3

classification 💱 q-fin.PM q-fin.GN q-fin.PR

keywords factor investing, agentic AI, systematic trading, Sharpe ratio, long-short portfolios, data snooping, interpretable signals, U.S. equity market

The pith

An autonomous agentic AI system generates interpretable trading signals whose linear combination forms long-short U.S. equity portfolios with an annualized Sharpe ratio of 3.11 and a return of 59.53%.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper presents a framework that replaces sequential manual prompts with a self-directed AI engine for systematic factor investing. The engine formulates trading signals endogenously while a closed loop enforces out-of-sample validation and explicit economic rationale checks to limit data snooping. When the signals are combined linearly into long-short portfolios on U.S. equities, the resulting strategy records the reported performance metrics. This setup is positioned as a scalable alternative to conventional human-designed factor models.

Core claim

The central claim is that an autonomous AI agent can endogenously formulate interpretable trading signals for factor investing. A closed-loop architecture that requires both out-of-sample validation and economic rationale during signal creation produces signals whose simple linear combination yields long-short portfolios with an annualized Sharpe ratio of 3.11 and returns of 59.53% in the U.S. equity market. The work concludes that self-evolving AI offers a scalable and interpretable paradigm for systematic investing.
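To make the portfolio construction in this claim concrete, the following is a minimal sketch of combining signals linearly, forming a long-short portfolio, and annualizing a Sharpe ratio. It is not the authors' implementation: the equal weights, the cross-sectional z-scoring, the 20% quantile cutoff, the monthly frequency, and all function names are assumptions made for illustration.

```python
import math
import statistics


def zscore(values):
    """Cross-sectional z-score; returns zeros when there is no dispersion."""
    mean = statistics.fmean(values)
    sd = statistics.pstdev(values)
    if sd == 0:
        return [0.0] * len(values)
    return [(v - mean) / sd for v in values]


def combine_signals(signals, weights=None):
    """Linear combination of z-scored signals (equal weights are an assumption)."""
    if weights is None:
        weights = [1.0 / len(signals)] * len(signals)
    scored = [zscore(s) for s in signals]
    n_assets = len(signals[0])
    return [sum(w * s[i] for w, s in zip(weights, scored)) for i in range(n_assets)]


def long_short_return(composite, next_returns, quantile=0.2):
    """Equal-weighted return of long top quantile minus short bottom quantile."""
    order = sorted(range(len(composite)), key=lambda i: composite[i])
    k = max(1, int(len(order) * quantile))
    short_leg = statistics.fmean([next_returns[i] for i in order[:k]])
    long_leg = statistics.fmean([next_returns[i] for i in order[-k:]])
    return long_leg - short_leg


def annualized_sharpe(period_returns, periods_per_year=12):
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    mean = statistics.fmean(period_returns)
    sd = statistics.stdev(period_returns)
    return mean / sd * math.sqrt(periods_per_year)
```

Each inner list passed to `combine_signals` holds one signal's value per stock for a single date. Calling `long_short_return` once per date gives the return series that `annualized_sharpe` summarizes.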

What carries the argument

The closed-loop agentic AI engine that endogenously formulates signals subject to out-of-sample validation and economic rationale requirements.

If this is right

Linear combinations of the generated signals are sufficient to achieve the reported performance without requiring complex nonlinear models.
The framework reduces dependence on manually specified factors by shifting signal creation to the autonomous engine.
Strict out-of-sample and rationale filters are claimed to produce more robust signals than unconstrained statistical search.
The same closed-loop structure can be applied to other asset classes, provided the validation discipline is maintained.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The approach may generalize to live trading environments where the agent continuously updates signals under the same validation constraints.
Economic rationale checks could serve as a practical bridge between statistical factor discovery and traditional asset-pricing theory.
If the signals prove stable across market regimes, the framework could lower the cost of maintaining systematic strategies over time.

Load-bearing premise

The closed-loop system with out-of-sample validation and economic rationale requirements is sufficient to eliminate data snooping biases during endogenous signal formulation.

What would settle it

Re-running the identical autonomous loop on a fresh post-sample equity dataset and finding that the resulting long-short portfolios deliver a Sharpe ratio below 1.0 would falsify the performance claim.
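A post-sample Sharpe estimate carries sampling error, so a bare comparison against 1.0 can mislead on a short window. As a hedged sketch, the check below uses the standard large-sample approximation for the standard error of a Sharpe ratio under i.i.d. returns, sqrt((1 + SR²/2) / T). The monthly frequency, the one-sided 95% level, and the function names are assumptions, not part of the paper.

```python
import math
import statistics


def sharpe_with_se(period_returns, periods_per_year=12):
    """Annualized Sharpe ratio and its approximate standard error (i.i.d. assumption)."""
    t = len(period_returns)
    sr = statistics.fmean(period_returns) / statistics.stdev(period_returns)
    se = math.sqrt((1.0 + 0.5 * sr * sr) / t)
    scale = math.sqrt(periods_per_year)
    return sr * scale, se * scale


def falls_below(period_returns, threshold=1.0, z=1.645, periods_per_year=12):
    """True if the one-sided upper confidence bound on the Sharpe is under the threshold."""
    sr, se = sharpe_with_se(period_returns, periods_per_year)
    return sr + z * se < threshold
```

Requiring the upper confidence bound, not just the point estimate, to sit below the threshold is a design choice that guards against declaring falsification on a noisy short sample.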

Original abstract

This paper develops an autonomous framework for systematic factor investing via agentic AI. Rather than relying on sequential manual prompts, our approach operationalizes the model as a self-directed engine that endogenously formulates interpretable trading signals. To mitigate data snooping biases, this closed-loop system imposes strict empirical discipline through out-of-sample validation and economic rationale requirements. Applying this methodology to the U.S. equity market, we document that long-short portfolios formed on the simple linear combination of signals deliver an annualized Sharpe ratio of 3.11 and a return of 59.53%. Finally, our empirics demonstrate that self-evolving AI offers a scalable and interpretable paradigm.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper's agentic AI framework for self-generated factor signals is a clear step past standard prompting, but the headline Sharpe of 3.11 rests on claims that lack any supporting methodological detail.

The main takeaway is that this work moves from prompted LLM use to a closed-loop agent that invents its own trading signals, applies out-of-sample checks and economic-rationale gates, and reports long-short portfolios with a Sharpe ratio of 3.11 and 59.53% annualized return on U.S. equities. That framing is new relative to the usual AI-for-finance literature, which mostly stops at sequential prompting or fixed signal templates.

The attempt to embed empirical discipline inside the loop is a reasonable direction and gives the setup some conceptual appeal for people building autonomous quant systems. The paper does a service by naming data-snooping risk explicitly and trying to counter it with validation and rationale requirements. Those elements are worth noting even if they do not fully solve the problem.

The soft spots sit in the empirical claims. The abstract supplies no data period, no count of signals the agent generated or tested, no transaction-cost treatment, and no description of how the out-of-sample window was protected from implicit selection. Without those pieces, a Sharpe above 3 is hard to evaluate and aligns with the usual pattern where unbounded search plus post-hoc filtering still produces inflated numbers. The stress-test point about missing bounds on the hypothesis space holds up on the available text; the out-of-sample period can still serve as a selection device if the agent is free to iterate extensively.

This paper is aimed at researchers exploring agentic methods in quantitative finance. Someone already working on similar loops might extract useful architecture ideas, but the results section would need a full methods appendix and robustness tables before it could be taken seriously. I would not send it to peer review as written; it needs the missing implementation details and tests first.

Referee Report

2 major / 1 minor

Summary. The paper develops an autonomous agentic AI framework for systematic factor investing that endogenously generates interpretable trading signals rather than relying on manual prompts. The closed-loop system applies out-of-sample validation and economic-rationale filters to mitigate data snooping. On U.S. equity data, long-short portfolios constructed from the linear combination of these signals are reported to deliver an annualized Sharpe ratio of 3.11 and a return of 59.53%. The work concludes that self-evolving AI provides a scalable and interpretable paradigm for factor investing.

Significance. If the performance claims prove robust after proper controls, the paper would represent a meaningful contribution by automating signal discovery in a disciplined, interpretable manner that addresses longstanding concerns about manual factor construction and overfitting in quantitative finance. The emphasis on closed-loop validation and economic rationale could influence future AI applications in portfolio management, provided the methodology demonstrably separates signal invention from performance evaluation.

major comments (2)

[Abstract] Abstract: The headline result—an annualized Sharpe ratio of 3.11 and 59.53% return for long-short portfolios formed on the linear combination of signals—is load-bearing for the central claim. The abstract supplies no information on the sample period, transaction costs, number of signals generated and tested by the agent, or exact out-of-sample procedures. Without these details the reported performance cannot be evaluated against standard concerns of overfitting or selection bias.
[Methodology (closed-loop system)] Closed-loop system description: While out-of-sample validation and economic-rationale requirements are invoked to discipline the process, the manuscript states no explicit bound on the size of the hypothesis space the agent may explore, no pre-registration of the signal grammar, and no multiple-testing correction. This omission is critical because the out-of-sample window can still be used for implicit selection, undermining the claim that the metrics reflect a fixed, pre-specified strategy rather than endogenous optimization.
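To illustrate why the missing multiple-testing correction matters, here is a minimal sketch of a Bonferroni-adjusted significance bar expressed as a minimum annualized Sharpe ratio. It relies on the approximation t ≈ annualized Sharpe × sqrt(years), which holds for i.i.d. returns. The trial counts and the five-year window are hypothetical, since the manuscript reports neither, and Bonferroni is only one of several possible corrections.

```python
import math
from statistics import NormalDist


def bonferroni_z(n_trials, alpha=0.05):
    """Two-sided critical z-value after a Bonferroni correction for n_trials tests."""
    return NormalDist().inv_cdf(1.0 - alpha / (2.0 * n_trials))


def min_annualized_sharpe(n_trials, years, alpha=0.05):
    """Smallest annualized Sharpe whose t-statistic clears the corrected threshold.

    Uses t ~= annualized Sharpe * sqrt(years), valid for i.i.d. returns.
    """
    return bonferroni_z(n_trials, alpha) / math.sqrt(years)


# Hypothetical trial counts: the required Sharpe rises as the agent tests more signals.
for n in (1, 100, 10_000):
    print(n, round(min_annualized_sharpe(n, years=5.0), 2))
```

The bar rises with the number of signals tried, which is why the count of generated and tested signals is needed before the headline figure can be judged.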

minor comments (1)

[Abstract] Abstract: A concise statement of the asset universe and time span would immediately contextualize the performance numbers for readers.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive comments, which highlight important aspects of transparency and methodological rigor. We address each major comment point by point below.

Referee: [Abstract] Abstract: The headline result—an annualized Sharpe ratio of 3.11 and 59.53% return for long-short portfolios formed on the linear combination of signals—is load-bearing for the central claim. The abstract supplies no information on the sample period, transaction costs, number of signals generated and tested by the agent, or exact out-of-sample procedures. Without these details the reported performance cannot be evaluated against standard concerns of overfitting or selection bias.

Authors: We agree that the abstract requires additional details to permit proper assessment of the results against concerns such as overfitting. In the revised manuscript we will expand the abstract to report the sample period, note the treatment of transaction costs, state the number of signals generated and tested by the agent, and describe the out-of-sample validation procedures. These changes will improve transparency without altering the headline performance figures.
revision: yes

Referee: [Methodology (closed-loop system)] Closed-loop system description: While out-of-sample validation and economic-rationale requirements are invoked to discipline the process, the manuscript states no explicit bound on the size of the hypothesis space the agent may explore, no pre-registration of the signal grammar, and no multiple-testing correction. This omission is critical because the out-of-sample window can still be used for implicit selection, undermining the claim that the metrics reflect a fixed, pre-specified strategy rather than endogenous optimization.

Authors: We acknowledge the value of an explicit bound on the hypothesis space and will revise the methodology section to specify the agent's operational constraints, including the fixed grammar of base factors and operators that limits the searchable space. We will also add a discussion of multiple-testing considerations and clarify how the nested out-of-sample splits combined with the economic-rationale filter reduce the scope for implicit selection. Pre-registration of the signal grammar was not performed because the work introduces a novel agentic framework; the closed-loop design with strict validation is intended to serve an analogous disciplining function.
revision: partial

standing simulated objections not resolved

Pre-registration of the signal grammar, which cannot be retroactively applied to the completed experiments.

Circularity Check

1 step flagged

Endogenous signal generation inside closed-loop AI reduces reported Sharpe to fitted result by construction

specific steps

fitted input called prediction [Abstract]
"our approach operationalizes the model as a self-directed engine that endogenously formulates interpretable trading signals. To mitigate data snooping biases, this closed-loop system imposes strict empirical discipline through out-of-sample validation and economic rationale requirements. Applying this methodology to the U.S. equity market, we document that long-short portfolios formed on the simple linear combination of signals deliver an annualized Sharpe ratio of 3.11 and a return of 59.53%."

Signals are generated inside the closed-loop agent; the linear-combination portfolio and its Sharpe/return are then computed on those same signals. The 'prediction' is therefore the fitted output of the endogenous search rather than an independent evaluation of a fixed, pre-specified strategy.

full rationale

The paper's headline performance (SR 3.11, 59.53% return) is obtained from signals that the agentic system itself invents and selects. The abstract states the framework 'endogenously formulates interpretable trading signals' and then immediately reports the metrics from 'long-short portfolios formed on the simple linear combination of signals'. Out-of-sample validation and rationale gates are applied inside the same loop; no pre-registered grammar, no bound on hypothesis space, and no external fixed signal set are described. Consequently the reported numbers are the output of the search process rather than an independent test of a pre-specified strategy.
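The selection effect described here can be reproduced with pure noise. The sketch below is an illustration under stated assumptions, not a reconstruction of the paper's loop: it draws many zero-edge strategies, keeps the best Sharpe on a validation window, and then scores a fresh window. The candidate count, window length, volatility, and function names are all hypothetical.

```python
import math
import random
import statistics


def annualized_sharpe(returns, periods_per_year=12):
    """Mean over sample standard deviation, scaled by sqrt(periods per year)."""
    return statistics.fmean(returns) / statistics.stdev(returns) * math.sqrt(periods_per_year)


def noise_returns(rng, n_periods, vol=0.04):
    """Zero-mean Gaussian returns: a strategy with no true edge."""
    return [rng.gauss(0.0, vol) for _ in range(n_periods)]


def select_on_validation(n_candidates=200, n_periods=60, seed=0):
    """Return (best validation Sharpe among noise strategies, Sharpe on fresh data)."""
    rng = random.Random(seed)
    best_validation = -math.inf
    for _ in range(n_candidates):
        validation = annualized_sharpe(noise_returns(rng, n_periods))
        best_validation = max(best_validation, validation)
    # The winner has no true edge, so its performance on untouched data
    # is simply another independent zero-mean draw.
    fresh = annualized_sharpe(noise_returns(rng, n_periods))
    return best_validation, fresh


best, fresh = select_on_validation()
print(f"best validation Sharpe: {best:.2f}, fresh-window Sharpe: {fresh:.2f}")
```

The maximum over many candidates is positive by construction even though no candidate has an edge, which is the sense in which an out-of-sample window that is reused for selection stops being an independent test.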

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the unverified premise that the AI agent's internal process plus out-of-sample checks produces genuinely new, non-overfit signals; this is an ad-hoc domain assumption with no independent evidence supplied.

free parameters (1)

AI agent hyperparameters and signal thresholds
Tuned internally by the agentic system to produce the reported portfolio performance.

axioms (1)

domain assumption: Out-of-sample validation and economic rationale requirements eliminate data snooping
Invoked in the abstract as the mechanism that disciplines the closed-loop system.

Forward citations

Cited by 2 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

From Hypotheses to Factors: Constrained LLM Agents in Cryptocurrency Markets
q-fin.PM 2026-04 unverdicted novelty 7.0

Constrained LLM agents discover cryptocurrency factors that produce a portfolio with 44.55% annualized return and Sharpe ratio of 1.55 in pure out-of-sample 2024-2026 testing after trading costs.

QRAFTI: An Agentic Framework for Empirical Research in Quantitative Finance
cs.MA 2026-04 unverdicted novelty 6.0

QRAFTI is a multi-agent framework using tool-calling and reflection-based planning to emulate quant research tasks like factor replication and signal testing on financial data.