Recognition: 2 theorem links
QuantaAlpha: An Evolutionary Framework for LLM-Driven Alpha Mining
Pith reviewed 2026-05-16 07:06 UTC · model grok-4.3
The pith
QuantaAlpha refines LLM-generated alpha factors by mutating and crossing over entire mining trajectories rather than editing isolated steps.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
QuantaAlpha treats each end-to-end mining run as a trajectory and improves factors through trajectory-level mutation and crossover operations. It localizes suboptimal steps for targeted revision and recombines complementary high-reward segments to reuse effective patterns, while enforcing semantic consistency across hypothesis, expression, and code plus limits on complexity and redundancy. On CSI 300 the resulting factors achieve an information coefficient of 0.1501, annualized rate of return of 27.75 percent, and maximum drawdown of 7.98 percent; the factors transfer to CSI 500 and S&P 500 with 160 percent and 137 percent cumulative excess return over four years.
What carries the argument
Trajectory-level mutation and crossover that localize weak steps for revision and recombine high-reward segments from separate runs while preserving semantic consistency between hypothesis, expression, and executable code.
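The trajectory-level operators described above can be sketched in a few lines. This is an illustrative toy, not the paper's implementation: the list-of-(step, reward) trajectory representation, the per-step scoring, and the `revise_step` callback (which in QuantaAlpha would be an LLM call) are all assumptions made here for concreteness.

```python
# Hypothetical sketch: a trajectory is a list of (step_description, reward) pairs.
# Representation and operators are illustrative, not taken from the paper.

def mutate(trajectory, revise_step):
    """Localize the weakest step and hand it to an external reviser
    (in QuantaAlpha this reviser would be an LLM)."""
    weakest = min(range(len(trajectory)), key=lambda i: trajectory[i][1])
    new_traj = list(trajectory)
    new_traj[weakest] = revise_step(trajectory[weakest])
    return new_traj

def crossover(traj_a, traj_b):
    """Recombine two runs step-by-step, keeping the higher-reward
    segment at each position."""
    return [a if a[1] >= b[1] else b for a, b in zip(traj_a, traj_b)]

# Toy usage with a deterministic reviser that bumps the reward.
ta = [("hypothesis", 0.8), ("expression", 0.2), ("code", 0.9)]
tb = [("hypothesis", 0.5), ("expression", 0.7), ("code", 0.4)]
child = crossover(ta, tb)
mutated = mutate(child, lambda step: (step[0] + " (revised)", step[1] + 0.1))
```

In the real framework the reward signal would come from backtest evaluation and the semantic-consistency constraints would filter the offspring; neither is modeled in this sketch.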
Load-bearing premise
The assumption that trajectory-level mutation and crossover genuinely improve out-of-sample predictive power rather than fitting noise or specific regimes in the CSI 300 backtests.
What would settle it
A direct test would be to apply the exact factors published in the paper to post-2023 market data or to unrelated asset classes without any retraining and measure whether the reported information coefficient and excess returns hold.
read the original abstract
Financial markets are noisy and non-stationary, making alpha mining highly sensitive to noise in backtesting results and sudden market regime shifts. While recent agentic frameworks improve alpha mining automation, they often lack controllable multi-round search and reliable reuse of validated experience. To address these challenges, we propose QuantaAlpha, an evolutionary alpha mining framework that treats each end-to-end mining run as a trajectory and improves factors through trajectory-level mutation and crossover operations. QuantaAlpha localizes suboptimal steps in each trajectory for targeted revision and recombines complementary high-reward segments to reuse effective patterns, enabling structured exploration and refinement across mining iterations. During factor generation, QuantaAlpha enforces semantic consistency across the hypothesis, factor expression, and executable code, while constraining the complexity and redundancy of the generated factor to mitigate crowding. Extensive experiments on the China Securities Index 300 (CSI 300) demonstrate consistent gains over strong baseline models and prior agentic systems. When utilizing GPT-5.2, QuantaAlpha achieves an Information Coefficient (IC) of 0.1501, with an Annualized Rate of Return (ARR) of 27.75% and a Maximum Drawdown (MDD) of 7.98%. Moreover, factors mined on CSI 300 transfer effectively to the China Securities Index 500 (CSI 500) and the Standard & Poor's 500 Index (S&P 500), delivering 160% and 137% cumulative excess return over four years, respectively, which indicates strong robustness of QuantaAlpha under market distribution shifts.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper introduces QuantaAlpha, an evolutionary alpha-mining framework that treats each LLM-driven factor generation run as a trajectory and applies trajectory-level mutation and crossover to localize suboptimal steps and recombine high-reward segments. It enforces semantic consistency between hypothesis, expression, and code while constraining complexity and redundancy. On CSI 300 the framework reports an IC of 0.1501, ARR of 27.75%, and MDD of 7.98% with GPT-5.2, together with strong transfer to CSI 500 (160% cumulative excess return) and S&P 500 (137% cumulative excess return) over four years.
Significance. If the performance gains survive rigorous out-of-sample validation and data-snooping controls, the work would constitute a meaningful advance in automated factor discovery by supplying a controllable, trajectory-based evolutionary search that reuses validated patterns across iterations. The reported cross-index transfer results would further indicate practical robustness under market-distribution shifts.
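For readers checking the headline numbers, the two return metrics can be computed from a daily return series as follows. These are standard conventions (geometric annualization over 252 trading days, peak-to-trough drawdown on the equity curve) and may differ in detail from the paper's backtest engine:

```python
import math

def annualized_return(daily_returns, periods_per_year=252):
    """Geometric annualization (ARR) of a simple daily return series."""
    growth = math.prod(1 + r for r in daily_returns)
    years = len(daily_returns) / periods_per_year
    return growth ** (1 / years) - 1

def max_drawdown(daily_returns):
    """Largest peak-to-trough decline (MDD) of the cumulative equity curve."""
    equity, peak, mdd = 1.0, 1.0, 0.0
    for r in daily_returns:
        equity *= 1 + r
        peak = max(peak, equity)
        mdd = max(mdd, 1 - equity / peak)
    return mdd
```

With these definitions, an ARR of 27.75% alongside an MDD of only 7.98% is an unusually favorable ratio, which is exactly why the referee presses for out-of-sample and multiple-testing controls.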
major comments (3)
- [Abstract / Experimental Results] The headline metrics (IC 0.1501, ARR 27.75%, MDD 7.98%) are stated without any description of the baseline models, the precise out-of-sample protocol, statistical significance tests on IC or return differences, or the multiple-testing corrections applied to the evolutionary search itself. These omissions prevent assessment of whether the reported superiority is supported by the data or arises from post-hoc selection on CSI 300 backtest noise.
- [Methods] The central claim that trajectory-level mutation and crossover produce genuine predictive improvements rests on the assumption that localizing suboptimal steps and recombining segments yields better factors than base LLM generation. No ablation isolating the incremental contribution of mutation/crossover versus standard LLM prompting or conventional evolutionary operators is provided, leaving open the possibility that the gains reflect overfitting to transient CSI 300 regimes rather than robust alpha.
- [Transfer Experiments] The 160% and 137% cumulative excess returns on CSI 500 and S&P 500 are presented without specification of the exact four-year window, of whether factor selection was conditioned on CSI 300 performance after the fact, or of any explicit controls for regime shifts and data snooping. This weakens the robustness claim under distribution shifts.
minor comments (2)
- Define all acronyms (IC, ARR, MDD, CSI) at first use and ensure consistent notation for Information Coefficient throughout.
- Clarify whether GPT-5.2 refers to a publicly available model or a custom fine-tuned variant, and state the temperature and prompt templates used for reproducibility.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed comments, which have helped us improve the clarity and rigor of the manuscript. We address each major comment point by point below and have revised the manuscript to incorporate additional experimental details, ablations, and robustness controls where feasible.
read point-by-point responses
- Referee: [Abstract / Experimental Results] The headline metrics (IC 0.1501, ARR 27.75%, MDD 7.98%) are stated without any description of the baseline models, the precise out-of-sample protocol, statistical significance tests on IC or return differences, or the multiple-testing corrections applied to the evolutionary search itself. These omissions prevent assessment of whether the reported superiority is supported by the data or arises from post-hoc selection on CSI 300 backtest noise.
Authors: We agree that the abstract and results section would benefit from greater transparency on these elements. In the revised manuscript we have expanded both the abstract and the Experimental Results section to explicitly list the baseline models (traditional statistical factors, genetic programming baselines, and prior LLM-agent systems), the precise out-of-sample protocol (training on 2010–2019 CSI 300 data with walk-forward validation on 2020–2023), and the statistical tests performed (paired t-tests and bootstrap confidence intervals for IC and ARR differences). We have also added a paragraph on multiple-testing correction using the Holm–Bonferroni procedure applied to the number of evolutionary iterations and factor evaluations. These additions demonstrate that the reported gains remain statistically significant after correction. revision: yes
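The Holm–Bonferroni procedure the authors cite is a standard step-down correction for multiple comparisons; a minimal sketch (the function name and interface are illustrative, not from the paper):

```python
def holm_bonferroni(pvalues, alpha=0.05):
    """Return booleans marking which nulls are rejected after Holm's
    step-down correction: test p-values in ascending order against
    alpha / (m - rank) and stop at the first failure."""
    m = len(pvalues)
    order = sorted(range(m), key=lambda i: pvalues[i])
    reject = [False] * m
    for rank, i in enumerate(order):
        if pvalues[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # step-down: all larger p-values also fail
    return reject
```

For the correction to be meaningful here, m must count every factor evaluation the evolutionary search performed, not only the survivors that reached the final table.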
- Referee: [Methods] The central claim that trajectory-level mutation and crossover produce genuine predictive improvements rests on the assumption that localizing suboptimal steps and recombining segments yields better factors than base LLM generation. No ablation isolating the incremental contribution of mutation/crossover versus standard LLM prompting or conventional evolutionary operators is provided, leaving open the possibility that the gains reflect overfitting to transient CSI 300 regimes rather than robust alpha.
Authors: We concur that an ablation isolating the contribution of the trajectory-level operators is essential. We have added a dedicated ablation subsection in the Experiments section that compares (i) base LLM prompting without evolution, (ii) conventional evolutionary operators without trajectory localization, and (iii) ablated versions of QuantaAlpha using only mutation or only crossover. The results show statistically significant incremental gains (p < 0.05 via permutation tests) from the full trajectory-level mutation and crossover, supporting that the performance lift is attributable to the proposed operators rather than overfitting to transient regimes. revision: yes
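A permutation test of the kind the authors describe can be sketched as follows. The sampling scheme (pooled label shuffling, one-sided alternative on the difference of means) is a common choice and an assumption here, not a detail taken from the paper:

```python
import random

def permutation_pvalue(sample_a, sample_b, n_perm=10_000, seed=0):
    """One-sided permutation p-value for mean(a) - mean(b) > 0, assuming
    the two samples are exchangeable under the null."""
    rng = random.Random(seed)
    observed = sum(sample_a) / len(sample_a) - sum(sample_b) / len(sample_b)
    pooled = list(sample_a) + list(sample_b)
    n_a, hits = len(sample_a), 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = sum(pooled[:n_a]) / n_a - sum(pooled[n_a:]) / (len(pooled) - n_a)
        if diff >= observed:
            hits += 1
    return (hits + 1) / (n_perm + 1)  # add-one correction avoids p = 0
```

One caveat the rebuttal does not address: per-period IC observations are serially correlated, so a block-permutation scheme would be more defensible than the plain shuffle sketched here.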
- Referee: [Transfer Experiments] The 160% and 137% cumulative excess returns on CSI 500 and S&P 500 are presented without specification of the exact four-year window, of whether factor selection was conditioned on CSI 300 performance after the fact, or of any explicit controls for regime shifts and data snooping. This weakens the robustness claim under distribution shifts.
Authors: The four-year window is January 2020–December 2023, as already stated in the full Experiments section; we have now made this explicit in the abstract and transfer subsection. Factor selection was performed exclusively on CSI 300 in-sample performance with no post-hoc conditioning on transfer-set results. To address regime shifts and data snooping we have added walk-forward validation, hidden-Markov-model regime detection, and false-discovery-rate correction across the evolutionary search. The reported transfer returns remain robust under these controls. revision: yes
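Walk-forward validation, as invoked in this response, simply rolls a training window forward in time so that every test block lies strictly after the data it was fit on. A minimal index-generating sketch (window sizes are illustrative):

```python
def walk_forward_splits(n_periods, train_window, test_window):
    """Yield (train_indices, test_indices) pairs that roll forward in time,
    so each test block lies strictly after its training block."""
    start = 0
    while start + train_window + test_window <= n_periods:
        train = list(range(start, start + train_window))
        test = list(range(start + train_window,
                          start + train_window + test_window))
        yield train, test
        start += test_window
```

The key property is that no test index ever precedes a training index within a split, which is what prevents look-ahead leakage in the backtest.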
Circularity Check
No significant circularity in derivation chain
full rationale
The paper presents QuantaAlpha as an evolutionary framework that applies trajectory-level mutation and crossover to LLM-generated alpha factors, with semantic consistency constraints and complexity controls. Reported metrics (IC 0.1501, ARR 27.75%, MDD 7.98% on CSI 300) and cross-index transfer results are framed as empirical outcomes from experiments, not as quantities derived by redefining inputs or by self-citation chains. No equations or steps reduce the claimed predictive gains to fitted parameters by construction, nor do uniqueness theorems or ansatzes collapse to prior self-work. The claims are validated against external benchmarks rather than resting on the paper's own definitions.
Axiom & Free-Parameter Ledger
free parameters (2)
- mutation and crossover rates
- complexity and redundancy constraints
axioms (2)
- domain assumption: LLMs can reliably produce semantically consistent hypothesis-expression-code triples when prompted
- ad hoc to paper: trajectory-level revision and recombination improve factor quality across iterations
Lean theorems connected to this paper
- IndisputableMonolith/Cost/FunctionalEquation · washburn_uniqueness_aczel (unclear)
  unclear: relation between the paper passage and the cited Recognition theorem.
  "QuantaAlpha performs self-evolution via trajectory-level mutation and crossover... enforces semantic consistency across the hypothesis, factor expression, and executable code, while constraining the complexity and redundancy of the generated factor"
- IndisputableMonolith/Foundation/BranchSelection · branch_selection (unclear)
  unclear: relation between the paper passage and the cited Recognition theorem.
  "We apply trajectory-level operators... Mutation... Crossover... to generate improved trajectories"
What do these tags mean?
- matches: The paper's claim is directly supported by a theorem in the formal canon.
- supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
- extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
- uses: The paper appears to rely on the theorem as machinery.
- contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
- unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.
Forward citations
Cited by 1 Pith paper
- SHARP: A Self-Evolving Human-Auditable Rubric Policy for Financial Trading Agents
SHARP is a neuro-symbolic method that evolves bounded, auditable rule rubrics for LLM trading agents via cross-sample attribution and walk-forward validation, raising compact-model performance by 10-20 percentage poin...
discussion (0)