Machine Learning Enhanced Multi-Factor Quantitative Trading: A Cross-Sectional Portfolio Optimization Approach with Bias Correction

Yimin Du

arxiv: 2507.07107 · v2 · submitted 2025-06-02 · 💱 q-fin.PM · cs.CE

Pith reviewed 2026-05-19 12:06 UTC · model grok-4.3

classification 💱 q-fin.PM cs.CE

keywords: quantitative trading · factor models · machine learning · A-share market · bias correction · portfolio optimization · tradability mask · upstream contamination

The pith

A Boolean tradability mask built at data load time removes upstream contamination from non-executable prices in A-share factor pipelines and raises realised Sharpe by 0.44.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper claims that standard rolling-window factor models for Chinese A-shares ingest closing prices that daily move limits have made non-executable. These values contaminate moving averages, correlations and ranks before any filtering occurs, so models learn to predict returns they cannot trade. This upstream contamination inflates apparent information coefficients by 18 percent and lowers realised Sharpe by 0.44 points. The fix is a mask-first design: a Boolean tradability mask is created when data loads and is applied to every operator, ensuring no window ever sees a non-tradable price. Ablation on real 2022-2024 A-share data shows the mask contributes more performance than model choice, loss function or portfolio optimiser, yielding an overall Sharpe of 1.63.

Core claim

The central claim is that threading a Boolean tradability mask through every factor operator at data-load time fully blocks non-executable prices caused by daily limits, eliminating the 18 percent IC inflation and 0.44 Sharpe penalty that arise when models train on untradeable signals. On proprietary real A-share data this produces an annualised Sharpe of 1.63; on a calibrated 3,000-stock synthetic panel the same pipeline reaches 2.05. Ablation identifies the mask as the single largest contributor, larger than any change to the model, loss or optimisation step.

What carries the argument

The Boolean tradability mask, constructed once at data load and threaded through every windowed operator so that non-tradable prices are excluded from all aggregates.
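
As a rough illustration of what "threading the mask through a windowed operator" could look like, here is a minimal sketch of a rolling mean that ignores non-tradable rows. The function name `masked_rolling_mean`, the `min_valid` parameter and the toy prices are assumptions made for this example; the paper's own engine is described as GPU-vectorised with PyTorch and is not reproduced here.

```python
from typing import List, Optional, Sequence


def masked_rolling_mean(
    values: Sequence[float],
    tradable: Sequence[bool],
    window: int,
    min_valid: int = 1,
) -> List[Optional[float]]:
    """Trailing rolling mean that only aggregates rows flagged tradable.

    Returns None for a row when fewer than `min_valid` tradable
    observations fall inside its trailing window.
    """
    out: List[Optional[float]] = []
    for t in range(len(values)):
        lo = max(0, t - window + 1)
        # Keep only the prices the mask marks as executable.
        kept = [v for v, ok in zip(values[lo:t + 1], tradable[lo:t + 1]) if ok]
        out.append(sum(kept) / len(kept) if len(kept) >= min_valid else None)
    return out


# Toy series: the third close (11.22) is a hypothetical limit-up print.
closes = [10.0, 10.2, 11.22, 11.0, 10.9]
tradable = [True, True, False, True, True]

masked = masked_rolling_mean(closes, tradable, window=3)
unmasked = masked_rolling_mean(closes, [True] * len(closes), window=3)
```

In this toy case the unmasked window at index 2 absorbs the limit-up close and drifts upward, while the masked window averages only the two executable prices before it.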

If this is right

Realised Sharpe on real A-share data improves by 0.44 points once non-tradable prices are excluded.
Apparent information coefficient no longer overstates predictive power by 18 percent.
The GPU-vectorised 213-factor engine runs 51 times faster than pandas while respecting the mask.
Adjusted-MSE loss that penalises wrong-sign errors 11 times more than magnitude errors can now operate on uncontaminated inputs.
Markowitz-Ledoit-Wolf optimisation with cvxpy warm-start produces portfolios from signals that are actually executable.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same load-time masking step could be applied to any market that uses daily price limits, circuit breakers or trading halts.
Aligning every intermediate calculation with final execution constraints may reduce the gap between back-test and live performance in other machine-learning trading systems.
Requiring the mask to be preserved through every operator offers a concrete test for whether a pipeline has truly removed look-ahead or selection bias.

Load-bearing premise

That building the Boolean tradability mask at data-load time and applying it to every operator fully prevents contamination without creating new selection bias or changing the statistical properties of the remaining data.

What would settle it

Running the identical pipeline on the same 2022-2024 A-share data once with the mask and once without it; if the version without the mask does not show an 18 percent higher apparent IC and a 0.44 lower realised Sharpe, the central claim is falsified.
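
One way to picture the "apparent versus executable" gap in such a comparison is to compute a rank information coefficient twice: once over all names and once over tradable names only. The sketch below is a simplified illustration under assumptions; the helper names and toy numbers are invented for this example, it ignores ties, and it measures the gap at evaluation time rather than reproducing the paper's window-level contamination.

```python
import math
from typing import Optional, Sequence


def ranks(xs: Sequence[float]) -> list:
    """Ordinal ranks (0 = smallest). Ties are not handled in this sketch."""
    order = sorted(range(len(xs)), key=lambda i: xs[i])
    r = [0.0] * len(xs)
    for rank, i in enumerate(order):
        r[i] = float(rank)
    return r


def pearson(a: Sequence[float], b: Sequence[float]) -> float:
    n = len(a)
    ma, mb = sum(a) / n, sum(b) / n
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    va = sum((x - ma) ** 2 for x in a)
    vb = sum((y - mb) ** 2 for y in b)
    return cov / math.sqrt(va * vb)


def rank_ic(
    signal: Sequence[float],
    fwd_ret: Sequence[float],
    tradable: Optional[Sequence[bool]] = None,
) -> float:
    """Spearman-style IC; if a mask is given, only tradable names count."""
    if tradable is not None:
        pairs = [(s, r) for s, r, ok in zip(signal, fwd_ret, tradable) if ok]
        signal, fwd_ret = zip(*pairs)
    return pearson(ranks(signal), ranks(fwd_ret))


# Toy cross-section: names 0 and 4 are hypothetical limit-up stocks.
signal = [0.9, 0.1, 0.4, 0.3, 0.8]
fwd_ret = [0.06, -0.02, 0.01, 0.02, 0.05]
tradable = [False, True, True, True, False]

apparent_ic = rank_ic(signal, fwd_ret)
executable_ic = rank_ic(signal, fwd_ret, tradable)
```

Here the two limit-up names account for much of the apparent ranking skill, so the IC measured on names that could actually be traded is lower.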

Original abstract

Rolling-window factor pipelines for Chinese A-share markets contain a subtle but costly flaw: daily price-move limits (+/-10% main-board, +/-20% STAR/ChiNext) render a fraction of closing prices non-executable, yet standard implementations ingest these values before any row-filtering runs. The contaminated aggregates propagate silently through moving averages, correlations, and ranks--a failure mode we term "upstream contamination". On real A-share data it inflates apparent information coefficient by 18% while reducing realised Sharpe by 0.44 points, because the model learns to predict returns it cannot trade. We resolve this with a mask-first design: a Boolean tradability mask is constructed at data load time and threaded through every operator, so that no window ever reads a non-tradable price. Built on this foundation, the system adds (i) a GPU-vectorised 213-factor engine via PyTorch unfold primitives (51x over pandas); (ii) an Adjusted-MSE loss penalising wrong-sign predictions 11x more heavily than magnitude errors; (iii) block-bootstrap GBM augmentation; and (iv) Markowitz-Ledoit-Wolf portfolio optimisation with cvxpy warm-start caching. On a calibrated 3,000-stock synthetic panel the system achieves annualised Sharpe 2.05; on proprietary real A-share data (2022-2024) it achieves Sharpe 1.63. Ablation shows the mask contract is the single largest contributor (+0.44), exceeding any model or loss choice. The full implementation is released under MIT licence at https://github.com/initial-d/ml-quant-trading.
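
The abstract does not spell out how a limit-hit close is detected, so the following is only a sketch of one plausible construction: flag a row as non-tradable when its close-to-close return reaches the board's limit, within a small tolerance. The function name `tradability_mask`, the tolerance `tol`, and the choice to treat both limit-up and limit-down closes as non-tradable are assumptions made for this example.

```python
from typing import List, Sequence


def tradability_mask(
    closes: Sequence[float],
    limit: float,
    tol: float = 1e-4,
) -> List[bool]:
    """Boolean mask: True where the close is assumed executable.

    `limit` is the daily move limit as a fraction, e.g. 0.10 for the
    main board or 0.20 for STAR/ChiNext. A row is flagged non-tradable
    when |close / previous close - 1| reaches `limit - tol`.
    """
    mask = [True]  # first row has no previous close; assumed tradable
    for prev, cur in zip(closes, closes[1:]):
        ret = cur / prev - 1.0
        mask.append(abs(ret) < limit - tol)
    return mask


# Toy series: +10% on day 1 and -10% on day 3.
closes = [10.0, 11.0, 11.5, 10.35]
main_board = tradability_mask(closes, limit=0.10)
star_board = tradability_mask(closes, limit=0.20)
```

Under the 10% limit the second and fourth rows are flagged; under the 20% limit the same moves stay inside the band and every row remains tradable. A production mask would presumably also need exchange price rounding, suspensions and ST-stock rules, none of which are modelled here.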

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The mask-first threading through operators is the real contribution here, delivering a measurable fix for price-limit contamination on A-shares, though proprietary data leaves the size of the effect hard to verify independently.

The main thing to know is that this paper shows how building a tradability mask at load time and threading it through every rolling operator stops models from training on non-executable limit-hit closes in Chinese A-shares. The author reports that this removes an 18 percent inflation in apparent IC and lifts realised Sharpe by 0.44 on the 2022-2024 data, with the mask mattering more than model choice or loss function in the ablations.

Referee Report

2 major / 2 minor

Summary. The manuscript identifies upstream contamination in standard rolling-window factor pipelines for Chinese A-share markets, where non-executable limit-hit closing prices (+/-10% or +/-20%) propagate through moving averages, correlations, and ranks before any filtering. It proposes a mask-first design that constructs a Boolean tradability mask at data load time and threads it through every operator to ensure no window reads non-tradable prices. The system further includes a GPU-vectorized 213-factor engine (51x speedup via PyTorch unfold), an Adjusted-MSE loss with 11x wrong-sign penalty, block-bootstrap GBM augmentation, and Markowitz-Ledoit-Wolf optimization with cvxpy caching. On a 3,000-stock synthetic panel it reports annualised Sharpe 2.05; on proprietary real A-share data (2022-2024) it reports Sharpe 1.63, with ablation attributing the single largest gain (+0.44 Sharpe) to the mask contract.

Significance. If the results hold, the work supplies a concrete, implementable fix for a pervasive but previously unquantified failure mode in limit-constrained markets, directly linking contamination removal to the elimination of an 18% inflation in apparent IC and a 0.44-point recovery in realised Sharpe. The open-source MIT-licensed code and the dominance of the mask over model and loss choices in the ablations make the contribution actionable for practitioners and provide a reproducible baseline for future research on execution-aware factor construction.

major comments (2)

[Mask threading through operators and ablation study] The central claim that the mask eliminates contamination without introducing new selection bias rests on the untested assumption that excluding limit-hit observations from rolling windows leaves factor statistics (means, variances, autocorrelations) materially unchanged. The ablation on proprietary data shows the mask dominates other choices, yet no table or figure quantifies the shift in factor distributions between masked and unmasked versions on the same non-limit periods; without this, the reported +0.44 Sharpe lift could partly reflect a change in training distribution rather than pure decontamination.
[Real-data evaluation and ablation results] The real-data Sharpe of 1.63 and the 18% IC inflation figure are obtained on proprietary 2022-2024 A-share data whose construction details (universe selection, handling of ST stocks, exact limit-hit definition) are not visible. While the synthetic-panel result (Sharpe 2.05) provides partial corroboration, the absence of a controlled public-data experiment isolating the mask effect leaves the load-bearing performance deltas unverifiable.
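
The diagnostic requested in the first major comment could be sketched roughly as follows: compute the mean, variance and lag-1 autocorrelation of one factor under the masked and unmasked pipelines, restricted to tradable rows, and report the differences. The function names and toy values are assumptions made for this example, and the autocorrelation treats the surviving rows as if they were adjacent, which is a simplification.

```python
from typing import Dict, Sequence


def summary(xs: Sequence[float]) -> Dict[str, float]:
    """Mean, sample variance and lag-1 autocorrelation of a series."""
    n = len(xs)
    mean = sum(xs) / n
    ss = sum((x - mean) ** 2 for x in xs)
    lag1 = sum((a - mean) * (b - mean) for a, b in zip(xs, xs[1:]))
    return {"mean": mean, "var": ss / (n - 1), "ac1": lag1 / ss}


def shift_on_tradable(
    masked: Sequence[float],
    unmasked: Sequence[float],
    tradable: Sequence[bool],
) -> Dict[str, float]:
    """Masked-minus-unmasked statistics, using tradable rows only."""
    m = summary([v for v, ok in zip(masked, tradable) if ok])
    u = summary([v for v, ok in zip(unmasked, tradable) if ok])
    return {k: m[k] - u[k] for k in m}


# Toy factor values; row 2 is a hypothetical limit-hit day whose price
# leaks into the unmasked factor on the following row.
tradable = [True, True, False, True, True]
masked_factor = [1.0, 2.0, 0.0, 4.0, 3.0]
unmasked_factor = [1.0, 2.0, 9.0, 5.0, 3.0]

shift = shift_on_tradable(masked_factor, unmasked_factor, tradable)
```

Differences near zero across many factors would support the claim that the mask removes contamination without materially reshaping the training distribution; large differences would suggest the concern raised in the comment deserves a closer look.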

minor comments (2)

[Factor engine description] The abstract states a '51x over pandas' speedup but does not specify the hardware, data size, or exact benchmark configuration used for the comparison; adding these details would make the performance claim reproducible.
[Loss function] Notation for the Adjusted-MSE loss (wrong-sign penalty multiplier) is introduced without an explicit equation number; numbering it would improve clarity when the loss is referenced in the ablation.

Simulated Author's Rebuttal

2 responses · 1 unresolved

We thank the referee for the constructive and detailed comments, which help strengthen the manuscript. We address each major comment below, indicating where revisions will be made to improve clarity and evidence.

Referee: [Mask threading through operators and ablation study] The central claim that the mask eliminates contamination without introducing new selection bias rests on the untested assumption that excluding limit-hit observations from rolling windows leaves factor statistics (means, variances, autocorrelations) materially unchanged. The ablation on proprietary data shows the mask dominates other choices, yet no table or figure quantifies the shift in factor distributions between masked and unmasked versions on the same non-limit periods; without this, the reported +0.44 Sharpe lift could partly reflect a change in training distribution rather than pure decontamination.

Authors: We agree that quantifying the impact on factor statistics strengthens the argument. In the revised manuscript we will add a new table (or supplementary figure) that compares means, variances, and autocorrelations for a representative subset of the 213 factors, computed exclusively on non-limit-hit periods under both the masked and unmasked pipelines. This analysis will show that the statistics remain materially unchanged when restricted to tradable observations, supporting that the +0.44 Sharpe improvement arises from decontamination rather than an unintended shift in the training distribution. The existing ablation already isolates the mask contribution; the new comparison provides direct evidence for the no-selection-bias claim.
revision: yes

Referee: [Real-data evaluation and ablation results] The real-data Sharpe of 1.63 and the 18% IC inflation figure are obtained on proprietary 2022-2024 A-share data whose construction details (universe selection, handling of ST stocks, exact limit-hit definition) are not visible. While the synthetic-panel result (Sharpe 2.05) provides partial corroboration, the absence of a controlled public-data experiment isolating the mask effect leaves the load-bearing performance deltas unverifiable.

Authors: We acknowledge that proprietary data constraints limit full public verification. In the revision we will expand the data section with additional non-confidential details on universe construction, ST-stock handling, and the precise definition of limit-hit events used to build the tradability mask. The calibrated 3,000-stock synthetic panel already provides a fully controlled, reproducible isolation of the mask effect (Sharpe 2.05). While we cannot release the proprietary A-share panel, the MIT-licensed code base enables any researcher to apply the identical mask-first pipeline to public datasets, thereby reproducing the experimental design. We view the combination of synthetic controls and real-data ablations as sufficient to substantiate the core contribution.
revision: partial

standing simulated objections not resolved

The exact 2022-2024 proprietary A-share dataset cannot be publicly released or replicated, owing to licensing and confidentiality restrictions.

Circularity Check

0 steps flagged

Empirical results on held-out real data anchor claims; no definitional or fitted-input circularity

The paper presents an empirical pipeline whose performance metrics (Sharpe 1.63, mask contribution +0.44) are obtained from ablation and evaluation on proprietary 2022-2024 A-share data and a separate synthetic panel. The tradability mask is constructed at load time and its incremental effect is measured by direct comparison against the unmasked baseline; this comparison is external to any internal fitting loop. No equation reduces a reported quantity to a parameter defined inside the paper, no self-citation supplies a load-bearing uniqueness theorem, and the released code permits independent reproduction of the pipeline, though not of the proprietary-data results. The sceptic's concern about selection bias in masked windows is a question of statistical validity, not a circular reduction of the reported lift to the mask definition itself. Consequently, the derivation chain remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The central claim rests on the domain assumption that price-limit rules create non-executable closes and on a hand-chosen penalty multiplier inside the Adjusted-MSE loss. No new entities are postulated.

free parameters (1)

wrong-sign penalty multiplier
The Adjusted-MSE loss penalises wrong-sign predictions 11x more heavily than magnitude errors; the factor 11 is introduced without derivation from first principles.
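
The exact form of the Adjusted-MSE loss is not given in the material above, so the following is only one plausible reading: a squared error whose weight is multiplied by 11 whenever the prediction and the realised return have opposite signs. The function name `adjusted_mse`, the parameter `wrong_sign_weight`, and the rule that a zero on either side does not count as a sign error are assumptions made for this sketch.

```python
from typing import Sequence


def adjusted_mse(
    y_true: Sequence[float],
    y_pred: Sequence[float],
    wrong_sign_weight: float = 11.0,
) -> float:
    """Mean squared error with a heavier weight on wrong-sign predictions.

    A prediction counts as wrong-sign when y_true * y_pred < 0, so a
    zero on either side is treated as an ordinary magnitude error.
    """
    total = 0.0
    for y, p in zip(y_true, y_pred):
        weight = wrong_sign_weight if y * p < 0 else 1.0
        total += weight * (y - p) ** 2
    return total / len(y_true)


# Two predictions with the same absolute error of 0.04:
right_sign = adjusted_mse([0.02], [0.06])   # overshoots, correct direction
wrong_sign = adjusted_mse([0.02], [-0.02])  # same error, wrong direction
```

With equal absolute errors the wrong-direction forecast costs 11 times as much, which matches the stated multiplier; how the paper arrives at 11 is, as noted, not derived.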

axioms (1)

domain assumption: Daily price-move limits render a fraction of closing prices non-executable
Invoked at the start of the abstract to motivate the contamination problem.

