Factor-Based Conditional Diffusion Model for Contextual Portfolio Optimization

Mengying He; Xuedong He; Xuefeng Gao

arXiv: 2509.22088 · v2 · submitted 2025-09-26 · 💱 q-fin.PM · stat.ML

Pith reviewed 2026-05-18 12:56 UTC · model grok-4.3

classification 💱 q-fin.PM stat.ML

keywords: conditional diffusion model, portfolio optimization, factor models, generative sampling, mean-variance optimization, mean-CVaR, Chinese A-share market, risk-adjusted performance

The pith

A factor-conditioned diffusion model learns next-day return distributions to improve daily mean-variance and mean-CVaR portfolio optimization.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper tries to establish that training a conditional diffusion model on asset-specific factors produces samples from the cross-sectional distribution of next-day returns, and that optimizing portfolios on those samples yields better risk-adjusted results than standard benchmarks. A sympathetic reader would care because traditional portfolio methods often rely on simplified assumptions about returns that fail to capture complex cross-asset dependencies and heavy tails in high-dimensional settings. The work applies this to daily optimization with transaction costs and constraints on Chinese A-share data, and adds a theoretical analysis of how model approximation errors affect the final portfolios. If the central claim holds, generative sampling could replace parametric assumptions in many contextual financial decisions.

Core claim

The paper claims that a Diffusion Transformer with token-wise conditioning learns the conditional cross-sectional distribution of next-day stock returns given high-dimensional asset-specific factors. Generative samples drawn from this distribution support daily mean-variance and mean-CVaR optimization that includes transaction costs and realistic constraints. Empirical tests on Chinese A-share market data show consistent outperformance over benchmarks across risk-adjusted metrics. A theoretical error analysis quantifies how distributional approximation errors propagate into the downstream optimization task.
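The paper's theorem is not reproduced on this page. As a generic illustration only (our notation and our simplification, not the authors' result), here is how moment errors in generated samples propagate to a mean-variance objective over long-only, fully invested weights: the portfolio chosen on the generated moments loses at most twice the uniform gap between the true and approximate objectives.

```latex
% Setting (our notation): \Delta = \{ w \in \mathbb{R}^N : w \ge 0,\ \mathbf{1}^\top w = 1 \};
% true conditional moments (\mu, \Sigma); generated-sample moments (\hat\mu, \hat\Sigma);
% \varepsilon_\mu = \|\mu - \hat\mu\|_\infty, \quad \varepsilon_\Sigma = \max_{i,j} |\Sigma_{ij} - \hat\Sigma_{ij}|;
% w^\star maximizes J over \Delta, \hat w maximizes \hat J over \Delta.
\begin{aligned}
J(w) &= w^\top \mu - \tfrac{\gamma}{2}\, w^\top \Sigma w, \qquad
\hat J(w) = w^\top \hat\mu - \tfrac{\gamma}{2}\, w^\top \hat\Sigma w, \\
|J(w) - \hat J(w)| &\le \|w\|_1 \,\|\mu - \hat\mu\|_\infty
  + \tfrac{\gamma}{2} \sum_{i,j} w_i w_j \,|\Sigma_{ij} - \hat\Sigma_{ij}|
  \le \varepsilon_\mu + \tfrac{\gamma}{2}\,\varepsilon_\Sigma =: \delta
  \quad \text{for all } w \in \Delta, \\
J(w^\star) - J(\hat w) &= \underbrace{J(w^\star) - \hat J(w^\star)}_{\le \delta}
  + \underbrace{\hat J(w^\star) - \hat J(\hat w)}_{\le 0}
  + \underbrace{\hat J(\hat w) - J(\hat w)}_{\le \delta} \;\le\; 2\delta .
\end{aligned}
```

The second line uses only that the weights are nonnegative and sum to one, so the degradation is linear in the moment errors. Mean-CVaR objectives, or bounds stated in terms of distances between whole distributions, need a different argument.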

What carries the argument

Diffusion Transformer with token-wise conditioning, which links each asset's return to its own factor vector while capturing cross-asset dependencies.
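To make "token-wise conditioning" concrete, here is a minimal sketch assuming a single attention layer with random, untrained weights. The class `ToyTokenwiseDenoiser`, its dimensions, and the additive embedding scheme are hypothetical; the paper's Diffusion Transformer is deeper, trained, and not reproduced here. Each asset becomes one token built only from its own noisy return and its own factor vector, and self-attention is the only place assets interact.

```python
import math
import random


def matvec(m, v):
    """Multiply a matrix (list of rows) by a vector."""
    return [sum(mij * vj for mij, vj in zip(row, v)) for row in m]


def rand_matrix(rng, rows, cols, scale=0.5):
    return [[rng.gauss(0.0, scale) for _ in range(cols)] for _ in range(rows)]


class ToyTokenwiseDenoiser:
    """Hypothetical one-layer, single-head noise predictor (not the paper's DiT).

    Token i is built from asset i's noisy return, asset i's factor vector and the
    shared diffusion time only; self-attention then mixes information across assets.
    Weights are random and untrained: this shows the data flow, not a fitted model.
    """

    def __init__(self, n_factors, dim=8, seed=0):
        rng = random.Random(seed)
        self.dim = dim
        self.w_x = [rng.gauss(0.0, 0.5) for _ in range(dim)]    # embeds the noisy return
        self.w_f = rand_matrix(rng, dim, n_factors)              # embeds the asset's own factors
        self.w_t = [rng.gauss(0.0, 0.5) for _ in range(dim)]    # embeds the diffusion time
        self.w_q = rand_matrix(rng, dim, dim)
        self.w_k = rand_matrix(rng, dim, dim)
        self.w_v = rand_matrix(rng, dim, dim)
        self.w_out = [rng.gauss(0.0, 0.5) for _ in range(dim)]  # reads out the predicted noise

    def tokens(self, x_noisy, factors, t):
        # Token-wise conditioning: asset i's token never sees another asset's factors.
        toks = []
        for x_i, f_i in zip(x_noisy, factors):
            f_emb = matvec(self.w_f, f_i)
            toks.append([self.w_x[d] * x_i + f_emb[d] + self.w_t[d] * t for d in range(self.dim)])
        return toks

    def predict_noise(self, x_noisy, factors, t):
        h = self.tokens(x_noisy, factors, t)
        q = [matvec(self.w_q, hi) for hi in h]
        k = [matvec(self.w_k, hi) for hi in h]
        v = [matvec(self.w_v, hi) for hi in h]
        scale = 1.0 / math.sqrt(self.dim)
        eps_hat = []
        for i in range(len(h)):
            # Cross-asset dependence: asset i attends to every asset j (softmax over j).
            scores = [scale * sum(a * b for a, b in zip(q[i], k[j])) for j in range(len(h))]
            top = max(scores)
            weights = [math.exp(s - top) for s in scores]
            total = sum(weights)
            mixed = [sum(weights[j] * v[j][d] for j in range(len(h))) / total
                     for d in range(self.dim)]
            # Residual connection, then a linear read-out to one noise value per asset.
            eps_hat.append(sum(w * (h[i][d] + mixed[d]) for d, w in enumerate(self.w_out)))
        return eps_hat


model = ToyTokenwiseDenoiser(n_factors=3)
x_t = [0.1, -0.4, 0.7]                                           # noisy returns of 3 assets
factors = [[1.0, 0.0, 0.5], [0.2, -1.0, 0.3], [0.0, 0.4, -0.2]]  # one factor vector per asset
eps = model.predict_noise(x_t, factors, t=0.5)
```

Because there is no positional embedding, reordering the assets simply reorders the predicted noise: an asset's identity enters only through its own factors, while attention still lets one asset's factors influence another asset's output.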

If this is right

Daily mean-variance and mean-CVaR optimizations using the generated samples produce portfolios that outperform standard benchmarks on risk-adjusted metrics.
The theoretical error analysis bounds how much the diffusion model's distributional approximation error can degrade the downstream portfolio objective.
The framework handles transaction costs and portfolio constraints directly within the sample-based optimization routine.
The same generative approach could extend to other high-dimensional contextual stochastic optimization problems in finance.
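A minimal sketch of a sample-based mean-variance step, assuming a long-only, fully invested portfolio and a quadratic turnover penalty (weight `kappa`) as a stand-in for transaction costs. The quadratic form is our assumption: it keeps the problem smooth so plain projected gradient ascent works, whereas proportional costs would need an LP or QP solver. The paper's actual cost model and constraints are not reproduced, and all function names are hypothetical.

```python
import random


def project_to_simplex(v):
    """Euclidean projection onto {w : w_i >= 0, sum(w) = 1} (sort-based algorithm)."""
    u = sorted(v, reverse=True)
    cumsum, theta = 0.0, 0.0
    for i, u_i in enumerate(u, start=1):
        cumsum += u_i
        t = (cumsum - 1.0) / i
        if u_i - t > 0:
            theta = t
    return [max(x - theta, 0.0) for x in v]


def sample_moments(samples):
    """Sample mean vector and covariance matrix of generated return scenarios."""
    n, p = len(samples), len(samples[0])
    mu = [sum(s[a] for s in samples) / n for a in range(p)]
    cov = [[sum((s[a] - mu[a]) * (s[b] - mu[b]) for s in samples) / (n - 1)
            for b in range(p)] for a in range(p)]
    return mu, cov


def mean_variance_weights(samples, w_prev, gamma=5.0, kappa=1.0, steps=500):
    """Maximize w'mu - gamma/2 w'Cov w - kappa/2 ||w - w_prev||^2 over the simplex.

    The quadratic turnover penalty is an assumed stand-in for transaction costs.
    """
    mu, cov = sample_moments(samples)
    p = len(mu)
    # Step 1/L with L = gamma * trace(cov) + kappa, an upper bound on the gradient's
    # Lipschitz constant, so each projected ascent step cannot overshoot.
    lr = 1.0 / (gamma * sum(cov[a][a] for a in range(p)) + kappa + 1e-12)
    w = list(w_prev)
    for _ in range(steps):
        grad = [mu[a] - gamma * sum(cov[a][b] * w[b] for b in range(p)) - kappa * (w[a] - w_prev[a])
                for a in range(p)]
        w = project_to_simplex([w[a] + lr * grad[a] for a in range(p)])
    return w


rng = random.Random(1)
# Hypothetical generated scenarios: 500 draws of next-day returns for 3 assets.
scenarios = [[rng.gauss(0.01, 0.01), rng.gauss(0.0, 0.01), rng.gauss(-0.01, 0.01)]
             for _ in range(500)]
w_prev = [0.2, 0.3, 0.5]
w_free = mean_variance_weights(scenarios, w_prev, kappa=0.0)    # trading is free
w_sticky = mean_variance_weights(scenarios, w_prev, kappa=1e3)  # trading is very costly
```

With `kappa=0.0` the weights move to the asset with the highest generated mean, while a large `kappa` keeps them pinned near yesterday's `w_prev`. That trade-off between exploiting the forecast and limiting turnover is the role transaction costs play in daily rebalancing.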

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

Extending the conditioning to multi-period horizons could support sequential rebalancing strategies without retraining from scratch.
Comparing the diffusion approach against other conditional generative models would isolate whether the transformer architecture or the diffusion process drives the gains.
Applying the model to markets outside the training region would test whether factor-based conditioning transfers across different economic regimes.

Load-bearing premise

The diffusion model must accurately capture the true conditional distribution of next-day returns, including dependencies and tails, so that samples produce reliable out-of-sample portfolio performance.

What would settle it

Compare risk-adjusted performance of portfolios optimized on the model's generated samples versus portfolios optimized on actual realized next-day returns or on historical-sample estimates over the same out-of-sample period.
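Any such comparison needs one scoring rule applied to every strategy. A minimal sketch, assuming a proportional cost on turnover and an annualized Sharpe ratio as the risk-adjusted metric; the function names, the cost model, and the default annualization factor of 252 are our assumptions (the factor should match the market's trading calendar), not the paper's evaluation code.

```python
import math
import statistics


def net_returns(weights, realized, cost_rate):
    """Daily portfolio returns after a proportional cost on turnover.

    Simplifications (assumptions): turnover is sum_i |w_t,i - w_{t-1},i|, ignoring
    how weights drift with prices between rebalances, and day 1 is charged for
    building the position from cash.
    """
    out = []
    prev = [0.0] * len(weights[0])
    for w, r in zip(weights, realized):
        gross = sum(wi * ri for wi, ri in zip(w, r))
        turnover = sum(abs(wi - pi) for wi, pi in zip(w, prev))
        out.append(gross - cost_rate * turnover)
        prev = w
    return out


def annualized_sharpe(daily, periods_per_year=252):
    """Mean over sample standard deviation of daily returns, scaled by sqrt(periods)."""
    return statistics.mean(daily) / statistics.stdev(daily) * math.sqrt(periods_per_year)


weights = [[0.5, 0.5]] * 4
realized = [[0.01, 0.03], [0.02, 0.0], [-0.01, 0.01], [0.0, 0.02]]
net = net_returns(weights, realized, cost_rate=0.001)
```

Running the same two functions on the weights from each strategy (generated samples, historical-sample estimates, or the realized-return oracle) over the same out-of-sample days keeps the comparison like-for-like.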

Figures

Figures reproduced from arXiv: 2509.22088 by Mengying He, Xuedong He, Xuefeng Gao.

**Figure 1.** The above findings show the importance of considering transaction fees in the construction and evaluation of portfolio strategies, whereas these fees are ignored in some studies in the literature. To …

**Figure 1.** Portfolio weights over time for the top 5 stocks in the optimal portfolio of …

Original abstract

We propose a novel conditional diffusion model for contextual portfolio optimization that learns the cross-sectional distribution of next-day stock returns conditioned on high-dimensional asset-specific factors. Our model leverages a Diffusion Transformer architecture with token-wise conditioning, which enables linking each asset's return to its own factor vector while capturing complex cross-asset dependencies. By drawing generative samples from the learned conditional return distribution, we perform daily mean-variance and mean-CVaR optimization, incorporating transaction costs and realistic constraints. Using data from the Chinese A-share market, we demonstrate that our approach consistently outperforms various standard benchmarks across multiple risk-adjusted performance metrics. Furthermore, we provide a theoretical error analysis that quantifies the propagation of distributional approximation errors from the conditional diffusion model to the downstream portfolio optimization task. Our results demonstrate the potential of generative diffusion models in high-dimensional data-driven contextual stochastic optimization and financial decision making.
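The sampling step ("drawing generative samples from the learned conditional return distribution") can be illustrated with a toy in which the trained network is replaced by the exact noise predictor for a one-asset Gaussian conditional law. This is a sketch under stated assumptions: the linear factor link `loading * factor`, the noise schedule, and the function names are hypothetical, and the sampler is standard ancestral DDPM (Ho et al., 2020) rather than the paper's exact procedure.

```python
import math
import random
import statistics


def linear_betas(n_steps=200, beta_min=1e-4, beta_max=0.05):
    """Linear noise schedule beta_1..beta_T (illustrative values, not the paper's)."""
    step = (beta_max - beta_min) / (n_steps - 1)
    return [beta_min + step * i for i in range(n_steps)]


def exact_noise_prediction(x_t, alpha_bar, cond_mean, cond_std):
    """E[eps | x_t] when x_0 ~ N(cond_mean, cond_std^2) and x_t = a*x_0 + b*eps.

    Stands in for a trained conditional network; valid only for this Gaussian toy.
    """
    a, b = math.sqrt(alpha_bar), math.sqrt(1.0 - alpha_bar)
    return b * (x_t - a * cond_mean) / (a * a * cond_std ** 2 + b * b)


def ddpm_sample(cond_mean, cond_std, betas, rng):
    """Ancestral DDPM sampling with reverse-step variance sigma_t^2 = beta_t."""
    alpha_bars, prod = [], 1.0
    for beta in betas:
        prod *= 1.0 - beta
        alpha_bars.append(prod)
    x = rng.gauss(0.0, 1.0)  # start from pure Gaussian noise
    for t in reversed(range(len(betas))):
        beta, alpha_bar = betas[t], alpha_bars[t]
        eps_hat = exact_noise_prediction(x, alpha_bar, cond_mean, cond_std)
        x = (x - beta / math.sqrt(1.0 - alpha_bar) * eps_hat) / math.sqrt(1.0 - beta)
        if t > 0:  # no noise is added on the final step
            x += math.sqrt(beta) * rng.gauss(0.0, 1.0)
    return x


# Hypothetical linear factor link: conditional mean = loading * factor (standardized units).
loading, factor, resid_std = 0.8, 0.5, 0.3
rng = random.Random(0)
betas = linear_betas()
draws = [ddpm_sample(loading * factor, resid_std, betas, rng) for _ in range(2000)]
```

Replacing `exact_noise_prediction` with a learned, factor-conditioned network, and the scalar with a vector of asset returns, gives the general recipe. The toy only checks that the sampler reproduces a known conditional mean (0.4) and spread (0.3).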

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it. The pith above is the substance; this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper applies a Diffusion Transformer with token-wise factor conditioning to generate conditional asset return samples, then feeds those directly into constrained mean-CVaR optimization with a supporting error bound, and reports gains on Chinese A-shares.

The central piece is a conditional diffusion model built on a transformer that conditions each asset token on its own high-dimensional factors while using attention to pick up cross-asset links. Samples from the learned next-day return distribution are then plugged straight into daily mean-variance and mean-CVaR problems that include transaction costs and realistic constraints. The authors also supply a theoretical bound on how approximation error in the generative step carries through to the portfolio objective, and they show better risk-adjusted numbers than several benchmarks on Chinese A-share data.
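In the mean-CVaR problem, the CVaR of the portfolio loss has to be estimated from the generated scenarios. A minimal sketch of the standard sample estimator in its Rockafellar-Uryasev form, with hypothetical function names; in the optimization itself the auxiliary level and the max terms become linear-programming variables solved jointly with the weights, which is not reproduced here.

```python
import math


def portfolio_losses(weights, scenarios):
    """Loss (negative return) of a fixed portfolio under each generated scenario."""
    return [-sum(w * r for w, r in zip(weights, s)) for s in scenarios]


def empirical_var_cvar(losses, alpha=0.95):
    """Empirical VaR and CVaR via the Rockafellar-Uryasev representation.

    CVaR_alpha = min_z { z + E[max(L - z, 0)] / (1 - alpha) }; on a finite sample the
    minimum is attained at the empirical alpha-quantile of the losses, used as VaR.
    """
    n = len(losses)
    k = min(n - 1, max(0, math.ceil(alpha * n) - 1))  # index of the empirical alpha-quantile
    value_at_risk = sorted(losses)[k]
    excess = sum(max(loss - value_at_risk, 0.0) for loss in losses)
    return value_at_risk, value_at_risk + excess / ((1.0 - alpha) * n)


var95, cvar95 = empirical_var_cvar([float(i) for i in range(1, 101)], alpha=0.95)
```

For losses 1 to 100 at the 95% level, the estimate equals the average of the five worst losses (96 to 100), which is the intuition behind CVaR as an expected tail loss.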

Referee Report

1 major / 2 minor

Summary. The manuscript proposes a conditional diffusion model based on a Diffusion Transformer with token-wise conditioning on asset-specific factors to learn the cross-sectional distribution of next-day stock returns. Generative samples from this learned conditional distribution are used to solve daily mean-variance and mean-CVaR portfolio optimization problems that incorporate transaction costs and realistic constraints. The approach is evaluated on Chinese A-share market data and reported to outperform standard benchmarks on multiple risk-adjusted performance metrics; a theoretical error analysis is provided to bound the effect of distributional approximation errors on the downstream optimization objective.

Significance. If the empirical outperformance and theoretical bounds hold under rigorous validation, the work would advance the use of generative models for high-dimensional contextual stochastic optimization in finance. The explicit theoretical error analysis that quantifies propagation from the diffusion model to the portfolio objective is a notable strength, as it directly addresses a core concern when using approximate generative distributions for decision-making. The token-wise conditioning plus cross-asset attention mechanism offers a principled way to handle both asset-specific information and dependencies, which is relevant for practical portfolio construction.

major comments (1)

[Empirical Evaluation] The central performance claims rest on the Diffusion Transformer accurately capturing the true conditional cross-sectional return distribution (including tails and dependencies) so that optimization on its samples yields reliable out-of-sample results. While the architecture description and theoretical bound are provided, the manuscript would benefit from explicit numerical checks (e.g., comparison of sample moments or tail quantiles against held-out data) in the empirical section to confirm that the learned distribution is sufficiently faithful for the reported risk-adjusted gains.

minor comments (2)

[Abstract] The abstract and introduction would be clearer if they explicitly stated the number of assets, the exact sample period for the Chinese A-share data, and the precise definitions of the benchmark strategies (including any hyperparameter tuning protocols).
[Model Description] Notation for the conditioning factors and the risk-aversion/CVaR parameters should be introduced once in a dedicated notation table or subsection to avoid repeated re-definition across the model and optimization sections.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the positive evaluation and the recommendation of minor revision. The suggestion to strengthen the empirical validation of the learned conditional distribution is constructive and will improve the manuscript.

Referee: The central performance claims rest on the Diffusion Transformer accurately capturing the true conditional cross-sectional return distribution (including tails and dependencies) so that optimization on its samples yields reliable out-of-sample results. While the architecture description and theoretical bound are provided, the manuscript would benefit from explicit numerical checks (e.g., comparison of sample moments or tail quantiles against held-out data) in the empirical section to confirm that the learned distribution is sufficiently faithful for the reported risk-adjusted gains.

Authors: We agree that direct numerical diagnostics comparing generated samples to held-out data would provide stronger support for the fidelity of the learned distribution. In the revised version we will add a new subsection (or expand the existing empirical section) that compares the first four moments (mean, variance, skewness, kurtosis) and selected tail quantiles (5% and 95%) of the diffusion-generated samples with those of the corresponding held-out real returns for multiple representative trading days. These checks will be presented alongside the existing portfolio-performance results to show whether the model captures both central tendency and tail behavior well enough for the downstream optimization task. revision: yes
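The promised diagnostics are simple to compute. A minimal sketch, assuming each trading day's pooled generated cross-section of returns is compared with that day's realized cross-section (the rebuttal does not say how samples are pooled). The function names are hypothetical, moments use population normalization, kurtosis is non-excess, and quantiles use linear interpolation between order statistics.

```python
import math


def four_moments(xs):
    """Mean, variance, skewness and (non-excess) kurtosis with population normalization."""
    n = len(xs)
    mean = sum(xs) / n
    var = sum((x - mean) ** 2 for x in xs) / n
    sd = math.sqrt(var)
    skew = sum((x - mean) ** 3 for x in xs) / n / sd ** 3
    kurt = sum((x - mean) ** 4 for x in xs) / n / var ** 2
    return mean, var, skew, kurt


def quantile(xs, q):
    """Quantile by linear interpolation between order statistics at position q*(n-1)."""
    s = sorted(xs)
    pos = q * (len(s) - 1)
    lo = int(pos)
    hi = min(lo + 1, len(s) - 1)
    return s[lo] + (pos - lo) * (s[hi] - s[lo])


def fidelity_report(generated, held_out, qs=(0.05, 0.95)):
    """Map each statistic name to a (generated, held-out) pair for side-by-side review."""
    names = ("mean", "variance", "skewness", "kurtosis")
    report = dict(zip(names, zip(four_moments(generated), four_moments(held_out))))
    for q in qs:
        report[f"q{round(q * 100)}"] = (quantile(generated, q), quantile(held_out, q))
    return report
```

A report per representative day, with columns for generated and realized values, is one compact way to present these checks next to the portfolio results.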

Circularity Check

0 steps flagged

No significant circularity; derivation remains self-contained

The paper trains a Diffusion Transformer on historical Chinese A-share data to learn a conditional distribution of next-day returns given asset-specific factors, then draws samples for daily mean-variance and mean-CVaR optimization with constraints. Performance is evaluated on held-out periods, and a separate theoretical error analysis bounds propagation of approximation errors to the portfolio objective. No equation or step reduces the reported outperformance to a fitted parameter by construction, nor does any load-bearing claim collapse to a self-citation or self-definition. The central pipeline (training on market data, sampling, optimization, out-of-sample testing) is independent of the target results and externally falsifiable, satisfying the criteria for a self-contained derivation.
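The independence argument rests on scoring each fitted model only on days it never saw. A generic walk-forward split makes that ordering explicit. This is a sketch assuming a rolling, fixed-length training window: the paper's actual train/validation/test scheme and window lengths are not specified here, and `walk_forward_splits` is a hypothetical name.

```python
def walk_forward_splits(n_days, train_window, test_window, step=None):
    """Yield (train_days, test_days) index ranges for rolling out-of-sample evaluation.

    Every test day lies strictly after every training day of the same split, so no
    information from the evaluation window can leak into fitting.
    """
    step = step or test_window
    start = 0
    while start + train_window + test_window <= n_days:
        train_end = start + train_window
        yield range(start, train_end), range(train_end, train_end + test_window)
        start += step


splits = list(walk_forward_splits(n_days=10, train_window=4, test_window=2))
```

With `step` equal to the test window, the test blocks tile the evaluation period without overlap, and `step` sets how often the model would be refit.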

Axiom & Free-Parameter Ledger

2 free parameters · 1 axioms · 0 invented entities

The central claim rests on standard neural-network training assumptions plus the domain premise that daily equity returns admit a factor-conditioned diffusion representation; no new physical entities are postulated.

free parameters (2)

Diffusion Transformer hyperparameters
Architecture depth, attention heads, diffusion steps, and conditioning strength are chosen or tuned during training and directly affect sample quality.
Risk aversion and CVaR level
Parameters in the downstream mean-variance and mean-CVaR optimizers that shape the final portfolio.

axioms (1)

Domain assumption: next-day stock returns can be usefully modeled as samples from a conditional diffusion process given asset-specific factors.
This is the modeling premise that justifies training the generative model instead of a direct regression or parametric distribution.

Reference graph

Works this paper leans on

19 extracted references

[1]

Synthetic data for portfolios: A throw of the dice will never abolish chance

Adil Rengim Cetingoz and Charles-Albert Lehalle. Synthetic data for portfolios: A throw of the dice will never abolish chance. arXiv preprint arXiv:2501.03993, 2025

[2]

Diffusion factor models: Generating high-dimensional returns with factor structure

Minshuo Chen, Renyuan Xu, Yumin Xu, and Ruixun Zhang. Diffusion factor models: Generating high-dimensional returns with factor structure. arXiv preprint arXiv:2504.06566, 2025

[3]

A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms

Victor DeMiguel, Lorenzo Garlappi, Francisco J Nogales, and Raman Uppal. A generalized approach to portfolio optimization: Improving performance by constraining portfolio norms. Management Science, 55(5):798-812, 2009

[4]

Diffusion models beat GANs on image synthesis

Prafulla Dhariwal and Alexander Nichol. Diffusion models beat GANs on image synthesis. Advances in Neural Information Processing Systems, 34:8780-8794, 2021

[5]

Generative adversarial nets

Ian J Goodfellow, Jean Pouget-Abadie, Mehdi Mirza, Bing Xu, David Warde-Farley, Sherjil Ozair, Aaron Courville, and Yoshua Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014

[6]

Empirical asset pricing via machine learning

Shihao Gu, Bryan Kelly, and Dacheng Xiu. Empirical asset pricing via machine learning. The Review of Financial Studies, 33(5):2223-2273, 2020

[7]

Denoising diffusion probabilistic models

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33, 2020

[8]

Mean-variance portfolio selection by continuous-time reinforcement learning: Algorithms, regret analysis, and empirical study

Yilie Huang, Yanwei Jia, and Xun Yu Zhou. Mean-variance portfolio selection by continuous-time reinforcement learning: Algorithms, regret analysis, and empirical study. arXiv preprint arXiv:2412.16175, 2024

[9]

Estimation with quadratic loss

William James and Charles Stein. Estimation with quadratic loss. In Breakthroughs in Statistics: Foundations and Basic Theory, pages 443-460. Springer, 1992

[10]

(Re-)Imag(in)ing price trends

Jingwen Jiang, Bryan Kelly, and Dacheng Xiu. (Re-)Imag(in)ing price trends. The Journal of Finance, 78(6):3193-3249, 2023

[11]

Machine learning in the Chinese stock market

Markus Leippold, Qian Wang, and Wenyu Zhou. Machine learning in the Chinese stock market. Journal of Financial Economics, 145(2):64-82, 2022

[12]

Generative time series forecasting with diffusion, denoise, and disentanglement

Yan Li, Xinjiang Lu, Yaqing Wang, and Dejing Dou. Generative time series forecasting with diffusion, denoise, and disentanglement. Advances in Neural Information Processing Systems, 35:23009-23022, 2022

[13]

Portfolio selection

Harry Markowitz. Portfolio selection. Journal of Finance, 7(1):77-91, Mar. 1952

[14]

Scalable diffusion models with transformers

William Peebles and Saining Xie. Scalable diffusion models with transformers. In Proceedings of the IEEE/CVF International Conference on Computer Vision, pages 4195-4205, 2023

[15]

Hierarchical Text-Conditional Image Generation with CLIP Latents

Aditya Ramesh, Prafulla Dhariwal, Alex Nichol, Casey Chu, and Mark Chen. Hierarchical text-conditional image generation with CLIP latents. arXiv preprint arXiv:2204.06125, 2022

[16]

Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting

Kashif Rasul, Calvin Seward, Ingmar Schuster, and Roland Vollgraf. Autoregressive denoising diffusion models for multivariate probabilistic time series forecasting. In International Conference on Machine Learning, pages 8857-8868. PMLR, 2021

[17]

High-resolution image synthesis with latent diffusion models

Robin Rombach, Andreas Blattmann, Dominik Lorenz, Patrick Esser, and Björn Ommer. High-resolution image synthesis with latent diffusion models. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 10684-10695, 2022

[18]

The arbitrage theory of capital asset pricing

Stephen A Ross. The arbitrage theory of capital asset pricing. In Handbook of the Fundamentals of Financial Decision Making: Part I, pages 11-30. World Scientific, 2013

[19]

Generative modeling by estimating gradients of the data distribution

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution. In Advances in Neural Information Processing Systems, volume 32, 2019
