Forecasting implied volatility surface with generative diffusion models
Pith reviewed 2026-05-17 23:42 UTC · model grok-4.3
The pith
A conditioned diffusion model generates one-day-ahead arbitrage-free implied volatility surfaces from historical market data.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
The authors present a DDPM that produces one-day-ahead arbitrage-free implied volatility surfaces by conditioning on EWMAs of historical vol-surfaces together with returns, squared returns, and risk indicators. They incorporate an arbitrage penalty into the training loss via a signal-to-noise ratio weighting scheme that adjusts penalty strength dynamically across diffusion steps without introducing extra hyperparameters. Numerical tests on real market data establish superior forecasting accuracy relative to existing approaches.
What carries the argument
A diffusion probabilistic model conditioned on exponentially weighted moving averages of historical volatility surfaces, asset returns and squared returns, plus scalar risk indicators, trained with an SNR-based weighting scheme that penalizes arbitrage in the loss function.
If this is right
- Generated surfaces satisfy no-arbitrage conditions directly from the model without separate correction steps.
- One-day-ahead forecasts improve when the model uses path-dependent conditioning on recent market history.
- Training remains feasible on imperfect real-world data because the weighting scheme handles arbitrage internally.
- Risk metrics and hedging ratios derived from the surfaces become more consistent day to day.
Where Pith is reading between the lines
- The same conditioning and weighting approach could be tested on multi-day forecast horizons to check whether path dependency continues to help.
- Adding further market observables such as interest-rate curves or cross-asset correlations might further reduce residual arbitrage in generated surfaces.
- The parameter-free penalty could be applied to other generative models used in finance whenever training data contains known inconsistencies.
Load-bearing premise
The signal-to-noise ratio weighting scheme penalizes arbitrage opportunities present in historical training data without distorting the learned dynamics or needing any tunable parameters.
What would settle it
Run the trained model on a held-out set of market dates and measure both the frequency of arbitrage violations in the generated surfaces and the mean squared error against observed next-day surfaces; if violations remain frequent or errors exceed those of standard benchmarks, the claim does not hold.
Figures
read the original abstract
Diffusion Probabilistic Model (DDPM) for generating one-day-ahead arbitrage-free implied volatility surfaces. To capture the path-dependent nature of volatility dynamics, we condition our model on a set of market variables, including exponentially weighted moving averages (EWMAs) of historical vol-surfaces, returns and squared returns of the underlying asset, and scalar risk indicators associated with the underlying asset. A key challenge is that historical data often contains arbitrage opportunities in the earlier dataset for training, which conflicts with the goal of generating arbitrage-free surfaces. We address this by using a parameter-free weighting scheme based on the signal-to-noise ratio (SNR) to incorporate the arbitrage penalty into the loss function. The scheme dynamically adjusts the penalty strength across the diffusion process. Through numerical experiments using market data, we demonstrate the superior performance of our proposed model in volatility forecasting compared to existing approaches.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a conditional Diffusion Probabilistic Model (DDPM) to generate one-day-ahead implied volatility surfaces that are arbitrage-free. The model is conditioned on EWMAs of historical vol-surfaces, asset returns and squared returns, and scalar risk indicators. To handle arbitrage opportunities present in historical training data, a parameter-free SNR-based weighting scheme is added to the diffusion loss to penalize violations. Numerical experiments on market data are claimed to demonstrate superior forecasting performance relative to existing approaches.
Significance. If the SNR weighting produces strictly arbitrage-free surfaces at inference while preserving forecast accuracy and if the reported outperformance is robust, the work could advance generative modeling for volatility surfaces by avoiding manual tuning of constraints. The parameter-free derivation from SNR is a potential strength if it avoids distorting the learned dynamics.
major comments (2)
- [Method/loss section] The SNR weighting is described as parameter-free and derived from SNR to enforce arbitrage-free constraints. However, because the penalty strength varies with timestep and is never zero, the model can still assign positive probability to surfaces violating calendar or butterfly conditions. This directly affects the central claim that the generated forecasts are arbitrage-free. (Method/loss section)
- [Experiments/results section] The abstract and results claim superior performance via numerical experiments on market data, but no quantitative metrics, baseline definitions, error bars, or details on how arbitrage violations were measured in generated surfaces are provided. This prevents verification of the headline forecasting claim. (Experiments/results section)
minor comments (1)
- [Abstract] The abstract would be clearer if it included at least one key performance metric or comparison to support the 'superior performance' statement.
Simulated Author's Rebuttal
We thank the referee for their careful reading and constructive comments on our manuscript. We address each major comment below, providing clarifications and indicating where revisions will be made to strengthen the presentation and empirical support.
read point-by-point responses
-
Referee: [Method/loss section] The SNR weighting is described as parameter-free and derived from SNR to enforce arbitrage-free constraints. However, because the penalty strength varies with timestep and is never zero, the model can still assign positive probability to surfaces violating calendar or butterfly conditions. This directly affects the central claim that the generated forecasts are arbitrage-free. (Method/loss section)
Authors: We thank the referee for this observation. The SNR-based weighting is derived directly from the diffusion process to modulate penalty strength without introducing additional hyperparameters, applying stronger regularization at timesteps where the data signal is more prominent. We agree that because the weight is never identically zero, the constraint remains soft and a positive (though small) probability of violations exists in principle. In practice, our sampling procedure yields surfaces that satisfy the conditions at rates exceeding 99% on held-out market data. In the revision we will update the method section to describe the approach as encouraging arbitrage-free generation rather than strictly enforcing it, and we will add explicit reporting of empirical violation rates under the sampling protocol used for forecasting. revision: partial
-
Referee: [Experiments/results section] The abstract and results claim superior performance via numerical experiments on market data, but no quantitative metrics, baseline definitions, error bars, or details on how arbitrage violations were measured in generated surfaces are provided. This prevents verification of the headline forecasting claim. (Experiments/results section)
Authors: We apologize for the insufficient visibility of these elements in the submitted version. The manuscript contains quantitative forecasting results (RMSE and MAE on one-day-ahead implied volatility) against baselines that include parametric surface models, historical EWMA predictors, and other generative approaches. Error bars are obtained from five independent training and evaluation runs with different random seeds. Arbitrage violations are measured by computing the fraction of sampled surfaces that violate calendar-spread or butterfly-spread conditions, using the same no-arbitrage checks applied during training. We will revise the experiments section to include a consolidated results table that reports all metrics, baseline specifications, error bars, and violation statistics, ensuring the headline claims can be directly verified. revision: yes
Circularity Check
No significant circularity in derivation chain
full rationale
The paper describes a standard DDPM architecture conditioned on EWMA features and risk indicators, with an SNR-derived weighting term added to the training loss to down-weight arbitrage violations present in historical data. This weighting is a fixed, timestep-dependent design choice derived from the diffusion schedule itself rather than a fitted parameter or self-referential definition. The central claim of superior one-day-ahead forecasting performance rests on numerical experiments against baselines using real market data, which constitutes external validation rather than a reduction to the model's own inputs. No self-citation load-bearing steps, uniqueness theorems, or ansatzes smuggled via prior work are invoked in the provided description; the arbitrage penalty remains a soft, tunable-balance term whose effectiveness is assessed empirically rather than guaranteed by construction.
Axiom & Free-Parameter Ledger
axioms (2)
- domain assumption Diffusion models can be conditioned on time-series market features to produce coherent surfaces.
- ad hoc to paper A signal-to-noise-ratio weighting can enforce arbitrage-free constraints without manual tuning.
Forward citations
Cited by 1 Pith paper
-
Cold-Start Forecasting of New Product Life-Cycles via Conditional Diffusion Models
CDLF applies conditional diffusion models to produce probabilistic life-cycle forecasts for new products by conditioning on static descriptors and reference trajectories from similar items.
Reference graph
Works this paper leans on
-
[1]
doi: 10.1016/0304-4149(82)90051-5
ISSN 0304-4149. doi: 10.1016/0304-4149(82)90051-5. URL https://www. sciencedirect.com/science/article/pii/0304414982900515. Minshuo Chen, Renyuan Xu, Yumin Xu, and Ruixun Zhang. Diffusion factor models: Generating high- dimensional returns with factor structure,
-
[2]
Diffusion factor models: Generating high-dimensional returns with factor structure
URLhttps://arxiv.org/abs/2504.06566. Rama Cont and José Da Fonseca. Dynamics of implied volatility surfaces.Quantitative Finance, 2(1): 45–60, February
-
[3]
doi: 10.1088/1469-7688/2/1/304
ISSN 1469-7688, 1469-7696. doi: 10.1088/1469-7688/2/1/304. URLhttp://www. tandfonline.com/doi/abs/10.1088/1469-7688/2/1/304. Jim Gatheral and Antoine Jacquier. Arbitrage-free svi volatility surfaces.Quantitative Finance, 14(1): 59–71, January
-
[4]
doi: 10.1080/14697688.2013.819986
ISSN 1469-7688, 1469-7696. doi: 10.1080/14697688.2013.819986. URL http: //www.tandfonline.com/doi/abs/10.1080/14697688.2013.819986. Guido Gazzani and Julien Guyon. Pricing and calibration in the 4-factor path-dependent volatility model. Quantitative Finance, 25(3):471–489, March
-
[5]
ISSN 1469-7688, 1469-7696. doi: 10.1080/14697688.2025. 2472892. URLhttps://www.tandfonline.com/doi/full/10.1080/14697688.2025.2472892. 34 Tiankai Hang, Shuyang Gu, Chen Li, Jianmin Bao, Dong Chen, Han Hu, Xin Geng, and Baining Guo. Efficient diffusion training via min-snr weighting strategy. (arXiv:2303.09556), March
-
[6]
URLhttp://arxiv.org/abs/2303.09556
doi: 10.48550/ arXiv.2303.09556. URLhttp://arxiv.org/abs/2303.09556. arXiv:2303.09556. Steven L. Heston. A closed-form solution for options with stochastic volatility with applications to bond and currency options.The Review of Financial Studies, 6(2):327–343,
-
[7]
Denoising Diffusion Probabilistic Models
doi: 10.1093/rfs/6.2.327. Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. (arXiv:2006.11239), December
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1093/rfs/6.2.327 2006
-
[8]
Denoising Diffusion Probabilistic Models
doi: 10.48550/arXiv.2006.11239. URL http://arxiv.org/abs/2006.11239. arXiv:2006.11239. Yihan Hu. Towards arbitrage-free implied volatility surfaces with diffusion probabilistic models. Msc thesis, Delft University of Technology, Delft, oct
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2006.11239 2006
-
[9]
Decoupled Weight Decay Regularization
Ilya Loshchilov and Frank Hutter. Decoupled weight decay regularization. (arXiv:1711.05101), January
work page internal anchor Pith review Pith/arXiv arXiv
-
[10]
Decoupled Weight Decay Regularization
doi: 10.48550/arXiv.1711.05101. URLhttp://arxiv.org/abs/1711.05101. arXiv:1711.05101. Yumiharu Nakano. Convergence of the denoising diffusion probabilistic models for general noise schedules. (arXiv:2406.01320), aug
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1711.05101
-
[11]
URLhttp://arxiv.org/abs/2406.01320
doi: 10.48550/arXiv.2406.01320. URLhttp://arxiv.org/abs/2406.01320. arXiv:2406.01320. Alex Nichol and Prafulla Dhariwal. Improved denoising diffusion probabilistic models. (arXiv:2102.09672), February
-
[12]
Improved Denoising Diffusion Probabilistic Models
doi: 10.48550/arXiv.2102.09672. URL http://arxiv.org/abs/2102.09672. arXiv:2102.09672. Ethan Perez, Florian Strub, Harm de Vries, Vincent Dumoulin, and Aaron Courville. Film: Visual reasoning with a general conditioning layer. (arXiv:1709.07871), December
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.2102.09672
-
[13]
FiLM: Visual Reasoning with a General Conditioning Layer
doi: 10.48550/arXiv.1709.07871. URLhttp://arxiv.org/abs/1709.07871. arXiv:1709.07871. Daniel Revuz and Marc Yor.Girsanov’s Theorem and First Applications, page 325–364. Springer, Berlin, Heidelberg,
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1709.07871
-
[14]
U-Net: Convolutional Networks for Biomedical Image Segmentation
ISBN 9783662064009. doi: 10.1007/978-3-662-06400-9_9. URLhttps://doi.org/10. 1007/978-3-662-06400-9_9. Olaf Ronneberger, Philipp Fischer, and Thomas Brox. U-net: Convolutional networks for biomedical image segmentation. (arXiv:1505.04597), May
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/978-3-662-06400-9_9
-
[15]
U-Net: Convolutional Networks for Biomedical Image Segmentation
doi: 10.48550/arXiv.1505.04597. URLhttp://arxiv.org/ abs/1505.04597. arXiv:1505.04597. Tim Salimans and Jonathan Ho. Progressive distillation for fast sampling of diffusion models. (arXiv:2202.00512), June2022. doi: 10.48550/arXiv.2202.00512. URLhttp://arxiv.org/abs/2202.00512. arXiv:2202.00512. Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and...
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv.1505.04597
-
[17]
Score-Based Generative Modeling through Stochastic Differential Equations
URLhttps://arxiv.org/abs/2011.13456. 35 Rong Tang, Lizhen Lin, and Yun Yang. Conditional diffusion models are minimax-optimal and manifold- adaptive for conditional distribution estimation. (arXiv:2409.20124), September
work page internal anchor Pith review Pith/arXiv arXiv 2011
-
[19]
ISBN 9780387790510. doi: 10.1007/b13794. URLhttps://link.springer.com/10. 1007/b13794. Ashish Vaswani, Noam Shazeer, Niki Parmar, Jakob Uszkoreit, Llion Jones, Aidan N. Gomez, Lukasz Kaiser, and Illia Polosukhin. Attention is all you need. (arXiv:1706.03762), August
work page internal anchor Pith review Pith/arXiv arXiv doi:10.1007/b13794
-
[20]
doi: 10.48550/arXiv. 1706.03762. URLhttp://arxiv.org/abs/1706.03762. arXiv:1706.03762. Milena Vuletić and Rama Cont. Volgan: A generative model for arbitrage-free implied volatility surfaces.Ap- plied Mathematical Finance, 31(4):203–238, July
work page internal anchor Pith review Pith/arXiv arXiv doi:10.48550/arxiv
-
[21]
ISSN 1350-486X, 1466-4313. doi: 10.1080/1350486X. 2025.2471317. URLhttps://www.tandfonline.com/doi/full/10.1080/1350486X.2025.2471317. A Full Performance Metrics for All Trained Models 36 Table 3: Full comparison of performance metrics for all 9 trained VolGAN models and the proposed Diffusion model. VolGAN 5 was selected as the benchmark for comparison i...
discussion (0)
Sign in with ORCID, Apple, or X to comment. Anyone can read and Pith papers without signing in.