Differential Machine Learning for 0DTE Options with Stochastic Volatility and Jumps
Pith reviewed 2026-05-15 14:55 UTC · model grok-4.3
The pith
A three-stage differential machine learning procedure approximates jump terms more accurately for ultra-short-maturity options while preserving pricing accuracy and delivering faster Greeks than Fourier methods.
A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.
Core claim
Expressing the option price in Black-Scholes form with a maturity-gated variance correction, supervising both prices and Greeks from a single pricing network, and fitting a jump-operator network jointly in a three-stage procedure with a PIDE-residual penalty improves the identifiability and accuracy of the jump term at ultra-short maturities relative to one-stage training. At the same time, the method maintains comparable pricing errors, reduces Greeks errors, yields stable one-day delta hedges, and provides large speedups over Fourier methods.
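One hedged way to write the ansatz and objective this claim describes (our notation; the paper's exact gate, correction network, and loss weights are not reproduced here and may differ):

```latex
V_\theta(S, v, \tau) = \mathrm{BS}\!\bigl(S, K, r, \bar{\sigma}_\theta(S, v, \tau)\bigr),
\qquad
\bar{\sigma}_\theta^{2}(S, v, \tau) = v + g(\tau)\, c_\theta(S, v, \tau),
```

where g is a smooth maturity gate controlling the learned correction c_theta as tau approaches zero, and training minimizes a composite loss of the form

```latex
\mathcal{L}(\theta, \phi) =
\bigl\lVert V_\theta - V^{\mathrm{ref}} \bigr\rVert^{2}
+ \lambda_G \bigl\lVert \nabla V_\theta - \nabla V^{\mathrm{ref}} \bigr\rVert^{2}
+ \lambda_R \bigl\lVert \mathcal{R}[V_\theta, J_\phi] \bigr\rVert^{2},
```

with V^ref the reference prices, J_phi the jump-operator network, and R the PIDE residual.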
What carries the argument
Three-stage joint training of a pricing network and a jump-operator network together with a maturity-gated variance correction inside a Black-Scholes representation and a PIDE-residual penalty term.
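The staging can be sketched on a toy problem. In the sketch below, linear-in-feature models stand in for the two networks and a second-derivative operator stands in for the non-local PIDE term; the staging order and loss weights are our assumptions, not the paper's.

```python
import numpy as np

# Toy schematic of the three-stage procedure. Everything is linear so the
# stage-1 and stage-2 fits are exact least squares; the "PIDE residual" is
# stood in for by a linear ODE residual u'' + J = g, with u the pricing
# model and J the jump-operator model.

s = np.linspace(0.0, 1.0, 64)

def phi(s):    # shared features for both models
    return np.stack([np.ones_like(s), s, s**2, s**3], axis=-1)

def dphi(s):   # d(phi)/ds, used for Greek supervision
    return np.stack([np.zeros_like(s), np.ones_like(s), 2 * s, 3 * s**2], axis=-1)

def d2phi(s):  # d2(phi)/ds2, the stand-in differential operator
    return np.stack([np.zeros_like(s), np.zeros_like(s),
                     2 * np.ones_like(s), 6 * s], axis=-1)

w_true = np.array([0.2, -1.0, 0.5, 0.3])    # "pricing" target weights
v_true = np.array([0.1, 0.4, -0.2, 0.0])    # "jump term" target weights
price, greek = phi(s) @ w_true, dphi(s) @ w_true
g = d2phi(s) @ w_true + phi(s) @ v_true     # dynamics: u'' + J = g

# Stage 1: fit the pricing model on prices AND Greeks (differential supervision).
A = np.vstack([phi(s), dphi(s)])
w, *_ = np.linalg.lstsq(A, np.concatenate([price, greek]), rcond=None)

# Stage 2: freeze w, fit the jump model to the implied jump term.
v, *_ = np.linalg.lstsq(phi(s), g - d2phi(s) @ w, rcond=None)

# Stage 3: joint fine-tuning with the residual penalty added to the loss.
lam, lr = 1.0, 0.02
for _ in range(200):
    r_p = phi(s) @ w - price               # price residual
    r_g = dphi(s) @ w - greek              # Greek residual
    r_q = d2phi(s) @ w + phi(s) @ v - g    # stand-in "PIDE" residual
    gw = (phi(s).T @ r_p + dphi(s).T @ r_g + lam * d2phi(s).T @ r_q) / len(s)
    gv = lam * phi(s).T @ r_q / len(s)
    w, v = w - lr * gw, v - lr * gv

residual = float(np.abs(d2phi(s) @ w + phi(s) @ v - g).max())
```

With exact data the staged fits already land on the true weights, so stage 3 only keeps the residual at numerical zero; on noisy data, the residual penalty is what disambiguates the pricing weights from the jump weights.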
If this is right
- Better jump-term recovery allows more reliable decomposition of price changes into diffusive and jump components at very short horizons.
- Lower Greeks errors translate directly into smaller hedging residual variance for daily rebalancing of 0DTE positions.
- Speedups over Fourier inversion make repeated calibration and real-time risk calculations feasible inside stochastic-volatility jump models.
- Inclusion of jump-intensity price sensitivity in the loss further tightens the calibrated parameter fit.
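The Greeks-to-hedging link in the second point can be checked in a toy setting. The sketch below hedges a one-day ATM call once at inception under plain Black-Scholes (no jumps; a deliberate simplification of the paper's SVJD setting) and compares the P&L variance of the exact delta against a deliberately biased one.

```python
import math
import numpy as np

# Toy check that delta accuracy maps to hedging residual variance: short one
# one-day call, hedge once with `delta` shares, mark at expiry, and compare
# variances across many simulated paths.

def bs_call(S, K, r, sigma, T):
    """Black-Scholes call price and delta."""
    d1 = (math.log(S / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S * N(d1) - K * math.exp(-r * T) * N(d2), N(d1)

S0, K, r, sigma, T = 100.0, 100.0, 0.0, 0.2, 1.0 / 252.0
C0, delta0 = bs_call(S0, K, r, sigma, T)

rng = np.random.default_rng(1)
Z = rng.standard_normal(20000)
S1 = S0 * np.exp((r - 0.5 * sigma**2) * T + sigma * math.sqrt(T) * Z)
payoff = np.maximum(S1 - K, 0.0)

def hedge_pnl(delta):
    # short call hedged with `delta` shares, held over the one-day horizon
    return payoff - C0 - delta * (S1 - S0)

var_exact = float(hedge_pnl(delta0).var())
var_biased = float(hedge_pnl(delta0 + 0.2).var())   # a 0.2 error in delta
```

The biased hedge adds variance roughly proportional to the squared delta error times the variance of the underlying move, which is the mechanism behind the claim.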
Where Pith is reading between the lines
- The same staged-training pattern could be applied to other path-dependent contracts whose pricing equations contain non-local integral terms.
- If the networks generalize beyond the training measure, they could serve as fast surrogates inside Monte-Carlo engines that simulate many short-maturity paths.
- Empirical tests on actual 0DTE market quotes would show whether the reported stability persists when the true jump distribution differs from the training assumption.
Load-bearing premise
The three-stage joint training of pricing and jump-operator networks together with the maturity-gated variance correction reliably separates jump contributions and yields accurate Greeks for ultra-short maturities without overfitting to the chosen training paths or model parameters.
What would settle it
On held-out 0DTE paths with large jump intensity, the learned jump-operator network produces approximation errors comparable to or larger than a one-stage baseline, or the resulting delta hedges become unstable over a one-day horizon.
Original abstract
We present a differential machine learning method for zero-days-to-expiry (0DTE) options under a stochastic-volatility jump-diffusion model. To handle the ultra-short-maturity regime, we express the option price in Black-Scholes form with a maturity-gated variance correction, combining supervision on prices and Greeks with a PIDE-residual penalty. Prices and Greeks are derived from a single trained pricing network, while jump-term identifiability is ensured by a jump-operator network fitted jointly in a three-stage procedure. The method improves jump-term approximation relative to one-stage baselines while maintaining comparable pricing errors. Furthermore, it reduces errors in Greeks, produces stable one-day delta hedges, and offers significant speedups over Fourier-based benchmarks. Calibration experiments demonstrate the network's efficiency as a pricer; notably, incorporating jump-intensity price sensitivity into the learning process further improves the overall model fit.
Editorial analysis
A structured set of objections, weighed in public.
Referee Report
Summary. The paper proposes a differential machine learning method for pricing 0DTE options under a stochastic-volatility jump-diffusion model. It expresses the option price in Black-Scholes form with a maturity-gated variance correction, combines supervision on prices and Greeks with a PIDE-residual penalty, derives prices and Greeks from a single pricing network, and uses a three-stage joint training procedure with a separate jump-operator network to ensure identifiability of jump terms. The central claims are improved jump-term approximation relative to one-stage baselines, reduced errors in Greeks, stable one-day delta hedges, significant speedups over Fourier benchmarks, and better calibration fit when incorporating jump-intensity sensitivities.
Significance. If the quantitative claims hold under independent verification, the work offers a computationally efficient approach to pricing and hedging ultra-short-maturity options with jumps, which is relevant for high-frequency trading and real-time risk management. The three-stage training procedure for separating jump-operator effects and the use of PIDE residuals within a differential ML framework constitute a targeted methodological contribution to computational finance for regimes where standard Fourier methods become expensive.
major comments (3)
- [Abstract and Experiments] The claims of improved jump-term approximation and reduced Greeks errors are stated without accompanying quantitative error tables, ablation studies comparing the three-stage procedure to one-stage baselines, or explicit verification that the PIDE residual enforces the model dynamics rather than being absorbed into the network fit.
- [§3.2] The identifiability of jump terms is asserted to follow from the joint training of the pricing and jump-operator networks together with the maturity-gated variance correction, yet no diagnostic tests (e.g., sensitivity to the training-data distribution or recovery of known jump parameters on synthetic data) are reported to confirm that the procedure does not overfit to the specific model assumptions.
- [Calibration experiments] While speedups over Fourier methods are claimed, the manuscript does not report precise computational timings, the number of calibration iterations, or out-of-sample pricing errors on held-out strikes and maturities that would substantiate the efficiency advantage for practical use.
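The synthetic-recovery diagnostic requested in the second comment can be prototyped cheaply. The sketch below simulates daily increments from a Merton-style jump diffusion with known intensity and recovers the intensity by threshold counting; this scalar estimator is only a stand-in for inspecting the paper's jump-operator network, and it is biased low because jumps smaller than the threshold are missed.

```python
import numpy as np

# Synthetic-recovery diagnostic sketch: simulate log-return increments with a
# known jump intensity lam_true, then estimate the intensity by counting
# increments beyond a diffusive threshold. A recovery test for the actual
# method would compare the learned jump operator against the known integral
# term on the same synthetic paths.

rng = np.random.default_rng(2)
n_days, dt = 252 * 40, 1.0 / 252.0
sigma, lam_true = 0.15, 20.0          # diffusive vol, jumps per year
jump_mu, jump_sd = 0.0, 0.15          # jump-size distribution

diffusive = sigma * np.sqrt(dt) * rng.standard_normal(n_days)
n_jumps = rng.poisson(lam_true * dt, n_days)
jump_part = jump_mu * n_jumps + jump_sd * np.sqrt(n_jumps) * rng.standard_normal(n_days)
increments = diffusive + jump_part

# Count increments beyond 3 diffusive standard deviations as jump days.
thresh = 3.0 * sigma * np.sqrt(dt)
lam_hat = float((np.abs(increments) > thresh).mean() / dt)
```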
minor comments (2)
- [§2] Notation for the maturity-gated variance correction and the jump-operator network output should be introduced with explicit equations rather than descriptive prose to improve reproducibility.
- [Abstract] The abstract states that incorporating jump-intensity price sensitivity improves model fit, but the corresponding quantitative improvement (e.g., reduction in calibration RMSE) is not shown in any table or figure.
Simulated Author's Rebuttal
We thank the referee for the constructive and detailed feedback. We address each major comment below and will revise the manuscript accordingly to strengthen the quantitative support and validation of our claims.
Point-by-point responses
Referee: [Abstract and Experiments] The claims of improved jump-term approximation and reduced Greeks errors are stated without accompanying quantitative error tables, ablation studies comparing the three-stage procedure to one-stage baselines, or explicit verification that the PIDE residual enforces the model dynamics rather than being absorbed into the network fit.
Authors: We agree that the claims would be more robust with explicit quantitative backing. In the revised manuscript we will add error tables quantifying jump-term and Greeks improvements, ablation studies isolating the three-stage procedure versus one-stage baselines, and verification that PIDE residuals remain small and consistent with the model dynamics (e.g., via residual norm statistics on held-out paths). revision: yes
Referee: [§3.2] The identifiability of jump terms is asserted to follow from the joint training of the pricing and jump-operator networks together with the maturity-gated variance correction, yet no diagnostic tests (e.g., sensitivity to the training-data distribution or recovery of known jump parameters on synthetic data) are reported to confirm that the procedure does not overfit to the specific model assumptions.
Authors: We acknowledge that diagnostic evidence would strengthen the identifiability argument. We will add synthetic-data recovery experiments (recovering known jump parameters) and sensitivity tests to training-data distributions in the revised §3.2 to demonstrate that the three-stage procedure does not overfit to the assumed model. revision: yes
Referee: [Calibration experiments] While speedups over Fourier methods are claimed, the manuscript does not report precise computational timings, the number of calibration iterations, or out-of-sample pricing errors on held-out strikes and maturities that would substantiate the efficiency advantage for practical use.
Authors: We will include the requested details in the revised calibration section: wall-clock timings for the network versus Fourier benchmarks, the exact number of calibration iterations, and out-of-sample pricing errors on held-out strikes and maturities to substantiate the practical efficiency advantage. revision: yes
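To make the requested timing comparison concrete, the Fourier benchmark can be pinned down as a Gil-Pelaez quadrature. The sketch below specializes it to Black-Scholes so the result can be verified against the closed form; a Bates/SVJD characteristic function would slot into `cf` unchanged, and the per-price quadrature is the recurring cost a trained pricing network amortizes away.

```python
import math
import numpy as np

# Minimal Gil-Pelaez Fourier call pricer, checked against closed-form
# Black-Scholes. Wall-clock comparisons of the kind the referee asks for would
# time fourier_call (per price) against a batched network forward pass.

S0, K, r, sigma, T = 100.0, 100.0, 0.05, 0.2, 0.1

def cf(u):
    """Characteristic function of ln S_T under Black-Scholes."""
    mean = math.log(S0) + (r - 0.5 * sigma**2) * T
    return np.exp(1j * u * mean - 0.5 * sigma**2 * T * u**2)

def trap(y, u):
    """Trapezoid rule on a uniform grid."""
    return (u[1] - u[0]) * (y.sum() - 0.5 * (y[0] + y[-1]))

def fourier_call():
    u = np.linspace(1e-6, 200.0, 4001)
    k = math.log(K)
    # Pi2 = Q(S_T > K); Pi1 uses the share-measure twist cf(u - 1j) / cf(-1j).
    pi2 = 0.5 + trap(np.real(np.exp(-1j * u * k) * cf(u) / (1j * u)), u) / math.pi
    pi1 = 0.5 + trap(np.real(np.exp(-1j * u * k) * cf(u - 1j)
                             / (1j * u * cf(-1j))), u) / math.pi
    return S0 * pi1 - K * math.exp(-r * T) * pi2

def bs_call():
    d1 = (math.log(S0 / K) + (r + 0.5 * sigma**2) * T) / (sigma * math.sqrt(T))
    d2 = d1 - sigma * math.sqrt(T)
    N = lambda x: 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))
    return S0 * N(d1) - K * math.exp(-r * T) * N(d2)

price_fourier, price_closed = fourier_call(), bs_call()
```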
Circularity Check
No significant circularity in derivation chain
Full rationale
The paper defines a three-stage training procedure for a pricing network (expressing price via Black-Scholes form plus maturity-gated variance correction) and a separate jump-operator network, with loss terms that include direct supervision on prices/Greeks plus a PIDE residual penalty. These residuals derive from the known SVJ dynamics rather than from the network outputs themselves, and performance is measured against independent Fourier benchmarks on generated data. No step reduces a claimed prediction to a fitted input by construction, no load-bearing self-citation appears, and the ansatz is explicitly introduced as part of the method rather than imported. The central claims (improved jump-term recovery and Greeks relative to one-stage baselines) therefore rest on external numerical validation rather than tautological reduction.
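For concreteness, a representative PIDE for a Bates-type SVJ model, whose residual could serve as the penalty described above (our transcription of the standard form; the paper's exact dynamics may differ):

```latex
\frac{\partial V}{\partial t}
+ (r - \lambda \kappa) S \frac{\partial V}{\partial S}
+ \kappa_v (\theta - v) \frac{\partial V}{\partial v}
+ \frac{1}{2} v S^2 \frac{\partial^2 V}{\partial S^2}
+ \rho \sigma_v v S \frac{\partial^2 V}{\partial S\,\partial v}
+ \frac{1}{2} \sigma_v^2 v \frac{\partial^2 V}{\partial v^2}
- r V
+ \lambda \int_0^\infty \bigl[ V(S y, v, t) - V(S, v, t) \bigr] f(y)\, dy = 0,
\qquad \kappa = \mathbb{E}[y - 1],
```

with f the jump-size density. The non-local integral is the term the jump-operator network approximates, and the left-hand side evaluated on the network outputs is the residual being penalized.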
Axiom & Free-Parameter Ledger
free parameters (1)
- neural network weights and hyperparameters
axioms (2)
- domain assumption: Black-Scholes form with a maturity-gated variance correction accurately represents the 0DTE price under the SVJD model
- ad hoc to paper: three-stage training ensures identifiability of jump terms