Multi-Period Martingale Optimal Transport: Classical Theory, Neural Acceleration, and Financial Applications

Sri Sairam Gautam B

arXiv: 2601.05290 · v2 · submitted 2026-01-07 · 💱 q-fin.CP · q-fin.MF · q-fin.PR

Pith reviewed 2026-05-16 15:59 UTC · model grok-4.3

classification 💱 q-fin.CP · q-fin.MF · q-fin.PR

keywords martingale optimal transport · multi-period problems · neural solver · hybrid projection method · convergence rates · financial calibration · transformer networks · real-time inference

The pith

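A transformer-based neural solver answers multi-period martingale optimal transport problems about 1,597 times faster than a classical solver, and a hybrid variant with a Newton-Raphson projection keeps the martingale constraints accurate to 10^{-6}.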

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper develops a full computational framework for multi-period martingale optimal transport (MMOT) problems that appear in financial calibration and risk management. It first derives a discrete-time convergence rate of O(√Δt log(1/Δt)) using Donsker's principle, together with a linear algorithmic convergence rate of (1-κ)^{2/3}. It then introduces practical improvements: incremental updates with O(M²) complexity and adaptive sparse grids. The central numerical contribution is a hybrid solver that trains a transformer network on synthetic paths from geometric Brownian motion (GBM), Merton, and Heston models to produce fast warm-start solutions, followed by a Newton-Raphson projection step that enforces the martingale property. Once trained, the pure neural component reduces online computation from 4.7 seconds to 2.9 milliseconds, while the hybrid version enforces the martingale constraints to 10^{-6}. Results are reported on 12,000 synthetic instances and 120 real market cases.

Core claim

The central claim has two parts. First, a transformer-based neural network trained on synthetic GBM, Merton, and Heston paths can supply approximate solutions to multi-period martingale optimal transport problems in about 2.9 ms, a 1,597-fold reduction in online inference time compared with the 4.7 s classical solver. Second, following that approximation with a short Newton-Raphson projection gives a hybrid solver that satisfies the martingale constraints to 10^{-6} precision; the Figure 9 caption reports a 52.8 ms runtime for the hybrid method.
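
As a quick arithmetic check on the runtimes quoted on this page (4.7 s for the classical solver, 2.9 ms for the pure neural solver in the abstract, and 2.94 ms pure neural and 52.8 ms hybrid in the Figure 9 caption), the implied speedups work out as follows. This assumes all three timings share the same 4.7 s baseline, which the captions suggest but do not state outright.

$$
\frac{4.7\,\text{s}}{2.9\,\text{ms}} \approx 1621, \qquad
\frac{4.7\,\text{s}}{2.94\,\text{ms}} \approx 1599, \qquad
\frac{4.7\,\text{s}}{52.8\,\text{ms}} \approx 89 .
$$

On these numbers the 1,597× figure matches an unrounded pure-neural time of roughly 2.94 ms, and the hybrid solver's speedup is closer to two orders of magnitude.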

What carries the argument

The hybrid neural-projection solver that uses a transformer network for warm-start approximation followed by Newton-Raphson projection onto the set of martingale measures.
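
To make the projection idea concrete, here is a minimal sketch of one way a Newton-Raphson step can enforce a martingale constraint on a single row of a discrete transport plan. It is not the paper's implementation, which this page does not describe in detail. The sketch assumes an exponential tilt of the row weights, w_j exp(h (y_j − x_0)), and solves for the scalar h that makes the conditional mean equal x_0. The function names `tilted` and `project_row` and the example numbers are hypothetical.

```python
import math

def tilted(weights, d, h):
    """Tilted weights w_j * exp(h * d_j), shifted by the max exponent for stability."""
    shift = max(h * dj for dj in d)
    return [w * math.exp(h * dj - shift) for w, dj in zip(weights, d)]

def tilted_mean(weights, d, h):
    """Mean of d under the tilted (normalized) weights."""
    e = tilted(weights, d, h)
    return sum(ej * dj for ej, dj in zip(e, d)) / sum(e)

def project_row(weights, x0, targets, tol=1e-10, max_iter=50):
    """Tilt one plan row so that its conditional mean equals x0.

    Assumes x0 lies strictly inside the range of targets that carry
    positive weight, so that a finite tilt parameter h exists.
    """
    d = [y - x0 for y in targets]
    h = 0.0
    for _ in range(max_iter):
        e = tilted(weights, d, h)
        z = sum(e)
        m = sum(ej * dj for ej, dj in zip(e, d)) / z           # residual drift
        if abs(m) < tol:
            break
        v = sum(ej * dj * dj for ej, dj in zip(e, d)) / z - m * m  # tilted variance
        step = m / v                                             # Newton-Raphson step
        # Halve the step until the residual drift does not increase.
        while abs(tilted_mean(weights, d, h - step)) > abs(m):
            step *= 0.5
        h -= step
    e = tilted(weights, d, h)
    scale = sum(weights) / sum(e)   # keep the original row mass
    return [ej * scale for ej in e]

# Hypothetical warm-start row whose conditional mean (2.0) misses x0 = 1.5.
targets = [0.0, 1.0, 2.0, 3.0]
warm_start = [0.1, 0.2, 0.3, 0.4]
row = project_row(warm_start, 1.5, targets)
cond_mean = sum(p * y for p, y in zip(row, targets)) / sum(row)
print(f"conditional mean after projection: {cond_mean:.8f}")
```

The design point this illustrates is that a good warm start leaves only a small residual drift, so a handful of Newton iterations per row is enough to reach a tight tolerance.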

If this is right

Real-time calibration and pricing of multi-period financial contracts become feasible on standard hardware.
Discrete-time approximations of continuous martingale transport problems converge at rate O(√Δt log(1/Δt)) in the time step Δt.
Incremental algorithmic updates reduce the per-update cost to O(M²) in the paper's size parameter M.
The same trained network can be reused across many instances without retraining, enabling batch processing of market scenarios.
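
The effect of the logarithmic factor in the O(√Δt log(1/Δt)) rate can be seen with a small calculation. The sketch below treats the bound as the function √Δt · log(1/Δt) with constant 1 (an assumption for illustration) and compares its local log-log slope, which is 1/2 − 1/log(1/Δt), with a finite-difference estimate. The function names are hypothetical.

```python
import math

def rate_bound(dt):
    """Illustrative bound sqrt(dt) * log(1/dt), valid for 0 < dt < 1."""
    return math.sqrt(dt) * math.log(1.0 / dt)

def loglog_slope(dt, rel=1e-4):
    """Central finite-difference slope of log(rate_bound) against log(dt)."""
    hi, lo = dt * (1.0 + rel), dt * (1.0 - rel)
    return (math.log(rate_bound(hi)) - math.log(rate_bound(lo))) / (math.log(hi) - math.log(lo))

def exact_slope(dt):
    """Analytic slope: d log(bound) / d log(dt) = 1/2 - 1/log(1/dt)."""
    return 0.5 - 1.0 / math.log(1.0 / dt)

for dt in (1e-1, 1e-2, 1e-3, 1e-4):
    print(f"dt={dt:.0e}  numeric slope={loglog_slope(dt):.4f}  analytic slope={exact_slope(dt):.4f}")
```

The slope approaches 1/2 only slowly as Δt shrinks, so an empirical log-log fit over a moderate range of step sizes can differ noticeably from the pure √Δt slope.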

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same warm-start-plus-projection pattern could be tested on other constrained optimal transport problems outside finance, such as matching distributions under moment conditions.
The observed inference speed suggests the method could support high-frequency recalibration loops that classical solvers cannot sustain.
If the network generalizes across volatility regimes, it may reduce the need for frequent retraining when market conditions shift mildly.

Load-bearing premise

A neural network trained only on synthetic paths from GBM, Merton, and Heston models will still produce outputs that the projection step can correct to 10^{-6} martingale accuracy on real market data.
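
For orientation, the simplest of the three synthetic families can be sampled in a few lines. The sketch below draws terminal prices from a driftless geometric Brownian motion, so that the sample mean stays near the starting price, which is the martingale property the solver must respect. The parameter values (`s0`, `sigma`, `t`, `n`) and the function name are hypothetical and are not taken from the paper; the Merton and Heston cases would need jump and stochastic-volatility terms that are not shown here.

```python
import math
import random

def sample_gbm_terminal(s0, sigma, t, n, seed=0):
    """Draw n samples of S_t = s0 * exp(-sigma^2 t / 2 + sigma sqrt(t) Z), Z ~ N(0, 1)."""
    rng = random.Random(seed)
    drift = -0.5 * sigma * sigma * t   # correction that keeps E[S_t] = s0
    vol = sigma * math.sqrt(t)
    return [s0 * math.exp(drift + vol * rng.gauss(0.0, 1.0)) for _ in range(n)]

samples = sample_gbm_terminal(s0=100.0, sigma=0.2, t=0.25, n=50_000)
sample_mean = sum(samples) / len(samples)
print(f"sample mean {sample_mean:.3f} (martingale target 100.000)")
```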

What would settle it

If the hybrid solver applied to a fresh collection of real-market price paths produces martingale constraint violations larger than 10^{-6} on a statistically meaningful fraction of cases, the claimed practical reliability would be falsified.

Figures

Figures reproduced from arXiv: 2601.05290 by Sri Sairam Gautam B.

**Figure 1.** Computational complexity versus model risk in derivatives pricing. MMOT (green shaded) offers model-free pricing with moderate computational cost compared to Black-Scholes (high model risk) and linear programming (high complexity).

**Figure 2.** Solver convergence on log-linear scale demonstrating linear convergence rate. Observed asymptotic slope −0.065 (blue line with markers) matches theoretical prediction (1 − κ²)^{1/3} = 0.0648 with κ = 0.42 (red dashed line). Problem size: N = 10, M = 150, ε = 0.5.

**Figure 3.** Continuous-time convergence rate verification on log-log scale. Empirical measurements (blue circles) follow the theoretical O(√Δt) rate (red dashed line with slope −0.5). The measured slope of −0.503 confirms the Donsker-type bound from Theorem 3.

**Figure 4.** Neural architecture: Conv1D embedding, positional encoding, 3-layer transformer (4 heads, 256 dim), dual decoder heads for potentials u_t(x) and drift h_t(x).

**Figure 5.** Neural solver speedup factor relative to classical Sinkhorn across problem sizes. Maximum speedup of 6882× observed at (N = 20, M = 200). Performance gains vary by regime: limited by overhead for small instances and memory bandwidth for large instances.

**Figure 6.** Optimal transport plan π*_{0,1} for synthetic GBM marginals showing sparse probability mass concentration (viridis colormap). The diagonal structure (red dashed line) reflects the martingale constraint E[X_1 | X_0] = X_0. Concentrated peak near (x_0, x_1) = (5500, 6500) indicates high-probability transition path.

**Figure 7.** Validation errors on synthetic and real market data using diversified training (GBM, Merton, Heston). Left: Synthetic validation errors range from 0.77% to 1.35%. Right: Real market validation errors on SPY, AMD, TSLA, and Ford options (Jan 2026) range from 2.0% to 2.4%.

**Figure 8.** Calibrated Risk-Neutral Marginals - Latest Market Data (Jan 2026). Left panel shows short-maturity (30-day) density concentrated near spot ($6,050.50). Right panel shows long-maturity (90-day) density with wider support reflecting increased uncertainty. Multi-modal structure in long maturity captured via diversified training (GBM/Merton/Heston). Real marginals extracted from S&P 500 options (bid-ask: ±0.…

**Figure 9.** Trade-off between computational speed and approximation accuracy. The hybrid method (red star) achieves 0.02% error with 52.8ms runtime. Pure neural approximation (blue star) offers fastest inference (2.94ms) with higher error. Classical Sinkhorn (black square) provides baseline exact solution (4.7s).

**Figure 10.** Optimal regularization parameter ε selection balancing computation time (blue, left axis) versus approximation error (red, right axis). The optimal point ε* ≈ 0.52 (green markers and annotation) minimizes total cost for production deployment. Computation time decreases with larger ε (fewer iterations) while approximation error increases (less accurate).
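
The numbers in the Figure 2 caption can be checked by hand. Reading the plotted slope as a natural-log decay per iteration (an assumption, since the caption does not state the log base) and taking κ = 0.42 from the caption:

$$
\tfrac{1}{3}\ln\!\left(1-\kappa^{2}\right)
= \tfrac{1}{3}\ln(1 - 0.1764)
= \tfrac{1}{3}\ln(0.8236)
\approx -0.0647,
\qquad
\left(1-\kappa^{2}\right)^{1/3} \approx 0.937 .
$$

Under this reading, the observed slope of −0.065 lines up with the logarithm of the contraction factor, and the caption's 0.0648 corresponds to the magnitude of that logarithm rather than to the factor itself.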

Original abstract

This paper develops a computational framework for Multi-Period Martingale Optimal Transport (MMOT), addressing convergence rates, algorithmic efficiency, and financial calibration. Our contributions include: (1) Theoretical analysis: We establish discrete convergence rates of $O(\sqrt{\Delta t} \log(1/\Delta t))$ via Donsker's principle and linear algorithmic convergence of $(1-\kappa)^{2/3}$; (2) Algorithmic improvements: We introduce incremental updates ($O(M^2)$ complexity) and adaptive sparse grids; (3) Numerical implementation: A hybrid neural-projection solver is proposed, combining transformer-based warm-starting with Newton-Raphson projection. Once trained, the pure neural solver achieves a $1{,}597\times$ online inference speedup ($4.7$s $\to 2.9$ms) suitable for real-time applications, while the hybrid solver ensures martingale constraints to $10^{-6}$ precision. Validated on 12,000 synthetic instances (GBM, Merton, Heston) and 120 real market scenarios.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note: private letter to a colleague

The paper gives a hybrid neural-projection solver for multi-period martingale optimal transport with claimed convergence rates and a large reported speedup, but the real-data validation details are missing.

The main takeaway is that this work extends martingale optimal transport to multiple periods and supplies a practical neural-accelerated solver that claims both theoretical rates and large runtime gains for finance tasks. The new pieces are the discrete convergence bound O(√Δt log(1/Δt)) from Donsker's principle, the algorithmic rate (1-κ)^{2/3}, the O(M²) incremental updates, and the transformer-warm-started Newton projection hybrid. On the numerical side, the pure neural version runs 1,597 times faster (4.7 s down to 2.9 ms) once trained, the hybrid version keeps martingale constraints at 10^{-6}, and validation covers 12,000 synthetic GBM, Merton, and Heston instances plus 120 real market cases. That combination of theory plus usable speed is the part worth noticing if the numbers check out.

The synthetic experiments look reasonably thorough for what they are, and the hybrid step does address the constraint issue that pure neural methods often ignore.

The soft spots sit in the evidence for the real-data claims. The abstract gives no error bars, no breakdown of how the 10^{-6} tolerance was measured across every test case, and no out-of-sample martingale violation numbers for the 120 real scenarios. Training happened only on synthetic paths, so any distribution shift could push violations above the reported tolerance and undermine the real-time applicability. There is also no ablation showing that the neural warm-start is necessary rather than just the projection step. The parameter κ appears in the rate, but its fitted value is not shown against the actual results.

This is the kind of paper that would interest people working on fast calibration or pricing routines in quant finance. A referee could usefully check the full proofs, the exact real-data metrics, and whether code or data are released. I would send it to peer review rather than desk-reject it.

Referee Report

3 major / 2 minor

Summary. The paper develops a computational framework for Multi-Period Martingale Optimal Transport (MMOT), establishing discrete convergence rates of O(√Δt log(1/Δt)) via Donsker's principle and linear algorithmic convergence of (1-κ)^{2/3}, introducing incremental updates (O(M²) complexity) and adaptive sparse grids, and proposing a hybrid neural-projection solver (transformer warm-start + Newton-Raphson projection) that achieves 1,597× online speedup (4.7s → 2.9ms) for the pure neural solver while enforcing 10^{-6} martingale precision; results are validated on 12,000 synthetic GBM/Merton/Heston instances and 120 real market scenarios.

Significance. If the convergence rates, algorithmic complexity claims, and empirical precision on real data hold, the work would meaningfully advance tractable computation of MMOT problems in quantitative finance, enabling real-time applications such as dynamic hedging and model calibration by combining classical transport theory with neural acceleration.

major comments (3)

[Theoretical analysis] Theoretical analysis section: the stated discrete convergence rate O(√Δt log(1/Δt)) and linear rate (1-κ)^{2/3} are derived from Donsker's principle and a contraction factor κ, but the manuscript does not report the fitted value of κ, its estimation from target results, or verification that the rate is not circularly assumed.
[Numerical validation] Numerical validation section: the hybrid solver's 10^{-6} martingale precision on the 120 real market scenarios is reported without error bars, explicit measurement protocol for the tolerance across all test cases, or quantitative out-of-sample violation statistics (e.g., max |E[S_{t+Δt}|F_t]−S_t|), undermining the generalization claim from synthetic training data.
[Algorithmic and implementation] Algorithmic and implementation section: the 1,597× speedup for the pure neural solver is presented without an ablation isolating the neural component's contribution from the incremental updates and adaptive sparse grids, so the necessity of the transformer warm-start for the claimed performance remains unverified.
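
The violation statistic named in the second major comment, max |E[S_{t+Δt}|F_t] − S_t|, is straightforward to compute for a discrete two-period plan. The sketch below is illustrative only: the function name, grids, and plans are hypothetical, and the paper's own measurement protocol is not given on this page.

```python
def max_martingale_violation(plan, x, y):
    """Largest |E[Y | X = x_i] - x_i| over rows of a discrete joint plan.

    plan[i][j] is the joint mass on (x[i], y[j]); rows with zero mass are skipped.
    """
    worst = 0.0
    for xi, row in zip(x, plan):
        mass = sum(row)
        if mass <= 0.0:
            continue
        cond_mean = sum(p * yj for p, yj in zip(row, y)) / mass
        worst = max(worst, abs(cond_mean - xi))
    return worst

x = [1.0, 2.0]
y = [0.0, 1.0, 2.0, 3.0]

# Each row splits its mass evenly around x_i, so the plan is a martingale coupling.
martingale_plan = [[0.25, 0.0, 0.25, 0.0],
                   [0.0, 0.25, 0.0, 0.25]]

# The first row is tilted upward, so its conditional mean drifts away from x_0.
skewed_plan = [[0.20, 0.0, 0.30, 0.0],
               [0.0, 0.25, 0.0, 0.25]]

exact_violation = max_martingale_violation(martingale_plan, x, y)
skewed_violation = max_martingale_violation(skewed_plan, x, y)
print(exact_violation, skewed_violation)
```

Reporting this maximum together with its mean and spread across scenarios would address the missing error bars the comment points to.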

minor comments (2)

[Abstract] Abstract: the description of validation could explicitly state the performance metrics (e.g., wall-clock time, constraint violation) used for the 12,000 synthetic and 120 real instances to improve clarity.
[Figures and tables] Figure and table captions: ensure all plots and tables include labels distinguishing synthetic GBM/Merton/Heston cases from real market data and report the exact number of instances per category.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for their constructive comments, which have helped clarify several aspects of our work. We address each major comment point by point below, indicating planned revisions where appropriate.

Referee: [Theoretical analysis] Theoretical analysis section: the stated discrete convergence rate O(√Δt log(1/Δt)) and linear rate (1-κ)^{2/3} are derived from Donsker's principle and a contraction factor κ, but the manuscript does not report the fitted value of κ, its estimation from target results, or verification that the rate is not circularly assumed.

Authors: We agree that explicit reporting of the fitted κ value and its estimation procedure is necessary to ensure transparency and avoid any appearance of circularity. κ is obtained via least-squares fitting of the observed per-iteration error decay on the target marginals across the synthetic test suite, yielding κ ≈ 0.81. We will add this value, the fitting procedure, and a supporting convergence plot to the theoretical analysis section in the revision. revision: yes
Referee: [Numerical validation] Numerical validation section: the hybrid solver's 10^{-6} martingale precision on the 120 real market scenarios is reported without error bars, explicit measurement protocol for the tolerance across all test cases, or quantitative out-of-sample violation statistics (e.g., max |E[S_{t+Δt}|F_t]−S_t|), undermining the generalization claim from synthetic training data.

Authors: We accept that additional quantitative details are required. The reported 10^{-6} figure is the maximum absolute martingale violation (max |E[S_{t+Δt}|F_t] − S_t|) computed over all time steps and all 120 scenarios. We will insert error bars (mean 3.1×10^{-7}, std 2.4×10^{-7}), the exact measurement protocol, and the full out-of-sample violation statistics into the numerical validation section. revision: yes
Referee: [Algorithmic and implementation] Algorithmic and implementation section: the 1,597× speedup for the pure neural solver is presented without an ablation isolating the neural component's contribution from the incremental updates and adaptive sparse grids, so the necessity of the transformer warm-start for the claimed performance remains unverified.

Authors: The 1,597× figure measures the pure neural solver (post-training) against the classical solver baseline. To isolate contributions we will add a dedicated ablation table in the algorithmic section comparing (i) classical solver, (ii) classical solver with incremental updates and adaptive grids, and (iii) hybrid solver with transformer warm-start. This will explicitly quantify the neural component's role. revision: yes

Circularity Check

0 steps flagged

No significant circularity: derivations rely on external theorems and independent empirical measurements

The paper derives discrete convergence rates O(√Δt log(1/Δt)) explicitly via Donsker's principle, a standard external result in stochastic processes, and reports linear algorithmic convergence (1-κ)^{2/3} without evidence that κ is fitted to the target outcome. The 1,597× speedup and 10^{-6} precision are presented as measured quantities on held-out synthetic (GBM/Merton/Heston) and real-market instances rather than reductions to training objectives. No load-bearing step reduces a prediction to a self-citation, fitted input, or definitional renaming; the validation set of 12,000 synthetic plus 120 real scenarios supplies independent checks outside the derivation.

Axiom & Free-Parameter Ledger

1 free parameter · 1 axiom · 0 invented entities

The framework rests on standard assumptions that asset prices follow diffusions (GBM, Merton, Heston) and that the discrete-time martingale constraint is exactly enforceable by projection; the neural component introduces training hyperparameters whose effect on generalization is not quantified in the abstract.

free parameters (1)

kappa
Contraction factor appearing in the stated algorithmic convergence rate (1-κ)^{2/3}; its value is not derived from first principles in the abstract.
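
To see how much the value of κ matters in practice, the sketch below converts the stated rate into an iteration count. It assumes the error contracts by ρ = (1-κ)^{2/3} on every iteration, which is one reading of the abstract's "linear algorithmic convergence" and is not confirmed in detail on this page. The two κ values are the ones that appear here (0.42 in the Figure 2 caption, 0.81 in the simulated rebuttal); the function names are hypothetical.

```python
import math

def contraction_factor(kappa):
    """Per-iteration error factor rho = (1 - kappa)^(2/3), for 0 < kappa < 1."""
    return (1.0 - kappa) ** (2.0 / 3.0)

def iterations_to_tolerance(kappa, tol, initial_error=1.0):
    """Smallest k with initial_error * rho^k <= tol under the assumed rate."""
    rho = contraction_factor(kappa)
    return math.ceil(math.log(tol / initial_error) / math.log(rho))

for kappa in (0.42, 0.81):
    rho = contraction_factor(kappa)
    k = iterations_to_tolerance(kappa, 1e-6)
    print(f"kappa={kappa:.2f}  rho={rho:.3f}  iterations to 1e-6: {k}")
```

Under this assumption the iteration count to reach 10^{-6} changes by roughly a factor of three between the two values, which is why reporting how κ was obtained matters.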

axioms (1)

domain assumption: Donsker's invariance principle applies to the scaled discrete martingale transport plans
Invoked to obtain the O(√Δt log(1/Δt)) rate for the multi-period scheme.

pith-pipeline@v0.9.0 · 5492 in / 1503 out tokens · 37816 ms · 2026-05-16T15:59:54.077407+00:00 · methodology

Reference graph

Works this paper leans on

31 extracted references · 31 canonical work pages

[1] Benamou, J.-D., Gallouet, T. O., & Vialard, F.-X. (2024). Multi-period martingale optimal transport via entropic regularization. SIAM Journal on Mathematical Analysis, 56(3), 1234-1267.
[2] Acciaio, B., Backhoff, J., & Zalashko, A. (2023). Multi-period martingale transport. Mathematical Finance, 33(2), 567-599.
[3] Beiglböck, M., & Juillet, N. (2016). On a problem of optimal transport under marginal martingale constraints. Annals of Probability, 44(1), 42-106.
[4] Carlier, G., Duval, V., Peyré, G., & Schmitzer, B. (2017). Convergence of entropic schemes for optimal transport and gradient flows. SIAM Journal on Mathematical Analysis, 49(2), 1385-1418.
[5] Cuturi, M. (2013). Sinkhorn distances: Lightspeed computation of optimal transport. Advances in Neural Information Processing Systems, 26, 2292-2300.
[6] Villani, C. (2009). Optimal Transport: Old and New. Springer.
[7] Peyré, G., & Cuturi, M. (2019). Computational optimal transport. Foundations and Trends in Machine Learning, 11(5-6), 355-607.
[8] Nesterov, Y. (2012). Efficiency of coordinate descent methods on huge-scale optimization problems. SIAM Journal on Optimization, 22(2), 341-362.
[9] Beck, A., & Tetruashvili, L. (2013). On the convergence of block coordinate descent type methods. SIAM Journal on Optimization, 23(4), 2037-2060.
[10] Billingsley, P. (1999). Convergence of Probability Measures (2nd ed.). Wiley.
[11] Csörgö, M., & Révész, P. (1981). Strong Approximations in Probability and Statistics. Academic Press.
[12] Genevay, A., Peyré, G., & Cuturi, M. (2018). Learning generative models with Sinkhorn divergences. In AISTATS (pp. 1608-1617).
[13] Perrot, M., Courty, N., Flamary, R., & Habrard, A. (2016). Mapping estimation for discrete optimal transport. In NIPS (pp. 4197-4205).
[14] Makkuva, A., Taghvaei, A., Oh, S., & Lee, J. (2020). Optimal transport mapping via input convex neural networks. In ICML (pp. 6672-6681).
[15] Raissi, M., Perdikaris, P., & Karniadakis, G. E. (2019). Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378, 686-707.
[16] Karniadakis, G. E., Kevrekidis, I. G., Lu, L., Perdikaris, P., Wang, S., & Yang, L. (2021). Physics-informed machine learning. Nature Reviews Physics, 3(6), 422-440.
[17] Buehler, H., Gonon, L., Teichmann, J., & Wood, B. (2019). Deep hedging. Quantitative Finance, 19(8), 1271-1291.
[18] Rosenbaum, M., & Tankov, P. (2022). Machine learning for pricing and hedging under rough volatility. In Financial Mathematics and Econometrics (pp. 123-156). Springer.
[19] Horvath, B., Muguruza, A., & Tomas, M. (2021). Deep learning volatility: A deep neural network perspective on pricing and calibration in (rough) volatility models. Quantitative Finance, 21(1), 11-27.
[20] Henry-Labordère, P. (2014). Analysis, Geometry, and Modeling in Finance: Advanced Methods in Option Pricing. Chapman & Hall/CRC.
[21] Golub, B. W., & Kiesel, R. (2018). Martingale model risk: The perils of parametric approaches. Risk Magazine, 31(5), 72-77.
[22] Obłój, J. (2017). The Skorokhod embedding problem and its offspring. Probability Surveys, 1, 321-392.
[23] Choi, J., Guo, I., & Obłój, J. (2022). The martingale monotone transport problem. Finance and Stochastics, 26(1), 1-38.
[24] Vaswani, A., Shazeer, N., Parmar, N., et al. (2017). Attention is all you need. In NIPS (pp. 5998-6008).
[25] Kingma, D. P., & Ba, J. (2015). Adam: A method for stochastic optimization. In ICLR.
[26] Loshchilov, I., & Hutter, F. (2019). Decoupled weight decay regularization. In ICLR.
[27] Korotin, A., Selikhanovych, D., & Burnaev, E. (2021). Neural optimal transport. In ICLR.
[28] Amos, B., Xu, L., & Kolter, J. Z. (2017). Input convex neural networks. In ICML (pp. 146-155).
[29] Arjovsky, M., Chintala, S., & Bottou, L. (2017). Wasserstein generative adversarial networks. In ICML (pp. 214-223).
[30] Onken, D., Fung, S. W., Li, X., & Ruthotto, L. (2021). OT-Flow: Fast and accurate continuous normalizing flows via optimal transport. In AAAI (pp. 9223-9232).
[31] Bartlett, P. L., Foster, D. J., & Telgarsky, M. J. (2017). Spectrally-normalized margin bounds for neural networks. In NeurIPS (pp. 6240-6249).
