On the Regularity and Generalization of One-Step Wasserstein-guided Generative Models for PDE-Induced Measures

Jack Xin; Likun Lin; Zhiwen Zhang; Zhongjian Wang

arXiv: 2605.21388 · v1 · pith:KKU6MJMHnew · submitted 2026-05-20 · 💻 cs.LG · cs.AI · cs.NA · math.NA · stat.ML

Pith reviewed 2026-05-21 05:20 UTC · model grok-4.3

classification 💻 cs.LG · cs.AI · cs.NA · math.NA · stat.ML

keywords generative models · optimal transport · PDE-induced measures · doubling conditions · Hölder continuity · Wasserstein distance · generalization bounds · one-step models

The pith

Optimal transport maps from uniform sources to PDE-induced measures are Hölder continuous under standard assumptions.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper establishes that normalized target densities from linear elliptic and parabolic PDEs on bounded domains, plus diffusion and Fokker-Planck equations on the torus, yield measures that satisfy doubling conditions when standard structural assumptions hold. Pairing this property with existing regularity results for optimal transport between doubling measures shows that the map transporting a uniform source distribution to the target is Hölder continuous. This continuity supplies an approximation-theoretic reason why one-step generative models can learn such PDE-induced distributions through a single learned pushforward. A reader would care because the result moves beyond pessimistic statistical bounds toward concrete justification for simple transport-based generators in scientific computing settings.

Core claim

Under standard structural assumptions, the target measures associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker-Planck equations on the torus, satisfy doubling conditions. Combining this fact with regularity theory for optimal transport between doubling measures shows that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a concrete case, excess-risk bounds are derived for DeepParticle that measure the gap to the population OT

What carries the argument

Doubling conditions on the PDE-induced target measures, which trigger Hölder regularity of the optimal transport map from a uniform source via established optimal transport theory.
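In symbols, the two properties can be stated as follows. This is one common formulation, offered as a reading aid; the notation ($\mu$, $\nu$, $C_D$, $T$, $\alpha$, $C_H$) is assumed here, and the paper's exact definitions (for example, doubling with respect to convex sets rather than balls) may differ.

```latex
% Doubling: a Borel measure \mu on \Omega \subset \mathbb{R}^d is doubling
% if there is a constant C_D \ge 1 such that
\mu\bigl(B(x, 2r)\bigr) \;\le\; C_D \, \mu\bigl(B(x, r)\bigr)
\qquad \text{for all } x \in \operatorname{supp}\mu,\ r > 0 .

% Transport: T pushes the uniform source \nu forward to the target \mu,
T_{\#}\nu = \mu .

% Hölder continuity: there are \alpha \in (0, 1] and C_H > 0 such that
|T(x) - T(y)| \;\le\; C_H \, |x - y|^{\alpha}
\qquad \text{for all } x, y \in \operatorname{supp}\nu .
```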

If this is right

Excess-risk bounds quantify how well a learned one-step map approximates the true population OT map for models such as DeepParticle.
Robustness estimates quantify stability of the learned map under shifts in the target PDE-induced measure.
One-step pushforward models receive approximation-theoretic support for representing distributions arising from elliptic, parabolic, and Fokker-Planck equations.
Generalization rates follow from the Hölder exponent once the map is approximated by a neural network or similar function class.
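To make the gap between a learned pushforward and the target concrete, the sketch below measures the empirical Wasserstein-2 distance between two equal-size samples in one dimension, where the optimal coupling simply matches sorted values. It is a generic illustration, not DeepParticle's loss (the source does not state it). The names `empirical_w2_1d` and `pushforward_error` are hypothetical, as are the toy maps in the demo.

```python
import numpy as np

def empirical_w2_1d(x, y):
    """Wasserstein-2 distance between two equal-size 1D samples.

    In one dimension the optimal coupling pairs the sorted samples.
    """
    x = np.sort(np.asarray(x, dtype=float))
    y = np.sort(np.asarray(y, dtype=float))
    if x.shape != y.shape:
        raise ValueError("samples must have the same size")
    return float(np.sqrt(np.mean((x - y) ** 2)))

def pushforward_error(candidate_map, source_samples, target_samples):
    """W2 gap between the pushed-forward source samples and target samples."""
    return empirical_w2_1d(candidate_map(source_samples), target_samples)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    source = rng.uniform(size=4096)          # uniform source samples
    target = rng.uniform(size=4096) ** 2     # toy target: law of U**2 (assumption)
    # Hypothetical candidates: the exact map u -> u**2 and a crude identity map.
    print("exact map   :", pushforward_error(lambda u: u ** 2, source, target))
    print("identity map:", pushforward_error(lambda u: u, source, target))
```

The exact map should give a much smaller value than the identity map; the remaining gap for the exact map is sampling error only.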

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same doubling-plus-regularity route may apply to other linear or semilinear PDEs once their densities are checked for the required structural properties.
Architectural choices in generative networks could be guided by the expected Hölder exponent to reduce approximation error.
The framework suggests that transport-based generators may scale better than iterative sampling methods when the target measure is known to be doubling.
Numerical PDE solvers could be recast as distribution-learning tasks whose accuracy is controlled by the same OT regularity.

Load-bearing premise

The normalized target densities from the listed PDEs meet the structural conditions that produce the doubling property for the associated measures.

What would settle it

Numerical approximation of the optimal transport map for the heat equation on a unit square, followed by direct verification of whether the map satisfies a Hölder bound with positive exponent strictly less than one.
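A one-dimensional stand-in for this check can be sketched as follows. It is a simplification, not the unit-square experiment described above: the target is assumed to be the normalized first Dirichlet eigenfunction on [0, 1], with density (π/2)·sin(πx), which is the large-time shape of a Dirichlet heat solution. In one dimension the optimal map from the uniform source is the inverse CDF, here T(u) = arccos(1 − 2u)/π, which behaves like (2/π)·√u near u = 0 and is therefore Hölder with exponent 1/2 but not Lipschitz. The function names are hypothetical.

```python
import numpy as np

def ot_map(u):
    """Monotone (optimal) map from Uniform[0, 1] to the density (pi/2)*sin(pi*x)."""
    u = np.clip(np.asarray(u, dtype=float), 0.0, 1.0)
    return np.arccos(1.0 - 2.0 * u) / np.pi

def holder_ratio(u, t, alpha):
    """Largest |t_i - t_j| / |u_i - u_j|**alpha over all distinct grid pairs."""
    du = np.abs(u[:, None] - u[None, :])
    dt = np.abs(t[:, None] - t[None, :])
    distinct = du > 0
    return float(np.max(dt[distinct] / du[distinct] ** alpha))

if __name__ == "__main__":
    # Refine the grid: the alpha = 1/2 ratio should stay bounded,
    # while the alpha = 1 (Lipschitz) ratio should keep growing.
    for n in (101, 401, 1601):
        u = np.linspace(0.0, 1.0, n)
        t = ot_map(u)
        print(n, holder_ratio(u, t, 0.5), holder_ratio(u, t, 1.0))
```

The same grid-refinement test carries over to a numerically computed map in higher dimensions, at the cost of a pairwise comparison over sample points.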

Figures

Figures reproduced from arXiv: 2605.21388 by Jack Xin, Likun Lin, Zhiwen Zhang, Zhongjian Wang.

**Figure 1.** Validation Wasserstein-2 error versus sample size for the one- and two-dimensional model problems. In each panel, the blue curve shows the observed mean validation error and the red line is the least-squares fit in log-log coordinates. One-dimensional experiment: the source and target measures are those of Example 5.1, with the exact optimal transport map given there. The network architecture is [1, 256, 256, …
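The log-log fit described in the caption can be reproduced along these lines. This is a minimal sketch: the function name `fit_loglog_rate` is hypothetical, and the data in the demo are synthetic placeholders with a known rate, not the paper's measurements.

```python
import numpy as np

def fit_loglog_rate(sample_sizes, errors):
    """Least-squares fit of log(error) = slope * log(n) + log(prefactor).

    Returns (slope, prefactor), so that error is roughly prefactor * n**slope.
    """
    log_n = np.log(np.asarray(sample_sizes, dtype=float))
    log_e = np.log(np.asarray(errors, dtype=float))
    slope, intercept = np.polyfit(log_n, log_e, deg=1)
    return float(slope), float(np.exp(intercept))

if __name__ == "__main__":
    # Synthetic placeholder data (assumption): error = 2 * n**(-1/2).
    n = np.array([250.0, 500.0, 1000.0, 2000.0, 4000.0])
    err = 2.0 * n ** -0.5
    slope, prefactor = fit_loglog_rate(n, err)
    print(f"fitted rate: {slope:.3f}, prefactor: {prefactor:.3f}")
```

The fitted slope is the empirical convergence rate that would be compared against the rate predicted by the excess-risk bound.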

Original abstract

Despite the remarkable empirical success of generative models, the available theory on their statistical accuracy in scientific computing remains largely pessimistic. This paper develops a theoretical framework for understanding the regularity of transport maps and the generalization properties of one-step Wasserstein-guided generative models for PDE-induced probability measures. We consider normalized target densities associated with linear elliptic and parabolic equations on bounded domains, as well as diffusion and Fokker-Planck equations on the torus. Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map from a uniform source measure to the target measure is Hölder continuous. This regularity yields an approximation-theoretic justification for one-step generative models that learn PDE-induced distributions via a single pushforward map. As a representative instance, we study DeepParticle and derive excess-risk bounds characterizing the discrepancy between the learned map and the population-optimal map. We also establish a robustness estimate under target shift and illustrate the theory with experiments which support the derived rates.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper shows Hölder continuity of OT maps for PDE-induced measures via doubling conditions and gives excess-risk bounds for one-step models like DeepParticle, but the key assumptions stay vague.

The main takeaway is that the authors prove these target measures from linear elliptic and parabolic equations on domains, plus diffusion and Fokker-Planck on the torus, satisfy doubling conditions under standard structural assumptions. They then apply existing optimal transport regularity results to conclude the map from uniform source to target is Hölder continuous, which supplies an approximation argument for one-step generative models and yields excess-risk bounds plus a robustness estimate for DeepParticle.

Referee Report

1 major / 1 minor

Summary. The paper develops a theoretical framework for the regularity of transport maps and generalization properties of one-step Wasserstein-guided generative models targeting PDE-induced probability measures. It considers normalized target densities from linear elliptic and parabolic equations on bounded domains as well as diffusion and Fokker-Planck equations on the torus. Under standard structural assumptions, the target measures are shown to satisfy doubling conditions; combined with existing OT regularity theory for doubling measures, this yields Hölder continuity of the optimal transport map from a uniform source to the target. The regularity is used to justify one-step generative models, with excess-risk bounds derived for the representative case of DeepParticle, plus a robustness estimate under target shift and supporting experiments.

Significance. If the central claims on doubling conditions and the resulting Hölder regularity hold with explicit verification, the work would provide a valuable approximation-theoretic justification for one-step generative models in scientific computing applications involving PDE-induced distributions. It connects PDE structure to OT map regularity in a way that could inform generalization bounds and model design for physical measures, and the excess-risk analysis for DeepParticle and the robustness result add concrete quantitative content.

major comments (1)

[Abstract] The central step asserting that the normalized target densities satisfy doubling conditions 'under standard structural assumptions' is load-bearing for the subsequent application of OT regularity theory and the Hölder continuity conclusion, yet the assumptions (e.g., uniform positivity/boundedness of the density, boundary behavior for domain cases, periodicity for torus cases) are invoked without explicit statement or verification that they hold uniformly across the four PDE families listed.

minor comments (1)

The abstract references experiments that 'support the derived rates' but provides no quantitative details on the experimental setup, error bars, or specific rates shown; adding a brief summary or pointer to the relevant figure/table would improve clarity.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for their careful reading of our manuscript and for the constructive comments. We address the major comment below.

Referee: [Abstract] The central step asserting that the normalized target densities satisfy doubling conditions 'under standard structural assumptions' is load-bearing for the subsequent application of OT regularity theory and the Hölder continuity conclusion, yet the assumptions (e.g., uniform positivity/boundedness of the density, boundary behavior for domain cases, periodicity for torus cases) are invoked without explicit statement or verification that they hold uniformly across the four PDE families listed.

Authors: We thank the referee for highlighting this point. In the manuscript, the structural assumptions are introduced and applied separately for each PDE family in the dedicated sections (linear elliptic and parabolic PDEs on bounded domains in Sections 3 and 4; diffusion and Fokker-Planck equations on the torus in Sections 5 and 6). These include uniform ellipticity and boundedness of coefficients, smooth or Lipschitz boundary conditions ensuring positive densities bounded away from zero and infinity, and standard periodicity for the torus cases. Under these conditions, the normalized densities are comparable to the Lebesgue measure on compact sets, which directly yields the doubling property via standard measure-theoretic arguments. We agree, however, that a unified and explicit enumeration would improve clarity and transparency. In the revision we will add a short preliminary subsection (new Section 2.3) that lists the precise assumptions for all four families side-by-side and recalls the elementary verification that each implies the doubling condition. This addition will be referenced from the abstract and introduction, without changing any theorems or proofs.

Revision: yes
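The "elementary verification" mentioned in the response can be sketched in one line under stated assumptions. The symbols $m$, $M$, $\rho$, and $C_\Omega$ are introduced here for illustration, and the rebuttal is simulated, so this is not necessarily the argument the paper uses. In particular, it assumes a density bounded away from zero on all of $\Omega$, which excludes densities that vanish at the boundary.

```latex
% Assume 0 < m \le \rho \le M on \Omega and \mu(A) = \int_{A \cap \Omega} \rho \, dx.
% Assume also that Lebesgue measure restricted to \Omega is doubling with
% constant C_\Omega (on the torus at small radii, C_\Omega = 2^d).
% Then for every x \in \Omega and r > 0:
\mu\bigl(B(x, 2r)\bigr)
  \;\le\; M \,\bigl|B(x, 2r) \cap \Omega\bigr|
  \;\le\; M \, C_\Omega \,\bigl|B(x, r) \cap \Omega\bigr|
  \;\le\; \frac{M \, C_\Omega}{m} \, \mu\bigl(B(x, r)\bigr) .
% Hence \mu is doubling with constant C_D = M C_\Omega / m.
```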

Circularity Check

0 steps flagged

No significant circularity; derivation applies external OT regularity to independently established doubling conditions

The paper's chain begins with normalized target densities from linear elliptic/parabolic PDEs on domains and diffusion/Fokker-Planck equations on the torus. Under explicitly invoked standard structural assumptions, it proves these measures satisfy doubling conditions. It then combines this with existing regularity theory for optimal transport maps between doubling measures to obtain Hölder continuity of the map from uniform source to target. This supplies an approximation-theoretic justification for one-step models and excess-risk bounds for DeepParticle. No quoted step reduces by construction to a fitted input, self-definition, or load-bearing self-citation whose content is itself unverified; the OT regularity is treated as an external result, and the doubling proof is presented as a direct consequence of the structural assumptions rather than a renaming or ansatz smuggling. The derivation therefore remains self-contained against external benchmarks.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claims rest on standard structural assumptions from PDE theory that are used to prove the doubling property; no free parameters or new entities are introduced in the abstract.

axioms (1)

Domain assumption: Standard structural assumptions on normalized target densities for linear elliptic, parabolic, diffusion and Fokker-Planck equations.
Invoked to establish that the target measures satisfy doubling conditions.

pith-pipeline@v0.9.0 · 5734 in / 1354 out tokens · 44190 ms · 2026-05-21T05:20:50.248239+00:00 · methodology

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AlexanderDuality.lean · alexander_duality_circle_linking · unclear
Paper passage: "Under standard structural assumptions, we prove that these target measures satisfy doubling conditions. By combining this fact with regularity theory for optimal transport between doubling measures, we show that the optimal transport map ... is Hölder continuous."

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: "the optimal transport map from a uniform source measure to the target measure is Hölder continuous"

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.


[1] [1]

Wasserstein gener- ative adversarial networks

Martin Arjovsky, Soumith Chintala, and L´ eon Bottou. Wasserstein gener- ative adversarial networks. In Doina Precup and Yee Whye Teh, editors, Proceedings of the 34th International Conference on Machine Learning, volume 70 ofProceedings of Machine Learning Research, pages 214–223. PMLR, 06–11 Aug 2017

work page 2017

[2] [2]

Some theoretical insights into wasserstein GANs.Journal of Machine Learning Research, 22(119):1–45, 2021

G´ erard Biau, Maxime Sangnier, and Ugo Tanielian. Some theoretical insights into wasserstein GANs.Journal of Machine Learning Research, 22(119):1–45, 2021

work page 2021

[3] [3]

American Mathematical Society, Providence, Rhode Island, 1 edition, 2019

Sergey Bobkov and Michel Ledoux.One-dimensional empirical measures, order statistics, and Kantorovich transport distances, volume 261 ofMemoirs of the American Mathematical Society. American Mathematical Society, Providence, Rhode Island, 1 edition, 2019

work page 2019

[4] [4]

Caffarelli

Luis A. Caffarelli. The regularity of mappings with a convex potential. Journal of the American Mathematical Society, 5(1):99–104, 1992

work page 1992

[5] [5]

Bartlett

Saptarshi Chakraborty and Peter L. Bartlett. On the statistical properties of generative adversarial models for low intrinsic data dimension.Journal of Machine Learning Research, 26:1–80, 2025

work page 2025

[6] [6]

Distribution approximation and statistical estimation guarantees of generative adversarial networks, 2022

Minshuo Chen, Wenjing Liao, Hongyuan Zha, and Tuo Zhao. Distribution approximation and statistical estimation guarantees of generative adversarial networks, 2022

work page 2022

[7] [7]

Xiaoli Chen, Phoebus Rosakis, Zhizhang Wu, and Zhiwen Zhang. Solving nonconvex energy minimization problems in martensitic phase transitions with a mesh-free deep learning approach.Computer Methods in Applied Mechanics and Engineering, 416:116384, 2023

work page 2023

[8] [8]

Podno: Proper orthogonal decomposition neural operators, 2025

Zilan Cheng, Zhongjian Wang, Li-Lian Wang, and Mejdi Azaiez. Podno: Proper orthogonal decomposition neural operators, 2025

work page 2025

[9] [9]

A variational neural network approach for glacier modelling with nonlinear rheology.Communi- cations in Computational Physics, 34(4):934–954, 2023

Tiangang Cui, Zhongjian Wang, and Zhiwen Zhang. A variational neural network approach for glacier modelling with nonlinear rheology.Communi- cations in Computational Physics, 34(4):934–954, 2023

work page 2023

[10] [10]

Convergence of denoising diffusion models under the manifold hypoth- esis.arXiv preprint arXiv:2208.05314,

Valentin De Bortoli. Convergence of denoising diffusion models under the manifold hypothesis.arXiv preprint arXiv:2208.05314, 2022

work page arXiv 2022

[11] Weinan E and Bing Yu. The deep Ritz method: A deep learning-based numerical algorithm for solving variational problems. Communications in Mathematics and Statistics, 6(1):1–12, 2018.

[12] Lawrence C. Evans. Partial differential equations. American Mathematical Society, Providence, R.I., 2010.

[13] Nicolas Fournier and Arnaud Guillin. On the rate of convergence in Wasserstein distance of the empirical measure. Probability Theory and Related Fields, 162:707–738, 2015.

[14] David Gilbarg and Neil S. Trudinger. Elliptic partial differential equations of second order. Grundlehren der mathematischen Wissenschaften, 224. Springer-Verlag, Berlin, 2nd edition, 1983.

[15] I. Goodfellow, J. Pouget-Abadie, M. Mirza, B. Xu, D. Warde-Farley, S. Ozair, A. Courville, and Y. Bengio. Generative adversarial nets. Advances in Neural Information Processing Systems, 27, 2014.

[16] Jonathan Ho, Ajay N. Jain, and Pieter Abbeel. Denoising diffusion probabilistic models. In Advances in Neural Information Processing Systems, volume 33. Curran Associates, Inc., 2020.

[17] Zunding Huang, Bo Li, Zhongming Wang, and Zhiwen Zhang. Neural network Poisson-Boltzmann electrostatics for biomolecular interactions. Journal of Computational Physics, page 114446, 2025.

[18] Yash Jhaveri and Ovidiu Savin. On the regularity of optimal transports between degenerate densities. Archive for Rational Mechanics and Analysis, 245(2):819–861, June 2022.

[19] Young-Heon Kim and Emanuel Milman. A generalization of Caffarelli's contraction theorem via (reverse) heat flow. Mathematische Annalen, 354(3):827–862, 2012.

[20] I. Kobyzev, S. Prince, and M. Brubaker. Normalizing flows: An introduction and review of current methods. IEEE Transactions on Pattern Analysis and Machine Intelligence, 2020.

[21] Sophie Langer. Approximating smooth functions by deep neural networks with sigmoid activation function. Journal of Multivariate Analysis, 182:104696, 2021.

[22] Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations. In International Conference on Learning Representations, 2021.

[23] Zongyi Li, Hongkai Zheng, Nikola Kovachki, David Jin, Haoxuan Chen, Burigede Liu, Kamyar Azizzadenesheli, and Anima Anandkumar. Physics-informed neural operator for learning partial differential equations. ACM/IMS J. Data Sci., 1(3), May 2024.

[24] Tengyuan Liang. How well generative adversarial networks learn distributions. Journal of Machine Learning Research, 22(228):1–41, 2021.

[25] G. M. Lieberman. Second Order Parabolic Differential Equations. World Scientific, 1996.

[26] Anci Lin, Xiaohong Liu, Zhiwen Zhang, Weidong Zhao, and Wenju Zhao. Biomimetic PINNs for cell-induced phase transitions: UQ-R3 sampling with causal gating. arXiv preprint arXiv:2603.29184, 2026.

[27] Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators. Nature Machine Intelligence, 3:218–229, 2021.

[28] Junlong Lyu, Zhongjian Wang, Jack Xin, and Zhiwen Zhang. A convergent interacting particle method and computation of KPP front speeds in chaotic flows. SIAM Journal on Numerical Analysis, 60(3):1136–1167, 2022.

[29] Ashok Makkuva, Amirhossein Taghvaei, Sewoong Oh, and Jason Lee. Optimal transport mapping via input convex neural networks. In Hal Daumé III and Aarti Singh, editors, Proceedings of the 37th International Conference on Machine Learning, volume 119 of Proceedings of Machine Learning Research, pages 6672–6681. PMLR, 13–18 Jul 2020.

[30] Xiangjun Meng and Zhongjian Wang. Pathway to O(√d) complexity bound under Wasserstein metric of flow-based models, 2025.

[31] Takeru Miyato, Toshiki Kataoka, Masanori Koyama, and Yuichi Yoshida. Spectral normalization for generative adversarial networks, 2018.

[32] Mehryar Mohri, Afshin Rostamizadeh, and Ameet Talwalkar. Foundations of machine learning. Adaptive computation and machine learning series. MIT Press, Cambridge, MA, 2012.

[33] Lawrence E. Payne. Maximum principles in differential equations (Murray H. Protter and Hans F. Weinberger). SIAM Review, 10(3):386–387, July 1968.

[34] M. Raissi, P. Perdikaris, and G. Karniadakis. Physics-informed neural networks: A deep learning framework for solving forward and inverse problems involving nonlinear partial differential equations. Journal of Computational Physics, 378:686–707, 2019.

[35] Z. Shen, Z. Wang, J. Xin, and Z. Zhang. Two-step diffusion: Fast sampling and reliable prediction for 3D Keller-Segel and KPP equations in fluid flows, 2026.

[36] Zuowei Shen, Haizhao Yang, and Shijun Zhang. Deep network approximation characterized by number of neurons. Communications in Computational Physics, 28(5):1768–1811, November 2020.

[37] Justin Sirignano and Konstantinos Spiliopoulos. DGM: A deep learning algorithm for solving partial differential equations. Journal of Computational Physics, 375:1339–1364, 2018.

[38] Arthur Stéphanovitch, Eddie Aamari, and Clément Levrard. Wasserstein generative adversarial networks are minimax optimal distribution estimators. The Annals of Statistics, 52(5):2167–2193, 2024.

[39] Xixian Wang and Zhongjian Wang. Wasserstein bounds for generative diffusion models with Gaussian tail targets. arXiv preprint arXiv:2412.11251, 2024.

[40] Z. Wang, J. Xin, and Z. Zhang. DeepParticle: Learning invariant measure by a deep neural network minimizing Wasserstein distance on data generated from an interacting particle method. Journal of Computational Physics, 464:111309, 2022.

[41] Z. Wang, J. Xin, and Z. Zhang. A DeepParticle method for learning and generating aggregation patterns in multi-dimensional Keller-Segel chemotaxis systems. Physica D, 460:134082, 2024.

[42] Zhongjian Wang, Jack Xin, and Zhiwen Zhang. Computing effective diffusivity of chaotic and stochastic flows using structure-preserving schemes. SIAM Journal on Numerical Analysis, 56(4):2322–2344, 2018.

[43] Zhongjian Wang, Jack Xin, and Zhiwen Zhang. Sharp error estimates on a stochastic structure-preserving scheme in computing effective diffusivity of 3D chaotic flows. Multiscale Modeling & Simulation, 19(3):1167–1189, 2021.

[44] Zhongjian Wang and Zhiwen Zhang. A mesh-free method for interface problems using the deep learning approach. Journal of Computational Physics, 400:108963, 2020.

[45] Zhizhang Wu, Renaud Raquépas, Jack Xin, and Zhiwen Zhang. Computing large deviation rate functions of entropy production for diffusion processes by an interacting particle method. SIAM Journal on Scientific Computing, 47(6):A3330–A3355, 2025.

[46] Yunfei Yang, Han Feng, and Ding-Xuan Zhou. On the rates of convergence for learning with convolutional neural networks. SIAM Journal on Mathematics of Data Science, 7(4):1755–1772, 2025.

[47] T. Zhang, Z. Wang, J. Xin, and Z. Zhang. A bidirectional DeepParticle method for efficiently solving low-dimensional transport map problems. Journal of Computational Physics, 2026, to appear.

[48] Tan Zhang, Zhongjian Wang, Jack Xin, and Zhiwen Zhang. A convergent interacting particle method for computing KPP front speeds in random flows. SIAM/ASA Journal on Uncertainty Quantification, 13(2):639–678, 2025.