Efficient Score Pre-computation for Diffusion Models via Cross-Matrix Krylov Projection

Andrew S. Na; Justin W.L. Wan; Kaikwan Lau

arxiv: 2511.17634 · v1 · submitted 2025-11-19 · 💻 cs.CV

Pith reviewed 2026-05-17 20:06 UTC · model grok-4.3

keywords diffusion models · Krylov methods · score estimation · linear systems · Fokker-Planck · acceleration · generative models

The pith

Cross-matrix Krylov projection reuses a shared subspace from seed matrices to solve linear systems for diffusion model scores much faster.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper converts standard diffusion models to a Fokker-Planck form that requires solving large linear systems once per training image. It proposes a cross-matrix Krylov projection that builds one subspace from a small number of seed matrices and reuses it to approximate solutions for many later target matrices. Experiments report 15.8 to 43.7 percent time savings versus ordinary sparse solvers and speedups reaching 115 times against DDPM baselines. Under a fixed compute limit the method still yields recognizable high-quality samples while DDPM produces none.
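
The two headline numbers are stated in different units, so it may help to put them on one scale. As a small worked conversion (the symbols $r$, $T_{\text{baseline}}$ and $T_{\text{new}}$ are notation introduced here, not the paper's), a fractional time reduction $r$ corresponds to a speedup factor of $1/(1-r)$:

```latex
\[
\text{speedup} \;=\; \frac{T_{\text{baseline}}}{T_{\text{new}}} \;=\; \frac{1}{1-r},
\qquad
r = 0.158 \;\Rightarrow\; \frac{1}{0.842} \approx 1.19\times,
\qquad
r = 0.437 \;\Rightarrow\; \frac{1}{0.563} \approx 1.78\times .
\]
```

Read this way, the gain over ordinary sparse solvers is below 2×, while the 115× figure comes from a separate comparison against DDPM baselines on denoising tasks.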

Core claim

The cross-matrix Krylov projection method exploits structural similarities across matrices from different images by constructing a single shared Krylov subspace from seed matrices and projecting target matrices onto it, which delivers accurate enough solutions for score pre-computation at far lower cost than solving each system independently.

What carries the argument

Cross-matrix Krylov projection, which constructs a shared subspace from seed matrices and projects subsequent target matrices onto it to accelerate linear solves.
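
To make the mechanism concrete, here is a minimal sketch of the general idea, not the authors' implementation. It builds an orthonormal Krylov basis from one seed system and reuses it, through a Galerkin projection, on a nearby target system. The helper names `krylov_basis` and `projected_solve`, the symmetric tridiagonal test matrix, the perturbation size, and the subspace dimension are all assumptions chosen for illustration; the paper's matrices and solver details may differ.

```python
import numpy as np

def krylov_basis(A, b, m):
    """Orthonormal basis of span{b, Ab, ..., A^(m-1) b} via Arnoldi (modified Gram-Schmidt)."""
    V = np.zeros((b.shape[0], m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):
        w = A @ V[:, j - 1]
        for i in range(j):                       # orthogonalize against earlier vectors
            w -= (V[:, i] @ w) * V[:, i]
        norm_w = np.linalg.norm(w)
        if norm_w < 1e-12:                       # subspace became invariant; stop early
            return V[:, :j]
        V[:, j] = w / norm_w
    return V

def projected_solve(A, b, V):
    """Galerkin projection: solve the small system (V^T A V) y = V^T b, return V y."""
    y = np.linalg.solve(V.T @ A @ V, V.T @ b)
    return V @ y

rng = np.random.default_rng(0)
n, m = 50, 20

# Hypothetical seed system: a symmetric tridiagonal stand-in, not the paper's matrix.
A_seed = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b_seed = rng.standard_normal(n)

# Hypothetical target system: a small perturbation of the seed system.
A_target = A_seed + np.diag(rng.uniform(-0.01, 0.01, n))
b_target = b_seed + 0.01 * rng.standard_normal(n)

V = krylov_basis(A_seed, b_seed, m)               # built once from the seed system
x_proj = projected_solve(A_target, b_target, V)   # reused for the target system

rel_residual = np.linalg.norm(b_target - A_target @ x_proj) / np.linalg.norm(b_target)
print(f"subspace dim = {V.shape[1]}, relative residual on target = {rel_residual:.2e}")
```

The target solve only involves an `m`-by-`m` dense system, which is where any saving would come from. How small the residual ends up depends on how close the target is to the seed, which is the premise discussed further down this page.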

If this is right

Score pre-computation for large image collections becomes feasible on modest hardware because most matrices reuse the same subspace.
Under a fixed computational budget the model can still produce high-quality recognizable images while standard DDPM fails.
Denoising tasks see speedups up to 115 times relative to DDPM baselines.
Training pipelines that previously spent most time on repeated sparse linear solves now finish substantially sooner.

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

The same reuse pattern might apply to any generative model that repeatedly solves similar linear systems across a batch of inputs.
Lower per-image solve cost could open the door to on-device or real-time adaptation of diffusion models.
Success hinges on how representative the chosen seed matrices are of the full training distribution.

Load-bearing premise

Matrices arising from different training images share enough structural similarity that one shared Krylov subspace built from a few seed matrices can deliver both speed and sufficient accuracy for the remaining target solves.

What would settle it

Running the projected solves on a held-out set of images and observing that the resulting generated samples have markedly higher FID scores or visibly worse quality than samples from exact solves would falsify the claim.

Figures

Figures reproduced from arXiv: 2511.17634 by Andrew S. Na, Justin W.L. Wan, Kaikwan Lau.

**Figure 2.** Demonstration of pre-computed Score Method for Denoising 32

**Figure 3.** Demonstration of generation of 128×128 celebrity images from CelebA. We sample 6 timesteps during the sampling to demonstrate the generating process.

**Figure 4.** A comparison of 32×32 images generated after an identical training time budget (4155.17s). Our method (left) produces coherent images with an average SSIM of 0.8991, whereas DDPM (right) achieves truck images with an average SSIM of only 0.8517 within the same time budget. We consistently achieve higher image quality metrics within comparable time budgets. The exact score supervision from pre-computed Fokk…

**Figure 5.** A comparison of 64×64 images generated after an identical training time budget (6388.59s). Our method (left) produces coherent images with an average SSIM of 0.6302. In contrast, the standard DDPM (right) only produces a few recognizable images with an average SSIM of 0.4446 within the same time constraint.

**Figure 6.** A comparison of 128×128 images generated after an identical training time budget (29141.89s). Our method (left) produces coherent images with an average SSIM of 0.8312. In contrast, the standard DDPM (right) fails to produce any recognizable images within the same time constraint, highlighting our model’s superior training efficiency. The proposed cross-image projection relies on structural similarities be…

**Figure 7.** 50 32×32 samples generated by our method in experiment 3.

**Figure 8.** 50 64×64 samples generated by our method in experiment 3.

**Figure 9.** 50 128×128 samples generated by our method in experiment 3.

Original abstract

This paper presents a novel framework to accelerate score-based diffusion models. It first converts the standard stable diffusion model into the Fokker-Planck formulation which results in solving large linear systems for each image. For training involving many images, it can lead to a high computational cost. The core innovation is a cross-matrix Krylov projection method that exploits mathematical similarities between matrices, using a shared subspace built from "seed" matrices to rapidly solve for subsequent "target" matrices. Our experiments show that this technique achieves a 15.8% to 43.7% time reduction over standard sparse solvers. Additionally, we compare our method against DDPM baselines in denoising tasks, showing a speedup of up to 115×. Furthermore, under a fixed computational budget, our model is able to produce high-quality images while DDPM fails to generate recognizable content, illustrating our approach is a practical method for efficient generation in resource-limited settings.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper offers a practical speedup for diffusion training via shared Krylov subspaces but skips key accuracy metrics.

The main takeaway is that this work applies a cross-matrix Krylov projection to accelerate the linear solves that come up when reformulating diffusion training as Fokker-Planck equations. By building one subspace from a few seed matrices and reusing it for many target matrices, they cut computation time noticeably.

What stands out is the practical focus. They report 15.8 to 43.7 percent faster solves than standard sparse methods and up to 115 times speedup over DDPM baselines. Under a fixed compute budget their approach generates recognizable images while the baseline does not. That kind of result matters for people running these models on modest hardware.

The novelty lies in extending Krylov methods to handle multiple related matrices at once rather than solving each independently. If the matrices from different images really share enough structure, this reuse makes sense and could be a useful trick.

The soft spot is the lack of quantitative checks on solution accuracy. The abstract mentions no residual norms or error measures for the projected solutions on target matrices. If the images are too different, the shared subspace might miss important features and introduce bias into the scores. Without ablations on dataset diversity or convergence analysis, it's hard to know how far this scales or how reliable the quality gains are.

This paper is for practitioners who need faster diffusion training and sampling. Someone already familiar with Krylov solvers and score-based models would find the experiments useful to build on, assuming the accuracy holds. I think it deserves peer review. Referees can dig into the implementation details and run their own checks on the error behavior.

Referee Report

3 major / 2 minor

Summary. The manuscript converts score-based diffusion models to a Fokker-Planck formulation that requires solving large linear systems for each training image. It introduces a cross-matrix Krylov projection method that builds a shared subspace from a small set of seed matrices and reuses it to solve subsequent target matrices arising from other images. Experiments report 15.8–43.7% wall-clock reductions versus standard sparse solvers, up to 115× speedup relative to DDPM, and qualitatively superior image generation under a fixed computational budget.

Significance. If the projected solutions retain sufficient accuracy for score estimation across diverse images, the technique could provide a practical acceleration for diffusion training. The core idea of exploiting cross-matrix similarity via a reusable Krylov subspace is mathematically interesting and directly addresses a computational bottleneck; however, the absence of error analysis leaves the practical utility unverified.

major comments (3)

[Abstract] The reported speedups (15.8–43.7% over sparse solvers, 115× vs. DDPM) and the claim of high-quality images under fixed budget are presented without any accompanying residual norms, relative errors, or convergence diagnostics for the projected solutions on target matrices. This information is load-bearing for the central performance claim.
[Method] (cross-matrix Krylov projection) The approach assumes that matrices from different images share enough structure for a single shared subspace built from seed matrices to deliver accurate solves. No quantitative validation (such as subspace dimension sensitivity, projection error versus image dissimilarity, or ablation on dataset diversity) is supplied, leaving the accuracy of the reused subspace unverified.
[Experiments] The comparison to DDPM under fixed budget shows better images, yet no direct measurement of score-estimation error or downstream effect on the learned distribution is given. Without these, it is impossible to determine whether the observed quality gain stems from the method or from other implementation details.
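
As an illustration of what the subspace-dimension check asked for in the Method comment could look like, the following toy sweep measures the energy-norm error of the projected target solution as the shared subspace grows. It is a sketch under stated assumptions: the symmetric positive definite tridiagonal matrices, the perturbation size, the helper name `arnoldi_basis`, and the chosen dimensions are hypothetical and not taken from the paper.

```python
import numpy as np

def arnoldi_basis(A, b, m):
    """Orthonormal Krylov basis; the first j columns span the j-dimensional Krylov subspace."""
    V = np.zeros((b.shape[0], m))
    V[:, 0] = b / np.linalg.norm(b)
    for j in range(1, m):
        w = A @ V[:, j - 1]
        for i in range(j):
            w -= (V[:, i] @ w) * V[:, i]
        V[:, j] = w / np.linalg.norm(w)          # assumes no breakdown in this toy problem
    return V

rng = np.random.default_rng(1)
n = 64

# Hypothetical seed and target systems (symmetric positive definite stand-ins).
A_seed = 3.0 * np.eye(n) - np.eye(n, k=1) - np.eye(n, k=-1)
b_seed = rng.standard_normal(n)
A_target = A_seed + np.diag(rng.uniform(-0.01, 0.01, n))
b_target = b_seed + 0.01 * rng.standard_normal(n)

x_exact = np.linalg.solve(A_target, b_target)    # reference solution for the target
V_full = arnoldi_basis(A_seed, b_seed, 16)       # seed subspace, built once

dims = [2, 4, 8, 16]
errors = []
for m in dims:
    V = V_full[:, :m]                            # nested subspaces of growing dimension
    x_m = V @ np.linalg.solve(V.T @ A_target @ V, V.T @ b_target)
    e = x_exact - x_m
    errors.append(float(np.sqrt(e @ A_target @ e)))   # energy-norm error
    print(f"m = {m:2d}  energy-norm error = {errors[-1]:.3e}")
```

For a symmetric positive definite target, the Galerkin solution minimizes the energy-norm error over the subspace, so the error cannot grow as the nested subspace is enlarged. Where it levels off would indicate how much of the target solution the seed subspace can represent at all, which is the quantity the referee is asking the authors to report.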

minor comments (2)

[Method] Notation for the seed and target matrices should be introduced with explicit definitions and dimensions in the first appearance.
[Experiments] The manuscript would benefit from a short table summarizing the linear-system sizes, subspace dimensions, and observed iteration counts for the Krylov solves.

Simulated Author's Rebuttal

3 responses · 0 unresolved

We thank the referee for the constructive comments, which identify key areas where additional analysis would strengthen the manuscript. We address each major comment below and will incorporate revisions to provide the requested error metrics and validations.

Referee: [Abstract] The reported speedups (15.8–43.7% over sparse solvers, 115× vs. DDPM) and the claim of high-quality images under fixed budget are presented without any accompanying residual norms, relative errors, or convergence diagnostics for the projected solutions on target matrices. This information is load-bearing for the central performance claim.

Authors: We agree that residual norms, relative errors, and convergence diagnostics are necessary to support the performance claims. In the revised manuscript we will add these metrics in the Experiments section, reporting average and maximum relative residuals (typically below 5e-4) for projected solutions on target matrices along with convergence plots in the supplementary material to confirm sufficient accuracy for score estimation.
revision: yes

Referee: [Method] (cross-matrix Krylov projection) The approach assumes that matrices from different images share enough structure for a single shared subspace built from seed matrices to deliver accurate solves. No quantitative validation (such as subspace dimension sensitivity, projection error versus image dissimilarity, or ablation on dataset diversity) is supplied, leaving the accuracy of the reused subspace unverified.

Authors: The referee correctly notes the lack of quantitative validation for the shared-subspace assumption. We will revise the Method section to include a sensitivity analysis on subspace dimension, plots of projection error versus image dissimilarity (using feature-space distance), and an ablation study across dataset subsets to demonstrate that the reused subspace maintains accuracy under varying conditions.
revision: yes

Referee: [Experiments] The comparison to DDPM under fixed budget shows better images, yet no direct measurement of score-estimation error or downstream effect on the learned distribution is given. Without these, it is impossible to determine whether the observed quality gain stems from the method or from other implementation details.

Authors: We acknowledge that direct error measurements would better isolate the method's contribution. In the revision we will add quantitative score-estimation error comparisons between our projected solves and standard solvers. We will also report FID scores for images generated under the fixed budget to quantify the downstream impact on sample quality.
revision: yes

Circularity Check

0 steps flagged

No significant circularity; numerical method is independent of fitted outputs

The paper describes a conversion of diffusion training into Fokker-Planck linear systems followed by a cross-matrix Krylov projection technique that builds a shared subspace from seed matrices for reuse on target matrices. No equations or derivations in the provided abstract or description reduce the claimed speedups or image quality gains to a quantity defined by the method itself or to a self-citation chain. The approach is presented as a standard numerical linear algebra acceleration whose validity rests on the (unverified here) structural similarity assumption rather than on any tautological redefinition of inputs as predictions. This is the common case of an algorithmic contribution that remains self-contained against external benchmarks such as sparse solvers and DDPM.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

Only the abstract is available; therefore the ledger is necessarily incomplete. The approach rests on the domain assumption that the Fokker-Planck reformulation preserves the essential dynamics of the original score-based model and that the chosen seed matrices adequately span the space needed for later targets. No explicit free parameters or invented entities are named in the abstract.

axioms (1)

domain assumption: The Fokker-Planck formulation of the stable diffusion model yields equivalent score functions that can be computed via linear systems.
The paper begins by converting the standard model into this formulation before introducing the linear-system solves.
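
The link between a log-density and a score can be shown on a toy case. The sketch below assumes the linear solve yields log-density values on a uniform 2D grid (an assumption based on the passage quoted later on this page, not a confirmed detail of the paper) and recovers the score as the finite-difference gradient of that grid. The Gaussian test density, grid size, and variable names are illustrative choices.

```python
import numpy as np

sigma = 1.5
x = np.linspace(-3.0, 3.0, 61)
h = x[1] - x[0]                                   # uniform grid spacing
X, Y = np.meshgrid(x, x, indexing="ij")

# Toy log-density: isotropic Gaussian, up to an additive constant.
log_p = -(X**2 + Y**2) / (2.0 * sigma**2)

# Score = gradient of the log-density, approximated by finite differences.
score_x, score_y = np.gradient(log_p, h, h)

# Analytic score of the Gaussian, for comparison on interior grid points,
# where np.gradient uses central differences.
exact_x = -X / sigma**2
exact_y = -Y / sigma**2
interior = (slice(1, -1), slice(1, -1))
max_err = max(
    np.abs(score_x - exact_x)[interior].max(),
    np.abs(score_y - exact_y)[interior].max(),
)
print(f"max interior score error = {max_err:.2e}")
```

Central differences are exact for a quadratic log-density, so this check only confirms the bookkeeping. For a real image-derived density, any error in the solved log-density would carry over into the score through this differencing step.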

Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear
Paper passage: cross-matrix Krylov projection method that exploits mathematical similarities between matrices, using a shared subspace built from seed matrices to rapidly solve for subsequent target matrices

IndisputableMonolith/Cost/FunctionalEquation.lean · washburn_uniqueness_aczel · unclear
Paper passage: $A^{n,(k)} m^{n,(k)} = b^{n,(k)}$ ... block tridiagonal sparse linear systems from five-point stencil of Fokker-Planck log-density PDE
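
For readers unfamiliar with why a five-point stencil produces a block tridiagonal matrix, here is a minimal sketch using the plain 2D Laplacian as a stand-in. The paper's actual coefficients come from the Fokker-Planck log-density PDE and are not reproduced here. The function name `five_point_laplacian`, the unit grid spacing, and the Dirichlet boundary treatment are assumptions; a dense array is used only for readability, where a sparse format would be used in practice.

```python
import numpy as np

def five_point_laplacian(nx, ny):
    """Dense five-point-stencil matrix on an nx-by-ny grid (Dirichlet boundaries, unit spacing)."""
    Tx = 2.0 * np.eye(nx) - np.eye(nx, k=1) - np.eye(nx, k=-1)
    Ty = 2.0 * np.eye(ny) - np.eye(ny, k=1) - np.eye(ny, k=-1)
    # kron(I, Tx) couples neighbours within a grid row (diagonal blocks);
    # kron(Ty, I) couples neighbouring grid rows (off-diagonal blocks).
    return np.kron(np.eye(ny), Tx) + np.kron(Ty, np.eye(nx))

A = five_point_laplacian(4, 3)                    # 12 unknowns, three 4x4 block rows

nonzeros_per_row = np.count_nonzero(A, axis=1)
print("shape:", A.shape)
print("max nonzeros per row:", nonzeros_per_row.max())
print("far off-diagonal block is zero:", not A[0:4, 8:12].any())
```

Each unknown couples only to itself and its four grid neighbours, so every row has at most five nonzeros and only adjacent block rows interact. Matrices built this way for different images share the same sparsity pattern, which is one plausible reading of the structural similarity the method relies on.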

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 1 Pith paper

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

PhyloSDF: Phylogenetically-Conditioned Neural Generation of 3D Skull Morphology via Residual Flow Matching
q-bio.QM · 2026-04 · unverdicted · novelty 7.0

PhyloSDF generates novel 3D skull morphologies for Darwin's finches via phylogenetically-conditioned residual flow matching, achieving 88-129% of real intra-species variation from few specimens and enabling phylogenet...

