Local Diffusion Models and Phases of Data Distributions

Fangjun Hu; Guangkuo Liu; Xun Gao; Yifan F. Zhang

arxiv: 2508.06614 · v2 · submitted 2025-08-08 · 💻 cs.LG · cond-mat.stat-mech · quant-ph


Pith reviewed 2026-05-18 23:30 UTC · model grok-4.3

classification: 💻 cs.LG · cond-mat.stat-mech · quant-ph

keywords: diffusion models · local denoisers · phase transitions · data distribution phases · spatial Markovianity · score functions · generative models · efficient architectures


The pith

The reverse denoising process splits into an early trivial phase and a late data phase separated by a rapid transition where local denoisers must fail.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper defines phases of data distributions by whether distributions can be connected through spatially local operations along the diffusion path. This definition shows that the reverse process begins in a trivial phase, ends in a data phase, and crosses a narrow interval of rapid change where any local denoiser necessarily fails. A reader would care because the result indicates that global score computations are required only inside that narrow interval, so local neural networks can handle most of the denoising process. The work links local-denoiser success to spatial Markovianity and uses this link as an operational test for the transition points, confirming the pattern on real datasets.

Core claim

We define two distributions as belonging to the same data distribution phase if they can be mutually connected via spatially local operations such as local denoisers, along the same evolution path as the diffusion. We demonstrate that the reverse denoising process consists of an early trivial phase and a late data phase, sandwiching a rapid phase transition where local denoisers must fail. We further demonstrate that the performance of local denoisers is closely tied to spatial Markovianity, which provides an operational criterion for diagnosing such phase transitions.
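To make "spatial Markovianity" concrete, a common quantitative proxy is the conditional mutual information (CMI) I(A:C|B) between two regions A and C separated by a buffer B. It vanishes exactly when A and C are independent given B. The sketch below computes it in closed form for jointly Gaussian variables on a 1D chain. It is an illustration under stated assumptions, not the authors' estimator: the AR(1) toy covariance, the noise variances, and the helper names `gaussian_cmi` and `ar1_covariance` are hypothetical choices, and real image data would need a nonparametric or neural CMI estimator instead.

```python
import numpy as np


def ar1_covariance(n, rho):
    """Covariance rho**|i-j| of a stationary Gaussian AR(1) chain, a toy 1D Markov field."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def gaussian_cmi(cov, A, B, C):
    """I(A:C|B) in nats for a zero-mean Gaussian vector with covariance `cov`.

    Uses I(A:C|B) = H(AB) + H(BC) - H(B) - H(ABC); for Gaussians the 2*pi*e
    constants cancel, leaving half a combination of log-determinants.
    """
    A, B, C = list(A), list(B), list(C)

    def logdet(idx):
        if not idx:
            return 0.0
        sign, value = np.linalg.slogdet(cov[np.ix_(idx, idx)])
        if sign <= 0:
            raise ValueError("covariance block is not positive definite")
        return value

    return 0.5 * (logdet(A + B) + logdet(B + C) - logdet(B) - logdet(A + B + C))


# Toy chain of 12 sites split as region A | buffer B | region C.
n, rho = 12, 0.9
A, B, C = range(0, 5), range(5, 7), range(7, 12)
clean = ar1_covariance(n, rho)
for noise_var in (0.0, 0.5, 100.0):
    # Forward-diffusion-style corruption: add isotropic Gaussian noise to every site.
    noisy = clean + noise_var * np.eye(n)
    print(f"noise variance {noise_var:>5}: CMI = {gaussian_cmi(noisy, A, B, C):.3e} nats")
```

For the clean chain the CMI is zero up to rounding, because an AR(1) chain is exactly Markov. Adding noise makes it strictly positive, and at very large noise it shrinks again as the sites become nearly independent. This toy is not meant to reproduce the transition the paper reports; it only illustrates the measurement.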

What carries the argument

Phases of data distributions, defined as equivalence classes under spatially local operations along the diffusion path, which locate the narrow interval where local denoisers fail.

If this is right

Far from the phase transition point, small local neural networks can compute the score function.
Global neural networks are needed only inside the narrow time window around each phase transition.
Spatial Markovianity supplies a practical test for locating those transition times.
Diffusion architectures can therefore use local networks for most timesteps and global networks only near the transitions.
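The routing suggested in the last bullet can be sketched as a thin wrapper that chooses a network for each reverse step. This is a minimal illustration that assumes the transition windows are already known (for example, from a Markovianity diagnostic) and that both networks share an `(x, t)` call signature. `HybridDenoiser`, `transition_windows`, and the toy networks below are hypothetical and are not taken from the paper.

```python
from dataclasses import dataclass
from typing import Callable, Sequence, Tuple

import numpy as np

# Hypothetical interface: a denoiser maps (noisy sample, diffusion time) to an output.
Denoiser = Callable[[np.ndarray, float], np.ndarray]


@dataclass
class HybridDenoiser:
    local: Denoiser                      # cheap, small receptive field
    global_: Denoiser                    # expensive, sees the whole sample
    transition_windows: Sequence[Tuple[float, float]]  # assumed known (t_start, t_end) pairs

    def uses_global(self, t: float) -> bool:
        """True if diffusion time t falls inside any transition window."""
        return any(start <= t <= end for start, end in self.transition_windows)

    def __call__(self, x: np.ndarray, t: float) -> np.ndarray:
        net = self.global_ if self.uses_global(t) else self.local
        return net(x, t)


# Toy stand-ins: a 3-tap moving average (local) and a full-array mean (global).
def toy_local(x, t):
    return np.convolve(x, np.ones(3) / 3, mode="same")


def toy_global(x, t):
    return np.full_like(x, x.mean())


hybrid = HybridDenoiser(toy_local, toy_global, transition_windows=[(0.4, 0.6)])
x = np.arange(8, dtype=float)
print(hybrid.uses_global(0.5), hybrid.uses_global(0.9))  # True False
```

Keeping the windows as data rather than hard-coding them lets the same wrapper be reused if the transition times differ between datasets.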

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the authors make directly.

The same phase definition could be tested on other iterative generative processes to see whether similar transitions appear.
Architectures that switch between local and global layers at detected transition times could be built and benchmarked for speed gains.
Repeating the Markovianity diagnostic across many datasets would show whether the phase structure is common to high-dimensional data.
The framework may connect to other non-equilibrium analyses of generative models and suggest new ways to measure locality requirements.

Load-bearing premise

Two distributions belong to the same phase precisely when they can be connected by spatially local operations along the diffusion path.

What would settle it

An experiment in which a local denoiser maintains high performance through the entire reverse process on a standard image dataset, with no detectable drop at the predicted transition time, would contradict the existence of a rapid phase where local denoisers must fail.
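One way to operationalize "no detectable drop" is to train a local and a global denoiser, record their losses at each timestep, and flag the timesteps where the local one falls behind by more than a tolerance. The sketch below assumes such per-timestep losses are already available. The helper name `local_failure_window`, the relative-gap threshold `rel_gap`, and the single-window assumption are illustrative choices, and a real test would also need error bars across seeds.

```python
import numpy as np


def local_failure_window(times, local_loss, global_loss, rel_gap=0.1):
    """Return (t_min, t_max) spanning the timesteps where the local denoiser's loss
    exceeds the global denoiser's by more than `rel_gap` (relative), or None.

    Assumes the flagged timesteps form a single window; separate windows would be
    merged into one span.
    """
    t = np.asarray(times, dtype=float)
    local_loss = np.asarray(local_loss, dtype=float)
    global_loss = np.asarray(global_loss, dtype=float)
    gap = (local_loss - global_loss) / global_loss
    flagged = gap > rel_gap
    if not flagged.any():
        return None  # no detectable drop anywhere along the reverse process
    return float(t[flagged].min()), float(t[flagged].max())


# Synthetic illustration only (not data from the paper): a loss bump at steps 4-6.
steps = np.arange(11)
global_loss = np.ones(11)
local_loss = np.ones(11)
local_loss[4:7] = 1.5
print(local_failure_window(steps, local_loss, global_loss))  # (4.0, 6.0)
```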

Figures

Figures reproduced from arXiv: 2508.06614 by Fangjun Hu, Guangkuo Liu, Xun Gao, Yifan F. Zhang.

**Figure 2.** Schematics of designing local denoisers. For time step …

**Figure 3.** (a) CMI …

**Figure 4.** 64 samples of denoised images, with local denoisers (…)

Original abstract

As a class of generative artificial intelligence frameworks inspired by statistical physics, diffusion models have shown extraordinary performance in synthesizing complicated data distributions through a denoising process gradually guided by score functions. Real-life data, like images, is often spatially structured in low-dimensional spaces. However, ordinary diffusion models ignore this local structure and learn spatially global score functions, which are often computationally expensive. In this work, motivated by recent advances in non-equilibrium statistical physics, we develop a generic framework for defining phases of data distributions and use it to analyze the locality requirements of denoisers in diffusion models. We define two distributions as belonging to the same data distribution phase if they can be mutually connected via spatially local operations such as local denoisers, along the same evolution path as the diffusion. We demonstrate that the reverse denoising process consists of an early trivial phase and a late data phase, sandwiching a rapid phase transition where local denoisers must fail. We further demonstrate that the performance of local denoisers is closely tied to spatial Markovianity, which provides an operational criterion for diagnosing such phase transitions. We validate this criterion through numerical experiments on real-world datasets. Our work suggests guidance for simpler and more efficient architectures of diffusion models: far from the phase transition point, we can use small local neural networks to compute the score function; global neural networks are only necessary around the narrow time interval of phase transitions. This result also opens up new directions for studying phases of data distributions, the broader science of generative artificial intelligence, and guiding the design of neural networks inspired by physics concepts.
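The contrast between spatially local and global denoisers can be made concrete in a setting where everything is computable in closed form. Assuming Gaussian data with covariance Σ and variance-exploding noise x_t = x_0 + σz, the Bayes-optimal denoiser E[x_0 | x_t] = Σ(Σ + σ²I)⁻¹ x_t is linear and is tied to the score by Tweedie's formula E[x_0 | x_t] = x_t + σ² ∇ log p_t(x_t). A local denoiser that sees only a window of radius r around a site can do no better than conditioning on that window. The sketch below compares the resulting mean-squared errors for a hypothetical AR(1) covariance; the helper names are illustrative, and this is a toy rather than the paper's construction.

```python
import numpy as np


def ar1_covariance(n, rho):
    """Toy covariance rho**|i-j| of a Gaussian AR(1) chain."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def local_mmse(cov, noise_var, site, radius):
    """Minimum MSE for estimating x0[site] from x_t = x0 + noise restricted to the
    window |j - site| <= radius, under a zero-mean Gaussian prior with covariance cov."""
    n = cov.shape[0]
    window = np.arange(max(0, site - radius), min(n, site + radius + 1))
    s_iw = cov[site, window]                                              # Cov(x0[site], x_t[window])
    s_ww = cov[np.ix_(window, window)] + noise_var * np.eye(len(window))  # Cov(x_t[window])
    return float(cov[site, site] - s_iw @ np.linalg.solve(s_ww, s_iw))


def global_mmse(cov, noise_var, site):
    """Minimum MSE when the denoiser sees the whole noisy sample."""
    n = cov.shape[0]
    posterior_cov = cov - cov @ np.linalg.solve(cov + noise_var * np.eye(n), cov)
    return float(posterior_cov[site, site])


n, site = 41, 20
cov = ar1_covariance(n, rho=0.8)
for noise_var in (0.1, 1.0, 10.0):
    errs = [local_mmse(cov, noise_var, site, r) for r in (0, 1, 2, 4, 8)]
    print(noise_var, np.round(errs, 4), round(global_mmse(cov, noise_var, site), 4))
```

In this Gaussian toy the error saturates quickly as the window grows at each noise level shown, because the prior has a finite correlation length. That is the kind of regime where the paper expects local denoisers to suffice; its argument is that this stops being true in the narrow window around a phase transition of real data.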

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note · private letter to a colleague

The phase framework maps diffusion trajectories to local-connectivity phases and flags a transition for hybrid local-global denoisers, but the necessity claim tracks the definition closely.


This paper's main point is that the reverse denoising process moves through an early trivial phase and a late data phase separated by a sharp transition where local denoisers must fail, with spatial Markovianity offered as a way to locate that point and guide cheaper architectures. The authors define two distributions as belonging to the same phase when they can be connected by spatially local operations along the diffusion path, then use that definition to argue for small local networks far from the transition and global ones only in a narrow window. What is new is the explicit phase construction drawn from non-equilibrium physics and the operational link to Markovianity as a diagnostic for when locality breaks.

The practical suggestion for splitting computation in high-resolution generative tasks is a clear plus and engages directly with the efficiency side of diffusion work. The numerics on real datasets at least show the idea is testable rather than purely abstract.

The soft spots concern how far the central necessity claim stands apart from the definition itself. Because phases are defined by the existence of local connections, the statement that local denoisers fail at the transition follows directly from that setup, which makes the result partly tautological. Markovianity is presented as an independent criterion, yet the abstract does not spell out a derivation showing that it is equivalent to, or a strict bound on, local-denoiser failure rather than merely correlated with the local models the authors tried. Without reported controls, baselines, or effect sizes, the validation remains preliminary. This is a moderate rather than fatal issue, but it does mean the argument needs a tighter separation between definition and evidence to carry full weight.

The work is aimed at researchers who build or optimize diffusion models and who are open to physics-style framing for locality and scaling. A reader interested in generative-model efficiency or the statistical mechanics of learning would get usable ideas even if they end up adjusting the details. It deserves a serious referee because the framing is distinct from standard diffusion papers and the efficiency angle is concrete enough to check. I would send it to peer review rather than desk-reject it, with the main request being clearer justification that the Markovianity diagnostic adds independent force beyond the phase definition.

Referee Report

2 major / 2 minor

Summary. The paper introduces a framework for phases of data distributions in diffusion models, defining two distributions as belonging to the same phase if they can be connected via spatially local operations along the diffusion path. It claims the reverse denoising process consists of an early trivial phase and a late data phase separated by a rapid transition at which local denoisers must fail, links this to spatial Markovianity as an operational diagnostic, validates the criterion numerically on real datasets, and concludes that small local networks suffice far from the transition while global networks are needed only in a narrow interval.

Significance. If the necessity claim for local-denoiser failure is shown to be independent of the phase definition and the Markovianity criterion is rigorously tied to it, the work could guide computationally lighter diffusion architectures for spatially structured data such as images. It also offers a physics-motivated lens for analyzing generative processes and may stimulate further study of phases in high-dimensional distributions.

major comments (2)

[Abstract / phase definition] Abstract and the phase-definition paragraph: the statement that 'local denoisers must fail' at the transition follows directly from the adopted definition (two distributions are in the same phase precisely when they are connected by spatially local operations). The manuscript must therefore derive, rather than assume, that spatial Markovianity is equivalent to (or a strict upper bound on) the non-existence of any local denoiser, not merely correlated with the specific local architectures tested.
[Numerical experiments] Numerical-validation section: the abstract states that the Markovianity criterion is validated on real-world datasets, yet supplies no quantitative details on how the transition point is located, what controls or baselines are used, or the observed effect sizes. Without these, it is unclear whether the rapid transition is observed independently of the definitional framework.

minor comments (2)

[Definition of phases] Clarify the precise mathematical meaning of 'spatially local operations' and 'along the same evolution path' when the definition is first introduced.
[Introduction] Add a short discussion of related prior work on local score estimation or physics-inspired restrictions on diffusion networks.

Simulated Authors' Rebuttal

2 responses · 0 unresolved

We thank the referee for their constructive and detailed feedback on our manuscript. We address each major comment point by point below, indicating where revisions have been made to improve clarity and rigor.


Referee: [Abstract / phase definition] Abstract and the phase-definition paragraph: the statement that 'local denoisers must fail' at the transition follows directly from the adopted definition (two distributions are in the same phase precisely when they are connected by spatially local operations). The manuscript must therefore derive, rather than assume, that spatial Markovianity is equivalent to (or a strict upper bound on) the non-existence of any local denoiser, not merely correlated with the specific local architectures tested.

Authors: We agree that the claim of local denoiser failure at the transition is a direct logical consequence of the phase definition, as two distributions belong to different phases precisely when no sequence of spatially local operations connects them along the diffusion path; this is derived from the definition rather than assumed. In the revised manuscript we have made this implication explicit in the abstract and the phase-definition section. On the link to spatial Markovianity, the original text presents it as an operational diagnostic supported by theoretical arguments connecting the loss of locality in the score function to the emergence of long-range correlations (i.e., Markovianity violation). We acknowledge that a general proof establishing equivalence or a strict upper bound for arbitrary local architectures is not supplied and would constitute a substantial extension. We have therefore added a clarifying paragraph that distinguishes the definitional necessity from the Markovianity criterion, notes the empirical correlation observed for the tested local networks, and flags a rigorous general proof as an open question for future work. revision: partial
Referee: [Numerical experiments] Numerical-validation section: the abstract states that the Markovianity criterion is validated on real-world datasets, yet supplies no quantitative details on how the transition point is located, what controls or baselines are used, or the observed effect sizes. Without these, it is unclear whether the rapid transition is observed independently of the definitional framework.

Authors: We thank the referee for identifying this gap in presentation. The revised numerical-validation section now supplies the requested quantitative information: the procedure used to locate the transition via the spatial Markovianity measure, the control experiments and baselines (including direct comparisons against global architectures), and the measured effect sizes on denoising performance for the real-world datasets examined. These additions demonstrate that the rapid transition is detected consistently and is not an artifact of the phase-definition framework itself. revision: yes
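The page does not reproduce how the transition point is actually located. Purely as an illustration of one simple option, assuming a scalar diagnostic such as a CMI estimate has been measured on a grid of diffusion times, the transition can be placed where that curve changes fastest. The helper `steepest_change` and the synthetic sigmoid below are hypothetical and are not the authors' procedure.

```python
import numpy as np


def steepest_change(times, diagnostic):
    """Return the midpoint of the grid interval where |d diagnostic / dt| is largest.

    `times` must be strictly increasing; a finite-difference slope is used, so the
    answer is only as fine as the sampling grid.
    """
    t = np.asarray(times, dtype=float)
    d = np.asarray(diagnostic, dtype=float)
    slope = np.abs(np.diff(d) / np.diff(t))
    k = int(np.argmax(slope))
    return 0.5 * (t[k] + t[k + 1])


# Synthetic curve only: a smooth step centred at t = 0.3.
times = np.linspace(0.0, 1.0, 101)
curve = 1.0 / (1.0 + np.exp(-(times - 0.3) / 0.02))
print(round(steepest_change(times, curve), 3))
```

With a noisy estimator, the raw finite differences would need smoothing or bootstrap resampling before the argmax is meaningful.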

Circularity Check

1 step flagged

Phase definition via local connectivity makes 'local denoisers must fail' at transition largely definitional rather than independently demonstrated

specific steps

self-definitional · [Abstract]
"We define two distributions as belonging to the same data distribution phase if they can be mutually connected via spatially local operations such as local denoisers, along the same evolution path as the diffusion. We demonstrate that the reverse denoising process consists of an early trivial phase and a late data phase, sandwiching a rapid phase transition where local denoisers must fail."

The demonstration that local denoisers must fail at the transition follows immediately from the preceding definition: distributions in different phases are precisely those that cannot be connected by local operations. Identifying a phase transition therefore entails the failure of local denoisers by the definition itself, rather than by an independent argument or external theorem.

full rationale

The paper's central claim that local denoisers must fail at the phase transition reduces directly to its own definition of phases as equivalence classes under spatially local operations along the diffusion path. By defining different phases as those not connectable by local denoisers, the failure at the transition point is true by construction once the transition is identified. The additional link to spatial Markovianity provides an operational diagnostic that is validated numerically on datasets, but does not independently derive the necessity claim from first principles outside the definitional framework. This produces partial circularity in the load-bearing step while leaving room for the empirical validation to add non-circular content.

Axiom & Free-Parameter Ledger

0 free parameters · 2 axioms · 1 invented entity

The framework rests on the definition of phases via local connectivity and the assumption that real data possesses low-dimensional spatial structure; no explicit free parameters or new particles are introduced in the abstract.

axioms (2)

domain assumption: Real-life data such as images is spatially structured in low-dimensional spaces.
Invoked in the opening paragraph to motivate local denoisers.
ad hoc to paper: Two distributions belong to the same phase if they can be mutually connected via spatially local operations along the diffusion path.
This is the central definitional axiom used to identify the phase transition.

invented entities (1)

data distribution phase · no independent evidence
purpose: To classify distributions according to whether they are reachable from each other by local denoisers during diffusion.
New conceptual object introduced to organize the denoising trajectory; no independent falsifiable prediction is stated in the abstract.

pith-pipeline@v0.9.0 · 5821 in / 1575 out tokens · 28976 ms · 2026-05-18T23:30:17.774602+00:00 · methodology


Lean theorems connected to this paper

Citations machine-checked in the Pith Canon. Every link opens the source theorem in the public Lean library.

IndisputableMonolith/Foundation/AbsoluteFloorClosure.lean · reality_from_one_distinction · unclear

Relation between the paper passage and the cited Recognition theorem.

We define two distributions as belonging to the same data distribution phase if they can be mutually connected via spatially local operations such as local denoisers, along the same evolution path as the diffusion.

What do these tags mean?

matches: The paper's claim is directly supported by a theorem in the formal canon.
supports: The theorem supports part of the paper's argument, but the paper may add assumptions or extra steps.
extends: The paper goes beyond the formal theorem; the theorem is a base layer rather than the whole result.
uses: The paper appears to rely on the theorem as machinery.
contradicts: The paper's claim conflicts with a theorem or certificate in the canon.
unclear: Pith found a possible connection, but the passage is too broad, indirect, or ambiguous to say the theorem truly supports the claim.

Forward citations

Cited by 3 Pith papers

Reviewed papers in the Pith corpus that reference this work. Sorted by Pith novelty score.

Concurrence of Symmetry Breaking and Nonlocality Phase Transitions in Diffusion Models
cs.LG 2026-05 unverdicted novelty 7.0

Symmetry breaking and nonlocality phase transitions occur nearly simultaneously during diffusion model generation in modern transformers.
Learning and Generating Mixed States Prepared by Shallow Channel Circuits
quant-ph 2026-04 unverdicted novelty 7.0

Any mixed state in the trivial phase can be efficiently learned and approximately generated by a shallow local channel circuit from polynomial measurements, without access to the original circuit.
Learning and Generating Mixed States Prepared by Shallow Channel Circuits
quant-ph 2026-04 unverdicted novelty 6.0

Mixed states in the trivial phase can be approximately generated by a learned shallow local channel circuit from measurement copies alone, with polynomial sample and runtime complexity.

Reference graph

Works this paper leans on

62 extracted references · 62 canonical work pages · cited by 2 Pith papers · 1 internal anchor

[1]

Y . Z. acknowledges support from NSF QuSEC-TAQS OSI 2326767. G. L. and X. G. acknowledge support from NSF PFC grant No. PHYS 2317149

work page
[2]

Sohl-Dickstein, E

J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Gan- guli, Deep unsupervised learning using nonequilibrium thermo- dynamics, inInternational Conference on Machine Learning, V ol. 37 (2015) pp. 2256–2265

work page 2015
[3]

Song and S

Y . Song and S. Ermon, Generative modeling by estimating gra- dients of the data distribution, inAdvances in Neural Informa- tion Processing Systems, V ol. 32 (2019)

work page 2019
[4]

J. Ho, A. Jain, and P. Abbeel, Denoising diffusion probabilistic models, inAdvances in Neural Information Processing Systems, V ol. 33 (2020) pp. 6840–6851

work page 2020
[5]

J. Song, C. Meng, and S. Ermon, Denoising diffusion implicit models, inInternational Conference on Learning Representa- tions(2021)

work page 2021
[6]

Y . Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Er- mon, and B. Poole, Score-based generative modeling through stochastic differential equations, inAdvances in Neural Infor- mation Processing Systems, V ol. 34 (2021)

work page 2021
[7]

Lipman, R

Y . Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, Flow matching for generative modeling, inInternational Con- ference on Learning Representations(2022)

work page 2022
[8]

Midjourney, Inc., Midjourney (2022)

work page 2022
[9]

Stability AI, Stable Diffusion (2022)

work page 2022
[10]

OpenAI, DALL·E 3 (2023)

work page 2023
[11]

Google DeepMind, Imagen 4 (2025)

work page 2025
[12]

Hyv ¨arinen, Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research6, 695 (2005)

A. Hyv ¨arinen, Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research6, 695 (2005)

work page 2005
[13]

Z. Wang, Y . Jiang, H. Zheng, P. Wang, P. He, Z. Wang, W. Chen, M. Zhou,et al., Patch diffusion: Faster and more data-efficient training of diffusion models, inAdvances in Neural Information Processing Systems, V ol. 36 (2023)

work page 2023
[14]

Z. Ding, M. Zhang, J. Wu, and Z. Tu, Patched denoising dif- fusion models for high-resolution image synthesis, inInterna- tional Conference on Learning Representations(2023)

work page 2023
[15]

An analytic theory of creativity in convolutional diffusion models

M. Kamb and S. Ganguli, An analytic theory of creativity in convolutional diffusion models, arXiv:2412.20292 [cs.LG] (2024)

work page arXiv 2024
[16]

Niedoba, B

M. Niedoba, B. Zwartsenberg, K. Murphy, and F. Wood, To- wards a mechanistic explanation of diffusion model generaliza- tion, arXiv:2411.19339 [cs.LG] (2024)

work page arXiv 2024
[17]

Chen, Z.-C

X. Chen, Z.-C. Gu, and X.-G. Wen, Local unitary trans- formation, long-range quantum entanglement, wave function renormalization, and topological order, Physical Review B82, 155138 (2010)

work page 2010
[18]

Coser and D

A. Coser and D. P ´erez-Garc´ıa, Classification of phases for mixed states via fast dissipative evolution, Quantum3, 174 (2019)

work page 2019
[19]

Sang and T

S. Sang and T. H. Hsieh, Stability of mixed-state quantum phases via finite markov length, Physical Review Letters134, 070403 (2025)

work page 2025
[20]

Biroli, T

G. Biroli, T. Bonnaire, V . de Bortoli, and M. M ´ezard, Dynam- ical regimes of diffusion models, Nature Communications15, 9957 (2024)

work page 2024
[21]

Raya and L

G. Raya and L. Ambrogioni, Spontaneous symmetry breaking in generative diffusion models, inAdvances in Neural Informa- tion Processing Systems, V ol. 36 (2023)

work page 2023
[22]

Li and S

M. Li and S. Chen, Critical windows: non-asymptotic theory for feature emergence in diffusion models, arXiv:2403.01633 [cs.LG] (2024)

work page arXiv 2024
[23]

Sclocchi, A

A. Sclocchi, A. Favero, N. I. Levi, and M. Wyart, Probing the latent hierarchical structure of data via diffusion models, arXiv:2410.13770 [stat.ML] (2024)

work page arXiv 2024
[24]

Sclocchi, A

A. Sclocchi, A. Favero, and M. Wyart, A phase transition in diffusion models reveals the hierarchical nature of data, arXiv:2402.16991 [stat.ML] (2024)

work page arXiv 2024
[25]

M. Li, A. Karan, and S. Chen, Blink of an eye: a simple theory for feature localization in generative models, arXiv:2502.00921 [cs.LG] (2025)

work page arXiv 2025
[26]

LeCun, C

Y . LeCun, C. Cortes, and C. J. Burges, MNIST hand- written digit database,http://yann.lecun.com/exdb/ mnist/(1998)

work page 1998
[27]

Petz, Sufficient subalgebras and the relative entropy of states of a von neumann algebra, Communications in Mathematical Physics105, 123–131 (1986)

D. Petz, Sufficient subalgebras and the relative entropy of states of a von neumann algebra, Communications in Mathematical Physics105, 123–131 (1986)

work page 1986
[28]

W. M. Mark,Quantum Information Theory(Cambridge Univer- sity Press, 2016)

work page 2016
[29]

Junge, R

M. Junge, R. Renner, D. Sutter, M. M. Wilde, and A. Winter, Universal recovery maps and approximate sufficiency of quan- tum relative entropy, Annales Henri Poincar ´e19, 2955–2978 (2018)

work page 2018
[30]

H. Kwon, R. Mukherjee, and M.-S. Kim, Reversing lindblad dynamics via continuous petz recovery map, Physical Review Letters128, 020403 (2022)

work page 2022
[31]

B. D. Anderson, Reverse-time diffusion equation models, Stochastic Processes and their Applications12, 313–326 (1982)

work page 1982
[32]

Li and A

K. Li and A. Winter, Squashed entanglement,k-extendibility, 8 quantum markov chains, and recovery maps, Foundations of Physics48, 910–924 (2018)

work page 2018
[33]

Fawzi and R

O. Fawzi and R. Renner, Quantum conditional mutual informa- tion and approximate markov chains, Communications in Math- ematical Physics340, 575–611 (2015)

work page 2015
[34]

Zhang and S

Y . Zhang and S. Gopalakrishnan, Conditional mutual informa- tion and information-theoretic phases of decohered gibbs states, arXiv:2502.13210 [quant-ph] (2025)

work page arXiv 2025
[35]

S. Sang. Private communications

work page
[36]

Rosenblatt, Remarks on some nonparametric estimates of a density function, The Annals of Mathematical Statistics27, 832 (1956)

M. Rosenblatt, Remarks on some nonparametric estimates of a density function, The Annals of Mathematical Statistics27, 832 (1956)

work page 1956
[37]

Parzen, On estimation of a probability density function and mode, The Annals of Mathematical Statistics33, 1065 (1962)

E. Parzen, On estimation of a probability density function and mode, The Annals of Mathematical Statistics33, 1065 (1962)

work page 1962
[38]

Heitz, L

E. Heitz, L. Belcour, and T. Chambon, Iterativeα-(de)blending: a minimalist deterministic diffusion model, inProceedings of ICLR 2023 / SIGGRAPH 2023 Conference Track(2023)

work page 2023
[39]

M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y . Bengio, A. Courville, and D. Hjelm, Mutual information neural estima- tion, inInternational Conference on Machine Learning, V ol. 80 (2018) pp. 531–540

work page 2018
[40]

Ronneberger, P

O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, inMedical Im- age Computing and Computer-Assisted Intervention (MICCAI) (Springer, 2015) pp. 234–241

work page 2015
[41]

Zhang, P

B. Zhang, P. Xu, X. Chen, and Q. Zhuang, Generative quantum machine learning via denoising diffusion probabilistic models, Physical Review Letters132, 100602 (2024)

work page 2024
[42]

Xinyu Liu, Jingze Zhuang, and Yi-Zhuang You, in prepara- tion. This work also leverages the Petz map to perform quan- tum diffusion models, and proposes a concrete scheme of weak measurement-based classical shadow tomography to learn the Petz map

work page
[43]

B. D. O. Anderson and I. B. Rhodes, Smoothing algorithms for nonlinear finite-dimensional systems, Stochastics9, 139–165 (1983)

work page 1983
[44]

H. Sun, L. Yu, B. Dai, D. Schuurmans, and H. Dai, Score-based continuous-time discrete diffusion models, arXiv:2211.16750 [cs.LG] (2022)

work page arXiv 2022
[45]

Sutter, M

D. Sutter, M. Tomamichel, and A. W. Harrow, Strengthened monotonicity of relative entropy via pinched petz recovery map, in2016 IEEE International Symposium on Information Theory (ISIT)(IEEE, 2016) p. 760–764

work page 2016
[46]

M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time. IV, Com- munications on Pure and Applied Mathematics30, 182 (1983)

work page 1983
[47]

S. Lu, M. Kan ´asz-Nagy, I. Kukuljan, and J. I. Cirac, Tensor networks and efficient descriptions of classical data, Physical Review A111, 032409 (2025)

work page 2025
[48]

Srivastava, G

N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural net- works from overfitting, J. Mach. Learn. Res.15, 1929–1958 (2014)

work page 1929
[49]

D. P. Kingma, Adam: A method for stochastic optimization, in International Conference on Learning Representations(2015)

work page 2015
[50]

Loshchilov and F

I. Loshchilov and F. Hutter, Decoupled weight decay regular- ization, inInternational Conference on Learning Representa- tions (ICLR)(2019)

work page 2019
[51]

Lee and W

K. Lee and W. Rhee, A benchmark suite for evaluating neu- ral mutual information estimators on unstructured datasets, in Advances in Neural Information Processing Systems(2025) pp. 46319–46338

work page 2025
[52]

FiLM: Visual Reasoning with a General Conditioning Layer

E. Perez, F. Strub, H. de Vries, V . Dumoulin, and A. Courville, Film: Visual reasoning with a general conditioning layer, arXiv:1709.07871 [cs.CV] (2024)

work page internal anchor Pith review Pith/arXiv arXiv 2024
[53]

M. S. Leifer and R. W. Spekkens, Towards a formulation of quantum theory as a causally neutral theory of bayesian infer- ence, Physical Review A88, 052130 (2013)

work page 2013
[54]

Khatri and M


[1] Y. Z. acknowledges support from NSF QuSEC-TAQS OSI 2326767. G. L. and X. G. acknowledge support from NSF PFC grant No. PHYS 2317149.

[2] J. Sohl-Dickstein, E. Weiss, N. Maheswaranathan, and S. Ganguli, Deep unsupervised learning using nonequilibrium thermodynamics, in International Conference on Machine Learning, Vol. 37 (2015) pp. 2256–2265.

[3] Y. Song and S. Ermon, Generative modeling by estimating gradients of the data distribution, in Advances in Neural Information Processing Systems, Vol. 32 (2019).

[4] J. Ho, A. Jain, and P. Abbeel, Denoising diffusion probabilistic models, in Advances in Neural Information Processing Systems, Vol. 33 (2020) pp. 6840–6851.

[5] J. Song, C. Meng, and S. Ermon, Denoising diffusion implicit models, in International Conference on Learning Representations (2021).

[6] Y. Song, J. Sohl-Dickstein, D. P. Kingma, A. Kumar, S. Ermon, and B. Poole, Score-based generative modeling through stochastic differential equations, in International Conference on Learning Representations (2021).

[7] Y. Lipman, R. T. Chen, H. Ben-Hamu, M. Nickel, and M. Le, Flow matching for generative modeling, in International Conference on Learning Representations (2022).

[8] Midjourney, Inc., Midjourney (2022).

[9] Stability AI, Stable Diffusion (2022).

[10] OpenAI, DALL·E 3 (2023).

[11] Google DeepMind, Imagen 4 (2025).

[12] A. Hyvärinen, Estimation of non-normalized statistical models by score matching, Journal of Machine Learning Research 6, 695 (2005).

[13] Z. Wang, Y. Jiang, H. Zheng, P. Wang, P. He, Z. Wang, W. Chen, M. Zhou, et al., Patch diffusion: Faster and more data-efficient training of diffusion models, in Advances in Neural Information Processing Systems, Vol. 36 (2023).

[14] Z. Ding, M. Zhang, J. Wu, and Z. Tu, Patched denoising diffusion models for high-resolution image synthesis, in International Conference on Learning Representations (2023).

[15] M. Kamb and S. Ganguli, An analytic theory of creativity in convolutional diffusion models, arXiv:2412.20292 [cs.LG] (2024).

[16] M. Niedoba, B. Zwartsenberg, K. Murphy, and F. Wood, Towards a mechanistic explanation of diffusion model generalization, arXiv:2411.19339 [cs.LG] (2024).

[17] X. Chen, Z.-C. Gu, and X.-G. Wen, Local unitary transformation, long-range quantum entanglement, wave function renormalization, and topological order, Physical Review B 82, 155138 (2010).

[18] A. Coser and D. Pérez-García, Classification of phases for mixed states via fast dissipative evolution, Quantum 3, 174 (2019).

[19] S. Sang and T. H. Hsieh, Stability of mixed-state quantum phases via finite Markov length, Physical Review Letters 134, 070403 (2025).

[20] G. Biroli, T. Bonnaire, V. de Bortoli, and M. Mézard, Dynamical regimes of diffusion models, Nature Communications 15, 9957 (2024).

[21] G. Raya and L. Ambrogioni, Spontaneous symmetry breaking in generative diffusion models, in Advances in Neural Information Processing Systems, Vol. 36 (2023).

[22] M. Li and S. Chen, Critical windows: non-asymptotic theory for feature emergence in diffusion models, arXiv:2403.01633 [cs.LG] (2024).

[23] A. Sclocchi, A. Favero, N. I. Levi, and M. Wyart, Probing the latent hierarchical structure of data via diffusion models, arXiv:2410.13770 [stat.ML] (2024).

[24] A. Sclocchi, A. Favero, and M. Wyart, A phase transition in diffusion models reveals the hierarchical nature of data, arXiv:2402.16991 [stat.ML] (2024).

[25] M. Li, A. Karan, and S. Chen, Blink of an eye: a simple theory for feature localization in generative models, arXiv:2502.00921 [cs.LG] (2025).

[26] Y. LeCun, C. Cortes, and C. J. Burges, MNIST handwritten digit database, http://yann.lecun.com/exdb/mnist/ (1998).

[27] D. Petz, Sufficient subalgebras and the relative entropy of states of a von Neumann algebra, Communications in Mathematical Physics 105, 123–131 (1986).

[28] M. M. Wilde, Quantum Information Theory (Cambridge University Press, 2016).

[29] M. Junge, R. Renner, D. Sutter, M. M. Wilde, and A. Winter, Universal recovery maps and approximate sufficiency of quantum relative entropy, Annales Henri Poincaré 19, 2955–2978 (2018).

[30] H. Kwon, R. Mukherjee, and M.-S. Kim, Reversing Lindblad dynamics via continuous Petz recovery map, Physical Review Letters 128, 020403 (2022).

[31] B. D. Anderson, Reverse-time diffusion equation models, Stochastic Processes and their Applications 12, 313–326 (1982).

[32] K. Li and A. Winter, Squashed entanglement, k-extendibility, quantum Markov chains, and recovery maps, Foundations of Physics 48, 910–924 (2018).

[33] O. Fawzi and R. Renner, Quantum conditional mutual information and approximate Markov chains, Communications in Mathematical Physics 340, 575–611 (2015).

[34] Y. Zhang and S. Gopalakrishnan, Conditional mutual information and information-theoretic phases of decohered Gibbs states, arXiv:2502.13210 [quant-ph] (2025).

[35] S. Sang, private communications.

[36] M. Rosenblatt, Remarks on some nonparametric estimates of a density function, The Annals of Mathematical Statistics 27, 832 (1956).

[37] E. Parzen, On estimation of a probability density function and mode, The Annals of Mathematical Statistics 33, 1065 (1962).

[38] E. Heitz, L. Belcour, and T. Chambon, Iterative α-(de)blending: a minimalist deterministic diffusion model, in Proceedings of ICLR 2023 / SIGGRAPH 2023 Conference Track (2023).

[39] M. I. Belghazi, A. Baratin, S. Rajeshwar, S. Ozair, Y. Bengio, A. Courville, and D. Hjelm, Mutual information neural estimation, in International Conference on Machine Learning, Vol. 80 (2018) pp. 531–540.

[40] O. Ronneberger, P. Fischer, and T. Brox, U-net: Convolutional networks for biomedical image segmentation, in Medical Image Computing and Computer-Assisted Intervention (MICCAI) (Springer, 2015) pp. 234–241.

[41] B. Zhang, P. Xu, X. Chen, and Q. Zhuang, Generative quantum machine learning via denoising diffusion probabilistic models, Physical Review Letters 132, 100602 (2024).

[42] Xinyu Liu, Jingze Zhuang, and Yi-Zhuang You, in preparation. This work also leverages the Petz map to implement quantum diffusion models and proposes a concrete scheme of weak-measurement-based classical shadow tomography to learn the Petz map.

[43] B. D. O. Anderson and I. B. Rhodes, Smoothing algorithms for nonlinear finite-dimensional systems, Stochastics 9, 139–165 (1983).

[44] H. Sun, L. Yu, B. Dai, D. Schuurmans, and H. Dai, Score-based continuous-time discrete diffusion models, arXiv:2211.16750 [cs.LG] (2022).

[45] D. Sutter, M. Tomamichel, and A. W. Harrow, Strengthened monotonicity of relative entropy via pinched Petz recovery map, in 2016 IEEE International Symposium on Information Theory (ISIT) (IEEE, 2016) pp. 760–764.

[46] M. D. Donsker and S. R. S. Varadhan, Asymptotic evaluation of certain Markov process expectations for large time. IV, Communications on Pure and Applied Mathematics 30, 182 (1983).

[47] S. Lu, M. Kanász-Nagy, I. Kukuljan, and J. I. Cirac, Tensor networks and efficient descriptions of classical data, Physical Review A 111, 032409 (2025).

[48] N. Srivastava, G. Hinton, A. Krizhevsky, I. Sutskever, and R. Salakhutdinov, Dropout: a simple way to prevent neural networks from overfitting, J. Mach. Learn. Res. 15, 1929–1958 (2014).

[49] D. P. Kingma and J. Ba, Adam: A method for stochastic optimization, in International Conference on Learning Representations (2015).

[50] I. Loshchilov and F. Hutter, Decoupled weight decay regularization, in International Conference on Learning Representations (ICLR) (2019).

[51] K. Lee and W. Rhee, A benchmark suite for evaluating neural mutual information estimators on unstructured datasets, in Advances in Neural Information Processing Systems (2025) pp. 46319–46338.

[52] E. Perez, F. Strub, H. de Vries, V. Dumoulin, and A. Courville, FiLM: Visual reasoning with a general conditioning layer, arXiv:1709.07871 [cs.CV] (2017).

[53] M. S. Leifer and R. W. Spekkens, Towards a formulation of quantum theory as a causally neutral theory of Bayesian inference, Physical Review A 88, 052130 (2013).

[54] S. Khatri and M. M. Wilde, Principles of quantum communication theory: A modern approach, arXiv:2011.04672 [quant-ph] (2020).

Supplementary Materials: Local Diffusion Models and Phases of Data Distributions

CONTENTS
S1 Derivation of Score-based Denoising from Bayes Formula
A Denoising for the continuous variable …


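Derivative of $\mathcal{N}(\rho)^{-1/2}$

Now we let $\chi = \mathcal{N}(\rho)^{1/2}$, namely $\chi^2 = \mathcal{N}(\rho)$. We notice that $\chi|_{\epsilon=0} = \rho^{1/2}$ and

$$\frac{\partial}{\partial\epsilon}\left(\chi^{-1}\right)\Big|_{\epsilon=0} = -\left(\chi|_{\epsilon=0}\right)^{-1}\frac{\partial\chi}{\partial\epsilon}\Big|_{\epsilon=0}\left(\chi|_{\epsilon=0}\right)^{-1} = -\rho^{-1/2}\,\frac{\partial\chi}{\partial\epsilon}\Big|_{\epsilon=0}\,\rho^{-1/2}. \tag{S62}$$

Then we only need to compute $\partial\chi/\partial\epsilon|_{\epsilon=0}$. Since $\chi^2 = \mathcal{N}(\rho)$, we have

$$\chi\,\frac{\partial\chi}{\partial\epsilon} + \frac{\partial\chi}{\partial\epsilon}\,\chi = \frac{\partial}{\partial\epsilon}\big(\mathcal{N}(\rho)\big). \tag{S63}$$

Here the symmetric division comes in at $\epsilon = 0$: the relation $\tfrac{1}{2}\{\rho^{1/2}, \partial\chi/\partial\epsilon|_{\epsilon=0}\}$ …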
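As a numerical sanity check of (S62) and (S63), the sketch below solves the $\epsilon = 0$ relation $\rho^{1/2}\kappa + \kappa\rho^{1/2} = \mathcal{L}(\rho)$ in the eigenbasis of $\rho$. In that basis the relation reduces to an entrywise division by $\mu_j + \mu_k$, where $\mu_j$ are the eigenvalues of $\rho^{1/2}$. The result is then compared with a finite-difference derivative of $\chi = \mathcal{N}(\rho)^{1/2}$.

The sketch rests on several assumptions:
- $\partial_\epsilon\mathcal{N}(\rho)|_{\epsilon=0} = \mathcal{L}(\rho)$.
- $\mathcal{L}$ is the single-jump Lindbladian $\mathcal{L}(\rho) = a\rho a^\dagger - \tfrac{1}{2}\{a^\dagger a, \rho\}$, the Schrödinger-picture counterpart of (S68).
- The random qutrit state and jump operator are hypothetical.
- The helper names are illustrative, not the authors' code.

```python
import numpy as np

def lindblad(rho, a):
    """Assumed single-jump Lindbladian: L(rho) = a rho a^dag - 1/2 {a^dag a, rho}."""
    ada = a.conj().T @ a
    return a @ rho @ a.conj().T - 0.5 * (ada @ rho + rho @ ada)

def herm_power(m, z):
    """m**z for a Hermitian positive-definite matrix m, via eigendecomposition."""
    w, v = np.linalg.eigh(m)
    return (v * w.astype(complex) ** z) @ v.conj().T

def symmetric_division(rho, x):
    """Solve rho^{1/2} k + k rho^{1/2} = x for k in the eigenbasis of rho."""
    w, v = np.linalg.eigh(rho)
    mu = np.sqrt(w)                                   # eigenvalues of rho^{1/2}
    x_eig = v.conj().T @ x @ v
    return v @ (x_eig / (mu[:, None] + mu[None, :])) @ v.conj().T

rng = np.random.default_rng(0)
d = 3
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = g @ g.conj().T + np.eye(d)                      # hypothetical well-conditioned state
rho /= np.trace(rho).real
a = 0.5 * (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))  # hypothetical jump operator

L_rho = lindblad(rho, a)
kappa = symmetric_division(rho, L_rho)                # d chi/d eps at eps = 0, from (S63)

# N(rho) = rho + eps L(rho) + O(eps^2), so a central difference of
# (rho + eps L(rho))^{1/2} approximates d chi/d eps at eps = 0.
h = 1e-6
kappa_fd = (herm_power(rho + h * L_rho, 0.5) - herm_power(rho - h * L_rho, 0.5)) / (2 * h)

# (S62): d(chi^{-1})/d eps = -rho^{-1/2} kappa rho^{-1/2}.
r_inv = herm_power(rho, -0.5)
dinv = -r_inv @ kappa @ r_inv
dinv_fd = (herm_power(rho + h * L_rho, -0.5) - herm_power(rho - h * L_rho, -0.5)) / (2 * h)

print(np.abs(kappa - kappa_fd).max(), np.abs(dinv - dinv_fd).max())
```

Both printed discrepancies should be at the level of finite-difference error. The eigenbasis solution is chosen because it makes the "symmetric division" explicit.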


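Derivative of $\mathcal{N}^\dagger$

This part is easy. For any operator $\tau$, we have

$$\frac{\partial}{\partial\epsilon}\big(\mathcal{N}^\dagger(\tau)\big)\Big|_{\epsilon=0} = \mathcal{L}^\dagger(\tau) = a^\dagger\tau a - \tfrac{1}{2}\left(a^\dagger a\,\tau + \tau\,a^\dagger a\right). \tag{S68}$$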
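The adjoint in (S68) is fixed by the duality $\mathrm{Tr}[\tau\,\mathcal{L}(\rho)] = \mathrm{Tr}[\mathcal{L}^\dagger(\tau)\,\rho]$. The minimal check below assumes the single-jump form $\mathcal{L}(\rho) = a\rho a^\dagger - \tfrac{1}{2}\{a^\dagger a, \rho\}$ and uses random test matrices (hypothetical). It also confirms $\mathcal{L}^\dagger(I) = 0$, which is the infinitesimal form of trace preservation of $\mathcal{N}$.

```python
import numpy as np

def lindblad(rho, a):
    """Assumed single-jump Lindbladian: L(rho) = a rho a^dag - 1/2 {a^dag a, rho}."""
    ada = a.conj().T @ a
    return a @ rho @ a.conj().T - 0.5 * (ada @ rho + rho @ ada)

def lindblad_adjoint(tau, a):
    """Heisenberg-picture generator from (S68): a^dag tau a - 1/2 (a^dag a tau + tau a^dag a)."""
    ada = a.conj().T @ a
    return a.conj().T @ tau @ a - 0.5 * (ada @ tau + tau @ ada)

rng = np.random.default_rng(1)
d = 4
# Arbitrary complex test matrices (rho need not be a state for the duality check).
a, rho, tau = (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)) for _ in range(3))

duality_gap = abs(np.trace(tau @ lindblad(rho, a)) - np.trace(lindblad_adjoint(tau, a) @ rho))
trace_leak = np.abs(lindblad_adjoint(np.eye(d), a)).max()   # L^dag(I) = 0 <=> Tr L(rho) = 0
print(duality_gap, trace_leak)
```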


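B Continuous-time Twirled Petz Map

The twirled Petz map is defined as

$$\mathcal{T}_{\mathcal{N},\rho}(\sigma) = \int_{-\infty}^{\infty}\mathrm{d}\theta\, f(\theta)\,\rho^{\frac{1-i\theta}{2}}\,\mathcal{N}^\dagger\Big[\mathcal{N}(\rho)^{\frac{-1+i\theta}{2}}\,\sigma\,\mathcal{N}(\rho)^{\frac{-1-i\theta}{2}}\Big]\,\rho^{\frac{1+i\theta}{2}}, \tag{S75}$$

where $f(\theta) = \dfrac{\pi}{2\left(\cosh(\pi\theta) + 1\right)}$.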
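A direct numerical rendering of (S75) is sketched below under several assumptions:
- An amplitude-damping channel serves as an illustrative $\mathcal{N}$; the text does not make this choice.
- The $\theta$ integral is truncated to $[-20, 20]$ and evaluated with a uniform Riemann sum.
- The function names are hypothetical.

The sketch checks two properties that follow from the definition. First, $\int f(\theta)\,\mathrm{d}\theta = 1$. Second, $\mathcal{T}_{\mathcal{N},\rho}(\mathcal{N}(\rho)) = \rho$: for $\sigma = \mathcal{N}(\rho)$ the bracket collapses to the identity, and $\mathcal{N}^\dagger(I) = I$ for a trace-preserving channel.

```python
import numpy as np

def herm_power(m, z):
    """m**z (z may be complex) for a Hermitian positive-definite matrix m."""
    w, v = np.linalg.eigh(m)
    return (v * w.astype(complex) ** z) @ v.conj().T

def amplitude_damping(gamma):
    """Kraus operators of qubit amplitude damping (illustrative choice of N)."""
    return [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
            np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

def apply(kraus, x):
    return sum(k @ x @ k.conj().T for k in kraus)

def apply_adjoint(kraus, x):
    return sum(k.conj().T @ x @ k for k in kraus)

def f(theta):
    """Twirling weight from (S75)."""
    return np.pi / (2 * (np.cosh(np.pi * theta) + 1))

def twirled_petz(kraus, rho, sigma, theta_max=20.0, n=4001):
    """Riemann-sum approximation of T_{N,rho}(sigma) on [-theta_max, theta_max]."""
    thetas, dtheta = np.linspace(-theta_max, theta_max, n, retstep=True)
    n_rho = apply(kraus, rho)
    out = np.zeros((rho.shape[0], rho.shape[0]), dtype=complex)
    for th in thetas:
        inner = (herm_power(n_rho, (-1 + 1j * th) / 2) @ sigma
                 @ herm_power(n_rho, (-1 - 1j * th) / 2))
        rotated = (herm_power(rho, (1 - 1j * th) / 2) @ apply_adjoint(kraus, inner)
                   @ herm_power(rho, (1 + 1j * th) / 2))
        out += f(th) * dtheta * rotated
    return out

thetas, dtheta = np.linspace(-20.0, 20.0, 4001, retstep=True)
weight_total = f(thetas).sum() * dtheta                   # should be close to 1

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])    # hypothetical full-rank state
kraus = amplitude_damping(0.3)
recovered = twirled_petz(kraus, rho, apply(kraus, rho))
print(weight_total, np.abs(recovered - rho).max())
```

Because $f$ decays like $e^{-\pi|\theta|}$, truncating at $|\theta| = 20$ is harmless here.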

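Derivative of $\mathcal{P}_{\mathcal{N},\rho}$

Now we can expand $\mathcal{P}_{\mathcal{N},\rho}(\sigma)$ into

$$\begin{aligned}
\mathcal{P}_{\mathcal{N},\rho}(\sigma) &= \rho^{1/2}\Big[\mathcal{N}(\rho)^{-1/2}\sigma\mathcal{N}(\rho)^{-1/2} + \epsilon\,\mathcal{L}^\dagger\big(\mathcal{N}(\rho)^{-1/2}\sigma\mathcal{N}(\rho)^{-1/2}\big) + O(\epsilon^2)\Big]\rho^{1/2} \\
&= \rho^{1/2}\Big[\rho^{-1/2} - \epsilon\,\rho^{-1/2}\mathcal{L}_{\rho^{1/2}}\big[\tfrac{1}{2}\mathcal{L}(\rho)\big]\rho^{-1/2} + O(\epsilon^2)\Big]\,\sigma\,\Big[\rho^{-1/2} - \epsilon\,\rho^{-1/2}\mathcal{L}_{\rho^{1/2}}\big[\tfrac{1}{2}\mathcal{L}(\rho)\big]\rho^{-1/2} + O(\epsilon^2)\Big]\rho^{1/2} \\
&\quad + \epsilon\,\rho^{1/2}\mathcal{L}^\dagger\big(\mathcal{N}(\rho)^{-1/2}\sigma\mathcal{N}(\rho)^{-1/2}\big)\rho^{1/2} + O(\epsilon^2) \\
&= \sigma + \epsilon\,( -\mathcal{L}_{\rho^{1/2}}\big[\tfrac{1}{2}\mathcal{L}(\rho)\big]\rho^{-1/2}\sigma - \sigma\rho^{-1/2}\mathcal{L}_{\rho^{1/2}}\big[\tfrac{1}{2}\mathcal{L}(\rho)\big] + \rho^{1/2}\mathcal{L}^\dagger( \rho^{-1/2} \cdots
\end{aligned}$$

…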
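The expansion can be checked numerically. We read $\mathcal{L}_{\rho^{1/2}}[\tfrac{1}{2}\mathcal{L}(\rho)]$ as the solution $\kappa$ of $\rho^{1/2}\kappa + \kappa\rho^{1/2} = \mathcal{L}(\rho)$; this interpretation of the symmetric division is ours. Under that reading, the first two lines imply the first-order coefficient $-\kappa\rho^{-1/2}\sigma - \sigma\rho^{-1/2}\kappa + \rho^{1/2}\mathcal{L}^\dagger(\rho^{-1/2}\sigma\rho^{-1/2})\rho^{1/2}$.

The sketch makes these further assumptions:
- $\mathcal{N} = e^{\epsilon\mathcal{L}}$ with jump operator $a = |0\rangle\langle 1|$, realized exactly as amplitude damping with $\gamma = 1 - e^{-\epsilon}$.
- The states $\rho$ and $\sigma$ are hypothetical.

It also confirms the exact recovery property $\mathcal{P}_{\mathcal{N},\rho}(\mathcal{N}(\rho)) = \rho$.

```python
import numpy as np

def herm_power(m, z):
    """m**z for a Hermitian positive-definite matrix m."""
    w, v = np.linalg.eigh(m)
    return (v * w.astype(complex) ** z) @ v.conj().T

def kraus_ad(eps):
    """Amplitude damping with gamma = 1 - e^{-eps}; equals e^{eps L} for a = |0><1|."""
    gamma = -np.expm1(-eps)
    return [np.array([[1, 0], [0, np.sqrt(1 - gamma)]], dtype=complex),
            np.array([[0, np.sqrt(gamma)], [0, 0]], dtype=complex)]

def petz(kraus, rho, sigma):
    """P_{N,rho}(sigma) = rho^{1/2} N^dag[N(rho)^{-1/2} sigma N(rho)^{-1/2}] rho^{1/2}."""
    n_rho = sum(k @ rho @ k.conj().T for k in kraus)
    n_inv = herm_power(n_rho, -0.5)
    inner = n_inv @ sigma @ n_inv
    adj = sum(k.conj().T @ inner @ k for k in kraus)
    r_half = herm_power(rho, 0.5)
    return r_half @ adj @ r_half

a = np.array([[0, 1], [0, 0]], dtype=complex)              # jump operator |0><1|
ada = a.conj().T @ a

def lindblad(x):
    return a @ x @ a.conj().T - 0.5 * (ada @ x + x @ ada)

def lindblad_adj(x):
    return a.conj().T @ x @ a - 0.5 * (ada @ x + x @ ada)

def symmetric_division(rho, x):
    """Solve rho^{1/2} k + k rho^{1/2} = x in the eigenbasis of rho."""
    w, v = np.linalg.eigh(rho)
    mu = np.sqrt(w)
    return v @ ((v.conj().T @ x @ v) / (mu[:, None] + mu[None, :])) @ v.conj().T

rho = np.array([[0.7, 0.2 - 0.1j], [0.2 + 0.1j, 0.3]])     # hypothetical reference state
sigma = np.array([[0.4, 0.1j], [-0.1j, 0.6]])              # hypothetical input state

# Exact recovery at finite eps: P_{N,rho}(N(rho)) = rho.
k_half = kraus_ad(0.5)
n_rho = sum(k @ rho @ k.conj().T for k in k_half)
recovery_err = np.abs(petz(k_half, rho, n_rho) - rho).max()

# First-order coefficient implied by the expansion, with kappa from the symmetric division.
kappa = symmetric_division(rho, lindblad(rho))
r_m, r_p = herm_power(rho, -0.5), herm_power(rho, 0.5)
first_order = (-kappa @ r_m @ sigma - sigma @ r_m @ kappa
               + r_p @ lindblad_adj(r_m @ sigma @ r_m) @ r_p)

eps = 1e-6
finite_diff = (petz(kraus_ad(eps), rho, sigma) - sigma) / eps
print(recovery_err, np.abs(finite_diff - first_order).max())
```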


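Derivative of $\mathcal{N}(\rho)^{\frac{-1+i\theta}{2}}$

Now we let $\chi_\theta = \mathcal{N}(\rho)^{\frac{1-i\theta}{2}}$, namely $\chi_\theta\chi_\theta^\dagger = \mathcal{N}(\rho)$. We notice that $\chi_\theta|_{\epsilon=0} = \rho^{\frac{1-i\theta}{2}}$ and

$$\frac{\partial}{\partial\epsilon}\big(\chi_\theta^{-1}\big)\Big|_{\epsilon=0} = -\big(\chi_\theta|_{\epsilon=0}\big)^{-1}\frac{\partial\chi_\theta}{\partial\epsilon}\Big|_{\epsilon=0}\big(\chi_\theta|_{\epsilon=0}\big)^{-1} = -\rho^{\frac{-1+i\theta}{2}}\,\frac{\partial\chi_\theta}{\partial\epsilon}\Big|_{\epsilon=0}\,\rho^{\frac{-1+i\theta}{2}}. \tag{S77}$$

Then we only need to compute $\kappa_\theta = \partial\chi_\theta/\partial\epsilon|_{\epsilon=0}$. Since $\chi_\theta\chi_\theta^\dagger = \chi_\theta^\dagger\chi_\theta = \mathcal{N}(\rho)$, we have

$$\chi_\theta\,\frac{\partial\chi_\theta^\dagger}{\partial\epsilon} + \frac{\partial\chi_\theta}{\partial\epsilon}\,\chi_\theta^\dagger = \frac{\partial}{\partial\epsilon}\big(\mathcal{N}(\rho)\big), \tag{S78}$$

$\chi_\theta^\dagger\,\partial\chi_\theta/\partial\epsilon + \partial\chi_\theta^\dagger/\partial\epsilon$ …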
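The derivation goes on to solve (S78) for $\kappa_\theta$, but the rest of that step is cut off here. One standard route, not necessarily the one the text takes, is the Daleckii–Krein formula. Write $\rho = \sum_j \lambda_j |j\rangle\langle j|$. In this eigenbasis, the derivative of $(\rho + \epsilon X)^z$ at $\epsilon = 0$ has off-diagonal entries $\frac{\lambda_j^z - \lambda_k^z}{\lambda_j - \lambda_k} X_{jk}$ and diagonal entries $z\lambda_j^{z-1}X_{jj}$.

The sketch below takes $z = (1 - i\theta)/2$ and $X = \mathcal{L}(\rho)$, with an assumed single-jump Lindbladian and hypothetical random inputs. It checks the resulting $\kappa_\theta$ against:
- finite differences;
- (S78) and its counterpart from $\chi_\theta^\dagger\chi_\theta = \mathcal{N}(\rho)$;
- (S77).

```python
import numpy as np

def lindblad(rho, a):
    """Assumed single-jump Lindbladian L(rho) = a rho a^dag - 1/2 {a^dag a, rho}."""
    ada = a.conj().T @ a
    return a @ rho @ a.conj().T - 0.5 * (ada @ rho + rho @ ada)

def herm_power(m, z):
    """m**z (z may be complex) for a Hermitian positive-definite matrix m."""
    w, v = np.linalg.eigh(m)
    return (v * w.astype(complex) ** z) @ v.conj().T

def power_derivative(rho, x, z):
    """d/d eps (rho + eps x)^z at eps = 0, via the Daleckii-Krein formula."""
    lam, v = np.linalg.eigh(rho)
    lam_c = lam.astype(complex)
    diff = lam[:, None] - lam[None, :]
    same = np.abs(diff) < 1e-12                       # diagonal or degenerate pairs
    safe = np.where(same, 1.0, diff)                  # avoid division by zero
    divided = np.where(same,
                       z * lam_c[:, None] ** (z - 1),
                       (lam_c[:, None] ** z - lam_c[None, :] ** z) / safe)
    return v @ (divided * (v.conj().T @ x @ v)) @ v.conj().T

rng = np.random.default_rng(2)
d, theta = 3, 0.7
g = rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d))
rho = g @ g.conj().T + np.eye(d)
rho /= np.trace(rho).real                                  # hypothetical test state
a = 0.5 * (rng.normal(size=(d, d)) + 1j * rng.normal(size=(d, d)))
X = lindblad(rho, a)                                       # dN(rho)/d eps at eps = 0

z = (1 - 1j * theta) / 2
chi = herm_power(rho, z)                                   # chi_theta at eps = 0
kappa = power_derivative(rho, X, z)                        # kappa_theta

h = 1e-6
kappa_fd = (herm_power(rho + h * X, z) - herm_power(rho - h * X, z)) / (2 * h)

# (S78) and its counterpart obtained from chi^dag chi = N(rho).
res1 = chi @ kappa.conj().T + kappa @ chi.conj().T - X
res2 = chi.conj().T @ kappa + kappa.conj().T @ chi - X
# (S77): d(chi^{-1})/d eps = -rho^{(-1+i theta)/2} kappa rho^{(-1+i theta)/2}.
r_inv = herm_power(rho, -z)
dinv_fd = (herm_power(rho + h * X, -z) - herm_power(rho - h * X, -z)) / (2 * h)
res3 = -r_inv @ kappa @ r_inv - dinv_fd
print(*(np.abs(r).max() for r in (kappa - kappa_fd, res1, res2, res3)))
```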


Consider a state with Wigner distribution $W(x,p) = \frac{1}{2\pi}P(x)$.

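Derivative of $\mathcal{T}_{\mathcal{N},\rho}$

Let $\mathcal{T}_{\mathcal{N},\rho}(\sigma) = \int_{-\infty}^{\infty}\mathrm{d}\theta\, f(\theta)\,\mathcal{R}_{\mathcal{N},\rho,\theta}(\sigma)$, where $\mathcal{R}_{\mathcal{N},\rho,\theta}(\sigma)$ is the rotated Petz map,

$$\begin{aligned}
\mathcal{R}_{\mathcal{N},\rho,\theta}(\sigma) &= \rho^{\frac{1-i\theta}{2}}\Big[\mathcal{N}(\rho)^{\frac{-1+i\theta}{2}}\sigma\mathcal{N}(\rho)^{\frac{-1-i\theta}{2}} + \epsilon\,\mathcal{L}^\dagger\big(\mathcal{N}(\rho)^{\frac{-1+i\theta}{2}}\sigma\mathcal{N}(\rho)^{\frac{-1-i\theta}{2}}\big) + O(\epsilon^2)\Big]\rho^{\frac{1+i\theta}{2}} \\
&= \rho^{\frac{1-i\theta}{2}}\Big[\rho^{\frac{-1+i\theta}{2}} - \epsilon\,\mathcal{L}_{\rho^{1/2},\theta}\,\rho^{\frac{-1+i\theta}{2}}\mathcal{L}(\rho)\rho^{\frac{-1+i\theta}{2}} + O(\epsilon^2)\Big]\,\sigma\,\Big[\rho^{\frac{-1-i\theta}{2}} - \epsilon\,\mathcal{L}_{\rho^{1/2},-\theta}\,\rho^{\frac{-1-i\theta}{2}}\mathcal{L}(\rho)\rho^{\frac{-1-i\theta}{2}} + O(\epsilon^2)\Big]\rho^{\frac{1+i\theta}{2}} \\
&\quad + \epsilon\,\rho^{\frac{1-i\theta}{2}}\mathcal{L}^\dagger\big(\mathcal{N}(\rho)^{\frac{-1+i\theta}{2}}\sigma\cdots
\end{aligned}$$

…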


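Dissipative term in continuous-time Petz map

Now we can compute $\mathcal{D}[\hat b]\hat\sigma = \mathcal{D}\big[\hat\rho^{1/2}\hat p\,\hat\rho^{-1/2}\big]\hat\sigma$, where $\hat\sigma = \int\mathrm{d}x\, Q(x)\,|x\rangle\langle x|$. Firstly,

$$\hat\rho^{1/2}\hat p\,\hat\rho^{-1/2}|\psi\rangle \leftrightarrow \sqrt{P}\,(-i\partial_x)\frac{1}{\sqrt{P}}\,\psi = -i\Big[\partial_x - \tfrac{1}{2}(\partial_x\ln P)\Big]\psi, \tag{S108}$$

namely $\hat b \leftrightarrow b = -i\big[\partial_x - \tfrac{1}{2}(\partial_x\ln P)\big]$. Similarly,

$$\hat\rho^{-1/2}\hat p\,\hat\rho^{1/2}|\psi\rangle \leftrightarrow \frac{1}{\sqrt{P}}\,(-i\partial_x)\sqrt{P}\,\psi = -i\Big[\partial_x + \tfrac{1}{2}(\partial_x\ln P)\Big]\psi, \tag{S109}$$

namely $\hat b^\dagger \leftrightarrow b^\dagger = -i\big[\partial_x + \tfrac{1}{2}(\partial_x\ln P)\big]$. …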
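The position-space forms in (S108) and (S109) are product-rule identities, and they can be confirmed on a grid. The sketch below assumes a standard normal $P$, for which the score $s = \partial_x\ln P$ equals $-x$. It also assumes a hypothetical complex test wavefunction and uses central finite differences from `np.gradient`.

```python
import numpy as np

x = np.linspace(-6.0, 6.0, 4001)
P = np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)       # assumed example density (standard normal)
s = np.gradient(np.log(P), x)                     # score s = d/dx ln P (here s = -x)
psi = np.exp(-(x - 0.5) ** 2) * (1 + 0.3j * x)    # hypothetical test wavefunction

def ddx(f):
    """Central finite-difference derivative on the grid x."""
    return np.gradient(f, x)

# (S108): sqrt(P) (-i d/dx) (psi / sqrt(P)) = -i [d/dx - s/2] psi
b_direct = np.sqrt(P) * (-1j) * ddx(psi / np.sqrt(P))
b_formula = -1j * (ddx(psi) - 0.5 * s * psi)

# (S109): (1/sqrt(P)) (-i d/dx) (sqrt(P) psi) = -i [d/dx + s/2] psi
bdag_direct = (-1j) * ddx(np.sqrt(P) * psi) / np.sqrt(P)
bdag_formula = -1j * (ddx(psi) + 0.5 * s * psi)

print(np.abs(b_direct - b_formula).max(), np.abs(bdag_direct - bdag_formula).max())
```

The residuals should shrink as $O(h^2)$ with the grid spacing $h$.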


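Hamiltonian term in continuous-time Petz map

Before computing $-i[\hat R, \hat\sigma]$, we recall that

$$\hat R = -\frac{i}{2}\int\mathrm{d}x\,\mathrm{d}x'\,\frac{\sqrt{P(x)} - \sqrt{P(x')}}{\sqrt{P(x)} + \sqrt{P(x')}}\,\langle x|\hat p^2 + \hat b^\dagger\hat b|x'\rangle\,|x\rangle\langle x'|. \tag{S116}$$

We first check that

$$\hat p^2|\psi\rangle \leftrightarrow -\partial_x^2\psi, \tag{S117}$$

$$\hat b^\dagger\hat b|\psi\rangle \leftrightarrow \Big[-\partial_x^2 + \tfrac{1}{2}s' + \tfrac{1}{4}s^2\Big]\psi. \tag{S118}$$

We notice that $\int\mathrm{d}x\,\mathrm{d}x'\,\frac{\sqrt{P(x)} - \sqrt{P(x')}}{\sqrt{P(x)} + \sqrt{P(x')}}\,\langle x|\tfrac{1}{2}s'(x) + \tfrac{1}{4}s(x)^2|x'\rangle\,|x\rangle\langle\cdots$ …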
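Equation (S118) follows from composing the operators in (S108) and (S109): $b^\dagger b\,\psi = -\psi'' + \tfrac{1}{2}(s\psi)' - \tfrac{1}{2}s\psi' + \tfrac{1}{4}s^2\psi$, which simplifies to the stated form. The grid check below assumes a bimodal Gaussian-mixture $P$, chosen so that $s'$ is not constant, and a hypothetical test wavefunction. Both are illustrative choices.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)

def ddx(f):
    """Central finite-difference derivative on the grid x."""
    return np.gradient(f, x)

def gauss(mu, sd):
    return np.exp(-(x - mu) ** 2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

P = 0.5 * gauss(-1.5, 0.8) + 0.5 * gauss(1.5, 0.8)   # assumed bimodal example density
s = ddx(np.log(P))                                   # score function
psi = np.exp(-(x - 0.5) ** 2) * (1 + 0.3j * x)       # hypothetical test wavefunction

def b(f):
    return -1j * (ddx(f) - 0.5 * s * f)              # position form of b-hat, (S108)

def b_dag(f):
    return -1j * (ddx(f) + 0.5 * s * f)              # position form of b-hat^dag, (S109)

lhs = b_dag(b(psi))
rhs = -ddx(ddx(psi)) + (0.5 * ddx(s) + 0.25 * s**2) * psi   # (S118)
print(np.abs(lhs - rhs).max())
```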


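Final expression of continuous-time Petz map under decoherence limit

Finally, we have (remember that $s = \partial_x\ln P(x)$ is the score function)

$$-i[\hat R, \hat\sigma]\,|\psi\rangle \leftrightarrow -\tfrac{1}{2}sQ'\,\psi, \tag{S124}$$

$$\mathcal{D}[\hat b]\hat\sigma\,|\psi\rangle \leftrightarrow \Big[-\tfrac{1}{2}sQ' - s'Q + \tfrac{1}{2}Q''\Big]\psi, \tag{S125}$$

$$\big(-i[\hat R, \hat\sigma] + \mathcal{D}[\hat b]\hat\sigma\big)|\psi\rangle \leftrightarrow \Big[-\partial_x(sQ) + \tfrac{1}{2}Q''\Big]\psi. \tag{S126}$$

We note here that neither $-i[\hat R, \hat\sigma]$ nor $\mathcal{D}[\hat b]\hat\sigma$ is trace-class, but their sum is trace-c…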
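By (S126), under the decoherence limit the continuous-time Petz map acts on a position-diagonal density $Q$ through the generator $-\partial_x(sQ) + \tfrac{1}{2}Q''$. This is a Fokker–Planck operator whose drift is the score. Two consequences can be checked on a grid:
- Since $sP = P'$, for $Q = P$ the generator reduces to $-\tfrac{1}{2}P''$, so it runs the heat equation $\partial_t P = \tfrac{1}{2}P''$ backwards.
- For a generic decaying $Q$, the generator conserves total probability.

The sketch assumes a bimodal Gaussian mixture for $P$ and a Gaussian test density $Q$, both hypothetical, and uses central finite differences.

```python
import numpy as np

x = np.linspace(-8.0, 8.0, 4001)
dx = x[1] - x[0]

def ddx(f):
    """Central finite-difference derivative on the grid x."""
    return np.gradient(f, x)

def gauss(mu, sd):
    return np.exp(-(x - mu) ** 2 / (2 * sd**2)) / (sd * np.sqrt(2 * np.pi))

P = 0.5 * gauss(-1.5, 0.8) + 0.5 * gauss(1.5, 0.8)   # assumed example data density
s = ddx(np.log(P))                                   # score s = d/dx ln P

def petz_generator(Q):
    """Right-hand side of (S126) acting on a position-diagonal density Q(x)."""
    return -ddx(s * Q) + 0.5 * ddx(ddx(Q))

# Q = P: since s P = P', the generator reduces to -P''/2 (heat flow reversed).
identity_err = np.abs(petz_generator(P) + 0.5 * ddx(ddx(P))).max()

# Generic Q: the generator changes Q but conserves its total mass.
Q = gauss(0.3, 1.2)
mass_change = abs(petz_generator(Q).sum() * dx)
print(identity_err, mass_change)
```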
