When Do Local Score Models Extrapolate Across Size? A Diagnostic Theory and Benchmark

Wenjie Xi

arxiv: 2606.09705 · v1 · pith:GCWZUG6Inew · submitted 2026-06-08 · 💻 cs.LG · cond-mat.stat-mech

Pith reviewed 2026-06-27 16:55 UTC · model grok-4.3

keywords: score-based generative models · size extrapolation · local models · quasi-locality · Tweedie's formula · spatial mixing · diffusion models · benchmark

The pith

Local score models extrapolate across sizes only when their receptive field covers the quasi-locality range of the Gaussian-smoothed score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that translation-invariant architectures alone do not ensure stable size transfer in score-based generative models. Stable extrapolation instead requires that a model's receptive field encompass the response range of the Gaussian-smoothed score, because far-away perturbations reach local score components through posterior covariance. This dependence is formalized in a size-uniform comparison theorem for local marginals under reverse diffusion. The author introduces the Finite-Depth Local Flow (FDLF) benchmark, with exact scores and controllable response ranges, to isolate the role of spatial mixing. When mixing is strong, the smoothed score stays quasi-local and extrapolation holds; when mixing weakens, locality degrades and transfer fails.

Core claim

Stable extrapolation is governed by the quasi-locality of the Gaussian-smoothed score: a local model succeeds only if its receptive field covers the smoothed score's response range, formalized by a size-uniform comparison theorem for local marginals under reverse diffusion. Through Tweedie's formula, far-away perturbations can influence local score components via posterior covariance. Under spatial mixing, the smoothed score remains quasi-local relative to the receptive field, enabling stable extrapolation; when spatial mixing weakens, the score's locality rapidly degrades and size transfer fails.
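
For reference, the standard identities behind this mechanism can be written out as follows. The notation is assumed here for illustration and may differ from the paper's: $x$ is the clean configuration, $y$ its Gaussian-noised version at noise level $\sigma$, $p_\sigma$ the smoothed density, and $s_\sigma$ its score.

```latex
\begin{align}
  y &= x + \sigma\varepsilon, \qquad \varepsilon \sim \mathcal{N}(0, I), \\
  s_\sigma(y) &:= \nabla_y \log p_\sigma(y)
      = \frac{\mathbb{E}[x \mid y] - y}{\sigma^{2}}, \\
  \frac{\partial s_{\sigma,i}(y)}{\partial y_j}
      &= \frac{\operatorname{Cov}(x_i, x_j \mid y)}{\sigma^{4}}
       - \frac{\delta_{ij}}{\sigma^{2}}.
\end{align}
```

The second line is Tweedie's formula; the third is its derivative. For $i \neq j$, the sensitivity of score component $i$ to a perturbation at site $j$ equals the posterior covariance scaled by $\sigma^{-4}$, so the response range of the smoothed score extends as far as posterior correlations do.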

What carries the argument

Quasi-locality of the Gaussian-smoothed score, which sets the required receptive-field size through posterior covariance in Tweedie's formula.

If this is right

Under sufficient spatial mixing the smoothed score stays quasi-local relative to any fixed receptive field, permitting stable size extrapolation.
Weakening spatial mixing causes the smoothed score's effective range to grow, so models with fixed receptive fields lose extrapolation capability.
The Finite-Depth Local Flow construction supplies exact scores, densities, and tunable response ranges for isolating the quasi-locality mechanism.
Architectural translation invariance is necessary but not sufficient; receptive-field width must also match the quasi-local range.
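
As a rough illustration of the last point, the question "does the receptive field cover the response range?" can be sketched for a stack of stride-1 convolutions. The helper names, the `(kernel_size, dilation)` layer encoding, and the example numbers are hypothetical and not taken from the paper; the response range is assumed to have been measured separately.

```python
from typing import Sequence, Tuple


def receptive_field_radius(layers: Sequence[Tuple[int, int]]) -> int:
    """Radius (in lattice sites) of a stack of stride-1 convolutions.

    Each layer is given as (kernel_size, dilation). Kernel sizes are
    assumed odd, so every layer extends the radius symmetrically.
    """
    radius = 0
    for kernel_size, dilation in layers:
        if kernel_size % 2 == 0:
            raise ValueError("this sketch assumes odd kernel sizes")
        radius += dilation * (kernel_size - 1) // 2
    return radius


def covers_response_range(layers: Sequence[Tuple[int, int]],
                          response_range: int) -> bool:
    """True if the receptive-field radius reaches the given response range."""
    return receptive_field_radius(layers) >= response_range


# Hypothetical model: four 3-wide convolutions, no dilation.
cnn = [(3, 1)] * 4
print(receptive_field_radius(cnn))    # 4
print(covers_response_range(cnn, 3))  # True
print(covers_response_range(cnn, 6))  # False
```

Translation invariance lets the same network run on any system size, but the check above depends only on the layer stack and the response range, which is the quantity the paper argues must be compared.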

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

Design of scalable models for physical systems should estimate mixing length first and size receptive fields accordingly rather than relying on invariance alone.
The same quasi-locality diagnostic could be applied to other generative settings that assume local score or density approximations.
Controlled benchmarks with tunable mixing could be used to test whether the size-uniform comparison theorem holds for discrete or graph-structured data.

Load-bearing premise

The influence of far-away perturbations on local score components occurs exclusively through posterior covariance as given by Tweedie's formula, and spatial mixing properties remain stationary enough for the quasi-locality range to be well-defined independently of system size.

What would settle it

An experiment in which a local model whose receptive field is smaller than the measured response range of the smoothed score still produces accurate extrapolation on systems where spatial mixing is deliberately weakened.
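
To make "measured response range" concrete, here is a minimal sketch on a toy model where the smoothed score is known in closed form. It is not the FDLF benchmark: the Gaussian chain with covariance `rho**|i-j|`, the tolerance `tol`, and the function names are all assumptions chosen for illustration. For Gaussian data the smoothed density is again Gaussian, so the score Jacobian is a constant matrix and its decay can be read off directly.

```python
import numpy as np


def smoothed_score_jacobian(rho: float, sigma: float, n: int) -> np.ndarray:
    """Exact Jacobian of the smoothed score for x ~ N(0, C), C_ij = rho**|i-j|.

    The smoothed density is N(0, C + sigma^2 I), so the score is
    s(y) = -(C + sigma^2 I)^{-1} y and its Jacobian does not depend on y.
    """
    idx = np.arange(n)
    cov = rho ** np.abs(idx[:, None] - idx[None, :])
    return -np.linalg.inv(cov + sigma**2 * np.eye(n))


def response_range(jac: np.ndarray, tol: float = 1e-3) -> int:
    """Largest distance at which the centre row is at least tol times its diagonal."""
    centre = jac.shape[0] // 2
    row = np.abs(jac[centre])
    far = np.nonzero(row >= tol * row[centre])[0]
    return int(np.max(np.abs(far - centre)))


# Short correlation length (strong mixing) versus long correlation length.
strong = response_range(smoothed_score_jacobian(rho=0.3, sigma=1.0, n=201))
weak = response_range(smoothed_score_jacobian(rho=0.95, sigma=1.0, n=201))
print("response range, strong mixing:", strong)
print("response range, weak mixing:", weak)
```

In this toy setting the second range comes out several times larger than the first. A falsification test of the kind described above would compare such a measured range with a model's receptive-field radius and then check whether extrapolation error stays small even when the radius falls short.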

Figures

Figures reproduced from arXiv: 2606.09705 by Wenjie Xi.

**Figure 1.** Fixed-architecture pure-CNN size extrapolation on the 2D FDLF benchmark, aggregated over three indepen…

**Figure 2.** Pure-CNN receptive-field sweep on the 2D FDLF benchmark, aggregated over three independent training…

**Figure 3.** Hard-valid mixed discrete-continuous check. Models are trained at…

**Figure 4.** Boundary response of the smoothed score near the Ising critical point. A local CNN with radius…

**Figure 1.** 2D fixed-size sweep. Teacher families: short-range simple, short-range structured, and long-range stress.

**Figure 3.** Hard-valid mixed 3D check. Teacher flow: conditional affine-coupling, hidden width 64; embedding…

Original abstract

Scientific generative modeling often requires size transfer, where models trained on small systems are evaluated on larger ones. While translation-invariant architectures enable this evaluation, we show that architectural locality alone does not guarantee stable size extrapolation. Instead, stable extrapolation is governed by the quasi-locality of the Gaussian-smoothed score. Through Tweedie's formula, far-away perturbations can influence local score components via posterior covariance, meaning a local model succeeds only if its receptive field covers the smoothed score's response range. We formalize this mechanism, proving a size-uniform comparison theorem for local marginals under reverse diffusion. We also introduce Finite-Depth Local Flow (FDLF), a white-box diagnostic benchmark with exact scores, densities, and controllable response ranges. Empirically, we validate the interplay between spatial mixing, smoothed-score quasi-locality, and model receptive fields. Under spatial mixing, the smoothed score remains quasi-local relative to the receptive field, enabling stable extrapolation. Conversely, when spatial mixing weakens, the score's locality rapidly degrades, causing size transfer to fail.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Desk Editor's Note (private letter to a colleague)

The paper supplies a size-uniform comparison theorem plus the FDLF benchmark that together diagnose when local score models will extrapolate stably, but the stationarity assumption on spatial mixing needs explicit checks in regimes where correlation lengths grow with system size.

The core contribution is a theorem that says a local score model extrapolates across sizes only when its receptive field covers the response range of the Gaussian-smoothed score, with the link coming through Tweedie's formula and posterior covariance. They pair this with the FDLF benchmark, which supplies exact scores and densities while letting the user dial the response range. That combination is new and directly addresses a practical failure mode in scientific diffusion models.

The paper does a clean job separating architectural translation invariance from the actual quasi-locality requirement. The empirical section shows that under stationary spatial mixing the smoothed score stays local enough for extrapolation to hold, while weakening mixing makes the score non-local and transfer fails. The benchmark being white-box with controllable ranges is a practical plus for testing the mechanism.

The main soft spot is the assumption that the quasi-locality range of the smoothed score remains independent of system size once the receptive field is large enough. The stress-test note flags that this can break when correlation lengths scale with L, as in critical phenomena. If the paper's derivation treats spatial mixing properties as stationary without additional conditions, that limits applicability in those regimes. The abstract claims the proof, but the strength rests on whether the reverse-diffusion steps and covariance bounds are stated tightly enough to be non-vacuous.

This work is aimed at people training score-based models on physical systems where training on small lattices and deploying on large ones is routine. It gives a concrete diagnostic rather than another architecture tweak. The combination of formal statement and controllable benchmark is solid enough to warrant peer review, even if some applications will require extra validation of the mixing assumption.

Referee Report

1 major / 1 minor

Summary. The paper claims that stable size extrapolation in local score-based generative models is governed by the quasi-locality of the Gaussian-smoothed score (via Tweedie's formula and posterior covariance) rather than architectural locality alone. It proves a size-uniform comparison theorem for local marginals under reverse diffusion, introduces the white-box Finite-Depth Local Flow (FDLF) benchmark with exact scores/densities and controllable response ranges, and empirically validates that stable extrapolation occurs under spatial mixing but fails when mixing weakens.

Significance. If the theorem holds with non-vacuous assumptions and the FDLF benchmark isolates the claimed mechanism, the work supplies a useful diagnostic theory and reproducible testbed for size transfer in scientific diffusion models. The exact, controllable quantities in FDLF are a clear strength for falsifiability and reproducibility.

major comments (1)

[Abstract / size-uniform comparison theorem] The size-uniform comparison theorem (abstract) routes all far-field influence exclusively through posterior covariance and requires spatial mixing properties to remain stationary enough for the quasi-locality range to be independent of system size. This assumption is load-bearing; the paper should explicitly delineate the regimes (e.g., fixed vs. diverging correlation length) where it holds and verify it does not fail in the FDLF construction.

minor comments (1)

The abstract refers to the theorem and empirical validation on FDLF but does not provide section or equation numbers for the full derivation or error analysis, making it difficult to confirm that the theorem's assumptions are satisfied in the benchmark.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the size-uniform comparison theorem. The observation correctly identifies that the theorem's applicability depends on spatial mixing properties remaining stationary with system size. We will revise the manuscript to explicitly delineate the relevant regimes and confirm the FDLF construction satisfies the required conditions.

Referee: [Abstract / size-uniform comparison theorem] The size-uniform comparison theorem (abstract) routes all far-field influence exclusively through posterior covariance and requires spatial mixing properties to remain stationary enough for the quasi-locality range to be independent of system size. This assumption is load-bearing; the paper should explicitly delineate the regimes (e.g., fixed vs. diverging correlation length) where it holds and verify it does not fail in the FDLF construction.

Authors: The theorem indeed channels far-field effects solely through the posterior covariance (via Tweedie's formula) and requires that the correlation structure of the data distribution yields a quasi-locality range independent of system size. This holds under the regime where the correlation length remains fixed (or sub-linear) as system size grows, which is the setting we consider throughout the paper and in the FDLF benchmark. In the revised manuscript we will add a dedicated paragraph in Section 3 that (i) states the stationarity assumption on the mixing properties, (ii) contrasts the fixed-correlation-length regime (where the theorem applies) with the diverging-correlation-length regime (where quasi-locality may degrade), and (iii) verifies that every FDLF instance is constructed with a fixed finite interaction depth, ensuring the correlation length does not diverge with system size. This clarification does not alter the theorem statement but makes its scope explicit.

Revision: yes

Circularity Check

0 steps flagged

No circularity; central theorem derived from standard Tweedie's formula with independent benchmark

The paper's derivation chain relies on Tweedie's formula (a standard statistical identity) to relate far-field perturbations to local scores via posterior covariance, then proves a new size-uniform comparison theorem for local marginals under reverse diffusion. The Finite-Depth Local Flow benchmark is presented as white-box with exact scores and densities, providing an independent diagnostic. No self-citations are load-bearing for the uniqueness or validity of the theorem, no parameters are fitted and then renamed as predictions, and no ansatz or known result is smuggled in via citation. The argument is therefore self-contained against external mathematical facts rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axioms · 0 invented entities

The central claim rests on Tweedie's formula (standard) and the modeling assumption that spatial mixing can be controlled independently of system size in the benchmark. No free parameters or invented physical entities are introduced; the benchmark itself is a constructed test distribution rather than a new physical postulate.

axioms (1)

Standard math: Tweedie's formula relating the score to posterior covariance under Gaussian smoothing. Invoked to connect far-away perturbations to local score components.
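
Tweedie's formula itself is easy to check numerically on a small example. The sketch below uses a single spin x in {-1, +1} with equal weights, observed as y = x + sigma * eps; this toy prior, the noise level, and the function names are assumptions for illustration and do not come from the paper. It compares the posterior mean computed from Bayes' rule with y + sigma^2 times a finite-difference estimate of the smoothed score.

```python
import math


def log_marginal(y: float, sigma: float) -> float:
    """log p_sigma(y) for a spin x in {-1, +1} with equal weights."""
    a = -((y - 1.0) ** 2) / (2 * sigma**2)
    b = -((y + 1.0) ** 2) / (2 * sigma**2)
    m = max(a, b)  # log-sum-exp shift for numerical stability
    mix = 0.5 * math.exp(a - m) + 0.5 * math.exp(b - m)
    return m + math.log(mix) - 0.5 * math.log(2 * math.pi * sigma**2)


def score_fd(y: float, sigma: float, h: float = 1e-5) -> float:
    """Central finite-difference estimate of d/dy log p_sigma(y)."""
    return (log_marginal(y + h, sigma) - log_marginal(y - h, sigma)) / (2 * h)


def posterior_mean(y: float, sigma: float) -> float:
    """E[x | y] computed directly from Bayes' rule."""
    w_plus = math.exp(-((y - 1.0) ** 2) / (2 * sigma**2))
    w_minus = math.exp(-((y + 1.0) ** 2) / (2 * sigma**2))
    return (w_plus - w_minus) / (w_plus + w_minus)


sigma = 0.7
for y in (-1.5, 0.2, 0.9):
    lhs = posterior_mean(y, sigma)
    rhs = y + sigma**2 * score_fd(y, sigma)  # Tweedie's formula
    print(y, abs(lhs - rhs) < 1e-6)
```

The two sides agree to within finite-difference error at each test point, which is the first-order identity; the posterior-covariance statement used in the ledger is its derivative with respect to y.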

[1] [1]

Estimation of non-normalized statistical models by score matching https://jmlr.org/papers/v6/hyvarinen05a.html

Aapo Hyvarinen. Estimation of non-normalized statistical models by score matching https://jmlr.org/papers/v6/hyvarinen05a.html. Journal of Machine Learning Research, 6:695--709, 2005

2005

[2] [2]

2011 , issue_date =

Pascal Vincent. A connection between score matching and denoising autoencoders https://doi.org/10.1162/NECO_a_00142. Neural Computation, 23(7):1661--1674, 2011

work page doi:10.1162/neco_a_00142 2011

[3] [3]

Deep unsupervised learning using nonequilibrium thermodynamics https://proceedings.mlr.press/v37/sohl-dickstein15.html

Jascha Sohl-Dickstein, Eric Weiss, Niru Maheswaranathan, and Surya Ganguli. Deep unsupervised learning using nonequilibrium thermodynamics https://proceedings.mlr.press/v37/sohl-dickstein15.html. In International Conference on Machine Learning, pages 2256--2265, 2015

2015

[4] [4]

Generative modeling by estimating gradients of the data distribution https://proceedings.neurips.cc/paper/2019/hash/3001ef257407d5a371a96dcd947c7d93-Abstract.html

Yang Song and Stefano Ermon. Generative modeling by estimating gradients of the data distribution https://proceedings.neurips.cc/paper/2019/hash/3001ef257407d5a371a96dcd947c7d93-Abstract.html. In Advances in Neural Information Processing Systems, 2019

2019

[5] [5]

Denoising diffusion probabilistic models https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html

Jonathan Ho, Ajay Jain, and Pieter Abbeel. Denoising diffusion probabilistic models https://proceedings.neurips.cc/paper/2020/hash/4c5bcfec8584af0d967f1ab10179ca4b-Abstract.html. In Advances in Neural Information Processing Systems, 2020

2020

[6] [6]

Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole

Yang Song, Jascha Sohl-Dickstein, Diederik P. Kingma, Abhishek Kumar, Stefano Ermon, and Ben Poole. Score-based generative modeling through stochastic differential equations https://openreview.net/forum?id=PxTIG12RRHS. International Conference on Learning Representations, 2021

2021

[7] [7]

Convergence for score-based generative modeling with polynomial complexity https://arxiv.org/abs/2206.06227

Holden Lee, Jianfeng Lu, and Yixin Tan. Convergence for score-based generative modeling with polynomial complexity https://arxiv.org/abs/2206.06227. In Advances in Neural Information Processing Systems, 2022

arXiv 2022

[8] [8]

Sitan Chen, Sinho Chewi, Jerry Li, Yuanzhi Li, Adil Salim, and Anru R. Zhang. Sampling is as easy as learning the score: theory for diffusion models with minimal data assumptions https://openreview.net/forum?id=zyLVMgsZ0U_. International Conference on Learning Representations, 2023

2023

[9] [9]

Schoenholz, Patrick F

Justin Gilmer, Samuel S. Schoenholz, Patrick F. Riley, Oriol Vinyals, and George E. Dahl. Neural message passing for quantum chemistry https://proceedings.mlr.press/v70/gilmer17a.html. In International Conference on Machine Learning, pages 1263--1272, 2017

2017

[10] [10]

Battaglia

Alvaro Sanchez-Gonzalez, Jonathan Godwin, Tobias Pfaff, Rex Ying, Jure Leskovec, and Peter W. Battaglia. Learning to simulate complex physics with graph networks https://proceedings.mlr.press/v119/sanchez-gonzalez20a.html. In International Conference on Machine Learning, pages 8459--8468, 2020

2020

[11] [11]

Fourier neural operator for parametric partial differential equations https://openreview.net/forum?id=c8P9NQVtmnO

Zongyi Li, Nikola Kovachki, Kamyar Azizzadenesheli, Burigede Liu, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Fourier neural operator for parametric partial differential equations https://openreview.net/forum?id=c8P9NQVtmnO. International Conference on Learning Representations, 2021

2021

[12] [12]

Neural operator: learning maps between function spaces with applications to PDEs https://jmlr.org/papers/v24/21-1524.html

Nikola Kovachki, Zongyi Li, Burigede Liu, Kamyar Azizzadenesheli, Kaushik Bhattacharya, Andrew Stuart, and Anima Anandkumar. Neural operator: learning maps between function spaces with applications to PDEs https://jmlr.org/papers/v24/21-1524.html. Journal of Machine Learning Research, 24(89):1--97, 2023

2023

[13] [13]

Mosaic: A Benchmark Suite for Differentiable Physics Solvers — Rehmann et al., 2026 13

Lu Lu, Pengzhan Jin, Guofei Pang, Zhongqiang Zhang, and George Em Karniadakis. Learning nonlinear operators via DeepONet based on the universal approximation theorem of operators https://doi.org/10.1038/s42256-021-00302-5. Nature Machine Intelligence, 3:218--229, 2021

work page doi:10.1038/s42256-021-00302-5 2021

[14] [14]

Physical Review Letters , volume =

Jorg Behler and Michele Parrinello. Generalized neural-network representation of high-dimensional potential-energy surfaces https://doi.org/10.1103/PhysRevLett.98.146401. Physical Review Letters, 98:146401, 2007

work page doi:10.1103/physrevlett.98.146401 2007

[15] [15]

Extensive deep neural networks for transferring small scale learning to large scale systems https://doi.org/10.1039/C8SC04578J

Kyle Mills, Matthew Spanner, and Isaac Tamblyn. Extensive deep neural networks for transferring small scale learning to large scale systems https://doi.org/10.1039/C8SC04578J. Chemical Science, 10:4129--4140, 2019

work page doi:10.1039/c8sc04578j 2019

[16] [16]

Schutt, Pieter-Jan Kindermans, Huziel Enoc Sauceda, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Muller

Kristof T. Schutt, Pieter-Jan Kindermans, Huziel Enoc Sauceda, Stefan Chmiela, Alexandre Tkatchenko, and Klaus-Robert Muller. SchNet: a continuous-filter convolutional neural network for modeling quantum interactions https://proceedings.neurips.cc/paper/2017/hash/303ed4c69846ab36c2904d3ba8573050-Abstract.html. In Advances in Neural Information Processing ...

2017

[17] Simon Batzner, Albert Musaelian, Lixin Sun, Mario Geiger, Jonathan P. Mailoa, Mordechai Kornbluth, Nicola Molinari, Tess E. Smidt, and Boris Kozinsky. E(3)-equivariant graph neural networks for data-efficient and accurate interatomic potentials. Nature Communications, 13:2453, 2022. https://doi.org/10.1038/s41467-022-29939-5

[18] Victor Garcia Satorras, Emiel Hoogeboom, and Max Welling. E(n) equivariant graph neural networks. In International Conference on Machine Learning, pages 9323--9332, 2021. https://proceedings.mlr.press/v139/satorras21a.html

[19] Emiel Hoogeboom, Victor Garcia Satorras, Clement Vignac, and Max Welling. Equivariant diffusion for molecule generation in 3D. In International Conference on Machine Learning, pages 8867--8887, 2022. https://proceedings.mlr.press/v162/hoogeboom22a.html

[20] Minkai Xu, Lantao Yu, Yang Song, Chence Shi, Stefano Ermon, and Jian Tang. GeoDiff: A geometric diffusion model for molecular conformation generation. In International Conference on Learning Representations, 2022. https://openreview.net/forum?id=PzcvxEMzvQC

[21] Bowen Jing, Gabriele Corso, Jeffrey Chang, Regina Barzilay, and Tommi Jaakkola. Torsional diffusion for molecular conformer generation. In Advances in Neural Information Processing Systems, 2022. https://openreview.net/forum?id=w6fj2r62r_H

[22] George Papamakarios, Eric Nalisnick, Danilo Jimenez Rezende, Shakir Mohamed, and Balaji Lakshminarayanan. Normalizing flows for probabilistic modeling and inference. Journal of Machine Learning Research, 22(57):1--64, 2021. https://jmlr.org/papers/v22/19-1028.html

[23] Danilo Jimenez Rezende and Shakir Mohamed. Variational inference with normalizing flows. In International Conference on Machine Learning, pages 1530--1538, 2015. https://proceedings.mlr.press/v37/rezende15.html

[24] Laurent Dinh, Jascha Sohl-Dickstein, and Samy Bengio. Density estimation using Real NVP. In International Conference on Learning Representations, 2017. https://openreview.net/forum?id=HkpbnH9lx

[25] R. L. Dobrushin. The description of a random field by means of conditional probabilities and conditions of its regularity. Theory of Probability and Its Applications, 13(2):197--224, 1968. https://doi.org/10.1137/1113026

[26] Michael E. Fisher and Michael N. Barber. Scaling theory for finite-size effects in the critical region. Physical Review Letters, 28(23):1516--1519, 1972. https://doi.org/10.1103/PhysRevLett.28.1516

[27] Vladimir Privman, editor. Finite Size Scaling and Numerical Simulation of Statistical Systems. World Scientific, 1990. https://doi.org/10.1142/1011

[28] Dror Weitz. Counting independent sets up to the tree threshold. In ACM Symposium on Theory of Computing, pages 140--149, 2006. https://doi.org/10.1145/1132516.1132538

[29] Itai Benjamini and Oded Schramm. Recurrence of distributional limits of finite planar graphs. Electronic Journal of Probability, 6:1--13, 2001. https://doi.org/10.1214/EJP.v6-96

[30] David Aldous and J. Michael Steele. The objective method: probabilistic combinatorial optimization and local weak convergence. In Probability on Discrete Structures, pages 1--72. Springer, 2004. https://doi.org/10.1007/978-3-662-09444-0_1

[31] Lars Onsager. Crystal statistics. I. A two-dimensional model with an order-disorder transition. Physical Review, 65(3--4):117--149, 1944. https://journals.aps.org/pr/abstract/10.1103/PhysRev.65.117

[32] Hans-Otto Georgii. Gibbs Measures and Phase Transitions. De Gruyter, second edition, 2011. https://doi.org/10.1515/9783110250329

[33] Herbert E. Robbins. An empirical Bayes approach to statistics. In Breakthroughs in Statistics, pages 388--394. Springer, 1992. https://doi.org/10.1007/978-1-4612-0919-5_26

[34] Bradley Efron. Tweedie's formula and selection bias. Journal of the American Statistical Association, 106(496):1602--1614, 2011. https://doi.org/10.1198/jasa.2011.tm11181