
arxiv: 2606.09705 · v1 · pith:GCWZUG6Inew · submitted 2026-06-08 · 💻 cs.LG · cond-mat.stat-mech

When Do Local Score Models Extrapolate Across Size? A Diagnostic Theory and Benchmark

Pith reviewed 2026-06-27 16:55 UTC · model grok-4.3

classification 💻 cs.LG cond-mat.stat-mech
keywords score-based generative models · size extrapolation · local models · quasi-locality · Tweedie's formula · spatial mixing · diffusion models · benchmark

The pith

Local score models extrapolate across sizes only when their receptive field covers the quasi-locality range of the Gaussian-smoothed score.

A machine-rendered reading of the paper's core claim, the machinery that carries it, and where it could break.

The paper shows that translation-invariant architectures alone do not ensure stable size transfer in score-based generative models. Stable extrapolation instead requires that a model's receptive field encompass the response range of the Gaussian-smoothed score, because far-away perturbations reach local score components through posterior covariance. This dependence is formalized in a size-uniform comparison theorem for local marginals under reverse diffusion. The authors introduce the Finite-Depth Local Flow benchmark, with exact scores and controllable ranges, to isolate the role of spatial mixing. When mixing is strong, the smoothed score stays quasi-local and extrapolation holds; when mixing weakens, the locality degrades and transfer fails.

Core claim

Stable extrapolation is governed by the quasi-locality of the Gaussian-smoothed score: a local model succeeds only if its receptive field covers the smoothed score's response range, formalized by a size-uniform comparison theorem for local marginals under reverse diffusion. Through Tweedie's formula, far-away perturbations can influence local score components via posterior covariance. Under spatial mixing, the smoothed score remains quasi-local relative to the receptive field, enabling stable extrapolation; when spatial mixing weakens, the score's locality rapidly degrades and size transfer fails.
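
The mechanism can be written out in a few lines. What follows is the standard form of Tweedie's formula for additive Gaussian noise together with its derivative, not the paper's own notation; the symbols $x$, $y$, $\sigma$, $p_\sigma$, and $s_\sigma$ are assumptions introduced here for illustration.

```latex
% Noisy observation: y = x + \sigma \varepsilon, \varepsilon \sim \mathcal{N}(0, I)
\begin{align}
s_\sigma(y) := \nabla_y \log p_\sigma(y)
  &= \frac{\mathbb{E}[x \mid y] - y}{\sigma^2}, \\
\nabla_y\, \mathbb{E}[x \mid y]
  &= \frac{\operatorname{Cov}[x \mid y]}{\sigma^2}, \\
\frac{\partial s_{\sigma,i}}{\partial y_j}(y)
  &= \frac{\operatorname{Cov}[x_i, x_j \mid y]}{\sigma^4} - \frac{\delta_{ij}}{\sigma^2}.
\end{align}
```

For $i \neq j$, the response of score component $i$ to a perturbation at site $j$ is the posterior covariance between the two sites scaled by $\sigma^{-4}$, so the smoothed score is quasi-local only insofar as that covariance decays with distance.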

What carries the argument

Quasi-locality of the Gaussian-smoothed score, which sets the required receptive-field size through posterior covariance in Tweedie's formula.

If this is right

  • Under sufficient spatial mixing the smoothed score stays quasi-local relative to any fixed receptive field, permitting stable size extrapolation.
  • Weakening spatial mixing causes the smoothed score's effective range to grow, so models with fixed receptive fields lose extrapolation capability.
  • The Finite-Depth Local Flow construction supplies exact scores, densities, and tunable response ranges for isolating the quasi-locality mechanism.
  • Architectural translation invariance is necessary but not sufficient; receptive-field width must also match the quasi-local range.
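
The dependence of the response range on mixing can be illustrated in a toy Gaussian setting where everything is available in closed form. This is a minimal sketch, not the paper's FDLF benchmark: it assumes a 1D chain with AR(1) covariance, and the names `ar1_covariance`, `score_jacobian`, `response_range`, the threshold `tol`, and all parameter values are hypothetical choices made for illustration. For Gaussian data x ~ N(0, Σ) the smoothed score is linear, s(y) = -(Σ + σ²I)⁻¹ y, so its Jacobian is a fixed matrix whose off-diagonal decay can be read off directly.

```python
import numpy as np


def ar1_covariance(n, rho):
    """Covariance of a stationary AR(1) chain: Sigma[i, j] = rho ** |i - j|."""
    idx = np.arange(n)
    return rho ** np.abs(idx[:, None] - idx[None, :])


def score_jacobian(cov, sigma):
    """Jacobian of the smoothed score for x ~ N(0, cov), y = x + sigma * noise.

    The smoothed density is N(0, cov + sigma^2 I), so the score is linear
    in y and its Jacobian is -(cov + sigma^2 I)^{-1}.
    """
    n = cov.shape[0]
    return -np.linalg.inv(cov + sigma**2 * np.eye(n))


def response_range(jac, site, tol=1e-3):
    """Largest distance at which |d s_site / d y_j| is still at least tol
    times the on-site response (tol is an illustrative threshold)."""
    row = np.abs(jac[site])
    significant = np.nonzero(row >= tol * row[site])[0]
    return int(np.max(np.abs(significant - site)))


if __name__ == "__main__":
    n, sigma, site = 101, 1.0, 50
    # Small rho: strong mixing. rho near 1: weak mixing.
    for rho in (0.3, 0.95):
        jac = score_jacobian(ar1_covariance(n, rho), sigma)
        print(f"rho={rho}: response range = {response_range(jac, site)} sites")
```

In this toy model the measured range grows as `rho` approaches 1, so a fixed receptive field that suffices at `rho = 0.3` need not suffice at `rho = 0.95`.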

Where Pith is reading between the lines

These are editorial extensions of the paper, not claims the author makes directly.

  • Design of scalable models for physical systems should estimate mixing length first and size receptive fields accordingly rather than relying on invariance alone.
  • The same quasi-locality diagnostic could be applied to other generative settings that assume local score or density approximations.
  • Controlled benchmarks with tunable mixing could be used to test whether the size-uniform comparison theorem holds for discrete or graph-structured data.
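
The first extension above amounts to a simple bookkeeping check. The sketch below assumes a stride-1 stack of convolutions with odd kernel sizes, for which the receptive-field radius is the sum over layers of dilation × (kernel − 1) / 2. The function names and the `response_range` input are hypothetical, and how that range is estimated for a given system is outside this sketch.

```python
def receptive_radius(kernel_sizes, dilations=None):
    """Receptive-field radius (in sites) of a stride-1 convolution stack.

    Assumes odd kernel sizes and no pooling or striding; each layer adds
    dilation * (kernel_size - 1) / 2 sites on either side.
    """
    if dilations is None:
        dilations = [1] * len(kernel_sizes)
    if len(dilations) != len(kernel_sizes):
        raise ValueError("kernel_sizes and dilations must have equal length")
    radius = 0
    for k, d in zip(kernel_sizes, dilations):
        if k < 1 or k % 2 == 0:
            raise ValueError("this sketch only handles odd, positive kernel sizes")
        radius += d * (k - 1) // 2
    return radius


def covers(kernel_sizes, response_range, dilations=None):
    """True if the stack's receptive-field radius reaches the given range."""
    return receptive_radius(kernel_sizes, dilations) >= response_range


if __name__ == "__main__":
    stack = [3, 3, 3]  # three 3-wide layers, dilation 1
    print(receptive_radius(stack), covers(stack, response_range=4))
```

A check of this kind only compares widths; it says nothing about whether a network with sufficient receptive field actually learns the score.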

Load-bearing premise

The influence of far-away perturbations on local score components occurs exclusively through posterior covariance as given by Tweedie's formula, and spatial mixing properties remain stationary enough for the quasi-locality range to be well-defined independently of system size.

What would settle it

An experiment in which a local model whose receptive field is smaller than the measured response range of the smoothed score still produces accurate extrapolation on systems where spatial mixing is deliberately weakened.

Figures

Figures reproduced from arXiv: 2606.09705 by Wenjie Xi.

Figure 1: Fixed-architecture pure-CNN size extrapolation on the 2D FDLF benchmark, aggregated over three indepen…
Figure 2: Pure-CNN receptive-field sweep on the 2D FDLF benchmark, aggregated over three independent training…
Figure 3: Hard-valid mixed discrete-continuous check. Models are trained at…
Figure 4: Boundary response of the smoothed score near the Ising critical point. A local CNN with radius…
Figure 1: 2D fixed-size sweep. Teacher families: short-range simple, short-range structured, and long-range stress.
Figure 3: Hard-valid mixed 3D check. Teacher flow: conditional affine-coupling hidden width 64; embedding…
Original abstract

Scientific generative modeling often requires size transfer, where models trained on small systems are evaluated on larger ones. While translation-invariant architectures enable this evaluation, we show that architectural locality alone does not guarantee stable size extrapolation. Instead, stable extrapolation is governed by the quasi-locality of the Gaussian-smoothed score. Through Tweedie's formula, far-away perturbations can influence local score components via posterior covariance, meaning a local model succeeds only if its receptive field covers the smoothed score's response range. We formalize this mechanism, proving a size-uniform comparison theorem for local marginals under reverse diffusion. We also introduce Finite-Depth Local Flow (FDLF), a white-box diagnostic benchmark with exact scores, densities, and controllable response ranges. Empirically, we validate the interplay between spatial mixing, smoothed-score quasi-locality, and model receptive fields. Under spatial mixing, the smoothed score remains quasi-local relative to the receptive field, enabling stable extrapolation. Conversely, when spatial mixing weakens, the score's locality rapidly degrades, causing size transfer to fail.

Editorial analysis

A structured set of objections, weighed in public.

Desk editor's note, referee report, simulated authors' rebuttal, and a circularity audit. Tearing a paper down is the easy half of reading it; the pith above is the substance, this is the friction.

Referee Report

1 major / 1 minor

Summary. The paper claims that stable size extrapolation in local score-based generative models is governed by the quasi-locality of the Gaussian-smoothed score (via Tweedie's formula and posterior covariance) rather than architectural locality alone. It proves a size-uniform comparison theorem for local marginals under reverse diffusion, introduces the white-box Finite-Depth Local Flow (FDLF) benchmark with exact scores/densities and controllable response ranges, and empirically validates that stable extrapolation occurs under spatial mixing but fails when mixing weakens.

Significance. If the theorem holds with non-vacuous assumptions and the FDLF benchmark isolates the claimed mechanism, the work supplies a useful diagnostic theory and reproducible testbed for size transfer in scientific diffusion models. The exact, controllable quantities in FDLF are a clear strength for falsifiability and reproducibility.

major comments (1)
  1. [Abstract / size-uniform comparison theorem] The size-uniform comparison theorem (abstract) routes all far-field influence exclusively through posterior covariance and requires spatial mixing properties to remain stationary enough for the quasi-locality range to be independent of system size. This assumption is load-bearing; the paper should explicitly delineate the regimes (e.g., fixed vs. diverging correlation length) where it holds and verify it does not fail in the FDLF construction.
minor comments (1)
  1. The abstract refers to the theorem and empirical validation on FDLF but does not provide section or equation numbers for the full derivation or error analysis, making it difficult to confirm that the theorem's assumptions are satisfied in the benchmark.

Simulated Author's Rebuttal

1 response · 0 unresolved

We thank the referee for the constructive comment on the size-uniform comparison theorem. The observation correctly identifies that the theorem's applicability depends on spatial mixing properties remaining stationary with system size. We will revise the manuscript to explicitly delineate the relevant regimes and confirm the FDLF construction satisfies the required conditions.

Point-by-point responses
  1. Referee: [Abstract / size-uniform comparison theorem] The size-uniform comparison theorem (abstract) routes all far-field influence exclusively through posterior covariance and requires spatial mixing properties to remain stationary enough for the quasi-locality range to be independent of system size. This assumption is load-bearing; the paper should explicitly delineate the regimes (e.g., fixed vs. diverging correlation length) where it holds and verify it does not fail in the FDLF construction.

    Authors: The theorem indeed channels far-field effects solely through the posterior covariance (via Tweedie's formula) and requires that the correlation structure of the data distribution yields a quasi-locality range independent of system size. This holds under the regime where the correlation length remains fixed (or sub-linear) as system size grows, which is the setting we consider throughout the paper and in the FDLF benchmark. In the revised manuscript we will add a dedicated paragraph in Section 3 that (i) states the stationarity assumption on the mixing properties, (ii) contrasts the fixed-correlation-length regime (where the theorem applies) with the diverging-correlation-length regime (where quasi-locality may degrade), and (iii) verifies that every FDLF instance is constructed with a fixed finite interaction depth, ensuring the correlation length does not diverge with system size. This clarification does not alter the theorem statement but makes its scope explicit.

    revision: yes

Circularity Check

0 steps flagged

No circularity; central theorem derived from standard Tweedie's formula with independent benchmark

full rationale

The paper's derivation chain relies on Tweedie's formula (a standard statistical identity) to relate far-field perturbations to local scores via posterior covariance, then proves a new size-uniform comparison theorem for local marginals under reverse diffusion. The Finite-Depth Local Flow benchmark is presented as white-box with exact scores and densities, providing an independent diagnostic. No self-citations are load-bearing for the uniqueness or validity of the theorem, no parameters are fitted and then renamed as predictions, and no ansatz or known result is smuggled in via citation. The argument is therefore self-contained against external mathematical facts rather than reducing to its own inputs by construction.

Axiom & Free-Parameter Ledger

0 free parameters · 1 axiom · 0 invented entities

The central claim rests on Tweedie's formula (standard) and the modeling assumption that spatial mixing can be controlled independently of system size in the benchmark. No free parameters or invented physical entities are introduced; the benchmark itself is a constructed test distribution rather than a new physical postulate.

axioms (1)
  • [standard math] Tweedie's formula relating the score to posterior covariance under Gaussian smoothing
    Invoked to connect far-away perturbations to local score components

pith-pipeline@v0.9.1-grok · 5708 in / 1441 out tokens · 17774 ms · 2026-06-27T16:55:26.569139+00:00 · methodology

